Intelligence Architecture
The intelligence layer is a multi-system pipeline that transforms raw documents into structured, cross-workspace knowledge. It spans document extraction, file propagation, identity resolution, ownership graph computation, compliance monitoring, and human-in-the-loop oversight - all wired together through event-driven side effects.
Architecture Flow
The following diagram shows how a document upload cascades through the intelligence pipeline - from file registration and extraction through to cross-workspace presence and CSP intelligence side effects.
DOCUMENT UPLOAD (matter, submission, direct)
├─ File Registry
│  ├─ Propagate to entity rows
│  │   → Files linked to company row
│  │   → Files linked to stakeholder (with classified role)
│  ├─ Role classify from folder path
│  │   → Supplies the classified role for the links above
│  └─ Presence trigger (KYC/EVIDENCE roles)
│      → Identity Resolution (secure hash → identity → document presence)
│      → Cross-Workspace Discovery & Access (presence → grant → stream)
├─ Extraction Engine → dynamic cols
└─ Embedding Engine → vector chunks
Meanwhile, on the CSP intelligence side:
CspRole changes
→ Register Entry (immutable)
→ Entity Event (OWNERSHIP_CHANGE)
   ├─ UBO Recompute (ownership chain → snapshot)
   ├─ Compliance Recheck (obligation status degradation)
   └─ Presence Advertise (cross-WS)
Document Extraction Pipeline
Raw documents are processed through the extraction engine to produce typed extraction results that become dynamic table columns. Each extraction goes through quality scoring to determine usability.
Upload → Extraction Job (PENDING)
→ Document extraction engine
→ Extraction results with typed fields
→ Dynamic table columns created (capped at 50/table)
→ Overflow items queued for manual review
→ Quality score computed (0-1)
Key Details
| Feature | Description |
|---|---|
| Column dedup | Automatic deduplication prevents duplicate columns |
| Type inference | DATE, NUMBER, TEXT inferred from sample values |
| Quality scoring | Structure score (0-1) measures parse quality; threshold determines usability |
| Overflow | Once a table reaches 50 extraction columns, further extracted fields are queued for manual review |
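Below is a minimal TypeScript sketch of the column handling above. It assumes extracted values arrive as strings and that every existing column counts toward the 50-column cap. `inferColumnType`, `planColumns` and the regex-based type rules are hypothetical illustrations, not the extraction engine's actual logic.

```typescript
type ColumnType = "DATE" | "NUMBER" | "TEXT";

// Hypothetical rule: a column is NUMBER or DATE only if every non-empty sample parses as one.
function inferColumnType(samples: string[]): ColumnType {
  const values = samples.map((s) => s.trim()).filter((s) => s.length > 0);
  if (values.length === 0) return "TEXT";
  const isNumber = (v: string) => /^-?\d+(\.\d+)?$/.test(v.replace(/,/g, ""));
  const isIsoDate = (v: string) => /^\d{4}-\d{2}-\d{2}$/.test(v) && !Number.isNaN(Date.parse(v));
  if (values.every(isNumber)) return "NUMBER";
  if (values.every(isIsoDate)) return "DATE";
  return "TEXT";
}

const MAX_EXTRACTION_COLUMNS = 50;

interface ColumnPlan {
  created: string[]; // new columns to add to the table
  overflow: string[]; // fields queued for manual review
}

// Deduplicate by normalised name, then fill up to the cap and send the rest to overflow.
function planColumns(existing: string[], extracted: string[]): ColumnPlan {
  const normalise = (name: string) => name.trim().toLowerCase();
  const seen = new Set(existing.map(normalise));
  const plan: ColumnPlan = { created: [], overflow: [] };
  let count = existing.length;
  for (const name of extracted) {
    const key = normalise(name);
    if (seen.has(key)) continue; // duplicate of an existing or earlier column
    seen.add(key);
    if (count < MAX_EXTRACTION_COLUMNS) {
      plan.created.push(name);
      count++;
    } else {
      plan.overflow.push(name);
    }
  }
  return plan;
}
```

Suppose a table already has 49 columns and the extracted fields are `["A", "a", "B"]`. The sketch creates `A`, skips `a` as a duplicate and queues `B` for review.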
File Registry & Smart Folders
Every file in the system gets an index entry. File links connect a file to any entity, so the same file can appear in several views at once.
File Record
Metadata index for each file. Tracks filename, MIME type, size, checksum, source type (UPLOAD, GENERATED, IMPORTED), sensitivity level (PUBLIC to RESTRICTED), version chain, and virtual folder path for tree navigation.
File Link
Connects a file to any entity. Supports entity types (ROW, MATTER, MATTER_STEP, SUBMISSION, TEMPLATE, DOCUMENT, GENERATED_DOC) and roles (ATTACHMENT, OUTPUT, SOURCE, KYC, EVIDENCE, SIGNATURE, COVER).
Smart Folder Types
| Folder | Source |
|---|---|
| Matters | Files with entityType = MATTER or MATTER_STEP |
| Individuals | Files with entityType = ROW on Individuals system table |
| Companies | Files with entityType = ROW on Companies system table |
| Recent | Files created in the last 7 days |
| Starred | User bookmarks |
| Unlinked | Files with no entity links |
Entity File Propagation
The propagation system automatically creates file links from matter files to the companies and stakeholders the matter serves. This is the core mechanism that lets a single file appear in several places, which yields a company-centric file model.
Entity Link Types
| Link Type | Meaning | Resolves To |
|---|---|---|
| PRIMARY_ENTITY | The main company this matter serves | Company row |
| STAKEHOLDER | An individual involved in the matter | Stakeholder row |
| PARENT_ENTITY | Parent company in hierarchy | Company row |
| SUBSIDIARY | Child entity | Company row |
| RELATED_PARTY | Other related entity | Company row |
Propagation Flow
File uploaded to matter
→ File record created with matter link
→ Automatic propagation to linked entities
│
├─ Load entity links for this matter
├─ For each PRIMARY_ENTITY / PARENT_ENTITY / SUBSIDIARY:
│ └─ Link file to company row with classified role
├─ For each STAKEHOLDER:
│ └─ Link file to stakeholder row with classified role
└─ File role determined from folder path
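The propagation steps above could be sketched in TypeScript as follows. The sketch assumes the matter's entity links are already loaded and the role has already been classified from the folder path. `EntityLink`, `PlannedFileLink` and `planPropagation` are hypothetical names. RELATED_PARTY links are skipped only because the flow above does not list them.

```typescript
type EntityLinkType =
  | "PRIMARY_ENTITY"
  | "STAKEHOLDER"
  | "PARENT_ENTITY"
  | "SUBSIDIARY"
  | "RELATED_PARTY";

interface EntityLink {
  linkType: EntityLinkType;
  rowId: string; // company row or stakeholder row the link resolves to
}

interface PlannedFileLink {
  fileId: string;
  entityType: "ROW";
  entityId: string;
  role: string;
}

// Link types the propagation flow fans out to.
const PROPAGATED_TYPES = new Set<EntityLinkType>([
  "PRIMARY_ENTITY",
  "PARENT_ENTITY",
  "SUBSIDIARY",
  "STAKEHOLDER",
]);

function planPropagation(fileId: string, role: string, links: EntityLink[]): PlannedFileLink[] {
  // Key by row so a row reached through two link types receives a single file link.
  const planned = new Map<string, PlannedFileLink>();
  for (const link of links) {
    if (!PROPAGATED_TYPES.has(link.linkType)) continue;
    planned.set(link.rowId, { fileId, entityType: "ROW", entityId: link.rowId, role });
  }
  return Array.from(planned.values());
}
```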
Role Classification from Folder Path
| Folder Pattern | Role | Meaning |
|---|---|---|
| /KYC/, /Compliance/, /PINCA | KYC | Know Your Customer documents |
| /Filings/, /Share Certificates/, /Registers/ | EVIDENCE | Regulatory evidence |
| /Constitutional Docs/, /Resolutions & Minutes/ | EVIDENCE | Corporate governance evidence |
| /Signed/ | SIGNATURE | Executed documents |
| /Welcome Pack/ | OUTPUT | Generated output documents |
| Everything else | ATTACHMENT | General attachments |
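Here is a TypeScript sketch of the folder-path rules. It assumes case-sensitive substring matching with first-match-wins ordering; the table states neither. The patterns are copied from the table, and `classifyRole` is a hypothetical name.

```typescript
type FileRole = "KYC" | "EVIDENCE" | "SIGNATURE" | "OUTPUT" | "ATTACHMENT";

// Patterns from the table, checked in order; the first matching rule wins.
const ROLE_RULES: Array<[string[], FileRole]> = [
  [["/KYC/", "/Compliance/", "/PINCA"], "KYC"],
  [["/Filings/", "/Share Certificates/", "/Registers/"], "EVIDENCE"],
  [["/Constitutional Docs/", "/Resolutions & Minutes/"], "EVIDENCE"],
  [["/Signed/"], "SIGNATURE"],
  [["/Welcome Pack/"], "OUTPUT"],
];

function classifyRole(folderPath: string): FileRole {
  // Add a trailing slash so "/Matters/M-1/KYC" matches the "/KYC/" pattern.
  const path = folderPath.endsWith("/") ? folderPath : folderPath + "/";
  for (const [patterns, role] of ROLE_RULES) {
    if (patterns.some((p) => path.indexOf(p) !== -1)) return role;
  }
  return "ATTACHMENT"; // everything else
}
```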
Deep File Aggregation Queries
| Query | Aggregates From |
|---|---|
| Entity deep files | Direct company files + all matter files + files of every stakeholder holding a role in the entity |
| Individual deep files | Direct stakeholder files + all matter files + bridged KYC documents |
| Stakeholder file timeline | Direct uploads + entity files + matter files + KYC docs (sorted chronologically, deduplicated) |
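The stakeholder timeline's merge, deduplicate and sort step could look like the sketch below. It assumes each source returns records with a file ID and an ISO timestamp. When the same file appears twice, it assumes the first source listed wins. All names here are hypothetical.

```typescript
interface TimelineFile {
  fileId: string;
  createdAt: string; // ISO timestamp
  source: string; // e.g. "direct", "entity", "matter", "kyc"
}

// Merge any number of source lists, drop duplicate file IDs, sort oldest first.
function buildTimeline(...sources: TimelineFile[][]): TimelineFile[] {
  const byId = new Map<string, TimelineFile>();
  for (const list of sources) {
    for (const file of list) {
      if (!byId.has(file.fileId)) byId.set(file.fileId, file); // first source wins
    }
  }
  return Array.from(byId.values()).sort(
    (a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt),
  );
}
```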
Identity Resolution
Identity resolution maps workspace-specific table rows to workspace-agnostic identity records using securely hashed identifiers. No plaintext identity data is ever stored in the global layer.
Three-Layer Architecture
Layer 1: Global Identity
│ Workspace-agnostic person or company
│ Types: INDIVIDUAL, ENTITY
│
├── Layer 2: Hashed Identifiers
│ Securely hashed identity fields
│ Types: EMAIL, PASSPORT_NUMBER, EMIRATES_ID,
│ COMPANY_REG_NUMBER, TAX_ID, NATIONAL_ID
│ Supports versioned key rotation
│
└── Layer 3: Identity Links
Maps workspace row → Global Identity
Includes confidence score (0-1), disputed flag, expiry
Resolution Flow
Row created/updated in Individuals or Companies table
→ Identity resolution triggered automatically
→ Check: is identity resolution enabled for this workspace?
(requires active oversight relationship)
→ Extract identity fields by table type:
INDIVIDUALS: Email, Passport Number, Emirates ID
COMPANIES: Company Number, Tax ID
→ Normalize and securely hash each field
→ Search for matching identifiers
→ Compute confidence score
→ Decision:
High confidence: Auto-link to existing identity
Low confidence: Queue for human verification
No match: Create new identity record
→ Resolution logged for audit trail
Confidence Scoring
| Match Criteria | Score | Action |
|---|---|---|
| 1 email match | 0.3 | Queued for human verification |
| 1 government ID match | 0.5 | Queued for human verification |
| 2 field matches | 0.7 | Auto-linked to identity |
| 3+ field matches | 0.9 | Auto-linked to identity |
| All government IDs match | 1.0 | Auto-linked to identity |
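The table translates into a scoring function roughly like this TypeScript sketch. The caller is assumed to know which identifier types the row carries and which of them matched a candidate identity's hashes. Two readings are assumptions: "all government IDs match" requires at least two government IDs on the row, and auto-linking starts at 0.7, as the table implies.

```typescript
type IdentifierType =
  | "EMAIL"
  | "PASSPORT_NUMBER"
  | "EMIRATES_ID"
  | "COMPANY_REG_NUMBER"
  | "TAX_ID"
  | "NATIONAL_ID";

type Decision = "AUTO_LINK" | "QUEUE_FOR_REVIEW" | "CREATE_NEW";

const GOVERNMENT_IDS = new Set<IdentifierType>([
  "PASSPORT_NUMBER",
  "EMIRATES_ID",
  "COMPANY_REG_NUMBER",
  "TAX_ID",
  "NATIONAL_ID",
]);

// present: identifier types on the incoming row; matched: the subset whose hashes matched.
function scoreMatch(present: IdentifierType[], matched: IdentifierType[]): number {
  if (matched.length === 0) return 0;
  const govPresent = present.filter((t) => GOVERNMENT_IDS.has(t));
  const govMatched = matched.filter((t) => GOVERNMENT_IDS.has(t));
  if (govPresent.length >= 2 && govMatched.length === govPresent.length) return 1.0;
  if (matched.length >= 3) return 0.9;
  if (matched.length === 2) return 0.7;
  return govMatched.length === 1 ? 0.5 : 0.3; // single government ID vs single email
}

function decide(score: number): Decision {
  if (score === 0) return "CREATE_NEW";
  return score >= 0.7 ? "AUTO_LINK" : "QUEUE_FOR_REVIEW";
}
```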
Key Rotation: Versioned keys support zero-downtime secret rotation, with automatic verification against any active key version.
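This is a minimal sketch of normalise-then-hash with versioned keys, using HMAC-SHA-256 from Node's `node:crypto`. The source only says "secure hashing", so HMAC-SHA-256, the normalisation rules and the `KeyRing` shape are assumptions. New records are hashed with the newest key version. Lookups compute one hash per active version, so records hashed under an older key still match during rotation.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical key ring: key version -> secret (real secrets belong in a secret store).
type KeyRing = Map<number, string>;

interface StoredHash {
  keyVersion: number;
  hash: string;
}

// Assumed normalisation: emails are lower-cased; ID numbers drop spaces/dashes and are upper-cased.
function normalise(type: string, value: string): string {
  const trimmed = value.trim();
  return type === "EMAIL" ? trimmed.toLowerCase() : trimmed.replace(/[\s-]/g, "").toUpperCase();
}

function hashIdentifier(type: string, value: string, secret: string): string {
  // Prefix with the type so identical digits under different ID types hash differently.
  return createHmac("sha256", secret).update(`${type}:${normalise(type, value)}`).digest("hex");
}

// New records are hashed with the newest key version.
function hashForStorage(type: string, value: string, keys: KeyRing): StoredHash {
  const keyVersion = Math.max(...Array.from(keys.keys()));
  const secret = keys.get(keyVersion);
  if (secret === undefined) throw new Error("key ring is empty");
  return { keyVersion, hash: hashIdentifier(type, value, secret) };
}

// Lookups hash under every active version, so rows hashed with an older key still match.
function lookupHashes(type: string, value: string, keys: KeyRing): StoredHash[] {
  return Array.from(keys.entries()).map(([keyVersion, secret]) => ({
    keyVersion,
    hash: hashIdentifier(type, value, secret),
  }));
}
```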
Document Presence
Document presence is the discovery layer that enables workspaces to know a document exists for a shared identity - without revealing the document's content. Only existence, type, and expiry are shared.
How Presence Is Created
File linked with KYC or EVIDENCE role to a row
→ Presence trigger fires automatically
→ Identity link found for the row
→ Document presence record created (VALID status)
→ If row is a CSP stakeholder:
→ Cross-workspace entity presence refreshed
Discovery to Access Flow
Overseer workspace sees document presence:
"John D. has a PASSPORT document (expires 2027-03)"
Content NOT revealed, only existence + type + expiry
│
▼
POST /api/files/[fileId]/request-access
│
├─ Routes to appropriate approver
├─ Auto-approve (if within configured thresholds)
└─ OR queued for admin review
│
▼
On approval: access grant created
│
▼
GET /api/files/shared/[fileId]?grantId=...
→ Validates grant → streams file bytes
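The grant check at the streaming endpoint could be sketched as below. It runs on every request, so a revocation takes effect on the next call. The `AccessGrant` field names and `expiresAt` are assumptions. Only the VIEW/DOWNLOAD levels and the ACTIVE/REVOKED/EXPIRED statuses come from this document.

```typescript
type GrantLevel = "VIEW" | "DOWNLOAD";
type GrantStatus = "ACTIVE" | "REVOKED" | "EXPIRED";

interface AccessGrant {
  id: string;
  fileId: string;
  requestingWorkspaceId: string;
  level: GrantLevel;
  status: GrantStatus;
  expiresAt?: string; // hypothetical ISO timestamp
}

type GrantCheck = { ok: true } | { ok: false; reason: string };

// Called on every streaming request; nothing is cached, so revocation is immediate.
function validateGrant(
  grant: AccessGrant | undefined,
  fileId: string,
  workspaceId: string,
  wantsDownload: boolean,
  now: Date,
): GrantCheck {
  if (!grant) return { ok: false, reason: "grant not found" };
  if (grant.fileId !== fileId) return { ok: false, reason: "grant covers a different file" };
  if (grant.requestingWorkspaceId !== workspaceId) return { ok: false, reason: "grant belongs to another workspace" };
  if (grant.status !== "ACTIVE") return { ok: false, reason: `grant is ${grant.status}` };
  if (grant.expiresAt !== undefined && Date.parse(grant.expiresAt) <= now.getTime()) {
    return { ok: false, reason: "grant has expired" };
  }
  if (wantsDownload && grant.level !== "DOWNLOAD") return { ok: false, reason: "grant allows VIEW only" };
  return { ok: true };
}
```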
Presence Statuses
| Status | Meaning |
|---|---|
| VALID | Document exists and is current |
| EXPIRED | Document past its expiry date |
| SUPERSEDED | Replaced by newer version |
| PENDING | Awaiting review |
| REVOKED | Access revoked or file deleted |
CSP Entity & Stakeholder Intelligence
The CSP intelligence layer models the full corporate structure: entities, individuals, ownership, registers, compliance obligations, and cross-workspace presence. Requires the CSP addon to be installed.
Ownership Graph
The ownership graph traverses shareholding chains to compute effective ownership. Each node includes direct percentage, effective percentage (cascaded through layers), path, and circular detection flag. Additional graph queries include family tree traversal, circular ownership detection, stakeholder exposure analysis, shared stakeholder discovery, and conflict of interest detection.
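Below is a sketch of the ownership traversal, assuming shareholdings are available as (holder, company, percent) edges. Effective percentage is the product of the direct percentages along the path. A holder that already appears on the current path is flagged as circular and not expanded further. `Holding`, `ownershipChain` and the depth cap are hypothetical; this in-memory walk only illustrates what the recursive queries compute.

```typescript
interface Holding {
  holderId: string; // shareholder (individual or company)
  companyId: string; // company whose shares are held
  percent: number; // direct shareholding, 0-100
}

interface OwnershipNode {
  holderId: string;
  directPercent: number;
  effectivePercent: number; // direct percentages multiplied along the path
  path: string[]; // target company first, this holder last
  circular: boolean;
}

function ownershipChain(targetId: string, holdings: Holding[], maxDepth = 10): OwnershipNode[] {
  // Index holdings by the company being owned so we can walk upward.
  const ownersOf = new Map<string, Holding[]>();
  for (const h of holdings) {
    const list: Holding[] = ownersOf.get(h.companyId) ?? [];
    list.push(h);
    ownersOf.set(h.companyId, list);
  }

  const nodes: OwnershipNode[] = [];
  // `fraction` is the share of the target held by `current` (1 for the target itself).
  const walk = (current: string, fraction: number, path: string[]): void => {
    if (path.length > maxDepth) return;
    for (const h of ownersOf.get(current) ?? []) {
      const circular = path.indexOf(h.holderId) !== -1;
      const nextPath = path.concat(h.holderId);
      nodes.push({
        holderId: h.holderId,
        directPercent: h.percent,
        effectivePercent: fraction * h.percent,
        path: nextPath,
        circular,
      });
      // Do not expand a holder that already appears on this path.
      if (!circular) walk(h.holderId, (fraction * h.percent) / 100, nextPath);
    }
  };
  walk(targetId, 1, [targetId]);
  return nodes;
}
```

Take a target held 60% by a holding company and 40% by Alice, where the holding company is split 50/50 between Bob and Alice. The walk gives Bob 30% effective. Alice appears twice, at 30% via the holding company and 40% directly.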
UBO Computation
UBO computation aggregates effective percentages per individual across all ownership paths, filtering by threshold (default 25%). Snapshots are materialised as immutable records with ownership path data and flags for comparing computed vs declared UBOs.
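Aggregating per-path shares into UBO candidates and comparing them with declared UBOs might look like this. The sketch assumes the per-path effective percentages have already been computed, for example by an ownership-graph traversal. It treats the 25% default as inclusive, although the correct reading varies by jurisdiction. All names are hypothetical.

```typescript
interface PathShare {
  individualId: string;
  effectivePercent: number; // share of the target held through this one path
  path: string[];
}

interface UboCandidate {
  individualId: string;
  totalPercent: number;
  paths: string[][];
}

function computeUbos(shares: PathShare[], thresholdPercent = 25): UboCandidate[] {
  const totals = new Map<string, UboCandidate>();
  for (const s of shares) {
    const entry: UboCandidate = totals.get(s.individualId) ?? {
      individualId: s.individualId,
      totalPercent: 0,
      paths: [],
    };
    entry.totalPercent += s.effectivePercent;
    entry.paths.push(s.path);
    totals.set(s.individualId, entry);
  }
  // Assumption: the threshold is inclusive ("25% or more").
  return Array.from(totals.values())
    .filter((c) => c.totalPercent >= thresholdPercent)
    .sort((a, b) => b.totalPercent - a.totalPercent);
}

// Flag differences between computed and declared UBOs for review.
function compareWithDeclared(computed: UboCandidate[], declaredIds: string[]) {
  const computedIds = computed.map((c) => c.individualId);
  return {
    undeclared: computedIds.filter((id) => declaredIds.indexOf(id) === -1),
    notComputed: declaredIds.filter((id) => computedIds.indexOf(id) === -1),
  };
}
```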
Temporal Registers
Register entries are immutable and append-only, with one entry for every role change. An entry is never updated, except to set its end date when it is superseded. This supports point-in-time reconstruction for any historical date. Source tracking records each entry's origin: MANUAL, MATTER_STEP, AI_AGENT or IMPORT.
| Register | Tracks |
|---|---|
| DIRECTORS | Director appointments, resignations |
| SHAREHOLDERS | Share allotments, transfers, cancellations |
| SECRETARIES | Secretary appointments |
| UBOS | UBO declarations and changes |
| AUTHORISED_SIGNATORIES | Signatory appointments |
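Point-in-time reconstruction reduces to a date-range filter over the append-only entries. This sketch assumes ISO `YYYY-MM-DD` date strings, so string comparison is chronological. It also treats `endDate` as exclusive. The field names are hypothetical.

```typescript
type RegisterType = "DIRECTORS" | "SHAREHOLDERS" | "SECRETARIES" | "UBOS" | "AUTHORISED_SIGNATORIES";
type EntrySource = "MANUAL" | "MATTER_STEP" | "AI_AGENT" | "IMPORT";

interface RegisterEntry {
  register: RegisterType;
  personId: string;
  startDate: string; // ISO YYYY-MM-DD
  endDate?: string; // set once, when the entry is superseded or ended (exclusive)
  source: EntrySource;
}

// Entries in force on `asOf`: started on or before it and not yet ended.
function registerAsOf(entries: RegisterEntry[], register: RegisterType, asOf: string): RegisterEntry[] {
  return entries.filter(
    (e) =>
      e.register === register &&
      e.startDate <= asOf &&
      (e.endDate === undefined || e.endDate > asOf),
  );
}
```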
Compliance Engine
Obligation rules (per-workspace or global) match entities by jurisdiction and type, computing due dates from trigger types: ANNIVERSARY, CALENDAR_DATE, DOCUMENT_EXPIRY, or EVENT_BASED. Status auto-degrades on schedule.
Status Ladder (auto-degraded on schedule):
UPCOMING → WARNING (warningDate reached)
→ DUE (dueDate reached)
→ OVERDUE (dueDate + gracePeriod passed)
→ COMPLETED / FILED (user action)
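The ladder can be recomputed from dates on each scheduled run, as in this sketch. It assumes the grace period is stored in days and that COMPLETED and FILED are terminal. The field names are hypothetical.

```typescript
type ObligationStatus = "UPCOMING" | "WARNING" | "DUE" | "OVERDUE" | "COMPLETED" | "FILED";

interface Obligation {
  status: ObligationStatus;
  warningDate: Date;
  dueDate: Date;
  gracePeriodDays: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Recompute the scheduled status from dates; user-set terminal statuses are left alone.
function degradeStatus(o: Obligation, now: Date): ObligationStatus {
  if (o.status === "COMPLETED" || o.status === "FILED") return o.status;
  const t = now.getTime();
  const due = o.dueDate.getTime();
  if (t > due + o.gracePeriodDays * DAY_MS) return "OVERDUE";
  if (t >= due) return "DUE";
  if (t >= o.warningDate.getTime()) return "WARNING";
  return "UPCOMING";
}
```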
Entity Compliance Summary (RAG rating):
RED: Any overdue obligations OR expired KYC OR no licence
AMBER: Any warning obligations OR KYC expiring within 90 days OR missing KYC
GREEN: All clear
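Here is a sketch of the RAG rule. It assumes the summary is computed from four inputs: obligation statuses, the expiry dates of KYC documents on file, a count of missing KYC documents and a licence flag. The input shape is hypothetical. The 90-day window and the RED/AMBER conditions come from the rule above.

```typescript
type Rag = "RED" | "AMBER" | "GREEN";

interface ComplianceInputs {
  obligationStatuses: string[]; // current statuses of the entity's obligations
  kycExpiryDates: Date[]; // expiry dates of KYC documents on file
  missingKycCount: number; // required KYC documents not on file
  hasLicence: boolean;
}

const KYC_WARNING_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

function ragRating(c: ComplianceInputs, now: Date): Rag {
  const t = now.getTime();
  const hasStatus = (s: string) => c.obligationStatuses.some((x) => x === s);
  const kycExpired = c.kycExpiryDates.some((d) => d.getTime() < t);
  if (hasStatus("OVERDUE") || kycExpired || !c.hasLicence) return "RED";
  const kycExpiringSoon = c.kycExpiryDates.some((d) => d.getTime() <= t + KYC_WARNING_DAYS * DAY_MS);
  if (hasStatus("WARNING") || kycExpiringSoon || c.missingKycCount > 0) return "AMBER";
  return "GREEN";
}
```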
Event System
Immutable event log driving downstream effects. Each event triggers cascade effects - ownership changes recompute UBO snapshots, compliance changes recompute obligations, and role changes refresh cross-workspace presence.
| Category | Downstream Effect |
|---|---|
| ROLE_CHANGE | Refreshes cross-workspace entity presence |
| OWNERSHIP_CHANGE | Triggers UBO snapshot recomputation |
| COMPLIANCE_CHANGE | Recomputes compliance obligations for entity |
| ENTITY_LIFECYCLE | Fires notifications (incorporation, strike-off, dormancy) |
| DATA_CORRECTION | No downstream dispatch (silent correction) |
Event-driven cascade: Role change -> register entry -> entity event -> UBO recompute -> compliance recheck -> presence refresh. Compliance-critical side effects are tracked with structured failure logging.
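Below is a sketch of category-based dispatch with structured failure capture, assuming each downstream effect is an async function. The handler names are hypothetical; the category-to-effect mapping follows the table. Each effect runs independently, so one failure does not block the others, and failures come back as records that can be logged.

```typescript
type EventCategory =
  | "ROLE_CHANGE"
  | "OWNERSHIP_CHANGE"
  | "COMPLIANCE_CHANGE"
  | "ENTITY_LIFECYCLE"
  | "DATA_CORRECTION";

interface EntityEvent {
  id: string;
  entityId: string;
  category: EventCategory;
}

type Effect = (event: EntityEvent) => Promise<void>;

interface Handlers {
  refreshPresence: Effect;
  recomputeUbo: Effect;
  recomputeCompliance: Effect;
  notify: Effect;
}

// Category -> downstream effects, following the event table.
function buildDispatchTable(h: Handlers): Record<EventCategory, Effect[]> {
  return {
    ROLE_CHANGE: [h.refreshPresence],
    OWNERSHIP_CHANGE: [h.recomputeUbo],
    COMPLIANCE_CHANGE: [h.recomputeCompliance],
    ENTITY_LIFECYCLE: [h.notify],
    DATA_CORRECTION: [], // silent correction: nothing dispatched
  };
}

interface EffectFailure {
  eventId: string;
  category: EventCategory;
  effectIndex: number;
  error: string;
}

// Run every effect for the event concurrently; collect failures instead of throwing.
async function dispatch(event: EntityEvent, table: Record<EventCategory, Effect[]>): Promise<EffectFailure[]> {
  const failures: EffectFailure[] = [];
  await Promise.all(
    table[event.category].map(async (effect, effectIndex) => {
      try {
        await effect(event);
      } catch (err) {
        failures.push({ eventId: event.id, category: event.category, effectIndex, error: String(err) });
      }
    }),
  );
  return failures;
}
```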
Access Control & Sensitivity
Files are classified by sensitivity level, with each level defining minimum role requirements for operations (READ, DOWNLOAD, SHARE, GRANT, DELETE). Cross-workspace access is mediated through grants with full audit logging.
Sensitivity Levels
| Level | Default Access | Typical Content |
|---|---|---|
| PUBLIC | All roles | Published documents |
| INTERNAL | EDITOR+ | Standard business files |
| CONFIDENTIAL | ADMIN+ | KYC, financial, personal data |
| RESTRICTED | OWNER only | Board minutes, legal privilege |
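An operation could be checked against the sensitivity table as shown below. Several pieces are hypothetical: the role ranking, the `VIEWER` role standing in for "All roles", and the example per-operation override (DELETE on INTERNAL requiring ADMIN). Only the default minimum roles come from the table.

```typescript
type Role = "VIEWER" | "EDITOR" | "ADMIN" | "OWNER";
type Sensitivity = "PUBLIC" | "INTERNAL" | "CONFIDENTIAL" | "RESTRICTED";
type Operation = "READ" | "DOWNLOAD" | "SHARE" | "GRANT" | "DELETE";

// Hypothetical ranking; VIEWER stands in for the lowest role covered by "All roles".
const ROLE_RANK: Record<Role, number> = { VIEWER: 0, EDITOR: 1, ADMIN: 2, OWNER: 3 };

// Default minimum role per level, from the sensitivity table.
const DEFAULT_MIN_ROLE: Record<Sensitivity, Role> = {
  PUBLIC: "VIEWER",
  INTERNAL: "EDITOR",
  CONFIDENTIAL: "ADMIN",
  RESTRICTED: "OWNER",
};

// Example per-operation override (hypothetical): deleting INTERNAL files needs ADMIN.
const OPERATION_OVERRIDES: Partial<Record<Sensitivity, Partial<Record<Operation, Role>>>> = {
  INTERNAL: { DELETE: "ADMIN" },
};

function canPerform(role: Role, sensitivity: Sensitivity, op: Operation): boolean {
  const required = OPERATION_OVERRIDES[sensitivity]?.[op] ?? DEFAULT_MIN_ROLE[sensitivity];
  return ROLE_RANK[role] >= ROLE_RANK[required];
}
```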
Cross-Workspace Access Grants
Request → Escalation (if needed) → Approval → Grant → Stream
Access Grant:
File, requesting workspace, custodian workspace
Level: VIEW | DOWNLOAD
Status: ACTIVE | REVOKED | EXPIRED
The grant lifecycle is fully audit-logged, and revoking a grant also updates the corresponding DocumentPresence status.
Retention & Deletion
Retention is configurable per workspace by sensitivity: Standard (180 days), Extended (1 year), Finance (7 years), Legal (25 years) or Indefinite. Soft delete first checks for a LEGAL_HOLD, which blocks deletion. Otherwise it snapshots metadata, marks the file DELETED, expires its presence records, revokes its grants and defers byte cleanup to the retention job.
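The soft-delete sequence and retention tiers might be sketched as follows. Three things are assumed: retention is counted from the deletion date, files carry a boolean `legalHold` flag, and presence records are set to EXPIRED to match the deletion steps. The presence status table lists REVOKED for deleted files, so the exact status is a design choice. All names are hypothetical.

```typescript
type RetentionTier = "STANDARD" | "EXTENDED" | "FINANCE" | "LEGAL" | "INDEFINITE";

// Hypothetical encoding of the tiers; null means keep indefinitely.
const RETENTION: Record<RetentionTier, { days?: number; years?: number } | null> = {
  STANDARD: { days: 180 },
  EXTENDED: { years: 1 },
  FINANCE: { years: 7 },
  LEGAL: { years: 25 },
  INDEFINITE: null,
};

// Date after which the retention job may remove the bytes (assumed to count from deletion).
function purgeDate(deletedAt: Date, tier: RetentionTier): Date | null {
  const rule = RETENTION[tier];
  if (rule === null) return null;
  const d = new Date(deletedAt.getTime());
  if (rule.days !== undefined) d.setUTCDate(d.getUTCDate() + rule.days);
  if (rule.years !== undefined) d.setUTCFullYear(d.getUTCFullYear() + rule.years);
  return d;
}

interface StoredFile { id: string; legalHold: boolean; status: "ACTIVE" | "DELETED"; deletedAt?: string; }
interface PresenceRecord { fileId: string; status: "VALID" | "EXPIRED" | "SUPERSEDED" | "PENDING" | "REVOKED"; }
interface GrantRecord { fileId: string; status: "ACTIVE" | "REVOKED" | "EXPIRED"; }

function softDelete(file: StoredFile, presences: PresenceRecord[], grants: GrantRecord[], now: Date) {
  if (file.legalHold) throw new Error(`file ${file.id} is under LEGAL_HOLD`);
  const metadataSnapshot = JSON.stringify(file); // captured before any mutation
  file.status = "DELETED";
  file.deletedAt = now.toISOString();
  let presencesExpired = 0;
  for (const p of presences) {
    if (p.fileId === file.id && (p.status === "VALID" || p.status === "PENDING")) {
      p.status = "EXPIRED";
      presencesExpired++;
    }
  }
  let grantsRevoked = 0;
  for (const g of grants) {
    if (g.fileId === file.id && g.status === "ACTIVE") {
      g.status = "REVOKED";
      grantsRevoked++;
    }
  }
  // File bytes are intentionally left in storage; the retention job removes them later.
  return { metadataSnapshot, presencesExpired, grantsRevoked };
}
```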
Human-in-the-Loop Escalation
Escalation handling covers operations exceeding auto-approve thresholds: file access requests from other workspaces, identity review queue for low-confidence matches, and cross-jurisdiction transfer approvals. Status flow: OPEN -> ASSIGNED -> RESOLVED / DISMISSED.
Self-Knowledge
The system can search and RAG-embed its own documentation. This enables the AI assistant to answer questions about the platform's own architecture and capabilities by querying internal documentation at runtime.
| Interface | Type | Description |
|---|---|---|
| search_system_docs | AI Tool | Semantic search across CLAUDE-*.md documentation files |
| embed_system_docs | AI Tool | Trigger RAG embedding of system documentation |
| /api/docs/search | REST | Search system documentation via API |
| /api/docs/embed | REST | Trigger embedding of system docs via API |
How it works: System documentation (CLAUDE.md, CLAUDE-intelligence.md, etc.) is chunked and embedded via the standard RAG pipeline. The AI assistant uses search_system_docs to retrieve relevant sections when answering questions about platform capabilities, architecture, or configuration.
Architectural Principles
| # | Principle | Description |
|---|---|---|
| 1 | Asynchronous processing | Propagation, presence, events, folders - all non-blocking side effects |
| 2 | Immutable audit trail | Register entries, entity events, deletion certificates - nothing deleted |
| 3 | Presence before access | Discovery via document presence, access via explicit grants |
| 4 | Workspace isolation | All queries scoped to workspace; oversight enables controlled cross-workspace visibility |
| 5 | Role classification by context | Folder path determines file role, automatic and consistent |
| 6 | Temporal reconstruction | Registers reconstruct state at any historical date |
| 7 | Event-driven cascade | Role change -> register -> event -> UBO recompute -> compliance -> presence |
| 8 | Cost-aware operations | RAG embeddings budget-gated, extraction timeouts enforced |
| 9 | Versioned security | Key rotation without downtime, legal holds block deletion |
| 10 | Graph-based computation | Recursive queries for ownership chains, family trees, circular detection |
Security Considerations
The intelligence pipeline has several architectural properties that carry real security or operational consequences. These are documented honestly - not as solved problems, but as structural risks with mitigations in place.
| Area | Status | Description |
|---|---|---|
| Injection scanning | Active | Tool results are scanned for injection patterns. Detected injections are sanitised before processing. |
| Cell locking | Enforced | API and AI updates respect locked cells. Locked cells cannot be modified through standard interfaces. |
| SQL executor | Strong | Multi-layer defence: read-only, workspace-scoped, query timeout, row limits. Requires ADMIN role. |
| Identity hashing | Secured | Secure hashing with versioned key rotation for identity fields. |
| Async processing | Monitored | Compliance-critical operations are tracked with structured failure logging. |
| Autonomy enforcement | Aligned | Configurable autonomy levels. API key creation requires ADMIN/OWNER role. |
| Document extraction | Sandboxed | Extraction runs with timeouts and resource limits. |
| Extraction overflow | Managed | Tables with many extraction columns queue overflow items for admin review. |
| AI endpoint access | Restricted | API access tool is read-only (GET) with sensitive endpoints blocked. |
| Cost controls | Active | Budget tracking with drift detection and spending limits. |
| Grant revocation | Immediate | Grant validation happens at request time with no caching. |
Full documentation: See internal architecture notes (§26) for detailed analysis of each risk with code references, mitigations, and residual risk assessments.