Intelligence Architecture

The intelligence layer is a multi-system pipeline that transforms raw documents into structured, cross-workspace knowledge. It spans document extraction, file propagation, identity resolution, ownership graph computation, compliance monitoring, and human-in-the-loop oversight - all wired together through event-driven side effects.

Architecture Flow

The following diagram shows how a document upload cascades through the intelligence pipeline - from file registration and extraction through to cross-workspace presence and CSP intelligence side effects.

                           ┌─────────────────────────────────────────────┐
                           │              DOCUMENT UPLOAD                │
                           │      (matter, submission, direct)           │
                           └──────────────────┬──────────────────────────┘
                                              │
                    ┌─────────────────────────┼─────────────────────────┐
                    ▼                         ▼                         ▼
            ┌──────────────┐       ┌────────────────────┐     ┌──────────────────┐
            │  File        │       │    Extraction      │     │  Embedding       │
            │  Registry    │       │  Engine            │     │  Engine          │
            │              │       │  → dynamic cols    │     │  → vector chunks │
            └──────┬───────┘       └────────────────────┘     └──────────────────┘
                   │
        ┌──────────┼───────────────────────┐
        ▼          ▼                       ▼
  ┌───────────┐  ┌─────────────────┐  ┌──────────────────────┐
  │ Propagate │  │ Role Classify   │  │ Presence Trigger     │
  │ to entity │  │ from folder     │  │ (KYC/EVIDENCE roles) │
  │ rows      │  │ path            │  │                      │
  └─────┬─────┘  └────────┬────────┘  └──────────┬───────────┘
        │                 │                       │
        ▼                 ▼                       ▼
  ┌──────────────────────────────┐    ┌─────────────────────────────┐
  │  Files linked to company row │    │  Identity Resolution        │
  │  Files linked to stakeholder │    │  (secure hash → identity    │
  │  (with classified role)      │    │   → document presence)      │
  └──────────────────────────────┘    └──────────┬──────────────────┘
                                                 │
                                                 ▼
                                   ┌──────────────────────────┐
                                   │  Cross-Workspace         │
                                   │  Discovery & Access      │
                                   │  (presence → grant →     │
                                   │   stream)                │
                                   └──────────────────────────┘

  Meanwhile, on the CSP intelligence side:

  ┌──────────────┐    ┌────────────────┐    ┌─────────────────────┐
  │ CspRole      │───▶│ Register Entry │───▶│ Entity Event        │
  │ changes      │    │ (immutable)    │    │ (OWNERSHIP_CHANGE)  │
  └──────────────┘    └────────────────┘    └──────────┬──────────┘
                                                       │
                              ┌─────────────────────────┼──────────────────┐
                              ▼                         ▼                  ▼
                    ┌──────────────────┐    ┌────────────────────┐  ┌────────────────┐
                    │ UBO Recompute    │    │ Compliance Recheck │  │ Presence       │
                    │ (ownership chain │    │ (obligation status │  │ Advertise      │
                    │  → snapshot)     │    │  degradation)      │  │ (cross-WS)     │
                    └──────────────────┘    └────────────────────┘  └────────────────┘

Document Extraction Pipeline

Raw documents are processed through the extraction engine to produce typed extraction results that become dynamic table columns. Each extraction goes through quality scoring to determine usability.

Upload → Extraction Job (PENDING)
  → Document extraction engine
  → Extraction results with typed fields
  → Dynamic table columns created (capped at 50/table)
  → Overflow items queued for manual review
  → Quality score computed (0-1)

Key Details

Feature	Description
Column dedup	Extracted fields that match an existing column reuse it instead of creating a duplicate
Type inference	DATE, NUMBER, TEXT inferred from sample values
Quality scoring	Structure score (0-1) measures parse quality; threshold determines usability
Overflow	Tables with 50+ extraction columns queue new extractions for manual review
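The column rules above can be sketched as a small planning step. This is a minimal sketch, not the engine's implementation: only the 50-column cap and the DATE/NUMBER/TEXT types come from this section, while the name normalisation used for dedup, the inference patterns, and every identifier (planColumns, inferType, ExtractedField) are assumptions.

```typescript
// Hypothetical sketch: turning extracted fields into dynamic table columns.
// Only MAX_COLUMNS (the 50-per-table cap) and the three types come from the docs;
// the normalisation and inference rules are illustrative assumptions.
type ColumnType = "DATE" | "NUMBER" | "TEXT";

interface ExtractedField {
  name: string;
  sampleValues: string[];
}

interface ColumnPlan {
  created: { key: string; type: ColumnType }[];
  reused: string[]; // existing columns matched by dedup
  overflow: ExtractedField[]; // fields to queue for manual review
}

const MAX_COLUMNS = 50;

// Normalise names so "Company Name" and "company_name" dedupe to one column.
function columnKey(name: string): string {
  return name.trim().toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
}

// Pick NUMBER or DATE only when every non-empty sample agrees; otherwise TEXT.
function inferType(samples: string[]): ColumnType {
  const values = samples.map((s) => s.trim()).filter((s) => s.length > 0);
  if (values.length === 0) return "TEXT";
  if (values.every((v) => /^-?\d+(\.\d+)?$/.test(v.replace(/,/g, "")))) return "NUMBER";
  if (values.every((v) => /^\d{4}-\d{2}-\d{2}$/.test(v) && !Number.isNaN(Date.parse(v)))) return "DATE";
  return "TEXT";
}

// existingKeys are assumed to be already-normalised column keys.
function planColumns(existingKeys: string[], fields: ExtractedField[]): ColumnPlan {
  const keys = new Set<string>(existingKeys);
  const plan: ColumnPlan = { created: [], reused: [], overflow: [] };
  for (const field of fields) {
    const key = columnKey(field.name);
    if (keys.has(key)) {
      plan.reused.push(key); // dedup: an existing column already holds this field
    } else if (keys.size >= MAX_COLUMNS) {
      plan.overflow.push(field); // cap reached: route to manual review
    } else {
      keys.add(key);
      plan.created.push({ key, type: inferType(field.sampleValues) });
    }
  }
  return plan;
}
```

Because the cap is checked before each new column, a batch that straddles the limit is split: the first new fields fill the remaining slots and the rest overflow to review.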

File Registry & Smart Folders

Every file in the system gets an index entry. Link edges connect a file to any number of entities, so the same file can appear in multiple views at once.

File Record

Metadata index for each file. Tracks filename, MIME type, size, checksum, source type (UPLOAD, GENERATED, IMPORTED), sensitivity level (PUBLIC to RESTRICTED), version chain, and virtual folder path for tree navigation.

File Link

Connects a file to any entity. Supports entity types (ROW, MATTER, MATTER_STEP, SUBMISSION, TEMPLATE, DOCUMENT, GENERATED_DOC) and roles (ATTACHMENT, OUTPUT, SOURCE, KYC, EVIDENCE, SIGNATURE, COVER).

Smart Folder Types

Folder	Source
Matters	Files with entityType = MATTER or MATTER_STEP
Individuals	Files with entityType = ROW on Individuals system table
Companies	Files with entityType = ROW on Companies system table
Recent	Files created in the last 7 days
Starred	User bookmarks
Unlinked	Files with no entity links
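Smart folders behave like computed views rather than stored folders, so membership can be derived from a file's links. A minimal sketch, assuming hypothetical field names (systemTable, createdAt, starredBy) that are not part of the documented model:

```typescript
// Hypothetical sketch: smart folders as computed views over a file's links.
type EntityType = "ROW" | "MATTER" | "MATTER_STEP" | "SUBMISSION" | "TEMPLATE" | "DOCUMENT" | "GENERATED_DOC";
type SmartFolder = "Matters" | "Individuals" | "Companies" | "Recent" | "Starred" | "Unlinked";

interface FileLink {
  entityType: EntityType;
  systemTable?: "INDIVIDUALS" | "COMPANIES"; // assumed: set only on ROW links to system tables
}

interface IndexedFile {
  id: string;
  createdAt: Date;
  starredBy: Set<string>; // user ids with a bookmark
  links: FileLink[];
}

const RECENT_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // "last 7 days"

function smartFoldersFor(file: IndexedFile, userId: string, now: Date): SmartFolder[] {
  const hasLink = (pred: (l: FileLink) => boolean): boolean => file.links.some(pred);
  const folders: SmartFolder[] = [];
  if (hasLink((l) => l.entityType === "MATTER" || l.entityType === "MATTER_STEP")) folders.push("Matters");
  if (hasLink((l) => l.entityType === "ROW" && l.systemTable === "INDIVIDUALS")) folders.push("Individuals");
  if (hasLink((l) => l.entityType === "ROW" && l.systemTable === "COMPANIES")) folders.push("Companies");
  if (now.getTime() - file.createdAt.getTime() <= RECENT_WINDOW_MS) folders.push("Recent");
  if (file.starredBy.has(userId)) folders.push("Starred");
  if (file.links.length === 0) folders.push("Unlinked");
  return folders;
}
```

A single file can land in several folders at once, which is the multi-view behaviour the File Registry relies on.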

Entity File Propagation

The propagation system automatically creates file links from matter files to the companies and stakeholders the matter serves. This is the core mechanism behind the company-centric file model: a file uploaded to a matter also appears under each company and stakeholder that matter serves.

Entity Link Types

Link Type	Meaning	Resolves To
PRIMARY_ENTITY	The main company this matter serves	Company row
STAKEHOLDER	An individual involved in the matter	Stakeholder row
PARENT_ENTITY	Parent company in hierarchy	Company row
SUBSIDIARY	Child entity	Company row
RELATED_PARTY	Other related entity	Company row

Propagation Flow

File uploaded to matter
  → File record created with matter link
  → Automatic propagation to linked entities
      │
      ├─ Load entity links for this matter
      ├─ For each PRIMARY_ENTITY / PARENT_ENTITY / SUBSIDIARY:
      │   └─ Link file to company row with classified role
      ├─ For each STAKEHOLDER:
      │   └─ Link file to stakeholder row with classified role
      └─ File role determined from folder path
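The propagation flow can be sketched as a pure function that turns a matter's entity links into new file links. This is a minimal sketch under assumptions: the data shapes, the injected `classifyRole` callback, and skipping repeat rows are illustrative, and RELATED_PARTY is omitted only because the flow above does not list it.

```typescript
// Hypothetical sketch of matter-to-entity file propagation.
type EntityLinkType = "PRIMARY_ENTITY" | "STAKEHOLDER" | "PARENT_ENTITY" | "SUBSIDIARY" | "RELATED_PARTY";

interface MatterEntityLink {
  linkType: EntityLinkType;
  rowId: string; // company row or stakeholder row this link resolves to
}

interface NewFileLink {
  fileId: string;
  rowId: string;
  role: string;
}

// Link types the flow propagates to: three company types plus STAKEHOLDER.
const PROPAGATED = new Set<EntityLinkType>(["PRIMARY_ENTITY", "PARENT_ENTITY", "SUBSIDIARY", "STAKEHOLDER"]);

function propagate(
  fileId: string,
  folderPath: string,
  matterLinks: MatterEntityLink[],
  classifyRole: (path: string) => string,
): NewFileLink[] {
  const role = classifyRole(folderPath); // one role per file, derived from its folder path
  const seen = new Set<string>();
  const out: NewFileLink[] = [];
  for (const link of matterLinks) {
    if (!PROPAGATED.has(link.linkType) || seen.has(link.rowId)) continue; // skip unlisted types and repeat rows
    seen.add(link.rowId);
    out.push({ fileId, rowId: link.rowId, role });
  }
  return out;
}
```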

Role Classification from Folder Path

Folder Pattern	Role	Meaning
`/KYC/, /Compliance/, /PINCA`	KYC	Know Your Customer documents
`/Filings/, /Share Certificates/, /Registers/`	EVIDENCE	Regulatory evidence
`/Constitutional Docs/, /Resolutions & Minutes/`	EVIDENCE	Corporate governance evidence
`/Signed/`	SIGNATURE	Executed documents
`/Welcome Pack/`	OUTPUT	Generated output documents
Everything else	ATTACHMENT	General attachments
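The folder-path table maps naturally onto an ordered rule list. A minimal sketch, assuming case-sensitive substring matching and first-match-wins ordering (neither is specified above); the `/PINCA` pattern is kept exactly as written, so it matches any path segment that starts with it.

```typescript
// Hypothetical sketch of folder-path role classification.
type FileRole = "KYC" | "EVIDENCE" | "SIGNATURE" | "OUTPUT" | "ATTACHMENT";

// Checked in table order; the first rule with a matching pattern wins (assumed).
const ROLE_RULES: ReadonlyArray<[string[], FileRole]> = [
  [["/KYC/", "/Compliance/", "/PINCA"], "KYC"],
  [["/Filings/", "/Share Certificates/", "/Registers/"], "EVIDENCE"],
  [["/Constitutional Docs/", "/Resolutions & Minutes/"], "EVIDENCE"],
  [["/Signed/"], "SIGNATURE"],
  [["/Welcome Pack/"], "OUTPUT"],
];

function classifyRole(folderPath: string): FileRole {
  // Add a trailing slash so "/Acme/KYC" still matches the "/KYC/" pattern.
  const path = folderPath.endsWith("/") ? folderPath : `${folderPath}/`;
  for (const [patterns, role] of ROLE_RULES) {
    if (patterns.some((p) => path.indexOf(p) !== -1)) return role; // case-sensitive substring match
  }
  return "ATTACHMENT";
}
```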

Deep File Aggregation Queries

Query	Aggregates From
Entity deep files	Direct company files + all matter files + files of all stakeholders holding roles in the entity
Individual deep files	Direct stakeholder files + all matter files + bridged KYC documents
Stakeholder file timeline	Direct uploads + entity files + matter files + KYC docs (sorted chronologically, deduplicated)
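For the stakeholder timeline, the merge, deduplication, and chronological sort can be sketched as follows. The item shape and the rule that the first occurrence of a file wins are assumptions.

```typescript
// Hypothetical sketch: merge timeline sources, dedupe by file id, sort oldest first.
type TimelineSource = "DIRECT" | "ENTITY" | "MATTER" | "KYC";

interface TimelineItem {
  fileId: string;
  createdAt: Date;
  source: TimelineSource;
}

// Sources are passed in priority order; the first occurrence of a file wins.
function buildTimeline(sources: TimelineItem[][]): TimelineItem[] {
  const byFile = new Map<string, TimelineItem>();
  for (const list of sources) {
    for (const item of list) {
      if (!byFile.has(item.fileId)) byFile.set(item.fileId, item);
    }
  }
  return Array.from(byFile.values()).sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
}
```

Deduplicating by file id matters here because propagation deliberately links one file to several entities, so the same file often arrives from more than one source.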

Identity Resolution

Identity resolution maps workspace-specific table rows to workspace-agnostic identity records using securely hashed identifiers. No plaintext identity data is ever stored in the global layer.

Three-Layer Architecture

Layer 1: Global Identity
  │  Workspace-agnostic person or company
  │  Types: INDIVIDUAL, ENTITY
  │
  ├── Layer 2: Hashed Identifiers
  │     Securely hashed identity fields
  │     Types: EMAIL, PASSPORT_NUMBER, EMIRATES_ID,
  │            COMPANY_REG_NUMBER, TAX_ID, NATIONAL_ID
  │     Supports versioned key rotation
  │
  └── Layer 3: Identity Links
        Maps workspace row → Global Identity
        Includes confidence score (0-1), disputed flag, expiry
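The three layers can be expressed as plain record types. This sketch is illustrative: the field names are assumptions, and so is the rule that only undisputed, unexpired links resolve a row to its global identity.

```typescript
// Hypothetical record types for the three identity layers.
type IdentityType = "INDIVIDUAL" | "ENTITY";
type IdentifierType = "EMAIL" | "PASSPORT_NUMBER" | "EMIRATES_ID" | "COMPANY_REG_NUMBER" | "TAX_ID" | "NATIONAL_ID";

// Layer 1: workspace-agnostic, holds no plaintext identity data.
interface GlobalIdentity {
  id: string;
  type: IdentityType;
}

// Layer 2: one keyed hash per identifier, tagged with the key version used.
interface HashedIdentifier {
  identityId: string;
  type: IdentifierType;
  digest: string;
  keyVersion: number;
}

// Layer 3: maps a workspace row to a global identity.
interface IdentityLink {
  workspaceId: string;
  rowId: string;
  identityId: string;
  confidence: number; // 0-1
  disputed: boolean;
  expiresAt?: Date;
}

// Assumed rule: ignore disputed or expired links, then prefer the highest confidence.
function resolveRow(links: IdentityLink[], workspaceId: string, rowId: string, now: Date): string | undefined {
  const usable = links.filter(
    (l) =>
      l.workspaceId === workspaceId &&
      l.rowId === rowId &&
      !l.disputed &&
      (l.expiresAt === undefined || l.expiresAt.getTime() > now.getTime()),
  );
  usable.sort((a, b) => b.confidence - a.confidence);
  return usable.length > 0 ? usable[0].identityId : undefined;
}
```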

Resolution Flow

Row created/updated in Individuals or Companies table
  → Identity resolution triggered automatically
  → Check: is identity resolution enabled for this workspace?
     (requires active oversight relationship)
  → Extract identity fields by table type:
     INDIVIDUALS: Email, Passport Number, Emirates ID
     COMPANIES: Company Number, Tax ID
  → Normalize and securely hash each field
  → Search for matching identifiers
  → Compute confidence score
  → Decision:
     High confidence: Auto-link to existing identity
     Low confidence: Queue for human verification
     No match: Create new identity record
  → Resolution logged for audit trail

Confidence Scoring

Match Criteria	Score	Action
1 email match	0.3	Queued for human verification
1 government ID match	0.5	Queued for human verification
2 field matches	0.7	Auto-linked to identity
3+ field matches	0.9	Auto-linked to identity
All government IDs match	1.0	Auto-linked to identity
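The confidence table can be sketched as a scoring function plus a decision step. The table leaves some cases open, so this sketch makes assumptions: the auto-link threshold is taken as 0.7 (the lowest auto-linked score shown), the 1.0 rule requires the candidate to hold at least two government IDs, and COMPANY_REG_NUMBER and TAX_ID count as government IDs.

```typescript
// Hypothetical sketch of the confidence table and resolution decision.
type IdentifierType = "EMAIL" | "PASSPORT_NUMBER" | "EMIRATES_ID" | "COMPANY_REG_NUMBER" | "TAX_ID" | "NATIONAL_ID";
type Decision = "AUTO_LINK" | "QUEUE_FOR_REVIEW" | "CREATE_NEW";

// Assumption: every identifier type except EMAIL counts as a government ID.
const GOVERNMENT_IDS = new Set<IdentifierType>(["PASSPORT_NUMBER", "EMIRATES_ID", "COMPANY_REG_NUMBER", "TAX_ID", "NATIONAL_ID"]);
const AUTO_LINK_THRESHOLD = 0.7; // assumed: lowest auto-linked score in the table

// matched: identifier types whose hashes matched the candidate identity.
// candidateGovIds: government ID types the candidate identity holds.
function confidence(matched: IdentifierType[], candidateGovIds: IdentifierType[]): number {
  const unique = Array.from(new Set<IdentifierType>(matched));
  if (unique.length === 0) return 0;
  const govMatched = new Set<IdentifierType>(unique.filter((t) => GOVERNMENT_IDS.has(t)));
  // "All government IDs match": assumed to need at least two IDs on the candidate.
  if (candidateGovIds.length >= 2 && candidateGovIds.every((t) => govMatched.has(t))) return 1.0;
  if (unique.length >= 3) return 0.9;
  if (unique.length === 2) return 0.7;
  return govMatched.size === 1 ? 0.5 : 0.3; // single government ID vs single email
}

function decide(score: number): Decision {
  if (score === 0) return "CREATE_NEW"; // no identifier matched
  return score >= AUTO_LINK_THRESHOLD ? "AUTO_LINK" : "QUEUE_FOR_REVIEW";
}
```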

Key Rotation: Versioned keys support zero-downtime secret rotation, with automatic verification against any active key version.
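Layer 2 and the key-rotation note imply keyed hashing with versioned secrets. The sketch below assumes HMAC-SHA256 from Node's `node:crypto` and simple normalisation rules; the document only says "secure hashing", so the algorithm and the helper names (KeyRing, hashIdentifier, matchesAnyVersion) are assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch: keyed identifier hashing with versioned secrets.
// HMAC-SHA256 and the normalisation rules are assumptions.
interface KeyRing {
  current: number; // version used for new hashes
  keys: Map<number, string>; // every active version -> secret
}

function normalise(type: string, value: string): string {
  const trimmed = value.trim();
  // Emails compare case-insensitively; document numbers ignore spaces and dashes.
  return type === "EMAIL" ? trimmed.toLowerCase() : trimmed.toUpperCase().replace(/[\s-]/g, "");
}

function digestWith(secret: string, type: string, value: string): Buffer {
  // Prefix the type so identical strings of different types hash differently.
  return createHmac("sha256", secret).update(`${type}:${normalise(type, value)}`).digest();
}

function hashIdentifier(ring: KeyRing, type: string, value: string): { version: number; digest: string } {
  const secret = ring.keys.get(ring.current);
  if (secret === undefined) throw new Error(`no secret for key version ${ring.current}`);
  return { version: ring.current, digest: digestWith(secret, type, value).toString("hex") };
}

// Check every active version so hashes written before a rotation still match.
function matchesAnyVersion(ring: KeyRing, type: string, value: string, storedHex: string): boolean {
  const stored = Buffer.from(storedHex, "hex");
  return Array.from(ring.keys.values()).some((secret) => {
    const candidate = digestWith(secret, type, value);
    return candidate.length === stored.length && timingSafeEqual(candidate, stored);
  });
}
```

After a rotation, new rows are hashed with the current version while older digests still verify; a background job could re-hash them before the retired key is dropped from the ring.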

Document Presence

Document presence is the discovery layer: it lets workspaces learn that a document exists for a shared identity without revealing the document's content. Only existence, type, and expiry are shared.

How Presence Is Created

File linked with KYC or EVIDENCE role to a row
  → Presence trigger fires automatically
  → Identity link found for the row
  → Document presence record created (VALID status)
  → If row is a CSP stakeholder:
     → Cross-workspace entity presence refreshed
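The presence trigger can be sketched as a handler that runs when a file link is created. The dependency interface (`findIdentityId`, `isCspStakeholder`, `refreshEntityPresence`) and the record shape are hypothetical stand-ins for whatever the platform uses.

```typescript
// Hypothetical sketch of the presence trigger fired when a file is linked to a row.
type FileRole = "ATTACHMENT" | "OUTPUT" | "SOURCE" | "KYC" | "EVIDENCE" | "SIGNATURE" | "COVER";

interface PresenceRecord {
  identityId: string;
  fileId: string;
  documentType: string; // e.g. "PASSPORT"; content is never copied
  expiresAt?: Date;
  status: "VALID";
}

interface PresenceDeps {
  findIdentityId(rowId: string): string | undefined;
  isCspStakeholder(rowId: string): boolean;
  refreshEntityPresence(rowId: string): void;
}

function onFileLinked(
  deps: PresenceDeps,
  link: { rowId: string; fileId: string; role: FileRole },
  documentType: string,
  expiresAt?: Date,
): PresenceRecord | null {
  if (link.role !== "KYC" && link.role !== "EVIDENCE") return null; // only these roles advertise presence
  const identityId = deps.findIdentityId(link.rowId);
  if (identityId === undefined) return null; // row has no identity link yet
  const record: PresenceRecord = { identityId, fileId: link.fileId, documentType, expiresAt, status: "VALID" };
  if (deps.isCspStakeholder(link.rowId)) deps.refreshEntityPresence(link.rowId);
  return record;
}
```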

Discovery to Access Flow

Overseer workspace sees document presence:
  "John D. has a PASSPORT document (expires 2027-03)"
  Content NOT revealed, only existence + type + expiry
      │
      ▼
  POST /api/files/[fileId]/request-access
      │
      ├─ Routes to appropriate approver
      ├─ Auto-approve (if within configured thresholds)
      └─ OR queued for admin review
      │
      ▼
  On approval: access grant created
      │
      ▼
  GET /api/files/shared/[fileId]?grantId=...
  → Validates grant → streams file bytes
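The routing step depends on "configured thresholds", which this section does not define. The sketch below assumes a hypothetical policy with a maximum auto-approvable sensitivity and a flag allowing auto-approved downloads; anything outside it is queued for admin review.

```typescript
// Hypothetical sketch of access-request routing against auto-approve thresholds.
type Sensitivity = "PUBLIC" | "INTERNAL" | "CONFIDENTIAL" | "RESTRICTED";
type AccessLevel = "VIEW" | "DOWNLOAD";

const SENSITIVITY_RANK: Record<Sensitivity, number> = { PUBLIC: 0, INTERNAL: 1, CONFIDENTIAL: 2, RESTRICTED: 3 };

// Assumed shape for the "configured thresholds".
interface AutoApprovePolicy {
  maxSensitivity: Sensitivity;
  allowDownload: boolean;
}

interface AccessRequest {
  fileId: string;
  requestingWorkspaceId: string;
  sensitivity: Sensitivity;
  level: AccessLevel;
}

type Routing = { outcome: "AUTO_APPROVED" } | { outcome: "QUEUED_FOR_REVIEW"; reason: string };

function routeAccessRequest(req: AccessRequest, policy: AutoApprovePolicy): Routing {
  if (SENSITIVITY_RANK[req.sensitivity] > SENSITIVITY_RANK[policy.maxSensitivity]) {
    return { outcome: "QUEUED_FOR_REVIEW", reason: "sensitivity above auto-approve threshold" };
  }
  if (req.level === "DOWNLOAD" && !policy.allowDownload) {
    return { outcome: "QUEUED_FOR_REVIEW", reason: "downloads are not auto-approved" };
  }
  return { outcome: "AUTO_APPROVED" }; // caller then creates the access grant
}
```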

Presence Statuses

Status	Meaning
VALID	Document exists and is current
EXPIRED	Document past its expiry date
SUPERSEDED	Replaced by newer version
PENDING	Awaiting review
REVOKED	Access revoked or file deleted
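Of these statuses, EXPIRED is the one that can be derived from data alone. A minimal sketch, assuming only VALID records age into EXPIRED once the expiry date has passed and every other status is set explicitly:

```typescript
// Hypothetical sketch: effective presence status at read time.
type PresenceStatus = "VALID" | "EXPIRED" | "SUPERSEDED" | "PENDING" | "REVOKED";

// Assumption: only VALID records age into EXPIRED; the rest are set explicitly.
function effectivePresenceStatus(stored: PresenceStatus, expiresAt: Date | null, now: Date): PresenceStatus {
  if (stored === "VALID" && expiresAt !== null && expiresAt.getTime() < now.getTime()) return "EXPIRED";
  return stored;
}
```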

CSP Entity & Stakeholder Intelligence

The CSP intelligence layer models the full corporate structure: entities, individuals, ownership, registers, compliance obligations, and cross-workspace presence. Requires the CSP addon to be installed.

Ownership Graph

The ownership graph traverses shareholding chains to compute effective ownership. Each node includes its direct percentage, its effective percentage (cascaded through intermediate layers), its ownership path, and a flag marking circular ownership. Additional graph queries cover family tree traversal, circular ownership detection, stakeholder exposure analysis, shared stakeholder discovery, and conflict of interest detection.
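Effective ownership multiplies percentages along each path. The platform uses recursive queries for this; the in-memory depth-first walk below only illustrates the arithmetic and the cycle check. It assumes holdings are given as edges (holder owns a percentage of a company) and a depth cap of 10, both illustrative.

```typescript
// Hypothetical sketch: walk shareholding edges upward from a company,
// multiplying percentages along each path and flagging cycles.
interface Holding {
  holderId: string; // shareholder (individual or company)
  companyId: string; // company whose shares are held
  percent: number; // 0-100
}

interface OwnershipNode {
  holderId: string;
  directPercent: number;
  effectivePercent: number;
  path: string[]; // target company first, this holder last
  circular: boolean;
}

function ownershipChain(targetId: string, holdings: Holding[], maxDepth = 10): OwnershipNode[] {
  const holdersOf = new Map<string, Holding[]>();
  for (const h of holdings) {
    const list: Holding[] = holdersOf.get(h.companyId) ?? [];
    list.push(h);
    holdersOf.set(h.companyId, list);
  }
  const nodes: OwnershipNode[] = [];
  const walk = (companyId: string, factor: number, path: string[]): void => {
    if (path.length > maxDepth) return; // guard against very deep chains
    for (const h of holdersOf.get(companyId) ?? []) {
      const effective = factor * (h.percent / 100);
      const circular = path.indexOf(h.holderId) !== -1; // holder already on this path
      const nodePath = path.concat(h.holderId);
      nodes.push({ holderId: h.holderId, directPercent: h.percent, effectivePercent: effective * 100, path: nodePath, circular });
      if (!circular) walk(h.holderId, effective, nodePath);
    }
  };
  walk(targetId, 1, [targetId]);
  return nodes;
}
```

For example, if HoldCo owns 60% of Target and Alice owns 50% of HoldCo, Alice's effective stake in Target is 0.6 × 0.5 = 30%.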

UBO Computation

UBO computation aggregates effective percentages per individual across all ownership paths, filtering by threshold (default 25%). Snapshots are materialised as immutable records with ownership path data and flags for comparing computed vs declared UBOs.
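The UBO step can be sketched as a sum over precomputed ownership paths. This sketch assumes the threshold is inclusive (25% or more qualifies) and represents the computed-vs-declared comparison with hypothetical field names.

```typescript
// Hypothetical sketch of UBO aggregation over precomputed ownership paths.
interface PathShare {
  individualId: string;
  effectivePercent: number; // 0-100, along one path
  path: string[];
}

interface UboCandidate {
  individualId: string;
  totalPercent: number;
  paths: string[][];
}

function computeUbos(shares: PathShare[], thresholdPercent = 25): UboCandidate[] {
  const byPerson = new Map<string, UboCandidate>();
  for (const s of shares) {
    const entry: UboCandidate = byPerson.get(s.individualId) ?? { individualId: s.individualId, totalPercent: 0, paths: [] };
    entry.totalPercent += s.effectivePercent; // indirect holdings add up across paths
    entry.paths.push(s.path);
    byPerson.set(s.individualId, entry);
  }
  return Array.from(byPerson.values())
    .filter((u) => u.totalPercent >= thresholdPercent) // assumed inclusive threshold
    .sort((a, b) => b.totalPercent - a.totalPercent);
}

// Flags for reconciling computed UBOs with the declared register.
function compareWithDeclared(
  computed: UboCandidate[],
  declaredIds: string[],
): { undeclared: string[]; notSupportedByGraph: string[] } {
  const computedIds = new Set<string>(computed.map((u) => u.individualId));
  const declared = new Set<string>(declaredIds);
  return {
    undeclared: computed.map((u) => u.individualId).filter((id) => !declared.has(id)),
    notSupportedByGraph: declaredIds.filter((id) => !computedIds.has(id)),
  };
}
```

Summing across paths is what catches an individual who stays below the threshold on every single path but crosses it in aggregate.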

Temporal Registers

Immutable, append-only register entries record all role changes. Entries are never updated, except that an entry's end date is set when it is superseded. This supports point-in-time reconstruction for any historical date. Source tracking records each entry's origin: MANUAL, MATTER_STEP, AI_AGENT, or IMPORT.

Register	Tracks
DIRECTORS	Director appointments, resignations
SHAREHOLDERS	Share allotments, transfers, cancellations
SECRETARIES	Secretary appointments
UBOS	UBO declarations and changes
AUTHORISED_SIGNATORIES	Signatory appointments
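Point-in-time reconstruction reduces to a filter over append-only entries. A minimal sketch, assuming end dates are exclusive (an entry ending on a date is no longer in force on that date):

```typescript
// Hypothetical sketch: reconstructing a register as of a past date.
type RegisterType = "DIRECTORS" | "SHAREHOLDERS" | "SECRETARIES" | "UBOS" | "AUTHORISED_SIGNATORIES";
type EntrySource = "MANUAL" | "MATTER_STEP" | "AI_AGENT" | "IMPORT";

interface RegisterEntry {
  register: RegisterType;
  personId: string;
  startDate: Date;
  endDate?: Date; // set once, when the entry is superseded
  source: EntrySource;
}

// In force on asOf: started on or before it and not yet ended (end date exclusive).
function registerAsOf(entries: RegisterEntry[], register: RegisterType, asOf: Date): RegisterEntry[] {
  const t = asOf.getTime();
  return entries.filter(
    (e) =>
      e.register === register &&
      e.startDate.getTime() <= t &&
      (e.endDate === undefined || e.endDate.getTime() > t),
  );
}
```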

Compliance Engine

Obligation rules (per-workspace or global) match entities by jurisdiction and type, computing due dates from trigger types: ANNIVERSARY, CALENDAR_DATE, DOCUMENT_EXPIRY, or EVENT_BASED. Status auto-degrades on schedule.

Status Ladder (auto-degraded on schedule):
  UPCOMING → WARNING (warningDate reached)
           → DUE (dueDate reached)
           → OVERDUE (dueDate + gracePeriod passed)
           → COMPLETED / FILED (user action)

Entity Compliance Summary (RAG rating):
  RED:   Any overdue obligations OR expired KYC OR no licence
  AMBER: Any warning obligations OR KYC expiring within 90 days OR missing KYC
  GREEN: All clear
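The ladder and the RAG rule can be sketched as two pure functions. Assumptions: grace periods are in whole days, COMPLETED and FILED are terminal, and a null KYC expiry stands for "no KYC on file".

```typescript
// Hypothetical sketch of the obligation status ladder and the entity RAG rating.
type ObligationStatus = "UPCOMING" | "WARNING" | "DUE" | "OVERDUE" | "COMPLETED" | "FILED";
type Rag = "RED" | "AMBER" | "GREEN";

interface Obligation {
  warningDate: Date;
  dueDate: Date;
  gracePeriodDays: number;
  status: ObligationStatus;
}

interface EntityComplianceInput {
  obligationStatuses: ObligationStatus[];
  kycExpiry: Date | null; // assumed: null means no KYC on file
  hasLicence: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function degradeStatus(o: Obligation, now: Date): ObligationStatus {
  if (o.status === "COMPLETED" || o.status === "FILED") return o.status; // set by user action
  const t = now.getTime();
  if (t > o.dueDate.getTime() + o.gracePeriodDays * DAY_MS) return "OVERDUE";
  if (t >= o.dueDate.getTime()) return "DUE";
  if (t >= o.warningDate.getTime()) return "WARNING";
  return "UPCOMING";
}

function ragRating(e: EntityComplianceInput, now: Date): Rag {
  const t = now.getTime();
  const kycExpired = e.kycExpiry !== null && e.kycExpiry.getTime() < t;
  if (e.obligationStatuses.indexOf("OVERDUE") !== -1 || kycExpired || !e.hasLicence) return "RED";
  const kycMissing = e.kycExpiry === null;
  const kycExpiringSoon = e.kycExpiry !== null && e.kycExpiry.getTime() - t <= 90 * DAY_MS;
  if (e.obligationStatuses.indexOf("WARNING") !== -1 || kycExpiringSoon || kycMissing) return "AMBER";
  return "GREEN";
}
```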

Event System

Immutable event log driving downstream effects. Most event categories trigger cascade effects: ownership changes recompute UBO snapshots, compliance changes recompute obligations, and role changes refresh cross-workspace presence. DATA_CORRECTION events are the exception and dispatch nothing.

Category	Downstream Effect
ROLE_CHANGE	Refreshes cross-workspace entity presence
OWNERSHIP_CHANGE	Triggers UBO snapshot recomputation
COMPLIANCE_CHANGE	Recomputes compliance obligations for entity
ENTITY_LIFECYCLE	Fires notifications (incorporation, strike-off, dormancy)
DATA_CORRECTION	No downstream dispatch (silent correction)

Event-driven cascade: Role change -> register entry -> entity event -> UBO recompute -> compliance recheck -> presence refresh. Compliance-critical side effects are tracked with structured failure logging.
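The category table is effectively a dispatch map. A minimal sketch with hypothetical effect callbacks: each effect runs independently, and a failure is recorded as a structured log entry instead of blocking the others, which is one possible way to realise the failure logging described above.

```typescript
// Hypothetical sketch: event categories mapped to downstream effects.
type EventCategory = "ROLE_CHANGE" | "OWNERSHIP_CHANGE" | "COMPLIANCE_CHANGE" | "ENTITY_LIFECYCLE" | "DATA_CORRECTION";

interface EntityEvent {
  id: string;
  entityId: string;
  category: EventCategory;
}

type Effect = (e: EntityEvent) => Promise<void>;

interface Effects {
  refreshPresence: Effect;
  recomputeUbo: Effect;
  recomputeCompliance: Effect;
  notify: Effect;
}

function effectsFor(category: EventCategory, fx: Effects): Effect[] {
  switch (category) {
    case "ROLE_CHANGE":
      return [fx.refreshPresence];
    case "OWNERSHIP_CHANGE":
      return [fx.recomputeUbo];
    case "COMPLIANCE_CHANGE":
      return [fx.recomputeCompliance];
    case "ENTITY_LIFECYCLE":
      return [fx.notify];
    case "DATA_CORRECTION":
      return []; // silent correction: nothing downstream
  }
}

// Run every effect; a failing effect is logged and does not stop the others.
async function dispatch(e: EntityEvent, fx: Effects, log: (entry: Record<string, string>) => void): Promise<void> {
  await Promise.all(
    effectsFor(e.category, fx).map((run) =>
      run(e).catch((err: unknown) => {
        log({ eventId: e.id, category: e.category, error: String(err) });
      }),
    ),
  );
}
```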

Access Control & Sensitivity

Files are classified by sensitivity level, with each level defining minimum role requirements for operations (READ, DOWNLOAD, SHARE, GRANT, DELETE). Cross-workspace access is mediated through grants with full audit logging.

Sensitivity Levels

Level	Default Access	Typical Content
PUBLIC	All roles	Published documents
INTERNAL	EDITOR+	Standard business files
CONFIDENTIAL	ADMIN+	KYC, financial, personal data
RESTRICTED	OWNER only	Board minutes, legal privilege
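A sketch of the sensitivity check, assuming a role hierarchy VIEWER < EDITOR < ADMIN < OWNER (VIEWER is a hypothetical name for the lowest role) and that each level's default applies to every operation unless a per-operation override is configured.

```typescript
// Hypothetical sketch of sensitivity-based permission checks.
type Role = "VIEWER" | "EDITOR" | "ADMIN" | "OWNER"; // VIEWER: assumed name for the lowest role
type Sensitivity = "PUBLIC" | "INTERNAL" | "CONFIDENTIAL" | "RESTRICTED";
type Operation = "READ" | "DOWNLOAD" | "SHARE" | "GRANT" | "DELETE";
type Overrides = Partial<Record<Sensitivity, Partial<Record<Operation, Role>>>>;

const ROLE_RANK: Record<Role, number> = { VIEWER: 0, EDITOR: 1, ADMIN: 2, OWNER: 3 };

// Defaults from the Sensitivity Levels table.
const DEFAULT_MIN_ROLE: Record<Sensitivity, Role> = {
  PUBLIC: "VIEWER",
  INTERNAL: "EDITOR",
  CONFIDENTIAL: "ADMIN",
  RESTRICTED: "OWNER",
};

function isAllowed(role: Role, level: Sensitivity, op: Operation, overrides: Overrides = {}): boolean {
  const required = overrides[level]?.[op] ?? DEFAULT_MIN_ROLE[level]; // per-operation override wins
  return ROLE_RANK[role] >= ROLE_RANK[required];
}
```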

Cross-Workspace Access Grants

Request → Escalation (if needed) → Approval → Grant → Stream

Access Grant:
  File, requesting workspace, custodian workspace
  Level: VIEW | DOWNLOAD
  Status: ACTIVE | REVOKED | EXPIRED

Grant lifecycle is fully audit-logged. Revocation cascades to DocumentPresence status updates.
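Because grant validation happens at request time with no caching, the stream endpoint can re-check a freshly loaded grant on every call. In this sketch the optional `expiresAt` field and the result shape are assumptions; the levels and statuses come from the grant model above.

```typescript
// Hypothetical sketch of per-request grant validation.
type GrantLevel = "VIEW" | "DOWNLOAD";
type GrantStatus = "ACTIVE" | "REVOKED" | "EXPIRED";

interface AccessGrant {
  id: string;
  fileId: string;
  requestingWorkspaceId: string;
  custodianWorkspaceId: string;
  level: GrantLevel;
  status: GrantStatus;
  expiresAt?: Date; // assumed field
}

type GrantCheck = { ok: true } | { ok: false; reason: string };

// Call with a grant loaded fresh for this request (no cache), so revocation takes effect immediately.
function validateGrant(
  grant: AccessGrant | undefined,
  fileId: string,
  workspaceId: string,
  wantsDownload: boolean,
  now: Date,
): GrantCheck {
  if (grant === undefined) return { ok: false, reason: "grant not found" };
  if (grant.fileId !== fileId || grant.requestingWorkspaceId !== workspaceId) {
    return { ok: false, reason: "grant does not cover this file and workspace" };
  }
  if (grant.status !== "ACTIVE") return { ok: false, reason: `grant is ${grant.status}` };
  if (grant.expiresAt !== undefined && grant.expiresAt.getTime() <= now.getTime()) {
    return { ok: false, reason: "grant has expired" };
  }
  if (wantsDownload && grant.level !== "DOWNLOAD") return { ok: false, reason: "grant is VIEW only" };
  return { ok: true };
}
```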

Retention & Deletion

Per-workspace configurable retention by sensitivity: Standard (180d), Extended (1yr), Finance (7yr), Legal (25yr), Indefinite. Soft delete checks for LEGAL_HOLD, snapshots metadata, marks DELETED, expires presence, revokes grants, and defers byte cleanup to the retention job.
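The deletion steps can be sketched as a pure transition on a file record. Assumptions: the retention clock starts at deletion, "1yr", "7yr" and "25yr" are calendar years, and "expires presence" means VALID or PENDING presence becomes EXPIRED. Byte cleanup is deliberately absent, since it is left to the retention job.

```typescript
// Hypothetical sketch of retention windows and the soft-delete transition.
type RetentionPolicy = "STANDARD" | "EXTENDED" | "FINANCE" | "LEGAL" | "INDEFINITE";
type GrantStatus = "ACTIVE" | "REVOKED" | "EXPIRED";
type PresenceStatus = "VALID" | "EXPIRED" | "SUPERSEDED" | "PENDING" | "REVOKED";

interface StoredFile {
  id: string;
  filename: string;
  legalHold: boolean;
  state: "ACTIVE" | "DELETED";
  deletedAt?: Date;
  metadataSnapshot?: string;
  presence: PresenceStatus[];
  grants: GrantStatus[];
}

const RETENTION_YEARS: Partial<Record<RetentionPolicy, number>> = { EXTENDED: 1, FINANCE: 7, LEGAL: 25 };

// Earliest time the retention job may purge bytes; null means never.
function purgeAfter(deletedAt: Date, policy: RetentionPolicy): Date | null {
  if (policy === "INDEFINITE") return null;
  if (policy === "STANDARD") return new Date(deletedAt.getTime() + 180 * 24 * 60 * 60 * 1000);
  const d = new Date(deletedAt.getTime());
  d.setUTCFullYear(d.getUTCFullYear() + (RETENTION_YEARS[policy] ?? 0));
  return d;
}

function softDelete(file: StoredFile, now: Date): StoredFile {
  if (file.legalHold) throw new Error(`file ${file.id} is under LEGAL_HOLD`);
  return {
    ...file,
    metadataSnapshot: JSON.stringify({ id: file.id, filename: file.filename, deletedAt: now.toISOString() }),
    state: "DELETED",
    deletedAt: now,
    presence: file.presence.map((s): PresenceStatus => (s === "VALID" || s === "PENDING" ? "EXPIRED" : s)),
    grants: file.grants.map((g): GrantStatus => (g === "ACTIVE" ? "REVOKED" : g)),
    // Bytes are untouched here; the retention job removes them after purgeAfter().
  };
}
```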

Human-in-the-Loop Escalation

Escalation handling covers operations that exceed auto-approve thresholds: file access requests from other workspaces, low-confidence identity matches in the identity review queue, and cross-jurisdiction transfer approvals. Status flow: OPEN -> ASSIGNED -> RESOLVED / DISMISSED.
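The escalation status flow is a small state machine. This sketch follows the flow literally (DISMISSED is reachable only from ASSIGNED) and uses hypothetical names for the escalation kinds.

```typescript
// Hypothetical sketch of the escalation status flow.
type EscalationKind = "FILE_ACCESS_REQUEST" | "IDENTITY_REVIEW" | "CROSS_JURISDICTION_TRANSFER"; // hypothetical names
type EscalationStatus = "OPEN" | "ASSIGNED" | "RESOLVED" | "DISMISSED";

interface Escalation {
  id: string;
  kind: EscalationKind;
  status: EscalationStatus;
  assignee?: string;
}

const NEXT: Record<EscalationStatus, EscalationStatus[]> = {
  OPEN: ["ASSIGNED"],
  ASSIGNED: ["RESOLVED", "DISMISSED"],
  RESOLVED: [], // terminal
  DISMISSED: [], // terminal
};

function transition(e: Escalation, to: EscalationStatus, assignee?: string): Escalation {
  if (NEXT[e.status].indexOf(to) === -1) throw new Error(`illegal transition ${e.status} -> ${to}`);
  if (to === "ASSIGNED" && !assignee) throw new Error("ASSIGNED requires an assignee");
  return { ...e, status: to, assignee: assignee ?? e.assignee };
}
```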

Self-Knowledge

The system can search and RAG-embed its own documentation. This enables the AI assistant to answer questions about the platform's own architecture and capabilities by querying internal documentation at runtime.

Interface	Type	Description
`search_system_docs`	AI Tool	Semantic search across CLAUDE-*.md documentation files
`embed_system_docs`	AI Tool	Trigger RAG embedding of system documentation
`/api/docs/search`	REST	Search system documentation via API
`/api/docs/embed`	REST	Trigger embedding of system docs via API

How it works: System documentation (CLAUDE.md, CLAUDE-intelligence.md, etc.) is chunked and embedded via the standard RAG pipeline. The AI assistant uses search_system_docs to retrieve relevant sections when answering questions about platform capabilities, architecture, or configuration.
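The chunking step of that pipeline can be sketched as a heading-aware splitter. This is an assumption about how chunking might work, not the platform's RAG pipeline: it starts a new chunk at each markdown heading and splits long sections at a fixed character limit.

```typescript
// Hypothetical sketch: split markdown docs into heading-scoped chunks before embedding.
interface DocChunk {
  source: string; // e.g. a CLAUDE-*.md filename
  heading: string;
  text: string;
}

function chunkMarkdown(source: string, markdown: string, maxChars = 1500): DocChunk[] {
  const chunks: DocChunk[] = [];
  let heading = "(intro)";
  let buffer: string[] = [];
  const flush = (): void => {
    const text = buffer.join("\n").trim();
    // Split oversized sections into fixed-size pieces; empty sections produce nothing.
    for (let i = 0; i < text.length; i += maxChars) {
      chunks.push({ source, heading, text: text.slice(i, i + maxChars) });
    }
    buffer = [];
  };
  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section under its own heading
      heading = match[1].trim();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

Keeping the heading on each chunk lets a retrieved passage be cited by section, which helps when the assistant answers questions about a specific part of the documentation.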

Architectural Principles

#	Principle	Description
1	Asynchronous processing	Propagation, presence, events, folders - all non-blocking side effects
2	Immutable audit trail	Register entries, entity events, deletion certificates - audit records are never deleted
3	Presence before access	Discovery via document presence, access via explicit grants
4	Workspace isolation	All queries scoped to workspace; oversight enables controlled cross-workspace visibility
5	Role classification by context	Folder path determines file role, automatic and consistent
6	Temporal reconstruction	Registers reconstruct state at any historical date
7	Event-driven cascade	Role change -> register -> event -> UBO recompute -> compliance -> presence
8	Cost-aware operations	RAG embeddings budget-gated, extraction timeouts enforced
9	Versioned security	Key rotation without downtime, legal holds block deletion
10	Graph-based computation	Recursive queries for ownership chains, family trees, circular detection

Security Considerations

The intelligence pipeline has several architectural properties that carry real security or operational consequences. These are documented honestly - not as solved problems, but as structural risks with mitigations in place.

Area	Status	Description
Injection scanning	Active	Tool results are scanned for injection patterns. Detected injections are sanitised before processing.
Cell locking	Enforced	API and AI updates respect locked cells. Locked cells cannot be modified through standard interfaces.
SQL executor	Strong	Multi-layer defence: read-only, workspace-scoped, query timeout, row limits. Requires ADMIN role.
Identity hashing	Secured	Secure hashing with versioned key rotation for identity fields.
Async processing	Monitored	Compliance-critical operations are tracked with structured failure logging.
Autonomy enforcement	Aligned	Configurable autonomy levels. API key creation requires ADMIN/OWNER role.
Document extraction	Sandboxed	Extraction runs with timeouts and resource limits.
Extraction overflow	Managed	Tables with many extraction columns queue overflow items for admin review.
AI endpoint access	Restricted	API access tool is read-only (GET) with sensitive endpoints blocked.
Cost controls	Active	Budget tracking with drift detection and spending limits.
Grant revocation	Immediate	Grant validation happens at request time with no caching.

Full documentation: See internal architecture notes (§26) for detailed analysis of each risk with code references, mitigations, and residual risk assessments.