Opbox

Intelligence Architecture

The intelligence layer is a multi-system pipeline that transforms raw documents into structured, cross-workspace knowledge. It spans document extraction, file propagation, identity resolution, ownership graph computation, compliance monitoring, and human-in-the-loop oversight - all wired together through event-driven side effects.

Architecture Flow

The following diagram shows how a document upload cascades through the intelligence pipeline - from file registration and extraction through to cross-workspace presence and CSP intelligence side effects.

                           ┌─────────────────────────────────────────────┐
                           │              DOCUMENT UPLOAD                │
                           │      (matter, submission, direct)           │
                           └──────────────────┬──────────────────────────┘
                                              │
                    ┌─────────────────────────┼─────────────────────────┐
                    ▼                         ▼                         ▼
            ┌──────────────┐       ┌───────────────────┐     ┌──────────────────┐
            │  File        │       │  Extraction       │     │  Embedding       │
            │  Registry    │       │  Engine           │     │  Engine          │
            │              │       │  → dynamic cols   │     │  → vector chunks │
            └──────┬───────┘       └───────────────────┘     └──────────────────┘
                   │
        ┌──────────┼───────────────────────┐
        ▼          ▼                       ▼
  ┌───────────┐  ┌─────────────────┐  ┌──────────────────────┐
  │ Propagate │  │ Role Classify   │  │ Presence Trigger     │
  │ to entity │  │ from folder     │  │ (KYC/EVIDENCE roles) │
  │ rows      │  │ path            │  │                      │
  └─────┬─────┘  └────────┬────────┘  └──────────┬───────────┘
        │                 │                       │
        ▼                 ▼                       ▼
  ┌──────────────────────────────┐    ┌─────────────────────────────┐
  │  Files linked to company row │    │  Identity Resolution        │
  │  Files linked to stakeholder │    │  (secure hash → identity    │
  │  (with classified role)      │    │   → document presence)      │
  └──────────────────────────────┘    └──────────┬──────────────────┘
                                                 │
                                                 ▼
                                   ┌──────────────────────────┐
                                   │  Cross-Workspace         │
                                   │  Discovery & Access      │
                                   │  (presence → grant →     │
                                   │   stream)                │
                                   └──────────────────────────┘

  Meanwhile, on the CSP intelligence side:

  ┌──────────────┐    ┌────────────────┐    ┌─────────────────────┐
  │ CspRole      │───▶│ Register Entry │───▶│ Entity Event        │
  │ changes      │    │ (immutable)    │    │ (OWNERSHIP_CHANGE)  │
  └──────────────┘    └────────────────┘    └──────────┬──────────┘
                                                       │
                              ┌─────────────────────────┼──────────────────┐
                              ▼                         ▼                  ▼
                    ┌──────────────────┐    ┌────────────────────┐  ┌────────────────┐
                    │ UBO Recompute    │    │ Compliance Recheck │  │ Presence       │
                    │ (ownership chain │    │ (obligation status │  │ Advertise      │
                    │  → snapshot)     │    │  degradation)      │  │ (cross-WS)     │
                    └──────────────────┘    └────────────────────┘  └────────────────┘

Document Extraction Pipeline

Raw documents are processed through the extraction engine to produce typed extraction results that become dynamic table columns. Each extraction goes through quality scoring to determine usability.

Upload → Extraction Job (PENDING)
  → Document extraction engine
  → Extraction results with typed fields
  → Dynamic table columns created (capped at 50/table)
  → Overflow items queued for manual review
  → Quality score computed (0-1)

Key Details

| Feature | Description |
| --- | --- |
| Column dedup | Automatic deduplication prevents duplicate columns |
| Type inference | DATE, NUMBER, TEXT inferred from sample values |
| Quality scoring | Structure score (0-1) measures parse quality; threshold determines usability |
| Overflow | Tables with 50+ extraction columns queue new extractions for manual review |
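
The dedup, type inference, and overflow rules above can be sketched roughly as follows. This is a minimal illustration, not the platform's code: `infer_type`, `add_columns`, the ISO-8601 date assumption, and the case-insensitive dedup rule are all hypothetical. Only the 50-column cap and the three type names come from the text.

```python
from datetime import date

MAX_COLUMNS = 50  # cap stated above; everything else here is illustrative


def infer_type(samples: list[str]) -> str:
    """Infer DATE, NUMBER, or TEXT (assumed rule: every sample must agree)."""

    def is_date(s: str) -> bool:
        try:
            date.fromisoformat(s)  # assumption: ISO-8601 dates only
            return True
        except ValueError:
            return False

    def is_number(s: str) -> bool:
        try:
            float(s.replace(",", ""))  # tolerate thousands separators
            return True
        except ValueError:
            return False

    values = [s.strip() for s in samples if s.strip()]
    if not values:
        return "TEXT"
    if all(is_date(v) for v in values):
        return "DATE"
    if all(is_number(v) for v in values):
        return "NUMBER"
    return "TEXT"


def add_columns(existing: list[str], extracted: dict[str, list[str]]):
    """Return (created column specs, overflow names) honouring dedup and the cap."""
    seen = {name.strip().lower() for name in existing}
    created, overflow = [], []
    for name, samples in extracted.items():
        key = name.strip().lower()
        if key in seen:
            continue  # column dedup: skip names that already exist
        if len(seen) >= MAX_COLUMNS:
            overflow.append(name)  # table is full: queue for manual review
            continue
        seen.add(key)
        created.append((name, infer_type(samples)))
    return created, overflow
```

For example, `add_columns(["Name"], {"name": ["x"], "Expiry": ["2027-03-01"]})` skips the duplicate `name` column and creates `Expiry` as a DATE column.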

File Registry & Smart Folders

Every file in the system gets an index entry. Link edges connect files to any entity - the same file appears in multiple views simultaneously.

File Record

A file record is the metadata index entry for each file. It tracks filename, MIME type, size, checksum, source type (UPLOAD, GENERATED, IMPORTED), sensitivity level (PUBLIC to RESTRICTED), version chain, and virtual folder path for tree navigation.

A link edge connects a file to any entity. It supports entity types (ROW, MATTER, MATTER_STEP, SUBMISSION, TEMPLATE, DOCUMENT, GENERATED_DOC) and roles (ATTACHMENT, OUTPUT, SOURCE, KYC, EVIDENCE, SIGNATURE, COVER).

Smart Folder Types

| Folder | Source |
| --- | --- |
| Matters | Files with entityType = MATTER or MATTER_STEP |
| Individuals | Files with entityType = ROW on Individuals system table |
| Companies | Files with entityType = ROW on Companies system table |
| Recent | Files created in last 7 days |
| Starred | User bookmarks |
| Unlinked | Files with no entity links |

Entity File Propagation

The propagation system automatically creates file links from matter files to the companies and stakeholders the matter serves. This is the core mechanism that makes a single file appear in multiple places, which gives the platform a company-centric file model.

| Link Type | Meaning | Resolves To |
| --- | --- | --- |
| PRIMARY_ENTITY | The main company this matter serves | Company row |
| STAKEHOLDER | An individual involved in the matter | Stakeholder row |
| PARENT_ENTITY | Parent company in hierarchy | Company row |
| SUBSIDIARY | Child entity | Company row |
| RELATED_PARTY | Other related entity | Company row |

Propagation Flow

File uploaded to matter
  → File record created with matter link
  → Automatic propagation to linked entities
      │
      ├─ Load entity links for this matter
      ├─ For each PRIMARY_ENTITY / PARENT_ENTITY / SUBSIDIARY:
      │   └─ Link file to company row with classified role
      ├─ For each STAKEHOLDER:
      │   └─ Link file to stakeholder row with classified role
      └─ File role determined from folder path
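
The propagation flow above can be sketched as a pure function that turns a matter's entity links into file links. The `EntityLink` and `FileLink` shapes and the `propagate` function are hypothetical. Skipping RELATED_PARTY is an assumption taken from the flow, which lists only the three company link types and STAKEHOLDER.

```python
from dataclasses import dataclass

# Link types that resolve to a company row in the propagation flow above.
COMPANY_LINK_TYPES = {"PRIMARY_ENTITY", "PARENT_ENTITY", "SUBSIDIARY"}


@dataclass(frozen=True)
class EntityLink:
    link_type: str  # e.g. PRIMARY_ENTITY, STAKEHOLDER
    row_id: str     # company or stakeholder row the matter serves


@dataclass(frozen=True)
class FileLink:
    file_id: str
    entity_type: str  # MATTER or ROW
    entity_id: str
    role: str         # role already classified from the folder path


def propagate(file_id: str, matter_id: str, role: str,
              entity_links: list[EntityLink]) -> list[FileLink]:
    """Build the matter link plus one ROW link per propagated entity."""
    links = [FileLink(file_id, "MATTER", matter_id, role)]
    seen: set[str] = set()
    for link in entity_links:
        propagates = (link.link_type in COMPANY_LINK_TYPES
                      or link.link_type == "STAKEHOLDER")
        if propagates and link.row_id not in seen:
            seen.add(link.row_id)  # avoid duplicate links to the same row
            links.append(FileLink(file_id, "ROW", link.row_id, role))
    return links
```

Because the function only returns the links to create, the same logic could run inside a non-blocking side effect after the upload has been acknowledged.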

Role Classification from Folder Path

| Folder Pattern | Role | Meaning |
| --- | --- | --- |
| /KYC/, /Compliance/, /PINCA | KYC | Know Your Customer documents |
| /Filings/, /Share Certificates/, /Registers/ | EVIDENCE | Regulatory evidence |
| /Constitutional Docs/, /Resolutions & Minutes/ | EVIDENCE | Corporate governance evidence |
| /Signed/ | SIGNATURE | Executed documents |
| /Welcome Pack/ | OUTPUT | Generated output documents |
| Everything else | ATTACHMENT | General attachments |
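
A minimal sketch of the classification table above. The patterns and roles are copied from the table. The matching strategy (case-insensitive substring match, first matching row wins) and the `classify_role` name are assumptions, since the text does not say how overlapping patterns are resolved.

```python
# Pattern → role pairs copied from the table above, in table order.
ROLE_PATTERNS = [
    (("/KYC/", "/Compliance/", "/PINCA"), "KYC"),
    (("/Filings/", "/Share Certificates/", "/Registers/"), "EVIDENCE"),
    (("/Constitutional Docs/", "/Resolutions & Minutes/"), "EVIDENCE"),
    (("/Signed/",), "SIGNATURE"),
    (("/Welcome Pack/",), "OUTPUT"),
]


def classify_role(folder_path: str) -> str:
    """Return the file role implied by a virtual folder path."""
    # Ensure a trailing slash so "/Acme/Signed" matches the "/Signed/" pattern.
    path = folder_path if folder_path.endswith("/") else folder_path + "/"
    lowered = path.lower()
    for patterns, role in ROLE_PATTERNS:
        if any(pattern.lower() in lowered for pattern in patterns):
            return role  # assumption: first matching row wins
    return "ATTACHMENT"  # "Everything else" row
```

Under the first-match assumption, a path such as `/Acme/KYC/Signed/` classifies as KYC rather than SIGNATURE.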

Deep File Aggregation Queries

| Query | Aggregates From |
| --- | --- |
| Entity deep files | Direct company files + all matter files + all stakeholder files with roles in entity |
| Individual deep files | Direct stakeholder files + all matter files + bridged KYC documents |
| Stakeholder file timeline | Direct uploads + entity files + matter files + KYC docs (sorted chronologically, deduplicated) |

Identity Resolution

Identity resolution maps workspace-specific table rows to workspace-agnostic identity records using securely hashed identifiers. No plaintext identity data is ever stored in the global layer.

Three-Layer Architecture

Layer 1: Global Identity
  │  Workspace-agnostic person or company
  │  Types: INDIVIDUAL, ENTITY
  │
  ├── Layer 2: Hashed Identifiers
  │     Securely hashed identity fields
  │     Types: EMAIL, PASSPORT_NUMBER, EMIRATES_ID,
  │            COMPANY_REG_NUMBER, TAX_ID, NATIONAL_ID
  │     Supports versioned key rotation
  │
  └── Layer 3: Identity Links
        Maps workspace row → Global Identity
        Includes confidence score (0-1), disputed flag, expiry

Resolution Flow

Row created/updated in Individuals or Companies table
  → Identity resolution triggered automatically
  → Check: is identity resolution enabled for this workspace?
     (requires active oversight relationship)
  → Extract identity fields by table type:
     INDIVIDUALS: Email, Passport Number, Emirates ID
     COMPANIES: Company Number, Tax ID
  → Normalize and securely hash each field
  → Search for matching identifiers
  → Compute confidence score
  → Decision:
     High confidence: Auto-link to existing identity
     Low confidence: Queue for human verification
     No match: Create new identity record
  → Resolution logged for audit trail

Confidence Scoring

| Match Criteria | Score | Action |
| --- | --- | --- |
| 1 email match | 0.3 | Queued for human verification |
| 1 government ID match | 0.5 | Queued for human verification |
| 2 field matches | 0.7 | Auto-linked to identity |
| 3+ field matches | 0.9 | Auto-linked to identity |
| All government IDs match | 1.0 | Auto-linked to identity |
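
The scoring table above can be read as a small decision function. The sketch below is one possible interpretation, not the platform's rule set. It assumes the auto-link threshold is 0.7 (inferred from the table), that company registration and tax numbers count as government IDs, and that "all government IDs match" requires at least two government IDs to have been compared. The names `confidence` and `decide` are hypothetical.

```python
# Assumption: which identifier types count as government IDs.
GOVERNMENT_IDS = {"PASSPORT_NUMBER", "EMIRATES_ID", "NATIONAL_ID",
                  "COMPANY_REG_NUMBER", "TAX_ID"}
AUTO_LINK_THRESHOLD = 0.7  # assumption inferred from the table above


def confidence(matched: set[str], compared: set[str]) -> float:
    """Score a candidate identity from the matched and compared field types."""
    gov_matched = matched & GOVERNMENT_IDS
    gov_compared = compared & GOVERNMENT_IDS
    if len(gov_compared) >= 2 and gov_matched == gov_compared:
        return 1.0  # all government IDs match
    if len(matched) >= 3:
        return 0.9
    if len(matched) == 2:
        return 0.7
    if len(gov_matched) == 1:
        return 0.5  # a single government ID
    if matched == {"EMAIL"}:
        return 0.3  # a single email
    return 0.0      # no match


def decide(score: float) -> str:
    """Map a score to the three outcomes in the resolution flow."""
    if score == 0.0:
        return "CREATE_IDENTITY"
    return "AUTO_LINK" if score >= AUTO_LINK_THRESHOLD else "HUMAN_REVIEW"
```

Keeping the scoring separate from the decision makes the threshold easy to tune without touching the match logic.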

Key Rotation: Versioned keys support zero-downtime secret rotation, with automatic verification against any active key version.
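
As an illustration of hashing with versioned keys, the sketch below uses HMAC-SHA256 from the Python standard library. The text only says identifiers are "securely hashed", so the algorithm, the `v<N>:` prefix format, the normalisation rule, and the in-memory key ring are all assumptions.

```python
import hashlib
import hmac

# Hypothetical key ring: version -> secret. Real secrets would live in a vault.
KEYS = {1: b"old-secret", 2: b"new-secret"}
CURRENT_VERSION = 2


def normalize(value: str) -> str:
    """Assumed normalisation: remove all whitespace and lowercase."""
    return "".join(value.split()).lower()


def hash_identifier(id_type: str, value: str,
                    version: int = CURRENT_VERSION) -> str:
    """Return a version-tagged keyed hash of one identity field."""
    message = f"{id_type}:{normalize(value)}".encode()
    digest = hmac.new(KEYS[version], message, hashlib.sha256).hexdigest()
    return f"v{version}:{digest}"


def matches(stored: str, id_type: str, candidate: str) -> bool:
    """Verify a candidate against a stored hash with the key version it used."""
    version = int(stored.split(":", 1)[0][1:])
    if version not in KEYS:
        return False  # key version has been retired
    expected = hash_identifier(id_type, candidate, version)
    return hmac.compare_digest(stored, expected)  # constant-time comparison
```

Tagging each hash with its key version is what lets old and new keys coexist during a rotation: new hashes use `CURRENT_VERSION`, while existing hashes still verify until they are re-hashed.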

Document Presence

Document presence is the discovery layer that enables workspaces to know a document exists for a shared identity - without revealing the document's content. Only existence, type, and expiry are shared.

How Presence Is Created

File linked with KYC or EVIDENCE role to a row
  → Presence trigger fires automatically
  → Identity link found for the row
  → Document presence record created (VALID status)
  → If row is a CSP stakeholder:
     → Cross-workspace entity presence refreshed

Discovery to Access Flow

Overseer workspace sees document presence:
  "John D. has a PASSPORT document (expires 2027-03)"
  Content NOT revealed, only existence + type + expiry
      │
      ▼
  POST /api/files/[fileId]/request-access
      │
      ├─ Routes to appropriate approver
      ├─ Auto-approve (if within configured thresholds)
      └─ OR queued for admin review
      │
      ▼
  On approval: access grant created
      │
      ▼
  GET /api/files/shared/[fileId]?grantId=...
  → Validates grant → streams file bytes

Presence Statuses

| Status | Meaning |
| --- | --- |
| VALID | Document exists and is current |
| EXPIRED | Document past its expiry date |
| SUPERSEDED | Replaced by newer version |
| PENDING | Awaiting review |
| REVOKED | Access revoked or file deleted |

CSP Entity & Stakeholder Intelligence

The CSP intelligence layer models the full corporate structure: entities, individuals, ownership, registers, compliance obligations, and cross-workspace presence. Requires the CSP addon to be installed.

Ownership Graph

The ownership graph traverses shareholding chains to compute effective ownership. Each node includes direct percentage, effective percentage (cascaded through layers), path, and circular detection flag. Additional graph queries include family tree traversal, circular ownership detection, stakeholder exposure analysis, shared stakeholder discovery, and conflict of interest detection.

UBO Computation

UBO computation aggregates effective percentages per individual across all ownership paths, filtering by threshold (default 25%). Snapshots are materialised as immutable records with ownership path data and flags for comparing computed vs declared UBOs.
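
A minimal sketch of the aggregation described above: effective ownership is the product of the percentages along each path, summed per individual across all paths, then filtered by the 25% default threshold. The `holdings` input shape, the `compute_ubos` name, and treating the threshold as inclusive are assumptions.

```python
from collections import defaultdict

UBO_THRESHOLD = 25.0  # default threshold stated above


def compute_ubos(holdings: dict[str, list[tuple[str, float]]],
                 individuals: set[str], target: str,
                 threshold: float = UBO_THRESHOLD):
    """Return (UBOs with effective %, ownership paths, circular flag).

    holdings maps an owned entity to its (owner, direct percentage) pairs.
    """
    totals: dict[str, float] = defaultdict(float)
    paths: dict[str, list[list[str]]] = defaultdict(list)
    circular = False

    def walk(entity: str, fraction: float, path: list[str]) -> None:
        nonlocal circular
        for owner, pct in holdings.get(entity, []):
            if owner in path:
                circular = True  # cycle detected: flag it and stop descending
                continue
            share = fraction * pct / 100.0  # cascade through this layer
            if owner in individuals:
                totals[owner] += share * 100.0
                paths[owner].append(path + [owner])
            else:
                walk(owner, share, path + [owner])

    walk(target, 1.0, [target])
    ubos = {p: round(t, 4) for p, t in totals.items() if t >= threshold}
    return ubos, dict(paths), circular
```

For example, if Alice holds 40% of OpCo directly and 70% of HoldCo, which holds 60% of OpCo, her effective ownership is 40% + (60% × 70%) = 82%.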

Temporal Registers

Register entries are immutable and append-only, and every role change produces one. Entries are never updated, except that an end date is set when an entry is superseded. The registers support point-in-time reconstruction for any historical date. Source tracking records each entry's origin: MANUAL, MATTER_STEP, AI_AGENT, or IMPORT.

| Register | Tracks |
| --- | --- |
| DIRECTORS | Director appointments, resignations |
| SHAREHOLDERS | Share allotments, transfers, cancellations |
| SECRETARIES | Secretary appointments |
| UBOS | UBO declarations and changes |
| AUTHORISED_SIGNATORIES | Signatory appointments |
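
Point-in-time reconstruction over append-only entries can be sketched as a filter on start and end dates. The `RegisterEntry` fields and the `as_of` function are hypothetical, and treating the end date as exclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RegisterEntry:
    register: str               # e.g. DIRECTORS, SHAREHOLDERS
    person: str
    role: str
    start: date
    end: Optional[date] = None  # set only when the entry is superseded
    source: str = "MANUAL"      # MANUAL, MATTER_STEP, AI_AGENT, IMPORT


def as_of(entries: list[RegisterEntry], register: str,
          on: date) -> list[RegisterEntry]:
    """Return the entries of one register that were in force on a given date."""
    return [
        e for e in entries
        if e.register == register
        and e.start <= on
        and (e.end is None or on < e.end)  # assumption: end date is exclusive
    ]
```

Because entries are never deleted, the same list answers both "who are the directors today" and "who were the directors on any past date".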

Compliance Engine

Obligation rules (per-workspace or global) match entities by jurisdiction and type, computing due dates from trigger types: ANNIVERSARY, CALENDAR_DATE, DOCUMENT_EXPIRY, or EVENT_BASED. Status auto-degrades on schedule.

Status Ladder (auto-degraded on schedule):
  UPCOMING → WARNING (warningDate reached)
           → DUE (dueDate reached)
           → OVERDUE (dueDate + gracePeriod passed)
           → COMPLETED / FILED (user action)

Entity Compliance Summary (RAG rating):
  RED:   Any overdue obligations OR expired KYC OR no licence
  AMBER: Any warning obligations OR expiring KYC (90 days) OR missing KYC
  GREEN: All clear
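
The status ladder and RAG rules above can be sketched as two pure functions. The function names and parameters are hypothetical. The sketch assumes the grace period is expressed in days and that a missing KYC expiry date stands for "missing KYC".

```python
from datetime import date, timedelta
from typing import Optional


def obligation_status(today: date, warning_date: date, due_date: date,
                      grace_days: int = 0, completed: bool = False) -> str:
    """Walk the status ladder from the most to the least severe condition."""
    if completed:
        return "COMPLETED"  # user action ends the ladder
    if today > due_date + timedelta(days=grace_days):
        return "OVERDUE"    # dueDate + gracePeriod passed
    if today >= due_date:
        return "DUE"
    if today >= warning_date:
        return "WARNING"
    return "UPCOMING"


def rag_rating(statuses: list[str], kyc_expiry: Optional[date],
               has_licence: bool, today: date) -> str:
    """Summarise an entity's compliance as RED, AMBER, or GREEN."""
    kyc_expired = kyc_expiry is not None and kyc_expiry < today
    if "OVERDUE" in statuses or kyc_expired or not has_licence:
        return "RED"
    kyc_missing = kyc_expiry is None
    kyc_expiring = (kyc_expiry is not None
                    and kyc_expiry <= today + timedelta(days=90))
    if "WARNING" in statuses or kyc_expiring or kyc_missing:
        return "AMBER"
    return "GREEN"
```

A scheduled job could call `obligation_status` for every open obligation, which is one way to realise the "auto-degraded on schedule" behaviour.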

Event System

The event system is an immutable event log that drives downstream effects. Each event triggers cascade effects: ownership changes recompute UBO snapshots, compliance changes recompute obligations, and role changes refresh cross-workspace presence.

| Category | Downstream Effect |
| --- | --- |
| ROLE_CHANGE | Refreshes cross-workspace entity presence |
| OWNERSHIP_CHANGE | Triggers UBO snapshot recomputation |
| COMPLIANCE_CHANGE | Recomputes compliance obligations for entity |
| ENTITY_LIFECYCLE | Fires notifications (incorporation, strike-off, dormancy) |
| DATA_CORRECTION | No downstream dispatch (silent correction) |

Event-driven cascade: Role change → register entry → entity event → UBO recompute → compliance recheck → presence refresh. Compliance-critical side effects are tracked with structured failure logging.
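
A rough sketch of category-based dispatch with structured failure logging, following the category table above. The effect names, the event dictionary shape, and the `dispatch` function are hypothetical. A real system would likely run effects asynchronously rather than inline.

```python
from typing import Callable

# Category → downstream effect names, following the table above.
EFFECTS_BY_CATEGORY = {
    "ROLE_CHANGE": ["refresh_presence"],
    "OWNERSHIP_CHANGE": ["recompute_ubo"],
    "COMPLIANCE_CHANGE": ["recompute_obligations"],
    "ENTITY_LIFECYCLE": ["notify"],
    "DATA_CORRECTION": [],  # silent correction: nothing is dispatched
}


def dispatch(event: dict, effects: dict[str, Callable[[dict], None]],
             event_log: list[dict], failures: list[dict]) -> list[str]:
    """Append the event to the log, run its effects, and record failures."""
    event_log.append(dict(event))  # append-only: earlier entries are never edited
    ran = []
    for name in EFFECTS_BY_CATEGORY.get(event["category"], []):
        try:
            effects[name](event)
            ran.append(name)
        except Exception as exc:
            # Structured failure record, so a failed side effect is visible
            # and retryable instead of being silently lost.
            failures.append({"event_id": event["id"], "effect": name,
                             "error": repr(exc)})
    return ran
```

Recording the event before running any effect means the log stays complete even when a downstream effect fails.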

Access Control & Sensitivity

Files are classified by sensitivity level, with each level defining minimum role requirements for operations (READ, DOWNLOAD, SHARE, GRANT, DELETE). Cross-workspace access is mediated through grants with full audit logging.

Sensitivity Levels

| Level | Default Access | Typical Content |
| --- | --- | --- |
| PUBLIC | All roles | Published documents |
| INTERNAL | EDITOR+ | Standard business files |
| CONFIDENTIAL | ADMIN+ | KYC, financial, personal data |
| RESTRICTED | OWNER only | Board minutes, legal privilege |
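
The default-access column above amounts to a minimum-role check. The sketch below covers only that default. The per-operation requirements (READ, DOWNLOAD, SHARE, GRANT, DELETE) are not specified in the text and are left out. The VIEWER role and the role ordering are assumptions.

```python
# Assumed role ordering; VIEWER is a hypothetical lowest role.
ROLE_RANK = {"VIEWER": 0, "EDITOR": 1, "ADMIN": 2, "OWNER": 3}

# Minimum role per sensitivity level, from the table above.
MIN_ROLE = {
    "PUBLIC": "VIEWER",       # all roles
    "INTERNAL": "EDITOR",     # EDITOR+
    "CONFIDENTIAL": "ADMIN",  # ADMIN+
    "RESTRICTED": "OWNER",    # OWNER only
}


def can_access(role: str, sensitivity: str) -> bool:
    """Return True if the role meets the level's default minimum."""
    return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[sensitivity]]
```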

Cross-Workspace Access Grants

Request → Escalation (if needed) → Approval → Grant → Stream

Access Grant:
  File, requesting workspace, custodian workspace
  Level: VIEW | DOWNLOAD
  Status: ACTIVE | REVOKED | EXPIRED

Grant lifecycle is fully audit-logged. Revocation cascades to DocumentPresence status updates.
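
Grant validation at request time can be sketched as a check performed on every streamed request. The `AccessGrant` fields beyond those listed above (`grant_id`, `expires_at`), the `validate_grant` function, and the rule that DOWNLOAD includes VIEW are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

LEVEL_RANK = {"VIEW": 0, "DOWNLOAD": 1}  # assumption: DOWNLOAD includes VIEW


@dataclass
class AccessGrant:
    grant_id: str
    file_id: str
    requesting_workspace: str
    custodian_workspace: str
    level: str                             # VIEW | DOWNLOAD
    status: str = "ACTIVE"                 # ACTIVE | REVOKED | EXPIRED
    expires_at: Optional[datetime] = None  # hypothetical expiry timestamp


def validate_grant(grant: AccessGrant, file_id: str, workspace: str,
                   action: str, now: datetime) -> bool:
    """Check a grant on each request, so a revocation applies immediately."""
    if grant.status != "ACTIVE":
        return False
    if grant.file_id != file_id or grant.requesting_workspace != workspace:
        return False  # grant belongs to another file or workspace
    if grant.expires_at is not None and now >= grant.expires_at:
        return False
    return LEVEL_RANK[grant.level] >= LEVEL_RANK[action]
```

Evaluating the grant on every request, rather than caching the result, is what makes a revocation take effect on the very next request.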

Retention & Deletion

Retention is configurable per workspace by sensitivity, with these tiers: Standard (180d), Extended (1yr), Finance (7yr), Legal (25yr), and Indefinite. A soft delete checks for LEGAL_HOLD, snapshots the file metadata, marks the file DELETED, expires its presence records, revokes its grants, and defers byte cleanup to the retention job.
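
The soft-delete sequence can be sketched step by step. Everything below is illustrative: the dictionary shapes, the `soft_delete` function, and `LegalHoldError` are hypothetical. The sketch marks presence records REVOKED because the presence status table lists that status for deleted files, which is an interpretation of "expires presence".

```python
class LegalHoldError(Exception):
    """Raised when a file under legal hold is asked to be deleted."""


def soft_delete(file: dict, presences: list[dict], grants: list[dict],
                cleanup_queue: list[str]) -> dict:
    """Apply the soft-delete steps in order and return the metadata snapshot."""
    # 1. A legal hold blocks deletion outright.
    if file.get("legal_hold"):
        raise LegalHoldError(file["id"])
    # 2. Snapshot the metadata before anything changes.
    snapshot = dict(file)
    # 3. Mark the record deleted; the bytes stay in place for now.
    file["status"] = "DELETED"
    # 4. Presence for this file stops advertising a usable document.
    for presence in presences:
        if presence["file_id"] == file["id"]:
            presence["status"] = "REVOKED"
    # 5. Revoke every grant that points at the file.
    for grant in grants:
        if grant["file_id"] == file["id"]:
            grant["status"] = "REVOKED"
    # 6. Leave byte cleanup to the retention job.
    cleanup_queue.append(file["id"])
    return snapshot
```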

Human-in-the-Loop Escalation

Escalation handling covers operations that exceed auto-approve thresholds: file access requests from other workspaces, the identity review queue for low-confidence matches, and cross-jurisdiction transfer approvals. Status flow: OPEN → ASSIGNED → RESOLVED / DISMISSED.
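
The escalation status flow above can be sketched as a small transition table. The `advance` function is hypothetical, and the sketch assumes an escalation must be assigned before it can be resolved or dismissed, which the text does not state explicitly.

```python
# Allowed transitions, read from the status flow above.
TRANSITIONS = {
    "OPEN": {"ASSIGNED"},
    "ASSIGNED": {"RESOLVED", "DISMISSED"},
    "RESOLVED": set(),   # terminal
    "DISMISSED": set(),  # terminal
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move escalation from {current} to {target}")
    return target
```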

Self-Knowledge

The system can search and RAG-embed its own documentation. This enables the AI assistant to answer questions about the platform's own architecture and capabilities by querying internal documentation at runtime.

| Interface | Type | Description |
| --- | --- | --- |
| search_system_docs | AI Tool | Semantic search across CLAUDE-*.md documentation files |
| embed_system_docs | AI Tool | Trigger RAG embedding of system documentation |
| /api/docs/search | REST | Search system documentation via API |
| /api/docs/embed | REST | Trigger embedding of system docs via API |

How it works: System documentation (CLAUDE.md, CLAUDE-intelligence.md, etc.) is chunked and embedded via the standard RAG pipeline. The AI assistant uses search_system_docs to retrieve relevant sections when answering questions about platform capabilities, architecture, or configuration.

Architectural Principles

| # | Principle | Description |
| --- | --- | --- |
| 1 | Asynchronous processing | Propagation, presence, events, folders - all non-blocking side effects |
| 2 | Immutable audit trail | Register entries, entity events, deletion certificates - nothing deleted |
| 3 | Presence before access | Discovery via document presence, access via explicit grants |
| 4 | Workspace isolation | All queries scoped to workspace; oversight enables controlled cross-workspace visibility |
| 5 | Role classification by context | Folder path determines file role, automatic and consistent |
| 6 | Temporal reconstruction | Registers reconstruct state at any historical date |
| 7 | Event-driven cascade | Role change → register → event → UBO recompute → compliance → presence |
| 8 | Cost-aware operations | RAG embeddings budget-gated, extraction timeouts enforced |
| 9 | Versioned security | Key rotation without downtime, legal holds block deletion |
| 10 | Graph-based computation | Recursive queries for ownership chains, family trees, circular detection |

Security Considerations

The intelligence pipeline has several architectural properties that carry real security or operational consequences. These are documented honestly - not as solved problems, but as structural risks with mitigations in place.

| Area | Status | Description |
| --- | --- | --- |
| Injection scanning | Active | Tool results are scanned for injection patterns. Detected injections are sanitised before processing. |
| Cell locking | Enforced | API and AI updates respect locked cells. Locked cells cannot be modified through standard interfaces. |
| SQL executor | Strong | Multi-layer defence: read-only, workspace-scoped, query timeout, row limits. Requires ADMIN role. |
| Identity hashing | Secured | Secure hashing with versioned key rotation for identity fields. |
| Async processing | Monitored | Compliance-critical operations are tracked with structured failure logging. |
| Autonomy enforcement | Aligned | Configurable autonomy levels. API key creation requires ADMIN/OWNER role. |
| Document extraction | Sandboxed | Extraction runs with timeouts and resource limits. |
| Extraction overflow | Managed | Tables with many extraction columns queue overflow items for admin review. |
| AI endpoint access | Restricted | API access tool is read-only (GET) with sensitive endpoints blocked. |
| Cost controls | Active | Budget tracking with drift detection and spending limits. |
| Grant revocation | Immediate | Grant validation happens at request time with no caching. |

Full documentation: See internal architecture notes (§26) for detailed analysis of each risk with code references, mitigations, and residual risk assessments.