FALCONIUM BACKEND ARCHITECTURE

Django 5.0 · DRF · PostgreSQL · Custom step-graph runner · Runtime Wiki · SSM-backed secrets · Structured logging · Admin/User dashboards · Banker Brain (Hermes upgrade, flag-gated) · Per-user multi-tenancy
30 June 2026 · Current state · FalcoMain cutover live · Runtime Wiki deployed · Admin + User dashboards live · Banker Brain Phases 1–6 deployed (flag-gated) · Verified against work log + git main. NEW · Per-user isolation: memory + learning are now keyed per user (session→user resolver, merged to main 25 Jun, d2c4895). Identity is routed/inferred today (model C); the verified authenticated chat endpoint (model A) is built on branch feat/chat-path-auth, pending the paired iPad change before deploy.
Status
Built · live
Partial
Planned (not yet started)
Seam — future layer
Layers
L1 · Skills

Banking Skills Library

Ilya-owned · Markdown only · No code
Skills Library Planned
apps/agents/falco_main/skills/
Markdown files encoding expert banking knowledge — pitchbook structure, comps tables, transaction-cost presentation, CIM review, covenant extraction. Injected into the drafter LLM prompt via progressive disclosure based on intent classification.
  • Owned by Ilya, edited via Cowork or Claude Code
  • Versioned in the falconium_backend repo
  • No restart required — read at every request
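A minimal sketch of how intent-based progressive disclosure could work, assuming a hypothetical intent-to-file mapping (`SKILLS_BY_INTENT`) and hypothetical file names; the real loader and skill layout are not specified here. It reads the Markdown from disk on every call, which is what would make an edit visible without a restart.

```python
from pathlib import Path

# Hypothetical mapping from classified intent to skill files; the real
# mapping and file names are not given in this document.
SKILLS_BY_INTENT = {
    "pitchbook": ["pitchbook_structure.md"],
    "comps": ["comps_tables.md", "transaction_costs.md"],
}


def load_skills(skills_dir: Path, intent: str) -> str:
    """Read the Markdown skills for one intent from disk on every call (no cache)."""
    sections = []
    for name in SKILLS_BY_INTENT.get(intent, []):
        path = skills_dir / name
        if path.is_file():  # a missing skill file is skipped, not fatal
            sections.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(sections)


def build_prompt(base_prompt: str, skills_dir: Path, intent: str) -> str:
    """Append only the skills relevant to the classified intent."""
    skills = load_skills(skills_dir, intent)
    return f"{base_prompt}\n\n{skills}" if skills else base_prompt
```

An unknown intent leaves the base prompt unchanged, so the drafter degrades to its default behaviour rather than failing.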
L2 · Harness

FalcoMain Pipeline · Custom Step-Graph Runner

Ilya-led via Claude Code · Developer support · 14 named Steps · Live in production since 11 May 2026
  • 01 Receive: run_id · identity → user
  • 02 Hydrate: memory · §3 atoms
  • 03 Anonymise: §9 chokepoint · seam
  • 04 Classify: intent + confidence
  • 05 Route: declarative rules.yaml
  • 06–09 Branches A·B·C·D: persona / docs / web / calc
  • 10 Verify: FalcoJudge seam
  • 11 De-anonymise: paired · seam
  • 12 Persist: agent_runs
  • 13 Respond: to forwarder → iPad
  • 14 Memory write: async · Celery
Step-Graph Runner Built
apps/agents/runner/
Custom thin runner — ~400 lines. Each Step is a named class with declared input/output schemas. Honours all six §11.1 properties: explicit graph, deterministic transitions, introspectable state, declarative routing, replayability, localised failure.
  • Step, StepGraph, Run, RunStep
  • ./manage.py replay_run <run_id> — deterministic replay
  • ./manage.py render_graph falco_main — SVG of the graph
  • ~2–3 s wall-clock per chat (Groq ~800–1000 ms, audit row writes ~1–1.5 s)
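The runner's core idea can be illustrated with a small sketch. The class shapes below are assumptions for illustration only and are not the actual Step, StepGraph, Run or RunStep classes in apps/agents/runner/. Each step declares the keys it reads and writes, the graph is an explicit ordered list, and running the same input twice yields the same trace, which is the property a replay command relies on.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Step:
    """A named step with declared input and output keys (hypothetical shape)."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    fn: Callable[[dict], dict]


@dataclass
class Run:
    """Introspectable run state: the shared state plus a per-step trace."""
    state: dict
    trace: list = field(default_factory=list)


def execute(steps: list[Step], initial: dict) -> Run:
    run = Run(state=dict(initial))
    for step in steps:  # explicit ordered graph, so transitions are deterministic
        missing = [k for k in step.inputs if k not in run.state]
        if missing:
            # Localised failure: the error names the step and the missing keys.
            raise KeyError(f"{step.name}: missing inputs {missing}")
        out = step.fn({k: run.state[k] for k in step.inputs})
        if set(out) != set(step.outputs):
            raise ValueError(f"{step.name}: outputs do not match declaration")
        run.state.update(out)
        run.trace.append((step.name, out))
    return run


# Two toy steps; the real pipeline has 14.
receive = Step("receive", ("text",), ("run_id",), lambda s: {"run_id": "r-1"})
classify = Step(
    "classify", ("text",), ("intent",),
    lambda s: {"intent": "calc" if any(c.isdigit() for c in s["text"]) else "persona"},
)

first = execute([receive, classify], {"text": "What is 12% of 400?"})
replayed = execute([receive, classify], {"text": "What is 12% of 400?"})
```

Because every step sees only its declared inputs, a persisted trace is enough to re-run a step in isolation when debugging.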
Drafter LLM Abstraction Built
apps/agents/llm/drafter.py
Vendor-neutral interface. Default path is a single Groq qwen/qwen3-32b call. When AGENTS_MODEL_RESOLUTION_ENABLED is on, the drafter resolves provider/model via the model registry with ordered fallback across Groq, Anthropic and OpenAI. Per §10, today's Groq-only behaviour is a configuration choice, not a code dependency.
Model Resolution + Fallback Built · flag-gated
apps/agents/llm/resolution.py
Multi-model provider resolution with ordered fallback over the existing model registry, behind AGENTS_MODEL_RESOLUTION_ENABLED · flag-gated (default off).
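Ordered fallback can be sketched as follows, assuming each provider is wrapped in a plain callable. The function and exception names here are hypothetical and do not describe the contents of resolution.py; the provider functions are stand-ins that simulate a failure and a success.

```python
from typing import Callable

# Hypothetical provider callables; real ones would wrap the vendor SDKs.
ProviderCall = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the ordered list has failed."""


def draft_with_fallback(
    prompt: str, providers: list[tuple[str, ProviderCall]]
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, completion) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # sketch only: real code would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def failing_groq(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def fake_anthropic(prompt: str) -> str:
    return f"draft for: {prompt}"


used, text = draft_with_fallback(
    "Summarise the covenant package",
    [("groq", failing_groq), ("anthropic", fake_anthropic)],
)
```

Returning the provider name alongside the completion lets the caller record which model actually produced a draft.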
LLM Hook Seam Built · flag-gated
apps/agents/llm/hooks.py
Plugin hook seam exposing pre/post/session hooks around every LLM call, behind AGENTS_LLM_HOOKS_ENABLED · flag-gated (default off).
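As an illustration of a flag-gated hook seam, the sketch below wraps an LLM callable with pre and post hooks. `HookRegistry` and its fields are hypothetical names, session hooks are omitted for brevity, and the `enabled` argument stands in for AGENTS_LLM_HOOKS_ENABLED.

```python
from typing import Callable

PreHook = Callable[[str], str]          # prompt -> prompt
PostHook = Callable[[str, str], str]    # (prompt, completion) -> completion


class HookRegistry:
    """Hypothetical registry; `enabled` stands in for the feature flag."""

    def __init__(self, enabled: bool = False):
        self.enabled = enabled
        self.pre: list[PreHook] = []
        self.post: list[PostHook] = []

    def call(self, llm: Callable[[str], str], prompt: str) -> str:
        if not self.enabled:
            return llm(prompt)  # flag off: the LLM call is untouched
        for hook in self.pre:
            prompt = hook(prompt)
        completion = llm(prompt)
        for hook in self.post:
            completion = hook(prompt, completion)
        return completion


def fake_llm(prompt: str) -> str:
    return prompt.upper()


hooks = HookRegistry(enabled=True)
hooks.pre.append(lambda prompt: prompt.strip())
hooks.post.append(lambda prompt, completion: completion + " [logged]")
result = hooks.call(fake_llm, "  draft a teaser  ")
```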
Context Compression + Lineage Built · flag-gated
apps/agents/llm/compression.py
Context compression with full ContextLineage and prompt-cache-stable prompts, behind AGENTS_CONTEXT_COMPRESSION_ENABLED · flag-gated (default off).
Learning Engine + Approval Queue Built · flag-gated
apps/learning/
Observation / ReflexionNote / CandidateLearning foundation with a human approval queue, behind LEARNING_ENABLED · flag-gated (default off).
Skill Loop (write_approval) Built · flag-gated
apps/learning/skills.py
Self-improving SKILL.md propose/approve loop guarded by a write_approval gate, behind LEARNING_SKILL_LOOP_ENABLED · flag-gated (default off).
ExpeL House-Rules + Voyager Built · flag-gated
apps/learning/expel.py · voyager.py
ExpeL self-pruning house-rules (HouseRule model) and Voyager-style skill growth, behind LEARNING_EXPEL_ENABLED · flag-gated (default off).
Multi-Agent Delegation Built · flag-gated
apps/agents/orchestration/delegate.py
delegate_task with multi-model routing (DelegatedTask model); the deterministic step-graph is retained as the analysis specialist, behind AGENTS_DELEGATION_ENABLED · flag-gated (default off).
FalcoDocs (Branch B) Partial
apps/agents/falco_docs/
Document retrieval active via Weaviate / RAPTOR chain. Typed structured handoff to a dedicated FalcoDocs Python agent in progress. Returns retrieved atoms with §3 metadata schema (event_time, ingest_time, source_id, source_kind, model_id).
  • Neo4j cutover decided April 2026 — ingest half done; Django retrieval still on Weaviate
WebSearch tool (Branch C) Built
apps/agents/tools/web_search.py
Calls Perplexity sonar-pro with key from SSM. Returns sources with §3 metadata. Drafter LLM consumes results.
Calculation tool (Branch D) Built
apps/agents/tools/calculation.py
Restricted Python sandbox. Per §5, the LLM never produces numbers; the calculation node does. Provenance attached to every numerical output.
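One way to keep numbers out of the LLM's hands is to have it emit an expression and evaluate that expression against a whitelist. The sketch below is an assumption about the general technique, not the actual sandbox in calculation.py: it allows only numeric literals and basic arithmetic, and the provenance fields are hypothetical.

```python
import ast
import operator

# Whitelisted operators; anything else (calls, names, attributes) is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval(node):
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    raise ValueError(f"disallowed expression: {type(node).__name__}")


def calculate(expression: str) -> dict:
    """Evaluate arithmetic only and attach provenance (hypothetical fields)."""
    tree = ast.parse(expression, mode="eval")
    return {
        "value": _eval(tree.body),
        "provenance": {"tool": "calculation", "expression": expression},
    }
```

For example, `calculate("400 * 12 / 100")` returns the value 48.0 with the expression recorded as provenance, while an expression containing a function call raises ValueError.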
Memory module — read+write Partial · hydrate + write live
apps/memory/
Step 02 hydrates context from memory_user_profile and memory_daily_log at session start. Step 14 dispatches async Celery write. Three memory types per Memory Architecture v1 — semantic, episodic, procedural — specified but Phase 1 full deployment pending. The Supabase / MongoDB pattern from the n8n era is dropped. A curated always-on brain tier + distillation (BrainTier/BrainClaim, apps/memory/brain.py) and a three-layer People model (Facts/Relationship/Synthesis, apps/memory/people.py) now exist, flag-gated (MEMORY_BRAIN_TIER_ENABLED / MEMORY_PEOPLE_LAYERS_ENABLED).
  • Now per-user (merged to main 25 Jun, d2c4895): memory + learning are keyed by a stable user:<id> key from apps/memory/identity.py — every conversation a banker has shares one brain, and one user's memory is invisible to another. Today the owner is inferred from the conversation (model C); the authenticated chat endpoint makes it verified (model A).
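The difference between models A and C can be sketched as a resolver that prefers a verified user and falls back to an inferred one. Only the `user:<id>` key format comes from this document; the function names and the `session_owners` lookup are hypothetical and do not describe apps/memory/identity.py.

```python
from typing import Optional


def memory_key(user_id: int) -> str:
    """Stable per-user key in the user:<id> format."""
    return f"user:{user_id}"


def resolve_owner(
    session_id: str,
    session_owners: dict[str, int],          # hypothetical session -> user lookup
    verified_user_id: Optional[int] = None,  # set only on the authenticated path
) -> Optional[str]:
    if verified_user_id is not None:
        return memory_key(verified_user_id)   # model A: verified identity wins
    inferred = session_owners.get(session_id)
    if inferred is None:
        return None                           # unknown session: no memory is read
    return memory_key(inferred)               # model C: inferred from the conversation
```

Two sessions belonging to the same user resolve to the same key, which is what lets every conversation a banker has share one memory.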
Routing rules Built
apps/agents/routing/rules.yaml
Declarative YAML. One readable file. Decides which branch runs based on intent and confidence. Editable by Ilya without touching runner code.
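To show what routing on intent and confidence might look like, the sketch below evaluates rules in the shape a YAML loader would produce. The rule fields, thresholds, intent labels and first-match-wins ordering are all assumptions; only the branch letters (A persona, B docs, C web, D calc) follow the pipeline described above.

```python
# Rules as parsed data; in practice these would be loaded from rules.yaml.
# Field names and thresholds are hypothetical.
RULES = [
    {"intent": "calculation", "min_confidence": 0.6, "branch": "D"},
    {"intent": "web", "min_confidence": 0.6, "branch": "C"},
    {"intent": "documents", "min_confidence": 0.5, "branch": "B"},
]
DEFAULT_BRANCH = "A"  # persona branch as the assumed fallback


def route(intent: str, confidence: float, rules: list = RULES) -> str:
    """Return the branch of the first rule that matches, else the default."""
    for rule in rules:
        if rule["intent"] == intent and confidence >= rule["min_confidence"]:
            return rule["branch"]
    return DEFAULT_BRANCH
```

Keeping the rules as data means a threshold can change without touching `route`, which mirrors the stated goal of editing routing without touching runner code.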
FalcoMain endpoint — legacy/interim Built · live
POST /api/v1/agents/falco/main/
The live chat endpoint. Unauthenticated (model C) — accepts the n8n-shaped request, returns the n8n-shaped response, and lets the graph infer the owner from sessionId. Receives traffic via the n8n forwarder at backend.falconium.ai/n8n/webhook/df5cb261-…. Kept live as the safety net during iPad migration to the authenticated path; trusted-internal-devices only until then.
Authenticated chat endpoint — verified isolation Built · on branch, pending iPad
POST /api/v1/agents/falco/chat/ · branch feat/chat-path-auth
The production-grade chat path (model A). Requires Authorization: Bearer <jwt>, runs as the verified request.user (not inferred), and checks that the conversation belongs to the caller: 403 on a cross-user attempt, 401 with no token. The request/response wire shape matches the legacy endpoint, so the iPad change is small: add the token and point the send at this endpoint, dropping the anonymous n8n hop. Built and tested on a branch, not yet on main; ships paired with the iPad change (PR + TestFlight). The login JWT now also carries user_id/email/role claims (UserTokenObtainPairSerializer).
  • Still open before external use (separate backend hardening): wide-open UsersViewSet, AccountSerializer Outlook-token leak, owner-less document/vector retrieval, unauthenticated audio WebSocket
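The 401/403 behaviour of the authenticated path reduces to two checks, sketched here in framework-free Python. The `Conversation` dataclass and `authorise_chat` function are hypothetical stand-ins; the real endpoint would rely on the framework's authentication and the Conversations owner FK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    id: int
    owner_id: int


def authorise_chat(token_user_id: Optional[int], conversation: Conversation) -> int:
    """Return the HTTP status the endpoint would answer with.

    token_user_id is None when no valid bearer token was presented.
    """
    if token_user_id is None:
        return 401  # no token: unauthenticated
    if conversation.owner_id != token_user_id:
        return 403  # valid token, but the conversation belongs to someone else
    return 200
```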
Admin + User Dashboards Built · live
apps/dashboard/
Server-rendered Admin dashboard (superuser-only) and per-User dashboard, replacing the retired n8n/RAPTOR-era Django admin. Markdown-first config under a git-backed content/ root, routed through three thin stores so infrastructure can be repointed later. Now also hosts 7 Banker Brain control sections: Learning queue, Skill loop, Memory browser, House rules, Run monitor, Context & lineage, and Hooks & flags.
  • ContentStore (filesystem+git, commit-on-save), RecordStore (Postgres), SecretStore (wraps secrets_admin/SSM)
  • Config resolver: system default → product pack → user override
  • Admin sections: system overview/health, users + audit-logged Open-as-user, models, agents, skills library, system style, databases, MCP toggles
  • Dark shared web house style (also applied to secrets_admin + observability); admin/user view switcher
  • Mounted at /dashboard/admin/ and /dashboard/ — live at backend.falconium.ai
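The resolver order listed above (system default → product pack → user override) can be sketched as a layered lookup in which the most specific layer wins. The function name and the config keys are hypothetical.

```python
from collections import ChainMap


def resolve_config(system_default: dict, product_pack: dict, user_override: dict) -> dict:
    """Merge three layers; earlier ChainMap arguments take precedence."""
    return dict(ChainMap(user_override, product_pack, system_default))


cfg = resolve_config(
    {"model": "groq", "tone": "formal"},   # system default
    {"tone": "concise"},                   # product pack
    {"model": "anthropic"},                # user override
)
```

Here `cfg` ends up with the user's model and the product pack's tone, while any key set only at system level would pass through unchanged.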
API Payload Logging Built · live
apps/dashboard/request_logging.py
Middleware logging one structured JSON line per /api/* and /dashboard/* request — method, path, status, duration, user, IP, full request+response bodies. Sensitive fields (password, token, secret, authorization, cookie) redacted to ***; bodies size-capped 50KB; ships to CloudWatch /falconium/backend via the existing Vector pipeline. Controlled by API_LOG_* env vars.
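The redaction and size cap can be illustrated with a short sketch. The sensitive field names come from the description above; matching them as case-insensitive substrings of the key, reading 50KB as 50 × 1024 bytes, and the function names are assumptions.

```python
import json

SENSITIVE = ("password", "token", "secret", "authorization", "cookie")
MAX_BODY_BYTES = 50 * 1024  # assumption: 50KB read as 50 * 1024 bytes


def redact(value):
    """Recursively replace values under sensitive keys with '***'."""
    if isinstance(value, dict):
        return {
            key: "***" if any(s in str(key).lower() for s in SENSITIVE) else redact(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def loggable_body(body: dict) -> str:
    """Serialise a redacted body and truncate it to the size cap."""
    encoded = json.dumps(redact(body)).encode("utf-8")
    if len(encoded) > MAX_BODY_BYTES:
        # errors="ignore" drops a multi-byte character cut by the truncation
        return encoded[:MAX_BODY_BYTES].decode("utf-8", errors="ignore")
    return encoded.decode("utf-8")
```

Redacting before serialising means a secret can never be partially exposed by the truncation step.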
Secrets Admin Built · live
apps/secrets_admin/
Superuser-only CRUD over /falconium/* SSM Parameter Store params; masked list, audited reveal/edit. Mounted at /admin/secrets/. Wrapped (not reimplemented) by the dashboard's SecretStore.
Cutover completed 7 May 2026. The live MAIN_AGENT_Falconium n8n workflow is now a 2-node forwarder (Webhook → HTTP Request → /api/v1/agents/falco/main/). The original 20-node workflow is preserved as MAIN_AGENT_Falconium [BACKUP pre-rebuild] (id hAC7u2sf0mMBkuFv) for one-click revert. Pipeline stable since 11 May. iPad app unchanged throughout.
Seams

Future Layers — designed-in pass-throughs today

Specified in the architecture · activated by future build phases without restructuring
Anonymisation chokepoint · §9
Steps 03 and 11 of the pipeline
Single point in front of every cloud LLM call where regulated entities are masked, and a paired step where they are reinjected. Day one: pass-through that records pre-anonymisation state. Activates when the anonymisation system is deployed — the chokepoint position never moves.
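The paired mask/reinject idea can be sketched as two functions that share a mapping. The placeholder format and function names are hypothetical, and entity detection is out of scope: the entity list is simply passed in. With an empty entity list the pair behaves as the day-one pass-through.

```python
def anonymise(text: str, entities: list[str]) -> tuple[str, dict[str, str]]:
    """Replace known entities with placeholders; return the text and the mapping."""
    mapping: dict[str, str] = {}
    for index, entity in enumerate(entities, start=1):
        placeholder = f"[ENTITY_{index}]"
        if entity in text:
            text = text.replace(entity, placeholder)
            mapping[placeholder] = entity
    return text, mapping


def deanonymise(text: str, mapping: dict[str, str]) -> str:
    """Paired step: put the original entities back into the LLM output."""
    for placeholder, entity in mapping.items():
        text = text.replace(placeholder, entity)
    return text


masked, mapping = anonymise("Acme Holdings will acquire Beta Ltd.", ["Acme Holdings", "Beta Ltd"])
restored = deanonymise(masked, mapping)
```

Because the mapping travels with the run rather than with the prompt, the cloud LLM only ever sees the placeholders.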
FalcoJudge · §6
Step 10 of the pipeline · apps/agents/falco_judge/ (planned)
Multi-agent verification. The judge agent reviews the drafter's output for factual coherence, source-grounding, and counter-argument coverage on position-taking outputs. Different vendor than the drafter (per §10). Day one: pass-through stub.
FalcoMath · §5
Branch D · today the calculation tool, tomorrow a full numerical agent
Branch D becomes a richer numerical-reasoning agent over time — multi-step financial calculations with intermediate verification. The Step 09 seam preserves the position so the upgrade is in-place.
Counter-argument pass · §6
Optional branch from Step 10 for position-taking outputs
When a draft takes a position (a recommendation, a buy/sell, a "this deal is risky"), a forced counter-argument pass runs through a fresh prompt. Reduces single-perspective bias.
Financial Observatory · §15
Stretch / post-MVP · apps/observatory/ (planned)
Long-term goal. Continuously-updated reflective layer over the user's corpus. Earliest manifestation: nightly contemplation — scheduled reflective pass producing a morning observation list with full provenance. Specified in v1.1 of the architecture document so it can be promoted into MVP scope by an explicit decision.
Voice Registry
apps/voice_registry/ (planned)
Backend speaker-embedding registry to support meeting-mode diarisation. Embeddings persist across meetings; speakers named post-meeting are recognised in future ones. Required before meeting transcription quality stabilises.
L3 · Infrastructure

Django Apps · Data Layer · Integrations · Hosting

Developer-owned · Singularis (Mikhail backend, Evgeny iPad) · London EC2

Django apps

accounts Built
apps/accounts/
User auth (JWT), profiles, Microsoft OAuth tokens. The User is the per-user tenant root (FK target for conversations, projects, files, memory, learning).
  • Models: User (email auth, JWT, Outlook tokens, avatar, wiki_storage_backend)
  • OAuth callback: /oauth/microsoft/callback/
  • On branch feat/chat-path-auth: issued JWTs now carry user_id/email/role claims (UserTokenObtainPairSerializer, previously referenced in settings but missing)
  • Hardening still open: UsersViewSet is a full ModelViewSet with no admin gate (any user can read/edit/delete any other); AccountSerializer is exclude-based and leaks the user's Outlook OAuth tokens to the client
chats Partial · legacy cleanup pending
apps/chats/
Conversations, messages, audio transcription WebSocket. Conversations carries the owner user FK — the join the authenticated chat path uses to enforce per-user ownership. The legacy LLMRaptorView and routing-style AIAgent chains are queued for removal once the FalcoMain pipeline is confirmed fully stable.
  • Models: Conversations (owner FK), Message, QuestionLLM, PromptExample, AIAgentLog
  • WebSocket: /ws/transcribe/ is still unauthenticated (AudioConsumer keys only on meeting_id); hardening item before external use
agents Built
apps/agents/
Houses the FalcoMain 14-step pipeline, step-graph runner, drafter LLM abstraction, FalcoDocs, WebSearch, Calculation tool, and FalcoJudge seam. The L2 harness layer's home in code. Live since 11 May 2026.
  • Audit tables: agent_runs, agent_run_steps, agent_routing_decisions
memory Partial · hydrate + write live
apps/memory/
Read (Step 02 hydrate) and async write (Step 14 Celery) operational in the FalcoMain pipeline. Full Memory Architecture Phase 1 — UserMemoryProfile, DailyLog, MemoryEntry models with bootstrap loading and session snapshots — pending. Replaces the n8n-era Supabase/MongoDB pattern. Now keyed per user via apps/memory/identity.py (user:<id> key, merged to main 25 Jun) — the tenancy guarantee for memory + learning.
wiki Built · deployed 22 May
apps/wiki/ · /api/v1/wiki/
Runtime Wiki: per-user structured knowledge store that auto-populates from conversations, meetings, documents, and projects.
  • Models: WikiPage, WikiFieldProvenance, WikiClarification
  • File watcher daemon (falconium-wiki-watcher.service) running on EC2 — inotify bridge updates Postgres + Neo4j on every file write
  • REST API: list, detail, PATCH field (with provenance stamp), soft-delete, restore, search, conversation resume
  • Channels consumer: real-time push to iPad on wiki changes
  • Celery: wiki_lint_nightly (04:00 Rome), wiki_commit_and_push_all (every 5 min)
  • 291 wiki pages populated (9 project, 229 conversation, 50 meeting, 3 document)
  • Wiki folder on EC2: /opt/falconium/wikis/admin_2/
meetings Built
apps/meetings/
Meeting records, transcripts, speaker-attributed messages, auto-generated tasks. WebSocket consumer for streaming transcription.
mails Built
apps/mails/
Microsoft Graph integration. Read, send, reply, bulk operations via OAuth.
mcp_server Built
apps/mcp_server/
Model Context Protocol — exposes AI tools (document retrieval, web search, Q&A) as a standardised interface for LLMs that support MCP natively.
services Built
apps/services/
Projects, tasks, document management, file uploads, system prompt configuration.
  • 26 May: wiki_slug and wiki_summary added to ProjectSerializer and ProjectListSerializer — wiki enrichment live in the projects API

Data layer

PostgreSQL 16 (RDS) Built
Primary relational store · eu-west-2
Users, conversations, messages, projects, meetings, documents. 37 tables; 48 MB.
  • FalcoMain audit trail: agent_runs, agent_run_steps, agent_routing_decisions — append-only
  • Wiki tables: wiki_wikipage, wiki_wikifieldprovenance, wiki_wikiclarification
  • Daily pg_dump to /opt/backups/ on EC2; pre-rebuild dump at SHA-256 cf7a57af…d4814
Neo4j Aura Ingest only
Knowledge-graph · 4d95f6de.databases.neo4j.io
Populated by the n8n Ingest workflow (LlamaParse → Neo4j) and by the wiki watcher (wiki namespace, tenancy-guarded). Django retrieval cutover from Weaviate — decided April 2026, not yet started. Weaviate remains the active retrieval store.
Weaviate Built · active production retrieval
Self-hosted on EC2 · hybrid 75/25 BM25+vector
Serves all current document retrieval via raptor_chain.py. Remains live until the Django → Neo4j retrieval cutover is complete.
Redis Built
Cache + Celery broker
Backs Celery (memory write, wiki lint, wiki git sync, ingest callbacks) and Django Channels (transcription WebSocket, wiki consumer).
AWS S3 Built
File storage
Document uploads, meeting recordings, attachments. Lifecycle policies for cost control.
AWS SSM Parameter Store Built
Unified secrets store · KMS-encrypted · eu-west-2
31 /falconium/* parameters migrated from us-east-1 → eu-west-2 on 7 May 2026. Single source of truth for every credential — backend, iPad build-time. Backend reads via config/load_environments.py.
  • Includes Neo4j, Groq, OpenAI, Anthropic, Perplexity, Sentry, DB, Redis, wiki params
  • Old us-east-1 copies pending deletion after sustained production health
  • Admin form at /admin/secrets/ is built and live (apps/secrets_admin/)
CloudWatch Logs Built · live
Centralised structured log destination
Structured JSON logs from gunicorn/daphne/celery ship to CloudWatch log group /falconium/backend (eu-west-2, 30-day retention) via a Vector agent on the EC2 host. Includes detailed API request/response payload logging (redacted). Searchable in Logs Insights by trace_id.

Integrations

Self-hosted n8n Built · thin forwarder for chat
backend.falconium.ai/n8n
FalcoMain: now a 2-node forwarder only (Webhook → HTTP Request → Django). Continues to host the Ingest document pipeline (LlamaParse → Neo4j) and FalcoEmotions for emotion classification — both out of scope for the FalcoMain rebuild.
  • Backup workflow MAIN_AGENT_Falconium [BACKUP pre-rebuild] retained for one-click revert
LlamaParse Built
Document parsing
Webhook callback into the n8n Ingest pipeline. Out of scope for the FalcoMain rebuild.
Microsoft Graph Built
Outlook email · OAuth 2.0
Tokens stored in accounts. Used by mails for read/send/reply/bulk operations. Entra app registration rebuilt April 2026 under Ilya's tenant.
Sentry Built
Error capture · trace ID propagated
Both backend and iPad. Trace IDs set on every event so a misbehaving chat in Sentry links directly to its agent_runs row. DSN rotated May 2026; new DSN held in SSM at /falconium/SENTRY_DSN.
Cloud LLMs Built
Groq · Anthropic · OpenAI · Perplexity
Reached only through the drafter LLM abstraction. Keys held in SSM. Live drafter: Groq qwen/qwen3-32b. Anthropic and OpenAI are available as ordered fallbacks when AGENTS_MODEL_RESOLUTION_ENABLED is on (default off).

Hosting

London EC2 Built · operational
eu-west-2b · t2.medium · 13.42.64.100
AWS account 471112907118. Hosts Django (gunicorn + daphne), n8n, Weaviate, Redis, the Vector log agent, wiki watcher daemon. Domain: backend.falconium.ai. Caddy reverse proxy.
  • Wiki files: /opt/falconium/wikis/
  • Backups: /opt/backups/ (daily pg_dump)
CodeDeploy pipeline Built
AWS · eu-west-2 · push-to-main triggers deploy
Backend deploys via CodeDeploy on git push to main. Git tag pre-falcomain-rebuild (commit f62e39c) is the rollback anchor. Pre-client phase: no staging environment.
iPad — open repoint Partial · REST still on Singularis
Non-chat endpoints
Chat path now goes through backend.falconium.ai (Django). All other REST and the audio WebSocket still hit the Singularis Kubernetes cluster (falconium.k8s.singularis-lab.com). Wiki enrichment fields (wiki_slug, wiki_summary) will not reach the iPad until the non-chat REST endpoints are repointed — the key outstanding unlock.
Observability · Live Logs Built · live
apps/observability/ + CloudWatch
Live systemd-journal log tail (SSE) at /admin/logs/, superuser-only. Structured JSON logs from gunicorn/daphne/celery ship to CloudWatch log group /falconium/backend (eu-west-2, 30-day retention) via a Vector agent on EC2, correlated by trace_id. Deploy: AWS CodePipeline 'london-falconium-cicd-pipeline' watches main → CodeDeploy → EC2.