Notebook
Note
Download Markdown

PROMPT 00 Main

21 min read

Sample App — Main Spec

Design and architect a secure web app that solves (describe the problem) and results in (success criteria).

DO NOT start working yet. First, ask me clarifying questions so we can define the approach together. Only begin once we’ve aligned.

Capture the spec for this app in a markdown file saved to this folder. If the app is sufficiently complicated that it needs to be broken into phases, then create numbered prompt markdown files aligned to each phase.

Start by designing the user interfaces and asking me to approve them. Design them one at a time in this order:

  1. Home (which manages the Workflow described below)
  2. Admin Dashboard
  3. Sample output report
  4. Other Admin pages like User Roles, Sessions, API, etc.

Use design.md as the design system for the design.

Every time we iterate on this app, save a summary of the work we did in this session and the next steps to a file called context-log.md in this folder. always start each session for this project by reading the context log.

DEVELOPMENT RULES TO FOLLOW


NOTE

The prompts and spec below was written by the AI based on the initial prompt above and iterations where it asked me questions and I responded. I don’t understand most of it.

Build Scope — Phase 1

We are building Phase 1 only. Phase 1 is the core workflow. Do NOT build admin features, email drafting, report creation, or advanced session management yet. Phase 2 and 3 items are listed at the bottom of this document for context only.

Phase 1 includes:

  1. Project setup (Next.js, Tailwind, shadcn/ui, Drizzle, PostgreSQL, Fly.io) **
  2. Authentication (NextAuth.js with Google OAuth, domain restriction). Drive drive.file scope ships unverified-mode for ≤10 users.
  3. Database schema
  4. Prompt selection and display
  5. File upload with validation, ClamAV scan, and Python-sidecar-driven metadata strip
  6. Pre-processing pipeline (parse, calculate, summarise) — runs in the Python sidecar
  7. Results display with formatting toggles and copy + quality trust chip
  8. Document generation — .docx + .pdf (LibreOffice headless) + Google Doc fallback when LibreOffice fails
  9. Basic session saving (auto-save, list past sessions). Retention: 120d for completed sessions, 50d for abandoned/errored/in-progress
  10. Meeting app action items integration (pull outstanding action items from client meetings, with manual paste fallback)
  11. Basic error handling and logging + structured logging with request_id propagated through Next.js + Python sidecar + OpenAI
  12. Privacy retention alignment — privacy notice text aligned to actual deletion windows; backups + R2 lifecycle aligned
  13. AI eval suite
  14. OpenAI cost budget alerts — daily/weekly thresholds → email; alert content uses error-email allowlist
  15. Auto-archive completed sessions to Drive — service-account write of output reports + encrypted JSON manifest into folder
  16. App Health Phase 1 endpoints/api/health/* for liveness (DB, OpenAI, ClamAV, Python sidecar, R2, Drive service-account, retention-job) and /api/metrics/* for analytics (restoration-rate, grounding-rate, cost-rate, eval-baseline, shadow-run state). Phase 2 dashboard wires up to these endpoints.

Phase 1 does NOT include: Admin dashboard UI, user role management UI, email template CRUD UI, Gmail draft creation, full Google Doc UX (drive picker / share controls — Phase 2; Phase 1 has Google Doc only as PDF fallback, API cost reporting page (basic alerts ship in Phase 1, admin page, advanced session management UI, the re-run workflow, the Help section UI, the full file upload to Google Drive UX (the auto-archive piece is Phase 1 ; the user-controlled upload UX with picker/change buttons is Phase 2), the App Health dashboard UI (Phase 1 ships the underlying endpoints; the dashboard UI is Phase 2), the PIPEDA right-to-deletion UI (Phase 1 handles deletion via manual SQL). Build the database tables for these features but do not build the UI or logic.


Tech Stack


Dependency Management & Maintenance

This app uses open-source npm packages. Unlike WordPress (which has a core platform and plugins to update through a dashboard), this app has no “core” to update — but its dependencies need to stay current for security and stability.

Automated Security Monitoring

GitHub Dependabot is configured on the repo (.github/dependabot.yml) with:

Socket.dev

The Claude Code security hook checks every npm install against Socket.dev for supply-chain attacks, typosquatting, and malware before any package is installed. This is in addition to Dependabot.

Update Workflow

  1. Security PRs (urgent) — Merge within 1–2 days. These fix known vulnerabilities.
  2. Routine update PRs (weekly) — Review and merge when convenient, or ask Claude Code to review them.
  3. Quarterly review — Every 3 months, check for major version upgrades (Next.js, React, etc.), deprecated packages, and end-of-life runtimes.


Disaster Recovery (Phase 2)

The app has Fly.io’s built-in daily database backups from day one. Write the full plan in Phase 2.

Targets

Backups

Restore Testing

Runbook


Workflow Health (Phase 1)

The session state machine defines transitions but on its own can’t detect a worker that crashed mid-step. This section adds a heartbeat + detector that catches orphaned sessions before users notice they’re stuck.

Heartbeat

Every Sessions row carries a last_activity_at timestamp, updated on every state transition AND every step completion. Long-running steps (Step 7 stripping, Step 8 pre-processing, Step 9 AI call) write this field every 30 seconds while running so a worker crash mid-step is detectable.

Worker-Crash Recovery

On worker startup, the boot routine queries for sessions in mid-run states (processing, analyzing, stripping) with stale heartbeats (last_activity_at > 60 seconds ago) and marks them error with cleanup. This catches sessions that were running when the previous worker died.


Authentication & Authorisation

Authentication

Use NextAuth.js with Google OAuth. The domain restriction (@yourdomain.ca) is read from the Organizations table (organization.domain), not hard-coded. On login, the app looks up the user’s email domain against the Organizations table to find their organization. If no organization matches, deny login. No external users for v1, but this design means additional organizations can be onboarded later by adding a row to Organizations — no code changes required.


Security Requirements

  1. Authentication: NextAuth.js with Google OAuth, restricted to @yourdomain.ca.

  2. Authorisation: Role-based access control enforced on both frontend routes and every API endpoint. Every API route must check the user’s role before returning data.

    Tenant-scoped reads use a helper. All reads of tenant-scoped entities (Sessions, Clients, Prompts, Outputs, Session Files, Audit Log) go through a single helper function pattern — getEntity(id, auth) — that filters by organization_id AND (user_id == auth.user_id OR auth.role == 'admin'). Never call findById(id) or raw queries that take only an ID. This is enforced by code review and by an explicit lint rule on the data-access layer.

    All primary keys are UUID v4. Sessions, Clients, Prompts, Outputs, Session Files, Idempotency Keys, File Scan Log, Audit Log — all use UUID v4 as the primary key. Sequential or predictable IDs are forbidden because they enable URL-guessing IDOR attacks.

    Admin session-view anomaly detection. Every admin session view writes session_viewed_by_admin to the audit log (already specified). If a single admin views more than 10 sessions in any 5-minute window, fire admin_session_view_anomaly to the audit log AND alert the platform admin (Slack to admin). This catches potential exfiltration attempts where an admin account is compromised.

  3. Data at rest: Encrypt uploaded financial statements in storage. Use PostgreSQL pgcrypto or encrypt files before storing in the filesystem.

  4. Data in transit: HTTPS only (enforced by Fly.io). All API calls to OpenAI over HTTPS.

  5. API keys: Stored as Fly.io secrets (environment variables) only. Never exposed to the frontend. Never logged. Never included in error messages. The admin UI may show a masked placeholder but cannot read or edit the actual key.

  6. Client name & confidential name stripping: All client identifying information and names of customers, suppliers, vendors and donors (for charities) removed from files and replaced with anonymized variables before any data leaves the server to a third-party API. These must be stored encrypted at rest in storage as variables so they can be restored after the data comes back from the AI.

  7. Session isolation: users can only access their own sessions. Admins can access all sessions.

  8. Input validation: Validate all file uploads by type and size. Maximum 25 MB per file, 100 MB per session. Accepted formats: .xlsx, .xls, .csv, .pdf, .zip. Sanitise all user inputs.

  9. Rate limiting: Rate-limit all endpoints, not just AI analysis. Defaults: AI analysis 20 per user per day (configurable by admin), login attempts 5 failed per email per 15 minutes (log to audit log), file uploads 10 per user per hour, read API endpoints 100 per user per minute, write API endpoints 30 per user per minute. Implementation: RateLimiterPostgres from rate-limiter-flexible per ER-3 (NOT in-memory — in-memory limits are per-process and break across multiple Next.js machines, making the actual rate limit N × machine_count). Counters live in a Postgres table; one query per limited request, well within budget for Phase 1 traffic.

  10. Error handling: Never expose stack traces, database queries, or internal file paths to users. Log errors server-side. Send error notification emails to admin@yourdomain.ca.

  11. PIPEDA notice (updated post-CEO-review — Cherry-1 + Cherry-5): Display a brief privacy notice in the UI explaining that financial data is processed server-side, client, supplier, vendor and donor names are stripped before AI analysis, and:

    “Successfully completed analyses are retained for 120 days. Abandoned, errored, and in-progress sessions are deleted within 50 days. Audit logs are retained for 90 days. Backups (Fly.io Postgres + R2 lifecycle) are aligned with these windows. Senior users may view your sessions during prompt comparison (shadow-run feature, Phase 1 Late). We use OpenAI’s API on a Business tier with a signed Data Processing Agreement; OpenAI retains request logs for 30 days and does not use submitted data for model training.”

    PIPEDA right-to-deletion handling (Phase 1): manual SQL handled by admin per F4.3=C; UI deferred to Phase 2.

  12. OpenAI Data Processing Agreement: Use the OpenAI API on a Business or Enterprise tier with a signed Data Processing Agreement (DPA) confirming that submitted data is not used for model training. Verify this is in place before launch. The PIPEDA notice should name the AI provider and link to their privacy policy.

  13. Envelope encryption (Tension-1=A — confirmed Phase 1; ER-4 adds rotation mechanics): Use envelope encryption for data at rest. A master key (MEK, stored as a Fly.io secret) encrypts per-session data encryption keys (DEKs). Each session gets its own DEK, which encrypts that session’s R2 files plus the encrypted JSON fields (name_stripping_map, restoration_verification, content_grounding, shadow_outputs, fellow_match_log). The DEK itself is encrypted with the MEK and stored alongside the session. This limits blast radius — a leaked DEK exposes one session, not all data.

    Rotation mechanics (ER-4): Sessions carry a mek_version integer column from Day 1. Schema supports old + new MEK coexisting during rollover. Rotation script (scripts/rotate-mek.ts, Phase 1 Late) walks Sessions in batches with --dry-run and --confirm flags, decrypts each wrapped DEK with the old MEK, re-encrypts with the new MEK + new mek_version, verifies a sample post-rotation by round-trip-decrypting an encrypted JSON field. Previous MEK kept for 30 days for dual-read / rollback. Keep an offline backup of every active MEK in a secure location separate from the server.

    Outside-voice note (codex review, 2026-05-06): the original CEO-plan-draft attempted to defer envelope-per-session DEKs to Phase 2 to save ~3 days CC. Codex correctly flagged this as a Phase 1 security downgrade, not migration debt — the spec lists per-session DEKs specifically to limit blast radius. Restored to Phase 1 unconditionally.

  14. Deep file metadata stripping: Before sending any data to the AI, strip identifying information using purpose-built libraries that go beyond shallow document-properties stripping.

For Excel (.xlsx / .xls) — use openpyxl:

For PDFs — use pikepdf:

Post-strip verification: After stripping, re-open the file and confirm:

If any residual content is found, log a warning and surface it to the user before they confirm the stripped version: “PDF stripped, but [N] suspicious entries remain. Manual review recommended before continuing.” The session’s existing Step 7 stripping audit log gets a metadata_residuals: [...] array so warnings persist with the session.

Tolerance: warnings, not hard rejects — PDFs are inherently messy and the user gets to decide whether to proceed. 15. Malicious file protection: Reject Excel files that contain VBA macros (check for macro content and reject or strip). Sanitize zip extraction to prevent zip-slip path traversal attacks (validate that extracted paths don’t escape the target directory). Strip Excel formulas to values during parsing — do not evaluate formulas. Scan PDFs for embedded JavaScript and reject if found.

ClamAV deployment:

Fail-closed degradation: the health endpoint pings ClamAV. If ClamAV is unreachable, all uploads are rejected with: “File scanning is temporarily unavailable. Please try again in a few minutes.” Every blocked upload during an outage logs clamav_unavailable to the audit log so the admin can re-invite affected users once service resumes. After 5 minutes of continuous downtime, an admin alert fires (email + Slack webhook).

Magic-byte sniff (independent layer, runs before ClamAV):

ExtensionRequired magic bytes
.xlsx, .xls (post-2007 OOXML)50 4B 03 04 (ZIP signature, since OOXML is a ZIP)
.xls (legacy CFBF)D0 CF 11 E0 A1 B1 1A E1
.pdf25 50 44 46 2D (%PDF-)
.csvNo magic bytes; enforce UTF-8 / Latin-1 / UTF-16 BOM detection — reject binary content
.zip50 4B 03 04

If extension and magic bytes don’t match, reject with: “This file does not match its file extension. Please re-save it in the correct format and try again.” Log to audit log as file_rejected_magic_mismatch.

Scan log: every scan recorded to the new FileScanLog entity (timestamp, user_id, filename, file_size, scan_engine, scan_result, scan_duration_ms, virus_name_if_found). Retention: 90 days. Used for forensics and ClamAV health metrics.

Step 5 ordering: the file-upload pipeline runs in this order — extension check → magic-byte sniff → ClamAV → content-attack mitigations (below) → metadata strip → store. Any rejection along the way short-circuits with the appropriate user-facing message and audit-log entry.


User Workflow

Multi-tenancy note: Every database query in the workflow must filter by organization_id. This is both a multi-tenancy requirement and a defense-in-depth measure against data leaks between organizations.

Idempotency note (Tension-3 — refined hash inputs; ER-10 clarifies timing + PK): The hash is computed at the FINAL “Run Analysis” click — the click the user presses after reviewing the strip preview, after acknowledging data-quality warnings, after confirming the prompt. By that point all hash inputs exist on the client. Earlier clicks (start upload, confirm strip, confirm acks) are NOT idempotency-protected because they’re step-completion submissions, not workflow-run commitments. The IdempotencyKeys.key column is the natural primary key (the hash itself is already 256-bit collision-resistant; the table only ever holds rows for 300s; no UUID id column is added — explicit carve-out from the universal-UUID-PK rule). The “Run Analysis” button computes a deterministic SHA-256 idempotency hash on the client of:

The server stores every key in the IdempotencyKeys table with a state machine (pendingsucceeded / failed) and a 300-second TTL. Server-side TTL handles time-window dedup; the hash itself does not include a timestamp window. On a second request with the same key:

Because the hash includes prompt-version, mode, and manual inputs, two clicks with the same files but different mode/prompt-version/manual-input do NOT collapse into one run. Same files + same prompt_version + same mode + same manual inputs + same acks within 300s = same idempotency key, regardless of which browser tab issued the click. This protects against the “two-tab double-click” failure mode while still allowing legitimate distinct runs.

Step 1 — Authentication

The user (user) must authenticate using their Google Workspace account via NextAuth.js. The permitted domain is read from the Organizations table, not hard-coded.

Step 2 — Client Selection

User selects a client name from a dropdown. The client list is populated from a Google Sheet that syncs. The app reads the Google Sheet using a Google Service Account. The Sheet ID and Service Account credentials are stored as Fly.io secrets.

The sync runs on every app load / user login to keep the client list fresh. The admin Client Sync page (Phase 2) will also have a manual “Sync Now” button.

Sync validation: Log every sync event (clients added, updated, unchanged). If a sync detects an unusually large number of new clients (configurable threshold, default 50), pause the sync and alert the admin rather than importing blindly. This prevents accidental or malicious bulk injection of fake client records.

Each client record includes: client name, sector, Google Drive client folder ID, Google Drive folder name. (Phase 2 — needed for the file upload step; column may be blank for some clients).

FieldColumn header in spreadsheet
Client namefc_client_name
Sectorclient_sector
Google Drive client folder IDdrive_client_folder_id
Google Drive client folderdrive_client_folder_name
Google Drive client folder URLdrive_client_folder_url

If the client is not in the database, the user can enter the client name manually and select the sector from a dropdown.

Manual client tokenization:

Sample Client UX : A seeded Sample Client (Demo) row (is_sample = TRUE) appears in the dropdown with a DEMO chip + amber background to differentiate it from real clients. When selected, Step 4 (Display Required Reports) shows a prominent banner: “You are running the demo client. Output is for practice only and will not be sent to anyone.” The Step 10b output is also flagged. Sample sessions are filtered out of all admin metrics, cost-alert evaluation, App Health analytics, and audit-log search by default (per Security Req #25). Admins can toggle “include samples” when debugging the demo flow.

First-run experience : New users landing on Day 1 see an empty client list with a Try the Demo CTA pointing at the Sample Client. Drops time-to-first-value from “30 minutes of fumbling” to “5 minutes confidence” without requiring a 1:1 onboarding meeting.

Step 3 — Mode Selection

Two modes: Preview and Final.

Step 4 — Display Required Reports & Prompt

After the client is selected, display:

  1. A list of the required financial statements for the client’s sector (see PROMPT-03-required-reports.md).
  2. The prompt that will be used for the AI analysis.

Each client must have an assigned sector and a default prompt. There are two kinds of prompts:

Prompts are named by sector and have full CRUD and clone functionality. Clone means duplicate all fields except date created and unique IDs, then open as a draft.

The humanizer rules (see the bottom section of PROMPT-02-ai-prompts.md) are stored as an editable section of the prompt entity, not hard-coded.