AI Document Analysis

Turn any PDF into
structured intelligence.
Upload a contract, invoice, policy or report. Claude extracts parties, dates, amounts and obligations — then answers your questions about it.
PDF text extracted in your browser via PDF.js  ·  API key secured server-side via Netlify  ·  Zero file uploads — only text is transmitted
Portfolio Project 2 / 5
Claude API  ·  PDF.js  ·  Netlify
Vanilla JS  ·  Single-file frontend
Live & deployed

The Problem
Professionals read entire documents
to find 3–4 facts.
Legal, HR, finance, and procurement teams spend hours in documents just to locate specific information. The reading is mechanical — the AI does it better.
No consistent extraction format
The same contract type gets reviewed differently by every person who touches it
Generic AI tools require copy-paste
Text has to be pasted into a chatbot by hand, and the answers come back as unstructured freeform prose
Junior staff miss key clauses
Review quality depends entirely on the reviewer’s experience — no structured fallback
Business owners receive contracts blind
Can’t afford a lawyer review for every vendor relationship but still need to understand what they’re signing

The Solution
Upload. Extract. Ask.
Three steps, seconds not hours.
A single-file web app powered by Claude’s 200K-token context window. No login, no server file storage, no manual copy-paste.
1
Upload or pick a sample document
Drag and drop any PDF (contract, invoice, HR policy, research paper) up to 10 MB, or try one of 3 preloaded samples instantly. PDF.js extracts all text client-side. The file itself is never uploaded; only the extracted text is sent for analysis.
2
Claude returns 9 structured fields as strict JSON
Document type, summary, parties, key dates, amounts, obligations, important clauses, a confidence rating, and 4 document-specific follow-up questions. Each is rendered as its own card with a confidence badge.
3
Ask specific questions in the chat interface
Claude answers from the document only. The system prompt instructs it not to introduce information that is missing from the text, so every answer is anchored to what the document actually says. Suggestion chips auto-fill the input.
Extraction Schema — 9 fields · strict JSON · No prose · No markdown fences
Field · Type · Description
documentType · string · e.g. “Service Agreement”, “Invoice”, “HR Policy”
summary · string · 5–7 sentence plain-language summary, no jargon
parties[] · string[] · All named parties, companies, and signatories
keyDates[] · {label, date} · Effective date, expiry, payment due, review date
amounts[] · {label, value} · Contract value, invoice total, fees, penalties
obligations[] · string[] · Key responsibilities and obligations per party
clauses[] · {name, summary} · Important clauses with a one-sentence summary each
confidence · enum · high / medium / low — Claude’s self-assessed extraction confidence
suggestions[] · string[] · 4 document-specific follow-up questions from what Claude found
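
To make the schema concrete, here is a minimal sketch of a shape check for the 9 fields. The field names and types come from the schema above; the validateExtraction() helper itself is hypothetical, and treating every date and value as a string is an assumption.

```javascript
// Hypothetical validator for the 9-field extraction result.
// Assumption: dates and amount values arrive as strings.
const isString = (v) => typeof v === "string";
const isStringArray = (v) => Array.isArray(v) && v.every(isString);
const isPairArray = (v, a, b) =>
  Array.isArray(v) &&
  v.every((x) => x !== null && typeof x === "object" && isString(x[a]) && isString(x[b]));

const FIELD_CHECKS = {
  documentType: isString,
  summary: isString,
  parties: isStringArray,
  keyDates: (v) => isPairArray(v, "label", "date"),
  amounts: (v) => isPairArray(v, "label", "value"),
  obligations: isStringArray,
  clauses: (v) => isPairArray(v, "name", "summary"),
  confidence: (v) => ["high", "medium", "low"].includes(v),
  suggestions: isStringArray,
};

// Returns the names of fields that are missing or have the wrong shape.
function validateExtraction(result) {
  if (result === null || typeof result !== "object") return Object.keys(FIELD_CHECKS);
  return Object.keys(FIELD_CHECKS).filter((key) => !FIELD_CHECKS[key](result[key]));
}
```

An empty array means the result matches the schema, so a card can be rendered for each field; any returned name points at the field that needs a retry or a fallback.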

Skills Demonstrated
What this project proves I can build
Each capability is live and observable in the deployed tool — not a claim, a demonstration.
Claude API — long-document processing
Sending up to 150,000 characters of unstructured PDF text to Claude and receiving validated structured output. Truncation logic appends a context note so Claude knows when content was cut.
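
The truncation step could look roughly like the sketch below. The 150,000-character limit is the figure stated above; the function name and the wording of the appended note are illustrative assumptions, not the deployed text.

```javascript
const MAX_CHARS = 150000; // limit stated in the description above

// Cuts the text at the limit and appends a note so the model knows it is partial.
// The note wording is illustrative only.
function truncateForModel(text, maxChars = MAX_CHARS) {
  if (text.length <= maxChars) {
    return { text, truncated: false };
  }
  const note =
    "\n\n[Note: document truncated to the first " +
    maxChars + " of " + text.length + " characters.]";
  return { text: text.slice(0, maxChars) + note, truncated: true };
}
```

Returning the truncated flag alongside the text lets the interface tell the user that only part of the document was analysed.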
Structured extraction via prompt engineering
Strict JSON schema output with no prose or markdown fences. The same parseJsonOutput() sanitisation pattern applied consistently across all 5 portfolio projects.
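
The name parseJsonOutput() comes from the description above, but its body is not shown there. The following is one plausible sketch of such a sanitiser, assuming the usual failure modes are a stray markdown fence or a sentence of prose around the JSON object.

```javascript
// Hypothetical body for a parseJsonOutput()-style helper.
// The fence marker is built at runtime to keep this snippet easy to embed.
const FENCE = "`".repeat(3);
const FENCE_PATTERN = new RegExp("^" + FENCE + "(?:json)?\\s*|\\s*" + FENCE + "$", "gi");

function parseJsonOutput(raw) {
  // Remove a leading fence (with optional "json" tag) and a trailing fence.
  const text = raw.trim().replace(FENCE_PATTERN, "");
  // Keep only the outermost {...} span in case prose surrounds the object.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  return JSON.parse(text.slice(start, end + 1));
}
```

JSON.parse still throws on malformed content, so the caller should wrap the call in try/catch and surface a retry option instead of rendering partial cards.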
Client-side PDF parsing with PDF.js
Text extracted entirely in the browser via Mozilla’s PDF.js — looping pages with page.getTextContent(). No file ever reaches a server. Stateless, private, and fast.
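
A minimal sketch of the page loop, assuming `pdf` is a PDF.js document object such as the one returned by `await pdfjsLib.getDocument({ data }).promise`. The function name and the choice of separators between items and pages are assumptions.

```javascript
// `pdf` is expected to behave like a PDF.js PDFDocumentProxy.
async function extractPdfText(pdf) {
  const pages = [];
  // PDF.js page numbers start at 1, not 0.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Text items carry their string in `str`; items without one are skipped as empty.
    pages.push(content.items.map((item) => item.str ?? "").join(" "));
  }
  // Separate pages with a blank line so page boundaries survive in the text.
  return pages.join("\n\n");
}
```

Because the function only relies on numPages, getPage() and getTextContent(), it can be exercised with a small stub object without loading a real PDF.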
Grounding enforcement — zero hallucination
Chat system prompt explicitly prohibits Claude from inventing information not in the document. The most critical production constraint for document Q&A — failing here destroys user trust.
Dual-mode serverless proxy
One Netlify function handles both analyze and chat modes, routing on the mode field. ANTHROPIC_API_KEY stays server-side. Zero client-side exposure.
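
The routing could be sketched as below. The `mode` and `documentText` fields come from this page; `createHandler` and the injected `callClaude` helper are hypothetical names, with `callClaude` standing in for the code that would read ANTHROPIC_API_KEY from the server environment and call the API.

```javascript
// Builds a Netlify-style handler. `callClaude(mode, payload)` is a hypothetical
// async helper that performs the actual API request on the server.
function createHandler(callClaude) {
  const reply = (statusCode, data) => ({
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data),
  });

  return async function handler(event) {
    if (event.httpMethod !== "POST") {
      return reply(405, { error: "Method not allowed" });
    }
    let payload;
    try {
      payload = JSON.parse(event.body || "{}");
    } catch {
      return reply(400, { error: "Invalid JSON body" });
    }
    if (typeof payload.documentText !== "string" || payload.documentText.length === 0) {
      return reply(400, { error: "documentText is required" });
    }
    // Route on the mode field; anything else is rejected before reaching the API.
    if (payload.mode !== "analyze" && payload.mode !== "chat") {
      return reply(400, { error: "Unknown mode" });
    }
    return reply(200, await callClaude(payload.mode, payload));
  };
}
```

In a real function file the result of createHandler() would be exported as the function's handler. Injecting callClaude keeps the routing testable without a network call or an API key.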
Prompt caching on repeated Q&A
Document text injected as a cached prefix using cache_control: ephemeral. Repeated follow-up questions don’t re-process the full document — latency and cost stay low.
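
As a rough illustration, a chat request body with the document as a cached prefix might be assembled like this. The model name is the one listed on this page; the function name, the max_tokens value, and the instruction wording are assumptions.

```javascript
// Builds a Messages API request body for chat mode.
// The instruction text and max_tokens are illustrative assumptions.
function buildChatRequest(documentText, history, question) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: "Answer only from the document below. If the answer is not in it, say so.",
      },
      // The cache marker goes on the last block of the stable prefix, so the
      // instructions and the document text are cached together.
      { type: "text", text: documentText, cache_control: { type: "ephemeral" } },
    ],
    // Only this part changes between follow-up questions.
    messages: [...history, { role: "user", content: question }],
  };
}
```

Keeping the document in the system prefix and the conversation in messages means the prefix stays byte-identical across questions, which is what allows the cache to be reused.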

Architecture
Three layers. No database. Entirely stateless.
The document lives only in the user’s browser session. Only extracted text reaches Netlify — no file storage, no persistence.
Browser

index.html — PDF.js extracts text client-side
All pages looped and concatenated. Character count measured. Truncated to 150K chars if needed.
↓  POST { mode, documentText, fileName }
Netlify

chat.js — dual-mode serverless function
analyze: extraction prompt → Claude → parse JSON.  chat: document context → grounded Q&A with history.
↓  Anthropic API call with ANTHROPIC_API_KEY
Claude API

claude-sonnet-4-6 — 200K token context
Returns strict 9-field JSON (analyze) or grounded answers (chat). Prompt caching via cache_control: ephemeral.
↓  Parsed JSON → rendered cards + chat interface
Rendered

9 extraction cards + confidence badge + chat
Each field in its own card. Green / amber / red confidence badge. Suggestion chips auto-fill the chat input.
Frontend
Vanilla JS + HTML + CSS — single file, no framework.
Proxy
Netlify Serverless Function. API key server-side. Same pattern across all 5 projects.
Hosting
Hostinger subdomain. GitHub → Netlify auto-deploy.

Results
Built to spec. Deployed. Measurable.
All success criteria met. Three sample documents preloaded — visitors test immediately without uploading anything.

50+ · Pages handled reliably
90% · Field extraction accuracy
<15s · Full analysis, 10-page doc
0 · API keys exposed client-side

Sample 01 — Service Agreement · Parties, dates, payment terms, termination clause

All key fields extracted across a 2-page fictional contract. Effective date, notice period, and liability cap all identified.
Sample 02 — Tax Invoice · Line items, totals, due date, payment terms

Amounts array populated with line-item breakdown. Payment due date parsed and labeled. Invoice number and parties both extracted.
Sample 03 — Remote Work Policy · Obligations, effective date, compliance rules

Obligations listed per staff category. Effective and review dates captured. Clauses summarised in plain language.

Use Cases
Who benefits from this
Best fit for organisations handling documents at volume without dedicated review resources. Can be customised or white-labelled for any of these verticals.
Law firms & legal teams
Junior associates get a structured first-pass extraction before escalating to senior review. Consistent format every time, regardless of document complexity.
HR departments
Extract key terms from vendor policies, employment contracts, and compliance documents at volume. Obligations and effective dates surfaced automatically.
Real estate agencies
Key dates, obligations, and liability terms from lease agreements and property contracts. Chat interface lets agents ask specific questions without re-reading the full document.
Procurement & business owners
Payment terms, SLAs, and liability caps from vendor contracts — structured and readable before signing. No lawyer required for routine agreement review.

Work with Cenred

Want this built for your team?
I build custom document intelligence tools, AI automation pipelines, and API integrations. Every project deployed, documented, and delivered end-to-end.
Davao, Philippines  ·  Available for freelance projects  ·  cenredportfolio.bizguro.net