AI Document Analysis
Turn any PDF into structured intelligence.
Upload a contract, invoice, policy or report. Claude extracts parties, dates, amounts and obligations, then answers your questions about it.
PDF text extracted in your browser via PDF.js · API key secured server-side via Netlify · Zero file uploads — only text is transmitted
Portfolio Project 2 / 5
Claude API · PDF.js · Netlify · Vanilla JS · Single-file frontend · Live & deployed
The Problem
Professionals read entire documents to find 3–4 facts.
Legal, HR, finance, and procurement teams spend hours in documents just to locate specific information. The reading is mechanical, which makes it a good fit for AI.
✕ No consistent extraction format
The same contract type gets reviewed differently by every person who touches it.
✕ Generic AI tools require copy-paste
Text has to be pasted into a chatbot by hand, and the answers come back as unstructured free-form prose.
✕ Junior staff miss key clauses
Review quality depends entirely on the reviewer’s experience, with no structured fallback.
✕ Business owners receive contracts blind
They can’t afford a lawyer’s review for every vendor relationship but still need to understand what they’re signing.
The Solution
Upload. Extract. Ask.
Three steps, in seconds rather than hours. A single-file web app powered by Claude’s 200K-token context window. No login, no server-side file storage, no manual copy-paste.
1
Upload or pick a sample document
Drag and drop any PDF (contract, invoice, HR policy, research paper) up to 10 MB, or try one of 3 preloaded samples instantly. PDF.js extracts all text client-side. The file itself is never uploaded; only the extracted text is sent for analysis.
2
Claude returns 9 structured fields as strict JSON
Document type, summary, parties, key dates, amounts, obligations, important clauses, a confidence rating, and 4 document-specific follow-up questions. Each field is rendered as its own card, with a confidence badge.
3
Ask specific questions in the chat interface
Claude answers from the document only. The system prompt instructs it to stay within the document’s text, which guards against hallucinated answers, so every answer is anchored to what the document actually says. Suggestion chips auto-fill the input.
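Step 1 above can be sketched roughly as follows. This is a minimal illustration, not the deployed code: the function name extractPdfText is hypothetical, and `pdf` is assumed to be the document object that PDF.js resolves from pdfjsLib.getDocument(...).promise.

```javascript
// Sketch: concatenate the text of every page of a loaded PDF.js document.
// `pdf` is assumed to be the object resolved by pdfjsLib.getDocument(...).promise.
async function extractPdfText(pdf) {
  const pages = [];
  // PDF.js page numbers start at 1, not 0.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Each text item carries its string in `str`; items without one are skipped.
    const pageText = content.items.map((item) => item.str ?? "").join(" ");
    pages.push(pageText);
  }
  // Separate pages with a blank line so page boundaries survive in the text.
  return pages.join("\n\n");
}
```

In the browser, `pdf` would typically come from something like pdfjsLib.getDocument({ data: await file.arrayBuffer() }).promise, where `file` is the dropped File object. Taking the loaded document as a parameter keeps the loop itself easy to test with a stub.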
Extraction Schema: 9 fields · strict JSON · No prose · No markdown fences
Field          Type             Description
documentType   string           e.g. “Service Agreement”, “Invoice”, “HR Policy”
summary        string           5–7 sentence plain-language summary, no jargon
parties[]      string[]         All named parties, companies, and signatories
keyDates[]     {label, date}    Effective date, expiry, payment due, review date
amounts[]      {label, value}   Contract value, invoice total, fees, penalties
obligations[]  string[]         Key responsibilities and obligations per party
clauses[]      {name, summary}  Important clauses with a one-sentence summary each
confidence     enum             high / medium / low (Claude’s self-assessed extraction confidence)
suggestions[]  string[]         4 document-specific follow-up questions from what Claude found
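To make the schema concrete, here is a hedged sketch of how a response could be sanitised and checked against the 9 fields. It is not the project's parseJsonOutput(); the name parseExtraction and the exact checks are assumptions chosen for illustration.

```javascript
// The 9 top-level fields from the extraction schema above.
const REQUIRED_FIELDS = [
  "documentType", "summary", "parties", "keyDates", "amounts",
  "obligations", "clauses", "confidence", "suggestions",
];
const CONFIDENCE_LEVELS = ["high", "medium", "low"];

// Hypothetical helper: parse the model's raw text and validate its shape.
function parseExtraction(rawText) {
  // Keep only the outermost {...}. This also discards markdown fences or
  // stray prose if the model adds them despite the instructions.
  const start = rawText.indexOf("{");
  const end = rawText.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  const data = JSON.parse(rawText.slice(start, end + 1));

  const missing = REQUIRED_FIELDS.filter((field) => !(field in data));
  if (missing.length > 0) {
    throw new Error("Missing fields: " + missing.join(", "));
  }
  if (!CONFIDENCE_LEVELS.includes(data.confidence)) {
    throw new Error("Unexpected confidence value: " + data.confidence);
  }
  return data;
}
```

Failing loudly on a missing field is a design choice here: a card rendered from a partial object would look like a confident extraction when it is not.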
Skills Demonstrated
What this project proves I can build
Each capability is live and observable in the deployed tool — not a claim, a demonstration.
Claude API — long-document processing
Sending up to 150,000 characters of unstructured PDF text to Claude and receiving validated structured output. Truncation logic appends a context note so Claude knows when content was cut.
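The truncation step described above might look like the sketch below. The 150,000-character limit comes from this page; the function name truncateForClaude and the wording of the context note are assumptions.

```javascript
const MAX_CHARS = 150000; // character limit stated above

// Hypothetical helper: cut overly long text and tell the model it was cut.
function truncateForClaude(documentText, maxChars = MAX_CHARS) {
  if (documentText.length <= maxChars) {
    return { text: documentText, truncated: false };
  }
  // Assumed note wording; the point is that the model is told content is missing.
  const note =
    `\n\n[Note: this document was truncated to the first ${maxChars} ` +
    `of ${documentText.length} characters. Later content is not shown.]`;
  return { text: documentText.slice(0, maxChars) + note, truncated: true };
}
```

Returning the `truncated` flag alongside the text lets the page warn the user as well, so a missing clause near the end of a long contract is not mistaken for an absent one.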
Structured extraction via prompt engineering
Strict JSON schema output with no prose or markdown fences. The same parseJsonOutput() sanitisation pattern is applied consistently across all 5 portfolio projects.
Client-side PDF parsing with PDF.js
Text is extracted entirely in the browser via Mozilla’s PDF.js, looping over pages with page.getTextContent(). No file ever reaches a server. Stateless, private, and fast.
Grounding enforcement against hallucination
The chat system prompt explicitly prohibits Claude from inventing information not in the document. This is the most critical production constraint for document Q&A: failing here destroys user trust.
Dual-mode serverless proxy
One Netlify function handles both analyze and chat modes, routing on the mode field. ANTHROPIC_API_KEY stays server-side. Zero client-side exposure.
Prompt caching on repeated Q&A
Document text is injected as a cached prefix using cache_control: ephemeral. Repeated follow-up questions within the cache lifetime don’t re-process the full document, so latency and cost stay low.
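The grounding and caching points above can be combined in one request body. The sketch below shows one possible shape for a Messages API payload, with the document as a cached system block. The function name buildChatRequest, the prompt wording, and the max_tokens value are assumptions, not the deployed code.

```javascript
// Hypothetical builder for the chat-mode request body.
function buildChatRequest(documentText, history, question) {
  return {
    model: "claude-sonnet-4-6", // model name as listed on this page
    max_tokens: 1024, // assumed value
    system: [
      {
        type: "text",
        // Assumed wording of the grounding instruction.
        text:
          "Answer only from the document provided below. " +
          "If the answer is not in the document, say so instead of guessing.",
      },
      {
        type: "text",
        text: `<document>\n${documentText}\n</document>`,
        // Marks the end of the cacheable prefix: everything up to and
        // including this block can be reused by follow-up questions.
        cache_control: { type: "ephemeral" },
      },
    ],
    // Earlier turns first, then the new question.
    messages: [...history, { role: "user", content: question }],
  };
}
```

Keeping the document in the system blocks and the conversation in messages means the prefix stays byte-identical between questions, which is what allows a cache hit.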
Architecture
Three layers. No database. Entirely stateless.
The document lives only in the user’s browser session. Only extracted text reaches Netlify — no file storage, no persistence.
Browser
index.html — PDF.js extracts text client-side
All pages looped and concatenated. Character count measured. Truncated to 150K chars if needed.
↓ POST { mode, documentText, fileName }
Netlify
chat.js — dual-mode serverless function
analyze: extraction prompt → Claude → parse JSON. chat: document context → grounded Q&A with history.
↓ Anthropic API call with ANTHROPIC_API_KEY
Claude API
claude-sonnet-4-6 — 200K token context
Returns strict 9-field JSON (analyze) or grounded answers (chat). Prompt caching via cache_control: ephemeral.
↓ Parsed JSON → rendered cards + chat interface
Rendered
9 extraction cards + confidence badge + chat
Each field in its own card. Green / amber / red confidence badge. Suggestion chips auto-fill the chat input.
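The Netlify layer in the flow above can be sketched as a single handler that routes on the mode field. This is an illustration under assumptions: createHandler and callClaude are hypothetical names, and the chat fields `question` and `history` are not confirmed by this page, which only shows { mode, documentText, fileName }.

```javascript
// JSON response in the Lambda-style shape that Netlify functions return.
function jsonResponse(statusCode, data) {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data),
  };
}

// `callClaude` is a hypothetical async function that performs the real API
// call with ANTHROPIC_API_KEY from the server environment. It is injected
// here so the routing logic can be exercised without network access.
function createHandler(callClaude) {
  return async function handler(event) {
    if (event.httpMethod !== "POST") {
      return jsonResponse(405, { error: "Method not allowed" });
    }

    let payload;
    try {
      payload = JSON.parse(event.body || "{}");
    } catch {
      return jsonResponse(400, { error: "Body must be valid JSON" });
    }

    const { mode, documentText, fileName } = payload;
    if (typeof documentText !== "string" || documentText.length === 0) {
      return jsonResponse(400, { error: "documentText is required" });
    }

    if (mode === "analyze") {
      return jsonResponse(200, await callClaude({ mode, documentText, fileName }));
    }
    if (mode === "chat") {
      // Assumed field names for the chat payload.
      const { question, history = [] } = payload;
      return jsonResponse(200, await callClaude({ mode, documentText, question, history }));
    }
    return jsonResponse(400, { error: `Unknown mode: ${mode}` });
  };
}
```

In a deployed function the result of createHandler(...) would be exported as the function's handler. Because the key is read only inside callClaude on the server, the browser never sees it.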
Results
Built to spec. Deployed. Measurable.
All success criteria met. Three sample documents preloaded — visitors test immediately without uploading anything.
Sample 01: Service Agreement
Parties, dates, payment terms, termination clause
All key fields extracted across a 2-page fictional contract. Effective date, notice period, and liability cap all identified.
Sample 02: Tax Invoice
Line items, totals, due date, payment terms
Amounts array populated with line-item breakdown. Payment due date parsed and labeled. Invoice number and parties both extracted.
Sample 03: Remote Work Policy
Obligations, effective date, compliance rules
Obligations listed per staff category. Effective and review dates captured. Clauses summarised in plain language.
Use Cases
Who benefits from this
Best fit for organisations handling documents at volume without dedicated review resources. Can be customised or white-labelled for any of these verticals.
Law firms & legal teams
Junior associates get a structured first-pass extraction before escalating to senior review. Consistent format every time, regardless of document complexity.
HR departments
Extract key terms from vendor policies, employment contracts, and compliance documents at volume. Obligations and effective dates surfaced automatically.
Real estate agencies
Key dates, obligations, and liability terms from lease agreements and property contracts. Chat interface lets agents ask specific questions without re-reading the full document.
Procurement & business owners
Payment terms, SLAs, and liability caps from vendor contracts — structured and readable before signing. No lawyer required for routine agreement review.
Work with Cenred
Want this built for your team?
I build custom document intelligence tools, AI automation pipelines, and API integrations. Every project deployed, documented, and delivered end-to-end.
Davao, Philippines · Available for freelance projects · cenredportfolio.bizguro.net
