project-2-document-intelligence

AI Document Analysis

Turn any PDF into
structured intelligence.

Upload a contract, invoice, policy or report. Claude extracts parties, dates, amounts and obligations — then answers your questions about it.

PDF text extracted in your browser via PDF.js · API key secured server-side via Netlify · Zero file uploads — only text is transmitted

Portfolio Project 2 / 5
Claude API · PDF.js · Netlify
Vanilla JS · Single-file frontend
Live & deployed

▶ View Live Demo

↗ GitHub Repo

The Problem

Professionals read entire documents
to find 3–4 facts.

Legal, HR, finance, and procurement teams spend hours in documents just to locate specific information. The reading is mechanical — the AI does it better.

No consistent extraction format

The same contract type gets reviewed differently by every person who touches it

Generic AI tools require copy-paste

Manual text transfer into chatbots and unstructured freeform responses back

Junior staff miss key clauses

Review quality depends entirely on the reviewer’s experience — no structured fallback

Business owners receive contracts blind

Can’t afford a lawyer review for every vendor relationship but still need to understand what they’re signing

The Solution

Upload. Extract. Ask.
Three steps, seconds not hours.

A single-file web app powered by Claude’s 200K context window. No login, no server file storage, no manual copy-paste.

Upload or pick a sample document

Drag and drop any PDF — contract, invoice, HR policy, research paper — up to 10MB. Or try one of 3 preloaded samples instantly. PDF.js extracts all text client-side. Nothing is uploaded to any server.

Claude returns 9 structured fields as strict JSON

Document type, parties, key dates, amounts, obligations, important clauses, confidence rating, and 4 document-specific follow-up questions — each rendered as its own card with a confidence badge.

Ask specific questions in the chat interface

Claude answers from the document only — grounded by the system prompt so it cannot hallucinate. Every answer is anchored to what the document actually says. Suggestion chips auto-fill the input.

Extraction Schema — 9 fields · strict JSONNo prose · No markdown fences

documentTypestringe.g. “Service Agreement”, “Invoice”, “HR Policy”

summarystring5–7 sentence plain-language summary, no jargon

parties[]string[]All named parties, companies, and signatories

keyDates[]{label, date}Effective date, expiry, payment due, review date

amounts[]{label, value}Contract value, invoice total, fees, penalties

obligations[]string[]Key responsibilities and obligations per party

clauses[]{name, summary}Important clauses with a one-sentence summary each

confidenceenumhigh / medium / low — Claude’s self-assessed extraction confidence

suggestions[]string[]4 document-specific follow-up questions from what Claude found

Skills Demonstrated

What this project proves I can build

Each capability is live and observable in the deployed tool — not a claim, a demonstration.

Claude API — long-document processing

Sending up to 150,000 characters of unstructured PDF text to Claude and receiving validated structured output. Truncation logic appends a context note so Claude knows when content was cut.

Structured extraction via prompt engineering

Strict JSON schema output with no prose or markdown fences. The same parseJsonOutput() sanitisation pattern applied consistently across all 5 portfolio projects.

Client-side PDF parsing with PDF.js

Text extracted entirely in the browser via Mozilla’s PDF.js — looping pages with page.getTextContent(). No file ever reaches a server. Stateless, private, and fast.

Grounding enforcement — zero hallucination

Chat system prompt explicitly prohibits Claude from inventing information not in the document. The most critical production constraint for document Q&A — failing here destroys user trust.

Dual-mode serverless proxy

One Netlify function handles both analyze and chat modes, routing on the mode field. ANTHROPIC_API_KEY stays server-side. Zero client-side exposure.

Prompt caching on repeated Q&A

Document text injected as a cached prefix using cache_control: ephemeral. Repeated follow-up questions don’t re-process the full document — latency and cost stay low.

Architecture

Three layers. No database. Entirely stateless.

The document lives only in the user’s browser session. Only extracted text reaches Netlify — no file storage, no persistence.

Browser

index.html — PDF.js extracts text client-side

All pages looped and concatenated. Character count measured. Truncated to 150K chars if needed.

↓ POST { mode, documentText, fileName }

Netlify

chat.js — dual-mode serverless function

analyze: extraction prompt → Claude → parse JSON. chat: document context → grounded Q&A with history.

↓ Anthropic API call with ANTHROPIC_API_KEY

Claude API

claude-sonnet-4-6 — 200K token context

Returns strict 9-field JSON (analyze) or grounded answers (chat). Prompt caching via cache_control: ephemeral.

↓ Parsed JSON → rendered cards + chat interface

Rendered

9 extraction cards + confidence badge + chat

Each field in its own card. Green / amber / red confidence badge. Suggestion chips auto-fill the chat input.

Frontend

Vanilla JS + HTML + CSS — single file, no framework.

Proxy

Netlify Serverless Function. API key server-side. Same pattern across all 5 projects.

Hosting

Hostinger subdomain. GitHub → Netlify auto-deploy. Live site ↗

Results

Built to spec. Deployed. Measurable.

All success criteria met. Three sample documents preloaded — visitors test immediately without uploading anything.

50+

Pages handled reliably

90%

Field extraction accuracy

<15s

Full analysis, 10-page doc

API keys exposed client-side

Sample 01 — Service AgreementParties, dates, payment terms, termination clause

All key fields extracted across a 2-page fictional contract. Effective date, notice period, and liability cap all identified.

Sample 02 — Tax InvoiceLine items, totals, due date, payment terms

Amounts array populated with line-item breakdown. Payment due date parsed and labeled. Invoice number and parties both extracted.

Sample 03 — Remote Work PolicyObligations, effective date, compliance rules

Obligations listed per staff category. Effective and review dates captured. Clauses summarised in plain language.

↗ document-intelligence-tool.netlify.app
↗ GitHub: Cenred-Document-Intelligence-Tool

Use Cases

Who benefits from this

Best fit for organisations handling documents at volume without dedicated review resources. Can be customised or white-labelled for any of these verticals.

Law firms & legal teams

Junior associates get a structured first-pass extraction before escalating to senior review. Consistent format every time, regardless of document complexity.

HR departments

Extract key terms from vendor policies, employment contracts, and compliance documents at volume. Obligations and effective dates surfaced automatically.

Real estate agencies

Key dates, obligations, and liability terms from lease agreements and property contracts. Chat interface lets agents ask specific questions without re-reading the full document.

Procurement & business owners

Payment terms, SLAs, and liability caps from vendor contracts — structured and readable before signing. No lawyer required for routine agreement review.

Work with Cenred

Want this built for your team?

I build custom document intelligence tools, AI automation pipelines, and API integrations. Every project deployed, documented, and delivered end-to-end.

✉ Get in touch

← All projects

Davao, Philippines · Available for freelance projects · cenredportfolio.bizguro.net