🧠 RAG-Based Internal Knowledge Assistant

Category: AI & Knowledge Management | Difficulty: Advanced


Description: A production-grade Retrieval-Augmented Generation (RAG) system that turns a company’s internal documents into a Slack-native knowledge assistant. When an employee asks a question in Slack, a webhook fires: the question is embedded into a vector using OpenAI’s text-embedding-3-small model, and that vector is used to query a Pinecone index for the five most semantically similar document chunks from the company knowledge base. Those chunks are assembled into a grounded context window and passed to GPT-4o, which is instructed to answer using only what’s in the retrieved documents rather than drawing on general training data. The answer is posted back into the exact Slack thread the question came from, with a citation footer showing how many documents were searched and which sources were used. The entire round trip, from question to sourced answer in Slack, happens in seconds.


The Problem

Every company accumulates institutional knowledge that lives in documents nobody reads. HR policies in a Google Drive folder nobody knows exists. Product specs buried in a Notion page from 18 months ago. Onboarding guides that are always slightly out of date. When employees have questions, they do one of three things: ask a colleague (interrupting them), search and give up (staying blocked), or ask the same question that’s been answered a hundred times before, wasting everyone’s time repeatedly.

Key pain points:

  • Three senior engineers and the HR manager were collectively spending 45–60 minutes per day answering questions already documented in the company’s 180+ internal files; these questions arrived through Slack DMs, WhatsApp, and direct desk interruptions
  • The most common 20 questions (escalation procedures, client SLA reference, software licensing policy, leave request process, onboarding checklist steps) were answered from memory rather than from the actual documents, producing inconsistent answers depending on who was asked
  • New hires in the first 30 days asked an average of 35 documented questions (identified by surveying the last 4 cohorts), consuming approximately 6 hours of senior engineer time per new hire during onboarding
  • Google Drive keyword search was returning too many results with no relevance ranking; staff reported giving up after the first two results didn’t contain the answer and asking a colleague instead
  • Three client-facing errors in Q2 were traced back to staff acting on remembered procedures rather than the current version of the SOP; in each case the correct answer was in a document they didn’t know existed

The Solution

A full RAG pipeline built on n8n that operates as a Slack bot named “KnowBot.” The webhook receives every event from Slack’s Events API, including the URL verification challenge that Slack requires on setup, which the Code node handles by returning the challenge token immediately. Real questions pass through an IF gate and hit the embedding model. OpenAI’s text-embedding-3-small converts the question text into a 1,536-dimension vector. That vector is sent to Pinecone via HTTP request, querying the company-docs namespace for the top 5 closest matching document chunks by cosine similarity, with metadata returned. A JavaScript node assembles those chunks into a structured context block with source labels. GPT-4o receives the context and question with a strict system prompt: answer only from the provided context, cite sources, and explicitly say so if the answer isn’t in the documents. The answer is posted back to Slack in the original thread, not as a new message, keeping conversations organized. The webhook responds with {status: ok} to close Slack’s connection.
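
The first step described above (answer the verification challenge, otherwise clean the question and keep the thread reference) could look roughly like the sketch below. It is not the deployed Code node: the function name parseSlackEvent and the shape of the returned object are assumptions for illustration, and the n8n item wrapping is left out.

```javascript
// Hypothetical helper; the name and the returned field layout are illustrative.
function parseSlackEvent(body) {
  // Slack sends this payload once, when the Events API URL is registered.
  if (body.type === "url_verification") {
    return { is_challenge: true, challenge: body.challenge };
  }

  const event = body.event || {};
  const question = (event.text || "")
    .replace(/<@[A-Z0-9]+>/g, "")          // drop <@USER_ID> mentions
    .replace(/<([^|>]+)\|([^>]+)>/g, "$2") // <URL|text> becomes text
    .replace(/<(https?:[^>]+)>/g, "$1")    // <URL> becomes URL
    .replace(/\s+/g, " ")                  // collapse leftover whitespace
    .trim();

  return {
    is_challenge: false,
    question,
    channel: event.channel,
    // Reply inside the existing thread, or start one on the message itself.
    thread_ts: event.thread_ts || event.ts,
  };
}
```

Falling back from thread_ts to ts is a design choice in this sketch: a question asked at the top level of a channel has no thread yet, so the reply opens one under the original message.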

Who it was built for: A 34-person Philippine IT managed services provider with 3 years of accumulated internal documentation across 180+ files in Google Drive (SOPs, client runbooks, HR policies, service catalog, escalation procedures, and onboarding guides), with senior engineers spending 45+ minutes per day answering questions that were already documented but unsearchable in practice.


Results & Impact

| Metric | Before | After |
| --- | --- | --- |
| Time to answer a policy or process question | 5–15 minutes: find the right person, wait for a response, get an answer that may not match the document | Under 8 seconds: cited answer with source file name posted in the Slack thread |
| Repetitive questions to senior staff per day | 18–22 questions per day collectively across 3 senior engineers and HR | 5–7 per day: questions KnowBot can’t answer confidently (edge cases, judgment calls) |
| Senior staff time on documented questions per day | 45–60 minutes combined | Under 12 minutes: only genuinely undocumented questions reach people |
| Answer consistency | Varied by who was asked; 3 Q2 client errors traced to inconsistent recalled procedures | 100% consistent: every answer grounded in the same source documents |
| New hire onboarding question load on senior staff | ~6 hours per new hire across first 30 days (35 documented questions) | ~1.5 hours: KnowBot handles approximately 75% of documented onboarding questions directly |
| Client-facing errors from outdated recalled procedures | 3 in Q2: staff acting on memory rather than current SOP | Zero in Q3 and Q4: KnowBot cites the current document version, not memory |
| Document corpus searchable | 180+ files in Google Drive; keyword search too broad to be useful | 180+ files chunked and indexed in Pinecone; semantic search returns the relevant paragraph, not the file |
| Questions answered per month (KnowBot) | N/A | 840–960 questions per month across all staff in the first 3 months |
| “I don’t know” rate | N/A: staff gave answers regardless of confidence | 14% of questions: KnowBot correctly says the answer isn’t in its documents rather than guessing |
| Staff satisfaction with knowledge access | Described as “frustrating” or “slow” in Q2 team survey | 4.3/5 in post-deployment survey; “It just answers immediately and tells me where to look” cited repeatedly |

Industry context: RAG is one of the most widely adopted enterprise AI use cases in 2025. The combination of vector search and grounded LLM generation is what separates production AI systems from chatbots. It is the architecture behind enterprise AI assistants sold at ₱500,000–₱2,000,000/year by vendors, built here on infrastructure the company already owns.


Technical Details

Tech Stack: n8n · OpenAI Embeddings (text-embedding-3-small) · Pinecone · OpenAI GPT-4o · Slack · JavaScript

How each tool is used:

  • n8n: Webhook receiver and full RAG pipeline orchestration
  • Webhook: Registered as a Slack Events API endpoint under a custom Slack app (KnowBot) installed in the company workspace; receives all workspace events including the one-time URL verification challenge on setup
  • JavaScript (extract question): Handles two cases: Slack’s url_verification challenge (returns {challenge} immediately, before any AI processing) and real message events. Strips <@USER_ID> mention tags and <URL|text> link formatting from Slack’s raw event text, extracts channel ID and thread_ts for reply threading
  • IF node: Routes on is_challenge: false; verification events exit without hitting the embedding pipeline
  • OpenAI Embeddings: text-embedding-3-small converts the cleaned question into a 1,536-dimension vector; chosen for its strong retrieval performance at significantly lower cost than text-embedding-ada-002, important at 840–960 questions per month
  • HTTP Request (Pinecone): POSTs the question vector to the Pinecone query endpoint with topK: 5 and includeMetadata: true against the company-docs namespace; returns the five most semantically similar document chunks with source file name, document section, and last-updated date in metadata
  • JavaScript (build context): Assembles the five Pinecone matches into a labeled context string ([Source 1: SOP-Escalation-Procedure-v3.pdf] + chunk text), extracts channel and thread_ts for Slack reply targeting, counts matches and collects source names for the citation footer
  • OpenAI GPT-4o: Strict grounding system prompt at temperature 0.2: answer only from retrieved context, cite the source document by name, explicitly state “This isn’t covered in the current documentation” if the answer isn’t in the chunks, preventing confident wrong answers on edge cases
  • Slack: Posts the answer back into the exact thread (thread_ts) where the question was asked, with a footer: “Searched 5 documents · Sources: [SOP name], [Policy name]”, giving two layers of traceability for any answer
  • Respond to Webhook: Returns {status: ok} to Slack’s Events API within the 3-second response window requirement; the full RAG pipeline completes in 4–7 seconds end-to-end, so the webhook response fires immediately while the pipeline runs asynchronously
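
As a rough illustration of the Pinecone query and context-assembly steps in the list above, the sketch below builds the query body and turns the returned matches into a labeled context string plus citation footer. The function names are hypothetical, and the metadata keys source and text are assumptions about how the chunks were upserted; the source only says metadata carries the file name, section, and last-updated date.

```javascript
// Body for Pinecone's query endpoint, using the values named in the text.
function buildPineconeQuery(vector, namespace = "company-docs", topK = 5) {
  return { vector, topK, includeMetadata: true, namespace };
}

// Assumption: each match carries metadata.source (file name) and metadata.text (chunk).
function buildContext(matches) {
  const blocks = matches.map((m, i) => {
    const meta = m.metadata || {};
    return `[Source ${i + 1}: ${meta.source || "unknown"}]\n${meta.text || ""}`;
  });

  // De-duplicate file names so the footer lists each source once.
  const sources = [
    ...new Set(matches.map((m) => (m.metadata || {}).source).filter(Boolean)),
  ];

  return {
    context: blocks.join("\n\n"),
    match_count: matches.length,
    sources,
    footer: `Searched ${matches.length} documents · Sources: ${sources.join(", ")}`,
  };
}
```

Several chunks often come from the same file, which is why the footer in this sketch de-duplicates source names while the context keeps every chunk individually labeled.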

Workflow architecture (9 nodes, linear with challenge gate): Slack Webhook → JS Extract & Handle Challenge → IF Real Question → OpenAI Embed → HTTP Pinecone Query → JS Build Context → GPT-4o Generate Answer → Slack Reply in Thread → Respond to Webhook
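
To make the node order concrete, here is a minimal orchestration sketch in which each n8n node is represented by an injected function. The runPipeline name and the nodes object are assumptions for illustration; the real workflow is wired visually in n8n. The sketch follows the listed order, so the {status: ok} return comes last; in a deployment the acknowledgement would have to be sent within Slack’s response window, as the tool notes describe.

```javascript
// `nodes` is a hypothetical bundle of functions standing in for the n8n nodes.
async function runPipeline(body, nodes) {
  const parsed = nodes.extract(body);               // JS Extract & Handle Challenge
  if (parsed.is_challenge) {
    return { challenge: parsed.challenge };         // gate: no AI node runs
  }
  const vector = await nodes.embed(parsed.question);        // OpenAI Embed
  const matches = await nodes.queryIndex(vector);           // HTTP Pinecone Query
  const { context, footer } = nodes.buildContext(matches);  // JS Build Context
  const answer = await nodes.generate(context, parsed.question); // GPT-4o
  await nodes.reply(parsed.channel, parsed.thread_ts, `${answer}\n\n${footer}`);
  return { status: "ok" };                          // Respond to Webhook
}
```

Passing the steps in as functions keeps the sketch testable with stubs and makes the challenge gate visible: nothing after extract is called for a verification event.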

Complexity highlights:

  • Full RAG implementation: the embed → vector search → context assembly → grounded generation chain is the production architecture used by enterprise AI assistants, implemented end-to-end in n8n. The 14% “I don’t know” rate suggests the grounding constraint is working; a chatbot without RAG would likely produce a plausible-sounding wrong answer on those same questions
  • Slack Events API compliance: Slack requires the webhook to respond to url_verification within 3 seconds or reject the app entirely. The JavaScript node handles this case first, returning the challenge token before any AI node runs. This is the detail that breaks most first-time Slack app integrations
  • Slack markup stripping: <@USER_ID> mention tags are stripped before embedding so the vector represents the clean semantic question rather than Slack’s internal user reference format. Without this, “@KnowBot what is the escalation procedure” embeds as a different vector than “what is the escalation procedure,” reducing retrieval accuracy
  • Thread-aware replies: thread_ts from the original event posts the answer into the same conversation thread, not as a new top-level channel message. In a busy #general channel this is the difference between a usable tool and a channel-flooding bot
  • Source citation with document version: the Pinecone metadata includes the source filename (which includes version numbers for versioned SOPs like SOP-Escalation-v3.pdf), so citations surface the document version and staff know they’re reading the current procedure, not a recalled version from 6 months ago
  • Context-only grounding: GPT-4o’s system prompt prohibits general training knowledge. The 3 Q2 client errors came from confident wrong answers based on outdated memory; the system prompt’s “say so if you don’t know” instruction directly addresses this failure mode at the architecture level
  • Namespace isolation: the company-docs namespace is the current deployment. The same Pinecone index is structured to add hr-docs and engineering-runbooks as separate namespaces, with a routing layer that queries the right namespace based on which Slack channel the question came from (planned for v2)
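
The context-only grounding described above comes down to how the chat request is assembled. The sketch below shows one plausible shape for it, assuming OpenAI’s Chat Completions request format; the prompt wording and the buildChatRequest name are illustrative, not the deployed prompt. Only the model, the temperature of 0.2, and the fallback sentence come from the text.

```javascript
// Fallback sentence quoted in the tool notes (straight apostrophe used here).
const FALLBACK_MESSAGE = "This isn't covered in the current documentation";

// Illustrative prompt wording; the deployed system prompt is not shown in the source.
function buildChatRequest(context, question) {
  const system = [
    "Answer only from the context provided by the user message.",
    "Cite the source document by name for every claim.",
    `If the answer is not in the context, reply exactly: "${FALLBACK_MESSAGE}".`,
    "Do not use general knowledge.",
  ].join(" ");

  return {
    model: "gpt-4o",
    temperature: 0.2, // low temperature keeps answers close to the retrieved text
    messages: [
      { role: "system", content: system },
      { role: "user", content: `Context:\n${context}\n\nQuestion: ${question}` },
    ],
  };
}
```

Fixing the fallback to one exact sentence is a choice made for this sketch: it would let a later step recognize “not in the documents” replies by string match, which is one way a rate like the 14% figure could be counted.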

Document ingestion note: The 180+ Google Drive files were chunked (500 tokens, 50-token overlap), embedded, and upserted into Pinecone via a separate one-time ingestion workflow built in Python. New or updated documents are currently re-ingested manually each week; a v2 addition would be an n8n workflow watching the Google Drive folder for changes and automatically re-embedding modified files. This is worth being transparent about with clients: the query bot alone is production-ready, but ongoing document freshness requires either a scheduled re-ingestion run or the automated ingestion pipeline.
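
The 500-token chunks with 50-token overlap amount to a sliding window that advances 450 tokens at a time. The original ingestion script was written in Python and is not shown; the sketch below restates the windowing idea in JavaScript for consistency with the other examples, with chunkTokens as a hypothetical name. It works on an already tokenized array, and how tokens are produced (a real tokenizer versus a whitespace split) is left as an assumption.

```javascript
// Sliding-window chunker over an array of tokens.
// Assumption: the caller has already tokenized the document.
function chunkTokens(tokens, size = 500, overlap = 50) {
  if (overlap >= size) {
    throw new Error("overlap must be smaller than size");
  }
  const step = size - overlap; // 450 with the values from the text
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window reached the end
  }
  return chunks;
}
```

With these defaults a 1,200-token document yields three chunks starting at tokens 0, 450, and 900, and the last 50 tokens of each chunk reappear at the start of the next.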


Context & Social Proof

  • Build timeline: 5 days. Day 1: Pinecone index setup and document chunking/ingestion script for 180 files across 12 Google Drive folders. Day 2: Slack app creation, Events API registration, and webhook challenge handling (the integration that requires the most careful sequencing). Day 3: n8n RAG pipeline build (embedding, Pinecone query, context assembly, GPT-4o grounding prompt). Day 4: Slack thread-aware reply, citation footer formatting, and end-to-end testing across 40 real staff questions from the Q2 survey. Day 5: Staff walkthrough, prompt tuning on 8 questions that initially returned low-confidence answers, and deployment to production
  • Your role: Solo build: document chunking strategy (500-token chunks with 50-token overlap to preserve context across paragraph boundaries), Python ingestion script, Slack app configuration and Events API setup, challenge handling, question markup stripping, embedding pipeline, Pinecone query integration, context assembly with source metadata, GPT-4o grounding prompt tuned against real staff questions, thread-aware Slack reply, and webhook response timing
  • Deployment: n8n cloud webhook registered as the Slack app’s Events API URL; Pinecone index populated from Google Drive via one-time ingestion; KnowBot installed in the company Slack workspace. Staff interact with it by mentioning @KnowBot in any channel or DMing it directly
  • Client quote: “We had 180 documents and nobody could find anything in them. Now the junior engineers just ask KnowBot and they get the answer with the source file name in under 10 seconds. The senior engineers stopped getting the same 15 questions every day. That alone recovered almost an hour per day across the team.” (Operations Manager, IT managed services provider, Philippines)
  • Reusability: Pinecone namespace, topK value, GPT-4o system prompt tone, and Slack channel routing are the only configuration changes per client. The RAG architecture and ingestion pattern deploy unchanged for any company with a document corpus; the only variable is what gets chunked and indexed

Use Cases & Ideal Buyer

Best fit for:

  • IT services companies and MSPs where technical runbooks and SOPs are accumulated over years and impossible to search effectively when a junior engineer needs them during a client call
  • Companies with 20–100 employees where 2–4 senior people spend 30–60 minutes per day answering questions that are already documented somewhere
  • Fast-growing Philippine startups and agencies where institutional knowledge is concentrated in the founding team and new hires are losing 2–3 weeks of productivity getting up to speed
  • Sales teams needing instant product spec, pricing, or competitive positioning lookups in Slack during a live client call without putting the client on hold to search

Can also be adapted for:

  • Customer-facing knowledge bot: same architecture, different Pinecone namespace populated with product documentation and FAQs, deployed on a website chat widget instead of Slack
  • Legal and compliance Q&A: index regulatory documents and contracts, answer policy questions with citations to specific clauses and document version numbers
  • Technical documentation assistant: index engineering runbooks, answer on-call questions during incidents without hunting through Confluence at 2 AM
  • Multi-department knowledge isolation: separate Pinecone namespaces for HR, Engineering, and Sales, with a Slack channel routing layer that queries the right namespace based on where the question was asked (planned v2 for this deployment)
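
The channel-based namespace routing mentioned in the last item is planned rather than built, so the following is only a sketch of how such a lookup might work. The channel IDs are made up, and namespaceForChannel is a hypothetical name; the namespace names hr-docs, engineering-runbooks, and company-docs are the ones used elsewhere in this write-up.

```javascript
// Made-up channel IDs mapped to the namespaces named in this write-up.
const NAMESPACE_BY_CHANNEL = {
  C_HR_EXAMPLE: "hr-docs",
  C_ENG_EXAMPLE: "engineering-runbooks",
};

// Unknown channels (and DMs) fall back to the general namespace.
function namespaceForChannel(
  channelId,
  routes = NAMESPACE_BY_CHANNEL,
  fallback = "company-docs"
) {
  return Object.prototype.hasOwnProperty.call(routes, channelId)
    ? routes[channelId]
    : fallback;
}
```

The returned value would be passed as the namespace field of the Pinecone query, so the rest of the pipeline stays unchanged.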