Most enterprise AI pilots fail because data leaves the organisation. We built Kerdos AI — a fully private, RAG-powered document Q&A system — using open-source tools that run entirely within your environment.
The Problem: Enterprise AI Without Privacy
Most enterprise AI demos look impressive until a CTO asks: "Where does our data go?" Sending confidential documents to OpenAI, Anthropic, or any external API is a non-starter for most regulated organisations. Legal contracts, HR policies, financial models — these documents cannot leave the building.
We built Kerdos AI to solve exactly this problem. It's a Retrieval-Augmented Generation (RAG) system that runs entirely within your environment, grounded strictly in documents you upload, with zero external data transfer in the enterprise edition.
What is RAG and Why It Matters
RAG (Retrieval-Augmented Generation) combines a vector database for retrieval with a language model for generation. Instead of asking the LLM to memorise your proprietary information, RAG retrieves the most relevant document chunks at query time and injects them into the prompt context. The key advantage: the LLM's job shifts from "know everything" to "reason about what I'm given."
- Grounded answers: Every response is backed by retrievable source chunks
- Reduced hallucination on your domain: The model is constrained to what your documents say, and any claim can be audited against the retrieved sources
- Updateable without retraining: Add a new policy doc and it's instantly queryable
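The retrieve-then-inject flow described above is mostly prompt assembly. Here is a minimal sketch of that step, not Kerdos AI's actual code; the template wording and function name are illustrative:

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a grounded prompt: the retrieved chunks become the only
    context the LLM is allowed to reason over."""
    context = "\n\n".join(
        f"[Source {i + 1}]\n{chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using ONLY the sources below. "
        "Cite sources as [Source N]. If the answer is not in the "
        "sources, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

chunks = [
    "Annual leave accrues at 2 days per month of service.",
    "Unused leave may be carried over for up to 12 months.",
]
prompt = build_rag_prompt("How fast does annual leave accrue?", chunks)
```

The "don't know" instruction matters: it gives the model an explicit escape hatch when retrieval misses, instead of inviting it to guess.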
The Architecture
Our pipeline has six stages:
- Document Parsing: PyMuPDF for PDFs, python-docx for Word files, plain parsers for TXT/MD/CSV
- Text Chunking: 512-character chunks with 64-character overlap to preserve context at boundaries
- Embedding: sentence-transformers/all-MiniLM-L6-v2 — a fast, CPU-friendly model producing 384-dimensional dense vectors
- Indexing: FAISS in-memory flat L2 index for the demo; IVF indexes for enterprise scale
- Retrieval: Nearest-neighbour search over L2-normalised embeddings (on unit vectors, L2 distance and cosine similarity produce the same ranking, which is why the flat L2 index works), returning the Top-K most relevant chunks
- Generation: meta-llama/Llama-3.1-8B-Instruct receives only the retrieved chunks and produces a grounded, cited answer
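The chunking and retrieval stages above can be sketched end-to-end. This is a toy illustration with NumPy standing in for the real embedder and FAISS index; the chunk size and overlap match the article, but the function names and vectors are assumptions:

```python
import numpy as np

def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Fixed-size character chunks; each chunk starts `overlap` characters
    before the previous one ends, so sentences that straddle a boundary
    appear intact in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def top_k(query_vec: np.ndarray, index_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Top-K retrieval over L2-normalised vectors. On unit vectors,
    ranking by cosine similarity is identical to ranking by L2 distance,
    so a FAISS flat L2 index gives the same results at scale."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity against every chunk
    return np.argsort(-sims)[:k]      # indices of the k best chunks
```

In the real pipeline, `index_vecs` would be the 384-dimensional MiniLM embeddings of every chunk, and the returned indices map back to chunk text for prompt assembly.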
Why LLaMA 3.1 and Not GPT-4?
Three reasons:

- Deployability: LLaMA 3.1 can run entirely on-premise
- Licensing: Meta's community license permits commercial use for organisations under 700M monthly active users
- Performance: LLaMA 3.1 8B Instruct scores within 10-15% of GPT-4 on document Q&A benchmarks
The Demo vs. The Enterprise Edition
The public demo on Hugging Face Spaces uses the HuggingFace Inference API (data passes through HF's infrastructure). The enterprise edition replaces this with a self-hosted LLaMA 3.1 instance (vLLM or Ollama), a persistent FAISS/Milvus index with authentication, optional domain fine-tuning, and white-label branding.
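As a rough sketch, the self-hosted swap amounts to serving the model locally and pointing the app at that endpoint instead of the HF API. The commands below are illustrative deployment options, not the enterprise edition's actual configuration; model tags and ports will vary with your hardware, so consult the vLLM and Ollama documentation:

```shell
# Option A: vLLM serves an OpenAI-compatible endpoint on localhost
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# Option B: Ollama pulls a quantised build and serves it locally
ollama run llama3.1:8b
```

Either way, documents and queries never leave the host, which is the entire point of the enterprise edition.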
Try It Yourself
The demo is live at kerdosdotio/Custom-LLM-Chat on Hugging Face Spaces. For enterprise deployment, partnerships, or investment: partnership@kerdos.in
Dr. Johnson leads AI research and implementation at Kerdos Infrasoft, specializing in healthcare AI and machine learning applications with over 12 years of experience.