Kerdos AI API Reference
A stateful, session-based REST API for document Q&A. Upload any document, build a FAISS vector index, and query it with LLaMA 3.1 8B — all in three HTTP calls.
All Endpoints
POST /sessions
Create a new isolated session. Returns a session_id.
Response: { "session_id": "uuid-v4" }

GET /sessions/{id}
Get the status and metadata of an existing session.
Response: { "session_id": "...", "created_at": "...", "doc_count": 2 }

DELETE /sessions/{id}
Delete the session and free all in-memory resources (FAISS index + documents).
Response: { "deleted": true }

POST /sessions/{id}/documents
Upload and index one or more files. Supported: PDF, DOCX, TXT, MD, CSV (max 50 MB each).
Body: multipart/form-data — field: files (one or more)
Response: { "indexed": ["report.pdf", "policy.docx"] }

POST /sessions/{id}/chat
Ask a question. Returns a grounded answer generated only from the uploaded documents. Optionally pass hf_token to use production LLaMA inference.
Body: { "question": "string", "hf_token": "hf_..." (optional) }
Response: { "answer": "Based on the document..." }

DELETE /sessions/{id}/history
Clear the conversation history for a session (keeps the indexed documents).
Response: { "cleared": true }

GET /health
Health check. Returns API status and version.
Response: { "status": "ok", "version": "1.0.0" }

Typical Integration Workflow
Create session → upload document → ask question → delete session. Three calls to get an answer, plus one to clean up.
cURL

```bash
BASE="https://kerdosdotio-kerdos-llm-rag-api.hf.space"

# 1. Create session
SESSION=$(curl -s -X POST "$BASE/sessions" | jq -r '.session_id')
echo "Session: $SESSION"

# 2. Upload a document
curl -X POST "$BASE/sessions/$SESSION/documents" \
  -F "files=@your_doc.pdf"

# 3. Ask a question
curl -X POST "$BASE/sessions/$SESSION/chat" \
  -H "Content-Type: application/json" \
  -d '{"question": "Summarise this document", "hf_token": "hf_..."}'

# 4. Delete session
curl -X DELETE "$BASE/sessions/$SESSION"
```

Python

```python
import requests

BASE = "https://kerdosdotio-kerdos-llm-rag-api.hf.space"

# 1. Create session
session_id = requests.post(f"{BASE}/sessions").json()["session_id"]
print(f"Session: {session_id}")

# 2. Upload documents
with open("your_doc.pdf", "rb") as f:
    requests.post(
        f"{BASE}/sessions/{session_id}/documents",
        files={"files": ("your_doc.pdf", f, "application/pdf")},
    )

# 3. Ask a question
resp = requests.post(
    f"{BASE}/sessions/{session_id}/chat",
    json={"question": "Summarise this document", "hf_token": "hf_..."},
)
print(resp.json()["answer"])

# 4. Clean up
requests.delete(f"{BASE}/sessions/{session_id}")
```

JavaScript

```javascript
const BASE = "https://kerdosdotio-kerdos-llm-rag-api.hf.space";

// 1. Create session
const { session_id } = await fetch(`${BASE}/sessions`, {
  method: "POST",
}).then((r) => r.json());

// 2. Upload a document
const form = new FormData();
form.append("files", fileInput.files[0]);
await fetch(`${BASE}/sessions/${session_id}/documents`, {
  method: "POST",
  body: form,
});

// 3. Ask a question
const { answer } = await fetch(`${BASE}/sessions/${session_id}/chat`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    question: "Summarise this document",
    hf_token: process.env.HF_TOKEN, // keep server-side!
  }),
}).then((r) => r.json());
console.log(answer);

// 4. Delete session
await fetch(`${BASE}/sessions/${session_id}`, { method: "DELETE" });
```

Authentication & Environment Variables
HuggingFace Token (hf_token)
Required only for the POST /sessions/{id}/chat endpoint. The token is passed in the JSON request body — never as a header.
The token must have access to meta-llama/Llama-3.1-8B-Instruct (accept the licence on HuggingFace).

Environment Variables

| Variable | Default | Description |
|---|---|---|
| HF_TOKEN | — | HuggingFace token for LLaMA inference |
| SESSION_TTL_MINUTES | 60 | Session auto-expiry in minutes |
| MAX_UPLOAD_MB | 50 | Max file size per upload in MB |
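Because sessions auto-expire after SESSION_TTL_MINUTES (60 by default), long-lived clients should be prepared to rebuild a session mid-conversation. A minimal sketch of that pattern, assuming an expired or unknown session returns HTTP 404 (the reference does not specify the status code); `createSession`, `uploadDocs`, and `ask` are hypothetical wrappers around the HTTP calls shown above:

```javascript
// Retry a chat call once with a fresh session if the old one has expired.
// Assumption (not confirmed above): an expired/unknown session yields HTTP 404.
async function askWithRetry({ createSession, uploadDocs, ask }, sessionId, question) {
  let resp = await ask(sessionId, question);
  if (resp.status === 404) {
    // Session likely aged past SESSION_TTL_MINUTES: start over.
    sessionId = await createSession();
    // The new session starts empty, so re-index before asking again.
    await uploadDocs(sessionId);
    resp = await ask(sessionId, question);
  }
  return { sessionId, resp };
}
```

Keeping the original files (or their hashes) around client-side makes the re-upload step cheap to trigger.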
Security Best Practice
Never expose hf_token in client-side code. Use a Next.js API route or server action as a proxy to keep your token server-side. For enterprise deployments, replace hf_token with your own OAuth2 / API key system.
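The proxy pattern above can be sketched as a small server-side handler. This is framework-agnostic pseudocode in the Express handler shape, not part of the Kerdos API itself; the UPSTREAM constant, route wiring, and buildChatPayload helper are illustrative:

```javascript
const UPSTREAM = "https://kerdosdotio-kerdos-llm-rag-api.hf.space";

// Build the upstream body, injecting the server-held token so it never
// reaches the browser. The browser only ever sends session_id + question.
function buildChatPayload(question, token) {
  return { question, hf_token: token };
}

// Handler sketch: mount behind your own auth, e.g. POST /api/chat.
async function chatProxy(req, res) {
  const { session_id, question } = req.body;
  const upstream = await fetch(`${UPSTREAM}/sessions/${session_id}/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatPayload(question, process.env.HF_TOKEN)),
  });
  res.status(upstream.status).json(await upstream.json());
}
```

The same shape works as a Next.js route handler or server action; the key point is that HF_TOKEN is read from the server's environment, never shipped in client bundles.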
Ready to integrate?
Try the demo now or contact us for a private, on-premise enterprise deployment with your own models and infrastructure.