Overview

Retrieval-Augmented Generation (RAG) is a common AI pattern that combines document retrieval with LLM generation. Tracing RAG pipelines helps you debug retrieval quality and generation issues.
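
The examples on this page assume a build_prompt helper that folds the retrieved chunks into the model prompt. A minimal sketch, where the prompt format is purely illustrative and not something Artanis prescribes:
def build_prompt(question: str, chunks) -> str:
    # Join the retrieved chunk texts into a single context block
    context = "\n\n".join(c.text for c in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )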

Basic RAG with Tracing

Here’s a complete RAG pipeline with comprehensive tracing:
from artanis import Artanis

artanis = Artanis(api_key="sk_...")

def answer_with_rag(question: str) -> str:
    trace = artanis.trace("rag-answer")

    # Capture the document corpus state for replay
    trace.state("corpus", [doc.id for doc in document_store.list()])

    # Retrieval - find relevant chunks
    chunks = retriever.search(question, top_k=5)

    # Store what was retrieved (with scores for debugging)
    trace.state("chunks", [
        {"id": c.id, "score": c.score, "text": c.text[:200]}
        for c in chunks
    ])

    # Generate
    prompt = build_prompt(question, chunks)
    trace.input(prompt=prompt, model="gpt-5.1")

    response = llm.generate(prompt)
    trace.output(response)

    return response
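
To produce a trace, call the function like any other; the question here is only an example:
answer = answer_with_rag("How do I rotate my API keys?")
print(answer)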

RAG with Reranking

Improve retrieval quality with two-stage retrieval:
def answer_with_rerank(question: str) -> str:
    trace = artanis.trace("rag-with-rerank")

    # Capture state
    trace.state("corpus", [doc.id for doc in document_store.list()])
    trace.state("config", {
        "retriever": "bm25",
        "reranker": "cross-encoder",
        "model": "gpt-5.1"
    })

    # Initial retrieval
    chunks = retriever.search(question, top_k=20)  # Get more candidates

    # Rerank
    ranked = reranker.rank(chunks, question, algorithm="cross-encoder")
    top_chunks = ranked[:5]

    trace.state("reranked_chunks", [
        {"id": c.id, "score": c.score, "original_rank": i}
        for i, c in enumerate(top_chunks)
    ])

    # Generate
    prompt = build_prompt(question, top_chunks)
    trace.input(prompt=prompt, model="gpt-5.1")

    response = llm.generate(prompt)
    trace.output(response)

    return response
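
The reranker above is treated as a black box. If you need one, a cross-encoder reranker can be sketched with the sentence-transformers library; the model name and this rank function are assumptions standing in for reranker.rank, not part of Artanis:
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rank(chunks, question):
    # Score each (question, chunk text) pair with the cross-encoder
    scores = cross_encoder.predict([(question, c.text) for c in chunks])
    for chunk, score in zip(chunks, scores):
        chunk.score = float(score)  # replace the first-stage retrieval score
    # Best match first
    return sorted(chunks, key=lambda c: c.score, reverse=True)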

Hybrid RAG (BM25 + Vector)

Combine multiple retrieval strategies:
def hybrid_rag_answer(question: str) -> str:
    trace = artanis.trace("hybrid-rag")

    trace.state("corpus", [doc.id for doc in document_store.list()])

    # BM25 retrieval
    bm25_results = bm25_retriever.search(question, top_k=10)

    # Vector search
    vector_results = vector_retriever.search(question, top_k=10)

    # Merge results
    merged = merge_results(bm25_results, vector_results)
    trace.state("merged_chunks", [
        {"id": c.id, "bm25_score": c.bm25_score, "vector_score": c.vector_score}
        for c in merged[:5]
    ])

    # Generate
    prompt = build_prompt(question, merged[:5])
    trace.input(prompt=prompt, model="gpt-5.1")

    response = llm.generate(prompt)
    trace.output(response)

    return response
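
merge_results is left undefined above. One common way to fuse the two result lists is reciprocal rank fusion; this sketch assumes chunks are mutable objects with .id and .score, and it attaches the per-retriever scores that the trace expects:
from collections import defaultdict

def merge_results(bm25_results, vector_results, k: int = 60):
    # Reciprocal rank fusion: each list contributes 1 / (k + rank + 1) per chunk.
    # Per-retriever scores are attached so the trace above can record them.
    fused, rrf = {}, defaultdict(float)
    for attr, results in (("bm25_score", bm25_results), ("vector_score", vector_results)):
        for rank, chunk in enumerate(results):
            merged = fused.setdefault(chunk.id, chunk)
            merged.bm25_score = getattr(merged, "bm25_score", None)
            merged.vector_score = getattr(merged, "vector_score", None)
            setattr(merged, attr, chunk.score)
            rrf[chunk.id] += 1.0 / (k + rank + 1)
    return sorted(fused.values(), key=lambda c: rrf[c.id], reverse=True)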

What to Trace

Capture which documents were available at inference time:
trace.state("corpus", [doc.id for doc in document_store.list()])
Store what the retriever found:
trace.state("chunks", [
    {"id": c.id, "score": c.score, "text": c.text[:200]}
    for c in chunks
])
Record what you send to the LLM:
trace.input({
    "prompt": prompt,
    "model": "gpt-5.1",
    "temperature": 0.7,
    "max_tokens": 500
})
Capture settings that affect behavior:
trace.state("config", {
    "retriever": "bm25",
    "top_k": 5,
    "rerank": True,
    "model": "gpt-5.1"
})

Debugging RAG with Traces

Use traces to debug common RAG issues:
Wrong answer: check the chunks state - are the right documents there?
Hallucination: check the prompt recorded by trace.input - is the retrieved context actually included?
Low quality: check the scores stored with each chunk - are they too low?
Inconsistent answers: check the corpus state - did the documents change between runs?

Next Steps