Debugging retrieval quality#
How to debug retrieval quality#
You are on this page because your root cause analysis identified retrieval quality as the root cause to address.
Retrieval quality is arguably the most important component of a RAG application. If the most relevant chunks are not returned for a given query, the LLM will not have access to the necessary information to generate a high-quality response. Poor retrieval can thus lead to irrelevant, incomplete, or hallucinated output. Debugging it requires manual effort to analyze the underlying data. With Mosaic AI, this becomes considerably easier given the tight integration between the data platform (Unity Catalog and Vector Search) and experiment tracking (MLflow LLM evaluation and MLflow tracing).
Instructions#
Here’s a step-by-step process to address retrieval quality issues:
1. Open the 05_evaluate_poc_quality Notebook. Use the queries to load MLflow traces of the records that had retrieval quality issues.
2. For each record, manually examine the retrieved chunks. If available, compare them to the ground-truth retrieval documents.
3. Look for patterns or common issues among the queries with low retrieval quality. Some examples might include:
   - Relevant information is missing from the vector database entirely
   - Insufficient number of chunks/documents returned for a retrieval query
   - Chunks are too small and lack sufficient context
   - Chunks are too large and contain multiple, unrelated topics
   - The embedding model fails to capture semantic similarity for domain-specific terms
4. Based on the identified issue, hypothesize potential root causes and corresponding fixes. See the Common reasons for poor retrieval quality table below for guidance on this.
5. Follow the steps in implement and evaluate changes to implement and evaluate a potential fix.
   - This may involve modifying the data pipeline (e.g., adjusting chunk size, trying a different embedding model) or modifying the RAG chain (e.g., implementing hybrid search, retrieving more chunks).
6. If retrieval quality is still not satisfactory, repeat steps 4-5 for the next most promising fixes until the desired performance is achieved.
7. Re-run the root cause analysis to determine if the overall chain has any additional root causes that should be addressed.
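The comparison of retrieved chunks against ground-truth documents can be partly automated before the manual review. The following is a minimal sketch, not the notebook's actual code: it assumes you have already exported each evaluated record into a simple structure holding the query, the retrieved document IDs, and the ground-truth document IDs. The `EvalRecord` class, its field names, the helper functions, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalRecord:
    """One evaluated query. All field names are illustrative assumptions."""
    query: str
    retrieved_doc_ids: List[str]      # IDs of the chunks/documents the retriever returned
    ground_truth_doc_ids: List[str]   # IDs labeled as relevant (may be empty)


def retrieval_recall(record: EvalRecord) -> Optional[float]:
    """Fraction of ground-truth documents that appear in the retrieved set."""
    expected = set(record.ground_truth_doc_ids)
    if not expected:
        return None  # no ground truth available; this record needs manual review
    found = expected & set(record.retrieved_doc_ids)
    return len(found) / len(expected)


def flag_low_recall(records, threshold=0.5):
    """Return (query, recall, missing_ids) for records below the threshold, worst first."""
    flagged = []
    for r in records:
        recall = retrieval_recall(r)
        if recall is not None and recall < threshold:
            missing = sorted(set(r.ground_truth_doc_ids) - set(r.retrieved_doc_ids))
            flagged.append((r.query, recall, missing))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical sample data standing in for records loaded from your evaluation run.
records = [
    EvalRecord("How do I rotate an API key?", ["doc_1", "doc_7"], ["doc_1"]),
    EvalRecord("What is the refund policy?", ["doc_3", "doc_9"], ["doc_4", "doc_5"]),
    EvalRecord("Which regions are supported?", ["doc_2"], ["doc_2", "doc_8"]),
]

for query, recall, missing in flag_low_recall(records, threshold=0.75):
    print(f"{recall:.2f}  {query}  missing={missing}")
```

Sorting the worst records first gives a natural order for the manual inspection, and the list of missing document IDs is a starting point for checking whether those documents exist in the vector database at all or were simply ranked too low.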
Common reasons for poor retrieval quality#
Each of these potential fixes can be broadly categorized into three buckets:
Data pipeline changes
Chain config changes
Chain code changes
Based on the type of change, you will follow different steps in the implement and evaluate changes step.
Retrieval Issue | Debugging Steps | Potential Fix |
---|---|---|
Chunks are too small | | |
Chunks are too large | | |
Chunks don't have enough information about the text from which they were taken | | |
Embedding model doesn't accurately understand the domain and/or key phrases in user queries | | |
Relevant information missing from the vector database | | |
Retrieval queries are poorly formulated | | |
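For the first two rows of the table, a quick size check over the retrieved chunks can help decide where to look first. The sketch below is only a rough heuristic under stated assumptions: the `triage_chunk_sizes` function, the word-count thresholds (`min_words`, `max_words`), and the sample chunks are hypothetical, and word count is a proxy that cannot tell you whether a chunk actually lacks context or mixes unrelated topics.

```python
def triage_chunk_sizes(chunks, min_words=30, max_words=400):
    """Bucket chunks by word count.

    `chunks` maps a chunk ID to its text. The thresholds are illustrative
    assumptions; tune them for your own documents and embedding model.
    """
    report = {"too_small": [], "ok": [], "too_large": []}
    for chunk_id, text in chunks.items():
        n_words = len(text.split())
        if n_words < min_words:
            report["too_small"].append((chunk_id, n_words))
        elif n_words > max_words:
            report["too_large"].append((chunk_id, n_words))
        else:
            report["ok"].append((chunk_id, n_words))
    return report


# Hypothetical chunks standing in for the text retrieved for low-quality queries.
retrieved_chunks = {
    "chunk_a": "Refunds are processed within",
    "chunk_b": " ".join(["token"] * 120),
    "chunk_c": " ".join(["token"] * 900),
}

report = triage_chunk_sizes(retrieved_chunks)
for bucket, items in report.items():
    print(bucket, items)
```

Chunks that land in the `too_small` or `too_large` buckets are candidates for manual reading. If the pattern holds across many low-quality queries, adjusting the chunk size in the data pipeline is a reasonable fix to evaluate.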