Prerequisite: Gather requirements#
Defining clear and comprehensive use case requirements is a critical first step in developing a successful RAG application. These requirements serve two primary purposes. First, they help determine whether RAG is the most suitable approach for the given use case. Second, if RAG is a good fit, they guide solution design, implementation, and evaluation decisions. Investing time at the outset of a project to gather detailed requirements can prevent significant challenges and setbacks later in the development process, and helps ensure that the resulting solution meets the needs of end users and stakeholders. Well-defined requirements provide the foundation for the subsequent stages of the development lifecycle we'll walk through.
You can find all of the sample code referenced throughout this section here.
Is the use case a good fit for RAG?#
The first thing you’ll need to establish is whether RAG is even the right approach for your use case. Given the hype around RAG, it’s tempting to view it as a possible solution for any problem. However, there are nuances as to when RAG is suitable versus not.
RAG is a good fit when:
The use case requires reasoning over retrieved information (both unstructured and structured) that doesn’t entirely fit within the LLM’s context window
The use case requires synthesizing information from multiple sources (e.g., generating a summary of key points from different articles on a topic)
Dynamic retrieval based on a user query is necessary (e.g., given a user query, determine what data source to retrieve from)
The use case requires generating novel content based on retrieved information (e.g., answering questions, providing explanations, offering recommendations)
Conversely, RAG may not be the best fit when:
The task does not require query-specific retrieval. One example is generating call transcript summaries: even if individual transcripts are provided as context in the LLM prompt, the retrieved information remains the same for each summary.
The entire set of information to retrieve can fit within the LLM’s context window
Extremely low-latency responses are required (i.e., when responses are required in milliseconds)
Simple rule-based or templated responses are sufficient (e.g., a customer support chatbot that provides predefined answers based on keywords)
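One of the checks above, whether the entire set of information fits within the LLM's context window, can be estimated before any build work starts. The following is a rough sketch rather than a definitive test. The four-characters-per-token ratio is a common rule of thumb and not a property of any specific tokenizer, and the function name, the 8,000-token window, and the reserved-token budget are all hypothetical values chosen for illustration.

```python
def fits_in_context(documents, context_window_tokens,
                    reserved_tokens=1000, chars_per_token=4.0):
    """Estimate whether all documents fit in a single prompt.

    chars_per_token is an assumed rule of thumb; use the model's
    real tokenizer when an accurate count matters.
    reserved_tokens leaves room for instructions and the response.
    """
    total_chars = sum(len(doc) for doc in documents)
    estimated_tokens = total_chars / chars_per_token
    budget = context_window_tokens - reserved_tokens
    return estimated_tokens <= budget


# Hypothetical corpora: two short documents versus one very large one.
small_corpus = [
    "Reset your password from the account page.",
    "Invoices are issued monthly.",
]
large_corpus = ["x" * 200_000]

print(fits_in_context(small_corpus, context_window_tokens=8000))  # True
print(fits_in_context(large_corpus, context_window_tokens=8000))  # False
```

If the estimate lands close to the budget, treat the result as inconclusive and count tokens with the actual tokenizer.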
Requirements to discover#
Having established that RAG is indeed a good fit for your use case, consider the following questions to capture concrete requirements. Each requirement is prioritized as follows:
🟢 P0: Must define this requirement before starting your POC
🟡 P1: Must define before going to production, but can iteratively refine during the POC
⚪ P2: Nice-to-have requirement
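The priority levels above can be tracked in a small structure so that it is easy to see what still blocks a POC or a production launch. This is a minimal sketch under assumed names: `Priority`, `Requirement`, and `open_questions` are hypothetical and not part of any Databricks API, and the sample answers are invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Priority(IntEnum):
    P0 = 0  # must be defined before starting the POC
    P1 = 1  # must be defined before going to production
    P2 = 2  # nice to have


@dataclass
class Requirement:
    question: str
    priority: Priority
    answer: Optional[str] = None  # None means not yet defined


def open_questions(requirements: List[Requirement],
                   max_priority: Priority) -> List[Requirement]:
    """Return unanswered requirements at max_priority or more urgent."""
    return [r for r in requirements
            if r.priority <= max_priority and not r.answer]


requirements = [
    Requirement("What will a typical request look like?", Priority.P0,
                "Product how-to questions"),
    Requirement("What are the available data sources?", Priority.P0),
    Requirement("What tone should responses take?", Priority.P1),
    Requirement("Any specific formatting requirements?", Priority.P2),
]

poc_blockers = open_questions(requirements, Priority.P0)
production_blockers = open_questions(requirements, Priority.P1)

print([r.question for r in poc_blockers])  # ['What are the available data sources?']
print(len(production_blockers))            # 2
```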
User Experience#
Define how users will interact with the RAG system and what kind of responses are expected
🟢 What will a typical request to the RAG chain look like? Ask stakeholders for examples of potential user queries.
🟢 What kind of responses will users expect (e.g., short answers, long-form explanations, a combination, or something else)?
🟡 How will users interact with the system? Through a chat interface, search bar, or some other modality?
🟡 What tone or style should generated responses take? (e.g., formal, conversational, technical)
🟡 How should the application handle ambiguous, incomplete, or irrelevant queries? Should any form of feedback or guidance be provided in such cases?
⚪ Are there specific formatting or presentation requirements for the generated output? Should the output include any metadata in addition to the chain’s response?
Data#
Determine the nature, source(s), and quality of the data that will be used in the RAG solution
🟢 What are the available sources to use?
For each data source:
🟢 Is data structured or unstructured?
🟢 What is the source format of the retrieval data (e.g., PDFs, documentation with images/tables, structured API responses)?
🟢 Where does that data reside?
🟢 How much data is available?
🟡 How frequently is the data updated? How should those updates be handled?
🟡 Are there any known data quality issues or inconsistencies for each data source?
Consider creating an inventory table to consolidate this information, for example:
| Data source | Source | File type(s) | Size | Update frequency |
|---|---|---|---|---|
| Data source 1 | Unity Catalog Volume | JSON | 10GB | Daily |
| Data source 2 | Public API | XML | n/a (API) | Real-time |
| Data source 3 | SharePoint | PDF, DOCX | 500MB | Monthly |
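If you prefer to keep the inventory alongside your project code, the same information can be held in a small data structure and rendered as a table on demand. The sketch below is one possible approach: the `DataSource` class and `to_markdown` helper are hypothetical names, and the rows simply mirror the example inventory above.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class DataSource:
    name: str
    source: str
    file_types: str
    size: str
    update_frequency: str


HEADERS = ["Data source", "Source", "File type(s)", "Size", "Update frequency"]


def to_markdown(sources: List[DataSource]) -> str:
    """Render the inventory as a pipe-delimited table."""
    lines = [
        "| " + " | ".join(HEADERS) + " |",
        "|" + "---|" * len(HEADERS),
    ]
    for s in sources:
        # Every field is a string, so the tuple can be joined directly.
        lines.append("| " + " | ".join(astuple(s)) + " |")
    return "\n".join(lines)


inventory = [
    DataSource("Data source 1", "Unity Catalog Volume", "JSON", "10GB", "Daily"),
    DataSource("Data source 2", "Public API", "XML", "n/a (API)", "Real-time"),
    DataSource("Data source 3", "SharePoint", "PDF, DOCX", "500MB", "Monthly"),
]

print(to_markdown(inventory))
```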
Performance constraints#
Capture performance and resource requirements for the RAG application
🟡 What is the maximum acceptable latency for generating the responses?
🟡 What is the maximum acceptable time to first token?
🟡 If the output is being streamed, is higher total latency acceptable?
🟡 Are there any cost limitations on compute resources available for inference?
🟡 What are the expected usage patterns and peak loads?
🟡 How many concurrent users or requests should the system be able to handle?
NOTE: Databricks natively handles such scalability requirements through Model Serving, which can scale automatically.
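To make the latency questions concrete, it helps to measure time to first token separately from total latency, because a streamed response can feel responsive even when the full answer takes longer. The sketch below times any iterable of tokens. `fake_chain` is a hypothetical stand-in for a streaming RAG chain, and its sleep durations are invented to simulate retrieval and generation.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, str]:
    """Consume a token stream.

    Returns (time_to_first_token, total_latency, text), times in seconds.
    """
    start = time.perf_counter()
    first_token_at = None
    pieces = []
    for token in tokens:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        pieces.append(token)
    end = time.perf_counter()
    # An empty stream has no first token, so report NaN in that case.
    ttft = first_token_at - start if first_token_at is not None else float("nan")
    return ttft, end - start, "".join(pieces)


def fake_chain(query: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming RAG chain."""
    time.sleep(0.05)  # simulated retrieval and prompt processing
    for word in ["RAG ", "needs ", "clear ", "requirements."]:
        time.sleep(0.01)  # simulated per-token generation
        yield word


ttft, total, text = measure_stream(fake_chain("What does RAG need?"))
print(f"time to first token: {ttft:.3f}s, total latency: {total:.3f}s")
```

Running a measurement like this against representative queries gives stakeholders real numbers to react to when agreeing on acceptable limits.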
Evaluation#
Establish how the RAG solution will be evaluated and improved over time
🟢 What is the business goal / KPI you want to impact? What is the baseline value and what is the target?
🟢 Which users or stakeholders will provide initial and ongoing feedback?
🟢 What metrics should be used to assess the quality of generated responses?
Note: Mosaic AI Agent Evaluation provides a recommended set of metrics for you to use
🟡 What is the set of questions the RAG app must be good at to go to production?
🟡 Does an evaluation set exist? Is it possible to get an evaluation set of user queries, along with ground-truth answers and (optionally) the correct supporting documents that should be retrieved?
🟡 How will user feedback be collected and incorporated into the system?
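An evaluation set as described above pairs each user query with a ground-truth answer and, optionally, the supporting documents that should be retrieved. The sketch below shows one way such a record could be represented, with a simple retrieval recall check. The `EvalExample` fields and the `retrieval_recall` function are hypothetical and do not reflect the schema or metrics of Mosaic AI Agent Evaluation. The query, answer, and document IDs are invented.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class EvalExample:
    query: str
    ground_truth_answer: str
    # Optional: IDs of the documents that should be retrieved.
    expected_doc_ids: Set[str] = field(default_factory=set)


def retrieval_recall(example: EvalExample, retrieved_doc_ids: List[str]) -> float:
    """Fraction of expected supporting documents that were retrieved."""
    if not example.expected_doc_ids:
        return float("nan")  # undefined when no supporting docs are labeled
    hits = example.expected_doc_ids & set(retrieved_doc_ids)
    return len(hits) / len(example.expected_doc_ids)


example = EvalExample(
    query="How do I rotate an access token?",
    ground_truth_answer="Generate a new token in user settings, then revoke the old one.",
    expected_doc_ids={"doc-12", "doc-40"},
)

print(retrieval_recall(example, ["doc-12", "doc-7", "doc-3"]))  # 0.5
```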
Security#
Identify any security and privacy considerations
🟢 Is there sensitive or confidential data that needs to be handled with care?
🟡 Do access controls need to be implemented in the solution (e.g., a given user can only retrieve from a restricted set of documents)?
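To illustrate what a document-level access control requirement can imply for the solution, the sketch below filters retrieved documents against the groups a user belongs to. It assumes each document carries a set of allowed groups as metadata. The `Document` class, the group names, and the `filter_by_access` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str]


def filter_by_access(docs: List[Document],
                     user_groups: FrozenSet[str]) -> List[Document]:
    """Keep only documents that share at least one group with the user."""
    return [d for d in docs if d.allowed_groups & user_groups]


retrieved = [
    Document("doc-1", "Public product FAQ", frozenset({"everyone"})),
    Document("doc-2", "Internal escalation runbook", frozenset({"support"})),
    Document("doc-3", "Finance forecast", frozenset({"finance"})),
]

visible = filter_by_access(retrieved, frozenset({"everyone", "support"}))
print([d.doc_id for d in visible])  # ['doc-1', 'doc-2']
```

Filtering after retrieval is the simplest form to reason about. Whether the restriction is better enforced inside the retrieval layer itself is a design decision that this requirement should prompt early.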
Deployment#
Understand how the RAG solution will be integrated, deployed, and maintained
🟡 How should the RAG solution integrate with existing systems and workflows?
🟡 How should the model be deployed, scaled, and versioned?
NOTE: We will cover how this end-to-end lifecycle can be handled on Databricks with MLflow, Unity Catalog, Agent SDK, and Model Serving.
Note that this is by no means an exhaustive list of questions. However, it should provide a solid foundation for capturing the key requirements for your RAG solution.
Example#
As an example, let’s review how these questions apply to the internal Databricks RAG application used by our customer support team:
| | Considerations | Requirements |
|---|---|---|
| User experience | - Interaction modality | - Chat interface integrated with Slack |
| Data | - Number and type of data sources | - 3 data sources |
| Performance | - Maximum acceptable latency | - Maximum latency: |
| Evaluation | - Evaluation dataset availability | - SMEs from each product area will help review outputs and adjust incorrect answers to create the evaluation dataset |
| Security | - Sensitive data handling | - No sensitive customer data should be in the retrieval source |
| Deployment | - Integration with existing systems | - Integration with Databricks support ticket system |