Step 2: Deploy POC to collect stakeholder feedback#
Expected time: 30-60 minutes
Requirements
Completed start here steps
Data from your requirements is available in your Lakehouse inside a Unity Catalog volume
You can find all of the sample code referenced throughout this section here.
Expected outcome
At the end of this step, you will have deployed the Agent Evaluation Review App which allows your stakeholders to test and provide feedback on your POC. Detailed logs from your stakeholder’s usage and their feedback will flow to Delta Tables in your Lakehouse.
Overview
The first step in evaluation-driven development is to build a proof of concept (POC). A POC offers several benefits:
Provides a directional view on the feasibility of your use case with RAG
Allows collecting initial feedback from stakeholders, which in turn enables you to create the first version of your Evaluation Set
Establishes a baseline measurement of quality to start to iterate from
Databricks recommends building your POC using the simplest RAG architecture and our recommended defaults for each knob/parameter.
Note
Why start from a simple POC? There are hundreds of possible combinations of knobs you can tune within your RAG application. You can easily spend weeks tuning these knobs, but if you do so before you can systematically evaluate your RAG, you’ll end up in what we call the POC doom loop—iterating on settings, but with no way to objectively know if you made an improvement—all while your stakeholders sit around impatiently waiting.
The POC template in this cookbook are designed with quality iteration in mind. That is, they are parameterized with the knobs that our research has shown are important to tune in order to improve RAG quality. Said differently, these templates are not “3 lines of code that magically make a RAG”—rather, they are a well-structured RAG application that can be tuned for quality in the following steps of an evaluation-driven development workflow.
This enables you to quickly deploy a POC, but transition quickly to quality iteration without needing to rewrite your code.
Below is the technical architecture of the POC application:
Note
By default, the POC uses the open source models available on Mosaic AI Foundation Model Serving. However, because the POC uses Mosaic AI Model Serving, which supports any foundation model, using a different model is easy - simply configure that model in Model Serving and then replace the embedding_endpoint_name
and llm_endpoint_name
in the 00_config
Notebook.
Follow these steps for other open source models available in the Databricks Marketplace
Follow this notebook or these instructions for 3rd party models such as Azure OpenAI, OpenAI, Cohere, Anthropic, Google Gemini, etc.
Instructions
Open the
agent_app_sample_code
If your data doesn’t meet one of the above requirements, you can customize the parsing function (
file_parser
) within02_data_pipeline
in the above directory to work with your file types.Inside the POC folder, you will see the following notebooks:
Tip
The notebooks referenced below are relative to the specific POC you’ve chosen. For example, if you see a reference to 00_config
and you’ve chosen pdf_uc_volume
, you’ll find the relevant 00_global_config
notebook at 00_global_config
.
Optionally, review the default parameters
Open the
00_global_config
Notebook within the directory to view the POC’s applications default parameters for the data pipeline and RAG chain.Note
Important: our recommended default parameters are by no means perfect, nor are they intended to be. Rather, they are a place to start from - the next steps of our workflow guide you through iterating on these parameters.
Run the data pipeline
The POC data pipeline is a Databricks Notebook based on Apache Spark. Open the
02_data_pipeline
Notebook and press Run All to execute the pipeline. The pipeline will:Load the raw documents from the UC Volume
Parse each document, saving the results to a Delta Table
Chunk each document, saving the results to a Delta Table
Embed the documents and create a Vector Index using Mosaic AI Vector Search
Metadata (output tables, configuration, etc) about the data pipeline are logged to MLflow:
You can inspect the outputs by looking for links to the Delta Tables/Vector Indexes output near the bottom of the notebook:
Vector index: https://<your-workspace-url>.databricks.com/explore/data/<uc-catalog>/<uc-schema>/<app-name>_poc_chunked_docs_gold_index Output tables: Bronze Delta Table w/ raw files: https://<your-workspace-url>.databricks.com/explore/data/<uc-catalog>/<uc-schema>/<app-name>__poc_raw_files_bronze Silver Delta Table w/ parsed files: https://<your-workspace-url>.databricks.com/explore/data/<uc-catalog>/<uc-schema>/<app-name>__poc_parsed_docs_silver Gold Delta Table w/ chunked files: https://<your-workspace-url>.databricks.com/explore/data/<uc-catalog>/<uc-schema>/<app-name>__poc_chunked_docs_gold
Deploy the POC chain to the Review App
The default POC chain is a multi-turn conversation RAG chain built using LangChain.
Tip
The POC Chain uses MLflow code-based logging. To understand more about code-based logging, visit the docs.
Open the
03_agent_proof_of_concept
NotebookRun each cell of the Notebook.
You will see the MLflow Trace that shows you how the POC application works. Adjust the input question to one that is relevant to your use case, and re-run the cell to “vibe check” the application.
Modify the default instructions to be relevant to your use case. These are displayed in the Review App.
instructions_to_reviewer = f"""## Instructions for Testing the {AGENT_NAME}'s Initial Proof of Concept (PoC) Your inputs are invaluable for the development team. By providing detailed feedback and corrections, you help us fix issues and improve the overall quality of the application. We rely on your expertise to identify any gaps or areas needing enhancement. 1. **Variety of Questions**: - Please try a wide range of questions that you anticipate the end users of the application will ask. This helps us ensure the application can handle the expected queries effectively. 2. **Feedback on Answers**: - After asking each question, use the feedback widgets provided to review the answer given by the application. - If you think the answer is incorrect or could be improved, please use "Edit Answer" to correct it. Your corrections will enable our team to refine the application's accuracy. 3. **Review of Returned Documents**: - Carefully review each document that the system returns in response to your question. - Use the thumbs up/down feature to indicate whether the document was relevant to the question asked. A thumbs up signifies relevance, while a thumbs down indicates the document was not useful. Thank you for your time and effort in testing {AGENT_NAME}. Your contributions are essential to delivering a high-quality product to our end users.""" print(instructions_to_reviewer)
Run the deployment cell to get a link to the Review App.
Review App URL: https://<your-workspace-url>.databricks.com/ml/review/<uc-catalog>.<uc-schema>.<uc-model-name>/<uc-model-version>
Grant individual users permissions to access the Review App.
You can grant access to non-Databricks users by following these steps.
Test the Review App by asking a few questions yourself and providing feedback.
Note
MLflow Traces and the user’s feedback from the Review App will appear in Delta Tables in the catalog/schema you have configured. Logs can take up to 2 hours to appear in these Delta Tables.
Share the Review App with stakeholders
You can now share your POC RAG application with your stakeholders to get their feedback.
Important
We suggest distributing your POC to at least 3 stakeholders and having them each ask 10 - 20 questions. It is important to have multiple stakeholders test your POC so you can have a diverse set of perspectives to include in your Evaluation Set.