Step 2: Deploy POC to collect stakeholder feedback

Step 2: Deploy POC to collect stakeholder feedback#

../_images/workflow_poc.png

Expected time: 30-60 minutes

Requirements

  1. Completed start here steps

  2. Data from your requirements is available in your Lakehouse inside a Unity Catalog volume

Code Repository

You can find all of the sample code referenced throughout this section here.

Expected outcome

At the end of this step, you will have deployed the Agent Evaluation Review App which allows your stakeholders to test and provide feedback on your POC. Detailed logs from your stakeholder’s usage and their feedback will flow to Delta Tables in your Lakehouse.

../_images/review_app2.gif

Overview

The first step in evaluation-driven development is to build a proof of concept (POC). A POC offers several benefits:

  1. Provides a directional view on the feasibility of your use case with RAG

  2. Allows collecting initial feedback from stakeholders, which in turn enables you to create the first version of your Evaluation Set

  3. Establishes a baseline measurement of quality to start to iterate from

Databricks recommends building your POC using the simplest RAG architecture and our recommended defaults for each knob/parameter.

Note

Why start from a simple POC? There are hundreds of possible combinations of knobs you can tune within your RAG application. You can easily spend weeks tuning these knobs, but if you do so before you can systematically evaluate your RAG, you’ll end up in what we call the POC doom loop—iterating on settings, but with no way to objectively know if you made an improvement—all while your stakeholders sit around impatiently waiting.

The POC template in this cookbook are designed with quality iteration in mind. That is, they are parameterized with the knobs that our research has shown are important to tune in order to improve RAG quality. Said differently, these templates are not “3 lines of code that magically make a RAG”—rather, they are a well-structured RAG application that can be tuned for quality in the following steps of an evaluation-driven development workflow.

This enables you to quickly deploy a POC, but transition quickly to quality iteration without needing to rewrite your code.

Below is the technical architecture of the POC application:

../_images/5_img2.png

Note

By default, the POC uses the open source models available on Mosaic AI Foundation Model Serving. However, because the POC uses Mosaic AI Model Serving, which supports any foundation model, using a different model is easy - simply configure that model in Model Serving and then replace the embedding_endpoint_name and llm_endpoint_name in the 00_config Notebook.

  • Follow these steps for other open source models available in the Databricks Marketplace

  • Follow this notebook or these instructions for 3rd party models such as Azure OpenAI, OpenAI, Cohere, Anthropic, Google Gemini, etc.

Instructions

  1. Open the agent_app_sample_code

    If your data doesn’t meet one of the above requirements, you can customize the parsing function (file_parser) within 02_data_pipeline in the above directory to work with your file types.

    Inside the POC folder, you will see the following notebooks:

../_images/6_img.png

Tip

The notebooks referenced below are relative to the specific POC you’ve chosen. For example, if you see a reference to 00_config and you’ve chosen pdf_uc_volume, you’ll find the relevant 00_global_config notebook at 00_global_config.


  1. Optionally, review the default parameters

    Open the 00_global_config Notebook within the directory to view the POC’s applications default parameters for the data pipeline and RAG chain.

    Note

    Important: our recommended default parameters are by no means perfect, nor are they intended to be. Rather, they are a place to start from - the next steps of our workflow guide you through iterating on these parameters.

  2. Run the data pipeline

    The POC data pipeline is a Databricks Notebook based on Apache Spark. Open the 02_data_pipeline Notebook and press Run All to execute the pipeline. The pipeline will:

    1. Load the raw documents from the UC Volume

    2. Parse each document, saving the results to a Delta Table

    3. Chunk each document, saving the results to a Delta Table

    4. Embed the documents and create a Vector Index using Mosaic AI Vector Search


    Metadata (output tables, configuration, etc) about the data pipeline are logged to MLflow:

    ../_images/datapipelinemlflow.gif

    You can inspect the outputs by looking for links to the Delta Tables/Vector Indexes output near the bottom of the notebook:

    Vector index: https://<your-workspace-url>.databricks.com/explore/data/<uc-catalog>/<uc-schema>/<app-name>_poc_chunked_docs_gold_index
    
    Output tables:
    
    Bronze Delta Table w/ raw files: https://<your-workspace-url>.databricks.com/explore/data/<uc-catalog>/<uc-schema>/<app-name>__poc_raw_files_bronze
    Silver Delta Table w/ parsed files: https://<your-workspace-url>.databricks.com/explore/data/<uc-catalog>/<uc-schema>/<app-name>__poc_parsed_docs_silver
    Gold Delta Table w/ chunked files: https://<your-workspace-url>.databricks.com/explore/data/<uc-catalog>/<uc-schema>/<app-name>__poc_chunked_docs_gold
    
  3. Deploy the POC chain to the Review App

    The default POC chain is a multi-turn conversation RAG chain built using LangChain.

    Tip

    The POC Chain uses MLflow code-based logging. To understand more about code-based logging, visit the docs.

    1. Open the 03_agent_proof_of_concept Notebook

    2. Run each cell of the Notebook.

    3. You will see the MLflow Trace that shows you how the POC application works. Adjust the input question to one that is relevant to your use case, and re-run the cell to “vibe check” the application.

      ../_images/mlflow_trace2.gif
    4. Modify the default instructions to be relevant to your use case. These are displayed in the Review App.

         instructions_to_reviewer = f"""## Instructions for Testing the {AGENT_NAME}'s Initial Proof of Concept (PoC)
      
         Your inputs are invaluable for the development team. By providing detailed feedback and corrections, you help us fix issues and improve the overall quality of the application. We rely on your expertise to identify any gaps or areas needing enhancement.
      
         1. **Variety of Questions**:
            - Please try a wide range of questions that you anticipate the end users of the application will ask. This helps us ensure the application can handle the expected queries effectively.
      
         2. **Feedback on Answers**:
            - After asking each question, use the feedback widgets provided to review the answer given by the application.
            - If you think the answer is incorrect or could be improved, please use "Edit Answer" to correct it. Your corrections will enable our team to refine the application's accuracy.
      
         3. **Review of Returned Documents**:
            - Carefully review each document that the system returns in response to your question.
            - Use the thumbs up/down feature to indicate whether the document was relevant to the question asked. A thumbs up signifies relevance, while a thumbs down indicates the document was not useful.
      
         Thank you for your time and effort in testing {AGENT_NAME}. Your contributions are essential to delivering a high-quality product to our end users."""
      
         print(instructions_to_reviewer)
      
    5. Run the deployment cell to get a link to the Review App.

      Review App URL: https://<your-workspace-url>.databricks.com/ml/review/<uc-catalog>.<uc-schema>.<uc-model-name>/<uc-model-version>
      
  4. Grant individual users permissions to access the Review App.

    You can grant access to non-Databricks users by following these steps.

  5. Test the Review App by asking a few questions yourself and providing feedback.

    Note

    MLflow Traces and the user’s feedback from the Review App will appear in Delta Tables in the catalog/schema you have configured. Logs can take up to 2 hours to appear in these Delta Tables.

  6. Share the Review App with stakeholders

    You can now share your POC RAG application with your stakeholders to get their feedback.

    Important

    We suggest distributing your POC to at least 3 stakeholders and having them each ask 10 - 20 questions. It is important to have multiple stakeholders test your POC so you can have a diverse set of perspectives to include in your Evaluation Set.