Step 4: Evaluate the POC’s quality

[Figure: development workflow, baseline evaluation step]

Expected time: 5–60 minutes

Time varies with the number of questions in your evaluation set; for 100 questions, evaluation takes approximately 5 minutes.

Code Repository

You can find all of the sample code referenced throughout this section here.

Overview & expected outcome#

This step uses the evaluation set you just curated to evaluate your POC app and establish its baseline quality, cost, and latency. The next step uses these evaluation results to root-cause any quality issues.

Evaluation is performed with Mosaic AI Agent Evaluation, which comprehensively assesses the aspects of quality, cost, and latency outlined in the metrics section of this cookbook.

The aggregated metrics and the per-question evaluations are logged to MLflow. For more details, see the evaluation outputs documentation.
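To make this concrete, here is a minimal sketch of the kind of call the notebook makes. The DataFrame contents, the `models:/poc_agent/1` URI, and the run name are illustrative assumptions, not names taken from the notebook, and the per-question table key can vary by Agent Evaluation version.

```python
import mlflow
import pandas as pd

# Illustrative evaluation set; in practice, load the set curated in the
# previous step. Column names follow the Agent Evaluation input schema.
eval_set_df = pd.DataFrame(
    {
        "request": ["How do I create a Delta table?"],
        "expected_response": ["Use CREATE TABLE ... USING DELTA."],
    }
)

with mlflow.start_run(run_name="poc_baseline_eval"):
    eval_results = mlflow.evaluate(
        data=eval_set_df,
        model="models:/poc_agent/1",    # assumption: your logged POC agent
        model_type="databricks-agent",  # invokes Mosaic AI Agent Evaluation
    )

# Aggregated quality/cost/latency metrics for the run.
print(eval_results.metrics)

# Per-question judge assessments (table name may vary by version).
per_question_df = eval_results.tables["eval_results"]
```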

Requirements

  • Your evaluation set is available

  • All requirements from previous steps

Instructions

  1. Open the 05_evaluate_poc_quality notebook within your chosen POC directory and click Run All.

  2. Inspect the results of the evaluation in the notebook or in the MLflow UI (see the sketch after this list for a programmatic option).
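If you prefer to inspect results programmatically rather than in the MLflow UI, something like the following works; the experiment path below is a placeholder assumption you would replace with the path the notebook actually logs to.

```python
import mlflow

# Assumption: the notebook logged its evaluation runs to this experiment.
experiment = mlflow.set_experiment("/Workspace/Users/you@example.com/poc_eval")

# Pull all evaluation runs, newest first, with their aggregated metrics.
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["start_time DESC"],
)
metric_cols = [c for c in runs.columns if c.startswith("metrics.")]
print(runs[["run_id", *metric_cols]].head())
```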

Note

If the results meet your requirements for quality, you can skip directly to the Deployment section. Because the POC application is built on Databricks, it is ready to be deployed to a scalable, production-ready REST API.
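As a rough illustration of why no extra work is needed, deploying an agent built on Databricks is typically a single call with the databricks-agents SDK. The Unity Catalog model name below is a placeholder assumption; the Deployment section covers this step properly.

```python
from databricks import agents

# Assumption: the POC agent is registered in Unity Catalog under this name.
deployment = agents.deploy(
    model_name="main.default.poc_agent",
    model_version=1,
)

# The deployment exposes a production-ready REST endpoint.
print(deployment.query_endpoint)
```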

Next step: Using this baseline evaluation of the POC’s quality, identify the root causes of any quality issues and iteratively fix those issues to improve the app.