Step 1: Clone code repo & create compute
This implementation section is accompanied by a repository of sample code designed to run on Databricks.
Follow these steps to load the sample code into your Databricks workspace and configure the global settings for the application.
You can find all of the sample code referenced throughout this section here.
Requirements
- A Databricks workspace with serverless compute and Unity Catalog enabled
- A Mosaic AI Vector Search endpoint, either:
  - An existing endpoint, or
  - Permissions to create a new endpoint (the setup Notebook will do this for you; see the sketch after this list)
- A Unity Catalog schema where the output Delta Tables (holding the parsed/chunked documents) and Vector Search indexes are stored, either:
  - Write access to an existing Unity Catalog catalog and schema, or
  - Permissions to create a new catalog and schema (the setup Notebook will do this for you)
- A cluster with access to the internet
  - Internet access is required to download the necessary Python and system packages
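
If you would rather verify these prerequisites before running the setup Notebook, the sketch below checks for (and, given sufficient permissions, creates) the Vector Search endpoint, catalog, and schema. This is illustrative only, not the setup Notebook itself; the endpoint, catalog, and schema names are placeholders.

```python
# A minimal prerequisite check -- a sketch, not the actual setup Notebook.
# Assumes a Databricks notebook context (where `spark` is available) and the
# databricks-vectorsearch package; all names below are placeholders.
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()

ENDPOINT_NAME = "my_vector_search"  # placeholder endpoint name

# Create the Vector Search endpoint if it does not already exist.
existing = [e["name"] for e in vsc.list_endpoints().get("endpoints", [])]
if ENDPOINT_NAME not in existing:
    vsc.create_endpoint(name=ENDPOINT_NAME, endpoint_type="STANDARD")

# Create the catalog and schema if they do not exist
# (requires CREATE CATALOG / CREATE SCHEMA privileges).
spark.sql("CREATE CATALOG IF NOT EXISTS my_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS my_catalog.rag_my_schema")
```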
Instructions
Clone this repository into your workspace using Git Folders.
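
Git Folders are created from the workspace UI (Workspace > Create > Git folder). If you prefer to clone programmatically, here is a sketch using the Databricks Python SDK; the repository URL and destination path are placeholders you must replace with the actual sample-code repository and your own workspace path.

```python
# Sketch: clone the sample-code repository as a Git folder via the Databricks SDK.
# The URL and path below are placeholders, not the actual repository location.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
me = w.current_user.me().user_name

w.repos.create(
    url="https://github.com/<org>/<sample-code-repo>",  # placeholder URL
    provider="gitHub",
    path=f"/Repos/{me}/rag-sample-code",  # placeholder destination
)
```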
Open the `00_global_config` Notebook and adjust the settings there:

```python
# The name of the RAG application. This is used to name the chain's UC model
# and is prepended to the names of the output Delta Tables and Vector Search indexes.
AGENT_NAME = 'my_agent_app'

# UC Catalog & Schema where the output tables/indexes are saved.
# If this catalog/schema does not exist, you need permissions to create them.
UC_CATALOG = f'{user_name}_catalog'
UC_SCHEMA = f'rag_{user_name}'

# UC Model name where the POC chain is logged
UC_MODEL_NAME = f"{UC_CATALOG}.{UC_SCHEMA}.{AGENT_NAME}"

# Vector Search endpoint where the index is loaded
# If this does not exist, it will be created
VECTOR_SEARCH_ENDPOINT = f'{user_name}_vector_search'

# Source location for documents
# You need to create this location and add files
SOURCE_PATH = f"/Volumes/{UC_CATALOG}/{UC_SCHEMA}/source_docs"
```
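
Note that `SOURCE_PATH` must point to an existing Unity Catalog Volume containing your source documents. If the Volume does not exist yet, here is a minimal sketch to create it, assuming you have CREATE VOLUME privileges on the schema:

```python
# Sketch: create the UC Volume behind SOURCE_PATH if it does not already exist.
# Assumes UC_CATALOG and UC_SCHEMA are defined as in 00_global_config above.
spark.sql(f"CREATE VOLUME IF NOT EXISTS {UC_CATALOG}.{UC_SCHEMA}.source_docs")
```

After creating the Volume, upload your source documents to it (for example, through the workspace UI or the Databricks CLI) before running the data pipeline.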
Proceed to the Deploy POC step.