Step 1: Clone code repo & create compute

../_images/workflow_poc.png

The Implement section is accompanied by a repository of sample code designed to run on Databricks.

Follow these steps to load the sample code to your Databricks workspace and configure the global settings for the application.

Code Repository

You can find all of the sample code referenced throughout this section here.

Requirements

  1. A Databricks workspace with serverless compute and Unity Catalog enabled

  2. A Mosaic AI Vector Search endpoint, either:

    • An existing endpoint

    • Permissions to create a new endpoint - the setup Notebook will do this for you

  3. A Unity Catalog schema in which to store the output Delta Tables (the parsed and chunked documents) and the Vector Search indexes, either:

    • Write access to an existing Unity Catalog and Schema

    • Permissions to create a new Unity Catalog and Schema - the setup Notebook will do this for you

  4. A running cluster with internet access

    • Internet access is required to download the necessary Python and system packages
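
One way to confirm requirement 4 before running the setup is to test whether the cluster can reach a package index. The sketch below is only an illustration and is not part of the sample repository: the helper name `can_reach` is hypothetical, and the default URL `https://pypi.org/simple/` is an assumption about which index your cluster uses for Python packages.

```python
import urllib.error
import urllib.request


def can_reach(url: str = "https://pypi.org/simple/", timeout: float = 5.0) -> bool:
    """Return True if an HTTP(S) request to `url` gets any response.

    Hypothetical helper for illustration; the default URL is an assumption.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the network path works.
        return True
    except OSError:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return False
```

Calling `can_reach()` in a notebook cell attached to the cluster should return `True` when outbound access is available. A `False` result suggests reviewing the workspace's network configuration before continuing.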

Instructions

  1. Clone this repository into your workspace using Git Folders

    ../_images/clone_repo.gif

  2. Open the 00_global_config Notebook and adjust the settings there.

    # The name of the RAG application. It is used to name the chain's UC model and is prepended to the names of the output Delta Tables and Vector Search indexes.
    AGENT_NAME = 'my_agent_app'
    
    # UC catalog and schema where the output tables and indexes are saved.
    # If this catalog/schema does not exist, you need permissions to create a catalog/schema.
    # NOTE: `user_name` is not defined in this excerpt; it is assumed to be set earlier in the Notebook.
    UC_CATALOG = f'{user_name}_catalog'
    UC_SCHEMA = f'rag_{user_name}'
    
    # UC model name under which the POC chain is logged
    UC_MODEL_NAME = f"{UC_CATALOG}.{UC_SCHEMA}.{AGENT_NAME}"
    
    # Vector Search endpoint where the index is loaded.
    # If this endpoint does not exist, it will be created.
    VECTOR_SEARCH_ENDPOINT = f'{user_name}_vector_search'
    
    # Source location for documents.
    # You need to create this location and add your files to it.
    SOURCE_PATH = f"/Volumes/{UC_CATALOG}/{UC_SCHEMA}/source_docs"
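
To see how these settings resolve before editing the Notebook, the same f-string logic can be run outside Databricks with a sample user name. This is a minimal sketch, not code from the repository: the function `build_config` and the user name `jane_doe` are hypothetical, introduced only to show the derived values.

```python
def build_config(user_name: str, agent_name: str = "my_agent_app") -> dict:
    """Mirror the settings above for a given user name (hypothetical helper)."""
    uc_catalog = f"{user_name}_catalog"
    uc_schema = f"rag_{user_name}"
    return {
        "AGENT_NAME": agent_name,
        "UC_CATALOG": uc_catalog,
        "UC_SCHEMA": uc_schema,
        # Three-level Unity Catalog name: catalog.schema.model
        "UC_MODEL_NAME": f"{uc_catalog}.{uc_schema}.{agent_name}",
        "VECTOR_SEARCH_ENDPOINT": f"{user_name}_vector_search",
        # Volume path under the same catalog and schema
        "SOURCE_PATH": f"/Volumes/{uc_catalog}/{uc_schema}/source_docs",
    }


# "jane_doe" is an illustrative value, not one taken from the Notebook.
config = build_config("jane_doe")
for key, value in config.items():
    print(f"{key} = {value}")
```

Because every name is derived from `user_name`, changing `UC_CATALOG` or `UC_SCHEMA` in the Notebook also changes the model name and the source path, so the volume at `SOURCE_PATH` has to be created under the catalog and schema you finally choose.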
    

Proceed to the Deploy POC step.