AI Model Training Process Overview
1. Overview
Narrative enables users to train and fine-tune AI models tailored to their unique needs. By leveraging Narrative's tools and ecosystem, users can:
- Create custom AI models: Train models on proprietary datasets to meet specific business requirements.
- Simplify the model training workflow: Seamlessly integrate datasets, mappings, and compute resources using Narrative’s Prompt Studio and Model Studio.
- Collaborate securely: Use datasets from Narrative’s marketplace, partners, or your own uploads with complete control over data usage.
This capability empowers anyone to unlock the full potential of their data by producing fine-tuned models optimized for their unique challenges.
2. Key Concepts for Training Data
General Concepts Related to AI Fine-Tuning
- How Does Fine-Tuning Work?
- A pre-trained model is adapted using a fine-tuning dataset that provides task-specific examples.
- Fine-tuning updates the model’s parameters, enabling it to perform specific tasks while retaining its general knowledge.
- The dataset must be structured and formatted to align with the model training process.
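The idea of "updating parameters while retaining general knowledge" can be illustrated with a toy example: start gradient descent from pre-trained weights rather than from scratch. This is a minimal one-parameter sketch of the principle, not how production fine-tuning frameworks work internally.

```python
# Toy illustration of fine-tuning: continue training pre-trained
# parameters on a small task-specific dataset (1-D linear model).

def train(w, data, lr=0.01, steps=200):
    """Minimize mean squared error of y ~ w * x via gradient descent."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# "Pre-training": learn a general relationship (y = 2x).
w_pretrained = train(0.0, [(1, 2), (2, 4), (3, 6)])

# "Fine-tuning": adapt the same parameter to a related task (y = 2.5x),
# starting from the pre-trained value instead of from scratch.
w_finetuned = train(w_pretrained, [(1, 2.5), (2, 5.0)])

print(round(w_pretrained, 2))  # close to 2.0
print(round(w_finetuned, 2))   # close to 2.5
```

Because the fine-tuned run starts near a good solution, it converges with far less data, which is exactly why fine-tuning is cheaper than training from scratch.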
- Conversational Prompts
  Conversational prompts are critical for training dialogue-based models. These prompts structure data into interactions that help the model learn how to respond to specific inputs.
- Fine-Tuning Datasets
  Datasets must be curated and formatted for fine-tuning tasks. Tools like Prompt Studio help map raw data into formats like `fine_tuning_conversation`, ensuring compatibility with the model training process.
- Hardware Requirements
  Fine-tuning requires significant computational resources, typically GPU-enabled hardware. Model Studio allows users to select the appropriate compute instance to efficiently manage the fine-tuning workload.
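As a concrete illustration, conversational fine-tuning data is commonly stored as one JSON object per line (JSONL), each holding a list of role-tagged messages. The field names below are illustrative only; they are not necessarily Narrative's exact `fine_tuning_conversation` schema.

```python
import json

# One hypothetical conversational training record (field names are
# illustrative; the actual fine_tuning_conversation schema may differ).
record = {
    "messages": [
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings > Security and choose 'Reset password'."},
    ]
}

# A fine-tuning dataset holds many such records, one JSON object per line.
jsonl_line = json.dumps(record)
print(jsonl_line)
```

Each record is a complete example of the input/output behavior the model should learn; the dataset as a whole is just thousands of these lines.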
Narrative Concepts Related to Model Training
- Dataset
  The raw data for fine-tuning. It must be mapped and materialized into a format like `fine_tuning_conversation` to align with training workflows.
- Attributes
  Templates like `fine_tuning_conversation` define the structure of training datasets. Attributes ensure datasets are properly formatted for fine-tuning tasks, particularly those involving conversational prompts.
- Mappings
  Align raw dataset fields to attributes, ensuring data meets the required schema for fine-tuning. Mappings are configured in Prompt Studio.
- Prompt Studio
  A tool for mapping datasets to attributes, configuring conversational prompts, and embedding dynamic values. Learn more in the Prompt Studio Help Doc.
- Model Studio
  The interface for selecting base models, attribute-mapped datasets, and compute resources, and initiating fine-tuning. Learn more in the Model Studio Help Doc.
3. High-Level Overview of the Training Process
Step 1: Access or Upload a Dataset
Start by obtaining a dataset suitable for your use case. Datasets can come from:
- Narrative Marketplace: Browse datasets shared by other users or companies.
- Partners: Use datasets shared directly with you by Narrative partners.
- Uploads: Upload your own datasets into Narrative.
Step 2: Map the Dataset in Prompt Studio
- Use Prompt Studio to map your dataset fields to the required attribute format (e.g., `fine_tuning_conversation`).
- Configure prompts and macros to define how data fields should be used in the model training process.
For detailed instructions, see the Prompt Studio Help Doc.
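Conceptually, a mapping is a transformation from raw dataset rows into records in the target attribute shape. A minimal sketch follows; the raw field names and the output layout are assumptions standing in for `fine_tuning_conversation`, not Prompt Studio's actual output.

```python
# Hypothetical raw rows from an uploaded Q&A dataset.
raw_rows = [
    {"question": "What is Narrative?", "answer": "A data collaboration platform."},
    {"question": "Can I upload my own data?", "answer": "Yes, via dataset uploads."},
]

def map_row(row):
    """Map one raw row into a conversational training record.

    The output layout is an assumed stand-in for the
    fine_tuning_conversation attribute format.
    """
    return {
        "messages": [
            {"role": "user", "content": row["question"]},
            {"role": "assistant", "content": row["answer"]},
        ]
    }

training_records = [map_row(r) for r in raw_rows]
print(len(training_records))  # 2
```

The mapping is applied uniformly to every row, which is what guarantees the materialized dataset conforms to a single schema the training job can consume.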
Step 3: Train the Model in Model Studio
- In Model Studio, select:
- A base model (e.g., Llama or Mistral).
- A dataset formatted using the mappings from Prompt Studio.
- Compute resources optimized for your workload.
- Provide metadata for the trained model (e.g., name, description, license).
- Kick off the training process, which runs via Axolotl to fine-tune the model on the selected dataset.
For detailed instructions, see the Model Studio Help Doc.
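At a high level, the selections above (base model, mapped dataset, compute, metadata) are assembled into a run configuration for the training backend. The sketch below builds such a configuration as a Python dict; the key names are modeled loosely on Axolotl's public YAML options, but the exact configuration Model Studio generates is an assumption.

```python
import json

# Hypothetical training-run configuration in the spirit of an Axolotl
# config file (key names mirror common Axolotl options; values are
# illustrative, not what Model Studio actually emits).
run_config = {
    "base_model": "meta-llama/Llama-2-7b-hf",          # chosen base model
    "datasets": [
        {"path": "mapped_dataset.jsonl"},              # attribute-mapped data
    ],
    "num_epochs": 3,
    "learning_rate": 2e-5,
    "output_dir": "./fine-tuned-model",                # where weights land
}

print(json.dumps(run_config, indent=2))
```

Keeping the run definition declarative like this is what lets the platform reproduce or audit a training job later from its configuration alone.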
By following these steps, you can create high-performance AI models tailored to your unique requirements while maintaining control and flexibility within the Narrative platform.