Together AI: Fine-Tuning LLMs

This notebook provides a step-by-step guide to fine-tuning Large Language Models (LLMs) using the Together AI platform. We will cover the entire workflow, from preparing your dataset to making inference calls with your new custom model.


💡 Key Concepts in Fine-Tuning

Before we write code, let's grasp the core ideas:

  • Fine-Tuning: This is the process of taking a general-purpose, pre-trained LLM and further training it on a smaller, specific dataset. This adapts the model to your particular domain or task, such as a customer support chatbot or a code generator for a specific programming language.

  • Dataset Formatting: The quality and format of your data are critical. For instruction-based fine-tuning, you need to structure your data with clear prompts and desired responses. Together AI expects data in JSONL format, where each line is a JSON object containing a "text" field (a sample line is shown right after this list).

  • Base Model: This is the pre-trained model you start with. Your choice of base model is important. For example, a model pre-trained on chat is a better starting point for a chatbot than a raw text completion model. Together AI offers many state-of-the-art open-source models.

  • Hyperparameters: These are the settings for your training job, such as learning_rate, batch_size, and the number of epochs (how many times the model sees the entire dataset). Tuning these can significantly impact your model's performance.
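
For instance, a single line of the JSONL training file we build later in this notebook looks like this (one JSON object per line, with the full prompt and response packed into the "text" field):

{"text": "<s>[INST] When did Virgin Australia start operating? [/INST] Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. </s>"}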

⚙️ 1. Setup and Installation

First, we need to install the necessary Python libraries.

# Uncomment to install the required packages
# %pip install -U together datasets transformers python-dotenv -q

Loading API Keys

We'll use the dotenv library to securely load our Together AI API key from a .env file. Create a file named .env in the same directory as this notebook and add your key:

TOGETHER_API_KEY="your-together-api-key-here"

import os
import together
from dotenv import load_dotenv

load_dotenv()

# Read the API keys loaded from the .env file
TOGETHER_API_KEY = os.environ.get("TOGETHER_API_KEY")
HUGGINGFACE_ACCESS_TOKEN = os.environ.get("HUGGINGFACE_ACCESS_TOKEN")
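
A quick sanity check that the key actually loaded saves a confusing authentication error later. The check below is just a convenience, not part of the Together SDK:

# Fail fast if the Together API key was not found in the environment / .env file
if not TOGETHER_API_KEY:
    raise ValueError("TOGETHER_API_KEY is not set. Add it to your .env file before continuing.")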

📂 2. Data Preparation

We will use a small sample from the databricks/databricks-dolly-15k dataset. We'll format it into the required JSONL structure using a standard instruction template (<s>[INST]...[/INST]...</s>) that works well with models like Llama.

import json
from datasets import load_dataset

# Load a sample of 500 examples from the dataset
dataset = load_dataset("databricks/databricks-dolly-15k", split="train", token=HUGGINGFACE_ACCESS_TOKEN).select(range(500))

def format_for_finetuning(example):
    # Use a standard instruction format
    return {"text": f"<s>[INST] {example['instruction']} [/INST] {example['response']} </s>"}

formatted_dataset = dataset.map(format_for_finetuning)

# Save the prepared data to a JSONL file (only the 'text' field per line)
file_name = "dolly_prepared.jsonl"
with open(file_name, 'w') as f:
    for item in formatted_dataset:
        f.write(json.dumps({"text": item["text"]}) + "\n")

print(f"Dataset prepared and saved to {file_name}")
print("--- Sample Entry ---")
with open(file_name, 'r') as f:
    print(json.loads(f.readline())['text'])
Dataset prepared and saved to dolly_prepared.jsonl
--- Sample Entry ---
<s>[INST] When did Virgin Australia start operating? [/INST] Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. </s>

🚀 3. Upload File & Start Fine-Tuning

Now we upload our prepared dataset and launch the fine-tuning job. We will fine-tune the togethercomputer/llama-2-7b-chat model.

# 1. Upload the training file
try:
    upload_response = together.Files.upload(file=file_name)
    training_file_id = upload_response['id']
    print(f"File uploaded successfully. File ID: {training_file_id}")

    # 2. Create the fine-tuning job
    fine_tune_response = together.Finetune.create(
      training_file=training_file_id,
      model='togethercomputer/llama-2-7b-chat', # Base model to fine-tune
      n_epochs=3,                            # Number of training epochs
      n_checkpoints=1,                       # Number of checkpoints to save
      batch_size=8,                          # Batch size
      learning_rate=1e-5,                    # Learning rate
      suffix='dolly-llama2-7b-tutorial',     # A custom name for your fine-tuned model
    )

    print("\nFine-tuning job created:")
    print(fine_tune_response)
except Exception as e:
    print(f"Error uploading file or creating fine-tune job: {e}")
/var/folders/pv/g_b0j0n53rz5fm8yrlw3jg040000gn/T/ipykernel_55499/1191766534.py:3: DeprecationWarning: Call to deprecated function upload.
  upload_response = together.Files.upload(file=file_name)
Uploading file dolly_prepared.jsonl: 100%|██████████| 266k/266k [00:01<00:00, 212kB/s]
File uploaded successfully. File ID: file-9984a196-2c4b-4f82-b44e-da96665f34b1
/var/folders/pv/g_b0j0n53rz5fm8yrlw3jg040000gn/T/ipykernel_55499/1191766534.py:8: DeprecationWarning: Call to deprecated function create.
  fine_tune_response = together.Finetune.create(
Fine-tuning job created:
{'id': 'ft-91b93aa1-b4dd', 'training_file': 'file-9984a196-2c4b-4f82-b44e-da96665f34b1', 'model': 'togethercomputer/llama-2-7b-chat', 'n_epochs': 3, 'n_checkpoints': 1, 'n_evals': 0, 'batch_size': 8, 'learning_rate': 1e-05, 'lr_scheduler': {'lr_scheduler_type': 'cosine', 'lr_scheduler_args': {'min_lr_ratio': 0.0, 'num_cycles': 0.5}}, 'warmup_ratio': 0.0, 'max_grad_norm': 1.0, 'weight_decay': 0.0, 'eval_steps': 0, 'training_type': {'type': 'Lora'}, 'created_at': '2025-08-05T17:54:21.187Z', 'updated_at': '2025-08-05T17:54:21.187Z', 'status': <FinetuneJobStatus.STATUS_PENDING: 'pending'>, 'events': [], 'token_count': 0, 'total_price': 0, 'wandb_base_url': '', 'wandb_project_name': '', 'wandb_name': '', 'train_on_inputs': 'auto', 'suffix': 'dolly-llama2-7b-tutorial', 'training_method': {'method': 'sft', 'train_on_inputs': 'auto'}, 'random_seed': 'null', 'max_steps': -1, 'save_steps': 0, 'warmup_steps': 0, 'validation_split_ratio': 0, 'per_device_batch_size': 0, 'per_device_eval_batch_size': 0, 'gradient_accumulation_steps': 1, 'continued_checkpoint': '', 'merge_parent_adapter': False, 'parent_ft_id': '', 'try_byoa_upload': True, 'user_id': '68920bddaeed77e341146664', 'owner_address': '0xb8e99171f6536df47bc53657526b6dbdcfbc0ee9'}

🔎 4. Monitor the Fine-Tuning Job

The fine-tuning process can take some time. You can monitor its status programmatically. The job will go through queued, running, processing_files, and finally completed states.

# Check on the fine-tuning job (training can take a while to finish)
job_id = "ft-338e34c5-fdc5"  # replace with your own job ID, e.g. fine_tune_response['id']
status = together.Finetune.retrieve(fine_tune_id=job_id)
job_status = status.get('status', 'unknown')
print(f"Current job status: {job_status}")
/var/folders/pv/g_b0j0n53rz5fm8yrlw3jg040000gn/T/ipykernel_55499/2545042428.py:5: DeprecationWarning: Call to deprecated function retrieve.
  status = together.Finetune.retrieve(fine_tune_id=job_id)
Current job status: completed
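
If you want the notebook to block until training is done, a simple polling loop over the same retrieve call works. This is a minimal sketch; it assumes 'completed', 'error', and 'cancelled' are the terminal statuses and checks once per minute:

import time

while True:
    status = together.Finetune.retrieve(fine_tune_id=job_id)
    job_status = status.get('status', 'unknown')
    print(f"Current job status: {job_status}")
    if job_status in ('completed', 'error', 'cancelled'):  # assumed terminal states
        break
    time.sleep(60)  # wait a minute before checking again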

🤖 5. Use the Fine-Tuned Model for Inference

Once the job is complete, your model is ready! Use the new model name returned by the API for inference. Remember to use the same prompt format ([INST]...[/INST]) that you used for training.

# This cell will only work once the fine-tuning job above has completed.
# Get the fine-tuned model name from the Together fine-tuning dashboard.
dedicated_endpoint = 'https://api.together.ai/v1/inference/devon_a863/llama-2-7b-chat-dolly-llama2-7b-tutorial-e305d828'  # FAKE dedicated endpoint for demonstration

import requests

headers = {
    'Authorization': f'Bearer {os.environ.get("TOGETHER_API_KEY", "")}',
    'Content-Type': 'application/json'
}

payload = {
    "model": "devon_a863/llama-2-7b-chat-dolly-llama2-7b-tutorial-e305d828",
    "prompt": "[INST] What is the secret to a successful startup? [/INST]",
    "max_tokens": 256,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.7,
    "repetition_penalty": 1.1,
    "stop": ["[/INST]", "</s>"]
}

try:
    response = requests.post(dedicated_endpoint, headers=headers, json=payload)
    response.raise_for_status()
    print("\nFake Dedicated Endpoint Response:")
    print(response.json())
except Exception as e:
    print(e)
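
If you prefer to stay inside the Python SDK rather than posting to an endpoint with requests, the legacy together.Complete.create interface (the same pre-1.0 SDK used throughout this notebook) can query the model by name. This is a sketch: the model name is the placeholder from above, and serverless inference may first require deploying/starting the model on your account:

# Query the fine-tuned model via the legacy SDK (model name is the placeholder used above)
sdk_response = together.Complete.create(
    prompt="[INST] What is the secret to a successful startup? [/INST]",
    model="devon_a863/llama-2-7b-chat-dolly-llama2-7b-tutorial-e305d828",
    max_tokens=256,
    temperature=0.7,
    top_k=50,
    top_p=0.7,
    repetition_penalty=1.1,
    stop=["</s>"],
)

# The exact response shape can differ between SDK versions, so inspect the raw result
print(sdk_response)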

Summary

This notebook demonstrates the full workflow for fine-tuning a Large Language Model (LLM) using the Together AI platform:

  • Key Concepts: Introduces fine-tuning, dataset formatting, base models, and hyperparameters.
  • Setup: Installs required libraries and loads API keys securely from a .env file.
  • Data Preparation: Downloads a sample from the databricks/databricks-dolly-15k dataset, formats it for instruction-based fine-tuning, and saves it as a JSONL file.
  • File Upload & Fine-Tuning: Uploads the prepared dataset to Together AI and creates a fine-tuning job using a base model (togethercomputer/llama-2-7b-chat).
  • Job Monitoring: Shows how to monitor the fine-tuning job status programmatically.
  • Inference: Demonstrates how to use the fine-tuned model for inference, including handling errors and (for demonstration) how to call a fake dedicated endpoint.

This notebook provides a practical, end-to-end guide for customizing LLMs with your own data on Together AI, with basic error handling around the upload, fine-tuning, and inference calls.