Fine Tuning in Azure AI Studio

We have customers seeking to fine-tune models for use in self-hosted Duo features. This issue is to enable the use of Azure AI Studio for fine-tuning models.

Azure AI Studio covers most of the requirements for fine-tuning, except for:

baseline model validation
dataset preparation for fine-tuning
tuned-model validation

The components that are already in flight for Custom Models, to include:

Pre-Requisites

Model Baselines: Having a baseline for performance without fine-tuning is essential for knowing whether or not fine-tuning improves model performance. Fine-tuning with bad data can make the base model worse, and without a baseline it is hard to detect such regressions.
Dataset created for fine-tuning -- Your training and validation datasets consist of input and output examples of how you would like the model to perform.
- You identified a dataset for fine-tuning.
- Your dataset is in the appropriate format for training: The training and validation data you use must be formatted as a JSON Lines (JSONL) document. For gpt-35-turbo (all versions), gpt-4, gpt-4o, and gpt-4o-mini, the fine-tuning dataset must be formatted in the conversational format that is used by the Chat completions API.
- You employed some level of curation to ensure dataset quality.
- The more training examples you have, the better. Fine-tuning jobs will not proceed without at least 10 training examples, but such a small number isn't enough to noticeably influence model responses. It is best practice to provide hundreds, if not thousands, of training examples. In general, doubling the dataset size can lead to a linear increase in model quality. Keep in mind, however, that low-quality examples can negatively impact performance. If you train the model on a large amount of internal data without first pruning the dataset down to only the highest-quality examples, you could end up with a model that performs much worse than expected.
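
The dataset format requirements above can be checked before uploading anything. The following is a minimal sketch, not part of any GitLab or Azure tooling: the function name `validate_chat_jsonl` is hypothetical, and it assumes the simplest conversational shape, where each JSONL line is an object with a `messages` list whose entries have a `role` of `system`, `user`, or `assistant` and a string `content`. Datasets that include tool or function calls would need a looser check.

```python
import json

MIN_EXAMPLES = 10  # minimum number of training examples mentioned above
ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: no tool/function messages


def validate_chat_jsonl(text):
    """Return (valid_example_count, errors) for a JSONL string in chat format."""
    errors = []
    valid = 0
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {line_no}: missing non-empty 'messages' list")
            continue
        well_formed = all(
            isinstance(m, dict)
            and m.get("role") in ALLOWED_ROLES
            and isinstance(m.get("content"), str)
            for m in messages
        )
        if not well_formed:
            errors.append(f"line {line_no}: each message needs a known role and string content")
            continue
        if not any(m["role"] == "assistant" for m in messages):
            errors.append(f"line {line_no}: no assistant message to learn from")
            continue
        valid += 1
    if valid < MIN_EXAMPLES:
        errors.append(f"only {valid} valid examples; at least {MIN_EXAMPLES} are required")
    return valid, errors
```

Running this over the contents of both the training file and the validation file gives a quick count of usable examples and a list of lines to fix. It only checks format; it says nothing about the quality of the examples, which still needs curation.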

Definition of Done

Customers are able to use GitLab to enable fine-tuning in Azure AI Studio.

Edited Oct 30, 2024 by Susie Bitters