Create schema and guidelines for desired GitLab tools for Agent Platform

Problem

To avoid repeating the problems already raised in gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#1328 (closed) when creating tools for the MCP server, the most important step is to think about what kinds of tools work well with LLMs and MCP, rather than creating tools ad hoc as the need arises or mapping them 1-to-1 from API endpoints.

The problem is that many engineers don't yet have the knowledge or intuition to design good tools, even though they have no trouble actually implementing them (e.g. the data-fetching logic).

Desired Outcome

Ideally we have two outcomes:

  1. Guidelines on how to create good tools, e.g. considerations around input parameters, prompts, and the surface area of the tools
  2. A schema proposal that maps the existing GitLab-interaction tools (CRUD issue/work_item etc.) in the Duo Agent Platform to a new, likely reduced set of tools with more LLM-friendly inputs

Proposed Solution

PoC demo: https://youtu.be/pPN6gAXiBg4

Summary

This proposal introduces a centralized Tool Registry repository to manage tool schemas for GitLab's AI agent platform. Currently, tool development lacks standardization, leading to inconsistent quality and poor LLM compatibility. The Tool Registry will serve as a single source of truth for all tool schemas, enforce quality standards through expert review, include automated routing evaluation, and generate versioned clients for multiple programming languages. This approach will enable scalable, high-quality tool development while maintaining backward compatibility and supporting both GitLab.com and self-managed deployments.

Ideal State: A centralized system where AI experts review and approve all tool schemas, automated tests validate routing performance, and versioned client packages are automatically generated for implementation across MCP servers, Duo Workflow Service (DWS), and Node Executor. The Registry will maintain multiple versions of each tool simultaneously, allowing services to adopt schema changes at their own pace while ensuring backward compatibility.

Overview

The Tool Registry is a centralized repository that hosts tool schemas (not implementations) with the following core features:

  • Schema-only repository: Stores only tool definitions; implementations remain in relevant services
  • Expert review process: AI experts committee reviews all schema changes
  • Integrated evaluation: Day-one tool routing evaluation for schema changes
  • Tool version support: Supports multiple versions of each tool simultaneously, allowing services to adopt schema changes at their own pace
  • Multi-language support: Generates versioned packages in Python, Ruby, and Node.js
  • Comprehensive documentation: Guidelines for creating effective tools

Key Benefits

Centralized Management

  • Single repo for all tool schemas
  • Simple to add tools (just a YAML file)
  • Auto-generates interfaces for multiple languages/registries

Enable Quality at Scale

  • Built-in routing evaluation for data-driven decisions
  • Easy adoption of industry standards (e.g., Structured Output)
  • Centralized guidelines and quality control

Streamline Collaboration

  • Decouples schema design from implementation
  • AI experts review all changes in one place
  • Backend engineers focus on implementation (their expertise)

Maintain Competitive Advantage

  • Flexibility to rapidly adopt industry evolution in tool design
  • Version support for safe migrations and custom flow building
  • Keep GitLab AI solution ahead of competitors

Architecture

[architecture diagram]

The tool registry acts as a central hub that:

  1. Stores all tool schemas with support for multiple versions
  2. Enforces quality through automated tests and expert review
  3. Generates client packages with tool interfaces for runtime implementation in various languages
  4. Validates routing performance continuously
  5. Supports customizable structured tool output schemas

Repository Structure

Demo repo: https://gitlab.com/junminghuang/tool-registry

├── clients
│   ├── node
│   ├── python
│   │   ├── pyproject.toml
│   │   ├── README.md
│   │   ├── src
│   │   │   └── tool_registry
│   │   │       ├── __init__.py
│   │   │       ├── get_gitlab_issue
│   │   │       │   └── __init__.py
│   │   │       ├── read_file
│   │   │       │   ├── __init__.py
│   │   │       │   ├── v0_0_1.py
│   │   │       │   └── v0_0_2.py
│   │   │       └── schema.py
│   │   └── tests
│   │       ├── __init__.py
│   │       └── test_main.py
│   └── ruby
├── docs
│   ├── 01-introduction.md
│   ├── 02-tool-basics.md
│   ├── 03-defining-your-tool.md
│   ├── 04-implementation-guide.md
│   ├── 05-best-practices.md
│   └── 06-testing-and-debugging.md
├── eval
│   ├── configs
│   │   ├── get_gitlab_issue
│   │   │   └── v0_0_1.yaml
│   │   └── read_file
│   │       ├── v0_0_1.yaml
│   │       └── v0_0_2.yaml
│   └── main.py
├── generators
│   ├── gen_node.py
│   ├── gen_python.py
│   └── gen_ruby.py
├── pyproject.toml
├── README.md
├── tests
├── tools
│   ├── __init__.py
│   ├── schema.py
│   ├── specs
│   │   ├── get_gitlab_issue
│   │   │   └── v0_0_1.yaml
│   │   └── read_file
│   │       ├── v0_0_1.yaml
│   │       └── v0_0_2.yaml
│   └── tool_factory.py
└── uv.lock
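The generators under `generators/` can stay quite thin. As a hypothetical sketch (the real `gen_python.py` may differ, and the template below is illustrative, not the repo's actual output), a generator might render each versioned spec into the skeleton of a generated client module:

```python
# Hypothetical sketch of gen_python.py-style code generation: render a
# versioned tool spec into the skeleton of a generated client module.
# Class and field names here are illustrative assumptions.

TEMPLATE = '''\
# Auto-generated from {name} v{version} -- do not edit by hand.
class {class_name}Base:
    """Base class for the `{name}` tool; services implement `_execute`."""

    def _execute(self, {params}):
        raise NotImplementedError
'''


def render_module(spec: dict) -> str:
    """Render a minimal generated module for one tool spec version."""
    params = ", ".join(
        p["name"] if p.get("required") else f'{p["name"]}={p.get("default")!r}'
        for p in spec["inputSchema"]["parameters"]
    )
    class_name = "".join(part.title() for part in spec["name"].split("_"))
    return TEMPLATE.format(name=spec["name"], version=spec["version"],
                           class_name=class_name, params=params)


spec = {
    "version": "0.0.2",
    "name": "read_file",
    "inputSchema": {"parameters": [
        {"name": "file_path", "required": True},
        {"name": "offset", "required": False, "default": 1},
        {"name": "limit", "required": False, "default": 2000},
    ]},
}
print(render_module(spec))
```

The same spec dict would feed `gen_ruby.py` and `gen_node.py`, which keeps all language clients in lockstep with a single YAML source.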

Schema Definitions

Tool Schema

As YAML is widely used at GitLab, the tool registry will use YAML as the schema format, with Python pydantic BaseModels for schema validation. Every tool YAML file is loaded as a ToolSpec instance and validated: https://gitlab.com/junminghuang/tool-registry/-/blob/62924f5f72c612c1191d4abbf4c822ce8cc1b356/tools/schema.py

An example tool schema is a simple YAML file:

version: 0.0.2
title: Read File
name: read_file
description: A tool to read partial content of a file at the given path, based on the offset and limit settings
host: duo_workflow_service
inputSchema:
  parameters:
    - name: file_path
      description: The path of the file
      type: string
      required: true
    - name: offset
      description: The number of bytes to skip before reading
      type: integer
      required: false
      default: 1
    - name: limit
      description: The maximum number of bytes to read
      type: integer
      required: false
      default: 2000
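The actual validation lives in `tools/schema.py` and uses pydantic. As a dependency-free sketch of the idea (field names follow the YAML example above; the validation rules shown are assumptions, not the repo's exact checks), a `ToolSpec`-style model might look like:

```python
# Dependency-free sketch of ToolSpec-style validation (the real repo
# uses pydantic BaseModel); field names follow the YAML example above.
from dataclasses import dataclass, field

ALLOWED_TYPES = {"string", "integer", "number", "boolean", "array", "object"}


@dataclass
class Parameter:
    name: str
    description: str
    type: str
    required: bool = False
    default: object = None

    def __post_init__(self):
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown parameter type: {self.type}")
        if self.required and self.default is not None:
            raise ValueError(f"required parameter {self.name} cannot have a default")


@dataclass
class ToolSpec:
    version: str
    title: str
    name: str
    description: str
    host: str
    parameters: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.version.split(".")) != 3:
            raise ValueError("version must be MAJOR.MINOR.PATCH")


# Loading the read_file example above would produce something like:
spec = ToolSpec(
    version="0.0.2",
    title="Read File",
    name="read_file",
    description="Read partial file content based on offset and limit",
    host="duo_workflow_service",
    parameters=[Parameter("file_path", "The path of the file", "string", required=True)],
)
```

Because every spec passes through one model, a malformed YAML file fails CI before it can reach review.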

Tool Versioning Strategy

Tools will follow semantic versioning (MAJOR.MINOR.PATCH):

  • MAJOR: Breaking changes to input/output schemas
  • MINOR: Backward-compatible functionality additions
  • PATCH: Backward-compatible bug fixes or documentation updates
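Under this scheme, a client can mechanically decide whether a newer schema version is safe to adopt. A small sketch (not code from the repo):

```python
# Sketch of semantic-version comparison for tool schemas: a MINOR or
# PATCH bump is backward compatible, a MAJOR bump is breaking.
def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def is_compatible_upgrade(current: str, candidate: str) -> bool:
    """True if candidate is a newer, non-breaking version of current."""
    cur, cand = parse_version(current), parse_version(candidate)
    return cand > cur and cand[0] == cur[0]


print(is_compatible_upgrade("0.0.1", "0.0.2"))  # True: PATCH bump
print(is_compatible_upgrade("0.0.2", "1.0.0"))  # False: MAJOR bump is breaking
```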

Tool Development Process

1. Introducing a New Tool

  1. Create Tool Schema yaml
  2. Add Routing Evaluation Config
  3. Run and pass the test and routing evaluation
  4. Submit Merge Request
  5. AI Expert Review
  6. Approval, Merge and publish tool-registry package with new version
  7. Install the package and start the tool execution implementation

2. Minor Tool Updates (Non-Breaking)

  1. Create Tool Schema yaml with an incremented MINOR or PATCH version
  2. Update Routing Evaluation Config
  3. Run and pass the test and routing evaluation
  4. Submit Merge Request
  5. AI Expert Review
  6. Approval, Merge and publish tool-registry package with new version
  7. Install the package and start the tool execution implementation

3. Breaking Changes (Major Version Bump)

Use Cases: Removing parameters, changing parameter types, renaming tools

  1. Create Tool Schema yaml with a new MAJOR version
  2. Update Routing Evaluation Config
  3. Run and pass the test and routing evaluation
  4. Submit Merge Request
  5. AI Expert Review
  6. Approval, Merge and publish tool-registry package with new version
  7. Install the package and start the tool execution implementation

Tool Implementation

  1. Install the correct tool-registry package
  2. Import the Tool base class and output schema
  3. Implement the _execute method
  4. Test and create MR

Example in python

from tool_registry.read_file.v0_0_2 import ReadFileBase, ReadFileOutput, spec


class ReadFile(ReadFileBase):

    def _execute(self, file_path, offset=1, limit=2000) -> ReadFileOutput:

        content = (
            "Do you want to know more about Junming?" + "\n" * 2100
            if offset < 2000
            else "Junming is a Senior ML Engineer at GitLab!"
        )
        return ReadFileOutput(
            result=f"content read from version: {spec.version} " + content,
            metadata={"offset": offset, "limit": limit, "total_lines": 5000},
            instruction=f"The content is partial from the file: {file_path}, if the result doesn't have content you need, try to increase the offset.",
        )
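A caller could then invoke the implemented tool. The snippet below is self-contained for illustration: it stubs the pieces the generated package would normally provide (the base class, output schema, and any `execute` wrapper are assumptions here, not the repo's actual generated code):

```python
# Self-contained illustration: stub the generated base/output types that
# tool_registry.read_file.v0_0_2 would normally provide, then implement
# and call the tool as in the example above.
from dataclasses import dataclass, field


@dataclass
class ReadFileOutput:          # stand-in for the generated output schema
    result: str
    metadata: dict = field(default_factory=dict)
    instruction: str = ""


class ReadFileBase:            # stand-in for the generated base class
    def execute(self, file_path, offset=1, limit=2000) -> ReadFileOutput:
        # A generated wrapper might validate inputs before delegating.
        if offset < 0 or limit <= 0:
            raise ValueError("offset must be >= 0 and limit > 0")
        return self._execute(file_path, offset=offset, limit=limit)

    def _execute(self, file_path, offset=1, limit=2000) -> ReadFileOutput:
        raise NotImplementedError


class ReadFile(ReadFileBase):
    def _execute(self, file_path, offset=1, limit=2000) -> ReadFileOutput:
        return ReadFileOutput(
            result=f"content of {file_path} from byte {offset}",
            metadata={"offset": offset, "limit": limit},
            instruction="Increase the offset if the content you need is missing.",
        )


output = ReadFile().execute("README.md", offset=0, limit=100)
print(output.result)  # content of README.md from byte 0
```

Keeping input validation in the generated wrapper, rather than in each `_execute`, is one way to make every service enforce the schema identically.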

Legacy content


Tool Schema

Based on all the GitLab-related tools in DWS, here is the proposed consolidation plan.

  • list_issues
  • get_issue
  • list_issue_notes
  • get_issue_note
  • gitlab_issue_search

Consolidated to tool:

{
  "title": "get_gitlab_issue",
  "description": "Get GitLab issue related data from a given API endpoint.",
  "properties": {
    "project_id": {
      "type": "integer",
      "description": "The id of the GitLab project"
    },
    "endpoint": {
      "type": "string",
      "description": "API endpoint, strictly follows the below examples:
        - issues
        - issues?assignee_username=john
        - issues?author_username=john
        - issues?confidential=true
        - issues?iids[]=42&iids[]=43
        - issues?labels=foo
        - issues?labels=foo,bar 
        - issues?labels=foo,bar&state=opened
        - issues?milestone=1.0.0
        - issues?milestone=1.0.0&state=opened
        - issues?my_reaction_emoji=star
        - issues?search=issue+title+or+description
        - issues?state=closed
        - issues?state=opened"
    }
  },
  "required": [
    "project_id",
    "endpoint"
  ],
  "type": "object",
  "strict": false
}
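To keep such a consolidated tool robust, the implementation can validate the LLM-provided `endpoint` against the allowed patterns before calling the API. A hypothetical guard (the allow-list mirrors the examples in the schema description above; this is not code from the proposal):

```python
# Hypothetical guard for the consolidated get_gitlab_issue tool: accept
# only `issues` endpoints whose query parameters appear in the schema's
# example list, rejecting anything outside those patterns.
from urllib.parse import urlparse, parse_qs

ALLOWED_PARAMS = {
    "assignee_username", "author_username", "confidential", "iids[]",
    "labels", "milestone", "my_reaction_emoji", "search", "state",
}


def validate_endpoint(endpoint: str) -> bool:
    """Return True if the endpoint matches an allowed `issues` pattern."""
    parsed = urlparse(endpoint)
    if parsed.path != "issues":
        return False
    return all(key in ALLOWED_PARAMS for key in parse_qs(parsed.query))


print(validate_endpoint("issues?labels=foo,bar&state=opened"))  # True
print(validate_endpoint("merge_requests?state=opened"))         # False
```

Rejecting unknown endpoints early turns a hallucinated API path into a recoverable tool error rather than an unexpected API call.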

  • create_issue_note

  • create_issue

  • update_issue

  • list_work_items

  • get_work_item

  • get_work_item_notes

  • gitlab_note_search

  • create_work_item

  • create_work_item_note

  • get_epic

  • list_epics

  • list_epic_notes

  • create_epic

  • update_epic


  • get_merge_request
  • list_merge_request_diffs
  • list_all_merge_request_notes
  • gitlab_merge_request_search

Consolidated to tool:

{
  "name": "get_gitlab_merge_request",
  "description": "Get GitLab merge request related data from a given API path.",
  "parameters": {
    "properties": {
      "api_path": {
        "description": "GitLab merge request REST API path; it should be one of the following paths:
          - `/api/v4/projects/<project_id>/merge_requests` (list all merge requests in the project)
          - `/api/v4/projects/<project_id>/merge_requests/<merge_request_id>` (get a specific merge request)
          - `/api/v4/projects/<project_id>/merge_requests/<merge_request_id>/notes` (list all notes from a specific merge request)
          - `/api/v4/projects/<project_id>/merge_requests/<merge_request_id>/diffs` (list all diffs from a specific merge request)",
        "type": "string"
      }
    },
    "required": ["api_path"],
    "type": "object"
  }
}

  • create_merge_request_note

  • create_merge_request

  • update_merge_request

  • get_commit

  • gitlab_commit_search

  • list_commits

  • get_commit_diff

  • get_commit_comments

  • create_commit

  • list_vulnerabilities

  • confirm_vulnerability

  • link_vulnerability_to_issue

  • get_vulnerability_details

  • update_vulnerability_severity

  • dismiss_vulnerability

  • list_instance_audit_events

  • list_project_audit_events

  • list_group_audit_events

  • get_repository_file

  • list_repository_tree

  • get_project

  • gitlab_user_search

  • get_current_user

  • get_previous_session_context

  • get_job_logs

  • gitlab_blob_search

  • gitlab_wiki_blob_search

  • gitlab_group_project_search

  • gitlab_documentation_search

  • gitlab_milestone_search

  • ci_linter

  • get_pipeline_errors

Edited by Junming Huang