Create schema and guidelines for desired GitLab tools for Agent Platform

Problem

To avoid repeating the problems already raised in gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#1328 (closed) when creating tools for the MCP server, the most important step is to think about what kinds of tools work well with LLMs and MCP, rather than creating tools ad hoc as the need arises or mapping them 1-to-1 from API endpoints.

The problem is that many engineers don't yet have the knowledge or intuition to design good tools, even though they have no trouble actually implementing them (e.g. the data-fetching logic).

Desired Outcome

Ideally we have two outcomes:

  1. Guidelines on how to create good tools, e.g. considerations around input parameters, prompts, and the surface area of the tools
  2. A schema proposal that maps the existing GitLab-interaction tools (CRUD issue/work_item etc.) in the Duo Agent Platform to a new, likely reduced set of tools with more LLM-friendly inputs

Proposed Solution

PoC demo: https://youtu.be/pPN6gAXiBg4

Summary

This proposal introduces a centralized Tool Registry repository to manage tool schemas for GitLab's AI agent platform. Currently, tool development lacks standardization, leading to inconsistent quality and poor LLM compatibility. The Tool Registry will serve as a single source of truth for all tool schemas, enforce quality standards through expert review, include automated routing evaluation, and generate versioned clients for multiple programming languages. This approach will enable scalable, high-quality tool development while maintaining backward compatibility and supporting both GitLab.com and self-managed deployments.

Ideal State: A centralized system where AI experts review and approve all tool schemas, automated tests validate routing performance, and versioned client packages are automatically generated for implementation across MCP servers, Duo Workflow Service (DWS), and Node Executor. The Registry will maintain multiple versions of each tool simultaneously, allowing services to adopt schema changes at their own pace while ensuring backward compatibility.

Overview

The Tool Registry is a centralized repository that hosts tool schemas (not implementations) with the following core features:

  • Schema-only repository: Stores only tool definitions; implementations remain in relevant services
  • Expert review process: AI experts committee reviews all schema changes
  • Integrated evaluation: Day-one tool routing evaluation for schema changes
  • Tool version support: Supports multiple versions of each tool simultaneously, allowing services to adopt schema changes at their own pace
  • Multi-language support: Generates versioned packages in Python, Ruby, and Node.js
  • Comprehensive documentation: Guidelines for creating effective tools

Key Benefits

Centralized Management

  • Single repo for all tool schemas
  • Simple to add tools (just a YAML file)
  • Auto-generates interfaces for multiple languages/registries

Enable Quality at Scale

  • Built-in routing evaluation for data-driven decisions
  • Easy adoption of industry standards (e.g., Structured Output)
  • Centralized guidelines and quality control

Streamline Collaboration

  • Decouples schema design from implementation
  • AI experts review all changes in one place
  • Backend engineers focus on implementation (their expertise)

Maintain Competitive Advantage

  • Flexibility to rapidly adopt industry evolution in tool design
  • Version support for safe migrations and custom flow building
  • Keep GitLab AI solution ahead of competitors

Architecture

[architecture diagram]

The tool registry acts as a central hub that:

  1. Stores all tool schemas with support for multiple versions
  2. Enforces quality through automated tests and expert review
  3. Generates client packages with tool interfaces for runtime implementation in various languages
  4. Validates routing performance continuously
  5. Supports customizable structured tool output schemas

Repository Structure

Demo repo: https://gitlab.com/junminghuang/tool-registry

├── clients
│   ├── node
│   ├── python
│   │   ├── pyproject.toml
│   │   ├── README.md
│   │   ├── src
│   │   │   └── tool_registry
│   │   │       ├── __init__.py
│   │   │       ├── get_gitlab_issue
│   │   │       │   └── __init__.py
│   │   │       ├── read_file
│   │   │       │   ├── __init__.py
│   │   │       │   ├── v0_0_1.py
│   │   │       │   └── v0_0_2.py
│   │   │       └── schema.py
│   │   └── tests
│   │       ├── __init__.py
│   │       └── test_main.py
│   └── ruby
├── docs
│   ├── 01-introduction.md
│   ├── 02-tool-basics.md
│   ├── 03-defining-your-tool.md
│   ├── 04-implementation-guide.md
│   ├── 05-best-practices.md
│   └── 06-testing-and-debugging.md
├── eval
│   ├── configs
│   │   ├── get_gitlab_issue
│   │   │   └── v0_0_1.yaml
│   │   └── read_file
│   │       ├── v0_0_1.yaml
│   │       └── v0_0_2.yaml
│   └── main.py
├── generators
│   ├── gen_node.py
│   ├── gen_python.py
│   └── gen_ruby.py
├── pyproject.toml
├── README.md
├── tests
├── tools
│   ├── __init__.py
│   ├── schema.py
│   ├── specs
│   │   ├── get_gitlab_issue
│   │   │   └── v0_0_1.yaml
│   │   └── read_file
│   │       ├── v0_0_1.yaml
│   │       └── v0_0_2.yaml
│   └── tool_factory.py
└── uv.lock
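The generators under `generators/` can stay quite thin. As a hypothetical sketch (the real `gen_python.py` may differ, and the template below is illustrative, not the repo's actual output), a generator might render each versioned spec into the skeleton of a generated client module:

```python
# Hypothetical sketch of gen_python.py-style code generation: render a
# versioned tool spec into the skeleton of a generated client module.
# Class and field names here are illustrative assumptions.

TEMPLATE = '''\
# Auto-generated from {name} v{version} -- do not edit by hand.
class {class_name}Base:
    """Base class for the `{name}` tool; services implement `_execute`."""

    def _execute(self, {params}):
        raise NotImplementedError
'''


def render_module(spec: dict) -> str:
    """Render a minimal generated module for one tool spec version."""
    params = ", ".join(
        p["name"] if p.get("required") else f'{p["name"]}={p.get("default")!r}'
        for p in spec["inputSchema"]["parameters"]
    )
    class_name = "".join(part.title() for part in spec["name"].split("_"))
    return TEMPLATE.format(name=spec["name"], version=spec["version"],
                           class_name=class_name, params=params)


spec = {
    "version": "0.0.2",
    "name": "read_file",
    "inputSchema": {"parameters": [
        {"name": "file_path", "required": True},
        {"name": "offset", "required": False, "default": 1},
        {"name": "limit", "required": False, "default": 2000},
    ]},
}
print(render_module(spec))
```

The same spec dict would feed `gen_ruby.py` and `gen_node.py`, which keeps all language clients in lockstep with a single YAML source.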

Schema Definitions

Tool Schema

As YAML is widely used at GitLab, the tool registry will use YAML as the schema format, with Python pydantic BaseModels for schema validation. Every tool YAML file is loaded as a ToolSpec instance and validated: https://gitlab.com/junminghuang/tool-registry/-/blob/62924f5f72c612c1191d4abbf4c822ce8cc1b356/tools/schema.py

An example tool schema is a simple YAML file:

version: 0.0.2
title: Read File
name: read_file
description: A tool to read partial content of a file at the given path, based on the offset and limit settings
host: duo_workflow_service
inputSchema:
  parameters:
    - name: file_path
      description: The path of the file
      type: string
      required: true
    - name: offset
      description: The number of bytes to skip before reading
      type: integer
      required: false
      default: 1
    - name: limit
      description: The maximum number of bytes to read
      type: integer
      required: false
      default: 2000
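The actual validation lives in `tools/schema.py` and uses pydantic. As a dependency-free sketch of the idea (field names follow the YAML example above; the validation rules shown are assumptions, not the repo's exact checks), a `ToolSpec`-style model might look like:

```python
# Dependency-free sketch of ToolSpec-style validation (the real repo
# uses pydantic BaseModel); field names follow the YAML example above.
from dataclasses import dataclass, field

ALLOWED_TYPES = {"string", "integer", "number", "boolean", "array", "object"}


@dataclass
class Parameter:
    name: str
    description: str
    type: str
    required: bool = False
    default: object = None

    def __post_init__(self):
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown parameter type: {self.type}")
        if self.required and self.default is not None:
            raise ValueError(f"required parameter {self.name} cannot have a default")


@dataclass
class ToolSpec:
    version: str
    title: str
    name: str
    description: str
    host: str
    parameters: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.version.split(".")) != 3:
            raise ValueError("version must be MAJOR.MINOR.PATCH")


# Loading the read_file example above would produce something like:
spec = ToolSpec(
    version="0.0.2",
    title="Read File",
    name="read_file",
    description="Read partial file content based on offset and limit",
    host="duo_workflow_service",
    parameters=[Parameter("file_path", "The path of the file", "string", required=True)],
)
```

Because every spec passes through one model, a malformed YAML file fails CI before it can reach review.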

Tool Versioning Strategy

Tools will follow semantic versioning (MAJOR.MINOR.PATCH):

  • MAJOR: Breaking changes to input/output schemas
  • MINOR: Backward-compatible functionality additions
  • PATCH: Backward-compatible bug fixes or documentation updates
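Under this scheme, a client can mechanically decide whether a newer schema version is safe to adopt. A small sketch (not code from the repo):

```python
# Sketch of semantic-version comparison for tool schemas: a MINOR or
# PATCH bump is backward compatible, a MAJOR bump is breaking.
def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def is_compatible_upgrade(current: str, candidate: str) -> bool:
    """True if candidate is a newer, non-breaking version of current."""
    cur, cand = parse_version(current), parse_version(candidate)
    return cand > cur and cand[0] == cur[0]


print(is_compatible_upgrade("0.0.1", "0.0.2"))  # True: PATCH bump
print(is_compatible_upgrade("0.0.2", "1.0.0"))  # False: MAJOR bump is breaking
```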

Tool Development Process

1. Introducing a New Tool

  1. Create Tool Schema yaml
  2. Add Routing Evaluation Config
  3. Run and pass the test and routing evaluation
  4. Submit Merge Request
  5. AI Expert Review
  6. Approval, Merge and publish tool-registry package with new version
  7. Install the package and start the tool execution implementation

2. Minor Tool Updates (Non-Breaking)

  1. Create Tool Schema yaml with an incremented MINOR or PATCH version
  2. Update Routing Evaluation Config
  3. Run and pass the test and routing evaluation
  4. Submit Merge Request
  5. AI Expert Review
  6. Approval, Merge and publish tool-registry package with new version
  7. Install the package and start the tool execution implementation

3. Breaking Changes (Major Version Bump)

Use Cases: Removing parameters, changing parameter types, renaming tools

  1. Create Tool Schema yaml with a new MAJOR version
  2. Update Routing Evaluation Config
  3. Run and pass the test and routing evaluation
  4. Submit Merge Request
  5. AI Expert Review
  6. Approval, Merge and publish tool-registry package with new version
  7. Install the package and start the tool execution implementation

Tool Implementation

  1. Install the correct tool-registry package
  2. Import the Tool base class and output schema
  3. Implement the _execute method
  4. Test and create MR

Example in python

from tool_registry.read_file.v0_0_2 import ReadFileBase, ReadFileOutput, spec


class ReadFile(ReadFileBase):

    def _execute(self, file_path, offset=1, limit=2000) -> ReadFileOutput:

        content = (
            "Do you want to know more about Junming?" + "\n" * 2100
            if offset < 2000
            else "Junming is a Senior ML Engineer at GitLab!"
        )
        return ReadFileOutput(
            result=f"content read from version: {spec.version} " + content,
            metadata={"offset": offset, "limit": limit, "total_lines": 5000},
            instruction=f"The content is partial from the file: {file_path}, if the result doesn't have content you need, try to increase the offset.",
        )
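A caller could then invoke the implemented tool. The snippet below is self-contained for illustration: it stubs the pieces the generated package would normally provide (the base class, output schema, and any `execute` wrapper are assumptions here, not the repo's actual generated code):

```python
# Self-contained illustration: stub the generated base/output types that
# tool_registry.read_file.v0_0_2 would normally provide, then implement
# and call the tool as in the example above.
from dataclasses import dataclass, field


@dataclass
class ReadFileOutput:          # stand-in for the generated output schema
    result: str
    metadata: dict = field(default_factory=dict)
    instruction: str = ""


class ReadFileBase:            # stand-in for the generated base class
    def execute(self, file_path, offset=1, limit=2000) -> ReadFileOutput:
        # A generated wrapper might validate inputs before delegating.
        if offset < 0 or limit <= 0:
            raise ValueError("offset must be >= 0 and limit > 0")
        return self._execute(file_path, offset=offset, limit=limit)

    def _execute(self, file_path, offset=1, limit=2000) -> ReadFileOutput:
        raise NotImplementedError


class ReadFile(ReadFileBase):
    def _execute(self, file_path, offset=1, limit=2000) -> ReadFileOutput:
        return ReadFileOutput(
            result=f"content of {file_path} from byte {offset}",
            metadata={"offset": offset, "limit": limit},
            instruction="Increase the offset if the content you need is missing.",
        )


output = ReadFile().execute("README.md", offset=0, limit=100)
print(output.result)  # content of README.md from byte 0
```

Keeping input validation in the generated wrapper, rather than in each `_execute`, is one way to make every service enforce the schema identically.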

Legacy content


Tool Schema

Based on all the GitLab-related tools in DWS, here is the proposed consolidation plan.

  • list_issues
  • get_issue
  • list_issue_notes
  • get_issue_note
  • gitlab_issue_search

Consolidated to tool:

{
  "title": "get_gitlab_issue",
  "description": "Get GitLab issue related data from a given API endpoint.",
  "properties": {
    "project_id": {
      "type": "integer",
      "description": "The id of the GitLab project"
    },
    "endpoint": {
      "type": "string",
      "description": "API endpoint, strictly follows the below examples:
        - issues
        - issues?assignee_username=john
        - issues?author_username=john
        - issues?confidential=true
        - issues?iids[]=42&iids[]=43
        - issues?labels=foo
        - issues?labels=foo,bar 
        - issues?labels=foo,bar&state=opened
        - issues?milestone=1.0.0
        - issues?milestone=1.0.0&state=opened
        - issues?my_reaction_emoji=star
        - issues?search=issue+title+or+description
        - issues?state=closed
        - issues?state=opened"
    }
  },
  "required": [
    "project_id",
    "endpoint"
  ],
  "type": "object",
  "strict": false
}
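To keep such a consolidated tool robust, the implementation can validate the LLM-provided `endpoint` against the allowed patterns before calling the API. A hypothetical guard (the allow-list mirrors the examples in the schema description above; this is not code from the proposal):

```python
# Hypothetical guard for the consolidated get_gitlab_issue tool: accept
# only `issues` endpoints whose query parameters appear in the schema's
# example list, rejecting anything outside those patterns.
from urllib.parse import urlparse, parse_qs

ALLOWED_PARAMS = {
    "assignee_username", "author_username", "confidential", "iids[]",
    "labels", "milestone", "my_reaction_emoji", "search", "state",
}


def validate_endpoint(endpoint: str) -> bool:
    """Return True if the endpoint matches an allowed `issues` pattern."""
    parsed = urlparse(endpoint)
    if parsed.path != "issues":
        return False
    return all(key in ALLOWED_PARAMS for key in parse_qs(parsed.query))


print(validate_endpoint("issues?labels=foo,bar&state=opened"))  # True
print(validate_endpoint("merge_requests?state=opened"))         # False
```

Rejecting unknown endpoints early turns a hallucinated API path into a recoverable tool error rather than an unexpected API call.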

  • create_issue_note

  • create_issue

  • update_issue

  • list_work_items

  • get_work_item

  • get_work_item_notes

  • gitlab_note_search

  • create_work_item

  • create_work_item_note

  • get_epic

  • list_epics

  • list_epic_notes

  • create_epic

  • update_epic


  • get_merge_request
  • list_merge_request_diffs
  • list_all_merge_request_notes
  • gitlab_merge_request_search

Consolidated to tool:

{
  "name": "get_gitlab_merge_request",
  "description": "Get GitLab merge request related data from a given API path.",
  "parameters": {
    "properties": {
      "api_path": {
        "description": "GitLab merge request REST API path; it should be one of the following paths:
          - `/api/v4/projects/<project_id>/merge_requests` (list all merge requests in the project)
          - `/api/v4/projects/<project_id>/merge_requests/<merge_request_id>` (get a specific merge request)
          - `/api/v4/projects/<project_id>/merge_requests/<merge_request_id>/notes` (list all notes from a specific merge request)
          - `/api/v4/projects/<project_id>/merge_requests/<merge_request_id>/diffs` (list all diffs from a specific merge request)",
        "type": "string"
      }
    },
    "required": ["api_path"],
    "type": "object"
  }
}

  • create_merge_request_note

  • create_merge_request

  • update_merge_request

  • get_commit

  • gitlab_commit_search

  • list_commits

  • get_commit_diff

  • get_commit_comments

  • create_commit

  • list_vulnerabilities

  • confirm_vulnerability

  • link_vulnerability_to_issue

  • get_vulnerability_details

  • update_vulnerability_severity

  • dismiss_vulnerability

  • list_instance_audit_events

  • list_project_audit_events

  • list_group_audit_events

  • get_repository_file

  • list_repository_tree

  • get_project

  • gitlab_user_search

  • get_current_user

  • get_previous_session_context

  • get_job_logs

  • gitlab_blob_search

  • gitlab_wiki_blob_search

  • gitlab_group_project_search

  • gitlab_documentation_search

  • gitlab_milestone_search

  • ci_linter

  • get_pipeline_errors

Edited by Junming Huang