Create schema and guidelines for desired GitLab tools for Agent Platform
Problem
To avoid repeating the problems already raised in gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#1328 (closed) when creating tools for the MCP server, we need to think up front about what kinds of tools work well with LLMs and MCP, rather than creating tools ad hoc as the need arises or mapping each API endpoint 1-to-1 to a tool.
The problem is that many engineers don't yet have the knowledge or intuition needed to design good tools, even though they have no trouble implementing one (e.g. the data-fetching logic).
Desired Outcome
Ideally we have two outcomes:
- Guidelines on how to create good tools, e.g. considerations around input parameters, prompt as well as surface area of the tools
- A schema proposal that maps the tools to interact with GitLab (CRUD issue/work_item etc.) in the Duo Agent Platform to a new, likely reduced set of tools with more LLM friendly inputs
Proposed Solution
PoC demo: https://youtu.be/pPN6gAXiBg4
Summary
This proposal introduces a centralized Tool Registry repository to manage tool schemas for GitLab's AI agent platform. Currently, tool development lacks standardization, leading to inconsistent quality and poor LLM compatibility. The Tool Registry will serve as a single source of truth for all tool schemas, enforce quality standards through expert review, include automated routing evaluation, and generate versioned clients for multiple programming languages. This approach will enable scalable, high-quality tool development while maintaining backward compatibility and supporting both GitLab.com and self-managed deployments.
Ideal State: A centralized system where AI experts review and approve all tool schemas, automated tests validate routing performance, and versioned client packages are automatically generated for implementation across MCP servers, Duo Workflow Service (DWS), and Node Executor. The Registry will maintain multiple versions of each tool simultaneously, allowing services to adopt schema changes at their own pace while ensuring backward compatibility.
Overview
The Tool Registry is a centralized repository that hosts tool schemas (not implementations) with the following core features:
- Schema-only repository: Stores only tool definitions; implementations remain in relevant services
- Expert review process: AI experts committee reviews all schema changes
- Integrated evaluation: Day-one tool routing evaluation for schema changes
- Tool version support: Supports multiple versions of each tool simultaneously, allowing services to adopt schema changes at their own pace
- Multi-language support: Generates versioned packages in Python, Ruby, and Node.js
- Comprehensive documentation: Guidelines for creating effective tools
Key Benefits
Centralized Management
- Single repo for all tool schemas
- Simple to add tools (just a YAML file)
- Auto-generates interfaces for multiple languages/registries
Enable Quality at Scale
- Built-in routing evaluation for data-driven decisions
- Easy adoption of industry standards (e.g., Structured Output)
- Centralized guidelines and quality control
Streamline Collaboration
- Decouples schema design from implementation
- AI experts review all changes in one place
- Backend engineers focus on implementation (their expertise)
Maintain Competitive Advantage
- Flexibility to rapidly adopt industry evolution in tool design
- Version support for safe migrations and custom flow building
- Keep GitLab AI solution ahead of competitors
Architecture
The tool registry acts as a central hub that:
- Stores all tool schemas, with support for multiple versions of each
- Enforces quality through automated tests and expert review
- Generates client packages with tool interfaces for runtime implementation in various languages
- Validates routing performance continuously
- Supports customizable structured output schemas for tools
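To make the multi-version idea concrete, here is a minimal in-memory sketch of a lookup keyed by tool name and version. `InMemoryRegistry`, `parse_version`, and the schema dictionaries (including the parameter list shown for `read_file` 0.0.1) are hypothetical illustrations, not the demo repo's API.

```python
from dataclasses import dataclass, field
from typing import Optional


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn "0.0.2" into (0, 0, 2) so versions compare numerically, not as text."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


@dataclass
class InMemoryRegistry:
    """Hypothetical stand-in for the registry: tool name -> version -> schema."""

    _schemas: dict[str, dict[str, dict]] = field(default_factory=dict)

    def register(self, name: str, version: str, schema: dict) -> None:
        self._schemas.setdefault(name, {})[version] = schema

    def get(self, name: str, version: Optional[str] = None) -> dict:
        """Return the pinned version, or the newest one when no pin is given."""
        versions = self._schemas[name]
        if version is None:
            version = max(versions, key=parse_version)
        return versions[version]


registry = InMemoryRegistry()
# Illustrative schemas only; the real 0.0.1 spec may differ.
registry.register("read_file", "0.0.1", {"parameters": ["file_path"]})
registry.register("read_file", "0.0.2", {"parameters": ["file_path", "offset", "limit"]})

pinned = registry.get("read_file", "0.0.1")  # a service that has not migrated yet
latest = registry.get("read_file")           # a service tracking the newest schema
```

Keeping old versions addressable is what lets each service pin a schema and migrate on its own schedule.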
Repository Structure
Demo repo: https://gitlab.com/junminghuang/tool-registry
├── clients
│   ├── node
│   ├── python
│   │   ├── pyproject.toml
│   │   ├── README.md
│   │   ├── src
│   │   │   └── tool_registry
│   │   │       ├── __init__.py
│   │   │       ├── get_gitlab_issue
│   │   │       │   └── __init__.py
│   │   │       ├── read_file
│   │   │       │   ├── __init__.py
│   │   │       │   ├── v0_0_1.py
│   │   │       │   └── v0_0_2.py
│   │   │       └── schema.py
│   │   └── tests
│   │       ├── __init__.py
│   │       └── test_main.py
│   └── ruby
├── docs
│   ├── 01-introduction.md
│   ├── 02-tool-basics.md
│   ├── 03-defining-your-tool.md
│   ├── 04-implementation-guide.md
│   ├── 05-best-practices.md
│   └── 06-testing-and-debugging.md
├── eval
│   ├── configs
│   │   ├── get_gitlab_issue
│   │   │   └── v0_0_1.yaml
│   │   └── read_file
│   │       ├── v0_0_1.yaml
│   │       └── v0_0_2.yaml
│   └── main.py
├── generators
│   ├── gen_node.py
│   ├── gen_python.py
│   └── gen_ruby.py
├── pyproject.toml
├── README.md
├── tests
├── tools
│   ├── __init__.py
│   ├── schema.py
│   ├── specs
│   │   ├── get_gitlab_issue
│   │   │   └── v0_0_1.yaml
│   │   └── read_file
│   │       ├── v0_0_1.yaml
│   │       └── v0_0_2.yaml
│   └── tool_factory.py
└── uv.lock
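The layout above suggests a naming convention in which a tool name and semantic version map to a `v<MAJOR>_<MINOR>_<PATCH>` file, both for the YAML spec and for the generated Python module. The helpers below sketch that mapping; `module_path` and `spec_path` are hypothetical names and are not taken from the generators in the demo repo.

```python
def version_suffix(version: str) -> str:
    """Convert "0.0.2" to "v0_0_2", the file stem used in the tree above."""
    return "v" + version.replace(".", "_")


def module_path(tool_name: str, version: str, package: str = "tool_registry") -> str:
    """Importable module path for a generated Python client."""
    return f"{package}.{tool_name}.{version_suffix(version)}"


def spec_path(tool_name: str, version: str) -> str:
    """Location of the YAML spec under tools/specs."""
    return f"tools/specs/{tool_name}/{version_suffix(version)}.yaml"
```

For example, `module_path("read_file", "0.0.2")` gives `tool_registry.read_file.v0_0_2`, and `spec_path("read_file", "0.0.2")` gives `tools/specs/read_file/v0_0_2.yaml`.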
Schema Definitions
Tool Schema
Because YAML is widely used at GitLab, tool-registry uses YAML as the schema format and a Python pydantic BaseModel for schema validation. Every tool YAML file is loaded as a ToolSpec instance, which validates it. https://gitlab.com/junminghuang/tool-registry/-/blob/62924f5f72c612c1191d4abbf4c822ce8cc1b356/tools/schema.py
An example tool schema is just a simple yaml file as follows.
version: 0.0.2
title: Read File
name: read_file
description: A tool to read the partial content of a given file path based on the offset and limit setting
host: duo_workflow_service
inputSchema:
  parameters:
    - name: file_path
      description: The path of the file
      type: string
      required: true
    - name: offset
      description: number of byte to skip read
      type: integer
      required: false
      default: 1
    - name: limit
      description: max number of byte to read
      type: integer
      required: false
      default: 2000
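The kind of check that loading a spec performs can be sketched as follows. The registry itself uses pydantic; this approximation swaps in standard-library dataclasses so it runs without dependencies. The field names mirror the YAML example, while `ToolSpecSketch`, `Parameter`, `ALLOWED_TYPES`, and the specific rules (allowed type names, no default on required parameters, unique parameter names) are assumptions rather than the actual `ToolSpec` definition.

```python
from dataclasses import dataclass
from typing import Any, Optional

ALLOWED_TYPES = {"string", "integer", "number", "boolean"}  # assumed set of types


@dataclass
class Parameter:
    name: str
    description: str
    type: str
    required: bool = False
    default: Optional[Any] = None

    def __post_init__(self) -> None:
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"{self.name}: unsupported type {self.type!r}")
        if self.required and self.default is not None:
            raise ValueError(f"{self.name}: required parameters take no default")


@dataclass
class ToolSpecSketch:
    version: str
    title: str
    name: str
    description: str
    host: str
    parameters: list[Parameter]

    @classmethod
    def from_dict(cls, raw: dict) -> "ToolSpecSketch":
        """Build a spec from the parsed YAML mapping, validating as it goes."""
        params = [Parameter(**p) for p in raw["inputSchema"]["parameters"]]
        names = [p.name for p in params]
        if len(names) != len(set(names)):
            raise ValueError("duplicate parameter names")
        return cls(
            version=raw["version"], title=raw["title"], name=raw["name"],
            description=raw["description"], host=raw["host"], parameters=params,
        )


# The read_file example above, as the mapping a YAML parser would produce.
raw_spec = {
    "version": "0.0.2",
    "title": "Read File",
    "name": "read_file",
    "description": "A tool to read the partial content of a given file path",
    "host": "duo_workflow_service",
    "inputSchema": {
        "parameters": [
            {"name": "file_path", "description": "The path of the file",
             "type": "string", "required": True},
            {"name": "offset", "description": "number of byte to skip read",
             "type": "integer", "required": False, "default": 1},
            {"name": "limit", "description": "max number of byte to read",
             "type": "integer", "required": False, "default": 2000},
        ]
    },
}
spec = ToolSpecSketch.from_dict(raw_spec)
```

A malformed spec, such as a parameter with `type: text`, fails at load time instead of surfacing later as a confusing tool call.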
Tool Versioning Strategy
Tools will follow semantic versioning (MAJOR.MINOR.PATCH):
- MAJOR: Breaking changes to input/output schemas
- MINOR: Backward-compatible functionality additions
- PATCH: Backward-compatible bug fixes or documentation updates
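As a rough illustration of how these rules could be applied mechanically, the sketch below compares the parameters of two schema versions and reports the smallest bump that would be acceptable. `required_bump` is a hypothetical helper. Treating a newly added required parameter, or an optional parameter that becomes required, as breaking is an assumption that goes slightly beyond the cases listed in this proposal.

```python
def required_bump(old_params: dict[str, dict], new_params: dict[str, dict]) -> str:
    """Classify a parameter change as "major", "minor", or "patch"."""
    for name, old in old_params.items():
        new = new_params.get(name)
        if new is None or new["type"] != old["type"]:
            return "major"  # parameter removed or its type changed
        if new.get("required", False) and not old.get("required", False):
            return "major"  # optional parameter became mandatory (assumed breaking)
    added = [new_params[name] for name in new_params.keys() - old_params.keys()]
    if any(param.get("required", False) for param in added):
        return "major"  # existing callers would not supply it (assumed breaking)
    if added:
        return "minor"  # new optional parameters are backward compatible
    return "patch"      # same parameters: docs or description changes only


v1 = {"file_path": {"type": "string", "required": True}}
v2 = {**v1, "offset": {"type": "integer", "required": False}}

bump_when_adding_optional = required_bump(v1, v2)   # "minor"
bump_when_removing = required_bump(v2, v1)          # "major"
bump_when_unchanged = required_bump(v1, v1)         # "patch"
```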
Tool Development Process
1. Introducing a New Tool
- Create the tool schema YAML
- Add a routing evaluation config
- Run the tests and the routing evaluation, and make sure both pass
- Submit a merge request
- AI expert review
- Approve, merge, and publish a new version of the tool-registry package
- Install the package and start implementing the tool execution
2. Minor Tool Updates (Non-Breaking)
- Create a tool schema YAML with an incremented MINOR or PATCH version
- Update the routing evaluation config
- Run the tests and the routing evaluation, and make sure both pass
- Submit a merge request
- AI expert review
- Approve, merge, and publish a new version of the tool-registry package
- Install the package and start implementing the tool execution
3. Breaking Changes (Major Version Bump)
Use Cases: Removing parameters, changing parameter types, renaming tools
- Create a tool schema YAML with a new MAJOR version
- Update the routing evaluation config
- Run the tests and the routing evaluation, and make sure both pass
- Submit a merge request
- AI expert review
- Approve, merge, and publish a new version of the tool-registry package
- Install the package and start implementing the tool execution
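The routing evaluation step in each of these flows can be pictured as a set of prompts, each labelled with the tool that should be chosen, scored against whatever selects the tool. The sketch below is only an illustration: `routing_accuracy`, `keyword_router` (a toy stand-in for an LLM call), the sample cases, and the `THRESHOLD` gate are all hypothetical and do not reflect the format of the configs under `eval/configs`.

```python
from typing import Callable


def routing_accuracy(cases: list[tuple[str, str]], route: Callable[[str], str]) -> float:
    """Fraction of prompts for which `route` picks the expected tool."""
    if not cases:
        raise ValueError("at least one evaluation case is required")
    hits = sum(1 for prompt, expected in cases if route(prompt) == expected)
    return hits / len(cases)


def keyword_router(prompt: str) -> str:
    """Toy stand-in for the model's tool selection; a real run would call an LLM."""
    return "get_gitlab_issue" if "issue" in prompt.lower() else "read_file"


CASES = [
    ("Show me the open issues labelled bug", "get_gitlab_issue"),
    ("Read the first 100 bytes of README.md", "read_file"),
    ("What does issue 42 say?", "get_gitlab_issue"),
    ("Print the contents of pyproject.toml", "read_file"),
]
THRESHOLD = 0.9  # hypothetical pass mark for the merge request gate

accuracy = routing_accuracy(CASES, keyword_router)
passed = accuracy >= THRESHOLD
```

Running the same cases before and after a schema change gives reviewers a number to compare, which is the point of making routing evaluation part of every merge request.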
Tool Implementation
- Install the correct tool registry package
- Import the Tool base class and output schema
- Implement the _execute method
- Test and create an MR
Example in Python
from tool_registry.read_file.v0_0_2 import ReadFileBase, ReadFileOuput, spec


class ReadFile(ReadFileBase):
    def _execute(self, file_path, offset=1, limit=2000) -> ReadFileOuput:
        # Placeholder content for the demo; a real implementation would read file_path.
        content = (
            "Do you want to know more about Junming?" + "\n" * 2100
            if offset < 2000
            else "Junming is a Senior ML Engineer at GitLab!"
        )
        return ReadFileOuput(
            result=f"content read from version: {spec.version} " + content,
            # Metadata and instruction give the LLM hints for its next call.
            metadata={"offset": offset, "limit": limit, "total_lines": 5000},
            instruction=f"The content is partial from the file: {file_path}, if the result doesn't have content you need, try to increase the offset.",
        )
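For readers wondering what the generated base class might do for the implementer, here is one possible shape: a public `execute` that checks the arguments coming from the LLM and then delegates to `_execute`. This is a guess at the pattern, not the generated code. `ReadFileBaseSketch`, `ToolOutputSketch`, and `EchoReadFile` are hypothetical names; only the `result`, `metadata`, and `instruction` output fields come from the example above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ToolOutputSketch:
    """Structured output with the three fields used in the example above."""
    result: str
    metadata: dict = field(default_factory=dict)
    instruction: str = ""


class ReadFileBaseSketch(ABC):
    """Hypothetical shape of a generated base class for read_file 0.0.2."""

    name = "read_file"
    version = "0.0.2"

    def execute(self, **kwargs) -> ToolOutputSketch:
        """Validate LLM-supplied arguments, then delegate to _execute."""
        unknown = set(kwargs) - {"file_path", "offset", "limit"}
        if unknown:
            raise TypeError(f"unexpected arguments: {sorted(unknown)}")
        if not isinstance(kwargs.get("file_path"), str):
            raise TypeError("file_path must be a string")
        for optional in ("offset", "limit"):
            if optional in kwargs and not isinstance(kwargs[optional], int):
                raise TypeError(f"{optional} must be an integer")
        return self._execute(**kwargs)

    @abstractmethod
    def _execute(self, file_path, offset=1, limit=2000) -> ToolOutputSketch:
        ...


class EchoReadFile(ReadFileBaseSketch):
    """Trivial implementation that echoes its arguments back."""

    def _execute(self, file_path, offset=1, limit=2000) -> ToolOutputSketch:
        return ToolOutputSketch(
            result=f"{file_path}:{offset}:{limit}",
            metadata={"offset": offset, "limit": limit},
        )


out = EchoReadFile().execute(file_path="README.md", limit=10)
```

With this split, the schema-derived checks live in generated code while the backend engineer only writes `_execute`.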
Legacy content
Tool Schema
Based on all the GitLab-related tools in DWS, here is the proposed consolidation plan.
- list_issues
- get_issue
- list_issue_notes
- get_issue_note
- gitlab_issue_search
Consolidated to tool:
{
  "title": "get_gitlab_issue",
  "description": "Get GitLab issue related data from a given API endpoint.",
  "properties": {
    "project_id": {
      "type": "integer",
      "description": "The id of the GitLab project"
    },
    "endpoint": {
      "type": "string",
      "description": "API endpoint, strictly follows the below examples:
- issues
- issues?assignee_username=john
- issues?author_username=john
- issues?confidential=true
- issues?iids[]=42&iids[]=43
- issues?labels=foo
- issues?labels=foo,bar
- issues?labels=foo,bar&state=opened
- issues?milestone=1.0.0
- issues?milestone=1.0.0&state=opened
- issues?my_reaction_emoji=star
- issues?search=issue+title+or+description
- issues?state=closed
- issues?state=opened"
    }
  },
  "required": [
    "project_id",
    "endpoint"
  ],
  "type": "object",
  "strict": false
}
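On the implementation side, a consolidated tool like this still has to turn `project_id` and `endpoint` into a request and reject anything outside the issue list. A minimal sketch, assuming the implementation targets the GitLab REST API under `/api/v4/projects/<project_id>/`; `issue_request_path` and `issue_filters` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlsplit


def issue_request_path(project_id: int, endpoint: str) -> str:
    """Build the REST path, accepting only `issues` with an optional query string."""
    parts = urlsplit(endpoint)
    if parts.scheme or parts.netloc or parts.path != "issues":
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    return f"/api/v4/projects/{project_id}/{endpoint}"


def issue_filters(endpoint: str) -> dict[str, list[str]]:
    """Parse the query string so filters can be logged or validated."""
    return parse_qs(urlsplit(endpoint).query)


path = issue_request_path(1, "issues?labels=foo,bar&state=opened")
filters = issue_filters("issues?iids[]=42&iids[]=43")
```

Validating the endpoint before the request is sent keeps a free-form string parameter from becoming a way to reach unrelated APIs.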
- create_issue_note
- create_issue
- update_issue
- list_work_items
- get_work_item
- get_work_item_notes
- gitlab_note_search
- create_work_item
- create_work_item_note
- get_epic
- list_epics
- list_epic_notes
- create_epic
- update_epic
- get_merge_request
- list_merge_request_diffs
- list_all_merge_request_notes
- gitlab_merge_request_search
Consolidated to tool:
{'name': 'get_gitlab_merge_request',
'description': 'Get GitLab merge request related data from a given API path.',
'parameters': {'properties': {'api_path': {'description': 'GitLab merge request rest api path, it should be one of the following path:\n- `/api/v4/projects/<project_id>/merge_requests` (list all the merge request in the project)\n- `/api/v4/projects/<project_id>/merge_requests/<merge_request_id>` (get a specific merge request)\n- `/api/v4/projects/<project_id>/merge_requests/<merge_request_id>/notes` (list all the notes from a specific merge request)\n- `/api/v4/projects/<project_id>/merge_requests/<merge_request_id>/diffs` (list all the diffs from a specific merge request)\n',
'type': 'string'}},
'required': ['api_path'],
'type': 'object'}}
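Because `api_path` is a free-form string, an implementation would likely want to check it against the four documented shapes before calling the API. The sketch below does this with regular expressions; it assumes numeric project and merge request IDs, and `ALLOWED_PATHS` and `is_allowed_api_path` are hypothetical names.

```python
import re

# One pattern per path listed in the tool description (numeric IDs assumed).
ALLOWED_PATHS = [
    re.compile(r"/api/v4/projects/\d+/merge_requests"),
    re.compile(r"/api/v4/projects/\d+/merge_requests/\d+"),
    re.compile(r"/api/v4/projects/\d+/merge_requests/\d+/notes"),
    re.compile(r"/api/v4/projects/\d+/merge_requests/\d+/diffs"),
]


def is_allowed_api_path(api_path: str) -> bool:
    """True when the whole path matches one of the documented shapes."""
    return any(pattern.fullmatch(api_path) for pattern in ALLOWED_PATHS)
```

For example, `is_allowed_api_path("/api/v4/projects/7/merge_requests/3/diffs")` is true, while a path ending in `/approvals` or pointing at `/api/v4/users` is rejected.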
- create_merge_request_note
- create_merge_request
- update_merge_request
- get_commit
- gitlab_commit_search
- list_commits
- get_commit_diff
- get_commit_comments
- create_commit
- list_vulnerabilities
- confirm_vulnerability
- link_vulnerability_to_issue
- get_vulnerability_details
- update_vulnerability_severity
- dismiss_vulnerability
- list_instance_audit_events
- list_project_audit_events
- list_group_audit_events
- get_repository_file
- list_repository_tree
- get_project
- gitlab__user_search
- get_current_user
- get_previous_session_context
- get_job_logs
- gitlab_blob_search
- gitlab_wiki_blob_search
- gitlab_group_project_search
- gitlab_documentation_search
- gitlab_milestone_search
- ci_linter
- get_pipeline_errors
