Support large YAML definitions in AI Catalog without hitting JSONB size limits
## Problem

The AI Catalog currently stores YAML definitions using a pattern that duplicates data:

```ruby
YAML.safe_load(params[:definition]).merge(yaml_definition: params[:definition])
```

This approach stores both:

1. The parsed YAML data structure
2. The original raw YAML string under `yaml_definition`

This duplication causes two issues:

1. **Size limit constraint**: `JsonSchemaValidator` enforces a 64kb limit, but because the data is duplicated, users can only submit ~32kb YAML files before hitting it. (Note: we need to [maintain this constraint](https://docs.gitlab.com/development/migration_style_guide/#storing-json-in-database).)
2. **Scalability concern**: AI Catalog definitions could legitimately be much larger than 64kb (potentially 100kb+ for complex workflows), making the current approach unsuitable.

## Current Implementation

```ruby
# ee/app/services/ai/catalog/concerns/yaml_definition_parser.rb
def definition_parsed
  return unless params[:definition].present?

  YAML.safe_load(params[:definition]).merge(yaml_definition: params[:definition])
rescue Psych::SyntaxError
  nil
end
```

The merged result gets validated with:

```ruby
JsonSchemaValidator.new({
  attributes: :definition,
  size_limit: 64.kilobytes,
  # ...
}).validate(self)
```

## Proposed Solutions

### Option 1: Object Storage (Recommended)

Store large YAML definitions in object storage and keep only metadata plus a reference in JSONB:

```ruby
# Database stores only:
{
  "yaml_definition_url": "https://storage.../definitions/abc123.yml",
  "checksum": "sha256:...",
  "size": 150000,
  "version": "v1",
  "title": "My Workflow",
  # ... other parsed metadata for querying
}
```

**Benefits:**

- No size limits
- Better performance (the large YAML isn't loaded unless needed)
- Follows GitLab patterns for large content
- Maintains an audit trail via checksums

### Option 2: Hybrid Approach

- Small definitions (< 32kb): keep the current inline storage
- Large definitions: automatically promote to object storage
- Transparent to consumers via accessor methods

## Acceptance Criteria

- [ ] Support YAML definitions larger than 64kb
- [ ] Maintain backward compatibility with existing definitions
- [ ] Preserve the original YAML for audit/display purposes
- [ ] No performance regression for small definitions
- [ ] Follow GitLab patterns for large content storage

## Additional Context

- All other `JsonSchemaValidator` usage in GitLab uses exactly the 64kb limit
- This would be the first case requiring a larger limit
- AI workflows legitimately need complex, large definitions
- The current duplication pattern is inefficient but serves important purposes (audit trail, display fidelity)
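As a quick illustration of the ~32kb ceiling, this standalone script (sample data only, not GitLab code) shows that merging the raw YAML string back into the parsed hash roughly doubles the serialized payload the 64kb validator sees:

```ruby
require "yaml"
require "json"

# Build a sample multi-line YAML definition (~8kb).
raw = "title: Demo\nsteps:\n" + (1..500).map { |i| "  - run step #{i}\n" }.join
parsed = YAML.safe_load(raw)

# Current pattern: parsed data plus the raw string under yaml_definition.
merged = parsed.merge("yaml_definition" => raw)

parsed_size = JSON.generate(parsed).bytesize
merged_size = JSON.generate(merged).bytesize

# The merged payload carries the content twice (once parsed, once raw),
# so merged_size is at least parsed_size + raw.bytesize.
```

This is why a 64kb JSONB limit effectively caps user-submitted YAML at roughly half that size.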
issue
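A minimal sketch of how Option 2's transparent accessor could work. All names here (`ObjectStore`, `HybridDefinition`, `INLINE_LIMIT`) are hypothetical stand-ins, not existing GitLab classes; a real implementation would use GitLab's uploader/object-storage infrastructure instead of the in-memory store below:

```ruby
require "digest"
require "yaml"

INLINE_LIMIT = 32 * 1024 # definitions at or below this stay inline (assumption)

# Stand-in for an object-storage client; a real implementation would
# upload the blob and return a URL rather than keep it in memory.
class ObjectStore
  def initialize
    @blobs = {}
  end

  # Store raw YAML keyed by its SHA256 checksum; return JSONB metadata.
  def put(raw)
    checksum = Digest::SHA256.hexdigest(raw)
    @blobs[checksum] = raw
    {
      "yaml_definition_ref" => checksum,
      "checksum" => "sha256:#{checksum}",
      "size" => raw.bytesize
    }
  end

  def get(ref)
    @blobs.fetch(ref)
  end
end

class HybridDefinition
  attr_reader :record

  def initialize(store)
    @store = store
  end

  # Small definitions keep the current inline pattern; large ones are
  # promoted to object storage, leaving only metadata in JSONB.
  def save(raw_yaml)
    parsed = YAML.safe_load(raw_yaml)
    @record =
      if raw_yaml.bytesize <= INLINE_LIMIT
        parsed.merge("yaml_definition" => raw_yaml)
      else
        parsed.merge(@store.put(raw_yaml))
      end
  end

  # Accessor that is transparent to consumers: resolves the raw YAML
  # from either inline storage or the object store.
  def yaml_definition
    record["yaml_definition"] || @store.get(record["yaml_definition_ref"])
  end
end
```

Consumers call `yaml_definition` the same way regardless of where the blob lives, which keeps backward compatibility with existing inline records while preserving the raw YAML for audit/display.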