Analytics API Guidelines

Analytics (private) API Guidelines

This issue describes a few design guidelines for exposing data to the analytics frontend features. Once we reach consensus, we can move this to the dev docs.

  • The APIs are "private": they are used only by the frontend.
  • The API endpoints are feature (chart) specific. We're not considering GraphQL at the moment (TODO: discuss pros and cons).

Parameters

Expressing Date Ranges

  • Good for looking at a specific period; generated links can be easily shared.
  • Use ISO 8601 dates (YYYY-MM-DD).
  • from - required parameter, or provide a fallback value (30 days ago).
  • to - optional parameter, defaults to today's date.
  • Validate that from < to; if the check fails, let the client know (422 response).
  • Consider limiting the date range to avoid DB timeouts for large datasets.
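
The rules above could be sketched in plain Ruby roughly as follows. This is only an illustration, not existing GitLab code: the DateRangeParams module, the InvalidRange error and the 180-day cap are assumptions. In a controller, an InvalidRange error would be translated into the 422 response.

```ruby
require 'date'

# Hypothetical helper, for illustration only.
module DateRangeParams
  InvalidRange = Class.new(StandardError)

  DEFAULT_DAYS   = 30   # fallback for a missing `from` (30 days ago)
  MAX_RANGE_DAYS = 180  # assumed cap to avoid DB timeouts on large datasets

  # Returns [from, to] as Date objects, or raises InvalidRange.
  def self.parse(params, today: Date.today)
    to   = params[:to]   ? parse_date(params[:to])   : today
    from = params[:from] ? parse_date(params[:from]) : today - DEFAULT_DAYS

    raise InvalidRange, '`from` must be earlier than `to`' unless from < to
    raise InvalidRange, "range is limited to #{MAX_RANGE_DAYS} days" if (to - from) > MAX_RANGE_DAYS

    [from, to]
  end

  def self.parse_date(value)
    Date.iso8601(value.to_s)
  rescue ArgumentError # raised for anything that is not a valid ISO 8601 date
    raise InvalidRange, "invalid ISO 8601 date: #{value}"
  end
end
```

Passing today as a keyword argument keeps the defaults testable without stubbing the clock.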

Named Date Ranges

  • Good for periodically checking the latest stats.
  • from - required parameter, potential values:
    • 30days
    • 7days
    • last_week
    • last_month
  • Do not try to parse the value; keep a list of allowed values.
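
A minimal sketch of the allowlist idea, assuming a hypothetical NamedDateRange module. The exact meaning of each name (for example, that weeks start on Monday and that "7days" means the last seven days up to today) is an assumption made for this example, not something decided in this issue.

```ruby
require 'date'

# Hypothetical allowlist: every accepted value maps to a lambda that builds
# the range relative to `today`. Nothing is parsed out of the string itself.
module NamedDateRange
  ALLOWED = {
    '7days'  => ->(today) { (today - 7)..today },
    '30days' => ->(today) { (today - 30)..today },
    'last_week' => ->(today) {
      # Monday of the current week, then one week back (assumes Monday start)
      monday = today - ((today.wday + 6) % 7) - 7
      monday..(monday + 6)
    },
    'last_month' => ->(today) {
      first = Date.new(today.year, today.month, 1).prev_month
      first..(first.next_month - 1)
    }
  }.freeze

  # Returns a Range of Dates, or nil when the value is not in the allowlist.
  def self.resolve(value, today: Date.today)
    builder = ALLOWED[value.to_s]
    builder&.call(today)
  end
end
```

A nil result gives the caller a single place to decide between rejecting the request and falling back to a default range.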

Sorting

  • Check before providing sorting options: database indices will likely be needed.
    • Sorting by creation time (created_at): prefer sorting by id on the DB level.
  • Parameter name: sort
  • Naming convention:
    • $FIELD_$DIRECTION
    • created_at_desc
    • committed_at_desc
  • Keep a list of allowed fields for sorting and provide a default sorting option as fallback.
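
The allowlist with a default fallback might look like the sketch below. The SortParam module and the listed fields are hypothetical; a real endpoint would list its own fields.

```ruby
# Hypothetical allowlist for one endpoint: maps each accepted `sort` value
# to the column and direction that the query should use.
module SortParam
  ALLOWED = {
    # created_at sorting is served by the id column, as suggested above
    'created_at_asc'    => { column: :id, direction: :asc },
    'created_at_desc'   => { column: :id, direction: :desc },
    'committed_at_asc'  => { column: :committed_at, direction: :asc },
    'committed_at_desc' => { column: :committed_at, direction: :desc }
  }.freeze

  DEFAULT = 'created_at_desc'

  # Unknown or missing values silently fall back to the default option.
  def self.resolve(value)
    ALLOWED.fetch(value.to_s) { ALLOWED.fetch(DEFAULT) }
  end
end
```

Because only values from the hash ever reach the query, the raw parameter is never interpolated into SQL. With ActiveRecord, the result could be passed on as order(column => direction).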

Pagination

  • Use the standard helpers provided by the kaminari gem.
  • page - defaults to 1.
  • per_page - the default depends on the feature.
  • Validate per_page and provide a fallback value to prevent large values from being passed.
  • For now, provide the standard pagination headers:
    • X-Per-Page
    • X-Page
    • X-Next-Page
    • X-Prev-Page
    • X-Total
    • X-Total-Pages
  • Heads up: offset pagination is expensive from the DB's point of view, so the way we're doing pagination in GitLab might change.
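
To make the validation and the header values concrete, here is a plain Ruby sketch that does not depend on kaminari. The Pagination module and both limits are assumptions for this example; in the application the same numbers would normally come from the paginated relation.

```ruby
# Hypothetical helper, for illustration only.
module Pagination
  DEFAULT_PER_PAGE = 20   # assumed feature default
  MAX_PER_PAGE     = 100  # assumed upper bound

  # Falls back to the default for missing, non-numeric or too large values.
  def self.per_page(value)
    n = value.to_i # nil and non-numeric strings become 0
    (1..MAX_PER_PAGE).cover?(n) ? n : DEFAULT_PER_PAGE
  end

  def self.page(value)
    n = value.to_i
    n.positive? ? n : 1
  end

  # Builds the standard pagination headers; absent pages are empty strings.
  def self.headers(page:, per_page:, total:)
    total_pages = (total + per_page - 1) / per_page # integer ceiling
    {
      'X-Per-Page'    => per_page.to_s,
      'X-Page'        => page.to_s,
      'X-Next-Page'   => (page < total_pages ? page + 1 : nil).to_s,
      'X-Prev-Page'   => (page > 1 ? page - 1 : nil).to_s,
      'X-Total'       => total.to_s,
      'X-Total-Pages' => total_pages.to_s
    }
  end
end
```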

Group Filter

  • Currently we are using group_id=$group_full_path for filtering groups.
    • Example: group_id=gitlab-org/subgroup
    • I'm not sure what the main reason behind this is, but it can be confusing.
    • It also looks ugly when URL-escaped.
  • The group ID is not "hidden" from users, so exposing it wouldn't be a problem.
  • Proposal: use the database ID for group_id
  • If the filter allows multiple groups: group_ids[] parameter should be used.

Project Filter

  • Currently we are using project_id=$project_full_path for filtering projects.
    • Example: project_id=gitlab-org/gitlab
  • Proposal: use the database ID for project_id
  • If the filter allows multiple projects: project_ids[] parameter should be used.
    • If group_id is provided, the query should be scoped to the group to avoid leaking data from other groups.
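
The scoping rule can be illustrated with in-memory data. In the application this would be a scoped database query; the ProjectFilter module and the ProjectRow struct below are stand-ins invented for this sketch, and subgroups are ignored.

```ruby
# Illustration only: in-memory stand-ins for database rows.
module ProjectFilter
  ProjectRow = Struct.new(:id, :group_id)

  # Returns the requested projects. When group_id is given, projects outside
  # that group are dropped even if their IDs were requested explicitly.
  def self.apply(projects, project_ids:, group_id: nil)
    scoped = group_id ? projects.select { |project| project.group_id == group_id } : projects
    scoped.select { |project| project_ids.include?(project.id) }
  end
end
```

Applying the group scope first means a foreign project ID simply matches nothing, so the response shape stays the same and no data from other groups is returned.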

API Response

  • For now we are returning a standard JSON response.
    • Success: 200
    • License not available, page not found: 404
    • Current user doesn't have enough permissions, forbidden: 403
  • For the successful API response, verify it against a JSON schema.
    • Example: expect(response).to match_response_schema('analytics/cycle_analytics/stages', dir: 'ee')

Response Example

  • Avoid: nested hashes (difficult to describe with schema, difficult to extend)
  • Prefer: arrays

Example (avoid):

{
  "2018-01-01": 1,
  "2018-01-02": 4,
  "2018-01-03": 15
}

Example (prefer):

[
  { "date": "2018-01-01", "count": 1, "my_new_attribute": 1 },
  { "date": "2018-01-02", "count": 4, "my_new_attribute": 1 },
  { "date": "2018-01-03", "count": 15, "my_new_attribute": 1 }
]
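
If the data comes out of a query as a date-keyed hash, converting it to the preferred shape takes one line. A small sketch, assuming a hypothetical to_series helper and ISO 8601 string keys (which sort chronologically as plain strings):

```ruby
require 'json'

# Hypothetical helper: turns { "2018-01-01" => 1, ... } into an array of
# { date:, count: } entries ordered by date.
def to_series(counts_by_date)
  counts_by_date.sort.map { |date, count| { date: date, count: count } }
end

counts = { '2018-01-02' => 4, '2018-01-01' => 1, '2018-01-03' => 15 }
puts JSON.generate(to_series(counts))
```

Each entry is a flat object, so adding an attribute later extends the schema without changing the overall structure.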

Feature and Permission Check

  • Each analytics feature should be guarded by a license entry (ee/app/models/license.rb).
  • Ensure feature availability in the following places:
    • Routes: ee/config/routes/analytics.rb
    • Controllers
    • Services (if we use service objects)