Commit 2a343903 authored by Douwe Maan

Merge branch 'master' into 'master'

feat: add `pipelines` and `pipelines_extended` endpoints, update README

See merge request !23
parents 025964d1 35ac1b71
# Changelog
## 0.9.6
[!23](https://gitlab.com/meltano/tap-gitlab/-/merge_requests/23)
* Add `pipelines` endpoint (https://docs.gitlab.com/ee/api/pipelines.html#list-project-pipelines)
* Add `pipelines_extended` endpoint (https://docs.gitlab.com/ee/api/pipelines.html#get-a-single-pipeline)
* Make `gen_request` function handle requests that return a single JSON object, instead of an array of objects
* Add `stats__*` columns to `commits` endpoint
* Add `released_at` column to `releases` endpoint
([Tomasz Zbrozek](https://gitlab.com/tomekzbrozek))
## 0.9.5
* [!22](https://gitlab.com/meltano/tap-gitlab/-/merge_requests/22) Fix bug causing only projects in first group to be synced if multiple groups are specified ([Tomasz Zbrozek](https://gitlab.com/tomekzbrozek))
......@@ -10,7 +19,7 @@
* [#17](https://gitlab.com/meltano/tap-gitlab/issues/17) Remove the requirement to hard-code the API version in the `api_url` parameter. The `api_url` now requires only the base URL of the GitLab instance, e.g. `https://gitlab.com`. Old configuration settings and manually set versions are still supported.
## 0.9.2
* [#16](https://gitlab.com/meltano/tap-gitlab/issues/16) Handle 401 (Unauthorized), 403 (Forbidden) and 404 (Not Found) Resource errors gracefully: Skip extracting that resource and continue with the rest. That can happen, for example, when accessing a private project or accessing the members, milestones or labels of a project without sufficient privileges.
## 0.9.1
* Update Issues to also fetch the closed_by_id attribute.
......@@ -24,7 +33,7 @@
## 0.8.0
* Add support for incremental extraction of Commits, Issues, Merge Requests and Epics.
* Properly use STATE and the `start_date` to only fetch entities created/updated after that date.
(tap-gitlab was fetching everything and filtering the results afterwards, which resulted in huge overhead for large projects)
* Add dedicated STATE for commits, issues and merge_requests per Project and for epics per Group.
* Ensure that the last message emitted is the final STATE.
......@@ -34,7 +43,7 @@
* Fix the pagination not working for very large projects with more than 10,000 entities per response.
* Use the `X-Next-Page` header instead of the `X-Total-Pages` header.
https://docs.gitlab.com/ee/api/#other-pagination-headers
* Use the `per_page` param to fetch 100 records per call instead of 20.
No more need for 5K calls to fetch all the gitlab-ce commits. A win for all.
* Explicitly set the per_page param to 20 for labels API end points until gitlab-org/gitlab-ce#63103 is fixed.
......@@ -58,7 +67,7 @@
* Add support for fetching Tags and Releases
* Add support for fetching Merge Requests
* Add support for fetching Group and Project Members
* Update Users with Member info
* Fetch additional {'default', 'can_push'} attributes for Branches
* Fetch additional {'authored_date', 'committed_date', 'parent_ids'} attributes for Commits
* Fetch additional {'upvotes', 'downvotes', 'merge_requests_count', 'weight'} attributes for Issues
......
......@@ -29,9 +29,10 @@ This tap:
1. Install
```bash
> pip install tap-gitlab
```
Currently this project is not hosted on Python Package Index. To install, run:
```
pip install git+https://gitlab.com/meltano/tap-gitlab.git
```
2. Get your GitLab access token
......@@ -46,7 +47,7 @@ This tap:
- API URL for your GitLab account. If you are using the public gitlab.com this will be `https://gitlab.com` (the tap appends `/api/v4` when the URL contains no API version)
- Groups to track (space separated)
- Projects to track (space separated)
Notes on group and project options:
- either groups or projects need to be provided
- filling in 'groups' but leaving 'projects' empty will sync all group projects.
......@@ -57,11 +58,12 @@ This tap:
{
"api_url": "https://gitlab.com",
"private_token": "your-access-token",
"groups": "myorg mygroup",
"projects": "myorg/repo-a myorg/repo-b",
"start_date": "2018-01-01T00:00:00Z",
"ultimate_license": true,
- "fetch_merge_request_commits": false
+ "fetch_merge_request_commits": false,
+ "fetch_pipelines_extended": false
}
```
......@@ -71,6 +73,8 @@ This tap:
If `fetch_merge_request_commits` is true (defaults to false), then for each Merge Request, also fetch the MR's commits and create the join table `merge_request_commits` with the Merge Request and related Commit IDs. In the current version of GitLab's API, this operation requires one API call per Merge Request, so setting this to true can considerably slow down the end-to-end extraction. For example, in a project like `gitlab-org/gitlab-foss`, this would result in 15x more API calls than are required for fetching all the other Entities supported by `tap-gitlab`.
If `fetch_pipelines_extended` is true (defaults to false), then for every Pipeline fetched with `sync_pipelines` (which returns N pages containing all pipelines per project), also fetch extended details of each of these pipelines with `sync_pipelines_extended`. The same concern as for `fetch_merge_request_commits` applies here: every pipeline fetched with `sync_pipelines_extended` requires a separate API call.
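To make the cost of `fetch_pipelines_extended` concrete, here is a rough sketch that estimates the number of API calls for a single project. It assumes the pipelines list is paginated at 100 records per page (the `per_page` value mentioned in the changelog); `estimate_pipeline_calls` is a hypothetical helper for illustration and is not part of `tap-gitlab`.

```python
import math


def estimate_pipeline_calls(n_pipelines, per_page=100, fetch_extended=False):
    """Rough count of API calls needed to sync one project's pipelines."""
    # Listing pipelines is paginated: one call per page, and at least one call
    # is made even when the project has no pipelines.
    list_calls = max(1, math.ceil(n_pipelines / per_page))
    # The single-pipeline endpoint needs one extra call per pipeline.
    extended_calls = n_pipelines if fetch_extended else 0
    return list_calls + extended_calls


if __name__ == "__main__":
    print(estimate_pipeline_calls(2500))
    print(estimate_pipeline_calls(2500, fetch_extended=True))
```

Under these assumptions, the extended calls dominate as soon as a project has more than a handful of pipelines, which is why the option defaults to false.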
4. [Optional] Create the initial state file
You can provide a JSON file that contains a date for the API endpoints
......@@ -85,7 +89,7 @@ This tap:
"project_278964_commits": "2017-01-17T00:00:00Z"
}
```
Note:
- You have to provide the id of each project you are syncing. For example, in the case of `gitlab-org/gitlab` it is 278964.
- You can find the Project ID on the project's homepage, under its name.
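As an illustration, an initial state file could be generated along these lines. The key pattern `project_<id>_<entity>` follows the `project_278964_commits` example above and the `project_{}_pipelines` key used by `sync_pipelines`; the helper name `build_initial_state` and the default entity list are assumptions made for this sketch.

```python
import json


def build_initial_state(project_ids, start_date, entities=("commits", "pipelines")):
    """Build a state dict with one bookmark per project and entity."""
    state = {}
    for pid in project_ids:
        for entity in entities:
            # Same key layout as the example state file: project_<id>_<entity>
            state["project_{}_{}".format(pid, entity)] = start_date
    return state


if __name__ == "__main__":
    state = build_initial_state([278964], "2017-01-17T00:00:00Z")
    print(json.dumps(state, indent=2))
```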
......
......@@ -3,7 +3,7 @@
from setuptools import setup
setup(name='tap-gitlab',
- version='0.9.5',
+ version='0.9.6',
description='Singer.io tap for extracting data from the GitLab API',
author='Meltano Team && Stitch',
url='https://singer.io',
......
......@@ -40,7 +40,7 @@ RESOURCES = {
'key_properties': ['project_id', 'name'],
},
'commits': {
- 'url': '/projects/{id}/repository/commits?since={start_date}',
+ 'url': '/projects/{id}/repository/commits?since={start_date}&with_stats=true',
'schema': load_schema('commits'),
'key_properties': ['id'],
},
......@@ -119,6 +119,16 @@ RESOURCES = {
'schema': load_schema('epic_issues'),
'key_properties': ['group_id', 'epic_iid', 'epic_issue_id'],
},
'pipelines': {
'url': '/projects/{id}/pipelines?updated_after={start_date}',
'schema': load_schema('pipelines'),
'key_properties': ['id']
},
'pipelines_extended': {
'url': '/projects/{id}/pipelines/{secondary_id}',
'schema': load_schema('pipelines_extended'),
'key_properties': ['id']
},
}
ULTIMATE_RESOURCES = ("epics", "epic_issues")
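The two new URL templates use `{id}`, `{start_date}` and `{secondary_id}` placeholders. A minimal sketch of how such templates expand is shown below; `build_url`, `TEMPLATES` and the hard-coded `API_URL` are stand-ins for illustration, and the tap's own `get_url` (not shown in this diff) may differ, for example by URL-encoding the project id.

```python
# Assumed base URL; the tap derives this from the `api_url` config value.
API_URL = "https://gitlab.com/api/v4"

# Copies of the two templates added to RESOURCES in this commit.
TEMPLATES = {
    "pipelines": "/projects/{id}/pipelines?updated_after={start_date}",
    "pipelines_extended": "/projects/{id}/pipelines/{secondary_id}",
}


def build_url(entity, **params):
    """Expand the template for `entity`; unused keyword arguments are ignored."""
    return API_URL + TEMPLATES[entity].format(**params)


if __name__ == "__main__":
    print(build_url("pipelines", id=278964, start_date="2018-01-01T00:00:00Z"))
    print(build_url("pipelines_extended", id=278964, secondary_id=42))
```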
......@@ -208,8 +218,14 @@ def gen_request(url):
         while next_page:
             params['page'] = int(next_page)
             resp = request(url, params)
-            for row in resp.json():
-                yield row
+            resp_json = resp.json()
+            # handle endpoints that return a single JSON object
+            if isinstance(resp_json, dict):
+                yield resp_json
+            # handle endpoints that return an array of JSON objects
+            else:
+                for row in resp_json:
+                    yield row
             next_page = resp.headers.get('X-Next-Page', None)
     except ResourceInaccessible as exc:
         # Don't halt execution if a Resource is Inaccessible
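The dict-versus-list branching added to `gen_request` can be exercised in isolation. The sketch below pulls that logic into a standalone generator; `iter_rows` is a hypothetical name used only for this example, and the payloads are made-up stand-ins for `resp.json()`.

```python
def iter_rows(payload):
    """Yield records from a decoded JSON payload, whatever its top-level shape."""
    if isinstance(payload, dict):
        # Single-object endpoints (e.g. get-a-single-pipeline) yield one record.
        yield payload
    else:
        # List endpoints yield each element of the array in order.
        for row in payload:
            yield row


if __name__ == "__main__":
    print(list(iter_rows({"id": 1, "status": "success"})))
    print(list(iter_rows([{"id": 1}, {"id": 2}])))
```

Keeping both shapes behind one generator lets callers such as `sync_pipelines_extended` reuse the same `for row in gen_request(url)` loop as the list-based endpoints.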
......@@ -495,6 +511,40 @@ def sync_group(gid, pids):
singer.write_record("groups", group, time_extracted=time_extracted)
def sync_pipelines(project):
    entity = "pipelines"
    # Keep a state for the pipelines fetched per project
    state_key = "project_{}_pipelines".format(project['id'])
    start_date = get_start(state_key)

    url = get_url(entity=entity, id=project['id'], start_date=start_date)
    with Transformer(pre_hook=format_timestamp) as transformer:
        for row in gen_request(url):
            transformed_row = transformer.transform(row, RESOURCES[entity]["schema"])

            # Write the Pipeline record
            singer.write_record(entity, transformed_row, time_extracted=utils.now())
            utils.update_state(STATE, state_key, row['updated_at'])

            # Sync additional details of a pipeline using get-a-single-pipeline endpoint
            # https://docs.gitlab.com/ee/api/pipelines.html#get-a-single-pipeline
            if CONFIG['fetch_pipelines_extended']:
                sync_pipelines_extended(project, transformed_row)

    singer.write_state(STATE)

def sync_pipelines_extended(project, pipeline):
    entity = "pipelines_extended"
    url = get_url(entity=entity, id=project['id'], secondary_id=pipeline['id'])

    with Transformer(pre_hook=format_timestamp) as transformer:
        for row in gen_request(url):
            # The single-pipeline response is tagged with its project here
            row['project_id'] = project['id']
            transformed_row = transformer.transform(row, RESOURCES["pipelines_extended"]["schema"])
            singer.write_record("pipelines_extended", transformed_row, time_extracted=utils.now())

def sync_project(pid):
url = get_url(entity="projects", id=pid)
......@@ -535,6 +585,7 @@ def sync_project(pid):
sync_labels(project)
sync_releases(project)
sync_tags(project)
sync_pipelines(project)
singer.write_record("projects", project, time_extracted=time_extracted)
utils.update_state(STATE, state_key, last_activity_at)
......@@ -586,6 +637,7 @@ def main_impl():
CONFIG.update(args.config)
CONFIG['ultimate_license'] = truthy(CONFIG['ultimate_license'])
CONFIG['fetch_merge_request_commits'] = truthy(CONFIG['fetch_merge_request_commits'])
CONFIG['fetch_pipelines_extended'] = truthy(CONFIG['fetch_pipelines_extended'])
if '/api/' not in CONFIG['api_url']:
CONFIG['api_url'] += '/api/v4'
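The `api_url` handling above is what allows a bare instance URL in the config. A small standalone sketch of the same rule follows; `normalize_api_url` is a hypothetical wrapper written for this example, not a function in the tap.

```python
def normalize_api_url(api_url):
    """Append the default API version unless the URL already contains one."""
    if '/api/' not in api_url:
        api_url += '/api/v4'
    return api_url


if __name__ == "__main__":
    print(normalize_api_url("https://gitlab.com"))
    print(normalize_api_url("https://gitlab.com/api/v3"))
```

A URL that already contains `/api/` is left untouched, so older configurations that pin a version keep working.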
......
......@@ -55,6 +55,20 @@
"type": "null"
}
]
},
"stats": {
"type": "object",
"properties": {
"additions": {
"type": "integer"
},
"deletions": {
"type": "integer"
},
"total": {
"type": "integer"
}
}
}
}
}
{
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"status": {
"type": "string"
},
"ref": {
"type": ["null", "string"]
},
"sha": {
"type": "string"
},
"web_url": {
"type": "string"
},
"created_at": {
"type": "string",
"format": "date-time"
},
"updated_at": {
"type": "string",
"format": "date-time"
}
}
}
{
"type": "object",
"properties": {
"project_id": {
"type": "integer"
},
"id": {
"type": "integer"
},
"status": {
"type": "string"
},
"ref": {
"type": ["null", "string"]
},
"sha": {
"type": "string"
},
"before_sha": {
"type": "string"
},
"tag": {
"type": "boolean"
},
"yaml_errors": {
"type": ["null", "string"]
},
"user": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"username": {
"type": "string"
},
"id": {
"type": "integer"
},
"state": {
"type": "string"
}
}
},
"created_at": {
"type": "string",
"format": "date-time"
},
"updated_at": {
"type": "string",
"format": "date-time"
},
"started_at": {
"anyOf": [
{
"type": "string",
"format": "date-time"
},
{
"type": "null"
}
]
},
"finished_at": {
"anyOf": [
{
"type": "string",
"format": "date-time"
},
{
"type": "null"
}
]
},
"committed_at": {
"anyOf": [
{
"type": "string",
"format": "date-time"
},
{
"type": "null"
}
]
},
"duration": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
]
},
"coverage": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
]
},
"web_url": {
"type": "string"
}
}
}
......@@ -29,6 +29,17 @@
"type": "null"
}
]
},
"released_at": {
"anyOf": [
{
"type": "string",
"format": "date-time"
},
{
"type": "null"
}
]
}
}
}