Bash script to download license-db export files does not download all files
Summary
When manually downloading the license-db export files via the documented script, only the first 1000 export files are considered. As a result, the license-db local copy is incomplete and can lead to inaccurate license scanning results.
Steps to reproduce
- Copy the REST API call from the script that lists the bucket objects.
- Verify that the REST API response contains a `nextPageToken` field, which indicates that the response does not contain all objects.
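The check in the second step can be scripted. This is a minimal sketch, assuming `jq` is installed and one page of the listing has been saved to a file. The function name `listing_is_truncated` and the file name `page.json` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exits 0 when a saved objects-list response contains a
# nextPageToken (more pages remain), and 1 when it does not.
listing_is_truncated() {
  local page_file="$1"
  # jq -e sets the exit status from the last output: false -> 1, true -> 0.
  jq -e 'has("nextPageToken")' "$page_file" > /dev/null
}

# Usage against the live bucket (requires network access):
#   curl --silent --show-error --fail \
#     "https://storage.googleapis.com/storage/v1/b/prod-export-license-bucket-1a6c642fc4de57d4/o?maxResults=1000" > page.json
#   listing_is_truncated page.json && echo "listing is incomplete"
```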
Example Project
What is the current bug behavior?
Only the first 1000 objects are downloaded by the script.
What is the expected correct behavior?
All objects from the bucket are downloaded.
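One way to check this, sketched under the assumption that the listing pages have been saved as `out.0.json`, `out.1.json`, ... (the naming used by the example script in this issue): sum the lengths of the `items` arrays across all pages and compare the total with the number of files actually downloaded. The function name `count_listed_objects` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the total number of objects listed across all
# saved pages. Pages without an "items" array count as zero.
count_listed_objects() {
  # -s slurps every file into one array; "add // 0" handles an empty array.
  jq -s 'map((.items // []) | length) | add // 0' "$@"
}

# Usage: count_listed_objects out.*.json
```

A total above 1000 confirms that pagination picked up objects the single-request version misses.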
Relevant logs and/or screenshots
N/A
Output of checks
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Implementation Plan
Iteration 1
Iteration 2
- Amend the existing `curl` command to iterate over all objects in the bucket using `nextPageToken`, storing each page in a separate `json` file.
- Use `jq` to combine all resulting `json` files.
- Apply the existing `@tsv` formatting with `jq` to the full set of objects.
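The second and third steps of the plan could look roughly like this. It is a sketch, not the documented script: the function name `pages_to_tsv` is hypothetical, and the `name`/`mediaLink` columns are an assumption standing in for whatever fields the existing `@tsv` filter selects. Using `(.items // [])` keeps the filter from failing on a page without an `items` array.

```bash
#!/usr/bin/env bash
# Hypothetical helper: combines the saved pages (out.0.json, out.1.json, ...)
# and prints one tab-separated row per object.
# Assumption: columns are name and mediaLink; replace them with the fields the
# existing @tsv filter in the documented script actually uses.
pages_to_tsv() {
  # -s slurps all pages into one array; each page's items are then flattened.
  jq -r -s '.[] | (.items // [])[] | [.name, .mediaLink] | @tsv' "$@"
}

# Usage: pages_to_tsv out.*.json > objects.tsv
```

The glob `out.*.json` expands in lexical order (`out.10.json` sorts before `out.2.json`), so if row order matters, list the files explicitly or sort the output.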
`nextPageToken` Example
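```bash
#!/usr/bin/env bash
set -euo pipefail

# Objects-list endpoint of the public license-db export bucket.
BUCKET_URL="https://storage.googleapis.com/storage/v1/b/prod-export-license-bucket-1a6c642fc4de57d4/o"
MAX_RESULTS=1000   # page size requested per call
BASE="out"         # each page is saved as out.<n>.json
COUNTER=0

# Fetch the first page; --fail makes HTTP errors abort the script.
FILE="$BASE.$COUNTER.json"
curl --silent --show-error --fail --get "$BUCKET_URL" \
  --data-urlencode "maxResults=$MAX_RESULTS" > "$FILE"
NEXT_PAGE_TOKEN="$(jq -r '.nextPageToken' "$FILE")"

# jq prints "null" once the response no longer carries a nextPageToken.
while [ "$NEXT_PAGE_TOKEN" != "null" ]; do
  COUNTER=$((COUNTER + 1))
  FILE="$BASE.$COUNTER.json"
  # --data-urlencode escapes any reserved characters in the page token.
  curl --silent --show-error --fail --get "$BUCKET_URL" \
    --data-urlencode "maxResults=$MAX_RESULTS" \
    --data-urlencode "pageToken=$NEXT_PAGE_TOKEN" > "$FILE"
  NEXT_PAGE_TOKEN="$(jq -r '.nextPageToken' "$FILE")"
  sleep 1  # pause between page requests
done

echo "complete"
```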
Edited by Oscar Tovar