npm feeder job exceeds job timeout

Problem

The npm feeder is hitting the job timeout of 3 hours, so we're currently unable to update our license database. This happens because each run of the job starts from the beginning and fetches all the required splits again. Contrast this with the go feeder, which uses a cursor persisted in cloud storage to carry on working from where it left off.

Failing Job

run feeder

Proposed Solutions

  • Persist the last round's splits

Implementation Plan 1

  • Save the splits from the latest round to the cursor
  • Instead of starting from "scratch", start from the last saved splits. Their offsets need to be fetched again, as they might have moved
  • Sub-split as needed
  • Persist the results of the new sub-splits
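
The steps in this plan can be sketched roughly as follows. This is only an illustration under stated assumptions, not the feeder's actual code: the `Split` shape (a key range plus an offset), the JSON cursor layout, and the `fetch_offset`, `needs_subsplit` and `subsplit` callbacks are all hypothetical, and a local file stands in for the cursor object in cloud storage.

```python
import json
from dataclasses import asdict, dataclass, replace
from pathlib import Path
from typing import Callable, List


@dataclass
class Split:
    # Hypothetical shape: a key range in the registry plus its current offset.
    start_key: str
    end_key: str  # "" means open-ended
    offset: int = 0


def save_cursor(path: Path, splits: List[Split]) -> None:
    # Persist the latest round's splits (a local file stands in for cloud storage).
    path.write_text(json.dumps({"splits": [asdict(s) for s in splits]}))


def load_cursor(path: Path) -> List[Split]:
    # Return the saved splits, or an empty list when no cursor exists yet.
    if not path.exists():
        return []
    return [Split(**raw) for raw in json.loads(path.read_text())["splits"]]


def plan_round(
    path: Path,
    initial: List[Split],
    fetch_offset: Callable[[Split], int],
    needs_subsplit: Callable[[Split], bool],
    subsplit: Callable[[Split], List[Split]],
) -> List[Split]:
    # Start from the saved splits; fall back to "scratch" only on the first run.
    splits = load_cursor(path) or initial
    # Offsets may have moved since the last round, so fetch them again.
    refreshed = [replace(s, offset=fetch_offset(s)) for s in splits]
    # Sub-split where needed, keeping the other splits as they are.
    result: List[Split] = []
    for s in refreshed:
        result.extend(subsplit(s) if needs_subsplit(s) else [s])
    # Persist the result so the next run carries on from here.
    save_cursor(path, result)
    return result
```

With this shape, a job that is killed by the timeout only loses the work done since the last `save_cursor` call, rather than the whole round.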

Please note: we worked through this solution but ran into an issue with rate-limiting. An extension to the solution was needed.

Implementation Plan 2

Iteration 1

  • Create a new compute instance in ext-license-db-dev-d6ba6f35 in the us-east1 region
  • Create an administrator account and set a password
  • Set up couchdb
  • Create a new database called license-db-npm-mirror
  • Start the replication process using https://replicate.npmjs.com/registry as the source and license-db-npm-mirror as the target. Select continuous as the type of replication
  • Monitor progress and ensure it completes
  • Create a disk image backup
  • Set a custom hostname for the instance to avoid relying on an ephemeral IP address
  • Perform some manual QA to validate that removing rate-limiting moves us closer to our goal
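
For the replication step, one possible way to start it without the web UI is to write a document to CouchDB's `_replicator` database. The sketch below only builds the request and does not send it. The instance URL, the credentials and the `npm-mirror` document id are placeholders assumed for illustration; the source URL and target database name are the ones listed above.

```python
import base64
import json
import urllib.request

SOURCE = "https://replicate.npmjs.com/registry"
TARGET_DB = "license-db-npm-mirror"
DOC_ID = "npm-mirror"  # hypothetical id for the replication document


def build_replication_request(couch_url: str, user: str, password: str) -> urllib.request.Request:
    # Basic auth header for the administrator account created earlier.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    auth = f"Basic {token}"
    doc = {
        "source": SOURCE,
        # The target is given as a full URL with its own auth header.
        "target": {
            "url": f"{couch_url}/{TARGET_DB}",
            "headers": {"Authorization": auth},
        },
        # Continuous replication keeps following the source's changes feed.
        "continuous": True,
    }
    # PUT /_replicator/<id> creates the replication document.
    return urllib.request.Request(
        f"{couch_url}/_replicator/{DOC_ID}",
        data=json.dumps(doc).encode(),
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": auth},
    )
```

Passing the returned request to `urllib.request.urlopen` would submit it. Progress can then be checked through the `/_scheduler/docs` or `/_active_tasks` endpoints on the same instance.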

Iteration 2

Iteration 3

  • Review infrastructure changes
  • Align with existing approaches to security
  • Add terraform code to provision couchdb and associated infrastructure
Edited by Philip Cunningham