The License Compliance page for some of our projects shows a very large number of Unknown licenses, even though the license is unambiguous in the package manager (npmjs). For instance, most Angular and GraphQL packages turn up as Unknown. This makes it impossible to implement license compliance, as we would constantly need to re-review up to a few hundred Unknown licenses.
One such example is the "word-wrap" package and an almost identical fork, "@aashutoshrathi/word-wrap". One is listed as Unknown while the other shows up as MIT. There are no significant differences between the two projects, in particular regarding the license.
A minimal project was created using the GitLab template for Node.js Express. The security template from GitLab.com was added and the YAML changed to include a test stage. References to word-wrap were added in app.js, and the lock file was updated and committed.
Note that the project is not functional; it is only used to demonstrate the issue with minimal changes to the template.
What is the current bug behavior?
The License Compliance and Dependency List tabs show word-wrap as Unknown and @aashutoshrathi/word-wrap as MIT.
What is the expected correct behavior?
Both should show MIT. And hopefully, by extension, all Angular, GraphQL, and other such packages in our projects should be flagged with their corresponding licenses (provided their licensing is clear in the package manager).
Relevant logs and/or screenshots
No relevant logs were found in job artifacts.
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
See the example project based on the GitLab template; there is no customer-specific environment setup.
This issue was automatically tagged with the label group::vulnerability research by TanukiStan, a machine learning classification model, with a probability of 0.85.
If this label is incorrect, please tag this issue with the correct group label as well as automation:ml wrong to help TanukiStan learn from its mistakes.
To set expectations, GitLab product managers or team members can't make any promise that they will proceed with this. However, we believe everyone can contribute, and we welcome you to work on this proposed change, feature, or bug fix. There is a bias for action, so you don't need to wait: try and spin up that merge request yourself. If you need help doing so, we're always open to mentoring you to drive this change.
Thoughts on this, @mhenriksen @thiagocsf? Is this more for VR or for composition analysis, to determine which team should be responsible for looking into this?
@wayne it sounds to me like it's in composition analysis' domain. The VR team has not been involved much in the license database after the handoff of the initial version. I remember we designed it to fall back to Unknown license if there was any doubt, and not try to be too clever about guessing it, as showing a wrong license would be worse than showing Unknown.
We will of course gladly assist with any debugging if needed!
I'm not refining this issue further, but I suggest an investigation timeboxed to 1 day to verify the DB content and whether the problem is related to the other npm issue.
@martin.levesque I do not have access to the dependency list and license compliance pages in your existing project, so I can't confirm it's all fixed there. Please reopen this issue if that's not the case.
One gotcha that we need to follow up on is the fact that group::threat insights has started to store detected licenses along with components in their own DB tables. This acts as a caching mechanism for the results provided by group::composition analysis features, and we need to further clarify how that stored contextual data can be refreshed. This should not yet be a problem for the project-level dependency list and license compliance page, but it will soon be with the completion of Use database for project dependency list (&8293 - closed). It should impact the group-level dependency list, though.
@gonzoyumo, I could not check with the test project as my trial expired, but looking at projects we have internally, the number of unknown licenses has gone down to a manageable level.
However, there are still a couple of unknown licenses apart from our internal dependencies (which we expect to be listed as unknown anyway).
Most of them have alternate licensing, so that's probably the reason for "unknown", but a few packages with an obvious license still pop up:
I've checked webpack, and indeed the last known version for this package in our DB is 5.85.0 while the latest on the registry is 5.90.0. I've checked with the team and there is actually a pending decision on a resync of the full npm data, similar to the Maven discussion in #433541 (comment 1750040839).
I'm reopening this issue and will close it once this is addressed.
This group::composition analysis bug has at most 50% of the SLO duration remaining and is an SLO::Near Miss breach. Please consider taking action before this becomes an SLO::Missed breach in 3 days (2024-02-13).
I've spent time today diving into the Package Metadata DB projects and here is a summary (all actions were executed on the DEV environment):
I've sent a test message on the pubsub topic dev-package-interfacer-topic-dev-npm-interfacer-cloud-run with the following payload to trigger the interfacer logic:
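Purely as an illustration of this step (the project ID and the payload fields below are hypothetical placeholders, not the values actually used), publishing such a test message could look like:

```shell
# Hypothetical sketch: publish a test message to the interfacer Pub/Sub topic.
# PROJECT_ID and the payload shape are assumptions for illustration only.
gcloud pubsub topics publish dev-package-interfacer-topic-dev-npm-interfacer-cloud-run \
  --project="PROJECT_ID" \
  --message='{"name": "webpack"}'
```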
Thank you, @gonzoyumo. I think you did a good job of isolating the issue, and I agree with your assessment that it points to a replication problem. I can see from the scheduled pipeline page that the verification job associated with the NPM scheduled pipeline is failing:
```
$ ./test/end-to-end-npm-license-feeder.sh
Official registry count (2678172) and replica count (2624454) differ by 53718 documents.
Delta is not within tolerance (1500). Please check replication is running as expected
```
I will take a look today to see if there is anything obvious.
The issue appears to be that the credentials were rotated but the replication task still relies on the old credentials. I've confirmed this by fixing replication between prod and dev, and will do the same for dev.
@gonzoyumo I've confirmed that replication has now resumed. It will take some time to resync but, as soon as I see it has completed, I will manually trigger the NPM Feeder. This will also give some indication regarding the delta.
```
$ ./test/end-to-end-npm-license-feeder.sh
Official registry count (2679032) and replica count (2679287) differ by 255 documents.
Delta is within tolerance (1500).
```
@philipcunningham I've tried today to set up some local replication, focusing on syncing a single document (webpack), but without success. I managed to set up a full replication, but I'm not really keen to go that route. I was hoping to leverage the selector object as documented, but I keep getting timeouts from replicate.npmjs.com (even when setting a crazy timeout) when using this method:
```
[error] 2024-02-19T19:02:37.290639Z nonode@nohost <0.19278.4> -------- Replicator, request POST to "https://replicate.npmjs.com/registry/_changes?filter=_selector&feed=normal&style=all_docs&since=0&timeout=1666666" failed due to error {connection_closed,mid_stream}
[error] 2024-02-19T19:03:07.007785Z nonode@nohost <0.19176.4> -------- ChangesReader process died with reason: {changes_reader_died,{timeout,ibrowse_stream_cleanup}}
[error] 2024-02-19T19:03:07.008240Z nonode@nohost <0.19176.4> -------- Replication `484b9f7d51ebaf72ad24a59c79dc249f` (`https://replicate.npmjs.com/registry/` -> `http://172.17.0.3:5984/test_replication/`) failed: {changes_reader_died,{timeout,ibrowse_stream_cleanup}}
```
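For context, the selector-based attempt was roughly of the following shape (a sketch only; the admin credentials and the local endpoint are placeholders taken from the log above):

```shell
# One-off replication filtered with a selector: the variant that kept timing out.
curl -s -X POST 'http://admin:PASSWORD@172.17.0.3:5984/_replicate' \
  -H 'Content-Type: application/json' \
  -d '{
        "source": "https://replicate.npmjs.com/registry",
        "target": "http://admin:PASSWORD@172.17.0.3:5984/test_replication",
        "create_target": true,
        "selector": { "_id": "webpack" }
      }'
```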
@gonzoyumo have you tried testing out replication of another package with our replica? It would let you test out the process with a fast DB instance first before switching it out to try with webpack on the public registry. It might help identify if there's something particular about the webpack package in the public registry (e.g. a checkpoint issue).
Thanks @philipcunningham. Unfortunately none of the above made a difference :/
When looking one last time at the documentation after these failures, I luckily found the doc_ids option, which unfortunately was not mentioned in the replication documentation.
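For comparison with the selector attempt, a sketch of the doc_ids variant (endpoints and credentials are again placeholders):

```shell
# Replicate only the named documents instead of filtering the changes feed.
curl -s -X POST 'http://admin:PASSWORD@172.17.0.3:5984/_replicate' \
  -H 'Content-Type: application/json' \
  -d '{
        "source": "https://replicate.npmjs.com/registry",
        "target": "http://admin:PASSWORD@172.17.0.3:5984/test_replication",
        "create_target": true,
        "doc_ids": ["webpack"]
      }'
```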
It worked like a charm and I was finally able to see an error for the webpack sync.
So it seems the webpack document is just too big for the replication to handle! It's a bit annoying that the replication considers it went fine and silences this problem :/
I'll wait for the next feeder execution to see if it picks up the webpack package and adds it to our Cloud SQL DB. Then, after the next exporter run, I'll check that it's in our GCP bucket data and that the GitLab instance correctly detects the license for webpack.
@gonzoyumo it makes me wonder if it would be beneficial to re-run replication from the beginning now that you've adjusted this setting. I think it could offer improved precision for our customers on one of the most popular package ecosystems. What do you think?
@philipcunningham that's a good idea. I was wondering how we could figure out which npm documents are above that 8MB threshold, or find any other way to identify the ~255 currently missing documents.
But simply re-running a full replication might indeed just do it.
Before triggering a full re-sync, we might try to figure out what the appropriate value for max_document_size would be. I've put 16MB so far, but checking some other problematic packages I can see we can get far above this:
@graphql-codegen_cli: 26MB
vite: 38MB
@philipcunningham before going that route, do you know if there is any risk or constraint on the infrastructure side if we increase the couchDB disk usage?
Considering that we are currently missing 255 documents, the worst-case scenario would require 255 × max_document_size of additional storage. Let's take some margin and say we pick 64MB: that's ~16GB more.
On the other hand, the best-case scenario would be 255 × 8MB (the current limit), so ~2GB.
The current DB size is 47.4GB, so we would be looking at an increase of roughly 4% to 34%, for a final size between 50GB and 64GB.
I recall storage is marginal in our overall operational cost, so I'd go for it, but @thiagocsf feel free to correct me on this.
About how to trigger a full re-sync, we could probably just trigger a one-time replication without any filter on doc_ids. This replication would run concurrently with the continuous one that is already in place and thus would not prevent updates emitted during the full re-sync. I'm still checking, but I haven't found any specific guidance on that topic in the documentation.
> @philipcunningham before going that route, do you know if there is any risk or constraint on the infrastructure side if we increase the couchDB disk usage?
@gonzoyumo I'm not aware of any issues and we are OK for disk utilization on both instances:
> About how to trigger a full re-sync, we could probably just trigger a one-time replication without any filter on doc_ids. This replication would run concurrently with the continuous one that is already in place and thus would not prevent updates emitted during the full re-sync. I'm still checking, but I haven't found any specific guidance on that topic in the documentation.
I think this sounds like a good suggestion. Two things spring to mind:
The increased traffic from the same machine will mean we'll start hitting the public registry's rate limiting sooner. This is probably OK but we should keep an eye on it to make sure that it isn't resulting in the document count delta increasing.
It might be worth identifying a "known-bad" package to validate that the replication was successful.
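One cheap way to do that validation, as a sketch (the replica URL is a placeholder), is to compare the current revision of the package document on both sides without downloading its body:

```shell
# HEAD returns the current revision in the ETag header, so even very large
# documents such as webpack can be compared without fetching them.
curl -sI 'https://replicate.npmjs.com/registry/webpack' | grep -i etag
curl -sI "$REPLICA_URL/registry/webpack" | grep -i etag
```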
I've looked at the couchDB logs on the VM and was able to extract a list of 450 packages for which the Too large error was raised (see below).
So on top of completely missing documents, we also have a growing number of documents for which the sync no longer works because they go over the limit.
So instead of a full sync, I can now focus the replication on these docs only.
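The extraction itself was essentially a log grep; as a rough sketch (the log path and the exact wording of the error are assumptions and will differ per install):

```shell
# List documents that failed replication because they exceed max_document_size.
grep -i 'too large' /opt/couchdb/var/log/couch.log | sort -u > too_large_docs.txt
wc -l too_large_docs.txt
```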
I had to sync in batches of ~50 to ~100 documents, otherwise the replication queries timed out :/ I've done 6 batches on the DEV environment, and PROD picked them up from it automatically:
dev: 2685958 total docs
prod: 2685956 total docs
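A sketch of how such batches can be driven from the extracted package list (file name, chunk size, endpoints and credentials are assumptions):

```shell
# Split the list of affected package ids into chunks of 50 and replicate each
# chunk as a one-off doc_ids replication against the replica.
split -l 50 too_large_docs.txt batch_
for f in batch_*; do
  ids=$(jq -R -s 'split("\n") | map(select(length > 0))' "$f")
  curl -s -X POST 'http://admin:PASSWORD@127.0.0.1:5984/_replicate' \
    -H 'Content-Type: application/json' \
    -d "{\"source\": \"https://replicate.npmjs.com/registry\",
         \"target\": \"http://admin:PASSWORD@127.0.0.1:5984/registry\",
         \"doc_ids\": $ids}"
done
```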
The size of the DB has reached 50GB, so we're actually close to the best-case scenario here. That said, a few packages are still above that 64MB limit. So far, 14 documents have caused the Too large error in the logs:
| package | size |
| --- | --- |
| @primer/react | 116 MB |
| sfdx-hardis | 106 MB |
| @redwoodjs/cli | 105 MB |
| @carbon/ibmdotcom-web-components | 103 MB |
| @c8y/ngx-components | 98.5 MB |
| @salesforce/cli | 91.9 MB |
| @thirdweb-dev/react | 87.4 MB |
| binaryen | 86.6 MB |
| @typescript-eslint/eslint-plugin | 75.0 MB |
| renovate | 71.6 MB |
| hls.js | 70.1 MB |
| rubic-sdk | 67 MB |
| nocodb-daily | 63.6 MB |
| quiz-api-client | 16.3 MB |
I've raised the limit to 128MB and will try to resync these.
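For reference, a sketch of how that limit can be changed through the configuration API (node name, credentials and value are placeholders; the value is in bytes, 134217728 = 128MB):

```shell
# Raise couchdb/max_document_size on the local node.
curl -s -X PUT 'http://admin:PASSWORD@127.0.0.1:5984/_node/_local/_config/couchdb/max_document_size' \
  -H 'Content-Type: application/json' \
  -d '"134217728"'
```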
I also just realized that the initial diff of 255 documents with the official npm registry that @philipcunningham reported above was actually not missing documents on our end; it looks like we have more documents on our replica.
Querying the replicate.npmjs.com couchdb at the same time as our instances I get:
"doc_count":2685030, "doc_del_count":1534160
So we indeed have ~1000 more docs than the official registry, while missing fewer than 10 deleted documents...
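Those counters come straight from the database info endpoint; the same check against our replica looks like this (the replica URL is a placeholder):

```shell
# GET on the database root returns doc_count and doc_del_count, among other stats.
curl -s 'https://replicate.npmjs.com/registry' | jq '{doc_count, doc_del_count}'
curl -s "$REPLICA_URL/registry" | jq '{doc_count, doc_del_count}'
```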
> I also just realized that the initial diff of 255 documents with the official npm registry that @philipcunningham reported above was actually not missing documents on our end; it looks like we have more documents on our replica.
@gonzoyumo that is interesting. Perhaps we could improve the communication in that test. WDYT?
@philipcunningham it's probably just my fault for not paying enough attention and assuming we were behind, as the whole context of this issue is about missing documents.
Olivier Gonzalez changed the title from "Many npm package licenses reported as Unknown even if license type is unambiguous in npmjs" to "Many npm package licenses reported as Unknown even if license type is unambiguous in npmjs (couchDB replication issue)".
Though, it seems these docs are below the current limit, and trying to re-sync them actually highlighted other timeout errors:
```
[error] 2024-02-26T17:58:56.595369Z couchdb@127.0.0.1 <0.5644.1403> -------- Replicator, request GET to "https://replicate.npmjs.com/registry/%40thirdweb-dev%2Freact?atts_since=%5B%221932-7e79d8eb63462dccf6d27927f96a7733%22%5D&revs=true&open_revs=%5B%221943-ed3efcc3b488f5c9fae6412384a35cdc%22%5D&latest=true" failed due to error timeout
OS Process Error <0.11431.1403> :: {os_process_error,"OS process timed out."}
```
I went ahead and set [couchdb] os_process_timeout to 180000 (3 minutes) too. However, the error is still raised. I'll open a follow-up issue to investigate this further, as there are several other packages that raise this error.
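For completeness, a sketch of that timeout change via the configuration API (node name and credentials are placeholders; the value is in milliseconds):

```shell
# Bump couchdb/os_process_timeout from the 5000 ms default to 3 minutes.
curl -s -X PUT 'http://admin:PASSWORD@127.0.0.1:5984/_node/_local/_config/couchdb/os_process_timeout' \
  -H 'Content-Type: application/json' \
  -d '"180000"'
```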