Several packages with the version 9.44.0.Final are present in the PMDB, with timestamps indicating successful ingestion in 2023. However, these versions were not exported and are missing from the export buckets. This discrepancy between ingestion and export needs to be investigated to understand the root cause and prevent future occurrences.
This timestamp confirms the package was ingested successfully into the PMDB but was not exported for some reason.
Solution
We need to investigate why these specific versions were not picked up by the export despite being present in the PMDB. Additionally, we should explore whether replaying the packages (e.g. updating their timestamps) can resolve the issue.
Philip Cunningham changed title from Investigate Missing Export for {--}9.44.0.Final{--} Packages in Package Metadata DB (PMDB) to Investigate Missing Export for 9.44.0.Final Packages in Package Metadata DB (PMDB)
@gonzoyumo shared this note with me, and it raised a few interesting points that might be relevant here.
To give some context, I checked the number of versions for org.drools/drools-compiler, and there are currently 241 versions in total. However, many of these versions include suffixes such as:
Based on the full list of versions, here are the versions that do not have suffixes:
5.0.1
5.1.0
5.1.1
Here's what actually shows up in the export bucket:
% ag 'org.drools/drools-compiler'
1690988514/000000000.ndjson
1810:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1690297343/000000019.ndjson
9409:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1690383726/000000019.ndjson
9909:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1691161281/000000000.ndjson
1661:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1694098880/000000000.ndjson
1980:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1694012473/000000000.ndjson
1977:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1690221385/000000019.ndjson
9409:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1713884600/000000000.ndjson
1807:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1713452601/000000000.ndjson
2193:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1699542082/000000000.ndjson
2030:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1692370874/000000000.ndjson
2603:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
1692457276/000000000.ndjson
1616:{"name":"org.drools/drools-compiler","lowest_version":"5.0.1","highest_version":"7.40.0.20200703","default_licenses":["Apache-2.0"]}
What's interesting to me is that these are all unique entries on different days but they all have the same version range. Comparing to versions without suffixes, the lowest_version makes sense (5.0.1). However, it selects 7.40.0.20200703 as the highest_version, which is likely because its suffix is the only one that is entirely numeric.
I'm not sure what the best path forward might be for this license-exporter project, but I believe we can try working around the limitations of go-version with a custom implementation. We could use a struct that holds both the original raw version and its parsed counterpart, and figure out a transformation of the invalid version syntax into something supported. For instance, 9.44.0.Final is not supported, but 9.44.0-Final (with a dash instead of a dot) works fine and sorts correctly: https://go.dev/play/p/xczLlS2XXft
We would sort these structs based on the parsedVersion but use the original rawVersion in the exported value. Here is a quick and dirty attempt (please excuse my poor Go): https://go.dev/play/p/NWPXaJ_haeb
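For readers who cannot open the playground links, here is a minimal sketch of the idea under stated assumptions. It is not the linked code. The names `versionEntry`, `less`, and `versionRange` are hypothetical, and `less` is a simplified standard-library stand-in for the comparison that go-version would perform on the normalized strings.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// versionEntry keeps the raw string for output and a normalized
// string that is used only for ordering. Hypothetical type.
type versionEntry struct {
	Raw        string
	Normalized string
}

// less is a simplified stand-in for a real version library: numeric
// segments are compared left to right, and a version carrying a
// "-qualifier" sorts before the same numbers without one.
func less(a, b string) bool {
	coreA, preA, _ := strings.Cut(a, "-")
	coreB, preB, _ := strings.Cut(b, "-")
	sa, sb := strings.Split(coreA, "."), strings.Split(coreB, ".")
	for i := 0; i < len(sa) || i < len(sb); i++ {
		na, nb := 0, 0 // missing or non-numeric segments count as 0
		if i < len(sa) {
			na, _ = strconv.Atoi(sa[i])
		}
		if i < len(sb) {
			nb, _ = strconv.Atoi(sb[i])
		}
		if na != nb {
			return na < nb
		}
	}
	if (preA == "") != (preB == "") {
		return preA != ""
	}
	return preA < preB
}

// versionRange sorts on the normalized form but returns raw strings.
// Assumption: only the ".Final" suffix needs rewriting here.
func versionRange(raws []string) (lowest, highest string) {
	if len(raws) == 0 {
		return "", ""
	}
	entries := make([]versionEntry, 0, len(raws))
	for _, r := range raws {
		n := strings.Replace(r, ".Final", "-Final", 1)
		entries = append(entries, versionEntry{Raw: r, Normalized: n})
	}
	sort.Slice(entries, func(i, j int) bool {
		return less(entries[i].Normalized, entries[j].Normalized)
	})
	return entries[0].Raw, entries[len(entries)-1].Raw
}

func main() {
	lo, hi := versionRange([]string{"7.40.0.20200703", "9.44.0.Final", "5.0.1", "5.1.1"})
	fmt.Println(lo, hi)
}
```

With this input the range ends at 9.44.0.Final rather than 7.40.0.20200703, and the exported value keeps the original dotted spelling.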
The obvious alternative is to look for a more versatile library but this might be a dead end or come with other problems and will likely require extensive testing to prevent regressions.
We could scope the approach to address the immediate issue with hashicorp/go-version and non-semver suffixes like .Final.
To resolve this, as you suggest, we could implement a custom solution that uses both the original version string and a normalized variant for internal sorting. This would allow for accurate comparisons without altering the actual version data. For example:
Custom Version Struct: Introduce a struct to store both the OriginalVersion (unaltered for output) and a NormalizedVersion (adjusted for compatibility with go-version).
For example, convert 9.44.0.Final to 9.44.0-Final for internal sorting, while still using the raw string 9.44.0.Final in the final output.
Flexible Parsing and Sorting:
Create a normalization function to handle common cases like .Final by replacing unsupported syntax with SemVer-compatible notation (e.g., -Final).
Potential for Future Extension: If new version formats emerge, the struct-based approach would allow us to make adjustments to fit our needs.
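The normalization step described above could look roughly like the following sketch. The function name `normalizeVersion` and the pattern are assumptions for illustration. The pattern only covers a single trailing qualifier that starts with a letter, so other Maven forms would need their own rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// dottedQualifier matches a numeric dotted core followed by a dot and
// a qualifier that starts with a letter, for example "9.44.0.Final".
var dottedQualifier = regexp.MustCompile(`^(\d+(?:\.\d+)*)\.([A-Za-z][0-9A-Za-z]*)$`)

// normalizeVersion rewrites "<core>.<qualifier>" to "<core>-<qualifier>"
// and returns every other input unchanged. Hypothetical helper.
func normalizeVersion(raw string) string {
	if m := dottedQualifier.FindStringSubmatch(raw); m != nil {
		return m[1] + "-" + m[2]
	}
	return raw
}

func main() {
	for _, v := range []string{"9.44.0.Final", "7.40.0.20200703", "5.0.1"} {
		fmt.Printf("%s -> %s\n", v, normalizeVersion(v))
	}
}
```

Versions whose last segment is numeric, such as 7.40.0.20200703, pass through untouched, so existing behaviour for them would not change.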
I'd welcome @fcatteau and @hacks4oats's input on this, as they may have encountered variations of this issue when working on semver_dialects or be aware of another Go package that might be more suitable for our use case.
We could use a struct that holds both the original raw version and its parsed counterpart, and figure out a transformation of the invalid version syntax into something supported
@gonzoyumo In theory this could work, but that doesn't seem practical.
Parsing and sorting Maven versions is really complex.
We can't simply come up with logic that improves the compression rate. For accuracy, it has to be consistent with the logic implemented in the backend; otherwise versions might not get the correct licenses. (They might get the default licenses instead of unknown, or the other way around.)
Besides, the same problem also applies to other syntaxes, like the one used by Python packages.
We could have a service that compresses raw NDJSON files (i.e. where all versions are listed), similar to the compression logic implemented in license-exporter. That service would be implemented in Ruby and would use semver_dialects.
We could implement that even before switching to ranges of versions, so it's definitely worth a dedicated issue.
I seem to recall we also discussed, in another issue, calling Ruby from Go to unify normalization semantics across system boundaries. Calling it as a service to handle sorting would also make sense to me.
Found this interesting link on how we could call a Ruby function from Go. Maybe we don't need to build a whole service in Ruby. Maybe there is a way to just call semver_dialects from Go.
@nilieskou that will likely degrade performance, there is overhead setting up the interpreter and the call stack to call this code. Maybe doing so only in some cases, or in a batched manner would be a good way to offset costs.
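To make the batching idea concrete, here is a rough sketch. Everything in it is hypothetical: `sortFunc` stands in for whatever mechanism ends up crossing into Ruby (subprocess, RPC, or embedded interpreter), and `sortInBatches` only shows how the per-call setup cost could be paid once per batch instead of once per package.

```go
package main

import "fmt"

// sortFunc stands in for the call into Ruby. It receives several
// version lists at once and returns them sorted. Hypothetical type.
type sortFunc func(batch [][]string) ([][]string, error)

// sortInBatches forwards the version lists in groups of batchSize so
// that the expensive call happens once per group.
func sortInBatches(lists [][]string, batchSize int, call sortFunc) ([][]string, error) {
	if batchSize < 1 {
		batchSize = 1
	}
	out := make([][]string, 0, len(lists))
	for start := 0; start < len(lists); start += batchSize {
		end := start + batchSize
		if end > len(lists) {
			end = len(lists)
		}
		sorted, err := call(lists[start:end])
		if err != nil {
			return nil, fmt.Errorf("batch starting at %d: %w", start, err)
		}
		out = append(out, sorted...)
	}
	return out, nil
}

func main() {
	calls := 0
	// fake returns its input unchanged; it only counts invocations.
	fake := func(batch [][]string) ([][]string, error) {
		calls++
		return batch, nil
	}
	lists := [][]string{{"5.0.1"}, {"5.1.0"}, {"5.1.1"}, {"9.44.0.Final"}, {"7.40.0.20200703"}}
	out, err := sortInBatches(lists, 2, fake)
	fmt.Println(len(out), calls, err)
}
```

Counting calls with a fake like this would also be a cheap way to estimate how much overhead a given batch size saves before wiring in a real Ruby call.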
@tkopel I don't have enough insight into the numbers, but I am sure it would degrade performance. What I was wondering, though, is how much of a performance bottleneck we really have in the exporter. Probably not that much. I think it would be worth running a very quick experiment to figure out if it makes sense. WDYT?
This group::composition analysis bug has at most 50% of the SLO duration remaining and is an SLO::Near Miss breach. Please consider taking action before this becomes an SLO::Missed in 6 days (2024-11-19).
Tal Kopel changed title from Investigate Missing Export for 9.44.0.Final Packages in Package Metadata DB (PMDB) to Fix Missing Export for 9.44.0.Final Packages in Package Metadata DB (PMDB)