  • @pedroacarranza The analysis and report are excellent, great!

    A few comments:

    • There seems to be a typo in: "The reason, why this is important to analyse, is that in the .Stat V7 the time to create the actual content constraint was ...". Was V8 meant here?
    • Do you know whether ISTAT also correctly/completely checks content constraints during upload and download? I feel that this feature might be time-consuming and might penalise a solution that implements it completely. In general, we need to extend the analysis to highlight whether there are big/highly important functional differences that might be the reason for the performance differences. This might then trigger a decision process about a cost/benefit analysis of specific features... How would you propose to address this aspect of the comparison?
    • For the brainstorm on future approaches, I would also be very keen to see the performance of the "simplified" ISTAT model where double mapping is replaced by simple maps (see the sketch after this list). For that:
      • the automatically generated mappings would need to be replaced by mappings from SDMX codes in MappingStore directly to the internal IDs (rather than passing through SDMX codes that are also managed in the data DB), and
      • the table views used to map between internal IDs and SDMX codes managed in the data DB would need to be replaced by views directly on the fact tables.
        Do you think that would be feasible?
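
    To make the difference concrete, here is a minimal Python sketch (illustrative only; the names and codes are made up, and the real implementation would of course be SQL tables and views rather than dictionaries):

    ```python
    # Minimal sketch of the two mapping approaches; all names
    # (MAPPING_STORE, DATA_DB_CODES, DIRECT_MAP) are hypothetical.

    # Double mapping (current model): SDMX code -> SDMX code managed
    # in the data DB -> internal ID, i.e. two lookups per code.
    MAPPING_STORE = {"FREQ.A": "A"}   # MappingStore: SDMX code -> data-DB SDMX code
    DATA_DB_CODES = {"A": 101}        # data DB: SDMX code -> internal ID

    def resolve_double(sdmx_code: str) -> int:
        """Resolve via the intermediate SDMX codes kept in the data DB."""
        return DATA_DB_CODES[MAPPING_STORE[sdmx_code]]

    # Simple mapping (proposed): SDMX codes in MappingStore map directly
    # to internal IDs, so views can sit directly on the fact tables.
    DIRECT_MAP = {"FREQ.A": 101}      # MappingStore: SDMX code -> internal ID

    def resolve_direct(sdmx_code: str) -> int:
        """Resolve in a single lookup, with no intermediate codes in the data DB."""
        return DIRECT_MAP[sdmx_code]

    assert resolve_double("FREQ.A") == resolve_direct("FREQ.A") == 101
    ```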
  • Pedro Carranza @pedroacarranza ·
    Author Developer

    Thanks for the feedback @dosse.

    For your first point, I meant both, but the message was not clear.

    For the second point, I did not go into detail on which features are implemented. I propose that the next step (the analysis of the implementations) should start with the creation of a summary table with a matrix of features. This matrix should contain information such as:

    • Is the feature implemented by the solution?
    • Importance of the feature.
    • The degree of penalisation to uploads/downloads that this feature could cause.

    Afterwards, based on this matrix, we can drive the analysis of the implementations, putting special focus on the features that appear in both solutions and that have a high degree of penalisation on the processes being measured; a toy sketch of such a matrix follows below. This will help us avoid spending too much time analysing every feature.
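
    As a toy sketch of such a matrix (the feature names and ratings below are placeholders, not actual findings):

    ```python
    # Hypothetical feature matrix; the entries are illustrative placeholders.
    FEATURES = [
        # (feature,              implemented in,      importance, upload/download penalty)
        ("content constraints",  {".Stat", "ISTAT"},  "high",     "high"),
        ("referential metadata", {"ISTAT"},           "medium",   "low"),
        ("attribute validation", {".Stat", "ISTAT"},  "high",     "medium"),
    ]

    # Drive the analysis from the matrix: focus on features present in both
    # solutions that heavily penalise the measured upload/download processes.
    focus = [
        name
        for name, solutions, _importance, penalty in FEATURES
        if {".Stat", "ISTAT"} <= solutions and penalty == "high"
    ]
    print(focus)  # ['content constraints']
    ```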

    For the third point, I agree that this implementation seems to bring many benefits and should be looked at in more detail. At this point, I do not know how feasible it would be to quickly create a similar scenario in the .Stat Suite.

  • Reporter

    A few questions/comments:

    • For the .Stat Suite, in a previous analysis the results showed that the service successfully processed all update requests, although the expected behaviour was that it would accept only the first request and reject the rest. => Should we worry about that?

    • A comparison with the current dotstat performance would be nice.

    • It would be useful to have an indication of the volume already stored in the database, and to check the impact (if any) of the volume of stored information by running the same tests on a .Stat Suite instance with no other data, 1 GB of other data, and 30 GB of other data (see the sketch after this list).

    • The slowest data actions in the current dotstat do not seem to have been included in the benchmark: metadata update, replace all, and other actions that are not simple data updates.
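
    For the volume check, one possible way to structure it, purely as a sketch (the helpers below are hypothetical, not existing .Stat tooling):

    ```python
    import time

    def preload_other_data(gigabytes: int) -> None:
        """Hypothetical helper: fill the database with `gigabytes` of unrelated data."""
        ...

    def run_same_upload_test() -> None:
        """Hypothetical helper: execute the identical upload scenario each run."""
        ...

    # Same test at three database fill levels, to isolate the impact
    # (if any) of the volume of information already stored.
    for preload_gb in (0, 1, 30):
        preload_other_data(preload_gb)
        start = time.perf_counter()
        run_same_upload_test()
        print(f"{preload_gb} GB of other data: {time.perf_counter() - start:.2f} s")
    ```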

    Thanks, Arnaud.

  • @atoch Please see tentative answers here:

    • Should we worry about that? --> No, the current queuing behaviour is the functionally expected behaviour and it will be maintained (a toy sketch of this behaviour follows the list).

    • Comparison with current dotstat performance would be nice --> Please see here

    • Impact of the volume of stored information --> The tests were done with more than 80,000,000 observations in the database. We do not expect a massive performance decrease due to volume, similarly to V7, where we do not have performance issues due to volume (number of datasets). In any case, it seems that the volume used by ECO ADB is significantly below the volume used for the performance tests.

    • Benchmark for metadata update, replace all, and other actions that are not simple data updates --> Referential metadata are not yet implemented, but attributes at different attachment levels were included in the tests. The "replace all" action is not supported today by SDMX and thus also not by the .Stat Suite.
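
    On the queuing point, a toy sketch of the expected behaviour (hypothetical Python, not the actual .Stat implementation): every update request is accepted and processed sequentially, rather than the first winning and the rest being rejected.

    ```python
    import queue
    import threading

    updates: queue.Queue = queue.Queue()

    def submit_update(request: str) -> str:
        updates.put(request)         # accept each request instead of rejecting it
        return "accepted"

    def worker() -> None:
        while True:
            request = updates.get()  # process requests strictly one at a time
            print(f"processing {request}")
            updates.task_done()

    threading.Thread(target=worker, daemon=True).start()
    for i in range(3):
        submit_update(f"update-{i}")
    updates.join()                   # all three requests accepted and processed in order
    ```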

  • Pedro Carranza @pedroacarranza ·
    Author Developer

    @atoch In the same link shared by @dosse, I added the section Related Documents. There you will find the previous analyses containing the comparison between V8 and V7. The scenarios that you suggest in your third point are captured in the documents Performance Review 1.0 and Performance Review 2.0.

    I would really appreciate hearing your feedback and suggestions for test scenarios where V7 currently struggles in production.
