Search result boost by SEARCH_WEIGHT annotation of dataflows
The .Stat DE search should be able to index the dataflow annotation of type SEARCH_WEIGHT and use it to improve the "Relevance" ordering of search results with information provided by the data publishers.
This would allow using "Relevance" as default for both freetext-search and facet-navigation.
This feature should be configurable (switch on/off).
The SEARCH_WEIGHT annotation has the type "SEARCH_WEIGHT". The weight value is an integer or number. It is to be found:
- in the localised Annotation Text, in case the order is language-dependent, otherwise
- in the non-localised Annotation Title
The boosting impact is proportional to the weight value, per locale: the higher the weight, the stronger the boost.
The SEARCH_WEIGHT is an additional factor applied to the score determined by SOLR according to the relevance rules.
When a SEARCH_WEIGHT annotation is not provided for a dataflow, its weight is assumed to be 1.
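The lookup rules above can be sketched as a small helper. This is only an illustration: the function name `resolveSearchWeight`, the annotation shape (`type`, `title`, and a `texts` map keyed by locale) and the fallback to 1 for non-numeric values are assumptions, not part of the specification.

```javascript
// Hypothetical helper: resolve the search weight of a dataflow for one locale.
// Assumed annotation shape: { type, title, texts: { en: '50', fr: '35' } }.
const DEFAULT_WEIGHT = 1; // weight assumed when no SEARCH_WEIGHT annotation is provided

function resolveSearchWeight(annotations, locale) {
  const annotation = (annotations || []).find((a) => a.type === 'SEARCH_WEIGHT');
  if (!annotation) return DEFAULT_WEIGHT;

  // The localised Annotation Text takes precedence over the non-localised Title.
  const localised = annotation.texts ? annotation.texts[locale] : undefined;
  const raw = localised !== undefined ? localised : annotation.title;

  // Assumption: empty or non-numeric values fall back to the default weight.
  const text = raw === undefined || raw === null ? '' : String(raw).trim();
  const weight = Number(text);
  return text === '' || !Number.isFinite(weight) ? DEFAULT_WEIGHT : weight;
}
```

With this sketch, a dataflow whose annotation carries only a Title gets the same weight in every locale, while one with a localised Text can be ranked differently per language.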
Test case in QA
Given several dataflows categorised under the same category 'Databases' > 'ILO modelled estimates and projections (ILOEST) (DB_575)':
- When I browse for this category: https://de-qa.siscc.org/?fs[0]=Databases%2C0%7CILO%20modelled%20estimates%20and%20projections%20%28ILOEST%29%23DB_575%23&pg=0&fc=Databases , I get a search result with 6 dataflows.
- 3 dataflows with an annotation of type SEARCH_WEIGHT
- 3 other dataflows with no annotation
Recommendations from SEASE:
See: Sease-SISCC-technical-roadmap__002_.pdf, page 6 and following: Order Indexing
To use the SEARCH_WEIGHT annotation value at query time, we first need to index that content. The following needs to be developed:
- Configuration - Add a configuration capability in the schema, so that users can index the “annotations” they are interested in, e.g.
SEARCH_WEIGHT: {
type: ANNOTATIONS_TYPE,
ext: INT_EXT,
out: true,
}
- Parsing - src/server/sdmx/dataflow.js needs to parse the JSON annotations from the API response and then build the JS data structure with the content of the annotations
- SolrDocument - src/server/builder/index.js:165 an extractAnnotations function, similar to extractFacets, must be implemented. It will be responsible for extracting the annotations, including localisations when necessary.
If no localisation is in place, the annotation value will be indexed with no locale.
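The three steps above could fit together roughly as follows. This is a sketch under assumptions, not the actual implementation: the constant values, the annotation shape (`type`, `title`, `texts`) and the localised field naming (`search_weight_en_i`) are hypothetical; only the non-localised name `search_weight_i` follows the query example in the recommendations.

```javascript
// Hypothetical constants; the real values live in the project's schema config.
const ANNOTATIONS_TYPE = 'annotation';
const INT_EXT = 'i';

// Mirrors the SEARCH_WEIGHT configuration entry (the `out` flag is not interpreted here).
const annotationConfig = {
  SEARCH_WEIGHT: { type: ANNOTATIONS_TYPE, ext: INT_EXT, out: true },
};

// Sketch of extractAnnotations: builds Solr document fields for configured annotations.
function extractAnnotations(annotations, config, locales) {
  const fields = {};
  for (const annotation of annotations || []) {
    // Skip annotation types that are not configured for indexing.
    if (!Object.prototype.hasOwnProperty.call(config, annotation.type)) continue;
    const { ext } = config[annotation.type];
    const name = annotation.type.toLowerCase();
    const texts = annotation.texts || {};
    const localised = locales.filter((locale) => texts[locale] !== undefined);

    if (localised.length > 0) {
      // Localised values: one field per locale (assumed naming scheme).
      for (const locale of localised) {
        const value = parseInt(texts[locale], 10);
        if (!Number.isNaN(value)) fields[`${name}_${locale}_${ext}`] = value;
      }
    } else if (annotation.title !== undefined) {
      // No localisation in place: index the value with no locale.
      const value = parseInt(annotation.title, 10);
      if (!Number.isNaN(value)) fields[`${name}_${ext}`] = value;
    }
  }
  return fields;
}
```

For example, an annotation `{ type: 'SEARCH_WEIGHT', title: '50' }` would yield the single field `search_weight_i`, whereas one with `texts` for `en` and `fr` would yield one field per locale.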
Page 13:
ADVANCED BOOSTING
Apart from the boost queries currently in use, moving to edismax query parsing offers additional possibilities in terms of:
- boosting fields ...
- boosting phrase matches ...
- boosting NGrams ...
- boosting based on functions: this applies to your SEARCH_WEIGHT boost use case. Once you have the value at indexing time, you could boost documents based on the SEARCH_WEIGHT value they have.
At query time, with edismax, this would be, e.g.:
Query = term1 term2 term3
boost=search_weight_i
Doc1 = {field1:(term3 term1 term2), search_weight_i:50}
Doc2 = {field1:(bla bla term1 term2 term3 bla bla), search_weight_i:35}
Score(Doc1) = score(doc1) * 50
Score(Doc2) = score(doc2) * 35
N.B. there are many other advanced functions to use in boosting; first of all, you can decide whether the boost is additive or multiplicative: https://lucene.apache.org/solr/guide/6_6/function-queries.html
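As an illustration of the multiplicative variant, the query parameters could be assembled as in the sketch below. `defType`, `q` and `boost` are standard Solr edismax parameters and `def()` is a Solr function that supplies a default when a field is missing. The function names, the `weightBoostEnabled` switch and the default field name are assumptions made for this example.

```javascript
// Hypothetical query builder for the SEARCH_WEIGHT boost.
function buildSearchParams(terms, { weightBoostEnabled = true, weightField = 'search_weight_i' } = {}) {
  const params = new URLSearchParams();
  params.set('defType', 'edismax');
  params.set('q', terms.join(' '));
  if (weightBoostEnabled) {
    // Multiplicative boost; def() substitutes 1 for documents without the field,
    // matching the rule that a missing annotation counts as weight 1.
    params.set('boost', `def(${weightField},1)`);
  }
  return params;
}

// Illustration only: the effect of a multiplicative boost on a base score.
function boostedScore(baseScore, weight = 1) {
  return baseScore * weight;
}
```

Switching `weightBoostEnabled` off simply omits the `boost` parameter, which is one way the on/off configuration requirement could be met. An additive boost would use the `bf` parameter instead.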
Removed specs
If, for simplification, the SEARCH_WEIGHT annotations were not yet used to improve the relevance sorting but only on their own, with results sorted exactly as indicated by the SEARCH_WEIGHT annotation, then an additional localisable "Sort" menu item "Recommendation" would need to be added and selected by default.