Allow non-uniform distributions for values sampled as foreign keys
Problem to solve
In SNDS, many column values are taken from foreign "nomenclature" tables.
We currently sample these values uniformly. We would like to be able to sample them according to their observed distribution.
Add a specific _freq column to foreign tables. This column would hold a number giving the relative frequency (or count) of occurrences of each row.
Extend tsfaker to read this _freq column and sample values following this distribution.
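The weighted sampling described above could look roughly like the sketch below. It is not tsfaker code: the function name `sample_foreign_key_values`, the CSV content and the column names `code` and `label` are made up for illustration, and it uses only the standard library. `random.choices` accepts relative weights, so `_freq` can hold either counts or relative frequencies.

```python
import csv
import io
import random

# Hypothetical nomenclature table with a `_freq` column (made-up values).
NOMENCLATURE_CSV = """code,label,_freq
A,first,70
B,second,25
C,third,5
"""


def sample_foreign_key_values(csv_text, key_column, size, seed=None):
    """Draw `size` values of `key_column`, weighted by the `_freq` column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [row[key_column] for row in rows]
    # Weights are relative: counts and frequencies give the same distribution.
    weights = [float(row["_freq"]) for row in rows]
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=size)


sample = sample_foreign_key_values(NOMENCLATURE_CSV, "code", size=10_000, seed=0)
```

With these weights, `A` should appear far more often than `B`, and `B` more often than `C`, instead of each code appearing about a third of the time.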
_freq should be treated as a default column name. It should be easy to later extend tsfaker to accept other column names (either from the command line or from the tableschema foreign key syntax). This would, for example, allow using different frequencies for different columns (or years) that sample values from the same table.
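One possible shape for this, as a sketch only: the frequency column name is a parameter that defaults to `_freq`. The function `sample_values`, the `freq_column` parameter and the columns `freq_2018` and `freq_2019` are hypothetical names, not existing tsfaker options. Falling back to uniform sampling when the column is absent is also an assumption, chosen here so that the current behaviour is preserved for tables without frequencies.

```python
import random

DEFAULT_FREQ_COLUMN = "_freq"  # default name proposed in this issue


def sample_values(rows, key_column, size, freq_column=DEFAULT_FREQ_COLUMN, seed=None):
    """Sample `key_column` values, weighted by `freq_column` when it exists."""
    values = [row[key_column] for row in rows]
    if rows and all(freq_column in row for row in rows):
        weights = [float(row[freq_column]) for row in rows]
    else:
        # Assumption: no frequency column means uniform sampling, as today.
        weights = None
    return random.Random(seed).choices(values, weights=weights, k=size)


# Hypothetical table carrying one frequency column per year.
rows = [
    {"code": "A", "freq_2018": 9, "freq_2019": 1},
    {"code": "B", "freq_2018": 1, "freq_2019": 9},
]
draws_2018 = sample_values(rows, "code", 1000, freq_column="freq_2018", seed=1)
draws_2019 = sample_values(rows, "code", 1000, freq_column="freq_2019", seed=1)
draws_uniform = sample_values(rows, "code", 1000, seed=1)  # no `_freq` column
```

The same table then yields mostly `A` for 2018 and mostly `B` for 2019, depending only on which frequency column the caller names.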