Possibility to subsample input data in pipeline #minor
- new rule create_data_subset added to the pipeline
  - at first the metadata is subsampled by backend/scripts/create_metadata_subset.py
  - afterwards the FASTA file is subsampled accordingly by backend/scripts/extract_seqs_from_multifasta.sh
  - the new rule is executed after the rule filter_and_merge
- added tests for new functionalities
- added subsample parameters in config.yaml and adjusted the create_config.py script
  - num_seqs for number restriction
  - last_x_days for time restriction
  - priority to set which restriction takes priority when both conditions can't be met
- added section in README file with parameter explanation
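
For reviewers, the interplay of the three parameters could be sketched roughly as below. This is not the code from create_metadata_subset.py or extract_seqs_from_multifasta.sh: the function names, the record layout (dicts with "id" and "date" keys), the "most recent first" selection, and the exact meaning of priority (which restriction wins when the time window holds fewer than num_seqs records) are all assumptions made for illustration.

```python
from datetime import date, timedelta


def subsample_metadata(records, num_seqs=None, last_x_days=None,
                       priority="num_seqs", today=None):
    """Return a subset of metadata records (hypothetical helper).

    Assumed semantics, not taken from the actual script:
    - last_x_days keeps records dated within the last x days before `today`
    - num_seqs asks for that many records, most recent first
    - priority decides which restriction wins when the time window
      holds fewer than num_seqs records
    """
    today = today or date.today()
    # Sort newest first so that truncation keeps the most recent records.
    ordered = sorted(records, key=lambda r: r["date"], reverse=True)

    if last_x_days is None:
        return ordered if num_seqs is None else ordered[:num_seqs]

    cutoff = today - timedelta(days=last_x_days)
    in_window = [r for r in ordered if r["date"] >= cutoff]

    if num_seqs is None:
        return in_window
    if len(in_window) >= num_seqs:
        # Both restrictions can be satisfied at once.
        return in_window[:num_seqs]
    if priority == "num_seqs":
        # Too few recent records: fill up with older ones.
        return ordered[:num_seqs]
    # priority == "last_x_days": accept fewer records than requested.
    return in_window


def filter_fasta(lines, keep_ids):
    """Yield the FASTA lines of records whose ID is in keep_ids.

    Python stand-in for the shell-based FASTA step; the ID is assumed to be
    the first whitespace-delimited token of the header line.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in keep_ids
        if keep:
            yield line


# Example: only one record falls into the 7-day window, but two are requested.
records = [
    {"id": "seq1", "date": date(2021, 3, 1)},
    {"id": "seq2", "date": date(2021, 3, 9)},
    {"id": "seq3", "date": date(2021, 1, 15)},
]
subset = subsample_metadata(records, num_seqs=2, last_x_days=7,
                            priority="last_x_days", today=date(2021, 3, 10))
keep_ids = {r["id"] for r in subset}
fasta = [">seq1\n", "ACGT\n", ">seq2\n", "GGCC\n", ">seq3\n", "TTAA\n"]
print("".join(filter_fasta(fasta, keep_ids)), end="")
```

With priority="last_x_days" the example keeps only seq2. Under the same assumptions, priority="num_seqs" would keep seq2 and seq1, filling up with the next most recent record outside the time window.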
Edited by Alice Wittig