Possibility to subsample input data in pipeline #minor

Julius Tembrockhaus requested to merge subset_input_data into master
  • new rule create_data_subset added to the pipeline
    • first, the metadata is subsampled by backend/scripts/create_metadata_subset.py
    • then the FASTA file is subsampled accordingly by backend/scripts/extract_seqs_from_multifasta.sh
    • the new rule runs after the rule filter_and_merge
  • added tests for the new functionality
  • added subsample parameters in config.yaml and adjusted create_config.py script
    • num_seqs restricts the number of sequences
    • last_x_days restricts the data to the last x days
    • priority sets which restriction takes precedence when both can't be met
  • added section in README file with parameter explanation
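
The interplay of num_seqs, last_x_days and priority can be illustrated with a minimal sketch. This is not the code in create_metadata_subset.py. The function name subsample_metadata, the record keys "id" and "date", the priority values "num" and "date", and the rule "keep the newest sequences first" are all assumptions made for illustration.

```python
from datetime import date, timedelta


def subsample_metadata(records, num_seqs=None, last_x_days=None,
                       priority="num", today=None):
    """Return a subset of metadata records, newest first.

    records: list of dicts with hypothetical keys 'id' and 'date'
             (a datetime.date). Real column names may differ.
    priority: hypothetical values 'num' or 'date'; only used when the
              time window holds fewer than num_seqs records.
    """
    today = today or date.today()
    # Assumption: newer sequences are preferred over older ones.
    ordered = sorted(records, key=lambda r: r["date"], reverse=True)

    if last_x_days is None:
        # Only a number restriction (or none at all).
        return ordered if num_seqs is None else ordered[:num_seqs]

    cutoff = today - timedelta(days=last_x_days)
    recent = [r for r in ordered if r["date"] >= cutoff]

    if num_seqs is None or len(recent) >= num_seqs:
        # Both restrictions can be met (or only the time one is set).
        return recent if num_seqs is None else recent[:num_seqs]

    # Conflict: the window contains fewer than num_seqs records.
    if priority == "num":
        return ordered[:num_seqs]  # reach back beyond the window
    return recent                  # keep the window, accept fewer records


if __name__ == "__main__":
    demo = [
        {"id": "a", "date": date(2021, 6, 29)},
        {"id": "b", "date": date(2021, 6, 20)},
        {"id": "c", "date": date(2021, 5, 1)},
    ]
    subset = subsample_metadata(demo, num_seqs=3, last_x_days=14,
                                priority="date", today=date(2021, 6, 30))
    print([r["id"] for r in subset])
```

The IDs of the selected records could then be handed to the FASTA extraction step so that both files stay in sync.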
Edited by Alice Wittig