Integrate best hits and task map generation into run rules

Problems

  • duplicate acc column in task map files
  • unclear which files/Snakemake pathways need the best_hits and map generation rules

Changes

id2annot database map files

  • removed duplicate acc/ID column
  • renamed headers of dbcan, pfam, tigrfam
  • acc/ID is now in column 1 of all maps
  • new id2annot.map.gz files in test resources

database migration v5 to v6

To ensure new id2annot files are created, the existing id2annot.map.gz files are deleted during the migration.
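
The deletion step could look roughly like the following. This is a minimal sketch, not the migration code from this merge request: the function name `delete_id2annot_maps` and the `db_root` argument are hypothetical, and it assumes the map files sit somewhere below a single database directory.

```python
from __future__ import annotations

from pathlib import Path


def delete_id2annot_maps(db_root: Path) -> list[Path]:
    """Delete every id2annot.map.gz below db_root and return the removed paths.

    Hypothetical helper: the real v5 -> v6 migration may locate the files differently.
    """
    removed = []
    # rglob searches db_root recursively for files with exactly this name
    for map_file in sorted(Path(db_root).rglob("id2annot.map.gz")):
        map_file.unlink()
        removed.append(map_file)
    return removed
```

Once the old files are gone, the map generation rules have no existing output to reuse, so the maps are rebuilt in the new column layout on the next run.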

Rules

Moved best_hits and map generation rules into:

  • run_hmmer.smk
  • run_emapper.smk (only for emapper_v1)
  • run_diamond.smk

Task map creation functions are in utils/create_task_map.py.
best_hits creation functions are defined in the best_hits rules.

Emapper v1

  • The og.prot and NA.prot terms, which were originally in the first column (now replaced by the prot column), are removed from the emapper id2annot map. These terms are now built in write_func_task_map_for_emapper_v1, because they are required for matching against the best_hits dict.
  • added test for emapper_v1 mapping

Example of id2annot map before:

  • 0RT9A.9593.ENSGGOP00000011681 cellular processes and signaling Cytoskeleton 0RT9A neurofilament heavy polypeptide 9593.ENSGGOP00000011681
  • 16AWT.9593.ENSGGOP00000011681 cellular processes and signaling Cytoskeleton 16AWT neurofilament heavy polypeptide 9593.ENSGGOP00000011681
  • 0IGME.9593.ENSGGOP00000011681 cellular processes and signaling Cytoskeleton 0IGME Neurofilament 9593.ENSGGOP00000011681
  • NA.9593.ENSGGOP00000011681 cellular processes and signaling Cytoskeleton 12SMI Neurofilament 9593.ENSGGOP00000011681
    ...

Example of id2annot map after:

  • 9593.ENSGGOP00000011681 cellular processes and signaling Cytoskeleton 0RT9A neurofilament heavy polypeptide
  • 9593.ENSGGOP00000011681 cellular processes and signaling Cytoskeleton 16AWT neurofilament heavy polypeptide
  • 9593.ENSGGOP00000011681 cellular processes and signaling Cytoskeleton 0IGME Neurofilament
    ...
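
To illustrate how the og.prot and NA.prot terms can be rebuilt from a row in the new layout, here is a rough sketch. It is not the actual write_func_task_map_for_emapper_v1 implementation. It assumes tab-separated rows with prot in column 1 and og in column 4 (as the rows above suggest), and it assumes best_hits is a dict keyed by those terms. The helper names and the order in which the two terms are tried are hypothetical.

```python
def build_match_terms(map_line, sep="\t"):
    """Return the candidate lookup terms for one id2annot row.

    Assumed column order: prot, category group, category, og, description.
    """
    fields = map_line.rstrip("\n").split(sep)
    prot, og = fields[0], fields[3]
    # The og-specific term comes first, the NA term is the fallback
    return [f"{og}.{prot}", f"NA.{prot}"]


def lookup_best_hit(map_line, best_hits):
    """Return (term, hit) for the first term found in best_hits, else None."""
    for term in build_match_terms(map_line):
        if term in best_hits:
            return term, best_hits[term]
    return None


row = (
    "9593.ENSGGOP00000011681\tcellular processes and signaling\t"
    "Cytoskeleton\t0RT9A\tneurofilament heavy polypeptide"
)
terms = build_match_terms(row)
```

Building the terms at task map creation time keeps the map itself free of the composite key, so acc/ID stays in column 1 as in the other databases.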
Edited by Juliane Schmachtenberg