-
1.8
0a7212fb · Version 1.8

* hybran now depends on `networkx` and `intervaltree`

Enhancements
------------
* New subcommand:
  - `hybran compare`: compare two annotations of the same genome
* Rewritten comparison engine based on interval trees. This fixes some edge cases, such as pairs of in-frame overlapping genes with an unrelated gene between them, as well as unannotated regions at contig edges. The code is modular, available to run in the above-mentioned new command, and used in annomerge.
* Enhancements to `hybran standardize`:
  - New option `-r`/`--ref-names-only`.
  - Simplified usage: if passing in a hybran output directory, no other arguments are necessary.
* Annotations now include pseudoscan evidence codes as note fields. These were previously only found in the pseudoscan report files.

Bugs fixed
----------
* Annomerge no longer produces redundant CDS entries (#69)
* Synonymous SNPs in start codons do not automatically invalidate reference correspondence.
* pseudoscan: Fixed typo potentially affecting calls where the reference is pseudo (#68).
* fusionfisher:
  - Identify another signature of hybrid fusion genes (previously identified as misannotations)
  - Update FeatureProperties for fusion genes based on their components. Fusions are no longer categorically pseudo.
  - Eliminated more false positive fusion calls by accounting for another misannotation signature.
* Coordinate corrections no longer collapse compound locations ("joins") into single intervals (#70).
* Include handling of `transl_except` qualifiers for amino acids introduced through translational recoding.
* Restored compatibility with biopython >=1.82 and applied more preventive maintenance in response to deprecation warnings.
* Translations, in particular for RATT-transferred annotations, fixed to account for alternative methionine start codons.
* Recognize .gbff as a GenBank file extension.
* Write out all Prokka postprocessed contigs' annotations instead of just the last one.
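The following illustrates the kind of query that an interval-based comparison makes possible. The sketch finds every feature in one annotation that overlaps a feature in another and keeps those on the same strand and in the same frame. It is a hypothetical stdlib stand-in (a sorted list searched with `bisect`, not the `intervaltree` package) and not hybran's code. `Feature`, `overlaps`, `in_frame`, and `in_frame_matches` are illustrative names, and coordinates are assumed to be 0-based and half-open. Because the query returns all overlapping features rather than only the nearest neighbour, an unrelated gene lying between two in-frame overlapping genes does not hide the match.

```python
from bisect import bisect_left
from typing import Dict, List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical minimal feature: 0-based, half-open [start, end)."""
    name: str
    start: int
    end: int
    strand: int  # +1 or -1


def in_frame(a: Feature, b: Feature) -> bool:
    """True if both features share a strand and a reading frame."""
    if a.strand != b.strand:
        return False
    if a.strand == 1:
        # Plus strand: the frame is set by the start coordinate.
        return (a.start - b.start) % 3 == 0
    # Minus strand: translation begins at the end coordinate.
    return (a.end - b.end) % 3 == 0


def overlaps(query: Feature, others: List[Feature]) -> List[Feature]:
    """Every feature in `others` overlapping `query`, not just its neighbours."""
    ordered = sorted(others, key=lambda f: f.start)
    starts = [f.start for f in ordered]
    # Only features that start before query.end can overlap it ...
    candidates = ordered[: bisect_left(starts, query.end)]
    # ... and of those, only the ones ending after query.start do.
    return [f for f in candidates if f.end > query.start]


def in_frame_matches(ann_a: List[Feature], ann_b: List[Feature]) -> Dict[str, List[str]]:
    """Map each feature in ann_a to the in-frame overlapping features in ann_b."""
    return {
        a.name: [b.name for b in overlaps(a, ann_b) if in_frame(a, b)]
        for a in ann_a
    }


ann_a = [Feature("geneA", 100, 1000, 1)]
# "orfZ" sits between geneA's start and geneB in sorted order, on the other strand.
ann_b = [Feature("orfZ", 200, 350, -1), Feature("geneB", 400, 1000, 1)]
print(in_frame_matches(ann_a, ann_b))  # {'geneA': ['geneB']}
```

For brevity, this sketch re-sorts on every call. A real interval tree, such as the one in the `intervaltree` package that hybran depends on, builds the index once and answers each overlap query without rescanning the whole annotation.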
-
1.7.1
b4e47d11 · Version 1.7.1

Bugs fixed
----------
* Fixed issue with `onegene` where unnamed reference genes did not get unified to use the locus tag of the primary reference as a gene name and were instead unnecessarily assigned a generic name.
-
1.7
8f5b036c · Version 1.7

Enhancements
------------
* New subcommands:
  - `hybran standardize`: remove generic gene names in the final annotations.
  - `hybran onegene`: unify names of highly conserved gene copies
* New option `-s`/`--organism` to set organism name in the final genbank header (#58).
* Intercept malformed features in RATT output to allow continuity of the pipeline (#64).
* The `--dedupe-references` option is now deprecated. Its behavior is now a core part of the pipeline via `onegene`, since the generic names it introduces to collapse reference paralogs can be undone afterwards with `hybran standardize` if desired.
* Reference unification now uses a different generic prefix than the one used for unnamed genes, so the two can be distinguished. `hybran standardize` handles these as well.
* Full support for multiple reference annotations.
* Postprocessed versions of RATT and ab initio annotations, with associated reports, are now saved.
* Substantial speedups due to parallelization of all postprocessing logic.
* Logging and reporting:
  - Comprehensive reporting of pseudoscan results.
  - Reorganized the invalid and rejected features reports.
  - Removed spaces from column names of novelty report.

Bugs fixed
----------
* Reference unification (formerly activated by the `--dedupe-references` option):
  - fixed for multiple references (#55)
  - enabled checkpointing on this step in the full pipeline.
* pseudoscan: now identifies delayed stop codons when start coordinate correction changes the reading frame without introducing an internal stop.
* coord_check:
  - Fixed issue in coord_check's reporting status when not attempting correction.
  - Coordinate correction sometimes resulted in genes with no stop codons. coord_check now checks for this and extends the ORF to the next in-frame stop codon to make a proper correction. These changes address the root cause of the problem for which the temporary measure in v1.6.1 was taken. That temporary measure has been removed.
* Improved detection of gene fusions when one of the components is derived from a reference pseudogene.
* fissionfuser: fixed issue where apparently complementary fragments were combined despite one of the copies being non-pseudo (#66). The severity was mitigated because the combined annotation would be rejected in favor of RATT's, but the second gene would still have been lost.
* annomerge: reject the `source` feature from RATT. Some final annotations contained two of them: one from RATT and one from Prokka.
* Fixed issues that occur when sequence IDs contain the "|" character (#62).
* Fixed handling of situations where either RATT or Prokka finds no annotations.
* Fixed problem with redundant fusion gene name components when detected using both RATT and Prokka.
* Improved detection of gene fusions due to adjustments of the alignment internal gap extension penalty and refined delayed-stop calling criteria.

Housekeeping
------------
* Changed default generic ORF prefix to "HYBRA" for greater clarity.
* Set default output directory to the current directory.
* Now ignoring warnings when generating translations for pseudogenes whose lengths aren't multiples of three.
* Enabled setting any of RATT's configured transfer types and fixed names for *.global parameter sets.
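The 1.7 coord_check fix extends an ORF that lacks a stop codon to the next in-frame stop. The following is a minimal sketch of that idea and not hybran's implementation. It assumes a plus-strand feature with 0-based, half-open coordinates on a plain string sequence, and it uses the stop codons shared by translation tables 1 and 11. `has_terminal_stop` and `extend_to_next_stop` are hypothetical names. A minus-strand feature would be handled the same way on the reverse complement.

```python
from typing import Optional

# Stop codons shared by the standard (1) and bacterial (11) translation tables.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_terminal_stop(seq: str, start: int, end: int) -> bool:
    """True if [start, end) is a whole number of codons ending in a stop."""
    return (end - start) % 3 == 0 and seq[end - 3:end].upper() in STOP_CODONS


def extend_to_next_stop(seq: str, start: int, end: int) -> Optional[int]:
    """Return a corrected end that includes the next in-frame stop codon.

    Returns `end` unchanged if the ORF already ends in a stop, or None if
    the sequence runs out before an in-frame stop is found.
    """
    if has_terminal_stop(seq, start, end):
        return end
    # First codon in frame with `start` that is not entirely inside [start, end).
    pos = start + 3 * ((end - start) // 3)
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3].upper() in STOP_CODONS:
            return pos + 3
        pos += 3
    return None


seq = "ATGAAACCCGGGTAA"
print(extend_to_next_stop(seq, 0, 9))   # 15: grows past GGG to include TAA
print(extend_to_next_stop(seq, 0, 15))  # 15: already ends in a stop
```

Returning `None` rather than the sequence end keeps a contig-edge ORF with no stop distinguishable from a successfully corrected one.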
-
1.6.1
34cdcd73 · Version 1.6.1

Bugs fixed
----------
* Fixed scenario where hybran crashes when coordinate correction matches to an adjacent locus and attempts correction (#61).
* Added a temporary measure to inclusion criteria to penalize CDSs lacking stop codons (inadequately postprocessed) (#60).
-
1.6
3663b73a · Version 1.6

Enhancements
------------
* Massive streamlining of the pipeline. Reworked components into new subsystems:
  - `pseudoscan`: identification of anomalous copies of reference genes using new criteria independent of alignment coverage. (#50, #56, and #59)
  - `fissionfuser` (formerly `process_split_genes()`): improved detection and combining of gene fragments (that ab initio annotations tend to produce) into a single record.
  - `fusionfisher`: detection of gene fusion events and putative misannotations.
  - `thunderdome`: more aggressive conflict resolution between RATT and ab initio annotations.
* Output GFF files no longer include the genome sequence.

Bugs fixed
----------
* Fixed handling of conflicting annotations that are differently named (#57).
* Reimplemented coordinate correction and applied it to ab initio ORFs as part of `pseudoscan`. This resolves many instances of false `pseudo` ab initio CDSs that were due simply to incorrect start coordinate predictions spuriously shortening the genes.
* Fixed handling of compound intervals in reference annotations (#46, #47)
* Resolved issues involving reference annotations with multiple contigs/chromosomes (#48)
* Fixed issue with some gene name assignments being dropped later in the pipeline due to obsolete code (#43).
* More comprehensive tracking of RATT/ab initio overlaps and conflicts (#49).
* Checking in-frame overlaps with pseudo ORFs containing internal stop codons
* Revamped postprocessing of RATT-introduced compound intervals (#44, #45)
* Updated inclusion criteria for special handling of pseudo ORFs (#42)
-
1.5.2
93b8459c · Version 1.5.2

Bugs fixed
----------
* Made a consistent non-CDS policy for RATT: take everything except rRNA and tRNA (#22)
* Clarified some rejection reasons for RATT/ab initio features.
* Fixed representation of BLAST results for CDSs when there aren't any hits
* Fixed issue with RATT handoff if sample/contig names contain `.` or `|`.
* Fixed issue with logging merged genes.
-
1.5.1
48f89620 · Version 1.5.1

Bugs fixed
----------
* Fixed newly introduced issue with rejecting RATT annotations.

Housekeeping
------------
* Removed some unused code.
-
1.5
76ace62d · Version 1.5

Bugs fixed
----------
* Prevented overlapping RATT-transferred annotations from automatically being handled as conflicts, leading one of the two to be discarded (#39)
* Corrected distinguishing of ab initio vs reference-transferred annotation in final conflict resolution step (#38)
* Added logging of some missed cases of annotation rejections (for {ratt,prokka}_unused.tsv)
* Fixed the reference gene <=> locus_tag mapping dictionaries used in annomerge (#41)
* Made sure to track and process all ab initio annotations that overlap RATT-transferred CDSs (#40).
* Fixed handling of multi-fasta inputs (#35)

Enhancements
------------
* Streamlined Prokka workflow (#33)
* Parallelized BLASTing to reference genes in annomerge.
* Set sequence names in the output annotation files.

Housekeeping
------------
* Added exit status checks so the pipeline fails as early as possible when things go wrong.
* Switched default evalue to Prokka's current setting of 1e-9 (was 1e-6)
* Slightly streamlined the RATT / Prokka comparison workflow
* Added more unit tests
-
1.4.1
1d03783b · Version 1.4.1

Bugs fixed
----------
* Genes split into multiple adjacent fragments used to have a single /gene record but multiple CDS records with the same locus tag. For INSDC compliance, they now have only a single CDS record as well.
* Removed /translation fields for /pseudo CDSs.
-
1.4
f1c8220a · Version 1.4

Enhancements
------------
* Generalized for any prokaryote.
  - Genetic code and taxonomy ID detected from reference annotation. RATT configuration is now automatically generated based on the detected genetic code, so a configuration file is no longer bundled.
  - Now using "ORF" prefix for generic genes rather than "MTB" by default. Option `--orf-prefix` added for customizability.
* Removed checking for dnaA as the first gene at the first base position.
* Made eggNOG-mapper step optional.
* Gene fragments are now identified using the corresponding reference gene names, but are distinguished with a /pseudo tag.
* RATT and (some) Prokka options are now under user control.

Bugs fixed
----------
* Account for translationless CDSs that are labeled with the 'pseudo' qualifier instead of 'pseudogene'
* Allow input fasta files with alternative standard extensions.
* Fixed handling of reference annotations that may not have /gene qualifiers for all annotations.
* Fixed handling of input genome when it's the same as the reference.
* Set proper field from which to draw eggNOG-mapper annotations.
* Uniform locus tags are now assigned for every sample.
* Better identification of reference and unnamed genes in processing of clusters.
-
1.3.1
78d61a77 · 1.3.1

Enhancements
------------
* Hybran version now recorded in the genbank annotation header.

Bugs fixed
----------
* Updated the Prokka reference proteome generation format to enable Prokka to set gene names and product fields rather than leaving it to the final clustering step.
* Fixed installation location of resource file.
-
1.3.0
9ccf2ce1 · Version 1.3.0

Enhancements
------------
* Added `--dedupe-references` option to assign a single generic gene name to duplicate genes in the provided reference annotations.
* Sequence identity and alignment coverage thresholds are no longer applied to RATT-transferred annotations by default. (#28) The original behavior can be restored by passing the new `--filter-ratt` option.
* Reference annotations can now be passed as individual file names or a file of file names, in addition to a directory name (#21)

Bugs fixed
----------
* eggnog-mapper step no longer gets skipped (#23)
* alignment query coverage threshold is now applied directly in Prokka (#24)
* Dropped criterion of excluding hypothetical genes from Prokka-no-reference (#30)
* Fixed calculation of query and reference alignment coverage (#27).
* Corrected selection of top blastp hits for the one-to-one and one-to-many searches. When there were multiple hits in these cases, only the last one output by BLAST was being retained, which actually corresponds to the worst hit (by e-value). We now retain only the first hit.
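The 1.3.0 blastp fix depends on which row per query is kept from BLAST output. As the note above states, hits for each query are listed best first by e-value. The following is a minimal sketch of keeping the first row per query, not hybran's code. It assumes tabular output (`-outfmt 6`, query ID in the first column, e-value in the eleventh), and `top_hits` is a hypothetical name. The bug described corresponds to overwriting a dict entry on every row, which keeps the last, worst hit.

```python
import csv
import io
from typing import Dict, Iterable, List


def top_hits(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only the first (best) BLAST tabular row for each query ID."""
    best: Dict[str, List[str]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. -outfmt 7)
        # setdefault stores a row only the first time a query ID is seen;
        # `best[row[0]] = row` would instead keep the last (worst) hit.
        best.setdefault(row[0], row)
    return best


blast_tab = io.StringIO(
    "q1\trefA\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t500\n"
    "q1\trefB\t40.0\t120\t70\t2\t10\t130\t5\t125\t1e-05\t45\n"
)
hits = top_hits(blast_tab)
print(hits["q1"][1], hits["q1"][10])  # refA 1e-150
```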
-
1.2.0
01658506 · 1.2.0

Enhancements
------------
- Add thorough logging of gene annotations merged, rejected, and newly named. (#19)
- Better tolerance of directory name inputs (#18)
- New option -c/--coverage-threshold for tuning gene matching. The sequence identity threshold is now taken through -i/--identity-threshold.
- Incorporate a RATT configuration file for using codons from translation table 11 (#17)

Bugs fixed
----------
- use provided identity/coverage thresholds for all instances of BLAST, CD-HIT (#10)
- ensure that the final gff output file gets updated (#15)
- (#14)
-
1.1.1
0710c8e1 · 1.1.1

* Fix issue with writing of merged_genes.gbk that sometimes caused hybran to crash during a run.
-
1.1.0
ca537eb8 · Version 1.1.0

* Migrated to Python 3
* Removed limitation of 30 references
* Allowed sequence identity threshold to be user-defined
* Proper handling of temporary files
* Fixed issue preventing clustering step from running
-