Inconsistent output after modifying run_mlplasmids.R
I ran run_mlplasmids.R on the data provided in the repo (data/GCA_000250945.1_ASM25094v1_genomic.fna.gz), specifying 'Acinetobacter baumannii' as the species and a threshold of 0.7. As expected, the resulting tab file contained no predicted plasmid contigs. I wanted to test whether the script could run this species model.
I then modified the script run_mlplasmids.R to also obtain the chromosome contigs, by replacing
example_prediction <- plasmid_classification(path_input_file = input_path, prob_threshold=thresh, species = species)
with
example_prediction <- plasmid_classification(path_input_file = input_path, prob_threshold=thresh, species = species, full_output = TRUE)
I obtained a tab file with two predicted chromosome contigs and two predicted plasmid contigs, even though both predicted plasmid contigs have a plasmid probability below 0.7:
"Prob_Chromosome" "Prob_Plasmid" "Prediction" "Contig_name" "Contig_length"
0.746439586088951 0.253560413911049 "chromosome" "CP003351.1 Enterococcus faecium Aus0004, complete genome" 2955294
0.354636857186167 0.645363142813833 "plasmid" "CP003352.1 Enterococcus faecium Aus0004 plasmid AUS0004_p1, complete sequence" 56520
0.602554148772757 0.397445851227243 "chromosome" "CP003353.1 Enterococcus faecium Aus0004 plasmid AUS0004_p2, complete sequence" 3847
0.448970229653748 0.551029770346252 "plasmid" "CP003354.1 Enterococcus faecium Aus0004 plasmid AUS0004_p3, complete sequence" 4119
This is not consistent with a threshold of 0.7: I would expect a contig to be labeled "plasmid" only if its plasmid probability is at least 0.7.
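To make the inconsistency concrete, here is a minimal base R sketch that rebuilds the table above and flags the rows that contradict the threshold. It does not call mlplasmids. It assumes that prob_threshold is meant to be the minimum plasmid probability required for a "plasmid" label, which is my reading of the parameter and may not be what the package intends. Contig names are shortened to their accessions, and the names pred, inconsistent and label_at_half are only illustrative.

```r
# Rows copied from the full_output table above (contig names shortened to accessions).
pred <- data.frame(
  Prob_Chromosome = c(0.746439586088951, 0.354636857186167,
                      0.602554148772757, 0.448970229653748),
  Prob_Plasmid    = c(0.253560413911049, 0.645363142813833,
                      0.397445851227243, 0.551029770346252),
  Prediction      = c("chromosome", "plasmid", "chromosome", "plasmid"),
  Contig_name     = c("CP003351.1", "CP003352.1", "CP003353.1", "CP003354.1"),
  Contig_length   = c(2955294, 56520, 3847, 4119),
  stringsAsFactors = FALSE
)

thresh <- 0.7  # threshold passed to run_mlplasmids.R

# Assumption: a "plasmid" label should require Prob_Plasmid >= thresh.
# Rows labeled "plasmid" below the threshold contradict that reading.
inconsistent <- pred[pred$Prediction == "plasmid" & pred$Prob_Plasmid < thresh, ]
print(inconsistent[, c("Contig_name", "Prob_Plasmid", "Prediction")])

# Observation only: check whether each label matches a plain 0.5 cutoff.
label_at_half <- ifelse(pred$Prob_Plasmid > 0.5, "plasmid", "chromosome")
print(all(label_at_half == pred$Prediction))
```

Under that assumption, CP003352.1 and CP003354.1 are the two inconsistent rows. All four labels also match a plain 0.5 cutoff, which would be consistent with the labels being assigned independently of the threshold when full_output = TRUE, although these four rows alone do not prove it.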
cedric chauve