
CARD ontology search improved

Dear @antunderwood

I have been working on this pull request to improve the search and get the most out of the information provided by the CARD ontology.

Basically, the information is stored in a dataframe indexed by ARO terms, which are unique. In the other columns I have added the information that pronto provides from the CARD ontology file.
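As a minimal sketch of this layout (not the code in this pull request), the table could look like the following. The ARO identifiers, gene names and column names are hypothetical placeholders rather than real CARD entries.

```python
import pandas as pd

# Hypothetical records: IDs, names and columns are placeholders, not real CARD data.
records = [
    {"ARO": "ARO:0000001", "name": "gene_A", "description": "placeholder description A"},
    {"ARO": "ARO:0000002", "name": "gene_B", "description": "placeholder description B"},
]

# Index the table by ARO term; verify_integrity raises if a term appears twice.
aro_table = pd.DataFrame(records).set_index("ARO", verify_integrity=True)

# A label-based lookup returns every column stored for that entry.
entry = aro_table.loc["ARO:0000001"]
print(entry["name"])
```

Indexing by the ARO term keeps each lookup a single `.loc` call and makes duplicate terms fail loudly at build time.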

I have removed several functions from your code that I was no longer using. It is no longer possible to store the information in JSON format: pandas writes the data directly to CSV and reads it back from there.
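The CSV round trip can be sketched as follows, assuming a table indexed by a column named `ARO`. The file name and the contents are hypothetical and only illustrate the `to_csv` / `read_csv` pair.

```python
import os
import tempfile

import pandas as pd

# Placeholder table; the index name "ARO" is an assumption for this sketch.
table = pd.DataFrame(
    {"name": ["gene_A", "gene_B"]},
    index=pd.Index(["ARO:0000001", "ARO:0000002"], name="ARO"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "aro_table.csv")  # hypothetical file name
    # Write the table, including its index, to CSV.
    table.to_csv(csv_path)
    # Read it back and restore the ARO column as the index.
    reloaded = pd.read_csv(csv_path, index_col="ARO")
```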

Downloading and parsing the database might take a couple of minutes, but once that is done new searches are really fast. In addition, all available information for each entry is retrieved, which makes the output much more informative and much clearer for the end user.

I implemented batch searches, where multiple terms are passed in a file, and I also implemented linked searches. For example, you can retrieve all CTX genes and keep only those conferring resistance to a specific antibiotic.
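A rough sketch of how batch and linked searches can work on such a dataframe is shown below. The function names, column names and data are hypothetical and are not the API of this pull request.

```python
import pandas as pd

# Placeholder table; names and antibiotics are made up for illustration.
table = pd.DataFrame(
    {
        "name": ["CTX-example-1", "CTX-example-2", "other-example"],
        "antibiotic": ["antibiotic_X", "antibiotic_Y", "antibiotic_X"],
    },
    index=pd.Index(["ARO:0000001", "ARO:0000002", "ARO:0000003"], name="ARO"),
)


def search_names(table, term):
    """Return the rows whose name contains `term` (literal, case-insensitive)."""
    mask = table["name"].str.contains(term, case=False, regex=False)
    return table[mask]


def batch_search(table, terms):
    """Run one search per term (e.g. one term per line of a file) and merge the hits."""
    hits = [search_names(table, term) for term in terms]
    if not hits:
        return table.iloc[0:0]  # empty result with the same columns
    combined = pd.concat(hits)
    # Keep each ARO term once even if several search terms matched it.
    return combined[~combined.index.duplicated()]


def linked_search(table, term, antibiotic):
    """Search by name, then keep only the hits linked to the given antibiotic."""
    hits = search_names(table, term)
    return hits[hits["antibiotic"] == antibiotic]


ctx_for_x = linked_search(table, "CTX", "antibiotic_X")
batch_hits = batch_search(table, ["CTX", "example-1"])
```

In this sketch a linked search is simply a second filter applied to the result of the first one.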

I tested several examples and they retrieved the same information as your version. I also tried several examples that I knew were not working, the same ones reported by Chris Rands in the other pull request (#5).

I have not tested distribution. The only new requirement would be the pandas module.

I hope you appreciate the work done here and the new features.

Please let me know whether you would be interested in merging this request. I understand it may not fit your plans for the project.

Either way, I will be using my forked version in my pipeline, but if you do merge the request I will switch to the module from pip. If not, I will share and distribute this code as a fork of your original code.

Regards,
Jose F. Sanchez-Herrero
