The input for this program is one or more pairs of corresponding alignments and trees, in FASTA and Newick format respectively. FASTA descriptions and Newick names must match and must be in one of the following formats:
OTU is the operational taxonomic unit (usually the species) and ID is a unique annotation or sequence identifier. For example:
Sequence data may be kept on a single line or span multiple lines.
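Because a sequence may be wrapped over several lines, a reader has to join every line that follows a description until the next description starts. The following is a minimal sketch of that idea, not PhyloPyPruner's own parser; the function name `read_fasta` and the descriptions `seq_a` and `seq_b` are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA description to its sequence, joining wrapped lines."""
    records: Dict[str, str] = {}
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description closes the previous record, if any.
            if description is not None:
                records[description] = "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence data: one piece of a single- or multi-line sequence.
            chunks.append(line)
    if description is not None:
        records[description] = "".join(chunks)
    return records


# Placeholder records: one wrapped sequence and one single-line sequence.
example = """>seq_a
ACGT
ACGT
>seq_b
ACGTACGT
""".splitlines()

records = read_fasta(example)
```

Both records end up with the same eight-character sequence, regardless of how the sequence was wrapped in the file.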
Example Newick tree file:
Use the --dir flag, followed by the path to a directory, to point the program at a directory that contains one or more corresponding trees and alignments. The program looks for alignments and trees that share the same name within this directory and stages them for processing.
phylopypruner --dir <directory>
The program automatically recognizes FASTA and Newick tree files based on their filetype extension. This process is case-insensitive, meaning that a file ending in .fasta is recognized however its extension is capitalized.
Table 1: Recognized filetype extensions.
| Filetype | Extensions recognized by PhyloPyPruner |
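The pairing of alignments and trees by shared name can be sketched as follows. This is an illustration under stated assumptions, not PhyloPyPruner's implementation: the function `pair_files` is hypothetical, and the two extension sets are placeholders chosen for the example rather than the program's actual list of recognized extensions.

```python
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical extension sets, for illustration only; they are NOT the
# list of extensions that PhyloPyPruner actually recognizes.
FASTA_EXTENSIONS = {".fasta", ".fa"}
NEWICK_EXTENSIONS = {".newick", ".nwk"}


def pair_files(directory: str) -> Dict[str, Tuple[Path, Path]]:
    """Return {name: (alignment, tree)} for files that share a name."""
    alignments: Dict[str, Path] = {}
    trees: Dict[str, Path] = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # Lower-casing the suffix makes the match case-insensitive.
        extension = path.suffix.lower()
        if extension in FASTA_EXTENSIONS:
            alignments[path.stem] = path
        elif extension in NEWICK_EXTENSIONS:
            trees[path.stem] = path
    # Keep only names that have both an alignment and a tree.
    return {
        name: (alignments[name], trees[name])
        for name in alignments
        if name in trees
    }
```

An alignment without a matching tree (or the reverse) is simply left out of the result in this sketch; how the real program reports such files is not covered here.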
The minimum support value, provided with the --min-support flag, can be given either as a percentage (1-100) or as a decimal value between 0 and 1.0. Internally, support values are stored as floating point values ranging from 0 to 1.0. If the support values in your Newick tree files are stored as percentages, PhyloPyPruner will automatically convert them to this format. This allows you to run analyses on multiple trees that use different formats for support values.
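One way to picture this conversion is the sketch below. The exact rule PhyloPyPruner uses to detect percentages is not stated here, so the heuristic is an assumption: if any support value in a tree is greater than 1, all values from that tree are treated as percentages. The function name `normalize_supports` is hypothetical.

```python
from typing import Iterable, List


def normalize_supports(values: Iterable[float]) -> List[float]:
    """Rescale the support values of one tree to the range 0 to 1.0."""
    supports = [float(value) for value in values]
    if any(value < 0 or value > 100 for value in supports):
        raise ValueError("support values must lie between 0 and 100")
    # Assumed heuristic: a value above 1 can only be a percentage, so the
    # whole tree is taken to use percentages.
    if any(value > 1 for value in supports):
        return [value / 100 for value in supports]
    return supports


percent_tree = normalize_supports([100, 75, 50])
decimal_tree = normalize_supports([1.0, 0.75, 0.5])
```

Deciding per tree rather than per value avoids misreading a support of 1 in a percentage-based tree as full support.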
You can define subclades before the analysis in order to assess their overall stability. Subclades are defined in plain text, in a manner similar to a "Subclade Definition File" in BaCoCa (see the manual). Each line represents one subclade: the first entry is the name of the subclade, followed by two or more taxa, with entries separated by commas (',').
subclade_1,taxon_1,taxon_2,...
subclade_2,taxon_3,taxon_4,...
...
Spaces (' ') are allowed in both subclade names and taxon names; taxon names, however, must correspond to the taxon names used in your MSA and tree files. A taxon can appear in more than one subclade, and you are not required to assign every taxon to a subclade. Here is an example of a subclade definition file:
Annelida,HROB,CTEL,PLAM,ASUC,BPRO,CTOR,GDIB,PGOU,PAGA
Brachiopoda,GPYR,HPSI,LCAL,NANO
Ecdysozoa,DMEL,DPUL,PCAU
Entoprocta,BGRA,LPEC,LVIV,PCER
Gastrotricha,MACR,MEGA
Mollusca,PVUL,VLIE,ETEN,GTOL,LHYA,OVUL,SVEL,HRUF,PFUC,LGIG,RPHI,NLAP,ACAL,ROLI,CFOR,MEDU,SLES,DGIG,SESC,LRUG,CGIG,ACRA,ETET,GEBO,LASE,MSCH,NCAR,NPER,PPUL,PCAL,SCLE
Nemertea,CHON,CMAR,CLIN,LLAC,LLON,LRUB,MGRO,PPER,TPO1,TPO2
Phoronida,PPSA,PVAN
Platyhelminthes,TPIS,SMAN,SMED
Rotifera,BPLI,ARIC
Once you have formatted the subclade file, you can provide it as input to PhyloPyPruner by using the --subclade flag. For example, if the path to your subclade definition file is "clades.txt", you would write `--subclade clades.txt` to load the file into PhyloPyPruner.
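To check a subclade definition file before passing it to the program, the format described above can be parsed in a few lines. This is a minimal sketch and not the parser PhyloPyPruner uses; the function name `parse_subclades` is hypothetical, and trimming whitespace around each entry is an assumption (spaces inside names are kept).

```python
from typing import Dict, Iterable, List


def parse_subclades(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse 'name,taxon,taxon,...' lines into {name: [taxa]}."""
    subclades: Dict[str, List[str]] = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # The first entry is the subclade name, the rest are taxa.
        name, *taxa = [entry.strip() for entry in line.split(",")]
        taxa = [taxon for taxon in taxa if taxon]
        if not name or len(taxa) < 2:
            raise ValueError(
                f"line {number}: expected a name and at least two taxa"
            )
        subclades[name] = taxa
    return subclades


# Three lines taken from the example definition file.
example = [
    "Ecdysozoa,DMEL,DPUL,PCAU",
    "Phoronida,PPSA,PVAN",
    "Rotifera,BPLI,ARIC",
]
clades = parse_subclades(example)
```

Because the function accepts any iterable of lines, an open file handle for "clades.txt" could be passed in directly.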