... | ... | @@ -2,10 +2,12 @@ |
|
|
|
|
|
* [Output directory and overview](#output_directory)
|
|
|
* [Output alignments](#output_alignments)
|
|
|
* [Supermatrix statistics](#summary_statistics)
|
|
|
* [Input alignment statistics](#homolog_statistics)
|
|
|
* [Output alignment statistics](#ortholog_statistics)
|
|
|
* [Paralogy Frequency Statistics](#paralog_frequency)
|
|
|
* [Supermatrix statistics](#summary_statistics)
|
|
|
* [Input alignment statistics](#homolog_statistics)
|
|
|
* [Output alignment statistics](#ortholog_statistics)
|
|
|
* [OTU statistics](#otu_statistics)
|
|
|
* [Log file](#log_file)
|
|
|
* [Paralogy Frequency Statistics](#paralog_frequency)
|
|
|
|
|
|
## Output directory and overview <a name="output_directory"></a>
|
|
|
|
... | ... | @@ -35,10 +37,12 @@ All output alignments are stored in a subfolder to the output folder with the pa |
|
|
|
|
|
By default, sequence data is kept on a single line. For more readable output, you can wrap sequences at column `n` by typing `--wrap n`, where `n` is a positive integer. For example, `--wrap 80` wraps sequence data at column 80. Output alignments use the same name as the input alignments, but with the string `_pruned` appended to the base name (for example, `01029.fa` becomes `01029_pruned.fa`). Note that some paralogy pruning algorithms, such as maximum inclusion (MI), may produce multiple orthologs from a single input file; in those cases an index is also added to the end of the name.
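As a rough illustration of the wrapping and naming behaviour described above, here is a minimal Python sketch. The helper names `wrap_sequence` and `pruned_name` are hypothetical and are not part of PhyloPyPruner; the sketch also leaves out the extra index that some pruning methods add, since its exact format is not stated here.

```python
import os


def wrap_sequence(sequence, width=None):
    """Return the sequence on one line, or wrapped at `width` columns.

    Hypothetical helper: mimics the effect of `--wrap n`, not the
    program's actual implementation.
    """
    if width is None:
        return sequence
    if width <= 0:
        raise ValueError("width must be a positive integer")
    # Slice the sequence into chunks of `width` characters.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(chunks)


def pruned_name(input_name):
    """Insert `_pruned` before the file extension (hypothetical helper)."""
    root, extension = os.path.splitext(input_name)
    return root + "_pruned" + extension
```

With these definitions, `pruned_name("01029.fa")` gives `01029_pruned.fa`, matching the file names shown in the output alignment statistics.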
|
|
|
|
|
|
### Supermatrix statistics <a name="summary_statistics"></a>
|
|
|
## Supermatrix statistics <a name="summary_statistics"></a>
|
|
|
|
|
|
This file contains statistics for all input and output alignments, treated as a single concatenated alignment. Supermatrix statistics are stored in the `supermatrix_stats.csv` file, which uses a semicolon (';') as the field separator. If jackknifing was performed, the results will be included here, but none of the alignments will be saved.
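Because the field separator is a semicolon rather than a comma, the file has to be read with an explicit delimiter. The following is a minimal sketch using Python's standard `csv` module. The function name `read_stats` is hypothetical, and the sketch assumes that the first row of the file holds the column names.

```python
import csv


def read_stats(path, delimiter=";"):
    """Read a semicolon-separated statistics file into a list of dicts.

    Assumes the first row is a header; every value is returned as a string.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

Calling `read_stats("supermatrix_stats.csv")` would then return one dictionary per row, keyed by column name. Values stay as strings, so convert numeric columns as needed.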
|
|
|
|
|
|
Missing data is calculated by counting the number of gap characters ('-', '?' or 'x') in each sequence.
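The counting step can be sketched in Python as follows. This is only an illustration: the names `count_missing` and `percent_missing` are hypothetical, and expressing the result as a percentage of all characters across the sequences is an assumption about how the per-sequence counts are combined.

```python
# Gap characters as listed above. Whether uppercase 'X' also counts
# is not stated, so it is left out here.
GAP_CHARACTERS = frozenset("-?x")


def count_missing(sequence):
    """Count the gap characters in a single sequence."""
    return sum(1 for character in sequence if character in GAP_CHARACTERS)


def percent_missing(sequences):
    """Percentage of gap characters across all sequences (assumed formula)."""
    total = sum(len(sequence) for sequence in sequences)
    if total == 0:
        return 0.0
    missing = sum(count_missing(sequence) for sequence in sequences)
    return 100.0 * missing / total
```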
|
|
|
|
|
|
**Table.** Example of a supermatrix statistics file.
|
|
|
|
|
|
| id | alignments | sequences | otus | meanSequences | meanOtus | meanSeqLen | shortestSeq | longestSeq | pctMissingData | catAlignmentLen |
|
... | ... | @@ -60,7 +64,7 @@ Statistics for each _individual_ input alignment are stored into `output_alignme |
|
|
| 01029.fa | 57 | 68 | 211 | 106 | 233 | 0.0903181014895 | 233 | 0 | 0 | 11 | 21 | 0 |
|
|
|
| 05466.fa | 62 | 92 | 128 | 69 | 139 | 0.0734282139506 | 139 | 0 | 0 | 28 | 29 | 2 |
|
|
|
|
|
|
### Output alignment statistics <a name="ortholog_statistics"></a>
|
|
|
## Output alignment statistics <a name="ortholog_statistics"></a>
|
|
|
|
|
|
Statistics for each _individual_ output alignment are stored in `output_alignment_stats.csv`, using a semicolon (';') as the field separator. The following is an example of what such a file might look like.
|
|
|
|
... | ... | @@ -74,7 +78,65 @@ Statistics for each _individual_ output alignment are stored into `output_alignm |
|
|
| 01029_pruned.fa | 56 | 56 | 218 | 106 | 233 | 0.064071122011 | 233 |
|
|
|
| 05466_pruned.fa | 60 | 60 | 133 | 69 | 139 | 0.0398081534772 | 139 |
|
|
|
|
|
|
### Paralogy Frequency Statistics <a name="paralog_frequency"></a>
|
|
|
## OTU statistics <a name="otu_statistics"></a>
|
|
|
|
|
|
## Log file <a name="log_file"></a>
|
|
|
|
|
|
The log file records the date and time of the analysis, the input data, the settings used, supermatrix statistics in a readable format, and the time that the analysis took.
|
|
|
|
|
|
```
|
|
|
PhyloPyPruner version 0.3.0
|
|
|
Tuesday, 23. October 2018 09:42AM
|
|
|
---------------------------------
|
|
|
Input data:
|
|
|
Directory: /Users/feli/Phylogenomics/trees+alignments/Kocot_et_al_2017_Syst_Biol_Lophotrochozoa/alignments_and_FastTree_trees_pre-PhyloTreePruner
|
|
|
|
|
|
Parameters:
|
|
|
Minimum number of OTUs: 40
|
|
|
Minimum sequence length: 50
|
|
|
Long branch threshold: 4.0
|
|
|
Minimum support value: 0.8
|
|
|
Include: None
|
|
|
Exclude: None
|
|
|
Monophyly masking method: longest
|
|
|
Rooting method: midpoint
|
|
|
Outgroup rooting: ['DMEL']
|
|
|
Paralogy pruning method: LS
|
|
|
Paralogy frequency threshold: 4.0
|
|
|
Trim divergent percentage: 0.25
|
|
|
Jackknife: False
|
|
|
|
|
|
Input Alignments
|
|
|
----------------
|
|
|
# of alignments: 1034
|
|
|
# of sequences: 83508
|
|
|
# of OTUs: 74
|
|
|
avg # of sequences per alignment: 80
|
|
|
avg # of OTUs: 58
|
|
|
avg sequence length (ungapped): 177
|
|
|
shortest sequence (ungapped): 22
|
|
|
longest sequence (ungapped): 415
|
|
|
% missing data: 31.5
|
|
|
concatenated alignment length: 202420
|
|
|
|
|
|
Output Alignments
|
|
|
-----------------
|
|
|
# of alignments: 1016
|
|
|
# of sequences: 55626
|
|
|
# of OTUs: 73
|
|
|
avg # of sequences per alignment: 54
|
|
|
avg # of OTUs: 54
|
|
|
avg sequence length (ungapped): 183
|
|
|
shortest sequence (ungapped): 50
|
|
|
longest sequence (ungapped): 415
|
|
|
% missing data: 32.8
|
|
|
concatenated alignment length: 200398
|
|
|
|
|
|
-----------------------
|
|
|
Run time: 81.74 seconds
|
|
|
```
|
|
|
|
|
|
## Paralogy Frequency statistics <a name="paralog_frequency"></a>
|
|
|
|
|
|
Paralogy frequency (PF) is the number of paralogs for an OTU divided by the number of alignments in which that OTU is present. This data is saved to a CSV file called `<timestamp>_ppp_paralog_freq.csv` and, if [Matplotlib](https://matplotlib.org/) is installed, a PF plot is saved to `<timestamp>_ppp_paralog_freq.png`, similar to the plot in **Figure 3**.
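The ratio itself is simple to express in code. The sketch below is a hedged illustration, not the program's implementation: the function name `paralogy_frequency` and both input dictionaries are hypothetical, and how paralogs are identified and counted for each OTU is taken as given.

```python
def paralogy_frequency(paralog_counts, alignment_counts):
    """Compute PF per OTU as paralogs / alignments containing that OTU.

    paralog_counts:   OTU name -> number of paralogs found for that OTU
    alignment_counts: OTU name -> number of alignments the OTU is present in
    """
    frequencies = {}
    for otu, present_in in alignment_counts.items():
        if present_in == 0:
            # An OTU that appears in no alignment has no defined PF.
            continue
        frequencies[otu] = paralog_counts.get(otu, 0) / present_in
    return frequencies
```

For instance, an OTU with 4 paralogs that is present in 8 alignments would have a PF of 0.5.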
|
|
|
|
... | ... | |