
Andre Gohr · 9fb9ecd7
--- a/o8_high_level/cmpr_features.md
+++ b/o8_high_level/cmpr_features.md
 [Back](../home#group-8-high-level-analyses)
-**Description**:
+**Description**: Use this command to visually inspect and statistically test dependencies between a feature X and other features, each given by a column of a table. The command performs these steps:
+1. if necessary, runs [get_ifeatures](o5_analyze_seqs/get_ifeatures)/[get_efeatures](o5_analyze_seqs/get_efeatures) on the input table to determine intron-related/exon-related features
+2. for each feature, groups the data into bins defined by quantiles of that feature
+3. for each bin, collects all corresponding values of feature X
+4. applies a Kruskal-Wallis test to compare the distributions of feature X across all bins
+5. applies Mann-Whitney U tests to compare the distributions of feature X between each pair of bins
+6. output:
+    1. PDF report summarizing results with box plots
+    2. table with all tested features
+    3. table with details on performed statistical tests including p values
+    4. all box plots as PDF graphics
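
Steps 2 to 5 can be illustrated for a single feature with the shell/awk sketch below. It is not Matt's implementation, and all function names are hypothetical. Assumptions: the table is tab-separated with a header line; bins are formed by ranking rows by the feature and splitting them into k groups of (nearly) equal size, so tied feature values at a bin boundary may land in different bins; and only the test statistics H and U are computed, without tie correction for H and without the p-values that Matt reports.

```bash
# Hypothetical sketch of steps 2-5 for ONE feature column; not Matt's code.
tab=$(printf '\t')

# Steps 2-3: rank rows by the feature, split them into k equally sized
# (quantile) bins and print "bin<i><TAB><value of feature X>" for each row.
bin_by_quantiles() {  # args: table, feature column name, X column name, k
  awk -F'\t' -v f="$2" -v x="$3" '
    NR == 1 { for (i = 1; i <= NF; i++) { if ($i == f) fc = i; if ($i == x) xc = i }; next }
    { print $fc "\t" $xc }' "$1" |
    sort -t "$tab" -k1,1g |
    awk -F'\t' -v k="$4" '{ v[NR] = $2 }
      END { for (r = 1; r <= NR; r++) printf "bin%d\t%s\n", int((r - 1) * k / NR) + 1, v[r] }'
}

# Step 4: Kruskal-Wallis H statistic from "bin<TAB>value" lines on stdin.
# Tied values get their average rank; no tie correction, no p-value.
kruskal_h() {
  sort -t "$tab" -k2,2g |
  awk -F'\t' '{ g[NR] = $1; x[NR] = $2 + 0 }
    END {
      n = NR
      for (i = 1; i <= n; i = j + 1) {
        j = i
        while (j < n && x[j + 1] == x[i]) j++          # run of tied values
        for (m = i; m <= j; m++) { R[g[m]] += (i + j) / 2; c[g[m]]++ }
      }
      for (b in R) s += R[b] * R[b] / c[b]             # sum of R_i^2 / n_i
      printf "%.4f\n", 12 / (n * (n + 1)) * s - 3 * (n + 1)
    }'
}

# Step 5: Mann-Whitney U statistic of bin A versus bin B (no p-value).
mann_whitney_u() {  # args: bin A, bin B; reads "bin<TAB>value" on stdin
  awk -F'\t' -v a="$1" -v b="$2" '
    $1 == a { x[++na] = $2 + 0 }
    $1 == b { y[++nb] = $2 + 0 }
    END {
      for (i = 1; i <= na; i++)
        for (j = 1; j <= nb; j++)
          u += (x[i] > y[j]) ? 1 : ((x[i] == y[j]) ? 0.5 : 0)  # ties count 1/2
      print u + 0
    }'
}
```

For example, `bin_by_quantiles introns.tab SOME_FEATURE DPSI 5 | kruskal_h` would print H for five bins, where SOME_FEATURE stands for any numeric column. Under the null hypothesis of identical distributions, H approximately follows a chi-squared distribution with k-1 degrees of freedom, which is the usual way a Kruskal-Wallis p-value is obtained.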
+
+If the input table describes exons or introns, it must contain these columns:
+* start coordinate
+* end coordinate
+* chromosome ID; these IDs must match the chromosome IDs in the GTF and FASTA files you use
+* strand
+* ID of the gene in which the exon/intron lies; these IDs must match the gene IDs in the GTF file you use
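
Mismatching IDs between the table and the GTF/FASTA files are easy to overlook. The hypothetical helper functions below (not part of Matt) sketch one way to check them beforehand. They assume tab-separated files, a header line in the table, that a FASTA sequence ID is the first word after `>`, and that genes are identified by the GTF `gene_id` attribute.

```bash
# Hypothetical consistency-check helpers; not part of Matt.
# Distinct values of a named column in a tab-separated table with a header.
column_values() {  # args: table, column name
  awk -F'\t' -v c="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == c) k = i; next }
    k { print $k }' "$1" | sort -u
}

# Sequence IDs of a FASTA file (assumed: first word after ">").
fasta_ids() { grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*//' | sort -u; }

# Chromosome IDs (column 1) and gene IDs (gene_id attribute) of a GTF file.
gtf_chrom_ids() { grep -v '^#' "$1" | cut -f1 | sort -u; }
gtf_gene_ids() { sed -n 's/.*gene_id "\([^"]*\)".*/\1/p' "$1" | sort -u; }
```

For example (file names are placeholders), `column_values table.tab SCAFFOLD > table_ids.txt`, `fasta_ids genome.fa > fasta_ids.txt` and then `comm -23 table_ids.txt fasta_ids.txt` list the chromosome IDs that occur in the table but not in the FASTA file; empty output means every ID was found. The same pattern works with `gtf_chrom_ids`, and with `gtf_gene_ids` for the gene ID column. All lists are produced by `sort -u` in the same locale, which `comm` requires.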
+
+**Example**: The table introns.tab describes introns with the columns
+* START
+* END
+* SCAFFOLD
+* STRAND
+* GENEID_ENSEMBL
+
+It also contains a column DPSI with dPSI values. For a cleaner analysis, the introns (rows) were previously filtered so that only introns with positive dPSI remain in the table. The user can now extract intron-related features and study the dependencies between each of these features and dPSI. To extract the intron-related features, the user must additionally specify a gene annotation file (GTF), a genome sequence file (FASTA), the species (here Hsap), and further arguments. Finally, the user must either choose the number of bins or directly define the quantiles to use in the analysis:
+```bash
+matt cmpr_features introns.tab -a DPSI -mattintron START END SCAFFOLD STRAND GENEID_ENSEMBL Hsa19.gtf Hsa19.fa Hsap 150 -points -bins 5 -o output_dir
+```
+The argument 150 specifies the length of the 3'-end region of each intron that is searched for SF1 hits and included in the branch point analysis.
+
+The command creates the output folder output_dir and places these files in it:
+1. PDF report summarizing results with box plots: summary.pdf
+2. table with all tested features, including any extracted ones: 2000_extracted_features.tab
+3. table with results of comparisons including p values: 000_feature_comparison_results.tab
+4. all box plots as PDF graphics
+
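
The example assumes that introns.tab was already restricted to introns with positive dPSI. Such a pre-filtering could be done, for instance, with the awk sketch below. It is not part of Matt; the function name and the input file name all_introns.tab are hypothetical. It keeps the header line and every row whose DPSI value is greater than 0, so empty or non-numeric values such as NA are dropped.

```bash
# Hypothetical pre-filtering step (not part of Matt): keep the header line and
# all rows whose value in the named column is > 0; the column is found by name.
keep_positive() {  # args: table, column name
  awk -F'\t' -v c="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == c) k = i; print; next }
    k && $k != "" && $k + 0 > 0' "$1"
}

# Usage (hypothetical file names):
#   keep_positive all_introns.tab DPSI > introns.tab
```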
 [Back](../home#group-8-high-level-analyses)
\ No newline at end of file