Release 1.4 (13 March 2017)
Two new commands - `mpileup` and `csq`:
* The `mpileup` command has been imported from samtools to bcftools. The
reasoning behind this is that bcftools calling is intimately tied to mpileup
and any change to one often requires changes to the other. Only the
genotype likelihood (BCF output) part of mpileup has moved to bcftools,
while the textual pileup output remains in samtools. The BCF output option
in `samtools mpileup` will likely be removed in a release or two or when
changes to `bcftools call` are incompatible with the old mpileup output.
The basic mpileup functionality remains unchanged as do most of the command
line options, but there are some differences and new features that one
should be aware of:
- The option `samtools mpileup -t, --output-tags` changed to `bcftools
mpileup -a, --annotate` to avoid conflict with the `-t, --targets`
option common across other bcftools commands.
- `-O, --output-BP` and `-s, --output-MQ` are no longer used as they are
only for textual pileup output, which is not included in `bcftools
mpileup`. The `-O` short option has been reassigned to `--output-type` and
`-s` to `--samples` for consistency with other bcftools commands.
- `-g, --BCF`, `-v, --VCF`, and `-u, --uncompressed` options from
`samtools mpileup` are no longer used, being replaced by the
`-O, --output-type` option common to other bcftools commands.
- The `-f, --fasta-ref` option is now required by default to help avoid user
errors. This check can be disabled using `--no-reference`.
- The option `-d, --depth .. max per-file depth` now behaves as expected
and according to the documentation, and prints meaningful diagnostics.
- The `-S, --samples-file` can be used to rename samples on the fly. See man
page for details.
- The `-G, --read-groups` functionality has been extended to allow
reassignment, grouping and exclusion of readgroups. See man page for
details.
- The `-l, --positions` option has been replaced by the `-t, --targets` and
`-T, --targets-file` options to be consistent with other bcftools
commands.
- gVCF output is supported. Per-sample gVCFs created by mpileup can be
merged using `bcftools merge --gvcf`.
- Can generate mpileup output on multiple (indexed) regions using the
`-r, --regions` and `-R, --regions-file` options. In samtools, one
was restricted to a single region with the `-r, --region` option.
- Several speedups thanks to @jkbonfield (cf3a55a).
* `csq`: New command for haplotype-aware variant consequence calling.
See man page and [paper](https://www.ncbi.nlm.nih.gov/pubmed/28205675).
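
As a rough illustration of the option changes listed above, a samtools-based calling pipeline might be translated along the following lines. This is a sketch under stated assumptions rather than a recommended workflow: the file names (`ref.fa`, `aln.bam`, `calls.vcf.gz`) and the tag list `AD,DP` are placeholders, and the new pipeline is only executed when `bcftools` and the input files are actually present. Otherwise the commands are just printed.

```shell
#!/bin/sh
# Sketch only: ref.fa, aln.bam and calls.vcf.gz are placeholder file names.

# Old style: samtools mpileup emits uncompressed BCF (-u -g), extra tags via -t.
old_cmd='samtools mpileup -ug -t AD,DP -f ref.fa aln.bam | bcftools call -mv -Oz -o calls.vcf.gz'

# New style: -u/-g are replaced by -O u, and -t is replaced by -a.
new_cmd='bcftools mpileup -Ou -a AD,DP -f ref.fa aln.bam | bcftools call -mv -Oz -o calls.vcf.gz'

printf '%s\n' "old: $old_cmd"
printf '%s\n' "new: $new_cmd"

# Execute the new pipeline only if the tool and the inputs exist.
if command -v bcftools >/dev/null 2>&1 && [ -f ref.fa ] && [ -f aln.bam ]; then
    sh -c "$new_cmd"
else
    printf '%s\n' "bcftools or input files not found; commands printed only"
fi
```

Note that `-f ref.fa` is kept in the new command because the reference is now required unless `--no-reference` is given.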
Updates, improvements and bugfixes for many other commands:
* `annotate`: `--collapse` option added. `--mark-sites` now works with
VCF files rather than just tab-delimited files. Now possible to annotate
a subset of samples from tab file, not just VCF file (#469). Bugfixes (#428).
* `call`: New option `-F, --prior-freqs` to take advantage of prior knowledge
of population allele frequencies. Improved calculation of the QUAL score
particularly for REF sites (#449, 7c56870). `PLs>=256` allowed in
`call -m`. Bugfixes (#436).
* `concat --naive` now works with vcf.gz in addition to bcf files.
* `consensus`: handle variants overlapping region boundaries (#400).
* `convert`: gvcf2vcf support for mpileup and GATK. New `--sex` option to
assign sex to be used in certain output types (#500). Large speedup of
`--hapsample` and `--haplegendsample` (e8e369b) especially with `--threads`
option enabled. Bugfixes (#460).
* `cnv`: improvements to output (be8b378).
* `filter`: bugfixes (#406).
* `gtcheck`: improved cross-check mode (#441).
* `index`: the path to the output index file can now be specified. Also gains
the `--threads` option.
* `merge`: Large overhaul of `merge` command including support for merging
gVCF files created by `bcftools mpileup --gvcf` with the new `-g, --gvcf`
option. New options `-F` to control filter logic and `-0` to set missing
data to REF. Resolved a number of longstanding issues (#296, #361, #401,
#408, #412).
* `norm`: Bugfixes (#385,#452,#439), more informative error messages (#364).
* `query`: `%END` plus `%POS0`, `%END0` (0-indexed) support - allows easy BED
format output (#479). `%TBCSQ` for use with the new `csq` command. Bugfixes
(#488,#489).
* `plugin`: A number of new plugins:
- `GTsubset` (thanks to @dlaehnemann)
- `ad-bias`
- `af-dist`
- `fill-from-fasta`
- `fixref`
- `guess-ploidy` (deprecates `vcf2sex` plugin)
- `isecGT`
- `trio-switch-rate`
and changes to existing plugins:
- `tag2tag`: Added `gp-to-gt`, `pl-to-gl` and `--threshold` options and
bugfixes (#475).
- `ad-bias`: New `-d` option for minimum depth.
- `impute-info`: Bugfix (49a9eaf).
- `fill-tags`: Added ability to aggregate tags for sample subgroups, thanks
to @mh11. (#503). HWE tag added as an option.
- `mendelian`: Bugfix (#566).
* `reheader`: allow multispace delimiters in `--samples` option.
* `roh`: Now possible to process multiple samples at once. This allows
considerable speedups for files with thousands of samples where the cost of
HMM is negligible compared to I/O and decompression. In order to fit tens of
thousands of samples in memory, a sliding HMM can be used (new `--buffer-size`
option). Viterbi training now uses the Baum-Welch algorithm and works much
better. Support for gVCFs or FORMAT/PL tags. Added `-o, --output` and
`-O, --output-type` options to control output of sites or regions
(compression optional). Many bugs fixed: the command no longer segfaults on
missing PL values, and a typo in the genetic map calculation that caused a
slowdown and incorrect results has been corrected.
* `stats`: Bugfixes (16414e6), new options `--af-bins` and `--af-tags` to control
allele frequency binning of output. Per-sample genotype concordance tables
added (#477).
* `view -a, --trim-alt-alleles`: various bugfixes for missing data. More
informative errors should now be given on failure, to help pinpoint problems.
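
To make the BED-style output mentioned under `query` more concrete, the sketch below combines `%POS0` (0-based start) with `%END` (1-based end), which matches the half-open interval convention used by BED. The file names `in.vcf.gz` and `out.bed` are placeholders, and the `awk` line is only an emulation of the coordinate arithmetic on a made-up record, not output from bcftools itself.

```shell
#!/bin/sh
# Sketch only: in.vcf.gz and out.bed are placeholder file names.

# BED wants a 0-based start and a 1-based end, hence %POS0 paired with %END.
fmt='%CHROM\t%POS0\t%END\n'

if command -v bcftools >/dev/null 2>&1 && [ -f in.vcf.gz ]; then
    bcftools query -f "$fmt" in.vcf.gz > out.bed
else
    printf '%s\n' "would run: bcftools query -f '$fmt' in.vcf.gz > out.bed"
fi

# Emulation of the arithmetic on a hypothetical record (CHROM, POS, END):
# a site at 1-based POS=100 with END=150 becomes the BED interval 99..150.
bed_line=$(printf 'chr1\t100\t150\n' | awk -F'\t' -v OFS='\t' '{ print $1, $2 - 1, $3 }')
printf '%s\n' "$bed_line"
```

The same format string could use `%END0` instead of `%END` if a fully 0-based end coordinate is wanted, although that would no longer follow the BED convention.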
General changes:
* Timestamps are now added to header lines summarising the command (#467).
* Use of the `--threads` option should be faster across the board thanks to
changes in HTSlib, meaning threads are now shared by the compression
and decompression calls.
* Changes to genotype filtering with `-i, --include` and `-e, --exclude` (#454).