calc_candidate_regions window and step size not performing as expected
Hi! I'm using rehh 3.2.2 on a Linux cluster with R version 4.1.1. Thank you very much for the extremely helpful vignettes! I just noticed something while using the calc_candidate_regions function: it appears to create windows whose size equals window_size + overlap.
For this code:
library(rehh); library(data.table)  # wgscan.ihs comes from the attached .Rdat file
cr.NA <- as.data.table(calc_candidate_regions(wgscan.ihs, threshold = 4, window_size = 5000, overlap = 2500))
I get the table below. Note that in most rows the window width implied by START and END (END - START) is 7500 bp rather than 5000 bp, and row 3 spans 17500 bp. Based on the manual, I was expecting 5000 bp windows shifted every 2500 bp.
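For reference, here is a minimal R sketch of the windowing I expected from the manual: windows of window_size bp whose start advances by window_size - overlap bp (2500 bp here). `expected_windows()` and `windows_containing()` are hypothetical helpers written only for this illustration, not rehh functions, and the closed-on-the-right boundary convention is my assumption, not necessarily what rehh uses. The position 2696000 is arbitrary.

```r
# Hypothetical helper (not part of rehh): the sliding-window grid I expected,
# i.e. windows of `window_size` bp whose starts advance by
# `window_size - overlap` bp.
expected_windows <- function(from, to, window_size, overlap) {
  step <- window_size - overlap
  starts <- seq(from, to - window_size, by = step)
  data.frame(START = starts, END = starts + window_size)
}

# Hypothetical helper: which expected windows contain a given position?
# Windows are treated as open on the left and closed on the right (assumption).
windows_containing <- function(grid, pos) {
  grid[grid$START < pos & pos <= grid$END, ]
}

grid <- expected_windows(2690000, 2705000, window_size = 5000, overlap = 2500)
hits <- windows_containing(grid, 2696000)  # arbitrary example position
print(hits)

# Total span covered by the windows that contain this position
max(hits$END) - min(hits$START)
```

With these parameters, every position away from the ends of a scaffold lies in two consecutive 5000 bp windows, and those two windows together span 7500 bp. Since overlap is exactly half of window_size in both of my runs, window_size + overlap is also the span of two consecutive windows.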
CHR START END N_MRK MEAN_MRK MAX_MRK N_EXTR_MRK
1: Scaffold_1 2692500 2700000 46 0.5971401 4.074501 1
2: Scaffold_1 3372500 3380000 79 0.1623711 5.454025 1
3: Scaffold_1 24955000 24972500 154 1.1939520 5.701953 8
4: Scaffold_1 26165000 26172500 44 0.8266236 4.750508 1
5: Scaffold_2 1850000 1857500 22 1.2850308 4.393345 2
---
270: Scaffold_5 21485000 21492500 86 0.7723968 4.291471 1
271: Scaffold_5 22917500 22925000 31 0.4373078 4.877142 1
272: Scaffold_5 23142500 23150000 59 0.3461267 4.425686 1
273: Scaffold_5 23582500 23590000 17 1.2485792 4.621083 2
274: Scaffold_5 24397500 24405000 15 1.7401656 4.608976 1
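A quick R check of the window widths (END - START) for the rows printed above. The START/END values are copied from the table (the elided rows are not included), and `tab1` is just a scratch name.

```r
# START/END copied from the printed rows of the first table
# (window_size = 5000, overlap = 2500; rows 1-5 and 270-274 only).
tab1 <- data.frame(
  START = c(2692500, 3372500, 24955000, 26165000, 1850000,
            21485000, 22917500, 23142500, 23582500, 24397500),
  END   = c(2700000, 3380000, 24972500, 26172500, 1857500,
            21492500, 22925000, 23150000, 23590000, 24405000)
)
tab1$WIDTH <- tab1$END - tab1$START

# Count of rows per observed width
table(tab1$WIDTH)

# Is every width the window size plus a whole number of 2500 bp steps?
all((tab1$WIDTH - 5000) %% 2500 == 0)
```

Nine of the ten printed rows are 7500 bp wide and row 3 is 17500 bp; every printed width is 5000 bp plus a multiple of 2500 bp.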
Changing to a window size of 1000 and an overlap of 500, the windows in the results are mostly 1500 bp wide (a few rows span 2000 bp):
cr.NA <- as.data.table(calc_candidate_regions(wgscan.ihs, threshold = 4, window_size = 1000, overlap = 500))
CHR START END N_MRK MEAN_MRK MAX_MRK N_EXTR_MRK PERC_EXTR_MRK MEAN_EXTR_MRK
1: Scaffold_1 24958500 24960000 34 0.8126399 4.159548 1 2.94 4.159548
2: Scaffold_2 1852000 1853500 24 2.6557830 4.913181 6 25.00 4.577210
3: Scaffold_2 2543500 2545000 10 1.0993284 4.809978 1 10.00 4.809978
4: Scaffold_2 3244000 3246000 18 2.2145842 4.419010 5 27.78 4.244684
5: Scaffold_2 23282000 23283500 27 1.6423490 4.182361 2 7.41 4.128900
---
732: Scaffold_5 7874500 7876000 16 3.0675244 4.791696 1 6.25 4.791696
733: Scaffold_5 7877000 7878500 30 1.3457374 4.922541 2 6.67 4.922541
734: Scaffold_5 7882000 7884000 27 1.5637379 4.751006 4 14.81 4.338285
735: Scaffold_5 24399500 24401000 7 2.3004580 4.938355 1 14.29 4.938355
736: Scaffold_5 24652000 24654000 16 2.3174407 4.748404 5 31.25 4.505172
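The same width check for the second run, again with START/END copied from the printed rows only (`tab2` is a scratch name).

```r
# START/END copied from the printed rows of the second table
# (window_size = 1000, overlap = 500; rows 1-5 and 732-736 only).
tab2 <- data.frame(
  START = c(24958500, 1852000, 2543500, 3244000, 23282000,
            7874500, 7877000, 7882000, 24399500, 24652000),
  END   = c(24960000, 1853500, 2545000, 3246000, 23283500,
            7876000, 7878500, 7884000, 24401000, 24654000)
)
tab2$WIDTH <- tab2$END - tab2$START

# Count of rows per observed width
table(tab2$WIDTH)

# Is every width the window size plus a whole number of 500 bp steps?
all((tab2$WIDTH - 1000) %% 500 == 0)
```

Seven of the ten printed rows are 1500 bp wide and three (rows 4, 734 and 736) are 2000 bp; as in the first run, every printed width is the window size plus a multiple of the 500 bp step.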
I've attached my rehh_wgscan_bychr_NorthAmerica.Rdat file containing the wgscan.ihs object. This is not a huge issue for my application, and I can easily adjust the parameters to get the window sizes I intended, but I thought I would let you know while I was thinking about it!