Commit 69e820bb authored by Jozef Hajnala

Add vroom::vroom + purrr::map_dfr

parent 3cddd147
@@ -32,6 +32,12 @@ bash bench/bench.sh rscripts/02_fread.R &> results/out_fread.txt
bash bench/bench.sh rscripts/03_readr.R &> results/out_readr.txt
```
### For vroom::vroom with purrr::map_dfr
```
bash bench/bench.sh rscripts/06_vroom_purrr.R &> results/out_vroom_purrr.txt
```
### For data.table::fread with grep
```
@@ -49,7 +55,8 @@ bash bench/bench.sh rscripts/05_readr_grep.R &> results/out_readr_grep.txt
| method | max. memory | avg. time |
|----------------------------------------|------------:|----------:|
| `utils::read.csv` + `base::rbind` | 21.70 GB | 8.13 m |
| `readr::read_csv` + `purrr:map_dfr` | 27.02 GB | 3.43 m |
| `readr::read_csv` + `purrr::map_dfr` | 27.02 GB | 3.43 m |
| `vroom::vroom` + `purrr::map_dfr` * ** | 25.70 GB | 1.67 m |
| `data.table::fread` + `rbindlist` | 15.25 GB | 1.40 m |
| `data.table::fread` from `grep` | 1.68 GB | 0.34 m |
| `readr::read_csv`+ `pipe()` from `grep`| 1.70 GB | 0.88 m |
@@ -57,6 +64,12 @@ bash bench/bench.sh rscripts/05_readr_grep.R &> results/out_readr_grep.txt
- max. memory = Maximum resident set size
- avg. time = Average maximum of real time and user time as measured by `time`
### Notes
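1. (*) Note that vroom does not deliver all of its advertised speed benefits with the R version used here (3.4.4); its lazy, ALTREP-based column loading requires R 3.5 or later.
2. (**) Note the unusually high user time (69m15s user versus 16m44s real) in the [detailed results](results/out_vroom_purrr.txt)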
## SessionInfo
- R version 3.4.4 (2018-03-15)
@@ -69,3 +82,4 @@ used packages:
- readr_1.3.1
- magrittr_1.5
- purrr_0.2.4
- vroom_1.0.1
rscripts/06_vroom_purrr.R
Maximum resident set size (kbytes): 26945988
real 16m44.595s
user 69m15.484s
sys 2m4.276s
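The raw figures above can be related to the README table row for `vroom::vroom` + `purrr::map_dfr` with a little arithmetic. The sketch below is only an illustration: the run count `n_runs` is an assumption inferred from the ratio between the reported real time and the table value, since `bench/bench.sh` itself is not part of this diff.

```r
# Raw figures copied from results/out_vroom_purrr.txt
max_rss_kb <- 26945988            # Maximum resident set size (kbytes)
real_s <- 16 * 60 + 44.595        # real 16m44.595s
user_s <- 69 * 60 + 15.484        # user 69m15.484s

# ASSUMPTION: number of repetitions performed by bench/bench.sh
# (not shown in this diff; inferred from the table value)
n_runs <- 10

max_rss_gb   <- max_rss_kb / 1024^2     # kbytes -> GB
avg_real_min <- real_s / n_runs / 60    # average real time per run, minutes
avg_user_min <- user_s / n_runs / 60    # average user time per run, minutes

cat(sprintf("max. memory:    %.2f GB\n", max_rss_gb))
cat(sprintf("avg. real time: %.2f m\n", avg_real_min))
cat(sprintf("avg. user time: %.2f m\n", avg_user_min))
```

Under that assumption the memory and real-time values round to the 25.70 GB and 1.67 m shown in the table, while the per-run user time comes out several times higher, which is the discrepancy flagged by note (**).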
suppressPackageStartupMessages({
library(vroom)
library(purrr)
library(magrittr)
})
dataDir <- path.expand("~/dataexpo")
dataFiles <- dir(dataDir, pattern = "csv$", full.names = TRUE)
# bind_rows (used by map_dfr) won't coerce column types, so predefine them
col_types <- vroom::cols(
.default = vroom::col_double(),
UniqueCarrier = vroom::col_character(),
TailNum = vroom::col_character(),
Origin = vroom::col_character(),
Dest = vroom::col_character(),
CancellationCode = vroom::col_character(),
CarrierDelay = vroom::col_double(),
WeatherDelay = vroom::col_double(),
NASDelay = vroom::col_double(),
SecurityDelay = vroom::col_double(),
LateAircraftDelay = vroom::col_double()
)
df <- dataFiles %>%
purrr::map_dfr(
vroom::vroom,
col_types = col_types,
progress = FALSE,
num_threads = 8L
)
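The script above calls `vroom::vroom` once per file and lets `purrr::map_dfr` bind the results. `vroom::vroom` also accepts a character vector of paths in a single call. The following is a minimal sketch comparing the two on tiny temporary files; the file names, contents, and the `demo_*` and `df_*` names are hypothetical stand-ins and not part of the benchmark, and no timing claim is implied.

```r
suppressPackageStartupMessages({
  library(vroom)
  library(purrr)
})

# Hypothetical miniature inputs standing in for the dataexpo CSV files
tmp_dir <- tempfile("dataexpo_demo")
dir.create(tmp_dir)
writeLines(c("Year,TailNum", "1987,123", "1987,456"),
           file.path(tmp_dir, "1987.csv"))
writeLines(c("Year,TailNum", "2008,N789", "2008,N012"),
           file.path(tmp_dir, "2008.csv"))
demo_files <- dir(tmp_dir, pattern = "csv$", full.names = TRUE)

# Predefine types so TailNum is character in every file,
# even where its values look numeric
demo_types <- vroom::cols(
  .default = vroom::col_double(),
  TailNum = vroom::col_character()
)

# Approach used in the script: one vroom() call per file, bound by map_dfr
df_map <- purrr::map_dfr(
  demo_files,
  vroom::vroom,
  delim = ",",
  col_types = demo_types,
  progress = FALSE
)

# Alternative: hand all paths to a single vroom() call
df_vec <- vroom::vroom(
  demo_files,
  delim = ",",
  col_types = demo_types,
  progress = FALSE
)
```

Both calls should yield the same four rows here. Whether the single-call form changes memory use or run time on the real data would have to be measured with the same `bench/bench.sh` procedure.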