Commit 22bd5694 authored by Jon Tavernier

explore gnu parallel

parent f9687510
# GNU Parallel
GNU Parallel is new to me. It looks like a great way to take a script that's written to process one thing and have it process many things at once on a larger box (e.g. one thing per core).
This "Hello World" exercise shows passing multiple arguments to a script. `parallel` looks a lot more powerful based on the options I see; I'm just scratching the surface here.
## Process One City
The `./process_one_city.sh` script simulates processing one city. The script takes two parameters: `CITY_NAME` and `STATE_ABBREVIATION`.
### Usage
```bash
# crunch data for Chicago, IL
./process_one_city.sh Chicago IL
# output
Chicago, IL - Processing in PID 57675
Chicago, IL - Finished Processing in PID 57675 in 6 seconds
```
## Process All Cities
The `./process_all_cities.sh` script reads `./cities.tsv` and passes the arguments to `parallel`, which executes `./process_one_city.sh` in parallel.
`./cities.tsv` (columns are separated by a tab):
```text
Chicago	IL
Anaheim	CA
Buffalo	NY
Boulder	CO
Atlanta	GA
Seattle	WA
Ventura	CA
Bozeman	MT
Hammond	IN
Lincoln	MO
Orlando	FL
```
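For comparison, the same two-column rows can be read one at a time in plain bash. This is only a sketch of the sequential equivalent, not something from the commit: the `read_cities` name is made up for illustration, it assumes the columns are separated by a tab, and it prints the command for each row rather than running it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the repo): read a tab-delimited file and
# print the command that would run for each row, one row at a time.
read_cities() {
  local file=$1
  local city state
  # IFS is a tab for `read` only, so a city name containing spaces stays in one field.
  while IFS=$'\t' read -r city state; do
    echo "./process_one_city.sh ${city} ${state}"
  done < "${file}"
}

# Demo with two sample rows written to a temporary file.
sample=$(mktemp)
printf 'Chicago\tIL\nAnaheim\tCA\n' > "${sample}"
read_cities "${sample}"
rm -f "${sample}"
```

Swapping the `echo` for a real call to `./process_one_city.sh` would process the cities one after another, which is the baseline that `parallel` improves on.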
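`./process_all_cities.sh`:
```bash
#!/usr/bin/env bash

process_all_cities(){
  # source data is tab delimited, so split each line on a tab;
  # {1} is the city name and {2} is the state abbreviation
  parallel \
    --bar \
    --colsep '\t' \
    ./process_one_city.sh {1} {2} \
    < ./cities.tsv
}

main(){
  process_all_cities
}

main
```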
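Before kicking off real work, it can help to preview what `parallel` will execute. The sketch below uses `--dry-run`, which prints each command instead of running it, and `--keep-order`, which keeps the output in input order. The `preview_commands` wrapper is a hypothetical name, and the sketch assumes GNU Parallel (not the moreutils `parallel`) is on the `PATH` and that the input file is tab-delimited.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper (not in the repo): print the commands GNU Parallel
# would run for a tab-delimited file, without executing any of them.
preview_commands() {
  local file=$1
  parallel \
    --dry-run \
    --keep-order \
    --colsep '\t' \
    ./process_one_city.sh {1} {2} \
    < "${file}"
}
```

Calling `preview_commands ./cities.tsv` should print one `./process_one_city.sh <city> <state>` line per row, which is a quick way to confirm the columns are being split as intended.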
`./process_one_city.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail

CITY_NAME=$1
STATE_ABBREVIATION=$2

process_thing(){
  # pretend to process data by
  # sleeping a random number of seconds (1 to 10).
  local sleep_seconds=$(( (RANDOM % 10) + 1 ))
  echo "${CITY_NAME}, ${STATE_ABBREVIATION} - Processing in PID $$"
  sleep "${sleep_seconds}"
  echo "${CITY_NAME}, ${STATE_ABBREVIATION} - Finished Processing in PID $$ in ${sleep_seconds} seconds"
}

main(){
  # process_thing reads CITY_NAME and STATE_ABBREVIATION directly
  process_thing
}

main
```
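The simulated workload comes down to one arithmetic expansion: `RANDOM % 10` yields 0 through 9, so adding 1 gives a duration from 1 to 10 seconds. A small sketch of that calculation on its own follows; the `random_seconds` name and its optional upper bound are made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the repo): print a whole number of seconds
# between 1 and max (inclusive). max defaults to 10.
random_seconds() {
  local max=${1:-10}
  echo $(( (RANDOM % max) + 1 ))
}

random_seconds      # some value from 1 to 10
random_seconds 3    # some value from 1 to 3
```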