Commit 3d0397b6 authored by Mike Ledger's avatar Mike Ledger

readme

parent 028640a9
# csvmaps
This is a command-line tool for manipulating map-like CSV data, where the first
column is treated as a "key" for that row.
## Usage
```shell
$ csvmaps --help
Usage: csvmaps [-i|--infiles STRING]... [-o|--outfile STRING]
[--has-header BOOL] [--op MAPOP]
[--expr MAPEXPR BYTESTRING [BYTESTRING]] [--save-labels BOOL]
[--verbose BOOL]
Available options:
-h,--help Show this help text
-i,--infiles STRING... Files to combine. The first column of each csv will
be used as a key. For non-csv files ( iles that don't
end in .csv), each line will be treated as a key.
-o,--outfile STRING .csv file to write output to. If omitted, use stdout.
--has-header BOOL whether to ignore header in csv files
--op MAPOP The operation to use to combine csv files (default:
union)
--expr MAPEXPR BYTESTRING [BYTESTRING]
The expression to use to combine csv files. Reference
the inputs with $N for the Nth document in the
"infiles". The operations available are:
1. Union with the + operator;
2. Difference with the - operator;
3. Intersection with the * operator;
4. Union combining all columns with the +. operator;
5. Intersection combining all columns with the *.
operator
6. Labels for expressions with the syntax
'"LABEL": EXPR'.
```
####
### Syntax
`$N`
: References the Nth document specified, starting at 1.
`keys A`
: Discards the values of all rows.
`nulls A`
: Discards the rows that have non-empty values.
`non-nulls A`
: Discards the rows that have empty values.
`A : "label"`
: Assigns a label to the expression `A`. When used with the `--save-labels`
option, this will also result in a file called `label.csv` being created using
`A`.
`const ["Col_1","Col_2",...] A`
: Replaces all values of `A` with the columns given in the first argument.
`pad N A`
: Ensures the number of columns after each key is at least N; fills ones that
don't exist with empty strings.
`col N A`
: Takes the Nth column.
`col [N1,N2,...] A`
: Uses the columns given.
#### Unions
`A + B`
: Left-biased union of A and B. Same as `A + B`. When keys exist in both maps,
the values at those keys are taken from the left operand.
`A |+ B`
: Left-biased union of A and B. Same as `A + B`. When keys exist in both maps,
the left operand's values at those keys are used, unless they are empty.
`A +| B`
: Right-biased union of A and B. Same as `A +| B`. When keys exist in both maps,
the right operand's values at those keys are used, unless they are empty.
`A |+| B`
: Union of A and B that concatenates the values of all keys that both operands
have in common.
#### Intersections
`A * B`
: Left-biased intersection of A and B. Same as `A * B`. When keys exist in both
maps, the values at those keys are taken from the left operand.
`A |* B`
: Left-biased intersection of A and B. Same as `A * B`. When keys exist in both
maps, the left operand's values at those keys are used, unless they are empty.
`A *| B`
: Right-biased intersection of A and B. Same as `A *| B`. When keys exist in both
maps, the right operand's values at those keys are used, unless they are empty.
`A |*| B`
: Intersection of A and B that concatenates the values of all keys that both
operands have in common.
#### Difference
`A - B`
: Returns the rows of A whose keys are not in B.
## Motivation
The use case that motivated it was to manipulate a dictionary of terms from a
project that we were tasked to clean up, with very little time to actually do
it.
The dictionary was simply all the title-case(*) terms that we could
automatically pull out of a particular version of the project. This lead to
there being thousands of spurious terms (e.g., from words at the beginning of a
sentence), and terms that were only partially detected, and a proportional
amount of manual work to cull or edit them.
The tool allowed me to:
1. Delete the same terms deleted in version A in version A+1
2. For terms that were edited rather than deleted in version A, replace the
non-edited version from A+1 (if it existed) with the version from A
2. Detect entirely new terms
3. Combine the official project dictionaries with the title-case one I'd
produced
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment