Proposal for teensy/csv config
Proposal
I'm thinking about a `yaml` file to keep some information about the teensy output (or, more generally, csv files acquired during experiments, e.g. from a teensy or pi), for (1) record-keeping, (2) assisting the parser, and (3) potential modifications on the experimenters' side.
Ideally each experimenter would not change these files much, would keep only a few per person, and would save them somewhere more global, detached from their data folders.
Why
- A `yaml` file is serializable and intuitive enough for the experimenters to modify and/or extend when needed.
- Variability within each experimenter and across many experimenters.
- Hopefully this gives users more freedom, though it would still need sufficient documentation and an initial demo to ease the users in.
Potential features
(not all of these are used, integrated, or fully thought out yet)
- Allow end-users (experimenters) to define the organization of their acquisition file (and hopefully also encourage them to be organized)
- Option to remove certain meaningless columns
- SI unit conversion via `pint`, needing only the `unit` field (optional if unitless)
- Define a known valid range to filter out invalid data or turn them into `nan` values
- Invalid data could also be mapped into something else or transformed (like shifting) (though not recommended here)
- Integration of one or more odor/stimulus dictionaries (see the `reference` field and `support` section below); TBD on how to integrate into the output `nwb`
- Integration of each field's description/notes (see the `description` field) into the output file
- Integration of the `category` (or `tag`) field, to tag certain columns and possibly tell `nwb_converter` where these fields should go.
  - This needs to be accompanied by a sufficient guide so that users know which category to file each column under.
  - This might encourage users to learn a bit about the standards and organization of `nwb` as well.
- For complicated pre-processing, maybe point to a `.py` file instead (though not recommended here)
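As a rough illustration of the valid-range feature, the sketch below handles out-of-range values for a single column. The helper name `apply_valid_range` is hypothetical, the option names `to_nan` and `remove` are borrowed from the `action_on_invalid` field in the example config, and the inclusive bounds are an assumption. A real parser would presumably drop the whole row on `remove`, not just one value.

```python
import math


def apply_valid_range(values, valid_range, action_on_invalid="to_nan"):
    """Hypothetical helper: handle values outside [low, high] (inclusive, by assumption)."""
    low, high = valid_range
    if action_on_invalid not in ("to_nan", "remove"):
        raise ValueError(f"unknown action: {action_on_invalid!r}")
    cleaned = []
    for value in values:
        if low <= value <= high:
            cleaned.append(value)
        elif action_on_invalid == "to_nan":
            cleaned.append(math.nan)  # keep the position, mask the value
        # "remove": the invalid value is simply skipped
    return cleaned


# Example wheel readings with one value above the upper bound.
wheel = [12.0, 5000.0, 4096.0, -3.0]
masked = apply_valid_range(wheel, [-math.inf, 4096])
removed = apply_valid_range(wheel, [-math.inf, 4096], "remove")
```

Here `masked` keeps four entries with the second replaced by `nan`, while `removed` keeps only the three in-range readings.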
Related issues
- fleischmann-lab/calcium-imaging/calimag#68 (closed): hopefully we won't have to make `EXPERIMENTAL_VERSION` a required parameter that we rely on to figure out how to process each experiment.
  - This would remove some hard-coded parts in `parsers.py`, especially once the 1p data come in.
  - However, I still like the idea of `EXPERIMENTAL_VERSION`, and I think we can have some rough guidelines on how to format it, but maybe keep it optional?
- fleischmann-lab/calcium-imaging/calimag#81 (closed): hopefully this proposal can solve this
- fleischmann-lab/calcium-imaging/calimag#50: I have not used `parsita`, but it looks quite nice.
  - However, I'm not sure whether it would fall on the end-user or the developer to define the grammar to parse the "dynamic/variable" csv-file setup, and how hard that would be.
  - I'm curious how it would apply to the microscope files though, which are more "static/standardized" files.
- fleischmann-lab/calcium-imaging/calimag#58 (closed): potentially the user could describe the odor dictionary inside this file, or have a field pointing to a specific file for that (this file should not change too much), though we would still need to validate the odor dictionary somehow
Example `yaml` file
Longer example
Below is a rough draft of what the config/data-dictionary looks like for the `SD_20210511_513_teensy.csv` file. For the sake of demonstration and completeness, the file below is longer than intended. See the other `yaml` files for (hopefully) more concise demos.
# SD_teensy.yml
from: teensy # device or source like teensy or respi
author: SD # need to have an agreement on how to save label
description: |
  Acquisition from Teensy
  Simon's 2p experiment
meta:
  delimiter: "," # optional, default: ","
  has_headers: false # optional, default: false
  dirty: true # required
  cleaners:
    keep_only_valid_num_fields: true # remove lines without the right # fields in `main`
    remove_lonely_lines: true # remove lines not surrounded by valid lines
main:
  - label: time # required for all fields
    unit: ms # if has unit, if unitless then either `unit = 1` or remove field
  - label: odor
    description: odor stimulus identity # TBD on integration
    reference: odor_identity # optional, to be saved somewhere in output
    category: # TBD on integration; categories/tags to consider where to put in nwb, need to agree on the items and on whether order matters
      - acquisition
      - stimulus
    unit: ms
  - label: unknown1
    discard: true # remove this column
  - label: wheel
    unit: rad
    valid_range: [-.inf, 4096] # valid range after unit conversion
    action_on_invalid: to_nan # only available options: to_nan (default), remove
    category:
      - behavior
      - acquisition
  - label: trial
    unit: 1 # unitless
  - label: unknown2
    discard: true
support:
  odor_identity: # this is a dictionary for experimenter to enter, TBD on how to integrate. Can also refer to a specific file
    type: map # or dict or ref
    source: PubChem_CID # SMILES, CAS, PubChem IDs, etc
    values:
      - 0:
          id: 5988
          label: sugar
          dilution: # or concentration
          manufacture: Sigma
          flow_rate: 123
      - 1:
          id: [14252071, 3082460]
          label: [Arseno sugar 3, Invert sugar]
          dilution: |
            something mixes with something in a% and b%
          manufacture: Sigma
          flow_rate: 456
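A config like this could be sanity-checked before any data are parsed. The sketch below is only an illustration: `validate_config` is a hypothetical helper, and the rules it enforces (every `main` entry needs a `label`, `meta.dirty` is required, and each `reference` must name an entry under `support`) are read off the comments in the example, not taken from the draft parser. A plain dict stands in for what `yaml.safe_load` would return.

```python
def validate_config(config):
    """Hypothetical check: return a list of problems found in a parsed config."""
    problems = []
    # `or {}` / `or []` guards against keys that are present but empty (None).
    support = config.get("support") or {}
    meta = config.get("meta") or {}
    for index, field in enumerate(config.get("main") or []):
        if "label" not in field:
            problems.append(f"main[{index}]: missing required 'label'")
        reference = field.get("reference")
        if reference is not None and reference not in support:
            problems.append(
                f"main[{index}]: reference {reference!r} not found in 'support'"
            )
    if "dirty" not in meta:
        problems.append("meta: missing required 'dirty'")
    return problems


# Deliberately broken config to show both kinds of field-level problems.
config = {
    "meta": {"dirty": True},
    "main": [
        {"label": "time", "unit": "ms"},
        {"label": "odor", "reference": "odor_identity"},
        {"unit": "rad"},  # missing 'label'
    ],
    "support": {},  # 'odor_identity' is not defined here
}
problems = validate_config(config)
for problem in problems:
    print(problem)
```

Returning a list of messages instead of raising on the first error lets an experimenter fix every issue in the file in one pass.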
Short-ish example for spontaneous
from: teensy
author: MS
description: |
  Acquisition from Teensy
  Max's 2p spontaneous experiment (year unknown)
meta:
  dirty: true
  cleaners:
    keep_only_valid_num_fields: true
    remove_lonely_lines: true
main:
  - label: time
    unit: ms
    valid_range: [1.0e-3, 30.0]
    action_on_invalid: drop
  - label: flow
    unit: cm^3/min
    category:
      - acquisition
      - behavior
  - label: wheel
    unit: rad
    valid_range: [0, 4096]
    category:
      - acquisition
      - behavior
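The two `cleaners` flags could behave roughly as sketched below. This is a guess at the semantics, not the draft implementation: `clean_lines` is a hypothetical function, a line counts as "valid" when it splits into the number of fields listed in `main`, and a line counts as "lonely" when neither its previous nor its next line is valid.

```python
def clean_lines(lines, num_fields, delimiter=",",
                keep_only_valid_num_fields=True, remove_lonely_lines=True):
    """Hypothetical cleaner for raw acquisition lines (semantics are assumed)."""
    rows = [line.strip().split(delimiter) for line in lines]
    valid = [len(row) == num_fields for row in rows]
    kept = []
    for i, row in enumerate(rows):
        if keep_only_valid_num_fields and not valid[i]:
            continue  # wrong number of fields
        if remove_lonely_lines:
            has_valid_neighbour = (
                (i > 0 and valid[i - 1])
                or (i + 1 < len(rows) and valid[i + 1])
            )
            if not has_valid_neighbour:
                continue  # isolated line, likely serial noise
        kept.append(row)
    return kept


raw = [
    "1,0.5,3",       # valid, but its only neighbour is invalid -> lonely
    "garbage",
    "2,0.6,3",
    "3,0.7,3",
    "x,y",
    "9,9,9",         # valid, but both neighbours are invalid -> lonely
    "boot message",
]
kept = clean_lines(raw, num_fields=3)
```

With both flags enabled, only the two adjacent valid lines in the middle of `raw` survive.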
Usage
Required: `numpy`, `pandas`, `pyyaml`
Usage
from exp_csv_parser import read_acq_csv_file
df = read_acq_csv_file(
    info_file='SD_teensy.yml',
    data_file='SD_20210511_513_teensy.csv'
)
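The internals of `read_acq_csv_file` are not shown here. As a standard-library-only illustration of the core idea (labelling header-less columns from `main` and dropping those marked `discard`), a sketch might look like the following. `read_rows` is a hypothetical name, and converting every kept value with `float` is an assumption that would fail on non-numeric columns.

```python
import csv
import io


def read_rows(text, main, delimiter=","):
    """Hypothetical reader: label header-less columns and drop discarded ones."""
    labels = [field["label"] for field in main]
    keep = [not field.get("discard", False) for field in main]
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) != len(labels):
            continue  # skip lines with the wrong number of fields
        records.append({
            label: float(value)  # assumes all kept columns are numeric
            for label, value, wanted in zip(labels, row, keep)
            if wanted
        })
    return records


main = [
    {"label": "time", "unit": "ms"},
    {"label": "unknown1", "discard": True},
    {"label": "wheel", "unit": "rad"},
]
text = "10,0,1.5\n20,0,1.75\nbad line\n"
records = read_rows(text, main)
```

Each record then carries only the labelled, non-discarded columns, which is the shape a `pandas.DataFrame` could be built from.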
Notes on files
- exp_csv_parser.py: draft code
- SD_teensy.yml: the file above, for `SD_20210511_513_teensy.csv` in the `data/Raw_Data_Example` folder, corresponding to `v2021.05:2p_imaging_head_fixed+Simon:Teensy`
- MS_2019_teensy.yml: for `MS_20190710_M002_teensy.txt` in the `data` folder, corresponding to `v2019.07:2p_imaging_head_fixed+Max:Teensy`
- MS_spont.yml: for `MS_spontdemo_teensy.txt` in the `data/spontaneous-activity` folder, corresponding to `v2021.04:2p_imaging_head_fixed+Max:Teensy_spontaneous-activity`
- MS_2022_teensy.yml: for the data referenced in issue fleischmann-lab/calcium-imaging/calimag#81 (closed) (not publicly available)
For @maxseppo's data, notice that the two files `MS_2019_teensy.yml` and `MS_2022_teensy.yml` differ only slightly, which means the user would need to modify only a few lines.