Proposal for teensy/csv config

Proposal

I'm proposing a YAML file that describes the Teensy acquisition file (or, more generally, any CSV file recorded during experiments, e.g. from the Teensy or a Pi), for (1) record-keeping, (2) helping the parser and (3) letting experimenters make modifications on their side.

Ideally, each experimenter would keep only a few of these files, rarely change them, and store them somewhere global, separate from their data folders.

Why

YAML files are serializable and intuitive enough for experimenters to modify or extend when needed.
It can accommodate variability within each experimenter's files and across many experimenters.
Hopefully this gives users more freedom, though it would still need sufficient documentation and an initial demo to ease users in.

Potential features

(not all are used or integrated or all thought out yet)

Allow end-users (experimenters) to define the organization of their acquisition file (and hopefully also encourage them to be organized)
Option to remove certain meaningless columns
SI unit conversion via pint, requiring only the `unit` field (which can be omitted if unitless)
Define a known valid range per column to filter out invalid data or turn it into NaN values
Invalid data could also be mapped to something else or transformed (e.g. shifted), though this is not recommended here
Integration of one or more odor/stimulus dictionaries (see the `reference` field and `support` section below); how to integrate them into the output NWB file is TBD
Integration of each field's description/notes (see the `description` field) into the output file
Integration of the category (or tag) field, to tag certain columns and possibly tell nwb_converter where these fields should go.
- This needs to be accompanied by a sufficient guide so users know which category to file each column under.
- This might also encourage users to learn a bit about the standards and organization of NWB.
For complicated pre-processing, maybe point to a .py file instead (though not recommended here)
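
To make the unit and valid-range features a bit more concrete, here is a minimal, dependency-free sketch of how one column's rules might be applied. The helper `apply_column_rules` and the `TO_SI` factor table are hypothetical: the table is only a stand-in for pint so the sketch runs without it, and the `spec` dict mirrors one entry of `main` in the example configs below.

```python
import math

# Hypothetical stand-in for pint: multiplicative factors to SI for the units
# used in the example configs. The proposal intends pint to do this instead.
TO_SI = {
    "ms": 1e-3,              # milliseconds -> seconds
    "rad": 1.0,              # radians are already SI
    "cm^3/min": 1e-6 / 60,   # cubic centimetres per minute -> m^3/s
    1: 1.0,                  # unitless (`unit: 1` or no `unit` key)
}


def apply_column_rules(values, spec):
    """Convert one column to SI, then check it against `valid_range`.

    Returns (converted_values, keep_mask). `to_nan` replaces out-of-range
    values with NaN; `remove` leaves them in place but marks the row as not
    kept, so the caller can drop the whole row and keep columns aligned.
    """
    factor = TO_SI[spec.get("unit", 1)]  # KeyError for units not in the table
    converted = [v * factor for v in values]
    keep = [True] * len(converted)
    if "valid_range" not in spec:
        return converted, keep
    lo, hi = spec["valid_range"]  # range is checked after unit conversion
    action = spec.get("action_on_invalid", "to_nan")
    if action not in ("to_nan", "remove"):
        raise ValueError(f"unknown action_on_invalid: {action!r}")
    for i, v in enumerate(converted):
        if lo <= v <= hi:
            continue
        if action == "to_nan":
            converted[i] = math.nan
        else:
            keep[i] = False
    return converted, keep


# Made-up raw time stamps in ms, checked against [1 ms, 30 s] in seconds
time_spec = {"label": "time", "unit": "ms", "valid_range": [1.0e-3, 30.0],
             "action_on_invalid": "remove"}
values, keep = apply_column_rules([0, 5, 1000, 40000], time_spec)
print(keep)  # [False, True, True, False]
```

Returning a mask for `remove`, rather than deleting values from the column, is a deliberate choice in this sketch: dropping the row across all columns keeps them aligned.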

Related issues

fleischmann-lab/calcium-imaging/calimag#68 (closed): hopefully we won't need to rely on EXPERIMENTAL_VERSION as a required parameter to figure out how to process each experiment.
- This would remove some hard-coded parts in parsers.py, especially once the 1p data come in.
- However, I still like the idea of EXPERIMENTAL_VERSION; we could have some rough guidelines on how to format it, but maybe keep it optional?
fleischmann-lab/calcium-imaging/calimag#81 (closed): this proposal can hopefully solve this
fleischmann-lab/calcium-imaging/calimag#50: I have not used parsita but it looks quite nice.
- However, I'm not sure whether it would fall on the end-user or developer to define the grammar to parse the "dynamic/variable" csv-file setup, and how hard it would be.
- I'm curious how it would apply to the microscope files, though, which are more "static/standardized".
fleischmann-lab/calcium-imaging/calimag#58 (closed): potentially the user could describe the odor dictionary inside this file, or add a field pointing to a separate file for it (which should not change much); either way, the odor dictionary still needs to be validated somehow
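
One possible way to validate the odor dictionary is sketched below. It assumes the `support.odor_identity.values` layout from the longer example (a list of single-key maps from an integer code to an entry) and checks only a few hypothetical rules; `validate_odor_identity` is not part of any existing code.

```python
def validate_odor_identity(values):
    """Flatten `support.odor_identity.values` into {code: entry} and sanity-check it.

    Hypothetical checks: each list item maps exactly one integer code to one
    entry, codes are unique, every entry has `id` and `label`, and a mixture
    (a list of ids) comes with a list of labels of the same length.
    """
    table = {}
    for item in values:
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError(f"expected a single `code: entry` mapping, got {item!r}")
        code, entry = next(iter(item.items()))
        if not isinstance(code, int):
            raise ValueError(f"odor code must be an integer, got {code!r}")
        if code in table:
            raise ValueError(f"duplicate odor code {code}")
        if not isinstance(entry, dict):
            raise ValueError(f"odor {code} has no fields")
        missing = {"id", "label"} - set(entry)
        if missing:
            raise ValueError(f"odor {code} is missing {sorted(missing)}")
        ids, labels = entry["id"], entry["label"]
        ids_is_list = isinstance(ids, list)
        if ids_is_list != isinstance(labels, list) or (ids_is_list and len(ids) != len(labels)):
            raise ValueError(f"odor {code}: `id` and `label` must pair up")
        table[code] = entry
    return table


# The `values` list from the longer example, as yaml.safe_load would return it
values = [
    {0: {"id": 5988, "label": "sugar", "dilution": None,
         "manufacture": "Sigma", "flow_rate": 123}},
    {1: {"id": [14252071, 3082460], "label": ["Arseno sugar 3", "Invert sugar"],
         "dilution": "something mixes with something in a% and b%\n",
         "manufacture": "Sigma", "flow_rate": 456}},
]
odors = validate_odor_identity(values)
print(odors[0]["label"])  # sugar
```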

Example

Longer example `yaml` file

Below is a rough draft of what the config/data dictionary might look like for the SD_20210511_513_teensy.csv file. For the sake of demonstration and completeness, it is longer than intended; see the other YAML files for (hopefully) more concise demos.

# SD_teensy.yml
from: teensy # device or source like teensy or respi 
author: SD # need to agree on how to format author labels
description: |
  Acquisition from Teensy 
  Simon's 2p experiment 
meta: 
  delimiter: "," # optional, default: ","
  has_headers: false # optional, default: false
  dirty: true # required 
  cleaners: 
    keep_only_valid_num_fields: true # remove lines whose number of fields differs from `main`
    remove_lonely_lines: true # remove lines not surrounded by valid lines
main:
  - label: time # required for all fields
    unit: ms # if unitless, either set `unit: 1` or omit this field
  - label: odor
    description: odor stimulus identity # TBD on integration  
    reference: odor_identity # optional, to be saved somewhere in the output
    category: # TBD on integration; categories/tags deciding where this column goes in NWB; needs an agreed-upon list of items, and whether order matters
      - acquisition
      - stimulus
    unit: ms
  - label: unknown1
    discard: true # remove this column 
  - label: wheel
    unit: rad
    valid_range: [-.inf, 4096] # valid range after unit conversion 
    action_on_invalid: to_nan # only available options: to_nan (default), remove
    category:
      - behavior
      - acquisition 
  - label: trial
    unit: 1 # unitless 
  - label: unknown2
    discard: true

support:
  odor_identity: # this is a dictionary for experimenter to enter, TBD on how to integrate. Can also refer to a specific file 
    type: map # or dict or ref
    source: PubChem_CID # SMILES, CAS, PubChem IDs, etc
    values: 
      - 0: 
          id: 5988 
          label: sugar
          dilution:  # or concentration
          manufacture: Sigma
          flow_rate: 123
      - 1:
          id: [14252071, 3082460]
          label: [Arseno sugar 3, Invert sugar]
          dilution: |
            something mixes with something in a% and b%
          manufacture: Sigma
          flow_rate: 456
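
A parser would probably normalize such a file before using it. The sketch below assumes `cfg` is the dict returned by `yaml.safe_load` for a file like the one above; it enforces the two keys marked as required (`meta.dirty` and each field's `label`), fills in the defaults noted in the comments (`delimiter`, `has_headers`, `to_nan`, unitless), and accepts only the two `action_on_invalid` options listed there. `normalize_config` and the check that labels are unique are hypothetical additions.

```python
META_DEFAULTS = {"delimiter": ",", "has_headers": False}
ACTIONS = ("to_nan", "remove")


def normalize_config(cfg):
    """Check the keys marked as required and fill in the documented defaults.

    Hypothetical helper: `cfg` is the dict that yaml.safe_load returns for a
    config file such as SD_teensy.yml.
    """
    meta = {**META_DEFAULTS, **(cfg.get("meta") or {})}
    if "dirty" not in meta:
        raise ValueError("meta.dirty is required")
    main = cfg.get("main") or []
    if not main:
        raise ValueError("main must list at least one column")
    fields = []
    for i, field in enumerate(main):
        if "label" not in field:
            raise ValueError(f"main[{i}] has no label")
        # A missing unit means unitless; columns are kept unless `discard: true`.
        f = {"unit": 1, "discard": False, **field}
        if "valid_range" in f:
            f.setdefault("action_on_invalid", "to_nan")
            if f["action_on_invalid"] not in ACTIONS:
                raise ValueError(f"{f['label']}: action_on_invalid must be one of {ACTIONS}")
        fields.append(f)
    labels = [f["label"] for f in fields]
    if len(set(labels)) != len(labels):
        raise ValueError("labels in main must be unique")
    return {**cfg, "meta": meta, "main": fields}


cfg = normalize_config({
    "from": "teensy",
    "meta": {"dirty": True},
    "main": [{"label": "time", "unit": "ms"},
             {"label": "unknown1", "discard": True},
             {"label": "wheel", "unit": "rad", "valid_range": [0, 4096]}],
})
```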

Short-ish example for spontaneous

from: teensy
author: MS
description: |
  Acquisition from Teensy 
  Max's 2p spontaneous experiment (year unknown)
meta: 
  dirty: true
  cleaners: 
    keep_only_valid_num_fields: true 
    remove_lonely_lines: true 
main:
  - label: time
    unit: ms
    valid_range: [1.0e-3, 30.0]
    action_on_invalid: remove
  - label: flow
    unit: cm^3/min
    category:
      - acquisition
      - behavior
  - label: wheel
    unit: rad
    valid_range: [0, 4096]
    category:
      - acquisition
      - behavior
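
The two `cleaners` options could work roughly as follows. This is a sketch under an assumed reading of "lonely": a line is dropped when none of its immediate neighbours in the raw file has the expected number of fields. `clean_lines` is hypothetical; `n_fields` would be `len(main)`, which is 3 for the spontaneous example above.

```python
def clean_lines(lines, n_fields, delimiter=",",
                keep_only_valid_num_fields=True, remove_lonely_lines=True):
    """Apply the two `meta.cleaners` options to raw text lines (hypothetical sketch)."""
    # A line is "valid" when it splits into exactly as many fields as `main`
    # lists (plain split; quoted delimiters are not handled).
    valid = [len(line.split(delimiter)) == n_fields for line in lines]
    kept = []
    for i, line in enumerate(lines):
        if keep_only_valid_num_fields and not valid[i]:
            continue
        if remove_lonely_lines:
            # Assumed reading of "lonely": no immediate neighbour in the raw file
            # is valid (so a file with a single line also counts as lonely).
            neighbours = [valid[j] for j in (i - 1, i + 1) if 0 <= j < len(lines)]
            if not any(neighbours):
                continue
        kept.append(line)
    return kept


# Made-up raw lines: partial writes and noise around a few good samples
raw = ["garbage", "1,2,3", "4,5", "6,7,8", "9,10,11", "x", "12,13,14", "y"]
print(clean_lines(raw, n_fields=3))  # ['6,7,8', '9,10,11']
```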

Usage

Requires numpy, pandas and pyyaml

from exp_csv_parser import read_acq_csv_file
df = read_acq_csv_file(
    info_file='SD_teensy.yml',
    data_file='SD_20210511_513_teensy.csv'
)
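
Inside `read_acq_csv_file`, the `main` list presumably supplies the column names for a headerless file and marks which columns to drop. The stdlib-only sketch below shows that step in isolation; `read_rows` is a hypothetical helper, not the draft function, and it assumes every kept field is numeric. The data lines are made up.

```python
import csv
import io


def read_rows(text, main, delimiter=",", has_headers=False):
    """Name columns from `main` and drop the `discard: true` ones (hypothetical)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if has_headers:
        next(reader, None)  # names come from the config, so skip the file's header
    kept = [(i, f["label"]) for i, f in enumerate(main) if not f.get("discard", False)]
    columns = {label: [] for _, label in kept}
    for row in reader:
        if len(row) != len(main):
            continue  # malformed line; normally the cleaners remove these first
        for i, label in kept:
            columns[label].append(float(row[i]))
    return columns


# Column layout of the longer SD_teensy.yml example (units omitted here)
main = [
    {"label": "time"}, {"label": "odor"}, {"label": "unknown1", "discard": True},
    {"label": "wheel"}, {"label": "trial"}, {"label": "unknown2", "discard": True},
]
data = "0,1,99,10,1,7\n5,1,98,12,1,7\n"
columns = read_rows(data, main)
```

With pandas, roughly the same effect comes from `pandas.read_csv(data_file, header=None, names=labels, usecols=kept_labels)`, where `labels` and `kept_labels` are illustrative names for all labels in `main` and the non-discarded ones.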

Notes on files

exp_csv_parser.py: draft code
SD_teensy.yml: the file above, for SD_20210511_513_teensy.csv in the data/Raw_Data_Example folder, corresponding to v2021.05:2p_imaging_head_fixed+Simon:Teensy
MS_2019_teensy.yml: for MS_20190710_M002_teensy.txt in the data folder, corresponding to v2019.07:2p_imaging_head_fixed+Max:Teensy
MS_spont.yml: for MS_spontdemo_teensy.txt in the data/spontaneous-activity folder, corresponding to v2021.04:2p_imaging_head_fixed+Max:Teensy_spontaneous-activity
MS_2022_teensy.yml: for the data referenced in issue fleischmann-lab/calcium-imaging/calimag#81 (closed) (not publicly available)

For @maxseppo's data, note that MS_2019_teensy.yml and MS_2022_teensy.yml differ only slightly, so the user would only need to modify a few lines.
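
Because such configs differ only in a few lines, a small diff by column label could show exactly what changed between two of them. The sketch below is hypothetical (`diff_fields` is not in exp_csv_parser.py), and the two configs are made-up fragments, not the actual MS_2019/MS_2022 files. It compares fields by label only, so it won't flag a change in column order, which matters for headerless files.

```python
def diff_fields(old, new):
    """Compare two configs' `main` lists by column label (hypothetical helper).

    Returns (added_labels, removed_labels, {label: [keys that differ]}).
    Column order is not compared.
    """
    old_by = {f["label"]: f for f in old["main"]}
    new_by = {f["label"]: f for f in new["main"]}
    added = sorted(new_by.keys() - old_by.keys())
    removed = sorted(old_by.keys() - new_by.keys())
    changed = {}
    for label in sorted(old_by.keys() & new_by.keys()):
        a, b = old_by[label], new_by[label]
        keys = sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
        if keys:
            changed[label] = keys
    return added, removed, changed


# Made-up fragments standing in for an older and a newer config
old = {"main": [{"label": "time", "unit": "ms"},
                {"label": "wheel", "unit": "rad", "valid_range": [0, 4096]}]}
new = {"main": [{"label": "time", "unit": "ms"},
                {"label": "wheel", "unit": "rad", "valid_range": [0, 1024]},
                {"label": "flow", "unit": "cm^3/min"}]}
print(diff_fields(old, new))  # (['flow'], [], {'wheel': ['valid_range']})
```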

Edited Aug 04, 2022 by Tuan Pham
