# Data preparation workflow
Record the x and y offsets as well as the width and height of the bounding box you chose.
![Screenshot_2023-02-04_at_3.29.53_PM](uploads/1611b662833a0ae3f76a537258c666d9/Screenshot_2023-02-04_at_3.29.53_PM.png)
# Generate/extract slices, window, and crop x-y
The goal of this step is to get a 16-bit .tif image stack. For benchtop sources, this is probably already done as part of the reconstruction process. For synchrotron scans, this step may be necessary. For example, in the 2019 Diamond Light Source scans, the reconstruction output 32-bit float .hdf files from which .tif slices need to be extracted. Often, as with the fragments scanned in that session, there is a separate .hdf file for each "slab". Slices should be extracted from each slab and then merged later. The extraction for multiple .hdf files can be done in one command.
```shell
sbatch -p CAL48M192_L --time=14-00:00:00 inkid_general_cpu.sh python /usr/local/ \
--crop-min-y 4555 \
--crop-height 1399
```
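If you are not sure what a given slab file contains, the standard HDF5 command-line tools can list its datasets before you extract anything; for example (the filename below is a placeholder):
```shell
# Recursively list the groups and datasets (with shapes and data types) in a slab
h5ls -r slab_file.hdf
```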
# Merge slices, crop z
## One-time setup
For convenience, a Singularity container and a skeleton SLURM script have been placed in the DRI Datasets drive directory under Resources. These allow easy use of the relevant volume cartographer scripts on the LCC servers and should be copied to your scratch space before you begin:
```shell
rclone copy dri-datasets-remote:/Resources/ $SCRATCH/data_processing_temp_space/ -v
```
The included SLURM script should be lightly edited to be specific to you; in particular, the email field should be changed to your address.
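The exact contents of the skeleton script may vary, but the lines to personalize typically look something like the following standard SLURM directives (the values shown are placeholders):
```shell
#SBATCH --mail-user=your.address@example.edu   # change to your email address
#SBATCH --mail-type=END,FAIL                   # send email when the job ends or fails
```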
## File transfer
You will need to move the source .hdf slab(s) to your scratch space on the LCC. This is because we will be running crop and packaging on one of the LCC worker nodes, on which the `/gemini1-3/` location is not mounted, so the files need to be moved somewhere the job nodes can find them.
On the LCC data transfer node:
```shell
rclone copy /path/to/large/dataset $SCRATCH/data_processing_temp_space/ -v --include "*.hdf"
```
This will copy the HDF5 source files to your scratch space, but not the previously extracted .tif files. To view those .tif files and determine the appropriate crop dimensions, you will still need to transfer them to your work machine and view them in ImageJ/Photoshop.
On your work machine:
```shell
rclone copy dtn-remote:/path/to/large/dataset ./data_processing_temp_space/ -v --include "*.tif"
```
## Running extract/crop on LCC
Now use sbatch and the previously copied SLURM script to run extract/crop on the LCC system. The parameters passed to this script are passed along to the hdf5_to_tif.py script included with volume cartographer, so they should be treated in the same way:
```shell
sbatch run_hdf_to_tif.sh \
  --input-file ./data_processing_temp_space/slab_file.hdf \
  --output-dir ./data_processing_temp_space/cropped_slab/ \
  --min-x min_val --max-x max_val \
  --min-y min_val --max-y max_val
```
This should run fairly quickly, and multiple slabs can be processed at once. Now you can transfer the cropped slices to your workstation with rclone and proceed to packaging as normal. Don't forget to include the details of the crop performed with your volume package, as specified below.
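For example, from your workstation (the remote name follows the earlier examples; the scratch path is a placeholder):
```shell
rclone copy dtn-remote:/path/to/your/scratch/data_processing_temp_space/cropped_slab/ ./data_processing_temp_space/cropped_slab/ -v
```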
# Transfer data
Get the original volume or slices onto the machine you are using to process the dataset. This depends on context, but typically we use scp, rclone, etc.
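For example (hostnames, remote names, and paths below are placeholders):
```shell
# Pull the original slices from a remote machine with scp...
scp -r user@remote-host:/path/to/original/slices ./slices/
# ...or with rclone, if a remote is configured for the source location
rclone copy source-remote:/path/to/original/slices ./slices/ -v
```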
# Crop slices (optional)
Many of our datasets are too large to be processed efficiently in their native format. Cropping is the preferred method for reducing size as it maintains the spatial resolution of the scan. Scan through the slices to determine a good bounding box for the object in the scan. Test your crop using the `convert` utility provided by ImageMagick, for example by creating a 9060x1794 image starting at pixel (670,830) in the `full_slice_0000.tif` input image.
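A sketch of such a command, using ImageMagick's `WxH+X+Y` crop geometry (the output filename is a placeholder):
```shell
# Crop a 9060x1794 region starting at pixel (670,830); +repage discards the
# virtual canvas offset so the output is a plain cropped image
convert full_slice_0000.tif -crop 9060x1794+670+830 +repage test_crop_0000.tif
```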