Update Data preparation workflow, authored by Stephen Parsons
For particularly large datasets (such as those split into slabs) the entire dataset may not fit on your desktop machine. In these cases it may then be more efficient to crop the source files on the source server before transferring them to your desktop for tasks requiring a graphical interface/user intervention. This allows for slabs to be processed in parallel up until volume packaging and greatly reduces the size of the initial data transfer.
## One-time setup
For convenience, a Singularity container and a skeleton SLURM script have been placed in the DRI Datasets drive directory under Resources. These allow easy use of the relevant volume cartographer scripts on the LCC servers, and should be copied to your scratch space before you begin:
```
rclone copy dri-datasets-remote:/Resources/ $SCRATCH/data_processing_temp_space/
```
The included SLURM script should be lightly edited to be specific to the user. In particular, the email field should be changed to the relevant address.
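For example, the lines to personalize might look like the following (these directive names are standard SLURM options, but the exact fields in the skeleton script may differ):

```shell
#!/bin/bash
#SBATCH --job-name=hdf_to_tif
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your.name@example.edu   # change this to your own address
```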
## File transfer
You will need to move the source hdf slab(s) to your scratch space on the LCC. This is because we will be running crop and packaging on one of the LCC worker nodes, for which the `/gemini1-3/` location is not mounted, so the files need to be moved somewhere the job nodes can find them:
On your work machine:

```
rclone copy dtn-remote:/path/to/large/dataset ./data_processing_temp_space/ -v --include "*.tif"
```
## Running extract/crop on LCC
Now use `sbatch` and the previously copied SLURM script to run extract/crop on the LCC system. The parameters passed to this script will be passed along to the `hdf5_to_tif.py` script included with volume cartographer, so they should be treated in the same way:
```
sbatch run_hdf_to_tif.sh --input-file ./data_processing_temp_space/slab_file.hdf
```
This should run fairly quickly and multiple slabs can be processed at once. Now you can transfer these cropped slices to your workstation with rclone and proceed to packaging as normal. Don't forget to include with your volume package the details of the crop performed, as specified below.
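The transfer back might look like the following (the remote name and paths here are assumptions; substitute your own):

```shell
rclone copy dtn-remote:/path/to/scratch/data_processing_temp_space/cropped ./data_processing_temp_space/cropped -v
```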
# Transfer data
Get the original volume or slices onto the machine you are using to process the dataset. This depends on context, but typically we use scp, rclone, etc.
# Crop slices (optional)
Many of our datasets are too large to be processed efficiently in their native format. Cropping is the preferred method for reducing size as it maintains the spatial resolution of the scan. Scan through the slices to determine a good bounding box for the object in the scan. Test your crop using the `convert` utility provided by ImageMagick. The following command creates a 9060x1794 image starting at pixel (670,830) in the `full_slice_0000.tif` input image:
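A sketch of that command (the input and output file names are assumptions; `+repage` resets the canvas offset left behind by `-crop`):

```shell
convert full_slice_0000.tif -crop 9060x1794+670+830 +repage crop_test.tif
```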
```
Output: cropped/
Crop: 9060x1794+670+830
```
# Resize volume (optional)
**Coming soon**
# Create .volpkg
If your object does not yet have a Volume Package, use `vc_packager` to create one. New packages require the following flags:
```
vc_packager -v ObjectName.volpkg -s MixedData/slices_%04d.tif
vc_packager -v ObjectName.volpkg -s OnlySlices/
```
# Segmentation
## Hidden layers
Now two options are available for segmenting hidden layers:
```
path/to/volume-cartographer/build/bin/vc_convert_pointset -i pointset_name.vcps
```
- (Fall 2020) Upload the merged point cloud and all pointsets (`mask_pointset.vcps` and `pointset.vcps` for each segmentation) to the DRI Experiments Shared Drive inside the folder linked here: https://drive.google.com/drive/folders/1U7wg1mGDlg6wLsRx_EtCIEMsNmh8yxJj?usp=sharing. Create a new folder named after the page number, and put the .vcps files, point clouds, and meshes inside that folder.
- **(Fall 2020: ignore this step)** These point clouds need further cleaning before they can be turned into a mesh. This process is done in Meshlab. Refer to the processing instructions for canny segmentation below (see the "Exposed layers" section.) The same steps will be necessary for these point clouds.
## Exposed layers
For flat, exposed layers, manually segment the layer using the canny edge detection segmentation utility. To better keep track of these manually defined segmentations, first make a working directory for your segmentation inside the Volume Package and then run `vc_canny_segment`:
The output of this process, `canny_raw.ply`, is a dense point set and requires further cleaning in Meshlab:
* `Mesh has N > 0 holes`: Run `Filters/Remeshing, Simplification and Reconstruction/Close Holes`. Adjust `Max size to be closed` to large values until all holes are closed.
10. Save your final mesh as a new file with a name which matches your working directory (e.g. `54kv_surface_layer.ply`). After selecting the output file location, a window with saving options will open. Click the box to uncheck `Binary encoding` to save the file in an ASCII format. **This is required for using this mesh with vc_render.**
# Texturing
All texturing should be performed with the `vc_render` command-line application. Do not use VC Texture.app.
## Segmentations from VC.app
Make a new working directory for your segmentation inside the Volume Package and provide `vc_render` with the volume package and segmentation ID of your segmentation:
```
cd working/54kv_internal_layer/
vc_render -v ../../ -s 20200125113143 --output-ppm 54kv_internal_layer.ppm --uv-plot 54kv_internal_layer_uvs.png --method 1 -o 54kv_internal_layer.obj
```
## Segmentations from canny segmentation
Provide `vc_render` with the volume package, the final mesh produced by Meshlab, and the ID of the segmented volume:
```
vc_render -v ../../ --input-mesh 54kv_surface_layer.ply --volume 20200125113143 --output-ppm 54kv_surface_layer.ppm --uv-algorithm 2 --uv-plot 54kv_surface_layer_uvs.png --method 1 -o 54kv_surface_layer.obj
```
## Retexturing the segmentation
The above commands generate a texture image using the Intersection texture method (`--method 1`). This is the fastest texturing method and will help you more quickly verify that your flattened surface is correctly oriented and contains no significant flattening errors. However, this image is not always useful for aligning the reference image. If you have difficulty finding point correspondences in the [registration step](#align-the-reference-image), use the `vc_render_from_ppm` utility to generate new texture images using alternative parameters:
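A sketch of such an invocation (the flag spellings and file names here are assumptions; check `vc_render_from_ppm --help` for the exact options):

```shell
vc_render_from_ppm -v ../../ -p 54kv_surface_layer.ppm --volume 20200125113143 \
    --method 2 --radius 10 -o 54kv_surface_layer_retextured.png
```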
There are many texturing parameters available in both `vc_render` and `vc_render_from_ppm`:
* Integral method: Return the sum of the neighborhood's intensity values. Sometimes shows subtle details that are missed by the Composite method. Enable with `--method 2`.
* Adjust the texturing radius: The size of the texturing neighborhood is automatically determined by the Volume Package's material thickness metadata field. Because this value is an estimate of a layer's thickness, it is sometimes too small or too large. To manually set the search radius, provide the `--radius` option with a real value in voxel units.
## Speeding up flattening and PPM generation
The processing times for flattening and PPM generation are sensitive to the number of faces in the segmentation mesh. In particular, meshes generated from the `vc_canny_segment` process are often densely sampled, thus leading to long processing times. For these meshes, use the mesh resampling options in `vc_render`:
```
vc_render -v ../../ --input-mesh 54kv_surface_layer.ply --volume 20200125113143
```
See `vc_render --help` for more options related to resampling. This flag is enabled by default for segmentation inputs passed with the `-s` option, but disabled for all inputs passed with `--input-mesh`. The number of vertices in the output mesh can be controlled with the `--mesh-resample-factor` option, which sets the approximate number of vertices per square millimeter in the resampled mesh. Newer versions of volume-cartographer (5abb42db and up) additionally have the `--mesh-resample-vcount` option, which exactly controls the number of vertices in the output mesh. Be careful not to set the vertex count value too low, as this can modify your mesh such that it no longer intersects the object's surface.
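Putting that together, a resampled render might look like the following (file names are assumptions, and since resampling is disabled by default for `--input-mesh` inputs you will likely also need the flag that enables it, whose exact name is listed in `vc_render --help`):

```shell
vc_render -v ../../ --input-mesh 54kv_surface_layer.ply --volume 20200125113143 \
    --mesh-resample-factor 20 \
    --output-ppm 54kv_surface_layer.ppm -o 54kv_surface_layer.obj
```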
## Fixing orientation errors
The various flattening (aka UV) algorithms available in volume-cartographer will produce flattened surfaces which are often flipped or rotated relative to what the observer would expect if they were to look at the surface in real life. The presence of these transformations may not become known until attempting to align the reference photograph to the generated texture image. **Textures, PPMs, and all subsequent steps should be updated to match the expected orientation when these problems are detected.**
Consult an expert or scholar to ensure the orientation at this stage is correct. To us CS folk, it can be easy to have text that looks correct but is actually mirrored, for example. This is the time to make sure it is oriented correctly!
# Align the reference image
## Using algorithmic registration
Using the [Landmark Picker GUI app](https://code.cs.uky.edu/seales-research/landmark-picker), generate a landmarks file which maps points in the reference photograph onto the same points in the texture image generated by the previous step.
```
rt_apply_transform -f 54kv_surface_layer.png -m PHercParis2_Fr143r_RGB.jpg -t 54
```
This can be useful if you wish to align an RGB photograph to the texture image when surface details can only be seen in an alternative channel (e.g., infrared).
## Manual registration using Photoshop's Puppet Warp
- Drag the files into Photoshop
- It will consolidate them into the render layer (you'll need to take the other layers back to 8-bit later)
- can re-enter puppet warp and make more changes if desired by double clicking "puppet warp" under smart filters in layers dialog
- might want to read up on puppet warp documentation
# Generate ink labels
Ink labels are black-and-white images which indicate those areas of the PPM which contain ink and those which do not. They are manually created in Photoshop using the following steps:
* Select `File/Save As...` and save this image as a PNG to your working directory (e.g. `54kv_surface_layer_inklabels.png`). **Be careful not to overwrite the Photoshop file saved previously.**
* Close Photoshop but **do not** save the Photoshop file.
# Region set
For now, manually create a region set `.json` file defining the training and prediction regions.
# Run ML
This step is based on the inkid documentation and examples, found in the README here. The “SLURM Jobs” section points you to documentation for running jobs using SLURM and Singularity. A prebuilt container is available here, so you shouldn’t have to go through the build process yourself.
# Uploading to Google Drive

# Gotchas
- **Transfer:**
  - Transferring >~1TB volumes can be surprisingly difficult. Google Drive offers us unlimited space but a maximum of 750GB/day upload/download. This only applies when beginning to transfer a new file, so it is still possible to deal with files that are >750GB; you just only get one per day. So, it is possible to create an archive file of a set of slices (usually compressing them takes more time than it is worth) and then just transfer the one file. For some reason rclone struggles with this, and has so far failed to work on files this large for me (it will spend a day getting to 70% before failing and starting over). I have had more luck just downloading the 1TB file through a web browser and the Google Drive web UI. Of course, that is also prone to paused or canceled downloads if you aren't careful.
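    For example, bundling a directory of slices into one archive for a single transfer (the paths are assumptions; compression is skipped since it usually takes more time than it saves):

    ```shell
    tar -cf volume_slices.tar slices/
    ```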
- **Resize/crop slices:**
  - Presently, only 16-bit TIF volumes are supported by Volume Cartographer. It is tempting when dealing with a particularly large volume to convert it to 8-bit to halve its size, in an attempt to fit it into memory for running `inkid`. If you do this, you'll be silently reminded later (by messed-up texture images) that 8-bit volumes are not supported yet.
  - ImageJ can load a set of slices for visualization/processing without holding them all in memory at once (if using "Virtual Stack"), which is quite nice. Unfortunately, to do any heavy lifting (resize/interpolate an entire volume) it has to load the whole thing into memory, which is not possible with large enough volumes. It can also batch process the slices as individual images, so I have tried that for cropping the slices of a volume. However, it messed with the dynamic range of the slices in some silent and unknown way: the slices looked fine individually, but had different mean brightness from each other. This was only discovered way down the line when looking at a texture image, which was marred with odd stripes to the point of being useless.
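    A quick check for this kind of silent brightness drift (assumes ImageMagick is installed; the path is an assumption):

    ```shell
    # Print each slice's mean intensity; the values should be roughly constant
    # from slice to slice after any batch processing step.
    for f in cropped/*.tif; do
        printf '%s %s\n' "$f" "$(identify -format '%[fx:mean]' "$f")"
    done
    ```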
- **Create .volpkg:**
  - Remember that we have a tool to do this, `vc_packager`. If you forget, as I have, you will manually do a lot of creating directories, moving/bulk-renaming slice files, copying/pasting/adjusting `config.json` files, etc. This lets you introduce all sorts of errors at the structural level, and you might, for example, accidentally use 8-bit slices and only later discover that they do not work.
- **Segment:**
  - Meshlab uses different z coordinates, so the mesh will look like a mirror image of what it's supposed to look like in your head or based on photos of the actual object. This is OK; just ignore it...
- **Texture:**
  - PPMs are presently made to match the volume resolution, so they are huge files (so far I have seen 11GB+) for high-resolution volumes. We don't yet have a way to create a lower-resolution texture that only samples some of these points. So creating a texture image from one of these PPMs takes ages, and of course if anything is wrong you won't know until the program is done executing. While some things cannot be checked until the texture is complete, use the file generated by `--uv-plot` to catch flattening and orientation problems early.
  - For now I have created `ink-id/scripts/misc/scale_down_ppm.py` to create a much smaller PPM from a large input PPM. This just scales down by some power of 2 and resamples, with no interpolation or anything like that. It allows one to more quickly create texture images to sanity-check the process, but it is not a solution to the general problem of PPMs having a resolution fixed to a particular scan.
- **Align reference:**
- **Label (mask/image):**
- **Region set:**
- **Run ML:**