Files may be distributed to begin with. Check the following documents and locations:
* `gemini1-2:/mnt/gemini1-4/seales_uksr/`
* `lcc:/pscratch/seales_uksr/`
## Optional: Extract/crop hdf files on LCC resources
# Generate/extract slices
For particularly large datasets (such as those split into slabs) the entire dataset may not fit on your desktop machine. In these cases it may be more efficient to crop the source files on the source server before transferring them to your desktop for tasks requiring a graphical interface/user intervention. This allows slabs to be processed in parallel up until volume packaging and greatly reduces the size of the initial data transfer.
The goal of this step is to get a 16-bit .tif image stack. For benchtop sources, this has probably already been done as part of the reconstruction process. For synchrotron scans, this step may be necessary. For example, in the 2019 Diamond Light Source scans, the reconstruction output 32-bit float .hdf files from which .tif slices need to be extracted. Often, as with the fragments scanned in that session, there is a separate .hdf for each "slab". Slices should be extracted from each, then merged later.
The range of values in the float .hdf is not the same as the 16-bit integer representation, so the values need to be stretched to \[0-65535\] during this process. The script that extracts the slices can do this, but it needs an input range specified. For this range we use the 1st and 99th percentiles of the input float files. To find these values, one can use this script:
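The original script is not reproduced here, but the underlying calculation is simple. Below is a minimal sketch of it in plain Python; in practice the values would be sampled from the .hdf slabs (e.g. with h5py), and the function names and sample data here are illustrative only:

```python
def percentile(values, p):
    """Nearest-rank percentile of a sequence of floats (0 < p <= 100)."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, int(round(p / 100.0 * len(s))) - 1))
    return s[k]

def stretch_to_uint16(value, lo, hi):
    """Map a float in [lo, hi] onto [0, 65535], clipping out-of-range values."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    return int(round(t * 65535))

# Stand-in for float values sampled from the .hdf slabs: the 1st/99th
# percentiles give the input range, and every voxel is then clipped and
# rescaled into 16 bits during slice extraction.
data = [0.01 * i for i in range(1000)]
lo, hi = percentile(data, 1), percentile(data, 99)
print(stretch_to_uint16(lo, lo, hi))  # -> 0
print(stretch_to_uint16(hi, lo, hi))  # -> 65535
```

Using the 1st/99th percentiles rather than the true min/max keeps a few extreme outlier voxels from compressing the useful part of the dynamic range.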
* If you enabled dump-vis, a new directory called 'debugvis' will appear inside the working directory. Two directories inside, called 'mask' and 'skeleton', contain images that show what is being segmented. You can use these images as a reference to help you determine when to stop the segmentation.
To obtain a good-quality segmentation, the mask must cover the majority of the layer of interest, but it is fine if some small parts aren't covered or parts of neighboring pages get segmented too. This is an example of a good-quality segmentation: https://drive.google.com/file/d/1_qzL2L2gZpYHYUJznCZENbsW2ueUj8_\_/view?usp=sharing
Check the segmentation occasionally. If a lot of neighboring pages are getting segmented, or if the segmentation loses the layer you are segmenting, use Ctrl-C to kill vc_segment **provided you ran vc_segment with `--save-interval 1`**.
...
...
Based on inkid documentation and examples, found in the README here. The “SLUR
## Gotchas
- **Transfer:**
- Transferring >\~1TB volumes can be surprisingly difficult. Google Drive offers us unlimited space but a maximum of 750GB/day upload/download. This limit only applies when beginning to transfer a new file, so it is still possible to deal with files >750GB; you just only get one per day. It can therefore make sense to create an archive file of a set of slices (usually compressing them takes more time than it is worth) and transfer that one file. For some reason rclone struggles with this and has so far failed on files this large for me (it will spend a day getting to 70% before failing and starting over). I have had more luck downloading the 1TB file through a web browser and the Google Drive web UI, though that is also prone to paused or canceled downloads if you aren't careful.
- Transferring ><spandir="">\~</span>1TB volumes can be surprisingly difficult. Google Drive offers us unlimited space but maximum 750GB/day upload/download. This only applies when beginning to transfer a new file, so it is still possible to deal with files that are >750GB, you just only get one per day. So, it is possible to create an archive file of a set of slices (usually compressing them takes more time than it is worth) and then just transfer the one file. For some reason rclone struggles with this, and has so far failed to work on files this large for me (but it will spend a day getting to 70% before failing and starting over). I have had more luck just downloading the 1TB file through a web browser and the Google Drive web UI. Of course that is also prone to paused or canceled downloads if you aren't careful.
- **Resize/crop slices:**
- Presently, only 16-bit TIF volumes are supported by Volume Cartographer. It is tempting when dealing with a particularly large volume to convert it to 8-bit to reduce the size by half, in an attempt to fit it into memory for running `inkid`. If you do this you'll be silently reminded later (by messed up texture images) that 8-bit volumes are not supported yet.
- ImageJ can load a set of slices for visualization/processing without holding them all in memory at once (if using "Virtual Stack"). This is quite nice. Unfortunately to do any heavy lifting (resize/interpolate an entire volume) it has to load the whole thing into memory, which is not possible with large enough volumes. It can also batch process the slices as individual images, so I have tried that for cropping the slices of a volume. However, it messed with the dynamic range of the slices in some silent and unknown way. The slices looked fine individually, but had different mean brightness from each other. This was only discovered way down the line when looking at a texture image, which was marred with odd stripes to the point of being useless.
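One way to sidestep that kind of silent rescaling is to crop the slices with a tool that copies pixels verbatim. A sketch using Pillow, which handles 16-bit grayscale (`I;16`) TIFFs; the directory names and crop box below are placeholders, not from the original workflow:

```python
import glob
import os

from PIL import Image  # Pillow; reads/writes 16-bit "I;16" TIFFs

SRC = "slices"               # placeholder: directory of input .tif slices
DST = "slices_cropped"       # placeholder: output directory
BOX = (100, 100, 900, 900)   # placeholder (left, upper, right, lower) box

os.makedirs(DST, exist_ok=True)
for path in sorted(glob.glob(os.path.join(SRC, "*.tif"))):
    img = Image.open(path)
    # crop() copies pixel values unchanged -- no rescaling, no mode change,
    # so the per-slice brightness stays exactly as reconstructed
    img.crop(BOX).save(os.path.join(DST, os.path.basename(path)))
```

Because each slice is opened, cropped, and saved independently, memory use stays at one slice regardless of volume size, and the dynamic range of each slice is untouched.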