
Resolve "Speed up processing"

David Maier requested to merge 29-speed-up-processing into develop

Closes #29

Write a faster processing function that integrates into the stopos workflow and stays under the time limit of the Cartesius express queue. The speedup should come from no longer combining all data first, only to slice it into the exact same parts again for processing.

The dataset I am testing with took over 600 s to combine and then around 3600 s to process, for a total of about 4200 s (70 min). The goal is to bring this down by at least one order of magnitude.

Update: I wrote a small function that skips the merging of the dataframes. However, the results are not sufficient yet:

While I managed to get rid of the 600 s needed for combining the dataframes entirely, the processing time itself only dropped by ~50%. Total processing time for the same dataset: ~1800 s.
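
To illustrate the idea of skipping the merge, here is a minimal sketch, not the actual function from this merge request. It assumes Python, uses plain lists as a stand-in for the dataframe parts, and `process_part`, `process_after_merging`, and `process_without_merging` are hypothetical names chosen for the example.

```python
from typing import Iterator, List

Part = List[float]  # stand-in for one dataframe part (assumption)


def process_part(part: Part) -> float:
    """Hypothetical per-part processing step; here simply a sum."""
    return sum(part)


def process_after_merging(parts: List[Part]) -> List[float]:
    """Old approach: combine everything, then slice it into the same parts again."""
    sizes = [len(part) for part in parts]
    combined = [value for part in parts for value in part]  # the costly combine step
    results, start = [], 0
    for size in sizes:
        results.append(process_part(combined[start:start + size]))
        start += size
    return results


def process_without_merging(parts: List[Part]) -> Iterator[float]:
    """New approach: process each part as it comes, without a combined copy."""
    for part in parts:
        yield process_part(part)


parts = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
```

Both functions return the same per-part results; the second one simply never builds the combined copy, which is where the 600 s went.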

Update 2: Multiprocessing has now successfully been implemented.

On my local machine this gives another factor of 2, for a total speedup of a bit over 4.
However, on a Cartesius node with upwards of 20 available CPUs, the total speedup of the new function will be over a factor of 40, bringing the time needed to process the data down from over one hour to about 1 minute.
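
Since the parts are independent, they can be handed to a pool of worker processes. The following is a rough sketch of that pattern, assuming Python's standard `multiprocessing` module; it is not the code from this merge request, and `process_part` and `process_parts` are hypothetical names with a placeholder workload.

```python
import multiprocessing as mp
import os
from typing import List, Optional


def process_part(part: List[float]) -> float:
    """Hypothetical CPU-bound step for one part; here a sum of squares."""
    return sum(value * value for value in part)


def process_parts(parts: List[List[float]], workers: Optional[int] = None) -> List[float]:
    """Process every part, in parallel when more than one worker is useful."""
    if workers is None:
        workers = os.cpu_count() or 1  # default: one worker per available CPU
    # Never start more workers than there are parts, and always at least one.
    workers = max(1, min(workers, len(parts)))
    if workers == 1:
        # Serial fallback: avoids the overhead of starting a pool.
        return [process_part(part) for part in parts]
    with mp.Pool(processes=workers) as pool:
        # pool.map preserves the input order of the parts in its result.
        return pool.map(process_part, parts)
```

When more than one worker is used, the call should sit under an `if __name__ == "__main__":` guard in the calling script, because some start methods re-import the main module in each worker. On a node with around 20 CPUs, leaving `workers` at its default lets the pool use all of them.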

Edited by David Maier
