Investigate performance for GPX data collection
Most of the upload time is spent gathering data from the GPX files (such as distance traveled, max speed, ...) with gpxpy. The gpxpy repository states [1]:
If lxml is available, then it will be used for XML parsing, otherwise minidom is used. Lxml is 2-3 times faster so, if you can choose -- use it.
In addition to using lxml, running under pypy can speed up the pure Python computations.
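When comparing interpreter and parser combinations, it helps to record which combination a given run actually used. The following is a minimal sketch using only the standard library; the function name `describe_environment` is made up for illustration and is not part of fietsboek or gpxpy. It only checks whether lxml is importable, not whether gpxpy ends up using it.

```python
import importlib.util
import platform


def describe_environment():
    """Return the interpreter name/version and whether lxml can be imported."""
    return {
        # "CPython" or "PyPy", as reported by the running interpreter
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        # find_spec returns None when the top-level package is not installed
        "lxml_available": importlib.util.find_spec("lxml") is not None,
    }


if __name__ == "__main__":
    print(describe_environment())
```

Printing this at the start of a test run makes it easier to match a timing to its configuration.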
However, some quick tests show that the picture might not be that simple. pytest reports the following running times:
# Pypy, no lxml
platform linux -- Python 3.8.13[pypy-7.3.9-final], pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/pypy3/.pytest_cache
rootdir: /mnt/home/daniel/Coding/fietsboek, configfile: pytest.ini, testpaths: fietsboek, tests
plugins: cov-3.0.0
collected 26 items
tests/integration/test_browse.py .. [ 7%]
tests/integration/test_login.py .... [ 23%]
tests/integration/test_smoke.py . [ 26%]
tests/integration/test_upload.py ... [ 38%]
tests/unit/test_util.py .............. [ 92%]
tests/unit/views/test_browse.py .. [100%]
============================================================================================================= 26 passed in 46.29s =============================================================================================================
# CPython, no lxml
platform linux -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/python/.pytest_cache
rootdir: /mnt/home/daniel/Coding/fietsboek, configfile: pytest.ini, testpaths: fietsboek, tests
plugins: cov-3.0.0
collected 26 items
tests/integration/test_browse.py .. [ 7%]
tests/integration/test_login.py .... [ 23%]
tests/integration/test_smoke.py . [ 26%]
tests/integration/test_upload.py ... [ 38%]
tests/unit/test_util.py .............. [ 92%]
tests/unit/views/test_browse.py .. [100%]
============================================================================================================== warnings summary ===============================================================================================================
tests/integration/test_browse.py: 25 warnings
<frozen importlib._bootstrap_external>:572: DeprecationWarning: find_module() is deprecated and slated for removal in Python 3.12; use find_spec() instead
tests/integration/test_browse.py: 25 warnings
<frozen importlib._bootstrap_external>:1523: DeprecationWarning: FileFinder.find_loader() is deprecated and slated for removal in Python 3.12; use find_spec() instead
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================================================== 26 passed, 50 warnings in 56.38s =======================================================================================================
# Pypy, with lxml
platform linux -- Python 3.8.13[pypy-7.3.9-final], pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/pypy3/.pytest_cache
rootdir: /mnt/home/daniel/Coding/fietsboek, configfile: pytest.ini, testpaths: fietsboek, tests
plugins: cov-3.0.0
collected 26 items
tests/integration/test_browse.py .. [ 7%]
tests/integration/test_login.py .... [ 23%]
tests/integration/test_smoke.py . [ 26%]
tests/integration/test_upload.py ... [ 38%]
tests/unit/test_util.py .............. [ 92%]
tests/unit/views/test_browse.py .. [100%]
======================================================================================================= 26 passed in 351.17s (0:05:51) ========================================================================================================
# CPython, with lxml
platform linux -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/python/.pytest_cache
rootdir: /mnt/home/daniel/Coding/fietsboek, configfile: pytest.ini, testpaths: fietsboek, tests
plugins: cov-3.0.0
collected 26 items
tests/integration/test_browse.py .. [ 7%]
tests/integration/test_login.py .... [ 23%]
tests/integration/test_smoke.py . [ 26%]
tests/integration/test_upload.py ... [ 38%]
tests/unit/test_util.py .............. [ 92%]
tests/unit/views/test_browse.py .. [100%]
============================================================================================================== warnings summary ===============================================================================================================
tests/integration/test_browse.py: 25 warnings
<frozen importlib._bootstrap_external>:572: DeprecationWarning: find_module() is deprecated and slated for removal in Python 3.12; use find_spec() instead
tests/integration/test_browse.py: 25 warnings
<frozen importlib._bootstrap_external>:1523: DeprecationWarning: FileFinder.find_loader() is deprecated and slated for removal in Python 3.12; use find_spec() instead
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================= 26 passed, 50 warnings in 70.04s (0:01:10) ==================================================================================================
The combination of pypy and lxml is by far the slowest in the tests (351.17s), and even on CPython the run without lxml (56.38s) is faster than the run with lxml (70.04s). Of course, those tests don't represent a proper benchmark and therefore should be taken with a grain of salt, but the difference does look quite stark.
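A more focused measurement than the full test suite could isolate the XML parsing step. The sketch below is only an assumption about how such a micro-benchmark might look: it parses a synthetic GPX document with the standard library's ElementTree and, if it is installed, with lxml. It does not go through gpxpy, so it says nothing about gpxpy's own computations; `make_gpx`, `best_time` and `run_benchmark` are illustrative names.

```python
import timeit
import xml.etree.ElementTree as stdlib_etree

try:
    from lxml import etree as lxml_etree  # optional dependency
except ImportError:
    lxml_etree = None

GPX_NS = "http://www.topografix.com/GPX/1/1"


def make_gpx(n_points):
    """Build a synthetic GPX document (as bytes) with n_points track points."""
    points = "".join(
        f'<trkpt lat="{50 + i * 1e-5:.5f}" lon="{8 + i * 1e-5:.5f}">'
        "<ele>100</ele></trkpt>"
        for i in range(n_points)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<gpx version="1.1" creator="benchmark" xmlns="{GPX_NS}">'
        f"<trk><trkseg>{points}</trkseg></trk></gpx>"
    ).encode("utf-8")


def best_time(parse, data, repeat=5, number=3):
    """Return the best per-call time in seconds for parse(data)."""
    timings = timeit.repeat(lambda: parse(data), repeat=repeat, number=number)
    return min(timings) / number


def run_benchmark(n_points=20000):
    """Time each available parser on the same synthetic input."""
    data = make_gpx(n_points)
    results = {"stdlib": best_time(stdlib_etree.fromstring, data)}
    if lxml_etree is not None:
        results["lxml"] = best_time(lxml_etree.fromstring, data)
    return results


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name}: {seconds * 1000:.1f} ms per parse")
```

Running the same script under both CPython and pypy would give four numbers comparable to the four test runs above, without the noise from the rest of the test suite.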
It is probably worth investigating where the slowdown comes from and how the situation could be improved. For now, I think it is worth testing the application on pypy, since that seems to provide the best performance without any code changes (yet); that would allow users to run fietsboek with pypy. For the future, we might think about a Rust-based extension to do the GPX computations for us, if the speed really does become a concern.
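One possible starting point for that investigation is the standard library profiler. The sketch below wraps an arbitrary call in cProfile and returns the report sorted by cumulative time. The workload here (`total_distance` over a list of coordinates, using a haversine formula) is a hypothetical stand-in, not fietsboek's or gpxpy's actual code; in practice one would pass the function that processes an uploaded GPX file.

```python
import cProfile
import io
import math
import pstats


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a sphere of radius 6371 km."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def total_distance(points):
    """Sum the distances between consecutive (lat, lon) pairs."""
    return sum(haversine(*a, *b) for a, b in zip(points, points[1:]))


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    track = [(50 + i * 1e-5, 8 + i * 1e-5) for i in range(100000)]
    distance, report = profile_call(total_distance, track)
    print(f"distance: {distance:.1f} m")
    print(report)
```

Comparing such reports between the lxml and non-lxml runs should show whether the extra time is spent in parsing or in the pure Python computations.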