Investigate performance for GPX data collection

Most of the upload time is spent gathering data from the GPX files (such as distance traveled and maximum speed) with gpxpy. The gpxpy repository states [1]:

If lxml is available, then it will be used for XML parsing, otherwise minidom is used. Lxml is 2-3 times faster so, if you can choose -- use it.

In addition to using lxml, using pypy can speed up the pure Python computations.
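To keep track of which of these combinations a given run actually uses, a small check like the following might help. It is only a sketch: the function name `describe_environment` is made up for illustration, and it reports whether lxml is importable, not whether gpxpy ends up using it.

```python
import importlib.util
import platform


def describe_environment():
    """Report the interpreter and whether lxml can be imported.

    Hypothetical helper for labelling test runs; it does not inspect
    gpxpy itself, only what is installed in the current environment.
    """
    return {
        # "CPython" or "PyPy" for the two interpreters compared here
        "implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
        # find_spec returns None when the top-level package is missing
        "lxml_available": importlib.util.find_spec("lxml") is not None,
    }
```

Printing the result at the start of a test run would make it easier to match timings to configurations.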

However, some quick tests suggest that the picture is not that simple. pytest reports the following running times for the four combinations:

# Pypy, no lxml
platform linux -- Python 3.8.13[pypy-7.3.9-final], pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/pypy3/.pytest_cache
rootdir: /mnt/home/daniel/Coding/fietsboek, configfile: pytest.ini, testpaths: fietsboek, tests
plugins: cov-3.0.0
collected 26 items                                                                                                                                                                                                                            

tests/integration/test_browse.py ..                                                                                                                                                                                                     [  7%]
tests/integration/test_login.py ....                                                                                                                                                                                                    [ 23%]
tests/integration/test_smoke.py .                                                                                                                                                                                                       [ 26%]
tests/integration/test_upload.py ...                                                                                                                                                                                                    [ 38%]
tests/unit/test_util.py ..............                                                                                                                                                                                                  [ 92%]
tests/unit/views/test_browse.py ..                                                                                                                                                                                                      [100%]

============================================================================================================= 26 passed in 46.29s =============================================================================================================

# CPython, no lxml
platform linux -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/python/.pytest_cache
rootdir: /mnt/home/daniel/Coding/fietsboek, configfile: pytest.ini, testpaths: fietsboek, tests
plugins: cov-3.0.0
collected 26 items                                                                                                                                                                                                                            

tests/integration/test_browse.py ..                                                                                                                                                                                                     [  7%]
tests/integration/test_login.py ....                                                                                                                                                                                                    [ 23%]
tests/integration/test_smoke.py .                                                                                                                                                                                                       [ 26%]
tests/integration/test_upload.py ...                                                                                                                                                                                                    [ 38%]
tests/unit/test_util.py ..............                                                                                                                                                                                                  [ 92%]
tests/unit/views/test_browse.py ..                                                                                                                                                                                                      [100%]

============================================================================================================== warnings summary ===============================================================================================================
tests/integration/test_browse.py: 25 warnings
  <frozen importlib._bootstrap_external>:572: DeprecationWarning: find_module() is deprecated and slated for removal in Python 3.12; use find_spec() instead

tests/integration/test_browse.py: 25 warnings
  <frozen importlib._bootstrap_external>:1523: DeprecationWarning: FileFinder.find_loader() is deprecated and slated for removal in Python 3.12; use find_spec() instead

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================================================== 26 passed, 50 warnings in 56.38s =======================================================================================================

# Pypy, with lxml
platform linux -- Python 3.8.13[pypy-7.3.9-final], pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/pypy3/.pytest_cache
rootdir: /mnt/home/daniel/Coding/fietsboek, configfile: pytest.ini, testpaths: fietsboek, tests
plugins: cov-3.0.0
collected 26 items                                                                                                                                                                                                                            

tests/integration/test_browse.py ..                                                                                                                                                                                                     [  7%]
tests/integration/test_login.py ....                                                                                                                                                                                                    [ 23%]
tests/integration/test_smoke.py .                                                                                                                                                                                                       [ 26%]
tests/integration/test_upload.py ...                                                                                                                                                                                                    [ 38%]
tests/unit/test_util.py ..............                                                                                                                                                                                                  [ 92%]
tests/unit/views/test_browse.py ..                                                                                                                                                                                                      [100%]

======================================================================================================= 26 passed in 351.17s (0:05:51) ========================================================================================================

# CPython, with lxml
platform linux -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/python/.pytest_cache
rootdir: /mnt/home/daniel/Coding/fietsboek, configfile: pytest.ini, testpaths: fietsboek, tests
plugins: cov-3.0.0
collected 26 items                                                                                                                                                                                                                            

tests/integration/test_browse.py ..                                                                                                                                                                                                     [  7%]
tests/integration/test_login.py ....                                                                                                                                                                                                    [ 23%]
tests/integration/test_smoke.py .                                                                                                                                                                                                       [ 26%]
tests/integration/test_upload.py ...                                                                                                                                                                                                    [ 38%]
tests/unit/test_util.py ..............                                                                                                                                                                                                  [ 92%]
tests/unit/views/test_browse.py ..                                                                                                                                                                                                      [100%]

============================================================================================================== warnings summary ===============================================================================================================
tests/integration/test_browse.py: 25 warnings
  <frozen importlib._bootstrap_external>:572: DeprecationWarning: find_module() is deprecated and slated for removal in Python 3.12; use find_spec() instead

tests/integration/test_browse.py: 25 warnings
  <frozen importlib._bootstrap_external>:1523: DeprecationWarning: FileFinder.find_loader() is deprecated and slated for removal in Python 3.12; use find_spec() instead

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================= 26 passed, 50 warnings in 70.04s (0:01:10) ==================================================================================================

The combination of pypy and lxml is by far the slowest in the tests (351.17s, compared to 46.29s for pypy without lxml), and even on CPython the run without lxml (56.38s) is faster than the run with lxml (70.04s). Of course, those test runs are not a proper benchmark and should therefore be taken with a grain of salt, but the difference does look quite stark.
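A more targeted measurement could time the GPX handling directly instead of the whole test suite. The following is a minimal sketch, not fietsboek code: `benchmark`, `summarize` and `time_gpx_phases` are hypothetical names, and the choice of `length_3d()` and `get_moving_data()` as the "computation" step is an assumption about which gpxpy calls matter for the upload. Repeating the call several times should also help on pypy, where the first iterations include JIT warm-up.

```python
import statistics
import time


def benchmark(func, *args, repeat=5):
    """Call func(*args) `repeat` times and return the wall time of each call."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings (in seconds) to min, median and max."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


def time_gpx_phases(xml_text):
    """Time XML parsing and statistics computation separately.

    Assumption: these two gpxpy calls stand in for the data gathering
    done during an upload. gpxpy is imported lazily so that the timing
    helpers above remain usable without it.
    """
    import gpxpy  # third-party dependency

    start = time.perf_counter()
    gpx = gpxpy.parse(xml_text)
    parsed = time.perf_counter()
    gpx.length_3d()
    gpx.get_moving_data()
    done = time.perf_counter()
    return {"parse": parsed - start, "compute": done - parsed}
```

Running `time_gpx_phases` on the same GPX file under each of the four configurations would show whether the extra time goes into parsing (where lxml is involved) or into the pure Python computations.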

It is probably worth investigating where the slowdown comes from and how the situation could be improved. For now, I think it is worth testing the application on pypy, since that seems to provide the best performance without any code changes (yet); users could then choose to run fietsboek with pypy. In the future, we might think about a Rust-based extension to do the GPX computations for us, if speed really does become a concern.
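One way to start locating the slowdown is to profile a single data-gathering call with the standard library's cProfile. The helper below is a sketch under that assumption; `profile_call` is a hypothetical name, and the profiler's own overhead may differ between CPython and pypy, so the output is better suited to finding hot spots than to comparing absolute times.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile.

    Returns the function's result together with a text report of the
    `top` entries, sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        # Always stop profiling, even if func raises
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

For example, `profile_call(gpxpy.parse, xml_text)` would show whether most of the time is spent inside the XML parser or in gpxpy's own Python code.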