Proposal for publishing a versioned openipf dataset snapshot on Kaggle

Hello OpenPowerlifting Team,

First of all, thank you for the fantastic work you do in maintaining and sharing such a comprehensive dataset. It's an incredible resource.

My name is Stefano Librizzi, and I am a data science student from Italy. I recently completed a statistical analysis using your data to identify structural profiles of athletes within the IPF. The full project (R Markdown code and PDF report) is available in my GitHub repository here: https://github.com/StefanoLibrizzi/Structural-Profiles-in-Powerlifting

I used your official openipf dataset for my analysis, and I was very impressed to see that it is updated almost daily. This is amazing for keeping the data current, but it also presents a challenge for reproducibility on platforms like Kaggle.

My goal is to publish my analysis as a public, interactive notebook on Kaggle. For it to be scientifically valid and useful to others, it must be fully reproducible, meaning the code needs to run on the exact same data I used.

Therefore, before proceeding, I wanted to ask for your permission and guidance. My proposed solution is to publish a static, versioned snapshot of the openipf dataset on Kaggle (e.g., openipf-dataset-snapshot-2025-10-25). This would create a permanent, citable data source that exactly matches the data used in my analysis.

Of course, if you grant permission, I would:

  1. Clearly state in the dataset description that it is a static snapshot.
  2. Prominently credit OpenPowerlifting as the sole and official source.
  3. Provide direct links back to your main project and live dataset.

Please let me know if this approach works for you. If you have a different preferred method for citing and using your data on Kaggle, I would be grateful for your direction.

Thank you for your time and for considering this.

Best regards,

Stefano Librizzi