Proposal for publishing a versioned openipf dataset snapshot on Kaggle
Hello OpenPowerlifting Team,
First of all, thank you for the fantastic work you do in maintaining and sharing such a comprehensive dataset. It's an incredible resource.
My name is Stefano Librizzi, and I am a data science student from Italy. I recently completed a statistical analysis using your data to identify structural profiles of athletes within the IPF. The full project (R Markdown code and PDF report) is available in my GitHub repository: https://github.com/StefanoLibrizzi/Structural-Profiles-in-Powerlifting
I used your official openipf dataset for my analysis, and I was very impressed to see that it is updated almost daily. This is great for keeping the data current, but it also presents a challenge for reproducibility on platforms like Kaggle: a notebook that loads the live dataset will run on different data each time it is updated.
My goal is to publish my analysis as a public, interactive notebook on Kaggle. For it to be scientifically valid and useful to others, it must be fully reproducible, meaning the code needs to run on the exact same data I used.
Therefore, before proceeding, I wanted to ask for your permission and guidance. My proposed solution is to publish a static, versioned snapshot of the openipf dataset on Kaggle (e.g. openipf-dataset-snapshot-2025-10-25). This would create a permanent, citable data source that exactly matches the data used in my analysis.
Of course, if you grant permission, I would:
- Clearly state in the dataset description that it is a static snapshot.
- Prominently credit OpenPowerlifting as the sole and official source.
- Provide direct links back to your main project and live dataset.
Please let me know if this approach works for you. If you prefer a different method for citing and using your data on Kaggle, I would be grateful for your direction.
Thank you for your time and for considering this.
Best regards,
Stefano Librizzi