OpenPowerlifting
Opened Dec 27, 2016 by Sean Stangl@sstangl

Division data is very dirty

Created by: ajvondrak

Thanks for gathering all this data! I got curious about doing some analysis on it (e.g., percentiles by various criteria). Don't know if you'll be able to use any of the code I'm writing, per se, but I hope I can give you some sort of input of value.

As of my build of the latest data, there are 2,551 distinct divisions in the lifters data. I figured this was down to how ad hoc divisions are across federations (everybody gets a 🏆). There's something to be said for keeping the raw data exactly as the meet director chose to encode divisions. Still, there's a lot of normalization that could be applied, which would make analysis easier:

  • Many specify the sex of the lifter, which should ideally be given by the Sex column.
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i 'men' | wc -l
436
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i 'men' | head -10
13-15 Junior Men
13-15 Junior Women
13-15 Men
13-15 Teen Men
13-15 Women
148Submaster Women 35-39
16-17 Junior Men
16-17 Men
16-17 Teen Men
16-18 Junior Men
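
One way the sex could be pulled out of a division string is sketched below. This is only an illustration, not code from the project: `split_sex` and `SEX_WORDS` are hypothetical names, and the `M`/`F` codes are an assumption about what the Sex column holds. It matches "Men"/"Women" as whole words, which is stricter than the substring match `grep -i 'men'` does above.

```python
import re

# Assumed mapping from the sex words seen in the listing to Sex column codes.
SEX_WORDS = {"men": "M", "women": "F"}

# "Men" or "Women" as a whole word, case-insensitive.
SEX_RE = re.compile(r"\b(women|men)\b", re.IGNORECASE)


def split_sex(division):
    """Return (division_without_sex_word, sex_code), or (division, None)."""
    match = SEX_RE.search(division)
    if match is None:
        return division, None
    sex = SEX_WORDS[match.group(1).lower()]
    # Drop the first sex word, then collapse the whitespace it leaves behind.
    stripped = SEX_RE.sub(" ", division, count=1)
    return " ".join(stripped.split()), sex


if __name__ == "__main__":
    for name in ["13-15 Junior Men", "13-15 Women", "148Submaster Women 35-39"]:
        print(name, "->", split_sex(name))
```

A real cleanup would probably also want to compare the extracted code against the existing Sex column and flag rows where the two disagree, rather than overwrite anything.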
  • Some contain the weight class, which should ideally be given by the WeightClassKg column.
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i '198' 
198.25 DL 40-44
198.25 IM 35-39
198.25 IM 50-54
198.25 IM 65-69
198.25 RB 14-15
198.25 RB 20-23
198.25 RB 45-49
198.25 RB 50-54
198.25 RB OPEN
198.25 SB 40-44
Heavy group (181 198 ) Wilks formula
Heavywt group (181 198 )
Heavywt women-148-198
Heavywt women-148-198 by Wilks formula
Heavywt women-198
Lightwt group (181 198)
Lightwt group (198 220) by Wilks formula
Medium group (198 220 242) Wilks formula
Middlewt group (198 220) by Wilks formula
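
A rough sketch of peeling a leading weight class off a division follows. It rests on two assumptions that the data itself does not confirm: that a leading number such as `198.25` is a weight class in pounds, and that converting it to kilograms is what WeightClassKg wants. `split_leading_weight_class` is a hypothetical helper. Entries like "Heavy group (181 198 )" list several classes and are deliberately left untouched here.

```python
import re

LB_TO_KG = 0.45359237  # definition of the avoirdupois pound in kilograms

# A leading number (optionally with decimals), whitespace, then the remainder.
LEADING_CLASS_RE = re.compile(r"^(\d+(?:\.\d+)?)\s+(.+)$")


def split_leading_weight_class(division):
    """Return (rest_of_division, weight_class_kg), or (division, None)."""
    match = LEADING_CLASS_RE.match(division)
    if match is None:
        return division, None
    pounds = float(match.group(1))  # assumption: the number is in pounds
    return match.group(2), round(pounds * LB_TO_KG, 2)


if __name__ == "__main__":
    for name in ["198.25 RB OPEN", "198.25 DL 40-44", "Heavywt women-198"]:
        print(name, "->", split_leading_weight_class(name))
```

Age ranges such as "13-15 Junior Men" are not mistaken for weight classes, because the number has to be followed by whitespace. The converted value would still need snapping to whatever class boundaries the federation actually uses.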
  • Many specify raw vs equipped, which should ideally be given by the Equipment column.
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep '\bR' | tail -10
R-T3
R-T-3
R-THW
R-T&J
R-TJR
R-Var
R-Y
R-Y1
R-Y2
R-Y3
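
As a minimal sketch, a prefix like `R-` could be split off and turned into an equipment value. The assumptions here are that `R-` always means raw and that `"Raw"` is an acceptable value for the Equipment column; `split_raw_prefix` is a hypothetical name, not an existing function.

```python
RAW_PREFIX = "R-"  # assumption: this prefix marks a raw division


def split_raw_prefix(division):
    """Return (division_without_prefix, "Raw"), or (division, None)."""
    # Require something after the prefix so a bare "R-" is left alone.
    if division.startswith(RAW_PREFIX) and len(division) > len(RAW_PREFIX):
        return division[len(RAW_PREFIX):], "Raw"
    return division, None


if __name__ == "__main__":
    for name in ["R-T3", "R-T-3", "R-Var", "Open"]:
        print(name, "->", split_raw_prefix(name))
```

Note that "R-T3" and "R-T-3" still come out as the different strings "T3" and "T-3", so spelling variants need a separate pass.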
  • There are many ways of spelling the same "core" divisions that are common across federations. Formatting of all kinds plays into this (parentheses, capitalization, spacing, punctuation, etc).
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i 'sub.\?junior' | grep -v -i 'amateur'
Subjunior
SubJunior
Sub-Junior
Subjuniors
Sub-Juniors
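
To show how these spellings could be collapsed, here is a small sketch that builds a comparison key by lowercasing and dropping everything except letters and digits. `division_key` and `group_variants` are hypothetical helpers, and the plural handling covers only the "juniors" form seen in the listing above, since stripping every trailing "s" would mangle other names.

```python
import re
from collections import defaultdict


def division_key(division):
    """Build a comparison key: lowercase letters and digits only."""
    key = re.sub(r"[^a-z0-9]+", "", division.lower())
    # Only the plural attested above is folded; this is not a general stemmer.
    if key.endswith("juniors"):
        key = key[:-1]
    return key


def group_variants(divisions):
    """Map each key to the set of raw spellings that share it."""
    groups = defaultdict(set)
    for division in divisions:
        groups[division_key(division)].add(division)
    return dict(groups)


if __name__ == "__main__":
    names = ["Subjunior", "SubJunior", "Sub-Junior", "Subjuniors", "Sub-Juniors"]
    for key, variants in group_variants(names).items():
        print(key, sorted(variants))
```

The key is meant for grouping, not display: it would also merge strings that differ only in punctuation between digits, so the groups are worth eyeballing before any rewrite of the data.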
  • So on and so forth. Normalization being the classic problem of data engineering and all.

Granted, there will still be some divisions that can't "cross over" between feds: people will put different age bounds on their various divisions (youth/teen/juniors/submasters/masters) that we couldn't pull strictly from the lifter's age, I spy some "Crossfit" divisions, yadda. But I think we can do a lot better than 2,551 distinct values.

Labels: data error, enhancement, good first issue, lifter-request
Reference: openpowerlifting/opl-data#57