OpenPowerlifting issue #57
Created Dec 27, 2016 by Sean Stangl (@sstangl, Owner)

Division data is very dirty

Created by: ajvondrak

Thanks for gathering all this data! I got curious about doing some analysis on it (e.g., percentiles by various criteria). Don't know if you'll be able to use any of the code I'm writing, per se, but I hope I can give you some sort of input of value.

As of my build of the latest data, there are 2,551 distinct divisions in the lifters data. I figured that this was down to how ad hoc divisions are across federations (everybody gets a 🏆). There's something to be said for keeping the raw data as the meet director chose to encode divisions. Still, there's a lot of normalization that could be applied, which would make analysis easier:

  • Many specify the sex of the lifter, which should ideally be given by the Sex column.
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i 'men' | wc -l
436
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i 'men' | head -10
13-15 Junior Men
13-15 Junior Women
13-15 Men
13-15 Teen Men
13-15 Women
148Submaster Women 35-39
16-17 Junior Men
16-17 Men
16-17 Teen Men
16-18 Junior Men
  • Some contain the weight class, which should ideally be given by the WeightClassKg column.
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i '198' 
198.25 DL 40-44
198.25 IM 35-39
198.25 IM 50-54
198.25 IM 65-69
198.25 RB 14-15
198.25 RB 20-23
198.25 RB 45-49
198.25 RB 50-54
198.25 RB OPEN
198.25 SB 40-44
Heavy group (181 198 ) Wilks formula
Heavywt group (181 198 )
Heavywt women-148-198
Heavywt women-148-198 by Wilks formula
Heavywt women-198
Lightwt group (181 198)
Lightwt group (198 220) by Wilks formula
Medium group (198 220 242) Wilks formula
Middlewt group (198 220) by Wilks formula
  • Many specify raw vs equipped, which should ideally be given by the Equipment column.
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep '\bR' | tail -10
R-T3
R-T-3
R-THW
R-T&J
R-TJR
R-Var
R-Y
R-Y1
R-Y2
R-Y3
  • There are many ways of spelling the same "core" divisions that are common across federations. Formatting of all kinds plays into this (parentheses, capitalization, spacing, punctuation, etc.).
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i 'sub.\?junior' | grep -v -i 'amateur'
Subjunior
SubJunior
Sub-Junior
Subjuniors
Sub-Juniors
  • So on and so forth. Normalization being the classic problem of data engineering and all.
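To make the spelling-variant point concrete, here is a minimal sketch of a normalization filter. The function name `normalize_division` and the short plural list are my own assumptions for illustration, not anything in the repo, and the rule set is surely incomplete.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): collapse formatting
# variants of a division name into one comparison key.
# Reads one division per line on stdin, writes one key per line.
# Steps: fold case, drop everything that is not a letter or digit,
# then fold a few known plurals (assumed list: junior, master, teen).
normalize_division() {
  tr '[:upper:]' '[:lower:]' |
    sed -E 's/[^a-z0-9]+//g' |
    sed -E 's/(junior|master|teen)s$/\1/'
}

# The five spellings from the grep output above collapse to a single key.
printf '%s\n' Subjunior SubJunior Sub-Junior Subjuniors Sub-Juniors |
  normalize_division | sort -u
# prints: subjunior
```

A key like this is probably best kept alongside the raw Division value rather than replacing it, so the meet director's original encoding is still available.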

Granted, there will still be some divisions that can't "cross over" between feds: people will put different age bounds on their various divisions (youth/teen/juniors/submasters/masters) that we couldn't pull strictly from the lifter's age, I spy some "Crossfit" divisions, yadda. But I think we can do a lot better than 2,551 distinct values.
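As a rough way to see how far the count could drop, here is a sketch that strips sex words already covered by the Sex column and then counts distinct values. The function name `strip_sex` is hypothetical, it only handles the "Men"/"Women" spellings shown above, and it deliberately leaves age ranges alone, since those bounds differ between feds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not project code): remove the
# standalone words "men"/"women" from a division, then tidy spacing.
# Only matches whole words with an optional leading capital; other
# spellings (e.g. all caps) are left untouched in this sketch.
strip_sex() {
  sed -E \
    -e 's/(^|[^[:alpha:]])([Ww]omen|[Mm]en)([^[:alpha:]]|$)/\1\3/g' \
    -e 's/  +/ /g' \
    -e 's/^ +| +$//g'
}

# Four of the divisions listed above reduce to two once sex is removed.
printf '%s\n' '13-15 Junior Men' '13-15 Junior Women' '13-15 Men' '13-15 Women' |
  strip_sex | sort -u
# prints:
# 13-15
# 13-15 Junior
```

Running the same filter over `cut -d',' -f3 build/openpowerlifting.csv | sort -u` and piping to `wc -l` would give a before/after comparison against the 2,551 figure.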
