Division data is very dirty

Created by: ajvondrak

Thanks for gathering all this data! I got curious about doing some analysis on it (e.g., percentiles by various criteria). Don't know if you'll be able to use any of the code I'm writing, per se, but I hope I can give you some sort of input of value.

As of my build of the latest data, there are 2,551 distinct divisions in the lifters data. I figured that this was down to how ad hoc divisions are across federations (everybody gets a 🏆). There's something to be said for keeping the raw data as the meet director chose to encode divisions. Still, there's a lot of normalization that could be applied, which would make analysis easier:

  • Many specify the sex of the lifter, which should ideally be given by the Sex column.
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i 'men' | wc -l
436
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i 'men' | head -10
13-15 Junior Men
13-15 Junior Women
13-15 Men
13-15 Teen Men
13-15 Women
148Submaster Women 35-39
16-17 Junior Men
16-17 Men
16-17 Teen Men
16-18 Junior Men
  • Some contain the weight class, which should ideally be given by the WeightClassKg column.
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i '198' 
198.25 DL 40-44
198.25 IM 35-39
198.25 IM 50-54
198.25 IM 65-69
198.25 RB 14-15
198.25 RB 20-23
198.25 RB 45-49
198.25 RB 50-54
198.25 RB OPEN
198.25 SB 40-44
Heavy group (181 198 ) Wilks formula
Heavywt group (181 198 )
Heavywt women-148-198
Heavywt women-148-198 by Wilks formula
Heavywt women-198
Lightwt group (181 198)
Lightwt group (198 220) by Wilks formula
Medium group (198 220 242) Wilks formula
Middlewt group (198 220) by Wilks formula
  • Many specify raw vs equipped, which should ideally be given by the Equipment column.
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep '\bR' | tail -10
R-T3
R-T-3
R-THW
R-T&J
R-TJR
R-Var
R-Y
R-Y1
R-Y2
R-Y3
  • There are many ways of spelling the same "core" divisions that are common across federations. Formatting of all kinds plays into this (parentheses, capitalization, spacing, punctuation, etc).
[alex@pc openpowerlifting]$ cut -d',' -f3 build/openpowerlifting.csv | sort -u | grep -i 'sub.\?junior' | grep -v -i 'amateur'
Subjunior
SubJunior
Sub-Junior
Subjuniors
Sub-Juniors
  • So on and so forth. Normalization being the classic problem of data engineering and all.
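
To make the bullets above a bit more concrete, here is a minimal Python sketch of what a first normalization pass might look like. Everything in it is an assumption for illustration: the function name `normalize_division`, the canonical spelling "Sub-Junior", and the two patterns, which are drawn only from the example values quoted above. It deliberately leaves prefixes like `R-` and embedded weight classes alone, since deciding what those mean would need a federation-by-federation review.

```python
import re

# Hypothetical patterns, based only on the example values quoted above.
SEX_WORDS = re.compile(r"\b(?:men|women)\b", re.IGNORECASE)
SUBJUNIOR = re.compile(r"\bsub[\s-]?juniors?\b", re.IGNORECASE)


def normalize_division(raw: str) -> str:
    """Return a simplified division label (illustrative, not exhaustive)."""
    # Collapse the spelling variants of one "core" division into one form.
    text = SUBJUNIOR.sub("Sub-Junior", raw)
    # Drop sex words, on the assumption that the Sex column already has them.
    text = SEX_WORDS.sub("", text)
    # Tidy up whitespace left behind by the removed tokens.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    for sample in ["Subjuniors", "Sub-Junior", "13-15 Junior Men", "13-15 Women"]:
        print(repr(sample), "->", repr(normalize_division(sample)))
```

Whether a pass like this should overwrite the division or sit alongside the raw value is a separate question. Keeping the raw value around would preserve what the meet director originally wrote.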

Granted, there will still be some divisions that can't "cross over" between feds: people will put different age bounds on their various divisions (youth/teen/juniors/submasters/masters) that we couldn't pull strictly from the lifter's age, I spy some "Crossfit" divisions, yadda. But I think we can do a lot better than 2,551 distinct values.
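
One rough way to track progress against that 2,551 figure is to count distinct values before and after applying some comparison key. The sketch below is an assumption-laden illustration, not project code: `read_column`, `loose_key`, and `distinct_counts` are hypothetical names, the column index 2 simply mirrors the `cut -f3` calls above, and the first row is assumed to be a header. Note that Python's `csv` module handles quoted commas, which `cut -d','` does not, so counts could differ slightly from the shell output.

```python
import csv
import re


def read_column(path, index=2):
    """Yield one field per data row; index 2 mirrors `cut -f3` above."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:
            if len(row) > index:
                yield row[index]


def loose_key(division):
    """Fold case and drop punctuation/whitespace (a deliberately crude key)."""
    return re.sub(r"[\W_]+", "", division).casefold()


def distinct_counts(divisions):
    """Return (distinct raw values, distinct values under loose_key)."""
    values = list(divisions)
    return len(set(values)), len({loose_key(v) for v in values})


if __name__ == "__main__":
    raw, folded = distinct_counts(read_column("build/openpowerlifting.csv"))
    print(f"{raw} raw divisions, {folded} after loose folding")
```

A key this crude only catches capitalization and punctuation differences. Plurals, embedded sex words, and weight classes would still count as distinct values.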
