
Auto-disambiguation for THSPA lifters

Laura Rettig requested to merge voidness/opl-data:thspa-disambig into main

Problem: THSPA data contains many common names, each appearing across lots of meets. Disambiguating them manually is tedious.

Opportunities: Every THSPA record is annotated with a team/school. A name + school combination is typically unique and can be used to disambiguate the lifter.
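A minimal sketch of the name + school grouping in Python, assuming each THSPA result has already been read into a hypothetical `Entry` tuple. `Entry` and `group_by_name_school` are illustrative names, not part of opl-data, and the real rows live in the meet CSV files.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """One THSPA result row (hypothetical shape for illustration)."""
    name: str
    school: str
    meet: str


def group_by_name_school(entries: list[Entry]) -> dict[tuple[str, str], list[Entry]]:
    """Bucket entries by (name, school); each bucket is treated as one lifter.

    This relies on the MR's assumption that name + school is unique.
    """
    groups: dict[tuple[str, str], list[Entry]] = defaultdict(list)
    for e in entries:
        groups[(e.name, e.school)].append(e)
    return dict(groups)
```

Two "John Smith" entries from School A and one from School B would produce two groups, i.e. two candidate lifters.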

Considerations: THSPA lifters may share names with lifters in other federations. The request might be made by someone outside THSPA, so #1 may have to be assigned to a lifter who is not in the THSPA directory.

Approach: A lifter requests that their records be separated from those of other lifters sharing the same name. For that name, we disambiguate all THSPA lifters by assigning each of them a #.

  • parameter for the starting #: if n lifters outside THSPA have already been manually disambiguated before running this script, pass n+1 as the starting #
  • print out each lifter and their meets, both for manual common-sense verification and to show how many distinct lifters were disambiguated; this count is needed for any further non-THSPA manual disambiguation and for updating lifter-data/name-disambiguation.csv with the correct total number of profiles under this name
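The numbering and printout steps above could look roughly like this. It is a hypothetical sketch, not the script in this MR: `assign_numbers`, `report`, and the `(name, school, meet)` tuple shape are assumptions, and the `Name #N` label is assumed to match the existing disambiguation convention.

```python
from collections import defaultdict


def assign_numbers(entries, start=1):
    """Give each (name, school) group a disambiguated label like "John Smith #3".

    entries: iterable of (name, school, meet) tuples, all for a single name.
    start:   first # to hand out; pass n + 1 if n non-THSPA lifters
             were already disambiguated manually.
    """
    groups = defaultdict(list)
    for name, school, meet in entries:
        groups[(name, school)].append(meet)

    assigned = {}
    # Sorting the keys only makes reruns stable; it does not let the
    # caller pick which lifter receives a given #.
    for number, (name, school) in enumerate(sorted(groups), start=start):
        assigned[(name, school)] = f"{name} #{number}"
    return assigned, dict(groups)


def report(assigned, groups, start=1):
    """Print each lifter's meets for manual verification, plus the totals."""
    for key, label in assigned.items():
        print(f"{label} ({key[1]}): {', '.join(groups[key])}")
    print(f"{len(assigned)} distinct THSPA lifters")
    # Profiles numbered 1..start-1 were assigned manually before the script ran.
    print(f"total profiles under this name: {start - 1 + len(assigned)}")
```

For example, if two non-THSPA lifters were already disambiguated by hand, call `assign_numbers(rows, start=3)` and then `report(assigned, groups, start=3)`; the last printed line is the count to record in lifter-data/name-disambiguation.csv.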

Limitations: If the requester is a THSPA lifter and should receive #1, that assignment has to be made manually before running the script, because the script cannot guarantee which # each lifter is assigned.

Further development: Currently, name + school is assumed to be unique, and common-sense verification must be done manually by viewing the assigned groupings. Since lifters are in high school for a limited number of years, we could cap how far apart a group's meets are allowed to be. We could also use bodyweight to catch two lifters with the same name at the same school who competed during the same period but at vastly different bodyweights.
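The two heuristics proposed under Further development could be prototyped as a check on each name + school group. This sketch assumes each result carries a meet date and a bodyweight in kg; the thresholds (roughly four years of high school, a 60-day window, a 20 kg gap) are hypothetical defaults chosen for illustration, and `suspicious` is not an existing opl-data function.

```python
from datetime import date
from itertools import combinations
from typing import NamedTuple


class Result(NamedTuple):
    meet_date: date
    bodyweight_kg: float


def suspicious(results, max_years=4, window_days=60, max_bw_gap_kg=20.0):
    """Return reasons why one (name, school) group may really be several lifters.

    All thresholds are hypothetical defaults, not values from this MR.
    """
    if not results:
        return []
    reasons = []

    # Heuristic 1: meets spread over more years than a high-school career.
    # (max_years * 365 ignores leap days; this is only an approximate cap.)
    dates = [r.meet_date for r in results]
    span_days = (max(dates) - min(dates)).days
    if span_days > max_years * 365:
        reasons.append(f"meets span {span_days} days")

    # Heuristic 2: very different bodyweights at meets close together in time.
    for a, b in combinations(results, 2):
        close_in_time = abs((a.meet_date - b.meet_date).days) <= window_days
        if close_in_time and abs(a.bodyweight_kg - b.bodyweight_kg) > max_bw_gap_kg:
            when = min(a.meet_date, b.meet_date)
            reasons.append(f"{a.bodyweight_kg} kg vs {b.bodyweight_kg} kg around {when}")
    return reasons
```

A group that returns a non-empty list would still go to manual review rather than being split automatically.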
