Deal with messy listenbrainz history
In a special mix playlist for the year 2014, I found the song "The Toasters - Pool Shark", whose first recorded listen is actually from 2006.
select datetime(listened_at, 'unixepoch'), * from imports_listenbrainz where track_name = 'Pool Shark';
datetime(listened_at, 'unixepoch')|id|listened_at|artist_msid|recording_msid|release_msid|track_name|artist_name|release_name|tracknumber|recording_mbid|artist_mbids|release_mbid|release_group_mbid|work_mbids|origin_url
2020-01-29 20:32:55|10266|1580329975|227e13df-a495-450c-af25-be34f4611adf|d4912499-8727-48f4-8e27-84459d6bdc71|245da0bd-01b4-4a0b-8055-c72e8e117751|Pool Shark|The Toasters|Frankenska|||||||
2017-02-25 21:38:12|20040|1488058692|227e13df-a495-450c-af25-be34f4611adf|d4912499-8727-48f4-8e27-84459d6bdc71|245da0bd-01b4-4a0b-8055-c72e8e117751|Pool Shark|The Toasters|Frankenska|||||||
2014-08-28 14:17:51|26957|1409235471|227e13df-a495-450c-af25-be34f4611adf|f347ce39-8d98-4989-a811-2b930ff13422|9a66b568-a4df-4f9b-b9d6-554470d68be5|Pool Shark|The Toasters|Pool Shark|||||||
2014-08-28 13:36:18|26958|1409232978|227e13df-a495-450c-af25-be34f4611adf|79f5350f-521a-4ca9-a315-6a31ef3346ef|619c813b-cda9-4586-a45e-7370bbe8f014|Pool Shark|The Toasters|Skaboom!|||||||
...
2006-01-29 23:24:48|115451|1138577088|227e13df-a495-450c-af25-be34f4611adf|79f5350f-521a-4ca9-a315-6a31ef3346ef|619c813b-cda9-4586-a45e-7370bbe8f014|Pool Shark|The Toasters|Skaboom!|||||||
The issue: the listen at 2014-08-28 14:17:51 (id 26957) has incorrect album info (its release_name is "Pool Shark"), and its recording_msid differs from those of the other listens of the song "Pool Shark".
The calliope.listenbrainz.History.tracks() method groups query results by recording_msid, so this is treated as a new song that was first listened to in 2014.
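To illustrate the effect of the grouping key, here is a minimal sketch that loads the listens shown above into an in-memory SQLite table and computes the first listen per group. It does not reproduce the real History.tracks() query. The reduced table schema and the first_listens() helper are assumptions made for illustration only.

```python
import sqlite3

# Listens copied from the query output above (the elided "..." rows are left out).
# Columns: listened_at, recording_msid, track_name, artist_name, release_name
LISTENS = [
    (1580329975, "d4912499-8727-48f4-8e27-84459d6bdc71", "Pool Shark", "The Toasters", "Frankenska"),
    (1488058692, "d4912499-8727-48f4-8e27-84459d6bdc71", "Pool Shark", "The Toasters", "Frankenska"),
    (1409235471, "f347ce39-8d98-4989-a811-2b930ff13422", "Pool Shark", "The Toasters", "Pool Shark"),
    (1409232978, "79f5350f-521a-4ca9-a315-6a31ef3346ef", "Pool Shark", "The Toasters", "Skaboom!"),
    (1138577088, "79f5350f-521a-4ca9-a315-6a31ef3346ef", "Pool Shark", "The Toasters", "Skaboom!"),
]


def first_listens(group_by):
    """Return (first_listen_date, listen_count) per group, oldest first.

    Hypothetical helper: group_by is pasted into the SQL as-is, so only
    pass trusted column lists.
    """
    db = sqlite3.connect(":memory:")
    # Reduced, assumed schema: only the columns needed for this illustration.
    db.execute(
        "CREATE TABLE imports_listenbrainz ("
        "listened_at INTEGER, recording_msid TEXT, track_name TEXT, "
        "artist_name TEXT, release_name TEXT)"
    )
    db.executemany("INSERT INTO imports_listenbrainz VALUES (?, ?, ?, ?, ?)", LISTENS)
    query = f"""
        SELECT date(min(listened_at), 'unixepoch'), count(*)
          FROM imports_listenbrainz
         GROUP BY {group_by}
         ORDER BY min(listened_at)
    """
    rows = db.execute(query).fetchall()
    db.close()
    return rows


if __name__ == "__main__":
    # Grouping by recording_msid splits one song into three "tracks".
    print(first_listens("recording_msid"))
    # [('2006-01-29', 2), ('2014-08-28', 1), ('2017-02-25', 2)]

    # Grouping by artist and track name keeps it as a single track.
    print(first_listens("artist_name, track_name"))
    # [('2006-01-29', 5)]
```

With this sample, grouping by recording_msid produces a group whose first listen is 2014-08-28, which is how the song can end up looking new in 2014.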
Possible fixes:
- clean up the data in ListenBrainz
  - ListenBrainz should be able to resolve MSIDs to MBIDs... right? But it seems it does not.
  - there's no API call that would allow us to do this ourselves, either
- double check the data in special_mix
  - for every song, search for listens with the same track and artist whose timestamp is older than the recorded first listen.
- change History.tracks() query:
  - avoid recording_msid, group tracks by artist_name + track_name instead -> this is what last.fm history already does
  - avoid recording_msid, group tracks by