For our benchmark, we use a small subset of the songs.csv dataset. It was generated with this command:
xsv sample --seed 42 songs.csv -o smol_songs.csv
The original songs.csv dataset is available here
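
For readers without xsv at hand, the idea behind a seeded sample can be sketched in Python. This is only an illustration of reproducible row sampling (reservoir sampling with a fixed seed), not xsv's implementation, and it will not select the same rows as the command above. The function names `sample_rows` and `sample_csv` and the sample size `k` are assumptions made for this example; the command as shown does not state a sample size.

```python
import csv
import random


def sample_rows(rows, k, seed):
    """Pick k rows uniformly at random in a single pass (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def sample_csv(src, dst, k, seed=42):
    """Copy the header row, then write k sampled data rows from src to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    writer.writerow(header)
    writer.writerows(sample_rows(reader, k, seed))
```

A possible invocation, with a hypothetical sample size of 1000, would open both files with `newline=""` and call `sample_csv(src, dst, k=1000, seed=42)`. Running it twice with the same seed on the same input yields the same subset, which is what makes a benchmark dataset reproducible.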