Hello:
I’m reading the blog post http://www.benfrederickson.com/distance-metrics/ and trying to run some experiments with 32-bit kdb+/q on Linux.
After downloading and unzipping the dataset from http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html, I loaded it like this:
q)t1: ("SSSI";"\t") 0: `$":usersha1-artmbid-artname-plays.tsv"
q)t1
00000c289a1829a808ac09c00daf10bc3c4e223b 00000c289a1829a808ac09c00daf10bc3c4e..
3bd73256-3905-4f3a-97e2-8b341527f805 f2fb0ff0-5679-42ec-a55c-15109ce6e320..
betty blowtorch die Ärzte ..
2137 1099 ..
q)count t1 0
8668227
q)
But the file itself has roughly twice as many lines:
> wc -l usersha1-artmbid-artname-plays.tsv
17559530 usersha1-artmbid-artname-plays.tsv
So 0: is not reading the full data.
I found at least one user, 42cf1f37b26b59c9960362939af89ff938ae7d9e, who is in the tsv but not in t1.
I also tried
grep 42cf1f37b26b59c9960362939af89ff938ae7d9e usersha1-artmbid-artname-plays.tsv > 1.tsv
and loading from 1.tsv then works with no problem.
Any idea what’s going on?
Thanks in advance.