incomplete data load with 0:

denghao8888 · May 10, 2016, 6:25pm

Hello:

I’m reading the blog http://www.benfrederickson.com/distance-metrics/ and trying to use 32-bit kdb+/q on Linux to run some experiments.

After downloading and unzipping the dataset from http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html, I loaded it with:

q)t1: ("SSSI";"\t") 0: `$":usersha1-artmbid-artname-plays.tsv"

q)t1

00000c289a1829a808ac09c00daf10bc3c4e223b 00000c289a1829a808ac09c00daf10bc3c4e..

3bd73256-3905-4f3a-97e2-8b341527f805 f2fb0ff0-5679-42ec-a55c-15109ce6e320..

betty blowtorch die Ärzte ..

2137 1099 ..

q)count t1 0

8668227

q)

but

> wc -l usersha1-artmbid-artname-plays.tsv

17559530 usersha1-artmbid-artname-plays.tsv

So 0: is not reading the full data: it returns 8668227 rows, while the file has 17559530 lines.

I found at least one user, 42cf1f37b26b59c9960362939af89ff938ae7d9e, who is in the TSV but missing from t1.

I also tried

grep 42cf1f37b26b59c9960362939af89ff938ae7d9e usersha1-artmbid-artname-plays.tsv > 1.tsv

and loading from 1.tsv then works with no problem.

Any idea what’s going on?

Thanks in advance.

david_demner · May 10, 2016, 6:30pm

It’s a pretty large text file. You might have more luck using .Q.fs, as detailed in: http://code.kx.com/wiki/Cookbook/LoadingFromLargeFiles

charlie1 · May 10, 2016, 7:14pm

Maybe there are unmatched quotes?
You can replace every double quote with a single quote using

$ cat usersha1-artmbid-artname-plays.tsv | tr '"' "'" > new.tsv

and then load new.tsv.

kdb+3.3 is sensitive to unmatched double-quotes
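
Before rewriting the file, it can help to confirm that unbalanced double quotes are really present. The following is a minimal sketch, not part of the original thread: it builds a tiny hypothetical sample file (the rows are made up) and uses awk to count the lines that contain an odd number of double quotes. For the real data, the same awk command could be pointed at usersha1-artmbid-artname-plays.tsv instead.

```shell
# Hypothetical three-row sample standing in for the real TSV (tab-separated).
tmp=$(mktemp)
printf 'u1\tm1\tbetty blowtorch\t2137\n' >  "$tmp"
printf 'u2\tm2\t7" single\t12\n'         >> "$tmp"
printf 'u3\tm3\t"quoted" name\t5\n'      >> "$tmp"

# With '"' as the field separator, a line holding k quotes splits into k+1
# fields, so an even, non-zero NF means an odd (unbalanced) number of quotes.
odd=$(awk -F'"' 'NF > 0 && NF % 2 == 0 { n++ } END { print n + 0 }' "$tmp")

echo "lines with an odd number of double quotes: $odd"
rm -f "$tmp"
```

In this sample only the second row is unbalanced, so the count is 1. A non-zero count on the real file would support the unmatched-quote explanation.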

denghao8888 · May 10, 2016, 7:27pm

@charles: you are right. it works.

q)t1: `user`mbid`artname`plays!("SSSI";"\t")0: `$":new.tsv"

q)count t1 `user

17559530

wc -l usersha1-artmbid-artname-plays.tsv

17559530 usersha1-artmbid-artname-plays.tsv

denghao8888 · May 10, 2016, 7:29pm

Is it possible to change the 0: function so that it raises an error or displays an error message in this case?
Also, is it possible to fix this double-quote bug?

charlie1 · May 10, 2016, 7:54pm

it’s a side-effect of allowing line returns inside quoted fields.
the behavior becomes configurable in the forthcoming 3.4.
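
To see why allowing line returns inside quoted fields leads to fewer rows than wc -l reports, here is a toy model in awk. It is only a sketch under stated assumptions: it is not kdb+'s actual parser, and the five sample rows are hypothetical. It treats a newline as the end of a record only when no double quote is currently open, which is the general idea behind quote-aware readers.

```shell
# Hypothetical five-row sample; rows 2 and 4 each contain one stray double quote.
tmp=$(mktemp)
printf 'u1\tm1\tbetty blowtorch\t2137\n' >  "$tmp"
printf 'u2\tm2\t7" single\t12\n'         >> "$tmp"
printf 'u3\tm3\tplain name\t1099\n'      >> "$tmp"
printf 'u4\tm4\tanother 12" mix\t3\n'    >> "$tmp"
printf 'u5\tm5\tlast name\t8\n'          >> "$tmp"

# Physical line count, the same number wc -l would report.
lines=$(awk 'END { print NR }' "$tmp")

# Toy quote-aware reader: a record ends only at a newline that falls
# outside an open double quote.
records=$(awk '{
    n = gsub(/"/, "&")      # number of double quotes on this line
    inq = (inq + n) % 2     # 1 while a quote is still open
    if (!inq) records++
} END { print records + 0 }' "$tmp")

echo "physical lines: $lines, records seen by the toy reader: $records"
rm -f "$tmp"
```

Here rows 2 to 4 collapse into a single record, so five physical lines yield three records. The same effect, repeated across a large file, would explain a row count well below the line count.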
