Hi, I have a CSV file I'd like to load into KDB:
[user@server path]$ wc /data/file.csv
2785438 11014912 239200400 /data/file.csv
However, if I run this line to load the CSV:
count tt:tabColName xcol ("sssssissfdd"; enlist csv) 0:(`:/data/file.csv);
it only loads 467830 lines.
Why does that happen?
Regards,
Carfield
Hi Carfield,
It might be worth double-checking the record count in the file as KDB sees it, using read0, along the lines of:
q)count read0`:/data/file.csv
which reads your file as plain text, splits it on newlines and returns the number of lines. This is essentially your record count, and comparing it with the wc output would show whether wc is miscounting newlines.
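If the two counts do differ, line endings are a common culprit. Below is a minimal sketch of a shell helper (check_line_endings is a hypothetical name, assuming a POSIX shell with wc, tr and tail) that reports the number of LF characters, which is exactly what wc -l counts, the number of CR characters (present with Windows-style CRLF endings), and whether the file ends with a newline. A final line without a trailing newline is not counted by wc -l.

```shell
#!/bin/sh
# Hypothetical helper: print a few line-ending facts about a file that can
# explain a mismatch between `wc -l` and a line count taken in q.
check_line_endings() {
  f="$1"
  # wc -l counts newline (LF) characters, not lines.
  lf=$(wc -l < "$f" | tr -d ' ')
  # Count carriage-return (CR) characters, e.g. from CRLF line endings.
  cr=$(tr -cd '\r' < "$f" | wc -c | tr -d ' ')
  # Command substitution strips a trailing newline, so the result is empty
  # when the file ends with LF (or is empty).
  if [ -n "$(tail -c 1 "$f")" ]; then last=no; else last=yes; fi
  echo "LF=$lf CR=$cr ends_with_newline=$last"
}
```

Usage: check_line_endings /data/file.csv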
You could also use the following line to identify if there are any empty lines in your file:
q)group count each read0`:/data/file.csv
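This returns a dictionary keyed by line length (in characters), where each value is the list of indices of the lines with that length (count each group count each read0`:/data/file.csv would give the number of lines of each length instead). Check it for a 0 key, which would mean the file contains empty lines.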
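Outside q, a rough shell equivalent of this check is a histogram of line lengths. This is a sketch assuming a POSIX awk; line_length_histogram is a made-up name. LC_ALL=C makes awk count bytes rather than multibyte characters, which matches what count each read0 counts in q.

```shell
#!/bin/sh
# Hypothetical helper: print "length count" pairs, i.e. how many lines of each
# length (in bytes) the file contains, shortest first. A "0" row means
# the file has empty lines.
line_length_histogram() {
  LC_ALL=C awk '{ n[length($0)]++ } END { for (len in n) print len, n[len] }' "$1" | sort -n
}
```

Usage: line_length_histogram /data/file.csv | head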
If the record count returned by this method matches the one from wc, the issue must lie elsewhere.
Let us know how you get on,
Joseph
Thanks a lot, Joseph.
Yes, "q)count read0`:/data/file.csv" does return the same line count as wc. I'm also sure there are no empty lines: "q)group count each read0`:/data/file.csv" has no 0 key. What else can I do to investigate the problem?
Regards,
Carfield
Inspect read0[`:/data/file.csv] 467829+til 2 for anything irregular.
Thanks, but I don't really understand this. Is 467829 a magic number? Anyway, it returns the same number as "count read0`:/data/file.csv".
467830 rows were loaded, so have a look at the lines in the file around that point: 467829+til 2 is simply the two indices 467829 and 467830 (read0 numbers lines from 0).
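The same lines can be viewed from the shell. Since read0 indices start at 0, index 467829 is line 467830 as counted by sed, awk or a text editor. The helper below is a hypothetical sketch; it assumes awk plus cat -v (common on Linux and BSD, though not specified by POSIX), which makes control characters such as a stray carriage return visible as ^M.

```shell
#!/bin/sh
# Hypothetical helper: print lines FIRST..LAST (1-based, inclusive) of FILE,
# each prefixed with its line number, with control characters made visible.
show_lines() {
  awk -v a="$2" -v b="$3" 'NR >= a && NR <= b { print NR ": " $0 } NR > b { exit }' "$1" | cat -v
}
```

Usage: show_lines /data/file.csv 467830 467831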
Might you have carriage returns within quoted fields? The behavior when parsing such data has changed in recent versions; you can now control it with an optional third parameter to 0:
https://code.kx.com/wiki/Reference/ZeroColon#Examples:_4
If you are using an older version of kdb, try a newer copy.
Thanks a lot, yes, it is an issue with the old version of KDB. A field in my CSV contains a single " (double quote) character, and this causes the CSV not to load completely.
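A stray quote like this can be tracked down from the shell before loading. In a CSV whose quoted fields do not span lines, every line contains an even number of " characters (a quote inside a quoted field is escaped as ""). The sketch below is a hypothetical helper, assuming a POSIX awk, that lists the lines where the count is odd. A legitimate quoted field containing a line break is flagged as well, on each line it spans, so treat the output as lines to inspect rather than proof of bad data.

```shell
#!/bin/sh
# Hypothetical helper: print "lineno: content" for every line of FILE that
# contains an odd number of double-quote characters.
odd_quote_lines() {
  # gsub replaces each quote with itself ("&") and returns how many it found.
  awk '{ n = gsub(/"/, "&"); if (n % 2 == 1) print NR ": " $0 }' "$1"
}
```

Usage: odd_quote_lines /data/file.csv | head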