Can't save to a splayed table using "path set tablename"

Hi folks, 

I’m fairly new to Kdb+ and am having some trouble saving data to a local splayed table.

I’ve written a C# application to crawl file system entries, generate some simple data, and then save that into an in memory table. Works a treat and handles about 500,000 items in about 30 seconds.

I can’t seem to save my table to disk though, I keep getting the following error/information returned.


Welcome to kdb+ 32bit edition

For support please see http://groups.google.com/d/forum/personal-kdbplus

Tutorials can be found at http://code.kx.com/wiki/Tutorials

To exit, type \

To remove this startup msg, edit q.q

q)`:/q/Databases/FileCrawl/fileinformation/ set fileinformation

k){$[@x;.[x;();:;y];-19!((,y),x)]}

'type

q.q))


I have no idea what to make of the highlighted line. If I change my path to end with fileinformation.bat then it saves a file to disk no bother.


I think this might be due to the fact that I have a lot of string (symbol) data in my table… and I am not enumerating it, as I don’t really know how. But I’m not an expert, so I am not sure if that is the issue.

My data is created using similar code to the AquaQ demo app, using kx.c.Flip for column names and data.

Should I be doing something earlier on during table creation or can I proceed as I am now, but just need to do something else before saving to disk? 

Many thanks for any help.

Regards, 

Mark

Hi Mark,

you need to enumerate the symbols.

q):/q/Databases/FileCrawl/fileinformation/ set .Q.en[:.;fileinformation]

the first arg to .Q.en is the path to the enum target file, sym.

Hi, 

yeah I have got it working mere minutes after posting… when the table is in memory I just did this:

q)finfoenum: .Q.en[`:/q/Databases/Filecrawl/] fileinformation

q)`:/q/Databases/FileCrawl/fileinformation/ set finfoenum

`:/q/Databases/FileCrawl/fileinformation/


and have the sym file at the root of “FileCrawl” and then the table data in /fileinformation/


Can I just ask if that is the best way to structure my data? I haven’t got around to pulling it back from the disk yet. However the database I am going to create will have multiple tables. I am going to treat “FileCrawl” as the “Database” and then each sub folder as a table. Should I be moving the sym files elsewhere so they aren’t going to overwrite eachother… or can one sym file handle multiple tables?


Regards,


Mark


Hi Mark

One sym file can be used to handle lots of tables.  However, symbols shouldn’t be used for non or infrequently repeating values.  Good examples for symbol types would be ticker code, exchange id, bad examples would be orderid or any sort of free text field.  If you end up with a large sym file then things start to slow down. 

Thanks 

Jonny

How can I specify what should and shouldn’t be in the sym file?

Can I do that on a per field basis or should the data be more normalised?

I couldn’t create a splayed table without enumerating it, so surely I have to create a sym file for that table regardless?

Sent from my iPhone

Any data stored as type symbol ends up in the sym file.  If you don’t want it to go in the sym file store it as another type e.g. character array

You don’t have to enumerate the table if it doesn’t have any columns of type symbol in it. 

I wouldn’t necessarily advise it, but you can also enumerate different columns to different symbol files by writing a custom version of .Q.en.  When saved on disk the data is self describing (i.e. the file on disk knows which file it is enumerated against) so you only have to worry about the write down part and kdb+ will take care of the reading part.  I would stick to vanilla though unless you have a good reason.

Ah right,

That’s interesting. Would you generally store things in char arrays? I’m primarily dealing with file system information > 700,000,000 rows at times but essentially every row will have an individual file path.

Is a string/symbol not in essence a char array? Is it possible to store that differently?

Sent from my iPhone

A symbol is a special type - they are enumerated both on disk and in-memory (making comparisons faster)

http://code.kx.com/wiki/JB:KdbplusForMortals/splayed_tables#1.2.7.3_Symbols_vs._Strings

As it says in the book, it’s much easier to store as char arrays and then make them symbols rather than the other way around.  In your case, the file paths should be strings probably, but maybe you have other id fields which might be a candidate for symbol (host? mount? file type?)

Things like type are often repeated so should probably be enumerated into a sym file.

But if I can store them as char arrays and achieve a relatively similar performance over huge data sets that would be good.

Think I need to do some more reading on when is best to use each type.

Many thanks for the insight, really appreciate it.

Sent from my iPhone

symbols will usually give better performance - it’s easy to test: 

/ save two tables with different types

Jonny-MacBook-Pro-2:~ jonny$ q

KDB+ 3.2 2014.09.29 Copyright (C) 1993-2014 Kx Systems

q)types:txtcsvpdfgz`zip                                                                                                                                                             

q)n:10000000                                                                                                                                                                            

q)t:(t:n?types)                                                                                                                                                                       

q):hdb/t1/ set .Q.en[:hdb;t]                                                                                                                                                          

`:hdb/t1/

q)`:hdb/t2/ set update string t from t                                                                                                                                                  

`:hdb/t2/

q)\                                                                                                                                                                                    

/- check the size

Jonny-MacBook-Pro-2:~ jonny$ du -sh hdb/*

4.0Khdb/sym

 38Mhdb/t1

103Mhdb/t2

/- query times                                                                                                                                                                       

Jonny-MacBook-Pro-2:~ jonny$ q hdb

KDB+ 3.2 2014.09.29 Copyright (C) 1993-2014 Kx Systems

q)\t select from t1 where t=`csv                                                                                                                                                        

95

q)\t select from t2 where t like “csv”                                                                                                                                                  

1037

q)\t select from t1 where t in csvpdf                                                                                                                                                 

149

q)\t select from t2 where t in (“csv”;“pdf”)                                                                                                                                            

1523

/- sort it and test with attributes:

q)t xasc :t1                                                                                                                                                                          

`:t1

q)t xasc :t2                                                                                                                                                                          

`:t2

q)\l .                                                                                                                                                                                  

q)meta t1                                                                                                                                                                               

c| t f a

-| -----

t| s   s

q)meta t2                                                                                                                                                                               

c| t f a

-| -----

t| C   s

q)\t select from t1 where t=`csv                                                                                                                                                        

9

q)\t select from t2 where t like “csv”                                                                                                                                                  

804