What does “partitioning by sym” really mean? How is this different from partitioning by date or time (if possible)?
A quick answer is: they’re not the same thing. A column has the `p attribute (it is parted by sym), whereas you use a date, month, year or int to partition your blocks of data (i.e. the folders at the top level of your hdb). There is also segmenting. You might want to read through:
https://code.kx.com/trac/wiki/KdbplusForMortals/contents
For more clarity.
Thanks. It is still not clear to me why we have to partition in the first place. Maybe to speed up joins? Does the physical structure on disk change if you partition by time?
On Apr 4, 4:43 pm, Timothy Rieder <trie…> wrote:
> A quick answer is: they’re not the same thing. Column has `p attrib (parted by sym) and you use a date, month, year or int to partition your blocks of data (i.e. the folders in your hdb at the top level). There is also segmenting. You might want to read through:
> https://code.kx.com/trac/wiki/KdbplusForMortals/contents
> For more clarity.
> On Apr 4, 2011, at 6:16 AM, K4 Monk <k4m…> wrote:
> > what does “partitioning by sym” really mean? how is this different from partitioning by date or time (if possible)?
Date: Mon, 4 Apr 2011 09:25:46 -0400
Reduces the amount of disk io and memory usage. Would you rather map in a few gb or a few hundred gb?
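To make the io point concrete, here is a rough Python sketch (not q, and not how kdb+ is implemented). The dict `hdb` is a hypothetical stand-in for an hdb root with one top-level folder per date, and `select_range` is a made-up helper. A query restricted to a date range only has to open the partitions inside that range:

```python
from datetime import date

# Hypothetical stand-in for an hdb root: one entry per date partition.
# On disk each key would be a top-level folder holding that day's tables.
hdb = {
    date(2011, 4, 1): [("IBM", 160.1), ("MSFT", 25.4)],
    date(2011, 4, 4): [("IBM", 161.0), ("MSFT", 25.6)],
    date(2011, 4, 5): [("IBM", 162.3)],
}

def select_range(db, start, end):
    """Return the rows from partitions in [start, end] and the partitions read."""
    touched = [d for d in sorted(db) if start <= d <= end]
    rows = [(d, sym, px) for d in touched for sym, px in db[d]]
    return rows, touched

# Only the 4 Apr and 5 Apr partitions are read; 1 Apr is never opened.
rows, touched = select_range(hdb, date(2011, 4, 4), date(2011, 4, 5))
```

With years of history, the untouched partitions are the few hundred gb you avoid mapping in.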
Partitioning on time doesn’t really make sense, at least for anything I can think of…
Look at .Q.dpft and execute it twice and then look at the disk.
On Apr 4, 2011, at 8:32 AM, K4 Monk wrote:
> thanks. It is still not clear to me why we have to partition in the first place.
rdb data is pure time-ordered since that’s how it comes in from the
market
hdb data is usually sym-ordered (within a day) since that’s how it’s
most frequently needed–“give me all IBM trades for the last 3 months”
is a much more common query than “give me all trades between 10:00 and
11:00 for the last three months”
the `p attribute per se creates an index (in the sql sense, more or
less) on the column that lets q jump straight to the relevant block of
data
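A rough Python sketch of the idea (an illustration only, not kdb+'s actual internals): when equal values in a column sit in contiguous blocks, a small map from value to block bounds is enough to slice out the matching rows without scanning. The names `build_parted_index`, `sym` and `price` are made up for the example.

```python
def build_parted_index(col):
    """Map each distinct value to the (start, end) bounds of its contiguous block.

    Raises ValueError if a value shows up in more than one block, i.e. the
    column is not parted.
    """
    index = {}
    start = 0
    for i in range(1, len(col) + 1):
        # A block ends at the end of the column or where the value changes.
        if i == len(col) or col[i] != col[start]:
            if col[start] in index:
                raise ValueError("column is not parted: %r appears in two blocks" % (col[start],))
            index[col[start]] = (start, i)
            start = i
    return index

sym = ["GOOG", "GOOG", "IBM", "IBM", "IBM", "MSFT"]
price = [570.1, 570.3, 160.1, 160.2, 160.0, 25.4]

index = build_parted_index(sym)

# "All IBM rows" becomes one lookup plus one slice, no scan of the sym column.
lo, hi = index["IBM"]
ibm_prices = price[lo:hi]
```

The map has one entry per distinct sym, so it stays tiny compared with the data it points into.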
> does the physical structure on disk change if you partition by time?
there’s a (fairly recent, iirc) function .Q.dpt which doesn’t rearrange
the data at all, just writes it out to disk (as opposed to .Q.dpft,
which re-sorts it by the f arg (typically sym), applies `p#, and then
writes)
compare tables written out by dpft and dpt to see the diff (ordering,
attributes)
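If you don't have a q session handy, the ordering difference can be sketched in Python. This is only an illustration of "write as it arrived" versus "sort by a field, then write"; `write_as_is` and `write_parted` are hypothetical helpers, not the real .Q functions, and nothing is written to disk here.

```python
# Ticks as they arrive intraday: purely time-ordered, syms interleaved.
ticks = [
    ("09:30:00", "MSFT", 25.40),
    ("09:30:01", "IBM", 160.10),
    ("09:30:02", "MSFT", 25.41),
    ("09:30:03", "IBM", 160.15),
]

def write_as_is(rows):
    """Keep arrival (time) order, as described above for an unsorted save."""
    return list(rows)

def write_parted(rows, field=1):
    """Sort by the chosen field (sym here) before saving.

    Python's sort is stable, so rows stay time-ordered within each sym.
    """
    return sorted(rows, key=lambda r: r[field])

unsorted_day = write_as_is(ticks)
parted_day = write_parted(ticks)

# Sym column of each result: interleaved vs. grouped into contiguous blocks.
unsorted_syms = [r[1] for r in unsorted_day]
parted_syms = [r[1] for r in parted_day]
```

Only the second layout has each sym in a single contiguous block, which is the property the `p attribute relies on.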
btw the jargon is usually “part by time” since “partition” normally
refers to the directory structure that separates data from different
dates