partitioning

K4_Monk · April 4, 2011, 11:16am

What does “partitioning by sym” really mean? How is this different from partitioning by date or time (if that is possible)?

trieder · April 4, 2011, 12:43pm

A quick answer is: they’re not the same thing. “Parted by sym” means the sym column has the `p attribute. Partitioning means using a date, month, year or int to split your data into blocks (i.e. the top-level folders in your hdb). There is also segmenting. You might want to read through:

https://code.kx.com/trac/wiki/KdbplusForMortals/contents

For more clarity.

K4_Monk · April 4, 2011, 1:32pm

Thanks. It is still not clear to me why we have to partition in the first place. Maybe to speed up joins? Does the physical structure on disk change if you partition by time?


trieder · April 4, 2011, 2:25pm

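Partitioning reduces the amount of disk I/O and memory usage, because a query constrained on the partition column only maps in the partitions it needs. Would you rather map in a few GB or a few hundred GB?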

Partitioning on time doesn’t really make sense, at least for anything I can think of…

Look at .Q.dpft and execute it twice and then look at the disk.
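As a rough picture of what that experiment leaves on disk, here is a pure-Python model (not kdb+ code) of a .Q.dpft-style write. The helper names write_partition, enumerate_syms and read_column, the JSON column files and the sample trades are all hypothetical simplifications; real kdb+ writes binary column files plus a .d file giving column order. The model follows the steps described in this thread (sort by the field argument, here sym, then write the partition) plus the symbol enumeration that .Q.dpft also does against a sym file in the database root. Running it for two dates gives two partition directories, and reading one date touches only that date's directory.

```python
import json
import os
import tempfile


def enumerate_syms(db_root, syms):
    """Replace symbols with ints indexing a shared sym list kept in db_root/sym.
    JSON stands in for kdb+'s binary sym file; unseen symbols are appended."""
    path = os.path.join(db_root, "sym")
    sym_list = []
    if os.path.exists(path):
        with open(path) as f:
            sym_list = json.load(f)
    index = {s: i for i, s in enumerate(sym_list)}
    for s in syms:
        if s not in index:
            index[s] = len(sym_list)
            sym_list.append(s)
    with open(path, "w") as f:
        json.dump(sym_list, f)
    return [index[s] for s in syms]


def write_partition(db_root, partition, table_name, table, field="sym"):
    """Hypothetical .Q.dpft-like writer. `table` maps column name -> list and must
    have a 'sym' column. Rows are stably sorted by `field`, sym is enumerated,
    and each column is written to db_root/partition/table_name/<column>."""
    order = sorted(range(len(table[field])), key=lambda i: table[field][i])
    out = {col: [vals[i] for i in order] for col, vals in table.items()}
    out["sym"] = enumerate_syms(db_root, out["sym"])
    table_dir = os.path.join(db_root, partition, table_name)
    os.makedirs(table_dir, exist_ok=True)
    for col, vals in out.items():
        with open(os.path.join(table_dir, col), "w") as f:
            json.dump(vals, f)
    return table_dir


def read_column(db_root, partition, table_name, col):
    """A query for one partition only opens files under that partition's directory."""
    with open(os.path.join(db_root, partition, table_name, col)) as f:
        return json.load(f)


db = tempfile.mkdtemp()
write_partition(db, "2011.04.04", "trade",
                {"sym": ["MSFT", "IBM", "MSFT", "IBM"], "price": [25.1, 160.0, 25.2, 160.5]})
write_partition(db, "2011.04.05", "trade",
                {"sym": ["IBM", "AAPL"], "price": [161.0, 340.0]})

print(sorted(os.listdir(db)))                           # ['2011.04.04', '2011.04.05', 'sym']
print(read_column(db, "2011.04.04", "trade", "price"))  # [160.0, 160.5, 25.1, 25.2]
print(read_column(db, "2011.04.04", "trade", "sym"))    # [0, 0, 1, 1]
```

Reading 2011.04.04 never opens anything under 2011.04.05, which is the “few GB versus a few hundred GB” point in miniature.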

Aaron_Davies · April 5, 2011, 3:24am

On Apr 4, 2011, at 8:32 AM, K4 Monk wrote:

> thanks. It is still not clear to me why we have to partition in the first place.

rdb data is purely time-ordered, since that’s how it comes in from the market

hdb data is usually sym-ordered (within a day), since that’s how it’s most frequently needed: “give me all IBM trades for the last 3 months” is a much more common query than “give me all trades between 10:00 and 11:00 for the last three months”

the `p attribute per se creates an index (in the sql sense, more or less) on the column that lets q jump straight to the relevant block of data
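One way to picture that index, as a pure-Python sketch rather than kdb+'s actual internal representation: for a column where each value occupies a single contiguous block, record where each value's block starts and ends, so a lookup for one sym is a dictionary hit followed by a slice instead of a scan. build_parted_index and the sample data are hypothetical.

```python
def build_parted_index(col):
    """Map each distinct value to the half-open (start, end) range of its run.
    Raises ValueError if a value appears in more than one run, i.e. the column
    is not parted (kdb+ likewise signals an error if you try `p# on such data)."""
    index = {}
    start = 0
    for i in range(1, len(col) + 1):
        # A run ends at the end of the column or where the value changes.
        if i == len(col) or col[i] != col[start]:
            if col[start] in index:
                raise ValueError(f"{col[start]!r} is not contiguous")
            index[col[start]] = (start, i)
            start = i
    return index


# Hypothetical sym-ordered columns for one day's partition.
syms = ["AAPL", "IBM", "IBM", "IBM", "MSFT", "MSFT"]
prices = [340.00, 160.00, 160.40, 160.90, 25.10, 25.12]

idx = build_parted_index(syms)
start, end = idx["IBM"]    # one dict lookup, no scan of the sym column
print(prices[start:end])   # [160.0, 160.4, 160.9]
```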

> does the physical structure on disk change if you partition by time?

there’s a (fairly recent, iirc) function, .Q.dpt, which doesn’t rearrange the data at all; it just writes it out to disk. By contrast, .Q.dpft re-sorts it by the f arg (typically sym), applies `p#, and then writes it out

compare tables written out by dpft and dpt to see the difference (ordering, attributes)
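If you don't have a kdb+ session handy, the attribute half of that comparison comes down to one property, modelled here in pure Python with a hypothetical is_parted helper and made-up data: `p# can only be applied to a column in which every distinct value occupies one contiguous run. A sym column left in arrival order, as a dpt-style write keeps it, generally fails that test once trades for different syms interleave; after the sort by sym that dpft does, it always passes.

```python
def is_parted(col):
    """True if every distinct value occupies exactly one contiguous run,
    the precondition for applying the parted attribute."""
    seen = set()
    for i, value in enumerate(col):
        if i > 0 and value == col[i - 1]:
            continue  # still inside the current run
        if value in seen:
            return False  # value reappears after another value's run
        seen.add(value)
    return True


# Hypothetical sym column in arrival order (what a dpt-style write keeps) ...
arrival_order = ["MSFT", "IBM", "MSFT", "IBM", "AAPL", "IBM"]
# ... and the same column after a dpft-style sort on sym.
sym_order = sorted(arrival_order)

print(is_parted(arrival_order))  # False
print(is_parted(sym_order))      # True
```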

btw, the jargon is usually “part by time”, since “partition” normally refers to the directory structure that separates data from different dates

Related topics
Partition HDB by date vs sym
HDB - map sym to int for partitioning
Question about the performance difference of two queries on HDB
Segmentation questions