Hello,
I have a table with ‘parent’ and ‘child’ columns containing an id. I have another column named ‘chain’ that is a list containing id’s of parents all the way to the root. This means that for a given row, the chain column contains the path to the root node.
Now what I’d like to do is find all rows whose chain column contains a particular id. I have the following:
select from table where id in/:chain
Is there a better way to achieve this? I’m looking to see if this can be improved either by redoing the query or a reorganization of the data.
Can you make up a sample table with a few rows of data and share the q code?
It’ll help people understand your use-case and make it easier for them to answer the question.
You can see details on how to include a code block:
https://community.kx.com/t5/Community-Blogs/Creating-q-code-block-with-syntax-highlighting-in-a-message/ba-p/11094
-
What datatype are you using for ids?
-
How many rows do you expect to be in the final table?
-
Will the table live on disk or in memory?
-
Will it receive a lot of updates or be mostly static?
-
What will the average length of a chain be?
-
You name the column ‘child’, so each node can only have one child? Or would ‘children’ make more sense? as a list?
One basic unoptimized example using .z.s self to get the parenting chain of each row
q)tree:( row:til 5;parent:0N 0 0 1 1) q)tree row parent ---------- 0 1 0 2 0 3 1 4 1 q)getChain:{[c;r] $[null p:tree[r]`parent;c,p;.z.s[c,p;p]]} q)getChain[();4] 1 0 0N q)update chain:getChain[()] each row from tree row parent chain ----------------- 0 ,0N 1 0 0 0N 2 0 0 0N 3 1 1 0 0N 4 1 1 0 0N
These links to existing resources may also be of use:
Thank you for the reply and links to resources. I’ll check those out later.
What I have is a tree where each node has an id and a node can contain children. A child can only have one parent. This data is time based (day granularity) so a tree contains changes on a daily basis. What I’d like to do is for a given day, find a subtree for a given id. That’s the basic problem I’m trying to solve.
To answer the questions:
- The ids are symbols (the example below is simplified with integers)
- The entire table right now is ~23M rows and a snapshot for a given day is ~80,000 rows
- The table is stored on disk as a splayed table
- The table receives updates on a daily basis
- The chain as computed by the code below has length 11 for all nodes with repeating root id where the subtree ends early
- Yes children would’ve been a better choice
/ This is what the table looks after after the chain is added id | date parent chain ------------------------------ 0 0 0 0 0 1 2023.01.23 0 0 0 0 2 2023.01.23 1 1 0 0 3 2023.01.23 2 2 1 0 4 2023.01.23 1 1 0 0 loadtable
id xkey table filterTableByDateWithChain:{[d] t::select from table where date=d; t::update parent:
t$parent from t; update chain:flip parent scan parent from t} t:filteredTableByDateWithChain 2023.01.23 subtree:select from t where id in/: chain
I’ve just started learning q and kdb a few weeks ago so still learning.