Most optimum way to search a column containing a list of values

smdesai1 · January 24, 2023, 9:25pm

Hello,
I have a table with ‘parent’ and ‘child’ columns containing an id. I have another column named ‘chain’ that is a list containing id’s of parents all the way to the root. This means that for a given row, the chain column contains the path to the root node.

Now what I’d like to do is find all rows whose chain column contains a particular id. I have the following:
select from table where id in/:chain

Is there a better way to achieve this? I’m looking to see if this can be improved either by redoing the query or a reorganization of the data.

rocuinneagain1 · January 25, 2023, 9:30am

Can you make up a sample table with a few rows of data and share the q code?

It’ll help people understand your use-case and make it easier for them to answer the question.

You can see details on how to include a code block:

https://community.kx.com/t5/Community-Blogs/Creating-q-code-block-with-syntax-highlighting-in-a-message/ba-p/11094

What datatype are you using for ids?
How many rows do you expect to be in the final table?
Will the table live on disk or in memory?
Will it receive a lot of updates or be mostly static?
What will the average length of a chain be?
You name the column ‘child’, so each node can only have one child? Or would ‘children’ make more sense? as a list?

One basic unoptimized example using .z.s self to get the parenting chain of each row

q)tree:( row:til 5;parent:0N 0 0 1 1) q)tree row parent ---------- 0 1 0 2 0 3 1 4 1 q)getChain:{[c;r] $[null p:tree[r]`parent;c,p;.z.s[c,p;p]]} q)getChain[();4] 1 0 0N q)update chain:getChain[()] each row from tree row parent chain ----------------- 0 ,0N 1 0 0 0N 2 0 0 0N 3 1 1 0 0N 4 1 1 0 0N

These links to existing resources may also be of use:

https://code.kx.com/q/ml/toolkit/graph/
https://code.kx.com/q/ml/toolkit/clustering/kdtree/

smdesai1 · January 25, 2023, 5:49pm

Thank you for the reply and links to resources. I’ll check those out later.

What I have is a tree where each node has an id and a node can contain children. A child can only have one parent. This data is time based (day granularity) so a tree contains changes on a daily basis. What I’d like to do is for a given day, find a subtree for a given id. That’s the basic problem I’m trying to solve.

To answer the questions:

The ids are symbols (the example below is simplified with integers)
The entire table right now is ~23M rows and a snapshot for a given day is ~80,000 rows
The table is stored on disk as a splayed table
The table receives updates on a daily basis
The chain as computed by the code below has length 11 for all nodes with repeating root id where the subtree ends early
Yes children would’ve been a better choice

/ This is what the table looks after after the chain is added id | date parent chain ------------------------------ 0 0 0 0 0 1 2023.01.23 0 0 0 0 2 2023.01.23 1 1 0 0 3 2023.01.23 2 2 1 0 4 2023.01.23 1 1 0 0 loadtable id xkey table filterTableByDateWithChain:{[d] t::select from table where date=d; t::update parent:t$parent from t; update chain:flip parent scan parent from t} t:filteredTableByDateWithChain 2023.01.23 subtree:select from t where id in/: chain

I’ve just started learning q and kdb a few weeks ago so still learning.

Topic		Replies	Views
Most optimum way to search a column containing a list of values Community Support imported , kdb-and-q	1	72	January 25, 2023
query for a list in a table Community Support kdb-and-q	7	28	February 18, 2017
Quering Complex Column Data Community Support kdb-and-q	4	8	August 6, 2009
Querying a column that holds a list Community Support kdb-and-q	0	11	March 16, 2011
defining trees, heaps in q Community Support kdb-and-q	7	8	March 5, 2018

KDB-X

KDB.AI

Insights Enterprise

OneTick Market Data

Delta

Dashboards

VS Code Extension

Accelerators

Most optimum way to search a column containing a list of values

KDB-X

Most optimum way to search a column containing a list of values

Related topics