Merge joins and other performance related questions

sannysanoff1 · September 27, 2018, 9:25pm

I often find myself in need to join two tables which are sorted and unkeyed.

Do i understand correctly, there’s no O(n) operation to do it? Merge join would do well, but it’s not here, or I missed it. I have to do it in C now.

Also, i often do v[where v within (a;b)] on large enough v where resulting value can only be 10% or less. During evaluation, the intermediate large vector is built (result of within function), then it is passed to where function, then indexed application to v is done. Things are even worse in case of v[where (v>a) and (v<b)] and similar requests.

I wrote where_within in C, it resulted in 10x speedup in my usual case, but maybe I miss something important.

Also I cannot imagine the implementation of distinct, look at this case:

q)t:20000000?20000
q)ts:`$string t
q)\t distinct ts
58
q)\t sum t
26

The distinct is almost O(n). How can it be that good?

What hash set implementation do they use for symbols?

I understand, work is done on interned pointers. I could only invent hash map where i use pointer>>4 as a hash, and then just modulo internal table size (large enough), and using next slots if current is busy. I don’t know how to make this approach generic, apart from making internal table size large enough (3 times the initial array, for example). Any general thoughts on “distinct” implementation?

Thank you.

P.S. I understand and appreciate the minimalist approach of kdb+ though.

User7 · September 28, 2018, 10:04am

What attributes are you applying to the tables in question, this should give you a performance boost

q)n:10000000

q)t1:(sym:n?`3;col1:n?10.0;col2:n?.Q.a)

q)t2:(sym:n?`3;col3:n?10;col4:n?100.0) )

q)\t t1 lj 1!t2

131

q)update g#sym from t1

`t1

q)update g#sym from t2

`t2

q)\t t1 lj 1!t2

86

Topic		Views
Better way to join the table Community Support kdb-and-q	2	August 24, 2010
improving native lj performance Community Support kdb-and-q	3	January 3, 2018
re: [personal kdb+] generic join on splayed and mapped tables on any column Community Support imported , kdb-and-q	0	December 3, 2012
SQL mindset Community Support kdb-and-q	3	March 1, 2015
Questions on window join Community Support kdb-and-q	0	September 17, 2015

Merge joins and other performance related questions

Related topics