https://learninghub.kx.com/forums/topic/performance-using-a-table-to-combine-where-phrases
Hi all,
Passing a table in to a where clause instead of the individual values significantly affects performance, even compared to using & to join all the where phrases into one. What is the query doing under the covers?
1#.eoh.f date
SRC PRICE_TYPE INDEX1 INDEX2 CONTRACT1 PROFILE CONTRACT_TYPE
--------------------------------------------------------------------------
2023.06.28 market dev JPM 2023.10.M BLA
// PUBLICATIONS is date-partitioned, and doesn't have attributes on the columns we filter on
ts select from PUBLICATIONS where ([]date;SRC;PRICE_TYPE;INDEX1;INDEX2;CONTRACT1;PROFILE;CONTRACT_TYPE) in 1#.eoh.f
1176 177568
ts select from PUBLICATIONS where date=2023.06.28,src=`market,PRICE_TYPE=`dev,INDEX1=`JPM,INDEX2=`,CONTRACT1=`2023.10.M,PROFILE=`,CONTRACT_TYPE=`BLA
5 2101760
ts select from PUBLICATIONS where (date=2023.06.28)&(src=`market)&(PRICE_TYPE=`dev)&(INDEX1=`JPM)&(INDEX2=`)&(CONTRACT1=`2023.10.M)&(PROFILE=`)&CONTRACT_TYPE=`BLA
531 179664
ts select from PUBLICATIONS where (date in 2023.06.28)&(SRC in `market)&(PRICE_TYPE in `dev)&(INDEX1 in `JPM)&(INDEX2 in `)&(CONTRACT1 in `2023.10.M)&(PROFILE in `)&CONTRACT_TYPE in `BLA
504 179664
Using kdb+ 3.5 2017.10.11
Thanks,
Eoghan
I would presume optimizations such as left to right sub-clause filtering as mentioned here https://code.kx.com/q4m3/9_Queries_q-sql/#9333-multiple-where-subphrases are lost when filtering using a table/dict lookup.
i.e. we need to check all of the columns specified in the lookup simlataneouly, instead of passing each consective filter to the next sub-phrase.
To visualize differently:
q)show tab:([]c1:`a`b`c;c2:1 2 3) c1 c2 ----- a 1 b 2 c 3 q)tab in 1#tab 100b q)select from tab where tab in 1#tab c1 c2 ----- a 1
Also worth having a look at the parse trees:
q)parse "select from PUBLICATIONS where date=2023.06.28,src=`market,PRICE_TYPE=`dev,INDEX1=`JPM,INDEX2=`,CONTRACT1=`2023.10.M,PROFILE=`,CONTRACT_TYPE=`BLA" ? `PUBLICATIONS
,((=;`date;2023.06.28);(=;`src;,`market);(=;`PRICE_TYPE;,`dev);(=;`INDEX1;,`JPM);(=;`INDEX2;,`);(=;`CONTRACT1;,`2023.10.M);(=;`PROFILE;,`);(=;`CONTRACT_TYPE;,`BLA)) 0b ()
q)parse "select from PUBLICATIONS where ([]date;SRC;PRICE_TYPE;INDEX1;INDEX2;CONTRACT1;PROFILE;CONTRACT_TYPE) in 1#.eoh.f" ? `PUBLICATIONS
,,(in;(+:;(!;,`date`SRC`PRICE_TYPE`INDEX1`INDEX2`CONTRACT1`PROFILE`CONTRACT_TYPE;(enlist;`date;`SRC;`PRICE_TYPE;`INDEX1;`INDEX2;`CONTRACT1;`PROFILE;`CONTRACT_TYPE)));(#;1;`.eoh.f)) 0b ()
If you have a list of where clauses separated by a comma, it is executed sequentially, successively reducing the number of rows processed. On the other hand if you use the & operator, all operations will run on the full table.