Does linear search stop on the first match?

t:( a:til long$1e6; b:til long$1e6)

/ Q1. how can i make this linear search start from i=0 and stops on first match?

\ts:100 select first a, first b from t where a<3

/ Q2. how can i make this reverse linear search start from i=n-1 and stop on the last match?

\ts:100 select last a, last b from t where a>1e6-3

/ Q3. what if the table has a sorted attribute?

/ Q4. what if it is a 64 bit long list w/o table structure?

/ Thx.

To: “[kdb+] [kdb+]” X-Mailer: Apple Mail (2.2104)Hi Yan Yan,just came across this post again looking for something elsewondering what you ended up doingsee my own thoughts below> On 2 Apr 2015, at 12:59, Yan Yan wrote:> t:( a:til long$1e6; b:til long$1e6)> / Q1. how can i make this linear search start from i=0 and stops on first match?> \ts:100 select first a, first b from t where a<3333 1049088q)gt:{i:0;do[count y;if[x>y i;:i];i+:1];:0N}q)\ts:100 gt[3;t.a]0 448but worst case would be pretty badq)\ts select first a, first b from t where a<-14 1049120q)\ts t gt[-1;t.a]383 496processing blocks helps somewhatq)gtb:{n:1000&count y;i:til n;do[count[y]div n;if[n>f:(x>y i)?1b;:f];i+:n];:0N}q)\ts t gtb[-1;t.a]14 17664you could rewrite in C if even more speed is needed> / Q2. how can i make this reverse linear search start from i=n-1 and stop on the last match?> \ts:100 select last a, last b from t where a>1e6-3very similarly, with similar caveats> / Q3. what if the table has a sorted attribute?bin should work> / Q4. what if it is a 64 bit long list w/o table structure?no problem, all previous code had nothing to do with tablesCheers, Attila

I have yet to find any built-in language support. My latest idea is to create sublists of lengths 1 2 4 8 16… until they add up to the length of the full list, or until the first match is found. O(log n) vector operations is applied in the worst case. O(n) elements are checked in the worst case. To avoid applying vector operations on breadcrumbs the minimum length can start at 1024. A cache-aware optimization would be limiting the max sublist length to the CPU cache size, but this may lead to O(n) vector operations. The algorithm should work backwards, too. Similar ideas are found in stl vector reallocation, java ArrayList growth, and bubble memory algorithm.

q)t:( a:til long$1e6; b:til long$1e6)

q)\ts:100 select first a, first b from t where a<3

497 1049088


if you expect the match to be near the first or last then maybe this optimization?

q)end:{[x;y;z]n:ceiling count[t]z;select x#a,x#b from $[z=1;t;(nx)#t] where y a} /look in the z end rows of t

q)lin:{if[not 0N=first exec b from e:end[x;y;z];:e];end[x;y;1]} /end else end with z=1 (look at every row)


q)\ts:100 lin[1;3>;1e-5]

0 1072

q)\ts:100 lin[-1;990000<;0.1] /backwards

100 2359840

q)\ts:100 lin[1;3>;0.001]

0 17888

q)\ts:100 lin[1;3000<;0.01]

10 393760

q)\ts:100 lin[1;99998<;0.1]

85 2228688

q)\ts:100 lin[1;(1e6-1)<;1e-5] /worst case

2415 9437776

maybe if you are looking for better worst case, when looking at “the rest” of the table, store rank[a] and update periodically depending on how you update or insert t.

q)t:( a:til `long$1e6; b:1000000?1000000)

q)t

a b

---------

0 22814

1 465463

2 611242

3 108627

4 922459

5 199067

6 92097

7 79091

8 727663

..

q)\t exec rank b from t

280

linear search:

last (1+)[{not 3>t.a x};0]

n:`long$1e6;

t:( a:til n; b:til n);

last (1+)[{not 3>t.a x};0]; / wrong result 0

last (1+)[{(x<count[t]) & not 3=t.a x};0] / works

last (1+)[{(x<count[t]) & not neg[1]=t.a x};0] / returns n and slow

last (neg[1]+)[{(x>=0) & not 3=t.a x};count[t]-1] / returns 3 and slow

last (neg[1]+)[{(x>=0) & not (n-3)=t.a x};count[t]-1] / returns n-3

last (neg[1]+)[{(x>=0) & not neg[1]=t.a x};count[t]-1] / returns -1 and slow

t.a?3 / found and fast

t.a?neg[1] / not found and fast

i’ve just pointed to “iterate” as a solution.
if you look for exact matches (which wasn’t the case in your original Q1) then use “find”.

oh yes, sorry that i have forgotten my original Q1.

q)n:`long$1e6;
q)t:( a:til n; b:til n);
q)last (1+)[{not 3>t.a x};0]

q)(1+)/[{not 3>t.a x};0] /“converge” saves some code and maybe some space:

q)(1+)/[{(x<count[t]) & not 3>t.a x};0]  /and, as Yan indicates, check for x out of bounds

but as Yan notes, this is slower:

q)\t:5 (1+)/[{(x<count[t]) & not 1e5<t.a x};0]

553

q)\t:5 select first a,first b from t where 1e5<a

157

q)\t:5 lin[1;1e5<;1] /=select first a,…

160

q)\t:5 lin[1;1e5<;0.11] /match in first 11% of rows

10

When I call findFirst, there is a good chance that the needle will be near the BEGINNING of the haystack.
When I call findLast, there is a good chance that the needle will be near the END of the haystack.
But it may not be found in the worst case.