q)m: (til 5000)!(til 5000) / a trivial mapping
q)v: 1000 50#(537?100) / a 2D list of values to map
q)\t m[v] / <== this takes a surprisingly long time
190
q)\t m[raze v] / 1D list - same amount of mapping as m[v] but MASSIVELY faster
0
As you can see, applying the dictionary to a list of lists is MASSIVELY
slower than applying it to a single flat list. This surprises me. I could
imagine some overhead, but two orders of magnitude suggests there are
forces at play that I clearly don't understand.
Any insights available? Any workarounds (other than razing 2D lists down
to 1D lists)? Am I doing anything wrong?
Cheers,
Mike
Hi Mike,
A mapping without any attribute on the keys means a linear search on the keys.
q)m:(til 10000000)!til 10000000
q)\t do[10;m 0]
0
q)\t do[10;m 99]
0
q)\t do[10;m 9999]
0
q)\t do[10;m 99999]
2
q)\t do[10;m 999999]
23
q)\t do[10;m 9999999]
202
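To see why (my note, not from the original reply): a dictionary lookup is effectively a ? (find) on the key list followed by indexing into the values, and find on an unattributed list is a linear scan, which is why the cost above grows with the key's position.
q)m[9999999]~(value m)(key m)?9999999  / lookup = find on the keys, then index into the values
1b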
If you set the keys to have the sorted attribute (`s#) then a binary search
will be used; if you set them to unique (`u#) then a hash map is created
(which takes considerable time if the list is long) and used for lookups:
q)m:(`s#til 1000000)!til 1000000
q)\t do[10;m 9999999]
0
q)\t do[10000;m 9999999]
13
q)\t m:(`u#til 10000000)!til 10000000
2750
q)\t do[10;m 99999999]
0
q)\t do[10000;m 99999999]
6
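A quick way to confirm the attribute actually sits on the dictionary's key list (using the `u#-keyed m from just above):
q)attr key m  / `u if the keys carry the hash attribute, ` if none
`u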
However, this only starts to have an advantage once the key list is more
than about 100 items long:
q)m:(til 100)!til 100
q)\t do[100000;m 99]
73
q)m:(`s#til 100)!til 100
q)\t do[100000;m 99]
84
q)m:(`u#til 100)!til 100
q)\t do[100000;m 99]
72
In your case:
q)m:(til 5000)!til 5000
q)v:1000 50#537?100
q)\t m v
62
You can achieve some speedup by applying the mapping atom by atom:
q)\t m each/:v
18
Or by setting the keys sorted/unique:
q)m:(`s#til 5000)!til 5000
q)\t m v
29
q)m:(`u#til 5000)!til 5000
q)\t m v
13
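As a side note (my addition, not from the original reply): if the dictionary already exists, you can re-key it with an attribute applied to its current keys rather than rebuilding it from literals:
q)m:(`u#key m)!value m  / rebuild the dictionary with a hashed key list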
In kdb+ only vector operations are optimized (a matrix is just a list of
lists), so the more work done in bulk (on long vectors) the better.
So doing one m[raze v] lookup (50,000 elements) is a lot better than doing
1000 separate m v[i] lookups (50 elements each).
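If you want the bulk lookup but still need the 2D shape, a small sketch (using the same m and v as above) is to reshape the razed result afterwards:
q)r:1000 50#m raze v  / one flat lookup, then restore the original shape
q)r~m v               / matches the direct 2D mapping
1b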
I hope it helps.
Regards,
Attila