Win Nick Psaris' new book "Q Tips"

Hi Kdb+/q Gods and mere mortals!

This is mainly for people who are in Hong Kong, China

The prize  is Nick Psaris  book.

Winning will be cheaper than buying via amazon 37 USD Kindle, 54 USD real book

http://www.meetup.com/HK-Functional-programming/events/222347173/

Here is the question

http://keatonchan.blogspot.hk/2009/08/hkid-check-digit-calculation-algorithm.html

Anton

May the fastest code win…  \ts

will a 1-level rainbow table fit in 6M L3 cache?

will a 2-level rainbow table fit in 256k L2 cache?

how about the 32k L1 cache?

\ts:100000 hkidcheck each ids / 1689 msec

\ts:100000 hkidcheck each ids / 1280 msec

This doesn’t seem to be a valid competition. How can one compare performance by a simple \ts output without actually knowing the test configuration?

\ts:100000 hkidcheck each ids / 871j, 624j

My cpus are around i5 3GHz. The timings are within 10% of each other.

Does kdb 32 bit free use SIMD assembly when it performs vector additions and multiplications? Thx.

http://code.kx.com/wiki/Cookbook/FloatPrecision#KDB.2B\_SIMD\_sum

yes, it would not be fair if everyone were allowed to report their own times. code should be submitted before the event and all entries will be times at the meetup with my MacBook Pro using the 32 bit version of q. 

Noel has provided a very good first cut. I ask that we don’t share our solutions until after the meetup. I will report the winning entry and also the fastest algorithm from those who could not attend the meetup. can anyone beat Attila?

in the case of a tie, the shortest solution wins. in addition, global variables can not be used. 

> global variables can not be used.
You have just banned pre-computed tables :)

What is the current record?

My record w/o any pre-computed global lookup table is 7xx.

\ts:100000 hkidcheck each ids / 724j, 624j

each of our machines have different speeds - mine being old and slow.  it is not possible to compare runtime performance across machines.

btw who is good at number theory here?

\ts:100000 hkidcheck each ids / 687j, 624j

w/o global variables

the challenge is insufficiently defined

Using the HKIDs from the above page, Nick will confirm correctness:

q)ids:(“B123456”;“A182361”;“CA182361”;“AB123456”;“ZA182361”;“AZ182361”;“XZ182361”;“ZX182361”;“XX182361”)
q)“651936862”~hkidcheck each ids
1b

Is this to be taken literarily? If so one can come up with fast but useless solutions that satisfy this correctness test

q)“651936862”~hkidcheck each ids
1b
q)
q)\ts:1000 hkidcheck each ids
2 432

Otherwise the (i) length of the input HKID list and (ii) whether it contains duplicates will have a significant impact on determining the most efficient solution

Thanks,
Alex

P.S. OK I just read about the ban on global variables which may (?) make my last comment irrelevant. Still only participants who will have read this thread will know about this additional constraint

would you like to verify with me?

d3:.Q.n cross .Q.n cross .Q.n;

/ sample check

0x462d2a700f9d9feac8012cf818b2a136~md5 raze {hkidcheck each (enlist x) cross (d3 cross d3)} peach (.Q.A,.Q.A cross .Q.A) 20*til 5 / 2401 msec

/ full check

0xd18449f5943cce23d5b420aa0770a926~md5 raze {hkidcheck each (enlist x) cross (d3 cross d3)} peach (.Q.A,.Q.A cross .Q.A) / 4 min

hello alexandre,

yes, it is true that a better test of the submitted function’s accuracy would be to run the function across all possible HKIDs.

a one or two letter prefix and a 6 digit number results in a list of 373,071,582 values. in addition to putting a very high limit on the HK population, it also makes it slow to test all possible combinations. (not to mention the fact that i can’t even generate the complete list of hkids without maxing out the memory on my machine.)

the supplied test helps ensure your submission is accurate.

there have been submissions that incorrectly include the possibility of a “B” checkdigit. the following example test for this case:

q)hkidcheck “XX182460”
“0”

the new list of ids will be:

ids:( “B123456”;“A182361”;“CA182361”;
“AB123456”;“ZA182361”;“AZ182361”;
“XZ182361”;“ZX182361”;“XX182361”;“XX182460”)

nick

> 373,071,582 
How did you get that number? I thought it is 27*26e6=702e6. The regex is [A-Z][A-Z][0-9]{6}

yes, agreed.  it is 27*26*10^6  (not 27*26*9^6)

> Otherwise the (i) length of the input HKID list and (ii) whether it contains duplicates will have a significant impact on determining the most efficient solution

why?

The algorithm complexity is O(n), n=strlen(hkid)=7 or 8. The spreadsheet below illustrates the thought process. I did some peephole optimizations and micro-benchmarks as well. Nick Psaris wrote a great book. Thanks First Derivatives for sponsoring.

hkidcheck.xlsx encoded in base64

http://pastebin.com/2qxjPRYJ