Re: [personal kdb+] Re: Search words inside list of list of words.

Here's an update on the problem of finding words in a list of lists of words.
To recap:

  • kspl is a list of lists of words, such as:

("digby";"morrell";"born";"10";"october";"1979";"is";,"a";"former";"australian";"rules";"footballer";"who";"played";"..
("alfred";,"j";"lewy";"aka";"sandy";"lewy";"graduated";"from";"university";"of";"chicago";"in";"1973";"after";"studyi..
("harpdog";"brown";"is";,"a";"singer";"and";"harmonica";"player";"who";"has";"been";"active";"in";"canadas";"blues";"..
("franz";"rottensteiner";"born";"in";"waidmannsfeld";"lower";"austria";"austria";"on";"18";"january";"1942";"is";"an"..
("henry";"krvits";"born";"30";"december";"1974";"in";"tallinn";"better";"known";"by";"his";"stagename";"genka";"is";"..
("sam";"henderson";"born";"october";"18";"1969";"is";"an";"american";"cartoonist";"writer";"and";"expert";"on";"ameri..
("aaron";"lacrate";"is";"an";"american";"music";"producer";"recording";"artist";"dj";"fashion";"designer";"of";"milkc..
("trevor";"ferguson";"aka";"john";"farrow";"born";"11";"november";"1947";"is";,"a";"canadian";"novelist";"who";"lives..
("grant";"nelson";"born";"27";"april";"1971";"in";"london";"also";"known";"as";"wishdokta";"bump";"flex";"and";"nng";..

I want to count how many times each distinct word in kspl occurs across all rows of kspl. To elaborate: I want to know how many times "digby" occurs in kspl (all rows included), how many times "morrell" occurs, and so on, returning a count of all occurrences for each word.

It turns out that creating dr, as in the earlier mail, is unnecessary. I can do the following without creating dr at all:

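/ collect all words in kspl and group them; group returns a dictionary
/ mapping each distinct word to the positions at which it occurs
g:group raze kspl;
/ count the positions per word to get the number of occurrences
gcount:count each g;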

q)count each g
"digby"     | 1
"morrell"   | 1
"born"      | 79
"10"        | 13
"october"   | 14
"1979"      | 4
"is"        | 98
,"a"        | 99
"former"    | 24
"australian"| 5
"rules"     | 2
"footballer"| 5
"who"       | 38
"played"    | 24
"with"      | 85
"the"       | 100
"kangaroos" | 1
"and"       | 100
"carlton"   | 1
"in"        | 100
"football"  | 10
"league"    | 13
"aflfrom"   | 1
"western"   | 1
"australia" | 7
"his"       | 82
"early"     | 17
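To make the steps easier to follow, here is a small sketch on made-up data. The name toy and its contents are hypothetical stand-ins for kspl, not the real data set; the intermediate names w, g and c are only for illustration.

```q
/ toy stand-in for kspl (hypothetical data): three rows of words
toy:(("the";"cat";"sat");("the";"dog");("cat";"and";"dog";"and";"the"))
/ raze flattens the rows into one list of words
w:raze toy
/ group maps each distinct word to the positions where it occurs in w
g:group w
/ count each turns each list of positions into an occurrence count
c:count each g
/ desc sorts the dictionary by value, most frequent word first
desc c
```

Here c maps "the" to 3, "cat", "dog" and "and" to 2 each, and "sat" to 1. Applying desc to the result is optional; it is just a convenient way to see the most frequent words first.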

This is blazingly fast. I'm running a rickety but faithful Pentium with 4 GB of RAM, so if it screams on my machine, it must be truly fast. I tried it with 50,000 rows (lists) in kspl; \t reports the elapsed time in milliseconds:

q)\t count each group raze kspl

189
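If what is wanted is the number of rows that contain each word, rather than the total number of occurrences, one possible variation is to apply distinct to each row before razing. This is only a sketch on hypothetical toy data (the names toy, tot and perrow are made up), and I have not timed it on the full kspl.

```q
/ hypothetical toy data: "the" appears twice in the first row
toy:(("the";"cat";"the");("the";"dog"))
/ total occurrences across all rows, as in the approach above
tot:count each group raze toy
/ distinct each drops repeats within a row, so a row counts a word at most once
perrow:count each group raze distinct each toy
```

On this toy data tot gives 3 for "the", while perrow gives 2, since "the" appears in two rows. Both give 1 for "cat" and "dog".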

Thanks again, Jack and Sean. 

Kumar