Hello,
I have a list of words:
q)words
"digby"
"morrell"
"born"
"10"
"october"
"1979"
"is"
,"a"
"former"
"australian"
"rules"
"footballer"
"who"
"played"
"with"
"the"
"kangaroos"
"and"
"carlton"
"in"
"football"
"league"
..
I also have a list of lists (each containing words):
q)kspl
("digby";"morrell";"born";"10";"october";"1979";"is";,"a";"former";"australia..
("alfred";,"j";"lewy";"aka";"sandy";"graduated";"from";"university";"of";"chi..
("harpdog";"brown";"is";,"a";"singer";"and";"harmonica";"player";"who";"has";..
("franz";"rottensteiner";"born";"in";"waidmannsfeld";"lower";"austria";"on";"..
("henry";"krvits";"born";"30";"december";"1974";"in";"tallinn";"better";"know..
("sam";"henderson";"born";"october";"18";"1969";"is";"an";"american";"cartoon..
("aaron";"lacrate";"is";"an";"american";"music";"producer";"recording";"artis..
("trevor";"ferguson";"aka";"john";"farrow";"born";"11";"november";"1947";"is"..
("grant";"nelson";"born";"27";"april";"1971";"in";"london";"also";"known";"as..
("cathy";"caruth";"born";"1955";"is";"frank";,"h";,"t";"rhodes";"professor";"..
("sophia";"violet";"sophie";"crumb";"born";"september";"27";"1981";"is";"an";..
("jenn";"ashworth";"is";"an";"english";"writer";"she";"was";"born";"in";"1982..
("jonathan";"hoefler";"born";"august";"22";"1970";"is";"an";"american";"typef..
("anthony";"fitzhardinge";"gueterbock";"18th";"baron";"berkeley";"and";"obe";..
("david";"chernushenko";"born";"june";"1963";"in";"calgary";"alberta";"is";,"..
("joerg";"steineck";"is";,"a";"german";"filmmaker";"editor";"and";"graphic";"..
("fr";"andrew";"pinsent";"born";"19";"august";"1966";"is";"research";"directo..
("paddy";"dunne";"was";,"a";"gaelic";"football";"player";"from";"park";"in";"..
("alexandros";"mouzas";"born";"1962";"is";,"a";"greek";"composer";"he";"studi..
("john";"angus";"campbell";"born";"march";"10";"1942";"in";"portland";"oregon..
("chris";"batstone";"was";"the";"20002002";"lead";"singer";"of";"thirdwave";"..
("ceiron";"thomas";"born";"23";"october";"1983";"is";,"a";"welsh";"rugby";"un..
Each item of the list of lists is shaped as follows:
q)kspl[0]
"digby"
"morrell"
"born"
"10"
"october"
"1979"
"is"
,"a"
"former"
"australian"
"rules"
"footballer"
"who"
"played"
"with"
"the"
"kangaroos"
"and"
"carlton"
"in"
"football"
"league"
..
and so on (kspl was created by applying " " vs to each string in a list of strings).
I need to know which rows of 'kspl' each word of 'words' appears in, so I do:
q)lc:{words in kspl x}each til count kspl;
q)lc
11111111111111111111111111111111111111111111111111111111111111111111111111111..
00000011000000110101000001011000000011000010100100000000001000000100100000000..
00000011000010110101000001001000000011000000100111000000001000000100000000000..
00100011000000110101000001001000000011010010100100000000001000000000000000000..
00100011000010110101000001001000000001000000100010000010000000001010000000000..
00101011000000110101000001001000000011000000100010000000001000000000000000000..
00110011000001110101000001101100000011000011100111000000000000000110000010000..
00100011000010110101000001101100000011000011100110010000001000100110000100000..
This returns one boolean vector per row of 'kspl', flagging which words of 'words' appear in that row.
This is, however, extremely slow. How may I speed this up? I have 59071 rows in all and 580893 words in all.
Thank you,
Kumar
P.S.: The calculation of lc above is akin to the document frequency calculation (df in tf-idf).
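
For illustration, here is a minimal sketch of one direction that might avoid the row-by-row scan: build an inverted index (word to row numbers) in a single pass instead of a dense boolean matrix. If lc holds one flag per (row, word) pair, it would need roughly 59071 x 580893, or about 3.4e10, entries, which may explain much of the slowness. The toy data and the names ks, toks, rows, g, inv and df are hypothetical, and casting the strings to symbols is an assumption made here to keep lookups simple; it has not been timed on the real data.

```q
/ Toy stand-in for the real kspl (hypothetical values, not the actual data)
kspl:(("digby";"morrell";"is";enlist"a");("harpdog";"brown";"is";enlist"a");("sam";"is";"an"))

/ Assumption: cast each row of strings to a symbol list so words become atoms
ks:`$'kspl

/ Flatten all rows once, remembering which row every token came from
toks:raze ks
rows:where count each ks

/ Group token positions by word, then map positions to distinct row numbers
g:group toks
inv:key[g]!distinct each rows value g

/ Rows containing a given word
inv`is          / 0 1 2
inv`a           / 0 1

/ Document frequency: number of rows each word appears in
df:count each inv

/ If a dense boolean vector is still needed for one word, build it on demand
@[count[ks]#0b;inv`is;:;1b]
```

With this layout the memory used grows with the number of tokens rather than with rows times words, and the df mentioned in the P.S. falls out as count each inv.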