Hello John,
Thanks for the thorough reply. It's very useful indeed.
Regards,
Fausto
On Thursday, 30 November 2017 at 10:45:07 UTC+1, Jonathon McMurray wrote:
A slightly less crude solution is to modify your get_date function to be vectorised e.g.
q)get_date:{[t] t where {(not x) and (<>)scan x} t in "[]"} //assumed ;t at end of function was a typo, as that would return original string
q)get_date2:{-1_'1_'raze ("  *   ";",")0:x} //vectorised version using 0:
q)read0`:b.csv
"172.16.5.2,172.16.1.112,[07/Nov/2017:03:16:10 +0100],\"JAX-WS\",0.513,0.513"
"172.16.5.2,172.16.1.112,[07/Nov/2017:03:16:11 +0100],\"JAX-WS\",0.542,0.542"
"172.16.5.2,172.16.1.112,[07/Nov/2017:03:16:13 +0100],\"JAX-WS\",0.574,0.573"
q)get_date each read0`:b.csv
"07/Nov/2017:03:16:10 +0100"
"07/Nov/2017:03:16:11 +0100"
"07/Nov/2017:03:16:13 +0100"
q)get_date2 read0`:b.csv //no need for each as function is vectorised
"07/Nov/2017:03:16:10 +0100"
"07/Nov/2017:03:16:11 +0100"
"07/Nov/2017:03:16:13 +0100"
q)\ts:10000 get_date each read0`:b.csv //speed & memory usage comparison
380 3648
q)\ts:10000 get_date2 read0`:b.csv
131 1392
However, .Q.fs does not return the return value of the function:
q).Q.fs[get_date2]`:b.csv
222
So the function would need to be modified to append to a global variable e.g.
q)get_date3:{d,::-1_'1_'raze ("  *   ";",")0:x}
q).Q.fs[get_date3]`:b.csv
222
q)d
"07/Nov/2017:03:16:10 +0100"
"07/Nov/2017:03:16:11 +0100"
"07/Nov/2017:03:16:13 +0100"
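One caveat with the global-variable approach is that d keeps growing if the file is processed more than once. A minimal sketch, assuming get_date3 and b.csv as defined above, that clears the accumulator before a run:

```q
d:()                        / start from an empty list so earlier runs do not accumulate
.Q.fs[get_date3]`:b.csv     / each chunk's dates are appended to the global d
count d                     / number of dates collected in this run
```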
Hope that helps
Jonathon
From: personal…@googlegroups.com [mailto:personal…@googlegroups.com] On Behalf Of JW Buitenhuis
Sent: 29 November 2017 22:05
To: personal…@googlegroups.com
Subject: Re: [personal kdb+] .Q.fs type error
Hi Fausto,
.Q.fs splits the file up in chunks and passes these chunks into the callback function.
A crude solution would be
.Q.fs[{get_date each x};`:h.csv]
Good luck.
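To make the chunking concrete, here is a minimal sketch of why get_date fails when passed directly to .Q.fs. The variable chunk and its two sample lines are made up for illustration; they stand in for what .Q.fs hands to the callback, which is a list of strings rather than a single string.

```q
/ hypothetical stand-in for one chunk passed by .Q.fs: a list of lines
chunk:("1.1.1.1,[07/Nov/2017:03:16:10 +0100],0.5";"1.1.1.1,[07/Nov/2017:03:16:11 +0100],0.5")
type chunk          / 0h: a general list, one item per line
type first chunk    / 10h: each item is a string
chunk in "[]"       / one boolean vector per line, so the result is nested
```

The where inside get_date expects a single boolean vector, so a nested result of this kind is what triggers the 'type error. Applying the function with each hands it one string at a time.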
On 29 November 2017 at 21:54, Fausto Saporito <fausto…@gmail.com> wrote:
Hello all,
I have a simple function extracting a datetime from a string:
get_date:{[t] t where{(not x)and(<>)scan x} t in"[]";t}
If I apply this function to a string, it works without any problem:
get_date["172.16.5.2 172.16.1.112 [07/Nov/2017:03:16:10 +0100] \"JAX-WS\" 0.513 0.513"]
So I tried to use it with .Q.fs to read a fairly large text file, "h.csv" (UNIX format):
[…]
172.16.5.2,172.16.1.112,[07/Nov/2017:03:16:10+0100],"JAX-WS",0.513,0.513
172.16.5.2,172.16.1.112,[07/Nov/2017:03:16:11+0100],"JAX-WS",0.542,0.542
172.16.5.2,172.16.1.112,[07/Nov/2017:03:16:13+0100],"JAX-WS",0.574,0.573
[…]
But I get this error:
q).Q.fs[get_date;`:h.csv]
{[t] t where {(not x) and (<>)scan x} t in "[]";t}
'type
&:
(0000000000000000000000000000000000000000000000000000000000000000000000000000..
q))
If I use, for example, the function "show" with .Q.fs, I can view the entire file without any problem.
Why does get_date complain about the datatype in this case?
thanks,
Fausto