Hello,
Wondering if anyone can help me out on this. I have a user sending a query that’s hogging resources. Does anyone know a better way to send this query that would not be as resource intensive.
h “select last price by date, sym from trade where date within 2013.01.03 2015.10.28 ,sym in A
AAAAAP
AACAAIT
AALAAME
AANAAOI
AAONAAP
AAPLAAT
AAVAAVL
AAWWAAXJ
AB , time within 15:30:00 16:00:00”
possibly with a much larger symbol list. Is there a better way to send this using adverbs and not sending the whole query directly the hdb?
What resource are you trying to constrain? The amount of time the HDB is locked up (i.e. query speed) or RAM/CPE usage?
If you just want to make this more “server friendly” you can put a q gateway in between your user and the database, which will divvy up the query into batches (say smaller sets of dates/syms) and then aggregate it together at the end. This is a bit more controlled than having a mix of negative port number along with -s slaves. Of course, it’s slower for the end user.
In terms of efficiency date,sym,time in the where clause is optimal for date-partioned HDB’s (assuming sym is parted)
The main thread will independently go into a date partition, pull the data off disk & put it into memory for operation - all of which will take up CPU, RAM and iops on disk.
If you have -s defined on command line (though that won’t guarantee speed up unless you can be cleverer about where the data is actually living on disks, i.e. different disks) then each thread will do this.
Yes I mean the amount of time the hdb is locked.
Yes I do have a gateway between user and db which user connects to via R. I don’t want to lock the gateway either. is it better to loop each date outside of q?
I have slaves running with mserve.q loaded but not sure the best way to utilize it because the deferred sync queries still lock the gateway.
I think the only way you’d manage to keep both gateway and HDB available to other users whilst the hungry query is running is to actually have the client break the query down as you suggest.
You can of course use negative port number for multithreaded input queues, but then you may well get uncontrolled proliferation of memory usage when lots of people are querying simultaneously.
Thanks Manish, What do you think the best way to break the query down is?
Should I loop the dates and the symbol list outside of q? ie just send a query that looks like
h “select from trade where date=d, sym=s”
or loop the dates outside of q and keep symbol list intact in the query?
h “select from trade where date=d, sym in A
AAAAP
AAC`AAIT…”
Am I correct to assume that if one user is looping over the dates and looping over the symbols outside of q, then another user sends a query while that is running, the second query should slide in and get executed synchronously.
> h “select from trade where date=d, sym in A
AAAAP
AAC`AAIT…”
That’s better, because I assume the db is parted on sym and doing a “last x by ..” type of query would be very fast over multiple syms in a single query vs lots of finer queries. You’ll need to expiriment whether its worth doing a small section of days too (e.g. a week at a time).
> Am I correct to assume that if one user is looping over the dates and looping over the symbols outside of q, then another user sends a query while that is running, the second query should slide in and get executed synchronously.
That’s correct
Yes it is parted on sym. Thanks for your help.