- Can KDB+ be integrated with Hadoop, like OpenTSDB/MongoDB/Cassandra?
- KDB+ works in a distributed architecture and supports MapReduce. Is it an alternative to Hadoop?
Can anyone explain this in detail?
> 1. Can KDB+ be integrated with Hadoop, like OpenTSDB/MongoDB/Cassandra?
almost all systems allow another system to integrate. even if it’s a chess game that only accepts mouse clicks, a program can control the OS mouse position and buttons and read the screen.
most systems provide a way to interact with them through
- a C API
- a network protocol
- the filesystem

kdb+ distinguishes itself by having
- the simplest API for writing client apps over IPC
- the same API for extending/integrating kdb+ as a shared library
- an open network protocol (see the websocket code for a JS implementation)

and scripts or data can be deposited on the filesystem and read.
i’m sure the systems you mention above have either a C API or a network protocol that, with a little effort, will integrate with kdb+
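as a taste of how simple the IPC side is, here’s a minimal client sketch in q - it assumes another kdb+ process is already listening on localhost port 5001 (the port is just an example):

```q
h:hopen `::5001    / open a handle to a kdb+ process on localhost, port 5001
h "1+1"            / synchronous request; the remote process evaluates it and returns 2
neg[h] "x:42"      / asynchronous message; sent without waiting for a reply
hclose h           / close the handle
```

the same hopen call works with a remote hostname (`:host:port), which is all the “distributed” plumbing amounts to.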
> 2. KDB+ works in a distributed architecture and supports MapReduce. Is it an alternative to Hadoop?
- yes, kdb+ can be run on multiple hosts, with communication between the kdb+ processes on any of the hosts (search the wiki for hopen)
- “MapReduce” is a bit of a laugh - it was a fundamental concept years before the name “MapReduce” existed.
- a simple example in kdb+ is +/', where ' (each) is map and +/ is reduce; +/' will sum each row of a matrix
- i’ve briefly looked at the Hadoop docs and it’s hard to find a simple (one-page) case study of how to use it.
- i’d go with kdb+, but then i know a little kdb+.
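to make the +/' example above concrete, here’s the map/reduce pattern run over a small matrix (the values are just an illustration):

```q
m:(1 2 3;4 5 6)   / a 2-row matrix
+/'[m]            / "map": sum-reduce each row -> 6 15
+/[+/'[m]]        / "reduce": combine the row sums into a grand total -> 21
```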
jack
effbaie is correct there. kdb+ could be installed on every Hadoop node and your Hadoop workers could use it, but you would be using it as a software component that is not distributed by Hadoop.
Hadoop and kdb+ are complementary technologies - not rivals. Spark on Hadoop has some excellent graph-theoretic and machine-learning libraries. These are ideal for the slow, heavyweight processing that a Spark cluster performs.
With Spark, you have to send a whole Java container over to a remote host, start it up and load it; so if you’re going to all that trouble, you’d better do something really compute-intensive with long, difficult lines of control - multi-threaded heuristic stuff like machine learning. (Once you have the Spark worker nodes running, you can send new data to them.)
q/kdb+, on the other hand, is fast! It starts in a flash and runs all day. Bomb-proof. Many FinTech firms use q/kdb+ for algorithmic trading. The same 5 years of data an analyst uses for back-testing with q-sql can be used by a transactional real-time system - the ticker plant and all that.
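As a sketch of what that q-sql back-testing style looks like - the table, symbols and values here are entirely made up for illustration:

```q
/ a toy trade table - column names and data are hypothetical
trade:([]time:09:30 09:31 09:32;sym:`AAPL`AAPL`MSFT;price:100 101 50f;size:200 100 300)
/ q-sql: volume-weighted average price per symbol
select vwap:size wavg price by sym from trade
```

The same query runs unchanged against an in-memory intraday table or years of on-disk history, which is why one dataset serves both the analysts and the real-time system.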
You would probably run a Spark machine-learning job overnight to analyse some of the data that q/kdb+ has prepared. This would generate metrics for tuning a trading system.
And again, as effbaie points out, q/kdb+ has a very simple C interface and is really fast over a network, so you might have a single-site Spark cluster using q as a client component accessing a server kdb+ database, rather than all that mucking about with the Hadoop File System.
kdb+ databases have a fixed schema. NoSQL databases don’t: the record structure can change.
In a word: No.
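To illustrate the fixed-schema point: a kdb+ table declares a type for every column up front (the column names here are hypothetical):

```q
/ an empty trade table with fixed, typed columns
trade:([]time:`timestamp$();sym:`symbol$();price:`float$();size:`long$())
meta trade    / lists each column's declared type
```

An insert whose values don’t match those column types is rejected with a 'type error, whereas a document store would happily accept a differently-shaped record.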
On Sunday, 24 April 2016 at 13:04:53 UTC+1, Kumaresan G wrote:
- Can KDB+ be integrated with Hadoop, like OpenTSDB/MongoDB/Cassandra?
- KDB+ works in a distributed architecture and supports MapReduce. Is it an alternative to Hadoop?
Can anyone explain this in detail?