Is it possible to integrate kdb+ and Hadoop?

  1. Can kdb+ be integrated with Hadoop, like OpenTSDB/MongoDB/Cassandra?
  2. kdb+ works in a distributed architecture and supports MapReduce. Is it an alternative to Hadoop?

Can anyone explain this in detail?

> 1. Can kdb+ be integrated with Hadoop, like OpenTSDB/MongoDB/Cassandra?

almost all systems allow another system to integrate with them.  even if it’s a chess game that only accepts mouse clicks, a program can control the OS mouse position and buttons and read the screen.

most systems provide a way to interact with them through

  . a C API

  . a network protocol

  . the filesystem

kdb+ distinguishes itself by having

  . the simplest API for writing client apps over IPC
  . the same API for extending/integrating kdb+ as a shared library
  . an open network protocol (see the websocket code for a js impl)

and scripts or data can be deposited on the filesystem and read

i’m sure the systems you mention above have either a C API or a network protocol that, with a little effort, will integrate with kdb+

> 2. kdb+ works in a distributed architecture and supports MapReduce. Is it an alternative to Hadoop?

 . yes, kdb+ can be run on multiple hosts, with communication between the kdb+ processes on any of the hosts (search the wiki for hopen)

 . “MapReduce” is a bit of a laugh - it was a fundamental concept for years before the name “MapReduce” appeared.

    - a simple example in kdb+ is +/' where ' (each) is map and +/ (over) is reduce; +/' will sum each row of a matrix

 . i’ve briefly looked at the Hadoop docs and it’s hard to find a simple (one page) case study of how to use it.

    - i’d go with kdb+, but then i know a little kdb+.
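the map/reduce idea behind +/' can be sketched in python for comparison (a hedged illustration of the concept only - the names and the list-of-rows matrix are mine, not kdb+ code):

```python
from functools import reduce

# a matrix as a list of rows
matrix = [[1, 2, 3],
          [4, 5, 6]]

# reduce: the q expression +/ folds addition over one row
def sum_row(row):
    return reduce(lambda acc, x: acc + x, row, 0)

# map: q's each (') applies that reduction to every row, i.e. +/'
row_sums = list(map(sum_row, matrix))
print(row_sums)  # [6, 15]
```

same shape as the q one-liner: map distributes the work across rows, reduce collapses each row to a single value.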

jack

effbaie is correct there. kdb+ could be installed on every Hadoop node and your Hadoop workers could use it, but you would have to use it as a software component that is not distributed by Hadoop.

Hadoop and kdb+ are complementary technologies - not rivals. Spark, on Hadoop, has some excellent graph-theoretic and machine learning libraries. These are ideal for the very slow processing that a Spark cluster performs.

With Spark, you have to send a whole Java container over to a remote host, start it up and load it; so if you’re going to all that trouble, you’d better do something really compute-intensive, with long, difficult lines of control - multi-threaded heuristic stuff like machine learning. (Once you have the Spark worker nodes running, you can send new data to them.)

q/kdb+, on the other hand, is fast! Starts in a flash, runs all day. Bomb-proof. And many FinTech firms use q/kdb+ for algorithmic trading. The same five years of data an analyst uses for back-testing with q-sql can be used by a transactional real-time system - the Ticker Plant and all that.

You would probably run a Spark machine-learning job overnight to analyse some of the data that q/kdb+ has prepared. This would generate metrics for tuning a trading system.

And again, effbaie points out that q/kdb+ has a very simple C interface and is really fast over a network, so you might have a single-site Spark cluster using q as a client component accessing a server kdb+ database, rather than all that mucking about with the Hadoop File System.

kdb+ databases have fixed schema. NoSQL databases don’t: the record structure can change.
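To illustrate that difference with a Python sketch (illustrative data structures only - this is neither kdb+ nor MongoDB code): a kdb+ table is a fixed set of named columns of equal length, while a NoSQL document store holds records whose fields can vary from one record to the next.

```python
# fixed schema, column-oriented (the shape of a kdb+ table):
# every row has exactly these columns, and all columns are the same length
trades = {
    "sym":   ["AAPL", "MSFT"],
    "price": [101.5, 52.3],
}

# schemaless, record-oriented (the shape of a NoSQL document store):
# each record can carry different fields
docs = [
    {"sym": "AAPL", "price": 101.5},
    {"sym": "MSFT", "price": 52.3, "venue": "XNAS"},  # an extra field is fine
]

# the fixed schema guarantees uniform column lengths; documents make no such promise
assert len(set(len(col) for col in trades.values())) == 1
```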

In a word: No.

On Sunday 24 April 2016 at 13:04:53 UTC+1, Kumaresan G wrote:

  1. Can kdb+ be integrated with Hadoop, like OpenTSDB/MongoDB/Cassandra?
  2. kdb+ works in a distributed architecture and supports MapReduce. Is it an alternative to Hadoop?

Can anyone explain this in detail?