How to debug/resolve 'badmsg

Hi, Masters:
   I have a TP, an RDB and a FeedHandler. The good news is that they work together. The bad news is that the TP sometimes raises 'badmsg, and then the whole system halts. No more data is fed into the RDB until I notice the error and restart the FeedHandler.

   My question is:

   1. How do I analyse the 'badmsg raised by kdb+? I was told .z.bm comes in handy, but I really don’t know how to use it.

   2. The TP is a q script. How can I make that q script fault tolerant?

   Do you have any suggestion?

Regards

Zheng

'badmsg means your feedhandler is publishing corrupted messages to the TP. So the first place to check is not the TP but the feedhandler.

The purpose of .z.bm is to log the bad message so the sending party can be fixed. Try something like
.z.bm:{show "received bad msg:",-3!x} and track down the error.

Unfortunately there doesn’t seem to be a way to ignore bad messages, and they will cause the connection to drop.

Thanks. So I would like to find out which message sent to the TP caused the 'badmsg, because the feedhandler does not always send bad data. If I can find the bad one, I’ll try to work out how it differs from the others.

Hi, JW:
  Thanks for your help.

  I appended the .z.bm function you mentioned to the tickerplant script file. After running for several days, I got the 'badmsg this morning. The output on the TP side is:

  q)"received bad msg:(656i;0x010100003c000000003bcc0b3bd330080b00010000003630303033312e5348000b000100000042000800010000000ad70341060001000000820f0000)"
'badmsg

  That leads to another question: how can I analyse the message to find out what caused the exception?

  I checked the FeedHandler output log, and it doesn’t contain any data with "656i".

Thanks & Regards

Zheng

https://code.kx.com/q/ref/dotz/#zbm-msg-validator
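The argument to .z.bm is a two-item list. The first item is the handle of the connection (the 656i in your output), so you won’t find it in the feedhandler log. The second item is the data, which is the part you want to analyse.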

-9! deserializes bytes, and as you can see it trips up when called on that data.

q)-9!0x010100003c000000003bcc0b3bd330080b00010000003630303033312e5348000b000100000042000800010000000ad70341060001000000820f0000

'badmsg

  [0]  -9!0x010100003c000000003bcc0b3bd330080b00010000003630303033312e5348000b000100000042000800010000000ad70341060001000000820f0000

You’d need to find what causes this particular series of bytes to be sent. One approach is to log what you send with a timestamp, log each 'badmsg with a timestamp, and then correlate the two when you hit another example.
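For the sending side, here is a minimal sketch of that logging in Python. `LoggingSocket` and its `log` argument are hypothetical names, not part of any q client library. It assumes the feedhandler can wrap whatever socket-like object it writes to, so that every outgoing chunk is recorded as hex with a UTC timestamp before it is sent:

```python
import io
from datetime import datetime, timezone


class LoggingSocket:
    """Hypothetical wrapper: logs every outgoing chunk, then forwards it."""

    def __init__(self, sock, log):
        self._sock = sock  # any object with a sendall(bytes) method
        self._log = log    # any text file-like object

    def sendall(self, data: bytes) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        # Hex makes the line directly comparable with the 0x... bytes from .z.bm.
        self._log.write(f"{stamp} {len(data)} {data.hex()}\n")
        self._log.flush()
        self._sock.sendall(data)


class _FakeSocket:
    """Stand-in for a real socket so the sketch runs on its own."""

    def __init__(self):
        self.sent = []

    def sendall(self, data):
        self.sent.append(data)


log = io.StringIO()
fake = _FakeSocket()
LoggingSocket(fake, log).sendall(bytes.fromhex("010100003c000000"))
print(log.getvalue(), end="")
```

When the next 'badmsg arrives, the hex reported by .z.bm can be searched for in this log around the same timestamp.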

You can also play around with -8! to see what kind of data this could be, or compare with some correct data from the feed handler.
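As a first sanity check on the captured bytes, here is a small Python sketch that reads the 8-byte IPC header. It assumes the usual layout: byte 0 is the endianness (1 means little-endian), byte 1 the message type (0 async, 1 sync, 2 response), byte 2 the compression flag, and bytes 4 to 7 the total message length. `parse_ipc_header` is a name made up for this example:

```python
import struct

# Bytes reported by .z.bm in the post above.
BAD = bytes.fromhex(
    "010100003c000000003bcc0b3bd330080b00010000003630303033312e5348000b"
    "000100000042000800010000000ad70341060001000000820f0000"
)


def parse_ipc_header(raw: bytes) -> dict:
    """Decode the 8-byte header and note the first payload byte."""
    little = raw[0] == 1
    (declared,) = struct.unpack("<I" if little else ">I", raw[4:8])
    return {
        "little_endian": little,
        "msg_type": {0: "async", 1: "sync", 2: "response"}.get(raw[1], "unknown"),
        "compressed": raw[2] == 1,
        "declared_length": declared,  # total length claimed by the sender
        "actual_length": len(raw),    # length of what was captured
        "first_type_byte": raw[8],    # type byte of the top-level object
    }


header = parse_ipc_header(BAD)
for key, value in header.items():
    print(key, value)
```

If the declared and actual lengths agree, as they do here, the message was not simply cut short, and the problem is more likely inside the payload itself.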

Hi, JW:
  Many thanks for your explanation and doc resource!

  That sounds like a good way to find the root cause. Let me try it and I will report back.

Thanks!

Zheng

From the message below, it appears that it’s corrupted right from the "3bcc" sequence, where it’s supposed to be the start of a generic list.

Details about the serialization method q uses can be found in doc: http://code.kx.com/q/ref/ipc/
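To put some numbers on that, here is a rough Python sketch. It assumes q's usual vector layout (type byte, attribute byte, 4-byte count, then the data) and that a timestamp is stored as 64-bit nanoseconds since 2000.01.01. Read that way, the 8 bytes right after the header look like a bare timestamp value with no list header or vector header in front of it, while the bytes after it parse as an ordinary one-item symbol vector. This is only one reading of the bytes, and it says nothing about where the missing headers were lost:

```python
import struct
from datetime import datetime, timedelta

# Bytes reported by .z.bm for the first 'badmsg in this thread.
BAD = bytes.fromhex(
    "010100003c000000003bcc0b3bd330080b00010000003630303033312e5348000b"
    "000100000042000800010000000ad70341060001000000820f0000"
)
payload = BAD[8:]  # everything after the 8-byte IPC header

# Assumption: treat the first 8 payload bytes as a raw q timestamp.
(nanos,) = struct.unpack("<q", payload[:8])
as_timestamp = datetime(2000, 1, 1) + timedelta(microseconds=nanos // 1000)

# The next 6 bytes then read as a normal vector header.
vec_type, vec_attr, vec_count = struct.unpack("<bbi", payload[8:14])
# Symbols are null-terminated strings.
symbol = payload[14:payload.index(b"\x00", 14)].decode("ascii")

print(as_timestamp)
print(vec_type, vec_count, symbol)
```

On this message the first line comes out as a time in September 2018 and the second as a symbol vector (type 11) with one item, which is what a healthy record would contain after its headers.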

Hi, Flying and JW:
  Thanks for your help!

  I have been running my toolkit for the past few days and hit 'badmsg again. The output looks like this:

  TP side message:

  q)msg
  2018.09.20D13:00:10.374033000

  (524i;0x010100003c000000008be8dc2cb632080b00010000003630313137392e5348000b0001000000420008000100000085eb614006000100000040060000)

  At the same time, FH side message:

  2018-09-20T13:00:10.374033000

  [QList(['2018-09-20T05:00:06.000000000'], dtype='datetime64[ns]'), QList([b'abc'], dtype='|S3'), QList([b'S'], dtype='|S1'), QList([20.98], dtype=float32), QList([2800])]

  [QList(['2018-09-20T05:00:00.730000000'], dtype='datetime64[ns]'), QList([b'xyz'], dtype='|S3'), QList([b'S'], dtype='|S1'), QList([8.61], dtype=float32), QList([3900])]

  The FeedHandler is written in Python. From the output log, I can’t tell what’s wrong with it. :-(

  Actually, I have hit several 'badmsg errors, but in every case the data seems to have the same structure and normal values.

  I would much appreciate any further guidance, thanks!

  My personal guess: could it be that Python's performance is too poor to keep up?

Regards

Zheng

Hi Zheng,

It appears that the Python feedhandler encodes some of the data in an incorrect format. Maybe it’s out of date?

Perhaps try an alternative Python Q module. Which one are you using? 

Will

Hi, JW:
  Sorry for a (very) late reply.

  As for the Python version, my code uses the pyrfa (v8.5.2) and qpython (v1.2.2) packages. Currently, I’m trying to switch to a C++ version.

Thanks

Zheng

I can now confirm that the 'badmsg was caused by the low efficiency of my Python interface.
With a new one written in C++, the whole process works smoothly and there are no more exceptions.

JW, Flying, Cormac, Thanks for your suppoooooooooort!