Need help reading an .hdf file written in Python into kdb+

Hi Community,

I am using the HDF5 interface (https://code.kx.com/q/interfaces/hdf5/) and trying to read an HDF file (written in Python) in kdb+ (q).

Attached HDF file and error message.

Please advise how to read this file?

 


rgds,

Marion

 

The error message JPEG looks like source code; I don’t see an HDF file attached. Did I miss something?

 

Hi, 

How can I attach a (.hdf) file in this community?

The below error message says the valid file types are jpg, gif, png, json and txt only.

please advise.

Unless I can email it to you from my @first derivatives.com account?

 

rgds,

Marion

Try appending .txt to the file name, e.g. attach file.hdf as file.hdf.txt?

Hi,

Please find the file attached as test.hdf.txt for your review.

rgds,

Marion

Hi,

I tried to attach a .txt file, but that format was not accepted.

So I have attached a screen grab of the .hdf file as a PNG for your review.

How can I send a .hdf or .txt file here? It does not seem to be allowed, sorry!

Hi,

I tried again just now and managed to attach the file as a .txt file.

Can you please have a look and advise back?

Thank you so much!

Hi MSHK, there does not seem to be any txt file attached, only PNG files. (Their contents are real PNG data, so renaming them to .hdf does not produce a valid HDF file.)
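One way to confirm what a file really contains, whatever its extension, is to look at its first 8 bytes. The sketch below is only an illustration: the helper names sniff and sniff_file are made up for this example, and it checks offset 0 only, so it would miss an HDF5 file that starts with a user block (where the signature sits at byte 512, 1024, and so on).

```python
# File signatures: both formats start with a fixed 8-byte marker.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff(header: bytes) -> str:
    """Classify a file from its leading bytes (hypothetical helper)."""
    if header.startswith(HDF5_MAGIC):
        return "hdf5"
    if header.startswith(PNG_MAGIC):
        return "png"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file on disk and classify them."""
    with open(path, "rb") as f:
        return sniff(f.read(8))
```

Calling sniff_file on a screenshot that was renamed to .hdf would report "png", which is why the interface cannot open it.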

Hi,

Thanks for checking.

There was an error attaching the .txt file: after I typed the message, the file could not be attached and was removed. How can I send the sample hdf file to you?


If you run through the example in q and then inspect the created file in Python, you will see the layout the interface expects:

https://code.kx.com/q/interfaces/hdf5/examples/#create-a-dataset 

Each table column is stored as its own dataset inside a group:

 

>>> import h5py as h5
>>> data = h5.File('experiments.h5', 'r')
>>> data['experiment2_tables']
<HDF5 group "/experiment2_tables" (1 members)>
>>> data['experiment2_tables/tab_dset']
<HDF5 group "/experiment2_tables/tab_dset" (5 members)>
>>> data['experiment2_tables/tab_dset/class']
<HDF5 dataset "class": shape (10000,), type "<i2">

 

(The filename must end in ‘.h5’)

To store data from Python, match this layout: a group for the table, with one dataset per column.

import h5py as h5
import pandas as pd

df = pd.DataFrame({"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]})
f = h5.File('forKX.h5', 'w')
project = f.create_group("project")
table = project.create_group("table")
# one dataset per column, stored inside the "table" group
for col in df.columns:
    table[col] = df[col].to_numpy()
f.close()

kdb+ still does not know that you intend this data to be a table.

As outlined in the docs (https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries), attributes on the group are needed for that.

Without the attributes, you can read each column and reshape the result into a table like so:

q){flip x!{.hdf5.readData["forKX.h5";"project/table/",string x]} each x}`AA`BB`CC
AA BB CC
--------
1  3  5
2  4  6

 

 

 

This code creates a basic table in a file written by KX:

 

t:([] AA:1 2;BB:3 4;CC:5 6)
.hdf5.createFile["byKX.h5"]
.hdf5.createGroup["byKX.h5";"project"]
.hdf5.writeData["byKX.h5";"project/table";t]

 

If we expand what that is doing, following the documentation (https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries), we can create the exact same file by hand with:

 

t:([] AA:1 2;BB:3 4;CC:5 6)
.hdf5.createFile["diy.h5"]
.hdf5.createGroup["diy.h5";"project"]
.hdf5.createGroup["diy.h5";"project/table"]
{.hdf5.writeData["diy.h5";"project/table/",string x;t x]} each cols t
.hdf5.writeAttr["diy.h5";"project/table";"datatype_kdb";"table"]
.hdf5.writeAttr["diy.h5";"project/table";"kdb_columns";cols t]

 

Finally, this would be the Python equivalent:

 

import h5py as h5
import pandas as pd
import numpy as np

df = pd.DataFrame({"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]})
f = h5.File('forKX.h5', 'w')
project = f.create_group("project")
table = project.create_group("table")
# attributes that tell the interface this group is a table
table.attrs["datatype_kdb"] = np.array([ord(c) for c in 'table'], dtype=np.int8)
table.attrs["kdb_columns"] = [x.encode('ascii') for x in df.columns]
# one dataset per column
for col in df.columns:
    table[col] = df[col].to_numpy()
f.close()

 

All three read in the same way:

 

q).hdf5.readData["byKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["diy.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["forKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6

 

 

h5dump is useful for inspecting h5 files.

https://support.hdfgroup.org/HDF5/doc/RM/Tools/h5dump.htm 

Run against a file, it prints the shapes and types of the file's contents:

 

h5dump diy.h5
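If h5dump is not installed, a rough equivalent from Python is to walk the file with h5py and list each dataset's path, shape and type. This is only a sketch under assumptions: describe and demo.h5 are names made up for this example, and the demo file simply mimics the one-dataset-per-column layout of forKX.h5 above.

```python
import os
import tempfile

import h5py
import numpy as np


def describe(path):
    """Return (name, shape, dtype string) for every dataset in the file."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, obj.dtype.str))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return sorted(found)


# Build a small demo file: one dataset per column inside project/table
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    table = f.create_group("project/table")
    for col, values in {"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]}.items():
        table[col] = np.array(values, dtype=np.int64)

summary = describe(demo_path)
for name, shape, dtype in summary:
    print(name, shape, dtype)
```

Running describe on your own file shows quickly whether the columns are plain datasets inside a group, or something else (for example a single compound dataset) that would need a different approach.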

 

 

 

The three main takeaways are:

  1. Only the supported types are available with this interface: https://code.kx.com/q/interfaces/hdf5/hdf5-types/#type-mapping 

  2. In the real world, tabular data in a .h5 file that came from another source will not read straight into a kdb+ table. You will need to extract the data column by column, as shown in the earlier example.

  3. If you have a file the interface is unable to read, you can still use embedPy to manipulate the data in Python and transfer it to kdb+.
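For points 2 and 3, the Python side of a column-by-column extraction might look like the sketch below. read_columns and cols.h5 are hypothetical names for this example, and it assumes every dataset directly under the group is one column. How you load the function through embedPy and convert the result in q is not shown here.

```python
import os
import tempfile

import h5py
import numpy as np


def read_columns(path, group):
    """Read every dataset directly under `group` into {column name: list}."""
    with h5py.File(path, "r") as f:
        return {
            name: node[()].tolist()  # node[()] loads the whole dataset
            for name, node in f[group].items()
            if isinstance(node, h5py.Dataset)
        }


# Demo file with the same shape as the forKX.h5 example
path = os.path.join(tempfile.mkdtemp(), "cols.h5")
with h5py.File(path, "w") as f:
    table = f.create_group("project/table")
    for col, values in {"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]}.items():
        table[col] = np.array(values)

columns = read_columns(path, "project/table")
print(columns)
```

A dictionary of equal-length columns like this is the same shape of data that the q one-liner above flips into a table.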