Need help reading an HDF file written in Python into kdb+

MSHK1 · May 30, 2022, 4:23am

Hi Community,

I am using the HDF5 interface (https://code.kx.com/q/interfaces/hdf5/) and trying to read an HDF file (written in Python) in kdb+ (q).

Attached HDF file and error message.

Please advise how to read this file.

rgds,

Marion

SJT1 · May 30, 2022, 7:58am

The error message JPEG looks like source code; I don't see an HDF file attached. Did I miss something?

MSHK1 · May 30, 2022, 10:05am

Hi,

How can I attach a (.hdf) file in this community?

The below error message says the valid file types are jpg, gif, png, json and txt only.

Please advise.

Unless I can email it to you from my @first derivatives.com account?

rgds,

Marion

SJT1 · May 30, 2022, 4:50pm

Try appending .txt to the file name, e.g. rename file.hdf to file.hdf.txt?

MSHK1 · May 31, 2022, 2:59am

Hi,

Please find the file attached as test.hdf.txt for your review.

rgds,

Marion

MSHK1 · May 31, 2022, 3:19am

Hi,

I tried to attach a .txt file, but that format is not allowed.

So I took a screenshot of the .hdf file and attached it as a PNG for your review here.

How can I send a .hdf or .txt file here? It is not allowed, sorry!

MSHK1 · June 1, 2022, 7:58am

Hi,

I tried again just now and managed to attach the file as a .txt.

Can you please have a look and advise?

Thank you so much!

rocuinneagain1 · June 2, 2022, 11:04am

Hi MSHK, there does not seem to be any .txt file attached, only PNG files. They contain real PNG data, so renaming them to .hdf does not make them valid HDF files.
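A quick way to tell a genuine HDF5 file from a renamed image is to look at the first 8 bytes of the file, since both formats start with a fixed signature. The following is a minimal sketch; the function name `sniff_format` and the demo file name are made up for illustration, and it only checks offset 0 of the file.

```python
import os
import tempfile

# The first 8 bytes identify the format regardless of the file extension.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_format(path):
    """Return 'hdf5', 'png' or 'unknown' from the leading 8 bytes of a file.

    Limitation: an HDF5 file with a user block keeps its signature at a
    later offset (512, 1024, ...); this sketch only checks offset 0.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head == HDF5_SIGNATURE:
        return "hdf5"
    if head == PNG_SIGNATURE:
        return "png"
    return "unknown"


# Demo: a file with PNG contents is still a PNG after renaming it to .hdf.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "test.hdf")  # hypothetical file name
    with open(fake, "wb") as fh:
        fh.write(PNG_SIGNATURE + b"rest of the image data")
    print(sniff_format(fake))
```

Running the same check on the file before uploading it would show whether it is worth attaching at all.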

MSHK1 · June 2, 2022, 11:08am

Hi,

Thanks for checking.

There was an error attaching the .txt file: after I typed the message, the file could not be attached and was removed. How can I send the sample HDF file to you?

rocuinneagain1 · June 2, 2022, 12:31pm

If you run through the example in q and then inspect the created file in Python, you will see what the interface expects:

https://code.kx.com/q/interfaces/hdf5/examples/#create-a-dataset

The table columns are stored individually inside groups:

>>> import h5py as h5
>>> data = h5.File('experiments.h5', 'r')
>>> data['experiment2_tables']
<HDF5 group "/experiment2_tables" (1 members)>
>>> data['experiment2_tables/tab_dset']
<HDF5 group "/experiment2_tables/tab_dset" (5 members)>
>>> data['experiment2_tables/tab_dset/class']
<HDF5 dataset "class": shape (10000,), type "<i2">

(The filename must end in ‘.h5’)

To store data from Python, match this layout: a group for the table, with each column stored as a dataset inside it.

import h5py as h5
import pandas as pd
df = pd.DataFrame({"AA":[1, 2], "BB":[3, 4], "CC":[5, 6]})
f = h5.File('forKX.h5','w')
project = f.create_group("project")
table = project.create_group("table")
for col in df.columns:
    table[col] = df[col].to_numpy()
f.close()

kdb+ still does not know you intend this data to be a table.

As outlined in the docs (https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries), attributes would be needed.

Without the attributes, you can reshape the columns into a table like so:

q){flip x!{.hdf5.readData["forKX.h5";"project/table/",string x]} each x}`AA`BB`CC
AA BB CC
--------
1  3  5
2  4  6

rocuinneagain1 · June 2, 2022, 6:56pm

This code creates a basic table in a file written by KX:

t:([] AA:1 2;BB:3 4;CC:5 6)
.hdf5.createFile["byKX.h5"]
.hdf5.createGroup["byKX.h5";"project"]
.hdf5.writeData["byKX.h5";"project/table";t]

If we expand what it is doing to match the documentation (https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries), we can create the exact same file with:

t:([] AA:1 2;BB:3 4;CC:5 6)
.hdf5.createFile["diy.h5"]
.hdf5.createGroup["diy.h5";"project"]
.hdf5.createGroup["diy.h5";"project/table"]
{.hdf5.writeData["diy.h5";"project/table/",string x;t x]} each cols t
.hdf5.writeAttr["diy.h5";"project/table";"datatype_kdb";"table"]
.hdf5.writeAttr["diy.h5";"project/table";"kdb_columns";cols t]

Finally, this would be the Python equivalent:

import h5py as h5
import pandas as pd
import numpy as np
df = pd.DataFrame({"AA":[1, 2], "BB":[3, 4], "CC":[5, 6]})
f = h5.File('forKX.h5','w')
project = f.create_group("project")
table = project.create_group("table")
table.attrs["datatype_kdb"] = np.array( [ord(c) for c in 'table'], dtype=np.int8)
table.attrs["kdb_columns"] = [x.encode('ascii') for x in df.columns]
for col in df.columns:
    table[col] = df[col].to_numpy()
f.close()

All three read in the same way:

q).hdf5.readData["byKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["diy.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["forKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6

h5dump is useful to inspect h5 files.

https://support.hdfgroup.org/HDF5/doc/RM/Tools/h5dump.htm

When run on a file, it prints the shapes and types of its contents:

h5dump diy.h5
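If h5dump is not installed, a rough equivalent can be sketched in Python with h5py by walking the file and listing each dataset's path, shape and dtype. The helper name `summarize` and the demo file contents are assumptions for illustration; it lists datasets only, not attributes.

```python
import os
import tempfile

import h5py
import numpy as np


def summarize(path):
    """Return a list of (dataset path, shape, dtype name) for every dataset."""
    rows = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file;
        # returning None tells h5py to keep walking.
        if isinstance(obj, h5py.Dataset):
            rows.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return rows


# Demo: build a small file with one dataset per column and list its contents.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "diy.h5")
    with h5py.File(path, "w") as f:
        table = f.create_group("project/table")
        for col, values in {"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]}.items():
            table[col] = np.array(values, dtype=np.int64)
    for name, shape, dtype in summarize(path):
        print(name, shape, dtype)
```

The dtype column is the part to compare against the interface's type mapping.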

The three main takeaways are:

1. Only supported types are available with this interface: https://code.kx.com/q/interfaces/hdf5/hdf5-types/#type-mapping
2. In the real world, tabular data from another source in a .h5 file will not read straight into a kdb+ table. You will need to extract the data column by column, as I showed in a previous example.
3. If you have a file the interface is unable to read, you can still use embedPy to manipulate the data and transfer it to kdb+.
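For the embedPy route, the Python side could look something like the sketch below: read every dataset directly under a group into a dictionary of NumPy arrays, one entry per column. The helper name `read_columns` is made up for illustration, and how the resulting dictionary is passed back to q through embedPy is not shown here.

```python
import os
import tempfile

import h5py
import numpy as np


def read_columns(path, group):
    """Read each dataset directly under `group` into a dict of NumPy arrays."""
    with h5py.File(path, "r") as f:
        return {
            name: item[()]  # [()] loads the whole dataset into memory
            for name, item in f[group].items()
            if isinstance(item, h5py.Dataset)  # skip nested groups
        }


# Demo: write three columns, then read them back by column name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "forKX.h5")
    with h5py.File(path, "w") as f:
        table = f.create_group("project/table")
        for col, values in {"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]}.items():
            table[col] = np.array(values, dtype=np.int64)
    columns = read_columns(path, "project/table")
    print({name: arr.tolist() for name, arr in columns.items()})
```

A dictionary of equal-length columns maps naturally onto a kdb+ table once it is on the q side.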
