I’m curious whether kdb+/q would be an appropriate solution for spatial time-series data. Climate data is typically 4-D and can be quite large. I’d like to be able to do some in-memory computations and also query at much faster speeds than the current standard structures allow (GRIB or NetCDF files with Python xarray).
Yes, it could be used.
To test it out, you could look at PyKX for an easy Python interface.
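PyKX is on PyPI, so getting started is just:

pip install pykx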
A 2-minute example of passing a Dataset to q is shown below.
PyKX also supports Registering Custom Conversions, so you could create a function that passes the Dataset to q in exactly the form you want, instead of passing it all as a dictionary as in my example below.
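For instance, a conversion along these lines could be registered so kx.toq handles Datasets directly. This is just a sketch: it assumes you only want the data variables as named arrays, and the function name xarray_dataset_to_q is made up here.

import pykx as kx
import xarray as xr

def xarray_dataset_to_q(ds):
    # Send only the data variables, as a q dictionary of name -> array,
    # rather than the full nested to_dict() structure
    return kx.toq({name: da.values for name, da in ds.data_vars.items()})

# Once registered, kx.toq (and kx.q assignment) uses this for any xr.Dataset
kx.register.py_toq(xr.Dataset, xarray_dataset_to_q)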
import pykx as kx
import xarray as xr
import numpy as np
import pandas as pd
ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(5, 5))},
    coords={
        "x": [10, 20, 30, 40, 50],
        "y": pd.date_range("2000-01-01", periods=5),
        "z": ("x", list("abcde")),
    },
)
kx.q['ds'] = kx.toq(ds.to_dict())
kx.q('ds')
pykx.Dictionary(pykx.q('
coords   | `x`y`z!+`dims`attrs`data!((,`x;,`y;,`x);(()!();()!();()!());(10 20..
attrs    | ()!()
dims     | `x`y!5 5
data_vars| (,`foo)!+`dims`attrs`data!(,`x`y;,()!();,(0.7412575 0.2054306 0.10..
'))
kx.q('flip ds[`coords;;`data]')
pykx.Table(pykx.q('
x  y                             z
----------------------------------
10 2000.01.01D00:00:00.000000000 a
20 2000.01.02D00:00:00.000000000 b
30 2000.01.03D00:00:00.000000000 c
40 2000.01.04D00:00:00.000000000 d
50 2000.01.05D00:00:00.000000000 e
'))
kx.q('ds[`data_vars;`foo;`data]')
pykx.List(pykx.q('
0.7412575 0.2054306   0.1009393 0.8792678 0.04105999
0.1811459 0.01659637  0.2406029 0.4900055 0.551788
0.6303767 0.0702013   0.6831359 0.5961667 0.3722388
0.9255059 0.9202499   0.5055902 0.9767793 0.7440498
0.7331576 0.003197568 0.4939932 0.5433492 0.01175784
'))
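From there you can keep everything in q and query it in memory. A rough sketch of where you might go next (the table name t and the column name rowAvg are my own choices for illustration, not anything PyKX-specific):

# Build an in-memory q table from the coords, then attach each row's
# 5-item vector of foo values as a column
kx.q('t:flip ds[`coords;;`data]')
kx.q('t:update foo:ds[`data_vars;`foo;`data] from t')

# qSQL query: average each row's foo vector
kx.q('select x, z, rowAvg:avg each foo from t')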