If you had a folder structure like:
$ tree .
.
├── A1
│ ├── B1
│ │ ├── part1.parquet
│ │ └── part2.parquet
│ └── B2
│ ├── part1.parquet
│ └── part2.parquet
└── A2
├── B1
│ ├── part1.parquet
│ └── part2.parquet
└── B2
├── part1.parquet
└── part2.parquet
6 directories, 8 files
You can build information using the folder structure:
q)files:([] file:system"find . -name *.parquet")
q)files:update split:"/" vs/:file from files
q)files:update file:hsym `$file, columnA:`$.[;(::;1)] split,columnB:`$.[;(::;2)] split from files
q)files
file split columnA columnB
---------------------------------------------------------------------
:./A2/B2/part2.parquet ,"." "A2" "B2" "part2.parquet" A2 B2
:./A2/B2/part1.parquet ,"." "A2" "B2" "part1.parquet" A2 B2
:./A2/B1/part2.parquet ,"." "A2" "B1" "part2.parquet" A2 B1
:./A2/B1/part1.parquet ,"." "A2" "B1" "part1.parquet" A2 B1
:./A1/B2/part2.parquet ,"." "A1" "B2" "part2.parquet" A1 B2
:./A1/B2/part1.parquet ,"." "A1" "B2" "part1.parquet" A1 B2
:./A1/B1/part2.parquet ,"." "A1" "B1" "part2.parquet" A1 B1
:./A1/B1/part1.parquet ,"." "A1" "B1" "part1.parquet" A1 B1
And then use the information to build the virtual table:
q).pq:use`kx.pq
q).pq.t:use`kx.pq.t
q)t:.pq.t.mkP (`columnA`columnB#files)!.pq.pq each exec file from files
q)select from t
columnA columnB c_bool c_int8
-----------------------------
A2 B2 0 5
A2 B2 0 6
A2 B2 1 7
A2 B2 0 8
A2 B2 0 9
A2 B2 1 1
A2 B2 1 2
A2 B2 0 3
A2 B2 1 4
A2 B1 0 5
A2 B1 0 6
A2 B1 1 7
A2 B1 0 8
A2 B1 0 9
A2 B1 1 1
A2 B1 1 2
A2 B1 0 3
A2 B1 1 4
A1 B2 0 5
A1 B2 0 6
A1 B2 1 7
A1 B2 0 8
A1 B2 0 9
A1 B2 1 1
A1 B2 1 2
A1 B2 0 3
A1 B2 1 4
A1 B1 0 5
A1 B1 0 6
A1 B1 1 7
A1 B1 0 8
A1 B1 0 9
A1 B1 1 1
A1 B1 1 2
A1 B1 0 3
A1 B1 1 4
To convert this to a more dynamic reusable pattern you could create a utility in a file buildVT.q:
.pq:use`kx.pq
.pq.t:use`kx.pq.t
buildVT:{[path;levels]
files:([] file:system"find ",(1 _ string path)," -name *.parquet");
files:update split:"/" vs/:file from files;
lv:1+count where "/"=string path;
levels:(),levels;
files:![files;();0b;(`file,levels)!enlist[({hsym `$x};`file)],{(({`$.[y;(::;x)]};x);`split)} each lv+til count levels];
.pq.t.mkP (levels#files)!.pq.pq each exec file from files
};
Then call buildVT passing the root path and names for the nested levels:
q)\l buildVT.q
q)t:buildVT[`:/home/rocuinneagain/pqHDB;`columnA`columnB]
q)select from t
columnA columnB c_bool c_int8
-----------------------------
A2 B2 0 5
A2 B2 0 6
A2 B2 1 7
A2 B2 0 8
A2 B2 0 9
...
If you have a large number of files rather than running buildVT each time you could save the files table out to a .csv and use that on startup for faster creation of the virtual table.