I’m working on the Advanced Capstone project and I keep getting the fail shown below, “check if all values are correct”, for exercise 1.3. I don’t understand why I’m getting this fail, because when I apply my code (also shown below) to smaller cases, such as the example given in the exercise, I can match the results. I have tried both wj and wj1, as well as eliminating rows where sensorValue is null, and in all cases I get the same fail. Could anyone help diagnose what’s going on? I have provided below the final table and its meta from running the testSection[`exercise1] function to see what the output is. All other exercises are fine.
// @fileOverview A function to aggregate event and sensor data, creating a new table with average sensor values per sensor per lap
// @param eventTab {table} A table that has columns `date`session`lapId`time`endTime
// @param sensorTab {table} A table that has columns `date`sensorId`time`lapId`units`sensorValue`session
// @returns {table} Aggregated sensor values per lap, derived using a window join, with columns `session`lapId`time`endTime`sensorId`sensorValue
{[eventTab;sensorTab]
  r:(select session,lapId,time,endTime from eventTab) cross select distinct sensorId from sensorTab;
  wj1[flip exec time from select time:(time,'endTime) from r;
      `session`lapId`sensorId`time;
      r;
      (select from sensorTab;(avg;`sensorValue))]
  // Pseudocode:
  // 1. create cross product of event table with unique list of sensorIds - see https://code.kx.com/q/ref/cross/
  // 2. get a pair of lists of times/timestamps, begin and end - use time and endTime from the cross-product table (#1)
  // 3. use wj: see https://code.kx.com/q/ref/wj/
  //    with window from #2
  //    columns to join on are sensorId and time
  //    first table to join on from #1
  //    sensor table as the second table to join on
  //    avg sensorValue as the aggregation function
 }
Thanks for the question. Make sure to check the pseudocode when creating these functions, as it provides lots of guidance and tips.
Look at the example that they provided and look closely at what happens to the `tirePressureBackLeft sensorId. It doesn’t have a reading in the time range for the lapTable, but the average value column is still populated. This is different from the results that you got, as your table contains null values.
The issue stems from the common columns in the join: you have specified too many to join over, and so some rows don’t get an average. Try a smaller, suitable combination of common columns, and make sure to check the pseudocode as well.
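In case it helps to see the shape, here is a rough sketch of what the pseudocode describes (wj rather than wj1, with only sensorId and time as the common columns; the variable names and window construction are just illustrative, and it assumes sensorTab arrives sorted by sensorId and time as in the exercise data):

{[eventTab;sensorTab]
  // cross product of laps with the distinct sensor ids (step 1 of the pseudocode)
  r:(select session,lapId,time,endTime from eventTab) cross select distinct sensorId from sensorTab;
  // pair of begin/end time lists, one window per row of r (step 2)
  w:(r`time;r`endTime);
  // window join on sensorId and time only, averaging sensorValue inside each window (step 3)
  wj[w;`sensorId`time;r;(sensorTab;(avg;`sensorValue))]}

The cross product and window construction are the same as in your function; only the common-column list differs.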
I was able to pass all section 1 tests, but I would still like to have a bit of a conceptual discussion about window joins for clarity. I’ll refer to the parameters as they are labelled in the documentation (Window Joins) and introduce some of my own shorthand.
Here is my current conception of window joins. Speaking generally, suppose we have a table t, c=`c1`c2`c3`time, and, for simplicity, we are performing one aggregation f0 on column c0 of table q. Would a window join with these parameters tell kdb+,
"Okay, if table t has a row r with values v1,v2,v3 for columns c1,c2,c3 respectively, then:
Look in the reference table q for the set of rows that also have values v1,v2,v3 for columns c1,c2,c3 respectively (let’s call this set of rows in q, Rq). The additional requirement for this set is that its time column values fall within (and including) the bounds set by the window in w corresponding to row r in t.
Now, for the entries in column c0 of Rq, run the aggregation function f0 (let’s call the result f0(Rq)).
Attach the result f0(Rq) to row r.
Repeat this process for each remaining row in table t.
Return the resulting table (called the window join)."?
If this is true, then with c=`sensorId`time, wouldn’t that, according to the above, also be averaging across all sessions and lapIds for that sensor (within the given window in w)? That is not what we want; we want averages by session, lapId, and sensorId, not just sensorId. Therefore I thought that c should be `session`lapId`sensorId`time. I know this isn’t the case, but I’m unsure how to reconcile my current conception. I apologize in advance for the length of my question, but I believe this sort of inquiry requires thoroughness. If you or anyone can offer any guidance, I would be very grateful for your help.
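For concreteness, here is a toy sketch of the algorithm I described above, using made-up tables and values and the t/q/c/w labels from the documentation (wj1, so only readings whose time falls inside each window are aggregated):

t:([] sensorId:`s1`s1; time:10:00:00 10:05:00; endTime:10:04:59 10:09:59)
q:([] sensorId:6#`s1; time:10:00:30 10:01:30 10:03:00 10:05:30 10:07:00 10:08:00; sensorValue:1 2 3 10 20 30f)
c:`sensorId`time
w:(t`time;t`endTime)                / begin/end lists, one window per row of t
wj1[w;c;t;(q;(avg;`sensorValue))]   / row 1 averages 1 2 3f -> 2, row 2 averages 10 20 30f -> 20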
That’s a wonderful question, and you are precisely right: as described, we would be averaging over all sessions and lapIds. However, in the data we provide to the function, the time windows restrict the join so that the session and lapId already match up. This is why the common columns are specified as they are in the pseudocode, just for consistency when performing the tests.
If you run the same function with the two different sets of common columns, you should find that the data matches up perfectly, except that one of the tables has empty values because the prevailing average is ignored.
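To illustrate with a toy example (made-up names and values, not the exercise data), the sketch below compares the two choices of common columns: the four-column join returns nulls for lap/sensor combinations with no readings in the window, while the two-column wj fills those windows from the prevailing (last-known) reading:

eventTab:([] session:2#`s1; lapId:1 2; time:10:00:00 10:05:00; endTime:10:04:59 10:09:59)
sensorTab:([] session:3#`s1; lapId:3#1; sensorId:`speed`speed`tirePressureBackLeft; time:10:01:00 10:02:00 10:01:30; sensorValue:100 110 32f)
r:(select session,lapId,time,endTime from eventTab) cross select distinct sensorId from sensorTab
w:(r`time;r`endTime)
r2:wj[w;`sensorId`time;r;(sensorTab;(avg;`sensorValue))]                  / every row populated
r4:wj[w;`session`lapId`sensorId`time;r;(sensorTab;(avg;`sensorValue))]    / lap 2 rows come back null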
Thanks so much Luke for your response and for confirming that algorithm; you have greatly reassured me. Though I am not sure what you mean by “the time windows restrict it so that they already match up the session and lapId” - do you mean that, by letting c=`sensorId`time, this coincidentally happens to create an Rq that matches r also on session and lapId? If so, how could we have known that about the data beforehand?
Thank you for raising this ambiguity and persisting in trying to get to an answer. You are right - the user wouldn’t know this feature. This appears to be a genuine conflict between how the question is asked and the pseudocode/unit tests.
We will adjust the latter to reflect your point - the join should happen on 4 columns, meaning there will be some empty values in the resulting table.