Machine learning and KDB+ - Kaggle competition.

Hello q folks, 

I’ve implemented a linear regression model (with L2 regularization, i.e. ridge regression) on some Kaggle competition data - specifically this one: https://www.kaggle.com/c/house-prices-advanced-regression-techniques ("House Prices - Advanced Regression Techniques").
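
For anyone who wants the flavor before opening the repo, here is a rough, self-contained sketch of ridge regression trained with batch gradient descent in q. The names (ridge, X, y, lambda, alpha, iters) are placeholders for this post rather than the ones used in the repo, which remains the actual implementation:

/ rough sketch: batch gradient descent for ridge (L2-regularized) linear regression
/ X: n x p float matrix (one row per house), y: length-n float target vector,
/ lambda: L2 penalty, alpha: learning rate, iters: number of iterations
/ assumes the columns of X have already been standardized during wrangling
ridge:{[X;y;lambda;alpha;iters]
  n:count y; p:count first X;
  b:p#0f;                                / start from zero weights
  do[iters;
    e:(X mmu b)-y;                       / residuals
    g:((flip[X] mmu e)%n)+lambda*b;      / gradient of MSE plus L2 term
    b-:alpha*g];
  b }

/ e.g. beta:ridge[X;y;0.1;0.01;1000]; yhat:X mmu beta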

The competition is about predicting house sale prices from 79 descriptive features of each home.

The code for data wrangling and regression (training and testing) is at https://github.com/krish240574/kaggle-deandecock/tree/master. You will need to download the train and test datasets from Kaggle yourself.

I submitted the results produced by this code to Kaggle and currently have a score of 0.33 on the leaderboard. What remains to be done:

  • More stringent feature engineering - I’m using all features right now; I need to select them based on their correlation with SalePrice (the target) and will probably drop some (see the first sketch after this list).

  • XGBoost - I’ll probably implement this in q over the next few days.

  • Some of the data in the training set is skewed; I’ll need to handle that (see the second sketch after this list).

  • Lasso regression - I already have code for this in the repo and will integrate it.
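
On the feature-engineering point, one simple way to get the correlation ranking in q is with the built-in cor. This is only a sketch; train is a placeholder for whatever table the wrangling code produces, the 0.3 cutoff is arbitrary, and it assumes nulls in the numeric columns have already been filled:

/ numeric columns of a (hypothetical) train table, excluding the target
numcols:(exec c from meta[train] where t in "hijef") except `SalePrice;
/ correlation of each numeric feature with SalePrice
corrs:numcols!{train[`SalePrice] cor train x} each numcols;
/ keep features above the cutoff, plus the target itself
keep:where 0.3<abs corrs;
train:(keep,`SalePrice)#train;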
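
On the skew point, a common first step for right-skewed columns such as SalePrice is a log transform. A short sketch, again against a hypothetical train table, with GrLivArea used purely as an example of another skewed column:

/ log1p-style transform (log of 1 + value, so zeros survive)
train:update SalePrice:log 1+SalePrice, GrLivArea:log 1+GrLivArea from train;
/ remember to invert with exp[pred]-1 before writing the submission file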

The training code runs for 1000 iterations by default; feel free to change that to as many as you like.

Regards, 

Kumar