Hello q folks,
I’ve implemented a linear regression model (with L2 regularization) on data from a Kaggle competition, “House Prices: Advanced Regression Techniques”: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
The competition is for predicting housing prices based on 79 descriptive features of homes.
The code for data wrangling and regression (training and testing) is at: https://github.com/krish240574/kaggle-deandecock/tree/master. You will need to download the train and test datasets from Kaggle.
I submitted the predictions produced by this code to Kaggle and scored 0.33 on the leaderboard. What remains is the following:
- More stringent feature engineering: I’m using all the features now, but need to select them based on their correlation with SalePrice (the output). I might need to drop some.
- XGBoost: I’ll probably implement code for this in q in the next few days.
- Some data in the training set is skewed; I’ll need to handle that.
- Lasso regression: I have code for that in my repo and will integrate it.
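To make the feature-selection and skew items concrete, here is a rough sketch in Python (standard library only) rather than q. It is not the code in the repository. The helper names (`pearson`, `select_features`, `skewness`, `log1p_column`), the 0.5 correlation threshold, the choice of a log1p transform for skew, and the toy values are all assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(vx * vy)

def select_features(columns, target, threshold=0.5):
    """Keep the names of columns whose |correlation| with target meets the threshold."""
    return [name for name, vals in columns.items()
            if abs(pearson(vals, target)) >= threshold]

def skewness(xs):
    """Population skewness (third standardized moment)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    return (sum((x - m) ** 3 for x in xs) / n) / var ** 1.5

def log1p_column(xs):
    """log(1 + x) transform, a common (assumed) remedy for right-skewed, non-negative data."""
    return [math.log1p(x) for x in xs]

# Toy values only, not taken from the Kaggle data.
columns = {
    "GrLivArea": [1000, 1500, 2000, 2500, 3000],
    "Noise": [3, 1, 4, 1, 5],
}
sale_price = [100000, 150000, 200000, 250000, 300000]
kept = select_features(columns, sale_price, threshold=0.5)

right_skewed = [1, 2, 4, 8, 16, 32, 64]
skew_before = skewness(right_skewed)
skew_after = skewness(log1p_column(right_skewed))
```

The threshold is a tuning knob rather than a rule; a column with a weak linear correlation can still be useful to a tree-based model such as XGBoost.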
The training code in the repository runs for 1000 iterations; feel free to change that to as many as you want.
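For readers unfamiliar with iterative fitting, the following is a minimal Python sketch of L2-regularized linear regression trained by batch gradient descent, with the iteration count exposed as a parameter. It is an illustration under assumptions, not the repository's q implementation: the function name `ridge_gd`, the parameters `lam` and `lr`, the unpenalized bias term, and the toy data are all hypothetical.

```python
def ridge_gd(X, y, lam=0.1, lr=0.01, iterations=1000):
    """Minimize mean squared error + lam * ||w||^2 by batch gradient descent.

    X is a list of feature rows, y a list of targets.
    Returns (weights, bias). The bias is not penalized (an assumption).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        # Residuals under the current parameters.
        errs = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
                for row, yi in zip(X, y)]
        # Gradient of the MSE term plus the gradient of the L2 penalty.
        grad_w = [2.0 / n * sum(e * row[j] for e, row in zip(errs, X))
                  + 2.0 * lam * w[j]
                  for j in range(d)]
        grad_b = 2.0 / n * sum(errs)
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

w_plain, b_plain = ridge_gd(X, y, lam=0.0, lr=0.05, iterations=1000)
w_ridge, b_ridge = ridge_gd(X, y, lam=1.0, lr=0.05, iterations=1000)
```

With `lam=0.0` the fit recovers the underlying line; raising `lam` shrinks the weight toward zero, which is the effect the L2 penalty is meant to have. More iterations only help until the parameters stop changing, so the count is worth tuning together with the learning rate.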
Regards,
Kumar