Heritage Housing Issues

Category: Machine Learning

Project For: Code Institute

Duration: 4 weeks

Heritage Housing Issues

In this project, I built a data app with a Machine Learning user interface (UI) that combines (1) Python packages for Machine Learning, Data Analysis and Data Visualization and (2) Streamlit for fast Machine Learning prototyping. It was my final milestone project at Code Institute and showcases my ability to perform critical data analysis, generate valuable insights and deliver data-driven recommendations.

The project immerses you into an environment that fully reflects professional business requirements. It achieves this by encouraging you to reflect on the "whys" and the "hows" of a Machine Learning system that delivers tangible value to your organization. The UI and the data analysis are conducted in a way that aligns with the business requirements.


Dataset Content

The dataset is sourced from Kaggle. We created a fictitious user story in which predictive analytics can be applied to a real project in the workplace.

The dataset has almost 1,500 rows of housing records from Ames, Iowa. Each record describes a house profile (Floor Area, Basement, Garage, Kitchen, Lot, Porch, Wood Deck, Year Built) and its sale price, for houses built between 1872 and 2010.


Business requirements

A friend who has inherited properties in Ames, Iowa, from a deceased great-grandfather has asked you to help maximize the sale price of those properties.

Although your friend has an excellent understanding of property prices in her own state and residential area, she fears that basing her estimates for property worth on her current knowledge might lead to inaccurate appraisals. What makes a house desirable and valuable where she comes from might not be the same in Ames, Iowa. She found a public dataset with house prices for Ames, Iowa, and will provide you with that:

  1. The client is interested in discovering how the house attributes correlate with the sale price. Therefore, the client expects data visualizations of the correlated variables against the sale price to show that.

  2. The client is interested in predicting the sale price of her four inherited houses and of any other house in Ames, Iowa.


Hypothesis and how to validate

Hypothesis One

We suspect that houses with larger square footage have a higher sale price.

  • A Correlation study can help in this investigation.

Hypothesis Two

We suspect that, among houses with similar square footage, those with a more recent Year Built date have a higher sale price.

  • A Correlation study can help in this investigation.

Hypothesis Three

We suspect that, among houses with similar square footage and Year Built date, those with a more recent Remodel date have a higher sale price.

  • A Correlation study can help in this investigation.

Hypothesis Four

We suspect that, among houses with similar square footage, those with higher quality and condition scores have a higher sale price.

  • A Correlation study can help in this investigation.
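
Hypotheses Two to Four compare houses "with similar square footage". One way to approximate that comparison, sketched below, is to split houses into size bins and measure the rank correlation inside each bin. The binning approach, the toy records, the column names (GrLivArea, YearBuilt, SalePrice) and the helper within_size_correlation are all assumptions made for illustration, not the project's actual data or method.

```python
import pandas as pd

# Hypothetical toy records; column names and values are assumed for illustration.
houses = pd.DataFrame({
    "GrLivArea": [1000, 1050, 1100, 1150, 2000, 2050, 2100, 2150],
    "YearBuilt": [1990, 1950, 2005, 1970, 1995, 1955, 2008, 1975],
    "SalePrice": [120000, 100000, 135000, 112000,
                  250000, 210000, 275000, 228000],
})


def within_size_correlation(data, feature, n_bins=2):
    """Rank correlation of `feature` with SalePrice inside each size bin."""
    # Quantile bins put roughly the same number of houses in each group.
    binned = data.assign(SizeBin=pd.qcut(data["GrLivArea"], q=n_bins))
    results = {}
    for size_bin, group in binned.groupby("SizeBin", observed=True):
        # Pearson correlation of the ranks is the Spearman coefficient.
        results[str(size_bin)] = group[feature].rank().corr(
            group["SalePrice"].rank()
        )
    return pd.Series(results, name=f"{feature}_vs_SalePrice")


year_effect = within_size_correlation(houses, "YearBuilt")
print(year_effect)
```

A positive coefficient in every bin would be consistent with the hypothesis; coefficients near zero would suggest that size alone explains most of the price difference.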


Rationale to map the business requirements to the Data Visualizations and ML tasks

Business Requirement 1: Data Visualization and Correlation Study
  • We will inspect the data related to the houses.

  • We will conduct a correlation study (Pearson and Spearman) to understand better how the variables are correlated to Sale Price.

  • We will plot the main variables against the Sale Price to visualize insights.

  • As a client, I want to inspect the data related to the house records to discover how the house attributes correlate with the sale price.

  • As a client, I want to conduct a correlation study (Pearson and Spearman) to better understand how the variables are correlated to the Sale Price so that I can discover how the house attributes correlate with the sale price.

  • As a client, I want to plot the main variables against the Sale Price to visualize insights and discover how the house attributes correlate with the Sale Price.
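
The correlation study described above could be sketched as follows. This is a minimal example on hypothetical toy records, not the project's notebook code: the column names and the helper correlation_with_target are assumptions. Pearson measures linear association, while Spearman measures monotonic association on ranks, so comparing the two helps spot non-linear relationships.

```python
import pandas as pd

# Hypothetical toy records; column names and values are assumed for illustration.
df = pd.DataFrame({
    "GrLivArea": [900, 1200, 1500, 1800, 2400, 3000],
    "YearBuilt": [1950, 1972, 1965, 1999, 2005, 2008],
    "SalePrice": [95000, 130000, 150000, 210000, 290000, 420000],
})


def correlation_with_target(data, target="SalePrice"):
    """Return Pearson and Spearman correlations of each feature with the target."""
    result = pd.DataFrame({
        method: data.corr(method=method)[target].drop(target)
        for method in ("pearson", "spearman")
    })
    # Strongest relationships first, ranked by absolute Spearman coefficient.
    order = result["spearman"].abs().sort_values(ascending=False).index
    return result.loc[order]


correlations = correlation_with_target(df)
print(correlations)
```

The top rows of the resulting table are natural candidates for the plots against Sale Price.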

Business Requirement 2: Classification, Regression, Cluster, Data Analysis
  • We want to predict the value of a house. We want to build a regression model to predict the dependent variable.

  • We want to plot the train and test set predictions against the actual values.

  • We want to run a regression evaluation to demonstrate the R2 Score and Mean Absolute Error.

  • As a client, I want to predict the Sale Price for a given house. We want to build an ML model so the client can predict the Sale Price for her four inherited dwellings and any other home in Ames, Iowa.

  • As a client, I want to build a regression model, or change the ML task to classification, depending on the regressor performance.
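
A regression evaluation along these lines might look like the sketch below. It uses synthetic data and a plain linear regression purely for illustration; the real project may use a different algorithm, different features and a full pipeline. The generated prices, the feature choice and the helper evaluate are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing records (assumed, not the real dataset).
rng = np.random.default_rng(0)
n_houses = 200
area = rng.uniform(600, 3500, n_houses)        # living area in square feet
year = rng.integers(1900, 2011, n_houses)      # year built
noise = rng.normal(0, 15000, n_houses)
price = 20000 + 90 * area + 400 * (year - 1900) + noise

X = np.column_stack([area, year])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)


def evaluate(fitted_model, features, target):
    """Return the R2 score and Mean Absolute Error for one data split."""
    predictions = fitted_model.predict(features)
    return {
        "r2": r2_score(target, predictions),
        "mae": mean_absolute_error(target, predictions),
    }


train_scores = evaluate(model, X_train, y_train)
test_scores = evaluate(model, X_test, y_test)
print("train:", train_scores)
print("test: ", test_scores)
```

Reporting both splits side by side makes overfitting visible: a train R2 far above the test R2 would signal that the model does not generalize.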


ML Business Case

Predict Sale Price
Regression Model
  • We want an ML model to predict the sale price of a house. The target variable is a continuous number. We consider a regression model, which is supervised and uni-dimensional.

  • Our ideal outcome is to provide our client with reliable insight into what sale price she should expect for her inherited houses.

  • The model success metrics are:

    • At least 0.7 for R2 score, on train and test set

  • The ML model is considered a failure if:

    • After 12 months of usage, the model's predictions are more than 50% off more than 30% of the time. For example, a prediction is more than 50% off if the predicted sale price is $320,000 and the actual sale price was $200,000.

  • The output should be a continuous value for the sale price.
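
The success and failure criteria above can be expressed as simple checks. The sketch below assumes that "50% off" means the absolute error exceeds 50% of the actual sale price; the function names and the sample prices are hypothetical.

```python
def meets_success_metric(r2_train, r2_test, threshold=0.7):
    """Success: R2 reaches the threshold on both the train and the test set."""
    return r2_train >= threshold and r2_test >= threshold


def share_far_off(actual, predicted, tolerance=0.5):
    """Fraction of predictions whose error exceeds `tolerance` of the actual price."""
    far_off = [abs(p - a) / a > tolerance for a, p in zip(actual, predicted)]
    return sum(far_off) / len(far_off)


def is_failure(actual, predicted, tolerance=0.5, max_share=0.3):
    """Failure: more than `max_share` of predictions are far off."""
    return share_far_off(actual, predicted, tolerance) > max_share


# Hypothetical prices collected after the model has been in use.
actual_prices = [200000, 150000, 300000, 120000]
predicted_prices = [320000, 160000, 290000, 125000]

print(meets_success_metric(0.85, 0.78))                # True
print(share_far_off(actual_prices, predicted_prices))  # 0.25
print(is_failure(actual_prices, predicted_prices))     # False
```

Here only the first prediction is more than 50% off (an error of 120,000 on an actual price of 200,000), so one in four predictions misses the tolerance, which stays below the 30% limit.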