This is a data science regression case study that aims to show how to apply data science tools to a real project in Python. For clarity, the blog is organized into separate parts (data understanding, EDA, modelling, etc.), and both code and explanations are provided.
I hope this blog contributes to your knowledge on your data science journey.
You are a data scientist at an automobile consulting company, and you have been given data about cars from different manufacturers. Your task is to analyze the factors affecting price and to give meaningful insights about the case.
Let’s start by importing the dataset into the notebook.
I have assigned the imported CSV file to the “data” variable and then created a copy of it. This keeps the original dataset intact. You might think that the source file is always there and safe. But in real work, you may be working directly with the data source, and you could make permanent changes and spoil the data. As data collection is a very costly and time-consuming task, this would harm your company or your project. That is why you should make a habit of preserving the original data while you work.
import pandas as pd
data = pd.read_csv('CarPrice_Assignment.csv')
df = data.copy()
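To see why the copy matters, here is a minimal sketch on a small made-up DataFrame. The column names and values are illustrative assumptions, not taken from CarPrice_Assignment.csv. Changes made to the working copy should leave the original untouched.

```python
import pandas as pd

# Hypothetical stand-in for the imported CSV; names and values are made up.
data = pd.DataFrame({
    "car": ["a", "b", "c"],
    "price": [10000.0, 15000.0, 20000.0],
})

df = data.copy()                 # copy() makes a deep copy by default

# Modify only the working copy.
df["price"] = df["price"] * 2
df = df.drop(columns=["car"])

original_prices = data["price"].tolist()
working_prices = df["price"].tolist()

print(original_prices)           # prices in the original DataFrame
print(working_prices)            # prices in the modified copy
```

Note that a plain assignment such as `df = data` would not give this protection, because both names would then point to the same object.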
The pandas library provides a wide range of tools to help you get familiar with the data. For the initial steps I use head and info.
1. The head function shows the first five rows of the dataframe (five is the default; you can pass any number).
pd.options.display.max_columns = None  # to print all columns
df.head()  # show the first five rows
2. The info function reports the number of non-null values and the data type of each column. It is the simplest way to see whether the data contains null values.
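Both functions can be tried on a tiny example. The sketch below uses a made-up DataFrame with one missing price (the column names and values are assumptions for illustration only, not the real dataset) to show how head limits the rows and how info exposes a null value.

```python
import io
import pandas as pd

# Hypothetical mini dataset; names and values are made up for illustration.
df = pd.DataFrame({
    "car": ["a", "b", "c", "d", "e", "f"],
    "horsepower": [111, 154, 102, 115, 110, 140],
    "price": [13495.0, 16500.0, None, 17450.0, 15250.0, 17710.0],
})

first_rows = df.head()    # first five rows (the default)
first_two = df.head(2)    # first two rows

# info() prints its report; passing buf= captures it as text instead.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# count() gives the same non-null counts as a Series, handy for checks.
non_null_counts = df.count()
print(non_null_counts)
```

In the printed report, the price column shows fewer non-null values than the number of rows, which is the signal that it contains missing data.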
The steps of a data science project are not prescribed by any authority or community. They follow the natural flow of a project. Which means…