The dataset indicates whether a user clicked on an advertisement on a company website. I built a logistic regression model to predict whether a user will click on an ad based on that user's features. Dataset:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_style('whitegrid')
ad_data = pd.read_csv('advertising.csv')
ad_data.head(5)
ad_data.info()
ad_data.describe()
Histogram of Age
# distplot is deprecated in recent seaborn releases; histplot draws the same histogram
sns.histplot(ad_data["Age"], bins=30)
Jointplot for Area Income vs Age
sns.jointplot(x="Age", y="Area Income", data=ad_data, kind='scatter')
Jointplot to compare Daily Time Spent on the Site and Age
sns.jointplot(x="Age", y="Daily Time Spent on Site", data=ad_data, kind='kde')
Jointplot of Daily Time Spent on Site vs Daily Internet Usage
sns.jointplot(x="Daily Time Spent on Site", y="Daily Internet Usage", data=ad_data)
Pairplot for the "Clicked on Ad" field (This takes a while to load)
sns.pairplot(ad_data, hue = "Clicked on Ad", palette = "bright")
Based on the charts above, I train the model on the fields that remain after dropping the timestamp and free-text columns:
ad_data.drop(['Timestamp', 'City', 'Country', 'Ad Topic Line'], axis = 1, inplace = True)
ad_data.head(2)
from sklearn.model_selection import train_test_split
# Separate the features from the target, then hold out 30% of the rows for testing
X = ad_data.drop('Clicked on Ad', axis=1)
y = ad_data['Clicked on Ad']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
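To make the split step more concrete, here is a minimal pure-Python sketch of a shuffled 70/30 hold-out split with a fixed seed. The helper name `simple_train_test_split` and the toy rows are hypothetical; this is not scikit-learn's implementation, and a given seed will not reproduce the same rows that `train_test_split` selects.

```python
import math
import random

def simple_train_test_split(rows, test_size=0.3, seed=101):
    """Shuffle row indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)          # same seed -> same shuffle
    n_test = math.ceil(test_size * len(rows))     # size of the held-out set
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

# Toy stand-in for the rows of the dataset
rows = list(range(10))
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Fixing the seed matters for the same reason `random_state = 101` does above: rerunning the notebook yields the same split, so the reported metrics stay comparable between runs.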
from sklearn.linear_model import LogisticRegression
lg = LogisticRegression()
lg.fit(X_train, y_train)
predictions = lg.predict(X_test)
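As a rough illustration of what the prediction step does for a binary logistic regression: the model passes a weighted sum of the features through the sigmoid function and predicts class 1 when the resulting probability exceeds 0.5. The sketch below uses plain Python; the weights, bias, and feature values are made up for illustration and are not the coefficients fitted above.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_click(features, weights, bias, threshold=0.5):
    """Return (probability, label) for a single user."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(score)
    return probability, int(probability > threshold)

# Hypothetical, hand-picked values (NOT the fitted coefficients)
weights = [-0.05, 0.1]
bias = 0.5
user = [60.0, 35.0]  # e.g. daily time spent on site, age

prob, label = predict_click(user, weights, bias)
print(f"probability={prob:.3f}, predicted label={label}")
```

On the fitted model, the analogous probabilities are available through `lg.predict_proba(X_test)`, which can be useful when a threshold other than 0.5 is of interest.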
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))
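The report lists precision, recall, and F1 score per class. As a sketch of how those numbers are derived for the positive class, the plain-Python helper below counts true positives, false positives, and false negatives. The function name `binary_report` and the toy labels are hypothetical and are not the model's actual predictions.

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only
y_true_toy = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_toy = [1, 0, 0, 1, 0, 1, 1, 0]

report = binary_report(y_true_toy, y_pred_toy)
print(report)
```

Precision answers "of the users predicted to click, how many did?", while recall answers "of the users who clicked, how many were caught?"; F1 is the harmonic mean of the two.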