Fisher's Iris data set is a popular multivariate data set introduced by Ronald Fisher in 1936.
It consists of 50 samples from each of the 3 species of Iris: Iris Setosa, Iris Virginica, and Iris Versicolor.
It provides four features for each sample: the length and width of the sepals and petals.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline 
In [2]:
iris = sns.load_dataset('iris')
In [3]:
iris.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
species         150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB
In [4]:
iris.head()
Out[4]:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Exploratory Data Analysis

First, we plot the samples by their sepal and petal dimensions, colored by species.

In [5]:
# The FacetGrid "size" parameter was renamed to "height" in seaborn 0.9.
x = sns.FacetGrid(iris, hue = "species", height = 5)
x.map(plt.scatter, "sepal_length", "sepal_width")
Out[5]:
<seaborn.axisgrid.FacetGrid at 0x2ae87569828>
In [6]:
x = sns.FacetGrid(iris, hue = "species", height = 5)
x.map(plt.scatter, "petal_length", "petal_width")
Out[6]:
<seaborn.axisgrid.FacetGrid at 0x2ae87657780>
In [7]:
sns.violinplot(x = "species", y = "petal_length", data = iris)
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ae875cc748>

Iris setosa appears to be the most distinct of the three species: its petals are markedly shorter than those of the other two.
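The visual impression can be checked numerically. The sketch below is an illustration, not part of the original notebook: it assumes scikit-learn's bundled copy of the data as a stand-in for `sns.load_dataset('iris')`, and the names `iris_check`, `setosa_max`, and `others_min` are made up for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled iris data stands in for the seaborn
# loader; columns are renamed to match the notebook's DataFrame.
raw = load_iris()
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris_check = pd.DataFrame(raw.data, columns=columns)
iris_check["species"] = raw.target_names[raw.target]

# Per-species range of the two petal measurements.
petal_ranges = (
    iris_check.groupby("species")[["petal_length", "petal_width"]]
    .agg(["min", "max"])
)
print(petal_ranges)

# Setosa is separable on petal length alone if its longest petal is
# shorter than the shortest petal found in the other two species.
is_setosa = iris_check["species"] == "setosa"
setosa_max = iris_check.loc[is_setosa, "petal_length"].max()
others_min = iris_check.loc[~is_setosa, "petal_length"].min()
print(setosa_max < others_min)
```

If the comparison holds, a single threshold on petal length is enough to pick out setosa, which is consistent with the violin plot.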

In [8]:
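setosa = iris[iris['species'] == 'setosa']
# Recent seaborn releases take x/y as keywords and use "fill" in place of
# "shade"; the default threshold already leaves the lowest level unfilled.
sns.kdeplot(x=setosa['sepal_width'], y=setosa['sepal_length'], cmap="viridis", fill=True)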
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ae87703da0>

We use the parallel coordinates plot provided by pandas to visualize all four features of each sample at once.

In [9]:
from pandas.plotting import parallel_coordinates
In [10]:
parallel_coordinates(iris, "species")
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ae87b72278>

Next, we train a support vector classifier to predict the species. Since the dataset is small, we can afford an exhaustive grid search over its hyperparameters. Grid search fits the model on every combination of the candidate parameter values and retains the combination with the best cross-validation score.
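To make the idea concrete, here is a minimal hand-written sketch of what a grid search does. It is not the notebook's code: it assumes scikit-learn's bundled iris data, uses the whole data set for brevity, and picks 3 folds as an illustrative choice. The names `X_demo`, `y_demo`, and `param_grid_demo` are made up for this example.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled iris data, used whole for brevity.
X_demo, y_demo = load_iris(return_X_y=True)

param_grid_demo = {"C": [0.1, 1, 10, 100], "gamma": [1, 0.1, 0.01, 0.001]}

best_score, best_params = -1.0, None
# Try every (C, gamma) pair: 4 x 4 = 16 candidates.
for C, gamma in product(param_grid_demo["C"], param_grid_demo["gamma"]):
    # Mean accuracy over 3 cross-validation folds for this candidate.
    scores = cross_val_score(SVC(C=C, gamma=gamma), X_demo, y_demo, cv=3)
    if scores.mean() > best_score:
        best_score, best_params = scores.mean(), {"C": C, "gamma": gamma}

print(best_params, round(best_score, 3))
```

`GridSearchCV` wraps this loop and, with `refit=True`, additionally retrains the winning candidate on the full training set so the fitted object can be used for prediction directly.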

In [11]:
from sklearn.model_selection import train_test_split
In [12]:
X = iris.drop('species',axis=1)
y = iris['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
In [13]:
from sklearn.svm import SVC
In [14]:
from sklearn.model_selection import GridSearchCV
In [15]:
param_grid = {'C': [0.1,1, 10, 100], 'gamma': [1,0.1,0.01,0.001]} 
In [16]:
grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=2)
grid.fit(X_train,y_train)
Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] gamma=1, C=0.1 ..................................................
[CV] ................................... gamma=1, C=0.1, total=   0.0s
[CV] gamma=1, C=0.1 ..................................................
[CV] ................................... gamma=1, C=0.1, total=   0.0s
[CV] gamma=1, C=0.1 ..................................................
[CV] ................................... gamma=1, C=0.1, total=   0.0s
[CV] gamma=0.1, C=0.1 ................................................
[CV] ................................. gamma=0.1, C=0.1, total=   0.0s
[CV] gamma=0.1, C=0.1 ................................................
[CV] ................................. gamma=0.1, C=0.1, total=   0.0s
[CV] gamma=0.1, C=0.1 ................................................
[CV] ................................. gamma=0.1, C=0.1, total=   0.0s
[CV] gamma=0.01, C=0.1 ...............................................
[CV] ................................ gamma=0.01, C=0.1, total=   0.0s
[CV] gamma=0.01, C=0.1 ...............................................
[CV] ................................ gamma=0.01, C=0.1, total=   0.0s
[CV] gamma=0.01, C=0.1 ...............................................
[CV] ................................ gamma=0.01, C=0.1, total=   0.0s
[CV] gamma=0.001, C=0.1 ..............................................
[CV] ............................... gamma=0.001, C=0.1, total=   0.0s
[CV] gamma=0.001, C=0.1 ..............................................
[CV] ............................... gamma=0.001, C=0.1, total=   0.0s
[CV] gamma=0.001, C=0.1 ..............................................
[CV] ............................... gamma=0.001, C=0.1, total=   0.0s
[CV] gamma=1, C=1 ....................................................
[CV] ..................................... gamma=1, C=1, total=   0.0s
[CV] gamma=1, C=1 ....................................................
[CV] ..................................... gamma=1, C=1, total=   0.0s
[CV] gamma=1, C=1 ....................................................
[CV] ..................................... gamma=1, C=1, total=   0.0s
[CV] gamma=0.1, C=1 ..................................................
[CV] ................................... gamma=0.1, C=1, total=   0.0s
[CV] gamma=0.1, C=1 ..................................................
[CV] ................................... gamma=0.1, C=1, total=   0.0s
[CV] gamma=0.1, C=1 ..................................................
[CV] ................................... gamma=0.1, C=1, total=   0.0s
[CV] gamma=0.01, C=1 .................................................
[CV] .................................. gamma=0.01, C=1, total=   0.0s
[CV] gamma=0.01, C=1 .................................................
[CV] .................................. gamma=0.01, C=1, total=   0.0s
[CV] gamma=0.01, C=1 .................................................
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[CV] .................................. gamma=0.01, C=1, total=   0.0s
[CV] gamma=0.001, C=1 ................................................
[CV] ................................. gamma=0.001, C=1, total=   0.0s
[CV] gamma=0.001, C=1 ................................................
[CV] ................................. gamma=0.001, C=1, total=   0.0s
[CV] gamma=0.001, C=1 ................................................
[CV] ................................. gamma=0.001, C=1, total=   0.0s
[CV] gamma=1, C=10 ...................................................
[CV] .................................... gamma=1, C=10, total=   0.0s
[CV] gamma=1, C=10 ...................................................
[CV] .................................... gamma=1, C=10, total=   0.0s
[CV] gamma=1, C=10 ...................................................
[CV] .................................... gamma=1, C=10, total=   0.0s
[CV] gamma=0.1, C=10 .................................................
[CV] .................................. gamma=0.1, C=10, total=   0.0s
[CV] gamma=0.1, C=10 .................................................
[CV] .................................. gamma=0.1, C=10, total=   0.0s
[CV] gamma=0.1, C=10 .................................................
[CV] .................................. gamma=0.1, C=10, total=   0.0s
[CV] gamma=0.01, C=10 ................................................
[CV] ................................. gamma=0.01, C=10, total=   0.0s
[CV] gamma=0.01, C=10 ................................................
[CV] ................................. gamma=0.01, C=10, total=   0.0s
[CV] gamma=0.01, C=10 ................................................
[CV] ................................. gamma=0.01, C=10, total=   0.0s
[CV] gamma=0.001, C=10 ...............................................
[CV] ................................ gamma=0.001, C=10, total=   0.0s
[CV] gamma=0.001, C=10 ...............................................
[CV] ................................ gamma=0.001, C=10, total=   0.0s
[CV] gamma=0.001, C=10 ...............................................
[CV] ................................ gamma=0.001, C=10, total=   0.0s
[CV] gamma=1, C=100 ..................................................
[CV] ................................... gamma=1, C=100, total=   0.0s
[CV] gamma=1, C=100 ..................................................
[CV] ................................... gamma=1, C=100, total=   0.0s
[CV] gamma=1, C=100 ..................................................
[CV] ................................... gamma=1, C=100, total=   0.0s
[CV] gamma=0.1, C=100 ................................................
[CV] ................................. gamma=0.1, C=100, total=   0.0s
[CV] gamma=0.1, C=100 ................................................
[CV] ................................. gamma=0.1, C=100, total=   0.0s
[CV] gamma=0.1, C=100 ................................................
[CV] ................................. gamma=0.1, C=100, total=   0.0s
[CV] gamma=0.01, C=100 ...............................................
[CV] ................................ gamma=0.01, C=100, total=   0.0s
[CV] gamma=0.01, C=100 ...............................................
[CV] ................................ gamma=0.01, C=100, total=   0.0s
[CV] gamma=0.01, C=100 ...............................................
[CV] ................................ gamma=0.01, C=100, total=   0.0s
[CV] gamma=0.001, C=100 ..............................................
[CV] ............................... gamma=0.001, C=100, total=   0.0s
[CV] gamma=0.001, C=100 ..............................................
[CV] ............................... gamma=0.001, C=100, total=   0.0s
[CV] gamma=0.001, C=100 ..............................................
[CV] ............................... gamma=0.001, C=100, total=   0.0s
[Parallel(n_jobs=1)]: Done  48 out of  48 | elapsed:    0.4s finished
Out[16]:
GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'gamma': [1, 0.1, 0.01, 0.001], 'C': [0.1, 1, 10, 100]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=2)
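The fitted search object records which combination won. The following self-contained sketch shows one way to inspect it. It assumes scikit-learn's bundled iris data and a fixed `random_state=0` for the split (the notebook's split is unseeded), so the values it prints need not match the notebook's run. The name `search` is made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumptions: bundled iris data and a seeded split for repeatability.
X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.30, random_state=0
)

search = GridSearchCV(
    SVC(),
    {"C": [0.1, 1, 10, 100], "gamma": [1, 0.1, 0.01, 0.001]},
    cv=3,
    refit=True,
)
search.fit(X_tr, y_tr)

# Winning parameter combination and its mean cross-validation accuracy.
print(search.best_params_)
print(search.best_score_)
# The estimator refit on the whole training set with those parameters;
# search.predict(...) delegates to this object.
print(search.best_estimator_)
```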
In [17]:
grid_predictions = grid.predict(X_test)
In [18]:
from sklearn.metrics import classification_report
In [19]:
print(classification_report(y_test,grid_predictions))
             precision    recall  f1-score   support

     setosa       1.00      1.00      1.00        14
 versicolor       0.94      0.94      0.94        17
  virginica       0.93      0.93      0.93        14

avg / total       0.96      0.96      0.96        45
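The per-class precision and recall in a report like this one can be derived from a confusion matrix. The sketch below illustrates the arithmetic under stated assumptions: scikit-learn's bundled iris data, a seeded split, and fixed hyperparameters `C=1, gamma=0.1` chosen only for illustration. These are not the values selected by the notebook's grid search, so the numbers will differ from the report above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumptions: bundled iris data, seeded split, illustrative parameters.
data = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.30, random_state=0
)
clf = SVC(C=1, gamma=0.1).fit(X_tr, y_tr)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, clf.predict(X_te), labels=[0, 1, 2])
print(data.target_names)
print(cm)

# Recall: correct predictions divided by the row total (the "support").
recall = cm.diagonal() / cm.sum(axis=1)
# Precision: correct predictions divided by the column total.
precision = cm.diagonal() / cm.sum(axis=0)
print(recall)
print(precision)
```

Off-diagonal entries show which species are confused with each other, detail that the averaged scores hide.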
