Scikit-Learn (sklearn) Example |QualityPoint Technologies (QPT)

Wednesday, February 12, 2025

Scikit-Learn (sklearn) Example

Scikit-learn (sklearn) is a popular Python library for machine learning. It provides simple and efficient tools for data mining and analysis. Whether you're a beginner or an experienced data scientist, Scikit-learn makes it easy to implement machine learning algorithms.

In this blog post, we'll explore the basics of Scikit-learn using an example that classifies flowers using the famous Iris dataset and a Decision Tree model.

What is Scikit-Learn?

Scikit-learn is built on top of popular scientific computing libraries like NumPy and SciPy. It provides tools for:

Classification
Regression
Clustering
Dimensionality reduction
Model selection
Preprocessing

Key Features of Scikit-Learn:

Simple and efficient API
Well-documented and easy to use
Supports various machine learning models
Works well with other Python libraries

Now, let’s dive into an example using a Decision Tree classifier to predict flower species based on their features.

The complete sample code is below. You can run it in Google Colab.

from sklearn import datasets

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load dataset (Iris dataset)
iris = datasets.load_iris()
X = iris.data  # Features (sepal & petal length/width)
y = iris.target  # Labels (Flower types)

# Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Choose a model (Decision Tree)
model = DecisionTreeClassifier()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Check accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Predict a new sample
new_sample = [[2.3, 2.8, 0.1, 1.5]]  # Sepal length, sepal width, petal length, petal width (cm)
prediction = model.predict(new_sample)
print(f"Predicted class: {iris.target_names[prediction[0]]}")

Understanding the Code Step-by-Step

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

1. Importing Required Libraries

datasets: Provides access to various built-in datasets, including the Iris dataset.
train_test_split: Splits data into training and testing sets.
DecisionTreeClassifier: A machine learning model used for classification tasks.
accuracy_score: Measures the accuracy of our model.

# Load dataset (Iris dataset)
iris = datasets.load_iris()
X = iris.data  # Features (sepal & petal length/width)
y = iris.target  # Labels (Flower types)

2. Loading the Dataset

The Iris dataset is a well-known dataset in machine learning.
It contains 150 samples of iris flowers, 50 from each of three species: Setosa, Versicolor, and Virginica.
X contains the four features of each sample (sepal length, sepal width, petal length, and petal width, in centimeters).
y contains the target labels (flower species), encoded as the integers 0, 1, and 2.
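To make the description above concrete, the following short sketch inspects the loaded dataset. The variable name class_counts is introduced here for illustration only and is not part of the original example.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features each
print(iris.feature_names)  # Names of the four feature columns
print(iris.target_names)   # Species names, indexed by the integer labels in y

# Count how many samples belong to each class (illustrative helper variable)
class_counts = np.bincount(y)
print(class_counts)        # [50 50 50]
```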

# Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

3. Splitting the Data

The dataset is split into training (80%, 120 samples) and testing (20%, 30 samples) sets.
Setting random_state=42 fixes the seed used for shuffling, so the split is identical each time the code runs.
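As a quick check of the split described above, this sketch prints the resulting shapes and confirms that the same seed reproduces the same split. The stratify=y argument at the end is an optional extra that the original example does not use; it asks train_test_split to keep the class proportions equal in both sets.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Repeating the call with the same seed gives the same test set
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = np.array_equal(X_test, X_test_again)
print(same_split)  # True

# Optional (not in the original example): keep class proportions in each set
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
strat_counts = np.bincount(y_test_strat)
print(strat_counts)  # [10 10 10]
```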

# Choose a model (Decision Tree)
model = DecisionTreeClassifier()

4. Selecting a Model

We choose the DecisionTreeClassifier, a simple yet effective model for classification tasks.
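The example uses the classifier with its default settings. As a hedged illustration of how the model can be configured, the sketch below sets two common hyperparameters; the values max_depth=3 and random_state=42 are arbitrary choices for demonstration, not recommendations from the original post.

```python
from sklearn.tree import DecisionTreeClassifier

# max_depth limits how deep the tree may grow (a simple guard against overfitting);
# random_state makes tie-breaking between equally good splits reproducible.
model = DecisionTreeClassifier(max_depth=3, random_state=42)

# get_params() lists every hyperparameter, including the defaults we did not set
params = model.get_params()
print(params["max_depth"])  # 3
print(params["criterion"])  # gini (the default split-quality measure)
```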

# Train the model
model.fit(X_train, y_train)

5. Training the Model

The .fit() method trains the model on the training data: the tree learns a sequence of threshold rules on the features that separate the species.
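To see what training actually produced, a fitted tree can be inspected. The sketch below is one possible way to do this, using export_text to print the learned rules; the random_state=42 on the model is an assumption added here for reproducibility and is not in the original code.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# random_state on the model is an added assumption, for repeatable results
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Size of the learned tree
print(model.get_depth(), model.get_n_leaves())

# Human-readable version of the learned if/else rules
rules = export_text(model, feature_names=iris.feature_names)
print(rules)
```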

# Make predictions
y_pred = model.predict(X_test)

6. Making Predictions

The trained model makes predictions on the test set, returning one predicted class label for each test sample.
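Besides hard class labels, the classifier can also report per-class probabilities. The following sketch, which goes slightly beyond the original example, shows predict_proba alongside predict; max_depth=2 and random_state=42 are illustrative assumptions chosen so that some leaves may contain a mix of classes.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Illustrative settings (assumptions, not from the original post)
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)       # One class label per test sample
proba = model.predict_proba(X_test)  # One row per sample, one column per class

print(y_pred.shape)  # (30,)
print(proba.shape)   # (30, 3)

# The predicted label is the class with the highest probability in each row
labels_from_proba = model.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels_from_proba, y_pred))  # True
```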

# Check accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

7. Evaluating the Model

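accuracy_score() compares predicted values with actual values and returns the fraction of correct predictions.
The score is printed as a decimal between 0 and 1, rounded to two places (for example, 0.97 means 97% of test samples were classified correctly).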
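Accuracy is a single number, so it can hide which classes get confused with each other. As a sketch of a slightly fuller evaluation, the code below also prints the accuracy as a percentage, a confusion matrix, and a per-class report. The random_state=42 on the model is an added assumption for reproducibility.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")  # As a fraction between 0 and 1
print(f"Accuracy: {accuracy:.0%}")  # The same value formatted as a percentage

# Rows are true classes, columns are predicted classes;
# correct predictions lie on the diagonal
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1-score for each species
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```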

# Predict a new sample
new_sample = [[2.3, 2.8, 0.1, 1.5]]  # Sepal length, sepal width, petal length, petal width (cm)
prediction = model.predict(new_sample)
print(f"Predicted class: {iris.target_names[prediction[0]]}")

8. Predicting a New Sample

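A new flower sample is provided as a nested list, because predict() expects 2D input with one row per sample and one column per feature.
The model predicts the class of the flower as an integer label.
The target_names array is used to convert that label into the species name.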
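The same approach works for several flowers at once. The sketch below predicts two samples in one call and shows how to reshape a single 1D measurement into the 2D form predict() needs. The measurements are illustrative values typical of the dataset, and random_state=42 on the model is an added assumption.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Two illustrative flowers; column order matches iris.feature_names:
# sepal length, sepal width, petal length, petal width (cm)
samples = np.array([
    [5.1, 3.5, 1.4, 0.2],  # Small petals
    [6.7, 3.0, 5.2, 2.3],  # Large petals
])
predictions = model.predict(samples)
names = iris.target_names[predictions]  # Index the name array with the labels
print(names)

# A single flower stored as a 1D array must be reshaped to one row
single = np.array([5.1, 3.5, 1.4, 0.2]).reshape(1, -1)
single_name = iris.target_names[model.predict(single)[0]]
print(single_name)
```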

Scikit-learn simplifies machine learning implementation with its user-friendly API. In this example, we used a Decision Tree classifier to predict flower species based on their features. This is just the beginning: Scikit-learn supports many other powerful models and techniques for machine learning tasks.

If you're new to machine learning, experimenting with different models and datasets in Scikit-learn is a great way to build your skills. Try modifying the dataset, adjusting model parameters, and testing different classifiers to see how they perform!
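As one possible starting point for that experimentation, the sketch below compares three classifiers on the Iris dataset using 5-fold cross-validation. The choice of models, the settings (n_neighbors=5, max_iter=1000, random_state=42), and the candidates and results names are all illustrative assumptions.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Illustrative set of models to compare; every scikit-learn classifier
# shares the same fit/predict interface, so they are interchangeable here
candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# cross_val_score trains and tests each model on 5 different splits
# and returns one accuracy score per split
results = {}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

Cross-validation averages over several splits, so the comparison depends less on one lucky or unlucky train/test split than a single accuracy score does.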
