Wednesday, February 12, 2025

Scikit-Learn (sklearn) Example


Scikit-learn (sklearn) is a popular Python library for machine learning. It provides simple and efficient tools for data mining and analysis. Whether you're a beginner or an experienced data scientist, Scikit-learn makes it easy to implement machine learning algorithms.

In this blog post, we'll explore the basics of Scikit-learn using an example that classifies flowers using the famous Iris dataset and a Decision Tree model.

What is Scikit-Learn?

Scikit-learn is built on top of popular scientific computing libraries like NumPy and SciPy. It provides tools for:

  • Classification

  • Regression

  • Clustering

  • Dimensionality reduction

  • Model selection

  • Preprocessing

Key Features of Scikit-Learn:

  • Simple and efficient API

  • Well-documented and easy to use

  • Supports various machine learning models

  • Works well with other Python libraries

Now, let’s dive into an example using a Decision Tree classifier to predict flower species based on their features.

ebook - Unlocking AI: A Simple Guide for Beginners 

The complete sample code is below. You can run it as-is in Google Colab.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load dataset (Iris dataset)
iris = datasets.load_iris()
X = iris.data # Features (sepal & petal length/width)
y = iris.target # Labels (Flower types)

# Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Choose a model (Decision Tree)
model = DecisionTreeClassifier()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Check accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Predict a new sample
new_sample = [[2.3, 2.8, 0.1, 1.5]] # Example flower: sepal length, sepal width, petal length, petal width (cm)
prediction = model.predict(new_sample)
print(f"Predicted class: {iris.target_names[prediction[0]]}")


Understanding the Code Step-by-Step

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

1. Importing Required Libraries

  • datasets: Provides access to various built-in datasets, including the Iris dataset.

  • train_test_split: Splits data into training and testing sets.

  • DecisionTreeClassifier: A machine learning model used for classification tasks.

  • accuracy_score: Measures the accuracy of our model.

# Load dataset (Iris dataset)
iris = datasets.load_iris()
X = iris.data  # Features (sepal & petal length/width)
y = iris.target  # Labels (Flower types)

2. Loading the Dataset

  • The Iris dataset is a well-known dataset in machine learning.

  • It contains 150 samples of iris flowers categorized into three species: Setosa, Versicolor, and Virginica.

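  • X contains the features: four measurements per flower (sepal length, sepal width, petal length, petal width, in cm), stored as a 150 x 4 array.

  • y contains the target labels (flower species), encoded as the integers 0, 1, and 2.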
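To get a feel for the data before modeling, it can help to print its shape and names. The snippet below is a small sketch that only uses attributes of the object returned by load_iris(); the class-count check with NumPy is an extra added here for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (number of samples, number of features)
print(iris.feature_names)  # column order of X
print(iris.target_names)   # species name for each integer label in y
print(np.bincount(y))      # how many samples each class has
print(X[0], y[0])          # first flower and its label
```

The position of each name in target_names matches the integer label in y, which is what lets the final step of the example turn a predicted number back into a species name.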

# Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

3. Splitting the Data

  • The dataset is split into training (80%) and testing (20%) sets.

  • Setting random_state=42 fixes the random seed, so the same rows end up in the training and test sets each time the code runs.
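The effect of test_size and random_state can be checked directly. The sketch below repeats the split with the same seed to show it is reproducible, and also tries the optional stratify argument of train_test_split, which the original example does not use; it is shown here only as one way to keep the class proportions equal in both sets.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Same call as in the example: 20% of 150 rows go to the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)

# Repeating the call with the same seed gives the same split
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(np.array_equal(X_test, X_test2))

# Optional: stratify=y keeps the three species balanced in the test set
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test_strat))
```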

# Choose a model (Decision Tree)
model = DecisionTreeClassifier()

4. Selecting a Model

  • We choose the DecisionTreeClassifier, a simple yet effective model for classification tasks.
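DecisionTreeClassifier() with no arguments uses the library defaults. As a hedged illustration, the sketch below compares the defaults with a tree limited by max_depth and given a fixed random_state; the specific values (3 and 0) are arbitrary choices for this example, not recommendations from the original post.

```python
from sklearn.tree import DecisionTreeClassifier

# Default settings, as used in the example
default_model = DecisionTreeClassifier()

# A shallower tree with a fixed seed (values chosen only for illustration)
shallow_model = DecisionTreeClassifier(max_depth=3, random_state=0)

print(default_model.get_params()["max_depth"])  # None: the tree grows until leaves are pure
print(default_model.get_params()["criterion"])
print(shallow_model.get_params()["max_depth"])
```

Limiting the depth is one common way to keep a tree from memorizing the training data.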

# Train the model
model.fit(X_train, y_train)

5. Training the Model

  • The .fit() method trains the model on the training data: the tree learns a set of threshold rules on the features that separate the three species.
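After fitting, the learned tree can be inspected. The following sketch uses get_depth(), get_n_leaves(), and export_text() from sklearn.tree to print the rules; random_state=0 is an assumption added here so the output is repeatable.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Size of the learned tree
print("Depth:", model.get_depth())
print("Leaves:", model.get_n_leaves())

# Human-readable if/else rules the tree learned
rules = export_text(model, feature_names=iris.feature_names)
print(rules)
```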

# Make predictions
y_pred = model.predict(X_test)

6. Making Predictions

  • The trained model makes predictions on the test set.
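predict() returns one integer label per test row. As a small sketch, the code below also calls predict_proba(), which the original example does not use, to show the per-class probabilities behind each prediction.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

y_pred = model.predict(X_test)       # integer labels, one per test row
proba = model.predict_proba(X_test)  # one column of probabilities per class

print(y_pred[:5])
print(iris.target_names[y_pred[:5]])  # same predictions as species names
print(proba[:5])
```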

# Check accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

7. Evaluating the Model

  • accuracy_score() compares predicted values with actual values to measure accuracy.

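  • The accuracy score is a fraction between 0 and 1 (the share of test samples classified correctly). It is printed with two decimal places, so 1.00 means every test sample was predicted correctly.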
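accuracy_score() returns a fraction between 0 and 1, and a single number can hide which species get confused with each other. The sketch below formats the score both ways and adds confusion_matrix() and classification_report() from sklearn.metrics; these two are additions for illustration and are not part of the original example.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy (fraction): {accuracy:.2f}")
print(f"Accuracy (percent):  {accuracy:.0%}")

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)

# Precision, recall, and F1 per species
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the total equals the accuracy.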

# Predict a new sample
new_sample = [[2.3, 2.8, 0.1, 1.5]]  # Example flower: sepal length, sepal width, petal length, petal width (cm)
prediction = model.predict(new_sample)
print(f"Predicted class: {iris.target_names[prediction[0]]}")

8. Predicting a New Sample

  • A new flower sample is provided as input.

  • The model predicts the class of the flower.

  • The target_names array is used to display the predicted species name.
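The values in a new sample must follow the same column order as the training data, and predict() expects a 2D input (a list of rows), which is why the sample is wrapped in double brackets. The sketch below uses measurements typical of a setosa flower; these numbers are an assumption chosen for illustration and differ from the sample in the original code.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# One flower, in the same column order as iris.feature_names
# (illustrative values, typical of a setosa flower)
new_sample = [[5.1, 3.5, 1.4, 0.2]]
for name, value in zip(iris.feature_names, new_sample[0]):
    print(f"{name}: {value}")

prediction = model.predict(new_sample)  # array with one integer label
species = iris.target_names[prediction[0]]
print(f"Predicted class: {species}")
```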


Scikit-learn simplifies machine learning implementation with its user-friendly API. In this example, we used a Decision Tree classifier to predict flower species based on their features. This is just the beginning: Scikit-learn supports many other powerful models and techniques for machine learning tasks.

If you're new to machine learning, experimenting with different models and datasets in Scikit-learn is a great way to build your skills. Try modifying the dataset, adjusting model parameters, and testing different classifiers to see how they perform!
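One possible way to start experimenting is to run several classifiers through the same evaluation. The sketch below compares three of them with 5-fold cross-validation using cross_val_score(); the choice of models and their settings (n_neighbors=5, max_iter=1000) is an assumption made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Every classifier exposes the same fit/predict interface,
# so they can be swapped without changing the evaluation code
candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)  # accuracy on each of 5 folds
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

Cross-validation averages over several train/test splits, so the comparison depends less on one lucky or unlucky split.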

