Scikit-learn (sklearn) is a popular Python library for machine learning. It provides simple and efficient tools for data mining and analysis. Whether you're a beginner or an experienced data scientist, Scikit-learn makes it easy to apply machine learning algorithms without implementing them from scratch.
In this blog post, we'll explore the basics of Scikit-learn using an example that classifies flowers using the famous Iris dataset and a Decision Tree model.
What is Scikit-learn?
Scikit-learn is built on top of popular scientific computing libraries like NumPy and SciPy. It provides tools for:
Classification
Regression
Clustering
Dimensionality reduction
Model selection
Preprocessing
Key Features of Scikit-learn:
Simple and efficient API
Well-documented and easy to use
Supports various machine learning models
Works well with other Python libraries
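The "simple and efficient API" point is easiest to see in code: most scikit-learn objects share the same fit/transform/predict methods regardless of the task. The sketch below is an illustration added to this post, not part of the original example, and uses StandardScaler, PCA and KMeans as assumed representatives of preprocessing, dimensionality reduction and clustering. Other estimators from those families follow the same pattern.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Load only the Iris features (150 samples, 4 measurements each)
X = datasets.load_iris().data

# Preprocessing: a transformer learns parameters in fit() and applies them in transform()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # every column now has mean 0 and unit variance

# Dimensionality reduction: PCA uses the same fit/transform pattern
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)  # project the 4 features onto 2 components

# Clustering: KMeans uses the fit/predict pattern and needs no labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_scaled)

print(X_2d.shape)
print(sorted(set(cluster_labels)))
```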
Now, let’s dive into an example using a Decision Tree classifier to predict flower species based on their features.
The sample code is explained step by step below. You can run it on Google Colab.
Understanding the Code Step-by-Step
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
1. Importing Required Libraries
datasets: Provides access to various built-in datasets, including the Iris dataset.
train_test_split: Splits data into training and testing sets.
DecisionTreeClassifier: A machine learning model used for classification tasks.
accuracy_score: Measures the accuracy of our model.
# Load dataset (Iris dataset)
iris = datasets.load_iris()
X = iris.data # Features (sepal & petal length/width)
y = iris.target # Labels (Flower types)
2. Loading the Dataset
The Iris dataset is a well-known dataset in machine learning.
It contains 150 samples of iris flowers categorized into three species: Setosa, Versicolor, and Virginica.
X contains the features (sepal and petal dimensions).
y contains the target labels (flower species), encoded as the integers 0, 1 and 2.
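Before modeling, it helps to look at what load_iris() actually returns. This is a small sketch; feature_names and target_names are attributes of the returned object, and np.bincount is used here only as a convenient way to count samples per class.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features
print(iris.feature_names)  # names of the four measurement columns, in order
print(iris.target_names)   # species name for each integer label 0, 1, 2
print(np.bincount(y))      # [50 50 50]: samples per class
```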
# Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
3. Splitting the Data
The dataset is split into training (80%, 120 samples) and testing (20%, 30 samples) sets.
The random_state=42 argument ensures that the split is the same each time the code runs.
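A quick check of the split sizes, plus the optional stratify parameter of train_test_split, which the original code does not use. Stratifying on y is an assumed refinement that keeps the class proportions identical in both sets; it is not required for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Same 80/20 split as in the post
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# stratify=y keeps the same class proportions in the training and test sets
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(ys_test))  # [10 10 10]
```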
# Choose a model (Decision Tree)
model = DecisionTreeClassifier(random_state=42)  # fixed seed so the learned tree is the same on every run
4. Selecting a Model
We choose the DecisionTreeClassifier, a simple yet effective model for classification tasks.
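Calling DecisionTreeClassifier() with no arguments uses the default hyperparameters, so the tree keeps splitting until its leaves are pure. The sketch below shows how hyperparameters such as max_depth, criterion and random_state can be set and read back with get_params(); max_depth=3 is an arbitrary value chosen for illustration, not a recommendation.

```python
from sklearn.tree import DecisionTreeClassifier

# Default tree: no depth limit
default_tree = DecisionTreeClassifier()

# Constrained tree: limit the depth and fix the seed for reproducible results
shallow_tree = DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=42)

params = shallow_tree.get_params()
print(params["max_depth"], params["criterion"])  # 3 gini
print(default_tree.get_params()["max_depth"])    # None
```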
# Train the model
model.fit(X_train, y_train)
5. Training the Model
The .fit() method trains the model on the training data (X_train and y_train).
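Once .fit() has run, the trained tree can be inspected. This sketch repeats the earlier steps so it runs on its own, and assumes random_state=42 on the model for reproducibility. export_text prints the learned if/else rules, and get_depth() and get_n_leaves() summarize the tree's size. The exact rules may differ between scikit-learn versions.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)  # learn split rules from the training data

print("Classes:", model.classes_)  # fitted attributes end with an underscore
print("Depth:", model.get_depth())
print("Leaves:", model.get_n_leaves())

# Show the learned rules using the real feature names
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```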
# Make predictions
y_pred = model.predict(X_test)
6. Making Predictions
The trained model makes predictions on the test set.
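Besides hard labels from .predict(), scikit-learn classifiers such as the decision tree also offer .predict_proba(), which returns one probability per class. A minimal sketch, reusing the same split and assuming random_state=42 for the tree. A fully grown decision tree tends to output probabilities of exactly 0 or 1, because its leaves are typically pure.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)  # fit() returns the model

y_pred = model.predict(X_test)       # one class label per test sample
proba = model.predict_proba(X_test)  # one probability per class for each sample

print(y_pred[:5])
print(proba[:5])
```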
# Check accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
7. Evaluating the Model
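accuracy_score() compares the predicted labels with the actual labels and returns the fraction of correct predictions. The score is printed as a decimal between 0 and 1 with two decimal places (for example, 0.95 would mean 95% of test samples were classified correctly).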
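Accuracy is a single number; a confusion matrix and a per-class report show where the model makes its mistakes. This sketch rebuilds the model from the earlier steps (with random_state=42 added as an assumption) and uses confusion_matrix and classification_report from sklearn.metrics, which the original code does not import. It also formats accuracy with :.2% for a true percentage.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall and F1 score for each species
print(classification_report(y_test, y_pred, target_names=list(iris.target_names)))

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")  # the :.2% format multiplies by 100 and adds a % sign
```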
# Predict a new sample
new_sample = [[2.3, 2.8, 0.1, 1.5]]  # Example flower: [sepal length, sepal width, petal length, petal width] in cm
prediction = model.predict(new_sample)
print(f"Predicted class: {iris.target_names[prediction[0]]}")
8. Predicting a New Sample
A new flower sample is provided as input.
The model predicts the class of the flower.
The target_names array is used to display the predicted species name.
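The model expects each new sample's four numbers in the same order as iris.feature_names: sepal length, sepal width, petal length, petal width (cm). The sketch below predicts several flowers at once; the two rows are illustrative measurements typical of a setosa and a virginica, chosen here only as an example.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Each row: [sepal length, sepal width, petal length, petal width] in cm
new_samples = [
    [5.1, 3.5, 1.4, 0.2],  # short, narrow petals: typical of setosa
    [6.7, 3.0, 5.2, 2.3],  # long, wide petals: typical of virginica
]

predictions = model.predict(new_samples)  # array of integer class labels
names = iris.target_names[predictions]    # map each label to its species name
for sample, name in zip(new_samples, names):
    print(sample, "->", name)
```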
If you're new to machine learning, experimenting with different models and datasets in Scikit-learn is a great way to build your skills. Try modifying the dataset, adjusting model parameters, and testing different classifiers to see how they perform!
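As one way to follow that advice, the sketch below compares three classifiers with 5-fold cross-validation instead of a single train/test split. KNeighborsClassifier, LogisticRegression and cross_val_score are choices made for this illustration (none appear in the original code), and the hyperparameters are common starting points rather than tuned values.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)

candidates = {
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

# 5-fold cross-validation: train on 4/5 of the data, score on the remaining 1/5, five times
results = {}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)  # classifiers are scored by accuracy by default
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.2f} (std {scores.std():.2f})")
```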