Scikit-learn (sklearn) is a popular Python library for machine learning. It provides simple and efficient tools for data mining and analysis. Whether you're a beginner or an experienced data scientist, Scikit-learn makes it easy to implement machine learning algorithms.
In this blog post, we'll explore the basics of Scikit-learn using an example that classifies flowers using the famous Iris dataset and a Decision Tree model.
What is Scikit-Learn?
Scikit-learn is built on top of popular scientific computing libraries like NumPy and SciPy. It provides tools for:
- Classification 
- Regression 
- Clustering 
- Dimensionality reduction 
- Model selection 
- Preprocessing 
Key Features of Scikit-Learn:
- Simple and efficient API 
- Well-documented and easy to use 
- Supports various machine learning models 
- Works well with other Python libraries 
Now, let’s dive into an example using a Decision Tree classifier to predict flower species based on their features.
The sample code is explained step by step below. You can run it on Google Colab.
Understanding the Code Step-by-Step
1. Importing Required Libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
- datasets: Provides access to various built-in datasets, including the Iris dataset.
- train_test_split: Splits data into training and testing sets.
- DecisionTreeClassifier: A machine learning model used for classification tasks.
- accuracy_score: Measures the accuracy of our model.
2. Loading the Dataset
# Load dataset (Iris dataset)
iris = datasets.load_iris()
X = iris.data  # Features (sepal & petal length/width)
y = iris.target  # Labels (Flower types)
- The Iris dataset is a well-known dataset in machine learning. 
- It contains 150 samples of iris flowers categorized into three species: Setosa, Versicolor, and Virginica. 
- X contains the features (sepal and petal dimensions).
- y contains the target labels (flower species).
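To make the description above concrete, here is a minimal sketch that inspects the loaded dataset. The variable name class_counts is only illustrative and is not part of the original code.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# 150 rows (flowers) and 4 columns (measurements)
print(X.shape)

# Column order of X, and the species name behind each integer label
print(iris.feature_names)
print(list(iris.target_names))

# Count how many samples carry each label (0, 1, 2)
class_counts = np.bincount(y)
print({str(name): int(count) for name, count in zip(iris.target_names, class_counts)})
```

The labels in y are integers, so iris.target_names is what maps a label back to a readable species name.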
3. Splitting the Data
# Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
- The dataset is split into training (80%) and testing (20%) sets. 
- Setting random_state=42 ensures that the split is the same each time the code runs.
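The effect of test_size and random_state can be checked directly. The sketch below repeats the split with the same seed and compares the results. The stratify=y call at the end is an optional variation that is not used in this post; it keeps the class proportions equal in both sets.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# 80% of 150 samples for training, 20% for testing
print(X_train.shape, X_test.shape)

# Splitting again with the same random_state gives the same test set
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = np.array_equal(X_test, X_test_again)
print(same_split)

# Optional variation (an assumption, not in the original code):
# stratify=y preserves the class balance in the test set
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test_strat))
```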
4. Selecting a Model
# Choose a model (Decision Tree)
model = DecisionTreeClassifier()
- We choose the DecisionTreeClassifier, a simple yet effective model for classification tasks.
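The classifier above is created with default settings. As a rough sketch, the defaults can be inspected with get_params(), and a few of them can be set explicitly. The values max_depth=3 and random_state=42 below are illustrative choices, not recommendations from this post.

```python
from sklearn.tree import DecisionTreeClassifier

# Default settings: no depth limit, Gini impurity as the split criterion
default_model = DecisionTreeClassifier()
params = default_model.get_params()
print(params["max_depth"])
print(params["criterion"])

# Illustrative alternative: a shallower tree with a fixed seed,
# so that ties between equally good splits are broken the same way each run
shallow_model = DecisionTreeClassifier(max_depth=3, random_state=42)
print(shallow_model)
```

Limiting the depth is one common way to keep a decision tree from memorizing the training data.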
5. Training the Model
# Train the model
model.fit(X_train, y_train)
- The .fit() function trains the model using the training data.
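After fitting, the learned tree can be examined. This is a minimal sketch that assumes the same 80/20 split as in the post and adds random_state=42 to the classifier (an assumption, so that repeated runs build the same tree).

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Size of the learned tree
print("Depth:", model.get_depth())
print("Leaves:", model.get_n_leaves())

# How much each feature contributed to the splits (values sum to 1)
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")

# Text view of the decision rules
print(export_text(model, feature_names=list(iris.feature_names)))
```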
6. Making Predictions
# Make predictions
y_pred = model.predict(X_test)
- The trained model makes predictions on the test set. 
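Besides predict(), the classifier also offers predict_proba(), which returns one probability per class for each sample. The sketch below assumes the same split as the post and a fixed random_state on the model (an assumption) to show how the two relate.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)        # one class label per test sample
proba = model.predict_proba(X_test)   # one row of class probabilities per sample

print(y_pred[:5])
print(proba[:5])

# The predicted label is the class with the highest probability
labels_from_proba = np.argmax(proba, axis=1)
print(np.array_equal(labels_from_proba, y_pred))
```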
7. Evaluating the Model
# Check accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
- accuracy_score() compares predicted values with actual values to measure accuracy.
- The accuracy score is printed as a fraction between 0 and 1, rounded to two decimal places (multiply by 100 for a percentage).
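Accuracy is simply the share of test samples predicted correctly. As a sketch, the code below recomputes it by hand, prints it as a percentage, and adds a confusion matrix and a per-class report. confusion_matrix and classification_report are not used in the original code; they are shown here as optional extras.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
manual_accuracy = np.mean(y_pred == y_test)  # fraction of correct predictions

print(f"Accuracy: {accuracy:.2f}")   # as a fraction
print(f"Accuracy: {accuracy:.1%}")   # as a percentage

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall and F1 score for each species
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```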
8. Predicting a New Sample
# Predict a new sample
new_sample = [[2.3, 2.8, 0.1, 1.5]]  # Example flower
prediction = model.predict(new_sample)
print(f"Predicted class: {iris.target_names[prediction[0]]}")
- A new flower sample is provided as input. 
- The model predicts the class of the flower. 
- The target_names array is used to display the predicted species name.
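The same approach works for several flowers at once. In this sketch the measurements in new_samples are made-up illustrative values, given in the dataset's column order: sepal length, sepal width, petal length, petal width (all in cm).

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Hypothetical flowers, one row per sample, columns in iris.feature_names order
new_samples = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.0, 2.8, 4.5, 1.4],
    [6.9, 3.1, 5.6, 2.2],
])

predictions = model.predict(new_samples)          # integer labels
predicted_names = iris.target_names[predictions]  # labels mapped to species names

for sample, name in zip(new_samples, predicted_names):
    print(f"{sample.tolist()} -> {name}")
```

Note that the input is always two-dimensional, even for a single flower, which is why the original example wraps its sample in double brackets.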
If you're new to machine learning, experimenting with different models and datasets in Scikit-learn is a great way to build your skills. Try modifying the dataset, adjusting model parameters, and testing different classifiers to see how they perform!
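As a starting point for that kind of experiment, here is a minimal sketch that compares three classifiers with 5-fold cross-validation. The choice of KNeighborsClassifier and LogisticRegression, and their settings, are assumptions made for illustration; any Scikit-learn classifier could be swapped in.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)

# Every classifier exposes the same fit/predict interface,
# so they can be evaluated in a single loop
candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, clf in candidates.items():
    # Mean accuracy over 5 train/test folds
    scores = cross_val_score(clf, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: {results[name]:.3f}")
```

Cross-validation averages over several splits, so the scores are less dependent on one particular train/test division than a single accuracy number.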
 
