How to Train a Machine Learning Model in Python


Let’s dive right into the world of Python and learn how to train our very own model!

To kick things off, let’s install all the necessary libraries. We need NumPy, Pandas, Scikit-Learn, and Matplotlib for this journey. Don’t worry if you don’t know what any of those mean; we’ll explain them as we go along!

# Install required libraries (in a notebook, run this cell first)
!pip install numpy pandas scikit-learn matplotlib

# Import necessary libraries
import numpy as np               # NumPy for scientific computing
import pandas as pd              # Pandas for data manipulation and analysis
import sklearn                   # Scikit-Learn for machine learning algorithms
import matplotlib.pyplot as plt  # Matplotlib for data visualization

Now that our tools are ready, let’s load in some data. We’re going to use the famous Iris dataset for this tutorial; it contains measurements of different iris flowers and their species.

# Import necessary libraries
import pandas as pd                                    # Pandas, aliased as "pd" for easier referencing
from sklearn.model_selection import train_test_split  # Splits data into training and testing sets
from sklearn.metrics import accuracy_score            # Scores predictions against the true labels
from sklearn.linear_model import LogisticRegression   # The classification model we'll train
import matplotlib.pyplot as plt                        # Matplotlib's pyplot module, aliased as "plt"

# Load data from CSV file
data = pd.read_csv('iris.csv')  # Read 'iris.csv' into a pandas DataFrame named "data"

Alright, now that we have our data loaded in, let’s take a look at it!

# Print first five rows of dataset
print(data.head())  # head() previews the first five rows so we can see the data's structure and contents

This should give us something like this:

| sepal length | sepal width | petal length | petal width | species |
|--------------|-------------|--------------|-------------|---------|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |

As you can see, we have four features (sepal length, sepal width, petal length, and petal width) and a target variable (species). Our goal is to train our model to predict the species based on these features!
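Since we already imported Matplotlib, here’s a quick, optional sketch to eyeball how separable the species are. It assumes the column names match the preview above (your copy of iris.csv may name them slightly differently):

# Scatter plot of petal length vs. petal width, colored by species
# (column names are assumed to match the data.head() preview above)
for species, group in data.groupby('species'):
    plt.scatter(group['petal length'], group['petal width'], label=species)

plt.xlabel('petal length')  # Label the x-axis
plt.ylabel('petal width')   # Label the y-axis
plt.legend()                # Show which color belongs to which species
plt.show()                  # Display the plot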

Now that we’ve loaded in our data, let’s split it into training and testing sets. We’ll use 80% of the data for training and 20% for testing. This will help us evaluate how well our model performs on new, unseen data!

# Separate the features (X) from the target variable (y)
X = data.drop('species', axis=1)  # X is a DataFrame containing the four measurement columns
y = data['species']               # y is a Series containing the species labels we want to predict

# Split dataset into training and testing sets
# test_size=0.2 means 20% of the data is held out for testing and 80% is used for training
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2)
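As a quick check, we can print the shapes of the two sets; the full Iris dataset has 150 rows, so we should see roughly 120 rows for training and 30 for testing:

# Confirm the 80/20 split worked as expected
print(train_x.shape, test_x.shape)  # e.g. (120, 4) (30, 4)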

Alright, now that we have our training and testing sets ready, let’s create a Logistic Regression model!

# Create an instance of the LogisticRegression class and assign it to the variable 'model'
# (LogisticRegression was already imported from sklearn.linear_model above)
model = LogisticRegression()

Next, let’s fit the model to our training data using the `fit()` method. This will update the weights of our model based on the input features and target variable!

# Fit the model to the training data using the `fit()` method
# `train_x` holds the input features and `train_y` holds the target variable
model.fit(train_x, train_y)

Now that we have trained our model, let’s test it on some new data to see how well it performs! We’ll use the `predict()` method to make predictions based on the input features and then compare them to the actual target variable.

# Predict species for the testing set using the trained model
# predict() takes the test features (test_x) and returns a predicted species for each row
predictions = model.predict(test_x)

Finally, let’s evaluate our model by calculating its accuracy score! This will give us an idea of how well it performs on new data that it hasn’t seen before.

# Calculate accuracy score for the testing set
# (accuracy_score was already imported from sklearn.metrics above)
# It compares the actual labels in test_y with the predicted labels in predictions
accuracy = accuracy_score(test_y, predictions)

# Print the accuracy score as a percentage, rounded to the nearest whole number
print("Accuracy: ", round(accuracy * 100))
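As a last sanity check, here’s a small sketch of using the trained model on a brand-new flower; the measurements below are made up purely for illustration:

# Predict the species of a single new flower (hypothetical measurements)
new_flower = pd.DataFrame(
    [[5.0, 3.4, 1.5, 0.2]],  # made-up sepal/petal measurements in the same order as X
    columns=X.columns        # reuse the same feature column names the model was trained on
)
print(model.predict(new_flower))  # likely prints ['setosa'] for these setosa-like values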

We’ve successfully trained a machine learning model in Python and evaluated its performance on new data. Who knew it could be so easy?
