Matplotlib is a widely used Python library for creating charts and graphs. Data scientists, statisticians, and researchers commonly use it to analyze data and present their findings. This tutorial covers some basic concepts and techniques for visualizing data with Matplotlib.
First, let’s import the necessary libraries:
# Import the pandas library and alias it as "pd"
import pandas as pd
# Import the pyplot module from the matplotlib library and alias it as "plt"
import matplotlib.pyplot as plt
# Jupyter only: the magic command "%matplotlib inline" displays plots within the notebook (omit this line in a plain Python script, where it is a syntax error)
%matplotlib inline
Next, we will load a dataset from a CSV file using the pandas library and store it in a variable called `df`. We can then use Matplotlib to create various types of charts from this data.
For example, suppose we have a dataset that contains the fuel efficiency (in miles per gallon) of different car models. Let’s draw a scatter plot with Matplotlib:
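Before plotting, it can help to confirm that the file loaded as expected. The sketch below is one way to do that. It reads a small in-memory CSV in place of `car_data.csv`, and the sample rows are invented purely for illustration; only the column names `n_Cylinders`, `mpg`, and `price` come from this tutorial.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for car_data.csv (values are made up).
csv_text = """n_Cylinders,mpg,price
4,31.5,18500
4,28.0,21000
6,22.4,27500
6,20.1,31000
8,15.3,42000
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first rows, to check the values look sensible
print(df.dtypes)        # column types; numeric columns should not be 'object'
print(df.isna().sum())  # number of missing values in each column
```

With a real file, replacing `io.StringIO(csv_text)` with the file path should give the same kind of summary.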
# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# Loading CSV file into Pandas DataFrame
df = pd.read_csv('car_data.csv') # Reads the CSV file and stores it in a variable called 'df'
# Creating Scatter Plot with x-axis as number of cylinders and y-axis as fuel efficiency (MPG)
plt.scatter(x=df['n_Cylinders'], y=df['mpg']) # Plots a scatter plot using the data from the 'n_Cylinders' and 'mpg' columns in the DataFrame
plt.title('Scatter Plot: Fuel Efficiency vs Number of Cylinders') # Sets the title of the plot
plt.ylabel('Fuel Efficiency (MPG)') # Sets the label for the y-axis
plt.xlabel('Number of Cylinders') # Sets the label for the x-axis
plt.show() # Displays the plot on the screen
This creates a scatter plot with the number of cylinders on the x-axis and fuel efficiency (MPG) on the y-axis. The `plt.title()`, `plt.xlabel()`, and `plt.ylabel()` calls add the title and axis labels, and `plt.show()` displays the chart.
Another useful type of visualization is the histogram, which shows how the values of a single variable are distributed. Suppose the dataset also contains the price of each car model:
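The same chart can also be built with Matplotlib's object-oriented interface, which some people find easier to manage once a figure has several panels. This is a minimal sketch, assuming hypothetical sample values in place of `car_data.csv`; the helper name `plot_mpg_vs_cylinders` is invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample values standing in for car_data.csv.
df = pd.DataFrame({
    "n_Cylinders": [4, 4, 6, 6, 8, 8],
    "mpg": [31.5, 28.0, 22.4, 20.1, 15.3, 14.0],
})


def plot_mpg_vs_cylinders(data):
    """Draw fuel efficiency against cylinder count and return the Axes."""
    fig, ax = plt.subplots()
    # alpha < 1 keeps overlapping points visible, which helps because
    # cylinder counts take only a few distinct values.
    ax.scatter(data["n_Cylinders"], data["mpg"], alpha=0.6)
    ax.set_title("Fuel Efficiency vs Number of Cylinders")
    ax.set_xlabel("Number of Cylinders")
    ax.set_ylabel("Fuel Efficiency (MPG)")
    # Place ticks only at the cylinder counts present in the data.
    ax.set_xticks(sorted(data["n_Cylinders"].unique()))
    return ax


ax = plot_mpg_vs_cylinders(df)
```

In a Jupyter notebook the figure appears when the cell finishes; in a script, a final `plt.show()` call displays it.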
# Loading CSV file into Pandas DataFrame
df = pd.read_csv('car_data.csv') # reads the csv file and stores it in a pandas dataframe called 'df'
# Creating Histogram with x-axis as Price Range and y-axis as Frequency of Occurrence
plt.hist(x=df['price'], bins='auto', color='blue') # creates a histogram with the x-axis as the 'price' column from the dataframe and automatically determines the number of bins, with the bars colored blue
plt.title('Histogram: Price Range for Car Models') # sets the title of the histogram
plt.ylabel('Frequency of Occurrence') # sets the label for the y-axis
plt.xlabel('Price Range ($)') # sets the label for the x-axis
plt.show() # displays the histogram
This creates a histogram with price on the x-axis and frequency of occurrence on the y-axis. As before, the title and axis labels are set before `plt.show()` displays the chart.
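To see what `bins='auto'` actually chose, the values returned by the histogram call can be inspected. The sketch below assumes a handful of made-up prices in place of the `price` column, and uses the `Axes.hist` method, which returns the same three values as `plt.hist`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical prices standing in for the 'price' column of car_data.csv.
prices = pd.Series(
    [18500, 19900, 21000, 22500, 24000, 25500, 27500, 31000, 36000, 42000],
    name="price",
)

fig, ax = plt.subplots()
# hist returns the bar heights, the bin edges, and the bar artists.
counts, edges, patches = ax.hist(prices, bins="auto", color="blue", edgecolor="white")
ax.set_title("Histogram: Price Range for Car Models")
ax.set_xlabel("Price Range ($)")
ax.set_ylabel("Frequency of Occurrence")

# Every value falls in exactly one bin, so the counts sum to the sample size.
print(f"{len(counts)} bins, edges from {edges[0]:.0f} to {edges[-1]:.0f}")
print(f"total count: {int(counts.sum())}")
```

If the automatic choice looks too coarse or too fine, passing an integer such as `bins=20` sets the number of bins explicitly.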