Pandas Rolling Functions for Time Series Analysis

To kick things off: what is time series data, and why do we care about it? Time series data is any set of observations recorded over time, such as stock prices or website traffic. And guess what? This type of data can be a real pain to analyze without tools like the Pandas rolling functions, which compute a statistic over a sliding window of consecutive observations.

So let’s get started with our first example: calculating the 7-day moving average for a stock price dataset. Here’s how you do it:

# Import the pandas library as pd
import pandas as pd

# Load the stock price dataset into a DataFrame, parse the 'Date' column as dates, and set it as the index
data = pd.read_csv('stock_prices.csv', index_col='Date', parse_dates=True)

# Set the window size: an integer window counts rows, so 7 means the last 7 rows
# (7 days only if the data has exactly one row per day)
window_size = 7

# Calculate the moving average using the rolling function from pandas and store it in a new column called 'Moving Avg'
data['Moving Avg'] = data['Close Price'].rolling(window=window_size).mean()

Note that the first six rows of 'Moving Avg' will be NaN: by default, an integer window of 7 needs 7 observations before it produces a value.
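One subtlety worth a quick sketch: an integer window counts rows, while an offset string such as "3D" counts calendar time. The prices below are made up for illustration, and the 3-row/3-day window is chosen only to keep the output short.

```python
import pandas as pd

# Hypothetical closing prices; Jan 4 and Jan 5 are missing (e.g. a weekend)
dates = pd.to_datetime(
    ["2024-01-01", "2024-01-02", "2024-01-03",
     "2024-01-06", "2024-01-07", "2024-01-08"]
)
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
                   index=dates, name="Close Price")

# Integer window: mean of the last 3 rows, whatever their dates are
by_rows = prices.rolling(window=3).mean()

# Offset window: mean of every row that falls within the last 3 calendar days
# (requires a sorted datetime index; min_periods defaults to 1 here)
by_days = prices.rolling(window="3D").mean()

print(pd.DataFrame({"Close Price": prices,
                    "3 rows": by_rows,
                    "3 days": by_days}))
```

On Jan 6 the row-based window still reaches back to Jan 2, while the day-based window only sees Jan 6 itself. Which behavior you want depends on whether "7-day" means seven observations or seven calendar days.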

Boom! You just used your first rolling function. Pretty cool, huh? But wait: what if you want to calculate the 30-day rolling sum for website traffic instead of a moving average? No problem! Here’s how:

# Import the pandas library to use its functions
import pandas as pd

# Load the data from a CSV file into a DataFrame, parse the 'Date' column as dates, and set it as the index
data = pd.read_csv('website_traffic.csv', index_col='Date', parse_dates=True)

# Set the window size for our rolling function (30 rows, i.e. 30 days when there is one row per day)
window_size = 30

# Calculate the rolling sum of the 'Page Views' column using the specified window size and store it in a new column called 'Rolling Sum'
data['Rolling Sum'] = data['Page Views'].rolling(window=window_size).sum()

# Print the updated DataFrame to see the new 'Rolling Sum' column
print(data)

# Output (abridged; the first 29 rows are NaN because the window is not yet full):
# Date         Page Views    Rolling Sum
# 2020-01-01   100           NaN
# 2020-01-02   150           NaN
# 2020-01-03   200           NaN
# ...
# 2020-01-29   2400          NaN
# 2020-01-30   2500          34250.0
# 2020-01-31   2600          36750.0
# 2020-02-01   2700          39300.0
# 2020-02-02   2800          41900.0
# 2020-02-03   2900          44500.0
# ...
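If you want to check the mechanics by hand, here is a minimal sketch with a made-up five-day series and a window of 3 instead of 30. The numbers are assumptions for illustration only; the point is that the first `window - 1` rows are NaN and each later value is the sum of the current row and the two before it.

```python
import pandas as pd

# Hypothetical page-view counts for five consecutive days
views = pd.Series(
    [100, 150, 200, 300, 250],
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
    name="Page Views",
)

# Rolling sum over the last 3 rows
rolling_sum = views.rolling(window=3).sum()

print(rolling_sum)
# 2020-01-03 is the first full window: 100 + 150 + 200 = 450
```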

And there you have it: two examples of how to use Pandas rolling functions for time series analysis. But wait, what if your dataset has missing values, or you want results before a full window is available? No problem! Here’s an example that uses the min_periods parameter:

# Import the pandas library and alias it as "pd"
import pandas as pd

# Load the stock prices data from a CSV file, parse the "Date" column as dates, and set it as the index
data = pd.read_csv('stock_prices.csv', index_col='Date', parse_dates=True)

# Set the window size for the rolling function to 7 rows
window_size = 7

# Calculate the moving average of the "Close Price" column using the rolling function
# Set min_periods=1 so a value is produced whenever the window holds at least one
# non-missing observation (missing values are skipped, not included in the mean)
# Store the result in a new column called "Moving Avg"
data['Moving Avg'] = data['Close Price'].rolling(window=window_size, min_periods=1).mean()
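To make the effect of min_periods concrete, here is a small sketch on made-up prices with one missing value, using a window of 3 to keep it readable. The series and the names `strict` and `lenient` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices with one missing observation
prices = pd.Series([10.0, np.nan, 12.0, 13.0, 14.0], name="Close Price")

# Default: min_periods equals the window size, so any window with
# fewer than 3 non-missing values yields NaN
strict = prices.rolling(window=3).mean()

# min_periods=1: a value is produced as long as one non-missing
# observation is in the window; NaNs are skipped when averaging
lenient = prices.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"Close Price": prices,
                    "strict": strict,
                    "lenient": lenient}))
```

With the default, only the last row gets a value here, because every earlier window either is too short or contains the NaN. With min_periods=1 every row gets a value, but the early ones are averages over fewer observations, so treat them with some care.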

And that’s all there is to it! Pandas rolling functions are powerful and flexible for time series analysis. You can also customize the window type (the win_type parameter of rolling) or apply your own aggregation function using the apply method. So go ahead and dive into the world of time series data with Pandas rolling functions, and let us know if you have any questions or need further assistance!
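As a parting sketch of the apply method mentioned above, here is one way a custom aggregation might look: a rolling high-low range. The helper `price_range` and the sample prices are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices
prices = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0], name="Close Price")


def price_range(window_values: np.ndarray) -> float:
    """Return the spread between the highest and lowest value in the window."""
    return float(window_values.max() - window_values.min())


# raw=True passes each window to the function as a NumPy array,
# which is usually faster than passing a Series
rolling_range = prices.rolling(window=3).apply(price_range, raw=True)

print(rolling_range)
```

For common statistics, the built-in methods such as mean, sum, min, and max are generally faster than apply, so a custom function is best kept for cases they do not cover.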
