To kick things off: what is time series data, and why do we care about it? Time series data is any set of observations recorded over time, such as stock prices or website traffic. And guess what? This type of data can be a real pain to analyze without tools like pandas rolling functions!
So let’s get started with our first example: calculating the 7-day moving average for a stock price dataset. Here’s how you do it:
# Import the pandas library as pd
import pandas as pd
# Load the stock price dataset, parse the 'Date' column as dates, and use it as the index
data = pd.read_csv('stock_prices.csv', index_col='Date', parse_dates=True)
# Set the window size for the rolling function: 7 rows, which is 7 days when there is one row per day
window_size = 7
# Calculate the moving average using the rolling function from pandas and store it in a new column called 'Moving Avg'
# The first 6 rows are NaN because their windows hold fewer than 7 observations
data['Moving Avg'] = data['Close Price'].rolling(window=window_size).mean()
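One subtlety is worth a quick sketch: an integer window counts rows, not calendar days, so the two only agree when there is exactly one row per day. The snippet below uses made-up prices on hypothetical trading days (none of these values come from stock_prices.csv) and a 3-day window instead of 7 to keep it short. It compares an integer window with an offset window such as "3D", which needs a DatetimeIndex.

```python
import pandas as pd

# Hypothetical closing prices on trading days only (6-7 Jan 2024 is a weekend gap).
dates = pd.to_datetime([
    "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05",
    "2024-01-08", "2024-01-09", "2024-01-10",
])
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0],
                  index=dates, name="Close Price")

# Integer window: the last 3 rows, however much calendar time they span.
by_rows = close.rolling(window=3).mean()

# Offset window: only the rows that fall within the last 3 calendar days.
by_days = close.rolling(window="3D").mean()

comparison = pd.DataFrame({"3 rows": by_rows, "3 days": by_days})
print(comparison)
```

On the Monday after the gap, the row-based window still reaches back to the previous Thursday, while the day-based window only sees Monday itself. Which behavior is right depends on what "7-day" should mean for your data.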
Boom! You just used your first rolling function. Pretty cool, huh? But wait: what if you want to calculate the 30-day rolling sum for website traffic instead of a moving average? No problem! Here’s how:
# Import the pandas library to use its functions
import pandas as pd
# Load the data from a CSV file, parse the 'Date' column as dates, and use it as the index
data = pd.read_csv('website_traffic.csv', index_col='Date', parse_dates=True)
# Set the window size for our rolling function (30 rows, i.e. 30 days of daily data)
window_size = 30
# Calculate the rolling sum of the 'Page Views' column using the specified window size and store it in a new column called 'Rolling Sum'
data['Rolling Sum'] = data['Page Views'].rolling(window=window_size).sum()
# Print the updated DataFrame to see the new 'Rolling Sum' column
print(data)
# Output (abbreviated; the first 29 rows are NaN because their windows are incomplete):
# Date        Page Views  Rolling Sum
# 2020-01-01         100          NaN
# 2020-01-02         150          NaN
# 2020-01-03         200          NaN
# ...
# 2020-01-28        2300          NaN
# 2020-01-29        2400          NaN
# 2020-01-30        2500      34250.0
# 2020-01-31        2600      36750.0
# 2020-02-01        2700      39300.0
# ...
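Since website_traffic.csv is not included here, the following is a small self-contained sketch of the same idea. The page view numbers are made up for illustration, and the window is shortened to 3 rows so the whole result fits on screen.

```python
import pandas as pd

# Synthetic daily page views (illustrative numbers, not the contents of website_traffic.csv).
index = pd.date_range("2020-01-01", periods=6, freq="D", name="Date")
traffic = pd.DataFrame({"Page Views": [100, 150, 200, 300, 250, 350]}, index=index)

# A 3-row window: the first two rows have too few observations, so they come out as NaN.
traffic["Rolling Sum"] = traffic["Page Views"].rolling(window=3).sum()

# Drop the warm-up rows if later steps should only see complete windows.
complete = traffic.dropna(subset=["Rolling Sum"])

print(traffic)
```

Each value in the new column is the current row plus the two rows before it, so the third row is 100 + 150 + 200 = 450.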
And there you have it: two examples of how to use pandas rolling functions for time series analysis. But wait, what if your dataset has missing values, or you don’t want NaN for the first few rows? No problem! Here’s an example that uses the min_periods parameter:
# Import the pandas library and alias it as "pd"
import pandas as pd
# Load the stock prices data from a CSV file, parse the "Date" column as dates, and use it as the index
data = pd.read_csv('stock_prices.csv', index_col='Date', parse_dates=True)
# Set the window size for the rolling function to 7 rows (7 days of daily data)
window_size = 7
# Calculate the moving average of the "Close Price" column using the rolling function
# min_periods=1 returns a result as soon as the window holds at least one non-missing value;
# missing values are skipped, and the mean is taken over whatever valid values remain
# Store the result in a new column called "Moving Avg"
data['Moving Avg'] = data['Close Price'].rolling(window=window_size, min_periods=1).mean()
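To see what min_periods actually changes, here is a minimal sketch on a hypothetical price series with one missing value. The numbers and the 3-row window are assumptions chosen for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one missing observation.
prices = pd.Series([10.0, 12.0, np.nan, 16.0, 18.0, 20.0])

# Default: min_periods equals the window size, so any window with fewer
# than 3 valid values produces NaN.
strict = prices.rolling(window=3).mean()

# min_periods=1: a result is produced whenever the window holds at least
# one valid value; the mean is taken over the valid values only.
lenient = prices.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"price": prices, "strict": strict, "lenient": lenient}))
```

The trade-off is that the lenient averages are based on fewer observations near the start of the series and around gaps, so they are noisier than the values from full windows.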
And that’s all there is to it! Rolling functions in pandas are powerful and flexible tools for time series analysis. Beyond mean and sum, you can weight the observations in each window with the win_type parameter (which relies on SciPy being installed) or run your own aggregation function with the apply method. So go ahead and dive into the world of time series data with pandas rolling functions, and let us know if you have any questions!
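As a rough sketch of the custom aggregation route, the snippet below computes a rolling high-low range with apply. The price_range helper and the sample prices are hypothetical, made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices.
prices = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0])

def price_range(window: np.ndarray) -> float:
    """Return the spread between the highest and lowest value in one window."""
    return float(window.max() - window.min())

# raw=True passes each window to the function as a NumPy array rather than
# a Series, which is usually faster for simple numeric functions.
rolling_range = prices.rolling(window=3).apply(price_range, raw=True)

print(rolling_range)
```

For common statistics, the built-in methods such as max() and min() are typically much faster than apply, so a custom function is best kept for logic that has no built-in equivalent.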