Alright, here's something that'll make your eyes glaze over faster than a math professor trying to explain calculus: Recurrent Event Modelling for AV Reliability Analysis using California Driving Data. Don't worry, it's less scary than it sounds!
Before anything else: what is Recurrent Event Modelling (REM)? It's basically a fancy statistical technique for describing how often an event happens, and keeps happening, over time. In this case, we're using it to analyze the reliability of autonomous vehicles (AVs) on California roads, based on data collected by the DMV.
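To make that concrete, here's a minimal sketch of what recurrent event data looks like. The vehicle IDs and mileages below are made up for illustration, not real DMV figures: the point is that the same vehicle can rack up the same kind of event, a disengagement, over and over, and the running count of events per vehicle is the basic quantity REM describes.
import pandas as pd
# Toy recurrent event data (made-up vehicle IDs and mileages, not real DMV numbers):
# each row is one disengagement, and one vehicle can show up many times.
events = pd.DataFrame({
    'vehicle': ['AV-01', 'AV-01', 'AV-01', 'AV-02', 'AV-02'],
    'miles_at_event': [120, 135, 480, 60, 600],
})
# The running event count per vehicle over time is what recurrent event models describe.
events = events.sort_values(['vehicle', 'miles_at_event'])
events['cumulative_events'] = events.groupby('vehicle').cumcount() + 1
print(events)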
Now, why is REM important for AV reliability analysis? Imagine you're the safety driver in an AV cruising down a busy highway when, without warning, the autonomous system hits a situation it can't handle and you have to grab the wheel. That's what we call a "disengagement event": the autonomous system disengages (or the human overrides it) and control goes back to the driver.
The DMV collects data about these disengagement events and makes it available for researchers like us to analyze. But here's the catch: the events aren't evenly spaced in time. Sometimes there are long stretches with no disengagements at all, followed by a cluster of them all at once. This is where REM comes in handy: it lets us model these irregular patterns and better understand how often AVs experience disengagement events over time.
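Here's a tiny, hedged illustration of what those irregular patterns look like (the dates are invented, not pulled from a real report). Recurrent event models typically work on the gaps between consecutive events, so a quiet six-week stretch and a burst of back-to-back disengagements both show up explicitly:
import pandas as pd
# Invented disengagement timestamps for one vehicle (not real DMV data)
times = pd.to_datetime(['2023-01-03', '2023-01-04', '2023-02-20', '2023-02-21', '2023-02-22'])
# Re-express the irregular event times as (start, stop] intervals between
# consecutive events -- the "counting process" layout recurrent event models use.
intervals = pd.DataFrame({'start': times[:-1], 'stop': times[1:]})
intervals['gap_days'] = (intervals['stop'] - intervals['start']).dt.days
print(intervals)  # a 47-day quiet stretch sitting right next to several 1-day gaps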
So, let's get cracking with the technical details. First, we need to download our data from the DMV website (which can be a painstaking process thanks to their slow servers). Once we have it loaded up in R or Python, we'll use REM to model the disengagement events over time.
Here’s an example of what this might look like:
# Import necessary libraries
import pandas as pd  # Data loading and cleaning
import matplotlib.pyplot as plt  # Data visualization
from lifelines import KaplanMeierFitter  # Kaplan-Meier estimator
# Load the disengagement report (downloaded from the DMV website as a CSV) and clean it up
df = pd.read_csv('disengagement_data.csv')  # Load data from the csv file into a pandas dataframe
df['time'] = pd.to_datetime(df['date'], format='%m/%d/%Y')  # Convert the date column to datetime format
df = df[df['disengagement'] == 1].sort_values('time')  # Keep disengagement rows only, in time order
df['duration'] = df['time'].diff().dt.total_seconds()  # Time since the previous disengagement, in seconds
df = df.dropna(subset=['duration'])  # The first event has no preceding gap, so drop it
# Fit a Kaplan-Meier estimator to the times between consecutive disengagements
kmf = KaplanMeierFitter()  # Initialize the KaplanMeierFitter object
kmf.fit(df['duration'], label='Kaplan-Meier')  # Every gap here ends in an observed disengagement
ax = kmf.plot_survival_function()  # Plot the estimated survival curve
ax.set_xlabel('Time since previous disengagement (seconds)', fontsize='large')  # Set x-axis label
ax.set_ylabel('Survival Probability', fontsize='large')  # Set y-axis label
ax.set_title('Kaplan-Meier Survival Function for AV Disengagement Events in California', fontsize='xx-large')  # Set title for plot
plt.show()  # Display the plot
This code loads the data, cleans it up (converting the dates to datetime objects and keeping only the disengagement rows), calculates the time between consecutive disengagements, and fits a Kaplan-Meier estimator to those gaps. The resulting plot shows how long AVs typically run between disengagement events, based on the data the DMV collects in California.
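One honest caveat: the Kaplan-Meier curve above treats every gap as an independent observation, which is really just a first pass at REM. A fuller recurrent event analysis would account for the fact that the same vehicle (or manufacturer) contributes many gaps. As a rough sketch of one way to do that with lifelines, assume the cleaned dataframe also carried a vehicle_id column and an illustrative covariate like night; neither is guaranteed to exist in the DMV CSV as loaded above, they're assumptions for the example. You could then fit a Cox model on the gap times with clustered (robust) standard errors:
from lifelines import CoxPHFitter
# Hypothetical columns: 'vehicle_id' and 'night' are assumptions for illustration,
# not fields guaranteed to be in the DMV CSV loaded above.
model_df = df[['duration', 'vehicle_id', 'night']].copy()
model_df['observed'] = 1  # every gap here ends in an observed disengagement
cph = CoxPHFitter()
cph.fit(model_df,
        duration_col='duration',   # gap length in seconds
        event_col='observed',      # event indicator
        cluster_col='vehicle_id')  # sandwich/robust standard errors: many gaps per vehicle
cph.print_summary()  # hazard ratio for 'night', with clustered standard errors
This is closer to the marginal, counting-process flavour of analysis the recurrent event literature usually has in mind, but treat it as a sketch: the right formulation depends on what's actually in the report you download.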