To begin with: what are data classes and why should you care? Data classes are a standard-library feature in Python 3.7+ (the `dataclasses` module) that lets us define simple data-holding objects without writing boilerplate code for initializing them or implementing their string representation. They’re mutable by default and can be made immutable with `frozen=True`. They’re perfect for representing structured data with minimal effort.
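Here is a minimal sketch of what the decorator generates for you. The `Point` and `FrozenPoint` classes are made up for illustration; note that immutability is opt-in through `frozen=True` rather than the default.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


p = Point(1, 2)
print(p)                 # generated __repr__ lists the field names and values
print(p == Point(1, 2))  # generated __eq__ compares field by field
p.x = 10                 # allowed: a plain dataclass is mutable

fp = FrozenPoint(1, 2)
try:
    fp.x = 10            # rejected: frozen instances refuse assignment
except FrozenInstanceError as exc:
    print(f"rejected: {exc}")
```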
But let’s be real here: two of the most useful dataclass features are also the easiest to overlook. Without them you end up sorting your objects with ad hoc key functions all over the place, or wiring up default values by hand in your own `__init__()` method.
Don’t worry, bro: you don’t need hacks or third-party libraries for either one. Field defaults and ordering (via `order=True` or your own comparison methods) have both been part of the `dataclasses` module since Python 3.7, so everything below works unchanged on Python 3.8 and later.
Okay, first things first: field defaults. A dataclass field gets a default when you assign a value next to its annotation, and a `__post_init__()` method can validate the result. A class that does both looks something like this:
# Import the dataclass decorator from the standard library
from dataclasses import dataclass
# Create a dataclass called Person
@dataclass
class Person:
    # Define the fields of the dataclass with their respective types
    name: str
    age: int = 0  # Default value for the age field
    # Runs automatically after the generated __init__()
    def __post_init__(self):
        # Raise a ValueError if the age is less than 18
        if self.age < 18:
            raise ValueError("Age must be at least 18")
As you can see, `age` gets its default from the `= 0` on the field, and the `__post_init__()` method, which runs right after the generated `__init__()`, checks that the age is at least 18. Watch the interaction, though: 0 fails that check, so `Person("Alice")` raises a `ValueError` unless you pass an age explicitly.
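A short sketch of how this class behaves at runtime, with made-up names and ages: a valid age passes, while relying on the default of 0 trips the check.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int = 0

    def __post_init__(self):
        if self.age < 18:
            raise ValueError("Age must be at least 18")


adult = Person("Alice", 30)   # passes validation
print(adult.age)              # 30

try:
    Person("Bob")             # age falls back to 0, which fails the check
except ValueError as exc:
    print(exc)                # Age must be at least 18
```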
The default needs nothing more than that `=` in the class body, and it has worked this way since Python 3.7. Here is the class once more with every step commented:
# Import the dataclass decorator from the standard library
from dataclasses import dataclass
# Define a dataclass called Person
@dataclass
class Person:
    # Define a field called name with type string (required, no default)
    name: str
    # Define a field called age with type integer and a default value of 18,
    # chosen so that the default passes the check in __post_init__
    age: int = 18
    # Define a method called __post_init__ that is called after the object is initialized
    def __post_init__(self):
        # If the age is less than 18, raise a ValueError with a message
        if self.age < 18:
            raise ValueError("Age must be at least 18")
As you can see, the `=` after the annotation is all it takes to give `age` a default; `__post_init__()` is only there for the validation. That is much cleaner and more concise than handling defaults by hand in your own `__init__()` method!
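One case the plain `=` does not cover is a mutable default such as a list: `dataclasses` rejects `members: List[str] = []` with a `ValueError` when the class is defined. The standard-library `field(default_factory=...)` handles it by building a fresh value per instance. A minimal sketch, with a hypothetical `Team` class:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Team:
    name: str
    # default_factory is called once per instance, so lists are not shared
    members: List[str] = field(default_factory=list)


a = Team("red")
b = Team("blue")
a.members.append("Alice")
print(a.members)  # ['Alice']
print(b.members)  # []
```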
Next up: custom ordering. By default a dataclass generates `__eq__()` but no ordering methods, so instances can’t be compared with `<` or sorted directly. One way to sort a list of them by the values of certain fields is to pass a key function to `sorted()`:
# Import the necessary modules
from typing import List  # Type annotations for lists
from dataclasses import dataclass  # The dataclass decorator
# Create a data class called Person with two attributes: name and age
@dataclass
class Person:
    name: str  # Type annotation for the name attribute
    age: int  # Type annotation for the age attribute
# Take a list of Person objects and return a new list sorted by age, then by name
def sort_people(people: List[Person]) -> List[Person]:
    # The lambda builds an (age, name) tuple that sorted() uses as the sort key
    return sorted(people, key=lambda p: (p.age, p.name))
As you can see, the lambda builds an `(age, name)` tuple for each person, so the list is sorted by age first and by name when ages tie. This works fine, but the ordering rule lives at every call site instead of in the class itself.
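A quick usage sketch with made-up people shows the tie-break on name. It also compares the lambda against `operator.attrgetter`, a standard-library alternative that builds the same key.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import List


@dataclass
class Person:
    name: str
    age: int


def sort_people(people: List[Person]) -> List[Person]:
    return sorted(people, key=lambda p: (p.age, p.name))


people = [Person("Carol", 35), Person("Bob", 28), Person("Alice", 35)]

by_lambda = sort_people(people)
by_getter = sorted(people, key=attrgetter("age", "name"))

print([p.name for p in by_lambda])  # ['Bob', 'Alice', 'Carol']
print(by_lambda == by_getter)       # True
```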
To make the ordering part of the class, you can define your own `__eq__()` and `__lt__()` methods for your data classes (this is plain Python and works on 3.7 as well):
# Import the dataclass decorator
from dataclasses import dataclass
# Define a dataclass called Person with two attributes: name and age
@dataclass
class Person:
    name: str
    age: int
    # Define a custom __eq__() method to compare two Person objects based on their age and name
    def __eq__(self, other):
        # Let Python handle comparisons with unrelated types
        if not isinstance(other, Person):
            return NotImplemented
        return self.age == other.age and self.name == other.name
    # Define a custom __lt__() method to compare two Person objects based on their age and name
    def __lt__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        # Tuples compare element by element: age first, then name if the ages are equal
        return (self.age, self.name) < (other.age, other.name)
As you can see, we’re now defining our own `__eq__()` and `__lt__()` methods, so the age-then-name ordering lives in the class itself. With `__lt__()` in place, `sorted(people)` works with no key function at all, which keeps every call site short!
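If plain field-by-field ordering is all you need, the decorator can generate the comparison methods for you: `@dataclass(order=True)` compares instances as tuples of their fields in declaration order. In this sketch the fields are declared age first to get the same age-then-name ordering; that reordering is an assumption made for the example, not part of the class above. Keep in mind that `order=True` raises a `TypeError` if the class also defines `__lt__()` by hand, so pick one approach.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Person:
    # Declared age first so the generated comparisons check age before name
    age: int
    name: str


people = [Person(35, "Carol"), Person(28, "Bob"), Person(35, "Alice")]

print([p.name for p in sorted(people)])         # ['Bob', 'Alice', 'Carol']
print(Person(28, "Bob") < Person(35, "Alice"))  # True
```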
With field defaults and custom ordering, we can define simple data objects with minimal effort and maximum flexibility (add `frozen=True` if you want them immutable). So go ahead and give them a try: your data classes (and your sanity) will thank you!