Custom Serializers with Pydantic V2

Do you want to customize how your data is serialized without pulling out all your hair? Well, my friend, you’re in luck, because we’re going to take a closer look at custom serializers in Pydantic V2!

To kick things off: what are custom serializers? They allow us to override the default behavior of how data is serialized by providing our own implementation. (The opposite direction, turning input back into model fields, is the job of validators rather than serializers.) This can be useful for a variety of reasons, such as when dealing with complex or nested data structures that require specific formatting, or when working with legacy systems that expect different formats than the ones Pydantic produces by default.

So let’s get started! We’ll create an example model to demonstrate how custom serializers work:

# Import BaseModel and the field_serializer decorator from Pydantic
from pydantic import BaseModel, field_serializer

# A model named MyData with three fields: foo, bar, and baz
class MyData(BaseModel):
    foo: str  # foo is expected to be a string
    bar: int  # bar is expected to be an integer
    baz: list[str] = []  # baz is expected to be a list of strings, empty by default

    # Custom serializer for bar: emit the integer as a string
    @field_serializer("bar")
    def serialize_bar(self, value: int) -> str:
        return str(value)

In this example, we’ve created a simple model called `MyData`. It has three fields: `foo`, which is a string; `bar`, which is an integer; and `baz`, which is a list of strings. We’re also customizing how `bar` is serialized by decorating a method with `@field_serializer`. If you are coming from Pydantic V1, note that the `json_encoders` entry in a `Config` class is deprecated in V2, which is why the decorator is used here instead.

Now let’s test out our new model:

# Create an instance of the MyData model defined above
data = MyData(foo="hello", bar=123, baz=["apple", "banana"])
# Print the JSON representation; model_dump_json() is the V2 replacement for json()
print(data.model_dump_json())
#> {"foo":"hello","bar":"123","baz":["apple","banana"]}

As you can see, `bar` is now serialized as the string `"123"` rather than the number `123`, while `foo` and the strings in `baz` are left untouched. This is because the model carries a custom serializer that converts the integer to a string before it is written out.
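
If the same conversion is needed on several fields, one option is to attach the serializer to the type itself with `Annotated` and `PlainSerializer`. The sketch below is only an illustration: the `StrInt` alias and the `Order` model are hypothetical names that do not appear elsewhere in this article.

```python
from typing import Annotated

from pydantic import BaseModel, PlainSerializer

# Hypothetical reusable alias: any field annotated with StrInt is
# validated as an int but dumped as a string.
StrInt = Annotated[int, PlainSerializer(lambda v: str(v), return_type=str)]


class Order(BaseModel):  # hypothetical model, for illustration only
    quantity: StrInt
    price_cents: StrInt
    note: str = ""


order = Order(quantity=3, price_cents=1250)
print(order.model_dump())
print(order.model_dump_json())
```

Both integer fields come out as strings in either dump, while `note` keeps the default behavior. This keeps the rule in one place instead of repeating a decorated method on every model.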

But what if we want to go even further and create a custom serializer for an entire object? Let’s say we have a nested data structure that looks like this:

# The following script defines a model with a nested data structure and a custom field serializer.

# First, we import the necessary names.
from pydantic import BaseModel, Field, field_serializer

# The inner model is defined on its own so that it can be used as a field type.
class InnerClass(BaseModel):
    baz: list[str] = []

# Next, we define our NestedData class, which inherits from the BaseModel class.
class NestedData(BaseModel):
    foo: str
    bar: int
    # The nested structure is a regular field whose type is another model.
    inner: InnerClass = Field(default_factory=InnerClass)

    # Customize how bar is serialized: emit the integer as a string.
    @field_serializer("bar")
    def serialize_bar(self, value: int) -> str:
        return str(value)

# Now, we can use our NestedData class to create objects with nested data structures.
# For example, we can create an object with a string and an integer as attributes.
nested_object = NestedData(foo="hello", bar=123)

# The nested model is reachable through the inner field, so we can fill in its baz list.
nested_object.inner.baz = ["a", "b", "c"]

# Finally, we serialize the object to JSON with model_dump_json().
print(nested_object.model_dump_json())
#> {"foo":"hello","bar":"123","inner":{"baz":["a","b","c"]}}

# As we can see, the integer value of 123 was converted to a string, as specified in our custom serializer.

In this example, we’ve created a nested data structure called `NestedData`. It has three fields: `foo`, which is a string; `bar`, which is an integer; and `inner`, which holds an `InnerClass` model. `InnerClass` in turn has one field: `baz`, which is a list of strings. Note that the inner model has to be declared as a field; a class that is merely defined inside the body of `NestedData` would not be part of the serialized output.
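
One detail worth knowing with nested models: a `field_serializer` runs for both `model_dump()` and `model_dump_json()` unless told otherwise, and serializers on an inner model are applied when the outer model is dumped. The sketch below uses the `when_used="json"` option to limit the conversion to JSON output; the `Inner` and `Outer` models are hypothetical and exist only for this illustration.

```python
from pydantic import BaseModel, field_serializer


class Inner(BaseModel):  # hypothetical model, for illustration only
    count: int

    # when_used="json" restricts this serializer to JSON-mode dumps
    @field_serializer("count", when_used="json")
    def count_as_str(self, value: int) -> str:
        return str(value)


class Outer(BaseModel):  # hypothetical model, for illustration only
    name: str
    inner: Inner


outer = Outer(name="demo", inner=Inner(count=7))
python_dump = outer.model_dump()           # Python mode: count stays an int
json_dump = outer.model_dump(mode="json")  # JSON mode: count becomes a string
print(python_dump)
print(json_dump)
print(outer.model_dump_json())
```

This can be handy when Python callers should keep working with real integers and only the wire format needs strings.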

Now let’s create our custom serializer for the nested data structure:

# Import the necessary names
from pydantic import BaseModel, ConfigDict, Field, model_serializer

# The inner model has one field: baz, which is a list of strings
class InnerClass(BaseModel):
    baz: list[str] = []

# The nested data structure called NestedData
class NestedData(BaseModel):
    # V2 spelling of allow_population_by_field_name; it only matters once fields define aliases
    model_config = ConfigDict(populate_by_name=True)

    foo: str
    bar: int
    inner: InnerClass = Field(default_factory=InnerClass)

    # Custom serializer for the whole model: return the structure Pydantic should emit
    @model_serializer
    def serialize_model(self) -> dict:
        return {
            "foo": self.foo,
            # Emit the integer as a string
            "bar": str(self.bar),
            # Lift the inner model's baz list to the top level
            "baz": self.inner.baz,
        }

In this example, we’ve created our custom serializer by decorating a method with `@model_serializer`. Pydantic does not look for a `__json__` method, so the decorator is the supported way to take over serialization of an entire model. The method returns a plain dictionary describing the output we want, and Pydantic handles the JSON encoding (including quoting) for us, so there is no need to call `json.dumps` or patch quotes by hand.

Now let’s test out our new custom serializer:

# We create an instance of NestedData, passing an InnerClass instance for the inner field.
data = NestedData(foo="hello", bar=123, inner=InnerClass(baz=["apple", "banana"]))
# model_dump_json() calls our model serializer and encodes the result as JSON.
print(data.model_dump_json())  # custom serialization in action!
#> {"foo":"hello","bar":"123","baz":["apple","banana"]}

As you can see, our nested data structure is now flattened: `baz` sits at the top level next to `foo` and `bar` instead of under `inner`, and `bar` is emitted as a string. This is because `@model_serializer` lets us decide exactly what the model looks like when it is serialized.
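
When only a small tweak to the default output is needed, rebuilding the whole dictionary by hand can be avoided with a wrap-mode model serializer, which receives a handler that produces the standard output first. The following is a minimal sketch under that assumption; the `Event` model and the `attendee_count` key are hypothetical and not part of the examples above.

```python
from typing import Any

from pydantic import BaseModel, SerializerFunctionWrapHandler, model_serializer


class Event(BaseModel):  # hypothetical model, for illustration only
    name: str
    attendees: list[str] = []

    @model_serializer(mode="wrap")
    def add_count(self, handler: SerializerFunctionWrapHandler) -> dict[str, Any]:
        # Let Pydantic build the default output first...
        data = handler(self)
        # ...then add a derived key on top of it.
        data["attendee_count"] = len(self.attendees)
        return data


event = Event(name="meetup", attendees=["ann", "bob"])
dumped = event.model_dump()
print(dumped)
print(event.model_dump_json())
```

Because the handler does the bulk of the work, new fields added to the model later show up in the output without touching the serializer.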

And that’s it! You now know how to create custom serializers with Pydantic V2. Remember, always use caution when creating custom serializers and make sure they don’t break existing functionality or cause unexpected behavior.
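
One way to act on that caution is a quick round-trip check: serialize an instance, validate the result back into the model, and compare. The sketch below assumes a hypothetical `Item` model; it relies on Pydantic's default lax validation, which accepts a numeric string for an `int` field, and shows that strict validation is less forgiving.

```python
from pydantic import BaseModel, ValidationError, field_serializer


class Item(BaseModel):  # hypothetical model, for illustration only
    name: str
    stock: int

    @field_serializer("stock")
    def stock_as_str(self, value: int) -> str:
        return str(value)


original = Item(name="widget", stock=5)
payload = original.model_dump_json()

# Lax validation (the default) converts the string "5" back to the int 5.
restored = Item.model_validate_json(payload)
print(restored == original)

# Strict validation does not coerce strings to ints, so the same payload fails.
try:
    Item.model_validate_json(payload, strict=True)
except ValidationError as exc:
    print("strict mode rejected the payload:", exc.error_count(), "error(s)")
```

If consumers of the JSON validate strictly, a string-typed integer like this would break them, which is exactly the kind of side effect worth testing for before shipping a custom serializer.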
