Python’s Pydantic Library for Data Validation

Pydantic is a fast and extensible library that allows us to define the structure of our data models using pure, canonical Python 3.8+ syntax. It provides an easy-to-use interface for validating input data against these defined structures. Pydantic’s validation process ensures that the data conforms to the expected format, type, and range constraints.

Pydantic is also a reasonable choice when dealing with untrusted input. In Pydantic v2, string pattern constraints are matched by default with the Rust regex crate, which guarantees linear-time searching of strings and so avoids the exponential-time blowups that complex regular expressions can cause. The trade-off is that some advanced regex features, such as lookarounds and backreferences, are not supported. If you need them, you can write a custom validator that does the regex validation in Python instead.
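As a rough illustration of that fallback, the sketch below moves a regex check into a custom validator that uses Python's own `re` module. It assumes Pydantic v2 (`field_validator`); the `Account` model, the password rule, and the `is_valid_account` helper are hypothetical names invented for this example, and `BaseModel` itself is introduced in the next section. Keep in mind that Python's `re` engine can backtrack, so the linear-time guarantee no longer applies to this field.

```python
import re

from pydantic import BaseModel, ValidationError, field_validator

# A pattern with a lookahead, which the default Rust-based engine does not support.
# Hypothetical rule: at least 8 characters, at least one of them a digit.
PASSWORD_RE = re.compile(r"^(?=.*\d).{8,}$")


class Account(BaseModel):
    username: str
    password: str

    @field_validator("password")
    @classmethod
    def check_password(cls, value: str) -> str:
        # Run the match with Python's `re` module instead of a pattern constraint.
        if PASSWORD_RE.fullmatch(value) is None:
            raise ValueError("password must be at least 8 characters and contain a digit")
        return value


def is_valid_account(data: dict) -> bool:
    """Return True if `data` passes validation, False otherwise."""
    try:
        Account(**data)
    except ValidationError:
        return False
    return True
```

Calling `is_valid_account({"username": "ann", "password": "short1"})` returns `False`, because the validator's `ValueError` is reported by Pydantic as a `ValidationError`.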

To get started with Pydantic, let’s create an example data model for a simple book object:

# Importing the datetime module and the BaseModel class from Pydantic
import datetime

from pydantic import BaseModel

# Creating a data model for a book object using the BaseModel class
class Book(BaseModel):
    # Defining the attributes of the book object and their data types
    title: str # Title of the book, data type: string
    author: str # Author of the book, data type: string
    pages: int # Number of pages in the book, data type: integer
    published_date: datetime.datetime # Date when the book was published, data type: datetime object

In this code snippet, we are using the `BaseModel` class from Pydantic to define our data model for a book object. The `title` and `author` fields are defined as strings, `pages` as an integer, and `published_date` as a datetime object.

To validate input data against this model, we pass the data to the model's constructor. Pydantic validates every field while creating the instance:

# Importing necessary libraries
from pydantic import BaseModel, ValidationError # Added ValidationError to handle validation errors
import datetime

# Defining the data model for a book object
class Book(BaseModel):
    title: str # Defining title field as a string
    author: str # Defining author field as a string
    pages: int # Defining pages field as an integer
    published_date: datetime.datetime # Defining published_date field as a datetime object

# Creating a dictionary with input data for a book
book = {
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "pages": 475,
    "published_date": "1925-04-10"
}

# Validating the input data against the defined data model
try:
    validated_book = Book(**book) # Passing the dictionary to the constructor triggers validation
except ValidationError as e:
    print("Validation error:", str(e)) # Printing the validation error if there is one
else:
    print("Book validated successfully!") # Printing a success message if the input data is valid

In this code snippet, we are creating a dictionary `book` that contains the input data for our book object. We then pass this dictionary to the constructor of our Book class using the `**` syntax. If there is any validation error, Pydantic will raise a `ValidationError`. Otherwise, it will return an instance of the validated model.
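To see what a failure looks like, the following sketch feeds deliberately invalid data to the same kind of model and collects the names of the fields that failed. The `bad_book` dictionary and the `failed_fields` list are illustrative names introduced here, not part of Pydantic.

```python
import datetime

from pydantic import BaseModel, ValidationError


class Book(BaseModel):
    title: str
    author: str
    pages: int
    published_date: datetime.datetime


# Deliberately invalid input: `pages` is not a number and `author` is missing.
bad_book = {
    "title": "The Great Gatsby",
    "pages": "many",
    "published_date": "1925-04-10T00:00:00",
}

failed_fields = []
try:
    Book(**bad_book)
except ValidationError as exc:
    # errors() returns one dict per failure; "loc" is a tuple naming the offending field.
    for error in exc.errors():
        failed_fields.append(error["loc"][0])
        print(error["loc"], error["msg"])
```

All failures are reported together in a single `ValidationError`, so a caller can list every problem with the input at once rather than fixing them one at a time.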

Pydantic provides many features that make data validation easier and more efficient. For example:

– **Type hints**: Pydantic uses type hints to define the expected types for each field in our data models. This makes it easy to catch any type errors during validation.

– **Default values**: We can provide default values for fields that are not present in the input data. For example:

# Importing the necessary modules
import datetime

from pydantic import BaseModel, Field

# Creating a data model for a book
class Book(BaseModel):
    # Defining the expected type for the title field, with a default value and an alias
    title: str = Field("", alias="title_str")
    # Defining the expected type for the author field (required, no default)
    author: str
    # Defining the expected type for the pages field and providing a default value
    pages: int = 0
    # Defining the expected type for the published_date field; default_factory is called
    # each time a Book is created, so every instance gets the current UTC time
    published_date: datetime.datetime = Field(
        default_factory=lambda: datetime.datetime.now(datetime.timezone.utc)
    )

In this code snippet, the `title` field has a default value of `""` and an alias, `title_str`, which is the key Pydantic expects in the input data. The `pages` field defaults to 0. The `published_date` field uses `default_factory` so that the current UTC time is computed whenever a `Book` is created; passing the function as a plain `default` would store the function object itself rather than a datetime.
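A short usage sketch may make the defaults and the alias more concrete. The variable names `minimal` and `aliased` are made up for this example, and the timezone-aware `default_factory` is a choice made for this sketch.

```python
import datetime

from pydantic import BaseModel, Field


class Book(BaseModel):
    title: str = Field("", alias="title_str")
    author: str
    pages: int = 0
    published_date: datetime.datetime = Field(
        default_factory=lambda: datetime.datetime.now(datetime.timezone.utc)
    )


# Only `author` is required; every other field falls back to its default.
minimal = Book(author="F. Scott Fitzgerald")

# The alias is the key expected in the input; the attribute is still called `title`.
aliased = Book(title_str="The Great Gatsby", author="F. Scott Fitzgerald")

print(repr(minimal.title), minimal.pages)
print(aliased.title)
```

Because the alias replaces the field name on input, data keyed by `title` would not populate the field unless the model is configured to accept field names as well.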

– **Field constraints**: Pydantic allows us to define constraints on fields, such as minimum or maximum values, patterns, and enumerations. For example:

# Importing necessary libraries
from pydantic import BaseModel, Field # Importing the BaseModel and Field classes from the pydantic library
from datetime import datetime, timezone # Importing the datetime and timezone classes from the datetime library

# Defining the Book class
class Book(BaseModel):
    title: str = Field("", alias="title_str") # Defining the title field as a string with a default value of an empty string and an alias of "title_str"
    author: str # Defining the author field as a string
    pages: int = 0 # Defining the pages field as an integer with a default value of 0
    published_date: datetime = Field(default_factory=lambda: datetime.now(timezone.utc)) # Defining the published_date field as a datetime that defaults to the current UTC datetime at creation time
    price: float = Field(ge=0, le=1000) # Defining the price field as a required float that must be greater than or equal to 0 and less than or equal to 1000
    isbn: str = Field("", pattern="^[0-9X]{13}$") # Defining the isbn field as a string with a default value of an empty string and a pattern of exactly 13 characters, each a digit or "X" (Pydantic v2 uses `pattern`; v1 called this argument `regex`)


In this code snippet, we have added a constraint to the `price` field that ensures it is between 0 and 1000. We have also provided a regular expression pattern for the `isbn` field.
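The sketch below shows how such constraints behave when input violates them. It assumes Pydantic v2, where the keyword for a regular expression is `pattern` (v1 used `regex`); `PricedBook` and `violated_fields` are hypothetical names, and the model is trimmed to the two constrained fields to keep the example short.

```python
from pydantic import BaseModel, Field, ValidationError


class PricedBook(BaseModel):
    # Hypothetical trimmed-down model: only the two constrained fields.
    price: float = Field(ge=0, le=1000)
    isbn: str = Field("", pattern=r"^[0-9X]{13}$")


def violated_fields(data: dict) -> list:
    """Return the names of the fields that failed validation, in sorted order."""
    try:
        PricedBook(**data)
    except ValidationError as exc:
        return sorted(str(error["loc"][0]) for error in exc.errors())
    return []


print(violated_fields({"price": 12.5, "isbn": "9780743273565"}))
print(violated_fields({"price": 5000, "isbn": "123"}))
```

Note that the empty-string default for `isbn` is not checked against the pattern, since Pydantic does not validate default values unless asked to, so omitting `isbn` entirely is accepted.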

Pydantic provides many other features as well, such as custom validators, type adapters, and JSON schema generation. For more information on these features, please refer to Pydantic’s documentation at https://pydantic.dev/.
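As a brief taste of two of these features, the sketch below generates a JSON schema for a model and uses a type adapter to validate a list of books. It assumes Pydantic v2 (`model_json_schema` and `TypeAdapter`), and the two-field `Book` model is simplified for illustration.

```python
from typing import List

from pydantic import BaseModel, TypeAdapter


class Book(BaseModel):
    title: str
    pages: int = 0


# JSON schema generation: a plain dict describing the model.
schema = Book.model_json_schema()
print(schema["required"])  # the fields that have no default

# A TypeAdapter validates types that are not models themselves, such as a list of books.
books = TypeAdapter(List[Book]).validate_python(
    [{"title": "The Great Gatsby", "pages": "180"}, {"title": "Untitled"}]
)
print(books[0].pages, books[1].pages)
```

In the default (lax) mode the numeric string `"180"` is converted to the integer 180, while the second entry falls back to the default page count.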
