Data Modeling with JSON Schema

Data modeling in this project is centered around the Schema class, which models the Schema Object of the OpenAPI 3.1.0 Specification. This version of OpenAPI is significant because its Schema Object is built on JSON Schema 2020-12, so the same schema definitions serve both data validation and API documentation.

Alignment with JSON Schema 2020-12

The Schema class in fastapi/openapi/models.py is designed to represent the JSON Schema Core Vocabulary and its various sub-vocabularies. Some JSON Schema keywords are reserved words in Python (such as not, if, and else), and others contain characters that are invalid in Python identifiers (such as the $ in $schema and $id), so the implementation relies heavily on Pydantic's Field aliases.

class Schema(BaseModelWithConfig):
    # Core Vocabulary using aliases for JSON Schema keywords
    schema_: str | None = Field(default=None, alias="$schema")
    id: str | None = Field(default=None, alias="$id")
    ref: str | None = Field(default=None, alias="$ref")

    # Subschema application
    allOf: list["SchemaOrBool"] | None = None
    anyOf: list["SchemaOrBool"] | None = None
    oneOf: list["SchemaOrBool"] | None = None
    not_: Optional["SchemaOrBool"] = Field(default=None, alias="not")

This design choice ensures that the generated openapi.json remains compliant with the specification while allowing developers to interact with the models using idiomatic Python names (e.g., using schema.not_ instead of schema.not).

Structural Validation and Type Flexibility

One of the primary roles of the Schema object is structural validation. In alignment with JSON Schema 2020-12, the type field in the Schema class is highly flexible. It can be a single string, a list of strings (allowing for multi-type definitions like ["string", "null"]), or None.

The valid types are defined by the SchemaType literal:

SchemaType = Literal[
    "array", "boolean", "integer", "null", "number", "object", "string"
]

As demonstrated in tests/test_openapi_schema_type.py, the Schema model validates these types strictly:

from fastapi.openapi.models import Schema

# Valid multi-type schema
schema = Schema(type=["string", "null"])

# This would raise a ValidationError due to strict typing in the model
# Schema(type=True)

Polymorphism with Discriminators

To support complex data structures where a single field determines the specific subtype of an object, the project implements the Discriminator class. This is essential for handling Union types in a way that client generators can understand.

The Discriminator object names the property whose value selects the subtype and, optionally, maps each expected value to a schema name or reference:

class Discriminator(BaseModel):
    propertyName: str
    mapping: dict[str, str] | None = None

In practice, when a Pydantic model uses a discriminator field, FastAPI's internal utilities map that configuration to this Discriminator model in the final OpenAPI output. This allows the API to explicitly state, for example, that if the type field is "car", the rest of the object follows the Car schema.
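To make the lookup concrete, here is a minimal sketch of how a consumer might use a discriminator to pick a schema for a payload. It works on plain dictionaries rather than FastAPI's models; select_schema_ref and the Car, Truck, and Bike names are hypothetical, chosen only for illustration.

```python
from typing import Any


def select_schema_ref(payload: dict[str, Any], discriminator: dict[str, Any]) -> str:
    """Return the schema reference that the discriminator selects for `payload`."""
    property_name = discriminator["propertyName"]
    # A KeyError here means the payload lacks the discriminating property.
    value = payload[property_name]
    mapping = discriminator.get("mapping") or {}
    # Without an explicit mapping entry, OpenAPI treats the value as a schema name.
    return mapping.get(value, f"#/components/schemas/{value}")


discriminator = {
    "propertyName": "type",
    "mapping": {
        "car": "#/components/schemas/Car",
        "truck": "#/components/schemas/Truck",
    },
}

car_ref = select_schema_ref({"type": "car", "doors": 4}, discriminator)
implicit_ref = select_schema_ref({"type": "Bike"}, discriminator)
```

The second call shows the implicit fallback: a value with no mapping entry is read as the name of a schema under components/schemas.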

XML Serialization Rules

While JSON is the primary format, the Schema object includes an xml field that references the XML class. This allows for fine-grained control over how data is serialized when the Content-Type is application/xml.

class XML(BaseModelWithConfig):
    name: str | None = None
    namespace: str | None = None
    prefix: str | None = None
    attribute: bool | None = None
    wrapped: bool | None = None

This model addresses the structural differences between JSON and XML. For instance, the attribute field determines if a value should be rendered as an XML attribute rather than an element, and wrapped determines if array items should be enclosed in a parent element.
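The effect of these hints can be sketched with the standard library. The renderer below is a simplified illustration, not part of FastAPI: to_xml is a hypothetical helper that reads schemas as plain dictionaries and handles only name, attribute, and wrapped (namespace and prefix are ignored).

```python
import xml.etree.ElementTree as ET
from typing import Any


def to_xml(name: str, value: Any, schema: dict[str, Any]) -> ET.Element:
    """Render `value` as an element, following the `xml` hints in `schema`."""
    element = ET.Element(schema.get("xml", {}).get("name", name))
    if schema.get("type") != "object":
        element.text = str(value)
        return element
    for prop, prop_schema in schema.get("properties", {}).items():
        if prop not in value:
            continue
        hints = prop_schema.get("xml", {})
        prop_name = hints.get("name", prop)
        if hints.get("attribute"):
            # attribute=True: the value becomes an attribute of the parent.
            element.set(prop_name, str(value[prop]))
        elif prop_schema.get("type") == "array":
            # wrapped=True: items sit inside one enclosing element;
            # otherwise they are appended directly to the parent.
            container = ET.SubElement(element, prop_name) if hints.get("wrapped") else element
            for item in value[prop]:
                container.append(to_xml(prop, item, prop_schema.get("items", {})))
        else:
            element.append(to_xml(prop, value[prop], prop_schema))
    return element


book_schema = {
    "type": "object",
    "xml": {"name": "book"},
    "properties": {
        "id": {"type": "integer", "xml": {"attribute": True}},
        "title": {"type": "string"},
        "tags": {
            "type": "array",
            "xml": {"wrapped": True},
            "items": {"type": "string", "xml": {"name": "tag"}},
        },
    },
}
book = {"id": 7, "title": "Dune", "tags": ["scifi", "classic"]}
rendered = ET.tostring(to_xml("Book", book, book_schema), encoding="unicode")
# rendered == '<book id="7"><title>Dune</title><tags><tag>scifi</tag><tag>classic</tag></tags></book>'
```

Dropping wrapped from the tags schema would place the two tag elements directly under book, with no enclosing tags element.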

Reusability and References

To prevent duplication and manage complexity in large API definitions, the Reference class is used to implement the $ref mechanism.

class Reference(BaseModel):
    ref: str = Field(alias="$ref")

The Schema class itself also contains a ref field. The distinction is that a Reference object stands in for an entire component and, in this module, declares nothing but $ref, whereas the ref field inside a Schema object is the JSON Schema 2020-12 $ref keyword, which may appear alongside other keywords in the same schema. (Dynamic referencing in JSON Schema 2020-12 is handled by the separate $dynamicRef keyword.)
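For readers unfamiliar with how a $ref string is followed, the sketch below resolves a local reference against a document held as a plain dictionary. resolve_local_ref is a hypothetical helper, not a FastAPI function, and it handles only same-document references that start with "#/".

```python
from typing import Any


def resolve_local_ref(document: dict[str, Any], ref: str) -> Any:
    """Follow a same-document reference such as '#/components/schemas/Car'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled in this sketch: {ref!r}")
    node: Any = document
    for token in ref[2:].split("/"):
        # Undo JSON Pointer escaping: '~1' stands for '/', '~0' for '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


document = {
    "components": {
        "schemas": {
            "Car": {
                "type": "object",
                "properties": {"doors": {"type": "integer"}},
            }
        }
    }
}

# A property that points at the shared Car component instead of repeating it.
vehicle_property = {"$ref": "#/components/schemas/Car"}
resolved = resolve_local_ref(document, vehicle_property["$ref"])
```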

Tradeoffs and Evolution

The implementation reflects the evolution of the OpenAPI specification. A notable example is the handling of examples. In OpenAPI 3.0.x, a single example field was used. OpenAPI 3.1.0 shifted to the JSON Schema examples (plural) field, which is an array.

The Schema class maintains both for backward compatibility, but marks the singular example as deprecated:

    examples: list[Any] | None = None
    example: Annotated[
        Any | None,
        typing_deprecated(
            "Deprecated in OpenAPI 3.1.0 that now uses JSON Schema 2020-12, "
            "although still supported. Use examples instead."
        ),
    ] = None

Keeping both fields lets schemas written against OpenAPI 3.0.x continue to load unchanged, while the deprecation marker steers new code toward examples.
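One way to picture the migration is a small helper that rewrites a 3.0-style schema dictionary into the 3.1 form. upgrade_examples is a hypothetical function for illustration only; it operates on plain dictionaries and is not something FastAPI provides.

```python
from typing import Any


def upgrade_examples(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` with a singular `example` moved into `examples`."""
    upgraded = dict(schema)
    if "example" in upgraded:
        single = upgraded.pop("example")
        # Keep an existing `examples` list if there is one; otherwise wrap the value.
        upgraded.setdefault("examples", [single])
    return upgraded


legacy = {"type": "string", "example": "alice"}
modern = upgrade_examples(legacy)
```

The input dictionary is left untouched, so the helper can be applied to a loaded document without mutating the original.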