Designing Scalable APIs with Domain Aggregates
API Boundaries with Aggregates
The aggregate pattern in Domain-Driven Design (DDD) groups multiple related domain objects under a single unit called an aggregate. At the center of this group is the aggregate root, a single entity that acts as the only externally facing entry point to the aggregate, which lets it enforce consistency and encapsulate business rules. Think of it as the interface layer for any external consumers.
The core building blocks that make up an aggregate include:
- Entities - Objects with a distinct identity, e.g. Customer, Order or Invoice are core entities represented with their own identifiers.
- Value objects - Objects defined by their attributes rather than an identity, e.g. a customer’s Address or Phone Number would be value objects, as two addresses with the same data would be considered equal.
- Invariants - Business rules which govern the state of the aggregate’s entities & value objects. These rules are enforced by the aggregate root whenever changes occur, which keeps the domain model consistent & meaningful. E.g. an invariant for the “Customer” aggregate might be “A customer’s profile must always include an email address”.
- Domain events - Events published by the aggregate signalling that something important has happened within the domain. Examples include OrderPlaced or AddressChanged.
The aggregate provides a transactional boundary around these objects, which means any change to an aggregate’s internal state should be committed as a single atomic operation. As an example, you couldn’t update a customer’s email address without first loading the customer it belongs to: the change needs to consider the aggregate as a whole rather than the email address in isolation.
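To make the building blocks concrete, here is a minimal sketch in Python. The class and method names (Customer, Address, EmailChanged, change_email) are illustrative assumptions rather than part of any particular framework; the invariant is the email rule mentioned above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Address:
    """Value object: equality is based on attributes, not identity."""
    street: str
    city: str


@dataclass(frozen=True)
class EmailChanged:
    """Domain event recorded when the customer's email address changes."""
    customer_id: str
    new_email: str


@dataclass
class Customer:
    """Aggregate root: the only entry point for changing customer state."""
    customer_id: str
    email: str
    address: Address
    events: list = field(default_factory=list)

    def __post_init__(self):
        self._check_email(self.email)

    @staticmethod
    def _check_email(email: str) -> None:
        # Invariant: a customer's profile must always include an email address.
        if not email or not email.strip():
            raise ValueError("a customer must always have an email address")

    def change_email(self, new_email: str) -> None:
        # Validate first, so a rejected change leaves the aggregate untouched.
        self._check_email(new_email)
        self.email = new_email
        self.events.append(EmailChanged(self.customer_id, new_email))


customer = Customer("customer-1", "jane@example.com", Address("1 High St", "Leeds"))
customer.change_email("jane.doe@example.com")
```

Because every change goes through the root, the invariant is checked in one place and the recorded events describe exactly what changed.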
We can apply these principles when designing our API resources & payloads. Let’s look at an example of an API for a “Customer Aggregate” with a value object (an email address) & a sub-entity (a payment method).
Our API endpoints could look something like the below:
/customers/{customerId}/email-address
/customers/{customerId}/payment-methods/{paymentMethodId}
The aggregate design pattern starts to give us a clear logical boundary for the operations around our customer resource. From an implementation perspective, this gives us a structure for our microservice & the data models it would be responsible for. However, these boundaries can grow in complexity if not managed properly upfront. For our customer aggregate, let’s expand its operations to also manage the customer’s orders & order items.
We’ve now added two more endpoints to our API:
/customers/{customerId}/email-address
/customers/{customerId}/payment-methods/{paymentMethodId}
/customers/{customerId}/orders/{orderId}
/customers/{customerId}/orders/{orderId}/items/{itemId}
We start to see some complexity in the API design and potential crossover with other bounded contexts in our APIs.
There’s no easy rule for identifying complexity in our aggregates, as it depends on the context we work in. Having orders as a sub-entity of a customer could be fine in our domain context. From an API design perspective, however, we could look at splitting orders into a separate resource as the complexity of the sub-entity increases.
We’ll explore a few ways to identify complexity in our aggregates.
Complex Transactional Boundaries
One approach to identifying complex aggregates is to analyse the aggregate’s relationships. Our customer entity has a one-to-many relationship with orders, so it might seem natural to place the order as a sub-entity of the customer. However, let’s revisit a core principle for aggregates:
Operations that need to be consistent and atomic are encapsulated within a single aggregate. Transactions should not cross aggregate boundaries to maintain consistency and support scalability.
By placing the order entity inside the customer aggregate, we start to blur the aggregate’s transactional boundaries. The order entity often has its own lifecycle and rules, transitioning between states such as from a placed status to a cancelled status. These transitions are independent of changes to the customer entity, yet in our current design any change to an order would have to be committed as part of the entire customer aggregate. For example:
PATCH /customers/{customerId}/orders/{orderId}/status
{ "id": "customer-1", "order": { "status": "cancelled" } }
As mentioned previously, there’s no hard rule against the above and the decision will depend entirely on the use case. However, we should consider what the above update would entail in our service’s code implementation:
- In the above example, are we required to update any aspect of the customer entity to update the order status?
- Does the current design provide any observability benefits?
- How complex do our transactions become with the above design?
- What impact does it have on the performance of the service?
- How coupled is our data model to the above design?
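One way to explore the first question is to sketch the order as its own aggregate and see whether a status change needs any customer state at all. The sketch below is a minimal illustration under assumptions: the Order class, the transition_to method and every status other than placed and cancelled are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical transition table, for illustration only.
ALLOWED_TRANSITIONS = {
    "placed": {"shipped", "cancelled"},
    "shipped": {"delivered"},
    "delivered": set(),
    "cancelled": set(),
}


@dataclass
class Order:
    """Order modelled as its own aggregate root with its own lifecycle."""
    order_id: str
    customer_id: str  # the customer is referenced by identity only
    status: str = "placed"

    def transition_to(self, new_status: str) -> None:
        # The order enforces its own lifecycle rules without loading the customer.
        if new_status not in ALLOWED_TRANSITIONS.get(self.status, set()):
            raise ValueError(f"cannot move order from {self.status} to {new_status}")
        self.status = new_status


order = Order("order-123", "customer-1")
order.transition_to("cancelled")
```

If a change like this never reads or writes customer data, that is a hint the order may not belong inside the customer’s transactional boundary.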
Next, we’ll look at a performance-related technical indicator that can help us identify complexity in our aggregate design.
The N + 1 Query Problem
A performance issue that can arise from overly complex aggregates is the N+1 query problem. The problem occurs when a system executes one query to retrieve a list of N parent records and then one additional query per parent record to fetch its child data, for a total of N + 1 queries.
Let’s explore this for our customer aggregate. An example response for querying all the customers and their orders might look as below:
GET /customers
[
  {
    "id": "customer-1",
    "name": "Jane Doe",
    "orders": [
      { "id": "order-123", "items": [ { "id": "item-1" }, { "id": "item-2" } ] },
      ... more orders
    ]
  }
  ... more customers
]
To get this response, here’s what’s happening:
- Query all customers, e.g. SELECT * FROM customers;
- For each customer, query their orders, e.g. SELECT * FROM orders WHERE customer_id = {customerId};
- For each order, query its order items, e.g. SELECT * FROM order_items WHERE order_id = {orderId};
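The three steps above can be reproduced with an in-memory SQLite database to make the query count visible. This is a sketch under assumptions: the table layout, the sample rows and the helper names (run, load_customers_naively, stats) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT);
    CREATE TABLE order_items (id TEXT PRIMARY KEY, order_id TEXT);
    INSERT INTO customers VALUES ('customer-1', 'Jane Doe'), ('customer-2', 'John Doe');
    INSERT INTO orders VALUES ('order-1', 'customer-1'), ('order-2', 'customer-1'),
                              ('order-3', 'customer-2');
    INSERT INTO order_items VALUES ('item-1', 'order-1'), ('item-2', 'order-1'),
                                   ('item-3', 'order-2'), ('item-4', 'order-3');
""")

stats = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, so each database round trip is visible."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def load_customers_naively():
    customers = []
    # One query for the list of customers.
    for cid, name in run("SELECT id, name FROM customers ORDER BY id"):
        orders = []
        # One extra query per customer.
        for (oid,) in run(
            "SELECT id FROM orders WHERE customer_id = ? ORDER BY id", (cid,)
        ):
            # One extra query per order.
            items = run(
                "SELECT id FROM order_items WHERE order_id = ? ORDER BY id", (oid,)
            )
            orders.append({"id": oid, "items": [{"id": iid} for (iid,) in items]})
        customers.append({"id": cid, "name": name, "orders": orders})
    return customers


result = load_customers_naively()
print(stats["queries"])  # 1 customer query + 2 order queries + 3 item queries = 6
```

With only two customers and three orders the loader already issues six queries; the count grows with every parent and child record added.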
Now you might be thinking: “But what if I only query a single customer, like /customers/{id}/orders?”
And you’d be right: querying a single customer and only top-level order details typically doesn’t cause a problem. However, the N+1 issue re-emerges as soon as you start needing nested child data (e.g., items, payments, shipments) for each order.
The deeper we nest our resources, the more impact we see on query performance & runtime complexity.
As you scale, the number of queries grows as O(N), or even O(N x M) with a second level of nesting, where N = number of parent records & M = number of child records per parent. This can quickly become a bottleneck in large systems.
Deeply nested queries are often a symptom of overly complex aggregate design & can be a good signal that we should redesign our aggregates to be more focused.
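Independently of any redesign, the query count itself can often be brought down by fetching the nested data in a single joined query and grouping the rows in memory. The sketch below assumes the same hypothetical tables as the steps above (customers, orders, order_items); the function name load_customers_joined is illustrative. It reduces round trips but does not remove the underlying design smell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT);
    CREATE TABLE order_items (id TEXT PRIMARY KEY, order_id TEXT);
    INSERT INTO customers VALUES ('customer-1', 'Jane Doe'), ('customer-2', 'John Doe');
    INSERT INTO orders VALUES ('order-1', 'customer-1'), ('order-2', 'customer-1');
    INSERT INTO order_items VALUES ('item-1', 'order-1'), ('item-2', 'order-1'),
                                   ('item-3', 'order-2');
""")


def load_customers_joined(connection):
    """Load customers, orders and items with one query, then group the rows."""
    rows = connection.execute("""
        SELECT c.id, c.name, o.id, i.id
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        LEFT JOIN order_items i ON i.order_id = o.id
        ORDER BY c.id, o.id, i.id
    """).fetchall()

    customers = {}
    for cid, name, oid, iid in rows:
        customer = customers.setdefault(cid, {"id": cid, "name": name, "orders": {}})
        if oid is None:
            continue  # customer without orders (LEFT JOIN yields NULLs)
        order = customer["orders"].setdefault(oid, {"id": oid, "items": []})
        if iid is not None:
            order["items"].append({"id": iid})

    # Convert the per-customer order lookup into a plain list for the response.
    return [{**c, "orders": list(c["orders"].values())} for c in customers.values()]


result = load_customers_joined(conn)
```

The joined result set repeats parent columns for every child row, so this trades query count for wider rows; whether that is worthwhile depends on the data volumes involved.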
Maintaining Entity Relation via HATEOAS
A better approach for our existing customer API is to split it into separate /customers & /orders resources.
However, as with all solutions, this comes with a tradeoff: we now have to maintain the integration between these resources as they evolve independently. Our consumers must now ensure the version compatibility of the APIs they are integrating with.
One approach which can help us navigate this complexity is HATEOAS.
HATEOAS (Hypermedia as the Engine of Application State) is a REST constraint where the API response includes hypermedia links to related resources. It lets clients navigate relationships between resources dynamically without hardcoding URLs.
Implementing HATEOAS gives us the flexibility to navigate the relationships & compatibility across our aggregates.
Let’s apply this to our customer & order aggregate APIs.
The API response for the customer aggregate now exposes references to compatible orders. This relieves our consumers of the burden of tracking versions whilst maintaining clear aggregate boundaries.
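A response with such links could be assembled along these lines. This is a minimal sketch: the function name customer_representation and its inputs are assumptions, and a real service would typically derive the URLs from its routing layer rather than from string templates.

```python
import json


def customer_representation(customer: dict, order_ids: list) -> dict:
    """Build a customer payload that links to orders instead of embedding them."""
    return {
        "id": customer["id"],
        "name": customer["name"],
        "_links": {
            "self": {"href": f"/customers/{customer['id']}"},
            # Orders live in their own aggregate, so only references are exposed.
            "orders": [{"href": f"/orders/{order_id}"} for order_id in order_ids],
        },
    }


body = customer_representation(
    {"id": "customer-1", "name": "Jane Doe"}, ["order-1", "order-2"]
)
print(json.dumps(body, indent=2))
```

The customer service only needs the order identifiers here, not the order data itself, which keeps the two aggregates loosely coupled.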
GET /customers/customer-1
{
  "id": "customer-1",
  "name": "Jane Doe",
  "_links": {
    "self": { "href": "/customers/customer-1" },
    "orders": [
      { "href": "/orders/order-1" },
      { "href": "/orders/order-2" }
    ]
  }
}
Conclusion
Identifying the correct boundaries for our APIs is extremely difficult, and the way to approach this will differ for each organisation. The aggregate pattern in Domain-Driven Design gives us a framework for identifying logical boundaries, which in turn affect how we design our APIs. We have primarily touched on how aggregates affect API design. However, there are many layers of implementation detail & data modelling which also play a part in effective design, and we will go into more detail on these in future articles.