Normalize Your Django REST Serializers

When dealing with models with nested relationships, it may initially make sense to serialize them in a nested format. However, you may soon discover that this has a couple of potential issues.

  1. This structure can result in a lot of duplication in the serialized data, especially for many-to-many relationships.
  2. Since those objects are nested, you don’t have them all in one place for easy referencing or updating. Working around this requires tedious iteration and transformation.

These issues are especially pronounced in JavaScript applications that use state-management libraries such as Redux. Redux applications work much better when your data is normalized. Normalization facilitates cleaner code, makes updating state simpler, and ensures that the fewest possible UI components are forced to re-render due to such updates.

When you are using an external API that you have no control over, you may be forced to normalize your data on the client. For this, you can use a tool such as normalizr. However, if you are creating your own Django REST API, you can normalize server-side and spare the frontend that extra complexity.

Understand the Problem

Consider an example blog application. Your models might look like this:

from django.db import models


class Blog(models.Model):
    name = models.CharField(max_length=50)

    def __str__(self):
        return self.name


class BlogPost(models.Model):
    title = models.CharField(max_length=100)
    body = models.CharField(max_length=200)
    author = models.ForeignKey('User', on_delete=models.CASCADE)
    blog = models.ForeignKey(
        'Blog', on_delete=models.CASCADE, related_name='posts')

    def __str__(self):
        return '{} - {}'.format(self.blog, self.title)


class User(models.Model):
    username = models.CharField(max_length=100)
    name = models.CharField(max_length=100)

    def __str__(self):
        return '{} - {}'.format(self.username, self.name)


class Comment(models.Model):
    author = models.ForeignKey('User', on_delete=models.CASCADE)
    comment = models.CharField(max_length=200)
    post = models.ForeignKey(
        'BlogPost', on_delete=models.CASCADE, related_name='comments')

    def __str__(self):
        return '{} - {}'.format(self.post, self.comment)

In order to include all relevant data, your serializers may have a nested structure like this:

from rest_framework import serializers


class UserSerializer(serializers.ModelSerializer):

    class Meta:
        model = User
        fields = ('id', 'name', 'username',)


class CommentSerializer(serializers.ModelSerializer):
    author = UserSerializer()

    class Meta:
        model = Comment
        fields = ('id', 'author', 'comment', 'post')


class BlogPostSerializer(serializers.ModelSerializer):
    author = UserSerializer()
    comments = CommentSerializer(many=True)

    class Meta:
        model = BlogPost
        fields = ('id', 'author', 'body', 'comments', 'title',)


class BlogSerializer(serializers.ModelSerializer):
    posts = BlogPostSerializer(many=True)

    class Meta:
        model = Blog
        fields = ('id', 'name', 'posts',)

Finally, an example response from the blogs endpoint looks like this:

[
    {
        "id": 1,
        "name": "Stormlight Archives Blog",
        "posts": [
            {
                "id": 1,
                "author": {
                    "id": 1,
                    "name": "Dalinar",
                    "username": "user1"
                },
                "body": ".........",
                "comments": [
                    {
                        "id": 1,
                        "author": {
                            "id": 2,
                            "name": "Kaladin",
                            "username": "user2"
                        },
                        "comment": ".........",
                        "post": 1
                    },
                    {
                        "id": 2,
                        "author": {
                            "id": 3,
                            "name": "Shallan",
                            "username": "user3"
                        },
                        "comment": ".........",
                        "post": 1
                    },
                    {
                        "id": 3,
                        "author": {
                            "id": 1,
                            "name": "Dalinar",
                            "username": "user1"
                        },
                        "comment": ".........",
                        "post": 1
                    }
                ],
                "title": "Dalinar's Blog Post"
            },
            {
                "id": 2,
                "author": {
                    "id": 3,
                    "name": "Shallan",
                    "username": "user3"
                },
                "body": ".........",
                "comments": [
                    {
                        "id": 4,
                        "author": {
                            "id": 2,
                            "name": "Kaladin",
                            "username": "user2"
                        },
                        "comment": ".........",
                        "post": 2
                    },
                    {
                        "id": 5,
                        "author": {
                            "id": 1,
                            "name": "Dalinar",
                            "username": "user1"
                        },
                        "comment": ".........",
                        "post": 2
                    }
                ],
                "title": "Shallan's Blog Post"
            }
        ]
    }
]

Notice how author data is duplicated unnecessarily, and how comments are deeply nested and hard to access. These are the problems we intend to fix.
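
To make the cost concrete, here is a small sketch in plain Python that walks a trimmed-down copy of the nested response above. The nested_blogs literal and the two helper functions are illustrative assumptions, not part of the example project.

```python
from collections import Counter

# Trimmed-down copy of the nested response (bodies and usernames omitted).
nested_blogs = [{
    "id": 1,
    "posts": [
        {"id": 1, "author": {"id": 1, "name": "Dalinar"}, "comments": [
            {"id": 1, "author": {"id": 2, "name": "Kaladin"}},
            {"id": 2, "author": {"id": 3, "name": "Shallan"}},
            {"id": 3, "author": {"id": 1, "name": "Dalinar"}},
        ]},
        {"id": 2, "author": {"id": 3, "name": "Shallan"}, "comments": [
            {"id": 4, "author": {"id": 2, "name": "Kaladin"}},
            {"id": 5, "author": {"id": 1, "name": "Dalinar"}},
        ]},
    ],
}]


def all_comments(blogs):
    """Reaching every comment takes three levels of iteration."""
    return [
        comment
        for blog in blogs
        for post in blog["posts"]
        for comment in post["comments"]
    ]


def author_copies(blogs):
    """Count how many times each author object appears in the payload."""
    counts = Counter()
    for blog in blogs:
        for post in blog["posts"]:
            counts[post["author"]["id"]] += 1
            for comment in post["comments"]:
                counts[comment["author"]["id"]] += 1
    return counts


comment_ids = [comment["id"] for comment in all_comments(nested_blogs)]
copies = author_copies(nested_blogs)
print(comment_ids)
print(dict(copies))
```

In this sample, three users account for seven author objects, and every copy would have to be found and patched if a user changed their name.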

Make Django REST Do the Work

In order to normalize our data and make it as easy to work with as possible, we need to do three things.

  1. Remove the nested data.
  2. Aggregate the removed data into their own sections.
  3. Make the new sections allow for easy lookups.
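
Before doing this in Django REST framework, it may help to see the three steps applied to a plain dictionary. The sketch below is only an illustration of the target shape: normalize_blog and the nested sample are hypothetical names, and this is not how the serializers in the rest of the article do the work.

```python
def normalize_blog(blog, key="id"):
    """Return a flat copy of one nested blog dict, with entities keyed by id."""
    authors, comments, posts = {}, {}, {}
    for post in blog["posts"]:
        # Step 2: collect every author once, whatever depth it was found at.
        authors[post["author"][key]] = post["author"]
        for comment in post["comments"]:
            authors[comment["author"][key]] = comment["author"]
            # Step 1: replace the nested author object with its id.
            comments[comment[key]] = {
                **comment,
                "author": comment["author"][key],
            }
        posts[post[key]] = {
            **post,
            "author": post["author"][key],
            "comments": [comment[key] for comment in post["comments"]],
        }
    # Step 3: each section is a dict, so lookups by id need no iteration.
    return {
        "id": blog["id"],
        "name": blog["name"],
        "authors": authors,
        "comments": comments,
        "posts": posts,
    }


nested = {
    "id": 1,
    "name": "Stormlight Archives Blog",
    "posts": [{
        "id": 1,
        "title": "Dalinar's Blog Post",
        "author": {"id": 1, "name": "Dalinar"},
        "comments": [
            {"id": 1, "post": 1, "author": {"id": 2, "name": "Kaladin"}},
            {"id": 3, "post": 1, "author": {"id": 1, "name": "Dalinar"}},
        ],
    }],
}

flat = normalize_blog(nested)
first_comment = flat["comments"][flat["posts"][1]["comments"][0]]
print(flat["authors"][first_comment["author"]]["name"])
```

The input dictionary is left untouched; the function builds new dictionaries for every entity it rewrites.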

Remove the nested data

This is the easy part. Remove any nested serializers from your serializers:

class UserSerializer(serializers.ModelSerializer):

    class Meta:
        model = User
        fields = ('id', 'name', 'username',)


class CommentSerializer(serializers.ModelSerializer):

    class Meta:
        model = Comment
        fields = ('id', 'author', 'comment', 'post')


class BlogPostSerializer(serializers.ModelSerializer):

    class Meta:
        model = BlogPost
        fields = ('id', 'author', 'body', 'comments', 'title',)


class BlogSerializer(serializers.ModelSerializer):

    class Meta:
        model = Blog
        fields = ('id', 'name', 'posts',)

With the nested serializers gone, each related field is represented by its id rather than by a full nested object.

Aggregate the removed data into their own sections

Now that we’ve removed the nested data, we need to gather the same data and place it at the root of the response, for easy access.

For our example, the BlogSerializer needs a list of authors and a list of comments. This requires a couple of queries, exposed through Django REST framework’s SerializerMethodField. One query collects every comment on the blog’s posts; the other collects every user who wrote one of those posts or comments. Gathering them in one place, at the root of the response, reduces duplication and allows easy reference.

from django.db.models import Q


class BlogSerializer(serializers.ModelSerializer):
    posts = BlogPostSerializer(many=True)
    comments = serializers.SerializerMethodField()
    authors = serializers.SerializerMethodField()

    def get_comments(self, blog):
        # Every comment on every post of this blog.
        comments = Comment.objects.filter(
            post__blog=blog,
        )
        return CommentSerializer(
            comments,
            many=True,
            context={'request': self.context['request']}
        ).data

    def get_authors(self, blog):
        comments = Comment.objects.filter(
            post__blog=blog,
        )
        # The models above set no related_name on their author foreign keys,
        # so the reverse lookups use the default names: comment and blogpost.
        authors = User.objects.filter(
            Q(comment__in=comments) | Q(blogpost__in=blog.posts.all()),
        )
        return UserSerializer(
            authors,
            many=True,
            context={'request': self.context['request']},
        ).data

    class Meta:
        model = Blog
        fields = ('id', 'authors', 'comments', 'name', 'posts',)

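After this change, the authors list can still repeat a user, because the OR filter joins across both relations and returns one row per match. Calling .distinct() on the queryset removes the repeats, and the dictionary step in the next section collapses them as well. Our API response now looks like this: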

[
    {
        "id": 1,
        "authors": [
            {
                "id": 1,
                "name": "Dalinar",
                "username": "user1"
            },
            {
                "id": 1,
                "name": "Dalinar",
                "username": "user1"
            },
            {
                "id": 2,
                "name": "Kaladin",
                "username": "user2"
            },
            {
                "id": 2,
                "name": "Kaladin",
                "username": "user2"
            },
            {
                "id": 3,
                "name": "Shallan",
                "username": "user3"
            }
        ],
        "comments": [
            {
                "id": 1,
                "author": 2,
                "comment": ".........",
                "post": 1
            },
            {
                "id": 2,
                "author": 3,
                "comment": ".........",
                "post": 1
            },
            {
                "id": 3,
                "author": 1,
                "comment": ".........",
                "post": 1
            },
            {
                "id": 4,
                "author": 2,
                "comment": ".........",
                "post": 2
            },
            {
                "id": 5,
                "author": 1,
                "comment": ".........",
                "post": 2
            }
        ],
        "name": "Stormlight Archives Blog",
        "posts": [
            {
                "id": 1,
                "author": 1,
                "body": ".........",
                "comments": [
                    1,
                    2,
                    3
                ],
                "title": "Dalinar's Blog Post"
            },
            {
                "id": 2,
                "author": 3,
                "body": ".........",
                "comments": [
                    4,
                    5
                ],
                "title": "Shallan's Blog Post"
            }
        ]
    }
]

Allow for easy lookups

Our API response is already looking great. It’s more concise and organized, and it doesn’t have deeply nested or duplicated data. The last step is to make our entities easier to reference. We will do this by using dictionaries for each section instead of lists. These dictionaries will have the id as the key and the actual entity as the value.

We will accomplish this by creating a subclass of ListSerializer.

from rest_framework.utils.serializer_helpers import ReturnDict


class DictSerializer(serializers.ListSerializer):
    """
    Overrides default ListSerializer to return a dict with a custom field from
    each item as the key. Makes it easier to normalize the data so that there
    is minimal nesting. dict_key defaults to 'id' but can be overridden.
    """
    dict_key = 'id'

    @property
    def data(self):
        """
        Overridden to return a ReturnDict instead of a ReturnList.
        """
        # Deliberately skip ListSerializer.data, which wraps the result in a
        # ReturnList, and use the base serializer's implementation instead.
        ret = super(serializers.ListSerializer, self).data
        return ReturnDict(ret, serializer=self)

    def to_representation(self, data):
        """
        Converts the data from a list to a dictionary.
        """
        items = super(DictSerializer, self).to_representation(data)
        return {item[self.dict_key]: item for item in items}

This subclass needs to be plugged into our serializers with the Meta.list_serializer_class attribute.

class UserSerializer(serializers.ModelSerializer):

    class Meta:
        model = User
        fields = ('id', 'name', 'username',)
        list_serializer_class = DictSerializer


class CommentSerializer(serializers.ModelSerializer):

    class Meta:
        model = Comment
        fields = ('id', 'author', 'comment', 'post')
        list_serializer_class = DictSerializer


class BlogPostSerializer(serializers.ModelSerializer):

    class Meta:
        model = BlogPost
        fields = ('id', 'author', 'body', 'comments', 'title',)
        list_serializer_class = DictSerializer
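
The conversion that DictSerializer.to_representation performs can be tried in isolation. The sketch below uses a hypothetical index_by helper on plain dictionaries, with no Django involved, to show two side effects worth knowing about: repeated items collapse into one entry, and integer keys become strings once the result is encoded as JSON.

```python
import json


def index_by(items, dict_key="id"):
    """Turn a list of dicts into a dict keyed by each item's dict_key value."""
    return {item[dict_key]: item for item in items}


authors = [
    {"id": 1, "name": "Dalinar"},
    {"id": 1, "name": "Dalinar"},  # repeated row, as a joined query can produce
    {"id": 2, "name": "Kaladin"},
]

by_id = index_by(authors)
by_name = index_by(authors, dict_key="name")

# JSON object keys are always strings, so integer ids are written as "1", "2".
encoded = json.dumps(by_id)
decoded = json.loads(encoded)
print(list(by_id), list(decoded))
```

This is why the keys in the response are quoted even though the id values inside each entity are numbers.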

Finally, our API response now looks like this:

[
    {
        "id": 1,
        "authors": {
            "1": {
                "id": 1,
                "name": "Dalinar",
                "username": "user1"
            },
            "2": {
                "id": 2,
                "name": "Kaladin",
                "username": "user2"
            },
            "3": {
                "id": 3,
                "name": "Shallan",
                "username": "user3"
            }
        },
        "comments": {
            "1": {
                "id": 1,
                "author": 2,
                "comment": ".........",
                "post": 1
            },
            "2": {
                "id": 2,
                "author": 3,
                "comment": ".........",
                "post": 1
            },
            "3": {
                "id": 3,
                "author": 1,
                "comment": ".........",
                "post": 1
            },
            "4": {
                "id": 4,
                "author": 2,
                "comment": ".........",
                "post": 2
            },
            "5": {
                "id": 5,
                "author": 1,
                "comment": ".........",
                "post": 2
            }
        },
        "name": "Stormlight Archives Blog",
        "posts": {
            "1": {
                "id": 1,
                "author": 1,
                "body": ".........",
                "comments": [
                    1,
                    2,
                    3
                ],
                "title": "Dalinar's Blog Post"
            },
            "2": {
                "id": 2,
                "author": 3,
                "body": ".........",
                "comments": [
                    4,
                    5
                ],
                "title": "Shallan's Blog Post"
            }
        }
    }
]

Benefits

Our API response is now structured much like a database. Each entity has its own “table” and relationships are represented by foreign keys. We can now take those foreign keys and do a quick lookup on the corresponding table. The following pseudocode illustrates the types of lookups you can do with this structure.

blog = response[0]
posts = blog.posts
authors = blog.authors
comments = blog.comments
post = posts['1']
post_author = authors[post.author]
comment = comments[post.comments[0]]
comment_author = authors[comment.author]
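
For a runnable version of the same lookups, here is a sketch in Python against a trimmed-down copy of the final response. The raw string and the lookup helper are assumptions made for illustration. The one wrinkle is that ids inside entities are integers while the section keys are strings, so the helper converts before indexing.

```python
import json

# Trimmed-down copy of the normalized response, as it would arrive over HTTP.
raw = """
[{
  "id": 1,
  "authors": {
    "1": {"id": 1, "name": "Dalinar"},
    "2": {"id": 2, "name": "Kaladin"}
  },
  "comments": {
    "1": {"id": 1, "author": 2, "post": 1},
    "3": {"id": 3, "author": 1, "post": 1}
  },
  "posts": {
    "1": {"id": 1, "author": 1, "comments": [1, 3]}
  }
}]
"""
blog = json.loads(raw)[0]


def lookup(section, entity_id):
    """Fetch one entity; ids are ints in the payload but keys are strings."""
    return blog[section][str(entity_id)]


post = lookup("posts", 1)
post_author = lookup("authors", post["author"])
first_comment = lookup("comments", post["comments"][0])
comment_author = lookup("authors", first_comment["author"])
print(post_author["name"], comment_author["name"])
```

Each lookup is a single dictionary access, with no searching through lists.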

With this structure, the workflow in an environment with something like Redux becomes much simpler. Updates can happen at a single place, without concern for duplication, and it will be easier to ensure that only UI components that need updating are updated.
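
The single-place update can be sketched in Python as well. The rename_author function below is a hypothetical stand-in for a Redux-style reducer: it returns a new state rather than mutating the old one, and reuses every section it did not touch. The assumption here is that the UI layer compares object identity to decide what to re-render, which is the usual arrangement with Redux.

```python
def rename_author(state, author_id, new_name):
    """Return a new state with one author renamed; other entries are reused."""
    author = state["authors"][author_id]
    return {
        **state,
        "authors": {
            **state["authors"],
            author_id: {**author, "name": new_name},
        },
    }


state = {
    "authors": {
        "1": {"id": 1, "name": "Dalinar"},
        "2": {"id": 2, "name": "Kaladin"},
    },
    "comments": {
        "3": {"id": 3, "author": 1},
        "5": {"id": 5, "author": 1},
    },
}

new_state = rename_author(state, "1", "Dalinar Kholin")

# Every comment by author 1 now resolves to the new name through its id.
names = [
    new_state["authors"][str(comment["author"])]["name"]
    for comment in new_state["comments"].values()
]
print(names)
```

Only the authors section and the renamed entry are new objects; the comments section and the other author are the same objects as before, so anything that depends only on them has no reason to update.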

Closing Remarks

Normalized data is easier to work with and is more compact. By making Django REST do this normalization for you, you avoid having to write troublesome transformations on the client side.

See the full example code on GitHub.
