How to Optimize Your Django REST Viewsets

The combination of Django and the Django REST framework is powerful. With just three classes (a model, a viewset, and a serializer), you can have a basic CRUD endpoint up and running. Although it is easy to get set up, it is just as easy to end up with a view that makes hundreds of unnecessary database queries. Because each query is a separate round trip to the database, and therefore relatively slow, we want to avoid as many of them as possible. To do this, we will follow Tip #4 from my Django ORM Optimization Tips post:

4. Use select_related() and prefetch_related() when you will need foreign-key/reverse related objects.

Example

The example we will be working with is based on a blog site.

Here are the models:

from django.db import models


class BlogPost(models.Model):
    title = models.CharField(max_length=100)
    body = models.CharField(max_length=200)
    author = models.ForeignKey(
        'User', on_delete=models.CASCADE, related_name='posts')

    def __str__(self):
        return '{} - {}'.format(self.author.name, self.title)


class User(models.Model):
    username = models.CharField(max_length=100)
    name = models.CharField(max_length=100)

    def __str__(self):
        return '{} - {}'.format(self.username, self.name)


class Comment(models.Model):
    author = models.ForeignKey(
        'User', on_delete=models.CASCADE, related_name='comments')
    comment = models.CharField(max_length=200)
    post = models.ForeignKey(
        'BlogPost', on_delete=models.CASCADE, related_name='comments')

    def __str__(self):
        return '{} - {}'.format(self.post, self.comment)

Here are the serializers:

from rest_framework import serializers

from .models import BlogPost, Comment, User


class UserSerializer(serializers.ModelSerializer):

    class Meta:
        model = User
        fields = ('id', 'name', 'username',)


class CommentSerializer(serializers.ModelSerializer):
    author = UserSerializer()

    class Meta:
        model = Comment
        fields = ('id', 'author', 'comment', 'post')


class BlogPostSerializer(serializers.ModelSerializer):
    author = UserSerializer()
    comments = CommentSerializer(many=True)

    class Meta:
        model = BlogPost
        fields = ('id', 'author', 'body', 'comments', 'title',)

Here is the viewset:

from rest_framework import viewsets

from .models import BlogPost
from .serializers import BlogPostSerializer


class BlogPostViewSet(viewsets.ModelViewSet):
    queryset = BlogPost.objects.all()
    serializer_class = BlogPostSerializer

And finally, an example response from our blog posts endpoint looks like this:

[
    {
        "id": 1,
        "author": {
            "id": 1,
            "name": "Dalinar",
            "username": "user1"
        },
        "body": ".........",
        "comments": [
            {
                "id": 1,
                "author": {
                    "id": 2,
                    "name": "Kaladin",
                    "username": "user2"
                },
                "comment": ".........",
                "post": 1
            },
            {
                "id": 2,
                "author": {
                    "id": 3,
                    "name": "Shallan",
                    "username": "user3"
                },
                "comment": ".........",
                "post": 1
            },
            {
                "id": 3,
                "author": {
                    "id": 1,
                    "name": "Dalinar",
                    "username": "user1"
                },
                "comment": ".........",
                "post": 1
            }
        ],
        "title": "Dalinar's Blog Post"
    },
    {
        "id": 2,
        "author": {
            "id": 3,
            "name": "Shallan",
            "username": "user3"
        },
        "body": ".........",
        "comments": [
            {
                "id": 4,
                "author": {
                    "id": 2,
                    "name": "Kaladin",
                    "username": "user2"
                },
                "comment": ".........",
                "post": 2
            },
            {
                "id": 5,
                "author": {
                    "id": 1,
                    "name": "Dalinar",
                    "username": "user1"
                },
                "comment": ".........",
                "post": 2
            }
        ],
        "title": "Shallan's Blog Post"
    }
]

How to Profile

In order to optimize anything, it is important to know how to profile it. There are many different ways to profile Django views, but I like to start by adding a print statement that reports the number of database queries that have run so far.

from rest_framework import viewsets

from .models import BlogPost
from .serializers import BlogPostSerializer


class BlogPostViewSet(viewsets.ModelViewSet):
    queryset = BlogPost.objects.all()
    serializer_class = BlogPostSerializer

    def dispatch(self, *args, **kwargs):
        response = super().dispatch(*args, **kwargs)

        # For debugging purposes only.
        from django.db import connection
        print('# of Queries: {}'.format(len(connection.queries)))

        return response

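Here, we have overridden the dispatch() method, which is the entry point for every class-based Django view, DRF viewsets included. The print statement reports the number of database queries that have run so far in the current request, so you want to place it after the code you are profiling. You can even place one before and one after a block of code and take the difference in order to narrow down the culprit. Note that Django only records queries in connection.queries when DEBUG is True, so this technique is for development only.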
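The before-and-after technique can be wrapped in a small helper. The following is a minimal sketch rather than part of the original example: query_delta and its parameters are hypothetical names, and it takes a zero-argument callable so that it can run here without a Django project. In a view you would pass lambda: len(connection.queries) instead of the stand-in list.

```python
from contextlib import contextmanager


@contextmanager
def query_delta(get_count, label):
    """Report how many queries ran inside the ``with`` block.

    ``get_count`` is any zero-argument callable that returns the number of
    queries executed so far. In a Django view (with DEBUG = True) that would
    be ``lambda: len(connection.queries)``.
    """
    before = get_count()
    stats = {}
    try:
        yield stats
    finally:
        # Take the difference between the counts after and before the block.
        stats['queries'] = get_count() - before
        print('{}: {} queries'.format(label, stats['queries']))


# Stand-in for Django's query log so that the sketch runs on its own.
fake_log = ['SELECT 1']

with query_delta(lambda: len(fake_log), 'serialization') as stats:
    fake_log.append('SELECT 2')
    fake_log.append('SELECT 3')
```

Passing a callable rather than the list itself matters here: the count is read twice, once on entry and once on exit, so it has to be fetched fresh each time.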

Sending a request to /blog-posts/ produces the following in the console:

# of Queries: 10
[22/Dec/2018 17:53:45] "GET /blog-posts/ HTTP/1.1" 200 11732

It currently takes ten database queries to serialize two blog posts. We can improve on this.
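To see where those ten queries plausibly come from, the sketch below tallies them for the example response above (two posts, with three and two comments). The breakdown is inferred from the nested serializers rather than read off the console output, and explain_queries is a hypothetical helper written for illustration.

```python
def explain_queries(comments_per_post):
    """List the queries an unoptimized nested serializer would trigger.

    ``comments_per_post`` holds the number of comments on each blog post.
    """
    queries = ['SELECT all blog posts']
    for post_number, comment_count in enumerate(comments_per_post, start=1):
        # BlogPostSerializer.author: one lookup per post.
        queries.append('SELECT author of post {}'.format(post_number))
        # BlogPostSerializer.comments: one reverse lookup per post.
        queries.append('SELECT comments of post {}'.format(post_number))
        # CommentSerializer.author: one lookup per comment.
        for comment_number in range(1, comment_count + 1):
            queries.append('SELECT author of comment {} on post {}'.format(
                comment_number, post_number))
    return queries


# The example response has two posts with three and two comments.
example = explain_queries([3, 2])
for line in example:
    print(line)
print('Total: {}'.format(len(example)))
```

Under these assumptions the total is 1 + 2 + 2 + 5 = 10, which matches the number printed by the view.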

Use select_related() for foreign key relationships

The first thing to look for is foreign key relationships that are being serialized. Let’s take another look at the BlogPost model and serializer.

class BlogPost(models.Model):
    title = models.CharField(max_length=100)
    body = models.CharField(max_length=200)
    author = models.ForeignKey(
        'User', on_delete=models.CASCADE, related_name='posts')

    def __str__(self):
        return '{} - {}'.format(self.author.name, self.title)


class BlogPostSerializer(serializers.ModelSerializer):
    author = UserSerializer()
    comments = CommentSerializer(many=True)

    class Meta:
        model = BlogPost
        fields = ('id', 'author', 'body', 'comments', 'title',)

Now we can see that BlogPost has a foreign key relationship, author, to User, and that it is included in the serializer. By default, Django makes a separate database query each time an author needs to be serialized. To avoid this, we can call select_related() on the viewset’s queryset, which tells Django to fetch the author as part of the primary BlogPost query by using a SQL join.

from rest_framework import viewsets

from .models import BlogPost
from .serializers import BlogPostSerializer


class BlogPostViewSet(viewsets.ModelViewSet):
    queryset = (
        BlogPost.objects
        .select_related(
            'author',
        )
    )
    serializer_class = BlogPostSerializer

    def dispatch(self, *args, **kwargs):
        response = super().dispatch(*args, **kwargs)

        # For debugging purposes only.
        from django.db import connection
        print('# of Queries: {}'.format(len(connection.queries)))

        return response

Sending another request to our endpoint produces this:

# of Queries: 8
[22/Dec/2018 17:57:11] "GET /blog-posts/ HTTP/1.1" 200 11732

That removed only one query per blog post, taking us from ten queries to eight. We still have some work to do.
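What select_related() changes at the SQL level can be sketched with the standard library's sqlite3 module. This is an illustration of the idea only, not Django's implementation: the demo_ table names, the run helper, and the hand-written SQL are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE demo_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE demo_blogpost (id INTEGER PRIMARY KEY, title TEXT,
                                author_id INTEGER REFERENCES demo_user(id));
    INSERT INTO demo_user VALUES (1, 'Dalinar'), (2, 'Kaladin'), (3, 'Shallan');
    INSERT INTO demo_blogpost VALUES (1, 'Dalinar''s Blog Post', 1),
                                     (2, 'Shallan''s Blog Post', 3);
""")

executed = []  # every SQL statement we send, in order


def run(sql, params=()):
    """Execute a statement, record it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Without select_related(): one query for the posts, then one per author.
lazy = []
for post_id, title, author_id in run(
        'SELECT id, title, author_id FROM demo_blogpost ORDER BY id'):
    (name,) = run('SELECT name FROM demo_user WHERE id = ?', (author_id,))[0]
    lazy.append((title, name))
lazy_count = len(executed)

# With select_related('author'): a single JOIN returns both tables at once.
executed.clear()
joined = run(
    'SELECT demo_blogpost.title, demo_user.name FROM demo_blogpost '
    'INNER JOIN demo_user ON demo_blogpost.author_id = demo_user.id '
    'ORDER BY demo_blogpost.id')
joined_count = len(executed)

print('per-author lookups: {} queries'.format(lazy_count))
print('single join: {} query'.format(joined_count))
```

Both approaches return the same titles and author names. The first needs one query plus one per post, while the second needs one query regardless of how many posts there are.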

Use prefetch_related() for reverse relationships

The next thing to look out for is reverse relationships that are being serialized. Let’s take a look at the serializer again, and the Comment model this time.

class BlogPostSerializer(serializers.ModelSerializer):
    author = UserSerializer()
    comments = CommentSerializer(many=True)

    class Meta:
        model = BlogPost
        fields = ('id', 'author', 'body', 'comments', 'title',)


class Comment(models.Model):
    author = models.ForeignKey(
        'User', on_delete=models.CASCADE, related_name='comments')
    comment = models.CharField(max_length=200)
    post = models.ForeignKey(
        'BlogPost', on_delete=models.CASCADE, related_name='comments')

    def __str__(self):
        return '{} - {}'.format(self.post, self.comment)

Here we see that BlogPostSerializer serializes a list of comments from the reverse relationship to Comment. Without optimization, this will cause a database query to occur for each blog post. We can use prefetch_related() on the viewset’s queryset to reduce this to just one extra query (in addition to the primary BlogPost query) to get this data for all blog posts.

from rest_framework import viewsets

from .models import BlogPost
from .serializers import BlogPostSerializer


class BlogPostViewSet(viewsets.ModelViewSet):
    queryset = (
        BlogPost.objects
        .select_related(
            'author',
        )
        .prefetch_related(
            'comments',
        )
    )
    serializer_class = BlogPostSerializer

    def dispatch(self, *args, **kwargs):
        response = super().dispatch(*args, **kwargs)

        # For debugging purposes only.
        from django.db import connection
        print('# of Queries: {}'.format(len(connection.queries)))

        return response
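The idea behind prefetch_related() can be sketched in the same way: fetch the parent rows, fetch all related rows in one IN query, and then join the two in Python. As before, this is a simplified illustration using sqlite3 and not Django's actual code; the demo_ table names and the run helper are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE demo_blogpost (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE demo_comment (id INTEGER PRIMARY KEY, comment TEXT,
                               post_id INTEGER REFERENCES demo_blogpost(id));
    INSERT INTO demo_blogpost VALUES (1, 'First'), (2, 'Second');
    INSERT INTO demo_comment VALUES
        (1, 'a', 1), (2, 'b', 1), (3, 'c', 1), (4, 'd', 2), (5, 'e', 2);
""")

executed = []  # every SQL statement we send, in order


def run(sql, params=()):
    """Execute a statement, record it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Without prefetch_related(): one query for posts, then one per post.
posts = run('SELECT id, title FROM demo_blogpost ORDER BY id')
lazy = {
    post_id: [row[0] for row in run(
        'SELECT id FROM demo_comment WHERE post_id = ? ORDER BY id',
        (post_id,))]
    for post_id, _title in posts
}
lazy_count = len(executed)

# With prefetch_related('comments'): one IN query, grouped in Python.
executed.clear()
posts = run('SELECT id, title FROM demo_blogpost ORDER BY id')
post_ids = [post_id for post_id, _title in posts]
placeholders = ', '.join('?' for _ in post_ids)
rows = run(
    'SELECT id, post_id FROM demo_comment '
    'WHERE post_id IN ({}) ORDER BY id'.format(placeholders), post_ids)
prefetched = defaultdict(list)
for comment_id, post_id in rows:
    prefetched[post_id].append(comment_id)
prefetched_count = len(executed)

print('per-post lookups: {} queries'.format(lazy_count))
print('single IN query: {} queries'.format(prefetched_count))
```

The per-post version grows by one query for every post, whereas the IN version stays at two queries however many posts are returned.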

Sending another request to our endpoint produces:

# of Queries: 7
[22/Dec/2018 18:02:36] "GET /blog-posts/ HTTP/1.1" 200 11732

This reduced our total number of queries by one, as expected, but there are still a lot of queries happening. We need to go one level deeper and look at CommentSerializer.

class CommentSerializer(serializers.ModelSerializer):
    author = UserSerializer()

    class Meta:
        model = Comment
        fields = ('id', 'author', 'comment', 'post')

Aha! Comment has a foreign key relationship, author, to User, and CommentSerializer is serializing it. We need to include this in prefetch_related() by spanning the relationship with a double underscore, like so:

class BlogPostViewSet(viewsets.ModelViewSet):
    queryset = (
        BlogPost.objects
        .select_related(
            'author',
        )
        .prefetch_related(
            'comments__author',
        )
    )
    serializer_class = BlogPostSerializer

    def dispatch(self, *args, **kwargs):
        response = super().dispatch(*args, **kwargs)

        # For debugging purposes only.
        from django.db import connection
        print('# of Queries: {}'.format(len(connection.queries)))

        return response

Now, hitting our endpoint again produces this:

# of Queries: 3
[22/Dec/2018 18:07:53] "GET /blog-posts/ HTTP/1.1" 200 11732

That resulted in a large improvement. However, for a single queryset, we should generally shoot for one primary query and one extra query per reverse relationship. Since we only have one reverse relationship here, we should be shooting for no more than two database queries. We need to dig a little deeper.

Use Prefetch objects to control prefetch_related() at a deeper level

When we get down to a smaller number of queries, it is sometimes beneficial to print out the actual queries to see where they are coming from. We can do this by printing out connection.queries instead of len(connection.queries).

class BlogPostViewSet(viewsets.ModelViewSet):
    queryset = (
        BlogPost.objects
        .select_related(
            'author',
        )
        .prefetch_related(
            'comments__author',
        )
    )
    serializer_class = BlogPostSerializer

    def dispatch(self, *args, **kwargs):
        response = super().dispatch(*args, **kwargs)

        # For debugging purposes only.
        from django.db import connection
        for query in connection.queries:
            print(query['sql'])

        return response

This yields the following when hitting our endpoint:

SELECT "demonstration_blogpost"."id", "demonstration_blogpost"."title", "demonstration_blogpost"."body", "demonstration_blogpost"."author_id", "demonstration_user"."id", "demonstration_user"."username", "demonstration_user"."name" FROM "demonstration_blogpost" INNER JOIN "demonstration_user" ON ("demonstration_blogpost"."author_id" = "demonstration_user"."id")
SELECT "demonstration_comment"."id", "demonstration_comment"."author_id", "demonstration_comment"."comment", "demonstration_comment"."post_id" FROM "demonstration_comment" WHERE "demonstration_comment"."post_id" IN (1, 2)
SELECT "demonstration_user"."id", "demonstration_user"."username", "demonstration_user"."name" FROM "demonstration_user" WHERE "demonstration_user"."id" IN (1, 2, 3)
[23/Dec/2018 16:09:35] "GET /blog-posts/ HTTP/1.1" 200 11732
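The same statements can also be obtained without touching the view, by routing Django's database logger to the console. The settings fragment below is a sketch: django.db.backends is Django's own logger name, but the handler name and overall layout are illustrative choices, and, like connection.queries, the logger only emits SQL when DEBUG is True.

```python
# settings.py (fragment): print every SQL statement to the console.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        # 'console' is an arbitrary handler name chosen for this sketch.
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        # Django logs each executed statement here at the DEBUG level.
        'django.db.backends': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}
```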

The first thing to check in each of these queries is the FROM clause. It tells you which table is being queried, and that table usually corresponds directly to one of your models. The first query is FROM demonstration_blogpost, so we know it is the primary BlogPost query we expect. The second is FROM demonstration_comment, so it is the query on Comment that we expect for a reverse relationship. The final one, however, is FROM demonstration_user. This is the one we need to try to eliminate.
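With longer logs, reading every FROM clause by eye gets tedious. A small tally can help; the helper below is a hypothetical sketch that only looks at the first FROM in each statement, so it ignores joins and subqueries. In a view, the input would be [q['sql'] for q in connection.queries].

```python
import re
from collections import Counter

# Capture the table name that follows the first FROM keyword.
FROM_TABLE = re.compile(r'\bFROM\s+"?(\w+)"?', re.IGNORECASE)


def tables_queried(sql_statements):
    """Count how many statements select FROM each table."""
    counts = Counter()
    for sql in sql_statements:
        match = FROM_TABLE.search(sql)
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened versions of the three statements printed above.
logged = [
    'SELECT "demonstration_blogpost"."id" FROM "demonstration_blogpost" '
    'INNER JOIN "demonstration_user" ON '
    '("demonstration_blogpost"."author_id" = "demonstration_user"."id")',
    'SELECT "demonstration_comment"."id" FROM "demonstration_comment" '
    'WHERE "demonstration_comment"."post_id" IN (1, 2)',
    'SELECT "demonstration_user"."id" FROM "demonstration_user" '
    'WHERE "demonstration_user"."id" IN (1, 2, 3)',
]

for table, count in tables_queried(logged).items():
    print('{}: {}'.format(table, count))
```

A table that shows up with a count well above one is usually the sign of a per-row lookup that select_related() or prefetch_related() could absorb.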

The extra User query is hard to explain without knowing how Django handles foreign key relationships on reverse relationships. You can read the docs for more information, but the gist is that adding comments__author to our prefetch_related() results in three separate queries: one each on BlogPost, Comment, and User. To prevent this, we need to tell Django to include the author data in the Comment query. Django provides a Prefetch object that allows this extra level of control.

from django.db.models import Prefetch
from rest_framework import viewsets

from .models import BlogPost, Comment
from .serializers import BlogPostSerializer


class BlogPostViewSet(viewsets.ModelViewSet):
    queryset = (
        BlogPost.objects
        .select_related(
            'author',
        )
        .prefetch_related(
            Prefetch(
                'comments',
                queryset=Comment.objects.select_related('author')
            )
        )
    )
    serializer_class = BlogPostSerializer

    def dispatch(self, *args, **kwargs):
        response = super().dispatch(*args, **kwargs)

        # For debugging purposes only.
        from django.db import connection
        print('# of Queries: {}'.format(len(connection.queries)))

        return response

Using Prefetch allows us to define the queryset for the prefetch. This means we can use select_related() to merge the comments and comments__author queries into one. Let’s try hitting our endpoint again.

# of Queries: 2
[23/Dec/2018 16:34:21] "GET /blog-posts/ HTTP/1.1" 200 11732

Finally, we’re down to two queries. This is the best we can do, as each prefetch_related() lookup requires at least one additional database query.
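To make the final two-query shape concrete, here is a sketch that reproduces it by hand with sqlite3: one joined query for posts and their authors, and one joined IN query for comments and their authors, assembled into a nested structure like the response. The demo_ tables, the run helper, and the SQL are assumptions for illustration and only approximate what Django generates.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE demo_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE demo_blogpost (id INTEGER PRIMARY KEY, title TEXT,
                                author_id INTEGER);
    CREATE TABLE demo_comment (id INTEGER PRIMARY KEY, author_id INTEGER,
                               post_id INTEGER);
    INSERT INTO demo_user VALUES (1, 'Dalinar'), (2, 'Kaladin'), (3, 'Shallan');
    INSERT INTO demo_blogpost VALUES (1, 'Dalinar''s Blog Post', 1),
                                     (2, 'Shallan''s Blog Post', 3);
    INSERT INTO demo_comment VALUES
        (1, 2, 1), (2, 3, 1), (3, 1, 1), (4, 2, 2), (5, 1, 2);
""")

executed = []  # every SQL statement we send, in order


def run(sql, params=()):
    """Execute a statement, record it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Query 1: posts joined to their authors (the select_related part).
posts = {}
for post_id, title, author in run(
        'SELECT p.id, p.title, u.name FROM demo_blogpost AS p '
        'INNER JOIN demo_user AS u ON p.author_id = u.id ORDER BY p.id'):
    posts[post_id] = {'title': title, 'author': author, 'comments': []}

# Query 2: comments joined to their authors (the Prefetch part).
placeholders = ', '.join('?' for _ in posts)
for comment_id, post_id, author in run(
        'SELECT c.id, c.post_id, u.name FROM demo_comment AS c '
        'INNER JOIN demo_user AS u ON c.author_id = u.id '
        'WHERE c.post_id IN ({}) ORDER BY c.id'.format(placeholders),
        list(posts)):
    posts[post_id]['comments'].append({'id': comment_id, 'author': author})

print('queries executed: {}'.format(len(executed)))
for post in posts.values():
    print(post['title'], [c['author'] for c in post['comments']])
```

Every author name arrives through a join, so no statement selects from the user table on its own.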

Closing Remarks

Our example started with only ten database queries, but every blog post and comment added to the system would have increased that number. That can have a crippling effect on your system as your data grows in volume. After optimization, however, the number of database queries required to retrieve our data will always be two, no matter how much data accumulates.
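The growth can be put into numbers. The formulas below are inferred from the query counts reported at each step of this article (10, 8, 7, 3, and 2 for two posts and five comments) and assume at least one post and one comment; STAGES is a hypothetical name used only for this sketch.

```python
# Each stage maps (number of posts, number of comments) to a query count.
STAGES = [
    ('no optimization', lambda posts, comments: 1 + 2 * posts + comments),
    ("select_related('author')", lambda posts, comments: 1 + posts + comments),
    ("prefetch_related('comments')", lambda posts, comments: 2 + comments),
    ("prefetch_related('comments__author')", lambda posts, comments: 3),
    ('Prefetch with select_related', lambda posts, comments: 2),
]

# The article's data set, followed by a larger hypothetical one.
for posts, comments in [(2, 5), (100, 500)]:
    print('{} posts, {} comments'.format(posts, comments))
    for name, formula in STAGES:
        print('  {}: {}'.format(name, formula(posts, comments)))
```

For the hypothetical 100 posts with 500 comments, the unoptimized view would issue 701 queries under these formulas, while the final version still issues two.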

This is a common theme when working with Django. Interacting with the database is easy and concise, but you can end up with something really inefficient if you don’t know what to look out for. I highly recommend reading my Django ORM Optimization Tips post to become more effective with the Django ORM. Also see Normalize Your Django REST Serializers to make your data easier for your client to use.

For full example code, check out the GitHub repository.
