When inserting a large number of objects into the database with Django, your first thought should be to use
bulk_create. It is much more efficient than calling
create for each object, and it generally results in only a single query. However, when dealing with tens of thousands, hundreds of thousands, or even more objects, you may run into out-of-memory errors.
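For context, here is a minimal sketch of the per-object approach that bulk_create replaces, using the same hypothetical MyModel as the examples below. Each create call issues its own INSERT and database round trip:

```python
from my_app.models import MyModel

def create_data_slowly(data):
    # One INSERT (and one round trip) per row: correct, but far
    # slower than a single bulk_create for large inputs.
    for row in data:
        MyModel.objects.create(field1=row['field1'])
```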
Consider the following code:
```python
from my_app.models import MyModel

def create_data(data):
    objs = []
    for row in data:
        obj = MyModel(field1=row['field1'])
        objs.append(obj)
    MyModel.objects.bulk_create(objs)
```
This code works fine and is efficient from a database perspective for most cases. However, as
data grows large,
objs has the potential to take up too much memory and cause errors.
This is a solved problem in Python. Just use a generator. A generator function uses yield to return values lazily through an iterator. The benefit of this is that the full results are never loaded into memory at once; each value is produced as the iteration occurs. We can use this to our advantage:
```python
from my_app.models import MyModel

def create_data(data):
    MyModel.objects.bulk_create(generator(data))

def generator(data):
    for row in data:
        yield MyModel(field1=row['field1'])
```
This would work, in theory. Unfortunately, at the time of writing, Django’s
bulk_create converts its iterable argument into a list. This causes the generator to be fully evaluated and defeats the purpose of using one.
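A minimal sketch of the issue, in plain Python rather than Django's actual code: calling list() on a generator walks it to completion, so every yielded value is held in memory at once.

```python
def numbers():
    for i in range(1_000_000):
        yield i

# Iterating lazily keeps only one value alive at a time...
for n in numbers():
    pass

# ...but list() materializes all million values at once, which is
# effectively what bulk_create does with the objects we pass it.
evaluated = list(numbers())
```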
Since bulk_create converts the generator into a list, we need to split the data into batches ourselves and pass each chunk in one at a time:
```python
from itertools import islice

from my_app.models import MyModel

def create_data(data):
    bulk_create(MyModel, generator(data))

def bulk_create(model, generator, batch_size=10000):
    """
    Uses islice to call bulk_create on batches of
    Model objects from a generator.
    """
    while True:
        items = list(islice(generator, batch_size))
        if not items:
            break
        model.objects.bulk_create(items)

def generator(data):
    for row in data:
        yield MyModel(field1=row['field1'])
```
Here, we use
islice to slice a generator (without fully evaluating it) and pass each batch into
bulk_create. This ensures that only
batch_size items are loaded into memory at a time. It comes at the cost of more database queries, so pick a batch size that is high enough to be performant but low enough to avoid out-of-memory errors.
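As a usage sketch (the file name and CSV layout here are hypothetical), the helper pairs naturally with a data source that is itself lazy, so no step ever holds the full dataset in memory:

```python
import csv

def rows_from_csv(path):
    # Lazily yield one row dict at a time; the file is never
    # read into memory in full.
    with open(path, newline='') as f:
        yield from csv.DictReader(f)

# Each batch of up to 10,000 objects is built, inserted, and
# discarded before the next batch is read.
create_data(rows_from_csv('data.csv'))
```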
For more Django optimization tips, check out Django ORM Optimization Tips.