When you try to iterate over a queryset with about 0.5 million items (a few hundred megabytes of database storage), memory usage can become problematic. Adding .iterator() to your queryset helps somewhat, but the entire query result still ends up in memory. Cron jobs at YouTellMe.nl were unfortunately starting to fail because of this. My colleague Rick came up with the following fix.
This solution chunks the query into batches of 1000 rows (by default). While this is somewhat heavier on your database (multiple queries instead of one), it seriously reduces memory usage. I'm curious to hear how other Django developers have worked around this problem.
import gc


def queryset_iterator(queryset, chunksize=1000):
    '''
    Iterate over a Django Queryset ordered by the primary key.

    This method loads a maximum of chunksize (default: 1000) rows in
    its memory at the same time, while Django normally would load all
    rows into memory. Using the iterator() method only prevents it
    from preloading all the model instances.

    Note that the implementation of the iterator does not support
    ordered querysets.
    '''
    pk = 0
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()


# Some examples:

# old
MyItem.objects.all()

# better
MyItem.objects.all().iterator()

# even better
queryset_iterator(MyItem.objects.all())
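For completeness, here is a minimal sketch of how a batch job might consume the generator. MyItem is the example model used above; process_item is a hypothetical placeholder for whatever per-row work your cron job actually does, not part of the original snippet:

# Minimal usage sketch: process_item is a hypothetical stand-in for
# whatever per-row work the batch job performs.
def process_item(item):
    print(item.pk)

# Only up to chunksize rows are materialised at any given moment;
# gc.collect() after each chunk lets the previous batch be freed
# before the next query runs.
for item in queryset_iterator(MyItem.objects.all(), chunksize=1000):
    process_item(item)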