# 【Django】Django ORM - 性能优化

\$ python manage.py shell
>>> from core import models
>>> a = models.User.objects[0:5]
...


# 最重要的原则: Work at the appropriate level

# QuerySet operation on the database
# fast, because that's what databases are good at
my_bicycles.count()

# counting Python objects
# slower, because it requires a database query anyway, and processing
# of the Python objects
len(my_bicycles)

# Django template filter
# slower still, because it will have to count them in Python anyway,
# and because of template language overheads
\{\{ my_bicycles|length \}\}


# Lazy 的 QuerySets

## Background

>>> q = Entry.objects.filter(headline__startswith="What")
>>> q = q.filter(pub_date__lte=datetime.date.today())
>>> q = q.exclude(body_text__icontains="food")
>>> print(q)

# ps. 上边的这种多条件查询, 官方推荐这种写法:
Entry.objects.filter(
).exclude(
pub_date__gte=datetime.date.today()
).filter(
pub_date__gte=datetime(2005, 1, 30)
)


Internally, a QuerySet can be constructed, filtered, sliced, and generally passed around without actually hitting the database. No database activity actually occurs until you do something to evaluate the queryset.

## Situations to let queryset to be evaluated

1. Iteration，即对 Queryset 进行遍历循环的操作。

for e in Entry.objects.all():

2. slicing：即 Entry.objects.all()[:5], 获取 queryset 中的前五个对象，相当于 sql 中的 LIMIT 5

• 值得注意的是，Slicing an unevaluated QuerySet usually returns another unevaluated QuerySet, but Django will execute the database query if you use the “step” parameter of slice syntax, and will return a list. Slicing a QuerySet that has been evaluated also returns a list.
3. picling/caching。

4. repr/str：A QuerySet is evaluated when you call repr() on it. This is for convenience in the Python interactive interpreter, so you can immediately see your results when using the API interactively.

5. len：A QuerySet is evaluated when you call len() on it. This, as you might expect, returns the length of the result list.

• If you only need to determine the number of records in the set (and don’t need the actual objects), it’s much more efficient to handle a count at the database level using SQL’s SELECT COUNT(*). Django provides a count() method for precisely this reason.
6. list()：Force evaluation of a QuerySet by calling list() on it. For example:

entry_list = list(Entry.objects.all())

7. bool()：Testing a QuerySet in a boolean context, such as using bool(), or, and or an if statement, will cause the query to be executed. If there is at least one result, the QuerySet is True, otherwise False. For example:

if Entry.objects.filter(headline="Test"):
print("There is at least one Entry with the headline Test")


**举个栗子: **

>>> queryset = Entry.objects.all()
>>> print([p.headline for p in queryset]) # Evaluate the query set.
>>> print([p.pub_date for p in queryset]) # Re-use the cache from the evaluation, rather than evaluate the queery set again.


>>> queryset = Entry.objects.all()
>>> print queryset[5] # Queries the database
>>> print queryset[5] # Queries the database again

>>> queryset = Entry.objects.all()
>>> [entry for entry in queryset] # Queries the database
>>> print queryset[5] # Uses cache
>>> print queryset[5] # Uses cache


# Caching and QuerySets

Each QuerySet contains a cache to minimize database access. Understanding how it works will allow you to write the most efficient code.

In a newly created QuerySet, the cache is empty. The first time a QuerySet is evaluated – and, hence, a database query happens – Django saves the query results in the QuerySet’s cache and returns the results that have been explicitly requested (e.g., the next element, if the QuerySet is being iterated over). Subsequent evaluations of the QuerySet reuse the cached results.

Keep this caching behavior in mind, because it may bite you if you don’t use your QuerySets correctly. For example, the following will create two QuerySets, evaluate them, and throw them away:

>>> print([e.headline for e in Entry.objects.all()])
>>> print([e.pub_date for e in Entry.objects.all()])


That means the same database query will be executed twice, effectively doubling your database load. Also, there’s a possibility the two lists may not include the same database records, because an Entry may have been added or deleted in the split second between the two requests.

To avoid this problem, simply save the QuerySet and reuse it:

>>> queryset = Entry.objects.all()
>>> print([p.headline for p in queryset]) # Evaluate the query set.
>>> print([p.pub_date for p in queryset]) # Re-use the cache from the evaluation.


## When QuerySets are not cached

Querysets do not always cache their results. When evaluating only part of the queryset, the cache is checked, but if it is not populated then the items returned by the subsequent query are not cached. Specifically, this means that limiting the queryset using an array slice or an index will not populate the cache.

For example, repeatedly getting a certain index in a queryset object will query the database each time:

>>> queryset = Entry.objects.all()
>>> print(queryset[5]) # Queries the database
>>> print(queryset[5]) # Queries the database again


However, if the entire queryset has already been evaluated, the cache will be checked instead:

>>> queryset = Entry.objects.all()
>>> [entry for entry in queryset] # Queries the database
>>> print(queryset[5]) # Uses cache
>>> print(queryset[5]) # Uses cache


Here are some examples of other actions that will result in the entire queryset being evaluated and therefore populate the cache:

>>> [entry for entry in queryset]
>>> bool(queryset)
>>> entry in queryset
>>> list(queryset)


# 一些结论

## 1 显式地声明你需要的字段

>>> a = models.User.objects.all()
>>> a[0].id
1


SELECT core_user.id, core_user.name, core_user.password, core_user.salt, core_user.join_time, core_user.last_login_time, core_user.type FROM core_user LIMIT 1


## 2 当判断某条数据是否存在时，尽量使用 exists()

>>>test = models.User.objects.filter(id=1).exists()


SELECT (1) AS a FROM core_user WHERE core_user.id = 1  LIMIT 1


entry = Entry.objects.get(pk=123)
if some_queryset.filter(pk=entry.pk).exists():
print("Entry contained in queryset")


entry = User(1, ...)
some_queryset = models.User.objects.all()
if entry in some_queryset: # evaluate the sql here
print("Entry contained in QuerySet")


SELECT core_user.id, core_user.name, core_user.password, core_user.salt, core_user.join_time, core_user.last_login_time, core_user.type FROM core_user


And to find whether a queryset contains any items:

if some_queryset.exists():
print("There is at least one object in some_queryset")


Which will be faster than:

if some_queryset:
print("There is at least one object in some_queryset")


… but not by a large degree (hence needing a large queryset for efficiency gains).

SELECT (1) AS a FROM core_user WHERE core_user.id = 1  LIMIT 1


Additionally, if a some_queryset has not yet been evaluated, but you know that it will be at some point, then using some_queryset.exists() will do more overall work (one query for the existence check plus an extra one to later retrieve the results) than simply using bool(some_queryset), which retrieves the results and then checks if any were returned.

## 3 获取行的数量时，使用QuerySet.count()

If you only want the count, rather than doing len(queryset).

## 4 直接使用外键

If you only need a foreign key value, use the foreign key value that is already on the object you’ve got, rather than getting the whole related object and taking its primary key. i.e. do:

entry.blog_id


entry.blog.id


## 5 批量插入

When creating objects, where possible, use the bulk_create() method to reduce the number of SQL queries. For example:

Entry.objects.bulk_create([
])


…is preferable to:

Entry.objects.create(headline='This is a test')


Note that there are a number of caveats to this method, so make sure it’s appropriate for your use case.

This also applies to ManyToManyFields, so doing:

my_band.members.add(me, my_friend)


…is preferable to:

my_band.members.add(me)


…where Bands and Artists have a many-to-many relationship.

# 一些测试

## Test1

>>> a = models.User.objects.values("id")[0:5]
>>> for i in range(0,2):
...     a[i]["id"]


SQL 执行：

2019-11-17T13:15:18.454675Z	   35 Query	SELECT core_user.id FROM core_user LIMIT 1
2019-11-17T13:15:18.456044Z	   35 Query	SELECT core_user.id FROM core_user LIMIT 1 OFFSET 1


### 结论

• 执行完 a = models.User.objects.values("id")[0:5] 语句后，并没有触发执行 SQL 语句；
• 每次进入执行 a[i]["id"]时，都会真正触发执行一次 SQL 语句。

## Test3 - Debug 模式下的特殊情况

def test():
users = models.User.objects.all()
user = users[0]
print "test"


2019-11-24T04:39:21.763123Z	   46 Connect	root@localhost on task2 using TCP/IP
2019-11-24T04:39:21.773622Z	   46 Query	SET AUTOCOMMIT = 0
2019-11-24T04:39:21.776080Z	   46 Query	SET SQL_AUTO_IS_NULL = 0
2019-11-24T04:39:21.783327Z	   46 Query	SET AUTOCOMMIT = 1
2019-11-24T04:39:21.789287Z	   46 Query	SELECT core_user.id, core_user.name, core_user.password, core_user.salt, core_user.join_time, core_user.last_login_time, core_user.type FROM core_user LIMIT 21


# django-debug-toolbar

## 原生的 explain 方法

>>> print(Blog.objects.filter(title='My Blog').explain(verbose=True))
Seq Scan on public.blog  (cost=0.00..35.50 rows=10 width=12) (actual time=0.004..0.004 rows=10 loops=1)
Output: id, title
Filter: (blog.title = 'My Blog'::bpchar)
Planning time: 0.064 ms
Execution time: 0.058 ms


## connection.queries 方法

from django.db import connection
connection.queries
>> [{'sql': 'SELECT polls_polls.id, polls_polls.question, polls_polls.pub_date FROM polls_polls',
'time': '0.002'}]

from django.db import reset_queries
reset_queries()