优化Django管理后台的SELECT COUNT(*)查询

Django管理后台打开界面后,会在上面默认显示出“总共 515657”这样的总数据条数,要查询出这个数字,django就会对表进行 select count(*) from table 这样的查询,而对于数据量比较大的InnoDB表来说,这个查询语句会比较慢,至少得要好几秒。

下面这个办法可以解决,当发现不带任何条件的count查询时,就用 show table status 语句中的innodb的估计值替代。方法来自于 stackoverflow:
http://stackoverflow.com/questions/10433173/prevent-django-admin-from-running-select-count-on-the-list-form

代码如下:

class ApproxCountQuerySet(QuerySet):
    """Counting all rows is very expensive on large Innodb tables. This
    is a replacement for QuerySet that returns an approximation if count()
    is called with no additional constraints. In all other cases it should
    behave exactly as QuerySet.

    Only works with MySQL. Behaves normally for all other engines.
    """

    def count(self):
        # Code from django/db/models/query.py

        if self._result_cache is not None and not self._iter:
            return len(self._result_cache)

        is_mysql = 'mysql' in connections[self.db].client.executable_name.lower()

        query = self.query
        if (is_mysql and not query.where and
                query.high_mark is None and
                query.low_mark == 0 and
                not query.select and
                not query.group_by and
                not query.having and
                not query.distinct):
            # If query has no constraints, we would be simply doing
            # "SELECT COUNT(*) FROM foo". Monkey patch so the we
            # get an approximation instead.
            cursor = connections[self.db].cursor()
            cursor.execute("SHOW TABLE STATUS LIKE %s",
                    (self.model._meta.db_table,))
            return cursor.fetchall()[0][4]
        else:
            return self.query.get_count(using=self.db)

然后想办法重载Django原生的QuerySet就好了

class ModelManager(models.Manager):

    def get_query_set(self):
        return CustomQuerySet(self.model, using=self._db)

然后 

class BaseModel(models.Model,):
    objects = manager.ModelManager()

models.Model = BaseModel

这样一来,凡是django中的不带条件count查询出来的值都会是个预估值,会不准,而且每次查询都会变。如果你想查一个准确值,就带个明显成立的条件好了,比如id大于0:

In [4]: User.objects.all().count()
Out[4]: 519110L

In [5]: User.objects.all().count()
Out[5]: 459258L

In [6]: User.objects.all().count()
Out[6]: 454654L

In [7]: User.objects.filter(id__gt=0).count()
Out[7]: 511441

In [8]: User.objects.filter(id__gt=0).count()
Out[8]: 511441