PostgreSQL的最佳特性你用了吗？-postgresql的优点

SQL语句通常不是很容易理解，特别是你阅读别人已经写好的语句。因此，很多人指出我们应该遵循在其他语言中遵循的原则，像加上注释和功能模块化。我***注意到一个很多人都没有使用的Postgres关键特性，也就是 @timonk在AWS Re:Invent 大会关于数据仓库服务Redshift主题演讲时指出的一个特性。这个特性实际上使得SQL兼具了可读性和模块性。在以前，我回头阅读自己的几个月前的 SQL语句，通常很难理解，而现在我可以做到这一点。

这个特性就是CTEs，也就是公用表表达式，你有可能称做它为WITH 语句。和数据库中视图一样，它的主要好处就是，它允许你在当前事务中创建临时表。你可以大量使用它，因为它允许你思路清晰的构建模块，别人很容易就理解你在做什么。

让我们举个简单的例子

WITH users_tasks AS ( 
  SELECT 
         users.email, 
         array_agg(tasks.name) as task_list, 
         projects.title 
  FROM 
       users, 
       tasks, 
       project 
  WHERE 
        users.id = tasks.user_id 
        projects.title = tasks.project_id 
  GROUP BY 
           users.email, 
           projects.title 
)

通过这样定义临时表users_tasks，我就可以在后面加上对users_tasks基本查询语句，像：

SELECT * 
FROM users_tasks;

有趣的是你可以将它们连在一起。当我知道分配给每个用户的任务量时，也许我想知道在一个指定的任务上，谁因为对这个任务负责超过了50%而因此造成瓶颈。为了简化，我们可以使用多种方式，先计算每个任务的总量，然后是每人针对每个任务的负责总量。

total_tasks_per_project AS ( 
  SELECT 
         project_id, 
         count(*) as task_count 
  FROM tasks 
  GROUP BY project_id 
), 
  
tasks_per_project_per_user AS ( 
  SELECT 
         user_id, 
         project_id, 
         count(*) as task_count 
  FROM tasks 
  GROUP BY user_id, project_id 
),

现在我们将组合一下然后发现超过50%的用户

overloaded_users AS ( 
  SELECT tasks_per_project_per_user.user_id, 
  
  FROM tasks_per_project_per_user, 
       total_tasks_per_project 
  WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) 
)

最终目标，我想获得超负荷工作这的用户和任务的逗号分隔列表。我们只要简单地对overloaded_users和 users_tasks的初始列表进行join操作。放在一起可能有点长，但是可读性强。作为额外帮助，我又在每一层加了注释。

--- Created by Craig Kerstiens 11/18/2013 
--- Query highlights users that have over 50% of tasks on a given project 
--- Gives comma separated list of their tasks and the project 
  
--- Initial query to grab project title and tasks per user 
WITH users_tasks AS ( 
  SELECT 
         users.id as user_id, 
         users.email, 
         array_agg(tasks.name) as task_list, 
         projects.title 
  FROM 
       users, 
       tasks, 
       project 
  WHERE 
        users.id = tasks.user_id 
        projects.title = tasks.project_id 
  GROUP BY 
           users.email, 
           projects.title 
), 
  
--- Calculates the total tasks per each project 
total_tasks_per_project AS ( 
  SELECT 
         project_id, 
         count(*) as task_count 
  FROM tasks 
  GROUP BY project_id 
), 
  
--- Calculates the projects per each user 
tasks_per_project_per_user AS ( 
  SELECT 
         user_id, 
         project_id, 
         count(*) as task_count 
  FROM tasks 
  GROUP BY user_id, project_id 
), 
  
--- Gets user ids that have over 50% of tasks assigned 
overloaded_users AS ( 
  SELECT tasks_per_project_per_user.user_id, 
  
  FROM tasks_per_project_per_user, 
       total_tasks_per_project 
  WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) 
) 
  
SELECT 
       email, 
       task_list, 
       title 
FROM 
     users_tasks, 
     overloaded_users 
WHERE 
      users_tasks.user_id = overloaded_users.user_id

CTEs通常不如经过精简优化过的SQL语句性能高。大多数差距小于一倍差距。对我而言，这种为了可读性作出的折中是毋庸置疑的。Postgres优化器以后肯定会针对这点变的更好。

多说一句，是的我可以用大约10-15行简短的SQL语句做同样的事情，但是你也许不能很快的理解它。当你碰到需要保证SQL做正确的事情时，可读性的优势就出来了。SQL语句总是有个结果，你对此毫无疑问。确保你SQL语句容易推理是保证正确性的关键。

原文链接：http://www.craigkerstiens.com/2013/11/18/best-postgres-feature-youre-not-using/

PostgreSQL的最佳特性 你用了吗？

PostgreSQL的最佳特性你用了吗？