NAME
SELECT - retrieve rows from a table or view
SYNOPSIS
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
    * | expression [ AS output_name ] [, ...]
    [ FROM from_item [, ...] ]
    [ WHERE condition ]
    [ GROUP BY expression [, ...] ]
    [ HAVING condition [, ...] ]
    [ { UNION | INTERSECT | EXCEPT } [ ALL ] select ]
    [ ORDER BY expression [ ASC | DESC | USING operator ] [, ...] ]
    [ LIMIT { count | ALL } ]
    [ OFFSET start ]
    [ FOR UPDATE [ OF table_name [, ...] ] ]

where from_item can be one of:

    [ ONLY ] table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    ( select ) [ AS ] alias [ ( column_alias [, ...] ) ]
    function_name ( [ argument [, ...] ] ) [ AS ] alias [ ( column_alias [, ...] | column_definition [, ...] ) ]
    function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] )
    from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]
[Comment: FIXME: This last syntax is incorrect if the join type is an INNER or OUTER join (in which case one of NATURAL, ON ..., or USING ... is mandatory, not optional). What's the best way to fix this?]
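DESCRIPTION
SELECT retrieves rows from one or more tables. The general processing of SELECT is as follows:
- 1.
- All elements in the FROM list are computed. (Each element in the FROM list is a real or virtual table.) If more than one element is specified in the FROM list, they are cross-joined together. (See FROM Clause [select(7)] below.)
- 2.
- If the WHERE clause is specified, all rows that do not satisfy the condition are eliminated from the output. (See WHERE Clause [select(7)] below.)
- 3.
- If the GROUP BY clause is specified, the output is divided into groups of rows that match on one or more values. If the HAVING clause is present, it eliminates groups that do not satisfy the given condition. (See GROUP BY Clause [select(7)] and HAVING Clause [select(7)] below.)
- 4.
- Using UNION, INTERSECT, and EXCEPT, the output of more than one SELECT statement can be combined into a single result set. UNION returns all rows that are in one or both of the result sets. INTERSECT returns all rows that are strictly in both result sets. EXCEPT returns the rows that are in the first result set but not in the second. In all three cases, duplicate rows are eliminated unless ALL is specified. (See UNION Clause [select(7)], INTERSECT Clause [select(7)], and EXCEPT Clause [select(7)] below.)
- 5.
- When the actual output rows are produced, SELECT first computes the output expressions for each selected row. (See SELECT List [select(7)] below.)
- 6.
- If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce. (See ORDER BY Clause [select(7)] below.)
- 7.
- If the LIMIT or OFFSET clause is specified, the SELECT statement returns only a subset of the result rows. (See LIMIT Clause [select(7)] below.)
- 8.
- DISTINCT eliminates duplicate rows from the result. DISTINCT ON eliminates rows that match on all the specified expressions. ALL (the default) returns all candidate rows, including duplicates. (See DISTINCT Clause [select(7)] below.)
- 9.
- The FOR UPDATE clause causes the SELECT statement to lock the selected rows against concurrent updates. (See FOR UPDATE Clause [select(7)] below.)
You must have SELECT privilege on a table to read its values. The use of FOR UPDATE requires UPDATE privilege as well.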
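PARAMETERS
FROM CLAUSE
The FROM clause specifies one or more source tables for the SELECT. If multiple sources are specified, the result is the Cartesian product (cross join) of all the sources. Usually, though, qualification conditions are added to restrict the returned rows to a small subset of the Cartesian product.
FROM-clause elements can contain:
- table_name
- The name (optionally schema-qualified) of an existing table or view. If ONLY is specified, only that table is scanned. If ONLY is not specified, the table and all its descendant tables (if any) are scanned. * can be appended to the table name to indicate that descendant tables are to be scanned, but in the current version this is the default behavior. (In PostgreSQL releases before 7.1, ONLY was the default behavior.) The default behavior can be modified by changing the sql_inheritance configuration option.
- alias
- A substitute name for the FROM item containing the alias. An alias is used for brevity or to eliminate ambiguity for self-joins (where the same table is scanned multiple times). When an alias is provided, it completely hides the actual name of the table or function; for example, given FROM foo AS f, the remainder of the SELECT must refer to this FROM item as f, not foo. If an alias is written, a column alias list can also be written to provide substitute names for one or more columns of the table.
- select
- A sub-SELECT can appear in the FROM clause. This acts as though its output were created as a temporary table for the duration of this single SELECT command. Note that the sub-SELECT must be surrounded by parentheses, and an alias must be provided for it.
- function_name
- Function calls can appear in the FROM clause. (This is especially useful for functions that return result sets, but any function can be used.) This acts as though the function's output were created as a temporary table for the duration of this single SELECT command. An alias may also be used. If an alias is written, a column alias list can also be written to provide substitute names for one or more attributes of the function's composite return type. If the function has been defined as returning the record data type, then an alias or the key word AS must be present, followed by a column definition list in the form ( column_name data_type [, ... ] ). The column definition list must match the actual number and types of columns returned by the function.
- join_type
- One of
- [ INNER ] JOIN
- LEFT [ OUTER ] JOIN
- RIGHT [ OUTER ] JOIN
- FULL [ OUTER ] JOIN
- CROSS JOIN
For the INNER and OUTER join types, a join condition must be specified, namely exactly one of NATURAL, ON join_condition, or USING (join_column [, ...]). See below for the meaning. For CROSS JOIN, none of these clauses may appear.
A JOIN clause combines two FROM items. Use parentheses if necessary to determine the order of nesting. In the absence of parentheses, JOINs nest left to right. In any case, JOIN binds more tightly than the commas separating FROM items.
CROSS JOIN and INNER JOIN produce a simple Cartesian product, the same result as you get from listing the two items at the top level of FROM. CROSS JOIN is equivalent to INNER JOIN ON (true), that is, no rows are removed by qualification. These join types are just a notational convenience, since they do nothing you couldn't do with plain FROM and WHERE.
LEFT OUTER JOIN returns all rows in the qualified Cartesian product (that is, all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition. This left-hand row is extended to the full width of the joined table by inserting null values for the right-hand columns. Note that only the JOIN clause's own condition is considered while deciding which rows have matches. Outer conditions are applied afterwards.
Conversely, RIGHT OUTER JOIN returns all the joined rows, plus one row for each unmatched right-hand row (extended with nulls on the left). This is just a notational convenience, since you could convert it to a LEFT OUTER JOIN by switching the left and right inputs.
FULL OUTER JOIN returns all the joined rows, plus one row for each unmatched left-hand row (extended with nulls on the right), plus one row for each unmatched right-hand row (extended with nulls on the left).
- ON join_condition
- join_condition is an expression resulting in a value of type boolean (similar to a WHERE clause) that specifies which rows in a join are considered to match.
- USING (join_column [, ...])
- A clause of the form USING ( a, b, ... ) is shorthand for ON left_table.a = right_table.a AND left_table.b = right_table.b .... Also, USING implies that only one of each pair of equivalent columns will be included in the join output, not both.
- NATURAL
- NATURAL is shorthand for a USING list that mentions all columns in the two tables that have the same names.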
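WHERE CLAUSE
The optional WHERE clause has the general form
WHERE condition
where condition is any expression that evaluates to a result of type boolean. Any row that does not satisfy this condition will be eliminated from the output. A row satisfies the condition if the condition returns true when the actual row values are substituted for any variable references.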
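GROUP BY CLAUSE
The optional GROUP BY clause has the general form
GROUP BY expression [, ...]
GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions. expression can be an input column name, or the name or ordinal number of an output column (SELECT list), or an arbitrary expression formed from input-column values. In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name.
Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without GROUP BY, an aggregate produces a single value computed across all the selected rows). When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.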
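HAVING CLAUSE
The optional HAVING clause has the general form
HAVING condition
where condition is the same as specified for the WHERE clause.
HAVING eliminates group rows that do not satisfy the condition. HAVING is different from WHERE: WHERE filters individual rows before the application of GROUP BY, while HAVING filters group rows created by GROUP BY. Each column referenced in condition must unambiguously reference a grouping column, unless the reference appears within an aggregate function.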
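UNION CLAUSE
The UNION clause has this general form:
select_statement UNION [ ALL ] select_statement
select_statement is any SELECT statement without an ORDER BY, LIMIT, or FOR UPDATE clause. (ORDER BY and LIMIT can be attached to a subexpression if it is enclosed in parentheses. Without parentheses, these clauses will be taken to apply to the result of the UNION, not to its right-hand input expression.)
The UNION operator computes the set union of the rows returned by the involved SELECT statements. A row is in the set union of two result sets if it appears in at least one of the result sets. The two SELECT statements that represent the direct operands of the UNION must produce the same number of columns, and corresponding columns must be of compatible data types.
By default, the result of UNION does not contain any duplicate rows unless the ALL option is specified. ALL prevents elimination of duplicates.
Multiple UNION operators in the same SELECT statement are evaluated left to right, unless otherwise indicated by parentheses.
Currently, FOR UPDATE may not be specified either for a UNION result or for the inputs of a UNION.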
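INTERSECT CLAUSE
The INTERSECT clause has this general form:
select_statement INTERSECT [ ALL ] select_statement
select_statement is any SELECT statement without an ORDER BY, LIMIT, or FOR UPDATE clause.
The INTERSECT operator computes the set intersection of the rows returned by the involved SELECT statements. A row is in the intersection of two result sets if it appears in both result sets.
The result of INTERSECT does not contain any duplicate rows unless the ALL option is specified. With ALL, a row that has m duplicates in the left table and n duplicates in the right table will appear min(m,n) times in the result set.
Multiple INTERSECT operators in the same SELECT statement are evaluated left to right, unless parentheses dictate otherwise. INTERSECT binds more tightly than UNION. That is, A UNION B INTERSECT C will be read as A UNION (B INTERSECT C) unless parentheses say otherwise.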
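EXCEPT CLAUSE
The EXCEPT clause has this general form:
select_statement EXCEPT [ ALL ] select_statement
select_statement is any SELECT statement without an ORDER BY, LIMIT, or FOR UPDATE clause.
The EXCEPT operator computes the set of rows that are in the result of the left SELECT statement but not in the result of the right one.
The result of EXCEPT does not contain any duplicate rows unless the ALL option is specified. With ALL, a row that has m duplicates in the left table and n duplicates in the right table will appear max(m-n,0) times in the result set.
Multiple EXCEPT operators in the same SELECT statement are evaluated left to right, unless parentheses dictate otherwise. EXCEPT binds at the same level as UNION.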
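SELECT LIST
The SELECT list (between the key words SELECT and FROM) specifies expressions that form the output rows of the SELECT statement. The expressions can (and usually do) refer to columns computed in the FROM clause. Using the clause AS output_name, another name can be specified for an output column. This name is primarily used to label the column for display. It can also be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.
Instead of an expression, * can be written in the output list as a shorthand for all the columns of the selected rows. Also, one can write table_name.* as a shorthand for the columns coming from just that table.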
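ORDER BY CLAUSE
The optional ORDER BY clause has this general form:
ORDER BY expression [ ASC | DESC | USING operator ] [, ...]
expression can be the name or ordinal number of an output column (SELECT list), or it can be an arbitrary expression formed from input-column values.
The ORDER BY clause causes the result rows to be sorted according to the specified expressions. If two rows are equal according to the leftmost expression, they are compared according to the next expression, and so on. If they are equal according to all specified expressions, they are returned in an unspecified, implementation-dependent order.
The ordinal number refers to the ordinal (left-to-right) position of the result column. This feature makes it possible to define an ordering on the basis of a column that does not have a unique name. This is never absolutely necessary, because it is always possible to assign a name to a result column using the AS clause.
It is also possible to use arbitrary expressions in the ORDER BY clause, including columns that do not appear in the SELECT result list. Thus the following statement is valid:
SELECT name FROM distributors ORDER BY code;
A limitation of this feature is that an ORDER BY clause applying to the result of a UNION, INTERSECT, or EXCEPT query may only specify an output column name or number, not an expression.
If an ORDER BY expression is a simple name that matches both a result column name and an input column name, ORDER BY will interpret it as the result column name. This is the opposite of the choice that GROUP BY will make in the same situation. This inconsistency is mandated by the SQL standard.
Optionally one may add the key word ASC (ascending) or DESC (descending) after each expression in the ORDER BY clause. If not specified, ASC is assumed by default. Alternatively, a specific ordering operator name may be specified in the USING clause. ASC is usually equivalent to USING < and DESC is usually equivalent to USING >. (But the creator of a user-defined data type can define exactly what the default sort ordering is, and it might correspond to operators with other names.)
The null value sorts higher than any other value. In other words, with ascending sort order, null values sort at the end, and with descending sort order, null values sort at the beginning.
Character-string data is sorted according to the locale-specific collation order that was established when the database cluster was initialized.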
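LIMIT CLAUSE
The LIMIT clause consists of two independent clauses:
LIMIT { count | ALL }
OFFSET start
count specifies the maximum number of rows to return, and start specifies the number of rows to skip before starting to return rows.
LIMIT allows you to retrieve just a portion of the rows that are generated by the rest of the query. If a limit count is given, no more than that many rows will be returned. If an offset is given, that many rows will be skipped before starting to return rows.
When using LIMIT, it is a good idea to use an ORDER BY clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows: you may be asking for the tenth through twentieth rows, but tenth through twentieth in what ordering? You don't know what ordering unless you specify ORDER BY.
The query planner takes LIMIT into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you use for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.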
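DISTINCT CLAUSE
If DISTINCT is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates). ALL specifies the opposite: all rows are kept; that is the default.
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example,
SELECT DISTINCT ON (location) location, time, report FROM weather_reports ORDER BY location, time DESC;
retrieves the most recent weather report for each location. But if we had not used ORDER BY to force descending order of time values for each location, we would have gotten a report from an unpredictable time for each location.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expressions that determine the desired precedence of rows within each DISTINCT ON group.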
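FOR UPDATE CLAUSE
The FOR UPDATE clause has this form:
FOR UPDATE [ OF table_name [, ...] ]
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these rows will be blocked until the current transaction ends. Also, if an UPDATE, DELETE, or SELECT FOR UPDATE from another transaction has already locked a selected row or rows, SELECT FOR UPDATE will wait for the other transaction to complete, and will then lock and return the updated row (or no row, if the row was deleted). For further discussion see Chapter 12 ``Concurrency Control''.
If specific tables are named in FOR UPDATE, then only rows coming from those tables are locked; any other tables used in the SELECT are simply read as usual.
FOR UPDATE cannot be used in contexts where returned rows can't be clearly identified with individual table rows; for example, it can't be used with aggregation.
FOR UPDATE may appear before LIMIT, mainly for compatibility with PostgreSQL versions before 7.3. It effectively executes after LIMIT, however, so that is the recommended place to write it.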
EXAMPLES
To join the table films with the table distributors:
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d, films f
    WHERE f.did = d.did;

       title       | did |     name     | date_prod  |   kind
-------------------+-----+--------------+------------+----------
 The Third Man     | 101 | British Lion | 1949-12-23 | Drama
 The African Queen | 101 | British Lion | 1951-08-11 | Romantic
 ...
To sum the column len of all films and group the results by kind:
SELECT kind, sum(len) AS total FROM films GROUP BY kind;

   kind   | total
----------+-------
 Action   | 07:34
 Comedy   | 02:58
 Drama    | 14:28
 Musical  | 06:42
 Romantic | 04:38
To sum the column len of all films, group the results by kind and show those group totals that are less than 5 hours:
SELECT kind, sum(len) AS total
    FROM films
    GROUP BY kind
    HAVING sum(len) < interval '5 hours';

   kind   | total
----------+-------
 Comedy   | 02:58
 Romantic | 04:38
The following two examples are identical ways of sorting the individual results according to the contents of the second column (name):
SELECT * FROM distributors ORDER BY name;
SELECT * FROM distributors ORDER BY 2;

 did |       name
-----+------------------
 109 | 20th Century Fox
 110 | Bavaria Atelier
 101 | British Lion
 107 | Columbia
 102 | Jean Luc Godard
 113 | Luso films
 104 | Mosfilm
 103 | Paramount
 106 | Toho
 105 | United Artists
 111 | Walt Disney
 112 | Warner Bros.
 108 | Westward
The next example shows how to obtain the union of the tables distributors and actors, restricting the results to those that begin with the letter W in each table. Only distinct rows are wanted, so the key word ALL is omitted.
distributors:               actors:
 did |     name              id |     name
-----+--------------        ----+----------------
 108 | Westward               1 | Woody Allen
 111 | Walt Disney            2 | Warren Beatty
 112 | Warner Bros.           3 | Walter Matthau
 ...                         ...

SELECT distributors.name
    FROM distributors
    WHERE distributors.name LIKE 'W%'
UNION
SELECT actors.name
    FROM actors
    WHERE actors.name LIKE 'W%';

      name
----------------
 Walt Disney
 Walter Matthau
 Warner Bros.
 Warren Beatty
 Westward
 Woody Allen
This example shows how to use a function in the FROM clause, both with and without a column definition list:
CREATE FUNCTION distributors(int) RETURNS SETOF distributors AS '
    SELECT * FROM distributors WHERE did = $1;
' LANGUAGE SQL;

SELECT * FROM distributors(111);

 did |    name
-----+-------------
 111 | Walt Disney

CREATE FUNCTION distributors_2(int) RETURNS SETOF record AS '
    SELECT * FROM distributors WHERE did = $1;
' LANGUAGE SQL;

SELECT * FROM distributors_2(111) AS (f1 int, f2 text);

 f1  |     f2
-----+-------------
 111 | Walt Disney
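COMPATIBILITY
Of course, the SELECT statement is compatible with the SQL standard. But there are some extensions and some missing features.
OMITTED FROM CLAUSES
PostgreSQL allows one to omit the FROM clause. It has a straightforward use to compute the results of simple expressions:
SELECT 2+2;

 ?column?
----------
        4
Some other SQL databases cannot do this except by introducing a dummy one-row table from which to do the SELECT.
A less obvious use is to abbreviate a normal SELECT from one or more tables:
SELECT distributors.* WHERE distributors.name = 'Westward';

 did |   name
-----+----------
 108 | Westward
This works because an implicit FROM item is added for each table that is referenced in other parts of the SELECT statement but not mentioned in FROM.
While this is a convenient shorthand, it is easy to misuse. For example, the command
SELECT distributors.* FROM distributors d;
is probably a mistake; most likely the user meant
SELECT d.* FROM distributors d;
rather than the unconstrained join that is actually obtained:
SELECT distributors.* FROM distributors d, distributors distributors;
To help detect this sort of mistake, PostgreSQL will warn if the implicit-FROM feature is used in a query that also contains an explicit FROM clause. Also, it is possible to disable the implicit-FROM feature by setting the ADD_MISSING_FROM parameter to false.
THE AS KEY WORD
In the SQL standard, the optional key word AS is just noise and can be omitted without affecting the meaning. The PostgreSQL parser requires this key word when renaming output columns, because the type extensibility features lead to parsing ambiguities in this context. AS is optional in FROM items, however.
NAMESPACE AVAILABLE TO GROUP BY AND ORDER BY
In the SQL92 standard, an ORDER BY clause may only use result column names or numbers, while a GROUP BY clause may only use expressions based on input column names. PostgreSQL extends each of these clauses to allow the other choice as well (but it uses the standard's interpretation if there is ambiguity). PostgreSQL also allows both clauses to specify arbitrary expressions. Note that names appearing in an expression will always be taken as input-column names, not as result-column names.
SQL99 uses a slightly different definition which is not upward compatible with SQL92. In most cases, however, PostgreSQL will interpret an ORDER BY or GROUP BY expression the same way SQL99 does.
NONSTANDARD CLAUSES
The clauses DISTINCT ON, LIMIT, and OFFSET are not defined in the SQL standard.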
NAME
SELECT - retrieve rows from a table or view
SYNOPSIS
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
    * | expression [ AS output_name ] [, ...]
    [ FROM from_item [, ...] ]
    [ WHERE condition ]
    [ GROUP BY expression [, ...] ]
    [ HAVING condition [, ...] ]
    [ { UNION | INTERSECT | EXCEPT } [ ALL ] select ]
    [ ORDER BY expression [ ASC | DESC | USING operator ] [, ...] ]
    [ LIMIT { count | ALL } ]
    [ OFFSET start ]
    [ FOR UPDATE [ OF table_name [, ...] ] ]

where from_item can be one of:

    [ ONLY ] table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    ( select ) [ AS ] alias [ ( column_alias [, ...] ) ]
    function_name ( [ argument [, ...] ] ) [ AS ] alias [ ( column_alias [, ...] | column_definition [, ...] ) ]
    function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] )
    from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]
[Comment: FIXME: This last syntax is incorrect if the join type is an INNER or OUTER join (in which case one of NATURAL, ON ..., or USING ... is mandatory, not optional). What's the best way to fix this?]
DESCRIPTION
SELECT retrieves rows from one or more tables. The general processing of SELECT is as follows:
- 1.
- All elements in the FROM list are computed. (Each element in the FROM list is a real or virtual table.) If more than one element is specified in the FROM list, they are cross-joined together. (See FROM Clause [select(7)] below.)
- 2.
- If the WHERE clause is specified, all rows that do not satisfy the condition are eliminated from the output. (See WHERE Clause [select(7)] below.)
- 3.
- If the GROUP BY clause is specified, the output is divided into groups of rows that match on one or more values. If the HAVING clause is present, it eliminates groups that do not satisfy the given condition. (See GROUP BY Clause [select(7)] and HAVING Clause [select(7)] below.)
- 4.
- Using the operators UNION, INTERSECT, and EXCEPT, the output of more than one SELECT statement can be combined to form a single result set. The UNION operator returns all rows that are in one or both of the result sets. The INTERSECT operator returns all rows that are strictly in both result sets. The EXCEPT operator returns the rows that are in the first result set but not in the second. In all three cases, duplicate rows are eliminated unless ALL is specified. (See UNION Clause [select(7)], INTERSECT Clause [select(7)], and EXCEPT Clause [select(7)] below.)
- 5.
- The actual output rows are computed using the SELECT output expressions for each selected row. (See SELECT List [select(7)] below.)
- 6.
- If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce. (See ORDER BY Clause [select(7)] below.)
- 7.
- If the LIMIT or OFFSET clause is specified, the SELECT statement only returns a subset of the result rows. (See LIMIT Clause [select(7)] below.)
- 8.
- DISTINCT eliminates duplicate rows from the result. DISTINCT ON eliminates rows that match on all the specified expressions. ALL (the default) will return all candidate rows, including duplicates. (See DISTINCT Clause [select(7)] below.)
- 9.
- The FOR UPDATE clause causes the SELECT statement to lock the selected rows against concurrent updates. (See FOR UPDATE Clause [select(7)] below.)
You must have SELECT privilege on a table to read its values. The use of FOR UPDATE requires UPDATE privilege as well.
PARAMETERS
FROM CLAUSE
The FROM clause specifies one or more source tables for the SELECT. If multiple sources are specified, the result is the Cartesian product (cross join) of all the sources. But usually qualification conditions are added to restrict the returned rows to a small subset of the Cartesian product.
FROM-clause elements can contain:
- table_name
- The name (optionally schema-qualified) of an existing table or view. If ONLY is specified, only that table is scanned. If ONLY is not specified, the table and all its descendant tables (if any) are scanned. * can be appended to the table name to indicate that descendant tables are to be scanned, but in the current version, this is the default behavior. (In releases before 7.1, ONLY was the default behavior.) The default behavior can be modified by changing the sql_inheritance configuration option.
- alias
- A substitute name for the FROM item containing the alias. An alias is used for brevity or to eliminate ambiguity for self-joins (where the same table is scanned multiple times). When an alias is provided, it completely hides the actual name of the table or function; for example given FROM foo AS f, the remainder of the SELECT must refer to this FROM item as f not foo. If an alias is written, a column alias list can also be written to provide substitute names for one or more columns of the table.
- select
- A sub-SELECT can appear in the FROM clause. This acts as though its output were created as a temporary table for the duration of this single SELECT command. Note that the sub-SELECT must be surrounded by parentheses, and an alias must be provided for it.
- function_name
- Function calls can appear in the FROM clause. (This is especially useful for functions that return result sets, but any function can be used.) This acts as though its output were created as a temporary table for the duration of this single SELECT command. An alias may also be used. If an alias is written, a column alias list can also be written to provide substitute names for one or more attributes of the function's composite return type. If the function has been defined as returning the record data type, then an alias or the key word AS must be present, followed by a column definition list in the form ( column_name data_type [, ... ] ). The column definition list must match the actual number and types of columns returned by the function.
- join_type
- One of
- [ INNER ] JOIN
- LEFT [ OUTER ] JOIN
- RIGHT [ OUTER ] JOIN
- FULL [ OUTER ] JOIN
- CROSS JOIN
For the INNER and OUTER join types, a join condition must be specified, namely exactly one of NATURAL, ON join_condition, or USING (join_column [, ...]). See below for the meaning. For CROSS JOIN, none of these clauses may appear.
A JOIN clause combines two FROM items. (Use parentheses if necessary to determine the order of nesting.)
CROSS JOIN and INNER JOIN produce a simple Cartesian product, the same as you get from listing the two items at the top level of FROM. CROSS JOIN is equivalent to INNER JOIN ON (true), that is, no rows are removed by qualification. These join types are just a notational convenience, since they do nothing you couldn't do with plain FROM and WHERE.
LEFT OUTER JOIN returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition. This left-hand row is extended to the full width of the joined table by inserting null values for the right-hand columns. Note that only the JOIN clause's own condition is considered while deciding which rows have matches. Outer conditions are applied afterwards.
Conversely, RIGHT OUTER JOIN returns all the joined rows, plus one row for each unmatched right-hand row (extended with nulls on the left). This is just a notational convenience, since you could convert it to a LEFT OUTER JOIN by switching the left and right inputs.
FULL OUTER JOIN returns all the joined rows, plus one row for each unmatched left-hand row (extended with nulls on the right), plus one row for each unmatched right-hand row (extended with nulls on the left).
- ON join_condition
- join_condition is an expression resulting in a value of type boolean (similar to a WHERE clause) that specifies which rows in a join are considered to match.
- USING (join_column [, ...])
- A clause of the form USING ( a, b, ... ) is shorthand for ON left_table.a = right_table.a AND left_table.b = right_table.b .... Also, USING implies that only one of each pair of equivalent columns will be included in the join output, not both.
- NATURAL
- NATURAL is shorthand for a USING list that mentions all columns in the two tables that have the same names.
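The difference between the join types can be sketched with a small runnable example. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a PostgreSQL server, and the table contents are hypothetical sample rows, not data from this page.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for PostgreSQL; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE distributors (did INTEGER, name TEXT);
    CREATE TABLE films (title TEXT, did INTEGER);
    INSERT INTO distributors VALUES (101, 'British Lion'), (102, 'Jean Luc Godard');
    INSERT INTO films VALUES ('The Third Man', 101), ('The African Queen', 101);
""")

# INNER JOIN keeps only the combined rows that pass the join condition.
inner_rows = conn.execute("""
    SELECT d.name, f.title
    FROM distributors d INNER JOIN films f ON d.did = f.did
    ORDER BY d.name, f.title
""").fetchall()

# LEFT OUTER JOIN also keeps the unmatched left-hand row, extended with nulls.
left_rows = conn.execute("""
    SELECT d.name, f.title
    FROM distributors d LEFT OUTER JOIN films f ON d.did = f.did
    ORDER BY d.name, f.title
""").fetchall()

# USING (did) is shorthand for ON d.did = f.did; did appears once in the output.
using_cols = [col[0] for col in conn.execute(
    "SELECT * FROM distributors JOIN films USING (did)").description]

print(inner_rows)
print(left_rows)
print(using_cols)
```

The distributor without films shows up only in the outer join, with None (null) in place of the film title.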
WHERE CLAUSE
The optional WHERE clause has the general form
WHERE condition
where condition is any expression that evaluates to a result of type boolean. Any row that does not satisfy this condition will be eliminated from the output. A row satisfies the condition if it returns true when the actual row values are substituted for any variable references.
GROUP BY CLAUSE
The optional GROUP BY clause has the general form
GROUP BY expression [, ...]
GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions. expression can be an input column name, or the name or ordinal number of an output column (SELECT list), or it can be an arbitrary expression formed from input-column values. In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name.
Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without GROUP BY, an aggregate produces a single value computed across all the selected rows). When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
HAVING CLAUSE
The optional HAVING clause has the general form
HAVING condition
where condition is the same as specified for the WHERE clause.
HAVING eliminates group rows that do not satisfy the condition. HAVING is different from WHERE: WHERE filters individual rows before the application of GROUP BY, while HAVING filters group rows created by GROUP BY. Each column referenced in condition must unambiguously reference a grouping column, unless the reference appears within an aggregate function.
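As a rough illustration of WHERE filtering rows before grouping and HAVING filtering groups afterwards, here is a minimal sketch. It assumes Python's built-in sqlite3 module in place of PostgreSQL, a hypothetical len_minutes integer column instead of an interval, and invented film rows.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE films (title TEXT, kind TEXT, len_minutes INTEGER);
    INSERT INTO films VALUES
        ('A', 'Drama', 100), ('B', 'Drama', 120),
        ('C', 'Comedy', 90), ('D', 'Comedy', 88),
        ('E', 'Short', 10);
""")

rows = conn.execute("""
    SELECT kind, sum(len_minutes) AS total
    FROM films
    WHERE len_minutes >= 60          -- filters individual rows before grouping
    GROUP BY kind
    HAVING sum(len_minutes) < 200    -- filters whole groups after grouping
    ORDER BY kind
""").fetchall()

print(rows)
```

The 'Short' row never reaches a group because WHERE removes it first; the 'Drama' group is formed (total 220) and then discarded by HAVING.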
UNION CLAUSE
The UNION clause has this general form:
select_statement UNION [ ALL ] select_statement
select_statement is any SELECT statement without an ORDER BY, LIMIT, or FOR UPDATE clause. (ORDER BY and LIMIT can be attached to a subexpression if it is enclosed in parentheses. Without parentheses, these clauses will be taken to apply to the result of the UNION, not to its right-hand input expression.)
The UNION operator computes the set union of the rows returned by the involved SELECT statements. A row is in the set union of two result sets if it appears in at least one of the result sets. The two SELECT statements that represent the direct operands of the UNION must produce the same number of columns, and corresponding columns must be of compatible data types.
The result of UNION does not contain any duplicate rows unless the ALL option is specified. ALL prevents elimination of duplicates.
Multiple UNION operators in the same SELECT statement are evaluated left to right, unless otherwise indicated by parentheses.
Currently, FOR UPDATE may not be specified either for a UNION result or for the inputs of UNION.
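The effect of ALL on duplicate elimination can be seen in a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL and uses hypothetical one-column tables; the shared query template is just a convenience for this illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; table contents are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE distributors (name TEXT);
    CREATE TABLE actors (name TEXT);
    INSERT INTO distributors VALUES ('Walt Disney'), ('Westward');
    INSERT INTO actors VALUES ('Walt Disney'), ('Woody Allen');
""")

# Without parentheses, ORDER BY applies to the result of the whole UNION.
query = "SELECT name FROM distributors {op} SELECT name FROM actors ORDER BY name"

union_rows = [row[0] for row in conn.execute(query.format(op="UNION"))]
union_all_rows = [row[0] for row in conn.execute(query.format(op="UNION ALL"))]

print(union_rows)      # duplicates removed
print(union_all_rows)  # duplicates kept
```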
INTERSECT CLAUSE
The INTERSECT clause has this general form:
select_statement INTERSECT [ ALL ] select_statement
select_statement is any SELECT statement without an ORDER BY, LIMIT, or FOR UPDATE clause.
The INTERSECT operator computes the set intersection of the rows returned by the involved SELECT statements. A row is in the intersection of two result sets if it appears in both result sets.
The result of INTERSECT does not contain any duplicate rows unless the ALL option is specified. With ALL, a row that has m duplicates in the left table and n duplicates in the right table will appear min(m,n) times in the result set.
Multiple INTERSECT operators in the same SELECT statement are evaluated left to right, unless parentheses dictate otherwise. INTERSECT binds more tightly than UNION. That is, A UNION B INTERSECT C will be read as A UNION (B INTERSECT C).
EXCEPT CLAUSE
The EXCEPT clause has this general form:
select_statement EXCEPT [ ALL ] select_statement
select_statement is any SELECT statement without an ORDER BY, LIMIT, or FOR UPDATE clause.
The EXCEPT operator computes the set of rows that are in the result of the left SELECT statement but not in the result of the right one.
The result of EXCEPT does not contain any duplicate rows unless the ALL option is specified. With ALL, a row that has m duplicates in the left table and n duplicates in the right table will appear max(m-n,0) times in the result set.
Multiple EXCEPT operators in the same SELECT statement are evaluated left to right, unless parentheses dictate otherwise. EXCEPT binds at the same level as UNION.
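The multiplicity rules for INTERSECT ALL and EXCEPT ALL, and the precedence of INTERSECT over UNION, can be modelled in plain Python. This is an analogy rather than SQL: collections.Counter plays the role of a result set with duplicates, Python sets play the role of duplicate-free result sets, and all values are hypothetical.

```python
from collections import Counter

# A Counter maps each row to its number of duplicates.
left = Counter({"a": 3, "b": 1})    # row "a" occurs m = 3 times on the left
right = Counter({"a": 2, "c": 4})   # row "a" occurs n = 2 times on the right

intersect_all = left & right        # keeps min(m, n) copies of each row
except_all = left - right           # keeps max(m - n, 0) copies of each row

# Without ALL, duplicates are eliminated, so plain sets model the operators.
A, B, C = {1, 2}, {2, 3}, {3, 4}

# Python's & binds more tightly than |, mirroring INTERSECT over UNION:
# A UNION B INTERSECT C is read as A UNION (B INTERSECT C).
union_intersect = A | B & C
forced_left_to_right = (A | B) & C  # what explicit parentheses would give

print(intersect_all, except_all)
print(union_intersect, forced_left_to_right)
```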
SELECT LIST
The SELECT list (between the key words SELECT and FROM) specifies expressions that form the output rows of the SELECT statement. The expressions can (and usually do) refer to columns computed in the FROM clause. Using the clause AS output_name, another name can be specified for an output column. This name is primarily used to label the column for display. It can also be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.
Instead of an expression, * can be written in the output list as a shorthand for all the columns of the selected rows. Also, one can write table_name.* as a shorthand for the columns coming from just that table.
ORDER BY CLAUSE
The optional ORDER BY clause has this general form:
ORDER BY expression [ ASC | DESC | USING operator ] [, ...]
expression can be the name or ordinal number of an output column (SELECT list), or it can be an arbitrary expression formed from input-column values.
The ORDER BY clause causes the result rows to be sorted according to the specified expressions. If two rows are equal according to the leftmost expression, they are compared according to the next expression, and so on. If they are equal according to all specified expressions, they are returned in an unspecified, implementation-dependent order.
The ordinal number refers to the ordinal (left-to-right) position of the result column. This feature makes it possible to define an ordering on the basis of a column that does not have a unique name. This is never absolutely necessary because it is always possible to assign a name to a result column using the AS clause.
It is also possible to use arbitrary expressions in the ORDER BY clause, including columns that do not appear in the SELECT result list. Thus the following statement is valid:
SELECT name FROM distributors ORDER BY code;
A limitation of this feature is that an ORDER BY clause applying to the result of a UNION, INTERSECT, or EXCEPT clause may only specify an output column name or number, not an expression.
If an ORDER BY expression is a simple name that matches both a result column name and an input column name, ORDER BY will interpret it as the result column name. This is the opposite of the choice that GROUP BY will make in the same situation. This inconsistency is made to be compatible with the SQL standard.
Optionally one may add the key word ASC (ascending) or DESC (descending) after each expression in the ORDER BY clause. If not specified, ASC is assumed by default. Alternatively, a specific ordering operator name may be specified in the USING clause. ASC is usually equivalent to USING < and DESC is usually equivalent to USING >. (But the creator of a user-defined data type can define exactly what the default sort ordering is, and it might correspond to operators with other names.)
The null value sorts higher than any other value. In other words, with ascending sort order, null values sort at the end, and with descending sort order, null values sort at the beginning.
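The null ordering described above can be modelled in plain Python, with None standing in for the null value. This sketch only mimics the rule stated on this page; it does not query a database, and the helper name null_high_key is made up for the illustration.

```python
values = [3, None, 1, None, 2]

def null_high_key(value):
    # (False, v) sorts before (True, None), so None ranks above every value.
    return (value is None, value)

ascending = sorted(values, key=null_high_key)                 # nulls at the end
descending = sorted(values, key=null_high_key, reverse=True)  # nulls at the start

print(ascending)
print(descending)
```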
Character-string data is sorted according to the locale-specific collation order that was established when the database cluster was initialized.
LIMIT CLAUSE
The LIMIT clause consists of two independent clauses:
LIMIT { count | ALL }
OFFSET start
count specifies the maximum number of rows to return, and start specifies the number of rows to skip before starting to return rows.
When using LIMIT, it is a good idea to use an ORDER BY clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows---you may be asking for the tenth through twentieth rows, but tenth through twentieth in what ordering? You don't know what ordering unless you specify ORDER BY.
The query planner takes LIMIT into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you use for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.
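A paging sketch shows why a unique ORDER BY matters when different LIMIT/OFFSET values are combined. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL; the page helper and its parameters are hypothetical, and the rows are sample data.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; rows are sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distributors (did INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO distributors VALUES (?, ?)",
    [(108, "Westward"), (101, "British Lion"), (107, "Columbia"),
     (106, "Toho"), (104, "Mosfilm")],
)

def page(page_number, page_size=2):
    # ORDER BY did imposes a unique order, so pages neither overlap nor skip rows.
    return conn.execute(
        "SELECT did, name FROM distributors ORDER BY did LIMIT ? OFFSET ?",
        (page_size, page_number * page_size),
    ).fetchall()

first, second, third = page(0), page(1), page(2)
print(first, second, third)
```

Dropping the ORDER BY from the query would leave the contents of each page up to whatever order the system happens to produce.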
DISTINCT CLAUSE
If DISTINCT is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates). ALL specifies the opposite: all rows are kept; that is the default.
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the ``first row'' of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example,
SELECT DISTINCT ON (location) location, time, report FROM weather_reports ORDER BY location, time DESC;
retrieves the most recent weather report for each location. But if we had not used ORDER BY to force descending order of time values for each location, we'd have gotten a report from an unpredictable time for each location.
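The "first row of each set" rule can be modelled in plain Python: sort the rows as ORDER BY would, then keep the first row of each group. This is a sketch of the semantics only, not how the server executes the query, and the weather rows are invented.

```python
from itertools import groupby

# Hypothetical rows: (location, time, report)
weather_reports = [
    ("Berlin", "2003-01-01 08:00", "fog"),
    ("Paris", "2003-01-01 09:00", "rain"),
    ("Berlin", "2003-01-02 08:00", "snow"),
    ("Paris", "2003-01-01 07:00", "clear"),
]

# ORDER BY location, time DESC: sort by time descending first, then apply a
# stable sort by location so each location's newest report comes first.
ordered = sorted(weather_reports, key=lambda row: row[1], reverse=True)
ordered = sorted(ordered, key=lambda row: row[0])

# DISTINCT ON (location): keep only the first row of each location group.
latest = [next(group) for _, group in groupby(ordered, key=lambda row: row[0])]

print(latest)
```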
FOR UPDATE CLAUSE
The FOR UPDATE clause has this form:
FOR UPDATE [ OF table_name [, ...] ]
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these rows will be blocked until the current transaction ends. Also, if an UPDATE, DELETE, or SELECT FOR UPDATE from another transaction has already locked a selected row or rows, SELECT FOR UPDATE will wait for the other transaction to complete, and will then lock and return the updated row (or no row, if the row was deleted). For further discussion see the chapter called ``Concurrency Control'' in the documentation.
If specific tables are named in FOR UPDATE, then only rows coming from those tables are locked; any other tables used in the SELECT are simply read as usual.
FOR UPDATE cannot be used in contexts where returned rows can't be clearly identified with individual table rows; for example it can't be used with aggregation.
FOR UPDATE may appear before LIMIT for compatibility with PostgreSQL versions before 7.3. It effectively executes after LIMIT, however, and so that is the recommended place to write it.
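The locking behavior described in this section can be sketched as follows. The example assumes the distributors and films tables from the EXAMPLES section; the new name 'Westward Films' is a hypothetical value chosen only for illustration.

```sql
BEGIN;

-- Lock the selected row. Other transactions that try to UPDATE,
-- DELETE, or SELECT FOR UPDATE this row wait until this
-- transaction ends.
SELECT did, name FROM distributors WHERE did = 108 FOR UPDATE;

-- The row cannot have been changed by another transaction since
-- the SELECT above, so it is safe to act on the values just read.
UPDATE distributors SET name = 'Westward Films' WHERE did = 108;

COMMIT;

-- Lock only the rows coming from films; the rows of distributors
-- are read as usual. Run this inside a transaction block as well,
-- since the locks are released when the transaction ends.
SELECT films.title, distributors.name
    FROM films, distributors
    WHERE films.did = distributors.did
    FOR UPDATE OF films;
```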
EXAMPLES
To join the table films with the table distributors:
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d, films f
    WHERE f.did = d.did;

       title       | did |     name     | date_prod  |   kind
-------------------+-----+--------------+------------+----------
 The Third Man     | 101 | British Lion | 1949-12-23 | Drama
 The African Queen | 101 | British Lion | 1951-08-11 | Romantic
 ...
To sum the column len of all films and group the results by kind:
SELECT kind, sum(len) AS total FROM films GROUP BY kind;

   kind   | total
----------+-------
 Action   | 07:34
 Comedy   | 02:58
 Drama    | 14:28
 Musical  | 06:42
 Romantic | 04:38
To sum the column len of all films, group the results by kind and show those group totals that are less than 5 hours:
SELECT kind, sum(len) AS total
    FROM films
    GROUP BY kind
    HAVING sum(len) < interval '5 hours';

   kind   | total
----------+-------
 Comedy   | 02:58
 Romantic | 04:38
The following two examples are equivalent ways of sorting the individual results according to the contents of the second column (name):
SELECT * FROM distributors ORDER BY name;
SELECT * FROM distributors ORDER BY 2;

 did |       name
-----+------------------
 109 | 20th Century Fox
 110 | Bavaria Atelier
 101 | British Lion
 107 | Columbia
 102 | Jean Luc Godard
 113 | Luso films
 104 | Mosfilm
 103 | Paramount
 106 | Toho
 105 | United Artists
 111 | Walt Disney
 112 | Warner Bros.
 108 | Westward
This example shows how to obtain the union of the tables distributors and actors, restricting the results to those that begin with letter W in each table. Only distinct rows are wanted, so the key word ALL is omitted.
distributors:               actors:
 did |     name              id |     name
-----+--------------        ----+----------------
 108 | Westward               1 | Woody Allen
 111 | Walt Disney            2 | Warren Beatty
 112 | Warner Bros.           3 | Walter Matthau
 ...                         ...

SELECT distributors.name
    FROM distributors
    WHERE distributors.name LIKE 'W%'
UNION
SELECT actors.name
    FROM actors
    WHERE actors.name LIKE 'W%';

      name
----------------
 Walt Disney
 Walter Matthau
 Warner Bros.
 Warren Beatty
 Westward
 Woody Allen
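For contrast, here is a sketch of the same query with the key word ALL added. It uses the same distributors and actors tables as the example above. With ALL, duplicate rows are not removed, so a name that occurs in both tables, or more than once in one of them, is returned once per occurrence.

```sql
-- Same query as above, but UNION ALL keeps duplicate rows
-- instead of removing them.
SELECT distributors.name
    FROM distributors
    WHERE distributors.name LIKE 'W%'
UNION ALL
SELECT actors.name
    FROM actors
    WHERE actors.name LIKE 'W%';
```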
This example shows how to use a function in the FROM clause, both with and without a column definition list.
CREATE FUNCTION distributors(int) RETURNS SETOF distributors AS '
    SELECT * FROM distributors WHERE did = $1;
' LANGUAGE SQL;

SELECT * FROM distributors(111);

 did |    name
-----+-------------
 111 | Walt Disney

CREATE FUNCTION distributors_2(int) RETURNS SETOF record AS '
    SELECT * FROM distributors WHERE did = $1;
' LANGUAGE SQL;

SELECT * FROM distributors_2(111) AS (f1 int, f2 text);

 f1  |     f2
-----+-------------
 111 | Walt Disney
COMPATIBILITY
The SELECT statement is compatible with the SQL standard, but there are some extensions and some missing features.
OMITTED FROM CLAUSES
PostgreSQL allows the FROM clause to be omitted. The most straightforward use of this is to compute the results of simple expressions:
SELECT 2+2;

 ?column?
----------
        4
Some other SQL databases cannot do this except by introducing a dummy one-row table from which to do the SELECT.
A less obvious use is to abbreviate a normal SELECT from tables:
SELECT distributors.* WHERE distributors.name = 'Westward';

 did |   name
-----+----------
 108 | Westward
This works because an implicit FROM item is added for each table that is referenced in other parts of the SELECT statement but not mentioned in FROM.
While this is a convenient shorthand, it's easy to misuse. For example, the command
SELECT distributors.* FROM distributors d;
is probably a mistake; most likely the user meant
SELECT d.* FROM distributors d;
rather than the unconstrained join
SELECT distributors.* FROM distributors d, distributors distributors;
that is actually executed. To help detect this sort of mistake, PostgreSQL will warn if the implicit-FROM feature is used in a SELECT statement that also contains an explicit FROM clause. Also, it is possible to disable the implicit-FROM feature by setting the ADD_MISSING_FROM parameter to false.
THE AS KEY WORD
In the SQL standard, the optional key word AS is just noise and can be omitted without affecting the meaning. The PostgreSQL parser requires this key word when renaming output columns because the type extensibility features lead to parsing ambiguities without it. AS is optional in FROM items, however.
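A short sketch of the two cases, assuming the distributors table from the EXAMPLES section. The output names id and distributor and the alias d are illustrative choices.

```sql
-- Output columns: AS is written before each new column name,
-- as the parser behavior described above requires.
SELECT did AS id, name AS distributor FROM distributors;

-- FROM items: AS is optional, so these two queries are equivalent.
SELECT d.name FROM distributors AS d;
SELECT d.name FROM distributors d;
```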
NAMESPACE AVAILABLE TO GROUP BY AND ORDER BY
In the SQL92 standard, an ORDER BY clause may only use result column names or numbers, while a GROUP BY clause may only use expressions based on input column names. PostgreSQL extends each of these clauses to allow the other choice as well (but it uses the standard's interpretation if there is ambiguity). PostgreSQL also allows both clauses to specify arbitrary expressions. Note that names appearing in an expression will always be taken as input-column names, not as result-column names.
SQL99 uses a slightly different definition which is not upward compatible with SQL92. In most cases, however, PostgreSQL will interpret an ORDER BY or GROUP BY expression the same way SQL99 does.
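The name-resolution rules in this section can be sketched as follows. The queries assume the films and distributors tables from the EXAMPLES section, and further assume that films has no input column named category (otherwise the input column would take precedence in GROUP BY, per the standard's interpretation).

```sql
-- SQL92 style: GROUP BY an input column name,
-- ORDER BY a result column name.
SELECT kind, sum(len) AS total
    FROM films
    GROUP BY kind
    ORDER BY total;

-- PostgreSQL extension: GROUP BY a result column name (category),
-- combined with ORDER BY a result column number.
SELECT kind AS category, sum(len) AS total
    FROM films
    GROUP BY category
    ORDER BY 2;

-- Arbitrary expression in ORDER BY: names inside the expression
-- are taken as input column names, so the input column name
-- is used here, not the result column name n.
SELECT name AS n
    FROM distributors
    ORDER BY upper(name);
```

Writing ORDER BY upper(n) in the last query would be rejected, because n is a result column name and distributors has no input column of that name.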
NONSTANDARD CLAUSES
The clauses DISTINCT ON, LIMIT, and OFFSET are not defined in the SQL standard.