示例数据
下面是一个用于将数据加载到其中的表。
CREATE TABLE weather_data (
station text,
temps text
);
对于此示例,直接 INSERT 数据会更容易。PostgreSQL 的COPY功能也能以类似的方式工作。
INSERT INTO weather_data VALUES
('Station North','-1,-4,-14,-15,-16,-15,-12,-9,-3,0,1,2'),
('Station West','2,4,5,6,9,10,15,16,13,12,10,9,5,3,1'),
('Station East','5,3,2,4,5,6,9,10,15,16,13,12,10,9,5,4,2,1'),
('Station South','12,18,22,25,29,30,33,31,30,29,28,25,24,23,14');
将字符串转换为数组
有了表中的数据,下一个问题是:如何处理那个用逗号分隔的温度列表字符串?首先,通过使用string_to_array(string, separator)函数,将其转换为数组以使其更可用。
拆分为数组;
SELECT
station,
string_to_array(temps,',') AS array
FROM weather_data;
查询结果;
station | array
---------------+------------------------------------------------
Station North | {-1,-4,-14,-15,-16,-15,-12,-9,-3,0,1,2}
Station West | {2,4,5,6,9,10,15,16,13,12,10,9,5,3,1}
Station East | {5,3,2,4,5,6,9,10,15,16,13,12,10,9,5,4,2,1}
Station South | {12,18,22,25,29,30,33,31,30,29,28,25,24,23,14}
使用数组而不是字符串看起来并没有更有用,但我们可以通过对数据执行 “仅数组类” 的操作(例如返回数组长度),来证明我们现在实际上拥有了结构化数据。
拆分为数组,再分析数组;
SELECT
station,
cardinality(string_to_array(temps,',')) AS array_size
FROM weather_data;
查询结果;
station | array_size
---------------+------------
Station North | 12
Station West | 15
Station East | 18
Station South | 15
展开和分析数组
然而,到目前为止,使用这样的数组最有趣的就是对它使用unnest(array)!unnest(array)函数是一个 “返回集合的函数”,这意味着它可以返回多行。这是如何运作的?传入行的所有其他部分都是重复的,因此每一行都有一个完整的数据集合,如下所示。
拆分为数组,再展开;
SELECT
station,
unnest(string_to_array(temps,',')) AS temps
FROM weather_data;
查询结果;
station | temps
---------------+-------
Station North | -1
Station North | -4
Station North | -14
Station North | -15
Station North | -16
Station North | -15
Station North | -12
Station North | -9
Station North | -3
Station North | 0
Station North | 1
Station North | 2
Station West | 2
Station West | 4
Station West | 5
Station West | 6
Station West | 9
Station West | 10
Station West | 15
Station West | 16
Station West | 13
Station West | 12
Station West | 10
Station West | 9
Station West | 5
Station West | 3
Station West | 1
Station East | 5
Station East | 3
Station East | 2
Station East | 4
Station East | 5
Station East | 6
Station East | 9
Station East | 10
Station East | 15
Station East | 16
Station East | 13
Station East | 12
Station East | 10
Station East | 9
Station East | 5
Station East | 4
Station East | 2
Station East | 1
Station South | 12
Station South | 18
Station South | 22
Station South | 25
Station South | 29
Station South | 30
Station South | 33
Station South | 31
Station South | 30
Station South | 29
Station South | 28
Station South | 25
Station South | 24
Station South | 23
Station South | 14
现在的数据看起来很像我们在标准数据模型中将表连接在一起而获得的数据,我们现在实际上可以进行标准的分析工作了,例如计算每个站点的温度范围。
拆分为数组,展开并分析温度;
WITH unnested_data AS (
SELECT
station,
unnest(string_to_array(temps,',')) AS temps
FROM weather_data
)
SELECT
station,
max(temps) AS max_temp,
min(temps) AS min_temp
FROM unnested_data
GROUP BY station;
查询结果;
station | max_temp | min_temp
---------------+----------+----------
Station North | 2 | -1
Station West | 9 | 1
Station East | 9 | 1
Station South | 33 | 12
(4 rows)
最后,为了完整起见,如果您想将关联的表保存在字符串中,但又不喜欢逗号,以下是使用新分隔符拆分和重新连接数据的方法。
拆分为数组,再联接为字符串;
SELECT
station,
array_to_string(string_to_array(temps,','),'|') AS temps
FROM weather_data;
查询结果;
station | temps
---------------+----------------------------------------------
Station North | -1|-4|-14|-15|-16|-15|-12|-9|-3|0|1|2
Station West | 2|4|5|6|9|10|15|16|13|12|10|9|5|3|1
Station East | 5|3|2|4|5|6|9|10|15|16|13|12|10|9|5|4|2|1
Station South | 12|18|22|25|29|30|33|31|30|29|28|25|24|23|14
(4 rows)