Oracle 的间隔分区，PostgreSQL 已经完全 Get 到了！-51CTO.COM

介绍

使用更智能的数据保留策略，您每个月都可以节省数据库上面的成本。分区的一个主要原因，也是一个巨大的好处是，使用它来自动归档您的数据。例如，您可能有一个巨大的日志表。出于业务需要，您需要将此数据保留 30 天。此表会随着时间的推移而不断增长，保留所有数据会使数据库维护具有挑战性。使用基于时间的分区，您可以简单地将超过 30 天的数据进行归档。

大多数关系数据库的性质，意味着删除大量数据的效率可能非常低，并且该空间（如果有的话）不会立即返回给文件系统。在运行正常的删除操作时，PostgreSQL 不会将其保留的空间返回给文件系统，除非是在以下特殊的条件下：

1. 关系表末尾的页面被完全清空

2. 针对关系运行 VACUUM FULL/CLUSTER（排他性锁定关系，直到完成）

如果你发现自己需要更及时地回收空间，或者没有侵入性锁定，那么分区可以提供一种更简单的方法，来删除旧数据：删除表。删除的动作几乎是即时的（除非有任何事务锁定了表），并会立即将空间返回给文件系统。pg_partman 是用于分区的 PostgreSQL 扩展，它提供了一种非常简单的方法，来管理基于时间和整数的分区。

pg_partman 按日分区的示例

pg_partman 最近发布了 5.1 版本，其中包括的新功能有，例如单值整数的列表分区、受控维护性运行排序，和对数值型分区的实验性支持。此新版本还包括几个错误修复，因此请尽可能更新到最新版本！所有示例均使用此最新版本完成。

首先，让我们获得一个简单的、基于时间的按日分区集

CREATE TABLE public.time_stuff
    (col1 int
        , col2 text default 'stuff'
        , col3 timestamptz NOT NULL DEFAULT now() )
    PARTITION BY RANGE (col3);

SELECT partman.create_parent('public.time_stuff', 'col3', '1 day');

\d+ time_stuff
                                             Partitioned table "public.time_stuff"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Partitions: time_stuff_p20240408 FOR VALUES FROM ('2024-04-08 00:00:00-04') TO ('2024-04-09 00:00:00-04'),
            time_stuff_p20240409 FOR VALUES FROM ('2024-04-09 00:00:00-04') TO ('2024-04-10 00:00:00-04'),
            time_stuff_p20240410 FOR VALUES FROM ('2024-04-10 00:00:00-04') TO ('2024-04-11 00:00:00-04'),
            time_stuff_p20240411 FOR VALUES FROM ('2024-04-11 00:00:00-04') TO ('2024-04-12 00:00:00-04'),
            time_stuff_p20240412 FOR VALUES FROM ('2024-04-12 00:00:00-04') TO ('2024-04-13 00:00:00-04'),
            time_stuff_p20240413 FOR VALUES FROM ('2024-04-13 00:00:00-04') TO ('2024-04-14 00:00:00-04'),
            time_stuff_p20240414 FOR VALUES FROM ('2024-04-14 00:00:00-04') TO ('2024-04-15 00:00:00-04'),
            time_stuff_p20240415 FOR VALUES FROM ('2024-04-15 00:00:00-04') TO ('2024-04-16 00:00:00-04'),
            time_stuff_p20240416 FOR VALUES FROM ('2024-04-16 00:00:00-04') TO ('2024-04-17 00:00:00-04'),
            time_stuff_default DEFAULT

设置数据保留策略

此分区集创建于 2024 年 4 月 12 日，因此默认设置将在之前创建 4 个分区，之后创建 4 个分区。要配置保留期的第一个设置，也是唯一需要的设置，是part_config表中的retention列。在此示例中，我们将保留期设置为 2 天。我们还将增加 premake 值，只是为了查看正常维护是否正常工作。

UPDATE partman.part_config SET retention = '2 days', premake = 6 WHERE parent_table = 'public.time_stuff';

默认情况下，如果分区集中没有数据，pg_partman 也不会创建新的子表，因此让我们也添加一些数据。

INSERT INTO public.time_stuff (col1, col3)
VALUES (generate_series(1,10), CURRENT_TIMESTAMP);

SELECT * FROM partman.part_config
WHERE parent_table = 'public.time_stuff';

-[ RECORD 1 ]--------------+-----------------------------------
parent_table               | public.time_stuff
control                    | col3
partition_interval         | 1 day
partition_type             | range
premake                    | 6
automatic_maintenance      | on
template_table             | partman.template_public_time_stuff
retention                  | 2 days
retention_schema           |
retention_keep_index       | t
retention_keep_table       | t
epoch                      | none
constraint_cols            |
optimize_constraint        | 30
infinite_time_partitions   | f
datetime_string            | YYYYMMDD
jobmon                     | t
sub_partition_set_full     | f
undo_in_progress           | f
inherit_privileges         | f
constraint_valid           | t
ignore_default_data        | t
default_table              | t
date_trunc_interval        |
maintenance_order          |
retention_keep_publication | f
maintenance_last_run       |

在 pg_partman 中，数据保留管理是在创建新分区的同时处理的。因此，一个简单的调用run_maintenance_proc()，将同时处理两者。

CALL partman.run_maintenance_proc();

\d+ time_stuff
                                             Partitioned table "public.time_stuff"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Partitions: time_stuff_p20240410 FOR VALUES FROM ('2024-04-10 00:00:00-04') TO ('2024-04-11 00:00:00-04'),
            time_stuff_p20240411 FOR VALUES FROM ('2024-04-11 00:00:00-04') TO ('2024-04-12 00:00:00-04'),
            time_stuff_p20240412 FOR VALUES FROM ('2024-04-12 00:00:00-04') TO ('2024-04-13 00:00:00-04'),
            time_stuff_p20240413 FOR VALUES FROM ('2024-04-13 00:00:00-04') TO ('2024-04-14 00:00:00-04'),
            time_stuff_p20240414 FOR VALUES FROM ('2024-04-14 00:00:00-04') TO ('2024-04-15 00:00:00-04'),
            time_stuff_p20240415 FOR VALUES FROM ('2024-04-15 00:00:00-04') TO ('2024-04-16 00:00:00-04'),
            time_stuff_p20240416 FOR VALUES FROM ('2024-04-16 00:00:00-04') TO ('2024-04-17 00:00:00-04'),
            time_stuff_p20240417 FOR VALUES FROM ('2024-04-17 00:00:00-04') TO ('2024-04-18 00:00:00-04'),
            time_stuff_p20240418 FOR VALUES FROM ('2024-04-18 00:00:00-04') TO ('2024-04-19 00:00:00-04'),
            time_stuff_default DEFAULT

现在，您可以看到 2 天前的两个分区已被删除，并且已创建两个新分区以包含后 6 天的数据。pg_partman 中还有一些其他更高级的保留选项。您将在上面看到，默认情况下，retention_keep_table选项设置为 true。这意味着，虽然子表不再是保留集的一部分，但这些表仍存在于数据库中。pg_partman 会尝试保持所有选项的默认设置，以减少意外的数据丢失。

\dt public.time_stuff*
                     List of relations
 Schema |         Name         |       Type        | Owner
--------+----------------------+-------------------+-------
 public | time_stuff           | partitioned table | keith
 public | time_stuff_default   | table             | keith
 public | time_stuff_p20240408 | table             | keith
 public | time_stuff_p20240409 | table             | keith
 public | time_stuff_p20240410 | table             | keith
 public | time_stuff_p20240411 | table             | keith
 public | time_stuff_p20240412 | table             | keith
 public | time_stuff_p20240413 | table             | keith
 public | time_stuff_p20240414 | table             | keith
 public | time_stuff_p20240415 | table             | keith
 public | time_stuff_p20240416 | table             | keith
 public | time_stuff_p20240417 | table             | keith
 public | time_stuff_p20240418 | table             | keith

删除表和索引

如果您希望实际删除这些表，则可以将retention_keep_table设置为 false。或者，如果您想将表保留在数据库中，但不需要索引再占用空间，则可以将retention_keep_table保留为 true，但设置retention_keep_index为 false。在下面的示例中，在运行完create_parent()后，已经将分区集重置回了其原始状态，然后运行此更新。

UPDATE partman.part_config
SET retention = '2 days', premake = 6, retention_keep_table = false
WHERE parent_table = 'public.time_stuff';

CALL partman.run_maintenance_proc();

现在，如果我们查看实际存在的表，我们可以看到最早的两个表已经消失了。

\dt public.time*
                     List of relations
 Schema |         Name         |       Type        | Owner
--------+----------------------+-------------------+-------
 public | time_stuff           | partitioned table | keith
 public | time_stuff_default   | table             | keith
 public | time_stuff_p20240410 | table             | keith
 public | time_stuff_p20240411 | table             | keith
 public | time_stuff_p20240412 | table             | keith
 public | time_stuff_p20240413 | table             | keith
 public | time_stuff_p20240414 | table             | keith
 public | time_stuff_p20240415 | table             | keith
 public | time_stuff_p20240416 | table             | keith
 public | time_stuff_p20240417 | table             | keith
 public | time_stuff_p20240418 | table             | keith

数据库外部的保留

另一种情况是，如果您不需要数据库中的数据，但仍希望在数据库以外保留数据的备份。在本例中，我们将使用retention_schema选项，该选项将子表从分区集中分离，然后将它们移动到该选项指定的模式中。同样，在运行完create_parent()后，分区集已重置为初始状态，然后我们运行以下命令：

CREATE SCHEMA old_tables;

UPDATE partman.part_config
SET retention = '2 days', retention_schema = 'old_tables'
WHERE parent_table = 'public.time_stuff';

CALL partman.run_maintenance_proc();

现在我们可以看到，旧表不再在分区集中，而是现在在old_tables模式中。

\d+ time_stuff
                                             Partitioned table "public.time_stuff"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Partitions: time_stuff_p20240410 FOR VALUES FROM ('2024-04-10 00:00:00-04') TO ('2024-04-11 00:00:00-04'),
            time_stuff_p20240411 FOR VALUES FROM ('2024-04-11 00:00:00-04') TO ('2024-04-12 00:00:00-04'),
            time_stuff_p20240412 FOR VALUES FROM ('2024-04-12 00:00:00-04') TO ('2024-04-13 00:00:00-04'),
            time_stuff_p20240413 FOR VALUES FROM ('2024-04-13 00:00:00-04') TO ('2024-04-14 00:00:00-04'),
            time_stuff_p20240414 FOR VALUES FROM ('2024-04-14 00:00:00-04') TO ('2024-04-15 00:00:00-04'),
            time_stuff_p20240415 FOR VALUES FROM ('2024-04-15 00:00:00-04') TO ('2024-04-16 00:00:00-04'),
            time_stuff_p20240416 FOR VALUES FROM ('2024-04-16 00:00:00-04') TO ('2024-04-17 00:00:00-04'),
            time_stuff_default DEFAULT

\dt old_tables.*
                 List of relations
   Schema   |         Name         | Type  | Owner
------------+----------------------+-------+-------
 old_tables | time_stuff_p20240408 | table | keith
 old_tables | time_stuff_p20240409 | table | keith

要在数据库之外“离线”存储这些表，我们可以使用 pg_partman 提供的 python 脚本，来转储给定模式中的所有表。它与分区配置或分区集没有任何关联，因此该脚本可用于转储任何模式中的任何表。

$ python3 dump_partition.py -c"host=localhost" --schema=old_tables
DROP TABLE IF EXISTS"old_tables"."time_stuff_p20240409"
DROP TABLE IF EXISTS"old_tables"."time_stuff_p20240408

$ ls -l old*
-rw-rw-r-- 1 keith keith  168 Apr 12 18:17 old_tables.time_stuff_p20240408.hash
-rw-rw-r-- 1 keith keith 1410 Apr 12 18:17 old_tables.time_stuff_p20240408.pgdump
-rw-rw-r-- 1 keith keith  168 Apr 12 18:17 old_tables.time_stuff_p20240409.hash
-rw-rw-r-- 1 keith keith 1410 Apr 12 18:17 old_tables.time_stuff_p20240409.pgdump

默认情况下，它会以自定义转储格式创建转储文件，并提供转储文件的 SHA-512 哈希值，以提供长期的数据完整性检查。该备份方法可以作为定期计划脚本的一部分运行，也可以作为一次性备份运行。

总结

保留不需要实际存在于数据库中的数据，是保持数据库高效运行的一个关键部分。希望这能为使用 pg_partman 的用户，提供一份关于基础和高级的数据保留管理方案的指南。