Author: Alexey Vasiliev
Translator: 类延良, who works at 瀚高基础软件股份有限公司 and is a PostgreSQL enthusiast; PostgreSQL ACE, PGCM, 10g & 11g OCM, OGG certified specialist.
Original article:
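In this article we will learn how to use PostgreSQL's ltree module, which lets you store data in a hierarchical tree-like structure.
ltree is a PostgreSQL module. It implements a data type, ltree, for representing labels of data stored in a hierarchical tree-like structure, and it provides extensive facilities for searching through label trees.
First, enable the extension in your database. You can do this with the following command: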
CREATE EXTENSION ltree;
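To confirm that the extension is active before going further, a minimal sketch like the following may help. It assumes only the standard pg_extension catalog and ltree's nlevel function; the literal path is an arbitrary illustration.

```sql
-- IF NOT EXISTS makes the statement safe to re-run.
CREATE EXTENSION IF NOT EXISTS ltree;

-- The extension should now be listed in the catalog.
SELECT extname, extversion FROM pg_extension WHERE extname = 'ltree';

-- A text literal can be cast to ltree; nlevel counts its labels (3 here).
SELECT nlevel('0001.0003.0002'::ltree);
```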
Let's create a table and add some data to it:
CREATE TABLE comments (user_id integer, description text, path ltree);
INSERT INTO comments (user_id, description, path) VALUES (1, md5(random()::text), '0001');
INSERT INTO comments (user_id, description, path) VALUES (2, md5(random()::text), '0001.0001.0001');
INSERT INTO comments (user_id, description, path) VALUES (2, md5(random()::text), '0001.0001.0001.0001');
INSERT INTO comments (user_id, description, path) VALUES (1, md5(random()::text), '0001.0001.0001.0002');
INSERT INTO comments (user_id, description, path) VALUES (5, md5(random()::text), '0001.0001.0001.0003');
INSERT INTO comments (user_id, description, path) VALUES (6, md5(random()::text), '0001.0002');
INSERT INTO comments (user_id, description, path) VALUES (6, md5(random()::text), '0001.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES (6, md5(random()::text), '0001.0003');
INSERT INTO comments (user_id, description, path) VALUES (8, md5(random()::text), '0001.0003.0001');
INSERT INTO comments (user_id, description, path) VALUES (9, md5(random()::text), '0001.0003.0002');
INSERT INTO comments (user_id, description, path) VALUES (11, md5(random()::text), '0001.0003.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES (2, md5(random()::text), '0001.0003.0002.0002');
INSERT INTO comments (user_id, description, path) VALUES (5, md5(random()::text), '0001.0003.0002.0003');
INSERT INTO comments (user_id, description, path) VALUES (7, md5(random()::text), '0001.0003.0002.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES (20, md5(random()::text), '0001.0003.0002.0002.0002');
INSERT INTO comments (user_id, description, path) VALUES (31, md5(random()::text), '0001.0003.0002.0002.0003');
INSERT INTO comments (user_id, description, path) VALUES (22, md5(random()::text), '0001.0003.0002.0002.0004');
INSERT INTO comments (user_id, description, path) VALUES (34, md5(random()::text), '0001.0003.0002.0002.0005');
INSERT INTO comments (user_id, description, path) VALUES (22, md5(random()::text), '0001.0003.0002.0002.0006');
We should also add some indexes:
CREATE INDEX path_gist_comments_idx ON comments USING GIST(path);
CREATE INDEX path_comments_idx ON comments USING btree(path);
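The two indexes serve different operators: the B-tree index supports comparisons and ordering (<, <=, =, >=, >), while the GiST index also supports the tree operators such as <@, @> and ~. On PostgreSQL 13 and later, the GiST operator class additionally accepts a signature length parameter. The sketch below is an alternative to the GiST index above rather than an addition to it; the index name and the value 100 are illustrative assumptions, and whether a longer signature helps depends on your data.

```sql
-- Alternative GiST index with a longer signature (PostgreSQL 13+).
-- siglen is given in bytes; the default is 8.
CREATE INDEX path_gist_siglen_comments_idx
    ON comments USING GIST (path gist_ltree_ops(siglen = 100));
```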
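As you can see, I created the comments table with a path field that holds the full tree path of each row. Each label in the path is four digits, and labels are separated by dots.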
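Paths in this four-digit format can be built in SQL by appending a zero-padded label to a parent path. The following is only a sketch: the numbering scheme (count the direct children and add one) is a hypothetical choice, not something the original article prescribes, and it is not safe under concurrent inserts or after deletions.

```sql
-- Append a zero-padded label to a parent path: yields 0001.0003.0004.
SELECT '0001.0003'::ltree || lpad(4::text, 4, '0') AS child_path;

-- Hypothetical numbering scheme: count the existing direct children of
-- 0001.0003 and use the next number as the new label.
SELECT '0001.0003'::ltree || lpad((count(*) + 1)::text, 4, '0') AS next_path
FROM comments
WHERE path ~ '0001.0003.*{1}';
```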
Let's find all records in the comments table whose path starts with '0001.0003':
$ SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
(12 rows)
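The <@ operator reads as "is a descendant of, or equal to", which is why the row for 0001.0003 itself appears in the result. A small sketch on literal values, plus one way to exclude the node itself if only strict descendants are wanted:

```sql
-- <@ tests "is a descendant of (or equal to)"; this returns true.
SELECT '0001.0003.0002'::ltree <@ '0001.0003'::ltree;

-- A path counts as its own descendant, so this is also true.
SELECT '0001.0003'::ltree <@ '0001.0003'::ltree;

-- Strict descendants only: filter the node itself out.
SELECT user_id, path
FROM comments
WHERE path <@ '0001.0003'
  AND path <> '0001.0003';
```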
Let's check this query with EXPLAIN:
$ EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on comments  (cost=0.00..1.24 rows=2 width=38) (actual time=0.013..0.017 rows=12 loops=1)
   Filter: (path <@ '0001.0003'::ltree)
   Rows Removed by Filter: 7
 Total runtime: 0.038 ms
(4 rows)
Let's disable sequential scans for the test:
$ SET enable_seqscan=false;
SET
$ EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using path_gist_comments_idx on comments  (cost=0.00..8.29 rows=2 width=38) (actual time=0.023..0.034 rows=12 loops=1)
   Index Cond: (path <@ '0001.0003'::ltree)
 Total runtime: 0.076 ms
(3 rows)
The query is now slower (0.076 ms instead of 0.038 ms), but the plan shows how it uses the GiST index.
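Since enable_seqscan is a planner setting meant for experiments like this one, it is usually worth limiting its scope. A minimal sketch, assuming you only want the setting for a single test query:

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE
SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
ROLLBACK;

-- If the session-level SET from the text was used, restore the default:
RESET enable_seqscan;
```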
The first query used a sequential scan because the table holds very little data. The condition path <@ '0001.0003' can also be written another way, as an lquery pattern:
$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*';
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
(12 rows)
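One way to convince yourself that the two forms select the same rows on this data is to compute their symmetric difference, which should come back empty. This is only a sanity-check sketch with explicit casts added for clarity:

```sql
-- Rows found by <@ but not by the lquery pattern, and vice versa.
-- An empty result means both conditions select the same paths.
(SELECT path FROM comments WHERE path <@ '0001.0003'::ltree
 EXCEPT
 SELECT path FROM comments WHERE path ~ '0001.0003.*'::lquery)
UNION ALL
(SELECT path FROM comments WHERE path ~ '0001.0003.*'::lquery
 EXCEPT
 SELECT path FROM comments WHERE path <@ '0001.0003'::ltree);
```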
Do not forget about the order of the data: without ORDER BY, rows are not guaranteed to come back in tree order. Consider the following example:
$ INSERT INTO comments (user_id, description, path) VALUES (9, md5(random()::text), '0001.0003.0001.0001');
$ INSERT INTO comments (user_id, description, path) VALUES (9, md5(random()::text), '0001.0003.0001.0002');
$ INSERT INTO comments (user_id, description, path) VALUES (9, md5(random()::text), '0001.0003.0001.0003');
$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*';
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
(15 rows)
Now with sorting:
$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path;
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(15 rows)
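Once rows are ordered by path, the depth of each node can be used to render the tree visually. The sketch below indents each row by its depth and shows only the last label; the two-space indent and the column alias are arbitrary choices for illustration. Because labels sort as text rather than as numbers, the zero padding is what keeps 0002 ahead of 0010.

```sql
-- nlevel(path) gives the depth; subpath(path, -1) keeps the last label.
SELECT repeat('  ', nlevel(path) - 1) || ltree2text(subpath(path, -1)) AS tree,
       user_id
FROM comments
ORDER BY path;
```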
Several modifiers can be put at the end of a non-star label in an lquery to make it match more than just an exact match:
"@" - match case-insensitively, for example a@ matches A
"*" - match any label with this prefix, for example foo* matches foobar
"%" - match initial underscore-separated words
An lquery can also list alternatives for a label with | and limit how many labels a star matches with a quantifier such as {1,2}:
$ SELECT user_id, path FROM comments WHERE path ~ '0001.*{1,2}.0001|0002.*' ORDER by path;
 user_id |           path
---------+--------------------------
       2 | 0001.0001.0001
       2 | 0001.0001.0001.0001
       1 | 0001.0001.0001.0002
       5 | 0001.0001.0001.0003
       6 | 0001.0002.0001
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(19 rows)
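The three label modifiers can be tried directly on literal values, without touching the comments table. The values below follow the examples given in the PostgreSQL documentation for ltree; the numeric labels in this article would not show any difference, which is why alphabetic labels are used here.

```sql
-- @ : case-insensitive match
SELECT 'A'::ltree ~ 'a@'::lquery;                  -- true

-- * : prefix match on a single label
SELECT 'foobar'::ltree ~ 'foo*'::lquery;           -- true

-- % : match initial underscore-separated words
SELECT 'foo_bar_baz'::ltree ~ 'foo_bar%'::lquery;  -- true
SELECT 'foo_barbaz'::ltree ~ 'foo_bar%'::lquery;   -- false
```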
Let's find all direct children of the parent '0001.0003':
$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*{1}' ORDER by path;
 user_id |      path
---------+----------------
       8 | 0001.0003.0001
       9 | 0001.0003.0002
(2 rows)
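The same direct-children lookup can also be expressed without an lquery pattern, by combining the descendant operator with a depth check. This is an alternative sketch, not a claim about which form performs better:

```sql
-- Descendants of 0001.0003 that sit exactly one level below it.
SELECT user_id, path
FROM comments
WHERE path <@ '0001.0003'
  AND nlevel(path) = nlevel('0001.0003'::ltree) + 1
ORDER BY path;
```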
Find all children of the parent '0001.0003' (the pattern also matches the parent itself, because .* matches zero or more labels):
$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path;
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(15 rows)
Find the parent of the child '0001.0003.0002.0002.0005':
$ SELECT user_id, path FROM comments WHERE path = subpath('0001.0003.0002.0002.0005', 0, -1) ORDER by path;
 user_id |        path
---------+---------------------
       2 | 0001.0003.0002.0002
(1 row)
If your paths are not unique, you will get multiple records.
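If each path is meant to identify exactly one comment, a unique constraint makes that explicit. The constraint name below is hypothetical, and the statement fails if duplicate paths already exist. The second query in this sketch goes beyond the direct parent and lists every ancestor of the node using @>.

```sql
-- Hypothetical constraint name; fails if duplicate paths already exist.
ALTER TABLE comments ADD CONSTRAINT comments_path_key UNIQUE (path);

-- All ancestors of a node, root first. @> also matches the node itself,
-- so it is filtered out explicitly.
SELECT user_id, path
FROM comments
WHERE path @> '0001.0003.0002.0002.0005'
  AND path <> '0001.0003.0002.0002.0005'
ORDER BY path;
```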
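As you can see, working with materialized paths using ltree is straightforward. This article does not list every possible use of ltree; for example, it does not cover ltxtquery, the full-text-search-like query type. You can find that in the official PostgreSQL documentation.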
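For orientation only, here is a minimal sketch of what an ltxtquery looks like on the table from this article. Unlike lquery, ltxtquery matches labels regardless of their position in the path, and it is applied with the @ operator; the two labels chosen here are arbitrary.

```sql
-- Rows whose path contains both labels, wherever they occur in the path.
SELECT user_id, path
FROM comments
WHERE path @ '0003 & 0002'::ltxtquery
ORDER BY path;
```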
Reposted from: http://xmmxf.baihongyu.com/