初遇 PostgreSQL

Posted on Jun 26, 2023

以现有 Pleroma 数据库为基础学习 postgreSQL

本地安装 PG

用 Homebrew 安装

brew install postgresql@13

添加环境变量

# ~/.zshrc
+ # PostgreSQL
+ export PATH="/opt/homebrew/Cellar/postgresql@13/13.11_1/bin:$PATH"
+ export PGDATA="/opt/homebrew/var/postgresql@13"

让修改生效 & 测试

source ~/.zshrc # 对配置文件的修改生效
pg_ctl -V # 查看版本,return `pg_ctl (PostgreSQL) 13.1` 

dmg 安装

下载

修改环境变量

# PostgreSQL
export PATH="/Library/PostgreSQL/13/bin:$PATH"
source ~/.zshrc

远程操作 PG

登录

允许远程登录

locate postgresql.conf # 定位文件位置

# /etc/postgresql/13/main/postgresql.conf 
listen_addresses = '*' # 修改内容

添加客户端鉴定条目

# /etc/postgresql/13/main/pg_hba.conf
# 允许用户在提供有效口令(md5)后从某个 IP 地址登录
host all all 0.0.0.0/0 md5
host all all ::0/0 md5

[可选]修改密码

postgres=# alter user postgres with password 'xxxxxxxx';
ALTER ROLE

最后重启 pg

但是不知道为啥(也许以后就知道了)我开启 5432 端口无效:ufw 命令返回结果显示已开启,防火墙规则添加了,ip 检查工具还是显示端口未开放。然后黛黛教了我端口转发

ssh -L 127.0.0.1:5432:127.0.0.1:5432 username@hostname

连上之后保持打开这个窗口,我用 Navicat 连上了数据库,第一次见到了数据库到底是啥结构……(Navicat 是什么请看 用户手册

image

.
└── public # public 是默认的数据库模式(schema)。一个数据库可以包含多个模式,而每个模式又可以包含多个表、视图、函数等对象.schema 是一种逻辑容器,用于组织和管理数据库对象。它可以帮助在一个数据库中对不同的对象进行分类和分组
    ├── Tables # 表是存储数据的基本对象,类似于电子表格或数据库中的表格。它们由行和列组成,用于存储和组织数据
    ├── Views # 视图是虚拟表,是根据存储在其他表中的数据定义的查询结果。视图可以简化复杂查询,并提供一种方式来访问和使用数据
    ├── Materialized Views # 材料化视图是一种特殊类型的视图,它在创建时将查询结果缓存为实际的表。与普通视图不同,材料化视图的内容在查询时不会再次计算,而是使用预先计算的结果
    ├── Functions # 函数是一段封装了特定功能的可重用代码块。在数据库中,函数用于执行特定的计算、处理数据或返回结果
    ├── Queries
    └── Backups

解决数据库增长过快的问题(到底解决了没有呢?)

前提:备份数据库

# 备份
mkdir -p /pleromabackup
chmod 777 /pleromabackup
su - postgres
psql -U postgres
postgres=# \! pg_dump pleroma -f /pleromabackup/pleroma.pgdump
# 检查备份目录大小
[xxx@xxx pleromabackup]# du -hs
341M    .

🌟 **查询所有表的行数:**创建一个 count_em_all 函数,然后调用它获得准确的统计信息

  1. 创建:
CREATE TYPE table_count AS (table_name TEXT, num_rows INTEGER); 

CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count  AS '
DECLARE 
    the_count RECORD; 
    t_name RECORD; 
    r table_count%ROWTYPE; 

BEGIN
    FOR t_name IN 
        SELECT 
            c.relname
        FROM
            pg_catalog.pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE 
            c.relkind = ''r''
            AND n.nspname = ''public'' 
        ORDER BY 1 
        LOOP
            FOR the_count IN EXECUTE ''SELECT COUNT(*) AS "count" FROM '' || t_name.relname 
            LOOP 
            END LOOP; 

            r.table_name := t_name.relname; 
            r.num_rows := the_count.count; 
            RETURN NEXT r; 
        END LOOP; 
        RETURN; 
END;
' LANGUAGE plpgsql; 
  1. 调用
SELECT
	* 
FROM
	count_em_all ( ) AS r 
ORDER BY
	r.num_rows DESC;

结果:

                 table_name                 | num_rows
--------------------------------------------+----------
 activities                                 |   171772
 objects                                    |   120664
 hashtags_objects                           |    38433
 users                                      |    22054
 hashtags                                   |     7719
 deliveries                                 |     5420
 counter_cache                              |      876
 notifications                              |      570
 oauth_authorizations                       |      479
 oauth_tokens                               |      474
 following_relationships                    |      472
 schema_migrations                          |      302
 apps                                       |       83
 user_relationships                         |       53
 conversation_participation_recipient_ships |       36
 instances                                  |       34
 oban_jobs                                  |       32
 conversation_participations                |       18
 conversations                              |       10
 config                                     |       10
 bookmarks                                  |        5
 markers                                    |        3
 user_notes                                 |        2
 data_migrations                            |        2
 backups                                    |        1
 moderation_log                             |        1
 password_reset_tokens                      |        1
 lists                                      |        1
 oban_peers                                 |        1
 announcements                              |        0
 filters                                    |        0
 announcement_read_relationships            |        0
 data_migration_failed_ids                  |        0
 mfa_tokens                                 |        0
 chats                                      |        0
 chat_message_references                    |        0
 push_subscriptions                         |        0
 registrations                              |        0
 report_notes                               |        0
 scheduled_activities                       |        0
 thread_mutes                               |        0
 user_invite_tokens                         |        0
(42 rows)

🌟 查询所有表的总大小(括表数据文件、索引文件、TOAST表、Free Space Map等对象的大小)

pleroma=# SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS size_mb
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

                  relname                   | size_mb 
--------------------------------------------+---------
 objects                                    | 266 MB
 activities                                 | 258 MB
 oban_jobs                                  | 100 MB
 users                                      | 64 MB
 hashtags_objects                           | 4096 kB
 hashtags                                   | 1288 kB
 deliveries                                 | 960 kB
 oauth_tokens                               | 592 kB
 oauth_authorizations                       | 344 kB
 notifications                              | 240 kB
 following_relationships                    | 232 kB
 counter_cache                              | 208 kB
 apps                                       | 176 kB
 instances                                  | 96 kB
 conversation_participations                | 88 kB
 lists                                      | 80 kB
 markers                                    | 80 kB
 oban_peers                                 | 80 kB
 schema_migrations                          | 64 kB
 conversation_participation_recipient_ships | 56 kB
 user_relationships                         | 56 kB
 data_migrations                            | 48 kB
 config                                     | 48 kB
 backups                                    | 48 kB
 conversations                              | 40 kB
 bookmarks                                  | 40 kB
 user_notes                                 | 40 kB
 filters                                    | 32 kB
 scheduled_activities                       | 32 kB
 registrations                              | 32 kB
 chat_message_references                    | 32 kB
 moderation_log                             | 32 kB
 password_reset_tokens                      | 24 kB
 push_subscriptions                         | 24 kB
 user_invite_tokens                         | 16 kB
 mfa_tokens                                 | 16 kB
 announcement_read_relationships            | 16 kB
 thread_mutes                               | 16 kB
 announcements                              | 16 kB
 report_notes                               | 16 kB
 chats                                      | 16 kB
 data_migration_failed_ids                  | 16 kB
(42 rows)

列出行数和 size 排前五的表

🌟 activities 表

type 字段的所有值

pleroma=# SELECT DISTINCT data->>'type'
FROM activities;
  ?column?  
------------
 Follow
 Block
 Announce
 Move
 Delete
 Undo
 Create
 EmojiReact
 Remove
 Update
 Like
 Add
 Reject
 Accept
(14 rows)

列出值是 Announce 的所有行

SELECT * FROM activities WHERE data->>'type' = 'Announce';
./bin/pleroma_ctl database vacuum full

SELECT a."actor", a."data"->>'type' AS type, a."updated_at", o."data" 
FROM "activities" AS a 
INNER JOIN "objects" AS o
ON (o."data"->>'id') = COALESCE(a."data"->'object'->>'id', a."data"->>'object')
WHERE a."data"->>'actor' = 'https://pleroma.your.site/users/admin'
OR a."data"->'to' ? 'https://pleroma.your.site/users/admin'
OR a."data"->'cc' ? 'https://pleroma.your.site/users/admin'
ORDER BY a."updated_at";

🌟 users 表里有很多很多只有 nickname 的用户,它们所在实例是真实的,但是 ID 不是,我把这些行删了,暂时还没出问题

SELECT * FROM users WHERE name IS NULL;
DELETE FROM users WHERE name IS NULL;

🌟 删除了一个 bot 的记录

pleroma=# SELECT *
FROM objects
WHERE data->>'actor' = 'https://xx.ca/users/xxxBot';

pleroma=# DELETE FROM objects
WHERE data->>'actor' = 'https://xx.ca/users/xxxBot';
DELETE 485

🌟 删除了 activities 表 2023-06-15 之前、与我无讨论关系、书签以外的内容

pleroma=# DELETE FROM activities 
WHERE id NOT IN (SELECT activity_id FROM bookmarks) 
  AND inserted_at < '2023-06-15'
  AND id NOT IN (
    SELECT a.id
    FROM activities AS a
    INNER JOIN objects AS o ON (o."data"->>'id') = COALESCE(a."data"->'object'->>'id', a."data"->>'object')
    WHERE a."data"->>'actor' = 'https://ponderduck.cc/users/yaya'
      OR a."data"->'to' ? 'https://ponderduck.cc/users/yaya'
      OR a."data"->'cc' ? 'https://ponderduck.cc/users/yaya'
  );

DELETE 89430

🌟 删除了 objects 表与以上内容对应的行

/*
创建一个临时表 temp_objects 
来存储那些在 activities 表中找不到对应行的 objects 表的行
*/
pleroma=# CREATE TEMPORARY TABLE temp_objects AS
SELECT o.*
FROM objects o
LEFT JOIN activities a ON o.data->>'id' = a.data->'object'->>'id'
WHERE a.data->'object'->>'id' IS NULL;

SELECT 128652

/*
报错:
执行删除操作时违反了外键约束
objects 表中的某些行仍然被 deliveries 表引用,因此无法直接删除

deliveries 表是用于跟踪消息传递的表,通常在 ActivityPub 协议的实现中使用
它记录了消息传递的相关信息,例如发送方、接收方、消息内容等
每条记录代表一次消息传递或交互。
在 Pleroma 中,deliveries 表用于存储消息的传递状态和相关信息,以确保消息能够正确地传递给目标用户
它与 objects 表和 activities 表之间可能存在外键关系,用于保证数据的完整性
*/
pleroma=# DELETE FROM objects
WHERE id IN (SELECT id FROM temp_objects);

ERROR:  update or delete on table "objects" violates foreign key constraint "deliveries_object_id_fkey" on table "deliveries"
DETAIL:  Key (id)=(75072) is still referenced from table "deliveries".

/*
所以先删除被 deliveries 表引用的行
*/
pleroma=# DELETE FROM deliveries
WHERE user_id NOT IN (SELECT id FROM activities);
DELETE 6372

/*
继续执行删除操作并删除临时表
*/
pleroma=# DELETE FROM objects
WHERE id IN (SELECT id FROM temp_objects);
DELETE 128652

pleroma=# DROP TABLE temp_objects;
DROP TABLE

最后 su pleroma -s $SHELL -lc "./bin/pleroma_ctl database vacuum full ",但是!SQL 命令有问题,删了我好多嘟,呜呜呜哭着倒地。重新看了一下数据库大小,现在倒是变得迷你了,164 MB。但是这一切真的值得吗……完全在我计划外的……

常用 psql 命令

psql -U postgres # 指定用户连接PostgreSQL
psql -d postgres # 指定数据库连接PostgreSQL
psql -h 127.0.0.1 -p 5432 -U pg -d postgres # 连接到本地主机(127.0.0.1)的 PostgreSQL 数据库服务器,使用用户名 pg 连接到名为 postgres 的数据库
\du # 查看所有用户
\l # 查看所有数据库
\c{dbname} # 切换到数据库
\dt # 查看当前数据库的所有表
\d {tablename} # 查看指定表
SHOW data_directory; # 查看数据目录
\q # 退出交互查询模式