Zabbix Server Cluster部署最佳实践

架构设计

使用软件

  • REDHAT 8.4
  • Mysql 8.0
  • Zabbix 5.4

IP规划

vips for cluster

1
2
192.168.2.28 zabbix-ha-db
192.168.2.29 zabbix-ha-web

db nodes

1
2
192.168.2.24 zabbix-db1
192.168.2.25 zabbix-db2

web nodes

1
2
192.168.2.26 zabbix-server1
192.168.2.27 zabbix-server2

服务器通用配置

  • 时间同步
1
0 0 * * * /usr/sbin/ntpdate ntpserver  >> /root/ntpdate.log 2>&1 ; /sbin/hwclock -w
  • 关闭防火墙
1
2
systemctl stop firewalld
systemctl disable firewalld
  • 关闭SElinux
1
2
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
  • 配置hosts文件
1
2
3
4
5
6
7
8
9
10
11
# vips for cluster
192.168.2.28 zabbix-ha-db
192.168.2.29 zabbix-ha-web

# db nodes
192.168.2.24 zabbix-db1
192.168.2.25 zabbix-db2

# web nodes
192.168.2.26 zabbix-server1
192.168.2.27 zabbix-server2
  • 配置yum源
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
[base]
name=base
baseurl=http://yumserver/redhat/8.4/BaseOS/
enable=1
gpgcheck=0

[app]
name=app
baseurl=http://yumserver/redhat/8.4/AppStream/
enable=1
gpgcheck=0

[ha]
name=ha
baseurl=http://yumserver/redhat/ha-8.4/
enable=1
gpgcheck=0

[epel]
name=epel
baseurl=http://yumserver/epel/8/
enable=1
gpgcheck=0

数据库HA集群

集群安装

所有节点上执行:

  1. 安装HA组件
1
2
3
yum install pcs pacemaker fence-agents-all
systemctl start pcsd.service
systemctl enable pcsd.service
  1. 给hacluster用户设置密码(最好相同)
1
echo hacluster | passwd --stdin hacluster

任意节点上执行:

  1. 用相同的密码验证所有节点
1
pcs host auth zabbix-db1 zabbix-db2 -u hacluster -p hacluster
  1. 创建database cluster和增加资源
1
pcs cluster setup zabbix_db_cluster zabbix-db1 zabbix-db2
  1. 启用集群
1
2
3
4
5
6
7
pcs cluster start --all

#启动pacemaker服务
systemctl start pacemaker.service

pcs cluster enable --all
systemctl enable pacemaker.service
  1. 检查集群状态
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
[root@zabbix-db1 pcsd]# pcs status
Cluster name: zabbix_db_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
* Stack: corosync
* Current DC: zabbix-db1 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
* Last updated: Thu Jun 24 10:39:04 2021
* Last change: Thu Jun 24 10:39:01 2021 by hacluster via crmd on zabbix-db1
* 2 nodes configured
* 0 resource instances configured

Node List:
* Online: [ zabbix-db1 zabbix-db2 ]

Full List of Resources:
* No resources

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

集群参数配置

  1. 禁用fencing
1
pcs property set stonith-enabled=false
  1. 忽略quorum状态
1
pcs property set no-quorum-policy=ignore
  1. 配置转移策略
1
pcs resource defaults migration-threshold=1

创建Service和测试故障转移

  1. 创建VIP服务
1
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.2.28 op monitor interval=5s --group zabbix_db_cluster
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
[root@zabbix-db1 pcsd]# pcs status
Cluster name: zabbix_db_cluster
Cluster Summary:
* Stack: corosync
* Current DC: zabbix-db1 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
* Last updated: Thu Jun 24 13:22:59 2021
* Last change: Thu Jun 24 13:22:53 2021 by root via cibadmin on zabbix-db1
* 2 nodes configured
* 1 resource instance configured

Node List:
* Online: [ zabbix-db1 zabbix-db2 ]

Full List of Resources:
* Resource Group: zabbix_db_cluster:
* VirtualIP (ocf::heartbeat:IPaddr2): Started zabbix-db1 <<<<==== 表示vip目前是运行在db1上

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

vip可以ping通

1
2
3
4
5
[root@zabbix-db1 pcsd]# ping 192.168.2.28
PING 192.168.2.28 (192.168.2.28) 56(84) bytes of data.
64 bytes from 192.168.2.28: icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from 192.168.2.28: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 192.168.2.28: icmp_seq=3 ttl=64 time=0.030 ms

IP信息里也能看到

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
[root@zabbix-db1 pcsd]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:94:23:a2 brd ff:ff:ff:ff:ff:ff
inet 192.168.2.24/24 brd 192.168.2.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
inet 192.168.2.28/24 brd 192.168.2.255 scope global secondary ens192
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe94:23a2/64 scope link noprefixroute
valid_lft forever preferred_lft forever
  1. 通过crm_resource强制终止服务
1
crm_resource --resource VirtualIP --force-stop

通过crm_mon来监控资源的运行情况

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#crm_mon

Cluster Summary:
* Stack: corosync
* Current DC: zabbix-db1 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
* Last updated: Thu Jun 24 13:30:14 2021
* Last change: Thu Jun 24 13:22:53 2021 by root via cibadmin on zabbix-db1
* 2 nodes configured
* 1 resource instance configured

Node List:
* Online: [ zabbix-db1 zabbix-db2 ]

Active Resources:
* Resource Group: zabbix_db_cluster:
* VirtualIP (ocf::heartbeat:IPaddr2): Started zabbix-db2

Failed Resource Actions:
* VirtualIP_monitor_5000 on zabbix-db1 'not running' (7): call=16, status='complete', exitreason='', last-rc-change='2021-06-24 13:30:08 +08:00', queued=0ms, exec=0ms

这里可以看到VirtualIP服务自动转移到了zabbix-db2节点上,由此验证了资源的自动故障转移。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
[root@zabbix-db2 pcsd]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:94:8d:28 brd ff:ff:ff:ff:ff:ff
inet 192.168.2.25/24 brd 192.168.2.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
inet 192.168.2.28/24 brd 192.168.2.255 scope global secondary ens192
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe94:8d28/64 scope link noprefixroute
valid_lft forever preferred_lft forever
  1. 避免频繁故障转移,意思就是当转移到节点二后可以一直在节点2运行,直到节点2出现异常
1
pcs resource defaults resource−stickiness=100

Mysql安装

单节点安装

  1. 选择使用新版的mysql8.0
1
yum install mysql-community-server

这里由于redhat版本太新(8.4),导致只能手动本地安装

1
2
3
4
5
6
7
[root@zabbix-db1 ~]# ll *.rpm
-rw-r--r-- 1 root root 14055044 4月 26 15:36 mysql-community-client-8.0.25-1.el8.x86_64.rpm
-rw-r--r-- 1 root root 632760 4月 26 15:36 mysql-community-common-8.0.25-1.el8.x86_64.rpm
-rw-r--r-- 1 root root 1498488 4月 26 15:37 mysql-community-libs-8.0.25-1.el8.x86_64.rpm
-rw-r--r-- 1 root root 55485448 4月 26 15:38 mysql-community-server-8.0.25-1.el8.x86_64.rpm

yum localinstall mysql-community-*.rpm
  1. 启动mysql
1
2
systemctl start mysqld
systemctl status mysqld
  1. 修改初始密码
1
2
3
4
5
sudo grep 'temporary password' /var/log/mysqld.log

shell> mysql -uroot -p

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'MyNewPass4!';
  1. 修改my.cnf
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
[mysqld]
datadir=/u01/data
socket=/var/lib/mysql/mysql.sock

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

default_authentication_plugin=mysql_native_password
# replication
server-id=1
binlog_format = mixed
report_host = zabbix-db1
expire_logs_days = 3
gtid_mode = ON
enforce_gtid_consistency = ON
master-verify-checksum = 1
log-slave-updates = ON
log-bin = /var/lib/mysql/mysql-bin
log_bin_index = /var/lib/mysql/mysql-bin.index
relay_log = /var/lib/mysql/relay-bin
relay_log_index = /var/lib/mysql/relay-bin.index
binlog_checksum = crc32

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# logs
log-error=/var/log/mysqld.log

key_buffer_size = 256M
table_open_cache = 512
sort_buffer_size = 16M
net_buffer_length = 8K
read_buffer_size = 16M
read_rnd_buffer_size = 512K
thread_cache_size = 64

# network
connect_timeout= 60
wait_timeout= 28800
max_connections= 2048
max_allowed_packet= 500M
max_connect_errors= 1000

# limits
tmp_table_size= 512M
max_heap_table_size= 256M

# innodb
default-storage-engine = InnoDB
innodb = force
innodb_buffer_pool_instances = 8
innodb_buffer_pool_size = 20G #一般设为服务器物理内存的70%-80%
innodb_log_file_size = 2G
innodb_log_buffer_size = 128M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb_file_per_table=1
innodb_flush_method= O_DIRECT
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000
innodb_open_files = 65535
innodb_lru_scan_depth = 4000
innodb_lock_wait_timeout = 10
innodb_rollback_on_timeout = 1
innodb_print_all_deadlocks = 1
innodb_online_alter_log_max_size = 4G
innodb_print_ddl_logs = 1
innodb_status_file = 1

# other stuff
event_scheduler= 1
group_concat_max_len = 1000000
log_timestamps= SYSTEM
sync-binlog = 0

[client]
user=zabbix
password=zabbix
  1. 修改二号节点配置
1
2
server_id                       = 2 			## Last number of IP
report_host = zabbix-db2 ## Hostname

Mysql同步配置

登录zabbix-db1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#mysql −uroot −p

mysql> stop slave;

mysql> create user 'rep'@'192.168.2.25' identified by 'MyNewPass4!';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'rep'@'192.168.2.25';

mysql> show master status\G;
*************************** 1. row ***************************
File: mysql-bin.000004
Position: 711
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 928d976b-d4b8-11eb-82e2-0050569423a2:1-2
1 row in set (0.00 sec)

ERROR:
No query specified

登录zabbix-db2

配置db2作为db1的从库

1
2
3
4
5
mysql −uroot −p<MYSQL_ROOT_PASSWORD>

STOP SLAVE;

CHANGE MASTER TO MASTER_HOST = '192.168.2.24', MASTER_USER = 'rep', MASTER_PASSWORD='MyNewPass4!', MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 711;

配置db2的同步账号

1
2
mysql> create user 'rep'@'192.168.2.24' identified by 'MyNewPass4!';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'rep'@'192.168.2.24';

重置db2的master,启动slave

1
2
3
4
RESET MASTER;

START SLAVE;
SHOW SLAVE STATUS\G

查看db2的master信息

1
2
3
4
5
6
7
8
9
10
11
mysql> show master status\G;
*************************** 1. row ***************************
File: mysql-bin.000001
Position: 156
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)

ERROR:
No query specified

登录zabbix-db1

配置db1作为db2的从库

1
2
3
CHANGE MASTER TO MASTER_HOST = '192.168.2.25', MASTER_USER = 'rep', MASTER_PASSWORD='MyNewPass4!', MASTER_LOG_FILE = 'mysql-bin.000001', MASTER_LOG_POS = 156;

START SLAVE;

Zabbix Server Mysql性能优化

zabbix一般来说最大的瓶颈都在于数据库层面,大量数据的读写导致压力很大,所以可以历史数据表进行分区处理。

由于即使关闭了前端的housekeeping,zabbix server依旧会写相关信息到housekeeper表中,所以将其关闭

1
ALTER TABLE housekeeper ENGINE = BLACKHOLE;

首先需要对7张表做一个分区的初始化操作。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
ALTER TABLE `history` PARTITION BY RANGE ( clock)
(PARTITION p2021_06_24 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-24 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_25 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-25 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_26 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-26 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `history_log` PARTITION BY RANGE ( clock)
(PARTITION p2021_06_24 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-24 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_25 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-25 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_26 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-26 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `history_str` PARTITION BY RANGE ( clock)
(PARTITION p2021_06_24 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-24 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_25 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-25 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_26 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-26 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `history_text` PARTITION BY RANGE ( clock)
(PARTITION p2021_06_24 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-24 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_25 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-25 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_26 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-26 00:00:00")) ENGINE = InnoDB);


ALTER TABLE `history_uint` PARTITION BY RANGE ( clock)
(PARTITION p2021_06_24 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-24 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_25 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-25 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_06_26 VALUES LESS THAN (UNIX_TIMESTAMP("2021-06-26 00:00:00")) ENGINE = InnoDB);


ALTER TABLE `trends` PARTITION BY RANGE ( clock)
(PARTITION p2021_06 VALUES LESS THAN (UNIX_TIMESTAMP("2021-07-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_07 VALUES LESS THAN (UNIX_TIMESTAMP("2021-08-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_08 VALUES LESS THAN (UNIX_TIMESTAMP("2021-09-01 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `trends_uint` PARTITION BY RANGE ( clock)
(PARTITION p2021_06 VALUES LESS THAN (UNIX_TIMESTAMP("2021-07-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_07 VALUES LESS THAN (UNIX_TIMESTAMP("2021-08-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2021_08 VALUES LESS THAN (UNIX_TIMESTAMP("2021-09-01 00:00:00")) ENGINE = InnoDB);

开启event

1
2
3
4
5
6
7
8
mysql> show variables like '%event_scheduler%';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| event_scheduler | OFF |
+-----------------+-------+

mysql> set global event_scheduler=1;

通过自带的存储过程来实现定时自动增删分区

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
USE `zabbix`;

CREATE TABLE IF NOT EXISTS `manage_partitions` (
`tablename` VARCHAR(64) NOT NULL COMMENT 'Table name',
`period` VARCHAR(64) NOT NULL COMMENT 'Period - daily or monthly',
`keep_history` INT(3) UNSIGNED NOT NULL DEFAULT '1' COMMENT 'For how many days or months to keep the partitions',
`last_updated` DATETIME DEFAULT NULL COMMENT 'When a partition was added last time',
`comments` VARCHAR(128) DEFAULT '1' COMMENT 'Comments',
PRIMARY KEY (`tablename`)
) ENGINE=INNODB;

CREATE TABLE IF NOT EXISTS `manage_partitions_history` (
`schema_name` varchar(64) NOT NULL COMMENT 'Zabbix schema name',
`table_name` varchar(64) NOT NULL COMMENT 'Zabbix table name',
`table_partition_name` varchar(64) NOT NULL COMMENT 'Zabbix table partition name',
`partition_action` varchar(64) NOT NULL COMMENT 'Zabbix table partition action',
`partition_action_date` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'When a partition was added or dropped'
) ENGINE=InnoDB;

INSERT INTO manage_partitions (tablename, period, keep_history, last_updated, comments) VALUES ('history', 'day', 90, now(), '');
INSERT INTO manage_partitions (tablename, period, keep_history, last_updated, comments) VALUES ('history_uint', 'day', 90, now(), '');
INSERT INTO manage_partitions (tablename, period, keep_history, last_updated, comments) VALUES ('history_str', 'day', 90, now(), '');
INSERT INTO manage_partitions (tablename, period, keep_history, last_updated, comments) VALUES ('history_text', 'day', 90, now(), '');
INSERT INTO manage_partitions (tablename, period, keep_history, last_updated, comments) VALUES ('history_log', 'day', 90, now(), '');
INSERT INTO manage_partitions (tablename, period, keep_history, last_updated, comments) VALUES ('trends', 'month', 24, now(), '');
INSERT INTO manage_partitions (tablename, period, keep_history, last_updated, comments) VALUES ('trends_uint', 'month', 24, now(), '');

DROP PROCEDURE IF EXISTS `create_next_partitions`;
DROP PROCEDURE IF EXISTS `create_partition_by_day`;
DROP PROCEDURE IF EXISTS `create_partition_by_month`;
DROP PROCEDURE IF EXISTS `drop_partitions`;
DROP PROCEDURE IF EXISTS `drop_old_partition`;
DROP EVENT IF EXISTS `e_zbx_part_mgmt`;

DELIMITER $$
CREATE PROCEDURE `create_next_partitions`(IN_SCHEMANAME VARCHAR(64))
BEGIN
DECLARE TABLENAME_TMP VARCHAR(64);
DECLARE PERIOD_TMP VARCHAR(12);
DECLARE DONE INT DEFAULT 0;
DECLARE get_prt_tables CURSOR FOR
SELECT `tablename`, `period`
FROM manage_partitions;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN get_prt_tables;
loop_create_part: LOOP
IF DONE THEN
LEAVE loop_create_part;
END IF;
FETCH get_prt_tables INTO TABLENAME_TMP, PERIOD_TMP;
CASE
WHEN PERIOD_TMP = 'day' THEN
CALL `create_partition_by_day`(IN_SCHEMANAME, TABLENAME_TMP);
WHEN PERIOD_TMP = 'month' THEN
CALL `create_partition_by_month`(IN_SCHEMANAME, TABLENAME_TMP);
ELSE
BEGIN
ITERATE loop_create_part;
END;
END CASE;
UPDATE manage_partitions set last_updated = NOW() WHERE tablename = TABLENAME_TMP;
END LOOP loop_create_part;
CLOSE get_prt_tables;
END$$
DELIMITER ;

DELIMITER $$
CREATE PROCEDURE `create_partition_by_day`(IN_SCHEMANAME VARCHAR(64), IN_TABLENAME VARCHAR(64))
BEGIN
DECLARE BEGINTIME TIMESTAMP;
DECLARE ENDTIME INT UNSIGNED;
DECLARE PART_ACTION VARCHAR(12);
DECLARE PARTITIONNAME VARCHAR(16);
DECLARE ROWS_CNT INT UNSIGNED;
SET BEGINTIME = DATE(NOW()) + INTERVAL 1 DAY;
SET ENDTIME = UNIX_TIMESTAMP(BEGINTIME + INTERVAL 1 DAY);
SET PART_ACTION = 'ADD';
SET PARTITIONNAME = DATE_FORMAT( BEGINTIME, 'p%Y_%m_%d' );
SELECT COUNT(*) INTO ROWS_CNT
FROM information_schema.partitions
WHERE table_schema = IN_SCHEMANAME
AND table_name = IN_TABLENAME
AND partition_name = PARTITIONNAME;
IF ROWS_CNT = 0 THEN
SET @SQL = CONCAT('INSERT INTO `manage_partitions_history`(`schema_name`,`table_name`,`table_partition_name`,`partition_action`) VALUES ("', IN_SCHEMANAME,'","', IN_TABLENAME,'","', PARTITIONNAME,'","', PART_ACTION,'");');
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
SET @SQL = CONCAT( 'ALTER TABLE `', IN_SCHEMANAME, '`.`', IN_TABLENAME, '`',' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', ENDTIME, '));' );
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
ELSE
SELECT CONCAT("partition `", PARTITIONNAME, "` for table `",IN_SCHEMANAME, ".", IN_TABLENAME, "` already exists") AS result;
END IF;
END$$
DELIMITER ;

DELIMITER $$
CREATE PROCEDURE `create_partition_by_month`(IN_SCHEMANAME VARCHAR(64), IN_TABLENAME VARCHAR(64))
BEGIN
DECLARE BEGINTIME TIMESTAMP;
DECLARE ENDTIME INT UNSIGNED;
DECLARE PART_ACTION VARCHAR(12);
DECLARE PARTITIONNAME VARCHAR(16);
DECLARE ROWS_CNT INT UNSIGNED;
SET BEGINTIME = DATE(NOW() - INTERVAL DAY(NOW()) DAY + INTERVAL 1 DAY + INTERVAL 1 MONTH);
SET ENDTIME = UNIX_TIMESTAMP(BEGINTIME + INTERVAL 1 MONTH);
SET PART_ACTION = 'ADD';
SET PARTITIONNAME = DATE_FORMAT( BEGINTIME, 'p%Y_%m' );
SELECT COUNT(*) INTO ROWS_CNT
FROM information_schema.partitions
WHERE table_schema = IN_SCHEMANAME
AND table_name = IN_TABLENAME
AND partition_name = PARTITIONNAME;
IF ROWS_CNT = 0 THEN
SET @SQL = CONCAT('INSERT INTO `manage_partitions_history`(`schema_name`,`table_name`,`table_partition_name`,`partition_action`) VALUES ("', IN_SCHEMANAME,'","', IN_TABLENAME,'","', PARTITIONNAME,'","', PART_ACTION,'");');
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
SET @SQL = CONCAT( 'ALTER TABLE `', IN_SCHEMANAME, '`.`', IN_TABLENAME, '`',' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', ENDTIME, '));' );
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
ELSE
SELECT CONCAT("partition `", PARTITIONNAME, "` for table `",IN_SCHEMANAME, ".", IN_TABLENAME, "` already exists") AS result;
END IF;
END$$
DELIMITER ;

DELIMITER $$
CREATE PROCEDURE `drop_partitions`(IN_SCHEMANAME VARCHAR(64))
BEGIN
DECLARE TABLENAME_TMP VARCHAR(64);
DECLARE PARTITIONNAME_TMP VARCHAR(64);
DECLARE VALUES_LESS_TMP INT;
DECLARE PERIOD_TMP VARCHAR(12);
DECLARE KEEP_HISTORY_TMP INT;
DECLARE KEEP_HISTORY_BEFORE INT;
DECLARE DONE INT DEFAULT 0;
DECLARE get_partitions CURSOR FOR
SELECT p.`table_name`, p.`partition_name`, LTRIM(RTRIM(p.`partition_description`)), mp.`period`, mp.`keep_history`
FROM information_schema.partitions p
JOIN manage_partitions mp ON mp.tablename = p.table_name
WHERE p.table_schema = IN_SCHEMANAME
ORDER BY p.table_name, p.subpartition_ordinal_position;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN get_partitions;
loop_check_prt: LOOP
IF DONE THEN
LEAVE loop_check_prt;
END IF;
FETCH get_partitions INTO TABLENAME_TMP, PARTITIONNAME_TMP, VALUES_LESS_TMP, PERIOD_TMP, KEEP_HISTORY_TMP;
CASE
WHEN PERIOD_TMP = 'day' THEN
SET KEEP_HISTORY_BEFORE = UNIX_TIMESTAMP(DATE(NOW() - INTERVAL KEEP_HISTORY_TMP DAY));
WHEN PERIOD_TMP = 'month' THEN
SET KEEP_HISTORY_BEFORE = UNIX_TIMESTAMP(DATE(NOW() - INTERVAL KEEP_HISTORY_TMP MONTH - INTERVAL DAY(NOW())-1 DAY));
ELSE
BEGIN
ITERATE loop_check_prt;
END;
END CASE;
IF KEEP_HISTORY_BEFORE >= VALUES_LESS_TMP THEN
CALL drop_old_partition(IN_SCHEMANAME, TABLENAME_TMP, PARTITIONNAME_TMP);
END IF;
END LOOP loop_check_prt;
CLOSE get_partitions;
END$$
DELIMITER ;

DELIMITER $$
CREATE PROCEDURE `drop_old_partition`(IN_SCHEMANAME VARCHAR(64), IN_TABLENAME VARCHAR(64), IN_PARTITIONNAME VARCHAR(64))
BEGIN
DECLARE PART_ACTION VARCHAR(12);
DECLARE PART_ACTION_DATE INT;
DECLARE ROWS_CNT INT UNSIGNED;
SELECT COUNT(*) INTO ROWS_CNT
FROM information_schema.partitions
WHERE table_schema = IN_SCHEMANAME
AND table_name = IN_TABLENAME
AND partition_name = IN_PARTITIONNAME;
SET PART_ACTION = 'DROP';
IF ROWS_CNT = 1 THEN
SET @SQL = CONCAT('INSERT INTO `manage_partitions_history`(`schema_name`,`table_name`,`table_partition_name`,`partition_action`) VALUES ("', IN_SCHEMANAME,'","', IN_TABLENAME,'","', IN_PARTITIONNAME,'","', PART_ACTION,'");');
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
SET @SQL = CONCAT( 'ALTER TABLE `', IN_SCHEMANAME, '`.`', IN_TABLENAME, '`',' DROP PARTITION ', IN_PARTITIONNAME, ';' );
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
ELSE
SELECT CONCAT("partition `", IN_PARTITIONNAME, "` for table `", IN_SCHEMANAME, ".", IN_TABLENAME, "` does not exist") AS result;
END IF;
END$$
DELIMITER ;

DELIMITER $$
CREATE EVENT `e_zbx_part_mgmt`
ON SCHEDULE EVERY 1 DAY STARTS '2021-06-27 01:00:00'
ON COMPLETION PRESERVE
ENABLE
COMMENT 'Creating and dropping partitions'
DO BEGIN
CALL zabbix.drop_partitions('zabbix');
CALL zabbix.create_next_partitions('zabbix');
END$$
DELIMITER ;

Zabbix Proxy Mysql性能优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
# 重建proxy_history表
drop table proxy_history;

CREATE TABLE `proxy_history` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`itemid` bigint(20) unsigned NOT NULL,
`clock` int(11) NOT NULL DEFAULT '0',
`timestamp` int(11) NOT NULL DEFAULT '0',
`source` varchar(64) COLLATE utf8_bin NOT NULL DEFAULT '',
`severity` int(11) NOT NULL DEFAULT '0',
`value` longtext COLLATE utf8_bin NOT NULL,
`logeventid` int(11) NOT NULL DEFAULT '0',
`ns` int(11) NOT NULL DEFAULT '0',
`state` int(11) NOT NULL DEFAULT '0',
`lastlogsize` bigint(20) unsigned NOT NULL DEFAULT '0',
`mtime` int(11) NOT NULL DEFAULT '0',
`flags` int(11) NOT NULL DEFAULT '0',
`write_clock` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`,clock),
KEY `proxy_history_1` (`clock`)
) ENGINE=InnoDB AUTO_INCREMENT=621359368 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

ALTER TABLE `proxy_history` PARTITION BY RANGE ( clock)
(PARTITION p2023_01_01 VALUES LESS THAN (UNIX_TIMESTAMP("2023-01-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2023_01_02 VALUES LESS THAN (UNIX_TIMESTAMP("2023-01-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2023_01_03 VALUES LESS THAN (UNIX_TIMESTAMP("2023-01-04 00:00:00")) ENGINE = InnoDB);

# 通过定时任务增加和删除分区
USE `zabbix_proxy`;

CREATE TABLE IF NOT EXISTS `manage_partitions` (
`tablename` VARCHAR(64) NOT NULL COMMENT 'Table name',
`period` VARCHAR(64) NOT NULL COMMENT 'Period - daily or monthly',
`keep_history` INT(3) UNSIGNED NOT NULL DEFAULT '1' COMMENT 'For how many days or months to keep the partitions',
`last_updated` DATETIME DEFAULT NULL COMMENT 'When a partition was added last time',
`comments` VARCHAR(128) DEFAULT '1' COMMENT 'Comments',
PRIMARY KEY (`tablename`)
) ENGINE=INNODB;

CREATE TABLE IF NOT EXISTS `manage_partitions_history` (
`schema_name` varchar(64) NOT NULL COMMENT 'Zabbix schema name',
`table_name` varchar(64) NOT NULL COMMENT 'Zabbix table name',
`table_partition_name` varchar(64) NOT NULL COMMENT 'Zabbix table partition name',
`partition_action` varchar(64) NOT NULL COMMENT 'Zabbix table partition action',
`partition_action_date` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'When a partition was added or dropped'
) ENGINE=InnoDB;

INSERT INTO manage_partitions (tablename, period, keep_history, last_updated, comments) VALUES ('proxy_history', 'day', 2, now(), '');

DROP PROCEDURE IF EXISTS `create_next_partitions`;
DROP PROCEDURE IF EXISTS `create_partition_by_day`;
DROP PROCEDURE IF EXISTS `create_partition_by_month`;
DROP PROCEDURE IF EXISTS `drop_partitions`;
DROP PROCEDURE IF EXISTS `drop_old_partition`;
DROP EVENT IF EXISTS `e_zbx_part_mgmt`;

DELIMITER $$
CREATE PROCEDURE `create_next_partitions`(IN_SCHEMANAME VARCHAR(64))
BEGIN
DECLARE TABLENAME_TMP VARCHAR(64);
DECLARE PERIOD_TMP VARCHAR(12);
DECLARE DONE INT DEFAULT 0;
DECLARE get_prt_tables CURSOR FOR
SELECT `tablename`, `period`
FROM manage_partitions;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN get_prt_tables;
loop_create_part: LOOP
IF DONE THEN
LEAVE loop_create_part;
END IF;
FETCH get_prt_tables INTO TABLENAME_TMP, PERIOD_TMP;
CASE
WHEN PERIOD_TMP = 'day' THEN
CALL `create_partition_by_day`(IN_SCHEMANAME, TABLENAME_TMP);
WHEN PERIOD_TMP = 'month' THEN
CALL `create_partition_by_month`(IN_SCHEMANAME, TABLENAME_TMP);
ELSE
BEGIN
ITERATE loop_create_part;
END;
END CASE;
UPDATE manage_partitions set last_updated = NOW() WHERE tablename = TABLENAME_TMP;
END LOOP loop_create_part;
CLOSE get_prt_tables;
END$$
DELIMITER ;

DELIMITER $$
CREATE PROCEDURE `create_partition_by_day`(IN_SCHEMANAME VARCHAR(64), IN_TABLENAME VARCHAR(64))
BEGIN
DECLARE BEGINTIME TIMESTAMP;
DECLARE ENDTIME INT UNSIGNED;
DECLARE PART_ACTION VARCHAR(12);
DECLARE PARTITIONNAME VARCHAR(16);
DECLARE ROWS_CNT INT UNSIGNED;
SET BEGINTIME = DATE(NOW()) + INTERVAL 1 DAY;
SET ENDTIME = UNIX_TIMESTAMP(BEGINTIME + INTERVAL 1 DAY);
SET PART_ACTION = 'ADD';
SET PARTITIONNAME = DATE_FORMAT( BEGINTIME, 'p%Y_%m_%d' );
SELECT COUNT(*) INTO ROWS_CNT
FROM information_schema.partitions
WHERE table_schema = IN_SCHEMANAME
AND table_name = IN_TABLENAME
AND partition_name = PARTITIONNAME;
IF ROWS_CNT = 0 THEN
SET @SQL = CONCAT('INSERT INTO `manage_partitions_history`(`schema_name`,`table_name`,`table_partition_name`,`partition_action`) VALUES ("', IN_SCHEMANAME,'","', IN_TABLENAME,'","', PARTITIONNAME,'","', PART_ACTION,'");');
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
SET @SQL = CONCAT( 'ALTER TABLE `', IN_SCHEMANAME, '`.`', IN_TABLENAME, '`',' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', ENDTIME, '));' );
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
ELSE
SELECT CONCAT("partition `", PARTITIONNAME, "` for table `",IN_SCHEMANAME, ".", IN_TABLENAME, "` already exists") AS result;
END IF;
END$$
DELIMITER ;

DELIMITER $$
CREATE PROCEDURE `create_partition_by_month`(IN_SCHEMANAME VARCHAR(64), IN_TABLENAME VARCHAR(64))
BEGIN
DECLARE BEGINTIME TIMESTAMP;
DECLARE ENDTIME INT UNSIGNED;
DECLARE PART_ACTION VARCHAR(12);
DECLARE PARTITIONNAME VARCHAR(16);
DECLARE ROWS_CNT INT UNSIGNED;
SET BEGINTIME = DATE(NOW() - INTERVAL DAY(NOW()) DAY + INTERVAL 1 DAY + INTERVAL 1 MONTH);
SET ENDTIME = UNIX_TIMESTAMP(BEGINTIME + INTERVAL 1 MONTH);
SET PART_ACTION = 'ADD';
SET PARTITIONNAME = DATE_FORMAT( BEGINTIME, 'p%Y_%m' );
SELECT COUNT(*) INTO ROWS_CNT
FROM information_schema.partitions
WHERE table_schema = IN_SCHEMANAME
AND table_name = IN_TABLENAME
AND partition_name = PARTITIONNAME;
IF ROWS_CNT = 0 THEN
SET @SQL = CONCAT('INSERT INTO `manage_partitions_history`(`schema_name`,`table_name`,`table_partition_name`,`partition_action`) VALUES ("', IN_SCHEMANAME,'","', IN_TABLENAME,'","', PARTITIONNAME,'","', PART_ACTION,'");');
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
SET @SQL = CONCAT( 'ALTER TABLE `', IN_SCHEMANAME, '`.`', IN_TABLENAME, '`',' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', ENDTIME, '));' );
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
ELSE
SELECT CONCAT("partition `", PARTITIONNAME, "` for table `",IN_SCHEMANAME, ".", IN_TABLENAME, "` already exists") AS result;
END IF;
END$$
DELIMITER ;

DELIMITER $$
CREATE PROCEDURE `drop_partitions`(IN_SCHEMANAME VARCHAR(64))
BEGIN
DECLARE TABLENAME_TMP VARCHAR(64);
DECLARE PARTITIONNAME_TMP VARCHAR(64);
DECLARE VALUES_LESS_TMP INT;
DECLARE PERIOD_TMP VARCHAR(12);
DECLARE KEEP_HISTORY_TMP INT;
DECLARE KEEP_HISTORY_BEFORE INT;
DECLARE DONE INT DEFAULT 0;
DECLARE get_partitions CURSOR FOR
SELECT p.`table_name`, p.`partition_name`, LTRIM(RTRIM(p.`partition_description`)), mp.`period`, mp.`keep_history`
FROM information_schema.partitions p
JOIN manage_partitions mp ON mp.tablename = p.table_name
WHERE p.table_schema = IN_SCHEMANAME
ORDER BY p.table_name, p.subpartition_ordinal_position;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN get_partitions;
loop_check_prt: LOOP
IF DONE THEN
LEAVE loop_check_prt;
END IF;
FETCH get_partitions INTO TABLENAME_TMP, PARTITIONNAME_TMP, VALUES_LESS_TMP, PERIOD_TMP, KEEP_HISTORY_TMP;
CASE
WHEN PERIOD_TMP = 'day' THEN
SET KEEP_HISTORY_BEFORE = UNIX_TIMESTAMP(DATE(NOW() - INTERVAL KEEP_HISTORY_TMP DAY));
WHEN PERIOD_TMP = 'month' THEN
SET KEEP_HISTORY_BEFORE = UNIX_TIMESTAMP(DATE(NOW() - INTERVAL KEEP_HISTORY_TMP MONTH - INTERVAL DAY(NOW())-1 DAY));
ELSE
BEGIN
ITERATE loop_check_prt;
END;
END CASE;
IF KEEP_HISTORY_BEFORE >= VALUES_LESS_TMP THEN
CALL drop_old_partition(IN_SCHEMANAME, TABLENAME_TMP, PARTITIONNAME_TMP);
END IF;
END LOOP loop_check_prt;
CLOSE get_partitions;
END$$
DELIMITER ;

DELIMITER $$
CREATE PROCEDURE `drop_old_partition`(IN_SCHEMANAME VARCHAR(64), IN_TABLENAME VARCHAR(64), IN_PARTITIONNAME VARCHAR(64))
BEGIN
DECLARE PART_ACTION VARCHAR(12);
DECLARE PART_ACTION_DATE INT;
DECLARE ROWS_CNT INT UNSIGNED;
SELECT COUNT(*) INTO ROWS_CNT
FROM information_schema.partitions
WHERE table_schema = IN_SCHEMANAME
AND table_name = IN_TABLENAME
AND partition_name = IN_PARTITIONNAME;
SET PART_ACTION = 'DROP';
IF ROWS_CNT = 1 THEN
SET @SQL = CONCAT('INSERT INTO `manage_partitions_history`(`schema_name`,`table_name`,`table_partition_name`,`partition_action`) VALUES ("', IN_SCHEMANAME,'","', IN_TABLENAME,'","', IN_PARTITIONNAME,'","', PART_ACTION,'");');
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
SET @SQL = CONCAT( 'ALTER TABLE `', IN_SCHEMANAME, '`.`', IN_TABLENAME, '`',' DROP PARTITION ', IN_PARTITIONNAME, ';' );
PREPARE STMT FROM @SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
ELSE
SELECT CONCAT("partition `", IN_PARTITIONNAME, "` for table `", IN_SCHEMANAME, ".", IN_TABLENAME, "` does not exist") AS result;
END IF;
END$$
DELIMITER ;

DELIMITER $$
CREATE EVENT `e_zbx_part_mgmt`
ON SCHEDULE EVERY 1 DAY STARTS '2021-06-27 01:00:00'
ON COMPLETION PRESERVE
ENABLE
COMMENT 'Creating and dropping partitions'
DO BEGIN
CALL zabbix_proxy.drop_partitions('zabbix_proxy');
CALL zabbix_proxy.create_next_partitions('zabbix_proxy');
END$$
DELIMITER ;

Zabbix数据库准备

创建zabbix数据库

1
2
3
4
5
6
# mysql -uroot -p
password
mysql> create database zabbix character set utf8 collate utf8_bin;
mysql> create user zabbix@'%' identified by 'zabbix';
mysql> grant all privileges on zabbix.* to zabbix@'%';
mysql> quit;

导入zabbix数据

1
2
3
## create.sql.gz从zabbix-server服务器上复制过来

zcat /usr/share/doc/zabbix-sql-scripts/mysql/create.sql.gz | mysql -uzabbix -p zabbix

Zabbix Server HA集群

集群安装

所有节点上执行:

  1. 安装HA组件

    1
    2
    3
    yum install pcs pacemaker fence-agents-all
    systemctl start pcsd.service
    systemctl enable pcsd.service
  2. 给hacluster用户设置密码(最好相同)

    1
    echo hacluster | passwd --stdin hacluster

任意节点上执行:

  1. 用相同的密码验证所有节点

    1
    pcs host auth zabbix-server1 zabbix-server2 -u hacluster -p hacluster
  2. 创建database cluster和增加资源

    1
    pcs cluster setup zabbix_server_cluster zabbix-server1 zabbix-server2
  3. 启用集群

    1
    2
    3
    4
    5
    6
    7
    pcs cluster start --all

    #启动pacemaker服务
    systemctl start pacemaker.service

    pcs cluster enable --all
    systemctl enable pacemaker.service
  4. 检查集群状态

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    [root@zabbix-server1 ~]# pcs status
    Cluster name: zabbix_server_cluster

    WARNINGS:
    No stonith devices and stonith-enabled is not false

    Cluster Summary:
    * Stack: corosync
    * Current DC: zabbix-server2 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
    * Last updated: Thu Jun 24 16:16:52 2021
    * Last change: Thu Jun 24 16:16:52 2021 by hacluster via crmd on zabbix-server2
    * 2 nodes configured
    * 0 resource instances configured

    Node List:
    * Online: [ zabbix-server1 zabbix-server2 ]

    Full List of Resources:
    * No resources

    Daemon Status:
    corosync: active/enabled
    pacemaker: active/enabled
    pcsd: active/enabled

集群参数配置

  1. 禁用fencing

    1
    pcs property set stonith-enabled=false
  2. 忽略quorum状态

    1
    pcs property set no-quorum-policy=ignore
  3. 配置转移策略

    1
    pcs resource defaults migration-threshold=1

创建Service和测试故障转移

  1. 创建VIP服务

    1
    pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.2.29 op monitor interval=5s --group zabbix_server_cluster
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    [root@zabbix-server1 ~]# pcs status
    Cluster name: zabbix_server_cluster
    Cluster Summary:
    * Stack: corosync
    * Current DC: zabbix-server2 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
    * Last updated: Thu Jun 24 16:19:07 2021
    * Last change: Thu Jun 24 16:19:00 2021 by root via cibadmin on zabbix-server1
    * 2 nodes configured
    * 1 resource instance configured

    Node List:
    * Online: [ zabbix-server1 zabbix-server2 ]

    Full List of Resources:
    * Resource Group: zabbix_server_cluster:
    * VirtualIP (ocf::heartbeat:IPaddr2): Started zabbix-server1

    Daemon Status:
    corosync: active/enabled
    pacemaker: active/enabled
    pcsd: active/enabled

    vip可以ping通

    1
    2
    3
    4
    5
    [root@zabbix-server1 ~]# ping 192.168.2.29
    PING 192.168.2.29 (192.168.2.29) 56(84) bytes of data.
    64 bytes from 192.168.2.29: icmp_seq=1 ttl=64 time=0.023 ms
    64 bytes from 192.168.2.29: icmp_seq=2 ttl=64 time=0.037 ms
    64 bytes from 192.168.2.29: icmp_seq=3 ttl=64 time=0.035 ms

    IP信息里也能看到

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    [root@zabbix-server1 ~]# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
    2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:94:1c:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.26/24 brd 192.168.2.255 scope global noprefixroute ens192
    valid_lft forever preferred_lft forever
    inet 192.168.2.29/24 brd 192.168.2.255 scope global secondary ens192
    valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe94:1c50/64 scope link noprefixroute
    valid_lft forever preferred_lft forever
  2. 通过crm_resource强制终止服务

    1
    crm_resource --resource VirtualIP --force-stop

    通过crm_mon来监控资源的运行情况

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    #crm_mon

    Cluster Summary:
    * Stack: corosync
    * Current DC: zabbix-server2 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
    * Last updated: Thu Jun 24 16:20:09 2021
    * Last change: Thu Jun 24 16:19:00 2021 by root via cibadmin on zabbix-server1
    * 2 nodes configured
    * 1 resource instance configured

    Node List:
    * Online: [ zabbix-server1 zabbix-server2 ]

    Active Resources:
    * Resource Group: zabbix_server_cluster:
    * VirtualIP (ocf::heartbeat:IPaddr2): Started zabbix-server2

    Failed Resource Actions:
    * VirtualIP_monitor_5000 on zabbix-server1 'not running' (7): call=7, status='complete', exitreason='', last-rc-change='2021-06-24 16:20:06 +08:00', queued=0ms, exec=0ms

    这里可以看到VirtualIP服务自动转移到了zabbix-server2节点上,由此验证了资源的自动故障转移。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    [root@zabbix-server2 ~]# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
    2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:94:27:39 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.27/24 brd 192.168.2.255 scope global noprefixroute ens192
    valid_lft forever preferred_lft forever
    inet 192.168.2.29/24 brd 192.168.2.255 scope global secondary ens192
    valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe94:2739/64 scope link noprefixroute
    valid_lft forever preferred_lft forever
  3. 避免频繁故障转移,意思就是当转移到节点二后可以一直在节点2运行,直到节点2出现异常

    1
    pcs resource defaults resource−stickiness=100

Zabbix安装

  1. 配置zabbix yum源

    1
    2
    3
    4
    5
    [zabbix]
    name=zabbix
    baseurl=http://yumserver/zabbix/zabbix/5.4/rhel/8/x86_64/
    enable=1
    gpgcheck=0
  2. 安装zabbix server、前端和agent

    1
    dnf install zabbix-server-mysql zabbix-web-mysql zabbix-nginx-conf zabbix-sql-scripts zabbix-agent
  3. 配置zabbix_server.conf

    1
    2
    3
    4
    5
    6
    7
    8
    # 修改sourceip为VIP
    SourceIP=192.168.2.29

    # 修改dbhost为db的vip
    DBHost=192.168.2.28
    DBName=zabbix
    DBUser=zabbix
    DBPassword=<DB_ZABBIX_PASS>
  4. 对于zabbix_server节点创建ZabbixServer资源

    1
    pcs resource create zabbixserver systemd:zabbix-server op monitor interval=10s --group zabbix_server_cluster
  5. 两个zabbix server不能同时运行,所以要确保zabbix server只在其中一个节点在线

    1
    pcs constraint colocation add VirtualIP with zabbixserver INFINITY
  6. 确保VirtualIP在zabbixserver之前开始运行

    1
    pcs constraint order VirtualIP then zabbixserver
  7. 配置资源的超时时间

    1
    2
    pcs resource op add zabbixserver start interval=0s timeout=60s
    pcs resource op add zabbixserver stop interval=0s timeout=120s
  8. 检查资源状态

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    [root@zabbix-server1 zabbix]# pcs status
    Cluster name: zabbix_server_cluster
    Cluster Summary:
    * Stack: corosync
    * Current DC: zabbix-server1 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
    * Last updated: Thu Jun 24 17:58:18 2021
    * Last change: Thu Jun 24 17:56:40 2021 by root via crm_resource on zabbix-server1
    * 2 nodes configured
    * 2 resource instances configured

    Node List:
    * Online: [ zabbix-server1 zabbix-server2 ]

    Full List of Resources:
    * Resource Group: zabbix_server_cluster:
    * VirtualIP (ocf::heartbeat:IPaddr2): Started zabbix-server1
    * zabbixserver (systemd:zabbix-server): Started zabbix-server1

    Daemon Status:
    corosync: active/enabled
    pacemaker: active/enabled
    pcsd: active/enabled
  9. 修改/etc/nginx/conf.d/zabbix.conf文件

    1
    2
    listen          80;
    server_name 192.168.2.29;
  10. 启动zabbix server和agent

    1
    2
    systemctl restart zabbix-server zabbix-agent nginx php-fpm
    systemctl enable zabbix-server zabbix-agent nginx php-fpm
  11. 关闭IPV6

    1
    2
    3
    4
    5
    # 临时关闭
    # sysctl -w net.ipv6.conf.all.disable_ipv6=1

    # Disabling IPv6 in NetworkManager
    nmcli connection modify ens192 ipv6.method "disabled"
  12. 修改默认时区

    1
    2
    3
    4
    5
    6
    [root@zabbix-server1 ~]# vim /etc/php.ini

    date.timezone = Asia/Shanghai

    #重启服务
    systemctl restart php-fpm
  13. zabbix server参数配置(参考)

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    [root@zabbix-server1 include]# egrep -v '^#|^$' /etc/zabbix/zabbix_server.conf 
    SourceIP=192.168.2.29
    LogFile=/var/log/zabbix/zabbix_server.log
    LogFileSize=0
    PidFile=/var/run/zabbix/zabbix_server.pid
    SocketDir=/var/run/zabbix
    DBHost=192.168.2.28
    DBName=zabbix
    DBUser=zabbix
    DBPassword=zabbix
    StartPollers=200
    StartPreprocessors=20
    StartPollersUnreachable=5
    StartTrappers=20
    StartPingers=5
    StartDiscoverers=5
    StartHTTPPollers=5
    StartTimers=5
    StartEscalators=5
    StartAlerters=5
    SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
    StartSNMPTrapper=1
    CacheSize=2G
    StartDBSyncers=20
    HistoryCacheSize=1G
    HistoryIndexCacheSize=512M
    TrendCacheSize=512M
    TrendFunctionCacheSize=128M
    ValueCacheSize=128M
    Timeout=30
    LogSlowQueries=3000
    StartLLDProcessors=20
    AllowRoot=1
    StatsAllowedIP=127.0.0.1

故障处理

验证节点失败Unable to communicate

1
2
3
4
rm -rf /var/lib/pcsd/
yum reinstall pcs -y
systemctl start pcsd.service
systemctl enable pcsd.service

创建cluster报错Unable to read the known-hosts file: No such file or directory: ‘/var/lib/pcsd/known-hosts’

1
2
3
pcs cluster destroy

然后重新pcs cluster setup

Authentication plugin ‘caching_sha2_password’ reported error: Authentication requires secure connection

使用复制用户请求服务器公钥:
mysql -u rep -p -h 192.168.2.24 -P3306 --get-server-public-key
在这种情况下,服务器将RSA公钥发送给客户端,后者使用它来加密密码并将结果返回给服务器。插件使用服务器端的RSA私钥解密密码,并根据密码是否正确来接受或拒绝连接。

重新在从库配置change masrer to并且start slave,复制可以正常启动:

1
2
3
4
5
6
7
8
#停止主从复制
#清空之前的主从复制配置信息
stop slave;
reset slave;

#从新配置主从复制
change master to master_user='repl',master_password='123',master_host='118.31.127.96',master_port=3307,master_auto_position=1;
start slave;

缺少字体

查看系统带的字体

1
locale -a

安装缺少的字体

1
yum install langpacks-en.noarch

Zabbix Server Cluster部署最佳实践
https://www.xbdba.com/2021/06/25/zabbix-server-cluster/
作者
xbdba
发布于
2021年6月25日
许可协议