
Bulk-creating users with the Zabbix API

After configuring LDAP authentication for Zabbix, the users still have to exist in Zabbix beforehand, which is a nuisance: the AD domain holds several thousand users, two to three hundred of them in IT, and adding e-mail addresses and mobile numbers on top makes the workload substantial.

Not being very familiar with Python or LDAP, I spent a few days without finding a way to sync users from LDAP into Zabbix directly. But another system happened to persist the AD accounts into a database, so as a fallback I pulled the data from there and, following the official Zabbix API documentation, got the result I wanted.

Python 3 is used here, calling the API through the pyzabbix library.
Basic usage, following the pyzabbix documentation:

from pyzabbix import ZabbixAPI

zapi = ZabbixAPI("http://zabbixserver.example.com")
zapi.login("zabbix user", "zabbix pass")
print("Connected to Zabbix API Version %s" % zapi.api_version())

for h in zapi.host.get(output="extend"):
    print(h['hostid'])

Fetching the AD account data from the database

The accounts sit in a temporary table, so fetching them with pymysql is straightforward:

query_sql = """select * from tmp_user"""
mycursor.execute(query_sql)
data = mycursor.fetchall()

Four fields are retrieved per row: login account, display name, mobile number, and e-mail address.

Fetching the existing accounts

The first step is to log in to Zabbix and retrieve all user information in the system.
Request object

{
    "jsonrpc": "2.0",
    "method": "user.get",
    "params": {
        "output": "extend"
    },
    "auth": "038e1d7b1735c6a5436ee9eae095879e",
    "id": 1
}

Response

{
    "jsonrpc": "2.0",
    "result": [
        {
            "userid": "1",
            "alias": "Admin",
            "name": "Zabbix",
            "surname": "Administrator",
            "url": "",
            "autologin": "1",
            "autologout": "0s",
            "lang": "ru_RU",
            "refresh": "0s",
            "type": "3",
            "theme": "default",
            "attempt_failed": "0",
            "attempt_ip": "",
            "attempt_clock": "0",
            "rows_per_page": "50"
        }
    ],
    "id": 1
}

Because alias is a unique index, user.get is filtered on alias: if the user already exists its details are updated, otherwise a new user is created.

user = zapi.user.get(filter={"alias": row[0]},output='extend')

This returns all attributes of the account with the given alias. The important one is userid, which is required as input for the subsequent update.
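pyzabbix returns the result array as a plain Python list of dicts, so the userid lookup can be sketched against a made-up response shaped like the example above:

```python
# Hypothetical response mirroring the user.get sample above;
# pyzabbix hands back the "result" array as a Python list of dicts.
sample_result = [
    {"userid": "1", "alias": "Admin", "name": "Zabbix", "surname": "Administrator"},
]

def find_userid(result, alias):
    """Return the userid for the given alias, or None if absent."""
    for user in result:
        if user["alias"] == alias:
            return user["userid"]
    return None

print(find_userid(sample_result, "Admin"))  # -> 1
```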

Updating a user

Using the userid obtained above, only the mobile number and e-mail address are set in the properties, so the call updates just those attributes and leaves everything else unchanged:

user_defaults = {'userid': user[0]['userid'], "user_medias": [
                {
                    "mediatypeid": "3",
                    "sendto": row[2]
                },
                {
                    "mediatypeid": "4",
                    "sendto": row[3]
                }
            ]}

Creating a user

Creating a user involves more fields: the SMS and e-mail alert severities, plus account defaults such as the initial password:

user_defaults = {'alias': row[0], 'name': row[1], 'autologin': 1, 'autologout': 0, 'lang': 'zh_CN', 'type': 1,
                         'usrgrps': [{'usrgrpid': 88}], 'passwd': default_password, "user_medias": [
                {
                    "mediatypeid": "3",
                    "sendto": row[2],
                    "active": 0,
                    "severity": 48,
                    "period": "1-7,00:00-24:00"
                },
                {
                    "mediatypeid": "4",
                    "sendto": row[3],
                    "active": 0,
                    "severity": 8,
                    "period": "1-7,00:00-24:00"
                }
            ]}
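The severity values 48 and 8 above are bitmasks over the six Zabbix trigger severities (bit 0 = Not classified, up to bit 5 = Disaster). A small helper, assuming the standard severity order, makes the numbers readable: 48 covers High and Disaster only, while 8 corresponds to Average.

```python
# Zabbix trigger severities in bit order (bit 0 .. bit 5)
SEVERITIES = ["Not classified", "Information", "Warning",
              "Average", "High", "Disaster"]

def severity_mask(*names):
    """Combine severity names into the bitmask user_medias expects."""
    return sum(1 << SEVERITIES.index(n) for n in names)

print(severity_mask("High", "Disaster"))  # -> 48
print(severity_mask("Average"))           # -> 8
```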

Complete script

#!/usr/bin/env python3
import pymysql
from pyzabbix import ZabbixAPI

# positional connect() arguments are deprecated in recent pymysql; use keywords
mysql_conn = pymysql.connect(host="127.0.0.1", user="zabbix",
                             password="zabbix", database="zabbix")
mycursor = mysql_conn.cursor()


zabbix_server = 'http://zabbixserver.example.com'

zapi = ZabbixAPI(zabbix_server)

# login to the Zabbix API
zapi.login('Admin', 'zabbix')


query_sql = """select * from tmp_user"""
mycursor.execute(query_sql)
data = mycursor.fetchall()

default_password = '123456'

for row in data:
    # row: (alias, name, mobile, email)
    user = zapi.user.get(filter={"alias": row[0]}, output='extend')

    if user:
        userid = user[0]["userid"]
        # only the listed attributes are updated; everything else is untouched
        user_defaults = {'userid': userid, "user_medias": [
                {
                    "mediatypeid": "3",
                    "sendto": row[2]
                },
                {
                    "mediatypeid": "4",
                    "sendto": row[3]
                }
            ]}
        zapi.user.update(user_defaults)
        print("Updated user {0}".format(row[1]))

    else:
        user_defaults = {'alias': row[0], 'name': row[1], 'autologin': 1, 'autologout': 0,
                         'lang': 'zh_CN', 'type': 1, 'usrgrps': [{'usrgrpid': 88}],
                         'passwd': default_password, "user_medias": [
                {
                    "mediatypeid": "3",
                    "sendto": row[2],
                    "active": 0,
                    "severity": 48,
                    "period": "1-7,00:00-24:00"
                },
                {
                    "mediatypeid": "4",
                    "sendto": row[3],
                    "active": 0,
                    "severity": 8,
                    "period": "1-7,00:00-24:00"
                }
            ]}
        zapi.user.create(user_defaults)
        print("Created user {0}".format(row[1]))

Execution result

[root@zabbix-pri ~]# ./test.py 
Updated user 占凯
Created user 王锴


Zabbix alert optimization

The new Zabbix instance now has more than 1,300 monitored devices and over 30,000 triggers, and alerts fly all day long, so the useful information is routinely drowned in the alert storm. Since everything currently goes out by SMS, the cost is also considerable, which makes alert optimization essential.

Alert dependencies

Sometimes the availability of one host depends on another. If a router goes down, every server behind it becomes unreachable. With triggers on both, you would be notified that both hosts are down, even though only the router actually failed.

This is where dependencies between hosts are useful: with a dependency configured, the dependent notification can be suppressed and only the root problem reported.

Although Zabbix does not support direct dependencies between hosts, it offers a more flexible mechanism: trigger dependencies. A trigger can depend on one or more other triggers.

For example, a host sits behind Router2, which in turn sits behind Router1:

Zabbix – Router1 – Router2 – Host

If Router1 goes down, the Host and Router2 obviously become unreachable too, yet we do not want three notifications reporting that the Host, Router1 and Router2 are all down.

In this case, therefore, we define two dependencies:

the 'Host down' trigger depends on the 'Router2 down' trigger
the 'Router2 down' trigger depends on the 'Router1 down' trigger

Before changing the state of the 'Host down' trigger, Zabbix checks its trigger dependencies. If one is found and that trigger is in PROBLEM state, the 'Host down' trigger does not change state, so no action is executed and no notification is sent.

Zabbix performs this check recursively: if Router1 or Router2 is unreachable, the Host trigger is not updated.

Using this feature, dependencies were configured between related triggers, which eliminated a portion of the alerts.
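Such dependencies can also be managed through the API via trigger.adddependencies. A minimal sketch that only builds the request parameters; the trigger IDs are made up:

```python
def dependency_payload(triggerid, depends_on):
    """Parameters for trigger.adddependencies: the dependent trigger
    only changes state while the trigger it depends on is OK."""
    return {"triggerid": triggerid, "dependsOnTriggerid": depends_on}

# 'Host down' (17367) depends on 'Router2 down' (17368) -- made-up IDs
payload = dependency_payload("17367", "17368")
# zapi.trigger.adddependencies(**payload)  # uncomment against a live server
print(payload)
```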

Alert escalation

Each system's user group has its own permissions, so any individual engineer receives a tolerable number of alerts; but leaders and people with broader permissions see tens of times more, and that is where escalation comes in.

When an alert first fires, the message goes only to the first-line operations staff; only if it is still unresolved after 5 minutes is it sent to more senior people.
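In a Zabbix action this maps to escalation steps: step 1 messages the first-line group, and a later step fires only after esc_period elapses without resolution. A sketch of the operations structure; the group IDs, media type ID, and action name are assumptions, not taken from the original setup:

```python
ESC_PERIOD = "5m"  # time between escalation steps

operations = [
    {   # step 1: first-line on-call group only
        "operationtype": 0,                     # 0 = send message
        "esc_step_from": 1, "esc_step_to": 1,
        "opmessage_grp": [{"usrgrpid": "88"}],  # assumed group id
        "opmessage": {"default_msg": 1, "mediatypeid": "3"},
    },
    {   # step 2 onward: escalate to managers while still unresolved
        "operationtype": 0,
        "esc_step_from": 2, "esc_step_to": 0,   # 0 = until resolved
        "opmessage_grp": [{"usrgrpid": "89"}],  # assumed group id
        "opmessage": {"default_msg": 1, "mediatypeid": "3"},
    },
]
# zapi.action.create(name="Escalated SMS", eventsource=0,
#                    esc_period=ESC_PERIOD, operations=operations)
```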

Alert de-duplication and merging

In practice I have seen network jitter cause large-scale "agent unreachable" alerts many times, as well as services that briefly fail and recover by the next polling cycle, both producing floods of false positives.

Neither of the two techniques above copes with this, so the handling moved into the database layer. The idea: after alerts are written to the database, periodically copy them into a staging table; alerts that already recovered within the send window are deleted outright, and large-scale agent-unreachable alerts are merged into a single message.
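The deletion rule can be sketched in plain Python: an alert is dropped when its problem/recovery counterpart (linked through p_eventid) arrived within the same send window.

```python
def drop_recovered(alerts):
    """Remove problem/recovery pairs that cancel out within one window.
    Each alert is (eventid, p_eventid): p_eventid is the problem's
    eventid for a recovery alert, or None for a problem alert."""
    eventids = {a[0] for a in alerts}
    recovered = {a[1] for a in alerts if a[1] is not None}
    keep = []
    for eventid, p_eventid in alerts:
        if eventid in recovered:      # problem whose recovery also arrived
            continue
        if p_eventid in eventids:     # recovery whose problem is in the batch
            continue
        keep.append((eventid, p_eventid))
    return keep

# problem 100 recovered by 101 in the same window; problem 102 still open
print(drop_recovered([(100, None), (101, 100), (102, None)]))  # -> [(102, None)]
```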

This places one requirement on the action message: '#' is used as the field separator so the string is easy to split on the database side; only the subject needs to be configured.

#Operations:
{EVENT.NAME}#{HOST.NAME}#{HOST.IP}#{ITEM.NAME}:{ITEM.VALUE}#{EVENT.DATE} {EVENT.TIME}

#Recovery operations:
{EVENT.NAME}#{HOST.NAME}#{HOST.IP}#{ITEM.NAME}:{ITEM.VALUE}#{EVENT.RECOVERY.DATE} {EVENT.RECOVERY.TIME}
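With '#' as the separator, the database-side substring_index calls reduce to a simple split. A sketch parsing one subject; the sample values are invented:

```python
def parse_subject(subject):
    """Split the '#'-delimited subject into its five fields."""
    name, host, ip, item, when = subject.split("#", 4)
    return {"event": name, "host": host, "ip": ip,
            "item": item, "time": when}

sample = "High CPU#web01#10.0.0.5#CPU load:9.8#2019-12-10 10:00:00"
print(parse_subject(sample)["host"])  # -> web01
```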

Create the working tables (an archive table, a staging table, and the merged-output table):

CREATE TABLE `his_alerts` (
  `eventid` bigint(20) DEFAULT NULL,
  `clock` int(20) DEFAULT NULL,
  `sendto` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `subject` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `p_eventid` bigint(20) DEFAULT NULL,
  `status` varchar(2) COLLATE utf8_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

create index idx_his_alerts_eventid on his_alerts(eventid);

CREATE TABLE `tmp_alerts` (
  `eventid` bigint(20) DEFAULT NULL,
  `clock` int(20) DEFAULT NULL,
  `sendto` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `subject` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `p_eventid` bigint(20) DEFAULT NULL,
  `status` varchar(2) COLLATE utf8_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

CREATE TABLE `all_alerts` (
  `sendto` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `subject` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `eventtime` varchar(64) COLLATE utf8_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

-- all_alerts_history, used to archive sent messages, shares the same structure
CREATE TABLE `all_alerts_history` LIKE `all_alerts`;

Create a stored procedure in the MySQL database:

drop PROCEDURE IF EXISTS resize_alerts;
delimiter $$
CREATE PROCEDURE resize_alerts ( )
BEGIN

    start transaction;
    # archive the previous batch of alerts
    INSERT INTO his_alerts SELECT * FROM tmp_alerts;

    # empty the staging tables
    TRUNCATE TABLE tmp_alerts;
    truncate table all_alerts;

    # load the fresh alerts to process; the computed STATUS is
    # 1 for a problem alert, 0 for a recovery alert
    INSERT INTO tmp_alerts
    SELECT
        a.eventid,
        a.clock,
        a.sendto,
        a.`subject`,
        a.p_eventid,
        CASE WHEN a.p_eventid IS NULL THEN 1 ELSE 0 END STATUS
    FROM
        alerts a,
        `events` b
    WHERE a.clock > UNIX_TIMESTAMP( date_add( now( ), INTERVAL - 2 minute ) )
        AND a.eventid not in ( SELECT eventid FROM his_alerts )
        AND a.status = 1
        AND a.mediatypeid = 3
        AND a.eventid = b.eventid
    order by 1;


# drop problem/recovery pairs that cancelled out within the window
    delete from tmp_alerts where eventid in(select eventid from (
        SELECT a.eventid FROM tmp_alerts a WHERE eventid IN ( SELECT p_eventid FROM tmp_alerts )
        union all
        SELECT a.eventid FROM tmp_alerts a WHERE p_eventid IN ( SELECT eventid FROM tmp_alerts )) aa);

# merge mass agent-unreachable alerts into a single message
    INSERT INTO all_alerts (sendto, subject)
    SELECT
        sendto,
        concat( CASE WHEN STATUS = '0' THEN '【恢复OK】有大量zabbix客户端恢复访问! ' ELSE '【故障PROBLEM】有大量zabbix客户端无法访问,请检查网络! ' END, '主机数量:', cnt )
    FROM
        (
        SELECT
            sendto,
            STATUS,
            count( 1 ) cnt
        FROM
            tmp_alerts
        WHERE
            `subject` LIKE '%Zabbix agent 无法访问!%'
        GROUP BY
            sendto,
            STATUS
        HAVING
            count( 1 ) > 2
        ) aa;

    # remove the individual alerts that were just merged
    DELETE from tmp_alerts where eventid in(select eventid from (
    SELECT eventid
    FROM
        tmp_alerts a
    WHERE
        EXISTS (
        SELECT
            1
        FROM
            ( SELECT sendto, STATUS FROM tmp_alerts WHERE `subject` LIKE '%Zabbix agent 无法访问!%' GROUP BY sendto, STATUS HAVING count( 1 ) > 2 ) b
        WHERE
            a.sendto = b.sendto
            AND a.STATUS = b.STATUS
        )
        AND a.`subject` LIKE '%Zabbix agent 无法访问!%') aa);

    # merge alerts for the same host

    insert into all_alerts (sendto, subject, eventtime)
    select sendto,subject,substr(`subject`,instr(subject,'告警时间:')+5) eventtime from (
    select sendto,
                    concat(case  when status='0' then '【恢复OK】: ' else '【故障PROBLEM】: ' end ,triggername,
                    ' 主机:',hostname,
                    ' 问题详情:',itemvalue,
                    ' 告警时间:',eventtime) subject
    from (select sendto,group_concat(triggername order by eventtime) triggername,hostname,ip,min(itemvalue) itemvalue,min(eventtime) eventtime,status from (
                SELECT
                        a.sendto,
                        substring_index(subject,'#',1) triggername,
                        substring_index(substring_index(subject,'#',2),'#',-1) hostname,
                        substring_index(substring_index(subject,'#',3),'#',-1) ip,
                        substring_index(substring_index(subject,'#',4),'#',-1) itemvalue,
                        substring_index(subject,'#',-1) eventtime,
                        a.STATUS 
                    FROM
                        tmp_alerts a) dd
        group by sendto,hostname,ip,status order by eventtime) aa ) bb;


    # archive the sent SMS data
    insert into all_alerts_history
    select * from all_alerts;

    commit;

END $$

On the server, a Python script runs every minute to call the resize_alerts procedure, then iterates over the rows and writes them into the SMS database.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Date    : 2019-12-10
# @Author  : Xiong Bin (i@xbdba.com)
# @Link    : http://www.xbdba.com
# @Name    : sendsms.py

import pymysql
import pymssql

mysql_conn = pymysql.connect(host="127.0.0.1", user="zabbix",
                             password="zabbix", database="zabbix",
                             charset="utf8")
mycursor = mysql_conn.cursor()

ms_conn = pymssql.connect(IP, USERNAME, PASSWORD, DATABASENAME)
mscursor = ms_conn.cursor()

mycursor.callproc('resize_alerts')
query_sql = """select sendto,subject from all_alerts order by sendto,eventtime"""
mycursor.execute(query_sql)
data = mycursor.fetchall()

# SMS_TABLE is a placeholder for the SMS gateway table name
sql = "insert into SMS_TABLE values (%s, %s, 'Zabbix', 0)"
for row in data:
    mscursor.execute(sql, (row[0], row[1]))
ms_conn.commit()

Cron entry

*/1  *  *  *  *  /usr/lib/zabbix/alertscripts/sendsms.py 2>&1

Upgrading Zabbix 3.2 to 4.2

The old 3.2 installation was built from source and needs to go to the latest 4.2. Since this is not an in-place upgrade, a plain yum upgrade is not possible; instead 4.2 is installed in a new environment and the data migrated over.

Zabbix installation

With the local yum repository configured earlier, installation is easy:

yum install zabbix-server-mysql zabbix-web-mysql -y

By default Zabbix pulls in MariaDB; swap it for the MySQL used before:

yum remove mariadb-server -y
yum install mysql-server -y

For a freshly initialized database, import the schema with:

zcat /usr/share/doc/zabbix-server-mysql*/create.sql.gz | mysql -uzabbix -p zabbix

-- for proxy
zcat /usr/share/doc/zabbix-proxy-mysql*/schema.sql.gz | mysql -uzabbix -p zabbix

That completes the basic Zabbix installation.

Zabbix Proxy 安装

yum install zabbix-proxy-mysql -y
yum install mysql-community-server -y

systemctl start mysqld
systemctl enable mysqld

systemctl enable zabbix-proxy
systemctl start zabbix-proxy

Initialize MySQL and set the root password:

mysql_secure_installation

Create the database:

create database zabbix_proxy  character set utf8 collate utf8_bin;
grant all privileges on zabbix_proxy.* to zabbix@'%' identified by 'zabbix';

flush privileges;

Import the proxy schema:

zcat /usr/share/doc/zabbix-proxy-mysql*/schema.sql.gz | mysql -uzabbix -p zabbix_proxy

Edit the configuration file /etc/zabbix/zabbix_proxy.conf:

Server=<zabbix server ip>
Hostname=zabbix_proxy
DBName=zabbix_proxy
DBUser=zabbix
DBPassword=zabbix

Configure the proxy on the Zabbix server

Administration -> Proxies -> Create proxy
Proxy name: zabbix_proxy
Proxy mode: Active
Proxy address: <zabbix proxy ip>

Point the zabbix_agent configuration at zabbix_proxy:
vim /etc/zabbix/zabbix_agentd.conf

Server=<zabbix proxy ip>,<zabbix server ip>
ServerActive=<zabbix proxy ip>
Hostname= <hostname>

Add the proxied hosts on the Zabbix server

Configuration -> Hosts -> Create host
Monitored by proxy: zabbix_proxy

Backing up the old Zabbix data

Stop Zabbix:

systemctl stop zabbix-server
systemctl stop zabbix-proxy

Back up the Zabbix database:

mysqldump -uroot -p -B zabbix --force --opt |gzip > zabbix.sql.gz

Back up the configuration files, the PHP files and the Zabbix binaries.
-- configuration files

mkdir /opt/zabbix-backup/
cp /etc/zabbix/zabbix_server.conf /opt/zabbix-backup/
cp /etc/httpd/conf.d/zabbix.conf  /opt/zabbix-backup/

-- PHP files and Zabbix files

cp -R /usr/share/zabbix/ /opt/zabbix-backup/
cp -R /usr/share/doc/zabbix-* /opt/zabbix-backup/

For an in-place upgrade, simply:

yum upgrade zabbix-server-mysql zabbix-web-mysql zabbix-agent

--start the zabbix services
systemctl start zabbix-server
systemctl start zabbix-proxy
systemctl start zabbix-agent

Importing data into Zabbix 4.2

Stop Zabbix 4.2:

systemctl stop zabbix-server
systemctl stop zabbix-proxy

Import the MySQL data:

mysql -uroot -p123456 < /root/zabbix_0527.sql

Restart Zabbix 4.2:

-- start the zabbix services
systemctl start zabbix-server
systemctl start zabbix-proxy
systemctl start zabbix-agent

While starting up, the Zabbix server checks the current database version; if it does not match, the server upgrades the schema automatically:

30044:20190527:214417.505 using configuration file: /etc/zabbix/zabbix_server.conf
 30044:20190527:214417.511 current database version (mandatory/optional): 04020000/04020000
 30044:20190527:214417.511 required mandatory version: 04020000
 30044:20190527:214418.763 starting event name update forced by database upgrade
 30044:20190527:214418.804 completed 1% of event name update
 --- more

 30044:20190527:214423.599 completed 100% of event name update
 30044:20190527:214423.799 event name update completed

Tuning the Zabbix server configuration

Edit /etc/php.ini and adjust the following parameters:

max_execution_time = 300
post_max_size = 16M
max_input_vars = 10000
always_populate_raw_post_data = -1
date.timezone = Asia/Shanghai

-- restart the service
service httpd restart

vim /etc/zabbix/zabbix_server.conf

StartPollers=160
StartTrappers=20
StartPingers=100
StartDiscoverers=20
StartDBSyncers=16
HistoryCacheSize=1024M
TrendCacheSize=1024M
Timeout=30
ValueCacheSize=512M

-- restart the service
service zabbix-server restart

Edit /etc/my.cnf; the configuration I use:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/run/mysqld/mysqld.pid

# replication
server-id=1
binlog_format = row
expire_logs_days = 30
gtid_mode = ON
enforce_gtid_consistency = ON
master-verify-checksum = 1
log-slave-updates = ON
log-bin = /var/lib/mysql/mysql-bin
log_bin_index = /var/lib/mysql/mysql-bin.index

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# logs
log-error=/var/log/mysqld.log
slow_query_log_file= /var/log/mysql-slow.log
slow_query_log= 1
long_query_time= 20

key_buffer_size = 256M
table_open_cache = 512
sort_buffer_size = 16M
net_buffer_length = 8K
read_buffer_size = 16M
read_rnd_buffer_size = 512K
thread_cache_size = 64

# network
connect_timeout= 60
wait_timeout= 28800
max_connections= 2048
max_allowed_packet= 64M
max_connect_errors= 1000

# limits
tmp_table_size= 512M
max_heap_table_size= 256M

# innodb
innodb_buffer_pool_size = 10G # usually 70-80% of the server's physical RAM
innodb_log_file_size = 512M
innodb_log_buffer_size = 128M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table=1
innodb_flush_method= O_DIRECT
innodb_io_capacity= 2000

# other stuff
event_scheduler= 1
query_cache_type= 0
group_concat_max_len = 1000000
tmpdir = /tmp/mysqltmp
log_timestamps= SYSTEM

-- restart the service
service mysqld restart

MySQL optimization

The biggest Zabbix bottleneck is usually the database: the sheer volume of reads and writes puts it under heavy pressure, so the history tables can be partitioned.

Even with housekeeping disabled in the frontend, the Zabbix server still writes to the housekeeper table, so neutralize it:

ALTER TABLE housekeeper ENGINE = BLACKHOLE;

First, the seven history and trend tables need an initial partitioning pass.

ALTER TABLE `history` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `history_log` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `history_str` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `history_text` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);


ALTER TABLE `history_uint` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);


ALTER TABLE `trends` PARTITION BY RANGE ( clock)
(PARTITION p2019_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-07-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_07 VALUES LESS THAN (UNIX_TIMESTAMP("2019-08-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_08 VALUES LESS THAN (UNIX_TIMESTAMP("2019-09-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_09 VALUES LESS THAN (UNIX_TIMESTAMP("2019-10-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_10 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11 VALUES LESS THAN (UNIX_TIMESTAMP("2019-12-01 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `trends_uint` PARTITION BY RANGE ( clock)
(PARTITION p2019_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-07-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_07 VALUES LESS THAN (UNIX_TIMESTAMP("2019-08-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_08 VALUES LESS THAN (UNIX_TIMESTAMP("2019-09-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_09 VALUES LESS THAN (UNIX_TIMESTAMP("2019-10-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_10 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11 VALUES LESS THAN (UNIX_TIMESTAMP("2019-12-01 00:00:00")) ENGINE = InnoDB);
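The repetitive ALTER TABLE statements above need not be typed by hand; a sketch that generates the daily-partition DDL for one table, plain string building matching the statements shown:

```python
from datetime import date, timedelta

def daily_partition_sql(table, start, days):
    """Build the initial RANGE(clock) partition DDL for one table."""
    parts = []
    for i in range(days):
        d = start + timedelta(days=i)
        upper = d + timedelta(days=1)  # partition bound is the next midnight
        parts.append(
            'PARTITION p{} VALUES LESS THAN '
            '(UNIX_TIMESTAMP("{} 00:00:00")) ENGINE = InnoDB'.format(
                d.strftime("%Y_%m_%d"), upper.isoformat()))
    return ("ALTER TABLE `{}` PARTITION BY RANGE ( clock)\n({});"
            .format(table, ",\n".join(parts)))

sql = daily_partition_sql("history", date(2019, 11, 1), 6)
print(sql.splitlines()[0])  # -> ALTER TABLE `history` PARTITION BY RANGE ( clock)
```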

Ongoing partition maintenance uses an external Perl script; install its dependencies first:

yum install perl-Sys-Syslog perl-DateTime -y

cat part.pl

#!/usr/bin/perl

use strict;
use Data::Dumper;
use DBI;
use Sys::Syslog qw(:standard :macros);
use DateTime;
use POSIX qw(strftime);

openlog("mysql_zbx_part", "ndelay,pid", LOG_LOCAL0);

my $db_schema = 'zabbix';
my $dsn = 'DBI:mysql:'.$db_schema.':mysql_socket=/var/lib/mysql/mysql.sock';
my $db_user_name = 'zabbix';
my $db_password = 'zabbix';
my $tables = {  'history' => { 'period' => 'day', 'keep_history' => '30'},
        'history_log' => { 'period' => 'day', 'keep_history' => '30'},
        'history_str' => { 'period' => 'day', 'keep_history' => '30'},
        'history_text' => { 'period' => 'day', 'keep_history' => '30'},
        'history_uint' => { 'period' => 'day', 'keep_history' => '30'},
        'trends' => { 'period' => 'month', 'keep_history' => '2'},
        'trends_uint' => { 'period' => 'month', 'keep_history' => '2'},

# comment next 5 lines if you partition zabbix database starting from 2.2
# they usually used for zabbix database before 2.2

#      'acknowledges' => { 'period' => 'month', 'keep_history' => '23'},
#      'alerts' => { 'period' => 'month', 'keep_history' => '6'},
#      'auditlog' => { 'period' => 'month', 'keep_history' => '24'},
#      'events' => { 'period' => 'month', 'keep_history' => '12'},
#      'service_alarms' => { 'period' => 'month', 'keep_history' => '6'},
        };
my $amount_partitions = 10;

my $curr_tz = 'Asia/Shanghai';

my $part_tables;

my $dbh = DBI->connect($dsn, $db_user_name, $db_password, {'ShowErrorStatement' => 1});

unless ( check_have_partition() ) {
    print "Your installation of MySQL does not support table partitioning.\n";
    syslog(LOG_CRIT, 'Your installation of MySQL does not support table partitioning.');
    exit 1;
}

my $sth = $dbh->prepare(qq{SELECT table_name, partition_name, lower(partition_method) as partition_method,
                    rtrim(ltrim(partition_expression)) as partition_expression,
                    partition_description, table_rows
                FROM information_schema.partitions
                WHERE partition_name IS NOT NULL AND table_schema = ?});
$sth->execute($db_schema);

while (my $row =  $sth->fetchrow_hashref()) {
    $part_tables->{$row->{'table_name'}}->{$row->{'partition_name'}} = $row;
}

$sth->finish();

foreach my $key (sort keys %{$tables}) {
    unless (defined($part_tables->{$key})) {
        syslog(LOG_ERR, 'Partitioning for "'.$key.'" is not found! The table might be not partitioned.');
        next;
    }

    create_next_partition($key, $part_tables->{$key}, $tables->{$key}->{'period'});
    remove_old_partitions($key, $part_tables->{$key}, $tables->{$key}->{'period'}, $tables->{$key}->{'keep_history'})
}

delete_old_data();

$dbh->disconnect();

sub check_have_partition {
    my $result = 0;
# MySQL 5.5
    #my $sth = $dbh->prepare(qq{SELECT variable_value FROM information_schema.global_variables WHERE variable_name = 'have_partitioning'});
# MySQL 5.6
    my $sth = $dbh->prepare(qq{SELECT plugin_status FROM information_schema.plugins WHERE plugin_name = 'partition'});

    $sth->execute();

    my $row = $sth->fetchrow_array();

    $sth->finish();

# MySQL 5.5
    #return 1 if $row eq 'YES';
# MySQL 5.6
    return 1 if $row eq 'ACTIVE';
    return $result;
}

sub create_next_partition {
    my $table_name = shift;
    my $table_part = shift;
    my $period = shift;

    for (my $curr_part = 0; $curr_part < $amount_partitions; $curr_part++) {
        my $next_name = name_next_part($tables->{$table_name}->{'period'}, $curr_part);
        my $found = 0;

        foreach my $partition (sort keys %{$table_part}) {
            if ($next_name eq $partition) {
                syslog(LOG_INFO, "Next partition for $table_name table has already been created. It is $next_name");
                $found = 1;
            }
        }

        if ( $found == 0 ) {
            syslog(LOG_INFO, "Creating a partition for $table_name table ($next_name)");
            my $query = 'ALTER TABLE '."$db_schema.$table_name".' ADD PARTITION (PARTITION '.$next_name.
                        ' VALUES less than (UNIX_TIMESTAMP("'.date_next_part($tables->{$table_name}->{'period'}, $curr_part).'") div 1))';
            syslog(LOG_DEBUG, $query);
            $dbh->do($query);
        }
    }
}

sub remove_old_partitions {
    my $table_name = shift;
    my $table_part = shift;
    my $period = shift;
    my $keep_history = shift;

    my $curr_date = DateTime->now;
    $curr_date->set_time_zone( $curr_tz );

    if ( $period eq 'day' ) {
        $curr_date->add(days => -$keep_history);
        $curr_date->add(hours => -$curr_date->strftime('%H'));
        $curr_date->add(minutes => -$curr_date->strftime('%M'));
        $curr_date->add(seconds => -$curr_date->strftime('%S'));
    }
    elsif ( $period eq 'week' ) {
    }
    elsif ( $period eq 'month' ) {
        $curr_date->add(months => -$keep_history);

        $curr_date->add(days => -$curr_date->strftime('%d')+1);
        $curr_date->add(hours => -$curr_date->strftime('%H'));
        $curr_date->add(minutes => -$curr_date->strftime('%M'));
        $curr_date->add(seconds => -$curr_date->strftime('%S'));
    }

    foreach my $partition (sort keys %{$table_part}) {
        if ($table_part->{$partition}->{'partition_description'} <= $curr_date->epoch) {
            syslog(LOG_INFO, "Removing old $partition partition from $table_name table");

            my $query = "ALTER TABLE $db_schema.$table_name DROP PARTITION $partition";

            syslog(LOG_DEBUG, $query);
            $dbh->do($query);
        }
    }
}

sub name_next_part {
    my $period = shift;
    my $curr_part = shift;

    my $name_template;

    my $curr_date = DateTime->now;
    $curr_date->set_time_zone( $curr_tz );

    if ( $period eq 'day' ) {
        my $curr_date = $curr_date->truncate( to => 'day' );
        $curr_date->add(days => 1 + $curr_part);

        $name_template = $curr_date->strftime('p%Y_%m_%d');
    }
    elsif ($period eq 'week') {
        my $curr_date = $curr_date->truncate( to => 'week' );
        $curr_date->add(days => 7 * $curr_part);

        $name_template = $curr_date->strftime('p%Y_%m_w%W');
    }
    elsif ($period eq 'month') {
        my $curr_date = $curr_date->truncate( to => 'month' );
        $curr_date->add(months => 1 + $curr_part);

        $name_template = $curr_date->strftime('p%Y_%m');
    }

    return $name_template;
}

sub date_next_part {
    my $period = shift;
    my $curr_part = shift;

    my $period_date;

    my $curr_date = DateTime->now;
    $curr_date->set_time_zone( $curr_tz );

    if ( $period eq 'day' ) {
        my $curr_date = $curr_date->truncate( to => 'day' );
        $curr_date->add(days => 2 + $curr_part);
        $period_date = $curr_date->strftime('%Y-%m-%d');
    }
    elsif ($period eq 'week') {
        my $curr_date = $curr_date->truncate( to => 'week' );
        $curr_date->add(days => 7 * $curr_part + 1);
        $period_date = $curr_date->strftime('%Y-%m-%d');
    }
    elsif ($period eq 'month') {
        my $curr_date = $curr_date->truncate( to => 'month' );
        $curr_date->add(months => 2 + $curr_part);

        $period_date = $curr_date->strftime('%Y-%m-%d');
    }

    return $period_date;
}

sub delete_old_data {
    $dbh->do("DELETE FROM sessions WHERE lastaccess < UNIX_TIMESTAMP(NOW() - INTERVAL 1 MONTH)");
    $dbh->do("TRUNCATE housekeeper");
    $dbh->do("DELETE FROM auditlog_details WHERE NOT EXISTS (SELECT NULL FROM auditlog WHERE auditlog.auditid = auditlog_details.auditid)");
}

Create the cron job:

chmod +x part.pl

[root@zabbix-pri ~]# crontab -l
0 3 * * * /root/part.pl

Check the result after a run:

mysql> show create table history;
+---------+-------------------------------------------------+
| Table   | Create Table                                                                                                                                                     
+---------+-------------------------------------------------+
| history | CREATE TABLE `history` (
  `itemid` bigint(20) unsigned NOT NULL,
  `clock` int(11) NOT NULL DEFAULT '0',
  `value` double(16,4) NOT NULL DEFAULT '0.0000',
  `ns` int(11) NOT NULL DEFAULT '0',
  KEY `history_1` (`itemid`,`clock`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (`clock`)
(PARTITION p2019_06_02 VALUES LESS THAN (1559491200) ENGINE = InnoDB,
 PARTITION p2019_06_03 VALUES LESS THAN (1559577600) ENGINE = InnoDB,
 PARTITION p2019_06_04 VALUES LESS THAN (1559664000) ENGINE = InnoDB,
 PARTITION p2019_06_05 VALUES LESS THAN (1559750400) ENGINE = InnoDB,
 PARTITION p2019_06_06 VALUES LESS THAN (1559836800) ENGINE = InnoDB,
 PARTITION p2019_06_07 VALUES LESS THAN (1559923200) ENGINE = InnoDB,
 PARTITION p2019_06_08 VALUES LESS THAN (1560009600) ENGINE = InnoDB,
 PARTITION p2019_06_09 VALUES LESS THAN (1560096000) ENGINE = InnoDB,
 PARTITION p2019_06_10 VALUES LESS THAN (1560182400) ENGINE = InnoDB,
 PARTITION p2019_06_11 VALUES LESS THAN (1560268800) ENGINE = InnoDB) */ |

Proxy MySQL optimization

On the proxy's MySQL the main problem is the very large proxy_history table, so it is likewise converted to a partitioned table and cleaned up on a schedule:

drop table proxy_history;

 CREATE TABLE `proxy_history` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `itemid` bigint(20) unsigned NOT NULL,
  `clock` int(11) NOT NULL DEFAULT '0',
  `timestamp` int(11) NOT NULL DEFAULT '0',
  `source` varchar(64) COLLATE utf8_bin NOT NULL DEFAULT '',
  `severity` int(11) NOT NULL DEFAULT '0',
  `value` longtext COLLATE utf8_bin NOT NULL,
  `logeventid` int(11) NOT NULL DEFAULT '0',
  `ns` int(11) NOT NULL DEFAULT '0',
  `state` int(11) NOT NULL DEFAULT '0',
  `lastlogsize` bigint(20) unsigned NOT NULL DEFAULT '0',
  `mtime` int(11) NOT NULL DEFAULT '0',
  `flags` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`,clock)
) ENGINE=InnoDB AUTO_INCREMENT=242428784 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
PARTITION BY RANGE ( clock)
(PARTITION p2019_11_19 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-20 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_20 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-21 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_21 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-22 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_22 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-23 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_23 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-24 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_24 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-25 00:00:00")) ENGINE = InnoDB);

Everything else mirrors the setup above: the part.pl script maintains this table's partitions in the same way.