A Strange ZBX_NOTSUPPORTED Issue

While monitoring Linux hardware information, the system.hw.chassis[serial] item is used to retrieve the device serial number. Testing the system.hw.chassis keys locally with the zabbix_agentd command works fine:

[root@localhost ~]# zabbix_agentd -t system.sw.os[name]
system.sw.os[name] [s|Oracle Linux Server 7.8]
[root@localhost ~]# zabbix_agentd -t system.hw.chassis[model]
system.hw.chassis[model] [s|ProLiant DL360 Gen10]

But when fetching the data from the proxy server with zabbix_get, the item turns out to be unsupported:

[root@zabbix-proxy ~]# zabbix_get -s 192.168.2.11 -k system.sw.os[name]
Oracle Linux Server 7.8
[root@zabbix-proxy ~]# zabbix_get -s 192.168.2.11 -k system.hw.chassis[model]
ZBX_NOTSUPPORTED: Cannot obtain hardware information.

Zabbix Logrt Monitoring Windows Logs

Some applications generate a new log file named after the current date every day, and errors appearing in those log files need to be monitored in near real time. I ran into quite a few pitfalls along the way, so this is a summary for future reference.

The monitored devices are multiple Windows machines, each with several log directories in different paths, all containing Chinese characters. A new date-named log file is created in each directory every day, and the goal is to alert on multiple keywords appearing in each log file.

A few points need to be thought through:

  • Multiple machines with multiple directories clearly call for templates plus low-level discovery
  • Windows machines and Chinese paths/content mean character set conversion has to be handled
  • Multiple keywords need to be combined into a regular expression (see the sketch after this list)
  • How to keep the deployment work as minimal as possible
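On the regular-expression point, combining the keywords into a single alternation is usually enough. A minimal Python sketch of the idea (the keyword list and sample line are made-up examples, not the actual production values):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re

# hypothetical keyword list; the real one comes from the application's error vocabulary
KEYWORDS = ["ERROR", "Exception", "超时", "失败"]
pattern = re.compile("|".join(re.escape(k) for k in KEYWORDS))

line = "2020-07-23 10:00:01 交易处理失败: timeout"   # sample log line
if pattern.search(line):
    print("matched, would raise an alert")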

Configuring auto-discovery

I chose a fairly simple approach: use the built-in system.run key to obtain the file paths. The log directories of each device are recorded manually in a text file, system.run reads that file back, a JavaScript preprocessing step turns the output into a JSON result, and each directory path is then handed to the later steps as an LLD variable. This avoids deploying scripts or adding UserParameter entries on every machine and keeps the rollout simple.
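The actual transformation runs as a JavaScript preprocessing step on the discovery item; the Python sketch below only illustrates the same logic (the file name logdirs.txt and the {#LOGDIR} macro name are assumptions for illustration):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json

# one directory path per line, as returned by system.run reading the list file
raw = open("logdirs.txt", encoding="utf-8").read()

data = [{"{#LOGDIR}": d.strip()} for d in raw.splitlines() if d.strip()]
print(json.dumps({"data": data}, ensure_ascii=False, indent=4))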

Create the discovery rule based on the system.run key:


Monitoring a GBK-charset Oracle database with Zabbix

There are many ways to monitor Oracle. Currently I mainly use the third-party tool orabbix, which connects to each Oracle database over JDBC and runs SQL; its efficiency is acceptable. However, a new requirement came up: one business check has to return results containing Chinese text, and orabbix has no way to configure the character set, so the values end up garbled when stored in Zabbix. The alternative is to use the built-in database monitor item type instead. The whole setup is straightforward; the main thing worth recording is the character set conversion.

Installing the Oracle client on Linux

Download the Linux client rpm packages:

[root@localhost oracle linux client]# ll
total 54436
-rw-r--r-- 1 root root 54172532 Jul 23 16:30 oracle-instantclient19.8-basic-19.8.0.0.0-1.x86_64.rpm
-rw-r--r-- 1 root root 612416 Jul 23 16:29 oracle-instantclient19.8-devel-19.8.0.0.0-1.x86_64.rpm
-rw-r--r-- 1 root root 246760 Jul 23 16:29 oracle-instantclient19.8-odbc-19.8.0.0.0-1.x86_64.rpm
-rw-r--r-- 1 root root 702872 Jul 23 16:29 oracle-instantclient19.8-sqlplus-19.8.0.0.0-1.x86_64.rpm

Install the rpm packages:

rpm -ivh oracle-instantclient19.8-*.rpm

After installation the Oracle directory /usr/lib/oracle/19.8/client64 is created.

Create the tns directory:

[root@zabbix-wh-proxy client64]# mkdir -p network/admin

touch tnsnames.ora

Set the environment variables:

export ORACLE_HOME=/usr/lib/oracle/19.8/client64
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib64:$LD_LIBRARY_PATH
export TNS_ADMIN=$ORACLE_HOME/network/admin
export PATH=$PATH:$ORACLE_HOME/bin:$HOME/bin

Installing ODBC on Linux

yum -y install unixODBC unixODBC-devel

There are two main configuration files, /etc/odbc.ini and /etc/odbcinst.ini: the latter defines the drivers, the former defines the data source (database) parameters.

cat /etc/odbcinst.ini

[PostgreSQL]
Description=ODBC for PostgreSQL
Driver=/usr/lib/psqlodbcw.so
Setup=/usr/lib/libodbcpsqlS.so
Driver64=/usr/lib64/psqlodbcw.so
Setup64=/usr/lib64/libodbcpsqlS.so
FileUsage=1

[MySQL]
Description=ODBC for MySQL
Driver=/usr/lib/libmyodbc5.so
Setup=/usr/lib/libodbcmyS.so
Driver64=/usr/lib64/libmyodbc5.so
Setup64=/usr/lib64/libodbcmyS.so
FileUsage=1

[MySQL ODBC 8.0 Unicode Driver]
Driver=/usr/lib64/libmyodbc8w.so
UsageCount=1

[MySQL ODBC 8.0 ANSI Driver]
Driver=/usr/lib64/libmyodbc8a.so
UsageCount=1


[FreeTDS]
Driver = /usr/lib64/libtdsodbc.so.0

[OracleDriver]
Description= ODBC for Oracle
Driver= /usr/lib/oracle/19.8/client64/lib/libsqora.so.19.1

Add the Oracle driver entry at the end; Driver points to the client lib path.

cat /etc/odbc.ini

[test]
Driver = OracleDriver
ServerName = 192.168.2.112:1521/pdb12c
UserID = zabbix
Password = zabbix

Test the configuration:

isql test

+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+

That is all there is to the configuration; now simply add a database monitor item on the web frontend. The test section defined in odbc.ini is the unique DSN.

Character set issue

The Zabbix backend database is MySQL with the utf8 character set. When monitoring an Oracle database whose character set is GBK, any SQL result containing Chinese comes back garbled. Since all my previous Oracle monitoring went through orabbix, I was never able to solve this there. With ODBC-based monitoring, however, the character set conversion can be handled in the ODBC configuration.

Add an environment attribute in odbc.ini:

DMEnvAttr = SQL_ATTR_UNIXODBC_ENVATTR={NLS_LANG=SIMPLIFIED CHINESE.AL32UTF8}

Re-run the SQL and the Chinese results now come back correctly.
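A quick way to verify the DSN and the character set handling outside Zabbix is a small pyodbc check (a sketch, assuming the pyodbc package is installed; it uses the DSN, user and password defined above, and the query is only an illustration):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pyodbc

# connect through the "test" DSN from /etc/odbc.ini
conn = pyodbc.connect("DSN=test;UID=zabbix;PWD=zabbix")
cur = conn.cursor()

# any query returning Chinese text will do; dual is used here as an illustration
cur.execute("select '中文测试' from dual")
print(cur.fetchone()[0])   # should print the Chinese string without mojibake
conn.close()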

Monitoring OGG lag with Zabbix

OGG (Oracle GoldenGate) recently had an issue that went unnoticed for a while, so I decided to bring OGG monitoring into Zabbix as well. For this kind of monitoring the point is not only how to collect the data, but also making good use of Zabbix templates, low-level discovery and so on, which makes configuration and later extension much easier.

OGG status is normally checked with the info all command:

MANAGER     RUNNING                                           
EXTRACT RUNNING EPRISK 00:00:03 00:00:08
EXTRACT RUNNING ERDMEDW 00:00:04 00:00:06
EXTRACT RUNNING ESHDW 00:00:00 00:00:01
EXTRACT RUNNING PCIF 00:00:00 00:00:08
EXTRACT RUNNING PEIF 00:00:00 00:00:08
EXTRACT RUNNING PHG 00:00:00 00:00:08
EXTRACT RUNNING PQH 00:00:00 00:00:06
EXTRACT RUNNING PRISKMGR 00:00:00 00:00:08
EXTRACT RUNNING PSHDW 00:00:00 00:00:06
REPLICAT RUNNING RCIF 00:00:10 00:00:06
REPLICAT RUNNING REDMDB1 00:00:00 00:00:01
REPLICAT RUNNING REDQDB1 00:00:00 00:00:07
REPLICAT RUNNING REIF 00:00:00 00:00:00
REPLICAT RUNNING REPWIND 00:00:00 00:00:06
REPLICAT RUNNING REPWIND2 00:00:00 00:00:00
REPLICAT RUNNING REPWIND3 00:00:00 00:00:05
REPLICAT RUNNING REPWIND4 00:00:00 00:00:04
REPLICAT RUNNING RZNMGR 00:00:00 00:00:00
REPLICAT RUNNING RZNMGRC 00:00:00 00:00:01

There are 19 extract, pump and replicat processes in total. Minute-level granularity is enough for our needs, so four columns are picked out: status, process name, the lag column, and the time-since-checkpoint column.

echo "info all"|ggsci|egrep 'EXTRACT|REPLICAT'|awk -F"[ ]+|:" '{print $2,$3,$4*60+$5,$7*60+$8}'

RUNNING EPRISK 0 0
RUNNING ERDMEDW 0 0
RUNNING ESHDW 0 0
RUNNING PCIF 0 0
RUNNING PEIF 0 0
RUNNING PHG 0 0
RUNNING PQH 0 0
RUNNING PRISKMGR 0 0
RUNNING PSHDW 0 0
RUNNING RCIF 0 0
RUNNING REDMDB1 0 0
RUNNING REDQDB1 0 0
RUNNING REIF 0 0
RUNNING REPWIND 0 0
RUNNING REPWIND2 0 0
RUNNING REPWIND3 0 0
RUNNING REPWIND4 0 0
RUNNING RZNMGR 0 0
RUNNING RZNMGRC 0 0

A shell script writes the collected data to an intermediate file.

#!/bin/bash
all()
{
    source /home/oracle/.bash_profile; echo "info all"|ggsci|egrep 'EXTRACT|REPLICAT'|awk -F"[ ]+|:" '{print $2,$3,$4*60+$5,$7*60+$8}'
}

name()
{
    source /home/oracle/.bash_profile; echo "info all"|ggsci|egrep 'EXTRACT|REPLICAT'|awk '{print $3}'
}

a=$1
case "$a" in
    all)
        all
        ;;
    name)
        name
        ;;
    *)
        echo -e "Usage: ./`basename $0` [all|name]"
esac

To satisfy Zabbix low-level discovery the return value must be JSON, so the data above is formatted accordingly:

#!/usr/bin/env python
#coding:utf-8
import os, json, sys

f = open("/etc/zabbix/scripts/ogg.cfg", "r")
lines = f.readlines()
ogg_list = []
ogg_dict = {"data": None}
for line in lines:
    if line.startswith('#') or not line.split():
        continue
    line = line.strip('\n')
    status = line.split()[0]
    name = line.split()[1]
    lag = line.split()[2]
    time = line.split()[3]
    pdict = {}
    pdict["{#NAME}"] = name
    pdict["{#STATUS}"] = status
    pdict["{#LAG}"] = lag
    pdict["{#TIME}"] = time
    ogg_list.append(pdict)
f.close()
ogg_dict["data"] = ogg_list
var = sys.argv[1]
var2 = sys.argv[2]
jsonStr = json.dumps(ogg_dict, sort_keys=True, indent=4)
#print jsonStr

def get_lag(name):
    l = [x.get("{#LAG}") for t, x in enumerate(ogg_list) if x.get("{#NAME}") == name]
    print l[0]

def get_time(name):
    l = [x.get("{#TIME}") for t, x in enumerate(ogg_list) if x.get("{#NAME}") == name]
    print l[0]

def get_status(name):
    l = [x.get("{#STATUS}") for t, x in enumerate(ogg_list) if x.get("{#NAME}") == name]
    print l[0]

def get_json():
    jsonStr = json.dumps(ogg_dict, sort_keys=True, indent=4)
    print jsonStr

def ogg(var, var2):
    if var == 'json':
        get_json()
    else:
        if var2 == 'lag':
            get_lag(var)
        elif var2 == 'status':
            get_status(var)
        else:
            get_time(var)

ogg(var, var2)

The resulting data:

{
"data": [
{
"{#LAG}": "0",
"{#NAME}": "EPRISK",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "ERDMEDW",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "ESHDW",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PCIF",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PEIF",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PHG",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PQH",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PRISKMGR",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PSHDW",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "RCIF",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REDMDB1",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REDQDB1",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REIF",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REPWIND",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REPWIND2",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REPWIND3",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REPWIND4",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "RZNMGR",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "RZNMGRC",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
}
]
}

With the data available the script side is done; two custom keys need to be added:

UserParameter=ogg.discovery,/etc/zabbix/scripts/ogg_delay.py json test
UserParameter=ogg_delay[*],/etc/zabbix/scripts/ogg_delay.py $1 $2

Create a new template in Zabbix, configure the macros, add the item and trigger prototypes, and finally attach the template to the hosts to be monitored.



Zabbix Preprocessing

Introduction

Item value preprocessing allows the collected data to be transformed according to defined rules. The transformations are managed by a preprocessing manager process, and the actual steps are carried out by multiple worker processes.

On the Preprocessing tab you can define transformation rules for incoming values; one or more rules may be present, and only afterwards is the transformed value written to the database. If several rules exist, they are applied in the defined order. Preprocessing can run on the Zabbix server or on a Zabbix proxy.

Why is preprocessing needed? Because in many cases the data we collect comes in all kinds of formats and is not directly suitable for calculation, aggregation, or efficient storage.

Supported preprocessing types

Preprocessing has been supported since version 3.4, initially with only a few native options:

  • Custom multipliers
  • Numeric conversions (boolean, hex, etc.)
  • Simple change (change per second, etc.)

More complex transformations still had to be done with external tools (Python, PHP, etc.). Later, preprocessing gained more methods:

  • Regular expressions
  • Simple trimming
  • XPath and JSONPath support

Over time, version 4.2 added further extensions:

  • JavaScript and Prometheus support
  • Validation – to verify that data matches certain rules
  • Throttling – to discard useless data
  • Custom error handling – for better readability

Zabbix 4.4 brought plenty of improvements as well, for quite a different experience:

  • Processing XML data with XPath
  • Extracting and splitting JSON data with JSONPath
  • Extended custom error handling
  • CSV to JSON conversion
  • Collecting WMI, JMX and ODBC data into JSON arrays and then processing them with JSONPath

Value processing

The diagram above gives a simplified view of the steps involved in processing an item value, including the processes, objects and actions; it does not show conditional direct changes or error handling.

  • It starts with raw data fetched from the source; at this point the data only contains an ID, a timestamp and a value
  • The flow is the same regardless of data type, and for active and passive checks or trapper items alike. It only changes the data format: the raw value is validated and the item configuration is fetched from the configuration cache
  • A socket-based IPC mechanism passes the value from the data gatherers to the preprocessing manager. The gatherers do not wait for a preprocessing response, but keep collecting data continuously
  • Preprocessing includes executing the preprocessing steps and processing dependent items
  • The preprocessing manager's local data cache is flushed into the history cache
  • At this point the data flow pauses until the next flush of the history cache
  • The synchronisation process starts with normalising the data and storing it in the database. Normalisation converts values into the required formats, including truncating textual data to what the corresponding database field can hold; only after normalisation is the data sent to the database
  • Once stored, the collected values are processed further: triggers are checked, and the item state is changed if the value is unsupported

Examples

This section covers the few types I use most in practice; for the rest, see the detailed official documentation.

Text preprocessing

Extract data with a regular expression, so that a value received as text ends up stored as a number.

The regular expression step takes two parameters:

  • pattern – the regular expression itself
  • output – the output template; the escape sequence \N (N=1…9) is replaced by the Nth matched group, and \0 is replaced by the whole matched text

For example, a log line records the temperature as Temperature: 50C; for monitoring we only want to extract the number 50 as the stored value, so a regular expression can be used here.
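A minimal Python sketch of that pattern/output behaviour (the pattern and sample text mirror the example above; inside Zabbix the preprocessing step performs this substitution itself):

#!/usr/bin/env python3
import re

sample = "Temperature: 50C"
pattern = r"Temperature: (\d+)C"   # the "pattern" parameter
m = re.search(pattern, sample)
if m:
    # the "output" template \1 maps to the first captured group
    print(m.group(1))              # -> 50, stored as a numeric value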

To turn a string-formatted value into a number, trim can also be used to strip the extra characters from the text.

Extracting JSON data

A program writes the resource usage of a process into its log in the following format:

{"proc":"TEST","cpu":3.6,"mem":5559784,"vir_mem":14303532,"net_in":2,"net_out":0}

Zabbix receives this JSON string from the check, but for storage, further analysis and trigger configuration it is inconvenient as a single value, so each field in the JSON needs to be split out into a separate item. This is where JSONPath and dependent items come in.

Create a test item and insert a sample value.

Create a dependent item.

On the Preprocessing tab, add a JSONPath step.

The resulting value:

This way, each time a JSON result is collected, the individual per-field items are populated automatically, which makes later processing much easier.
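The JSONPath expressions on the dependent items would be simple dotted paths such as $.cpu or $.mem. A small Python sketch of the equivalent extraction (purely illustrative; in Zabbix the JSONPath step does this for each dependent item):

#!/usr/bin/env python3
import json

master_value = '{"proc":"TEST","cpu":3.6,"mem":5559784,"vir_mem":14303532,"net_in":2,"net_out":0}'
doc = json.loads(master_value)

# what a "$.cpu" JSONPath step would return for the cpu dependent item
print(doc["cpu"])   # -> 3.6
# and "$.mem" for the memory dependent item
print(doc["mem"])   # -> 5559784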

Detailed JSONPath usage: https://www.zabbix.com/documentation/current/manual/appendix/preprocessing/jsonpath_functionality

Custom multipliers

Custom multipliers are easy to understand: the value is simply multiplied by a given factor. Typical cases are converting bytes to bits, or converting memory values received in KB to bytes before storing them in Zabbix.
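The multipliers for those two cases are simply 8 and 1024. A trivial Python illustration of what the step does to the received value:

#!/usr/bin/env python3
received_bytes = 125          # value arrives in bytes
print(received_bytes * 8)     # custom multiplier 8 -> 1000 bits

received_kb = 2048            # value arrives in KB
print(received_kb * 1024)     # custom multiplier 1024 -> 2097152 bytes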

Throttling

This feature exists mainly to improve performance and save database space, and it is especially useful in high-frequency scenarios where the collected values are almost always identical.

For example, when monitoring the Zabbix agent heartbeat we want a fairly high check frequency, yet the vast majority of collected values are exactly the same. With throttling, repeated values can be filtered out and only changes of the monitored value are stored.

Throttling has two modes:

  • Discard unchanged – if the value has not changed it is discarded; it is not saved to the database and the Zabbix server does not even know it was received. Triggers neither fire nor recover on it, because triggers only evaluate values stored in the database; trends are likewise built only from stored values. Each item can have only one throttling option.
  • Discard unchanged with heartbeat – unchanged values are discarded within the given time window, but once the heartbeat period is reached the value is stored anyway.

The following diagrams illustrate this for an item that keeps polling and returns values of 0 and 1.

Without throttling: every value is stored in the database.

With throttling: the repeated values are discarded.

Adding the heartbeat option avoids false negatives for nodata()-style triggers.
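A small Python sketch of the "discard unchanged with heartbeat" logic (illustrative only; Zabbix implements this internally in the preprocessing step):

#!/usr/bin/env python3
def throttle(samples, heartbeat):
    """samples: list of (timestamp, value); keep a value if it changed
    or if 'heartbeat' seconds have passed since the last stored one."""
    stored = []
    last_value, last_ts = object(), None
    for ts, value in samples:
        if value != last_value or (last_ts is not None and ts - last_ts >= heartbeat):
            stored.append((ts, value))
            last_value, last_ts = value, ts
    return stored

# polled every 10s; the value only changes once, heartbeat is 60s
samples = [(t, 1 if t < 50 else 0) for t in range(0, 130, 10)]
print(throttle(samples, heartbeat=60))   # -> [(0, 1), (50, 0), (110, 0)]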

Summary

There are generally four kinds of benefits that make preprocessing worth considering:

  • Automation
    1. LLD from a master item
    2. Discovering metrics through preprocessing
  • Performance
    1. Discarding duplicate data
    2. Since 4.4, running preprocessing directly on the proxy
  • Customisation
    1. Data validation
    2. Custom advanced preprocessing rules
  • Transformation
    1. Converting data
    2. Aggregating and calculating data through preprocessing

Bulk-adding users via the Zabbix API

After configuring LDAP authentication for Zabbix, the users still have to exist in Zabbix beforehand, which is the tedious part: the AD domain has several thousand accounts, two or three hundred of which belong to IT, and each also needs an email address and mobile number configured, so the manual workload would clearly be enormous.

Being not very familiar with Python or LDAP, I spent several days without figuring out how to sync user information from LDAP into Zabbix directly. Luckily another system turned out to already land the AD accounts into a database, so as a fallback I take the data from that database and, following the official Zabbix API documentation, get the result I want.

Python 3 is used here, calling the API through the pyzabbix library.

Following the syntax from the pyzabbix documentation:

from pyzabbix import ZabbixAPI

zapi = ZabbixAPI("http://zabbixserver.example.com")
zapi.login("zabbix user", "zabbix pass")
print("Connected to Zabbix API Version %s" % zapi.api_version())

for h in zapi.host.get(output="extend"):
    print(h['hostid'])

The temporary table is read in a very simple way via pymysql:

QurySql ="""select * from tmp_user"""
mycursor.execute(QurySql)
data = mycursor.fetchall()

Four fields are retrieved in total: login account, display name, mobile number and email address.

Fetching existing accounts

The first step is to log in to Zabbix and retrieve all user information in the system.

Request:

{
"jsonrpc": "2.0",
"method": "user.get",
"params": {
"output": "extend"
},
"auth": "038e1d7b1735c6a5436ee9eae095879e",
"id": 1
}

Response:

{
"jsonrpc": "2.0",
"result": [
{
"userid": "1",
"alias": "Admin",
"name": "Zabbix",
"surname": "Administrator",
"url": "",
"autologin": "1",
"autologout": "0s",
"lang": "ru_RU",
"refresh": "0s",
"type": "3",
"theme": "default",
"attempt_failed": "0",
"attempt_ip": "",
"attempt_clock": "0",
"rows_per_page": "50"
}
],
"id": 1
}

Since alias is a unique index, user.get is called with alias as the filter: if the user already exists the information is updated, otherwise a new user is created.

user = zapi.user.get(filter={"alias": row[0]},output='extend')

This returns all attributes of the account matching the given alias. One important value in the response is userid, which is needed as input for the subsequent update.

Updating a user

Using the userid obtained above, only the mobile number and email address are set in the properties, so the update call only changes the configured attributes and leaves everything else untouched.

user_defaults = {'userid': user[0]["userid"], "user_medias": [
    {
        "mediatypeid": "3",
        "sendto": row[2]
    },
    {
        "mediatypeid": "4",
        "sendto": row[3]
    }
]}

Creating a user

Creating a user involves a bit more: the SMS and email alert severities are set, along with some other account attributes such as the default password.

user_defaults = {'alias': row[0], 'name': row[1], 'autologin': 1, 'autologout': 0, 'lang': 'zh_CN', 'type': 1,
                 'usrgrps': [{'usrgrpid': 88}], 'passwd': default_password, "user_medias": [
    {
        "mediatypeid": "3",
        "sendto": row[2],
        "active": 0,
        "severity": 48,
        "period": "1-7,00:00-24:00"
    },
    {
        "mediatypeid": "4",
        "sendto": row[3],
        "active": 0,
        "severity": 8,
        "period": "1-7,00:00-24:00"
    }
]}
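The severity field in user_medias is a bitmask with one bit per trigger severity (Not classified=1, Information=2, Warning=4, Average=8, High=16, Disaster=32), so 48 above enables High and Disaster for the SMS media and 8 enables only Average for email. A quick Python check of the values:

#!/usr/bin/env python3
SEVERITIES = ["Not classified", "Information", "Warning", "Average", "High", "Disaster"]

def mask(*names):
    return sum(1 << SEVERITIES.index(n) for n in names)

print(mask("High", "Disaster"))   # -> 48, used for the SMS media
print(mask("Average"))            # -> 8, used for the email media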

Full script

#!/usr/bin/env python3
import pymysql
from pyzabbix import ZabbixAPI, ZabbixAPIException

mysql_conn = pymysql.connect("127.0.0.1", "zabbix", "zabbix", "zabbix")
mycursor = mysql_conn.cursor()


zabbix_server = 'http://zabbixserver.example.com'

zapi = ZabbixAPI(zabbix_server)

# login to the Zabbix API
zapi.login('Admin', 'zabbix')


QurySql = """select * from tmp_user"""
mycursor.execute(QurySql)
data = mycursor.fetchall()

default_password = '123456'

for row in data:
    user = zapi.user.get(filter={"alias": row[0]}, output='extend')

    if user:
        userid = user[0]["userid"]
        user_defaults = {'userid': userid, "user_medias": [
            {
                "mediatypeid": "3",
                "sendto": row[2]
            },
            {
                "mediatypeid": "4",
                "sendto": row[3]
            }
        ]}
        zapi.user.update(user_defaults)
        print("Updated user {0}".format(row[1]))

    else:
        user_defaults = {'alias': row[0], 'name': row[1], 'autologin': 1, 'autologout': 0, 'lang': 'zh_CN', 'type': 1,
                         'usrgrps': [{'usrgrpid': 88}], 'passwd': default_password, "user_medias": [
            {
                "mediatypeid": "3",
                "sendto": row[2],
                "active": 0,
                "severity": 48,
                "period": "1-7,00:00-24:00"
            },
            {
                "mediatypeid": "4",
                "sendto": row[3],
                "active": 0,
                "severity": 8,
                "period": "1-7,00:00-24:00"
            }
        ]}
        zapi.user.create(user_defaults)
        print("Created user {0}".format(row[1]))

Execution result:

[root@zabbix-pri ~]# ./test.py 
Updated user 占凯
Created user 王锴


Optimizing Zabbix alerting

The new Zabbix system currently has over 1,300 monitored devices and more than 30,000 triggers, and alerts fly around all day long, so the useful information tends to get drowned in the alert storm. Since all alerts currently go out by SMS, the cost is not small either, which makes it well worth optimizing the alerting.

Main approaches

Trigger dependencies

Sometimes the availability of one host depends on another. If a router goes down, the servers behind it become unreachable. With triggers configured for both, you would get notifications about both hosts being down, even though only the router has really failed.

This is where dependencies between hosts can help: with dependencies configured, the dependent notifications can be suppressed and only the notification about the root problem is sent.

Although Zabbix does not support direct dependencies between hosts, they can be defined in another, more flexible way: trigger dependencies. A trigger can depend on one or more other triggers.

For example, a host sits behind Router2, which in turn sits behind Router1:

Zabbix – Router1 – Router2 – Host

If Router1 is down, then obviously the Host and Router2 are unreachable as well, but we do not want to receive three notifications saying that the Host, Router1 and Router2 are all down.

So in this case two dependencies are defined:

the 'Host is down' trigger depends on the 'Router2 is down' trigger

the 'Router2 is down' trigger depends on the 'Router1 is down' trigger

Before changing the state of the 'Host is down' trigger, Zabbix checks the triggers it depends on; if any of them is in PROBLEM state, the trigger state is not changed, so no action is executed and no notification is sent.

Zabbix performs this check recursively: if Router1 or Router2 is unreachable, the host trigger is not updated.

Based on this feature, I linked the triggers that have dependencies between them, which removes a portion of the alerts.

Alert escalation

Each system's user group has been given its own permissions, so no single person receives too many alerts; but leaders and people with broader permissions receive tens of times more, and that is where alert escalation comes in.

When an alert first fires, the message only goes to the first-line operations staff; only if it is still unresolved after 5 minutes is it sent to people with higher rank or broader permissions.

Alert deduplication and merging

In practice I have repeatedly seen network jitter cause large numbers of "agent unreachable" alerts, or a service become briefly unavailable and recover by the next polling cycle, producing plenty of false alarms.

Neither of the two methods above handles this scenario well, so I ended up doing some processing at the database level. The main idea is that after the generated alerts are written to the database, they are periodically copied into a temporary table; alerts that have already recovered within the sending interval are deleted outright, and the large-scale agent-unreachable false alarms are merged into a single notification.

This puts a small requirement on the action content: the # character is used as a field separator so the strings are easy to process on the database side, and only the subject needs to be configured.

#Operations:
{EVENT.NAME}#{HOST.NAME}#{HOST.IP}#{ITEM.NAME}:{ITEM.VALUE}#{EVENT.DATE} {EVENT.TIME}

#Recovery operations:
{EVENT.NAME}#{HOST.NAME}#{HOST.IP}#{ITEM.NAME}:{ITEM.VALUE}#{EVENT.RECOVERY.DATE} {EVENT.RECOVERY.TIME}
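The actual processing is done by the SQL in the scripts below, but the field layout is easy to see from a quick split (the sample subject is made up for illustration):

#!/usr/bin/env python3
subject = "High CPU load#web-01#192.168.2.20#CPU load:9.5#2020.07.23 10:00:00"

event_name, host, ip, item_and_value, when = subject.split("#")
print(event_name)       # {EVENT.NAME}
print(host, ip)         # {HOST.NAME}, {HOST.IP}
print(item_and_value)   # {ITEM.NAME}:{ITEM.VALUE}
print(when)             # {EVENT.DATE} {EVENT.TIME}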

Scripts

Download: https://github.com/xblvesting/zabbix-alerts-optimize

create_table.sql creates the related tables, resize_alerts.sql is the database stored procedure, and send_msg.py calls the stored procedure and writes the final result into the SMS database to be sent out.

Upgrading Zabbix 3.2 to 4.2

An old 3.2 installation built from source now needs to be upgraded to the latest 4.2. Since this is not an in-place upgrade of the existing installation, a straight yum upgrade is not possible; instead, 4.2 is installed in a new environment and the data is migrated over.

Installing Zabbix

With the local repository configured earlier, yum makes this easy:

yum install zabbix-server-mysql zabbix-web-mysql -y

After installation Zabbix defaults to MariaDB; here it is swapped for the MySQL used before.

yum remove mariadb-server -y
yum install mysql-server -y

For a freshly initialized database, run the following to import the schema:

zcat /usr/share/doc/zabbix-server-mysql*/create.sql.gz | mysql -uzabbix -p zabbix

-- for proxy
zcat /usr/share/doc/zabbix-proxy-mysql*/schema.sql.gz | mysql -uzabbix -p zabbix

With that, Zabbix itself is installed.

Installing Zabbix Proxy

yum install zabbix-proxy-mysql -y
yum install mysql-community-server -y

systemctl start mysqld
systemctl enable mysqld

systemctl enable zabbix-proxy
systemctl start zabbix-proxy

Initialize MySQL and set the root password:

mysql_secure_installation

Create the database:

create database zabbix_proxy  character set utf8 collate utf8_bin;
grant all privileges on zabbix_proxy.* to zabbix@'%' identified by 'zabbix';

flush privileges;

Import the proxy schema:

zcat /usr/share/doc/zabbix-proxy-mysql*/schema.sql.gz | mysql -uzabbix -p zabbix_proxy

Edit the configuration file /etc/zabbix/zabbix_proxy.conf:

Server=<zabbix server ip>
Hostname=zabbix_proxy
DBName=zabbix_proxy
DBUser=zabbix
DBPassword=zabbix

Register the proxy on the Zabbix server:

Administration -> Proxies -> Create proxy
Proxy name: zabbix_proxy
Proxy mode: Active
Proxy address: <zabbix proxy ip>

Point the zabbix_agent configuration at the zabbix_proxy:

vim /etc/zabbix/zabbix_agentd.conf

Server=<zabbix proxy ip>,<zabbix server ip>
ServerActive=<zabbix proxy ip>
Hostname=<hostname>

Add the proxied host on the Zabbix server:

Configuration -> Hosts -> Create host
Monitored by proxy: zabbix_proxy

Backing up the old Zabbix data

Stop Zabbix:

systemctl stop zabbix-server
systemctl stop zabbix-proxy

Back up the zabbix database:

mysqldump -uroot -p -B zabbix --force --opt |gzip > zabbix.sql.gz

Back up the configuration files, PHP files and Zabbix binaries.

— configuration files

mkdir /opt/zabbix-backup/
cp /etc/zabbix/zabbix_server.conf /opt/zabbix-backup/
cp /etc/httpd/conf.d/zabbix.conf /opt/zabbix-backup/

— PHP files and Zabbix files

cp -R /usr/share/zabbix/ /opt/zabbix-backup/
cp -R /usr/share/doc/zabbix-* /opt/zabbix-backup/

If this were an in-place upgrade of the existing installation:

yum upgrade zabbix-server-mysql zabbix-web-mysql zabbix-agent

-- start the zabbix services
systemctl start zabbix-server
systemctl start zabbix-proxy
systemctl start zabbix-agent

Importing the data into Zabbix 4.2

Stop Zabbix 4.2:

systemctl stop zabbix-server
systemctl stop zabbix-proxy

Import the MySQL data:

mysql -uroot -p123456 < /root/zabbix_0527.sql

Start Zabbix 4.2 again:

-- start the zabbix services
systemctl start zabbix-server
systemctl start zabbix-proxy
systemctl start zabbix-agent

While starting up, the Zabbix server checks the current database version; if it does not match, it upgrades the database automatically and updates the relevant table structures:

30044:20190527:214417.505 using configuration file: /etc/zabbix/zabbix_server.conf
30044:20190527:214417.511 current database version (mandatory/optional): 04020000/04020000
30044:20190527:214417.511 required mandatory version: 04020000
30044:20190527:214418.763 starting event name update forced by database upgrade
30044:20190527:214418.804 completed 1% of event name update
30044:20190527:214418.805 completed 2% of event name update
30044:20190527:214418.805 completed 3% of event name update
30044:20190527:214418.806 completed 4% of event name update
30044:20190527:214418.806 completed 5% of event name update
30044:20190527:214418.807 completed 6% of event name update
30044:20190527:214418.808 completed 7% of event name update
30044:20190527:214418.808 completed 8% of event name update
30044:20190527:214418.809 completed 9% of event name update
30044:20190527:214418.809 completed 10% of event name update
30044:20190527:214418.810 completed 11% of event name update
30044:20190527:214418.811 completed 12% of event name update
30044:20190527:214419.259 completed 13% of event name update
30044:20190527:214419.260 completed 14% of event name update
30044:20190527:214419.260 completed 15% of event name update
30044:20190527:214419.261 completed 16% of event name update
30044:20190527:214419.261 completed 17% of event name update
30044:20190527:214419.262 completed 18% of event name update
30044:20190527:214419.263 completed 19% of event name update
30044:20190527:214419.263 completed 20% of event name update
30044:20190527:214419.264 completed 21% of event name update
30044:20190527:214419.264 completed 22% of event name update
30044:20190527:214419.265 completed 23% of event name update
30044:20190527:214419.658 completed 24% of event name update
30044:20190527:214419.658 completed 25% of event name update
--- more

30044:20190527:214423.599 completed 100% of event name update
30044:20190527:214423.799 event name update completed

Tuning the Zabbix server configuration

vim /etc/php.ini and change the following parameters:

max_execution_time = 300
post_max_size = 16M
max_input_vars = 10000
always_populate_raw_post_data = -1
date.timezone = Asia/Shanghai

-- restart the service
service httpd restart

vim /etc/zabbix/zabbix_server.conf

StartPollers=160
StartTrappers=20
StartPingers=100
StartDiscoverers=20
StartDBSyncers=16
HistoryCacheSize=1024M
TrendCacheSize=1024M
Timeout=30
ValueCacheSize=512M

-- restart the service
service zabbix-server restart

vim /etc/my.cnf, the configuration I use personally:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/run/mysqld/mysqld.pid

# replication
server-id=1
binlog_format = row
expire_logs_days = 30
gtid_mode = ON
enforce_gtid_consistency = ON
master-verify-checksum = 1
log-slave-updates = ON
log-bin = /var/lib/mysql/mysql-bin
log_bin_index = /var/lib/mysql/mysql-bin.index

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# logs
log-error=/var/log/mysqld.log
slow_query_log_file= /var/log/mysql-slow.log
slow_query_log= 1
long_query_time= 20

key_buffer_size = 256M
table_open_cache = 512
sort_buffer_size = 16M
net_buffer_length = 8K
read_buffer_size = 16M
read_rnd_buffer_size = 512K
thread_cache_size = 64

# network
connect_timeout= 60
wait_timeout= 28800
max_connections= 2048
max_allowed_packet= 64M
max_connect_errors= 1000

# limits
tmp_table_size= 512M
max_heap_table_size= 256M

# innodb
innodb_buffer_pool_size = 10G # typically 70%-80% of the server's physical memory
innodb_log_file_size = 512M
innodb_log_buffer_size = 128M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table=1
innodb_flush_method= O_DIRECT
innodb_io_capacity= 2000

# other stuff
event_scheduler= 1
query_cache_type= 0
group_concat_max_len = 1000000
tmpdir = /tmp/mysqltmp
log_timestamps= SYSTEM

-- restart the service
service mysqld restart

MySQL optimization

For Zabbix the biggest bottleneck is usually the database: the sheer volume of reads and writes puts it under heavy pressure, so the history tables can be partitioned.

Even with housekeeping disabled in the frontend, the Zabbix server still writes entries to the housekeeper table, so that table is effectively switched off:

ALTER TABLE housekeeper ENGINE = BLACKHOLE;

First, the seven history/trends tables need an initial partitioning operation.

ALTER TABLE `history` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `history_log` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `history_str` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `history_text` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);


ALTER TABLE `history_uint` PARTITION BY RANGE ( clock)
(PARTITION p2019_11_01 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-02 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_02 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-03 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_03 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-04 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_04 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-05 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_05 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-06 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-07 00:00:00")) ENGINE = InnoDB);


ALTER TABLE `trends` PARTITION BY RANGE ( clock)
(PARTITION p2019_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-07-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_07 VALUES LESS THAN (UNIX_TIMESTAMP("2019-08-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_08 VALUES LESS THAN (UNIX_TIMESTAMP("2019-09-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_09 VALUES LESS THAN (UNIX_TIMESTAMP("2019-10-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_10 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11 VALUES LESS THAN (UNIX_TIMESTAMP("2019-12-01 00:00:00")) ENGINE = InnoDB);

ALTER TABLE `trends_uint` PARTITION BY RANGE ( clock)
(PARTITION p2019_06 VALUES LESS THAN (UNIX_TIMESTAMP("2019-07-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_07 VALUES LESS THAN (UNIX_TIMESTAMP("2019-08-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_08 VALUES LESS THAN (UNIX_TIMESTAMP("2019-09-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_09 VALUES LESS THAN (UNIX_TIMESTAMP("2019-10-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_10 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-01 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11 VALUES LESS THAN (UNIX_TIMESTAMP("2019-12-01 00:00:00")) ENGINE = InnoDB);
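Writing those daily partition clauses by hand is repetitive; a small Python sketch can generate the same DDL (a convenience only, with the table name and date range mirroring the statements above):

#!/usr/bin/env python3
from datetime import date, timedelta

def daily_partition_ddl(table, start, days):
    lines = []
    for i in range(days):
        d = start + timedelta(days=i)
        upper = d + timedelta(days=1)   # partition holds everything before the next midnight
        lines.append('PARTITION p%s VALUES LESS THAN (UNIX_TIMESTAMP("%s 00:00:00")) ENGINE = InnoDB'
                     % (d.strftime("%Y_%m_%d"), upper.isoformat()))
    return "ALTER TABLE `%s` PARTITION BY RANGE ( clock)\n(%s);" % (table, ",\n".join(lines))

print(daily_partition_ddl("history", date(2019, 11, 1), 6))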

Ongoing partition maintenance is handled by an external Perl script; first install its dependencies:

yum install perl-Sys-Syslog perl-DateTime -y

cat part.pl

#!/usr/bin/perl

use strict;
use Data::Dumper;
use DBI;
use Sys::Syslog qw(:standard :macros);
use DateTime;
use POSIX qw(strftime);

openlog("mysql_zbx_part", "ndelay,pid", LOG_LOCAL0);

my $db_schema = 'zabbix';
my $dsn = 'DBI:mysql:'.$db_schema.':mysql_socket=/var/lib/mysql/mysql.sock';
my $db_user_name = 'zabbix';
my $db_password = 'zabbix';
my $tables = { 'history' => { 'period' => 'day', 'keep_history' => '30'},
'history_log' => { 'period' => 'day', 'keep_history' => '30'},
'history_str' => { 'period' => 'day', 'keep_history' => '30'},
'history_text' => { 'period' => 'day', 'keep_history' => '30'},
'history_uint' => { 'period' => 'day', 'keep_history' => '30'},
'trends' => { 'period' => 'month', 'keep_history' => '2'},
'trends_uint' => { 'period' => 'month', 'keep_history' => '2'},

# comment next 5 lines if you partition zabbix database starting from 2.2
# they usually used for zabbix database before 2.2

# 'acknowledges' => { 'period' => 'month', 'keep_history' => '23'},
# 'alerts' => { 'period' => 'month', 'keep_history' => '6'},
# 'auditlog' => { 'period' => 'month', 'keep_history' => '24'},
# 'events' => { 'period' => 'month', 'keep_history' => '12'},
# 'service_alarms' => { 'period' => 'month', 'keep_history' => '6'},
};
my $amount_partitions = 10;

my $curr_tz = 'Asia/Shanghai';

my $part_tables;

my $dbh = DBI->connect($dsn, $db_user_name, $db_password, {'ShowErrorStatement' => 1});

unless ( check_have_partition() ) {
print "Your installation of MySQL does not support table partitioning.\n";
syslog(LOG_CRIT, 'Your installation of MySQL does not support table partitioning.');
exit 1;
}

my $sth = $dbh->prepare(qq{SELECT table_name, partition_name, lower(partition_method) as partition_method,
rtrim(ltrim(partition_expression)) as partition_expression,
partition_description, table_rows
FROM information_schema.partitions
WHERE partition_name IS NOT NULL AND table_schema = ?});
$sth->execute($db_schema);

while (my $row = $sth->fetchrow_hashref()) {
$part_tables->{$row->{'table_name'}}->{$row->{'partition_name'}} = $row;
}

$sth->finish();

foreach my $key (sort keys %{$tables}) {
unless (defined($part_tables->{$key})) {
syslog(LOG_ERR, 'Partitioning for "'.$key.'" is not found! The table might be not partitioned.');
next;
}

create_next_partition($key, $part_tables->{$key}, $tables->{$key}->{'period'});
remove_old_partitions($key, $part_tables->{$key}, $tables->{$key}->{'period'}, $tables->{$key}->{'keep_history'})
}

delete_old_data();

$dbh->disconnect();

sub check_have_partition {
my $result = 0;
# MySQL 5.5
#my $sth = $dbh->prepare(qq{SELECT variable_value FROM information_schema.global_variables WHERE variable_name = 'have_partitioning'});
# MySQL 5.6
my $sth = $dbh->prepare(qq{SELECT plugin_status FROM information_schema.plugins WHERE plugin_name = 'partition'});

$sth->execute();

my $row = $sth->fetchrow_array();

$sth->finish();

# MySQL 5.5
#return 1 if $row eq 'YES';
# MySQL 5.6
return 1 if $row eq 'ACTIVE';
}

sub create_next_partition {
my $table_name = shift;
my $table_part = shift;
my $period = shift;

for (my $curr_part = 0; $curr_part < $amount_partitions; $curr_part++) {
my $next_name = name_next_part($tables->{$table_name}->{'period'}, $curr_part);
my $found = 0;

foreach my $partition (sort keys %{$table_part}) {
if ($next_name eq $partition) {
syslog(LOG_INFO, "Next partition for $table_name table has already been created. It is $next_name");
$found = 1;
}
}

if ( $found == 0 ) {
syslog(LOG_INFO, "Creating a partition for $table_name table ($next_name)");
my $query = 'ALTER TABLE '."$db_schema.$table_name".' ADD PARTITION (PARTITION '.$next_name.
' VALUES less than (UNIX_TIMESTAMP("'.date_next_part($tables->{$table_name}->{'period'}, $curr_part).'") div 1))';
syslog(LOG_DEBUG, $query);
$dbh->do($query);
}
}
}

sub remove_old_partitions {
my $table_name = shift;
my $table_part = shift;
my $period = shift;
my $keep_history = shift;

my $curr_date = DateTime->now;
$curr_date->set_time_zone( $curr_tz );

if ( $period eq 'day' ) {
$curr_date->add(days => -$keep_history);
$curr_date->add(hours => -$curr_date->strftime('%H'));
$curr_date->add(minutes => -$curr_date->strftime('%M'));
$curr_date->add(seconds => -$curr_date->strftime('%S'));
}
elsif ( $period eq 'week' ) {
}
elsif ( $period eq 'month' ) {
$curr_date->add(months => -$keep_history);

$curr_date->add(days => -$curr_date->strftime('%d')+1);
$curr_date->add(hours => -$curr_date->strftime('%H'));
$curr_date->add(minutes => -$curr_date->strftime('%M'));
$curr_date->add(seconds => -$curr_date->strftime('%S'));
}

foreach my $partition (sort keys %{$table_part}) {
if ($table_part->{$partition}->{'partition_description'} <= $curr_date->epoch) {
syslog(LOG_INFO, "Removing old $partition partition from $table_name table");

my $query = "ALTER TABLE $db_schema.$table_name DROP PARTITION $partition";

syslog(LOG_DEBUG, $query);
$dbh->do($query);
}
}
}

sub name_next_part {
my $period = shift;
my $curr_part = shift;

my $name_template;

my $curr_date = DateTime->now;
$curr_date->set_time_zone( $curr_tz );

if ( $period eq 'day' ) {
my $curr_date = $curr_date->truncate( to => 'day' );
$curr_date->add(days => 1 + $curr_part);

$name_template = $curr_date->strftime('p%Y_%m_%d');
}
elsif ($period eq 'week') {
my $curr_date = $curr_date->truncate( to => 'week' );
$curr_date->add(days => 7 * $curr_part);

$name_template = $curr_date->strftime('p%Y_%m_w%W');
}
elsif ($period eq 'month') {
my $curr_date = $curr_date->truncate( to => 'month' );
$curr_date->add(months => 1 + $curr_part);

$name_template = $curr_date->strftime('p%Y_%m');
}

return $name_template;
}

sub date_next_part {
my $period = shift;
my $curr_part = shift;

my $period_date;

my $curr_date = DateTime->now;
$curr_date->set_time_zone( $curr_tz );

if ( $period eq 'day' ) {
my $curr_date = $curr_date->truncate( to => 'day' );
$curr_date->add(days => 2 + $curr_part);
$period_date = $curr_date->strftime('%Y-%m-%d');
}
elsif ($period eq 'week') {
my $curr_date = $curr_date->truncate( to => 'week' );
$curr_date->add(days => 7 * $curr_part + 1);
$period_date = $curr_date->strftime('%Y-%m-%d');
}
elsif ($period eq 'month') {
my $curr_date = $curr_date->truncate( to => 'month' );
$curr_date->add(months => 2 + $curr_part);

$period_date = $curr_date->strftime('%Y-%m-%d');
}

return $period_date;
}

sub delete_old_data {
$dbh->do("DELETE FROM sessions WHERE lastaccess > UNIX_TIMESTAMP(NOW() - INTERVAL 1 MONTH)");
$dbh->do("TRUNCATE housekeeper");
$dbh->do("DELETE FROM auditlog_details WHERE NOT EXISTS (SELECT NULL FROM auditlog WHERE auditlog.auditid = auditlog_details.auditid)");
}

Create a cron job:

chmod +x part.pl

[root@zabbix-pri ~]# crontab -l
0 3 * * * /root/part.pl

Check the result after it has run:

mysql> show create table history;
+---------+-------------------------------------------------+
| Table | Create Table
+---------+-------------------------------------------------+
| history | CREATE TABLE `history` (
`itemid` bigint(20) unsigned NOT NULL,
`clock` int(11) NOT NULL DEFAULT '0',
`value` double(16,4) NOT NULL DEFAULT '0.0000',
`ns` int(11) NOT NULL DEFAULT '0',
KEY `history_1` (`itemid`,`clock`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (`clock`)
(PARTITION p2019_06_02 VALUES LESS THAN (1559491200) ENGINE = InnoDB,
PARTITION p2019_06_03 VALUES LESS THAN (1559577600) ENGINE = InnoDB,
PARTITION p2019_06_04 VALUES LESS THAN (1559664000) ENGINE = InnoDB,
PARTITION p2019_06_05 VALUES LESS THAN (1559750400) ENGINE = InnoDB,
PARTITION p2019_06_06 VALUES LESS THAN (1559836800) ENGINE = InnoDB,
PARTITION p2019_06_07 VALUES LESS THAN (1559923200) ENGINE = InnoDB,
PARTITION p2019_06_08 VALUES LESS THAN (1560009600) ENGINE = InnoDB,
PARTITION p2019_06_09 VALUES LESS THAN (1560096000) ENGINE = InnoDB,
PARTITION p2019_06_10 VALUES LESS THAN (1560182400) ENGINE = InnoDB,
PARTITION p2019_06_11 VALUES LESS THAN (1560268800) ENGINE = InnoDB) */

Proxy MySQL optimization

On the proxy side the main problem is that the proxy_history table grows very large, so it is likewise converted into a partitioned table and cleaned up by a scheduled job:

drop table proxy_history;

CREATE TABLE `proxy_history` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`itemid` bigint(20) unsigned NOT NULL,
`clock` int(11) NOT NULL DEFAULT '0',
`timestamp` int(11) NOT NULL DEFAULT '0',
`source` varchar(64) COLLATE utf8_bin NOT NULL DEFAULT '',
`severity` int(11) NOT NULL DEFAULT '0',
`value` longtext COLLATE utf8_bin NOT NULL,
`logeventid` int(11) NOT NULL DEFAULT '0',
`ns` int(11) NOT NULL DEFAULT '0',
`state` int(11) NOT NULL DEFAULT '0',
`lastlogsize` bigint(20) unsigned NOT NULL DEFAULT '0',
`mtime` int(11) NOT NULL DEFAULT '0',
`flags` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`,clock)
) ENGINE=InnoDB AUTO_INCREMENT=242428784 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
PARTITION BY RANGE ( clock)
(PARTITION p2019_11_19 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-20 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_20 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-21 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_21 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-22 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_22 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-23 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_23 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-24 00:00:00")) ENGINE = InnoDB,
PARTITION p2019_11_24 VALUES LESS THAN (UNIX_TIMESTAMP("2019-11-25 00:00:00")) ENGINE = InnoDB)

The rest works the same as above: the part.pl script handles the ongoing partition maintenance.