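Recently OGG (Oracle GoldenGate) ran into a problem that was not noticed in time, so I decided to bring OGG monitoring into Zabbix as well. For this kind of monitoring, the question is not only how to collect the data but also how to make good use of Zabbix features such as templates and low-level discovery, which makes configuration easier and keeps the setup extensible later on.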

The usual way to check how OGG is running is the info all command in ggsci:

MANAGER     RUNNING                                           
EXTRACT     RUNNING     ERDMEDW     00:00:03      00:00:09    
EXTRACT     RUNNING     ERDMEDWC    00:00:07      00:00:01    
EXTRACT     RUNNING     PCIF        00:00:00      00:00:05    
EXTRACT     RUNNING     PEIF        00:00:00      00:00:05    
EXTRACT     RUNNING     PHG         00:00:00      00:00:05    
EXTRACT     RUNNING     PQH         00:00:00      00:00:00    
EXTRACT     RUNNING     PRISK       00:00:00      00:00:00    
EXTRACT     RUNNING     PRISKMGR    00:00:00      00:00:01    
REPLICAT    RUNNING     RCIF        00:00:06      00:00:08    
REPLICAT    RUNNING     REIF        00:00:00      00:00:08    
REPLICAT    RUNNING     RUF2        00:00:08      00:00:01    
REPLICAT    RUNNING     RZNCLS      00:00:00      00:00:09    
REPLICAT    RUNNING     RZNMGR      00:00:00      00:00:03    
REPLICAT    RUNNING     RZNMGRC     00:00:00      00:00:08

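There are 14 Extract and Replicat processes here. Minute-level monitoring is enough for our needs, so only three columns are kept: the group name, the minutes field of the lag (Lag at Chkpt), and the minutes field of the time since the last checkpoint (Time Since Chkpt).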

echo "info all"|ggsci|awk -F"[ ]+|:" '/REPLICAT|EXTRACT/{print $3,$5,$8}'

ERDMEDW 00 00
ERDMEDWC 00 00
PCIF 00 00
PEIF 00 00
PHG 00 00
PQH 00 00
PRISK 00 00
PRISKMGR 00 00
RCIF 00 00
REIF 00 00
RUF2 00 00
RZNCLS 00 00
RZNMGR 00 00
RZNMGRC 00 00
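
One caveat with the filter above: $5 and $8 are only the minutes fields of HH:MM:SS, so a lag of 01:00:05 would still print 00. A minimal sketch of an alternative, assuming the info all layout shown earlier, parses the text in Python and converts both columns to whole minutes. The names parse_info_all and to_minutes and the SAMPLE text are illustrative and not part of the original script.

```python
SAMPLE = """\
MANAGER     RUNNING
EXTRACT     RUNNING     ERDMEDW     00:00:03      00:00:09
REPLICAT    STOPPED     RCIF        01:02:06      00:00:08
"""


def to_minutes(hhmmss):
    """Convert an HH:MM:SS string to whole minutes (seconds are dropped)."""
    hours, minutes, _seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 60 + minutes


def parse_info_all(text):
    """Return a list of (group, lag_minutes, time_minutes) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Only EXTRACT/REPLICAT rows with all five columns are of interest;
        # the MANAGER row has no group name or checkpoint columns.
        if len(fields) == 5 and fields[0] in ("EXTRACT", "REPLICAT"):
            rows.append((fields[2], to_minutes(fields[3]), to_minutes(fields[4])))
    return rows


if __name__ == "__main__":
    for row in parse_info_all(SAMPLE):
        print(row)
```

Because hours are folded into the minute count, a single numeric threshold in a trigger would still catch a process that has fallen more than an hour behind.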

Zabbix low-level discovery requires the return value to be JSON, so the data above can be reformatted with a short script:

#!/usr/bin/env python
# coding:utf-8
import os
import json
# Load the oracle user's environment, run "info all" in ggsci,
# and keep the group name, lag minutes and time minutes of each process.
cmd = "source /home/oracle/.bash_profile;echo \"info all\"|ggsci|awk -F\"[ ]+|:\" '/REPLICAT|EXTRACT/{print $3,$5,$8}'"
res = os.popen(cmd).read()
res_s = res.split()

# Three fields per process; floor division keeps this an int on Python 3 too.
length = len(res_s) // 3

ogg_list = []
ogg_dict = {"data": None}
for i in range(length):
    pdict = {}
    pdict["{#NAME}"] = res_s[0 + i * 3]
    pdict["{#LAG}"] = res_s[1 + i * 3]
    pdict["{#TIME}"] = res_s[2 + i * 3]
    ogg_list.append(pdict)
ogg_dict["data"] = ogg_list
jsonStr = json.dumps(ogg_dict, sort_keys=True, indent=4)
print(jsonStr)

The resulting data:

{
    "data": [
        {
            "{#LAG}": "00", 
            "{#NAME}": "ERDMEDW", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "ERDMEDWC", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "PCIF", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "PEIF", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "PHG", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "PQH", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "PRISK", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "PRISKMGR", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "RCIF", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "REIF", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "RUF2", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "RZNCLS", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "RZNMGR", 
            "{#TIME}": "00"
        }, 
        {
            "{#LAG}": "00", 
            "{#NAME}": "RZNMGRC", 
            "{#TIME}": "00"
        }
    ]
}
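
As a quick sanity check before wiring this into Zabbix, the payload can be validated offline. The sketch below assumes the discovery output has the shape shown above (a top-level "data" list of objects keyed by LLD macros) and that macro names use only uppercase letters, digits, underscores and dots. check_discovery is a hypothetical helper written for this post, not a Zabbix API.

```python
import json
import re

# Assumed LLD macro syntax: {#NAME} built from A-Z, 0-9, underscore and dot.
MACRO_RE = re.compile(r"^\{#[A-Z0-9_.]+\}$")


def check_discovery(payload):
    """Return a list of problems found in a discovery JSON string."""
    doc = json.loads(payload)
    entries = doc.get("data") if isinstance(doc, dict) else None
    if not isinstance(entries, list):
        return ['top-level "data" list is missing']
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append("entry %d: not an object" % index)
            continue
        for key in entry:
            if not MACRO_RE.match(key):
                problems.append("entry %d: bad macro name %s" % (index, key))
        # {#NAME} is the macro the rest of the setup relies on.
        if "{#NAME}" not in entry:
            problems.append("entry %d: {#NAME} is missing" % index)
    return problems


if __name__ == "__main__":
    good = '{"data": [{"{#NAME}": "RCIF", "{#LAG}": "00", "{#TIME}": "00"}]}'
    print(check_discovery(good))
    print(check_discovery('{"data": [{"lag": "00"}]}'))
```

An empty list means the payload matches the assumed shape; anything else names the entry that needs attention.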

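With this JSON, low-level discovery can use {#NAME} as the variable. The next step is to fetch the lag and time values for a given name, which can be read straight from the list built earlier.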

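#!/usr/bin/env python
# coding:utf-8
import os
import json
import sys
# Load the oracle user's environment, run "info all" in ggsci,
# and keep the group name, lag minutes and time minutes of each process.
cmd = "source /home/oracle/.bash_profile;echo \"info all\"|ggsci|awk -F\"[ ]+|:\" '/REPLICAT|EXTRACT/{print $3,$5,$8}'"
res = os.popen(cmd).read()
res_s = res.split()

# Three fields per process; floor division keeps this an int on Python 3 too.
length = len(res_s) // 3

ogg_list = []
ogg_dict = {"data": None}
for i in range(length):
    pdict = {}
    pdict["{#NAME}"] = res_s[0 + i * 3]
    pdict["{#LAG}"] = res_s[1 + i * 3]
    pdict["{#TIME}"] = res_s[2 + i * 3]
    ogg_list.append(pdict)
ogg_dict["data"] = ogg_list

# Usage: ogg_delay.py json | ogg_delay.py NAME lag | ogg_delay.py NAME time
var = sys.argv[1]
# The second argument is unused in json mode, so tolerate its absence.
var2 = sys.argv[2] if len(sys.argv) > 2 else ''

def get_lag(name):
    # Print the lag minutes of the named process.
    l = [x.get("{#LAG}") for x in ogg_list if x.get("{#NAME}") == name]
    print(l[0])

def get_time(name):
    # Print the minutes since the last checkpoint of the named process.
    l = [x.get("{#TIME}") for x in ogg_list if x.get("{#NAME}") == name]
    print(l[0])

def get_json():
    # Print the discovery JSON for all processes.
    jsonStr = json.dumps(ogg_dict, sort_keys=True, indent=4)
    print(jsonStr)

def ogg(var, var2):
    if var == 'json':
        get_json()
    else:
        if var2 == 'lag':
            get_lag(var)
        else:
            get_time(var)
ogg(var, var2)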
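
The list comprehensions in get_lag and get_time scan every entry and raise IndexError when the group name is unknown. A minimal sketch of an alternative, assuming the same ogg_list structure, indexes the entries by {#NAME} once and returns a fallback for unknown names. build_index, lookup, the fallback value "-1" and the sample values are illustrative choices, not part of the original script.

```python
def build_index(ogg_list):
    """Map each group name to its discovery entry for direct lookups."""
    return {entry["{#NAME}"]: entry for entry in ogg_list}


def lookup(index, name, field, default="-1"):
    """Return the lag or time value for a group, or default if unknown."""
    # Mirror the original dispatch: "lag" selects {#LAG}, anything else {#TIME}.
    macro = "{#LAG}" if field == "lag" else "{#TIME}"
    entry = index.get(name)
    return entry[macro] if entry is not None else default


if __name__ == "__main__":
    # Made-up sample values, in the same shape the script builds.
    sample = [
        {"{#NAME}": "RCIF", "{#LAG}": "06", "{#TIME}": "08"},
        {"{#NAME}": "PQH", "{#LAG}": "00", "{#TIME}": "00"},
    ]
    index = build_index(sample)
    print(lookup(index, "RCIF", "lag"))
    print(lookup(index, "RCIF", "time"))
    print(lookup(index, "MISSING", "lag"))
```

Returning a sentinel instead of crashing lets the item still receive a value when a group has been removed from OGG but not yet from Zabbix; whether a sentinel or an unsupported item is preferable depends on how the triggers are written.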

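With that the script is ready. Two custom keys need to be added to the agent configuration (the trailing test in the first key is only a placeholder for the script's second argument):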

UserParameter=ogg.discovery,/etc/zabbix/scripts/ogg_delay.py json test
UserParameter=ogg_delay[*],/etc/zabbix/scripts/ogg_delay.py $1 $2
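
To make the mapping from item key to script arguments concrete: with a flexible user parameter such as ogg_delay[*], the agent replaces $1, $2 and so on in the command with the comma-separated parameters inside the square brackets of the key. The sketch below is a simplified model of that substitution (it ignores quoting and escaping rules) rather than the agent's actual implementation. expand_user_parameter is a hypothetical name, and ogg_delay[RCIF,lag] is an assumed example of what an item prototype key like ogg_delay[{#NAME},lag] would resolve to.

```python
import re


def expand_user_parameter(command, item_key):
    """Substitute $1..$9 in command with the parameters of item_key."""
    match = re.match(r"^[^\[]+\[(.*)\]$", item_key)
    params = [p.strip() for p in match.group(1).split(",")] if match else []

    def replace(ref):
        position = int(ref.group(1))
        # Missing parameters expand to an empty string in this model.
        return params[position - 1] if position <= len(params) else ""

    return re.sub(r"\$([1-9])", replace, command)


if __name__ == "__main__":
    command = "/etc/zabbix/scripts/ogg_delay.py $1 $2"
    print(expand_user_parameter(command, "ogg_delay[RCIF,lag]"))
    print(expand_user_parameter(command, "ogg_delay[RCIF,time]"))
```

Under this model the two keys run the script with "RCIF lag" and "RCIF time" as arguments, which is exactly the pair of values the ogg function dispatches on.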

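On the Zabbix side, create a new template, configure its macros, and add the items, triggers and so on. After that, the template only needs to be linked to each host that should be monitored.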

[Screenshot: ogg_item]

[Screenshot: ogg_trigger]

[Screenshot: ogg_data]