OGG缺失日志导致进程中断

有时候ogg出现中断,日志里会提示如下错误信息:

ERROR OGG-00868 Oracle GoldenGate Capture for Oracle, eprisk.prm: Error code 1291, error message: ORA-01291: 缺失日志文件

(Missing Log File WAITING FOR REDO: FILE NA, THREAD 2, SEQUENCE 290359, SCN 0x0000002c59bd0176. Read Position SCN: 44.1505736448 (190484297472))

在这种情况下,是因为日志出现了gap,但好消息是通常我们都有这些归档日志的备份。

OGG通过dba_registered_archived_log 里提到的logfile来进行redo日志的挖掘,因此我们需要确定出基于SCN号所需要的所有归档日志(参考 ‘Minimum Archive Log Necessary to Restart Integrated Extract’ - in the Streams’ Healthcheck report output (Note 1448324.1))。

  • 查询抽取进程的当前中断时的APPLIED_SCN
阅读更多

Zabbix监控ogg延迟情况

最近ogg出现了一点问题,没有及时发现,于是考虑将ogg的监控也纳入zabbix当中来。对于这一类监控,考虑的地方不单单在于如何监控,而是善用zabbix的模板、自动发现等功能来实现,这样会方便配置以及后期的可扩展性。

首先对ogg的运行情况查看通常是通过info all命令

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
MANAGER     RUNNING                                           
EXTRACT RUNNING EPRISK 00:00:03 00:00:08
EXTRACT RUNNING ERDMEDW 00:00:04 00:00:06
EXTRACT RUNNING ESHDW 00:00:00 00:00:01
EXTRACT RUNNING PCIF 00:00:00 00:00:08
EXTRACT RUNNING PEIF 00:00:00 00:00:08
EXTRACT RUNNING PHG 00:00:00 00:00:08
EXTRACT RUNNING PQH 00:00:00 00:00:06
EXTRACT RUNNING PRISKMGR 00:00:00 00:00:08
EXTRACT RUNNING PSHDW 00:00:00 00:00:06
REPLICAT RUNNING RCIF 00:00:10 00:00:06
REPLICAT RUNNING REDMDB1 00:00:00 00:00:01
REPLICAT RUNNING REDQDB1 00:00:00 00:00:07
REPLICAT RUNNING REIF 00:00:00 00:00:00
REPLICAT RUNNING REPWIND 00:00:00 00:00:06
REPLICAT RUNNING REPWIND2 00:00:00 00:00:00
REPLICAT RUNNING REPWIND3 00:00:00 00:00:05
REPLICAT RUNNING REPWIND4 00:00:00 00:00:04
REPLICAT RUNNING RZNMGR 00:00:00 00:00:00
REPLICAT RUNNING RZNMGRC 00:00:00 00:00:01

这里总共有19个抽取、传输进程,根据实际需求只需要做到分钟级别监控即可,所以筛选出4列,分别为运行状态、进程名、lag时间、time列

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
echo "info all"|ggsci|egrep 'EXTRACT|REPLICAT'|awk -F"[ ]+|:" '{print $2,$3,$4*60+$5,$7*60+$8}'

RUNNING EPRISK 0 0
RUNNING ERDMEDW 0 0
RUNNING ESHDW 0 0
RUNNING PCIF 0 0
RUNNING PEIF 0 0
RUNNING PHG 0 0
RUNNING PQH 0 0
RUNNING PRISKMGR 0 0
RUNNING PSHDW 0 0
RUNNING RCIF 0 0
RUNNING REDMDB1 0 0
RUNNING REDQDB1 0 0
RUNNING REIF 0 0
RUNNING REPWIND 0 0
RUNNING REPWIND2 0 0
RUNNING REPWIND3 0 0
RUNNING REPWIND4 0 0
RUNNING RZNMGR 0 0
RUNNING RZNMGRC 0 0

通过shell将获取到的数据写入中间文件。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#!/bin/bash
all()
{
source /home/oracle/.bash_profile;echo "info all"|ggsci|egrep 'EXTRACT|REPLICAT'|awk -F"[ ]+|:" '{print $2,$3,$4*60+$5,$7*60+$8}'
}

name()
{
source /home/oracle/.bash_profile;echo "info all"|ggsci|egrep 'EXTRACT|REPLICAT'|awk '{print $3}'
}

a=$1
case "$a" in
all)
all
;;
name)
name
;;
*)
echo -e "Usage: ./`basename $0` [all|name]"
esac

为了满足zabbix自动发现的要求,返回值必须要是json格式,所以可以对上面的数据进行格式化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#!/usr/bin/env python  
#coding:utf-8
import os, json, sys
f = open("/etc/zabbix/scripts/ogg.cfg","r")
lines = f.readlines()
ogg_list=[]
ogg_dict={"data":None}
for line in lines:
if line.startswith('#')or not line.split():
continue
line=line.strip('\n')
status=line.split()[0]
name=line.split()[1]
lag=line.split()[2]
time=line.split()[3]
pdict={}
pdict["{#NAME}"]=name
pdict["{#STATUS}"]=status
pdict["{#LAG}"]=lag
pdict["{#TIME}"]=time
ogg_list.append(pdict)
f.close()
ogg_dict["data"]=ogg_list
var = sys.argv[1]
var2 = sys.argv[2]
jsonStr = json.dumps(ogg_dict, sort_keys=True, indent=4)
#print jsonStr

def get_lag(name):
l = [x.get("{#LAG}") for t, x in enumerate(ogg_list) if x.get("{#NAME}") == name]
print l[0]

def get_time(name):
l = [x.get("{#TIME}") for t, x in enumerate(ogg_list) if x.get("{#NAME}") == name]
print l[0]

def get_status(name):
l = [x.get("{#STATUS}") for t, x in enumerate(ogg_list) if x.get("{#NAME}") == name]
print l[0]

def get_json():
jsonStr = json.dumps(ogg_dict, sort_keys=True, indent=4)
print jsonStr

def ogg(var, var2):
if var == 'json':
get_json()
else:
if var2 == 'lag':
get_lag(var)
elif var2 == 'status':
get_status(var)
else:
get_time(var)
ogg(var, var2)

最后得到的数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
{
"data": [
{
"{#LAG}": "0",
"{#NAME}": "EPRISK",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "ERDMEDW",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "ESHDW",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PCIF",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PEIF",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PHG",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PQH",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PRISKMGR",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "PSHDW",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "RCIF",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REDMDB1",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REDQDB1",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REIF",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REPWIND",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REPWIND2",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REPWIND3",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "REPWIND4",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "RZNMGR",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
},
{
"{#LAG}": "0",
"{#NAME}": "RZNMGRC",
"{#STATUS}": "RUNNING",
"{#TIME}": "0"
}
]
}

数据都已经获取到,到这里脚本就准备完毕,需要添加两个自定义key

1
2
UserParameter=ogg.discovery,/etc/zabbix/scripts/ogg_delay.py json test
UserParameter=ogg_delay[*],/etc/zabbix/scripts/ogg_delay.py $1 $2

在zabbix上新建模板,配置宏参数以及新增监控项和触发器等,最后只需要将模板添加到对应监控的主机即可