Dataguard中pdb的迁移和故障切换

在12c多租户环境中,每一个CDB当中都会有多个pdb在运行,在某些情况下会涉及到CDB之间进行pdb的迁移。当然也就会碰到类似的情况,比如只有其中某一个pdb出现了问题,而其他的pdb都运行正常,这个时候如果对CDB做failover代价太大,这时为了最大程度减少停机时间和缩小影响,我们可以只对这个单独的pdb进行failover或者迁移到同服务器的其他primary cdb。这就需要用到dg broker所带来的新功能,从12.1版本开始,通过dg broker中migrate pluggable database命令能很方便的对pdb进行迁移操作,方便的同时得需要满足一些条件才可以实现,数据库的版本、补丁、COMPATIBLE参数等

实验环境大致如下

Host CDBNAME PDBNAME
Pri ORA12C PDB12C
Pri ORA12C PDB3
Stb ORA12C PDB12C
Stb ORA12C PDB3
Stb CDB1 PDB1

注意事项

下面的步骤假设源CDB库(无论是主库还是备库用来做迁移和failover)和目标CDB库能访问到同一个存储,这样就不需要复制文件操作。

  • 源CDB库可以是RAC也可以是单实例
  • 源CDB的备库不需要是ADG,尽管备库的CDB和PDB都需要被打开为只读模式一段时间
  • 目标CDB可以是RAC也可以是单实例
  • 源端和目标端的存储方式可以是文件系统也可以是ASM
  • 如果是从备库进行迁移,确保之前已经建好了TEMP临时文件
  • 如果源PDB正常关闭:
    • 对于迁移操作,源pdb必须要打开,而且恢复到当前状态否则迁移的过程中会拔下失败
    • 对于failover,源端pdb的正常关闭没什么影响。拔下操作会失败,需要手动清理并完成操作。
    • 如果目标CDB的版本较高,则pdb仍能插入但是处于关闭状态,需要手动进行升级。
    • 无法迁移或者failover到低版本数据库
    • 迁移和failover不支持PDB快照克隆
    • 任务执行完毕以后,需要手动清理主库剩余的数据文件
    • 目标库插入pdb时是通过STANDBYS=NONE的方式,所以在目标库的其他备库中需要手动启用恢复

迁移过程

迁移是用来将PDB从一个cdb迁移到其他cdb中去,migrate命令会将dgbroker配置中主库的pdb拔下,然后插入到另外一个configuration中的主库当中去。

可以将pdb迁移到更高版本的CDB当中去,往低版本则不行。在迁往更高的版本以后,被迁移的pdb在目标cdb中处于关闭状态,你必须执行一些操作来让pdb进行升级。同样pdb可以迁移到更高COMPATIBLE参数值的数据库当中去。

迁移过程的大致步骤如下:

  1. 停止所有连到被迁移pdb的服务
  2. 如果dg环境中你却没有打补丁Patch 25616359 的话,则必须先要停掉备库上所有的pdb,否则redo apply会失败
  3. dgmgrl连接到主库
1
2
$ dgmgrl 
DGMGRL> connect sys as sysdba
  1. 执行migrate命令
1
2
3
4
5
DGMGRL> migrate pluggable database <PDBNAME1> to container <dest-cdb-name> using '/<path>/<DEST_SID>/<PDBNAME1.xml>' connect as <USERNAME>@"<dest-cdb-connect-identifer>";  

-or-

DGMGRL> migrate pluggable database <PDBNAME1> to container <dest-cdb-name> using '/<path>/<DEST_SID>/<PDBNAME1.xml>':

manifest文件存放路径必须可以同时被源端主库和目标端主库访问的到

  1. 当连上目标库以后,migrate命令会做以下事情:
  • 对migrate操作做一些必要的验证工作
  • 从主库拔下pdb
  • 在目标库上使用主库的数据文件通过STANDBYS=NONE的方式创建PDB
  • 如果目标库的版本与源端一样,则在其所有实例中打开PDB
  • 在源端主库中删除pdb,同时删除源端备库中的pdb
  1. 当migrate命令完成以后则执行以下任务:
  • 将之前连接到pdb的服务进行重连
  • 将目标端CDB中的pdb进行备份方便以后恢复
  • 参照Note 1916648.1中开启pdb的备库同步

迁移测试用例

主库pdb的状态

1
2
3
4
5
6
7
8
9
10
11
12
13
sys@ORA12C> show pdbs;

CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 PDB12C READ WRITE NO
4 PDB3 READ WRITE NO

sys@ORA12C> select database_role from v$database;

DATABASE_ROLE
----------------
PRIMARY

生成pdb的xml文件

1
2
3
4
5
6
7
8
9
10
11
sys@ORA12C> alter session set container=pdb3;

Session altered.

BEGIN
DBMS_PDB.DESCRIBE(
pdb_descr_file => '/u01/app/oracle/oradata/pdb3.xml');
END;
/

PL/SQL procedure successfully completed.

在备库机器上连接到源库的dg broker

1
2
3
DGMGRL> connect sys/oracle@ora12c_dg
Connected to "ORA12C_DG"
Connected as SYSDBA.

然后执行迁移命令,使用之前生成的xml文件,因为这里只能使用本地路径,所以必须要是共享目录或者同一个机器上

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
DGMGRL> migrate pluggable database immediate pdb3 to container cdb1 using '/u01/app/oracle/oradata/pdb3.xml';
Username: sys@cdb1
Password:
Connected to "cdb1"
Connected as SYSDBA.

Beginning migration of pluggable database PDB3.
Source multitenant container database is ORA12C_DG.
Destination multitenant container database is cdb1.

Connected to "ora12c"
Closing pluggable database PDB3 on all instances of multitenant container database ora12c.
Pluggable database PDB3 on database ORA12C_DG lags its primary
database, ora12c, by 2129 seconds.
Continuing with migration of pluggable database PDB3 to multitenant container database cdb1.
Stopping Redo Apply services on source multitenant container database ORA12C_DG.
Pluggable database description will be written to /u01/app/oracle/oradata/pdb3.xml.
Closing pluggable database PDB3 on all instances of multitenant container database ORA12C_DG.
Disabling media recovery for pluggable database PDB3.
Restarting redo apply services on source multitenant container database ORA12C_DG.
Creating pluggable database PDB3 on multitenant container database cdb1.
Opening pluggable database PDB3 on all instances of multitenant container database cdb1.
Unplugging pluggable database PDB3 from multitenant container database ora12c.
Pluggable database description will be written to /tmp/ora_tfils2PLmU.xml.
Dropping pluggable database PDB3 from multitenant container database ora12c.
Migration of pluggable database PDB3 completed.
Succeeded.

-- 检查目标端cdb的pdb状态
sys@CDB1> show pdbs;

CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 PDB1 MOUNTED
4 PDB3 READ WRITE NO

-- 日志能看到并没有文件的拷贝动作
Completed: create pluggable database PDB3 using '/u01/app/oracle/oradata/pdb3.xml' nocopy standbys=none tempfile reuse
alter pluggable database PDB3 open instances=all
PDB3(4):Autotune of undo retention is turned on.

failover步骤

下面的图展示了将一个故障的pdb迁移到新的cdb过程中初始和结束的状态。

CDB1有三个pdb(PDB1,PDB2,PDB3),同时有一个物理备库。在物理备库同一个机器上有另外一个CDB2,CDB2是可读写状态,出故障的pdb将会迁移到这个CDB上面来。

迁移前的状态:

目前CDB1上的所有pdb都运行正常,CDB1的备库不是处于active dg。CDB2有自己的pdb4

迁移后的状态:

PDB2出现故障,估算需要很长的时间才能恢复,但是目前PDB1和PDB3均允许正常,而且备库的同步也都正常。为了最低程度减小影响,将使用备库pdb2的数据文件去插入到CDB2当中作为读写状态提供给应用访问,然后删除CDB1上的pdb2。这不是一个本机拔插操作,因为它需要一个可读写的CDB。

过程

failover主要用于主库的某个pdb出现了故障,而其他pdbs都允许正常,这时就需要将这个pdb移到其他CDB上去。它提供了一个方法可以只对主库的其中一个pdb进行故障转移而不用影响其他库。备库必须要与目标CDB共享数据文件存放的目录和通过DBMS_PDB.DESCRIBE生成的XML文件所在路径,目标CDB都要能够访问到这些文件,无论是通过在同一台机器或者NFS挂载的方式。

Failover不是一个计划性操作,所以目标就是尽可能降低停机时间。所以要确保一些前提条件:

  • 源和目标库都必须打上相同的补丁包,这样就不用执行额外的脚本,这些内容会在DGMGRL CLI MIGRATE命令时用到,最好的方式是目标CDB与源备库使用同一个oracle home目录
  • 目标CDB最好与源库安装的同样的Oracle组件

failover步骤:

  1. 停掉所有连接到故障pdb主备库的服务
  2. 连接dgmgrl到备库,使用sysdba角色
1
2
$ dgmgrl 
DGMGRL> connect >USERNAME> as sysdba
  1. 在dgmgrl命令行中执行failover操作。Note:这里的操作与之前的迁移命令一模一样
1
2
3
DGMGRL> migrate pluggable database <PDBNAME1> to container <dest-cdb-name> using '/<path>/<PDBNAME1>/<PDBNAME1.xml>' connect as <USERNAME>@"<dest-cdb-connect-identifer>";
-or-
DGMGRL> migrate pluggable database <PDBNAME1> to container <dest-cdb-name> using '/<path>/<PDBNAME1>/<PDBNAME1.xml>': # This will prompt for user name and password
  1. 当到目标CDB的连接建立以后,主要做了如下事情:
    1. 做了很多failover操作必要的校验
    2. 如果备库的redo apply运行则会停掉
    3. 将所有备库实例打开到read only状态
    4. 根据迁移命令指定的路径通过DBMS_PDB.DESCRIBE创建XML文件
    5. 停止源备库PDB的恢复
    6. 如果源备库本身不是ADG状态则还原成mount
    7. 通过备库的数据文件(NOCOPY)在目标CDB创建PDB,使用STANDBYS=NONE选项
    8. 在目标CDB的所有实例打开PDB
    9. 拔掉源主库的PDB,如果操作有报错则需要手动解决
    10. 如果拔下操作顺利,则通过KEEP DATAFILES的方式删掉PDB,这同样会删除所有源备库的pdb。
  2. 如果都顺利结束,则将之前的服务都连到新PDB上来

  3. 备份目标CDB的PDB允许恢复
  4. 开启目标CDB所有备库的恢复操作

因为Failover过程中使用的DGMGRL命令与之前迁移操作的基本一样,这里就不再一一赘述,全文完。

12c rac to rac 物理备库最佳实践

这篇文章主要目的是详细记录在12cR2环境中,配置主备都是rac环境的dataguard,下图列出了本次将要创建的环境

准备步骤说明

  • 通过vmware创建虚拟机
  • 总共4台,两台作为主库,两台作为备库
  • 主库安装12cR2 rac环境,创建实例
  • 备库安装12cR2 grid软件和database软件,不创建实例

测试环境介绍

画图的时候手误,scan hostname两节点是一致的

主库:

备库:

所用软件

  • VMWARE 14 pro
  • Oracle Grid Infrastructure 12c R2
  • Oracle Database 12c R2
  • Oracle Linux 7.6

Dataguard 配置

  • Protection mode: Maximum Performance
  • Fast-start failover: Disabled
  • Standby database type: Physical Standby

Rac环境中dg注意事项

rac环境中配置dg与单机中的配置有所不同:

  • 在Rman中执行duplicate database for standby from active database命令时,只会在备库创建一个单实例,你需要手动设置rac参数包括INSTANCE_NUMBER和INSTANCE_NAME等,在另一个主机上启用第二个实例,然后添加新创建的数据库到OCR中
  • INSTANCE_NUMBERINSTANCE_NAME参数必须在第二个实例当中手动修改
  • 不管主库或者备库都可以是单机的非集群环境
  • standby redo logfiles必须在共享磁盘上,每个备库实例必须配置归档standby redo log到同样目录
  • 如果没设置OMF,则LOG_ARCHIVE_FORMAT参数必须设置,需要加上%t或者T%来确保名称的唯一性

安装大纲

下面列出了安装的标准化步骤

这里默认主备库都已经安装好了所需环境,如果对12cR2 rac安装不清楚的可以参考我之前的文章:在vmware linux 7.5安装oracle 12c rac

准备主库环境

  • 开启归档
  • 配置ARCHIVELOG DELETION POLICY策略
  • 开启forced logging
  • 配置standby redo log files
  • 配置主库相关参数
  • 开启数据库闪回
  • 配置CONTROL_FILE_RECORD_KEEP_TIME参数
  • 配置tnsnames.ora

准备备库环境

  • 创建相关目录
  • 创建静态监听
  • 创建init.ora
  • 拷贝主库password文件
  • 配置tnsnames.ora

创建物理备库

  • 通过rman duplicate在线创建
  • 设置备库参数
  • 开启redo apply
  • 检查物理备库同步情况

检查环境

确保每个节点都能访问其他所有地址

主库

1
2
3
4
5
6
7
8
9
10
192.168.146.121  racnode1 racnode1.oracle.com
192.168.146.123 racnode1-vip

192.168.146.122 racnode2 racnode2.oracle.com
192.168.146.124 racnode2-vip

192.168.146.125 racnode-scan

10.10.10.121 racnode1-priv
10.10.10.122 racnode2-priv

备库

1
2
3
4
5
6
7
8
9
10
192.168.146.131  stbracnode1 stbracnode1.oracle.com
192.168.146.133 stbracnode1-vip

192.168.146.132 stbracnode2 stbracnode2.oracle.com
192.168.146.134 stbracnode2-vip

192.168.146.135 stbracnode-scan

10.10.10.131 stbracnode1-priv
10.10.10.132 stbracnode2-priv

crs状态

1
crs_stat -t

检查服务状态

1
srvctl status racnode -n xxx

准备主库环境

检查数据库是否是归档模式,如果不是,则设置成归档模式,归档日志存放在+FRA下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
sys@RAC1> select log_mode from v$database;

LOG_MODE
------------
NOARCHIVELOG

sys@RAC1> show parameter log_archive_dest_1

PARAMETER_NAME TYPE VALUE
------------------------------------------------------------ ----------- ---------------------------------------------------------------------------
log_archive_dest_1 string

ALTER SYSTEM SET LOG_ARCHIVE_DEST_1 = 'LOCATION=USE_DB_RECOVERY_FILE_DEST' scope=BOTH;

srvctl stop database -d rac -o immediate
srvctl start database -d rac -o mount

sqlplus / as sysdba
ALTER DATABASE ARCHIVELOG;

sys@RAC1> alter system switch logfile;

System altered.

sys@RAC1> select name from v$archived_log;

NAME
---------------------------------------------------------------------------
+FRA/RAC/ARCHIVELOG/2019_04_17/thread_1_seq_2.261.1005831111

开启数据库的强制日志选项

1
2
3
4
5
6
7
8
9
sys@RAC1> SELECT FORCE_LOGGING FROM V$DATABASE;

FORCE_LOGGING
---------------------------------------
NO

sys@RAC1> ALTER DATABASE FORCE LOGGING;

Database altered.

确定所需的standby redo logfile数量,可以通过以下公式得出

(maximum number of redo logfiles +1) * number of threads

根据下面语句得出的结果,可以看到需要(4+1)&#42;2=10组standby redo logfile

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT GROUP#, BYTES/1024/1024 MB FROM V$LOG ORDER BY 1,2;
col INSTANCE format a5

GROUP# MB
---------- ----------
1 200
2 200
3 200
4 200

sys@RAC1> sys@RAC1> SELECT DISTINCT THREAD#, INSTANCE FROM GV$THREAD ORDER BY 1,2;

THREAD# INSTA
---------- -----
1 rac1
2 rac2

创建standby redo logfile

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '+FRA' SIZE 200M ;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '+FRA' SIZE 200M ;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '+FRA' SIZE 200M ;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '+FRA' SIZE 200M ;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '+FRA' SIZE 200M ;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 '+FRA' SIZE 200M ;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 '+FRA' SIZE 200M ;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 '+FRA' SIZE 200M ;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 '+FRA' SIZE 200M ;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 '+FRA' SIZE 200M ;

--检查
SELECT INST_ID, GROUP#,THREAD#,SEQUENCE#,ARCHIVED,STATUS
FROM GV$STANDBY_LOG ORDER BY 2,1;

INST_ID GROUP# THREAD# SEQUENCE# ARC STATUS
---------- ---------- ---------- ---------- --- ----------
1 5 1 0 YES UNASSIGNED
2 5 1 0 YES UNASSIGNED
1 6 1 0 YES UNASSIGNED
2 6 1 0 YES UNASSIGNED
1 7 1 0 YES UNASSIGNED
2 7 1 0 YES UNASSIGNED
1 8 1 0 YES UNASSIGNED
2 8 1 0 YES UNASSIGNED
1 9 1 0 YES UNASSIGNED
2 9 1 0 YES UNASSIGNED
1 10 2 0 YES UNASSIGNED
2 10 2 0 YES UNASSIGNED
1 11 2 0 YES UNASSIGNED
2 11 2 0 YES UNASSIGNED
1 12 2 0 YES UNASSIGNED
2 12 2 0 YES UNASSIGNED
1 13 2 0 YES UNASSIGNED
2 13 2 0 YES UNASSIGNED
1 14 2 0 YES UNASSIGNED
2 14 2 0 YES UNASSIGNED

在主集群上两台机器上均配置tnsnames.ora,这里备库只填写一号节点

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
RAC =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = racnode-scan)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = rac)
)
)

STBRAC =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = stbracnode1)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = rac)
)
)

在备集群一号节点上配置tnsnames.ora,添加下面的连接描述

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
RAC =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = racnode-scan)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = rac)
)
)

STBRAC =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = stbracnode1)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = rac)(UR=A) >>>>==== UR=A能让监听处于blocked等状态时客户端连接上来
)
)

在备集群二号节点上配置tnsnames.ora

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
RAC =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = racnode-scan)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = rac)
)
)

STBRAC =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = stbracnode2)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = rac)
)
)

在所有机器上测试连接

1
2
tnsping rac
tnsping stbrac

在主库上配置dataguard相关参数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
show parameter DB_UNIQUE_NAME

# 注意参数大小写敏感:
ALTER SYSTEM SET LOG_ARCHIVE_CONFIG='DG_CONFIG=(rac,stbrac)' SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2= 'SERVICE=stbrac ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stbrac' SCOPE=BOTH SID='*';

alter system set log_archive_dest_state_1 = ENABLE SCOPE=BOTH SID='*';
alter system set log_archive_dest_state_2 = ENABLE SCOPE=BOTH SID='*';

# 应该返回EXCLUSIVE
show parameter REMOTE_LOGIN_PASSWORDFILE

# FAL = fetch archive log
ALTER SYSTEM SET FAL_SERVER='stbrac' SCOPE=BOTH SID='*';
ALTER SYSTEM SET FAL_CLIENT='rac' SCOPE=BOTH SID='*';

# 如果主备库文件目录一致,则不用设置
show parameter DB_FILE_NAME_CONVERT
show parameter STANDBY_FILE_MANAGEMENT
ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT='AUTO' SCOPE=BOTH SID='*';

# 设置ARCn进程最大值 ( default 4):
ALTER SYSTEM SET LOG_ARCHIVE_MAX_PROCESSES=8 SCOPE=BOTH SID='*';

主库开启数据库闪回

1
2
ALTER SYSTEM SET DB_FLASHBACK_RETENTION_TARGET =2880 scope=BOTH;
ALTER DATABASE FLASHBACK ON;

在所有节点配置sqlnet.ora,添加下面配置

1
NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)

准备备库环境

在备集群的一号节点上,创建相关目录。因为都采用的ASM存储,所以要进到ASM目录下去创建

1
2
3
4
5
6
7
su - grid
asmcmd
cd DATA
mkdir stbrac

-- 在两个节点上都创建目录
mkdir -p /u01/app/oracle/admin/stbrac/{adump,cdump}

在stbracnode1上,创建initstbrac1.ora参数文件,先只用填写DB_NAME即可

1
echo 'DB_NAME=rac' > $ORACLE_HOME/dbs/initstbrac1.ora

在racnode1上,将密码文件复制到备集群的两个节点。密码文件目前存放在ASM当中,通过asmcmd工具拷贝到文件系统当中,然后传输到备集群

1
2
3
4
5
6
7
8
9
10
11
12
13
14
su - grid
asmcmd pwget --dbuniquename rac

asmcmd pwcopy +DATA/RAC/PASSWORD/pwdrac.256.1005822021 /tmp/orapwstbrac

# racnode1
scp /tmp/orapwstbrac stbracnode1:/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapwstbrac1
scp /tmp/orapwstbrac stbracnode2:/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapwstbrac2

# stbracnode1
chown oracle:oinstall /u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapwstbrac1

# stbracnode2
chown oracle:oinstall /u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapwstbrac2

在stbracnode1上创建静态监听

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
su - grid
cp listener.ora listener.ora.bak
vim listener.ora

--添加下面这段
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME=rac)
(SID_NAME=stbrac1)
(ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1)
)
)

-- 重启监听
srvctl stop listener
srvctl start listener
lsnrctl services | grep stbrac

创建备库

启动stbracnode1上的实例到nomount状态

1
2
3
4
5
6
7
su - oracle
sqlplus / as sysdba

export ORACLE_SID=stbrac1
startup nomount pfile='/u01/app/oracle/product/12.2.0/dbhome_1/dbs/initstbrac1.ora';

create spfile from pfile='/u01/app/oracle/product/12.2.0/dbhome_1/dbs/initstbrac1.ora';

通过rman duplicate来创建备库,这个命令可以从主库执行也可以从备库执行。

如果duplicate命令在关掉辅助实例以后失败了,你必须重新关闭和启动备库,而且重新执行之前要先删掉你所建好的spfile

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
rman target sys/oracle@rac auxiliary sys/oracle@stbrac

run {
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate auxiliary channel stbc1 type disk;
DUPLICATE TARGET DATABASE FOR STANDBY FROM ACTIVE DATABASE
SPFILE
set 'db_unique_name'='stbrac'
set instance_name='stbrac1'
set instance_number='1'
set db_create_online_log_dest_1='+FRA'
set db_create_online_log_dest_2='+DATA'
set db_recovery_file_dest='+FRA'
set audit_file_dest='/u01/app/oracle/admin/stbrac/adump'
set core_dump_dest='/u01/app/oracle/admin/stbrac/cdump'
nofilenamecheck
dorecover;
}

详细日志:点击这里

检查备库实例是否运行正常

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
sys@STBRAC1> SELECT NAME, DB_UNIQUE_NAME, DATABASE_ROLE, OPEN_MODE FROM GV$DATABASE;

NAME DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE
--------- ------------------------------ ---------------- --------------------
RAC stbrac PHYSICAL STANDBY MOUNTED

COL DATAFILE FORMAT a100
SELECT INST_ID, NAME DATAFILE FROM GV$DATAFILE;

INST_ID DATAFILE
---------- ----------------------------------------------------------------------------------------------------
1 +DATA/STBRAC/DATAFILE/system.263.1005926817
1 +DATA/STBRAC/DATAFILE/sysaux.260.1005926819
1 +DATA/STBRAC/DATAFILE/undotbs1.301.1005926851
1 +DATA/STBRAC/86B2264711E008FCE0537992A8C0E982/DATAFILE/system.290.1005926841
1 +DATA/STBRAC/86B2264711E008FCE0537992A8C0E982/DATAFILE/sysaux.291.1005926833
1 +DATA/STBRAC/DATAFILE/users.284.1005926855
1 +DATA/STBRAC/86B2264711E008FCE0537992A8C0E982/DATAFILE/undotbs1.262.1005926847
1 +DATA/STBRAC/DATAFILE/undotbs2.300.1005926855
1 +DATA/STBRAC/86B2433955FB2449E0537992A8C02B22/DATAFILE/system.264.1005926841
1 +DATA/STBRAC/86B2433955FB2449E0537992A8C02B22/DATAFILE/sysaux.293.1005926833
1 +DATA/STBRAC/86B2433955FB2449E0537992A8C02B22/DATAFILE/undotbs1.261.1005926849
1 +DATA/STBRAC/86B2433955FB2449E0537992A8C02B22/DATAFILE/undo_2.303.1005926849
1 +DATA/STBRAC/86B2433955FB2449E0537992A8C02B22/DATAFILE/users.283.1005926855
1 +DATA/STBRAC/DATAFILE/test.302.1005926851


sys@STBRAC1> SELECT INST_ID,GROUP#,THREAD#,SEQUENCE#,ARCHIVED,STATUS FROM GV$STANDBY_LOG ORDER BY 1,2,3;

INST_ID GROUP# THREAD# SEQUENCE# ARC STATUS
---------- ---------- ---------- ---------- --- ----------
1 5 1 0 YES UNASSIGNED
1 6 1 0 YES UNASSIGNED
1 7 1 0 YES UNASSIGNED
1 8 1 0 YES UNASSIGNED
1 9 1 0 YES UNASSIGNED
1 10 2 0 YES UNASSIGNED
1 11 2 0 YES UNASSIGNED
1 12 2 0 YES UNASSIGNED
1 13 2 0 YES UNASSIGNED
1 14 2 0 YES UNASSIGNED


sys@STBRAC1> SELECT TYPE, count(*) FROM GV$LOGFILE GROUP BY TYPE;

TYPE COUNT(*)
------- ----------
ONLINE 8
STANDBY 20

可以看到standby redo logfile 有很多重复,删除重复项

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
column member format a50
select group#, member from v$logfile where type='STANDBY' order by 1,2;

GROUP# MEMBER
---------- --------------------------------------------------
5 +DATA/STBRAC/ONLINELOG/group_5.278.1005926909
5 +FRA/STBRAC/ONLINELOG/group_5.280.1005926905
6 +DATA/STBRAC/ONLINELOG/group_6.277.1005926915
6 +FRA/STBRAC/ONLINELOG/group_6.279.1005926911
7 +DATA/STBRAC/ONLINELOG/group_7.276.1005926921
7 +FRA/STBRAC/ONLINELOG/group_7.278.1005926917
8 +DATA/STBRAC/ONLINELOG/group_8.275.1005926925
8 +FRA/STBRAC/ONLINELOG/group_8.277.1005926923
9 +DATA/STBRAC/ONLINELOG/group_9.274.1005926931
9 +FRA/STBRAC/ONLINELOG/group_9.276.1005926929
10 +DATA/STBRAC/ONLINELOG/group_10.273.1005926937
10 +FRA/STBRAC/ONLINELOG/group_10.272.1005926935
11 +DATA/STBRAC/ONLINELOG/group_11.272.1005926943
11 +FRA/STBRAC/ONLINELOG/group_11.273.1005926939
12 +DATA/STBRAC/ONLINELOG/group_12.271.1005926947
12 +FRA/STBRAC/ONLINELOG/group_12.295.1005926945
13 +DATA/STBRAC/ONLINELOG/group_13.314.1005926953
13 +FRA/STBRAC/ONLINELOG/group_13.296.1005926951
14 +DATA/STBRAC/ONLINELOG/group_14.315.1005926959
14 +FRA/STBRAC/ONLINELOG/group_14.297.1005926957

set linesize 150
set pagesize 100
select 'alter database drop standby logfile member ''' || member || ''';'
from v$logfile where type='STANDBY' and member like '+DATA%';

'ALTERDATABASEDROPSTANDBYLOGFILEMEMBER'''||MEMBER||''';'
------------------------------------------------------------------------------------------------------------------------------------------------------
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_5.278.1005926909';
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_6.277.1005926915';
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_7.276.1005926921';
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_8.275.1005926925';
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_9.274.1005926931';
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_10.273.1005926937';
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_11.272.1005926943';
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_12.271.1005926947';
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_13.314.1005926953';
alter database drop standby logfile member '+DATA/STBRAC/ONLINELOG/group_14.315.1005926959';

-- 确认删除完毕
sys@STBRAC1> SELECT GROUP#, MEMBER FROM V$LOGFILE WHERE TYPE='STANDBY' ORDER BY 1,2;

GROUP# MEMBER
---------- --------------------------------------------------
5 +FRA/STBRAC/ONLINELOG/group_5.280.1005926905
6 +FRA/STBRAC/ONLINELOG/group_6.279.1005926911
7 +FRA/STBRAC/ONLINELOG/group_7.278.1005926917
8 +FRA/STBRAC/ONLINELOG/group_8.277.1005926923
9 +FRA/STBRAC/ONLINELOG/group_9.276.1005926929
10 +FRA/STBRAC/ONLINELOG/group_10.272.1005926935
11 +FRA/STBRAC/ONLINELOG/group_11.273.1005926939
12 +FRA/STBRAC/ONLINELOG/group_12.295.1005926945
13 +FRA/STBRAC/ONLINELOG/group_13.296.1005926951
14 +FRA/STBRAC/ONLINELOG/group_14.297.1005926957

设置备库的参数

1
2
3
4
5
ALTER SYSTEM SET FAL_SERVER=rac SCOPE=BOTH SID='*';
ALTER SYSTEM SET FAL_CLIENT=stbrac SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_CONFIG='DG_CONFIG=(rac,stbrac)' SID='*';
ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=rac ASYNC DB_UNIQUE_NAME=rac VALID_FOR=(ONLINE_LOGFILE,PRIMARY_ROLE)' SCOPE=BOTH SID='*';

注册备库到ocr

1
2
3
4
su - oracle 
srvctl add database -d stbrac -o $ORACLE_HOME -role physical_standby -startoption mount -diskgroup DATA,FRA
srvctl add instance -d stbrac -i stbrac1 -n stbracnode1
srvctl add instance -d stbrac -i stbrac2 -n stbracnode2

将备库的spfile从文件系统中导入asm

通过pfile生成spfile

1
2
create pfile from spfile;
create spfile='+data/stbrac/PARAMETERFILE/spfilestbrac.ora' from pfile;

修改pfile指向spfile

1
2
3
4
5
echo "SPFILE='+data/stbrac/PARAMETERFILE/spfilestbrac.ora'" > $ORACLE_HOME/dbs/initstbrac1.ora
rm $ORACLE_HOME/dbs/spfilestbrac1.ora

-- stbracnode2
echo "SPFILE='+data/stbrac/PARAMETERFILE/spfilestbrac.ora'" > $ORACLE_HOME/dbs/initstbrac2.ora

更新OCR中的spfile信息

1
srvctl modify database -d stbrac -p +DATA/STBRAC/PARAMETERFILE/spfilestbrac.ora

重启备库实例

1
2
3
4
5
6
7
8
9
srvctl stop instance -d stbrac -i stbrac1
srvctl start instance -d stbrac -i stbrac1
srvctl status database -d stbrac

SQL> show parameter spfile;

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string +DATA/STBRAC/PARAMETERFILE/spfilestbrac.ora

修改spfile中的INSTANCE_NAMEINSTANCE_NUMBER参数

1
2
3
4
ALTER SYSTEM SET INSTANCE_NUMBER=1 SCOPE=SPFILE SID='stbrac1';
ALTER SYSTEM SET INSTANCE_NUMBER=2 SCOPE=SPFILE SID='stbrac2';
ALTER SYSTEM SET INSTANCE_NAME='stbrac1' SCOPE=SPFILE SID='stbrac1';
ALTER SYSTEM SET INSTANCE_NAME='stbrac2' SCOPE=SPFILE SID='stbrac2';

启动备库的第二个实例

1
2
srvctl start instance -d stbrac -i stbrac2
srvctl status database -d stbrac

开启备库的日志应用进程

1
alter database recover managed standby database using current logfile disconnect from session;

检查每个节点的dg配置情况

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
select inst_id, database_role, db_unique_name instance, open_mode, protection_mode, protection_level from gv$database;

INST_ID DATABASE_ROLE INSTANCE OPEN_MODE PROTECTION_MODE PROTECTION_LEVEL
---------- ---------------- ------------------------------ -------------------- -------------------- --------------------
1 PHYSICAL STANDBY stbrac READ ONLY WITH APPLY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE
2 PHYSICAL STANDBY stbrac READ ONLY WITH APPLY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE


select inst_id, database_mode, recovery_mode, protection_mode
from gv$archive_dest_status
where dest_name ='LOG_ARCHIVE_DEST_1';

INST_ID DATABASE_MODE RECOVERY_MODE PROTECTION_MODE
---------- --------------- ----------------------- --------------------
2 OPEN_READ-ONLY MANAGED REAL TIME APPLY MAXIMUM PERFORMANCE
1 OPEN_READ-ONLY MANAGED REAL TIME APPLY MAXIMUM PERFORMANCE

检查redo传输服务

1
2
3
4
5
6
7
8
9
10
11
alter session set nls_date_format ='yyyy-mm-dd hh24:mi:ss';
select inst_id, sequence#, applied, first_time, next_time
from gv$archived_log order by 2,1,4;

-- 主库建表
create table test as select * from dba_tables where 1=2;

-- 检查备库
SQL> select * from test;

no rows selected

备库检查dg统计信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
set linesize 300
col start_time format a20
col item format a20
select to_char(start_time, 'yyyy-mm-dd hh24:mi:ss') start_time, item , sofar, units
from v$recovery_progress
where item in ('Active Apply Rate', 'Average Apply Rate', 'Redo Applied');

START_TIME ITEM SOFAR UNITS
-------------------- -------------------- ---------- --------------------------------
2019-04-18 17:26:33 Active Apply Rate 1148 KB/sec
2019-04-18 17:26:33 Average Apply Rate 58 KB/sec
2019-04-18 17:26:33 Redo Applied 22 Megabytes



col name for a13
col value for a13
col unit for a30
set lines 132
select name, value, unit, time_computed
from v$dataguard_stats where name in ('transport lag', 'apply lag');

NAME VALUE UNIT TIME_COMPUTED
------------- ------------- ------------------------------ ------------------------------
transport lag +00 00:00:00 day(2) to second(0) interval 04/18/2019 17:33:57
apply lag +00 00:00:00 day(2) to second(0) interval 04/18/2019 17:33:57

优化配置

配置主库归档日志删除策略

1
2
rman target /
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;

启用备库的闪回

1
2
3
4
alter system set db_flashback_retention_target =2880;
alter database recover managed standby database cancel;
alter database flashback on;
alter database recover managed standby database using current logfile disconnect from session;

配置dg broker

查看系统现有dg broker配置

1
2
3
4
5
6
7
sys@RAC1> show parameter dg_broker

PARAMETER_NAME TYPE VALUE
------------------------------------------------------------ ----------- -------------------------------------------------------------------------------------
dg_broker_config_file1 string /u01/app/oracle/product/12.2.0/dbhome_1/dbs/dr1rac.dat
dg_broker_config_file2 string /u01/app/oracle/product/12.2.0/dbhome_1/dbs/dr2rac.dat
dg_broker_start boolean FALSE

rac环境中dgbroker的配置文件要存放在共享目录上,在ASM上新建一个目录

1
2
3
su - grid
ASMCMD>cd +data/rac/
ASMCMD>mkdir DGCONFIG

停掉备库apply

1
alter database recover managed standby database cancel;

修改主库参数

1
2
3
4
alter system set LOG_ARCHIVE_DEST_2='' scope=both;
ALTER SYSTEM SET DG_BROKER_CONFIG_FILE1='+DATA/RAC/DGCONFIG/dr1rac.dat' SCOPE=spfile sid='*';
ALTER SYSTEM SET DG_BROKER_CONFIG_FILE2='+DATA/RAC/DGCONFIG/dr2rac.dat' SCOPE=spfile sid='*';
alter system set dg_broker_start=true scope=both sid='*';

同时也修改备库

1
2
3
4
5
6
7
8
su - grid
ASMCMD>cd +data/stbrac/
ASMCMD>mkdir DGCONFIG

alter system set LOG_ARCHIVE_DEST_2='' scope=both;
ALTER SYSTEM SET DG_BROKER_CONFIG_FILE1='+DATA/STBRAC/DGCONFIG/dr1stbrac.dat' SCOPE=spfile sid='*';
ALTER SYSTEM SET DG_BROKER_CONFIG_FILE2='+DATA/STBRAC/DGCONFIG/dr2stbrac.dat' SCOPE=spfile sid='*';
alter system set dg_broker_start=true scope=both sid='*';

重启主备库

1
2
3
4
5
srvctl stop database -d rac
srvctl start database -d rac

srvctl stop database -d stbrac
srvctl start database -d stbrac

创建dgbroker配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
su - oracle
dgmgrl>
create configuration 'raccfg' as primary database is 'rac' connect identifier is 'rac';
add database 'stbrac' as connect identifier is 'stbrac';
ENABLE configuration;
enable database 'rac';
enable database 'stbrac';


DGMGRL> show configuration

Configuration - raccfg

Protection Mode: MaxPerformance
Members:
rac - Primary database
stbrac - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS (status updated 17 seconds ago)

测试通过dgbroker做swithover切换操作

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
DGMGRL> switchover to 'stbrac';
Performing switchover NOW, please wait...
Operation requires a connection to database "stbrac"
Connecting ...
Connected to "stbrac"
Connected as SYSDBA.
New primary database "stbrac" is opening...
Oracle Clusterware is restarting database "rac" ...
Connected to "rac"
Connected to "rac"
Switchover succeeded, new primary is "stbrac"
DGMGRL> show configuration;

Configuration - raccfg

Protection Mode: MaxPerformance
Members:
stbrac - Primary database
rac - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS (status updated 110 seconds ago)

--重新swithover回来
DGMGRL> switchover to 'rac';
Performing switchover NOW, please wait...
Operation requires a connection to database "rac"
Connecting ...
Connected to "rac"
Connected as SYSDBA.
New primary database "rac" is opening...
Oracle Clusterware is restarting database "stbrac" ...
Switchover succeeded, new primary is "rac"

至此整个安装配置过程结束。

处理一个dg gap同步问题

有一套10g的DG发现不同步,备库显示MRP进程正在等待gap的归档日志,需要从387105-387676

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY;

PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS
--------- ------------ ---------- ---------- ---------- ----------
ARCH CLOSING 1 387677 2033665 1047
ARCH CLOSING 1 387676 2033665 1048
MRP0 WAIT_FOR_GAP 1 387105 0 0
RFS IDLE 0 0 0 0
RFS IDLE 1 387678 638776 1

SQL> SELECT max(sequence#) from v$archived_log where applied='YES';

MAX(SEQUENCE#)
--------------
387104

而主库的归档日志已经被删除,则只能通过增量备份的方式

根据当前scn对主库进行增量备份

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- 查询备库当前scn
SQL> select to_char(current_scn) from v$database;

TO_CHAR(CURRENT_SCN)
----------------------------------------
9164156262

run{
allocate channel t1 type disk;
allocate channel t2 type disk;
allocate channel t3 type disk;
allocate channel t4 type disk;
allocate channel t5 type disk;
allocate channel t6 type disk;
backup INCREMENTAL from scn 9164156262 database format '/u01/arch/incre_%U';
release channel t1;
release channel t2;
release channel t3;
release channel t4;
release channel t5;
release channel t6;
}

创建备库控制文件

1
2
3
SQL> alter database create standby controlfile as '/u01/arch/standby.ctl';

Database altered.

然后替换掉备库的控制文件

1
2
3
SQL> startup nomount

RMAN> restore standby controlfile from '/u01/backup/standby.ctl';

备库恢复增量备份

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
RMAN> catalog start with '/u01/backup';

searching for all files that match the pattern /u01/backup

List of Files Unknown to the Database
=====================================
File Name: /u01/backup/incre_1itso2at_1_1
File Name: /u01/backup/incre_1jtso2av_1_1
File Name: /u01/backup/incre_1htso1vf_1_1
File Name: /u01/backup/incre_1etso1vf_1_1
File Name: /u01/backup/incre_1ftso1vf_1_1
File Name: /u01/backup/incre_1ctso1vf_1_1
File Name: /u01/backup/incre_1dtso1vf_1_1
File Name: /u01/backup/incre_1gtso1vf_1_1

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/backup/incre_1itso2at_1_1
File Name: /u01/backup/incre_1jtso2av_1_1
File Name: /u01/backup/incre_1htso1vf_1_1
File Name: /u01/backup/incre_1etso1vf_1_1
File Name: /u01/backup/incre_1ftso1vf_1_1
File Name: /u01/backup/incre_1ctso1vf_1_1
File Name: /u01/backup/incre_1dtso1vf_1_1
File Name: /u01/backup/incre_1gtso1vf_1_1

RMAN> recover database noredo;

备库应用日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
SQL> alter database recover managed standby database using current logfile disconnect from session;

Database altered.

-- 备库
SQL> SELECT max(sequence#) from v$archived_log where applied='YES';

MAX(SEQUENCE#)
--------------
387684 >>>>====现在与主库一致

SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY;

PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS
--------- ------------ ---------- ---------- ---------- ----------
ARCH CLOSING 1 387684 1 1004
ARCH CLOSING 1 387683 30721 1068
RFS IDLE 0 0 0 0
RFS IDLE 1 387685 944650 1
MRP0 APPLYING_LOG 1 387685 944649 2097152 >>>>====状态正常

基本上碰到gap处理的步骤就是这样简单几步就行。