ORA-00445: background process W003 did not start after 120 seconds

An 11g database reported this error; from the message it looks like a slave process failed to start. ORA-00445 usually means that, when Oracle asked the operating system to spawn a new process in response to some request, the spawn failed for some reason. The most likely cause is insufficient OS resources or an OS misconfiguration, so troubleshooting normally starts at the operating system level, although in some cases the root cause is inside Oracle itself.

The 120-second timeout shown here can be changed dynamically by setting an event:

$ sqlplus / as sysdba
alter system set events '10281 trace name context forever, level xxx';
-- where xxx is the number of seconds to time out at
eg: alter system set events '10281 trace name context forever, level 300';
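If the longer timeout is no longer needed, the same event can be switched off again; a minimal sketch (the event number is the same 10281 used above):

$ sqlplus / as sysdba
-- remove the timeout override set above
alter system set events '10281 trace name context off';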

Check the relevant logs:

Memory (Avail / Total) = 221.27M / 32181.19M
Swap (Avail / Total) = 47847.41M / 49151.99M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 6755 1 0 80 0 - 3206692 semtim 10:21 ? 00:00:00 ora_w003_xxxxx

*** 2020-07-02 10:23:53.079
Stack:
[Thread debugging using libthread_db enabled]
0x0000003595ceb197 in semop () from /lib64/libc.so.6
#0 0x0000003595ceb197 in semop () from /lib64/libc.so.6
#1 0x0000000009825453 in sskgpwwait ()
#2 0x00000000098241b0 in skgpwwait ()
#3 0x0000000009411825 in kslges ()
#4 0x00000000094111be in kslgetl ()
#5 0x000000000940e60a in ksfglt ()
#6 0x00000000098097e9 in kghalo ()
#7 0x0000000000b3f9f2 in ksp_param_handle_alloc ()
#8 0x0000000000b3eec0 in kspcrec ()
#9 0x0000000000ba8e2c in ksucre ()
#10 0x0000000000bdd2c4 in ksvrdp ()
#11 0x00000000025a57ea in opirip ()
#12 0x0000000001850509 in opidrv ()
#13 0x0000000001e27b37 in sou2o ()
#14 0x0000000000a298d6 in opimai_real ()
#15 0x0000000001e2de55 in ssthrdmain ()
#16 0x0000000000a297cd in main ()
A debugging session is active.
Inferior 1 [process 6755] will be detached.
Quit anyway? (y or n) [answered Y; input not from terminal]

-------------------------------------------------------------------------------
Process diagnostic dump actual duration=8.680000 sec
(max dump time=30.000000 sec)

*** 2020-07-02 10:23:53.079
Killing process (ospid 6755): (reason=KSOREQ_WAIT_CANCELLED error=0)
... and the process is still alive after kill!

From this trace we can see that creation of the W003 slave failed because of the memory shortage; once the 120-second limit was exceeded, the coordinating process killed the slave task and raised the error.

The parent process here is SMCO. Quoting the official description:

SMCO coordinates the following space management tasks. It performs proactive space allocation and space reclamation, and it dynamically spawns slave processes (Wnnn) to do the work.

  • Pre-allocation of space (extents) at the tablespace level

    Pre-allocation here means data file extension. Data file extension happens when a space request (extent allocation) - typically driven by an insert/load into a segment - cannot find contiguous space in the tablespace; the session then grows the data file by the next extent and carries on with the space request or extent allocation.

    For SMCO to extend data files automatically, AUTOEXTEND must be set to ON for the data files (a query to check this is sketched below). SMCO decides to grow a tablespace based on history; the growth is spread evenly across all data files in the tablespace that have not yet reached their maximum size, and within one hourly SMCO wakeup it is limited to 10% of the total tablespace size.

    (Total tablespace size = the sum of the data file sizes at any given moment.)

Besides the tasks above, the SMCO process is also responsible for:

  • Updating the block and extent counts in SEG$ for locally managed tablespaces after extents are added (from unpublished Bug 12940620)
  • Pre-extension of encrypted LOB segments
  • Pre-allocation of space for the encrypted LOB segment memory allocator
  • Space reclamation for encrypted LOB segments
  • Space reclamation for temporary segments

The benefit is that sessions do not have to wait for reactive space allocation, because the work is done proactively, which improves performance.
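A quick way to see which data files SMCO is actually allowed to grow is to check their AUTOEXTEND setting; a minimal sketch against the standard DBA views:

-- data files with AUTOEXTEND ON are the candidates for SMCO pre-allocation
col file_name for a60
select tablespace_name, file_name, autoextensible,
       round(bytes/1024/1024)    size_mb,
       round(maxbytes/1024/1024) max_mb
from   dba_data_files
order  by tablespace_name, file_name;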

SMCO can be disabled and re-enabled; it is not a core background process, so it can be restarted this way:

ALTER SYSTEM SET "_ENABLE_SPACE_PREALLOCATION" = 0;

-- default value
ALTER SYSTEM SET "_ENABLE_SPACE_PREALLOCATION" = 3;

Other possible values:

* 0 to turn off the tbs pre-extension feature.
* 1 To enable tablespace extension.
* 2 To enable segment growth.
* 4 To enable chunk allocation.
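Because _ENABLE_SPACE_PREALLOCATION is a hidden parameter it does not appear in SHOW PARAMETER; a minimal sketch for checking its current value from the X$ views (run as SYSDBA):

-- current value and description of the hidden pre-allocation parameter
select a.ksppinm  name, b.ksppstvl value, a.ksppdesc description
from   x$ksppi a, x$ksppsv b
where  a.indx = b.indx
and    a.ksppinm = '_enable_space_preallocation';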

EXPDP/IMPDP between different versions

Symptom

Importing a dump file exported from an 18c database into 11.2.0.4 failed with:

ORA-39000: bad dump file specification
ORA-39142: incompatible version number 5.1 in dump file "/u01/schema_dump.dmp"

This happens because a dump file exported from a higher release cannot be recognized by a lower-release database.

Solution:

Add version=11.2 to the export command:

Example
expdp scott/tiger@orcl directory=EXPIMP schemas=scott Version=11.2 dumpfile=Exp_Scott.dmp logfile=Exp_Scott.log
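On the 11.2.0.4 side the import then needs no special options; a minimal sketch reusing the directory and dump file names from the export above (the connect string and password are illustrative):

impdp scott/tiger@orcl11 directory=EXPIMP schemas=scott dumpfile=Exp_Scott.dmp logfile=Imp_Scott.log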

Cross-version Data Pump

Compatibility

There are a few guidelines for Data Pump compatibility:

  1. The compatibility level of a Data Pump dump file is determined by the compatibility level of the source database
  2. Use an expdp client of the same version as the source database
  3. Use an impdp client of the same version as the target database
  4. When the target database has a lower compatibility level than the source, use the VERSION parameter of expdp
  5. Data can still be transferred over a database link even when the two compatibility levels differ (see the sketch after this list)
  6. impdp can always read dump files exported from a lower-release database
  7. imp can only read dump files created by exp, and impdp can only read dump files created by expdp
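Guideline 5 refers to network-mode Data Pump, where impdp on the target pulls data straight from the source over a database link without an intermediate dump file; a minimal sketch in which the link name SRC_LINK, the TNS alias and the credentials are all illustrative:

-- on the target database: create a link to the source, then import over it
create database link SRC_LINK connect to system identified by password using 'source_tns';

impdp system/password directory=EXPIMP network_link=SRC_LINK schemas=scott version=11.2 logfile=imp_netlink.log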

Version

Data Pump can be used to migrate data between databases of different releases, usually with the VERSION parameter of expdp; with this parameter set, the dump file set it produces is compatible with the specified release.

VERSION=[COMPATIBLE | LATEST | version_string]

VERSION takes three kinds of values:

  • COMPATIBLE - the default; the metadata version matches the database initialization parameter COMPATIBLE, which must be set to 9.2 or higher
  • LATEST - the metadata version and the generated SQL DDL correspond to the database release, regardless of the COMPATIBLE setting
  • version_string - a specific database release

If an object or feature in the database conflicts with the specified VERSION, the conflicting object is not exported. For example, 12.2 allows longer table names than 11.2, so with version=11.2 any 12.2 objects whose names are too long are skipped.
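Before exporting with VERSION=11.2 from a 12.2 or later database, it can be useful to list the objects that would be skipped because their names exceed the old 30-byte identifier limit; a minimal sketch:

-- objects whose names are too long for releases prior to 12.2
select owner, object_type, object_name
from   dba_objects
where  lengthb(object_name) > 30
order  by owner, object_type, object_name;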

Data Pump dump file compatibility

Source database COMPATIBLE (rows)		Target database COMPATIBLE (columns)

COMPATIBLE 10.1.0.x.y 10.2.0.x.y 11.1.0.x.y 11.2.0.x.y 12.1.0.x.y 12.2.0.x.y 18.x.y.z 19.x.y.z
---------- ------------- ------------- ------------- ------------- ------------- ------------- ------------ ------------
10.1.0.x.y - - - - - - - -
---------- ------------- ------------- ------------- ------------- ------------- ------------- ------------ ------------
10.2.0.x.y VERSION=10.1 - - - - - - -
---------- ------------- ------------- ------------- ------------- ------------- ------------- ------------ ------------
11.1.0.x.y VERSION=10.1 VERSION=10.2 - - - - - -
---------- ------------- ------------- ------------- ------------- ------------- ------------- ------------ ------------
11.2.0.x.y VERSION=10.1 VERSION=10.2 VERSION=11.1 - - - - -
---------- ------------- ------------- ------------- ------------- ------------- ------------- ------------ ------------
12.1.0.x.y VERSION=10.1 VERSION=10.2 VERSION=11.1 VERSION=11.2 - - - -
---------- ------------- ------------- ------------- ------------- ------------- ------------- ------------ ------------
12.2.0.x.y VERSION=10.1 VERSION=10.2 VERSION=11.1 VERSION=11.2 VERSION=12.1 - - -
---------- ------------- ------------- ------------- ------------- ------------- ------------- ------------ ------------
18.x.y.z VERSION=10.1 VERSION=10.2 VERSION=11.1 VERSION=11.2 VERSION=12.1 VERSION=12.2 - -
---------- ------------- ------------- ------------- ------------- ------------- ------------- ------------ ------------
19.x.y.z VERSION=10.1 VERSION=10.2 VERSION=11.1 VERSION=11.2 VERSION=12.1 VERSION=12.2 VERSION=18.x -
---------- ------------- ------------- ------------- ------------- ------------- ------------- ------------ ------------

Data Pump client/server compatibility

Data Pump client compatibility.
===============================

Connected database version
Data Pump client version 10gR1 10gR2 11gR1 11gR2 12cR1 12cR2 18c 19c
version 10.1.0.x 10.2.0.x 11.1.0.x 11.2.0.x 12.1.0.x 12.2.0.x 18.x.y.z 19.x.y.z
----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10.1.0.x supported supported supported supported no no no no
10.2.0.x no supported supported supported supported no no no
11.1.0.x no supported supported supported supported no no no
11.2.0.x no no no supported supported supported supported supported
12.1.0.x no no no no supported supported supported supported
12.2.0.x no no no no no supported supported supported
18.x.y.z no no no no no supported supported supported
19.x.y.z no no no no no no no supported

Data Pump dump file versions

Data Pump file version.
=======================

Version Written by Can be imported into Target:
Data Pump database with 10gR1 10gR2 11gR1 11gR2 12cR1 12cR2 18c 19c
Dumpfile Set compatibility 10.1.0.x 10.2.0.x 11.1.0.x 11.2.0.x 12.1.0.x 12.2.0.x 18.x.y.z 19.x.y.z
------------ --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
0.1 10.1.x supported supported supported supported supported supported supported supported
1.1 10.2.x no supported supported supported supported supported supported supported
2.1 11.1.x no no supported supported supported supported supported supported
3.1 11.2.x no no no supported supported supported supported supported
4.1 12.1.x no no no no supported supported supported supported
5.1 12.2.x no no no no no supported supported supported
5.1 18.x.y no no no no no no supported supported
5.1 19.x.y no no no no no no no supported

Data Pump features

Data Pump New Features.
=======================

Version New Feature Remark:
--------- --------------------------------- ----------------------------------------------
10.1.0.1 Data Pump Technology Introduction of high-speed data movement.
---------
10.2.0.1 COMPRESSION Compress metadata in export dumpfile.
10.2.0.1 ENCRYPTION_PASSWORD Allows encrypted column data in dumpfile.
10.2.0.1 SAMPLE Specify a percentage of data to be unloaded.
10.2.0.1 TRANSFORM Change DDL for OID's and space allocation.
10.2.0.1 VERSION Create file compatible with earlier release.
---------
11.1.0.1 COMPRESSION Both data and metadata can be compressed.
11.1.0.1 DATA_OPTIONS (expdp) Specify handling of XMLType columns.
11.1.0.1 DATA_OPTIONS (impdp) Specify handling of constraint violations.
11.1.0.1 ENCRYPTION Both data and metadata can be encrypted.
11.1.0.1 ENCRYPTION_ALGORITHM Specify a specific encryption algorithm.
11.1.0.1 ENCRYPTION_MODE Specify the type of security to be used.
11.1.0.1 PARTITION_OPTIONS Specify how to handle partitioned tables.
11.1.0.1 REMAP_DATA Change column values based on a function.
11.1.0.1 REMAP_TABLE Rename tables during an import operation.
11.1.0.1 REUSE_DUMPFILES Option to overwrite existing dumpfiles.
11.1.0.1 TRANSPORTABLE Transfer table data by copying datafiles.
---------
11.2.0.1 Legacy mode: accept exp and imp parameters.
11.2.0.1 ABORT_STEP Stop job after initializing to query master.
11.2.0.1 ACCESS_METHOD Specify a particular method to (un)load data.
11.2.0.1 CLUSTER Control whether workers use all RAC instances.
11.2.0.1 DATA_OPTIONS (impdp) Specify to disable the APPEND hint.
11.2.0.1 KEEP_MASTER Specify whether to retain master table.
11.2.0.1 MASTER_ONLY Only import the master table.
11.2.0.1 METRICS Report additional information in logfile.
11.2.0.1 SERVICE_NAME Use with CLUSTER to specify a service name.
11.2.0.1 SOURCE_EDITION (expdp) Specify the edition from which to exp objects.
11.2.0.1 TABLES Now specify tables in multiple schemas.
11.2.0.1 TABLES Now specify % for multiple tables and part.
11.2.0.1 TARGET_EDITION (impdp) Specify the edition from which to exp objects.
11.2.0.2 Default first segment now 8M for part. table.
11.2.0.2 TRANSFORM Specify how to handle SEGMENT CREATION.
---------
12.1.0.1 Support for CDB and Pluggable db's (PDB).
12.1.0.1 COMPRESSION_ALGORITHM Specify the algorithm when compressing data.
12.1.0.1 ENCRYPTION_PWD_PROMPT Specify whether Data to prompt for password.
12.1.0.1 FULL Can now be used together with TRANSPORTABLE.
12.1.0.1 LOGTIME Provide timestamp for messages in the logfile.
12.1.0.1 TRANSFORM Specify to disable logging during import.
12.1.0.1 TRANSFORM Specify to change the LOB storing on import.
12.1.0.1 TRANSFORM Specify to change table compression type.
12.1.0.1 VIEWS_AS_TABLES Export views as tables.
---------
12.2.0.1
12.2.0.1 REMAP_DIRECTORY Let you remap directories when you move databases between platforms.
12.2.0.1 TRUST_EXISTING_TABLE_PARTITIONS Enable data from multiple partitions to be loaded in parallel into a pre-existing table. This is a flag on Data Pump Import
DATA_OPTIONS
12.2.0.1 VALIDATE_TABLE_DATA Verify the format number and date data types in table data columns. This is a flag on Data Pump Import DATA_OPTIONS
12.2.0.1 ENABLE_NETWORK_COMPRESSION Tell Data Pump to compress data before sending it over the network. This is a flag on DATA_OPTIONS parameter.
12.2.0.1 GROUP_PARTITION_TABLE_DATA Enable data for all partitions to be loaded at once and in parallel. This is a flag on the Data Pump Export DATA_OPTIONS parameter.
12.2.0.1 VERIFY_STREAM_FORMAT Validate the format of a data stream before it is written to the Data Pump dump file. This is a flag on the Data Pump Export
DATA_OPTIONS parameter.
---------
18.1.X.X CONTINUE_LOAD_ON_FORMAT_ERROR This is a new value for the DATA_OPTIONS parameter for impdp. When it is set, Data Pump jumps ahead and continue loading from the
next granule when an inconsistency is found.
---------
19c Use Document 2457955.1 19c DataPump New Features are described in this document.

Introduction

Determine the compatibility level of the database

col value for a20
col description for a49
select * from database_compatible_level;

VALUE DESCRIPTION
-------------------- -------------------------------------------------
18.0.0 Database will be completely compatible with this
software version

-- or:
sys@> show parameter compatible

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
compatible string 18.0.0
noncdb_compatible boolean FALSE

Check the database and Data Pump client versions

sys@> select banner_full from v$version;

BANNER_FULL
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.5.0.0.0

-- client version
[oracle@]$ expdp HELP=Y

Export: Release 18.0.0.0.0
Version 18.5.0.0.0

Get the compatibility level of a dump file

Every dump file has a header (usually 4 KB) containing detailed information about the file. From 10.2 onward these details can be read with the DBMS_DATAPUMP.GET_DUMPFILE_INFO procedure, but its raw output is cumbersome, so for readability it is wrapped in a custom stored procedure (SHOW_DUMPFILE_INFO.sql) that calls it and prints a formatted report:
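For reference, the underlying call that the helper script wraps looks roughly like this; a minimal sketch that simply dumps the raw item codes and values (the directory and file names are placeholders):

set serveroutput on
declare
  v_info     ku$_dumpfile_info;   -- nested table of (item_code, value) pairs
  v_filetype number;              -- 1 = Export Data Pump dumpfile (as in the report below)
begin
  dbms_datapump.get_dumpfile_info(
    filename   => 'xx.dmp',
    directory  => 'XX',
    info_table => v_info,
    filetype   => v_filetype);
  dbms_output.put_line('Filetype: '||v_filetype);
  for i in 1 .. v_info.count loop
    dbms_output.put_line('Item '||v_info(i).item_code||': '||v_info(i).value);
  end loop;
end;
/

The helper procedure below just decodes these item codes into the labelled report.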

SET serveroutput on SIZE 1000000
exec show_dumpfile_info(p_dir=> 'xx', p_file=> 'xx.dmp');

----------------------------------------------------------------------------
Purpose..: Obtain details about export dumpfile. Version: 18-DEC-2013
Required.: RDBMS version: 10.2.0.1.0 or higher
. Export dumpfile version: 7.3.4.0.0 or higher
. Export Data Pump dumpfile version: 10.1.0.1.0 or higher
Usage....: execute show_dumfile_info('DIRECTORY', 'DUMPFILE');
Example..: exec show_dumfile_info('MY_DIR', 'expdp_s.dmp')
----------------------------------------------------------------------------
Filename.: xx.dmp
Directory: xx
Disk Path: /backup/xxxx
Filetype.: 1 (Export Data Pump dumpfile)
----------------------------------------------------------------------------
...Database Job Version..........: 18.00.00.00.00
...Internal Dump File Version....: 5.1
...Creation Date.................: Sun ---
...File Number (in dump file set): 1
...Master Present in dump file...: 1 (Yes)
...Master in how many dump files.: 1
...Master Piece Number in file...: 1
...Operating System of source db.: x86_64/Linux 2.4.xx
...Instance Name of source db....: ------
...Characterset ID of source db..: 852 (ZHS16GBK)
...Language Name of characterset.: ZHS16GBK
...Job Name......................: "SYSTEM"."SYS_EXPORT_SCHEMA_01"
...GUID (unique job identifier)..: A28977F60899EE30E053D326680AD984
...Block size dump file (bytes)..: 4096
...Metadata Compressed...........: 1 (Yes)
...Data Compressed...............: 1 (Yes)
...Compression Algorithm.........: 3 (Basic)
...Metadata Encrypted............: 0 (No)
...Table Data Encrypted..........: 0 (No)
...Column Data Encrypted.........: 0 (No)
...Encryption Mode...............: 2 (None)
...Internal Flag Values..........: 518
...Max Items Code (Info Items)...: 24
----------------------------------------------------------------------------

PL/SQL procedure successfully completed.

From the report above, this dump file was produced with compatibility 18.0 (18.00.00.00.00) and internal file version 5.1, so it can only be imported into databases whose compatibility level is 18.0 or higher.

ORA-15001: diskgroup "FRA" does not exist or is not mounted

A database failed to start, complaining that an ASM disk group did not exist:

ORA-00210: cannot open the specified control file
ORA-00202: control file: '+FRA/STBRAC/CONTROLFILE/current.294.1005908473'
ORA-17503: ksfdopn:2 Failed to open file +FRA/STBRAC/CONTROLFILE/current.294.1005908473
ORA-15001: diskgroup "FRA" does not exist or is not mounted
ORA-15040: diskgroup is incomplete
ORA-205 signalled during: alter database mount...


WARNING: failed to open a disk[/dev/asmdiskd]
ORA-15025: could not open disk "/dev/asmdiskd"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied

The ASM disks and disk groups all looked normal:

[grid@stbracnode1 ~]$ asmcmd
ASMCMD> lsdg
State Type Rebal Sector Logical_Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 512 4096 4194304 51200 16784 0 16784 0 Y CRS/
MOUNTED EXTERN N 512 512 4096 4194304 61440 55336 0 55336 0 N DATA/
MOUNTED EXTERN N 512 512 4096 4194304 20480 17404 0 17404 0 N FRA/

[grid@stbracnode1 ~]$ ll /dev/asm*
brw-rw---- 1 grid asmadmin 8, 16 Apr 18 13:22 /dev/asmdiskb
brw-rw---- 1 grid asmadmin 8, 32 Apr 18 13:22 /dev/asmdiskc
brw-rw---- 1 grid asmadmin 8, 48 Apr 18 13:22 /dev/asmdiskd

A MOS search pointed to a permission problem: the group owner of the oracle binary under ORACLE_HOME should be asmadmin, not oinstall.

[oracle@ ~]$ ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle oinstall 407944928 Apr 16 15:53 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

For comparison, on a healthy ASM node:

[oracle@ dbs]$ ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 407944928 Apr 16 15:48 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

This cluster had just had grid and database patches applied, which is most likely what changed the ownership.

Once the cause is known the fix is simple: switch to the grid user and run

setasmgidwrap o=/u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

[grid@ ~]$ ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 407944928 Apr 16 15:56 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

Restart the database and the problem is gone:

SQL> startup mount
ORACLE instance started.

Total System Global Area 2516582400 bytes
Fixed Size 8795904 bytes
Variable Size 754977024 bytes
Database Buffers 1744830464 bytes
Redo Buffers 7979008 bytes
Database mounted.

ORA-00845: MEMORY_TARGET not supported on this system

After installing 12.2 grid, running root.sh failed:

[root@racnode1 ~]# /u01/app/12.2.0/grid/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/12.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/racnode1/crsconfig/rootcrs_racnode1_2019-04-11_10-21-10AM.log
2019/04/11 10:21:12 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2019/04/11 10:21:12 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2019/04/11 10:21:46 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2019/04/11 10:21:46 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2019/04/11 10:21:52 CLSRSC-363: User ignored prerequisites during installation
2019/04/11 10:21:52 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2019/04/11 10:21:54 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2019/04/11 10:21:55 CLSRSC-594: Executing installation step 5 of 19: 'SaveParamFile'.
2019/04/11 10:22:08 CLSRSC-594: Executing installation step 6 of 19: 'SetupOSD'.
2019/04/11 10:22:09 CLSRSC-594: Executing installation step 7 of 19: 'CheckCRSConfig'.
2019/04/11 10:22:09 CLSRSC-594: Executing installation step 8 of 19: 'SetupLocalGPNP'.
2019/04/11 10:23:00 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2019/04/11 10:23:12 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2019/04/11 10:23:12 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2019/04/11 10:23:18 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2019/04/11 10:23:34 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
2019/04/11 10:23:55 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2019/04/11 10:24:02 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'racnode1'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racnode1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2019/04/11 10:24:24 CLSRSC-594: Executing installation step 15 of 19: 'InstallKA'.
2019/04/11 10:24:36 CLSRSC-594: Executing installation step 16 of 19: 'InitConfig'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'racnode1'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racnode1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'racnode1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode1'
CRS-2676: Start of 'ora.mdnsd' on 'racnode1' succeeded
CRS-2676: Start of 'ora.evmd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode1'
CRS-2676: Start of 'ora.gpnpd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode1'
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode1'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode1'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode1'
CRS-2676: Start of 'ora.diskmon' on 'racnode1' succeeded
CRS-2676: Start of 'ora.cssd' on 'racnode1' succeeded

ASM failed to start. Check /u01/app/grid/cfgtoollogs/asmca/asmca-190411AM102519.log for details.

2019/04/11 10:25:41 CLSRSC-184: Configuration of ASM failed
2019/04/11 10:25:47 CLSRSC-258: Failed to configure and start ASM
Died at /u01/app/12.2.0/grid/crs/install/crsinstall.pm line 2091.
The command '/u01/app/12.2.0/grid/perl/bin/perl -I/u01/app/12.2.0/grid/perl/lib -I/u01/app/12.2.0/grid/crs/install /u01/app/12.2.0/grid/crs/install/rootcrs.pl ' execution failed

The log referenced above shows the error:

[main] [ 2019-04-11 10:25:29.659 CST ] [OracleHome.getVersion:1152]  Current version from sqlplus: 12.2.0.1.0
[main] [ 2019-04-11 10:25:29.659 CST ] [UsmcaLogger.logInfo:156] Role SYSASM
[main] [ 2019-04-11 10:25:29.659 CST ] [UsmcaLogger.logInfo:156] OS Auth true
[main] [ 2019-04-11 10:25:40.692 CST ] [SQLEngine.done:2314] Done called
[main] [ 2019-04-11 10:25:40.695 CST ] [USMInstance.configureLocalASM:3367] ORA-00845: MEMORY_TARGET not supported on this system

I had run into this error before when starting a database: /dev/shm is too small. It must be at least as large as MEMORY_MAX_TARGET.

[root@racnode1 trace]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 980M 0 980M 0% /dev
tmpfs 993M 0 993M 0% /dev/shm
tmpfs 993M 8.9M 984M 1% /run
tmpfs 993M 0 993M 0% /sys/fs/cgroup
/dev/mapper/ol-root 58G 11G 48G 18% /
/dev/sda1 253M 139M 115M 55% /boot
tmpfs 199M 0 199M 0% /run/user/0

[root@racnode1 trace]# mount -o remount,size=4g /dev/shm
[root@racnode1 trace]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 980M 0 980M 0% /dev
tmpfs 4.0G 0 4.0G 0% /dev/shm
tmpfs 993M 8.9M 984M 1% /run
tmpfs 993M 0 993M 0% /sys/fs/cgroup
/dev/mapper/ol-root 58G 11G 48G 18% /
/dev/sda1 253M 139M 115M 55% /boot
tmpfs 199M 0 199M 0% /run/user/0

-- add to /etc/fstab

tmpfs /dev/shm tmpfs defaults,size=4G 0 0

Deconfigure the partially installed CRS

[root@racnode1 trace]# /u01/app/12.2.0/grid/crs/install/rootcrs.sh  -deconfig -force
Using configuration parameter file: /u01/app/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/racnode1/crsconfig/crsdeconfig_racnode1_2019-04-11_11-06-32AM.log
PRCR-1070 : Failed to check if resource ora.net1.network is registered
CRS-0184 : Cannot communicate with the CRS daemon.
PRCR-1070 : Failed to check if resource ora.helper is registered
CRS-0184 : Cannot communicate with the CRS daemon.
PRCR-1070 : Failed to check if resource ora.ons is registered
CRS-0184 : Cannot communicate with the CRS daemon.

2019/04/11 11:06:45 CLSRSC-180: An error occurred while executing the command '/u01/app/12.2.0/grid/bin/srvctl config nodeapps'
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'racnode1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'racnode1'
CRS-2673: Attempting to stop 'ora.evmd' on 'racnode1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'racnode1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'racnode1'
CRS-2679: Attempting to clean 'ora.asm' on 'racnode1'
CRS-2677: Stop of 'ora.evmd' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'racnode1'
CRS-2677: Stop of 'ora.mdnsd' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.cssd' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'racnode1'
CRS-2677: Stop of 'ora.gpnpd' on 'racnode1' succeeded
CRS-2681: Clean of 'ora.asm' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'racnode1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racnode1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2019/04/11 11:07:16 CLSRSC-4006: Removing Oracle Trace File Analyzer (TFA) Collector.
2019/04/11 11:09:01 CLSRSC-4007: Successfully removed Oracle Trace File Analyzer (TFA) Collector.
2019/04/11 11:09:02 CLSRSC-336: Successfully deconfigured Oracle Clusterware stack on this node

Re-run root.sh

[root@racnode1 trace]# /u01/app/12.2.0/grid/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/12.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/racnode1/crsconfig/rootcrs_racnode1_2019-04-11_11-09-55AM.log
2019/04/11 11:09:58 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2019/04/11 11:09:58 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2019/04/11 11:10:29 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2019/04/11 11:10:29 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2019/04/11 11:10:34 CLSRSC-363: User ignored prerequisites during installation
2019/04/11 11:10:34 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2019/04/11 11:10:37 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2019/04/11 11:10:38 CLSRSC-594: Executing installation step 5 of 19: 'SaveParamFile'.
2019/04/11 11:10:46 CLSRSC-594: Executing installation step 6 of 19: 'SetupOSD'.
2019/04/11 11:10:46 CLSRSC-594: Executing installation step 7 of 19: 'CheckCRSConfig'.
2019/04/11 11:10:46 CLSRSC-594: Executing installation step 8 of 19: 'SetupLocalGPNP'.
2019/04/11 11:11:24 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2019/04/11 11:11:33 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2019/04/11 11:11:33 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2019/04/11 11:11:39 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2019/04/11 11:11:55 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
2019/04/11 11:12:15 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2019/04/11 11:12:20 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'racnode1'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racnode1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2019/04/11 11:12:42 CLSRSC-594: Executing installation step 15 of 19: 'InstallKA'.
2019/04/11 11:12:48 CLSRSC-594: Executing installation step 16 of 19: 'InitConfig'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'racnode1'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racnode1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'racnode1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode1'
CRS-2676: Start of 'ora.evmd' on 'racnode1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode1'
CRS-2676: Start of 'ora.gpnpd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode1'
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode1'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode1'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode1'
CRS-2676: Start of 'ora.diskmon' on 'racnode1' succeeded
CRS-2676: Start of 'ora.cssd' on 'racnode1' succeeded

Disk groups created successfully. Check /u01/app/grid/cfgtoollogs/asmca/asmca-190411AM111328.log for details.


2019/04/11 11:20:23 CLSRSC-482: Running command: '/u01/app/12.2.0/grid/bin/ocrconfig -upgrade grid oinstall'
CRS-2672: Attempting to start 'ora.crf' on 'racnode1'
CRS-2672: Attempting to start 'ora.storage' on 'racnode1'
CRS-2676: Start of 'ora.storage' on 'racnode1' succeeded
CRS-2676: Start of 'ora.crf' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'racnode1'
CRS-2676: Start of 'ora.crsd' on 'racnode1' succeeded
CRS-4256: Updating the profile
Successful addition of voting disk d5d63504c2084fd1bf20c9109c45188c.
Successfully replaced voting disk group with +DATA.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE d5d63504c2084fd1bf20c9109c45188c (/dev/asmdiskf) [DATA]
Located 1 voting disk(s).
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'racnode1'
CRS-2673: Attempting to stop 'ora.crsd' on 'racnode1'
CRS-2677: Stop of 'ora.crsd' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'racnode1'
CRS-2673: Attempting to stop 'ora.crf' on 'racnode1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'racnode1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'racnode1'
CRS-2677: Stop of 'ora.crf' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.storage' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'racnode1'
CRS-2677: Stop of 'ora.mdnsd' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.asm' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'racnode1'
CRS-2673: Attempting to stop 'ora.evmd' on 'racnode1'
CRS-2677: Stop of 'ora.evmd' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'racnode1'
CRS-2677: Stop of 'ora.cssd' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'racnode1'
CRS-2677: Stop of 'ora.gipcd' on 'racnode1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racnode1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2019/04/11 11:26:42 CLSRSC-594: Executing installation step 17 of 19: 'StartCluster'.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode1'
CRS-2672: Attempting to start 'ora.evmd' on 'racnode1'
CRS-2676: Start of 'ora.evmd' on 'racnode1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode1'
CRS-2676: Start of 'ora.gpnpd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode1'
CRS-2676: Start of 'ora.gipcd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode1'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode1'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode1'
CRS-2676: Start of 'ora.diskmon' on 'racnode1' succeeded
CRS-2676: Start of 'ora.cssd' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode1'
CRS-2676: Start of 'ora.ctssd' on 'racnode1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-2676: Start of 'ora.asm' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'racnode1'
CRS-2676: Start of 'ora.storage' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'racnode1'
CRS-2676: Start of 'ora.crf' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'racnode1'
CRS-2676: Start of 'ora.crsd' on 'racnode1' succeeded
CRS-6023: Starting Oracle Cluster Ready Services-managed resources
CRS-6017: Processing resource auto-start for servers: racnode1
CRS-6016: Resource auto-start has completed for server racnode1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2019/04/11 11:31:30 CLSRSC-343: Successfully started Oracle Clusterware stack
2019/04/11 11:31:33 CLSRSC-594: Executing installation step 18 of 19: 'ConfigNode'.
CRS-2672: Attempting to start 'ora.ASMNET1LSNR_ASM.lsnr' on 'racnode1'
CRS-2676: Start of 'ora.ASMNET1LSNR_ASM.lsnr' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-2676: Start of 'ora.asm' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.DATA.dg' on 'racnode1'
CRS-2676: Start of 'ora.DATA.dg' on 'racnode1' succeeded
2019/04/11 11:43:18 CLSRSC-594: Executing installation step 19 of 19: 'PostConfig'.
2019/04/11 11:49:21 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

Oracle fails to start with ORA-27154, ORA-27300, ORA-27301, ORA-27302

An 11.2.0.4 instance suddenly refused to start:

SQL> startup nomount
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpcreates

The error says that sskgpcreates failed due to insufficient resources. At startup the database allocates semaphore resources according to the PROCESSES parameter; if the kernel semaphore parameters are set too low, the allocation fails with this error.

On Linux, semaphores control how processes use a particular resource. A semaphore is a shareable counter operated on by the P (wait) and V (signal) functions, which decrement and increment it respectively. When a process needs the resource it issues a wait and the semaphore is decremented; when the value reaches 0 the resource is unavailable and the calling process spins or blocks (depending on the situation) until it becomes available. When a process releases a resource controlled by a semaphore, it increments the semaphore and notifies the waiters. Semaphore behaviour is governed by the following kernel parameters:

Parameter   Description                            Minimum value
SEMMSL      Maximum semaphores per semaphore set   128
SEMMNS      Maximum semaphores system-wide
SEMOPM      Maximum operations per semop call
SEMMNI      Maximum number of semaphore sets

Calculate the minimum total number of semaphores currently required:

sum(PROCESSES parameter of every instance on this host) + what the OS itself needs + what other software on the host needs
  • Set SEMMNS to a value greater than this sum
  • Set SEMMSL to 256
  • Set SEMMNI to SEMMNS / SEMMSL

Check the current kernel settings:

$ ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128 // SEMMNI
max semaphores per array = 250 // SEMMSL
max semaphores system wide = 32000 // SEMMNS
max ops per semop call = 100 // SEMOPM
semaphore max value = 32767

By this configuration the system can in theory have up to 32000 semaphores system-wide, at most 128 semaphore sets, and at most 250 semaphores per set.

However, ipcs shows that each semaphore identifier holds at most 156 Oracle semaphores:

$ ipcs
...
------ Semaphore Arrays --------
key semid owner perms nsems
0x2ff4a110 5701651 oracle 640 156
0x2ff4a111 5734420 oracle 640 156
0x2ff4a112 5767189 oracle 640 156
0x2ff4a113 5799958 oracle 640 156
0x2ff4a114 5832727 oracle 640 156
0x2ff4a115 5865496 oracle 640 156
...

So the real usable total is 156 x 128 = 19968 semaphores, which means the maximum number of semaphore sets has to be increased.

[root@whzdb1 ~]# /sbin/sysctl -a | grep sem
kernel.sem = 250 32000 100 128

-- change to:
kernel.sem = 250 32000 100 200
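To apply the new limits immediately and keep them across reboots, something like the following can be used (a sketch; substitute your own calculated values):

# persist the new limits (edit the existing kernel.sem line if /etc/sysctl.conf already has one)
echo "kernel.sem = 250 32000 100 200" >> /etc/sysctl.conf
# reload and verify
/sbin/sysctl -p
/sbin/sysctl -a | grep sem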

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

After an RMAN restore to a different host, the database could not be opened:

Oracle Error: 
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '+DG/xxx/datafile/system01.dbf'

The restore used a level 0 RMAN backup of the source plus some of the later archived logs, copied to this test environment. After restoring the controlfile, an SCN-based recovery was run, using the SCN shown in the restored controlfile:

recover database until scn 75238614094;

Trying to open the database then failed:

16:54:07 sys. >alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '+DG/xxx/datafile/system01.dbf'

Yet the SCNs were all consistent, which was odd, because normally the database should open when the checkpoint_change# in these three views matches.

sys >select (select distinct checkpoint_change# from v$datafile) dfile_ckp,(select checkpoint_change# from v$database) db_ckp,(select distinct checkpoint_change# from v$datafile_header) dh_ckp from dual;

DFILE_CKP DB_CKP DH_CKP
------------------ ------------------ ------------------
75238614094 75238614094 75238614094

Check the state of the data file headers:

set numwidth 30;
set pagesize 50000;
alter session set nls_date_format = 'DD-MON-RRRR HH24:MI:SS';

sys.>select status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy from v$datafile_header group by status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, fuzzy;

STATUS CHECKPOINT_CHANGE# CHECKPOINT_TIME RESETLOGS_CHANGE# RESETLOGS_TIME COUNT(*) FUZ
------- ------------------ ----------------------- ------------------------------ ----------------------- ------------------------------ ---
ONLINE 75238614094 23-FEB-2019 00:15:46 67622732054 02-NOV-2018 22:00:36 45 NO
ONLINE 75238614094 23-FEB-2019 00:15:46 67622732054 02-NOV-2018 22:00:36 8 YES >>>>==== 8 files are in an inconsistent (fuzzy) state

A FUZZY value of YES means the data file was still being written to after the checkpoint, i.e. transactions with SCNs higher than the one stored in v$datafile_header.checkpoint_change# modified the file. To bring the data files back to a consistent state, redo has to be rolled forward.
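A minimal sketch for listing exactly which data files are still fuzzy and therefore need more redo applied:

-- data files that still need redo before the database can be opened
select file#, status, fuzzy, checkpoint_change#
from   v$datafile_header
where  fuzzy = 'YES'
order  by file#;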

Check which archived logs are needed to recover the data files:

select min(fhrba_Seq), max(fhrba_Seq) from X$KCVFH;

MIN(FHRBA_SEQ) MAX(FHRBA_SEQ)
------------------------------ ------------------------------
21201 21201

This result gives the lowest and highest log sequences required; to make the data files consistent, every log in between must be applied.

Check which controlfile is in use:

select name, controlfile_type from v$database ;

NAME CONTROL
------------------------------ -------
XXX BACKUP >>>>==== the restored (backup) controlfile is in use

Check the current redo logs:

select     substr(L.GROUP#,1,6)       GROUP#
,substr(L.THREAD#,1,7) THREAD#
,substr(L.SEQUENCE#,1,10) SEQUENCE#
,substr(L.MEMBERS,1,7) MEMBERS
,substr(L.ARCHIVED,1,8) ARCHIVED
,substr(L.STATUS,1,10) STATUS
,substr(L.FIRST_CHANGE#,1,16) FIRST_CHANGE#
,substr(LF.member,1,60) REDO_LOGFILE
from GV$LOG L, GV$LOGFILE LF
where L.GROUP# = LF.GROUP# ;

GROUP# THREAD# SEQUENCE# MEMBERS ARCHIV STATUS FIRST_CHANGE# REDO_LOGFILE
------------ -------------- -------------------- -------------- ------ -------------------- -------------------------------- ----------------------------------
1 1 21201 1 YES ACTIVE 75238614094 /home/oradata/xxx/redo01.log
2 1 21199 1 YES INACTIVE 75238587795 /home/oradata/xxx/redo02.log
11 1 21203 1 NO CURRENT 75238632711 /home/oradata/xxx/redo11.log >>>>==== current log, contains the most recent redo
10 1 21202 1 YES ACTIVE 75238624057 /home/oradata/xxx/redo10.log
3 1 21200 1 YES INACTIVE 75238601713 /home/oradata/xxx/redo03.log

Using the lowest checkpoint_change# from the data file headers queried earlier, find the log from which recovery must start. Every log from that sequence up to the current redo must be available for recovery to succeed.

 select thread#, sequence#, substr(name,1,80) from v$Archived_log
where 75238614094 between first_change# and next_change#;


THREAD# SEQUENCE# SUBSTR(NAME,1,80)
------------------------------ ------------------------------ ----------------------------------------------------------------------
1 21200 xxxx
1 21200

The result above means the logs from 21200 through 21203 are involved; recovery has already reached 21201, so two files, 21201 and 21202, still need to be applied.

Restore those archived logs from the backup set:

RMAN> RESTORE ARCHIVELOG FROM SEQUENCE 21201 UNTIL SEQUENCE 21202;

channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=21201
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=21202
channel ORA_DISK_1: reading from backup piece /bak/backup/arch_u8tqihpn_1_1
channel ORA_DISK_1: piece handle=/bak/backup/arch_u8tqihpn_1_1
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:10:35

First, fix the redo log paths in the controlfile: the original paths were on a file system while the new environment uses ASM, so all the redo logs were renamed:

select * from v$logfile;

GROUP# STATUS TYPE MEMBER IS_
---------- ------- ------- ---------------------------------------------------------------------- ---
3 ONLINE +REDO/xxx/redo03.log NO
2 ONLINE +REDO/xxx/redo02.log NO
1 ONLINE +REDO/xxx/redo01.log NO
4 STANDBY +REDO/xxx/stdredo01.log NO
5 STANDBY +REDO/xxx/stdredo02.log NO
6 STANDBY +REDO/xxx/stdredo03.log NO
7 STANDBY +REDO/xxx/stdredo04.log NO
10 ONLINE +REDO/xxx/redo10.log NO
11 ONLINE +REDO/xxx/redo11.log NO

ALTER DATABASE RENAME FILE '/home/oradata/xxx/redo03.log' to '+REDO/xxx/redo03.log';
ALTER DATABASE RENAME FILE '/home/oradata/xxx/redo02.log' to '+REDO/xxx/redo02.log';
ALTER DATABASE RENAME FILE '/home/oradata/xxx/redo01.log' to '+REDO/xxx/redo01.log';

ALTER DATABASE RENAME FILE '/home/oradata/xxx/stdredo01.log' to '+REDO/xxx/stdredo01.log';
ALTER DATABASE RENAME FILE '/home/oradata/xxx/stdredo02.log' to '+REDO/xxx/stdredo02.log';
ALTER DATABASE RENAME FILE '/home/oradata/xxx/stdredo03.log' to '+REDO/xxx/stdredo03.log';
ALTER DATABASE RENAME FILE '/home/oradata/xxx/stdredo04.log' to '+REDO/xxx/stdredo04.log';
ALTER DATABASE RENAME FILE '/home/oradata/xxx/redo10.log' to '+REDO/xxx/redo10.log';
ALTER DATABASE RENAME FILE '/home/oradata/xxx/redo11.log' to '+REDO/xxx/redo11.log';

Since a backup controlfile is in use, cancel-based recovery can be used:

22:52:18 sys. xxx>RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL ;
ORA-00279: change 75238614094 generated at 02/23/2019 00:15:46 needed for thread 1
ORA-00289: suggestion : +NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21201.664.1001796429
ORA-00280: change 75238614094 for thread 1 is in sequence #21201


22:52:35 Spexxxy log: {>RET>=suggested | filename | AUTO | CANCEL}
+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21201.664.1001796429
ORA-00279: change 75238624057 generated at 02/23/2019 00:19:25 needed for thread 1
ORA-00289: suggestion : +NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21202.663.1001796179
ORA-00280: change 75238624057 for thread 1 is in sequence #21202
ORA-00278: log file '+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21201.664.1001796429' no longer needed for this recovery


22:52:49 Spexxxy log: {>RET>=suggested | filename | AUTO | CANCEL}
+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21202.663.1001796179
ORA-00279: change 75238632711 generated at 02/23/2019 00:22:37 needed for thread 1
ORA-00289: suggestion : +nvmedg
ORA-00280: change 75238632711 for thread 1 is in sequence #21203
ORA-00278: log file '+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21202.663.1001796179' no longer needed for this recovery


22:53:03 Spexxxy log: {>RET>=suggested | filename | AUTO | CANCEL} >>>>==== no further logs are available for recovery
auto
ORA-00308: cannot open archived log '+nvmedg'
ORA-17503: ksfdopn:2 Failed to open file +nvmedg
ORA-15045: ASM file name '+nvmedg' is not in reference form


ORA-00308: cannot open archived log '+nvmedg'
ORA-17503: ksfdopn:2 Failed to open file +nvmedg
ORA-15045: ASM file name '+nvmedg' is not in reference form


ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '+NVMEDG/xxx/datafile/system01.dbf'

After applying 21201 and 21202 the database still would not open:

sys. >alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '+DG/xxx/datafile/system01.dbf'

So the data files are still inconsistent, and the last piece of redo that is needed sits in the current redo log. Since this is a test environment I do not have the current redo log, but I do have the archived logs after sequence 21203, so recovery can continue with those. The SCN to recover to can be found with the following query:

select  min(FHSCN) "LOW FILEHDR SCN"
, max(FHSCN) "MAX FILEHDR SCN"
, max(FHAFS) "Min PITR ABSSCN"
from X$KCVFH ;

LOW FILEHDR SCN MAX FILEHDR SCN Min PITR ABSSCN
---------------- ---------------- ----------------
75238632711 75238632711 75238770594

"LOW FILEHDR SCN" - the SCN from which the recovery process starts

"MAX FILEHDR SCN" - the SCN that must be reached for all data files to become consistent

IF "Min PITR ABSSCN" != 0 AND > "MAX FILEHDR SCN"

THEN "Min PITR ABSSCN" is the SCN that must be reached for all data files to become consistent

So by this rule recovery has to reach SCN 75238770594:
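The rule above can be folded into a single query so the target SCN does not have to be worked out by hand; a sketch over the same X$KCVFH columns used earlier:

-- SCN that recovery must reach so that every data file becomes consistent
select case
         when max(to_number(fhafs)) != 0
          and max(to_number(fhafs)) > max(to_number(fhscn)) then max(to_number(fhafs))
         else max(to_number(fhscn))
       end recover_until_scn
from   x$kcvfh;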

RMAN> recover database until scn 75238770594;

Starting recover at 2019-03-01 23:16:30
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=2065 device type=DISK

starting media recovery

channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=21203
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=21204
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=21205
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=21206
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=21207
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=21208
channel ORA_DISK_1: reading from backup piece /bak/backup/arch_u8tqihpn_1_1
channel ORA_DISK_1: piece handle=/bak/backup/arch_u8tqihpn_1_1
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:13:25
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21203.677.1001805529 thread=1 sequence=21203
channel default: deleting archived log(s)
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21203.677.1001805529 RECID=66515 STAMP=1001805655
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21204.678.1001805657 thread=1 sequence=21204
channel default: deleting archived log(s)
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21204.678.1001805657 RECID=66516 STAMP=1001805790
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21205.679.1001806059 thread=1 sequence=21205
channel default: deleting archived log(s)
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21205.679.1001806059 RECID=66517 STAMP=1001806190
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21206.675.1001805395 thread=1 sequence=21206
channel default: deleting archived log(s)
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21206.675.1001805395 RECID=66513 STAMP=1001805528
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21207.676.1001805395 thread=1 sequence=21207
channel default: deleting archived log(s)
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21207.676.1001805395 RECID=66512 STAMP=1001805524
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21208.674.1001805393 thread=1 sequence=21208
channel default: deleting archived log(s)
archived log file name=+NVMEDG/xxx/archivelog/2019_03_01/thread_1_seq_21208.674.1001805393 RECID=66514 STAMP=1001805536
media recovery complete, elapsed time: 00:00:17
Finished recover at 2019-03-01 23:30:16

This time the recovery completed cleanly. Check the data file SCNs and status again:

sys >select (select distinct checkpoint_change# from v$datafile) dfile_ckp,(select checkpoint_change# from v$database) db_ckp,(select distinct checkpoint_change# from v$datafile_header) dh_ckp from dual;

DFILE_CKP DB_CKP DH_CKP
------------------ ------------------ ------------------
75238770595 75238770595 75238770595

set numwidth 30;
set pagesize 50000;
alter session set nls_date_format = 'DD-MON-RRRR HH24:MI:SS';

select status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy from v$datafile_header group by status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, fuzzy;

STATUS CHECKPOINT_CHANGE# CHECKPOINT_TIME RESETLOGS_CHANGE# RESETLOGS_TIME COUNT(*) FUZ
------- ------------------------------ ----------------------- ------------------------------ ----------------------- ------------------------------ ---
ONLINE 75238770595 01-MAR-2019 23:36:12 75238770595 01-MAR-2019 23:36:12 53 NO

Try to open the database:

alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01139: RESETLOGS option only valid after an incomplete database recovery


alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 97239
Session ID: 1921 Serial number: 1

Recreate the controlfile:

CREATE CONTROLFILE REUSE DATABASE "XXX" NORESETLOGS  ARCHIVELOG
...... (rest of the CREATE CONTROLFILE statement omitted)

select name, controlfile_type from v$database ;

NAME CONTROLFILE_TY
------------------------------ --------------
XXX CREATED

alter database open;

Database altered.

The recovery is now complete.

ORA-16086: standby database does not contain available standby log files

Checking the alert e-mails in the morning, a 10.2.0.5 Data Guard primary kept reporting ORA-16086, which indicates that the standby has no usable standby log files. From 11g onward the message text was changed to "ORA-16086: Redo data cannot be written to the standby redo log".

16086, 0000, "standby database does not contain available standby log files"
// *Cause: The primary database is in "no data loss" mode, but the standby
// database does not contain any "standby log files".
// *Action: Add one or more standby log files to the standby database.
// This can be done while the standby database is mounted.

Primary/standby synchronization was still working: log switches were shipped to the standby and applied normally.

Primary:
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u01/arch
Oldest online log sequence 384007
Next log sequence to archive 384012
Current log sequence 384012

Standby:
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u01/arch
Oldest online log sequence 383998
Next log sequence to archive 0
Current log sequence 384012

Both primary and standby are in maximum availability mode:

SQL> select protection_mode,protection_level from v$database;

PROTECTION_MODE PROTECTION_LEVEL
-------------------- --------------------
MAXIMUM AVAILABILITY RESYNCHRONIZATION

The standby log files exist on both primary and standby, with matching sizes and normal status:

SQL> select * from v$logfile;

GROUP# STATUS TYPE MEMBER IS_
---------- ------- ------- -------------------------------------------------------------------------------- ---
1 ONLINE /u01/app/oracle/oradata/test/redo01.log NO
2 ONLINE /u01/app/oracle/oradata/test/redo02.log NO
3 ONLINE /u01/app/oracle/oradata/test/redo03.log NO
4 ONLINE /u01/app/oracle/oradata/test/redo04.log NO
5 ONLINE /u01/app/oracle/oradata/test/redo05.log NO
6 ONLINE /u01/app/oracle/oradata/test/redo06.log NO
7 STANDBY /u01/app/oracle/oradata/test/redo07.log NO
8 STANDBY /u01/app/oracle/oradata/test/redo08.log NO
9 STANDBY /u01/app/oracle/oradata/test/redo09.log NO
10 STANDBY /u01/app/oracle/oradata/test/redo10.log NO
11 STANDBY /u01/app/oracle/oradata/test/redo11.log NO
12 STANDBY /u01/app/oracle/oradata/test/redo12.log NO
13 STANDBY /u01/app/oracle/oradata/test/redo13.log NO

MOS mentions a bug triggered by a full FRA, but the FRA is not even enabled on this database:

SQL> show parameter recovery

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest string
db_recovery_file_dest_size big integer 0
recovery_parallelism integer 0

Try rebuilding the standby log files, dropping and recreating them on both primary and standby:

alter database drop standby logfile group 7;
alter database drop standby logfile group 8;
alter database drop standby logfile group 9;
alter database drop standby logfile group 10;
alter database drop standby logfile group 11;
alter database drop standby logfile group 12;
alter database drop standby logfile group 13;

select 'alter database add standby logfile group '||group#||' '''||member||''' size 1g reuse;' from v$logfile where type='STANDBY';

recover managed standby database cancel;

recover managed standby database using current logfile disconnect from session;
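The SELECT above only generates the matching ADD statements; each one comes out like this (file name taken from the v$logfile listing earlier) and has to be run for every standby group on both primary and standby:

alter database add standby logfile group 7 '/u01/app/oracle/oradata/test/redo07.log' size 1g reuse;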

After watching for an hour, no further alerts appeared.

An ORA-04031 case ("shared pool","…","SQLA","tmp")

An alert came in today: an 11.2.0.4 database was throwing ORA-04031.

SYS@> show sga        
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","SELECT DECODE(null,'','Total...","SQLA","tmp")

The error arguments show the problem is in the shared pool:

[oracle@xxx trace]$ oerr ora 4031
04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\",\"%s\")"
// *Cause: More shared memory is needed than was allocated in the shared
// pool or Streams pool.
// *Action: If the shared pool is out of memory, either use the
// DBMS_SHARED_POOL package to pin large packages,
// reduce your use of shared memory, or increase the amount of
// available shared memory by increasing the value of the
// initialization parameters SHARED_POOL_RESERVED_SIZE and
// SHARED_POOL_SIZE.
// If the large pool is out of memory, increase the initialization
// parameter LARGE_POOL_SIZE.
// If the error is issued from an Oracle Streams or XStream process,
// increase the initialization parameter STREAMS_POOL_SIZE or increase
// the capture or apply parameter MAX_SGA_SIZE.

The SMON trace contains many waits for SGA components to grow:

Session Wait History:
elapsed time of 0.261076 sec since last wait
0: waited for 'SGA: allocation forcing component growth'
=0x0, =0x0, =0x0
wait_id=2337078 seq_num=43657 snap_id=101
wait times: snap=0.000000 sec, exc=5.069536 sec, total=13.674583 sec
wait times: max=infinite
wait counts: calls=100 os=100
occurred after 0.000000 sec of elapsed time
1: waited for 'SGA: allocation forcing component growth'
=0x0, =0x0, =0x0
wait_id=2337178 seq_num=43656 snap_id=1
wait times: snap=0.003531 sec, exc=0.003531 sec, total=0.003531 sec
wait times: max=infinite
wait counts: calls=2 os=2
occurred after 0.000000 sec of elapsed time
2: waited for 'SGA: allocation forcing component growth'
=0x0, =0x0, =0x0
wait_id=2337078 seq_num=43655 snap_id=100
wait times: snap=0.050112 sec, exc=5.069536 sec, total=13.671052 sec
wait times: max=infinite
wait counts: calls=100 os=100
occurred after 0.000000 sec of elapsed time
... (remaining wait history omitted)

This means the memory allocated to the SGA has been used up and cannot support further component growth. The database uses SGA_TARGET, so the SGA components resize themselves automatically:

PARAMETER_NAME                                               TYPE        VALUE
------------------------------------------------------------ ----------- ------------------------------------------------------
sga_target big integer 12G

There are 6 sub-pools in the shared pool, and in each of them most of the memory, around 1 GB, is allocated to "KGH: NO ACCESS".

==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 1
----------------------------------------------
"KGH: NO ACCESS " 1118 MB 67%
"free memory " 205 MB 12%
"KGLH0 " 86 MB 5%
"SQLA " 70 MB 4%
"KGLS " 21 MB 1%
"KGLHD " 15 MB 1%
"VIRTUAL CIRCUITS " 15 MB 1%
"db_block_hash_buckets " 13 MB 1%
"kglsim object batch " 10 MB 1%
"private strands " 9443 KB 1%
-----------------------------------------
free memory 205 MB
memory alloc. 1459 MB
Sub total 1664 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 2
----------------------------------------------
"KGH: NO ACCESS " 1277 MB 77%
"free memory " 260 MB 16%
"db_block_hash_buckets " 13 MB 1%
"kglsim object batch " 11 MB 1%
"ASH buffers " 10 MB 1%
"private strands " 9443 KB 1%
"event statistics per sess " 8335 KB 0%
"ksunfy : SSO free list " 7929 KB 0%
"dbktb: trace buffer " 6848 KB 0%
"kglsim heap " 6444 KB 0%
-----------------------------------------
free memory 260 MB
memory alloc. 1404 MB
Sub total 1664 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 3
----------------------------------------------
"KGH: NO ACCESS " 1086 MB 68%
"free memory " 192 MB 12%
"KGLH0 " 94 MB 6%
"SQLA " 39 MB 2%
"FileOpenBlock " 30 MB 2%
"KGLHD " 16 MB 1%
"db_block_hash_buckets " 13 MB 1%
"enqueue " 12 MB 1%
"kglsim object batch " 11 MB 1%
"KGLS " 10 MB 1%
-----------------------------------------
free memory 192 MB
memory alloc. 1408 MB
Sub total 1600 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 4
----------------------------------------------
"KGH: NO ACCESS " 1182 MB 67%
"free memory " 243 MB 14%
"KGLH0 " 81 MB 5%
"SQLA " 48 MB 3%
"KQR M PO " 38 MB 2%
"KGLS " 17 MB 1%
"db_block_hash_buckets " 13 MB 1%
"KGLHD " 12 MB 1%
"kglsim object batch " 10 MB 1%
"ASH buffers " 10 MB 1%
-----------------------------------------
free memory 243 MB
memory alloc. 1517 MB
Sub total 1760 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 5
----------------------------------------------
"KGH: NO ACCESS " 1271 MB 74%
"free memory " 221 MB 13%
"SQLA " 51 MB 3%
"KGLH0 " 45 MB 3%
"db_block_hash_buckets " 13 MB 1%
"ASH buffers " 10 MB 1%
"kglsim object batch " 9622 KB 1%
"private strands " 9443 KB 1%
"event statistics per sess " 8335 KB 0%
"ksunfy : SSO free list " 7942 KB 0%
-----------------------------------------
free memory 221 MB
memory alloc. 1507 MB
Sub total 1728 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 6
----------------------------------------------
"KGH: NO ACCESS " 1182 MB 70%
"free memory " 232 MB 14%
"KGLH0 " 83 MB 5%
"SQLA " 48 MB 3%
"KGLHD " 13 MB 1%
"db_block_hash_buckets " 13 MB 1%
"KGLS " 13 MB 1%
"kglsim object batch " 12 MB 1%
"private strands " 9576 KB 1%
"event statistics per sess " 8335 KB 0%
-----------------------------------------
free memory 232 MB
memory alloc. 1464 MB
Sub total 1696 MB
TOTALS ---------------------------------------
Total free memory 1571 MB
Total memory alloc. 10 GB
Grand total 11 GB
==============================================

Within each sub-pool there are 4 durations: "instance", "session", "cursor" and "execution". The idea is to separate allocations by how they are requested from the shared pool and how long they live, so that requests likely to cause heavy fragmentation are kept apart from those that can be reused; the durations are independent and do not interfere with each other. For example, cursor memory comes from duration 2, execution memory from duration 3, and so on.

(Figure: releases before 12c)

(Figure: 12c and later)

To make full use of the memory in the shared pool, the duration split can be removed with a hidden parameter:

       NUM N_HEX NAME                                                   VALUE                          DESCRIPTION
---------- ----- ------------------------------------------------------ ------------------------------ ---------------------------------------------
111 6F _enable_shared_pool_durations TRUE temporary to disable/enable kgh policy

The final fix:

SQL> alter system set "_enable_shared_pool_durations"=false scope=spfile;
- restart the database

设置"_enable_shared_pool_durations = false"的主要好处在于所有的durations会合并成一个池,所以就不会出现其中一个duration内存不足而其他duration还有空余的情况。

This is addressed in releases after 12c, where architectural changes allow the sub-pools to rely less on durations.