Updating Large Tables with DBMS_REDEFINITION

Updating a large table typically generates a large amount of redo and undo and consumes considerable resources. If the updated column is long, it also causes plenty of row migration, because some blocks will inevitably be unable to hold the enlarged rows.

A common alternative is to replace the update DML on these large tables with DDL. The first idea that comes to mind is usually CTAS followed by a rename. But that brings problems of its own: the table's other attributes are not copied to the new table, and constraints, indexes, and so on all have to be rebuilt, which is tedious, easy to get wrong, and requires downtime for the table change. Online redefinition avoids all of that: the package automatically carries the dependent objects over for us. We usually reach for online redefinition to change a table's storage attributes, but don't forget that it also supports column mapping. As long as the column mapping is deterministic, we can get much more out of it, including update operations.

First, create two test tables, T1 and T2. T1 is the table to be updated; T2 is the table T1 joins to for the updated column.

drop table t1 purge;
create table t1 as
select object_id ,object_name
from dba_objects
where object_id is not null;

drop table t2 purge;
create table t2 as
select object_id ,object_name ,object_id||object_name cname
from dba_objects
where object_id is not null;

alter table t1
add constraint pk_t1
primary key (object_id);

alter table t2
add constraint pk_t2
primary key (object_id);

Now add a column cname to T1, then populate T1.cname from T2.cname by joining the two tables on object_id.

alter table t1 add cname varchar2(200);

xb@PDB12C> select * from t1 where rownum<10;

OBJECT_ID OBJECT_NAME CNAME
---------- ----------------------------------- --------------
9 I_FILE#_BLOCK#
38 I_OBJ3
45 I_TS1
51 I_CON1
19 IND$
31 CDEF$
6 C_TS#
58 I_CCOL2
24 I_PROXY_DATA$

xb@PDB12C> select * from t2 where rownum<10;

OBJECT_ID OBJECT_NAME CNAME
---------- -------------------- ---------------------------
9 I_FILE#_BLOCK# 9I_FILE#_BLOCK#
38 I_OBJ3 38I_OBJ3
45 I_TS1 45I_TS1
51 I_CON1 51I_CON1
19 IND$ 19IND$
31 CDEF$ 31CDEF$
6 C_TS# 6C_TS#
58 I_CCOL2 58I_CCOL2
24 I_PROXY_DATA$ 24I_PROXY_DATA$

Column mapping in DBMS_REDEFINITION cannot use subqueries or table joins, but it can use deterministic expressions; in this example a deterministic function does the job.

create or replace
function get_cname(v_id int ) return varchar2 deterministic is
f varchar2(200);
begin
select cname
into f
from t2
where object_id = v_id;
return f;
end;
/

Create an interim table to hold the transformed data during the redefinition.

drop table t_tmp purge;
create table t_tmp (
object_id number,
object_name varchar2(200),
cname varchar2(200));

Now we can use the DBMS_REDEFINITION package. The key point here is the COL_MAPPING parameter, which is configured to call the get_cname function created above to fetch the data that updates T1.

declare
l_colmap varchar(512);
begin
l_colmap :=
'OBJECT_ID
,OBJECT_NAME
,get_cname(OBJECT_ID) cname';

dbms_redefinition.start_redef_table
( uname => user,
orig_table => 'T1',
int_table => 'T_TMP',
col_mapping => l_colmap );
end;
/

There is no need to worry about the other dependent objects here; COPY_TABLE_DEPENDENTS takes care of them all.

var nerrors number
begin
dbms_redefinition.copy_table_dependents
( user, 'T1', 'T_TMP',
copy_indexes => dbms_redefinition.cons_orig_params,
num_errors => :nerrors );
end;
/

xb@PDB12C> print nerrors

NERRORS
----------
0

-- Finish
begin
dbms_redefinition.finish_redef_table
( user, 'T1', 'T_TMP' );
end;
/

Finally, check the data in T1.

xb@PDB12C> select * from t1 where rownum<10;

OBJECT_ID OBJECT_NAME CNAME
---------- -------------------- ----------------------
9 I_FILE#_BLOCK# 9I_FILE#_BLOCK#
38 I_OBJ3 38I_OBJ3
45 I_TS1 45I_TS1
51 I_CON1 51I_CON1
19 IND$ 19IND$
31 CDEF$ 31CDEF$
6 C_TS# 6C_TS#
58 I_CCOL2 58I_CCOL2
24 I_PROXY_DATA$ 24I_PROXY_DATA$

Online redefinition avoids interrupting the application and spares us from dealing with dependent objects, which makes it especially effective for operations on large tables.

Updating Large Tables with DBMS_PARALLEL_EXECUTE

The DBMS_PARALLEL_EXECUTE package splits a task into many small chunks and completes them concurrently, improving efficiency.

This technique helps whenever you run an update against a very large table:

  • It locks only a small subset of the data at any one time, rather than the whole table
  • If some chunks fail before the overall task completes, the chunks that have already finished are unaffected
  • It reduces rollback segment consumption
  • It improves throughput

Note that DBMS_PARALLEL_EXECUTE runs its work through jobs, so the parameter job_queue_processes must be greater than 0; otherwise the task will sit forever on the "PL/SQL lock timer" wait event.

DBMS_PARALLEL_EXECUTE offers several ways to chunk a table:

  1. CREATE_CHUNKS_BY_NUMBER_COL chunks the table on a specified column, which must be numeric; it finds the column's minimum and maximum values and divides that range evenly by the chunk size.
  2. CREATE_CHUNKS_BY_ROWID chunks the table by rowid, so the table must be physically stored with real rowid addresses; index-organized tables and the like are not supported.
  3. CREATE_CHUNKS_BY_SQL chunks the table using a user-supplied SELECT statement.
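The arithmetic behind option 1 is easy to picture. The sketch below (Python as an analogy, not Oracle code; the function name is made up for illustration) mimics how CREATE_CHUNKS_BY_NUMBER_COL carves the column's MIN..MAX range into inclusive chunks:

```python
def chunks_by_number_col(min_id, max_id, chunk_size):
    """Mimic CREATE_CHUNKS_BY_NUMBER_COL: split the inclusive value range
    [min_id, max_id] into (start_id, end_id) pairs covering chunk_size
    values each; each pair becomes a job's :start_id/:end_id binds."""
    chunks = []
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        chunks.append((start, end))
        start = end + 1
    return chunks

# 500,000 consecutive ids in chunks of 2,500 -> 200 ranges
print(len(chunks_by_number_col(1, 500000, 2500)))   # 200
print(chunks_by_number_col(1, 500000, 2500)[:2])    # [(1, 2500), (2501, 5000)]
```

Note this splits by value, not by row count: if the column has gaps, the chunks will hold unequal numbers of rows, which is true of the real procedure as well.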

Prerequisites for using DBMS_PARALLEL_EXECUTE

  • To run multiple chunk jobs concurrently you must have the CREATE JOB privilege
  • You need execute privilege on the DBMS_SQL package, because the CHUNK_BY_SQL, RUN_TASK, and RESUME_TASK subprograms all issue their queries through DBMS_SQL

Below, create a test table and insert some data.

DROP TABLE TEST_TAB;
CREATE TABLE test_tab
(
REC_ID NUMBER,
DESCRIPTION VARCHAR2(50),
NUM_COL NUMBER,
CONSTRAINT TEST_TAB_PK PRIMARY KEY (REC_ID)
);

INSERT /*+ APPEND */ INTO
TEST_TAB
SELECT LEVEL,
'Description for ' || LEVEL,
CASE
WHEN MOD(level, 5) = 0 THEN 10
WHEN MOD(level, 3) = 0 THEN 20
ELSE 30
END
FROM DUAL
CONNECT BY LEVEL <= 500000;
COMMIT;

SELECT NUM_COL, COUNT(*) FROM TEST_TAB
GROUP BY NUM_COL ORDER BY NUM_COL;

NUM_COL COUNT(*)
---------- ----------
10 100000
20 133333
30 266667

Create the task

BEGIN
DBMS_PARALLEL_EXECUTE.create_task (task_name => 'test_task');
END;
/

Information about the newly created task can be queried from a view

COLUMN task_name FORMAT A10
SELECT task_name,
status
FROM user_parallel_execute_tasks;

TASK_NAME STATUS
---------- -------------------
test_task CREATED

Here is a chunking example, by rowid

BEGIN
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID
(
TASK_NAME => 'test_task',
TABLE_OWNER => 'XB',
TABLE_NAME => 'TEST_TAB',
BY_ROW => TRUE,
CHUNK_SIZE => 2500
);
END;
/

Check the task status again

COLUMN task_name FORMAT A10
SELECT task_name,
status
FROM user_parallel_execute_tasks;

TASK_NAME STATUS
---------- -------------------
test_task CHUNKED

Look at the details of each individual chunk

SELECT chunk_id, status, start_rowid, end_rowid
FROM user_parallel_execute_chunks
WHERE task_name = 'test_task'
ORDER BY chunk_id;

CHUNK_ID STATUS START_ROWID END_ROWID
---------- -------------------- ------------------ ------------------
257 UNASSIGNED AAASyXAAMAAAACAAAA AAASyXAAMAAAAC5H//
258 UNASSIGNED AAASyXAAMAAAAC6AAA AAASyXAAMAAAADrH//
259 UNASSIGNED AAASyXAAMAAAADsAAA AAASyXAAMAAAAGVH//
260 UNASSIGNED AAASyXAAMAAAAGWAAA AAASyXAAMAAAAHHH//
261 UNASSIGNED AAASyXAAMAAAAHIAAA AAASyXAAMAAAAH5H//
262 UNASSIGNED AAASyXAAMAAAAH6AAA AAASyXAAMAAAAIrH//
263 UNASSIGNED AAASyXAAMAAAAIsAAA AAASyXAAMAAAAJdH//
264 UNASSIGNED AAASyXAAMAAAAJeAAA AAASyXAAMAAAAKPH//
265 UNASSIGNED AAASyXAAMAAAAKQAAA AAASyXAAMAAAALBH//
266 UNASSIGNED AAASyXAAMAAAALCAAA AAASyXAAMAAAALzH//
267 UNASSIGNED AAASyXAAMAAAAL0AAA AAASyXAAMAAAAMlH//
268 UNASSIGNED AAASyXAAMAAAAMmAAA AAASyXAAMAAAANXH//
269 UNASSIGNED AAASyXAAMAAAANYAAA AAASyXAAMAAAAOJH//
270 UNASSIGNED AAASyXAAMAAAAOKAAA AAASyXAAMAAAAO7H//
271 UNASSIGNED AAASyXAAMAAAAO8AAA AAASyXAAMAAAAPtH//
272 UNASSIGNED AAASyXAAMAAAAPuAAA AAASyXAAMAAAAQfH//
273 UNASSIGNED AAASyXAAMAAAAQgAAA AAASyXAAMAAAARRH//
274 UNASSIGNED AAASyXAAMAAAARSAAA AAASyXAAMAAAASDH//
275 UNASSIGNED AAASyXAAMAAAASEAAA AAASyXAAMAAAAS1H//
276 UNASSIGNED AAASyXAAMAAAAS2AAA AAASyXAAMAAAATnH//
277 UNASSIGNED AAASyXAAMAAAAToAAA AAASyXAAMAAAAUZH//
278 UNASSIGNED AAASyXAAMAAAAUaAAA AAASyXAAMAAAAVLH//
279 UNASSIGNED AAASyXAAMAAAAVMAAA AAASyXAAMAAAAV9H//
280 UNASSIGNED AAASyXAAMAAAAV+AAA AAASyXAAMAAAAWvH//
281 UNASSIGNED AAASyXAAMAAAAWwAAA AAASyXAAMAAAAXhH//
282 UNASSIGNED AAASyXAAMAAAAXiAAA AAASyXAAMAAAAYTH//
283 UNASSIGNED AAASyXAAMAAAAYUAAA AAASyXAAMAAAAZFH//
284 UNASSIGNED AAASyXAAMAAAAZGAAA AAASyXAAMAAAAZ3H//
285 UNASSIGNED AAASyXAAMAAAAZ4AAA AAASyXAAMAAAAapH//
286 UNASSIGNED AAASyXAAMAAAAaqAAA AAASyXAAMAAAAbbH//
287 UNASSIGNED AAASyXAAMAAAAbcAAA AAASyXAAMAAAAcNH//
288 UNASSIGNED AAASyXAAMAAAAcOAAA AAASyXAAMAAAAc/H//
289 UNASSIGNED AAASyXAAMAAAAdAAAA AAASyXAAMAAAAdxH//
290 UNASSIGNED AAASyXAAMAAAAdyAAA AAASyXAAMAAAAejH//
291 UNASSIGNED AAASyXAAMAAAAekAAA AAASyXAAMAAAAfVH//
292 UNASSIGNED AAASyXAAMAAAAfWAAA AAASyXAAMAAAAgHH//
293 UNASSIGNED AAASyXAAMAAAAgIAAA AAASyXAAMAAAAg5H//
294 UNASSIGNED AAASyXAAMAAAAg6AAA AAASyXAAMAAAAhrH//
295 UNASSIGNED AAASyXAAMAAAAhsAAA AAASyXAAMAAAAidH//
296 UNASSIGNED AAASyXAAMAAAAieAAA AAASyXAAMAAAAjPH//
297 UNASSIGNED AAASyXAAMAAAAjQAAA AAASyXAAMAAAAkBH//
298 UNASSIGNED AAASyXAAMAAAAkCAAA AAASyXAAMAAAAkzH//
299 UNASSIGNED AAASyXAAMAAAAk0AAA AAASyXAAMAAAAllH//
300 UNASSIGNED AAASyXAAMAAAAlmAAA AAASyXAAMAAAAmXH//
301 UNASSIGNED AAASyXAAMAAAAmYAAA AAASyXAAMAAAAnJH//
302 UNASSIGNED AAASyXAAMAAAAnKAAA AAASyXAAMAAAAn7H//
303 UNASSIGNED AAASyXAAMAAAAn8AAA AAASyXAAMAAAAotH//
304 UNASSIGNED AAASyXAAMAAAAouAAA AAASyXAAMAAAApfH//
305 UNASSIGNED AAASyXAAMAAAApgAAA AAASyXAAMAAAAqRH//
306 UNASSIGNED AAASyXAAMAAAAqSAAA AAASyXAAMAAAArDH//
307 UNASSIGNED AAASyXAAMAAAArEAAA AAASyXAAMAAAAr1H//
308 UNASSIGNED AAASyXAAMAAAAr2AAA AAASyXAAMAAAAr/H//

52 rows selected.

Complete scripts

Chunking with CREATE_CHUNKS_BY_ROWID

SET SERVEROUTPUT ON
BEGIN
DBMS_PARALLEL_EXECUTE.DROP_TASK ('test_task');
EXCEPTION WHEN OTHERS THEN
NULL;
END;
/

DECLARE
l_task VARCHAR2(30) := 'test_task';
l_sql_stmt VARCHAR2(32767);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK (task_name => l_task);

-- Chunk the table by the ROWID
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID
(
TASK_NAME => l_task,
TABLE_OWNER => 'XB', /* Replace the TABLE_OWNER as appropriately */
TABLE_NAME => 'TEST_TAB',
BY_ROW => TRUE,
CHUNK_SIZE => 2500
);

-- DML to be execute in parallel
l_sql_stmt := 'UPDATE test_tab t SET t.num_col = t.num_col + 10 WHERE rowid BETWEEN :start_id AND :end_id';

-- Run the task
DBMS_PARALLEL_EXECUTE.RUN_TASK
(
TASK_NAME => l_task,
SQL_STMT => l_sql_stmt,
LANGUAGE_FLAG => DBMS_SQL.NATIVE,
PARALLEL_LEVEL => 10
);

-- If there is error, RESUME it for at most 2 times.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);

WHILE(l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.RESUME_TASK(l_task);
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);
END LOOP;

-- Done with processing; drop the task
DBMS_PARALLEL_EXECUTE.DROP_TASK(l_task);
EXCEPTION WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE('Error in the code :' || SQLERRM);
END;
/

Chunking with CREATE_CHUNKS_BY_NUMBER_COL

SET SERVEROUTPUT ON

DECLARE
l_task VARCHAR2(30) := 'test_task';
l_sql_stmt VARCHAR2(32767);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK (task_name => l_task);

-- Chunk the table by the ID
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_NUMBER_COL
(
TASK_NAME => l_task,
TABLE_OWNER => 'XB', /* Replace the TABLE_OWNER value as appropriately */
TABLE_NAME => 'TEST_TAB',
TABLE_COLUMN => 'REC_ID',
CHUNK_SIZE => 2500
);

-- Procedure to be execute in parallel
l_sql_stmt := 'BEGIN PROCESS_UPDATE(:start_id, :end_id); END;';

DBMS_PARALLEL_EXECUTE.RUN_TASK
(
TASK_NAME => l_task,
SQL_STMT => l_sql_stmt,
LANGUAGE_FLAG => DBMS_SQL.NATIVE,
PARALLEL_LEVEL => 10
);

-- If there is error, RESUME it for at most 2 times.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);

WHILE(l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.RESUME_TASK(l_task);
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);
END LOOP;

-- Done with processing; drop the task
DBMS_PARALLEL_EXECUTE.DROP_TASK(l_task);
EXCEPTION WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE('Error in the code :' || SQLERRM);
END;
/

Chunking with CREATE_CHUNKS_BY_SQL

SET SERVEROUTPUT ON
DECLARE
l_chunk_sql VARCHAR2(1000);
l_sql_stmt VARCHAR2(1000);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK ('test_task');
-- Chunk the table by NUM_COL
l_chunk_sql := 'SELECT DISTINCT NUM_COL, NUM_COL FROM TEST_TAB';
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL
(
TASK_NAME => 'test_task',
SQL_STMT => l_chunk_sql,
BY_ROWID => false
);

-- Execute the DML in parallel
-- the WHERE clause contain a condition on num_col, which is the chunk
-- column. In this case, grouping rows is by num_col.
l_sql_stmt := 'UPDATE test_tab t SET t.num_col = t.num_col + 10 WHERE num_col BETWEEN :start_id AND :end_id';
DBMS_PARALLEL_EXECUTE.RUN_TASK
(
TASK_NAME => 'test_task',
SQL_STMT => l_sql_stmt,
LANGUAGE_FLAG => DBMS_SQL.NATIVE,
PARALLEL_LEVEL => 10
);

-- If there is error, RESUME it for at most 2 times.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('test_task');

WHILE(l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.RESUME_TASK('test_task');
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('test_task');
END LOOP;

-- Done with processing; drop the task
DBMS_PARALLEL_EXECUTE.DROP_TASK('test_task');
EXCEPTION WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE('Error in the code :' || SQLERRM);
END;
/

Pipelined Table Functions

Table Functions

A table function is a user-defined PL/SQL function that returns a collection of rows (an associative array or nested table). You can query it like an ordinary table through the TABLE expression in a SELECT statement:

SELECT * FROM TABLE(table_function_name(parameter_list))

A table function can also take a collection of rows as input; that is, its arguments can be nested tables, arrays, cursors, and so on. So the result set of table function tf1 can be fed into tf2, the result of tf2 into tf3, and so on.

A traditional table function must populate the entire collection before it can return, and that collection is held in memory, which is why table functions are a poor fit for operations on large data volumes.

First, build a regular table function example

DROP TYPE t_tf_type;
DROP TYPE t_tf_obj;

CREATE TYPE t_tf_obj AS OBJECT (
id NUMBER,
name VARCHAR2(50)
);
/

CREATE TYPE t_tf_type IS TABLE OF t_tf_obj;
/

CREATE OR REPLACE FUNCTION get_tab (cnt IN NUMBER) RETURN t_tf_type AS
l_tab t_tf_type := t_tf_type();
BEGIN
FOR i IN 1 .. cnt LOOP
l_tab.extend;
l_tab(l_tab.last) := t_tf_obj(i, 'Name is ' || i);
END LOOP;

RETURN l_tab;
END;
/

XB@ora12c> SELECT * FROM TABLE(get_tab(10));

ID NAME
---------- -------------------------
1 Name is 1
2 Name is 2
3 Name is 3
4 Name is 4
5 Name is 5
6 Name is 6
7 Name is 7
8 Name is 8
9 Name is 9
10 Name is 10

10 rows selected.

On version 12.2 and later, you can even drop the TABLE keyword

XB@ora12c> SELECT * FROM   get_tab(10);

ID NAME
---------- --------------------------------
1 Name is 1
2 Name is 2
3 Name is 3
4 Name is 4
5 Name is 5
6 Name is 6
7 Name is 7
8 Name is 8
9 Name is 9
10 Name is 10

10 rows selected.

Pipelined Table Functions

To improve the performance of a table function, you can:

  • enable parallel execution with the PARALLEL_ENABLE option
  • stream the function's result set directly to the next process with Oracle streams
  • pipeline the function's results with the PIPELINED option

This article focuses on the last one.

A pipelined table function returns each row to its caller as soon as the row is produced, then moves on to fetch the next one. Because it does not wait for the entire collection to be built, it can greatly improve response time. It also saves memory, since the whole result-set object never needs to be buffered.
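For readers more familiar with other languages, the difference is exactly that between returning a list and using a generator. The Python sketch below is an analogy only (not Oracle code): it mirrors get_tab versus a PIPELINED version, with PIPE ROW playing the role of yield.

```python
# Regular table function: build the whole collection, then return it.
# The caller sees nothing until every row has been materialized.
def get_tab(cnt):
    rows = []
    for i in range(1, cnt + 1):
        rows.append((i, f"Name is {i}"))
    return rows

# Pipelined table function: hand back each row as it is produced.
# PIPE ROW corresponds to yield; the full collection is never buffered.
def get_tab_ptf(cnt):
    for i in range(1, cnt + 1):
        yield (i, f"Name is {i}")

# The first row is available immediately, even for a huge row count.
print(next(iter(get_tab_ptf(10**9))))   # (1, 'Name is 1')
```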

A pipelined table function includes the PIPELINED clause and calls PIPE ROW to send each row back the moment it is produced, without waiting for the whole result to be built.

CREATE OR REPLACE FUNCTION get_tab_ptf (cnt IN NUMBER) RETURN t_tf_type PIPELINED AS
BEGIN
FOR i IN 1 .. cnt LOOP
PIPE ROW(t_tf_obj(i, 'Name is ' || i));
END LOOP;

RETURN;
END;
/

XB@ora12c> SELECT * FROM get_tab_ptf(10);

ID NAME
---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
1 Name is 1
2 Name is 2
3 Name is 3
4 Name is 4
5 Name is 5
6 Name is 6
7 Name is 7
8 Name is 8
9 Name is 9
10 Name is 10

10 rows selected.

When you work with large data volumes, this performance advantage becomes very noticeable.

The NO_DATA_NEEDED Exception

You need to understand the situations in which NO_DATA_NEEDED must be handled:

  • The block containing the PIPE ROW statement has an OTHERS exception handler

  • The code you wrote after the PIPE ROW statement performs cleanup

    Typically, the cleanup releases resources the code no longer needs.

If a pipelined table function produces more rows than the invoking statement needs, the calls to the function stop while the invoking statement keeps running. If a NO_DATA_NEEDED handler is in place at that point, it can release the resources that are no longer required.
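Python generators have a direct analogue of this behavior: when the consumer stops early, a GeneratorExit is raised inside the generator, giving it one chance to release resources, much as NO_DATA_NEEDED does for a pipelined function. A sketch of the analogy (not Oracle code):

```python
cleanup_log = []

def row_source(cnt):
    # Stands in for a pipelined function holding a cursor or file handle.
    try:
        for i in range(1, cnt + 1):
            yield i                      # the analogue of PIPE ROW
    except GeneratorExit:                # the analogue of NO_DATA_NEEDED
        cleanup_log.append('released')   # free the held resources here
        raise                            # then let the shutdown proceed

g = row_source(10)
first_five = [next(g) for _ in range(5)]
g.close()                                # caller needs no more rows
print(first_five, cleanup_log)           # [1, 2, 3, 4, 5] ['released']
```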

For example, here is a version with no exception handler

CREATE OR REPLACE FUNCTION get_tab_ptf (cnt IN NUMBER) RETURN t_tf_type PIPELINED AS
BEGIN
FOR i IN 1 .. cnt LOOP
PIPE ROW(t_tf_obj(i, 'Name is ' || i));
END LOOP;
RETURN;
END;
/


Function created.

XB@ora12c> SELECT * FROM get_tab_ptf(10) where rownum<=5;

ID NAME
---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
1 Name is 1
2 Name is 2
3 Name is 3
4 Name is 4
5 Name is 5

The output shows no error message, and only 5 rows came back. Adding an exception handler shows that an exception does in fact occur

set serveroutput on

CREATE OR REPLACE FUNCTION get_tab_ptf (cnt IN NUMBER) RETURN t_tf_type PIPELINED AS
BEGIN
FOR i IN 1 .. cnt LOOP
PIPE ROW(t_tf_obj(i, 'Name is ' || i));
END LOOP;
RETURN;
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.put_line('EXCEPTION');
RAISE;
END;
/

XB@ora12c> SELECT * FROM get_tab_ptf(10) where rownum<=5;

ID NAME
---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
1 Name is 1
2 Name is 2
3 Name is 3
4 Name is 4
5 Name is 5

EXCEPTION

The EXCEPTION line at the end shows the exception really does happen; it just is not reported explicitly.

Catching NO_DATA_NEEDED explicitly

CREATE OR REPLACE FUNCTION get_tab_ptf (cnt IN NUMBER) RETURN t_tf_type PIPELINED AS
BEGIN
FOR i IN 1 .. cnt LOOP
PIPE ROW(t_tf_obj(i, 'Name is ' || i));
END LOOP;
RETURN;
EXCEPTION
WHEN NO_DATA_NEEDED THEN
DBMS_OUTPUT.put_line('NO_DATA_NEEDED');
RAISE;
WHEN OTHERS THEN
DBMS_OUTPUT.put_line('EXCEPTION');
RAISE;
END;
/


Function created.

XB@ora12c> SELECT * FROM get_tab_ptf(10) where rownum<=5;

ID NAME
---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
1 Name is 1
2 Name is 2
3 Name is 3
4 Name is 4
5 Name is 5

NO_DATA_NEEDED

The final output confirms that the exception is indeed NO_DATA_NEEDED.

Comparing Memory Usage

The goal here is to compare how an ordinary table function and a pipelined table function differ in memory usage. Both buffer their data in PGA memory, so we can simply compare the 'session pga memory' statistic.

Since the calculation happens inside an anonymous block, first create a small helper function

CREATE OR REPLACE FUNCTION get_stat RETURN NUMBER AS
l_sum NUMBER;
BEGIN
SELECT a.value
INTO l_sum
FROM v$mystat a,
v$statname b
WHERE a.statistic# = b.statistic#
AND b.name = 'session pga memory';
RETURN l_sum;
END get_stat;
/

Function created.

First, the ordinary table function

set serveroutput on;

declare
v_begin number;
v_end number;
BEGIN
v_begin:=sys.get_stat;
FOR cur IN (SELECT *
FROM TABLE(get_tab(10000)))
LOOP
NULL;
END LOOP;
v_end:=sys.get_stat;
DBMS_OUTPUT.put_line('Begin memory is: '||v_begin);
DBMS_OUTPUT.put_line('End memory is: '||v_end);
DBMS_OUTPUT.put_line('Total memory used: '||(v_end-v_begin));
END;
/
Begin memory is: 4753336
End memory is: 9013176
Total memory used: 4259840

Now switch to the pipelined table function

set serveroutput on;
declare
v_begin number;
v_end number;
BEGIN
v_begin:=sys.get_stat;
FOR cur IN (SELECT *
FROM TABLE(get_tab_ptf(10000)))
LOOP
NULL;
END LOOP;
v_end:=sys.get_stat;
DBMS_OUTPUT.put_line('Begin memory is: '||v_begin);
DBMS_OUTPUT.put_line('End memory is: '||v_end);
DBMS_OUTPUT.put_line('Total memory used: '||(v_end-v_begin));
END;
/
Begin memory is: 4032440
End memory is: 4032440
Total memory used: 0

The result is startling: zero. Repeated runs kept returning 0.

That is because a pipelined table function does not need to buffer the complete result set in memory, so the data stays in memory only very briefly.

Cardinality

If you have ever tuned SQL involving a table function, you will have noticed that Oracle's cardinality estimate for it is always off. The estimate depends on the db block size: with the default of 8k, it is always 8168.

XB@ora12c> 
SELECT *
2 FROM TABLE(get_tab_ptf(10));

ID NAME
---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
1 Name is 1
2 Name is 2
3 Name is 3
4 Name is 4
5 Name is 5
6 Name is 6
7 Name is 7
8 Name is 8
9 Name is 9
10 Name is 10

10 rows selected.

XB@ora12c> @x

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2gpbj20w5kmbn, child number 0
-------------------------------------
SELECT * FROM TABLE(get_tab_ptf(10))

Plan hash value: 822655197

---------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 29 (100)|
| 1 | COLLECTION ITERATOR PICKLER FETCH| GET_TAB_PTF | 8168 | 16336 | 29 (0)| <<< estimated 8168 rows
---------------------------------------------------------------------------------------

There are several common ways to deal with this:

  • the cardinality hint, which tells the optimizer how many rows will be returned
  • the dynamic_sampling hint, which triggers dynamic sampling
  • cardinality feedback: after the SQL actually runs, the cardinality is corrected and the corrected value is used on subsequent executions. But the feedback is stored in the shared pool, so it can be aged out

Dynamic sampling is a good option

XB@ora12c> alter system flush shared_pool;

System altered.

XB@ora12c> select /*+ DYNAMIC_SAMPLING(2)*/ * FROM TABLE(get_tab_ptf(10));

ID NAME
---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
1 Name is 1
2 Name is 2
3 Name is 3
4 Name is 4
5 Name is 5
6 Name is 6
7 Name is 7
8 Name is 8
9 Name is 9
10 Name is 10

10 rows selected.

XB@ora12c> @x
Display execution plan for last statement for this session from library cache...

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 76sppwrcf832x, child number 0
-------------------------------------
select /*+ DYNAMIC_SAMPLING(2)*/ * FROM TABLE(get_tab_ptf(10))

Plan hash value: 822655197

---------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 11 (100)|
| 1 | COLLECTION ITERATOR PICKLER FETCH| GET_TAB_PTF | 10 | 20 | 11 (0)| <<< the estimate is now much more accurate
---------------------------------------------------------------------------------------

- dynamic statistics used: dynamic sampling (level=2)


15 rows selected.