对JSON数据创建索引

JSON创建索引概述

对于JSON数据来说并没有什么特殊的SQL数据类型,所以创建索引的方式也并无特别之处。可以对json_value函数返回值创建B-tree索引或者位图索引,也可以对is jsonjs not jsonjson_exists等函数创建位图索引。

一般来说,位图索引通常是创建在那些唯一值很少的字段上,比如性别之类。

函数索引只会在对应函数的查询sql上才会生效,这在具体使用中无可避免存在一些局限性,所以在创建这些函数索引之前需要充分了解到业务经常使用的场景。

对于那些查询场景不确定的情况也有一种对应方式,那就是JSON search index,这是一种常规索引,不局限于特定的JSON路径表达式。它非常适合结构式的查询,比如查找JSON字段中的具体值,并且通过json_textcontains来进行全文查询,在各种字符串值中寻找特定的单词。

对于同一个JSON列来说,可以同时创建函数索引和全文索引,这里要注意的是JSON search index是专门用来对应JSON数据的全文索引。

Oracle建议使用AL32UTF8字符集,否则创建和使用索引时会自动的进行字符集的转换,而这种转换是有损的,也就表示某些数据查询的时候会无法返回

创建一张测试JSON表

1
2
3
4
5
6
7
drop table json_test purge;

create table json_test (
id raw(16) not null,
document clob
CONSTRAINT ensure_json CHECK (document IS JSON)
);

插入测试数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
INSERT INTO json_test
VALUES (SYS_GUID(),
'{ "PONumber" : 1600,
"Reference" : "ABULL-20140421",
"Requestor" : "Alexis Bull",
"User" : "ABULL",
"CostCenter" : "A50",
"ShippingInstructions" : { "name" : "Alexis Bull",
"Address": { "street" : "200 Sporting Green",
"city" : "South San Francisco",
"state" : "CA",
"zipCode" : 99236,
"country" : "United States of America" },
"Phone" : [ { "type" : "Office", "number" : "909-555-7307" },
{ "type" : "Mobile", "number" : "415-555-1234" } ] },
"Special Instructions" : null,
"AllowPartialShipment" : false,
"LineItems" : [ { "ItemNumber" : 1,
"Part" : { "Description" : "One Magic Christmas",
"UnitPrice" : 19.95,
"UPCCode" : 13131092899 },
"Quantity" : 9.0 },
{ "ItemNumber" : 2,
"Part" : { "Description" : "Lethal Weapon",
"UnitPrice" : 19.95,
"UPCCode" : 85391628927 },
"Quantity" : 5.0 } ] }');

创建位图索引

创建基于json_exists返回数据的位图索引,这种索引就非常适合json_exists,因为返回的结果只有true或者false

1
2
3
4
create bitmap index idx_name
on json_test (json_exists(document, '$.ShippingInstructions.name'));

Index created.

创建基于json_value函数返回数据的位图索引,这种就适合字段里只有少量可能的数据的情况

1
2
3
4
drop index idx_name;

create bitmap index idx_name
on json_test (json_value(document, '$.ShippingInstructions'));

创建JSON_VALUE函数索引

可以创建基于JSON_VALUE的函数索引,标准语法即可,显式的指定函数json_value,或者直接简单的用点表示法的语法。这些方法创建的索引对基于点表示法查询和json_value查询都可应用。

1
CREATE UNIQUE INDEX po_num_idx ON json_test po (po.document.PONumber);

这里基于点表示法创建了一个函数索引,另外也可以通过json_value函数来创建基于POnumber字段的索引

1
2
create unique index po_num_idx2 
on json_test (json_value(document, '$.PONumber' returning number error on error null on empty));

注意这里的ERROR ON ERROR表示如果数据包含不存在的POnumber的记录、有多个PONumber、或者有个PONumber不是数字型则索引就会创建失败,如果这个索引已经存在,则不符合要求的数据会插入失败。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
set autotrace traceonly

select a.document."User",a.document."CostCenter" from json_test a
where json_value(document, '$.PONumber' returning number error on error null on empty) = 1600;

Execution Plan
----------------------------------------------------------
Plan hash value: 1738154326

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 27M| 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 8168 | 27M| 30 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3489 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PO_NUM_IDX2 | 1 | | 1 (0)| 00:00:01 |
| 4 | JSONTABLE EVALUATION | | | | | |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(JSON_VALUE("DOCUMENT" /*+ LOB_BY_VALUE */ FORMAT JSON , '$.POnumber'
RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=1600)

如果使用点表示法来查询,则同样可以使用这个索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
select a.document."User",a.document."CostCenter" from json_test a
where a.document."PONumber" = 1600;

Execution Plan
----------------------------------------------------------
Plan hash value: 1738154326

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 27M| 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 8168 | 27M| 30 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3489 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PO_NUM_IDX2 | 1 | | 1 (0)| 00:00:01 |
| 4 | JSONTABLE EVALUATION | | | | | |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(JSON_VALUE("DOCUMENT" /*+ LOB_BY_VALUE */ FORMAT JSON , '$.PONumber'
RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=1600)

这里可以看到,尽管使用点表示法时,但是谓词仍将其看做JSON_VALUE格式的数据而使用json_value的索引,并没有看见其进行隐式转换成JSON_QUERY,这里主要是优化器进行了更优解而选择了json_value表达式的索引。

而当我们不使用这个索引时,就会看到预期的json_query转换

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
drop index po_num_idx2;

select a.document."User",a.document."CostCenter" from json_test a
where a.document."PONumber" = '1600';

Execution Plan
----------------------------------------------------------
Plan hash value: 4223338378

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 27M| 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 8168 | 27M| 30 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3496 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PO_NUM_IDX | 1 | | 1 (0)| 00:00:01 |
| 4 | JSONTABLE EVALUATION | | | | | |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(JSON_QUERY("DOCUMENT" /*+ LOB_BY_VALUE */ FORMAT JSON ,
'$.PONumber' RETURNING VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL ON
ERROR)='1600')

Oracle建议使用下面的方式对json_value创建函数索引

  • 点表示法

    索引值可以灵活对应点表示法的查询,无论何时都会返回JSON数据。包含非标量的JSON数据,可以匹配除了json_value查询的点表示法查询。

  • json_value表达式,指定RETURNING数据类型,使用ERROR ON ERROR(可选NULL ON EMPTY

    索引值只有指定数据类型的标量数据,但是仍可以适用于那些返回标量数据的点表示法查询。

JSON_VALUE索引和查询的数据类型注意事项

默认情况下,json_value函数返回的是varchar2类型数据,当创建一个基于json_value的函数索引时,除非指定RETURNING语句返回一个不同类型的数据,否则当谓词条件是一个非varchar2值的时候,就并不会适用索引。

返回值为varchar2类型,谓词为number型

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
select a.document."User",a.document."CostCenter" from json_test a
where json_value(document, '$.PONumber' error on error null on empty) = 1600;

Execution Plan
----------------------------------------------------------
Plan hash value: 2422004586

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 82 | 278K| 32 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 82 | 278K| 32 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | JSON_TEST | 1 | 3476 | 3 (0)| 00:00:01 |
|* 3 | JSONTABLE EVALUATION | | | | | |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter(TO_NUMBER("P"."C_01$")=1600)

多JSON字段创建复合索引

当对JSON对象的多个字段建复合索引时,首选需要对其创建虚拟列,然后在虚拟列上创建B-tree索引。

1
2
3
4
5
6
7
alter table json_test add (userid varchar2(50)
generated always as (json_value(document, '$.User' returning varchar2(50))));

alter table json_test add (cost varchar2(50)
generated always as (json_value(document, '$.CostCenter' returning varchar2(50))));

create index idx_json on json_test(userid,cost);

两种方式查询可以用到复合索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
select document from json_test
where userid='ABULL'
and cost='A50';

Execution Plan
----------------------------------------------------------
Plan hash value: 521514918

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3496 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_TEST | 1 | 3496 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_JSON | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("USERID"='ABULL' AND "COST"='A50')
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
select document from json_test
where json_value(document,'$.User' returning varchar2(50))='ABULL'
and json_value(document,'$.CostCenter' returning varchar2(50))='A50';

Execution Plan
----------------------------------------------------------
Plan hash value: 521514918

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3496 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_TEST | 1 | 3496 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_JSON | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("JSON_TEST"."USERID"='ABULL' AND "JSON_TEST"."COST"='A50')

JSON全文索引

JSON search索引是一个常规索引,它可以提高临时结构查询和全文搜索的性能。 这是一个Oracle Text索引,专门设计用于JSON数据。

如果你使用12cR1版本创建过JSON search索引,则在新版本之后建议drop后重建

JSON search index

通过关键字FOR JSON来执行CREATE SEARCH INDEX语句进行索引的创建,JSON全文索引的维护是异步的,这样可以避免在业务高峰期的索引维护成本,而选择在业务低谷的时候进行这些消耗资源的操作。异步维护也就意味着在索引同步之前,那些被修改的数据或新插入的数据并不会在索引里存在记录。

创建JSON全文索引

1
create search index idx_search on json_test(document) for json;

当JSON全文索引创建完毕以后,就可以使用json_textcontains来进行查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
select document from json_test
where json_textcontains(document, '$.LineItems.Part.Description', 'Magic');

Execution Plan
----------------------------------------------------------
Plan hash value: 1340174430

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3488 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3488 | 1 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX | IDX_SEARCH | | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CTXSYS"."CONTAINS"("JSON_TEST"."DOCUMENT",'Magic INPATH
(/LineItems/Part/Description)')>0)

执行计划中出现了DOMAIN INDEX表示sql使用了全文索引。

上面的例子仅查询了描述字段里出现Magic的情况,而比如想查询Description同时包含MagicChristmas的情况则需要使用操作符&and

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
select document from json_test
where json_textcontains(document, '$.LineItems.Part.Description', 'Magic \& Christmas');

{ "PONumber" : 1600,
"Reference" : "ABULL-20140421",
"Requestor" : "Alexis Bull",
"User" : "ABULL",
"CostCenter" : "A50",
"ShippingInstructions" : { "name" : "Alexis Bull",
"Address": { "street" : "200 Sporting Green",
"city" : "South San Francisco",
"state" : "CA",
"zipCode" : 99236,
"country" : "United States of America" },
"Phone" : [ { "type" : "Office", "number" : "909-555-7307" },
{ "type" : "Mobile", "number" : "415-555-1234" } ] },
"Special Instructions" : null,
"AllowPartialShipment" : false,
"LineItems" : [ { "ItemNumber" : 1,
"Part" : { "Description" : "One Magic Christmas",
"UnitPrice" : 19.95,
"UPCCode" : 13131092899 },
"Quantity" : 9.0 },
{ "ItemNumber" : 2,
"Part" : { "Description" : "Lethal Weapon",
"UnitPrice" : 19.95,
"UPCCode" : 85391628927 },
"Quantity" : 5.0 } ] }

这里使用了反斜杠转义&

也可以用and,但是注意这里查询包含保留字and,所以需要加上花括号{}

1
2
select document from json_test
where json_textcontains(document, '$.LineItems.Part.Description', '{Magic and Christmas}');

JSON全文索引同样支持那些不包含json_textcontains的查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SELECT COUNT(*)
FROM json_test
WHERE JSON_EXISTS(document, '$.Reference');

Execution Plan
----------------------------------------------------------
Plan hash value: 2409884336

-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3488 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3488 | | |
|* 2 | DOMAIN INDEX | IDX_SEARCH | | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CTXSYS"."CONTAINS"("JSON_TEST"."DOCUMENT",'HASPATH(/Refer
ence)')>0)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
SELECT COUNT(*)
FROM json_test
where json_value(document, '$.ShippingInstructions.name') = 'Alexis Bull';

Execution Plan
----------------------------------------------------------
Plan hash value: 3056167535

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3488 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3488 | | |
|* 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3488 | 1 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | IDX_SEARCH | | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(JSON_VALUE("DOCUMENT" FORMAT JSON , '$.ShippingInstructions.name'
RETURNING VARCHAR2(4000) NULL ON ERROR)='Alexis Bull')
3 - access("CTXSYS"."CONTAINS"("JSON_TEST"."DOCUMENT",'{Alexis Bull} INPATH
(/ShippingInstructions/name)')>0)

对JSON数据创建索引
https://www.xbdba.com/2021/04/09/oracle-json-index/
作者
xbdba
发布于
2021年4月9日
许可协议