当前位置: 首页 > Oracle, oracle 10g, oracle 11g > 正文

记一次数据库优化

客户反映数据库时快时慢,数据库版本11.2.0.4.0,操作系统RHEL5U5,查看数据库的活动会话,发现基本100%的SQL都在同一条SQL,而且大多都是latch: row cache objects等待。

SQL> select inst_id,username,sql_id,event from gv$session where username is not null and status='ACTIVE';

   INST_ID USERNAME                       SQL_ID        EVENT
---------- ------------------------------ ------------- ---------------------------------
         1 SYS                            8s2qkvk056ugr PX Deq: Execution Msg
         1 SYS                            8s2qkvk056ugr PX Deq: Execute Reply
         1 PRESSO                         9rwzwamtgv6m6 gc cr request
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f resmgr:cpu quantum
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         4z9c5071vvaz5 resmgr:cpu quantum
         2 PRESSO                         dfftdnm7cu76f resmgr:cpu quantum
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 SYS                            8s2qkvk056ugr PX Deq: Execution Msg
         2 PRESSO                         dfftdnm7cu76f resmgr:cpu quantum
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f resmgr:cpu quantum
         2 PRESSO                         dfftdnm7cu76f resmgr:cpu quantum
         2 PRESSO                         dfftdnm7cu76f enq: RC - Result Cache: Contention

21 rows selected.

latch: row cache objects等待事件通常是在修改数据字典相关的等待,通常由于硬解析过高导致,查看这条SQL的SQL语句。

SQL> select sql_text from v$sqltext where sql_id='dfftdnm7cu76f' order by piece

SQL_TEXT
---------------------------------------------------------------------------------------------------------
select *
  from (select t.apply_id,
               bb.cn_name,
               o.cn_name as oname,
               UPPER(bb.card_code),
               (select count(bd.BAD_RECORD_ID) as n
                  from biz_badrecoed_info bd wherebd.BAD_RECORD_ID = 4 and bd.PUNISH_ORG = o.org_id) || '-' ||
               (select count(bd.BAD_RECORD_ID) as nll
                  from biz_badrecoed_info bd
                 where bd.BAD_RECORD_ID = 1
                   and bd.ID_CARD = bb.card_code) || '-' ||
               (select count(bd.BAD_RECORD_ID) as nlllfrom biz_badrecoed_info bd where bd.BAD_RECORD_ID = 2 and bd.ID_CARD = bb.card_code) || '-' ||
               (select count(bd.BAD_RECORD_ID) as nllll
                  from biz_badrecoed_info bd
                 where bd.BAD_RECORD_ID = 3
                   and bd.ID_CARD = bb.card_code) bdn,
               o.type_id,
               bcode.code_name,
               trim(ps.cn_name) as psname,
               t.apply_status as st,
               lead(t.apply_id, 1) over(order by t.create_time, t.apply_id desc) as nextID,
               lag(t.apply_id, 1) over(order by t.create_time, t.apply_id desc) as prevID,
               t.create_time as create_time
          from biz_presscard_application t,
               biz_org_info              o,
               biz_reporter_base_info    bb,
               base_code_info            bcode,
               biz_pressstation_info     ps
         where t.org_id = o.org_id
           and bcode.code_sort_id = '1'
           and t.reporter_uuid = bb.reporter_uuid
           and to_char(t.apply_status) = bcode.code_id
         andt.sat_id = ps.sat_id(+)
           and t.apply_status = '2'
           and t.next_orgid = '1'
         ORDER BY T.create_time, t.apply_id DESC)
 where rownum <= :1

24 rows selected.

可见这条SQL没有使用绑定变量,修改cursor_sharing参数,在数据库层打开绑定变量。

SQL> show parameter cursor

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cursor_bind_capture_destination      string      memory+disk
cursor_sharing                       string      SIMILAR
cursor_space_for_time                boolean     FALSE
open_cursors                         integer     3000
session_cached_cursors               integer     50
SQL> alter system set cursor_sharing=SIMILAR sid='*';

System altered.

再次查询这个等待已经很少,但出现了大量的latch: cache buffers chains等待事件。

SQL> select inst_id,username,sql_id,event from gv$session where username is not null and status='ACTIVE' and username !='SYS';

   INST_ID USERNAME                       SQL_ID        EVENT
---------- ------------------------------ ------------- ------------------------------------
         1 PRESSO                         1fsdcuajuxncg enq: RC - Result Cache: Contention
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f resmgr:cpu quantum
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f resmgr:cpu quantum
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: cache buffers chains
         2 PRESSO                         dfftdnm7cu76f resmgr:cpu quantum
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f latch: row cache objects
         2 PRESSO                         dfftdnm7cu76f enq: RC - Result Cache: Contention

21 rows selected.

这套应用代码写的相当那啥,SQL还不能动,只能从数据库角度去解决问题。对这套系统来说,热点块一直都是个问题,为了解决这个问题,数据库的block_size已经修改为4K。

SQL> show parameter block

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_block_buffers                     integer     0
db_block_checking                    string      FALSE
db_block_checksum                    string      TYPICAL
db_block_size                        integer     4096
db_file_multiblock_read_count        integer     136

现在热点块依旧很严重,那么就只能通过调整pctfree来减少热点块了,先查出热点块严重的表。

SQL> SELECT *
  2    FROM (SELECT O.OWNER, O.OBJECT_NAME, O.OBJECT_TYPE, SUM(TCH) TOUCHTIME
  3            FROM X$BH B, DBA_OBJECTS O
  4           WHERE B.OBJ = O.DATA_OBJECT_ID
  5             AND B.TS# > 0
  6           GROUP BY O.OWNER, O.OBJECT_NAME, O.OBJECT_TYPE
  7           ORDER BY SUM(TCH) DESC)
  8   WHERE ROWNUM <= 10;


OWNER                          OBJECT_NAME                              OBJECT_TYPE          TOUCHTIME
------------------------------ ---------------------------------------- ------------------- ----------
PRESSO                         BIZ_REPORTER_BASE_INFO                   TABLE                  3691280
PRESSO                         BIZ_REPORTER_EDU_INFO                    TABLE                  3547004
PRESSO                         BIZ_REPORTER_ORG_INFO                    TABLE                  2273524
PRESSO                         BIZ_PRESSCARD_LOGOUT                     TABLE                  2099499
PRESSO                         BIZ_REPORTER_INFO                        TABLE                  1619598
PRESSO                         BIZ_PRESSCARD_APPLICATION                TABLE                  1191751
PRESSO                         BIZ_SYSTEM_MESSAGE_INFO                  TABLE                   730829
PRESSO                         BIZ_REPORTER_EXTEND_INFO                 TABLE                   610540
SYS                            WRH$_SQLSTAT                             TABLE PARTITION         193465
PRESSO                         IDX_REPORTER_UUID                        INDEX                   190901

10 rows selected.

调整热点块较高的表的pctfree。

SQL> ALTER TABLE PRESSO.BIZ_REPORTER_BASE_INFO PCTFREE 30;

Table altered.

SQL> ALTER TABLE PRESSO.BIZ_REPORTER_EDU_INFO PCTFREE 30;

Table altered.

SQL> ALTER TABLE PRESSO.BIZ_REPORTER_ORG_INFO PCTFREE 30;

Table altered.

这条SQL同样伴随着resmgr:cpu quantum等待事件,这显然是和资源管理器相关的等待事件,告警日志也可以看到相关的信息。

Setting Resource Manager plan SCHEDULER[0x32DB]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"

当前服务器CPU使用率在50%左右,对PC服务器来讲,CPU超过50%通常是很危险的,就当前系统而言,AUTO_SQL_TUNING和资源管理器是不想看到的,禁用这些。

SQL> alter system set resource_manager_plan='' sid='*';

System altered.

SQL> execute dbms_scheduler.set_attribute('WEEKNIGHT_WINDOW','RESOURCE_PLAN','');

PL/SQL procedure successfully completed.

SQL> execute dbms_scheduler.set_attribute('WEEKEND_WINDOW','RESOURCE_PLAN','');

PL/SQL procedure successfully completed.

SQL> execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');

PL/SQL procedure successfully completed.

SQL> execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');

PL/SQL procedure successfully completed.

SQL> execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');

PL/SQL procedure successfully completed.

SQL> execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');

PL/SQL procedure successfully completed.

SQL> execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');

PL/SQL procedure successfully completed.

SQL> execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');

PL/SQL procedure successfully completed.

SQL> BEGIN
  2  DBMS_AUTO_TASK_ADMIN.DISABLE(
  3  client_name => 'auto space advisor',
  4  operation => NULL,
  5  window_name => NULL);
  6  END;
  7  /

PL/SQL procedure successfully completed.

过断时间再次查看,发现这条SQL的等待事件又出现了enq: RC – Result Cache: Contention。这受隐含参数_result_cache_timeout的影响,在11.2.0.4.0版本默认是10秒,在10g版本是60秒。

SQL> select
  1   x.ksppinm name,
  2   y.ksppstvl value,
  3   y.ksppstdf isdefault
  4  from
  5   sys.x$ksppi x,
  6   sys.x$ksppcv y
  7  where
  8   x.inst_id = userenv('Instance') and
  9   y.inst_id = userenv('Instance') and
 10   x.indx = y.indx and
 11   x.ksppinm like '%result_cache%'
 12  order by
 13*  translate(x.ksppinm, ' _', ' ')
SQL> /

NAME                                                                             VALUE      ISDEFAULT
-------------------------------------------------------------------------------- ---------- ---------
_client_result_cache_bypass                                                      FALSE      TRUE
client_result_cache_lag                                                          3000       TRUE
client_result_cache_size                                                         0          TRUE
_optimizer_ads_use_result_cache                                                  TRUE       TRUE
_result_cache_auto_dml_monitoring_duration                                       15         TRUE
_result_cache_auto_dml_monitoring_slots                                          4          TRUE
_result_cache_auto_dml_threshold                                                 16         TRUE
_result_cache_auto_dml_trend_threshold                                           20         TRUE
_result_cache_auto_execution_threshold                                           1          TRUE
_result_cache_auto_size_threshold                                                100        TRUE
_result_cache_auto_time_distance                                                 300        TRUE
_result_cache_auto_time_threshold                                                1000       TRUE
_result_cache_block_size                                                         1024       TRUE
_result_cache_copy_block_count                                                   1          TRUE
_result_cache_deterministic_plsql                                                FALSE      TRUE
_result_cache_global                                                             TRUE       TRUE
result_cache_max_result                                                          100        TRUE
result_cache_max_size                                                            2147483648 TRUE
result_cache_mode                                                                FORCE      TRUE
result_cache_remote_expiration                                                   0          TRUE
_result_cache_timeout                                                            10         TRUE

21 rows selected.

降低_result_cache_timeout参数的值可以减少enq: RC – Result Cache: Contention等待事件。

经过查询,发现这些SQL运行时间已经超过2小时,在这期间应用程序重启过,这些进程消耗了大量的资源,并且有释放。经开发人员确认,超过15分钟以上的连接都是不正常的,要回收掉,通过修改profile,设定会话连接时间15分钟。

SQL> alter profile "DEFAULT" limit CONNECT_TIME 15;

Profile altered.

SQL> select * from dba_profiles;

PROFILE                        RESOURCE_NAME                    RESOURCE LIMIT
------------------------------ -------------------------------- -------- ----------------------------------------
DEFAULT                        COMPOSITE_LIMIT                  KERNEL   UNLIMITED
DEFAULT                        SESSIONS_PER_USER                KERNEL   UNLIMITED
DEFAULT                        CPU_PER_SESSION                  KERNEL   UNLIMITED
DEFAULT                        CPU_PER_CALL                     KERNEL   UNLIMITED
DEFAULT                        LOGICAL_READS_PER_SESSION        KERNEL   UNLIMITED
DEFAULT                        LOGICAL_READS_PER_CALL           KERNEL   UNLIMITED
DEFAULT                        IDLE_TIME                        KERNEL   UNLIMITED
DEFAULT                        CONNECT_TIME                     KERNEL   15
DEFAULT                        PRIVATE_SGA                      KERNEL   UNLIMITED
DEFAULT                        FAILED_LOGIN_ATTEMPTS            PASSWORD UNLIMITED
DEFAULT                        PASSWORD_LIFE_TIME               PASSWORD UNLIMITED
DEFAULT                        PASSWORD_REUSE_TIME              PASSWORD UNLIMITED
DEFAULT                        PASSWORD_REUSE_MAX               PASSWORD UNLIMITED
DEFAULT                        PASSWORD_VERIFY_FUNCTION         PASSWORD NULL
DEFAULT                        PASSWORD_LOCK_TIME               PASSWORD UNLIMITED
DEFAULT                        PASSWORD_GRACE_TIME              PASSWORD UNLIMITED

16 rows selected.

杀掉这些进程后,问题解决,再次运行这些SQL,很快就有结果返回了。这个故障导致数据库响应时快时慢,原因是连到节点2的会话,由于服务器资源被这些没有回收的进程消耗和占用,速度就相对较慢,而节点1资源充足,分配到节点1的操作就相对较快。

本文固定链接: http://www.dbdream.com.cn/2014/10/%e8%ae%b0%e4%b8%80%e6%ac%a1%e6%95%b0%e6%8d%ae%e5%ba%93%e4%bc%98%e5%8c%96/ | 信春哥,系统稳,闭眼上线不回滚!

该日志由 dbdream 于2014年10月10日发表在 Oracle, oracle 10g, oracle 11g 分类下, 你可以发表评论,并在保留原文地址及作者的情况下引用到你的网站或博客。
原创文章转载请注明: 记一次数据库优化 | 信春哥,系统稳,闭眼上线不回滚!
关键字: , , , ,

记一次数据库优化:等您坐沙发呢!

发表评论

快捷键:Ctrl+Enter