当前位置: 首页 > Oracle, Oracle 12c > 正文

ORACLE 12C新特性-多线程引起的一个笑话

前几天和同事处理了一个12C的故障,现在想起来还觉得好笑,没文化太可怕了,还好这个是我们公司的测试服务器出的问题,怎么折腾都没事。

故障描述:

新同事在向12C数据库中导入其他数据库的AWR数据的时候,说数据库连不上了,当时忙着技术分享,也没看,后来同事在处理的时候,发现SQLPLUS不能正常连接数据库,不知道是什么原因,技术分享结束后,我帮他看了下,SQLPLUS本地验证的方式登录数据库后,所有操作报ORA-01017错误。

[oracle@enmocoredb admin]$ export ORACLE_SID=core
[oracle@enmocoredb admin]$ sqlplus / as sysdba 

SQL*Plus: Release 12.1.0.1.0 Production on Tue Oct 15 10:06:52 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied

遇到这个问题我通常会想到是$ORACLE_HOME/bin目录下的ORACLE文件的权限不对,查看后发现不是这个原因,权限是正常的。

[oracle@enmocoredb bin]$ ll oracle
-rwsr-s--x 1 oracle oinstall 294728395 Jul  2 14:05 oracle

那是什么原因呢?和同事研究半天,也没发现端倪,凭着以前的经验,查了下数据库的进程。

[oracle@enmocoredb admin]$ ps -ef | grep smon
oracle   29003 28878  0 10:18 pts/1    00:00:00 grep smon

SMON进程不存在,SMON进程挂了,数据库不就挂了吗?那在看看其他进程是否在。

[oracle@enmocoredb admin]$ ps -ef | grep ora
root     28705  1405  0 09:28 ?        00:00:00 sshd: oracle [priv]
oracle   28707 28705  0 09:28 ?        00:00:00 sshd: oracle@pts/0
oracle   28708 28707  0 09:28 pts/0    00:00:00 -bash
root     28737  1405  0 09:28 ?        00:00:00 sshd: oracle [priv]
oracle   28739 28737  3 09:29 ?        00:01:54 sshd: oracle@pts/2
oracle   28740 28739  0 09:29 pts/2    00:00:00 -bash
oracle   28787     1  0 09:48 ?        00:00:00 ora_pmon_core
oracle   28789     1  0 09:48 ?        00:00:00 ora_psp0_core
oracle   28791     1  3 09:48 ?        00:01:08 ora_vktm_core
oracle   28795     1  0 09:48 ?        00:00:02 ora_u004_core
oracle   28801     1  0 09:48 ?        00:00:04 ora_u005_core
oracle   28807     1  0 09:48 ?        00:00:00 ora_dbw0_core
root     28875  1405  0 10:02 ?        00:00:00 sshd: oracle [priv]
oracle   28877 28875  0 10:02 ?        00:00:00 sshd: oracle@pts/1
oracle   28878 28877  0 10:02 pts/1    00:00:00 -bash
oracle   28954 28877  0 10:10 ?        00:00:00 /usr/libexec/openssh/sftp-server
oracle   29006 28878  0 10:19 pts/1    00:00:00 ps -ef
oracle   29007 28878  0 10:19 pts/1    00:00:00 grep ora

到现在就满脸疑惑了,为什么SMON进程挂掉了,其他进程还在,之前的经验告诉我,SMON进程挂了,数据库也就挂了,本地验证登录不了数据库,那就杀进程吧,然后我们残忍的干掉了这些进程。

[oracle@enmocoredb admin]$ pkill core

再次尝试本地验证登录数据库成功,但是启动时,错误依旧。

[oracle@enmocoredb dbs]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Fri Oct 25 18:27:29 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL>  startup pfile='/u01/app/oracle/product/12.1.0/dbhome_1/dbs/initcore.ora'
ORA-01017: invalid username/password; logon denied

告警日志最后面竟然不报错,看不出什么问题,而且数据库进程又神奇般的启动了,还是没有SMON进程,真见鬼了。没办法只好重建了PFILE,看看是不是参数出了问题。

[oracle@enmocoredb dbs]$ vi initcore.ora

core.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment
*.compatible='12.1.0.0.0'
*.control_files='/u01/app/oracle/oradata/core/control01.ctl','/u01/app/oracle/oradata/core/control02.ctl'
*.db_block_size=8192
*.db_create_file_dest='/u01/app/oracle/oradata/core/'
*.db_domain=''
*.db_name='core'
*.db_recovery_file_dest='/home/oracle/fra'
*.db_recovery_file_dest_size=1073741824
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=coreXDB)'
*.sga_target=500m
*.sga_max_size=500m
*.pga_aggregate_target=300m
*.open_cursors=300
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
#*.threaded_execution=TRUE
*.undo_tablespace='UNDOTBS1'
PGA_AGGREGATE_LIMIT=300M

threaded_execution参数没见过,干啥用的?不知道干啥用的就先注释掉,看看是不是他搞的鬼。

[oracle@enmocoredb dbs]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Fri Oct 25 18:31:23 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile='/u01/app/oracle/product/12.1.0/dbhome_1/dbs/initcore.ora'
ORACLE instance started.

Total System Global Area  521936896 bytes
Fixed Size                  2290264 bytes
Variable Size             209718696 bytes
Database Buffers          301989888 bytes
Redo Buffers                7938048 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 2755
Session ID: 125 Serial number: 5

看来真是这个参数搞的鬼,可以看到数据库启动到了MOUNT状态,OPEN时报错,ORA-03113错误通常和磁盘有关系,要么是磁盘不能读写,要么是文件损坏了,不过基本遇到的都是前者,这个错误告警日志会记录。经查告警日志,发现是闪回恢复区满了,导致日志无法归档,数据库挂起,修改闪回恢复去大小,数据库可以正常OPEN。

那么threaded_execution参数到底是干什么的呢?是不是12C引入的新功能呢?查了查资料,发现这个参数竟然是12C的新特性-多线程引入的参数。

Multiprocess and Multithreaded Oracle Database Systems
Multiprocess Oracle Database (also called multiuser Oracle Database) uses several processes to run different parts of the Oracle Database code and additional Oracle processes for the users—either one process for each connected user or one or more processes shared by multiple users. Most databases are multiuser because a primary advantage of a database is managing data needed by multiple users simultaneously.

Each process in a database instance performs a specific job. By dividing the work of the database and applications into several processes, multiple users and applications can connect to an instance simultaneously while the system gives good performance.

In previous releases, Oracle processes did not run as threads on UNIX and Linux systems. Starting in Oracle Database 12c, the multithreaded Oracle Database model enables Oracle processes to execute as operating system threads in separate address spaces. When Oracle Database 12c is installed, the database runs in process mode. You must set a parameter to run the database in threaded mode. In threaded mode, some background processes on UNIX and Linux run as processes (processes containing one thread), whereas the remaining Oracle processes run as threads within processes.

In a database running in threaded mode, PMON and DBW might run as operating system processes, whereas LGWR and CMON might run as threads within a single process. Two foreground processes and a parallel execution (PX) server process might run as threads in a second operating system process. A third operating system process might contain multiple foreground threads. Thus, "Oracle process" does not always mean "operating system process."

The V$PROCESS view contains one row for each Oracle process connected to a database instance. For example, you can run the following query in SQL*Plus to get the operating system process ID and operating system thread ID for each process:

threaded_execution参数就是控制是否使用多线程模式,ORACLE 12C在开启多线程之后,默认就是没有SMON进程的。以下摘自官方文档:

SPID   STID   PROGRAM           PSEUDO
-----  -----  ----------------- ---------
7190   7190   oracle@samplehost (PMON) 
7192   7192   oracle@samplehost (PSP0) 
7194   7194   oracle@samplehost (VKTM) 
7198   7198   oracle@samplehost (SCMN) 
7198   7200   oracle@samplehost (GEN0) 
7202   7202   oracle@samplehost (SCMN) 
7202   7204   oracle@samplehost (DIAG) 
7198   7205   oracle@samplehost (DBRM) 
7202   7206   oracle@samplehost (DIA0)

而且开启多线程之后,就不再支持操作系统认证了,所以会遇到ORA-01017错误。

由于我之前不了解这个特性,凭借以前的经验,发现SMON进程不在就以为数据库已经挂了,其实这个问题只要看下告警日志,就知道是因为闪回恢复区满,日志无法归档,只需要修改下闪回恢复区的大小就可以了,数据库并没有宕掉,看来在一套新版本的系统中,凭借以前的经验有时候也会害死人的,这也体现出没文化太可怕了。

本文固定链接: http://www.dbdream.com.cn/2013/11/oracle-12c%e6%96%b0%e7%89%b9%e6%80%a7-%e5%a4%9a%e7%ba%bf%e7%a8%8b%e5%bc%95%e8%b5%b7%e7%9a%84%e4%b8%80%e4%b8%aa%e7%ac%91%e8%af%9d/ | 信春哥,系统稳,闭眼上线不回滚!

该日志由 dbdream 于2013年11月04日发表在 Oracle, Oracle 12c 分类下, 你可以发表评论,并在保留原文地址及作者的情况下引用到你的网站或博客。
原创文章转载请注明: ORACLE 12C新特性-多线程引起的一个笑话 | 信春哥,系统稳,闭眼上线不回滚!
关键字: , , , ,

ORACLE 12C新特性-多线程引起的一个笑话:等您坐沙发呢!

发表评论

快捷键:Ctrl+Enter