This website is not affiliated with, sponsored by, or approved by SAP AG.

how to find out what caused the app server down


Postby ashuai » Thu May 30, 2013 7:21 am

Hi everyone,
My manager told me that our PRD server was down yesterday morning. He said some users could not log in to the system, and he found that one app server was down. He restarted the app server and everything seems to be OK now, but he asked me to find out what caused the outage.

Today I checked the developer trace files of the app server.

I did not find anything useful in dev_disp.old.

However, some of the dev_w*.old files contain entries like the ones below (the server has 76 work processes; the traces for work processes 0 to 19 contain these entries):

M Tue May 28 06:55:22 2013
M ThAlarmHandler (1)
M ThAlarmHandler: set CONTROL_TIMEOUT/DP_CONTROL_JAVA_EXIT and break sql
B db_sqlbreak() = 1
M ThAlarmHandler: return from signal handler
M
M Tue May 28 06:56:22 2013
M ThAlarmHandler (2)
M ThAlarmHandler: 2. ALARM: terminate process (pid=9408, user is T138/M0)
M ThAlarmHandler: prv_action of W0: 0x2
M ThAlarmHandler: set clean state of T138/M0 to DP_TIMEOUT
M ThAlarmHandler: prv_action of W0: 0xa
M ThAlarmHandler: save snc contexts
M ThISncSaveAllContexts: save 0 snc contexts
M ThAlarmHandler: C-Stack during alarm handler
M C-STACK
(0) 0x4000000001b363b0 CTrcStack + 0x1d0 at dptstack.c:227 [dw.sapP20_D20]
(1) 0x4000000001733100 ThAlarmHandler + 0x11e0 at thxxhead.c:21417 [dw.sapP20_D20]
(2) 0x4000000001664520 DpSigAlrm + 0x220 at dpxxtool.c:2295 [dw.sapP20_D20]
(3) 0xe00000013305f440 Signal 14 (SIGALRM) delivered
(4) 0xc00000000054ee70 _semop_sys + 0x30 [/usr/lib/hpux64/libc.so.1]
(5) 0xc0000000005607e0 _semop + 0xe0 at ../../../../../core/libs/libc/shared_em_64_perf/../core/syscalls/t_semop.c:19 [/usr/lib/hpux64/libc.so.1]
(6) 0x4000000001707680 RqOsSem + 0xb0 at semux.c:1186 [dw.sapP20_D20]
(7) 0x40000000017097a0 SemRq + 0x810 at semux.c:1814 [dw.sapP20_D20]
(8) 0x4000000004cc2990 EsILock + 0x2410 at esxx.c:3449 [dw.sapP20_D20]
(9) 0x4000000004cca410 STD_EsAttach + 0x1d0 at esxx.c:2348 [dw.sapP20_D20]
(10) 0x4000000004cd5110 EsAttach + 0x90 at esxxfunc.c:874 [dw.sapP20_D20]
(11) 0x4000000004c988c0 EmContextAttach + 0x1e0 at emxx.c:932 [dw.sapP20_D20]
(12) 0x40000000018e20a0 ThCheckEmState + 0x300 at thxxmem.c:438 [dw.sapP20_D20]
(13) 0x40000000018dd780 ThRollIn + 0x380 at thxxmem.c:870 [dw.sapP20_D20]
(14) 0x400000000175bc20 ThSessionRestore + 0x180 at thxxhead.c:22129 [dw.sapP20_D20]
(15) 0x40000000017250b0 TskhLoop + 0x1210 at thxxhead.c:3542 [dw.sapP20_D20]
(16) 0x400000000171f000 ThStart + 0x5d0 at thxxhead.c:10759 [dw.sapP20_D20]
(17) 0x40000000015ab260 DpMain + 0x870 at dpxxdisp.c:1152 [dw.sapP20_D20]
(18) 0x40000000015a4b60 main + 0x80 at thxxanf.c:64 [dw.sapP20_D20]
(19) 0xc00000000006e9b0 main_opd_entry + 0x50 [/usr/lib/hpux64/dld.so]
M
M ***LOG Q02=> wp_halt, WPStop (Workproc 0 9408) [dpuxtool.c 268]

The other dev_w*.old files only report that work process heap memory is insufficient and suggest changing the parameter.
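
To check which work processes hit the second alarm and when, the trace files can be scanned for the "terminate process" entry. The following is only a minimal sketch: the function name and the regular expressions are my own assumptions, derived from the excerpt above, and are not part of any SAP tool.

```python
import re
from datetime import datetime

# Timestamp lines look like "M Tue May 28 06:56:22 2013":
# a one-letter component prefix followed by the date.
TS_RE = re.compile(r"^[A-Z]\s+(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})$")
# The second alarm names the process being terminated and its session.
TERM_RE = re.compile(r"ALARM: terminate process \(pid=(\d+), user is (\S+)\)")


def find_alarm_terminations(trace_text):
    """Return (timestamp, pid, session) for each 'terminate process' entry."""
    current_ts = None
    hits = []
    for raw in trace_text.splitlines():
        line = raw.strip()
        ts = TS_RE.match(line)
        if ts:
            # Remember the most recent timestamp; entries below it belong to it.
            current_ts = datetime.strptime(ts.group(1), "%a %b %d %H:%M:%S %Y")
            continue
        term = TERM_RE.search(line)
        if term:
            hits.append((current_ts, int(term.group(1)), term.group(2)))
    return hits


if __name__ == "__main__":
    sample = """M Tue May 28 06:56:22 2013
M ThAlarmHandler (2)
M ThAlarmHandler: 2. ALARM: terminate process (pid=9408, user is T138/M0)
"""
    for when, pid, session in find_alarm_terminations(sample):
        print(when, f"pid={pid}", f"session={session}")
```

Running this over each dev_w*.old file would show whether all 20 affected work processes were terminated within the same minute, which would point to a common cause rather than 20 separate problems.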


Could anyone suggest how to find out what caused the issue?

Regards,
ashuai
 
Posts: 3
Joined: Thu May 30, 2013 4:16 am

Re: how to find out what caused the app server down

Postby Snowy » Mon Jun 03, 2013 10:42 pm

We'll need more lines from dev_w0 to help you out.
SapFans Moderator

Search: http://www.sapfans.com/forums/search.php
Notes: http://service.sap.com/notes
Help: http://help.sap.com
Rules: http://www.sapfans.com/forums/viewtopic.php?t=344127
Snowy
 
Posts: 28767
Joined: Mon Oct 21, 2002 2:33 pm
Location: 3.1415926535

Re: how to find out what caused the app server down

Postby ashuai » Tue Jun 04, 2013 2:21 am

Hi Snowy, thanks for your reply.

Here is the dev_disp.old:
Tue May 28 05:43:28 2013
DpHdlDeadWp: restart wp (pid=4092) automatically
DpHdlDeadWp: restart wp (pid=7475) automatically

Tue May 28 05:48:28 2013
DpHdlDeadWp: restart wp (pid=7558) automatically

Tue May 28 05:59:08 2013
DpHdlDeadWp: restart wp (pid=7781) automatically

Tue May 28 06:01:48 2013
DpHdlDeadWp: restart wp (pid=7559) automatically

Tue May 28 06:06:08 2013
DpHdlDeadWp: restart wp (pid=8356) automatically

Tue May 28 06:07:48 2013
DpHdlDeadWp: restart wp (pid=7158) automatically

Tue May 28 06:23:29 2013
DpHdlDeadWp: restart wp (pid=8933) automatically

Tue May 28 06:32:29 2013
DpHdlDeadWp: restart wp (pid=8869) automatically

Tue May 28 06:35:02 2013

SoftCancel request for T196 U25298 M2 received from REMOTE_TERMINAL

Tue May 28 06:35:29 2013
DpHdlDeadWp: restart wp (pid=26907) automatically

Tue May 28 10:42:25 2013
DpSigInt: caught signal 2
DpHalt: shutdown server >sapp20_P20_20 < (normal)
DpModState: change server state from ACTIVE to SHUTDOWN
Stop work processes

Tue May 28 10:43:26 2013
SoftCancel request for T61 U62 M0 received from DISPATCHER
SoftCancel request for T64 U65 M0 received from DISPATCHER
SoftCancel request for T65 U66 M0 received from DISPATCHER
SoftCancel request for T67 U68 M0 received from DISPATCHER
SoftCancel request for T68 U69 M0 received from DISPATCHER
SoftCancel request for T69 U70 M0 received from DISPATCHER

Tue May 28 10:44:26 2013
*** ERROR => DpWpKill(9408, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(5017, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(19231, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(7207, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(7095, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(26552, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(24400, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(2441, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(15636, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(29057, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(11921, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(18804, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(21517, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(29061, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(28708, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(29063, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(29068, SIGUSR2): kill failed [dpxxtool.c 2527]
*** ERROR => DpWpKill(29069, SIGUSR2): kill failed [dpxxtool.c 2527]
Stop gateway
Stop icman
Terminate gui connections

Tue May 28 10:47:28 2013
DpSigInt: caught signal 15
DpHalt: shutdown server >sapp20_P20_20 < (normal)
Stop work processes


Here is the dev_w0.old:

C Application info callback registered.
C Client NLS setting (by OCINlsGetInfo): con=1, 'AMERICAN_AMERICA.UTF16'
C Logon as OPS$-user to get SAPSR3's password
C Connecting as /@P20 on connection 1 (nls 0) ... (dbsl 701 170611, UNICODE[2])
C Starting user session: OCISessionBegin(con_hdl=1, usr='/',svchp=0x6000000004c294e0, srvhp=0x6000000003a58f80, usrhp=0x600000000578da40)
C Now '/@P20' is connected: con_hdl=1, nls_hdl=0, session_id=919.
C Got SAPSR3's password from OPS$-user
C Disconnecting from connection 1 ...
C Closing user session (con_hdl=1,svchp=0x6000000004c294e0,usrhp=0x600000000578da40)
C Disconnected (con=1) from ORACLE.
C Connecting as SAPSR3/<pwd>@P20 on connection 1 (nls 0) ... (dbsl 701 170611, UNICODE[2])
C Starting user session: OCISessionBegin(con_hdl=1, usr=SAPSR3/<pwd>, svchp=0x6000000004c294e0, srvhp=0x6000000003a58f80, usrhp=0x600000000578da40)
C Now 'SAPSR3/<pwd>@P20' is connected: con_hdl=1, nls_hdl=0, session_id=919.
C con=1, V$NLS_PARAMETERS: NLS_LANG=AMERICAN_AMERICA.UTF8, NLS_NCHAR=UTF8
C Nls CharacterSet NationalCharSet EnvHp ErrHp ErrBt
C 0 UTF16 AL16UTF16 0x6000000003a28360 0x6000000003a337f8 0x6000000003a58318
C DB instance P20 is running on ph12 with ORACLE version 11.2.0.3.0 since JAN 13, 2013, 02:29:51
B Connection 1 opened (DBSL handle 1)
D
D Tue May 28 06:04:00 2013
D *** ERROR => tablecontrol error on screen [diagotab.c 2788]
D *** ERROR => >SAPMF02D< >7324< [diagotab.c 2789]
D *** ERROR => tablecontrol >TCTRL_PARTNERROLLEN< created for screen 0324 [diagotab.c 2793]
D *** ERROR => but used on screen 7324 [diagotab.c 2794]
D
D Tue May 28 06:04:13 2013
D *** ERROR => tablecontrol error on screen [diagotab.c 2788]
D *** ERROR => >SAPMF02D< >7350< [diagotab.c 2789]
D *** ERROR => tablecontrol >TCTRL_STEUERN< created for screen 1350 [diagotab.c 2793]
D *** ERROR => but used on screen 7350 [diagotab.c 2794]
D
D Tue May 28 06:04:22 2013
D *** ERROR => tablecontrol error on screen [diagotab.c 2788]
D *** ERROR => >SAPMF02D< >7340< [diagotab.c 2789]
D *** ERROR => tablecontrol >TCTRL_ABLADESTELLEN< created for screen 0340 [diagotab.c 2793]
D *** ERROR => but used on screen 7340 [diagotab.c 2794]
A
A Tue May 28 06:07:29 2013
A ***SUBPOOL*** generating subroutine pool %_T00O4G for user 20621077 (6acb0001).
A MainProg=SAPLM61K, Incl=LM61KF90, Line=282
A
A Tue May 28 06:11:20 2013
A ***SUBPOOL*** generating subroutine pool %_T00O4H for user 40187721 (6bad0100).
A MainProg=SAPLSBAL_DISPLAY_BASE, Incl=LSBAL_DISPLAY_BASEF01, Line=316
C
C Tue May 28 06:15:52 2013

C CbApplInfoGet() failed (ignored 1400).
C
C Tue May 28 06:16:56 2013
C Disconnecting from connection 1 ...
C Closing user session (con_hdl=1,svchp=0x6000000004c294e0,usrhp=0x600000000578da40)
C Disconnected (con=1) from ORACLE.
B Disconnected from connection 1, con_da={R/3*WFCONTAINER,1912}
M
M Tue May 28 06:55:22 2013
M ThAlarmHandler (1)
M ThAlarmHandler: set CONTROL_TIMEOUT/DP_CONTROL_JAVA_EXIT and break sql
B db_sqlbreak() = 1
M ThAlarmHandler: return from signal handler
M
M Tue May 28 06:56:22 2013
M ThAlarmHandler (2)
M ThAlarmHandler: 2. ALARM: terminate process (pid=9408, user is T138/M0)
M ThAlarmHandler: prv_action of W0: 0x2
M ThAlarmHandler: set clean state of T138/M0 to DP_TIMEOUT
M ThAlarmHandler: prv_action of W0: 0xa
M ThAlarmHandler: save snc contexts
M ThISncSaveAllContexts: save 0 snc contexts
M ThAlarmHandler: C-Stack during alarm handler
M C-STACK
(0) 0x4000000001b363b0 CTrcStack + 0x1d0 at dptstack.c:227 [dw.sapP20_D20]
(1) 0x4000000001733100 ThAlarmHandler + 0x11e0 at thxxhead.c:21417 [dw.sapP20_D20]
(2) 0x4000000001664520 DpSigAlrm + 0x220 at dpxxtool.c:2295 [dw.sapP20_D20]
(3) 0xe00000013305f440 Signal 14 (SIGALRM) delivered
(4) 0xc00000000054ee70 _semop_sys + 0x30 [/usr/lib/hpux64/libc.so.1]
(5) 0xc0000000005607e0 _semop + 0xe0 at ../../../../../core/libs/libc/shared_em_64_perf/../core/syscalls/t_semop.c:19 [/usr/lib/hpux64/libc.so.1]
(6) 0x4000000001707680 RqOsSem + 0xb0 at semux.c:1186 [dw.sapP20_D20]
(7) 0x40000000017097a0 SemRq + 0x810 at semux.c:1814 [dw.sapP20_D20]
(8) 0x4000000004cc2990 EsILock + 0x2410 at esxx.c:3449 [dw.sapP20_D20]
(9) 0x4000000004cca410 STD_EsAttach + 0x1d0 at esxx.c:2348 [dw.sapP20_D20]
(10) 0x4000000004cd5110 EsAttach + 0x90 at esxxfunc.c:874 [dw.sapP20_D20]
(11) 0x4000000004c988c0 EmContextAttach + 0x1e0 at emxx.c:932 [dw.sapP20_D20]
(12) 0x40000000018e20a0 ThCheckEmState + 0x300 at thxxmem.c:438 [dw.sapP20_D20]
(13) 0x40000000018dd780 ThRollIn + 0x380 at thxxmem.c:870 [dw.sapP20_D20]
(14) 0x400000000175bc20 ThSessionRestore + 0x180 at thxxhead.c:22129 [dw.sapP20_D20]
(15) 0x40000000017250b0 TskhLoop + 0x1210 at thxxhead.c:3542 [dw.sapP20_D20]
(16) 0x400000000171f000 ThStart + 0x5d0 at thxxhead.c:10759 [dw.sapP20_D20]
(17) 0x40000000015ab260 DpMain + 0x870 at dpxxdisp.c:1152 [dw.sapP20_D20]
(18) 0x40000000015a4b60 main + 0x80 at thxxanf.c:64 [dw.sapP20_D20]
(19) 0xc00000000006e9b0 main_opd_entry + 0x50 [/usr/lib/hpux64/dld.so]
M
M ***LOG Q02=> wp_halt, WPStop (Workproc 0 9408) [dpuxtool.c 268]
ashuai
 
Posts: 3
Joined: Thu May 30, 2013 4:16 am

Re: how to find out what caused the app server down

Postby bcchap » Fri Jun 07, 2013 7:09 am

This message seems to be your first one. A signal 2 means someone is shutting down SAP:

DpSigInt: caught signal 2

Then you are getting a DP_TIMEOUT from the task handler, where it got tired of waiting and killed the work process, but not cleanly.

So to me it looks like the system hangs and then someone kills it manually. Is that what is happening?

You are getting a signal 14, which the trace itself labels SIGALRM, i.e. a timer alarm rather than an invalid memory reference (the libc path in the stack points to HP-UX, not Linux). This can typically happen during shutdown, but quite possibly you have a memory corruption that caused a hang, and then someone shut down the instance. A signal 2 is not automatic.
The memory corruption could be down to a memory area, such as the screen buffer, being too small.
Another cause could be a recent kernel change or similar, so what has changed lately?

If it were my prod system, I would restart the entire host (or clean up shared memory, for those who do not like a restart) and wait for a recurrence, after checking short dumps for memory overflows and keeping a close eye on ST02 on all prod app servers.
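
For reference, the signal numbers that appear in these traces can be pulled out and named with a small lookup. This is only a sketch: the function name is made up, and the table holds just four conventional Unix signal numbers, so anything else should be checked with `kill -l` on the host itself.

```python
import re

# Conventional Unix numbers for a few signals (assumed subset, not exhaustive).
# Other signals, such as SIGUSR2, have platform-specific numbers and are left out.
COMMON_SIGNALS = {2: "SIGINT", 11: "SIGSEGV", 14: "SIGALRM", 15: "SIGTERM"}

# Matches both "caught signal 2" and "Signal 14 (SIGALRM) delivered".
SIGNAL_RE = re.compile(r"[Ss]ignal (\d+)")


def signals_in_trace(text):
    """Return (number, name) for every signal number mentioned in the text."""
    found = []
    for match in SIGNAL_RE.finditer(text):
        number = int(match.group(1))
        found.append((number, COMMON_SIGNALS.get(number, "unknown")))
    return found


if __name__ == "__main__":
    sample = """DpSigInt: caught signal 2
(3) 0xe00000013305f440 Signal 14 (SIGALRM) delivered
DpSigInt: caught signal 15
"""
    for number, name in signals_in_trace(sample):
        print(number, name)
```

In these logs, the signal 14 entries sit in the work process trace at 06:56, while signals 2 and 15 appear in the dispatcher trace only at 10:42 and 10:47, so the alarm came hours before the manual shutdown.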
bcchap
 
Posts: 6
Joined: Fri Jun 07, 2013 6:53 am

