~fhusson

WinDbg: how to find the cause of an IIS crash in production

We had a dump file of the crash \o/, and three months earlier I had attended a Microsoft training on this subject with the impressive Christophe Nasarre.

Opening the dump

  1. Launch WinDbg x64, because the dump comes from a 64-bit process.
  2. Press CTRL+D or use File / Open Crash Dump.
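If you are not sure whether a dump comes from a 32-bit or a 64-bit process, user-mode dumps written by the Windows minidump API (full-memory ones included) start with a MINIDUMP_HEADER, and their SystemInfoStream records the processor architecture. The sketch below reads it with Python's struct module, following the documented MINIDUMP_HEADER / MINIDUMP_DIRECTORY / MINIDUMP_SYSTEM_INFO layouts. The helper name dump_architecture is hypothetical, only the x86, IA64 and x64 values are mapped, and a 32-bit (WOW64) process dumped by a 64-bit tool will still report x64, so treat the result as a hint.

```python
import struct

# PROCESSOR_ARCHITECTURE_* values stored in MINIDUMP_SYSTEM_INFO.ProcessorArchitecture
ARCHITECTURES = {0: "x86", 6: "IA64", 9: "x64"}
SYSTEM_INFO_STREAM = 7  # MINIDUMP_STREAM_TYPE value of SystemInfoStream


def dump_architecture(data) -> str:
    """Return the processor architecture recorded in a minidump.

    data: any bytes-like object holding the start of the .dmp file.
    """
    # MINIDUMP_HEADER begins with Signature, Version, NumberOfStreams, StreamDirectoryRva
    signature, _version, stream_count, directory_rva = struct.unpack_from("<4sIII", data, 0)
    if signature != b"MDMP":
        raise ValueError("not a minidump file")
    # Each MINIDUMP_DIRECTORY entry is 12 bytes: StreamType, DataSize, Rva
    for index in range(stream_count):
        stream_type, _size, rva = struct.unpack_from("<III", data, directory_rva + 12 * index)
        if stream_type == SYSTEM_INFO_STREAM:
            # ProcessorArchitecture is the first USHORT of MINIDUMP_SYSTEM_INFO
            (arch,) = struct.unpack_from("<H", data, rva)
            return ARCHITECTURES.get(arch, f"unknown ({arch})")
    raise ValueError("no SystemInfoStream in this dump")
```

To avoid loading a multi-gigabyte dump into memory, you can pass an mmap.mmap of the file opened read-only instead of bytes; struct.unpack_from accepts any buffer.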

Setting the symbol server

.sympath SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
.reload
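A symbol path element has the form SRV*<local cache>*<symbol server>: WinDbg looks in the local cache first and downloads missing PDBs from the server into it, and several elements can be chained with ';'. The two helpers below (build_symbol_path and split_symbol_path are hypothetical names) are a minimal sketch of that format, handy when you generate debugger scripts.

```python
from typing import List


def build_symbol_path(cache_dir: str, server_url: str) -> str:
    """Compose a 'SRV*<local cache>*<symbol server>' element for .sympath."""
    return f"SRV*{cache_dir}*{server_url}"


def split_symbol_path(symbol_path: str) -> List[List[str]]:
    """Split a symbol path into its ';'-separated elements, then each element on '*'."""
    return [element.split("*") for element in symbol_path.split(";") if element]


path = build_symbol_path(r"c:\symbols", "http://msdl.microsoft.com/download/symbols")
print(".sympath " + path)
```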

Check the change:

.sympath
Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Expanded Symbol search path is: srv*c:\symbols*http://msdl.microsoft.com/download/symbols

Loading the SOS module for .NET

For .NET 2.0, 3.0 and 3.5 (all run on CLR 2.0, mscorwks.dll):

.loadby sos mscorwks

For .NET 4.0 (CLR 4, clr.dll):

.loadby sos clr
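.loadby sos <module> loads SOS from the same directory as <module>, so the module name has to be the CLR runtime DLL actually loaded in the dump. The choice therefore depends only on the CLR version, which this small sketch encodes (sos_load_command is a hypothetical helper; versions outside 2.x to 4.x are rejected on purpose).

```python
def sos_load_command(framework_version: str) -> str:
    """Map a .NET Framework version to the .loadby command for its CLR."""
    major = int(framework_version.split(".")[0])
    if major in (2, 3):
        # .NET 2.0, 3.0 and 3.5 all run on CLR 2.0, whose runtime DLL is mscorwks.dll
        return ".loadby sos mscorwks"
    if major == 4:
        # .NET 4.x runs on CLR 4, whose runtime DLL is clr.dll
        return ".loadby sos clr"
    raise ValueError(f"unsupported framework version: {framework_version}")
```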

Check which debugger extensions are loaded:

.chain

Check that the SOS extension is working:

!EEVersion

If you get this error:

Failed to load data access DLL, 0x80004005
Verify that 1) you have a recent build of the debugger (6.2.14 or newer)
            2) the file mscordacwks.dll that matches your version of mscorwks.dll is
                in the version directory
            3) or, if you are debugging a dump file, verify that the file
                mscordacwks_<arch>_<arch>_<version>.dll is on your symbol path.
            4) you are debugging on the same architecture as the dump file.
                For example, an IA64 dump file must be debugged on an IA64
                machine.

You can also run the debugger command .cordll to control the debugger's
load of mscordacwks.dll.  .cordll -ve -u -l will do a verbose reload.
If that succeeds, the SOS command should work on retry.

If you are debugging a minidump, you need to make sure that your executable
path is pointing to mscorwks.dll as well.
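Point 3 of the message expects a copy of the data access DLL named after the architecture and the exact mscorwks.dll version of the dump. This sketch builds that name under the assumption stated in point 4 that debugger and dump share one architecture, so both <arch> slots get the same value; the architecture tokens (x86, AMD64, IA64) are based on commonly seen file names and the helper dac_file_name is hypothetical.

```python
VALID_ARCHS = ("x86", "AMD64", "IA64")  # assumed tokens, as seen in typical DAC file names


def dac_file_name(mscorwks_version: str, arch: str = "AMD64") -> str:
    """Name under which a dump-matching mscordacwks.dll is looked up on the symbol path."""
    if arch not in VALID_ARCHS:
        raise ValueError(f"unexpected architecture: {arch}")
    # Both <arch> slots are equal when the debugger and the dump share an architecture.
    return f"mscordacwks_{arch}_{arch}_{mscorwks_version}.dll"
```

With the version reported later by lmv (2.0.50727.5448), the expected name would be mscordacwks_AMD64_AMD64_2.0.50727.5448.dll.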

Try what the message suggests:

.cordll -ve -u -l

If you get:

CLR DLL status: No load attempts

Try:

lmv m mscorwks

You should see something like:

start             end                 module name
000007fe`f9030000 000007fe`f99cc000   mscorwks   (pdb symbols)          c:\symbols\mscorwks.pdb\A3BDE007E06845F7A0A4073CD16B1D7A1\mscorwks.pdb
    Loaded symbol image file: mscorwks.dll
    Mapped memory image file: c:\symbols\mscorwks.dll\4E15396099c000\mscorwks.dll
    Image path: C:\Windows\Microsoft.NET\Framework64\v2.0.50727\mscorwks.dll
    Image name: mscorwks.dll
    Timestamp:        Thu Jul 07 06:43:12 2011 (4E153960)
    CheckSum:         0098DCAB
    ImageSize:        0099C000
    File version:     2.0.50727.5448
    Product version:  2.0.50727.5448
    File flags:       0 (Mask 3F)
    File OS:          4 Unknown Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® .NET Framework
    InternalName:     mscorwks.dll
    OriginalFilename: mscorwks.dll
    ProductVersion:   2.0.50727.5448
    FileVersion:      2.0.50727.5448 (Win7SP1GDR.050727-5400)
    FileDescription:  Microsoft .NET Runtime Common Language Runtime - WorkStation
    LegalCopyright:   © Microsoft Corporation.  All rights reserved.
    Comments:         Flavor=Retail

Add the directory containing mscorwks.dll (the Image path line above) to the executable search path:

.exepath+ C:\Windows\Microsoft.NET\Framework64\v2.0.50727\
.reload
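The directory to add is simply the parent of the Image path line printed by lmv. A minimal sketch that derives both commands from that output (exepath_commands is a hypothetical helper; it uses ntpath so Windows-style paths are split correctly on any OS):

```python
import ntpath
from typing import List


def exepath_commands(lmv_output: str) -> List[str]:
    """Build the .exepath+ / .reload commands from the 'Image path:' line of lmv output."""
    for line in lmv_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Image path:"):
            image_path = stripped[len("Image path:"):].strip()
            # ntpath understands backslash-separated Windows paths on every platform
            directory = ntpath.dirname(image_path)
            return [".exepath+ " + directory + "\\", ".reload"]
    raise ValueError("no 'Image path:' line in the lmv output")
```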

(Thanks to Volker von Einem for his post)

Check again that SOS is working:

!EEVersion
CLRDLL: Loaded DLL C:\Windows\Microsoft.NET\Framework64\v2.0.50727\mscordacwks.dll
2.0.50727.5448 free
Server mode with 24 gc heaps
SOS Version: 2.0.50727.5448 retail build

Finding the cause

List all the managed threads with:

!threads
ThreadCount: 38
UnstartedThread: 0
BackgroundThread: 38
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
PreEmptive Lock
ID OSID ThreadOBJ State GC GC Alloc Context Domain Count APT Exception
28 1 1430 0000000001f7ce30 8220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 Ukn
57 2 19b8 0000000001fd9f00 b220 Enabled 000000019f6d1378:000000019f6d3060 0000000001f704c0 0 MTA (Finalizer)
58 3 140c 0000000002012840 80a220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 MTA (Threadpool Completion Port)
59 4 1534 00000000020136f0 1220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 Ukn
60 9 b08 0000000009bd07a0 180b220 Enabled 000000016f893630:000000016f894500 0000000001f704c0 0 MTA (Threadpool Worker)
61 a 15d0 0000000009ae53a0 180b220 Disabled 00000001c0003ae8:00000001c0005340 00000000020141a0 1 MTA (Threadpool Worker) System.StackOverflowException (00000001af2a0138)
62 b 1b90 00000000098cd980 180b220 Enabled 00000001dfcc8a20:00000001dfcc8ad0 0000000001f704c0 0 MTA (Threadpool Worker)
63 c 1e74 000000000ca417c0 180b220 Enabled 0000000100383358:0000000100385200 0000000001f704c0 0 MTA (Threadpool Worker)
64 d 15a0 0000000009c5f9d0 180b220 Enabled 0000000140a3f988:0000000140a3fb28 0000000001f704c0 0 MTA (Threadpool Worker)
65 e 1d4c 000000000c8e12a0 180b220 Enabled 000000016076a250:000000016076bad0 0000000001f704c0 0 MTA (Threadpool Worker)
66 f 1ca0 0000000009c2cc90 180b220 Enabled 000000027020ffa0:00000002702115c8 0000000001f704c0 0 MTA (Threadpool Worker)
68 10 b74 000000000c97d6e0 180b220 Enabled 00000001affdbcd8:00000001affddc28 00000000020141a0 1 MTA (Threadpool Worker)
69 11 1d20 000000000ca01290 180b220 Enabled 0000000110415f08:0000000110417860 0000000001f704c0 0 MTA (Threadpool Worker)
70 12 15dc 000000000c9ffbb0 180b220 Enabled 000000011faae010:000000011faaf408 0000000001f704c0 0 MTA (Threadpool Worker)
71 13 134 000000000ca2dbb0 180b220 Enabled 00000001ef8a9f00:00000001ef8aa6b8 0000000001f704c0 0 MTA (Threadpool Worker)
72 14 1f54 000000000ca2e180 180b220 Enabled 000000023f96bc40:000000023f96db18 0000000001f704c0 0 MTA (Threadpool Worker)
73 15 166c 000000000ca2e750 180b220 Enabled 000000011fb05b38:000000011fb07408 0000000001f704c0 0 MTA (Threadpool Worker)
74 16 1b20 000000000ca2ed20 180b220 Enabled 00000001ffdc1038:00000001ffdc1ef8 0000000001f704c0 0 MTA (Threadpool Worker)
75 17 1774 000000000ca2f2f0 180b220 Enabled 000000023fb99c60:000000023fb9a618 0000000001f704c0 0 MTA (Threadpool Worker)
76 8 1b10 000000000ca2f8c0 180b220 Enabled 00000001dfcb6100:00000001dfcb6ad0 0000000001f704c0 0 MTA (Threadpool Worker)
77 7 10f4 000000000ca2fe90 180b220 Enabled 000000012ffb5400:000000012ffb7398 0000000001f704c0 0 MTA (Threadpool Worker)
78 18 42c 000000000ca30460 180b220 Enabled 000000021f80cd48:000000021f80d468 0000000001f704c0 0 MTA (Threadpool Worker)
79 19 1d94 000000000ca30a30 180b220 Enabled 000000022f8ad6e8:000000022f8af470 0000000001f704c0 0 MTA (Threadpool Worker)
80 1a 127c 000000000ca31000 180b220 Enabled 000000014f919ed8:000000014f91a980 0000000001f704c0 0 MTA (Threadpool Worker)
81 1b 1928 000000000ca315d0 180b220 Enabled 00000001cff3abd0:00000001cff3c168 0000000001f704c0 0 MTA (Threadpool Worker)
82 1c 1250 000000000ca31ba0 180b220 Enabled 000000024f96e848:000000024f9707a0 0000000001f704c0 0 MTA (Threadpool Worker)
83 1d 16ac 000000000ca32170 180b220 Enabled 00000001bff81ce0:00000001bff83340 0000000001f704c0 0 MTA (Threadpool Worker)
84 1e 1eec 000000000ca32740 180b220 Enabled 000000024f982a30:000000024f9847a0 0000000001f704c0 0 MTA (Threadpool Worker)
85 20 424 000000000ca332e0 880b220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 MTA (Threadpool Completion Port)
86 21 3fc 000000000ca338b0 200b220 Enabled 000000018fa79a10:000000018fa7a078 00000000020141a0 1 MTA
4 22 1d8c 000000000ca33e80 220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 Ukn
88 23 b58 000000000ca34450 200b220 Enabled 000000017f9e4390:000000017f9e5d20 00000000020141a0 0 MTA System.IO.FileNotFoundException (00000000ff338d80)
24 1f 1d90 000000000ca32d10 220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 Ukn
27 6 b04 000000000ca34a20 220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 Ukn
26 24 17c4 000000000ccb89e0 220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 Ukn
91 25 1844 000000000ccb8fb0 80a220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 MTA (Threadpool Completion Port)
2 26 86c 000000000ccb8410 220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 Ukn
90 27 1c88 000000000ccba120 220 Enabled 0000000000000000:0000000000000000 0000000001f704c0 0 Ukn
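With dozens of threads, the Exception column is easy to miss. The sketch below scans pasted !threads output and reports every thread carrying an exception, together with the first column (the debugger thread number that ~XXs expects) and the exception object address (usable with !do). It assumes whitespace-separated rows with ten fixed columns before the optional thread type and exception, as in the output above; the names threads_with_exceptions and ThreadException are hypothetical.

```python
import re
from typing import List, NamedTuple


class ThreadException(NamedTuple):
    debugger_id: int  # first column: the number used with ~XXs
    os_id: str        # OSID column (hex)
    exception: str    # exception type name
    address: str      # exception object address, usable with !do


EXCEPTION_RE = re.compile(r"(\S+Exception) \(([0-9a-fA-F]+)\)")


def threads_with_exceptions(threads_output: str) -> List[ThreadException]:
    """Return the threads of a !threads listing that show an exception."""
    results = []
    for line in threads_output.splitlines():
        fields = line.split()
        # Data rows start with the decimal debugger thread number; headers do not.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # Everything after the APT column: thread type and/or exception.
        match = EXCEPTION_RE.search(" ".join(fields[10:]))
        if match:
            results.append(ThreadException(int(fields[0]), fields[2], match.group(1), match.group(2)))
    return results
```

On the listing above it would flag thread 61 (the StackOverflowException) and also thread 88, which carries a System.IO.FileNotFoundException worth a separate look.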

Thread 61 has a StackOverflowException!

If we are not on the right thread, we switch with ~XXs, where XX is the debugger thread number (the first column of the !threads output, not the managed ID or the OSID). Here:

~61s

Now we can look at the managed stack:

!CLRStack
OS Thread Id: 0x15d0 (61)
*** WARNING: Unable to verify checksum for mscorlib.ni.dll
Unable to load image C:\Windows\assembly\NativeImages_v2.0.50727_64\Microsoft.VisualBas#\486ff8cee09c8c63aa9c60ff4f5feafa\Microsoft.VisualBasic.ni.dll, Win32 error 0n2
*** WARNING: Unable to verify checksum for Microsoft.VisualBasic.ni.dll
*** WARNING: Unable to verify checksum for System.Data.ni.dll
(!clrstack processes a max of 1000 stack frames)
Child-SP RetAddr Call Site
000000000d4b7230 000007fef5237fed XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4b7260 000007ff002446c6 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4b72a0 000007ff00244634 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4b7300 000007ff002445dd XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4b7340 000007ff00ec16af XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4b7370 000007ff015f0389 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4b7460 000007ff00e9bcac XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4b74a0 000007fef913e219 Line 1
000000000d4b9460 000007fef913e219 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bb430 000007fef913e219 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bd430 000007fef913e219 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf410 000007fef21143d5 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf450 000007fef2112c5b XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf4b0 000007fef2107b4a XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf580 000007fef2107410 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf5d0 000007fef21043c9 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf650 000007fef21042ad XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf6d0 000007fef2103d48 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf7b0 000007fef2103b8c XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf860 000007fef21039c3 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf8a0 000007fef2103835 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf950 000007ff00d3a002 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bf9c0 000007ff0148fa2a XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bfa70 000007ff0148f8d6 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bfb10 000007ff0148f847 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bfb60 000007ff00e9bade XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000d4bfbb0 000007ff00d4eb9a Line 1
000000000d4bfc60 000007ff00d4ef02 Line 2
000000000d4bfca0 000007ff00d4db60 Line 3
000000000d4bfd30 000007ff01984a67 Line 4
000000000d4bfe50 000007ff00e9bc11 Line 5
000000000d4bfef0 000007ff00d4eb9a Line 1
000000000d4bffa0 000007ff00d4ef02 Line 2
000000000d4bffe0 000007ff00d4db60 Line 3
000000000d4c0070 000007ff01984a67 Line 4
000000000d4c0190 000007ff00e9bc11 Line 5
000000000d4c0230 000007ff00d4eb9a Line 1
000000000d4c02e0 000007ff00d4ef02 Line 2
000000000d4c0320 000007ff00d4db60 Line 3
000000000d4c03b0 000007ff01984a67 Line 4
000000000d4c04d0 000007ff00e9bc11 Line 5
000000000d4c0570 000007ff00d4eb9a Line 1
000000000d4c0620 000007ff00d4ef02 Line 2
000000000d4c0660 000007ff00d4db60 Line 3
000000000d4c06f0 000007ff01984a67 Line 4
000000000d4c0810 000007ff00e9bc11 Line 5
000000000d4c08b0 000007ff00d4eb9a Line 1
000000000d4c0960 000007ff00d4ef02 Line 2
000000000d4c09a0 000007ff00d4db60 Line 3
000000000d4c0a30 000007ff01984a67 Line 4
000000000d4c0b50 000007ff00e9bc11 Line 5
...

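The same five frames (Line 1 to Line 5) repeat over and over down the stack: this is an infinite recursion that exhausted the thread's stack. Now we can open Visual Studio and change the code to remove or stop the recursive calls.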
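On a long stack the cycle is easier to confirm programmatically. This sketch looks for the first run of call sites that repeats back to back, and measures how much stack one cycle consumes from the Child-SP column (the stack grows downward, so a deeper frame has a lower address). The helper names are hypothetical; with the values above, Line 1 appears at 000000000d4bfbb0 and again at 000000000d4bfef0, so each cycle uses 0x340 (832) bytes.

```python
from typing import List, Optional, Sequence


def find_recursion_cycle(call_sites: Sequence[str], min_repeats: int = 3) -> Optional[List[str]]:
    """Return the first run of call sites (scanning from the top of the stack)
    that repeats back to back at least min_repeats times, or None."""
    frames = list(call_sites)
    n = len(frames)
    for start in range(n):
        # Try the shortest period first so the smallest cycle is reported.
        for period in range(1, (n - start) // min_repeats + 1):
            cycle = frames[start:start + period]
            repeats, pos = 1, start + period
            while frames[pos:pos + period] == cycle:
                repeats += 1
                pos += period
            if repeats >= min_repeats:
                return cycle
    return None


def stack_bytes_per_cycle(inner_sp: str, outer_sp: str) -> int:
    """Bytes between two occurrences of the same frame, from their Child-SP values."""
    return int(outer_sp, 16) - int(inner_sp, 16)
```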

If we need more information, we can inspect the parameters and local objects stored in the dump:

For the parameters:

!clrstack -p

For the local variables:

!clrstack -l

Or for both:

!clrstack -a

When you have an object reference, you can dump it with:

!do 0x000000015f9e56f8