Friday, June 3, 2011

DbgEng.lib Part 1- Checking for Debug Breaks

DbgEng.lib- What is it?
dbgeng.lib is the library upon which WinDbg, CDB, and NTSD are all built.  It allows you to connect to a Windows operating system, insert breakpoints, catch bugchecks, dump memory, and do anything else you can do in a debugger.  Further, it allows you to connect to existing debugger instances and check their status.  You can get your hands on dbgeng.lib by downloading the Debugging Tools for Windows.

Why would I want to use it?
Good question.  I used it to develop some automated stress testing for a component of Windows.  We wanted to be careful that we weren’t rebooting VMs that were bugchecked (and losing all that juicy stack trace info), and the best way to do that was to check the attached debugger for a break before resetting the VM.  You could use it for driver development (seriously… test your drivers, or Windows will bugcheck and everybody blames us), to implement your own debugger extensions, or to auto-triage failures based on the call stack.

What's the catch?
There’s always a catch.  It turns out that if your debugger gets out of sync with the client, some calls to dbgEng can hang indefinitely.  But we’ll get to that in a bit.

Getting Started
In this example, I’m going to stick with checking whether a debugger is broken in.  The more advanced case is implementing your own debugger extension, and maybe I’ll post about that later.
First, you’ll need to make sure you have a debugger running that’s connected to the target.  Pretty easy.  Say you have a VM named “vm1”, and you’ve set its com1 port to be a named pipe called “vm1_com1”.  You would just run

kd -server npipe:pipe=vm1_debug_pipe,icfenable -k com:pipe,port=vm1_com1,resets=0,reconnect

from an elevated cmd prompt.  At this point, you’ll be able to connect to that debugger from any other machine (windbg.exe -remote npipe:server=servernamewherekdlives,pipe=vm1_debug_pipe).  If the usernames and passwords aren’t the same on the machine you’re connecting from and the one you’re connecting to, you’ll get an access denied error.  Try the following from an elevated cmd prompt on the connecting machine to connect as a specific user:

net use \\servername /u:username password

Next, you’ll want to fire up your trusty C++ IDE.  Include dbgeng.h, and make sure you set your project to point to dbgeng.lib.

Codify Me!
Now let’s take a look at some code:
#include <windows.h>
#include <dbgeng.h>
#include <stdio.h>

// Stand-in for the logging macro used in the original project.
#define TRACE_ERROR(...) fprintf(stderr, __VA_ARGS__)

int main()
{
    HRESULT hr;
    BOOL BrokenIn = FALSE;
    ULONG Status = 0;
    IDebugControl* pControl = NULL;
    IDebugClient* pClient = NULL;
    // This must name the pipe that "kd -server" is listening on
    // (vm1_debug_pipe in the command above), not the VM's COM pipe.
    PCSTR RemoteOptions = "npipe:server=localhost,pipe=vm1_debug_pipe";

    // Connect to the already-running debugger.
    hr = DebugConnect(RemoteOptions, __uuidof(IDebugClient), (PVOID*)&pClient);
    if(FAILED(hr))
    {
        TRACE_ERROR("DebugConnect failed with error 0x%08lx\n", (unsigned long)hr);
        goto Cleanup;
    }
    // Ask the client for the control interface.
    hr = pClient->QueryInterface(__uuidof(IDebugControl), (PVOID*)&pControl);
    if(FAILED(hr))
    {
        TRACE_ERROR("QueryInterface failed with error 0x%08lx\n", (unsigned long)hr);
        goto Cleanup;
    }
    // Status receives one of the DEBUG_STATUS_* values.
    hr = pControl->GetExecutionStatus(&Status);
    if(FAILED(hr))
    {
        TRACE_ERROR("GetExecutionStatus failed with error 0x%08lx\n", (unsigned long)hr);
        goto Cleanup;
    }
    if(Status == DEBUG_STATUS_BREAK)
    {
        BrokenIn = TRUE;
    }
Cleanup:
    if(pControl)
    {
        pControl->Release();
    }
    if(pClient)
    {
        pClient->Release();
    }
    if(BrokenIn)
    {
        TRACE_ERROR("Debugger is broken in\n");
        /*do something interesting*/
    }
    return 0;
}


So what just happened?
Well, we called DebugConnect, a function provided by DbgEng, to connect to an already existing debugger.  If we wanted to create a debugger instead, we would use the DebugCreate() call.  We got ourselves an IDebugClient* out of the deal, which is a pointer to a COM interface object that we can use to query for an IDebugControl instance.  We could also query for other objects, like IDebugSymbols, IDebugDataSpaces, and IDebugRegisters.  These would be used to do interesting things in a debugger extension.

Once we had our IDebugControl instance, we just queried for the execution status of the debugger.  Voilà, we know whether or not we’re broken in!  Now, you can do so much more with this interface- anything from executing debugger commands to disassembling processor instructions to reading memory locations.
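
GetExecutionStatus can report more than just "broken in", and when you're logging from a stress harness it helps to print a name instead of a raw number.  Here's a rough sketch of a helper for that.  ExecutionStatusName is a name I made up for illustration, not part of DbgEng, and the fallback #defines are only there so the snippet builds without dbgeng.h (the values mirror that header, but check them against your SDK copy).

```cpp
#include <string>

// Fallback values for building without <dbgeng.h>. When the real header
// has already been included, these are skipped and its macros are used.
#ifndef DEBUG_STATUS_BREAK
#define DEBUG_STATUS_NO_CHANGE      0
#define DEBUG_STATUS_GO             1
#define DEBUG_STATUS_GO_HANDLED     2
#define DEBUG_STATUS_GO_NOT_HANDLED 3
#define DEBUG_STATUS_STEP_OVER      4
#define DEBUG_STATUS_STEP_INTO      5
#define DEBUG_STATUS_BREAK          6
#define DEBUG_STATUS_NO_DEBUGGEE    7
#endif

// Hypothetical helper: turn the value written by GetExecutionStatus
// into a short label for log output.
inline std::string ExecutionStatusName(unsigned long status)
{
    switch (status)
    {
    case DEBUG_STATUS_NO_CHANGE:      return "no change";
    case DEBUG_STATUS_GO:             return "go";
    case DEBUG_STATUS_GO_HANDLED:     return "go (handled)";
    case DEBUG_STATUS_GO_NOT_HANDLED: return "go (not handled)";
    case DEBUG_STATUS_STEP_OVER:      return "step over";
    case DEBUG_STATUS_STEP_INTO:      return "step into";
    case DEBUG_STATUS_BREAK:          return "break";
    case DEBUG_STATUS_NO_DEBUGGEE:    return "no debuggee";
    default:                          return "other";
    }
}
```

In the sample above you would pass Status to it right after the GetExecutionStatus call succeeds.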

Now for the BUT...
As I mentioned before, the debugger and the debuggee can get out of sync.  This leads to the debugger outputting the error “retry sending the same data packet 64 times” and becoming totally unresponsive.  Even worse, any call into the IDebugControl interface HANGS INDEFINITELY.  Horrible.  Just horrible.

In my experience, this seems to happen only on rebooting the debuggee machine.  The workaround I came up with for dealing with this was to implement a ref counting scheme for each debug instance.  Every time a thread wanted to call into a debug instance, it would increment the ref count and set the LastCalledTime variable, then decrement the ref count when the call returned.  I then had a separate thread that checked every few seconds whether the ref count was > 0 and the time since LastCalledTime exceeded some threshold value.  Once you know the debugger is hung, you can either abort completely, closing down the debugger and starting up a new one, or you can reboot the debuggee again.  Sometimes it takes several reboots before the debuggee connects successfully.
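
To make that a bit more concrete, here's a minimal sketch of the bookkeeping in portable C++.  This is not the code from my test harness; DebugCallTracker, ScopedDebugCall, and LooksHung are names invented for this post, and the time arguments are explicit only so the logic can be exercised without a real debugger attached.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical per-debugger bookkeeping: a count of calls currently
// inside the debug engine plus the start time of the most recent one.
class DebugCallTracker
{
public:
    // Call just before calling into the debug engine.
    void Enter(Clock::time_point now = Clock::now())
    {
        lastCalled_.store(now.time_since_epoch().count());
        inFlight_.fetch_add(1);
    }

    // Call after the debug engine call returns.
    void Leave()
    {
        int previous = inFlight_.fetch_sub(1);
        assert(previous > 0); // Leave without a matching Enter is a bug
        (void)previous;
    }

    // True when a call is outstanding and the most recent call started
    // more than `threshold` before `now`.
    bool LooksHung(Clock::duration threshold,
                   Clock::time_point now = Clock::now()) const
    {
        if (inFlight_.load() <= 0)
        {
            return false;
        }
        Clock::time_point started{Clock::duration(lastCalled_.load())};
        return (now - started) > threshold;
    }

private:
    std::atomic<int> inFlight_{0};
    std::atomic<Clock::rep> lastCalled_{0};
};

// RAII wrapper so every Enter is paired with a Leave, even on early return.
class ScopedDebugCall
{
public:
    explicit ScopedDebugCall(DebugCallTracker& tracker) : tracker_(tracker)
    {
        tracker_.Enter();
    }
    ~ScopedDebugCall() { tracker_.Leave(); }
    ScopedDebugCall(const ScopedDebugCall&) = delete;
    ScopedDebugCall& operator=(const ScopedDebugCall&) = delete;

private:
    DebugCallTracker& tracker_;
};
```

A worker thread would put a ScopedDebugCall on the stack right before something like GetExecutionStatus, and the watchdog thread would wake up every few seconds and call LooksHung with whatever threshold you can tolerate.  Note that a hung call never returns, so the guard's destructor never runs for it; that's exactly why the count stays above zero and the watchdog can notice.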

This is a terrible, hacky way to deal with it, but it’s better than the alternative.  It boggles my mind that there isn’t a timeout implemented inside dbgeng.lib for exactly this case- that seems by far the best solution.

But alas, you can’t win them all.
