Debugging Native Memory Leaks, Part 1: LeakDiag

Leaking memory is probably the single most painful aspect of native code – it’s the reason managed code was ever born.

At work, our code routes ‘new’ calls through _aligned_malloc_dbg. This CRT API, along with cousins like _malloc_dbg and _calloc_dbg, takes extra parameters containing a file name and line number, and so enables the CRT to report the exact location of an unreleased allocation upon process termination.
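The idea behind these file-and-line allocators can be sketched in portable C++. This is only an illustration, not the CRT's implementation: `TrackedAlloc`, `TrackedFree`, `TRACKED_ALLOC` and `LeakReport` are hypothetical names invented here, standing in for the `_malloc_dbg` family and the report the CRT prints at exit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical bookkeeping record: where an allocation was requested.
struct AllocSite {
    const char* file;
    int line;
    std::size_t size;
};

// Live (not yet freed) allocations, keyed by the returned pointer.
inline std::map<void*, AllocSite>& LiveAllocations() {
    static std::map<void*, AllocSite> live;
    return live;
}

// Stand-in for a _malloc_dbg-style call: allocate and remember file and line.
inline void* TrackedAlloc(std::size_t size, const char* file, int line) {
    assert(file != nullptr);
    void* p = std::malloc(size);
    if (p != nullptr) {
        LiveAllocations()[p] = AllocSite{file, line, size};
    }
    return p;
}

// Matching release: forget the record, then free the block.
inline void TrackedFree(void* p) {
    LiveAllocations().erase(p);
    std::free(p);
}

// The macro captures the location of the *direct* caller only.
#define TRACKED_ALLOC(size) TrackedAlloc((size), __FILE__, __LINE__)

// One line per block that is still allocated, in "file(line): N bytes" form.
inline std::vector<std::string> LeakReport() {
    std::vector<std::string> report;
    for (const auto& entry : LiveAllocations()) {
        const AllocSite& site = entry.second;
        report.push_back(std::string(site.file) + "(" +
                         std::to_string(site.line) + "): " +
                         std::to_string(site.size) + " bytes");
    }
    return report;
}
```

Note that the macro records only the line that called it. If every allocation goes through one shared helper, every leak is attributed to that helper.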

Sadly, this is useful only in the handful of cases where the code allocates directly. What if the offending allocation is performed via some common container routine? Even worse – what if the leak is properly de-allocated in destructors at shutdown time? The CRT support would be of no use. Wouldn’t it be nice if we could see the entire stack that allocated unreleased memory?

There are two powerful, free, and vastly different tools from Microsoft that achieve just that.

Enter LeakDiag!

I’m actually not sure how public this tool is. Two of its components, LeakDiag and LDGrapher, are available on the public MS FTP, but a third one, LDParser, seems to be available only through Microsoft Premier Support. Anyway, both LDParser and LDGrapher only format the output, and LDGrapher can do most (but not all!) of what LDParser does.

LeakDiag does its magic by using Detours technology (fascinating read!) to intercept memory allocator calls. Detours enables interception of any API, and does not just replace it but extends it – that is, it preserves the original function, and enables calling it via a so-called ‘trampoline’ stub. LeakDiag allows you to specify various low-level allocators, and once activated it intercepts them and adds stack-walking functionality to them.
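The intercept-and-extend pattern can be illustrated with a plain function pointer. This is a simplified sketch and not how Detours works internally: Detours rewrites the target function's machine code, whereas here the indirection is an ordinary pointer that callers are assumed to go through. All names (`RealAlloc`, `TargetAlloc`, `Trampoline`, `HookedAlloc`, `AttachHook`, `DetachHook`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Signature shared by the allocator and its replacement.
using AllocFn = void* (*)(std::size_t);

// The "real" allocator being intercepted (stand-in for a low-level allocator).
inline void* RealAlloc(std::size_t size) { return std::malloc(size); }

// Callers allocate through this pointer; it plays the role of the patched entry point.
inline AllocFn& TargetAlloc() {
    static AllocFn target = &RealAlloc;
    return target;
}

// The "trampoline": still reaches the original function while the hook is attached.
inline AllocFn& Trampoline() {
    static AllocFn original = nullptr;
    return original;
}

// What the hook records; a tool like LeakDiag would capture a call stack here.
inline std::vector<std::size_t>& RecordedSizes() {
    static std::vector<std::size_t> sizes;
    return sizes;
}

// The replacement: do the extra work, then defer to the original via the trampoline.
inline void* HookedAlloc(std::size_t size) {
    RecordedSizes().push_back(size);
    return Trampoline()(size);
}

inline void AttachHook() {
    assert(Trampoline() == nullptr);  // attach only once
    Trampoline() = TargetAlloc();     // preserve the original
    TargetAlloc() = &HookedAlloc;     // redirect callers to the hook
}

inline void DetachHook() {
    assert(Trampoline() != nullptr);  // must be attached
    TargetAlloc() = Trampoline();     // restore the original
    Trampoline() = nullptr;
}
```

Only allocations made between `AttachHook()` and `DetachHook()` are recorded. The same holds for the real tool: it sees only allocations made after tracking starts.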

To demonstrate, consider the leaking code here (adapted from a UMDH demo) :

[sourcecode]
#include "stdafx.h"
#include <stdio.h>
#include <conio.h>

void LeakyFunc1();
void LeakyFunc2();
void LeakyFunc3();
void LeakHere(int value);

int _tmain(int argc, _TCHAR* argv[])
{
    printf("Activate LeakDiag tracking here...\n\n");
    _getch();
    printf("Take first log now...\n\n");
    _getch();

    // Leak through three distinct call stacks, 1000 times each.
    for (int i = 0; i < 1000; i++)
    {
        LeakyFunc1();
        LeakyFunc2();
        LeakyFunc3();
    }

    printf("Take second log..\n\nPress any key to exit application...\n");
    _getch();
    return 0;
}

void LeakyFunc1() { LeakHere(500); }
void LeakyFunc2() { LeakHere(1000); }
void LeakyFunc3() { LeakHere(1500); }

// Deliberately leaks: the buffer is never deleted.
void LeakHere(int value)
{
    char* cBuff = new char[value];
}
[/sourcecode]

Start LeakDiag and run your program. In the LeakDiag window select View/Refresh (just a good habit, meaningful if it was already running), and select your process in the list (in this example: LeakyApp.exe). At the first wait for input (and generally, as early as you can in the program), start the LeakDiag C Runtime allocator.

This activates the API interception in your selected process (LeakyApp.exe), for your selected API (CRT allocator).

Switch focus back to LeakyApp.exe and press any key. At the second and third waits for input, take a LeakDiag log.

Press any key again to end the program. LeakDiag dumps its logs as XML files, by default into %LeakDiag dir%/logs. The files are quite readable and it is occasionally useful to dig in manually – but to gain some insight into large dumps (or many dumps), LDGrapher can help tremendously.

Start LDGrapher. Open all the allocation logs you dumped. A multi-line graph appears, where every line represents a specific recurring allocation stack. The stack responsible for most allocations is coloured red, the rest are yellow. Each x-tick is a specific dumped log. And the useful part: click any circle, representing an allocation stack at a specific log, and inspect the complete stack leading to that allocation, along with source file names and line numbers!
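To make the graph's structure concrete, here is a hedged sketch of the aggregation it implies: one series per distinct call stack, one point per log. It is not LDGrapher's code and does not parse LeakDiag's XML. The types and functions (`StackCount`, `Log`, `BuildSeries`, `HeaviestStack`) are hypothetical, and "most allocations" is assumed to mean the highest count in the last log.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One log entry: a call stack (flattened to a string) and its allocation count.
struct StackCount {
    std::string stack;
    std::size_t count;
};
using Log = std::vector<StackCount>;

// series[stack][i] is the count for that stack in log i (0 if absent from that log).
inline std::map<std::string, std::vector<std::size_t>>
BuildSeries(const std::vector<Log>& logs) {
    std::map<std::string, std::vector<std::size_t>> series;
    for (std::size_t i = 0; i < logs.size(); ++i) {
        for (const StackCount& entry : logs[i]) {
            std::vector<std::size_t>& points = series[entry.stack];
            points.resize(logs.size(), 0);  // pad with zeros, one slot per log
            assert(points.size() == logs.size());
            points[i] += entry.count;       // identical stacks are merged
        }
    }
    return series;
}

// The stack with the highest count in the last log (the line that would be red).
inline std::string HeaviestStack(
    const std::map<std::string, std::vector<std::size_t>>& series) {
    std::string heaviest;
    std::size_t best = 0;
    for (const auto& entry : series) {
        if (entry.second.empty()) continue;
        const std::size_t last = entry.second.back();
        if (last > best) {
            best = last;
            heaviest = entry.first;
        }
    }
    return heaviest;
}
```

A series that keeps climbing from log to log is the one to click on first.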

While this is mega-cool as is, it can be tweaked further in many ways. You can intercept and dump different allocators. If you’re hunting for a leak of a known size, you can limit the dump to include just that size (or a size range). You can use DbgHlp StackWalk to overcome FPO (which you shouldn’t use in your own code anyway!), and more.

However, LeakDiag has one significant flaw – that I think amounts to just a weird design choice (that is, I can’t understand when it would be helpful): all its functionality (except the GUI) seems to run in the target process. You can actually see LeakDiag messages in your debugger.

That can make delicate control quite hard. For one, you cannot place breakpoints in locations where you want a dump (hence the ‘_getch’ calls in the code sample). For another, suppose you’re not continuously leaking memory – but just forgot to release some. Wouldn’t it be nice to be able to take an allocation dump just before the process terminates, and see whatever still needs releasing? Alas, you cannot do this with LeakDiag, as any code you intend to run immediately before terminating would not run.

The solution (hint: UMDH) will have to wait for another post.

12 Responses to Debugging Native Memory Leaks, Part 1: LeakDiag

A. Murray says:

August 5, 2009 at 6:40 pm

Hey Ofek, these look like a couple of good tools. I’ve downloaded them to track down some issues I am having with a native COM+ application. Essentially, I start up LeakDiag and select the instance of dllhst3g.exe that is running my COM+ app. When I select ‘C Runtime Allocator’ and click Start, the button changes to ‘starting’ but nothing else happens. Are these tools compatible with native COM+ apps?

Cheers,
Andy

- Ofek Shilon says:
  
  August 6, 2009 at 10:59 am
  
  Hey Andy – last I saw this symptom, it was because my application was paused at a breakpoint, under a debugger. It seems LeakDiag injects app code that runs under identical restrictions as the app threads (unlike UMDH, for example – but more on that in a future post). Could that be your issue?
  If not – in your context, I would try tracking both ‘COM allocator’ and ‘COM internal allocator’ (haven’t got a clue what the difference is) instead of the CRT allocator.
  Two other possibly helpful things I’ve learnt since writing this:
  (1) The detours technology, used by LeakDiag, isn’t perfectly robust,
  (2) a newer tool, called DebugDiag, might be of more help to you. It specifically mentions COM+ as one of its foci.
  
  - A. Murray says:
    
    August 6, 2009 at 12:17 pm
    
    Hi Ofek, thanks for the reply. I’ve tried all of the allocator types and none of them appear to work, unfortunately. I’ve been using DebugDiag for a while to get some dumps but I’m finding that without valid symbol files they’re not as useful as I’d have otherwise hoped. I’m focusing just now on getting those valid symbol files – do you know of any good resources in this respect?
    
    Thanks again,
    Andy
    
Ofek Shilon says:

August 6, 2009 at 3:43 pm

Andy
(1) You’ve been using DebugDiag only to get dumps? It can act as a direct LeakDiag replacement, tracking allocation stacks. Have you tried it?
(2) Can’t understand the symbol issue: are you debugging an app that you don’t have the code/symbols for? Perhaps what you’re missing are symbols for some MS modules?

A. Murray says:

August 7, 2009 at 2:48 pm

Hi Ofek, how does DebugDiag affect runtime performance? We only seem able to reproduce the issue on live (we don’t use load tests – which I realise isn’t ideal). If DebugDiag affected processes too much it wouldn’t be an option for us, and the client would be dubious about performance unless we could give assurances.
I have to own up to the symbol issue – we currently don’t do anything with symbol files. When we release code we don’t store the symbols and eventually they just get deleted, a situation I’m trying to rectify.
Am I right in saying that the build process generates GUIDs that are written to both the binaries and the symbol files, and if they don’t match then the symbols are of no use? Do you know much about how to create a symbol server and get symbols onto it?

- Ofek Shilon says:
  
  August 7, 2009 at 7:51 pm
  
  My bad! My bad! There’s indeed a GUID involved in the matching. (The link is a great read and very relevant by itself.)
  
- Ofek Shilon says:
  
  August 8, 2009 at 11:43 pm
  
  Some more details:
  I wasn’t *completely* wrong about timestamp matching: PDB v2.0 indeed used timestamps. PDB v7.0+ uses a GUID. If you produce symbols in DBG format, the match is still based on timestamp alone.
  I was also wrong, and also not entirely, about being able to bypass the requirement to use exactly matching symbols. You can’t do it in Visual Studio (and rather intentionally so), but you *can* in WinDbg: use ‘.symopt+0x40’.
  Even better: it seems you can modify the PDB with this evil utility called ChkMatch, thereby fooling VS into accepting it.
  
Ofek Shilon says:

August 7, 2009 at 7:44 pm

In my brief experimenting with DebugDiag I didn’t feel any performance overhead. However, our app isn’t intensive on runtime allocations (it’s more like a game than like a web server). Either way, I really don’t know enough about its internals to give predictions.
About symbols – IIRC the build process generates an executable & PDB with the *exact* same date and time, and later on debuggers match only these attributes and the module name. I think VS also has a checkbox somewhere that says something like “require symbol files to match exactly”, but I’ll need to verify that back at the office on Sunday. I should also try and find online references for date & time matching.
Excluding MS uber-devs – without symbols, stacks or dumps are pretty much useless. We deploy public PDBs *with* our app, but our circumstances are unique (our product includes custom hardware, a PC, and large software modules combined) and I realize this may not be suitable for everyone. You might try unchecking that “match symbols exactly” requirement, and see if the stacks you get make sense. I don’t have VS at home, but it’s probably under project props / configuration / c++ / debugging, or something close. I have no experience whatsoever with in-house symbol servers. From brief documentation skimming – this seems a major undertaking.

Nick says:

August 4, 2010 at 9:48 am

This is a great article. You are saying that “The stack responsible for most allocations is coloured red, the rest are yellow”. What I don’t understand is: is there one stack or multiple stacks? Do you mean one stack per thread? Thanks

- Ofek Shilon says:
  
  August 15, 2010 at 8:48 pm
  
  Nick – thanks. The Y-axis is the count of different invocations of identical call stacks, resulting in memory allocation. I.e., these are *different* calls, with *identical* stacks.
  