White Paper: How to Identify and Resolve Memory Bugs in Parallel and Distributed Applications

Memory bugs, which are essentially mistakes in the management of heap memory, can occur in any program that is being written, enhanced, or maintained. A memory bug can be caused by a number of factors, including:

  • Failure to check for error conditions
  • Relying on nonstandard behavior
  • Memory leaks, including failure to free memory
  • Dangling references, such as failure to clear pointers
  • Array bounds violations
  • Memory corruption, such as writing to memory the program does not own or overrunning array bounds
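As a rough illustration of how several of these causes can be guarded against in ordinary C code, the sketch below wraps a heap array in a small helper. The `IntBuf` type and the `intbuf_*` functions are hypothetical names invented for this example and are not taken from the white paper or from any tool it describes. The sketch checks the allocation result, rejects out-of-bounds writes, and clears the pointer after freeing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type: a heap array that remembers its own length. */
typedef struct {
    int   *data;
    size_t len;
} IntBuf;

/* Allocate a zero-filled buffer of len elements.
 * Returns 0 on success and -1 on failure, so the caller can check
 * the error condition instead of assuming the allocation worked. */
int intbuf_init(IntBuf *b, size_t len) {
    assert(b != NULL);
    b->data = NULL;
    b->len = 0;
    if (len == 0) {
        return -1;
    }
    int *p = calloc(len, sizeof *p);
    if (p == NULL) {
        return -1;              /* allocation failure is reported, not ignored */
    }
    b->data = p;
    b->len = len;
    return 0;
}

/* Store value at index i. Out-of-bounds writes and writes to a
 * released buffer are refused with -1 rather than corrupting memory. */
int intbuf_set(IntBuf *b, size_t i, int value) {
    assert(b != NULL);
    if (b->data == NULL || i >= b->len) {
        return -1;
    }
    b->data[i] = value;
    return 0;
}

/* Release the memory (avoiding a leak) and clear the pointer so that
 * no dangling reference is left behind. Safe to call more than once. */
void intbuf_free(IntBuf *b) {
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

A caller would pair each successful `intbuf_init` with one `intbuf_free` and check every return value. Defensive wrappers like this reduce the risk but do not remove it, particularly once pointers are shared across threads or processes.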

Memory bugs sometimes cause programs to crash or produce incorrect, seemingly random results. More frustratingly, they may lurk in the code base for long periods of time, only to manifest themselves at the worst possible moment. Memory problems are difficult to track down with conventional tools even on a simple desktop architecture, and they are much more vexing on a distributed parallel architecture.

This paper reviews the challenges of memory debugging, with special attention to parallel development and debugging. It then introduces a tool that helps developers identify and resolve memory bugs in parallel and distributed applications, highlights its major features, and provides usage tips.
