Debugging Node.js applications using core dumps

I was recently asked to help debug a production issue affecting a Node.js application. The application got stuck in such a way, that requests weren’t served anymore. Monitoring graphs showed that each stuck Node.js process consumed a full CPU core. On the other hand, there was plenty of memory free on the box.

In similar situations JVM users would probably attach some sort of remote monitoring or at least take a thread/heap dump. Luckily, similar tools exist for the Node.js ecosystem.

mdb and mdb_v8

mdb is a program that enables postmortem (also live) debugging on Solaris-like systems. mdb_v8 is an extension created by Joyent that extends the capabilities of mdb to analyze core dumps created of Node.js processes. See the mdb_v8 documentation for debugging commands supported. mdb_v8 can for example print the call stack of the JavaScript thread (including both native and JS stacks) and the JS heap.

mdb_v8 can also be used on core dumps generated on Linux machines. However, the software still only runs on Solaris-like operating systems. Running a SmartOS VM is therefore required. Luckily, there’s a tool called autopsy that allows easy access to mdb_v8 on Linux & Mac OS X.

Usage sample

For example, running the following JS in Node, taking a core dump & the Node executable from the Linux machine:

function callMe() {
  function ikikiersio(parametri) {
    console.log("in ikikiersio");
    for (;;) {}
  }

  function namedFunction() {
    (function() {
      ikikiersio({
        key1: 24,
	key2: "timantti"
      });
    })()
  }

  (function namedInlineFunction() {
    namedFunction();
  })();
}

callMe();

And then running autopsy nodebin corefile:

ristela1$ autopsy node core.1283 

    ██      ▄      ▄▄▄▄▀ ████▄ █ ▄▄    ▄▄▄▄▄ ▀▄    ▄ 
    █ █      █  ▀▀▀ █    █   █ █   █  █     ▀▄ █  █  
    █▄▄█  █   █     █    █   █ █▀▀▀ ▄  ▀▀▀▀▄    ▀█   
    █  █  █   █     █    ▀████ █     ▀▄▄▄▄▀     █    
       █  █▄ ▄█     ▀          █              ▄▀     
            ▀                 ▀                     
    JS Commands > ::dmods -l ! grep js
    V8 Commands > ::dmods -l ! grep v8
    Useful: ::status, ::dump, ::help <cmd>

Stack information for example:

> ::jsstack -vn0
js:     ikikiersio
          file: /home/luser/node-v4.4.2-linux-x64/bin/index.js
          posn: position 84
          this: dd37f3041b9 (Oddball: "undefined")
          arg1: 6812a0864e1 (JSObject: Object)
js:     <anonymous> (as <anon>)
          file: /home/luser/node-v4.4.2-linux-x64/bin/index.js
          posn: position 206
          this: dd37f3041b9 (Oddball: "undefined")
js:     namedFunction
          file: /home/luser/node-v4.4.2-linux-x64/bin/index.js
          posn: position 174
          this: dd37f3041b9 (Oddball: "undefined")
js:     namedInlineFunction
          file: /home/luser/node-v4.4.2-linux-x64/bin/index.js
          posn: position 301
          this: dd37f3041b9 (Oddball: "undefined")
js:     callMe
          file: /home/luser/node-v4.4.2-linux-x64/bin/index.js
          posn: position 62
          this: dd37f3041b9 (Oddball: "undefined")
js:     <anonymous> (as <anon>)
          file: /home/luser/node-v4.4.2-linux-x64/bin/index.js
          posn: position 1            
          this: 6812a081619 (JSObject: Object)
          arg1: 6812a081619 (JSObject: Object)
          arg2: 6812a0857b1 (JSFunction)
          arg3: 6812a081521 (JSObject: Module)
          arg4: 6812a07e731 (ConsString)
          arg5: 6812a085681 (ConsString)
        (1 internal frame elided)
js:     <anonymous> (as Module._compile)
          file: module.js
          posn: position 10415
          this: 6812a081521 (JSObject: Module)
          arg1: 6812a084ea1 (SeqOneByteString)
          arg2: 6812a07e731 (ConsString)
js:     <anonymous> (as Module._extensions..js)
          file: module.js
          posn: position 12057
          this: 6812a07a8f1 (JSObject: Object)
          arg1: 6812a081521 (JSObject: Module)
          arg2: 6812a07e731 (ConsString)
js:     <anonymous> (as Module.load)
          file: module.js
          posn: position 9430
          this: 6812a081521 (JSObject: Module)
          arg1: 6812a07e731 (ConsString)
js:     <anonymous> (as Module._load)
          file: module.js
          posn: position 7966
          this: 6812a0379b9 (JSFunction)
          arg1: 6812a033141 (ConsString)
          arg2: dd37f304101 (Oddball: "null")
          arg3: dd37f304211 (Oddball: "true")
js:     <anonymous> (as Module.runMain)
          file: module.js
          posn: position 12706
          this: 6812a0379b9 (JSFunction)
js:     startup
          file: node.js
          posn: position 345
          this: dd37f3041b9 (Oddball: "undefined")
js:     <anonymous> (as <anon>)
          file: node.js
          posn: position 300
          this: 190f3109779 (<unknown>)
          arg1: 190f3109761 (JSObject: process)
        (1 internal frame elided)
        (1 internal frame elided)     
native: v8::internal::Execution::Call+0x14f
native: v8::Function::Call+0xff
native: v8::Function::Call+0x41
native: node::LoadEnvironment+0x1c5
native: node::Start+0x3e8

As you can see in the stack trace, inline functions without names can only be seen as <anonymous>. Therefore naming even inline functions is helpful when debugging. We can see that the function ikikiersio was called with parameter at 6812a0864e1. Let’s dump its value:

> 6812a0864e1::jsprint
{
    "key1": 24,
    "key2": "timantti",
}

mdb_v8 supports lots of other commands as well (e.g. for analyzing the heap), but in this case just analyzing the stack data was enough to hint us in the right direction with our production problem.

Production postmortem

The initial CPU usage pattern hinted at something being stuck in an infinite loop. The stack traces from all the stuck Node.js processes pointed to garbage collection running at that time. GC running at the same time in all the processes doesn’t sound like a coincidence. So, probably the processes are stuck in some sort of situation, where GC can free enough memory to avoid the process being killed but not enough to do useful work.

Further analysis concluded that the software running was expecting a 4GB heap for caching things in-VM, but default Node.js heap size of about 1.6GB was used. We raised the heap size and there have been no similar situations since. GC tracing debug options could have also been used to analyze the problem, but core dumps provide a nice way of examining a running system where certain debug options might not be in use.

Pitfalls

I’d suggest testing that this technique works before a reason for using it arises. For example, I had a problem analyzing core dumps taken from Node.js running on Alpine Linux (might have something to do with Alpine Linux using musl instead of glibc or the Node.js binary missing debug symbols).

Extra reading