Stopping the OOM Pain — Removing JVM Memory Leaks

Posted by feydr | Posted in Uncategorized | Posted on 09-07-2010

If you are like me, you probably work in several different languages at any given time and monitor more than a handful of daemons. Sometimes your tests do not catch crucial problems that they should be catching. As my loyal readers might already know, I do not consider myself as knowledgeable about the JVM and Java as I should be. These past couple of days were a lesson in humility when it comes to garbage collection on the JVM.

I grew up learning how to write code in C from man pages, managing memory on my own. malloc and free were my friends, and I would pride myself on running valgrind on a particular piece of code as it ran through its tests.



The JVM of course uses a managed memory model with its garbage collector. This means no more memory leaks, right? Wrong.

Example:

Let’s suppose you have instantiated an object called Blah. Blah gets used, but something keeps holding a reference to it, so the GC never sees it as a candidate for removal. (You never actually remove an object yourself; you just let the garbage collector know that it is ready to be removed by dropping every reference to it.)

“Note the wording: Just because an object is a candidate for collection doesn’t mean it will be immediately collected.” (Candidate for Removal)

What happens after you instantiate 300k of these Blah objects that use 80 bytes a pop? You just lost roughly 23 meg that you won’t regain until you kill that process!
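
To make the Blah example concrete, here is a minimal sketch in Java of one common way this happens: a long-lived collection that keeps a reference to every object ever created. The names LeakDemo, REGISTRY and handleRequest are made up for illustration and are not from the servlet discussed in this post. The payload size is only a stand-in and is not meant to match the 80 bytes exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: none of these names come from the real servlet.
final class LeakDemo {
    // Stand-in for the Blah object from the text (size is illustrative only).
    static final class Blah {
        final long[] payload = new long[8];
    }

    // A static collection lives as long as its class does, so everything
    // added here stays reachable and the GC is never allowed to reclaim it.
    static final List<Blah> REGISTRY = new ArrayList<>();

    static void handleRequest() {
        Blah blah = new Blah();
        REGISTRY.add(blah); // the forgotten reference that causes the leak
    }

    // Back-of-the-envelope estimate from the text: count * bytes per object.
    static double estimateMiB(long count, long bytesEach) {
        return count * bytesEach / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 300_000; i++) {
            handleRequest();
        }
        System.out.println("retained objects: " + REGISTRY.size());
        System.out.printf("estimated size: %.1f MiB%n", estimateMiB(300_000, 80));
    }
}
```

Nothing here is "leaked" in the C sense. The memory is still referenced, which is exactly why the collector cannot touch it. Dropping the reference (for example, removing the entry from the list) is what makes the object a candidate for collection again.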

So, to cut to the chase: I had a Tomcat servlet that kept dying about once a week, and I didn’t allocate any time to look at what the problem was until now, since I could just restart it and everything would be peachy. I finally decided to look at it.

This is what I did.

Get a Heap Dump

First, you need to obtain a heap dump; without the heap dump you would have better luck at the casino. Chances are, if you are crashing because of an OOM error, you won’t be able to get your heap dump from the production site, and we don’t really want to screw with that anyway.

So, let’s duplicate our results on our dev box. Like I said before, we were looking at a Tomcat servlet that kept dying, so in theory we should be able to hit our servlet with enough requests to trigger some heap growth.

I wrote a quick little script in Ruby to slam our servlet. Here it is:

#!/usr/bin/ruby

require 'uri'
require 'net/http'
require 'cgi'

# Read the file to post once, up front, and build the form parameters.
@filetopost = File.read('b.txt')
@params = {'text' => CGI::escapeHTML(@filetopost), 'submit' => 'Submit'}

# POST the form to the servlet 500 times, printing the iteration number.
def pingsite
  500.times do |i|
    Net::HTTP.post_form(URI.parse('http://127.0.0.1:8080/upload/process'), @params)
    puts i
  end
end

# Start 20 threads (20 x 500 = 10k requests) and wait for all of them.
threads = []
20.times do
  threads << Thread.new { pingsite }
end

threads.each(&:join)

As you might have guessed, this posts the contents of a text file through a form to our servlet for processing (text into XML). It makes 10k requests (500 on each of 20 threads).

BEFORE you run this you’ll want to hop into JConsole. JConsole will allow you to watch the action live as your heap grows and gets garbage collected. You’ll see that the faster the heap grows, the more often the garbage collector runs, so the lines on your graph will get closer together. What you want to do here is just make the graph curve upward; that’s it.
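
If you would rather log the same number than watch a graph, the heap figure is also available from inside the JVM through the standard java.lang.management API. The sketch below is only an illustration: the HeapWatch class is a made-up name, and it reports the heap of the JVM it runs in, so to watch Tomcat this way you would have to run something like it inside the webapp, which is an assumption and not what I did here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints the used heap of the current JVM once a second.
final class HeapWatch {
    // Bytes currently used on the heap of *this* JVM.
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            System.out.println("heap used: " + usedHeapBytes() / (1024 * 1024) + " MB");
            Thread.sleep(1000);
        }
    }
}
```

A number that keeps climbing across garbage collections is the same signal as the upward curve in the graph.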

Once you have a graph that looks like this:



we are ready for the next step. This graph actually grew up into the 400s and was much more pronounced by the time my 10k requests were done. All you really need to do is show proof that there is growth; the more the merrier, as it should help you drill down more easily to determine what is piling up.

Dump the Heap

Find out the process we need to dump:

ps aux | grep tomcat

Dump it:

jmap -dump:live,format=b,file=heap.bin 6608

The dump will need to be initiated by the same user owning the tomcat process or someone with perms for it (tomcat6, root, joebob, whatever).

Great! Let’s take a quick look at the size

ls -lh heap.bin

200 meg! WTF!?

Get Relevant Stats

Ok, now let’s hop into jvisualvm (VisualVM). I went ahead and changed the perms on the dump file so I could open it as my own user. Once in VisualVM we want to go to the classes view in the top right-hand corner. From here we can sort by number of instances or by the percentage of the heap each class takes up. I focused on instance count because that seemed like something that would let me know what was up. I noticed that at least one thing was wrong here: I had something like 300k prepared statement instances sitting on the heap. What the fuck?!

Fix the Problem

So I went back to my code and did a .close() on them.

        // note this is scala
        pStmt.executeUpdate()
 
        pStmt.close()
 
        if(stmt != null) {
          stmt.close()
        }
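
One caveat with closing by hand like this: if executeUpdate() throws, the close() call after it never runs and the statement stays open. Below is a minimal sketch in Java of the try-with-resources pattern (Java 7 and later), which closes the resource on both the normal and the failing path. FakeStatement is a made-up stand-in so the example runs without a database. The real java.sql.PreparedStatement is also AutoCloseable, so the same shape should apply to it.

```java
// Hypothetical stand-in for a JDBC PreparedStatement; not the servlet's real code.
final class CloseDemo {
    static int open = 0; // how many statements are currently un-closed

    static final class FakeStatement implements AutoCloseable {
        private final boolean fail;

        FakeStatement(boolean fail) {
            this.fail = fail;
            open++;
        }

        int executeUpdate() {
            if (fail) {
                throw new IllegalStateException("simulated SQL failure");
            }
            return 1;
        }

        @Override
        public void close() {
            open--;
        }
    }

    // try-with-resources calls close() whether or not executeUpdate() throws.
    static int runUpdate(boolean fail) {
        try (FakeStatement pStmt = new FakeStatement(fail)) {
            return pStmt.executeUpdate();
        }
    }

    public static void main(String[] args) {
        runUpdate(false);
        try {
            runUpdate(true);
        } catch (IllegalStateException e) {
            System.out.println("update failed: " + e.getMessage());
        }
        System.out.println("statements still open: " + open); // prints 0
    }
}
```

In Scala or older Java the equivalent is a try/finally block with the close() call in the finally part.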

Measure Again to Ensure the Problem is Fixed

After this was done we compile our jar, throw it back to our servlet, restart Tomcat and run through our bulk POSTing again. The theory here is that we should see a noticeable impact on our instance count for prepared statements and a noticeably different graph as well. I did not know that simply copying your webapp over to the root directory was not enough: you NEED to restart Tomcat to get accurate results, as items in the PermGen space will be duplicated if you simply copy a new webapp over (and after doing that a couple of times you’ll find a nice little PermGen OOM error in your catalina log files).

Here’s the new graph:



The new heap size is: 20 meg!!

Well, we might not have squashed all of our bugs but I’d bet good money that this servlet doesn’t die again in the next week or so.

Other Profiling Tools I Used but Did not Like:

jhat: I tried this out and it seems like it is supposed to be used on remote servers; however, there was absolutely NO thought put into making it a nice UI for a human to use, and it didn’t particularly lend itself to a computer reading the information either, so frankly I think it’s a piece of shit.

Memory Analyzer: Speaking of pieces of shit, this rotten crap came with Eclipse (now, I have my own opinions about how horrible Eclipse is to begin with, so maybe that’s just my prejudice rubbing off), but it kept crashing every fucking time I tried to load the heap. I tried passing in the arguments -Xms128m -Xmx1024m and all, but it just did not like me. Fuck it, I got the job done regardless.

Anyways, I hope this information was somewhat useful for someone else. While documentation exists on the internet for this, it wasn’t the most readily available info.

  • Jiri
    Nice article! Any reasons to use jmap for dumping the heap and JConsole for monitoring memory usage? VisualVM can do the same, no need to leave the tool.
  • feydr
    hey Jiri, Thanks!

    As to your question -- no, there was absolutely no logic involved in switching tools -- it's just what I happened to do in the course of resolving this issue. I think I prefer visualvm over jconsole but I really haven't used either of them enough to make a good decision.