software and other stuff.
matz, the ruby trunk, and GC changes
26 Apr 08

w00t. An email from matz and a little spelunking in the ruby subversion repository show that there's some tinkering going on in ruby garbage collection land. Here are the interesting change logs:

r15674 | matz | 2008-03-03 01:27:43 -0700 (Mon, 03 Mar 2008) | 5 lines

* gc.c (add_heap): sort heaps array in ascending order to use
  binary search.

* gc.c (is_pointer_to_heap): use binary search to identify object
  in heaps.  works better when number of heap segments grow big.
and...
r16194 | matz | 2008-04-25 03:03:32 -0600 (Fri, 25 Apr 2008) | 7 lines

* gc.c (free_unused_heaps): preserve last used heap segment to
  reduce malloc() call.

* gc.c (HEAP_SIZE): use smaller heap segment (2K) for more chance
  to be freed.  based on patch from authorNari.

* gc.c (rb_newobj_from_heap): eventually allocate heap segments.

So now in ruby 1.9 trunk we're keeping heap segments sorted by memory address, and using binary search to answer the is_pointer_to_heap() question quickly. That makes the lookup cheap enough that the heap segment size can be cranked way down (to 2K, per the change log). Smaller segments are more likely to end up completely empty, so more memory gets handed back to the OS. That means reduced resource usage, and should even mean a ruby with reduced COW (copy-on-write) badness. All this at a minimal performance impact for normal execution (maybe none, matz knows).
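To make the sorted-array-plus-binary-search idea concrete, here's a minimal sketch in C. This is not the code from gc.c: the heap_segment and heap_table types, the MAX_SEGMENTS cap, and the *_sketch function names are all made up for illustration, and real segments would hold object slots rather than raw byte ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGMENTS 64  /* hypothetical fixed cap, just to keep the sketch small */

/* Hypothetical stand-in for one heap segment: a contiguous range of memory. */
typedef struct {
    uintptr_t start;  /* address of the first byte */
    size_t    size;   /* length in bytes */
} heap_segment;

typedef struct {
    heap_segment segs[MAX_SEGMENTS];  /* kept sorted ascending by start */
    size_t       used;
} heap_table;

/* Insert a segment, shifting larger entries right so the array stays sorted.
   Returns 1 on success, 0 if the table is full. */
int add_heap_sketch(heap_table *t, uintptr_t start, size_t size)
{
    assert(t != NULL);
    if (t->used == MAX_SEGMENTS) return 0;
    size_t i = t->used;
    while (i > 0 && t->segs[i - 1].start > start) {
        t->segs[i] = t->segs[i - 1];
        i--;
    }
    t->segs[i].start = start;
    t->segs[i].size = size;
    t->used++;
    return 1;
}

/* Binary search over the sorted segments: O(log n) instead of a linear scan.
   Returns 1 if addr lies inside some segment, 0 otherwise.
   Assumes segments do not overlap. */
int is_pointer_to_heap_sketch(const heap_table *t, uintptr_t addr)
{
    assert(t != NULL);
    size_t lo = 0, hi = t->used;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const heap_segment *s = &t->segs[mid];
        if (addr < s->start) {
            hi = mid;                      /* look in the lower half */
        } else if (addr - s->start >= s->size) {
            lo = mid + 1;                  /* past this segment, look higher */
        } else {
            return 1;                      /* start <= addr < start + size */
        }
    }
    return 0;
}
```

The point of the sketch: once the lookup is logarithmic in the number of segments, having lots of small segments stops being expensive, which is what makes the smaller HEAP_SIZE practical.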

So applause to open source, and to matz specifically for sifting through all the ideas, hacks, and patches to realize this thing. It will be interesting to include 1.9 in the performance comparison table to see how things have changed from 1.8.6 to present trunk.

So why do I care so much about a less memory intensive ruby? Well, because ruby _really_ shines as an embedded language: you get a robust set of built-ins for a fairly modest size hit. Also, the C API for embedding is beautiful. It's fun to use in the way that ruby itself is. I'm guessing that the embedding API got so nice because extension authors have been using it and complaining about it for a while. So the part of the API that is common to authoring extensions and embedding an interpreter is great. What sucks is the part that's unique to embedding. It would be great if:

  1. I could run two ruby interpreter contexts in a single process in different threads (I know this would have some benefits in web server plugins too).
  2. I could have multiple interpreter contexts around at the same time.
  3. I could cleanly shut down and restart the interpreter, without massive memory leaks.

So reducing memory usage is a good first step to making ruby the premier language for embedding. Next steps include getting rid of all them statics, and breaking out and making optional the stuff that is only required by the ruby interpreter itself. Perhaps a bit more ambitious than hackin on the GC...
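Here's a toy C sketch of why the statics matter for the wish list above. None of this is ruby's actual API: interp_context, ctx_init, ctx_alloc_object, and the global_* names are all hypothetical, and a single counter stands in for the piles of state a real interpreter carries.

```c
#include <stddef.h>

/* Style A: state lives in a file-scope static.
   There is exactly one copy per process, so there can only ever be
   one "interpreter", and every thread shares it. */
static size_t global_object_count = 0;

void global_alloc_object(void)
{
    global_object_count++;
}

size_t global_objects(void)
{
    return global_object_count;
}

/* Style B: the same state moved into a context struct (hypothetical)
   that the embedder owns and passes to every call. */
typedef struct {
    size_t object_count;
} interp_context;

void ctx_init(interp_context *ctx)
{
    ctx->object_count = 0;  /* re-initializing is also a clean "restart" */
}

void ctx_alloc_object(interp_context *ctx)
{
    ctx->object_count++;
}

size_t ctx_objects(const interp_context *ctx)
{
    return ctx->object_count;
}
```

With style B an embedder could keep two interp_context values side by side, one per thread if it wanted, and reset one without touching the other. With style A there's nothing to reset except the whole process.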

till the next,
lloyd