software and other stuff.
hacking on ruby's garbage collector
03 Nov 07

This is my original writeup on possible improvements to the ruby garbage collector. The latest status is archived in projects/ruby.

overview

Ruby's GC & heap implementation uses a lot of memory. The thing is based around the idea of "heaps". Heaps are chunks of memory where ruby objects are stored. Each heap consists of a number of slots. Slots are between 20 and 40 bytes, depending on sizeof(long). When ruby runs out of heap space, it first does a GC run to try to free something up, and then allocates a new heap. The new heap is 1.8 times larger than the last.

Every time a GC run happens, the entire heap is written to in order to turn off the mark bits, which are stored in the heap itself. Then we run through the top level objects and mark them and all their descendants. Then we throw away anything that's not marked (sweep). Because of the way ruby works, objects may _never_ be moved around in heaps: from the time they're allocated to the time they're freed, they stay at the same memory address.
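To make the cycle concrete, here is a toy model of clear, mark and sweep. It is a sketch only, not ruby's actual gc.c: Slot, ToyHeap and their fields are names invented for the example.

```ruby
# Toy model of the mark-and-sweep cycle. Not ruby's real gc.c:
# Slot, ToyHeap and their fields are made up for illustration.
Slot = Struct.new(:name, :children, :marked, :free)

class ToyHeap
  attr_reader :slots

  def initialize
    @slots = []
  end

  def allocate(name, children = [])
    slot = Slot.new(name, children, false, false)
    @slots << slot
    slot
  end

  def collect(roots)
    # Clear every mark bit. The bits live in the slots themselves,
    # so this step writes to the whole heap.
    @slots.each { |s| s.marked = false }

    # Mark everything reachable from the roots.
    stack = roots.dup
    until stack.empty?
      slot = stack.pop
      next if slot.marked
      slot.marked = true
      stack.concat(slot.children)
    end

    # Sweep: unmarked slots are freed in place; nothing is moved.
    freed = @slots.reject { |s| s.marked || s.free }
    freed.each { |s| s.free = true }
    freed.map(&:name)
  end
end

heap = ToyHeap.new
leaf = heap.allocate("leaf")
root = heap.allocate("root", [leaf])
heap.allocate("garbage")
puts heap.collect([root]).inspect # prints ["garbage"]
```

Note that the clearing step touches every slot, live or not, which is what matters for the copy on write problem.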

So this is a very terse summary, more is available in the ruby hackers guide. But it's enough. There are a couple bad things here.

  1. In order for a heap to be reclaimed, _all_ entries on the heap need to be freed. The bigger a heap is, the more likely it is to contain at least one long lived object. The 1.8 growth factor makes it bloody unlikely that you'll ever get to reclaim heap space.
  2. A big heap makes GC slower. You have to scan the whole thing.
  3. (this guy gets the credit on this idea) Because we scan the heap and write mark bits into it, copy-on-write semantics are blown. You do a fork, and as soon as GC runs, your whole heap is resident and private memory.
  4. We cannot do compaction at GC time. We must either change the way ruby works in a very fundamental way (bad), or think of a creative, lightweight way to just keep the heap compacted.
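Point 1 can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes each slot independently ends up holding a long lived object with some small probability, and both that probability and the slot counts are made up for illustration.

```ruby
# Back-of-the-envelope model, not a measurement: assume each slot
# independently ends up holding a long lived object with probability p.
# A heap can only be released when none of its slots do.
def chance_heap_is_reclaimable(slots, p_long_lived)
  (1.0 - p_long_lived)**slots
end

p_long_lived = 0.0001 # hypothetical: 1 slot in 10,000 is pinned
[1_000, 10_000, 100_000, 1_000_000].each do |slots|
  chance = chance_heap_is_reclaimable(slots, p_long_lived)
  puts format("%8d slots: %.4f%% chance of being fully free", slots, chance * 100)
end
```

Under that assumption the chance falls off exponentially with heap size, so each heap that is 1.8 times bigger than the last is much less likely to ever be handed back.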

Plan of attack

  1. develop a method of quantifying performance of ruby GC & heap
  2. prove some of these ideas can have positive effects in terms of memory usage and performance
  3. produce patches for the ruby community and anyone who wants em
  4. drink beer

Quantifying

First thing we need is a way to get a look at statistics of the gc stuff. So we hack in a GC.heap_info function that returns:
  • num_heaps - the number of allocated heaps
  • heap_slots_free - the total number of free slots
  • heap_memory - the amount of memory allocated to ruby heaps
  • heap_slots_allocated - the total number of available slots
  • heap_slots_used - the total number of used slots
  • num_gc_passes - the number of times GC has been run
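
A minimal sketch of how those fields could be turned into a one-line summary. GC.heap_info only exists in the patched interpreter, so the example works on a hand-written hash with the same field names; the numbers in it are invented, and computing "% free" as free slots over allocated slots is an assumption.

```ruby
# GC.heap_info is the patched-in function from this post; stock ruby does
# not have it. The hash below is a hand-written stand-in with made-up values.
def summarize(info)
  # Assumption: "% free" is free slots divided by allocated slots.
  pct_free = info[:heap_slots_free].to_f / info[:heap_slots_allocated]
  [info[:heap_memory], info[:heap_slots_used], pct_free,
   info[:num_heaps], info[:num_gc_passes]].join("/")
end

sample = {
  num_heaps: 1,
  heap_slots_free: 7_500,
  heap_memory: 200_000,
  heap_slots_allocated: 10_000,
  heap_slots_used: 2_500,
  num_gc_passes: 3
}
puts "heap (mem/used slots/% free/heaps/gc passes) #{summarize(sample)}"
```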

Great. Next we need test cases. I start with three:

  • grow array: build a buncha arrays, bigger and bigger
  • shrink array: start with a big array, build a buncha smaller ones
  • PLIST parsing: parse a huge plist file

Yeah, they're artificial, but we'll add more cases as we go. At least we have some constant tests to start with.
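
For a feel of what the first two cases might look like, here is a guess. The actual scripts are not shown in this writeup, so the method names and sizes below are invented.

```ruby
# Guesses at the first two cases; the original scripts are not shown,
# so names and sizes here are invented.
def grow_arrays(rounds)
  # Build arrays of increasing size; each becomes garbage once measured.
  (1..rounds).map { |n| Array.new(n * 100) { |i| "item #{i}" }.size }
end

def shrink_arrays(rounds)
  # Start with the biggest array and build smaller ones.
  rounds.downto(1).map { |n| Array.new(n * 100) { |i| "item #{i}" }.size }
end

puts grow_arrays(3).inspect   # prints [100, 200, 300]
puts shrink_arrays(3).inspect # prints [300, 200, 100]
```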

Proof of concept

hypothesis

By getting rid of the 1.8 growth factor and making heaps smaller, we can increase the amount of memory that's reclaimed, and make ruby faster by reducing the amount of scanning of unused memory that occurs.

data

Vanilla ruby:
running cases/growarray.rb
 heap before (mem/used slots/% free/heaps/gc passes) 560080/6501/0.767838011570602/2/9
 heap after (mem/used slots/% free/heaps/gc passes) 560080/6367/0.772623384043997/2/11
 time 1.339915
running cases/plist.rb
 heap before (mem/used slots/% free/heaps/gc passes) 15056520/559048/0.257393875553088/7/21
 heap after (mem/used slots/% free/heaps/gc passes) 15056520/109351/0.854744633172117/7/23
 time 3.773354
running cases/shrinkarray.rb
 heap before (mem/used slots/% free/heaps/gc passes) 560080/7537/0.730840654238983/2/9
 heap after (mem/used slots/% free/heaps/gc passes) 560080/6373/0.77240911363474/2/11
 time 1.338151
Killing 1.8 growth factor:
running cases/growarray.rb
 heap before (mem/used slots/% free/heaps/gc passes) 400080/6501/0.674982501749825/2/9
 heap after (mem/used slots/% free/heaps/gc passes) 400080/6367/0.681681831816818/2/11
 time 1.337504
running cases/plist.rb
 heap before (mem/used slots/% free/heaps/gc passes) 9406200/380442/0.191001631002226/47/51
 heap after (mem/used slots/% free/heaps/gc passes) 9406200/109351/0.767468416609429/47/53
 time 4.278476
running cases/shrinkarray.rb
 heap before (mem/used slots/% free/heaps/gc passes) 400080/7525/0.623787621237876/2/9
 heap after (mem/used slots/% free/heaps/gc passes) 400080/6373/0.681381861813819/2/11
 time 1.343519
Killing 1.8 growth factor and reducing heap size to 1/10th:
running cases/growarray.rb
 heap before (mem/used slots/% free/heaps/gc passes) 221540/6491/0.413428519790349/11/18
 heap after (mem/used slots/% free/heaps/gc passes) 221540/6368/0.424543647207663/11/20
 time 1.351549
running cases/plist.rb
 heap before (mem/used slots/% free/heaps/gc passes) 7741680/382475/0.0109718192584778/366/365
 heap after (mem/used slots/% free/heaps/gc passes) 3795580/109350/0.423259493670886/179/367
 time 7.971707
running cases/shrinkarray.rb
 heap before (mem/used slots/% free/heaps/gc passes) 221540/7717/0.302638713175493/11/17
 heap after (mem/used slots/% free/heaps/gc passes) 221540/6374/0.424001445870233/11/19
 time 1.343365

analysis

Note: "heap before" means "heap before final GC run". We fork a process which runs the test case, then we check out the heap using GC.heap_info, then we run a GC pass, then we check it out again.
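
The measurement loop can be sketched roughly like this. GC.heap_info only exists in the patched interpreter, so the sketch substitutes stock ruby's GC.stat, and measure_in_child is an invented name. It needs a platform with fork.

```ruby
require "json"

# Sketch of the measurement loop. Uses GC.stat from stock ruby as a
# stand-in for the patched-in GC.heap_info; measure_in_child is an
# invented name. Needs a platform with fork (so not Windows).
def measure_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    yield                      # run the test case in the child
    before = GC.stat(:count)   # state before the final GC run
    GC.start                   # the final GC pass
    after = GC.stat(:count)    # state after it
    writer.write(JSON.generate("before" => before, "after" => after))
    writer.close
    exit!(0)
  end
  writer.close
  result = JSON.parse(reader.read)
  reader.close
  Process.wait(pid)
  result
end

result = measure_in_child { Array.new(50_000) { |i| "string #{i}" } }
puts "gc passes before/after final run: #{result["before"]}/#{result["after"]}"
```

Running each case in a forked child keeps one case's heap from polluting the numbers of the next.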

w00t! We made ruby twice as slow! Well, hold on. First inspect the run times of plist.rb (the most realistic test case). Also inspect the number of gc passes. Pretty tight correlation, right? Reducing heap size and removing the 1.8 growth factor both increase the number of gc passes that we make. So we see a significant performance degradation that tracks the number of passes that are run.
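
A quick way to check that correlation is to divide the extra run time by the extra GC passes between configurations. The numbers below are the plist.rb figures from the data section; the helper name is invented, and this is rough arithmetic rather than a proper benchmark.

```ruby
# plist.rb numbers copied from the data section: [label, gc passes, seconds].
runs = [
  ["vanilla",              23,  3.773354],
  ["no 1.8 growth",        53,  4.278476],
  ["no growth, 1/10 heap", 367, 7.971707]
]

# Extra seconds divided by extra GC passes between two configurations.
def cost_per_extra_pass(a, b)
  (b[2] - a[2]) / (b[1] - a[1])
end

runs.each_cons(2) do |a, b|
  ms = cost_per_extra_pass(a, b) * 1000
  puts format("%s -> %s: %.1f ms per extra GC pass", a[0], b[0], ms)
end
```

Both steps come out somewhere in the 10 to 20 ms per pass range, which fits the idea that the slowdown is mostly the cost of the extra passes.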

Inspect memory usage (still looking at only plist.rb). Vanilla ruby is using 15MB at the end of everything, and that heap is 85% unused. Kill the 1.8 and we're using 9MB of heap space, 77% unused. Decrease the heap size, and we actually see memory being reclaimed. After the run and final GC we've only got ~4MB of heap, 42% of it free. Immediately after the run we were around 8MB.
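
Those figures can be read straight off the result lines. A small sketch that parses the "heap after" lines for plist.rb and prints memory and free percentage; HeapLine and parse_heap_line are names invented for the example.

```ruby
# Parse the numeric tail of a "mem/used slots/% free/heaps/gc passes" line.
HeapLine = Struct.new(:mem, :used_slots, :pct_free, :heaps, :gc_passes)

def parse_heap_line(line)
  mem, used, free, heaps, passes = line.split.last.split("/")
  HeapLine.new(mem.to_i, used.to_i, free.to_f, heaps.to_i, passes.to_i)
end

# "heap after" lines for plist.rb, copied from the data section.
after_lines = {
  "vanilla" =>
    " heap after (mem/used slots/% free/heaps/gc passes) 15056520/109351/0.854744633172117/7/23",
  "no 1.8 growth" =>
    " heap after (mem/used slots/% free/heaps/gc passes) 9406200/109351/0.767468416609429/47/53",
  "no growth, 1/10 heap" =>
    " heap after (mem/used slots/% free/heaps/gc passes) 3795580/109350/0.423259493670886/179/367"
}

after_lines.each do |label, line|
  h = parse_heap_line(line)
  puts format("%-22s %5.1f MB in %3d heaps, %4.1f%% free",
              label, h.mem / 1_000_000.0, h.heaps, h.pct_free * 100)
end
```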

parting shot/conclusion

By changing two constants we can make ruby a lot more memory efficient, and at the same time a lot slower. The slowdown appears to be largely from increased frequency of GC. Maybe we can look at not running GC on _every_ heap allocation, but on every N heap allocations... The goal here is to restore ruby to its original, or better, performance characteristics, but reduce the memory usage.
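
The "every N heap allocations" idea can be modelled in a few lines. This is a toy policy model, not a patch to gc.c: the class and method names are invented, and it makes the simplifying assumption that a heap is always added after an out-of-space event, whether or not GC ran.

```ruby
# Toy policy model, not a patch to gc.c. All names are invented.
class HeapGrowthPolicy
  attr_reader :gc_runs, :heaps_added

  def initialize(gc_every)
    @gc_every = gc_every # run GC only on every Nth out-of-space event
    @events = 0
    @gc_runs = 0
    @heaps_added = 0
  end

  # Called when the allocator finds no free slot.
  def out_of_space
    @events += 1
    @gc_runs += 1 if (@events % @gc_every).zero? # pay for a mark and sweep
    @heaps_added += 1 # simplification: always add a heap afterwards
  end
end

[1, 4].each do |n|
  policy = HeapGrowthPolicy.new(n)
  100.times { policy.out_of_space }
  puts "gc every #{n}: #{policy.gc_runs} GC runs for #{policy.heaps_added} heaps"
end
```

With small heaps and N = 1 the number of GC runs tracks the number of heaps, which is the behaviour seen in the plist.rb data. Whether a larger N gives back the speed without giving back the memory is exactly what would need measuring.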

Essence here is that everyone knows you can grow a buffer by a factor and make things faster. But other aspects of ruby make that choice perhaps not optimal here. Stay tuned, we'll dig further.

one more thing, ideas on "automatic heap compaction"

A global freelist? Bad. Why not have per-heap freelists? Why not sort the heaps by usage percentage at the end of a sweep? Allocate from the most heavily used ones first... There's some complexity here around when GC is run... 'cause it's only run when everything is full... But perhaps some room for exploration...
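
A toy sketch of that idea, assuming per-heap freelists and a fullest-first allocation order. None of this is gc.c; SmallHeap, allocation_order and allocate are invented names.

```ruby
# Toy model of per-heap freelists; names are invented and this is not gc.c.
class SmallHeap
  attr_reader :id, :free_list

  def initialize(id, size)
    @id = id
    @size = size
    @free_list = (0...size).to_a # indexes of the free slots in this heap
  end

  def used
    @size - @free_list.size
  end

  def usage
    used.to_f / @size
  end

  def full?
    @free_list.empty?
  end
end

# After a sweep: order heaps fullest first, so new objects fill the gaps
# in busy heaps and lightly used heaps get a chance to drain completely.
def allocation_order(heaps)
  heaps.reject(&:full?).sort_by { |h| -h.usage }
end

def allocate(heaps)
  heap = allocation_order(heaps).first
  return nil if heap.nil? # everything is full: time for GC or a new heap
  [heap.id, heap.free_list.shift]
end

heaps = [SmallHeap.new(:a, 4), SmallHeap.new(:b, 4), SmallHeap.new(:c, 4)]
heaps[0].free_list.shift(1) # a: 1 of 4 slots used
heaps[1].free_list.shift(3) # b: 3 of 4 slots used
heaps[2].free_list.shift(4) # c: full
puts allocate(heaps).inspect # prints [:b, 3]
```

Nothing is ever moved here, so it stays compatible with the "objects never move" constraint; the compaction is only a side effect of where new objects land.
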
Fullscreen X11 broken on Leopard!?
01 Nov 07

Here's the problem: http://lists.apple.com/archives/x11-users/2007/Oct/msg00065.html

Here's a proposed "fix": http://aaroniba.net/articles/x11-leopard.html

If they hadn't added RPATH support and DTrace, and pretty much avoided gratuitous changes, I'd be throwing stones. As it stands, this is extremely annoying, but tolerable. Looking forward to the fix...

-lloyd

building emacs on leopard from macports
01 Nov 07

I cannot live without X11 emacs! It doesn't build from macports right now. As far as I can tell, the emacs that apple ships with leopard is broken. At least for me, after the upgrade, I get:

[lth@tumno ~] $ /usr/bin/emacs.broken 
Fatal malloc_jumpstart() error
This blog entry documents the steps to get emacs built, installed and in the macports registry. So at the time of writing, this is the package I'm installing:
[lth@tumno ~] $ port search emacs | grep 22.1
emacs                          editors/emacs  22.1         The GNU Emacs text editor (command line only)
Start by trying to install:
[lth@tumno ~] $ sudo port install emacs@gtk
--->  Fetching emacs
--->  Verifying checksum(s) for emacs
--->  Extracting emacs
--->  Applying patches to emacs
--->  Configuring emacs
--->  Building emacs with target all
Error: Target org.macports.build returned: shell command
     " cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_editors_emacs/work/emacs-22.1" && \
     make all " returned error 2
Command output: Highest address of load commands in input file: 0x2b7000
Lowest offset of all sections in __TEXT segment:   0x2380
--- List of Load Commands in Input File ---
# cmd              cmdsize name                address     size
0 LC_SEGMENT             56 __PAGEZERO                0   0x1000
1 LC_SEGMENT            396 __TEXT               0x1000 0x135000
                           __text               0x3380 0x122eb8
                           __cstring          0x126238   0xfc23
                           __literal8         0x135e60     0x78
                           __const            0x135ee0    0x100
                           __literal4         0x135fe0      0x4
2 LC_SEGMENT            328 __DATA             0x136000 0x156000
                           __data             0x136000 0x119ac2
                           __dyld             0x24fac4     0x1c
                           __bss              0x24fae0  0x31aa5
                           __common           0x2815a0   0xaa51
3 LC_SEGMENT            192 __IMPORT           0x28c000   0x2000
                           __pointers         0x28c000   0x1584
                           __jump_table       0x28d5c0    0x3fc
4 LC_SEGMENT             56 __LINKEDIT         0x28e000  0x29000
5 LC_SYMTAB              24
6 LC_DYSYMTAB            80
7 LC_LOAD_DYLINKER       28
8 LC_UNIXTHREAD          80
9 LC_LOAD_DYLIB          60
10 LC_LOAD_DYLIB          52
11 LC_LOAD_DYLIB          52
make[1]: *** [emacs] Abort trap
make[1]: *** Deleting file `emacs'
make: *** [src] Error 2

Error: Status 1 encountered during processing.
So the apple folks have changed the structure of a dylib (to add, among other things, RPATH support. that's a huge w00t).

Let's patch this source. Now for some reason, if we apply a patch directly to the current state of the working directory, things go boom. I'm guessing the fix doesn't play nice with other current patches applied by macports. So I use a sledgehammer. Download emacs-22.1, configure as you like, apply this patch (credit to YAMAMOTO Mitsuharu for producing this patch) and build. Then move the built emacs-22.1 into your working directory, and port install.

The result for me is an emacs which runs in console mode. Once I get X11 set up and running sawfish we'll see if it actually works in X11. In the past the emacs-devel package was required for X11 support, and the brief port description suggests this is still the case.

hth,
lloyd