26 September 2010. Published in: Code, iphone

Here be dragons. This bug was the bane of my existence for two weeks. The dreaded EXC_BAD_ACCESS.

The trouble with this crash is it gives you basically zero information, and often the frame (backtrace) is invalid (so, less than zero information: wrong information).

So the first step is to look at the backtrace. Unfortunately, the backtrace has only system calls, so this is going to be a long night…

Next step is debugging with NSZombies. Zombie is a special debugging mode that, instead of freeing objects when they are dealloced, replaces them with an object of type NSZombie that basically throws exceptions whenever you try to do anything with it. The easiest and most reliable way to turn on NSZombies is detailed over here. Sure enough, a run with Zombies enabled gives me one of these:
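It is worth confirming that the zombie setting actually reached the process before trusting a run. The following is a minimal sketch, assuming zombies are switched on through the NSZombieEnabled environment variable set to YES; the helper name LogZombieState is made up for illustration.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not from the original post): call it once, early in
// main() or application:didFinishLaunchingWithOptions:.
static void LogZombieState(void) {
    // getenv returns NULL when the variable is not set for this process.
    const char *value = getenv("NSZombieEnabled");
    if (value != NULL && strcmp(value, "YES") == 0) {
        NSLog(@"NSZombieEnabled is on; deallocated objects will be kept as zombies.");
    } else {
        NSLog(@"NSZombieEnabled is off.");
    }
}
```

If the "off" message shows up in a run that was supposed to have zombies, the variable was probably set on the wrong executable or scheme.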

2009-03-30 02:30:36.172 appName[3997:20b] *** -[CALayer release]: message sent to deallocated instance 0x59bf670

So the class causing the error is (er, was, before it was Zombied) CALayer. Great.

  • Problem: I never use CALayer explicitly in my code.
  • Problem2: The (now correct) backtrace is showing that the zombied object is being released inside Apple’s frameworks. This is bad. Very bad. Somehow, something that I am doing is causing an internal Apple framework to over-release private objects I know nothing about.

This is going to be a really long night…

Next step is trying to find out something about the object’s lifecycle. If I can figure out where this phantom CALayer is being created, that may give me some insight into what I am doing that is causing the overrelease. Fortunately, Apple has some great tools for this. Unfortunately, this bug was the perfect storm and managed to break all of them.

First up: malloc_history. malloc_history is a command-line tool that parses malloc stack logs. Once a couple of environment variables are set, the system malloc library logs every call to malloc and free, complete with backtraces, to a file.

  • Problem: MallocStackLogging doesn’t work on the device in 4.0 (fixed in 4.1).
  • Problem2: you can’t run the malloc_history script on the iPhone to read the malloc stack logs without shell / ssh access.
  • Problem3: Although you can write the stack logs to your app’s directory using Xcode, and subsequently download the stack logs through the Xcode window, the x86 malloc_history that ships with OS X can’t figure out how to read the stack logs.
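For reference, when the tool does work (for example in the Simulator), the setup is small. This is a sketch, assuming the usual MallocStackLogging and MallocStackLoggingNoCompact environment variables are set on the launched process; the helper name is hypothetical. It only reports whether logging is active and prints the pid that malloc_history needs.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>
#include <unistd.h>

// Hypothetical helper (not from the original post): logs the state of the
// malloc stack logging variables and the process id.
static void LogMallocStackLoggingState(void) {
    const char *logging = getenv("MallocStackLogging");
    const char *noCompact = getenv("MallocStackLoggingNoCompact");
    NSLog(@"MallocStackLogging=%s MallocStackLoggingNoCompact=%s pid=%d",
          logging ? logging : "(unset)",
          noCompact ? noCompact : "(unset)",
          (int)getpid());
}
```

With the pid from that log line and the address from the zombie message (0x59bf670 in the run above), the lookup has the form `malloc_history <pid> <address>`.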

Three strikes, you’re out!

Second up: Instruments. Instruments has Zombie support, plus it can (check the “Record reference counts” checkbox) record every retain/release call to every object (in theory). Hopefully, Instruments will help me figure out where this mythical CALayer is being over-released!

  • Problem: Instruments doesn’t actually log every retain/release call in system libraries, especially if retains/releases happen rapidly. It’s not uncommon for me to see gaps of 2 or more in the reference count.
  • Problem: Even though the iPhone supports NSZombies (works fine in Xcode), Instruments will refuse to launch in NSZombie mode on the device. The checkbox to turn it on simply isn’t there; it is only available on x86/Simulator. Oh, and my bug only reproduces on the device, never in the simulator.
  • I tried to turn on NSZombies at run-time in the code following these directions and also informed by this post by an Apple employee. While there was some evidence that it may work for some things, it’s not guaranteed and of course it didn’t work for my bug.
  • Unlike x86 / Simulator, Instruments won’t “attach” to an already-running process on ARM/device; it insists on launching the app from scratch.

Four strikes, you’re out!

At this point, I was determined to get a reference-count-log, so I wrote a bunch of code to try to get the bug reproducing in the simulator. But the bug reproduces only when the user is taking a picture, and there’s no camera in the simulator. I had a hunch it was memory-related though, so I wrote a bunch of code to push fake camera view controllers onto the stack and take up lots of memory. Finally, I got an EXC_BAD_ACCESS and the reference log I had been waiting for…
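The fake camera screens were nothing clever. Something along these lines is enough; this is a sketch with made-up names (FakeCameraViewController, ballast) and an arbitrary 16 MB allocation, not the code from the actual app.

```objc
#import <UIKit/UIKit.h>
#include <string.h>

// Hypothetical stand-in for the camera screen: it shows nothing useful and
// just holds on to a large block of memory while it is loaded.
@interface FakeCameraViewController : UIViewController {
    NSMutableData *ballast;
}
@end

@implementation FakeCameraViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    // 16 MB is an arbitrary figure; tune it until memory warnings appear.
    NSUInteger size = 16 * 1024 * 1024;
    ballast = [[NSMutableData alloc] initWithLength:size];
    // Write to every byte so the pages are really committed.
    memset([ballast mutableBytes], 0xAB, size);
}

- (void)viewDidUnload {
    // Drop the ballast when the view is unloaded so a reload does not leak it.
    [ballast release];
    ballast = nil;
    [super viewDidUnload];
}

- (void)dealloc {
    [ballast release];
    [super dealloc];
}

@end
```

Pushing a few of these with `[self.navigationController pushViewController:[[[FakeCameraViewController alloc] init] autorelease] animated:YES]`, combined with the Simulator's "Simulate Memory Warning" menu item, puts the views underneath under the same pressure the real camera does.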

Yeah… let’s just say that didn’t help at all. The object is referenced in exactly two places (one retain, one release, both in system libraries), and yet somehow the reference count jumps automagically from 1 to -1.

Fast forward a week: we had made some significant changes to the codebase that (by chance) gave us a slightly better view into the problem. This time, instead of a CALayer getting over-released, it was suddenly a UIView (probably the UIView that owns the CALayer that was formerly causing the problem). Reference-count-log:

Why in the world is an NSKeyValueCoding internal call decrementing the reference count smack-dab in the middle of the object’s lifecycle?  Let’s look at the code:

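IBOutlet UIView *viewCausingCrash;
//snip...
// Build the real view using the frame of the IB placeholder (old).
UIView *viewCausingCrash = [[[UIView alloc] initWithFrame:old.frame] autorelease];
[old removeFromSuperview];
// scroller retains the new view, balancing the autorelease above.
[scroller addSubview:viewCausingCrash];
// Repoint old at the new view; the placeholder from IB is thrown away.
old = viewCausingCrash;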

What in the world could be wrong with this code? The object is retained by scroller, so it won’t get dealloced. All the reference counts are balanced. Where is this call to NSKeyValueCoding coming from?

Well, it turns out Interface Builder is magic. Really, really magic.

Recall that when an object is awoken out of a nib file, it has a reference count of 1. Normally you can think of IB objects as “alloced” because they will pretty much always be valid for the lifetime of your ViewController, barring low-memory situations.

But. Suppose you have things set up like I do, where you have an IB “object” that is replaced programmatically by another object. In this case, I have an IB object, old, which is really just a placeholder for where viewCausingCrash is going to go, because I like dragging rects around in IB over hardcoding digits into the code. So IB hands me an object, and I pretty much throw that object away, by making old point to some new object.

Since IB is a good memory citizen, it is eventually going to try to release the object that it originally unarchived (to balance the unarchive’s +1 to the reference count). However, instead of turning to some internal pointer to figure out where to send the release message, it just uses whatever the IBOutlet is connected to at the moment. An IBOutlet which, at the point the release is made, is not retained by IB, because it points to a very different object. Oops.

To further complicate matters, IB sends the release a lot later than you would expect. The release doesn’t happen on viewDidUnload (where I would have found it many days sooner), but in fact the release happens when/if the view loads again. So if you modify the value of the IBOutlet pointer in your viewDidLoad method, who knows what will happen? It actually varies from platform to platform. The perfect heisenbug.

For reference, the way IB’s outlet reconnections work is a whole topic in itself. See here for a discussion of how the memory management works on each platform. It’s just similar enough to convince you that it’s the same, and just different enough to cause you to lose hair. For instance, it looks like Mac OS retains everything first and then autoreleases only those objects that appear to IB as if they have a parent. Meanwhile iOS retains and then autoreleases everything, and then retains only those objects that appear to IB as if they have a parent. Maddeningly backwards. Seriously, go read the docs.

But here’s the fix, and apparently this design pattern works across all the hairy platforms and runtimes.

  1. Every IBOutlet should have an @property which is declared retain. Every time.
  2. The @property you declared in #1 should be set to nil on dealloc. (This is the release to balance the @property’s retain)
  3. The @property you declared in #1 should be set to nil on viewDidUnload. (This is the release to balance the @property’s retain in the low-memory case. Not entirely clear if this applies to only 2.x or if it applies to 3.x+ as well… it appears in a Note box in the Apple docs that is otherwise talking about an “implementation detail” of OS2.)
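Applied to the snippet from earlier, the three rules look roughly like this. It is a sketch under assumptions rather than the app's real code: the class name PhotoViewController is made up, and scroller is assumed to be a UIScrollView outlet. The names old and scroller come from the original snippet.

```objc
#import <UIKit/UIKit.h>

// Hypothetical view controller illustrating rules 1-3 under manual
// retain/release (no ARC).
@interface PhotoViewController : UIViewController {
    UIView *old;
    UIScrollView *scroller;   // assumed type; the post does not say
}
// Rule 1: every outlet gets a retain property.
@property (nonatomic, retain) IBOutlet UIView *old;
@property (nonatomic, retain) IBOutlet UIScrollView *scroller;
@end

@implementation PhotoViewController

@synthesize old, scroller;

- (void)viewDidLoad {
    [super viewDidLoad];
    // Replace the IB placeholder with a view built in code.
    UIView *replacement = [[[UIView alloc] initWithFrame:self.old.frame] autorelease];
    [self.old removeFromSuperview];
    [self.scroller addSubview:replacement];
    // Going through the setter releases the placeholder and retains the
    // replacement, so the outlet always points at an object we own.
    self.old = replacement;
}

- (void)viewDidUnload {
    // Rule 3: give up the outlets when the view is unloaded.
    self.old = nil;
    self.scroller = nil;
    [super viewDidUnload];
}

- (void)dealloc {
    // Rule 2: balance the property's retain.
    self.old = nil;
    self.scroller = nil;
    [super dealloc];
}

@end
```

The important difference from the crashing version is the last line of viewDidLoad: assigning through self.old keeps the retain counts balanced, whereas writing to the bare old variable swaps the pointer without retaining the new view or releasing the old one.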

Is this a pain? Yes. Now I have to interact with every IBOutlet no less than SIX FREAKING DIFFERENT TIMES. (Variable declaration, IB connection, @property declaration, @synthesize, dealloc, and viewDidUnload). Yuck. But it did fix the bug.
