In collecting feedback on my previous post discussing the new hotness of NSIncrementalStore, I seem to have unexpectedly lit a fuse. On the one hand, that blog post has spawned a dozen new projects and has kept my inbox unusually full. On the other hand, it met an unexpected amount of resistance–not just to the new workflow for networked models introduced in iOS 5, but to the use of Core Data at all. As I’ve discussed this with more and more developers, I’ve found a lot of prevailing myths. Among them, Core Data is designed for something–not really sure what–but whatever it is, it’s a lot more complicated than what I need to do in my project. I just want to save some entities to disk. It shouldn’t take a wheelbarrow of NSManagedSomethingSomethings and programming guides in the hundreds of pages to solve that problem!
I think this pretty much sums up the key objection to CoreData. Many developers originally migrate from somewhere like Ruby or Python with a library ecosystem that is pretty reasonable. And so the initial approach is something like “We need networking? Okay, let’s install ASIHTTP!” (If this is you, the original author of ASIHTTPRequest no longer recommends its use. Clue.) Let’s rope in Three20! Maybe KIF! And DDLog, or maybe Lumberjack! Okay, now just write some glue code!
For endlessly-debated reasons, this pattern isn’t viable on iOS for anything beyond very simple applications, despite being a core tenet of other software ecosystems. For starters, iOS lacks any kind of reasonable package management (yes, I’ve heard of CocoaPods, that’s a talk for another day), continuous integration that provides some semblance of test coverage for your library is difficult, to put it mildly, building “real” libraries is forbidden, faking it requires patched versions of Xcode that break under key corner cases, installing complicated libraries requires a lot of documentation and often “works for me” and not anyone else on your team, there is a known bug in LLVM since the dark ages that prevents some libraries from working correctly out of the box, and the popular workaround causes a lot of problems when using lots of libraries. In short, the toolchain and ecosystem are openly hostile to library use. As a result, plenty of smart people have run into one or two or six of these, and have started to carefully watch the number of libraries they let into an application.
So most competent iOS developers have this unusually heightened spidey sense that tingles every time you try to talk them into using a library. Why mess with all that dependency BS when we can just roll our own? And so people joke about the wheelbarrow of NSManagedSomethingSomethings like, you know, who needs that. It’s just a plain terrible library, and we only have a few free library slots that are reserved for more deserving contestants.
If you had a time machine, you could travel back to the year 2000, when Joel wrote:
“It’s a big hairy mess,” they will tell you. “I’d like nothing better than to throw it out and start over.”
Why is it a mess?
“Well,” they say, “look at this function. It is two pages long! None of this stuff belongs in there! I don’t know what half of these API calls are for.”
It’s incredible how well that article has aged. It could easily be a criticism of Core Data today. Joel goes on to write:
Back to that two page function. Yes, I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.
Joel is saying that code is hairy because real problems are hard. I’m arguing that CoreData is hairy because it solves real problems. There is no universe in which you are going to avoid writing a big hairy mess because you understand how to structure real-world data access for Mac and iOS applications better than Apple. None. CoreData is used across tens of thousands of applications inside and outside of Cupertino because it’s good. It solves problems your software actually has.
Here are some things you probably haven’t thought about when architecting your so-called data stack:
Just to be clear: if your requirements never change and you never update your applications, you never interact with remote objects, your users never make mistakes, your data operations are all instantaneous, your users all run 12-core Mac Pros, and you only have one view, Core Data is definitely not the correct tool for you. Go, write your own data stack, and be merry. The rest of us should be using Core Data.
Also to be clear, using CoreData here and there does not magically solve all your thread synchronization problems, or build a complete sync engine for you. It’s not some magical dust you sprinkle on and life is sunshine and rainbows. Multithreading is still one of the most challenging problems in all of computer programming. But, if you’re going to be at the front lines of combat, you might as well be using military standard issue equipment, not forks and hope and railway-shares. Use the same tools that other people use to solve hard problems.
There is a huge temptation to believe that we “just” need to read and write a few objects from disk. After all: undo support was never mentioned in the specification, and we barely have enough time to write a single-threaded application! Yet. Just you wait. There are three things of which we can be certain: death, taxes, and requirements creep. Core Data encapsulates a lot of the data-related tasks that are common to Cocoa applications, and as such, encapsulates many of the hidden requirements that your users expect but haven’t told you about yet. I give you perhaps until beta before your app has crept to more than 60% of these. Just give in and make the jump at the beginning.
Cocoa is a pretty big learning curve, no? The initial hurdles are learning MVC and learning not to subclass for everything like you do in Java. Then it’s delegation, the view hierarchy, and other intermediate topics.
Have you ever stopped to think about how complicated a UIButton is? I mean, you have, obviously, the usual stuff with views and windows and frames and bounds. Mix in some UIControl stuff with target/action patterns. But there’s a lot of complexity that is specific to buttons. How should a button react when you touch down on it? (iOS draws some shading by default). Does it make sense to customize this behavior for some kinds of buttons? What happens if a button is selected (an application-specific button state, like the toggle position of a switch or the check/unchecked property of a checkbox) and you touch inside it? What about when it is disabled? Are selected and highlighted and disabled mutually exclusive, or do we consider a button to have many different kinds of “highlighted” states? How should we handle UIControlStateSelected (which is often application-specific) any differently from UIControlStateApplication? And to further confuse you, the state of a button may or may not be a bitmask field in practice, depending on how you read the documentation. Is a button in a single state or is it in multiple states? We’re just scratching the surface. Does it make sense for a button of type UIButtonTypeInfoDark to even have a size? How should it behave to setFrame? Why is this so hard? I just want to detect a tap on the screen!
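To make the "single state or multiple states" question concrete, here is a minimal sketch of the bitmask reading. `ControlState` is a stand-in I made up so the snippet runs without UIKit; its raw values mirror the UIKit constants as I understand them, and the title lookup with its fall-back-to-normal rule is an assumption modeled on what the documentation describes, not UIButton's actual implementation.

```swift
// Stand-in for UIKit's control state type (hypothetical; not the real UIKit type).
struct ControlState: OptionSet {
    let rawValue: UInt
    static let normal: ControlState = []                         // 0: no bits set
    static let highlighted = ControlState(rawValue: 1 << 0)
    static let disabled    = ControlState(rawValue: 1 << 1)
    static let selected    = ControlState(rawValue: 1 << 2)
    static let application = ControlState(rawValue: 0x00FF0000)  // bits left for app-defined flags
}

// Hypothetical per-state title table, keyed by the exact combination of bits.
var titles: [UInt: String] = [:]

func setTitle(_ title: String, for state: ControlState) {
    titles[state.rawValue] = title
}

// Assumed lookup rule: exact match first, otherwise fall back to the normal title.
func title(for state: ControlState) -> String? {
    if let exact = titles[state.rawValue] {
        return exact
    }
    return titles[ControlState.normal.rawValue]
}

setTitle("Follow", for: .normal)
setTitle("Following", for: .selected)

// A selected button that is currently being touched is in two states at once.
let pressedWhileSelected: ControlState = [.selected, .highlighted]
let isSelected = pressedWhileSelected.contains(.selected)   // true
let isNormal = pressedWhileSelected == .normal              // false
let shownTitle = title(for: pressedWhileSelected)           // no exact entry, so the normal title
```

Under this rule the combined state quietly shows "Follow" rather than "Following" unless you register a title for the combination too, which is exactly the kind of hair the documentation has to explain.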
Now consider something that’s actually a little bit complicated, like UITableView. I actually got it out and counted–the documentation that is unique to table views and unique to iOS runs to 168 pages (and it’s dwarfed by the Mac documentation). That’s crazy! Why should I even use dequeueReusableCellWithIdentifier? Why should I implement heightForRowAtIndexPath? Why are the data source and the delegate different? Why do I need to call deselectRowAtIndexPath? Why is this so freaking hard when all I want to do is display some rects the user can tap on?
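For what it's worth, the idea behind cell reuse fits in a few lines. This is a sketch of the general technique only: `Cell` and `ReusePool` are hypothetical names, and nothing here claims to be how UITableView works internally. It simulates scrolling through 1,000 rows with 10 on screen at a time and counts how many cell objects ever get created.

```swift
// Hypothetical stand-in for a table cell.
final class Cell {
    var text = ""
}

// Minimal reuse pool: cells that leave the screen are parked by identifier
// and handed back out before any new cell is allocated.
final class ReusePool {
    private var idle: [String: [Cell]] = [:]
    private(set) var created = 0

    func dequeue(identifier: String) -> Cell {
        if let cell = idle[identifier]?.popLast() {
            return cell          // reuse a parked cell
        }
        created += 1
        return Cell()            // nothing parked, so allocate
    }

    func recycle(_ cell: Cell, identifier: String) {
        idle[identifier, default: []].append(cell)
    }
}

let pool = ReusePool()
var visible: [Cell] = []

for row in 0..<1000 {
    // Once the "screen" is full, the top row scrolls off and is recycled.
    if visible.count == 10 {
        pool.recycle(visible.removeFirst(), identifier: "row")
    }
    let cell = pool.dequeue(identifier: "row")
    cell.text = "Row \(row)"     // the caller must reconfigure a reused cell
    visible.append(cell)
}

print("cells created:", pool.created)
```

The number of allocations tracks what fits on screen, not the number of rows, and the price is that every dequeued cell arrives with stale contents you have to overwrite.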
Or my personal favorite: why do we have these conventions for user interface animations? Why can’t I just roll my own? (Answer: because realistic animations are hard.)
And so developers, kicking and screaming, eventually settle on the idea that Cocoa is in fact not arbitrarily terrible, and that things are this way because users want certain things that are just plain difficult to deliver, you’re a bit overoptimistic with your estimates, and that these Cocoa APIs really are very good solutions to the problems users typically ask you to solve. And nobody seriously advocates let’s drop Cocoa and roll our own UI library. (Well, except for the gaming developers. And let’s face it, gaming developers are crazy.)
Core Data is basically Cocoa for models. Apple has been shipping software with the MVC pattern for a very long time. They have invented a set of primitives that are really freaking good. Good for large applications, small applications, simple applications, complicated applications, in a box, with a fox, etc.
I think that a big part of the problem is that it is easier (but not easy) to see what it means for a framework that pushes pixels to be good than for a framework that emits SQL queries to be good. A framework that automatically shades our button image when we click on it seems like a good framework, intuitively. dequeueReusableCellWithIdentifier seems like a bad pattern at first, but you can see the effects of doing things the other way, and it’s bad. So even if it’s not immediately evident, it’s eventually evident.
The benefits of a data framework that you don’t use are never evident, so you have a whole category of people who just never have an opportunity to evaluate it on a real project. Then you have another group of people who give it a “fair shot” of one project–you remember when you got Cocoa on the first try?–and then curse it forever. I was being recruited for a job once where the interviewer suddenly went on a tirade that CoreData was “not multithreaded” and was “slow” and that they were moving their code to SQLite immediately, because of all the “problems” they were experiencing with a fairly mundane data entry application. Uh… what?
I’ve actually discovered that asking questions about the data model is a pretty good filter for joining a Cocoa project. If it’s backed by CoreData, you’re talking to a programmer that has at least earned the “read a programming book and wants to try something new” badge. If the models are backed by SQLite, or a home-grown stack, or by no stack, the badness is basically unbounded. (Of course it can be very good–I’ve been in a few good SQLite projects. It just probably isn’t.)
Of course there are. But they’re strange.
For example, Brent Simmons has a really fantastic article documenting his switch away from CoreData for NetNewsWire. It ends like this:
My warning: you probably don’t need to switch away from Core Data. It’s the right answer almost every time.
If you go through that article you discover that there’s probably nothing he’s running into that applies to you. It’s a corner case. Corner cases exist, and if you’re in one, don’t use Core Data.
Most of the criticism, though, is of this variety:
If I had been writing that post I probably wouldn’t have praised Core Data as much as he did, although admittedly because I rarely use it, and not at all in any shipping applications. Its approach always seemed slightly wrong to me.
My main reason for sticking with SQL directly is that I know by coding at this lower level — with my own lightweight model objects on top of FMDB and utility methods for working with the Clipstart database — that if something is slow it’s my fault. I can fix things that are my fault. I can’t fix fundamental design problems in Apple’s code.
I don’t want to be too hard on Manton–I’m sure he’s a nice guy, seems local too–but the politest way to characterize that comment is willful ignorance. iOS and Mac development is chock full of Apple magic that he has no visibility into–what makes Core Data different from the rest of Cocoa? What are these fundamental design issues that cannot be corrected? He proudly publishes that he has never tried it long enough to know.
I have heard a lot of people with this same general sentiment–“I haven’t really tried it and it sucks!”–which is just not a very rational sentiment, for starters. Don’t get me wrong, there is a set of things that are reasonable to bash without really giving a fair shake. I’m just saying that the technology that underlies iCal, Mail, Contacts, iCloud, iMovie, and half a dozen other multi-million dollar software projects that you use daily is probably not in that set.
Let me let Apple do the talking:
There are a number of reasons why it may be appropriate for you to use Core Data. One of the simplest metrics is that, with Core Data, the amount of code you write to support the model layer of your application is typically 50% to 70% smaller as measured by lines of code. This is primarily due to the features listed above—the features Core Data provides are features you don’t have to implement yourself. Moreover they’re features you don’t have to test yourself, and in particular you don’t have to optimize yourself.
Core Data has a mature code base whose quality is maintained through unit tests, and is used daily by millions of customers in a wide variety of applications. The framework has been highly optimized over several releases. It takes advantage of information provided in the model and runtime features not typically employed in application-level code. Moreover, in addition to providing excellent security and error-handling, it offers best memory scalability of any competing solution. Put another way: you could spend a long time carefully crafting your own solution optimized for a particular problem domain, and not gain any performance advantage over what Core Data offers for free for any application.
I think this point needs to be stressed: Apple’s high-level APIs can be much faster than ‘optimized’ code at lower levels. CoreData is quite often going to be faster than whatever SQL you write manually.
I should add to this, Core Data is not the only high-level API that is so good it regularly outperforms people trying to work at a lower level. UIImageView is another case, where it is literally the fastest way to put an image on the screen you will ever come up with. I can’t count the number of times that I’ve seen smart people reject CALayer or UIView-based approaches to drawing, opting instead to drop to Quartz or CGContext… only to have it run much slower than the naive view implementation, which is very often emitting hand-optimized GPU assembler as part of its implementation. High-level doesn’t always mean slower; in fact it often means faster if the API boys have done their homework.
And that 50-70% figure is taken from actual Apple codebases that were migrated to CoreData. Real numbers, from real applications. That you probably have installed.
Well, I don’t think there is a real reason not to use Core Data for virtually any project of any size.
But there are plenty of reasons why it has an (undeserved) bad rap. These include: