In collecting feedback on my previous post discussing the new hotness of NSIncrementalStore, I seem to have unexpectedly lit a fuse. On the one hand, that blog post has spawned a dozen new projects and has kept my inbox unusually full. On the other hand, it met an unexpected amount of resistance, not just to the new workflow for networked models introduced in iOS 5, but to the use of Core Data at all. As I’ve discussed this with more and more developers, I’ve found a lot of prevailing myths. Among them:
Core Data is designed for something (not really sure what), but whatever it is, it’s a lot more complicated than what I need to do in my project. I just want to save some entities to disk. It shouldn’t take a wheelbarrow of NSManagedSomethingSomethings and programming guides in the hundreds of pages to solve that problem!
I think this pretty much sums up the key objection to CoreData. Many developers originally migrate from somewhere like Ruby or Python with a library ecosystem that is pretty reasonable. And so the initial approach is something like “We need networking? Okay, let’s install ASIHTTP!” (If this is you, the original author of ASIHTTPRequest no longer recommends its use. Clue.) Let’s rope in Three20! Maybe KIF! And DDLog, or maybe Lumberjack! Okay, now just write some glue code!
For endlessly-debated reasons, this pattern isn’t viable on iOS for anything beyond very simple applications, despite being a core tenet of other software ecosystems. For starters, iOS lacks any kind of reasonable package management (yes, I’ve heard of CocoaPods, that’s a talk for another day); continuous integration that provides some semblance of test coverage for your library is difficult, to put it mildly; building “real” libraries is forbidden; faking it requires patched versions of Xcode that break under key corner cases; installing complicated libraries requires a lot of documentation and often “works for me” and not for anyone else on your team; there is a known bug in LLVM since the dark ages that prevents some libraries from working correctly out of the box; and the popular workaround causes a lot of problems when using lots of libraries. In short, the toolchain and ecosystem are openly hostile to library use. As a result, plenty of smart people have run into one or two or six of these, and have started to carefully watch the number of libraries they let into an application.
So most competent iOS developers have this unusually heightened spidey sense that tingles every time you try to talk them into using a library. Why mess with all that dependency BS when we can just roll our own? And so people joke about the wheelbarrow of NSManagedSomethingSomethings like, you know, who needs that. It’s just a plain terrible library, and we only have a few free library slots that are reserved for more deserving contestants.
If you had a time machine, you could travel back to the year 2000, when Joel wrote:
“It’s a big hairy mess,” they will tell you. “I’d like nothing better than to throw it out and start over.”
Why is it a mess?
“Well,” they say, “look at this function. It is two pages long! None of this stuff belongs in there! I don’t know what half of these API calls are for.”
It’s incredible how well that article has aged. It could easily be a criticism of Core Data today. Joel goes on to write:
Back to that two page function. Yes, I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.
Joel is saying that code is hairy because real problems are hard. I’m arguing that CoreData is hairy because it solves real problems. There is no universe in which you are going to avoid writing a big hairy mess because you understand how to structure real-world data access for Mac and iOS applications better than Apple. None. CoreData is used across tens of thousands of applications inside and outside of Cupertino because it’s good. It solves problems your software actually has.
Here are some things you probably haven’t thought about when architecting your so-called data stack:
Just to be clear: if your requirements never change and you never update your applications, you never interact with remote objects, your users never make mistakes, your data operations are all instantaneous, your users all run 12-core Mac Pros, and you only have one view, Core Data is definitely not the correct tool for you. Go, write your own data stack, and be merry. The rest of us should be using Core Data.
Also to be clear, using CoreData here and there does not magically solve all your thread synchronization problems, or build a complete sync engine for you. It’s not some magical dust you sprinkle on and life is sunshine and rainbows. Multithreading is still one of the most challenging problems in all of computer programming. But, if you’re going to be at the front lines of combat, you might as well be using military standard issue equipment, not forks and hope and railway-shares. Use the same tools that other people use to solve hard problems.
There is a huge temptation to believe that we “just” need to read and write a few objects from disk. After all: undo support was never mentioned in the specification, and we barely have enough time to write a single-threaded application! Yet. Just you wait. There are three things of which we can be certain: death, taxes, and requirements creep. Core Data encapsulates a lot of the data-related tasks that are common to Cocoa applications, and as such, encapsulates many of the hidden requirements that your users expect but haven’t told you about yet. I give you perhaps until beta before your app has crept to more than 60% of these. Just give in and make the jump at the beginning.
Cocoa is a pretty big learning curve, no? The initial hurdles are learning MVC and learning not to subclass for everything like you do in Java. Then it’s delegation, the view hierarchy, and other intermediate topics.
Have you ever stopped to think about how complicated a UIButton is? I mean, you have, obviously, the usual stuff with views and windows and frames and bounds. Mix in some UIControl stuff with target/action patterns. But there’s a lot of complexity that is specific to buttons. How should a button react when you touch down on it? (iOS draws some shading by default). Does it make sense to customize this behavior for some kinds of buttons? What happens if a button is selected (an application-specific button state, like the toggle position of a switch or the check/unchecked property of a checkbox) and you touch inside it? What about when it is disabled? Are selected and highlighted and disabled mutually exclusive, or do we consider a button to have many different kinds of “highlighted” states? How should we handle UIControlStateSelected (which is often application-specific) any differently from UIControlStateApplication? And to further confuse you, the state of a button may or may not be a bitmask field in practice, depending on how you read the documentation. Is a button in a single state or is it in multiple states? We’re just scratching the surface. Does it make sense for a button of type UIButtonTypeInfoDark to even have a size? How should it behave to setFrame? Why is this so hard? I just want to detect a tap on the screen!
Now consider something that’s actually a little bit complicated, like UITableView. I actually got it out and counted: the documentation that is unique to table views and unique to iOS runs to 168 pages (and it’s dwarfed by the Mac documentation). That’s crazy! Why should I even use dequeueReusableCellWithIdentifier? Why should I implement heightForRowAtIndexPath? Why are the data source and the delegate different? Why do I need to call deselectRowAtIndexPath? Why is this so freaking hard when all I want to do is display some rects the user can tap on?
Or my personal favorite: why do we have these conventions for user interface animations? Why can’t I just roll my own? (Answer: because realistic animations are hard.)
And so developers, kicking and screaming, eventually settle on the idea that Cocoa is in fact not arbitrarily terrible, and that things are this way because users want certain things that are just plain difficult to deliver, you’re a bit overoptimistic with your estimates, and that these Cocoa APIs really are very good solutions to the problems users typically ask you to solve. And nobody seriously advocates let’s drop Cocoa and roll our own UI library. (Well, except for the gaming developers. And let’s face it, gaming developers are crazy.)
Core Data is basically Cocoa for models. Apple has been shipping software with the MVC pattern for a very long time. They have invented a set of primitives that are really freaking good. Good for large applications, small applications, simple applications, complicated applications, in a box, with a fox, etc.
I think that a big part of the problem is that it is easier (but not easy) to see what it means for a framework that pushes pixels to be good than for a framework that emits SQL queries to be good. A framework that automatically shades our button image when we click on it seems like a good framework, intuitively. dequeueReusableCellWithIdentifier seems like a bad pattern at first, but you can see the effects of doing things the other way, and it’s bad. So even if it’s not immediately evident, it’s eventually evident.
The benefits of a data framework that you don’t use are never evident, so you have a whole category of people who just never have an opportunity to evaluate it on a real project. Then you have another group of people who give it a “fair shot” of one project (you remember when you got Cocoa on the first try?) and then curse it forever. I was being recruited for a job once where the interviewer suddenly went on a tirade that CoreData was “not multithreaded” and was “slow” and that they were moving their code to SQLite immediately, because of all the “problems” they were experiencing with a fairly mundane data entry application. Uh… what?
I’ve actually discovered that asking questions about the data model is a pretty good filter for joining a Cocoa project. If it’s backed by CoreData, you’re talking to a programmer that has at least earned the “read a programming book and wants to try something new” badge. If the models are backed by SQLite, or a home-grown stack, or by no stack, the badness is basically unbounded. (Of course it can be very good–I’ve been in a few good SQLite projects. It just probably isn’t.)
Of course there are. But they’re strange.
For example, Brent Simmons has a really fantastic article documenting his switch away from CoreData for NetNewsWire. It ends like this:
My warning: you probably don’t need to switch away from Core Data. It’s the right answer almost every time.
If you go through that article you discover that there’s probably nothing he’s running into that applies to you. It’s a corner case. Corner cases exist, and if you’re in one, don’t use Core Data.
Most of the criticism, though, is of this variety:
If I had been writing that post I probably wouldn’t have praised Core Data as much as he did, although admittedly because I rarely use it, and not at all in any shipping applications. Its approach always seemed slightly wrong to me.
My main reason for sticking with SQL directly is that I know by coding at this lower level — with my own lightweight model objects on top of FMDB and utility methods for working with the Clipstart database — that if something is slow it’s my fault. I can fix things that are my fault. I can’t fix fundamental design problems in Apple’s code.
I don’t want to be too hard on Manton–I’m sure he’s a nice guy, seems local too–but the politest way to characterize that comment is willful ignorance. iOS and Mac development is chock full of Apple magic that he has no visibility into–what makes Core Data different from the rest of Cocoa? What are these fundamental design issues that cannot be corrected? He proudly publishes that he has never tried it long enough to know.
I have heard a lot of people with this same general sentiment, “I haven’t really tried it and it sucks!”, which is just not a very rational sentiment, for starters. Don’t get me wrong, there is a set of things that are reasonable to bash without really giving them a fair shake. I’m just saying that the technology that underlies iCal, Mail, Contacts, iCloud, iMovie, and half a dozen other multi-million dollar software projects that you use daily is probably not in that set.
Let me let Apple do the talking:
There are a number of reasons why it may be appropriate for you to use Core Data. One of the simplest metrics is that, with Core Data, the amount of code you write to support the model layer of your application is typically 50% to 70% smaller as measured by lines of code. This is primarily due to the features listed above—the features Core Data provides are features you don’t have to implement yourself. Moreover they’re features you don’t have to test yourself, and in particular you don’t have to optimize yourself.
Core Data has a mature code base whose quality is maintained through unit tests, and is used daily by millions of customers in a wide variety of applications. The framework has been highly optimized over several releases. It takes advantage of information provided in the model and runtime features not typically employed in application-level code. Moreover, in addition to providing excellent security and error-handling, it offers best memory scalability of any competing solution. Put another way: you could spend a long time carefully crafting your own solution optimized for a particular problem domain, and not gain any performance advantage over what Core Data offers for free for any application.
I think this point needs to be stressed: Apple’s high-level APIs can be much faster than ‘optimized’ code at lower levels. CoreData is quite often going to be faster than whatever SQL you write manually.
I should add to this, Core Data is not the only high-level API that is so good it regularly outperforms people trying to work at a lower level. UIImageView is another case, where it is literally the fastest way to put an image on the screen you will ever come up with. I can’t count the number of times that I’ve seen smart people reject CALayer or UIView-based approaches to drawing, opting instead to drop to Quartz or CGContext… only to have it run much slower than the naive view implementation, which is very often emitting hand-optimized GPU assembler as part of its implementation. High-level doesn’t always mean slower; in fact it often means faster if the API boys have done their homework.
And that 50-70% figure is taken from actual Apple codebases that were migrated to CoreData. Real numbers, from real applications. That you probably have installed.
Well, I don’t think there is a real reason not to use Core Data for virtually any project of any size.
But there are plenty of reasons why it has an (undeserved) bad rap. These include:
You’re almost certainly right that Core Data is excellent. My main problem with it is its opacity. The documentation is typical Apple fare: it covers the basics, but doesn’t give the deep insight that will help you when you run into weird situations. I’ve not read the dedicated books on the topic, but from my research (reviews, Amazon “look inside”) they seem to be barely more than regurgitated Apple docs, aside from some optimisation tips. Maybe my impression of the books is wrong and there are some excellent ones.
I should probably mention my level of exposure to Core Data: I’ve used Core Data in a small and simple CRUD OS X app and spent some time researching it before deciding against using it in an iOS app (back on iOS 4).
The OS X app really was extremely simple, but we still managed to run into all sorts of weird cases that are either not covered by CD at all or just not documented. Most of these were probably just unexpected Bindings/KVO/CD interplay. It doesn’t help that the documentation for these systems is pretty much completely separate, yet you end up using them in a deeply integrated way.
For example, it seemed reasonable to us to have a Cocoa sheet for creating a new instance of one of the more complex entities that encapsulated a bit of logic to derive some data. For such an instance to make sense, we needed to get all the data for it in one go, and then only run the logic for creating the object once the user was ready. The Bindings/KVO/CD triad doesn’t seem to be ready for this case at all, as it always wants you to create the object and insert it into the managed object context first. This of course causes it to appear in other views, even though it’s not “ready”. You can work around this of course, but we didn’t find a solution we were really happy with.
The iOS app wasn’t a straightforward CRUD situation, although the data model was heavily relational. Most of the data was read-only, generated in a web app back-end, to be pushed out to mobile clients for offline use. Lots of image data, which pushed it up into the dozens of megabytes; the data changed somewhat frequently but the extent of the changes was usually small, so we wanted to avoid redownloading the whole database every time. Parallel to this were two types of object which WERE mutable on the client, but were only stored locally. I ended up using two SQLite databases, one big one opened in read-only mode, generated on the server, and a small read-write one. For JOINs, I also ATTACHed the local one to the big database’s session. We made binary diffs between versions of the readonly database and had the client download those and patch them locally.
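The two-database arrangement described above can be sketched roughly as follows. This is only an illustration using Python's built-in `sqlite3` module rather than the original iOS code; the file names, tables and columns are invented for the example, and the binary-diff patching step is not covered.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def make_databases(directory):
    """Create a stand-in 'server-generated' catalog and a small local database."""
    catalog = Path(directory) / "catalog.sqlite"  # hypothetical file name
    local = Path(directory) / "local.sqlite"      # hypothetical file name

    # Stand-in for the big database that would be generated on the server.
    with closing(sqlite3.connect(catalog)) as db, db:
        db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT)")
        db.executemany("INSERT INTO item VALUES (?, ?)",
                       [(1, "Alpha"), (2, "Beta"), (3, "Gamma")])

    # Small database for the objects that are only ever stored on the client.
    with closing(sqlite3.connect(local)) as db, db:
        db.execute("CREATE TABLE favorite (item_id INTEGER PRIMARY KEY, note TEXT)")
        db.execute("INSERT INTO favorite VALUES (?, ?)", (2, "read later"))

    return catalog, local


def open_joined(catalog, local):
    """Open the catalog read-only, then attach the local file under the schema name 'local'."""
    conn = sqlite3.connect(f"{catalog.resolve().as_uri()}?mode=ro", uri=True)
    conn.execute("ATTACH DATABASE ? AS local", (str(local),))
    return conn


def favorites_with_titles(conn):
    """Run a single JOIN that spans both database files."""
    return conn.execute(
        "SELECT i.title, f.note "
        "FROM main.item AS i JOIN local.favorite AS f ON f.item_id = i.id "
        "ORDER BY i.id"
    ).fetchall()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        catalog, local = make_databases(tmp)
        with closing(open_joined(catalog, local)) as conn:
            print(favorites_with_titles(conn))  # [('Beta', 'read later')]
```

Because the catalog is opened with `mode=ro`, any attempt to write to `main.item` through this connection fails, which keeps the server-generated file byte-identical and therefore patchable.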
I’m still not sure to this day if Core Data would have (a) worked in that situation or (b) saved me any effort. The main risks I saw were:
– Generating a Core Data store outside the Apple ecosystem (the source of the data was an existing webapp running on a Linux server). I’m sure the SQLite schema isn’t too hard to copy, but I didn’t find any attempts from Apple whatsoever to aid or document this kind of interop.
– Efficient syncing of the server-generated data and combining it with local changes. Maybe this is solved with CD from iOS 5; I haven’t looked into it. If that’s iCloud-only, that’s a dealbreaker yet again though.
So I think it comes down to the fact that Core Data is just a bit too opaque and thus a rather risky proposition. I mean, it’s a lot more documented than a lot of third-party libraries, but crucially, those third-party libraries let me look at the source code (and modify it) when the documentation fails and I get stuck.
So although Core Data probably does more, better, its opacity is a liability. And unlike almost everything else about Apple’s libraries, Core Data is an all-or-nothing proposition. If whatever UIKit component isn’t working for you on a screen, you can roll your own FOR THAT SCREEN. If you run into an unfixable problem with Core Data, you either have to work around it, or migrate the whole data model for your app out of it. That’s a big risk.
I’ll certainly still re-evaluate Core Data for new apps, but I think you’re glossing over some important reasons not to use it.
I have pretty much the same sentiment regarding Core Data. It was somehow a little bit hard to get started, but after I managed to understand how it actually works, I really just forgot about the existence of SQLite. And I tell you, I had a pretty neat case where I was about to drop Core Data because it just didn’t “do” what I wanted. Of course, after some more research, I was able to get that requirement out of it. It’s really a great tool, and if you don’t use it, really, you are doing something wrong, to quote the author.
Thanks Drew !
Hi there and thanks for the great article!!
One question: I’ve always heard that CoreData isn’t a relational database, that is, it’s not easy to perform my complicated SQL queries with it.
I mean, is it simple to execute a complicated query and get back the result set in the form of an NSDictionary as FMDB does?
Because if it’s not, then many applications should stick with FMDB.
What do you think about it?
Thank you very much!
Paolo
Great post. Definitely encouraged me to try CoreData for a future project.
I’d add an obvious argument to your list at the end: using CoreData doesn’t sound good if you need to port the app to a non-Apple platform. If you need to ensure it behaves the same way on Android and iOS then you’d use SQLite.
Excellent arguments.
I used every one of these excuses to steer away from Core Data over the past. But 2 months ago I had a project that simply HAD to have Core Data because I was not about to hand roll any SQLite backends.
Short story: barely 100 lines of code later, Core Data has magically saved the day and fit in perfectly in a multithreaded, feature-creeped, relational model application.
I don’t think I’ll ever build an app without Core Data again.
Hi Phil,
Core Data is not opaque at all. If you haven’t found good documentation, you haven’t looked hard enough. It’s been in production since the NeXT days (i.e. decades), much longer than, say, SQLite. There are many, many good books and WWDC talks that delve deeply into internals.
There are two things you can do in your situation: either set a boolean flag that means “really created”, or you could use two MOC/store pairs: one for “inserted” and one as a staging area.
For read-only data, SQLite isn’t such a bad idea as an interchange format with a web service. So you might have gotten away with an SQLite dump for the read-only stuff plus CD for the read-write stuff. You should also look into NSIncrementalStore for web service data communication, see my earlier post. CD’s web service integration is a lot more advanced than SQLite’s (that is, it is not nonexistent).
> If you run into an unfixable problem with Core Data, you either have to work around it, or migrate the whole data model for your app out of it. That’s a big risk.
Well, first of all, I’ve never run into this kind of issue. But you completely can migrate out on a model-by-model basis if you want to. From application code they’re just Cocoa objects. So you can move back and forth between NSKeyedArchiver (or whatever) without any change to the view or controller code. Assuming, of course, you’re practicing good MVC.
Drew, any chance you could recommend some of those “many, many good books”? Maybe it’s just the recent iOS book goldrush that has produced a bunch of bad books on the subject.
Also, I can’t help pointing out the irony in “If you haven’t found good documentation, you haven’t looked hard enough,” as a follow-up to “NSIncrementalStore is perhaps the best-kept secret in iOS 5. It doesn’t show up in any advertising materials, barely rates a passing mention in WWDC tech talks, and has the world’s shortest programming guide, at only a few pages. It is so well hidden that I only discovered it very recently.” (from the NSIncrementalStore post)
Re: WWDC – noted; if I embark on another project that’s a Core Data candidate, I’ll look through the archives.
Re “staging” objects – yeah, those are the two solutions we encountered. Both give me the creeps as dirty, dirty hacks. The flag doesn’t scale as suddenly the whole rest of the app needs to know about this implementation detail. The solution with two MOCs is far from obvious. Maybe it’s documented somewhere, I don’t know, but I eventually found it in some mailing list post from the early 2000s.
For the hybrid read-only SQLite & read-write CD: does that permit some kind of efficient cross-database JOINs, like raw SQLite does?
> “Well, first of all, I’ve never run into this kind of issue. But you completely can migrate out on a model-by-model basis if you want to. ”
That isn’t a real solution though, is it? If you can’t run queries across your whole data model, that makes it fairly useless. So you’re basically stuck with going the whole hog.
I used SQLite because I had to build the same functionality into both an iPhone app and an Android app. Even though I knew I’d have to rewrite the code in Java, I figured that having the same database schema, the same SQL queries, and the same function calls across both platforms would be beneficial. Was I wrong?
@Frank, I think if you have a lot of queries that you can share e.g. in a text file that is shared between both platforms, you can potentially get some reduced maintenance burden because new features over here can be more easily ported over there.
But, you have to balance this against, for example, having to roll your own undo support, additional time to performance tune, and so on. There is a class of applications for which this tradeoff is a net positive, but I would say probably not the majority.
@Phil,
Pro Core Data by Privat is the bible these days, Zarra’s book is dated but still very good.
Claiming that CD’s documentation is bad because NSIncrementalStore’s documentation is bad is like claiming UIKit’s documentation is bad because UITextInputStringTokenizer’s documentation is bad. Both classes are advanced tools for solving advanced problems, and are a small part of a much larger whole. I wrote the NSIncrementalStore post to tell the advanced CD guys “Hey, you should take a look at this!”. But my discussions with people IRL have often devolved to “Why should I be using CD anyway?”, so it occurred to me that there needed to be a more surfacey discussion of why CD is a good idea to begin with before delving into the One True Architecture conversation.
> The solution with two MOCs is far from obvious.
“Obvious” is subjective. Obvious to you. I came up with it in 30 seconds. (I’m also a CD veteran.)
A lot of the things we programmers use (Cocoa, MVC, ARC, autolayout, UITableView, git) are “far from obvious”. But they are still incredibly powerful tools to solve incredibly common problems. For something that is a critical part of the toolbelt, “obvious” is not the right criterion. A Keurig coffee maker with one button is great if you just need some coffee, but if you make coffee for a living, you need an expensive espresso machine with the levers and analog pressure meters and hoses and all that. I write models code for a living, and anyone in that position needs actual professional-grade tools. Pro tools are not obvious. They are not easy to pick up. They are good. Core Data is good. That’s the claim. Not that it’s easy.
> So you’re basically stuck with going the whole hog.
This is true in the sense that it is true for any database. It makes just as much sense to blame SQLite for its failure to interface with your CD models as it does to blame CD for failure to integrate with your SQLite models. I don’t know what this has to do with one or the other.
Actually, that was a complete lie. You can do cross-domain joins with NSIncrementalStore (but this is “advanced”, and might take me an afternoon). So in a very serious sense, CD is better able to do cross-domain queries with an arbitrary alternate database than SQLite is able to do the same with an arbitrary alternate database. The solution isn’t “obvious”, but it’s a lot easier than it would be cross-joining (say) SQLite with MySQL.
thx for the nice article.
i only dismiss CD when i have heavy usage of (INSERT OR REPLACE/UPDATE) and DELETE FROM table;.
– Bi-directional synchronization with servers
– Bi-directional synchronization with peers
How does CoreData solve that? Are you referring to the willSave notifications, which allow you to implement a syncing framework? Or is there built-in functionality you are referring to?
Hi, nice site and articles.
I find myself sitting between the two camps. I’m convinced that I need CD, but it feels difficult. Take Michael’s questions, for example: you claim it is something normal, but for me/us it is hard to figure out the CD way. By the way, I have a sync lib that doesn’t use CD, but it’s old, so I need to step up to CD. Interacting with two different stores in one app also feels like hell to me. Maybe you can write an article about those two topics, sync and two stores.
Thank you
Wonderful article, thanks.
(Brief intro: I’m a long-time FileMaker developer (20+ years), just beginning OS X/iOS development, so I’m definitely thinking beyond my current skills).
The application I’m working on right now seems destined for Core Data – it has a very strong database component. Unfortunately it’s a complex scheme that needs a great deal of flexibility – users will want to rethink their classification and the fields they use frequently as their datasets grow and the addition of custom fields is a necessary feature.
Let’s call my base object a ‘record’.
Ordinarily I’d create a Record class and use an NSMutableDictionary to hold the fields for each record so users can make changes on the fly.
Since CoreData does not allow the editing of attributes at runtime, a ‘record’ entity with attributes (instead of keys) doesn’t fulfill my needs. I could stick ‘userDefinableField1-n’ into the model, but that would just mean that most of my users will wish they had n+1 fields to work with. Also, users may want fields for different purposes, including attributed text, numbers, and images, so providing one potential attribute would not be enough.
At this point, I cannot see a way to implement this aspect of my application using CoreData. The idea of having all my data in one big lump in the database and fetched using predicates makes me shudder (though maybe that’s the way to deal with this?) and the documentation, including Zarra’s book (which I find somewhat problematic), doesn’t give much help with this. At the same time, the model seems almost trivial to design using subclasses of NSObject and I’m still at the time in my programming career where I go <handwave> I’ll solve the problem of performance optimisation later, I just want something that works so I can move on <waves hands again> (this might give you some insight into why people don’t use Core Data).
It does, although it’s not something I would recommend trying as a first project. You’re way outside of xcdatamodeld land at that point.
But then again, I wouldn’t recommend it as your first SQLite / MySQL project either, or for that matter as your first custom database implementation. So we’re at solution parity here.
I think the real problem that you face is that arbitrary schema changes are arbitrarily complicated. How do you write queries for arbitrary schemas? How do you do migrations when the schemas change? There are (hopefully!) simple solutions in your specific case, but there are no solutions in the arbitrary case, and you have presented an arbitrary problem. And whether the solutions to your problems are simple or hard doesn’t have much to do with what storage technology you use, as it’s either simple or it’s hard for all the reasonable contenders.
If you really need user-configured arbitrary schema support, you are reading the wrong article. The real problem you face is designing a user interface that sufficiently empowers mobile users to edit their schemas. To the best of my knowledge, this is a completely unsolved problem, even by companies where databases are the core business and are quite motivated to study it. That’s where the focus should be, and your data layer should be whatever crazy implementation that lets you iterate the UI prototype the fastest. Once you’ve got that solution proven is the time to look at how anything is stored.
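For readers wondering what a quick-to-iterate data layer for user-defined fields might look like, one common shape is an entity-attribute-value table, which is roughly the “one big lump” worried about in the question above. The following is only a sketch in Python's built-in `sqlite3` module; nobody in this thread proposed it, and the table names, function names and sample data are all invented for illustration. It also shows how ad hoc the querying becomes once the schema is arbitrary.

```python
import sqlite3

# One row per (record, field) pair, so users can add fields at any time
# without a schema migration. BLOB affinity stores each value as given.
SCHEMA = """
CREATE TABLE record (id INTEGER PRIMARY KEY);
CREATE TABLE field_value (
    record_id INTEGER NOT NULL REFERENCES record(id),
    name      TEXT    NOT NULL,
    value     BLOB,
    PRIMARY KEY (record_id, name)
);
"""


def open_store():
    """Create an in-memory store with the two tables above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def set_field(conn, record_id, name, value):
    """Add a field to a record, or overwrite it if it already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO field_value (record_id, name, value) VALUES (?, ?, ?)",
        (record_id, name, value),
    )


def new_record(conn, **fields):
    """Insert a record with any set of user-chosen fields and return its id."""
    record_id = conn.execute("INSERT INTO record DEFAULT VALUES").lastrowid
    for name, value in fields.items():
        set_field(conn, record_id, name, value)
    return record_id


def get_record(conn, record_id):
    """Return all fields of one record as a plain dictionary."""
    rows = conn.execute(
        "SELECT name, value FROM field_value WHERE record_id = ?", (record_id,)
    )
    return dict(rows.fetchall())


def find_records(conn, name, value):
    """Return ids of records whose field `name` equals `value`."""
    rows = conn.execute(
        "SELECT record_id FROM field_value WHERE name = ? AND value = ? "
        "ORDER BY record_id",
        (name, value),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = open_store()
    book = new_record(store, title="A Study in Scarlet", year=1887)
    set_field(store, book, "shelf", "B3")  # field added after the fact
    print(get_record(store, book))
    print(find_records(store, "year", 1887))
```

The trade-off is visible even at this size: adding a field is trivial, but every query has to name fields as data, and nothing enforces types or required fields, which is the “arbitrary schemas are arbitrarily complicated” point made above.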