nanomsg postmortem and other stories

08 February 2016 by Drew Crawford Published in: open source, rants 5 comments

Update: In the two years since I wrote this, Garrett has taken over the project and disputes a lot of things in this post. You can read his point of view here

nanomsg was a once-bright alternative to ZeroMQ. The project had a lot going for it:

It was a rewrite by the original author.
It was a rewrite in C, and there were really solid technical arguments at the time why C was the right language
It was MIT-licensed, which was more commercially friendly than LGPL
It had a plugin model, whereas ZeroMQ was vertically integrated
It wanted to become an IETF standard, and spawn other implementations. One implementation is quite well-maintained and will outlive the project.
There are many more goals, which you can read about in the documentation.

Unfortunately, it didn’t work out. The current maintainer resigned, nobody will take his place, the mailing list thread says “Dead?” and has now devolved into incoherent ramblings.

I’m going to write a postmortem about what I learned. Not the postmortem the project deserves, but the one it needs right now.

Who are you and what is your bias?

I’m a user and sometime developer of the project. At my peak I had several thousand customers using my “dark fork” of nanomsg, and rewrote around a third of the project.

I contributed most of my changes initially upstream (I am ranked as contributor #12), but it became clear to me that upstream didn’t like my direction, and I struck out on my own. Therefore one interpretation of this essay is as the revenge of a scorned contributor. There may be some truth in that characterization.

However, I prefer to think that I left the burning house at the right time. In 2014 I excised all nanomsg from my projects, replacing it with my own from-scratch library that I’ve carefully maintained for two years. At first my silence about striking out on my own was to avoid creating drama in nanomsg. But it soon became apparent that working on my own lifted a huge weight off my productivity, and so I just ignored everybody else completely. That’s why when ZeroMQ expressed an interest in working with me, I ignored them too.

So that is how I wrote an entire mini production-class messaging library in secret. And for the reasons in this essay, I am likely to keep it that way.

Politics and technology

While nanomsg’s presentation of the fork rationale was all technical, from ZeroMQ’s perspective, it seemed all political. I reached out to Pieter Hintjens at ZeroMQ to see how he described the events leading up to nanomsg’s creation.

His recollection is essentially:

Martin (who later became the primary nanomsg maintainer) made a series of decisions which Pieter/ZeroMQ felt were “not in the best interests of users”, including releasing several incompatible versions of the software without discussion.
Pieter proposed a new open governance model, in order “to end Martin and Mato’s control over the project.” Crucially, all changes must “get consensus” regardless of their source.
Martin essentially rejected this process, basically arguing that not all opinions are equal.

After that, the schism started.

On the other side of the ring, nanomsg had its 95 theses of technical grievances with ZeroMQ that I summarized at the outset and you can read all of them if you’re so inclined.

So was the situation dominated by politics (as ZeroMQ maintains) or technology (as nanomsg maintains)? Predictably, Pieter/ZeroMQ will tell you it is politics, and Martin/nanomsg still believes it is technology:

I guess it boils down to the difference in personalities, with Pieter, being an extrovert advocating the idea of a software project treated as a social club, place where people with similar interests get together, feel at home, have a good time and eventually do some good work.

Me, being reclusive to the point of misanthropy, I see a software project as purely technical endeavour, an exercise of craftsmanship, without caring too much about its social aspects.

As someone who no longer uses either project, I think both theories of the schism capture some essence of the truth. nanomsg did win (and lose) some technical victories. And ZeroMQ’s community process protected itself from the worst of nanomsg while also insulating itself from the kind of radical breakthroughs (and losses) nanomsg made.

So while the politics vs technology debate is a nice sound bite, and has been discussed by the insiders ad nauseam, I think it oversimplifies a much more complicated situation.

The early years

At inception, nanomsg was composed of essentially two kinds of people:

Martin, who was a demonstrated expert in writing a messaging library
The rest of us in the peanut gallery, who were vaguely annoyed at ZeroMQ for some reason

Taking a look at the commit graph confirms this story. There’s Martin with his 1k+ commits, and then there’s everybody else.

I was one of those “vaguely annoyed” people. Specifically, ZMTP was bloated and performed poorly in my situation, and maintaining my complex patches to a huge C++ codebase was becoming untenable. I figured I could be much more involved at nanomsg than I was under ZeroMQ, and avoid the need to keep my own patches at all.

From a technical perspective, I was about 80% correct. nanomsg was much better suited to my problems than ZeroMQ was. My code was faster, it ran better, and I maintained it more easily. It was night and day.

From the community perspective, I was about 80% wrong. I began to propose far more radical ideas than nanomsg’s “modest” reforms. It did not go well.

Digital dischord

The problem is that when people mutiny, they often mutiny for different reasons. We were all vaguely dissatisfied with ZeroMQ, but we were dissatisfied differently. I had architectural and performance disagreements with ZeroMQ, but others were there for a ZeroMQ clone with more permissive licensing. Still others didn’t want the code at all, but wanted a set of simple open standards that would be ratified by the IETF.

So long as nobody actually admitted why they showed up, everybody could assume that the others were there for the same reasons as them (a.k.a. the false consensus effect). But when we started talking about actual substance, we began to realize that our dislike of ZeroMQ was not the start of our shared worldview, but the end of it. Like the joke where the doctor tells you “if doing that hurts, don’t do it,” we learned not to talk about things that made our real disagreements clear.

In ZeroMQ, there is a shared point of consensus–the ZMTP specification. On several occasions Pieter tried to get us to follow it so we would be back “in communion” with ZeroMQ. Meanwhile, I and at least a few others left ZeroMQ in order to get away from that specification. So obviously nanomsg rejected the idea.

But I think we missed something critical in that decision. We never developed a concrete declaration of what nanomsg was. We had a concrete idea of what it wasn’t, that is, how it differed from ZeroMQ. But the community never really formed a “positive” manifesto of what the project wanted to be when it grew up. That was part of our undoing. Everybody assumed the manifesto was whatever they themselves imagined it to be, and we all imagined something different.

A positive manifesto

Looking back, this is much more obvious than it was at the time. Most of us imagined that ZeroMQ “was” a software project, and so if we explained how our software was different from ZeroMQ software that was a sufficient declaration for everybody to be on the same page. However, what ZeroMQ actually is, is the ZMTP specification, and it houses a collection of separate implementations, one of which is commonly glossed as, but is not, ZeroMQ.

As it turns out, there is no reason we could not have solved nearly the entire list of technical grievances as simply “yet another ZeroMQ implementation.” This is why Pieter kept poking us to come back. The only reasons not to do that are either:

Objections to the ZMTP specification (my motivation)
Objections to the governance model (Martin’s motivation)

We did not openly discuss either of these, or seriously advance an alternate vision for either of them. You will not find any discussion of them in the documentation or even on the mailing list. Now that the project is dead, Martin has a blog post that dances around the governance issue but it is still not a concrete proposal in any way.

We kept talking about things like “threading” and “C++ sucks” and we had serious, substantive conversations about those things. But all of them would have been perfectly solvable inside an alternate ZeroMQ implementation. So those were not, I believe, the real reason for the fork.

I will now examine what we actually did on each of these real rationales for the fork.

Troubled governance

Governance I think we did worst. I do not mean to blame the actual governors themselves (Martin, and later Garrett). But I mean as a community, we never really formed a clear idea of how the project worked. There was never any formal declaration, at all, of who was in charge of the project and how it was governed. We all sort of assumed it was Martin, and that worked fine enough for a while. But when Martin stepped away to work on other things, this power crisis became so substantial that there was briefly a project fork simply so work could continue.

Finally we declared Garrett our benevolent dictator, and things went on just fine, until he stepped away to work on other things and… you see where this is going.

Beyond just who controls the project, there was substantial debate about what that person was empowered to do. Garrett, for example, tried to install a code of conduct on the project, but ultimately decided, due to the critical feedback he received, that he didn’t have that authority:

I could have just enforced my will upon the project, but since the project existed before I came to it, that doesn’t feel right.

Well if he doesn’t have the power, who does? Nobody, it seems. We broke away from ZeroMQ governance, but we never bothered to replace it with anything substantial. With nobody in charge, there is nobody to table discussions, or to say “we’re doing it this way, if you don’t like it, fork.” So the most pernicious debates could never be resolved.

To be clear, I think Garrett did as well as he could under the circumstances. I briefly considered getting more involved around the time he was nominated, but I figured his moderatism would go over better with a fragile community than my radicalism. It turns out even his moderate changes were considered controversial. The project didn’t understand what it lost until it was too late.

Troubled specifications

We did not fare much better when distinguishing ourselves from the ZMTP specification, the core specification document for ZeroMQ.

We did write RFCs, and those may be the only part of the project to live on in any significant way (in Garrett’s mangos). I think the fact that this lives on while the project does not is an important clue of what killed it.

However, nanomsg RFCs are themselves controlled by a variety of design constraints that are a mystery to me still. For example, the design of REQ/REP relies on “Sustrik’s theorem,” which Martin stated once on the mailing lists without defining any of its terms. When I pushed for a definition of “state” I was told “Business state” which was about as clear to me as it is to you.

Several of us did not understand the rationales at all. One contributor wrote to me privately:

With his AMQP and ZeroMQ experience, Martin has a lot of experience with this stuff. He seems to be convinced that nanomsg communication patterns should be stateless (as much as possible), and that any state should be layered on top of it. Which feels plausible to me, at least, though I don’t feel I have enough experience to make deep judgement calls about it.

Everybody just sort of assumed this made sense, which worked, until Martin left, and then it didn’t anymore.

Ultimately the lack of clarity around the real design principles had two consequences:

The project had a critical bus factor, that ultimately played a major role in the governance crisis. To this day, I think nobody really understands many of the internals. There are a lot of bugs in them, that none of us were able to fix.
Most of us did not understand the design principles, and so we backfilled them with whatever idea made sense to us, creating more false consensus and maintaining the illusion that we were all on the same page

If I personally had understood the true project goals, I would have realized I needed to leave much earlier.

An unfriendly community

It is all the rage these days to talk about how programming communities are “toxic” and so on. I will leave that discussion to other people, as I think there are a lot of nuanced positions that get lost in that bikeshed.

I will, however, say one thing. I don’t know what led Garrett to attempt to install a code of conduct, but I do know what part of the community bruised me. The community was quite hostile to taking my patches:

(I’ve already brought up my concerns over the security framework you’ve chosen in past discussions)

…but was also hostile to me using them by myself:

As you’ve made clear that your intent is an incompatible fork regardless (due to silently differing semantics), then there seems to be no help for it.

If you beat me with a stick no matter what I do, you lose the power to influence my behavior at all. nanomsg could have taken a ton of free labor from me; alternatively, it could have played a role in shaping that work into a more palatable form; alternatively, it could have kept me around as an experiment to see how I fared. There are probably more productive options that I’m forgetting. Instead it decided to drive the unbeliever from its midst.

My contributions may not have been critical to the project, but Garrett eventually received similar treatment:

I was trying to do a good thing; instead I got pounded for it. Frankly the negativism that accompanies most OSS projects makes me wonder why we do this at all. After all it is always easier to gripe about who some program is crapware than to do anything constructive. FOSS maintainership is a thankless job. The CoC debacle just underscored that for me

As to being bribed to come back – while nothing is impossible – it seems unlikely to me. The nanomsg code base is pretty unfun for me to work on and it would take a lot to get me interested in it again. The lack of sufficient interested contributors has meant I had to carry too much of this beast – not my baby – on my own shoulders. The thankless nature and sense of inherited responsibility didn’t make it more fun.

Meanwhile the person who is probably best-qualified to lead nanomsg right now never publicly replied to the nomination to lead the project.

So while driving out a “radical” like me was arguably a sensible decision, it didn’t stop there. It seems that anyone doing anything of substance has been driven out of the project by the same forces that prompted my exit two years ago. Now there is nobody left to do the work.

The Rule

This brings me to the ultimate conclusion I draw from this parable. And it has nothing to do with politics, technology, codes of conduct, kindness, or anything else. And like all important lessons, we learned it in kindergarten.

Open source projects hinge entirely on contributors. Without regular patches, the project dies. Or, as someone put it, rather ironically, in the email that drove me out of the project:

A protocol spec only dies when people refuse to work together on it.

This is how nanomsg died: we decided not to work together, and the people doing actual work all went to work separately.

Eventually, it is inevitable for all contributors to fall away. They lose interest, they get involved in another project, they take a new job, they get annoyed at the community.

From this observation I derive the following Rule: the success of a project requires contributors entering the project at least as fast as they leave. As long as contributors are at least “exactly replaced”, the project remains stable. But if fewer contributors enter than leave, the project enters a death spiral as contributions shrink and eventually become zero.
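As a toy illustration of The Rule, the arithmetic can be sketched in a few lines of Python. Everything here is an assumption made up for the example: the function name `contributor_counts`, the fixed yearly join and leave counts, and the starting headcount of 10 are hypothetical, not data from nanomsg or any real project.

```python
def contributor_counts(initial, joins_per_year, leaves_per_year, years):
    """Return the contributor headcount at the start of each year.

    Hypothetical model: a fixed number of people join and a fixed
    number leave every year; the headcount can never go below zero.
    """
    counts = [initial]
    for _ in range(years):
        remaining = max(0, counts[-1] + joins_per_year - leaves_per_year)
        counts.append(remaining)
    return counts


# Contributors are "exactly replaced": the project stays stable.
stable = contributor_counts(initial=10, joins_per_year=3, leaves_per_year=3, years=5)

# Fewer contributors enter than leave: the headcount shrinks to zero.
shrinking = contributor_counts(initial=10, joins_per_year=1, leaves_per_year=3, years=6)

print(stable)     # [10, 10, 10, 10, 10, 10]
print(shrinking)  # [10, 8, 6, 4, 2, 0, 0]
```

The model is deliberately crude. Real entrance and exit rates fluctuate, but the direction of the trend is the part that matters here.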

Technology, politics, social factors, governance, and more all play a role in the rate of attracting new contributors or pushing out existing ones. But they are merely knobs for selecting one kind of person over another. Projects can and do exist without sound technical or social policy. They cannot exist without contributors. So if we have to pick one, the critical thing is making sure the rates meet requirements. All the other factors are implementation details.

nanomsg’s mistake was to build a tent so narrow that nobody can fit under it. It would have been better, I believe, to tolerate anyone who sent patches, even if their ideas seemed strange or wrong, than to tolerate nobody.

Several of us tried to widen the tent, and Garrett in particular deserves a lot of credit for trying very hard, but ultimately the community preferred no patches to debatable ones, and so no patches is what it will get.

Circling back

nanomsg was one of many projects I criticized anonymously in an essay two years ago, conduct unbecoming of a hacker. I avoided naming it because it was not yet clear which way it would go. It is very clear now.

When a patch is proposed to the 90%+ of projects that aren’t super popular, there should be a presumption that the patch should be merged.

nanomsg took exactly the opposite approach, rejecting “patches” even from the person allegedly in charge. This kills the project.

Legacy

nanomsg survives in several key places.

ZeroMQ, obviously, which I think has incorporated at least some of the technical improvements that nanomsg made. It cannot, obviously, take the radical changes, because they break ZMTP.
mangos, which is a Go implementation of nanomsg, and is quite competently maintained
My radical library, which I have learned not to discuss that much to avoid beatings. I shipped 1.x ages ago, and 2.0 is now in an internal beta.
I have a strong suspicion that others involved in nanomsg also took their work underground, based on subtle clues I have seen, but I have no proof.

But perhaps more important than software, are the lessons I learned, because they are cross-platform.

I continue to see projects “drive out the unbeliever from their midst” in the same way nanomsg did. And on a similarly risky foundation: a single maintainer, with no clear plan for what will happen when they lose interest. “We don’t want to go in this direction” says some developer using the royal “we” who may not be around next year. You can probably think of examples.

There is sometimes even sound technical justification. But the technical justification does not actually matter, it turns out. What matters is The Rule.

These days I have seen this movie before. So now I can skip to the end where I leave instead of getting invested.

My own maintenance

I maintain a pretty huge number of projects, and watching nanomsg implode has changed how I view that as well. Whereas before I would approach patches wearing a “technical review” hat, I wear it much less now.

I still do have a conversation about whether something in the patch can be improved. But at the end of the day, I ask myself: is this PR going to improve or harm my compliance with The Rule? In almost every case, it is better to merge even a bad patch than to turn away a contributor for the projects I already struggle to maintain. So I try to get the patches improved, but I merge them even when I can’t. Even bad patches are better than none.

A new trick

In addition to avoiding bad projects, and better maintaining my own, I have learned another trick. These days I am much quicker to gather up people as they are driven out, and see if we can find some way to tolerate each other. And when you accumulate exiled people who can tolerate each other, you can do great work together, even if you don’t agree on everything. It’s something I failed to do with nanomsg, but I’m better at it now.

It turns out that these “rejects” are often very bright, extremely motivated people. It makes sense: an arbitrary project maintainer is merely average, because there is no magic wisdom bestowed upon a person for setting up a GitHub repo. Meanwhile the average person who read (and understood) somebody else’s code well enough to contribute a PR complex enough to be rejected sails over a very high bar. So the average member of that set is very competent.

Due to a variety of factors though, they will often fail to act on their own. Mailing lists are cauldrons of groupthink, and when you hear the same opinions over and over again you start to believe them. Forking the project sounds hostile, and what will happen when you tire of the project and move on to something else? (And that thought alone is typically proof that you understand more than the maintainer.)

Often, all it takes to shake someone is an email. Did you know that I too see a clear need for your patch? Did you know that there is room in the unlimited sea of GitHub for more than one vision for [project]? Did you know that GitHub has a fork button that you can press and nothing bad happens? Did you know that I too will contribute code to this effort? Did you know that several people are disaffected by upstream and we can rally them all together into a united front? Did you know that if we do this, we can unsubscribe from the mailing list, and there will be less drama in your inbox? Did you know that we can all go back to coding instead of endless flamewars?

Now that I know what to look for, I have frequently been in the right place at the right time to shake the right person. When it works, the result ranges from “good” to “oh my god, did we really reimplement all of upstream in a weekend…?” When it doesn’t work, someone thanks me for being the one person who was brave enough to believe in their PR.

I have been fortunate enough to work with many bright people as a result of this trick, and collectively we have done some incredible work. It may be the single most important open source trick I have ever learned. And perhaps, dear reader, this blog post comes to you at the right place and the right time to shake you too.

In conclusion

Unfortunately, this is all too late for nanomsg. I have circled back to those of us who could have chosen differently, and we are all too far along now to choose to work together again. However I think nearly all of us report we’re better “out” than “in”, so there is a silver lining. If there is one thing we’ve learned it’s that forking is not nearly as dangerous as it says in the news.

While it is too late for us, it may not be too late for you. nanomsg’s collapse is an archetype I see everywhere now, and if you know what to look for, you can avoid it happening to you.

If I have learned one thing, it is to always keep my eye on entrance and exit rates. When the numbers are good, you can do all the technical and policy bikeshedding you want, but when the numbers are bad, we have to be mature enough to put those ideas aside in the interest of widening the project’s tent to recruit our replacements. Projects can ignore this for a while, but then people leave, and suddenly the thing falls over.


Comments

qznc

Mon 08th Feb 2016 at 4:29 am
Great and insightful article. Thanks. Especially the remark that a rejected PR might be a badge of honor resonated.

Where I cannot follow is your section on hostility. The quotes do not seem hostile to me. A rejection is not necessarily hostile. Can you clarify that?
Pieter Hintjens

Mon 08th Feb 2016 at 5:06 am
Drew, thanks for writing this. It is valuable to document our failures, to learn from them.
Michael

Mon 08th Feb 2016 at 9:14 am
Good thoughts.

I can tell you I am kindred spirit in the sense that I have lived long enough to know what it is I want in/from a software project and won’t accept second best. The “unbeliever” in your words.

I will also tell you I am a third sort of person. I subscribe to several messaging / bus / networking forums because I have a morbid curiosity about the technology. I have rolled a couple of my own such libraries, or have been involved on projects that included said technology. Having done so would think twice before adopting such a library not least of which since so much of my livelihood potentially depends on it.

I fall somewhere in the middle of purist thinking I believe. Call it a moderate passion for software craftsmanship, which is most definitely people and service minded first followed closely by honing one’s technical skill.

I did interact with Garrett briefly throughout this ordeal. And will be clear that as far as I am concerned no one was twisting anyone’s arm to participate, contribute, much less demonstrate leadership capacity. We’re all adults and know the cost being willing to do so, or not. We can claim we were nominated if elected, but at the end of the day there is a personal responsibility.

Anecdotally, I have similar experience with a project called Automapper. I submitted a PR that would radically change it from being “static” to instance based. The result of which, my work was shelved for months, and when pressing the author whether I should wait for months more, received a b/s response, contrary to the weeks of discussion he and I passed back and forth leading up to that. He knows who he is and what was discussed, the details of which aren’t important here.

I took that to mean, simply, I wasn’t asking his permission whether my contribution was viable. It would serve my needs then, now, in the future. So I forked and forged out on my own with Micromapper. Ultimately I will use my fork, but the library is but a part of a much greater whole, so I care not to get that invested.

The funny part is that Automapper has now chosen to adopt my contribution, in philosophy, if not also partially in implementation. The author did attempt to reach out to me, but having read his retort correctly as a semi-major setback, believe that I made the right decision. It was either that or live with it, which frankly at the end of the day I loathe being that beholden to any man.

I took a moderate approach initially and replaced the static bits with instance friendly ones, but with the express intent to obsolete and eventually remove the static ones entirely. Now that Automapper has moved perhaps Micromapper will as well. A little friendly OSS competition isn’t a bad thing IMO.

Anyway thanks again for the thought provoking blog.
Jeff Ratcliff

Mon 08th Feb 2016 at 12:19 pm
I have no experience in open source or ZeroMQ but your post still resonated with me. It’s taken me many years to learn how to be a good software collaborator and I’m still learning. You’ve taught me a little more. Thanks.
