[Interview] Brian Shirai on Rubinius 2.0, the GIL, and thread-safe Ruby code

Published on March 26, 2013 by Jesse Storimer

Brian Shirai and I recorded a conversation about multi-threaded programming in Ruby and his work on Rubinius. Listen (or read the transcript) to hear about:

  • what the global lock is and how it affects concurrency
  • why the GIL is not a substitute for thread-safety
  • why multi-threading is hard to reason about, and the best approach to doing so
  • the three things that Ruby (as a language) needs going forward, to better support concurrency
  • how they removed the GIL from Rubinius and added tools to make it a better environment for concurrent programs
  • what's coming up in Rubinius 2.0

If this interests you, make sure you check out my upcoming book about multi-threading in Ruby. It will include more interviews like this with folks who have a deep understanding of this topic.


Jesse Storimer: Alright, so I'm here with Brian Shirai. Brian works on the Rubinius implementation of the Ruby language, and he is the creator of the RubySpec project, among other things. So thanks for talking with me today Brian.

Brian Shirai: Hey Jesse, good to be here.

Jesse Storimer: So one of the reasons that I wanted to have a chat with you was because you work on an implementation of Ruby, so you have an inside view of the concurrency story. But before we get into talking about Rubinius, the implementation you work on, I want to talk a bit about MRI, which is the first implementation of Ruby, you might call it the reference implementation, the one that everyone looks to.

MRI and the GIL

Specifically related to concurrency, MRI has a global lock. So can you, from an implementer's perspective, give people an idea of what the global lock is and what it does for the MRI internals?

Brian Shirai: Sure. So there are a lot of things that you could put a global lock around. The global lock would just be a single lock that would allow only a single thread of execution, that thread that held the lock for any particular thing.

The thing that's interesting about the global lock in MRI is that that global lock is around execution of Ruby code. So essentially, no matter how many CPUs or cores you have, there is a single thread of Ruby code execution, such that all your Ruby code is serialized, so that only one thing can happen in Ruby at a time. Now, that doesn't mean that things can't happen simultaneously, because IO can happen, but there's only a single thread of Ruby running at any instant in time.
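A minimal sketch of that last point, assuming MRI-like behavior where a thread that blocks gives up the lock. Here `sleep` stands in for a blocking IO call, and the thread count and durations are arbitrary choices for illustration.

```ruby
# Four threads each "wait on IO" for 0.2 seconds. `sleep` is a stand-in
# for a blocking read or network call (an assumption for illustration).
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 4.times.map do
  Thread.new { sleep 0.2 }
end
threads.each(&:join)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

# The waits overlap, so the total is close to 0.2s rather than 0.8s,
# even if only one thread can run Ruby code at any instant.
puts format("elapsed: %.2fs", elapsed)
```

The overlap applies to waiting, not to computation: replacing `sleep` with a CPU-bound loop would not be expected to show the same speedup under a global lock.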

Jesse Storimer: So does that mean that as a developer, as a user of MRI, the global lock gives me all the same benefits as writing thread-safe code? Because my code won't ever be running in parallel, being accessed by two threads.

Brian Shirai: No. And it's one of the biggest misunderstandings about the global lock, and it's so frustrating because it's absolutely not true. If someone tells you that that's true, they don't know anything about multi-threading. There are people in the Ruby community who continue to assert that MRI's global interpreter lock is somehow good for the correctness of programs and that's absolutely false. It could not be more false.

Concurrency, disentangling concurrency and parallelism: concurrency is about interleavings. You can have concurrency with a single processor. When we talk about interleavings, imagine your program diced up into a bunch of little pieces and then scattered on the table and then put back together in some order; that would be an illustration of an interleaving. And different interleavings have different semantics. Typically, the semantics of your program are not equivalent for all interleavings.

So when you look at something like the global interpreter lock and where the context switches occur, you get into this idea of non-determinism. There are a lot of sources of non-determinism, but basically, non-determinism can be considered a randomness that varies those interleavings. So you have a bunch of different ways your program could execute. Some of those ways have a meaning that is correct, given what you want your program to do. And some of them are incorrect.
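To make the idea of correct and incorrect interleavings concrete, here is a small sketch that enumerates every interleaving of two hypothetical threads, each doing a read step and a write step on a shared counter. No real threads are involved; the schedule is replayed by hand, and the thread names and step model are assumptions for illustration.

```ruby
# Two hypothetical threads, A and B, each run two steps on a shared counter:
# read the current value, then write back that value plus one.
STEPS = { "A" => [:read, :write], "B" => [:read, :write] }.freeze

# Every distinct ordering of A, A, B, B is one interleaving; the steps
# within a single thread always keep their own order.
interleavings = %w[A A B B].permutation.to_a.uniq

# Replays one schedule and returns the final counter value.
def final_count(schedule)
  counter = 0
  seen = {}                 # the value each thread read
  position = Hash.new(0)    # index of the next step for each thread
  schedule.each do |thread|
    step = STEPS[thread][position[thread]]
    position[thread] += 1
    if step == :read
      seen[thread] = counter
    else
      counter = seen[thread] + 1
    end
  end
  counter
end

results = interleavings.map { |schedule| [schedule.join, final_count(schedule)] }
results.each { |name, value| puts "#{name} -> #{value}" }

correct = results.count { |_, value| value == 2 }
puts "#{correct} of #{results.size} interleavings give the intended result"
```

Only the two schedules in which one thread finishes both steps before the other starts (AABB and BBAA) end with a counter of 2; the other four lose an update.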

Unless you have a mechanism for guaranteeing that the correct interleavings are allowed and the incorrect ones are disallowed, your program will not be correct under all circumstances. It may appear to be correct a lot of times that you run it, but that's not a guarantee, and that's just playing Russian roulette.

If that's the level of concern that you have for correctness: shoot, let it run and see if it works, then yeah, maybe the global interpreter lock can help your program's correctness. But no, you take a gamble. If things came out right, you know, phew, you dodged a bullet on that one.

It's just a terrible approach to programming. Absolutely, unequivocally, no, the global interpreter lock does not guarantee semantics. It's there for the semantics of the underlying virtual machine and its data structures. It's not there for your Ruby data structures. If someone wants to present to me a proof that maintaining the invariants of the underlying VM ensures that all possible Ruby programs are correct, I'm happy to entertain that. We'll be waiting an infinite amount of time for that to happen. But no, it's absolutely not true. And we can go into any of those different areas in more detail, but it's absolutely not true.

Understanding thread-safety

Jesse Storimer: Right. So there are two areas in particular that I want to explore in more detail. One is the non-determinism and the other is the data correctness. So the non-determinism essentially comes from... in Ruby the threading that we have in all the Ruby 1.9 versions is native threads.

Brian Shirai: Right.

Jesse Storimer: So, threads provided by the operating system.

Brian Shirai: Right.

Jesse Storimer: And the non-determinism comes from the thread scheduler in the kernel, right?

Brian Shirai: Some.

Jesse Storimer: Which decides when to switch threads?

Brian Shirai: It may. So MRI, the way that MRI implements that lock, it tries to ensure that if a thread could be doing IO that it's not blocked behind another thread that's executing Ruby. There's a lot of machinery in MRI that gets in the way of the OS's native thread scheduler. They're orchestrating a lot of stuff and that's another reason why the global interpreter lock shouldn't, and can't, be relied on to guarantee program semantics. Because they can change where they decide to lock and unlock, and when they decide to let a different thread run, when to let a context switch occur. They decide that in the VM. And they may change their mind; MRI's code is constantly changing.

Jesse Storimer: So it's an implementation detail rather than a feature for users to rely on.

Brian Shirai: Absolutely. That's correct.

Jesse Storimer: Okay. And so we were talking about data correctness. This is one thing that I think a lot of people are confused about. The term "thread-safety", or "data-safety" even, gets thrown around as like a big, bad thing. You don't want your code to not be thread-safe or else bad things will happen but it's not always made clear what those bad things are.

Brian Shirai: Right.

Jesse Storimer: And the worst thing that could happen is that, like you said, your data is incorrect or your data is corrupt. So I want to have a more concrete example of how that might happen, a really simple example of something like trying to increment an integer. How could something like that go wrong when multiple threads are at play and there's no synchronization?

Brian Shirai: Sure. That example is often used where you read a value, and then you add one to it, and then you save it back, and something else may have read the same value that you did. So instead of getting that value plus two, you get that value plus one because two things try to increment it at the same time.

But there are other places where that can be a problem. So, when you add an element to an array, there are really two distinct things. There's where you put the element and then there's the value that says how many elements are in the array. Because those operations are not atomic, if two things are operating on the same data, they might put two different elements in the same spot.

Or they might put the two elements in there but then they have a race, what's called a data race; they race on updating that value for the counter. Maybe they both get the elements in there correctly, so your array has the right number of elements but when they go to increment that counter, that operation we just talked about, they only add one instead of adding two. So now your array says it has five things in it, but there are six elements.
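The first of those failure modes, two elements landing in the same spot, can be sketched by stepping through one bad interleaving by hand. The toy array below (fixed storage plus a separate count) is a hypothetical model for illustration, not how any real Ruby implementation lays out Array.

```ruby
# A toy array: fixed storage plus a separate element count.
storage = Array.new(4)
count = 0

# Appending is really three steps: read count, store element, write count.
# Interleave two appends so that both read the count before either updates it.
index_a = count          # "thread A" reads count (0)
index_b = count          # "thread B" reads count (0) before A has updated it

storage[index_a] = :a    # A stores into slot 0
storage[index_b] = :b    # B stores into slot 0 as well, overwriting :a

count = index_a + 1      # A writes count = 1
count = index_b + 1      # B writes count = 1

p storage  # [:b, nil, nil, nil]
p count    # 1
```

Two appends ran, but only one element survived and the count only moved by one, with no error raised anywhere.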

And you can imagine more complex data structures where these sorts of things can become even more complicated. And what we're basically talking about is a concept called invariants.

The idea is that if you have an array, this structure that has n spots for objects, and then you have a counter that tells you how many are in there, the counter and the number of things in that storage are always supposed to be consistent. If you have five things, the counter should say five. So there are a lot of different places where our programs, to operate correctly, should have invariants like that.

And unless you ensure that those invariants hold, by doing something like an atomic update, where you update both the object in the storage and the count of those things, unless you do those things together so that they can't interleave, back to interleavings, in one of the ways that has incorrect semantics. Unless you ensure that, you don't have correctness.
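One way to keep such an invariant, sketched with Ruby's built-in `Mutex`: the element store and the counter update happen inside a single critical section, so no other append can interleave between them. The `CountedBuffer` class and its method names are hypothetical.

```ruby
# A hypothetical append-only buffer that keeps its storage and its count
# consistent by updating both inside one critical section.
class CountedBuffer
  attr_reader :count

  def initialize
    @storage = []
    @count = 0
    @lock = Mutex.new
  end

  def append(item)
    @lock.synchronize do
      @storage[@count] = item   # place the element
      @count += 1               # update the counter in the same critical section
    end
  end

  # The invariant: the counter matches the storage and no slot was skipped.
  def invariant_holds?
    @lock.synchronize { @storage.size == @count && !@storage.include?(nil) }
  end
end

buffer = CountedBuffer.new
threads = 8.times.map do |t|
  Thread.new { 1_000.times { |i| buffer.append([t, i]) } }
end
threads.each(&:join)

puts buffer.count             # 8000
puts buffer.invariant_holds?  # true
```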

The other problem is that the word "thread-safe" is tossed around, and to me it means a couple of different things. But essentially, it always comes back to this idea of those invariants and certain interleavings resulting in incorrect semantics. We talk about race conditions; we talk about concurrently accessing shared data, these sorts of things.

And I think the thing that's really important to understand is that the semantics for correctness and concurrency do not necessarily require things to happen simultaneously. When we have a multi-core, or a multi-CPU machine, things can happen simultaneously. They can happen essentially at the very same time. But concurrency doesn't require that things happen simultaneously or in parallel, concurrency really is just talking about interleavings.

And so if you just go back to that idea of chopping your program up into a bunch of bits, there are different ways to put that program back together. And I think that it's probably more productive to talk about things like race conditions, or interleavings, invariants, than it is to just lump things into that word "thread-safe", perhaps.

Jesse Storimer: That's a good point. Talking about concurrency versus parallelism: Rob Pike had a really interesting presentation about that. He talked about how concurrency is a way to organize your code which doesn't necessarily imply that it's going to be running in parallel, but concurrent code easily enables parallel code. It's just an interesting way to think about it.

Brian Shirai: One point I would make about that is that technology is changing all the time. The machines that we're working on, they get faster and faster. Clear back in the 80's people were talking about the fact that we were going to have to address this issue of concurrency with multiple cores or multiple CPUs. And it's taken a long time to get critical, where you essentially have to or you're going to be wasting resources.

So understanding concurrency is really valuable at the point where hardware makes it possible to do more parallel stuff. If your code is correct from a concurrent standpoint, then that parallelism should be possible. And so it is interesting to keep those two separate, but understand that the benefits from parallelism are something that, at one point, we couldn't attain easily. We had uniprocessor machines with a single core and yet we could still write correct concurrent programs. Now that we're at the point where we've got the hardware, we can start taking advantage of it.

Writing 'correct' thread-safe programs

Jesse Storimer: Good point. So I want to go back and talk about interleavings, because that's an interesting topic. So the idea that... actually before we talk about that you were talking about data safety there for a minute, or rather having the correct data. You gave the example of an array with the correct number of elements but the incorrect value for the number of elements that are stored there. That's something that I think is a scary thing because your program hasn't crashed and you don't have any notification that something is wrong, but the underlying information about your program is incorrect. And it's incorrect going forward into other operations.

Brian Shirai: Right.

Jesse Storimer: And that's really the worst that could happen, is just what I wanted to bring up.

Brian Shirai: Yeah, well the other thing to note about that is: correctness is a difficult property. A lot of times we get into these discussions of statically-typed versus dynamically-typed languages. There are assertions that statically-typed languages give you more guarantees for correctness than dynamically-typed languages. But the fact is that, even using pretty powerful languages, verified systems are still few and far between. It takes a tremendous amount of effort to verify something like a garbage collector.

They've actually done this in cases, but correctness is a huge challenge in software, a tremendous challenge, regardless of what particular paradigm you're programming in. And so when you get into things where non-determinism can have such a big impact, and we have machines that are inherently non-deterministic with multiple cores and IO events and connections to the Internet and stuff like that, all these different sources of randomness in the system, and testing being extremely hard, correctness is a huge challenge.

So being really clear about what's correct and what isn't is important. We're battling something that is non-trivial. We have no good tools that are generally applicable right now. We have to reason about it, we have to use our brains, we have to architect systems in a way that makes it easier to produce correct systems and harder to produce incorrect systems.

Jesse Storimer: So that's interesting, on a project like Rubinius, I'm sure you guys have thought about the fact that you're running in a multi-threaded environment. How do you ensure that the internals are correct all the time besides just having a human look at the output and try to understand what it should be?

Brian Shirai: Right. It's really hard.

We don't have a good answer. It's a great thing to actually chat about.

The Rubinius virtual machine has a number of threads that it uses behind the scenes. We call them auxiliary threads. They do things like the just-in-time compiler. It is on its own thread with very few interactions with the mutator threads, the threads that are actually changing objects in your store, or in your heap, or in your object graph. We have a signal handler thread because signals, and processes, and threads are just a horrible mix of things.

Jesse Storimer: Yeah.

Brian Shirai: We have a garbage collector finalizer thread and people can add a finalizer to a Ruby object that runs Ruby code. It expects to operate on that sub-graph rooted in that object. So we have to maintain, essentially, generations of objects to run the finalizer. So we have a lot of machinery going on underneath the covers. When you hit an exec(2) call, like if you do system(), on OS X - at least up to Snow Leopard for sure - you cannot exec with multiple threads. We have to essentially kill off all the threads...

Jesse Storimer: Wow.

Brian Shirai: ...maintain their state so we can resurrect them, exec(2), and then come back from that system call, set the state back up and continue going forward. fork(2) is another one. There's a reason that the JVM doesn't particularly support fork(2) and semantics across fork(2). We work really hard to do that but it's a lot to do.

So how do we guarantee that correctness? Well, literally, we go through the code repeatedly and try to understand explicitly those interleavings that can result in incorrect things. Like a lock not being re-initialized after a fork(2) causing a locked state to be inconsistent with underlying data so that it just hangs forever trying to acquire that lock. Because the underlying data says "I should lock this before I access it", but the lock's already locked so it tries to lock, and it hangs waiting for someone to unlock that lock and that will never happen.

One of the things that I hope to do going forward is reduce the potential for those sorts of incorrect deadlocks and livelocks by using things like lock-free concurrent structures, or wait-free. The idea, for a wait-free structure, is that any particular process, or in this case, thread, will complete in a finite number of steps regardless of how quickly other threads make progress.

Jesse Storimer: Okay.

Brian Shirai: There's some amazing research done in the field of concurrency. And there are a lot of options available. It's not just mutexes and semaphores and hoping and just adding more and more locks. That sort of thing.

Jesse Storimer: Right. So you mean that there's lots available in the academic world, and maybe in a lower-level language, but in Ruby the primitives we have are basically just mutexes at this point, right?

Brian Shirai: Right. Right.

Jesse Storimer: Okay. I just wanted to double-check.

Brian Shirai: Yeah, but it's a very interesting point. I mean, we're trying to provide a good platform for concurrency while ourselves struggling with the relatively primitive nature of the tools that we have available right now.

Jesse Storimer: Yeah. Okay. Good tangent there.

Brian Shirai: Yeah I can take you on a lot of tangents. But I should have warned you at the outset that I...

Jesse Storimer: I knew there was a reason why I wanted to talk with you.

So I want to get back to talking about interleavings for a minute, because we've been going over this a little bit, but just want to make it clear.

The idea you were getting at with interleavings was that sometimes the way that the pieces of your program, when made concurrent, are scheduled in a certain order that gives you the correct result, or sometimes you'll get the incorrect result. Sometimes there are 99 permutations that are correct and one that's incorrect. Or something equally rare, such that it looks to be thread-safe to the human eye, and the results always look good, until you get really heavy load or something.

Brian Shirai: Right.

Jesse Storimer: And then all of a sudden problems start cropping up.

Brian Shirai: Right.

Jesse Storimer: So I just wanted to acknowledge that as a property of these interleavings you were talking about.

Brian Shirai: Yes, that's absolutely true. And the point to make about them is that... what you just said I'll just reiterate: they can look correct, and they can perform correctly most of the time, until something changes the assumptions. Something changes the non-determinism under which the program, or the system, was operating most of the time. You hit that behavior that's incorrect or essentially undefined for what you are trying to do with the system. And those things are really frustrating to work out and figure out.

Jesse Storimer: Yeah, certainly. So you were saying on Rubinius, there are no tools to help you. You just start to recognize the patterns of things that look like they might cause issues, and you really just work out the logic and make sure it's as correct as it can be.

Brian Shirai: Right. A lot of it has to do with just being very explicit. We're very comfortable with conditionals in Ruby and in a lot of other languages. We're very comfortable saying if some value, then do this, else do that. It's just a logic.

And you can do the same thing with interleavings and program state. You can say: if the state is this, then we're here, else we're here. And it's applying those same things; it's just trying to be very explicit. It's hard to look at one piece of code and think about "n" different threads of execution being in that code at the same time.

Jesse Storimer: Right.

Brian Shirai: So at each point, you have to be really clear that you understand where those lines are drawn. So if we imagine just chopping up a program into a bunch of pieces, understanding where those are. A lot of times we make assumptions about operations in Ruby being atomic, when absolutely they are not. And they only look atomic because we have a pretty high-level view. So, incrementing a value, or something like that, looks atomic because we can't do anything in between that operation, but a lot of stuff is going on that is definitely not atomic. If two threads are running that code simultaneously you could certainly get different interleavings.
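As an illustration of an operation that reads as atomic but is not, the sketch below treats conditional assignment as what it is: a check followed by an assignment. Two `Queue` objects force one particular interleaving so the outcome is repeatable; in real code the scheduler picks the interleaving. The variable names are hypothetical.

```ruby
require "thread"

# `value ||= x` reads as one operation but is a check followed by an assignment.
shared = { value: nil }
initialized_by = []          # which threads ran their initialization, in order
a_checked = Queue.new
b_finished = Queue.new

a = Thread.new do
  if shared[:value].nil?     # check: A sees nil
    a_checked << true        # let B run now (stand-in for a context switch)
    b_finished.pop           # A stays paused until B is done
    shared[:value] = :from_a # act: A assigns based on a stale check
    initialized_by << :a
  end
end

b = Thread.new do
  a_checked.pop              # B runs only after A's check
  shared[:value] ||= :from_b # B also sees nil and assigns
  initialized_by << :b
  b_finished << true
end

[a, b].each(&:join)

p initialized_by   # [:b, :a]
p shared[:value]   # :from_a
```

Both threads concluded that they were the one to initialize the value, and B's result was silently overwritten.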

Jesse Storimer: So you kind of have to work out the code, not as it's written in Ruby necessarily, but as a set of underlying, basic operations that are, presumably, themselves atomic. Break it down to that level.

Brian Shirai: And it's just, sort of, a stack of those things. Because then you get down to the VM and you have to do the same thing again with, in our case, C++ code. But then you get underneath that into how the compiler has ordered instructions and then you get down to the CPU. Does the CPU guarantee that this thing... I mean, we can very easily operate with 64-bit values on a 32-bit machine, right?

Jesse Storimer: Mm-hmm.

Brian Shirai: Or, the CPU is 64-bit essentially and the operating system is 32-bit but you have easy access to a long int, or something like that, or a pointer that could be 64 bits, and you do some operation that is not atomic. You're used to that being atomic, but actually the CPU has to do that in two, sort of, steps. And so you can get different interleavings there. I'm not an expert C programmer either, but there are certainly people out there that can give you some pretty crazy stories of what happens when you make those assumptions. And it's easy to do. It's really easy to get comfortable thinking about things happening a certain way and make a mistake.

Jesse Storimer: Yeah, certainly. I think, in terms of the idea of getting comfortable with things working a certain way, this is exactly how most people feel about the GIL in MRI. That it works a certain way most of the time, and it's been that way for years with MRI. People have a certain assumption that gets passed on, even non-verbally, to new developers and people new to the community, that this is OK, or you don't have to think about this when you're doing Ruby, which is not always the case. Or certainly not the case as you have said.

Brian Shirai: Right.

The latest and greatest fads in concurrency

Jesse Storimer: So one more thing before we start talking more about Rubinius, which is: I wanted to ask you about a tweet you posted last week. I'm just going to read it out here because I think it's pretty relevant. So it's

"The reluctance of people to learn the semantics of threads, but their willingness to suffer all other matters of insanity is baffling to me."

And particularly somebody replied and said something like "Oh yes, this describes all evented programming." And I wanted to know if that's what you were getting at? By saying that? That people avoid threads so much that they would rather re-write all of their code using EventMachine, or some other evented system, so that it's single-threaded and "easier".

Brian Shirai: It's certainly related to that. I've sort of had this challenge with node.js for a while now. I gave a presentation about "Is node.js better?" trying to look at: is that technology something that's going to save us from headaches or is it just pushing around the complexity and putting it in different places. So certainly, evented versus threaded can be one of the ways people try to divide things up to make sense of them, but also with processes. The sorts of things that we've done in Ruby, like God and Monit, I think both of those tools came out from having to monitor multiple Ruby worker processes and do that well.

We've got whole things like Phusion Passenger and Unicorn that put a lot of effort into that.

There are a lot of things that people do to avoid having to deal with the reality that CPUs can do threads pretty well. Threads, or lightweight processes, however you want to look at them; they can do those things pretty well. All the other more fancy concurrency models, they don't do so well, or don't do yet. Or people have high hopes for, I don't know, software transactional memory or something like that.

Jesse Storimer: Oh okay. Yeah.

Brian Shirai: Things that they think are going to save it, but it's not. I think one of the fundamental ideas that people fail to grasp is that concurrency is not something you just use everywhere. You use concurrent stuff where you need it, not everywhere. So you don't, just generally, have a whole program that uses STM. You need to use it appropriately. It's like you wouldn't use an array everywhere in your program; there are different data structures that are appropriate for different uses.

The idea that you're going to use STM everywhere doesn't make any sense. And it's easy to understand why. Because if you have something that does IO for instance, like that fabled "fire the missiles", you can't roll back firing the missiles in that transaction right?

Jesse Storimer: Right.

Brian Shirai: There's no way to do that. So you can't just throw STM at it and suddenly all your concurrency problems are fixed. So, yeah, it's really a thing. And the thing that I wonder most about is why have we convinced ourselves that multi-threading is so hard, therefore we don't try to teach multi-threading? Are we going to end up with 500 different metaphors for multi-threading like we have for monads?
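The point about transactions and IO can be sketched with a toy rollback helper. This is not a real STM library: `atomically` is a hypothetical helper that snapshots in-memory state and restores it if the block raises, and the `launched` array stands in for an irreversible side effect such as IO.

```ruby
# A toy "transaction" (hypothetical helper, not a real STM implementation):
# it snapshots in-memory state and restores it if the block raises.
def atomically(state)
  snapshot = state.dup
  yield state
rescue StandardError
  state.replace(snapshot)   # in-memory changes can be undone
  :rolled_back
end

account = { balance: 100 }
launched = []               # stands in for an irreversible IO side effect

outcome = atomically(account) do |acct|
  acct[:balance] -= 150
  launched << :missile      # the side effect happens immediately
  raise "insufficient funds" if acct[:balance] < 0
end

p outcome             # :rolled_back
p account[:balance]   # 100
p launched            # [:missile]
```

The balance is restored, but nothing in the helper can take back what was appended to `launched`.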

Are we going to just give over multi-threading to the experts and, sort of, worship up the tower of the blessed few that can understand that? Or are we going to simply say: it's hard to reason about things that happen in a bunch of different ways, but let's figure out tools, language constructs, ways to teach it...

Jesse Storimer: Abstractions.

Brian Shirai: ...abstractions, and address the issue.

Jesse Storimer: Yeah. Great point. An example closer to multi-threading would be using mutexes or synchronizing stuff. People have the same argument: "Oh I don't want to use mutexes. I don't want to synchronize access to my variables because that's slow." But you also don't use that everywhere. You don't litter your program so that everything locks a mutex; of course that's slow. But there are key places where that needs to be done, and then you organize your code in a certain way so it doesn't need to be done everywhere.
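A minimal sketch of organizing code so that locking happens only at a key place: each thread counts words into a private hash and takes the lock once, to merge its result into the shared total. The word list and variable names are made up for illustration.

```ruby
words = %w[lock thread lock mutex thread lock queue mutex]
slices = words.each_slice(2).to_a

totals = Hash.new(0)
totals_lock = Mutex.new

threads = slices.map do |slice|
  Thread.new do
    # Private to this thread, so no lock is needed while counting.
    local = Hash.new(0)
    slice.each { |word| local[word] += 1 }

    # Shared state: lock once per thread, not once per word.
    totals_lock.synchronize do
      local.each { |word, n| totals[word] += n }
    end
  end
end
threads.each(&:join)

puts totals["lock"]    # 3
puts totals["thread"]  # 2
```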

Brian Shirai: Absolutely. Right.

Removing the GIL from Rubinius

Jesse Storimer: Okay. So that's interesting. I want to get into talking more about Rubinius because I know that you guys have done a lot of work there to make Rubinius really embrace multi-threading in a way that MRI doesn't. You guys have encountered a lot of challenges, I'm sure. So the first thing I want to talk about is that... I know that Rubinius, the 1.0 branch and its releases, it had a global lock, right? Similar to MRI?

Brian Shirai: Right.

Jesse Storimer: A global lock around executing Ruby code. But the current release, the 2.0 release, there's a release candidate out, has removed the global lock to enable real concurrent execution, or sorry, real parallel execution of Ruby code.

Brian Shirai: Right.

Jesse Storimer: So can you talk about what kind of things you guys had to change, at least at a high-level, in the interpreter to allow this?

Brian Shirai: That's a great question, and it's a difficult one to answer because we worked very hard on architecture to support the garbage collector, the virtual machine, the just-in-time compiler, and concurrency.

So it's of note that Evan once attempted to make MRI sort of thread-safe with a multi-thousand-line patch. Not really feeling that he had accomplished that, he gave up and started Rubinius.

So from the very beginning, Evan's goal was to have a system to run Ruby that would have better concurrency. But there are other things that we wanted to fix, like the virtual machine. When Evan started, 1.8 still predominantly used a very slow way of executing Ruby: an AST walking interpreter essentially. We had a bytecode virtual machine. Then their garbage collector is what's called a conservative garbage collector: it scans the execution stack, essentially looking at things that could be references to memory and doesn't know for sure if they are.

By contrast, Rubinius has a precise garbage collector. Everywhere where you can stash a reference to an object, we know that that is a reference to an object and so we can be sure that if we're looking at one of those things that it is a reference to an object. And if we scan the heap and we don't see a pointer to a region of memory then we'll know that nothing references that memory.

So the garbage collector is actually really important to understand from a concurrency perspective because... well, there's a couple parts of it too. Also, our garbage collector can move objects. So a lot of MRI C code assumes that once you have a handle to an object, that value, that is the pointer to the object, never changes, which leads to all kinds of crazy behavior when we try to run those C extensions under Rubinius where we have a moving garbage collector.

So you need to know for sure where those object references are so that if you were to access them concurrently you can properly lock around that if you needed to. So the garbage collector is implicated in it. The just-in-time compiler is going to convert stuff that the virtual machine is executing into code that's going to run on the native processor. It's converted into machine code, so the just-in-time compiler has to be sure that it's generating machine code that has consistent semantics with what the virtual machine would do.

So we had native threads, we had the global interpreter lock, and the garbage collector and we worked, just improving a lot of elements of the architecture. We had C code to begin with. Well, Evan did it in Ruby actually way back when. His first prototype essentially was Ruby, and that was pretty interesting. You can find remnants of that in the repository if you go way back to the beginning. But we moved from C to C++ and improved a lot of the elements of the architecture.

We also moved from a stackless architecture to one that uses the C stack, essentially. So as you make calls to Ruby methods the C stack gets deeper and deeper, under the way it is right now. Because when you call a Ruby method, that basically puts another frame on the C call stack because we're calling into another C function, so we get deeper and deeper in the stack.

In stackless, essentially, to call, you would say "I want to call this", and you would put that on a data structure and then that would exit and you would go back to this. It would say "is there something to call?" and you would say "yes", and it would go get that. It would call it and execute it and when that one calls it would put it on the stack and exit. So the stack never got deeper and there's this word that's really weird, because it's basically stackless for that type of thing, or a spaghetti stack structure, and then what's the inverse of that? Stackful? It's not really stackful so I don't know even what you'd call it. But we made that change as well.
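The stackless calling style described here resembles what is often called a trampoline. The sketch below is only an analogy written in Ruby, not the Rubinius design: a "method" returns a `Proc` describing the next call instead of making it, and a driver loop keeps asking whether there is something to call. The names `count_up` and `trampoline` are hypothetical.

```ruby
# A "method" never calls the next method itself. It returns a Proc
# describing the call and then exits.
def count_up(remaining, total)
  return total if remaining.zero?
  -> { count_up(remaining - 1, total + 1) }
end

# The driver loop asks "is there something to call?" and, if so, calls it.
def trampoline(step)
  step = step.call while step.is_a?(Proc)
  step
end

result = trampoline(-> { count_up(100_000, 0) })
puts result  # 100000
```

Because every step returns to the driver before the next one begins, the native stack stays at a constant depth, whereas writing the same thing as direct recursion would add a frame per call.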

Jesse Storimer: So this is all going from the version 1.x to 2.x?

Brian Shirai: This is the whole life of the project. The main thing that happened in 1.x to 2.x is removing the GIL and adding the encodings and other 1.9 stuff. Over a long period of time we just kept improving the architecture of Rubinius. And at the point that... because we knew we had this goal of better concurrency, but at some point I just said to Evan, I was like "Look, I think we really need to take out the global interpreter lock, and get that stuff into master so that people are using it and just figure out the bugs".

So Evan spent about two weeks removing the single lock and adding locks around little data structures in the virtual machine so that it was correct. In the end we've added a few more, but that one global lock turned into about 50 finer-grained locks. Somewhere around there.
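As a rough picture of what "one global lock turned into finer-grained locks" means, here is a toy sketch in Ruby. The classes `CoarseVM` and `FineVM`, their two internal tables, and the helper `intern_from_threads` are all hypothetical; the real locks live inside the Rubinius virtual machine and are not shown in this conversation.

```ruby
# Illustration only: a toy "VM" with two internal tables. All names are hypothetical.

# One global lock: a thread touching either table excludes every other thread.
class CoarseVM
  def initialize
    @global_lock = Mutex.new
    @symbols = {}
    @method_cache = {}
  end

  def intern(name)
    @global_lock.synchronize { @symbols[name] ||= @symbols.size }
  end

  def cache_method(key, value)
    @global_lock.synchronize { @method_cache[key] = value }
  end
end

# Finer-grained locks: each structure has its own lock, so a thread interning
# a symbol does not block a thread updating the method cache.
class FineVM
  def initialize
    @symbols_lock = Mutex.new
    @cache_lock = Mutex.new
    @symbols = {}
    @method_cache = {}
  end

  def intern(name)
    @symbols_lock.synchronize { @symbols[name] ||= @symbols.size }
  end

  def cache_method(key, value)
    @cache_lock.synchronize { @method_cache[key] = value }
  end

  def symbol_count
    @symbols_lock.synchronize { @symbols.size }
  end
end

# Several threads intern the same names at once; each name ends up stored once.
def intern_from_threads(vm, thread_count, names)
  thread_count.times.map {
    Thread.new { names.each { |name| vm.intern(name) } }
  }.each(&:join)
  vm
end

puts intern_from_threads(FineVM.new, 8, %w[a b c]).symbol_count
```

Both versions keep each table consistent. The difference is only in how much unrelated work has to wait on the same lock.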

Jesse Storimer: Okay.

Brian Shirai: It was a few weeks of work for Evan. Granted, Evan knows the system inside and out, architected it, programmed much of it, maintained it, these sort of things. So it's not that it was a trivial amount of work, but for someone who understood the system, and based on all the stuff that we built up in architecture to support this, it wasn't a huge undertaking.

So we got that merged in and it actually lived on another branch for a little while. We basically put it in with the new 1.9 stuff, just put it all together and pushed it in there. So the big change of going from a GIL to no GIL was an initial investment of a couple weeks' worth of work and a bunch of fixing bugs and continuing improving the architecture. But a lot of that was based on building up to that point where we could remove the global interpreter lock.

And the point I would make about that is: the architectural decisions that we were making as we were improving that, were improving things across the board. So it was improving the speed of the virtual machine, we were making garbage collector improvements, we were improving the implementation of our C API so we could run MRI C extensions, improving the way the JIT compiler thread was working, and then ultimately removing the global lock.

Jesse Storimer: Cool. So you have these finer-grained locks around the internals to preserve their correctness.

Multi-threading and C extensions

I want to ask more about the C API because as far as I understand it, the MRI C extension API is one of the main reasons why it has the global interpreter lock. Because the C extensions essentially run independently, unaware of each other, but they all access memory and they're writing into memory without any kind of synchronization, which can very easily lead to incorrect code or incorrect results. So do you guys do any kind of locking around C extensions?

Brian Shirai: Well, we used to. We had another global lock around C extensions.

Jesse Storimer: Okay.

Brian Shirai: And we decided to disable that by default. The reason is because we feel like it's better to figure out which things are not thread-safe and work with C extension authors to improve those, than just continue to suffer the performance problem of that global lock. We still have to be careful, right?

I think one thing that I've been trying to communicate more is that the C API is not an API. It's an ad-hoc collection of C functions that implement MRI. And because of the way object files are created and loaded, if you have a way to say: this is the address in memory that that thing starts, you can point the CPU's instruction pointer at it, and you can start executing, right?

Jesse Storimer: Right.

Brian Shirai: So MRI, as a collection of this bag of C functions that implement stuff, is not thread-safe. And so, of course C extensions that call into those functions are not, by themselves, going to be thread-safe. But we implement what we call the C API, we implement that compatibility layer in terms of the internals in our project, and so what we have to ensure is that if you call one of those functions, the things it does are going to be consistent.

Then that pushes that idea of thread-safety right up to the boundary of the C extension. So if the C extension itself is operating with its own native libraries, or however they implement stuff in a way that's not thread-safe, we hope to fix that.

Since we've disabled the global lock around the C extensions, around the C API, we haven't had major issues. It will be interesting to see if that's because we're lucky. People are running postgres as a database, things are working if they do things correctly, respecting the semantics of the postgres connection, or whatever those things are. We have people using that and it's working fine.

I think there's a lot of reasons given for why MRI can't remove the global interpreter lock. That one that we talked about way back when, that it somehow leads to program correctness, is absolutely untrue.

So taking that one completely off the table, the rest of them basically come down to this: there are tens of thousands of lines of C code in MRI. Every one of those lines of C code needs to be audited. Considerations about how it interacts with the garbage collector have to be made. Essentially that's what Evan tried to do and gave up and started Rubinius.

So I think the biggest reason is simply that it's too much work to do with MRI. If you're going to do it, you're probably going to build a new system. I think that's just the reality of it. So, yeah, it's not just that there are C extensions that couldn't be thread-safe or something; it's a bigger problem than that.

Jesse Storimer: Even taking it back from technology for a minute, Evan is someone who started the Rubinius project and was obviously committed to providing a good concurrency environment. I remember at this year's Rubyconf, people were asking Matz about his plans for concurrency in MRI or moving the GIL. He doesn't seem very interested in that problem. And so he's putting his focus, and his team's focus, in other areas. There certainly is a technical debt issue, but there's also no passionate leader to lead that project on the MRI team.

Brian Shirai: Right. I think it's interesting. In preparation for our chat I was actually Googling some stuff and I came across a blog post by, oh man, let me find this real quick because I always have trouble with his name. Ilya Grigorik, I think, so he's at Google now, but he wrote a post back on December 2nd, 2010 and he's talking about concurrency when Go was just coming out. We had goroutines and then there was more talk of actors. Of course Scala has an actor framework.

Rubinius has had actors in it since like 2007, not to get too distracted. But the thing that he quoted in this first paragraph in his blog post, he said that Bruce Tate, in his book "Seven Languages in Seven Weeks", asked Matz in an interview what feature he would like to change in Ruby if he could go back in time. According to this the quote is

"I would remove the thread and add actors or some other more advanced concurrency features"

If you just take that on the surface, you're like "Oh, okay, Matz doesn't really like threads. Yeah, but having no threads is painful, so some other concurrency mechanism would have been nice". But guess what? Every one of those other concurrency mechanisms requires that you make a lot of really fundamental architectural choices that support that. Otherwise, you just end up with actors that have a GIL around them. And only one object can be added to a mailbox at a time, or something like that, right?

So, why I think that was important to me, is that there's such a misunderstanding of concurrency that people would accept that argument without questioning: how would you have done those other more advanced concurrency features correctly when you didn't get threads correct? And maybe they would have, but I don't think that's the case because it's not that Ruby programs can't use threads correctly, it's that MRI can't run them in parallel. They're always serialized.
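A small sketch of what "using threads correctly" looks like in Ruby code, independent of whether the implementation has a global lock. `@count += 1` is a read followed by a write, and a lock inside the interpreter makes no promise that another thread cannot run in between, so the program supplies its own Mutex. `SafeCounter` and `count_with_threads` are hypothetical names for this illustration.

```ruby
# Illustration only: the program, not the interpreter, guards its own state.
class SafeCounter
  def initialize
    @lock = Mutex.new
    @count = 0
  end

  # The read-modify-write happens entirely inside the lock.
  def increment
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end

def count_with_threads(thread_count, increments_each)
  counter = SafeCounter.new
  thread_count.times.map {
    Thread.new { increments_each.times { counter.increment } }
  }.each(&:join)
  counter.value
end

puts count_with_threads(10, 1_000)
```

Written this way, the result is the same whether the threads are serialized by a GIL or run in parallel on separate cores.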

So, I think that's interesting to consider. Matz is not interested in concurrency. And I had a conversation with Koichi once where he even said: Matz would consider removing Thread altogether. That to me is pretty odd because processors are going more and more multi-core and I think the day where we have eight cores in our phones is not that far off, right?

Jesse Storimer: Yup.

Brian Shirai: And memory is still finite. Even if you're really efficient, if you multiply something by eight, it's eight times bigger, right? If you have one process, and you have to multiply it by eight to use those eight cores, it just got bigger. As the number of cores increases, that graph, the slope on that line where the amount of memory you use as you start increasing processes, that slope is so steep that it's just crazy. You can't scale processes. Even if you try to be super-efficient on memory. It's not going to work.
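The arithmetic behind that slope can be written out. The numbers below (200 MB per process, 10 MB of extra memory per thread) are made up purely to show the shape of the growth; they are not measurements of MRI, Rubinius, or any real application, and the method names are hypothetical.

```ruby
# Hypothetical numbers, chosen only to show the shape of the growth.

# One full process per core: memory grows with the whole process size.
def process_model_mb(cores, per_process_mb)
  cores * per_process_mb
end

# One process with one thread per core: only the per-thread part is multiplied.
def thread_model_mb(cores, per_process_mb, per_thread_mb)
  per_process_mb + cores * per_thread_mb
end

[1, 2, 4, 8].each do |cores|
  puts "#{cores} cores: #{process_model_mb(cores, 200)} MB as processes, " \
       "#{thread_model_mb(cores, 200, 10)} MB as threads"
end
```

Under these assumed figures, eight cores cost 1600 MB with processes and 280 MB with threads. Real systems can share some memory between forked processes, so the actual gap depends on the workload.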

Jesse Storimer: When MRI first arrived in the 90s, processes was the model that made the most sense for a number of years. But in the 2000s when multi-core really started to come up, MRI didn't change its game, and it still isn't changing its game going forward. It's still advocating processes as the way to do concurrency with Ruby, whereas other implementations are taking the lead it seems.

Brian Shirai: Yeah, well it's an essential problem to address, right? Because we can't utilize the hardware that's coming any other way. So if we don't get that underlying, the very low level way the CPU is going to work with chunks of memory, and threads of execution, if we don't get that correct, then we're not going to be able to utilize that hardware.

And I think that as we touched on before, this idea that people will go to crazy lengths to avoid threads, Node.js is going to end up in the exact same thing. The reason that I can say that without reservation is because one of the proponents of Node is tweeting about "who really needs multiple threads?" "What can't you do with processes?" And that's the last six years of hard work on Rubinius, just sort of thumbing your nose at that. We know this can't scale this way, that's why we've worked so hard, and here you are asking the same question, just with a different technology and essentially the same sorts of constraints.

JavaScript as a language doesn't have the concept of threads. So, to use the language, you can't really use threads. But Intel has this project called River Trail that's trying to bring concurrency to JavaScript in a way that will conform to the language and still utilize multi-cores.

Jesse Storimer: What's the name of the project again?

Brian Shirai: It's called River Trail.

Jesse Storimer: River Trail. Okay. I hadn't heard of that one.

Brian Shirai: It's an Intel project headed up by a guy... one of the guys that's in charge of that is a dude that knows a lot of stuff about concurrency and garbage collection and stuff like that. I actually chatted with him at JSConf in 2012. But there are definitely people looking at, even JavaScript, and how to capitalize on multi-core and still stay consistent with the language. It's just essential. It must be addressed.

Jesse Storimer: So I want to talk about some of the mechanisms that Rubinius provides that other implementations don't have.

What can Ruby (as a language) do to improve its concurrency story?

But since we're already here let's first talk about... OK, so Matz might actually be interested in dropping threads from MRI. I know that something you're passionate about is... the Ruby design process is what I'm thinking of. Making sense of how to decide what's best for Ruby's future.

I don't want to get too much into that topic, because I know it's also a big topic. I just want to talk more about... presuming that Ruby's future was a more open process, what kind of things do you think would benefit the community? Instead of removing Thread, adding more constructs to the core of the language that would facilitate concurrency? What kind of things are we missing that other languages already have?

And I want to emphasize things that are proven and dependable because I know there's a lot of new hot stuff that isn't necessarily in that category.

Brian Shirai: Absolutely. So I think a couple of things are really important. One is: we need a memory model. Essentially, we need to understand: when different threads make changes to memory, what is visible? What guarantees do we have about what is visible? So I think that's very important. And I think that there are people that question whether that's something we can do with all these different implementations.

But I think that we can by being very careful about what we describe as those semantics. I would think that the GIL implementation of threads would be a subset of that, so that things that are correct in that would be correct in the other implementations without the GIL. But it's something that people should be working on. It's something that, if we want to see Ruby continue to be a unified language, then we should work on that.

Jesse Storimer: That's something that the language implementers would figure out and implement, and then as a user of the language: if my threaded program is correct on one implementation it should behave the same on another implementation, right?

Brian Shirai: That's the idea behind it, right.

Jesse Storimer: Which is not guaranteed today.

Brian Shirai: It's really not, as much as we try. Even testing thread behavior, if you look at RubySpec, there's a need to really clean up the specs. There are specs that assert that there's a certain behavior when you kill a sleeping, dying thread. Or a dying, sleeping thread returns a particular status. There's absolutely no way to deterministically put a thread into a state where it's dying and sleeping and check its status.

So tons of these specs are just race conditions. And they fail, as expected, sporadically. And I just haven't had a chance to clean that up yet. The point is: it's hard enough already to just get a fairly clear semantics of just what a thread does when you do certain things to it. Like if you put it to sleep and then kill it from another thread and then check its status.
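For contrast, here is a sketch of the part of that scenario that can be observed deterministically: the status while the thread is asleep, and the status after the kill has fully completed. The in-between "dying" state is the one a test cannot reliably catch. The method name `sleep_then_kill_statuses` is hypothetical; `Thread#status`, `Thread#kill`, and `Thread#join` are standard Ruby.

```ruby
# Observable states of a thread that is put to sleep and then killed.
def sleep_then_kill_statuses
  worker = Thread.new { sleep }  # sleeps until something wakes or kills it

  # Wait until the worker has actually reached its sleep.
  Thread.pass until worker.status == "sleep"
  before = worker.status

  worker.kill
  worker.join                    # wait for the kill to finish
  after = worker.status          # false once the thread has terminated

  [before, after]
end

p sleep_then_kill_statuses
```

Checking the status between `kill` and `join` is where the race lives: the answer depends on how far the other thread has progressed at that instant.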

So, more complex stuff like saying: these two threads mutate memory, and what's visible to the other thread from thread one because of thread two's actions? That sort of thing. Especially when you get into the issue of... you have two threads: thread one and thread two. Thread one might be running on CPU one, core one, and thread two might be running on CPU two, core one. And in another case it might be both of them running on CPU one on different cores.

These sorts of things... Intel has a manual that describes some of these semantics but it's not something that we can just brush under the rug. It's something that we really need to address. So a memory model that is defined for Ruby as a language I think would be really important.

And I don't want to see it just left to implementation specific behavior, because it's hard enough when you have just a single method that might have a different behavior, let alone when you have this whole semantics of the program execution. I don't want to see that just left as an implementation detail: undefined behavior. There's no reason to do that, we can put more guarantees in there for people writing Ruby code.

Another big thing would be: a set of concurrent data structures.

Right now there's like Queue in standard library. But I think that should be in core, not in standard library. I think that it should be built in to the language. A set of concurrent data structures, things that have defined semantics under concurrent execution. I don't want to see a synchronized array replace every array in the system, there's no point in that.

It inhibits you from properly stratifying a system. You really want to build a system by having primitive behavior and compose that into more complex behavior, or being able to say "In this area I need speed, I am absolutely sure that only one thread will be running this code right now, and I want it to be as fast as possible, so it should do no synchronization at all. And I know this because this is the way it's defined and it's on me if I've incorrectly done that".

But I should have that facility, I shouldn't suddenly be forced to have every array in the system synchronized, and the overhead of that. So I think we need concurrent data structures and we have languages, we can look at java.util.concurrent. One of the really frustrating things to me is that while people on JRuby can use java.util.concurrent, it's not a solution for Ruby. And that's the frustrating thing.

We need this in Ruby. And it needs to be in Ruby, not in C, like Complex and Rational, which were rewritten from Ruby to C, so now they're basically in the core in MRI in 1.9. They're no longer in the standard lib, so we have a little bit better numeric power, but now they're rewritten in C and they're not useful unless you run C.

Or if you don't run C then you port them to the language that you do run. Which does nothing good for Ruby, it just wastes effort and increases the likelihood of bugs and makes the challenge of consistency and unification of the language even worse.
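Queue, the one concurrent structure mentioned above, shows what "defined semantics under concurrent execution" buys you: threads can hand work to each other without any explicit locking in user code. A minimal producer/consumer sketch follows; `squares_via_queue` and the `:done` stop marker are conventions made up for this example.

```ruby
require "thread" # Queue lived in the standard library at the time of this interview

def squares_via_queue(numbers)
  jobs = Queue.new
  results = Queue.new

  consumer = Thread.new do
    # Queue#pop blocks while the queue is empty; :done is our own stop marker.
    while (n = jobs.pop) != :done
      results << n * n
    end
  end

  numbers.each { |n| jobs << n }
  jobs << :done
  consumer.join

  # Drain the results in the order the consumer produced them.
  Array.new(results.size) { results.pop }
end

p squares_via_queue([1, 2, 3])
```

Only the two queues are synchronized here. The ordinary Array returned at the end is touched by a single thread and pays no synchronization cost, which is the kind of stratification described above.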

So yeah, memory model, concurrent data structures, and then starting to build up an understanding of concurrency and the other concurrency models. Things like Actor is something that could conceivably be in core Ruby, because it's a fairly simple abstraction. And we could really then emphasize the semantics of programs composed of more actors, instead of wasting time trying to make sure that we have consistency among different implementations.

All right. So I see those as three different levels. The really fundamental is the memory model, the data structures that allow you to structure your programs in a lot of different ways, and then even one step higher, not quite to a framework, but one step higher to more sophisticated, more comprehensive model like an Actor model.
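To make the top level of that stack concrete, here is a very small actor sketch built on a Queue as the mailbox. It is not the Rubinius::Actor API; `TinyActor`, `tell`, `stop`, and `sum_with_actor` are hypothetical names used only to show the shape of the abstraction.

```ruby
# Illustration only: a hypothetical minimal actor, not Rubinius::Actor.
class TinyActor
  def initialize(&handler)
    @mailbox = Queue.new
    @thread = Thread.new do
      # Only this thread reads the mailbox, so state touched by the handler
      # is never accessed by two threads at once.
      while (message = @mailbox.pop) != :stop
        handler.call(message)
      end
    end
  end

  # Any thread may send a message; sending never blocks.
  def tell(message)
    @mailbox << message
    self
  end

  # Process everything already in the mailbox, then shut down.
  def stop
    @mailbox << :stop
    @thread.join
  end
end

def sum_with_actor(numbers)
  total = 0
  adder = TinyActor.new { |n| total += n }
  numbers.map { |n| Thread.new { adder.tell(n) } }.each(&:join)
  adder.stop
  total
end

puts sum_with_actor(1..100)
```

Many threads send, but `total` is only ever changed by the actor's own thread, so the callers need no lock of their own.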

Jesse Storimer: A built-in abstraction that people can use.

Brian Shirai: Yeah.

Rubinius' built-in concurrency features

Jesse Storimer: Yeah. Okay. I think that's a good segue. Rubinius already has some of this stuff, right?

That you want Ruby to have. So for me, while looking around in the source and being a bit familiar with it, I know you guys have stuff like AtomicReference as a Ruby class in Rubinius? And you had an Actor implementation that used to be in core, and now it looks like it's in a gem. But like you said, it was there years ago and you guys had it for a long time.

Brian Shirai: Right. Yes. It was in the standard lib and you still had to require it, but it was sort of with Rubinius. It is now split out into a gem, which is better. I'm actually working right now on gemifying the entire standard lib for Rubinius. It's just going to be in gems and there's another gem that you can install. It's the same amount of work to maintain 1.8, 1.9, 2.0, as files in the repository as it is to maintain those gems. So it's just going to happen.

So splitting Rubinius::Actor out into a gem is a reasonable sort of thing I think, unless the language is going to say we're going to have Actor as a built-in construct. Which is not unreasonable, there are certainly languages that say Actors are built-in. There's a language called Humus that is essentially an attempt to build a system with actors built-in. So it's reasonable to do it, but for now, or until Actor is built in, it should certainly live in a gem.

AtomicReference is a utility class that Evan dropped in there. Basically it gives you an operation to compare-and-swap, or compare-and-set is more accurate because then it returns true or false. So basically, you have a value and you have a new value that you want to update it with, and you have an operation that will atomically get the old value, test it, if it's still the value that you think it is, and if it is, it will install the new value and return true.

If it's not, and it had changed from the time that you had made the assumption about it until the operation was actually being attempted, some other thread came in and did something, then it would fail and you would get false back. Then you can choose to re-attempt it or you can do something else.

That's what AtomicReference gives you and it's basically the idea of the compare-and-swap, which theoretically has been shown to be useful for implementing any of the various synchronization things that we have. Like semaphores and mutexes. So it's sort of a fundamental building block for building concurrent structures.
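The compare-and-set operation and the retry loop described above can be sketched as follows. This is a stand-in written for illustration, not the Rubinius class: `SketchAtomicReference` uses a Mutex to make the step atomic, whereas a real implementation would typically rely on a processor-level compare-and-swap. The method names `get` and `compare_and_set`, and the helpers `atomic_increment` and `increment_from_threads`, are assumptions of this sketch.

```ruby
# Stand-in for illustration only; a Mutex makes the compare-and-set atomic here.
class SketchAtomicReference
  def initialize(value)
    @lock = Mutex.new
    @value = value
  end

  def get
    @lock.synchronize { @value }
  end

  # Installs new_value only if the current value is still `expected`.
  # Returns true on success, false if some other thread changed it first.
  def compare_and_set(expected, new_value)
    @lock.synchronize do
      return false unless @value.equal?(expected)
      @value = new_value
      true
    end
  end
end

# The caller decides what to do on failure; here it simply retries.
def atomic_increment(ref)
  loop do
    old = ref.get
    return old + 1 if ref.compare_and_set(old, old + 1)
    # Another thread got in first: re-read and try again.
  end
end

def increment_from_threads(thread_count, increments_each)
  ref = SketchAtomicReference.new(0)
  thread_count.times.map {
    Thread.new { increments_each.times { atomic_increment(ref) } }
  }.each(&:join)
  ref.get
end

puts increment_from_threads(10, 1_000)
```

Note that the increment itself takes no lock in the caller: a failed attempt costs a retry rather than a wait.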

Jesse Storimer: Do you guys use that internally to build up your implementation of Mutex or anything?

Brian Shirai: We don't yet actually. We have something internally that we do use for some synchronization called a Channel, which is a communication pipe that's blocking when you read from it if nothing is available. It was actually the fundamental construct on which the Actors were built. So Actors have a 'mailbox', which is really a sort of a channel where one end of the channel is fixed inside of that object, that actor, and the other end is mobile.

So multiple things can write to it, but only one thing can read from it. And reading it, it's synchronous in the sense that reading from a channel that has nothing in it, blocks. So you can implement synchronization that way because you read to block, and when you want to wake that thing up, you write to it and then continue. So you can implement synchronization with it. So that's built-in. That's the sort of thing that we'd like to see in the toolbox of primitives that we can use to implement other concurrent data structures.
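The "read to block, write to wake" pattern can be sketched with a blocking pipe built on Ruby's Queue. This is not Rubinius::Channel, which involves VM-level support; `SketchChannel`, its `write` and `read` methods, and `wake_with_channel` are hypothetical names for this illustration.

```ruby
# Illustration only: a hypothetical blocking pipe, not Rubinius::Channel.
class SketchChannel
  def initialize
    @queue = Queue.new
  end

  # Any thread may write; writing never blocks.
  def write(value)
    @queue << value
    self
  end

  # Blocks the calling thread until a value is available.
  def read
    @queue.pop
  end
end

def wake_with_channel
  wakeup = SketchChannel.new
  done = SketchChannel.new

  sleeper = Thread.new do
    token = wakeup.read              # parks here until someone writes
    done.write("woken by #{token}")
  end

  wakeup.write(:go)                  # wakes the sleeper
  message = done.read                # blocks until the sleeper has replied
  sleeper.join
  message
end

puts wake_with_channel
```

Two channels are used so that each thread only ever waits by reading, which is the whole synchronization mechanism.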

Jesse Storimer: So something like Channel, even something like AtomicReference, is it meant to be a public constant for Rubinius users to use? Or was it meant for you guys to build your bigger abstractions on internally?

Brian Shirai: The challenge is that: as we're trying to build out Rubinius and get it compatible with MRI in all the different ways, there is certainly stuff that is Rubinius specific and we try to put all that stuff under the Rubinius constant. So what we have to be very careful about, especially with 2.0, what I'm really trying to be careful about, and I hope to implement semantic versioning for 2.0+, is the idea of being careful about those APIs. So, ideally, we would see these sorts of things in Ruby. But when? I don't know.

And so as we build these things up, I have no particular objection to people using them. I think that there's two parts to that. One, we want to be careful that people understand what they should use. Ruby is pretty open, so that's difficult. And two, we need to be really careful that we respect versioning in those APIs so people aren't surprised, like "Hey, where did this thing go?" and we're like "Oh, we moved it over there."

So that's one of the things that's sort of delaying 2.0 right now. I want to make some good decisions and make sure things are done a little bit more correctly. And then as we go forward with the versioning, use versioning in a way that will differ quite a bit from the way MRI uses it. They've made semantic changes at patch levels and they've changed syntax from one tiny version to the next and that sort of thing.

Jesse Storimer: So there's also a gem, I forget the name of it, but if you were using Rubinius to implement some cool concurrent thing using these constructs, and you wanted to take that code to another implementation, these are implementations in Ruby right? That you can install on other implementations as a gem and re-use? Is that correct?

Brian Shirai: The Rubinius Actor stuff is.

Jesse Storimer: Okay.

Brian Shirai: Some of the stuff, like Channel, I don't think so. Channel has the VM level stuff, but I would have to refresh my memory on that. The goal would be, and what I'm certainly doing with gemifying standard lib, is to try to move stuff from C to Ruby. Because the more we have in Ruby, the more suitable it is. There's Topaz, there's Maglev, there's IronRuby, JRuby, so there's a lot of Ruby implementations and they all have fairly different underlying technology.

Jesse Storimer: This might be a good time to say it, because I don't think we've said it yet, but Rubinius, one of its prime directives is to implement as much of Ruby as possible in Ruby.

Brian Shirai: Right. Essentially, yeah. We certainly try to do that; we try to move the boundary between Ruby and the Ruby substrate as low as possible. It's certainly a goal of the project.

In terms of being fairly clear with your question, we would ultimately like to see these things as a standard Ruby feature. But we're not going to just throw stuff... we did that initially and we had ByteArray, and we had Tuple, and stuff like that, in the global namespace. So we pulled all those things back into Rubinius.

So we're not going to just put them into the main namespace. They'll be under Rubinius, but what would be good is to see those things in Ruby, and what we will do our best to do is be sure that when we change APIs we represent that in the version of Rubinius. So that if people do start building projects that use Rubinius related stuff that they can be confident that we're not going to break stuff, basically.

Jesse Storimer: Right. I found the gem I was thinking of. It's called Rubinius Core API; it does provide things like ByteArray and Channel but only to JRuby at the moment.

Brian Shirai: Right. Yeah. So JRuby, at various times, has had more or less interest in running Rubinius stuff. So they try to take the underlying stuff that we implement and provide a compatibility layer so that the stuff above that can work well. Ideally, what you have is just good Ruby code, right? And not worry about the implementation details at that lower level. But that's challenging and frustrating. And I want to be clear about that. Because when MRI goes and converts a bunch of Ruby code into C code, that doesn't further the cause of Ruby in the world.

Upcoming in Rubinius 2.0

Jesse Storimer: All right. Well, we can talk about, if you want to, the upcoming Rubinius 2.0 release. But I think we've covered a lot of really awesome topics in terms of multi-threading and thread-safety. It's been awesome. So tell us what's coming up in Rubinius, the next release.

Brian Shirai: Well 2.0 has primarily been focused on removing the GIL and adding 1.9 features. Now that MRI has released 2.0, I hope to get some of the 2.0 features in, but I'm not going to block the 2.0 release on those. We'll try to get some 2.0 stuff in and it probably will be experimental. The goal for 2.0 release is to solidify the 1.9 features, get Rubinius performance on par with MRI 1.9. We're targeting 1.9.3, so on par with 1.9.3 for Rails, basically.

We should be close and Dirkjan has been doing a lot of work on that and certainly has made some impressive gains in some areas. So what we want to focus on for 2.0 is that Rubinius is a drop-in replacement for MRI. Wherever people are using MRI right now, except Windows, and Windows support is coming, but wherever people are using MRI right now, 1.9, 1.8, and soon 2.0, we want Rubinius to be suitable as a replacement.

And when I say drop-in, I mean drop-in. I mean you switch to Rubinius, you run bundle update so that it re-computes the dependencies and then you start your app, and that's it. There's no system to change, it's just the way MRI works. So C extensions work and that sort of thing.

Focusing on making sure that we're true to that promise, and that the performance is on par. There are certainly areas where we're slower, there are certainly areas where we're faster, and ultimately the goal is to be much faster, but it's a process to get there. I think if we can be on par, or if we can have better throughput by using threads, and lower memory pressure on these virtualized instances that people are using, like EC2 and all these other infrastructures and service providers.

If we can get Rubinius performing, in throughput, better than MRI by using far less memory, then the whole story, the overall story, which is like "can we have more memory on instances for memcached?" and that sort of thing, that whole story can be better, and then we can continue to work on performance.

Jesse Storimer: That's pretty exciting because I know myself and other people that I know too, over the years we've tried Rubinius on our day-to-day projects. I've tried it on the Shopify Rails app, even just to try out some of the developer tools that it ships with that offer a lot more promise than what we have with MRI. And there's always been something that isn't quite compatible. But if it's coming to that point where it's drop-in, that's pretty exciting for Rubinius.

Brian Shirai: Yeah, it's certainly the promise. For anyone, if you run into that, a gem that doesn't work for instance: open an issue for us please, because it helps a lot. There are certainly areas, like booting Rails, that are slower right now but we're looking at those.

In general, the promise is to be a drop-in replacement and if you find that it's not, please let us know. I can't guarantee how quickly we will address that, but it's the goal. We're getting so much closer, so much of 1.9 actually works, and Encoding works, and we're sort of tailing off on the big compatibility issues. Now we have 2.0, but there's not that much in 2.0 to be a challenge from the standpoint of compatibility.

Jesse Storimer: Right. Awesome. Well thanks, Brian, it's been a really enlightening conversation. Appreciate it.

Brian Shirai: Thank you, thanks for having me.

Jesse Storimer: Take care.

And if you're still reading, you should check out my upcoming book about multi-threading in Ruby. Sign up for the email list to stay updated and make sure you get the launch discount.