The world of frontend development evolves so quickly that sometimes it feels impossible to keep up. Even worse, what do you do with your existing code when technologies shift? Innovations in languages, libraries, and practices are great, but they’re only useful if they can be used in practice. To enable new technologies, we need good modernization tools to aid in the transition process.
At Benchling, we chose CoffeeScript in 2013, but about a year and a half ago, we recognized that the language was losing steam. The community and the tools were all moving to JavaScript, and with ES2015, JavaScript was getting much, much better. We made a decision: New code will be in JavaScript, and for now, the 200,000 lines of CoffeeScript will stay as-is. We’ll convert code over to JS as we find time for it. We wanted to be entirely on JavaScript, of course, but converting that much code seemed like a gigantic task.
I took that as a challenge, and a year and a half later, I’m happy to say that that gigantic task is now complete. We’re finally at zero lines of CoffeeScript, all with minimal disruption to feature work. At the time, there weren’t good tools for such a big conversion, so I had to help build them. I started contributing to the decaffeinate open source project, became the primary maintainer, and pushed it to completion. This post is about that long journey, and now that a tool like decaffeinate exists, it probably won’t take as long for you. 😄
At Benchling, we build tools to help biologists coordinate experiments, analyze and work with DNA, and more, and one of our strengths is our modern platform. We need to build and iterate quickly, so it’s important that we always have access to the latest and greatest tools. (We’re also hiring!)
We started with CoffeeScript in 2013, and that was the right choice at the time. The JavaScript language was at a standstill, and CoffeeScript overhauled it by introducing arrow functions, classes, destructuring, string templates, array spread, and lots of other cool features. It certainly had problems, like lack of standardization, surprises around variable scoping, and difficulty writing tooling, but it was the best there was.
Over the next few years, JavaScript made a comeback. Inspired largely by many of the CoffeeScript features, the JavaScript committee released the ES2015 spec (a.k.a. ES6) that moved the language up to parity with CoffeeScript in some cases and beyond in other cases, and with Babel, it was possible to use modern JavaScript in any browser without waiting. The community became focused on JavaScript, so the tools got better. ESLint now has hundreds of rules and many competing popular default configurations, Prettier is a whole new class of formatter, and TypeScript and Flow allow advanced type checking while staying true to JS. Switching programming languages is never an easy choice, but it seemed like JavaScript was a safer long-term bet, and it’s easy to use the two side-by-side, so writing new code in JavaScript seemed reasonable.
Having a split codebase is possible, but it ended up hurting productivity when working with older code. Switching back and forth between two languages is a pain, and two frontend languages meant more for new hires to learn and more trivial style rules to keep track of. And, of course, our new tools like ESLint couldn’t help when modifying old CoffeeScript code, where they probably would be most useful. A unified JavaScript codebase seemed like the right end goal.
One day in June 2016, a coworker decided to be ambitious: he was working with a complex 500-line React component, GuideGroupTable.coffee, and he was going to port it to ES6 first to make it easier to work with. Line-by-line, he carefully reworked the syntax and some of the logic until finally, GuideGroupTable.js was ready to try out. It broke the first time, and the second, but after fixing some little mistakes, things seemed to work, and he wrote some additional tests to gain more confidence in the conversion.
Even with tests, and even with a thorough code review, making such a big change to complex code is risky, and it ended up introducing two bugs. In all, converting the code, getting it reviewed, and fixing the resulting bugs took about two engineer-days total. But at least there was a shiny new modern-looking GuideGroupTable.js. One file down, about 1000 more to go.
Math quiz: If every 500 lines of CoffeeScript converted takes 2 days and introduces 2 bugs, how many days and bugs is it for 200,000 lines?
Answer: 800 days (or 6,400 hours), 800 bugs.
Spending over 3 engineer-years on switching programming languages would be a disastrous waste of time, especially for a startup; that’s time that could be spent working on real problems that scientists are facing. At the end of the day, cancer doesn’t care what programming language we use, and if sticking with CoffeeScript is the pragmatic choice, that’s what we’ll do.
Maybe this was just a particularly difficult case and other code would be easier. Even if these estimates are too high by a factor of ten, 640 hours and 80 bugs is still way too much to ask. If we’re going to move off of CoffeeScript at all, there needs to be a better way.
In a certain sense, automatically converting CoffeeScript to JavaScript is trivial: just run it through the CoffeeScript compiler. The newest versions of CoffeeScript (which didn’t exist at the time) produce pretty good code, but there’s still quite a bit to be desired:
var
and declares all variables at the top of the file/function.void 0
instead of undefined
or [].indexOf.call(array, x)
instead of
array.indexOf(x)
.
If you’re still on CoffeeScript version 1, you’ll see more issues: that compiler
also removes inline comments and doesn’t attempt to use newer JS features like
classes and destructuring.In short, the CoffeeScript compiler is a good compiler, but not a great codebase conversion tool, so we decided to look for other approaches.
We weren’t the only company with this problem, and through some discussions, I heard about a tool called decaffeinate that tried to solve this problem: it took in CoffeeScript and produced modern JavaScript, at least as much as it could. Unlike the CoffeeScript compiler, it keeps your formatting and tries to use modern syntax and patterns.
I installed it and tried it out on our codebase. Out of 1200 files, it failed on about 600 of them. On the plus side, it successfully converted half of the files.
It was promising, but it certainly wasn’t done. Lots of features were explicitly
not supported: ?.
, for own
loops, loops used as expressions, complex class
bodies, and more. And most of the time, you’d get a confusing error like
“Unexpected token” with no further context.
I hadn’t made many GitHub contributions before, but it seemed like a reasonable place to start. I filed some bugs, improved some error reporting, learned the code better and better, and eventually, I was a regular contributor. For the last year or so, I’ve been the primary maintainer.
If you want to try it out, the REPL is a nice interactive environment where you can type or paste some CoffeeScript and see what decaffeinate produces.
Is decaffeinate a compiler? Maybe it’s a transpiler? (Is that really any different?) I’d say it’s neither; it’s something else entirely. It’s similar to a compiler, but it has different goals and, in this case, a different architecture.
First, let’s see how the CoffeeScript compiler handles some example code:
1
|
|
First, it splits the code up into tokens: IF
, IDENTIFIER: isHappy
, THEN
,
IDENTIFIER: cheer
, CALL_START
, CALL_END
, TERMINATOR
.
Then, it parses the tokens into an abstract syntax tree (AST):
Each AST node then knows how to format itself into JavaScript code:
1 2 3 |
|
Notice that the one-liner was expanded to multiple lines. CoffeeScript doesn’t take any formatting into account when producing output, since it throws away the code and only uses the AST when generating JavaScript.
Many of the details in decaffeinate are similar, but it focuses on editing your
code, not rewriting it. Rather than producing new code, it uses the AST and
tokens to generate a list of changes to the code. In this case, decaffeinate
uses the token positions to recognize that the if
statement is a one-liner, then
makes these changes:
1 2 3 4 5 6 |
|
Those operations are then applied to the original code:
1
|
|
Your code only changes where it needs to, and in most cases, the shape of the code is the same as before.
Is this the right architecture? Frankly, it’s unclear, and projects like Prettier have made a compelling argument that codebases should simply have zero manual formatting anyway. But it’s the architecture decaffeinate went with.
When building an automatic translation from one programming language to another, you run into an uncomfortable possibility: the problem you’re solving might be impossible.
Let’s take an example. Ideally, CoffeeScript classes can always be converted to JavaScript classes. Here’s one CoffeeScript class that uses a feature you may not have seen:
1 2 3 4 5 6 7 |
|
Yep, you can do that in CoffeeScript! Not only can you conditionally define methods, you can run arbitrary code at class setup time. JS classes can only consist of plain methods (for now), so there simply isn’t a way to do this. You can try, and you’ll get a syntax error:
1 2 3 4 5 6 7 8 9 10 11 |
|
If you’re moving this code over manually, you’d think to yourself “I guess that trick doesn’t work in JavaScript, so I’ll rethink the code”, but that’s not possible with an automated tool.
So how should a tool like decaffeinate approach this? There are a few options:
#3 is where decaffeinate really shines: there are lots of little tricks to make the code look as good as possible while still being correct.
Let’s take a look at some CoffeeScript code:
1
|
|
Looks pretty clean (although you’ll get an unpleasant surprise if you forget the parens). JavaScript doesn’t have array comprehensions, so we’ll need to find some alternative.
Here’s how the CoffeeScript compiler handles it:
1 2 3 4 5 6 7 8 9 10 11 |
|
Yuck. It works, but it certainly wouldn’t pass code review. Here’s what we really want:
1
|
|
Looks pretty clean, and not too different from the original code. But is it
correct? It certainly looks right, but would you be willing to run this
transformation (a for b in c
to c.map(b => a)
) on hundreds of thousands of lines
of code? As it turns out, it’s not correct. If you want, you can stop reading
and think about what goes wrong.
I implemented this transformation, ran decaffeinate on a few thousand lines of code from work, did lots of testing, including existing automated tests, and it still ended up causing a crash in production.
Here’s the old CoffeeScript:
1
|
|
And here’s the new JavaScript:
1
|
|
The fundamental problem here is that you might not be working with an array. In
this case, it was a jQuery collection, where map
exists, but doesn’t produce an
array.
This problem is all over the place in CoffeeScript: every time you iterate
through something in CoffeeScript, it needs to be array-like. That means you
need arr.length
, arr[0]
, arr[1]
, etc. When ES2015 came around, they decided to
do things differently: they created the iterable protocol. Any object that wants
to be iterable can expose a Symbol.iterator
function that describes how to
iterate through the object. That’s what JavaScript uses for iteration when you
use for (const rhino of rhinos)
or children = [...children, baby]
. Arrays are
both array-like and iterable, but plenty of JS objects are array-like but not
iterable, and even more objects don’t have map
(or have a map
that works
differently). Strings, jQuery collections, and FileList (but only in Safari) can
all cause problems here.
Fortunately, there’s a simple built-in function that converts array-like objects
to true arrays: Array.from
. To systematically avoid this problem, we can throw
in Array.from
anywhere we iterate over anything, and this is what decaffeinate does.
This is one of the toughest design questions of decaffeinate: how important is
correctness really? If a bug only comes up every few thousand lines of code, do
we really want to defensively add Array.from
on all iteration operators?
After a lot of thinking, I decided that yes, decaffeinate should be completely correct on all reasonable code. There’s a judgement call about what is “reasonable”, but decaffeinate needs to be trustworthy. decaffeinate’s goal is to speed up the conversion from CoffeeScript to JS, and if you need to extensively manually test your code after decaffeinate, it will take much, much longer. decaffeinate needs to be as stable as any compiler.
If you use decaffeinate, you’ll probably see Array.from
all over the place. You
can disable it with the --loose
option, but I’d recommend instead looking through
the resulting code and only removing Array.from
when you’re confident that the
object you’re working with is already an array.
Between July and December 2016, decaffeinate slowly got more and more stable when run on the Benchling code. 500 files failing, then 350, then 200, then 50, then 10, then 5, then 0. After some cheers and excitement, I ran the tests for the newly-decaffeinated codebase, and they crashed immediately. More work to do, I guess. I fixed a bug, I fixed another bug, I tracked down and fixed an ESLint bug, I upgraded Babel to work around a Babel bug, I kept on iterating, and finally it got to a point of the tests passing.
So now what? decaffeinate seemed to work great on the code that was covered by tests, but what about the code that wasn’t covered by tests? I could write 100,000 lines of tests to try to get full code coverage, but that might take a while. I needed to find more test cases for decaffeinate, and fortunately, the internet has no shortage of CoffeeScript code. I set up a build system that would run decaffeinate on a bunch of projects, patch them to use Babel, then run all of the tests. A status board on the README was also a good motivating factor and a good way to see progress. Here were some of the results early on:
Testing out decaffeinate on codebases with a wide variety of authors and coding styles worked out great, and setting up the tests allowed me to discover the most critical bugs. There were lots and lots of bugs, but after a few months, I finally got everything working:
It’s rare that you ever get to say that a software project is complete, but I think this is one of those times. Technically, there’s still a bit more that would be useful, and I did some follow-up work to add a suggestions system and clean up usage, but decaffeinate is now in maintenance mode.
With decaffeinate stable, we were finally at a point where it wasn’t crazy to run it on large swaths of code without extensive testing or review. So what do you do when you have a tool like that and 150,000 lines of CoffeeScript to convert? Somehow, converting it all at once seemed a little worrying.
Here’s the strategy we took: every Tuesday, we’d pick off a large chunk of code
and run it through decaffeinate, first about 5000 lines, then larger and larger
chunks, up to 20,000 lines at once. We always had a 100%-machine-automated set
of commits, then two of us scanned through the converted code and made any safe
cleanups we could find. That did not mean blindly removing all usages of
Array.from
or other decaffeinate artifacts; it meant fixing formatting, removing
unused variables, renaming autogenerated variables to other names, etc. The code
at the end wasn’t pristine, but it was JavaScript, and it was much better than
what the CoffeeScript compiler gives.
The stability work paid off, and after repeating this process for about 10 weeks, we finally were able to get it completely to zero with very few issues. That also meant that it was possible to remove CoffeeScript from the build system, disable CoffeeLint, and delete our CoffeeScript style guide.
In a sense, the job still isn’t done. We still have lots of files with decaffeinate artifacts and disabled lint rules that should eventually be manually cleaned up. One of our biggest auto-converted files starts with this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
|
So it’s not perfect, but it’s pretty easy to do the remaining cleanup work as you go.
Let’s compare my original estimate with the actual cost of converting 200,000 lines of code:
Estimated cost without decaffeinate: 6,400 hours, 800 bugs.
Actual cost with decaffeinate: ~100 hours, ~5 bugs.
(Not including the work on decaffeinate, which was in my spare time. 😄).
Those 100 hours were mostly spent spot-checking and cleaning up the resulting JS, code reviewing those cleanups, manually testing the relevant features, and deploying the changes gradually to reduce risk. 2000 lines of code per hour seemed like a safe rate, but theoretically, it could have all been done at once, and if you’re in a hurry, you could probably go much faster.
decaffeinate is extremely stable, but the conversion process still wasn’t without its bugs. By far, the largest source of bugs was human error. Let’s look at some code, before and after decaffeinate:
1 2 3 |
|
1 2 3 4 |
|
decaffeinate moved the default param to an if
statement in order to be
technically correct, wrapped itemIdsToKeep
in Array.from
, and changed the in
operator to the includes
method. The if
statement and the Array.from
could both
use cleanup, and in this case we played it safe and only removed the Array.from
,
since it clearly seemed like an array:
1 2 3 4 |
|
As it turns out, itemIdsToKeep
was not always an array. Purely by mistake, it
could sometimes be a DOM event instead. Both CoffeeScript and Array.from
silently treat it as the empty array in that case, but removing Array.from
exposed the crash.
This is an example of a theme that occurred a number of times: decaffeinate tends to break on code that is already buggy.
Let’s look at another example. Here’s some CoffeeScript before and JavaScript after:
1 2 3 |
|
1 2 3 4 |
|
Seems pretty simple, right? What could go wrong?
As it turns out, the problem wasn’t so much a conversion error as an unexpected
consequence in switching compilers: switching from CoffeeScript to Babel enables
strict mode. Without strict mode, assigning to a non-writable property is a
no-op, and in strict mode it crashes. In this case, response
was supposed to be
an object, but instead was sometimes the empty string, which meant that the
assignment simply did nothing. This was just a bug, but it was a benign bug
before and switching to Babel made it a crashing bug.
How can you avoid running into these same problems? Some ideas:
Here are some more things to keep in mind:
The hardest problem that decaffeinate had to solve is handling this
before super
in constructors, which is allowed in CS but not in JS. There’s a hack to trick
Babel and TypeScript into allowing it, but it’s an ugly hack that you should
remove ASAP after running decaffeinate.
To help run decaffeinate along with other tools, I wrote a tool called
bulk-decaffeinate that runs decaffeinate, several jscodeshift transforms (e.g.
converting code to JSX), eslint --fix
and several other steps. You’ll probably
want to either use that tool or one you write yourself, since decaffeinate is
more of a building block.
There have been some great previous blog posts on decaffeinate: Converting a large React Codebase from Coffeescript to ES6 and Decaffeinating a Large CoffeeScript Codebase Without Losing Sleep.
The decaffeinate docs have some pages to help you through the process: the Conversion Guide has some practical advice on running decaffeinate on a big codebase, and the Cleanup Suggestions page lists most of the subtleties you’ll see in the converted code. You can also drop in on Gitter to ask questions!
Every project and team is different, though, so regardless of how you approach it, you’ll likely need some real effort and care. But hopefully, once the conversion strategy, the configuration, and any other details are figured out, the gruntwork will all be safely handled by decaffeinate!
Programming is full of tradeoffs, and a common pitfall is to focus too much on your code when you should be focusing on the real problem that you’re solving. Whether it’s formatting, variable names, code structure, or even what programming language to use, none of it matters if the product doesn’t actually help people. Big code migrations can sometimes feel necessary and benefit in the long run, but they can also be a big distraction and a massive time sink. It’s often an awkward tradeoff, but with solid migration tools like decaffeinate, you can get the best of both worlds: you can benefit from the latest tools and practices while still focusing your time and brainpower on solving real problems and helping people.
]]>My perspective in this post is someone who has plenty of experience with programming both low-level and high-level languages, but is new to the Go language and curious about its internals and how it compares to other languages. In both the exploration process and the process of writing this post, I learned quite a bit about the right way to think about Go. Hopefully by reading this you’ll be able to learn some of those lessons and also share the same curiosity.
In Go, capitalization matters, and determines whether a name can be accessed from the outside world. For example:
1 2 3 4 5 6 7 |
|
Other languages use the terms “private” and “public” for this distinction, but in Go, they’re called unexported and exported.
But what about when you just want to hack and explore, and you can’t easily
modify the code in question? In Python, you might see a name starting with an
underscore, like _this_function_is_private
, meaning that it’s rude to call it
from the outside world, but the runtime doesn’t try to stop you. In Java, you
can generally defeat the private
keyword using reflection and the
setAccessible
method. Neither of these are good practice in professional code, but the
flexibility is nice if you’re trying to figure out what’s going wrong in a
library or if you want to build a proof of concept that you’ll later make more
professional.
It also can be used as a substitute when other ways of exploring aren’t available. In Python, nothing is compiled, so you can add print statements to the standard library or hack the code in other ways and it’ll just work. Java has an excellent debugging story, so you can learn a lot about library code by stepping through it in an IDE. In Go, neither of these approaches are very pleasant (as far as I’ve seen), so calling internal functions can sometimes be the next best thing.
In my specific case, the milestone I was trying to achieve was for my interpeter
to be able to successfully run the time.Now
function in the standard library.
Let’s take a look at the relevant part of
time.go:
1 2 3 4 5 6 7 8 |
|
The unexported function now
is implemented in assembly language and gets the
time as a pair of primitive values. The exported function Now
wraps that
result in a struct called Time
with some convenience methods on it (not
shown).
So what does it take to get an interpreter to correctly evaluate time.Now
?
We’ll need at least these pieces:
parser
and ast
packages are a big
help here.Time
(defined elsewhere in the file) and
its methods into some representation known to the interpreter.Now
.now
doesn’t have
a Go implementation, and prefer to just call the real time.now
. (There are
other possible approaches, but this one seemed reasonable.)To prove that that last bullet point was possible, I wanted to write a quick
dummy program that just called time.now
(even if it needed some hacky
mechanism), but this ended up being a lot harder than I was expecting. Most
discussions on the internet basically said “don’t do that”, but I decided that I
wouldn’t give up so easily.
A related goal is that I wanted a way to take a string name of a function and get that function back. It’s worth noting that it’s totally unclear if I should expect this problem to be solvable in the first place. In C, there’s no way to do it, in Java it’s doable, and in higher-level scripting languages it’s typically pretty easy. Go seems to be somewhere between C and Java in terms of the reflection capabilities that I expect, so I might be attempting something that simply can’t be done.
reflect
Reflection is the answer in Java, so maybe in Go it’s the same way? Sure enough,
Go has a great reflect
package that works in a lot of cases, and even lets you
read unexported struct fields by name, but it doesn’t seem to have any way to
provide access to top-level functions (exported or unexported).
In a language like Python, an expression like time.now
would take the time
object and pull off a field called now
. So you might hope to do something like
this:
1
|
|
But alas, in Go, time.now
is resolved at compile time, and time
isn’t its
own object that can be accessed like that. So it seems like reflect
doesn’t
provide an easy answer here.
runtime
While I was exploring, I noticed runtime.FuncForPC
as a way to
programmatically get the name of any function:
1
|
|
I dug into the implementation, and sure enough, the Go runtime
package keeps a
table of all functions and their names, provided by the linker. Relevant
snippets from symtab.go:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
The moduledata
struct isn’t particularly friendly, but it looks like if I
could access it, then I should, theoretically, be able to loop through it to
find a pointer to a function with name "time.now"
. With a function pointer, it
should hopefully be possible to find a way to call it.
Unfortunately, we’re at the same place we started. I can’t access
firstmoduledata
, findmoduledatap
, or findfunc
for the same reason that I
can’t access time.now
. I looked through the package to find some place where
maybe it leaks a useful pointer, but I couldn’t find anything. Drat.
If I was desperate, I might attempt to guess function pointers and call
FuncForPC
until I find one with that right name. But that seemed like a recipe
for disaster, so I decided to look at other approaches.
An escape hatch that should definitely work is to just write my code in assembly
language. It should be possible to make an assembly function that calls
time.now
then connect that function to a Go function. I cloned the Go source
code and took a look at the
Darwin AMD64 implementation
of time.now
itself to see what it was like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
Ugh. Maybe I shouldn’t be scared off by assembly, but learning the calling conventions and writing a separate wrapper for every function I wanted to call for every architecture didn’t seem pleasant. I decided to defer that idea and look at other options. Having a solution in pure Go would certainly be ideal.
Another escape hatch that seemed promising is CGo, which is Go’s mechanism for directly calling C functions from Go code. Here’s a first attempt:
1 2 3 4 5 6 7 8 9 10 11 |
|
And here’s the error that it gives:
1 2 3 4 5 6 |
|
Hmm, it seems to be putting an underscore before every function name, which
isn’t really what I want. Maybe there’s a way around that, but dealing with
multiple return values in time·now
seemed like it may be another barrier, and
from my reading, CGo calls have a lot of overhead because it’s doing a lot of
translation work so that you can integrate with existing C code. In Java speak,
it seems like it’s JNA, not JNI. So while CGo seems useful, it looks like it’s
not really the solution to my problem.
go:linkname
As I was digging through the standard library source code I saw something
interesting in the runtime
package in
stubs.go:
1 2 |
|
Interesting! I had seen semantically-meaningful comments like this before (like
with the CGo example, and also in other places), but I hadn’t seen this one. It
looks like it’s saying “linker, please use time.now
as the implementation of
runtime.time_now
”. Sure enough, the
documentation
suggests that this works, as long as your file imports unsafe
. So I tried it
out:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
Let’s see what happens:
1 2 |
|
Drat. Isn’t the missing function body the whole point of the bodyless syntax to allow for externally-implemented functions? The spec certainly seems to think that it’s valid.
Just to see what would happen, I replaced the empty function body with a dummy implementation:
1 2 3 4 |
|
Then I tried again and got this error:
1 2 |
|
When I don’t implement the function, it complains that there’s no implementation, but when I do implement it, it complains that the function is implemented twice! How frustrating!
go:linkname
to workFor a while, it seemed like the go:linkname
approach wasn’t going to work out,
but then I noticed something suspicious: the error formatting is different. It
looks like the “missing function body” error is from the compiler, but the
“duplicate symbol” error is from the linker. Why would the compiler care about a
function body being missing, if it’s the linker’s job to make sure every symbol
gets an implementation?
I decided to dig into the code for the compiler to see why it might be generating this error. Here’s what I found in pgen.go:
1 2 3 4 5 6 7 |
|
Something is causing that inner if
statement to evaluate to true
, and my
function doesn’t have to do with init
, so it looks like pure_go
is nonzero
when it should be zero. Searching for pure_go
shows this compiler flag:
1
|
|
Makes sense: if your code doesn’t have any way of defining external functions,
then it’s friendlier to give an error at compile time with the location of the
problem. But it looks like go:linkname
was overlooked somewhere in the
process. It certainly is a bit of an edge case.
After some searching, I found the culprit in build.go:
1 2 3 4 5 |
|
So it’s just counting the number of non-Go files of each type. Since I’m only compiling with Go files, it assumes that every function needs a body. But on the plus side, the code suggests a workaround: just add a file of any of those types. I already know how to use CGo, so let’s try that:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
And here’s what happens when you try to build that:
1 2 3 |
|
A different error! Now the linker is complaining that the function doesn’t
exist. After some experimentation, I discovered that CGo seems to cause
go:linkname
to be disabled for that file. If I remove the import of "C"
and
move it to another file, then compile the two together, then I get this output:
1
|
|
It worked! If your only goal is to get access to time.now
, then this is good
enough, but I’m hoping that I can go a bit further.
Now that I know that go:linkname
works, I can use it to access the
firstmoduledata
structure mentioned in attempt #2, which is a table containing
information on all compiled functions in the binary. My hope is that I can use
it to write a function that takes a function name as a string, like
"time.now"
, and provides that function.
One problem is that runtime.firstmoduledata
has type runtime.moduledata
,
which is an unexported type, so I can’t use it in my code. But as a total hack,
I can just copy the struct to my code (or, at least, enough of it to keep the
alignment correct) and pretend that my struct is the real thing. From there, I
can pretty much copy the code from the runtime
package to do a full scan
through the list of functions until I find the right one:
1 2 3 4 5 6 7 8 9 10 11 |
|
This seems to work! This code:
1 2 |
|
prints this:
1 2 |
|
So the underlying code pointer is correct! Now we just need to figure out how to use it…
You would think that having a function pointer would be the end of the story.
In C you could just cast the pointer value to the right function type, then call
it. But Go isn’t quite so generous. For one, Go normally doesn’t just let you
cast between types like that, but unsafe.Pointer
can be used to circumvent
some safety checks. You might try just casting it to a function of the proper
type:
1 2 3 |
|
But that type of cast doesn’t compile; pointers can’t be cast to functions, not
even using unsafe.Pointer
. What if we literally cast it to a pointer to a
func
type?
1 2 3 |
|
This compiles, but crashes at runtime:
1 2 3 |
|
(Look at that fault address. Apparently someone had a sense of humor.)
This isn’t a surprising outcome; functions in Go are first-class values, so
their implementation is naturally more interesting than in C. When you pass
around a func
, you’re not just passing around a code pointer, you’re passing
around a function value of some sort, and we’ll need to come up with a
function value somehow if we’re to have any hope of calling our function. That
function value needs to have our pointer as its underlying code pointer.
I didn’t see any obvious ways to create a function value from scratch, so I
figured I’d take a different approach: take an existing function value and hack
the code pointer to be the one I want. After spending some time reading
how interfaces work in Go and reading
the implementation of the reflect
library, an approach that seemed promising was to treat the function as an
interface{}
(that’s Go’s equivalent of Object
or void*
or any
: a type
that includes every other type), which internally stores it as a (type, pointer)
pair. Then I could pull the pointer off and work with it reliably. The reflect
source code suggests that the code pointer (the pointer to the actual machine
code) is the first value in a function object.
So, as a first attempt, I created a dummy function called timeNow
then defined
some structs to make it easy to swap out its code pointer with the real
time.now
code pointer:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
And, as you might guess, it crashed:
1 2 3 |
|
After some experimenting, I discovered that the crash was happening even
without calling the function. The crash was from the line
timeNowInterfacePtr.funcPtr.codePtr = timeNowCodePtr
. After double-checking
that the pointers were what I expect, I realized the problem: the function
object I was modifying was probably in the code segment, in read-only memory.
Just like how the machine code isn’t going to change, Go expects that the
timeNow
function value isn’t going to change at runtime. What I really needed
to do was allocate a function object on the heap so that I could safely change
its underlying code pointer.
So how do you dynamically allocate a function in Go? That’s what lambdas are
for, right? Let’s try using one! Instead of the top-level timeNow
, we can
write our main
function like this (the only difference is the new definition
of timeNow
):
1 2 3 4 5 6 7 8 9 |
|
And, again, it crashes. I’ve seen how lambdas work in other languages, so I suspected why: when a lambda takes in no outside variables, there’s no need to do an allocation each time, so a common optimization is to just have a single shared instance for simple lambdas like the one I wrote, so probably I’m again trying to write to the code segment. To work around this, we can trick the compiler into allocating a new function object each time by making the function a real closure and pulling in a variable from the outer scope (even a trivial one):
1 2 3 4 5 6 7 8 9 10 |
|
And it works!
1
|
|
This code is almost useful, but wouldn’t really work as a library yet because it would require the function’s type to be hard-coded into the library. We could have the library caller pass in a function that will be modified, but that has gotchas like the read-only memory problem I ran into above.
Instead, I looked around at possible API approaches, and I got some nice inspiration from the example code for reflect.MakeFunc.
We’ll try writing a GetFunc
function that can be used like this:
1 2 3 |
|
But how can GetFunc
allocate a function value? Above, we used a lambda
expression, but that doesn’t work if the type isn’t known until runtime.
Reflection to the rescue! We can call reflect.MakeFunc
to create a function
value with a particular type. In this case, we don’t really care what the
implementation is because we’re going to be modifying its code pointer anyway.
We end up with a reflect.Value
object with a memory layout like this:
The ptr
field in the reflect.Value
definition is unexported, but we can use
reflection on the reflect.Value
to get it, then treat it as a pointer to a
function object, then modify that function object’s code pointer to be what we
want. The full code looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
And that’s it! That function modifies its argument to be the function at
codePtr
. Implementing the main GetFunc
API is just a matter of tying
together FindFuncWithName
and CreateFuncForCodePtr
; details are in the
source code.
This API still isn’t ideal; the library user still needs to know the type in
advance, and if they get it wrong, there will be horrible consequences at
runtime. At the end of the day, the library isn’t significantly more useful than
go:linkname
, but it has some advantages, and is a good starting point for more
interesting tricks. It’s potentially possible, but probably harder, to make a
function that takes a string
and returns a reflect.Value
of the function,
which would be ideal. But that’s out of scope for now. Also, the
README
has a number of other warnings and things to consider. For example, this
approach will sometimes completely break due to function inlining.
Go is certainly in an interesting place in the space of languages. It’s dynamic
enough that it’s not crazy to look up a function by name, but it’s much more
performance-focused than, say, Java. The reflection capabilities are good for a
systems language, but sometimes the better escape hatch is to just use
unsafe.Pointer
.
I’d be happy to hear any feedback or corrections in the comments below. Like I mentioned, I’m still learning all of this stuff, so I probably overlooked some things and got some terminology wrong.
]]>Instead of the old strategy of keeping the A/B test data continuously up-to-date using memcache (and periodically flushing to the App Engine datastore), the new system would report events by simply logging them, and those log statements would eventually make their way into Google BigQuery through an hourly MapReduce job based on log2bq. From there, the real A/B test processing would be done completely in SQL using BigQuery queries. Since we were revamping GAE/Bingo using BigQuery, there was an obvious name: BigBingo.
Of course, that three-sentence description leaves out pretty much all of the details and makes some dangerous assumptions, but the high-level plan ended up working (with some tweaks), and I’m happy to say that all A/B tests at Khan Academy are now running under BigBingo, and the last remnants of the old GAE/Bingo system are finally being removed. In this post, I’ll talk about why a rewrite was so important, how we think about A/B testing, and some specific points of the design and architecture of BigBingo. There are some additional cool details that are probably deserving of their own blog post, so look out for those in the future.
Most developers at Khan Academy had a sense that the old GAE/Bingo system was slow and BigBingo would improve overall performance, but I doubt anybody expected that the improvement would be as dramatic as it was. When I finally flipped the switch to turn off GAE/Bingo, the average latency across all requests went from a little over 300ms to a little under 200ms. The most important pages had even better results, but I’ll let the pictures do the talking:
The logged-in homepage got twice as fast:
The logged-out homepage improved even more:
And our memcache went from “worryingly overloaded” to “doing great”:
Of course, making the site faster makes users happier, but it has another big benefit: cost savings. If requests can be processed twice as fast, we only need half as many App Engine instances running at a given time, so our App Engine bill drops significantly. Since Khan Academy is a nonprofit running off of donations, it’s important to us to have an efficient infrastructure so we can focus our money on improving education, not upkeep.
A/B testing isn’t just some occasional tool at Khan Academy; it’s an important part of our engineering culture, and almost any change that we care about goes through an A/B test first, often multiple A/B tests. Right now, there are 57 A/B tests actively running, which is an average of about two active A/B tests per developer.
Unlike “traditional” A/B testing (which tends to maximize simple metrics like ad clicks, purchases, etc.), Khan Academy’s A/B testing tries to maximize student learning. That means that we try out much more advanced changes than just little UI tweaks, and measuring success is a huge challenge by itself. Here are some examples of A/B tests we do:
In the years since GAE/Bingo was written, the devs at KA learned a thing or two about the right way to do A/B testing and what an A/B testing framework should really do, so BigBingo diverges from GAE/Bingo in a few important ways.
Here’s what you’d see when looking at the latest results of an old GAE/Bingo experiment (I added a red box to indicate the “real” data; everything else is derived from those numbers):
For clear-cut results, a few numbers will do just fine, but what do you do when the results are unexpected or completely nonsensical? In GAE/Bingo, the best thing you could do was shrug and speculate about what happened. BigBingo is different: we keep around all raw results (user-specific conversion totals) as well as the source logs and the intermediate data used to determine those results. Since it’s all in BigQuery, investigating anomalies is just a matter of doing some digging using SQL.
Keeping the raw data also makes it easy to do more advanced analysis after-the-fact:
Here’s a big-picture overview of what BigBingo looks like:
Here’s how the data flows from a user-facing request to BigQuery, then to the dashboard UI:
hash(experiment_name + user_id) % num_alternatives
, so no RPCs are
necessary to coordinate that information.Most of the details are reasonably straightforward, but I’ll dig into what’s probably the most controversial aspect of this architecture: the decision to use Google BigQuery for all storage and processing.
If you’re not familiar with BigQuery, it’s a hosted Google service (really an externalization of an internal Google service called Dremel) that allows you to import giant datasets and run nearly-arbitrary SQL queries on them. BigQuery is way faster than MapReduce-based SQL engines like Hive: you’re borrowing thousands of machines from Google for just the duration of your query, and all work is done in-memory, so queries tend to finish in just a few seconds. The primary use case for BigQuery is for human users to manually dig into data, but I’ll show how it can also be used to build stateful data pipelines.
BigQuery supports nearly all SQL, but don’t let that fool you into thinking it’s anything close to a relational database! It has a small set of primitives that’s different from anything I’ve worked with before:
Operation | Price |
---|---|
Import CSV/JSON data into a table | Free |
Run a SELECT query | 0.5 cents per GB in all columns touched |
Store a query result as a new table | Free |
Apppend query results to the end of a table | Free |
Copy a table | Free |
There are a few more operations that are less common, but the ones I listed are the most common ones.
Notice anything missing? No transactions? Not even a way to update or delete rows? No way to pull out a single row without paying for the whole table? How can you possibly keep track of A/B test results in such a restricted system? You’re pretty much stuck with the following rule:
To update a table, you must completely rebuild it from scratch with the new values.
It certainly feels like an architectural sin to rebuild all of your data over and over, but it’s not as unreasonable as you might think. BigQuery is quite cost-efficient (some rough numbers suggest that it’s more than 10x as cost-efficient as MapReduce running on App Engine), and there are lots of little tricks you can do to reduce the size of your tables. By designing the table schemas with space-efficiency in mind, I was able to reduce BigBingo’s data usage from 1TB ($5 per query) to 50GB (25 cents per query). (I’ll go over the details in a future blog post.)
There are also some huge usability advantages to using BigQuery over another batch processing system like MapReduce:
At first, having to deal with only immutable tables felt like an annoying restriction that I just had to live with, but as soon as I started thinking about making the system robust, immutability was a huge benefit. When thinking through the details, I discovered some important lessons:
This is probably best explained by looking at a simple data pipeline similar to BigBingo. First, I’ll give a straightforward but fragile approach, then show how it can be improved to take advantage of BigQuery’s architecture.
Goal: Keep track of the median number of problems solved, problems attempted, and hints taken across all users.
Every hour, the following queries are done to update the latest_medians
table:
Step 1: Extract the events from the logs table into a table called
new_event_totals
:
1 2 3 4 5 6 7 8 9 10 |
|
Step 2: Combine new_event_totals
with the previous full_event_totals
table to make the new full_event_totals
table:
1 2 3 4 5 6 7 8 9 |
|
Step 3: Find the median of each metric, and write the result to a table
called latest_medians
:
1 2 3 4 5 6 |
|
This code ends up working, but it doesn’t handle failure very well:
To solve all of these problems, just include a timestamp in each table’s name. The background job then takes as a parameter the particular hour to process, rather than trying to figure out what the “latest” hour is. Here’s what it would do if you run it with the hour from 6:00 to 7:00 on July 1:
Step 1: Read from logs_2014_07_01_06
(the logs for 6:00 to 7:00 on July
1) and write to the table new_event_totals_logs_2014_07_01_06
(the new events
for 6:00 to 7:00 on July 1).
Step 2: Read from new_event_totals_logs_2014_07_01_06
and full_event_totals_2014_07_01_06
and write to the table
full_event_totals_2014_07_01_07
(the full totals as of 7:00 on July 1).
Step 3: Read from full_event_totals_2014_07_01_07
and write to the table
latest_medians_2014_07_01_07
(the medians as of 7:00 on July 1).
The job takes the hour to process as a parameter, and reads the previous hour’s tables to generate that hour’s tables. Making three new tables per hour may seem wasteful, but it’s actually just as easy and cheap as the previous scheme. The main problem is that the tables will just accumulate over time, so you’ll rack up storage costs. Fortunately, BigQuery makes it easy to give an expiration time to tables, so you can set them to be automatically deleted after a week (or however long you want to keep them).
The core BigBingo job has 7 queries/tables instead of 3, but it is designed with the same strategy of keeping all old tables, and this strategy has helped tremendously and kept BigBingo’s data consistent in the face of all sorts of errors:
The system is completely foolproof: I could replace cron with a thousand monkeys repeatedly triggering BigBingo jobs with random UNIX timestamps, and the system would still eventually make progress and remain completely consistent (although it would be a little less cost-efficient). That level of safety means I can stop worrying about maintenance and focus on more important things.
Ideally, BigBingo would be a self-contained open-source library, but it currently has enough dependencies on internal KA infrastructure that it’s both hard to make general and would be a bit difficult to use in isolation anyway.
That said, there’s no reason I can’t share the code, so here’s a Gist with pretty much all of the code (at the time of this blog post). I put an MIT license on it, so feel free to base work off of it or use any of the self-contained pieces.
Khan Academy has lots of open-source projects, and it’s not out of the question for BigBingo to be made truly open source in the future, so let me know in the comments if you think you would use it.
Curious about any more details? Think we’re doing A/B testing all wrong? Let me know in the comments!
]]>