Alan Pierce

Adventures in Go: Accessing Unexported Functions


I’ve been learning the Go programming language in my favorite way: by writing a Go interpreter in Go. The source code so far is on GitHub, but the point of this post is to tell the story of a particularly interesting challenge I ran into: programmatically accessing unexported (a.k.a. private) functions in other packages. In the process of figuring this out, I learned about lots of escape hatches and limitations of Go. The result of the adventure is a little library I made and put on GitHub called go-forceexport. If you want a TLDR, you can just read the source code, but hopefully you’ll find the adventure interesting as well!

My perspective in this post is someone who has plenty of experience with programming both low-level and high-level languages, but is new to the Go language and curious about its internals and how it compares to other languages. In both the exploration process and the process of writing this post, I learned quite a bit about the right way to think about Go. Hopefully by reading this you’ll be able to learn some of those lessons and also share the same curiosity.

What are unexported functions and why would I want to call one?

In Go, capitalization matters, and determines whether a name can be accessed from the outside world. For example:

func thisFunctionIsUnexported() {
    fmt.Println("This function starts with a 't' and can only be called from this package.")
}

func ThisFunctionIsExported() {
    fmt.Println("This function starts with a 'T' and can be called from anywhere.")
}

Other languages use the terms “private” and “public” for this distinction, but in Go, they’re called unexported and exported.

But what about when you just want to hack and explore, and you can’t easily modify the code in question? In Python, you might see a name starting with an underscore, like _this_function_is_private, meaning that it’s rude to call it from the outside world, but the runtime doesn’t try to stop you. In Java, you can generally defeat the private keyword using reflection and the setAccessible method. Neither of these are good practice in professional code, but the flexibility is nice if you’re trying to figure out what’s going wrong in a library or if you want to build a proof of concept that you’ll later make more professional.

It also can be used as a substitute when other ways of exploring aren’t available. In Python, nothing is compiled, so you can add print statements to the standard library or hack the code in other ways and it’ll just work. Java has an excellent debugging story, so you can learn a lot about library code by stepping through it in an IDE. In Go, neither of these approaches is very pleasant (as far as I’ve seen), so calling internal functions can sometimes be the next best thing.

In my specific case, the milestone I was trying to achieve was for my interpreter to be able to successfully run the time.Now function in the standard library. Let’s take a look at the relevant part of time.go:

// Provided by package runtime.
func now() (sec int64, nsec int32)

// Now returns the current local time.
func Now() Time {
    sec, nsec := now()
    return Time{sec + unixToInternal, nsec, Local}
}

The unexported function now is implemented in assembly language and gets the time as a pair of primitive values. The exported function Now wraps that result in a struct called Time with some convenience methods on it (not shown).

So what does it take to get an interpreter to correctly evaluate time.Now? We’ll need at least these pieces:

  • Parse the file into a syntax tree. Go’s parser and ast packages are a big help here.
  • Transform the struct definition for Time (defined elsewhere in the file) and its methods into some representation known to the interpreter.
  • Implement multiple assignment, struct creation, and the other operations used in Now.
  • While evaluating the line sec, nsec := now(), my interpreter should notice that now doesn’t have a Go implementation, and prefer to just call the real time.now. (There are other possible approaches, but this one seemed reasonable.)

To prove that that last bullet point was possible, I wanted to write a quick dummy program that just called time.now (even if it needed some hacky mechanism), but this ended up being a lot harder than I was expecting. Most discussions on the internet basically said “don’t do that”, but I decided that I wouldn’t give up so easily.

A related goal is that I wanted a way to take a string name of a function and get that function back. It’s worth noting that it’s totally unclear if I should expect this problem to be solvable in the first place. In C, there’s no way to do it, in Java it’s doable, and in higher-level scripting languages it’s typically pretty easy. Go seems to be somewhere between C and Java in terms of the reflection capabilities that I expect, so I might be attempting something that simply can’t be done.

Attempt #1: reflect

Reflection is the answer in Java, so maybe in Go it’s the same way? Sure enough, Go has a great reflect package that works in a lot of cases, and even lets you read unexported struct fields by name, but it doesn’t seem to have any way to provide access to top-level functions (exported or unexported).

In a language like Python, an expression like time.now would take the time object and pull off a field called now. So you might hope to do something like this:

reflect.ValueOf(time).MethodByName("now").Call([]reflect.Value{})

But alas, in Go, time.now is resolved at compile time, and time isn’t its own object that can be accessed like that. So it seems like reflect doesn’t provide an easy answer here.

Attempt #2: runtime

While I was exploring, I noticed runtime.FuncForPC as a way to programmatically get the name of any function:

runtime.FuncForPC(reflect.ValueOf(f).Pointer()).Name()

I dug into the implementation, and sure enough, the Go runtime package keeps a table of all functions and their names, provided by the linker. Relevant snippets from symtab.go:

var firstmoduledata moduledata  // linker symbol

func findmoduledatap(pc uintptr) *moduledata {
    for datap := &firstmoduledata; datap != nil; datap = datap.next {
        ...
    }
}

func findfunc(pc uintptr) *_func {
    datap := findmoduledatap(pc)
    ...
}

func FuncForPC(pc uintptr) *Func {
    return (*Func)(unsafe.Pointer(findfunc(pc)))
}

The moduledata struct isn’t particularly friendly, but it looks like if I could access it, then I should, theoretically, be able to loop through it to find a pointer to a function with name "time.now". With a function pointer, it should hopefully be possible to find a way to call it.

Unfortunately, we’re at the same place we started. I can’t access firstmoduledata, findmoduledatap, or findfunc for the same reason that I can’t access time.now. I looked through the package to find some place where maybe it leaks a useful pointer, but I couldn’t find anything. Drat.

If I were desperate, I might attempt to guess function pointers and call FuncForPC on each until I found one with the right name. But that seemed like a recipe for disaster, so I decided to look at other approaches.

Attempt #3: jump down to assembly language

An escape hatch that should definitely work is to just write my code in assembly language. It should be possible to write an assembly function that calls time.now, then connect that function to a Go function. I cloned the Go source code and took a look at the Darwin AMD64 implementation of time.now itself to see what it was like:

// func now() (sec int64, nsec int32)
TEXT time·now(SB),NOSPLIT,$0-12
    CALL    nanotime<>(SB)

    // generated code for
    //    func f(x uint64) (uint64, uint64) { return x/1000000000, x%100000000 }
    // adapted to reduce duplication
    MOVQ    AX, CX
    MOVQ    $1360296554856532783, AX
    MULQ    CX
    ADDQ    CX, DX
    RCRQ    $1, DX
    SHRQ    $29, DX
    MOVQ    DX, sec+0(FP)
    IMULQ   $1000000000, DX
    SUBQ    DX, CX
    MOVL    CX, nsec+8(FP)
    RET

Ugh. Maybe I shouldn’t be scared off by assembly, but learning the calling conventions and writing a separate wrapper for every function I wanted to call for every architecture didn’t seem pleasant. I decided to defer that idea and look at other options. Having a solution in pure Go would certainly be ideal.

Attempt #4: CGo

Another escape hatch that seemed promising is CGo, which is Go’s mechanism for directly calling C functions from Go code. Here’s a first attempt:

/*
int time·now();
int time_now_wrapper() {
    return time·now();
}
*/
import "C"

func callTimeNow() {
    C.time_now_wrapper()
}

And here’s the error that it gives:

Undefined symbols for architecture x86_64:
  "_time·now", referenced from:
      _time_now_wrapper in foo.cgo2.o
      __cgo_b552a62968a6_Cfunc_time_now_wrapper in foo.cgo2.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Hmm, it seems to be putting an underscore before every function name, which isn’t really what I want. Maybe there’s a way around that, but dealing with multiple return values in time·now seemed like it may be another barrier, and from my reading, CGo calls have a lot of overhead because it’s doing a lot of translation work so that you can integrate with existing C code. In Java speak, it seems like it’s JNA, not JNI. So while CGo seems useful, it looks like it’s not really the solution to my problem.

Attempt #5: go:linkname

As I was digging through the standard library source code I saw something interesting in the runtime package in stubs.go:

//go:linkname time_now time.now
func time_now() (sec int64, nsec int32)

Interesting! I had seen semantically-meaningful comments like this before (like with the CGo example, and also in other places), but I hadn’t seen this one. It looks like it’s saying “linker, please use time.now as the implementation of runtime.time_now”. Sure enough, the documentation suggests that this works, as long as your file imports unsafe. So I tried it out:

package main

import (
    "fmt"
    _ "unsafe"
)

//go:linkname time_now time.now
func time_now() (sec int64, nsec int32)

func main() {
    sec, nsec := time_now()
    fmt.Println(sec, nsec)
}

Let’s see what happens:

# command-line-arguments
./sandbox.go:9: missing function body for "time_now"

Drat. Isn’t a missing function body the whole point here? The bodyless syntax exists precisely to allow for externally-implemented functions, and the spec certainly seems to think that it’s valid.

Just to see what would happen, I replaced the empty function body with a dummy implementation:

//go:linkname time_now time.now
func time_now() (sec int64, nsec int32) {
    return 0, 0
}

Then I tried again and got this error:

# command-line-arguments
2016/03/16 22:50:31 duplicate symbol time.now (types 1 and 1) in main and /usr/local/Cellar/go/1.6/libexec/pkg/darwin_amd64/runtime.a(sys_darwin_amd64)

When I don’t implement the function, it complains that there’s no implementation, but when I do implement it, it complains that the function is implemented twice! How frustrating!

Getting go:linkname to work

For a while, it seemed like the go:linkname approach wasn’t going to work out, but then I noticed something suspicious: the error formatting is different. It looks like the “missing function body” error is from the compiler, but the “duplicate symbol” error is from the linker. Why would the compiler care about a function body being missing, if it’s the linker’s job to make sure every symbol gets an implementation?

I decided to dig into the code for the compiler to see why it might be generating this error. Here’s what I found in pgen.go:

func compile(fn *Node) {
    ...
    if len(fn.Nbody.Slice()) == 0 {
        if pure_go != 0 || strings.HasPrefix(fn.Func.Nname.Sym.Name, "init.") {
            Yyerror("missing function body for %q", fn.Func.Nname.Sym.Name)
            return
        }

Something is causing that inner if statement to evaluate to true, and my function doesn’t have anything to do with init, so it looks like pure_go is nonzero when it should be zero. Searching for pure_go turns up this compiler flag:

obj.Flagcount("complete", "compiling complete package (no C or assembly)", &pure_go)

Makes sense: if your code doesn’t have any way of defining external functions, then it’s friendlier to give an error at compile time with the location of the problem. But it looks like go:linkname was overlooked somewhere in the process. It certainly is a bit of an edge case.

After some searching, I found the culprit in build.go:

extFiles := len(p.CgoFiles) + len(p.CFiles) + len(p.CXXFiles) + len(p.MFiles) + len(p.FFiles) + len(p.SFiles) + len(p.SysoFiles) + len(p.SwigFiles) + len(p.SwigCXXFiles)
...
if extFiles == 0 {
    gcargs = append(gcargs, "-complete")
}

So it’s just counting the number of non-Go files of each type. Since I’m only compiling with Go files, it assumes that every function needs a body. But on the plus side, the code suggests a workaround: just add a file of any of those types. I already know how to use CGo, so let’s try that:

package main

import (
    "C"
    "fmt"
    _ "unsafe"
)

//go:linkname time_now time.now
func time_now() (sec int64, nsec int32)

func main() {
    sec, nsec := time_now()
    fmt.Println(sec, nsec)
}

And here’s what happens when you try to build that:

main.main: call to external function main.time_now
main.main: main.time_now: not defined
main.main: undefined: main.time_now

A different error! Now the linker is complaining that the function doesn’t exist. After some experimentation, I discovered that CGo seems to cause go:linkname to be disabled for that file. If I remove the import of "C", move it to another file, and compile the two files together, I get this output:

1458197809 407398202

It worked! If your only goal is to get access to time.now, then this is good enough, but I’m hoping that I can go a bit further.

Looking up functions by name

Now that I know that go:linkname works, I can use it to access the firstmoduledata structure mentioned in attempt #2, which is a table containing information on all compiled functions in the binary. My hope is that I can use it to write a function that takes a function name as a string, like "time.now", and provides that function.

One problem is that runtime.firstmoduledata has type runtime.moduledata, which is an unexported type, so I can’t use it in my code. But as a total hack, I can just copy the struct to my code (or, at least, enough of it to keep the alignment correct) and pretend that my struct is the real thing. From there, I can pretty much copy the code from the runtime package to do a full scan through the list of functions until I find the right one:

func FindFuncWithName(name string) (uintptr, error) {
    for moduleData := &Firstmoduledata; moduleData != nil; moduleData = moduleData.next {
        for _, ftab := range moduleData.ftab {
            f := (*runtime.Func)(unsafe.Pointer(&moduleData.pclntable[ftab.funcoff]))
            if f.Name() == name {
                return f.Entry(), nil
            }
        }
    }
    return 0, fmt.Errorf("Invalid function name: %s", name)
}

This seems to work! This code:

ptr, _ := FindFuncWithName("math.Sqrt")
fmt.Printf("Found pointer 0x%x\nNormal function: %s", ptr, math.Sqrt)

prints this:

Found pointer 0x104250
Normal function: %!s(func(float64) float64=0x104250)

So the underlying code pointer is correct! Now we just need to figure out how to use it…

Calling a function by pointer

You would think that having a function pointer would be the end of the story. In C you could just cast the pointer value to the right function type, then call it. But Go isn’t quite so generous. For one, Go normally doesn’t just let you cast between types like that, but unsafe.Pointer can be used to circumvent some safety checks. You might try just casting it to a function of the proper type:

ptr, _ := FindFuncWithName("time.now")
timeNow := (func() (int64, int32))(unsafe.Pointer(ptr))
sec, msec := timeNow()

But that type of cast doesn’t compile; pointers can’t be cast to functions, not even using unsafe.Pointer. What if we literally cast it to a pointer to a func type?

ptr, _ := FindFuncWithName("time.now")
timeNow := (*func() (int64, int32))(unsafe.Pointer(ptr))
sec, msec := (*timeNow)()

This compiles, but crashes at runtime:

unexpected fault address 0xb01dfacedebac1e
fatal error: fault
[signal 0xb code=0x1 addr=0xb01dfacedebac1e pc=0x7e450]

(Look at that fault address. Apparently someone had a sense of humor.)

This isn’t a surprising outcome; functions in Go are first-class values, so their implementation is naturally more interesting than in C. When you pass around a func, you’re not just passing around a code pointer, you’re passing around a function value of some sort, and we’ll need to come up with a function value somehow if we’re to have any hope of calling our function. That function value needs to have our pointer as its underlying code pointer.

I didn’t see any obvious ways to create a function value from scratch, so I figured I’d take a different approach: take an existing function value and hack its code pointer to be the one I want. After spending some time reading about how interfaces work in Go and reading the implementation of the reflect library, a promising approach emerged: treat the function as an interface{} (that’s Go’s equivalent of Object or void* or any: a type that includes every other type), which is internally stored as a (type, pointer) pair. From there, I could pull the pointer off and work with it reliably. The reflect source code suggests that the code pointer (the pointer to the actual machine code) is the first value in a function object.

So, as a first attempt, I created a dummy function called timeNow then defined some structs to make it easy to swap out its code pointer with the real time.now code pointer:

func timeNow() (int64, int32) {
    return 0, 0
}

type Interface struct {
    typ     unsafe.Pointer
    funcPtr *Func
}

type Func struct {
    codePtr uintptr
}

func main() {
    timeNowCodePtr, _ := FindFuncWithName("time.now")
    var timeNowInterface interface{} = timeNow
    timeNowInterfacePtr := (*Interface)(unsafe.Pointer(&timeNowInterface))
    timeNowInterfacePtr.funcPtr.codePtr = timeNowCodePtr
    sec, msec := timeNow()
    fmt.Println(sec, msec)
}

And, as you might guess, it crashed:

unexpected fault address 0x129e80
fatal error: fault
[signal 0xa code=0x2 addr=0x129e80 pc=0x20be]

After some experimenting, I discovered that the crash was happening even without calling the function. The crash was from the line timeNowInterfacePtr.funcPtr.codePtr = timeNowCodePtr. After double-checking that the pointers were what I expected, I realized the problem: the function object I was modifying was probably in the code segment, in read-only memory. Just like how the machine code isn’t going to change, Go expects that the timeNow function value isn’t going to change at runtime. What I really needed to do was allocate a function object on the heap so that I could safely change its underlying code pointer.

So how do you dynamically allocate a function in Go? That’s what lambdas are for, right? Let’s try using one! Instead of the top-level timeNow, we can write our main function like this (the only difference is the new definition of timeNow):

func main() {
    timeNowCodePtr, _ := FindFuncWithName("time.now")
    timeNow := func() (int64, int32) { return 0, 0 }
    var timeNowInterface interface{} = timeNow
    timeNowInterfacePtr := (*Interface)(unsafe.Pointer(&timeNowInterface))
    timeNowInterfacePtr.funcPtr.codePtr = timeNowCodePtr
    sec, msec := timeNow()
    fmt.Println(sec, msec)
}

And, again, it crashes. I’ve seen how lambdas work in other languages, so I suspected why: when a lambda captures no outside variables, there’s no need to allocate a new function object each time, so a common optimization is to use a single shared instance for simple lambdas like the one I wrote. In other words, I was probably writing to the code segment again. To work around this, we can trick the compiler into allocating a new function object each time by making the function a real closure and pulling in a variable from the outer scope (even a trivial one):

func main() {
    timeNowCodePtr, _ := FindFuncWithName("time.now")
    var x int64 = 0
    timeNow := func() (int64, int32) { return x, 0 }
    var timeNowInterface interface{} = timeNow
    timeNowInterfacePtr := (*Interface)(unsafe.Pointer(&timeNowInterface))
    timeNowInterfacePtr.funcPtr.codePtr = timeNowCodePtr
    sec, msec := timeNow()
    fmt.Println(sec, msec)
}

And it works!

1458245880 151691912

Turning it into a library

This code is almost useful, but wouldn’t really work as a library yet because it would require the function’s type to be hard-coded into the library. We could have the library caller pass in a function that will be modified, but that has gotchas like the read-only memory problem I ran into above.

Instead, I looked around at possible API approaches, and I got some nice inspiration from the example code for reflect.MakeFunc.

We’ll try writing a GetFunc function that can be used like this:

var timeNow func() (int64, int32)
GetFunc(&timeNow, "time.now")
sec, msec := timeNow()

But how can GetFunc allocate a function value? Above, we used a lambda expression, but that doesn’t work if the type isn’t known until runtime.

Reflection to the rescue! We can call reflect.MakeFunc to create a function value with a particular type. In this case, we don’t really care what the implementation is because we’re going to be modifying its code pointer anyway. We end up with a reflect.Value object with a memory layout like this:

Function pointer layout

The ptr field in the reflect.Value definition is unexported, but we can use reflection on the reflect.Value to get it, then treat it as a pointer to a function object, then modify that function object’s code pointer to be what we want. The full code looks like this:

type Func struct {
    codePtr uintptr
}

func CreateFuncForCodePtr(outFuncPtr interface{}, codePtr uintptr) {
    // outFuncPtr is a pointer to a function, and outFuncVal acts as *outFuncPtr.
    outFuncVal := reflect.ValueOf(outFuncPtr).Elem()
    newFuncVal := reflect.MakeFunc(outFuncVal.Type(), nil)
    funcValuePtr := reflect.ValueOf(newFuncVal).FieldByName("ptr").Pointer()
    funcPtr := (*Func)(unsafe.Pointer(funcValuePtr))
    funcPtr.codePtr = codePtr
    outFuncVal.Set(newFuncVal)
}

And that’s it! That function modifies its argument to be the function at codePtr. Implementing the main GetFunc API is just a matter of tying together FindFuncWithName and CreateFuncForCodePtr; details are in the source code.

Future steps and lessons learned

This API still isn’t ideal; the library user still needs to know the type in advance, and if they get it wrong, there will be horrible consequences at runtime. At the end of the day, the library isn’t significantly more useful than go:linkname, but it has some advantages, and is a good starting point for more interesting tricks. It’s potentially possible, but probably harder, to make a function that takes a string and returns a reflect.Value of the function, which would be ideal. But that’s out of scope for now. Also, the README has a number of other warnings and things to consider. For example, this approach will sometimes completely break due to function inlining.

Go is certainly in an interesting place in the space of languages. It’s dynamic enough that it’s not crazy to look up a function by name, but it’s much more performance-focused than, say, Java. The reflection capabilities are good for a systems language, but sometimes the better escape hatch is to just use unsafe.Pointer.

I’d be happy to hear any feedback or corrections in the comments below. Like I mentioned, I’m still learning all of this stuff, so I probably overlooked some things and got some terminology wrong.

BigBingo: Khan Academy's New BigQuery-backed A/B Testing Framework


In late January, I joined Khan Academy as the third member of the infrastructure team. We were just starting a big performance push, so I spent my first week or two improving our profiling tools and finding and fixing some easy slow spots (including speeding up the home page by over a second). However, every time I profiled any page, I found that the A/B testing framework, GAE/Bingo, was always one of the slowest pieces. I had some ideas on how to make some incremental improvements to speed it up, but instead, I was given a much more ambitious project: to rethink and rewrite the whole A/B testing system from scratch. I had plenty of sources of guidance and advice, but I, the new guy, was to be the sole owner and author of the new system. I was up for the task, but it was nevertheless a bit daunting.

Instead of the old strategy of keeping the A/B test data continuously up-to-date using memcache (and periodically flushing to the App Engine datastore), the new system would report events by simply logging them, and those log statements would eventually make their way into Google BigQuery through an hourly MapReduce job based on log2bq. From there, the real A/B test processing would be done completely in SQL using BigQuery queries. Since we were revamping GAE/Bingo using BigQuery, there was an obvious name: BigBingo.

Of course, that three-sentence description leaves out pretty much all of the details and makes some dangerous assumptions, but the high-level plan ended up working (with some tweaks), and I’m happy to say that all A/B tests at Khan Academy are now running under BigBingo, and the last remnants of the old GAE/Bingo system are finally being removed. In this post, I’ll talk about why a rewrite was so important, how we think about A/B testing, and some specific points of the design and architecture of BigBingo. There are some additional cool details that are probably deserving of their own blog post, so look out for those in the future.

BigBingo is fast!

Most developers at Khan Academy had a sense that the old GAE/Bingo system was slow and BigBingo would improve overall performance, but I doubt anybody expected that the improvement would be as dramatic as it was. When I finally flipped the switch to turn off GAE/Bingo, the average latency across all requests went from a little over 300ms to a little under 200ms. The most important pages had even better results, but I’ll let the pictures do the talking:

The logged-in homepage got twice as fast:

Logged-in homepage percentiles

The logged-out homepage improved even more:

Logged-out homepage percentiles

And our memcache went from “worryingly overloaded” to “doing great”:

Memcache compute units

Of course, making the site faster makes users happier, but it has another big benefit: cost savings. If requests can be processed twice as fast, we only need half as many App Engine instances running at a given time, so our App Engine bill drops significantly. Since Khan Academy is a nonprofit running off of donations, it’s important to us to have an efficient infrastructure so we can focus our money on improving education, not upkeep.

A/B testing at Khan Academy

A/B testing isn’t just some occasional tool at Khan Academy; it’s an important part of our engineering culture, and almost any change that we care about goes through an A/B test first, often multiple A/B tests. Right now, there are 57 A/B tests actively running, which is an average of about two active A/B tests per developer.

Unlike “traditional” A/B testing (which tends to maximize simple metrics like ad clicks, purchases, etc.), Khan Academy’s A/B testing tries to maximize student learning. That means that we try out much more advanced changes than just little UI tweaks, and measuring success is a huge challenge by itself. Here are some examples of A/B tests we do:

  • We have a sophisticated system that tries to understand a student’s knowledge level and recommend the best exercises for them, and we’re always making little improvements to it. For example, we recently tried out a new system to detect when users are below their learning edge and advance them through the material more quickly. Learners under the new system progressed further, as expected, and they almost always stayed at their advanced level rather than being demoted, so we rolled out the new algorithm to all users.
  • We’ve been experimenting with providing message snippets to teach our users that learning makes them not just more knowledgeable, but smarter as well. This specific motivational approach turns out to be surprisingly effective, and results in increased site usage and learning outcomes, so we’re trying out various different approaches to deliver the message in the most effective way.
  • We recently switched the homepage background to one we liked better. It didn’t improve any metrics noticeably, but the A/B test verified that it didn’t hurt anything either, so we kept the new background. We run lots of little experiments like this one.

What’s different about BigBingo?

In the years since GAE/Bingo was written, the devs at KA learned a thing or two about the right way to do A/B testing and what an A/B testing framework should really do, so BigBingo diverges from GAE/Bingo in a few important ways.

The data

Here’s what you’d see when looking at the latest results of an old GAE/Bingo experiment (I added a red box to indicate the “real” data; everything else is derived from those numbers):

GAE/Bingo Dashboard

For clear-cut results, a few numbers will do just fine, but what do you do when the results are unexpected or completely nonsensical? In GAE/Bingo, the best thing you could do was shrug and speculate about what happened. BigBingo is different: we keep around all raw results (user-specific conversion totals) as well as the source logs and the intermediate data used to determine those results. Since it’s all in BigQuery, investigating anomalies is just a matter of doing some digging using SQL.

Keeping the raw data also makes it easy to do more advanced analysis after-the-fact:

  • Instead of just using the mean number of conversions, you can look at more interesting statistics like the median, percentiles, and standard deviation, and you can ignore outliers.
  • You can cross-reference A/B test participation with more sophisticated metrics, like the learning gain metric that the data science team is working on.
  • You can segment your analysis based on any property you can come up with. For example, you might want to focus on only new users or only long-term users.

Some other differences

  • Instead of experiments needing to pick their metrics up-front, every experiment automatically tracks every conversion (currently we have about 200 of them).
  • Since KA already has a culture of A/B testing, BigBingo encourages high-quality experiments rather than focusing on making experiments as easy as possible. Every experiment has an owner assigned and a description explaining what the experiment is for and the experimental hypothesis. When an A/B test is stopped, the author is forced to fill in a conclusion. Whenever an experiment starts or finishes, a notification is sent to the entire team, so it’s easy to see what kinds of ideas everyone else is trying out and how they are going.
  • BigBingo doesn’t try to be real-time, which makes the implementation much simpler. After all, up-to-the-minute A/B test results are pretty useless anyway.
  • The use of memcache counters added a little bit of complexity to GAE/Bingo, which I was happy to get rid of. Not only were the details complex, but running BigBingo and GAE/Bingo side-by-side revealed some previously unknown race conditions in GAE/Bingo.

Implementation

Here’s a big-picture overview of what BigBingo looks like:

The architecture

Here’s how the data flows from a user-facing request to BigQuery, then to the dashboard UI:

  1. When a user enters into an A/B test, that event is recorded through a log statement. The user’s alternative is chosen through a deterministic function similar to hash(experiment_name + user_id) % num_alternatives, so no RPCs are necessary to coordinate that information.
  2. When a user triggers a conversion event, it is recorded through a log statement.
  3. In the hourly LogToBigQuery log export process, the raw log events (called “bingo events”) are parsed and extracted into custom BigQuery columns to be included in the normal request logs tables.
  4. Every two hours, the BigBingo Summarize task runs and processes the new logs to compute the latest A/B test numbers, following a few rules:
    • If a user participates multiple times in an A/B test (which is common), only the earliest event counts.
    • A conversion event only counts for an experiment if the event happened after the user first participated in the experiment.
    • For each conversion, BigBingo computes both the total number of times the conversion was triggered and the total number of distinct users that triggered the conversion.
  5. The latest data is cleaned up and copied to a “publish” dataset where it can be conveniently accessed.
  6. The BigBingo dashboard, a web UI, queries these results to display all data about a given experiment: the historical participant and conversion numbers, as well as p-values for each alternative.
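
Step 1's deterministic assignment can be sketched in a few lines of Python. This is just an illustration of the hash(experiment_name + user_id) % num_alternatives idea, not KA's actual code; the hash function, experiment name, and user ID here are all made up:

```python
import hashlib

def choose_alternative(experiment_name, user_id, alternatives):
    """Deterministically pick an alternative for this (experiment, user) pair."""
    digest = hashlib.md5((experiment_name + user_id).encode("utf-8")).hexdigest()
    return alternatives[int(digest, 16) % len(alternatives)]

# The same inputs always give the same bucket, so no RPCs or stored state
# are needed to remember which alternative a user was assigned to.
alt1 = choose_alternative("new-tutorial-flow", "user-42", ["control", "treatment"])
alt2 = choose_alternative("new-tutorial-flow", "user-42", ["control", "treatment"])
assert alt1 == alt2
```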

Most of the details are reasonably straightforward, but I’ll dig into what’s probably the most controversial aspect of this architecture: the decision to use Google BigQuery for all storage and processing.

About BigQuery

If you’re not familiar with BigQuery, it’s a hosted Google service (really an externalization of an internal Google service called Dremel) that allows you to import giant datasets and run nearly-arbitrary SQL queries on them. BigQuery is way faster than MapReduce-based SQL engines like Hive: you’re borrowing thousands of machines from Google for just the duration of your query, and all work is done in-memory, so queries tend to finish in just a few seconds. The primary use case for BigQuery is for human users to manually dig into data, but I’ll show how it can also be used to build stateful data pipelines.

BigQuery supports nearly all SQL, but don’t let that fool you into thinking it’s anything close to a relational database! It has a small set of primitives that’s different from anything I’ve worked with before:

| Operation | Price |
| --------- | ----- |
| Import CSV/JSON data into a table | Free |
| Run a SELECT query | 0.5 cents per GB in all columns touched |
| Store a query result as a new table | Free |
| Append query results to the end of a table | Free |
| Copy a table | Free |

There are a few other, less common operations, but the ones listed above cover the vast majority of usage.

Notice anything missing? No transactions? Not even a way to update or delete rows? No way to pull out a single row without paying for the whole table? How can you possibly keep track of A/B test results in such a restricted system? You’re pretty much stuck with the following rule:

To update a table, you must completely rebuild it from scratch with the new values.

That’s crazy, right?

It certainly feels like an architectural sin to rebuild all of your data over and over, but it’s not as unreasonable as you might think. BigQuery is quite cost-efficient (some rough numbers suggest that it’s more than 10x as cost-efficient as MapReduce running on App Engine), and there are lots of little tricks you can do to reduce the size of your tables. By designing the table schemas with space-efficiency in mind, I was able to reduce BigBingo’s data usage from 1TB ($5 per query) to 50GB (25 cents per query). (I’ll go over the details in a future blog post.)
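
To make those numbers concrete, here's the arithmetic at the SELECT price of 0.5 cents per GB touched. This is a rough sketch that ignores storage costs and the free operations:

```python
PRICE_CENTS_PER_GB = 0.5  # SELECT query price from the table above

def query_cost_dollars(gb_touched):
    """Cost of a single SELECT that touches the given number of gigabytes."""
    return gb_touched * PRICE_CENTS_PER_GB / 100

print(query_cost_dollars(1000))  # 1 TB of data: $5.00 per query
print(query_cost_dollars(50))    # 50 GB after schema optimization: $0.25 per query
```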

There are also some huge usability advantages to using BigQuery over another batch processing system like MapReduce:

  • When I was designing the queries and schema, I could try things out on real production data from within the BigQuery web UI and get results back in seconds. This meant that I could work through almost all architectural details before having to write a line of Python code.
  • Once I did start to write code, I could run the full job completely from my laptop, with no need to push code out to servers in order to iterate. Whenever a query had a problem, it showed up in the “Query History” section of the BigQuery web UI, and I could easily debug it there.
  • Sanity-checking the intermediate steps and hunting down problems in the data was easy because everything was immediately accessible through SQL.

Taking advantage of immutable storage

At first, having to deal with only immutable tables felt like an annoying restriction that I just had to live with, but as soon as I started thinking about making the system robust, immutability was a huge benefit. When thinking through the details, I discovered some important lessons:

  • Never append to the end of a table. Keep tables immutable and queries idempotent.
  • A table’s name should exactly define its contents.

This is probably best explained by looking at a simple data pipeline similar to BigBingo. First, I’ll give a straightforward but fragile approach, then show how it can be improved to take advantage of BigQuery’s architecture.

Goal: Keep track of the median number of problems solved, problems attempted, and hints taken across all users.

Every hour, the following queries are done to update the latest_medians table:

Step 1: Extract the events from the logs table into a table called new_event_totals:

-- Results are written to new_event_totals.
SELECT
    user_id,
    -- Count the number of times the event column matches each event name.
    SUM(event = "problem_correct") AS problem_correct_count,
    SUM(event = "problem_attempt") AS problem_attempt_count,
    SUM(event = "hint_taken") AS hint_taken_count,
FROM logs_2014_07_01
WHERE time >= 1404194400 AND time < 1404198000
GROUP EACH BY user_id  -- GROUP EACH BY is just a large-scale GROUP BY

Step 2: Combine new_event_totals with the previous full_event_totals table to make the new full_event_totals table:

-- Results are written to full_event_totals by querying to a temp table and
-- copying over full_event_totals.
SELECT
    user_id,
    SUM(problem_correct_count) AS problem_correct_count,
    SUM(problem_attempt_count) AS problem_attempt_count,
    SUM(hint_taken_count) AS hint_taken_count,
FROM new_event_totals, full_event_totals -- UNION ALL
GROUP EACH BY user_id

Step 3: Find the median of each metric, and write the result to a table called latest_medians:

-- Results are written to latest_medians.
SELECT
    NTH(50, QUANTILES(problem_correct_count, 100)) AS median_problems_correct,
    NTH(50, QUANTILES(problem_attempt_count, 100)) AS median_problems_attempted,
    NTH(50, QUANTILES(hint_taken_count, 100)) AS median_hints_taken,
FROM full_event_totals

This code ends up working, but it doesn’t handle failure very well:

  • Step 2 isn’t idempotent. For certain errors (e.g. a connection timeout when submitting the query), there’s no way to know for sure if it’s safe to retry, or if it succeeded in the first place.
  • If the job fails between steps 2 and 3, it can’t be safely retried, so you need to either manually re-run step 3 or live with out-of-date results for an hour.
  • If the job fails before step 2 finishes and isn’t retried before the next job runs, the full_event_totals table will lose all events from that hour.
  • If the logs weren’t successfully loaded into BigQuery, Step 1 will think that nothing happened in that hour and will silently compute the wrong results.

To solve all of these problems, just include a timestamp in each table’s name. The background job then takes as a parameter the particular hour to process, rather than trying to figure out what the “latest” hour is. Here’s what it would do if you run it with the hour from 6:00 to 7:00 on July 1:

Step 1: Read from logs_2014_07_01_06 (the logs for 6:00 to 7:00 on July 1) and write to the table new_event_totals_logs_2014_07_01_06 (the new events for 6:00 to 7:00 on July 1).

Step 2: Read from new_event_totals_logs_2014_07_01_06 and full_event_totals_2014_07_01_06 and write to the table full_event_totals_2014_07_01_07 (the full totals as of 7:00 on July 1).

Step 3: Read from full_event_totals_2014_07_01_07 and write to the table latest_medians_2014_07_01_07 (the medians as of 7:00 on July 1).

The job takes the hour to process as a parameter, and reads the previous hour’s tables to generate that hour’s tables. Making three new tables per hour may seem wasteful, but it’s actually just as easy and cheap as the previous scheme. The main problem is that the tables will just accumulate over time, so you’ll rack up storage costs. Fortunately, BigQuery makes it easy to give an expiration time to tables, so you can set them to be automatically deleted after a week (or however long you want to keep them).
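
The naming scheme is simple enough to sketch: given the hour to process, every input and output table name falls out deterministically. The helper below is illustrative, not the actual BigBingo code:

```python
import datetime

def tables_for_hour(hour):
    """Map the hour being processed to the pipeline's input/output table names."""
    fmt = "%Y_%m_%d_%H"
    this_hour = hour.strftime(fmt)
    next_hour = (hour + datetime.timedelta(hours=1)).strftime(fmt)
    return {
        # Step 1: read this hour's logs, write this hour's new event totals.
        "logs_in": "logs_" + this_hour,
        "new_totals_out": "new_event_totals_logs_" + this_hour,
        # Step 2: combine with the totals as of this hour to get next hour's totals.
        "full_totals_in": "full_event_totals_" + this_hour,
        "full_totals_out": "full_event_totals_" + next_hour,
        # Step 3: medians as of next hour.
        "medians_out": "latest_medians_" + next_hour,
    }

tables = tables_for_hour(datetime.datetime(2014, 7, 1, 6))
```

Because the names are a pure function of the hour, retrying a failed run (or accidentally running it twice) can't corrupt anything: the job just reads and rewrites exactly the same tables with the same contents.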

The core BigBingo job has 7 queries/tables instead of 3, but it is designed with the same strategy of keeping all old tables, and this strategy has helped tremendously and kept BigBingo’s data consistent in the face of all sorts of errors:

  • Various transient errors (connection timeouts, internal BigQuery errors, etc.) have caused the whole BigBingo job to occasionally fail, and in these cases, it’s always safe to just retry the job.
  • The log import process has sometimes failed and sometimes taken too long to run, and in both situations, BigBingo automatically fails (and sends an email reporting the failure) because the data it depends on isn’t ready yet.
  • Whenever BigBingo fails, all future BigBingo jobs fail (rather than computing incorrect data) until the data is caught up.
  • Sometimes two instances of the job end up running at the same time. Since the intermediate data is all timestamped, this doesn’t cause any problems.
  • One time, when retrying a failed job, I accidentally gave an incorrect UNIX timestamp. The wrong hour was processed, but it didn’t hurt data integrity at all.
  • In one or two cases, bugs have made the data actually incorrect for a while. Repairing the system is easy: just fix the bug and re-run the BigBingo job from before the bug was introduced.

The system is completely foolproof: I could replace cron with a thousand monkeys repeatedly triggering BigBingo jobs with random UNIX timestamps, and the system would still eventually make progress and remain completely consistent (although it would be a little less cost-efficient). That level of safety means I can stop worrying about maintenance and focus on more important things.

Where’s the source code?

Ideally, BigBingo would be a self-contained open-source library, but it currently has enough dependencies on internal KA infrastructure that it would be both hard to generalize and a bit difficult to use in isolation anyway.

That said, there’s no reason I can’t share the code, so here’s a Gist with pretty much all of the code (at the time of this blog post). I put an MIT license on it, so feel free to base work off of it or use any of the self-contained pieces.

Khan Academy has lots of open-source projects, and it’s not out of the question for BigBingo to be made truly open source in the future, so let me know in the comments if you think you would use it.

That’s all for now

Curious about any more details? Think we’re doing A/B testing all wrong? Let me know in the comments!