Data Race Patterns in Golang

Uber has adopted Golang (Go for short) as a primary programming language for developing microservices. Our Go monorepo consists of about 50 million lines of code and contains approximately 2,100 unique Go services, and both numbers are growing.

Go makes concurrency a first-class citizen; prefixing function calls with the go keyword runs the call asynchronously. These asynchronous function calls in Go are called goroutines. Developers hide latency (e.g., IO or RPC calls to other services) by creating goroutines. Two or more goroutines can communicate data either via message passing (channels) or shared memory. Shared memory happens to be the most commonly used means of data communication in Golang.
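To make the two communication styles concrete, here is a minimal sketch, not code from our services. The names fetchLen, viaChannel, and viaSharedMemory are hypothetical, and fetchLen only stands in for a slow IO or RPC call. The first function collects results through a channel; the second has goroutines update a shared variable guarded by a mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchLen is a hypothetical stand-in for a slow IO or RPC call.
func fetchLen(s string) int { return len(s) }

// viaChannel starts one goroutine per input and gathers the
// results by message passing over a buffered channel.
func viaChannel(inputs []string) int {
	results := make(chan int, len(inputs))
	for _, in := range inputs {
		go func(s string) { results <- fetchLen(s) }(in)
	}
	total := 0
	for range inputs {
		total += <-results // one receive per goroutine started
	}
	return total
}

// viaSharedMemory starts one goroutine per input and has each one
// add to a shared variable. The mutex orders the writes, and the
// WaitGroup orders them before the final read.
func viaSharedMemory(inputs []string) int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		total int
	)
	for _, in := range inputs {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			n := fetchLen(s)
			mu.Lock()
			total += n
			mu.Unlock()
		}(in)
	}
	wg.Wait()
	return total
}

func main() {
	inputs := []string{"riders", "drivers", "trips"}
	fmt.Println(viaChannel(inputs), viaSharedMemory(inputs))
}
```

Both functions compute the same total. The shared-memory version is correct only because every access to total is synchronized; dropping the mutex would leave the goroutines writing the same variable with no ordering between them.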

Goroutines are considered “lightweight,” and since they are easy to create, Go programmers use them liberally. As a result, we notice that programs written in Go typically expose significantly more concurrency than programs written in other languages. For example, by scanning hundreds of thousands of microservice instances running in our data centers, we found that Go microservices expose ~8x more concurrency than Java microservices. Unfortunately, higher concurrency also means more potential for concurrency bugs. A data race is a concurrency bug that occurs when two or more goroutines access the same datum, at least one of the accesses is a write, and there is no ordering between the accesses. Data races are insidious bugs and must be avoided at all costs.
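The definition can be illustrated with a small sketch. The function names racyCount and safeCount are hypothetical and this is not one of the races found at Uber. In racyCount, many goroutines read and write the same variable with nothing ordering those accesses; safeCount adds a mutex so that the accesses are ordered.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount contains a data race: every goroutine performs an
// unsynchronized read-modify-write on count. The WaitGroup only
// orders the goroutines before the final read, not among themselves,
// so the returned value may be lower than n.
func racyCount(n int) int {
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count++ // racy: concurrent writes to the same datum
		}()
	}
	wg.Wait()
	return count
}

// safeCount removes the race by guarding count with a mutex.
func safeCount(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		count int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	// Only the race-free version is called here; racyCount is kept
	// for comparison.
	fmt.Println(safeCount(1000))
}
```

The racy version often appears to work, which is part of what makes such bugs insidious. A dynamic detector, for example the race detector built into the Go toolchain and enabled with the -race flag, would typically report the conflicting accesses if racyCount were executed.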

Using a dynamic data race detection technique, we developed a system to detect data races at Uber. Over a period of six months, this system detected about 2,000 data races in our Go code base, of which our developers have already fixed about 1,100.

This blog post shows various data race patterns we found in our Go programs. The study behind it analyzed over 1,100 data races fixed by 210 unique developers over six months. We noticed that certain language design choices in Go make it easier to introduce data races, and that there is a complicated interplay between language features and data races.