diff options
Diffstat (limited to 'doc/articles/race_detector.html')
-rw-r--r-- | doc/articles/race_detector.html | 388 |
1 files changed, 388 insertions, 0 deletions
diff --git a/doc/articles/race_detector.html b/doc/articles/race_detector.html new file mode 100644 index 000000000..282db8ba4 --- /dev/null +++ b/doc/articles/race_detector.html @@ -0,0 +1,388 @@ +<!--{ + "Title": "Data Race Detector", + "Template": true +}--> + +<h2 id="Introduction">Introduction</h2> + +<p> +Data races are among the most common and hardest to debug types of bugs in concurrent systems. +A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. +See the <a href="/ref/mem/">The Go Memory Model</a> for details. +</p> + +<p> +Here is an example of a data race that can lead to crashes and memory corruption: +</p> + +<pre> +func main() { + c := make(chan bool) + m := make(map[string]string) + go func() { + m["1"] = "a" // First conflicting access. + c <- true + }() + m["2"] = "b" // Second conflicting access. + <-c + for k, v := range m { + fmt.Println(k, v) + } +} +</pre> + +<h2 id="Usage">Usage</h2> + +<p> +To help diagnose such bugs, Go includes a built-in data race detector. +To use it, add the <code>-race</code> flag to the go command: +</p> + +<pre> +$ go test -race mypkg // to test the package +$ go run -race mysrc.go // to run the source file +$ go build -race mycmd // to build the command +$ go install -race mypkg // to install the package +</pre> + +<h2 id="Report_Format">Report Format</h2> + +<p> +When the race detector finds a data race in the program, it prints a report. +The report contains stack traces for conflicting accesses, as well as stacks where the involved goroutines were created. +Here is an example: +</p> + +<pre> +WARNING: DATA RACE +Read by goroutine 185: + net.(*pollServer).AddFD() + src/pkg/net/fd_unix.go:89 +0x398 + net.(*pollServer).WaitWrite() + src/pkg/net/fd_unix.go:247 +0x45 + net.(*netFD).Write() + src/pkg/net/fd_unix.go:540 +0x4d4 + net.(*conn).Write() + src/pkg/net/net.go:129 +0x101 + net.func·060() + src/pkg/net/timeout_test.go:603 +0xaf + +Previous write by goroutine 184: + net.setWriteDeadline() + src/pkg/net/sockopt_posix.go:135 +0xdf + net.setDeadline() + src/pkg/net/sockopt_posix.go:144 +0x9c + net.(*conn).SetDeadline() + src/pkg/net/net.go:161 +0xe3 + net.func·061() + src/pkg/net/timeout_test.go:616 +0x3ed + +Goroutine 185 (running) created at: + net.func·061() + src/pkg/net/timeout_test.go:609 +0x288 + +Goroutine 184 (running) created at: + net.TestProlongTimeout() + src/pkg/net/timeout_test.go:618 +0x298 + testing.tRunner() + src/pkg/testing/testing.go:301 +0xe8 +</pre> + +<h2 id="Options">Options</h2> + +<p> +The <code>GORACE</code> environment variable sets race detector options. +The format is: +</p> + +<pre> +GORACE="option1=val1 option2=val2" +</pre> + +<p> +The options are: +</p> + +<ul> +<li> +<code>log_path</code> (default <code>stderr</code>): The race detector writes +its report to a file named <code>log_path.<em>pid</em></code>. +The special names <code>stdout</code> +and <code>stderr</code> cause reports to be written to standard output and +standard error, respectively. +</li> + +<li> +<code>exitcode</code> (default <code>66</code>): The exit status to use when +exiting after a detected race. +</li> + +<li> +<code>strip_path_prefix</code> (default <code>""</code>): Strip this prefix +from all reported file paths, to make reports more concise. +</li> + +<li> +<code>history_size</code> (default <code>1</code>): The per-goroutine memory +access history is <code>32K * 2**history_size elements</code>. +Increasing this value can avoid a "failed to restore the stack" error in reports, at the +cost of increased memory usage. +</li> + +<li> +<code>halt_on_error</code> (default <code>0</code>): Controls whether the program +exits after reporting first data race. +</li> +</ul> + +<p> +Example: +</p> + +<pre> +$ GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race +</pre> + +<h2 id="Excluding_Tests">Excluding Tests</h2> + +<p> +When you build with <code>-race</code> flag, the <code>go</code> command defines additional +<a href="/pkg/go/build/#hdr-Build_Constraints">build tag</a> <code>race</code>. +You can use the tag to exclude some code and tests when running the race detector. +Some examples: +</p> + +<pre> +// +build !race + +package foo + +// The test contains a data race. See issue 123. +func TestFoo(t *testing.T) { + // ... +} + +// The test fails under the race detector due to timeouts. +func TestBar(t *testing.T) { + // ... +} + +// The test takes too long under the race detector. +func TestBaz(t *testing.T) { + // ... +} +</pre> + +<h2 id="How_To_Use">How To Use</h2> + +<p> +To start, run your tests using the race detector (<code>go test -race</code>). +The race detector only finds races that happen at runtime, so it can't find +races in code paths that are not executed. +If your tests have incomplete coverage, +you may find more races by running a binary built with <code>-race</code> under a realistic +workload. +</p> + +<h2 id="Typical_Data_Races">Typical Data Races</h2> + +<p> +Here are some typical data races. All of them can be detected with the race detector. +</p> + +<h3 id="Race_on_loop_counter">Race on loop counter</h3> + +<pre> +func main() { + var wg sync.WaitGroup + wg.Add(5) + for i := 0; i < 5; i++ { + go func() { + fmt.Println(i) // Not the 'i' you are looking for. + wg.Done() + }() + } + wg.Wait() +} +</pre> + +<p> +The variable <code>i</code> in the function literal is the same variable used by the loop, so +the read in the goroutine races with the loop increment. +(This program typically prints 55555, not 01234.) +The program can be fixed by making a copy of the variable: +</p> + +<pre> +func main() { + var wg sync.WaitGroup + wg.Add(5) + for i := 0; i < 5; i++ { + go func(j int) { + fmt.Println(j) // Good. Read local copy of the loop counter. + wg.Done() + }(i) + } + wg.Wait() +} +</pre> + +<h3 id="Accidentally_shared_variable">Accidentally shared variable</h3> + +<pre> +// ParallelWrite writes data to file1 and file2, returns the errors. +func ParallelWrite(data []byte) chan error { + res := make(chan error, 2) + f1, err := os.Create("file1") + if err != nil { + res <- err + } else { + go func() { + // This err is shared with the main goroutine, + // so the write races with the write below. + _, err = f1.Write(data) + res <- err + f1.Close() + }() + } + f2, err := os.Create("file2") // The second conflicting write to err. + if err != nil { + res <- err + } else { + go func() { + _, err = f2.Write(data) + res <- err + f2.Close() + }() + } + return res +} +</pre> + +<p> +The fix is to introduce new variables in the goroutines (note the use of <code>:=</code>): +</p> + +<pre> + ... + _, err := f1.Write(data) + ... + _, err := f2.Write(data) + ... +</pre> + +<h3 id="Unprotected_global_variable">Unprotected global variable</h3> + +<p> +If the following code is called from several goroutines, it leads to races on the <code>service</code> map. +Concurrent reads and writes of the same map are not safe: +</p> + +<pre> +var service map[string]net.Addr + +func RegisterService(name string, addr net.Addr) { + service[name] = addr +} + +func LookupService(name string) net.Addr { + return service[name] +} +</pre> + +<p> +To make the code safe, protect the accesses with a mutex: +</p> + +<pre> +var ( + service map[string]net.Addr + serviceMu sync.Mutex +) + +func RegisterService(name string, addr net.Addr) { + serviceMu.Lock() + defer serviceMu.Unlock() + service[name] = addr +} + +func LookupService(name string) net.Addr { + serviceMu.Lock() + defer serviceMu.Unlock() + return service[name] +} +</pre> + +<h3 id="Primitive_unprotected_variable">Primitive unprotected variable</h3> + +<p> +Data races can happen on variables of primitive types as well (<code>bool</code>, <code>int</code>, <code>int64</code>, etc.), +as in this example: +</p> + +<pre> +type Watchdog struct{ last int64 } + +func (w *Watchdog) KeepAlive() { + w.last = time.Now().UnixNano() // First conflicting access. +} + +func (w *Watchdog) Start() { + go func() { + for { + time.Sleep(time.Second) + // Second conflicting access. + if w.last < time.Now().Add(-10*time.Second).UnixNano() { + fmt.Println("No keepalives for 10 seconds. Dying.") + os.Exit(1) + } + } + }() +} +</pre> + +<p> +Even such "innocent" data races can lead to hard-to-debug problems caused by +non-atomicity of the memory accesses, +interference with compiler optimizations, +or reordering issues accessing processor memory . +</p> + +<p> +A typical fix for this race is to use a channel or a mutex. +To preserve the lock-free behavior, one can also use the +<a href="/pkg/sync/atomic/"><code>sync/atomic</code></a> package. +</p> + +<pre> +type Watchdog struct{ last int64 } + +func (w *Watchdog) KeepAlive() { + atomic.StoreInt64(&w.last, time.Now().UnixNano()) +} + +func (w *Watchdog) Start() { + go func() { + for { + time.Sleep(time.Second) + if atomic.LoadInt64(&w.last) < time.Now().Add(-10*time.Second).UnixNano() { + fmt.Println("No keepalives for 10 seconds. Dying.") + os.Exit(1) + } + } + }() +} +</pre> + +<h2 id="Supported_Systems">Supported Systems</h2> + +<p> +The race detector runs on <code>darwin/amd64</code>, <code>linux/amd64</code>, and <code>windows/amd64</code>. +</p> + +<h2 id="Runtime_Overheads">Runtime Overhead</h2> + +<p> +The cost of race detection varies by program, but for a typical program, memory +usage may increase by 5-10x and execution time by 2-20x. +</p> |