Substrings vs. Regular Expressions – Benchmarking in Golang


In this third part of a four-part series, we’ll cover benchmarking in Go using our text filtering tool as a test subject. You can find the other parts below:

Previously, we built a Go program to remove unwanted text and extended it with regular expression support. Here we'll learn about benchmarking in Go by comparing the initial substring search approach against the added regular expression pattern matching. Note that the methods used here are simple and intended to support a discussion of benchmarking, not to prove objectively that one approach is strictly faster than the other.

Go’s standard library testing package provides extensive test support (you might have seen functions of the form TestXxx(t *testing.T)), as well as benchmarking support via the type B (testing.B). B manages benchmark timing and iteration counts, and it also provides the usual utility methods to clean up, log, or fail a test.
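
As a minimal sketch of those B features, the benchmark below excludes its setup from the timing with `b.ResetTimer`, runs the measured code `b.N` times, and uses `b.Fatal` to fail if the result is wrong. It is written with the signature `go test` expects, but `main` drives it through `testing.Benchmark` (which the testing package provides for running benchmarks outside `go test`) so the example is self-contained; the input string is an arbitrary assumption.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// BenchmarkContains uses the signature "go test" expects; main runs it
// directly with testing.Benchmark so no _test.go file is needed.
func BenchmarkContains(b *testing.B) {
	// Setup that should not count toward the measurement.
	line := strings.Repeat("lorem ipsum ", 100) + "needle"
	b.ResetTimer() // discard the time spent on setup so far

	// The testing package chooses b.N; the measured code runs b.N times.
	for i := 0; i < b.N; i++ {
		if !strings.Contains(line, "needle") {
			b.Fatal("needle not found") // mark the benchmark failed and stop it
		}
	}
}

func main() {
	res := testing.Benchmark(BenchmarkContains)
	fmt.Println(res) // iteration count and ns/op
}
```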

Using these tools, we can gain insight into whether the tool runs faster (under these scenarios) with substring search or with pattern matching.

Setting Up the Benchmarks

Go benchmarks are functions that live in a <filename>_test.go file, where <filename> is typically the name of the file containing the functions to be tested. Both files are typically in the same package, and benchmark functions take the form of:

func BenchmarkXxx(b *testing.B) {
	// benchmark code
}

When writing a benchmark, we usually need some code-specific setup first, followed by a loop that runs the code under test b.N times, where b.N is chosen by the testing package. At the end, we are told how many iterations ran and how long each iteration took on average (reported as ns/op). From there, we can analyze the code's performance.

We will start by setting up a []string with a few lines of text; this input is shared by both the substring and pattern match searches. We then build a config that contains either a set of keys or a pattern, run lineMatches() (in the benchmark loop) against the input text, and look at how long each iteration takes. We’ll do this with varying numbers of key phrases and levels of pattern complexity (each built to match the same text) and compare the results.

Building the Benchmarks with Golang

Our benchmarks will be pretty simple: they’ll all use the same input text and generate a config with either keys or a pattern. We’ll write congruent substring search and pattern matching benchmarks for each query type. Then, we just need to run the benchmarks and compare the results.
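
One way to organize the congruent pairs is a table of query cases, each with a key list and a pattern meant to match the same lines. The sketch below reuses hypothetical stand-ins for `config` and `lineMatches`, and its cases are illustrative rather than the article's actual queries. Before timing anything, it checks that both halves of a pair really select the same input lines, then benchmarks each half.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// config and lineMatches are hypothetical stand-ins for the tool's own code.
type config struct {
	keys    []string
	pattern *regexp.Regexp
}

func lineMatches(cfg *config, line string) bool {
	if cfg.pattern != nil {
		return cfg.pattern.MatchString(line)
	}
	for _, k := range cfg.keys {
		if strings.Contains(line, k) {
			return true
		}
	}
	return false
}

var input = []string{
	"2023-01-02 INFO  request served in 12ms",
	"2023-01-02 DEBUG cache hit for /index",
	"2023-01-02 WARN  slow query detected",
	"2023-01-02 INFO  request served in 9ms",
}

// queryCase pairs a key list with a pattern meant to match the same lines.
type queryCase struct {
	name    string
	keys    []string
	pattern string
}

var cases = []queryCase{
	{"one key", []string{"WARN"}, `WARN`},
	{"three keys", []string{"WARN", "DEBUG", "ERROR"}, `WARN|DEBUG|ERROR`},
	{"timings", []string{"in 12ms", "in 9ms"}, `in \d+ms`},
}

// matchedLines returns the indexes of the input lines that cfg matches.
func matchedLines(cfg *config) []int {
	var idx []int
	for i, line := range input {
		if lineMatches(cfg, line) {
			idx = append(idx, i)
		}
	}
	return idx
}

// congruent reports whether the keys and the pattern select the same lines.
func congruent(c queryCase) bool {
	byKeys := matchedLines(&config{keys: c.keys})
	byPattern := matchedLines(&config{pattern: regexp.MustCompile(c.pattern)})
	return fmt.Sprint(byKeys) == fmt.Sprint(byPattern)
}

var matched int // sink so the matching work is not optimized away

func bench(cfg *config) func(*testing.B) {
	return func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			for _, line := range input {
				if lineMatches(cfg, line) {
					matched++
				}
			}
		}
	}
}

func main() {
	for _, c := range cases {
		if !congruent(c) {
			fmt.Printf("%s: keys and pattern disagree, skipping\n", c.name)
			continue
		}
		sub := testing.Benchmark(bench(&config{keys: c.keys}))
		re := testing.Benchmark(bench(&config{pattern: regexp.MustCompile(c.pattern)}))
		fmt.Printf("%-10s substring %5d ns/op  regexp %5d ns/op\n", c.name, sub.NsPerOp(), re.NsPerOp())
	}
}
```

In a real _test.go file, the loop in `main` would more naturally become a single BenchmarkXxx function that calls `b.Run` for each case, so that `go test -bench .` reports every pair as a named sub-benchmark.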