A Golang Generics Use Case: HTML table extraction case study


You’ve likely read dozens of stories about Go generics applied to ordinary slices and maps, but haven’t yet thought about a fun way to apply this feature. Let’s implement a peer of pandas.read_html, which maps HTML tables into slices of structs! If it’s achievable even with Rust, why shouldn’t it be with Go?! This essay shows a thrilling mix of reflection and generics that leads to concise external APIs for your libraries.

First, let’s look at the direct inspiration for this article: Pandas, the most popular interactive data analysis library. Reading HTML tables seems to be so common there that it’s treated as a commodity and thus works out of the box:

Example of using pandas.read_html from Jupyter notebook.

To follow the idiomatic table parsing example, let’s aim at taking the S&P 500 list from Wikipedia and turning it into a slice of Ticker instances, where we annotate every struct field with the name of the table header it comes from:

Things could even be concise in Go
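To make the idea of annotating fields with header names concrete, here is a minimal sketch using only the standard library. It is not the article’s implementation: the `header` tag name, the fields of Ticker, and the helper sliceFromRows are assumptions made for illustration, and the helper takes already-extracted rows (header row first) instead of fetching and parsing HTML.

```go
package main

import (
	"fmt"
	"reflect"
)

// Ticker is an illustrative target type. The `header` tag (an assumed tag
// name) tells the mapper which table column fills each field.
type Ticker struct {
	Symbol   string `header:"Symbol"`
	Security string `header:"Security"`
	CIK      string `header:"CIK"`
}

// sliceFromRows is a hypothetical stand-in for the parsing step: rows[0] is
// the header row, the remaining rows are cells already extracted from HTML.
func sliceFromRows[T any](rows [][]string) ([]T, error) {
	if len(rows) == 0 {
		return nil, fmt.Errorf("no header row")
	}
	// Remember the position of every column by its header text.
	column := map[string]int{}
	for i, name := range rows[0] {
		column[name] = i
	}
	var zero T
	typ := reflect.TypeOf(zero)
	if typ == nil || typ.Kind() != reflect.Struct {
		return nil, fmt.Errorf("%T is not a struct", zero)
	}
	out := make([]T, 0, len(rows)-1)
	for _, row := range rows[1:] {
		var item T
		v := reflect.ValueOf(&item).Elem() // addressable, so fields can be set
		for i := 0; i < typ.NumField(); i++ {
			field := typ.Field(i)
			name := field.Tag.Get("header")
			idx, ok := column[name]
			if name == "" || !ok || idx >= len(row) ||
				!field.IsExported() || field.Type.Kind() != reflect.String {
				continue // skip untagged, unmatched, unexported, or non-string fields
			}
			v.Field(i).SetString(row[idx])
		}
		out = append(out, item)
	}
	return out, nil
}

func main() {
	rows := [][]string{
		{"Symbol", "Security", "CIK"},
		{"AAA", "Example Corp", "0000000001"}, // made-up sample row
	}
	tickers, err := sliceFromRows[Ticker](rows)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", tickers)
}
```

Because columns are looked up by header text, the order of columns in the table does not have to match the order of fields in the struct.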

The part to pay attention to here is [Ticker] in “NewSliceFromURL[Ticker](URL).” This Go 1.18+ feature, called a type parameter, is our preferred way to tell NewSliceFromURL which type to produce, and reflection then helps us uncover the header names. Before generics, you might have written a similar API as “NewSliceFromURL(Ticker{}, URL),” though I always found it moderately confusing:

Why do we need to pass an empty instance of a type if our objective is to pass just the type?

Having spent numerous years writing Java code, I’ve gotten used to the concept of “object mapping” from libraries like Jackson. But this blog is about Go, and you’ve probably landed here to figure out how to achieve a similar thing. You may have assumed that Go generics “just work,” but your level of enjoyment depends on your affinity for other programming ecosystems. At the time of writing, methods cannot yet declare their own type parameters, which “opens the floodgates of creativity” for API design. Here’s an illustration:

Using a type parameter to remove the need for the public API to initialize an empty struct

It looks a bit like magic, but here’s a simplified way of thinking about it: when we call “NewSliceFromURL[Ticker](),” the compiler substitutes the actual type for every reference to the type parameter, so the field dummy T in the feeder[T] type becomes dummy Ticker. Still hard to follow, but thrilling? Please read a couple of introductory articles (or more advanced ones).