Minimal memory usage in Golang with large stream processing


How to work with file streams with Golang

As a member of my company's Platform team, I have worked on many file-handling use cases: a universal file upload center, processing mail attachments, and processing and exporting large files. In the old days, this work was much easier because we fully controlled a whole server. We could write to a persistent file on the server's disk, and the resources available for the job were much larger. These days, your code runs on smaller units, such as pods. Resources are only virtually allocated and usually limited, so you need to know how to use them efficiently. Graceful handling and OOM termination can be a big problem for those used to spending memory freely.

In my opinion, Reader and Writer are the most crucial parts of Golang. They are vital supporters of goroutines and concurrent handling, and the key to lean, high-performance Go programs. Therefore, to master the Go programming language, you should be able to manipulate buffers and goroutines elegantly and gracefully. In this article, I will discuss the problems I encountered while handling file streams from a satellite client to the centralized file uploader before uploading them to the cloud storage engine.

Multipart file forwarding

In Golang, if you search for anything about reader handling, you will come across something like this.

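r := strings.NewReader("Golang is a cool language designed with systems programming in mind.")
// ReadAll buffers the entire stream in memory before returning.
// (Since Go 1.16, ioutil.ReadAll is deprecated in favor of io.ReadAll.)
b, err := ioutil.ReadAll(r)
if err != nil {
	log.Fatal(err)
}
// Work with the loaded bytes.
fmt.Printf("%s", b)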

It’s common to see this pattern in codebases because many examples on the internet use it. I have used it myself ever since I first worked with Reader. However, overusing it can badly hurt your memory usage, because it loads the entire stream into memory at once, and that significantly limits the amount of data you can process.

Typically, the data you read is in a predefined format, which means that after reading it you must pass it on to another data processor to get your job done. One alternative you may come across is `io.Copy`: