A scalable distributed map/reduce system written in pure Go that runs standalone or distributed.

https://github.com/chrislusf/gleam

Gleam is a high-performance, efficient distributed execution system that is also simple, generic, flexible, and easy to customize.

Gleam is built in Go, and user-defined computations can be written in Go, Lua, Unix pipe tools, or any streaming program.

It is convenient to write logic in Lua, but Lua is optional. Go is also supported with a little extra effort.
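To make the "any streaming program" idea concrete, here is a minimal sketch in plain Go of a mapper and a reducer that only read records from an input stream and write records to an output stream. It does not use Gleam's API; the names `mapWords`, `reduceCounts`, and `wordCount`, and the tab-separated record format, are assumptions made for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"sort"
	"strconv"
	"strings"
)

// mapWords is a hypothetical streaming mapper: it reads text lines from r
// and writes one "word<TAB>1" record per word to w.
func mapWords(r io.Reader, w io.Writer) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		for _, word := range strings.Fields(scanner.Text()) {
			if _, err := fmt.Fprintf(w, "%s\t1\n", strings.ToLower(word)); err != nil {
				return err
			}
		}
	}
	return scanner.Err()
}

// reduceCounts is a hypothetical streaming reducer: it reads
// "word<TAB>count" records from r, sums the counts per word, and writes
// the totals to w sorted by word.
func reduceCounts(r io.Reader, w io.Writer) error {
	totals := map[string]int{}
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), "\t", 2)
		if len(parts) != 2 {
			return fmt.Errorf("malformed record %q", scanner.Text())
		}
		n, err := strconv.Atoi(parts[1])
		if err != nil {
			return err
		}
		totals[parts[0]] += n
	}
	if err := scanner.Err(); err != nil {
		return err
	}
	words := make([]string, 0, len(totals))
	for word := range totals {
		words = append(words, word)
	}
	sort.Strings(words)
	for _, word := range words {
		if _, err := fmt.Fprintf(w, "%s\t%d\n", word, totals[word]); err != nil {
			return err
		}
	}
	return nil
}

// wordCount connects the mapper to the reducer with an in-process pipe,
// much like "mapper | reducer" in a shell.
func wordCount(input string) (string, error) {
	pr, pw := io.Pipe()
	defer pr.Close() // unblocks the mapper if the reducer stops early
	go func() {
		// Closing the write end signals end-of-stream (or the mapper's error).
		pw.CloseWithError(mapWords(strings.NewReader(input), pw))
	}()
	var out strings.Builder
	if err := reduceCounts(pr, &out); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := wordCount("the quick fox\nThe lazy dog\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because both functions depend only on `io.Reader` and `io.Writer`, the same logic could equally be wrapped in a standalone program that reads standard input and writes standard output.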

High Performance

  • Pure Go mappers and reducers have high performance and concurrency.
  • Optional LuaJIT also offers high performance, comparable to C, Java, and Go. It processes data as a stream, without context switches between Go and Lua.
  • Data flows through memory, optionally to disk.
  • Multiple map/reduce steps are merged together for better performance.

Memory Efficient

  • Gleam does not have the common GC problem that plagues other languages. Each executor runs in a separate OS process, and its memory is managed by the OS. One machine can host many more executors.
  • Gleam master and agent servers are memory efficient, consuming about 10 MB of memory.
  • Gleam tries to automatically adjust the required memory size based on data size hints, avoiding the trial-and-error effort of manual memory tuning.

Flexible

  • The Gleam flow can run standalone or distributed.
  • It can be switched between in-memory mode and OnDisk mode.
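As a rough illustration of what choosing between an in-memory and an on-disk mode can mean for the data passed between steps, the sketch below backs the same small interface with either a memory buffer or a temporary file. This is an assumption-laden example in plain Go, not Gleam's implementation; `intermediate`, `newIntermediate`, `Contents`, and `roundTrip` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// intermediate holds the records passed between two steps. The type and
// its methods are hypothetical names used only in this sketch.
type intermediate struct {
	mem  bytes.Buffer // used in in-memory mode
	file *os.File     // non-nil in on-disk mode
}

// newIntermediate picks the backing store once, when the flow is set up.
func newIntermediate(onDisk bool) (*intermediate, error) {
	s := &intermediate{}
	if onDisk {
		f, err := os.CreateTemp("", "intermediate-*")
		if err != nil {
			return nil, err
		}
		s.file = f
	}
	return s, nil
}

// Write appends data to whichever backing store is active.
func (s *intermediate) Write(p []byte) (int, error) {
	if s.file != nil {
		return s.file.Write(p)
	}
	return s.mem.Write(p)
}

// Contents returns everything written so far.
func (s *intermediate) Contents() ([]byte, error) {
	if s.file == nil {
		return s.mem.Bytes(), nil
	}
	if _, err := s.file.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	return io.ReadAll(s.file)
}

// Close releases and removes the temporary file, if there is one.
func (s *intermediate) Close() error {
	if s.file == nil {
		return nil
	}
	name := s.file.Name()
	closeErr := s.file.Close()
	if err := os.Remove(name); err != nil && closeErr == nil {
		return err
	}
	return closeErr
}

// roundTrip writes data through the chosen mode and reads it back.
func roundTrip(onDisk bool, data string) (string, error) {
	s, err := newIntermediate(onDisk)
	if err != nil {
		return "", err
	}
	defer s.Close()
	if _, err := io.WriteString(s, data); err != nil {
		return "", err
	}
	b, err := s.Contents()
	return string(b), err
}

func main() {
	const data = "a\t1\nb\t2\n"
	for _, onDisk := range []bool{false, true} {
		out, err := roundTrip(onDisk, data)
		fmt.Printf("onDisk=%v intact=%v err=%v\n", onDisk, out == data, err)
	}
}
```

The code that produces and consumes records is identical in both modes; only the constructor argument changes, which is what makes such a mode adjustable per flow.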

Easy to Customize

  • The Go code is much simpler to read than Scala, Java, or C++.
  • The optional LuaJIT FFI library can easily invoke any C function, for even more performance or to use existing C libraries.
  • (future) Write SQL with UDFs written in Lua.