Why Are Golang Heaps So Complicated
Heaps are commonly used to partially sort a set. Every insertion/deletion from the set is followed by a "fixup" to restore either min-heap or max-heap integrity. For example, a max-heap can be represented as a binary tree where every parent is "greater" than its children. It usually takes a small number of swaps to "fixup" a tree to restore the max-heap property after an insertion or deletion. Even though all of the set elements are not globally ordered, the "biggest" value is always at the top of a max-heap. As a result, heaps have a variety of practical applications.
container/heap
Heaps can be implemented as binary trees with nodes and pointers, but most programming languages provide default heap implementations for lists.
I personally found the standard library heap confusing when I started using Golang.
This is what I was used to coming from Python:
h = []
>>> heappush(h, (5, 'write code'))
>>> heappush(h, (7, 'release product'))
>>> heappush(h, (1, 'write spec'))
>>> heappush(h, (3, 'create tests'))
>>> heappop(h)
(1, 'write spec')
For comparison, here is the equivalent code in Go adapted from the
container/heap
docs: (run
here):
import (
"container/heap"
"fmt"
)
type Tuple struct {
i int
s string
}
// An TupleHeap is a min-heap of ints.
type TupleHeap []Tuple
func (h TupleHeap) Len() int { return len(h) }
func (h TupleHeap) Less(i, j int) bool {
if h[i].i != h[j].i {
return h[i].i < h[j].i
}
return h[i].s < h[j].s
}
func (h TupleHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *TupleHeap) Push(x any) {
// Push and Pop use pointer receivers because they modify the slice's length,
// not just its contents.
*h = append(*h, x.(Tuple))
}
func (h *TupleHeap) Pop() any {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}
// This example inserts several ints into an TupleHeap, checks
// the minimum, and removes them in order of priority.
func main() {
h := &TupleHeap{}
heap.Init(h)
heap.Push(h, Tuple{i: 5, s: "write code"})
heap.Push(h, Tuple{i: 7, s: "release product"})
heap.Push(h, Tuple{i: 1, s: "write spec"})
heap.Push(h, Tuple{i: 3, s: "create tests"})
fmt.Printf("%d ", heap.Pop(h))
// main.Tuple{i:1, s:"write spec"}
// Program exited.
}
I used heaps for several engine optimizations recently, and 3 years later I still find Go heaps confusing. So I took sometime to research why.
Default []int
Heaps
One thing that stands out about collections/heap
is that it is broadly
generic but requires a lot of boilerplate. heap.Interface
supports
binary heaps, array heaps, disk-backed heaps, with any user supported
type.
Much of Golang's standard library aims to be maximally simple, so this
design feature is unsurprising. sort
, for example, is a similar
standard library package whose interfaces are similarly generic. One key
difference is that sort.Ints
and sort.Strings
provide defaults that
remove boilerplate for the most common cases.
Maximally flexible but good defaults sounds ideal. Why doesn't heap support simple defaults?
Slice Pointers
We have documented in the past how slices have sharp edges. Part of why slices are confusing is that they are implicit pointers to underlying storage arrays. One of Nick's main criticisms in that blog is how array mutations can have unpredictable effects on memory. A similar problem plagues the space of heap implementations.
To illustrate, here are two different append
functions. One uses a
default slice pointer, and one uses a pointer to a slice pointer (run
here).
func main() {
arr := []int{1, 2}
append1(arr, 3)
fmt.Printf("#%v\n", arr) // prints: [1 2]
append2(&arr, 3)
fmt.Printf("#%v\n", arr) // prints: [1 2 3]
}
func append1(arr []int, v int) {
arr = append(arr, v)
}
func append2(arr *[]int, v int) {
*arr = append(*arr, v)
}
Without going too deep into the details, slice pointers act like any other value type when passed to a function 1. Mutating the underlying array creates a new slice pointer. If that slice pointer was passed by value, all of the changes are restricted to the inner closure (by contrast, in-place modifying an array through a slice pointer maintains the integrity of the original reference). If the same slice pointer is passed by reference, the outer slice address is updated to the new array.
This is important for heap
implementations because it limits the
design space. We either need:
-
An indirection layer that maintains the newest pointer when we modify memory
-
Return new references when we modify memory, or
-
Operate explicitly on pointers to slices so the original references remain valid
Indirect Slices Through Object Pointer
The first option, chosen by the Go standard library, outlines a minimal possible interface that abstracts unnecessary sorting details but forces a user to acknowledge and maintain the slice indirection:
type IntHeap []int
type Interface interface {
sort.Interface
Push(x any) // add x as element Len()
Pop() any // remove and return element Len() - 1.
}
Instead of passing a pointer of a pointer, we add a concrete type to
make the additional reference explicit. This makes it easy for
heap.Interface
to highlight the slice's edges while supporting
implementation flexibility.
Immutable Pointers
The second option, returning and tracking updated slice pointers for
each heap operation, is one way to implement a heap that handles []int
defaults:
var h []int
h = heap.Push(h, x)
h, y := heap.Pop(h)
Immutable updates force us to continuously track the newest heap ref. We also lose the ability to do the sort of context management, locking, and concurrency control that we usually use in production. Given that Go often designs around in-place updating central objects, I am not surprised the standard library avoided this.
Slice Pointers
The third pattern passes slice pointers to heap functions, similar to
how the standard library
encoding/json
passes string
pointers for JSON deserialization. I forked the standard library and
implemented this last design to convince myself that it works 2:
h := []int{5, 7, 1, 3}
Init(&h)
Push(&h, 3)
min := Pop(&h)
This package supports both heap.Interface
and built-in Comparable
slices ([]int
and []string
for starters). The downside is that
unsupported types panic (alternatively, heap functions could
deviate from the standard library interface and return an error
).
Arguably the biggest downside of the last approach is combining the dynamic and static interfaces into one package. If something goes wrong in the wrapper I would be even more confused than before doing all of this!
Generics
After I originally posted this article, one of the r/golang
mods
jerf pointed out a nice
package that makes it
easy to heap sort built-in types:
h := binheap.EmptyMaxHeap[int]()
h.Push(5)
h.Push(7)
h.Push(1)
h.Push(3)
min := h.Pop() // 1
Another user named twek found an old proposal from 2021 with a similar interface design. He was nice enough to create an example implementation sandbox for others to try if you are interested in running it yourself!
This diverges a bit from the current standard library by combining the helper
methods and the heap.Interface
into one. A heap implementation can
standardize Push()
, Pop()
, and Swap()
for all slices. And the
generic object statically captures the comparable
type's Less()
.
The result is zero effort heaps for all built in types!
The same strategy also makes it easy to heap sort
arbitrary slices with an initialization callback similar to
sort.Slice
(run
here:
import (
"fmt"
"github.com/lispad/go-generics-tools/binheap"
)
type Struct struct {
Idx int
Description string
}
func (s Struct) LessThan(r Struct) bool {
if s.Idx < r.Idx {
return true
}
if s.Idx > r.Idx {
return false
}
return s.Description < r.Description
}
func main() {
h := binheap.EmptyHeap[Struct](func(a, b Struct) bool { return a.LessThan(b) })
h.Push(Struct{5, "write code"})
h.Push(Struct{7, "release product"})
h.Push(Struct{1, "write spec"})
h.Push(Struct{3, "create tests"})
fmt.Println(h.Pop())
}
Summary
At the end of the day, the way slices work makes Golang's heaps a bit
more confusing than other languages. Peeling back the curtain a bit
helped me understand the design space given Golang's language
limitations. The active community on r/golang
helped provide me with
more context and better implementations than I could find on my own. The generics
solution3 uses a slightly different design than the standard library,
but is a simple abstraction and removes most of the boilerplate compared
to container/heap
today. I linked the open proposal below if you are
interested in following the Go core team's thoughts[^4].
I am sure I missed some things in the process of doing this writeup, whether previous design discussions or other blogs regarding this topic. If you have comments our questions feel free to reach out to us on Twitter, Discord, and GitHub!
here [^4] Proposal to add generics implementation to go standard library here