Graph coloring and register allocation theory
Processors have a limited number of registers—fast memory locations directly accessible by the CPU. When a compiler generates machine code, it needs to map variables and temporary values to these registers. This is the register allocation problem.
Graph coloring is a neat solution because it transforms a hardware constraint problem into a graph theory problem.
Related to SSA (Static Single Assignment), register allocation is a step after SSA that works with lifetimes and access patterns of program variables with better L2 cache coherence without register spilling. It's still classified as an NP-hard problem.
Let
The register allocation problem is defined by the optimization:
Where
- Each variable or temporary value becomes a node in the graph
- Interference between variables (can't be in the same register simultaneously) creates edges
- Colors represent unique registers
Graph nodes #
We start by creating nodes for variables or temporary values. Add edges between nodes that interfere (live at the same time). The graph_node
structure models a node in the interference graph.
#define MAX_NODES 1000
#define MAX_COLORS 32
#define MAX_NEIGHBORS 100
struct graph_node {
int id;
int degree;
int color;
bool marked;
struct graph_node* neighbors[MAX_NEIGHBORS];
int neighbor_count;
};
Each node has an id
to uniquely identify the variable it represents. The degree
of a node measures the number of edges connected to it. In the context of graph coloring, the degree is defined as follows.
Where
If a node has
The color
field stores the assigned register number, starting from 0
. If uncolored, the value remains −1
.
neighbors
is an adjacency list holding pointers to other nodes, limited by MAX_NEIGHBORS
.
Graph construction #
A graph is a collection of nodes.
struct graph {
struct graph_node* nodes[MAX_NODES];
int node_count;
};
Construct the graph by adding conflicts between variables. Each pair of variables,
void add_edge(struct graph* graph, int src, int dest) {
struct graph_node* src_node = find_or_create_node(graph, src);
struct graph_node* dest_node = find_or_create_node(graph, dest);
if (!src_node || !dest_node) {
return;
}
if (src_node->neighbor_count < MAX_NEIGHBORS &&
dest_node->neighbor_count < MAX_NEIGHBORS) {
src_node->neighbors[src_node->neighbor_count++] = dest_node;
dest_node->neighbors[dest_node->neighbor_count++] = src_node;
src_node->degree++;
dest_node->degree++;
}
}
If the graph has reached the maximum node limit, no edge is added. Otherwise, find_or_create_node
will attempt to define a new node.
The adjacency list for both nodes is updated to include the other node as a neighbor. Edges are bidirectional, representing an undirected graph. Each degree of each node should increase with the changes.
struct graph_node* find_or_create_node(struct graph* graph, int id) {
for (int i = 0; i < graph->node_count; i++) {
if (graph->nodes[i]->id == id) {
return graph->nodes[i];
}
}
if (graph->node_count >= MAX_NODES) {
return NULL;
}
struct graph_node* new_node = malloc(sizeof(struct graph_node));
new_node->id = id;
new_node->degree = 0;
new_node->color = -1;
new_node->marked = false;
new_node->neighbor_count = 0;
graph->nodes[graph->node_count++] = new_node;
return new_node;
}
Mathematically, for two nodes
Adjacency list representation is only a pre-processing step for simplification.
Graph simplification #
Nodes with the smallest degree are iteratively pushed onto a stack, which will be processed in reverse order for coloring.
struct graph_node* find_lowest_degree_node(struct graph* graph) {
struct graph_node* lowest_degree = NULL;
int min_degree = INT_MAX;
for (int i = 0; i < graph->node_count; i++) {
struct graph_node* node = graph->nodes[i];
if (!node->marked && node->degree < min_degree) {
lowest_degree = node;
min_degree = node->degree;
}
}
return lowest_degree;
}
Unmarked nodes with the smallest degree are identified (I know this could be done with recursion but it's best to avoid non-TCO recursive procedures). The goal is to reduce the problem size while still keeping the relationships among the nodes.
The most important part of the graph coloring scheme is the iteration below.
while (true) {
struct graph_node* node = find_lowest_degree_node(graph);
if (!node) break;
node->marked = true;
stack_push(&node_stack, node);
}
Let
Where
Chromatic constraint #
The chromatic number
Such that the following constraint is satisfied.
If a value greater than the edge-connectivity count is used for register count, it may run into bounds issues and segfault. We will revisit this later.
For a graph with maximum degree
We had to be greedy because the constraint is expensive by definition. Actually, W [SAT]-hard satisfiability.
As a test, we have the following case.
struct graph graph = {0};
add_edge(&graph, 1, 2);
add_edge(&graph, 1, 3);
add_edge(&graph, 2, 3);
int reg_count = 2;
int output = color_graph(&graph, reg_count);
printf("reg alloc: %s\n", output ? "pass" : "fail");
for (int i = 0; i < graph.node_count; i++) {
printf("node %d: reg %d\n", graph.nodes[i]->id, graph.nodes[i]->color);
}
free_graph(&graph);
We have a triangular interference graph with three nodes. With a register count of 2
or number of nodes minus 1, we deliberately fail since multiple variables cannot occupy the same register.
reg alloc: fail
node 1: reg -1
node 2: reg 1
node 3: reg 0
There is a strong use case for this in encoders, esp. if you'd be targeting, say, x86 or ARM, registers require explicit management. Try changing the register count to n > 2
, there should be a segfault (ideally bad practice, but you get the point).
Since the stack is traversed in a reverse order, node 1
receives a failure status for coloring, whereas the others pass as boolean.
Complexity #
Time complexity bounds is
Worst case space complexity is probably
Why this approach? #
Why?
What makes this a reliable approach?
Register allocation reduces memory access latency. Optimal register mapping minimizes a
Where:
is access frequency is access latency is total memory interactions
L2 & L3 caching of non-allocatable objects #
When optimal register allocation is achieved, more variables are kept in registers rather than in memory or caches. Registers have the lowest possible latency, so minimizing memory accesses and keeping data in registers reduces the number of slower accesses to L2, L3, or even main memory.
- For a frequently accessed variable (high
), we want to minimize by keeping it in the register, without cache and memory latency. - For less frequently accessed variables (low
), it may be acceptable to store them in L2 or L3, since lower frequency of access compensates for the higher latency of the cache levels.
The other part is ideal because cache memory is obviously faster than main memory.
Register spilling is a special case when there are more variables than available registers, so the algorithm would force some variables to be stored in the RAM instead of in registers.
There are many items we didn't cover so far, e.g. conservative coalescing, Chaitin's aggressive coalescing allocator, partitioned quadratics, linear scanning, etc. You may find that reading a few papers on this would be insightful.
Cf. final source.
References #
- Register allocation - Wikipedia
- PDF: Register allocation - Stanford University
- PDF: A Survey on Register Allocation - UCLA CS Compilers Group
- PDF: Register Allocation by Puzzle Solving - LLVM
- Previous: Intuition of JIT compilers with C and x86-64
- Next: Review of 2024