Skip to main content
elric neumann

Graph coloring and register allocation theory

Processors have a limited number of registers—fast memory locations directly accessible by the CPU. When a compiler generates machine code, it needs to map variables and temporary values to these registers. This is the register allocation problem.

Graph coloring is a neat solution because it transforms a hardware constraint problem into a graph theory problem.

Related to SSA (Static Single Assignment), register allocation is a step after SSA that works with lifetimes and access patterns of program variables with better L2 cache coherence without register spilling. It's still classified as an NP-hard problem.

Let Γ=V1,V2,,Vn be the set of variables in a program.

The register allocation problem is defined by the optimization:

mini=1nI(Vimemory)

Where I is the indicator function:

I(Vimemory)={1if variable is in memory0if variable is in register

Graph nodes #

We start by creating nodes for variables or temporary values. Add edges between nodes that interfere (live at the same time). The graph_node structure models a node in the interference graph.

#define MAX_NODES     1000
#define MAX_COLORS    32
#define MAX_NEIGHBORS 100

struct graph_node {
  int id;
  int degree;
  int color;
  bool marked;
  struct graph_node* neighbors[MAX_NEIGHBORS];
  int neighbor_count;
};

Each node has an id to uniquely identify the variable it represents. The degree of a node measures the number of edges connected to it. In the context of graph coloring, the degree is defined as follows.

deg(v)=|uV:u,vE|

Where V is the set of nodes (variables) and E is the set of edges (conflicts).

If a node has deg(v)=k, then at least k+1 colors may be required to color v and its neighbors in the worst case.

The color field stores the assigned register number, starting from 0. If uncolored, the value remains −1.

neighbors is an adjacency list holding pointers to other nodes, limited by MAX_NEIGHBORS.

Graph construction #

A graph is a collection of nodes.

G=(V,E),|V|nodesmax |E|neighborsmax

struct graph {
  struct graph_node* nodes[MAX_NODES];
  int node_count;
};

Construct the graph by adding conflicts between variables. Each pair of variables, x and y, has an undirected edge between them.

void add_edge(struct graph* graph, int src, int dest) {
  struct graph_node* src_node = find_or_create_node(graph, src);
  struct graph_node* dest_node = find_or_create_node(graph, dest);

  if (!src_node || !dest_node) {
    return;
  }

  if (src_node->neighbor_count < MAX_NEIGHBORS &&
      dest_node->neighbor_count < MAX_NEIGHBORS) {
    src_node->neighbors[src_node->neighbor_count++] = dest_node;
    dest_node->neighbors[dest_node->neighbor_count++] = src_node;
    src_node->degree++;
    dest_node->degree++;
  }
}

If the graph has reached the maximum node limit, no edge is added. Otherwise, find_or_create_node will attempt to define a new node.

The adjacency list for both nodes is updated to include the other node as a neighbor. Edges are bidirectional, representing an undirected graph. Each degree of each node should increase with the changes.

struct graph_node* find_or_create_node(struct graph* graph, int id) {
  for (int i = 0; i < graph->node_count; i++) {
    if (graph->nodes[i]->id == id) {
      return graph->nodes[i];
    }
  }

  if (graph->node_count >= MAX_NODES) {
    return NULL;
  }

  struct graph_node* new_node = malloc(sizeof(struct graph_node));

  new_node->id = id;
  new_node->degree = 0;
  new_node->color = -1;
  new_node->marked = false;
  new_node->neighbor_count = 0;

  graph->nodes[graph->node_count++] = new_node;

  return new_node;
}

Mathematically, for two nodes u and v, adding an edge is defined in the relation below.

EEu,vdeg(u)deg(u)+1deg(v)deg(v)+1

Adjacency list representation is only a pre-processing step for simplification.

Graph simplification #

Nodes with the smallest degree are iteratively pushed onto a stack, which will be processed in reverse order for coloring.

struct graph_node* find_lowest_degree_node(struct graph* graph) {
  struct graph_node* lowest_degree = NULL;
  int min_degree = INT_MAX;

  for (int i = 0; i < graph->node_count; i++) {
    struct graph_node* node = graph->nodes[i];

    if (!node->marked && node->degree < min_degree) {
      lowest_degree = node;
      min_degree = node->degree;
    }
  }

  return lowest_degree;
}

Unmarked nodes with the smallest degree are identified (I know this could be done with recursion but it's best to avoid non-TCO recursive procedures). The goal is to reduce the problem size while still keeping the relationships among the nodes.

The most important part of the graph coloring scheme is the iteration below.

while (true) {
  struct graph_node* node = find_lowest_degree_node(graph);
  if (!node) break;

  node->marked = true;
  stack_push(&node_stack, node);
}

Let d(v) represent the degree of a node v. Simplifying the graph reduces the degrees of neighboring nodes.

uN(v): d(u)d(u)1

Where N(v) is the set of neighbors of v.

Chromatic constraint #

The chromatic number χ(G) represents minimal register count, or in math-speak, the minimum number of colors needed to color our graph. It only works if kχ(G).

χ(G)=mink: proper k-coloring

Such that the following constraint is satisfied.

(u,v)E:color(u)color(v)

If a value greater than the edge-connectivity count is used for register count, it may run into bounds issues and segfault. We will revisit this later.

For a graph with maximum degree Δ, the chromatic number satisfies the following relation.

χ(G)Δ+1

We had to be greedy because the constraint is expensive by definition. Actually, W [SAT]-hard satisfiability.

As a test, we have the following case.

struct graph graph = {0};

add_edge(&graph, 1, 2);
add_edge(&graph, 1, 3);
add_edge(&graph, 2, 3);

int reg_count = 2;
int output = color_graph(&graph, reg_count);

printf("reg alloc: %s\n", output ? "pass" : "fail");

for (int i = 0; i < graph.node_count; i++) {
  printf("node %d: reg %d\n", graph.nodes[i]->id, graph.nodes[i]->color);
}

free_graph(&graph);

We have a triangular interference graph with three nodes. With a register count of 2 or number of nodes minus 1, we deliberately fail since multiple variables cannot occupy the same register.

reg alloc: fail
node 1: reg -1
node 2: reg 1
node 3: reg 0

There is a strong use case for this in encoders, esp. if you'd be targeting, say, x86 or ARM, registers require explicit management. Try changing the register count to n > 2, there should be a segfault (ideally bad practice, but you get the point).

Since the stack is traversed in a reverse order, node 1 receives a failure status for coloring, whereas the others pass as boolean.

Complexity #

Time complexity bounds is T(n)O(nk), where k is the graph's treewidth. Amortized time complexity is mostly due to the graph coloring step and is O(n2) in the worst case for k=2, assuming m=O(n2) edges in the graph.

Worst case space complexity is probably S(n)O(n2) — I don't know. Overall space complexity is S(n)O(n+m) due to the storage of the graph and the stack.

Why this approach? #

Why?

What makes this a reliable approach?

Register allocation reduces memory access latency. Optimal register mapping minimizes a τi such that the following holds.

Tcomputation=i=1mαiτi

Where:

L2 & L3 caching of non-allocatable objects #

When optimal register allocation is achieved, more variables are kept in registers rather than in memory or caches. Registers have the lowest possible latency, so minimizing memory accesses and keeping data in registers reduces the number of slower accesses to L2, L3, or even main memory.

The other part is ideal because cache memory is obviously faster than main memory.

Register spilling is a special case when there are more variables than available registers, so the algorithm would force some variables to be stored in the RAM instead of in registers.


There are many items we didn't cover so far, e.g. conservative coalescing, Chaitin's aggressive coalescing allocator, partitioned quadratics, linear scanning, etc. You may find that reading a few papers on this would be insightful.

Cf. final source.

References #