elric neumann

Pointer compression in C

Memory addressing is aligned by default to a fixed word size, but that alignment may need to change. 64-bit systems usually represent pointers using 8 bytes, which is wasteful for VMs whose heaps are full of pointers. Compressing pointers is a workaround built on a predefined memory region: a base address, an offset and, optionally, a mask (for 48-bit or 57-bit virtual addressing on x86-64).

[Figure: Unmasked compression]

Pointer compression is strictly a space optimization: it shrinks the size of each stored pointer rather than changing the asymptotic space complexity.

We won't look into how this is achieved in V8 or the JVM, but rather at the general idea of how it would work, i.e. without small integers (Smis), tagged values (V8), HotSpot JVM ordinary object pointers (oops), etc.

Unmasked compression (direct offset representation)

This is the naive-ish approach.

The pointer is transformed into a smaller representation by subtracting a predefined base address from it. Instead of storing the full 64-bit pointer, we store only this difference, the offset.

For instance, if the pointer ptr is at memory address 0x7ffda298da20 and the base address is 0x7ffd5ac28dd0, the offset is computed as:

offset = ptr - base

This gives the offset 0x47d64c50. Because it fits in a signed 32-bit integer, we can store just this offset (4 bytes) instead of the full pointer (8 bytes on a 64-bit system). The trick only works while the pointer lies within about ±2 GiB of the base.
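The arithmetic can be checked with a few lines of C. This is a sketch assuming a 64-bit target; `offset_fits_int32` and `check_example_offsets` are hypothetical helpers, and the addresses are plain integers, so nothing is dereferenced. It also shows why the base has to be close to the pointer: a base of 0x7fffb829b32f sits about 8.3 GiB above 0x7ffda298da20, so that offset would not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if ptr - base is representable as an int32_t offset,
   i.e. ptr lies within roughly +/-2 GiB of base. */
static int offset_fits_int32(uintptr_t ptr, uintptr_t base) {
  /* Unsigned subtraction wraps; casting back gives a negative value
     when ptr < base (on two's complement targets). */
  intptr_t diff = (intptr_t)(ptr - base);
  return diff >= INT32_MIN && diff <= INT32_MAX;
}

/* Plain integers standing in for addresses (assumes 64-bit uintptr_t). */
static void check_example_offsets(void) {
  uintptr_t ptr = 0x7ffda298da20;

  /* Base 0x47d64c50 bytes below ptr: the offset fits in 4 bytes. */
  assert(ptr - 0x7ffd5ac28dd0 == 0x47d64c50);
  assert(offset_fits_int32(ptr, 0x7ffd5ac28dd0));

  /* Base about 8.3 GiB above ptr: the offset needs more than 32 bits. */
  assert(!offset_fits_int32(ptr, 0x7fffb829b32f));
}
```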

Here, the compressed pointer stores both the base and the offset. The offset is an int32_t so that decompression can sign-extend it, which allows pointers that lie below the base.

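#include <stdint.h>

struct compressed_ptr {
  int32_t offset;  /* signed distance from base */
  uintptr_t base;  /* kept alongside the offset for decompression */
};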
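As written, struct compressed_ptr carries its own 8-byte base, so on a typical 64-bit target it occupies 16 bytes after padding, more than the raw pointer it replaces. The saving only materializes when every compressed pointer into a region shares one base. A minimal sketch, assuming a single global region; `heap_base`, `cptr_t`, `cptr_compress` and `cptr_decompress` are hypothetical names, not part of the original source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared-base scheme: one base for the whole region,
   so each stored reference is only a 4-byte signed offset. */
typedef int32_t cptr_t;

static uintptr_t heap_base; /* set once, e.g. to the start of the region */

static cptr_t cptr_compress(const void *ptr) {
  intptr_t diff = (intptr_t)((uintptr_t)ptr - heap_base);
  assert(diff >= INT32_MIN && diff <= INT32_MAX); /* within +/-2 GiB */
  return (cptr_t)diff;
}

static void *cptr_decompress(cptr_t c) {
  /* int32_t -> uintptr_t conversion sign-extends, so negative offsets work. */
  return (void *)(heap_base + (uintptr_t)c);
}

/* Same layout as struct compressed_ptr in the text, only to compare sizes. */
struct compressed_ptr {
  int32_t offset;
  uintptr_t base;
};
```

Each stored reference is then 4 bytes; the price is that all compressed pointers must fall within about ±2 GiB of heap_base.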

Compression performs basic pointer arithmetic on the pointer and base values: it stores the offset and keeps the base for use during decompression.

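#include <assert.h>

struct compressed_ptr compress_ptr(void *ptr, void *base) {
  struct compressed_ptr compressed;
  uintptr_t ptr_value = (uintptr_t)ptr;
  uintptr_t base_value = (uintptr_t)base;

  /* The offset must fit in 32 bits, i.e. ptr lies within +/-2 GiB of base;
     otherwise the cast below would silently truncate it. */
  intptr_t diff = (intptr_t)(ptr_value - base_value);
  assert(diff >= INT32_MIN && diff <= INT32_MAX);

  compressed.offset = (int32_t)diff;
  compressed.base = base_value;

  return compressed;
}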

To decompress, simply reverse the process.

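void *decompress_ptr(const struct compressed_ptr *compressed) {
  int32_t offset = compressed->offset;
  uintptr_t base = compressed->base;

  /* Converting a negative int32_t to uintptr_t sign-extends it (the value
     is reduced modulo 2^N), so offsets below the base are restored. */
  uintptr_t out = base + (uintptr_t)offset;
  return (void *)out;
}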

To test it, we define a memory region of 1 KiB (1 << 10 bytes), use its start as the base, and place ptr 128 bytes (1 << 7) into it.

unsigned char memory_region[1 << 10];
void *base = memory_region;
void *ptr = memory_region + (1 << 7);

None of this is affected by ASLR: the absolute addresses change between runs, but ptr always sits 0x80 bytes past base, so the round trip is deterministic.

Compressed Pointer:
  Base: 0x7ffd50f99d40
  Offset: 0x80
Original pointer:     0x7ffd50f99dc0
Decompressed pointer: 0x7ffd50f99dc0

Decompression is cheap: rebuilding the address is a single addition with no branches, so there is nothing for the branch predictor to get wrong. The offset is signed so that pointers below the base (negative offsets) are recovered through sign extension.
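To see why the offset must be signed, compare sign extension with zero extension. A small sketch assuming a 64-bit uintptr_t; `rebuild` and `rebuild_zero_extended` are hypothetical helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Signed offset: converting a negative int32_t to uintptr_t reduces it
   modulo 2^N, which is exactly the sign-extended bit pattern. */
static uintptr_t rebuild(uintptr_t base, int32_t offset) {
  return base + (uintptr_t)offset;
}

/* Unsigned offset: going through uint32_t zero-extends instead, so a
   negative offset lands 2^32 bytes (4 GiB) too high on a 64-bit target. */
static uintptr_t rebuild_zero_extended(uintptr_t base, int32_t offset) {
  return base + (uintptr_t)(uint32_t)offset;
}
```

Positive offsets give the same result either way; only pointers below the base expose the difference.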

Masked compression (obfuscation + reduction)

We apply a bitmask (e.g. 0xFFFFFFFF) to the pointer to change certain bits. This can be used to reduce the address space, to improve alignment, or as a primitive form of obfuscation.

[Figure: Masked compression]

By extension, a real obfuscator could be plugged in here to defend against heap abuse or memory corruption, if required.

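With masking, the stored value is not just an offset from a base: the pointer data is transformed with XOR or AND before being stored. Only XOR is reversible, so it is what carries the obfuscation; AND is used just to select the low 32 bits of the base. Everything else remains the same.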
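Why only the XOR step can be undone: a sketch using uint64_t values and the example mask (`xor_mask`, `and_mask` and `check_xor_roundtrip` are illustrative helpers). Applying the XOR twice restores the input, while AND keeps only the low 32 bits, so distinct inputs can collide.

```c
#include <assert.h>
#include <stdint.h>

#define MASK 0xFFFFFFFFu /* example mask from the text */

/* XOR with a fixed mask is its own inverse. */
static uint64_t xor_mask(uint64_t x) { return x ^ MASK; }

/* AND with the mask keeps only the low 32 bits; the rest is discarded. */
static uint64_t and_mask(uint64_t x) { return x & MASK; }

/* Round-trip check: XOR-ing twice must restore the input. */
static void check_xor_roundtrip(uint64_t x) {
  assert(xor_mask(xor_mask(x)) == x);
}
```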

The stored base is XOR'd with the mask, which flips its low 32 bits.

compressed.base = (base_value ^ 0xFFFFFFFF);

The offset is still computed relative to the original (unmasked) base, and is then XOR'd with the base's low 32 bits.

compressed.offset =
    (int32_t)((ptr_value - base_value) ^ (base_value & 0xFFFFFFFF));

Decompression reverses both steps: it un-XORs the base, then removes the XOR from the offset (the offset correction) before adding it back.

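uintptr_t base = compressed->base ^ 0xFFFFFFFF;

/* Undo the offset XOR on the 32-bit value first, then sign-extend it.
   Sign-extending before the XOR corrupts the upper 32 bits whenever
   bit 31 of the base's low word is set. */
uint32_t offset_correction = (uint32_t)(base & 0xFFFFFFFF);
int32_t offset = (int32_t)((uint32_t)compressed->offset ^ offset_correction);
uintptr_t full_ptr = base + (uintptr_t)offset;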

full_ptr is now the original pointer.
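Putting the masked scheme together, here is a self-contained sketch over uintptr_t values, assuming a 64-bit target; `masked_ptr`, `masked_compress`, `masked_decompress` and `masked_decompress_naive` are hypothetical names. The detail that matters is the order in decompression: the stored offset is un-XOR'd as a 32-bit value and only then sign-extended. Sign-extending first and XOR-ing afterwards, as `masked_decompress_naive` does, gets the upper 32 bits wrong whenever bit 31 of the base's low word is set, which depends on where the region happens to be placed.

```c
#include <assert.h>
#include <stdint.h>

#define MASK 0xFFFFFFFFu /* example mask from the text */

struct masked_ptr {
  int32_t offset; /* low 32 bits of (ptr - base), XOR'd with the base's low word */
  uintptr_t base; /* base XOR MASK */
};

static struct masked_ptr masked_compress(uintptr_t ptr, uintptr_t base) {
  struct masked_ptr m;
  uint32_t diff = (uint32_t)(ptr - base); /* assumes |ptr - base| < 2 GiB */
  /* uint32_t -> int32_t is implementation-defined above INT32_MAX;
     GCC and Clang wrap modulo 2^32. */
  m.offset = (int32_t)(diff ^ (uint32_t)(base & MASK));
  m.base = base ^ MASK;
  return m;
}

static uintptr_t masked_decompress(const struct masked_ptr *m) {
  uintptr_t base = m->base ^ MASK;
  /* Undo the XOR on the 32-bit value first, then sign-extend. */
  int32_t offset = (int32_t)((uint32_t)m->offset ^ (uint32_t)(base & MASK));
  return base + (uintptr_t)offset;
}

/* For contrast: sign-extends the stored offset before undoing the XOR,
   so the upper 32 bits come out wrong when bit 31 of the base's low
   word is set. */
static uintptr_t masked_decompress_naive(const struct masked_ptr *m) {
  uintptr_t base = m->base ^ MASK;
  return base + ((uintptr_t)m->offset ^ (base & MASK));
}
```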

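The round trip still succeeds. The decoded base is 0x7ffe8a5963bf ^ 0xFFFFFFFF = 0x7ffe75a69c40, and the stored offset is 0x80 ^ 0x75a69c40 = 0x75a69cc0: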

Compressed Pointer:
  Base: 0x7ffe8a5963bf
  Offset: 0x75a69cc0
Original pointer:     0x7ffe75a69cc0
Decompressed pointer: 0x7ffe75a69cc0

Try re-running with GDB and debug symbols.

Cf. final source.

Resources