elric neumann

Endianness in the JVM bytecode layout

Ever disassembled a Java class file and tried to inspect its layout? Bytecode symbols are fixed and ordered, and each method's code section is filled with JVM stack operations (e.g. OP_IMUL). A full disassembler is complicated to build and, in most cases, impractical for consumers (unless they are interfacing with the format directly). Instead, write the bytecode.

figure 1

#include <cstdint>
#include <fstream>

void write_uint16_be(std::ofstream& stream, uint16_t value) {
  stream.put(static_cast<char>((value >> 8) & 0xFF)); // <-- MSB
  stream.put(static_cast<char>(value & 0xFF));        // <-- LSB
}

void write_uint32_be(std::ofstream& stream, uint32_t value) {
  stream.put(static_cast<char>((value >> 24) & 0xFF)); // <-- MSB
  stream.put(static_cast<char>((value >> 16) & 0xFF));
  stream.put(static_cast<char>((value >> 8) & 0xFF));
  stream.put(static_cast<char>(value & 0xFF));         // <-- LSB
}

write_uint16_be encodes a 16-bit (u2) field by writing its most significant byte (MSB) and then its least significant byte (LSB), i.e. in big-endian order. A right shift (>>) moves the higher-order byte into the low eight bits, and the AND with 0xFF masks off any bits above those eight.

For instance:
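Take the arbitrary value 0xCAFE: shifting right by 8 yields 0xCA (the MSB) and masking with 0xFF yields 0xFE (the LSB), so the stream receives CA FE. The sketch below targets std::ostream rather than std::ofstream, and encode_uint16_be is a hypothetical helper; both are assumptions made only so the bytes can be inspected in memory.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Same logic as write_uint16_be, but against std::ostream so that an
// in-memory std::ostringstream can stand in for the file (an assumption).
void write_uint16_be_sketch(std::ostream& stream, std::uint16_t value) {
  stream.put(static_cast<char>((value >> 8) & 0xFF)); // 0xCAFE >> 8   -> 0xCA
  stream.put(static_cast<char>(value & 0xFF));        // 0xCAFE & 0xFF -> 0xFE
}

// Hypothetical helper: returns the two bytes emitted for `value`.
std::string encode_uint16_be(std::uint16_t value) {
  std::ostringstream out;
  write_uint16_be_sketch(out, value);
  return out.str();
}
```

Calling encode_uint16_be(0xCAFE) gives a two-byte string whose first byte is 0xCA and whose second is 0xFE, regardless of the host machine's own byte order.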

write_uint32_be encodes a 32-bit (u4) field using four sequential writes, starting from the most significant byte:
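As a sketch, take 0xCAFEBABE, the magic number that opens every class file: shifts of 24, 16, 8 and 0 bits select CA, FE, BA and BE in that order. The loop form and the names write_uint32_be_sketch and encode_uint32_be are assumptions for illustration; the behaviour is meant to match the four explicit writes in the original function.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Loop variant of write_uint32_be (hypothetical name), written against
// std::ostream so the result can be captured in memory.
void write_uint32_be_sketch(std::ostream& stream, std::uint32_t value) {
  // Walk from the most significant byte (shift 24) down to the least (shift 0).
  for (int shift = 24; shift >= 0; shift -= 8) {
    stream.put(static_cast<char>((value >> shift) & 0xFF));
  }
}

// Hypothetical helper: returns the four bytes emitted for `value`.
std::string encode_uint32_be(std::uint32_t value) {
  std::ostringstream out;
  write_uint32_be_sketch(out, value);
  return out.str();
}
```

The explicit four-line version is arguably easier to read next to a hex dump; the loop only makes the descending shift pattern visible.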

Compare the following class file disassembly, which uses invokedynamic, by Ben Evans (Java Magazine, Oracle). The disassembler renders the bytecode as text, which is far more intuitive for Java developers to work with.

public static void main(java.lang.String[]) throws java.lang.Exception;
   Code:
      0: invokedynamic #2, 0 // InvokeDynamic
                                             // #0:run:()Ljava/lang/Runnable;
      5: astore_1
      6: new #3 // class java/lang/Thread
      9: dup
      10: aload_1
      11: invokespecial #4 // Method java/lang/Thread."<init>":
                                        // (Ljava/lang/Runnable;)V
      14: astore_2
      15: aload_2
      16: invokevirtual #5 // Method java/lang/Thread.start:()V
      19: aload_2
      20: invokevirtual #6 // Method java/lang/Thread.join:()V
      23: return
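Going the other way, the instruction at offset 16 in the listing above occupies three bytes (the next instruction starts at offset 19): one opcode byte followed by a big-endian u2 constant pool index. A minimal sketch of emitting it, assuming the opcode value 0xB6 that the JVM specification assigns to invokevirtual; emit_invokevirtual and encode_invokevirtual are hypothetical names.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Opcode value of invokevirtual in the JVM instruction set.
constexpr std::uint8_t kInvokeVirtual = 0xB6;

// Emits `invokevirtual #index`: the opcode byte, then the constant pool
// index as a big-endian u2 (high byte first).
void emit_invokevirtual(std::ostream& code, std::uint16_t index) {
  code.put(static_cast<char>(kInvokeVirtual));
  code.put(static_cast<char>((index >> 8) & 0xFF)); // indexbyte1
  code.put(static_cast<char>(index & 0xFF));        // indexbyte2
}

// Hypothetical helper for inspecting the emitted bytes in memory.
std::string encode_invokevirtual(std::uint16_t index) {
  std::ostringstream code;
  emit_invokevirtual(code, index);
  return code.str();
}
```

For index 5 this produces the bytes B6 00 05, which a disassembler would print as invokevirtual #5.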

In the context of JVM bytecode, the encoding functions are required in various fields:
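One place where this shows up is the very start of the ClassFile structure, which the JVM specification defines as magic (u4), minor_version (u2) and major_version (u2), all stored big-endian. The following is a minimal sketch under those assumptions, not the author's implementation: put_u2, put_u4, write_class_header and class_header_bytes are hypothetical names, and std::ostream is used so the result can be checked in memory.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Big-endian u2 writer (hypothetical name; mirrors write_uint16_be).
void put_u2(std::ostream& s, std::uint16_t v) {
  s.put(static_cast<char>((v >> 8) & 0xFF));
  s.put(static_cast<char>(v & 0xFF));
}

// Big-endian u4 writer (hypothetical name; mirrors write_uint32_be).
void put_u4(std::ostream& s, std::uint32_t v) {
  s.put(static_cast<char>((v >> 24) & 0xFF));
  s.put(static_cast<char>((v >> 16) & 0xFF));
  s.put(static_cast<char>((v >> 8) & 0xFF));
  s.put(static_cast<char>(v & 0xFF));
}

// Writes the first three ClassFile fields in their specified order.
void write_class_header(std::ostream& s, std::uint16_t minor,
                        std::uint16_t major) {
  put_u4(s, 0xCAFEBABE); // magic
  put_u2(s, minor);      // minor_version
  put_u2(s, major);      // major_version
}

// Hypothetical helper: captures the 8 header bytes for inspection.
std::string class_header_bytes(std::uint16_t minor, std::uint16_t major) {
  std::ostringstream out;
  write_class_header(out, minor, major);
  return out.str();
}
```

With minor version 0 and major version 52 (Java 8), the header comes out as CA FE BA BE 00 00 00 34. The constant pool count, access flags and the various name and descriptor indexes that follow are u2 fields and would go through the same 16-bit writer.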


References