Under the hood

Edit: Changed the blog template to something that doesn’t cut off my comment lines. Also, removed extra line breaks. Wtf, Blogger? If I hand-edit the HTML, I expect zero generated tags!

Today was a relatively good day. I’m currently on a Java course at the Open University, and while they’re teaching me little that’s new, it has proven to be a good source of inspiration to learn new things myself. One of those things was how to convert Java bytecode to a readable representation and, to an admittedly small extent, read the converted bytecode.

(By “convert” I mean “run javap -c -private ClassName > ClassName.bytecode”, not exactly rocket science.)

That was a few days ago. I did it to check a hunch about how many operations an array length lookup takes (given that the array is an instance variable) compared to reading a static final integer variable. In this case the array was initialized to the length specified by said variable. Without actually having factual knowledge, I conjectured that a static final integer could be inlined by the compiler, whereas the array length lookup would be at least two operations. Turns out that I was right on both counts (edit: at least in this instance; I’m not experienced enough to know of any proven reason why optimizing the array length lookup would be impossible). This Java code:

public class BytecodeTest {
    private static final int FOO = 42;
    private char[] arr = new char[FOO];

    public int getLength() {
        return arr.length;
    }

    public int getFinal() {
        return FOO;
    }
}

turns into this bytecode (using the Sun Java 6 compiler):

Compiled from "BytecodeTest.java"
public class BytecodeTest extends java.lang.Object{
private static final int FOO;

private char[] arr;
      
public BytecodeTest();
 Code:
  0:   aload_0
  1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
  4:   aload_0
  5:   bipush  42
  7:   newarray char
  9:   putfield    #2; //Field arr:[C
  12:  return

public int getLength();
 Code:
  0:   aload_0
  1:   getfield    #2; //Field arr:[C
  4:   arraylength
  5:   ireturn

public int getFinal();
 Code:
  0:   bipush  42 
  2:   ireturn

}

That makes it three instructions before the return in getLength, not two, against a single one in getFinal. For my own benefit (since typing this out will ensure I don’t forget it), here’s a not-quite-accurate explanation of what goes on there:

  // push local variable 0 (the "this" reference) onto the stack
  0:   aload_0
  // getfield pops "this" off the stack and pushes the value of the field onto the stack
  // (the #2; bit is an index into the class's constant pool, which is where the
  // reference to the field arr is stored)
  1:   getfield    #2; //Field arr:[C
  // arraylength pops the array reference off the stack and pushes its length
  4:   arraylength
  // return the integer on top of the stack
  5:   ireturn
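
To make the pushing and popping a bit more concrete, here is a rough sketch that replays those four instructions by hand against an ordinary Java stack. The StackSketch and Instance names are made up for this illustration, and a real JVM does not work with boxed objects on a Deque; this only mimics the order of the stack operations.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of the operand stack; not how a real JVM is implemented.
class StackSketch {
    // Stand-in for a BytecodeTest instance with its single field.
    static final class Instance {
        final char[] arr = new char[42];
    }

    // Replays the four instructions of getLength() one at a time.
    static int getLength(Instance self) {
        Deque<Object> stack = new ArrayDeque<>();

        // aload_0: push local variable 0 ("this") onto the stack
        stack.push(self);

        // getfield #2: pop the object reference, push the value of its arr field
        Instance ref = (Instance) stack.pop();
        stack.push(ref.arr);

        // arraylength: pop the array reference, push its length as an int
        char[] array = (char[]) stack.pop();
        stack.push(array.length);

        // ireturn: pop the int and hand it back to the caller
        return (Integer) stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(getLength(new Instance()));
    }
}
```

Every step except the first consumes what the previous step left on the stack, which is why the lookup cannot be shorter than these three instructions plus the return in the compiled code.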

Compare and contrast with getFinal which doesn’t do any lookups, just says “push the following byte to the stack as an integer” and then “return integer from stack”.
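
As far as I understand it, the inlining only happens because FOO is a compile-time constant: a final primitive initialized with a constant expression. The sketch below is a hypothetical variation (the ConstantSketch class and BAR are made-up names, not part of the original test) where a second static final field is initialized by a method call instead. Running javap -c -private on it should show bipush 42 for getFoo but a getstatic lookup for getBar.

```java
// Hypothetical companion to BytecodeTest; the names are made up for illustration.
class ConstantSketch {
    // Compile-time constant: final, primitive, initialized with a constant expression.
    private static final int FOO = 42;

    // Still static final, but the initializer is a method call, so this is not
    // a compile-time constant and the compiler cannot substitute its value.
    private static final int BAR = Integer.parseInt("42");

    static int getFoo() {
        return FOO; // expected bytecode: bipush 42, ireturn
    }

    static int getBar() {
        return BAR; // expected bytecode: getstatic, ireturn
    }

    public static void main(String[] args) {
        System.out.println(getFoo() + " " + getBar());
    }
}
```

Both methods return the same value at runtime; the difference only shows up in the disassembly.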

Now, before anyone accuses me of premature optimization, that’s not what this was about. This was mainly about fact-checking and a desire to understand what goes on underneath. 🙂

Any day I get to prove myself right is a good day. But any day I get to learn something like this and deepen my understanding about a language I use is even better.
