Why is the JVM Integer stored as byte and short?

Here is a piece of code:

    public class Classifier {
        public static void main(String[] args) {
            Integer x = -127;    // this uses bipush
            Integer y = 127;     // this uses bipush
            Integer z = -129;    // this uses sipush
            Integer p = 32767;   // maximum range of short, still sipush
            Integer a = 128;     // uses sipush
            Integer b = 129786;  // uses ldc to load the int from the constant pool
        }
    }

Here is part of the bytecode for it:

    stack=1, locals=7, args_size=1
       0: bipush        -127
       2: invokestatic  #16    // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
       5: astore_1
       6: bipush        127
       8: invokestatic  #16    // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      11: astore_2
      12: sipush        -129
      15: invokestatic  #16    // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      18: astore_3
      19: sipush        32767
      22: invokestatic  #16    // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      25: astore        4
      27: sipush        128
      30: invokestatic  #16    // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      33: astore        5
      35: ldc           #22    // int 129786
      37: invokestatic  #16    // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      40: astore        6
      42: return

As I see it, for Integer values between -128 and 127 it uses bipush, which pushes a byte onto the stack as an integer value. For values in the range -32768 to 32767 it uses sipush, which pushes a short. Beyond that, it uses a full int. Why does the JVM use byte and short to store the value of an Integer?

+6
3 answers

As far as I understand it:

As you can see from the remaining bytecode instructions, it does not store the int as a byte or a short. First, why bipush or sipush: bipush is a 2-byte instruction, one byte for the opcode and the second for the value, so the value can range from -128 to 127 (i.e., 2 to the power of 8 distinct values). This saves space and execution time. As you can see from the decompiled code, the compiler then creates a reference of type Integer for this variable:

 2: invokestatic #16 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;

and then astore_1 stores whatever is on top of the stack, i.e. the reference, into local variable 1. The same applies to sipush, which can hold a value in the range -32768 to 32767 because it is a 3-byte instruction: one byte for the opcode and the remaining two bytes for the value (i.e., 2 to the power of 16 distinct values).
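The instruction layouts described above can be sketched in Java. This is only an illustration, not compiler code: the class PushEncoding and its methods are hypothetical names, while the opcode values 0x10 (bipush) and 0x11 (sipush) are the ones assigned in the JVM specification.

```java
import java.util.Arrays;

// Hypothetical helper that builds the raw bytes of a bipush / sipush instruction.
final class PushEncoding {
    static final int BIPUSH = 0x10; // opcode for bipush
    static final int SIPUSH = 0x11; // opcode for sipush

    // 2 bytes: opcode + one signed operand byte.
    static byte[] bipush(int value) {
        if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("does not fit in a byte: " + value);
        }
        return new byte[] { (byte) BIPUSH, (byte) value };
    }

    // 3 bytes: opcode + high byte + low byte of a signed 16-bit operand.
    static byte[] sipush(int value) {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new IllegalArgumentException("does not fit in a short: " + value);
        }
        return new byte[] { (byte) SIPUSH, (byte) (value >> 8), (byte) value };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bipush(-127))); // [16, -127]
        System.out.println(Arrays.toString(sipush(-129))); // [17, -1, 127]
    }
}
```

The lengths of the two arrays (2 and 3) match the offsets in the javap listing, where the instruction after a bipush starts 2 bytes later and the one after a sipush starts 3 bytes later.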

Now, why not ldc? The JVM keeps a constant pool for each class. Bytecode needs data, but most of the time this data is too large to be stored directly in the bytecode, so it is stored in the constant pool and the bytecode contains a reference to it. ldc pushes the constant at #index in the constant pool (a String, int or float) onto the stack, which costs extra time and cycles. The following is an approximate comparison of the ldc operation and the bipush operation:

[Two images: the ldc operation and the bipush operation]

The JVM bytecode reference says:

If possible, it is more efficient to use one of bipush, sipush, or one of the const commands instead of ldc.

+1

It is not stored as a byte or short at runtime, only in the bytecode. Say you want to store the value 120 in an Integer. You are writing a compiler, so you analyze the source code and know that the constant 120 fits into one signed byte. Since you do not want to waste space in your bytecode by storing 120 as a 32-bit (4-byte) value when it fits into 8 bits (1 byte), you create a special instruction that loads just one byte from the method's bytecode and pushes it onto the stack as a 32-bit integer. This means that at run time you really do have an integer data type.
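A small sketch of what "load one byte and push it as a 32-bit value" amounts to. The class OperandWidening and its method names are made up for illustration; the sign extension itself follows the way the JVM specification describes the operands of bipush and sipush.

```java
// Hypothetical illustration of how the operand bytes become a 32-bit int.
final class OperandWidening {
    // bipush: the single operand byte is sign-extended to an int.
    static int bipushOperand(byte operand) {
        return operand; // implicit byte -> int widening keeps the sign
    }

    // sipush: (byte1 << 8) | byte2 forms a signed short, then sign-extended to int.
    static int sipushOperand(byte byte1, byte byte2) {
        return (short) (((byte1 & 0xFF) << 8) | (byte2 & 0xFF));
    }

    public static void main(String[] args) {
        System.out.println(bipushOperand((byte) 0x81));              // -127
        System.out.println(sipushOperand((byte) 0xFF, (byte) 0x7F)); // -129

        // At run time the boxed value is an ordinary Integer, whatever the push instruction was.
        Integer x = -127;
        System.out.println(x.getClass().getName());                  // java.lang.Integer
    }
}
```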

The resulting code is smaller and faster than using ldc everywhere, which requires more interaction with the JVM because the value has to be looked up in the runtime constant pool.

bipush is 2 bytes long: a one-byte opcode followed by a one-byte immediate value. Since there is only one byte for the value, it can be used for values from -128 to 127.

sipush is 3 bytes long: a one-byte opcode, with the second and third bytes holding the constant value.

bipush format:

 bipush byte 

sipush format:

 sipush byte1 byte2 
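
The choice between these instructions can be sketched as a simple range check. This is a rough approximation of what the javap output suggests, not javac's actual implementation: PushSelector and instructionFor are hypothetical names, and the ldc_w variant used for large constant pool indexes is left out.

```java
// Hypothetical sketch: which push instruction fits a given int constant.
final class PushSelector {
    static String instructionFor(int value) {
        if (value >= -1 && value <= 5) {
            // Dedicated one-byte instructions iconst_m1, iconst_0 ... iconst_5.
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;   // 2 bytes in the method body
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;   // 3 bytes in the method body
        }
        return "ldc (int " + value + ")"; // 2 bytes plus a constant pool entry
    }

    public static void main(String[] args) {
        int[] samples = { -127, 127, -129, 32767, 128, 129786 };
        for (int sample : samples) {
            System.out.println(sample + " -> " + instructionFor(sample));
        }
    }
}
```

For the six values in the question this reproduces the pattern in the listing: bipush for -127 and 127, sipush for -129, 32767 and 128, and ldc for 129786.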
+5

One reason might be the bytecode advantage mentioned in the other answers.

However, one can also reason about this starting from the language. In particular, you do not want to have to insert a cast when the (constant) value is actually representable in the target type.

So, one of the reasons for the observed behavior: the compiler uses the smallest possible type that can represent a given constant value.

For an assignment to an int (or Integer), this is not needed, but it does no harm when the bytecode assigns the "smaller" type to the "larger" type. Conversely, for the smaller types you must use the smaller type, so using the "bytecode for the smallest type" is the default behavior.


This is also implicitly referred to as "compile-time narrowing of constant expressions" in Section 5.2, Assignment Contexts, of the Java Language Specification:

A narrowing primitive conversion may be used if the type of the variable is byte, short, or char, and the value of the constant expression is representable in the type of the variable.

...

The compile-time narrowing of constant expressions means that code such as:

 byte theAnswer = 42; 

is allowed. Without the narrowing, the fact that the integer literal 42 has type int would mean that a cast to byte would be required:

 byte theAnswer = (byte)42; // cast is permitted but not required 
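
A short sketch of where this narrowing applies and where it does not. NarrowingDemo, FITS and narrow are illustrative names only; the commented-out line is one the compiler would reject.

```java
// Illustration of compile-time narrowing of constant expressions.
final class NarrowingDemo {
    static final int FITS = 42; // compile-time constant of type int

    // A run-time int value always needs an explicit cast, and may wrap around.
    static byte narrow(int runtimeValue) {
        return (byte) runtimeValue;
    }

    public static void main(String[] args) {
        byte a = 42;       // allowed: constant, representable in byte
        byte b = FITS;     // allowed: constant variable, representable in byte
        short s = 32767;   // allowed: constant, representable in short
        // byte c = 128;   // rejected by the compiler: 128 is not representable in byte

        System.out.println(a + " " + b + " " + s);
        System.out.println(narrow(128)); // -128: the explicit cast wraps around
    }
}
```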
0

Source: https://habr.com/ru/post/985986/

