Is it possible to decompile Java bytecode to source with generic type parameters?

I know that during type erasure the Java compiler replaces all type parameters in generic types with their bounds, or with Object if the type parameters are unbounded. The generated bytecode will show the substituted bounds or Object.

Is there a way to take the given bytecode and decompile it back into a Java file that contains the original type parameters in the generic types? Is there a decompiler that can do this? Or is this process simply irreversible due to the nature of the compilation process?

+5
3 answers

You are right that at the bytecode level a lot of information is lost in code that defines and interacts with generic types. Type erasure was a nice way to maintain compatibility: if you enforce type safety mostly at compile time, you don't need to do much at runtime, so you can reduce generic types to their raw equivalents.

And that's the key: compile-time checking. If you want the flexibility and safety of generic types, your compiler has to know a lot about the generic types you interact with. In many cases you will not have the source code for those classes, so it has to get the information from somewhere. And it does: metadata. Embedded in the .class file alongside the bytecode is a wealth of information: everything the compiler needs to know to verify that you are using a library's generic types safely. So, what generic information is stored?

Type variables and constraints

The most basic thing a compiler needs to know in order to use a generic type is the list of type variables. For any generic type or generic method, the names and positions of the type variables are stored. Moreover, any constraints (bounds) on them are also included.
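As a rough illustration, the reflection API reads this same class-file metadata, so it can show what survives erasure. The sketch below assumes a made-up class `Box` (not from the question) and lists its type variables with their bounds.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic class, used only for illustration.
class Box<T extends Comparable<T>, U> {
    T first;
    U second;
}

class TypeVariableDemo {
    // Builds one line per declared type variable: its name and its bounds.
    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        for (TypeVariable<?> tv : type.getTypeParameters()) {
            StringBuilder sb = new StringBuilder(tv.getName()).append(" extends ");
            Type[] bounds = tv.getBounds(); // an unbounded variable reports Object
            for (int i = 0; i < bounds.length; i++) {
                if (i > 0) {
                    sb.append(" & ");
                }
                sb.append(bounds[i].getTypeName());
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(Box.class).forEach(System.out::println);
    }
}
```

The names `T` and `U`, their order, and the `Comparable` bound are all recovered from the compiled class, without access to the source.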

Generic supertype signatures

Sometimes you write a class that extends a generic class or implements a generic interface. If you write a StringList that extends ArrayList<String>, you inherit a lot of functionality. If someone wants to use your StringList as intended, and without the source code, it is not enough for the compiler to know that you extended ArrayList; it has to know that you extended ArrayList<String>. This applies transitively up the hierarchy: it must know that ArrayList<> extends AbstractList<>, and so on. So this information is stored. Your class file will contain the complete generic signatures of any generic supertypes (classes or interfaces).
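The StringList example can be checked with a small sketch. It uses reflection, which reads the stored supertype signature; the helper name `superclassTypeArgument` is made up for this illustration.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;

// The StringList example from the text above.
class StringList extends ArrayList<String> {
}

class SupertypeDemo {
    // Returns the first type argument of the generic superclass,
    // or null if the superclass is absent or not parameterized.
    static Type superclassTypeArgument(Class<?> type) {
        Type superType = type.getGenericSuperclass();
        if (superType instanceof ParameterizedType) {
            return ((ParameterizedType) superType).getActualTypeArguments()[0];
        }
        return null;
    }

    public static void main(String[] args) {
        // The erased view only knows the raw superclass.
        System.out.println(StringList.class.getSuperclass());
        // The generic view also knows the type argument.
        System.out.println(StringList.class.getGenericSuperclass());
        System.out.println(superclassTypeArgument(StringList.class));
    }
}
```

`getSuperclass()` reports only the raw `ArrayList`, while `getGenericSuperclass()` still carries the `String` argument, which is the information a compiler or decompiler relies on.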

Member signatures

The compiler cannot check that you are using a generic type correctly if it does not know the full generic types of fields, method parameters, and return values. So, you guessed it: this information is included. If any part of a class member contains a generic type, wildcard, or type variable, that member will have its signature information stored in the metadata.
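A minimal sketch of the difference between the erased and the generic view of a member, again via reflection. The class `Registry` and its members are hypothetical and exist only for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical class; the names are made up for this illustration.
class Registry {
    Map<String, List<Integer>> index;

    <T> List<T> wrap(T value) {
        return Collections.singletonList(value);
    }
}

class MemberSignatureDemo {
    // Erased type of Registry.index, as seen by the bytecode descriptor.
    static Class<?> erasedFieldType() {
        return indexField().getType();
    }

    // Full generic type of Registry.index, read from the stored signature.
    static String genericFieldType() {
        return indexField().getGenericType().getTypeName();
    }

    // Generic return type of Registry.wrap; its erased parameter is Object.
    static String genericReturnType() {
        try {
            Method m = Registry.class.getDeclaredMethod("wrap", Object.class);
            return m.getGenericReturnType().getTypeName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Field indexField() {
        try {
            return Registry.class.getDeclaredField("index");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(erasedFieldType());
        System.out.println(genericFieldType());
        System.out.println(genericReturnType());
    }
}
```

Note that `wrap` has to be looked up with `Object.class` as its parameter type: that is the erased descriptor, while the generic signature is kept alongside it.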

Local variables

It is not necessary to store local variable type information in order to use a type. It may be useful for debugging, but that is about it. There are metadata tables that can be used to record the names and types of variables and the bytecode ranges in which they exist. Depending on the compiler, they may or may not be written by default. You can force javac to emit them by passing -g:vars, but I believe they are omitted by default.

Call sites

One of the biggest problems for decompilers, mainly affecting generic type inference in method bodies, is that call sites that invoke generic methods do not store any information about the type arguments. This causes huge headaches for APIs such as Java 8 Streams, where generic operators are chained together, each of which accepts implicitly typed lambdas (which can be contravariant in their argument types and covariant in their return types). It is a type inference nightmare, but it is a problem for any code that interacts with generics. Such code does not become significantly harder to decompile simply because it lives inside a generic type.
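To make the call-site problem concrete, here is a small sketch with a hypothetical generic method `pair`. Both calls in `main` refer to the same erased method descriptor, so the bytecode does not record whether `T` was `Number` or `String`, or whether it was written explicitly or inferred.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

class CallSiteDemo {
    // Hypothetical generic method, used only for illustration.
    static <T> List<T> pair(T a, T b) {
        return Arrays.asList(a, b);
    }

    // Builds the erased JVM descriptor of pair(), which is what an
    // invocation instruction in the bytecode refers to.
    static String erasedDescriptor() {
        try {
            Method m = CallSiteDemo.class.getDeclaredMethod("pair", Object.class, Object.class);
            return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                    .toMethodDescriptorString();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Explicit type argument: T = Number.
        List<Number> explicit = CallSiteDemo.<Number>pair(1, 2.0);
        // Inferred type argument: T = String.
        List<String> inferred = pair("a", "b");
        System.out.println(explicit + " " + inferred);
        System.out.println(erasedDescriptor());
    }
}
```

A decompiler looking at these calls sees only the erased descriptor and has to work out the type arguments from the surrounding code.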

How does this affect decompilation?

Modern Java decompilers, such as Procyon and CFR, should be able to reconstruct generic types fairly well. If local variable metadata is available, the results should be close to the original source code. If not, they will have to try to infer generic type arguments in method bodies based on data flow analysis. Essentially, the decompiler must look at what data flows into and out of generic instances, and use what it knows about the types of that data to guess the type arguments. Sometimes it works very well; in other cases, not so much (see the earlier comment on Java 8 Streams).

At the API level, though (type and member signatures), the results should be spot on.

Caveat

Strictly speaking, all of the metadata described here is optional: it is needed only at compile time (or decompilation time). If someone runs the compiled classes through an obfuscator, optimizer, or some other utility, all of this information can be stripped out. Doing so will not affect runtime behavior.

TL;DR: Conclusion

Yes, it is certainly possible to decompile generic types and methods with their type parameters intact. Assuming that the required metadata is present, getting the type and member signatures right is the "easy" part. Correctly inferring the type arguments of generic instances and method calls is the tricky bit, but that is a problem for any code that interacts with generics.

As already mentioned, Procyon and CFR should do a pretty decent job of recovering generic types and methods.

+3

This mainly depends on whether the code was obfuscated. Although it is true that generics use type erasure, compilers usually include source-level information, such as generic types, as metadata in the class file for various reasons: reflection, debugging, compiling against closed-source libraries, and so on.

So, for a well-behaved class file, it should be possible to get the information back. Whether there are any off-the-shelf tools for this, I don't know. Many decompilers try to recover generic types, but I don't know how reliable they are.

If the code was obfuscated, all of the metadata will have been stripped, so there is no hope of recovering the original types.

+1

Yes. Converting machine code, or in this case bytecode, back to its original source code is called decompilation, and it is possible, but only to some extent. Decompilers for this do exist.
You will need help from a decompiler and a little effort to turn the bytecode back into its generic form, as you describe. Such reverse engineering cannot be done with high accuracy, because modern compilers go through several steps to convert source code into native code, so reversing it directly gives you plain assembly code that is not human-readable. Decompilers can nevertheless do much of this work easily. The Java Decompiler project (JD) is the one I am talking about: http://jd.benow.ca. I hope that makes the concept clear.

-3

Source: https://habr.com/ru/post/1271404/

