Why does Java toString() loop infinitely on indirect cycles?

This is more a gotcha I wanted to share than a question: when printing with toString(), Java detects direct cycles in a collection (where the collection refers to itself), but not indirect cycles (where a collection refers to another collection that refers back to the first, or cycles with more steps).

    import java.util.*;

    public class ShonkyCycle {
        public static void main(String[] args) {
            List<Object> a = new LinkedList<>();
            a.add(a);                   // direct cycle
            System.out.println(a);      // works: [(this Collection)]

            List<Object> b = new LinkedList<>();
            a.add(b);
            b.add(a);                   // indirect cycle
            System.out.println(a);      // shonky: never finishes (recurses until StackOverflowError)
        }
    }

This was a real gotcha for me, because it happened in debugging code that prints out a collection (I was surprised when it caught a direct cycle, so I wrongly assumed the check had been implemented in general). The question is: why?

The explanation I can think of is that it is very cheap to check for a collection that refers to itself, since you only need to store the collection (which you already have), but for longer cycles you need to store every collection you encounter, starting from the root. In addition, you might not be able to tell for sure what the root is, so you would have to store every collection in the system (which you do anyway), and you would also have to do a hash lookup on every collection element. That is very expensive for the relatively rare case of cycles (in most programs). I think the only reason it checks for direct cycles is that it is so cheap (one reference comparison).

OK... I have more or less answered my own question, but have I missed anything important? Does anyone want to add anything?


Clarification: I now realize that the problem I saw is specific to printing a collection (i.e. the toString() method). There is no problem with cycles per se (I use them myself and need them); the problem is that Java cannot print them. Edit: Andrzej Doyle points out that this applies not only to collections, but to any object whose toString is called.

Given that it is limited to this method, here is an algorithm for checking it:

  • the root is the object on which the first toString() is invoked (to determine this, you need to maintain state on whether a toString is currently in progress, so this is inconvenient).
  • as you traverse each object, you add it to an IdentityHashMap along with a unique identifier (for example, an incrementing index).
  • but if the object is already in the map, write out its identifier instead.

This approach also correctly renders multirefs (a node that is referenced more than once).

The memory cost is the IdentityHashMap (one reference and one index per object); the complexity cost is a hash lookup for every node in the directed graph (i.e. each object that is printed).
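The steps above can be sketched roughly as follows. This is only an illustration, not anything the JDK provides: the class name CycleSafePrinter, the method render and the "#n" / "(see #n)" output format are all made up for the example, and the sketch sidesteps the "who is the root" problem by using a separate static entry point instead of toString() itself.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the JDK): prints nested collections without looping on cycles. */
final class CycleSafePrinter {

    private CycleSafePrinter() {}

    /** Renders root, labelling each collection #n and writing a back-reference on revisits. */
    static String render(Object root) {
        StringBuilder sb = new StringBuilder();
        // Identity-based map: keys are compared with ==, never via equals()/hashCode(),
        // which could themselves recurse without end on a cyclic collection.
        render(root, new IdentityHashMap<>(), sb);
        return sb.toString();
    }

    private static void render(Object node, Map<Object, Integer> seen, StringBuilder sb) {
        if (!(node instanceof Collection)) {
            sb.append(node);                              // leaf: ordinary toString()
            return;
        }
        Integer id = seen.get(node);
        if (id != null) {
            sb.append("(see #").append(id).append(')');   // already visited: write its identifier
            return;
        }
        id = seen.size() + 1;                             // incrementing index as the unique identifier
        seen.put(node, id);
        sb.append('#').append(id).append('[');
        boolean first = true;
        for (Object element : (Collection<?>) node) {
            if (!first) {
                sb.append(", ");
            }
            first = false;
            render(element, seen, sb);
        }
        sb.append(']');
    }

    public static void main(String[] args) {
        List<Object> a = new ArrayList<>();
        List<Object> b = new ArrayList<>();
        a.add(a);   // direct cycle
        a.add(b);
        b.add(a);   // indirect cycle
        System.out.println(render(a));   // #1[(see #1), #2[(see #1)]]
    }
}
```

The map is created once per top-level call and threaded through the recursion, which is exactly the extra state that a plain toString() has no parameter for.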

+4
5 answers

I think fundamentally it is because, while the language tries to stop you from shooting yourself in the foot, it should not do so in a way that is expensive. So while it is almost free to compare object pointers (for example, obj == this), anything beyond that involves invoking methods on the object you are passing in.

And at this point the library code knows nothing about the objects you are passing in. Firstly, the generics implementation does not know whether they are instances of Collection (or Iterable) themselves, and although it could find out via instanceof, who is to say whether it is a "collection-like" object that is not actually a collection but still contains a deferred circular reference? Secondly, even if it is a collection, there is no telling what its actual implementation is, and thus how it behaves. In theory one could have a collection containing all the Longs, meant to be used lazily; but since the library does not know this, it would be hideously expensive to iterate over every entry. Or one could even design a collection with an Iterator that never terminates (though this would be difficult to use in practice, because so many library constructs and classes assume that hasNext will eventually return false).
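To make the last two points concrete, here is a minimal sketch of such a collection. The class AllLongs is hypothetical and written only for this illustration: its iterator produces long values lazily and hasNext() never returns false.

```java
import java.util.AbstractCollection;
import java.util.Iterator;

/** Hypothetical example class: yields long values lazily, starting at 0, and never ends. */
final class AllLongs extends AbstractCollection<Long> {

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private long next = 0;

            @Override
            public boolean hasNext() {
                return true;        // never reports the end (the counter wraps after Long.MAX_VALUE)
            }

            @Override
            public Long next() {
                return next++;
            }
        };
    }

    @Override
    public int size() {
        // Collection.size() is specified to return Integer.MAX_VALUE for larger collections.
        return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        Iterator<Long> it = new AllLongs().iterator();
        for (int i = 0; i < 3; i++) {
            System.out.println(it.next());   // 0, 1, 2
        }
        // Do not print an AllLongs: the inherited AbstractCollection.toString() keeps
        // appending elements until hasNext() returns false, which never happens here.
    }
}
```

Any generic cycle check that had to walk the elements of such an object would be stuck in the same way, which is the unknown cost the answer is talking about.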

So it basically comes down to an unknown, possibly infinite cost in order to stop you from doing something that might not actually be a problem anyway.

+5

I would like to point out that this statement:

when printing with toString(), Java detects direct cycles in a collection

is misleading.

Java (the JVM, the language itself, etc.) is not detecting the self-reference. Rather, this is a property of the toString() method override in java.util.AbstractCollection.

If you were to create your own implementation of Collection, the language/platform would not automatically protect you from a self-reference like this: unless you extend AbstractCollection, you have to make sure you cover this logic yourself.
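As a rough illustration of covering that logic yourself, here is a small sketch. The Bag class is hypothetical; it implements only Iterable rather than the full Collection interface to keep the example short, and its toString() repeats the same identity check by hand.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical container that does NOT extend AbstractCollection, so it inherits no check. */
final class Bag implements Iterable<Object> {

    private final List<Object> items = new ArrayList<>();

    void add(Object item) {
        items.add(item);
    }

    @Override
    public Iterator<Object> iterator() {
        return items.iterator();
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        Iterator<Object> it = iterator();
        while (it.hasNext()) {
            Object e = it.next();
            // Hand-written equivalent of the AbstractCollection check. Without it, appending e
            // would call this.toString() again and recurse until the stack overflows.
            sb.append(e == this ? "(this Bag)" : e);
            if (it.hasNext()) {
                sb.append(", ");
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        Bag bag = new Bag();
        bag.add("x");
        bag.add(bag);               // direct self-reference
        System.out.println(bag);    // [x, (this Bag)]
    }
}
```

Like the JDK version, this only guards against the direct case; two Bag instances that contain each other would still recurse without end.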

I may be splitting hairs here, but I think this is an important distinction. Just because one of the foundation classes in the JDK does something does not mean that "Java" as an overall umbrella does it.

Here is the relevant source code in AbstractCollection.toString(), with the key line commented:

    public String toString() {
        Iterator<E> i = iterator();
        if (! i.hasNext())
            return "[]";

        StringBuilder sb = new StringBuilder();
        sb.append('[');
        for (;;) {
            E e = i.next();
            // self-reference check:
            sb.append(e == this ? "(this Collection)" : e);
            if (! i.hasNext())
                return sb.append(']').toString();
            sb.append(", ");
        }
    }

+3

The problem with the algorithm you propose is that you need to pass the IdentityHashMap to all of the collections involved. This is not possible with the published Collection APIs: the Collection interface does not define a toString(IdentityHashMap) method.

I imagine that whoever at Sun put the self-reference check into the AbstractCollection.toString() method thought about all of this and (together with their colleagues) decided that a "total solution" is over the top. I think the current design/implementation is correct.

Object.toString implementations are not required to be bomb-proof.

+1

You are right, you have already answered your own question. Checking for longer cycles (especially really long ones, such as a cycle length of 1000) would be too much overhead and is not needed in most cases. If someone wants this, they have to check for it themselves.

The direct cycle case, however, is easy to check and will occur more often, so Java does this one.

0

You cannot really detect indirect cycles; this is a typical example of the halting problem.

0

Source: https://habr.com/ru/post/1286223/

