HTML for formatting text

Is there a Java API that does something similar to Android's Html.fromHtml()? JSoup parses the HTML and removes the tags, but the resulting text is not formatted. For example:

    <ol type="1">
      <li>Test1</li>
      <ol type="a">
        <li>TestA1</li>
        <li>TestB1</li>
      </ol>
      <li>Test2</li>
      <ol type="a">
        <li>TestA2</li>
        <li>TestB2</li>
      </ol>
    </ol>

should give me something like

  • Test1
    a. TestA1
    b. TestB1
  • Test2
    a. TestA2
    b. TestB2

1 answer

jsoup has no API for converting HTML to formatted text, but you can convert the lists yourself:

  • Iterate over all children of the ul / ol element that is the root of the list.
  • If the child is an item: format it and append it to the output.
  • If the child is a sublist: repeat step 1 with the sublist as the root, and append the result.

Example:

In this example, I use the type attribute to determine which kind of bullet is required, and use that character (!) to index the items: it is simply incremented for each item. If there is no such attribute, the character 1 is used.

Implementation:

    import org.jsoup.nodes.Element;

    /**
     * Convert the list element <code>root</code> to a formatted string representation.
     *
     * @param root  Root element of the list (normally an 'ul' or 'ol' tag)
     * @param depth Depth of the list (<code>=0</code> for the root element)
     * @return List as String
     */
    public String createList(Element root, int depth)
    {
        final String indentation = createIndentation(depth); // Create indentation
        StringBuilder sb = new StringBuilder();

        final String typeAttr = root.attr("type"); // Get the character used as bullet (= 'type' attribute)
        char type = typeAttr.isEmpty() ? '1' : typeAttr.charAt(0); // If 'type' attribute: use it, else: use '1' instead

        for( Element sub : root.children() ) // Iterate over all children
        {
            // If Java < 7: use if / else if / else here
            switch( sub.tagName() ) // Check if the element is an item or a sublist
            {
                case "li": // List item, format and append
                    sb.append(indentation).append(type++).append(". ").append(sub.ownText()).append("\n");
                    break;
                case "ol": // Sublist
                case "ul":
                    if( !sub.children().isEmpty() ) // If the sublist is not empty (contains further items)
                    {
                        sb.append(createList(sub, depth + 1)); // Recursive call for the sublist
                    }
                    break;
                default: // "Illegal" tag, do further processing if required - output as an example here
                    System.err.println("Not implemented tag: " + sub.tagName());
            }
        }

        return sb.toString(); // Return the formatted list
    }

    /**
     * Create an indentation string of <code>length</code> blanks.
     *
     * @param length Size of indentation
     * @return Indentation string
     */
    private String createIndentation(int length)
    {
        StringBuilder sb = new StringBuilder(length);

        for( int i=0; i<length; i++ )
        {
            sb.append(' ');
        }

        return sb.toString();
    }

Test code:

    Document doc = ... // Load / parse your document here

    Element listRoot = doc.select("ol").first(); // Select the root element (!) of the list here
    final String output = createList(listRoot, 0); // Convert the list
    System.out.println(output); // Output

Result:

Input (HTML):

    <ol type="1">
      <li>Test1</li>
      <ol type="a">
        <li>TestA1</li>
        <li>TestB1</li>
      </ol>
      <li>Test2</li>
      <ol type="a">
        <li>TestA2</li>
        <li>TestB2</li>
      </ol>
    </ol>

Output:

    1. Test1
     a. TestA1
     b. TestB1
    2. Test2
     a. TestA2
     b. TestB2

That's it! :-)
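
One thing to keep in mind: because the marker above is a single char that gets incremented, a numbered list with more than nine items runs past '9', and a lettered list with more than 26 items runs past 'z'. Here is a minimal sketch of one way around that, assuming only the types "1", "a" and "A" matter. ListMarker and marker are hypothetical names (not part of jsoup), and continuing with aa, ab, ... after z is my assumption, modeled on how browsers usually number alphabetic lists:

```java
public class ListMarker {

    /**
     * Returns the marker text for the item at the 1-based position index,
     * given the value of the list's 'type' attribute.
     * "a" and "A" give letters; anything else falls back to decimal numbers.
     */
    public static String marker(String type, int index) {
        if (type.equals("a") || type.equals("A")) {
            char base = type.charAt(0);
            StringBuilder sb = new StringBuilder();
            // Bijective base 26: 1 -> a, 26 -> z, 27 -> aa, 28 -> ab
            for (int n = index; n > 0; n = (n - 1) / 26) {
                sb.insert(0, (char) (base + (n - 1) % 26));
            }
            return sb.toString();
        }
        return Integer.toString(index);
    }

    public static void main(String[] args) {
        System.out.println(marker("1", 10) + ". Test10"); // prints "10. Test10"
        System.out.println(marker("a", 2) + ". TestB1");  // prints "b. TestB1"
        System.out.println(marker("a", 27) + ". item");   // prints "aa. item"
    }
}
```

To use it in createList, you would keep an int counter per list instead of the char and append marker(typeAttr, counter++) for each li. Roman numerals (type "I" / "i") are not handled here and would come out as plain numbers.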


Source: https://habr.com/ru/post/1397515/

