jsoup has no API for converting a document to "formatted text", but you can convert the lists yourself:
- Iterate over all children of the ul / ol element that is the root of the list.
- If the child is an item: format it and append it as an output line.
- If the child is a sublist: repeat the first step with the sublist element as the root, and append the result.
Example:
In this example, I use the type attribute to determine which kind of bullet is required, and use that character (!) to index the items by incrementing it. If the list has no type attribute, the character 1 is used.
Implementation:

    import org.jsoup.nodes.Element;

    /**
     * Convert the list element <code>root</code> to a formatted string representation.
     *
     * @param root Root element of the list (normally 'ul' or 'ol' tag)
     * @param depth Depth of the list (<code>=0</code> for root element)
     * @return List as String
     */
    public String createList(Element root, int depth)
    {
        final String indentation = createIndentation(depth); // create indentation
        StringBuilder sb = new StringBuilder();

        final String typeAttr = root.attr("type"); // Get the character used as bullet (= 'type' attribute)
        char type = typeAttr.isEmpty() ? '1' : typeAttr.charAt(0); // if 'type' attribute: use it, else: use '1' instead

        for( Element sub : root.children() ) // Iterate over all children
        {
            // If Java < 7: use if/else if/else here
            switch( sub.tagName() ) // Check if the element is an item or a sublist
            {
                case "li": // List item, format and append
                    sb.append(indentation).append(type++).append(". ").append(sub.ownText()).append("\n");
                    break;
                case "ol": // Sublist
                case "ul":
                    if( !sub.children().isEmpty() ) // If sublist is not empty (contains further items)
                    {
                        sb.append(createList(sub, depth + 1)); // Recursive call for the sublist
                    }
                    break;
                default: // "Illegal" tag, do further processing if required - output as an example here
                    System.err.println("Not implemented tag: " + sub.tagName());
            }
        }

        return sb.toString(); // Return the formatted list
    }

    /**
     * Create an indentation string of <code>length</code> blanks.
     *
     * @param length Size of indentation
     * @return Indentation string
     */
    private String createIndentation(int length)
    {
        StringBuilder sb = new StringBuilder(length);

        for( int i=0; i<length; i++ )
        {
            sb.append(' ');
        }

        return sb.toString();
    }
Test code:

    Document doc = ... // Load / parse your document here
    Element listRoot = doc.select("ol").first(); // Select the root element (!) of the list here

    final String output = createList(listRoot, 0); // Convert the list
    System.out.println(output); // Output
Result:
Input (HTML):

    <ol type="1">
        <li>Test1</li>
        <ol type="a">
            <li>TestA1</li>
            <li>TestB1</li>
        </ol>
        <li>Test2</li>
        <ol type="a">
            <li>TestA2</li>
            <li>TestB2</li>
        </ol>
    </ol>
Output:

    1. Test1
     a. TestA1
     b. TestB1
    2. Test2
     a. TestA2
     b. TestB2
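One caveat with indexing by incrementing a char: it only works for short lists, because the character after '9' is ':' and the character after 'z' is '{'. If your lists can be longer, you could build the label from the item position instead. Below is a minimal sketch of that idea; the class `ListLabels` and its `label` method are names I made up for illustration and are not part of jsoup. It assumes only the types 1, a and A matter; anything else (including the roman types i and I) falls back to plain numbers.

```java
public class ListLabels {

    /**
     * Build the label for the item at zero-based position index,
     * based on the first character of the list's 'type' attribute.
     */
    public static String label(char type, int index) {
        switch (type) {
            case 'a':
                return alphabetic(index, 'a');
            case 'A':
                return alphabetic(index, 'A');
            default:
                // '1' and every other type: plain decimal numbering
                return Integer.toString(index + 1);
        }
    }

    /** Spreadsheet-style lettering: a..z, then aa, ab, ... */
    private static String alphabetic(int index, char first) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1;
        while (n > 0) {
            n--; // shift to 0..25 for the current letter
            sb.append((char) (first + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        char c = '9';
        c++;
        System.out.println(c);              // prints ":" (the char after '9')
        System.out.println(label('1', 9));  // prints "10"
        System.out.println(label('a', 26)); // prints "aa"
    }
}
```

To use it in createList, you would keep an int counter per list instead of the char and append label(type, counter++) for each li.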
That's it! :-)