How to find consecutive repeats of an unknown substring

I am trying to build a system that factorises (if that is the right term in chemistry) an expanded chemical formula such as C6H2NO2NO2NO2CH3 using brackets, so that it becomes C₆H₂(NO₂)₃CH₃. (Ignore the subscripts, or their absence in the first case.) The problem is that I don't know what the repeating molecule will be, or even how long it will be. How do I find and count the repetitions?

For context, here is my code so far, which generates a formula from a 2D list of elements:

    private String getFormula(List<List<Element>> elements) {
        String formula = ""; //TODO Switch out for StringBuilder
        for (List<Element> currentElement : elements) {
            // Every element per list is identical, so looking at 0 will always be safe
            formula += currentElement.get(0).getSymbol();
            // Only display a number if there is more than 1 element
            if (currentElement.size() > 1) formula += currentElement.size();
        }
        return formula;
    }
5 answers

EDIT: Updated to take order into account. Thanks @JakeStanger!
2nd EDIT: Updated to reflect the new format, in which each molecule ends with |


I used a regex to split on |, since in this String we know that a new molecule begins after each |. I used a Map (a LinkedHashMap, so that insertion order is preserved) to keep track of how many molecules of each type there are. Finally, I iterated over the keys in the map and appended to the result String depending on whether the molecule occurred once or several times. Hooray!

    public static String factorise(String input) {
        String result = "";
        Map<String, Integer> molecules = new LinkedHashMap<>();
        String[] res = input.split("\\|");
        for (String t : res) {
            // Check if we already have this element in our map
            if (!molecules.containsKey(t)) {
                // If not then add it and set the count to 1
                molecules.put(t, 1);
            } else {
                // If we do then update the count by 1
                molecules.put(t, molecules.get(t) + 1);
            }
        }
        // Iterate through each molecule
        for (String key : molecules.keySet()) {
            if (molecules.get(key) == 1) {
                // If the count is only one, then we just need to append it
                result += key;
            } else {
                // Otherwise, we need the parentheses followed by the number of repetitions
                result = result + "(" + key + ")" + molecules.get(key);
            }
        }
        return result;
    }

Running:

    System.out.println(factorise("C6|H2|NO2|NO2|NO2|CH3|OH|OH"));
    System.out.println(factorise("HO|HO"));

produces this output:

run:
C6H2(NO2)3CH3(OH)2
(HO)2
BUILD SUCCESSFUL (total time: 0 seconds)
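
One thing to keep in mind: because the map is keyed by molecule, it adds up every occurrence, not only adjacent ones, so an input such as OH|C6|OH would come out as (OH)2C6. If only consecutive repeats should be grouped, a run-length pass over the split tokens is one option. The following is a minimal sketch under that assumption; the class name ConsecutiveFactoriser and the method factoriseConsecutive are hypothetical names, not part of the answer above.

```java
class ConsecutiveFactoriser {
    // Hypothetical variant: groups only runs of identical, adjacent tokens.
    static String factoriseConsecutive(String input) {
        String[] tokens = input.split("\\|");
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < tokens.length) {
            // Advance j to the end of the run of tokens equal to tokens[i]
            int j = i + 1;
            while (j < tokens.length && tokens[j].equals(tokens[i])) {
                j++;
            }
            int count = j - i;
            if (count == 1) {
                result.append(tokens[i]);
            } else {
                result.append('(').append(tokens[i]).append(')').append(count);
            }
            i = j; // continue after the run
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(factoriseConsecutive("C6|H2|NO2|NO2|NO2|CH3|OH|OH")); // C6H2(NO2)3CH3(OH)2
        System.out.println(factoriseConsecutive("OH|C6|OH"));                    // OHC6OH
    }
}
```

For the sample input the result is the same as with the map; the two approaches only differ when a molecule reappears later in the formula.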


This answer gives the code: String result = source.replaceAll("(.+)\\1+", "$1"), which collapses each run of consecutively repeated substrings into a single copy.

I think a small modification of this code should do what you want. If you use "($1)" as the replacement, it will wrap the repeated part in brackets. You can then step through the matches to work out what number should follow the brackets.

To stop the regular expression from picking up digits that belong to the preceding group, try "([A-Za-z]+[1-9]*)\\1+" .
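
To make the difference between the two patterns concrete, here is a small sketch that applies both to the formula from the question with "($1)" as the replacement. The class and method names (WrapRepeats, wrapGeneric, wrapRestricted) are made up for illustration; only String.replaceAll and the two patterns come from the text above.

```java
class WrapRepeats {
    // Generic pattern: any substring that is repeated back to back.
    static String wrapGeneric(String formula) {
        return formula.replaceAll("(.+)\\1+", "($1)");
    }

    // Restricted pattern: letters followed by optional digits, repeated back to back.
    static String wrapRestricted(String formula) {
        return formula.replaceAll("([A-Za-z]+[1-9]*)\\1+", "($1)");
    }

    public static void main(String[] args) {
        String formula = "C6H2NO2NO2NO2CH3";
        // The generic pattern starts its match at the digit before the first NO2
        System.out.println(wrapGeneric(formula));    // C6H(2NO)2CH3
        // The restricted pattern must start on a letter, so it wraps NO2 itself
        System.out.println(wrapRestricted(formula)); // C6H2(NO2)CH3
    }
}
```

Neither call adds the repetition count yet; that still has to be worked out per match.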

This link explains how to count the number of matches. This is a little trickier:

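    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Pattern pattern = Pattern.compile("([A-Za-z]+[1-9]*)\\1+");
    Matcher matcher = pattern.matcher(YOUR_CHEM_STRING);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
        // Number of repetitions = length of the whole match / length of one repeated unit
        int count = matcher.group().length() / matcher.group(1).length();
        // Replace the whole run with "(unit)count"
        matcher.appendReplacement(result, "(" + matcher.group(1) + ")" + count);
    }
    // Copy whatever follows the last match
    matcher.appendTail(result);
    String factorised = result.toString();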

You could split the formula into a list of parts and then walk through the list, counting consecutive repetitions. When the count is greater than 1, append the part in parentheses followed by the count.

    String formula = "C6H2NO2NO2NO2CH3";
    Pattern p = Pattern.compile("[a-zA-Z]+[0-9]+");
    Matcher m = p.matcher(formula);
    List<String> parts = new ArrayList<String>();
    // Split the formula into parts such as "C6", "H2", "NO2"
    while (m.find()) {
        parts.add(m.group());
    }
    String shrink = "";
    int count = 0;
    for (int i = 0; i < parts.size(); i++) {
        count++;
        // Flush when the run ends: at the last part or when the next part differs
        if (i + 1 == parts.size() || !parts.get(i + 1).equals(parts.get(i))) {
            if (count == 1)
                shrink += parts.get(i);
            else
                shrink += "(" + parts.get(i) + ")" + count;
            count = 0;
        }
    }
    System.out.println(shrink); // result = "C6H2(NO2)3CH3"

If you can pass in the list of elements directly, try the following:

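    public static String shortFormula(List<List<Element>> elements) {
        String shrink = "";
        int count = 0;
        for (int i = 0; i < elements.size(); i++) {
            count++; // count the current entry of the run
            // getSymbol() is the accessor used in the question's Element class
            String symbol = elements.get(i).get(0).getSymbol();
            // Flush when the run ends: at the last entry or when the next symbol differs
            if (i + 1 == elements.size() || !elements.get(i + 1).get(0).getSymbol().equals(symbol)) {
                if (count == 1)
                    shrink += symbol;
                else
                    shrink += "(" + symbol + ")" + count;
                count = 0;
            }
        }
        return shrink;
    }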

Here is an alternative solution that uses simple string parsing to find duplicate substrings.

    public static String factorise(String input) {
        StringBuilder result = new StringBuilder();
        for (int start = 0; start < input.length(); start++) {
            char c = input.charAt(start);
            // Digits are copied through unchanged
            if (c >= '0' && c <= '9') {
                result.append(c);
                continue;
            }
            boolean foundRepeat = false;
            // Try every candidate length for a substring beginning at start
            for (int end = start + 1; end <= input.length(); end++) {
                int length = end - start;
                if (end + length > input.length()) break;
                String sub = input.substring(start, end);
                String nextsub = input.substring(end, end + length);
                int nextpos = end + length;
                int count = 1;
                // Count how many times sub repeats immediately after itself
                while (sub.equals(nextsub)) {
                    count++;
                    if (nextpos + length > input.length()) break;
                    nextsub = input.substring(nextpos, nextpos + length);
                    nextpos += length;
                }
                if (count > 1) {
                    result.append("(" + sub + ")" + count);
                    // Skip past the whole run (the loop's start++ adds the final 1)
                    start += length * (count) - 1;
                    foundRepeat = true;
                    break;
                }
            }
            if (!foundRepeat) {
                result.append(c);
            }
        }
        return result.toString();
    }

Examples:

  • Input : Output
  • C6H2NO2NO2NO2CH3 : C6H2(NO2)3CH3
  • CaOHOH : Ca(OH)2
  • C6H2OH2ONO : C6(H2O)2NO
  • C6H2NO2NO2NO2CH3OHOH : C6H2(NO2)3CH3(OH)2

Here is another solution using a Map and an ArrayList instead of regex. I insert each molecule into the Map and increment its count each time it appears in the formula, and use an ArrayList to preserve the order of the molecules.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public static void main(String[] args) {
        String s = "C6H2NO2NO2NO2CH3";
        Map<String, Integer> formula = new HashMap<String, Integer>();
        List<String> order = new ArrayList<String>(); // molecules in order of first appearance
        String t = "";
        int l = s.length();
        for (int i = 0; i < l; i++) {
            t = t.concat("" + s.charAt(i));
            if (Character.isDigit(s.charAt(i))) {
                // A molecule ends at the last digit of a run of digits
                if (((i + 1) < l && !Character.isDigit(s.charAt(i + 1))) || i == l - 1) {
                    int count = formula.containsKey(t) ? formula.get(t) : 0;
                    formula.put(t, count + 1);
                    if (!order.contains(t)) order.add(t);
                    t = "";
                }
            }
        }
        // display
        for (int i = 0; i < order.size(); i++) {
            if (formula.get(order.get(i)) > 1)
                System.out.print("(" + order.get(i) + ")" + formula.get(order.get(i)));
            else
                System.out.print(order.get(i));
        }
    }

Output: C6H2(NO2)3CH3


Source: https://habr.com/ru/post/1246015/

