Tips for optimizing Java code

So, I wrote a spellchecker in Java, and everything works as it should. The only problem is that if I set the maximum allowed edit distance too high (for example, 9), my code runs out of memory. I have profiled my code and dumped the heap to a file, but I don't know how to use it to optimize my code.

Can anyone help? I am more than willing to share the heap dump or try any other approach people may suggest.

- Edit -

Several people asked for more information in the comments. Since others may find it useful and it could get buried there, here it is:

  • I use a trie to store the dictionary words.

  • To save time, I do not precompute the Levenshtein distance table; I compute it as I go, keeping only two rows of the LD table in memory. Because a trie is a prefix tree, each time I descend to a child node the previous letters of the word (and therefore the distances already computed for them) stay unchanged. So I only compute the row for the new letter and reuse the previous row as is.

  • The suggestions I generate are stored in a HashSet. The LD table rows are stored in ArrayLists.
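The two-row idea from the second bullet can be shown on its own, outside the trie. A minimal sketch, assuming plain `String` inputs (the class name `TwoRowLevenshtein` is hypothetical): each outer iteration corresponds to descending one trie level, and only the previous and current rows are kept, as primitive `int[]` arrays.

```java
/** Hypothetical helper: Levenshtein distance keeping only two DP rows. */
final class TwoRowLevenshtein {
    /** Distance between candidate (rows, i.e. trie depth) and query (columns). */
    static int distance(String candidate, String query) {
        int n = query.length();
        int[] prev = new int[n + 1]; // row i-1 of the DP table
        int[] curr = new int[n + 1]; // row i of the DP table
        for (int j = 0; j <= n; j++) {
            prev[j] = j; // distance from "" to query[0..j)
        }
        for (int i = 1; i <= candidate.length(); i++) {
            curr[0] = i; // distance from candidate[0..i) to ""
            for (int j = 1; j <= n; j++) {
                int delete = prev[j] + 1;
                int insert = curr[j - 1] + 1;
                int swap = prev[j - 1] + (candidate.charAt(i - 1) == query.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(delete, Math.min(insert, swap));
            }
            // Swap the arrays instead of allocating a new row for every letter.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[n]; // after the last swap, prev holds the final row
    }

    public static void main(String[] args) {
        System.out.println(distance("misspelled", "misspeled"));
    }
}
```

In the trie itself the arrays cannot simply be swapped, because sibling branches all start from the same parent row; each recursion level therefore still needs its own row, but a primitive `int[]` costs far less than an `ArrayList<Integer>` of boxed values.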

Here is the Trie method that performs the search. Building the trie is pretty straightforward, so I haven't included that code here.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashSet;

    // Method of the trie node class; it relies on that class's fields
    // `boolean isWord` and `children` (26 child nodes for 'a'..'z'), not shown here.

    /**
     * @param letter      the letter currently being looked at in the trie
     * @param word        the word we are trying to find matches for
     * @param previousRow the previous row of the Levenshtein distance table
     * @param suggestions all the suggestions found so far for the given word
     * @param maxd        max distance a word can be from the query and still be returned as a suggestion
     * @param suggestion  the current suggestion being constructed
     */
    public void get(char letter, ArrayList<Character> word, ArrayList<Integer> previousRow,
                    HashSet<String> suggestions, int maxd, String suggestion) {

        // The new row of the Levenshtein table for this trie node.
        ArrayList<Integer> currentRow = new ArrayList<Integer>(word.size() + 1);
        currentRow.add(previousRow.get(0) + 1);

        for (int i = 1; i < word.size() + 1; i++) {
            int delete = currentRow.get(i - 1) + 1;
            int insert = previousRow.get(i) + 1;
            // Substitution costs nothing when the letters match.
            int swap = (word.get(i - 1) == letter) ? previousRow.get(i - 1) : previousRow.get(i - 1) + 1;
            currentRow.add(Math.min(delete, Math.min(insert, swap)));
        }

        // The last entry is the distance between the whole query and this node's prefix
        // (read from the row, so an empty query is handled correctly too).
        int d = currentRow.get(word.size());

        // If this node represents a word and the distance is <= maxd, add it as a suggestion.
        if (isWord && d <= maxd) {
            suggestions.add(suggestion);
        }

        // If any entry in the current row is <= maxd, a match may still exist further down,
        // so recursively search all the branches of the trie.
        if (Collections.min(currentRow) <= maxd) {
            for (int j = 0; j < 26; j++) {
                if (children[j] != null) {
                    char next = (char) ('a' + j);
                    children[j].get(next, word, currentRow, suggestions, maxd, suggestion + next);
                }
            }
        }
    }
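Since the problem is memory, one option is to keep the same algorithm but drop the boxed `ArrayList<Integer>` rows and the per-node `String` concatenation. A self-contained sketch, assuming a lowercase a-z dictionary; `LeanTrie`, `insert` and `search` are hypothetical names, not the poster's classes:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal trie over lowercase 'a'..'z' words. */
final class LeanTrie {
    private final LeanTrie[] children = new LeanTrie[26];
    private boolean isWord;

    void insert(String w) {
        LeanTrie node = this;
        for (int k = 0; k < w.length(); k++) {
            int idx = w.charAt(k) - 'a'; // assumes lowercase input
            if (node.children[idx] == null) {
                node.children[idx] = new LeanTrie();
            }
            node = node.children[idx];
        }
        node.isWord = true;
    }

    /** Returns all dictionary words within maxd edits of query, in alphabetical order. */
    List<String> search(String query, int maxd) {
        int[] firstRow = new int[query.length() + 1];
        for (int i = 0; i <= query.length(); i++) {
            firstRow[i] = i; // distance from the empty prefix to query[0..i)
        }
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int j = 0; j < 26; j++) {
            if (children[j] != null) {
                children[j].search((char) ('a' + j), query, firstRow, maxd, prefix, out);
            }
        }
        return out;
    }

    private void search(char letter, String query, int[] prevRow, int maxd,
                        StringBuilder prefix, List<String> out) {
        int n = query.length();
        int[] row = new int[n + 1]; // primitive row: no boxed Integer objects
        row[0] = prevRow[0] + 1;
        int rowMin = row[0];
        for (int i = 1; i <= n; i++) {
            int cost = (query.charAt(i - 1) == letter) ? 0 : 1;
            row[i] = Math.min(Math.min(row[i - 1] + 1, prevRow[i] + 1), prevRow[i - 1] + cost);
            rowMin = Math.min(rowMin, row[i]);
        }
        prefix.append(letter); // one shared buffer instead of a new String per node
        if (isWord && row[n] <= maxd) {
            out.add(prefix.toString()); // a String is created only for real matches
        }
        if (rowMin <= maxd) { // no longer word in this branch can beat rowMin
            for (int j = 0; j < 26; j++) {
                if (children[j] != null) {
                    children[j].search((char) ('a' + j), query, row, maxd, prefix, out);
                }
            }
        }
        prefix.setLength(prefix.length() - 1); // backtrack before returning to the parent
    }
}
```

This reduces the garbage produced per visited node, not the number of nodes visited: with a large `maxd` and a short query the pruning test rarely fails, so most of the trie is still explored.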
- Answer -

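Why a maximum distance of 9 ends in OutOfMemory:

  • step 1: for "ptmizing", inserting one letter (a to z) gives 9 * 26 candidates (i.e. 234) [9 positions, 26 letters]
  • step 2: doing the same for every candidate from step 1 gives 10 * 26 * 234 (60,840)
  • step 3: 11 * 26 * 60,840 = 17,400,240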
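The arithmetic above can be reproduced with a few lines. A sketch (the class `CandidateGrowth` is hypothetical) that counts, duplicates included, the strings produced by repeated one-letter insertions; `BigInteger` is used because a `long` overflows before step 9 is reached:

```java
import java.math.BigInteger;

/** Hypothetical helper reproducing the candidate-count estimate (insertions only). */
final class CandidateGrowth {
    /**
     * Counts the strings produced by inserting one letter (a-z) at every
     * position, applied `steps` times to a word of length `len`.
     * Duplicates are counted, as in the hand calculation.
     */
    static BigInteger countInsertions(int len, int steps) {
        BigInteger total = BigInteger.ONE;
        for (int s = 0; s < steps; s++) {
            long positions = len + s + 1L; // a word of length L has L + 1 insertion points
            total = total.multiply(BigInteger.valueOf(positions * 26));
        }
        return total;
    }

    public static void main(String[] args) {
        int len = "ptmizing".length(); // 8 letters, so 9 insertion points at step 1
        for (int steps = 1; steps <= 9; steps++) {
            System.out.println("step " + steps + ": " + countInsertions(len, steps));
        }
    }
}
```

Steps 1 to 3 give 234, 60,840 and 17,400,240; continuing to step 9 shows why materializing every candidate cannot fit in memory, even before deletions and substitutions are counted.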

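An alternative sketch: first collect candidate words that are in the dictionary, then compute the Levenshtein edit distance only for those candidates: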

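    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.SortedMap;
    import java.util.TreeMap;
    import org.junit.Test;

    @Test
    public void spellCheck() {
        final String src = "misspeled";
        final Set<String> validWords = new HashSet<String>();
        validWords.add("boing");
        validWords.add("Yahoo!");
        validWords.add("misspelled");
        validWords.add("stackoverflow");
        final List<String> candidates = findNonSortedCandidates( src, validWords );
        final SortedMap<Integer, List<String>> res = computeLevenhsteinEditDistanceForEveryCandidate(candidates, src);
        for ( final Map.Entry<Integer, List<String>> entry : res.entrySet() ) {
            System.out.println( entry.getValue() + " @ LED: " + entry.getKey() );
        }
    }

    // Groups candidates by edit distance; a Map<Integer,String> would silently
    // overwrite candidates that share the same distance.
    private SortedMap<Integer, List<String>> computeLevenhsteinEditDistanceForEveryCandidate(
            final List<String> candidates,
            final String misspelledWord
    ) {
        final SortedMap<Integer, List<String>> res = new TreeMap<Integer, List<String>>();
        for ( final String candidate : candidates ) {
            final int distance = dynamicProgrammingLED(candidate, misspelledWord);
            if ( !res.containsKey(distance) ) {
                res.put( distance, new ArrayList<String>() );
            }
            res.get(distance).add(candidate);
        }
        return res;
    }

    private int dynamicProgrammingLED( final String candidate, final String misspelledWord ) {
        // Levenhstein is a helper class (not shown) holding a standard
        // dynamic-programming Levenshtein implementation.
        return Levenhstein.getLevenshteinDistance(candidate, misspelledWord);
    }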

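The candidate generation itself (only the "add one letter" variant is implemented; removing and inverting letters are left commented out):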

    private List<String> findNonSortedCandidates( final String src, final Set<String> validWords ) {
        final List<String> res = new ArrayList<String>();
        res.addAll( allCombinationAddingOneLetter(src, validWords) );
//        res.addAll( allCombinationRemovingOneLetter(src) );
//        res.addAll( allCombinationInvertingLetters(src) );
        return res;
    }

    // Inserts every letter a..z at every position of src (including the end)
    // and keeps only the results that are valid words.
    private List<String> allCombinationAddingOneLetter( final String src, final Set<String> validWords ) {
        final List<String> res = new ArrayList<String>();
        for (char c = 'a'; c <= 'z'; c++) {             // '<=' so that 'z' is tried too
            for (int i = 0; i <= src.length(); i++) {   // i == src.length() appends at the end
                final String candidate = src.substring(0, i) + c + src.substring(i);
                if ( validWords.contains(candidate) ) {
                    res.add(candidate); // only adding candidates we know are valid words
                }
            }
        }
        return res;
    }
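The two generators left commented out in `findNonSortedCandidates` are not shown in the answer. A hypothetical sketch of what they could look like, written as static methods that take `validWords` like `allCombinationAddingOneLetter` does (the commented-out calls pass only `src`), and assuming "inverting" means swapping two adjacent letters:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical versions of the two commented-out candidate generators. */
final class MoreCandidates {
    /** Valid words obtained by deleting one letter of src (may contain duplicates, e.g. for doubled letters). */
    static List<String> allCombinationRemovingOneLetter(String src, Set<String> validWords) {
        List<String> res = new ArrayList<>();
        for (int i = 0; i < src.length(); i++) {
            String candidate = src.substring(0, i) + src.substring(i + 1);
            if (validWords.contains(candidate)) {
                res.add(candidate);
            }
        }
        return res;
    }

    /** Valid words obtained by swapping two adjacent letters of src. */
    static List<String> allCombinationInvertingLetters(String src, Set<String> validWords) {
        List<String> res = new ArrayList<>();
        for (int i = 0; i + 1 < src.length(); i++) {
            char[] chars = src.toCharArray();
            char tmp = chars[i];
            chars[i] = chars[i + 1];
            chars[i + 1] = tmp;
            String candidate = new String(chars);
            if (validWords.contains(candidate)) {
                res.add(candidate);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        Set<String> validWords = new HashSet<>(Arrays.asList("misspelled"));
        System.out.println(allCombinationRemovingOneLetter("misspelledx", validWords));
        System.out.println(allCombinationInvertingLetters("msispelled", validWords));
    }
}
```

Each generator only reaches words one operation away; larger distances would require applying them repeatedly, which brings back the growth estimated earlier. Note also that plain Levenshtein distance counts an adjacent swap as two edits, so a swapped candidate will be reported with LED 2.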
- Answer -

  • Once profiling has narrowed the problem down to a specific part of your code and you still cannot understand why there are so many $FOO objects in memory, post the relevant fragment.
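Before opening a heap dump, a crude way to find which step is expensive is to compare the used heap before and after it. A sketch using only `java.lang.Runtime` (the class `HeapProbe` and the loop standing in for the suspect step are hypothetical); the numbers are approximate, since the garbage collector can run at any time:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical probe: rough used-heap measurement around a suspect step. */
final class HeapProbe {
    /** Currently used heap in bytes (allocated minus free), as reported by the JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            kept.add("candidate" + i); // stand-in for the step under suspicion
        }
        long after = usedHeapBytes();
        System.out.println("change in used heap: " + (after - before)
                + " bytes for " + kept.size() + " retained strings");
    }
}
```

A large jump pinpoints where to look in the profiler; a GC between the two readings can make the difference smaller than the real allocation, or even negative.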

Source: https://habr.com/ru/post/1528242/

