Find input anagram on multiple lines ..?

Given a set of strings (large set) and an input string, you need to find all the anagrams of the input string efficiently. What data structure will you use. And using this, how do you find anagrams?

The things I thought about are the following:

  • Use of cards

    a) exclude all words with more / less letters than the entry.

    b) enter the input characters in the card

    c) Go through the map for each line and see if all the letters with their counter are present.

  • Using Tries

    a) Put all lines that have the correct number of characters in trie.

    b) move each branch and go deeper if the letter is in the entrance.

    c) if the leaf reaches the word, this is an anagram

Can anyone find a better solution?

Are there any problems you find in the above approaches?

+4
source share
3 answers

Create a frequency map from each word and compare these cards.

Pseudocode:

class Word string word map<char, int> frequency Word(string w) word = w for char in word int count = frequency.get(char) if count == null count = 0 count++ frequency.put(char, count) boolean is_anagram_of(that) return this.frequency == that.frequency 
+5
source

You can create a hash map where the key is sorted (word), and the value is a list of all words that have been sorted, give the corresponding key:

 private Map<String, List<String>> anagrams = new HashMap<String, List<String>>(); void buildIndex(){ for(String word : words){ String sortedWord = sortWord(word); if(!anagrams.containsKey(sortedWord)){ anagrams.put(sortedWord, new ArrayList<String>()); } anagrams.get(sortedWord).add(word); } } 

Then you just look at the sorted word in the hashmap that you just created and you will have a list of all anagrams.

+4
source
 import java.util.HashMap; import java.util.HashSet; import java.util.Map; import java.util.Set; /* *Program for Find Anagrams from Given A string of Arrays. * *Program Maximum Time Complexity is O(n) + O(klogk), here k is the length of word. * * By removal of Sorting, Program Complexity is O(n) * **/ public class FindAnagramsOptimized { public static void main(String[] args) { String[] words = { "gOd", "doG", "doll", "llod", "lold", "life", "sandesh", "101", "011", "110" }; System.out.println(getAnaGram(words)); } // Space Complexity O(n) // Time Complexity O(nLogn) static Set<String> getAnaGram(String[] allWords) { // Internal Data Structure for Keeping the Values class OriginalOccurence { int occurence; int index; } Map<String, OriginalOccurence> mapOfOccurence = new HashMap<>(); int count = 0; // Loop Time Complexity is O(n) // Space Complexity O(K+2K), here K is unique words after sorting on a for (String word : allWords) { String key = sortedWord(word); if (key == null) { continue; } if (!mapOfOccurence.containsKey(key)) { OriginalOccurence original = new OriginalOccurence(); original.index = count; original.occurence = 1; mapOfOccurence.put(key, original); } else { OriginalOccurence tempVar = mapOfOccurence.get(key); tempVar.occurence += 1; mapOfOccurence.put(key, tempVar); } count++; } Set<String> finalAnagrams = new HashSet<>(); // Loop works in O(K), here K is unique words after sorting on // characters for (Map.Entry<String, OriginalOccurence> anaGramedWordList : mapOfOccurence.entrySet()) { if (anaGramedWordList.getValue().occurence > 1) { finalAnagrams.add(allWords[anaGramedWordList.getValue().index]); } } return finalAnagrams; } // Array Sort works in O(nLogn) // Customized Sorting for only chracter works in O(n) time. private static String sortedWord(String word) { // int[] asciiArray = new int[word.length()]; int[] asciiArrayOf26 = new int[26]; // char[] lowerCaseCharacterArray = new char[word.length()]; // int characterSequence = 0; // Ignore Case Logic written in lower level for (char character : word.toCharArray()) { if (character >= 97 && character <= 122) { // asciiArray[characterSequence] = character; if (asciiArrayOf26[character - 97] != 0) { asciiArrayOf26[character - 97] += 1; } else { asciiArrayOf26[character - 97] = 1; } } else if (character >= 65 && character <= 90) { // asciiArray[characterSequence] = character + 32; if (asciiArrayOf26[character + 32 - 97] != 0) { asciiArrayOf26[character + 32 - 97] += 1; } else { asciiArrayOf26[character + 32 - 97] = 1; } } else { return null; } // lowerCaseCharacterArray[characterSequence] = (char) // asciiArray[characterSequence]; // characterSequence++; } // Arrays.sort(lowerCaseCharacterArray); StringBuilder sortedWord = new StringBuilder(); int asciiToIndex = 0; // This Logic uses for reading the occurrences from array and copying // back into the character array for (int asciiValueOfCharacter : asciiArrayOf26) { if (asciiValueOfCharacter != 0) { if (asciiValueOfCharacter == 1) { sortedWord.append((char) (asciiToIndex + 97)); } else { for (int i = 0; i < asciiValueOfCharacter; i++) { sortedWord.append((char) (asciiToIndex + 97)); } } } asciiToIndex++; } // return new String(lowerCaseCharacterArray); return sortedWord.toString(); } } 
0
source

Source: https://habr.com/ru/post/1392463/


All Articles