From what I understand, you have the input string S in the alphabet (allows you to call it A1), and you want to convert it to the string S ', which is its equivalent in the other alphabet A2. Actually, if I understand correctly, you want to generate a list of [S'1, S'2, ..., S'n] output lines that could potentially be equivalent to S.
One approach that comes to mind is that each word in the list of valid words in A2 generates a list of strings in A1 that matches. Using the example in your editing, we have
px->ac qy->ac pr->ab
(clarity for clarity added the extra valid word pr
)
Now that we know which possible series of input characters will always be mapped to a real word, we can use our table to create a Trie .
Each node will contain a pointer to a list of valid words in A2 that map to a sequence of characters in A1 that form the path from the Trie root to the current node.
So for our example, Trie would look something like this:
Root (empty) | a | V +---Node (empty)---+ | b | c | | VV Node (px,qy) Node (pr)
Starting from the root of the node, when characters are consumed, transitions are made from the current node to its child element, marked with the character consumed until we read the entire line. If a transition is not defined for this symbol at any point, the string entered does not exist in our trie and thus does not map to a valid word in our target language. Otherwise, at the end of the process, the list of words associated with the current node is a list of valid words onto which the input line is entered.
Besides the initial cost of building a trie (a trie can be sent in advance if we never want the list of valid words to be changed), it takes O (n) from the input length to find a display list of actual words.
Using Trie also provides the advantage that you can also use it to find a list of all valid words that can be generated by adding more characters to the end of the input ā that is, a prefix match. For example, if you supply the input character "a", we can use trie to search for all valid words that can begin with "a" ('px', 'qr', 'py'). But doing this is not as fast as finding an exact match.
Here's a quick hack solution (in Java):
import java.util.*; class TrieNode{ // child nodes - size of array depends on your alphabet size, // her we are only using the lowercase English characters 'a'-'z' TrieNode[] next=new TrieNode[26]; List<String> words; public TrieNode(){ words=new ArrayList<String>(); } } class Trie{ private TrieNode root=null; public void addWord(String sourceLanguage, String targetLanguage){ root=add(root,sourceLanguage.toCharArray(),0,targetLanguage); } private static int convertToIndex(char c){ // you need to change this for your alphabet return (c-'a'); } private TrieNode add(TrieNode cur, char[] s, int pos, String targ){ if (cur==null){ cur=new TrieNode(); } if (s.length==pos){ cur.words.add(targ); } else{ cur.next[convertToIndex(s[pos])]=add(cur.next[convertToIndex(s[pos])],s,pos+1,targ); } return cur; } public List<String> findMatches(String text){ return find(root,text.toCharArray(),0); } private List<String> find(TrieNode cur, char[] s, int pos){ if (cur==null) return new ArrayList<String>(); else if (pos==s.length){ return cur.words; } else{ return find(cur.next[convertToIndex(s[pos])],s,pos+1); } } } class MyMiniTransliiterator{ public static void main(String args[]){ Trie t=new Trie(); t.addWord("ac","px"); t.addWord("ac","qy"); t.addWord("ab","pr"); System.out.println(t.findMatches("ac")); // prints [px,qy] System.out.println(t.findMatches("ab")); // prints [pr] System.out.println(t.findMatches("ba")); // prints empty list since this does not match anything } }
This is a very simple program, without compression or acceleration, and works only on lowercase English characters for the input language. But it can be easily modified for other character sets.