The translate method of Unicode objects is the simplest and fastest way to perform the transliteration you need. (I assume you are using Unicode rather than plain byte strings, which would make it impossible to have characters such as 'पत्र'!)
All you have to do is lay out your transliteration dict precisely, as specified in the docs I pointed you to:
each key must be an integer, the code point of a Unicode character; for example, 0x0904 is the code point for ऄ, AKA "DEVANAGARI LETTER SHORT A", so to transliterate it you would use the integer 0x0904 (equivalently, decimal 2308) as the key in the dict. (For a table of the code points for many South Asian scripts, see this pdf.)
the corresponding value can be a Unicode ordinal, a Unicode string (presumably what you will use for your transliteration task, for example u'a' if you want to transliterate the Devanagari letter short A into the English letter 'a'), or None (if during the "transliteration" you simply want to remove instances of that Unicode character).
Characters that are not found as keys in the dict are passed intact from input to output.
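To make the three kinds of values concrete, here is a minimal sketch. It assumes Python 3, where str is already Unicode and the u prefix is optional. The mapping itself is hypothetical and chosen only to show each value type; it is not a real transliteration scheme.

```python
# Hypothetical table for illustration only; not a real transliteration scheme.
table = {
    0x0904: "a",     # DEVANAGARI LETTER SHORT A -> a Unicode string
    0x0966: 0x0030,  # DEVANAGARI DIGIT ZERO -> the ordinal of ASCII "0"
    0x094D: None,    # DEVANAGARI SIGN VIRAMA -> removed from the output
}

# U+092A (DEVANAGARI LETTER PA) is not a key, so it passes through untouched.
sample = "\u0904\u0966\u094D\u092A"
result = sample.translate(table)
```

With this table, result holds 'a', '0' and the untouched U+092A, and the virama is gone.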
Once your dict is laid out this way, output_text = input_text.translate(thedict) does all the transliteration for you, and pretty darn fast too. You can apply this to blocks of Unicode text of any size that fits comfortably in memory; basically, doing one text file at a time will be just fine on most machines (for example, the wonderful, and huge, Mahabharata takes at most a few tens of megabytes in any of the freely downloadable forms, namely Sanskrit [[cross-linked with both Devanagari and Latin-transliterated forms]] and English translation, available from this site).
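As a rough sketch of the one-file-at-a-time approach, something like the following could work. It assumes Python 3 and UTF-8 encoded files; the helper name transliterate_file, the file names and the one-entry table are all made up for the demo.

```python
import os
import tempfile


def transliterate_file(src_path, dst_path, table, encoding="utf-8"):
    """Read src_path whole, translate it with `table`, write it to dst_path."""
    with open(src_path, encoding=encoding) as src:
        text = src.read()  # the whole file is held in memory at once
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write(text.translate(table))


# Demo on a throwaway file with a hypothetical one-entry table.
table = {0x0904: "a"}
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.txt")
    dst_path = os.path.join(tmp, "out.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("\u0904 \u0904\n")
    transliterate_file(src_path, dst_path, table)
    with open(dst_path, encoding="utf-8") as f:
        demo_output = f.read()
```

If a file were ever too large for memory, the same table could be applied line by line instead, since translate works on each character independently.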