If you are looking for a table that allows you to calculate the βreplacement costβ based on visual similarities, I have been looking for such a thing for a while with little success, so I started looking at it as a new problem. I do not work with OCR, but I am looking for a way to limit the search parameters in the probabilistic search for characters that are incorrectly typed . Since they are mistakenly introduced because the person visually confused the characters, the same principle should apply to you.
My approach was to classify letters based on their impact components in an 8-bit field. bit - from left to right:
7: Left Vertical 6: Center Vertical 5: Right Vertical 4: Top Horizontal 3: Middle Horizontal 2: Bottom Horizontal 1: Top-left to bottom-right stroke 0: Bottom-left to top-right stroke
For lowercase characters, the descriptors on the left are written in bit 1, and the descriptors on the right in bit 0, as diagonals.
With this scheme, I came up with the following meanings, which try to rank the characters according to visual similarity.
m: 11110000: F0 g: 10111101: BD S,B,G,a,e,s: 10111100: BC R,p: 10111010: BA q: 10111001: B9 P: 10111000: B8 Q: 10110110: B6 D,O,o: 10110100: B4 n: 10110000: B0 b,h,d: 10101100: AC H: 10101000: A8 U,u: 10100100: A4 M,W,w: 10100011: A3 N: 10100010: A2 E: 10011100: 9C F,f: 10011000: 98 C,c: 10010100: 94 r: 10010000: 90 L: 10000100: 84 K,k: 10000011: 83 T: 01010000: 50 t: 01001000: 48 J,j: 01000100: 44 Y: 01000011: 43 I,l,i: 01000000: 40 Z,z: 00010101: 15 A: 00001011: 0B y: 00000101: 05 V,v,X,x: 00000011: 03
This, so to speak, is too primitive for my purposes and requires more work. You can use it, however, or perhaps tailor it to suit your needs. The scheme is pretty simple. This rating is for monospatial fonts. If you use the sans-serif font, then you will probably have to redo the values.
This table is a hybrid table that includes all the characters, lower and upper case, but if you divide it only into upper case and only in lower case, this may turn out to be more efficient, and this will also allow you to apply a specific penalty case.
Keep in mind that these are early experiments. If you see a way to improve it (for example, by changing the sequence of bits), feel free to do it.