If you are using a pre-trained model, you need to save this output and enter the images into the character recognition network if a neural network or other approach is used.
What you do is "scene text recognition". You can check the reading of text in the wild using convolutional neural networks , paper , here a demo and the home page . Github chongyangtao has a list of related resources.
source share