One way to achieve this is to use the Vision framework to detect the text in your image, which gives you a list of rectangles. Then use a neural network that learns to recognize text on each of these rectangles. You can use Vision to control Core ML, but you still need to provide it with a suitable neural network. You can find pre-prepared networks for this on the Internet, but you will need to convert them to Core ML using tools provided by Apple.
source
share