Tesseract-ocr how to enable baseapi.h

I followed the instructions I found on the Tesseract forum on how to enable baseapi.h.

I use:

VS2010
Tesseract 3.01 version

I am trying to understand how to use baseapi.h.

test program:

    #define __MSW32__
    #include "baseapi.h"

    using namespace tesseract;

    int _tmain(int argc, _TCHAR* argv[])
    {
        TessBaseAPI *myTestApi;
        myTestApi = new TessBaseAPI();
        //myTestApi->Init("d:/temp.jpg","eng");
        return 0;
    }

Forum guide:

Add the following folders to Additional Include Directories (project properties), so that the files pulled in by baseapi.h can be found:

tesseract-3.01/api

tesseract-3.01/ccmain

tesseract-3.01/ccutil

tesseract-3.01/ccstruct

Add the following libraries to Properties / Linker / Input / Additional Dependencies to link against the Tesseract and Leptonica libraries:

libtesseract.lib; liblept.lib

Add the following paths to Properties / Linker / General / Additional Library Directories so that the linker can find the Tesseract and Leptonica libraries:

tesseract-3.01/vs2010/Release

tesseract-3.01/vs2008/lib

Then I tried to build.

So I looked for libtesseract.lib, substituted libtesseract_tessopt.lib for it, and built again:

    1>------ Build started: Project: test4, Configuration: Debug Win32 ------
    1>  test4.cpp
    1>test4.obj : error LNK2019: unresolved external symbol "public: __thiscall tesseract::TessBaseAPI::TessBaseAPI(void)" (??0TessBaseAPI@tesseract@@QAE@XZ) referenced in function _wmain
    1>c:\users\eran0708\documents\visual studio 2010\Projects\test4\Debug\test4.exe : fatal error LNK1120: 1 unresolved externals
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

Is there a solution to the problem?

thanks,

eran

2 answers

Here is what I did to compile it:

1.) Copy all the header files into one include directory, so that later you only need to add $(TESS_DIR)\include to the include directories.

Copy the Leptonica headers to $(TESS_DIR)\include\leptonica.

2.) Open vs2010\tesseract.sln and compile all the configurations. Then copy all the lib files to $(TESS_DIR)\lib\debug and $(TESS_DIR)\lib\release . Then add these directories to your build settings.

3.) Copy the compiled libtesseract.dll and liblept168.dll, as well as the tessdata folder containing eng.traineddata, into the Release folder of your project.

4.) Add these libraries as additional dependencies:

    libtesseract.lib
    liblept168.lib

5.) #include <baseapi.h>


I found that if you use Visual Studio 2010 with Windows Forms and the designer, you can add Tesseract this way without any problems.

1) Add the following projects to your solution (I'll warn you once: do not add the Tesseract solution itself, and do not change any settings in the projects you add, unless you love to hate yourself): ccmain, ccstruct, ccutil, classify, cutil, cube, dict, image, libtesseract, neural_networks, textord, viewer, wordrec.

You can add the others, but you really don't want all of that built into your project, do you? Nah, build those separately.

2) Go to your project's properties and add libtesseract as a reference, which you can do now that it is visible as a project (Common Properties → Add New Reference). This lets your project build quickly without wading through the millions of warnings inside Tesseract.

3) Right-click your project in Solution Explorer and select Project Dependencies. Make sure it depends on libtesseract, or even on all of the added projects; this just means they are built before your project.

4) The Tesseract Visual Studio 2010 projects contain several configurations: Release, Release.dll, Debug and Debug.dll. The Release.dll configuration seems to be the one that produces the right files. First, set the solution configuration to Release.dll. Open your project's properties and click Configuration Manager. If that is not available, open the SOLUTION properties from the solution tree and click the Configuration tab; you will see a list of projects and the configuration each one uses. You will notice that your own project is not set to Release.dll, even though the solution is. If you took this second route you still need to click Configuration Manager. There you can edit the settings: click New under your project's configuration, name it Release.dll, exactly like the rest, and copy the settings from Release. Do the same for Debug, so that you have a Debug.dll configuration copied from the Debug settings. Phew... almost done.

5) Do not try to change Tesseract's settings to match yours. It will not work, and when a new release comes out you will not be able to just drop it in and go. Accept the fact that in this setup your new build modes are Release.dll and Debug.dll. Don't stress: you can come back when everything is finished and remove the projects from your solution.

6) Guess where the libraries and DLLs end up? In your project, so you may or may not need to add library directories. Some people say you should dump all the headers into one folder so that only one include directory is needed, but not me. I want to be able to delete the Tesseract folder and restore it from the ZIP files without any extra work, and be ready to upgrade in one step or roll back if I make a mess of the code. It is a bit of work, and you can do it in code rather than in the settings (which is how I do it), but you should add all the folders that contain header files inside the Tesseract 2010 project folder to your include directories and leave them alone.

7) There is no need to add any files to your project, just these lines of code. I have included some additional code that converts an image from another library into a format Tesseract accepts (a Leptonica Pix) without having to save and reload a file. Aren't I nice?

8) Now you can debug fully in both Debug.dll and Release.dll. Once everything works in your project you can even remove all the added projects, with no additional compilation and no errors. Fully debuggable, all natural.

9) If I remember correctly, I could not get around having to copy the files in vs2008/lib/ into my project's Release folder... darn.

In my project's "functions.h" I put:

    #pragma comment (lib, "liblept.lib" )
    #define _USE_TESSERACT_
    #ifdef _USE_TESSERACT_
    #pragma comment (lib, "libtesseract.lib" )
    #include <baseapi.h>
    #endif
    #include <allheaders.h>

In my main project, I put this in a class as a member:

    tesseract::TessBaseAPI *readSomeNombers;

And of course I included "functions.h" somewhere.

Then I put this in my class constructor:

    readSomeNombers = new tesseract::TessBaseAPI();
    readSomeNombers->Init( NULL, "eng" );
    readSomeNombers->SetVariable( "tessedit_char_whitelist", "0123456789,." );

Then I created this member function, along with a class member to hold the output. Don't hate: I don't like returning variables, it's not my style. I originally believed the memory for the Pix did not need to be destroyed when used inside a member function like this, and my tests ran without problems. However, pixCreate() allocates a new image on every call, and GetUTF8Text() returns a buffer that the caller must free with delete [], so the version here releases both. But by all means, do it however you like.

    void Gaara::scanTheSpot()
    {
        Pix *someNewPix;
        char* outText;
        ostringstream tempStream;

        someNewPix = pixCreate( 200, 40, 32 );
        convertEasyBmpToPix( &scanImage, someNewPix, 87, 42 );
        readSomeNombers->SetImage( someNewPix );
        outText = readSomeNombers->GetUTF8Text();

        tempStream.str("");
        if( outText != NULL )
            tempStream << outText;
        classMemeberVariable = tempStream.str();
        //pixWrite( "test.bmp", someNewPix, IFF_BMP );

        delete [] outText;          // GetUTF8Text() hands ownership to the caller
        pixDestroy( &someNewPix );  // release the image created by pixCreate()
    }
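Since the text buffer returned by GetUTF8Text() belongs to the caller, one way to avoid forgetting the delete [] is to hand it to a scoped owner straight away. The following is only a minimal sketch of that idea: fakeGetUTF8Text() and takeText() are hypothetical names invented for illustration, and the first one merely stands in for the real call so that the snippet compiles without Tesseract.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for TessBaseAPI::GetUTF8Text(): it returns a new[]
// buffer that the caller owns and must release with delete [].
char* fakeGetUTF8Text(const char* recognized) {
    assert(recognized != nullptr);
    char* buffer = new char[std::strlen(recognized) + 1];
    std::strcpy(buffer, recognized);
    return buffer;
}

// Hypothetical helper: copies the text into a std::string. The
// unique_ptr<char[]> calls delete [] on the buffer when it goes out of
// scope, including on early return. A null buffer yields an empty string.
std::string takeText(char* ownedBuffer) {
    std::unique_ptr<char[]> guard(ownedBuffer);
    return guard ? std::string(guard.get()) : std::string();
}
```

With the real API, the call would presumably look like takeText( readSomeNombers->GetUTF8Text() ), after which no manual delete [] is needed.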

The object holding the image I want to scan is already in memory, and &scanImage points to it. It comes from the EasyBMP library, but that doesn't matter.

Here is what I have in "functions.h" / "functions.cpp". By the way, I do a little extra processing while I'm in the loop: thinning the characters, converting them to black and white, and inverting black and white, which is cool. At this stage of development I am still looking for ways to improve recognition, although for my purposes it has not produced bad data yet. My opinion is to use the default Tesseract data for simplicity. I am taking a heuristic approach to a very complex problem.

    bool isWhite( RGBApixel *image );  // declared before its first use

    void convertEasyBmpToPix( BMP *sourceImage, PIX *outputImage, unsigned startX, unsigned startY )
    {
        int endX = startX + ( pixGetWidth( outputImage ) );
        int endY = startY + ( pixGetHeight( outputImage ) );
        unsigned destinationX;
        unsigned destinationY = 0;

        for( int yLoop = startY; yLoop < endY; yLoop++ )
        {
            destinationX = 0;
            for( int xLoop = startX; xLoop < endX; xLoop++ )
            {
                // Copy the pixel first: taking the address of the temporary
                // returned by GetPixel() is not standard C++.
                RGBApixel pixel = sourceImage->GetPixel( xLoop, yLoop );

                // Invert while copying: light pixels become black, dark ones white.
                if( isWhite( &pixel ) )
                {
                    pixSetRGBPixel( outputImage, destinationX, destinationY, 0,0,0 );
                }
                else
                {
                    pixSetRGBPixel( outputImage, destinationX, destinationY, 255,255,255 );
                }
                destinationX++;
            }
            destinationY++;
        }
    }

    // A pixel counts as white only if no channel is below 50.
    bool isWhite( RGBApixel *image )
    {
        if( ( image->Red < 50 ) || ( image->Blue < 50 ) || ( image->Green < 50 ) )
        {
            return false;
        }
        else
        {
            return true;
        }
    }
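The threshold-and-invert rule used in the loop above can also be tried on its own, without EasyBMP or Leptonica. The sketch below is only an illustration under that assumption: Rgb, isLight() and thresholdAndInvert() are hypothetical names, and the result is a plain byte buffer rather than a Pix.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RGB pixel, so the sketch needs no image library.
struct Rgb {
    unsigned char red, green, blue;
};

// Same rule as isWhite() above: a pixel counts as light only if every
// channel is at least the threshold (50 in the original code).
bool isLight(const Rgb& p, unsigned char threshold = 50) {
    return p.red >= threshold && p.green >= threshold && p.blue >= threshold;
}

// Returns one byte per input pixel: 0 (black) for light input and
// 255 (white) for dark input, i.e. thresholded with black and white swapped.
std::vector<unsigned char> thresholdAndInvert(const std::vector<Rgb>& pixels) {
    std::vector<unsigned char> out;
    out.reserve(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        out.push_back(isLight(pixels[i]) ? 0 : 255);
    }
    return out;
}
```

Keeping the rule in a small pure function like this makes it easy to experiment with a different threshold before touching the conversion loop.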

One thing I don't like is the way I declare the size of the Pix outside the conversion function. It seems that if I try to do it inside the function I get unexpected results, as if memory allocated inside is destroyed when I leave.

Certainly not my most elegant work, but I stripped it down for simplicity. Why I'm sharing this, I don't know; I should have kept it to myself. What's my name? Kage.Sabaku.No.Gaara

Before I let you go, I should mention one subtle difference between my Windows Forms application and the default settings: I use the "multi-byte" character set (project properties and such). Give a dog a bone, maybe a vote?

P.P.S. I hate to say this, but I made one change to host.h. If you use 64-bit, you can do the same; otherwise you are on your own. My reason was a little crazy, though, and you probably don't need it:

    typedef unsigned int uinT32;
    #if (_MSC_VER >= 1200)            //%%% vkr for VC 6.0
    typedef _int64 inT64;
    typedef unsigned _int64 uinT64;
    #else
    typedef long long int inT64;
    typedef unsigned long long int uinT64;
    #endif                            //%%% vkr for VC 6.0
    typedef float FLOAT32;
    typedef double FLOAT64;
    typedef unsigned char BOOL8;
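For comparison, the same fixed-width names can be built on the standard <cstdint> types rather than compiler-specific ones. This is only a sketch, assuming a compiler that ships <cstdint> and static_assert; it is not what Tesseract's own header does, and squareOf() is a hypothetical helper added purely to exercise the types.

```cpp
#include <cstdint>

// Aliases mirroring the names in the snippet above, expressed with the
// standard fixed-width integer types.
typedef std::uint32_t uinT32;
typedef std::int64_t  inT64;
typedef std::uint64_t uinT64;

// Compile-time checks that each alias has the width its name promises.
static_assert(sizeof(uinT32) == 4, "uinT32 must be 32 bits");
static_assert(sizeof(inT64) == 8, "inT64 must be 64 bits");
static_assert(sizeof(uinT64) == 8, "uinT64 must be 64 bits");

// Hypothetical helper: squares a 32-bit value in 64-bit arithmetic, so the
// result cannot overflow.
inline uinT64 squareOf(uinT32 value) {
    return static_cast<uinT64>(value) * value;
}
```

The static_assert lines make a wrong width a build error instead of a silent bug, which is useful when the same code is built for both 32-bit and 64-bit targets.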

Source: https://habr.com/ru/post/1381501/

