Position when comparing and extracting text

I extract text from a PDF with iTextSharp and convert it to HTML so that I can compare two PDF files for their style. To align the text, I add a left and a top position to each piece of text. But as soon as I hit superscript text, the position of the text is off. When I check in Firebug, the left and top positions are the same as the ones I see in Adobe Illustrator. Can anyone tell me why this is happening? In the example I posted, the number 7 should sit right next to the "." that ends the sentence, but it is rendered far away from it.

    // Start of the baseline and end of the ascent line of the current text chunk
    Vector curBaseline = renderInfo.GetBaseline().GetStartPoint();
    Vector topRight = renderInfo.GetAscentLine().GetEndPoint();
    // PDF y runs bottom-up, HTML top runs top-down, so flip the y coordinate
    y_direction_source = Form1.Pagesize_source + (height_extract_source_page - curBaseline[Vector.I2]);
    this.result.AppendFormat("<p style=\"left:{0}pt;top:{1}pt;\">", curBaseline[Vector.I1], y_direction_source);

result:

<p style="font-family:TimesNewRoman;font-size:12.2618001271429pt;font-weight:;font-style:;left:42pt;top:120.2399pt;position:absolute;">
  <p style="background: cyan">training</p>
  <p style="background: cyan">camps</p>
  <pp style="background: cyan">in</p>
  <pp style="background: cyan">Afghanistan</p>
  <pp style="background: cyan">.</p>
</p>
<pp style="font-family:TimesNewRoman;font-size:10.2386067682737pt;font-weight:;font-style:;left:441.48pt;top:114.72pt;position:absolute;">
  <pp style="background: cyan">7</p>
</p>
1 answer

I took your code and ran it in Chrome. With an absolute offset of 42, I get a much larger gap between the period and the number 7 on my end. I also had to repair some of the <p> and <pp> tags to get a similar result.

The browser needs 182 pixels to render "training camps in Afghanistan." but the difference in position in your sample is about 302. Since 302 - 42 = 260, could it be that the parent has an offset of 260?

My assumption, based on the code shown here, is that the line "training camps in Afghanistan." is positioned inline inside another element, whereas the 7 is not. That is not a problem in itself, but the extra offset makes the gap harder to reason about.

On the other hand, this also explains why it can happen. There is no such thing as superscript or subscript as such. To get the effect, we scale the font down and print the text at an offset. The PDF you are converting is no different.
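
To make that concrete, here is a minimal sketch of how a superscript could be recognised from nothing more than font size and baseline. It is not iTextSharp code: the `Chunk` class, the `IsSuperscript` helper and the 0.9 size-ratio threshold are assumptions made up for illustration, and baselines are taken in PDF coordinates, where y grows upwards.

```csharp
using System;

public static class SuperscriptCheck
{
    // Hypothetical container for what a render listener might report per chunk.
    public sealed class Chunk
    {
        public Chunk(double baselineY, double fontSize)
        {
            BaselineY = baselineY;
            FontSize = fontSize;
        }

        public double BaselineY { get; }
        public double FontSize { get; }
    }

    // A chunk is treated as superscript when it is set noticeably smaller than
    // the previous chunk and its baseline is raised by less than one line.
    // The 0.9 default is an assumed threshold, not a value from any standard.
    public static bool IsSuperscript(Chunk previous, Chunk current, double maxSizeRatio = 0.9)
    {
        bool smaller = current.FontSize < previous.FontSize * maxSizeRatio;
        double rise = current.BaselineY - previous.BaselineY; // PDF y grows upwards
        bool raised = rise > 0 && rise < previous.FontSize;
        return smaller && raised;
    }
}
```

With the values from the question (font sizes of roughly 12.26pt and 10.24pt, and a 7 that sits about 5.52pt higher than the line), this check would flag the 7 as superscript.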

As a result, the 7 is positioned separately and is written in a different font size from the rest of the text. I don't know the details of how your original PDF is built, but the effect you see may come down to a difference in margins.

In a PDF you can set several font-related parameters, for example word spacing or the amount of indentation. I would guess that your normal line is offset by a left margin on the page while the 7 is positioned absolutely, or that the font used has some special settings that make the sentence wider.

Since you already have Illustrator, you can check whether the ".", rather than the 7, is located in the same place. I assume it is not, and that the 7 is correct. It only looks odd; in fact it is all the other plain text that is positioned differently in your HTML.

What you can do is position each word absolutely, not just each line. That would compensate for any differences in fonts, typesetting parameters, browsers, or other effects.
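
A rough sketch of that idea follows. It assumes you have already collected one record per word with its left and top position in points; the `Word` class and the `Render` helper are hypothetical names, and how the per-word coordinates are obtained from the PDF is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text;

public static class WordHtml
{
    // Hypothetical per-word record: text plus CSS left/top in points.
    public sealed class Word
    {
        public Word(string text, double left, double top)
        {
            Text = text;
            Left = left;
            Top = top;
        }

        public string Text { get; }
        public double Left { get; }
        public double Top { get; }
    }

    // Emits one absolutely positioned <span> per word, so that no word's
    // position depends on how wide the browser renders the words before it.
    public static string Render(IEnumerable<Word> words)
    {
        var html = new StringBuilder();
        foreach (Word w in words)
        {
            // InvariantCulture keeps "." as the decimal separator, which CSS requires.
            html.AppendFormat(
                CultureInfo.InvariantCulture,
                "<span style=\"position:absolute;left:{0}pt;top:{1}pt;\">{2}</span>",
                w.Left, w.Top, WebUtility.HtmlEncode(w.Text));
            html.Append('\n');
        }
        return html.ToString();
    }
}
```

Because every span carries its own coordinates, the width the browser gives to "training camps in Afghanistan." no longer influences where the 7 ends up.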


Source: https://habr.com/ru/post/1479519/

