How to handle non-ASCII characters in Java using PDPageContentStream / PDDocument

I use PDFBox to create PDFs from my web application. The web application is built in Java and uses JSF. It takes content from a web form and places that content in a PDF document.

Example: the user fills in an inputTextArea (a JSF tag) in the form, and the content is converted to PDF. I cannot handle non-ASCII characters.

How do I handle non-ASCII characters, or at least strip them out before putting them into the PDF file? Any suggestions or pointers to resources would be appreciated. Thanks!

1 answer

Since you are using JSF on JSP instead of Facelets (which already implicitly uses UTF-8), follow these steps to avoid falling back to the platform default encoding (often ISO-8859-1, which is the wrong choice for handling most non-ASCII characters):

  • Add the following line on top of all JSPs:

    <%@ page pageEncoding="UTF-8" %>
    

    This sets the response encoding to UTF-8 and sets the charset in the Content-Type header of the HTTP response to UTF-8. The latter tells the client (the web browser) to display the page, and to submit its forms, using UTF-8.

  • Create a Filter that does the following in its doFilter() method:

    request.setCharacterEncoding("UTF-8");
    

    Map this filter to the FacesServlet in web.xml as follows:

    <filter-mapping>
        <filter-name>nameOfYourCharacterEncodingFilter</filter-name>
        <servlet-name>nameOfYourFacesServlet</servlet-name>
    </filter-mapping>
    

    This sets the request encoding of all JSF POST requests to UTF-8.
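A minimal sketch of such a filter, assuming the javax.servlet API (the class name is illustrative; register it in web.xml under the filter name you used in the mapping above, e.g. nameOfYourCharacterEncodingFilter):

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Illustrative class name; on newer containers use jakarta.servlet instead.
public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig config) throws ServletException {
        // No initialization needed.
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        // Must run before any request parameter is read; once the container
        // has parsed the body with the default encoding, this has no effect.
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding("UTF-8");
        }
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // Nothing to clean up.
    }
}
```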

That takes care of Unicode on the JSF side. As for PDFBox (and iText), make sure you also handle the submitted text as Unicode/UTF-8 consistently when writing it to the PDF, including using a font whose encoding can actually represent the characters involved.
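On the PDFBox side, the standard Type 1 fonts (such as PDType1Font.HELVETICA) only cover WinAnsi encoding, so arbitrary Unicode text requires embedding a TrueType font. A sketch assuming the PDFBox 2.x API (the font path is illustrative; use any .ttf containing the glyphs you need):

```java
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class UnicodePdfExample {
    public static void main(String[] args) throws Exception {
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage();
            doc.addPage(page);

            // Embed a Unicode-capable TrueType font instead of relying on the
            // built-in Type 1 fonts, which cannot encode most non-ASCII text.
            PDFont font = PDType0Font.load(doc,
                    new File("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"));

            try (PDPageContentStream content = new PDPageContentStream(doc, page)) {
                content.beginText();
                content.setFont(font, 12);
                content.newLineAtOffset(50, 700);
                content.showText("Ångström, naïve, Müller");
                content.endText();
            }

            doc.save("unicode-example.pdf");
        }
    }
}
```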

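If you instead want the fallback the question mentions, cutting non-ASCII characters out before writing to the PDF, here is a plain-JDK sketch (class and method names are mine):

```java
import java.text.Normalizer;

public class AsciiSanitizer {

    // Crude approach: drop every character outside the 7-bit ASCII range
    // (U+0000..U+007F), losing accented letters entirely.
    public static String stripNonAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    // Friendlier approach: decompose accented letters (é -> e + combining
    // accent), drop the combining marks, then drop whatever non-ASCII remains.
    public static String toAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "")
                         .replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonAscii("Héllo wörld")); // prints "Hllo wrld"
        System.out.println(toAscii("Héllo wörld"));       // prints "Hello world"
    }
}
```

Note that stripping is lossy; embedding a Unicode font, as shown in the answer above, preserves the user's input and is usually the better fix.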


Source: https://habr.com/ru/post/1774982/

