How to split an RTF file into lines?

I am trying to split an RTF file into lines (in my code), and I am not getting it quite right, mainly because I do not fully understand the RTF format. It seems that lines can be separated by \par, \pard, \par\pard, or any number of fun combinations.

I am looking for a piece of code, in any language, that splits the file into lines.

3 answers

I coded a quick and dirty routine, and it seems to work on almost everything I could throw at it. It is in VB6, but it translates easily into other languages.

Private Function ParseRTFIntoLines(ByVal strSource As String) As Collection
    Dim colReturn As Collection
    Dim lngPosStart As Long
    Dim strLine As String
    Dim sSplitters(1 To 4) As String
    Dim nIndex As Long

    ' return collection of lines '

    ' The lines can be split by the following '
    ' "\par \pard"                            '
    ' "\par\pard"                             '
    ' "\par "                                 '
    ' "\par"                                  '

    ' Add these splitters in order so that we do not miss '
    ' any possible split combos, for instance, "\par\pard" is added before "\par" '
    ' because if we look for "\par" first, we will miss "\par\pard" '
    sSplitters(1) = "\par \pard"
    sSplitters(2) = "\par\pard"
    sSplitters(3) = "\par "
    sSplitters(4) = "\par"

    Set colReturn = New Collection

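    ' We have to find each variation '
    ' We will look for \par and then evaluate which type of separator is there '
    ' Note: InStr also finds "\par" inside longer control words such as a '
    ' standalone "\pard", so those are treated as line breaks too '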

    Do
        lngPosStart = InStr(1, strSource, "\par", vbTextCompare)
        If lngPosStart > 0 Then
            strLine = Left$(strSource, lngPosStart - 1)

            For nIndex = 1 To 4
                If StrComp(sSplitters(nIndex), Mid$(strSource, lngPosStart, Len(sSplitters(nIndex))), vbTextCompare) = 0 Then
                    ' remove the 1st line from strSource '
                    strSource = Mid$(strSource, lngPosStart + Len(sSplitters(nIndex)))

                    ' add to collection '
                    colReturn.Add strLine

                    ' get out of here '
                    Exit For
                End If
            Next
        End If

    Loop While lngPosStart > 0

    ' check to see whether there is a last line '
    If Len(strSource) > 0 Then colReturn.Add strSource

    Set ParseRTFIntoLines = colReturn
End Function
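
For anyone who wants the same idea outside VB6, here is a minimal Python sketch. It is not a translation of the routine above: it assumes it is enough to treat \par as a complete control word (not followed by another letter), optionally followed by \pard, with each word's single delimiter space swallowed. The name split_rtf_into_lines is made up for this example, matching is case-sensitive (unlike vbTextCompare), and escaped backslashes, \line, and nested groups are not handled.

```python
import re
from typing import List

# "\par" as a whole control word: the lookahead rejects longer words such as
# "\pard" or "\pararsid". An optional "\pard" directly after it is swallowed
# too, along with the single delimiter space that may follow each word.
_PAR_BREAK = re.compile(r"\\par(?![A-Za-z]) ?(?:\\pard(?![A-Za-z]) ?)?")


def split_rtf_into_lines(source: str) -> List[str]:
    """Split raw RTF text at paragraph marks (hypothetical helper)."""
    parts = _PAR_BREAK.split(source)
    # Like the VB6 routine, do not report an empty trailing line.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


print(split_rtf_into_lines(r"first\par second\par\pard third\pard fourth"))
# ['first', 'second', 'third\\pard fourth']
```

A standalone \pard stays inside its line here, whereas the VB6 routine would split on it and leave a stray "d" at the start of the next line.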

You could consult the RTF specification (version 1.9.1).


Have you come across the O'Reilly RTF Pocket Guide?

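Page 13 gives some rules of thumb for putting line breaks in RTF:

  • Put a newline before every \pard or \par (commands explained in the "Paragraphs" section).
  • Put a newline before and after the RTF font table, stylesheet, and other similar constructs (such as the color table).
  • You can put a newline after every Nth space, {, or }. (Alternatively: put a newline after every space, {, or } that comes after the 60th column.)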
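
As a rough sketch of putting a newline before every \par or \pard control word, the Python below rewrites RTF source so that each paragraph mark starts its own physical line. It relies on RTF readers ignoring raw line breaks in the source. The function name add_paragraph_linebreaks is hypothetical, and the sketch assumes no binary (\bin) data and no runs of escaped backslashes immediately before "par".

```python
import re

# "\par" or "\pard" as whole control words. The lookbehind requires a
# preceding character that is neither a backslash (an escaped "\\par" is
# literal text) nor a newline (the word already starts a line).
_PARA_WORD = re.compile(r"(?<=[^\\\n])\\pard?(?![A-Za-z])")


def add_paragraph_linebreaks(rtf: str) -> str:
    """Insert a newline before each \\par or \\pard (hypothetical helper)."""
    return _PARA_WORD.sub(lambda m: "\n" + m.group(0), rtf)


source = r"{\rtf1 one\par two\par\pard three}"
for physical_line in add_paragraph_linebreaks(source).splitlines():
    print(physical_line)
# {\rtf1 one
# \par two
# \par
# \pard three}
```

Once the source is laid out like this, splitting the file into lines becomes an ordinary line-by-line read.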

Or are you thinking of extracting the plain text as lines, regardless of the language the text is written in?


Source: https://habr.com/ru/post/1715679/

