Lightweight rich text XML format?

I am writing a basic word processing application and trying to settle on a native "internal" format, the one my code parses in order to render to the screen. I would like it to be XML so that in the future I can just write XSLT to convert it to ODF or XHTML or something else.

When looking for existing standards to use, the only one that looks promising is ODF. But it looks like massive overkill for what I need. All I need is paragraph tags, font selection, font size and decoration... that's pretty much it. It would take me a long time to implement even minimal ODF rendering, and I'm not sure it is worth it.

Right now I am leaning toward creating my own XML format, but that is not really good practice. It's better to use a standard, especially since I can then probably find any XSLTs I might need in the future already written.

Or should I just bite the bullet and implement ODF?

EDIT: Regarding the answer

I already knew about XSL-FO, but because of the sheer weight of the specification I had never seriously considered it. But you're right, a subset will give me everything I need to work with, plus room for growth. Thanks so much for the reminder.

Also, by including a rendering library like FOP or RenderX, I get free PDF creation. Not bad...

+4
5 answers

Since you are sure you need to represent the presentational side of things, it may be worth looking at the W3C XSL-FO Recommendation. It is a full-blown page description language and the (deeply unfashionable) other half of the better-known XSLT.

The specification as a whole is clearly anything but "lightweight", but you could adopt only a very limited subset. Given your requirements ("paragraph tags, font selection, font size and decoration"), that could be as simple as fo:block and the common font properties, something like:

    <yourcontainer xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:block font-family="Arial, sans-serif" font-weight="bold" font-size="16pt">Example Heading</fo:block>
      <fo:block font-family="Times, serif" font-size="12pt">Paragraph text here etc etc...</fo:block>
    </yourcontainer>

This has several advantages over just rolling your own. There is an open specification to work from, with everything that implies. It reuses CSS properties as XML attributes (much as SVG does), so many of the formatting details will feel somewhat familiar. You would also have an upgrade path if you later decided that, say, smart pagination was a must-have feature: you could include more sections of the specification as they become relevant to your application.
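To illustrate the point about CSS properties appearing as XML attributes, here is a minimal Python sketch that turns each fo:block into an XHTML paragraph with an inline style. It assumes the flat structure of the example above (text directly inside fo:block, no nested inline elements); the function name `blocks_to_xhtml` and the `FONT_PROPERTIES` whitelist are made up for illustration and are not part of any specification or library.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"

# Hypothetical whitelist: the only FO properties this sketch understands.
FONT_PROPERTIES = (
    "font-family", "font-size", "font-weight", "font-style", "text-decoration",
)

SAMPLE = """<yourcontainer xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:block font-family="Arial, sans-serif" font-weight="bold" font-size="16pt">Example Heading</fo:block>
  <fo:block font-family="Times, serif" font-size="12pt">Paragraph text here</fo:block>
</yourcontainer>"""


def blocks_to_xhtml(xml_text):
    """Turn each top-level fo:block into a <p> carrying an inline CSS style."""
    root = ET.fromstring(xml_text)
    body = ET.Element("body")
    for block in root.findall(f"{{{FO_NS}}}block"):
        # The FO attribute names are already valid CSS property names,
        # so they can be copied into a style declaration unchanged.
        style = "; ".join(
            f"{name}: {block.get(name)}"
            for name in FONT_PROPERTIES
            if block.get(name) is not None
        )
        p = ET.SubElement(body, "p")
        if style:
            p.set("style", style)
        p.text = block.text or ""  # assumes plain text only, no child elements
    return ET.tostring(body, encoding="unicode")


print(blocks_to_xhtml(SAMPLE))
```

A real converter would more likely be written in XSLT, as the question suggests; the sketch only shows that the attribute-to-CSS mapping is close to one-to-one for this subset.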

There is one more thing you might get from investigating XSL-FO: seeing how horribly complex even "simple" paragraph and font layout can be. Doing text layout and line breaking "the right way" for different languages and use cases looks very daunting to me.

+4

If it's just word processing, maybe DocBook would be a little easier than ODF?

However, the wiki entry states:

DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other documentation.

So perhaps it is not suitable for a general-purpose word processor?

The advantage of going with DocBook is that a number of converters from DocBook to other formats should already be available. Hope this helps.

+1

I like DocBook, but it does not really fit here. It aims to be presentation independent, on the assumption that you will use XSLT to render it into a presentational format.

In a word processor, the user edits the presentation along with the content. For example, the user does not want to mark something as a "keyword"; they specifically want to make the text bold.

A DocBook editor would be very nice (I'm not sure whether a good one exists), but that's not quite what I am building.

+1

Well, true... But since I need to be able to convert to XML anyway, why keep both a document tree and a DOM tree in memory when nothing prevents me from working directly on the DOM tree?

Especially since one unique feature of my program is that everything is always saved as you type, and I do not want to run a full conversion to XML on every keystroke. It's easier to just bind input and output directly to my in-memory DOM tree.

Edit: Oh, and the only problem with XHTML is that I want to support basic pagination. Although I think that nothing prevents me from using some additional tags for this ...

0

XML is an external format, not an internal one.

What's wrong with XHTML? It is simple and ubiquitous (or at least HTML is). Your implementation will be easy to debug, and your users will be forever grateful.

-1

Source: https://habr.com/ru/post/1276457/

