Text Editor Theory

Since I am always dissatisfied with existing editors, the project I always wanted to run is my own text editor. However, text editing is a serious business.

Besides analyzing the source text of existing text editors, is there any book or other resource (for example, academic work) on this topic? I am particularly interested in what teaches how to handle memory and how to control text insertion (if you have a 100 MB file and you want to add char to position x, you can just memmove get a huge block text ...).

+41
editor data-structures theory text-editor
Jul 02 2018-10-22T00:
source share
11 answers

Take a look at the description of Rob Pike text editor Sam . Be sure to review the overview and high-level command language. He further describes analysis, memory management, and data structures.

Also, take a look at the Russ Cox simple regex implementation . It is easy to follow and can open some doors outside existing regular expression libraries.

+15
Jul 02 2018-10-22T00:
source share

Over the years I have written quite a few text editors. Of course, the easiest way is to manage a long sequence of characters where you copy everything around to insert any character. Other methods that I used include:

  • Represents a text file as a doubly linked list of text strings.
  • Build a tree-like data structure (sometimes called "rope" ), which begins as a solid line of characters, but has the ability to split, insert, and delete blocks of text without having to move the rest of the text.

Many of the older Borland example examples used a text editor as an example tutorial. You can sometimes find copies of these books in used bookstores almost free.

+14
Jul 02 2018-10-22T00:
source share

Here is a great tutorial that covers a lot of relevant topics in a more modern context:

Other answers to this question cover the buffer spaces.

Another modern highlight is the AvalonEdit description.

and additional details from:

and the book has a ton of details / content (about SharpDevelop):

+9
Jan 06 '12 at 22:01
source share

Promoted to answer on request:

The antique " Software Tools in Pascal " Kernighan and Plaugher "implements the ed editor in a language that contains neither real strings nor pointers. It contains a great overview of the design considerations underlying any text editor.

+8
Jul 02 '10 at 23:03
source share

One old method that still works is called the space buffer. The main idea is that you put the text in the buffer, but instead of putting it all in one block, you create a β€œspace” in the cursor, placing all the text before the cursor at the beginning of the buffer, and all the text after the cursor at the end of the buffer . Most insertions occur with a cursor, which you can do without moving anything (until you overflow the buffer). When the user moves the cursor, you move the corresponding text from one side of the split to the other.

Given the typical controls (cursor left, right, up, down, up, down), the biggest move you usually come across is the page at a time, which usually easily handles pretty quickly than the keyboard repeats. Of course, this can slow down if you have a really huge file and the goto line command or something in that order. If you are going to do this, there are undoubtedly better structures to use ...

+6
Jul 02 '10 at 23:37
source share

This document compares many data structures that can be used for text editors, including those already mentioned here (for example, break buffers), as well as several others (for example, tables). The article is old, but I believe that it still covers all the main options, and it compares them well in terms of performance, complexity and space.

I know that stack overflow answers should not be links, but this is still the best source of information I have ever looked for the requested information, and it is too difficult to generalize in the answer here. If the link is out of date, search for β€œ Data Structures for Text Sequences ” by Charles Crowley of the University of New Mexico.

+5
May 16 '13 at 16:04
source share

The Scintilla component uses a shared buffer according to the theory explained in the text linked in the Scintilla and SciTE Related Sites page .
Related Data Structures page in a bitmap text editor .
The split buffer showed that it works well even with megabyte files. Using helper structures (such as a list of line starts) can also help.

+3
Jul 03 '10 at 8:26
source share

Here's how Microsoft professionals do it:

MS Word uses a piece table data structure. An increasing array of character positions is supported in parallel with an array of larger structures that contain pointers either to the source file (the text is loaded when the file is downloaded) or to the buffer of only added new characters.

An EDIT control used throughout Windows does not use a data structure at all. Notepad just uses multi-line EDIT control. Check this out with the heap viewer. The EDIT control only supports the line break and tab stop buffer.

If you intend to create a simple, unformatted text editor on Windows, you can easily subclass the EDIT control to add features.

+2
Jan 29 '15 at 8:04
source share

how to control the insertion of text (if you have a file of 100 MB in size and you want to add char to position x, you cannot just memove a huge text block ...)

Make the text file a linked list, where each line is a record.

+1
Jul 02 2018-10-22T00:
source share

Well, if you know that in general people will have relatively few insertion points, you can keep an array of pointers in the original text buffer, and when the user tries to insert it inside, you β€œsplit” the buffer by creating another pointer to the rest of the buffer, doing the first pointer length so that it stops at the insertion point and adds a third pointer for the text to be inserted between them, for example:

 long original text la la la ^ *^ | 2nd part 1st part 

And the star points to a new text buffer, where you start adding text to paste.

When you process (or even parse) your text file, you iterate over the array of pointers, and then do your work on each buffer. Of course, if the buffer is small enough, you skip adding a new part of the buffer, but this is just a heuristic, try each one and feel that it works best.

You can also consider splitting a text file into a load of several buffers, say, every 1 MB or so, because if you upload a file to one buffer, you will need to create a new buffer for the inserted text due to size. Again, this is heuristic optimization.

+1
Jul 02 '10 at
source share



All Articles