Why are we still programming with flat files?

Why are flat text files the state of the art for representing source code?

Sure, the preprocessor and compiler need to see a flat-file representation of the source, but that is easy to generate.

It seems to me that some form of XML or binary data could represent a lot of ideas that are otherwise very difficult to track.

For example, you can embed UML diagrams directly in your code. They can be generated semi-automatically and annotated by developers to highlight important design aspects. Interaction diagrams in particular. Hell, embedding any custom drawing can make things clearer.

Another idea is to embed comments from code reviews directly into the code.

There could be all sorts of tools to make merging multiple branches easier.

Something I'm passionate about is not just tracking code coverage, but also seeing which parts of the code are covered by a given automated test. The hard part is keeping track of that code even as the source changes, for example when a function moves from one file to another. This can be done with GUIDs, but they are pretty intrusive to embed directly in a text file. In a rich file format, they could be automatic and unobtrusive.

So why are there no IDEs (to my knowledge, anyway) that let you work with code this way?

EDIT: October 7, 2009.

Most of you got very hung up on the word "binary" in my question. I retract it. Picture XML, very minimally marking up your code. Before handing it to your normal preprocessor or compiler, you strip out all the XML markup and pass on only the source code. In this form, you can still do all the normal things with the file: diff, merge, edit, work in a simple and minimal editor, feed it into thousands of tools. Yes, diffing, merging, and editing directly with the minimal XML markup gets a bit more complicated. But I think the value could be enormous.

If there were an IDE that respected all of the XML, you could add so much more than we can today.

For example, your Doxygen comments could actually look like the final Doxygen output.

When someone wanted to do a code review, as with Code Collaborator, they could mark up the source code in place.

XML can even be hidden behind comments.

// <comment author="mcruikshank" date="2009-10-07">
// Please refactor to Delegate.
// </comment>

And if you want to use vi or emacs, you can just skip the comments.
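As a rough illustration of the idea above, here is a minimal Python sketch that separates comment-embedded XML annotations from the code before it goes to a compiler. It assumes every annotation occupies whole `//` lines, exactly as in the example, and that the annotation text is valid XML. The function name `split_annotations` and the sample C-like snippet are made up for this sketch; they are not part of any existing tool.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical convention: an annotation is a run of "//" lines that starts
# with "// <comment ...>" and ends with "// </comment>".
OPEN = re.compile(r"^\s*//\s*<comment\b")
CLOSE = re.compile(r"^\s*//\s*</comment>\s*$")


def split_annotations(source):
    """Return (plain_source, annotations) for comment-embedded XML markup."""
    plain, annotations, block = [], [], None
    for line in source.splitlines():
        if block is None and OPEN.match(line):
            block = []
        if block is None:
            plain.append(line)
            continue
        # Inside an annotation: drop the leading "//" and collect the XML text.
        block.append(re.sub(r"^\s*//\s?", "", line))
        if CLOSE.match(line):
            node = ET.fromstring("\n".join(block))
            annotations.append({
                "author": node.get("author"),
                "date": node.get("date"),
                "text": (node.text or "").strip(),
                "before_line": len(plain) + 1,  # plain-source line it precedes
            })
            block = None
    return "\n".join(plain), annotations


SAMPLE = """\
int total(int a, int b) {
// <comment author="mcruikshank" date="2009-10-07">
// Please refactor to Delegate.
// </comment>
    return a + b;
}"""

plain, notes = split_annotations(SAMPLE)
print(plain)   # the source with the review markup stripped
print(notes)   # the review comments, ready for an IDE to display
```

A plain editor would simply show the three `//` lines as ordinary comments, while a richer tool could render `notes` next to the line they precede.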

If I want to use a state-of-the-art editor, I could see this in about a dozen different useful ways.

So, that's my rough idea. This is not about "building blocks" of pictures that you drag around the screen... I'm not that nuts. :)

+48
flat-file
Oct 02 '08 at 2:32
34 answers

  • you can diff them
  • you can merge them
  • anyone can edit them
  • they are simple and easy to handle
  • they are universally available to thousands of tools.
+137
Oct 02 '08 at 2:36

In my opinion, any possible benefits are outweighed by being tied to a specific tool.

With plain text source (which seems to be what you are discussing, rather than flat files as such), I can embed snippets in email, use simple version control systems (very important!), write code in comments on Stack Overflow, use any of thousands of text editors on any number of platforms, and so on.

With some binary representation of the code, I would need a specialized editor to view or edit it. Even if a text view could be generated, you could not trivially roll changes back into the canonical version.

+25
Oct 02 '08 at 2:40

Smalltalk is an image-based environment. You no longer work with code in a file on disk. You work with, and modify, the real objects at runtime. The code is still text, but classes are not stored in human-readable files. Instead, the entire object memory (the image) is stored in a file in binary format.

But the biggest complaint of those who try Smalltalk is that it does not use files. Most of the file-based tools we have (vim, emacs, Eclipse, VS.NET, the Unix tools) have to be abandoned in favor of Smalltalk's own tools. Not that the tools that come with Smalltalk are inferior. They are just different.

+14
Oct 02 '08 at 2:42

Why are essays written in text? Why are legal documents written in text? Why are science fiction novels written in text? Because text is the single best form, for people, in which to preserve their thoughts.

Text is how people think about, represent, understand, and preserve concepts, along with their complexity, hierarchy, and interrelationships.

+11
Oct 02 '08 at 2:36

Lisp programs are not flat files. They are serializations of data structures. This code-as-data is an old idea, and it is in fact one of the greatest ideas in computer science.
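To make the code-as-data point concrete, here is a small sketch in Python rather than Lisp. It reads a Lisp-style expression into nested lists, evaluates it, and then rewrites the program by manipulating the list like any other data. The tiny grammar (integers, `+`, `*`) and the names `read`, `evaluate`, and `swap_operator` are assumptions made for this illustration only.

```python
def read(text):
    """Parse one S-expression into nested lists, ints, and symbol strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        token = tokens[pos]
        if token == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1
        try:
            return int(token), pos + 1
        except ValueError:
            return token, pos + 1  # anything that is not a number is a symbol

    tree, _ = parse(0)
    return tree


def evaluate(tree):
    """Evaluate a tree built from integers, '+' and '*'."""
    if isinstance(tree, int):
        return tree
    op, *args = tree
    values = [evaluate(arg) for arg in args]
    if op == "+":
        return sum(values)
    if op == "*":
        result = 1
        for value in values:
            result *= value
        return result
    raise ValueError(f"unknown operator: {op!r}")


def swap_operator(tree, old, new):
    """Rewrite the program as data: replace one operator symbol everywhere."""
    if isinstance(tree, list):
        return [swap_operator(node, old, new) for node in tree]
    return new if tree == old else tree


program = read("(+ 1 (* 2 3))")
print(program)                                      # ['+', 1, ['*', 2, 3]]
print(evaluate(program))                            # 7
print(evaluate(swap_operator(program, "*", "+")))   # 6
```

The text form `(+ 1 (* 2 3))` is just one serialization of the tree; the tools operate on the tree itself.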

+10
Oct 02 '08 at 3:03

<?xml version="1.0" encoding="UTF-8"?>
<code>Flat files are easier to read.</code>

+8
Oct 02 '08 at 4:52

That's why:

  • Human readability. This makes it much easier to spot errors, both in the file and in the parsing method. Flat files can also be read aloud, something you simply cannot do with XML, and that can make a difference, especially in customer support.

  • Insurance against obsolescence. As long as regular expressions exist, you can write a pretty good parser in only a few lines of code.

  • Leverage. Almost everything out there, from version control systems to editors to filters, can inspect, merge, and operate on flat files. Merging XML can be a mess.

  • The ability to easily integrate them with UNIX tools such as grep, cut or sed.
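The "few lines of code" claim from the list above can be sketched roughly as follows. The file layout (one `key = value` pair per line, `#` starting a comment) is an assumption chosen for the example, and `parse_flat` is a hypothetical helper, not a standard function.

```python
import re

# One record per line: "key = value", with "#" starting a comment.
# Assumption for this sketch: values never contain a "#" character.
LINE = re.compile(r"^\s*(?P<key>[A-Za-z_][\w.]*)\s*=\s*(?P<value>.*?)\s*(?:#.*)?$")


def parse_flat(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    record = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"line {number}: cannot parse {line!r}")
        record[match["key"]] = match["value"]
    return record


SAMPLE = """\
# build settings
name = demo
version = 1.2   # bumped last week
path.out = build/out
"""

print(parse_flat(SAMPLE))
```

Because the input is plain text, the same file can also be searched with grep or edited with sed without touching this parser.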

+7
Oct 02 '08 at 3:16

That's a good question. FWIW, I would love to see a wiki-style code management tool. Each functional unit would have its own wiki page. The build tools would assemble the source code out of the wiki. There would be a "discussion" page linked to each page, where people could argue about algorithms, APIs, and so on.

Heck, it wouldn't be that hard to hack one up from one of the existing wiki implementations. Any takers...?

+6
Oct 02 '08 at 2:50

Ironically, there ARE programming constructs that use exactly what you describe.

For example, SQL Server Integration Services, where you code the logic flow by dragging and dropping components onto a visual design surface, is saved as XML files that describe exactly that.

On the other hand, SSIS is quite difficult to put under source control. It is also quite difficult to build any complex logic in it: if you need a little more "control", you have to write VB.NET code inside a component, which brings us right back to where we started.

I guess that, as a coder, you should keep in mind that every solution to a problem has consequences. Not everything can (and some would argue, should) be represented in UML. Not everything can be represented visually. Not everything can be simplified enough to have a consistent binary representation.

That said, I would argue that the disadvantages of putting code into binary formats (most of which also tend to be proprietary) far outweigh any advantages over keeping it in plain text.

+5
Oct 02 '08 at 2:42

IMHO, XML and binary formats would be a complete mess and would not bring any significant benefits.

OTOH, a related idea would be to store the code in a database, perhaps with one function per record, or perhaps in a tree structure. An IDE built around this concept could make navigating the source more natural and make it easier to hide anything that is not relevant to the code you are currently reading.
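A minimal sketch of the one-function-per-record idea, using Python's built-in `sqlite3` module. The table layout, the column names, and the helpers `functions_in` and `export_module` are all assumptions made for this example; a real tool would need much more (history, dependencies, imports).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE functions (
           id     INTEGER PRIMARY KEY,
           module TEXT NOT NULL,
           name   TEXT NOT NULL,
           body   TEXT NOT NULL,
           UNIQUE (module, name)
       )"""
)

# One function per record (the sample functions are invented for the demo).
conn.executemany(
    "INSERT INTO functions (module, name, body) VALUES (?, ?, ?)",
    [
        ("geometry", "area", "def area(w, h):\n    return w * h\n"),
        ("geometry", "perimeter", "def perimeter(w, h):\n    return 2 * (w + h)\n"),
        ("text", "shout", "def shout(s):\n    return s.upper() + '!'\n"),
    ],
)


def functions_in(module):
    """Navigation: list only the functions that belong to one module."""
    rows = conn.execute(
        "SELECT name FROM functions WHERE module = ? ORDER BY name", (module,)
    )
    return [name for (name,) in rows]


def export_module(module):
    """Flatten one module back into plain text for a compiler or interpreter."""
    rows = conn.execute(
        "SELECT body FROM functions WHERE module = ? ORDER BY id", (module,)
    )
    return "\n".join(body for (body,) in rows)


print(functions_in("geometry"))

# The flat text is still what actually gets executed.
namespace = {}
exec(export_module("geometry"), namespace)
print(namespace["area"](3, 4))
```

The flat file does not disappear in this design; it becomes a generated view, which is the trade-off the surrounding answers argue about.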

+4
Oct 02 '08 at 2:45

People have tried for a long time to create an editing environment that goes beyond the flat file, and to some degree all of them have failed. The closest I have seen was the prototype for Charles Simonyi's Intentional Programming, but that was later downgraded to a visual DSL creation tool.

No matter how the code is stored or represented in memory, in the end it has to be presentable and modifiable as text (without the formatting being changed on you), since that is the easiest way we know to express most of the abstract concepts needed to solve problems through programming.

With flat files, you get it for free, and any simple text editor (with support for the correct character encoding) will work.

+4
Oct 02 '08 at 3:19

Steve McConnell has it right, as always: you write programs for other programmers (including yourself), not for computers.

That said, Microsoft Visual Studio must internally manage the code you write in a very structured format, or you would not be able to do things like "Find All References" or rename and refactor variables and methods so easily. I would be interested if anyone has links on how this works.

+4
Oct 02 '08 at 5:29

In fact, about 10 years ago, Charles Simonyi's early prototype for Intentional Programming tried to go beyond the flat file to a tree representation of the code that could be visualized in different ways. In theory, a domain expert, a PM, and a software engineer could each view (and piece together) the application code in ways that were useful to them, and products could be built on a hierarchy of declarative "intentions", dropping down to low-level code only as needed.

ETA (per a request in the comments): There is a copy of one of his early papers on the Microsoft website. Unfortunately, since Simonyi left MS to start a separate company several years ago, I don't think the prototype is still available for download. I saw some demos when I was at Microsoft, but I'm not sure how widely his early prototype was distributed.

His company, IntentSoft, is still not saying much about what they plan to bring to market, if anything, but some of the early material that came out of MSR was pretty interesting.

The storage model was in binary format, but I'm not sure how much of this information was disclosed during the MSR project, and I'm sure some things have changed from earlier implementations.

+4
Oct 02 '08 at 6:52

LabVIEW and Simulink are two graphical programming environments. Both are popular in their fields (interfacing with hardware from a PC, and modeling control systems, respectively), but they are not used much outside those fields. I have worked with people who were big fans of both, but I never got into them myself.

+3
Oct 02 '08 at 23:45

You mention that we should use "some form of XML"? What do you think XHTML and XAML are?

Also, XML is still just a flat file.

+2
Oct 02 '08 at 2:54

Old habits die hard, I guess.

Until recently, there were not many good-quality, high-performance, widely available libraries for general storage of structured data. And I would emphatically not put XML in that category even today: too verbose, too processing-intensive, too finicky.

These days, my favorite thing to use for any data that does not need to be human-readable is SQLite: just create a database. It is so incredibly easy to embed a full-featured SQL database in any application... there are bindings for C, Perl, Python, PHP, etc... and it is open source and really fast, reliable, and lightweight.

I <3 SQLite.

+2
Oct 02 '08 at 4:01

Why do text files rule? Because of McIlroy's test. It is vital that the output of one program be acceptable as the source code for another, and text files are the simplest thing that works.

+2
Oct 02 '08 at 11:45

Has anyone ever tried Mathematica?

The pic above is from an old version, but it was the best Google could give me.

Anyway... compare the first equation there with Math.Integrate(1/(Math.Pow("x",3)-1), "x"), which is what you would have to write if you were coding in plain text in most common languages. The mathematical representation is much easier to read, and that is still a pretty small equation.

And yes, you can both enter and copy-paste the code as plain text if you want.

Think of it as next-generation syntax highlighting. I bet there are plenty of things besides math that could benefit from this kind of representation.

+2
Oct 02 '08 at 14:52

The trend we are seeing around DSLs is the first thing that comes to mind when reading your question. The problem has been that there is no 1-to-1 relationship between models (such as UML) and an implementation. Microsoft, among others, is working toward that, so that you can create your application as something UML-like and then have the code generated. And the important part is that when you choose to change your code, the model reflects the change again.

Windows Workflow Foundation is a pretty good example. Sure, there are flat files and/or XML in the background, but you usually end up defining your business logic in the orchestration tool. And that's pretty cool!

We need more "software factories" thinking, and we will see a richer IDE experience in the future, but as long as computers run on zeros and ones, flat text files can and (probably) always will be an intermediate stage. As several people have already said, plain text files are very flexible.

+1
Oct 02 '08 at 5:41

It is pretty obvious why plain text is king. But it is equally obvious why a structured format will be even better.

Just one example: if you rename a method, your diff / merge / source control tool could tell that only one thing has changed. The tools we use today would show a long list of changes, one for each place and file where the method is called or declared.
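The rename example can be sketched as follows. This is a toy model, assuming a hypothetical structured store in which every symbol has a stable ID and call sites refer to that ID instead of the name. None of the names here (`structured_diff`, `text_diff`, `render`) come from a real tool, and `text_diff` is a naive stand-in for a line-based diff.

```python
import uuid


def new_id():
    """Stable identity for a symbol, independent of its current name."""
    return uuid.uuid4().hex


def render(symbols, call_sites):
    """Produce the flat text a compiler (or a text diff tool) would see."""
    return [f"{site}: {symbols[sym_id]}()" for site, sym_id in call_sites]


def structured_diff(old_symbols, new_symbols):
    """Report renamed symbols: one entry per symbol, not one per call site."""
    return [
        (old_symbols[sym_id], new_symbols[sym_id])
        for sym_id in old_symbols
        if sym_id in new_symbols and old_symbols[sym_id] != new_symbols[sym_id]
    ]


def text_diff(old_lines, new_lines):
    """Naive line-by-line comparison, standing in for a text diff tool."""
    return [(a, b) for a, b in zip(old_lines, new_lines) if a != b]


save_id = new_id()
symbols_v1 = {save_id: "saveFile"}
call_sites = [
    ("main.c:10", save_id),
    ("menu.c:42", save_id),
    ("autosave.c:7", save_id),
]

symbols_v2 = {save_id: "writeFile"}  # the rename touches exactly one record

print(structured_diff(symbols_v1, symbols_v2))   # a single rename entry
print(len(text_diff(render(symbols_v1, call_sites),
                    render(symbols_v2, call_sites))))   # one change per call site
```

The same stable IDs are what would let a tool follow a function when it moves between files, at the cost of needing an editor that maintains them.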

(By the way, this post does not answer the question, as you may have noticed)

+1
Oct 02 '08 at 7:19

I have long wondered about the same thing, as described in my answer to: What tool / application / whatever you want,

While it is easy to imagine many advantages, I think the biggest hurdle to overcome is that no one has yet built a viable alternative.

When people think of alternatives to storing source as text, they often seem to think immediately of graphical representations (I am thinking of commercial products that have been available, such as HP VEE). And if we look at the experience of people like FPGA designers, we see that programming (exclusively) graphically just does not work, hence languages like Verilog and VHDL.

But I do not see that the way the source is stored must necessarily be tied to the way it is first written. Source can still be entered largely as text, which means copy / paste still works. But I also see that by allowing merges and rollbacks to operate on a tokenized meta-source, we could get more precise and more powerful manipulation tools.

+1
Oct 02 '08 at 8:01

For an example of a language that does away with traditional text programming, see the Lava language.

Another neat thing I recently discovered is subtext2 (video demo).

+1
Oct 02 '08 at 9:14

Visual FoxPro uses dbf table structures to store the code and metadata for forms, reports, class libraries, and so on. These are binary files. It also stores code in prg files, which are actual text files...

The only advantage I see is being able to use the built-in VFP data language to search the code in these files... other than that, it is a liability, IMO. At least once every few months, one of these files gets corrupted for no apparent reason. Integration with source control is also very painful. There are workarounds for this, but they involve temporarily converting the file to text!

+1
Oct 03 '08 at 22:33

The code of your program already defines the structure that would be created with XML or a binary format. Your programming language is a more direct representation of your program's structure than an XML or binary representation would be. Have you ever noticed how badly Word behaves when you try to structure your document? WordPerfect would at least "Reveal Codes" so you could see what lay beneath your document. Flat files do the same for your program.

0
Oct 02 '08 at 2:50

Neat idea. I have wondered about this myself on a smaller scale... a much smaller one, such as why IDE X cannot generate this or that.

I don't know if, as a programmer, I'm capable of developing something as cool and complex as what you describe or what I have in mind, but I would be interested in trying.

Maybe start with some plugins for .NET, Eclipse, NetBeans, and so on? Show off what can be done, and start a new trend in coding.

0
Oct 02 '08 at 3:01

I think another aspect of this is that the code is what matters. It is what is going to be executed. For example, in your UML example, I would think that UML (presumably created in some editor that is not directly tied to the "code") included in your "source blob" would be almost useless. It would be much better if the UML were generated directly from your code, so that it describes the exact state the code is in, as a tool for understanding the code, rather than as a reminder of what the code was supposed to be.

We have had automated doc tools for years. While the comments a programmer actually writes in the code can get out of sync with it, tools like JavaDoc accurately represent the methods on an object, the return types, the arguments, and so on. They represent them as they really are, not as some artifact that came out of endless design meetings.

It seems to me that if you could arbitrarily add random artifacts to some "source blob", they would likely go out of date quickly and become less than useful. If you can generate such artifacts directly from the code, then the small effort of getting your build process to do so is far better than the previously mentioned pitfalls of moving away from plain text source files.

Related to this, the explanation of why you would want to use a plain-text UML tool (UMLGraph) seems to apply almost equally well to why you want plain text source files.

0
Oct 02 '08 at 3:37

This may not answer your question exactly, but here is an editor that gives you a higher-level view of the code: http://webpages.charter.net/edreamleo/front.html

0
Oct 02 '08 at 7:46


The main reason for keeping text as the source is the lack of power tools, such as version control, for non-text data. This is based on my experience with Smalltalk, where plain bytecode is kept in a core dump all the time. In a non-text system, with today's tools, team development is a nightmare.

0
Oct 03 '08 at 0:08