Here's a (very) rudimentary BibTex grammar that emits an AST (as opposed to a simple parse tree):
    grammar BibTex;

    options {
      output=AST;
      ASTLabelType=CommonTree;
    }

    tokens {
      BIBTEXFILE;
      TYPE;
      STRING;
      PREAMBLE;
      COMMENT;
      TAG;
      CONCAT;
    }
(If you do not need the AST, delete every -> and everything to the right of it, and remove the options{...} and tokens{...} blocks.)
which can be tested with the following class:
    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;

    public class Main {
      public static void main(String[] args) throws Exception {
and the following example Bib input (file: test.bib):
    @PREAMBLE{
      "\newcommand{\noopsort}[1]{} "
      # "\newcommand{\singleletter}[1]{#1} "
    }

    @string {
      me = "Bart Kiers"
    }

    @ComMENt{some comments here}

    % or some comments here

    @article{mrx05,
      auTHor = me # "Mr. X",
      Title = {Something Great},
      publisher = "nob" # "ody",
      YEAR = 2005,
      x = {{Bib}\TeX},
      y = "{Bib}\TeX",
      z = "{Bib}" # "\TeX",
    },

    @misc{ patashnik-bibtexing,
      author = "Oren Patashnik",
      title = "BIBTEXing",
      year = "1988"
    } % no comma here

    @techreport{presstudy2002,
      author = "Dr. Diessen, van RJ and Drs. Steenbergen, JF",
      title = "Long {T}erm {P}reservation {S}tudy of the {DNEP} {P}roject",
      institution = "IBM, National Library of the Netherlands",
      year = "2002",
      month = "December",
    }
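Several values in this input are built with the # concatenation operator (for example me # "Mr. X"), which is presumably what the CONCAT token in the grammar stands for. To make it clearer what such a value means once the @string macros are known, here is a minimal, stand-alone sketch. The class ConcatSketch and its methods are hypothetical names invented for this illustration; they are not part of ANTLR or of the generated parser, and the brace handling is deliberately simplified (escaped braces are not considered).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the grammar or of ANTLR.
public class ConcatSketch {

    // Splits a field value on '#' characters that sit outside quotes and braces.
    static List<String> splitTopLevel(String value) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // current brace nesting level
        boolean inQuotes = false; // inside a "..." string at brace depth 0
        for (char c : value.toCharArray()) {
            if (c == '{') depth++;
            else if (c == '}') depth--;
            else if (c == '"' && depth == 0) inQuotes = !inQuotes;

            if (c == '#' && depth == 0 && !inQuotes) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    // Resolves quoted strings, braced strings, numbers and @string macros,
    // then joins the pieces in order.
    static String resolve(String value, Map<String, String> macros) {
        StringBuilder out = new StringBuilder();
        for (String part : splitTopLevel(value)) {
            boolean quoted = part.length() >= 2 && part.startsWith("\"") && part.endsWith("\"");
            boolean braced = part.length() >= 2 && part.startsWith("{") && part.endsWith("}");
            if (quoted || braced) {
                out.append(part, 1, part.length() - 1); // drop the outer delimiters
            } else if (part.matches("\\d+")) {
                out.append(part);                       // bare number, e.g. 2005
            } else {
                // Assumption: macro names are looked up case-insensitively.
                String macro = macros.get(part.toLowerCase(Locale.ROOT));
                if (macro == null) {
                    throw new IllegalArgumentException("undefined @string macro: " + part);
                }
                out.append(macro);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> macros = new HashMap<>();
        macros.put("me", "Bart Kiers");
        System.out.println(resolve("me # \"Mr. X\"", macros));        // Bart KiersMr. X
        System.out.println(resolve("\"nob\" # \"ody\"", macros));     // nobody
        System.out.println(resolve("\"{Bib}\" # \"\\TeX\"", macros)); // {Bib}\TeX
    }
}
```

With the AST from the grammar you would do this by walking the tree rather than re-scanning the text; the sketch only shows the intended result of a concatenation.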
Start the demo
If you now generate the lexer and parser from the grammar:
java -cp antlr-3.3.jar org.antlr.Tool BibTex.g
and compile all .java source files:
javac -cp antlr-3.3.jar *.java
and finally run the Main class:
*nix / macOS
java -cp .:antlr-3.3.jar Main
Windows
java -cp .;antlr-3.3.jar Main
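The two run commands differ only in the classpath separator (: versus ;). If you ever launch the demo from Java instead of typing the command by hand, File.pathSeparator gives you the right character for the current platform. The following is just a sketch; ClasspathSketch and command are hypothetical names, and it only prints the command rather than executing it.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles the run command for the demo.
public class ClasspathSketch {

    // Builds the argument list, joining the classpath entries with the given separator.
    static List<String> command(String separator) {
        String classpath = String.join(separator, ".", "antlr-3.3.jar");
        return Arrays.asList("java", "-cp", classpath, "Main");
    }

    public static void main(String[] args) {
        // File.pathSeparator is ":" on *nix/macOS and ";" on Windows.
        System.out.println(String.join(" ", command(File.pathSeparator)));
    }
}
```

The resulting list could be handed to a ProcessBuilder if you want to actually start the process.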
You will see some output on the console that corresponds to the following AST:

(AST image generated using graphviz-dev.appspot.com)
To emphasize: I did not test the grammar properly! I wrote it a while ago and never used it in any project.