BibTeX grammar for ANTLR

I am looking for a BibTeX grammar in ANTLR to use in a hobby project. I would rather not spend time writing the ANTLR grammar myself (it would take me a while, since I am still learning ANTLR), so I would be grateful for any pointers.

Note: I found BibTeX grammars for bison and yacc, but could not find one for ANTLR.

Edit: As Bart pointed out, I do not need to parse the preambles or the text inside quoted strings.

1 answer

Here's a (very) rudimentary BibTeX grammar that emits an AST (as opposed to a simple parse tree):

    grammar BibTex;

    options {
      output=AST;
      ASTLabelType=CommonTree;
    }

    tokens {
      BIBTEXFILE;
      TYPE;
      STRING;
      PREAMBLE;
      COMMENT;
      TAG;
      CONCAT;
    }

    //////////////////////////////// Parser rules ////////////////////////////////
    parse
      :  (entry (Comma? entry)* Comma?)? EOF -> ^(BIBTEXFILE entry*)
      ;

    entry
      :  Type Name Comma tags CloseBrace                    -> ^(TYPE Name tags)
      |  StringType Name Assign QuotedContent CloseBrace    -> ^(STRING Name QuotedContent)
      |  PreambleType content CloseBrace                    -> ^(PREAMBLE content)
      |  CommentType                                        -> ^(COMMENT CommentType)
      ;

    tags
      :  (tag (Comma tag)* Comma?)? -> tag*
      ;

    tag
      :  Name Assign content -> ^(TAG Name content)
      ;

    content
      :  concatable (Concat concatable)* -> ^(CONCAT concatable+)
      |  Number
      |  BracedContent
      ;

    concatable
      :  QuotedContent
      |  Name
      ;

    //////////////////////////////// Lexer rules ////////////////////////////////
    Assign
      :  '='
      ;

    Concat
      :  '#'
      ;

    Comma
      :  ','
      ;

    CloseBrace
      :  '}'
      ;

    QuotedContent
      :  '"' (~('\\' | '{' | '}' | '"') | '\\' . | BracedContent)* '"'
      ;

    BracedContent
      :  '{' (~('\\' | '{' | '}') | '\\' . | BracedContent)* '}'
      ;

    StringType
      :  '@' ('s'|'S') ('t'|'T') ('r'|'R') ('i'|'I') ('n'|'N') ('g'|'G') SP? '{'
      ;

    PreambleType
      :  '@' ('p'|'P') ('r'|'R') ('e'|'E') ('a'|'A') ('m'|'M') ('b'|'B') ('l'|'L') ('e'|'E') SP? '{'
      ;

    CommentType
      :  '@' ('c'|'C') ('o'|'O') ('m'|'M') ('m'|'M') ('e'|'E') ('n'|'N') ('t'|'T') SP? BracedContent
      |  '%' ~('\r' | '\n')*
      ;

    Type
      :  '@' Letter+ SP? '{'
      ;

    Number
      :  Digit+
      ;

    Name
      :  Letter (Letter | Digit | ':' | '-')*
      ;

    Spaces
      :  SP {skip();}
      ;

    //////////////////////////////// Lexer fragments ////////////////////////////////
    fragment Letter
      :  'a'..'z'
      |  'A'..'Z'
      ;

    fragment Digit
      :  '0'..'9'
      ;

    fragment SP
      :  (' ' | '\t' | '\r' | '\n' | '\f')+
      ;

(If you do not need the AST, delete every -> and everything to the right of it in each rule, and remove the options{...} and tokens{...} blocks.)

The grammar can be tested with the following class:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;

    public class Main {
      public static void main(String[] args) throws Exception {

        // parse the file 'test.bib'
        BibTexLexer lexer = new BibTexLexer(new ANTLRFileStream("test.bib"));
        BibTexParser parser = new BibTexParser(new CommonTokenStream(lexer));

        // you can use the following tree in your code
        // see: http://www.antlr.org/api/Java/classorg_1_1antlr_1_1runtime_1_1tree_1_1_common_tree.html
        CommonTree tree = (CommonTree)parser.parse().getTree();

        // print a DOT tree of our AST
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
      }
    }

and the following example BibTeX input (file: test.bib):

    @PREAMBLE{
      "\newcommand{\noopsort}[1]{} "
      # "\newcommand{\singleletter}[1]{#1} "
    }

    @string {
      me = "Bart Kiers"
    }

    @ComMENt{some comments here}

    % or some comments here

    @article{mrx05,
      auTHor = me # "Mr. X",
      Title = {Something Great},
      publisher = "nob" # "ody",
      YEAR = 2005,
      x = {{Bib}\TeX},
      y = "{Bib}\TeX",
      z = "{Bib}" # "\TeX",
    },

    @misc{ patashnik-bibtexing,
      author = "Oren Patashnik",
      title = "BIBTEXing",
      year = "1988"
    } % no comma here

    @techreport{presstudy2002,
      author = "Dr. Diessen, van RJ and Drs. Steenbergen, JF",
      title = "Long {T}erm {P}reservation {S}tudy of the {DNEP} {P}roject",
      institution = "IBM, National Library of the Netherlands",
      year = "2002",
      month = "December",
    }
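The x, y and z tags in this input exercise the nested-brace handling of the BracedContent lexer rule, which calls itself recursively and treats a backslash plus the following character as one unit. As a rough, hand-written illustration of what that rule accepts, here is a minimal sketch in plain Java. The class BraceMatcher and its method findClosingBrace are hypothetical names made up for this example; they are not part of the grammar or of the ANTLR runtime.

```java
// Hypothetical helper for illustration only; not generated by ANTLR.
final class BraceMatcher {

    /**
     * Returns the index of the '}' that closes the '{' at index open,
     * or -1 if the braces are unbalanced. A backslash escapes the
     * character after it, mirroring the '\\' . alternative in the
     * BracedContent rule.
     */
    static int findClosingBrace(String text, int open) {
        if (open < 0 || open >= text.length() || text.charAt(open) != '{') {
            throw new IllegalArgumentException("no '{' at index " + open);
        }
        int depth = 0;
        for (int i = open; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                i++;                 // skip the escaped character
            } else if (c == '{') {
                depth++;             // entering a nested group
            } else if (c == '}') {
                depth--;             // leaving a group
                if (depth == 0) {
                    return i;        // this brace closes the one at 'open'
                }
            }
        }
        return -1;                   // ran out of input before closing
    }

    public static void main(String[] args) {
        String value = "{{Bib}\\TeX}";   // the value of tag x in test.bib
        System.out.println(findClosingBrace(value, 0)); // prints 10
        System.out.println(findClosingBrace(value, 1)); // prints 5
    }
}
```

A loop with a depth counter is used here instead of recursion only because it is shorter in plain Java; for these inputs it finds the same closing brace that the recursive lexer rule would consume.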

Run the demo

If you now generate the parser and lexer from the grammar:

 java -cp antlr-3.3.jar org.antlr.Tool BibTex.g 

compile all the .java source files:

 javac -cp antlr-3.3.jar *.java 

and finally run the Main class:

*nix / MacOS

 java -cp .:antlr-3.3.jar Main 

Windows

 java -cp .;antlr-3.3.jar Main 

you will see output on the console (a DOT description of the tree) that corresponds to the following AST:

(Image of the AST, generated using graphviz-dev.appspot.com)
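The CONCAT nodes in the AST only record which pieces are joined with #; the grammar does not evaluate them. As a sketch of how that step might look afterwards, the following plain Java takes the token texts of one CONCAT node and joins them, looking up bare names in a table of @string definitions. Everything here is an assumption for illustration: the class ConcatResolver, the method resolve, and the convention that macro names are stored in lower case and matched case-insensitively. It relies on QuotedContent token text still including its surrounding quotes, as the lexer rule matches them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not generated by ANTLR.
final class ConcatResolver {

    /**
     * Joins the children of one CONCAT node into a single string.
     * A part wrapped in double quotes is a QuotedContent token and is
     * used literally (without the quotes); any other part is treated
     * as a Name that refers to an @string definition.
     */
    static String resolve(List<String> parts, Map<String, String> macros) {
        StringBuilder result = new StringBuilder();
        for (String part : parts) {
            boolean quoted = part.length() >= 2
                    && part.startsWith("\"") && part.endsWith("\"");
            if (quoted) {
                result.append(part, 1, part.length() - 1); // strip the quotes
            } else {
                // assumption: macro keys are stored in lower case
                String value = macros.get(part.toLowerCase(Locale.ROOT));
                if (value == null) {
                    throw new IllegalArgumentException(
                            "undefined @string macro: " + part);
                }
                result.append(value);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> macros = new HashMap<>();
        macros.put("me", "Bart Kiers");  // from: @string { me = "Bart Kiers" }

        // auTHor = me # "Mr. X"
        System.out.println(resolve(Arrays.asList("me", "\"Mr. X\""), macros));
        // prints: Bart KiersMr. X

        // publisher = "nob" # "ody"
        System.out.println(resolve(Arrays.asList("\"nob\"", "\"ody\""), macros));
        // prints: nobody
    }
}
```

In a real tree walker the parts would come from the text of each child of a CONCAT node, and the macro table would be filled from the STRING nodes encountered earlier in the file.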

To emphasize: I did not test the grammar properly! I wrote it a while ago and never used it in any project.


Source: https://habr.com/ru/post/898207/

