This is entirely possible, and it works fine once it is set up. Unfortunately, documentation for a pure C++ Flex/Bison lexer and parser is not easy to find or follow.
I can show you the skeleton of a parser I wrote, but it is just one example of how you could do this.
Keep in mind that some of this code was arrived at by trial and error, because the documentation is lacking, so there may be unnecessary steps or things that are not entirely correct, but they work.
The .ypp file:
%skeleton "lalr1.cc"
%require "3.0.2"
%defines
%define api.namespace {script}
%define parser_class_name {Parser}
%define api.token.constructor
%define api.value.type variant
%define parse.assert true

%code requires {
    namespace script {
        class Compiler;
        class Lexer;
    }
}

%lex-param { script::Lexer &lexer }
%lex-param { script::Compiler &compiler }
%parse-param { script::Lexer &lexer }
%parse-param { script::Compiler &compiler }

%locations
%initial-action {
    @$.begin.filename = @$.end.filename = &compiler.file;
};

%define parse.trace
%define parse.error verbose

%code top {
    #include "Compiler.h"
    #include "MyLexer.h"
    #include "MyParser.hpp"

    static script::Parser::symbol_type yylex(script::Lexer &scanner, script::Compiler &compiler) {
        return scanner.get_next_token();
    }

    using namespace script;
}
From here on you can use C++ types everywhere in the grammar, for example:
%type<std::list<Statement*>> statement_list for_statement
...
statement_list:
    { $$ = std::list<Statement*>(); }
    | statement_list statement { $1.push_back($2); $$ = $1; }
    ;
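Because the rule is left-recursive, the parser reduces the empty alternative once and then the second alternative once per statement, so the list grows in source order. A minimal plain-C++ sketch of that sequence follows. The `Statement` fields, the `grammar_sketch` namespace, and the two helper names are hypothetical and are not part of Bison's API; they only mirror what the two actions do.

```cpp
#include <cassert>
#include <list>
#include <string>

namespace grammar_sketch {

// Hypothetical AST node; the real Statement class is not shown in the text.
struct Statement {
    std::string text;
};

using StatementList = std::list<Statement*>;

// Mirrors the empty alternative: $$ = std::list<Statement*>();
inline StatementList reduce_empty() {
    return StatementList();
}

// Mirrors the recursive alternative: $1.push_back($2); $$ = $1;
// lhs plays the role of $1, stmt the role of $2, the return value is $$.
inline StatementList reduce_append(StatementList lhs, Statement* stmt) {
    assert(stmt != nullptr);
    lhs.push_back(stmt);
    return lhs;
}

} // namespace grammar_sketch
```

Parsing two statements then corresponds to `reduce_append(reduce_append(reduce_empty(), &a), &b)`, which yields a list holding `&a` followed by `&b`.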
The .l file:
%{
#include "MyParser.hpp"
#include "MyLexer.h"
#include "Compiler.h"
#include <string>

typedef script::Parser::token token;

// yyterminate() has to return the END token from the scanning function.
#define yyterminate() return script::Parser::make_END(loc)

static script::location loc;

using namespace script;
%}

%x sstring
%x scomment

%option nodefault
%option noyywrap
%option c++
%option yyclass="Lexer"
%option prefix="My"

%{
# define YY_USER_ACTION loc.columns((int)yyleng);
%}

%%

%{
loc.step();
%}
Then you need a header file that defines your Lexer class, which inherits from yyFlexLexer (that is how C++ Flex works), something like:
#if ! defined(yyFlexLexerOnce)
#undef yyFlexLexer
#define yyFlexLexer NanoFlexLexer
#include <FlexLexer.h>
#endif

#undef YY_DECL
#define YY_DECL script::Parser::symbol_type script::Lexer::get_next_token()

#include "MyParser.hpp"

namespace script {
    class Compiler;

    class Lexer : public yyFlexLexer {
    public:
        Lexer(Compiler &compiler, std::istream *in) : yyFlexLexer(in), compiler(compiler) {}
        virtual script::Parser::symbol_type get_next_token();
        virtual ~Lexer() { }
    private:
        Compiler &compiler;
    };
}
The final step is to define your Compiler class, which is called from the Bison grammar rules (that is what the %parse-param declarations in the .ypp file are for). Something like:
#include "parser/MyParser.hpp"
#include "parser/MyLexer.h"
#include "parser/location.hh"
#include "Symbols.h"

namespace script {
    class Compiler {
    public:
        Compiler();

        std::string file;

        void error(const location& l, const std::string& m);
        void error(const std::string& m);

        vm::Script* compile(const std::string& text);
        bool parseString(const std::string& text);

        void setRoot(ASTRoot* root);
        Node* getRoot() { return root.get(); }
    };
}
Now you can drive the parser entirely from C++ code, for example:
bool Compiler::parseString(const std::string &text) {
    constexpr bool shouldGenerateTrace = false;

    // Feed the text to the lexer through an in-memory stream.
    std::istringstream ss(text);
    script::Lexer lexer(*this, &ss);
    script::Parser parser(lexer, *this);
    parser.set_debug_level(shouldGenerateTrace);

    // parse() returns 0 on success.
    return parser.parse() == 0;
}
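To make the wiring easier to see without running Flex or Bison, here is a small self-contained sketch of the same ownership pattern: the compiler object is passed by reference to both the lexer and the parser, the lexer reads from a `std::istream*`, and `parse()` returns 0 on success. Everything in it is a hand-written stand-in and an assumption for illustration: the `sketch` namespace, the `Token` enum, and the toy grammar (`NUMBER ('+' NUMBER)*`) are not what the generated classes look like internally.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

namespace sketch {

// Stand-in for the Compiler class: shared state handed to lexer and parser.
class Compiler {
public:
    int errors = 0;
    void error(const std::string&) { ++errors; }
    bool parseString(const std::string& text);
};

// Hypothetical token set; the real one comes from the Bison grammar.
enum class Token { Number, Plus, End, Invalid };

// Stand-in for the Flex-generated Lexer (same constructor shape as above).
class Lexer {
public:
    Lexer(Compiler& compiler, std::istream* in) : compiler_(compiler), in_(in) {
        assert(in != nullptr);
    }
    Token get_next_token() {
        int c = in_->get();
        while (c == ' ') c = in_->get();            // skip blanks
        if (c == std::char_traits<char>::eof()) return Token::End;
        if (std::isdigit(c)) {                      // consume the whole number
            while (std::isdigit(in_->peek())) in_->get();
            return Token::Number;
        }
        if (c == '+') return Token::Plus;
        compiler_.error("unexpected character");
        return Token::Invalid;
    }
private:
    Compiler& compiler_;
    std::istream* in_;
};

// Stand-in for the Bison-generated Parser: accepts NUMBER ('+' NUMBER)*.
class Parser {
public:
    Parser(Lexer& lexer, Compiler& compiler) : lexer_(lexer), compiler_(compiler) {}
    int parse() {                                   // 0 on success, like Bison
        if (lexer_.get_next_token() != Token::Number) return fail("expected number");
        for (;;) {
            Token t = lexer_.get_next_token();
            if (t == Token::End) return 0;
            if (t != Token::Plus) return fail("expected '+'");
            if (lexer_.get_next_token() != Token::Number) return fail("expected number");
        }
    }
private:
    int fail(const std::string& m) { compiler_.error(m); return 1; }
    Lexer& lexer_;
    Compiler& compiler_;
};

// Same shape as Compiler::parseString in the answer.
inline bool Compiler::parseString(const std::string& text) {
    std::istringstream ss(text);
    Lexer lexer(*this, &ss);
    Parser parser(lexer, *this);
    return parser.parse() == 0;
}

} // namespace sketch
```

With this sketch, `sketch::Compiler().parseString("1 + 22 + 3")` succeeds, while an input such as `"1 +"` fails and reports the error through the shared compiler object.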
The only thing you have to take care of is to run flex on the .l file with the --c++ option (short form -+) so that it generates a C++ scanner. The %option c++ line in the .l file requests the same thing.
In fact, with some care, I was also able to have several independent, reentrant lexers/parsers in one project.
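What makes this workable is that the C++ scanner and parser keep their state in objects rather than in globals, so each instance can be driven independently. The sketch below illustrates only that idea with a hypothetical `WordLexer` class (not generated by Flex): two instances read from two streams and can be interleaved freely. With real Flex/Bison output you would additionally need a distinct `%option prefix` and `api.namespace` (or class name) per grammar so the generated symbols do not clash. Note also that the `static script::location loc;` in the .l file above is shared by all instances of that one scanner, so it would have to become per-instance state for full independence.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

namespace demo {

// Hypothetical lexer: all of its state (stream pointer, counter) is per instance.
class WordLexer {
public:
    explicit WordLexer(std::istream* in) : in_(in) {
        assert(in != nullptr);
    }

    // Returns the next whitespace-separated word, or "" at end of input.
    std::string next() {
        std::string word;
        if (*in_ >> word) ++count_;
        return word;
    }

    // Number of words this particular instance has produced so far.
    int count() const { return count_; }

private:
    std::istream* in_;
    int count_ = 0;
};

} // namespace demo
```

Two such lexers, each constructed over its own `std::istringstream`, can be called alternately without affecting each other's position or counter.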