How to efficiently implement an immutable graph of heterogeneous immutable objects in C++?

I am writing a parser for a programming language, out of curiosity. Let's say I want to define a graph with tokens as its vertices/nodes, immutable at run time. The tokens are of various types, of course: some are keywords, some are identifiers, and so on. However, they all share one feature: each token in the graph points to another. This property lets the parser find out what can follow a specific token, so the graph defines the formal grammar of the language. My problem is that I stopped using C++ on a daily basis several years ago and have used many higher-level languages since, so my head is completely muddled when it comes to heap allocation, stack allocation, and so on. Alas, my C++ is rusty.

However, I would like to climb the steep hill straight away and set myself the goal of defining this graph in this imperative language in the best possible way. For example, I want to avoid allocating each token object separately on the heap with "new", because I think that allocating the entire graph of tokens contiguously (linearly, like the elements of an array) could benefit performance through locality of reference. I mean, isn't it a plus when the whole graph is compacted to occupy minimal space along a "line" in memory, rather than having its token objects scattered in random places? In any case, as you can see, this is a very open question.

class token
{
};

class word : public token
{
    const char* chars;

public:
    word(const char* s) : chars(s)
    {
    }
};

class ident : public token
{
    /// haven't thought about these details yet
};

template<int N> class composite_token : public token
{
    token tokens[N]; // note: holds tokens by value, so derived types stored here would be sliced
};

class graph
{
    token* p_root_token;
};




, : Vector < > ( , ) initializer "Node [] graph = {...};".

, . " ": node .

node, /: .

, (Node [3] Node [10]). , .
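
One way to read the "Node [] graph = {...};" initializer and the index references such as Node [3] mentioned above is a statically initialized array whose nodes refer to each other by index. The following is only a sketch under that assumption: the Kind enum, the Node fields, the toy grammar, and the helpers can_follow and node_at are all hypothetical names, and the code assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical node layout; every name here is illustrative only.
enum class Kind { Keyword, Identifier, End };

struct Node {
    Kind kind;
    std::string_view text;   // keyword spelling; empty for other kinds
    std::size_t next[2];     // indices of the nodes that may follow
    std::size_t next_count;  // how many entries of `next` are in use
};

// Toy grammar: "let" identifier [ "=" identifier ] end
// The whole graph is one contiguous, read-only array.
constexpr Node graph[] = {
    /* 0 */ {Kind::Keyword,    "let", {1, 0}, 1},
    /* 1 */ {Kind::Identifier, "",    {2, 3}, 2},
    /* 2 */ {Kind::Keyword,    "=",   {4, 0}, 1},
    /* 3 */ {Kind::End,        "",    {0, 0}, 0},
    /* 4 */ {Kind::Identifier, "",    {3, 0}, 1},
};

constexpr std::size_t node_count = sizeof(graph) / sizeof(graph[0]);

// True if node `to` is allowed to follow node `from`.
constexpr bool can_follow(std::size_t from, std::size_t to) {
    for (std::size_t i = 0; i < graph[from].next_count; ++i)
        if (graph[from].next[i] == to) return true;
    return false;
}

// Bounds-checked access for run-time lookups.
inline const Node& node_at(std::size_t i) {
    assert(i < node_count);
    return graph[i];
}

// The graph is a compile-time constant, so it can be checked at compile time.
static_assert(can_follow(0, 1));
static_assert(!can_follow(0, 2));
```

Because links are indices rather than pointers, the array needs no heap allocation and nodes can refer to entries that appear later in the initializer. The fixed-size "next" array is a simplification; a real grammar would probably need a variable number of successors per node.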

node ( "" ): , . , , , Lexer , , .




"-" (BNF) "EBNF",.

EBNF ( " " ), SO , #. ++.

, EBNF , . , ; . , "", , .




; ( , , , ) std::vector. .


Thus the data is stored in one central place. The token objects are still allocated (though they may not hold much data themselves) and operate on the data in that central place. This is essentially a data-oriented design.

A vector might look like this:

#include <cstddef>
#include <vector>

class token; // forward declaration: TokenData points at tokens

struct TokenData
{
    token *previous, *current, *next;
    token_id id; // some enum?
    ... // more data that is similar
};

std::vector<TokenData> token_data;

class token
{
    std::vector<TokenData> *token_data;
    std::size_t index;

public:
    virtual void do_work() = 0; // assumed virtual; called for every entry in the loop below

    TokenData &data()
    {
        return (*token_data)[index];
    }

    const TokenData &data() const
    {
        return (*token_data)[index];
    }
};

// class plus_sign: public token
// if (data().previous->data().id == NUMBER && data().next->data().id == NUMBER)

for (std::size_t i = 0; i < token_data.size(); i++)
{
    token_data[i].current->do_work();
}

That is the general idea.
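
A self-contained variation of the idea above might look like the following sketch. It swaps the previous/next pointers for indices into the central vector, so the records stay valid even if the vector is copied or moved. The enum values, the npos sentinel, and the helpers is_binary_plus and make_example are assumptions made for illustration, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ids and record layout, chosen only for illustration.
enum class token_id { NUMBER, PLUS };

struct TokenData {
    token_id id;
    std::size_t previous, next; // indices into the central vector
};

constexpr std::size_t npos = static_cast<std::size_t>(-1); // "no neighbour"

// Lightweight read-only handle: pointer to the shared storage plus an index.
// The vector must outlive every handle created from it.
class token {
    const std::vector<TokenData>* data_;
    std::size_t index_;

public:
    token(const std::vector<TokenData>& d, std::size_t i) : data_(&d), index_(i) {
        assert(i < d.size());
    }
    const TokenData& data() const { return (*data_)[index_]; }
    bool has_previous() const { return data().previous != npos; }
    bool has_next() const { return data().next != npos; }
    token previous() const { return token(*data_, data().previous); }
    token next() const { return token(*data_, data().next); }
};

// True for a PLUS token that sits between two NUMBER tokens.
inline bool is_binary_plus(const token& t) {
    return t.data().id == token_id::PLUS
        && t.has_previous() && t.previous().data().id == token_id::NUMBER
        && t.has_next() && t.next().data().id == token_id::NUMBER;
}

// Builds the records for "1 + 2"; the result is meant to be stored as const.
inline std::vector<TokenData> make_example() {
    return {
        {token_id::NUMBER, npos, 1},
        {token_id::PLUS,   0,    2},
        {token_id::NUMBER, 1,    npos},
    };
}
```

Storing the result as "const std::vector<TokenData>" and handing out only const access is one way to get the run-time immutability the question asks for, while the vector keeps all records in one contiguous block.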


Source: https://habr.com/ru/post/1771984/

