A token is whatever you want it to be. Traditionally (and for good reason), language specifications broke the analysis into two levels: the first broke the input stream up into tokens, and the second parsed the tokens. (In theory, I think you could write any grammar at a single level, without tokens, or, what amounts to the same thing, using individual characters as the tokens. I wouldn't want to see the results of that for a language like C++, however.) But the definition of what a token is depends entirely on the language you are parsing: most languages, for example, treat whitespace as a separator (but not Fortran); most languages predefine punctuation/operators using punctuation characters and do not allow those characters in symbols (but not COBOL, where "abc-def" would be a single symbol). In some cases (including the C++ preprocessor), what constitutes a token depends on context, so you may need some feedback from the parser. (Hopefully not; leave that sort of thing to the truly experienced.)
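To make "token" a bit more concrete, here is one possible representation as a data structure. The names are purely illustrative, not from any particular library, and the set of kinds would of course depend on the language:

```cpp
#include <string>

// One possible representation of a token: a language-specific kind,
// plus the characters the token was built from.
enum class TokenKind { Identifier, Number, Punctuator, EndOfInput };

struct Token {
    TokenKind   kind;
    std::string text;
};
```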
One thing is probably certain (unless each character is a token): you will have to read ahead in the stream. You usually can't tell whether a token is complete by looking at just a single character. In fact, I've generally found it most useful for the tokenizer to read an entire token at a time, and to keep it for as long as the parser needs it. A function like hasMoreTokens would actually scan a complete token.
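A minimal sketch of that approach, assuming the source is an std::istream; the class name and the token rules (identifiers are runs of letters, everything else is a single character, whitespace separates tokens) are just for illustration, and for brevity a token here is only its text rather than the Token struct above:

```cpp
#include <cctype>
#include <cstdio>    // EOF
#include <istream>
#include <string>

// Sketch of a tokenizer that always scans one complete token ahead and
// buffers it until the parser asks for it.  hasMoreTokens() does the
// real work; next() just hands out the buffered token.
class Tokenizer {
public:
    explicit Tokenizer(std::istream& source) : source_(source) {}

    bool hasMoreTokens()
    {
        if (!buffered_) {
            scanToken();            // read ahead: scan one whole token
            buffered_ = true;
        }
        return !current_.empty();
    }

    std::string next()
    {
        hasMoreTokens();            // make sure a token is buffered
        buffered_ = false;
        return current_;
    }

private:
    void scanToken()
    {
        current_.clear();
        int c = source_.get();
        while (c != EOF && std::isspace(c)) {
            c = source_.get();      // skip whitespace between tokens
        }
        if (c == EOF) {
            return;                 // empty current_ means no more tokens
        }
        current_ += static_cast<char>(c);
        if (std::isalpha(c)) {      // identifier: keep reading letters
            while (std::isalpha(source_.peek())) {
                current_ += static_cast<char>(source_.get());
            }
        }
        // anything else is left as a one-character token
    }

    std::istream& source_;
    std::string   current_;
    bool          buffered_ = false;
};
```

Usage would then be along the lines of: while (tok.hasMoreTokens()) { use(tok.next()); }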
(And while I'm at it: if the source is an istream, istream::peek does not return a char, but an int.)
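For example, a trivial (purely illustrative) use of peek that keeps the result in an int so the end-of-stream value can still be checked:

```cpp
#include <cstdio>    // EOF
#include <iostream>
#include <sstream>

int main()
{
    std::istringstream source("abc");

    int c = source.peek();              // int, not char: may be EOF
    if (c != EOF) {
        std::cout << "next character: " << static_cast<char>(c) << '\n';
    }
}
```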