Regular expression for VHDL identifier

I am trying to parse VHDL code for some additional checks.

I am looking for a regex that validates basic identifiers in VHDL. I'm still pretty new to regex.

It has the following rules:

  • may contain only letters (A..Z, a..z), digits (0..9), and underscores ('_')

  • must begin with a letter

  • may not end with an underscore

  • may not include two consecutive underscores

So my current problem is checking for two consecutive underscores ...

Update: I think I just answered the question myself ... please double check

[A-Za-z](_?[A-Za-z0-9])* 
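One way to double-check a pattern like this is to run it over a few hand-picked names, one per rule. The sketch below assumes Python's re module and uses fullmatch so the whole string has to match; the helper name and the sample names are made up for illustration.

```python
import re

# Pattern from the update: a letter, then any number of
# (optional underscore + letter or digit) groups.
BASIC_ID = re.compile(r"[A-Za-z](_?[A-Za-z0-9])*")

def is_basic_identifier(name: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return BASIC_ID.fullmatch(name) is not None

# Hypothetical sample names: three that obey the rules,
# then one violation per rule.
samples = ["clk", "data_in_8", "a", "8bit", "_reset", "rx_", "rx__data"]
for s in samples:
    print(f"{s!r}: {is_basic_identifier(s)}")
```

Because every underscore in the pattern must be followed by a letter or digit, a trailing underscore and a double underscore are both ruled out without any lookahead.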
2 answers
 (?!.*__)[a-zA-Z][\w]*[^_] 

That should do the trick.

Explanation:

  # (?!.*__)[a-zA-Z][\w]*[^_]
  #
  # Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*__)»
  #    Match any single character that is not a line break character «.*»
  #       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
  #    Match the characters "__" literally «__»
  # Match a single character present in the list below «[a-zA-Z]»
  #    A character in the range between "a" and "z" «a-z»
  #    A character in the range between "A" and "Z" «A-Z»
  # Match a single character that is a "word character" (letters, digits, etc.) «[\w]*»
  #    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
  # Match any character that is NOT a "_" «[^_]»
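It may be worth probing this pattern on edge cases before relying on it. The sketch below assumes Python's re module with fullmatch. As written, the pattern needs at least two characters, and the final [^_] accepts any character other than an underscore. TIGHTENED is a hypothetical variant, not part of the answer above, that swaps \w for an explicit ASCII class and uses a negative lookbehind for the trailing underscore.

```python
import re

# Pattern exactly as given in the answer.
ORIGINAL = re.compile(r"(?!.*__)[a-zA-Z][\w]*[^_]")

# Hypothetical tightened variant (an assumption, not from the answer):
# explicit character class, and (?<!_) to reject a trailing underscore.
TIGHTENED = re.compile(r"(?!.*__)[A-Za-z][A-Za-z0-9_]*(?<!_)")

def matches(pattern: re.Pattern, name: str) -> bool:
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(name) is not None

# Made-up names covering the edge cases discussed above.
for name in ["a", "a!", "rx_data", "rx_", "rx__data"]:
    print(f"{name!r}: original={matches(ORIGINAL, name)}, "
          f"tightened={matches(TIGHTENED, name)}")
```

The two patterns disagree on "a" (a valid one-letter name) and on "a!" (contains a character outside the allowed set).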

Basic Identifiers

In your update to the question, you suggest: {letter}({underscore}?{letter_or_digit})* . This matches the form the VHDL specification gives for basic identifiers. It is also worth noting that basic identifiers are case insensitive. That is, ID and id are treated as the same identifier.

Extended identifiers

However, VHDL also has extended identifiers. A suitable regular expression for them would be:

 ({backslash}{Any ISO 8859-1 except backslash}*{backslash})+ 
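As a rough translation into a concrete regex, here is a sketch in Python. It assumes "any ISO 8859-1" means the printable ranges 0x20-0x7E and 0xA0-0xFF, which is my reading rather than something stated above; the names EXTENDED_ID and is_extended_identifier are hypothetical.

```python
import re

# Printable ISO 8859-1 characters except the backslash (0x5C).
# Assumption: printable means 0x20-0x7E and 0xA0-0xFF.
NON_BACKSLASH = r"[\x20-\x5b\x5d-\x7e\xa0-\xff]"

# ({backslash}{non-backslash}*{backslash})+
# Repeating the group is what lets a doubled backslash appear
# inside the identifier: \a\\b\ is read as \a\ followed by \b\.
EXTENDED_ID = re.compile(rf"(?:\\{NON_BACKSLASH}*\\)+")

def is_extended_identifier(text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return EXTENDED_ID.fullmatch(text) is not None

print(is_extended_identifier("\\my signal\\"))   # \my signal\
print(is_extended_identifier("\\a\\\\b\\"))      # \a\\b\
print(is_extended_identifier("\\unterminated"))  # \unterminated
```

Unlike basic identifiers, extended identifiers can contain spaces and other symbols, so the two kinds are best matched by separate patterns.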

Reserved Words

Also note that the following are not valid identifiers because they are reserved words. This list is from the 2002 specification. Depending on the version of the specification you are implementing, there may be more or fewer reserved words.

 abs access after alias all and architecture array assert attribute begin block body buffer bus case component configuration constant disconnect downto else elsif end entity exit file for function generate generic group guarded if impure in inertial inout is label library linkage literal loop map mod nand new next nor not null of on open or others out package port postponed procedural procedure process protected pure range record reference register reject rem report return rol ror select severity shared signal sla sll sra srl subtype then to transport type unaffected units until use variable wait when while with xnor xor 
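A checker would probably want to filter these out after the pattern match. Here is a minimal sketch in Python, assuming the 2002 list above and a case-insensitive comparison (since basic identifiers are case insensitive); the names RESERVED_2002, is_reserved and is_usable_identifier are hypothetical.

```python
import re

# Reserved words copied from the 2002 list above.
RESERVED_2002 = frozenset("""
abs access after alias all and architecture array assert attribute begin
block body buffer bus case component configuration constant disconnect
downto else elsif end entity exit file for function generate generic group
guarded if impure in inertial inout is label library linkage literal loop
map mod nand new next nor not null of on open or others out package port
postponed procedural procedure process protected pure range record
reference register reject rem report return rol ror select severity shared
signal sla sll sra srl subtype then to transport type unaffected units
until use variable wait when while with xnor xor
""".split())

# {letter}({underscore}?{letter_or_digit})* restricted to ASCII letters.
BASIC_ID = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")

def is_reserved(name: str) -> bool:
    """Case-insensitive lookup, so SIGNAL and Signal are reserved too."""
    return name.lower() in RESERVED_2002

def is_usable_identifier(name: str) -> bool:
    """Well-formed basic identifier that is not a reserved word."""
    return BASIC_ID.fullmatch(name) is not None and not is_reserved(name)

for name in ["clk", "Signal", "signal_1", "process"]:
    print(f"{name!r}: {is_usable_identifier(name)}")
```

Keeping the word list in a separate set makes it easy to swap in the list for another revision of the specification.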

Letter

It is also worth noting that in VHDL, [A-Za-z] does not cover every letter. You must also include the Latin letters of ISO 8859-1. You can find more information about these characters here .

For reference, here are the additional uppercase letters:

 À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ 

And here are the additional lowercase letters:

 ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ 
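To fold these into the identifier pattern, the letter class can be widened with code-point ranges. The sketch below is in Python and makes one assumption worth checking against the specification: it leaves out × (0xD7) and ÷ (0xF7), which sit inside the same code range but are the multiplication and division signs rather than letters. The names LETTER and BASIC_ID_LATIN1 are hypothetical.

```python
import re

# ASCII letters plus the ISO 8859-1 letters:
#   0xC0-0xD6 and 0xD8-0xDE (uppercase), 0xDF-0xF6 and 0xF8-0xFF (lowercase).
# The gaps at 0xD7 and 0xF7 skip the multiplication and division signs.
LETTER = "A-Za-z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff"

# {letter}({underscore}?{letter_or_digit})* with the wider letter class.
BASIC_ID_LATIN1 = re.compile(f"[{LETTER}](?:_?[{LETTER}0-9])*")

def is_basic_identifier(name: str) -> bool:
    """Return True if the whole string matches the widened pattern."""
    return BASIC_ID_LATIN1.fullmatch(name) is not None

# Made-up names for illustration.
for name in ["größe", "señal_1", "a\u00d7b", "señal_"]:
    print(f"{name!r}: {is_basic_identifier(name)}")
```

This works on Python str values, where each ISO 8859-1 character has the same code point in Unicode; source files read as bytes would need decoding as latin-1 first.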

Source: https://habr.com/ru/post/1379021/

