Check VAT number for syntactic correctness with Regex?

I am trying to find a way to check European VAT identifiers. They vary in length, sometimes have checksums, and so on. I usually use regular expressions to test simple strings, but this case looks pretty complicated to me.

Wikipedia has a list of different syntaxes:

Before I spend a lot of time on this and fail in the end, I would like to hear from someone who uses regular expressions more often than I do whether these numbers can be pre-checked this way. If you think that checking VAT ID syntax with regular expressions is not possible, please give a detailed example of why not.

Thank you in advance.

Note: Of course, I know about the XML-RPC validation service of the German Ministry of Finance ( https://evatr.bff-online.de/eVatR/xmlrpc/ ), but it sometimes takes a few minutes to answer a request. In addition, this XML-RPC validation service is shut down from 23:00 to 05:00 Berlin time. This is why I would like a two-step verification: the first step checks the syntax, the second step (started by cron) uses this XML-RPC service.

5 answers

The Regular Expressions Cookbook, 2nd Edition, gives a regular expression for validating the VAT numbers of the 27 EU countries in section 4.21, "European VAT Numbers".

This regular expression performs no checksum calculation, but it can still identify strings that are likely to be EU VAT numbers.

Before checking, the characters [-.●] or [^A-Z0-9] must be deleted. Then use

 (?xi)^(
 (AT)?U[0-9]{8} |                              # Austria
 (BE)?0[0-9]{9} |                              # Belgium
 (BG)?[0-9]{9,10} |                            # Bulgaria
 (HR)?[0-9]{11} |                              # Croatia
 (CY)?[0-9]{8}[A-Z] |                          # Cyprus
 (CZ)?[0-9]{8,10} |                            # Czech Republic
 (DE)?[0-9]{9} |                               # Germany
 (DK)?[0-9]{8} |                               # Denmark
 (EE)?[0-9]{9} |                               # Estonia
 (EL)?[0-9]{9} |                               # Greece
 ES[A-Z][0-9]{7}(?:[0-9]|[A-Z]) |              # Spain
 (FI)?[0-9]{8} |                               # Finland
 (FR)?[0-9A-Z]{2}[0-9]{9} |                    # France
 (GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3}) | # United Kingdom
 (HU)?[0-9]{8} |                               # Hungary
 (IE)?[0-9]{7}[A-Z]{1,2} |                     # Ireland
 (IE)?[0-9][A-Z][0-9]{5}[A-Z] |                # Ireland (2)
 (IT)?[0-9]{11} |                              # Italy
 (LT)?([0-9]{9}|[0-9]{12}) |                   # Lithuania
 (LU)?[0-9]{8} |                               # Luxembourg
 (LV)?[0-9]{11} |                              # Latvia
 (MT)?[0-9]{8} |                               # Malta
 (NL)?[0-9]{9}B[0-9]{2} |                      # Netherlands
 (PL)?[0-9]{10} |                              # Poland
 (PT)?[0-9]{9} |                               # Portugal
 (RO)?[0-9]{2,10} |                            # Romania
 (SE)?[0-9]{12} |                              # Sweden
 (SI)?[0-9]{8} |                               # Slovenia
 (SK)?[0-9]{10}                                # Slovakia
 )$


I have added the Croatian VAT alternative here.

Please note that if you require the country codes to be present, remove the ? quantifiers after the closing parentheses of the country-code groups.
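The clean-then-match steps can be sketched in Python as follows. This is only an illustration under assumptions: it uses the standard `re` module, keeps just three of the alternatives from the pattern above, and the function name `looks_like_vat` and the sample inputs are made up.

```python
import re

# Shortened version of the pattern above (Austria, Germany, Netherlands only).
# re.VERBOSE plays the role of the (?x) flag, re.IGNORECASE the role of (?i).
VAT_SYNTAX = re.compile(
    r"""^(
        (AT)?U[0-9]{8}          |  # Austria
        (DE)?[0-9]{9}           |  # Germany
        (NL)?[0-9]{9}B[0-9]{2}     # Netherlands
    )$""",
    re.VERBOSE | re.IGNORECASE,
)


def looks_like_vat(raw: str) -> bool:
    """Hypothetical helper: strip separators, then test the syntax only."""
    # Remove everything that is not an ASCII letter or digit.
    cleaned = re.sub(r"[^A-Z0-9]", "", raw.upper())
    return VAT_SYNTAX.match(cleaned) is not None


# Made-up inputs, not real VAT numbers.
print(looks_like_vat("DE 123.456.789"))  # syntactically plausible
print(looks_like_vat("DE12345"))         # too short
```

A match only means the string has a plausible shape; no checksum is verified at this stage.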

When new countries join the European Union or member states change their rules for VAT numbers, the regular expression needs to be updated.

Please note that the regular expression in the cookbook does not match Wikipedia's definition of the Irish VAT number.

In addition, a VAT number cannot be fully validated with a regular expression, because some VAT numbers depend on specific data that is either hard to obtain or has to be calculated in a general-purpose programming language:

  • The first 2 digits of a French VAT number are the “key”, which is calculated as follows: Key = [ 12 + 3 * ( SIREN modulo 97 ) ] modulo 97. For example: Key = [ 12 + 3 * ( 404 833 048 modulo 97 ) ] modulo 97 = [ 12 + 3 * 56 ] modulo 97 = 180 modulo 97 = 83, therefore the VAT number for SIREN 404 833 048 is FR 83 404 833 048. Source: www.insee.fr.
  • The last digit of a Finnish VAT number is a check digit computed with MOD 11-2
  • Italian VAT numbers contain a 3-character province code (indices 8, 9, 10)
  • The Slovak VAT number must be divisible by 11
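Two of these rules are simple enough to sketch in Python. This is an illustration only: the function names are made up, the French check assumes a purely numeric key (the regex above also allows letters in that position), and the Slovak sample numbers are invented.

```python
import re


def french_key(siren: str) -> int:
    """Key = [12 + 3 * (SIREN mod 97)] mod 97, as quoted from the list above."""
    return (12 + 3 * (int(siren) % 97)) % 97


def french_vat_ok(vat: str) -> bool:
    """Hypothetical check for 'FR' + 2-digit key + 9-digit SIREN."""
    m = re.fullmatch(r"FR([0-9]{2})([0-9]{9})", vat)
    if m is None:
        return False  # wrong shape, or a non-numeric key (not handled here)
    return int(m.group(1)) == french_key(m.group(2))


def slovak_vat_ok(vat: str) -> bool:
    """Hypothetical check for 'SK' + 10 digits, divisible by 11."""
    m = re.fullmatch(r"SK([0-9]{10})", vat)
    return m is not None and int(m.group(1)) % 11 == 0


print(french_key("404833048"))        # 83, matching the worked example
print(french_vat_ok("FR83404833048"))
print(slovak_vat_ok("SK1100000000"))  # invented number: 11 * 100000000
```

The arithmetic lives in ordinary code, while the regular expression is used only to confirm the shape and split out the parts.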

Calculations on the number (modulo, multiplication, addition) cannot sensibly be expressed in a regular expression, because regex syntax has no arithmetic.

Since the numbers have a finite length, it is theoretically possible to build a regular expression that matches exactly the valid numbers, but this is clearly not practical.

For more information on the actual calculation, see http://www.pruefziffernberechnung.de/U/USt-IdNr.shtml (German)


My answer is based on Wikipedia and on Wiktor Stribiżew's answer:

 ^(ATU[0-9]{8}|BE[01][0-9]{9}|BG[0-9]{9,10}|HR[0-9]{11}|CY[A-Z0-9]{9}|CZ[0-9]{8,10}|DK[0-9]{8}|EE[0-9]{9}|FI[0-9]{8}|FR[0-9A-Z]{2}[0-9]{9}|DE[0-9]{9}|EL[0-9]{9}|HU[0-9]{8}|IE([0-9]{7}[A-Z]{1,2}|[0-9][A-Z][0-9]{5}[A-Z])|IT[0-9]{11}|LV[0-9]{11}|LT([0-9]{9}|[0-9]{12})|LU[0-9]{8}|MT[0-9]{8}|NL[0-9]{9}B[0-9]{2}|PL[0-9]{10}|PT[0-9]{9}|RO[0-9]{2,10}|SK[0-9]{10}|SI[0-9]{8}|ES[A-Z]([0-9]{8}|[0-9]{7}[A-Z])|SE[0-9]{12}|GB([0-9]{9}|[0-9]{12}|GD[0-4][0-9]{2}|HA[5-9][0-9]{2}))$ 

I found that some Irish VAT identifiers did not pass the regular expression in the answer above. This version is not 100% bulletproof (especially for UK government departments), but it should do the job.


Cyprus changed to:

 (CY)?[0-9]{8}[A-Z] 

The VIES verification site does not reflect this yet.


I recently did something like this. I kept a list of countries, keyed by their 2-character ISO code. Each country has an optional regular expression field; if it is set, the validator checks that the input string matches at least this regular expression. If it does not match, that is an error.

On top of that I had additional checks for specific countries. These are more specialised and can be switched on or off on the backend side. There is no “general” way to do this.

Each country also had an EU flag, so the validator knew whether further checks were needed.
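A minimal sketch of such a per-country table in Python might look like this. Every name here (`CountryRule`, `RULES`, `validate`, `divisible_by_11`) is hypothetical, the two patterns are taken from the regular expressions quoted earlier in this thread, and running the extra checks only for EU countries is an assumption about how the flag was used.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CountryRule:
    is_eu: bool                      # EU flag: decides whether extra checks run
    pattern: Optional[str] = None    # optional syntax regex for the country
    extra_checks: List[Callable[[str], bool]] = field(default_factory=list)


def divisible_by_11(vat: str) -> bool:
    """Country-specific check: digits after the 2-letter prefix, modulo 11."""
    return int(vat[2:]) % 11 == 0


# Hypothetical table keyed by 2-character ISO code.
RULES: Dict[str, CountryRule] = {
    "DE": CountryRule(is_eu=True, pattern=r"DE[0-9]{9}"),
    "SK": CountryRule(is_eu=True, pattern=r"SK[0-9]{10}",
                      extra_checks=[divisible_by_11]),
    "CH": CountryRule(is_eu=False),  # no regex configured, no EU checks
}


def validate(country: str, vat: str) -> bool:
    rule = RULES.get(country)
    if rule is None:
        return False  # unknown country code
    # Step 1: the regex, if one is configured, must match the whole string.
    if rule.pattern is not None and re.fullmatch(rule.pattern, vat) is None:
        return False
    # Step 2: country-specific checks, only for countries flagged as EU.
    if rule.is_eu:
        return all(check(vat) for check in rule.extra_checks)
    return True
```

Keeping the regex and the extra checks as data makes it possible to enable or disable a check per country without touching the validator itself.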

I also used this link: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s21.html along with the Wikipedia list to get a complete list of ISO codes. I also used this as a reference for testing VAT numbers: https://www.braemoor.co.uk/software/vattestx.php


Source: https://habr.com/ru/post/1235602/

