VCard 4.0 Regex

Some time ago I wrote a program for processing vCard files. Almost all of the parsing can be done with the following regex:

(?<FIELD>[^\s:;]+)(;(?<PARAM>[^:]+))*:(?<CONTENT>.*(?>\r\n[ \t].*)*)$ 
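
For context, here is a minimal C# sketch of how a pattern like this might be applied. It assumes the file is unfolded first (a line break followed by a space or tab is removed) and that the pattern is then run on one logical line at a time. The names `UnfoldDemo`, `Unfold` and `ParseLine` are made up for illustration and are not from my program.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnfoldDemo // hypothetical name, for illustration only
{
    // The pattern from the question, applied to one already-unfolded line.
    private static readonly Regex LinePattern = new Regex(
        @"(?<FIELD>[^\s:;]+)(;(?<PARAM>[^:]+))*:(?<CONTENT>.*(?>\r\n[ \t].*)*)$");

    // Removes line folds: a line break followed by a single space or tab.
    public static string Unfold(string text)
    {
        return Regex.Replace(text, @"\r?\n[ \t]", "");
    }

    // Matches one logical line; the caller checks Success.
    public static Match ParseLine(string line)
    {
        return LinePattern.Match(line);
    }

    public static void Main()
    {
        string raw = "NOTE;LANGUAGE=en:This is a long\r\n  note that was folded\r\nEND:VCARD";
        string[] lines = Unfold(raw).Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            Match m = ParseLine(line);
            if (!m.Success) continue;
            Console.WriteLine("{0} | {1} | {2}",
                m.Groups["FIELD"].Value,
                m.Groups["PARAM"].Value,   // empty when the line has no parameters
                m.Groups["CONTENT"].Value);
        }
    }
}
```

Note that `[^:]+` also consumes semicolons, so several parameters on one line end up in a single PARAM capture.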

However, this does not work for the new (August 2011) vCard 4.0 standard. The problem is that vCard 4.0 files use the following layout:

 FIELD(:)(;([PARAMETER]="[CONTENT],[MORE CONTENT]"(;))[DATATYPE(:)]:)CONTENT[newline] 

e.g.

 ADR;type="home,work":(address) 

As you can see, I would like to capture the entire parameter, including the type="..." part.

So my question is: can my regex be changed, or will I have to write two processes (one for the old versions and one for the new version 4.0)? Ideally, I would like to support both. If it can be changed, how? (By the way, I am using C# and .NET 4.0.)

Sincerely.

1 answer

Try the following regex:

 (?<FIELD>[^\s:;]+)(;(?<PARAM>[^=:;]+)=\"?(?<VALUE>[^:;]+)\"?)*:(?<CONTENT>[^;]*;?)* 

This regex seems to handle this vCard 3.0 example:

 ADR;TYPE=WORK:;;100 Waters Edge;Baytown;LA;30314;United States of America
 ADR;TYPE=HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America

And example 4.0:

 ADR;TYPE=work;LABEL="42 Plantation St.\nBaytown, LA 30314\nUnited States of America" :;;42 Plantation St.;Baytown;LA;30314;United States of America 
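
Since you are on .NET, the repeated PARAM and VALUE groups keep every capture, not just the last one. Below is a rough C# sketch of reading them back, assuming the 4.0 example above arrives as a single unfolded line. `CaptureDemo` and `ReadParams` are names I made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo // hypothetical name, for illustration only
{
    // The regex from this answer as a verbatim string ("" is one literal quote).
    private static readonly Regex LinePattern = new Regex(
        @"(?<FIELD>[^\s:;]+)(;(?<PARAM>[^=:;]+)=""?(?<VALUE>[^:;]+)""?)*:(?<CONTENT>[^;]*;?)*");

    // Returns the parameter name/value pairs of one unfolded line, in order.
    public static List<KeyValuePair<string, string>> ReadParams(string line)
    {
        var result = new List<KeyValuePair<string, string>>();
        Match m = LinePattern.Match(line);
        if (!m.Success) return result;

        // PARAM and VALUE are captured once per repetition, so the counts match.
        CaptureCollection names = m.Groups["PARAM"].Captures;
        CaptureCollection values = m.Groups["VALUE"].Captures;
        for (int i = 0; i < names.Count; i++)
        {
            // The greedy [^:;]+ can swallow the closing quote, so trim quotes here.
            string value = values[i].Value.Trim('"');
            result.Add(new KeyValuePair<string, string>(names[i].Value, value));
        }
        return result;
    }

    public static void Main()
    {
        string line = @"ADR;TYPE=work;LABEL=""42 Plantation St.\nBaytown, LA 30314\nUnited States of America"":;;42 Plantation St.;Baytown;LA;30314;United States of America";
        foreach (KeyValuePair<string, string> p in ReadParams(line))
        {
            Console.WriteLine("{0} = {1}", p.Key, p.Value);
        }
    }
}
```

The `Trim('"')` call is needed because the optional quotes sit outside the VALUE group while `[^:;]+` is free to consume the closing quote itself.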

It also matches this example from the vCard 4.0 specification:

 ADR;GEO="geo:12.3457,78.910";LABEL="Mr. John Q. Public, Esq.\n Mail Drop: TNE QB\n123 Main Street\nAny Town, CA 91921-1234\n USA":;;123 Main Street;Any Town;CA;91921-1234;USA 

My disclaimer is that I do not have experience with vCard specifically. I only looked at part of the specification and at the examples while playing with RegExr, so it's possible that I am missing some edge cases.
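
One such edge case is visible in the GEO example above: the quoted value contains a colon, so `[^:;]+` stops at `geo` and the rest of the parameters land in CONTENT, even though the overall match succeeds. A possible variant, sketched below, treats a quoted value as one unit with `"[^"]*"`. It assumes the line is already unfolded and that every parameter has the form name=value (bare parameters such as `TEL;HOME:` in older files are not handled). `VCardLine`, `QuoteAwareDemo` and `Parse` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class VCardLine // hypothetical container, for illustration only
{
    public string Field;
    public List<KeyValuePair<string, string>> Params =
        new List<KeyValuePair<string, string>>();
    public string Content;
}

public static class QuoteAwareDemo
{
    // VALUE is either a whole quoted string (may contain ':' and ';')
    // or an unquoted run without '"', ':' or ';'.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<FIELD>[^\s:;]+)(?:;(?<PARAM>[^=:;]+)=(?<VALUE>""[^""]*""|[^"":;]*))*:(?<CONTENT>.*)$");

    // Parses one unfolded line; returns null when the line does not match.
    public static VCardLine Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success) return null;

        var result = new VCardLine();
        result.Field = m.Groups["FIELD"].Value;
        result.Content = m.Groups["CONTENT"].Value;

        CaptureCollection names = m.Groups["PARAM"].Captures;
        CaptureCollection values = m.Groups["VALUE"].Captures;
        for (int i = 0; i < names.Count; i++)
        {
            // Strip the surrounding quotes of a quoted value.
            result.Params.Add(new KeyValuePair<string, string>(
                names[i].Value, values[i].Value.Trim('"')));
        }
        return result;
    }

    public static void Main()
    {
        string[] lines =
        {
            @"ADR;TYPE=WORK:;;100 Waters Edge;Baytown;LA;30314;United States of America",
            @"ADR;GEO=""geo:12.3457,78.910"";LABEL=""Any Town"":;;123 Main Street;Any Town;CA;91921-1234;USA"
        };
        foreach (string line in lines)
        {
            VCardLine parsed = Parse(line);
            if (parsed == null) continue;
            Console.WriteLine(parsed.Field);
            foreach (KeyValuePair<string, string> p in parsed.Params)
            {
                Console.WriteLine("  {0} = {1}", p.Key, p.Value);
            }
            Console.WriteLine("  content: {0}", parsed.Content);
        }
    }
}
```

With this shape the same pattern covers the 3.0 and 4.0 lines shown earlier, so a separate code path per version may not be necessary. Splitting CONTENT on `;` is left out here because escaped semicolons (`\;`) would need extra care.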


Source: https://habr.com/ru/post/1382341/

