Multiple regex matches in a Google Sheets formula

I am trying to get a list of all the digits preceding a hyphen in a given string (say, in cell A1) using a Google Sheets regular expression formula:

 =REGEXEXTRACT(A1, "\d-") 

My problem is that it only returns the first match ... How can I get all matches?

Sample text:

 "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq" 

My formula returns 1-, while I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or as concatenated text).

I know that I could achieve the desired result with a script or another function (for example, SPLIT), but what I really want to know is how to get an RE2 regular expression to return multiple matches in a "REGEX.*" Google Sheets formula. Something like the "global - Don't return after first match" option on regex101.com.

I also tried to remove the unwanted text with REGEXREPLACE, without any success (I could not get rid of the other digits that do not precede a hyphen).

Any help appreciated! Thanks:)

+6
4 answers

Edit

I came up with a more general solution:

=regexreplace(A1,"(.)?(\d-)|(.)","$2")
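
To see why this works, here is a minimal sketch of the same replacement in Python. Python's `re` module is only a stand-in for the RE2 engine that Sheets uses, on the assumption that this simple pattern behaves the same in both; the function name `keep_digit_hyphen` is made up for illustration. At each position the pattern first tries to match an optional character followed by a digit-hyphen pair and keeps only the pair (group 2); failing that, it consumes one character and replaces it with nothing.

```python
import re

# Sample text from the question.
text = ("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;"
        "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;"
        "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq")

def keep_digit_hyphen(s):
    # Same pattern as the formula; "$2" in Sheets is written r"\2" in Python.
    # When the second alternative (.) matches, group 2 is empty,
    # so that character is replaced with nothing.
    return re.sub(r"(.)?(\d-)|(.)", r"\2", s)

result = keep_digit_hyphen(text)
print(result)  # 1-2-2-2-2-2-2-2-2-2-3-3-
```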


Try this formula:

=regexreplace(regexreplace(A1,"[^\-0-9]",""),"(\d-)|(.)","$1")

It will also handle a string like this:

"A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"

with output:

1-2-2-2-3-
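
The two nested replacements can be traced step by step with a short Python sketch. This assumes Python's `re` treats these two simple patterns the same way RE2 does; the names `two_step` and `test_string` are illustrative only.

```python
import re

def two_step(s):
    # Inner REGEXREPLACE: drop everything that is neither a digit nor a hyphen.
    stripped = re.sub(r"[^\-0-9]", "", s)
    # Outer REGEXREPLACE: keep each digit-hyphen pair (group 1)
    # and replace any other single character with nothing.
    return re.sub(r"(\d-)|(.)", r"\1", stripped)

test_string = "A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"

# Intermediate value after the inner step: 1-2-2----2-3-5669
print(re.sub(r"[^\-0-9]", "", test_string))
print(two_step(test_string))  # 1-2-2-2-3-
```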

+2

In fact, you can do this in a single formula by using REGEXREPLACE to surround every match with a capture group instead of replacing the text:

 =join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)"))) 

Basically, the inner REGEXREPLACE wraps every instance of \d- in a capture group, turning the text itself into a pattern. REGEXEXTRACT then matches that pattern against the original text and neatly returns all the captures as an array. If you want everything back in a single cell, simply wrap the result in JOIN, as in the formula above.

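
The same trick can be sketched in Python, as a rough stand-in for the Sheets functions rather than an exact equivalent. One assumption to keep in mind: the text is reused as a pattern, so any regex metacharacters it contains (such as "(" or "[") would change or break the generated pattern. The sample text from the question has none.

```python
import re

# Sample text from the question.
text = ("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;"
        "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;"
        "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq")

# Like REGEXREPLACE(A1,"(\d-)","($1)"): wrap each digit-hyphen pair in
# parentheses, giving "A(1-)Nutrition;A(2-)ActPhysiq;..." as the pattern.
pattern = re.sub(r"(\d-)", r"(\1)", text)

# Like REGEXEXTRACT: match the generated pattern against the original text
# and collect every capture group.
match = re.search(pattern, text)
captures = list(match.groups())

# Like JOIN(""): concatenate the captures into one string.
joined = "".join(captures)
print(captures)
print(joined)  # 1-2-2-2-2-2-2-2-2-2-3-3-
```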

+3

This seems to work, and I have tried to verify it.

Logic

(1) Replace a letter followed by a hyphen with nothing

(2) Replace any digit that is not followed by a hyphen (together with the character after it) with nothing

(3) Replace any remaining character that is not a digit or a hyphen (letters, ";", "/", "é") with nothing

 =regexreplace(A1,"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]","") 

Result

 1-2-2-2-2-2-2-2-2-2-3-3- 

Analysis

I had to step through the procedure to convince myself that this was correct. According to this link, when alternatives are separated by the pipe symbol, the regular expression tries to match them in order from left to right. The formula above would not work correctly unless rule (1) were tried first: otherwise rule (3) would remove the letters before rule (1) could come into play, and you would get an extra hyphen from "Patho-jour".

Here are some examples of how I think it should deal with text

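
The effect of the rule order can be checked with a small Python sketch on a fragment of the sample text. Python's `re` is used here only as a stand-in for RE2, assuming both try alternatives from left to right for a pattern this simple. The `reordered` pattern is a hypothetical variant with rule (3) moved to the front; it is not something anyone proposed.

```python
import re

# A fragment of the sample text that exercises all three rules.
fragment = "A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas"

# The pattern from the formula: rule (1), then rule (2), then rule (3).
original = r"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]"

# Hypothetical variant with rule (3) tried first: the "o" of "Patho-" is
# removed on its own, so the hyphen after it is never consumed.
reordered = r"[a-zA-Z;/é]|[a-zA-Z]-|[0-9][^-]"

good = re.sub(original, "", fragment)
bad = re.sub(reordered, "", fragment)
print(good)  # 2-2-2-
print(bad)   # 2--2-2-
```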

+2

I could not get the accepted answer to work in my case. I would have liked to do it that way, but I needed a quick solution, so I went with the following:

Input data:

 1111 days, 123 hours 1234 minutes and 121 seconds 

Expected Result:

 1111 123 1234 121 

Formula:

 =split(REGEXREPLACE(C26,"[a-z,]"," ")," ") 
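
As a rough check of the idea, here is a Python sketch. It assumes the character class is meant to be [a-z,] (lowercase letters and the comma), and it uses `str.split()` with no arguments to mimic the way SPLIT discards empty pieces by default. The function name `extract_numbers` is made up for illustration.

```python
import re

def extract_numbers(s):
    # Replace every lowercase letter and comma with a space ...
    spaced = re.sub(r"[a-z,]", " ", s)
    # ... then split on whitespace, dropping the empty pieces.
    return spaced.split()

sample = "1111 days, 123 hours 1234 minutes and 121 seconds"
print(extract_numbers(sample))  # ['1111', '123', '1234', '121']
```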
0

Source: https://habr.com/ru/post/1266774/

