How to select based on partial string matching in Mathematica

Let's say I have a matrix that looks something like this:

{{foobar, 77},{faabar, 81},{foobur, 22},{faabaa, 8}, {faabian, 88},{foobar, 27}, {fiijii, 52}} 

and a list like this:

 {foo, faa} 

Now, for each string in the list, I would like to sum the numbers of all matrix rows whose first element partially matches that string, so I get the following:

 {{foo, 126},{faa, 177}} 

I assume that I need to use the Select command, but I'm not quite sure how to do this while matching only part of a string. Can anybody help me? My real matrix has about 1.5 million rows, so something that is not too slow would be especially valuable.

4 answers

Here is another approach. It is fast enough as well as concise.

 data = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
         {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};
 match = {"foo", "faa"};

 f = {#2, Tr @ Pick[#[[All, 2]], StringMatchQ[#[[All, 1]], #2 <> "*"]]} &;

 f[data, #] & /@ match

 {{"foo", 126}, {"faa", 177}}

You can use ruebenko's preprocessing for more speed.
This is about twice as fast as his method on my system:

 {str, vals} = Transpose[data];
 vals = Developer`ToPackedArray[vals];

 f2 = {#, Tr @ Pick[vals, StringMatchQ[str, "*" <> # <> "*"]]} &;

 f2 /@ match

Please note that this version also matches substrings that are not at the beginning of the string, so that it reproduces ruebenko's output. If you only want to match at the beginning of the strings, as I assumed in the first function, it will be even faster.
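
A minimal sketch of that prefix-only variant, assuming the same sample data as above. The name f3 is hypothetical; the only change from f2 is that the leading "*" is dropped from the pattern, so the string must start with the given prefix:

```mathematica
data = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
        {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};
match = {"foo", "faa"};

(* Preprocess once: separate the strings from the numbers and pack the numbers *)
{str, vals} = Transpose[data];
vals = Developer`ToPackedArray[vals];

(* f3 is a hypothetical name: like f2, but anchored at the start of each string *)
f3 = {#, Tr @ Pick[vals, StringMatchQ[str, # <> "*"]]} &;

result = f3 /@ match
(* {{"foo", 126}, {"faa", 177}} *)
```

On the sample data both variants give the same result, because every match happens to occur at the start of the string.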


Here is the starting point:

 data = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
         {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};
 {str, vals} = Transpose[data];
 vals = Developer`ToPackedArray[vals];

 findValPos[str_List, strPat_String] :=
  Flatten[Developer`ToPackedArray[
    Position[StringPosition[str, strPat], Except[{}], {1}, Heads -> False]]]

 Total[vals[[findValPos[str, "faa"]]]]
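
To get from this starting point to the table asked for in the question, one possible sketch is to map the same expression over the list of search strings. The variable names match and result are assumptions added for illustration; findValPos is the function defined above:

```mathematica
data = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
        {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};
{str, vals} = Transpose[data];
vals = Developer`ToPackedArray[vals];

(* Indices of the strings that contain strPat anywhere *)
findValPos[str_List, strPat_String] :=
 Flatten[Developer`ToPackedArray[
   Position[StringPosition[str, strPat], Except[{}], {1}, Heads -> False]]]

(* match and result are hypothetical names, not part of the original answer *)
match = {"foo", "faa"};
result = {#, Total[vals[[findValPos[str, #]]]]} & /@ match
(* {{"foo", 126}, {"faa", 177}} *)
```

Note that StringPosition finds the pattern anywhere in the string, not only at the beginning.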

make data

 mat = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
        {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};
 lst = {"foo", "faa"};

now select

 r1 = Select[mat, StringMatchQ[lst[[1]], StringTake[#[[1]], 3]] &];
 r2 = Select[mat, StringMatchQ[lst[[2]], StringTake[#[[1]], 3]] &];
 {{lst[[1]], Total@r1[[All, 2]]}, {lst[[2]], Total@r2[[All, 2]]}}

gives

 {{"foo", 126}, {"faa", 177}} 

I will try to make it more functional / general if I can ...

edit (1)

The version below is more general (using the same data as above):

 foo[mat_, lst_] := Select[mat, StringMatchQ[lst, StringTake[#[[1]], 3]] &]
 r = Map[foo[mat, #] &, lst];
 MapThread[{#1, Total[#2[[All, 2]]]} &, {lst, r}]

gives

 {{"foo", 126}, {"faa", 177}} 

So now the same code will work if lst is changed to three elements instead of two:

 lst = {"foo", "faa", "fii"}; 
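
The code above still hard-codes a prefix length of 3 through StringTake. A possible sketch that removes this restriction, assuming the search strings are meant to match at the start of each row's string, builds the pattern from the search string itself. The names sel and result are hypothetical:

```mathematica
mat = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
       {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};
lst = {"foo", "faa", "fii"};

(* sel is a hypothetical helper: rows whose string starts with s, for any length of s *)
sel[m_, s_String] := Select[m, StringMatchQ[#[[1]], s <> "*"] &];

result = {#, Total[sel[mat, #][[All, 2]]]} & /@ lst
(* {{"foo", 126}, {"faa", 177}, {"fii", 52}} *)
```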

What about:

 list = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
         {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};
 t = StringTake[#[[1]], 3] &;
 {t[#[[1]]], Total[#[[All, 2]]]} & /@ SplitBy[SortBy[list, t], t]

 {{"faa", 177}, {"fii", 52}, {"foo", 126}}

I'm sure I read a post, maybe here, in which someone described a function that effectively combined sorting and splitting, but I can't remember it. Maybe someone can add a comment if they know about it.

Edit

OK, I must have been asleep: how could I forget GatherBy?

 {t[#[[1]]], Total[#[[All, 2]]]} & /@ GatherBy[list, t]

 {{"foo", 126}, {"faa", 177}, {"fii", 52}}

Please note that for a fake list of 1.4 million pairs, this took a couple of seconds, so the method is not exactly super fast.
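
For anyone who wants to reproduce such a timing, here is a rough sketch. The way the fake list is generated (three fixed prefixes plus a constant suffix, random integers up to 100, 10^5 rows) is purely an assumption for illustration and is not how the list in this answer was built; increase n to approach the size mentioned above:

```mathematica
SeedRandom[1];  (* make the fake data reproducible *)
n = 10^5;       (* assumed size; raise it to test larger lists *)

(* Hypothetical fake data: a random three-letter prefix plus a fixed suffix *)
prefixes = {"foo", "faa", "fii"};
fake = Table[{RandomChoice[prefixes] <> "bar", RandomInteger[100]}, {n}];

t = StringTake[#[[1]], 3] &;

(* AbsoluteTiming returns {seconds, value of the expression} *)
{time, res} = AbsoluteTiming[
   {t[#[[1]]], Total[#[[All, 2]]]} & /@ GatherBy[fake, t]];

time
```

The measured time depends on the machine and on the Mathematica version, so it is only useful for comparing the methods against each other on the same system.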


Source: https://habr.com/ru/post/1390404/

