Chain Verbs in J

Suppose I have a boxed matrix containing various types:

   matrix =: ('abc';'defgh';23),:('foo';'bar';45)
   matrix
+---+-----+--+
|abc|defgh|23|
+---+-----+--+
|foo|bar  |45|
+---+-----+--+

And a column descriptor:

 columnTypes =: 'string';'string';'num' 

I want to apply verbs to this matrix by column according to types. I will use the verbs DoString and DoNum:

 chain =: (('string';'num') i. columnTypes) { DoString`DoNum 

EDIT: the column descriptors are important; the decision of which verb to use is based on them, not on the type itself. In reality, I could have several kinds of strings, numbers, and even dates (which would be numeric in J).

How do I apply chain to each row of matrix? The verbs themselves can take care of whether the passed value is boxed or not, that's fine. Also, I would prefer to avoid transposing the matrix ( |: ), since it could be quite large.

+6
3 answers

Through experimentation, I got this to do what I want:

 1 chain\"1 matrix 

Now, to understand this ...

0

The standard way to do this is:

  1. Transform the row-oriented (cell) structure into a column-oriented structure

  2. Apply the correct verb to each column (exactly once)

Step (1) is simple. Step (2) is also simple, but not so obvious. There is a little trick that helps.

The trick is that a number of primitive operators accept a gerund as their left argument and produce a function that cycles through the gerund, applying each verb in turn. IMO, the most useful operator in this category is ;. (cut). Here is an example implementation:

Step (0), inputs:

   matrix      =: ('abc';'defgh';23),:('foo';'bar';45)
   columnTypes =: 'string';'string';'num'
   DoString    =: toupper
   DoNum       =: 0&j.

   matrix
+---+-----+--+
|abc|defgh|23|
+---+-----+--+
|foo|bar  |45|
+---+-----+--+

Step (1), columnify the data:

   columnify =: <@:>"1@:|: :. rowify =: <"_1&>

   columnify matrix
+---+-----+-----+
|abc|defgh|23 45|
|foo|bar  |     |
+---+-----+-----+

Note that columnify is given an inverse, rowify, which will "re-rowify" the data, although you should not do this: see below.

Step (2), apply the correct verb to each column (exactly once), using the verb-cycling capability of ;. :

   homogenize =: ({. foo&.>@:{.`'') [^:('foo'-:])L:0~ ]
   chain      =: DoString`DoNum`] homogenize@{~ ('string';'num')&i.

Note that the default transformation for unknown column types is the identity function, ] .
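To see why ] ends up as the default, here is a minimal sketch of just the lookup step inside chain, with homogenize left out. The type list containing 'date' and the names types, idx and sel are assumptions made up for this illustration:

```j
DoString =: toupper
DoNum    =: 0&j.

NB. 'date' is a hypothetical type that is not in the lookup list
types =: 'string';'date';'num'

NB. i. returns the length of the lookup list (here 2) for items it cannot find
idx =: ('string';'num') i. types     NB. 0 2 1

NB. index 2 selects the third verb of the gerund, which is ]
sel =: idx { DoString`DoNum`]        NB. same gerund as DoString`]`DoNum
```

With only two verbs in the gerund, the same lookup would raise an index error for the unknown type, so the trailing ] acts as a catch-all.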

The verb homogenize normalizes the input and output of each column processor (that is, it abstracts away the pre- and post-processing, so that the user only has to provide the dynamic "core" of the transformation). The verb chain takes a list of column types as input and derives a gerund suitable for use as the left argument to ;. (or a similar operator).

Thus:

   1 (chain columnTypes);.1 columnify matrix
+---+-----+---------+
|ABC|DEFGH|0j23 0j45|
|FOO|BAR  |         |
+---+-----+---------+

Or, if you really must have an NxM table of boxed cells, apply the cut "under" columnify:

   1 (chain columnTypes);.1&.columnify matrix
+-----+-----+
|ABC  |FOO  |
+-----+-----+
|DEFGH|BAR  |
+-----+-----+
|0j23 |0j45 |
+-----+-----+

But note that in a J context it is much more appropriate to keep the table as a list of homogeneous columns, for reasons of both performance and notation.

J works best when processing arrays in toto; the rule of thumb is that you should let a primitive or user-defined name see as much data as possible in each application. This is the main advantage of the "columnification" approach: if you store your data as a list of homogeneous columns, it will be faster and easier to manipulate later.

However, if your use case really does require that the data be stored as an NxM table of boxed cells, then converting your data to column-normal form and back is an expensive no-op. In that case, you should stick with your original solution,

  1 chain\"1 matrix 

which (because you asked) actually works on the same principle as the ;. approach. In particular, \ is another of those primitive operators that accepts a gerund argument and applies each verb in sequence (i.e., to each new window of data, cyclically).

In effect, what 1 chain\"1 matrix does is break the matrix into rows ( "1 ), and for each row it creates a moving window of width 1 ( 1 f\ matrix ), applying the verbs of chain to each of those 1-wide windows (i.e. f changes with every 1-wide data window of each row of the matrix).

Since the moving 1-window of a row (a rank-1 vector) is the atoms of the row, in order, and the verbs of chain are given in that same order, in effect you are applying those verbs to the columns of the matrix, one. atom. at. a. time.
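The windowing itself can be inspected in isolation. The sketch below uses the plain verb < in place of the gerund, purely to make the 1-wide windows visible; the name row is an assumption for illustration:

```j
NB. one row of the matrix from the question
row =: 'abc';'defgh';23

NB. box every 1-wide window of the row
windows =: 1 <\ row

# windows            NB. 3 : one window per cell
$ > {. windows       NB. 1 : each window is a 1-item list holding a single boxed cell
```

Each window holds exactly one cell, which is why the verbs of the gerund line up with the columns.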

In short: 1 chain\"1 matrix is analogous to foo"0 matrix , except that foo changes for each atom. And it should be avoided for the same reason that foo"0 matrix should be avoided in general: applying functions at small rank goes against the grain of J and incurs a performance penalty.

In general, it is better to apply functions at higher ranks whenever you can, which in this case requires converting the matrix to (and keeping it in) column-normal form.

In other words, here ;. is to "1 as \ is to "0 . If you find the whole columnify / homogenize apparatus too long or bulky (compared to 1 chain\"1 matrix ), you can import the script provided at [1], which packages these definitions as reusable utilities, with extensions. See the page for examples and instructions.

[1] Related utility script:
http://www.jsoftware.com/jwiki/DanBron/Snippets/DOOG

+4

If these computations depend only on the data inside the individual boxes (and possibly on global values), you can use Agenda ( @. ) with Under Open ( &.> ), aka Each. An application of this technique is shown below:

   doCells   =: (doNum`doString @. isLiteral)&.>
   isLiteral =: 2 -: 3!:0
   doNum     =: +:        NB. Double
   doString  =: toupper

   doCells matrix
┌───┬─────┬──┐
│ABC│DEFGH│46│
├───┼─────┼──┤
│FOO│BAR  │90│
└───┴─────┴──┘

(In this example I chose arbitrary definitions for doNum and doString, to make it easy to verify that it works.)

The version of isLiteral used here may be sufficient, but it will fail if either sparse literals or unicode values are involved.
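A possible broader test is sketched below. The name isText is an assumption, and the sketch relies on the type codes reported by 3!:0 being 2 (literal), 2048 (sparse literal), 131072 (unicode) and 262144 (unicode4); check them against your J version before using it:

```j
NB. hypothetical replacement for isLiteral: 1 if the datatype code
NB. of y is one of the character types listed above, else 0
isText =: (3!:0) e. 2 2048 131072 262144"_

isText 'abc'         NB. 1
isText 23            NB. 0

NB. 4 u: builds a unicode character from a code point
euro =: 4 u: 8364
isText euro
```

It can be dropped into doCells in place of isLiteral, since it also returns 0 or 1 for Agenda to select on.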

If the computations need to involve more of the matrix than a single box, this will not answer your question. If the computation has to be done per row instead, the solution may involve applying a verb at rank _1 (i.e., to each item along the leading axis).
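As a minimal sketch of rank _1 on the matrix from the question, with # and < standing in for whatever per-row verb is actually needed (the names counts and rows are assumptions for illustration):

```j
matrix =: ('abc';'defgh';23),:('foo';'bar';45)

NB. rank _1 hands the verb one item of the leading axis, here one row, at a time
counts =: #"_1 matrix     NB. 3 3 : each row has three cells
rows   =: <"_1 matrix     NB. two boxes, each holding a whole row
```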

+2

Source: https://habr.com/ru/post/894071/

