Converting an imperative algorithm that "grows" a table into pure functions

My program, written in Python 3, has many places where it starts with a (very large) tabular structure of numeric data and adds columns to it according to a specific algorithm. (The algorithm is different in each place.)

I am trying to transform this into a pure functional approach, as I am running into problems with the imperative one (it's hard to reuse, it's hard to memoize intermediate results, it's hard to achieve "lazy" calculation, it's prone to errors due to state dependencies, etc.).

The Table class is implemented as a dictionary of dictionaries: the outer dictionary contains rows indexed by row_id; each inner dictionary contains the values within that row, indexed by column_title. The table methods are very simple:

    # return the value at the specified row_id, column_title
    get_value(self, row_id, column_title)

    # return the inner dictionary representing the row given by row_id
    get_row(self, row_id)

    # add a column new_column_title, defined by func
    # func signature must be: take a row and return a value
    add_column(self, new_column_title, func)
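For context, here is a minimal sketch of roughly what this dict-of-dicts Table looks like. The method names match the ones above; the internals are simplified for illustration, not the actual implementation:

    class Table:
        def __init__(self, rows=None):
            # rows: {row_id: {column_title: value}}
            self._rows = rows if rows is not None else {}

        def get_value(self, row_id, column_title):
            return self._rows[row_id][column_title]

        def get_row(self, row_id):
            return self._rows[row_id]

        def add_column(self, new_column_title, func):
            # func takes an inner row dict and returns a value;
            # this mutates the table in place, which is exactly
            # what I want to move away from.
            for row in self._rows.values():
                row[new_column_title] = func(row)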

Until now, I simply added columns to the source table, and each function took the whole table as an argument. As I move to pure functions, all arguments must be treated as immutable, so the initial table stays unchanged. Any additional columns will be created as standalone columns and passed only to the functions that need them. A typical function will take the initial table and several already-created columns, and return a new column.
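To illustrate what I mean (a simplified, hypothetical example: it assumes columns are plain dicts keyed by row_id and assumes some row_ids() accessor on the table):

    def make_ratio_column(table, cost_column, count_column):
        # Pure: reads the immutable source table and two existing
        # columns, and returns a brand-new column; nothing passed
        # in is modified.
        return {
            row_id: cost_column[row_id] / count_column[row_id]
            for row_id in table.row_ids()  # hypothetical accessor
        }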

The problem I am facing is how to implement such a standalone column (Column).

I could make each of them a dictionary, but that seems very expensive: if an operation needs, say, 10 fields from each logical row, it would require 10 dictionary lookups per row. In addition, each column would store both keys and values, roughly doubling its size.
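For concreteness, this is the cost I have in mind (a toy example with made-up numbers):

    # Illustrative only: each column is its own dict keyed by row_id,
    # so reading 10 fields of one logical row costs 10 hash lookups.
    columns = {'c%d' % i: {row_id: row_id * i for row_id in range(1000)}
               for i in range(10)}

    row_id = 42
    values = [columns['c%d' % i][row_id] for i in range(10)]  # 10 lookups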

I could make Column a simple list, and store with it a reference to a mapping from row_id to array index. The advantage is that this mapping can be shared among all columns that correspond to the same initial table, and once it has been built, it works for all of them. But does this create any other problems?
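A rough sketch of that idea (the names here are made up for illustration):

    class RowIndex:
        # Built once per source table and shared by every column
        # derived from it: maps row_id -> position in the value lists.
        def __init__(self, row_ids):
            self.positions = {row_id: i for i, row_id in enumerate(row_ids)}

    class Column:
        def __init__(self, index, values):
            self.index = index    # shared RowIndex, never copied
            self.values = values  # plain list, one value per row

        def get(self, row_id):
            return self.values[self.index.positions[row_id]]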

If I do this, can I go further and store that mapping inside the source table itself? Can I put references from Column objects back to the source table from which they were created? It seems very different from how I imagined a functional approach would work, but I don't see what problems it could cause, since everything is immutable.

Does the functional approach generally frown upon keeping, in the return value, a reference to one of the arguments? It doesn't seem to break anything (e.g. optimization or lazy evaluation), since the argument was already known anyway. But maybe I'm missing something.

1 answer

Here's how I do it:

Now you cannot change the table -> immutability, great! The next step is to look at each mutating function you apply to the table as producing a new one:

 f T -> T' 

This reads: applying the function f to table T gives a new table T'. You can also try to reify the actual processing of the table data and view it as an action that you apply to, or add onto, the table:

 add(T, A) -> T' 

Notably, actions that were added can also be removed, which gives you an easy way to model undo. Once you get into this way of thinking, your code becomes very easy to reason about, because there is no hidden state that can trip things up.
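A toy illustration of that idea (my own example, not tied to the Table below): the current state is just the initial data plus a sequence of actions, and undo is simply dropping the last action and re-folding:

    from functools import reduce

    def apply_actions(initial, actions):
        # Fold the actions over the initial data; nothing is mutated.
        return reduce(lambda data, action: action(data), actions, initial)

    double = lambda rows: [r * 2 for r in rows]
    inc = lambda rows: [r + 1 for r in rows]

    actions = (double, inc)
    state = apply_actions([1, 2, 3], actions)        # [3, 5, 7]
    undone = apply_actions([1, 2, 3], actions[:-1])  # [2, 4, 6]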

The following is an example of how you could implement and process a table structure in a purely functional style in Python. Imho, Python is not the best language for learning about FP because it makes it so easy to program imperatively; Haskell, F# or Erlang would be a better choice, I think.

    class Table(frozenset):
        # An immutable set of row tuples plus a tuple of column names.
        # Being a set, duplicate rows collapse and row order is not preserved.
        def __new__(cls, names, rows):
            return frozenset.__new__(cls, rows)

        def __init__(self, names, rows):
            self.names = names

    def add_column(rows, func):
        # Return new rows, each extended with the value func(row, idx).
        return [row + (func(row, idx),) for (idx, row) in enumerate(rows)]

    def table_process(t, action):
        name, func = action
        return Table(
            t.names + (name,),
            add_column(t, lambda row, idx: func(row))
        )

    def table_filter(t, condition):
        name, func = condition
        idx = t.names.index(name)
        return Table(
            t.names,
            [row for row in t if func(row[idx])]
        )

    def table_rank(t, name):
        # Ranking needs an ordering, so sort the rows first and use the
        # position in the sorted list as the rank.
        idx = t.names.index(name)
        rows = sorted(t, key=lambda row: row[idx])
        return Table(
            t.names + ('rank',),
            add_column(rows, lambda row, idx: idx)
        )

    def table_print(t):
        format_row = lambda r: ' '.join('%15s' % c for c in r)
        print(format_row(t.names))
        print('\n'.join(format_row(row) for row in t))

    if __name__ == '__main__':
        from functools import reduce
        from random import randint

        cols = ('c1', 'c2', 'c3')
        T = Table(
            cols,
            [tuple(randint(0, 9) for x in cols) for x in range(10)]
        )
        table_print(T)

        # Columns to add to the table; this is a perfect fit for a
        # reduce. I'd honestly use a boring for loop instead, but reduce
        # is a perfect example of how in FP data and code "become one."
        # In fact, this whole program could have been written as just one
        # big reduce.
        actions = [
            ('max', max),
            ('min', min),
            ('sum', sum),
            ('avg', lambda r: sum(r) / len(r))
        ]
        T = reduce(table_process, actions, T)
        table_print(T)

        # Ranking is different because it requires an ordering, which a
        # table does not have.
        T2 = table_rank(T, 'sum')
        table_print(T2)

        # Simple where filter: select * from T2 where c2 < 5.
        T3 = table_filter(T2, ('c2', lambda c: c < 5))
        table_print(T3)

Source: https://habr.com/ru/post/1434949/

