An expressive way to create generators in Python

I really like Python generators. In particular, I find that they are the right tool for connecting to REST endpoints: my client code only has to iterate over the generator associated with the endpoint. However, there is one area where Python generators are not as expressive as I would like. Typically, I need to filter the data that I get from the endpoint. In my current code, I pass a predicate function to the generator, which applies the predicate to each item it processes and yields the item only if the predicate is True.

I would like to move to composing generators instead, for example data_filter(datasource()). Here is some demo code that shows what I tried. It is clear why this does not work; what I am trying to find out is the most expressive way to solve the problem:

# Mock of a REST endpoint: in the actual code, the generator is
# connected to a REST endpoint which returns a dictionary (from JSON).
def mock_datasource ():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Mock of a filter: simplification, in reality I am filtering on some
# aspect of the data, like data['type'] == "external" 
def data_filter (d):
    if len(d) < 8:
        yield d

# First Try:
# for w in data_filter(mock_datasource()):
#     print(w)
# >> TypeError: object of type 'generator' has no len()

# Second Try 
# for w in (data_filter(d) for d in mock_datasource()):
#     print(w)
# I don't get words out, 
# rather <generator object data_filter at 0x101106a40>

# Using a predicate to filter works, but is not the expressive 
# composition I am after
for w in (d for d in mock_datasource() if len(d) < 8):
    print(w)
3 answers

data_filter should apply len to the elements of d, not to d itself. For example:

def data_filter (d):
    for x in d:
        if len(x) < 8:
            yield x

Now your code:

for w in data_filter(mock_datasource()):
    print(w)

prints:

liberty
seminar
formula
comedy
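
Once each stage takes an iterable and yields items, further stages can be nested in the same way. A minimal sketch, assuming a hypothetical second stage to_upper that is not part of the question:

```python
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    yield from mock_data

def data_filter(items):
    # Iterate over the incoming generator and yield only the short words.
    for x in items:
        if len(x) < 8:
            yield x

def to_upper(items):
    # Hypothetical extra stage, added only to illustrate chaining.
    for x in items:
        yield x.upper()

# Each stage consumes the previous one lazily, item by item.
result = list(to_upper(data_filter(mock_datasource())))
print(result)  # ['LIBERTY', 'SEMINAR', 'FORMULA', 'COMEDY']
```

Nothing is pulled from the data source until the outermost generator is iterated.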

More concisely, you can do this using the generator expression directly:

def length_filter(d, minlen=0, maxlen=8):
    return (x for x in d if minlen <= len(x) < maxlen)

Apply the filter to the generator in the same way as to a regular function:

for element in length_filter(endpoint_data()):
    ...
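
As a runnable sketch of the above, using the mock_datasource from the question in place of endpoint_data (which is only a stand-in name here), and picking minlen=7 purely for illustration:

```python
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    yield from mock_data

def length_filter(d, minlen=0, maxlen=8):
    # Returns a lazy generator expression; nothing runs until it is iterated.
    return (x for x in d if minlen <= len(x) < maxlen)

# Keep only words whose length is exactly 7 (7 <= len < 8).
seven_letter_words = list(length_filter(mock_datasource(), minlen=7))
print(seven_letter_words)  # ['liberty', 'seminar', 'formula']
```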

If your predicate is really simple, the built-in filter function may also meet your needs.
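
For instance, here is a sketch with the built-in filter, assuming records shaped like the data['type'] example mentioned in the question (the id field and the mock_records name are made up for illustration):

```python
def mock_records():
    # Assumed record shape; the real endpoint data may look different.
    records = [
        {"id": 1, "type": "external"},
        {"id": 2, "type": "internal"},
        {"id": 3, "type": "external"},
    ]
    yield from records

def is_external(record):
    return record["type"] == "external"

# In Python 3, filter returns a lazy iterator, so it composes with generators.
external = filter(is_external, mock_records())
external_ids = [r["id"] for r in external]
print(external_ids)  # [1, 3]
```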


Alternatively, pass the filter function to the data source:

def mock_datasource(filter_function):
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]

    for d in mock_data:
        yield filter_function(d)

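def filter_function(d):
    # Placeholder filter: return the item if it passes, otherwise None.
    # Note that mock_datasource yields every return value, including None.
    filtered_data = d if len(d) < 8 else None
    return filtered_data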

Source: https://habr.com/ru/post/1692110/

