Simplification for-if mess with better structure?

Please move this question to Code Review -area . This works better because I know that the code below is trash and I want to get critical feedback to complete the rewriting. I invent the wheel a lot.

# Description: you are given a bitwise pattern and a string # you need to find the number of times the pattern matches in the string. # The pattern is determined by markov chain. # For simplicity, suppose the ones and zeros as unbiased coin flipping # that stops as it hits the pattern, below. # # Any one liner or simple pythonic solution? import random def matchIt(yourString, yourPattern): """find the number of times yourPattern occurs in yourString""" count = 0 matchTimes = 0 # How can you simplify the for-if structures? # THIS IS AN EXAMPLE HOW NOT TO DO IT, hence Code-Smell-label # please, read clarifications in [Update] for coin in yourString: #return to base if count == len(pattern): matchTimes = matchTimes + 1 count = 0 #special case to return to 2, there could be more this type of conditions #so this type of if-conditionals are screaming for a havoc if count == 2 and pattern[count] == 1: count = count - 1 #the work horse #it could be simpler by breaking the intial string of lenght 'l' #to blocks of pattern-length, the number of them is 'l - len(pattern)-1' if coin == pattern[count]: count=count+1 average = len(yourString)/matchTimes return [average, matchTimes] # Generates the list myString =[] for x in range(10000): myString= myString + [int(random.random()*2)] pattern = [1,0,0] result = matchIt(myString, pattern) print("The sample had "+str(result[1])+" matches and its size was "+str(len(myString))+".\n" + "So it took "+str(result[0])+" steps in average.\n" + "RESULT: "+str([a for a in "FAILURE" if result[0] != 8])) # Sample Output # # The sample had 1656 matches and its size was 10000. # So it took 6 steps in average. # RESULT: ['F', 'A', 'I', 'L', 'U', 'R', 'E'] 

[Update]

I will explain a little about theory here, perhaps the problem can be simplified in this way. The code above will try to build a chain of marks with a transition matrix A below. The corresponding pattern corresponds to 100 , which you can imagine as turning coins over.

 >>> Q=numpy.matrix('0.5 0.5 0; 0 0.5 0.5; 0 0.5 0') >>> I=numpy.identity(3) >>> I array([[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]]) >>> Q matrix([[ 0.5, 0.5, 0. ], [ 0. , 0.5, 0.5], [ 0. , 0.5, 0. ]]) >>> A=numpy.matrix('0.5 0.5 0 0; 0 0.5 0.5 0; 0 0.5 0 0.5; 0 0 0 1') >>> A matrix([[ 0.5, 0.5, 0. , 0. ], [ 0. , 0.5, 0.5, 0. ], [ 0. , 0.5, 0. , 0.5], [ 0. , 0. , 0. , 1. ]]) 

average 8 in the question becomes the sum of the values ​​in the first row in the matrix N=(IQ)^-1 , where Q higher.

 >>> (IQ)**-1 matrix([[ 2., 4., 2.], [ 0., 4., 2.], [ 0., 2., 2.]]) >>> numpy.sum(((IQ)**-1)[0]) 8.0 

Now you will probably see that this problem with only a matching pattern visible becomes a chain of marks. I see no reason why you could not replace the messy conditions for-in-case with something similar to matrices or matrices. I don’t know how to implement them, but iterators may be able to go exploring, especially with a lot of states where you need to decompose.

But there was a problem with Numpy, what are -Inf and NaN things -Inf ? Check the values ​​with which they should converge from (IQ)**-1 . N is N=I+Q+Q^2+Q^3+...=\frac{IQ^{n}}{IQ} .

 >>> (IQ**99)/(IQ) matrix([[ 2.00000000e+00, 1.80853571e-09, -Inf], [ NaN, 2.00000000e+00, 6.90799171e-10], [ NaN, 6.90799171e-10, 1.00000000e+00]]) >>> (IQ**10)/(IQ) matrix([[ 1.99804688, 0.27929688, -Inf], [ NaN, 1.82617188, 0.10742188], [ NaN, 0.10742188, 0.96679688]]) 
+4
source share
3 answers

Ok - standard (-ish) string search:

 def matchIt(needle, haystack): """ @param needle: string, text to seek @param haystack: string, text to search in Return number of times needle is found in haystack, allowing overlapping instances. Example: matchIt('abab','ababababab') -> 4 """ lastSeenAt = -1 timesSeen = 0 while True: nextSeen = haystack.find(needle, lastSeenAt+1) if nextSeen==-1: return timesSeen else: lastSeenAt = nextSeen timesSeen += 1 

but do you want to do this with a list of numbers? No problems; we just need to create a list class using the find () method, for example:

 import itertools class FindableList(list): def find(self, sub, start=None, end=None): """ @param sub: list, pattern to look for in self @param start: int, first possible start-of-list If not specified, start at first item @param: end: int, last+1 possible start-of-list If not specified, end such that entire self is searched Returns; Starting offset if a match is found, else -1 """ if start is None or start < 0: start = 0 # NB If end is allowed to be too high, # zip() will silently truncate the list comparison # and you will probably get extra spurious matches. lastEnd = len(self) - len(sub) + 1 if end is None or end > lastEnd: end = lastEnd rng = xrange if xrange else range iz = itertools.izip isl = itertools.islice for pos in rng(start, end): if all(a==b for a,b in iz(sub, isl(self, pos, end))): return pos # no match found return -1 

then the example looks like

 matchIt([1,2,1,2], FindableList([1,2,1,2,1,2,1,2,1,2])) -> 4 

and your code will look like this:

 # Generate a list randIn = lambda x: int(x*random.random()) myString =[randIn(2) for i in range(10000)] pattern = [1,0,0] result = matchIt(pattern, myString) print("The sample had {0} matches and its size was {1}.\n".format(result, len(myString))) 
+1
source
 def matchIt(yourString, yourPattern): """find the number of times yourPattern occurs in yourString""" 

Are you allowed to use the following?

 yourString.count(yourPattern) 

In your case, you can create myString as a real string of 10,000 characters, and pattern also as a string, and then count the origin in a simple pythonic way.

EDIT

A single line that gives you the number of (overlapping) inputs of pattern in text (which can be either a string or a list) can look like this:

 nbOccurences = sum(1 for i in xrange(len(text)-len(pattern)) if text[i:i+len(pattern)] == pattern) 
+2
source

This is not ready.

A similar question, but the main focus on graph libraries here is a similar question, but in C # it might be useful.

Files related to this question are ./networkx/generators/degree_seq.py (997 lines, about generating graphs with a given power sequence) and ./networkx/algorithms/mixing.py (line 20, function degree_assortativity(G) about probability based graphs) , and also note that its source code refers to 92 links, not sure if you want to reinvent the wheel. For igraph, please read line 835 of the convert.c file about weighted edges. You can get the Networkx source here and the source for igraph here . Please note that the first is under the BSD license and runs in Python, and igraph under the GNU (GPL) and runs in C.

To get started with Networkx, a useful line on creating a weighted graph from its jUnits test_convert_scipy.py -file:

 def create_weighted(self, G): g = cycle_graph(4) e = g.edges() source = [u for u,v in e] dest = [v for u,v in e] weight = [s+10 for s in source] ex = zip(source, dest, weight) G.add_weighted_edges_from(ex) return G 

So, to create your Markov chain, help about a focused weighted chart here , maybe something like this:

 >>> DG=nx.DiGraph() >>> DG.add_weighted_edges_from([(0,0,0.5),(1,1,0.5),(3,3,1),(0,1,0.5),(1,2,0.5),(2,3,0.5), (2,1,0.5)]) 

or, perhaps, there is some ready-made tool for generating a chain of marks, as for some other random processes, more here . It is impossible to find an algorithm to analyze a graph with the exception of the value or to do tests with different sets, as in your example, maybe there are none, and you must adhere to other replica solutions.

0
source

Source: https://habr.com/ru/post/1335267/


All Articles