A little help needed to translate code (from Python to C#)

Good evening,

I'm a little embarrassed to ask this, because I know I could probably work out the answer myself. However, my knowledge of Python is close to nothing, so I need help from someone more experienced with it than I am.

The following code is from Norvig's “Natural Language Corpus Data”, a chapter in a recently published book. It converts a run-together string such as “likethisone” into “[like, this, one]” (that is, it segments the text into the correct words).

I have ported all the code to C# (in fact, I rewrote the program myself), except for the function segment, which gives me a lot of trouble; I can't even understand its syntax. Can someone please help me translate it into a more readable form in C#?

Thank you in advance.

################ Word Segmentation (p. 223)

@memo
def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)

def splits(text, L=20):
    "Return a list of all possible (first, rem) pairs, len(first)<=L."
    return [(text[:i+1], text[i+1:]) 
            for i in range(min(len(text), L))]

def Pwords(words): 
    "The Naive Bayes probability of a sequence of words."
    return product(Pw(w) for w in words)

#### Support functions (p. 224)

def product(nums):
    "Return the product of a sequence of numbers."
    return reduce(operator.mul, nums, 1)

class Pdist(dict):
    "A probability distribution estimated from counts in datafile."
    def __init__(self, data=[], N=None, missingfn=None):
        for key,count in data:
            self[key] = self.get(key, 0) + int(count)
        self.N = float(N or sum(self.itervalues()))
        self.missingfn = missingfn or (lambda k, N: 1./N)
    def __call__(self, key): 
        if key in self: return self[key]/self.N  
        else: return self.missingfn(key, self.N)

def datafile(name, sep='\t'):
    "Read key,value pairs from file."
    for line in file(name):
        yield line.split(sep)

def avoid_long_words(key, N):
    "Estimate the probability of an unknown word."
    return 10./(N * 10**len(key))

N = 1024908267229 ## Number of tokens

Pw  = Pdist(datafile('count_1w.txt'), N, avoid_long_words)
2 answers

First, take the first function:

def segment(text): 
    "Return a list of words that is the best segmentation of text." 
    if not text: return [] 
    candidates = ([first]+segment(rem) for first,rem in splits(text)) 
    return max(candidates, key=Pwords) 

It takes a string and returns the most likely list of words that the string could be split into, so its signature will be static IEnumerable<string> segment(string text). Obviously, if text is an empty string, the result should be an empty list. Otherwise, it uses a recursive generator expression to build the candidate word lists and returns the candidate with the highest probability.

static IEnumerable<string> segment(string text)
{
    if (text == "") return new string[0]; // C# idiom for empty list of strings
    var candidates = from pair in splits(text)
                     select new[] {pair.Item1}.Concat(segment(pair.Item2));
    return candidates.OrderByDescending(Pwords).First(); // highest probability first, like Python's max
}

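Of course, we now need splits. Its job is to return every way of dividing the text into a first word and a remainder. It translates directly: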

static IEnumerable<Tuple<string, string>> splits(string text, int L = 20)
{
    return from i in Enumerable.Range(1, Math.Min(text.Length, L))
           select Tuple.Create(text.Substring(0, i), text.Substring(i));
}
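For comparison, here is what the Python splits from the question returns on a tiny input. This is only a quick sketch; the input strings are made up for illustration.

```python
def splits(text, L=20):
    """Return all (first, rem) pairs with 1 <= len(first) <= L."""
    return [(text[:i + 1], text[i + 1:])
            for i in range(min(len(text), L))]

# Every way of cutting "abc" into a non-empty first part and a remainder
print(splits("abc"))          # [('a', 'bc'), ('ab', 'c'), ('abc', '')]

# L caps the length of the first part
print(splits("abcdef", L=2))  # [('a', 'bcdef'), ('ab', 'cdef')]
```

The C# version should produce the same pairs in the same order, which is why its Range starts at 1 rather than 0.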

Next is Pwords, which simply calls product on the result of Pw for each word in its input:

static double Pwords(IEnumerable<string> words)
{
    return product(from w in words select Pw(w));
}

And product is simple:

static double product(IEnumerable<double> nums)
{
    return nums.Aggregate(1.0, (a, b) => a * b); // seed of 1 matches Python's reduce(operator.mul, nums, 1)
}
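If you want to check the C# port against the Python, a Python 3 sketch of product and Pwords might look like this. Note that reduce has to be imported from functools in Python 3, and that toy_pw is a hypothetical stand-in for the real Pw, with made-up probabilities.

```python
import operator
from functools import reduce  # a builtin in Python 2, imported in Python 3

def product(nums):
    """Return the product of a sequence of numbers (1 for an empty sequence)."""
    return reduce(operator.mul, nums, 1)

# Hypothetical toy probabilities, for illustration only
toy_pw = {"like": 0.5, "this": 0.25}

def Pwords(words):
    """Naive Bayes probability of a word sequence under the toy model."""
    return product(toy_pw[w] for w in words)

print(product([2, 3, 4]))        # 24
print(Pwords(["like", "this"]))  # 0.125
```

The third argument to reduce is the starting value, which is why an empty sequence gives 1 instead of raising an error.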

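One caveat:

As you may have noticed, Norvig memoizes his segment function with the @memo decorator. Without that, the same remainders get segmented over and over. In C#, you can get the same effect by caching the results in a dictionary: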

static Dictionary<string, IEnumerable<string>> segmentTable =
   new Dictionary<string, IEnumerable<string>>();

static IEnumerable<string> segment(string text)
{
    if (text == "") return new string[0]; // C# idiom for empty list of strings
    if (!segmentTable.ContainsKey(text))
    {
        var candidates = from pair in splits(text)
                         select new[] {pair.Item1}.Concat(segment(pair.Item2));
        segmentTable[text] = candidates.OrderByDescending(Pwords).First().ToList(); // best candidate, like Python's max
    }
    return segmentTable[text];
}
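To sanity-check a port like this, it can help to run the same dictionary-cache idea in Python on a toy model. The sketch below is not Norvig's code: COUNTS, its numbers, and segment_table are made up for illustration, and the unknown-word formula is copied from avoid_long_words in the question.

```python
# Hypothetical toy counts standing in for count_1w.txt
COUNTS = {"like": 300, "this": 400, "one": 300}
N = sum(COUNTS.values())

def Pw(word):
    """Probability of a word; unknown words are penalised by length."""
    if word in COUNTS:
        return COUNTS[word] / N
    return 10.0 / (N * 10 ** len(word))

def Pwords(words):
    """Product of the individual word probabilities."""
    p = 1.0
    for w in words:
        p *= Pw(w)
    return p

def splits(text, L=20):
    return [(text[:i + 1], text[i + 1:])
            for i in range(min(len(text), L))]

segment_table = {}  # plays the role of segmentTable in the C# version

def segment(text):
    """Best segmentation of text, cached per distinct input string."""
    if not text:
        return []
    if text not in segment_table:
        candidates = ([first] + segment(rem) for first, rem in splits(text))
        segment_table[text] = max(candidates, key=Pwords)
    return segment_table[text]

print(segment("likethisone"))  # ['like', 'this', 'one']
```

With the real word counts the numbers differ, but the C# and Python versions should agree on the same data.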

I don't know C#, so instead I'll explain how the Python code works.

@memo
def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)

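The first line,

@memo

is a decorator. It causes the function defined on the following lines to be wrapped in another function. Decorators are often used to add behaviour around a function without changing its body. In this case, judging by the name and by the kind of function it wraps, I assume it memoizes segment.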
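The question does not show the definition of memo, so the following is only a minimal sketch of what a memoizing decorator typically looks like, not necessarily the one from the book. The square function and the calls list are hypothetical, there just to show the caching.

```python
import functools

def memo(f):
    """Cache f's results, keyed by its (hashable) positional arguments."""
    cache = {}

    @functools.wraps(f)
    def wrapper(*args):
        if args not in cache:
            cache[args] = f(*args)  # compute once per distinct argument tuple
        return cache[args]

    return wrapper

calls = []  # records every time the undecorated body actually runs

@memo
def square(x):
    calls.append(x)
    return x * x

square(4)
square(4)      # served from the cache; the body does not run again
print(calls)   # [4]
```

Writing @memo above a def is shorthand for square = memo(square) after the definition.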

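Next:

def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []

This is the function definition proper, its docstring, and the base case that ends the recursion: an empty text yields an empty list.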

The next line is the most complex one, and I suspect it is the one that caused the trouble:

    candidates = ([first]+segment(rem) for first,rem in splits(text))

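The outer parentheses, together with the for..in construct, create a generator expression. This is an efficient way to iterate over a sequence, in this case splits(text). A generator expression is a kind of compact for-loop that produces values. In this case, the values become the elements of candidates. "Genexps" are similar to list comprehensions, but instead of storing every value they produce, they yield the values one at a time, only when asked.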
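The difference between a list comprehension and a generator expression is easiest to see with a function that records when it is called. A small sketch follows; traced and log are hypothetical names used only for this demonstration.

```python
log = []

def traced(x):
    """Record the call, then return a value."""
    log.append(x)
    return x * 10

# Square brackets: a list comprehension runs to completion immediately
as_list = [traced(x) for x in range(3)]
print(log)            # [0, 1, 2]

log.clear()

# Parentheses: a generator expression does no work until it is iterated
gen = (traced(x) for x in range(3))
before = list(log)    # still empty
first = next(gen)     # computes exactly one value
after = list(log)
print(before, first, after)   # [] 0 [0]
```

This laziness is why the C# translation with a LINQ query is a good match: LINQ queries are also evaluated only when enumerated.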

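So, for each item in the sequence returned by splits(text), the generator expression produces a list.

Each item from splits(text) is a (first, rem) pair.

Each produced list starts with the object first; this is expressed by putting first inside a list literal, i.e. [first]. Another list is then added to it; the second list is determined by a recursive call to segment. Adding lists in Python concatenates them, i.e. [1, 2] + [3, 4] gives [1, 2, 3, 4].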
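To see the shape of the candidates without the recursion getting in the way, here is a sketch that swaps the recursive call for a stand-in. fake_segment is hypothetical: it just treats the whole remainder as one word.

```python
def splits(text, L=20):
    return [(text[:i + 1], text[i + 1:])
            for i in range(min(len(text), L))]

def fake_segment(rem):
    """Stand-in for the recursive call: the remainder as a single word."""
    return [rem] if rem else []

# [first] + <list> concatenates, giving one word list per split
candidates = [[first] + fake_segment(rem) for first, rem in splits("abc")]
print(candidates)          # [['a', 'bc'], ['ab', 'c'], ['abc']]

print([1, 2] + [3, 4])     # [1, 2, 3, 4]
```

In the real function the second list is itself the best segmentation of the remainder, so each candidate is "this first word, followed by the best way to split the rest".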

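Finally, in

    return max(candidates, key=Pwords)

the generator expression is consumed by iteration inside max. Because max is called with a key function, each candidate is passed to Pwords, and the candidate with the highest resulting value is returned.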
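A small sketch of max with a key function, using made-up scores in place of Pwords (toy_scores and toy_pwords are hypothetical names):

```python
# Hypothetical scores standing in for Pwords, for illustration only
toy_scores = {
    ("likethis",): 0.001,
    ("like", "this"): 0.12,
    ("l", "ikethis"): 0.0001,
}

def toy_pwords(words):
    """Look up the made-up score of a candidate word list."""
    return toy_scores[tuple(words)]

# A generator expression: nothing is built until max starts iterating
candidates = (list(key) for key in toy_scores)

# max returns the candidate itself, not its score
best = max(candidates, key=toy_pwords)
print(best)   # ['like', 'this']
```

When translating this to another language, the equivalent is "sort by score, highest first, and take the first element", or a dedicated max-by operation if the library has one.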


Source: https://habr.com/ru/post/1769816/

