Creating Paraphrases of English Text Using PPDB

I need to generate a paraphrase of an English sentence using the PPDB paraphrase database.

I downloaded the datasets from the website.

1 answer

I would say that your first step is to reduce the problem to more manageable components. Second, decide whether you need to paraphrase on a one-to-one, lexical, syntactic, phrasal, or combined basis. To inform that decision, I would take one sentence and paraphrase it by hand to get an idea of what I'm looking for. Then I would write a parser for the downloaded data. After that, I would remove stop words and add part-of-speech tagging for your phrase, using a tagger such as the ones included in spaCy or NLTK.
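A minimal sketch of the parser step, assuming each line of the downloaded file has the layout "LHS ||| SOURCE ||| TARGET ||| FEATURE=VALUE ... ||| ALIGNMENT". Check this against the files you actually downloaded, since the layout may differ between releases. The names Rule and parse_ppdb_line and the sample line are made up for illustration.

```python
from typing import Dict, NamedTuple


class Rule(NamedTuple):
    lhs: str                    # syntactic category, e.g. "[NN]"
    source: str                 # phrase to be replaced
    target: str                 # candidate paraphrase
    features: Dict[str, float]  # numeric features such as WordLenDiff


def parse_ppdb_line(line: str) -> Rule:
    """Parse one '|||'-separated rule (assumed layout, see the note above)."""
    fields = [f.strip() for f in line.split("|||")]
    lhs, source, target, raw_features = fields[:4]
    features = {}
    for item in raw_features.split():
        name, _, value = item.partition("=")
        try:
            features[name] = float(value)
        except ValueError:
            continue  # skip non-numeric or malformed entries
    return Rule(lhs, source, target, features)


# Made-up line for illustration only; the values are not real PPDB data.
sample = "[NN] ||| business now ||| businessnow ||| WordLenDiff=-1.5 CharCountDiff=-1 ||| 0-0 1-0"
rule = parse_ppdb_line(sample)
print(rule.source, "->", rule.target, rule.features["WordLenDiff"])
```

Keeping the features in a plain dictionary makes it easy to switch later to a different feature as the objective without touching the parser.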

Since the data seems to give you all the information you need to build a consistent dictionary filter, that is where I would start. I would write a filter that looks up the part of speech of each word in my sentence in the [LHS] column of the dataset and selects the source entry that matches the word while minimizing or maximizing the value of one feature (for example, minimizing WordLenDiff), which in the case of "businessnow" <- "business now" is -1.5. By tracking this objective function, you get a basic paraphrased sentence.

Using this strategy, your output might change from:

"the business uses 4 gb standard." sent_score = 0

to:

"businessnow uses 4gb standard" sent_score = -3
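The selection step and the sentence score can be sketched roughly as follows. This is only one possible reading of the strategy: a greedy left-to-right pass that, for each matching phrase, picks the candidate with the smallest WordLenDiff and sums the chosen values. The table CANDIDATES, its entries, and the constant MAX_PHRASE_LEN are invented so that the sketch reproduces the example; they are not real PPDB rows, and part-of-speech matching against [LHS] is left out for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory table: source phrase -> [(target, WordLenDiff)].
# Entries and values are invented to mirror the example, not taken from PPDB.
CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "the business": [("businessnow", -1.5), ("the company", 0.0)],
    "4 gb": [("4gb", -1.5), ("four gigabytes", 0.0)],
}
MAX_PHRASE_LEN = 2  # longest source phrase (in tokens) to look up


def paraphrase(sentence: str) -> Tuple[str, float]:
    """Greedy left-to-right rewrite that minimizes the summed WordLenDiff."""
    tokens = sentence.lower().rstrip(".").split()
    output, score, i = [], 0.0, 0
    while i < len(tokens):
        for n in range(MAX_PHRASE_LEN, 0, -1):  # prefer longer matches
            phrase = " ".join(tokens[i:i + n])
            if phrase in CANDIDATES:
                # Pick the candidate with the smallest objective value.
                target, diff = min(CANDIDATES[phrase], key=lambda c: c[1])
                output.append(target)
                score += diff
                i += n
                break
        else:
            output.append(tokens[i])  # no rule found: keep the token as-is
            i += 1
    return " ".join(output), score


print(paraphrase("the business uses 4 gb standard."))
```

Tokens with no matching rule contribute nothing to the score, so an unchanged sentence scores 0, which matches the starting point in the example.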

Once you have a basic example working, you can start looking into feature selection algorithms, like those in scikit-learn, and add word alignment. But I would seriously scale the problem down first and build it up gradually. In the end, how you approach the problem depends on the intended use and how it should function.

Hope this helps.


Source: https://habr.com/ru/post/1269313/

