How to extract quotes from text (PHP)?

Hello!

I would like to extract all quotations from a text, together with the name of the person being quoted. DayLife does this very well.

Example:

“They think it’s ‘game over,’” one senior administration official said.

The phrase They think it’s ‘game over’ and the quoted person, one senior administration official, should both be extracted.

Do you think this is possible? You can only distinguish quotations from other words in quotation marks if you check whether a quoted person is mentioned.

Example:

“I think it is serious and it is deteriorating,” Admiral Mullen said on Sunday on CNN’s “State of the Union” program.

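“State of the Union” is not a quotation. But how do you detect that? a) Check whether a quoted person is mentioned. b) Count the blank spaces in the supposed quotation. If there are fewer than 3 blank spaces, it is not a quotation, right? I prefer b), since a quoted person is not always named.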

?

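I would first replace all types of quotation marks with a single type, so that only one kind has to be checked later: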

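<?php
$text = '';
// typographic quotation marks to normalize: “ ” „ » «
$quote_marks = array('“', '”', '„', '»', '«');
// replace each of them with a plain double quote
$text = str_replace($quote_marks, '"', $text);
?>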

Then I would extract all phrases between quotation marks:

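<?php
function extract_quotations($text) {
   // preg_match_all() returns the number of matches, or false on error
   $result = preg_match_all('/"([^"]+)"/', $text, $found_quotations);
   if ($result > 0) {
      // TODO: check for count of blank spaces
      // index 1 holds the text between the quotation marks
      return $found_quotations[1];
   }
   return array();
}
?>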
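
The blank-space check that the code comment above leaves open could be sketched roughly as follows. This is only an illustration: the helper name `filter_quotations_by_spaces` and its `$minSpaces` parameter are assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: keep only the quoted phrases that contain
// at least $minSpaces blank spaces.
function filter_quotations_by_spaces(array $candidates, $minSpaces = 3) {
    $kept = array();
    foreach ($candidates as $candidate) {
        // count the blank spaces inside the trimmed phrase
        if (substr_count(trim($candidate), ' ') >= $minSpaces) {
            $kept[] = $candidate;
        }
    }
    return $kept;
}

$text = '"I think it is serious and it is deteriorating," Admiral Mullen '
      . 'said on Sunday on CNN\'s "State of the Union" program.';

// collect every phrase between plain double quotes
preg_match_all('/"([^"]+)"/', $text, $matches);

// "State of the Union" contains exactly 3 blank spaces,
// so it is only dropped when at least 4 are required
$quotations = filter_quotations_by_spaces($matches[1], 4);
print_r($quotations);
```

Note that the threshold matters: with a minimum of 3 blank spaces, "State of the Union" would still be kept, so a pure space count is at best a rough heuristic.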

?

, . !

+3
3

ceejayoz, . , ( , " , , ", " " ), . - PHP, , python: http://www.nltk.org/

One option is a set of hand-written extraction rules, something like this:

abstract class QuotationExtractor {

    // initialized so the foreach below works before any extractor is registered
    protected static $instances = array();

    public static function getAllPossibleQuotations($string) {
        $possibleQuotations = array();
        foreach (self::$instances as $instance) {
            $possibleQuotations = array_merge(
                $possibleQuotations,
                $instance->extractQuotations($string)
            );
        }
        return $possibleQuotations;
    }

    public function __construct() {
        self::$instances[] = $this;
    }

    public abstract function extractQuotations($string);

}

class RegexExtractor extends QuotationExtractor {

    // initialized so extractQuotations() works before any rule is added
    protected $rules = array();

    public function extractQuotations($string) {
        $quotes = array();
        foreach ($this->rules as $rule) {
            preg_match_all($rule[0], $string, $matches, PREG_SET_ORDER);
            foreach ($matches as $match) {
                $quotes[] = array(
                    'quote' => trim($match[$rule[1]]),
                    'cited' => trim($match[$rule[2]])
                );
            }
        }
        return $quotes;
    }

    public function addRule($regex, $quoteIndex, $authorIndex) {
        $this->rules[] = array($regex, $quoteIndex, $authorIndex);
    }

}

$regexExtractor = new RegexExtractor();
$regexExtractor->addRule('/"(.*?)[,.]?\h*"\h*said\h*(.*?)\./', 1, 2);
$regexExtractor->addRule('/"(.*?)\h*"(.*)said/', 1, 2);
$regexExtractor->addRule('/\.\h*(.*)(once)?\h*said[\-]*"(.*?)"/', 3, 1);

class AnotherExtractor extends Quot...

With the rules above, the output is:

array(4) {
  [0]=>
  array(2) {
    ["quote"]=>
    string(15) "Not necessarily"
    ["cited"]=>
    string(8) "ceejayoz"
  }
  [1]=>
  array(2) {
    ["quote"]=>
    string(28) "They think it's `game over,'"
    ["cited"]=>
    string(34) "one senior administration official"
  }
  [2]=>
  array(2) {
    ["quote"]=>
    string(46) "I think it is serious and it is deteriorating,"
    ["cited"]=>
    string(14) "Admiral Mullen"
  }
  [3]=>
  array(2) {
    ["quote"]=>
    string(16) "Not necessarily,"
    ["cited"]=>
    string(0) ""
  }
}
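
For readers who want to try the rule idea without the class hierarchy, here is a minimal, self-contained sketch. The function name `extract_attributed_quotes` and both patterns are assumptions made for illustration; they are not the rules from the code above and only cover the two simple shapes shown in the comments.

```php
<?php
// Hypothetical helper: pairs each quotation with a speaker
// using two hand-written patterns.
function extract_attributed_quotes($text) {
    // each rule: regex, index of the quote group, index of the speaker group
    $rules = array(
        // "quote," said Speaker.
        array('/"([^"]+)"\h*said\h+([^.]+)\./', 1, 2),
        // "quote," Speaker said ... (speaker must start with a capital letter)
        array('/"([^"]+)"\h*([A-Z][\w\h]*?)\h+said\b/', 1, 2),
    );
    $quotes = array();
    foreach ($rules as $rule) {
        list($regex, $quoteIndex, $citedIndex) = $rule;
        preg_match_all($regex, $text, $matches, PREG_SET_ORDER);
        foreach ($matches as $match) {
            $quotes[] = array(
                // drop surrounding whitespace and a trailing comma or period
                'quote' => rtrim(trim($match[$quoteIndex]), ',.'),
                'cited' => trim($match[$citedIndex]),
            );
        }
    }
    return $quotes;
}

$text = '"Not necessarily," said ceejayoz. '
      . '"I think it is serious and it is deteriorating," Admiral Mullen said on Sunday.';
print_r(extract_attributed_quotes($text));
```

Results are grouped by rule rather than by position in the text, and unbalanced quotation marks can still produce false matches, so rules like these need to be checked by hand against real input.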
+3

If there are fewer than 3 blank spaces, it is not a quotation, right?

"Not necessarily," said ceejayoz.

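“State of the Union” is not a quotation. But how do you detect that? a) Check whether a quoted person is mentioned. b) Count the blank spaces in the supposed quotation. If there are fewer than 3 blank spaces, it is not a quotation, right? I prefer b), since a quoted person is not always named.

b) will not work: "State of the Union" has 3 blank spaces.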

+3

- , , , (.!?).

0

Source: https://habr.com/ru/post/1715912/

