Getting an excerpt from HTML in PHP

I need to show a short excerpt of a news item, written in HTML, on my front page. Obviously, I can’t use something as simple as substr, because it can leave tags unclosed or even cut a tag in half.

Which is easier:

  • Convert the HTML to readable plain text and take the beginning of it
  • Take the beginning of the HTML and close any unclosed tags after trimming (will this always render correctly?)

And how can I implement the chosen solution?

+3
5 answers

The easiest way is to remove all the HTML from the element's text with strip_tags() before trimming it.
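A minimal sketch of this approach, assuming UTF-8 input and the mbstring extension. The helper name plain_excerpt() and the 200-character default are illustrative, not part of the answer. Besides strip_tags(), it decodes entities, collapses whitespace, cuts by characters rather than bytes, and backs up to a word boundary.

```php
<?php
// Sketch of "strip the tags, then trim". plain_excerpt() is a hypothetical
// helper name; assumes UTF-8 input and ext-mbstring.
function plain_excerpt(string $html, int $maxChars = 200, string $trail = '...'): string
{
    // Drop all markup, then turn entities such as &amp; back into characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace (newlines, indentation) into single spaces.
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    if (mb_strlen($text, 'UTF-8') <= $maxChars) {
        return $text;
    }
    // Cut by characters, not bytes, so multibyte (e.g. Cyrillic) text stays valid.
    $cut = mb_substr($text, 0, $maxChars, 'UTF-8');
    // If the cut falls in the middle of a word, back up to the last space.
    if (mb_substr($text, $maxChars, 1, 'UTF-8') !== ' ') {
        $space = mb_strrpos($cut, ' ', 0, 'UTF-8');
        if ($space !== false) {
            $cut = mb_substr($cut, 0, $space, 'UTF-8');
        }
    }
    // Remove trailing punctuation before appending the trail.
    return rtrim($cut, ' ,.;:') . $trail;
}
```

Because entities are decoded, escape the result again when printing it, e.g. `echo '<p>' . htmlspecialchars(plain_excerpt($newsHtml)) . '</p>';` (where `$newsHtml` is a placeholder for your news HTML).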

+6

If you need to keep the HTML, trim it and then repair the result with Tidy, which closes any tags left open: see tidy::cleanRepair.
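Roughly how "cut, then repair" could look, as a sketch assuming the tidy extension is installed and the input is UTF-8. The helper names are hypothetical. The cut counts markup characters too, and it drops any tag left half-written by the cut before handing the fragment to tidy::cleanRepair().

```php
<?php
// Sketch of "cut, then let Tidy close the tags". Requires ext-tidy and
// ext-mbstring; cut_outside_tag() and trim_html_with_tidy() are hypothetical.

// Cut the raw HTML to $maxChars characters (markup included), dropping a
// tag that the cut would otherwise leave half-written, e.g. "<b cla".
function cut_outside_tag(string $html, int $maxChars): string
{
    if (mb_strlen($html, 'UTF-8') <= $maxChars) {
        return $html;
    }
    $cut = mb_substr($html, 0, $maxChars, 'UTF-8');
    $lastOpen = strrpos($cut, '<');
    $lastClose = strrpos($cut, '>');
    // A '<' with no '>' after it means the cut landed inside a tag.
    if ($lastOpen !== false && ($lastClose === false || $lastClose < $lastOpen)) {
        $cut = substr($cut, 0, $lastOpen);
    }
    return $cut;
}

function trim_html_with_tidy(string $html, int $maxChars): string
{
    $tidy = new tidy();
    // show-body-only: return only the fragment, not a full <html> document.
    $tidy->parseString(cut_outside_tag($html, $maxChars), ['show-body-only' => true], 'utf8');
    // cleanRepair() closes the tags that the cut left open.
    $tidy->cleanRepair();
    return trim(tidy_get_output($tidy));
}
```

This sketch does not guard against a cut that lands inside an entity such as `&amp;`.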

+3

If you need more than stripping tags, use an HTML parser such as PHP Simple HTML DOM Parser.

For example, this extracts the title, intro and details of each article on Slashdot:

// Simple HTML DOM is distributed as a single file
require_once 'simple_html_dom.php';

// Create DOM from URL
$html = file_get_html('http://slashdot.org/');

$articles = array();

// Find all article blocks and keep the plain text of their parts
foreach ($html->find('div.article') as $article) {
    $item = array();
    $item['title']   = $article->find('div.title', 0)->plaintext;
    $item['intro']   = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);
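If you would rather not add a third-party library, PHP's built-in DOM extension can do the same "find an element, take its plain text" job. This is a swapped-in technique, not the library above. first_paragraph_text() is a hypothetical helper that returns the text of the first <p> (or of the whole fragment when there is none), which you could then shorten at a word boundary.

```php
<?php
// Sketch: pick an element with PHP's built-in DOM extension and take its
// plain text. first_paragraph_text() is a hypothetical helper name.
function first_paragraph_text(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // ignore warnings about unknown HTML5 tags
    // The meta tag tells libxml the input is UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $p = $xpath->query('//p')->item(0);
    if ($p !== null) {
        $text = $p->textContent;
    } elseif ($doc->documentElement !== null) {
        // No <p>: fall back to the text of the whole fragment.
        $text = $doc->documentElement->textContent;
    } else {
        $text = '';
    }
    // Collapse whitespace left over from the markup's line breaks.
    return trim(preg_replace('/\s+/u', ' ', $text));
}
```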
+2

XML "" .

. XML .
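The question's second option (trim the HTML but keep the tags closed) can also be done on a parsed tree rather than on the raw string. A sketch using PHP's built-in DOM extension, with hypothetical helper names: it walks the nodes in document order, keeps text until $maxChars characters have been used, and removes everything after that; the serializer then writes the closing tags itself.

```php
<?php
// Sketch: trim HTML on the parsed tree so the output stays well-formed.
// truncate_html() and trim_node() are hypothetical helper names.

// Walk $node's children in order, spending $budget on text characters and
// removing every node that comes after the budget runs out.
function trim_node(DOMNode $node, int &$budget): void
{
    // Copy the child list first, because children are removed while looping.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($budget <= 0) {
            $node->removeChild($child);
        } elseif ($child instanceof DOMText) {
            $len = mb_strlen($child->data, 'UTF-8');
            if ($len > $budget) {
                // Keep only the part of this text node that fits.
                $child->data = mb_substr($child->data, 0, $budget, 'UTF-8');
                $budget = 0;
            } else {
                $budget -= $len;
            }
        } elseif ($child->hasChildNodes()) {
            trim_node($child, $budget);
        }
    }
}

function truncate_html(string $html, int $maxChars): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // ignore warnings about unknown HTML5 tags
    // The meta tag tells libxml the input is UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $budget = $maxChars;
    trim_node($body, $budget);

    // Serialize only the body's children, not the <html>/<body> wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Only text counts toward the limit, not markup. The last word may still be cut in half, so add a word-boundary check if that matters.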

+1

This takes the first paragraph, shortens it without cutting words, and appends an optional trailing string ('...' by default).

$excerpt = self::excerpt_paragraph($html, 180);

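// TextHelper is a hypothetical class name: the original answer showed
// these two static methods without the class that contains them.
class TextHelper
{
    /**
     * Excerpt the first paragraph from HTML content.
     */
    public static function excerpt_paragraph($html, $max_char = 100, $trail = '...')
    {
        // Capture the first <p>...</p>, allowing attributes and nested tags
        // (the original pattern <p>[^>]+</p> failed on any tag inside the paragraph).
        $matches = array();
        if (preg_match('/<p\b[^>]*>(.*?)<\/p>/is', $html, $matches)) {
            $p = strip_tags($matches[1]);
        } else {
            $p = strip_tags($html);
        }

        // Shorten without cutting words.
        $p = self::short_str($p, $max_char);

        // Remove trailing commas, full stops, colons, semicolons, spaces and a
        // dangling article "a"/"A" (rtrim() would also eat the final "a" of "idea").
        $p = preg_replace('/(?:[\s,.;:]|\ba\b)+$/i', '', $p);

        // Return nothing if the result is blank or too short.
        if (trim($p) === '' || mb_strlen($p) < 10) {
            return '';
        }

        return '<p>' . $p . $trail . '</p>';
    }

    /**
     * Shorten a string without cutting words (unless $cut is true).
     */
    public static function short_str($str, $len, $cut = false)
    {
        if (mb_strlen($str) <= $len) {
            return $str;
        }
        $string = mb_substr($str, 0, $len);
        if ($cut) {
            return $string;
        }
        // Back up to the last space; if there is none, fall back to a hard
        // cut instead of returning an empty string.
        $space = mb_strrpos($string, ' ');
        return $space === false ? $string : mb_substr($string, 0, $space);
    }
}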
0

Source: https://habr.com/ru/post/1723476/

