Facebook-like on-demand content scraper

Have you noticed that FB scrapes any link you post to Facebook (status, message, etc.) right after you paste it into the link field, and then displays various metadata: a thumbnail of an image, several images from the linked page, or a video thumbnail for a video link (for example, YouTube)?

Any ideas on how to copy this feature? I am thinking of a couple of background workers, or even better just JavaScript that makes XHR requests and parses the content with regular expressions or something similar. Any ideas? Any links? Has anyone already tried to do the same and wrapped it in a class? Anything? :)

Thanks!

+3
3 answers

FB scrapes the meta tags from the HTML.

I.e., when you enter the URL, FB displays the page title, then the URL (truncated), and then the content of the <meta name="description"> element.

As for the selection of thumbnails, I think FB probably picks only images that exceed certain dimensions, i.e. it skips button graphics, 1px spacers, etc.
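The size check guessed at above could be sketched roughly like this. The function name, the input shape and the 50px thresholds are all assumptions made for illustration; FB's real limits are not known here.

```php
<?php
// Hypothetical thresholds: Facebook's actual limits are not known.
const MIN_WIDTH = 50;
const MIN_HEIGHT = 50;

/**
 * Keep only images large enough to be plausible thumbnails.
 *
 * @param array $images list of arrays with 'url', 'width' and 'height' keys
 * @return array URLs of the images that pass the size check
 */
function filterThumbnailCandidates(array $images, $minWidth = MIN_WIDTH, $minHeight = MIN_HEIGHT)
{
    $urls = array();
    foreach ($images as $img) {
        // Both dimensions must reach the minimum, which drops
        // 1px spacers as well as wide-but-flat button graphics.
        if ($img['width'] >= $minWidth && $img['height'] >= $minHeight) {
            $urls[] = $img['url'];
        }
    }
    return $urls;
}

$candidates = filterThumbnailCandidates(array(
    array('url' => 'http://example.com/spacer.gif', 'width' => 1, 'height' => 1),
    array('url' => 'http://example.com/button.png', 'width' => 80, 'height' => 20),
    array('url' => 'http://example.com/photo.jpg', 'width' => 640, 'height' => 480),
));
print_r($candidates);
```

The dimensions themselves would have to come from somewhere, for example the width/height attributes of the img tag or PHP's getimagesize() once the image has been downloaded.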

Edit: I don't know exactly what you are looking for, but here is a PHP function that scrapes the relevant data from a page.
It uses the Simple HTML DOM library from http://simplehtmldom.sourceforge.net/

I looked at how FB does it, and it looks like scraping is done on the server side.

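    //Simple HTML DOM library (adjust the path to where you saved it)
    require_once 'simple_html_dom.php';

    //Plain container for everything scraped from one page
    class ScrapedInfo
    {
        public $url;
        public $title;
        public $description;
        public $imageUrls = array();
    }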

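    function scrapeUrl($url)
    {
        $info = new ScrapedInfo();
        $info->url = $url;
        $html = file_get_html($info->url);
        if (!$html)
            return $info; //Fetch or parse failed: return what we have

        //Grab the page title
        $titleEl = $html->find('title', 0);
        if ($titleEl)
            $info->title = trim($titleEl->plaintext);

        //Grab the page description
        foreach($html->find('meta') as $meta)
            if (strtolower($meta->name) == "description")
                $info->description = trim($meta->content);

        //Site root and page directory, used to resolve relative image URLs
        //(simplified: ignores ports, <base href> and "../" segments)
        $parts = parse_url($url);
        $root = $parts['scheme'].'://'.$parts['host'];
        $path = isset($parts['path']) ? $parts['path'] : '/';
        $dir = preg_replace('#/[^/]*$#', '/', $path);

        //Grab the image URLs
        $imgArr = array();
        foreach($html->find('img') as $element)
        {
            $rawUrl = trim($element->src);
            if ($rawUrl == '')
                continue; //Skip <img> tags without a src

            //Turn any relative Urls into absolutes
            if (preg_match('#^https?://#i', $rawUrl))
                $imgArr[] = $rawUrl;
            else if (substr($rawUrl,0,2) == '//')
                $imgArr[] = $parts['scheme'].':'.$rawUrl;
            else if ($rawUrl[0] == '/')
                $imgArr[] = $root.$rawUrl;
            else
                $imgArr[] = $root.$dir.$rawUrl;
        }
        $info->imageUrls = $imgArr;

        return $info;
    }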

+14

Facebook looks at the page's HTML. Besides the title and description, a page can use <link rel="image_src" href="thumbnail.jpg" /> to point at the screengrab it wants shown.
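A minimal sketch of picking up that image_src link, using PHP's built-in DOM extension rather than any particular scraping library. The function name findImageSrc and the sample markup are made up for illustration.

```php
<?php
/**
 * Return the href of the first <link rel="image_src"> in $html,
 * or null when the page does not declare one.
 */
function findImageSrc($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('link') as $link) {
        if (strtolower($link->getAttribute('rel')) === 'image_src') {
            $href = $link->getAttribute('href');
            return $href !== '' ? $href : null;
        }
    }
    return null;
}

$page = '<html><head><title>Demo</title>'
      . '<link rel="image_src" href="thumbnail.jpg" /></head><body></body></html>';
$thumb = findImageSrc($page);
echo $thumb, "\n";
```

When the function returns null, a scraper would presumably fall back to the img tags on the page. The returned href can also be relative, so it may still need resolving against the page URL.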

0

See http://www.embedify.me. It is written in .NET, it can be called from JavaScript, and there is a JavaScript API that gives a UI like FB's.

0

Source: https://habr.com/ru/post/1748264/

