FB scrapes the meta tags from the HTML.
I.e. when you enter the URL, FB displays the page title, followed by the URL (truncated), and then the contents of the <meta name="description"> element.
As for the selection of thumbnails, I think FB probably picks only those that exceed certain dimensions, i.e. it skips button graphics, 1px spacers, etc.
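To illustrate the size-filter idea, here is a minimal sketch. The function name `filterThumbnailCandidates` and the 50x50 threshold are my own assumptions, not anything FB documents. It takes image dimensions as input; in practice you could get them with PHP's `getimagesize()`, which returns the width at index 0 and the height at index 1.

```php
<?php
// Hypothetical thresholds: FB's real cut-offs are unknown to me.
const MIN_THUMB_WIDTH = 50;
const MIN_THUMB_HEIGHT = 50;

/**
 * Keep only the URLs of images that are at least $minWidth x $minHeight.
 * $images is a list of arrays with 'url', 'width' and 'height' keys.
 */
function filterThumbnailCandidates(array $images, $minWidth = MIN_THUMB_WIDTH, $minHeight = MIN_THUMB_HEIGHT)
{
    $kept = array();
    foreach ($images as $image) {
        // Spacers and small button graphics fail one or both checks
        if ($image['width'] >= $minWidth && $image['height'] >= $minHeight) {
            $kept[] = $image['url'];
        }
    }
    return $kept;
}

$candidates = array(
    array('url' => 'http://example.com/spacer.gif', 'width' => 1,   'height' => 1),
    array('url' => 'http://example.com/button.png', 'width' => 80,  'height' => 20),
    array('url' => 'http://example.com/photo.jpg',  'width' => 400, 'height' => 300),
);

// Only photo.jpg passes the 50x50 threshold
print_r(filterThumbnailCandidates($candidates));
```

Checking both dimensions matters: a wide but very short button graphic would slip through a width-only test.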
Edit: I don't know exactly what you're looking for, but here's a PHP function for scraping the relevant data from pages.
This uses the simple HTML DOM library from http://simplehtmldom.sourceforge.net/
I looked at how FB does it, and it looks like scraping is done on the server side.
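// Assumes simple_html_dom.php from the library above sits next to this script
require_once 'simple_html_dom.php';

class ScrapedInfo
{
    public $url;
    public $title;
    public $description;
    public $imageUrls = array();
}

function scrapeUrl($url)
{
    $info = new ScrapedInfo();
    $info->url = $url;

    $html = file_get_html($info->url);
    if (!$html) {
        // The page could not be fetched or parsed
        return $info;
    }

    // Grab the page title
    $titleNode = $html->find('title', 0);
    if ($titleNode) {
        $info->title = trim($titleNode->plaintext);
    }

    // Grab the page description
    foreach ($html->find('meta') as $meta) {
        if (strtolower($meta->name) == 'description') {
            $info->description = trim($meta->content);
        }
    }

    // Grab the image URLs
    $imgArr = array();
    foreach ($html->find('img') as $element) {
        $rawUrl = $element->src;
        if (!$rawUrl) {
            continue; // skip <img> tags without a src
        }
        // Turn any relative URLs into absolutes. This is a naive join:
        // it is only right when $url is the site root or a directory.
        if (substr($rawUrl, 0, 4) != 'http') {
            $imgArr[] = rtrim($url, '/') . '/' . ltrim($rawUrl, '/');
        } else {
            $imgArr[] = $rawUrl;
        }
    }
    $info->imageUrls = $imgArr;

    return $info;
}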
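The relative-to-absolute step in the function above is the part most likely to go wrong, because plain string concatenation gives the wrong result for root-relative paths such as /img/a.png when the page URL itself contains a path. Here is a rough sketch of a more careful resolver. `resolveImageUrl` is a hypothetical helper of mine, not part of Simple HTML DOM, and it assumes the page URL is an absolute http(s) URL. It does not collapse `../` or `./` segments.

```php
<?php
/**
 * Resolve an <img src> value against the URL of the page it came from.
 * Hypothetical helper; assumes $pageUrl is an absolute http(s) URL.
 */
function resolveImageUrl($pageUrl, $src)
{
    // Already absolute (http:// or https://)
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }

    $parts = parse_url($pageUrl);
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $origin = $scheme . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    // Protocol-relative: //cdn.example.com/a.png
    if (substr($src, 0, 2) == '//') {
        return $scheme . ':' . $src;
    }

    // Root-relative: /img/a.png
    if (substr($src, 0, 1) == '/') {
        return $origin . $src;
    }

    // Path-relative: resolve against the directory of the page
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

$page = 'http://example.com/blog/post.html';
echo resolveImageUrl($page, 'img/a.png'), "\n";       // http://example.com/blog/img/a.png
echo resolveImageUrl($page, '/logo.png'), "\n";       // http://example.com/logo.png
echo resolveImageUrl($page, '//cdn.example.com/b.png'), "\n"; // http://cdn.example.com/b.png
```

In scrapeUrl you could call this in place of the concatenation inside the image loop, passing the page URL and each src value.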