Create summary from link

Many sites (Facebook, Google+, etc.) have a feature that builds a summary with a title, an image and some text from a link. I tried to find out whether there are any libraries or recommendations for implementing this, but my search results were not useful at all.

I know that I can parse HTML pages and extract the elements I want, but I think there should be some kind of standard for this (and maybe also for how to create pages that are friendly to this type of functionality).

Does anyone have a good link that points me in the right direction? JavaScript or .NET is my preferred choice, but I can work with another language too.

+4
2 answers

For the "perhaps also how to create pages that are friendly to this kind of functionality" part, you are probably looking for the Open Graph protocol:

<html xmlns:og="http://ogp.me/ns#">
  <head>
    <title>The Rock (1996)</title>
    <meta property="og:title" content="The Rock" />
    <meta property="og:type" content="movie" />
    <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
    <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
    ...
  </head>
  ...
</html>

I think this is the first place Facebook looks. When these tags are missing, Facebook seems to fall back on its own algorithms for detecting the most important part of the page.
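To illustrate the consuming side, here is a minimal sketch in Python that collects the `og:` meta tags from a page using only the standard library. The names `OpenGraphParser` and `extract_open_graph` are made up for this example, and keeping only the first value per property is a simplifying assumption (a property such as `og:image` can legitimately appear more than once).

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags into a dict (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Assumption: the first occurrence of a property wins.
            self.properties.setdefault(prop, content)


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties
```

Calling `extract_open_graph(page_html)` on the snippet above would yield keys such as `og:title` and `og:image`, which map directly onto the title and image of the link preview.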

+1

"Many sites (Facebook, Google+, etc.) have a feature that builds a summary with a title, an image and some text from a link. I tried to find out whether there are any libraries or recommendations for implementing this, but my search results were not useful at all."

Such a feature is usually built with some kind of crawler, meaning that your script opens the link and looks at the page's data, just as you suggest yourself.

"I know that I can parse HTML pages and extract the elements I want, but I think there should be some standard for how to do this (and maybe also for how to create pages that are friendly to this kind of functionality)."

The standard way is the way most search engines, such as Google, do it. You take the title from the page's title and the description from the description metadata, if there is any. Most search engines now ignore the description metadata and instead try to create their own summary.

This is usually done by searching for headings (h1, h2, etc.) and then paragraphs.
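A rough sketch of that fallback in Python, using only the standard library: it records the page title, the description metadata, the first heading and the first non-empty paragraph. The class and function names are invented for this example, and the preference order in `summarize` is an assumption, not something any search engine documents. Real pages with unclosed `<p>` tags would need a more forgiving parser.

```python
from html.parser import HTMLParser


class FallbackSummaryParser(HTMLParser):
    """Records title, meta description, first heading, first paragraph."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.result = {"title": None, "description": None,
                       "heading": None, "paragraph": None}
        self._current = None      # key currently being captured
        self._current_tag = None  # tag whose end closes the capture
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            if (attr.get("name") or "").lower() == "description":
                if self.result["description"] is None:
                    self.result["description"] = attr.get("content")
            return
        if tag == "title":
            key = "title"
        elif tag in self.HEADINGS:
            key = "heading"
        elif tag == "p":
            key = "paragraph"
        else:
            return
        # Start capturing only if idle and this slot is still empty.
        if self._current is None and self.result[key] is None:
            self._current, self._current_tag, self._buffer = key, tag, []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current_tag:
            # Collapse runs of whitespace; treat empty text as missing.
            text = " ".join("".join(self._buffer).split())
            self.result[self._current] = text or None
            self._current = None


def summarize(html):
    """Assumed preference: title over heading, description over paragraph."""
    parser = FallbackSummaryParser()
    parser.feed(html)
    parser.close()
    r = parser.result
    return {"title": r["title"] or r["heading"],
            "text": r["description"] or r["paragraph"]}
```

If you would rather mimic the "ignore the description metadata" behaviour mentioned above, swap the order in the last line so that the first paragraph is preferred.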

And to make your site "friendly" to this kind of crawling, build it in accordance with web standards (W3C).

"Does anyone have a good link that will point me in the right direction? JavaScript or .NET is my preferred choice, but I can work with another language too."

The language does not really matter, as long as it can perform a basic HTTP GET.
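As an illustration of how little is needed, here is a minimal sketch of the fetch step in Python with the standard library. The function name `fetch_html`, the `User-Agent` string, the 1 MB cap and the UTF-8 fallback are all assumptions chosen for this example; the same few lines translate readily to JavaScript or .NET.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10, max_bytes=1_000_000):
    """Download a page with a plain GET and return it as text (sketch)."""
    # Assumption: some sites reject requests without a User-Agent header.
    request = Request(url, headers={"User-Agent": "link-preview-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Read at most max_bytes so a huge page cannot exhaust memory.
        raw = response.read(max_bytes)
        # Use the charset from Content-Type if present, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

The returned string can then be handed to whatever HTML parsing you use to pick out the title, image and text.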

0

Source: https://habr.com/ru/post/1368918/

