Open webpage programmatically and get html as a string

Question

I have a Facebook account and I would like to extract my friends’ photos and personal data, such as “Date of birth”, “Studied”, etc. I can already get the address of the main Facebook page for each of my friends’ accounts, but I do not know how to programmatically open each of those pages and save the HTML as a string so that I can extract the personal details and photos. Please help! Thanks in advance!

+4

html c#

user377338 Jan 19 '11 at 15:05

4 answers

You have three options:

1- Using the WebClient object.

    // Requires: using System.Net;
    WebClient webClient = new WebClient();
    webClient.Credentials = new System.Net.NetworkCredential("UserName", "Password", "Domain");
    string pageHTML = webClient.DownloadString("http://url");

2- Using WebRequest. This is the best solution because it gives you more control over your request.

    // Requires: using System.IO; using System.Net; using System.Text;
    WebRequest myWebRequest = WebRequest.Create("http://URL");
    using (WebResponse myWebResponse = myWebRequest.GetResponse())
    using (Stream receiveStream = myWebResponse.GetResponseStream())
    using (StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8))
    {
        string strResponse = readStream.ReadToEnd();

        // strFilePath is the path of the output file; it must be defined elsewhere.
        using (StreamWriter oSw = new StreamWriter(strFilePath))
        {
            oSw.WriteLine(strResponse);
        }
    }

3- Using WebBrowser (I bet you don't want to do this)

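    // Requires: using System.Windows.Forms;
    WebBrowser wb = new WebBrowser();
    string pageHTML = "";

    // Subscribe before navigating so the completion event is not missed.
    // The handler runs asynchronously, once the document has finished loading.
    wb.DocumentCompleted += (sender, e) => pageHTML = wb.DocumentText;
    wb.Navigate("http://URL");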

Apologies for any mistakes in the code: I improvised it and did not have a compiler at hand to check the syntax, but it should be essentially correct.

EDIT: For Facebook pages, you can use the Facebook Graph API:

http://developers.facebook.com/docs/reference/api/

+9

deadlock Jan 19 '11 at 15:15

In general, you have two options here. The first is called web scraping: you load the HTML source with code like the following:

    // Requires: using System.IO; using System.Net;
    string stringResponse;
    var request = WebRequest.Create("http://example.com");
    using (var response = request.GetResponse())
    using (Stream responseStream = response.GetResponseStream())
    using (var reader = new StreamReader(responseStream))
    {
        stringResponse = reader.ReadToEnd();
    }

stringResponse then contains the HTML source of the http://example.com website.

However, this is probably not what you want to do. Facebook has an SDK that you can use to download this kind of information. You can read about it on the following page:

http://developers.facebook.com/docs/reference/api/user/

If you want to use the Facebook API, I think it is worth changing your question or asking a new one about it, since the API is quite complicated and requires authorization and some additional coding. Nevertheless, this is the best way: your code is much less likely to keep breaking, and it protects the privacy of the people whose information you want to retrieve.

For example, if you query my profile through the API, you get the following response:

    {
      "id": "1089655429",
      "name": "Timo Willemsen",
      "birthday": "08/29/1989",
      "education": [
        {
          "school": {
            "id": "115091211836927",
            "name": "Stedelijk Gymnasium Arnhem"
          },
          "year": {
            "id": "127668947248449",
            "name": "2001"
          },
          "type": "High School"
        }
      ]
    }

You can see that I am Timo Willemsen, 21 years old, and that I studied at Stedelijk Gymnasium Arnhem in 2001.
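To illustrate how such a response could be read in code, here is a minimal sketch that parses an abbreviated copy of the JSON above. It assumes a runtime where System.Text.Json is available (.NET Core 3.0 or later); the class name GraphResponseExample is hypothetical, and the JSON is hard-coded rather than fetched, so no authorization is involved.

```csharp
using System;
using System.Text.Json;

class GraphResponseExample
{
    static void Main()
    {
        // Abbreviated copy of the response shown above (assumption: hard-coded, not fetched).
        string json = @"{
          ""id"": ""1089655429"",
          ""name"": ""Timo Willemsen"",
          ""birthday"": ""08/29/1989"",
          ""education"": [
            { ""school"": { ""name"": ""Stedelijk Gymnasium Arnhem"" }, ""type"": ""High School"" }
          ]
        }";

        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        // Top-level string fields.
        string name = root.GetProperty("name").GetString();
        string birthday = root.GetProperty("birthday").GetString();
        Console.WriteLine($"{name}, born {birthday}");

        // "education" is an array; each entry has a nested "school" object.
        foreach (JsonElement entry in root.GetProperty("education").EnumerateArray())
        {
            string school = entry.GetProperty("school").GetProperty("name").GetString();
            Console.WriteLine($"Studied at: {school}");
        }
    }
}
```

GetProperty throws if a field is missing, so for real responses, where fields may be withheld by privacy settings, TryGetProperty is probably the safer choice.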

+4

Timo Willemsen Jan 19 '11 at 15:10

Use Selenium 2.0 for C#. http://seleniumhq.org/download/

    var driver = new FirefoxDriver();
    driver.Navigate().GoToUrl("http://www.google.com");
    string pageSource = driver.PageSource;

0

Matthew Kelly Mar 14 '11 at 22:20

Andrew Hare · Accepted Answer · 2011-01-19T15:06:28+0000

Try the following:

    var html = new WebClient().DownloadString("the facebook account url goes here");

Also, once you have loaded the HTML as a string, I highly recommend that you use the Html Agility Pack to parse it.
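As a rough sketch of what parsing with the Html Agility Pack might look like, the example below lists the src attribute of every image in an HTML string. It assumes the HtmlAgilityPack NuGet package is installed; the HTML snippet and the class name ParseExample are made up for illustration and stand in for a page downloaded as shown above.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

class ParseExample
{
    static void Main()
    {
        // Hypothetical HTML standing in for a downloaded page.
        string html = "<html><body><img src='a.jpg'/><img src='b.jpg'/></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath query for all <img> elements that have a src attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images == null)
        {
            Console.WriteLine("No images found.");
            return;
        }

        foreach (HtmlNode img in images)
        {
            // The second argument is the default returned if the attribute is absent.
            Console.WriteLine(img.GetAttributeValue("src", string.Empty));
        }
    }
}
```

The same pattern applies to other details: change the XPath expression to match the elements that hold the data you need on the actual page.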
