I have been trying for several days (unsuccessfully) to scrape the city from each of about 500 Facebook profile URLs. However, Facebook handles this data in a very strange way, and I can't figure out what is happening under the hood, so I don't know what I need to do.
In essence, the problem is that Facebook displays very different amounts of data depending on who is logged in and what the privacy settings of the profile are. For example, try opening the following three links, both in a browser where you are logged in to Facebook and in one where you are not:
As you can see, Facebook shows the data in both cases for the first link, but only shows the data for the second link if you are logged in (to ANY account). The third link displays the city when you are logged in, but only the other information when you are not.
The reason this is so problematic (and relevant to Python) is that when I try to scrape the page using Beautiful Soup or mechanize, I cannot figure out how to make the program "pretend" that I am logged in to my account. This means that I can easily grab the data from the first type of link (of which there are fewer than 10), but I cannot get the city from the second or third type. So far I have tried a number of solutions with little success.
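For example, one approach I have been experimenting with is exporting the cookies from a browser session where I am logged in (in Netscape cookies.txt format) and loading them into mechanize's cookie jar, roughly like the sketch below. The filename is just a placeholder, and I am not sure the cookies are actually being sent:

import mechanize

# Cookies exported from a logged-in browser session, in Netscape
# cookies.txt format (the filename is just a placeholder).
cj = mechanize.MozillaCookieJar()
cj.load('facebook_cookies.txt', ignore_discard=True, ignore_expires=True)

br = mechanize.Browser()
br.set_handle_robots(False)
br.set_cookiejar(cj)

br.open('http://www.facebook.com/100004210542493')
print br.response().get_data()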
Here is an example of code that works correctly for the first type, but not for the other types:
import mechanize
import re
import csv

user_info = []
fb_url = 'http://www.facebook.com/100004210542493'

br = mechanize.Browser()
br.set_handle_robots(False)
br.open(fb_url)
all_html = br.response().get_data()
print all_html

city = re.search('fsl fwb fcb">(.+?)</a></div><div class="aboutSubtitle fsm fwn fcg', all_html).group(1)
user_info = [fb_url, city]
print user_info
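What I cannot get working is the login step itself. Below is a sketch of the kind of thing I have been attempting with mechanize's form handling; the credentials are placeholders, and the form field names ('email' and 'pass') are simply what I see in the login page's HTML, so they may well be wrong:

import mechanize

EMAIL = 'me@example.com'      # placeholder
PASSWORD = 'my_password'      # placeholder

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-Agent', 'Mozilla/5.0')]

# Submit the login form first so the Browser's cookie jar picks up a
# session cookie, then fetch the profile page with the same Browser.
br.open('https://www.facebook.com/login.php')
br.select_form(nr=0)          # assuming the login form is the first form on the page
br['email'] = EMAIL
br['pass'] = PASSWORD
br.submit()

br.open('http://www.facebook.com/100004210542493')
print br.response().get_data()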
I also have a version that uses Beautiful Soup, roughly like the sketch at the end of this post. If anyone has ideas on how to get around this, I would be extremely grateful. Thanks!
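For reference, the Beautiful Soup version looks more or less like this (fetching with urllib2 and matching the same class names as the regex above; written against bs4, i.e. "from bs4 import BeautifulSoup"):

import urllib2
from bs4 import BeautifulSoup

fb_url = 'http://www.facebook.com/100004210542493'
html = urllib2.urlopen(fb_url).read()
soup = BeautifulSoup(html)

# Look for a link carrying the same classes my regex above targets.
city = None
for tag in soup.find_all('a'):
    classes = tag.get('class') or []
    if set(['fsl', 'fwb', 'fcb']) <= set(classes):
        city = tag.get_text()
        break

if city:
    print [fb_url, city]
else:
    print 'city not found (presumably hidden because I am not logged in)'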