Scrambling Facebook data with Python

Question

Scrambling Facebook data with Python

I have been trying for several days (unsuccessfully) to clear the cities of about 500 Facebook URLs. However, Facebook processes its data in a very strange way, and I can’t understand what is happening under the hood in order to understand what I need to do.

In essence, the problem is that Facebook displays very different amounts of data depending on who is logged in and what are the privacy settings of the account. For example, try opening the following three links, both in the browser where you are logged in to Facebook and in the place where you are not:

As you can see, Facebook uploads data in both cases for the first link, but only receives data for the second link if you are logged in (to ANY account). The third link displays the city when you are logged in, but only other information is displayed if you did not.

The reason this is extremely problematic (and applies to Python) is that when I try to clear the page using Beautiful Soup or Mechanize, I cannot figure out how to make the program "pretend" that I am registered with my account. This means that I can easily capture data from the first type of link (of which there are less than 10), but I can not get the city from the second or third type. So far I have tried a number of solutions with little success.

Here is an example of code that works correctly for the first type, but not for other types:

import mechanize import re import csv user_info = [] fb_url = 'http://www.facebook.com/100004210542493' br = mechanize.Browser() br.set_handle_robots(False) br.open(fb_url) all_html = br.response().get_data() print all_html city = re.search('fsl fwb fcb">(.+?)</a></div><div class="aboutSubtitle fsm fwn fcg', all_html).group(1) user_info = [fb_url, city] print user_info

I also have a version that uses Beautiful Soup. If anyone has ideas on how to get around this, I would be extremely grateful. Thanks!

+6

python facebook web-scraping beautifulsoup mechanize

cscanlin Sep 27 '13 at 2:37

source share

4 answers

You should study the facepy Johannes Gorset . He did a brilliant job. I used it when I was working on a small Facebook application for a personal project.

+15

Rohit Sep 27 '13 at 2:56

source share

I think that clearing facebook data is illegal. It exists in terms of using facebook. Each action is logged with your login details, even if you use the bot to clear it. If they are caught, they may prevent you from using facebook for your life. If there is a potential threat to any asset that you can pose for, they can punish you further.

+2

TNT Jun 13 '16 at 16:46

source share

You can try using selenium and the Facebook API. I also had to clear some similar data from the Facebook account testing list, and selenium webdriver helped imitate as a real user and clear the required data.

+1

shashivs Mar 27 '15 at 8:24

source share

James robinson · Accepted Answer · 2013-09-27T02:54:47+0000

The correct way to use facebook API. For various business, security and privacy purposes, they go astray to make scraping data complex.

If you insist on cleaning up, I would try to log in first using the mechanism to submit the form. I never tried to do this using facebook, but on many sites it’s easier to parse versions designed for mobile users on m.site.com.

Scrambling Facebook data with Python

More articles: