I am working on a site with URLs similar to YouTube's. We generate identifiers on the server, and I chose base 62 (digits plus lowercase and uppercase letters) so that they are shorter. So a URL could be something like example.com/user/123AbCaBc. It seems that the Facebook crawler regularly requests a lowercased version, example.com/user/123abcabc. This results in a 404 error because the lowercased identifier is not in the database.
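To illustrate why the lowercased URL cannot match, here is a minimal sketch of a base 62 codec. The alphabet ordering and the function names `encodeBase62` / `decodeBase62` are assumptions for illustration only; the real generator on my server may order the characters differently.

```javascript
// Hypothetical alphabet: digits, then lowercase, then uppercase.
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Decode a base 62 string into a BigInt.
function decodeBase62(id) {
  let value = 0n;
  for (const ch of id) {
    const digit = ALPHABET.indexOf(ch);
    if (digit === -1) throw new Error(`invalid base 62 character: ${ch}`);
    value = value * 62n + BigInt(digit);
  }
  return value;
}

// Encode a non-negative BigInt as a base 62 string.
function encodeBase62(value) {
  if (value === 0n) return ALPHABET[0];
  let out = "";
  while (value > 0n) {
    out = ALPHABET[Number(value % 62n)] + out;
    value /= 62n;
  }
  return out;
}

const original = "123AbCaBc";
const lowered = original.toLowerCase(); // what the crawler appears to request

// "A" and "a" are different digits, so the two strings are different IDs.
console.log(decodeBase62(original) === decodeBase62(lowered)); // false
```

Because case carries information in base 62, a lowercased identifier cannot be mapped back to the original without guessing.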
According to the logs, no other user agent is producing 404s, so this is definitely a robot, not a human. Here's the user agent I see:
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
This happens approximately every 4 minutes. Currently, I am only logging 404 hits, so I'm not sure whether the crawler also requests variants other than the lowercased one.
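To find out which variants the crawler actually requests, something like the following could log every hit from that user agent regardless of status code. This is only a sketch: `isFacebookCrawler` and `logCrawlerHits` are hypothetical helper names, and it assumes a plain Node.js `http` request/response pair rather than any particular framework.

```javascript
// Matches the user agent string seen in the logs.
const FACEBOOK_UA = /facebookexternalhit/i;

function isFacebookCrawler(userAgent) {
  return FACEBOOK_UA.test(userAgent || "");
}

// Call at the start of the request handler. Once the response has been sent,
// it logs the status code, method, and exact URL requested by the crawler.
function logCrawlerHits(req, res, log = console.log) {
  if (!isFacebookCrawler(req.headers["user-agent"])) return;
  res.on("finish", () => {
    log(`${new Date().toISOString()} ${res.statusCode} ${req.method} ${req.url}`);
  });
}
```

Calling `logCrawlerHits(req, res)` as the first line of the handler passed to `http.createServer` would record both the 200 and the 404 requests, so the original and lowercased URLs can be compared side by side.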
The server stack is Node.js / MongoDB, but I don't see how that relates to the problem.
Is there something I can do to fix this on Facebook's side? Is there a real problem here, or should I just suppress these log errors? Does anyone else have a similar problem?