PHP application variable ... maybe?

I went to a PHP interview and was asked to write a piece of code that detects whether visitors are bots that crawl the site and steal content.

So I implemented a few lines of code to determine whether the site is being refreshed or visited too quickly or too often, using a session variable to store the timestamp of the last visit.

I was told that session variables can be manipulated using cookies, etc., so I wonder if there is an application variable that I can use to store timestamp information against the visitor's IP address, for example $_SERVER['REMOTE_ADDR']?

I know that I can write the data to a file, but that is not a great fit for a high-traffic website.

Regards,

James

+4
6 answers

I was told that session variables can be manipulated using cookies, etc.,

To be clear, clients cannot edit session variables at will. However, they can delete or change their PHPSESSID cookie, which gives them a different session. Superglobals (e.g. $_SERVER) are not persistent, so any changes you make to them will not carry over to the next page load.

The best way to detect crawlers is to store the IP address, user agent, and timestamp of all page loads in the database. Overhead is negligible.
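
As a rough illustration of that idea, here is a minimal sketch that records each page load and counts recent hits per IP. The table name, column names, and function names are made up for this example, and PDO with an in-memory SQLite database is used only so the snippet runs on its own; a real site would point the DSN at a persistent database.

```php
<?php
// Hypothetical schema and helper names, chosen for illustration only.

function openHitStore(string $dsn = 'sqlite::memory:'): PDO
{
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec(
        'CREATE TABLE IF NOT EXISTS page_hits (
            ip TEXT NOT NULL,
            user_agent TEXT NOT NULL,
            hit_at INTEGER NOT NULL
        )'
    );
    // Index so the per-IP lookup stays cheap as the table grows.
    $db->exec('CREATE INDEX IF NOT EXISTS idx_hits_ip_time ON page_hits (ip, hit_at)');
    return $db;
}

// Store one page load: IP address, user agent, and Unix timestamp.
function recordHit(PDO $db, string $ip, string $userAgent, int $now): void
{
    $stmt = $db->prepare('INSERT INTO page_hits (ip, user_agent, hit_at) VALUES (?, ?, ?)');
    $stmt->execute([$ip, $userAgent, $now]);
}

// Count how many page loads this IP made in the last $windowSeconds seconds.
function countRecentHits(PDO $db, string $ip, int $now, int $windowSeconds): int
{
    $stmt = $db->prepare('SELECT COUNT(*) FROM page_hits WHERE ip = ? AND hit_at > ?');
    $stmt->execute([$ip, $now - $windowSeconds]);
    return (int) $stmt->fetchColumn();
}
```

On a page you would call something like recordHit($db, $_SERVER['REMOTE_ADDR'], $_SERVER['HTTP_USER_AGENT'] ?? '', time()) and then compare countRecentHits() against whatever threshold you consider "too fast".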

+4

In a word, no. Your options are cookies, sessions (i.e. server-side data keyed by a cookie), and storage (file or database).

+1

Your best bet for this might be an after-the-fact analysis of the logs. This will not stop content theft on the fly, but it makes it much easier to find patterns of abuse and block those IP addresses from future access.

+1

You will need to store the IP address and timestamp on the server side. It is unlikely that the bot will send cookies, and even a URL-based session is not reliable.

The overhead of a file should not be too bad, unless the logging itself is what is slowing you down. You can use SQLite or similar, possibly stored on a memory-based file system for a slight speed increase. Or you can go with something like memcached. If you need the data to persist, use MySQL. The overhead of a full-blown database is practically nothing compared to what PHP itself costs to do almost nothing.
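
For the plain-file option, a minimal sketch could look like the following. The function name and the one-file-per-IP layout are assumptions made for this example, not an established pattern; pointing $dir at a memory-based file system is one way to get the speed-up mentioned above.

```php
<?php
// Hypothetical helper: keeps the last-visit timestamp for each IP in its own small file
// and returns the number of seconds since that IP's previous visit (null on first visit).

function secondsSinceLastVisit(string $dir, string $ip, int $now): ?int
{
    // Hash the IP so the file name is always safe, including for IPv6 addresses.
    $path = $dir . '/' . sha1($ip) . '.ts';

    $fh = fopen($path, 'c+'); // create if missing, do not truncate
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        flock($fh, LOCK_EX); // serialize concurrent requests from the same IP
        $previous = trim((string) stream_get_contents($fh));

        // Overwrite the stored timestamp with the current one.
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, (string) $now);
        fflush($fh);
    } finally {
        flock($fh, LOCK_UN);
        fclose($fh);
    }

    return $previous === '' ? null : $now - (int) $previous;
}
```

A caller would pass $_SERVER['REMOTE_ADDR'] and time(), then treat a very small return value as a sign of a client that is requesting pages too quickly.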

If you really want to do something like this with sessions, display a user agreement page whenever the session does not contain a specific "I Agree" variable. That way, if the bot does not send a valid session back, all it ever receives is the user agreement. If it does, you can track it using session variables.
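
A minimal sketch of that gate, assuming a session key named 'agreed' and a form field named 'accept_terms' (both invented for this example):

```php
<?php
// Hypothetical names: the 'agreed' and 'page_views' session keys and the
// 'accept_terms' form field are illustrative, not part of any framework.

function resolveView(array &$session, array $post): string
{
    // The visitor has just submitted the agreement form.
    if (!empty($post['accept_terms'])) {
        $session['agreed'] = true;
    }

    // A client that never sends the session back never gets past this point.
    if (empty($session['agreed'])) {
        return 'agreement';
    }

    // From here on the session is known to round-trip, so it can be used for tracking.
    $session['page_views'] = ($session['page_views'] ?? 0) + 1;
    return 'content';
}

// Typical use in a page script:
//   session_start();
//   $view = resolveView($_SESSION, $_POST);
```

Taking the session and POST data as parameters, rather than reading the superglobals directly, just keeps the function easy to try out in isolation.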

Keep in mind that a session-based solution is not required, since you do not need to remember the client's state between requests, and sessions will carry as much overhead as most custom alternatives, if not more.

Regarding the assertion that session variables can be manipulated using cookies, this is not entirely true. However, if you are stupid enough to leave register_globals enabled and you read a global variable, you cannot know whether it came from the session, a cookie, the query string, the environment, or was previously undefined. This is all moot if you explicitly access it through $_SESSION, of course.

+1

Bots can simply not store cookies, in which case the session variable never carries over. The best option would be to use some kind of external database or storage system, such as a C++ socket program that just stores IPs and compares the latest connections.

0

Do not expect to defeat them on timing alone. I did something very similar to fight contact-form spam, and some bots waited as long as people do before taking the next action.

I would look more at IP addresses that only load the HTML document and ignore things like the favicon, CSS style sheets, etc. If you set up your CSS files to be parsed by PHP, you can add some logic there to mark that IP as looking legitimate; just be careful with caching.
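
As a sketch of that heuristic, the snippet below takes a list of logged requests and returns the IPs that never asked for a static asset. The log format, the extension list, and the function names are assumptions made for this example.

```php
<?php
// Hypothetical log format: each entry is ['ip' => string, 'path' => string].

// True when the requested path looks like a static asset (extension list is illustrative).
function isAssetPath(string $path): bool
{
    $file = (string) parse_url($path, PHP_URL_PATH); // drop any query string
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return in_array($ext, ['css', 'js', 'ico', 'png', 'jpg', 'gif'], true);
}

// Return the IPs that requested documents but never any asset.
function ipsLoadingOnlyDocuments(array $requests): array
{
    $sawAsset = [];
    foreach ($requests as $r) {
        $ip = $r['ip'];
        $sawAsset[$ip] = ($sawAsset[$ip] ?? false) || isAssetPath($r['path']);
    }
    return array_keys(array_filter($sawAsset, fn (bool $seen): bool => !$seen));
}
```

Because of the caching caveat, a returning browser may legitimately skip the assets, so a result from this check is better treated as one signal among several than as proof of a bot.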

Are you also taking steps to make sure you are not blocking legitimate bots like Googlebot?

0

Source: https://habr.com/ru/post/1304719/

