Detect if user is human without captcha or useragent

I have a website where I check email encryption for users, and I'm trying to figure out if there is a way to detect whether the user is a human or a bot. I looked into $_SESSION in PHP, but it's easy to get around. I'm also not interested in captchas, user agents, or logins. Any idea what I need?

There are other questions very similar to this on SO, but I could not find a direct answer...

Any help would be greatly appreciated, thanks everyone!

+6
6 answers

This is a tough problem, and no solution I know of will be perfect from both a bot-protection and a usability point of view. If your attacker is really determined to use a bot on your site, they will most likely be able to. And if you go far enough to make it impractical for a computer program to use anything on your site, it's unlikely any human will want to use it either; but you can strike a good balance.

My perspective on this is partly that of a web developer, but mostly that of someone who has written many web-crawling programs for clients around the world. Not all bots have malicious intentions; some are used to automate tedious work, such as submitting forms to fill a database of doctors' addresses or analyzing stock-market data. If your site is well designed in terms of usability, there should be little need for a bot that "makes things easier" for the user, but there are always special needs you cannot plan for.

Of course, there are also people with bad intentions, and you definitely want to protect your site from them as much as possible. There is practically no site that cannot be automated in some way. Most sites are not difficult to automate at all, but here are some ideas, drawn from my own point of view, from other answers and comments on this page, and from my experience writing (non-malicious) bots.

Types of bots

First, I should mention that I would put bots into two different categories:

  • General-purpose crawlers, indexers, or bots
  • Custom bots written specifically for your site to perform particular tasks

Typically, a general-purpose bot is something like a search-engine indexer, or perhaps a hacker's script that looks for a form to submit, runs a dictionary attack against URLs looking for a vulnerable one, or something along those lines. They can also attack "engine" sites, such as WordPress blogs. If your site is properly secured with good passwords and so on, these usually won't pose much of a risk to you (unless you do use WordPress, in which case you need to keep up with the latest versions and security updates).

Custom, special-purpose bots are the kind I have written. A bot designed specifically for your site can be made to behave very much like a human visitor, including inserting time delays between form submissions, setting cookies, and so on, which makes it hard to detect. This is mostly the kind I'm talking about in the rest of this answer.

CAPTCHAs

Captchas are probably the most common approach to making sure a user is humanoid, and they are usually difficult to get past automatically. However, if you only require a captcha as a one-time thing, for example when the user creates an account, it's easy for a human to get past it and then hand their shiny new account credentials to a bot that automates use of the system.

I remember reading a few years ago about a rather elaborate system for "cracking" captchas on a popular gaming site: a separate site was set up that downloaded captchas from the gaming site and presented them to its own users, essentially crowd-sourcing the solving. Users on the second site got some kind of reward for each correct captcha, and the site's owners were able to automate tasks on the gaming site using the answers gathered from the crowd.

Generally, using a good captcha system can only guarantee one thing: that somewhere, a human typed in the captcha text. What happens before and after that depends on how often you require the check, and on how determined the person writing the bot is.
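The answer doesn't prescribe a particular captcha product, but as an illustration, a minimal server-side verification with Google reCAPTCHA v2 in PHP could look like the sketch below. The secret key is a placeholder, and g-recaptcha-response is the field the reCAPTCHA widget posts; treat this as a sketch, not a drop-in implementation.

    <?php
    // Minimal sketch: verify a reCAPTCHA v2 response on the server side.
    // Assumes the form included Google's widget, which posts g-recaptcha-response.
    function is_captcha_valid(string $secretKey): bool
    {
        $token = $_POST['g-recaptcha-response'] ?? '';
        if ($token === '') {
            return false; // the widget was never solved
        }

        $response = file_get_contents(
            'https://www.google.com/recaptcha/api/siteverify',
            false,
            stream_context_create([
                'http' => [
                    'method'  => 'POST',
                    'header'  => 'Content-Type: application/x-www-form-urlencoded',
                    'content' => http_build_query([
                        'secret'   => $secretKey,   // your secret key (placeholder)
                        'response' => $token,
                        'remoteip' => $_SERVER['REMOTE_ADDR'],
                    ]),
                ],
            ])
        );

        if ($response === false) {
            return false; // verification service unreachable
        }
        $result = json_decode($response, true);
        return !empty($result['success']);
    }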

Cell Phone / Credit Card Check

If you do not want to use captchas, this type of check is likely to be quite effective against all but the most determined bot writers. Although (as with captchas) it will not stop an already-verified user from building and using a bot, you can at least be sure a human created the account, and if it is abused, you can block that phone or credit-card number from being used to create another account.

Sites like Facebook and Craigslist have started using cell-phone verification to prevent spam from bots. For example, to create apps on Facebook you must have a phone number on record, confirmed via a text message or an automated phone call. Unless your attacker has access to a lot of active phone numbers, this can be an effective way to verify that a human created the account and that each person creates only a limited number of accounts (one, for most people).

Credit cards can likewise be used to confirm that a human is performing an action and to limit the number of accounts a single person can create.

Other [less effective] solutions

Log analysis

Analyzing your request logs will often show bots performing the same actions over and over, or sometimes running dictionary attacks looking for holes in your site's configuration. So your logs can tell you, after the fact, whether a request was made by a bot or a human. That may or may not be useful to you, but if the requests came from an account verified by cell phone or credit card, you can lock the account associated with the abusive requests to prevent further abuse.
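As an illustration of the idea, here is a rough PHP sketch that scans an Apache-style access log for a single IP repeatedly hitting the same URL. The log path and threshold are assumptions you would adjust for your own setup.

    <?php
    // Rough sketch: flag IPs that hammer the same URL in an Apache-style access log.
    // The log path and threshold are assumptions; adjust them to your own setup.
    $logFile   = '/var/log/apache2/access.log';
    $threshold = 200; // this many repeats of one URL from one IP looks bot-like

    $counts = [];
    $lines  = file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
    foreach ($lines as $line) {
        // Combined log format: the client IP is the first field, and the request
        // line ("GET /path HTTP/1.1") is the first quoted block.
        if (preg_match('/^(\S+) .*?"(?:GET|POST) (\S+)/', $line, $m)) {
            $key = $m[1] . ' ' . $m[2]; // "ip url"
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }

    arsort($counts);
    foreach ($counts as $key => $n) {
        if ($n < $threshold) {
            break;
        }
        echo "Possible bot: $key requested $n times\n";
    }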

Math problems / other questions

Math problems and other written questions can often be answered with a quick Google or Wolfram Alpha search, which a bot can automate. Some questions are harder than others, but the big search companies are working against you here: as their engines get better at understanding questions like these, this becomes a less and less viable way to verify that a user is human.

Hidden form fields

Some sites use a mechanism in which extra parameters, such as the mouse coordinates at the moment the submit button was clicked, are added to the form via JavaScript. These are mostly very easy to fake, but if you see a whole bunch of requests in your logs using the same coordinates, they probably came from a bot (though a smart bot can just as easily send different coordinates with each request).
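For illustration only, a server-side check might look like the sketch below. The mouse_x and mouse_y field names are hypothetical and would be populated by a small client-side script when the submit button is clicked.

    <?php
    // Sketch of a server-side check for JavaScript-added submit parameters.
    // mouse_x / mouse_y are hypothetical field names that a small client-side
    // script would fill in when the submit button is clicked.
    session_start();

    $x = isset($_POST['mouse_x']) ? (int) $_POST['mouse_x'] : null;
    $y = isset($_POST['mouse_y']) ? (int) $_POST['mouse_y'] : null;

    if ($x === null || $y === null) {
        // The client never ran the JavaScript (or stripped the fields).
        http_response_code(400);
        exit('Form rejected: missing client-side data.');
    }

    // Crude red flag: exactly the same coordinates on every submission this session.
    if (isset($_SESSION['last_coords']) && $_SESSION['last_coords'] === [$x, $y]) {
        error_log("Repeated submit coordinates ($x, $y) from {$_SERVER['REMOTE_ADDR']}");
    }
    $_SESSION['last_coords'] = [$x, $y];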

Javascript Cookies

Since many bots don't download or run JavaScript, cookies set via JavaScript rather than with the HTTP Set-Cookie header will make life a bit harder for most would-be bot makers. But it isn't hard for a bot to set the cookie manually once its developer figures out how to produce the same value the JavaScript generates.
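A minimal sketch of the idea in PHP, assuming a cookie named js_check and a per-session token (both names are arbitrary choices for illustration):

    <?php
    // Minimal sketch: a per-session token is echoed into an inline script, which
    // sets a cookie only if JavaScript actually runs. The js_check cookie name
    // and the token scheme are arbitrary.
    session_start();

    if (empty($_SESSION['js_token'])) {
        $_SESSION['js_token'] = bin2hex(random_bytes(16));
    }

    if ($_SERVER['REQUEST_METHOD'] === 'POST') {
        $ok = isset($_COOKIE['js_check'])
            && hash_equals($_SESSION['js_token'], $_COOKIE['js_check']);
        exit($ok ? 'Looks like a JavaScript-capable client.' : 'No JS cookie - possibly a bot.');
    }
    ?>
    <script>
      document.cookie = "js_check=<?php echo $_SESSION['js_token']; ?>; path=/";
    </script>
    <form method="post"><button type="submit">Submit</button></form>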

IP address

An IP address alone will not tell you whether a user is human. Some sites use IP addresses to try to detect bots, and it is true that a naive bot may show up as a flood of requests from a single IP address. But IP addresses are cheap: with Amazon EC2 or similar cloud services you can spin up a server and use it as a proxy, or spin up 10 or 100 of them and rotate through them as proxies.

UserAgent String

The user-agent string is so easy to manipulate in a crawler that you cannot count on it to flag a bot that is trying not to be detected. It is trivial to set the user agent to the same string one of the major browsers sends, and a bot can even rotate between several browsers' strings.

Complex markup

The most difficult site I ever wrote a bot for consisted of frames inside frames inside frames... about 10 layers deep on each page, where each frame's src pointed at the same database controller but with different parameters determining which actions to perform. The order of the actions mattered, so it was hard to keep track of everything going on, but eventually (after a week or so) my bot worked. So while this may deter some bot writers, it won't be useful against all of them, and it will most likely just make your own site harder to maintain.

Disclaimer and conclusion

Not all bots are bad. Most of the crawlers/bots I have written were for users who wanted to automate processes on a site, such as data entry that was too tedious to do by hand. So make tedious tasks easy! Or, better, provide an API for your users. Probably one of the easiest ways to keep someone from writing a bot against your site is to give them API access instead. If you provide an API, it is much less likely anyone will go to the trouble of building a crawler for it, and you can use API keys to control how heavily each consumer uses it.
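As a rough sketch of the API-key idea in PHP (the header name, key list, and quota numbers below are made-up placeholders, not a real scheme):

    <?php
    // Rough sketch of API-key gating with a per-key daily quota. The key list,
    // header name, and quota numbers are placeholders for illustration only.
    $keys = [
        'abc123' => ['owner' => 'example-user', 'daily_quota' => 1000],
    ];

    $apiKey = $_SERVER['HTTP_X_API_KEY'] ?? ''; // sent as an X-Api-Key request header
    if (!isset($keys[$apiKey])) {
        http_response_code(401);
        exit(json_encode(['error' => 'missing or unknown API key']));
    }

    // Naive per-day usage counter kept in a temp file per key and date.
    $usageFile = sys_get_temp_dir() . '/usage_' . md5($apiKey) . '_' . date('Ymd');
    $used = (int) @file_get_contents($usageFile);
    if ($used >= $keys[$apiKey]['daily_quota']) {
        http_response_code(429);
        exit(json_encode(['error' => 'daily quota exceeded']));
    }
    file_put_contents($usageFile, $used + 1);

    echo json_encode(['ok' => true, 'calls_used_today' => $used + 1]);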

For keeping spammers out, the most effective approach is probably a combination of captchas and account verification via cell-phone numbers or credit cards. Add some log analysis to identify and shut down any malicious custom bots, and you should be in pretty good shape.

+15

I have seen (and used) a simple arithmetic problem with the numbers written out as words, e.g.:

Please answer the following question to prove that you are a person: "What is two plus four?"

and similar simple questions that require reading comprehension:

"What is man's best friend?"

You can provide an endless stream of questions, in case the user trying to get access isn't familiar with a particular subject, and they are accessible to all readers, etc.
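A minimal PHP sketch of this approach might look like the following; the field and session key names are arbitrary:

    <?php
    // Sketch of the spelled-out arithmetic idea: ask "What is two plus four?",
    // keep the numeric answer in the session, and compare it on submission.
    session_start();

    $words = ['zero', 'one', 'two', 'three', 'four',
              'five', 'six', 'seven', 'eight', 'nine'];

    if ($_SERVER['REQUEST_METHOD'] === 'POST') {
        $given = trim($_POST['human_answer'] ?? '');
        echo ((int) $given === ($_SESSION['human_check'] ?? -1))
            ? 'Answer accepted.'
            : 'Wrong answer - please try again.';
    }

    // Generate the next question.
    $a = random_int(0, 9);
    $b = random_int(0, 9);
    $_SESSION['human_check'] = $a + $b;
    ?>
    <form method="post">
      <label>What is <?php echo $words[$a]; ?> plus <?php echo $words[$b]; ?>?</label>
      <input type="text" name="human_answer">
      <button type="submit">Submit</button>
    </form>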

+2

There is a reason companies use captchas or logins. Ugly as captchas are as a solution, they are currently the best (most accurate, least disruptive to users) way to weed out bots. If a login won't work for you, I'm afraid a captcha is the only realistic option.

+2

My favorite way is to present the "user" with a picture of a cat or a dog and ask, "Is this a cat or a dog?" No human ever gets that wrong; a computer gets it right perhaps 60% of the time (so you need to run it several times). There is a project that will give you batches of cat and dog pictures - plus, all of the animals are up for adoption, so if the user likes a pet, they can have it.

It is a Microsoft corporate project, which puts me in a state of cognitive dissonance, as if I had found out that Harry Reid likes this kind of music or that George W. Bush smokes pot. Oh, wait...

+2

If users are filling out a form, honeypot fields are easy to implement and can be quite effective, though nothing is perfect. Create one or more hidden fields in the form, and if they contain anything when the form is submitted, reject the submission. Spambots will usually try to fill in everything.

You do need to be aware of accessibility. Hidden fields will probably not be filled in by anyone using a standard browser (where the field is not visible), but users of screen readers may still be presented with the field. Remember to label it properly so those users do not fill it out, perhaps with something like "Please help us prevent spam by leaving this field blank." Also, if you do reject the form, be sure to reject it with helpful error messages in case a real person filled it in.
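A minimal PHP sketch of a honeypot field, using an arbitrary field name (contact_website) and a simple off-screen style so sighted users never see it while screen-reader users still get the "leave this blank" label:

    <?php
    // Honeypot sketch: contact_website is an arbitrary field name; any content
    // in it means the submission is rejected as likely spam.
    if ($_SERVER['REQUEST_METHOD'] === 'POST') {
        if (!empty($_POST['contact_website'])) {
            http_response_code(400);
            exit('Your submission was rejected. Please leave the field marked "leave this blank" empty and try again.');
        }
        // ...process the legitimate submission here...
    }
    ?>
    <form method="post">
      <input type="text" name="email" placeholder="Your email">
      <!-- Positioned off screen so sighted users never see it; screen-reader
           users still get the label telling them to leave it blank. -->
      <div style="position:absolute; left:-9999px;">
        <label for="contact_website">Please help us prevent spam by leaving this field blank.</label>
        <input type="text" id="contact_website" name="contact_website" tabindex="-1" autocomplete="off">
      </div>
      <button type="submit">Send</button>
    </form>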

+1

I suggest getting the Growmap Anti Spambot WordPress plugin and seeing what code you can take from it, or simply using the same technique. I have found this plugin very effective at limiting automated spam on my WordPress sites, and I have started adapting the same technique for my ASP.NET sites.

The only thing it can't cope with is human spammers ticking the box themselves.
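This is not the plugin's actual code, but a rough PHP sketch of the underlying technique as I understand it: JavaScript inserts a confirmation checkbox (with a per-session field name), and the server rejects submissions where it is missing or unchecked.

    <?php
    // Rough sketch of the Growmap-style technique (not the plugin's own code):
    // JavaScript inserts a confirmation checkbox that plain spambots never see
    // or tick, and the server rejects submissions where it is missing.
    session_start();

    if (empty($_SESSION['gasp_field'])) {
        // Randomize the field name per session so bots can't simply hard-code it.
        $_SESSION['gasp_field'] = 'confirm_' . bin2hex(random_bytes(4));
    }
    $field = $_SESSION['gasp_field'];

    if ($_SERVER['REQUEST_METHOD'] === 'POST') {
        if (empty($_POST[$field])) {
            http_response_code(400);
            exit('Please confirm you are not a spammer.');
        }
        // ...handle the genuine submission...
    }
    ?>
    <form method="post" id="comment-form">
      <textarea name="comment"></textarea>
      <button type="submit">Post</button>
    </form>
    <script>
      // The checkbox only exists for clients that actually execute JavaScript.
      var form = document.getElementById('comment-form');
      var label = document.createElement('label');
      label.innerHTML = 'Confirm you are NOT a spammer ' +
        '<input type="checkbox" name="<?php echo $field; ?>" value="1">';
      form.insertBefore(label, form.lastElementChild);
    </script>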

0

Source: https://habr.com/ru/post/895624/

