I am trying to load captchas faster than I can by rendering them in a WebBrowser control and then copying and pasting the image into a PictureBox.
Why not just download the image directly into the PictureBox, which has the advantage of using less CPU and memory? This approach already works for another, more advanced captcha service called Solve Media (with Solve Media, if you request the image URL a second time, it returns an image of a fake captcha error).
But now I also need support for the ReCaptcha system, so that my bot's automation runs at a faster pace than refreshing the web page and waiting for it to render.
So I'll just post my code here. As far as I understand, I just can't imitate one of the properties of the HTTP request. I forged the User-Agent to look like a real Internet Explorer 8. I think the problem is a cookie that gets generated somehow; I can't figure out where, although I do get one cookie, which seems to come from loading a JavaScript file.
In any case, Google ReCaptcha tries to trick you with a fake captcha that you cannot read, to rub it in your face that you are not doing something right. As I understand it, when you see two black circles, it is obviously a fake.
Here is an example of a bad captcha and a good captcha:


At some point, I remember, ReCaptcha had another security feature that somehow knew whether the captcha image was being loaded from the actual domain it is hosted on. I do not know how that works, since I load everything locally, right? They seem to have removed this feature anyway. (In fact, it still exists on some websites; it seems to be disabled by default, and it is easy to fool by spoofing the Referer header.)
I'm not trying to trick anything here. I will still type the captchas in manually, but I want to type them faster than the page takes to render.
I want the captcha to be either those street numbers, or at least two words without the black circles.
Anyway, this is my current code:
    Dim newCaptcha = New Captcha
    Dim myUserAgent As String = ""
    Dim myReferer As String = "http://www.google.com/recaptcha/demo/"
    Dim outputSite As String = HTTP.HTTPGET("http://www.google.com/recaptcha/demo/", "", "", "", myUserAgent, myReferer)
    Dim recaptchaChallengeKey = GetBetween(outputSite, "http://www.google.com/recaptcha/api/challenge?k=", """")

    'Google ReCaptcha Captcha
    outputSite = HTTP.HTTPGET("http://www.google.com/recaptcha/api/challenge?k=" & recaptchaChallengeKey, "", "", "", myUserAgent, myReferer)
    'outputSite = outputSite.Replace("var RecaptchaState = {", "{""RecaptchaState"": {")
    'outputSite = outputSite.Replace("};", "}}")
    'Dim jsonDictionary As Dictionary(Of String, Object) = New JavaScriptSerializer().Deserialize(Of Dictionary(Of String, Object))(outputSite)
    Dim recaptchaChallenge = GetBetween(outputSite, "challenge : '", "',")

    outputSite = HTTP.HTTPGET("http://www.google.com/recaptcha/api/js/recaptcha.js", "", "", "", myUserAgent, myReferer)
    'This page looks useless but it seems the javascript loads this anyways, maybe this why I get bad captchas?

    If HTTP.LoadWebImageToPictureBox(newCaptcha.picCaptcha, "http://www.google.com/recaptcha/api/image?c=" & recaptchaChallenge, myUserAgent, myReferer) = False Then
        MessageBox.Show("Recaptcha Image loading failed!")
    Else
        Dim newWork As New Work
        newWork.CaptchaForm = newCaptcha
        newWork.AccountId = 1234 'ID of Accounts.
        newWork.CaptchaHash = "recaptcha_challenge_field=" & recaptchaChallenge
        newWork.CaptchaType = "ReCaptcha"
        Works.Add(newWork)
        newCaptcha.Show()
    End If
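The snippet above calls GetBetween, whose definition is not included in the post. For readers following along, here is a minimal sketch of what such a helper might look like. It assumes the helper returns the text between the first occurrence of a start marker and the next occurrence of an end marker, and an empty string when either marker is missing. The module name StringHelpers and the parameter names are hypothetical and not taken from the original code.

```vbnet
Imports System

Module StringHelpers
    ' Hypothetical reconstruction of the GetBetween helper used above.
    ' Returns the text between the first occurrence of startMarker and the
    ' next occurrence of endMarker, or "" if either marker is not found.
    Public Function GetBetween(ByVal source As String, ByVal startMarker As String, ByVal endMarker As String) As String
        If String.IsNullOrEmpty(source) Then Return ""

        Dim startIndex As Integer = source.IndexOf(startMarker, StringComparison.Ordinal)
        If startIndex < 0 Then Return ""

        ' Skip past the start marker so it is not part of the result.
        startIndex += startMarker.Length

        Dim endIndex As Integer = source.IndexOf(endMarker, startIndex, StringComparison.Ordinal)
        If endIndex < 0 Then Return ""

        Return source.Substring(startIndex, endIndex - startIndex)
    End Function
End Module
```

Returning an empty string instead of throwing matches how the rest of the code treats failures (HTTPGET also returns "" on error), but it does mean the caller has to check for an empty result before building the next URL.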
Here is the HTTP class that I am using:
    Imports System.Collections.Generic
    Imports System.Linq
    Imports System.Text
    Imports System.Net
    Imports System.IO

    Public Class HTTP
        Public StoredCookies As New CookieContainer

        Public Function HTTPGET(ByVal url As String, ByVal proxyname As String, ByVal proxylogin As String, ByVal proxypassword As String, ByVal userAgent As String, ByVal referer As String) As String
            Dim resp As HttpWebResponse
            Dim req As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
            If userAgent = "" Then
                userAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
            End If
            req.UserAgent = userAgent
            req.Referer = referer
            req.AllowAutoRedirect = True
            req.ReadWriteTimeout = 5000
            req.CookieContainer = StoredCookies
            req.Headers.Set("Accept-Language", "en-us")
            req.KeepAlive = True
            req.Method = "GET"
            Dim stream_in As StreamReader
            If proxyname <> "" Then
                Dim proxyIP As String = proxyname.Split(New Char() {":"})(0)
                Dim proxyPORT As Integer = CInt(proxyname.Split(New Char() {":"})(1))
                Dim proxy As New WebProxy(proxyIP, proxyPORT)
                'if proxylogin is an empty string then don't use proxy credentials (open proxy)
                If proxylogin <> "" Then
                    proxy.Credentials = New NetworkCredential(proxylogin, proxypassword)
                End If
                req.Proxy = proxy
            End If
            Dim response As String = ""
            Try
                resp = DirectCast(req.GetResponse(), HttpWebResponse)
                StoredCookies.Add(resp.Cookies)
                stream_in = New StreamReader(resp.GetResponseStream())
                response = stream_in.ReadToEnd()
                stream_in.Close()
            Catch ex As Exception
            End Try
            Return response
        End Function

        Public Function LoadWebImageToPictureBox(ByVal pb As PictureBox, ByVal ImageURL As String, ByVal userAgent As String, ByVal referer As String) As Boolean
            Dim bAns As Boolean
            Try
                Dim resp As WebResponse
                Dim req As HttpWebRequest
                Dim sURL As String = Trim(ImageURL)
                If Not sURL.ToLower().StartsWith("http://") Then sURL = "http://" & sURL
                req = DirectCast(WebRequest.Create(sURL), HttpWebRequest)
                If userAgent = "" Then
                    userAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
                End If
                req.UserAgent = userAgent
                req.Referer = referer
                req.AllowAutoRedirect = True
                req.ReadWriteTimeout = 5000
                req.CookieContainer = StoredCookies
                req.Headers.Set("Accept-Language", "en-us")
                req.KeepAlive = True
                req.Method = "GET"
                resp = req.GetResponse()
                If Not resp Is Nothing Then
                    Dim remoteStream As Stream = resp.GetResponseStream()
                    Dim objImage As New MemoryStream
                    Dim bytesProcessed As Integer = 0
                    Dim myBuffer As Byte()
                    ReDim myBuffer(1024)
                    Dim bytesRead As Integer
                    bytesRead = remoteStream.Read(myBuffer, 0, 1024)
                    Do While (bytesRead > 0)
                        objImage.Write(myBuffer, 0, bytesRead)
                        bytesProcessed += bytesRead
                        bytesRead = remoteStream.Read(myBuffer, 0, 1024)
                    Loop
                    pb.Image = Image.FromStream(objImage)
                    bAns = True
                    objImage.Close()
                End If
            Catch ex As Exception
                bAns = False
            End Try
            Return bAns
        End Function
    End Class
EDIT: I found out that this comes from Google's obfuscated client-side JavaScript encryption system at
http://www.google.com/js/th/1lOyLe_nzkTfeM2GpTkE65M1Lr8y0MC8hybXoEd-x1s.js
I still want to be able to defeat it without using a heavy web browser, maybe with some quick, lightweight JavaScript evaluation? It makes no sense to reverse-engineer it and port it to VB.NET, because as soon as I do, they can change a few variables or the encryption entirely, and all my work will have been for nothing, so I want something smarter. At the moment, I don't even know how the URL is generated; it appears to be static, and it is probably a real file rather than one generated on the fly.
The challenge page that supplies the challenge for the image is just a decoy call. That challenge is then replaced (maybe encrypted?) using the variables t1, t2, t3. It seems this encryption is not applied every time; if you pass it, you get access to pretty much what I'm trying to do. My code works, but it stops working at very random intervals, and I want something more solid that I can leave unattended for several weeks.