Google App Engine urlfetch not working in production

I am using Google's urlfetch function to log in remotely to another web service. Everything works fine in development, but when I move to production, the login procedure fails. Do you have any suggestions on how to debug urlfetch in production?

I send cookies and other headers with my urlfetch requests (I set the cookies manually in the header). One of the cookies is a session cookie.

There are no errors or exceptions. In production, when I post the login request, a session cookie is returned, but when I then request a page using that session cookie, it is ignored and I am asked for the login information again. In development, once I have the session cookie, I can access the internal pages just fine. I thought the problem was in how the cookies were being stored, but they look right, as the requests are almost identical.

This is how I call it:

fetchresp = urlfetch.fetch(url=req.get_full_url(), payload=req.get_data(), method=method, headers=all_headers, allow_truncated=False, follow_redirects=False, deadline=10)

Here are some guesses about the cause:

  • The distributed nature of Google's urlfetch implementation is messing something up.
  • In production, headers are sent in a different order than in development, possibly confusing the server.
  • Some Google servers are blacklisted by the destination server.

Here are some hypotheses that I have ruled out:

  • Google's caching is too aggressive. But I still get the problem after disabling the cache with the Cache-Control: no-store header.
  • Google's urlfetch is too fast for the destination server. But I still get the problem after inserting delays between calls.
  • Google adds some data to the User-Agent header. But I added the same data to the header in development, and I had no problem.

What other differences exist between urlfetch in production and urlfetch in development? Do you have any ideas for debugging this?

UPDATE 2

(The first update was merged into the text above.) I don't know if it was something I did (maybe adding the delays or disabling the cache as mentioned above), but production now works in about 50% of cases. It definitely looks like a race condition. Unfortunately, I have no idea whether the problem is in my code, Google's code, or the destination server.

+4
3 answers

As already mentioned, the key differences between dev and prod are the originating IP address and how some of the request headers are handled. See the urlfetch documentation for a list of restricted headers. I don't know if this is documented, but in prod your app ID is appended to the end of your User-Agent. I once had a problem where requests from prod were being detected as a search robot, because my app ID contained the string "bot".
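
To check whether any of the headers you pass in all_headers are on the restricted-headers list, a small helper along these lines could be run before the fetch. This is only a sketch: the names in RESTRICTED_HEADERS are written from my recollection of the urlfetch documentation and should be verified against the current list, and find_restricted_headers is a hypothetical name.

```python
# Header names that urlfetch does not let the application set.
# ASSUMPTION: this list is written from memory of the docs; verify it.
RESTRICTED_HEADERS = frozenset([
    "content-length", "host", "vary", "via",
    "x-forwarded-for", "x-proxyuser-ip",
])


def find_restricted_headers(headers):
    """Return the names in `headers` that appear in RESTRICTED_HEADERS."""
    return sorted(name for name in headers
                  if name.lower() in RESTRICTED_HEADERS)


if __name__ == "__main__":
    # Example header dict; the values are made up for illustration.
    all_headers = {"Cookie": "SESSIONID=abc123", "Host": "example.com"}
    print(find_restricted_headers(all_headers))
```

Any name it reports is a header that prod may silently replace or drop, which would make the two environments send different requests.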

You mentioned that you set cookies manually, including the session cookie. Does this mean that you created a session in dev and are now trying to reuse it in prod? Is it possible that the remote server records the source IP address that established the session and requires subsequent requests to come from the same IP address?
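
If the session is created and reused within the same environment, it is still worth confirming that the cookie sent on the second request is exactly the one the login response returned. A minimal sketch using only the standard library, assuming the login response carries a single Set-Cookie header; the cookie name SESSIONID and the helper name are hypothetical:

```python
try:
    from http.cookies import SimpleCookie  # Python 3
except ImportError:
    from Cookie import SimpleCookie  # Python 2 (older App Engine runtimes)


def cookie_header_from_set_cookie(set_cookie_value):
    """Turn one Set-Cookie value into a value for the Cookie request header.

    Attributes such as Path or HttpOnly are dropped; only name=value pairs
    are kept, which is what a client sends back.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return "; ".join("%s=%s" % (name, morsel.value)
                     for name, morsel in sorted(jar.items()))


if __name__ == "__main__":
    # Made-up Set-Cookie value for illustration.
    print(cookie_header_from_set_cookie("SESSIONID=abc123; Path=/; HttpOnly"))
```

Logging this value in both dev and prod and comparing it with what you actually put in all_headers would show whether the cookie is being altered before it is sent.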

You said that it does not work, but that you do not get an exception. What exactly does this mean? Are you getting an HTTP 200 and an empty response body? Another HTTP status? Your best bet is to contact the owners of the remote service and see if they can tell you more specifically what is wrong with your request. Everything else is just speculation.
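
One way to answer those questions is to log the status code, any Location header, and the body length for every fetch. The sketch below takes those three values as plain arguments so it can be tried anywhere; describe_response is a hypothetical helper name.

```python
import logging


def describe_response(status_code, headers, content):
    """Summarise a fetch result in one line for the logs."""
    # Look the header up case-insensitively; header dicts vary in casing.
    lowered = dict((k.lower(), v) for k, v in headers.items())
    location = lowered.get("location", "-")
    return "status=%d location=%s body_bytes=%d" % (
        status_code, location, len(content or b""))


if __name__ == "__main__":
    # With follow_redirects=False, a redirect to the login page shows up
    # as a 3xx status plus a Location header instead of a 200.
    logging.basicConfig(level=logging.INFO)
    logging.info(describe_response(302, {"Location": "/login"}, b""))
```

In the handler this would be fed from the fetch result, for example fetchresp.status_code, fetchresp.headers and fetchresp.content, assuming the headers object behaves like a dict.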

+2

Check the server logs to see whether GAE is stripping any headers. I have noticed that GAE (I think I saw this on the dev server) will drop headers that it doesn't like.

Depending on the web service you are calling, it may also behave differently when called from GAE than when called from your local computer.
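
If the logs do not show it, another way to see what was dropped is to point the same fetch at an endpoint you control that echoes back the headers it received, then compare the two sets. This sketch covers the comparison step only; the echo endpoint is assumed to exist and diff_headers is a hypothetical helper.

```python
def diff_headers(sent, received):
    """Compare two header dicts case-insensitively.

    Returns (missing, changed): names that were sent but never arrived,
    and names whose value differs on arrival.
    """
    sent_l = dict((k.lower(), v) for k, v in sent.items())
    recv_l = dict((k.lower(), v) for k, v in received.items())
    missing = sorted(k for k in sent_l if k not in recv_l)
    changed = sorted(k for k in sent_l
                     if k in recv_l and sent_l[k] != recv_l[k])
    return missing, changed


if __name__ == "__main__":
    # Made-up values: one header dropped, one rewritten in transit.
    sent = {"User-Agent": "my-client", "Via": "1.1 me", "Cookie": "a=1"}
    received = {"user-agent": "my-client appid-suffix", "cookie": "a=1"}
    print(diff_headers(sent, received))
```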

+1

I ran into a similar problem while building a webapp. Looking at the urlfetch documentation, it turns out that the maximum deadline for a fetch call is 60 seconds, but the default is 5 seconds.

5 seconds was long enough for the requests to complete on my local machine, but on GAE the fetch finished within 5 seconds in only about 20% of cases.

I added the deadline=60 parameter and it has been working fine ever since.
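
Here is a sketch of that change, written so it can be tried outside App Engine: the fetch function is passed in, and on App Engine it would be urlfetch.fetch. The wrapper name timed_fetch and the returned tuple are my own choices for illustration, not part of the urlfetch API.

```python
import time


def timed_fetch(fetch, url, deadline=60, **kwargs):
    """Call `fetch` with an explicit deadline and report how long it took.

    `fetch` is any callable accepting url= and deadline= keyword
    arguments; on App Engine that would be urlfetch.fetch.
    """
    started = time.time()
    response = fetch(url=url, deadline=deadline, **kwargs)
    elapsed = time.time() - started
    return response, elapsed


if __name__ == "__main__":
    def fake_fetch(url, deadline, **kwargs):
        # Stand-in for urlfetch.fetch so the sketch runs anywhere.
        return {"url": url, "deadline": deadline}

    response, elapsed = timed_fetch(fake_fetch, "http://example.com/")
    print(response["deadline"])
```

Logging the elapsed time for each call makes it easy to see how close prod requests come to the deadline you have set.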

Hope this helps others!

+1

Source: https://habr.com/ru/post/1383329/
