Confusion with mail.google.com, cURL and http://validator.w3.org/checklink

I am building a basic link to work using cURL. My application has a getHeaders () function that returns an array of HTTP headers:

function getHeaders ($ url) {

    if (function_exists ('curl_init')) {
        // create a new cURL resource
        $ ch = curl_init ();
        // set URL and other appropriate options
        $ options = array (
            CURLOPT_URL => $ url,
            CURLOPT_HEADER => true,
            CURLOPT_NOBODY => true,
            CURLOPT_FOLLOWLOCATION => 1,
            CURLOPT_RETURNTRANSFER => true);
        curl_setopt_array ($ ch, $ options);
        // grab URL and pass it to the browser
        curl_exec ($ ch);
        $ headers = curl_getinfo ($ ch);
        // close cURL resource, and free up system resources
        curl_close ($ ch);
    } else {
        echo "

Error: cURL is not installed on the web server. Unable to continue.

"; return false;} return $ headers;} print_r (getHeaders ('mail.google.com'));

This gives the following results:

Array
(
    [url] => http://mail.google.com
    [content_type] => text / html; charset = utf-8
    [http_code] => 404
    [header_size] => 338
    [request_size] => 55
    [filetime] => -1
    [ssl_verify_result] => 0
    [redirect_count] => 0
    [total_time] => 0.128
    [namelookup_time] => 0.042
    [connect_time] => 0.095
    [pretransfer_time] => 0.097
    [size_upload] => 0
    [size_download] => 0
    [speed_download] => 0
    [speed_upload] => 0
    [download_content_length] => 0
    [upload_content_length] => 0
    [starttransfer_time] => 0.128
    [redirect_time] => 0
)

, , , mail.google.com, .

URL (mail.google.com) W3C, :

Results

Links

Valid links!

List of redirects

The links below are not broken, but the document does not use the exact URL, and the links were redirected. It may be a good idea to link to the final location, for the sake of speed.

warning Line: 1 http://mail.google.com/mail/ redirected to

https://www.google.com/accounts/ServiceLogin?service=mail&passive=true&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3Dl&bsv=zpwhtygjntrz&scc=1&ltmpl=default&ltmplcache=2

Status: 302 -> 200 OK

This is a temporary redirect. Update the link if you believe it makes sense, or leave it as is. 

Anchors

Found 0 anchors.

Checked 1 document in 4.50 seconds.

, , , , mail.google.com .

cURL , 200 mail.google.com?

404 302?

+3
2

, , cURL.

http://mail.google.com:

HTTP/1.1 200 OK
Cache-Control: public, max-age=604800
Expires: Mon, 22 Jun 2009 14:58:18 GMT
Date: Mon, 15 Jun 2009 14:58:18 GMT
Refresh: 0;URL=http://mail.google.com/mail/
Content-Type: text/html; charset=ISO-8859-1
X-Content-Type-Options: nosniff
Transfer-Encoding: chunked
Server: GFE/1.3

<html>
 <head>
  <meta http-equiv="Refresh" content="0;URL=http://mail.google.com/mail/" />
 </head>
 <body>
  <script type="text/javascript" language="javascript">
  <!--
   location.replace("http://mail.google.com/mail/")
  -->
  </script>
 </body>
</html>

, Refresh ( - HTML), javascript , http://mail.google.com/mail/.

http://mail.google.com/mail/, ( , cURL) , W3C .

HTTP/1.1 302 Moved Temporarily
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Mon, 15 Jun 2009 15:07:56 GMT
Location: https://www.google.com/accounts/ServiceLogin?service=mail&passive=true&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3Dl&bsv=zpwhtygjntrz&scc=1&ltmpl=default&ltmplcache=2
Content-Type: text/html; charset=UTF-8
X-Content-Type-Options: nosniff
Transfer-Encoding: chunked
Server: GFE/1.3

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Cache-control: no-cache, no-store
Pragma: no-cache
Expires: Mon, 01-Jan-1990 00:00:00 GMT
Set-Cookie: GALX=B8zH60M78Ys;Path=/accounts;Secure
Date: Mon, 15 Jun 2009 15:07:56 GMT
X-Content-Type-Options: nosniff
Content-Length: 19939
Server: GFE/2.0

(HTML page content here, removed)

, script, Refresh.

, open_basedir PHP, CURLOPT_FOLLOWLOCATION - , , .

, , cURL:

$useragent="Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$res = curl_exec($ch);

curl_close($ch);
+4

,

mail.google.com -> mail.google.com/mail is a 404 and then a hard redirect

mail.google.com/mail -> https://www.google.com/accounts... etc is a 302 redirect
0

Source: https://habr.com/ru/post/1710409/


All Articles