Getting the final URLs of shortened URLs (e.g. bit.ly) using PHP

[Updated below]
Hello.

Start with short URLs:
Imagine you have a collection of 5 short URLs (e.g. from http://bit.ly ) in a PHP array, for example:

$shortUrlArray = array("http://bit.ly/123", "http://bit.ly/123", "http://bit.ly/123", "http://bit.ly/123", "http://bit.ly/123"); 

Finish with final, redirected URLs:
How can I get the final URLs of these short URLs using PHP? Like this:

http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html

I have one method (found online) that works well with a single URL, but when I loop over multiple URLs, it only works for the final URL in the array. For your reference, the method is as follows:

    function get_web_page( $url ) {
        $options = array(
            CURLOPT_RETURNTRANSFER => true,     // return web page
            CURLOPT_HEADER         => true,     // return headers
            CURLOPT_FOLLOWLOCATION => true,     // follow redirects
            CURLOPT_ENCODING       => "",       // handle all encodings
            CURLOPT_USERAGENT      => "spider", // who am i
            CURLOPT_AUTOREFERER    => true,     // set referer on redirect
            CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
            CURLOPT_TIMEOUT        => 120,      // timeout on response
            CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
        );

        $ch = curl_init( $url );
        curl_setopt_array( $ch, $options );
        $content = curl_exec( $ch );
        $err     = curl_errno( $ch );
        $errmsg  = curl_error( $ch );
        $header  = curl_getinfo( $ch );
        curl_close( $ch );

        //$header['errno']   = $err;
        //$header['errmsg']  = $errmsg;
        //$header['content'] = $content;
        print($header[0]);
        return $header;
    }

    //Using the above method in a for loop
    $finalURLs = array();
    $lineCount = count($shortUrlArray);
    for($i = 0; $i <= $lineCount; $i++){
        $singleShortURL = $shortUrlArray[$i];
        $myUrlInfo = get_web_page( $singleShortURL );
        $rawURL = $myUrlInfo["url"];
        array_push($finalURLs, $rawURL);
    }

Close, but not enough
This method works, but only with a single URL. I cannot use it in a for loop, which is what I want to do. When used in a for loop as in the example above, the first four elements come back unchanged and only the final element is converted into its final URL. This happens whether the array has 5 elements or 500.

Solution sought:
Please give me a hint on how to modify this method so that it works inside a for loop with a collection of URLs (rather than just one).

-OR-

If you know of code that is better suited to this task, please include it in your answer.

Thanks in advance.

Update:
After some further prodding, I found that the problem lies not with the method above (which, after all, seems to work fine in for loops) but possibly with encoding. When I hard-code an array of short URLs, the loop works fine. But when I pass in a block of newline-separated URLs from an HTML form using GET or POST, the problem described above appears. Are the URLs somehow changed into a format that is incompatible with the method when I submit the form?
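One way to check this suspicion is to print each submitted entry in a form that makes invisible bytes visible. The sketch below is only an illustration: the variable $submitted is a hard-coded stand-in for the form field (browsers typically submit textarea line breaks as "\r\n"), and each piece is passed through urlencode() so that a stray carriage return shows up as %0D.

```php
<?php
// Hypothetical stand-in for a textarea value such as $_POST['short_urls'].
$submitted = "http://bit.ly/123\r\nhttp://bit.ly/123";

// Splitting on "\n" alone leaves the "\r" attached to every entry but the last.
$pieces = explode("\n", $submitted);

$encoded = array();
foreach ($pieces as $piece) {
    // urlencode() turns invisible bytes into visible escapes such as %0D.
    $encoded[] = urlencode($piece);
}

print_r($encoded);
```

If only the last entry lacks a trailing %0D, the split is leaving a carriage return on all the others.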

New update:
Hey guys, I found that my problem was unrelated to the method above. My problem was that URL encoding turned what I thought were just newline characters (separating the URLs) into this: %0D%0A, which is a carriage return followed by a line feed. As a result, every short URL except the final one in the collection had a "ghost" character attached to its tail, which made it impossible to retrieve the final URLs for those entries only. I identified the ghost, corrected my PHP explode call, and now everything works fine. Sorry and thank you.
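The exact fix is not shown in the post. As a minimal sketch of one way to split such a block so that no carriage return survives, assuming the URLs arrive as a single string (the helper name split_url_lines is hypothetical and not part of the original code):

```php
<?php
/**
 * Split a block of text into clean URL strings, one per line.
 * Hypothetical helper for illustration only.
 */
function split_url_lines($block)
{
    // \R matches any line-break sequence: "\r\n", "\n" or "\r".
    $lines = preg_split('/\R/', $block);

    // trim() removes leftover whitespace at both ends of each entry.
    $lines = array_map('trim', $lines);

    // Drop empty entries (e.g. from a trailing newline) and reindex from 0.
    return array_values(array_filter($lines, 'strlen'));
}

$shortUrlArray = split_url_lines("http://bit.ly/123\r\nhttp://bit.ly/123\r\n");
print_r($shortUrlArray);
```

The resulting array can then be passed to the loop unchanged, since each entry is a bare URL with no trailing characters.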

3 answers

This may be useful: How to put a string in an array, split by new line?

You would probably do something similar, assuming you get the URLs returned in POST:

    $final_urls = array();
    // You can replace chr(10) with "\n" or "\r\n", depending on how you get your urls.
    // And of course, change $_POST['short_urls'] to the source of your string.
    $short_urls = explode( chr(10), $_POST['short_urls'] );

    foreach ( $short_urls as $short ) {
        $final_urls[] = get_web_page( $short );
    }

I get the following output using var_dump($final_urls); and your bit.ly URL:

http://codepad.org/8YhqlCo1

And my source: $_POST['short_urls'] = "http://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123";

I also got an error using your function: "Notice: Undefined offset: 0 in /var/www/test.php on line 27". Line 27 is print($header[0]); I'm not sure what you wanted there...

Here is my test.php if this helps: http://codepad.org/zI2wAOWL


I think you almost have it. Try the following:

    $shortUrlArray = array("http://yhoo.it/2deaFR", "http://bit.ly/900913", "http://bit.ly/4m1AUx");
    $finalURLs = array();
    $lineCount = count($shortUrlArray);
    for($i = 0; $i < $lineCount; $i++){
        $singleShortURL = $shortUrlArray[$i];
        $myUrlInfo = get_web_page( $singleShortURL );
        $rawURL = $myUrlInfo["url"];
        // echo rather than printf, so "%" characters in a URL are not read as format codes
        echo $rawURL."\n";
        array_push($finalURLs, $rawURL);
    }
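Compared with the loop in the question, this version uses $i < $lineCount rather than $i <= $lineCount, so it does not read one element past the end of the array. A foreach loop avoids the index arithmetic altogether. The sketch below illustrates this with a stand-in resolver: fake_resolve is invented here so that the example runs without network access, and in real use the call would be get_web_page().

```php
<?php
// Hypothetical stand-in for get_web_page(); it returns the same array shape
// (a "url" key) without making any network request.
function fake_resolve($url)
{
    return array('url' => $url . '#resolved');
}

$shortUrlArray = array('http://bit.ly/a', 'http://bit.ly/b', 'http://bit.ly/c');
$finalURLs = array();

// foreach visits exactly the elements that exist, so there is no
// off-by-one risk from the loop condition.
foreach ($shortUrlArray as $singleShortURL) {
    $myUrlInfo = fake_resolve($singleShortURL);
    $finalURLs[] = $myUrlInfo['url'];
}

print_r($finalURLs);
```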

I implemented this to read a text file with one shortened URL per line and write out the redirect URL corresponding to each line:

    <?php
    // input: text file with one bitly shortened url per line
    $plain_urls = file_get_contents('in.txt');
    // split on any line ending ("\r\n", "\n" or "\r"); trim() avoids a trailing empty entry
    $bitly_urls = preg_split('/\R/', trim($plain_urls));

    // output: where should we write
    $w_out = fopen("out.csv", "a+") or die("Unable to open file!");

    foreach($bitly_urls as $bitly_url) {
        $c = curl_init($bitly_url);
        curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36');
        curl_setopt($c, CURLOPT_FOLLOWLOCATION, 0); // do not follow; we only want the first redirect target
        curl_setopt($c, CURLOPT_HEADER, 1);
        curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 20);
        // curl_setopt($c, CURLOPT_PROXY, 'localhost:9150');
        // curl_setopt($c, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
        $r = curl_exec($c);

        // get the redirect url:
        $redirect_url = curl_getinfo($c)['redirect_url'];

        // write output as csv
        $out = '"'.$bitly_url.'";"'.$redirect_url.'"'."\n";
        fwrite($w_out, $out);
    }

    fclose($w_out);

Have fun and enjoy! Pw
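Because CURLOPT_HEADER is enabled and redirects are not followed in the script above, the response in $r also contains the raw Location header of the first hop. As an alternative to curl_getinfo(), here is a minimal sketch of reading that header directly. The helper name extract_location is hypothetical, and the sample response text is made up for illustration.

```php
<?php
/**
 * Return the value of the first Location header in a raw HTTP response,
 * or null if there is none. Hypothetical helper for illustration only.
 */
function extract_location($rawHeaders)
{
    // Header names are case-insensitive; match "Location:" at the start of a line
    // and strip surrounding whitespace, including a trailing "\r".
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Made-up sample of what a shortener's redirect response might look like.
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: http://www.example.com/some-directory/some-page.html\r\n"
        . "Content-Length: 0\r\n\r\n";

echo extract_location($sample), "\n";
```

Like the redirect_url approach, this only sees a single hop; if the target redirects again, the request would have to be repeated with the new URL.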


Source: https://habr.com/ru/post/1308508/

