Scraping using wildcards and PHP

Question

Hey guys, I'm having a hard time figuring out how to scrape this page: http://www.morewords.com/ends-with/aw for the words themselves. Given a URL, I would like to fetch the content and then build a PHP array of all the words, which are listed in markup that looks like this:

<a href="/word/word1/">word1</a><br /> <a href="/word/word2/">word2</a><br /> <a href="/word/word3/">word3</a><br /> <a href="/word/word4/">word4</a><br />

There are a few ways I've thought of doing this, and I'd appreciate help working out which is the most efficient. Any advice or examples on how to accomplish it would also be welcome. I understand it's not incredibly difficult, but I could use help from more advanced hackers.

Use some kind of jQuery $.each() to loop through the links, collect them in a JS array, and then convert it for PHP (probably heavily taxing)
Use some kind of cURL (I don't really have much experience with cURL)
Use a complicated find-and-replace with regex.

+6

javascript jquery php parsing screen-scraping

willium May 05 '11 at 23:29

1 answer

alex · Accepted Answer · 2011-05-05T23:38:30+0000

You tagged it PHP, so here is a PHP solution :)

    $dom = new DOMDocument;

    // Requires allow_url_fopen to be enabled in php.ini.
    $dom->loadHTMLFile('http://www.morewords.com/ends-with/aw');

    $anchors = $dom->getElementsByTagName('a');

    $words = array();

    foreach ($anchors as $anchor) {
        // Keep only links whose href contains /word/<word>/.
        if ($anchor->hasAttribute('href') && preg_match('~/word/\w+/~', $anchor->getAttribute('href'))) {
            $words[] = $anchor->nodeValue;
        }
    }
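To try the same filtering without touching the network, here is a minimal sketch that runs it against the sample markup from the question. The helper name `extractWords()` and the `$sample` variable are made up for illustration, and silencing libxml warnings is an extra step that is not in the code above.

```php
<?php
// Hypothetical helper (name chosen for this example): return the link text
// of every anchor whose href contains /word/<word>/.
function extractWords(string $html): array
{
    // loadHTML() refuses an empty string, so bail out early.
    if ($html === '') {
        return array();
    }

    $dom = new DOMDocument;

    // Real pages are rarely valid HTML; collect libxml warnings quietly
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $words = array();
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')
            && preg_match('~/word/\w+/~', $anchor->getAttribute('href'))) {
            $words[] = $anchor->nodeValue;
        }
    }

    return $words;
}

// Sample markup copied from the question.
$sample = '<a href="/word/word1/">word1</a><br /> '
        . '<a href="/word/word2/">word2</a><br /> '
        . '<a href="/word/word3/">word3</a><br /> '
        . '<a href="/word/word4/">word4</a><br />';

print_r(extractWords($sample));
```

For the sample this yields an array holding word1 through word4. Links that do not point at /word/.../ (navigation, footer and so on) are skipped by the preg_match() check.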

If allow_url_fopen is disabled in php.ini, you can use cURL to fetch the HTML and then pass it to $dom->loadHTML($html) instead of calling loadHTMLFile().

    $curl = curl_init();

    curl_setopt($curl, CURLOPT_URL, 'http://www.morewords.com/ends-with/aw');
    // Return the response body as a string instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    $html = curl_exec($curl);

    curl_close($curl);
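curl_exec() returns false when the transfer fails, so it may be worth checking the result before parsing it. A rough sketch, assuming the curl extension is installed: the helper name `fetchHtml()`, the timeout values and the redirect option are my own choices, not part of the answer above.

```php
<?php
// Hypothetical helper (name chosen for this example): fetch a URL with cURL
// and return the response body, or null when the transfer fails.
function fetchHtml(string $url): ?string
{
    $curl = curl_init();

    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return body as a string
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects (assumed useful)
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 5);    // seconds, arbitrary choice
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);          // seconds, arbitrary choice

    $html = curl_exec($curl); // false on failure

    curl_close($curl);

    return is_string($html) ? $html : null;
}

// Usage (performs a real HTTP request, so it is left commented out):
// $html = fetchHtml('http://www.morewords.com/ends-with/aw');
// if ($html !== null) {
//     $dom = new DOMDocument;
//     libxml_use_internal_errors(true); // tolerate sloppy markup
//     $dom->loadHTML($html);
//     // ...then loop over $dom->getElementsByTagName('a') as before.
// }
```

Returning null on failure keeps the caller from handing false to loadHTML().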
