Automated link checker for system testing

I often have to work with fragile legacy websites that break unexpectedly when updating logic or configuration.

I do not have the time or the knowledge of the system needed to create a Selenium script. Besides, I do not want to test one specific use case; I want to check every link and page on the site.

I would like to create an automated system test that crawls the site and checks for broken links and crashes. Ideally there would be a tool I could use to achieve this. It should have as many of the following features as possible, listed in descending order of priority:

  • Runs from a script
  • Requires no human interaction
  • Follows all links, including anchor tags and links to CSS and JS files
  • Produces a log of all 404s, 500s, etc. that it finds
  • Can be deployed locally to check intranet sites
  • Supports cookie / form-based authentication
  • Free / open source

There are many partial solutions, such as FitNesse, Firefox LinkChecker, and the W3C link checker, but none of them do everything I need.

I would like to use this test for projects using a number of technologies and platforms, so a more portable solution would be better.

I understand that this is not a substitute for proper system testing, but it would be very useful to have a convenient, automatic way of verifying that no part of the site is obviously broken.

+41
web-crawler automated-tests system-testing
Oct 20 '09 at 18:37
9 answers

I use Xenu's Link Sleuth for this kind of thing. It quickly checks for dead links and the like on any site. Just point it at any URI and it will spider all the links on that site.

Description from the site:

Xenu's Link Sleuth (TM) checks Web sites for broken links. Link verification is done on "normal" links, images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts and java applets. It displays a continuously updated list of URLs which you can sort by different criteria. A report can be produced at any time.

It meets all of your requirements except scriptability: it is a Windows application that requires a manual launch.

+27
Oct 31 '09 at 20:27

We use and really like LinkChecker:

http://wummel.imtqy.com/linkchecker/

It is open source, written in Python, command-line driven, deployable internally, and it can output in a variety of formats. The developer was very helpful when we contacted him with problems.

We have a Ruby script that queries our database of internal websites, runs LinkChecker with the appropriate parameters for each site, and parses the XML that LinkChecker gives us to create a custom error report for each site in our CMS.
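
If you want to reproduce that kind of setup, here is a rough Python sketch of the same idea. The -r and -F flags are real LinkChecker options, but the default XML output filename and the element names parsed below are assumptions from memory, so verify them against your installed version:

    # Sketch: run LinkChecker on one site and list the URLs that failed.
    # Assumptions to verify: "-F xml" writes linkchecker-out.xml by default,
    # and the XML contains <urldata> elements with <url> and <valid> children.
    import subprocess
    import xml.etree.ElementTree as ET

    def broken_links(base_url, report="linkchecker-out.xml"):
        subprocess.run(
            ["linkchecker", "-r", "10", "-F", "xml", base_url],
            check=False,  # LinkChecker exits non-zero when it finds errors
        )
        failures = []
        for urldata in ET.parse(report).getroot().iter("urldata"):
            valid = urldata.find("valid")
            if valid is not None and valid.text == "0":
                failures.append(urldata.findtext("url"))
        return failures

    if __name__ == "__main__":
        for url in broken_links("http://intranet.example.com/"):
            print("BROKEN:", url)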

+31
Nov 06 '09 at 22:25

Which parts of your list does the W3C link checker not cover? That is the one I would use.

Alternatively, twill (Python-based) is an interesting little language for this kind of thing. It has a link-checking module, but I don't think it works recursively, so it's not so good for spidering. You could modify it if you like it, though, and I could be wrong; maybe there is a recursive option. Worth checking out, anyway.
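
If you want to try twill from Python, a minimal sketch looks something like this (it assumes twill exposes a twill.commands module with go and code, which is how I remember its API; confirm against your installed version):

    # Minimal twill sketch: fetch a page and assert its HTTP status.
    # Assumes twill's Python API lives in twill.commands.
    from twill.commands import go, code

    go("http://intranet.example.com/")  # fetch the page
    code(200)                           # raises if the status is not 200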

+2
Oct 31 '09 at 20:18

You might want to look at wget for this. It can spider a site, including "page requisites" (i.e. linked CSS, JS, and image files), and can be configured to log errors. I don't know whether that will be enough information for you, but it is free and available on Windows (via Cygwin) as well as Unix.
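
For example, a scripted spider-mode run might look like the sketch below. The flags are standard wget options (--spider: check without saving files, -r: recurse, -p: also fetch page requisites, -o: write the log to a file); the URL and log name are placeholders:

    # Sketch: crawl a site with wget in spider mode and log the results.
    import subprocess

    result = subprocess.run(
        ["wget", "--spider", "-r", "-p", "-o", "wget-crawl.log",
         "http://intranet.example.com/"],
        check=False,  # wget exits non-zero if any request failed
    )
    print("crawl finished, exit code:", result.returncode)
    # Then grep wget-crawl.log for 404/500 responses.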

+2
Nov 02 '09 at 19:01

InSite is a commercial program that seems to do what you want (I have not used it).

If I were in your place, I would probably write this kind of spider myself...
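
A homegrown spider along those lines might start out like the sketch below. It assumes the third-party requests and beautifulsoup4 packages, uses a placeholder start URL, and does not handle cookies or form authentication:

    # Sketch of a minimal link-checking spider: follows anchors, stylesheet
    # links and script tags on one host, and records any failures.
    from urllib.parse import urljoin, urldefrag, urlparse
    import requests
    from bs4 import BeautifulSoup

    def crawl(start):
        host = urlparse(start).netloc
        seen, queue, errors = set(), [start], []
        while queue:
            url = queue.pop()
            if url in seen:
                continue
            seen.add(url)
            try:
                resp = requests.get(url, timeout=10)
            except requests.RequestException as exc:
                errors.append((url, str(exc)))
                continue
            if resp.status_code >= 400:
                errors.append((url, resp.status_code))
                continue
            if urlparse(url).netloc != host:
                continue  # check external links, but do not follow them
            soup = BeautifulSoup(resp.text, "html.parser")
            for tag, attr in (("a", "href"), ("link", "href"), ("script", "src")):
                for node in soup.find_all(tag):
                    if node.get(attr):
                        queue.append(urldefrag(urljoin(url, node[attr]))[0])
        return errors

    if __name__ == "__main__":
        for url, problem in crawl("http://intranet.example.com/"):
            print(problem, url)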

+1
Oct 31 '09 at 13:58

I'm not sure whether it supports form authentication, but it will handle cookies if you can get onto the site with them, and otherwise I think Checkbot will do everything on your list. I have used it as a step in a build process to check that nothing on the site is broken. There is sample output on its website.

+1
Nov 02 '09 at 19:29

I have always liked linklint for checking links on a site. However, I do not think it meets all your criteria, especially the aspects that may depend on JavaScript. I also think it will miss images referenced from inside CSS.

But for spidering all anchors, it works great.

+1
Nov 07 '09 at 8:55

Try SortSite. It's not free, but it seems to do everything you need and more.

As an alternative, PowerMapper from the same company takes a similar but different approach. It will give you less information about the detailed optimization of your pages, but it will still identify any broken links, etc.

Disclaimer: I have a financial interest in the company that produces these products.

0
Nov 07 '09 at 10:41

Try http://www.thelinkchecker.com . It is an online application that checks the number of outgoing links, page rank, and anchors. I think this could be the solution you need.

0
Jan 18 '14 at 22:20


