Instead of bringing in sync.WaitGroup, you can extend the result you send for each parsed URL to include the number of new URLs found on that page. In your main loop, you then keep reading results for as long as there is still something left to collect.
In your case, the number of URLs found will equal the number of goroutines spawned, but it doesn't have to be. Personally, I would spawn a more or less fixed number of fetching goroutines, so that you don't open too many HTTP requests at once (or at least have control over that). Your main loop wouldn't change, since it doesn't care how the fetching is carried out. The important point is that you send either a result or an error for each URL; I've modified the code here so that it doesn't spawn new goroutines once the depth is already 1.
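For illustration only, here is a minimal sketch of such a fixed pool of fetching goroutines, with a plain function standing in for the real fetch; numFetchers, fetch, jobs and results are names made up for this sketch and are not part of the full example below:

package main

import "fmt"

// numFetchers workers read URLs from a channel, so no more than
// numFetchers requests are in flight at any one time.
const numFetchers = 3

func fetch(url string) string {
    return "body of " + url // stand-in for a real HTTP request
}

func main() {
    urls := []string{"http://golang.org/", "http://golang.org/pkg/", "http://golang.org/cmd/"}

    jobs := make(chan string)
    results := make(chan string)

    // Start a fixed pool of fetchers.
    for i := 0; i < numFetchers; i++ {
        go func() {
            for u := range jobs {
                results <- fetch(u)
            }
        }()
    }

    // Feed the URLs to the pool without blocking the collection loop.
    go func() {
        for _, u := range urls {
            jobs <- u
        }
        close(jobs)
    }()

    // Collect exactly one result per URL, just like the crawler's main loop.
    for range urls {
        fmt.Println(<-results)
    }
}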
A side effect of the counting approach is that you can easily print progress in your main loop.
Here is an example on the playground:
http://play.golang.org/p/BRlUc6bojf
package main

import (
    "fmt"
)

type Fetcher interface {
    // Fetch returns the body of URL and
    // a slice of URLs found on that page.
    Fetch(url string) (body string, urls []string, err error)
}

type Res struct {
    url   string
    body  string
    found int // Number of new urls found
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, ch chan Res, errs chan error, visited map[string]bool) {
    body, urls, err := fetcher.Fetch(url)
    visited[url] = true
    if err != nil {
        errs <- err
        return
    }

    newUrls := 0
    if depth > 1 {
        for _, u := range urls {
            if !visited[u] {
                newUrls++
                go Crawl(u, depth-1, fetcher, ch, errs, visited)
            }
        }
    }

    // Send the result along with number of urls to be fetched
    ch <- Res{url, body, newUrls}

    return
}

func main() {
    ch := make(chan Res)
    errs := make(chan error)
    visited := map[string]bool{}

    go Crawl("http://golang.org/", 4, fetcher, ch, errs, visited)

    tocollect := 1
    for n := 0; n < tocollect; n++ {
        select {
        case s := <-ch:
            fmt.Printf("found: %s %q\n", s.url, s.body)
            tocollect += s.found
        case e := <-errs:
            fmt.Println(e)
        }
    }
}

// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
    body string
    urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
    if res, ok := f[url]; ok {
        return res.body, res.urls, nil
    }
    return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
    "http://golang.org/": &fakeResult{
        "The Go Programming Language",
        []string{
            "http://golang.org/pkg/",
            "http://golang.org/cmd/",
        },
    },
    "http://golang.org/pkg/": &fakeResult{
        "Packages",
        []string{
            "http://golang.org/",
            "http://golang.org/cmd/",
            "http://golang.org/pkg/fmt/",
            "http://golang.org/pkg/os/",
        },
    },
    "http://golang.org/pkg/fmt/": &fakeResult{
        "Package fmt",
        []string{
            "http://golang.org/",
            "http://golang.org/pkg/",
        },
    },
    "http://golang.org/pkg/os/": &fakeResult{
        "Package os",
        []string{
            "http://golang.org/",
            "http://golang.org/pkg/",
        },
    },
}
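As mentioned above, printing progress needs only a small change to the main loop, since it already knows how many results it has collected (n) and how many it currently expects (tocollect); the extra Printf at the end of the loop body below is my own addition, not part of the playground example:

for n := 0; n < tocollect; n++ {
    select {
    case s := <-ch:
        fmt.Printf("found: %s %q\n", s.url, s.body)
        tocollect += s.found
    case e := <-errs:
        fmt.Println(e)
    }
    // Progress so far: results collected vs. the currently known total.
    fmt.Printf("progress: %d/%d\n", n+1, tocollect)
}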
And yes, follow @jimt's advice and make access to the visited map thread-safe.
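For illustration, one common way to do that is to guard the map with a sync.Mutex; the visitedSet type below is only a sketch of mine, not code from @jimt's answer:

package main

import (
    "fmt"
    "sync"
)

// visitedSet wraps the visited map so several goroutines can use it safely.
type visitedSet struct {
    mu   sync.Mutex
    seen map[string]bool
}

// Visit marks url as visited and reports whether it was new.
func (v *visitedSet) Visit(url string) bool {
    v.mu.Lock()
    defer v.mu.Unlock()
    if v.seen[url] {
        return false
    }
    v.seen[url] = true
    return true
}

func main() {
    v := &visitedSet{seen: map[string]bool{}}
    fmt.Println(v.Visit("http://golang.org/")) // true: first visit
    fmt.Println(v.Visit("http://golang.org/")) // false: already visited
}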