Building general- and special-purpose corpora by Web crawling