The duplicate filter in the scheduler only filters out URLs that have already been seen within a single spider run, which means it is reset on subsequent runs. The IgnoreVisitedItems middleware, by contrast, keeps state between runs and skips URLs visited in the past. It does this only for the final item URLs, so the rest of the site can still be crawled to find new items.
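To make the distinction concrete, here is a minimal plain-Python sketch of the two kinds of state. It is not the actual IgnoreVisitedItems middleware and not a Scrapy API. The class name `PersistentItemUrlFilter`, the `is_item` flag, and the JSON state file are hypothetical, chosen only for illustration. A per-run set mimics the scheduler's duplicate filter, and a file-backed set remembers item URLs across runs.

```python
import json
import os
import tempfile


class PersistentItemUrlFilter:
    """Hypothetical sketch (not Scrapy's API): per-run dedup for all URLs,
    plus cross-run dedup that applies only to final item URLs."""

    def __init__(self, path):
        self.path = path
        # Reset every run, like the scheduler's duplicate filter.
        self.seen_this_run = set()
        # Loaded from disk, so it survives between runs.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.visited_items = set(json.load(f))
        else:
            self.visited_items = set()

    def allow(self, url, is_item):
        # Any URL already requested in this run is dropped.
        if url in self.seen_this_run:
            return False
        # Only item URLs are checked against past runs; listing and
        # navigation pages always pass so new items can be discovered.
        if is_item and url in self.visited_items:
            return False
        self.seen_this_run.add(url)
        return True

    def mark_item_visited(self, url):
        self.visited_items.add(url)

    def close(self):
        # Persist item URLs so the next run can skip them.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(sorted(self.visited_items), f)


if __name__ == "__main__":
    state = os.path.join(tempfile.mkdtemp(), "visited_items.json")

    run1 = PersistentItemUrlFilter(state)
    print(run1.allow("http://example.com/list", is_item=False))   # True
    print(run1.allow("http://example.com/item/1", is_item=True))  # True
    run1.mark_item_visited("http://example.com/item/1")
    run1.close()

    run2 = PersistentItemUrlFilter(state)
    print(run2.allow("http://example.com/list", is_item=False))   # True: listing is re-crawled
    print(run2.allow("http://example.com/item/1", is_item=True))  # False: item seen in a past run
```

In the second run the listing page is fetched again because only item URLs are persisted, while the already-scraped item is skipped.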