I'm using Scrapy and I want more control over the crawler. To do this I would like to set rules depending on the URL currently being processed.
For example, if I am on
example.com/a I want to apply a Rule with
LinkExtractor(restrict_xpaths='//div[@class="1"]'), and if I'm on
example.com/b I want to use another Rule with a different LinkExtractor.
How do I accomplish this?
I'd just code them in separate callbacks, instead of relying on the CrawlSpider rules.
```python
import scrapy
from scrapy.linkextractors import LinkExtractor

def parse(self, response):
    extractor = LinkExtractor()  # .. some default ..
    if 'example.com/a' in response.url:
        extractor = LinkExtractor(restrict_xpaths='//div[@class="1"]')
    for link in extractor.extract_links(response):
        yield scrapy.Request(link.url, callback=self.whatever)
```
This is better than trying to change the rules at runtime: CrawlSpider compiles its rules once when the spider starts, and the same set is applied to every response.
In this case I've just used link extractors, but if you want full Rule behaviour you can do much the same thing: mirror the loop that CrawlSpider._requests_to_follow uses to handle its rules.
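To make the dispatch idea concrete without needing a running Scrapy project, here is a minimal sketch of "pick a different extractor depending on the current URL". The names `RULES_BY_URL`, `pick_extractor`, and the extractor labels are hypothetical stand-ins for real `LinkExtractor` instances; in a spider you would store actual extractors in the table and call `extract_links(response)` on whichever one is returned.

```python
import re

# Hypothetical table mapping URL patterns to the extractor that should
# handle matching pages. In a real spider the second element of each pair
# would be a configured LinkExtractor instance, e.g. one built with
# restrict_xpaths='//div[@class="1"]' for example.com/a.
RULES_BY_URL = [
    (re.compile(r'example\.com/a'), 'extractor_for_a'),
    (re.compile(r'example\.com/b'), 'extractor_for_b'),
]

DEFAULT_EXTRACTOR = 'default_extractor'

def pick_extractor(url):
    """Return the extractor configured for the first pattern matching url,
    falling back to a default when nothing matches."""
    for pattern, extractor in RULES_BY_URL:
        if pattern.search(url):
            return extractor
    return DEFAULT_EXTRACTOR
```

Inside `parse` you would then do `extractor = pick_extractor(response.url)` and loop over `extractor.extract_links(response)` as shown above; adding a new per-URL rule becomes a one-line change to the table instead of another `if` branch.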