Wednesday, 28 August 2013

scrapy spider bypass deny my rules


Hi, I'm trying to use CrawlSpider, and I created my own deny rules:

path_deny_base = [
    '.(login)',
    '.(intro)',
    '(candidate)',
    '(referral)',
    '(reminder)',
    #'(/search)',
]
and my rule is:

rules = (
    Rule(SgmlLinkExtractor(deny=path_deny_base,
                           allow=('https://careers-cooperhealth.icims.com/jobs/'),
                           restrict_xpaths=('*')),
         callback="parse_items", follow=True),
)
Still, my spider crawled pages like
https://careers-cooperhealth.icims.com/jobs/22660/registered-nurse-prn/login
even though login pages should not be crawled. What is the problem here?
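Since the allow/deny values passed to a link extractor are treated as regular expressions, one way to debug rules like these is to test each pattern against an offending URL with the standard-library re module. This is a standalone sketch (not the spider itself), using the patterns and URL from the question; it only shows whether the deny patterns themselves would match, not why Scrapy crawled the page:

```python
import re

# Deny patterns from the question; link extractors apply them as regexes
# with search semantics (a match anywhere in the URL denies it).
path_deny_base = ['.(login)', '.(intro)', '(candidate)', '(referral)', '(reminder)']

url = 'https://careers-cooperhealth.icims.com/jobs/22660/registered-nurse-prn/login'

# Collect every deny pattern that matches the URL.
denied = [p for p in path_deny_base if re.search(p, url)]
print(denied)  # → ['.(login)']: '.' matches the '/' before 'login'
```

If a pattern shows up here, the deny regex itself is fine, and the unexpected crawl is likely coming from somewhere the rules don't apply (for example, URLs listed in start_urls are fetched directly and are not filtered by the spider's rules).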
