public interface RobotsManager
Modifier and Type | Method and Description |
---|---|
long | getMinWaitingTime(org.dice_research.squirrel.data.uri.CrawleableUri curi) Returns the minimum time a crawler should wait before sending a new request to the given domain. |
boolean | isUriCrawlable(org.dice_research.squirrel.data.uri.CrawleableUri curi) Returns true if the robots.txt file does not forbid the crawling of that URI. |
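
The following is a minimal sketch of how the two methods might be implemented and called together, not the Squirrel implementation. It makes several assumptions: `SketchUri` is a hypothetical stand-in for `org.dice_research.squirrel.data.uri.CrawleableUri`, `SketchRobotsManager` mirrors the two method signatures against that stand-in, rules are registered by hand instead of being parsed from a fetched robots.txt file, and the waiting time is treated as milliseconds (the unit is not stated in this documentation).

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for CrawleableUri; the real class is not reproduced here.
final class SketchUri {
    private final URI uri;

    SketchUri(String uri) {
        this.uri = URI.create(uri);
    }

    String host() {
        return uri.getHost();
    }

    String path() {
        return uri.getPath();
    }
}

// Mirrors the two RobotsManager methods, but against the stand-in URI type.
interface SketchRobotsManager {
    boolean isUriCrawlable(SketchUri curi);

    long getMinWaitingTime(SketchUri curi);
}

// In-memory implementation: rules are set by hand (assumption), not fetched.
final class InMemoryRobotsManager implements SketchRobotsManager {
    private final Map<String, Set<String>> disallowedPrefixes = new HashMap<>();
    private final Map<String, Long> crawlDelayMillis = new HashMap<>();

    void disallow(String host, String pathPrefix) {
        disallowedPrefixes.computeIfAbsent(host, h -> new HashSet<>()).add(pathPrefix);
    }

    void setCrawlDelay(String host, long millis) {
        crawlDelayMillis.put(host, millis);
    }

    @Override
    public boolean isUriCrawlable(SketchUri curi) {
        Set<String> prefixes =
                disallowedPrefixes.getOrDefault(curi.host(), Collections.emptySet());
        // A URI is forbidden if its path starts with any disallowed prefix.
        for (String prefix : prefixes) {
            if (curi.path().startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public long getMinWaitingTime(SketchUri curi) {
        // Hosts without a registered delay need no waiting time in this sketch.
        return crawlDelayMillis.getOrDefault(curi.host(), 0L);
    }
}

final class RobotsManagerSketch {
    public static void main(String[] args) {
        InMemoryRobotsManager manager = new InMemoryRobotsManager();
        manager.disallow("example.org", "/private/");
        manager.setCrawlDelay("example.org", 2000L);

        SketchUri open = new SketchUri("http://example.org/data/set1");
        SketchUri closed = new SketchUri("http://example.org/private/set2");

        // Check permission first, then ask how long to wait before the next request.
        System.out.println("open crawlable: " + manager.isUriCrawlable(open));
        System.out.println("closed crawlable: " + manager.isUriCrawlable(closed));
        System.out.println("wait (ms, assumed unit): " + manager.getMinWaitingTime(open));
    }
}
```

A caller would typically check `isUriCrawlable` before each request and space consecutive requests to the same domain by at least the value returned from `getMinWaitingTime`.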
boolean isUriCrawlable(org.dice_research.squirrel.data.uri.CrawleableUri curi)
curi - the URI that should be crawled

long getMinWaitingTime(org.dice_research.squirrel.data.uri.CrawleableUri curi)
curi - a URI containing the domain to which two or more requests should be sent.

Copyright © 2017–2020. All rights reserved.