public interface KnownUriFilter extends UriFilter

A UriFilter that works like a blacklist filter. Its blacklist contains only those URIs that the crawler has already seen.

Modifier and Type | Method and Description |
---|---|
default void | add(CrawleableUri uri): Adds the given URI to the list of already known URIs. |
void | add(CrawleableUri uri, long nextCrawlTimestamp): Adds the given URI to the list of already known URIs. |
void | add(CrawleableUri uri, long lastCrawlTimestamp, long nextCrawlTimestamp): Adds the given URI to the list of already known URIs together with the time at which it has been crawled. |
long | count(): Counts the number of known URIs. |
List<CrawleableUri> | getOutdatedUris(): Returns all CrawleableUris which have to be recrawled. |
void | open(): Opens the queue and allocates necessary resources. |
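
The methods in the table can be illustrated with a small in-memory sketch. It is not the library's implementation: SketchUri is a hypothetical stand-in for CrawleableUri, InMemoryKnownUris is a made-up class name, and the injectable clock exists only to make the behaviour testable. Two readings of the documentation are assumed here: the two-argument add records the current time as the last crawl time, and the one-argument add passes the current time as the long argument of the two-argument variant.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in for CrawleableUri; only the URI string is modelled.
final class SketchUri {
    final String value;

    SketchUri(String value) {
        this.value = value;
    }
}

// In-memory sketch of the methods listed in the summary table.
class InMemoryKnownUris {
    // URI string -> {lastCrawlTimestamp, nextCrawlTimestamp}, kept in insertion order.
    private final Map<String, long[]> known = new LinkedHashMap<>();
    // Injectable clock so that "now" can be controlled in tests.
    private final LongSupplier clock;

    InMemoryKnownUris(LongSupplier clock) {
        this.clock = clock;
    }

    void open() {
        // Nothing to allocate for an in-memory map.
    }

    void add(SketchUri uri, long lastCrawlTimestamp, long nextCrawlTimestamp) {
        known.put(uri.value, new long[] { lastCrawlTimestamp, nextCrawlTimestamp });
    }

    // Assumption: the current time is recorded as the last crawl time.
    void add(SketchUri uri, long nextCrawlTimestamp) {
        add(uri, clock.getAsLong(), nextCrawlTimestamp);
    }

    // Assumption: the current time is passed as the long argument.
    void add(SketchUri uri) {
        add(uri, clock.getAsLong());
    }

    long count() {
        return known.size();
    }

    // A URI is outdated once its next-crawl time has passed.
    List<SketchUri> getOutdatedUris() {
        long now = clock.getAsLong();
        List<SketchUri> outdated = new ArrayList<>();
        for (Map.Entry<String, long[]> entry : known.entrySet()) {
            if (entry.getValue()[1] < now) {
                outdated.add(new SketchUri(entry.getKey()));
            }
        }
        return outdated;
    }
}
```

A production implementation would typically back the map with persistent storage, which is presumably why the interface has an open() method for allocating resources.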
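void add(CrawleableUri uri, long nextCrawlTimestamp)
Adds the given URI to the list of already known URIs. Works like calling add(CrawleableUri, long) with the current system time.
Parameters:
uri - the URI that should be added to the list.
nextCrawlTimestamp - the time at which the given URI should be crawled next.

void add(CrawleableUri uri, long lastCrawlTimestamp, long nextCrawlTimestamp)
Adds the given URI to the list of already known URIs together with the time at which it has been crawled.
Parameters:
uri - the URI that should be added to the list.
lastCrawlTimestamp - the time at which the given URI has been crawled.
nextCrawlTimestamp - the time at which the given URI should be crawled next.

default void add(CrawleableUri uri)
From UriFilter. Adds the given URI to the list of already known URIs. Works like calling add(CrawleableUri, long) with the current system time.

List<CrawleableUri> getOutdatedUris()
Returns all CrawleableUris which have to be recrawled. This means their time to next crawl has passed.
Returns:
the CrawleableUris which have to be recrawled.

long count()
Counts the number of known URIs.

void open()
Opens the queue and allocates necessary resources.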
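
The relation between the two timestamp parameters can be sketched as follows. The helper class CrawlTimestamps and the idea of a fixed recrawl interval are illustrative assumptions, not part of the documented interface. The sketch also assumes that timestamps are epoch milliseconds, as returned by System.currentTimeMillis().

```java
import java.time.Duration;

// Hypothetical helper, not part of KnownUriFilter.
final class CrawlTimestamps {
    private CrawlTimestamps() {
    }

    // Assumption: the next crawl is due a fixed interval after the last crawl.
    // Math.addExact throws ArithmeticException instead of silently overflowing.
    static long nextCrawl(long lastCrawlTimestamp, Duration recrawlInterval) {
        return Math.addExact(lastCrawlTimestamp, recrawlInterval.toMillis());
    }

    // A URI counts as outdated once its next-crawl time lies in the past.
    static boolean isOutdated(long nextCrawlTimestamp, long now) {
        return nextCrawlTimestamp < now;
    }
}
```

A caller could then pass the result of nextCrawl as the nextCrawlTimestamp argument of the three-argument add method.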
Copyright © 2017–2020. All rights reserved.