public class SimpleUriCollector extends Object implements UriCollector
Implementation of the UriCollector interface based on the given Serializer.
Modifier and Type | Field and Description |
---|---|
private static org.slf4j.Logger | LOGGER |
protected org.dice_research.squirrel.data.uri.serialize.Serializer | serializer: Serializer used to serialize the given URIs. |
private long | total_uris: Number of all URIs collected. |
protected Map<String,Map<String,byte[]>> | urisOfUris: Mapping from URIs to the new URIs that have been found. |
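The urisOfUris field has the type Map<String,Map<String,byte[]>>. As a minimal sketch of how such a nested map can hold serialized URIs per source URI, consider the stand-alone class below. It is not the Squirrel implementation: the class name UriMapSketch, its methods, the use of plain strings as keys, UTF-8 bytes in place of a real Serializer, and the decision to ignore duplicates are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration only; not the Squirrel class. */
class UriMapSketch {
    // Source URI -> (new URI -> serialized bytes); same shape as urisOfUris.
    private final Map<String, Map<String, byte[]>> urisOfUris = new HashMap<>();
    // Number of stored entries; plays the role of total_uris.
    private long totalUris = 0;

    /** Stores newUri under source and returns true if it was not stored before. */
    boolean add(String source, String newUri) {
        Map<String, byte[]> found = urisOfUris.computeIfAbsent(source, k -> new HashMap<>());
        // Assumption: UTF-8 bytes stand in for the output of a real Serializer.
        byte[] serialized = newUri.getBytes(StandardCharsets.UTF_8);
        boolean isNew = found.putIfAbsent(newUri, serialized) == null;
        if (isNew) {
            totalUris++;
        }
        return isNew;
    }

    /** Total number of entries stored across all source URIs. */
    long size() {
        return totalUris;
    }

    /** Number of entries stored for one source URI (0 if unknown). */
    long size(String source) {
        Map<String, byte[]> found = urisOfUris.get(source);
        return found == null ? 0 : found.size();
    }

    public static void main(String[] args) {
        UriMapSketch sketch = new UriMapSketch();
        sketch.add("http://example.org/a", "http://example.org/b");
        sketch.add("http://example.org/a", "http://example.org/b"); // duplicate, ignored in this sketch
        System.out.println(sketch.size());
    }
}
```

Keying the inner map by the string form of the new URI is what makes duplicate detection cheap in this sketch. Whether SimpleUriCollector counts duplicates in total_uris is not stated on this page.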
Constructor and Description |
---|
SimpleUriCollector(org.dice_research.squirrel.data.uri.serialize.Serializer serializer): Constructor. |
Modifier and Type | Method and Description |
---|---|
void | addNewUri(org.dice_research.squirrel.data.uri.CrawleableUri uri, org.dice_research.squirrel.data.uri.CrawleableUri newUri): Adds the given new URI to the list of URIs collected for the given URI. |
void | closeSinkForUri(org.dice_research.squirrel.data.uri.CrawleableUri uri) |
long | getSize(): Returns the total number of new URIs that have been added to this collector. |
long | getSize(org.dice_research.squirrel.data.uri.CrawleableUri uri): Returns the total number of URIs that have been collected for the given URI. |
Iterator<byte[]> | getUris(org.dice_research.squirrel.data.uri.CrawleableUri uri): Returns the serialized CrawleableUri instances that have been collected for the given URI. |
void | openSinkForUri(org.dice_research.squirrel.data.uri.CrawleableUri uri) |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited through interface UriCollector: addNewUri, addNewUri, addTriple
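getUris returns an Iterator<byte[]>, so callers receive serialized data and have to decode each entry themselves. The sketch below shows one way to drain such an iterator. SerializedUriReader and readAll are hypothetical names, and decoding the bytes as UTF-8 strings is an assumption standing in for whatever the configured Serializer actually produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Squirrel. */
class SerializedUriReader {

    /**
     * Drains the given iterator and decodes every entry.
     * Assumption: each byte[] is a UTF-8 encoded URI string; a real caller
     * would decode with the Serializer used by the collector instead.
     */
    static List<String> readAll(Iterator<byte[]> serializedUris) {
        List<String> decoded = new ArrayList<>();
        while (serializedUris.hasNext()) {
            byte[] raw = serializedUris.next();
            decoded.add(new String(raw, StandardCharsets.UTF_8));
        }
        return decoded;
    }

    public static void main(String[] args) {
        List<byte[]> raw = new ArrayList<>();
        raw.add("http://example.org/b".getBytes(StandardCharsets.UTF_8));
        raw.add("http://example.org/c".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(raw.iterator()));
    }
}
```

In real use the decoding step would presumably be delegated to the same Serializer that was passed to the constructor, so that the bytes are read back in the format in which they were written.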
private static final org.slf4j.Logger LOGGER

private long total_uris

protected Map<String,Map<String,byte[]>> urisOfUris

protected org.dice_research.squirrel.data.uri.serialize.Serializer serializer
Serializer used to serialize the given URIs.

public SimpleUriCollector(org.dice_research.squirrel.data.uri.serialize.Serializer serializer)
Parameters: serializer - the serializer that is used to serialize the new URIs.

public void openSinkForUri(org.dice_research.squirrel.data.uri.CrawleableUri uri)
Specified by: openSinkForUri in interface org.dice_research.squirrel.sink.SinkBase

public Iterator<byte[]> getUris(org.dice_research.squirrel.data.uri.CrawleableUri uri)
Description copied from interface: UriCollector
Returns the serialized CrawleableUri instances that have been collected for the given URI.
Specified by: getUris in interface UriCollector
Parameters: uri - The URI from which the returned serialized URIs have been collected.
Returns: Iterator that iterates over the already serialized URIs that have been collected for the given URI.

public void addNewUri(org.dice_research.squirrel.data.uri.CrawleableUri uri, org.dice_research.squirrel.data.uri.CrawleableUri newUri)
Description copied from interface: UriCollector
Adds the given new URI to the list of URIs collected for the given URI.
Specified by: addNewUri in interface UriCollector
Parameters:
uri - The URI from which the given new URI has been collected.
newUri - The new URI that has been collected.

public long getSize()
Returns the total number of new URIs that have been added to this collector.

public void closeSinkForUri(org.dice_research.squirrel.data.uri.CrawleableUri uri)
Specified by: closeSinkForUri in interface org.dice_research.squirrel.sink.SinkBase
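
public long getSize(org.dice_research.squirrel.data.uri.CrawleableUri uri)
Description copied from interface: UriCollector
Returns the total number of URIs that have been collected for the given URI.
Specified by: getSize in interface UriCollector
Parameters: uri - The URI from which the returned serialized URIs have been collected.

Copyright © 2017–2020. All rights reserved.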