Modifier and Type | Field and Description |
---|---|
protected String | acceptCharset: The value which will be used for the HTTP Accept-Charset header if the given CrawleableUri object does not define a header value. |
protected static Set<String> | ACCEPTED_SCHEMES: URI schemes which are accepted by this fetcher (i.e., "http" and "https"). |
protected String | acceptHeader: The value which will be used for the HTTP Accept header if the given CrawleableUri object does not define a header value. |
protected org.apache.http.impl.client.CloseableHttpClient | client: The HTTP client instance used by this fetcher. |
protected File | dataDirectory: The temporary directory which will be used to store downloaded data. |
static String | DEFAULT_ACCEPT_HEADER_STRING: The default HTTP Accept header value which simply accepts everything. |
static String | HTTP_RESPONSE_HEADER_PREFIX: The prefix which is added to HTTP response headers before they are stored in the CrawleableUri's data map. |
private static org.slf4j.Logger | LOGGER |
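
The scheme filter and the header fallback described in the fields above can be illustrated with a minimal, standard-library-only sketch. It is not the HTTPFetcher source: the class name `AcceptFallbackSketch`, the helper methods, the `"accept"` map key, and the `"*/*"` default value are all assumptions made for illustration, and a plain `Map` stands in for the data map of a CrawleableUri.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in, not the actual HTTPFetcher implementation. */
final class AcceptFallbackSketch {

    // Mirrors the documented ACCEPTED_SCHEMES field ("http" and "https").
    static final Set<String> ACCEPTED_SCHEMES =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("http", "https")));

    // Assumed default value: "*/*" is the usual way to accept every media type.
    static final String DEFAULT_ACCEPT = "*/*";

    /** Returns true if the URI has a scheme and it is one of the accepted schemes. */
    static boolean isSchemeAccepted(URI uri) {
        String scheme = uri.getScheme();
        return scheme != null && ACCEPTED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /** Prefers a header value defined per URI and otherwise uses the fallback. */
    static String resolveHeader(Map<String, Object> uriData, String key, String fallback) {
        Object value = uriData.get(key);
        return (value instanceof String) ? (String) value : fallback;
    }

    public static void main(String[] args) {
        System.out.println(isSchemeAccepted(URI.create("https://example.org/data.ttl")));
        System.out.println(isSchemeAccepted(URI.create("ftp://example.org/data.ttl")));
        // The hypothetical key "accept" is absent here, so the fallback is used.
        Map<String, Object> emptyData = Collections.emptyMap();
        System.out.println(resolveHeader(emptyData, "accept", DEFAULT_ACCEPT));
    }
}
```

The fallback is resolved per request rather than once at construction time, which matches the documented behaviour that a value defined by the URI takes precedence over the fetcher-wide setting.
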
Constructor and Description |
---|
HTTPFetcher() |
HTTPFetcher(org.apache.http.impl.client.CloseableHttpClient client) |
HTTPFetcher(String userAgent) |
Modifier and Type | Method and Description |
---|---|
void | close() |
File | fetch(org.dice_research.squirrel.data.uri.CrawleableUri uri, Delayer delayer) |
protected File | requestData(org.dice_research.squirrel.data.uri.CrawleableUri uri, File outputFile) |
void | setAcceptCharset(String acceptCharset) |
void | setAcceptHeader(String acceptHeader): The value of the HTTP Accept header field that is used if the given CrawleableUri instance does not define this. |
void | setDataDirectory(File dataDirectory) |
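
The role of HTTP_RESPONSE_HEADER_PREFIX, which is added to response header names before they are stored in the URI's data map, might look roughly like the sketch below. The prefix value `"HTTP-RESPONSE-"`, the class name, and the `storeHeaders` helper are hypothetical; the real constant's value is not given on this page, and plain maps stand in for the HTTP response and the CrawleableUri.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative stand-in, not the actual HTTPFetcher implementation. */
final class ResponseHeaderPrefixSketch {

    // Hypothetical value; the documented constant's real value is not shown here.
    static final String PREFIX = "HTTP-RESPONSE-";

    /** Copies every response header into the data map under a prefixed key. */
    static void storeHeaders(Map<String, String> responseHeaders, Map<String, Object> uriData) {
        for (Map.Entry<String, String> header : responseHeaders.entrySet()) {
            uriData.put(PREFIX + header.getKey(), header.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "text/turtle");
        Map<String, Object> uriData = new LinkedHashMap<>();
        storeHeaders(headers, uriData);
        System.out.println(uriData);
    }
}
```

Prefixing keeps header-derived entries from colliding with other keys that later processing steps may place in the same map.
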
private static final org.slf4j.Logger LOGGER

public static final String DEFAULT_ACCEPT_HEADER_STRING
The default HTTP Accept header value which simply accepts everything.

public static final String HTTP_RESPONSE_HEADER_PREFIX
The prefix which is added to HTTP response headers before they are stored in the CrawleableUri's data map.

protected static final Set<String> ACCEPTED_SCHEMES
URI schemes which are accepted by this fetcher (i.e., "http" and "https").

protected String acceptHeader
The value which will be used for the HTTP Accept header if the given CrawleableUri object does not define a header value.

protected String acceptCharset
The value which will be used for the HTTP Accept-Charset header if the given CrawleableUri object does not define a header value.

protected org.apache.http.impl.client.CloseableHttpClient client
The HTTP client instance used by this fetcher.

protected File dataDirectory
The temporary directory which will be used to store downloaded data.

public HTTPFetcher()

public HTTPFetcher(String userAgent)

public HTTPFetcher(org.apache.http.impl.client.CloseableHttpClient client)

public File fetch(org.dice_research.squirrel.data.uri.CrawleableUri uri, Delayer delayer)
Fetcher

protected File requestData(org.dice_research.squirrel.data.uri.CrawleableUri uri, File outputFile) throws org.apache.http.client.ClientProtocolException, FileNotFoundException, IOException
Throws:
org.apache.http.client.ClientProtocolException
FileNotFoundException
IOException

public void setAcceptHeader(String acceptHeader)
The value of the HTTP Accept header field that is used if the given CrawleableUri instance does not define this. Note that the given string has to follow section 5.3.2 of RFC-7231.
Parameters:
acceptHeader - the new value of the accept header as defined in RFC-7231.

public void setAcceptCharset(String acceptCharset)

public void setDataDirectory(File dataDirectory)

public void close() throws IOException
Specified by:
close in interface Closeable
close in interface AutoCloseable
Throws:
IOException
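
Because close() is specified by Closeable and AutoCloseable, an instance can be managed with try-with-resources. The sketch below shows that pattern with a hypothetical `StubFetcher` stand-in, so it runs without the crawler or the HTTP client library on the classpath; it makes no claim about what the real close() releases.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative stand-in, not the actual HTTPFetcher implementation. */
final class CloseSketch {

    /** Hypothetical fetcher-like class that only records whether it was closed. */
    static final class StubFetcher implements Closeable {
        private boolean closed = false;

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() throws IOException {
            closed = true;
        }
    }

    /** Opens a stub fetcher, uses it inside try-with-resources, and returns it. */
    static StubFetcher useAndClose() {
        StubFetcher used;
        try (StubFetcher fetcher = new StubFetcher()) {
            used = fetcher;
            // Calls such as fetch(...) would go here in real code.
        } catch (IOException e) {
            // close() declares IOException, so it has to be handled or rethrown.
            throw new UncheckedIOException(e);
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(useAndClose().isClosed());
    }
}
```
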
Copyright © 2017–2020. All rights reserved.