| Modifier and Type | Field and Description |
|---|---|
| protected String | acceptCharset: The value which will be used for the HTTP Accept-Charset header if the given CrawleableUri object does not define a header value. |
| protected static Set<String> | ACCEPTED_SCHEMES: URI schemes which are accepted by this fetcher (i.e., "http" and "https"). |
| protected String | acceptHeader: The value which will be used for the HTTP Accept header if the given CrawleableUri object does not define a header value. |
| protected org.apache.http.impl.client.CloseableHttpClient | client: The HTTP client instance used by this feature. |
| protected File | dataDirectory: The temporary directory which will be used to store downloaded data. |
| static String | DEFAULT_ACCEPT_HEADER_STRING: The default HTTP Accept header value which simply accepts everything. |
| static String | HTTP_RESPONSE_HEADER_PREFIX: The prefix which is added to HTTP response headers before they are stored in the CrawleableUri's data map. |
| private static org.slf4j.Logger | LOGGER |
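
The scheme filter and the header fallback described in the field table can be sketched with JDK classes only. This is a hypothetical stand-in, not the Squirrel implementation: the class name `FetcherFieldsSketch`, the map key `ACCEPT_KEY`, and the value `"*/*"` for the default Accept header are assumptions made for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration; not the Squirrel implementation.
final class FetcherFieldsSketch {

    // Mirrors the documented set of accepted schemes ("http" and "https").
    static final Set<String> ACCEPTED_SCHEMES =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("http", "https")));

    // Assumption: "*/*" is the usual accept-everything value; the real constant may differ.
    static final String DEFAULT_ACCEPT_HEADER_STRING = "*/*";

    // Assumption: a made-up key under which a URI could carry its own Accept value.
    static final String ACCEPT_KEY = "accept-header";

    /** Returns true if the URI has a scheme contained in ACCEPTED_SCHEMES. */
    static boolean isSchemeAccepted(URI uri) {
        String scheme = uri.getScheme();
        return scheme != null && ACCEPTED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /** Prefers the value stored with the URI and falls back to the fetcher-wide value. */
    static String resolveAcceptHeader(Map<String, Object> uriData, String fetcherDefault) {
        Object perUri = uriData.get(ACCEPT_KEY);
        return (perUri instanceof String) ? (String) perUri : fetcherDefault;
    }

    public static void main(String[] args) {
        System.out.println(isSchemeAccepted(URI.create("https://example.org/data.ttl")));
        System.out.println(isSchemeAccepted(URI.create("ftp://example.org/data.ttl")));

        Map<String, Object> uriData = new HashMap<>();
        // No per-URI value yet, so the fallback is used.
        System.out.println(resolveAcceptHeader(uriData, DEFAULT_ACCEPT_HEADER_STRING));
        // Once the URI defines its own value, that value wins.
        uriData.put(ACCEPT_KEY, "text/turtle");
        System.out.println(resolveAcceptHeader(uriData, DEFAULT_ACCEPT_HEADER_STRING));
    }
}
```

The same fallback pattern would apply to acceptCharset; only the key and the default value change.
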
| Constructor and Description |
|---|
| HTTPFetcher() |
| HTTPFetcher(org.apache.http.impl.client.CloseableHttpClient client) |
| HTTPFetcher(String userAgent) |
| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| File | fetch(org.dice_research.squirrel.data.uri.CrawleableUri uri, Delayer delayer) |
| protected File | requestData(org.dice_research.squirrel.data.uri.CrawleableUri uri, File outputFile) |
| void | setAcceptCharset(String acceptCharset) |
| void | setAcceptHeader(String acceptHeader): Sets the value of the HTTP Accept header field that is used if the given CrawleableUri instance does not define one. |
| void | setDataDirectory(File dataDirectory) |
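
The string passed to setAcceptHeader has to follow section 5.3.2 of RFC 7231, that is, a comma-separated list of media ranges with optional `q` weights. The sketch below shows one way such a string could be assembled. The class `AcceptHeaderSketch` and its method are hypothetical helpers, not part of Squirrel, and the media types are only example values.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Squirrel.
final class AcceptHeaderSketch {

    /**
     * Joins media ranges into an Accept header value. A weight of 1.0 is the
     * default in RFC 7231 and is therefore left out; other weights are written
     * as ";q=" with one decimal digit.
     */
    static String buildAcceptHeader(Map<String, Double> weightedRanges) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Map.Entry<String, Double> entry : weightedRanges.entrySet()) {
            double q = entry.getValue();
            // Also rejects NaN, since both comparisons are false for it.
            if (!(q >= 0.0 && q <= 1.0)) {
                throw new IllegalArgumentException("q must be within [0, 1]: " + q);
            }
            if (q == 1.0) {
                joiner.add(entry.getKey());
            } else {
                joiner.add(entry.getKey() + ";q=" + String.format(Locale.ROOT, "%.1f", q));
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the preferred type comes first.
        Map<String, Double> ranges = new LinkedHashMap<>();
        ranges.put("text/turtle", 1.0);
        ranges.put("application/rdf+xml", 0.9);
        ranges.put("*/*", 0.5);
        System.out.println(buildAcceptHeader(ranges));
    }
}
```

The resulting string could then be handed to setAcceptHeader(String) on a fetcher instance.
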
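private static final org.slf4j.Logger LOGGER

public static final String DEFAULT_ACCEPT_HEADER_STRING
The default HTTP Accept header value which simply accepts everything.

public static final String HTTP_RESPONSE_HEADER_PREFIX
The prefix which is added to HTTP response headers before they are stored in the CrawleableUri's data map.

protected static final Set<String> ACCEPTED_SCHEMES
URI schemes which are accepted by this fetcher (i.e., "http" and "https").

protected String acceptHeader
The value which will be used for the HTTP Accept header if the given CrawleableUri object does not define a header value.

protected String acceptCharset
The value which will be used for the HTTP Accept-Charset header if the given CrawleableUri object does not define a header value.

protected org.apache.http.impl.client.CloseableHttpClient client
The HTTP client instance used by this feature.

protected File dataDirectory
The temporary directory which will be used to store downloaded data.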
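
How HTTP_RESPONSE_HEADER_PREFIX is meant to be used can be illustrated with plain maps. This is a minimal sketch under stated assumptions: the prefix value `"http-response-header-"` is invented here because this page does not give the real constant value, and `ResponseHeaderPrefixSketch` is a hypothetical class, not Squirrel code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the Squirrel implementation.
final class ResponseHeaderPrefixSketch {

    // Assumption: the real constant's value is not documented on this page.
    static final String HTTP_RESPONSE_HEADER_PREFIX = "http-response-header-";

    /** Copies each response header into the URI's data map under a prefixed key. */
    static void storeHeaders(Map<String, String> responseHeaders, Map<String, Object> uriData) {
        for (Map.Entry<String, String> header : responseHeaders.entrySet()) {
            uriData.put(HTTP_RESPONSE_HEADER_PREFIX + header.getKey(), header.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> responseHeaders = new LinkedHashMap<>();
        responseHeaders.put("Content-Type", "text/turtle");

        Map<String, Object> uriData = new HashMap<>();
        storeHeaders(responseHeaders, uriData);
        System.out.println(uriData);
    }
}
```

Prefixing the keys keeps header names from colliding with other entries that a crawler may store in the same data map.
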
public HTTPFetcher()
public HTTPFetcher(String userAgent)
public HTTPFetcher(org.apache.http.impl.client.CloseableHttpClient client)
public File fetch(org.dice_research.squirrel.data.uri.CrawleableUri uri, Delayer delayer)
Specified by: fetch in interface Fetcher

protected File requestData(org.dice_research.squirrel.data.uri.CrawleableUri uri, File outputFile) throws org.apache.http.client.ClientProtocolException, FileNotFoundException, IOException
Throws: org.apache.http.client.ClientProtocolException, FileNotFoundException, IOException

public void setAcceptHeader(String acceptHeader)
The value of the HTTP Accept header field that is used if the given CrawleableUri instance does not define this. Note that the given string has to follow section 5.3.2 of RFC-7231.
Parameters: acceptHeader - the new value of the accept header as defined in RFC-7231.

public void setAcceptCharset(String acceptCharset)
public void setDataDirectory(File dataDirectory)
public void close() throws IOException
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Throws: IOException

Copyright © 2017–2020. All rights reserved.