In Squirrel, the Fetcher is responsible for fetching data from the URIs that the Worker receives from the Frontier.
All fetchers must implement the interface org.dice_research.squirrel.fetcher.Fetcher and provide an implementation of the fetch method, which returns the file that was fetched. The Worker uses a generic bean to manage all fetcher implementations used at runtime. This generic implementation receives the other fetcher beans as a constructor argument, as in the example below:
<bean id="fetcherBean"
    class="org.dice_research.squirrel.fetcher.manage.SimpleOrderedFetcherManager">
    <constructor-arg>
        <list>
            <ref bean="httpFetcherBean" />
            <ref bean="ftpFetcherBean" />
            <ref bean="ckanFetcherBean" />
            <ref bean="sparqlDatasetFetcherBean" />
            <ref bean="sparqlFetcherBean" />
        </list>
    </constructor-arg>
</bean>
Make sure that every referenced bean implements org.dice_research.squirrel.fetcher.Fetcher; otherwise an exception will be thrown and the bean will not be used.
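To make the delegation idea more concrete, here is a minimal, self-contained Java sketch of how an ordered manager could hand a URI to its fetchers one after another. Everything in it is an assumption for illustration: the nested `Fetcher` interface is a simplified stand-in (the real org.dice_research.squirrel.fetcher.Fetcher may have a different signature), `SchemeFetcher` and `OrderedManager` are hypothetical names, and the "first non-null result wins" rule is only one plausible reading of what SimpleOrderedFetcherManager does.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class OrderedFetcherSketch {

    /** Hypothetical, simplified stand-in for Squirrel's Fetcher interface. */
    interface Fetcher {
        /** Returns the fetched file, or null if this fetcher cannot handle the URI. */
        File fetch(String uri);
    }

    /** Illustrative fetcher that only accepts URIs with one scheme. */
    static class SchemeFetcher implements Fetcher {
        private final String scheme;

        SchemeFetcher(String scheme) {
            this.scheme = scheme;
        }

        @Override
        public File fetch(String uri) {
            if (!uri.startsWith(scheme + "://")) {
                return null; // not responsible for this URI
            }
            // A real fetcher would download the data; this sketch only
            // returns a placeholder file handle and writes nothing to disk.
            return new File(scheme + "-result.tmp");
        }
    }

    /** Assumed behavior: try each fetcher in list order, return the first result. */
    static class OrderedManager implements Fetcher {
        private final List<Fetcher> fetchers;

        OrderedManager(List<Fetcher> fetchers) {
            this.fetchers = fetchers;
        }

        @Override
        public File fetch(String uri) {
            for (Fetcher fetcher : fetchers) {
                File result = fetcher.fetch(uri);
                if (result != null) {
                    return result;
                }
            }
            return null; // no fetcher could handle the URI
        }
    }

    public static void main(String[] args) {
        Fetcher manager = new OrderedManager(
                Arrays.asList(new SchemeFetcher("http"), new SchemeFetcher("ftp")));
        // The http fetcher declines this URI, so the ftp fetcher handles it.
        System.out.println(manager.fetch("ftp://example.org/data.ttl"));
        // No registered fetcher accepts this scheme.
        System.out.println(manager.fetch("mailto:someone@example.org"));
    }
}
```

Under this reading, the order of the `<ref>` entries in the list matters: a fetcher placed earlier gets the first chance to handle each URI.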
All of the provided fetchers store the fetched data in a file using an RDF serialization, which, combined with the RDFAnalyzer, is sufficient for most cases. The only exception is the CkanFetcher.
Squirrel offers five fetcher implementations:
<bean id="httpFetcherBean"
    class="org.dice_research.squirrel.fetcher.http.HTTPFetcher" />
<bean id="ftpFetcherBean"
    class="org.dice_research.squirrel.fetcher.ftp.FTPFetcher" />
<bean id="ckanFetcherBean"
    class="org.dice_research.squirrel.fetcher.ckan.java.SimpleCkanFetcher" />
<bean id="sparqlFetcherBean"
    class="org.dice_research.squirrel.fetcher.sparql.SparqlBasedFetcher" />
<bean id="sparqlDatasetFetcherBean"
    class="org.dice_research.squirrel.fetcher.sparql.SparqlDatasetFetcher">
    <constructor-arg index="0" value="20"></constructor-arg>
    <constructor-arg index="1" value="0"></constructor-arg>
    <constructor-arg index="2" value="1000"></constructor-arg>
</bean>
The SparqlDatasetFetcher takes three constructor arguments: the first is the interval (in ms) between calls, and the second and third are the start and end offset indexes, respectively.