Fetchers

In Squirrel, a Fetcher is responsible for fetching data from the URIs that the Worker receives from the Frontier.

All fetchers must implement the interface org.dice_research.squirrel.fetcher.Fetcher and provide an implementation of its fetch method, which returns the file that was fetched. The Worker uses a generic manager bean to manage all fetcher implementations used at runtime. This manager receives the other fetcher beans as a constructor argument, as in the example below:



    <bean id="fetcherBean"
          class="org.dice_research.squirrel.fetcher.manage.SimpleOrderedFetcherManager">
        <constructor-arg>
            <list>
                <ref bean="httpFetcherBean" />
                <ref bean="ftpFetcherBean" />
                <ref bean="ckanFetcherBean" />
                <ref bean="sparqlDatasetFetcherBean" />
                <ref bean="sparqlFetcherBean" />
            </list>
        </constructor-arg>
    </bean>

      

Make sure that every referenced bean implements org.dice_research.squirrel.fetcher.Fetcher; otherwise an exception is thrown and the bean is not used.
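
As a sketch of how a custom fetcher might be wired in, assume a hypothetical class com.example.MyFetcher that implements org.dice_research.squirrel.fetcher.Fetcher. Both that class name and the bean id myFetcherBean are made up for illustration; the manager bean is the one shown above, and httpFetcherBean is assumed to be defined elsewhere in the same configuration.

    <!-- Hypothetical custom fetcher: com.example.MyFetcher is an assumed class
         that implements org.dice_research.squirrel.fetcher.Fetcher. -->
    <bean id="myFetcherBean" class="com.example.MyFetcher" />

    <bean id="fetcherBean"
          class="org.dice_research.squirrel.fetcher.manage.SimpleOrderedFetcherManager">
        <constructor-arg>
            <list>
                <ref bean="httpFetcherBean" />
                <!-- The custom fetcher is referenced like any other fetcher bean. -->
                <ref bean="myFetcherBean" />
            </list>
        </constructor-arg>
    </bean>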

All of the fetchers listed here, with the exception of the CkanFetcher, store the fetched data in a file using an RDF serialization. Combined with the RDFAnalyzer, this is sufficient for most use cases.

Squirrel offers five fetcher implementations:

  • HttpFetcher: fetches files over HTTP, requesting RDF content through the Accept header (application/rdf+xml). Create the bean with:
    
    <bean id="httpFetcherBean"
          class="org.dice_research.squirrel.fetcher.http.HTTPFetcher" />
    

  • FtpFetcher: fetches files from FTP endpoints. Create the bean with:
    
    <bean id="ftpFetcherBean"
          class="org.dice_research.squirrel.fetcher.ftp.FTPFetcher" />
    		

  • CkanFetcher: fetches data from CKAN endpoints and stores it in a file as JSON. This fetcher requires the analyzer org.dice_research.squirrel.analyzer.impl.ckan.CkanJsonAnalyzer; without it, the fetched data is not analyzed and is ignored. Create the bean with:
    
    <bean id="ckanFetcherBean"
          class="org.dice_research.squirrel.fetcher.ckan.java.SimpleCkanFetcher" />
    		

  • SparqlFetcher: fetches all data from a SPARQL endpoint. Create the bean with:
    
    <bean id="sparqlFetcherBean"
          class="org.dice_research.squirrel.fetcher.sparql.SparqlBasedFetcher" />

  • SparqlDatasetFetcher: fetches data from a SPARQL endpoint that describes datasets according to the DCAT ontology (https://www.w3.org/TR/vocab-dcat/). Create the bean with:
    
        <bean id="sparqlDatasetFetcherBean"
              class="org.dice_research.squirrel.fetcher.sparql.SparqlDatasetFetcher">
            <constructor-arg index="0" value="20" />
            <constructor-arg index="1" value="0" />
            <constructor-arg index="2" value="1000" />
        </bean>

    The first argument is the interval (in ms) between calls. The second and third arguments are the start and end offset indexes, respectively.
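
    To make the argument positions concrete, the following sketch repeats the same bean with a comment on each constructor argument. The values 100, 0, and 5000 are illustrative assumptions, not recommended settings.

        <bean id="sparqlDatasetFetcherBean"
              class="org.dice_research.squirrel.fetcher.sparql.SparqlDatasetFetcher">
            <!-- index 0: interval between calls, in ms (assumed value) -->
            <constructor-arg index="0" value="100" />
            <!-- index 1: start offset index -->
            <constructor-arg index="1" value="0" />
            <!-- index 2: end offset index (assumed value) -->
            <constructor-arg index="2" value="5000" />
        </bean>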