Sinks

A sink persists the extracted triples. Every implementation must implement the interface org.dice_research.squirrel.sink.Sink. At the moment, three sinks are implemented:

  • FileSink: stores the triples locally in files. This file-based sink supports several RDF serializations and can compress the created files. Below is an example of its use in the Spring context file. The implementation takes two constructor arguments: a File object pointing to the folder where the data will be persisted, and a boolean value indicating whether the files are compressed.
    
        <bean id="outputFolderBean" class="java.io.File">
          <constructor-arg index="0"
            value="#{systemEnvironment['OUTPUT_FOLDER']}" />
        </bean>
    
        <bean id="sinkBean"
          class="org.dice_research.squirrel.sink.impl.file.FileBasedSink">
          <constructor-arg name="outputDirectory"
            ref="outputFolderBean" />
          <constructor-arg name="useCompression" value="false" />
        </bean>
        

  • HDT Sink: stores the data in the compressed HDT format. For more information about HDT, please check here. Just like the FileSink, the HDT Sink requires a File object as its argument.
    
        <bean id="outputFolderBean" class="java.io.File">
          <constructor-arg index="0"
            value="#{systemEnvironment['OUTPUT_FOLDER']}" />
        </bean>
    
        <bean id="sinkBean"
          class="org.dice_research.squirrel.sink.impl.hdt.HdtBasedSink">
          <constructor-arg name="outputDirectory" ref="outputFolderBean" />
        </bean>
        

  • SPARQL Sink: uses SPARQL queries to insert the new triples into a triple store. This implementation requires five arguments, supplied as constructor-arg elements: the SPARQL endpoint URL, followed by the user name and password for the endpoint. The fourth argument is the interval (in ms) between attempts if storing fails for some reason (a temporary connectivity problem or any other unavailability), and the last argument is the number of attempts that the sink will perform.
    
        <bean id="sinkBean"
          class="org.dice_research.squirrel.sink.impl.sparql.SparqlBasedSink"
          factory-method="create">
          <constructor-arg index="0" value="#{systemEnvironment['SPARQL_URL']}" />
          <constructor-arg index="1" value="#{systemEnvironment['SPARQL_HOST_USER']}" />
          <constructor-arg index="2" value="#{systemEnvironment['SPARQL_HOST_PASSWD']}" />
          <constructor-arg index="3" value="1000" />
          <constructor-arg index="4" value="10" />
        </bean>
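
As a minimal sketch of the boolean argument described for the FileSink, the configuration below sets useCompression to true so that the written files are compressed. It reuses the bean ids, the class name, and the OUTPUT_FOLDER environment variable from the FileSink example above and assumes nothing beyond them.

```xml
<!-- Folder that receives the output files, taken from the environment -->
<bean id="outputFolderBean" class="java.io.File">
  <constructor-arg index="0"
    value="#{systemEnvironment['OUTPUT_FOLDER']}" />
</bean>

<!-- Same file-based sink as above, with compression switched on -->
<bean id="sinkBean"
  class="org.dice_research.squirrel.sink.impl.file.FileBasedSink">
  <constructor-arg name="outputDirectory" ref="outputFolderBean" />
  <constructor-arg name="useCompression" value="true" />
</bean>
```

Only one bean with the id sinkBean should be defined in a given context file, so this definition would replace the uncompressed variant, not sit next to it.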