The Analyzer is responsible for extracting triples from the fetched data. It is possible to run multiple Analyzer beans; they have to be referenced by the generic implementation, as described below:
<bean id="analyzerBean"
      class="org.dice_research.squirrel.analyzer.manager.SimpleAnalyzerManager">
    <constructor-arg index="0">
        <list>
            <ref bean="rdfAnalyzerBean"/>
            <ref bean="ckananalyzerBean"/>
            <ref bean="rdfaAnalyzerBean"/>
            <ref bean="microdataanalyzerBean"/>
            <ref bean="microformatanalyzerBean"/>
            <ref bean="hdtanalyzerBean"/>
            <ref bean="htmlscraperanalyzerBean"/>
        </list>
    </constructor-arg>
</bean>
All analyzers must extend the abstract class org.dice_research.squirrel.analyzer.impl.AbstractAnalyzer and implement the analyze and isElegible methods. The constructor of the analyzer receives an instance of org.dice_research.squirrel.collect.UriCollector. The isElegible method checks whether that analyzer implementation is capable of dealing with the fetched data; if it is, the analyze method is called. The analyze method receives the URI that is being crawled, the fetched file and the chosen sink implementation.
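As an illustration of how an additional analyzer could be wired in, the fragment below is a minimal sketch and is not taken from the Squirrel configuration. The class com.example.MyCustomAnalyzer and the bean ids myCustomAnalyzerBean and uriCollectorBean are hypothetical names; the sketch assumes that a UriCollector bean is already defined elsewhere in the context file.

```xml
<!-- Hypothetical custom analyzer; com.example.MyCustomAnalyzer is assumed to
     extend org.dice_research.squirrel.analyzer.impl.AbstractAnalyzer -->
<bean id="myCustomAnalyzerBean" class="com.example.MyCustomAnalyzer">
    <!-- Assumed id of an existing UriCollector bean, passed to the constructor -->
    <constructor-arg index="0" ref="uriCollectorBean"/>
</bean>
```

For such an analyzer to be used, a matching ref entry pointing at myCustomAnalyzerBean would presumably also have to be added to the list passed to the analyzerBean.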
Currently, the following analyzers are available: