In a nutshell
Tasks: Task 1 – Fact Validation; Task 2 – Fact Validation at Scale
Evaluation: Gerbil, HOBBIT
Training data: 25k positive and negative examples
Testing data: 25k positive and negative examples
System submission: June 15, 2019
Notification of Acceptance: July 08, 2019
Challenge Presentation: October 30, 2019

The International Semantic Web Conference, to be held in Auckland in late October 2019, hosts an annual challenge that aims to promote innovative approaches to creating and using the Semantic Web. This year’s challenge focuses on knowledge graphs. Knowledge graphs, both public and privately owned, are currently among the most prominent implementations of Semantic Web technologies. The challenge is centered on the validation of factual information in a newly generated knowledge graph. It is divided into two tasks:

  • Task One: Fact Validation. Given a statement about an entity, e.g., the indication of a drug, participants are expected to provide an assessment about the correctness of the statement.
  • Task Two: Fact Validation at Scale. In this task the participating systems will be evaluated for their scalability including runtime measurements and their ability to handle parallel requests.

Participants may take part in one or both tasks. For both tasks, participants may use a portion of the knowledge graph for training. They may also use structured and unstructured information from Internet sources to validate facts. The participating systems will be evaluated on the testing portion of the knowledge graph, which is owned by the organizers of the challenge.

Core Data Set

The core dataset consists of a graph of entities (drugs, diseases and products) and information linking these entities. The dataset is created by extracting information from a well-known source and identifying links between entities. The identified links are used to create new properties and generate triple statements. The newly generated properties are well defined by domain and range information and by short descriptions. The dataset has a training part and a testing part: the training part will be made available to the participants for building and training candidate systems, and the testing part will be used to evaluate the participating systems.

Both the training and testing parts are split into positive and negative statements. The positive statements are generated by identifying the entities for which the proposed properties hold. The negative statements are generated by replacing the entities in the positive statements such that the resulting triples are false or invalid. To make the false statements hard to classify as false, we generate negative statements that are similar to the positive ones. To do this, we apply heuristics: for example, the domain and range information is not violated when generating the negative statements. Wherever possible, we apply string similarity metrics and select the negative statements that are most similar to the positive statements.
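The negative-statement heuristics described above can be illustrated with a small sketch. It is not the organizers' generation code: the function names, the toy entity labels, and the choice of `difflib.SequenceMatcher` as the string similarity metric are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; difflib is an assumed stand-in metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def make_negative(triple, candidates, positives):
    """Replace the object of a positive triple with a similar-looking entity.

    triple     -- (subject, property, object) known to be true
    candidates -- entities of the same type as the object, so the range
                  of the property is not violated
    positives  -- set of all known true triples
    Returns a false triple, or None if no valid replacement exists.
    """
    subject, prop, obj = triple
    valid = [
        c for c in candidates
        if c != obj and (subject, prop, c) not in positives
    ]
    if not valid:
        return None
    # Prefer the replacement whose label is closest to the true object.
    best = max(valid, key=lambda c: similarity(c, obj))
    return (subject, prop, best)


# Toy data, purely hypothetical and not taken from the challenge dataset.
positives = {("aspirin", "treats", "headache")}
diseases = ["headache", "head lice", "asthma"]
negative = make_negative(("aspirin", "treats", "headache"), diseases, positives)
print(negative)
```

Because the replacement is drawn only from entities of the correct type, a system cannot reject the negative statement through a simple type check and has to reason about the fact itself.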

Task Descriptions
Task One: Fact Validation

The task is to validate triple statements, i.e., to check whether a given property holds for the subject and object entities. For this task, participants will create an algorithm that takes a triple as input and returns a trust score. The trust score is a numerical value between 0 and 1, where 0 means that the system is sure the statement is false and 1 means that it is sure the statement is true. As mentioned above, to facilitate training, the participants will be provided with the training part of the dataset, consisting of positive and negative statements. The positive and negative statements are labeled with trust scores 1 and 0, respectively. An example graph of an input triple in the training set is as follows:

<> <>> .
<> <> <> .
<> <> <> .
<> <> <> .
<> <> "Lepirudin"@en .                                                              
<> <> "Edoxaban"@en .                                                               
<> <> "0.0"^^<> .

Note that the statements in the training set are labeled with trust scores 0 or 1 using the property <>. It is up to the participants to further split the training set into training and validation sets. The following properties will be used in scoring the systems:


The domain and range information and the descriptions of the above-mentioned properties will be made available with the training set.

Task Two: Fact Validation at Scale

For the second task, the participants must ensure that their systems scale to large amounts of input. The systems must be as fast as possible. The participants are expected to provide their implementations encapsulated in a Docker container. Additionally, the system methods must be wrapped in a System Adapter. Detailed instructions on integrating the system can be found here.
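Before packaging a system, it can be useful to check locally how it behaves under parallel requests. The sketch below is only a rough sanity check under stated assumptions and does not reproduce the official measurements: `score_fact` is a hypothetical placeholder for a real validation system, and `benchmark` is an illustrative helper, not part of any platform API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def score_fact(triple):
    """Hypothetical stand-in for a real fact-validation system."""
    subject, prop, obj = triple
    return 0.5  # uninformative constant trust score


def benchmark(scorer, triples, workers=4):
    """Score all triples with a pool of worker threads and time the run.

    Returns (scores, elapsed_seconds); scores keep the input order
    because Executor.map yields results in submission order.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(scorer, triples))
    elapsed = time.perf_counter() - start
    return scores, elapsed


# Toy input, purely illustrative.
triples = [("s%d" % i, "p", "o") for i in range(1000)]
scores, elapsed = benchmark(score_fact, triples, workers=8)
print("scored %d triples in %.4f s" % (len(scores), elapsed))
```

Threads mainly help when the scorer waits on I/O, such as remote lookups; for CPU-bound scoring in CPython, a process pool is usually the more appropriate choice.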

Train Dataset

The same training data is provided for both tasks. The training data consists of 25k examples (both positive and negative), equally distributed across the five properties mentioned above. The dataset can be accessed here.
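Since splitting the training data into training and validation sets is left to the participants, one reasonable option is a stratified split that preserves the balance across properties and labels. The following is a minimal sketch, assuming the examples have already been parsed into `(subject, property, object, label)` tuples; the tuple layout, the function name, and the 20% validation fraction are assumptions, not part of the challenge specification.

```python
import random
from collections import defaultdict


def stratified_split(examples, validation_fraction=0.2, seed=0):
    """Split examples so each (property, label) group keeps its share.

    examples -- list of (subject, property, object, label) tuples
    Returns (train, validation) lists.
    """
    groups = defaultdict(list)
    for example in examples:
        _, prop, _, label = example
        groups[(prop, label)].append(example)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for key in sorted(groups):
        items = groups[key]
        rng.shuffle(items)
        cut = int(round(len(items) * validation_fraction))
        validation.extend(items[:cut])
        train.extend(items[cut:])
    return train, validation


# Toy data: 2 hypothetical properties x 2 labels x 10 examples each.
examples = [
    ("s%d" % i, prop, "o%d" % i, label)
    for prop in ("propA", "propB")
    for label in (0, 1)
    for i in range(10)
]
train, validation = stratified_split(examples)
print(len(train), len(validation))
```

Stratifying by both property and label keeps the validation set representative, so an AUC computed on it is more likely to reflect performance on the test data.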


Task One: Fact Validation

For solution scoring, the test dataset containing positive and negative statements will be provided to the participants. The test inputs contain statements similar to those in the training input (see the example above), except for the statements that indicate the trust score, which are known only to the dataset owners. The participants are expected to provide a file containing triple statements with trust scores, one for each input. For example, a system could return:

<> <> "0.0"^^<> .

The solutions will be scored by using the area under the ROC curve (AUC).
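To estimate this metric on a validation set before submitting, the AUC can be computed directly from labels and trust scores. The sketch below uses the rank-sum (Mann-Whitney) formulation with average ranks for ties; it is an illustrative implementation, not the organizers' evaluation code, and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (0 or 1).

    Equals the probability that a randomly chosen positive statement
    receives a higher score than a randomly chosen negative one,
    counting ties as one half.
    """
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # Assign 1-based ranks by ascending score, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    positive_rank_sum = sum(r for r, label in zip(ranks, labels) if label == 1)
    return (positive_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0, perfect ranking
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))  # 0.75
print(roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.5, constant scores
```

Because AUC depends only on the ordering of the scores, a system does not need calibrated probabilities to score well; it only needs to rank true statements above false ones.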

Task Two: Fact Validation at Scale

For the evaluation of task two, the participants must provide their implementations as described in the task description section. The systems will be evaluated using a combination of AUC-ROC and runtime measurements.

Task One: Fact Validation

The participants can submit their system results via Gerbil. Note that only submissions published to the leaderboard will be accepted into the competition.

Task Two: Fact Validation at Scale

Submissions for task two can be made via the HOBBIT platform. System metadata files (system.ttl) should use <> as API URI.

The winning team for each task will be invited to present its implementation at the conference. Any submitting team may also provide a poster for display at the conference, accompanied by a 3-4 page paper. The challenge committee is working to have these collected in special proceedings.


The following timelines must be observed both by the organizers and participants of the challenge:

March 15, 2019 : Release of Dataset, competition begins.
June 15, 2019 : Submission of Systems.
July 08, 2019 : System Results and Notification of Acceptance.
October 30, 2019 : Challenge Presentation.


The participants will present their systems in the challenge session at ISWC 2019. Details of the session can be found below:

14:00 - 14:10 Challenge description, presentation of results.
14:10 - 14:30 An Ensemble Model for Fact Validation of Knowledge Graph by Thinh et al.
14:30 - 14:50 Fact Validation with Knowledge Graph Embeddings by Ammar et al.
14:50 - 15:00 Handing out participation certificates, discussion.