dicee.evaluation.evaluator

Main Evaluator class for KGE model evaluation.

This module provides the Evaluator class, which orchestrates the evaluation of knowledge graph embedding (KGE) models across different datasets and scoring techniques.

Classes

Evaluator

Evaluator class for KGE models in various downstream tasks.

Module Contents

class dicee.evaluation.evaluator.Evaluator(args, is_continual_training: bool = False)

Evaluator class for KGE models in various downstream tasks.

Orchestrates link prediction evaluation with different scoring techniques including standard evaluation and byte-pair encoding based evaluation.

er_vocab

Entity-relation to tail vocabulary for filtered ranking.

re_vocab

Relation and tail entity to head vocabulary for filtered ranking.

ee_vocab

Entity-entity to relation vocabulary.

num_entities

Total number of entities in the knowledge graph.

num_relations

Total number of relations in the knowledge graph.

args

Configuration arguments.

report

Dictionary storing evaluation results.

during_training

Whether evaluation is happening during training.

Example

>>> from dicee.evaluation import Evaluator
>>> evaluator = Evaluator(args)
>>> results = evaluator.eval(dataset, model, 'EntityPrediction')
>>> print(f"Test MRR: {results['Test']['MRR']:.4f}")
re_vocab: Dict | None = None
er_vocab: Dict | None = None
ee_vocab: Dict | None = None
func_triple_to_bpe_representation = None
is_continual_training = False
num_entities: int | None = None
num_relations: int | None = None
domain_constraints_per_rel = None
range_constraints_per_rel = None
args
report: Dict
during_training = False
vocab_preparation(dataset) → None

Prepare vocabularies from the dataset for evaluation.

Resolves any future objects and saves vocabularies to disk.

Parameters:

dataset – Knowledge graph dataset with vocabulary attributes.
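The exact construction is internal to the library, but the filtered-ranking vocabularies can be pictured as dictionaries keyed by index pairs. A minimal sketch (not the library's implementation), assuming integer-indexed triples and the (head, relation) / (relation, tail) / (head, tail) key layout described in the attribute list above:

>>> from collections import defaultdict
>>> triples = [(0, 0, 1), (0, 0, 2), (1, 1, 0)]  # illustrative (head, relation, tail) indices
>>> er_vocab, re_vocab, ee_vocab = defaultdict(list), defaultdict(list), defaultdict(list)
>>> for h, r, t in triples:
...     er_vocab[(h, r)].append(t)  # (head, relation) -> known tails
...     re_vocab[(r, t)].append(h)  # (relation, tail) -> known heads
...     ee_vocab[(h, t)].append(r)  # (head, tail) -> known relations
>>> er_vocab[(0, 0)]
[1, 2]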

eval(dataset, trained_model, form_of_labelling: str, during_training: bool = False) → Dict | None

Evaluate the trained model on the dataset.

Parameters:
  • dataset – Knowledge graph dataset (KG instance).

  • trained_model – The trained KGE model.

  • form_of_labelling – Type of labelling (‘EntityPrediction’ or ‘RelationPrediction’).

  • during_training – Whether evaluation is during training.

Returns:

Dictionary of evaluation metrics, or None if evaluation is skipped.
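A hedged usage sketch of the two call modes; kg and model stand in for a loaded knowledge graph dataset and a trained KGE model and are not names defined by this module:

>>> evaluator = Evaluator(args)
>>> final_report = evaluator.eval(kg, model, 'EntityPrediction')          # final evaluation after training
>>> evaluator.eval(kg, model, 'EntityPrediction', during_training=True)   # periodic evaluation inside a training loop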

eval_rank_of_head_and_tail_entity(*, train_set, valid_set=None, test_set=None, trained_model) → None

Evaluate with negative sampling scoring.

eval_rank_of_head_and_tail_byte_pair_encoded_entity(*, train_set=None, valid_set=None, test_set=None, ordered_bpe_entities, trained_model) → None

Evaluate with BPE-encoded entities and negative sampling.

eval_with_byte(*, raw_train_set, raw_valid_set=None, raw_test_set=None, trained_model, form_of_labelling) → None

Evaluate a BytE model with text generation.

eval_with_bpe_vs_all(*, raw_train_set, raw_valid_set=None, raw_test_set=None, trained_model, form_of_labelling) → None

Evaluate with BPE and KvsAll scoring.

eval_with_vs_all(*, train_set, valid_set=None, test_set=None, trained_model, form_of_labelling) → None

Evaluate with KvsAll or 1vsAll scoring.

evaluate_lp_k_vs_all(model, triple_idx, info: str = None, form_of_labelling: str = None) → Dict[str, float]

Filtered link prediction evaluation with KvsAll scoring.

Parameters:
  • model – The trained model to evaluate.

  • triple_idx – Integer-indexed test triples.

  • info – Description to print.

  • form_of_labelling – ‘EntityPrediction’ or ‘RelationPrediction’.

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.
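The filtered ranking behind these metrics can be illustrated in miniature. This is a simplified sketch, not the library's code: for a test triple, every candidate tail is scored, the other tails already known for the same (head, relation) pair (as recorded in er_vocab) are masked out, and the rank of the true tail contributes to MRR and Hits@k.

>>> scores = {0: 0.1, 1: 0.9, 2: 0.7, 3: 0.2}   # hypothetical model scores over all candidate tails of (h, r)
>>> true_tail, other_known_tails = 2, [1]       # tail 1 also appears with (h, r) in the data, so it is filtered
>>> for t in other_known_tails:
...     scores[t] = float('-inf')
>>> rank = 1 + sum(s > scores[true_tail] for s in scores.values())
>>> rank, 1.0 / rank, int(rank <= 3)            # rank, reciprocal rank, Hits@3 contribution
(1, 1.0, 1)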

evaluate_lp_with_byte(model, triples: List[List[str]], info: str = None) → Dict[str, float]

Evaluate a BytE model with text generation.

Parameters:
  • model – BytE model.

  • triples – String triples.

  • info – Description to print.

Returns:

Dictionary with placeholder metrics (-1 values).

evaluate_lp_bpe_k_vs_all(model, triples: List[List[str]], info: str = None, form_of_labelling: str = None) → Dict[str, float]

Evaluate BPE model with KvsAll scoring.

Parameters:
  • model – BPE-enabled model.

  • triples – String triples.

  • info – Description to print.

  • form_of_labelling – Type of labelling.

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.

evaluate_lp(model, triple_idx, info: str) → Dict[str, float]

Evaluate link prediction with negative sampling.

Parameters:
  • model – The model to evaluate.

  • triple_idx – Integer-indexed triples.

  • info – Description to print.

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.
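The aggregation from individual ranks to the reported metrics is simple arithmetic; a worked toy example with made-up ranks:

>>> ranks = [1, 3, 12, 2]                          # hypothetical filtered ranks collected over head and tail queries
>>> mrr = sum(1.0 / r for r in ranks) / len(ranks)
>>> hits_at_3 = sum(r <= 3 for r in ranks) / len(ranks)
>>> round(mrr, 4), hits_at_3
(0.4792, 0.75)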

dummy_eval(trained_model, form_of_labelling: str) → None

Run evaluation from saved data (for continual training).

Parameters:
  • trained_model – The trained model.

  • form_of_labelling – Type of labelling.

eval_with_data(dataset, trained_model, triple_idx: numpy.ndarray, form_of_labelling: str) → Dict[str, float]

Evaluate a trained model on an explicitly provided set of triples from the dataset.

Parameters:
  • dataset – Knowledge graph dataset.

  • trained_model – The trained model.

  • triple_idx – Integer-indexed triples to evaluate.

  • form_of_labelling – Type of labelling.

Returns:

Dictionary with evaluation metrics.

Raises:

ValueError – If the scoring technique is invalid.
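A hedged usage sketch for evaluating on a custom triple array; evaluator, kg and model stand in for objects created as in the class-level example, and the 'MRR' key is assumed from the return description above:

>>> import numpy as np
>>> custom_triples = np.array([[0, 0, 1], [2, 1, 3]])   # integer-indexed (head, relation, tail) rows
>>> metrics = evaluator.eval_with_data(kg, model, custom_triples, 'EntityPrediction')
>>> print(f"MRR on custom triples: {metrics['MRR']:.4f}")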