dicee.evaluation.evaluator

Main Evaluator class for KGE model evaluation.

This module provides the Evaluator class, which orchestrates the evaluation of knowledge graph embedding (KGE) models across different datasets and scoring techniques.

Classes

Evaluator

Evaluator class for KGE models in various downstream tasks.

Module Contents

class dicee.evaluation.evaluator.Evaluator(args, is_continual_training: bool = False)

Evaluator class for KGE models in various downstream tasks.

Orchestrates link prediction evaluation with different scoring techniques including standard evaluation and byte-pair encoding based evaluation.

er_vocab

Entity-relation to tail vocabulary for filtered ranking.

re_vocab

Relation and tail entity to head vocabulary for filtered ranking.

ee_vocab

Entity-entity to relation vocabulary.

num_entities

Total number of entities in the knowledge graph.

num_relations

Total number of relations in the knowledge graph.

args

Configuration arguments.

report

Dictionary storing evaluation results.

during_training

Whether evaluation is happening during training.

Example

>>> from dicee.evaluation import Evaluator
>>> evaluator = Evaluator(args)
>>> results = evaluator.eval(dataset, model, 'EntityPrediction')
>>> print(f"Test MRR: {results['Test']['MRR']:.4f}")
re_vocab: Dict | None = None
er_vocab: Dict | None = None
ee_vocab: Dict | None = None
func_triple_to_bpe_representation = None
is_continual_training = False
num_entities: int | None = None
num_relations: int | None = None
domain_constraints_per_rel = None
range_constraints_per_rel = None
args
report: Dict
during_training = False
vocab_preparation(dataset) → None

Prepare vocabularies from the dataset for evaluation.

Resolves any future objects and saves vocabularies to disk.

Parameters:

dataset – Knowledge graph dataset with vocabulary attributes.
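The exact construction is internal to the library, but the filtered-ranking vocabularies can be pictured as dictionaries keyed by index pairs. A minimal sketch (not the library's implementation), assuming integer-indexed triples and the (head, relation) / (relation, tail) / (head, tail) key layout described in the attribute list above:

>>> from collections import defaultdict
>>> triples = [(0, 0, 1), (0, 0, 2), (1, 1, 0)]  # illustrative (head, relation, tail) indices
>>> er_vocab, re_vocab, ee_vocab = defaultdict(list), defaultdict(list), defaultdict(list)
>>> for h, r, t in triples:
...     er_vocab[(h, r)].append(t)  # (head, relation) -> known tails
...     re_vocab[(r, t)].append(h)  # (relation, tail) -> known heads
...     ee_vocab[(h, t)].append(r)  # (head, tail) -> known relations
>>> er_vocab[(0, 0)]
[1, 2]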

eval(dataset, trained_model, form_of_labelling: str, during_training: bool = False) → Dict | None

Evaluate the trained model on the dataset.

Parameters:
  • dataset – Knowledge graph dataset (KG instance).

  • trained_model – The trained KGE model.

  • form_of_labelling – Type of labelling (‘EntityPrediction’ or ‘RelationPrediction’).

  • during_training – Whether evaluation is during training.

Returns:

Dictionary of evaluation metrics, or None if evaluation is skipped.
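A hedged usage sketch of the two call modes; kg and model stand in for a loaded knowledge graph dataset and a trained KGE model and are not names defined by this module:

>>> evaluator = Evaluator(args)
>>> final_report = evaluator.eval(kg, model, 'EntityPrediction')          # final evaluation after training
>>> evaluator.eval(kg, model, 'EntityPrediction', during_training=True)   # periodic evaluation inside a training loop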

eval_rank_of_head_and_tail_entity(*, train_set, valid_set=None, test_set=None, trained_model) → None

Evaluate with negative sampling scoring.

eval_rank_of_head_and_tail_byte_pair_encoded_entity(*, train_set=None, valid_set=None, test_set=None, ordered_bpe_entities, trained_model) → None

Evaluate with BPE-encoded entities and negative sampling.

eval_with_byte(*, raw_train_set, raw_valid_set=None, raw_test_set=None, trained_model, form_of_labelling) → None

Evaluate a BytE model with text generation.

eval_with_bpe_vs_all(*, raw_train_set, raw_valid_set=None, raw_test_set=None, trained_model, form_of_labelling) → None

Evaluate with BPE and KvsAll scoring.

eval_with_vs_all(*, train_set, valid_set=None, test_set=None, trained_model, form_of_labelling) → None

Evaluate with KvsAll or 1vsAll scoring.

evaluate_lp_k_vs_all(model, triple_idx, info: str = None, form_of_labelling: str = None) → Dict[str, float]

Filtered link prediction evaluation with KvsAll scoring.

Parameters:
  • model – The trained model to evaluate.

  • triple_idx – Integer-indexed test triples.

  • info – Description to print.

  • form_of_labelling – ‘EntityPrediction’ or ‘RelationPrediction’.

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.
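The filtered ranking behind these metrics can be illustrated in miniature. This is a simplified sketch, not the library's code: for a test triple, every candidate tail is scored, the other tails already known for the same (head, relation) pair (as recorded in er_vocab) are masked out, and the rank of the true tail contributes to MRR and Hits@k.

>>> scores = {0: 0.1, 1: 0.9, 2: 0.7, 3: 0.2}   # hypothetical model scores over all candidate tails of (h, r)
>>> true_tail, other_known_tails = 2, [1]       # tail 1 also appears with (h, r) in the data, so it is filtered
>>> for t in other_known_tails:
...     scores[t] = float('-inf')
>>> rank = 1 + sum(s > scores[true_tail] for s in scores.values())
>>> rank, 1.0 / rank, int(rank <= 3)            # rank, reciprocal rank, Hits@3 contribution
(1, 1.0, 1)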

evaluate_lp_with_byte(model, triples: List[List[str]], info: str = None) → Dict[str, float]

Evaluate a BytE model with text generation.

Parameters:
  • model – BytE model.

  • triples – String triples.

  • info – Description to print.

Returns:

Dictionary with placeholder metrics (-1 values).

evaluate_lp_bpe_k_vs_all(model, triples: List[List[str]], info: str = None, form_of_labelling: str = None) → Dict[str, float]

Evaluate BPE model with KvsAll scoring.

Parameters:
  • model – BPE-enabled model.

  • triples – String triples.

  • info – Description to print.

  • form_of_labelling – Type of labelling.

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.

evaluate_lp(model, triple_idx, info: str) → Dict[str, float]

Evaluate link prediction with negative sampling.

Parameters:
  • model – The model to evaluate.

  • triple_idx – Integer-indexed triples.

  • info – Description to print.

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.
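The aggregation from individual ranks to the reported metrics is simple arithmetic; a worked toy example with made-up ranks:

>>> ranks = [1, 3, 12, 2]                          # hypothetical filtered ranks collected over head and tail queries
>>> mrr = sum(1.0 / r for r in ranks) / len(ranks)
>>> hits_at_3 = sum(r <= 3 for r in ranks) / len(ranks)
>>> round(mrr, 4), hits_at_3
(0.4792, 0.75)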

dummy_eval(trained_model, form_of_labelling: str) → None

Run evaluation from saved data (for continual training).

Parameters:
  • trained_model – The trained model.

  • form_of_labelling – Type of labelling.

eval_with_data(dataset, trained_model, triple_idx: numpy.ndarray, form_of_labelling: str) → Dict[str, float]

Evaluate a trained model on an explicitly provided set of triples from the dataset.

Parameters:
  • dataset – Knowledge graph dataset.

  • trained_model – The trained model.

  • triple_idx – Integer-indexed triples to evaluate.

  • form_of_labelling – Type of labelling.

Returns:

Dictionary with evaluation metrics.

Raises:

ValueError – If the scoring technique is invalid.
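A hedged usage sketch for evaluating on a custom triple array; evaluator, kg and model stand in for objects created as in the class-level example, and the 'MRR' key is assumed from the return description above:

>>> import numpy as np
>>> custom_triples = np.array([[0, 0, 1], [2, 1, 3]])   # integer-indexed (head, relation, tail) rows
>>> metrics = evaluator.eval_with_data(kg, model, custom_triples, 'EntityPrediction')
>>> print(f"MRR on custom triples: {metrics['MRR']:.4f}")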