dicee.evaluation.ensemble
Ensemble evaluation functions.
This module provides functions for evaluating ensembles of KGE models, including weighted averaging of predictions and score normalization.
Functions
evaluate_ensemble_link_prediction_performance – Evaluate link prediction performance of an ensemble of KGE models.
Module Contents
- dicee.evaluation.ensemble.evaluate_ensemble_link_prediction_performance(models: List, triples, er_vocab: Dict[Tuple, List], weights: List[float] | None = None, batch_size: int = 512, weighted_averaging: bool = True, normalize_scores: bool = True) → Dict[str, float]
Evaluate link prediction performance of an ensemble of KGE models.
Combines predictions from multiple models using weighted or simple averaging, with optional score normalization.
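A minimal sketch of the combination step, assuming each model returns a (batch_size, num_entities) score tensor over all candidate tails for a batch of (head, relation) index pairs; the helper name combine_scores and the per-row min-max normalization shown here are illustrative, not part of the dicee API:

import torch

def combine_scores(models, x_batch, weights=None, normalize=True):
    # Illustrative helper: average per-model tail scores for a batch of
    # (head, relation) index pairs. Assumes model(x_batch) returns a
    # (batch_size, num_entities) score tensor.
    combined = None
    for i, model in enumerate(models):
        scores = model(x_batch)
        if normalize:
            # Min-max normalize each row to [0, 1] before averaging so that
            # models with different score scales contribute comparably.
            mins = scores.min(dim=1, keepdim=True).values
            maxs = scores.max(dim=1, keepdim=True).values
            scores = (scores - mins) / (maxs - mins + 1e-12)
        w = weights[i] if weights is not None else 1.0 / len(models)
        combined = w * scores if combined is None else combined + w * scores
    return combined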
- Parameters:
models – List of KGE models (e.g., snapshots from training).
triples – Test triples as numpy array or list, shape (N, 3), with integer indices (head, relation, tail).
er_vocab – Mapping (head_idx, rel_idx) -> list of tail indices for filtered evaluation (a construction sketch is given before the example below).
weights – Weights for model averaging. Required if weighted_averaging is True. Must sum to 1 for proper averaging.
batch_size – Batch size for processing triples.
weighted_averaging – If True, use weighted averaging of predictions. If False, use simple mean.
normalize_scores – If True, normalize scores to [0, 1] range per sample before averaging.
- Returns:
Dictionary with H@1, H@3, H@10, and MRR metrics (a sketch of how these follow from ranks appears after the example).
- Raises:
AssertionError – If weighted_averaging is True but weights is missing or its length does not match the number of models.
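For the er_vocab argument, a typical construction collects every known tail for each (head, relation) pair, usually over the union of train, validation, and test triples; build_er_vocab below is a hypothetical helper shown only to illustrate the expected structure:

from collections import defaultdict
import numpy as np

def build_er_vocab(triples: np.ndarray) -> dict:
    # Illustrative: map each (head, relation) pair to all known tail indices,
    # so that known positives can be filtered out during ranking.
    er_vocab = defaultdict(list)
    for h, r, t in triples:
        er_vocab[(int(h), int(r))].append(int(t))
    return er_vocab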
Example
>>> from dicee.evaluation import evaluate_ensemble_link_prediction_performance
>>> models = [model1, model2, model3]
>>> weights = [0.5, 0.3, 0.2]
>>> results = evaluate_ensemble_link_prediction_performance(
...     models, test_triples, er_vocab,
...     weights=weights, weighted_averaging=True
... )
>>> print(f"MRR: {results['MRR']:.4f}")
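The returned dictionary contains standard filtered ranking metrics. A minimal sketch of how they can be derived from 1-based filtered ranks; ranks_to_metrics is an illustrative helper, not part of dicee:

import numpy as np

def ranks_to_metrics(ranks: np.ndarray) -> dict:
    # Illustrative: ranks are the 1-based positions of the true tail
    # after filtering out other known positives.
    return {
        "H@1": float(np.mean(ranks <= 1)),
        "H@3": float(np.mean(ranks <= 3)),
        "H@10": float(np.mean(ranks <= 10)),
        "MRR": float(np.mean(1.0 / ranks)),
    }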