dicee.evaluation.ensemble

Ensemble evaluation functions.

This module provides functions for evaluating ensembles of knowledge graph embedding (KGE) models, supporting weighted averaging of predictions and optional score normalization.

Functions

evaluate_ensemble_link_prediction_performance(...)

Evaluate link prediction performance of an ensemble of KGE models.

Module Contents

evaluate_ensemble_link_prediction_performance(...)

Evaluate link prediction performance of an ensemble of KGE models.

Combines predictions from multiple models using weighted or simple averaging, with optional score normalization.
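
To make the combination step concrete, here is a minimal sketch of the averaging logic. It is an illustration rather than the library's actual implementation; the helper name combine_scores, the tensor layout, and the per-row min-max normalization scheme are assumptions:

import torch

def combine_scores(scores_per_model, weights=None, normalize=False):
    # scores_per_model: list of tensors, one per ensemble member,
    # each of shape (batch_size, num_entities) with raw tail scores.
    combined = torch.zeros_like(scores_per_model[0])
    for i, scores in enumerate(scores_per_model):
        if normalize:
            # Min-max normalize each row to [0, 1] so that models with
            # different score scales contribute comparably.
            mins = scores.min(dim=1, keepdim=True).values
            maxs = scores.max(dim=1, keepdim=True).values
            scores = (scores - mins) / (maxs - mins + 1e-12)
        # Fall back to a simple mean when no weights are given.
        w = weights[i] if weights is not None else 1.0 / len(scores_per_model)
        combined = combined + w * scores
    return combined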

Parameters:
  • models – List of KGE models (e.g., snapshots from training).

  • triples – Test triples as numpy array or list, shape (N, 3), with integer indices (head, relation, tail).

  • er_vocab – Mapping (head_idx, rel_idx) -> list of tail indices for filtered evaluation; a construction sketch follows this parameter list.

  • weights – Weights for model averaging. Required if weighted_averaging is True; the length must match the number of models, and the weights should sum to 1 for a proper weighted mean.

  • batch_size – Batch size for processing triples.

  • weighted_averaging – If True, use weighted averaging of predictions. If False, use simple mean.

  • normalize_scores – If True, normalize scores to [0, 1] range per sample before averaging.
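
As noted in the er_vocab entry above, the mapping can be built directly from the full set of known triples. A minimal sketch follows; the helper name build_er_vocab and the defaultdict layout are assumptions, and the library may provide its own constructor:

from collections import defaultdict

import numpy as np

def build_er_vocab(triples: np.ndarray) -> dict:
    # Map each (head, relation) pair to every known tail index so that
    # other true tails can be filtered out before ranking a test triple.
    er_vocab = defaultdict(list)
    for h, r, t in triples:
        er_vocab[(int(h), int(r))].append(int(t))
    return er_vocab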

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.
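
Each metric is a function of the filtered rank of the true tail entity. For reference, a minimal sketch of that mapping, assuming ranks is a 1-based array of filtered ranks (the helper name rank_metrics is hypothetical):

import numpy as np

def rank_metrics(ranks: np.ndarray) -> dict:
    # ranks: 1-based filtered ranks of the true tails, one per test triple.
    return {
        'H@1': float(np.mean(ranks <= 1)),
        'H@3': float(np.mean(ranks <= 3)),
        'H@10': float(np.mean(ranks <= 10)),
        'MRR': float(np.mean(1.0 / ranks)),
    }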

Raises:

AssertionError – If weighted_averaging is True but weights is missing or its length does not match the number of models.

Example

>>> from dicee.evaluation.ensemble import evaluate_ensemble_link_prediction_performance
>>> models = [model1, model2, model3]  # trained KGE models, e.g. training snapshots
>>> weights = [0.5, 0.3, 0.2]  # one weight per model, summing to 1
>>> results = evaluate_ensemble_link_prediction_performance(
...     models, test_triples, er_vocab,
...     weights=weights, weighted_averaging=True
... )
>>> print(f"MRR: {results['MRR']:.4f}")