dicee.evaluation.ensemble
Ensemble evaluation functions.
This module provides functions for evaluating ensembles of KGE models, including weighted averaging of predictions and score normalization.
Functions
evaluate_ensemble_link_prediction_performance – Evaluate link prediction performance of an ensemble of KGE models.
Module Contents
- dicee.evaluation.ensemble.evaluate_ensemble_link_prediction_performance(models: List, triples, er_vocab: Dict[Tuple, List], weights: List[float] | None = None, batch_size: int = 512, weighted_averaging: bool = True, normalize_scores: bool = True) → Dict[str, float]
Evaluate link prediction performance of an ensemble of KGE models.
Combines predictions from multiple models using weighted or simple averaging, with optional score normalization.
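A minimal sketch of the combination step, assuming each model returns a (batch_size, num_entities) score tensor over all candidate tails for a batch of (head, relation) index pairs; the helper name combine_scores and the per-row min-max normalization shown here are illustrative, not part of the dicee API:

import torch

def combine_scores(models, x_batch, weights=None, normalize=True):
    # Illustrative helper: average per-model tail scores for a batch of
    # (head, relation) index pairs. Assumes model(x_batch) returns a
    # (batch_size, num_entities) score tensor.
    combined = None
    for i, model in enumerate(models):
        scores = model(x_batch)
        if normalize:
            # Min-max normalize each row to [0, 1] before averaging so that
            # models with different score scales contribute comparably.
            mins = scores.min(dim=1, keepdim=True).values
            maxs = scores.max(dim=1, keepdim=True).values
            scores = (scores - mins) / (maxs - mins + 1e-12)
        w = weights[i] if weights is not None else 1.0 / len(models)
        combined = w * scores if combined is None else combined + w * scores
    return combined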
- Parameters:
models – List of KGE models (e.g., snapshots from training).
triples – Test triples as numpy array or list, shape (N, 3), with integer indices (head, relation, tail).
er_vocab – Mapping (head_idx, rel_idx) -> list of tail indices for filtered evaluation (a construction sketch is given before the example below).
weights – Weights for model averaging. Required if weighted_averaging is True. Must sum to 1 for proper averaging.
batch_size – Batch size for processing triples.
weighted_averaging – If True, use weighted averaging of predictions. If False, use simple mean.
normalize_scores – If True, normalize scores to [0, 1] range per sample before averaging.
- Returns:
Dictionary with H@1, H@3, H@10, and MRR metrics (a sketch of how these follow from ranks appears after the example).
- Raises:
AssertionError – If weighted_averaging is True but weights is missing or its length does not match the number of models.
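For the er_vocab argument, a typical construction collects every known tail for each (head, relation) pair, usually over the union of train, validation, and test triples; build_er_vocab below is a hypothetical helper shown only to illustrate the expected structure:

from collections import defaultdict
import numpy as np

def build_er_vocab(triples: np.ndarray) -> dict:
    # Illustrative: map each (head, relation) pair to all known tail indices,
    # so that known positives can be filtered out during ranking.
    er_vocab = defaultdict(list)
    for h, r, t in triples:
        er_vocab[(int(h), int(r))].append(int(t))
    return er_vocab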
Example
>>> from dicee.evaluation import evaluate_ensemble_link_prediction_performance
>>> models = [model1, model2, model3]
>>> weights = [0.5, 0.3, 0.2]
>>> results = evaluate_ensemble_link_prediction_performance(
...     models, test_triples, er_vocab,
...     weights=weights, weighted_averaging=True
... )
>>> print(f"MRR: {results['MRR']:.4f}")
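The returned dictionary contains standard filtered ranking metrics. A minimal sketch of how they can be derived from 1-based filtered ranks; ranks_to_metrics is an illustrative helper, not part of dicee:

import numpy as np

def ranks_to_metrics(ranks: np.ndarray) -> dict:
    # Illustrative: ranks are the 1-based positions of the true tail
    # after filtering out other known positives.
    return {
        "H@1": float(np.mean(ranks <= 1)),
        "H@3": float(np.mean(ranks <= 3)),
        "H@10": float(np.mean(ranks <= 10)),
        "MRR": float(np.mean(1.0 / ranks)),
    }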