dicee.static_funcs

Static utility functions for DICE embeddings.

This module provides utility functions for model initialization, data loading, serialization, and various helper operations.

Attributes

MODEL_REGISTRY

Functions

create_recipriocal_triples(→ pandas.DataFrame)

Add inverse triples to a DataFrame.

get_er_vocab(→ Dict[Tuple[int, int], List[int]])

Build entity-relation to tail vocabulary.

get_re_vocab(→ Dict[Tuple[int, int], List[int]])

Build relation-entity (tail) to head vocabulary.

get_ee_vocab(→ Dict[Tuple[int, int], List[int]])

Build entity-entity to relation vocabulary.

timeit(→ Callable)

Decorator to measure and print execution time and memory usage.

save_pickle(→ None)

Save data to a pickle file.

load_pickle(→ object)

Load data from a pickle file.

load_term_mapping(→ Union[dict, polars.DataFrame])

Load term-to-index mapping from pickle or CSV file.

select_model(args[, is_continual_training, storage_path])

load_model(→ Tuple[object, Tuple[dict, dict]])

Load weights and initialize a PyTorch module from namespace arguments.

load_model_ensemble(...)

Construct an ensemble of weights and initialize a PyTorch module from namespace arguments.

save_numpy_ndarray(*, data, file_path)

numpy_data_type_changer(→ numpy.ndarray)

Detect the most efficient data type for a given set of triples.

save_checkpoint_model(→ None)

Store a PyTorch model on disk.

store(→ None)

add_noisy_triples(→ pandas.DataFrame)

Add randomly constructed triples.

read_or_load_kg(args, cls)

intialize_model(...)

Initialize a knowledge graph embedding model.

load_json(→ Dict)

Load JSON file into a dictionary.

save_embeddings(→ None)

Save embeddings to a CSV file.

vocab_to_parquet(vocab_to_idx, name, ...)

create_experiment_folder(→ str)

Create a timestamped experiment folder.

continual_training_setup_executor(→ None)

exponential_function(→ torch.FloatTensor)

load_numpy(→ numpy.ndarray)

evaluate(entity_to_idx, scores, easy_answers, hard_answers)

Evaluate multi-hop query answering on different query types.

download_file(url[, destination_folder])

download_files_from_url(→ None)

download_pretrained_model(→ str)

write_csv_from_model_parallel(path)

Create

from_pretrained_model_write_embeddings_into_csv(→ None)

Module Contents

dicee.static_funcs.MODEL_REGISTRY: Dict[str, Tuple[Type, str]]
dicee.static_funcs.create_recipriocal_triples(df: pandas.DataFrame) pandas.DataFrame

Add inverse triples to a DataFrame.

For each triple (s, p, o), creates an inverse triple (o, p_inverse, s).

Parameters:

df – DataFrame with ‘subject’, ‘relation’, and ‘object’ columns.

Returns:

DataFrame with original and inverse triples concatenated.
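A minimal sketch of the reciprocal-triple construction described above, assuming the inverse relation is named by appending an `_inverse` suffix (inferred from the `p_inverse` notation, not confirmed against the library source). The function name `create_reciprocal_triples_sketch` is hypothetical.

```python
import pandas as pd


def create_reciprocal_triples_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Append (o, p_inverse, s) for every (s, p, o) row of df."""
    # Swap subject and object; tag each relation with an assumed "_inverse" suffix.
    inverse = pd.DataFrame({
        "subject": df["object"],
        "relation": df["relation"].astype(str) + "_inverse",
        "object": df["subject"],
    })
    # Original rows first, then their inverses, with a fresh 0..n-1 index.
    return pd.concat([df, inverse], ignore_index=True)


triples = pd.DataFrame({"subject": ["alice"], "relation": ["knows"], "object": ["bob"]})
out = create_reciprocal_triples_sketch(triples)
print(out)
```

Doubling the training set this way lets a model that only predicts tails also learn to predict heads, since every head query (?, p, o) becomes a tail query (o, p_inverse, ?).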

dicee.static_funcs.get_er_vocab(data: numpy.ndarray, file_path: str | None = None) Dict[Tuple[int, int], List[int]]

Build entity-relation to tail vocabulary.

Parameters:
  • data – Array of triples with shape (n, 3) where columns are (head, relation, tail).

  • file_path – Optional path to save the vocabulary as pickle.

Returns:

Dictionary mapping (head, relation) pairs to list of tail entities.
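A minimal sketch of how such an entity-relation vocabulary can be built from an integer triple array, including the optional pickle export. The name `er_vocab_sketch` is hypothetical, and returning a plain `dict` (rather than a `defaultdict`) is a design choice of this sketch, not necessarily the library's.

```python
import pickle
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

import numpy as np


def er_vocab_sketch(data: np.ndarray,
                    file_path: Optional[str] = None) -> Dict[Tuple[int, int], List[int]]:
    vocab: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for h, r, t in data:
        # Group every tail entity under its (head, relation) key.
        vocab[(int(h), int(r))].append(int(t))
    vocab = dict(vocab)  # plain dict, so unknown keys raise KeyError
    if file_path is not None:
        with open(file_path, "wb") as f:
            pickle.dump(vocab, f)
    return vocab


triples = np.array([[0, 0, 1], [0, 0, 2], [1, 1, 2]])  # columns: head, relation, tail
er_vocab = er_vocab_sketch(triples)
print(er_vocab[(0, 0)])  # [1, 2]
```

In filtered link-prediction evaluation, a lookup like `er_vocab[(h, r)]` gives all known true tails for a query, which can then be masked out before ranking the target tail.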

dicee.static_funcs.get_re_vocab(data: numpy.ndarray, file_path: str | None = None) Dict[Tuple[int, int], List[int]]

Build relation-entity (tail) to head vocabulary.

Parameters:
  • data – Array of triples with shape (n, 3) where columns are (head, relation, tail).

  • file_path – Optional path to save the vocabulary as pickle.

Returns:

Dictionary mapping (relation, tail) pairs to list of head entities.

dicee.static_funcs.get_ee_vocab(data: numpy.ndarray, file_path: str | None = None) Dict[Tuple[int, int], List[int]]

Build entity-entity to relation vocabulary.

Parameters:
  • data – Array of triples with shape (n, 3) where columns are (head, relation, tail).

  • file_path – Optional path to save the vocabulary as pickle.

Returns:

Dictionary mapping (head, tail) pairs to list of relations.
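The relation-entity and entity-entity vocabularies differ from the entity-relation one only in which columns form the key and which column is collected. The sketch below expresses that with one hypothetical helper, `group_by_sketch`; it illustrates the grouping and is not the library's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

import numpy as np


def group_by_sketch(data: np.ndarray, key_cols: Tuple[int, int],
                    value_col: int) -> Dict[Tuple[int, int], List[int]]:
    """Map each pair of key-column values to the list of values in value_col."""
    vocab: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for row in data:
        key = (int(row[key_cols[0]]), int(row[key_cols[1]]))
        vocab[key].append(int(row[value_col]))
    return dict(vocab)


triples = np.array([[0, 0, 2], [1, 0, 2], [0, 1, 2]])  # columns: head, relation, tail
re_vocab = group_by_sketch(triples, key_cols=(1, 2), value_col=0)  # (relation, tail) -> heads
ee_vocab = group_by_sketch(triples, key_cols=(0, 2), value_col=1)  # (head, tail) -> relations
print(re_vocab[(0, 2)])  # [0, 1]
print(ee_vocab[(0, 2)])  # [0, 1]
```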

dicee.static_funcs.timeit(func: Callable) Callable

Decorator to measure and print execution time and memory usage.

Parameters:

func – Function to be timed.

Returns:

Wrapped function that prints timing information.
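A minimal sketch of a timing-and-memory decorator, assuming memory is reported via the standard-library `tracemalloc` module. The library's own `timeit` may measure memory differently (for example, process-level usage); `tracemalloc` only counts memory allocated through Python's allocators. The names `timeit_sketch` and `build_list` are hypothetical.

```python
import functools
import time
import tracemalloc
from typing import Callable


def timeit_sketch(func: Callable) -> Callable:
    """Print wall-clock time and peak traced memory of each call to func."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        started_tracing = not tracemalloc.is_tracing()
        if started_tracing:
            tracemalloc.start()
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()  # peak since tracing began
            if started_tracing:
                tracemalloc.stop()  # only stop tracing that this call started
            print(f"{func.__name__}: {elapsed:.4f} s, peak traced memory {peak / 2**20:.2f} MB")
    return wrapper


@timeit_sketch
def build_list(n: int) -> list:
    return list(range(n))


result = build_list(100_000)
```

Checking `tracemalloc.is_tracing()` keeps nested decorated calls from stopping a trace that an outer call started.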

dicee.static_funcs.save_pickle(*, data: object | None = None, file_path: str) None

Save data to a pickle file.

Note: Consider using more portable formats (JSON, Parquet) for new code.

Parameters:
  • data – Object to serialize. If None, nothing is saved.

  • file_path – Path where the pickle file will be saved.

dicee.static_funcs.load_pickle(file_path: str) object

Load data from a pickle file.

Note: Consider using more portable formats (JSON, Parquet) for new code.

Parameters:

file_path – Path to the pickle file.

Returns:

Deserialized object from the pickle file.
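A minimal round-trip sketch of the two pickle helpers, following the documented behaviour that a `None` payload is not saved. The `_sketch` names are hypothetical and the bodies are an illustration, not the library's source.

```python
import os
import pickle
import tempfile


def save_pickle_sketch(*, data: object = None, file_path: str) -> None:
    # Documented behaviour: if data is None, nothing is written.
    if data is None:
        return
    with open(file_path, "wb") as f:
        pickle.dump(data, f)


def load_pickle_sketch(file_path: str) -> object:
    # Only load trusted files: unpickling can execute arbitrary code.
    with open(file_path, "rb") as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "mapping.p")
save_pickle_sketch(data={"alice": 0, "bob": 1}, file_path=path)
print(load_pickle_sketch(path))  # {'alice': 0, 'bob': 1}
```

The security caveat in the comment is one more reason behind the note recommending portable formats such as JSON or Parquet.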

dicee.static_funcs.load_term_mapping(file_path: str) dict | polars.DataFrame

Load term-to-index mapping from pickle or CSV file.

Attempts to load from a pickle file first and falls back to a CSV file if the pickle is not found.

Parameters:

file_path – Base path without extension.

Returns:

Dictionary or Polars DataFrame containing the mapping.
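A minimal sketch of the pickle-then-CSV fallback, assuming the pickle file is `<file_path>.p` and the CSV file is `<file_path>.csv` (both extensions are assumptions). Polars is imported lazily, so the pickle path works without it installed. The name `load_term_mapping_sketch` is hypothetical.

```python
import os
import pickle
import tempfile
from typing import Union


def load_term_mapping_sketch(file_path: str) -> Union[dict, "polars.DataFrame"]:
    # Assumed naming: "<file_path>.p" (pickle) and "<file_path>.csv" (CSV).
    try:
        with open(file_path + ".p", "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        import polars  # only needed for the CSV fallback
        return polars.read_csv(file_path + ".csv")


base = os.path.join(tempfile.mkdtemp(), "entity_to_idx")
with open(base + ".p", "wb") as f:
    pickle.dump({"alice": 0}, f)
print(load_term_mapping_sketch(base))  # {'alice': 0}
```

Because the two branches return different types, callers need to handle both a `dict` and a Polars DataFrame.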

dicee.static_funcs.select_model(args: dict, is_continual_training: bool = None, storage_path: str = None)
dicee.static_funcs.load_model(path_of_experiment_folder: str, model_name='model.pt', verbose=0) Tuple[object, Tuple[dict, dict]]

Load weights and initialize a PyTorch module from namespace arguments.

dicee.static_funcs.load_model_ensemble(path_of_experiment_folder: str) Tuple[dicee.models.base_model.BaseKGE, Tuple[pandas.DataFrame, pandas.DataFrame]]

Construct an ensemble of weights and initialize a PyTorch module from namespace arguments.

  1. Detect models under given path

  2. Accumulate parameters of detected models

  3. Normalize parameters

  4. Load the normalized parameters from (3) into the model.
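Steps 2 and 3 above amount to an element-wise mean over the parameters of all detected models. The sketch below shows that averaging on state dicts represented as `dict[str, numpy.ndarray]` so it runs without PyTorch; this representation, the helper name `average_state_dicts_sketch`, and the parameter key are assumptions. With PyTorch, step 1 could be a file glob over saved checkpoints and step 4 a call to `model.load_state_dict(...)`.

```python
from typing import Dict, List

import numpy as np


def average_state_dicts_sketch(state_dicts: List[Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Accumulate parameters across models (step 2), then normalize by the count (step 3)."""
    if not state_dicts:
        raise ValueError("No models detected")
    # Step 2: element-wise sum of every parameter across all models.
    accumulated = {name: np.zeros_like(value, dtype=float)
                   for name, value in state_dicts[0].items()}
    for sd in state_dicts:
        for name, value in sd.items():
            accumulated[name] += value
    # Step 3: divide by the number of models to get the mean.
    return {name: total / len(state_dicts) for name, total in accumulated.items()}


model_a = {"entity_embeddings.weight": np.array([[1.0, 2.0]])}  # hypothetical key
model_b = {"entity_embeddings.weight": np.array([[3.0, 4.0]])}
avg = average_state_dicts_sketch([model_a, model_b])
print(avg["entity_embeddings.weight"])  # [[2. 3.]]
```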

dicee.static_funcs.save_numpy_ndarray(*, data: numpy.ndarray, file_path: str)
dicee.static_funcs.numpy_data_type_changer(train_set: numpy.ndarray, num: int) numpy.ndarray

Detect the most efficient data type for a given set of triples.
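A minimal sketch of picking the narrowest signed integer type, assuming `num` is the largest index value that must be representable (for example, the number of entities). The name `numpy_data_type_changer_sketch` is hypothetical; it illustrates the idea and is not the library's source.

```python
import numpy as np


def numpy_data_type_changer_sketch(train_set: np.ndarray, num: int) -> np.ndarray:
    """Cast train_set to the narrowest signed integer dtype whose maximum exceeds num."""
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        if num < np.iinfo(dtype).max:
            return train_set.astype(dtype)
    raise TypeError(f"{num} does not fit into int64")


triples = np.array([[0, 0, 1], [1, 0, 2]])
print(numpy_data_type_changer_sketch(triples, num=100).dtype)     # int8
print(numpy_data_type_changer_sketch(triples, num=40_000).dtype)  # int32
```

Narrower index types matter for large graphs: an `int8` triple array uses one eighth of the memory of the default `int64`.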

dicee.static_funcs.save_checkpoint_model(model, path: str) None

Store a PyTorch model on disk.

dicee.static_funcs.store(trained_model, model_name: str = 'model', full_storage_path: str = None, save_embeddings_as_csv=False) None
dicee.static_funcs.add_noisy_triples(train_set: pandas.DataFrame, add_noise_rate: float) pandas.DataFrame

Add randomly constructed triples.
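A minimal sketch of noise injection, assuming that `int(len(train_set) * add_noise_rate)` noisy triples are built by sampling subjects, relations, and objects uniformly from those already present in the training data. The sampling scheme, the `seed` parameter, and the name `add_noisy_triples_sketch` are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def add_noisy_triples_sketch(train_set: pd.DataFrame, add_noise_rate: float,
                             seed: int = 0) -> pd.DataFrame:
    rng = np.random.default_rng(seed)  # hypothetical seed for reproducibility
    num_noisy = int(len(train_set) * add_noise_rate)
    # Candidate pools drawn from the training data itself.
    entities = pd.unique(train_set[["subject", "object"]].values.ravel())
    relations = train_set["relation"].unique()
    noisy = pd.DataFrame({
        "subject": rng.choice(entities, size=num_noisy),
        "relation": rng.choice(relations, size=num_noisy),
        "object": rng.choice(entities, size=num_noisy),
    })
    return pd.concat([train_set, noisy], ignore_index=True)


kg = pd.DataFrame({"subject": ["a", "b", "c", "a"],
                   "relation": ["r1", "r2", "r1", "r2"],
                   "object": ["b", "c", "a", "c"]})
noisy_kg = add_noisy_triples_sketch(kg, add_noise_rate=0.5)
print(len(noisy_kg))  # 6
```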

dicee.static_funcs.read_or_load_kg(args, cls)
dicee.static_funcs.intialize_model(args: Dict, verbose: int = 0) Tuple[dicee.models.base_model.BaseKGE, str]

Initialize a knowledge graph embedding model.

Parameters:
  • args – Dictionary containing model configuration including ‘model’ key.

  • verbose – Verbosity level. If > 0, prints initialization message.

Returns:

Tuple of (initialized model, string describing the form of labelling).

Raises:

ValueError – If the model name is not recognized.
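A minimal sketch of a registry-based initializer, assuming that each `MODEL_REGISTRY` entry (typed `Dict[str, Tuple[Type, str]]`) maps a model name to a (model class, form of labelling) pair. The stub class, the registry contents, the labelling value `"EntityPrediction"`, and the `args=` constructor keyword are all hypothetical.

```python
from typing import Dict, Tuple, Type


class DistMultStub:  # hypothetical stand-in for a real model class
    def __init__(self, args: Dict):
        self.args = args


# Hypothetical registry: model name -> (model class, form of labelling).
REGISTRY: Dict[str, Tuple[Type, str]] = {
    "DistMult": (DistMultStub, "EntityPrediction"),
}


def initialize_model_sketch(args: Dict, verbose: int = 0):
    name = args["model"]
    if name not in REGISTRY:
        raise ValueError(f"Unknown model: {name}")
    model_cls, form_of_labelling = REGISTRY[name]
    if verbose > 0:
        print(f"Initializing {name}...")
    return model_cls(args=args), form_of_labelling


model, labelling = initialize_model_sketch({"model": "DistMult"})
print(type(model).__name__, labelling)  # DistMultStub EntityPrediction
```

A table lookup keeps the error path in one place: any name missing from the registry raises the documented `ValueError`.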

dicee.static_funcs.load_json(path: str) Dict

Load JSON file into a dictionary.

Parameters:

path – Path to the JSON file.

Returns:

Dictionary containing the JSON data.

Raises:
  • FileNotFoundError – If the file does not exist.

  • json.JSONDecodeError – If the file contains invalid JSON.

dicee.static_funcs.save_embeddings(embeddings: numpy.ndarray, indexes: List, path: str) None

Save embeddings to a CSV file.

Parameters:
  • embeddings – NumPy array of embeddings with shape (n_items, embedding_dim).

  • indexes – List of index labels (entity/relation names).

  • path – Output file path.
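A minimal sketch of writing labelled embeddings to CSV with the standard-library `csv` module. The layout (a header of column ids, then one row per item with its label first) is an assumption, and the name `save_embeddings_sketch` is hypothetical.

```python
import csv
import os
import tempfile
from typing import List

import numpy as np


def save_embeddings_sketch(embeddings: np.ndarray, indexes: List, path: str) -> None:
    if len(indexes) != embeddings.shape[0]:
        raise ValueError("Need exactly one label per embedding row")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumed header: empty label column, then dimension ids 0..d-1.
        writer.writerow([""] + list(range(embeddings.shape[1])))
        for label, vector in zip(indexes, embeddings):
            writer.writerow([label] + vector.tolist())


path = os.path.join(tempfile.mkdtemp(), "entity_embeddings.csv")
save_embeddings_sketch(np.array([[0.1, 0.2], [0.3, 0.4]]), ["alice", "bob"], path)
```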

dicee.static_funcs.vocab_to_parquet(vocab_to_idx, name, path_for_serialization, print_into)
dicee.static_funcs.create_experiment_folder(folder_name: str = 'Experiments') str

Create a timestamped experiment folder.

Parameters:

folder_name – Base directory name for experiments.

Returns:

Full path to the created experiment folder.
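A minimal sketch of a timestamped experiment folder. The timestamp format is an assumption. Including microseconds keeps folders created in quick succession distinct, and hyphens instead of colons keep the name valid on Windows. The name `create_experiment_folder_sketch` is hypothetical.

```python
import os
import tempfile
from datetime import datetime


def create_experiment_folder_sketch(folder_name: str = "Experiments") -> str:
    # Assumed naming scheme: <folder_name>/<YYYY-MM-DD HH-MM-SS-ffffff>
    stamp = datetime.now().strftime("%Y-%m-%d %H-%M-%S-%f")
    path = os.path.join(folder_name, stamp)
    os.makedirs(path, exist_ok=False)  # fail rather than reuse an existing folder
    return os.path.abspath(path)


experiment_dir = create_experiment_folder_sketch(os.path.join(tempfile.mkdtemp(), "Experiments"))
print(experiment_dir)
```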

dicee.static_funcs.continual_training_setup_executor(executor) None
dicee.static_funcs.exponential_function(x: numpy.ndarray, lam: float, ascending_order=True) torch.FloatTensor
dicee.static_funcs.load_numpy(path) numpy.ndarray
dicee.static_funcs.evaluate(entity_to_idx, scores, easy_answers, hard_answers)

Evaluate multi-hop query answering on different query types.

dicee.static_funcs.download_file(url, destination_folder='.')
dicee.static_funcs.download_files_from_url(base_url: str, destination_folder='.') None
dicee.static_funcs.download_pretrained_model(url: str) str
dicee.static_funcs.write_csv_from_model_parallel(path: str)

Create

dicee.static_funcs.from_pretrained_model_write_embeddings_into_csv(path: str) None