dicee.knowledge_graph

Knowledge Graph module for data loading and preprocessing.

Provides the KG class for handling knowledge graph data including loading, preprocessing, and indexing operations.

Classes

KG

Knowledge Graph container and processor.

Module Contents

class dicee.knowledge_graph.KG(dataset_dir: str | None = None, byte_pair_encoding: bool = False, padding: bool = False, add_noise_rate: float | None = None, sparql_endpoint: str | None = None, path_single_kg: str | None = None, path_for_deserialization: str | None = None, add_reciprocal: bool | None = None, eval_model: str | None = None, read_only_few: int | None = None, sample_triples_ratio: float | None = None, path_for_serialization: str | None = None, entity_to_idx: Dict | None = None, relation_to_idx: Dict | None = None, backend: str | None = None, training_technique: str | None = None, separator: str | None = None)

Knowledge Graph container and processor.

Handles loading, preprocessing, and indexing of knowledge graph data from various sources including files, SPARQL endpoints, and serialized formats.

dataset_dir

Path to directory containing train/valid/test files.

num_entities

Total number of unique entities.

num_relations

Total number of unique relations.

train_set

Indexed training triples as a NumPy array.

valid_set

Indexed validation triples (optional).

test_set

Indexed test triples (optional).

entity_to_idx

Mapping from entity strings to indices.

relation_to_idx

Mapping from relation strings to indices.
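
The relationship between the string-to-index mappings and the indexed triple sets can be sketched in plain Python. This is a minimal illustration, not dicee's implementation: the helper `build_index` and the toy triples are hypothetical, indices are assigned in first-seen order as an assumption, and the indexed triples are held in a list of tuples here rather than in a NumPy array.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_index(triples: List[Triple]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Assign an integer index to every entity and relation (hypothetical helper)."""
    entity_to_idx: Dict[str, int] = {}
    relation_to_idx: Dict[str, int] = {}
    for h, r, t in triples:
        # setdefault keeps the first index assigned to each string.
        entity_to_idx.setdefault(h, len(entity_to_idx))
        relation_to_idx.setdefault(r, len(relation_to_idx))
        entity_to_idx.setdefault(t, len(entity_to_idx))
    return entity_to_idx, relation_to_idx


# Toy raw triples (hypothetical data).
raw_train_set: List[Triple] = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "likes", "carol"),
]

entity_to_idx, relation_to_idx = build_index(raw_train_set)

# Replace each string by its index to obtain the indexed training triples.
train_set = [
    (entity_to_idx[h], relation_to_idx[r], entity_to_idx[t])
    for h, r, t in raw_train_set
]

num_entities = len(entity_to_idx)
num_relations = len(relation_to_idx)
```

In this sketch `num_entities` and `num_relations` are simply the sizes of the two mappings, which matches the attribute descriptions above.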

dataset_dir = None
sparql_endpoint = None
path_single_kg = None
byte_pair_encoding = False
ordered_shaped_bpe_tokens = None
add_noise_rate = None
num_entities: int | None = None
num_relations: int | None = None
path_for_deserialization = None
add_reciprocal = None
eval_model = None
read_only_few = None
sample_triples_ratio = None
path_for_serialization = None
entity_to_idx = None
relation_to_idx = None
backend = 'pandas'
training_technique = None
separator = None
raw_train_set = None
raw_valid_set = None
raw_test_set = None
train_set = None
valid_set = None
test_set = None
idx_entity_to_bpe_shaped: Dict
enc
num_tokens
num_bpe_entities: int | None = None
padding = False
dummy_id
max_length_subword_tokens: int | None = None
train_set_target = None
target_dim: int | None = None
train_target_indices = None
ordered_bpe_entities = None
description_of_input = None
describe() → None

Generate a description string of the dataset statistics.

property entities_str: List[str]

Get list of all entity strings.

property relations_str: List[str]

Get list of all relation strings.

exists(h: str, r: str, t: str) → bool

Check if a triple exists in the training set.

Parameters:
  • h – Head entity string.
  • r – Relation string.
  • t – Tail entity string.

Returns:

True if the triple exists, False otherwise.
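
The membership check can be illustrated with a small standalone function. This is only a sketch under assumptions: the toy triples and the `_train_lookup` set are hypothetical, and the lookup strategy used inside `KG.exists` is not stated here.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy stand-in for the raw training triples (hypothetical data).
raw_train_set: List[Triple] = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
]

# A set gives constant-time membership checks on average.
_train_lookup: Set[Triple] = set(raw_train_set)


def exists(h: str, r: str, t: str) -> bool:
    """Return True if (h, r, t) is a training triple, False otherwise."""
    return (h, r, t) in _train_lookup


found = exists("alice", "knows", "bob")
reversed_found = exists("bob", "knows", "alice")
```

Triples are directed, so swapping head and tail in this sketch yields a different triple that is not in the toy training set.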

__iter__() → Iterator[Tuple[str, str, str]]

Iterate over training triples as string tuples.

__len__() → int

Return the number of triples in the raw training set.
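
The iteration and length protocol can be mimicked with a small stand-in class. `ToyKG` is hypothetical and reproduces only the behaviour described for `__iter__` and `__len__`; it is not the library's class and performs no loading or preprocessing.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


class ToyKG:
    """Hypothetical stand-in that mimics only iteration and length."""

    def __init__(self, raw_train_set: List[Triple]) -> None:
        self.raw_train_set = raw_train_set

    def __iter__(self) -> Iterator[Triple]:
        # Yield each training triple as a (head, relation, tail) string tuple.
        for h, r, t in self.raw_train_set:
            yield (h, r, t)

    def __len__(self) -> int:
        # Number of triples in the raw training set.
        return len(self.raw_train_set)


kg = ToyKG([("alice", "knows", "bob"), ("bob", "knows", "carol")])

# Because the class defines __iter__, it can be used directly in loops
# and comprehensions.
heads = [h for h, _, _ in kg]
size = len(kg)
```

Supporting both protocols lets an object of this kind be passed to `len()`, `list()`, and `for` loops without exposing its internal storage.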

func_triple_to_bpe_representation(triple: List[str])