dicee.knowledge_graph
Knowledge Graph module for data loading and preprocessing.
Provides the KG class for handling knowledge graph data including loading, preprocessing, and indexing operations.
Classes
- KG: Knowledge Graph container and processor.
Module Contents
- class dicee.knowledge_graph.KG(dataset_dir: str | None = None, byte_pair_encoding: bool = False, padding: bool = False, add_noise_rate: float | None = None, sparql_endpoint: str | None = None, path_single_kg: str | None = None, path_for_deserialization: str | None = None, add_reciprocal: bool | None = None, eval_model: str | None = None, read_only_few: int | None = None, sample_triples_ratio: float | None = None, path_for_serialization: str | None = None, entity_to_idx: Dict | None = None, relation_to_idx: Dict | None = None, backend: str | None = None, training_technique: str | None = None, separator: str | None = None)
Knowledge Graph container and processor.
Handles loading, preprocessing, and indexing of knowledge graph data from various sources including files, SPARQL endpoints, and serialized formats.
- dataset_dir
Path to directory containing train/valid/test files.
- num_entities
Total number of unique entities.
- num_relations
Total number of unique relations.
- train_set
Indexed training triples as numpy array.
- valid_set
Indexed validation triples (optional).
- test_set
Indexed test triples (optional).
- entity_to_idx
Mapping from entity strings to indices.
- relation_to_idx
Mapping from relation strings to indices.
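The relationship between the mappings, the counts, and the indexed triples can be pictured with a small stand-in. This is a minimal sketch, not dicee's implementation: `index_triples` is a hypothetical helper, assigning indices in order of first appearance is an assumption, and plain lists stand in for the numpy array the library stores.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def index_triples(triples: List[Triple]):
    """Hypothetical helper: map string triples to integer triples."""
    entity_to_idx: Dict[str, int] = {}
    relation_to_idx: Dict[str, int] = {}
    indexed: List[Tuple[int, int, int]] = []
    for h, r, t in triples:
        # setdefault assigns the next free index the first time a string is seen
        # (assumed ordering; the library may assign indices differently).
        h_idx = entity_to_idx.setdefault(h, len(entity_to_idx))
        r_idx = relation_to_idx.setdefault(r, len(relation_to_idx))
        t_idx = entity_to_idx.setdefault(t, len(entity_to_idx))
        indexed.append((h_idx, r_idx, t_idx))
    return entity_to_idx, relation_to_idx, indexed


raw_train = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "works_at", "acme"),
]
entity_to_idx, relation_to_idx, train_set = index_triples(raw_train)
num_entities = len(entity_to_idx)    # unique entities across heads and tails
num_relations = len(relation_to_idx)  # unique relation strings
```

Heads and tails share one entity mapping in this sketch, so an entity that appears in both positions receives a single index.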
- dataset_dir = None
- sparql_endpoint = None
- path_single_kg = None
- byte_pair_encoding = False
- ordered_shaped_bpe_tokens = None
- add_noise_rate = None
- num_entities: int | None = None
- num_relations: int | None = None
- path_for_deserialization = None
- add_reciprocal = None
- eval_model = None
- read_only_few = None
- sample_triples_ratio = None
- path_for_serialization = None
- entity_to_idx = None
- relation_to_idx = None
- backend = 'pandas'
- training_technique = None
- separator = None
- raw_train_set = None
- raw_valid_set = None
- raw_test_set = None
- train_set = None
- valid_set = None
- test_set = None
- idx_entity_to_bpe_shaped: Dict
- enc
- num_tokens
- num_bpe_entities: int | None = None
- padding = False
- dummy_id
- max_length_subword_tokens: int | None = None
- train_set_target = None
- target_dim: int | None = None
- train_target_indices = None
- ordered_bpe_entities = None
- description_of_input = None
- describe() → None
Generate a description string of the dataset statistics.
- property entities_str: List[str]
Get list of all entity strings.
- property relations_str: List[str]
Get list of all relation strings.
- exists(h: str, r: str, t: str) → bool
Check if a triple exists in the training set.
- Parameters:
h – Head entity string.
r – Relation string.
t – Tail entity string.
- Returns:
True if the triple exists, False otherwise.
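A membership check of this kind can be sketched with a set lookup. This is an illustration under assumptions, not the library's code: `ToyKG` is a hypothetical stand-in, and how `KG.exists` searches the training set internally is not stated here.

```python
from typing import Iterable, Set, Tuple


class ToyKG:
    """Hypothetical stand-in that stores training triples as strings."""

    def __init__(self, triples: Iterable[Tuple[str, str, str]]):
        self._train: Set[Tuple[str, str, str]] = set(triples)

    def exists(self, h: str, r: str, t: str) -> bool:
        # A set gives average O(1) lookup of the (head, relation, tail) tuple.
        return (h, r, t) in self._train


kg = ToyKG([("alice", "knows", "bob"), ("bob", "knows", "carol")])
found = kg.exists("alice", "knows", "bob")
reversed_found = kg.exists("bob", "knows", "alice")
```

The check is directional in this sketch: swapping head and tail yields a different triple, so `reversed_found` is `False`.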
- __iter__() → Iterator[Tuple[str, str, str]]
Iterate over training triples as string tuples.
- __len__() → int
Return number of triples in the raw training set.
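The iteration and length behaviour described for `__iter__` and `__len__` follows Python's standard container protocol. A minimal sketch, assuming a hypothetical `ToyTripleStore` class that keeps the raw training triples in a plain list (the real `KG` object stores them differently):

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


class ToyTripleStore:
    """Hypothetical stand-in exposing iteration and length over raw triples."""

    def __init__(self, raw_train_set: List[Triple]):
        self.raw_train_set = list(raw_train_set)

    def __iter__(self) -> Iterator[Triple]:
        # Yield each training triple as a (head, relation, tail) string tuple.
        return iter(self.raw_train_set)

    def __len__(self) -> int:
        # Number of triples in the raw training set.
        return len(self.raw_train_set)


store = ToyTripleStore([("alice", "knows", "bob"), ("bob", "knows", "carol")])
size = len(store)
heads = [h for h, _, _ in store]
```

With both methods defined, an instance works directly with `len()`, `for` loops, and comprehensions.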
- func_triple_to_bpe_representation(triple: List[str])