dicee.read_preprocess_save_load_kg.util
Functions
- polars_dataframe_indexer: Replaces 'subject', 'relation', and 'object' columns in the input Polars DataFrame with their corresponding index values
- read_with_polars: Load and Preprocess via Polars
- read_from_triple_store: Read triples from triple store into pandas dataframe
- Deserialize data
- create_recipriocal_triples: Add inverse triples into dask dataframe
Module Contents
- dicee.read_preprocess_save_load_kg.util.polars_dataframe_indexer(df_polars: polars.DataFrame, idx_entity: polars.DataFrame, idx_relation: polars.DataFrame) → polars.DataFrame
Replaces ‘subject’, ‘relation’, and ‘object’ columns in the input Polars DataFrame with their corresponding index values from the entity and relation index DataFrames.
This function processes the DataFrame in three main steps:
1. Replace the ‘relation’ values with the corresponding index from idx_relation.
2. Replace the ‘subject’ values with the corresponding index from idx_entity.
3. Replace the ‘object’ values with the corresponding index from idx_entity.
Parameters:
- df_polars : polars.DataFrame
The input Polars DataFrame containing columns: ‘subject’, ‘relation’, and ‘object’.
- idx_entity : polars.DataFrame
A Polars DataFrame that contains the mapping between entity names and their corresponding indices. Must have columns: ‘entity’ and ‘index’.
- idx_relation : polars.DataFrame
A Polars DataFrame that contains the mapping between relation names and their corresponding indices. Must have columns: ‘relation’ and ‘index’.
Returns:
- polars.DataFrame
A DataFrame with the ‘subject’, ‘relation’, and ‘object’ columns replaced by their corresponding indices.
Example Usage:
>>> import polars as pl
>>> df_polars = pl.DataFrame({
...     "subject": ["Alice", "Bob", "Charlie"],
...     "relation": ["knows", "works_with", "lives_in"],
...     "object": ["Dave", "Eve", "Frank"]
... })
>>> idx_entity = pl.DataFrame({
...     "entity": ["Alice", "Bob", "Charlie", "Dave", "Eve", "Frank"],
...     "index": [0, 1, 2, 3, 4, 5]
... })
>>> idx_relation = pl.DataFrame({
...     "relation": ["knows", "works_with", "lives_in"],
...     "index": [0, 1, 2]
... })
>>> polars_dataframe_indexer(df_polars, idx_entity, idx_relation)
Steps:
Join the input DataFrame df_polars on the ‘relation’ column with idx_relation to replace the relations with their indices.
Join on ‘subject’ to replace it with the corresponding entity index using a left join on idx_entity.
Join on ‘object’ to replace it with the corresponding entity index using a left join on idx_entity.
Select only the ‘subject’, ‘relation’, and ‘object’ columns to return the final result.
- dicee.read_preprocess_save_load_kg.util.apply_reciprical_or_noise(add_reciprical: bool, eval_model: str, df: object = None, info: str = None)
(1) Add reciprocal triples. (2) Add noisy triples.
- dicee.read_preprocess_save_load_kg.util.read_with_polars(data_path, read_only_few: int = None, sample_triples_ratio: float = None) → polars.DataFrame
Load and Preprocess via Polars
- dicee.read_preprocess_save_load_kg.util.read_with_pandas(data_path, read_only_few: int = None, sample_triples_ratio: float = None)
- dicee.read_preprocess_save_load_kg.util.read_from_disk(data_path: str, read_only_few: int = None, sample_triples_ratio: float = None, backend=None)
- dicee.read_preprocess_save_load_kg.util.read_from_triple_store(endpoint: str = None)
Read triples from triple store into pandas dataframe
- dicee.read_preprocess_save_load_kg.util.create_constraints(triples, file_path: str = None)
(1) Extract domains and ranges of relations.
(2) Store a mapping from relations to entities that are outside of the domain and range. Create constrained entities based on the range of relations.
- Parameters:
triples
- Returns:
Tuple[dict, dict]
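To make the two steps concrete, here is a minimal pure-Python sketch. It assumes triples are already integer-indexed `(subject, relation, object)` tuples and that the two returned dicts map each relation to the entities outside its domain and outside its range, respectively. The name `domain_range_sketch`, the `num_entities` parameter, and the exact return layout are all assumptions for illustration; the source only states that a `Tuple[dict, dict]` is returned.

```python
from collections import defaultdict


def domain_range_sketch(triples, num_entities):
    """Hypothetical sketch: per-relation entities outside the domain and range."""
    domain = defaultdict(set)  # relation -> subjects seen with it
    range_ = defaultdict(set)  # relation -> objects seen with it
    for s, r, o in triples:
        domain[r].add(s)
        range_[r].add(o)

    all_entities = set(range(num_entities))
    # Entities never observed as subject (resp. object) of each relation.
    outside_domain = {r: sorted(all_entities - ents) for r, ents in domain.items()}
    outside_range = {r: sorted(all_entities - ents) for r, ents in range_.items()}
    return outside_domain, outside_range


triples = [(0, 0, 1), (2, 0, 1), (1, 1, 3)]
outside_domain, outside_range = domain_range_sketch(triples, num_entities=4)
```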
- dicee.read_preprocess_save_load_kg.util.save_numpy_ndarray(*, data: numpy.ndarray, file_path: str)
- dicee.read_preprocess_save_load_kg.util.create_recipriocal_triples(x)
Add inverse triples into dask dataframe.
- Parameters:
x
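As an illustration of what adding inverse triples means, the sketch below appends, for every triple (s, p, o), a triple (o, p_inverse, s). It uses pandas rather than dask to stay self-contained; the function name `add_reciprocal_triples_sketch`, the column names, and the `"_inverse"` suffix are assumptions, not confirmed details of the library.

```python
import pandas as pd


def add_reciprocal_triples_sketch(df, suffix="_inverse"):
    """Hypothetical sketch: append (o, p + suffix, s) for each (s, p, o)."""
    inverse = pd.DataFrame({
        "subject": df["object"],               # object becomes the new subject
        "relation": df["relation"] + suffix,   # mark the relation as inverted
        "object": df["subject"],               # subject becomes the new object
    })
    # Original triples first, inverse triples after, with a fresh index.
    return pd.concat([df, inverse], ignore_index=True)


triples = pd.DataFrame({
    "subject": ["Alice"],
    "relation": ["knows"],
    "object": ["Bob"],
})
augmented = add_reciprocal_triples_sketch(triples)
```

The result has twice as many rows as the input, and the set of relations doubles as well.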
- dicee.read_preprocess_save_load_kg.util.index_triples_with_pandas(train_set, entity_to_idx: dict, relation_to_idx: dict) → pandas.core.frame.DataFrame
- Parameters:
train_set – pandas dataframe
entity_to_idx – a mapping from str to integer index
relation_to_idx – a mapping from str to integer index
num_core – number of cores to be used
- Returns:
indexed triples, i.e., pandas dataframe
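A minimal sketch of dictionary-based indexing with pandas is shown below. It assumes the dataframe has ‘subject’, ‘relation’, and ‘object’ columns; the name `index_with_pandas_sketch` is hypothetical and the use of `Series.map` is an assumption about how such indexing could be done, not a description of the library's code.

```python
import pandas as pd


def index_with_pandas_sketch(train_set, entity_to_idx, relation_to_idx):
    """Hypothetical sketch: map string triples to integer indices via dicts."""
    indexed = train_set.copy()  # leave the caller's dataframe untouched
    indexed["subject"] = indexed["subject"].map(entity_to_idx)
    indexed["relation"] = indexed["relation"].map(relation_to_idx)
    indexed["object"] = indexed["object"].map(entity_to_idx)
    return indexed


train_set = pd.DataFrame({
    "subject": ["Alice", "Bob"],
    "relation": ["knows", "works_with"],
    "object": ["Bob", "Charlie"],
})
entity_to_idx = {"Alice": 0, "Bob": 1, "Charlie": 2}
relation_to_idx = {"knows": 0, "works_with": 1}
indexed = index_with_pandas_sketch(train_set, entity_to_idx, relation_to_idx)
```

Note that `Series.map` yields NaN for keys missing from the dictionary, so a mapping that does not cover every entity or relation would produce a float column rather than raising an error.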