dicee.read_preprocess_save_load_kg.util
=======================================

.. py:module:: dicee.read_preprocess_save_load_kg.util


Functions
---------

.. autoapisummary::

   dicee.read_preprocess_save_load_kg.util.polars_dataframe_indexer
   dicee.read_preprocess_save_load_kg.util.pandas_dataframe_indexer
   dicee.read_preprocess_save_load_kg.util.apply_reciprical_or_noise
   dicee.read_preprocess_save_load_kg.util.timeit
   dicee.read_preprocess_save_load_kg.util.read_with_polars
   dicee.read_preprocess_save_load_kg.util.read_with_pandas
   dicee.read_preprocess_save_load_kg.util.read_from_disk
   dicee.read_preprocess_save_load_kg.util.read_from_triple_store
   dicee.read_preprocess_save_load_kg.util.get_er_vocab
   dicee.read_preprocess_save_load_kg.util.get_re_vocab
   dicee.read_preprocess_save_load_kg.util.get_ee_vocab
   dicee.read_preprocess_save_load_kg.util.create_constraints
   dicee.read_preprocess_save_load_kg.util.load_with_pandas
   dicee.read_preprocess_save_load_kg.util.save_numpy_ndarray
   dicee.read_preprocess_save_load_kg.util.load_numpy_ndarray
   dicee.read_preprocess_save_load_kg.util.save_pickle
   dicee.read_preprocess_save_load_kg.util.load_pickle
   dicee.read_preprocess_save_load_kg.util.create_recipriocal_triples
   dicee.read_preprocess_save_load_kg.util.dataset_sanity_checking


Module Contents
---------------

.. py:function:: polars_dataframe_indexer(df_polars: polars.DataFrame, idx_entity: polars.DataFrame, idx_relation: polars.DataFrame) -> polars.DataFrame

   Replaces 'subject', 'relation', and 'object' columns in the input Polars DataFrame with their corresponding index values from the entity and relation index DataFrames.

   This function processes the DataFrame in three main steps:

   1. Replace the 'relation' values with the corresponding index from `idx_relation`.
   2. Replace the 'subject' values with the corresponding index from `idx_entity`.
   3. Replace the 'object' values with the corresponding index from `idx_entity`.

   Parameters:
   -----------
   df_polars : polars.DataFrame
       The input Polars DataFrame containing columns: 'subject', 'relation', and 'object'.

   idx_entity : polars.DataFrame
       A Polars DataFrame that contains the mapping between entity names and their corresponding indices.
       Must have columns: 'entity' and 'index'.

   idx_relation : polars.DataFrame
       A Polars DataFrame that contains the mapping between relation names and their corresponding indices.
       Must have columns: 'relation' and 'index'.

   Returns:
   --------
   polars.DataFrame
       A DataFrame with the 'subject', 'relation', and 'object' columns replaced by their corresponding indices.

   Example Usage:
   --------------
   >>> import polars as pl
   >>> from dicee.read_preprocess_save_load_kg.util import polars_dataframe_indexer
   >>> df_polars = pl.DataFrame({
   ...     "subject": ["Alice", "Bob", "Charlie"],
   ...     "relation": ["knows", "works_with", "lives_in"],
   ...     "object": ["Dave", "Eve", "Frank"]
   ... })
   >>> idx_entity = pl.DataFrame({
   ...     "entity": ["Alice", "Bob", "Charlie", "Dave", "Eve", "Frank"],
   ...     "index": [0, 1, 2, 3, 4, 5]
   ... })
   >>> idx_relation = pl.DataFrame({
   ...     "relation": ["knows", "works_with", "lives_in"],
   ...     "index": [0, 1, 2]
   ... })
   >>> df_indexed = polars_dataframe_indexer(df_polars, idx_entity, idx_relation)

   Steps:
   ------
   1. Join the input DataFrame `df_polars` on the 'relation' column with `idx_relation` to replace the relations with their indices.
   2. Join on 'subject' to replace it with the corresponding entity index using a left join on `idx_entity`.
   3. Join on 'object' to replace it with the corresponding entity index using a left join on `idx_entity`.
   4. Select only the 'subject', 'relation', and 'object' columns to return the final result.


.. py:function:: pandas_dataframe_indexer(df_pandas: pandas.DataFrame, idx_entity: pandas.DataFrame, idx_relation: pandas.DataFrame) -> pandas.DataFrame

   Replaces 'subject', 'relation', and 'object' columns in the input Pandas DataFrame with their corresponding index values from the entity and relation index DataFrames.

   Parameters:
   -----------
   df_pandas : pd.DataFrame
       The input Pandas DataFrame containing columns: 'subject', 'relation', and 'object'.

   idx_entity : pd.DataFrame
       A Pandas DataFrame that contains the mapping between entity names and their corresponding indices.
       Must have columns: 'entity' and 'index'.

   idx_relation : pd.DataFrame
       A Pandas DataFrame that contains the mapping between relation names and their corresponding indices.
       Must have columns: 'relation' and 'index'.

   Returns:
   --------
   pd.DataFrame
       A DataFrame with the 'subject', 'relation', and 'object' columns replaced by their corresponding indices.


.. py:function:: apply_reciprical_or_noise(add_reciprical: bool, eval_model: str, df: object = None, info: str = None)

   (1) Add reciprocal triples.
   (2) Add noisy triples.


.. py:function:: timeit(func)


.. py:function:: read_with_polars(data_path, read_only_few: int = None, sample_triples_ratio: float = None, separator: str = None) -> polars.DataFrame

   Load and preprocess via Polars.


.. py:function:: read_with_pandas(data_path, read_only_few: int = None, sample_triples_ratio: float = None, separator: str = None)


.. py:function:: read_from_disk(data_path: str, read_only_few: int = None, sample_triples_ratio: float = None, backend: str = None, separator: str = None) -> Tuple[polars.DataFrame, pandas.DataFrame]


.. py:function:: read_from_triple_store(endpoint: str = None)

   Read triples from a triple store into a pandas DataFrame.


.. py:function:: get_er_vocab(data, file_path: str = None)


.. py:function:: get_re_vocab(data, file_path: str = None)


.. py:function:: get_ee_vocab(data, file_path: str = None)


.. py:function:: create_constraints(triples, file_path: str = None)

   (1) Extract domains and ranges of relations.
   (2) Store a mapping from relations to entities that are outside of the domain and range.

   Create constrained entities based on the range of relations.

   :param triples: The triples from which the domains and ranges of relations are extracted.
   :return: Tuple[dict, dict]


.. py:function:: load_with_pandas(self) -> None

   Deserialize data.


.. py:function:: save_numpy_ndarray(*, data: numpy.ndarray, file_path: str)


.. py:function:: load_numpy_ndarray(*, file_path: str)


.. py:function:: save_pickle(*, data: object, file_path=str)


.. py:function:: load_pickle(*, file_path=str)


.. py:function:: create_recipriocal_triples(x)

   Add inverse triples to the input dataframe.

   :param x: Dataframe of triples.
   :return: The dataframe with the inverse triples added.


.. py:function:: dataset_sanity_checking(train_set: numpy.ndarray, num_entities: int, num_relations: int) -> None

   :param train_set: Training triples.
   :param num_entities: Number of entities.
   :param num_relations: Number of relations.
   :return: None
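
Both `polars_dataframe_indexer` and `pandas_dataframe_indexer` replace string labels with integer indices by joining against the `idx_entity` and `idx_relation` tables. The sketch shows the same lookup with plain Python dictionaries instead of DataFrames, so it runs without Polars or pandas. The helper `index_triples` and its tuple-based input are hypothetical and not part of dicee; unknown labels map to `None`, mirroring how a left join leaves unmatched keys missing.

```python
from typing import Dict, Iterable, List, Optional, Tuple


# Hypothetical helper: dictionary-based analogue of the join steps.
def index_triples(
    triples: Iterable[Tuple[str, str, str]],
    idx_entity: Dict[str, int],
    idx_relation: Dict[str, int],
) -> List[Tuple[Optional[int], Optional[int], Optional[int]]]:
    indexed = []
    for subject, relation, obj in triples:
        # Like a left join: an unknown label becomes None instead of raising.
        indexed.append(
            (idx_entity.get(subject), idx_relation.get(relation), idx_entity.get(obj))
        )
    return indexed


triples = [("Alice", "knows", "Dave"), ("Bob", "works_with", "Eve"), ("Charlie", "lives_in", "Frank")]
idx_entity = {name: i for i, name in enumerate(["Alice", "Bob", "Charlie", "Dave", "Eve", "Frank"])}
idx_relation = {name: i for i, name in enumerate(["knows", "works_with", "lives_in"])}
print(index_triples(triples, idx_entity, idx_relation))
# [(0, 0, 3), (1, 1, 4), (2, 2, 5)]
```

A dictionary lookup per row is fine for small inputs; the DataFrame joins documented for the two indexers are the vectorized equivalent for large knowledge graphs.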
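
`apply_reciprical_or_noise` and `create_recipriocal_triples` add reciprocal (inverse) triples: for every triple (s, p, o), a triple (o, p_inverse, s) is appended. A minimal sketch on lists of string triples; the helper name `add_reciprocal_triples` and the `"_inverse"` suffix are assumptions chosen for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


# Hypothetical helper; the "_inverse" suffix is an assumed naming convention.
def add_reciprocal_triples(triples: List[Triple], suffix: str = "_inverse") -> List[Triple]:
    # Keep the original triples first, then append (o, p + suffix, s) for each (s, p, o).
    inverse = [(o, p + suffix, s) for s, p, o in triples]
    return triples + inverse


print(add_reciprocal_triples([("Alice", "knows", "Dave")]))
# [('Alice', 'knows', 'Dave'), ('Dave', 'knows_inverse', 'Alice')]
```

Giving each inverse relation its own label keeps p and p_inverse as distinct relations, so the dataset doubles in size while the relation vocabulary doubles as well.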
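
`timeit(func)` has no docstring; its name and single `func` parameter suggest a decorator that reports how long the wrapped call takes. The sketch shows that common pattern with `time.perf_counter` and `functools.wraps`. The decorator name `timed`, the example function, and the message format are hypothetical, and dicee's own version may report different information.

```python
import functools
import time


# Hypothetical timing decorator; the printed message format is an assumption.
def timed(func):
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} seconds")
        return result

    return wrapper


@timed
def build_squares(n: int) -> list:
    return [i * i for i in range(n)]


squares = build_squares(5)
```

`functools.wraps` keeps `build_squares.__name__` intact, which matters when decorated loaders show up in logs or tracebacks.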
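
`get_er_vocab`, `get_re_vocab`, and `get_ee_vocab` are undocumented. Their names suggest that each groups the triples by two positions and collects the third: (head, relation) to tails, (relation, tail) to heads, and (head, tail) to relations. The sketch illustrates that grouping on integer-indexed (h, r, t) rows; the helper `build_vocab` and the exact key layout are assumptions, not confirmed behaviour of the library.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


# Hypothetical helper: group (h, r, t) triples by two positions and collect the third.
def build_vocab(
    triples: Sequence[Sequence[int]], key_pos: Tuple[int, int], value_pos: int
) -> Dict[tuple, List[int]]:
    vocab = defaultdict(list)
    for triple in triples:
        key = (triple[key_pos[0]], triple[key_pos[1]])
        vocab[key].append(triple[value_pos])
    return dict(vocab)


triples = [(0, 0, 1), (0, 0, 2), (1, 1, 2)]
er_vocab = build_vocab(triples, key_pos=(0, 1), value_pos=2)  # (head, relation) -> tails
re_vocab = build_vocab(triples, key_pos=(1, 2), value_pos=0)  # (relation, tail) -> heads
ee_vocab = build_vocab(triples, key_pos=(0, 2), value_pos=1)  # (head, tail) -> relations
print(er_vocab)  # {(0, 0): [1, 2], (1, 1): [2]}
```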
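
Following the two steps listed for `create_constraints`, the sketch first collects the domain (observed subjects) and range (observed objects) of every relation, then maps each relation to the entities that fall outside them. The function name `domain_range_constraints` and the return order (domain constraints, range constraints) are assumptions; the documented return type `Tuple[dict, dict]` only fixes that two dictionaries come back.

```python
from typing import Dict, Iterable, List, Tuple


# Hypothetical sketch of steps (1) and (2) of create_constraints.
def domain_range_constraints(
    triples: Iterable[Tuple[int, int, int]],
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    domain: Dict[int, set] = {}
    range_: Dict[int, set] = {}
    entities = set()
    # (1) Observed subjects form a relation's domain, observed objects its range.
    for s, p, o in triples:
        domain.setdefault(p, set()).add(s)
        range_.setdefault(p, set()).add(o)
        entities.update((s, o))
    # (2) Map each relation to the entities outside its domain and outside its range.
    domain_constraints = {p: sorted(entities - d) for p, d in domain.items()}
    range_constraints = {p: sorted(entities - r) for p, r in range_.items()}
    return domain_constraints, range_constraints


dom, rng = domain_range_constraints([(0, 0, 1), (1, 0, 2), (2, 1, 0)])
print(dom)  # {0: [2], 1: [0, 1]}
print(rng)  # {0: [0], 1: [1, 2]}
```

Storing the complement (entities outside the domain or range) rather than the domain or range itself makes it direct to mask implausible candidates for a given relation.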
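
`save_pickle` and `load_pickle` take keyword-only arguments (note the leading `*` in their signatures). The sketch shows a round trip with the standard `pickle` module; `save_pickle_sketch` and `load_pickle_sketch` are hypothetical stand-ins that mirror the keyword-only calling convention, not the library functions themselves.

```python
import os
import pickle
import tempfile


# Hypothetical stand-ins with keyword-only arguments, mirroring the documented signatures.
def save_pickle_sketch(*, data: object, file_path: str) -> None:
    with open(file_path, "wb") as f:
        pickle.dump(data, f)


def load_pickle_sketch(*, file_path: str) -> object:
    with open(file_path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "er_vocab.p")
    save_pickle_sketch(data={(0, 0): [1, 2]}, file_path=path)
    restored = load_pickle_sketch(file_path=path)
print(restored)  # {(0, 0): [1, 2]}
```

Keyword-only parameters make call sites such as `save_pickle(data=..., file_path=...)` explicit and prevent accidentally swapping the data and the path.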
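
`dataset_sanity_checking` validates an indexed training set against the entity and relation counts. The sketch shows the kind of checks such a function can perform, using plain sequences instead of `numpy.ndarray` so it runs without NumPy. The helper name `check_indexed_triples` and the exact conditions (non-empty input, three columns per row, indices within `[0, num_entities)` and `[0, num_relations)`) are assumptions; the library's own checks may differ.

```python
from typing import Sequence


# Hypothetical checks; the exact conditions dicee enforces may differ.
def check_indexed_triples(
    train_set: Sequence[Sequence[int]], num_entities: int, num_relations: int
) -> None:
    if len(train_set) == 0:
        raise ValueError("The training set must contain at least one triple.")
    for row in train_set:
        if len(row) != 3:
            raise ValueError(f"Expected (head, relation, tail) rows, got {row!r}.")
        h, r, t = row
        if not (0 <= h < num_entities and 0 <= t < num_entities):
            raise ValueError(f"Entity index out of range in {row!r}.")
        if not 0 <= r < num_relations:
            raise ValueError(f"Relation index out of range in {row!r}.")


# Passes silently: every index is within range.
check_indexed_triples([(0, 0, 1), (1, 1, 2)], num_entities=3, num_relations=2)
```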