dicee.read_preprocess_save_load_kg.util
=======================================

.. py:module:: dicee.read_preprocess_save_load_kg.util


Functions
---------

.. autoapisummary::

   dicee.read_preprocess_save_load_kg.util.polars_dataframe_indexer
   dicee.read_preprocess_save_load_kg.util.pandas_dataframe_indexer
   dicee.read_preprocess_save_load_kg.util.apply_reciprical_or_noise
   dicee.read_preprocess_save_load_kg.util.timeit
   dicee.read_preprocess_save_load_kg.util.read_with_polars
   dicee.read_preprocess_save_load_kg.util.read_with_pandas
   dicee.read_preprocess_save_load_kg.util.read_from_disk
   dicee.read_preprocess_save_load_kg.util.read_from_triple_store
   dicee.read_preprocess_save_load_kg.util.get_er_vocab
   dicee.read_preprocess_save_load_kg.util.get_re_vocab
   dicee.read_preprocess_save_load_kg.util.get_ee_vocab
   dicee.read_preprocess_save_load_kg.util.create_constraints
   dicee.read_preprocess_save_load_kg.util.load_with_pandas
   dicee.read_preprocess_save_load_kg.util.save_numpy_ndarray
   dicee.read_preprocess_save_load_kg.util.load_numpy_ndarray
   dicee.read_preprocess_save_load_kg.util.save_pickle
   dicee.read_preprocess_save_load_kg.util.load_pickle
   dicee.read_preprocess_save_load_kg.util.create_recipriocal_triples
   dicee.read_preprocess_save_load_kg.util.dataset_sanity_checking


Module Contents
---------------

.. py:function:: polars_dataframe_indexer(df_polars: polars.DataFrame, idx_entity: polars.DataFrame, idx_relation: polars.DataFrame) -> polars.DataFrame

   Replaces 'subject', 'relation', and 'object' columns in the input Polars DataFrame with their corresponding index values
   from the entity and relation index DataFrames.

   This function processes the DataFrame in three main steps:

   1. Replace the 'relation' values with the corresponding index from `idx_relation`.
   2. Replace the 'subject' values with the corresponding index from `idx_entity`.
   3. Replace the 'object' values with the corresponding index from `idx_entity`.

   Parameters:
   -----------
   df_polars : polars.DataFrame
       The input Polars DataFrame containing columns: 'subject', 'relation', and 'object'.

   idx_entity : polars.DataFrame
       A Polars DataFrame that contains the mapping between entity names and their corresponding indices.
       Must have columns: 'entity' and 'index'.

   idx_relation : polars.DataFrame
       A Polars DataFrame that contains the mapping between relation names and their corresponding indices.
       Must have columns: 'relation' and 'index'.

   Returns:
   --------
   polars.DataFrame
       A DataFrame with the 'subject', 'relation', and 'object' columns replaced by their corresponding indices.

   Example Usage:
   --------------
   >>> import polars as pl
   >>> df_polars = pl.DataFrame({
   ...     "subject": ["Alice", "Bob", "Charlie"],
   ...     "relation": ["knows", "works_with", "lives_in"],
   ...     "object": ["Dave", "Eve", "Frank"]
   ... })
   >>> idx_entity = pl.DataFrame({
   ...     "entity": ["Alice", "Bob", "Charlie", "Dave", "Eve", "Frank"],
   ...     "index": [0, 1, 2, 3, 4, 5]
   ... })
   >>> idx_relation = pl.DataFrame({
   ...     "relation": ["knows", "works_with", "lives_in"],
   ...     "index": [0, 1, 2]
   ... })
   >>> indexed = polars_dataframe_indexer(df_polars, idx_entity, idx_relation)

   Steps:
   ------
   1. Join the input DataFrame `df_polars` on the 'relation' column with `idx_relation` to replace the relations with their indices.
   2. Join on 'subject' to replace it with the corresponding entity index using a left join on `idx_entity`.
   3. Join on 'object' to replace it with the corresponding entity index using a left join on `idx_entity`.
   4. Select only the 'subject', 'relation', and 'object' columns to return the final result.


.. py:function:: pandas_dataframe_indexer(df_pandas: pandas.DataFrame, idx_entity: pandas.DataFrame, idx_relation: pandas.DataFrame) -> pandas.DataFrame

   Replaces 'subject', 'relation', and 'object' columns in the input Pandas DataFrame with their corresponding index values
   from the entity and relation index DataFrames.

   Parameters:
   -----------
   df_pandas : pd.DataFrame
       The input Pandas DataFrame containing columns: 'subject', 'relation', and 'object'.

   idx_entity : pd.DataFrame
       A Pandas DataFrame that contains the mapping between entity names and their corresponding indices.
       Must have columns: 'entity' and 'index'.

   idx_relation : pd.DataFrame
       A Pandas DataFrame that contains the mapping between relation names and their corresponding indices.
       Must have columns: 'relation' and 'index'.

   Returns:
   --------
   pd.DataFrame
       A DataFrame with the 'subject', 'relation', and 'object' columns replaced by their corresponding indices.
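
The replacement described for ``pandas_dataframe_indexer`` can be illustrated with a small, self-contained sketch. It uses ``Series.map`` with lookups built from the two index tables. This is one possible way to get the documented result and is not necessarily how the function itself is implemented; the toy data and the variable names ``entity_to_idx`` and ``relation_to_idx`` are assumptions made for illustration.

```python
import pandas as pd

# Toy triples and index tables; the names and values are illustrative only.
df = pd.DataFrame({
    "subject": ["Alice", "Bob"],
    "relation": ["knows", "works_with"],
    "object": ["Bob", "Alice"],
})
idx_entity = pd.DataFrame({"entity": ["Alice", "Bob"], "index": [0, 1]})
idx_relation = pd.DataFrame({"relation": ["knows", "works_with"], "index": [0, 1]})

# Build label -> index lookups from the two index tables.
entity_to_idx = idx_entity.set_index("entity")["index"]
relation_to_idx = idx_relation.set_index("relation")["index"]

# Replace each label column with its integer index.
indexed = pd.DataFrame({
    "subject": df["subject"].map(entity_to_idx),
    "relation": df["relation"].map(relation_to_idx),
    "object": df["object"].map(entity_to_idx),
})
```

Labels that are missing from an index table would become ``NaN`` under this approach, so a real pipeline would probably want to check for missing values afterwards.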


.. py:function:: apply_reciprical_or_noise(add_reciprical: bool, eval_model: str, df: object = None, info: str = None)

   Augments the input triples by (1) adding reciprocal triples and/or (2) adding noisy triples.


.. py:function:: timeit(func)

.. py:function:: read_with_polars(data_path, read_only_few: int = None, sample_triples_ratio: float = None, separator: str = None) -> polars.DataFrame

   Load and preprocess the data with Polars.


.. py:function:: read_with_pandas(data_path, read_only_few: int = None, sample_triples_ratio: float = None, separator: str = None)

.. py:function:: read_from_disk(data_path: str, read_only_few: int = None, sample_triples_ratio: float = None, backend: str = None, separator: str = None) -> Tuple[polars.DataFrame, pandas.DataFrame]

.. py:function:: read_from_triple_store(endpoint: str = None)

   Read triples from a triple store into a pandas DataFrame.


.. py:function:: get_er_vocab(data, file_path: str = None)

.. py:function:: get_re_vocab(data, file_path: str = None)

.. py:function:: get_ee_vocab(data, file_path: str = None)
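
The three ``get_*_vocab`` signatures above do not say what the vocabularies contain. A common convention in link-prediction code, assumed here rather than confirmed by this page, is that ``er`` maps (head entity, relation) pairs to tails, ``re`` maps (relation, tail entity) pairs to heads, and ``ee`` maps (head, tail) pairs to relations. The sketch below builds such mappings from toy integer triples without calling the library.

```python
from collections import defaultdict

# Toy (head, relation, tail) triples; the values are illustrative only.
triples = [(0, 0, 1), (0, 0, 2), (3, 1, 1)]

er_vocab = defaultdict(list)  # assumed meaning: (head, relation) -> tails
re_vocab = defaultdict(list)  # assumed meaning: (relation, tail) -> heads
ee_vocab = defaultdict(list)  # assumed meaning: (head, tail) -> relations

for h, r, t in triples:
    er_vocab[(h, r)].append(t)
    re_vocab[(r, t)].append(h)
    ee_vocab[(h, t)].append(r)
```

Lookups of this kind are typically used to filter known answers during evaluation.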

.. py:function:: create_constraints(triples, file_path: str = None)

   (1) Extract the domains and ranges of relations.
   (2) Store a mapping from relations to entities that are outside of the domain and range.

   Create constrained entities based on the range of relations.

   :param triples:
   :return: Tuple[dict, dict]
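
The two steps listed for ``create_constraints`` can be sketched in plain Python. The helper name ``sketch_constraints``, the integer-indexed triples, and the exact shape of the returned dictionaries are assumptions made for illustration; the library's own return values may be organized differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sketch_constraints(
    triples: Iterable[Tuple[int, int, int]],
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Hypothetical helper: map each relation to entities outside its domain/range."""
    domain = defaultdict(set)  # relation -> entities seen as subject
    range_ = defaultdict(set)  # relation -> entities seen as object
    entities = set()
    # Step 1: extract the domain and range of every relation.
    for s, p, o in triples:
        domain[p].add(s)
        range_[p].add(o)
        entities.update((s, o))
    # Step 2: for each relation, keep the entities that fall outside them.
    outside_domain = {p: sorted(entities - domain[p]) for p in domain}
    outside_range = {p: sorted(entities - range_[p]) for p in domain}
    return outside_domain, outside_range


triples = [(0, 0, 1), (2, 0, 1), (1, 1, 2)]
outside_domain, outside_range = sketch_constraints(triples)
```

For relation ``0`` in this toy data, the domain is ``{0, 2}`` and the range is ``{1}``, so entity ``1`` is outside the domain and entities ``0`` and ``2`` are outside the range.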


.. py:function:: load_with_pandas(self) -> None

   Deserialize data


.. py:function:: save_numpy_ndarray(*, data: numpy.ndarray, file_path: str)

.. py:function:: load_numpy_ndarray(*, file_path: str)

.. py:function:: save_pickle(*, data: object, file_path=str)

.. py:function:: load_pickle(*, file_path=str)
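
``save_pickle`` and ``load_pickle`` take keyword-only arguments. Assuming they are thin wrappers around the standard ``pickle`` module (this page does not confirm that), a round trip might look like the following sketch. ``save_obj`` and ``load_obj`` are hypothetical stand-ins defined here and are not part of the library.

```python
import os
import pickle
import tempfile


def save_obj(*, data: object, file_path: str) -> None:
    """Hypothetical stand-in: serialize ``data`` to ``file_path``."""
    with open(file_path, "wb") as f:
        pickle.dump(data, f)


def load_obj(*, file_path: str) -> object:
    """Hypothetical stand-in: deserialize the object stored at ``file_path``."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.p")
    save_obj(data={("a", "r"): ["b"]}, file_path=path)
    restored = load_obj(file_path=path)
```

The bare ``*`` in the signatures means every argument must be passed by name, as in the calls above.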

.. py:function:: create_recipriocal_triples(x)

   Add inverse (reciprocal) triples to a Dask DataFrame.

   :param x:
   :return:
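
The idea of adding inverse triples can be sketched without any DataFrame library: every triple ``(s, p, o)` ` gains a counterpart ``(o, p_inverse, s)``. The helper name ``add_reciprocal_triples`` and the ``"_inverse"`` suffix are assumptions made for this sketch; the page does not state which naming scheme the library uses.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def add_reciprocal_triples(triples: List[Triple], suffix: str = "_inverse") -> List[Triple]:
    """Hypothetical helper: return the input triples followed by their inverses.

    The "_inverse" suffix is an assumption made for this sketch.
    """
    # Swap subject and object, and mark the relation as inverted.
    inverses = [(o, p + suffix, s) for s, p, o in triples]
    return list(triples) + inverses


triples = [("Alice", "knows", "Bob")]
augmented = add_reciprocal_triples(triples)
```

The result has twice as many triples as the input, and the number of distinct relations doubles as well.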


.. py:function:: dataset_sanity_checking(train_set: numpy.ndarray, num_entities: int, num_relations: int) -> None

   :param train_set:
   :param num_entities:
   :param num_relations:
   :return: