dicee.read_preprocess_save_load_kg.preprocess

Classes

PreprocessKG

Preprocess the data in memory

Module Contents

class dicee.read_preprocess_save_load_kg.preprocess.PreprocessKG(kg)

Preprocess the data in memory

kg
start() → None

Preprocess the train, validation, and test datasets stored in the knowledge graph instance.

Return type:

None

preprocess_with_byte_pair_encoding()
preprocess_with_byte_pair_encoding_with_padding() → None

Preprocess with byte pair encoding and add padding.
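
The padding step can be pictured with a minimal sketch. It assumes that each label has already been turned into a list of sub-word token ids and that shorter lists are padded on the right up to the longest length. The helper `pad_sequences`, the token ids, and the padding id `0` are hypothetical and are not taken from dicee.

```python
from typing import List


def pad_sequences(seqs: List[List[int]], pad_id: int) -> List[List[int]]:
    """Right-pad every token-id sequence to the length of the longest one."""
    max_len = max((len(s) for s in seqs), default=0)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


# Hypothetical sub-word token ids for three entity or relation labels.
tokenized = [[17, 4], [9], [3, 8, 21]]

# Assumption: 0 is reserved as the padding id.
padded = pad_sequences(tokenized, pad_id=0)
print(padded)
```

Equal-length sequences can be stacked into a single integer array, which is the usual reason for padding variable-length token lists.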

preprocess_with_pandas() → None

Preprocess with pandas: add reciprocal triples, construct the vocabulary, and index the datasets.
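
As a rough illustration of these three steps, the sketch below works on a toy DataFrame. The column names, the `_inverse` suffix, and the helper `add_reciprocal_triples` are assumptions made for this example and may differ from what dicee does internally.

```python
import pandas as pd

# Toy training triples; the column names are an assumption made for this sketch.
train = pd.DataFrame(
    [("alice", "knows", "bob"), ("bob", "worksAt", "acme")],
    columns=["subject", "relation", "object"],
)


def add_reciprocal_triples(df: pd.DataFrame, suffix: str = "_inverse") -> pd.DataFrame:
    """Append (o, r + suffix, s) for every (s, r, o). The suffix is hypothetical."""
    inverse = pd.DataFrame(
        {
            "subject": df["object"],
            "relation": df["relation"] + suffix,
            "object": df["subject"],
        }
    )
    return pd.concat([df, inverse], ignore_index=True)


# Step 1: add reciprocal triples.
augmented = add_reciprocal_triples(train)

# Step 2: construct the vocabulary (sorted so that the ids are deterministic).
entities = sorted(set(augmented["subject"]) | set(augmented["object"]))
relations = sorted(set(augmented["relation"]))
entity_to_idx = {name: i for i, name in enumerate(entities)}
relation_to_idx = {name: i for i, name in enumerate(relations)}

# Step 3: index the dataset by replacing every string with its integer id.
indexed = pd.DataFrame(
    {
        "subject": augmented["subject"].map(entity_to_idx),
        "relation": augmented["relation"].map(relation_to_idx),
        "object": augmented["object"].map(entity_to_idx),
    }
)
print(indexed)
```

Adding reciprocal triples doubles the number of rows and the number of relation types, so the relation vocabulary has to be built after that step rather than before it.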

preprocess_with_polars() → None

Preprocess with polars: add reciprocal triples and create indexed datasets.

sequential_vocabulary_construction() → None

  1. Read the input data into memory.

  2. Remove triples that meet a condition.

  3. Serialize the vocabularies in a pandas DataFrame in which the index is an integer and the single column is a string (e.g. a URI).
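
The vocabulary layout described in step 3 might look like the following sketch: one DataFrame per vocabulary, with an integer index and a single string column. The example URIs, the column names, and the helper `build_vocabulary` are hypothetical, and writing the frames to disk is left out.

```python
import pandas as pd

# Toy triples with URI strings; the URIs are placeholders.
triples = [
    ("http://example.org/alice", "http://example.org/knows", "http://example.org/bob"),
    ("http://example.org/bob", "http://example.org/knows", "http://example.org/carol"),
]


def build_vocabulary(items, column):
    """Collect distinct strings in first-seen order into a one-column DataFrame."""
    distinct = list(dict.fromkeys(items))  # dict keys preserve insertion order
    return pd.DataFrame({column: distinct})


# Entities come from both the subject and the object position.
entity_vocab = build_vocabulary(
    [x for s, _, o in triples for x in (s, o)], column="entity"
)
relation_vocab = build_vocabulary([r for _, r, _ in triples], column="relation")

print(entity_vocab)
print(relation_vocab)
```

With this layout the integer id of a string is its row label, and `entity_vocab["entity"][i]` returns the string stored for id `i`.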