dicee.read_preprocess_save_load_kg.preprocess

Classes

PreprocessKG

Preprocess the data in memory

Module Contents

class dicee.read_preprocess_save_load_kg.preprocess.PreprocessKG(kg)

Preprocess the data in memory

kg
start() -> None

Preprocess the train, validation, and test datasets stored in the knowledge graph instance.

Return type: None

preprocess_with_byte_pair_encoding()

preprocess_with_byte_pair_encoding_with_padding() -> None

preprocess_with_pandas() -> None

Preprocess the train, validation, and test datasets stored in the knowledge graph instance using pandas:

  1. Add reciprocal or noisy triples

  2. Construct vocabulary

  3. Index datasets

Return type: None
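The three steps listed for preprocess_with_pandas can be illustrated with a minimal sketch on toy data. This is not the dicee implementation: the column names, the "_inverse" suffix, and the helper functions add_reciprocal_triples, build_vocabulary, and index_triples are assumptions made for illustration only, and the noisy-triple variant is omitted.

```python
import pandas as pd

# Toy training triples; the column names are assumptions for this sketch.
train = pd.DataFrame(
    [("alice", "knows", "bob"), ("bob", "likes", "carol")],
    columns=["subject", "relation", "object"],
)


def add_reciprocal_triples(df: pd.DataFrame) -> pd.DataFrame:
    """Step 1: append (o, r_inverse, s) for every (s, r, o)."""
    inverse = pd.DataFrame(
        {
            "subject": df["object"],
            "relation": df["relation"] + "_inverse",  # hypothetical naming scheme
            "object": df["subject"],
        }
    )
    return pd.concat([df, inverse], ignore_index=True)


def build_vocabulary(values: pd.Series) -> dict:
    """Step 2: map each distinct string to an integer id, in order of first appearance."""
    return {value: idx for idx, value in enumerate(values.unique())}


def index_triples(df: pd.DataFrame, entity_to_idx: dict, relation_to_idx: dict) -> pd.DataFrame:
    """Step 3: replace every string with its integer id."""
    return pd.DataFrame(
        {
            "subject": df["subject"].map(entity_to_idx),
            "relation": df["relation"].map(relation_to_idx),
            "object": df["object"].map(entity_to_idx),
        }
    )


augmented = add_reciprocal_triples(train)
entity_to_idx = build_vocabulary(
    pd.concat([augmented["subject"], augmented["object"]], ignore_index=True)
)
relation_to_idx = build_vocabulary(augmented["relation"])
indexed = index_triples(augmented, entity_to_idx, relation_to_idx)
print(indexed)
```

In this sketch the vocabularies are built after the reciprocal triples are added, so that the inverse relations also receive integer ids.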

preprocess_with_polars() -> None

sequential_vocabulary_construction() -> None

  1. Read input data into memory

  2. Remove triples with a condition

  3. Serialize vocabularies in a pandas DataFrame whose index is an integer and whose single column is a string (e.g. a URI)
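The vocabulary layout described in the last step (integer index, single string column) might look like the following sketch. The example URIs and the column name "entity" are assumptions for illustration, not names taken from dicee, and the condition-based triple removal is not shown because the source does not specify the condition.

```python
import pandas as pd

# Entity strings as they might appear across triples (hypothetical URIs).
entities = [
    "http://example.org/alice",
    "http://example.org/bob",
    "http://example.org/alice",
    "http://example.org/carol",
]

# De-duplicate while keeping first-seen order; dict keys preserve insertion order.
unique_entities = list(dict.fromkeys(entities))

# The default RangeIndex (0..n-1) serves as the integer id,
# and the single column holds the string form of each entity.
entity_vocab = pd.DataFrame({"entity": unique_entities})

# Reverse lookup from string to integer id.
entity_to_idx = {uri: idx for idx, uri in entity_vocab["entity"].items()}

print(entity_vocab)
```

Keeping the integer id in the index means a lookup from id to string is a plain positional access such as entity_vocab.loc[2, "entity"].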

remove_triples_from_train_with_condition()