dicee

Submodules

Attributes

__version__

Classes

Pyke

A Physical Embedding Model for Knowledge Graphs

DistMult

Embedding Entities and Relations for Learning and Inference in Knowledge Bases

KeciBase

Without learning dimension scaling

Keci

Base class for all neural network modules.

TransE

Translating Embeddings for Modeling Multi-relational Data

DeCaL

Base class for all neural network modules.

DualE

Dual Quaternion Knowledge Graph Embeddings (https://ojs.aaai.org/index.php/AAAI/article/download/16850/16657)

ComplEx

Base class for all neural network modules.

AConEx

Additive Convolutional ComplEx Knowledge Graph Embeddings

AConvO

Additive Convolutional Octonion Knowledge Graph Embeddings

AConvQ

Additive Convolutional Quaternion Knowledge Graph Embeddings

ConvQ

Convolutional Quaternion Knowledge Graph Embeddings

ConvO

Base class for all neural network modules.

ConEx

Convolutional ComplEx Knowledge Graph Embeddings

QMult

Base class for all neural network modules.

OMult

Base class for all neural network modules.

Shallom

A shallow neural model for relation prediction (https://arxiv.org/abs/2101.09090)

LFMult

Embedding with polynomial functions.

PykeenKGE

A class for using knowledge graph embedding models implemented in Pykeen

BytE

Base class for all neural network modules.

BaseKGE

Base class for all neural network modules.

EnsembleKGE

DICE_Trainer

DICE_Trainer implement

KGE

Knowledge Graph Embedding Class for interactive usage of pre-trained models

Execute

A class for training, retraining, and evaluating a model.

BPE_NegativeSamplingDataset

An abstract class representing a Dataset.

MultiLabelDataset

An abstract class representing a Dataset.

MultiClassClassificationDataset

Dataset for the 1vsALL training strategy

OnevsAllDataset

Dataset for the 1vsALL training strategy

KvsAll

Creates a dataset for KvsAll training by inheriting from torch.utils.data.Dataset.

AllvsAll

Creates a dataset for AllvsAll training by inheriting from torch.utils.data.Dataset.

OnevsSample

A custom PyTorch Dataset class for knowledge graph embeddings.

KvsSampleDataset

KvsSample a Dataset:

NegSampleDataset

An abstract class representing a Dataset.

TriplePredictionDataset

Triple Dataset

CVDataModule

Create a Dataset for cross validation

QueryGenerator

Functions

create_recipriocal_triples(x)

Add inverse triples into dask dataframe

get_er_vocab(data[, file_path])

get_re_vocab(data[, file_path])

get_ee_vocab(data[, file_path])

timeit(func)

save_pickle(*[, data, file_path])

load_pickle([file_path])

load_term_mapping([file_path])

select_model(args[, is_continual_training, storage_path])

load_model(→ Tuple[object, Tuple[dict, dict]])

Load weights and initialize pytorch module from namespace arguments

load_model_ensemble(...)

Construct an ensemble of weights and initialize a PyTorch module from namespace arguments

save_numpy_ndarray(*, data, file_path)

numpy_data_type_changer(→ numpy.ndarray)

Detect the most efficient data type for the given triples

save_checkpoint_model(→ None)

Store a PyTorch model on disk

store(→ None)

add_noisy_triples(→ pandas.DataFrame)

Add randomly constructed triples

read_or_load_kg(args, cls)

intialize_model(→ Tuple[object, str])

load_json(→ dict)

save_embeddings(→ None)

Save it as CSV if memory allows.

random_prediction(pre_trained_kge)

deploy_triple_prediction(pre_trained_kge, str_subject, ...)

deploy_tail_entity_prediction(pre_trained_kge, ...)

deploy_head_entity_prediction(pre_trained_kge, ...)

deploy_relation_prediction(pre_trained_kge, ...)

vocab_to_parquet(vocab_to_idx, name, ...)

create_experiment_folder([folder_name])

continual_training_setup_executor(→ None)

exponential_function(→ torch.FloatTensor)

load_numpy(→ numpy.ndarray)

evaluate(entity_to_idx, scores, easy_answers, hard_answers)

# @TODO: CD: Renamed this function

download_file(url[, destination_folder])

download_files_from_url(→ None)

download_pretrained_model(→ str)

write_csv_from_model_parallel(path)

Create

from_pretrained_model_write_embeddings_into_csv(→ None)

mapping_from_first_two_cols_to_third(train_set_idx)

reload_dataset(path, form_of_labelling, ...)

Reload the files from disk to construct the PyTorch dataset

construct_dataset(→ torch.utils.data.Dataset)

Package Contents

class dicee.Pyke(args)[source]

Bases: dicee.models.base_model.BaseKGE

A Physical Embedding Model for Knowledge Graphs

name = 'Pyke'
dist_func
margin = 1.0
forward_triples(x: torch.LongTensor)[source]
Parameters:

x

class dicee.DistMult(args)[source]

Bases: dicee.models.base_model.BaseKGE

Embedding Entities and Relations for Learning and Inference in Knowledge Bases https://arxiv.org/abs/1412.6575

name = 'DistMult'
k_vs_all_score(emb_h: torch.FloatTensor, emb_r: torch.FloatTensor, emb_E: torch.FloatTensor)[source]
Parameters:
  • emb_h

  • emb_r

  • emb_E
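As a rough illustration of what K-vs-all scoring means for DistMult, the sketch below scores every entity as a candidate tail using the trilinear product sum_i h_i r_i t_i. The function name `distmult_k_vs_all` is hypothetical, and the real method may additionally apply dropout or normalization, which this sketch leaves out.

```python
import torch


def distmult_k_vs_all(emb_h, emb_r, emb_E):
    """Hypothetical sketch: score all entities as tails for each (h, r) pair.

    emb_h, emb_r: (n, d) head and relation embeddings.
    emb_E: (num_entities, d) embeddings of all entities.
    Returns a (n, num_entities) score matrix.
    """
    # sum_i h_i * r_i * t_i for every candidate tail t, as one matrix product.
    return (emb_h * emb_r) @ emb_E.T


emb_h = torch.tensor([[1.0, 2.0]])
emb_r = torch.tensor([[3.0, 4.0]])
emb_E = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = distmult_k_vs_all(emb_h, emb_r, emb_E)
print(scores)  # tensor([[ 3.,  8., 11.]])
```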

forward_k_vs_all(x: torch.LongTensor)[source]
forward_k_vs_sample(x: torch.LongTensor, target_entity_idx: torch.LongTensor)[source]
score(h, r, t)[source]
class dicee.KeciBase(args)[source]

Bases: Keci

Without learning dimension scaling

name = 'KeciBase'
requires_grad_for_interactions = False
class dicee.Keci(args)[source]

Bases: dicee.models.base_model.BaseKGE

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note

As per the example above, an __init__() call to the parent class must be made before assignment on the child.

Variables:

training (bool) – Boolean represents whether this module is in training or evaluation mode.

name = 'Keci'
p
q
r
requires_grad_for_interactions = True
compute_sigma_pp(hp, rp)[source]

Compute sigma_{pp} = sum_{i=1}^{p-1} sum_{k=i+1}^p (h_i r_k - h_k r_i) e_i e_k

sigma_{pp} captures the interactions between the p bases. For instance, for p = 3 with bases e_1, e_2, e_3, we compute the interactions e_1 e_2, e_1 e_3, and e_2 e_3. This can be implemented with two nested for loops:

results = []
for i in range(p - 1):
    for k in range(i + 1, p):
        results.append(hp[:, :, i] * rp[:, :, k] - hp[:, :, k] * rp[:, :, i])
sigma_pp = torch.stack(results, dim=2)
assert sigma_pp.shape == (b, r, int((p * (p - 1)) / 2))

Yet, this computation would be quite inefficient. Instead, we compute the interactions along all p bases, e.g., e1e1, e1e2, e1e3, e2e1, e2e2, e2e3, e3e1, e3e2, e3e3, and then select the upper triangular part without the diagonal: e1e2, e1e3, e2e3.
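The "compute all pairs, then keep the strict upper triangle" idea can be sketched as follows. This is a minimal illustration, not necessarily how the library implements it; the names `sigma_pp_loops` and `sigma_pp_vectorized` are hypothetical, and the inputs are assumed to have shape (b, r, p).

```python
import torch


def sigma_pp_loops(hp, rp):
    """Reference version with two nested loops (shape (b, r, p*(p-1)/2))."""
    p = hp.shape[-1]
    results = []
    for i in range(p - 1):
        for k in range(i + 1, p):
            results.append(hp[:, :, i] * rp[:, :, k] - hp[:, :, k] * rp[:, :, i])
    return torch.stack(results, dim=2)


def sigma_pp_vectorized(hp, rp):
    """Compute all p x p interactions at once, then keep entries with i < k."""
    p = hp.shape[-1]
    outer = torch.einsum("bri,brk->brik", hp, rp)  # h_i * r_k for all i, k
    antisym = outer - outer.transpose(-1, -2)      # h_i r_k - h_k r_i
    rows, cols = torch.triu_indices(p, p, offset=1)  # strict upper triangle
    return antisym[:, :, rows, cols]


torch.manual_seed(0)
hp, rp = torch.randn(4, 2, 3), torch.randn(4, 2, 3)
loop_result = sigma_pp_loops(hp, rp)
fast_result = sigma_pp_vectorized(hp, rp)
```

Both versions return the interactions in the same order, (e1e2, e1e3, e2e3) for p = 3, because `torch.triu_indices` enumerates the upper triangle row by row.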

compute_sigma_qq(hq, rq)[source]

Compute sigma_{qq} = sum_{j=p+1}^{p+q-1} sum_{k=j+1}^{p+q} (h_j r_k - h_k r_j) e_j e_k

sigma_{qq} captures the interactions between the q bases. For instance, for q = 3 with bases e_1, e_2, e_3, we compute the interactions e_1 e_2, e_1 e_3, and e_2 e_3. This can be implemented with two nested for loops:

results = []
for j in range(q - 1):
    for k in range(j + 1, q):
        results.append(hq[:, :, j] * rq[:, :, k] - hq[:, :, k] * rq[:, :, j])
sigma_qq = torch.stack(results, dim=2)
assert sigma_qq.shape == (b, r, int((q * (q - 1)) / 2))

Yet, this computation would be quite inefficient. Instead, we compute the interactions along all q bases, e.g., e1e1, e1e2, e1e3, e2e1, e2e2, e2e3, e3e1, e3e2, e3e3, and then select the upper triangular part without the diagonal: e1e2, e1e3, e2e3.

compute_sigma_pq(*, hp, hq, rp, rq)[source]

Compute sigma_{pq} = sum_{i=1}^{p} sum_{j=p+1}^{p+q} (h_i r_j - h_j r_i) e_i e_j

sigma_pq = torch.zeros(b, r, p, q)
for i in range(p):
    for j in range(q):
        sigma_pq[:, :, i, j] = hp[:, :, i] * rq[:, :, j] - hq[:, :, j] * rp[:, :, i]
print(sigma_pq.shape)
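The double loop above can also be written without Python loops. The following sketch assumes hp and rp have shape (b, r, p) and hq and rq have shape (b, r, q); the function names are hypothetical and the einsum formulation is one possible choice, not necessarily the one used in the library.

```python
import torch


def sigma_pq_loops(*, hp, hq, rp, rq):
    """Reference version: fill a (b, r, p, q) tensor entry by entry."""
    b, r, p = hp.shape
    q = hq.shape[-1]
    sigma_pq = torch.zeros(b, r, p, q)
    for i in range(p):
        for j in range(q):
            sigma_pq[:, :, i, j] = hp[:, :, i] * rq[:, :, j] - hq[:, :, j] * rp[:, :, i]
    return sigma_pq


def sigma_pq_vectorized(*, hp, hq, rp, rq):
    """Same result via two outer products: h_i r_j - h_j r_i."""
    return torch.einsum("bri,brj->brij", hp, rq) - torch.einsum("bri,brj->brij", rp, hq)


torch.manual_seed(0)
hp, rp = torch.randn(4, 2, 3), torch.randn(4, 2, 3)
hq, rq = torch.randn(4, 2, 5), torch.randn(4, 2, 5)
loop_result = sigma_pq_loops(hp=hp, hq=hq, rp=rp, rq=rq)
fast_result = sigma_pq_vectorized(hp=hp, hq=hq, rp=rp, rq=rq)
```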

apply_coefficients(hp, hq, rp, rq)[source]

Multiplying a base vector with its scalar coefficient

clifford_multiplication(h0, hp, hq, r0, rp, rq)[source]

Compute the Clifford (Cl) multiplication of two multivectors

h = h_0 + sum_{i=1}^p h_i e_i + sum_{j=p+1}^{p+q} h_j e_j
r = r_0 + sum_{i=1}^p r_i e_i + sum_{j=p+1}^{p+q} r_j e_j

where the bases satisfy

e_i^2 = +1 for 1 <= i <= p
e_j^2 = -1 for p < j <= p+q
e_i e_j = -e_j e_i for i != j

h r = sigma_0 + sigma_p + sigma_q + sigma_{pp} + sigma_{qq} + sigma_{pq} where

  1. sigma_0 = h_0 r_0 + sum_{i=1}^p h_i r_i - sum_{j=p+1}^{p+q} h_j r_j

  2. sigma_p = sum_{i=1}^p (h_0 r_i + h_i r_0) e_i

  3. sigma_q = sum_{j=p+1}^{p+q} (h_0 r_j + h_j r_0) e_j

  4. sigma_{pp} = sum_{i=1}^{p-1} sum_{k=i+1}^p (h_i r_k - h_k r_i) e_i e_k

  5. sigma_{qq} = sum_{j=p+1}^{p+q-1} sum_{k=j+1}^{p+q} (h_j r_k - h_k r_j) e_j e_k

  6. sigma_{pq} = sum_{i=1}^{p} sum_{j=p+1}^{p+q} (h_i r_j - h_j r_i) e_i e_j
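To make the scalar and vector parts concrete, here is a minimal sketch of sigma_0, sigma_p and sigma_q only. It relies on the basis rules stated for this method (bases 1..p square to +1, bases p+1..p+q square to -1), so the scalar part collects h_0 r_0 plus the products of matching p-coefficients minus the products of matching q-coefficients. The function name is hypothetical, and the shapes (b, r) for h0, r0 and (b, r, p), (b, r, q) for the rest are assumptions.

```python
import torch


def clifford_low_order_terms(h0, hp, hq, r0, rp, rq):
    """Hypothetical sketch of the scalar and vector parts of h * r."""
    # Scalar part: e_i^2 = +1 contributes +h_i r_i, e_j^2 = -1 contributes -h_j r_j.
    sigma_0 = h0 * r0 + (hp * rp).sum(dim=-1) - (hq * rq).sum(dim=-1)
    # Vector parts: coefficients of e_i (p bases) and e_j (q bases).
    sigma_p = h0.unsqueeze(-1) * rp + hp * r0.unsqueeze(-1)
    sigma_q = h0.unsqueeze(-1) * rq + hq * r0.unsqueeze(-1)
    return sigma_0, sigma_p, sigma_q


# Sanity check with p = 0, q = 1, which behaves like the complex numbers:
# (1 + 2i)(3 + 4i) = -5 + 10i.
h0, hq, hp = torch.tensor([[1.0]]), torch.tensor([[[2.0]]]), torch.zeros(1, 1, 0)
r0, rq, rp = torch.tensor([[3.0]]), torch.tensor([[[4.0]]]), torch.zeros(1, 1, 0)
sigma_0, sigma_p, sigma_q = clifford_low_order_terms(h0, hp, hq, r0, rp, rq)
print(sigma_0, sigma_q)  # tensor([[-5.]]) tensor([[[10.]]])
```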

construct_cl_multivector(x: torch.FloatTensor, r: int, p: int, q: int) tuple[torch.FloatTensor, torch.FloatTensor, torch.FloatTensor][source]

Construct a batch of multivectors Cl_{p,q}(mathbb{R}^d)

Parameter

x: torch.FloatTensor with (n,d) shape

returns:
  • a0 (torch.FloatTensor with (n,r) shape)

  • ap (torch.FloatTensor with (n,r,p) shape)

  • aq (torch.FloatTensor with (n,r,q) shape)
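The shapes above imply d = r * (1 + p + q). One way to obtain such a split is sketched below; the memory layout (scalar parts first, then all p-coefficients, then all q-coefficients) is an assumption made for illustration and may differ from the library's actual layout.

```python
import torch


def construct_cl_multivector(x, r, p, q):
    """Hypothetical sketch: split (n, d) embeddings into a0, ap, aq."""
    n, d = x.shape
    assert d == r * (1 + p + q), "d must equal r * (1 + p + q)"
    a0 = x[:, :r]                                  # (n, r) scalar parts
    ap = x[:, r:r + r * p].reshape(n, r, p)        # (n, r, p) coefficients of the p bases
    aq = x[:, r + r * p:].reshape(n, r, q)         # (n, r, q) coefficients of the q bases
    return a0, ap, aq


x = torch.arange(24.0).reshape(2, 12)
a0, ap, aq = construct_cl_multivector(x, r=2, p=3, q=2)
print(a0.shape, ap.shape, aq.shape)
# torch.Size([2, 2]) torch.Size([2, 2, 3]) torch.Size([2, 2, 2])
```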

forward_k_vs_with_explicit(x: torch.Tensor)[source]
k_vs_all_score(bpe_head_ent_emb, bpe_rel_ent_emb, E)[source]
forward_k_vs_all(x: torch.Tensor) torch.FloatTensor[source]

Kvsall training

  1. Retrieve real-valued embedding vectors for heads and relations mathbb{R}^d .

  2. Construct head entity and relation embeddings according to Cl_{p,q}(mathbb{R}^d) .

  3. Perform Cl multiplication

  4. Inner product of (3) and all entity embeddings

forward_k_vs_with_explicit and this function are identical.

Parameter

x: torch.LongTensor with (n,2) shape

rtype:

torch.FloatTensor with (n, |E|) shape

construct_batch_selected_cl_multivector(x: torch.FloatTensor, r: int, p: int, q: int) tuple[torch.FloatTensor, torch.FloatTensor, torch.FloatTensor][source]

Construct a batch of batches of multivectors Cl_{p,q}(mathbb{R}^d)

Parameter

x: torch.FloatTensor with (n,k, d) shape

returns:
  • a0 (torch.FloatTensor with (n,k, m) shape)

  • ap (torch.FloatTensor with (n,k, m, p) shape)

  • aq (torch.FloatTensor with (n,k, m, q) shape)

forward_k_vs_sample(x: torch.LongTensor, target_entity_idx: torch.LongTensor) torch.FloatTensor[source]

Parameter

x: torch.LongTensor with (n,2) shape

target_entity_idx: torch.LongTensor with (n, k ) shape k denotes the selected number of examples.

rtype:

torch.FloatTensor with (n, k) shape

score(h, r, t)[source]
forward_triples(x: torch.Tensor) torch.FloatTensor[source]

Parameter

x: torch.LongTensor with (n,3) shape

rtype:

torch.FloatTensor with (n) shape

class dicee.TransE(args)[source]

Bases: dicee.models.base_model.BaseKGE

Translating Embeddings for Modeling Multi-relational Data https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf

name = 'TransE'
margin = 4
score(head_ent_emb, rel_ent_emb, tail_ent_emb)[source]
forward_k_vs_all(x: torch.Tensor) torch.FloatTensor[source]
class dicee.DeCaL(args)[source]

Bases: dicee.models.base_model.BaseKGE

name = 'DeCaL'
entity_embeddings
relation_embeddings
p
q
r
re
forward_triples(x: torch.Tensor) torch.FloatTensor[source]

Parameter

x: torch.LongTensor with (n, ) shape

rtype:

torch.FloatTensor with (n) shape

cl_pqr(a: torch.tensor) torch.tensor[source]

Input: tensor of size (batch_size, emb_dim). Output: a list of 1+p+q+r component tensors, each of size (batch_size, emb_dim/(1+p+q+r)).

  1. Take a tensor of size (batch_size, emb_dim) and split it into 1+p+q+r components; hence 1+p+q+r must be a divisor of emb_dim.

  2. Return a list of the 1+p+q+r component vectors, each a tensor of size (batch_size, emb_dim/(1+p+q+r)).
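The split described above can be sketched with `torch.chunk`. This is an illustrative stand-alone function (the real method reads p, q and r from the model instance), and the choice of `torch.chunk` is an assumption; any equal-width column split gives the same result.

```python
import torch


def cl_pqr(a, p, q, r):
    """Hypothetical sketch: split (batch_size, emb_dim) into 1+p+q+r equal parts."""
    num_components = 1 + p + q + r
    assert a.shape[1] % num_components == 0, "1+p+q+r must divide emb_dim"
    # Each chunk has shape (batch_size, emb_dim / (1+p+q+r)).
    return list(torch.chunk(a, num_components, dim=1))


components = cl_pqr(torch.zeros(5, 12), p=1, q=1, r=1)
print(len(components), components[0].shape)  # 4 torch.Size([5, 3])
```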

compute_sigmas_single(list_h_emb, list_r_emb, list_t_emb)[source]

Here we compute all the sums that involve no interaction between different basis vectors, each taken in the scalar product with t, that is,

\[\begin{aligned}
s0 &= h_0 r_0 t_0 \\
s1 &= \sum_{i=1}^{p} h_i r_i t_0 \\
s2 &= \sum_{j=p+1}^{p+q} h_j r_j t_0 \\
s3 &= \sum_{i=1}^{p} (h_0 r_i t_i + h_i r_0 t_i) \\
s4 &= \sum_{i=p+1}^{p+q} (h_0 r_i t_i + h_i r_0 t_i) \\
s5 &= \sum_{i=p+q+1}^{p+q+r} (h_0 r_i t_i + h_i r_0 t_i)
\end{aligned}\]

and return:

\[\sigma_{0t} = \sigma_0 \cdot t_0 = s0 + s1 - s2, \quad s3, \quad s4 \quad \text{and} \quad s5\]
compute_sigmas_multivect(list_h_emb, list_r_emb)[source]

Here we compute and return all the sums with vector interactions, for the same and for different bases.

For interactions between vectors of the same base we have

\[\begin{aligned}
\sigma_{pp} &= \sum_{i=1}^{p-1}\sum_{i'=i+1}^{p}(h_i r_{i'} - h_{i'} r_i) && \text{(interactions between } e_i \text{ and } e_{i'} \text{ for } 1 \le i, i' \le p\text{)} \\
\sigma_{qq} &= \sum_{j=p+1}^{p+q-1}\sum_{j'=j+1}^{p+q}(h_j r_{j'} - h_{j'} r_j) && \text{(interactions between } e_j \text{ and } e_{j'} \text{ for } p+1 \le j, j' \le p+q\text{)} \\
\sigma_{rr} &= \sum_{k=p+q+1}^{p+q+r-1}\sum_{k'=k+1}^{p+q+r}(h_k r_{k'} - h_{k'} r_k) && \text{(interactions between } e_k \text{ and } e_{k'} \text{ for } p+q+1 \le k, k' \le p+q+r\text{)}
\end{aligned}\]

For interactions between vectors of different bases we have

\[\begin{aligned}
\sigma_{pq} &= \sum_{i=1}^{p}\sum_{j=p+1}^{p+q}(h_i r_j - h_j r_i) && \text{(interactions between } e_i \text{ and } e_j \text{ for } 1 \le i \le p \text{ and } p+1 \le j \le p+q\text{)} \\
\sigma_{pr} &= \sum_{i=1}^{p}\sum_{k=p+q+1}^{p+q+r}(h_i r_k - h_k r_i) && \text{(interactions between } e_i \text{ and } e_k \text{ for } 1 \le i \le p \text{ and } p+q+1 \le k \le p+q+r\text{)} \\
\sigma_{qr} &= \sum_{j=p+1}^{p+q}\sum_{k=p+q+1}^{p+q+r}(h_j r_k - h_k r_j) && \text{(interactions between } e_j \text{ and } e_k \text{ for } p+1 \le j \le p+q \text{ and } p+q+1 \le k \le p+q+r\text{)}
\end{aligned}\]
forward_k_vs_all(x: torch.Tensor) torch.FloatTensor[source]

Kvsall training

  1. Retrieve real-valued embedding vectors for heads and relations

  2. Construct head entity and relation embeddings according to Cl_{p,q, r}(mathbb{R}^d) .

  3. Perform Cl multiplication

  4. Inner product of (3) and all entity embeddings

forward_k_vs_with_explicit and this function are identical.

Parameter

x: torch.LongTensor with (n, ) shape

rtype:

torch.FloatTensor with (n, |E|) shape

apply_coefficients(h0, hp, hq, hk, r0, rp, rq, rk)[source]

Multiplying a base vector with its scalar coefficient

construct_cl_multivector(x: torch.FloatTensor, re: int, p: int, q: int, r: int) tuple[torch.FloatTensor, torch.FloatTensor, torch.FloatTensor][source]

Construct a batch of multivectors Cl_{p,q,r}(mathbb{R}^d)

Parameter

x: torch.FloatTensor with (n,d) shape

returns:
  • a0 (torch.FloatTensor)

  • ap (torch.FloatTensor)

  • aq (torch.FloatTensor)

  • ar (torch.FloatTensor)

compute_sigma_pp(hp, rp)[source]

Compute

\[\sigma_{p,p}^* = \sum_{i=1}^{p-1}\sum_{i'=i+1}^{p}(x_iy_{i'}-x_{i'}y_i)\]

sigma_{pp} captures the interactions between the p bases. For instance, for p = 3 with bases e_1, e_2, e_3, we compute the interactions e_1 e_2, e_1 e_3, and e_2 e_3. This can be implemented with two nested for loops:

results = []
for i in range(p - 1):
    for k in range(i + 1, p):
        results.append(hp[:, :, i] * rp[:, :, k] - hp[:, :, k] * rp[:, :, i])
sigma_pp = torch.stack(results, dim=2)
assert sigma_pp.shape == (b, r, int((p * (p - 1)) / 2))

Yet, this computation would be quite inefficient. Instead, we compute the interactions along all p bases, e.g., e1e1, e1e2, e1e3, e2e1, e2e2, e2e3, e3e1, e3e2, e3e3, and then select the upper triangular part without the diagonal: e1e2, e1e3, e2e3.

compute_sigma_qq(hq, rq)[source]

Compute (Eq. 16)

\[\sigma_{q,q}^* = \sum_{j=p+1}^{p+q-1}\sum_{j'=j+1}^{p+q}(x_jy_{j'}-x_{j'}y_j)\]

sigma_{qq} captures the interactions between the q bases. For instance, for q = 3 with bases e_1, e_2, e_3, we compute the interactions e_1 e_2, e_1 e_3, and e_2 e_3. This can be implemented with two nested for loops:

results = []
for j in range(q - 1):
    for k in range(j + 1, q):
        results.append(hq[:, :, j] * rq[:, :, k] - hq[:, :, k] * rq[:, :, j])
sigma_qq = torch.stack(results, dim=2)
assert sigma_qq.shape == (b, r, int((q * (q - 1)) / 2))

Yet, this computation would be quite inefficient. Instead, we compute the interactions along all q bases, e.g., e1e1, e1e2, e1e3, e2e1, e2e2, e2e3, e3e1, e3e2, e3e3, and then select the upper triangular part without the diagonal: e1e2, e1e3, e2e3.

compute_sigma_rr(hk, rk)[source]
\[\sigma_{r,r}^* = \sum_{k=p+q+1}^{p+q+r-1}\sum_{k'=k+1}^{p+q+r}(x_ky_{k'}-x_{k'}y_k)\]
compute_sigma_pq(*, hp, hq, rp, rq)[source]

Compute

\[\sum_{i=1}^{p} \sum_{j=p+1}^{p+q} (h_i r_j - h_j r_i) e_i e_j\]

sigma_pq = torch.zeros(b, r, p, q)
for i in range(p):
    for j in range(q):
        sigma_pq[:, :, i, j] = hp[:, :, i] * rq[:, :, j] - hq[:, :, j] * rp[:, :, i]
print(sigma_pq.shape)

compute_sigma_pr(*, hp, hk, rp, rk)[source]

Compute

\[\sum_{i=1}^{p} \sum_{k=p+q+1}^{p+q+r} (h_i r_k - h_k r_i) e_i e_k\]

# re: number of multivectors per embedding
sigma_pr = torch.zeros(b, re, p, r)
for i in range(p):
    for k in range(r):
        sigma_pr[:, :, i, k] = hp[:, :, i] * rk[:, :, k] - hk[:, :, k] * rp[:, :, i]
print(sigma_pr.shape)

compute_sigma_qr(*, hq, hk, rq, rk)[source]
\[\sum_{j=p+1}^{p+q} \sum_{k=p+q+1}^{p+q+r} (h_j r_k - h_k r_j) e_j e_k\]

# re: number of multivectors per embedding
sigma_qr = torch.zeros(b, re, q, r)
for j in range(q):
    for k in range(r):
        sigma_qr[:, :, j, k] = hq[:, :, j] * rk[:, :, k] - hk[:, :, k] * rq[:, :, j]
print(sigma_qr.shape)

class dicee.DualE(args)[source]

Bases: dicee.models.base_model.BaseKGE

Dual Quaternion Knowledge Graph Embeddings (https://ojs.aaai.org/index.php/AAAI/article/download/16850/16657)

name = 'DualE'
entity_embeddings
relation_embeddings
num_ent = None
kvsall_score(e_1_h, e_2_h, e_3_h, e_4_h, e_5_h, e_6_h, e_7_h, e_8_h, e_1_t, e_2_t, e_3_t, e_4_t, e_5_t, e_6_t, e_7_t, e_8_t, r_1, r_2, r_3, r_4, r_5, r_6, r_7, r_8) torch.tensor[source]

KvsAll scoring function

Input

x: torch.LongTensor with (n, ) shape

Output

torch.FloatTensor with (n) shape

forward_triples(idx_triple: torch.tensor) torch.tensor[source]

Negative Sampling forward pass:

Input

x: torch.LongTensor with (n, ) shape

Output

torch.FloatTensor with (n) shape

forward_k_vs_all(x)[source]

KvsAll forward pass

Input

x: torch.LongTensor with (n, ) shape

Output

torch.FloatTensor with (n) shape

T(x: torch.tensor) torch.tensor[source]

Transpose function

Input: Tensor with shape (n x m)

Output: Tensor with shape (m x n)

class dicee.ComplEx(args)[source]

Bases: dicee.models.base_model.BaseKGE

name = 'ComplEx'
static score(head_ent_emb: torch.FloatTensor, rel_ent_emb: torch.FloatTensor, tail_ent_emb: torch.FloatTensor)[source]
static k_vs_all_score(emb_h: torch.FloatTensor, emb_r: torch.FloatTensor, emb_E: torch.FloatTensor)[source]
Parameters:
  • emb_h

  • emb_r

  • emb_E

forward_k_vs_all(x: torch.LongTensor) torch.FloatTensor[source]
forward_k_vs_sample(x: torch.LongTensor, target_entity_idx: torch.LongTensor)[source]
class dicee.AConEx(args)[source]

Bases: dicee.models.base_model.BaseKGE

Additive Convolutional ComplEx Knowledge Graph Embeddings

name = 'AConEx'
conv2d
fc_num_input
fc1
norm_fc1
bn_conv2d
feature_map_dropout
residual_convolution(C_1: Tuple[torch.Tensor, torch.Tensor], C_2: Tuple[torch.Tensor, torch.Tensor]) torch.FloatTensor[source]

Compute the residual score of two complex-valued embeddings.

Parameters:
  • C_1 – a tuple of two PyTorch tensors that corresponds to a complex-valued embedding

  • C_2 – a tuple of two PyTorch tensors that corresponds to a complex-valued embedding

forward_k_vs_all(x: torch.Tensor) torch.FloatTensor[source]
forward_triples(x: torch.Tensor) torch.FloatTensor[source]
Parameters:

x

forward_k_vs_sample(x: torch.Tensor, target_entity_idx: torch.Tensor)[source]
class dicee.AConvO(args: dict)[source]

Bases: dicee.models.base_model.BaseKGE

Additive Convolutional Octonion Knowledge Graph Embeddings

name = 'AConvO'
conv2d
fc_num_input
fc1
bn_conv2d
norm_fc1
feature_map_dropout
static octonion_normalizer(emb_rel_e0, emb_rel_e1, emb_rel_e2, emb_rel_e3, emb_rel_e4, emb_rel_e5, emb_rel_e6, emb_rel_e7)[source]
residual_convolution(O_1, O_2)[source]
forward_triples(x: torch.Tensor) torch.Tensor[source]
Parameters:

x

forward_k_vs_all(x: torch.Tensor)[source]

Given a head entity and a relation (h, r), we compute scores for all entities: [score(h,r,x) | x in Entities] => [0.0, 0.1, …, 0.8], shape => (1, |Entities|). Given a batch of head entities and relations, the shape is (size of batch, |Entities|).

class dicee.AConvQ(args)[source]

Bases: dicee.models.base_model.BaseKGE

Additive Convolutional Quaternion Knowledge Graph Embeddings

name = 'AConvQ'
entity_embeddings
relation_embeddings
conv2d
fc_num_input
fc1
bn_conv1
bn_conv2
feature_map_dropout
residual_convolution(Q_1, Q_2)[source]
forward_triples(indexed_triple: torch.Tensor) torch.Tensor[source]
Parameters:

x

forward_k_vs_all(x: torch.Tensor)[source]

Given a head entity and a relation (h, r), we compute scores for all entities: [score(h,r,x) | x in Entities] => [0.0, 0.1, …, 0.8], shape => (1, |Entities|). Given a batch of head entities and relations, the shape is (size of batch, |Entities|).

class dicee.ConvQ(args)[source]

Bases: dicee.models.base_model.BaseKGE

Convolutional Quaternion Knowledge Graph Embeddings

name = 'ConvQ'
entity_embeddings
relation_embeddings
conv2d
fc_num_input
fc1
bn_conv1
bn_conv2
feature_map_dropout
residual_convolution(Q_1, Q_2)[source]
forward_triples(indexed_triple: torch.Tensor) torch.Tensor[source]
Parameters:

x

forward_k_vs_all(x: torch.Tensor)[source]

Given a head entity and a relation (h, r), we compute scores for all entities: [score(h,r,x) | x in Entities] => [0.0, 0.1, …, 0.8], shape => (1, |Entities|). Given a batch of head entities and relations, the shape is (size of batch, |Entities|).

class dicee.ConvO(args: dict)[source]

Bases: dicee.models.base_model.BaseKGE

name = 'ConvO'
conv2d
fc_num_input
fc1
bn_conv2d
norm_fc1
feature_map_dropout
static octonion_normalizer(emb_rel_e0, emb_rel_e1, emb_rel_e2, emb_rel_e3, emb_rel_e4, emb_rel_e5, emb_rel_e6, emb_rel_e7)[source]
residual_convolution(O_1, O_2)[source]
forward_triples(x: torch.Tensor) torch.Tensor[source]
Parameters:

x

forward_k_vs_all(x: torch.Tensor)[source]

Given a head entity and a relation (h, r), we compute scores for all entities: [score(h,r,x) | x in Entities] => [0.0, 0.1, …, 0.8], shape => (1, |Entities|). Given a batch of head entities and relations, the shape is (size of batch, |Entities|).

class dicee.ConEx(args)[source]

Bases: dicee.models.base_model.BaseKGE

Convolutional ComplEx Knowledge Graph Embeddings

name = 'ConEx'
conv2d
fc_num_input
fc1
norm_fc1
bn_conv2d
feature_map_dropout
residual_convolution(C_1: Tuple[torch.Tensor, torch.Tensor], C_2: Tuple[torch.Tensor, torch.Tensor]) torch.FloatTensor[source]

Compute the residual score of two complex-valued embeddings.

Parameters:
  • C_1 – a tuple of two PyTorch tensors that corresponds to a complex-valued embedding

  • C_2 – a tuple of two PyTorch tensors that corresponds to a complex-valued embedding

forward_k_vs_all(x: torch.Tensor) torch.FloatTensor[source]
forward_triples(x: torch.Tensor) torch.FloatTensor[source]
Parameters:

x

forward_k_vs_sample(x: torch.Tensor, target_entity_idx: torch.Tensor)[source]
class dicee.QMult(args)[source]

Bases: dicee.models.base_model.BaseKGE

name = 'QMult'
explicit = True
quaternion_multiplication_followed_by_inner_product(h, r, t)[source]
Parameters:
  • h – shape: (*batch_dims, dim) The head representations.

  • r – shape: (*batch_dims, dim) The relation representations.

  • t – shape: (*batch_dims, dim) The tail representations.

Returns:

Triple scores.
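A minimal sketch of "quaternion multiplication followed by inner product": the Hamilton product of h and r is computed component-wise and then compared with t by a dot product. The function name is hypothetical, and the layout (the last dimension holds four equal blocks for the real, i, j and k components) is an assumption made for illustration.

```python
import torch


def quaternion_mul_inner(h, r, t):
    """Hypothetical sketch: <h (x) r, t> with (x) the Hamilton product."""
    h_a, h_b, h_c, h_d = torch.chunk(h, 4, dim=-1)
    r_a, r_b, r_c, r_d = torch.chunk(r, 4, dim=-1)
    t_a, t_b, t_c, t_d = torch.chunk(t, 4, dim=-1)
    # Hamilton product of (h_a + h_b i + h_c j + h_d k) and (r_a + r_b i + r_c j + r_d k).
    a = h_a * r_a - h_b * r_b - h_c * r_c - h_d * r_d
    b = h_a * r_b + h_b * r_a + h_c * r_d - h_d * r_c
    c = h_a * r_c - h_b * r_d + h_c * r_a + h_d * r_b
    d = h_a * r_d + h_b * r_c - h_c * r_b + h_d * r_a
    # Inner product with the tail, summed over the embedding dimension.
    return (a * t_a + b * t_b + c * t_c + d * t_d).sum(dim=-1)


# i * j = k, so the score against k is 1 and the score against i is 0.
unit_i = torch.tensor([[0.0, 1.0, 0.0, 0.0]])
unit_j = torch.tensor([[0.0, 0.0, 1.0, 0.0]])
unit_k = torch.tensor([[0.0, 0.0, 0.0, 1.0]])
score_k = quaternion_mul_inner(unit_i, unit_j, unit_k)
score_i = quaternion_mul_inner(unit_i, unit_j, unit_i)
print(score_k, score_i)  # tensor([1.]) tensor([0.])
```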

static quaternion_normalizer(x: torch.FloatTensor) torch.FloatTensor[source]

Normalize the length of relation vectors, if the forward constraint has not been applied yet.

Absolute value of a quaternion

\[|a + bi + cj + dk| = \sqrt{a^2 + b^2 + c^2 + d^2}\]

L2 norm of quaternion vector:

\[\|x\|^2 = \sum_{i=1}^d |x_i|^2 = \sum_{i=1}^d (x_i.re^2 + x_i.im_1^2 + x_i.im_2^2 + x_i.im_3^2)\]
Parameters:

x – The vector.

Returns:

The normalized vector.
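The absolute-value formula above can be turned into a normalizer that rescales every quaternion to unit length. The sketch below is illustrative only: it assumes the last dimension stores four equal blocks (real, i, j, k), which may not match the library's layout, and the small clamp that guards against division by zero is an added assumption.

```python
import torch


def quaternion_normalizer(x):
    """Hypothetical sketch: divide each quaternion by sqrt(a^2 + b^2 + c^2 + d^2)."""
    a, b, c, d = torch.chunk(x, 4, dim=-1)
    norm = torch.sqrt(a ** 2 + b ** 2 + c ** 2 + d ** 2).clamp_min(1e-12)
    return torch.cat([a / norm, b / norm, c / norm, d / norm], dim=-1)


x = torch.tensor([[1.0, 2.0, 2.0, 4.0]])  # one quaternion with |x| = 5
normalized = quaternion_normalizer(x)
print(normalized)  # tensor([[0.2000, 0.4000, 0.4000, 0.8000]])
```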

score(head_ent_emb: torch.FloatTensor, rel_ent_emb: torch.FloatTensor, tail_ent_emb: torch.FloatTensor)[source]
k_vs_all_score(bpe_head_ent_emb, bpe_rel_ent_emb, E)[source]
Parameters:
  • bpe_head_ent_emb

  • bpe_rel_ent_emb

  • E

forward_k_vs_all(x)[source]
Parameters:

x

forward_k_vs_sample(x, target_entity_idx)[source]

Given a head entity and a relation (h, r), we compute scores for all possible triples, i.e., [score(h,r,x) | x in Entities] => [0.0, 0.1, …, 0.8], shape => (1, |Entities|). Given a batch of head entities and relations, the shape is (size of batch, |Entities|).

class dicee.OMult(args)[source]

Bases: dicee.models.base_model.BaseKGE

name = 'OMult'
static octonion_normalizer(emb_rel_e0, emb_rel_e1, emb_rel_e2, emb_rel_e3, emb_rel_e4, emb_rel_e5, emb_rel_e6, emb_rel_e7)[source]
score(head_ent_emb: torch.FloatTensor, rel_ent_emb: torch.FloatTensor, tail_ent_emb: torch.FloatTensor)[source]
k_vs_all_score(bpe_head_ent_emb, bpe_rel_ent_emb, E)[source]
forward_k_vs_all(x)[source]

Given a head entity and a relation (h, r), we compute scores for all possible triples, i.e., [score(h,r,x) | x in Entities] => [0.0, 0.1, …, 0.8], shape => (1, |Entities|). Given a batch of head entities and relations, the shape is (size of batch, |Entities|).

class dicee.Shallom(args)[source]

Bases: dicee.models.base_model.BaseKGE

A shallow neural model for relation prediction (https://arxiv.org/abs/2101.09090)

name = 'Shallom'
shallom
get_embeddings() Tuple[numpy.ndarray, None][source]
forward_k_vs_all(x) torch.FloatTensor[source]
forward_triples(x) torch.FloatTensor[source]
Parameters:

x

Returns:

class dicee.LFMult(args)[source]

Bases: dicee.models.base_model.BaseKGE

Embedding with polynomial functions. We represent all entities and relations in the polynomial space as f(x) = sum_{i=0}^{d-1} a_i x^{i%d} and use the three different scoring functions described in the paper to evaluate the score. We also consider combining them with neural networks.

name = 'LFMult'
entity_embeddings
relation_embeddings
degree
m
x_values
forward_triples(idx_triple)[source]
Parameters:

x

construct_multi_coeff(x)[source]
poly_NN(x, coefh, coefr, coeft)[source]

Construct a two-layer neural network to represent the embeddings: h = sigma(wh^T x + bh), r = sigma(wr^T x + br), t = sigma(wt^T x + bt).

linear(x, w, b)[source]
scalar_batch_NN(a, b, c)[source]

Element-wise multiplication between a, b and c.

Inputs: a, b, c – torch.tensor of size batch_size x m x d

Output: a tensor of size batch_size x d

tri_score(coeff_h, coeff_r, coeff_t)[source]

This part implements the trilinear scoring technique:

score(h,r,t) = \int_{0}^{1} h(x)r(x)t(x) dx = \sum_{i,j,k = 0}^{d-1} \dfrac{a_i*b_j*c_k}{1+(i+j+k)%d}

  1. Generate the range for i, j and k from [0, d-1].

  2. Compute \dfrac{a_i*b_j*c_k}{1+(i+j+k)%d} in parallel for every batch.

  3. Take the sum over each batch.
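The three steps above can be sketched with a precomputed weight tensor and a single `torch.einsum`. This is an illustration under simplifying assumptions: the coefficients are taken to be 2-D tensors of shape (batch_size, d), whereas the model may carry an extra dimension, and the function is written as a stand-alone helper rather than a method.

```python
import torch


def tri_score(coeff_h, coeff_r, coeff_t):
    """Hypothetical sketch: sum_{i,j,k} a_i * b_j * c_k / (1 + (i + j + k) % d)."""
    d = coeff_h.shape[-1]
    idx = torch.arange(d)
    # (1) All index combinations (i, j, k) via broadcasting, shape (d, d, d).
    exponent_sum = idx[:, None, None] + idx[None, :, None] + idx[None, None, :]
    # (2) Weights 1 / (1 + (i + j + k) % d), shared by every batch element.
    weights = (1.0 / (1 + exponent_sum % d)).to(coeff_h.dtype)
    # (3) Weighted triple product, summed over i, j, k for each batch element.
    return torch.einsum("bi,bj,bk,ijk->b", coeff_h, coeff_r, coeff_t, weights)


a = torch.tensor([[0.0, 1.0]])
b = torch.tensor([[1.0, 0.0]])
c = torch.tensor([[1.0, 0.0]])
score = tri_score(a, b, c)
print(score)  # tensor([0.5000]): only (i, j, k) = (1, 0, 0) contributes, weight 1/2
```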

vtp_score(h, r, t)[source]

This part implements the vector triple product scoring technique:

score(h,r,t) = \int_{0}^{1} h(x)r(x)t(x) dx = \sum_{i,j,k = 0}^{d-1} \dfrac{a_i*c_j*b_k - b_i*c_j*a_k}{(1+(i+j)%d)(1+k)}

  1. Generate the range for i, j and k from [0, d-1].

  2. Compute the first and second terms of the sum.

  3. Multiply with the denominator and take the sum.

  4. Take the sum over each batch.

comp_func(h, r, t)[source]

This part implements the function composition scoring technique, i.e., score = <h ∘ r, t>.

polynomial(coeff, x, degree)[source]

This function takes a matrix tensor of coefficients (coeff), a vector tensor of points x, and a range of integers [0, 1, …, d], and returns a vector tensor

(coeff[0][0] + coeff[0][1]x +…+ coeff[0][d]x^d,
 coeff[1][0] + coeff[1][1]x +…+ coeff[1][d]x^d)
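Evaluating several polynomials at several points without loops can be sketched as a matrix product between the coefficients and a table of powers of x. The version below is a stand-alone illustration; it assumes `coeff` has shape (n, d+1), `x` is a 1-D tensor of points, and `degree` is the integer tensor [0, 1, …, d].

```python
import torch


def polynomial(coeff, x, degree):
    """Hypothetical sketch: row i holds coeff[i][0] + coeff[i][1] x + ... at every x."""
    # powers[e, j] = x[j] ** degree[e], shape (d + 1, number of points).
    powers = x.unsqueeze(0) ** degree.unsqueeze(1)
    # (n, d + 1) @ (d + 1, number of points) -> (n, number of points).
    return coeff @ powers


coeff = torch.tensor([[1.0, 2.0, 3.0],   # 1 + 2x + 3x^2
                      [0.0, 1.0, 0.0]])  # x
x = torch.tensor([0.0, 2.0])
degree = torch.arange(3)
values = polynomial(coeff, x, degree)
print(values)  # tensor([[ 1., 17.], [ 0.,  2.]])
```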
pop(coeff, x, degree)[source]

This function allows us to evaluate the composition of two polynomials without for loops. It takes a matrix tensor of coefficients (coeff), a matrix tensor of points x and a range of integers [0,1,…d], and returns a tensor (coeff[0][0] + coeff[0][1]x +…+ coeff[0][d]x^d, coeff[1][0] + coeff[1][1]x +…+ coeff[1][d]x^d).
class dicee.PykeenKGE(args: dict)[source]

Bases: dicee.models.base_model.BaseKGE

A class for using knowledge graph embedding models implemented in Pykeen

Notes: Pykeen_DistMult, Pykeen_ComplEx, Pykeen_QuatE, Pykeen_MuRE, Pykeen_CP, Pykeen_HolE

model_kwargs
name
model
loss_history = []
args
entity_embeddings = None
relation_embeddings = None
forward_k_vs_all(x: torch.LongTensor)[source]

# => Explicit version by this we can apply bn and dropout

# (1) Retrieve embeddings of heads and relations + apply Dropout & Normalization if given.
h, r = self.get_head_relation_representation(x)
# (2) Reshape (1).
if self.last_dim > 0:
    h = h.reshape(len(x), self.embedding_dim, self.last_dim)
    r = r.reshape(len(x), self.embedding_dim, self.last_dim)
# (3) Reshape all entities.
if self.last_dim > 0:
    t = self.entity_embeddings.weight.reshape(self.num_entities, self.embedding_dim, self.last_dim)
else:
    t = self.entity_embeddings.weight
# (4) Call the score_t from interactions to generate triple scores.
return self.interaction.score_t(h=h, r=r, all_entities=t, slice_size=1)

forward_triples(x: torch.LongTensor) torch.FloatTensor[source]

# => Explicit version by this we can apply bn and dropout

# (1) Retrieve embeddings of heads, relations and tails and apply Dropout & Normalization if given.
h, r, t = self.get_triple_representation(x)
# (2) Reshape (1).
if self.last_dim > 0:
    h = h.reshape(len(x), self.embedding_dim, self.last_dim)
    r = r.reshape(len(x), self.embedding_dim, self.last_dim)
    t = t.reshape(len(x), self.embedding_dim, self.last_dim)
# (3) Compute the triple score
return self.interaction.score(h=h, r=r, t=t, slice_size=None, slice_dim=0)

abstract forward_k_vs_sample(x: torch.LongTensor, target_entity_idx)[source]
class dicee.BytE(*args, **kwargs)[source]

Bases: dicee.models.base_model.BaseKGE

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note

As per the example above, an __init__() call to the parent class must be made before assignment on the child.

Variables:

training (bool) – Boolean represents whether this module is in training or evaluation mode.

name = 'BytE'
config
temperature = 0.5
topk = 2
transformer
lm_head
loss_function(yhat_batch, y_batch)[source]
Parameters:
  • yhat_batch

  • y_batch

forward(x: torch.LongTensor)[source]
Parameters:

x (B by T tensor)

generate(idx, max_new_tokens, temperature=1.0, top_k=None)[source]

Take a conditioning sequence of indices idx (LongTensor of shape (b,t)) and complete the sequence max_new_tokens times, feeding the predictions back into the model each time. Most likely you’ll want to make sure to be in model.eval() mode of operation for this.
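How `temperature` and `top_k` typically shape one sampling step in this kind of autoregressive loop can be sketched in plain Python. This is a generic illustration, not BytE's code; `sample_next_token` is a hypothetical helper and assumes temperature > 0.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample one token index from a list of raw logits."""
    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = [value / temperature for value in logits]
    # Rank token indices by scaled logit, best first.
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    # Optionally keep only the top_k highest-scoring candidates.
    if top_k is not None:
        candidates = candidates[:top_k]
    # Softmax weights over the surviving candidates (shifted for stability).
    peak = max(scaled[i] for i in candidates)
    weights = [math.exp(scaled[i] - peak) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

A full generation loop would append the sampled index to the sequence and repeat this step max_new_tokens times.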

training_step(batch, batch_idx=None)[source]

Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.

Parameters:
  • batch – The output of your data iterable, normally a DataLoader.

  • batch_idx – The index of this batch.

  • dataloader_idx – The index of the dataloader that produced this batch. (only if multiple dataloaders used)

Returns:

  • Tensor - The loss tensor

  • dict - A dictionary which can include any keys, but must include the key 'loss' in the case of automatic optimization.

  • None - In automatic optimization, this will skip to the next batch (but is not supported for multi-GPU, TPU, or DeepSpeed). For manual optimization, this has no special meaning, as returning the loss is not required.

In this step you’d normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.

Example:

def training_step(self, batch, batch_idx):
    x, y, z = batch
    out = self.encoder(x)
    loss = self.loss(out, x)
    return loss

To use multiple optimizers, you can switch to ‘manual optimization’ and control their stepping:

def __init__(self):
    super().__init__()
    self.automatic_optimization = False


# Multiple optimizers (e.g.: GANs)
def training_step(self, batch, batch_idx):
    opt1, opt2 = self.optimizers()

    # do training_step with encoder
    ...
    opt1.step()
    # do training_step with decoder
    ...
    opt2.step()

Note

When accumulate_grad_batches > 1, the loss returned here will be automatically normalized by accumulate_grad_batches internally.

class dicee.BaseKGE(args: dict)[source]

Bases: BaseKGELightning

Base class for all neural network modules.


args
embedding_dim = None
num_entities = None
num_relations = None
num_tokens = None
learning_rate = None
apply_unit_norm = None
input_dropout_rate = None
hidden_dropout_rate = None
optimizer_name = None
feature_map_dropout_rate = None
kernel_size = None
num_of_output_channels = None
weight_decay = None
loss
selected_optimizer = None
normalizer_class = None
normalize_head_entity_embeddings
normalize_relation_embeddings
normalize_tail_entity_embeddings
hidden_normalizer
param_init
input_dp_ent_real
input_dp_rel_real
hidden_dropout
loss_history = []
byte_pair_encoding
max_length_subword_tokens
block_size
forward_byte_pair_encoded_k_vs_all(x: torch.LongTensor)[source]
Parameters:

x (B x 2 x T)

forward_byte_pair_encoded_triple(x: Tuple[torch.LongTensor, torch.LongTensor])[source]

byte pair encoded neural link predictors


init_params_with_sanity_checking()[source]
forward(x: torch.LongTensor | Tuple[torch.LongTensor, torch.LongTensor], y_idx: torch.LongTensor = None)[source]
Parameters:
  • x

  • y_idx

  • ordered_bpe_entities

forward_triples(x: torch.LongTensor) torch.Tensor[source]
Parameters:

x

forward_k_vs_all(*args, **kwargs)[source]
forward_k_vs_sample(*args, **kwargs)[source]
get_triple_representation(idx_hrt)[source]
get_head_relation_representation(indexed_triple)[source]
get_sentence_representation(x: torch.LongTensor)[source]
Parameters:

x – shape (b, 3, t)

get_bpe_head_and_relation_representation(x: torch.LongTensor) Tuple[torch.FloatTensor, torch.FloatTensor][source]
Parameters:

x (B x 2 x T)

get_embeddings() Tuple[numpy.ndarray, numpy.ndarray][source]
class dicee.EnsembleKGE(seed_model=None, pretrained_models: List = None)[source]
name
train_mode = True
named_children()[source]
property example_input_array
parameters()[source]
modules()[source]
__iter__()[source]
__len__()[source]
eval()[source]
to(device)[source]
mem_of_model()[source]
__call__(x_batch)[source]
step()[source]
get_embeddings()[source]
__str__()[source]
dicee.create_recipriocal_triples(x)[source]

Add inverse triples into a dask dataframe.

dicee.get_er_vocab(data, file_path: str = None)[source]
dicee.get_re_vocab(data, file_path: str = None)[source]
dicee.get_ee_vocab(data, file_path: str = None)[source]
dicee.timeit(func)[source]
dicee.save_pickle(*, data: object = None, file_path=str)[source]
dicee.load_pickle(file_path=str)[source]
dicee.load_term_mapping(file_path=str)[source]
dicee.select_model(args: dict, is_continual_training: bool = None, storage_path: str = None)[source]
dicee.load_model(path_of_experiment_folder: str, model_name='model.pt', verbose=0) Tuple[object, Tuple[dict, dict]][source]

Load weights and initialize pytorch module from namespace arguments

dicee.load_model_ensemble(path_of_experiment_folder: str) Tuple[dicee.models.base_model.BaseKGE, Tuple[pandas.DataFrame, pandas.DataFrame]][source]

Construct Ensemble Of weights and initialize pytorch module from namespace arguments

  1. Detect models under given path

  2. Accumulate parameters of detected models

  3. Normalize parameters

  4. Insert (3) into model.
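Steps 2 and 3 above can be sketched as a parameter average. This sketch assumes that "normalize" means dividing the accumulated parameters by the number of detected models, which the text does not state; `average_parameters` is a hypothetical helper and plain lists of floats stand in for tensors.

```python
def average_parameters(state_dicts):
    """Average a list of {parameter_name: list_of_floats} dictionaries."""
    if not state_dicts:
        raise ValueError("no models were detected")
    # (2) accumulate the parameters of all detected models
    summed = {name: list(values) for name, values in state_dicts[0].items()}
    for state in state_dicts[1:]:
        for name, values in state.items():
            summed[name] = [s + v for s, v in zip(summed[name], values)]
    # (3) normalize (assumption: divide by the number of models)
    n = len(state_dicts)
    return {name: [s / n for s in values] for name, values in summed.items()}
```

The resulting dictionary would then be inserted into a single model instance, as in step 4.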

dicee.save_numpy_ndarray(*, data: numpy.ndarray, file_path: str)[source]
dicee.numpy_data_type_changer(train_set: numpy.ndarray, num: int) numpy.ndarray[source]

Detect the most efficient data type for the given triples.

dicee.save_checkpoint_model(model, path: str) None[source]

Store Pytorch model into disk

dicee.store(trained_model, model_name: str = 'model', full_storage_path: str = None, save_embeddings_as_csv=False) None[source]
dicee.add_noisy_triples(train_set: pandas.DataFrame, add_noise_rate: float) pandas.DataFrame[source]

Add randomly constructed triples.

dicee.read_or_load_kg(args, cls)[source]
dicee.intialize_model(args: dict, verbose=0) Tuple[object, str][source]
dicee.load_json(p: str) dict[source]
dicee.save_embeddings(embeddings: numpy.ndarray, indexes, path: str) None[source]

Save it as CSV if memory allows.

dicee.random_prediction(pre_trained_kge)[source]
dicee.deploy_triple_prediction(pre_trained_kge, str_subject, str_predicate, str_object)[source]
dicee.deploy_tail_entity_prediction(pre_trained_kge, str_subject, str_predicate, top_k)[source]
dicee.deploy_head_entity_prediction(pre_trained_kge, str_object, str_predicate, top_k)[source]
dicee.deploy_relation_prediction(pre_trained_kge, str_subject, str_object, top_k)[source]
dicee.vocab_to_parquet(vocab_to_idx, name, path_for_serialization, print_into)[source]
dicee.create_experiment_folder(folder_name='Experiments')[source]
dicee.continual_training_setup_executor(executor) None[source]
dicee.exponential_function(x: numpy.ndarray, lam: float, ascending_order=True) torch.FloatTensor[source]
dicee.load_numpy(path) numpy.ndarray[source]
dicee.evaluate(entity_to_idx, scores, easy_answers, hard_answers)[source]

# @TODO: CD: Renamed this function

Evaluate multi-hop query answering on different query types

dicee.download_file(url, destination_folder='.')[source]
dicee.download_files_from_url(base_url: str, destination_folder='.') None[source]
dicee.download_pretrained_model(url: str) str[source]
dicee.write_csv_from_model_parallel(path: str)[source]

Create

dicee.from_pretrained_model_write_embeddings_into_csv(path: str) None[source]
class dicee.DICE_Trainer(args, is_continual_training: bool, storage_path, evaluator=None)[source]
DICE_Trainer implements:

  1. Pytorch Lightning trainer (https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html)

  2. Multi-GPU Trainer (https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html)

  3. CPU Trainer

args

is_continual_training:bool

storage_path:str

evaluator:

report:dict

report
args
trainer = None
is_continual_training
storage_path
evaluator = None
form_of_labelling = None
continual_start(knowledge_graph)[source]
  1. Initialize training.

  2. Load model.

  3. Load trainer.

  4. Fit model.

Returns:
  • model

  • form_of_labelling (str)

initialize_trainer(callbacks: List) lightning.Trainer | dicee.trainer.model_parallelism.TensorParallel | dicee.trainer.torch_trainer.TorchTrainer | dicee.trainer.torch_trainer_ddp.TorchDDPTrainer[source]

Initialize Trainer from input arguments

initialize_or_load_model()[source]
init_dataloader(dataset: torch.utils.data.Dataset) torch.utils.data.DataLoader[source]
init_dataset() torch.utils.data.Dataset[source]
start(knowledge_graph: dicee.knowledge_graph.KG | numpy.memmap) Tuple[dicee.models.base_model.BaseKGE, str][source]

Start the training

  1. Initialize Trainer

  2. Initialize or load a pretrained KGE model

In a DDP setup, we need to load the memory map of the already read and indexed KG.

k_fold_cross_validation(dataset) Tuple[dicee.models.base_model.BaseKGE, str][source]

Perform K-fold Cross-Validation

  1. Obtain K train and test splits.

  2. For each split:

    2.1. Initialize trainer and model.

    2.2. Train model with configuration provided in args.

    2.3. Compute the mean reciprocal rank (MRR) score of the model on the respective test split.

  3. Report the mean MRR over the splits.

Parameters:
  • self

  • dataset

Returns:

model
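The splitting and scoring steps of this procedure can be sketched with the standard library. The helpers below are hypothetical and only show contiguous K-fold index splits and the MRR formula; the trainer and model steps are omitted.

```python
def k_fold_splits(num_items, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Distribute the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [num_items // k + (1 if i < num_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(num_items) if i < start or i >= stop]
        yield train, test
        start = stop


def mean_reciprocal_rank(ranks):
    """MRR for a list of 1-based ranks of the correct answers."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)
```

Averaging the per-split MRR values then gives the reported number.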

class dicee.KGE(path=None, url=None, construct_ensemble=False, model_name=None)[source]

Bases: dicee.abstracts.BaseInteractiveKGE

Knowledge Graph Embedding Class for interactive usage of pre-trained models

__str__()[source]
to(device: str) None[source]
get_transductive_entity_embeddings(indices: torch.LongTensor | List[str], as_pytorch=False, as_numpy=False, as_list=True) torch.FloatTensor | numpy.ndarray | List[float][source]
create_vector_database(collection_name: str, distance: str, location: str = 'localhost', port: int = 6333)[source]
generate(h='', r='')[source]
eval_lp_performance(dataset=List[Tuple[str, str, str]], filtered=True)[source]
predict_missing_head_entity(relation: List[str] | str, tail_entity: List[str] | str, within=None) Tuple[source]

Given a relation and a tail entity, return the top-k ranked head entities.

argmax_{e in E } f(e,r,t), where r in R, t in E.

Parameter

relation: Union[List[str], str]

String representation of selected relations.

tail_entity: Union[List[str], str]

String representation of selected entities.

k: int

Highest ranked k entities.

Returns: Tuple

Highest K scores and entities
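The ranking described here amounts to scoring every candidate head and keeping the best k. A minimal sketch, assuming a scoring function f(h, r, t) passed in as `score_fn`; both `top_k_heads` and `score_fn` are hypothetical names, not part of the KGE class.

```python
import heapq


def top_k_heads(score_fn, entities, relation, tail, k=10):
    """Return (scores, entities) of the k best-scoring head candidates."""
    # Score every candidate head e with f(e, r, t).
    scored = ((score_fn(e, relation, tail), e) for e in entities)
    # Keep the k highest scores, best first.
    best = heapq.nlargest(k, scored)
    return [score for score, _ in best], [entity for _, entity in best]
```

Usage with a toy scoring function:

```python
toy = {"a": 0.1, "b": 0.9, "c": 0.5}
scores, heads = top_k_heads(lambda h, r, t: toy[h], ["a", "b", "c"], "r", "t", k=2)
```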

predict_missing_relations(head_entity: List[str] | str, tail_entity: List[str] | str, within=None) Tuple[source]

Given a head entity and a tail entity, return top k ranked relations.

argmax_{r in R } f(h,r,t), where h, t in E.

Parameter

head_entity: List[str]

String representation of selected entities.

tail_entity: List[str]

String representation of selected entities.

k: int

Highest ranked k relations.

Returns: Tuple

Highest K scores and relations

predict_missing_tail_entity(head_entity: List[str] | str, relation: List[str] | str, within: List[str] = None) torch.FloatTensor[source]

Given a head entity and a relation, return top k ranked entities

argmax_{e in E } f(h,r,e), where h in E and r in R.

Parameter

head_entity: List[str]

String representation of selected entities.

relation: List[str]

String representation of selected relations.

Returns: Tuple

scores

predict(*, h: List[str] | str = None, r: List[str] | str = None, t: List[str] | str = None, within=None, logits=True) torch.FloatTensor[source]
Parameters:
  • logits

  • h

  • r

  • t

  • within

predict_topk(*, h: str | List[str] = None, r: str | List[str] = None, t: str | List[str] = None, topk: int = 10, within: List[str] = None)[source]

Predict missing item in a given triple.

Parameter

head_entity: Union[str, List[str]]

String representation of selected entities.

relation: Union[str, List[str]]

String representation of selected relations.

tail_entity: Union[str, List[str]]

String representation of selected entities.

k: int

Highest ranked k item.

Returns: Tuple

Highest K scores and items

triple_score(h: List[str] | str = None, r: List[str] | str = None, t: List[str] | str = None, logits=False) torch.FloatTensor[source]

Predict triple score

Parameter

head_entity: List[str]

String representation of selected entities.

relation: List[str]

String representation of selected relations.

tail_entity: List[str]

String representation of selected entities.

logits: bool

If logits is True, the unnormalized score is returned.

Returns: Tuple

pytorch tensor of triple score

t_norm(tens_1: torch.Tensor, tens_2: torch.Tensor, tnorm: str = 'min') torch.Tensor[source]
tensor_t_norm(subquery_scores: torch.FloatTensor, tnorm: str = 'min') torch.FloatTensor[source]

Compute T-norm over [0,1]^{n \times d}, where n denotes the number of hops and d denotes the number of entities
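The reduction over hops can be illustrated with nested lists: n rows of d scores are combined column by column into a single row. The sketch covers the 'min' and 'prod' t-norms that appear as defaults in this class; `t_norm_over_hops` is a hypothetical name.

```python
import math


def t_norm_over_hops(subquery_scores, tnorm="min"):
    """Combine n rows of d scores in [0, 1] into one row of d scores."""
    # zip(*rows) regroups the scores per entity (one column per entity).
    columns = list(zip(*subquery_scores))
    if tnorm == "min":
        return [min(column) for column in columns]
    if tnorm == "prod":
        return [math.prod(column) for column in columns]
    raise ValueError(f"unsupported t-norm: {tnorm}")
```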

t_conorm(tens_1: torch.Tensor, tens_2: torch.Tensor, tconorm: str = 'min') torch.Tensor[source]
negnorm(tens_1: torch.Tensor, lambda_: float, neg_norm: str = 'standard') torch.Tensor[source]
return_multi_hop_query_results(aggregated_query_for_all_entities, k: int, only_scores)[source]
single_hop_query_answering(query: tuple, only_scores: bool = True, k: int = None)[source]
answer_multi_hop_query(query_type: str = None, query: Tuple[str | Tuple[str, str], Ellipsis] = None, queries: List[Tuple[str | Tuple[str, str], Ellipsis]] = None, tnorm: str = 'prod', neg_norm: str = 'standard', lambda_: float = 0.0, k: int = 10, only_scores=False) List[Tuple[str, torch.Tensor]][source]

# @TODO: Refactoring is needed

# @TODO: Score computation for each query type should be done in a static function

Find an answer set for EPFO queries including negation and disjunction

Parameter

query_type: str The type of the query, e.g., “2p”.

query: Union[str, Tuple[str, Tuple[str, str]]] The query itself, either a string or a nested tuple.

queries: List of Tuple[Union[str, Tuple[str, str]], …]

tnorm: str The t-norm operator.

neg_norm: str The negation norm.

lambda_: float lambda parameter for sugeno and yager negation norms

k: int The top-k substitutions for intermediate variables.

returns:
  • List[Tuple[str, torch.Tensor]]

  • Entities and corresponding scores sorted in the descending order of scores
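For orientation, the negation norms named by `neg_norm` and `lambda_` can be written out using their textbook fuzzy-logic definitions. Whether the library uses exactly these forms is an assumption; `fuzzy_negation` is a hypothetical helper.

```python
def fuzzy_negation(score, lambda_=0.0, neg_norm="standard"):
    """Negate a score in [0, 1] with a standard, Sugeno or Yager negation."""
    if neg_norm == "standard":
        return 1.0 - score
    if neg_norm == "sugeno":
        # Textbook Sugeno negation; equals the standard one for lambda_ = 0.
        return (1.0 - score) / (1.0 + lambda_ * score)
    if neg_norm == "yager":
        # Textbook Yager negation; undefined for lambda_ <= 0.
        if lambda_ <= 0:
            raise ValueError("the Yager negation needs lambda_ > 0")
        return (1.0 - score ** lambda_) ** (1.0 / lambda_)
    raise ValueError(f"unsupported negation norm: {neg_norm}")
```

Under these definitions the default lambda_ = 0.0 is valid for 'standard' and 'sugeno' but not for 'yager'.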

find_missing_triples(confidence: float, entities: List[str] = None, relations: List[str] = None, topk: int = 10, at_most: int = sys.maxsize) Set[source]

Find missing triples

Iterate over a set of entities E and a set of relations R:

\forall e \in E and \forall r \in R, compute f(e,r,x).

Return the triples (e,r,x) \notin G with f(e,r,x) > confidence.

confidence: float

A threshold for an output of a sigmoid function given a triple.

topk: int

Highest ranked k item to select triples with f(e,r,x) > confidence .

at_most: int

Stop after finding at_most missing triples

{(e,r,x) | f(e,r,x) > confidence \land (e,r,x) \notin G}
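The set defined above can be sketched as a nested loop over entities and relations. The scoring function, the known graph and the candidate tails are passed in explicitly here; all names in this block are hypothetical and the real method works on a trained model instead.

```python
def missing_triples_sketch(score_fn, known, entities, relations, candidates,
                           confidence, topk=10, at_most=None):
    """Collect (e, r, x) not in `known` whose score exceeds `confidence`."""
    found = set()
    for e in entities:
        for r in relations:
            # Rank candidate tails by score and inspect only the top-k.
            ranked = sorted(candidates, key=lambda x: score_fn(e, r, x), reverse=True)
            for x in ranked[:topk]:
                if score_fn(e, r, x) > confidence and (e, r, x) not in known:
                    found.add((e, r, x))
                    # Stop early once at_most triples have been found.
                    if at_most is not None and len(found) >= at_most:
                        return found
    return found
```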

deploy(share: bool = False, top_k: int = 10)[source]
train_triples(h: List[str], r: List[str], t: List[str], labels: List[float], iteration=2, optimizer=None)[source]
train_k_vs_all(h, r, iteration=1, lr=0.001)[source]

Train k vs all.

train(kg, lr=0.1, epoch=10, batch_size=32, neg_sample_ratio=10, num_workers=1) None[source]

Retrain a pretrained model on an input KG via negative sampling.

class dicee.Execute(args, continuous_training=False)[source]

A class for training, retraining and evaluating a model.

  1. Loading & Preprocessing & Serializing input data.

  2. Training & Validation & Testing

  3. Storing all necessary info

args
is_continual_training = False
trainer = None
trained_model = None
knowledge_graph = None
report
evaluator = None
start_time = None
setup_executor() None[source]
save_trained_model() None[source]

Save a knowledge graph embedding model

  1. Send model to eval mode and cpu.

  2. Store the memory footprint of the model.

  3. Save the model into disk.

  4. Update the stats of KG again ?

Return type:

None

end(form_of_labelling: str) dict[source]

End training

  1. Store trained model.

  2. Report runtimes.

  3. Eval model if required.

Returns:

A dict containing information about the training and/or evaluation

write_report() None[source]

Report training related information in a report.json file

start() dict[source]

Start training

  1. Load the data.

  2. Create an evaluator object.

  3. Create a trainer object.

  4. Start the training.

Returns:

A dict containing information about the training and/or evaluation

dicee.mapping_from_first_two_cols_to_third(train_set_idx)[source]
dicee.reload_dataset(path: str, form_of_labelling, scoring_technique, neg_ratio, label_smoothing_rate)[source]

Reload the files from disk to construct the Pytorch dataset

dicee.construct_dataset(*, train_set: numpy.ndarray | list, valid_set=None, test_set=None, ordered_bpe_entities=None, train_target_indices=None, target_dim: int = None, entity_to_idx: dict, relation_to_idx: dict, form_of_labelling: str, scoring_technique: str, neg_ratio: int, label_smoothing_rate: float, byte_pair_encoding=None, block_size: int = None) torch.utils.data.Dataset[source]
class dicee.BPE_NegativeSamplingDataset(train_set: torch.LongTensor, ordered_shaped_bpe_entities: torch.LongTensor, neg_ratio: int)[source]

Bases: torch.utils.data.Dataset

An abstract class representing a Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__(), supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__(), which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader. Subclasses could also optionally implement __getitems__(), for speedup batched samples loading. This method accepts list of indices of samples of batch and returns list of samples.

Note

DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.

train_set
ordered_bpe_entities
num_bpe_entities
neg_ratio
num_datapoints
__len__()[source]
__getitem__(idx)[source]
collate_fn(batch_shaped_bpe_triples: List[Tuple[torch.Tensor, torch.Tensor]])[source]
class dicee.MultiLabelDataset(train_set: torch.LongTensor, train_indices_target: torch.LongTensor, target_dim: int, torch_ordered_shaped_bpe_entities: torch.LongTensor)[source]

Bases: torch.utils.data.Dataset

An abstract class representing a Dataset.


train_set
train_indices_target
target_dim
num_datapoints
torch_ordered_shaped_bpe_entities
collate_fn = None
__len__()[source]
__getitem__(idx)[source]
class dicee.MultiClassClassificationDataset(subword_units: numpy.ndarray, block_size: int = 8)[source]

Bases: torch.utils.data.Dataset

Dataset for the 1vsALL training strategy

Return type:

torch.utils.data.Dataset

train_data
block_size = 8
num_of_data_points
collate_fn = None
__len__()[source]
__getitem__(idx)[source]
class dicee.OnevsAllDataset(train_set_idx: numpy.ndarray, entity_idxs)[source]

Bases: torch.utils.data.Dataset

Dataset for the 1vsALL training strategy

Return type:

torch.utils.data.Dataset

train_data
target_dim
collate_fn = None
__len__()[source]
__getitem__(idx)[source]
class dicee.KvsAll(train_set_idx: numpy.ndarray, entity_idxs, relation_idxs, form, store=None, label_smoothing_rate: float = 0.0)[source]

Bases: torch.utils.data.Dataset

Creates a dataset for KvsAll training by inheriting from torch.utils.data.Dataset.

Let D denote a dataset for KvsAll training, defined as D := {(x,y)_i}_{i=1}^{N}, where x: (h,r) is a unique tuple of an entity h \in E and a relation r \in R that has been seen in the input graph, and y denotes a multi-label vector in [0,1]^{|E|} of binary labels:

\forall y_i = 1 s.t. (h, r, E_i) \in KG
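The construction of D can be sketched with integer-indexed triples: every (h, r) pair seen in the graph becomes one input, and its label vector marks all tails observed with that pair. `build_kvsall` is a hypothetical helper; label smoothing and tensor conversion are left out.

```python
from collections import defaultdict


def build_kvsall(triples, num_entities):
    """Return (xs, ys): unique (h, r) pairs and their multi-label vectors."""
    tails = defaultdict(set)
    # Group the observed tails by their (head, relation) pair.
    for h, r, t in triples:
        tails[(h, r)].add(t)
    xs, ys = [], []
    for (h, r), observed in sorted(tails.items()):
        xs.append((h, r))
        # y_i = 1 exactly when (h, r, entity_i) is in the graph.
        ys.append([1.0 if e in observed else 0.0 for e in range(num_entities)])
    return xs, ys
```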

Note

TODO

train_set_idx: numpy.ndarray

n by 3 array representing n triples

entity_idxs: dictionary

string representation of an entity to its integer id

relation_idxs: dictionary

string representation of a relation to its integer id

self : torch.utils.data.Dataset

>>> a = KvsAll()
>>> a
? array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
train_data = None
train_target = None
label_smoothing_rate
collate_fn = None
__len__()[source]
__getitem__(idx)[source]
class dicee.AllvsAll(train_set_idx: numpy.ndarray, entity_idxs, relation_idxs, label_smoothing_rate=0.0)[source]

Bases: torch.utils.data.Dataset

Creates a dataset for AllvsAll training by inheriting from torch.utils.data.Dataset.

Let D denote a dataset for AllvsAll training, defined as D := {(x,y)_i}_{i=1}^{N}, where x: (h,r) is a possible unique tuple of an entity h \in E and a relation r \in R. Hence N = |E| x |R|. y denotes a multi-label vector in [0,1]^{|E|} of binary labels:

\forall y_i = 1 s.t. (h, r, E_i) \in KG

Note

AllvsAll extends KvsAll with non-existing (h,r) pairs. Hence, it adds data points that are labelled only with 0s, without any 1s.
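The difference to KvsAll can be made concrete with a short sketch: all |E| x |R| pairs are enumerated, so pairs that never occur in the graph receive an all-zero label vector. `build_allvsall` is a hypothetical helper working on integer-indexed triples.

```python
from itertools import product


def build_allvsall(triples, num_entities, num_relations):
    """Return (xs, ys) for every possible (h, r) pair."""
    known = set(triples)
    # N = |E| x |R| inputs, whether or not the pair occurs in the graph.
    xs = list(product(range(num_entities), range(num_relations)))
    ys = [
        [1.0 if (h, r, e) in known else 0.0 for e in range(num_entities)]
        for h, r in xs
    ]
    return xs, ys
```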

train_set_idx: numpy.ndarray

n by 3 array representing n triples

entity_idxs: dictionary

string representation of an entity to its integer id

relation_idxs: dictionary

string representation of a relation to its integer id

self : torch.utils.data.Dataset

>>> a = AllvsAll()
>>> a
? array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
train_data = None
train_target = None
label_smoothing_rate
collate_fn = None
target_dim
__len__()[source]
__getitem__(idx)[source]
class dicee.OnevsSample(train_set: numpy.ndarray, num_entities, num_relations, neg_sample_ratio: int = None, label_smoothing_rate: float = 0.0)[source]

Bases: torch.utils.data.Dataset

A custom PyTorch Dataset class for knowledge graph embeddings, which includes both positive and negative sampling for a given dataset, framed as a multi-class classification problem.

Parameters:
  • train_set (np.ndarray) – A numpy array containing triples of knowledge graph data. Each triple consists of (head_entity, relation, tail_entity).

  • num_entities (int) – The number of unique entities in the knowledge graph.

  • num_relations (int) – The number of unique relations in the knowledge graph.

  • neg_sample_ratio (int, optional) – The number of negative samples to be generated per positive sample. Must be a positive integer and less than num_entities.

  • label_smoothing_rate (float, optional) – A label smoothing rate to apply to the positive and negative labels. Defaults to 0.0.

train_data

The input data converted into a PyTorch tensor.

Type:

torch.Tensor

num_entities

Number of entities in the dataset.

Type:

int

num_relations

Number of relations in the dataset.

Type:

int

neg_sample_ratio

Ratio of negative samples to be drawn for each positive sample.

Type:

int

label_smoothing_rate

The smoothing factor applied to the labels.

Type:

torch.Tensor

collate_fn

A function that can be used to collate data samples into batches (set to None by default).

Type:

function, optional

train_data
num_entities
num_relations
neg_sample_ratio = None
label_smoothing_rate
collate_fn = None
__len__()[source]

Returns the number of samples in the dataset.

__getitem__(idx)[source]

Retrieves a single data sample from the dataset at the given index.

Parameters:

idx (int) – The index of the sample to retrieve.

Returns:

A tuple consisting of:
  • x (torch.Tensor): The head and relation part of the triple.

  • y_idx (torch.Tensor): The concatenated indices of the true object (tail entity) and the indices of the negative samples.

  • y_vec (torch.Tensor): A vector containing the labels for the positive and negative samples, with label smoothing applied.

Return type:

tuple
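The shape of the returned tuple can be illustrated without PyTorch. The sketch below draws negatives uniformly at random without filtering out the true tail, and applies the smoothing rule y * (1 - eps) + eps / len(y); both choices are assumptions, and `one_vs_sample_item` is a hypothetical name.

```python
import random


def one_vs_sample_item(triple, num_entities, neg_sample_ratio,
                       label_smoothing_rate=0.0, rng=random):
    """Build (x, y_idx, y_vec) for one positive triple."""
    h, r, t = triple
    x = (h, r)
    # Negative tails, drawn uniformly (assumption: collisions with t are allowed).
    negatives = [rng.randrange(num_entities) for _ in range(neg_sample_ratio)]
    # True tail first, followed by the sampled negatives.
    y_idx = [t] + negatives
    y_vec = [1.0] + [0.0] * neg_sample_ratio
    # Assumed smoothing rule: y * (1 - eps) + eps / len(y).
    eps = label_smoothing_rate
    y_vec = [y * (1.0 - eps) + eps / len(y_vec) for y in y_vec]
    return x, y_idx, y_vec
```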

class dicee.KvsSampleDataset(train_set_idx: numpy.ndarray, entity_idxs, relation_idxs, form, store=None, neg_ratio=None, label_smoothing_rate: float = 0.0)[source]

Bases: torch.utils.data.Dataset

KvsSample a Dataset:

D := {(x,y)_i}_{i=1}^{N}, where

  • x: (h,r) is a unique tuple of an entity h \in E and a relation r \in R, and

  • y \in [0,1]^{|E|} is a binary label: \forall y_i = 1 s.t. (h, r, E_i) \in KG

At each mini-batch construction, we subsample(y), hence |new_y| << |E|. new_y contains all 1’s if sum(y) < neg_sample ratio. new_y contains

train_set_idx

Indexed triples for the training.

entity_idxs

mapping.

relation_idxs

mapping.

form

?

store

?

label_smoothing_rate

?

torch.utils.data.Dataset

train_data = None
train_target = None
neg_ratio = None
num_entities
label_smoothing_rate
collate_fn = None
max_num_of_classes
__len__()[source]
__getitem__(idx)[source]
class dicee.NegSampleDataset(train_set: numpy.ndarray, num_entities: int, num_relations: int, neg_sample_ratio: int = 1)[source]

Bases: torch.utils.data.Dataset

An abstract class representing a Dataset.


neg_sample_ratio
train_set
length
num_entities
num_relations
__len__()[source]
__getitem__(idx)[source]
class dicee.TriplePredictionDataset(train_set: numpy.ndarray, num_entities: int, num_relations: int, neg_sample_ratio: int = 1, label_smoothing_rate: float = 0.0)[source]

Bases: torch.utils.data.Dataset

Triple Dataset

D := {(x)_i}_{i=1}^{N}, where

  • x: (h,r,t) \in KG is a triple of entities h, t \in E and a relation r \in R, and

  • collate_fn generates negative triples.

collate_fn:

\forall (h,r,t) \in G, create negative triples {(h,r,x),(,r,t),(h,m,t)}

y: labels are represented in torch.float16
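The negative-triple patterns listed for collate_fn can be sketched as three corruptions of one positive triple. The sketch assumes the second pattern stands for a replaced head, and that replacements are drawn uniformly at random; `corrupt_triple` is a hypothetical helper.

```python
import random


def corrupt_triple(triple, num_entities, num_relations, rng=random):
    """Return three negatives: corrupted tail, head and relation."""
    h, r, t = triple
    return [
        (h, r, rng.randrange(num_entities)),   # (h, r, x): replace the tail
        (rng.randrange(num_entities), r, t),   # assumed: replace the head
        (h, rng.randrange(num_relations), t),  # (h, m, t): replace the relation
    ]
```

Uniform sampling can occasionally reproduce the original triple; filtering such cases is omitted here.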

train_set_idx

Indexed triples for the training.

entity_idxs

mapping.

relation_idxs

mapping.

form

?

store

?

label_smoothing_rate

collate_fn: batch: List[torch.IntTensor]

Returns:

torch.utils.data.Dataset

label_smoothing_rate
neg_sample_ratio
train_set
length
num_entities
num_relations
__len__()[source]
__getitem__(idx)[source]
collate_fn(batch: List[torch.Tensor])[source]
class dicee.CVDataModule(train_set_idx: numpy.ndarray, num_entities, num_relations, neg_sample_ratio, batch_size, num_workers)[source]

Bases: pytorch_lightning.LightningDataModule

Create a Dataset for cross validation

Return type:

?

train_set_idx
num_entities
num_relations
neg_sample_ratio
batch_size
num_workers
train_dataloader() torch.utils.data.DataLoader[source]

An iterable or collection of iterables specifying training samples.

For more information about multiple dataloaders, see this section.

The dataloader you return will not be reloaded unless you set the Trainer argument reload_dataloaders_every_n_epochs to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

setup(*args, **kwargs)[source]

Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.

Parameters:

stage – either 'fit', 'validate', 'test', or 'predict'

Example:

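class LitModel(...):
    def __init__(self):
        super().__init__()
        self.l1 = None

    def prepare_data(self):
        download_data()
        tokenize()

        # don't do this: prepare_data does not run on every process
        self.something = something_else

    def setup(self, stage):
        # runs on every process, so state assigned here is always available
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)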
transfer_batch_to_device(*args, **kwargs)[source]

Override this hook if your DataLoader returns tensors wrapped in a custom data structure.

The data types listed below (and any arbitrary nesting of them) are supported out of the box:

  • torch.Tensor or anything that implements .to(…)

  • list

  • dict

  • tuple

For anything else, you need to define how the data is moved to the target device (CPU, GPU, TPU, …).
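
The "arbitrary nesting" rule above amounts to a recursive walk over the container types. The sketch below illustrates that idea in plain Python; it is not Lightning's implementation. FakeTensor and move_nested are hypothetical names introduced for the example, and FakeTensor only mimics the .to(…) method.

```python
from typing import Any


class FakeTensor:
    """Hypothetical stand-in for an object that implements .to(device)."""

    def __init__(self, device: str = "cpu"):
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device)


def move_nested(batch: Any, device: str) -> Any:
    """Recursively move every object exposing .to(...) inside lists, tuples and dicts."""
    if hasattr(batch, "to"):
        return batch.to(device)
    if isinstance(batch, dict):
        return {key: move_nested(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_nested(item, device) for item in batch)
    return batch  # anything else is returned unchanged


batch = {"x": [FakeTensor(), FakeTensor()], "y": (FakeTensor(), 3)}
moved = move_nested(batch, "cuda:0")
```

A custom container that is none of these types falls through to the last line unchanged, which is the case the hook is meant to cover.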

Note

This hook should only transfer the data and not modify it, nor should it move the data to any other device than the one passed in as argument (unless you know what you are doing). To check the current state of execution of this hook you can use self.trainer.training/testing/validating/predicting so that you can add different logic as per your requirement.

Parameters:
  • batch – A batch of data that needs to be transferred to a new device.

  • device – The target device as defined in PyTorch.

  • dataloader_idx – The index of the dataloader to which the batch belongs.

Returns:

A reference to the data on the new device.

Example:

def transfer_batch_to_device(self, batch, device, dataloader_idx):
    if isinstance(batch, CustomBatch):
        # move all tensors in your custom data structure to the device
        batch.samples = batch.samples.to(device)
        batch.targets = batch.targets.to(device)
    elif dataloader_idx == 0:
        # skip device transfer for the first dataloader or anything you wish
        pass
    else:
        batch = super().transfer_batch_to_device(batch, device, dataloader_idx)
    return batch

See also

  • move_data_to_device()

  • apply_to_collection()

prepare_data(*args, **kwargs)[source]

Use this to download and prepare data. Downloading and saving data with multiple processes (distributed settings) will result in corrupted data. Lightning ensures this method is called only within a single process, so you can safely add your downloading logic within.

Warning

Do not set state on the model here (use setup instead), because prepare_data is not called on every device.

Example:

def prepare_data(self):
    # good
    download_data()
    tokenize()
    etc()

    # bad
    self.split = data_split
    self.some_state = some_other_state()

In a distributed environment, prepare_data can be called in two ways (using prepare_data_per_node)

  1. Once per node. This is the default and is only called on LOCAL_RANK=0.

  2. Once in total. Only called on GLOBAL_RANK=0.

Example:

# DEFAULT
# called once per node on LOCAL_RANK=0 of that node
class LitDataModule(LightningDataModule):
    def __init__(self):
        super().__init__()
        self.prepare_data_per_node = True


# call on GLOBAL_RANK=0 (great for shared file systems)
class LitDataModule(LightningDataModule):
    def __init__(self):
        super().__init__()
        self.prepare_data_per_node = False

This is called before requesting the dataloaders:

model.prepare_data()
initialize_distributed()
model.setup(stage)
model.train_dataloader()
model.val_dataloader()
model.test_dataloader()
model.predict_dataloader()
class dicee.QueryGenerator(train_path, val_path: str, test_path: str, ent2id: Dict = None, rel2id: Dict = None, seed: int = 1, gen_valid: bool = False, gen_test: bool = True)[source]
train_path
val_path
test_path
gen_valid = False
gen_test = True
seed = 1
max_ans_num = 1000000.0
mode
ent2id = None
rel2id: Dict = None
ent_in: Dict
ent_out: Dict
query_name_to_struct
list2tuple(list_data)[source]
tuple2list(x: List | Tuple) List | Tuple[source]

Convert a nested tuple to a nested list.
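
The two conversions can be sketched as a pair of recursive functions. This is a standalone illustration under assumed names (list_to_tuple, tuple_to_list), not the methods of QueryGenerator, and the nested value is only an example input.

```python
from typing import Any


def list_to_tuple(data: Any) -> Any:
    """Recursively turn every list in a nested structure into a tuple."""
    if isinstance(data, list):
        return tuple(list_to_tuple(item) for item in data)
    return data


def tuple_to_list(data: Any) -> Any:
    """Recursively turn every tuple in a nested structure into a list."""
    if isinstance(data, tuple):
        return [tuple_to_list(item) for item in data]
    return data


query_structure = ["e", ["r", "r"]]
frozen = list_to_tuple(query_structure)  # hashable, so usable as a dict key
thawed = tuple_to_list(frozen)           # mutable again
```

The tuple form is useful when a nested structure has to serve as a dictionary key or a set member; the list form allows in-place modification.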

set_global_seed(seed: int)[source]

Set seed
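
The effect of seeding can be sketched with the standard library alone. The helper set_seed below is hypothetical and seeds only Python's random module; which generators set_global_seed actually seeds is not stated here, and a real implementation might also seed NumPy and PyTorch.

```python
import random


def set_seed(seed: int) -> None:
    """Seed Python's built-in random number generator."""
    random.seed(seed)


set_seed(1)
first = [random.randrange(100) for _ in range(5)]
set_seed(1)
second = [random.randrange(100) for _ in range(5)]
```

Re-seeding with the same value reproduces the same sequence, which is what makes generated queries repeatable across runs.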

construct_graph(paths: List[str]) Tuple[Dict, Dict][source]

Construct a graph from triples. Returns dicts with incoming and outgoing edges.
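
One plausible layout for the two dictionaries is sketched below. It assumes, for illustration only, that incoming edges are indexed as tail -> relation -> heads and outgoing edges as head -> relation -> tails, and that the triples are already integer-indexed and held in memory rather than read from the given paths. The name build_edge_indexes is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Dict[int, Set[int]]]


def build_edge_indexes(triples: Iterable[Tuple[int, int, int]]) -> Tuple[Graph, Graph]:
    """Index triples by incoming edges (tail -> relation -> heads)
    and by outgoing edges (head -> relation -> tails)."""
    ent_in: Graph = defaultdict(lambda: defaultdict(set))
    ent_out: Graph = defaultdict(lambda: defaultdict(set))
    for h, r, t in triples:
        ent_in[t][r].add(h)
        ent_out[h][r].add(t)
    return ent_in, ent_out


ent_in, ent_out = build_edge_indexes([(0, 0, 1), (0, 0, 2), (1, 1, 2)])
```

With this layout, ent_out[0][0] holds every tail reachable from entity 0 through relation 0, and ent_in[2][1] holds every head that reaches entity 2 through relation 1.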

fill_query(query_structure: List[str | List], ent_in: Dict, ent_out: Dict, answer: int) bool[source]

Private method for fill_query logic.

achieve_answer(query: List[str | List], ent_in: Dict, ent_out: Dict) set[source]

Private method for achieve_answer logic. @TODO: Document the code
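
Since the method is undocumented, the following is only a sketch of how an answer set can be computed from an outgoing-edge dictionary. It covers chain (multi-hop path) queries alone and uses a hypothetical encoding, an anchor entity plus a list of relations, rather than the query argument's real format. The name answer_path_query is also an assumption.

```python
from typing import Dict, List, Set


def answer_path_query(anchor: int, relations: List[int],
                      ent_out: Dict[int, Dict[int, Set[int]]]) -> Set[int]:
    """Follow the relations in order, starting from the anchor entity."""
    frontier = {anchor}
    for relation in relations:
        next_frontier: Set[int] = set()
        for entity in frontier:
            # .get avoids inserting keys when ent_out is a defaultdict.
            next_frontier |= ent_out.get(entity, {}).get(relation, set())
        frontier = next_frontier
    return frontier


ent_out = {0: {0: {1, 2}}, 1: {1: {3}}, 2: {1: {3, 4}}}
answers = answer_path_query(anchor=0, relations=[0, 1], ent_out=ent_out)
```

Here the first hop from entity 0 over relation 0 reaches {1, 2}, and the second hop over relation 1 reaches {3, 4}.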

ground_queries(query_structure: List[str | List], ent_in: Dict, ent_out: Dict, small_ent_in: Dict, small_ent_out: Dict, gen_num: int, query_name: str)[source]

Generate queries and retrieve their answers.

unmap(query_type, queries, tp_answers, fp_answers, fn_answers)[source]
unmap_query(query_structure, query, id2ent, id2rel)[source]
generate_queries(query_struct: List, gen_num: int, query_type: str)[source]

Pass incoming and outgoing edges to ground_queries depending on the mode (train, valid, or test) and return the resulting queries and answers. @TODO: create a class for each single query struct

save_queries(query_type: str, gen_num: int, save_path: str)[source]
abstract load_queries(path)[source]
get_queries(query_type: str, gen_num: int)[source]
static save_queries_and_answers(path: str, data: List[Tuple[str, Tuple[collections.defaultdict]]]) None[source]

Save queries to disk.

static load_queries_and_answers(path: str) List[Tuple[str, Tuple[collections.defaultdict]]][source]

Load queries from disk into memory.
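
A save/load round trip for data of the annotated shape, List[Tuple[str, Tuple[defaultdict]]], can be sketched with pickle. The serialization format is an assumption made for this example, since the on-disk format is not stated here. The helper names save_pickled and load_pickled are hypothetical.

```python
import os
import pickle
import tempfile
from collections import defaultdict
from typing import Any, List


def save_pickled(path: str, data: List[Any]) -> None:
    """Serialize the data to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_pickled(path: str) -> List[Any]:
    """Read the data back from a binary file."""
    with open(path, "rb") as f:
        return pickle.load(f)


# One entry: a query type name and a tuple of answer dictionaries.
answers = defaultdict(set)
answers[(0, (0, 1))].update({3, 4})
data = [("2p", (answers,))]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "queries.pkl")
    save_pickled(path, data)
    restored = load_pickled(path)
```

A defaultdict built with a named factory such as set pickles cleanly; one built with a lambda factory does not, which matters if nested defaultdicts are stored this way.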

dicee.__version__ = '0.1.5'