dicee

DICE Embeddings - Knowledge Graph Embedding Library.

A library for training and using knowledge graph embedding models with support for various scoring techniques and training strategies.

Submodules:

  • evaluation: Model evaluation functions and the Evaluator class

  • models: KGE model implementations

  • trainer: Training orchestration

  • scripts: Utility scripts

Attributes

__version__

Classes

Execute

Executor class for training, retraining and evaluating KGE models.

KGE

Knowledge Graph Embedding Class for interactive usage of pre-trained models

QueryGenerator

DICE_Trainer

DICE_Trainer implements PyTorch Lightning, multi-GPU (DDP), and CPU training.

Evaluator

Evaluator class for KGE models in various downstream tasks.

Package Contents

class dicee.Execute(args, continuous_training: bool = False)

Executor class for training, retraining and evaluating KGE models.

Handles the complete workflow:

  1. Loading, preprocessing, and serializing input data.

  2. Training, validation, and testing.

  3. Storing all necessary information.

args

Processed input arguments.

distributed

Whether distributed training is enabled.

rank

Process rank in distributed training.

world_size

Total number of processes.

local_rank

Local GPU rank.

trainer

Training handler instance.

trained_model

The trained model after training completes.

knowledge_graph

The loaded knowledge graph.

report

Dictionary storing training metrics and results.

evaluator

Model evaluation handler.

distributed
args
is_continual_training = False
trainer: dicee.trainer.DICE_Trainer | None = None
trained_model = None
knowledge_graph: dicee.knowledge_graph.KG | None = None
report: Dict
evaluator: dicee.evaluator.Evaluator | None = None
start_time: float | None = None
is_rank_zero() bool
cleanup()
setup_executor() None

Set up storage directories for the experiment.

Creates or reuses experiment directories based on configuration. Saves the configuration to a JSON file.

create_and_store_kg() None

Create knowledge graph and store as memory-mapped file.

Only executed on rank 0 in distributed training. Skips if memmap already exists.

load_from_memmap() None

Load knowledge graph from memory-mapped file.

save_trained_model() None

Save a trained knowledge graph embedding model.

  1. Put the model in eval mode and move it to CPU.

  2. Store the memory footprint of the model.

  3. Save the model to disk.

  4. Update the KG statistics.

Returns:

None

end(form_of_labelling: str) dict

End training.

  1. Store the trained model.

  2. Report runtimes.

  3. Evaluate the model if required.

Returns:

A dict containing information about the training and/or evaluation

write_report() None

Write training-related information to a report.json file.

start() dict

Start training.

  1. Load the data.

  2. Create an evaluator object.

  3. Create a trainer object.

  4. Start the training.

Returns:

A dict containing information about the training and/or evaluation
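
Example

A minimal end-to-end sketch, assuming an argparse-style args object; the attribute names shown (dataset_dir, model, num_epochs, embedding_dim) are illustrative placeholders and must match the options accepted by the library's argument parser.

>>> from argparse import Namespace
>>> from dicee import Execute
>>> args = Namespace(dataset_dir="KGs/UMLS", model="Keci",   # placeholder settings
...                  num_epochs=10, embedding_dim=32)
>>> report = Execute(args).start()  # dict with training/evaluation information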

class dicee.KGE(path=None, url=None, construct_ensemble=False, model_name=None)

Bases: dicee.abstracts.BaseInteractiveKGE, dicee.abstracts.InteractiveQueryDecomposition, dicee.abstracts.BaseInteractiveTrainKGE

Knowledge Graph Embedding Class for interactive usage of pre-trained models
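
Example

A minimal loading sketch; the experiment path below is a placeholder for a directory produced by a previous training run.

>>> from dicee import KGE
>>> pre_trained_kge = KGE(path="Experiments/2024-01-01_12-00-00")  # placeholder path
>>> print(pre_trained_kge)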

__str__()
to(device: str) None
get_transductive_entity_embeddings(indices: torch.LongTensor | List[str], as_pytorch=False, as_numpy=False, as_list=True) torch.FloatTensor | numpy.ndarray | List[float]
create_vector_database(collection_name: str, distance: str, location: str = 'localhost', port: int = 6333)
generate(h='', r='')
eval_lp_performance(dataset=List[Tuple[str, str, str]], filtered=True)
predict_missing_head_entity(relation: List[str] | str, tail_entity: List[str] | str, within=None, batch_size=2, topk=1, return_indices=False) Tuple

Given a relation and a tail entity, return top k ranked head entity.

argmax_{e in E } f(e,r,t), where r in R, t in E.

Parameter

relation: Union[List[str], str]

String representation of selected relations.

tail_entity: Union[List[str], str]

String representation of selected entities.

topk: int

Number of highest-ranked entities to return.

Returns: Tuple

Top-k scores and the corresponding head entities
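
Example

A hedged sketch; the relation and tail-entity labels are placeholders that must exist in the model's vocabulary, and pre_trained_kge is a loaded KGE instance as above.

>>> heads = pre_trained_kge.predict_missing_head_entity(
...     relation=["bornIn"], tail_entity=["Paris"], topk=5)  # placeholder labels
>>> print(heads)  # per the docstring, a tuple of top-k scores and entities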

predict_missing_relations(head_entity: List[str] | str, tail_entity: List[str] | str, within=None, batch_size=2, topk=1, return_indices=False) Tuple

Given a head entity and a tail entity, return top k ranked relations.

argmax_{r in R } f(h,r,t), where h, t in E.

Parameter

head_entity: List[str]

String representation of selected entities.

tail_entity: List[str]

String representation of selected entities.

topk: int

Number of highest-ranked relations to return.

Returns: Tuple

Top-k scores and the corresponding relations

predict_missing_tail_entity(head_entity: List[str] | str, relation: List[str] | str, within: List[str] = None, batch_size=2, topk=1, return_indices=False) torch.FloatTensor

Given a head entity and a relation, return top k ranked entities

argmax_{e in E } f(h,r,e), where h in E and r in R.

Parameter

head_entity: List[str]

String representation of selected entities.

relation: List[str]

String representation of selected relations.

Returns: Tuple

scores

predict(*, h: List[str] | str = None, r: List[str] | str = None, t: List[str] | str = None, within=None, logits=True) torch.FloatTensor
Parameters:
  • logits

  • h

  • r

  • t

  • within

predict_topk(*, h: str | List[str] = None, r: str | List[str] = None, t: str | List[str] = None, topk: int = 10, within: List[str] = None, batch_size: int = 1024)

Predict missing item in a given triple.

Returns:

  • If you query a single (h, r, ?) or (?, r, t) or (h, ?, t), returns List[(item, score)]

  • If you query a batch of B, returns List of B such lists.
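
Example

A sketch of a single (h, r, ?) query; the labels are placeholders and pre_trained_kge is a loaded KGE instance.

>>> answers = pre_trained_kge.predict_topk(h="barack_obama", r="bornIn", topk=3)
>>> for item, score in answers:  # list of (item, score) pairs for a single query
...     print(item, score)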

triple_score(h: List[str] | str = None, r: List[str] | str = None, t: List[str] | str = None, logits=False) torch.FloatTensor

Predict triple score

Parameter

head_entity: List[str]

String representation of selected entities.

relation: List[str]

String representation of selected relations.

tail_entity: List[str]

String representation of selected entities.

logits: bool

If True, the unnormalized score is returned.

Returns: torch.FloatTensor

PyTorch tensor of triple scores
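
Example

A sketch with placeholder labels; pre_trained_kge is a loaded KGE instance.

>>> score = pre_trained_kge.triple_score(
...     h=["barack_obama"], r=["bornIn"], t=["hawaii"], logits=False)  # placeholder labels
>>> print(score)  # torch.FloatTensor with one score per triple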

return_multi_hop_query_results(aggregated_query_for_all_entities, k: int, only_scores)
single_hop_query_answering(query: tuple, only_scores: bool = True, k: int = None)
answer_multi_hop_query(query_type: str = None, query: Tuple[str | Tuple[str, str], Ellipsis] = None, queries: List[Tuple[str | Tuple[str, str], Ellipsis]] = None, tnorm: str = 'prod', neg_norm: str = 'standard', lambda_: float = 0.0, k: int = 10, only_scores=False) List[Tuple[str, torch.Tensor]]

# @TODO: Refactoring is needed # @TODO: Score computation for each query type should be done in a static function

Find an answer set for EPFO queries including negation and disjunction

Parameter

query_type: str The type of the query, e.g., “2p”.

query: Union[str, Tuple[str, Tuple[str, str]]] The query itself, either a string or a nested tuple.

queries: List of Tuple[Union[str, Tuple[str, str]], …]

tnorm: str The t-norm operator.

neg_norm: str The negation norm.

lambda_: float The lambda parameter for the Sugeno and Yager negation norms.

k: int The top-k substitutions for intermediate variables.

Returns:
  • List[Tuple[str, torch.Tensor]]

  • Entities and corresponding scores, sorted in descending order of score
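
Example

A sketch of a 2p (two-hop projection) query; the anchor entity and relation labels are placeholders, and the (anchor, (relation_1, relation_2)) layout is assumed from the nested-tuple query format described above.

>>> answers = pre_trained_kge.answer_multi_hop_query(
...     query_type="2p",
...     query=("barack_obama", ("bornIn", "locatedIn")),  # placeholder labels
...     tnorm="prod", k=10)
>>> for entity, score in answers:  # sorted by descending score
...     print(entity, score)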

find_missing_triples(confidence: float, entities: List[str] = None, relations: List[str] = None, topk: int = 10, at_most: int = sys.maxsize) Set

Find missing triples.

Iterate over a set of entities E and a set of relations R: for all e in E and for all r in R, compute f(e,r,x).

Return every (e,r,x) not in G with f(e,r,x) > confidence.

confidence: float

A threshold on the output of a sigmoid function given a triple.

topk: int

Number of highest-ranked items used to select triples with f(e,r,x) > confidence.

at_most: int

Stop after finding at_most missing triples.

Returns: Set

{(e,r,x) | f(e,r,x) > confidence and (e,r,x) not in G}
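
Example

A sketch with placeholder entity and relation labels; lowering confidence yields more, but noisier, candidate triples.

>>> missing = pre_trained_kge.find_missing_triples(
...     confidence=0.95,
...     entities=["barack_obama", "hawaii"],  # placeholder labels
...     relations=["bornIn"],
...     topk=10, at_most=100)
>>> print(missing)  # a set of candidate triples not present in G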

predict_literals(entity: List[str] | str = None, attribute: List[str] | str = None, denormalize_preds: bool = True) numpy.ndarray

Predicts literal values for given entities and attributes.

Parameters:
  • entity (Union[List[str], str]) – Entity or list of entities to predict literals for.

  • attribute (Union[List[str], str]) – Attribute or list of attributes to predict literals for.

  • denormalize_preds (bool) – If True, denormalizes the predictions.

Returns:

Predictions for the given entities and attributes.

Return type:

numpy.ndarray
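
Example

A sketch with placeholder entity and attribute labels; this assumes the model was trained with literal (attribute) data.

>>> preds = pre_trained_kge.predict_literals(
...     entity=["barack_obama"], attribute=["height"],  # placeholder labels
...     denormalize_preds=True)
>>> print(preds)  # numpy.ndarray of predicted literal values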

class dicee.QueryGenerator(train_path, val_path: str, test_path: str, ent2id: Dict = None, rel2id: Dict = None, seed: int = 1, gen_valid: bool = False, gen_test: bool = True)
train_path
val_path
test_path
gen_valid = False
gen_test = True
seed = 1
max_ans_num = 1000000.0
mode
ent2id = None
rel2id: Dict = None
ent_in: Dict
ent_out: Dict
query_name_to_struct
list2tuple(list_data)
tuple2list(x: List | Tuple) List | Tuple

Convert a nested tuple to a nested list.

set_global_seed(seed: int)

Set seed

construct_graph(paths: List[str]) Tuple[Dict, Dict]

Construct a graph from triples. Returns dicts with incoming and outgoing edges.

fill_query(query_structure: List[str | List], ent_in: Dict, ent_out: Dict, answer: int) bool

Private method for fill_query logic.

achieve_answer(query: List[str | List], ent_in: Dict, ent_out: Dict) set

Private method for achieve_answer logic. @TODO: Document the code

ground_queries(query_structure: List[str | List], ent_in: Dict, ent_out: Dict, small_ent_in: Dict, small_ent_out: Dict, gen_num: int, query_name: str)

Generate queries and obtain their answers.

unmap(query_type, queries, tp_answers, fp_answers, fn_answers)
unmap_query(query_structure, query, id2ent, id2rel)
generate_queries(query_struct: List, gen_num: int, query_type: str)

Pass incoming and outgoing edges to ground queries depending on the mode [train, valid, or test] and return queries and answers. @TODO: create a class for each single query struct

save_queries(query_type: str, gen_num: int, save_path: str)
abstract load_queries(path)
get_queries(query_type: str, gen_num: int)
static save_queries_and_answers(path: str, data: List[Tuple[str, Tuple[collections.defaultdict]]]) None

Save queries to disk

static load_queries_and_answers(path: str) List[Tuple[str, Tuple[collections.defaultdict]]]

Load Queries from Disk to Memory
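
Example

A sketch of generating and saving multi-hop queries; the file paths are placeholders for tab-separated triple files.

>>> from dicee import QueryGenerator
>>> qg = QueryGenerator(train_path="KGs/UMLS/train.txt",  # placeholder paths
...                     val_path="KGs/UMLS/valid.txt",
...                     test_path="KGs/UMLS/test.txt", seed=1)
>>> qg.save_queries(query_type="2p", gen_num=100, save_path="Queries")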

class dicee.DICE_Trainer(args, is_continual_training: bool, storage_path, evaluator=None)
DICE_Trainer implements:

  1. PyTorch Lightning trainer (https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html)

  2. Multi-GPU trainer (https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html)

  3. CPU trainer

args

is_continual_training: bool

storage_path: str

evaluator

report: dict

report
args
trainer = None
is_continual_training
storage_path
evaluator = None
form_of_labelling = None
continual_start(knowledge_graph)
  1. Initialize training.

  2. Load the model.

  3. Load the trainer.

  4. Fit the model.

Returns:
  • model

  • form_of_labelling (str)

initialize_trainer(callbacks: List) lightning.Trainer | dicee.trainer.model_parallelism.TensorParallel | dicee.trainer.torch_trainer.TorchTrainer | dicee.trainer.torch_trainer_ddp.TorchDDPTrainer

Initialize Trainer from input arguments

initialize_or_load_model()
init_dataloader(dataset: torch.utils.data.Dataset) torch.utils.data.DataLoader
init_dataset() torch.utils.data.Dataset
start(knowledge_graph: dicee.knowledge_graph.KG | numpy.memmap) Tuple[dicee.models.base_model.BaseKGE, str]

Start the training

  1. Initialize Trainer

  2. Initialize or load a pretrained KGE model

In a DDP setup, the memory map of the already read/indexed KG needs to be loaded.
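
Example

A sketch of the lower-level training API; DICE_Trainer is normally driven by Execute, so args and knowledge_graph are assumed to come from an Execute-style setup, and the storage path is a placeholder.

>>> from dicee import DICE_Trainer
>>> trainer = DICE_Trainer(args=args, is_continual_training=False,
...                        storage_path="Experiments/demo")  # placeholder path
>>> trained_model, form_of_labelling = trainer.start(knowledge_graph)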

k_fold_cross_validation(dataset) Tuple[dicee.models.base_model.BaseKGE, str]

Perform K-fold Cross-Validation

  1. Obtain K train and test splits.

  2. For each split,

    2.1. Initialize the trainer and model.

    2.2. Train the model with the configuration provided in args.

    2.3. Compute the mean reciprocal rank (MRR) of the model on the respective test split.

  3. Report the average MRR across the K splits.

Parameters:
  • self

  • dataset

Returns:

model

class dicee.Evaluator(args, is_continual_training: bool = False)

Evaluator class for KGE models in various downstream tasks.

Orchestrates link prediction evaluation with different scoring techniques including standard evaluation and byte-pair encoding based evaluation.

er_vocab

Entity-relation to tail vocabulary for filtered ranking.

re_vocab

Relation-entity (tail) to head vocabulary.

ee_vocab

Entity-entity to relation vocabulary.

num_entities

Total number of entities in the knowledge graph.

num_relations

Total number of relations in the knowledge graph.

args

Configuration arguments.

report

Dictionary storing evaluation results.

during_training

Whether evaluation is happening during training.

Example

>>> from dicee.evaluation import Evaluator
>>> evaluator = Evaluator(args)
>>> results = evaluator.eval(dataset, model, 'EntityPrediction')
>>> print(f"Test MRR: {results['Test']['MRR']:.4f}")
re_vocab: Dict | None = None
er_vocab: Dict | None = None
ee_vocab: Dict | None = None
func_triple_to_bpe_representation = None
is_continual_training = False
num_entities: int | None = None
num_relations: int | None = None
domain_constraints_per_rel = None
range_constraints_per_rel = None
args
report: Dict
during_training = False
vocab_preparation(dataset) None

Prepare vocabularies from the dataset for evaluation.

Resolves any future objects and saves vocabularies to disk.

Parameters:

dataset – Knowledge graph dataset with vocabulary attributes.

eval(dataset, trained_model, form_of_labelling: str, during_training: bool = False) Dict | None

Evaluate the trained model on the dataset.

Parameters:
  • dataset – Knowledge graph dataset (KG instance).

  • trained_model – The trained KGE model.

  • form_of_labelling – Type of labelling (‘EntityPrediction’ or ‘RelationPrediction’).

  • during_training – Whether evaluation is during training.

Returns:

Dictionary of evaluation metrics, or None if evaluation is skipped.

eval_rank_of_head_and_tail_entity(*, train_set, valid_set=None, test_set=None, trained_model) None

Evaluate with negative sampling scoring.

eval_rank_of_head_and_tail_byte_pair_encoded_entity(*, train_set=None, valid_set=None, test_set=None, ordered_bpe_entities, trained_model) None

Evaluate with BPE-encoded entities and negative sampling.

eval_with_byte(*, raw_train_set, raw_valid_set=None, raw_test_set=None, trained_model, form_of_labelling) None

Evaluate BytE model with generation.

eval_with_bpe_vs_all(*, raw_train_set, raw_valid_set=None, raw_test_set=None, trained_model, form_of_labelling) None

Evaluate with BPE and KvsAll scoring.

eval_with_vs_all(*, train_set, valid_set=None, test_set=None, trained_model, form_of_labelling) None

Evaluate with KvsAll or 1vsAll scoring.

evaluate_lp_k_vs_all(model, triple_idx, info: str = None, form_of_labelling: str = None) Dict[str, float]

Filtered link prediction evaluation with KvsAll scoring.

Parameters:
  • model – The trained model to evaluate.

  • triple_idx – Integer-indexed test triples.

  • info – Description to print.

  • form_of_labelling – ‘EntityPrediction’ or ‘RelationPrediction’.

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.

evaluate_lp_with_byte(model, triples: List[List[str]], info: str = None) Dict[str, float]

Evaluate BytE model with text generation.

Parameters:
  • model – BytE model.

  • triples – String triples.

  • info – Description to print.

Returns:

Dictionary with placeholder metrics (-1 values).

evaluate_lp_bpe_k_vs_all(model, triples: List[List[str]], info: str = None, form_of_labelling: str = None) Dict[str, float]

Evaluate BPE model with KvsAll scoring.

Parameters:
  • model – BPE-enabled model.

  • triples – String triples.

  • info – Description to print.

  • form_of_labelling – Type of labelling.

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.

evaluate_lp(model, triple_idx, info: str) Dict[str, float]

Evaluate link prediction with negative sampling.

Parameters:
  • model – The model to evaluate.

  • triple_idx – Integer-indexed triples.

  • info – Description to print.

Returns:

Dictionary with H@1, H@3, H@10, and MRR metrics.

dummy_eval(trained_model, form_of_labelling: str) None

Run evaluation from saved data (for continual training).

Parameters:
  • trained_model – The trained model.

  • form_of_labelling – Type of labelling.

eval_with_data(dataset, trained_model, triple_idx: numpy.ndarray, form_of_labelling: str) Dict[str, float]

Evaluate a trained model on a given dataset.

Parameters:
  • dataset – Knowledge graph dataset.

  • trained_model – The trained model.

  • triple_idx – Integer-indexed triples to evaluate.

  • form_of_labelling – Type of labelling.

Returns:

Dictionary with evaluation metrics.

Raises:

ValueError – If scoring technique is invalid.

dicee.__version__ = '0.1.5'