dicee.models.real

Classes

DistMult

DistMult: bilinear diagonal knowledge graph embedding.

TransE

TransE: translation-based knowledge graph embedding.

Shallom

Shallom: shallow neural model for relation prediction.

Pyke

Pyke: Physical Embedding Model for Knowledge Graphs.

CoKEConfig

Configuration for the CoKE (Contextualized Knowledge Graph Embedding) model.

CoKE

Contextualized Knowledge Graph Embedding (CoKE) model.

Module Contents

class dicee.models.real.DistMult(args)[source]

Bases: dicee.models.base_model.BaseKGE

DistMult: bilinear diagonal knowledge graph embedding.

Scores a triple (h, r, t) as the element-wise product of the head, relation, and tail embeddings summed over the embedding dimension:

f(h, r, t) = \sum_i h_i \cdot r_i \cdot t_i

A simple yet effective baseline; because the score is symmetric in h and t, DistMult cannot model asymmetric relations.

References

Yang et al., Embedding Entities and Relations for Learning and Inference in Knowledge Bases, ICLR 2015. https://arxiv.org/abs/1412.6575
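The score function and the KvsAll matrix form can be sketched in plain PyTorch; the tensors below stand in for dicee's learned embedding tables and omit the dropout/normalisation the actual model applies:

```python
import torch

def distmult_score(h, r, t):
    # f(h, r, t) = sum_i h_i * r_i * t_i, computed per batch row
    return (h * r * t).sum(dim=-1)

def distmult_k_vs_all(h, r, E):
    # (h * r) @ E^T scores each (head, relation) pair against every entity
    return (h * r) @ E.t()

h = torch.tensor([[1.0, 2.0]])
r = torch.tensor([[0.5, 1.0]])
t = torch.tensor([[2.0, 1.0]])
E = torch.tensor([[2.0, 1.0], [1.0, 0.0]])  # two candidate entities

print(distmult_score(h, r, t))        # tensor([3.])
print(distmult_k_vs_all(h, r, E))     # one score per candidate entity
```

Note that the first row of E equals t, so its KvsAll score matches the triple score, illustrating that KvsAll is just the triple score broadcast over all entities.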

name = 'DistMult'
k_vs_all_score(emb_h: torch.FloatTensor, emb_r: torch.FloatTensor, emb_E: torch.FloatTensor) torch.FloatTensor[source]

Score a head/relation batch against all entity embeddings.

Computes (h * r) @ E^T after applying hidden dropout and normalisation to the element-wise product.

Parameters:
  • emb_h (torch.FloatTensor) – Head entity embeddings, shape (batch_size, embedding_dim).

  • emb_r (torch.FloatTensor) – Relation embeddings, shape (batch_size, embedding_dim).

  • emb_E (torch.FloatTensor) – All entity embeddings, shape (num_entities, embedding_dim).

Returns:

Shape (batch_size, num_entities) score matrix.

Return type:

torch.FloatTensor

forward_k_vs_all(x: torch.LongTensor) torch.FloatTensor[source]

KvsAll forward pass: score head/relation against all entities.

Parameters:

x (torch.LongTensor) – Shape (batch_size, 2) integer tensor [head_idx, relation_idx].

Returns:

Shape (batch_size, num_entities) score matrix.

Return type:

torch.FloatTensor

forward_k_vs_sample(x: torch.LongTensor, target_entity_idx: torch.LongTensor) torch.FloatTensor[source]

KvsSample forward pass: score head/relation against a sampled entity subset.

Parameters:
  • x (torch.LongTensor) – Shape (batch_size, 2) integer tensor [head_idx, relation_idx].

  • target_entity_idx (torch.LongTensor) – Shape (batch_size, k) indices of the k target entities per sample.

Returns:

Shape (batch_size, k) score matrix.

Return type:

torch.FloatTensor

score(h: torch.FloatTensor, r: torch.FloatTensor, t: torch.FloatTensor) torch.FloatTensor[source]

Score a batch of (head, relation, tail) embedding triples.

Parameters:
  • h (torch.FloatTensor) – Head entity embeddings, shape (batch_size, embedding_dim).

  • r (torch.FloatTensor) – Relation embeddings, shape (batch_size, embedding_dim).

  • t (torch.FloatTensor) – Tail entity embeddings, shape (batch_size, embedding_dim).

Returns:

Shape (batch_size,) triple scores.

Return type:

torch.FloatTensor

class dicee.models.real.TransE(args)[source]

Bases: dicee.models.base_model.BaseKGE

TransE: translation-based knowledge graph embedding.

Models a relation r as a translation in embedding space such that h + r ≈ t for a true triple (h, r, t). The score function is defined as:

f(h, r, t) = margin - ||h + r - t||_2

TransE is effective for 1-to-1 relations but struggles with reflexive, one-to-many, and many-to-one patterns.

References

Bordes et al., Translating Embeddings for Modeling Multi-relational Data, NeurIPS 2013. https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf
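The margin-distance score and its KvsAll form can be sketched as follows; torch.cdist is used here for the pairwise distances, which may differ from dicee's exact implementation, and the margin of 4 mirrors the class attribute below:

```python
import torch

MARGIN = 4.0  # matches the documented class attribute margin = 4

def transe_score(h, r, t):
    # margin - ||h + r - t||_2, per batch row
    return MARGIN - torch.norm(h + r - t, p=2, dim=-1)

def transe_k_vs_all(h, r, E):
    # margin minus the L2 distance between (h + r) and every entity embedding
    return MARGIN - torch.cdist(h + r, E, p=2)

h = torch.tensor([[0.0, 0.0]])
r = torch.tensor([[3.0, 4.0]])
t = torch.tensor([[3.0, 4.0]])
E = torch.tensor([[3.0, 4.0], [0.0, 0.0]])

print(transe_score(h, r, t))       # an exact translation scores the full margin
print(transe_k_vs_all(h, r, E))    # one score per candidate entity
```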

name = 'TransE'
margin = 4
score(head_ent_emb: torch.FloatTensor, rel_ent_emb: torch.FloatTensor, tail_ent_emb: torch.FloatTensor) torch.FloatTensor[source]

Score a batch of triples using the TransE margin-distance formula.

Parameters:
  • head_ent_emb (torch.FloatTensor) – Head entity embeddings, shape (batch_size, embedding_dim).

  • rel_ent_emb (torch.FloatTensor) – Relation embeddings, shape (batch_size, embedding_dim).

  • tail_ent_emb (torch.FloatTensor) – Tail entity embeddings, shape (batch_size, embedding_dim).

Returns:

Shape (batch_size,) scores equal to margin - ||h + r - t||_2.

Return type:

torch.FloatTensor

forward_k_vs_all(x: torch.Tensor) torch.FloatTensor[source]

KvsAll forward pass: score head/relation against all entities.

Computes margin - ||h + r - e||_2 for every entity embedding e.

Parameters:

x (torch.Tensor) – Shape (batch_size, 2) integer tensor [head_idx, relation_idx].

Returns:

Shape (batch_size, num_entities) score matrix.

Return type:

torch.FloatTensor

class dicee.models.real.Shallom(args)[source]

Bases: dicee.models.base_model.BaseKGE

Shallom: shallow neural model for relation prediction.

Represents each triple as the concatenation of head and tail entity embeddings and feeds it through a two-layer MLP to predict the relation. Designed for the RelationPrediction labelling form.

References

Demir et al., A Shallow Neural Model for Relation Prediction, ISWC 2021. https://arxiv.org/abs/2101.09090
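The architecture described above can be sketched in plain PyTorch. The hidden width and ReLU activation here are illustrative assumptions, not dicee's actual hyperparameters:

```python
import torch
import torch.nn as nn

num_entities, num_relations, dim = 10, 4, 8
entity_emb = nn.Embedding(num_entities, dim)

# Two-layer MLP over the concatenated (head, tail) embeddings;
# the output dimension is the number of relations to predict.
shallom_mlp = nn.Sequential(
    nn.Linear(2 * dim, 16),   # hidden width is an illustrative choice
    nn.ReLU(),
    nn.Linear(16, num_relations),
)

heads = torch.tensor([0, 1])
tails = torch.tensor([2, 3])
x = torch.cat([entity_emb(heads), entity_emb(tails)], dim=1)  # (batch, 2*dim)
relation_scores = shallom_mlp(x)  # (batch, num_relations)
print(relation_scores.shape)      # torch.Size([2, 4])
```

This makes the RelationPrediction labelling form concrete: each row of the output scores every relation for one (head, tail) pair.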

name = 'Shallom'
shallom
get_embeddings() Tuple[numpy.ndarray, None][source]

Return the entity and relation embedding matrices as numpy arrays.

Returns:

  • entity_embeddings (numpy.ndarray) – Shape (num_entities, embedding_dim).

  • relation_embeddings (numpy.ndarray) – Shape (num_relations, embedding_dim).

forward_k_vs_all(x) torch.FloatTensor[source]

Score a (head, relation) batch against every entity.

Sub-classes must override this method. The default implementation raises ValueError to make missing overrides obvious at runtime.

Returns:

Shape (batch_size, num_entities) score matrix.

Return type:

torch.FloatTensor

forward_triples(x) torch.FloatTensor[source]

Score a batch of triples by looking up relation scores from forward_k_vs_all.

Parameters:

x (torch.LongTensor) – Shape (batch_size, 3) integer tensor [head_idx, relation_idx, tail_idx].

Returns:

Shape (batch_size,) triple scores.

Return type:

torch.FloatTensor

class dicee.models.real.Pyke(args)[source]

Bases: dicee.models.base_model.BaseKGE

Pyke: Physical Embedding Model for Knowledge Graphs.

Scores a triple (h, r, t) using the average of the head-to-relation and relation-to-tail distances in embedding space:

f(h, r, t) = margin - (||h - r||_2 + ||r - t||_2) / 2

The model encodes geometric proximity between entities and the relations that connect them.
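The distance formula can be sketched directly; the margin of 1.0 mirrors the class attribute below, and the plain tensors stand in for dicee's embedding lookups:

```python
import torch

MARGIN = 1.0  # matches the documented class attribute margin = 1.0

def pyke_score(h, r, t):
    # margin - (||h - r||_2 + ||r - t||_2) / 2
    d_hr = torch.norm(h - r, p=2, dim=-1)
    d_rt = torch.norm(r - t, p=2, dim=-1)
    return MARGIN - (d_hr + d_rt) / 2

h = torch.tensor([[0.0, 0.0]])
r = torch.tensor([[3.0, 4.0]])
t = torch.tensor([[3.0, 4.0]])
print(pyke_score(h, r, t))  # tensor([-1.5000])
```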

name = 'Pyke'
dist_func
margin = 1.0
forward_triples(x: torch.LongTensor) torch.FloatTensor[source]

Score a batch of triples using the Pyke distance formula.

Parameters:

x (torch.LongTensor) – Shape (batch_size, 3) integer tensor [head_idx, relation_idx, tail_idx].

Returns:

Shape (batch_size,) triple scores.

Return type:

torch.FloatTensor

class dicee.models.real.CoKEConfig[source]

Configuration for the CoKE (Contextualized Knowledge Graph Embedding) model.

block_size

Sequence length for transformer (3 for triples: head, relation, tail)

vocab_size

Total vocabulary size (num_entities + num_relations)

n_layer

Number of transformer layers

n_head

Number of attention heads per layer

n_embd

Embedding dimension (set to match model embedding_dim)

dropout

Dropout rate applied throughout the model

bias

Whether to use bias in linear layers

causal

Whether to use causal masking (False for bidirectional attention)

block_size: int = 3
vocab_size: int = None
n_layer: int = 6
n_head: int = 8
n_embd: int = None
dropout: float = 0.3
bias: bool = True
causal: bool = False
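For illustration, the fields and defaults listed above can be mirrored as a dataclass; this reproduces the documented attributes, not necessarily dicee's exact class definition:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CoKEConfig:
    block_size: int = 3                 # head, relation, tail
    vocab_size: Optional[int] = None    # num_entities + num_relations
    n_layer: int = 6
    n_head: int = 8
    n_embd: Optional[int] = None        # set to match model embedding_dim
    dropout: float = 0.3
    bias: bool = True
    causal: bool = False                # bidirectional attention

# Typical usage: fill in the data-dependent fields at construction time.
cfg = CoKEConfig(vocab_size=120, n_embd=32)
print(cfg.n_layer, cfg.causal)  # 6 False
```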
class dicee.models.real.CoKE(args, config: CoKEConfig = CoKEConfig())[source]

Bases: dicee.models.base_model.BaseKGE

Contextualized Knowledge Graph Embedding (CoKE) model. Based on: https://arxiv.org/pdf/1911.02168.

CoKE uses a transformer encoder to learn contextualized representations of entities and relations. For link prediction, it predicts masked elements in (head, relation, tail) triples using bidirectional attention, similar to BERT’s masked language modeling approach.

The model creates a sequence [head_emb, relation_emb, mask_emb], adds positional embeddings, and processes it through transformer layers to predict the tail entity.
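That sequence construction can be sketched with stock PyTorch modules; nn.TransformerEncoder stands in for the model's transformer blocks, and all sizes here are illustrative assumptions rather than dicee's defaults:

```python
import torch
import torch.nn as nn

n_embd, block_size, vocab_size = 8, 3, 20
tok_emb = nn.Embedding(vocab_size, n_embd)          # shared entity/relation vocab
pos_emb = nn.Parameter(torch.zeros(1, block_size, n_embd))
mask_emb = nn.Parameter(torch.zeros(1, 1, n_embd))  # learned [MASK] embedding
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=n_embd, nhead=2, batch_first=True),
    num_layers=2,
)

head_idx = torch.tensor([[4]])
rel_idx = torch.tensor([[15]])
# Build [head_emb, relation_emb, mask_emb], add positional embeddings.
seq = torch.cat([tok_emb(head_idx), tok_emb(rel_idx), mask_emb], dim=1)
seq = seq + pos_emb                       # (1, 3, n_embd)
out = encoder(seq)                        # no causal mask: bidirectional attention
# Read the prediction off the masked (tail) position and score the vocabulary.
tail_logits = out[:, -1, :] @ tok_emb.weight.t()
print(tail_logits.shape)                  # torch.Size([1, 20])
```

Scoring against the token embedding matrix at the masked position is the BERT-style readout the description refers to.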

name = 'CoKE'
config
pos_emb
mask_emb
blocks
ln_f
coke_dropout
forward_k_vs_all(x: torch.Tensor)[source]

Score a (head, relation) batch against every entity.

Sub-classes must override this method. The default implementation raises ValueError to make missing overrides obvious at runtime.

Returns:

Shape (batch_size, num_entities) score matrix.

Return type:

torch.FloatTensor

score(emb_h, emb_r, emb_t)[source]
forward_k_vs_sample(x: torch.LongTensor, target_entity_idx: torch.LongTensor)[source]

Score a (head, relation) batch against a sampled subset of entities.

Used by KvsSample and 1vsSample datasets. Sub-classes that support sample-based labelling must override this method.

Returns:

Shape (batch_size, k) score matrix where k is the number of sampled target entities.

Return type:

torch.FloatTensor