dicee.models.transformers

Full definition of a GPT Language Model, all of it in this single file.

References:

  1. the official GPT-2 TensorFlow implementation released by OpenAI: https://github.com/openai/gpt-2/blob/master/src/model.py
  2. the huggingface/transformers PyTorch implementation: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py

Classes

BytE

Knowledge graph embedding model that wraps the GPT transformer defined in this file.

LayerNorm

LayerNorm with an optional bias (older versions of torch.nn.LayerNorm do not accept bias=False).

SelfAttention

Multi-head self-attention layer with optional causal masking.

MLP

Position-wise feed-forward network (linear, GELU, linear, dropout) used inside each transformer block.

Block

A single transformer block combining layer normalisation, self-attention, and the MLP with residual connections.

GPTConfig

Configuration container for the GPT hyperparameters (block size, vocabulary size, depth, width, dropout, bias, causality).

GPT

GPT language model: token and position embeddings, a stack of Blocks, and a language-modelling head.

Module Contents

class dicee.models.transformers.BytE(*args, **kwargs)[source]

Bases: dicee.models.base_model.BaseKGE

Knowledge graph embedding model that wraps the GPT transformer defined in this file. Through BaseKGE it inherits the Lightning training loop (via BaseKGELightning), the embedding tables, normalisation / dropout layers, and the routing logic that dispatches forward() calls to the appropriate scoring method.

Sub-classes must implement at minimum:

  • forward_triples() — score a batch of (h, r, t) triples.

  • forward_k_vs_all() — score a (h, r) batch against every entity.
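
For illustration, a minimal sub-class could look roughly like the sketch below. The embedding-table attribute names and tensor shapes are assumptions made for this example, not taken from this page.

import torch
from dicee.models.base_model import BaseKGE


class ToyKGE(BaseKGE):
    """Hypothetical sub-class sketch; the attribute names used below are assumptions."""

    name = "ToyKGE"

    def forward_triples(self, x: torch.LongTensor) -> torch.FloatTensor:
        # x: (batch, 3) indices of (head, relation, tail)
        h = self.entity_embeddings(x[:, 0])
        r = self.relation_embeddings(x[:, 1])
        t = self.entity_embeddings(x[:, 2])
        return (h * r * t).sum(dim=1)                        # one score per triple

    def forward_k_vs_all(self, x: torch.LongTensor) -> torch.FloatTensor:
        # x: (batch, 2) indices of (head, relation)
        h = self.entity_embeddings(x[:, 0])
        r = self.relation_embeddings(x[:, 1])
        return (h * r) @ self.entity_embeddings.weight.t()   # score against every entity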

Parameters:

args (dict) – Flat configuration dictionary produced by vars(argparse.Namespace). Required keys: embedding_dim, num_entities, num_relations, learning_rate (or lr), optim, scoring_technique.

name = 'BytE'
config
temperature = 0.5
topk = 2
transformer
lm_head
loss_function(yhat_batch, y_batch)[source]

Compute the loss between model predictions and targets.

Delegates to self.loss, which is configured in BaseKGE.__init__ based on the scoring technique (BCEWithLogitsLoss for entity/relation prediction, CrossEntropyLoss for classification).

Parameters:
  • yhat_batch (torch.FloatTensor) – Model output scores, shape (batch_size, *).

  • y_batch (torch.FloatTensor) – Ground-truth labels of the same shape as yhat_batch.

Returns:

Scalar loss value.

Return type:

torch.FloatTensor

forward(x: torch.LongTensor)[source]
Parameters:

x (torch.LongTensor) – Batch of token indices of shape (B, T) (batch size by sequence length).

generate(idx, max_new_tokens, temperature=1.0, top_k=None)[source]

Take a conditioning sequence of indices idx (a LongTensor of shape (b, t)) and complete it by sampling max_new_tokens additional tokens, feeding each prediction back into the model. You will most likely want to put the model in model.eval() mode before calling this.
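
A usage sketch of generate(), assuming model is an already-constructed BytE instance; the token ids below are placeholder values:

import torch

model.eval()                                   # disable dropout before sampling
idx = torch.tensor([[17, 42, 7]])              # conditioning sequence of shape (b=1, t=3)
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=8, temperature=0.5, top_k=2)
print(out.shape)                               # torch.Size([1, 11]) – the 3 original tokens plus 8 new ones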

training_step(batch, batch_idx=None)[source]

Execute one optimisation step for the given mini-batch.

Handles two- and three-element batches produced by the different dataset classes (KvsAll / NegSample vs. KvsSample).

Parameters:
  • batch (tuple) – (x, y) for standard scoring, or (x, y_select, y) for sample-based labelling.

  • batch_idx (int, optional) – Index of the current batch (unused, kept for Lightning API compat).

Returns:

Scalar loss value for this batch.

Return type:

torch.FloatTensor
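
The batch handling described in training_step can be sketched as follows; this illustrates the documented two- versus three-element unpacking, and how the selection tensor is consumed is an assumption:

def training_step(self, batch, batch_idx=None):
    if len(batch) == 2:                  # KvsAll / NegSample batches: (x, y)
        x, y = batch
        yhat = self.forward(x)
    else:                                # KvsSample batches: (x, y_select, y)
        x, y_select, y = batch
        yhat = self.forward(x)           # how y_select is applied here is an assumption
    return self.loss_function(yhat, y)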

class dicee.models.transformers.LayerNorm(ndim, bias)[source]

Bases: torch.nn.Module

LayerNorm with an optional bias (older versions of torch.nn.LayerNorm do not accept bias=False).

weight
bias
forward(input)[source]
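
The optional bias is typically realised by calling the functional layer norm directly; the sketch below shows the idea and may not match the source line for line:

import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerNormSketch(nn.Module):
    def __init__(self, ndim, bias):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(ndim))
        self.bias = nn.Parameter(torch.zeros(ndim)) if bias else None

    def forward(self, input):
        # F.layer_norm accepts bias=None, which nn.LayerNorm historically did not expose
        return F.layer_norm(input, self.weight.shape, self.weight, self.bias, 1e-5)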
class dicee.models.transformers.SelfAttention(config)[source]

Bases: torch.nn.Module

Multi-head self-attention layer. c_attn projects the input to queries, keys, and values in a single matrix multiply; c_proj maps the concatenated head outputs back to the embedding dimension. Depending on config.causal, attention is either autoregressive (each position attends only to earlier positions) or bidirectional, and the flash attention kernel is used when available.

c_attn
c_proj
attn_dropout
resid_dropout
n_head
n_embd
dropout
causal
flash = True
forward(x)[source]
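
Given the attributes above, the forward pass follows the usual GPT-2 attention layout; the sketch below is a paraphrase, not necessarily the exact source:

import torch.nn.functional as F

def self_attention_forward(self, x):
    B, T, C = x.size()                                       # batch, sequence length, embedding dim
    q, k, v = self.c_attn(x).split(self.n_embd, dim=2)       # one projection yields q, k, v
    q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
    k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
    v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
    y = F.scaled_dot_product_attention(                      # flash attention kernel
        q, k, v,
        dropout_p=self.dropout if self.training else 0.0,
        is_causal=self.causal,
    )
    y = y.transpose(1, 2).contiguous().view(B, T, C)         # re-assemble the heads
    return self.resid_dropout(self.c_proj(y))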
class dicee.models.transformers.MLP(config)[source]

Bases: torch.nn.Module

Position-wise feed-forward network applied after attention in every transformer block: a linear expansion (c_fc), a GELU non-linearity, a linear projection back to the embedding dimension (c_proj), and dropout.

c_fc
gelu
c_proj
dropout
forward(x)[source]
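
The attribute list implies the standard transformer feed-forward path; a sketch (the 4x expansion factor is an assumption):

def mlp_forward(self, x):
    x = self.c_fc(x)        # linear expansion, typically n_embd -> 4 * n_embd
    x = self.gelu(x)        # GELU non-linearity
    x = self.c_proj(x)      # project back to n_embd
    return self.dropout(x)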
class dicee.models.transformers.Block(config)[source]

Bases: torch.nn.Module

A single transformer block. Applies layer normalisation followed by self-attention, then layer normalisation followed by the MLP, each wrapped in a residual connection (pre-norm architecture).

ln_1
attn
ln_2
mlp
forward(x)[source]
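
The attributes ln_1, attn, ln_2 and mlp suggest the standard pre-norm residual wiring, sketched below:

def block_forward(self, x):
    x = x + self.attn(self.ln_1(x))   # attention sub-layer with residual connection
    x = x + self.mlp(self.ln_2(x))    # feed-forward sub-layer with residual connection
    return x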
class dicee.models.transformers.GPTConfig[source]
block_size: int = 1024
vocab_size: int = 50304
n_layer: int = 12
n_head: int = 12
n_embd: int = 768
dropout: float = 0.0
bias: bool = False
causal: bool = True
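
A construction sketch using the defaults above, assuming GPTConfig accepts keyword overrides in the usual dataclass style:

from dicee.models.transformers import GPT, GPTConfig

default_cfg = GPTConfig()                       # 12 layers, 12 heads, 768 dims, block size 1024
small_cfg = GPTConfig(n_layer=4, n_head=4, n_embd=128, block_size=256, dropout=0.1)
model = GPT(small_cfg)
print(model.get_num_params())                   # parameter count (position embeddings excluded)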
class dicee.models.transformers.GPT(config)[source]

Bases: torch.nn.Module

GPT language model. Combines token and position embeddings with a stack of Block modules and a final language-modelling head (lm_head); the token embedding matrix and lm_head share weights.

config
transformer
lm_head
get_num_params(non_embedding=True)[source]

Return the number of parameters in the model. For the non-embedding count (the default), the position embeddings are subtracted. The token embeddings would be subtracted too, but because of weight sharing they are also used as the weights of the final lm_head layer, so they are kept in the count.

forward(idx, targets=None)[source]
crop_block_size(block_size)[source]
classmethod from_pretrained(model_type, override_args=None)[source]
configure_optimizers(weight_decay, learning_rate, betas, device_type)[source]
estimate_mfu(fwdbwd_per_iter, dt)[source]

Estimate model FLOPs utilization (MFU) as a fraction of A100 bfloat16 peak FLOPS.
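
A usage sketch of the optimiser and MFU helpers; the argument values are illustrative only:

optimizer = model.configure_optimizers(
    weight_decay=0.1,
    learning_rate=6e-4,
    betas=(0.9, 0.95),
    device_type="cuda",
)
# Fraction of A100 bfloat16 peak FLOPS achieved, given the number of forward+backward
# passes per iteration and the measured wall-clock time dt (seconds) of one iteration.
mfu = model.estimate_mfu(fwdbwd_per_iter=32, dt=0.5)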