dicee.models.transformers
Full definition of a GPT Language Model, all of it in this single file.

References:
1) the official GPT-2 TensorFlow implementation released by OpenAI: https://github.com/openai/gpt-2/blob/master/src/model.py
2) the huggingface/transformers PyTorch implementation: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py
Classes
- BytE: GPT-style language model built on the BaseKGE training loop.
- LayerNorm: LayerNorm but with an optional bias.
- SelfAttention: Multi-head self-attention with an optional causal mask.
- MLP: Position-wise feed-forward block.
- Block: A single transformer block (layer norm, attention, layer norm, MLP).
- GPTConfig: Default GPT hyper-parameters.
- GPT: The full GPT language model.
Module Contents
- class dicee.models.transformers.BytE(*args, **kwargs)[source]
Bases: dicee.models.base_model.BaseKGE (the base class for all Knowledge Graph Embedding models)

Inherits the Lightning training loop from BaseKGELightning and adds the embedding tables, normalisation / dropout layers, and the routing logic that dispatches forward() calls to the appropriate scoring method.

Sub-classes must implement at minimum:
- forward_triples() – score a batch of (h, r, t) triples.
- forward_k_vs_all() – score a (h, r) batch against every entity.
- Parameters:
args (dict) – Flat configuration dictionary produced by vars(argparse.Namespace). Required keys: embedding_dim, num_entities, num_relations, learning_rate (or lr), optim, scoring_technique. A construction sketch is given below.
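A minimal, hypothetical construction sketch using only the required keys listed above; the key names come from the parameter description, the values are placeholders, and BaseKGE may expect further keys in practice (real runs pass the full vars(argparse.Namespace) dictionary):

    from dicee.models.transformers import BytE

    # Hypothetical flat configuration; values are placeholders.
    args = {
        "embedding_dim": 32,
        "num_entities": 135,
        "num_relations": 46,
        "learning_rate": 0.01,
        "optim": "Adam",
        "scoring_technique": "KvsAll",
    }
    model = BytE(args)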
- name = 'BytE'
- config
- temperature = 0.5
- topk = 2
- transformer
- lm_head
- loss_function(yhat_batch, y_batch)[source]
Compute the loss between model predictions and targets.
Delegates to self.loss, which is configured in BaseKGE.__init__ based on the scoring technique (BCEWithLogitsLoss for entity/relation prediction, CrossEntropyLoss for classification).
- Parameters:
yhat_batch (torch.FloatTensor) – Model output scores, shape (batch_size, *).
y_batch (torch.FloatTensor) – Ground-truth labels of the same shape as yhat_batch.
- Returns:
Scalar loss value.
- Return type:
torch.FloatTensor
- generate(idx, max_new_tokens, temperature=1.0, top_k=None)[source]
Take a conditioning sequence of indices idx (a LongTensor of shape (b, t)) and extend it with max_new_tokens new tokens, feeding each prediction back into the model at every step. You will most likely want to put the model in model.eval() mode before calling this.
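A hedged usage sketch, where model stands for an already-trained instance and the token indices are placeholders:

    import torch

    model.eval()  # disable dropout before sampling
    idx = torch.tensor([[1, 2, 3]], dtype=torch.long)  # (b, t) conditioning tokens
    with torch.no_grad():
        out = model.generate(idx, max_new_tokens=16, temperature=0.8, top_k=50)
    # out has shape (1, 3 + 16): the prompt followed by 16 generated tokens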
- training_step(batch, batch_idx=None)[source]
Execute one optimisation step for the given mini-batch.
Handles two- and three-element batches produced by the different dataset classes (KvsAll / NegSample vs. KvsSample); the two layouts are illustrated below.
- Parameters:
batch (tuple) – (x, y) for standard scoring, or (x, y_select, y) for sample-based labelling.
batch_idx (int, optional) – Index of the current batch (unused, kept for Lightning API compatibility).
- Returns:
Scalar loss value for this batch.
- Return type:
torch.FloatTensor
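The two batch layouts look as follows; the shapes and values are illustrative placeholders, not the real dataset output:

    import torch

    x = torch.randint(0, 135, (8, 2))         # (h, r) index pairs
    y = torch.rand(8, 135)                    # one label per entity (KvsAll / NegSample style)
    standard_batch = (x, y)                   # two-element batch

    y_select = torch.randint(0, 135, (8, 5))  # sampled candidate entities (KvsSample style)
    y_sub = torch.rand(8, 5)                  # labels for the sampled candidates
    sample_batch = (x, y_select, y_sub)       # three-element batch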
- class dicee.models.transformers.LayerNorm(ndim, bias)[source]
Bases: torch.nn.Module

LayerNorm but with an optional bias. PyTorch doesn't support simply bias=False.
- weight
- bias
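A sketch of how such a layer is typically written, following the nanoGPT pattern this file is modelled on (the actual forward() may differ in detail):

    import torch
    import torch.nn.functional as F
    from torch import nn

    class LayerNormSketch(nn.Module):
        """LayerNorm whose bias can be switched off entirely."""

        def __init__(self, ndim: int, bias: bool):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(ndim))
            self.bias = nn.Parameter(torch.zeros(ndim)) if bias else None

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # F.layer_norm accepts bias=None, which nn.LayerNorm historically did not
            return F.layer_norm(x, self.weight.shape, self.weight, self.bias, 1e-5)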
- class dicee.models.transformers.SelfAttention(config)[source]
Bases: torch.nn.Module

Multi-head self-attention with an optional causal mask; when flash is enabled, the computation can use PyTorch's fused attention kernel (sketched after the attribute list below).
- c_attn
- c_proj
- attn_dropout
- resid_dropout
- n_head
- n_embd
- dropout
- causal
- flash = True
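With flash enabled, the attention computation can be delegated to torch.nn.functional.scaled_dot_product_attention (available since PyTorch 2.0); a minimal sketch of the causal case with placeholder shapes:

    import torch
    import torch.nn.functional as F

    # (batch, n_head, seq_len, head_dim); placeholder sizes
    q = torch.randn(2, 12, 16, 64)
    k = torch.randn(2, 12, 16, 64)
    v = torch.randn(2, 12, 16, 64)

    # Fused attention with an implicit causal mask; mirrors causal=True above.
    y = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)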
- class dicee.models.transformers.MLP(config)[source]
Bases: torch.nn.Module

Position-wise feed-forward block: a linear expansion (c_fc), GELU activation (gelu), a linear projection back (c_proj), and dropout (dropout); see the sketch after the attribute list below.
- c_fc
- gelu
- c_proj
- dropout
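The attributes above suggest the standard GPT-2 feed-forward pass; a self-contained sketch, where the 4x expansion factor is the GPT-2 convention and an assumption here:

    import torch
    from torch import nn

    class MLPSketch(nn.Module):
        def __init__(self, n_embd: int = 768, dropout: float = 0.0, bias: bool = False):
            super().__init__()
            self.c_fc = nn.Linear(n_embd, 4 * n_embd, bias=bias)    # expand
            self.gelu = nn.GELU()                                   # non-linearity
            self.c_proj = nn.Linear(4 * n_embd, n_embd, bias=bias)  # project back
            self.dropout = nn.Dropout(dropout)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.dropout(self.c_proj(self.gelu(self.c_fc(x))))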
- class dicee.models.transformers.Block(config)[source]
Bases: torch.nn.Module

A single transformer block combining layer normalisation, self-attention, and the feed-forward MLP (ln_1, attn, ln_2, mlp below); the usual wiring is sketched after the attribute list.
- ln_1
- attn
- ln_2
- mlp
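The sub-modules above are typically wired in the pre-norm residual pattern used by GPT-2; a hedged sketch of the forward pass (the real implementation may differ):

    import torch
    from torch import nn

    def block_forward(block: nn.Module, x: torch.Tensor) -> torch.Tensor:
        x = x + block.attn(block.ln_1(x))  # attention sub-layer with residual
        x = x + block.mlp(block.ln_2(x))   # feed-forward sub-layer with residual
        return x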
- class dicee.models.transformers.GPTConfig[source]
- block_size: int = 1024
- vocab_size: int = 50304
- n_layer: int = 12
- n_head: int = 12
- n_embd: int = 768
- dropout: float = 0.0
- bias: bool = False
- causal: bool = True
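Assuming GPTConfig is a plain dataclass (as in nanoGPT), any subset of the fields can be overridden at construction time while the rest keep the defaults listed above:

    from dicee.models.transformers import GPTConfig

    # A deliberately small configuration for quick experiments.
    cfg = GPTConfig(block_size=256, n_layer=4, n_head=4, n_embd=128, dropout=0.1)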
- class dicee.models.transformers.GPT(config)[source]
Bases: torch.nn.Module

The full GPT language model: the transformer backbone (embeddings and a stack of Blocks) and a language-modelling head (transformer and lm_head below).
- config
- transformer
- lm_head
- get_num_params(non_embedding=True)[source]
Return the number of parameters in the model. For the non-embedding count (the default), the position embeddings are subtracted. The token embeddings would be subtracted too, except that weight tying reuses them as the final-layer weights, so they are included.
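A hedged sketch of that counting logic; transformer.wpe as the name of the position-embedding table is an assumption borrowed from the nanoGPT layout this file follows:

    def count_params(model, non_embedding: bool = True) -> int:
        n_params = sum(p.numel() for p in model.parameters())
        if non_embedding:
            # Position embeddings are used nowhere else, so drop them.
            # Token embeddings stay: weight tying reuses them in lm_head.
            n_params -= model.transformer.wpe.weight.numel()
        return n_params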