dicee.knowledge_graph_embeddings

Classes

KGE

Knowledge Graph Embedding Class for interactive usage of pre-trained models

Module Contents

class dicee.knowledge_graph_embeddings.KGE(path=None, url=None, construct_ensemble=False, model_name=None)[source]

Bases: dicee.abstracts.BaseInteractiveKGE, dicee.abstracts.InteractiveQueryDecomposition, dicee.abstracts.BaseInteractiveTrainKGE

Knowledge Graph Embedding Class for interactive usage of pre-trained models

__str__()[source]
to(device: str) None[source]
get_transductive_entity_embeddings(indices: torch.LongTensor | List[str], as_pytorch=False, as_numpy=False, as_list=True) torch.FloatTensor | numpy.ndarray | List[float][source]
create_vector_database(collection_name: str, distance: str, location: str = 'localhost', port: int = 6333)[source]
generate(h='', r='')[source]
eval_lp_performance(dataset: List[Tuple[str, str, str]], filtered=True)[source]
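
Examples

Illustrative sketch of the interactive accessors above; the model directory path and entity names are placeholders, not shipped with the library:

>>> model = KGE(path="path/to/pretrained_model_dir")
>>> model.to("cpu")
>>> # Retrieve stored embeddings for known entities as a numpy array
>>> vecs = model.get_transductive_entity_embeddings(["Mongolia", "Asia"], as_numpy=True, as_list=False)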
predict_missing_head_entity(relation: List[str] | str, tail_entity: List[str] | str, within=None, batch_size=2, topk=1, return_indices=False) Tuple[source]

Given a relation and a tail entity, return the top-k ranked head entities.

argmax_{e in E } f(e,r,t), where r in R, t in E.

Parameters

relation: Union[List[str], str]

String representation of selected relations.

tail_entity: Union[List[str], str]

String representation of selected entities.

topk: int

Number of top-ranked head entities to return.

within: List[str]

Optional list of entities to restrict predictions to.

Returns: Tuple

Top-k scores and the corresponding head entities.
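
Examples

Illustrative usage; entity and relation names are placeholders from a hypothetical knowledge graph:

>>> scores, entities = model.predict_missing_head_entity(relation="isLocatedIn", tail_entity="Asia", topk=3)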

predict_missing_relations(head_entity: List[str] | str, tail_entity: List[str] | str, within=None, batch_size=2, topk=1, return_indices=False) Tuple[source]

Given a head entity and a tail entity, return the top-k ranked relations.

argmax_{r in R } f(h,r,t), where h, t in E.

Parameters

head_entity: Union[List[str], str]

String representation of selected entities.

tail_entity: Union[List[str], str]

String representation of selected entities.

topk: int

Number of top-ranked relations to return.

within: List[str]

Optional list of relations to restrict predictions to.

Returns: Tuple

Top-k scores and the corresponding relations.
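
Examples

Illustrative usage; entity names are placeholders from a hypothetical knowledge graph:

>>> scores, relations = model.predict_missing_relations(head_entity="Mongolia", tail_entity="Asia", topk=3)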

predict_missing_tail_entity(head_entity: List[str] | str, relation: List[str] | str, within: List[str] = None, batch_size=2, topk=1, return_indices=False) torch.FloatTensor[source]

Given a head entity and a relation, return the top-k ranked tail entities.

argmax_{e in E } f(h,r,e), where h in E and r in R.

Parameters

head_entity: Union[List[str], str]

String representation of selected entities.

relation: Union[List[str], str]

String representation of selected relations.

topk: int

Number of top-ranked tail entities to return.

Returns: torch.FloatTensor

Scores of the top-ranked tail entities.
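
Examples

Illustrative usage; entity and relation names are placeholders:

>>> scores = model.predict_missing_tail_entity(head_entity=["Mongolia"], relation=["isLocatedIn"], topk=3)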

predict(*, h: List[str] | str | None = None, r: List[str] | str | None = None, t: List[str] | str | None = None, within: List[str] | None = None, logits: bool = True) torch.FloatTensor[source]

Predict scores for triples or missing triple elements.

Parameters:
  • h – Head entity/entities. None to predict heads.

  • r – Relation/relations. None to predict relations.

  • t – Tail entity/entities. None to predict tails.

  • within – Optional list of entities to restrict predictions to.

  • logits – If True, return raw scores. If False, return sigmoid scores (0-1).

Returns:

  • Single triple (h, r, t): scalar score

  • Missing element: vector of all possible scores

Return type:

torch.FloatTensor of scores. Shape depends on the query type.

Raises:

AssertionError – If inputs are not strings or lists of strings.

Examples

>>> # Score a specific triple
>>> model.predict(h="Mongolia", r="isLocatedIn", t="Asia", logits=False)
tensor(0.9523)
>>> # Get scores for all possible tail entities
>>> model.predict(h="Mongolia", r="isLocatedIn", t=None)
tensor([0.21, 0.95, 0.03, ...])  # One score per entity
predict_topk(*, h: str | List[str] | None = None, r: str | List[str] | None = None, t: str | List[str] | None = None, topk: int = 10, within: List[str] | None = None, batch_size: int = 1024) List[Tuple[str, float]] | List[List[Tuple[str, float]]][source]

Predict top-k missing items in a given triple pattern.

Parameters:
  • h – Head entity/entities. None to predict heads.

  • r – Relation/relations. None to predict relations.

  • t – Tail entity/entities. None to predict tails.

  • topk – Number of top predictions to return.

  • within – Optional list of entities to restrict predictions to.

  • batch_size – Batch size for processing multiple queries.

Returns:

List[(item, score), …] of length topk. For batch query: List of such lists, one per query.

Return type:

List[Tuple[str, float]] for a single query; List[List[Tuple[str, float]]] for a batch of queries.

Raises:
  • AssertionError – If more than one of h, r, t is None.

  • AssertionError – If the required arguments for a query type are None.

Examples

>>> model.predict_topk(h=["Mongolia"], r=["isLocatedIn"], topk=3)
[('Asia', 0.99), ('Europe', 0.02), ...]
>>> model.predict_topk(r=["isLocatedIn"], t=["Asia"], topk=5)
[('Mongolia', 0.85), ('China', 0.82), ...]
triple_score(h: List[str] | str = None, r: List[str] | str = None, t: List[str] | str = None, logits=False) torch.FloatTensor[source]

Predict the score of a triple.

Parameters

h: Union[List[str], str]

String representation of selected head entities.

r: Union[List[str], str]

String representation of selected relations.

t: Union[List[str], str]

String representation of selected tail entities.

logits: bool

If True, the unnormalized score is returned; otherwise the sigmoid of the score.

Returns: torch.FloatTensor

PyTorch tensor of triple scores.
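
Examples

Illustrative usage; the triple is a placeholder from a hypothetical knowledge graph:

>>> model.triple_score(h=["Mongolia"], r=["isLocatedIn"], t=["Asia"], logits=False)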

return_multi_hop_query_results(aggregated_query_for_all_entities, k: int, only_scores)[source]
single_hop_query_answering(query: tuple, only_scores: bool = True, k: int = None, use_logits: bool = True)[source]
answer_multi_hop_query(query_type: str | None = None, query: Tuple[str | Tuple[str, str], Ellipsis] | None = None, queries: List[Tuple[str | Tuple[str, str], Ellipsis]] | None = None, tnorm: str = 'prod', neg_norm: str = 'standard', lambda_: float = 0.0, k: int = 10, only_scores: bool = False, use_logits: bool = True) List[Tuple[str, torch.Tensor]] | List[List[Tuple[str, torch.Tensor]]][source]

Answer multi-hop EPFO (Existential Positive First-Order) queries.

Supports 9 query types: 1p, 2p, 3p, 2i, 3i, ip, pi, 2u, up. See docs/guides/multi_hop_queries.md for detailed query patterns.

Parameters:
  • query_type – Query pattern name. One of:
      - 1p: (e, (r,)) # One-hop
      - 2p: (e, (r1, r2)) # Two-hop
      - 3p: (e, (r1, r2, r3)) # Three-hop
      - 2i: ((e1, (r1,)), (e2, (r2,))) # Two-way intersection
      - 3i: ((e1, (r1,)), (e2, (r2,)), (e3, (r3,))) # Three-way intersection
      - ip: (((e1, (r1,)), (e2, (r2,))), (r3,)) # Intersection + projection
      - pi: ((e, (r1, r2)), (r3,)) # Projection + intersection (2i meets 2p)
      - 2u: ((e1, (r1,)), (e2, (r2,))) # Two-way union
      - up: ((e, (r1, r2)), (e, (r3,))) # Union + projection

  • query – Single query tuple matching the query_type pattern.

  • queries – Batch of queries. If provided, query must be None.

  • tnorm – T-norm for intersection/union. Options: “prod”, “min”.

  • neg_norm – Negation norm. Options: “standard”, “sugeno”, “yager”.

  • lambda_ – Parameter for sugeno and yager negation (0.0-1.0).

  • k – Number of top answer entities to return.

  • only_scores – If True, return only scores tensor. If False, return (entity, score) tuples.

  • use_logits – If True, use raw model logits. If False, use sigmoid probabilities.

Returns:

List[(entity, score), …] of top-k answers. For batch queries: List of such lists, one per query.

Return type:

List[Tuple[str, torch.Tensor]] for a single query; List[List[Tuple[str, torch.Tensor]]] for a batch of queries.

Raises:
  • ValueError – If query_type is not in {1p, 2p, 3p, 2i, 3i, ip, pi, 2u, up}.

  • AssertionError – If query structure doesn’t match query_type pattern.

Examples

>>> # 1p: Find entities located in Asia
>>> model.answer_multi_hop_query(
...     query_type="1p",
...     query=("Asia", ("isLocatedIn",)),
...     k=5
... )
[("Mongolia", 0.92), ("China", 0.89), ...]
>>> # 2p: Two-hop query (e.g., "capital of countries in Europe")
>>> model.answer_multi_hop_query(
...     query_type="2p",
...     query=("Europe", ("isLocatedIn", "hasCapital")),
...     k=3
... )
[("Paris", 0.85), ("Berlin", 0.82), ...]
>>> # 2i: Intersection query
>>> model.answer_multi_hop_query(
...     query_type="2i",
...     query=(("Asia", ("isLocatedIn",)), ("Mountains", ("hasGeography",))),
...     k=5
... )
[("Nepal", 0.78), ("Tibet", 0.65), ...]

See also

  • docs/guides/multi_hop_queries.md: Complete guide with all query patterns

  • tests/test_answer_multi_hop_query.py: Usage examples

find_missing_triples(confidence: float, entities: List[str] = None, relations: List[str] = None, topk: int = 10, at_most: int = sys.maxsize) Set[source]

Find missing triples.

Iterate over a set of entities E and a set of relations R:

for all e in E and for all r in R, compute f(e,r,x)

and return {(e,r,x) | f(e,r,x) > confidence and (e,r,x) not in G}.

Parameters

confidence: float

A threshold on the sigmoid output of the model for a triple.

entities: List[str]

Optional subset of entities to iterate over.

relations: List[str]

Optional subset of relations to iterate over.

topk: int

Number of highest-ranked candidates per (e, r) pair to test against f(e,r,x) > confidence.

at_most: int

Stop after finding at_most missing triples.

Returns: Set

{(e,r,x) | f(e,r,x) > confidence and (e,r,x) not in G}
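
Examples

Illustrative usage; entity and relation names are placeholders. Restricting the entity and relation sets keeps the scan tractable on large graphs:

>>> missing = model.find_missing_triples(confidence=0.95, entities=["Mongolia", "China"], relations=["isLocatedIn"], topk=5)
>>> # A set of (head, relation, tail) string triples absent from the training graph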

predict_literals(entity: List[str] | str = None, attribute: List[str] | str = None, denormalize_preds: bool = True) numpy.ndarray[source]

Predicts literal values for given entities and attributes.

Parameters:
  • entity (Union[List[str], str]) – Entity or list of entities to predict literals for.

  • attribute (Union[List[str], str]) – Attribute or list of attributes to predict literals for.

  • denormalize_preds (bool) – If True, denormalizes the predictions.

Returns:

Predictions for the given entities and attributes.

Return type:

numpy.ndarray
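
Examples

Illustrative usage; the entity and the attribute name "population" are placeholders and must exist in the trained literal model:

>>> preds = model.predict_literals(entity=["Mongolia"], attribute=["population"])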