owlapy.agen_kg

AGen-KG module

Classes

AGenKG

Package Contents

class owlapy.agen_kg.AGenKG(model='gpt-4o', api_key='<YOUR_GITHUB_PAT>', api_base='https://models.github.ai/inference', temperature=0.1, seed=42, cache=False, enable_logging=False, max_tokens=4000)
logging = False
model = 'gpt-4o'
api_key = '<YOUR_GITHUB_PAT>'
api_base = 'https://models.github.ai/inference'
temperature = 0.1
seed = 42
cache = False
enable_logging = False
max_tokens = 4000
open_graph_extractor
domain_graph_extractor
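
The constructor defaults above carry a placeholder API key. As a minimal sketch, the keyword arguments could be assembled with the key read from an environment variable instead of being hard-coded. The helper name `agenkg_kwargs_from_env` and the variable name `GITHUB_PAT` are hypothetical and not part of owlapy; the remaining values simply mirror the defaults in the signature.

```python
import os


def agenkg_kwargs_from_env(env_var: str = "GITHUB_PAT") -> dict:
    """Collect AGenKG keyword arguments, taking the API key from the environment.

    Hypothetical helper: the environment variable name is an assumption.
    All other values repeat the defaults documented in the class signature.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {
        "model": "gpt-4o",
        "api_key": api_key,
        "api_base": "https://models.github.ai/inference",
        "temperature": 0.1,
        "seed": 42,
        "cache": False,
        "enable_logging": False,
        "max_tokens": 4000,
    }
```

With owlapy installed, the resulting dictionary could then be unpacked into the constructor, for example `AGenKG(**agenkg_kwargs_from_env())`.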
configure_chunking(chunk_size: int = None, overlap: int = None, strategy: str = None, auto_chunk_threshold: int = None, summarization_threshold: int = None, max_summary_length: int = None)

Configure text chunking settings for all extractors.

This method allows fine-tuning of how large texts are split into manageable pieces for LLM processing.

Parameters:
  • chunk_size – Maximum characters per chunk (default: 3000, ~750 tokens).

  • overlap – Characters to overlap between chunks (default: 200).

  • strategy – Chunking strategy: “sentence”, “paragraph”, or “fixed”.

  • auto_chunk_threshold – Character threshold for automatic chunking (default: 4000).

  • summarization_threshold – Character threshold for using summarization in clustering (default: 8000). When text exceeds this, summaries are generated to provide context for clustering operations.

  • max_summary_length – Maximum length of summaries used for clustering context (default: 3000, ~750 tokens).
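
To make the roles of chunk_size and overlap concrete, here is a minimal sketch of fixed-size chunking with overlapping windows. It illustrates the general technique only and is not owlapy's implementation; the function name `chunk_fixed` is hypothetical, and the defaults repeat the documented values.

```python
from typing import List


def chunk_fixed(text: str, chunk_size: int = 3000, overlap: int = 200) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, so content near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

A larger overlap reduces the chance of cutting a fact in half at a boundary, at the cost of sending more repeated text to the model.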

Example

# Configure for a model with smaller context window
agenkg.configure_chunking(chunk_size=2000, overlap=150, strategy="sentence")

# Configure summarization thresholds for large documents
agenkg.configure_chunking(summarization_threshold=10000, max_summary_length=4000)
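
The two thresholds described above can be read as a simple decision on input length. The sketch below assumes that a text is chunked, or summarized for clustering, only when its length strictly exceeds the corresponding threshold; the function name `plan_processing` is hypothetical, and the library's actual control flow may differ.

```python
def plan_processing(
    text_length: int,
    auto_chunk_threshold: int = 4000,
    summarization_threshold: int = 8000,
) -> dict:
    """Report which steps a text of the given character length would trigger.

    Assumption: a step applies only when the length strictly exceeds
    the relevant threshold (defaults taken from the parameter list).
    """
    return {
        "chunk": text_length > auto_chunk_threshold,
        "summarize": text_length > summarization_threshold,
    }
```

Under these assumptions a 5000-character document would be chunked but not summarized, while a 9000-character document would trigger both steps.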

configure_chunking_for_model(max_context_tokens: int, prompt_overhead_tokens: int = 1500)

Automatically configure chunking based on model specifications for all extractors.

Parameters:
  • max_context_tokens – Maximum context window of your model (e.g., 4096 for GPT-3.5).

  • prompt_overhead_tokens – Estimated tokens used by prompts/few-shot examples.

Example

# For GPT-3.5-turbo (4K context)
agenkg.configure_chunking_for_model(4096, 1000)

# For GPT-4 (8K context)
agenkg.configure_chunking_for_model(8192, 1500)
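
How the method turns a token budget into chunk settings is not spelled out here. As a rough sketch of the arithmetic involved, the function below subtracts the prompt overhead from the context window and converts the remainder to characters at about 4 characters per token, the ratio implied by "3000 characters, ~750 tokens" in configure_chunking. The name `estimate_chunk_size` and the formula itself are assumptions, not the library's documented behavior.

```python
CHARS_PER_TOKEN = 4  # assumption, consistent with "3000 characters, ~750 tokens"


def estimate_chunk_size(max_context_tokens: int, prompt_overhead_tokens: int = 1500) -> int:
    """Estimate an upper bound, in characters, on the text that fits in one request.

    Hypothetical formula: (context window - prompt overhead) * characters per token.
    """
    available_tokens = max_context_tokens - prompt_overhead_tokens
    if available_tokens <= 0:
        raise ValueError("prompt overhead leaves no room for input text")
    return available_tokens * CHARS_PER_TOKEN
```

This estimate ignores the tokens the model needs for its own output, so a practical configuration would likely choose a smaller chunk size than the bound returned here.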

generate_ontology(text, ontology_type='domain', query=None, **kwargs)

Generate an ontology from the provided text using the specified extractor type.

Parameters:
  • text – The input text from which to extract the ontology, or the path to a text file. Supported file formats: .txt, .docx, .pdf, .rtf, .html.

  • ontology_type – Type of ontology extraction: “domain” or “open”. “domain” stands for domain-specific ontology extraction. “open” stands for open-world ontology extraction (more general).

  • query – Specific instructions to the agent.

  • **kwargs – Additional parameters for customization.
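
Because the first argument may be either raw text or a file path, it can help to check inputs before calling the method. The sketch below validates the file extension against the supported formats listed above and checks the ontology_type value. Both helper names are hypothetical, and how the library itself tells a path from plain text is not documented here.

```python
from pathlib import Path

# File formats listed as supported in the description above.
SUPPORTED_SUFFIXES = {".txt", ".docx", ".pdf", ".rtf", ".html"}
# Values documented for the ontology_type parameter.
ONTOLOGY_TYPES = {"domain", "open"}


def has_supported_suffix(value: str) -> bool:
    """Return True if the string ends in one of the supported file extensions."""
    return Path(value).suffix.lower() in SUPPORTED_SUFFIXES


def check_ontology_type(ontology_type: str) -> str:
    """Return the value unchanged if it is documented, otherwise raise ValueError."""
    if ontology_type not in ONTOLOGY_TYPES:
        raise ValueError(f"ontology_type must be one of {sorted(ONTOLOGY_TYPES)}")
    return ontology_type
```

A caller might run these checks first and then pass the validated values on, for example `agenkg.generate_ontology("report.pdf", ontology_type="open")`.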