owlapy.agen_kg
AGen-KG module
Package Contents
- class owlapy.agen_kg.AGenKG(model='gpt-4o', api_key='<YOUR_GITHUB_PAT>', api_base='https://models.github.ai/inference', temperature=0.1, seed=42, cache=False, enable_logging=False, max_tokens=4000)
- logging = False
- model = 'gpt-4o'
- api_key = '<YOUR_GITHUB_PAT>'
- api_base = 'https://models.github.ai/inference'
- temperature = 0.1
- seed = 42
- cache = False
- enable_logging = False
- max_tokens = 4000
- open_graph_extractor
- domain_graph_extractor
- configure_chunking(chunk_size: int = None, overlap: int = None, strategy: str = None, auto_chunk_threshold: int = None, summarization_threshold: int = None, max_summary_length: int = None)
Configure text chunking settings for all extractors.
This method allows fine-tuning of how large texts are split into manageable pieces for LLM processing.
- Parameters:
chunk_size – Maximum characters per chunk (default: 3000, ~750 tokens).
overlap – Characters to overlap between chunks (default: 200).
strategy – Chunking strategy - “sentence”, “paragraph”, or “fixed”.
auto_chunk_threshold – Character threshold for automatic chunking (default: 4000).
summarization_threshold – Character threshold for using summarization in clustering (default: 8000). When text exceeds this, summaries are generated to provide context for clustering operations.
max_summary_length – Maximum length of summaries used for clustering context (default: 3000, ~750 tokens).
Example
# Configure for a model with a smaller context window
agenkg.configure_chunking(chunk_size=2000, overlap=150, strategy="sentence")
# Configure summarization thresholds for large documents
agenkg.configure_chunking(summarization_threshold=10000, max_summary_length=4000)
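To make the roles of chunk_size, overlap, and auto_chunk_threshold concrete, here is a minimal sketch of the "fixed" strategy. It is not the library's implementation: the helper names chunk_fixed and maybe_chunk are hypothetical, and the rule that texts at or below the threshold are left unsplit is an assumption.

```python
def chunk_fixed(text: str, chunk_size: int = 3000, overlap: int = 200) -> list[str]:
    """Split text into character windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters. Hypothetical helper,
    shown only to illustrate the parameters documented above.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def maybe_chunk(text: str, chunk_size: int = 3000, overlap: int = 200,
                auto_chunk_threshold: int = 4000) -> list[str]:
    """Chunk only when the text is longer than the threshold (assumed rule)."""
    if len(text) <= auto_chunk_threshold:
        return [text]
    return chunk_fixed(text, chunk_size, overlap)
```

With the documented defaults, a 5000-character text would be split into two windows that share 200 characters, while a 3500-character text would be passed through unsplit under the assumed threshold rule.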
- configure_chunking_for_model(max_context_tokens: int, prompt_overhead_tokens: int = 1500)
Automatically configure chunking based on model specifications for all extractors.
- Parameters:
max_context_tokens – Maximum context window of your model (e.g., 4096 for GPT-3.5).
prompt_overhead_tokens – Estimated tokens used by prompts/few-shot examples.
Example
# For GPT-3.5-turbo (4K context)
agenkg.configure_chunking_for_model(4096, 1000)
# For GPT-4 (8K context)
agenkg.configure_chunking_for_model(8192, 1500)
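The arithmetic behind a token-based configuration can be sketched as follows. This is an illustration under stated assumptions, not the method's actual logic: the function name chunk_size_for_model is hypothetical, the ratio of 4 characters per token is inferred from the "3000 characters, ~750 tokens" figure above, and the real method may reserve additional room (for example, for the model's output).

```python
CHARS_PER_TOKEN = 4  # assumption: same ratio as "3000 chars, ~750 tokens"


def chunk_size_for_model(max_context_tokens: int,
                         prompt_overhead_tokens: int = 1500) -> int:
    """Estimate a character budget per chunk from a model's context window.

    Hypothetical helper: subtract the prompt overhead from the context
    window, then convert the remaining tokens to characters.
    """
    available_tokens = max_context_tokens - prompt_overhead_tokens
    if available_tokens <= 0:
        raise ValueError("prompt overhead leaves no room for input text")
    return available_tokens * CHARS_PER_TOKEN
```

Under these assumptions, a 4096-token window with 1000 tokens of overhead leaves 3096 tokens, or roughly 12384 characters, for each chunk.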
- generate_ontology(text, ontology_type='domain', query=None, **kwargs)
Generate an ontology from the provided text using the specified extractor type.
- Parameters:
text – The input text from which to extract the ontology, or the path to a text file. Supported file formats: .txt, .docx, .pdf, .rtf, .html.
ontology_type – Type of ontology extraction - “domain” or “open”. “domain” stands for domain-specific ontology extraction. “open” stands for open-world ontology extraction (more general).
query – Specific instructions to the agent.
**kwargs – Additional parameters for customization.
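Because text may be either raw content or a file path, a caller may want to check inputs before passing them on. The sketch below shows one plausible way to do that. Both helpers (classify_input and check_ontology_type) are hypothetical and only mirror the values documented above; how the library itself distinguishes paths from raw text is not stated here.

```python
from pathlib import Path

# Formats and extractor types listed in the documentation above.
SUPPORTED_SUFFIXES = {".txt", ".docx", ".pdf", ".rtf", ".html"}
ONTOLOGY_TYPES = {"domain", "open"}


def classify_input(text: str) -> str:
    """Return "file" if the input looks like a supported file path, else "text".

    Heuristic assumption: a single-line string whose suffix is one of the
    supported formats is treated as a path. File existence is not checked.
    """
    candidate = text.strip()
    if "\n" in candidate:
        return "text"  # multi-line input is treated as raw text
    suffix = Path(candidate).suffix.lower()
    return "file" if suffix in SUPPORTED_SUFFIXES else "text"


def check_ontology_type(ontology_type: str) -> str:
    """Validate the extractor type against the two documented values."""
    if ontology_type not in ONTOLOGY_TYPES:
        raise ValueError(f'ontology_type must be "domain" or "open", got {ontology_type!r}')
    return ontology_type
```

A single-line string such as "see report.pdf" would also be classified as a file by this heuristic, so a stricter check (for example, testing whether the path exists) may be preferable in practice.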