`its-hub`: A Python library for inference-time scaling

its_hub is a Python library for inference-time scaling of LLMs, focusing on mathematical reasoning tasks.

ITS Hub algorithms: Self-Consistency, Best-of-N, and Particle Filtering

📚 Documentation

For comprehensive documentation, including installation guides, tutorials, and API reference, visit:

https://ai-innovation.team/its_hub

Installation

its_hub provides a minimal core focused on algorithms, with optional language model implementations.

Core Installation (Algorithms Only)

For gateway integration: algorithms and interfaces only, with minimal dependencies.

pip install its_hub

This includes:

✓ Self-Consistency and Best-of-N algorithms
✓ Abstract base classes (AbstractLanguageModel, AbstractOutcomeRewardModel)
✓ Only 2 dependencies: numpy, typing-extensions

With Language Model Support

For standalone use: adds an OpenAI-compatible language model implementation.

pip install "its_hub[lm]"

Adds: OpenAICompatibleLanguageModel, LLMJudge, StepGeneration (requires openai, aiohttp, backoff)
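As a quick sanity check after installing, the sketch below reports which of the dependencies named above are importable in the current environment. It uses only the standard library's importlib.util.find_spec; the helper name missing_modules is hypothetical and not part of its_hub.

```python
from importlib.util import find_spec

# Dependencies the [lm] extra is stated to require (taken from the note above).
LM_EXTRA_MODULES = ("openai", "aiohttp", "backoff")


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(LM_EXTRA_MODULES)
    if missing:
        print("Missing optional dependencies:", ", ".join(missing))
    else:
        print("All [lm] dependencies are importable.")
```

An empty result suggests the extra's dependencies are present; it does not verify their versions.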

vLLM users: its_hub uses the max_completion_tokens parameter (the OpenAI API standard), which requires vLLM >= 0.6.2. We recommend vLLM >= 0.14.0.

With Experimental Algorithms

For experimental features: adds beam search and particle filtering.

pip install "its_hub[experimental]"

Adds: Process reward models, beam search, particle filtering algorithms

Development Installation

git clone https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
cd its_hub
pip install -e ".[dev]"
# or using uv:
uv sync --extra dev

Quick Start

Example 1: Gateway Integration (Core Installation)

Installation required: pip install its_hub (core only, minimal dependencies)

Gateway integration requires implementing two interfaces: AbstractLanguageModel for LM calls and AbstractOrchestrator for managing parallel execution with concurrency control and rate limiting.

import asyncio

from its_hub import AbstractLanguageModel, AbstractOrchestrator, SelfConsistency

# Step 1: Implement AbstractLanguageModel with your gateway's LM client
class MyGatewayLM(AbstractLanguageModel):
    def __init__(self, gateway_client):
        self.client = gateway_client

    async def agenerate_single(self, messages, stop=None, **kwargs):
        response = await self.client.generate(messages, stop=stop, **kwargs)
        return {"role": "assistant", "content": response}

# Step 2: Implement AbstractOrchestrator for concurrency control
# (or use the built-in LMOrchestrator from its_hub[lm])
class MyGatewayOrchestrator(AbstractOrchestrator):
    async def agenerate(self, lm, messages_lst, **kwargs):
        # Manage parallel calls with your gateway's rate limits
        ...

async def main():
    # your_gateway_client is a placeholder: pass in your gateway's async client
    lm = MyGatewayLM(your_gateway_client)
    orchestrator = MyGatewayOrchestrator()
    algorithm = SelfConsistency(orchestrator=orchestrator)
    result = await algorithm.ainfer(lm, "What is 2+2?", budget=5)
    print(result)  # {"role": "assistant", "content": "4", ...}

asyncio.run(main())

The AbstractOrchestrator is the central coordination point: it controls how algorithms fan out parallel LM calls, enforces rate limits, and provides structured error handling. See Orchestration for details.
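One common way to bound a fan-out of parallel calls is an asyncio.Semaphore wrapped around asyncio.gather. The sketch below is library-independent and is not the built-in orchestrator's implementation; bounded_gather and fake_lm_call are hypothetical names used only for illustration.

```python
import asyncio


async def bounded_gather(make_call, inputs, max_concurrency=4):
    """Run make_call(item) for every input, at most max_concurrency at a time.

    Results are returned in the same order as inputs.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item):
        async with semaphore:  # waits here while the limit is reached
            return await make_call(item)

    return await asyncio.gather(*(run_one(item) for item in inputs))


async def fake_lm_call(prompt):
    """Hypothetical stand-in for a gateway LM call."""
    await asyncio.sleep(0.01)
    return {"role": "assistant", "content": f"echo: {prompt}"}


def demo():
    prompts = [f"prompt {i}" for i in range(6)]
    return asyncio.run(bounded_gather(fake_lm_call, prompts, max_concurrency=2))
```

A custom agenerate method could delegate to a helper like this, with max_concurrency set from the gateway's rate limits. Retry and error handling are left out of the sketch.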

Example 2: Standalone Use with OpenAI-Compatible LM

Installation required: pip install "its_hub[lm]"

import asyncio

from its_hub import OpenAICompatibleLanguageModel, SelfConsistency

lm = OpenAICompatibleLanguageModel(
    endpoint="https://api.openai.com/v1",
    api_key="your-api-key",
    model_name="gpt-4o-mini",
)

algorithm = SelfConsistency()
result = algorithm.infer(lm, "What is the capital of France?", budget=3)
print(result)  # Most common answer from 3 generations

# Close lm for resource cleanup
asyncio.run(lm.close())
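The "most common answer" step can be pictured as a simple majority vote over the generated answers. The sketch below is a minimal, library-independent illustration of that idea, not the SelfConsistency implementation; majority_vote is a hypothetical helper.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(majority_vote(["Paris", "Paris", "Lyon"]))  # prints "Paris"
```

In practice, answers usually need normalizing (for example, extracting the final answer from each generation) before they are counted, since free-form text rarely matches exactly.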

Example 3: Best-of-N with LLM Judge

Installation required: pip install "its_hub[lm]"

import asyncio

from its_hub import BestOfN, LLMJudge, OpenAICompatibleLanguageModel
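Independently of the library's API, the selection idea behind Best-of-N can be sketched in plain Python: generate several candidates, score each one, and keep the highest-scoring candidate. The names best_of_n and toy_score are hypothetical, and the scoring function stands in for whatever judge or reward model is used; this is not the BestOfN or LLMJudge implementation.

```python
def best_of_n(candidates, score):
    """Return the candidate with the highest score (the first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def toy_score(text):
    """Hypothetical scorer: prefers answers that mention the expected digit."""
    return 1.0 if "4" in text else 0.0


if __name__ == "__main__":
    print(best_of_n(["five", "The answer is 4.", "4"], toy_score))
    # prints "The answer is 4."
```

With a real judge, score would typically call a model and return a numeric rating, which makes the number of candidates the main cost knob.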
