# Intelligence Service — Overview & Startup

## Responsibilities
The Python Intelligence service is the cognitive layer of OpenTier. It handles all operations that require ML inference or complex data retrieval:
- Chat message processing (RAG-augmented generation)
- Token streaming (gRPC server-streaming → Rust SSE bridge)
- Document ingestion (scraping, chunking, embedding, storage)
- Hybrid semantic + keyword search
- User memory persistence
- LLM client abstraction (OpenAI-compatible + Google GenAI)
- Health and readiness probing
It exposes no HTTP endpoints — its only interface is the gRPC server on port 50051.
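The token-streaming path (gRPC server-streaming relayed as SSE by the Rust side) can be sketched as a plain generator. The `data:` framing follows the SSE format; the `[DONE]` sentinel is an illustrative assumption, not the service's actual wire protocol:

```python
from typing import Iterable, Iterator

def sse_events(token_stream: Iterable[str]) -> Iterator[str]:
    """Frame streamed LLM tokens as Server-Sent Events lines,
    roughly as an SSE bridge would emit them (sketch)."""
    for token in token_stream:
        yield f"data: {token}\n\n"  # one SSE event per streamed token
    yield "data: [DONE]\n\n"        # assumed end-of-stream sentinel

frames = list(sse_events(["Hel", "lo"]))
```

On the real path, `token_stream` would be the chunks yielded by the gRPC server-streaming RPC rather than an in-memory list.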
## Technology Stack
| Category | Package | Version |
|---|---|---|
| gRPC | grpcio + grpcio-tools | ≥ 1.76.0 |
| Protobuf | protobuf | ≥ 6.33.3 |
| DB async driver | asyncpg | ≥ 0.30.0 |
| DB sync driver | psycopg[binary] | ≥ 3.3.2 |
| ORM | sqlalchemy | ≥ 2.0.45 |
| Vector DB | pgvector | ≥ 0.4.2 |
| Embeddings | sentence-transformers | ≥ 5.2.2 |
| Deep learning | torch | ≥ 2.10.0 |
| Numerical | numpy | ≥ 2.4.1 |
| LLM (primary) | google-genai | ≥ 1.63.0 |
| HTTP client | httpx | ≥ 0.28.1 |
| Web scraping | playwright + beautifulsoup4 + lxml | ≥ 1.48.0 |
| GitHub client | pygithub | ≥ 2.1.0 |
| Validation | pydantic + pydantic-settings | ≥ 2.12.5 |
| Retry | tenacity | ≥ 8.2.3 |
| Markdown | markdown | ≥ 3.5.0 |
| Python | — | ≥ 3.14 |
## Startup Sequence

### Configuration
Loaded from environment via pydantic-settings with typed validation:
#### Core Config

| Variable | Default | Type |
|---|---|---|
| ENVIRONMENT | development | str |
| LOG_LEVEL | INFO | str |
| GRPC_PORT | 50051 | int |
#### Database (DB_ prefix)

| Variable | Default |
|---|---|
| DB_URL | postgresql://postgres:postgres@localhost:5432/opentier |
| DB_POOL_SIZE | 10 |
| DB_MAX_OVERFLOW | 20 |
| DB_POOL_TIMEOUT | 30 |
| DB_POOL_RECYCLE | 3600 |
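The prefix convention can be illustrated with a stdlib-only sketch of what pydantic-settings does here: read `DB_*` variables and coerce them to the annotated types. `DBSettings` and `load_db_settings` below are hypothetical stand-ins; the real service gets this (plus validation) from `BaseSettings`:

```python
import os
from dataclasses import dataclass, fields

@dataclass
class DBSettings:
    # Field names map to DB_-prefixed environment variables.
    url: str = "postgresql://postgres:postgres@localhost:5432/opentier"
    pool_size: int = 10
    max_overflow: int = 20
    pool_timeout: int = 30
    pool_recycle: int = 3600

def load_db_settings(env=os.environ, prefix="DB_") -> DBSettings:
    """Read DB_* variables and coerce them to the annotated types."""
    kwargs = {}
    for f in fields(DBSettings):
        raw = env.get(prefix + f.name.upper())
        if raw is not None:
            kwargs[f.name] = f.type(raw)  # e.g. int("25") -> 25
    return DBSettings(**kwargs)

cfg = load_db_settings({"DB_POOL_SIZE": "25"})
# cfg.pool_size is now 25; unset variables keep their defaults
```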
#### Embedding (EMBEDDING_ prefix)

| Variable | Default |
|---|---|
| EMBEDDING_MODEL_NAME | sentence-transformers/all-MiniLM-L6-v2 |
| EMBEDDING_DIMENSIONS | 384 |
| EMBEDDING_BATCH_SIZE | 32 |
| EMBEDDING_DEVICE | None (auto: CUDA if available, else CPU) |
| EMBEDDING_NORMALIZE | True |
| EMBEDDING_CACHE_SIZE | 10000 |
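`EMBEDDING_CACHE_SIZE` suggests an LRU cache in front of the encoder. A minimal sketch — the placeholder `embed` below stands in for a sentence-transformers `encode()` call and is not the service's real code:

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)  # mirrors EMBEDDING_CACHE_SIZE
def embed(text: str) -> tuple[float, ...]:
    # Placeholder vector; a real implementation would run the loaded
    # SentenceTransformer and return a 384-dim embedding.
    return tuple(float(ord(c)) for c in text[:4])

embed("hello")
embed("hello")  # second call is served from the cache
```

Returning a tuple (rather than a list or array) keeps the result hashable and safe to share between cache hits.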
#### LLM (LLM_ prefix)

| Variable | Default |
|---|---|
| LLM_PROVIDER | openai |
| LLM_API_KEY | "" |
| LLM_MODEL | gpt-4o |
| LLM_BASE_URL | https://api.openai.com/v1 |
| LLM_TEMPERATURE | 0.7 |
| LLM_MAX_TOKENS | 1000 |
#### Ingestion (INGESTION_ prefix)

| Variable | Default |
|---|---|
| INGESTION_CHUNK_SIZE | 512 |
| INGESTION_CHUNK_OVERLAP | 50 |
| INGESTION_MAX_BATCH_SIZE | 100 |
| INGESTION_AUTO_CLEAN | True |
| INGESTION_CLEANING_STRATEGY | standard |
| INGESTION_GENERATE_EMBEDDINGS | True |
| INGESTION_MAX_CONTENT_LENGTH | 1_000_000 |
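The chunking defaults (512-token chunks, 50-token overlap) imply a sliding window. A sketch under the assumption that chunking is a plain overlapping split — the real pipeline may also respect sentence or section boundaries:

```python
def chunk(tokens: list, size: int = 512, overlap: int = 50) -> list[list]:
    """Split a token sequence into windows of `size` tokens,
    each sharing `overlap` tokens with its predecessor (sketch)."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final window already covers the tail
    return chunks
```

With the defaults, a 1000-token document yields three chunks covering tokens 0–511, 462–973, and 924–999.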
### Graceful Shutdown

On SIGTERM or SIGINT:

1. `server.stop(grace=30)` — stops accepting new connections, waits up to 30 seconds for in-flight RPCs to complete
2. `Lifecycle.shutdown()` — calls `close_db()` to drain the connection pool
3. Process exits
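This shutdown sequence can be sketched with asyncio signal handlers. `FakeServer` and `close_db` are stand-ins for the real `grpc.aio` server and lifecycle hook, and the sketch is POSIX-only since it relies on `loop.add_signal_handler`:

```python
import asyncio
import os
import signal

class FakeServer:
    """Stand-in for grpc.aio.Server, recording the grace period."""
    def __init__(self):
        self.grace = None
    async def stop(self, grace):
        self.grace = grace  # the real server drains in-flight RPCs here

async def close_db():
    pass  # Lifecycle.shutdown(): drain the connection pool

async def serve() -> FakeServer:
    server = FakeServer()
    stopping = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stopping.set)
    # Simulate an external SIGTERM arriving shortly after startup.
    loop.call_later(0.01, os.kill, os.getpid(), signal.SIGTERM)
    await stopping.wait()
    await server.stop(grace=30)  # step 1: stop + 30 s drain window
    await close_db()             # step 2: release DB resources
    return server                # step 3: process would exit here

server = asyncio.run(serve())
```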
### Readiness Probe

`Health.Ready` RPC returns:

```python
ReadyCheckResponse(
    ready=db_healthy and embedding_model_loaded,
    dependencies=["database", "embedding_model"],
    dependency_status={
        "database": db_health_check(),        # SELECT 1
        "embedding_model": model_is_loaded,   # in-memory flag
    },
)
```
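The aggregation rule is simply the conjunction of the per-dependency checks. A plain-dict sketch — field names follow the response above, but this is not the generated protobuf class:

```python
def ready_check(db_healthy: bool, model_loaded: bool) -> dict:
    """The service is ready only when every dependency is healthy."""
    status = {"database": db_healthy, "embedding_model": model_loaded}
    return {
        "ready": all(status.values()),
        "dependencies": list(status),
        "dependency_status": status,
    }
```

A failed `SELECT 1` flips `database` to False, so the whole probe reports not-ready even if the embedding model is loaded.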