
Intelligence Service — Overview & Startup

Responsibilities

The Python Intelligence service is the cognitive layer of OpenTier. It handles all operations that require ML inference or complex data retrieval:

  • Chat message processing (RAG-augmented generation)
  • Token streaming (gRPC server-streaming → Rust SSE bridge)
  • Document ingestion (scraping, chunking, embedding, storage)
  • Hybrid semantic + keyword search
  • User memory persistence
  • LLM client abstraction (OpenAI-compatible + Google GenAI)
  • Health and readiness probing

It exposes no HTTP endpoints — its only interface is the gRPC server on port 50051.
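The token-streaming responsibility can be sketched as an async generator on the Python side: each yielded chunk becomes one message on the gRPC server stream, which the Rust gateway re-emits as an SSE event. The names below (`generate_tokens`, `stream_chat`) are illustrative, not the service's actual identifiers, and the LLM call is stubbed:

```python
import asyncio
from typing import AsyncIterator

# Hypothetical stand-in for the LLM client's streaming call.
async def generate_tokens(prompt: str) -> AsyncIterator[str]:
    for token in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield token

# Shape of a gRPC server-streaming handler: each yielded chunk is sent
# to the client as soon as it is produced, not buffered until the end.
async def stream_chat(prompt: str) -> list[str]:
    chunks = []
    async for token in generate_tokens(prompt):
        chunks.append(token)
    return chunks

print(asyncio.run(stream_chat("hi")))  # → ['Hello', ', ', 'world', '!']
```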

Technology Stack

| Category | Package | Version |
| --- | --- | --- |
| gRPC | grpcio + grpcio-tools | ≥ 1.76.0 |
| Protobuf | protobuf | ≥ 6.33.3 |
| DB async driver | asyncpg | ≥ 0.30.0 |
| DB sync driver | psycopg[binary] | ≥ 3.3.2 |
| ORM | sqlalchemy | ≥ 2.0.45 |
| Vector DB | pgvector | ≥ 0.4.2 |
| Embeddings | sentence-transformers | ≥ 5.2.2 |
| Deep learning | torch | ≥ 2.10.0 |
| Numerical | numpy | ≥ 2.4.1 |
| LLM (primary) | google-genai | ≥ 1.63.0 |
| HTTP client | httpx | ≥ 0.28.1 |
| Web scraping | playwright + beautifulsoup4 + lxml | ≥ 1.48.0 |
| GitHub client | pygithub | ≥ 2.1.0 |
| Validation | pydantic + pydantic-settings | ≥ 2.12.5 |
| Retry | tenacity | ≥ 8.2.3 |
| Markdown | markdown | ≥ 3.5.0 |
| Runtime | Python | ≥ 3.14 |

Startup Sequence

Configuration

Loaded from environment via pydantic-settings with typed validation:

Core Config

| Variable | Default | Type |
| --- | --- | --- |
| ENVIRONMENT | development | str |
| LOG_LEVEL | INFO | str |
| GRPC_PORT | 50051 | int |

Database (DB_ prefix)

| Variable | Default |
| --- | --- |
| DB_URL | postgresql://postgres:postgres@localhost:5432/opentier |
| DB_POOL_SIZE | 10 |
| DB_MAX_OVERFLOW | 20 |
| DB_POOL_TIMEOUT | 30 |
| DB_POOL_RECYCLE | 3600 |

Embedding (EMBEDDING_ prefix)

| Variable | Default |
| --- | --- |
| EMBEDDING_MODEL_NAME | sentence-transformers/all-MiniLM-L6-v2 |
| EMBEDDING_DIMENSIONS | 384 |
| EMBEDDING_BATCH_SIZE | 32 |
| EMBEDDING_DEVICE | None (auto: CUDA if available, else CPU) |
| EMBEDDING_NORMALIZE | True |
| EMBEDDING_CACHE_SIZE | 10000 |
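The cache and normalization settings suggest a pattern like the following sketch. The model call is stubbed with a fake 3-dimensional vector, and `embed_cached` is a hypothetical name, not the service's real API:

```python
import math
from functools import lru_cache

def _model_encode(text: str) -> list[float]:
    # Stub standing in for SentenceTransformer.encode on a single text.
    return [float(len(text)), 1.0, 2.0]  # fake vector for illustration

def _normalize(vec: list[float]) -> list[float]:
    # EMBEDDING_NORMALIZE=True: scale to unit length so cosine
    # similarity reduces to a dot product.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

# EMBEDDING_CACHE_SIZE=10000: repeated texts skip model inference.
@lru_cache(maxsize=10000)
def embed_cached(text: str) -> tuple[float, ...]:
    return tuple(_normalize(_model_encode(text)))

v = embed_cached("the same text")
w = embed_cached("the same text")  # served from cache, no model call
assert v == w
```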

LLM (LLM_ prefix)

| Variable | Default |
| --- | --- |
| LLM_PROVIDER | openai |
| LLM_API_KEY | "" |
| LLM_MODEL | gpt-4o |
| LLM_BASE_URL | https://api.openai.com/v1 |
| LLM_TEMPERATURE | 0.7 |
| LLM_MAX_TOKENS | 1000 |
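The "LLM client abstraction" responsibility combined with LLM_PROVIDER suggests a factory over a common interface, roughly like this sketch. Every class, method, and parameter name here is an assumption, and the network call is stubbed:

```python
from abc import ABC, abstractmethod

class LLMClient(ABC):
    """Hypothetical common interface over OpenAI-compatible and google-genai backends."""
    @abstractmethod
    def complete(self, prompt: str, *, temperature: float = 0.7,
                 max_tokens: int = 1000) -> str: ...

class OpenAICompatibleClient(LLMClient):
    def __init__(self, api_key: str, base_url: str, model: str):
        self.api_key, self.base_url, self.model = api_key, base_url, model

    def complete(self, prompt: str, *, temperature: float = 0.7,
                 max_tokens: int = 1000) -> str:
        # A real implementation would POST to {base_url}/chat/completions
        # via httpx; stubbed here so the sketch is self-contained.
        return f"[{self.model}] stubbed reply"

def make_client(provider: str, **kwargs) -> LLMClient:
    # LLM_PROVIDER selects the backend; "google" would map to google-genai.
    if provider == "openai":
        return OpenAICompatibleClient(**kwargs)
    raise ValueError(f"unknown provider: {provider}")
```

The point of the abstraction is that RAG and chat code depend only on `LLMClient`, so swapping LLM_PROVIDER never touches the call sites.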

Ingestion (INGESTION_ prefix)

| Variable | Default |
| --- | --- |
| INGESTION_CHUNK_SIZE | 512 |
| INGESTION_CHUNK_OVERLAP | 50 |
| INGESTION_MAX_BATCH_SIZE | 100 |
| INGESTION_AUTO_CLEAN | True |
| INGESTION_CLEANING_STRATEGY | standard |
| INGESTION_GENERATE_EMBEDDINGS | True |
| INGESTION_MAX_CONTENT_LENGTH | 1_000_000 |
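The chunk size and overlap parameters imply a sliding-window splitter roughly like the following; the function name is hypothetical and the real chunker may split on token or sentence boundaries rather than raw characters:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size units; consecutive chunks
    share `overlap` units, so content cut at a boundary still appears
    intact in at least one chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # 512 - 50 = 462 with the defaults
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Small numbers to make the overlap visible:
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```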

Graceful Shutdown

On SIGTERM or SIGINT:

  1. server.stop(grace=30) — stops accepting new connections, waits up to 30 seconds for in-flight RPCs to complete
  2. Lifecycle.shutdown() — calls close_db() to drain the connection pool
  3. Process exits
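Under an asyncio event loop, the steps above are typically wired through signal handlers. In this sketch `StubServer` stands in for `grpc.aio.Server` and the lifecycle object is simplified to its shutdown hook; a fake signal is injected so the example terminates on its own:

```python
import asyncio
import signal

class StubServer:
    """Stand-in for grpc.aio.Server: stop(grace) drains in-flight RPCs."""
    async def stop(self, grace: float) -> None:
        await asyncio.sleep(0)  # real server: wait up to `grace` seconds

class Lifecycle:
    async def shutdown(self) -> None:
        await asyncio.sleep(0)  # real code: close_db() drains the pool

async def serve() -> str:
    server, lifecycle = StubServer(), Lifecycle()
    stop_event = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop_event.set)
    loop.call_soon(stop_event.set)   # simulate a signal arriving
    await stop_event.wait()
    await server.stop(grace=30)      # 1. refuse new work, drain RPCs
    await lifecycle.shutdown()       # 2. close the DB connection pool
    return "stopped"                 # 3. process exits

print(asyncio.run(serve()))  # → stopped
```

Setting the event from a signal handler (rather than calling `stop` directly) keeps the shutdown path on the event loop, where the async cleanup can run.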

Readiness Probe

Health.Ready RPC returns:

```python
ReadyCheckResponse(
    ready=db_healthy and embedding_model_loaded,
    dependencies=["database", "embedding_model"],
    dependency_status={
        "database": db_health_check(),       # SELECT 1
        "embedding_model": model_is_loaded,  # in-memory flag
    },
)
```