Cost Optimization

Reduce AI costs without sacrificing quality through intelligent optimization strategies.

Optimization Strategies

1. Model Selection

Use the right model for each task:

  • GPT-4 for complex reasoning
  • GPT-3.5 for simple tasks
  • Smaller models for classification
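The tiers above can be expressed as a small routing table. This is a minimal sketch, not part of any SDK: the `Task` enum, the `pick_model` helper, and the model identifiers (especially `small-classifier-model`) are hypothetical placeholders to be replaced with whatever models you actually deploy.

```python
from enum import Enum


class Task(Enum):
    """Hypothetical task categories mirroring the list above."""
    COMPLEX_REASONING = "complex_reasoning"
    SIMPLE = "simple"
    CLASSIFICATION = "classification"


# Illustrative routing table; the model names are placeholders, not recommendations.
MODEL_FOR_TASK = {
    Task.COMPLEX_REASONING: "gpt-4",
    Task.SIMPLE: "gpt-3.5-turbo",
    Task.CLASSIFICATION: "small-classifier-model",
}


def pick_model(task: Task) -> str:
    """Return the model configured for the given task category."""
    return MODEL_FOR_TASK[task]
```

Calling `pick_model(Task.CLASSIFICATION)` before each request keeps the routing decision in one place, so changing a tier later is a one-line edit.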

2. Response Caching

Eliminate redundant LLM calls by caching responses for identical inputs. When your OpenAI calls are wrapped with observeOpenAI, every call is traced automatically, so you can measure cache hit rates and the spend you avoid in the dashboard at https://app.agenticants.ai.

```typescript
import OpenAI from "openai";
// observeOpenAI is provided by the tracing SDK; its import is not shown in the original.

const openai = observeOpenAI(new OpenAI());
const cache = new Map<string, string>();

async function complete(prompt: string): Promise<string> {
  const cached = cache.get(prompt);
  // Compare against undefined so a cached empty string still counts as a hit.
  if (cached !== undefined) return cached; // no LLM call, no cost

  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  });
  const text = res.choices[0]?.message?.content ?? "";
  cache.set(prompt, text);
  return text;
}
```
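To get a rough hit rate locally, before looking at the dashboard, the same idea can be sketched in Python with hit and miss counters. Everything here is an assumption for illustration: `ResponseCache` is a hypothetical class, and `call_model` stands in for whatever function actually calls your LLM. The cache key includes the model name so that the same prompt sent to two different models is not served the wrong answer.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """In-memory exact-match cache that counts hits and misses (illustrative only)."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model  # hypothetical: (model, prompt) -> response text
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model and prompt together; the NUL byte separates the two fields.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]  # served from cache: no LLM call
        self.misses += 1
        text = self._call_model(model, prompt)
        self._store[key] = text
        return text

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

This sketch never evicts entries and is not safe to share across processes; a production cache would typically add a size limit or expiry.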

3. Prompt Optimization

Shorter prompts cost less, because input is billed per token:

  • Remove unnecessary context
  • Use concise instructions
  • Optimize system messages
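To check whether a prompt rewrite is worth it, you can compare rough token estimates before and after. The sketch below is an approximation under stated assumptions: it uses the common rule of thumb of about four characters per token for English text rather than a real tokenizer, and `estimate_tokens`, `compact`, and `estimated_savings` are hypothetical helpers. Use your provider's tokenizer when you need exact counts.

```python
import math
import re

CHARS_PER_TOKEN = 4  # rough heuristic for English text, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def compact(prompt: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", prompt).strip()


def estimated_savings(before: str, after: str) -> float:
    """Fraction of estimated input tokens saved by replacing `before` with `after`."""
    before_tokens = estimate_tokens(before)
    if before_tokens == 0:
        return 0.0
    return (before_tokens - estimate_tokens(after)) / before_tokens
```

Note that `compact` is only appropriate for prose instructions; prompts that embed code or other whitespace-sensitive content should be left as they are.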

4. Smart Sampling

Don't trace everything. Wrap only the requests you want to observe so you keep ingestion volume (and cost) down:

```python
from ants_platform import get_client

# Configured via ANTS_PLATFORM_PUBLIC_KEY / ANTS_PLATFORM_SECRET_KEY / ANTS_PLATFORM_HOST
client = get_client()


# should_trace() and run_model() are application-defined.
def handle(request):
    if should_trace(request):
        with client.start_as_current_span(name="handle-request") as span:
            span.update(input=request)
            result = run_model(request)
            span.update(output=result)
            return result
    # Skip tracing for sampled-out requests
    return run_model(request)
```

Configure the client explicitly when not relying on env vars:

```python
from ants_platform import AntsPlatform

client = AntsPlatform(
    public_key="pk_...",
    secret_key="sk_...",
    host="https://api.agenticants.ai",
)
```
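The `should_trace` helper used in the sampling example above is left to the application. One possible implementation, offered as a sketch rather than as platform behavior, is deterministic hash-based sampling: the same request ID always gets the same decision, which keeps retries of a request consistently traced or untraced. It assumes each request carries a stable string ID; the `SAMPLE_RATE` value and the ID-based signature are hypothetical choices.

```python
import hashlib

SAMPLE_RATE = 0.10  # hypothetical: trace roughly 10% of requests


def should_trace(request_id: str, sample_rate: float = SAMPLE_RATE) -> bool:
    """Deterministically decide whether to trace, based on a hash of the request ID."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer bucket in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Trace when the bucket falls in the lowest `sample_rate` share of that range.
    return bucket < sample_rate * 2**64
```

A rate of `0.0` disables tracing and `1.0` traces every request. If requests have no stable ID, comparing `random.random()` against the rate is a simpler alternative, at the cost of losing determinism.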

© 2026 ANTS Platform, Inc. · Docs v1.0 · Last updated June 2026