Cost Optimization
Reduce LLM spend without sacrificing output quality by choosing the right model for each task, caching responses, trimming prompts, and sampling traces.
Optimization Strategies
1. Model Selection
Use the right model for each task:
- GPT-4 for complex reasoning
- GPT-3.5 for simple tasks
- Smaller models for classification
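One way to apply the list above is a small routing table that maps a task type to a model. This is a minimal sketch, not part of any SDK: the tier names, `MODEL_BY_TIER`, and `pick_model` are hypothetical, and the classification entry is a placeholder rather than a real model ID.

```python
# Hypothetical routing table based on the tiers listed above.
MODEL_BY_TIER = {
    "complex": "gpt-4",                      # complex reasoning
    "simple": "gpt-3.5-turbo",               # simple tasks
    "classification": "small-classifier",    # placeholder name, not a real model ID
}


def pick_model(task_type: str) -> str:
    """Return the model for a task type.

    Unknown task types fall back to the most capable tier, so a
    misclassified request costs more rather than losing quality.
    """
    return MODEL_BY_TIER.get(task_type, MODEL_BY_TIER["complex"])
```

Falling back to the expensive tier is a deliberate choice here; if most unclassified traffic is simple, defaulting to the cheaper tier may save more.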
2. Response Caching
Eliminate redundant LLM calls by caching responses for identical inputs. When your OpenAI calls are wrapped with observeOpenAI, every call is traced automatically, so you can measure cache hit rates and the spend you avoid in the dashboard at https://app.agenticants.ai.
typescript
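import OpenAI from "openai";
// observeOpenAI comes from the AgenticAnts SDK; import it from the package you installed.

const openai = observeOpenAI(new OpenAI()); // reads OPENAI_API_KEY from the environment
const cache = new Map<string, string>();

async function complete(prompt: string): Promise<string> {
  const cached = cache.get(prompt);
  // Compare against undefined so a cached empty string still counts as a hit.
  if (cached !== undefined) return cached; // no LLM call, no cost

  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  });
  const text = res.choices[0]?.message?.content ?? "";
  cache.set(prompt, text);
  return text;
}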
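A cache keyed on the prompt alone can return the wrong answer once the same prompt is sent to different models. The sketch below, in Python, keys on model plus prompt and counts hits and misses locally. `ResponseCache` and the injected `call_model` callable are hypothetical names for illustration; they are not part of the AgenticAnts or OpenAI SDKs.

```python
import hashlib
import json
from typing import Callable, Dict


class ResponseCache:
    """In-memory cache keyed on the inputs that affect the response."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        # call_model is a hypothetical callable: (model, prompt) -> response text.
        self._call_model = call_model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Serialize deterministically, then hash so keys stay a fixed size.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1  # served from cache: no LLM call
            return self._store[key]
        self.misses += 1
        text = self._call_model(model, prompt)
        self._store[key] = text
        return text

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

If requests also vary by temperature or system message, those values belong in the key payload as well. An unbounded dictionary grows forever, so a production cache would likely add a size limit or expiry.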
3. Prompt Optimization
Shorter prompts use fewer input tokens, which lowers the cost of every request:
- Remove unnecessary context
- Use concise instructions
- Trim system messages, since they are resent with every request
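To see what trimming a prompt is worth, the arithmetic can be sketched as below. The function name and parameters are hypothetical, and the price is whatever your provider currently charges per million input tokens; the figure in the usage example is a placeholder, not a real price.

```python
def monthly_prompt_savings(
    tokens_before: int,
    tokens_after: int,
    requests_per_month: int,
    usd_per_million_input_tokens: float,
) -> float:
    """Estimate monthly savings (USD) from shortening a prompt.

    A negative result means the new prompt is longer and costs more.
    """
    saved_tokens = (tokens_before - tokens_after) * requests_per_month
    return saved_tokens / 1_000_000 * usd_per_million_input_tokens


# Placeholder numbers: a 1,200-token prompt cut to 800 tokens,
# 500,000 requests per month, at an assumed $1.00 per million input tokens.
savings = monthly_prompt_savings(1200, 800, 500_000, 1.0)
print(f"${savings:.2f} saved per month")  # prints "$200.00 saved per month"
```

This covers input tokens only; output tokens are billed separately and are not affected by a shorter prompt.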
4. Smart Sampling
Don't trace everything. Wrap only the requests you want to observe so you keep ingestion volume (and cost) down:
python
from ants_platform import get_client

client = get_client()  # configured via ANTS_PLATFORM_PUBLIC_KEY / ANTS_PLATFORM_SECRET_KEY / ANTS_PLATFORM_HOST

# should_trace and run_model are application-defined: the sampling decision and the model call.
def handle(request):
    if should_trace(request):
        with client.start_as_current_span(name="handle-request") as span:
            span.update(input=request)
            result = run_model(request)
            span.update(output=result)
        return result
    # Skip tracing for sampled-out requests
    return run_model(request)
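The `should_trace` helper above is left to the application. One possible implementation is deterministic, hash-based sampling, sketched below. It assumes each request carries a stable string ID, which the original example does not show; `SAMPLE_RATE` and the signature are hypothetical.

```python
import hashlib

SAMPLE_RATE = 0.1  # hypothetical default: trace roughly 10% of requests


def should_trace(request_id: str, sample_rate: float = SAMPLE_RATE) -> bool:
    """Decide whether to trace a request, based on a hash of its ID.

    The same ID always gets the same decision, so retries of a request
    are either all traced or all skipped.
    """
    if sample_rate <= 0.0:
        return False
    if sample_rate >= 1.0:
        return True
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float between 0 and 1.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

Random sampling with `random.random() < sample_rate` is simpler, but hashing keeps the decision reproducible. Many teams also trace every failed or unusually slow request regardless of the sample rate.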
Configure the client explicitly when not relying on env vars:
python
from ants_platform import AntsPlatform

client = AntsPlatform(
    public_key="pk_...",
    secret_key="sk_...",
    host="https://api.agenticants.ai",
)