OpenAI is in the Avo client toolkit for vector search, vision tasks, and ChatGPT-compatible APIs. Avo uses Anthropic internally but knows when OpenAI is the better client-side call.
HOW WE USE IT
OpenAI and Anthropic are not interchangeable. Each has concrete strengths, and choosing the wrong one for a given task costs money and reliability. Avo runs Anthropic Claude in production for its own structured extraction agents because Claude's structured output consistency on complex JSON schemas was measurably better in head-to-head testing on the email personalization workload. That said, three specific client scenarios push Avo toward OpenAI: vector search requiring text-embedding-3-large, clients who need a ChatGPT-compatible API endpoint for their end users, and vision tasks where GPT-4o's image understanding outperforms available alternatives.
Avo does not run OpenAI in its own production pipeline. For client work, Avo builds OpenAI integrations when the client's use case fits one of those three categories.
The most common pattern is RAG (retrieval-augmented generation) for client-facing Q&A products. A client wants users to ask questions against a proprietary document corpus. The answer pipeline needs vector search. text-embedding-3-large at 3,072 dimensions gives meaningfully better retrieval quality than smaller models for technical or domain-specific text. The embed-once-query-many cost profile is favorable: embedding a 10,000-document corpus costs approximately $1 using text-embedding-3-large at $0.00013 per 1K tokens. Queries are cheap because only the user's question needs embedding at query time, not the full corpus.
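The embed-once-query-many arithmetic above can be checked in a few lines. This is a back-of-envelope sketch: the ~770 tokens-per-document average is an assumption used to make the $1 figure work out, not a measured number; measure your own corpus before quoting a client.

```python
# Rough embed-cost model for text-embedding-3-large at $0.00013 per 1K tokens.
PRICE_PER_1K_TOKENS = 0.00013

def embed_cost(num_docs: int, avg_tokens_per_doc: int) -> float:
    """Estimated one-time USD cost to embed the full corpus."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

# 10K docs at an assumed ~770 tokens each: about $1 total.
corpus_cost = embed_cost(10_000, 770)
# A single 30-token user question at query time: fractions of a cent.
query_cost = embed_cost(1, 30)
```

The asymmetry is the point: the corpus is embedded once, while each query embeds only the question, so per-query embedding cost is effectively noise.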
Example workflow: building a RAG system for a client's internal knowledge base.
1. Chunk the document corpus at 512 tokens with 64-token overlap. Overlap prevents answer truncation at chunk boundaries.
2. Call the OpenAI embeddings endpoint (model: text-embedding-3-large, dimensions: 1536 for pgvector compatibility) for each chunk. Batch 100 chunks per API call to stay under the token-per-minute limit.
3. Store vectors in PostgreSQL with the pgvector extension using an HNSW index (ef_construction: 128, m: 16). HNSW gives 10 to 50x faster approximate nearest-neighbor queries than IVFFlat for under 1M vectors.
4. At query time, embed the user's question with the same model and dimensions. Run SELECT chunk_text, 1 - (embedding <=> $1) AS score FROM doc_chunks ORDER BY score DESC LIMIT 5.
5. Pass the top-5 chunks as context to GPT-4o with a system prompt instructing it to answer only from the provided context and cite which chunk each answer comes from.
6. Cache the embedding of common questions in Redis (TTL 24h). The embedding API call adds 80 to 150ms; caching eliminates it on repeated queries.
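The deterministic parts of that workflow (chunking, batching, the retrieval query) can be sketched in Python. The `embed_batch` helper and `TOP_K_SQL` names are illustrative assumptions, not production code; `doc_chunks` is the table named in the workflow. The OpenAI call shown uses the real `embeddings.create` signature with the `dimensions` parameter, which truncates the 3,072-dim output to 1,536 so pgvector's HNSW index (capped at 2,000 dimensions) can hold it.

```python
CHUNK_TOKENS = 512   # chunk size from step 1
OVERLAP = 64         # overlap so answers spanning a boundary survive
BATCH_SIZE = 100     # chunks per embeddings call (step 2)

def chunk(tokens: list, size: int = CHUNK_TOKENS, overlap: int = OVERLAP) -> list:
    """Sliding-window chunks: each chunk starts (size - overlap) tokens after the last."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

def batches(items: list, n: int = BATCH_SIZE):
    """Yield fixed-size batches to keep each API call under rate limits."""
    for i in range(0, len(items), n):
        yield items[i:i + n]

def embed_batch(client, texts: list) -> list:
    """One embeddings call per batch; client is an openai.OpenAI instance."""
    resp = client.embeddings.create(
        model="text-embedding-3-large",
        input=texts,
        dimensions=1536,  # truncated so the pgvector HNSW index can store it
    )
    return [d.embedding for d in resp.data]

# Step 4's retrieval query. Ordering by the raw distance operator (ascending)
# is equivalent to ordering by cosine similarity descending, and lets pgvector
# use the HNSW index; the similarity score is still returned for display.
TOP_K_SQL = """
SELECT chunk_text, 1 - (embedding <=> %(q)s) AS score
FROM doc_chunks
ORDER BY embedding <=> %(q)s
LIMIT 5
"""
```

One design note: `chunk` operates on a token list rather than raw text, because chunk sizes are specified in tokens; tokenize with the model's tokenizer before chunking.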
Tradeoffs to think through honestly. OpenAI's rate limits (tier-dependent, typically 500K tokens per minute on tier 3) can block batch embedding jobs at scale; plan for exponential backoff and batch sizing below the limit. GPT-4o is more expensive than Claude Haiku for structured extraction tasks where Haiku is sufficient. For any task involving tool use and structured JSON output, run both models on a sample of real inputs and compare output validity rates before committing to one. Cost per token is not the right metric; cost per valid output is. Finally, OpenAI's API has had higher incident frequency than Anthropic's over the past 12 months; any client product that depends on it needs a graceful degradation path when the API is unavailable.
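The backoff plan for rate-limited batch jobs can be as small as this. A minimal sketch: capped exponential backoff with full jitter, deliberately not tied to any SDK's exception classes, so the caller supplies an `is_retryable` predicate (return True for 429s and 5xx-style errors).

```python
import random
import time

def with_backoff(call, max_retries=6, base=1.0, cap=60.0,
                 is_retryable=lambda e: True, sleep=time.sleep):
    """Retry `call` on retryable errors with capped exponential backoff + jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as e:
            if attempt == max_retries - 1 or not is_retryable(e):
                raise
            # Full jitter: sleep a random interval in [0, min(cap, base * 2^attempt)].
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

For the graceful-degradation path, the same wrapper composes naturally with a fallback: catch the final exception and serve a cached answer or a "temporarily unavailable" response instead of surfacing the provider outage to end users.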
Production numbers
Embedding model: text-embedding-3-large
Embed cost (10K docs): ~$1
Vector index: HNSW via pgvector
Avo internal AI: Anthropic Claude
We rebuilt the signal scoring pipeline from scratch, fixing look-ahead contamination and adding a top-decile filter that produced a 72.2% win rate on selected signals.
72.2% Win rate (top-decile signals)
Read case study →
AI / Machine Learning: We upgraded from a static 6-factor lead score to a three-tier behavioral composite integrating email engagement, AUM, and headcount, projecting 5 to 10% conversion uplift.
3,889 Tier A leads (V1)
Read case study →
AI / Machine Learning: We generated 500 personalized cold email pitches using Claude Haiku and a Rust web scraper for $1.40 total, achieving a 34% open rate versus 11% for category-level templates.
500 Leads processed
Read case study →
Start a project
Most projects ship in under two weeks. Start with a free 30-minute discovery call.
Start a project →