Integration
One-line change. Works everywhere.
Drop-in replacement for OpenAI. Works with LangChain, LlamaIndex, and every OpenAI-compatible library out of the box.
from openai import OpenAI

# Point the standard OpenAI client at DalesAI
client = OpenAI(
    api_key="your-dalesai-key",
    base_url="https://api.dalesai.com/v1",
)

response = client.chat.completions.create(
    model="llama3.1-70b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# Print tokens as they arrive; delta.content can be None on some chunks
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
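If you want the full reply as a single string rather than printing it piece by piece, a small helper along these lines may be useful. This is a minimal sketch, not part of any DalesAI SDK: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only imitate the `choices[0].delta.content` shape used in the loop above so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # content can be None, e.g. on the final chunk
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in shaped like a streamed chunk, for offline testing."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hel"), fake_chunk("lo!"), fake_chunk(None), SimpleNamespace(choices=[])]
print(collect_stream(demo))  # Hello!
```

With a real client, passing the `response` object from the streaming call in place of `demo` should work the same way, assuming the server sends standard OpenAI-style chunks.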
Pricing
Simple, predictable pricing
No surprise bills. No per-request fees. One monthly price, use it all month.
Free trial
$0
one time · no card needed
✓ 100K tokens
✓ Qwen 2.5 14B
✓ Streaming included
✓ Full API access
Hobby
$5
per month
✓ 10M tokens / month
✓ Qwen 2.5 14B + Gemma 4
✓ Streaming included
✓ Standard support
Most popular
Builder
$20
per month
✓ 50M tokens / month
✓ All 6 models
✓ Streaming + function calling
✓ Standard support
Studio
$50
per month
✓ 150M tokens / month
✓ All 6 models
✓ Streaming + function calling
✓ Priority queue
✓ Priority DM support
✓ Custom model tuning
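To compare the paid plans, it can help to work out the effective price per million tokens. The sketch below uses only the prices and monthly allowances listed in the plans above. It assumes you use the whole allowance and ignores differences in models and features. The function names are illustrative.

```python
# Plan figures copied from the pricing cards: (monthly price in USD, tokens per month).
PLANS = {
    "Hobby": (5, 10_000_000),
    "Builder": (20, 50_000_000),
    "Studio": (50, 150_000_000),
}


def price_per_million(plan):
    """Effective USD per 1M tokens if the full monthly allowance is used."""
    price, tokens = PLANS[plan]
    return price / (tokens / 1_000_000)


def cheapest_plan(tokens_needed):
    """Return the lowest-priced plan whose allowance covers tokens_needed, or None."""
    fitting = [
        (price, name)
        for name, (price, tokens) in PLANS.items()
        if tokens >= tokens_needed
    ]
    return min(fitting)[1] if fitting else None


for name in PLANS:
    print(f"{name}: ${price_per_million(name):.2f} per 1M tokens")
print(cheapest_plan(30_000_000))  # Builder
```

At full usage this works out to $0.50 per 1M tokens on Hobby, $0.40 on Builder, and about $0.33 on Studio.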
Why DalesAI
Built different from cloud AI
Not another wrapper around OpenAI. Your prompts and responses never touch a third-party server.
🔒
Zero data retention
Requests are processed in memory and never logged, stored, or used for training.
⚡
70B models, no wait
With 128 GB of unified memory, our 70B models stay loaded and ready, with no cold-start delays.
🖥️
True on-premise
Runs on Apple Silicon M5 Max hardware in Phoenix, AZ. No cloud middleman anywhere in the stack.
🔗
OpenAI compatible
Works with any library that supports a custom base URL. Swap one line, keep everything else.
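For tools that do not use the OpenAI client, the same compatibility can be sketched at the HTTP level. The example below builds, but does not send, the kind of request the OpenAI client would make against the base URL from the integration example. The `/chat/completions` path and the Bearer header follow the OpenAI convention and are assumed to apply here. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.dalesai.com/v1"  # base URL from the integration example


def build_chat_request(api_key, model, messages, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    api_key="your-dalesai-key",
    model="llama3.1-70b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Only the base URL and key are specific to DalesAI. The rest matches what an OpenAI-compatible library would normally send.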