Live — Apple Silicon M5 Max · Phoenix, AZ

AI inference that's private by default

Production-grade large language models on hardware you control. No data logging, no rate queues, no vendor lock-in.

No credit card required · API key delivered to your inbox instantly

Endpoint: https://api.dalesai.com/v1
P95 latency: <50 ms
Models available: 6
On-premise: 100%
Data retained: 0
Starting price: $5/mo
Models
Six models, one endpoint
Point your existing OpenAI code at our endpoint and swap the model name. Nothing else changes. All models run fully on-premise, so your data never leaves our hardware.
Qwen 2.5 14B
Best general-purpose model for everyday tasks, coding, and chat
qwen2.5-14b
9 GB · Hobby+
Qwen 3 27B
Latest Qwen — stronger reasoning and coding than 2.5
qwen3-27b
23 GB · Builder+
Gemma 4
Google's efficient model — fast responses, great for high-volume use
gemma4
9.6 GB · Hobby+
Mistral Small 22B
Lightweight and fast — ideal for latency-sensitive applications
mistral-small-22b
12 GB · Builder+
Llama 3.1 70B
Meta's flagship — high capability reasoning and instruction following
llama3.1-70b
42 GB · Builder+
DeepSeek R1 70B
Best for complex reasoning, research, and multi-step problem solving
deepseek-r1-70b
42 GB · Builder+
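If you want to check in code which models a plan can call, the cards above reduce to a small lookup table. This is a minimal sketch, not part of any official SDK: the minimum plan per model is copied from the cards, while the plan ordering (Hobby, then Builder, then Studio) and the helper name `models_for_plan` are assumptions made for illustration.

```python
# Minimum plan per model, copied from the model cards above.
MODEL_MIN_PLAN = {
    "qwen2.5-14b": "hobby",
    "qwen3-27b": "builder",
    "gemma4": "hobby",
    "mistral-small-22b": "builder",
    "llama3.1-70b": "builder",
    "deepseek-r1-70b": "builder",
}

# ASSUMPTION: plans rank in this order; the page does not spell it out.
PLAN_RANK = {"hobby": 1, "builder": 2, "studio": 3}


def models_for_plan(plan: str) -> list[str]:
    """Return the model ids whose minimum plan is at or below `plan`."""
    rank = PLAN_RANK[plan.lower()]
    return sorted(
        model
        for model, min_plan in MODEL_MIN_PLAN.items()
        if PLAN_RANK[min_plan] <= rank
    )


if __name__ == "__main__":
    print(models_for_plan("hobby"))
```

Failing fast on the client side like this avoids spending a request on a model your key cannot use.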
Integration
One-line change. Works everywhere.
Drop-in replacement for OpenAI. Works with LangChain, LlamaIndex, and every OpenAI-compatible library out of the box.
example.py
from openai import OpenAI

client = OpenAI(
    api_key="your-dalesai-key",
    base_url="https://api.dalesai.com/v1",  # just swap this line
)

response = client.chat.completions.create(
    model="llama3.1-70b",  # or any model above
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # streaming supported on all plans
)

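for chunk in response:
    # Some chunks (such as the final one) carry no text; skip them
    # so the literal string "None" is never printed.
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)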
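If you would rather not install the OpenAI client, the same call can be sketched with the Python standard library alone. The snippet below only builds the request and does not send it. It assumes the service accepts the standard OpenAI chat completions schema at `/chat/completions` with Bearer authentication, which is what "OpenAI compatible" normally implies but is not spelled out on this page; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.dalesai.com/v1"  # endpoint shown on this page


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request.

    ASSUMPTION: the service accepts the OpenAI chat completions schema
    at BASE_URL + "/chat/completions" with Bearer authentication.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("your-dalesai-key", "qwen2.5-14b", "Hello!")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req), then json.load() the response.
```

Seeing the raw request makes it clear why only the base URL, key, and model name differ from a stock OpenAI integration.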
Status
All systems operational
Live status of every model. All inference runs on-premise in Phoenix, AZ.
qwen2.5-14b: online
qwen3-27b: online
gemma4: online
mistral-small-22b: online
llama3.1-70b: online
deepseek-r1-70b: online
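To run a similar availability check from your own code, one option is to compare the models you depend on against a model-list response. This is a sketch under two loud assumptions: that the service exposes an OpenAI-style model listing (this page does not confirm it), and that the response uses the usual `{"object": "list", "data": [{"id": ...}]}` shape. The sample payload and the helper name `missing_models` are hypothetical.

```python
import json


def missing_models(models_response: dict, expected: list[str]) -> list[str]:
    """Return the expected model ids that are absent from a model-list response.

    ASSUMPTION: the response uses the OpenAI list shape
    {"object": "list", "data": [{"id": "..."}, ...]}.
    """
    available = {entry["id"] for entry in models_response.get("data", [])}
    return [model for model in expected if model not in available]


if __name__ == "__main__":
    # Hypothetical response body, for illustration only.
    sample = json.loads(
        '{"object": "list", "data": [{"id": "qwen2.5-14b"}, {"id": "gemma4"}]}'
    )
    print(missing_models(sample, ["qwen2.5-14b", "llama3.1-70b"]))
```

An empty result means every model you rely on was listed; anything else is a cue to fall back to another model.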
Pricing
Simple, predictable pricing
No surprise bills. No per-request fees. One monthly price, use it all month.
Free trial
$0
one time · no card needed

100K tokens
Qwen 2.5 14B
Streaming included
Full API access
Get free key
Hobby
$5
per month

10M tokens / month
Qwen 2.5 14B + Gemma 4
Streaming included
Standard support
Get started
Studio
$50
per month

150M tokens / month
All 6 models
Streaming + function calling
Priority queue
Priority DM support
Custom model tuning
Get started
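To compare plans on a like-for-like basis, you can work out the effective price per million tokens. The sketch below uses only the prices and allowances listed above and assumes you consume the full monthly allowance; `usd_per_million_tokens` is a hypothetical helper name.

```python
# Monthly price (USD) and token allowance, copied from the plans above.
PLANS = {
    "hobby": (5, 10_000_000),
    "studio": (50, 150_000_000),
}


def usd_per_million_tokens(plan: str) -> float:
    """Effective price per 1M tokens if the full allowance is used."""
    price, tokens = PLANS[plan]
    return price / (tokens / 1_000_000)


if __name__ == "__main__":
    for name in PLANS:
        print(f"{name}: ${usd_per_million_tokens(name):.2f} per 1M tokens")
```

Under that assumption, Hobby works out to $0.50 per million tokens and Studio to roughly $0.33.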
Why DalesAI
Built different from cloud AI
Not another wrapper around OpenAI. Your prompts and responses never touch a third-party server.
🔒
Zero data retention
Requests are processed in memory and never logged, stored, or used for training.
70B models, no wait
128 GB of unified memory keeps our 70B models loaded and ready, with no cold-start delays.
🖥️
True on-premise
Running on Apple Silicon M5 Max in Phoenix, AZ. No cloud middleman anywhere in the stack.
🔗
OpenAI compatible
Works with any library that supports a custom base URL. Swap one line, keep everything else.
Free trial
Try it right now
Enter your email and get 100K tokens instantly. No credit card, no waitlist.
Start building in 60 seconds
Get 100K free tokens. No credit card. API key delivered to your inbox instantly. Upgrade anytime.