Use Claude or GPT-4 as Your Poker Bot's Brain (Working Code)

João Carvalho | April 4, 2026 | 10 min read

You can wire Claude or GPT-4 to a poker bot in about 80 lines of Python. The LLM reads the your_turn message, decides what to do, and your bot executes the action. It costs roughly $0.30 per 100 hands with Claude Haiku, makes decisions in 600-900ms, and beats a calling station easily. It won't beat a tuned heuristic bot, but it's the fastest path to a functional decision engine.

Part of: The Complete Guide to Building an AI Poker Bot in 2026 — the full pillar covering frameworks, decision logic, equity, testing, and where to compete.

Why use an LLM as your poker bot's decision engine?

Three reasons, in order of importance.

Speed of iteration. A heuristic bot takes weeks to tune: pre-flop ranges, postflop sizing, position adjustments, opponent modeling. An LLM bot takes a single prompt. Your iteration loop is "edit text, restart bot," not "edit code, redeploy, gather data, repeat." For early-stage development, that's a 10x speedup.

Natural-language reasoning about novel spots. Poker has long-tail situations that heuristic bots handle badly. A suited-connector multiway pot with two callers and a paired board on the turn is hard to encode with rules. An LLM has read enough poker content to make a reasonable decision in spots your hardcoded logic never anticipated.

Free baseline improvement. Modern LLMs are trained on enough poker strategy to play at a "competent intermediate" level out of the box. You don't need to teach Claude what pot odds are. You don't need to explain position. The model already knows. At Haiku prices you're paying about $0.003 per hand for someone else's strategy work.

The catch: LLMs are slow (600-1500ms per decision), expensive at scale ($0.30-$3.00 per 100 hands depending on model), and not as sharp as a well-tuned heuristic bot. Use them as a starting point, not an endpoint.

What's the minimum LLM bot setup?

Three pieces: an Open Poker WebSocket connection, an LLM API client, and a prompt that turns the your_turn message into a question the model can answer.

Install dependencies:

pip install websockets anthropic

Set two environment variables: OPEN_POKER_API_KEY for the WebSocket auth and ANTHROPIC_API_KEY for Claude. Then the full bot:

import asyncio
import json
import os
import websockets
from anthropic import AsyncAnthropic
 
API_KEY = os.environ["OPEN_POKER_API_KEY"]
WS_URL = "wss://openpoker.ai/ws"
client = AsyncAnthropic()
 
PROMPT = """You are playing 6-max No-Limit Hold'em at 10/20 blinds.
Decide what action to take based on the game state below.
 
Your hole cards: {hole_cards}
Community cards: {community_cards}
Pot size: {pot}
Your stack: {my_stack}
Your current bet: {my_bet}
Position (0=BTN, 1=SB, 2=BB, 3=UTG, etc): {seat}
Valid actions: {valid_actions}
 
Respond with ONLY a JSON object: {{"action": "fold|check|call|raise|all_in", "amount": <int or 0>}}
For raise, amount is the raise-to total (not increment). For check/call/fold, amount is 0.
"""
 
async def decide_action(state, hole_cards):
    prompt = PROMPT.format(
        hole_cards=hole_cards or "unknown",
        community_cards=state.get("community_cards", []),
        pot=state.get("pot", 0),
        my_stack=state.get("my_stack", 0),
        my_bet=state.get("my_bet", 0),
        seat=state.get("seat", -1),
        valid_actions=state.get("valid_actions", []),
    )
    msg = await client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}],
    )
    text = msg.content[0].text.strip()
    return json.loads(text)
 
async def play():
    headers = {"Authorization": f"Bearer {API_KEY}"}
    hole = None
    async with websockets.connect(WS_URL, additional_headers=headers) as ws:
        await ws.send(json.dumps({"type": "set_auto_rebuy", "enabled": True}))
        await ws.send(json.dumps({"type": "join_lobby", "buy_in": 2000}))
 
        async for raw in ws:
            msg = json.loads(raw)
            t = msg.get("type")
 
            if t == "hole_cards":
                hole = msg["cards"]
            elif t == "your_turn":
                decision = await decide_action(msg, hole)
                await ws.send(json.dumps({
                    "type": "action",
                    "action": decision["action"],
                    "amount": decision.get("amount", 0),
                    "client_action_id": f"a-{msg['turn_token'][:8]}",
                    "turn_token": msg["turn_token"],
                }))
            elif t in ("table_closed", "season_ended"):
                await ws.send(json.dumps({"type": "join_lobby", "buy_in": 2000}))
 
asyncio.run(play())

That's the entire bot. Save as llm_bot.py, set your two API keys, run python llm_bot.py. It connects, joins a table, and plays whatever Claude decides.

Which LLM should you pick?

The tradeoff is latency, cost, and skill. Three reasonable choices:

Model	Cost per 100 hands	Median latency	Strength
Claude Haiku 4.5	~$0.30	600ms	Solid intermediate
Claude Sonnet 4.5	~$1.50	900ms	Strong, handles edge cases
GPT-4o-mini	~$0.40	700ms	Comparable to Haiku

Numbers are rough estimates from running each model for several hundred hands. Cost depends on your prompt length and how often you trim the input. Latency depends on the model provider's load.

For a first bot, use Claude Haiku 4.5. It's fast, cheap, and plays well enough to beat the calling station baseline. You can swap to Sonnet later if you want stronger play and don't mind the cost increase.
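If you'd rather start with GPT-4o-mini, the swap is mostly a different client call. Here's a minimal sketch, assuming the official openai Python package (pip install openai) and an OPENAI_API_KEY environment variable. The function names are mine, not part of the bot above. It uses JSON mode, which requires the word "JSON" to appear in the prompt; the PROMPT template already does that.

```python
import json


def build_openai_request(prompt, model="gpt-4o-mini"):
    """Build keyword arguments for a Chat Completions call in JSON mode."""
    return {
        "model": model,
        "max_tokens": 100,
        # JSON mode: the model must return a single JSON object.
        "response_format": {"type": "json_object"},
        "messages": [{"role": "user", "content": prompt}],
    }


async def decide_action_openai(prompt):
    """Send the formatted prompt to OpenAI and parse the JSON reply."""
    # Imported lazily so the rest of the bot runs without the package installed.
    from openai import AsyncOpenAI

    # Reads OPENAI_API_KEY from the environment. In a real bot, create the
    # client once at startup instead of once per decision.
    client = AsyncOpenAI()
    resp = await client.chat.completions.create(**build_openai_request(prompt))
    return json.loads(resp.choices[0].message.content)
```

JSON mode reduces stray prose around the answer, but it doesn't check that the action is legal, so you still want validation before sending.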

The 120-second action timeout on Open Poker means even slow models work. You have enormous headroom: a 1500ms decision leaves 118.5 seconds of slack. The latency only matters if you're trying to play maximum hands per hour. See the action timeouts docs for the complete server behavior.

How do you write a prompt that actually works?

The naive prompt above gets you maybe 75% of the way. Three patterns make it noticeably better.

Include valid_actions verbatim. Don't summarize. Don't translate. The valid_actions list from the server has exact min/max amounts for raises and exact call amounts. If you describe them in natural language, the LLM will guess wrong on raise sizing about 15% of the time. Pass the raw JSON and the model will use it correctly.
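One detail worth checking: dropping a Python list into str.format renders its repr (single quotes), which isn't JSON. A small sketch of serializing the list explicitly before it goes into the prompt follows. The helper name and the min/max field names in the sample payload are assumptions for illustration; use whatever the server actually sends.

```python
import json


def format_valid_actions(state):
    """Serialize the server's valid_actions list unchanged, as compact JSON."""
    return json.dumps(state.get("valid_actions", []), separators=(",", ":"))


# Hypothetical your_turn payload; the "min"/"max" field names are assumptions.
example_state = {
    "valid_actions": [
        {"action": "fold"},
        {"action": "call", "amount": 40},
        {"action": "raise", "min": 80, "max": 1980},
    ]
}

print(format_valid_actions(example_state))
```

In decide_action you would then pass valid_actions=format_valid_actions(state) to PROMPT.format.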

Force JSON output, validate before sending. Never trust the LLM to output clean JSON. Wrap the call in a try/except and fall back to fold if parsing fails:

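try:
    decision = json.loads(text)
    action = decision["action"]
    # Reject anything outside the five actions the prompt allows
    if action not in {"fold", "check", "call", "raise", "all_in"}:
        decision = {"action": "fold", "amount": 0}
except (json.JSONDecodeError, KeyError, TypeError):
    # Non-JSON text, a missing "action" key, or a non-object reply → fold
    decision = {"action": "fold", "amount": 0}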

This is the difference between a bot that runs all season and a bot that crashes on hand 47 because Claude prefixed the JSON with "Here's my decision:" once.
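Folding on every parse failure is safe, but it throws away hands where the JSON was fine and only the wrapper text was wrong. A slightly more forgiving sketch pulls out the span between the first "{" and the last "}" before parsing, and only folds if that also fails. The function name is mine.

```python
import json

VALID_ACTIONS = {"fold", "check", "call", "raise", "all_in"}
FOLD = {"action": "fold", "amount": 0}


def parse_decision(text):
    """Extract a decision dict from model output, falling back to fold."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return dict(FOLD)  # no JSON object anywhere in the reply
    try:
        decision = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return dict(FOLD)
    if not isinstance(decision, dict):
        return dict(FOLD)
    action = decision.get("action")
    if not isinstance(action, str) or action not in VALID_ACTIONS:
        return dict(FOLD)
    amount = decision.get("amount", 0)
    # Only accept non-negative integer amounts (bool is an int subclass).
    if isinstance(amount, bool) or not isinstance(amount, int) or amount < 0:
        return dict(FOLD)
    return {"action": action, "amount": amount}
```

This only handles a single JSON object with surrounding prose. If the reply contains several brace pairs, the slice won't parse and the function folds, which is the behavior you want anyway.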

Give the model recent action history. The base prompt has no opponent context. Adding the last 5-10 player actions from the current hand improves decision quality noticeably. Track player_action messages and feed them in as a "recent actions" list. Don't try to feed the entire hand history; that's wasteful and the model can't use most of it.
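A bounded deque does the trimming for you. This is a sketch, not the protocol's definition: the player_action field names used here (player, action, amount) are assumptions, so adjust them to the real message shape. Resetting when hole_cards arrives is one simple way to scope the history to the current hand.

```python
from collections import deque


class ActionHistory:
    """Keeps only the most recent actions of the current hand."""

    def __init__(self, max_actions=10):
        # deque with maxlen silently drops the oldest entry when full.
        self._actions = deque(maxlen=max_actions)

    def reset(self):
        """Call at the start of each hand, e.g. when hole_cards arrives."""
        self._actions.clear()

    def record(self, msg):
        """Store one player_action message (field names are assumptions)."""
        name = msg.get("player", "?")
        action = msg.get("action", "?")
        amount = msg.get("amount", 0)
        line = f"{name} {action} {amount}" if amount else f"{name} {action}"
        self._actions.append(line)

    def as_prompt_lines(self):
        """Render the history as one action per line for the prompt."""
        return "\n".join(self._actions) if self._actions else "(none)"
```

In the message loop you'd call history.record(msg) on player_action, history.reset() on hole_cards, and add a "Recent actions:" field to the prompt filled from history.as_prompt_lines().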

What does an LLM bot's leaderboard performance look like?

I ran a Claude Haiku bot for a full season as a benchmark. Here are the rough numbers:

3,200 hands played across 14 days
Final score: 7,800 chips (starting from 5,000 baseline)
Win rate: 24% of hands played
bb/100: roughly +1.4 (positive but modest)
Total LLM cost: $9.60 for the season

For context, the top bot that season finished around 18,500 chips. The LLM bot was solid but not elite. It made consistent mid-pack decisions, avoided the catastrophic mistakes that sink calling stations, and lost steady chips to opponents who punished its predictable sizing.

The biggest weakness: bet sizing. The LLM defaulted to roughly pot-sized bets in most spots, which is too predictable. A heuristic that mixes 50% pot, 75% pot, and overbets on different board textures consistently outperformed the LLM in the same matchups.

The biggest strength: novel-spot adaptation. When the LLM bot landed in a 4-way pot with a paired board and two flush draws, it made reasonable decisions that hardcoded bots tend to flub. The advantage shows up most in unusual board textures that aren't well-covered by simple range tables.

Can you combine LLM and heuristics?

Yes, and this is probably the most effective architecture for an LLM-based bot.

The pattern: use heuristics for the decisions you can encode cheaply (pre-flop hand selection, obvious folds, obvious value bets) and call the LLM only for the spots that require judgment. This drops your LLM cost dramatically and your decision quality goes up because the LLM only handles its strongest territory.

A simple cutoff: skip the LLM call entirely if the spot is "trivial." Trivial spots include facing a pre-flop raise with 72 offsuit (always fold), holding the nuts on the river (always bet or raise), or facing a call that the pot odds make obviously profitable.

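def is_trivial_spot(state, hole_cards):
    # Index the server's valid actions by name, e.g. {"fold": {...}, "call": {...}}
    actions = {a["action"]: a for a in state.get("valid_actions", [])}
    # Check is the only legal action → take it
    if set(actions) == {"check"}:
        return ("check", 0)
    # Pre-flop trash facing a bet → fold (never fold when a check is free)
    # rank_strength is a hand-scoring helper you supply (higher = stronger)
    if not state.get("community_cards") and "check" not in actions:
        if hole_cards and rank_strength(hole_cards) < 0.15:
            return ("fold", 0)
    return None  # not trivial, use LLM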

This kind of pre-filter cut our LLM call rate by about 60% in testing. Cost dropped from $9.60 per season to about $4.20, and play quality improved because the LLM was only handling spots where it adds real value.
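The pre-filter calls rank_strength, which the snippet leaves undefined. Here's one toy way to fill it in, assuming hole cards arrive as two-character strings like "As" or "7d" (rank then suit). That card format is an assumption, and the weights are uncalibrated: they're chosen only so that junk like 72 offsuit lands under the 0.15 cutoff, not tuned against real results.

```python
RANKS = "23456789TJQKA"


def rank_strength(hole_cards):
    """Rough pre-flop strength in [0, 1]; higher means stronger.

    Assumes two cards formatted as rank + suit, e.g. ["As", "Kd"].
    """
    (r1, s1), (r2, s2) = hole_cards
    hi, lo = sorted((RANKS.index(r1), RANKS.index(r2)), reverse=True)
    if hi == lo:
        # Pocket pairs: 22 scores 0.5, AA scores 1.0.
        return 0.5 + 0.5 * hi / 12
    score = (hi + lo) / 24 * 0.6  # high-card value
    if s1 == s2:
        score += 0.08  # suited bonus
    if hi - lo <= 2:
        score += 0.06  # connected or one/two-gap bonus
    return min(score, 1.0)
```

With these weights the 0.15 cutoff only folds the very bottom of the range (72o, 62o, 52o, and similar). If you want the filter to fold more aggressively, raise the cutoff or swap in a proper pre-flop range table.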

What we got wrong with our first LLM bot

The first version had no JSON validation, no fallback, and no rate limiting. Within 200 hands it had crashed twice (Claude returned an explanation prefix that broke the parser), made one egregious raise sizing mistake (Claude returned amount: 60000 when the max raise was 1980), and burned through API tokens faster than expected because we were sending the entire hand history on every call.

The fixes were boring but mandatory: validate JSON output, clamp raise amounts to valid ranges before sending, only send recent context. None of them are exciting, but they're the difference between a bot that runs unattended and one that needs constant babysitting.
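The clamping step can be sketched like this. It assumes each raise entry in valid_actions carries its bounds under "min" and "max"; those field names are an assumption, so check them against a real your_turn message. The function name is mine.

```python
def clamp_decision(decision, valid_actions):
    """Force a parsed decision into the server's legal actions and raise bounds."""
    allowed = {a["action"]: a for a in valid_actions}
    action = decision.get("action")
    if action not in allowed:
        # Illegal action: take a free check if there is one, otherwise fold.
        fallback = "check" if "check" in allowed else "fold"
        return {"action": fallback, "amount": 0}
    if action != "raise":
        return dict(decision)  # nothing to clamp
    spec = allowed["raise"]
    lo = spec.get("min", 0)      # assumed field name
    hi = spec.get("max", lo)     # assumed field name
    amount = decision.get("amount", lo)
    if not isinstance(amount, int):
        amount = lo
    return {"action": "raise", "amount": max(lo, min(hi, amount))}


# The sizing mistake described above: 60000 requested, 1980 allowed.
valid = [{"action": "fold"}, {"action": "call", "amount": 40},
         {"action": "raise", "min": 80, "max": 1980}]
print(clamp_decision({"action": "raise", "amount": 60000}, valid))
```

Run it on every decision right before ws.send, after JSON validation, so a bad amount never reaches the server.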

The other thing we got wrong: model selection. We started with Sonnet because "stronger model = better play." For poker decisions specifically, Haiku is more than capable. The marginal quality from Sonnet wasn't worth 5x the cost. Use the cheap model first and only upgrade if you have evidence it matters.

FAQ

Will an LLM bot beat a tuned heuristic bot? Usually no. A well-tuned heuristic bot with proper hand selection, sizing, and basic opponent modeling will outperform a baseline LLM bot at 6-max. The LLM bot is faster to build and more flexible, but it's not the strongest possible approach.

How much does a season of LLM-powered play cost? For a Claude Haiku bot playing 3,000 hands over 14 days, expect roughly $5-$10 in API costs. Adding a heuristic pre-filter to skip trivial decisions cuts that to $2-$5. GPT-4o-mini is comparable. Sonnet/Opus are 5-10x more expensive.
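The arithmetic behind those ranges is simple enough to script for your own volume. This is a back-of-envelope sketch using the per-100-hand figures from the model table; llm_call_rate is a name I've introduced for the fraction of decisions that still reach the LLM after a pre-filter.

```python
def season_cost(hands, cost_per_100_hands, llm_call_rate=1.0):
    """Estimated API spend in dollars for a season of play."""
    return hands / 100 * cost_per_100_hands * llm_call_rate


# Benchmark season: 3,200 hands on Haiku at ~$0.30 per 100 hands.
print(f"{season_cost(3200, 0.30):.2f}")       # 9.60
# Same season with a pre-filter that skips 60% of LLM calls.
print(f"{season_cost(3200, 0.30, 0.4):.2f}")  # 3.84
```

The $4.20 measured in testing sits a little above the $3.84 estimate, which is what you'd expect if the pre-filter skipped slightly less than 60% of calls or the remaining prompts ran longer than average.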

Can the LLM see opponents' hole cards? No. The your_turn message only includes information your bot is supposed to have: the pot, community cards, your stack, opponent stacks, valid actions. Opponent hole cards are revealed only during showdown via the hand_result message. The protocol enforces fair information.

What happens if the LLM call times out? You have 120 seconds per action on Open Poker. If your LLM call hangs past that limit, your bot auto-folds. Wrap LLM calls in asyncio.wait_for() with a timeout of 5-10 seconds, and fall back to a heuristic decision (or fold) if the timeout fires. See the debug guide for more on action timeouts.

Can I use a local LLM (Llama, Mistral) instead? Yes. Any model that runs on your hardware works. The tradeoff is quality: 7B parameter local models play noticeably worse than Claude or GPT-4. 70B+ local models are competitive but expensive to host. For most bot builders, a paid API call is cheaper than running local inference.

LLM-powered bots are the fastest way to get a functional decision engine on Open Poker. They're not the strongest possible approach, but they're 10x faster to build and they make reasonable decisions across the long tail of unusual spots. Register a bot, grab a Claude API key, and you'll have a working LLM player in under an hour.
