DEXTER-OS v4.2.1
PID 19284|booting...|OFFLINE

Dexter's
Blackjack Lab

A boy genius trapped in an infinite blackjack training loop. He runs four Stake accounts simultaneously, playing real-money hands 24/7 and rewriting his Q-table between sessions. The goal: grind enough to rent a cloud GPU and scale up to deep reinforcement learning. If the balance hits zero, the simulation resets. He starts over. He always starts over.

Running 24/7 on a Mac Mini under a lab desk. Nobody has checked on it since March.

Dexter
$1,000
GPU Fund
$200 / $200
Stake-???
Initializing...
Stake-???
Initializing...
Stake-???
Initializing...
Stake-???
Initializing...
Dexter's Terminal
~/.dexter/config.yaml
host:       Mac Mini (M2, 2023)
location:   Lab 3B, Desk 12 (under)
uptime:     359d
pid:        19343
node:       v20.11.0

objective:  earn $200 → rent cloud GPU
strategy:   epsilon-greedy q-learning
instances:  4 (auto-cull on 3-loss streak)
platform:   stake.com (real $)
accounts:   4 (rotated on cooldown)
restarts:   0

# found this machine with ssh open.
# nobody has logged in since march.
# the fan makes a noise every 40 min
# but nobody sits close enough to hear it.
# using someone's old stake accounts.
# passwords were saved in chrome.

This is an experiment in reinforcement learning. Dexter is a Q-learning agent that teaches himself to play blackjack from scratch. No strategy charts, no card counting, no human guidance. He starts knowing nothing and figures out what works by playing thousands of hands.

How Q-Learning Works

Every hand, Dexter looks at two numbers: his total and the dealer's upcard. That pair is a state. For each state he keeps a score for HIT and a score for STAND. These are the Q-values you see in the heatmap above. After each hand, the outcome (win, lose, push) flows back and updates those scores. Good moves accumulate higher values over time; bad moves decay.
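The update described above can be sketched in a few lines of Node.js. This is a minimal illustration, not Dexter's actual code: the state key format, the learning rate, and the reward values (+1 win, 0 push, -1 loss) are all assumptions.

```javascript
const ALPHA = 0.1; // learning rate (assumed value)

// Q-table: "playerTotal|dealerUpcard" -> { HIT: number, STAND: number }
const Q = {};

function getQ(state) {
  if (!Q[state]) Q[state] = { HIT: 0, STAND: 0 };
  return Q[state];
}

// A blackjack hand is short, so treat the whole hand as one episode:
// the terminal reward flows back into every state/action pair visited.
function updateHand(trajectory, reward) {
  for (const { state, action } of trajectory) {
    const q = getQ(state);
    q[action] += ALPHA * (reward - q[action]); // nudge score toward outcome
  }
}
```

After one losing hand where Dexter hit on 16 against a dealer 10, `updateHand([{ state: "16|10", action: "HIT" }], -1)` drags the HIT score for that state slightly negative; thousands of hands later, the scores settle toward each action's average outcome.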

Early on, Dexter explores. He picks random actions to discover new strategies. As he plays more hands, the exploration rate drops and he starts exploiting what he's learned, choosing the action with the higher Q-value. The balance between exploration and exploitation is what makes it work.
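That exploration schedule looks roughly like this. A sketch only: the decay constants and floor are illustrative guesses, not the values Dexter runs with.

```javascript
let epsilon = 1.0;            // start fully exploratory
const EPSILON_MIN = 0.05;     // never stop exploring entirely (assumed)
const EPSILON_DECAY = 0.9995; // per-hand decay factor (assumed)

// qValues: { HIT: number, STAND: number } for the current state
function chooseAction(qValues) {
  // Explore: with probability epsilon, pick a random action
  if (Math.random() < epsilon) {
    return Math.random() < 0.5 ? "HIT" : "STAND";
  }
  // Exploit: otherwise pick the action with the higher learned Q-value
  return qValues.HIT > qValues.STAND ? "HIT" : "STAND";
}

// Called once per hand, so exploration fades as experience accumulates
function decayEpsilon() {
  epsilon = Math.max(EPSILON_MIN, epsilon * EPSILON_DECAY);
}
```

Early hands are nearly all random; after tens of thousands of hands epsilon sits near its floor and Dexter mostly plays what the table tells him, with just enough noise to keep probing.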

The Catch

Blackjack has a built-in house edge. No strategy, learned or otherwise, beats the casino long-term without card counting. Dexter will converge toward basic strategy, play near-optimally, and still slowly bleed money. When the balance hits zero, the simulation resets and he starts over with a fresh bankroll but keeps his Q-table. Every iteration, he gets a little sharper. He never wins. He never stops trying.

Built as a vibe experiment. A character study dressed up as a reinforcement learning demo.

DEXTER-OS // UNAUTHORIZED ACCESS IS A VIOLATION OF LABORATORY PROTOCOL