Teaching an AI to Play Downwell (and Failing Spectacularly)
The Idea
It was 3:17 AM on May 5th, 2023. I should have been sleeping, but instead I was staring at Downwell, a brutally difficult roguelike where you fall down a well shooting enemies with your gunboots. The game is pure chaos: dodge enemies, manage ammo, chain combos, collect gems, and try not to die.
Chapter 1: The OCR Nightmare (May 2023)
My initial approach seemed reasonable at the time:
# Initial commit: 8659f5f - May 5, 2023, 3:17 AM
class CustomDownwellEnvironment:
def __init__(self):
self.game_window = None
self.actions = ['left', 'right', 'space'] The plan was simple:
- Screenshot the game with
pyautogui - Extract HP/gems with OCR (Tesseract)
- Feed into a basic DQN
Within 10 minutes, I realized OCR was absolute garbage for this:
# Commit: c831e55 - "path at top of processing" - 3:27 AM
def extract_hp(screen):
vals = {'44': 4, '34': 3, '5': 2, '14': 1} # WTF Tesseract?
hp_area = screen[49:67, 30:194]
hp_area = cv.cvtColor(hp_area, cv.COLOR_BGR2GRAY)
hp_area = cv.threshold(hp_area, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)[1]
hp = pytesseract.image_to_string(hp_area, config='--psm 6')
hp = hp.replace(' ', '').replace('
', '')
if hp in vals: # Manual error mapping because OCR sucks
hp = vals[hp]
else:
hp = 0 # Give up
return hp Tesseract would read ‘4’ as ‘44’, ‘3’ as ‘34’, and sometimes just return complete nonsense. The game’s pixel font was not OCR-friendly, and the 60 FPS gameplay meant everything was constantly moving.
By the next day, I gave up on computer vision entirely:
# Commit: df916a5 - May 6, 1:57 AM
"removed need for finding the player or enemies
using external counters such as hp and gem count,
we can theoretically tell if we killed an enemy or not, and so on" Chapter 2: The Memory Hacking Solution (June 2023)
After a month of suffering with image processing, I discovered pymem - a Python library for reading process memory. Instead of trying to parse pixels, why not just read the actual values from RAM?
# Commit: adea1f3 - June 15, 5:57 PM
"switched to memory accessing for game data
since well image processing sucks lol" Best commit message I’ve ever written.
Reverse Engineering with Cheat Engine
This required finding the memory addresses for player data. Time to fire up Cheat Engine and do some pointer scanning:
PLAYER_PTR = {
"hp": {
"base": 0x004A5E50,
"offsets": [0x708, 0xC, 0x24, 0x10, 0x9C0, 0x390],
"type": "double",
},
"gems": {
"base": 0x00757BF0,
"offsets": [0x24, 0x10, 0x330, 0xE0, 0x50, 0x9A8, 0x350],
"type": "double",
},
"xpos": {
"bases": [0x00534288, 0x005479FC], # Multiple pointer chains
"offsets": [
[0x100, 0x8C0, 0x10, 0x84, 0x44, 0x8, 0xB0],
[0x240, 0x4F0, 0x10, 0x84, 0x44, 0x8, 0xB0],
],
"type": "float",
},
} Finding these pointer chains took consisted of:
- Searching for value in Cheat Engine
- Taking damage/collect gem
- Searching for changed value
- Repeating until I had the address
- Finding what accessed this address
- Generating pointer map
But it worked! The AI could now read game state with zero latency and perfect accuracy.
Chapter 3: The Rust Tangent (June 16, 2023)
At 1:35 AM on June 16th, I had a brilliant idea:
# Commit: 55ba9d1 - "rust init" Python felt slow. Rust would be faster, right? I spent the next 10 hours trying to rewrite everything in Rust.
# Commit: d81e118 - 11:53 AM
"no rust" Turns out, the Python ecosystem (PyTorch, OpenCV, pymem) is really nice to have. Performance wasn’t even the bottleneck - my architecture was just bad.
30 minutes later, I had another brilliant idea:
# Commit: 7f23738 - 12:37 PM
"bye bye dqn, hello reinforcement" Maybe DQN wasn’t the right algorithm? Let’s try policy gradients.
# Commit: 32eaaba - June 17, 3:18 PM
"set up customEnv for agent, but still a lot to do" There was indeed a lot to do. So much that I didn’t touch the project again for 3 months.
Chapter 4: The Long Winter (June 2023 - Late 2024)
The project collected dust. A few halfhearted attempts:
# September 29, 2023: "refactoring"
# December 24, 2023: "minor adjustments mainly some refactoring
# to permit future usage of the AI on Linux"
# February 6, 2024: "ideas flowing" That last one is hilarious in retrospect. The ideas were flowing so well that I didn’t commit anything for another 8 months.
Chapter 5: Actually Making It Work (Late 2024)
In October 2024, I came back with fresh eyes and finally understood what was wrong.
The Critical Bug: Not Learning Online
My original implementation only trained at the end of episodes:
# OLD WAY - Train only when episode ends
def run_episode():
while not done:
action = agent.act(state)
next_state, reward, done = env.step(action)
memory.add(state, action, reward, next_state, done)
# Only train AFTER the episode ends
if episode % 10 == 0:
agent.replay(batch_size=32) # Too late! This was catastrophically bad. The agent would play for hundreds of steps, die, then try to learn from ancient history.
The fix was embarrassingly simple:
# Commit: 02db6f0 - November 3, 2024
"how could i miss the part where the DQN learns EVERY STEP smh"
# NEW WAY - Train continuously
def step(self):
if self.last_state is not None:
reward = self.reward_calc.calculate_reward(
self.last_state, current_state
)
# Store experience
self.memory.add(
self.last_state, self.last_action,
reward, current_state, done
)
# Train IMMEDIATELY
if self.memory.size > self.config.agent.train_start:
loss = self.agent.replay(batch_size=512) This single change made the agent go from completely random to actually learning patterns.
Frame Stacking: Teaching the AI to See Motion
The agent couldn’t perceive velocity from single frames. Imagine trying to play Downwell by looking at screenshots - you’d have no idea if enemies are moving up or down.
# Commit: f2823e3 - "Frame stacking attempt"
self.frame_stack = deque(maxlen=4)
def get_state(self):
frame = self.capture_engine.capture()
processed = self._preprocess_frame(frame)
self.frame_stack.append(processed)
# Stack 4 frames together: (84, 84, 4)
state = np.stack(self.frame_stack, axis=2)
return state Now the AI could “see” motion across 4 frames, understanding trajectories and velocities.
The Threading Architecture
Running everything in a single thread was killing performance. The solution: decouple perception from decision-making.
# Commit: c5b545d - "need to speedup"
class GameStateReaderThreader(Thread):
"""60 FPS perception thread"""
def run(self):
while self.game.is_running():
state = self.player.get_all_values()
self.queue.put(state)
sleep(1/60) # Smooth 60 FPS reading
class AgentThreader(Thread):
"""15 FPS decision thread"""
def run(self):
while self.running:
state = self.state_queue.get()
action = self.agent.act(state)
self.action_queue.put(action)
# Train while we wait
if self.memory.size > batch_size:
self.agent.replay() This let the AI read game state at 60 FPS while making decisions at a more reasonable 15 FPS.
The Reward System
Early reward systems were too simple. The final version rewarded:
def calculate_reward(self, state, next_state):
reward = 0
# Going deeper is the main goal
if next_state.depth > state.depth:
reward += (next_state.depth - state.depth) * 2.0
# But don't hit walls
if self._near_boundary(next_state.xpos):
reward -= 5.0
# Combos are good
if next_state.combo > state.combo:
reward += min(next_state.combo * 0.5, 10)
# Death is very bad
if next_state.hp <= 0:
reward -= 100
# Clip to prevent instability
return np.clip(reward, -100, 100) Hyperparameter Madness
After hundreds of failed runs, these hyperparameters finally worked:
@dataclass(frozen=True)
class AgentConfig:
learning_rate: float = 0.0001
gamma: float = 0.9997 # VERY high - long-term planning crucial
epsilon_start: float = 1.0
epsilon_min: float = 0.1
epsilon_decay: float = 0.999985 # Slow decay
batch_size: int = 512 # Big batches = stable learning That gamma of 0.9997 is absurdly high, but Downwell requires planning 100+ steps ahead. Lower values made the agent too myopic.
Chapter 6: Visualization & Debugging
I built a real-time visualization to see what the AI was “thinking”:
# Commit: e4334d4 - "Add training history logging and visualization"
class AIVision:
def display(self, game_state, q_values, last_reward):
# Show Q-values as bars
for i, (action, q_val) in enumerate(zip(ACTIONS, q_values)):
color = GREEN if i == np.argmax(q_values) else GRAY
bar_width = int(normalize(q_val) * 140)
cv2.rectangle(canvas, start, end, color, -1)
cv2.putText(canvas, f"{action}: {q_val:.2f}", ...) Watching the Q-values change in real-time was mesmerizing. You could literally see the AI learning that jumping over pits had high value, while moving into walls had negative value.
The Reality Check
After all this work, here’s the truth: the AI still can’t beat Downwell.
It can:
- Navigate basic obstacles
- Collect gems
- Chain small combos
- Avoid walls (mostly)
- Survive for ~30-60 seconds
It cannot:
- Handle complex enemy patterns
- Manage ammo efficiently
- Adapt to new level layouts
- Come close to my personal best (world 3)
Try It Yourself
The code is all on GitHub: MihaiStreames/Downwell.AI