MountainCar RL Pipeline

MountainCar Off-Policy RL Pipeline

The pipeline runs in four steps: collect experience with a behavior policy, initialize a Q-table, train with off-policy importance sampling, and evaluate the greedy target policy.

Step 1

Data Collection

Idle

Behavior Policy

Action: 0 left · 1 coast · 2 right
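The page does not say which behavior policy is used, so the following is only a sketch of one option: an epsilon-soft policy that mostly pushes in the direction of motion. The heuristic, the `epsilon` value, and the function names are assumptions for illustration. The point it shows is that the probability of the sampled action is returned alongside the action (the "Behavior prob" metric), because importance sampling needs it later.

```python
import random

ACTIONS = (0, 1, 2)  # 0 left, 1 coast, 2 right (from the action legend)

def behavior_probs(velocity, epsilon=0.3):
    """Probability of each action under an assumed epsilon-soft policy."""
    # Hypothetical heuristic: prefer pushing along the current direction of motion.
    preferred = 2 if velocity >= 0 else 0
    probs = [epsilon / len(ACTIONS)] * len(ACTIONS)
    probs[preferred] += 1.0 - epsilon
    return probs

def sample_action(velocity, rng, epsilon=0.3):
    """Sample an action and return it with its behavior probability."""
    probs = behavior_probs(velocity, epsilon)
    action = rng.choices(ACTIONS, weights=probs, k=1)[0]
    return action, probs[action]

rng = random.Random(0)
action, prob = sample_action(0.01, rng)
```

Every action keeps a nonzero probability, which off-policy learning requires wherever the target policy might act.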

State visitation heatmap (x = position bin, y = velocity bin)
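A minimal sketch of how a continuous state could map to the position and velocity bins of the heatmap. The bounds are the standard MountainCar limits (position in [-1.2, 0.6], velocity in [-0.07, 0.07]); the 20 x 20 grid and the function names are assumptions, since the page does not state its bin counts.

```python
POS_MIN, POS_MAX = -1.2, 0.6    # standard MountainCar position range
VEL_MIN, VEL_MAX = -0.07, 0.07  # standard MountainCar velocity range
N_POS_BINS, N_VEL_BINS = 20, 20  # assumed grid size

def to_bin(value, low, high, n_bins):
    """Map a value in [low, high] to a bin index in [0, n_bins - 1], clamping outliers."""
    ratio = (value - low) / (high - low)
    return min(n_bins - 1, max(0, int(ratio * n_bins)))

def discretize(position, velocity):
    """Return the (position bin, velocity bin) pair used as the discrete state."""
    return (
        to_bin(position, POS_MIN, POS_MAX, N_POS_BINS),
        to_bin(velocity, VEL_MIN, VEL_MAX, N_VEL_BINS),
    )

state = discretize(-0.5, 0.0)
```

Counting how often each pair occurs during collection gives the visitation heatmap.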

Collection Metrics

Episodes: 0
Successes: 0
Success rate: 0.00%
Avg return: 0.00
Avg length: 0.00
Cur step: 0
Cur return: 0.00
Last action: -
Behavior prob: -
Position: -
Velocity: -
Discrete state: -
Action 0 (←): 0
Action 1 (·): 0
Action 2 (→): 0
Status: Idle

Step 2

Build Q-Table

Waiting for Step 1

-
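One way the Q-table could be laid out, assuming one entry per (position bin, velocity bin, action) triple. The grid size, the initial value, and the function name are assumptions for illustration.

```python
def build_q_table(n_pos_bins=20, n_vel_bins=20, n_actions=3, initial_value=0.0):
    """Return a nested list indexed as q[pos_bin][vel_bin][action]."""
    # Each innermost list is created separately so cells do not share storage.
    return [
        [[initial_value] * n_actions for _ in range(n_vel_bins)]
        for _ in range(n_pos_bins)
    ]

q = build_q_table()
```

A second table of the same shape, initialized to zero, can hold the cumulative importance weights if weighted importance sampling is used for training.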

Step 3

Train Model

Waiting for Step 2

-
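The page names off-policy importance sampling but not the exact update rule. The sketch below assumes off-policy Monte Carlo control with weighted importance sampling and a greedy target policy; the function names and the episode record layout are hypothetical. Each episode is processed backward, and the pass stops as soon as the logged action differs from the greedy action, because the target policy assigns that action probability zero.

```python
def greedy_action(action_values):
    """Index of the largest value; ties go to the lowest action index."""
    return max(range(len(action_values)), key=lambda a: action_values[a])

def update_from_episode(q, c, episode, gamma=1.0):
    """Apply one backward pass of weighted importance sampling.

    q, c: nested lists indexed [pos_bin][vel_bin][action]
    episode: list of ((pos_bin, vel_bin), action, reward, behavior_prob)
    """
    g = 0.0  # return from the current step to the end of the episode
    w = 1.0  # importance weight accumulated from later steps
    for (i, j), action, reward, behavior_prob in reversed(episode):
        g = gamma * g + reward
        c[i][j][action] += w
        q[i][j][action] += (w / c[i][j][action]) * (g - q[i][j][action])
        if action != greedy_action(q[i][j]):
            break  # earlier steps would receive weight zero
        w /= behavior_prob  # the greedy target policy has probability 1 here
    return q

def make_table(n_pos, n_vel, n_actions, value):
    """Small helper so the example runs on its own."""
    return [[[value] * n_actions for _ in range(n_vel)] for _ in range(n_pos)]

q = make_table(2, 2, 3, -100.0)  # pessimistic start, an assumed choice
c = make_table(2, 2, 3, 0.0)
episode = [((0, 0), 2, -1.0, 0.5), ((0, 1), 2, -1.0, 0.5)]
update_from_episode(q, c, episode)
```

With rewards of -1 per step, the initial Q-value matters: starting at zero makes any updated action look worse than its untried neighbors, so the backward pass stops early more often.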

Step 4

Test Model

Waiting for Step 3

Greedy Policy Playback

Trained greedy policy playback

Test Metrics

Episodes: 0
Successes: 0
Success rate: 0.00%
Avg return: 0.00
Avg length: 0.00
Cur step: 0
Cur return: 0.00
Status: Idle