
RevealMNIST Exploration with Deep Reinforcement Learning

This project investigates the use of deep reinforcement learning (DRL) to solve the RevealMNIST challenge, where an agent must selectively reveal parts of an image in order to classify it correctly while minimizing the amount of visual information disclosed.

📌 This project is based on and extends RevealMNIST by emirarditi.

Project Summary

We implemented and compared two major policy gradient algorithms:

  • REINFORCE with Baseline: A foundational algorithm, useful for benchmarking and understanding training challenges.
  • Advantage Actor-Critic (A2C): A more advanced method with n-step returns, entropy annealing, reward shaping, and dual-input neural network design.

We optimized agents in both deterministic and stochastic versions of the RevealMNIST environment.
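As a reference point for the A2C results below, the REINFORCE-with-baseline update can be sketched as follows. This is a minimal illustration of the return computation and advantage weighting, not the project's tuned implementation; the baseline here is assumed to come from a learned value estimate.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

# REINFORCE with baseline scales each log-prob gradient by (G_t - b(s_t));
# subtracting the baseline reduces variance without biasing the gradient:
#   loss = -sum(log_pi(a_t | s_t) * (G_t - b(s_t)))
returns = discounted_returns([0.0, 0.0, 1.0], gamma=0.9)
```

The high-variance Monte Carlo returns are the main reason REINFORCE trains less reliably than A2C's bootstrapped estimates in the comparison below.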

RevealMNIST Environment

RevealMNIST is a partially observable environment where the agent:

  • Can reveal a pixel patch (Up, Down, Left, Right actions).
  • Can choose to predict the digit (Predict action).
  • Must optimize for accuracy, information efficiency (low reveal ratio), and high episode reward.

We built on the open-source RevealMNIST repo by emirarditi and developed customized training, logging, and visualization tools using PyTorch.
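The agent-environment loop can be sketched as below, assuming a Gym-style `reset`/`step` interface over the five actions listed above. The class and method names are illustrative; the actual API in the emirarditi repo may differ.

```python
# Hypothetical action encoding and episode loop; RevealMNIST's real API may differ.
UP, DOWN, LEFT, RIGHT, PREDICT = range(5)

def run_episode(env, agent):
    """Roll out one episode: reveal patches until the agent predicts."""
    obs = env.reset()                    # partially revealed image + aux features
    done, total_reward = False, 0.0
    while not done:
        action = agent.act(obs)          # sample an action from the policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward
```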

🧠 Agent Architecture

Our A2C agent uses two separate networks:

  • Policy Network: Outputs action probabilities.
  • Value Network: Estimates state value.

Both networks process:

  • 28x28 grayscale image (CNN layers).
  • 4 auxiliary features (FC layers): reveal ratio, position, etc.

The two streams are combined via concatenation, followed by dense layers.
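The dual-input design described above can be sketched in PyTorch as follows. Layer sizes and depths here are illustrative placeholders, not the tuned values from our experiments.

```python
import torch
import torch.nn as nn

class DualInputPolicy(nn.Module):
    """Sketch of the dual-input policy network: a CNN branch over the
    28x28 grayscale image and an FC branch over the 4 auxiliary features
    (reveal ratio, position, etc.), fused by concatenation."""
    def __init__(self, n_actions=5, n_aux=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 28x28 -> 14x14
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 14x14 -> 7x7
            nn.Flatten(),                          # 32 * 7 * 7 = 1568 features
        )
        self.aux = nn.Sequential(nn.Linear(n_aux, 32), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(32 * 7 * 7 + 32, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, image, aux):
        # Concatenate the two streams, then map to action probabilities.
        z = torch.cat([self.cnn(image), self.aux(aux)], dim=1)
        return torch.softmax(self.head(z), dim=1)
```

The value network follows the same dual-input pattern but ends in a single scalar output instead of an action distribution.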

Key Techniques

  • n-step bootstrapped returns
  • Entropy regularization with linear annealing
  • Reward shaping with logarithmic penalty on reveal ratio
  • Two-phase grid search for hyperparameter optimization
  • Separate experiments for deterministic vs. stochastic environments
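Three of the techniques above can be sketched in a few lines. The exact functional forms and coefficients used in the project may differ; these are illustrative definitions only.

```python
import math

def n_step_return(rewards, value_last, gamma):
    """n-step bootstrapped return: n discounted rewards plus
    gamma^n * V(s_{t+n}), where n = len(rewards)."""
    G = sum(gamma**k * r for k, r in enumerate(rewards))
    return G + gamma**len(rewards) * value_last

def shaped_reward(base_reward, reveal_ratio, penalty_coef):
    """Reward shaping with a logarithmic penalty on the fraction
    of the image revealed (illustrative form)."""
    return base_reward - penalty_coef * math.log1p(reveal_ratio)

def entropy_coef(step, total_steps, start=0.05, end=0.001):
    """Linear annealing of the entropy bonus over training."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)
```

Bootstrapping cuts variance relative to full Monte Carlo returns, while the annealed entropy bonus encourages exploration early in training and sharper policies later.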

📊 Results

Setting             Accuracy   Avg Reward   Reveal Ratio
Deterministic A2C   99.4%      68.53        13.5%
Stochastic A2C      98.9%      62.31        15.0%
REINFORCE (best)    95.3%      34.55        22.4%

🔍 A2C clearly outperforms REINFORCE in both accuracy and information efficiency.

📈 Training & Evaluation

  • Models trained for up to 50,000 episodes
  • Grid search on:
    • n_step, gamma, lr_policy, lr_value, entropy_coef, penalty_coef
  • Evaluation based on:
    • Accuracy
    • Average Reward
    • Reveal Ratio
  • Stochastic policy evaluation in both training and testing
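The grid search over the hyperparameters listed above can be skeletonized as a Cartesian product of candidate values. The grids and the `train`/`evaluate` calls below are placeholders, not the values actually searched.

```python
from itertools import product

# Illustrative coarse grid; the real candidate values may differ.
coarse_grid = {
    "n_step": [1, 5, 10],
    "gamma": [0.95, 0.99],
    "lr_policy": [1e-3, 1e-4],
    "lr_value": [1e-3, 1e-4],
    "entropy_coef": [0.01, 0.05],
    "penalty_coef": [0.1, 0.5],
}

def grid_configs(grid):
    """Yield one config dict per point in the Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

# Phase 1 ranks the coarse configs; phase 2 refines a finer grid around the
# best one, e.g.:
#   best = max(grid_configs(coarse_grid), key=lambda cfg: evaluate(train(cfg)))
```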

🧑‍💻 Authors

  • Yavuz Serdar Koçyiğit
  • Osman Serhat Yılmaz
  • Graduate School of Science and Engineering – Özyeğin University
  • Course: CS545 – Deep Reinforcement Learning

📎 References

  • RevealMNIST by emirarditi — the original environment this project builds on and extends.

This project demonstrates how structured exploration, principled reward design, and adaptive DRL strategies can enable agents to make highly accurate predictions with minimal data exposure.
