
RevealMNIST Exploration with Deep Reinforcement Learning

This project investigates the use of deep reinforcement learning (DRL) to solve the RevealMNIST challenge, where an agent must selectively reveal parts of an image in order to classify it correctly while minimizing the amount of visual information disclosed.

📌 This project is based on and extends RevealMNIST by emirarditi.

Project Summary

We implemented and compared two major policy gradient algorithms:

  • REINFORCE with Baseline: A foundational algorithm, useful for benchmarking and understanding training challenges.
  • Advantage Actor-Critic (A2C): A more advanced method with n-step returns, entropy annealing, reward shaping, and dual-input neural network design.

We optimized agents in both deterministic and stochastic versions of the RevealMNIST environment.
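As a reference point for the A2C results below, the REINFORCE-with-baseline update can be sketched as follows. This is a minimal illustration of the return computation and advantage weighting, not the project's tuned implementation; the baseline here is assumed to come from a learned value estimate.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

# REINFORCE with baseline scales each log-prob gradient by (G_t - b(s_t));
# subtracting the baseline reduces variance without biasing the gradient:
#   loss = -sum(log_pi(a_t | s_t) * (G_t - b(s_t)))
returns = discounted_returns([0.0, 0.0, 1.0], gamma=0.9)
```

The high-variance Monte Carlo returns are the main reason REINFORCE trains less reliably than A2C's bootstrapped estimates in the comparison below.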

RevealMNIST Environment

RevealMNIST is a partially observable environment where the agent:

  • Can reveal a pixel patch (Up, Down, Left, Right actions).
  • Can choose to predict the digit (Predict action).
  • Must optimize for accuracy, information efficiency (low reveal ratio), and high episode reward.

We built on the open-source RevealMNIST repo by emirarditi and developed customized training, logging, and visualization tools using PyTorch.
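The agent-environment loop can be sketched as below, assuming a Gym-style `reset`/`step` interface over the five actions listed above. The class and method names are illustrative; the actual API in the emirarditi repo may differ.

```python
# Hypothetical action encoding and episode loop; RevealMNIST's real API may differ.
UP, DOWN, LEFT, RIGHT, PREDICT = range(5)

def run_episode(env, agent):
    """Roll out one episode: reveal patches until the agent predicts."""
    obs = env.reset()                    # partially revealed image + aux features
    done, total_reward = False, 0.0
    while not done:
        action = agent.act(obs)          # sample an action from the policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward
```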

🧠 Agent Architecture

Our A2C agent uses two separate networks:

  • Policy Network: Outputs action probabilities.
  • Value Network: Estimates state value.

Both networks process:

  • 28x28 grayscale image (CNN layers).
  • 4 auxiliary features (FC layers): reveal ratio, position, etc.

The two streams are combined via concatenation, followed by dense layers.
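The dual-input design described above can be sketched in PyTorch as follows. Layer sizes and depths here are illustrative placeholders, not the tuned values from our experiments.

```python
import torch
import torch.nn as nn

class DualInputPolicy(nn.Module):
    """Sketch of the dual-input policy network: a CNN branch over the
    28x28 grayscale image and an FC branch over the 4 auxiliary features
    (reveal ratio, position, etc.), fused by concatenation."""
    def __init__(self, n_actions=5, n_aux=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 28x28 -> 14x14
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 14x14 -> 7x7
            nn.Flatten(),                          # 32 * 7 * 7 = 1568 features
        )
        self.aux = nn.Sequential(nn.Linear(n_aux, 32), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(32 * 7 * 7 + 32, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, image, aux):
        # Concatenate the two streams, then map to action probabilities.
        z = torch.cat([self.cnn(image), self.aux(aux)], dim=1)
        return torch.softmax(self.head(z), dim=1)
```

The value network follows the same dual-input pattern but ends in a single scalar output instead of an action distribution.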

Key Techniques

  • n-step bootstrapped returns
  • Entropy regularization with linear annealing
  • Reward shaping with logarithmic penalty on reveal ratio
  • Two-phase grid search for hyperparameter optimization
  • Separate experiments for deterministic vs. stochastic environments
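Three of the techniques above can be sketched in a few lines. The exact functional forms and coefficients used in the project may differ; these are illustrative definitions only.

```python
import math

def n_step_return(rewards, value_last, gamma):
    """n-step bootstrapped return: n discounted rewards plus
    gamma^n * V(s_{t+n}), where n = len(rewards)."""
    G = sum(gamma**k * r for k, r in enumerate(rewards))
    return G + gamma**len(rewards) * value_last

def shaped_reward(base_reward, reveal_ratio, penalty_coef):
    """Reward shaping with a logarithmic penalty on the fraction
    of the image revealed (illustrative form)."""
    return base_reward - penalty_coef * math.log1p(reveal_ratio)

def entropy_coef(step, total_steps, start=0.05, end=0.001):
    """Linear annealing of the entropy bonus over training."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)
```

Bootstrapping cuts variance relative to full Monte Carlo returns, while the annealed entropy bonus encourages exploration early in training and sharper policies later.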

📊 Results

Setting             Accuracy   Avg Reward   Reveal Ratio
Deterministic A2C   99.4%      68.53        13.5%
Stochastic A2C      98.9%      62.31        15.0%
REINFORCE (best)    95.3%      34.55        22.4%

🔍 A2C clearly outperforms REINFORCE in both accuracy and information efficiency.

📈 Training & Evaluation

  • Models trained for up to 50,000 episodes
  • Grid search on:
    • n_step, gamma, lr_policy, lr_value, entropy_coef, penalty_coef
  • Evaluation based on:
    • Accuracy
    • Average Reward
    • Reveal Ratio
  • Stochastic policy evaluation in both training and testing
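The grid search over the hyperparameters listed above can be skeletonized as a Cartesian product of candidate values. The grids and the `train`/`evaluate` calls below are placeholders, not the values actually searched.

```python
from itertools import product

# Illustrative coarse grid; the real candidate values may differ.
coarse_grid = {
    "n_step": [1, 5, 10],
    "gamma": [0.95, 0.99],
    "lr_policy": [1e-3, 1e-4],
    "lr_value": [1e-3, 1e-4],
    "entropy_coef": [0.01, 0.05],
    "penalty_coef": [0.1, 0.5],
}

def grid_configs(grid):
    """Yield one config dict per point in the Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

# Phase 1 ranks the coarse configs; phase 2 refines a finer grid around the
# best one, e.g.:
#   best = max(grid_configs(coarse_grid), key=lambda cfg: evaluate(train(cfg)))
```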

🧑‍💻 Authors

  • Yavuz Serdar Koçyiğit
  • Osman Serhat Yılmaz
  • Graduate School of Science and Engineering – Özyeğin University
  • Course: CS545 – Deep Reinforcement Learning

📎 References

  • RevealMNIST by emirarditi — the original environment this project builds on and extends.

This project demonstrates how structured exploration, principled reward design, and adaptive DRL strategies can enable agents to make highly accurate predictions with minimal data exposure.
