CasBex (Contributor) commented on Sep 25, 2023

Hi, I implemented NFQ a while ago (#897) and have since noticed a few things that could be improved. The changes are summarized below:

  1. Use a `(state) -> ... -> |action space|` architecture for the Q-network, which outputs one value per action, rather than a `(state, single action) -> ... -> value` architecture that must loop over all actions (see the sketch below)
  2. Explicitly compute gradients and apply optimiser updates instead of calling `Flux.train!`
  3. Rely on the Trajectory controller for sampling rather than sampling inside the `optimise!` function

In addition to increased uniformity with the rest of the library, this also consumes much less memory on the GPU than the previous version, which had to duplicate all input data |action space| times in order to loop over all actions. The main drawback is that it is less faithful to the original implementation [1].
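As a minimal illustration of the first two changes, here is a Flux sketch; the state dimension, layer sizes, batch size, and targets are illustrative assumptions, not taken from this PR:

```julia
using Flux

# Illustrative sizes, not taken from this PR: 4-dimensional states, 2 discrete actions.
state_dim, n_actions = 4, 2

# Previous structure: (state, one-hot action) -> scalar Q-value.
# Evaluating all actions means duplicating the state batch n_actions times.
q_old = Chain(Dense(state_dim + n_actions => 16, relu), Dense(16 => 1))

# New structure: state -> one Q-value per action in a single forward pass.
q_new = Chain(Dense(state_dim => 16, relu), Dense(16 => n_actions))

states  = rand(Float32, state_dim, 32)   # batch of 32 states
targets = rand(Float32, n_actions, 32)   # placeholder TD targets for the sketch

# Explicit gradient and optimiser update rather than Flux.train!.
opt_state = Flux.setup(Adam(1f-3), q_new)
grads = Flux.gradient(q_new) do m
    Flux.mse(m(states), targets)
end
Flux.update!(opt_state, q_new, grads[1])
```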

[1] Riedmiller, M. (2005). Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. Lecture Notes in Computer Science, vol. 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_32

CasBex (Contributor, Author) commented on Sep 25, 2023

Unrelated, but during testing I tried to work with the `EpisodeSampleRatioController` and didn't manage to get it working: it complains that `NamedTuple` has no field `terminal`. I didn't find any examples of its use in this repo or in RLTrajectories, so it may be worthwhile to pursue this in another issue.

HenriDeh (Member) left a comment


Good changes; this implements NFQ in a more idiomatic way that better reflects RL.jl's design.
I only have one question (see below).

HenriDeh (Member) commented

> Unrelated, but during testing I tried to work with the `EpisodeSampleRatioController` and didn't manage to get it working: it complains that `NamedTuple` has no field `terminal`. I didn't find any examples of its use in this repo or in RLTrajectories, so it may be worthwhile to pursue this in another issue.

Yes, it's new and unused at the moment. Can you share a stack trace? I'll open a PR to fix this.

HenriDeh merged commit dd19ee0 into JuliaReinforcementLearning:main on Sep 28, 2023