
Conversation

@CasBex
Contributor

@CasBex CasBex commented Jun 2, 2023

Fixes #895

PR Checklist

  • Refactor old NFQ code
  • Categorise NFQ
  • Verify NFQ with example
  • Code review

@CasBex
Contributor Author

CasBex commented Jun 2, 2023

5e6f09d was functional in a separate Julia project where ReinforcementLearning.jl was simply imported (for reference). The refactor in c423048 is untested.

NFQ is meant to be used together with QBasedPolicy. During an episode, action selection happens exactly as in any Q-based policy; the sole difference is that the optimise! step should never happen during an episode, only after each episode / at the start of a new episode.

  1. I saw no test code anywhere in the repo. Am I supposed to add tests to the repo or keep them separate? Is there an example somewhere that uses the refactored library?
  2. I don't know what the appropriate location is since it's not quite a DQN algorithm but also not fully offline, feel free to move the files as appropriate.

PS: I am not very familiar with Flux, so any comments on that side are welcome as well.

@CasBex CasBex marked this pull request as draft June 2, 2023 13:43
@CasBex CasBex changed the title from "Draft: NFQ" to "NFQ" Jun 2, 2023
@codecov

codecov bot commented Jun 2, 2023

Codecov Report

Merging #897 (c43f37a) into main (6de371f) will decrease coverage by 24.36%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##             main    #897       +/-   ##
==========================================
- Coverage   24.36%   0.01%   -24.36%     
==========================================
  Files         219     209       -10     
  Lines        7711    7412      -299     
==========================================
- Hits         1879       1     -1878     
- Misses       5832    7411     +1579     
Impacted Files                                           Coverage Δ
...xperiments/experiments/DQN/JuliaRL_NFQ_CartPole.jl    0.00% <0.00%> (ø)
...xperiments/src/ReinforcementLearningExperiments.jl    0.00% <ø> (-100.00%) ⬇️
...einforcementLearningZoo/src/algorithms/dqns/NFQ.jl    0.00% <0.00%> (ø)

... and 63 files with indirect coverage changes

@HenriDeh
Member

HenriDeh commented Jun 3, 2023

  1. I saw no test code anywhere in the repo. Am I supposed to add tests to the repo or keep them separate? Is there an example somewhere that uses the refactored library?

Algorithms are tested by implementing a functioning example experiment in the RLExperiments package. This is also a great way to document your algorithm, as people can see how to build a working agent with the correct trajectory; a skeleton is sketched below. If you create some functions such as a loss, you can also add tests to check that they behave as expected.

I don't know what the appropriate location is since it's not quite a DQN algorithm but also not fully offline, feel free to move the files as appropriate.

I think the dqns directory is fine.

I am not very familiar with Flux so any comments on that side are welcome as well

I can review your implementation; just ping me when it is ready for review, or if you have a question.

@CasBex CasBex marked this pull request as ready for review June 16, 2023 13:32
@CasBex
Copy link
Contributor Author

CasBex commented Jun 16, 2023

As far as I'm concerned, this PR is ready for review. The results of the test are attached.

The provided example performs 500 random steps and then stops exploration. Contrary to the setup in the original paper [1], the example retrains NFQ at the end of every episode.

I noticed while tuning the example that NFQ is somewhat sensitive to the batch size. If the batch size is too small, performance oscillates (though in some episodes NFQ still obtains the maximum score). I think this is caused by 'bad luck' with the sample, since a single optimise! call can drastically change NFQ's behavior. A larger batch size results in more consistent performance, so I chose a batch size equal to the number of stored steps (sketched below).

[1] Riedmiller, M. (2005). Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. Lecture Notes in Computer Science, vol. 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_32

[Attached figure: JuliaRL_NFQ_CartPole test results]

Member

@HenriDeh HenriDeh left a comment


Just that last little change and then we're good to merge, I think. Thanks for the PR; it helped us clean up the learner shenanigans a bit.

@CasBex
Contributor Author

CasBex commented Jun 23, 2023

I found that the NFQ docs were also a bit outdated, so I'm updating those as well. Give me some time to get this in order, because my git history is a bit messed up.

@CasBex
Contributor Author

CasBex commented Jun 23, 2023

@HenriDeh It should be done now; thanks for helping me with this.

HenriDeh previously approved these changes Jun 23, 2023
@HenriDeh
Member

I'm letting you merge in case you have a last-minute tweak to do.

@CasBex
Contributor Author

CasBex commented Jun 26, 2023

I don't have anything more to add; this branch is ready. I don't have write permissions on main, so I'll let you do the honors.

@HenriDeh HenriDeh self-requested a review June 26, 2023 12:46
@HenriDeh HenriDeh enabled auto-merge (squash) June 26, 2023 13:12
@HenriDeh HenriDeh merged commit 72d6766 into JuliaReinforcementLearning:main Jun 26, 2023
@CasBex CasBex mentioned this pull request Sep 25, 2023