NFQ #897
Conversation
5e6f09d was functional in a separate Julia project where ReinforcementLearning.jl was simply imported (for reference). Refactor c423048 is untested. NFQ is meant to be used together with
PS: I am not very familiar with Flux, so any comments on that side are welcome as well.
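For context on the Flux side, here is a minimal sketch, not this PR's code, of the kind of Q-network and supervised fitting step that NFQ relies on. It assumes a recent Flux release with the explicit-gradient API; the network sizes (4-dimensional CartPole state, 2 actions) and all names are illustrative.

```julia
using Flux

# small Q-network: 4 state dimensions in, one Q-value per action out
q_network = Chain(
    Dense(4 => 24, relu),
    Dense(24 => 24, relu),
    Dense(24 => 2),
)

opt_state = Flux.setup(Adam(1f-3), q_network)

# one supervised step: `states` is a 4×B matrix, `targets` a 2×B matrix of Q-targets
function fit_step!(q_network, opt_state, states, targets)
    grads = Flux.gradient(q_network) do m
        Flux.mse(m(states), targets)
    end
    Flux.update!(opt_state, q_network, grads[1])
end
```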
Codecov Report
@@            Coverage Diff             @@
##             main     #897       +/-   ##
==========================================
- Coverage   24.36%    0.01%   -24.36%
==========================================
  Files         219      209       -10
  Lines        7711     7412      -299
==========================================
- Hits         1879        1     -1878
- Misses       5832     7411     +1579
Algorithms are tested by implementing a functioning example experiment in the RLExperiments package. This is also a great way to document your algorithm, as people can see how to build a working agent with the correct trajectory. If you create some functions, such as a loss, you can also add tests to check that they behave as expected.
I think the
I can review your implementation; just ping me when it is ready to be reviewed, or if you have a question.
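In the spirit of the testing suggestion above, a toy sketch of such a test using the standard Test stdlib; `nfq_loss` is a hypothetical helper for illustration, not a function from this PR.

```julia
using Test, Flux

# illustrative loss helper: mean squared error between predicted and target Q-values
nfq_loss(model, states, targets) = Flux.mse(model(states), targets)

@testset "nfq_loss behaves as expected" begin
    model = Chain(Dense(4 => 8, relu), Dense(8 => 2))
    states = rand(Float32, 4, 16)
    targets = model(states)                       # perfect targets ⇒ zero loss
    @test nfq_loss(model, states, targets) ≈ 0f0 atol = 1f-6
    @test nfq_loss(model, states, targets .+ 1f0) > 0f0
end
```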
As far as I'm concerned, this PR is ready for review. The results of the test are attached. The provided example performs 500 random steps and then stops exploration. Contrary to the setup in the original paper [1], the example retrains NFQ at the end of every episode. I noticed while tuning the example that NFQ is somewhat sensitive to the batch size. If the batch size is too small, performance oscillates (but in some episodes NFQ still obtains the maximum score). I think that could be caused by having 'bad luck' with the sample, since one

[1] Riedmiller, M. (2005). Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. Lecture Notes in Computer Science, vol 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_32
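As a rough illustration of the episode-end retraining scheme described above (not this PR's implementation): fitted-Q targets are recomputed over all stored transitions and the network is refitted in mini-batches of the given batch size. The `buffer` field names, hyperparameters, and helper name are assumptions for the sketch.

```julia
using Flux, Random

# assumes `buffer` holds column-major states/next_states, integer actions,
# rewards, and terminal flags; all names and defaults here are illustrative
function retrain!(q_network, opt_state, buffer; γ = 0.99f0, batch_size = 128, epochs = 10)
    s, a, r, s′, done = buffer.states, buffer.actions, buffer.rewards,
                        buffer.next_states, buffer.terminals
    # fitted-Q targets: y = r + γ (1 - done) maxₐ Q(s′, a), recomputed once per retraining pass
    y = r .+ γ .* (1f0 .- done) .* vec(maximum(q_network(s′); dims = 1))
    # one-hot mask to pick out the Q-value of the action actually taken
    action_mask = Float32.(Flux.onehotbatch(a, 1:size(q_network(s), 1)))
    for _ in 1:epochs, idx in Iterators.partition(shuffle(1:length(r)), batch_size)
        grads = Flux.gradient(q_network) do m
            q_taken = vec(sum(m(s[:, idx]) .* action_mask[:, idx]; dims = 1))
            Flux.mse(q_taken, y[idx])
        end
        Flux.update!(opt_state, q_network, grads[1])
    end
end
```

A too-small `batch_size` here would mean each update sees only a few transitions, which is one plausible source of the oscillating performance mentioned above.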
src/ReinforcementLearningExperiments/deps/experiments/experiments/DQN/JuliaRL_NFQ_CartPole.jl
Co-authored-by: Henri Dehaybe <[email protected]>
HenriDeh left a comment
Just that last little change, then we're good to merge, I think. Thanks for the PR, it helped us clean up the learner shenanigans a bit.
I found that the NFQ docs were also a bit outdated, so I'm updating those as well. Give me some time to get this in order, because my git history is a bit messed up.
@HenriDeh It should be done now, thanks for helping me with this.
I'm letting you merge in case you have a last-minute tweak to do.
I don't have anything more to add; this branch is ready. I don't have write permissions to main, so I'll let you do the honors.

Fixes #895
PR Checklist