Implement mixed hmc #826
|
(moving conversation from slack)
I think a nice interface to specify this would be to follow the pattern we used for parallel enumeration, where a user could specify an infer config like |
|
@fritzo I think it is doable without much change in the implementation. Based on that "infer" information, we can maintain 3 behaviors for a discrete D-size variable
|
|
Hi @StannisZhou, I'm not sure why Another issue is: in the Bernoulli(0.8) case, if we take 2 discrete steps, then when we start at the discrete value 1, we will never reach the discrete value 0, because if we accept the proposal in the first step, the discrete kinetic energy increases by an amount that always allows us to accept the proposal in the second step. So the chain will get stuck at value |
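The periodicity described in this comment can be reproduced with a minimal sketch. It is not the NumPyro implementation: it assumes a single Bernoulli(0.8) variable, a deterministic flip proposal, a kinetic energy drawn from Exp(1) once per MCMC step, no continuous variables, and no final MH correction. The names `discrete_step`, `mcmc_step`, and `run_chain` are hypothetical.

```python
import math
import random

# Log-probabilities of Bernoulli(0.8): p(0) = 0.2, p(1) = 0.8.
LOG_P = {0: math.log(0.2), 1: math.log(0.8)}


def discrete_step(x, k):
    """Propose flipping x; accept only if the kinetic energy k covers the cost.

    Returns the (possibly unchanged) state and the updated kinetic energy.
    """
    x_new = 1 - x
    # Potential energy change U(x_new) - U(x), with U = -log p.
    delta_e = LOG_P[x] - LOG_P[x_new]
    if k > delta_e:
        # Kinetic energy is updated, not reset, after an accepted move.
        return x_new, k - delta_e
    return x, k


def mcmc_step(x, rng, num_discrete_steps):
    """One MCMC step: draw kinetic energy (assumed Exp(1)), then do the discrete updates."""
    k = rng.expovariate(1.0)
    for _ in range(num_discrete_steps):
        x, k = discrete_step(x, k)
    return x


def run_chain(x0, num_samples, num_discrete_steps, seed=0):
    """Run the chain from x0 and return the list of visited states."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(num_samples):
        x = mcmc_step(x, rng, num_discrete_steps)
        samples.append(x)
    return samples


if __name__ == "__main__":
    two_steps = run_chain(1, 1000, num_discrete_steps=2)
    one_step = run_chain(1, 20000, num_discrete_steps=1)
    print("distinct values with 2 discrete steps:", set(two_steps))
    print("sample mean with 1 discrete step:", sum(one_step) / len(one_step))
```

Under these assumptions, two discrete steps always return the chain to its starting value: either the first flip is rejected and the second is rejected with the same energy budget, or the first flip is accepted and the flip back is always affordable. With a single discrete step per MCMC step, the same update satisfies detailed balance for Bernoulli(0.8), so the sample mean should land near 0.8.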
|
Hi @fehiepsi , thanks for bringing up this very important issue. I still need to look a bit more into this, but a few points I know at the moment:
|
Agreed! The formula About the tests, I believe if I add an independent continuous variable to the model that has exact HMC dynamics, the tests will fail for the same reason. On the other hand, tests are also failing because of the periodicity that I mentioned in my last comment: if we use the scheme (with/without the final MH correction) for Bernoulli(0.8) with 2 discrete updates per MCMC step, the chain will get stuck at value 1. About a simpler formulation, we mimicked the scheme in your paper but separated the HMC Hamiltonian dynamics from the discrete Hamiltonian dynamics. In other words, we treat the continuous variable as a discrete variable and the HMC dynamics with MH proposal as a Gibbs proposal (i.e. Delta_E = 0). That way, we don't need the final MH correction, and we can use the full NUTS update each time we want to update the continuous variable. We are still polishing the mass adaptation scheme (which controls the speed of updates) before delivering it to users. Without mass adaptation, the scheme will be similar to a random-scan Gibbs sampler, as you mentioned. |
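The "Delta_E = 0" remark for a Gibbs proposal can be spelled out in a short derivation. The notation is an assumption and does not come from this thread: $\pi$ is the target density, $U = -\log \pi$, $Q$ is the proposal for coordinate $i$, and $x_{-i}$ denotes all other coordinates. The energy difference is taken to include the proposal ratio, as in an MH acceptance ratio.

```latex
\begin{aligned}
\Delta E &= U(\tilde{x}) - U(x) + \log Q(\tilde{x} \mid x) - \log Q(x \mid \tilde{x}), \\
\text{Gibbs proposal: } Q(\tilde{x} \mid x) &= \pi(\tilde{x}_i \mid x_{-i}),
\qquad \pi(x) = \pi(x_i \mid x_{-i})\,\pi(x_{-i}), \\
\Delta E &= -\log \pi(\tilde{x}_i \mid x_{-i}) + \log \pi(x_i \mid x_{-i})
          + \log \pi(\tilde{x}_i \mid x_{-i}) - \log \pi(x_i \mid x_{-i}) = 0 .
\end{aligned}
```

With $\Delta E = 0$, any positive kinetic energy is enough to accept the move, and the kinetic energy is left unchanged by the update.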
|
Agreed about the periodicity case. I basically made a technical assumption when proving correctness. Your comments led me to think there might be mistakes in my cancellation reasoning in the proof, which led to the seemingly incorrect final correction. I'm looking into this point. I see your point about taking the continuous part as a discrete variable. That's very interesting, and sounds similar to but slightly different from what I'm thinking (and what I had in the paper). To make sure I understand, you specify an auxiliary kinetic energy for the continuous part of the system, then after each continuous Hamiltonian dynamics step you calculate the energy difference and decide whether to accept or reject based on whether you have enough kinetic energy? |
|
Yes, that is similar to a sequence of Gibbs updates in MH-within-Gibbs, but updating (not resetting) the kinetic energy each time we update a variable, as in your paper. We just found that the scheme makes it easier to adjust the selection probability for which variable we want to update each time (e.g. some variables we want to update more often than others). There are other ways to adjust the selection probability, but the scheme in your paper is more intuitive to me. In NUTS (and HMC with MH correction too), the acceptance ratio (after all internal MH corrections in HMC/NUTS) will always be 1, so having an auxiliary kinetic energy is not important. But it is good to use your scheme (with constant kinetic energy) to adjust the speed, so we can adapt how often we perform a NUTS update in an MCMC step. |
|
So you are indeed making an MH correction after each continuous HMC update. I don't think that's necessarily the best scheme in the HMC case, but for NUTS it does make more sense. |
|
Updated: by incorporating the updated version from @StannisZhou, tests pass now. The new formula has the last acceptance ratio = 1 when the continuous HMC dynamics are exact, so it looks more reasonable than before. |
|
Thanks @fehiepsi ! I'm attaching the updated algorithm as reference while I'm working to fix the MH term in the paper. |
|
Update: I have fixed the arXiv version of the paper and the code on GitHub. I additionally included a script to showcase sampling with different proposal distributions (e.g. Gibbs proposals, which were invalid with the previous MH correction). A brief postmortem:
Also see the erratum for more details and some additional experiments on the performance of mixed HMC with different discrete proposals. |
|
great to hear. thanks @StannisZhou for following up! |
|
Thanks for the review and feedback, @martinjankowiak and @StannisZhou! |

Resolves #785.
Tasks
- [ ] Mass adaptation