Hi, thanks for the work.
I've just noticed that the code does not do what the paper describes.
For instance, the smooth L1 loss for the key-point locations is not the same. In the paper, only the m predicted labels contribute to the loss. In the code, since proj_label is set to all zeros except at the 9 * m locations around the m key points, every location contributes to the loss.
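To make the difference concrete, here is a toy sketch of the two behaviours as I understand them. This is not the repo's code: the function names, the `mask` list, the flat 1-D layout, and the sum reduction are all my own assumptions for illustration.

```python
def smooth_l1(x):
    """Smooth L1 for a single residual (quadratic below 1, linear above)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def loss_all_locations(pred, label):
    # What the code seems to do: every location is compared against the
    # label, which is zero wherever no key point was written.
    return sum(smooth_l1(p - t) for p, t in zip(pred, label))


def loss_keypoints_only(pred, label, mask):
    # What the paper seems to describe: only labelled locations contribute.
    return sum(smooth_l1(p - t) for p, t, m in zip(pred, label, mask) if m)


# Toy data: 6 locations, of which only indices 1 and 5 are labelled.
pred = [0.5, 2.0, 0.0, 3.0, 0.2, 1.0]
label = [0.0, 2.5, 0.0, 0.0, 0.0, 1.0]  # zeros everywhere else
mask = [0, 1, 0, 0, 0, 1]               # hypothetical key-point mask

full = loss_all_locations(pred, label)
masked = loss_keypoints_only(pred, label, mask)
print(full, masked)
```

In this toy case the unmasked loss is much larger than the masked one, because the non-zero predictions at unlabelled locations (e.g. the 3.0 at index 3) are pulled towards zero.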
Can anyone explain this to me?