In the paper, it is stated that the temperature T has to be a positive number. In the code, however, although the temperature is initialized with a positive value (i.e., self.temperature = nn.Parameter(torch.ones(1) * 1.5)), nothing in .set_temperature() appears to constrain the temperature to stay positive during optimization.
Did I miss something? Or can it be mathematically proven that the gradient will never push the temperature negative as long as it is initialized to a positive value? If not, should the code enforce positivity somehow? (Note that something like self.temperature = nn.Parameter(torch.ones(1) * 1.5) ** 2 would not work: squaring only affects the initial value, not subsequent optimizer updates, and the result of the squaring would be a plain tensor rather than a Parameter. The constraint would have to be applied wherever the temperature is used.)
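For illustration, here is a minimal sketch of one common way to guarantee positivity: optimize the log of the temperature as the free parameter and exponentiate it at use time, so the effective temperature is always strictly positive no matter what values the optimizer visits. (The class name TemperatureScaler and its interface are hypothetical, not part of the repository's code.)

```python
import math

import torch
import torch.nn as nn


class TemperatureScaler(nn.Module):
    """Hypothetical sketch: parameterize T via its log so T = exp(log_T) > 0."""

    def __init__(self, init_temperature: float = 1.5):
        super().__init__()
        # The optimizer updates log_temperature freely over all reals;
        # exp() maps any real value back to a strictly positive temperature.
        self.log_temperature = nn.Parameter(
            torch.tensor([math.log(init_temperature)])
        )

    @property
    def temperature(self) -> torch.Tensor:
        return self.log_temperature.exp()

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # Scale logits by the (always positive) temperature.
        return logits / self.temperature
```

Even if an optimizer step drives log_temperature to a large negative value, the effective temperature merely approaches zero from above and never becomes negative. A softplus mapping would work similarly; the trade-off versus leaving the parameter unconstrained is a slightly different optimization landscape.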