Gaussian Process tutorial #650
Conversation
This looks great, and deserves more than a tutorial 😄 I think we should add GPRegressor as a first-class part of Pyro. Would you be willing to:

- Create a PR that adds a `pyro.contrib.gp.GPRegressor`, say in a new file `pyro/contrib/gp.py`, including a couple of tests? I suggest basing this off of @dwd31415's `MultivariateNormal` in #651, since that PR already adds tests and deals with `torch.potri` in a safe way.
- Simplify this PR to `from pyro.contrib.gp import GPRegressor`.

(EDIT: changed pyro.nn to pyro.contrib)
If you separate the …
@fritzo Thank you for your suggestions!
@fritzo: I am on vacation and will return in a week. Your suggestion works for me; I will follow it when I get back home.
@fritzo I have updated the tutorial after moving the GPR model and RBF kernel to pyro.contrib.gp.
fritzo left a comment:
Looks great! Thanks for factoring out the gp library, I think it will be useful.
pyro/contrib/gp/kernels/kernel.py (outdated):

```python
Calculate covariance matrix of inputs on active dimensionals.

:param torch.autograd.Variable X: A 2D tensor of size `N x input_dim`.
:param torch.autograd.Variable X2: A 2D tensor of size `N x input_dim`.
```
replace X2 with Z here and below
pyro/contrib/gp/kernels/rbf.py (outdated):

```python
class RBF(Kernel):
    """
    Implementation of RBF kernel.
```
Could you expand this to "Radial Basis Function kernel"?
.gitignore (outdated):

```diff
@@ -1,3 +1,4 @@
+# temp
```
nit: remove this line
pyro/contrib/gp/models/gpr.py (outdated):

```python
kernel_fn = pyro.random_module(self.kernel.name, self.kernel, self.priors)
kernel = kernel_fn()
K = kernel.K(self.X) + self.noise.repeat(self.input_dim).diag()
zero_loc = Variable(torch.zeros(self.input_dim).type_as(K.data))
```
I think to get correct GPU device placement you'll need to

```python
zero_loc = K.new([0]).expand(self.input_dim)
```
I see no problem with using the type_as method (in PyTorch 0.3). However, I only have 1 GPU to test on. Did you mean that this will fail for multi-GPU?
Yes, K.new(...) ensures that the result is on the same GPU as K, whereas I believe .type_as() only ensures that the result is on some GPU. I only learned this recently, and we still have a few bugs in Pyro relating to GPU placement.
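A small sketch of the distinction being made, assuming at least two visible GPUs (this uses the Variable-era PyTorch 0.3 tensor API from the thread; the variable names are illustrative):

```python
import torch

# K lives on GPU 1
K = torch.zeros(3, 3).cuda(1)

# K.new(...) allocates on the same device as K, i.e. GPU 1
zero_loc = K.new([0]).expand(3)

# .type_as(K) matches K's tensor type (torch.cuda.FloatTensor), but the
# result may land on the current default GPU (GPU 0) rather than K's device
maybe_wrong_device = torch.zeros(3).type_as(K)
```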
Now I try to use .new() whenever I create a new tensor.
Aha, thanks a lot for the tip! Will change soon.
@karalets Could you please review the high-level approach? I've already reviewed the software details.
```python
def __init__(self, variance=torch.ones(1), lengthscale=torch.ones(1), active_dims=None, name="RBF"):
    super(RBF, self).__init__(active_dims=active_dims, name=name)
    self.variance = Parameter(variance)
    self.lengthscale = Parameter(lengthscale)
```
Need to add a description of how lengthscale works. Typically you have one independent lengthscale per dimension, but you are assuming that lengthscales across all dimensions are the same(?)
@ysaatchi You are right (originally, I wrote this module for 1-dimensional data). We need to refactor it a bit. I will solve it by introducing an input_dim parameter (similar to GPy or GPflow) so we can set the correct shape for the initial lengthscale.
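A minimal sketch of the per-dimension (ARD) lengthscale idea, assuming the proposed input_dim argument; the class name and shapes here are illustrative, not the PR's final API:

```python
import torch
from torch.nn import Module, Parameter

class ARDRBF(Module):
    """Sketch: RBF kernel with one independent lengthscale per input dimension."""

    def __init__(self, input_dim, variance=None, lengthscale=None):
        super(ARDRBF, self).__init__()
        self.input_dim = input_dim
        self.variance = Parameter(variance if variance is not None else torch.ones(1))
        # one lengthscale per dimension instead of a single shared scalar
        self.lengthscale = Parameter(lengthscale if lengthscale is not None
                                     else torch.ones(input_dim))

    def K(self, X, Z=None):
        # X: N x input_dim, Z: M x input_dim
        if Z is None:
            Z = X
        X = X / self.lengthscale  # broadcast: scales each column independently
        Z = Z / self.lengthscale
        d2 = (X ** 2).sum(1, keepdim=True) - 2 * X.matmul(Z.t()) + (Z ** 2).sum(1)
        return self.variance * torch.exp(-0.5 * d2)
```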
```python
self.kernel = kernel
# TODO: define noise as a nn.Module, so we can train/set prior to it
if noise is None:
    self.noise = Variable(X.data.new([1]))
```
What does this do? Needs a comment
Noise plays the role of a Gaussian likelihood. I intend to add Likelihood and Mean function modules in a future pull request, so temporarily I let it be constant. See #681 for the plan I have in mind. This pull request serves the purpose of simplifying the original tutorial code.
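A rough sketch of the direction described above (a Gaussian likelihood module carrying its own trainable noise instead of a constant); this is an assumption about the future design, not code from this PR:

```python
import torch
from torch.nn import Module, Parameter

class Gaussian(Module):
    """Sketch: Gaussian likelihood whose noise variance is a trainable parameter."""

    def __init__(self, variance=None):
        super(Gaussian, self).__init__()
        self.variance = Parameter(variance if variance is not None else torch.ones(1))

    def forward(self, K):
        # add the observation-noise variance to the diagonal of the covariance
        n = K.size(0)
        return K + self.variance.expand(n).diag()
```

Because such a likelihood is an nn.Module, pyro.random_module could lift its variance with a prior, exactly as is already done for kernel parameters.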
pyro/contrib/gp/models/gpr.py (outdated):

```python
self.y = y
self.input_dim = X.size(0)
self.kernel = kernel
# TODO: define noise as a nn.Module, so we can train/set prior to it
```
Not the best idea; you should define noise as a likelihood with its own hyperparameters and optimize it that way. In general we need to support arbitrary likelihoods for the GP, so defining them at this early stage will be very helpful.
Yes, I agree!
```python
zero_loc = Variable(K.data.new([0]).expand(self.input_dim))
pyro.sample("f", dist.MultivariateNormal(zero_loc, K), obs=self.y)

def guide(self):
```
What is the purpose of the guide in this context? It seems like you are doing inference in forward(), so what is the point of the guide?
Originally, I wanted to put some constraints on the parameters, but that requires writing a wrapper around nn.Parameter (to support transform methods). Then I found that setting priors and using a guide for MAP inference might be a simpler idea. Do you see a better way of using the guide?
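A condensed sketch of that pattern, mirroring the model code quoted above: priors are lifted with pyro.random_module, and the guide replaces each prior with a point mass at a learnable location, so optimizing the ELBO with SVI performs MAP. This assumes the MultivariateNormal from #651 and the Variable-era Pyro API used in this PR; the gpr attributes follow the quoted snippets, and the Delta constructor may differ across Pyro versions:

```python
import torch
from torch.autograd import Variable
import pyro
import pyro.distributions as dist

def model(gpr):
    # lift kernel parameters to random variables using the registered priors
    kernel_fn = pyro.random_module(gpr.kernel.name, gpr.kernel, gpr.priors)
    kernel = kernel_fn()
    K = kernel.K(gpr.X) + gpr.noise.repeat(gpr.input_dim).diag()
    zero_loc = Variable(K.data.new([0]).expand(gpr.input_dim))
    pyro.sample("f", dist.MultivariateNormal(zero_loc, K), obs=gpr.y)

def guide(gpr):
    # MAP: swap each prior for a Delta at a learnable point estimate
    point_estimates = {}
    for name in gpr.priors:
        p = pyro.param(name + "_MAP", Variable(torch.ones(1), requires_grad=True))
        point_estimates[name] = dist.Delta(p)
    kernel_fn = pyro.random_module(gpr.kernel.name, gpr.kernel, point_estimates)
    return kernel_fn()
```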
pyro/contrib/gp/models/gpr.py (outdated):

```python
K_zx = K_xz.t()
K_zz = kernel(Z)
loc = K_zx.matmul(self.y.gesv(K)[0]).squeeze(1)
covariance_matrix = K_zz - K_zx.matmul(K_xz.gesv(K)[0])
```
This is very inefficient (calling gesv twice on an NxN matrix); see the GPML book (Gaussian Processes for Machine Learning) for the correct pseudocode for doing this.
Out of interest, does gesv play well with autograd -- quite cool if so :)
@ysaatchi Originally, I put noise as a hyperparameter and used Cholesky decomposition. Then, when the noise was small, the LAPACK error "the leading minor of order ... is not positive definite" annoyed me. In addition, I found torch.trtrs to be unstable (pytorch/pytorch#4296), so I used gesv instead. Of course, these problems might have come from bugs in my code at that time. Anyway, gesv might not be a good approach, so I will use Cholesky decomposition again.

P.S.: gesv supports autograd, but does not support batching yet. :)
ysaatchi left a comment:
Looks like a good start. I have some major questions at this stage; once clarified, we can move forward.
@fehiepsi We plan to merge this early next week. We're about to do a big refactoring and I want to make sure your PR can come along for the ride.
@fritzo @ysaatchi I have made several changes reflecting your reviews. For the change from …
@ysaatchi I made a benchmark comparing the following three methods on 1000x1000 matrices:

```python
def method1(L, K):
    return K.t().matmul(K.potrs(L, upper=False))

def method2(L, K):
    v = K.trtrs(L, upper=False)[0]
    return v.t().matmul(v)

def method3(L, K):
    v = L.inverse().matmul(K)
    return v.t().matmul(v)
```

On my CPU: method2 is fastest (10ms), method1 is slower (13ms), method3 is slowest (20ms). Edited: …
I believe trtrs lacks CUDA support, so we'll need something like

```python
def _kernel_norm(L, K):
    if L.is_cuda:
        # work around lack of CUDA support for trtrs
        v = L.inverse().matmul(K)
    else:
        v = K.trtrs(L, upper=False)[0]
    return v.t().matmul(v)
```
@fritzo I think that calling …
Hi @fehiepsi I'm going to merge this now so it can follow our refactoring work, but feel free to keep submitting updates in follow-up PRs. Thanks for contributing this!
@fehiepsi Is it ok if I submit a PR to clean up the language/organization of this tutorial to make it a bit easier to follow?
@martinjankowiak Sure, that would be great! Given that the GP API is stable now (the current PRs do not affect the API), it is a good time to revise it. I am happy to see the changes from you.
This pull request is a WIP tutorial on how to create a (simple) GP regression model using Pyro. More explanations, comments, plots, etc. should definitely be added. Any comments are really helpful for me to complete it.

- …Distribution (skip, as suggested by @fritzo)
- …random_module primitive (skip)
- model-guide + SVI approach (skip)
- pyro.contrib.gp
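For orientation, a hypothetical end-to-end sketch of how the resulting pyro.contrib.gp pieces fit together in the tutorial; the class names GPRegression and RBF and the module layout follow the file paths reviewed above, but the exact signatures are assumptions, not a verbatim excerpt:

```python
import torch
from torch.autograd import Variable
from pyro.contrib.gp.kernels import RBF          # kernels/rbf.py above
from pyro.contrib.gp.models import GPRegression  # models/gpr.py above (assumed name)

# toy 1D regression data
X = Variable(torch.linspace(-1, 1, 20).unsqueeze(1))
y = Variable(torch.sin(3 * X.data).squeeze(1))

gpr = GPRegression(X, y, kernel=RBF(input_dim=1), noise=Variable(torch.ones(1)))

# fit hyperparameters with SVI using gpr's model/guide pair (MAP, as discussed),
# then predict at new inputs Z
Z = Variable(torch.linspace(-1, 1, 50).unsqueeze(1))
loc, cov = gpr(Z)
```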