
Conversation

@fehiepsi
Member

@fehiepsi fehiepsi commented Dec 22, 2017

This pull request is a WIP tutorial on how to create a (simple) GP regression model using Pyro. It definitely still needs more explanations, comments, plots, etc. Any comments would be really helpful for me to complete it.

  • Add an introduction, formulas, remarks, comments to code.
  • Explain how to define a Distribution. (skip as suggested by @fritzo)
  • Explain how to use random_module primitive. (skip)
  • Explain Pyro's model-guide + SVI approach. (skip)
  • Implement MAP inference.
  • Simplify the tutorial after moving some parts to pyro.contrib.gp.

@fritzo
Member

fritzo commented Dec 22, 2017

cc @ysaatchi @karalets

Member

@fritzo fritzo left a comment


This looks great, and deserves more than a tutorial 😄 I think we should add GPRegressor as a first-class part of Pyro. Would you be willing to:

  1. Create a PR that adds a pyro.contrib.gp.GPRegressor, say in a new file pyro/contrib/gp.py, including a couple of tests? I suggest basing this off of @dwd31415's MultivariateNormal in #651, since that PR already adds tests and deals with torch.potri in a safe way.
  2. Simplify this PR to from pyro.contrib.gp import GPRegressor.

(EDIT changed pyro.nn to pyro.contrib)

@fritzo
Member

fritzo commented Dec 26, 2017

If you separate the MultivariateNormalTri and GPRegressor into a separate PR, then you can avoid explaining some of the coding details in this PR (Distributions, nn.random_module) and focus more attention on the interesting parts of Gaussian Processes.

@fehiepsi
Member Author

fehiepsi commented Dec 27, 2017

@fritzo Thank you for your suggestions! contrib.gp is a good place to start building a skeleton for GP. I will try to finish this tutorial first (without explaining implementation details; it seems that the remaining work is to add MAP inference). After my vacation (next week), I will focus on contrib.gp to see how far it can go. :)

@fritzo
Member

fritzo commented Jan 3, 2018

@fehiepsi If it's easier for you, it would be fine to split this up into two PRs by:

  1. update this tutorial to use @dwd31415's MultivariateNormal, which is now merged;
  2. merge this PR;
  3. factor out GaussianProcess into pyro.contrib.gp in a second PR.

@fehiepsi
Member Author

fehiepsi commented Jan 3, 2018

@fritzo: I am on vacation and will return in a week. Your suggestion works for me; I will follow it when I get back home.

@fehiepsi
Member Author

@fritzo I have updated the tutorial after moving the GPR model and RBF kernel to pyro.contrib.gp and using the MultivariateNormal distribution that is now implemented.

@fehiepsi fehiepsi changed the title [WIP] Gaussian Process tutorial Gaussian Process tutorial Jan 15, 2018
Member

@fritzo fritzo left a comment


Looks great! Thanks for factoring out the gp library, I think it will be useful.

Calculate the covariance matrix of inputs on active dimensions.
:param torch.autograd.Variable X: A 2D tensor of size `N x input_dim`.
:param torch.autograd.Variable X2: A 2D tensor of size `N x input_dim`.
Member

replace X2 with Z here and below


class RBF(Kernel):
"""
Implementation of RBF kernel.
Member

Could you expand this to "Radial Basis Function kernel"?
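
For readers following along, the Radial Basis Function (squared exponential) kernel under discussion is k(x, z) = variance * exp(-||x - z||^2 / (2 * lengthscale^2)). A minimal standalone sketch of that computation (not the pyro.contrib.gp implementation) might look like:

import torch

def rbf_kernel(X, Z, variance=1.0, lengthscale=1.0):
    # minimal RBF sketch: k(x, z) = variance * exp(-||x - z||^2 / (2 * lengthscale^2))
    # X is N x D, Z is M x D; returns the N x M covariance matrix
    X = X / lengthscale
    Z = Z / lengthscale
    # squared Euclidean distances between all pairs of rows
    sq_dist = (X ** 2).sum(1, keepdim=True) + (Z ** 2).sum(1, keepdim=True).t() \
        - 2.0 * X.matmul(Z.t())
    return variance * torch.exp(-0.5 * sq_dist.clamp(min=0))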

.gitignore Outdated
@@ -1,3 +1,4 @@
# temp
Member

nit: remove this line

kernel_fn = pyro.random_module(self.kernel.name, self.kernel, self.priors)
kernel = kernel_fn()
K = kernel.K(self.X) + self.noise.repeat(self.input_dim).diag()
zero_loc = Variable(torch.zeros(self.input_dim).type_as(K.data))
Member

I think to get correct GPU device placement you'll need to

zero_loc = K.new([0]).expand(self.input_dim)

Member Author

I see no problem with using the type_as method (in PyTorch 0.3). However, I only have 1 GPU to test with. Did you mean that this will fail for multi-GPU setups?

Member

Yes, K.new(...) ensures that the result is on the same GPU as K, whereas I believe .type_as() only ensures that the result is on some GPU. I only learned this recently, and we still have a few bugs in Pyro relating to GPU placement.
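
A small illustration of the difference (a general PyTorch pattern, not code from this PR): .new(...) allocates on the same device as the source tensor, whereas .type_as() only matches the tensor type, so the result may land on the default GPU instead of K's GPU.

import torch
from torch.autograd import Variable

K = Variable(torch.eye(3))   # in a multi-GPU run, imagine K.data lives on cuda:1
# same type AND same device as K.data:
zero_loc = Variable(K.data.new([0]).expand(K.size(0)))
# same type as K.data, but allocated on the current default device, which may differ:
zero_loc_risky = Variable(torch.zeros(K.size(0)).type_as(K.data))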

Member

Now I try to use .new() whenever I create a new tensor.

Member Author

Aha, thank you a lot for the tip! Will change soon.

@fritzo
Member

fritzo commented Jan 18, 2018

@karalets Could you please review the high-level approach? I've already reviewed the software details.

@eb8680 eb8680 requested review from karalets and ysaatchi January 18, 2018 23:28
def __init__(self, variance=torch.ones(1), lengthscale=torch.ones(1), active_dims=None, name="RBF"):
super(RBF, self).__init__(active_dims=active_dims, name=name)
self.variance = Parameter(variance)
self.lengthscale = Parameter(lengthscale)


Need to add a description of how lengthscale works. Typically you have one independent lengthscale per dimension, but you seem to be assuming that the lengthscales across all dimensions are the same (?)

Member Author

@ysaatchi You are right (originally, I wrote this module only for 1-dimensional data). We need to refactor it a bit. I will solve it by introducing an input_dim parameter (similar to GPy or GPflow) so we can set the correct shape for the initial lengthscale.
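
A rough sketch of that refactor, in the spirit of GPy/GPflow ARD kernels (the input_dim argument and default shapes here are assumptions, not the final contrib.gp API):

import torch
from torch.nn import Parameter

class RBF(Kernel):  # assumes the Kernel base class from this PR
    def __init__(self, input_dim, variance=None, lengthscale=None, active_dims=None, name="RBF"):
        super(RBF, self).__init__(active_dims=active_dims, name=name)
        if variance is None:
            variance = torch.ones(1)
        if lengthscale is None:
            # one independent lengthscale per input dimension (ARD)
            lengthscale = torch.ones(input_dim)
        self.variance = Parameter(variance)
        self.lengthscale = Parameter(lengthscale)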

self.kernel = kernel
# TODO: define noise as a nn.Module, so we can train/set prior to it
if noise is None:
self.noise = Variable(X.data.new([1]))


What does this do? Needs a comment

Member Author

Noise plays the role of the Gaussian likelihood. I intend to add Likelihood and Mean-function modules in a future pull request, so temporarily I let it be constant. See #681 for the plan I have in mind. This pull request serves the purpose of simplifying the original tutorial code.

self.y = y
self.input_dim = X.size(0)
self.kernel = kernel
# TODO: define noise as a nn.Module, so we can train/set prior to it

@ysaatchi ysaatchi Jan 19, 2018


Not the best idea; you should define noise as a likelihood with its own hypers and optimize it that way. In general we need to support arbitrary likelihoods for the GP, so defining them at this early stage will be very helpful.

Member Author

Yes, I agree!
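
For context, the kind of likelihood module being deferred might look roughly like the sketch below (the class name, attributes, and how it would plug into the GP model are assumptions, not the plan in #681):

import torch
import torch.nn as nn
from torch.nn import Parameter

class Gaussian(nn.Module):
    # sketch of a Gaussian likelihood carrying its own trainable noise hyperparameter
    def __init__(self, variance=None, name="Gaussian"):
        super(Gaussian, self).__init__()
        self.name = name
        if variance is None:
            variance = torch.ones(1)
        self.variance = Parameter(variance)

    def forward(self, K):
        # add observation noise to the diagonal of a covariance matrix
        return K + self.variance.expand(K.size(0)).diag()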

zero_loc = Variable(K.data.new([0]).expand(self.input_dim))
pyro.sample("f", dist.MultivariateNormal(zero_loc, K), obs=self.y)

def guide(self):


What is the purpose of the guide in this context? It seems like you are doing inference in forward(), so what is the point of the guide?

Member Author

Originally, I wanted to put some constraints on the parameters, but that would require writing a wrapper around nn.Parameter (to support transform methods). Then I found that setting priors and using the guide for MAP inference might be a simpler idea. Do you see a better way of using the guide?
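
A minimal sketch of that idea, i.e. a guide that learns a point estimate (a Delta distribution) for each kernel parameter that has a prior; the parameter names and initialization below are illustrative assumptions, not the tutorial's exact code:

import pyro
import pyro.distributions as dist
from torch.autograd import Variable

def guide(self):
    guide_priors = {}
    for name, prior in self.priors.items():
        # one learnable point estimate per prior; SVI over this guide performs MAP inference
        p_MAP = pyro.param("{}_MAP".format(name),
                           Variable(prior.sample().data.clone(), requires_grad=True))
        guide_priors[name] = dist.Delta(p_MAP)
    kernel_fn = pyro.random_module(self.kernel.name, self.kernel, guide_priors)
    return kernel_fn()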

K_zx = K_xz.t()
K_zz = kernel(Z)
loc = K_zx.matmul(self.y.gesv(K)[0]).squeeze(1)
covariance_matrix = K_zz - K_zx.matmul(K_xz.gesv(K)[0])


This is very inefficient (calling gesv twice on an NxN matrix); see the GPML book (Gaussian Processes for Machine Learning) for the correct pseudocode for doing this.


Out of interest, does gesv play well with autograd -- quite cool if so :)

Member Author

@ysaatchi Originally, I put noise as a hyperparameter and used the Cholesky decomposition. Then, when the noise is small, the Lapack error "the leading minor of order ... is not positive definite" annoyed me. In addition, somehow, I found torch.trtrs to be unstable (pytorch/pytorch#4296), so I used gesv instead. Of course, these problems might have come from bugs in my code at that time.

Anyway, using gesv might not be a good approach, so I will use the Cholesky decomposition again.

p/s: gesv supports autograd, but does not support batches yet. :)
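
For reference, the GPML-style computation (Rasmussen & Williams, Algorithm 2.1) being converged towards looks roughly like the sketch below, written against the torch.potrf/trtrs API available at the time (variable names are illustrative; per the discussion above, trtrs lacked CUDA and gradient support at that point, which is why explicit inverses were used as a workaround):

def gp_predict(K, K_xz, K_zz, y):
    # K    : N x N train covariance with noise already added to the diagonal
    # K_xz : N x M train/test cross-covariance;  K_zz : M x M test covariance
    L = K.potrf(upper=False)                      # K = L L^T
    A = K_xz.trtrs(L, upper=False)[0]             # A = L^{-1} K_xz
    v = y.unsqueeze(1).trtrs(L, upper=False)[0]   # v = L^{-1} y
    loc = A.t().matmul(v).squeeze(1)              # K_zx K^{-1} y
    cov = K_zz - A.t().matmul(A)                  # K_zz - K_zx K^{-1} K_xz
    return loc, cov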


@ysaatchi ysaatchi left a comment


Looks like a good start. I have some major questions at this stage; once they are clarified we can move forward.

@fritzo
Member

fritzo commented Jan 20, 2018

@fehiepsi We plan to merge this early next week. We're about to do a big refactoring and I want to make sure your PR can come along for the ride.

@fehiepsi
Member Author

@fritzo @ysaatchi I have made several changes reflecting your reviews. As for the change from noise to a Gaussian likelihood (an nn.Module), I think it is better to do that in another pull request. My main concern is not its implementation, but naming: we should have a good way to distinguish the parameters of kernels from those of likelihoods/mean functions when setting priors on them.

@fehiepsi
Member Author

fehiepsi commented Jan 20, 2018

@ysaatchi I made a benchmark to compare the following three methods on 1000x1000 matrices.

# each method computes K^T (L L^T)^{-1} K, where L is a lower-triangular Cholesky factor

def method1(L, K):
    # Cholesky solve via potrs
    return K.t().matmul(K.potrs(L, upper=False))

def method2(L, K):
    # triangular solve via trtrs, then v^T v
    v = K.trtrs(L, upper=False)[0]
    return v.t().matmul(v)

def method3(L, K):
    # explicit inverse of the triangular factor
    v = L.inverse().matmul(K)
    return v.t().matmul(v)

On my CPU: method2 is fastest (10ms), method1 is slower (13ms), method3 is slowest (20ms).
On my GPU: method1 is much faster than method3 (2ms comparing to 10ms), method2 throws an error.

Edited:
The drawbacks of method2 are that torch.trtrs() does not support CUDA tensors, and that torch.potrs and torch.trtrs do not support gradients. So I think that using method3 is the best option for GPR.

@fritzo
Member

fritzo commented Jan 22, 2018

@fehiepsi FYI we're refactoring MultivariateNormal in #693 as part of our migration to PyTorch distributions. I've tried to preserve all the functionality you need for your tutorial. Let me know if you have any problems with the new version and I'll prioritize fixing it for you.

@fritzo
Member

fritzo commented Jan 22, 2018

method3 is the best option

I believe torch.trtrs currently lacks gradient support. It's also fine to make a helper like

def _kernel_norm(L, K):
    if L.is_cuda:
        # work around lack of CUDA support for trtrs
        v = L.inverse().matmul(K)
    else:
        v = K.trtrs(L, upper=False)[0]
    return v.t().matmul(v)

@fehiepsi
Member Author

@fritzo I think that calling K.inverse() is better than L.inverse(); L is only helpful when it is used together with torch.trtrs(). So in my implementation, I chose method1 among the following 4 methods (previously I chose method2, but @ysaatchi pointed out that it is inefficient to call torch.gesv twice on the same kernel matrix). Anyway, I think that this is ready to merge.

# each of the four methods computes loc = A^T K^{-1} y and scale = A^T K^{-1} A
def method1(K, A, y):
    K_inv = K.inverse()
    A_t = A.t()
    loc = A_t.matmul(K_inv.matmul(y))
    scale = A_t.matmul(K_inv.matmul(A))
    return loc, scale
    
def method2(K, A, y):
    A_t = A.t()
    loc = A_t.matmul(y.gesv(K)[0])
    scale = A_t.matmul(A.gesv(K)[0])
    return loc, scale

def method3(K, A, y):
    L = K.potrf(upper=False)
    L_inv = L.inverse()
    L_inv_A = L_inv.matmul(A)
    L_inv_y = L_inv.matmul(y)
    L_inv_A_t = L_inv_A.t()
    loc = L_inv_A_t.matmul(L_inv_y)
    scale = L_inv_A_t.matmul(L_inv_A)
    return loc, scale
    
def method4(K, A, y):
    L = K.potrf(upper=False)
    L_inv_A = A.trtrs(L, upper=False)[0]
    L_inv_y = y.trtrs(L, upper=False)[0]
    L_inv_A_t = L_inv_A.t()
    loc = L_inv_A_t.matmul(L_inv_y)
    scale = L_inv_A_t.matmul(L_inv_A)
    return loc, scale

@fritzo
Member

fritzo commented Jan 23, 2018

Hi @fehiepsi I'm going to merge this now so it can follow our refactoring work, but feel free to keep submitting updates in follow-up PRs. Thanks for contributing this!

@fritzo fritzo merged commit 1fe22a5 into pyro-ppl:dev Jan 23, 2018
@martinjankowiak
Collaborator

@fehiepsi Is it OK if I submit a PR to clean up the language/organization of this tutorial a bit, to make it easier to follow?

@fehiepsi
Member Author

@martinjankowiak Sure, that would be great! Given that the GP API is stable now (the current PRs do not affect the API), it is a good time to revise it. I am happy to see your changes.

@fehiepsi fehiepsi deleted the add-gp-tutorial branch June 10, 2018 06:42