@@ -4,93 +4,5 @@ title: Survival Regression - spark.ml
4 4 displayTitle: Survival Regression - spark.ml
5 5 ---
6 6
7-
8- `\[
9- \newcommand{\R}{\mathbb{R}}
10- \newcommand{\E}{\mathbb{E}}
11- \newcommand{\x}{\mathbf{x}}
12- \newcommand{\y}{\mathbf{y}}
13- \newcommand{\wv}{\mathbf{w}}
14- \newcommand{\av}{\mathbf{\alpha}}
15- \newcommand{\bv}{\mathbf{b}}
16- \newcommand{\N}{\mathbb{N}}
17- \newcommand{\id}{\mathbf{I}}
18- \newcommand{\ind}{\mathbf{1}}
19- \newcommand{\0}{\mathbf{0}}
20- \newcommand{\unit}{\mathbf{e}}
21- \newcommand{\one}{\mathbf{1}}
22- \newcommand{\zero}{\mathbf{0}}
23- \]`
24-
25-
26- In `spark.ml`, we implement the [Accelerated failure time (AFT)](https://en.wikipedia.org/wiki/Accelerated_failure_time_model)
27- model, which is a parametric survival regression model for censored data.
28- It describes a model for the log of survival time, so it is often called a
29- log-linear model for survival analysis. Unlike the
30- [Proportional hazards](https://en.wikipedia.org/wiki/Proportional_hazards_model) model
31- designed for the same purpose, the AFT model is easier to parallelize
32- because each instance contributes to the objective function independently.
33-
34- Given the values of the covariates $x^{'}$, for random lifetime $t_{i}$ of
35- subjects i = 1, ..., n, with possible right-censoring,
36- the likelihood function under the AFT model is given as:
37- `\[
38- L(\beta,\sigma)=\prod_{i=1}^n[\frac{1}{\sigma}f_{0}(\frac{\log{t_{i}}-x^{'}\beta}{\sigma})]^{\delta_{i}}S_{0}(\frac{\log{t_{i}}-x^{'}\beta}{\sigma})^{1-\delta_{i}}
39- \]`
40- where $\delta_{i}$ indicates whether the event has occurred: it is 1 if the observation is uncensored and 0 otherwise.
41- Using $\epsilon_{i}=\frac{\log{t_{i}}-x^{'}\beta}{\sigma}$, the log-likelihood function
42- assumes the form:
43- `\[
44- \iota(\beta,\sigma)=\sum_{i=1}^{n}[-\delta_{i}\log\sigma+\delta_{i}\log{f_{0}}(\epsilon_{i})+(1-\delta_{i})\log{S_{0}(\epsilon_{i})}]
45- \]`
46- where $S_{0}(\epsilon_{i})$ is the baseline survivor function,
47- and $f_{0}(\epsilon_{i})$ is the corresponding density function.
48-
49- The most commonly used AFT model is based on the Weibull distribution of the survival time.
50- The Weibull distribution for the lifetime corresponds to the extreme value distribution for
51- the log of the lifetime, and the $S_{0}(\epsilon)$ function is:
52- `\[
53- S_{0}(\epsilon_{i})=\exp(-e^{\epsilon_{i}})
54- \]`
55- The $f_{0}(\epsilon_{i})$ function is:
56- `\[
57- f_{0}(\epsilon_{i})=e^{\epsilon_{i}}\exp(-e^{\epsilon_{i}})
58- \]`
59- The log-likelihood function for the AFT model with a Weibull distribution of lifetime is:
60- `\[
61- \iota(\beta,\sigma)= -\sum_{i=1}^n[\delta_{i}\log\sigma-\delta_{i}\epsilon_{i}+e^{\epsilon_{i}}]
62- \]`
63- Because minimizing the negative log-likelihood is equivalent to maximizing the likelihood,
64- the loss function we optimize is $-\iota(\beta,\sigma)$.
65- The gradient functions for $\beta$ and $\log\sigma$ respectively are:
66- `\[
67- \frac{\partial (-\iota)}{\partial \beta}=\sum_{i=1}^{n}[\delta_{i}-e^{\epsilon_{i}}]\frac{x_{i}}{\sigma}
68- \]`
69- `\[
70- \frac{\partial (-\iota)}{\partial (\log\sigma)}=\sum_{i=1}^{n}[\delta_{i}+(\delta_{i}-e^{\epsilon_{i}})\epsilon_{i}]
71- \]`
72-
73- The AFT model can be formulated as a convex optimization problem,
74- i.e. the task of finding a minimizer of a convex function $-\iota(\beta,\sigma)$
75- that depends on the coefficients vector $\beta$ and the log of the scale parameter $\log\sigma$.
76- The optimization algorithm underlying the implementation is L-BFGS.
77- The implementation matches the result from R's survival function
78- [survreg](https://stat.ethz.ch/R-manual/R-devel/library/survival/html/survreg.html).
79-
80- ## Example:
81-
82- <div class="codetabs">
83-
84- <div data-lang="scala" markdown="1">
85- {% include_example scala/org/apache/spark/examples/ml/AFTSurvivalRegressionExample.scala %}
86- </div>
87-
88- <div data-lang="java" markdown="1">
89- {% include_example java/org/apache/spark/examples/ml/JavaAFTSurvivalRegressionExample.java %}
90- </div>
91-
92- <div data-lang="python" markdown="1">
93- {% include_example python/ml/aft_survival_regression.py %}
94- </div>
95-
96- </div>
7+ > This section has been moved into the
8+ [classification and regression section](ml-classification-regression.html#survival-regression).
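
The Weibull loss $-\iota(\beta,\sigma)$ and its two gradients from the removed text can be checked numerically. The following is a minimal sketch in plain Python, not Spark's implementation: the names `aft_weibull_loss_and_grad`, `numeric_grad`, and `ROWS` are hypothetical, and the toy data is made up for illustration. It evaluates the loss and the analytic gradients with respect to $\beta$ and $\log\sigma$, and compares them against central finite differences.

```python
import math

# Hypothetical toy data: (lifetime t > 0, delta in {0, 1}, covariates x).
# delta = 1 means the event was observed (uncensored); 0 means right-censored.
ROWS = [
    (2.0, 1, [1.0, 0.5]),
    (3.5, 0, [1.0, -1.0]),
    (0.8, 1, [1.0, 2.0]),
    (5.0, 1, [1.0, 0.0]),
]


def aft_weibull_loss_and_grad(beta, log_sigma, rows):
    """Return (-iota, d(-iota)/d(beta), d(-iota)/d(log sigma)) for the Weibull AFT model."""
    sigma = math.exp(log_sigma)
    loss = 0.0
    grad_beta = [0.0] * len(beta)
    grad_log_sigma = 0.0
    for t, delta, x in rows:
        # epsilon_i = (log t_i - x' beta) / sigma
        eps = (math.log(t) - sum(b * xj for b, xj in zip(beta, x))) / sigma
        e_eps = math.exp(eps)
        # Per-instance term of -iota: delta*log(sigma) - delta*eps + exp(eps)
        loss += delta * log_sigma - delta * eps + e_eps
        # Gradient w.r.t. beta: (delta - exp(eps)) * x / sigma
        factor = (delta - e_eps) / sigma
        for j, xj in enumerate(x):
            grad_beta[j] += factor * xj
        # Gradient w.r.t. log(sigma): delta + (delta - exp(eps)) * eps
        grad_log_sigma += delta + (delta - e_eps) * eps
    return loss, grad_beta, grad_log_sigma


def numeric_grad(beta, log_sigma, rows, h=1e-6):
    """Central finite-difference approximation of the same gradients."""
    def loss_at(b, s):
        return aft_weibull_loss_and_grad(b, s, rows)[0]

    grad_beta = []
    for j in range(len(beta)):
        up = list(beta)
        up[j] += h
        down = list(beta)
        down[j] -= h
        grad_beta.append((loss_at(up, log_sigma) - loss_at(down, log_sigma)) / (2 * h))
    grad_log_sigma = (loss_at(beta, log_sigma + h) - loss_at(beta, log_sigma - h)) / (2 * h)
    return grad_beta, grad_log_sigma


if __name__ == "__main__":
    beta, log_sigma = [0.5, -0.2], 0.1
    loss, g_beta, g_log_sigma = aft_weibull_loss_and_grad(beta, log_sigma, ROWS)
    n_beta, n_log_sigma = numeric_grad(beta, log_sigma, ROWS)
    print("loss:", loss)
    print("analytic gradient:", g_beta, g_log_sigma)
    print("numeric gradient: ", n_beta, n_log_sigma)
```

Because each row contributes its own term to the loss and to both gradients, the loop body could be evaluated per partition and summed, which is the independence property the text cites as the reason the AFT model parallelizes well. Parameterizing by $\log\sigma$ rather than $\sigma$ keeps the scale positive without a constraint, so an unconstrained optimizer such as L-BFGS can be applied directly.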