diff --git a/docs/source/machine_learning.rst b/docs/source/machine_learning.rst
index 41a1353f6..45606b7c6 100644
--- a/docs/source/machine_learning.rst
+++ b/docs/source/machine_learning.rst
@@ -75,34 +75,35 @@ create and train a model from SQL:
         .. code-block:: sql
 
             CREATE MODEL my_model WITH (
-                model_class = 'sklearn.ensemble.GradientBoostingClassifier',
+                model_class = 'sklearn.linear_model.LogisticRegression',
                 wrap_predict = True,
                 target_column = 'target'
             ) AS (
-                SELECT x, y, target
+                SELECT x, y, x*y > 0 as target
                 FROM timeseries
                 LIMIT 100
             )
 
     .. group-tab:: GPU
-
+
         .. code-block:: sql
 
             CREATE MODEL my_model WITH (
                 model_class = 'cuml.linear_model.LogisticRegression',
                 wrap_predict = True,
-                wrap_fit = False, -- can we explain why this needs to be added?
                 target_column = 'target'
             ) AS (
-                SELECT x, y, target
+                SELECT x, y, x*y > 0 as target
                 FROM timeseries
                 LIMIT 100
             )
 
-This call will create a new instance of ``sklearn.ensemble.GradientBoostingClassifier``
+This call will create a new instance of ``linear_model.LogisticRegression``
 and train it with the data collected from the ``SELECT`` call (again, every valid ``SELECT``
 query can be given). The model can than be used in subsequent calls to ``PREDICT``
-using the given name.
+using the given name.
+We set ``wrap_predict`` = ``True`` here to parallelize post fit prediction task of non distributed models (sklearn/cuML etc) across workers.
+
 Have a look into :ref:`ml` for more information.
 
 4. Check Model parameters - Model meta data