Model type LassoModel doesn't support intercept #74

@ForceBru

Code that doesn't work

julia> using DataFrames, Lasso

julia> df = DataFrame(x=randn(100), y=3randn(100) .+ 1);

julia> fit(LassoModel, @formula(x ~ 1 + y), df)
ERROR: ArgumentError: Model type LassoModel doesn't support intercept specified in formula x ~ 1 + y
Stacktrace:
 [1] apply_schema(t::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, schema::StatsModels.Schema, Mod::Type{LassoModel})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/schema.jl:288
 [2] ModelFrame(f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}}; model::Type{LassoModel}, contrasts::Dict{Symbol, Any})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:84
 [3] kwcall(::NamedTuple{(:model, :contrasts), Tuple{UnionAll, Dict{Symbol, Any}}}, ::Type{ModelFrame}, f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:73
 [4] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame; contrasts::Dict{Symbol, Any}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:85
 [5] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame)
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:78
 [6] top-level scope
   @ REPL[7]:1

Why can't I specify the intercept explicitly, as in @formula(x ~ 1 + y)? The docstring shown by ?@formula says:

1, 0, and -1 indicate the presence (for 1) or absence (for 0 and -1) of an intercept column.

So 1 is a valid intercept specification, like in R. This @formula also works in GLM.lm.
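The error message comes from StatsModels rather than from Lasso itself: as far as I can tell from the stack trace, apply_schema rejects an explicit intercept whenever the model type opts into the drop_intercept trait. A minimal sketch to check this, assuming StatsModels.drop_intercept and StatsModels.hasintercept behave as they do in the StatsModels version shown in the trace (neither call appears elsewhere in this report):

```julia
using StatsModels, Lasso

f = @formula(x ~ 1 + y)

# Trait that StatsModels consults in apply_schema before building the
# model matrix. The error above suggests this is true for LassoModel.
model_drops_intercept = StatsModels.drop_intercept(LassoModel)

# Whether the formula itself carries an explicit intercept term (the `1`).
formula_has_intercept = StatsModels.hasintercept(f)

# When both hold, apply_schema appears to throw the ArgumentError shown above.
println((model_drops_intercept, formula_has_intercept))
```

If both values are true, the behaviour is consistent with the model adding its own intercept and StatsModels refusing a second one from the formula.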

Code that works

If I write @formula(x ~ y), Lasso.jl will automatically fit a model with an intercept:

julia> fit(LassoModel, @formula(x ~ y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}

x ~ y

Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
──────────────
      Estimate
──────────────
x1  -0.132743
x2   0.0497596
──────────────

I assume the first coefficient is the intercept and the second one is multiplied by y, so the model is:

x = -0.132743 + 0.0497596 * y

So, intercepts are supported, but I can't manually specify that I want an intercept.

More code that doesn't work

Let's fit a model without an intercept. I specify this with the 0 in @formula(x ~ 0 + y).

julia> fit(LassoModel, @formula(x ~ 0 + y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}

x ~ 0 + y

Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
──────────────
      Estimate
──────────────
x1  -0.132743
x2   0.0497596
──────────────

The package seems to have ignored the 0 in the formula: it fitted an intercept of -0.132743 anyway and produced the same model as above, even though the @formula is different. R's glmnet has supported fitting without an intercept since 2013.


It would be nice if it were possible to specify the intercept in the formula.
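A possible workaround until then is to bypass the formula interface and pass the predictor matrix directly. This is only a sketch: it assumes the matrix method of fit accepts an intercept keyword and forwards it to the underlying path fit, which I have not verified against Lasso v0.7.0. The variable names X and m are mine.

```julia
using DataFrames, Lasso

df = DataFrame(x=randn(100), y=3randn(100) .+ 1)

# Build the predictor matrix by hand: a single column holding y,
# with no column of ones.
X = reshape(df.y, :, 1)

# Assumption: `intercept=false` is accepted here and suppresses the
# intercept that Lasso otherwise adds on its own.
m = fit(LassoModel, X, df.x; intercept=false)

# If no intercept was fitted, there should be one coefficient per column of X.
println(coef(m))
```

If coef(m) still returns two entries, the keyword is not doing what this sketch assumes.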

Versions

  • Julia v1.9-beta2
  • Lasso v0.7.0
