
Bug Report: Missing Confidence Intervals and P-Values in tbl_regression for Survey Design with Many Independent Variables #1913

@zhaohongxin0


Hello,

I’d like to report a bug in the tbl_regression function when creating regression tables for survey designs with many independent variables. The issue is that the confidence intervals and p-values are missing from the table.

My R data is attached: data_frame.zip

Below is a simplified version of my analysis code:

library(gtsummary)
library(survey)

# data_to_save is the data frame from the attached data_frame.zip
data_frame <- data_to_save

# Constructing a complex survey design object
svy_design <- survey::svydesign(
  id = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTINT2YR,
  nest = TRUE, data = data_frame
)

# Fitting a survey-weighted logistic regression model
result <- survey::svyglm(
  RIAGENDR ~ DMDHRGND + DMDCITZN + SIALANG + DMDHRAGZ + RIDRETH3 + DMDHRMAZ + INDFMPIR,
  family = binomial(), design = svy_design
)

# Odds-ratio table: the CI and p-value columns come out empty
tbl_regression(result, exponentiate = TRUE)

Here’s an example of the output, where the confidence intervals and p-values are missing:

[Screenshot: tbl_regression output with blank confidence interval and p-value columns]

Interestingly, the model summary does report the standard errors, but the p-values are missing:

# Print model summary with statistical measures
summary(result)

[Screenshot: summary(result) output with standard errors but no p-values]

Following the svyglm documentation, specifying the residual degrees of freedom in summary() brings the p-values back:

# Print model summary with degrees of freedom adjustment (df = degf)
summary(result, df.resid = degf(result$survey.design))

[Screenshot: summary(result, df.resid = degf(result$survey.design)) output with p-values present]

# Print model summary with degrees of freedom adjustment (df = Inf)
summary(result, df.resid = Inf)

[Screenshot: summary(result, df.resid = Inf) output with p-values present]
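To make the degrees-of-freedom behaviour concrete, here is a small base-R sketch. It assumes (from my reading of the survey package, not verified against its source) that svyglm sets the residual degrees of freedom to degf(design) + 1 minus the number of estimated coefficients. The numbers are hypothetical, not taken from the attached data: with 15 design degrees of freedom and 16 coefficients the residual df drops to 0, the t reference distribution is undefined, and pt()/qt() return NaN, which would blank out both the p-values and t-based intervals. The helpers resid_df() and wald_summary() are illustrative names only.

```r
# ASSUMPTION: svyglm residual df = design df + 1 - number of estimated coefficients.
# All numbers below are hypothetical and only illustrate the mechanism.
resid_df <- function(design_df, n_coef) {
  design_df + 1 - n_coef
}

# Two-sided Wald p-value and confidence interval for one coefficient, using a
# t reference with `df` degrees of freedom; df = Inf gives the normal reference.
wald_summary <- function(estimate, std_error, df, level = 0.95) {
  t_stat <- estimate / std_error
  crit <- suppressWarnings(qt(1 - (1 - level) / 2, df = df))  # NaN when df <= 0
  p    <- suppressWarnings(2 * pt(-abs(t_stat), df = df))     # NaN when df <= 0
  c(p.value   = p,
    conf.low  = estimate - crit * std_error,
    conf.high = estimate + crit * std_error)
}

design_df <- 15  # hypothetical, e.g. 30 PSUs nested in 15 strata
n_coef    <- 16  # hypothetical, intercept plus 15 model terms
df_model  <- resid_df(design_df, n_coef)

wald_summary(0.5, 0.2, df = df_model)   # all NaN: no residual df left
wald_summary(0.5, 0.2, df = design_df)  # analogous to df.resid = degf(design)
wald_summary(0.5, 0.2, df = Inf)        # analogous to df.resid = Inf
```

The t and normal cases differ only in tail weight, so the df = Inf p-value is the smallest of the three.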

In summary, when the model has many independent variables, the tbl_regression output lacks confidence intervals and p-values. I suspect the missing confidence intervals arise because tbl_regression extracts profile-likelihood CIs, and that switching to Wald CIs might resolve this. The missing p-values appear to be tied to the degrees-of-freedom setting: they are reported correctly when df.resid = Inf or df.resid = degf(result$survey.design) is passed to summary(). How can this setting be passed to tbl_regression?
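One possible workaround, sketched here as an untested assumption rather than a documented gtsummary recipe, is to pass a custom tidier through tbl_regression's tidy_fun argument. The hypothetical make_wald_tidier() below builds Wald estimates, p-values and intervals from coef() and vcov(), using a t reference with a chosen df (df = Inf mirrors summary(result, df.resid = Inf)). It assumes tbl_regression calls the tidier with the model plus exponentiate, conf.level and conf.int arguments and expects broom-style columns (term, estimate, std.error, statistic, p.value, conf.low, conf.high).

```r
# Hypothetical tidier factory for gtsummary::tbl_regression(tidy_fun = ...).
# ASSUMPTION: tbl_regression accepts a function returning broom-style columns
# and passes exponentiate / conf.level / conf.int to it.
make_wald_tidier <- function(df = Inf) {
  function(x, exponentiate = FALSE, conf.level = 0.95, conf.int = TRUE, ...) {
    est  <- stats::coef(x)
    se   <- sqrt(diag(stats::vcov(x)))
    stat <- est / se
    crit <- stats::qt(1 - (1 - conf.level) / 2, df = df)  # qnorm() when df = Inf
    out <- data.frame(
      term      = names(est),
      estimate  = unname(est),
      std.error = unname(se),
      statistic = unname(stat),
      p.value   = unname(2 * stats::pt(-abs(stat), df = df)),
      conf.low  = unname(est - crit * se),
      conf.high = unname(est + crit * se),
      stringsAsFactors = FALSE
    )
    if (exponentiate) {
      # Odds ratios: transform estimate and CI limits, keep the SE on the log scale
      cols <- c("estimate", "conf.low", "conf.high")
      out[cols] <- lapply(out[cols], exp)
    }
    out
  }
}
```

With the objects from the report, usage would presumably look like tbl_regression(result, exponentiate = TRUE, tidy_fun = make_wald_tidier(df = Inf)), or make_wald_tidier(df = survey::degf(result$survey.design)) to mirror the other summary() call. This has not been checked against the attached data.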

Could you please look into this issue? It is becoming a significant limitation: most of my analyses involve many independent variables, which makes the tbl_regression results unusable.

I've attached my R dataset for your reference.

Thank you!
