se_prop() estimates proportions and confidence intervals for each level of one or more categorical variables
of FSO's structural survey. It first converts the columns into dummy variables and then estimates a proportion and confidence interval for each.
Arguments
- data
A data frame or tibble.
- ...
Categorical variables. Can be passed unquoted (e.g., gender, birth_country) or programmatically using !!!syms(c("gender", "birth_country")).
- strata
Unquoted or quoted name of the strata column. Defaults to zone if omitted.
- weight
Unquoted or quoted name of the sampling weights column. For programmatic use with a string variable (e.g., wt <- "weights"), use !!sym(wt) in the function call.
- alpha
Numeric significance level for confidence intervals. Default is 0.05 (95% CI).
Value
A tibble with proportion estimates for all grouping column combinations, including:
- occ
Sample size (number of observations) per group.
- prop
Estimated proportion of the specified categorical variable in the corresponding group.
- vhat, stand_dev
Estimated variance of the estimated proportion (vhat) and its standard deviation (stand_dev, the square root of the variance).
- ci, ci_l, ci_u
Confidence interval: half-width (ci), lower bound (ci_l) and upper bound (ci_u).
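The relationship between these columns can be written out explicitly. The sketch below assumes a normal-approximation interval; that reading is inferred from the numbers printed in the examples, not stated by the documentation itself, and z denotes the standard normal quantile.

```latex
\begin{aligned}
\texttt{stand\_dev} &= \sqrt{\texttt{vhat}}, \\
\texttt{ci} &= z_{1-\alpha/2} \cdot \texttt{stand\_dev}, \\
\texttt{ci\_l} &= \texttt{prop} - \texttt{ci}, \qquad
\texttt{ci\_u} = \texttt{prop} + \texttt{ci}.
\end{aligned}
```

With alpha = 0.05, z is approximately 1.96. For the English/Other row of the first example, 1.96 × 0.00344 ≈ 0.00674, which agrees with the printed ci of 6.74e-3. For very small groups the lower bound can come out negative, as in the rows with occ = 1.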
Examples
# Direct column references (unquoted)
se_prop(
  data = nhanes,
  interview_lang,
  birth_country,
  strata = strata,
  weight = weights
)
#> # A tibble: 6 × 9
#>   interview_lang birth_country   occ    prop    vhat stand_dev      ci     ci_l
#>   <chr>          <chr>         <int>   <dbl>   <dbl>     <dbl>   <dbl>    <dbl>
#> 1 English        Missing           1 5.68e-5 3.22e-9 0.0000568 1.11e-4 -5.45e-5
#> 2 English        Other          1368 1.01e-1 1.18e-5 0.00344   6.74e-3  9.39e-2
#> 3 English        US             7215 8.27e-1 1.73e-5 0.00416   8.15e-3  8.19e-1
#> 4 Spanish        Missing           1 3.70e-5 1.37e-9 0.0000370 7.25e-5 -3.55e-5
#> 5 Spanish        Other           868 5.05e-2 3.81e-6 0.00195   3.83e-3  4.66e-2
#> 6 Spanish        US              518 2.19e-2 1.11e-6 0.00105   2.07e-3  1.99e-2
#> # ℹ 1 more variable: ci_u <dbl>
# Quoted column names (can be mixed with unquoted ones)
se_prop(
  data = nhanes,
  "interview_lang",
  gender,
  "birth_country",
  strata = "strata",
  weight = weights
)
#> # A tibble: 10 × 10
#>    interview_lang gender birth_country   occ      prop    vhat stand_dev      ci
#>    <chr>          <chr>  <chr>         <int>     <dbl>   <dbl>     <dbl>   <dbl>
#>  1 English        Female Missing           1 0.0000568 3.22e-9 0.0000568 1.11e-4
#>  2 English        Female Other           704 0.0514    5.66e-6 0.00238   4.66e-3
#>  3 English        Female US             3640 0.425     5.21e-5 0.00722   1.41e-2
#>  4 English        Male   Other           664 0.0492    6.34e-6 0.00252   4.93e-3
#>  5 English        Male   US             3575 0.402     5.20e-5 0.00721   1.41e-2
#>  6 Spanish        Female Missing           1 0.0000370 1.37e-9 0.0000370 7.25e-5
#>  7 Spanish        Female Other           464 0.0242    1.55e-6 0.00125   2.44e-3
#>  8 Spanish        Female US              269 0.0106    4.79e-7 0.000692  1.36e-3
#>  9 Spanish        Male   Other           404 0.0263    2.18e-6 0.00148   2.90e-3
#> 10 Spanish        Male   US              249 0.0113    6.06e-7 0.000779  1.53e-3
#> # ℹ 2 more variables: ci_l <dbl>, ci_u <dbl>
# Programmatic use with strings
wt <- "weights"
vars <- c("interview_lang", "gender", "birth_country")
se_prop(
  data = nhanes,
  strata = strata,
  weight = !!rlang::sym(wt),
  !!!rlang::syms(vars)
)
#> # A tibble: 10 × 10
#>    interview_lang gender birth_country   occ      prop    vhat stand_dev      ci
#>    <chr>          <chr>  <chr>         <int>     <dbl>   <dbl>     <dbl>   <dbl>
#>  1 English        Female Missing           1 0.0000568 3.22e-9 0.0000568 1.11e-4
#>  2 English        Female Other           704 0.0514    5.66e-6 0.00238   4.66e-3
#>  3 English        Female US             3640 0.425     5.21e-5 0.00722   1.41e-2
#>  4 English        Male   Other           664 0.0492    6.34e-6 0.00252   4.93e-3
#>  5 English        Male   US             3575 0.402     5.20e-5 0.00721   1.41e-2
#>  6 Spanish        Female Missing           1 0.0000370 1.37e-9 0.0000370 7.25e-5
#>  7 Spanish        Female Other           464 0.0242    1.55e-6 0.00125   2.44e-3
#>  8 Spanish        Female US              269 0.0106    4.79e-7 0.000692  1.36e-3
#>  9 Spanish        Male   Other           404 0.0263    2.18e-6 0.00148   2.90e-3
#> 10 Spanish        Male   US              249 0.0113    6.06e-7 0.000779  1.53e-3
#> # ℹ 2 more variables: ci_l <dbl>, ci_u <dbl>