Estimate Proportions of Categorical Variables in Structural Survey

se_prop() estimates the proportion of each level (or combination of levels) of one or more categorical variables from the FSO structural survey, together with its confidence interval. It first converts the columns into dummy variables and then estimates the proportion and confidence interval of each dummy.
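The dummy-variable step can be illustrated with a minimal base-R sketch. It assumes, without confirmation from this page, that each proportion is the weighted mean of a 0/1 dummy for one combination of levels. The data frame df and its columns are hypothetical, and the sketch covers only occ and prop, not the stratified variance that se_prop() also reports.

```r
# Hypothetical toy data: two categorical variables plus sampling weights
df <- data.frame(
  lang   = c("English", "English", "Spanish", "English", "Spanish"),
  gender = c("Female",  "Male",    "Female",  "Female",  "Male"),
  weight = c(10, 20, 30, 15, 25)
)

# One factor level per observed combination of the categorical variables
combo <- interaction(df$lang, df$gender, sep = "/", drop = TRUE)

# One 0/1 dummy column per combination (no intercept, so every level gets a column)
dummies <- model.matrix(~ combo - 1)

# Assumed estimator: weighted mean of each dummy = estimated proportion
result <- data.frame(
  group = levels(combo),
  occ   = as.integer(colSums(dummies)),                                  # sample size
  prop  = as.numeric(colSums(dummies * df$weight) / sum(df$weight))     # weighted share
)
result
```

With these weights, English/Female gets 25 / 100 = 0.25, and the proportions over all combinations sum to 1, as they approximately do in the examples below.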

Usage

se_prop(data, ..., strata, weight, alpha = 0.05)

Arguments

data: A data frame or tibble.
...: Categorical variables. Can be passed unquoted (e.g., gender, birth_country), as quoted strings (e.g., "gender"), or programmatically using !!!syms(c("gender", "birth_country")).
strata: Unquoted or quoted name of the strata column. Defaults to zone if omitted.
weight: Unquoted or quoted name of the sampling weights column. For programmatic use with a string variable (e.g., wt <- "weights"), use !!sym(wt) in the function call.
alpha: Numeric significance level for confidence intervals. Default is 0.05 (95% CI).
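The argument forms above follow rlang's tidy-evaluation conventions, which the examples already use (rlang::sym(), rlang::syms()). As a sketch of why unquoted, quoted and spliced inputs are interchangeable, the hypothetical function resolve_cols() below captures its arguments with rlang::enquos() and rlang::enquo() and converts them to column names with rlang::as_name(). It is an assumption about the general mechanism, not se_prop()'s actual implementation.

```r
library(rlang)

# Hypothetical stand-in: resolves column arguments the way a tidy-eval
# function might. This is NOT the internal code of se_prop().
resolve_cols <- function(..., weight) {
  list(
    vars   = unname(vapply(enquos(...), as_name, character(1))),  # symbols or strings
    weight = as_name(enquo(weight))                                # single column name
  )
}

wt   <- "weights"
vars <- c("interview_lang", "gender", "birth_country")

# Unquoted, quoted (mixed) and spliced calls resolve to the same names
unquoted <- resolve_cols(interview_lang, gender, birth_country, weight = weights)
quoted   <- resolve_cols("interview_lang", gender, "birth_country", weight = "weights")
spliced  <- resolve_cols(!!!syms(vars), weight = !!sym(wt))
```

Splicing with !!! turns the character vector into separate symbol arguments before capture, and !! does the same for a single symbol, so all three calls yield identical results.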

Value

A tibble with proportion estimates for every observed combination of levels of the categorical variables, including:

occ: Sample size (number of observations) per group.
prop: Estimated (weighted) proportion of the population falling in the corresponding combination of levels.
vhat, stand_dev: Estimated variance of the proportion estimate (vhat) and its standard deviation (stand_dev, the square root of vhat).
ci, ci_l, ci_u: Confidence interval: half-width (ci), lower (ci_l) and upper (ci_u) bounds.
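The example outputs are consistent with a normal-approximation interval: in the first example, 1.96 × 0.00344 ≈ 6.74e-3 and 1.96 × 0.0000568 ≈ 1.11e-4, matching the printed ci values. Assuming that relationship holds in general, the hypothetical helper ci_from_vhat() below derives stand_dev, ci, ci_l and ci_u from prop, vhat and alpha. Row 1 of the first example also shows that the lower bound is not truncated at zero, so the sketch does not truncate either.

```r
# Assumed relationship between the output columns (normal approximation);
# ci_from_vhat() is a hypothetical helper, not part of the package.
ci_from_vhat <- function(prop, vhat, alpha = 0.05) {
  stand_dev <- sqrt(vhat)                  # standard deviation of the estimate
  ci <- qnorm(1 - alpha / 2) * stand_dev   # half-width of the interval
  data.frame(prop, vhat, stand_dev, ci,
             ci_l = prop - ci,             # not truncated at 0
             ci_u = prop + ci)
}

# Critical values for alpha = 0.10, 0.05, 0.01 (about 1.645, 1.960, 2.576)
qnorm(1 - c(0.10, 0.05, 0.01) / 2)

# Row 2 of the first example (English / Other), from rounded printed values
ci_from_vhat(prop = 0.101, vhat = 1.18e-5)
```

Because the printed prop and vhat are rounded, the last call reproduces the tabulated ci only approximately.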

Examples

# Direct column references (unquoted)
se_prop(
  data = nhanes,
  interview_lang,
  birth_country,
  strata = strata,
  weight = weights
)
#> # A tibble: 6 × 9
#>   interview_lang birth_country   occ     prop    vhat stand_dev      ci     ci_l
#>   <chr>          <chr>         <int>    <dbl>   <dbl>     <dbl>   <dbl>    <dbl>
#> 1 English        Missing           1  5.68e-5 3.22e-9 0.0000568 1.11e-4 -5.45e-5
#> 2 English        Other          1368  1.01e-1 1.18e-5 0.00344   6.74e-3  9.39e-2
#> 3 English        US             7215  8.27e-1 1.73e-5 0.00416   8.15e-3  8.19e-1
#> 4 Spanish        Missing           1  3.70e-5 1.37e-9 0.0000370 7.25e-5 -3.55e-5
#> 5 Spanish        Other           868  5.05e-2 3.81e-6 0.00195   3.83e-3  4.66e-2
#> 6 Spanish        US              518  2.19e-2 1.11e-6 0.00105   2.07e-3  1.99e-2
#> # ℹ 1 more variable: ci_u <dbl>

# Quoted column names (quoted and unquoted names can be mixed)
se_prop(
  data = nhanes,
  "interview_lang",
  gender,
  "birth_country",
  strata = "strata",
  weight = weights
)
#> # A tibble: 10 × 10
#>    interview_lang gender birth_country   occ      prop    vhat stand_dev      ci
#>    <chr>          <chr>  <chr>         <int>     <dbl>   <dbl>     <dbl>   <dbl>
#>  1 English        Female Missing           1 0.0000568 3.22e-9 0.0000568 1.11e-4
#>  2 English        Female Other           704 0.0514    5.66e-6 0.00238   4.66e-3
#>  3 English        Female US             3640 0.425     5.21e-5 0.00722   1.41e-2
#>  4 English        Male   Other           664 0.0492    6.34e-6 0.00252   4.93e-3
#>  5 English        Male   US             3575 0.402     5.20e-5 0.00721   1.41e-2
#>  6 Spanish        Female Missing           1 0.0000370 1.37e-9 0.0000370 7.25e-5
#>  7 Spanish        Female Other           464 0.0242    1.55e-6 0.00125   2.44e-3
#>  8 Spanish        Female US              269 0.0106    4.79e-7 0.000692  1.36e-3
#>  9 Spanish        Male   Other           404 0.0263    2.18e-6 0.00148   2.90e-3
#> 10 Spanish        Male   US              249 0.0113    6.06e-7 0.000779  1.53e-3
#> # ℹ 2 more variables: ci_l <dbl>, ci_u <dbl>

# Programmatic use with strings
wt <- "weights"
vars <- c("interview_lang", "gender", "birth_country")
se_prop(
  data = nhanes,
  strata = strata,
  weight = !!rlang::sym(wt),
  !!!rlang::syms(vars)
)
#> # A tibble: 10 × 10
#>    interview_lang gender birth_country   occ      prop    vhat stand_dev      ci
#>    <chr>          <chr>  <chr>         <int>     <dbl>   <dbl>     <dbl>   <dbl>
#>  1 English        Female Missing           1 0.0000568 3.22e-9 0.0000568 1.11e-4
#>  2 English        Female Other           704 0.0514    5.66e-6 0.00238   4.66e-3
#>  3 English        Female US             3640 0.425     5.21e-5 0.00722   1.41e-2
#>  4 English        Male   Other           664 0.0492    6.34e-6 0.00252   4.93e-3
#>  5 English        Male   US             3575 0.402     5.20e-5 0.00721   1.41e-2
#>  6 Spanish        Female Missing           1 0.0000370 1.37e-9 0.0000370 7.25e-5
#>  7 Spanish        Female Other           464 0.0242    1.55e-6 0.00125   2.44e-3
#>  8 Spanish        Female US              269 0.0106    4.79e-7 0.000692  1.36e-3
#>  9 Spanish        Male   Other           404 0.0263    2.18e-6 0.00148   2.90e-3
#> 10 Spanish        Male   US              249 0.0113    6.06e-7 0.000779  1.53e-3
#> # ℹ 2 more variables: ci_l <dbl>, ci_u <dbl>