se_prop() estimates proportions and their confidence intervals for each level of one or more categorical variables in the FSO's structural survey. It first converts the columns into dummy variables, then computes the estimates from them.
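To make the dummy-variable idea concrete, here is a minimal base R sketch. It is not the package's implementation: the `toy` data frame is hypothetical, and the sketch ignores strata and variance estimation entirely. It only shows that the weighted mean of a 0/1 dummy equals the weighted proportion of the corresponding level.

```r
# Toy data (hypothetical, not taken from the structural survey)
toy <- data.frame(
  gender  = c("Female", "Male", "Female", "Male", "Female"),
  weights = c(10, 20, 30, 25, 15)
)

# One 0/1 dummy column per level of the categorical variable
dummies <- sapply(unique(toy$gender), function(lvl) as.numeric(toy$gender == lvl))

# The weighted mean of each dummy is the estimated proportion of that level
props <- colSums(dummies * toy$weights) / sum(toy$weights)
props  # Female 0.55, Male 0.45
```

The proportions sum to 1 across the levels of a variable, which is a quick sanity check on any output of this kind.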

Usage

se_prop(data, ..., strata, weight, alpha = 0.05)

Arguments

data

A data frame or tibble.

...

Categorical variables. Can be passed unquoted (e.g., gender, birth_country) or programmatically with rlang, using !!!syms(c("gender", "birth_country")).

strata

Unquoted or quoted name of the strata column. Defaults to zone if omitted.

weight

Unquoted or quoted name of the sampling weights column. For programmatic use with a string variable (e.g., wt <- "weights"), pass !!sym(wt) (from rlang) in the function call.

alpha

Numeric significance level for confidence intervals. Default is 0.05 (95% CI).

Value

A tibble with proportion estimates for all grouping column combinations, including:

occ

Sample size (number of observations) per group.

prop

Estimated proportion of the population in the corresponding group, i.e. the combination of levels of the specified categorical variables.

vhat, stand_dev

Estimated variance of the proportion estimate (vhat) and its standard deviation (stand_dev, the square root of vhat).

ci, ci_l, ci_u

Confidence interval: half-width (ci), lower (ci_l) and upper (ci_u) bounds.
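The output columns are linked by simple arithmetic, sketched below with rounded values copied from the third row of the first example output (English, US). The use of a normal critical value, qnorm(1 - alpha / 2), is an assumption: it agrees with the printed numbers, but it is not stated in this documentation.

```r
# Rounded values from row 3 of the first example output
prop  <- 0.827
vhat  <- 1.73e-5
alpha <- 0.05

stand_dev <- sqrt(vhat)            # standard deviation of the estimate
z         <- qnorm(1 - alpha / 2)  # assumed normal critical value for the CI
ci        <- z * stand_dev         # half-width of the confidence interval
ci_l      <- prop - ci             # lower bound
ci_u      <- prop + ci             # upper bound

round(c(stand_dev = stand_dev, ci = ci, ci_l = ci_l, ci_u = ci_u), 5)
```

Because the bounds are computed symmetrically around prop, ci_l can fall below 0 for very rare groups, as in the rows with occ = 1 in the examples.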

Examples

# Direct column references (unquoted)
se_prop(
  data = nhanes,
  interview_lang,
  birth_country,
  strata = strata,
  weight = weights
)
#> # A tibble: 6 × 9
#>   interview_lang birth_country   occ     prop    vhat stand_dev      ci     ci_l
#>   <chr>          <chr>         <int>    <dbl>   <dbl>     <dbl>   <dbl>    <dbl>
#> 1 English        Missing           1  5.68e-5 3.22e-9 0.0000568 1.11e-4 -5.45e-5
#> 2 English        Other          1368  1.01e-1 1.18e-5 0.00344   6.74e-3  9.39e-2
#> 3 English        US             7215  8.27e-1 1.73e-5 0.00416   8.15e-3  8.19e-1
#> 4 Spanish        Missing           1  3.70e-5 1.37e-9 0.0000370 7.25e-5 -3.55e-5
#> 5 Spanish        Other           868  5.05e-2 3.81e-6 0.00195   3.83e-3  4.66e-2
#> 6 Spanish        US              518  2.19e-2 1.11e-6 0.00105   2.07e-3  1.99e-2
#> # ℹ 1 more variable: ci_u <dbl>

# Quoted column names (can be mixed with unquoted ones)
se_prop(
  data = nhanes,
  "interview_lang",
  gender,
  "birth_country",
  strata = "strata",
  weight = weights
)
#> # A tibble: 10 × 10
#>    interview_lang gender birth_country   occ      prop    vhat stand_dev      ci
#>    <chr>          <chr>  <chr>         <int>     <dbl>   <dbl>     <dbl>   <dbl>
#>  1 English        Female Missing           1 0.0000568 3.22e-9 0.0000568 1.11e-4
#>  2 English        Female Other           704 0.0514    5.66e-6 0.00238   4.66e-3
#>  3 English        Female US             3640 0.425     5.21e-5 0.00722   1.41e-2
#>  4 English        Male   Other           664 0.0492    6.34e-6 0.00252   4.93e-3
#>  5 English        Male   US             3575 0.402     5.20e-5 0.00721   1.41e-2
#>  6 Spanish        Female Missing           1 0.0000370 1.37e-9 0.0000370 7.25e-5
#>  7 Spanish        Female Other           464 0.0242    1.55e-6 0.00125   2.44e-3
#>  8 Spanish        Female US              269 0.0106    4.79e-7 0.000692  1.36e-3
#>  9 Spanish        Male   Other           404 0.0263    2.18e-6 0.00148   2.90e-3
#> 10 Spanish        Male   US              249 0.0113    6.06e-7 0.000779  1.53e-3
#> # ℹ 2 more variables: ci_l <dbl>, ci_u <dbl>

# Programmatic use with strings
wt <- "weights"
vars <- c("interview_lang", "gender", "birth_country")
se_prop(
  data = nhanes,
  strata = strata,
  weight = !!rlang::sym(wt),
  !!!rlang::syms(vars)
)
#> # A tibble: 10 × 10
#>    interview_lang gender birth_country   occ      prop    vhat stand_dev      ci
#>    <chr>          <chr>  <chr>         <int>     <dbl>   <dbl>     <dbl>   <dbl>
#>  1 English        Female Missing           1 0.0000568 3.22e-9 0.0000568 1.11e-4
#>  2 English        Female Other           704 0.0514    5.66e-6 0.00238   4.66e-3
#>  3 English        Female US             3640 0.425     5.21e-5 0.00722   1.41e-2
#>  4 English        Male   Other           664 0.0492    6.34e-6 0.00252   4.93e-3
#>  5 English        Male   US             3575 0.402     5.20e-5 0.00721   1.41e-2
#>  6 Spanish        Female Missing           1 0.0000370 1.37e-9 0.0000370 7.25e-5
#>  7 Spanish        Female Other           464 0.0242    1.55e-6 0.00125   2.44e-3
#>  8 Spanish        Female US              269 0.0106    4.79e-7 0.000692  1.36e-3
#>  9 Spanish        Male   Other           404 0.0263    2.18e-6 0.00148   2.90e-3
#> 10 Spanish        Male   US              249 0.0113    6.06e-7 0.000779  1.53e-3
#> # ℹ 2 more variables: ci_l <dbl>, ci_u <dbl>