This vignette describes how the NHANES 2015–2016 demographic data included in this package were obtained and processed, and how they are intended to be used. The data are adapted from the National Health and Nutrition Examination Survey (NHANES), conducted by the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC).
Disclaimer: The data sets provided in this package are derived from the NHANES database and have been adapted for educational purposes. As such, they are NOT suitable for use as a research database. For research purposes, you should download original data files from the NHANES website and follow the analysis instructions given there.
Data Preparation
The raw NHANES data were downloaded in SAS transport format (.xpt) and processed using R, with the following key steps:
- Reading the demographic file (DEMO_I.xpt) using the haven package.
- Selecting and renaming key demographic variables (e.g., gender, age, education, income) and survey design variables (strata, weights, PSU).
- Recoding categorical variables using external code files for clarity (e.g., marital status, education level).
- Labelling missing values and infrequent categories appropriately.
- Saving the processed data frame as nhanes, which is then loaded with the package for easy access.
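The steps above might look roughly like the following sketch. It assumes the haven and dplyr packages are installed and that DEMO_I.xpt sits in the working directory. The helper name prepare_demo() is hypothetical, only a few of the 13 variables are shown, and the recoding is done inline here, whereas the package itself recodes from external code files.

```r
library(dplyr)

# Hypothetical helper: select, rename, and recode a raw DEMO_I data frame.
prepare_demo <- function(raw) {
  raw |>
    select(
      PSU     = SDMVPSU,   # masked variance pseudo-PSU
      weights = WTINT2YR,  # 2-year interview weight
      strata  = SDMVSTRA,  # masked variance pseudo-stratum
      gender  = RIAGENDR,
      age     = RIDAGEYR
    ) |>
    mutate(
      # RIAGENDR is coded 1 = Male, 2 = Female in the NHANES codebook.
      gender = factor(gender, levels = c(1, 2), labels = c("Male", "Female"))
    )
}

# Assumed file location; download DEMO_I.xpt from the NHANES website first.
if (file.exists("DEMO_I.xpt")) {
  nhanes <- prepare_demo(haven::read_xpt("DEMO_I.xpt"))
}
```

Keeping the transformation in a function that takes a data frame makes it easy to try on a small hand-made input before running it on the full file.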
Data Structure
The included nhanes data frame contains 9,971 participants and 13 variables. Below is a summary of the variables:
Variable | Description | Original Name |
---|---|---|
PSU | Masked variance pseudo-PSU | SDMVPSU |
weights | 2-year interview weight | WTINT2YR |
strata | Masked variance pseudo-stratum | SDMVSTRA |
gender | Gender (Male/Female) | RIAGENDR |
age | Age in years at screening | RIDAGEYR |
birth_country | Country of birth | DMDBORN4 |
marital_status | Marital status | DMDMARTL |
interview_lang | Interview language | SIALANG |
edu_level | Education level of the household reference person | DMDHREDU |
household_size | Number of people in household | DMDHHSIZ |
family_size | Number of people in family | DMDFMSIZ |
annual_household_income | Annual household income (category code) | INDHHIN2 |
annual_family_income | Annual family income (category code) | INDFMIN2 |
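Because the data frame keeps the survey design variables (PSU, strata, weights), design-based estimates can be computed from it. The following is a minimal sketch, assuming the survey package is installed; that package is not mentioned elsewhere in this vignette. The toy data frame only mimics the relevant column names so that the snippet runs on its own, and its values are invented.

```r
library(survey)

# Toy stand-in with the same design columns as `nhanes` (values are invented).
toy <- data.frame(
  PSU     = c(1, 1, 2, 2, 1, 1, 2, 2),
  strata  = c(119, 119, 119, 119, 120, 120, 120, 120),
  weights = c(100, 200, 150, 50, 300, 100, 250, 50),
  age     = c(62, 53, 78, 56, 42, 72, 11, 4)
)

# PSU ids are reused across strata, hence nest = TRUE.
design <- svydesign(
  ids     = ~PSU,
  strata  = ~strata,
  weights = ~weights,
  nest    = TRUE,
  data    = toy
)

# Weighted mean age with a design-based standard error.
mean_age <- svymean(~age, design)
```

With the packaged data, passing nhanes as the data argument in place of toy should work the same way, since the column names match.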
Example Usage
library(dplyr) # provides glimpse() and count()
# View the structure of the data
glimpse(nhanes)
#> Rows: 9,971
#> Columns: 13
#> $ PSU <dbl> 1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, 1…
#> $ weights <dbl> 134671.370, 24328.560, 12400.009, 102717.996, …
#> $ strata <dbl> 125, 125, 131, 131, 126, 128, 120, 124, 119, 1…
#> $ gender <fct> Male, Male, Male, Female, Female, Female, Fema…
#> $ age <dbl> 62, 53, 78, 56, 42, 72, 11, 4, 1, 22, 32, 18, …
#> $ birth_country <fct> US, Other, US, US, US, Other, US, US, US, US, …
#> $ marital_status <fct> Married, Divorced, Married, Living with partne…
#> $ interview_lang <fct> English, English, English, English, English, S…
#> $ edu_level <fct> College graduate or above, High School, High S…
#> $ household_size <dbl> 2, 1, 2, 1, 5, 5, 5, 5, 7, 3, 4, 3, 1, 3, 4, 2…
#> $ family_size <dbl> 2, 1, 2, 1, 5, 5, 5, 5, 7, 3, 4, 3, 1, 3, 4, 2…
#> $ annual_household_income <dbl> 10, 4, 5, 10, 7, 14, 6, 15, 77, 7, 6, 15, 3, 4…
#> $ annual_family_income <dbl> 10, 4, 5, 10, 7, 14, 6, 15, 77, 7, 6, 15, 3, 4…
# Count participants by education level
nhanes |>
  count(edu_level)
#> # A tibble: 7 × 2
#>   edu_level                     n
#>   <fct>                     <int>
#> 1 College degree             2908
#> 2 College graduate or above  2331
#> 3 High School                2015
#> 4 9-11th Grade               1200
#> 5 Less Than 9th Grade        1087
#> 6 Missing                     396
#> 7 Don't Know                   34
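In the glimpse() output above, the two income columns hold numeric category codes rather than dollar amounts (note the value 77). A minimal sketch of marking special codes as missing before summarising follows. It assumes that 77 and 99 are the "Refused" and "Don't know" codes, which should be checked against the official NHANES codebook; the helper name income_codes_to_na() is hypothetical.

```r
# Hypothetical helper: turn assumed special codes into NA.
income_codes_to_na <- function(x, special = c(77, 99)) {
  x[x %in% special] <- NA
  x
}

# First few values of annual_household_income from the output above.
income <- c(10, 4, 5, 10, 7, 14, 6, 15, 77, 7)
income_clean <- income_codes_to_na(income)
```

The remaining codes still denote income brackets, so they are better treated as categories than averaged as numbers.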
Best Practices and References
- For research: Always download the latest, official data directly from the NHANES website.
- Documentation: Refer to the official NHANES code books for detailed variable definitions and survey methodology.
- Acknowledgment: Data were obtained from the National Health and Nutrition Examination Survey (NHANES), conducted by the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC).