
This vignette describes how the NHANES 2015–2016 demographic data included in this package were obtained and processed, and how they are intended to be used. The data are adapted from the National Health and Nutrition Examination Survey (NHANES), conducted by the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC).

Disclaimer: The data sets provided in this package are derived from the NHANES database and have been adapted for educational purposes. As such, they are NOT suitable for use as a research database. For research purposes, you should download original data files from the NHANES website and follow the analysis instructions given there.

Data Preparation

The raw NHANES data were downloaded in SAS transport format (.xpt) and processed using R, with the following key steps:

  • Reading the demographic file (DEMO_I.xpt) using the haven package.
  • Selecting and renaming key demographic variables (e.g., gender, age, education, income) and survey design variables (strata, weights, PSU).
  • Recoding categorical variables using external code files for clarity (e.g., marital status, education level).
  • Labelling missing values and infrequent categories appropriately.
  • Saving the processed data frame as nhanes, which is then loaded with the package for easy access.
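The steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual processing script: the helper name prepare_demo, the local file path, and the subset of variables are assumptions made for the example, and the code-to-label mappings (1/2 for gender and interview language) should be checked against the official codebook.

```r
library(dplyr)

# Hypothetical helper: select, rename, and recode columns of a raw DEMO_I
# data frame (for example, one returned by haven::read_xpt()).
prepare_demo <- function(raw) {
  raw |>
    # Keep a few variables and give them readable names (new = old)
    select(
      PSU = SDMVPSU,
      weights = WTINT2YR,
      strata = SDMVSTRA,
      gender = RIAGENDR,
      age = RIDAGEYR,
      interview_lang = SIALANG
    ) |>
    # Turn numeric codes into labelled factors (mappings assumed from the codebook)
    mutate(
      gender = factor(gender, levels = c(1, 2), labels = c("Male", "Female")),
      interview_lang = factor(interview_lang, levels = c(1, 2),
                              labels = c("English", "Spanish"))
    )
}

# Assumed local path; the file is read only if it has already been downloaded.
xpt_path <- "DEMO_I.xpt"
if (file.exists(xpt_path)) {
  nhanes_sketch <- prepare_demo(haven::read_xpt(xpt_path))
}
```

Keeping the recoding in a function that takes a data frame makes it easy to try on a small hand-made data frame before running it on the full file.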

Data Structure

The included nhanes data frame contains 9,971 participants and 13 variables. Below is a summary of the variables:

Variable | Description | Original Name
PSU | Masked variance pseudo-PSU | SDMVPSU
weights | 2-year interview weight | WTINT2YR
strata | Masked variance pseudo-stratum | SDMVSTRA
gender | Gender (Male/Female) | RIAGENDR
age | Age in years at screening | RIDAGEYR
birth_country | Country of birth | DMDBORN4
marital_status | Marital status | DMDMARTL
interview_lang | Interview language | SIALANG
edu_level | Education level of the household reference person | DMDHREDU
household_size | Number of people in household | DMDHHSIZ
family_size | Number of people in family | DMDFMSIZ
annual_household_income | Annual household income | INDHHIN2
annual_family_income | Annual family income | INDFMIN2
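The three design variables (PSU, strata, weights) are what a design-based analysis would use. As a hedged sketch, assuming the survey package is installed, they could be combined into a design object as shown below; the helper name make_nhanes_design is hypothetical and not part of this package.

```r
library(survey)

# Hypothetical helper: build a survey design object from a data frame that
# has the PSU, strata, and weights columns described in the table above.
make_nhanes_design <- function(data) {
  svydesign(
    ids = ~PSU,          # masked pseudo-PSU
    strata = ~strata,    # masked pseudo-stratum
    weights = ~weights,  # 2-year interview weight
    nest = TRUE,         # PSU ids repeat across strata, so nest them
    data = data
  )
}

# Usage, assuming nhanes is available in the session:
# design <- make_nhanes_design(nhanes)
# svymean(~age, design)
```

The interview weight is suited to variables collected in the household interview, such as the demographic variables listed here. Given the disclaimer above, estimates produced this way are for teaching only.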

Example Usage

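# glimpse() and count() come from dplyr; nhanes is assumed to be available
# once this package is attached
library(dplyr)

# View the structure of the data
glimpse(nhanes)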
#> Rows: 9,971
#> Columns: 13
#> $ PSU                     <dbl> 1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, 1…
#> $ weights                 <dbl> 134671.370, 24328.560, 12400.009, 102717.996, …
#> $ strata                  <dbl> 125, 125, 131, 131, 126, 128, 120, 124, 119, 1…
#> $ gender                  <fct> Male, Male, Male, Female, Female, Female, Fema…
#> $ age                     <dbl> 62, 53, 78, 56, 42, 72, 11, 4, 1, 22, 32, 18, …
#> $ birth_country           <fct> US, Other, US, US, US, Other, US, US, US, US, …
#> $ marital_status          <fct> Married, Divorced, Married, Living with partne…
#> $ interview_lang          <fct> English, English, English, English, English, S…
#> $ edu_level               <fct> College graduate or above, High School, High S…
#> $ household_size          <dbl> 2, 1, 2, 1, 5, 5, 5, 5, 7, 3, 4, 3, 1, 3, 4, 2…
#> $ family_size             <dbl> 2, 1, 2, 1, 5, 5, 5, 5, 7, 3, 4, 3, 1, 3, 4, 2…
#> $ annual_household_income <dbl> 10, 4, 5, 10, 7, 14, 6, 15, 77, 7, 6, 15, 3, 4…
#> $ annual_family_income    <dbl> 10, 4, 5, 10, 7, 14, 6, 15, 77, 7, 6, 15, 3, 4…

# Count participants by education level
nhanes |>
  count(edu_level)
#> # A tibble: 7 × 2
#>   edu_level                     n
#>   <fct>                     <int>
#> 1 College degree             2908
#> 2 College graduate or above  2331
#> 3 High School                2015
#> 4 9-11th Grade               1200
#> 5 Less Than 9th Grade        1087
#> 6 Missing                     396
#> 7 Don't Know                   34
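The glimpse() output shows that the two income columns are stored as numeric category codes, including values such as 77. In the NHANES codebook, 77 and 99 conventionally stand for "Refused" and "Don't know"; assuming that convention applies here unchanged, a sketch like the following sets them to NA before tabulating. The helper name recode_income_missing is hypothetical.

```r
library(dplyr)

# Hypothetical helper: treat the special codes 77 and 99 in the income
# columns as missing. Assumes both columns are stored as doubles and that
# 77/99 mean "Refused"/"Don't know" (check the official codebook).
recode_income_missing <- function(data) {
  data |>
    mutate(across(
      c(annual_household_income, annual_family_income),
      \(x) if_else(x %in% c(77, 99), NA_real_, x)
    ))
}

# Usage, assuming nhanes is available in the session:
# nhanes |>
#   recode_income_missing() |>
#   count(annual_household_income)
```

The remaining values are still income bracket codes rather than dollar amounts, so they are better counted than averaged.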

Best Practices and References

  • For research: Always download the latest, official data directly from the NHANES website.
  • Documentation: Refer to the official NHANES code books for detailed variable definitions and survey methodology.
  • Acknowledgment: Data were obtained from the National Health and Nutrition Examination Survey (NHANES), conducted by the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC).

Further Information

Note: This vignette is intended to ensure transparency and proper attribution for the use of NHANES data in this package. Always consult the official NHANES documentation for authoritative guidance.