HCUP Sample Design: National Databases
Welcome
Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial on Sample Design of National Databases. This tutorial was created for researchers who are using HCUP national databases and who have some background in basic research methods.
In order for you to create accurate and unbiased estimates in your research, it is essential for you to understand the sampling methods of HCUP national databases.
In this tutorial, you'll learn how the three national HCUP databases are created from the state-level HCUP databases. The three national databases are: the Nationwide Inpatient Sample, or the NIS; the Nationwide Emergency Department Sample, or the NEDS; and the Kids' Inpatient Database, or the KID.
Because the HCUP national databases each serve a different purpose, each one is designed slightly differently. Understanding these differences and how they impact your research is critical to ensure your data estimates are accurate and unbiased, and that you draw sound conclusions.
About HCUP
Before we get started, a quick word about HCUP:
HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics.
If you are unfamiliar with HCUP or would like a refresher, please consider taking our HCUP general overview course.
Learning Objectives
The underlying goal of this module is to ensure that you select the best HCUP databases for your research. One important factor in doing so is understanding the sample designs of the national databases. By the end of this module, you will:
Understand the sample designs for each of the HCUP national databases
Understand how the sample designs can influence which database is best for your research
Avoid common errors that result from misunderstanding of sample design
Key Terms
Because this module is focused on sample design, there are a few key terms that are helpful to review. We will be tying this information to the HCUP database design later in the module.
The Target Universe is all people or entities (such as hospitals or emergency departments) that we wish to understand.
The Sample Frame is a subset of the target universe that we are able to study to make inferences about the target universe.
The Sample Strata are relatively homogeneous groups from the sample frame. A sample is selected from each stratum.
The Sample Unit is the level at which we sample within each stratum, such as at the discharge level or hospital level.
HCUP Databases

We have state- and national-level databases that reflect inpatient, emergency department, and ambulatory surgery care. The national databases are derived from our state-level databases. In other words, the state-level databases serve as the sample frame for the national databases: the NIS, the NEDS, and the KID.
The State Inpatient Databases, or SID, are a set of databases that include all inpatient hospital discharges from community hospitals in participating states.
The State Emergency Department Databases, or SEDD, are a set of databases that contain all treat-and-release hospital emergency department visits from community hospitals in participating states.
The State Ambulatory Surgery Databases, or SASD, are a set of databases that capture same-day surgeries performed at community hospitals and some freestanding ambulatory surgery centers.
Summary
In summary, HCUP has six types of databases that cover inpatient, emergency department, and ambulatory surgery data at state, regional, and/or national levels.
The next three sections will focus on the design of HCUP's three national-level databases:
The NIS
The NEDS
And the KID
Nationwide Inpatient Sample (NIS)
The Nationwide Inpatient Sample, or NIS, is a unique and powerful database of hospital inpatient stays. Researchers and policymakers use the NIS to identify, track, and analyze national and regional trends in hospital utilization, access, charges, and quality.
The NIS contains annual data from 1988 forward.
NIS Sample
The NIS is a stratified sample of hospitals from the State Inpatient Databases, or SID.
The Target Universe for the NIS is all US community hospitals. We define the target universe from the American Hospital Association (AHA) Annual Survey of Hospitals.
The sampling frame for the NIS is the State Inpatient Databases, which, in recent years, include 90% of the target universe from all states contributing data to HCUP.
The strata used in creating the NIS are US region, urban or rural location, teaching status, ownership, and bed size. These strata will be described in more detail later in this tutorial. Please note that because the NIS sample is not designed with "state" as a stratification variable, state-level analyses cannot be conducted. If you are interested in analyses by state, you should use the state-specific SID.
Once the hospitals have been stratified, a random sample of hospitals which approximates the target universe (all US community hospitals) is taken. This sample includes approximately 20% of US community hospitals. 100% of the discharges from each of the sampled hospitals are included in the NIS. This type of sampling design is referred to as a stratified, single-stage cluster sample. A stratified random sample of hospitals (clusters) is drawn and then all discharges are included from each selected hospital.
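The stratified, single-stage cluster design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure HCUP actually runs: the field names `hosp_id` and `stratum`, the toy frame, and the rule of taking at least one hospital per stratum are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_cluster_sample(hospitals, discharges, rate=0.20, seed=0):
    """Draw a stratified, single-stage cluster sample.

    hospitals:  list of dicts with hypothetical keys "hosp_id" and "stratum".
    discharges: list of dicts with a hypothetical key "hosp_id".
    """
    rng = random.Random(seed)

    # Group hospital IDs by stratum.
    by_stratum = defaultdict(list)
    for h in hospitals:
        by_stratum[h["stratum"]].append(h["hosp_id"])

    # Within each stratum, draw a simple random sample of hospitals (clusters).
    sampled = set()
    for ids in by_stratum.values():
        n = max(1, round(len(ids) * rate))  # assumption: at least one per stratum
        sampled.update(rng.sample(ids, n))

    # Single stage: keep every discharge from each sampled hospital.
    kept = [d for d in discharges if d["hosp_id"] in sampled]
    return sampled, kept


# Toy frame: 2 strata x 10 hospitals, 5 discharges per hospital.
hospitals = [{"hosp_id": i, "stratum": "A" if i < 10 else "B"} for i in range(20)]
discharges = [{"hosp_id": h["hosp_id"], "seq": k} for h in hospitals for k in range(5)]
sampled, kept = stratified_cluster_sample(hospitals, discharges)
print(len(sampled), len(kept))  # 4 hospitals, 20 discharges
```

Because whole hospitals are selected, discharges within the sample are clustered, which is why the design must be accounted for when estimating variances.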
NIS Strata
In creating the NIS, the first step was to stratify the SID hospitals according to five strata: geographic region, location, teaching status, ownership, and bed size.
The Geographic Region is Northeast, Midwest, West, or South as defined by the U.S. Census Bureau.
Practice patterns have been shown to vary substantially by region.
The Location is urban or rural.
Government payment policies often differ according to this designation. Also, rural hospitals are generally smaller and offer fewer services than urban hospitals. The classification of urban or rural hospital location is based on Core Based Statistical Area (CBSA) codes. Hospitals with a CBSA type of Metropolitan or Division are classified as urban, while hospitals with a CBSA type of Micropolitan or Rural are classified as rural.
The Teaching Status is teaching or non-teaching.
The missions of teaching hospitals differ from non-teaching hospitals. In addition, financial considerations differ between these two hospital groups. A hospital is considered a teaching hospital if it meets any one of the following three criteria:
Residency training approval by the Accreditation Council for Graduate Medical Education (ACGME)
Membership in the Council of Teaching Hospitals (COTH)
A ratio of full-time equivalent interns and residents to beds of 0.25 or higher
Ownership is Government non-federal (public), private not-for-profit (voluntary), or private investor-owned (proprietary).
Depending on their control, hospitals tend to have different missions and different responses to government regulations and policies. When there are enough hospitals of each type to allow it, the NIS stratifies hospitals as public, voluntary, and proprietary. For smaller strata the NIS uses a collapsed stratification of public versus private, with the voluntary and proprietary hospitals combined to form a single "private" category. For all other combinations of region, location, and teaching status, no stratification based on control is advisable, given the number of hospitals in these cells.
Hospital size or bed size is small, medium, or large.
Bed size categories were based on the number of hospital beds and were specific to the hospital's region, location, and teaching status. About one-third of the hospitals in a given region, location, and teaching status combination fall within each bed size category (small, medium, or large). The NIS uses different cutoff points for rural, urban non-teaching, and urban teaching hospitals because hospitals in those categories tend to be small, medium, and large, respectively.
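Two of the stratum definitions above are simple rules that can be written as code. The sketch below encodes the urban or rural classification from CBSA type and the three teaching criteria. The function names, argument names, and the example hospital record are assumptions made for illustration; they are not HCUP data element names.

```python
def classify_location(cbsa_type):
    """Map a CBSA type to the urban/rural location stratum."""
    if cbsa_type in ("Metropolitan", "Division"):
        return "urban"
    if cbsa_type in ("Micropolitan", "Rural"):
        return "rural"
    raise ValueError(f"unknown CBSA type: {cbsa_type!r}")


def is_teaching(acgme_approved, coth_member, resident_fte, beds):
    """A hospital is teaching if it meets any one of the three criteria."""
    # Ratio of full-time equivalent interns and residents to beds.
    ratio_ok = beds > 0 and resident_fte / beds >= 0.25
    return bool(acgme_approved or coth_member or ratio_ok)


# Hypothetical hospital record used only for illustration.
hospital = {"cbsa_type": "Micropolitan", "acgme": False, "coth": False,
            "resident_fte": 30.0, "beds": 100}
location = classify_location(hospital["cbsa_type"])
teaching = is_teaching(hospital["acgme"], hospital["coth"],
                       hospital["resident_fte"], hospital["beds"])
print(location, "teaching" if teaching else "non-teaching")  # rural teaching
```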
NIS Weight Variable
To produce national or regional estimates, the HCUP databases provide a "weight" variable that you can apply to your data. If you're interested in learning more about weighting the national databases, please access the HCUP tutorial on weighting.
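To make the role of the weight variable concrete, here is a small sketch that contrasts an unweighted record count with a weighted estimate. The key name `weight` and the weight values are invented for illustration; consult the database documentation for the actual weight variable in each file.

```python
def weighted_total(records, weight_key="weight"):
    """Estimate a national or regional total as the sum of record weights."""
    return sum(r[weight_key] for r in records)


# Made-up sample records; each weight is the number of discharges in the
# target universe that the sampled record is taken to represent.
records = [
    {"diagnosis": "asthma", "weight": 5.0},
    {"diagnosis": "asthma", "weight": 4.8},
    {"diagnosis": "other", "weight": 5.1},
]

asthma = [r for r in records if r["diagnosis"] == "asthma"]
unweighted_count = len(asthma)     # number of sample records only
estimate = weighted_total(asthma)  # estimated number of discharges
print(unweighted_count, round(estimate, 1))
```

The unweighted count describes the sample. Only the weighted sum is an estimate for the target universe.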
Sample Designs Change Over Time
Now that you understand the NIS sample design, you should know that revisions have been made to the NIS sample design that could affect estimates calculated from the NIS.
You should always check the NIS online documentation on the HCUP User Support website before starting your research project.
Over time there have been changes to the NIS. States have been added to the sampling frame.
In 1988, the NIS was based on 8 states. The more recent years of the NIS have 40+ states.
There were important sample design changes in 1998. The NIS excluded short-term rehabilitation hospitals from the frame, changed the definition of discharges, discontinued the preference for NIS hospitals that were in the sample in prior years, and redefined the hospital stratification variables for sampling.
The sample designs are refined over time in other databases as well. There is useful documentation on the HCUP User Support website that details how you can account for these sample design changes.
NIS Summary
In this section you have learned the following information about the NIS database:
The NIS is constructed from the State Inpatient Databases, or SID.
The NIS is a stratified sample of hospitals.
The NIS cannot be used to conduct state-level analyses.
Below, you can see a summary of the target universe, sample frame, sample strata, and sample unit for the NIS.
| Target Universe | All discharges from community, non-rehabilitation hospitals in the United States |
| Sample Frame | All discharges from community, non-rehabilitation hospitals in the participating HCUP Partner States |
| Sample Strata | US region, urban or rural location, teaching status, ownership, bed size |
| Sample Unit | Hospital |
Nationwide Emergency Department Sample (NEDS)
The Nationwide Emergency Department Sample, or NEDS, is a unique and powerful database of emergency department visits. Researchers and policymakers use the NEDS to identify, track, and analyze national and regional hospital emergency department care, utilization, access, charges, and quality.
The NEDS contains annual data from 2006 forward.
NEDS Sample
The NEDS is a stratified sample of hospitals from the State Emergency Department Databases, or SEDD, and the State Inpatient Databases, or SID.
The target universe for the NEDS is all US community hospital-based emergency departments. We define the target universe from the American Hospital Association (AHA) Annual Survey of Hospitals.
Both the State Emergency Department Databases, the SEDD, and the State Inpatient Databases, the SID, are used to construct the NEDS - or, in other words, they are the frame for the NEDS. The SEDD provides data on treat-and-release emergency department visits, which account for more than 80% of all emergency department visits. The SID provides data on the emergency department visits that resulted in an inpatient admission. The NEDS includes data on care that began in the emergency department regardless of whether the patient was treated and released or admitted to the hospital.
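The way the two state-level sources combine into one frame of emergency department visits can be sketched as follows. This is only an illustration of the idea: the record layout, the `began_in_ed` flag, and the `disposition` labels are hypothetical and do not correspond to actual HCUP data elements.

```python
def build_ed_frame(sedd_visits, sid_discharges):
    """Combine treat-and-release visits with ED visits that ended in admission."""
    # Every SEDD record is a treat-and-release emergency department visit.
    frame = [dict(v, disposition="treated_and_released") for v in sedd_visits]
    # From the SID, keep only inpatient stays that began in the ED
    # (hypothetical flag "began_in_ed").
    frame += [dict(d, disposition="admitted")
              for d in sid_discharges if d.get("began_in_ed")]
    return frame


sedd_visits = [{"visit_id": 1}, {"visit_id": 2}]
sid_discharges = [
    {"visit_id": 3, "began_in_ed": True},
    {"visit_id": 4, "began_in_ed": False},  # direct admission, not an ED visit
]
frame = build_ed_frame(sedd_visits, sid_discharges)
print([(r["visit_id"], r["disposition"]) for r in frame])
```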
The strata used in creating the NEDS are US region, urban or rural location, teaching status, ownership, and trauma-level. These strata will be described in more detail later in this tutorial. As in the NIS sample design, "state" is not included as a stratum; therefore state-level analyses cannot be conducted. If you are interested in analyses by state, you should use the state-specific SID or SEDD.
Once the hospital-based emergency departments have been stratified, a sample that approximates a 20% stratified sample of US hospital-based emergency departments (the target universe) is constructed. 100% of all emergency department visits from the selected hospitals are included in the NEDS. This type of sampling design is referred to as a stratified, single-stage cluster sample. A stratified random sample of hospitals (clusters) is drawn and then all emergency department visits are included from each selected hospital.
NEDS Strata
Like the NIS, the NEDS is also a stratified, single-stage cluster sample. The NEDS sampling design is very similar, in concept, to that of the NIS. The NEDS is constructed by categorizing hospitals according to five strata. The strata include geographic region, location, teaching status, ownership and trauma-level designation. Note that the five NEDS strata are the same as the NIS strata with one exception. In the NEDS sample design, the hospital bed size stratum is replaced by the trauma stratum. Emergency department bed size would have been a good stratifier to include, but AHRQ could not find a readily available and reliable source of emergency department bed size information.
The Geographic Region is Northeast, Midwest, West, or South as defined by the U.S. Census Bureau.
Practice patterns have been shown to vary substantially by region.
The Location is urban or rural.
Government payment policies often differ according to this designation. Also, rural hospitals are generally smaller and offer fewer services than urban hospitals. The classification of urban or rural hospital location is based on Core Based Statistical Area (CBSA) codes. Hospitals with a CBSA type of Metropolitan or Division are classified as urban, while hospitals with a CBSA type of Micropolitan or Rural are classified as rural.
The Teaching Status is teaching or non-teaching.
The missions of teaching hospitals differ from non-teaching hospitals. In addition, financial considerations differ between these two hospital groups. A hospital is considered a teaching hospital if it meets any one of the following three criteria:
Residency training approval by the Accreditation Council for Graduate Medical Education (ACGME)
Membership in the Council of Teaching Hospitals (COTH)
A ratio of full-time equivalent interns and residents to beds of 0.25 or higher
Ownership is Government non-federal (public), private not-for-profit (voluntary), or private investor-owned (proprietary).
Depending on their control, hospitals tend to have different missions and different responses to government regulations and policies. When there are enough hospitals of each type to allow it, the NEDS stratifies hospitals as public, voluntary, and proprietary. For smaller strata the NEDS uses a collapsed stratification of public versus private, with the voluntary and proprietary hospitals combined to form a single "private" category. For all other combinations of region, location, and teaching status, no stratification based on control is advisable, given the number of hospitals in these cells.
Trauma-level designation is a modified version of the Trauma Information Exchange Program, or TIEP, trauma-level designation. A trauma center is a hospital equipped to provide comprehensive emergency medical services to patients suffering traumatic injuries 24 hours a day, 365 days per year. The NEDS distinguishes between Trauma Levels one, two, and three.
Trauma designation is made by a state or local authority or verified by the American College of Surgeons:
Level I: Full range of specialists/equipment 24 hours a day, has surgical residency program, has program of research, referral resource for communities in nearby regions, 1,200 admissions a year
Level II: Comprehensive trauma care in collaboration with Level I center, essential specialties/equipment available 24 hours a day, not required to have teaching and research
Level III: Resources for resuscitation, surgery and intensive care but not full availability of specialists, transfer agreements with Level I and II centers
Level IV/V: Resources for advanced trauma life support in remote areas
NEDS Weight Variable
Similar to the NIS, to produce national or regional estimates, the HCUP databases provide a "weight" variable that you can apply to your data. If you're interested in learning more about weighting the national databases, please access the HCUP tutorial on weighting.
NEDS Summary
In this section you have learned the following information about the NEDS database:
The NEDS is constructed from the State Emergency Department Databases, or the SEDD, and the State Inpatient Databases, or the SID.
The NEDS is a stratified sample of hospital-based emergency departments.
The NEDS cannot be used to conduct state-level analyses.
Below, you can see a summary of the target universe, sample frame, sample strata, and sample unit for the NEDS.
| Target Universe | All emergency department visits from hospital-based emergency department units in community, non-rehabilitation hospitals in the United States |
| Sample Frame | All emergency department visits from hospital-based emergency department units in community, non-rehabilitation hospitals in the participating HCUP Partner States |
| Sample Strata | US region, urban or rural location, teaching status, ownership, trauma-level |
| Sample Unit | Hospital emergency department |
The Kids' Inpatient Database (KID)
The third national-level database, the Kids' Inpatient Database, or the KID, is specifically designed for pediatric research, particularly for the study of rare pediatric conditions. The KID is produced every three years starting with 1997 data. The way the KID is created is quite different from the way the NIS and the NEDS are created.
Children are not hospitalized that often and that's a good thing. In fact, the most common reason for a child to be hospitalized is for their own birth!
About two thirds of all pediatric hospital stays are for newborns. And, the vast majority of these newborn stays are uncomplicated, routine births. This is great from a public health perspective, but all these healthy, uncomplicated births overwhelm the database making it difficult to identify rare pediatric hospitalizations.
The KID is designed to accommodate research on rare pediatric conditions that require hospitalization, such as congenital anomalies, as well as rare pediatric medical procedures, such as heart surgery and organ transplantation.
While the NIS does include pediatric discharges, the NIS is not designed for research on rare pediatric inpatient hospitalizations - it's best to use the KID for this kind of research.
Note that the NEDS is well-suited for research on pediatric emergency care.
KID Sample
The KID is a stratified sample of discharges from the State Inpatient Databases, or SID.
The target universe for the KID is US community hospitals with pediatric discharges, again, based on the American Hospital Association (AHA) Annual Survey of Hospitals.
While the NIS and NEDS are samples of hospitals and hospital emergency departments, the KID is a sample of individual discharges of pediatric patients. The definition of pediatric is 20 years and under. The sampling frame for the KID is the same as the sampling frame for the NIS: the State Inpatient Databases, or SID. However, unlike the NIS, the KID includes a sample of pediatric discharges from all hospitals with pediatric stays in the sampling frame. Recall that the NIS includes a sample of hospitals not a sample of discharges.
For sampling, pediatric discharges are stratified into three categories: uncomplicated in-hospital births, complicated in-hospital births, and all other pediatric hospital stays.
Systematic random sampling is used to select 10% of uncomplicated in-hospital births and 80% of complicated in-hospital births and other pediatric cases from each frame hospital. This over-sampling of complicated births and pediatric non-births ensures that we get a good representation of rare pediatric hospitalizations. We do not need to sample many of the uncomplicated births because there is little difference in the characteristics of one uncomplicated birth compared to another uncomplicated birth. So we only need a small representation of uncomplicated births.
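The different sampling rates described above can be illustrated with a short sketch of systematic random sampling applied within one frame hospital. The rates come from the text, but everything else is an assumption made for illustration: the category labels, the record layout, the fractional-interval method, and the absence of any sort order applied before selection.

```python
import random
from collections import defaultdict

# Sampling rates stated in the text, by discharge category.
RATES = {
    "uncomplicated_birth": 0.10,
    "complicated_birth": 0.80,
    "other_pediatric": 0.80,
}


def systematic_sample(items, rate, rng):
    """Pick about rate * len(items) items at a fixed interval from a random start."""
    n = round(len(items) * rate)
    if n == 0:
        return []
    interval = len(items) / n        # fractional interval, >= 1 when rate <= 1
    start = rng.random() * interval  # random start within the first interval
    last = len(items) - 1
    return [items[min(int(start + i * interval), last)] for i in range(n)]


def sample_hospital(discharges, seed=0):
    """Apply the category-specific rates to one frame hospital's discharges."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for d in discharges:
        by_category[d["category"]].append(d)
    selected = []
    for category, rows in by_category.items():
        selected.extend(systematic_sample(rows, RATES[category], rng))
    return selected


# Toy hospital: 100 discharges in each category.
discharges = [{"id": f"{c}-{i}", "category": c} for c in RATES for i in range(100)]
picked = sample_hospital(discharges)
counts = {c: sum(1 for d in picked if d["category"] == c) for c in RATES}
print(counts)
```

With 100 discharges per category, the sketch keeps 10 uncomplicated births and 80 records from each of the other two categories, so the rarer and more varied stays dominate the sample.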
KID Strata
The KID is stratified by:
Uncomplicated in-hospital births
Complicated in-hospital births
And all other pediatric hospital stays.
Unlike the NIS or NEDS, the KID records are post-stratified in order to enable users to create national and regional estimates. The post-stratification variables are similar to those used in the NIS.
The records are post-stratified in proportion to the number of AHA newborns and the total number of non-newborn AHA discharges.
KID Weight Variable
In order to produce national or regional estimates of pediatric hospitalizations using the KID, discharge weights are developed using the American Hospital Association (AHA) target universe as the standard.
To do so, KID records are post-stratified by the same characteristics used to define the NIS sampling strata (US region, urban or rural location, teaching status, ownership, and bed size), with the addition of a stratum for freestanding children's hospitals.
The KID is stratified by freestanding children's or other hospitals. Children's hospitals restrict admissions to children, while other hospitals admit both adults and children. There may be significant differences in practice patterns, severity of illness, and available services between children's hospitals and other hospitals. Children's units in general hospitals are not stratified as children's hospitals.
If you're interested in learning more about weighting the national databases, please access the HCUP tutorial on weighting.
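The general idea behind post-stratified discharge weights can be sketched as a ratio of universe counts to sample counts within each stratum. This is a simplified illustration and not the KID weighting procedure itself: the stratum labels, the universe counts, and the single `stratum` key are invented, and the real weights are documented on the HCUP User Support website.

```python
from collections import Counter


def stratum_weights(universe_counts, sample_records):
    """Weight = discharges in the universe / discharges sampled, per stratum."""
    sample_counts = Counter(r["stratum"] for r in sample_records)
    return {s: universe_counts[s] / n for s, n in sample_counts.items()}


# Invented universe totals (standing in for AHA-based counts) and sample.
universe_counts = {"A": 1000, "B": 300}
sample_records = [{"stratum": "A"}] * 100 + [{"stratum": "B"}] * 60

weights = stratum_weights(universe_counts, sample_records)
print(weights)  # {'A': 10.0, 'B': 5.0}

# Summing the weights over the sample reproduces the universe total.
total = sum(weights[r["stratum"]] for r in sample_records)
print(total)  # 1300.0
```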
KID Summary
In this section you have learned the following information about the KID database:
The KID is constructed from the State Inpatient Databases, or SID.
The KID is a stratified sample of pediatric discharges: complicated births and non-births are over-sampled.
The KID cannot be used to conduct state-level analyses.
Below, you can see a summary of the target universe, sample frame, sample strata, and sample unit for the KID.
| Target Universe | Pediatric discharges from community, non-rehabilitation hospitals in the United States |
| Sample Frame | Pediatric discharges from community, non-rehabilitation hospitals in the participating HCUP Partner States |
| Sample Strata | Uncomplicated births, complicated births, all other pediatric hospital stays |
| Sample Unit | Pediatric discharges |
Common Errors
There are some mistakes that are easy to make when working with the HCUP national databases. Understanding the sample design of each database will help you avoid these errors.
Not weighting when attempting to produce national and/or regional estimates.
One of the most common errors is not weighting the NIS, NEDS, and KID data when attempting to produce national and/or regional estimates. Remember that these national databases are based on samples - they must be weighted to derive national and/or regional estimates. If you do not weight the data, what you have are actual record counts, not national and/or regional estimates.
Reporting cell sizes less than 11.
A serious violation occurs if users report cell sizes less than 11 in their publications. Remember that you signed an HCUP Data Use Agreement (DUA) that prohibits you from reporting any cell sizes less than 11. This is required as a privacy precaution. From a sample design perspective, any estimate based on such a low count is unlikely to be reliable anyway. If you'd like a refresher on the HCUP DUA, please consider reviewing the HCUP DUA module - it's only 15 minutes in length.
Producing state-level estimates from the national databases.
Another error is that new users sometimes attempt to produce state-level estimates from the national databases. Remember that none of the HCUP national databases has a sample design that includes "state" as a stratification variable. Only national and regional estimates should be produced from the national databases. Trying to produce state-level estimates from the NIS, NEDS, or KID could yield biased results.
Choosing the wrong HCUP database for a particular research question.
New users sometimes use the inappropriate database for a particular study. For example, remember to use the KID, rather than the NIS, for your research on rare pediatric conditions as the sample design of the KID is specifically created to accommodate rare pediatric research. Also, take caution when using any of the HCUP national databases for race-related research as race data are not uniformly available across the HCUP state databases - or, put another way, across the "sampling frame."
Not using appropriate statistical software to work with the national databases.
Sometimes users try to work with the HCUP national databases in software packages that are not designed to account for complex survey design, such as Microsoft Excel. You must use statistical software, such as SAS, SUDAAN, or Stata, that can handle data derived from complex sampling designs. This is important because analyses that fail to account for the sample design could yield biased estimates - and may have a direct impact on your variance calculations.
Not checking estimates against other data sources.
Users sometimes neglect to check their estimates against other data sources. At a minimum, it is recommended that you check your estimates against HCUPnet (which is a free online query system with access to HCUP data).
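One of the errors listed above, reporting cell sizes less than 11, is easy to guard against in code before a table leaves your analysis environment. The sketch below masks any count under the threshold. The function name, the `*` marker, and the example table are assumptions, and the Data Use Agreement remains the authoritative statement of what may be reported.

```python
MIN_CELL = 11  # threshold stated in the text: do not report cell sizes less than 11


def suppress_small_cells(table, min_cell=MIN_CELL, marker="*"):
    """Replace any count below min_cell with a suppression marker."""
    return {cell: (count if count >= min_cell else marker)
            for cell, count in table.items()}


# Invented record counts by category.
table = {"condition_a": 25, "condition_b": 10, "condition_c": 11}
print(suppress_small_cells(table))
# {'condition_a': 25, 'condition_b': '*', 'condition_c': 11}
```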
Key Differences in Sample Design
When looking at key differences in sample design amongst the NIS, NEDS, and KID, remember that each database has a unique purpose and that the target universe, frame, strata, and unit of each database differ. The table below highlights those differences.
| | NIS | NEDS | KID |
| Target Universe | All discharges from community, non-rehabilitation hospitals in the United States | All emergency department visits from hospital-based emergency department units in community, non-rehabilitation hospitals in the United States | Pediatric discharges from community, non-rehabilitation hospitals in the United States |
| Sample Frame | All discharges from community, non-rehabilitation hospitals in the participating HCUP Partner States | All emergency department visits from hospital-based emergency department units in community, non-rehabilitation hospitals in the participating HCUP Partner States | Pediatric discharges from community, non-rehabilitation hospitals in the participating HCUP Partner States |
| Sample Strata | US Region, urban or rural location, teaching status, ownership, bed size | US Region, urban or rural location, teaching status, ownership, trauma-level | Uncomplicated births, complicated births, all other pediatric hospital stays |
| Sample Unit | Hospital-level | Hospital emergency department-level | Pediatric discharges |
Key Points
As you begin your work with the HCUP national databases, you will want to keep in mind the following key points:
It's important to select the appropriate database given your research question. The sample design of the HCUP national databases can influence which HCUP database is best suited for your research.
All three national databases are derived from the HCUP State-level Databases.
You should not use the national databases for state-specific questions. You should get state-level data from state-specific databases only.
The NIS and NEDS are developed based on a stratified sample of hospitals. In contrast, the KID is based on a stratified sample of pediatric discharges.
Resources and Other Training
If you are looking for more information on the subject matter covered here, many resources are available on the HCUP User Support website.
If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has senior research personnel available to answer technical questions you may have.
Thank you for accessing this module. There are several other HCUP online tutorials. Take a look to see if there are other topics that could be helpful to you.
If you have any feedback regarding this module, please email us at hcup@ahrq.gov.