HEALTHCARE COST & UTILIZATION PROJECT


Load and Check HCUP Data - Accessible Version





Welcome

Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial.

My name is Sue, and I'm going to show you how to get started with your HCUP research.

In this module I'll walk you through how to properly load HCUP data onto your computer and how to check that the data have loaded correctly.

These are the first steps toward conducting successful analyses with HCUP data.

This module is for individuals who have completed the HCUP Data Use Agreement Training, signed the HCUP Data Use Agreement, obtained their copy of the HCUP data, and are ready to begin their research. This tutorial will take approximately 20 minutes to complete.


About HCUP

Before we get started, a quick word about HCUP:

HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics.

If you are unfamiliar with HCUP or would like a refresher, please consider taking our General Overview Course.


Learning Objectives

There are two learning objectives for this module.

The first objective is to save the HCUP data products to your computer, unzip (or decompress) HCUP data, and then load the HCUP data into a standard statistical software package.

The second objective is to learn how to verify that you have correctly loaded the data onto your computer. I'll show you how to run a few basic programs to generate summary statistics, which you can check against the summary statistics available on the HCUP-US website.


Data File Contents

HCUP data products can be purchased from the online HCUP Central Distributor.

HCUP databases that are purchased from the HCUP Central Distributor are delivered in one of two ways:

  • Nationwide Databases are downloaded directly from the online HCUP Central Distributor
  • State Databases are shipped to you by the HCUP Central Distributor on DVDs using next day or 2-day service

To get started, let's review what you received from the online HCUP Central Distributor, and exactly what you'll be saving to your computer.

All Nationwide databases are delivered through a digital download from the online HCUP Central Distributor ordering website, and bundled into a single delivery zip file. All State databases are delivered on DVDs.

Regardless of the delivery format, your HCUP data products will arrive as zip files, a compressed, encrypted format that requires a password to unzip.

Depending on the database and year, the number and types of files will vary. The full description of all HCUP data files can be found on the HCUP-US website for each database.

For your reference, the information I'm about to present can also be found in the introductory documentation that came with your data products.

Nationwide data products are downloaded electronically in a single zip delivery file. State data products are delivered on physical media. Contents of the data delivery files are:

  • Database zip files
  • Documentation files

When you download any Nationwide data product from the online HCUP Central Distributor, your download is delivered in a single zip file which must be unzipped in order to access the full collection of related files making up your product set. The zip file and its nested database zip files all use the same password which was emailed to you.

If you are using the NIS, your downloaded zip file will contain fixed-width ASCII data files that are compressed and encrypted. Every NIS product set contains the Inpatient Core File, which is the main data file with over 7 million discharge records, and the Hospital Weights File, which can be used to weight your results up to the hospital universe. Depending on the year you ordered, the product set may also include Disease Severity Measures Files, which provide severity measures for the NIS, and a Diagnosis and Procedure Groups File, which is used to group diagnoses and procedures into different clinical categories.

The specific contents of the files by year are available on the NIS HCUP-US File Specifications page.

To load HCUP NIS data onto your computer and analyze them, you will need at least 15 gigabytes of available storage; the exact amount depends on which analysis software you plan to use. Detailed specifications are described in the NIS Overview located on the HCUP-US website.

The National (Nationwide) Inpatient Sample (NIS) is available annually

NIS ASCII Files | Unit of Observation | Contents
Inpatient Core File | Discharge-level | All discharges from the sample of hospitals in participating states - over 7,000,000 records
Hospital Weights File | Hospital-level | One record for each hospital included in the NIS, linkable to Inpatient Core File
Disease Severity Measures File | Discharge-level | Disease severity measures linkable to Inpatient Core File
Diagnosis and Procedure Groups File | Discharge-level | Diagnosis and procedure groups linkable to Inpatient Core File

If you are working with the KID, your downloaded zip file will contain compressed, encrypted ASCII files including the same types of files: an Inpatient Core File containing between 2 and 3 million records, a Hospital Weights File (depending on the year), a Disease Severity Measures File, and a Diagnosis and Procedure Groups File.

The specific contents of the files by year are available on the KID HCUP-US File Specifications page.

To load HCUP KID data onto your computer and analyze them, you will need at least 10 gigabytes of available storage. Detailed specifications are described in the KID Overview located on the HCUP-US website.

Kids' Inpatient Database (KID) is available every three years

KID ASCII Files | Unit of Observation | Contents
Inpatient Core File | Discharge-level | Pediatric discharges sampled from HCUP hospitals - 2,000,000 to 3,000,000 records
Hospital Weights File | Hospital-level | One record for each hospital included in the KID, linkable to Inpatient Core File
Disease Severity Measures File | Discharge-level | Disease severity measures linkable to Inpatient Core File
Diagnosis and Procedure Groups File | Discharge-level | Diagnosis and procedure groups linkable to Inpatient Core File

If you are working with the NEDS, your downloaded zip file will contain compressed, encrypted CSV files including a Core File, with over 30 million emergency department records, and a Hospital Weights File. The NEDS also has a Supplemental Emergency Department File, which contains information on procedures performed in the emergency department for treat-and-release patients, and a Supplemental Inpatient File, which contains additional data elements for patients admitted to the hospital after an emergency department visit.

The specific contents of the files by year are available on the NEDS HCUP-US File Specifications page.

Because the NEDS is such a large database, you should have 75-100 gigabytes of storage space available on your computer in order to be able to work comfortably with the NEDS. Detailed specifications are described in the NEDS Overview located on the HCUP-US website.

Nationwide Emergency Department Sample (NEDS) is available annually
NEDS CSV Files | Unit of Observation | Contents
Core File | Discharge-level | All ED visits from the sample of hospitals in participating states - over 30,000,000 records
Hospital Weights File | Hospital-level | One record for each hospital included in the NEDS, linkable to Core File
Supplemental Emergency Department File | Discharge-level | Information on procedures that were performed in the ED for treat-and-release visits
Supplemental Inpatient File | Discharge-level | Information on inpatient admissions after ED visits, linkable to Core File

If you are working with the NRD, your downloaded zip file will contain compressed, encrypted CSV files including a Core File, Hospital Weights File, Severity Measures File, and Diagnosis and Procedure Groups File.

The specific contents of the files by year are available on the NRD HCUP-US File Specifications page.

To load HCUP NRD data onto your computer and analyze them, you will need at least 50 gigabytes of available storage space. Detailed specifications are described in the NRD Overview located on the HCUP-US website.

Nationwide Readmissions Database (NRD) is available annually
NRD CSV Files | Unit of Observation | Contents
Core File | Discharge-level | Contains data elements critical to readmission analyses
Hospital Weights File | Hospital-level | Contains information on hospital characteristics
Severity Measures File | Discharge-level | Contains additional data elements to aid in identifying the severity of the condition for a specific discharge (e.g., comorbidity flags, 3M All-Patient Refined Diagnosis-Related Group (APR-DRG) value, risk of mortality, and severity)
Diagnosis and Procedure Groups File | Discharge-level | Contains additional information on the diagnoses (e.g., chronic condition indicators) and procedures (e.g., procedure class)

If you are working with one of the state databases, the files you receive depend on the state, the year and the database you are using. The files always include the Core File, and may also include a Charges File, AHA Linkage File, Disease Severity Measures Files, and Diagnosis and Procedure Groups File.

The specific contents of the files by state, year and database are available on the HCUP-US File Specifications page accessible through the link on the screen. Note that the number of DVDs you receive will vary by state, year, and database.

State databases are always in ASCII format, compressed and encrypted to fit on the physical media.

State Inpatient Database (SID), State Ambulatory Surgery and Services Database (SASD), and State Emergency Department Database (SEDD)
ASCII Files | Unit of Observation | Contents
Available across all databases, states, and years:
Core File | Discharge-level | Data elements that form the nucleus of the database
Charges File | Discharge-level | Information on charges associated with visit, linkable to core files
AHA Linkage File | Hospital-level | Hospital information linkable to core files
Available for selected databases, states and years:
Disease Severity Measures File | Discharge-level | Disease severity measures linkable to core files
Diagnosis and Procedure Groups File | Discharge-level | Diagnosis and procedure groups linkable to core files


HCUP-US Documentation

No matter which database you are working with, all the documentation and tools you will need to use your HCUP data files are online at HCUP-US.

Let's go to the HCUP-US website right now, and I'll show you the documentation available to you. It's located in the "Databases" section of the website, under "Database Documentation".

Today, we'll be using the NIS Inpatient Core File to demonstrate the load and check processes, so I'll take a look at the NIS database documentation.

The database documentation includes a detailed introduction to the database, descriptions and lists of the available data elements, file specifications, the programs needed to load the data, and analytic tools designed specifically for use with the HCUP data.

If you're just getting started using HCUP data, you may want to begin with the Introduction document, which is available for each database, located on the specific database documentation page: Introduction to the NIS, Introduction to the KID, Introduction to the NEDS, Introduction to the NRD, Introduction to the SID, Introduction to the SASD, and Introduction to the SEDD.

These documents can all be found on the left-hand side of the appropriate database documentation page. These introductory documents contain much of the information I'm reviewing with you today, such as the size and structure of the HCUP database files.


Decompressing Data Overview

To load HCUP data onto your computer and analyze them, you'll need to have 15 to 100 gigabytes of space available, depending on the database.
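Before saving anything, it can help to confirm that the target drive has enough room. The following is a minimal Python sketch, separate from the SAS workflow this tutorial demonstrates. The function names are hypothetical, and the gigabyte figures are the minimums quoted earlier in this module (the lower bound of 75 is used for the NEDS).

```python
import shutil

def free_gigabytes(path="."):
    """Return the free space, in gigabytes (10**9 bytes), on the drive holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 10**9

def has_room_for(required_gb, path="."):
    """True if the drive holding `path` has at least `required_gb` gigabytes free."""
    return free_gigabytes(path) >= required_gb

# Minimum storage figures quoted in this module, checked against the current drive.
for name, need_gb in [("NIS", 15), ("KID", 10), ("NRD", 50), ("NEDS", 75)]:
    status = "ok" if has_room_for(need_gb) else "not enough space"
    print(f"{name}: needs at least {need_gb} GB -> {status}")
```

Pass the folder that will hold the data (for example, the NIS 2013 folder) as `path` to check the drive that folder lives on.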

Because of the size of the HCUP database files, the files are compressed and encrypted with SecureZIP® from PKWARE.

To begin, you must save the data to your hard drive before you unzip and decrypt the data.

The steps involved in saving the delivered files to your hard drive may differ slightly depending on whether you're working with Nationwide or State databases, but the process of decompressing the data is the same for each of the HCUP databases.


Decompressing the Data Step by Step

I'm going to walk through the steps involved in saving the zip files to my hard drive and then unzipping the data. If you are extracting the files onto a Macintosh®, you will need a program such as SecureZIP® for Mac or StuffIt Expander®. If you are in need of assistance in using software packages other than those used in this demonstration, please contact HCUP Technical Assistance at hcup@ahrq.gov.

Please note that the HCUP data files are encrypted zip files that cannot be decrypted by the built-in zip/unzip utilities that come with Windows or Macintosh operating systems (such as Archive Utility on the Macintosh). Unzip programs are available from several reputable vendors:

  • ZIP Reader® (Windows) - PKWARE corporation
  • SecureZIP® for Mac (Mac) - PKWARE corporation
  • WinZip® (Windows) - WinZip corporation
  • StuffIt Expander® (Mac) - Smith Micro corporation

The process I'm about to walk through is for a Nationwide digital download, and is just one of several ways to go about loading the data file from either a DVD or a digital download onto a computer. If you're familiar with other means of accomplishing the same steps, feel free to use those in place of what I'm about to show you. Just make sure to always check your work, as I will demonstrate throughout this tutorial, to confirm that the data have decompressed, saved and loaded correctly. So, let's get started.

Here in the hard drive directory, I've created a "data" folder to hold the HCUP database products. Today I'm working with the 2013 NIS, so I'll name my directory "NIS 2013".

Next, I will download the NIS 2013 file from the online HCUP Central Distributor website. I have logged in to my online account, and displayed the "Order History". I will click "View Downloads" for my "Order".

This order contains two product files: the small CCR supplemental file, and the main NIS Nationwide 2013 product file. For this demonstration, I will save the NIS 2013 file to the folder I already created. Clicking the "download" link triggers the browser's download widget, which will vary by browser.

Clicking the arrow by the word "Save As", I will place the zip file in the NIS 2013 folder I created.

If your browser does not offer a "Save As" option, the file will automatically download to a location on your computer from which you can later unzip it.

Downloading the file will take some time depending on your internet connection speed.

Once the delivery zip file has completely downloaded, you will be able to see the NIS 2013 zip file in the destination folder.

I will choose "Open with WinZip" to see the files contained in the delivery zip.

The zip dialog window should open. I will be prompted for the password before I can see the contents.

The WinZip dialog opens showing the contents of the delivery zip file.

Use the Unzip or Extract function to extract the files from the delivery zip file to the location desired.

I will enter the decryption password when prompted. Purchasers received this password in an email from the HCUP Central Distributor.

The zip utility will extract all the files and place them in the folder I created. A progress window may display as the files unzip.

When the extraction function is completed, the files will display in my folder.

The zip files in this folder are data files I will unzip and load into the analysis software.

The other method to save the data to my computer is from a DVD.

If my data products are on a DVD, I will insert the DVD into my disk drive and open up the DVD directory on my computer.

The DVD directory will display the files that are to be copied to my hard drive. Select all of the files and select "Copy".

I will copy the files to the folder I created on my hard drive. Once the files are copied to my folder, open up the folder.

From this point, the steps to extract the databases are the same regardless of which database you are working with. You will usually extract the "Core" database file first.

In this example, in the NIS 2013 folder, I will select the NIS_2013_Core.zip file.

I will be using WinZip to open the file. Your Zip utility may have a different appearance and different options.

In the WinZip dialog box, I will unzip the Core file using the Unzip button. When I click Unzip, I am prompted for the password to decrypt the file.

I'll enter the same password I used for the delivery zip file earlier, and click OK. The file begins extracting.

When the Core ASCII file is extracted, the newly extracted file appears in my folder. This is the Core data file I will load into my analysis software.

Remember that you are responsible for the security of your HCUP data and the Data Use Agreement requires the data to be stored in a safe place. Loading this data onto a LAN where other users have access is not allowed unless the other users have all signed an HCUP Data Use Agreement.


Load into Statistical Software

Now that I've saved the data onto my hard drive, I have to load it into a statistical software package in order to work with it. The HCUP-US website offers load programs in SAS, SPSS and Stata because these are some of the most frequently-used packages, but there are other statistical software programs on the market that can also be used.

For the HCUP state databases, load programs are available in SAS, SPSS, and (beginning in 2014) Stata. Note that HCUP data files cannot be analyzed using desktop spreadsheet or database applications because of their size and complexity.

I'm going to demonstrate loading the NIS 2013 Core File using SAS. Navigate to the NIS database documentation page on HCUP-US. Halfway down on the left side you will see "File Specifications and Load Programs". I will click on the "Nationwide SAS Load Programs".

I will pull up the "2013 NIS" file from the drop down menu and then I will click on the "SAS NIS 2013 Core File load program" to download the program to my computer.

This will open the load program in a new window with text that needs to be copied and pasted into SAS. Open SAS and copy this text into SAS. Note that the version of SAS you see here may differ from the version you're using. While the layout and icons may be different, the code you are using will be the same regardless of the version.

I will save the file to my local hard drive in the NIS 2013 folder.

Once the program is open, I need to assign a library as I do every time I run a SAS program.

I will use the libname command to set the library to the same location where I have stored the ASCII file—that is, the NIS 2013 folder on my hard drive.

libname NIS 'c:\NIS 2013\';

When I am calling the file into SAS, I need to use the assigned libname as a prefix to the name of the data set in order to let SAS know where to store the data set.

DATA nis.NIS_2013_Core;

I also need to indicate where on the computer SAS should look to find the ASCII file I want to load.

INFILE 'c:\NIS 2013\NIS_2013_Core.ASC'

Now that I've made the 2 modifications needed for the program to know where my data can be found, I'm going to scroll through the rest of the load program to see it in its entirety.
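Because the NIS Core File is fixed-width ASCII, a load program's main job is to map column positions to data elements. As a rough illustration outside SAS, here is a minimal Python sketch of that idea. The column positions and the missing-value codes below are invented for the example; the real ones come from the load program and file specifications on HCUP-US.

```python
# Hypothetical layout: (name, start column, end column), 1-based and inclusive.
# These positions are invented for illustration, not taken from an HCUP load program.
LAYOUT = [
    ("AGE", 1, 3),
    ("AWEEKEND", 4, 5),
    ("LOS", 6, 10),
]

# Assumed sentinel values that mark missing data in the raw text.
MISSING_CODES = {"", "-9", "-99", "-9999"}

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of integers (None when missing)."""
    record = {}
    for name, start, end in layout:
        raw = line[start - 1:end].strip()
        record[name] = None if raw in MISSING_CODES else int(raw)
    return record

# Two made-up records in the hypothetical layout above.
sample_lines = [
    " 67 0    4",
    " 34-9   12",
]
records = [parse_record(line) for line in sample_lines]
print(records)
```

If a record is sliced with the wrong positions, for example because the load program is for a different year, values land in the wrong data elements, which is exactly what the checks later in this module are meant to catch.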

Once I've finished looking through the program, I'll select run and submit.

The program takes a few minutes to run.

When the run is completed, SAS generates a log file of the program. I will check this log file to make sure that I do not see any error messages.

If there is an error message, I may need to double check my work to this point.

If there are no error messages, I will see notes indicating how long it took SAS to load the file, in this case nearly 3 minutes, as well as notes describing each step the program executed.
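For a long log, scanning by eye is error-prone. If the log has been saved to a text file, a small script can pull out the lines worth reading. This is a minimal Python sketch that assumes problem lines begin with the word ERROR or WARNING; the function name and the sample log text are made up for illustration.

```python
def scan_sas_log(log_text):
    """Collect (line number, text) pairs for ERROR and WARNING lines in a SAS log."""
    problems = {"ERROR": [], "WARNING": []}
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.strip()
        for level in problems:
            if stripped.startswith(level):
                problems[level].append((number, stripped))
    return problems

# Made-up log excerpt for illustration only.
sample_log = """\
NOTE: The infile 'NIS_2013_Core.ASC' is:
WARNING: Example warning text.
NOTE: DATA statement used (Total process time):
ERROR: Example error text.
"""
found = scan_sas_log(sample_log)
print(len(found["ERROR"]), "error(s),", len(found["WARNING"]), "warning(s)")
```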

Once I've taken a look at the log file, I want to check the data to make sure it loaded properly.

The files from each of the databases will look quite similar, although the data elements and number of records included in each will differ greatly.

Note that actual data are not shown in this tutorial. This is an illustration of what the SAS option in this example may look like.

Right now I am scrolling across each of the data elements that were loaded. These include patient characteristics, diagnosis codes, procedure codes, and other information about the hospitalization.

I also want to scroll down through the records to make sure there are 7 million 119 thousand 563 records.

Yes, there are.
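The record count can also be cross-checked against the raw ASCII file itself. The Python sketch below assumes the file holds one record per line and that every record ends with a newline. The function names are hypothetical, and the expected count is the 2013 NIS figure quoted above.

```python
import os
import tempfile

EXPECTED_RECORDS = 7_119_563  # 2013 NIS Core File count quoted in this tutorial

def count_records(path, chunk_size=1024 * 1024):
    """Count newline-terminated records in a file, reading it in chunks."""
    count = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count

def record_count_matches(path, expected=EXPECTED_RECORDS):
    """True when the file at `path` holds exactly the expected number of records."""
    return count_records(path) == expected

# Tiny demonstration on a throwaway three-record file.
with tempfile.NamedTemporaryFile("w", suffix=".ASC", delete=False) as tmp:
    tmp.write("record one\nrecord two\nrecord three\n")
demo_count = count_records(tmp.name)
os.remove(tmp.name)
print(demo_count)
```

Reading in chunks keeps memory use small, which matters for a file of several gigabytes.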


Approach

I've loaded the data into my statistical software, and I'm ready for the next step. What I need to do now is check that I've loaded the data correctly.

First, let's verify that I've saved the file in the appropriate location on my "C" drive. The file should be in the NIS 2013 folder and I should see that it's 3 million 674 thousand 752 Kilobytes.
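The size check can be scripted as well. This minimal Python sketch assumes the size is reported in kilobytes of 1,024 bytes, rounded up, which is how file managers commonly display it; that rounding rule and the function names are assumptions. The expected figure is the one quoted above.

```python
import math
import os
import tempfile

EXPECTED_KB = 3_674_752  # size quoted in this tutorial for the 2013 NIS file

def size_in_kilobytes(path):
    """File size in kilobytes (1 KB = 1,024 bytes), rounded up to a whole KB."""
    return math.ceil(os.path.getsize(path) / 1024)

def size_matches(path, expected_kb=EXPECTED_KB):
    """True when the file at `path` has exactly the expected size in KB."""
    return size_in_kilobytes(path) == expected_kb

# Tiny demonstration on a throwaway 2,049-byte file, which rounds up to 3 KB.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"x" * 2049)
demo_kb = size_in_kilobytes(tmp.name)
demo_ok = size_matches(tmp.name, expected_kb=3)
os.remove(tmp.name)
print(demo_kb, demo_ok)
```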

Note that now that the file has been loaded into the statistical software, you may want to delete the original ASCII version as well as the ZIP file to keep them from taking up too much space on your computer.

Next, to check the data, I'm going to go back to HCUP-US and pull some summary statistics files. I'll compare the summary statistics files from HCUP-US to some basic frequencies and distributions I'll run on the data I've just loaded.


Summary Statistics

Let's go back to the NIS Data Documentation section of HCUP-US, and scroll a bit down the page.

Under the heading "Data Elements" there is a bullet with a link to "NIS Summary Statistics".

Let's click on this link to open the Summary Statistics page. Note that if you are working with a database other than the NIS, go to the appropriate Data Documentation screen and choose the summary statistics which correspond to the database you've loaded.

Right now, I'm only interested in the statistics for the "unweighted Core File".

I will open up the "unweighted Core File" and take a look.

As you can see, this file provides basic statistics on each of the data elements in the NIS: the "N", or number of records which contain the data element; the "N Miss", or number of records from which the data element is missing; the "minimum" and "maximum" values of the data element; the "mean"; and the "standard deviation".

Then, for each data element, I have information on the frequency and percent distribution of discharges by specific data element.

For example, if I look at AWEEKEND, the data element which stores the information on whether the admission occurred on a weekend or a weekday, I see that of the 7 million 119 thousand 563 records in the NIS, about 20 percent - 1 million 438 thousand 800 - are coded as having been admitted on the weekend.

I can see that 188 records did not have anything coded for AWEEKEND.

These frequency distribution tables are available for each of the data elements in the NIS in the Summary Statistics PDF file we just downloaded from the HCUP-US website.


Running Check Programs

In order to check the data, I'll create tables of means and of frequency distributions from the data I've loaded on my computer and then compare those statistics to the summary statistics available on HCUP-US.

If the numbers match the HCUP summary statistics I downloaded from HCUP-US, I'll know the data have loaded correctly. If not, I'll know there's a problem and I'll have to go back and figure out what I've done wrong.

To create the tables, I'll go back to SAS. I'll start with a short program to check the means of the data elements.

Submit the SAS program.

This will generate a table which should match the first table we saw in the Summary Statistics file.

Now, I'll compare the output of our SAS program to the information in the Summary Statistics file. I can see that so far it looks good.
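The SAS program itself is shown on screen in the original module and is not reproduced in this transcript. To make the comparison concrete, here is a minimal Python sketch that computes the same six columns as the Summary Statistics table for one data element. It uses toy values rather than HCUP data, marks missing values with None, and assumes the published standard deviation is the sample standard deviation (n - 1 in the denominator).

```python
import statistics

def summarize(values):
    """Return N, N Miss, minimum, maximum, mean and sample standard deviation.

    `None` marks a missing value; at least two non-missing values are required.
    """
    present = [v for v in values if v is not None]
    return {
        "N": len(present),
        "N Miss": len(values) - len(present),
        "Minimum": min(present),
        "Maximum": max(present),
        "Mean": statistics.mean(present),
        "Std Dev": statistics.stdev(present),
    }

# Toy values for illustration, not real HCUP data.
toy_los = [2, 4, None, 6, 8]
summary = summarize(toy_los)
print(summary)
```

Note that N plus N Miss should always equal the total number of records in the file.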

Next, I'll generate some tables to check the frequency distributions of various data elements. Since I've looked at the AWEEKEND table in the Summary Statistics File, I will start with the AWEEKEND data element.

I will need to run the frequency on AWEEKEND.

Submit the SAS program.

My output should match exactly the data I saw in the Summary Statistics File. It looks like it checks out.

I recommend running frequency distributions on a few other data elements as well.
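The logic of a frequency check can be sketched in a few lines of Python. This example uses toy values rather than HCUP data and assumes that percentages are computed over non-missing values only, with missing values counted separately; the function name is hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Frequency and percent for each non-missing value, plus the count of missing values."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    table = {
        value: {"Frequency": n, "Percent": round(100 * n / len(present), 2)}
        for value, n in sorted(counts.items())
    }
    return table, len(values) - len(present)

# Toy AWEEKEND values (0 = weekday, 1 = weekend, None = missing), not real HCUP data.
toy_aweekend = [0, 0, 0, 1, None, 0, 1, 0, 0, 0, 0]
table, n_missing = frequency_table(toy_aweekend)
print(table, n_missing)
```

For the real file, the frequencies and the missing count should match the Summary Statistics file exactly, as with the AWEEKEND figures quoted earlier.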


Identifying Possible Problems

What happens if the statistics you generate from the data you load do not agree with those in the summary statistics? You will need to double check your work.

First, check to make sure you used the load program that corresponds to the database you are working with.

Then check to make sure you used the summary statistics which correspond to the database you are working with.

Finally, I will check the code I used to generate the means and frequency distributions.

If none of these resolve the problem, HCUP Technical Assistance is available at hcup@ahrq.gov.
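When tracking down a discrepancy, it helps to know exactly which data elements and statistics disagree. The following Python sketch compares two sets of statistics and lists every difference. The dictionary layout, the function name, and the numbers are all hypothetical; published values are rounded, so the tolerance may need loosening for means and standard deviations.

```python
import math

def find_mismatches(computed, published, rel_tol=1e-6):
    """List (data element, statistic, computed, published) for every disagreement."""
    mismatches = []
    for element, expected_stats in published.items():
        actual_stats = computed.get(element)
        if actual_stats is None:
            # The data element is absent from the loaded data altogether.
            mismatches.append((element, "(all)", None, expected_stats))
            continue
        for stat, expected in expected_stats.items():
            actual = actual_stats.get(stat)
            if actual is None or not math.isclose(actual, expected, rel_tol=rel_tol):
                mismatches.append((element, stat, actual, expected))
    return mismatches

# Toy numbers, not real HCUP statistics.
published = {"AWEEKEND": {"N": 10, "N Miss": 1, "Mean": 0.2}}
computed = {"AWEEKEND": {"N": 10, "N Miss": 2, "Mean": 0.2}}
differences = find_mismatches(computed, published)
print(differences)
```

If nearly every data element disagrees, the load program or summary statistics file is probably for a different database or year; if only one or two disagree, the check code is the more likely culprit.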


Key Points

As you begin to work with HCUP data, the following key points will help you along the way:

The load programs and summary statistics you will need to properly load and check your data are available on HCUP-US, along with any other database documentation you may need.

Data should always be checked after loading to make sure they have loaded correctly.

If you encounter problems loading or checking your data, review your work and then, if necessary, contact HCUP Technical Assistance at hcup@ahrq.gov.


Resources and Other Training

If you are looking for more information on the subject matter covered here, many resources are available on the HCUP User Support website.

If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has experienced research personnel available to respond to technical questions you may have. Inquiries are answered within three business days.

Thank you for accessing this module. There are several other HCUP online tutorials. Access these tutorials to see if there are other topics that could be helpful to you.

If you have any feedback regarding this module, please email us at hcup@ahrq.gov.

Resources:

Detailed documentation of HCUP resources is available on the HCUP User Support website, including introductory documentation and load programs for each of the HCUP national databases.



Internet Citation: Load and Check HCUP Data - Accessible Version. Healthcare Cost and Utilization Project (HCUP). October 2016. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-us.ahrq.gov/tech_assist/loadandcheck/508_course/508course_2016.jsp.
Last modified 10/21/16