Load and Check HCUP Data

Welcome

Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial.

This tutorial will teach you how to get started with your HCUP research.

In this module, you'll learn how to properly load HCUP data onto your computer and how to check that the data have loaded correctly. These are the first steps to conducting successful analyses with HCUP data.

This module is for individuals who have completed the HCUP Data Use Agreement, obtained their copy of the HCUP data, and are ready to begin their research. This tutorial will take approximately 20 minutes to complete.

About HCUP

Before we get started, a quick word about HCUP:

HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics.

If you are unfamiliar with HCUP or would like a refresher, please consider taking our General Overview Course.

Learning Objectives

There are two learning objectives for this module.

The first objective is to learn how to unzip (or decompress) HCUP data, save it on your computer, and load it into a standard statistical software package.

The second objective is to learn how to verify that you have correctly loaded the data onto your computer. You will learn how to run a few basic programs to generate summary statistics which you can check against the summary statistics available on the HCUP-US website.

Content of the CDs

To get started, let's review what's contained on the CDs you received from the Central Distributor, and what it is exactly that you'll be loading onto your computer. For your reference, the information presented in this tutorial can also be found in the introductory documentation that came with the CDs.

If you are using the NIS, you will receive two CDs. These discs contain fixed-width ASCII files: data are stored in the ASCII text format because it is a commonly used format that is universally readable. Disc one contains the Inpatient Core File, which is the main data file with over 8 million records, and the Hospital Weights File, which can be used to up-weight your results to the hospital universe. The second disc contains the Disease Severity Measures Files, which provide severity measures for the NIS, and the Diagnosis and Procedure Groups File, which is used to group diagnoses and procedures into different clinical categories. In order to load HCUP NIS data onto your computer and analyze it, you will need at least 13 gigabytes of available storage.

| NIS ASCII Files | Unit of Observation | Contents |
| --- | --- | --- |
| Inpatient Core File | Discharge-level | All discharges from the sample of hospitals in participating states (over 8,000,000 records) |
| Hospital Weights File | Hospital-level | One record for each hospital included in the NIS, linkable to Inpatient Core File |
| Disease Severity Measures File | Discharge-level | Disease severity measures linkable to Inpatient Core File |
| Diagnosis and Procedure Groups File | Discharge-level | Diagnosis and procedure groups linkable to Inpatient Core File |

If you are working with the KID, you will receive a single CD with the same types of files: an Inpatient Core File containing between 2 and 3 million inpatient records, a Hospital Weights File, a Disease Severity Measures File, and a Diagnosis and Procedure Groups File. In order to load HCUP KID data onto your computer and analyze it, you will need at least 5 gigabytes of available storage.

| KID ASCII Files | Unit of Observation | Contents |
| --- | --- | --- |
| Inpatient Core File | Discharge-level | Pediatric discharges sampled from HCUP hospitals (2,000,000 to 3,000,000 records) |
| Hospital Weights File | Hospital-level | One record for each hospital included in the KID, linkable to Inpatient Core File |
| Disease Severity Measures File | Discharge-level | Disease severity measures linkable to Inpatient Core File |
| Diagnosis and Procedure Groups File | Discharge-level | Diagnosis and procedure groups linkable to Inpatient Core File |

If you are working with the NEDS, you will receive a Core File, with over 27 million emergency department records, and a Hospital Weights File. The NEDS also has a Supplemental Emergency Department File, which contains information on procedures performed in the emergency department for treat-and-release patients, and a Supplemental Inpatient File, which contains additional data elements for patients admitted to the hospital after an emergency department visit. Because the NEDS is such a large database, you should have 75 to 100 gigabytes of storage available on your computer in order to work comfortably with it.

| NEDS ASCII Files | Unit of Observation | Contents |
| --- | --- | --- |
| Core File | Discharge-level | All ED visits from the sample of hospitals in participating states (over 27,000,000 records) |
| Hospital Weights File | Hospital-level | One record for each hospital included in the NEDS, linkable to Core File |
| Supplemental Emergency Department File | Discharge-level | Information on treat-and-release visits, linkable to Core File |
| Supplemental Inpatient File | Discharge-level | Information on visits resulting in an inpatient admission, linkable to Core File |

If you are working with one of the state databases, the files you receive depend on the state, the year, and the database you are using. The files can include a Core File, a Charges File, an AHA Linkage File, Disease Severity Measures Files, and a Diagnosis and Procedure Groups File.

Available across all databases, states, and years:

| ASCII Files | Unit of Observation | Contents |
| --- | --- | --- |
| Core File | Discharge-level | Size and contents vary by database, state, and year |
| Charges File | Discharge-level | Information on charges associated with visit, linkable to core files |
| AHA Linkage File | Hospital-level | Hospital information linkable to core files |

Available for selected databases, states, and years:

| ASCII Files | Unit of Observation | Contents |
| --- | --- | --- |
| Disease Severity Measures File | Discharge-level | Disease severity measures linkable to core files |
| Diagnosis and Procedure Groups File | Discharge-level | Diagnosis and procedure groups linkable to core files |

The specific contents of the files by state, year and database are available on the HCUP Central Distributor file specification page. Note that the number of CDs you receive will vary by state, year, and database.

HCUP-US Documentation

No matter which database you are working with, all the documentation and tools you will need to use your HCUP data files are online at HCUP-US: www.hcup-us.ahrq.gov.

The documentation available to you is located in the "Databases" section of the website, under "Database Documentation." This demonstration will use the NIS Inpatient Core File to demonstrate the load and check processes, so we will pay particular attention to the NIS database documentation. The database documentation includes introductory reports on the database, descriptions of the data elements, lists of the available data elements, programs needed to load the data (SAS®, SPSS®, STATA®), and analytic tools designed specifically for use with HCUP data (CCS, DRG formats, HCUP formats, HCUP diagnosis and procedure groups format, ICD-9-CM formats).

If you are just getting started using HCUP data, you may want to begin with the Introduction to the NIS, Introduction to the KID, and Introduction to the NEDS documents which can all be found on the appropriate database documentation home pages. These introductory documents contain much of the information covered in this tutorial, such as the size and structure of the HCUP database files.

Decompressing Data Overview

In order to load HCUP data onto your computer and analyze it, you will need between 5 and 100 gigabytes of available space, depending on which database you are working with.
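Before copying anything, it can help to confirm that the target drive has enough room. The following is a minimal Python sketch, not part of the HCUP materials; the function name `has_enough_space` is hypothetical, and the thresholds are simply the storage figures quoted in this tutorial (using the upper end of the NEDS range).

```python
import shutil

# Approximate storage needs in gigabytes, as quoted in this tutorial.
# The NEDS figure uses the upper end of the 75 to 100 GB range.
REQUIRED_GB = {"KID": 5, "NIS": 13, "NEDS": 100}


def has_enough_space(path, database, required_gb=REQUIRED_GB):
    """Return True if the drive holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb[database]


if __name__ == "__main__":
    # "." stands in for the drive or folder where the data will be stored.
    for name in REQUIRED_GB:
        status = "OK" if has_enough_space(".", name) else "not enough space"
        print(name, status)
```

On Windows, the path could be a drive root such as `C:\` instead of the current folder.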

Because of the size of the HCUP database files, the data are distributed as self-extracting PKZIP compressed files.

It is not possible to work with the data directly off the CD, so you must decompress the data and save it onto your hard drive. The steps involved in decompressing the data and saving it onto your hard drive are the same for each of the HCUP databases.

Decompressing Data Step by Step

Next, this tutorial will cover the steps involved in decompressing the HCUP data and saving it onto your hard drive.

The HCUP data files are self-extracting ZIP files. As such, if you are using a Windows® operating system, no additional program is needed to extract them. Note that if a computer has applicable software installed, it may use that software to decompress the files by default. For example, WinZip®, a commonly used product, automatically assists in decompressing the data files. WinZip® is just one of many software packages that function in this manner. If you are extracting the files onto a Macintosh®, you will need a utility program such as StuffIt® Expander, which can be downloaded for free from Apple®.

If you need assistance using software packages other than those used in this demonstration, please contact HCUP Technical Assistance at hcup@ahrq.gov.

The process covered in this tutorial is just one of several ways to load the data file from the CD onto a computer. If you are familiar with other means of accomplishing the same steps, you may use those in place of what is covered here. Just make sure to always check your work to confirm that the data have decompressed, saved, and loaded correctly.

Step 1: Open the directory of your hard drive by selecting My Computer on the desktop, then select Local Disk (C:).

Step 2: In the hard drive directory, create a folder for the HCUP database so you have a location to store your files. For the sake of demonstration in this tutorial, let's assume you are working with the 2007 NIS. Select the File menu, select New, and select Folder. Name your folder "NIS 2007."

Step 3: Insert the CD into your disk drive, and open up the CD directory on your computer. The CD contains two files: one named NIS_2007_Core.exe and the other named NIS_2007_Hospital.exe.

Step 4: Select both of these files and select copy.

Step 5: Copy both of these files to the NIS 2007 directory (C:\NIS 2007) you just created on your hard drive. It will take a few moments to copy the files to the NIS 2007 folder.

Step 6: Once the files are completely copied, open up the NIS 2007 folder.

Step 7: Next, unzip each file by double-clicking on the file. This will bring up the WinZip Self-Extractor dialog box.

Step 8: Edit the "Unzip to folder" field in the WinZip Self-Extractor dialog box to select the destination directory for the extracted file. You will want the extracted file to be saved in the NIS 2007 directory as well, so select that directory (C:\NIS 2007).

Step 9: Now, click on the Unzip button to begin the unzip process. This process will take about 5 to 10 minutes, depending on the speed of your computer. The ASCII data files will then be uncompressed and saved in this directory.

When the unzip process finishes, the extracted ASCII files should appear on your hard drive in the NIS 2007 folder.
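As noted earlier, the steps above are only one way to decompress the files. As an illustrative alternative, the sketch below uses Python's standard `zipfile` module, which can generally read ZIP archives that have an executable stub in front of them. This is an assumption about the archive format rather than something documented by HCUP, and the function name `extract_archive` is hypothetical; if an archive does not open this way, use the self-extractor as described.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract every member of a ZIP (or self-extracting ZIP) into dest_dir.

    Returns the list of member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example call using the paths from this tutorial (not run here):
# extract_archive(r"C:\NIS 2007\NIS_2007_Core.exe", r"C:\NIS 2007")
```

Whichever method you use, confirm afterward that the extracted file is present and complete.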

Remember that you are responsible for the security of your HCUP data and the Data Use Agreement requires the data to be stored in a safe place. Loading this data onto a LAN where other users have access is not allowed unless the other users have all signed an HCUP Data Use Agreement.

Load into Statistical Software

Now that you've saved the data onto your hard drive, you have to load it into a statistical software package in order to work with it. HCUP-US offers load programs in SAS®, SPSS® and STATA® because these are some of the most frequently used packages, but there are other statistical software programs on the market that can also be used. For the HCUP state databases, load programs are available in SAS® and SPSS®. Note that HCUP data files cannot be analyzed using desktop spreadsheet or database applications because of their size and complexity.

Next, this tutorial will demonstrate loading the Core File using SAS®. Go to the NIS SAS Load Programs page on HCUP-US.

Step 1: Click on the 2007 NIS Core File Load Program to download the program to your computer.

Step 2: When the dialog box appears, select Open to open the load program in SAS®. Note that your version of SAS® may differ from the version used in this tutorial. While the layout and icons may be different, the code used will be the same regardless of the version.

Step 3: Once the program is open, use the libname command to assign a library to the same location where you have stored the ASCII file, that is, the NIS 2007 folder on the C drive.

libname nis "C:\nis 2007\";

Then make two changes to the load program. First, use the assigned libname as a prefix to the name of the data set so that SAS® knows where to store the data set. Second, indicate where on the computer SAS® should look to find the ASCII file you want to load. In this case, the data are saved in the NIS 2007 folder on your hard drive.

Find the lines:

DATA NIS_2007_Core;

INFILE 'NIS_2007_Core.ASC' LRECL = 508;

Change to:

DATA nis.NIS_2007_Core;

INFILE 'C:\NIS 2007\NIS_2007_Core.ASC' LRECL = 508;

Step 4: After you've made the two modifications needed for the program to know where your data can be found, scroll through the rest of the load program.

Step 5: After you've finished scrolling through the program, select Run and then Submit.

The program takes a few minutes to run. When the run is completed, SAS® generates a log file of the program.

You should check this log file to make sure that there are not any error messages. If there are error messages, you may need to double check your work to this point. If there are no error messages, there will be notes indicating how long it took SAS® to load the file as well as notes describing each step the program executed.
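For a long log, reading every line by eye is easy to get wrong. If the log has been saved as a text file, a short script can pull out the lines that need attention. The sketch below is illustrative only: it assumes problem lines begin with the word ERROR or WARNING, which is how SAS® log messages are normally prefixed, and the function name `find_log_problems` is hypothetical. Flagging warnings as well as errors is an addition beyond what this tutorial asks for.

```python
import re

# Assumption: problem messages start a line with "ERROR" or "WARNING",
# for example "ERROR: ..." or "ERROR 180-322: ...".
PROBLEM_LINE = re.compile(r"^(ERROR|WARNING)\b")


def find_log_problems(log_text):
    """Return (line_number, text) pairs for ERROR and WARNING lines."""
    problems = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if PROBLEM_LINE.match(line):
            problems.append((number, line.strip()))
    return problems


# Example usage with a saved log (the file name is hypothetical):
# with open("nis_2007_core_load.log") as handle:
#     for number, text in find_log_problems(handle.read()):
#         print(number, text)
```

An empty result does not replace reading the notes in the log, but it is a quick first pass.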

After scrolling through the log file, check the data to make sure it loaded properly. The files from each of the databases will be quite similar, although the data elements and number of records included in each will differ greatly.
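One quick check on the ASCII file itself uses the record length declared in the load program (LRECL = 508 for the 2007 NIS Core File). The sketch below counts the records in a fixed-width file and reports how many do not have the expected length. It assumes every line in the file is padded to the full record length; if trailing blanks were trimmed, shorter lines would be reported even though the file is fine. The function name `check_record_lengths` is hypothetical.

```python
def check_record_lengths(path, lrecl):
    """Count records in a fixed-width file and those whose length is not lrecl.

    Returns a (total_records, wrong_length_records) tuple.
    """
    total = 0
    wrong_length = 0
    with open(path, "rb") as handle:
        for raw in handle:
            total += 1
            # Drop the line terminator (CRLF or LF) before measuring.
            if len(raw.rstrip(b"\r\n")) != lrecl:
                wrong_length += 1
    return total, wrong_length


# Example call using the path from this tutorial (not run here):
# total, bad = check_record_lengths(r"C:\NIS 2007\NIS_2007_Core.ASC", 508)
```

The total can be compared with the number of records published for the file, which is over 8 million for the NIS Inpatient Core File.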

Approach

After you have loaded the data into your statistical software, you are ready for the next step. You need to check that you've loaded the data correctly.

Step 1: Verify that you saved the file in the appropriate location on your drive. The file should be in the NIS 2007 folder and it should be 4,151,457 KB.

Note that now that the file has been loaded into the statistical software, you may want to delete the original ASCII version as well as the ZIP file to keep them from taking up too much space on your computer.
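The size check in Step 1 can also be done in code rather than by reading the folder listing. This is a minimal sketch; it assumes 1 KB means 1,024 bytes and rounds up to whole kilobytes, which may differ slightly from how your file manager displays sizes. The function names are hypothetical.

```python
import os


def size_in_kb(path):
    """Return the file size in kilobytes (1 KB = 1,024 bytes), rounded up."""
    size_bytes = os.path.getsize(path)
    return (size_bytes + 1023) // 1024


def size_matches(path, expected_kb):
    """Return True if the file's size in KB equals the expected value."""
    return size_in_kb(path) == expected_kb


# Example call (the path is a placeholder for the file being checked):
# size_matches(path_to_file, 4_151_457)
```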

Step 2: To check the data, go back to HCUP-US and pull the summary statistics files. You can then compare these files to basic frequencies and distributions run on the data you've just loaded.

Step 2a: Go back to the NIS Data Documentation section of HCUP-US.

Step 2b: Under the heading "Description of Data Elements in the NIS" there is a bullet with a link to Summary Statistics. Open the Summary Statistics page. Note that if you are working with a database other than the NIS, just go to the appropriate Data Documentation screen and choose the summary statistics which correspond to the database you've loaded. Summary statistics are available for each of the files on the data CD you received from the HCUP Central Distributor. Right now, focus on the statistics for the unweighted Core File. In later modules, you'll come back to the other summary statistics files.

Step 2c: Open up the 2007 NIS unweighted Core File summary statistics. This file provides you with basic statistics on each of the data elements in the NIS: the N, or number of records which contain the data element, the number of records from which the data element is missing, the minimum and maximum values of the data element, the mean, and the standard deviation.

Step 2d: The file also provides the frequency and percent distribution of discharges for each data element. For example, AWEEKEND is the data element which stores whether the admission occurred on a weekend or a weekday. Of the 8,043,415 records in the NIS, 1,556,914 (about 19 percent) are coded as having been admitted on the weekend. Fewer than 10 records did not have anything coded for AWEEKEND.
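The weekend share in the AWEEKEND example can be reproduced directly from the two counts given above. The short calculation below is only a worked check of that arithmetic; the variable names are illustrative.

```python
total_records = 8_043_415    # records in the 2007 NIS Core File
weekend_records = 1_556_914  # records coded as weekend admissions

percent_weekend = 100 * weekend_records / total_records
print(f"{percent_weekend:.1f}%")  # prints 19.4%
```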

Running Check Programs

In order to check the data, you will need to create tables of means and of frequency distributions from the data you've loaded on your computer and then compare those statistics to the summary statistics available on HCUP-US.

If the numbers match the HCUP summary statistics you downloaded from HCUP-US, you'll know the data have loaded correctly. If not, you'll know there's a problem and you'll have to go back and figure out what you've done wrong.

Step 1: To create the tables, go back to SAS®. Start with a short program to check the means of the data elements - it will generate a table which should match the first table in the Summary Statistics file.

libname nis "C:\nis 2007\";

proc means data=nis.nis_2007_core;

run;

Step 2: Compare the output of the SAS® program to the information in the Summary Statistics file. The information should match up.

Step 3: Generate some tables to check the frequency distributions of various data elements. Since you covered the AWEEKEND table in the Summary Statistics File, start with the AWEEKEND data element.

libname nis "C:\nis 2007\";

proc freq data=nis.nis_2007_core;

tables aweekend;

run;

Your output should match exactly the data in the Summary Statistics File. It is recommended that you run frequency distributions on a few other data elements as well.
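When several data elements are checked, comparing counts by eye becomes error-prone. One option is to copy the counts from your output and from the Summary Statistics File into two small tables and compare them in code. The sketch below is illustrative: the function name `compare_frequencies` and the category label "weekend" are hypothetical, and the only real figure used is the weekend count quoted in this tutorial.

```python
def compare_frequencies(computed, published):
    """Return {value: (computed_count, published_count)} for every mismatch.

    A value missing from one side is reported with None for that side.
    """
    mismatches = {}
    for value in sorted(set(computed) | set(published), key=str):
        if computed.get(value) != published.get(value):
            mismatches[value] = (computed.get(value), published.get(value))
    return mismatches


# Count published in the Summary Statistics File for weekend admissions.
published = {"weekend": 1_556_914}
# Count typed in from your own frequency table (here, a matching value).
computed = {"weekend": 1_556_914}

print(compare_frequencies(computed, published))  # prints {} when all counts agree
```

An empty result means every count agrees exactly, which is the standard this tutorial sets for a correct load.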

Identifying Possible Problems

What happens if the statistics you generate from the data you load do not agree with those in the summary statistics? You will need to double check your work.

First, check to make sure you used the load program that corresponds to the database you are working with.

Check to make sure you used the summary statistics which correspond to the database you are working with.

Finally, you should check the code you used to generate the means and frequency distributions.

If none of these resolve the problem, HCUP Technical Assistance is available at hcup@ahrq.gov.

Key Points

As you begin to work with HCUP data, the following key points will help you along the way:

The load programs and summary statistics you will need to properly load and check your data are available on HCUP-US along with any other database documentation you may need.

Data should always be checked after it has been loaded to make sure it has loaded correctly.

If you encounter problems loading or checking your data, review your work and then contact HCUP Technical Assistance if necessary at hcup@ahrq.gov.

Resources and Other Training

If you are looking for more information on the subject matter covered here, many resources are available on the HCUP User Support website.

If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has senior research personnel available to respond to technical questions you may have. Inquiries are answered within three business days.

Thank you for accessing this module. There are several other HCUP online tutorials. Access these tutorials to see if there are other topics that could be helpful to you.

If you have any feedback regarding this module, please email us at hcup@ahrq.gov.