Load and Check HCUP Data

Welcome
Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial.
This tutorial will teach you how to get started with your HCUP research.
In this module, you'll learn how to load HCUP data onto your computer and how to check that the data have loaded correctly. These are the first steps to conducting successful analyses with HCUP data.
This module is for individuals who have completed the HCUP Data Use Agreement, obtained their copy of the HCUP data, and are ready to begin their research. This tutorial will take approximately 20 minutes to complete.
About HCUP
Before we get started, a quick word about HCUP:
HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics.
If you are unfamiliar with HCUP or would like a refresher, please consider taking our General Overview Course.
Learning Objectives
There are two learning objectives for this module.
The first objective is to learn how to unzip (or decompress) HCUP data, save the data on your computer, and load them into a standard statistical software package.
The second objective is to learn how to verify that you have correctly loaded the data onto your computer. You will learn how to run a few basic programs to generate summary statistics, which you can check against the summary statistics available on the HCUP-US website.
Content of the CDs
To get started, let's review what is on the CDs you received from the Central Distributor and what exactly you'll be loading onto your computer. For your reference, the information presented in this tutorial can also be found in the introductory documentation that came with the CDs.
If you are using the NIS, you will receive two CDs. These discs contain fixed-width ASCII files: data are stored in the ASCII text format because it is a commonly used format that is universally readable. Disc one contains the Inpatient Core File, which is the main data file with over 8 million records, and the Hospital Weights File, which can be used to up-weight your results to the hospital universe. The second disc contains the Disease Severity Measures Files, which provide severity measures for the NIS, and the Diagnosis and Procedure Groups File, which is used to group diagnoses and procedures into clinical categories. To load HCUP NIS data onto your computer and analyze them, you will need at least 13 gigabytes of available storage.
| NIS ASCII Files | Unit of Observation | Contents |
| Inpatient Core File | Discharge-level | All discharges from the sample of hospitals in participating states (over 8,000,000 records) |
| Hospital Weights File | Hospital-level | One record for each hospital included in the NIS, linkable to Inpatient Core File |
| Disease Severity Measures File | Discharge-level | Disease severity measures linkable to Inpatient Core File |
| Diagnosis and Procedure Groups File | Discharge-level | Diagnosis and procedure groups linkable to Inpatient Core File |
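To illustrate what "linkable" means in practice, here is a minimal SAS® sketch of attaching the hospital-level records to the discharge-level records. It can only be run once both files have been loaded into SAS® as described later in this tutorial. The data set name nis.nis_2007_hospital and the linking data element HOSPID are assumptions made for illustration; confirm the actual names in the NIS documentation on HCUP-US before using anything like this.

```sas
libname nis "C:\nis 2007\";

/* Sort copies of both data sets by the assumed hospital identifier (HOSPID). */
proc sort data=nis.nis_2007_core out=work.core;
  by hospid;
run;

/* nis.nis_2007_hospital is a hypothetical name for the loaded Hospital Weights File. */
proc sort data=nis.nis_2007_hospital out=work.hospital;
  by hospid;
run;

/* Many-to-one merge: keep every discharge and attach its hospital record. */
data work.core_hosp;
  merge work.core (in=in_core) work.hospital;
  by hospid;
  if in_core;
run;
```

Sorting into the WORK library leaves the loaded data sets untouched but temporarily needs additional disk space. If both files share data elements other than the linking key, the values from the data set listed last on the MERGE statement are the ones kept.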
If you are working with the KID, you will receive a single CD with the same types of files: an Inpatient Core File containing between 2 and 3 million inpatient records, a Hospital Weights File, a Disease Severity Measures File, and a Diagnosis and Procedure Groups File. To load HCUP KID data onto your computer and analyze them, you will need at least 5 gigabytes of available storage.
| KID ASCII Files | Unit of Observation | Contents |
| Inpatient Core File | Discharge-level | Pediatric discharges sampled from HCUP hospitals (2,000,000 to 3,000,000 records) |
| Hospital Weights File | Hospital-level | One record for each hospital included in the KID, linkable to Inpatient Core File |
| Disease Severity Measures File | Discharge-level | Disease severity measures linkable to Inpatient Core File |
| Diagnosis and Procedure Groups File | Discharge-level | Diagnosis and procedure groups linkable to Inpatient Core File |
If you are working with the NEDS, there will be a Core File, with over 27 million emergency department records, and a Hospital Weights File. The NEDS also has a Supplemental Emergency Department File, which contains information on procedures performed in the emergency department for treat-and-release patients, and a Supplemental Inpatient File, which contains additional data elements for patients admitted to the hospital after an emergency department visit. Because the NEDS is such a large database, you should have 75 to 100 gigabytes of storage available on your computer to work with it comfortably.
| NEDS ASCII Files | Unit of Observation | Contents |
| Core File | Discharge-level | All ED visits from the sample of hospitals in participating states (over 27,000,000 records) |
| Hospital Weights File | Hospital-level | One record for each hospital included in the NEDS, linkable to Core File |
| Supplemental Emergency Department File | Discharge-level | Information on treat-and-release visits, linkable to Core File |
| Supplemental Inpatient File | Discharge-level | Information on visits resulting in an inpatient admission, linkable to Core File |
If you are working with one of the state databases, the files you receive depend on the state, the year, and the database you are using. The files can include a Core File, a Charges File, an AHA Linkage File, Disease Severity Measures Files, and a Diagnosis and Procedure Groups File.
| ASCII Files | Unit of Observation | Contents |
| Available across all databases, states, and years: | | |
| Core File | Discharge-level | Size and contents vary by database, state, and year |
| Charges File | Discharge-level | Information on charges associated with visit, linkable to core files |
| AHA Linkage File | Hospital-level | Hospital information linkable to core files |
| Available for selected databases, states, and years: | | |
| Disease Severity Measures File | Discharge-level | Disease severity measures linkable to core files |
| Diagnosis and Procedure Groups File | Discharge-level | Diagnosis and procedure groups linkable to core files |
The specific contents of the files by state, year, and database are available on the HCUP Central Distributor file specification page. Note that the number of CDs you receive will vary by state, year, and database.
HCUP-US Documentation
No matter which database you are working with, all the documentation and tools you will need to use your HCUP data files are online at HCUP-US: www.hcup-us.ahrq.gov.
The documentation is located in the "Databases" section of the website, under "Database Documentation." This tutorial uses the NIS Inpatient Core File to demonstrate the load and check processes, so we will pay particular attention to the NIS database documentation. The database documentation includes introductory reports on the database, descriptions of the data elements, lists of the available data elements, programs needed to load the data (SAS®, SPSS®, Stata®), and analytic tools designed specifically for use with HCUP data (CCS, DRG formats, HCUP formats, HCUP diagnosis and procedure groups formats, ICD-9-CM formats).
If you are just getting started using HCUP data, you may want to begin with the Introduction to the NIS, Introduction to the KID, and Introduction to the NEDS documents, which can be found on the corresponding database documentation home pages. These introductory documents contain much of the information covered in this tutorial, such as the size and structure of the HCUP database files.
Decompressing Data Overview
To load HCUP data onto your computer and analyze them, you will need between 5 and 100 gigabytes of available space, depending on which database you are working with.
Because of the size of the HCUP database files, the data are distributed as self-extracting PKZIP compressed files.
It is not possible to work with the data directly off the CD, so you must decompress the data and save them onto your hard drive. The steps involved are the same for each of the HCUP databases.
Decompressing Data Step by Step
Next, this tutorial will cover the steps involved in decompressing the HCUP data and saving them onto your hard drive. The HCUP data files are self-extracting ZIP files, so if you are using a Windows® operating system, no additional program is needed to extract them. Note that if compression software is installed on a computer, that software may be used to decompress the files by default. For example, WinZip®, a commonly used product, automatically assists in decompressing the data files. WinZip® is just one of many software packages that function in this manner. If you are extracting the files onto a Macintosh®, you will need a utility program such as StuffIt® Expander, which can be downloaded for free from Apple®. If you need assistance using software packages other than those used in this demonstration, please contact HCUP Technical Assistance at hcup@ahrq.gov.
The process covered in this tutorial is just one of several ways to load the data file from the CD onto a computer. If you are familiar with other means of accomplishing the same steps, you may use them in place of what is covered here. Just make sure to always check your work to confirm that the data have decompressed, saved, and loaded correctly.
Step 1: Open the directory of your hard drive by selecting My Computer on the desktop, then select Local Disk (C:).
Step 2: In the hard drive directory, create a folder for the HCUP database so you have a location to store your files. For the sake of demonstration in this tutorial, let's assume you are working with the 2007 NIS. Select the File menu, select New, and select Folder. Name your folder "NIS 2007."
Step 3: Insert the CD into your disk drive, and open the CD directory on your computer. The CD contains two files: one named NIS_2007_Core.exe and the other named NIS_2007_Hospital.exe.
Step 4: Select both of these files and select Copy.
Step 5: Paste both files into the NIS 2007 directory (C:\NIS 2007) you just created on your hard drive. It will take a few moments to copy the files to the NIS 2007 folder.
Step 6: Once the files are completely copied, open the NIS 2007 folder.
Step 7: Next, unzip each file by double-clicking on it. This will bring up the WinZip Self-Extractor dialog box.
Step 8: Edit the "Unzip to folder" field in the WinZip Self-Extractor dialog box to select the destination directory for the extracted file. You will want the extracted file to be saved in the NIS 2007 directory as well, so select that directory (C:\NIS 2007).
Step 9: Now, click on the Unzip button to begin the unzip process. This process will take about 5 to 10 minutes, depending on the speed of your computer. The ASCII data files will then be uncompressed and saved in this directory.
When extraction finishes, the ASCII data files should be in the NIS 2007 folder on your hard drive.

Remember that you are responsible for the security of your HCUP data, and that the Data Use Agreement requires the data to be stored in a safe place. Loading the data onto a LAN where other users have access is not allowed unless all of those users have signed an HCUP Data Use Agreement.
Load into Statistical Software
Now that you've saved the data onto your hard drive, you have to load them into a statistical software package in order to work with them. HCUP-US offers load programs in SAS®, SPSS®, and Stata® because these are some of the most frequently used packages, but there are other statistical software programs on the market that can also be used. For the HCUP state databases, load programs are available in SAS® and SPSS®. Note that HCUP data files cannot be analyzed using desktop spreadsheet or database applications because of their size and complexity.
Next, this tutorial will demonstrate loading the Core File using SAS®. Go to the NIS SAS Load Programs page on HCUP-US.
Step 1: Click on the 2007 NIS Core File Load Program to download the program to your computer.
Step 2: When the dialog box appears, select Open to open the load program in SAS®. Note that your version of SAS® may differ from the version used in this tutorial. While the layout and icons may be different, the code will be the same regardless of the version.
Step 3: Once the program is open, use the libname command to assign a library to the location where you have stored the ASCII file, that is, the NIS 2007 folder on the C drive.
libname nis "C:\nis 2007\";
Then make two modifications to the load program. First, use the assigned libname as a prefix to the name of the data set so that SAS® knows where to store the data set. Second, indicate where on the computer SAS® should look to find the ASCII file you want to load. In this case, the data are saved in the NIS 2007 folder on your hard drive.
Find the lines:
DATA NIS_2007_Core;
INFILE 'NIS_2007_Core.ASC' LRECL = 508;
Change to:
DATA nis.NIS_2007_Core;
INFILE 'C:\NIS 2007\NIS_2007_Core.ASC' LRECL = 508;
Step 4: After you've made the two modifications that tell the program where your data can be found, scroll through the rest of the load program.
Step 5: After you've finished reviewing the program, select Run and then Submit.
The program takes a few minutes to run. When the run is completed, SAS® generates a log file of the program.
You should check this log file to make sure that there are no error messages. If there are error messages, you may need to double-check your work up to this point. If there are no error messages, the log will contain notes indicating how long it took SAS® to load the file as well as notes describing each step the program executed.
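For a long log, scrolling by eye can miss a message. As a hedged sketch, the log can be routed to a text file with PROC PRINTTO and then scanned for lines that begin with ERROR. The log file name load_core.log is a hypothetical choice made for this example, and this is only one of several ways to review a log.

```sas
/* Route the SAS log to a file (hypothetical name) before running the load program. */
proc printto log="C:\nis 2007\load_core.log" new;
run;

/* ... submit the load program here ... */

/* Restore the default log destination. */
proc printto;
run;

/* Scan the saved log and count the lines that begin with ERROR. */
data _null_;
  infile "C:\nis 2007\load_core.log" truncover end=last;
  input line $char256.;
  if line =: "ERROR" then do;
    n_errors + 1;   /* sum statement: starts at 0 and is retained across lines */
    put line;       /* echo the offending line to the current log */
  end;
  if last then put "Lines starting with ERROR: " n_errors;
run;
```

A count of zero does not replace reading the notes in the log, since warnings and notes can also point to a problem with the load.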
After scrolling through the log file, check the data to make sure they loaded properly. The files from each of the databases will be quite similar, although the data elements and number of records included in each will differ greatly.
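One quick first look, before comparing any statistics, is to list the contents of the data set you just created. The sketch below assumes the library and data set names used in this tutorial's load step.

```sas
libname nis "C:\nis 2007\";

/* List the variables and the number of observations in the loaded data set. */
proc contents data=nis.nis_2007_core;
run;
```

The number of observations that PROC CONTENTS reports can be compared with the record count for the 2007 NIS Core File quoted in this tutorial (8,043,415 records).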
Approach
After you have loaded the data into your statistical software, you are ready for the next step: checking that you've loaded the data correctly.
Step 1: Verify that you saved the file in the appropriate location on your drive. The file should be in the NIS 2007 folder and it should be 4,151,457 KB.
Note that now that the file has been loaded into the statistical software, you may want to delete the original ASCII version as well as the ZIP file to keep them from taking up too much space on your computer.
Step 2: To check the data, go back to HCUP-US and pull some summary statistics files. You can compare the summary statistics files from HCUP-US to basic frequencies and distributions run on the data you've just loaded.
Summary Statistics
Step 1: Go back to the NIS Data Documentation section of HCUP-US.
Step 2: Under the heading "Description of Data Elements in the NIS" there is a bullet with a link to Summary Statistics. Open the Summary Statistics page. Note that if you are working with a database other than the NIS, just go to the appropriate Data Documentation screen and choose the summary statistics that correspond to the database you've loaded. Summary statistics are available for each of the files on the data CD you received from the HCUP Central Distributor. Right now, focus on the statistics for the unweighted Core File. In later modules, you'll come back to the other summary statistics files.
Step 3: Open the 2007 NIS unweighted Core File summary statistics. This file provides basic statistics on each of the data elements in the NIS: the N, or number of records that contain the data element; the number of records from which the data element is missing; the minimum and maximum values of the data element; the mean; and the standard deviation.
Step 4: The file also shows, for each data element, the frequency and percent distribution of discharges across that element's values. For example, AWEEKEND is the data element that indicates whether the admission occurred on a weekend or a weekday. Of the 8,043,415 records in the NIS, about 19 percent (1,556,914) are coded as having been admitted on the weekend. Fewer than 10 records did not have anything coded for AWEEKEND.
Running Check Programs
To check the data, you will need to create tables of means and of frequency distributions from the data you've loaded on your computer and then compare those statistics to the summary statistics available on HCUP-US.
If the numbers match the HCUP summary statistics you downloaded from HCUP-US, you'll know the data have loaded correctly. If not, you'll know there's a problem, and you'll have to go back and figure out what went wrong.
Step 1: To create the tables, go back to SAS®. Start with a short program to check the means of the data elements. It will generate a table that should match the first table in the Summary Statistics file.
libname nis "C:\nis 2007\";
proc means data=nis.nis_2007_core;
run;
Step 2: Compare the output of the SAS® program to the information in the Summary Statistics file. The two should match.
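By default, PROC MEANS reports N, the mean, the standard deviation, the minimum, and the maximum, but not the number of missing values. Since the Summary Statistics file also lists the number of records from which each data element is missing, a variant along the following lines may make the comparison easier. This is a sketch using the same library and data set names as above.

```sas
libname nis "C:\nis 2007\";

/* Request the statistics listed in the Summary Statistics file, including the missing count. */
proc means data=nis.nis_2007_core n nmiss min max mean std;
run;
```

PROC MEANS summarizes numeric data elements only, so character data elements will not appear in this table.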


Step 3: Generate some tables to check the frequency distributions of various data elements. Since the AWEEKEND table was covered in the Summary Statistics section, start with the AWEEKEND data element.
libname nis "C:\nis 2007\";
proc freq data=nis.nis_2007_core;
tables aweekend;
run;
Your output should exactly match the data in the Summary Statistics File. It is recommended that you run frequency distributions on a few other data elements as well.
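As a sketch of checking several data elements in one run, the TABLES statement can list more than one element. DIED and FEMALE are used here only as assumed examples of other categorical data elements; substitute any elements that appear in the Summary Statistics file for your database.

```sas
libname nis "C:\nis 2007\";

/* One-way frequency tables for several data elements. */
/* The MISSING option counts missing values as a category of their own. */
proc freq data=nis.nis_2007_core;
  tables aweekend died female / missing;
run;
```

With the MISSING option, missing values are included when the percentages are calculated, so compare the frequencies first and rerun without the option if the percentages differ from the Summary Statistics file.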
Identifying Possible Problems
What happens if the statistics you generate from the data you loaded do not agree with those in the summary statistics? You will need to double-check your work.
First, check to make sure you used the load program that corresponds to the database you are working with.
Second, check to make sure you used the summary statistics that correspond to the database you are working with.
Finally, check the code you used to generate the means and frequency distributions.
If none of these checks resolves the problem, HCUP Technical Assistance is available at hcup@ahrq.gov.
Key Points
As you begin to work with HCUP data, the following key points will help you along the way:
The load programs and summary statistics you will need to properly load and check your data are available on HCUP-US, along with any other database documentation you may need.
Data should always be checked after they have been loaded to make sure they have loaded correctly.
If you encounter problems loading or checking your data, review your work and then, if necessary, contact HCUP Technical Assistance at hcup@ahrq.gov.
Resources and Other Training
If you are looking for more information on the subject matter covered here, many resources are available on the HCUP User Support website.
If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has senior research personnel available to respond to technical questions you may have. Inquiries are answered within three business days.
Thank you for accessing this module. There are several other HCUP online tutorials. Access these tutorials to see if there are other topics that could be helpful to you.
If you have any feedback regarding this module, please email us at hcup@ahrq.gov.