SPARCS offers three types of data files. The type of data file that is right for you will depend on the details required to do your research.
1. Deidentified (Public Use)
Public Use data files reside on the Department’s open data platform called Health Data NY. Topics range from counts of inpatient stays by facilities or counties, to 30-day acute stroke mortality rates by hospital, to calculated inpatient quality indicators and potentially preventable emergency visits. These files are longitudinal, often containing data from 2009 to the present day. Most of the Public Use files are aggregate data reported at the facility, county, or statewide level.
For those who want patient-level data, we offer a deidentified patient-level file, Hospital Inpatient Discharges (SPARCS De-Identified). The files begin in 2009 and contain information on a person’s inpatient visit. However, the data has been deidentified by masking or aggregating certain data elements, or by removing them altogether, in compliance with laws and standards. At this time, we do not have similar patient-level data files for emergency department visits or ambulatory surgery visits. For access to these claim types, you will need to apply for a Limited or Identifiable data file.
2. Limited Identifiable (Limited)
Limited data files come as 14 relational data tables (in a .dat format). The data tables contain patient-level health information that is considered identifiable, but in keeping with various laws and standards, the files do not contain direct personal identifiers. Because these files contain indirect identifiers and personal health information, this type of request requires data requestors to submit an application, go through an approval process, and sign appropriate security guidelines and data use agreements.
Limited data files mask or redact direct identifiers. For example, Limited data files provide users with masked dates. Instead of showing the patient’s actual admission or discharge date, the file shows the first day of the month in which the admission or discharge occurred (0X/01/XXXX). An admission on March 17, for instance, would appear as March 1 of the same year.
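To make the date masking concrete, here is a minimal Python sketch of the rule described above. It is an illustration only, not the Department's actual processing: the function names `mask_to_first_of_month` and `format_masked` are hypothetical, and the month/day/year output format is assumed from the 0X/01/XXXX pattern.

```python
from datetime import date


def mask_to_first_of_month(d: date) -> date:
    """Keep the year and month of a date, but set the day to 1."""
    return d.replace(day=1)


def format_masked(d: date) -> str:
    """Render the masked date as MM/DD/YYYY (assumed display format)."""
    return mask_to_first_of_month(d).strftime("%m/%d/%Y")


if __name__ == "__main__":
    # A hypothetical admission on March 17, 2015
    print(format_masked(date(2015, 3, 17)))
```

Because every date within a month collapses to the same value, length-of-stay or day-level timing cannot be recovered from masked admission and discharge dates alone.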
To see how a Limited data file treats data elements of interest to you, review Tab 3 within the SPARCS Data Dictionary and the Data Governance Policy and Procedure Manual.
3. Identifiable
Identifiable data files come as 14 relational data tables (in a .dat format). The data tables contain patient-level health information that is considered identifiable. Researchers can request five (5) categories of direct personal identifiers: Dates, Date of Birth, Patient Address, Patient Record Numbers, and Policy Numbers. Requestors must justify why the data is needed for the proposed research and explain how the data will be used during the project. This type of request requires the data requestors to submit an application, go through a Data Governance Committee, and sign appropriate security guidelines and data use agreements.
By default, Identifiable data files mask or redact direct identifiers to the level of a Limited data file. Only the identifiable data element categories that the researcher is approved to use are released unmasked. For example, you may need the exact date a person was admitted to, or discharged from, the hospital, so the Limited file’s masking to the month will not work for you. You can request “Dates” in your application and justify why they are needed. If you are approved to use Dates, your file will contain the exact admission and discharge dates. The other Identifiable data element categories, such as Date of Birth and Patient Address, will keep the values they have in a Limited data file.
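The default-to-Limited behavior can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the actual SPARCS release logic: the function `released_admission_date` and the constant `IDENTIFIABLE_CATEGORIES` are hypothetical names, and only the Dates category is modeled because it is the only one whose Limited treatment is described here.

```python
from datetime import date
from typing import Iterable

# The five requestable categories named in this article.
IDENTIFIABLE_CATEGORIES = frozenset({
    "Dates",
    "Date of Birth",
    "Patient Address",
    "Patient Record Numbers",
    "Policy Numbers",
})


def released_admission_date(actual: date, approved: Iterable[str]) -> date:
    """Return the exact date only if "Dates" was approved.

    Otherwise fall back to the Limited-file treatment:
    the first day of the month of the actual date.
    """
    approved_set = set(approved)
    unknown = approved_set - IDENTIFIABLE_CATEGORIES
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    if "Dates" in approved_set:
        return actual
    return actual.replace(day=1)
```

In this sketch, a requestor approved only for Date of Birth would still receive admission dates masked to the first of the month, which mirrors the rule that each category is released unmasked only when it is individually approved.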
To see how a Limited data file treats data elements of interest to you, review Tab 3 within the SPARCS Data Dictionary and the Data Governance Policy and Procedure Manual. For more information on the approval process, you can read this article from our Knowledge Center.