Submitting a Protocol for Existing Data


“Research involving collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.” (§46.101(b)(4))

Exempt Category 4 - Secondary Research is commonly referred to as “previously collected data,” “existing data,” or “secondary data.” Research that falls under Exempt Category 4 does not involve new recruitment of human participants. In order to qualify for this exemption, the materials to be used in the research must satisfy two criteria: 

  1. The data/specimens must be existing or "on the shelf" when the research is proposed to the IRB (i.e., by the proposed project start date). 
  2. The existing data cannot include direct identifiers (e.g., names, social security numbers, addresses, phone numbers) or indirect identifiers (codes or pseudonyms that are linked to the subject's identity).

The following guide will explain how to submit a protocol for existing data to Teachers College (TC) Institutional Review Board (IRB) review. 

Submission Requirements

All submissions to TC IRB should include a Primary Investigator (PI) survey (submitted through TC Mentor IRB) and a completed IRB application (IRB Application Template). A Sample Application for Existing Data can also be downloaded and used for reference. For detailed information on how to submit to TC IRB, please visit our How to Submit page. 

On the IRB Application, there are a number of questions that ask about “new data collection” or “recruitment efforts.” For these questions, you may respond with “Not Applicable” when submitting an Exempt Category 4 protocol, because no new data will be collected as part of the study parameters.

Researchers should clearly indicate the identifiers that will not be included with the final dataset. 
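One way to document which identifiers are left out is to keep an explicit list of the removed fields alongside the final dataset. The following is a minimal sketch, not a TC IRB requirement: the field names in DIRECT_IDENTIFIERS and the helper strip_identifiers are hypothetical and would need to match the actual data. Dropping named fields does not by itself guarantee de-identification, since remaining fields may still act as indirect identifiers.

```python
# Hypothetical field names; replace with the identifiers present in your data.
DIRECT_IDENTIFIERS = {"name", "ssn", "address", "phone"}


def strip_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the rows without identifier fields,
    plus a sorted list of the field names that were removed."""
    removed = set()
    cleaned = []
    for row in rows:
        kept = {}
        for field, value in row.items():
            if field.lower() in identifiers:
                removed.add(field)  # remember what was dropped for the IRB application
            else:
                kept[field] = value
        cleaned.append(kept)
    return cleaned, sorted(removed)


# Illustrative record only; not real data.
records = [{"name": "A. Example", "phone": "555-0100", "grade": "7", "score": "88"}]
cleaned, removed = strip_identifiers(records)
print(removed)
```

The returned list of removed fields can be copied into the IRB application as the statement of identifiers excluded from the final dataset.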

Data Security

Data retrieved from an external institution or individual should be transferred via secure methods and stored in a safe location. When receiving datasets from external institutions, researchers are required to abide by guidelines and policies set in place by the data owners. Failure to do so can result in consequences from both the external institutions and TC IRB. Data protection measures should also be carefully considered. Researchers are advised to consider the following documents per their specific research needs: 

In most cases, the following documents are not required for an Exempt Category 4 TC IRB protocol submission:

  • Consent, Assent, or Parent Permission 
  • Recruitment Materials
  • Study Instruments

What about Prospective Data?

Prospective vs. Retrospective: Prospective studies involve individuals over time, where data is collected about them as their characteristics or circumstances change. These studies are not considered Exempt Category 4. Retrospective studies must have all data or specimens in existence prior to the start date of data analysis in order to qualify for exempt review. IRB protocols can be submitted for existing data before data collection has ended. However, the protocol activities cannot begin until data collection is complete. 

What about Publicly Available Data?

Public sources of data include local telephone directory information, publicly available websites, open use datasets, etc. Student records, which are covered by the Family Educational Rights and Privacy Act (FERPA), are not public records. When de-identified, however, these records can be considered existing data, as the information was collected prior to the start of the study protocol.

Existing data can help reduce new data collection burdens, and can also show how data evolves over time. Researchers can consider the following ways to explore existing data:

  • Techniques to Acquire Existing Data: Publicly available digital spaces provide ideal resources for accessing existing data. These sources may include websites, newspapers, blogs, or social media. Existing data may also include personal archives, notes, journal entries, or text that were collected in the past. 
  • Existing but Unexplored Data: Colleagues may possess unanalyzed (de-identified) data sources or data sets. This data may be considered a viable source for analysis. Researchers may consider ways to share data.
  • Possible Triangulation: Existing data may offer pathways to compare and contrast various data sources to find patterns of difference and discuss their meaning. The researcher may synthesize information from different research sources or compare existing data over time. 

What Research is NOT Exempt Category 4 - Existing Data?

  • New Data Collection: A researcher wants to understand how critical thinking techniques impact math learning. He will survey and interview participants in order to answer his research questions. This study involves new data collection, and does not qualify for Exempt Category 4. 
  • Linked Identifiers & Potential Follow Up: A researcher will obtain access to the research database for an existing study, and will record whether or not participants in the study received information about smoking cessation. The entries have linked identifiers (e.g., names, phone numbers), and will be coded so that she can go back to the research data at a later date to assess health outcomes. Aside from the code, she will only record age, smoking status, whether they received the cessation information, and blood pressure. Because of the linked identifiers (and potential participant follow-up), this study may not qualify for Exempt Category 4 - Existing Data. 
  • Not “On the Shelf” Data: A graduate student submits a protocol for review to access existing data, with a start date of January 1, 2020. However, the data will not be fully collected and de-identified until May 1, 2020. This does not qualify as Exempt Category 4 - Existing Data, because data collection will not be complete by the start date of her project. In other words, to qualify as existing data, the data must already be "on the shelf" by the project start date.
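The "on the shelf" condition in the last example comes down to a date comparison. The sketch below is illustrative only: the function name is_on_the_shelf is hypothetical, and treating completion on the start date itself as acceptable is an assumption, not a rule stated by TC IRB.

```python
from datetime import date


def is_on_the_shelf(data_complete: date, project_start: date) -> bool:
    """True when data collection and de-identification finish
    on or before the project start date (same-day is an assumption)."""
    return data_complete <= project_start


# Dates from the graduate student example: start January 1, 2020,
# data not fully collected and de-identified until May 1, 2020.
print(is_on_the_shelf(date(2020, 5, 1), date(2020, 1, 1)))  # False
```

If the start date were moved to on or after May 1, 2020, the same check would pass.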

An existing data set may be suitable for answering a new research question or exploring a hypothesis. Secondary analysis may be a researcher's preferred option since it can be completed in less time, for less money, and with far lower risk to research subjects. A researcher must carefully consider whether the quantity and quality of the existing data are adequate to answer the proposed research questions.

Tips for completing an IRB Protocol involving Existing Data

You can outline the following details in the IRB application itself or in an uploaded Data Security Plan and/or Data Sharing (or Use) Agreement (templates are on our website and on Mentor IRB/Documentation).

  1. What is the data? The IRB will need to know if you are using data sets, video recordings, audio recordings, journal entries, photos, transcripts, survey responses, etc. If you are using data sets, the IRB will need to know what data fields you will use and how. Explain the data set. What is it? What content is included in the data set? Does it include education or health related content? How was it originally collected (e.g., under the oversight of another IRB, not originally for research purposes)?
  2. How will you obtain access to the data? The IRB will need to know if the data are publicly available or if there are restrictions for accessing the data. Is there an affiliated data repository? Is the data privately held? Explain how you plan to access existing data. What is your data transfer method? Are you engaged in a Data Sharing (or Use) Agreement? Please elaborate on the IRB application or upload supplemental documents with your IRB submission to explain how you will access the data (e.g., Data Sharing (or Use) Agreement).
  3. How many records will you access? Will the data be combined with other data sources? How easy is it to deduce the identities of the participants? The IRB needs to understand the complete picture of the data and the potential to deduce identity which could compromise confidentiality. What are the inclusion/exclusion criteria for the existing data you are proposing to receive? Are you using every single data point from the previous study/data set? Are you only using certain data points? Does it include personally identifiable information? Is it de-identified? Please elaborate on the IRB application.
  4. Can the participants be linked to their data? The IRB will need to know in what form you will receive the data. Can the data be de-identified? Are the data linked and stripped of identifiers? Who prepared the data for you? Will you merge multiple data sets? What is your data security plan for obtaining, receiving, accessing, storing, sharing, and protecting that data? Will the data be made public (stored in a repository)? Is the data only shared in aggregate? How long will you keep the data?
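For the question of how easily identities could be deduced, one rough check is to count how many records share the same combination of potentially identifying fields; a combination held by a single record is a re-identification risk, especially after data sets are merged. This is a minimal sketch under stated assumptions: the helper smallest_group_size and the field names (age, zip, smoker) are hypothetical, and the check is not a TC IRB procedure or a substitute for the Data Security Plan.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of records that share the same values
    for the given fields. Raises ValueError if rows is empty."""
    groups = Counter(
        tuple(row[field] for field in quasi_identifiers) for row in rows
    )
    return min(groups.values())


# Illustrative records only; not real data.
records = [
    {"age": 34, "zip": "10027", "smoker": True},
    {"age": 34, "zip": "10027", "smoker": False},
    {"age": 61, "zip": "10025", "smoker": True},
]
smallest = smallest_group_size(records, ["age", "zip"])
print(smallest)  # 1: one record has a unique age/zip combination
```

A result of 1 means at least one participant is unique on those fields, which is worth describing in the IRB application along with any plan to aggregate or remove the fields.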