Researchers who use online platforms often aim to collect data from hundreds or even thousands of participants. Bots, programmed to complete surveys in minutes, can sabotage online research by posing as eligible human participants. Bot programmers have developed ways to create a normal distribution across responses or to craft open-ended answers from language extracted from the survey itself so that they appear logical and believable. Bot-generated responses can jeopardize data quality, add undue stress, delay study timelines, and drain researchers’ valuable resources, especially when the study offers participant payment.

Diligence and Bot Identification 

The recent spike in bot activity makes it essential for researchers to remain diligent and to develop quality attention checks that identify bots without overburdening legitimate human participants. Researchers should always consult TC IT for data security strategies particular to their research design. Below are some suggestions for researchers wishing to collect data online.

  • Be Observant: Bot programmers design systems that are sophisticated and adaptable. Stacking attention checks and other safeguards enables a researcher to investigate suspicious survey responses without unfairly disqualifying legitimate participants.
  • Build Safeguards: Build attention checks, logic checks, and open-ended questions directly into your survey. For example, consider a paragraph-based attention check: within the paragraph, directly state which answer to choose, then ask the participant to type out that answer. This embeds the attention check within a body of text that bots may be unable to decipher (a validation sketch for typed checks follows this list).
  • Check Contacts: Check for unusual details in participant information, such as odd email addresses, discrepancies between a participant’s name and contact information, or missing details.
  • Consult Others: Consult others when evaluating suspicious survey responses. Some responses may come not from a bot but from an inattentive human survey taker who would otherwise qualify for the study and may be due compensation.
  • Develop a Compensation Protocol: Decide if a suspicious survey response will be removed from data analysis but the respondent will still be paid, or if a suspicious response automatically disqualifies the respondent from payment. Reasons for disqualification should be clearly articulated in the consent form (e.g., “two failed attention checks”). 
  • Develop If/Then Conditional Logic Questions: Conditional logic questions branch outward and can disrupt the flow of bot responses: a bot may skip to a specific action based on its programming, producing an unusual response pattern that a researcher can detect. The screening sketch after this list includes a simple branch-consistency check.
  • Identify Oddities: Human subjects are flawed, and researchers should not exclude a legitimate participant just because their responses are quirky. However, researchers should flag and further analyze survey responses that include the following, several of which are automated in the screening sketch after this list: 
    • (1) unanswered required questions or requests; 
    • (2) inconsistent responses to identical questions; 
    • (3) incomplete surveys; 
    • (4) impossible data values (e.g., an age listed as 103);
    • (5) illogical responses to open-ended questions.
  • Look for Locations: Look over where participants are located. Responses from countries where you did not personally recruit, or participants’ addresses that do not match their GPS coordinates, are signs of bot activity. 
  • QR Codes, Word Scramble, and “Click Here” Prompts: Embed QR codes that are detectable only by bots. Present a scrambled word such as ‘S-D-R-I-A-U’ and ask the participant to unscramble it to spell RADIUS. Use “click only the areas that have…” questions, which often derail bots. 
  • Repeat Questions: Ask the same question at separate points using different modes (e.g., ask about age with a drop-down menu, and later as an open-ended question). Verify that the answers match.
  • Track Study Time Stamps: When was the survey started/ended? Do those dates/times seem unusual (e.g., 2:30 a.m. on a Sunday)? Were there many surveys completed at the exact same time/date? Was the survey completed at an unusually fast pace? 
  • Take Care where you Share: Reconsider posting a survey link on public social media profiles, such as Twitter or Instagram, as bot programmers often gain access through these platforms.
  • Unique Survey Link: If possible, design survey links that are unique rather than open and public. Share an online survey link through study sites, not broadly on social media. This approach also helps prevent a participant from clicking the link more than once or sharing it widely with others (a token-generation sketch follows this list).
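
For typed attention checks like the paragraph check and word scramble above, scoring can be automated. The sketch below is illustrative only: the item names, answer key, and two-failure threshold are assumptions, and the disqualification threshold should mirror whatever your consent form states.

```python
def passes_attention_check(typed: str, expected: str) -> bool:
    """Case- and whitespace-insensitive match for a typed attention check."""
    return typed.strip().lower() == expected.strip().lower()

# Hypothetical answer key: item names and expected answers are placeholders.
# "scramble_1" expects RADIUS, unscrambled from S-D-R-I-A-U as described above.
ANSWER_KEY = {"scramble_1": "radius", "paragraph_check": "strongly agree"}

def failed_check_count(responses: dict) -> int:
    """Count failed attention checks; a missing answer counts as a failure."""
    return sum(
        not passes_attention_check(responses.get(item, ""), expected)
        for item, expected in ANSWER_KEY.items()
    )

# Disqualify only past the threshold stated in your consent form
# (e.g., two failed attention checks), not on a single quirky answer.
responses = {"scramble_1": "Radius ", "paragraph_check": "agree"}
print(failed_check_count(responses))  # -> 1 (only the paragraph check failed)
```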
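
Several of the flags above (incomplete surveys, impossible values, mismatched repeat questions, branch violations, odd time stamps, unexpected locations) lend themselves to automated screening. Below is a minimal Python sketch using pandas; the CSV filename, every column name, and all thresholds are hypothetical and should be adapted to your own survey export and pilot data.

```python
import pandas as pd

# Hypothetical export: one row per response. All column names are assumptions.
df = pd.read_csv("survey_export.csv", parse_dates=["start_time", "end_time"])

flags = pd.DataFrame(index=df.index)

# (1)/(3) Unanswered required questions or incomplete surveys.
required = ["age_dropdown", "age_open", "consent"]
flags["incomplete"] = df[required].isna().any(axis=1)

# (4) Impossible data values, e.g., an implausible reported age.
flags["impossible_age"] = ~df["age_dropdown"].between(18, 100)

# Repeat Questions: the drop-down age and open-ended age should match.
flags["age_mismatch"] = (
    pd.to_numeric(df["age_open"], errors="coerce") != df["age_dropdown"]
)

# Conditional logic: the follow-up should be blank unless the gate was "Yes".
flags["branch_violation"] = (df["branch_gate"] == "No") & df["branch_followup"].notna()

# Time stamps: unusually fast completion (arbitrary threshold; tune it).
flags["too_fast"] = (df["end_time"] - df["start_time"]) < pd.Timedelta(minutes=2)

# Many surveys finishing at the exact same moment suggests scripted submission.
flags["duplicate_finish"] = df.duplicated(subset="end_time", keep=False)

# Locations: responses from countries where you did not recruit.
flags["unexpected_country"] = ~df["country"].isin({"US"})

# Flag for human review; never auto-reject on a single tripped check.
df["needs_review"] = flags.any(axis=1)
print(df.loc[df["needs_review"]].join(flags))
```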
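
For unique, single-use links, many survey platforms provide per-participant invitations, which are preferable when available. Where yours does not, hard-to-guess tokens can be generated by hand; the sketch below is a hypothetical illustration, and the base URL, the token= parameter, and the redemption logic are placeholders.

```python
import secrets

def make_invite_links(emails: list, base_url: str) -> dict:
    """Generate one hard-to-guess, single-use link per recruited participant."""
    return {email: f"{base_url}?token={secrets.token_urlsafe(16)}" for email in emails}

invites = make_invite_links(["participant@example.org"],
                            "https://survey.example.org/study")
# Persist the email-to-token mapping and accept each token exactly once
# at submission time; reject reused or unknown tokens.
print(invites)
```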

Online data collection is rich in resources; it gives researchers opportunities to collect diverse responses at relatively low cost. However, the benefits of online data collection should be weighed against its risks.