Social media plays an important role in our social lives. It can shape and shift attitudes, values, beliefs, intentions, and behaviors. Distilling social media content into key information, concepts, and themes is critical for generating knowledge, developing strategies, and creating policies (Lai & To, 2015).

From an ethical oversight perspective, social media content analysis intersects with established human subjects research regulations. Under the U.S. Common Rule (45 CFR 46), investigators must determine whether the analysis constitutes research involving “human subjects”. This distinction between data analysis and human subjects research is foundational for institutional review boards (IRBs), influencing whether a study qualifies for exempt, expedited, or full review. Moreover, the Belmont Report principles of Respect for Persons, Beneficence, and Justice remain essential benchmarks for internet-mediated research.

As a research method, social media content analysis involves systematically analyzing and interpreting data from various forms of media, such as written texts, visuals, audio, and videos. It is a highly specialized approach that has become increasingly popular in the fields of sociology, psychology, health care, communication, and marketing, despite the notable disadvantages of using data from social media platforms (Fu et al., 2023). These disadvantages include the potential dissemination of non-factual data, reliance on data that are not representative of the general population, overrepresentation of extreme views, and content written with sarcastic or critical intent, which can be misread when taken literally. Other difficult considerations include determining the value of social media research, balancing benefit and harm to study participants, maintaining user privacy and confidentiality, and deciding whether informed consent is required, particularly for publicly available content (D’Souza et al., 2021).

For qualitative researchers, this sort of content analysis provides the opportunity to harvest a massive and diverse range of content without the need for intrusive or intensive data collection procedures (Andreotta et al., 2019). One crucial debate concerns the lack of comprehensive ethical standards for the use of social media data in mental health research. For instance, whether freely accessible social media data are public or private remains unclear. Content posted on fully public platforms like X (formerly known as Twitter) may be considered publicly available data under IRB policy, whereas data drawn from closed or semi-private online communities such as Facebook groups may still require informed consent or de-identification, given users’ reasonable expectations of privacy. IRBs should therefore evaluate not only the platform's terms of service but also the contextual integrity of user expectations when determining risk level. Additionally, information that many would deem sensitive is frequently shared in ways the user did not intend, creating potential for misuse by both government and private actors. Users may not have intended their social media posts to be used for health research and may never have given informed consent to take part in a study, so they could perceive such use as a breach of their privacy. Because of the stigma around mental health problems, any disclosure of identifying information could harm the individual, particularly if the dataset includes posts by minors, individuals with disabilities or mental health conditions, or other vulnerable groups. The IRB may request additional safeguards, such as exclusion criteria for minors, trigger warnings for sensitive content, or referral protocols if posts indicate imminent risk (such as self-harm disclosures).

To mitigate these risks, research protocols should include clear data protection plans, such as encryption of datasets, restricted access to identifiable information, and systematic removal of usernames, geolocation data, and embedded media. The IRB will also require investigators to describe how re-identification risks will be minimized, particularly when datasets are shared for secondary use.
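
As an illustration of the systematic removal of identifiers described above, the following is a minimal sketch in Python, not a complete or validated de-identification procedure. The field names (username, display_name, geo, media_urls, text), the function names, and the salt are all hypothetical; real platform exports differ, and an actual protocol would need to be agreed with the reviewing IRB.

```python
import hashlib
import re

# Hypothetical field names for direct identifiers; real exports will differ.
DIRECT_IDENTIFIER_FIELDS = ("username", "display_name", "geo", "media_urls")

MENTION_PATTERN = re.compile(r"@\w+")
URL_PATTERN = re.compile(r"https?://\S+")


def pseudonymize(user_id: str, salt: str) -> str:
    """Return a truncated salted SHA-256 digest of the user ID.

    The same user and salt always give the same pseudonym, so posts by
    one author stay linkable without storing the original username.
    """
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return digest[:16]


def deidentify_post(post: dict, salt: str) -> dict:
    """Copy a post record, dropping direct identifiers and masking the text."""
    # Keep only fields that are not listed as direct identifiers.
    cleaned = {k: v for k, v in post.items() if k not in DIRECT_IDENTIFIER_FIELDS}
    # Replace the username with a pseudonym (assumes a "username" field exists).
    cleaned["author"] = pseudonymize(post["username"], salt)
    # Mask @mentions of other users, then any links embedded in the text.
    text = MENTION_PATTERN.sub("@USER", post.get("text", ""))
    cleaned["text"] = URL_PATTERN.sub("[URL]", text)
    return cleaned
```

A sketch like this addresses only direct identifiers. Verbatim post text can often be traced back to its author through a search engine, and a pseudonym is only as safe as the secrecy of the salt, so the re-identification concerns raised above still apply to any dataset shared for secondary use.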

Some researchers use these data without robust security measures in place to protect against the loss or theft of user information. Moreover, as a descriptive method, social media content analysis cannot establish a cause-and-effect relationship. It can show associations between social media use and psychological outcomes, but not that one causes the other. Similarly, the impact of social media on well-being and education is varied and complex, depending on context and how it is used, so research often produces inconsistent results (Tusl et al., 2022).

Despite these limitations, one perceived advantage of content analysis is its systematic and replicable structure, which can promote transparency and consistency and so support accurate analyses of data. Objectivity is particularly important when dealing with large datasets, as it helps ensure that researchers do not miss important information or make incorrect interpretations. However, true objectivity is rarely attainable; the process of coding and interpretation remains subject to researcher bias, cultural assumptions, and linguistic nuance. Second, content analysis is versatile. It can be used to analyze data from a wide range of sources, including social media, news articles, advertisements, and historical documents. It allows researchers to access diverse data and gain insight into various topics and research questions. For example, content analysis of social media posts can provide valuable insights into public opinions and attitudes towards a particular topic or product. Third, compared to other research methods, content analysis is relatively cost- and time-efficient. It does not require many resources or much equipment, and data can be collected and analyzed quickly. This makes content analysis a suitable method for researchers with limited budgets or tight timelines. In addition, since the data already exist, there is no need for time-consuming data collection processes, which can save significant time and effort. A fourth advantage is that content analysis allows for both quantitative and qualitative data analysis. This flexibility is especially useful when researchers want to combine the strengths of both approaches. For instance, researchers can use quantitative methods to identify patterns and trends in data, while also using qualitative methods to gain a deeper understanding of the underlying meanings and motivations (Marti et al., 2019).
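
To make the quantitative side of this combination concrete, the following is a minimal sketch in Python of keyword-based coding that counts how many posts touch each theme. The codebook, its themes and keywords, and the function names are hypothetical and chosen only for illustration; they are not drawn from any of the cited studies.

```python
import re
from collections import Counter

# Hypothetical codebook mapping each theme to indicator keywords.
CODEBOOK = {
    "sleep": {"insomnia", "sleep", "tired"},
    "support": {"friend", "family", "support"},
}

TOKEN_PATTERN = re.compile(r"[a-z']+")


def code_post(text: str, codebook: dict = CODEBOOK) -> set:
    """Return the set of themes whose keywords appear in the post."""
    tokens = set(TOKEN_PATTERN.findall(text.lower()))
    return {theme for theme, keywords in codebook.items() if tokens & keywords}


def theme_frequencies(posts: list, codebook: dict = CODEBOOK) -> Counter:
    """Count, for each theme, the number of posts coded with it."""
    counts = Counter()
    for text in posts:
        # Each post contributes at most one count per theme.
        counts.update(code_post(text, codebook))
    return counts
```

Counts of this kind can point to patterns and trends, but keyword matching cannot recognize sarcasm, irony, or context. The qualitative reading of the underlying posts therefore remains necessary for interpreting what the counts mean.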

In conclusion, although researchers face practical and ethical challenges in accessing social media data, content analysis is a valuable research method with clear advantages. It allows for systematic and replicable analysis of diverse data, making it a versatile and cost-effective approach. However, researchers must be aware of the limitations and take necessary precautions to ensure the validity and reliability of their findings. For IRBs, the operational challenge lies in standardizing how social media research proposals are assessed. Reviewers should consider whether the data are truly public, whether the research could reasonably harm individuals if disclosed, and whether informed consent can be ethically waived under 45 CFR 46.116. Developing standardized checklists or review templates for internet-based research would promote consistency and accountability.

The path forward is for ethics committees and IRBs to issue clear, field-specific guidance outlining procedures for consent waivers, data anonymization, and privacy risk assessment in digital contexts. Such guidelines should emphasize transparency, user consultation, and protection of vulnerable populations, ensuring that social media research upholds the same ethical rigor as traditional human subjects studies. With careful planning and use, content analysis can provide valuable insights and contribute to the advancement of research in various fields.

References:

Andreotta, M., Nugroho, R., Hurlstone, M. J., Boschetti, F., Farrell, S., Walker, I., & Paris, C. (2019). Analyzing social media data: A mixed-methods framework combining computational and qualitative text analysis. Behavior Research Methods, 51, 1766-1781. https://doi.org/10.3758/s13428-019-01202-8

D’Souza, R. S., Hooten, W. M., & Murad, M. H. (2021). A proposed approach for conducting studies that use data from social media platforms. Mayo Clinic Proceedings, 96(8), 2218-2229. https://doi.org/10.1016/j.mayocp.2021.02.010

Fu, J., Li, C., Zhou, C., Li, W., Lai, J., Deng, S., Zhang, Y., Guo, Z., & Wu, Y. (2023). Methods for analyzing the contents of social media for health care: Scoping review. Journal of Medical Internet Research, 25, e43349. http://dx.doi.org/10.2196/43349

Lai, L. S. L., & To, W. M. (2015). Content analysis of social media: A grounded theory approach. Journal of Electronic Commerce Research, 16(2), 138-152. https://www.researchgate.net/publication/276304592_Content_analysis_of_social_media_A_grounded_theory_approach

Marti, P., Serrano-Estrada, L., & Nolasco-Cirugeda, A. (2019). Social media data: Challenges, opportunities and limitations in urban studies. Computers, Environment and Urban Systems, 74, 161-174. https://doi.org/10.1016/j.compenvurbsys.2018.11.001

Tusl, M., Thelen, A., Marcus, K., Peters, A., Shalaeva, E., Scheckel, B., Sykora, M., Elayan, S., Naslund, J. A., Shankardass, K., Mooney, S. J., Fadda, M., & Gruebner, O. (2022). Opportunities and challenges of using social media big data to assess mental health consequences of the COVID-19 crisis and future major events. Discover Mental Health, 2, 14. https://doi.org/10.1007/s44192-022-00017-y