Supporting data for "A Meta-Intervention: Quantifying the Impact of Social Media Information on Adherence to Non-Pharmaceutical Interventions"
This dataset supports a research project in the field of digital medicine, which aims to quantify the impact of disseminating scientific information on social media—as a form of "meta-intervention"—on public adherence to Non-Pharmaceutical Interventions (NPIs) during health crises such as the COVID-19 pandemic. The research encompasses multiple sub-studies and pilot experiments, drawing data from various global and China-specific social media platforms.
The data included in this submission has been collected from several sources:
- From Sina Weibo and Tencent WeChat, 189 online poll datasets were collected, involving a total of 1,391,706 participants. These participants are users of Sina Weibo or Tencent WeChat.
- From Twitter, 187 tweets published by scientists (verified with a blue checkmark) related to COVID-19 were collected.
- From Xiaohongshu and Bilibili, textual content from 143 user posts/videos concerning COVID-19, along with associated user comments and specific user responses to a question, were gathered.
It is important to note that while the broader research project also utilized a 3TB Reddit corpus hosted on Academic Torrents (academictorrents.com), this specific Reddit dataset is publicly available directly from Academic Torrents and is not included in this particular DataHub submission.
The submitted dataset comprises publicly available data, formatted as Excel files (.xlsx), and includes the following:
- Filename: scientists' discourse (source from screenshot of tweets)
- Description: This file contains screenshots of tweets published by scientists on Twitter concerning COVID-19 research, its current status, and related topics. It also includes a coded analysis of the textual content from these tweets. Specific details regarding the coding scheme can be found in the
readme.txt
file.
- Description: This file contains screenshots of tweets published by scientists on Twitter concerning COVID-19 research, its current status, and related topics. It also includes a coded analysis of the textual content from these tweets. Specific details regarding the coding scheme can be found in the
- Filename: The links of online polls (Weibo & WeChat)
- Description: This data file includes information from online polls conducted on Weibo and WeChat after December 7, 2022. These polls, often initiated by verified users (who may or may not be science popularizers), aimed to track the self-reported proportion of participants testing positive for COVID-19 (via PCR or rapid antigen test) or remaining negative, particularly during periods of rapid Omicron infection spread. The file contains links to the original polls, links to the social media accounts that published these polls, and relevant metadata about both the poll-creating accounts and the online polls themselves.
- Filename: Online posts & comments (From Xiaohongshu & Bilibili)
- Description: This file contains textual content from COVID-19 related posts and videos published by users on the Xiaohongshu and Bilibili platforms. It also includes user-generated comments reacting to these posts/videos, as well as user responses to a specific question posed within the context of the original content.
Key Features of this Dataset:
- Data Type: Mixed, including textual data, screenshots of social media posts, web links to original sources, and coded metadata.
- Source Platforms: Twitter (global), Weibo/WeChat (primarily China), Xiaohongshu (China), and Bilibili (video-sharing platform, primarily China).
- Use Case: This dataset is intended for the analysis of public discourse, the dissemination of scientific information, and user engagement patterns across different cultural contexts and social media platforms, particularly in relation to public health information.