This README.txt file was generated on 2022-04-07 by Marie Broeckling

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: China-State-Endorsers

2. Author Information

   First Author Contact Information
      Name: Marie Broeckling
      Faculty: Journalism and Media Studies Center, University of Hong Kong
      Email: marieb@connect.hku.hk

   Corresponding Author Contact Information
      Name: Marie Broeckling
      Faculty: Journalism and Media Studies Center, University of Hong Kong
      Email: marieb@connect.hku.hk

   Second Author Contact Information
      Name: Hu Haohan Lily
      Faculty: Journalism and Media Studies Center, University of Hong Kong
      Email: lilyhhhu@connect.hku.hk

   Third Author Contact Information
      Name: Prof. Fu King-wa
      Faculty: Journalism and Media Studies Center, University of Hong Kong

---------------------
DATA & FILE OVERVIEW
---------------------

Directory of Files:

   A. Filename: Code
      Short description: The folder "Code" contains all code that was used in this project.
         SE_01 is a Python script to download video information, user information, comments and replies from YouTube using the YouTube API.
         SE_02 is an R script to download tweets that contain relevant video URLs from Twitter using the Twitter v2 API with academic access.
         SE_03 is an R script to scrape information about users that are labelled as Chinese officials from Twitter, using Selenium.
         SE_04 is an R script to analyse the percentage of tweets posted by Chinese officials, and create plots.
         SE_05 is an R script to calculate the overlap between state endorser and state media audiences on YouTube and Twitter, and create plots.
         SE_06 is an R script to analyse the 20% most active users who tweet state endorser and state media videos for language, activity and followers, and create plots.
         SE_07 is a Python script to create time series analysis plots.

   B. Filename: Raw
      Short description: The folder "Raw" contains data that was used to conduct analysis and create plots.
      Namely:
         Data downloaded via the Twitter and YouTube APIs.
         List of Chinese officials' Twitter accounts published by other researchers.
         List of state endorser and state media accounts on YouTube compiled by us using manual coding.

   C. Filename: Results
      Short description: The folder "Results" contains data that is the result of our analysis.
      Namely:
         Frequency tables created by us.
         Data about the 20% most active Twitter users in our sample.
         Data frame containing all Chinese officials' Twitter accounts compiled by us.

   D. Filename: Output
      Short description: The folder "Output" contains all graphics created by us.

Additional Notes on File Relationships, Context, or Content (for example, if a user wants to reuse and/or cite your data, what information would you want them to know?):

File Naming Convention:
   The seven scripts are numbered from SE_01 to SE_07. The order is intentional because the scripts build on each other.
   Scripts that create plots have the ending "_plots".
   Files that contain data about state endorsers are named "stateendorser".
   Files that contain data about state media are named "statemedia".
   The term "Chinese officials" in a file name means the file contains data about Chinese government AND Chinese state media.
   Files containing YouTube data follow the YouTube API convention. That means they are separated into "comments" and "replies", where replies are replies to comments.

-----------------------------------------
DATA DESCRIPTION FOR: [FILENAME]
-----------------------------------------

1. Number of variables: NA

2. Number of cases/rows: NA

3. Missing data codes: NA
      Code/symbol    Definition
      Code/symbol    Definition

4. Variable List

   A.
      Name:
      Description:
      Value labels if appropriate

   B.
      Name:
      Description:
      Value labels if appropriate

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

#
# Software: If specialized software(s) generated your data or
# are necessary to interpret it, please provide for each (if
# applicable): software name, version, system requirements,
# and developer.
# If you developed the software, please provide (if applicable):
# A copy of the software’s binary executable compatible with the system requirements described above.
# A source snapshot or distribution if the source code is not stored in a publicly available online repository.
# All software source components, including pointers to source(s) for third-party components (if any)

1. Software-specific information:

   Name: R and RStudio
   Version: R version 4.0.4 (2021-02-15) -- "Lost Library Book"
            RStudio 2021.09.2+382 "Ghost Orchid" Release (fc9e217980ee9320126e33cdf334d4f4e105dc4f, 2022-01-04) for macOS
            Mozilla/5.0 (Macintosh; Intel Mac OS X 12_2_0) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.10 Chrome/69.0.3497.128 Safari/537.36
   System Requirements:
   Open Source? (Y/N): Y
   (if available and applicable)
   Executable URL:
   Source Repository URL:
   Developer:
   Product URL:
   Software source components:

   Name: Python
   Version: Python 3.7
   System Requirements:
   Open Source? (Y/N): Y
   (if available and applicable)
   Executable URL:
   Source Repository URL:
   Developer:
   Product URL:
   Software source components:

   Name: Selenium
   Version: selenium-server-standalone-3.141.59
   System Requirements:
   Open Source? (Y/N): Y
   (if available and applicable)
   Executable URL:
   Source Repository URL:
   Developer:
   Product URL:
   Software source components:

   Additional Notes (such as, will this software not run on certain operating systems?):

2. Equipment-specific information: NA

   Manufacturer:
   Model:
   (if applicable)
   Embedded Software / Firmware Name:
   Embedded Software / Firmware Version:
   Additional Notes:
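
Reuse note (illustrative sketch, not part of the project code): the file naming convention described under DATA & FILE OVERVIEW could be checked programmatically roughly as follows. The function name describe_file and the example file names are hypothetical. The sketch assumes that the terms "stateendorser", "statemedia", "comments" and "replies" appear somewhere in the file name, and that "Chinese officials" may be written with or without separators; the actual file names in this dataset may differ.

```python
from pathlib import Path


def describe_file(filename: str) -> dict:
    """Guess what a file contains from the naming convention in this README.

    Hypothetical helper: it only inspects the file name, never the contents.
    """
    stem = Path(filename).stem.lower()
    # Drop separators so "chinese_officials" and "ChineseOfficials" both match.
    compact = "".join(ch for ch in stem if ch.isalnum())
    return {
        "state_endorser": "stateendorser" in compact,
        "state_media": "statemedia" in compact,
        # Per the README, this covers Chinese government AND state media.
        "chinese_officials": "chineseofficials" in compact,
        "youtube_comments": "comments" in compact,
        "youtube_replies": "replies" in compact,
        # Scripts that create plots have the ending "_plots".
        "creates_plots": stem.endswith("_plots"),
    }


if __name__ == "__main__":
    # Made-up example names, used only to show the output shape.
    for name in ["stateendorser_comments.csv", "SE_04_plots.R"]:
        print(name, describe_file(name))
```

Matching on a separator-free, lower-case version of the name is a design choice of this sketch, intended to tolerate small spelling differences between files.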