This README.txt file was generated on 2022-04-07 by Marie Broeckling


-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: China-State-Endorsers


2. Author Information

First Author Contact Information
    Name: Marie Broeckling
    Faculty: Journalism and Media Studies Center, University of Hong Kong
    Email: marieb@connect.hku.hk


Corresponding Author Contact Information
    Name: Marie Broeckling
    Faculty: Journalism and Media Studies Center, University of Hong Kong
    Email: marieb@connect.hku.hk


Second Author Contact Information 
    Name: Hu Haohan Lily 
    Faculty: Journalism and Media Studies Center, University of Hong Kong
    Email: lilyhhhu@connect.hku.hk


Third Author Contact Information 
    Name: Prof. Fu King-wa
    Faculty: Journalism and Media Studies Center, University of Hong Kong

---------------------
DATA & FILE OVERVIEW
---------------------

Directory of Files:
   A. Filename: Code        
      Short description: The folder "Code" contains all code that was used in this project.
SE_01 is a Python script to download video information, user information, comments and replies from YouTube using the YouTube API.
SE_02 is an R script to download tweets that contain relevant video URLs from Twitter using the Twitter v2 API with academic access.
SE_03 is an R script that uses Selenium to scrape information from Twitter about users labelled as Chinese officials.
SE_04 is an R script to analyse the percentage of tweets posted by Chinese officials and create plots.
SE_05 is an R script to calculate the overlap between state endorser and state media audiences on YouTube and Twitter, and create plots.
SE_06 is an R script to analyse the language, activity and followers of the 20% most active users who tweet state endorser and state media videos, and create plots.
SE_07 is a Python script to create time series analysis plots.
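
Illustration (not part of the project code): the audience overlap that SE_05 calculates can be pictured as a set comparison of user IDs. The sketch below is in Python, although SE_05 itself is an R script. The function name, the toy user IDs and the use of the Jaccard index are assumptions made for this example; this README does not state which overlap measure SE_05 uses.

```python
def audience_overlap(users_a, users_b):
    """Return the shared users and the Jaccard index of two audiences.

    Hypothetical helper for illustration only; the overlap measure
    actually used in SE_05 may differ.
    """
    set_a, set_b = set(users_a), set(users_b)
    shared = set_a & set_b
    union = set_a | set_b
    # Guard against division by zero when both audiences are empty.
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Toy user IDs, not taken from the dataset.
endorser_audience = ["u1", "u2", "u3", "u4"]
media_audience = ["u3", "u4", "u5"]

shared, jaccard = audience_overlap(endorser_audience, media_audience)
print(sorted(shared), round(jaccard, 2))
```

With these toy lists, two of the five distinct users appear in both audiences.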
        
   B. Filename: Raw        
      Short description: The folder "Raw" contains data that was used to conduct analysis and create plots. Namely:
Data downloaded via the Twitter and YouTube API.
List of Chinese officials' Twitter accounts published by other researchers.
List of state endorsers and state media accounts on YouTube compiled by us using manual coding.

        
   C. Filename: Results       
      Short description: The folder "Results" contains data that is the result of our analysis. Namely:
Frequency tables created by us.
Data about the 20% most active Twitter users in our sample.
Data frame containing all Chinese officials' Twitter accounts compiled by us.
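
Illustration (not part of the project code): one way to pick out the 20% most active users is to count tweets per author and keep the top fifth. The function name, the toy author list and the choice to round up are assumptions made for this example; how SE_06 handles rounding and ties is not stated in this README.

```python
import math
from collections import Counter


def most_active_users(tweet_authors, share=0.2):
    """Return (user, tweet_count) pairs for the most active `share` of users.

    Hypothetical helper for illustration only. The number of users kept
    is rounded up, which is an assumption.
    """
    counts = Counter(tweet_authors)
    n_top = math.ceil(len(counts) * share)
    return counts.most_common(n_top)


# Toy data, one entry per tweet: five distinct authors.
authors = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d", "e"]
top_users = most_active_users(authors)
print(top_users)
```

With five distinct authors, a 20% share keeps one user, the one with the most tweets.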


   D. Filename: Output        
      Short description: The folder "Output" contains all graphics created by us.

Additional Notes on File Relationships, Context, or Content:

File Naming Convention:

The seven scripts are numbered from SE_01 to SE_07. The order is intentional because they build on each other.
Scripts that create plots have the ending "_plots". 

Files that contain data about state endorsers are named "stateendorser"
Files that contain data about state media are named "statemedia"

The term "Chinese officials" in a file name means the file contains data about both the Chinese government AND Chinese state media.

Files containing YouTube data follow the YouTube API convention. That means they are separated into "comments" and "replies",
where replies are responses to comments.
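
Illustration (not part of the project code): because comments and replies are stored separately, rebuilding a thread means matching each reply to its parent comment. The sketch below uses the hypothetical field names "comment_id" and "parent_id"; the actual column names in the data files may differ.

```python
from collections import defaultdict


def attach_replies(comments, replies):
    """Return a copy of each comment with its replies attached.

    The keys "comment_id" and "parent_id" are hypothetical and used
    for illustration only.
    """
    by_parent = defaultdict(list)
    for reply in replies:
        by_parent[reply["parent_id"]].append(reply)
    return [
        {**comment, "replies": by_parent.get(comment["comment_id"], [])}
        for comment in comments
    ]


# Toy records, not taken from the dataset.
comments = [
    {"comment_id": "c1", "text": "First comment"},
    {"comment_id": "c2", "text": "Second comment"},
]
replies = [
    {"parent_id": "c1", "text": "Reply one"},
    {"parent_id": "c1", "text": "Reply two"},
]

threads = attach_replies(comments, replies)
for thread in threads:
    print(thread["comment_id"], len(thread["replies"]))
```

Comments without replies keep an empty list, so every thread has the same shape.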


-----------------------------------------
DATA DESCRIPTION FOR: [FILENAME]
-----------------------------------------


1. Number of variables: NA


2. Number of cases/rows: NA


3. Missing data codes: NA


4. Variable List: NA

--------------------------
METHODOLOGICAL INFORMATION
--------------------------


1. Software-specific information:

Name: R and RStudio 
Version: R version 4.0.4 (2021-02-15) -- "Lost Library Book"
RStudio 2021.09.2+382 "Ghost Orchid" Release (fc9e217980ee9320126e33cdf334d4f4e105dc4f, 2022-01-04) for macOS
Mozilla/5.0 (Macintosh; Intel Mac OS X 12_2_0) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.10 Chrome/69.0.3497.128 Safari/537.36
System Requirements:
Open Source? (Y/N): Y

(if available and applicable)
Executable URL:
Source Repository URL:
Developer:
Product URL:
Software source components:

Name: Python
Version: Python 3.7
System Requirements:
Open Source? (Y/N): Y

(if available and applicable)
Executable URL:
Source Repository URL:
Developer:
Product URL:
Software source components:

Name: Selenium
Version: selenium-server-standalone-3.141.59
System Requirements:
Open Source? (Y/N): Y

(if available and applicable)
Executable URL:
Source Repository URL:
Developer:
Product URL:
Software source components:


Additional Notes (such as, will this software not run on
certain operating systems?):


2. Equipment-specific information: NA

Manufacturer: 
Model: 

(if applicable)
Embedded Software / Firmware Name:
Embedded Software / Firmware Version:
Additional Notes: