post_data.sql (876.7 MB)
Protest-Related Posts on the LIHKG Forum from June 10 to July 11 2019
dataset
posted on 2023-05-02, 09:30, authored by Qichang Ma, Kwan Nok Chan, Wai Fung Lam
This dataset contains all protest-relevant posts on the LIHKG forum between June 10 and July 11, 2019. The dataset comprises a substantial corpus of 2,389,590 individual posts that are organized into 49,658 threads and were contributed by 12,624 distinct users.
Note: all data can be publicly accessed on the LIHKG forum.
Data key fields:
- thread_id: Unique identifier for a thread.
- cat_id: Identifier for thread category.
- user_id: ID of the user who created the thread.
- item_data_reply_time: Date and time of the reply to the post within the thread data.
- item_data_user_id: ID of the user who posted within the thread data.
- post_text_token: Tokenized text of the post within the thread data.
- push_count: Whether the text contains any of the following terms: "push", "pish", "posh", "pash", "psuh", "up", "tui", "推", or "幫推".
- issues_pred: Strategic framing identified in the thread by the Bayesian algorithm.
- topic: Substantive topics identified in the thread by the LDA model.
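
To illustrate the push_count field, here is a minimal Python sketch of a check for the listed terms. The function name contains_push_term and the matching rules are assumptions made for illustration only: Latin-script terms are matched case-insensitively and only when not embedded in a longer Latin word, and the Chinese terms are matched as substrings. The dataset description does not state how the authors actually performed the match.

```python
import re

# Terms listed in the push_count field description.
LATIN_TERMS = ["push", "pish", "posh", "pash", "psuh", "up", "tui"]
CHINESE_TERMS = ["推", "幫推"]

# Assumption: a Latin term counts only when it is not part of a longer
# Latin-letter word, so "up" does not fire on "update" or "support".
_LATIN_RE = re.compile(
    r"(?<![A-Za-z])(?:" + "|".join(LATIN_TERMS) + r")(?![A-Za-z])",
    re.IGNORECASE,
)


def contains_push_term(text: str) -> bool:
    """Return True if the text contains any of the listed push terms."""
    if _LATIN_RE.search(text):
        return True
    # Assumption: Chinese terms are matched as plain substrings.
    return any(term in text for term in CHINESE_TERMS)


if __name__ == "__main__":
    for sample in ["Push!", "update soon", "幫推", "好up"]:
        print(repr(sample), contains_push_term(sample))
```

A stricter or looser rule (for example, plain substring matching for every term) would change which posts are flagged, so results from this sketch need not agree with the values stored in the dataset.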