HKU Data Repository
post_data.sql (876.7 MB)

Protest-Related Posts on the LIHKG Forum from June 10 to July 11 2019

posted on 2023-05-02, 09:30 authored by Qichang Ma, Kwan Nok Chan, Wai Fung Lam

This dataset contains all protest-relevant posts on the LIHKG forum between June 10 and July 11, 2019. The corpus comprises 2,389,590 individual posts organized into 49,658 threads and contributed by 12,624 distinct users.
Note: all data are publicly accessible on the LIHKG forum.

Key data fields:

  1. thread_id: Unique identifier for a thread.
  2. cat_id: Identifier for thread category.
  3. user_id: ID of the user who created the thread.
  4. item_data_reply_time: Date and time of the reply to the post within the thread data.
  5. item_data_user_id: ID of the user who posted within the thread data.
  6. post_text_token: Token of the thread data.
  7. push_count: Indicates whether any of the following terms is present: "push", "pish", "posh", "pash", "psuh", "up", "tui", "推", or "幫推".
  8. issues_pred: Strategic framing identified in the thread by the Bayesian algorithm.
  9. topic: Substantive topics identified in the thread by the LDA model.
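
The term check described for push_count could be sketched roughly as follows. This is only an illustration, not the authors' procedure: the function name contains_push_term is hypothetical, and the matching rule (case-insensitive substring search) is an assumption, since the description does not say how the terms were matched.

```python
# Terms copied from the push_count field description.
PUSH_TERMS = ["push", "pish", "posh", "pash", "psuh", "up", "tui", "推", "幫推"]


def contains_push_term(text: str) -> bool:
    """Return True if the text contains any listed term.

    Assumption: case-insensitive substring matching. The dataset
    description does not state the actual matching rule.
    """
    lowered = text.lower()
    return any(term in lowered for term in PUSH_TERMS)


if __name__ == "__main__":
    for sample in ["PUSH!", "幫推", "hello", "upset"]:
        print(sample, contains_push_term(sample))
```

Under this assumed rule, short terms such as "up" also match inside longer words (for example "upset"), so a token-level comparison may be closer to what was intended.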

