File(s) under embargo
Supporting data for “Reinventing Colonial Legacy in A Worlding City: Rethinking Gentrification in Central Shanghai, China”
The notion of gentrification links global forces (e.g. neoliberalism, globalisation) with local responses (e.g. redevelopment, conservation) at the neighbourhood level. Focusing on central Shanghai, this thesis first unpacks how colonial legacy, both symbolic and material, serves as a key resource for attracting transnational gentrifiers and (re)investment and for displacing less affluent classes in the contemporary gentrification process. Second, it examines the constantly changing interplay between the state, market and society to unravel the driving forces, processes, and social consequences of various waves and forms of gentrification. Third, it gives particular attention to the role of (traditional and digital) media in contemporary (re)gentrification, examining the influence that the market, state and society exert on media narratives and representations.
The dataset consists of web-scraped Weibo data for three case studies. Weibo, launched by Sina Corporation, was selected as the representative social media platform for three reasons. First, Weibo boasts the largest user base among Chinese social media platforms, with over 0.5 billion active users. This vast user base provides a large volume of information that can be used to analyse public perceptions of gentrifying and gentrified neighbourhoods. Second, Weibo's features resemble a fusion of Twitter and Instagram, making it easier to align findings on Weibo with existing research on those two platforms. Third, Weibo, the first Chinese-style microblog platform, launched in August 2009, provides long-term data spanning over a decade, which makes it possible to track changing media portrayals of gentrification over time.
To obtain sufficient data, the Chinese terms for ‘JianYeLi’, ‘WuKang Road’ and ‘AnFu Road’ were chosen as the initial keywords for the respective case studies, and data were collected using a web-crawler tool. Each scraped post contained a social media ID, a timestamp, textual content, image URLs, and the numbers of likes, shares and comments received. The initial dataset consisted of nearly 50,000 posts published between 2011 and 2021. Since the main focus is on how social media users reflect on the gentrification of the selected neighbourhoods, forwarded and irrelevant posts were first removed, and only the subsampled datasets used in the analysis were uploaded. As the posts were public-facing, individual user permission was not sought.
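For readers who wish to work with the files, the record structure and cleaning step described above might be sketched in Python as follows. This is an illustrative sketch only, not the procedure actually used to prepare the dataset: the field names, the `is_forward` flag, and the `is_relevant` check are all assumptions, since the description does not specify the file schema or how relevance was judged.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass
class Post:
    """One scraped post; field names are hypothetical, not the dataset's schema."""
    post_id: str                      # social media ID
    timestamp: datetime               # time the post was published
    text: str                         # textual content
    image_urls: List[str] = field(default_factory=list)
    likes: int = 0
    shares: int = 0
    comments: int = 0
    is_forward: bool = False          # assumed marker for forwarded posts


def clean_posts(posts: Iterable[Post],
                is_relevant: Callable[[Post], bool]) -> List[Post]:
    """Drop forwarded posts, then drop posts the caller judges irrelevant."""
    return [p for p in posts if not p.is_forward and is_relevant(p)]
```

A caller would supply its own relevance rule, for example `clean_posts(posts, is_relevant=lambda p: "WuKang Road" in p.text)`, where the keyword test stands in for whatever criterion suits the analysis.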