Data Pre-processing
Published in Peter Wlodarczak, Machine Learning and its Applications, 2019
Duplicates are instances with exactly the same features. Most machine learning tools will produce different results if some of the instances in the data files are duplicated, because repetition gives them more influence on the result [40]. For example, a Retweet is a Tweet posted by a user other than the author of the original Tweet. Its content is identical to the original except for metadata such as the timestamp and the user who posted (retweeted) it. As with outliers, whether duplicates should be removed depends on the context of the application. Duplicates are usually easy to detect by simple comparison of the instances, especially if the values are numeric, and machine learning frameworks often offer data deduplication functionality out of the box. Clustering can also be used for deduplication: many clustering techniques rely on similarity metrics, and these metrics can be used to match instances based on their similarity.
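The deduplication ideas in this paragraph can be sketched in Python. This is a minimal illustration under stated assumptions, not the method from the chapter: the `Tweet` record, the handling of a leading `RT @user:` prefix, and the 0.8 threshold are all hypothetical choices. `dedup_exact` compares only the tweet text, so a Retweet that differs from its original only in metadata is dropped. `dedup_similar` uses a simple greedy "leader" clustering over Jaccard token similarity as a stand-in for the similarity-based clustering mentioned above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record layout: `text` is the content; the other fields are metadata.
    tweet_id: int
    user: str
    timestamp: str
    text: str


def normalise(text: str) -> str:
    """Drop a leading 'RT @user:' prefix (assumed format), lower-case, collapse whitespace."""
    text = re.sub(r"^RT @\w+:\s*", "", text)
    return " ".join(text.lower().split())


def dedup_exact(tweets):
    """Keep the first tweet for each distinct normalised text; metadata is ignored."""
    seen, kept = set(), []
    for t in tweets:
        key = normalise(t.text)
        if key not in seen:
            seen.add(key)
            kept.append(t)
    return kept


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup_similar(tweets, threshold=0.8):
    """Greedy leader clustering: a tweet whose token set is at least `threshold`
    similar to an existing cluster leader is treated as a duplicate and dropped."""
    leaders = []  # (tweet, token set) pairs
    for t in tweets:
        tokens = set(re.findall(r"\w+", normalise(t.text)))
        if all(jaccard(tokens, lead_tokens) < threshold for _, lead_tokens in leaders):
            leaders.append((t, tokens))
    return [t for t, _ in leaders]


tweets = [
    Tweet(1, "alice", "2019-05-01T10:00", "Heavy rain expected in Berlin tonight"),
    Tweet(2, "bob", "2019-05-01T10:05", "RT @alice: Heavy rain expected in Berlin tonight"),
    Tweet(3, "carol", "2019-05-01T11:00", "Heavy rain expected in Berlin tonight!"),
    Tweet(4, "dave", "2019-05-01T12:00", "Sunny skies in Munich"),
]

print([t.tweet_id for t in dedup_exact(tweets)])    # [1, 3, 4]
print([t.tweet_id for t in dedup_similar(tweets)])  # [1, 4]
```

Exact comparison keeps tweet 3 because of its trailing `!`, while the similarity-based variant treats it as a duplicate. The threshold controls that trade-off and would need tuning for each application, consistent with the point that whether to remove duplicates depends on context.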
Feature engineering for twitter-based applications
Published in Guozhu Dong, Huan Liu, Feature Engineering for Machine Learning and Data Analytics, 2018
Sanjaya Wijeratne, Amit Sheth, Shreyansh Bhatt, Lakshika Balasuriya, Hussein S. Al-Olimat, Manas Gaur, Amir Hossein Yazdavar, Krishnaprasad Thirunarayan
Retweet: Retweeting is the act of sharing an existing tweet so that it reaches the followers of the user who shares it. This has become a very common practice among Twitter users [50]. Retweets can be used to identify influential users [3,10,62]. For example, the retweet patterns among a set of Twitter users can be used to derive features that help identify influential users and show how influence propagates.
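One simple way to turn retweet patterns into influence features is to treat each retweet as a directed edge from the retweeter to the original author and read features off each author's in-degree. The sketch below assumes a hypothetical input of `(retweeter, original_author)` pairs and made-up feature names; it is an illustration, not the method of the works cited above.

```python
from collections import Counter, defaultdict

# Hypothetical retweet log: one (retweeter, original_author) pair per retweet.
retweets = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("carol", "bob"),
    ("bob", "alice"),   # bob retweets alice a second time
    ("alice", "erin"),
    ("dave", "dave"),   # self-retweet, ignored below
]


def influence_features(pairs):
    """Per-author features from the retweet graph (edge: retweeter -> author).
    retweet_count is the weighted in-degree; distinct_retweeters is the unweighted one."""
    pairs = [(r, a) for r, a in pairs if r != a]  # drop self-retweets
    retweet_count = Counter(a for _, a in pairs)
    retweeters = defaultdict(set)
    for r, a in pairs:
        retweeters[a].add(r)
    return {
        a: {"retweet_count": retweet_count[a], "distinct_retweeters": len(retweeters[a])}
        for a in retweet_count
    }


features = influence_features(retweets)
# Rank authors by how many different users retweeted them, breaking ties alphabetically.
ranking = sorted(features, key=lambda a: (-features[a]["distinct_retweeters"], a))
print(ranking)  # ['alice', 'bob', 'erin']
```

Counting distinct retweeters separately from raw retweets keeps one enthusiastic follower from inflating an author's apparent reach. Capturing how influence propagates would need multi-hop features over the same graph; this sketch stops at first-order counts.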
Analyzing Twitter communication about heavy precipitation events to improve future risk communication and disaster reduction in Germany
Published in Urban Water Journal, 2021
Leon Netzel, Sonja Heldt, Martin Denecke
As mentioned in the previous section, Terpstra et al. (2012) show that messages originating from official media receive much more attention than those from private individuals. Vicari et al. (2019) made similar findings in relation to flooding in Paris: the most popular tweets originated from the media and public institutions. They therefore concluded that Twitter users probably prefer to retweet official sources for reasons of reliability. Bruns et al. (2012), who examined the role of Twitter in disseminating and sharing information during the 2011 South East Queensland floods, also found a preference for retweeting publicly recognized organizations. Members of the public thus seem to lack the credibility needed to spread important information, as Twitter users are more likely to share information originating from official and well-known institutions than from individuals. Nevertheless, Suh et al. (2010) showed that retweeting can also depend on content features, such as URLs or hashtags, and on contextual features, such as the number of followers.
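To make the distinction between content and contextual features concrete, here is a minimal Python sketch that extracts the three example features named above from a single tweet. The `Tweet` fields and the regular expressions are simplifying assumptions; this is not Suh et al.'s (2010) feature set or data format.

```python
import re
from dataclasses import dataclass

# Simplified patterns (assumptions): any http(s) token counts as a URL, '#word' as a hashtag.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


@dataclass
class Tweet:
    # Hypothetical fields; real Twitter API payloads are structured differently.
    text: str
    author_followers: int


def retweet_features(tweet: Tweet) -> dict:
    return {
        # Content features: derived from the tweet text itself.
        "has_url": bool(URL_RE.search(tweet.text)),
        "num_hashtags": len(HASHTAG_RE.findall(tweet.text)),
        # Contextual feature: a property of the author, not of the text.
        "followers": tweet.author_followers,
    }


example = Tweet("Flood warning for #Cologne, details: https://example.org/warn #Hochwasser", 125000)
print(retweet_features(example))
# {'has_url': True, 'num_hashtags': 2, 'followers': 125000}
```

Features like these could then feed any model of retweet likelihood; keeping content and contextual features separate makes it easy to compare their relative contribution.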
‘The bot predicted rain, grab an umbrella’: few perceived differences in communication quality of a weather Twitterbot versus professional and amateur meteorologists
Published in Behaviour & Information Technology, 2019
Patric R. Spence, Autumn Edwards, Chad Edwards, Xianlin Jin
With over 328 million monthly users (Aslam 2017), Twitter is one of the largest social networking platforms in the world. Twitter allows both humans and automated bots to interact with other users (Hwang, Pearce, and Nanis 2012; Varol et al. 2017). Automated bots (Twitterbots) are programmes that can produce Twitter posts, follow other Twitter users, and retweet posts. Up to 15% of Twitter users, or roughly 48 million accounts, are estimated to be bots (Varol et al. 2017). Chu, Gianvecchio, Wang, and Jajodia (2010) argued that the automation of a Twitter feed can obscure the type of agent (human user or bot) producing the feed. Many Twitterbots mimic human behaviour, which makes distinguishing bots from humans more complex (Boshmaf et al. 2011). Twitterbots have been designed to tweet about news (Lokot and Diakopoulos 2016) and politics (Woolley 2016), call volunteers to action (Savage, Monroy-Hernandez, and Höllerer 2016), and parody literature (Bollmer and Rodley 2016), to name a few applications. Yet relatively little is known about how people judge these digital communication sources in comparison to their human counterparts.
Data mining of IoT based sentiments to classify political opinions
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2022
Aqsa Manzoor, Zahoor Ur Rehman, Muhammad Shaheen, Muhammad Zeb Khan
Gruzd and Roy (2013) studied the 2011 Canadian federal election and, using Twitter data, identified the areas of IoT where polarisation existed. Both content analysis and social network analysis were used to determine polarisation across the IoT. Bessi et al. (2016) extended this line of analysis by examining polarisation dynamics, that is, how users become more polarised based on their comments. They carried out a quantitative analysis of a large dataset of 12 million users and compared the consumption patterns of videos supporting conspiracy theories and scientific news on YouTube and Facebook. Furthermore, Hanna et al. (2013) presented a computational method for measuring political polarisation and examined user engagement during the French and U.S. presidential campaigns. They derived partisan alignment from the retweet network and hashtag usage, and found evidence of unambiguous political polarisation in the French case, whereas the U.S. case showed less partisan polarisation. Makazhanov et al. (2014) worked on predicting users' political preferences from their S-IoT behaviour towards political parties. They proposed predictive language models based on a variety of behavioural and contextual features, and evaluated user preference by aligning users' tweets with the language models of the political parties. Both logistic regression and J48 decision tree classifiers were used for classification. Moreover, the approach was assessed in the context of the 2013 Pakistani and 2012 Albertan elections, where it outperformed text classification and sentiment analysis approaches in terms of F-measure.
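The idea of aligning a user's tweets with party language models can be illustrated with a deliberately simplified Python sketch: one add-one-smoothed unigram model per party, with the user assigned to the party whose model gives their tweets the highest log-likelihood. This is an assumption-laden stand-in, not Makazhanov et al.'s models. It omits their behavioural and contextual features and the logistic regression and J48 classifiers, and the party names and tweets are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"\w+", text.lower())


class UnigramLM:
    """Unigram language model with add-one (Laplace) smoothing."""

    def __init__(self, docs, vocab_size: int):
        self.counts = Counter(tok for doc in docs for tok in tokenize(doc))
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def log_prob(self, tokens) -> float:
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[t] + 1) / denom) for t in tokens)


def train_party_models(party_tweets: dict) -> dict:
    # Shared vocabulary across all parties, plus one slot for unseen words.
    vocab = {tok for docs in party_tweets.values() for doc in docs for tok in tokenize(doc)}
    return {party: UnigramLM(docs, len(vocab) + 1) for party, docs in party_tweets.items()}


def predict_party(models: dict, user_tweets: list):
    """Return the party whose model best explains the user's tweets, plus all scores."""
    tokens = [tok for tw in user_tweets for tok in tokenize(tw)]
    scores = {party: lm.log_prob(tokens) for party, lm in models.items()}
    return max(scores, key=scores.get), scores


party_tweets = {  # hypothetical training data
    "PartyA": ["lower taxes and smaller government", "cut taxes for families"],
    "PartyB": ["invest in public healthcare", "expand public transit and healthcare"],
}
models = train_party_models(party_tweets)
label, scores = predict_party(models, ["we need lower taxes", "taxes are too high"])
print(label)  # PartyA
```

Using a shared vocabulary size for every party keeps the smoothed probabilities comparable across models. In a fuller pipeline, the per-party scores could serve as features for a downstream classifier rather than being used directly as the prediction.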