Big Data Visualization: Twitter
This literature review chapter focuses on previous studies of big data visualization, especially techniques used in the analysis and visualization of internet-based social media, such as the credibility of Twitter users. Recent studies focusing on visualizing and analyzing Twitter content have also been used to reveal fake users. Resources on Twitter-based cybercrimes, such as phishing, link farming, and the spreading of hate or inflammatory content, have also been analyzed.
According to Cheong (2011), existing social media platforms differ in both the content shared on them and their characteristics. Twitter is one of the most popular social media platforms, with unique characteristics and content. De Longueville et al. (2009) note that the content shared on Twitter, which includes users' information, opinions, and reactions, corresponds to real-life events. They also indicate that Twitter content consumers face an immense challenge in that information on the platform is highly polluted.
Therefore, extracting good-quality information from all the generated content is necessary, given the increased presence of inflammatory content, fake images, spam, phishing, advertisements, and rumors on the platform. According to Ghosh et al. (2012), micro-blogs such as Twitter are well suited to news-based information sharing and dissemination because they are normally public, which widens the content's audience. Previous studies have used varying classical computational strategies, such as characterization, classification, ranking, and user surveys, to gather more information on the issue of trust on Twitter (Zhang and Li, 2014).
Some of the past studies on the issue used various categories of classifiers, such as decision trees, Naïve Bayes, and SVM, to detect non-credible information, spam, and phishing on Twitter using network-, user-, message-, and topic-based features. Other studies have applied and improved ranking algorithms for questions of trust, for instance spam and credibility (Dilrukshi and de Zoysa, 2014).
According to Zhang and Li (2014), several techniques exist for measuring and visualizing user trustworthiness on social media, including machine learning, graph-based, and feature-based approaches. Cheong (2011) indicates that some machine-learning techniques are very useful for analyzing Twitter content and users by providing insights into such information. Grier et al. (2010) also note that graphical machine-learning techniques are critical because they enhance the visualization of users' credibility. Some of the widely used machine-learning techniques are briefly discussed below.
Cheong (2011) defines machine learning as a data analysis technique that automates analytical model building. Machine learning enables computers to identify hidden insights without being explicitly programmed, using algorithms that learn iteratively from data. Although machine-learning algorithms have existed for a considerably long time, the recent need for complex mathematical calculations to analyze big data at improved speed has increased the demand for machine-learning techniques. According to Grier et al. (2010), machine learning evolved from researchers' interest in artificial intelligence and in whether computers can learn without being programmed. De Longueville et al. (2009) indicate that machine-learning techniques fall into three categories: supervised, unsupervised, and reinforcement approaches.
Supervised machine-learning algorithms perform predictions on a particular set of samples (Arakawa et al., 2014). These algorithms identify patterns in the value labels assigned to data points. Unsupervised machine-learning techniques, on the other hand, organize data into clusters to describe its structure and simplify complex data, without the use of labels. Reinforcement approaches use every data point to choose an action and later analyze the decision (Arakawa et al., 2014). According to Corvey et al. (2012), such a technique changes its approach over time to obtain the best results.
Naïve Bayes Classifier Algorithm
One machine-learning technique is the Naïve Bayes classifier algorithm, which supports the classification of a document, email, webpage, or lengthy text. According to Reyes and Smith (2015), a classifier is a function that assigns an element of a population to one of the existing categories. The Naïve Bayes algorithm is commonly used in spam filtering, where a spam filter is a classifier that labels emails as 'spam' or 'not spam'. According to Chung (2016), the Naïve Bayes classifier is one of the most popular techniques that use a probabilistic approach to build machine-learning models, especially for describing documents. The method is based on Bayes' probability theorem and performs a subjective content analysis.
Figure 1: Naive Bayes graphical representation (Reyes and Smith, 2015).
Previous studies have applied the Naïve Bayes approach to investigate fake users and content on social media, especially Twitter. Currently, Facebook uses a Bayes classifier algorithm to analyze updates that express negative or positive emotions. Google also uses the technique to compute relevancy scores and index documents. The technique is also very popular for analyzing and classifying technology-related documents and filtering spam emails (Dilrukshi and de Zoysa, 2014).
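To illustrate the mechanics of this classifier, the following sketch trains a word-count Naïve Bayes model with Laplace smoothing on a toy spam/ham corpus. All words, labels, and counts here are invented for illustration and are not drawn from any cited study.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (label, list-of-words). Returns priors, per-class word counts, vocabulary."""
    priors = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, words in docs:
        priors[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return priors, word_counts, vocab

def classify_nb(words, priors, word_counts, vocab):
    """Pick the label maximizing log P(label) + sum of log P(word|label)."""
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            lp += math.log((word_counts[label][w] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Invented toy training corpus
training = [
    ("spam", ["win", "free", "prize"]),
    ("spam", ["free", "offer", "click"]),
    ("ham",  ["meeting", "tomorrow", "agenda"]),
    ("ham",  ["lunch", "tomorrow", "noon"]),
]
model = train_nb(training)
print(classify_nb(["free", "prize", "click"], *model))  # spam
```

The same scoring scheme extends directly to tweet words or other short texts once labeled training data is available.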
K-Means Clustering Algorithm
K-means is a very common unsupervised machine-learning approach used for cluster analysis (Alrubaian et al., 2016). The method is iterative and non-deterministic; the algorithm operates on a specific data set with a predefined number of clusters, K, and partitions the input data into K clusters. For instance, consider clustering Google search results for the word 'jaguar': K-means can group the documents according to how the word is used, so that documents in which 'jaguar' refers to an animal fall in one cluster, while those in which it refers to a car fall in another. The method has also been widely applied to detect fake users by clustering content based on similarity and to establish the relevance rate of content on internet-based social media such as Facebook and Twitter (Kim and Park, 2013).
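The iterative partitioning described above can be sketched as a minimal Lloyd's-algorithm K-means on invented two-dimensional points (standing in for, say, document embeddings), with fixed initial centroids for reproducibility; production implementations add random restarts and convergence checks.

```python
def kmeans(points, centroids, iters=10):
    """Lloyd's algorithm with caller-supplied initial centroids (for reproducibility)."""
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two obviously separated groups of invented 2-D points
pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 8)]
cents, groups = kmeans(pts, centroids=[(0, 0), (10, 10)])
print(len(groups[0]), len(groups[1]))  # 3 3
```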
Support Vector Machine (SVM) Learning Algorithms
SVM is a supervised machine-learning method in which the training data set teaches the model about the classes so that new data can be easily classified. The approach separates data into classes by establishing a boundary known as a hyperplane. SVM algorithms come in two categories, linear and non-linear support vector machines (Chung, 2016). Linear SVMs separate the training data set using a hyperplane, while non-linear algorithms do not use a linear hyperplane. According to Grier et al. (2010), SVMs are very applicable to analyzing stock markets and can also be used to classify data on online social media.
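A minimal sketch of how such a separating hyperplane can be learned is given below, using hinge-loss sub-gradient descent on invented, linearly separable toy data. This is an illustrative simplification: real SVM solvers use more sophisticated optimization, and the parameter values here are arbitrary assumptions.

```python
def train_linear_svm(data, lam=0.01, epochs=200, lr=0.1):
    """Hinge-loss sub-gradient descent for a linear SVM.
    data: list of (features, label) with label in {-1, +1}.
    A constant 1.0 is appended to each feature vector to act as the bias term."""
    dim = len(data[0][0]) + 1
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [lam * wi for wi in w]  # regularization term
        for x, y in data:
            xb = list(x) + [1.0]
            if y * sum(wi * xi for wi, xi in zip(w, xb)) < 1:  # margin violated
                grad = [g - y * xi / len(data) for g, xi in zip(grad, xb)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) >= 0 else -1

# Invented, linearly separable toy data
train = [((2, 2), 1), ((3, 1), 1), ((-2, -1), -1), ((-3, -2), -1)]
w = train_linear_svm(train)
print(predict(w, (2.5, 2)), predict(w, (-2.5, -1)))  # 1 -1
```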
According to Dilrukshi and de Zoysa (2014), other machine-learning techniques include Apriori, linear regression analysis, which compares two variables, and decision trees, among others.
Figure 2: Decision tree (Chung, 2016).
Feature-based methods of analyzing internet-based social media use text, network, propagation, and top-element feature subsets. Text subsets involve analyzing message characteristics such as average tweet length, sentiment features, and the presence of URLs. Network subsets concern message authors, including their numbers of friends and followers. Propagation subsets include tweets and retweets, among other features. Finally, top-element subsets include the fraction of tweets containing the most frequent hashtags, URLs, and mentions, among others (Dilrukshi and de Zoysa, 2014).
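As a sketch of how such feature subsets might be computed, the function below derives a few text and top-element features from a list of tweet strings. The field names, sample tweets, and regular expressions are illustrative assumptions, not taken from the cited study.

```python
import re
from statistics import mean
from collections import Counter

def tweet_features(tweets):
    """Compute a few text and top-element feature-subset values for a tweet list."""
    url_re = re.compile(r"https?://\S+")
    hashtags = Counter(tag for t in tweets for tag in re.findall(r"#\w+", t))
    top_tag, _ = hashtags.most_common(1)[0] if hashtags else (None, 0)
    return {
        "avg_length": mean(len(t) for t in tweets),          # text subset
        "frac_with_url": sum(bool(url_re.search(t)) for t in tweets) / len(tweets),
        "top_hashtag": top_tag,                              # top-element subset
        "frac_with_top_hashtag": (
            sum(top_tag in t for t in tweets) / len(tweets) if top_tag else 0.0
        ),
    }

# Invented sample tweets
sample = [
    "Breaking: floods in the valley http://example.com #floods",
    "Stay safe everyone #floods",
    "Lunch was great today",
]
feats = tweet_features(sample)
print(feats["top_hashtag"], round(feats["frac_with_url"], 2))  # #floods 0.33
```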
According to Chung (2016), numerous tools and techniques exist for visualizing users' trustworthiness, including graph-, chart-, and map-based approaches. Chung (2016) also argues that trustworthiness techniques and tools are critical in visualizing users' credibility and influence, since they create a physical representation of data and thereby improve understanding. Previous studies have examined several techniques and tools, such as credibility and influence approaches, and identified the strengths and weaknesses associated with these strategies.
One tool used in visualizing users' trustworthiness is Truthy, an online platform for studying the diffusion of information on Twitter and computing the trustworthiness of publicly streamed micro-blog posts related to a particular event, in a bid to detect misinformation, political smears, astroturfing, and other categories of social media pollution. Truthy uses a Boolean technique to analyze a user's content, returning a true or false value (Arakawa et al., 2014). Regarding visualization of users' credibility, the tool identifies malicious information and represents the data graphically.
The energy function is a reliable tool that supports relational learning over large amounts of data from many applications, including internet-based social media. Grier et al. (2010) note that the energy function involves embedding multi-relational graphs in a vector space that is both continuous and flexible while preserving the original data. Some techniques related to the function encode the semantics of graphs so as to assign low energy values to plausible components. Graphs from the tool are very relevant in visualizing data related to particular content (Awan, 2014).
J48 Decision Trees
According to Jungherr (2014), decision trees are another robust technique for visualizing online data, in which an algorithm known as Iterative Dichotomiser 3 (ID3) is used to predict the target variable of a new record in a data set. The tool uses attribute-based features, such as the length and width of content, to predict the objective attribute. The technique is very useful for classifying content and producing figures that improve the visibility of data (Arakawa et al., 2014). Cheong (2011) also notes that the tool has been widely used to analyze and visualize the credibility of online social media users and their content, owing to the method's ability to graphically display the classification process. Dilrukshi and de Zoysa (2014) further argue that, because the algorithm can be combined with other powerful tools such as Weka, a Java application developed at the University of Waikato in New Zealand that improves the classification of pre-formatted data and the visualization of the process, decision trees are widely used in the analysis of online content that demands powerful software.
The LDA (Latent Dirichlet Allocation) model is a technique that uses the Dirichlet distribution to identify topics in documents and to represent those topics as percentages (Arakawa et al., 2014). According to Dilrukshi and de Zoysa (2014), the technique is highly applicable to visualizing credibility on internet-based social media such as Twitter, especially in identifying inflammatory or hate content. They argue that, because the technique can identify topics containing offensive words and present the data as percentages, thereby easing the separation of malicious from genuine content, the method improves the visualization of content and user credibility.
Bootstrapping is a statistical approach used to analyze the efficiency of data analysis and visualization tools using random samples (Sharf and Saeed, 2013). The technique involves assigning accuracy measures, such as prediction error, confidence intervals, or variance, to random estimates or samples. According to Corvey et al. (2012), the technique is very useful in visualizing internet-based social media users, since it estimates the performance of the approaches used to classify and analyze data, thus providing immense knowledge about the topic. In addition, the output of bootstrapping includes a graphical representation of the techniques used to evaluate credibility, improving the visibility of data.
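A minimal sketch of the percentile bootstrap is shown below, estimating a confidence interval for the accuracy of a hypothetical credibility classifier; the outcome data are invented for illustration.

```python
import random

def bootstrap_ci(values, stat=lambda xs: sum(xs) / len(xs),
                 n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any statistic of a sample."""
    rng = random.Random(seed)  # seeded for reproducibility
    estimates = sorted(
        stat([rng.choice(values) for _ in values]) for _ in range(n_resamples)
    )
    lo = estimates[int(alpha / 2 * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Per-tweet correctness of a hypothetical credibility classifier (1 = correct)
outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]
lo, hi = bootstrap_ci(outcomes)
print(lo <= sum(outcomes) / len(outcomes) <= hi)  # True
```

The interval (lo, hi) conveys how much the measured accuracy could vary under resampling, which is the kind of accuracy measure the paragraph above describes.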
Other approaches used to improve the visibility of users include rating, whereby approaches and user accounts are rated using features such as precision and the quality of content, respectively (Verma, Divya, and Sofat, 2014). Computing percentages of the data involved in various analyses is another technique for improving the visibility of credibility. The method may include charts containing information about several mathematical computations and comparisons.
Several tools and techniques exist for analyzing and visualizing users' influence on Twitter. Some of these have been applied to visualize user relationships and interactions on internet-based social media. A number of influence visualization techniques are discussed below.
According to Dilrukshi and de Zoysa (2014), a geo-scatter map with links, a node-link diagram overlaid on a map, is a natural fit for graphs with geo-located nodes. The points representing the child commit's user and the parent commit's user are connected with a semi-transparent line to depict build-upon relationships. Awan (2014) notes that the lines become more opaque as the build-upon connections between nearby locations increase. A log-scaled circle is used to represent the number of commits made at a specific location, highlighting diversity at that location at the expense of comparison accuracy. According to Chung (2016), the technique has been used to visualize friendships on Facebook and professional networks. Owing to their ability to provide detailed analysis of connections between users and to represent such relationships graphically, geo-scatter maps with links are highly applicable to visualizing users' influence on Facebook.
Data is one component of social media that can be highly useful in visualizing users' influence. According to Dilrukshi and de Zoysa (2014), tools such as crawlers can be used to form a seed group of usernames, whereby for every data repository the contributor, collaborator, and owner usernames, along with branch names, can be established. Fresh usernames are used to identify new repositories, while branch names are used to establish commits. In addition, data from online social media can be used to construct graphs, which further improve the visibility of influence. Awan (2014) argues that Twitter content has been heavily used in efforts to visualize users' influence on the platform.
Generating a matrix of maps from data, a technique known as small multiples, is another way to deepen analysis and improve the visualization of users' influence. According to Awan (2014), comparison is the core of quantitative reasoning; small-multiple designs, data-bountiful and multivariate, visually enhance the comparison of changes. Dilrukshi and de Zoysa (2014) note that the technique enables the visualization of patterns that help identify unforeseen influences, since they buck established trends. On Twitter, users' influence can be visualized using small-multiple techniques by revealing user details such as relationships, activities, and locations, among others.
Dilrukshi and de Zoysa (2014) argue that although linked scatter maps provide a users' influence visualization technique that reveals critical patterns intuitively, discerning more subtle relationships and connections with the method can be cumbersome and sometimes impossible. Therefore, matrix diagrams of social links are required; they improve visualization by minimizing clutter and enable the perception of edge metrics, using visual encoding based on the honeycomb project. Chung (2016) defines a matrix as a grid in which every cell represents a link metric while the columns and rows represent nodes. Metrics used in this visualization technique include followers, the number of follow links; asymmetry, the relative difference between follower totals in each direction; and deviation from the expected, the relative difference between the total of actual links and the links expected from a random sampling of the node distribution.
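The follow-link count and asymmetry metrics described above can be sketched as follows for a toy follower graph. The user names and edges are invented, and the 'deviation from expected' metric is omitted since it requires a sampling model of the node distribution.

```python
def link_metrics(edges, users):
    """edges: set of (follower, followed) pairs. For each unordered user pair,
    return the follow-link count and the asymmetry metric (relative difference
    between the link totals in each direction)."""
    metrics = {}
    for i, a in enumerate(users):
        for b in users[i + 1:]:
            ab = (a, b) in edges   # a follows b
            ba = (b, a) in edges   # b follows a
            links = ab + ba
            asym = abs(ab - ba) / links if links else 0.0
            metrics[(a, b)] = {"links": links, "asymmetry": asym}
    return metrics

# Invented toy follower graph
edges = {("alice", "bob"), ("bob", "alice"), ("carol", "alice")}
m = link_metrics(edges, ["alice", "bob", "carol"])
print(m[("alice", "bob")]["asymmetry"], m[("alice", "carol")]["asymmetry"])  # 0.0 1.0
```

Each cell of a matrix diagram would then be colored by one of these per-pair metrics, with rows and columns indexed by the users.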
Polat (2014) indicates that matrix diagrams are robust tools for visualizing users' relationships and connections on Twitter, since they provide comparisons that support an in-depth understanding of those relationships. According to Dilrukshi and de Zoysa (2014), matrix diagrams make it easy to identify the Twitter accounts responsible for much of the user influence on the platform by offering an abstract representation of user relationships.
Regarding big data visualization, the energy function discussed above is widely used to improve the visibility of large amounts of data, such as data on internet-based social media. According to Ghosh et al. (2012), big data visualization can be challenging due to the volume of content involved and the complexity of big data platforms such as browsers and some websites. Therefore, Grier et al. (2010) argue that robust data analysis and visualization tools are required. One such tool is the energy function, which provides an in-depth analysis of online data and a graphical presentation of output, thus greatly improving visibility. According to Dilrukshi and de Zoysa (2014), the possibility of using powerful software to analyze the J48 decision tree's classification process makes such algorithms highly favorable for big data analysis; the J48 classification process is highly visible using software such as Weka.
Twitter: A News Media
Immense research has been conducted to analyze the relevance of internet-based social media, especially Twitter, as an agent of news dissemination. Grier et al. (2010) revealed that Twitter is one of the prominent internet-based news media, with eighty-five percent of discussion topics on Twitter found to be related to news. The research also highlighted patterns of tweeting activity and the relationship between user-specific parameters, such as numbers of followees and followers, and tweeting/retweeting counts.
A study by Albalawi and Sixsmith (2015) used an unsupervised topic-modelling approach to compare extracted news topics against those from the New York Times, a conventional medium for news dissemination. The study revealed that although Twitter users show less interest in world news than consumers of conventional media, they still actively spread news about essential world events. Chung (2016) conducted another critical study of Twitter as a news medium using nine hundred news events from 2010-2011, demonstrating techniques to map news-event-related tweets using the energy function. The researcher proposed strategies for mapping the tweets that act as novel event detection methods.
The existence of malware, phishing attacks, spam, and compromised accounts is a primary concern regarding the quality of information on Twitter. Previous studies have examined numerous mechanisms to filter phishing and spam and proposed a number of effective solutions. According to Grier et al. (2010), phishing is one of the most prominent issues on social media platforms such as Facebook and Twitter. Albalawi and Sixsmith (2015) indicate that every year, genuine users of such platforms lose millions of dollars to phishing-related fraud. A study by Chung (2016) revealed the contribution of URL-shortening services such as bit.ly to the spread of phishing, establishing that such services enable phishing by hiding the identity of links. The scholar also demonstrated that social media's popularity for phishing has risen to match that of e-commerce sites such as PayPal.
To validate the study, the researcher used blacklisted phishing URLs sourced from PhishTank. In a follow-up study, De Longueville et al. (2009), using varying features such as URL-, tweet-, and WhoIs-based features, established features that point to phishing tweets. Using such features, the scholar identified phishing tweets with 92.5 percent accuracy. The primary deliverable of the study was a Chrome extension deployed to identify phishing on Twitter in real time. Ghosh et al. (2012) also found that of twenty-five million URLs posted on Twitter, 8% pointed to malware, phishing, and frauds appearing on well-known blacklists.
The scholar also established that compromised accounts of legitimate users were the major avenue for spreading spam, rather than dedicated or fake accounts created by spammers. Corvey et al. (2012) also conducted a study characterizing link farming on Twitter and proposed a method of fighting it on the platform. According to the scholar, link farming is a technique used to improve the rank of Twitter accounts by linking them to each other. The study also recommended a ranking strategy aimed at penalizing users who follow spammers, and it revealed a substantial decrease in spammers and their followers on the network. In an analysis of the cyber-criminal ecosystem, Polat (2014) also established how offenders create a small global network; the researcher nicknamed accounts with extraordinarily large numbers of followings and followers 'social butterflies'.
In the study, the scholar also proposed a method, known as the criminal account inference algorithm, to identify new offenders on Twitter from a group of known criminals. The algorithm propagates a malicious score from users to their followers and other social engagements to identify likely malicious accounts. Using a real-world dataset, the study also evaluated the algorithm's performance. Alrubaian et al. (2016) also used machine-learning strategies to detect spammers, using URL searches, keyword detection techniques, and username pattern matching, and achieved 91 percent accuracy. The study also established that spammers tweeted slightly more frequently than legitimate users and that their accounts were not as new as expected.
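A very rough sketch of such score propagation over a follower graph is given below. This is an illustrative simplification, not the actual criminal account inference algorithm from the cited study; the damping factor, update rule, and graph are all invented assumptions.

```python
def propagate_malice(following, seed_scores, rounds=5, damping=0.5):
    """Spread a malicious score from known-bad seed accounts to the accounts
    that follow them. following: dict account -> set of accounts it follows."""
    scores = dict(seed_scores)
    for _ in range(rounds):
        new = {}
        for acct, followed in following.items():
            inherited = [scores.get(f, 0.0) for f in followed]
            base = seed_scores.get(acct, 0.0)          # seeds keep their score
            spread = damping * max(inherited) if inherited else 0.0
            new[acct] = max(base, spread)
        scores = new
    return scores

# Invented toy graph: fan1 follows a known-bad seed, fan2 follows fan1, etc.
graph = {
    "bot_hub": set(),                  # known-bad seed account
    "fan1": {"bot_hub"},
    "fan2": {"fan1"},
    "bystander": {"fan2", "news"},
    "news": set(),
}
scores = propagate_malice(graph, {"bot_hub": 1.0})
print(scores["fan1"], scores["fan2"], scores["news"])  # 0.5 0.25 0.0
```

Accounts closer to the seed inherit higher scores, which is the intuition behind flagging likely malicious accounts by their proximity to known offenders.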
Spammers also form a category of users on other social media such as YouTube, a video-sharing site. According to Dilrukshi and de Zoysa (2014), there exist three categories of real YouTube users: legitimate users, promoters, and spammers. Alrubaian et al. (2016) also insightfully characterized phantom profiles for gaming-related applications on Facebook. Using this characterization, the researcher identified the differences in activity and behavior between phantom and legitimate user profiles. The study used an online game known as Fighters Club to establish how many of the total user applications were legitimate or phantom.
Assessing Credibility/Trust
Previous research in the computer science community has focused on assessing, analyzing, computing, and characterizing the credibility and trust of internet-based social media (Dilrukshi and de Zoysa, 2014). One such study was conducted by Chung (2016), who developed Truthy, a tool for studying the diffusion of information on Twitter and computing the trustworthiness of publicly streamed micro-blog posts related to a particular event, in a bid to detect misinformation, political smears, and astroturfing, among other categories of social media pollution. In the study, several cases of abuse by Twitter users were presented using Truthy, a live web service based on the above descriptions.
Figure 3: A screenshot of Truthy analysis (Chung, 2016).
Several researchers have also used classical machine-learning methods to establish the credibility of online social media content. Choukri et al. (n.d.) established that partially automated classification techniques can be applied to differentiate news from conversational topics, and evaluated the credibility of such approaches using varying Twitter features. Using the J48 decision-tree classification algorithm, the scholar scored 70-80 percent precision and recall and evaluated the study results against data perceived by people as the ground truth. Features used in the study included topic-, message-, user-, and propagation-based features, which enabled observations such as that tweets with negative sentiment are associated with credible news, while those without URLs are mostly related to non-credible news.
Corvey et al. (2012) argue that apart from the credibility of content shared on online social media, users' credibility is also critical. Ghosh et al. (2012) conducted a study using automated ranking techniques to assess the credibility of information sources on Twitter and identify any expertise associated with a source. The study also observed that network structure and content are very prominent features for effectively ranking Twitter users by credibility.
Some previous studies also focused on analyzing genuine information sources during specific important world events. For instance, Alrubaian et al. (2016), in an analysis of tweets posted during the Mumbai terrorist attacks, established that the largest number of information sources were unknown and had considerably few followers, and hence a low reputation on Twitter. The study indicated the need for automated mechanisms for assessing the credibility of Twitter information.
In a follow-up study, the scholar used SVM-rank machine-learning algorithms and relevance feedback, an information retrieval technique, to evaluate the credibility of content on Twitter. The researcher analyzed 14 high-impact events of 2011, establishing that 14 percent of the tweets posted about an event were spam, while 30 percent contained situational information about the event. The study also identified that only 17 percent of event-related tweets contained credible situational information.
Dilrukshi and de Zoysa (2014) also used a supervised Bayesian network, a technique for predicting the credibility of tweets in emergency situations, to analyze tweets generated during the 2011 England riots. The study proposed and evaluated a two-step methodology, in which the first step used a K-means function to detect emergencies and the second used a Bayesian network structure-learning function to determine the credibility of the information. Evaluation of the algorithm revealed an improvement over existing techniques. Chung (2016) also used tweets from eight different events to identify credibility indicators in varying situations; the study pointed to tweet length, mention tweets, and URLs as the best credibility indicators, and revealed that such features increase immensely during emergencies.
Chung (2016) conducted a study using a different approach from the one highlighted above, carrying out a survey to establish users' perceptions of content on Twitter. The study involved about two hundred participants who marked what they considered indicators of the credibility of users and content; the research established that people use features visible at a glance, such as a user's photo or username, to judge the credibility of content (Awan, 2014).
In addition, Chung (2016) also showed that, using content alone, users are poor judges of credibility and are often influenced by other pieces of information such as the username. The study also revealed a disparity between the features used by search engines and those the users consider relevant to credibility. Ghosh et al. (2012) also used a different approach to identify users with high trustworthiness and credibility, establishing topic experts on Twitter. The technique used in the study was based on Twitter crowd concepts, namely Twitter lists.
Previous studies have revealed that social media has been used to instigate hate. According to Dilrukshi and de Zoysa (2014), inflammatory content propagated during volatile real-life situations can have many adverse effects. Grier et al. (2010) conducted one of the few existing studies analyzing hate-related content on Twitter and YouTube, using semi-automated techniques to detect content used to spread hate on YouTube. They identified hate users, virtual communities, and videos using social network analysis and data-mining techniques, with a precision of 88%; specifically, they used bootstrapping techniques to detect hate on YouTube. Chung (2016) also used topic modelling and machine-learning techniques to detect offensive content on Twitter, outperforming keyword-matching techniques by achieving a true positive rate of about 75%. The scholar applied a seed lexicon of offensive words followed by LDA models to discover topics. The study established that most words in a specific topic category were not offensive on their own but formed sex-related phrases when combined with other words.
The content reviewed above reveals various approaches to identifying fake users directly, as well as indirect methods that identify malicious content and actions on Twitter. Various credibility analysis and visualization approaches have been highlighted, including feature-based and graph-based machine-learning techniques and the energy function. For instance, the LDA model has been identified as one of the most powerful techniques for identifying topics with offensive content and enabling the visualization of credibility on internet-based social media, since the method provides a percentage representation of topics with offensive words. Several surveys and case studies have also been used to articulate the arguments made in the chapter. In addition, various techniques for detecting phishing and spam, such as analyzing URLs, have been identified. The literature review reveals that a myriad of techniques exist for identifying fake Twitter users. However, the problem of fake internet users persists; hence the need for more efficient techniques for analyzing and visualizing Twitter content.
References
Albalawi, Yousef, and Jane Sixsmith. "Identifying Twitter Influencer Profiles for Health Promotion in Saudi Arabia". Health Promotion International (2015).
Alrubaian, Majed, Muhammad Al-Qurishi, Mabrook Al-Rakhami, Mohammad Mehedi Hassan, and Atif Alamri. “Reputation-Based Credibility Analysis Of Twitter Social Network Users”. Concurrency and Computation: Practice and Experience 29, no. 7 (2016): e3873.
Arakawa, Yui, Akihiro Kameda, Akiko Aizawa, and Takafumi Suzuki. “Adding Twitter-Specific Features To Stylistic Features For Classifying Tweets By User Type And Number Of Retweets”. Journal of the Association for Information Science and Technology 65, no. 7 (2014): 1416-1423.
Awan, Imran. “Islamophobia and Twitter: A Typology Of Online Hate Against Muslims On Social Media”. Policy & Internet 6, no. 2 (2014): 133-150.
Choukri, K., T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, J. Odijk, and S. Piperidis, eds., European Language Resources Association (ELRA).
Chung, Jae Eun. “A Smoking Cessation Campaign On Twitter: Understanding The Use Of Twitter And Identifying Major Players In A Health Campaign”. Journal of Health Communication 21, no. 5 (2016): 517-526.
Corvey, W. J., Verma, S., Vieweg, S., Palmer, M., and Martin, J. H. Foundations of a multilayer annotation framework for Twitter communications during crisis events. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (Istanbul, Turkey, May 2012), European Language Resources Association (ELRA).
De Longueville, B., Smith, R. S., and Luraschi, G. "OMG, from here, I can see the flames!": A use case of mining location-based social networks to acquire spatio-temporal data on forest fires. In Proceedings of the 2009 International Workshop on Location Based Social Networks (New York, NY, USA, 2009), LBSN '09, ACM, pp. 73–80.
Dilrukshi, Inoshika, and Kasun de Zoysa. “A Feature Selection Method For Twitter News Classification”. International Journal of Machine Learning and Computing 4, no. 4 (2014): 365-370.
Cheong, France, and Christopher Cheong. Social media data mining: A social network analysis of tweets during the 2010–2011 Australian floods. In PACIS (2011).
Ghosh, S., Sharma, N., Benevenuto, F., Ganguly, N., and Gummadi, K. Cognos: Crowdsourcing search for topic experts in microblogs. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (2012), SIGIR '12.
Ghosh, S., Viswanath, B., Kooti, F., Sharma, N. K., Korlam, G., Benevenuto, F., Ganguly, N., and Gummadi, K. P. Understanding and combating link farming in the Twitter social network. In Proceedings of the 21st International Conference on World Wide Web (2012), WWW '12.
Grier, C., Thomas, K., Paxson, V., and Zhang, M. "@spam: The underground on 140 characters or less". In Proceedings of the 17th ACM Conference on Computer and Communications Security (New York, NY, USA, 2010), CCS '10, ACM, pp. 27–37.
Jungherr, Andreas. “The Logic Of Political Coverage On Twitter: Temporal Dynamics And Content”. Journal of Communication 64, no. 2 (2014): 239-259.
Kim, Young An, and Gun Woo Park. “Topic-Driven Socialrank: Personalized Search Result Ranking By Identifying Similar, Credible Users In A Social Network”. Knowledge-Based Systems 54 (2013): 230-242.
Polat, Burak. “Twitter User Behaviors In Turkey: A Content Analysis On Turkish Twitter Users”. Mediterranean Journal of Social Sciences (2014).
Reyes, Joseph Anthony L., and Tom Smith. “Analysing Labels, Associations, And Sentiments In Twitter On The Abu Sayyaf Kidnapping Of Viktor Okonek”. Terrorism and Political Violence (2015): 1-19.
Sharf, Zareen, and Anwar Us Saeed. “Twitter News Credibility Meter”. International Journal of Computer Applications 83, no. 6 (2013): 49-51.
Verma, Monika, Divya Divya, and Sanjeev Sofat. “Techniques to Detect Spammers In Twitter- A Survey”. International Journal of Computer Applications 85, no. 10 (2014): 27-32.
Zhang, Yifeng, and Xiaoqing Li. “Relative Superiority of Key Centrality Measures For Identifying Influencers on Social Media”. International Journal of Intelligent Information Technologies 10, no. 4 (2014): 1-23.