Sentiment Analysis using Naive Bayes with Bigrams

ADNAN, RASHID HUSSAIN and MOHD, ABDUL HAMEED and S. FOUZIA, SAYEEDUNNISSA (2014) Sentiment Analysis using Naive Bayes with Bigrams. In: International Conference on Advances in Computing and Information Technology - ACIT 2014, 04-05 January, 2014, Bangkok.

20140326_091347.pdf - Published Version
Official URL: https://www.seekdl.org/conferences/paper/details/2...

Abstract

With the rapid growth of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. Sentiment analysis extracts, identifies and measures the sentiment or opinion expressed in documents, as well as the topics within those documents. The Naïve Bayes algorithm performs a Boolean classification, i.e. it classifies a document as either positive or negative according to its sentiment. Sayeedunnisa et al. [1] have already shown that Naïve Bayes trained on high value features, extracted from a bag-of-words model, yields an accuracy of 89.2%. This paper studies the application of the Naïve Bayes technique to sentiment analysis, adding bigram features to the training set to improve the accuracy and overall performance of the classifier. We also evaluate the impact of selecting low vs. high value features, calculated using the concept of Information Gain. Our dataset consists of tweets containing movie reviews retrieved from the Twitter social network, which were obtained and analyzed on a cloud computing platform. Our experiment is divided into three steps. The first step consists of selecting high value features (words) from our bag-of-words model. The next step involves calculating the probability of co-occurrence of words within the bag-of-words to derive a set of bigrams; we then use this set and the original features to re-train and test our classifier. In the final step, we select the most informative features (unigrams + bigrams) using a Chi-Square scoring function, which yields the best result, with accuracy at 98.2%, positive precision 98%, positive recall 98.4% and negative recall 98%.
The results show that Naïve Bayes performs best when trained on only the most informative (high value) features, consisting of both unigrams and bigrams.
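
The steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`features`, `chi_square`, `select_features`, `train`, `classify`), the four toy documents and the cutoff `k=10` are all assumptions made for illustration. As a simplification, unigrams and adjacent-word bigrams are scored together by their chi-square association with the class label, rather than deriving bigrams from co-occurrence probabilities first.

```python
import math
from collections import Counter

def features(tokens):
    """Unigrams plus adjacent-word bigrams, as presence features."""
    return set(tokens) | set(zip(tokens, tokens[1:]))

def chi_square(n11, n10, n01, n00):
    """Chi-square for a 2x2 table: n11 = present & pos, n10 = present & neg,
    n01 = absent & pos, n00 = absent & neg."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

def select_features(docs, k):
    """Keep the k features most strongly associated with either label."""
    pos = Counter(f for toks, y in docs if y == "pos" for f in features(toks))
    neg = Counter(f for toks, y in docs if y == "neg" for f in features(toks))
    n_pos = sum(1 for _, y in docs if y == "pos")
    n_neg = len(docs) - n_pos
    scores = {f: chi_square(pos[f], neg[f], n_pos - pos[f], n_neg - neg[f])
              for f in set(pos) | set(neg)}
    # Ties are broken by the feature's string form to keep the result stable.
    return set(sorted(scores, key=lambda f: (-scores[f], str(f)))[:k])

def train(docs, vocab):
    """Count selected features per class (Naive Bayes sufficient statistics)."""
    counts = {"pos": Counter(), "neg": Counter()}
    n_docs = Counter()
    for toks, y in docs:
        n_docs[y] += 1
        counts[y].update(features(toks) & vocab)
    return counts, n_docs, vocab

def classify(model, tokens):
    """Pick the class with the highest log-posterior, using add-one smoothing."""
    counts, n_docs, vocab = model
    total = sum(n_docs.values())
    best, best_lp = None, -math.inf
    for y in ("pos", "neg"):
        denom = sum(counts[y].values()) + len(vocab)
        lp = math.log(n_docs[y] / total)
        for f in features(tokens) & vocab:
            lp += math.log((counts[y][f] + 1) / denom)
        if lp > best_lp:
            best, best_lp = y, lp
    return best

# Hypothetical toy data, not taken from the paper's Twitter dataset.
docs = [
    ("great movie loved it".split(), "pos"),
    ("really good acting".split(), "pos"),
    ("not good at all".split(), "neg"),
    ("boring plot not great".split(), "neg"),
]
vocab = select_features(docs, k=10)
model = train(docs, vocab)
print(classify(model, "not good".split()))
```

In this toy data the unigram "good" appears once in each class, so its chi-square score is zero and it is dropped, while "not" and the bigram ("not", "good") are retained. This mirrors, on a very small scale, why keeping only informative unigrams and bigrams can help the classifier.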

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Naïve Bayes, Information Gain, Sentiment Analysis, Social Network, Twitter, Cloud Computing
Depositing User: Mr. John Steve
Date Deposited: 10 May 2019 11:29
Last Modified: 10 May 2019 11:29
URI: http://publications.theired.org/id/eprint/2142