Predicting the Popularity Rating of Thai TV Drama by Text Mining of Social Network
DOI:
https://doi.org/10.14456/nujst.2021.38
Keywords:
Text Mining, Sentiment Analysis, Multiple Regression, Twitter
Abstract
The objectives of this study were to build a model that predicts the popularity ratings of Thai TV drama programs from identified and synthesized factors affecting those ratings, and to assess the accuracy of the model in terms of the Root Mean Square Error (RMSE) of its predictions. Both structured and unstructured data were analyzed. The structured data comprised the TV channel airing the program, type of drama, on-air time, number of episodes, average time per episode, number of viewers watching already aired programs, number of viewers watching the highlights of already aired programs, and number of viewers listening to program soundtracks. The unstructured data consisted of messages posted on Twitter. The messages were processed by sentiment analysis, and the resulting sentiments were analyzed together with the structured data by multiple regression to yield predicted popularity ratings. The results show that comments on Thai TV drama programs in social media significantly affected the predicted popularity ratings of those programs. One factor affecting the predicted ratings was ‘message with positive sentiment’. The number of viewers watching the highlights of already aired programs positively affected the popularity ratings when other factors were held fixed, whereas the number of viewers watching already aired programs had a significant negative effect (p < 0.05). The RMSE of the prediction model was 0.717 on the training data set, which contained data from 430,256 people, and 0.41 on the test data set, which contained data from 246,133 people. These findings may directly benefit Thai TV drama producers and TV channel administrators in their efforts to provide programs that satisfy most viewers.
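As a rough illustration of the modelling step described in the abstract, the following minimal sketch fits a multiple regression by ordinary least squares and reports the RMSE of its predictions. It is not the authors' implementation. The two predictors (a positive-sentiment message count and a highlight-viewer count), the function names, and all numbers are hypothetical toy values chosen for illustration, not data from the study.

```python
import math

def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]  # [intercept, b1, b2, ...]

def predict(coefs, row):
    """Predicted rating for one program's predictor values."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

def rmse(actual, predicted):
    """Root Mean Square Error between actual and predicted ratings."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Hypothetical toy data: (positive-sentiment messages, highlight viewers),
# both in arbitrary units. The ratings are made up and exactly linear.
features = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ratings = [1.9, 2.2, 3.3, 3.6, 4.5]

coefs = fit_ols(features, ratings)
train_rmse = rmse(ratings, [predict(coefs, f) for f in features])
print("coefficients:", [round(c, 3) for c in coefs])
print("training RMSE:", round(train_rmse, 6))
```

In practice the RMSE would be computed separately on held-out test data, as the abstract reports for its training and test sets, and a statistics library would normally replace the hand-written solver.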
License
Copyright (c) 2021 Naresuan University Journal: Science and Technology
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.