MultiLexScaled
High-performance dictionary-based sentiment analysis
By Maurits van der Veen in text preprocessing, sentiment analysis
December 10, 2021
Lexicon-based sentiment analysis, using multiple lexica and scaled against representative text corpora.
Easy-to-use, high-quality sentiment analysis. Instead of trying to develop yet another general-purpose sentiment analysis lexicon, we average across eight widely used ones that have different strengths and weaknesses. In addition, we calibrate against a set of representative texts: each individual lexicon's score is adjusted so that, on those texts, its mean is 0 (the neutral point) and its standard deviation is 1. We then rescale the final average so that its standard deviation is 1 as well. The result is a sentiment measure that is readily interpretable relative to the benchmark used for scaling: a score of 1 means one standard deviation more positive than the average benchmark text. MultiLexScaled outperforms other widely used sentiment analysis dictionaries on a range of test sets.
The GitHub repository contains all the code and notebooks needed to apply MultiLexScaled to a corpus of texts, along with a paper explaining the method in more detail.
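To make the scaling steps concrete, here is a minimal sketch in Python. It is not the code from the repository: the function names, the whitespace tokenizer, the use of mean word valence as a raw score, and the choice of population standard deviation are all assumptions made for illustration, and the real lexica are far larger and handle features this sketch ignores.

```python
from statistics import mean, pstdev


def raw_score(text, lexicon):
    """Mean valence of the tokens in `text`; words missing from the lexicon count as 0.

    Assumption: a lexicon is a plain dict mapping a lowercase word to a valence.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(lexicon.get(tok, 0.0) for tok in tokens) / len(tokens)


def calibrate(benchmark_texts, lexica):
    """Learn the scaling parameters from a benchmark corpus.

    Returns the (mean, sd) of each lexicon's raw scores on the benchmark,
    plus the sd of the averaged standardized score on the same texts.
    Assumes every lexicon produces some variation on the benchmark (sd > 0).
    """
    params = []
    for lex in lexica:
        scores = [raw_score(t, lex) for t in benchmark_texts]
        params.append((mean(scores), pstdev(scores)))

    averaged = []
    for t in benchmark_texts:
        z = [(raw_score(t, lex) - mu) / sd for lex, (mu, sd) in zip(lexica, params)]
        averaged.append(mean(z))
    return params, pstdev(averaged)


def scaled_sentiment(text, lexica, params, avg_sd):
    """Standardize each lexicon's score, average them, and rescale the average."""
    z = [(raw_score(text, lex) - mu) / sd for lex, (mu, sd) in zip(lexica, params)]
    return mean(z) / avg_sd


# Toy lexica and benchmark, invented purely for this example.
lex_a = {"good": 1.0, "great": 2.0, "bad": -1.0}
lex_b = {"good": 0.5, "bad": -0.5, "awful": -1.5}
lexica = [lex_a, lex_b]
benchmark = ["good great day", "bad awful day", "a plain day", "good but bad"]

params, avg_sd = calibrate(benchmark, lexica)
for text in benchmark:
    print(f"{scaled_sentiment(text, lexica, params, avg_sd):+.2f}  {text}")
```

By construction, the scaled scores of the benchmark texts themselves have mean 0 and standard deviation 1, so a new text's score can be read as a number of benchmark standard deviations above or below neutral.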