Being aware of bias and manipulation in stock market news
In a news landscape incentivized to keep readers on the page and serve them ads, here are some steps you can take to stay vigilant and objective when researching stocks.
Uptrends Cofounder & CEO
At Babbl, we’re *passionate* about improving the way folks find & act on market-moving news. A core component of this mission has been our work in natural language processing (NLP). We rely on our team’s deep expertise in NLP, machine learning (ML), and linguistics to create a delightful product for our users; a product that enables DIY investors to track sentiment in stock market conversation en masse. This is our quick overview of how we do it.
First, let’s set the table by answering a few questions rapid fire:
Our sentiment scoring is a way to gauge the overall tone or attitude being expressed about a stock or the market at large in online discourse. The theory is that general sentiment can (and often does) have an impact on stock market outcomes — high degrees of news optimism (bullish discourse) can lead to boosted investor confidence and encourage buying activity, while high degrees of news pessimism (bearish discourse) can have the opposite effect.
Babbl’s sentiment is measured for individual stocks, industries, and the market at large. In each case, sentiment is a composite of four indicators that each measure some aspect of a given text snippet’s tone and quality. In order of priority, they are: mood, tense, relevance, and credibility. The composite sentiment score, measured sentence by sentence, weighs each indicator by its absolute value at a given moment in time, its divergence from its historical average, and its rate of change over time. The composite score is normalized on a scale from -100 to +100, where -100 represents maximally pessimistic (bearish) sentiment and +100 represents maximally optimistic (bullish) sentiment. More on each of these four indicators in the section below.
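To make the weighting idea concrete, here is a minimal sketch of how a composite could be assembled. The indicator weights, the feature mix, and every name in it (`IndicatorReading`, `composite_score`, and so on) are assumptions for illustration; they are not Babbl's actual values or code. Each indicator is assumed to be pre-scaled to the range -1 to 1.

```python
from dataclasses import dataclass

# Hypothetical weights in the stated priority order (mood first); not real values.
INDICATOR_WEIGHTS = {"mood": 0.4, "tense": 0.3, "relevance": 0.2, "credibility": 0.1}
# Hypothetical mix of: current level, divergence from history, rate of change.
FEATURE_WEIGHTS = (0.6, 0.25, 0.15)


@dataclass
class IndicatorReading:
    current: float          # value right now, assumed to lie in [-1, 1]
    historical_mean: float  # long-run average of the same indicator
    previous: float         # value at the previous time step


def indicator_signal(reading: IndicatorReading) -> float:
    """Blend level, divergence from the historical average, and change."""
    w_level, w_divergence, w_change = FEATURE_WEIGHTS
    divergence = reading.current - reading.historical_mean
    change = reading.current - reading.previous
    return w_level * reading.current + w_divergence * divergence + w_change * change


def composite_score(readings: dict) -> float:
    """Weighted sum of the four indicator signals, clipped to [-100, 100]."""
    raw = sum(
        weight * indicator_signal(readings[name])
        for name, weight in INDICATOR_WEIGHTS.items()
    )
    return max(-100.0, min(100.0, 100.0 * raw))
```

Clipping at the end keeps the result on the -100 to +100 scale even when divergence and rate of change push the raw value past the bounds.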
Babbl currently tracks discourse from a variety of online sources: accredited financial news outlets (e.g. MarketWatch, CNBC, Yahoo Finance), social media platforms (e.g. Twitter, Reddit), and newsletter providers (e.g. Substack). We refer to this variety of discourse collectively as “news”. New sources are added periodically as they become available, and future sources might include alternative outlets such as YouTube comments, TikTok or podcast transcriptions, and more. A full list of our current news source coverage can be found here.
Each component and the composite sentiment score are calculated for a given stock or the market at large as soon as new sentences about a stock become available.
Babbl’s sentiment scores are used to gauge the mood of the market. Many investors are emotional, speculative and reactionary, and sentiment is a tool for measuring these emotions and biases more objectively. When combined with fundamental and technical analysis, sentiment analysis forms the third leg of the three-legged stool of investment research. Used together, all three legs can inform better stock market decisions.
Now, an overview of what goes into a news sentiment score:
Before even measuring the four indicators of sentiment, the very first step is to pull in news content being written about a given stock, sector, or the market at large. Babbl extracts news from the variety of sources mentioned above, and then separates each new piece of content into what we call “snippets”: the component sentences or phrases that make up an article, Tweet, blog post, or comment. Within each snippet, we use NLP to determine its subject entity: this could be an individual stock (ex: Apple), sector (ex: Healthcare), or the market as a whole. Once we’ve identified which entity a snippet pertains to, its sentiment is calculated as a composite of four components.
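As a rough illustration of the ingestion step described above, the sketch below splits text into sentence-level snippets and tags each one with the stocks it mentions. It assumes a tiny hand-written alias table (`ENTITY_ALIASES`) and simple regex matching; a real pipeline would rely on a trained entity recognizer, and none of these names come from Babbl's codebase.

```python
import re

# Hypothetical alias table mapping a ticker to words that refer to it.
ENTITY_ALIASES = {
    "AAPL": ("apple", "aapl"),
    "TSLA": ("tesla", "tsla"),
}


def split_into_snippets(text: str) -> list:
    """Split text into sentences on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def find_entities(snippet: str) -> list:
    """Return the tickers whose aliases appear as whole words in the snippet."""
    words = set(re.findall(r"[a-z]+", snippet.lower()))
    return [
        ticker
        for ticker, aliases in ENTITY_ALIASES.items()
        if words & set(aliases)
    ]
```

For example, `find_entities("Apple beat estimates.")` matches the alias "apple" and returns `["AAPL"]`, while a snippet that mentions no known alias returns an empty list.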
Once a snippet mentioning a given stock, sector, or the market has been parsed, the next step is to calculate its 4 component sentiment indicators as follows:
This is what folks typically think of when they hear “sentiment” — mood is the most crucial piece of the puzzle; it refers to the amount of positive or negative language being expressed about a given entity. This is measured via NLP by detecting positive or negative keywords and associated grammatical patterns (ex: “Tesla looks good” or “Amazon dipped more than the broader market”). Think of this as the “up” or “down” of the sentiment.
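A bare-bones way to picture mood scoring is a keyword lexicon. The sketch below is only that: the word lists are made up for illustration, and it ignores the grammatical patterns mentioned above, so it should not be read as how Babbl measures mood.

```python
import re

# Hypothetical mini-lexicons; a real system would use far richer resources.
POSITIVE_WORDS = {"good", "beat", "surged", "rallied", "strong"}
NEGATIVE_WORDS = {"dipped", "fell", "missed", "weak", "plunged"}


def mood_score(snippet: str) -> float:
    """Return a value in [-1, 1]: +1 all positive, -1 all negative, 0 neutral."""
    words = re.findall(r"[a-z]+", snippet.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive + negative == 0:
        return 0.0  # no mood-bearing keywords found
    return (positive - negative) / (positive + negative)
```

With these lists, “Tesla looks good” scores 1.0 and “Amazon dipped more than the broader market” scores -1.0.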
The second key here is the time-sense (or more succinctly, tense) in which a given snippet’s mood is expressed. Sentences about an entity can be written in past, present, or future tense, and each provides a different context for the mood. We think of this as the direction of sentiment: past-tense mood is reactionary, while future-tense mood is speculative.
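Tense detection can be approximated with marker words. This is a deliberately crude sketch under that assumption: the marker sets are invented for illustration, and the "ends in -ed" rule will misfire on words like "need". A production system would more likely use a part-of-speech tagger.

```python
import re

# Hypothetical marker words; checked in order: future, then past, else present.
FUTURE_MARKERS = {"will", "shall", "expects", "forecasts", "upcoming"}
PAST_MARKERS = {"was", "were", "had", "did", "fell", "rose", "dipped"}


def detect_tense(snippet: str) -> str:
    """Classify a snippet as 'future', 'past', or 'present' by marker words."""
    words = re.findall(r"[a-z]+", snippet.lower())
    if any(word in FUTURE_MARKERS for word in words):
        return "future"  # speculative mood
    if any(word in PAST_MARKERS or word.endswith("ed") for word in words):
        return "past"  # reactionary mood
    return "present"
```

Future markers are checked first so that a sentence like “Apple will report earnings next week” is treated as speculative even if it also contains a past-looking word.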
The third component is relevance, specifically market relevance. Within each snippet, we search for identifier words to determine how relevant the snippet is to a given entity's financial performance. The idea here is to focus more on snippets talking about an entity’s financials (ex: “Apple’s stock fell”) and less on snippets talking about non-financial things (ex: “How to jailbreak your iPhone”). Think of this as more of a filter than a direction.
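The filtering idea can be sketched as counting financial identifier words. The term list and the choice to saturate at two hits are assumptions made for this example, not Babbl's actual vocabulary or thresholds.

```python
import re

# Hypothetical identifier words that signal a snippet is about financials.
FINANCIAL_TERMS = {"stock", "shares", "earnings", "revenue", "dividend", "profit"}


def relevance_score(snippet: str) -> float:
    """Return 0.0 (not financial) up to 1.0 (two or more financial terms)."""
    words = re.findall(r"[a-z]+", snippet.lower())
    hits = sum(word in FINANCIAL_TERMS for word in words)
    return min(1.0, hits / 2)
```

Here “Apple's stock fell” scores 0.5 and “How to jailbreak your iPhone” scores 0.0, so the second snippet would be filtered out or heavily down-weighted.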
The fourth and final piece is credibility. This pertains more to the author of a given snippet than the text within a snippet itself (accounting for things like views, follower count, etc.), but does partially account for the language being used in a given snippet. Think of this as the weight of a snippet; after all, not all opinions are created equal.
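One simple way to turn audience size into a weight is a capped logarithm, so that going from 100 to 10,000 followers matters more than going from 1 million to 1.01 million. The function below is a sketch under that assumption; the name `credibility_weight`, the `cap` value, and the decision to ignore the snippet's language are all illustrative choices.

```python
import math


def credibility_weight(follower_count: int, views: int, cap: int = 1_000_000) -> float:
    """Map author reach to a weight in [0, 1] using a capped log scale."""
    reach = max(0, follower_count) + max(0, views)
    # log1p keeps zero reach at weight 0; reach at or above `cap` gets weight 1.
    return min(1.0, math.log1p(reach) / math.log1p(cap))
```

The log scale keeps a handful of very large accounts from drowning out everyone else while still giving them more say than an anonymous account with no audience.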
Once snippets have been ingested and scored across these four dimensions to determine a composite sentiment score, the final step is aggregating individual snippet scores into one holistic sentiment score for the entire entity:
Assuming a given stock has more than a minimum threshold of snippets over a given time window (say N=30 snippets), its sentiment is calculated by simply taking the average sentiment across all snippets pertaining to the stock over the set period of time. This can be done over an hourly, daily, weekly, or monthly window.
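The thresholded average described above can be sketched in a few lines. The function name, the `None` return for thin coverage, and reading the threshold as "at least 30 snippets" are assumptions made for the example.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_SNIPPETS = 30  # the example threshold from the text (N=30)


def stock_sentiment(
    snippet_scores: Sequence[float], min_snippets: int = MIN_SNIPPETS
) -> Optional[float]:
    """Average snippet scores for one stock over one time window.

    Returns None when there are too few snippets to report a score.
    """
    if len(snippet_scores) < min_snippets:
        return None
    return mean(snippet_scores)
```

The caller is responsible for passing only the snippets that fall inside the chosen window, whether hourly, daily, weekly, or monthly.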
One level above the sentiment of an individual stock, sector sentiment is calculated by taking the average sentiment across all snippets pertaining to the group of stocks included in the given sector. For example, to calculate the sentiment of the Healthcare sector, the sentiment of all snippets pertaining to Healthcare stocks (ex: Johnson & Johnson, Abbott Labs, CVS Health, etc.) is averaged to form a singular aggregate Healthcare sentiment score over a given hour, day, week, or month.
Finally, the sentiment of the market at large is calculated by simply taking the average sentiment of all snippets mentioning any stock. An important note here is that this does not include snippets that do not pertain directly to an individual stock.
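Sector and market aggregation follow the same averaging pattern, just over wider groups of snippets. In this sketch the `SECTOR_OF` mapping and the `(ticker, score)` snippet shape are assumptions for illustration; snippets with no ticker are represented by `None` and are left out of the market average, as noted above.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional

# Hypothetical ticker-to-sector mapping, for illustration only.
SECTOR_OF = {"JNJ": "Healthcare", "ABT": "Healthcare", "CVS": "Healthcare", "AAPL": "Technology"}


def sector_sentiment(snippets: list) -> dict:
    """Average snippet scores per sector; each snippet is (ticker, score)."""
    scores_by_sector = defaultdict(list)
    for ticker, score in snippets:
        sector = SECTOR_OF.get(ticker)
        if sector is not None:
            scores_by_sector[sector].append(score)
    return {sector: mean(scores) for sector, scores in scores_by_sector.items()}


def market_sentiment(snippets: list) -> Optional[float]:
    """Average every snippet tied to a stock; skip snippets with no ticker."""
    scores = [score for ticker, score in snippets if ticker is not None]
    return mean(scores) if scores else None
```

Note that both functions average over snippets rather than over per-stock averages, so a heavily discussed stock pulls its sector and the market score further than a rarely mentioned one.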
Last but not least, it feels important to include a few notes about what our news sentiment doesn't directly account for. While we believe news sentiment does implicitly account for micro- and macro-market effects, our sentiment refers only to what’s being expressed via text in online discourse, and does not take into account explicit components relating to stock or market performance, such as the following:
If you made it this far, thank you for reading and getting up to speed on how things work around here. If you have any questions, comments, suggestions, or cheap shots, please let us know by emailing us at firstname.lastname@example.org. Finally, and perhaps most importantly, do your own due diligence. We believe market sentiment can be a great tool for managing your stock market decisions, but just as you wouldn’t rely on a screwdriver alone to build your house, you shouldn’t take a news sentiment score like ours to be the only thing informing your investments. Thanks!