This post was written by Union Metrics CEO and Founder Hayes Davis.
We started TweetReach in 2009 with a simple idea: a report that showed anyone the reach of tweets about any topic. Since then, we’ve grown far beyond that first reach report, adding comprehensive tracking along with many other metrics and insights. But reach is still something we care a great deal about, so I wanted to tell you about some changes we’re making to the algorithm we use to calculate it.
This is a long post, so here’s the executive summary:
- We’ve built a new and extremely robust model for calculating reach that will replace our current algorithm.
- Historical reach data won’t change, and newly calculated reach will change only slightly in most cases relative to historical trends.
- This new algorithm allows us to increase our data limits across all TweetReach Pro plans.
- These changes go into effect next week.
For those of you who are interested in learning more about how we built our new algorithm, read on.
Setting the stage
Reach is a complex metric with many definitions across vendors and industries, so let me explain how we think about reach on Twitter. For us, reach is the total number of unique Twitter accounts that received at least one tweet about a topic in some period. Knowing this helps you understand how broadly your message is being distributed on Twitter.
For most of our existence, we’ve measured reach by using Twitter’s API to determine the actual Twitter IDs of users who received tweets about a topic. From that copious raw data, we then applied a dose of math and lots of computational horsepower to derive our reach measurement. While this brute-force method produces a very reasonable estimate of reach, it has some serious drawbacks for our customers: it slows down reporting for customers pulling data on ad-hoc periods, and – while our data limits are generous relative to our competitors’ – it forced us to cap our TweetReach Pro plans more strictly than we wanted.
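As a rough illustration, the brute-force calculation boils down to a set union over each contributor’s follower IDs. The function name and data below are hypothetical; in production the follower IDs would come from Twitter’s API:

```python
# Hedged sketch of the brute-force approach: collect each contributor's
# follower IDs and count the unique union across all contributors.
def brute_force_reach(follower_sets):
    """follower_sets: iterable of sets of follower IDs, one per contributor."""
    reached = set()
    for followers in follower_sets:
        reached |= followers  # union: each account counted only once
    return len(reached)

# Two contributors with one follower in common:
alice_followers = {101, 102, 103}
bob_followers = {103, 104}
print(brute_force_reach([alice_followers, bob_followers]))  # 4, not 5
```

The deduplication is what makes this accurate – and also what makes it expensive, since it requires pulling every contributor’s full follower list.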
In addition to these increasingly frustrating drawbacks, Twitter has announced a major set of technical changes to their API. Included in those changes are additional restrictions on the API calls we make to determine the raw data we use in our reach calculation. So instead of working around those API limits and continuing with our brute force approach, we decided it was time to get smarter.
Investigating the data
At TweetReach, one thing we have is data – lots and lots of data. This means that we have an extraordinarily large archive of information about how campaigns work on Twitter, which goes back years and is unique to us. From these data and our experience, we know that the reach of a Twitter campaign is essentially a function of the number of unique contributors (users tweeting), how large their follower bases are, and the overall number of tweets. The question is: What are the mathematical parameters of that function?
We started our investigation by looking at what we call the “potential reach” of any conversation on Twitter. This is the maximum possible reach of a conversation, assuming that no two people who tweet about a topic share any followers. While it provides an upper bound on reach, it’s obviously flawed; that assumption simply doesn’t hold in practice. It is, however, a good starting point, so we put it in a scatter plot to see whether there was a relationship between potential reach and actual reach:
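To make the flaw concrete, here’s a toy comparison (with made-up follower sets) between potential reach, which simply sums follower counts, and actual reach, which counts unique accounts:

```python
# Illustrative contrast between potential reach (sum of follower counts,
# assuming no shared followers) and actual reach (unique accounts reached).
# The follower sets below are hypothetical.
follower_sets = [
    {1, 2, 3, 4},  # contributor A
    {3, 4, 5},     # contributor B
    {1, 5, 6},     # contributor C
]

potential_reach = sum(len(s) for s in follower_sets)   # 10: double-counts overlap
actual_reach = len(set().union(*follower_sets))        # 6 unique accounts

print(potential_reach, actual_reach)
```

The gap between the two numbers is exactly the audience overlap that potential reach ignores.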
The way this graph turns upward at the end shows us there’s not a clear linear relationship in this data, but there might be if we plotted this on a log-log graph.
There is a nice positive linear correlation after all. However, there are also some pretty absurd numbers. In fact, some of those “up and to the right” data points in the first graph show a potential reach above 2 billion (nearly 30% of the world’s population and more than 8x Twitter’s 250 million monthly active users). As it turns out, this is what many in our industry call “reach”. But we knew we could do better.
Armed with the notion that potential reach had some value, we set out to combine that with other data to build an algorithm that could predict reach. We experimented with many different approaches that we applied to tens of thousands of data points derived from real Twitter campaigns. And after many iterations, we’ve developed an extremely robust model that explains 99.51% of the variance in reach on a Twitter campaign.
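Our actual model is proprietary, but the general shape of this kind of approach can be sketched as a least-squares fit in log space. Everything below – the coefficients, the noise level, and the synthetic data – is illustrative only, not our real model or data:

```python
import numpy as np

# Sketch of a log-linear reach model fit on synthetic campaign data.
rng = np.random.default_rng(42)
n = 1000
contributors = rng.integers(5, 5000, n)              # unique users tweeting
tweets = contributors * rng.integers(1, 4, n)        # total tweet volume
potential_reach = contributors * rng.integers(100, 5000, n)

# Synthetic "true" reach: a noisy power law of the inputs (illustrative).
log_reach = (0.9 * np.log(potential_reach)
             + 0.05 * np.log(contributors)
             + 0.02 * np.log(tweets)
             + rng.normal(0, 0.05, n))

# Least-squares fit: log(reach) ~ a + b*log(potential) + c*log(contributors)
#                                   + d*log(tweets)
X = np.column_stack([np.ones(n), np.log(potential_reach),
                     np.log(contributors), np.log(tweets)])
coeffs, *_ = np.linalg.lstsq(X, log_reach, rcond=None)

# R-squared: fraction of variance in log-reach the model explains.
residuals = log_reach - X @ coeffs
r_squared = 1 - residuals.var() / log_reach.var()
print(f"R^2 = {r_squared:.4f}")
```

Fitting in log space is what makes the relationship linear, as the log-log scatter plot above suggested.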
Below is another scatter plot (with a trendline) that shows our reach prediction model applied to a test data set.
The data show a nearly 1:1 positive linear correlation, with no wild outliers. This means we can predict reach accurately, with a high degree of confidence, without having to resort to brute-force methods.
What does this mean for our customers?
For the vast majority of our customers there will be very little noticeable impact on reach. Most of you won’t see any change at all, but a few of you will see some small changes. We will not be altering our reach calculations for historical periods, so some of you may notice your future reach increase or decrease slightly when compared to historical levels. And since no model is absolutely perfect, a small set of customers may see somewhat larger increases in reach for certain campaigns. If you have any questions at all about a change in your reach, don’t hesitate to contact our support team and we’ll be happy to take a look!
But best of all, these changes bring some significant benefits to our TweetReach Pro subscribers. The first benefit is that viewing ad-hoc periods within a TweetReach Tracker will now be much faster than before. The second – and much more exciting – benefit is that we’re now able to increase our data limits for TweetReach Pro plans.
We’ll be rolling these changes out next week and we’ll be communicating with you along the way. We’re extremely excited to share the results of this work with you – our customers! If you have any questions, please let us know.