5 updated questions to ask your data analytics provider

When you choose a social analytics provider, you have the right to ask questions about the service you’re getting, even when you’re using free services. With that in mind, we wrote up a list of questions to ask when you’re checking out a new social analytics product. Got any we missed? Let us know on Twitter @UnionMetrics, or drop us a line.

1. WHERE DOES YOUR DATA COME FROM?

Not all social data sources are created equal; what you can get from building on a platform’s open API is vastly different from what you can get if you have access to that company’s full firehose of data. So what do you need? If you’re looking for a quick overview of recent data, then something built on an open API will work for you. If you want something more in-depth, you should consider a provider who works with a licensed data partner like Gnip or DataSift. These data resellers provide commercially licensed access to the full data streams from platforms like Twitter, Tumblr, Facebook and others, giving you the highest quality data possible.

Keep in mind that services built on a licensed data stream are also more reliable than something built on a free API: you don’t have to worry about hitting rate limits or missing important data. Again, if you’re just looking for enough recent information to keep track of general trends or overviews, then you don’t need to pay for extensive, real-time access to full-fidelity data, but remember the difference if your needs change.

To illustrate: if you want an idea of how many people are talking about a documentary the day after it aired and what they’re talking about, then something built on a free API would be fine. If you made the documentary and want an extensive review of the conversation before, during and after your documentary aired and a deeper dive into the different facets of the conversation around it, you want something built on a stable, more comprehensive data source.

2. WHAT IS THE FIREHOSE AND DO YOU HAVE ACCESS TO IT?

A firehose is full access to all the data from a platform: everything. In the case of Twitter, very few analytics providers have direct access to the full Twitter firehose, mostly because it’s unnecessary, but also because it’s quite costly. Twitter acquired Gnip in 2014, and in 2015 cut off firehose service to other providers. Anyone interested in access to Twitter data now has to work directly with Twitter, which we do through our previous relationship with Gnip. If your analytics provider says they use the Twitter firehose, they probably don’t. Ask them to clarify what they mean by that; the word “firehose” is misused a lot.

Instead, most serious analytics providers will have access to a full-coverage stream of data built on the firehose. This is a full-fidelity stream of tweets that matches their needs, based on a set of search queries or other filters. The result is a smaller stream of only the data they need – including all tweets that match their filters – without all the unrelated or irrelevant data.
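
To make that concrete, here’s a minimal sketch in Python of what “a filtered stream built on the firehose” means. It’s an illustration only, not how Twitter or any provider actually implements it: the `filtered_stream` function, the keyword-matching rule and the sample posts are all hypothetical stand-ins.

```python
from typing import Iterable, Iterator, List


def filtered_stream(firehose: Iterable[str], keywords: List[str]) -> Iterator[str]:
    """Yield every post that mentions at least one keyword (case-insensitive).

    Hypothetical helper: real providers filter on far richer rules
    (accounts, hashtags, languages, and so on).
    """
    lowered = [k.lower() for k in keywords]
    for text in firehose:
        body = text.lower()
        if any(k in body for k in lowered):
            yield text


# Made-up posts standing in for "everything" on a platform.
sample_firehose = [
    "Loved the documentary last night",
    "Overnight oatmeal in a jar is my new breakfast",
    "Traffic is terrible today",
    "That DOCUMENTARY changed my mind",
]

# Full coverage of the topic: every matching post, nothing unrelated.
matches = list(filtered_stream(sample_firehose, ["documentary"]))
```

The point is that nothing matching the filter is dropped. The stream is smaller than the firehose, but it’s complete for the topic you care about.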

This is a case of “you get what you pay for.” Twitter doesn’t have the infrastructure or incentive to give you access to all of their data for free, and they clarify that most users don’t need access to the firehose but that “creative use of a combination of other resources and various access levels can satisfy nearly every application use case”. In-depth levels of data aren’t free, however, so be sure to choose the option that meets your needs. If you’re using a free tool, chances are good that tool is not built on the full data stream in any way.

Our snapshot reports use Twitter’s public Search API, which includes up to 1500 tweets from the past few days. More on this in the last section.

3. WHAT KIND OF DATA COVERAGE DO YOU HAVE? IS IT A SAMPLE, OR THE FULL CENSUS?

We can use Twitter as an example again here, since they have several different forms of data access. Twitter’s Search API, for example, is an index of recent tweets from a window of the past few days and does not include all tweets (say you wanted an overview of what people have been saying about “overnight oatmeal in a jar” on Twitter for the past month; this wouldn’t cover your needs). You can read a more technical explanation from Twitter about the Search API here. The bottom line is that it prioritizes relevance to your search over completeness.

Other data streams are intentional portions of the full firehose, which are useful for sampling and other use cases. The Streaming API, for example, allows you to monitor or process tweets in real time. You can read a more in-depth breakdown of APIs here.

The bottom line is: Ask your analytics provider if you’ll have full-coverage access to data for a chosen platform, or if they use just a sample. And remember that a sample may be all you need right now, but full coverage may be necessary in the future.
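
Here’s a small sketch in Python of why the difference matters. It compares an actual count over full-coverage data with an estimate scaled up from a sample. Everything in it is invented for illustration: the made-up posts, the one-in-ten sampling rule and the function names are assumptions, not any platform’s real sampling method.

```python
from typing import List


def count_mentions(posts: List[str], keyword: str) -> int:
    """Count posts that mention the keyword (case-insensitive)."""
    return sum(1 for post in posts if keyword.lower() in post.lower())


def estimate_from_sample(posts: List[str], keyword: str, every_nth: int) -> int:
    """Count mentions in a 1-in-N sample, then scale the result back up."""
    sample = posts[::every_nth]  # hypothetical sampling rule: every Nth post
    return count_mentions(sample, keyword) * every_nth


# 1,000 made-up posts; every 7th one mentions oatmeal.
posts = [
    "overnight oatmeal in a jar" if i % 7 == 0 else "something else"
    for i in range(1000)
]

full_count = count_mentions(posts, "oatmeal")            # 143 (actual count)
estimate = estimate_from_sample(posts, "oatmeal", 10)    # 150 (scaled estimate)
```

The estimate lands close to the real number, but it isn’t the real number. For a general trend that may be fine; for a report that needs exact counts, you want the full-coverage figure.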

4. DOES THE DATA COMPLY WITH THE PLATFORM’S TERMS OF SERVICE (TOS)?

The great unread novel of our time is the complete terms of service to just about anything. You’ll want to do your homework with your data provider, however, and be sure that their product does indeed comply with the ToS of your platform of choice. An easy way to do this is to check whether they are a partner of the platform, or an approved or preferred provider. Most platform websites will list who they work with and your data provider should list the same in turn. If both sites say they work together, it’s a safe bet they’re following the ToS, or the platform wouldn’t have partnered with them or given them a title of approval. If it seems unclear, don’t be afraid to ask for clarification. If they’re not willing to talk about it, go elsewhere so you won’t run the risk of your provider being shut down and disappearing with all of your analytics.

5. ARE THE METRICS ACTUAL COUNTS OR JUST ESTIMATES?

Finally, even if your provider has access to high quality data, you want to be sure they’ve built a product that gives you the best possible measure of the specific data you’re looking for. (You want the best results around “overnight oatmeal in a jar”, not “overnight oatmeal in a crockpot”, after all.) If you test several different providers and get wildly different results, compare those results with how these companies are telling you they generate their results. If they don’t have documentation that tells you how their product works, or that documentation is vague and confusing, that’s a bad sign.

If you compare results from two companies that are both built on the Twitter Search API and notice that one is returning wild estimates while the other is giving you the most accurate count they can, definitely go for the latter. Don’t go for the product that returns estimates just because the numbers are bigger. You don’t want your marketing plan or quarterly report to be based on imaginary numbers.

BONUS: 6. WHAT DATA ACCESS DOES UNION METRICS HAVE?

Ever wondered where we get the social data we build our analytics on at Union Metrics? Here’s a quick rundown on where our data comes from and what that means to you.

  • Twitter: We get all our Twitter data directly from Twitter! We’ve had a relationship with Gnip since 2011, and now that Gnip is owned by Twitter, we have a direct relationship with Twitter. Specifically, our Trackers access raw Twitter data directly through Twitter’s commercial real-time streaming API, which means full-fidelity ongoing coverage of any tweets you’re interested in. Our historical analytics are built on Twitter’s commercial historical stream, which is full access to the entire Twitter archive. And our snapshot reports use Twitter’s public Search API, which includes up to 1500 tweets from the past few days.
  • Tumblr: Our Tumblr analytics are based on the full-fidelity, commercially licensed Tumblr firehose. We’ve been consuming that firehose since the summer of 2012. That means you’ll have full coverage of all posts for any blogs or keyword-based topics you want to monitor on Tumblr. We’re also a Tumblr preferred data partner, which means that Tumblr has certified us as legit.
  • Instagram: At this time, Instagram does not provide a commercial firehose. However, we’ve built a custom robust system that works with the public Instagram API to deliver the closest thing to a full-fidelity streaming experience you’ll find for any Instagram analytics in the market. This means you’ll have full coverage analytics in real-time for the accounts and hashtags you’re tracking.
  • Facebook: We use the Facebook Graph API to access their Insights data. This means you get full coverage for any Facebook pages you have admin credentials to.

If you have any questions at all about our data, where it comes from or how it’s treated, please don’t hesitate to ask! We’re here any time you need us.

Try a free TweetReach snapshot report to analyze any Twitter hashtag, account or keywords.