When you choose a social analytics provider, you have the right to ask questions about the service you’re getting, even when you’re using free services. With that in mind, we wrote up a list of questions to ask when you’re checking out a new social analytics product. Got any we missed? Share ‘em in the comments below, or drop us a line.
1. Where does your data come from?
Not all social data sources are created equal; what you can get from a tool built on a platform’s open API is vastly different from what you can get with access to that company’s full firehose of data. So what do you need? If you’re looking for a quick overview of recent data, then something built on an open API will work for you. If you want something more in-depth, you should consider a provider who works with a licensed data partner like Gnip or DataSift. These data resellers provide commercial, licensed access to the full data streams from platforms like Twitter, Tumblr, Foursquare and others, giving you the highest quality data possible.
Keep in mind that services built on a licensed data stream are also more reliable than something built on a free API: you don’t have to worry about hitting rate limits or missing important data. Again, if you’re just looking for enough recent information to keep track of general trends or overviews, then you don’t need to pay for extensive, real-time access to full-fidelity data, but remember the difference if your needs change.
To illustrate: if you want an idea of how many people are talking about a documentary the day after it aired and what they’re talking about, then something built on a free API would be fine. If you made the documentary and want an extensive review of the conversation before, during and after it aired, plus a deeper dive into the different facets of that conversation, you want something built on a stable, more comprehensive data source.
2. What is the firehose and do you have access to it?
A firehose is full access to all the data from a platform: that’s everything. In the case of Twitter, very few analytics providers have direct access to the full Twitter firehose, mostly because it’s unnecessary, but also because it’s quite costly. Gnip and DataSift have full firehose access, as do a rare few others. If your analytics provider says they use the Twitter firehose, they probably do not. Ask them to clarify what they mean by that; the word “firehose” is misused a lot.
Instead, most serious analytics providers will have access to a full-coverage stream of data built on the firehose. This is a full-fidelity stream of tweets that matches their needs, based on a set of search queries or other filters. The result is a smaller stream of only the data they need – including all tweets that match their filters – without all the unrelated or irrelevant data.
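To make the idea of a filtered, full-coverage stream concrete, here is a minimal sketch in Python. It is an illustration only: the `filter_stream` function and the tiny `firehose` list are hypothetical stand-ins we made up for this example, not the API of Twitter, Gnip or DataSift, and real filtering rules are far richer than a simple phrase match.

```python
from typing import Iterable, Iterator


def filter_stream(posts: Iterable[str], keywords: Iterable[str]) -> Iterator[str]:
    """Yield every post that contains at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    for post in posts:
        text = post.lower()
        if any(k in text for k in lowered):
            yield post


# Hypothetical stand-in for a firehose: every public post, relevant or not.
firehose = [
    "Trying overnight oatmeal in a jar tonight",
    "Traffic is terrible this morning",
    "Overnight oatmeal in a crockpot, anyone?",
    "New documentary airs tonight",
]

# The filtered stream keeps all matching posts and drops everything else.
matched = list(filter_stream(firehose, ["oatmeal in a jar"]))
print(matched)
```

The point of the sketch is that nothing matching the filter is dropped: the stream is smaller than the firehose, but it is complete for the terms you care about.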
This is a case of “you get what you pay for”; Twitter doesn’t have the infrastructure or impetus to give you access to all of their data for free, so through agreements with companies like Gnip and DataSift, a third party can gain full access to the social data they need. But this kind of data isn’t free, so be sure to choose the option that meets your needs. And if you’re using a free tool, chances are good that tool is not built on the firehose in any way.
3. What kind of data coverage do you have? Is it a sample, or the full census?
We can use Twitter as an example again here, since they have several different forms of data access. Twitter’s Search API, for example, is an index of recent tweets from a window of the past few days and does not include all tweets. Say you wanted an overview of what people have been saying about “overnight oatmeal in a jar” on Twitter for the past month; the Search API wouldn’t cover your needs. You can read a more technical explanation from Twitter about the Search API here.
Other data streams are intentional portions of the full firehose, which are useful for sampling and other use cases. Twitter has a decahose option, for example, that includes a random sample of 10% of all tweets. It’s great for research, but not ideal if your needs require full-fidelity coverage.
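Here is a rough sketch, in Python, of why a 10% sample works well for big-picture research but less well for small, specific conversations. Everything in it is an assumption for illustration: the data is synthetic, and `sample_stream` and `estimate_total` are hypothetical helpers, not how Twitter actually builds its sampled streams.

```python
import random


def sample_stream(posts, rate, seed=None):
    """Keep each post independently with probability `rate` (a random sample)."""
    rng = random.Random(seed)
    return [p for p in posts if rng.random() < rate]


def estimate_total(sample_count, rate):
    """Scale a count seen in the sample up to an estimate for the full stream."""
    return sample_count / rate


# Synthetic full stream: 100,000 posts, of which exactly 50 mention our topic.
TOPIC = "oatmeal in a jar"
full_stream = ["something unrelated"] * 99_950 + [f"love my {TOPIC}"] * 50

sample = sample_stream(full_stream, rate=0.10, seed=42)

true_count = sum(TOPIC in p for p in full_stream)
sampled_count = sum(TOPIC in p for p in sample)
estimated_count = estimate_total(sampled_count, rate=0.10)

print("posts in full stream:", len(full_stream))
print("posts in 10% sample: ", len(sample))
print("true topic count:    ", true_count)
print("estimated from sample:", estimated_count)
```

With only 50 matching posts in the full stream, the sample will contain just a handful of them, so the scaled-up estimate can land noticeably above or below the true count, and the individual posts that were not sampled are simply missing. That gap is what full-fidelity coverage removes.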
The only full-coverage options are through a data provider like Gnip, or from a partnership with the platform itself. This could be through the full firehose (which only a couple of companies actually have), or through a full-coverage, keyword-based data stream. Ask your analytics provider if you’ll have full-coverage access to your tweets, or if they use just a sample.
4. Does the data comply with the platform’s terms of service (ToS)?
The great unread novel of our time is the complete terms of service to just about anything. You’ll want to do your homework with your data provider, however, and be sure that their product does indeed comply with the ToS of your platform of choice. An easy way to do this is to check whether the provider is a partner of the platform, or an approved or preferred provider. You can also check with data resellers like Gnip. Make sure the partnership is stated on the platform’s website, and isn’t just a wild, false claim on the data provider’s. If both sites say they work together, it’s a safe bet the provider is following the ToS, or the platform wouldn’t have partnered with them or given them a title of approval. If it seems unclear, don’t be afraid to ask for clarification. If they’re not willing to talk about it, go elsewhere so you won’t run the risk of your provider being shut down and disappearing with all of your analytics.
5. Are the metrics actual counts or just estimates?
Finally, even if your provider has access to high quality data, you want to be sure they’ve built a tool that gives you the best possible measure of the specific data you’re looking for. (You want the best results around “overnight oatmeal in a jar”, not “overnight oatmeal in a crockpot”, after all.) If you test several different providers and get wildly different results, compare those results with how these companies are telling you they generate their results. If they don’t have documentation that tells you how their tool works, or that documentation is vague and confusing, that’s a bad sign.
If you compare results from two companies that are both built on the Twitter Search API and notice that one is returning wild estimates while the other is giving you the most accurate count it can, definitely go for the latter. Don’t go for the tool that returns estimates just because the numbers are bigger. You don’t want your marketing plan or quarterly report to be based on imaginary numbers.
Bonus: 6. What data access does Union Metrics have?
We are a certified Plugged In To Gnip partner, which means we have commercially licensed, full-coverage access to Twitter and Tumblr data. That’s reliable, reputable data you can count on, both now and in the future. Here’s the breakdown.
- Our TweetReach Pro Trackers have Gnip PowerTrack access – that’s full coverage of all public tweets in real time for any search terms you enter. That means no missed tweets and no sampling.
- Our TweetReach snapshot reports use the Twitter Search API, so they’re great for quick estimates of recent activity, but are limited to about 1,500 tweets from the past week.
- Our TweetReach premium historical analytics use Gnip’s Historical PowerTrack. That gives us full access to any public tweet in Twitter’s history, dating back to the very first tweet posted in March 2006.
- Finally, with Union Metrics for Tumblr, we consume the full Tumblr firehose. That means we process 100% of all public posts, notes and other Tumblr activities.
Have any questions about our data access? Please just ask!