THE “SMALL” WORLD OF “BIG” DATA

By Sumant Mandal

I just returned from DLD (Digital-Life-Design) Munich, a conference that brings together social leaders, scientists, entrepreneurs, investors, artists and creatives from around the world. The theme of this year’s conference was “big data”.

Interestingly, most of the speakers not only came from the US but actually live in California (so much for global themes!). Some of the highlights of the great line-up were Yahoo’s new COO Henrique De Castro, PayPal alumni Peter Thiel and Max Levchin, “big data” experts DJ Patil and Padmasree Warrior, and Werner Vogels from Amazon. For me, some interesting conversations were about trends in global e-commerce, personalization and media companies that face issues in accessing a digital audience.

But most intriguing for me were the informal talks and exploration of data.
The Hive, my new incubator with T.M. Ravi, is anchored on the observation that every foreseeable application that lives on top of data will eventually need to filter vast amounts of data into decipherable and practical information.

The analogy most often used for big data today is “teenage sex”: everyone thinks everyone else is doing it, yet no one really knows how to do it. While open source technology for storing and processing data is available, creating the “stack” is harder than it seems. Once the rules engines and algorithms come up with the right cohorts, matches and recommendations, and the “stack” gets to work, the real value in big data will lie with the companies that know “the right question to ask”. Applications that are built to ask the right questions will execute successfully and win.

With The Hive, we will co-create, incubate and invest in these applications. Our incubation and investment philosophy is anchored around our three I’s of big data – infrastructure, intelligence and invention.

The infrastructure opportunities are pretty self-explanatory. Companies like Cloudera, MapR, and Hortonworks are positioned to bring hardened, enterprise-class open source solutions to customers who want to build a Hadoop stack for their in-house needs. The primary use case is a platform that allows large customers to use the stack to create their applications on top of big data. Analytics companies would fit under this category. Cetas, a company that we incubated and sold to VMware, was an initial success that gave us insight into the opportunity in this market.

We see opportunities to create versions of the stack that are optimized for real-time processing, streaming data and similar workloads. This blog post by T.M. Ravi and Dhruba, key players at The Hive, explores new areas of innovation in this part of the stack quite well.

The second bucket of innovation, intelligence, is where the action is today.
The worlds of online advertising, shopping and gaming are all generating tons of data in the form of log files. The adoption of mobile apps for smartphones is multiplying the amount of data being collected, so there is no alternative but to create intelligence stacks on top of Hadoop-like infrastructures.

At the same time, enterprise applications for security, marketing and CRM, along with other ways of accessing and using customer data, are all being rewritten, renovated or adapted to include signals that come from “big data”. We are very active in exploring, creating and launching companies that create these technologies. For us, this is the “new wine in old bottle” type of innovation. Target buyers, budgets and needs are clearly defined.

Recently, I’ve been thinking about a new bucket of data-driven apps that I am calling “inventive” (for lack of a better word). Part of this thought comes from conversations with exciting entrepreneurs and other investors. It’s also driven by what I am hearing and inferring from panel discussions at DLD and other conferences.

There is a whole host of new sets of data that never existed before: data that is being generated by machines, or by humans who have seamless access to machines. This gives rise to a set of applications that could not have existed before it became possible to capture this data and then create a value proposition on top of it.

For example, the car service company Uber allows drivers in cars to share their availability in real time. Recalling the “intelligence” of our three I’s, the company creates a map overlay with real-time feeds and allows consumer demand to be served through a simple web app. It is a new application that takes a “new” data source and creates efficiency in the world of transportation. A host of other companies like Airbnb fit the “collaborative consumption” area too. I believe there are many other such opportunities where “new wine in new bottle” kinds of innovation can be created.

Here’s a challenge to the followers of The Hive.

Where do you think we should create a company that fits the third bucket?

If the idea is good, we will help create and fund it.