Graph Databases and Star Wars
Source: https://www.youtube.com/watch?v=_Tyg68GqZhM
Often when we speak of the social graph, big data and new applications, we present them as steps to an epiphany: "Wouldn't it be great if you could do THIS!?" This is a great approach in good times, and even in a tough economy it is a fine message for visionaries, as it appeals to one of the two core emotions that generally underlie crossing the chasm and signing on to a deal. It appeals, at base, to greed.
That's a great way to get a deal done, but it sure isn't the only way. In difficult times, many executives are driven not by greed but by fear. Even in the strongest businesses, executives are staring down veritable Sarlacc Pits of worry, and the winning message is often NOT about fulfilling their aspirations at the top of Maslow's pyramid, but about calming their fears at its base.
Google is a great example of "the strongest of businesses." Google has been the most compelling business on the Internet, but today even Google has some real problems. Google's challenges have drawn comment even from observers such as Paul Kedrosky, and Google has one very big, very current problem: Google is weak in "local" and "social" search. In a world of content farms and with the rise of successful walled gardens, Google's PageRank model is finally exposed as context-limited and not semantic. For local and social search Google created Percolator -- something more real-time than PageRank/MapReduce. That's a step forward, and Google still might show you the most popular content, but with the rise of content farms such as Demand Media and Answers.com, it may no longer be showing the best content.
In Google's world links cost money, and Google probably never imagined that it would be economically viable to spam PageRank. The key point here isn't about Google; it's the recognition that in 2011 it's possible to spam anything! No marketing executive is immune from the question: "How do we keep from getting spammed out of the marketplace?" The answer is to use social linkages to de-spammify their messages and their marketing. "Social" might be a great visionary message, but we're pitching fear here: if you don't go social, your marketing message may no longer be seen by anyone!
What does "Social" mean, and why does it matter so much now? In PageRank, loosely speaking, one link offers the same level of validation as any other link. This is fine in an asocial world, but in our social world we all know that some links count WAY more than others. A billion Google users can't be wrong, but I don't value their opinions nearly as much as those of co-workers, friends and family -- the people close to me. But how can Google tell who is close to me? From the marketing standpoint, wouldn't it be great to know how close anyone is to anyone else? But how can you know that?
To understand this problem, let's play a game. It's a once-popular parlor game called "Six Degrees of Kevin Bacon," and you can play along too here: Six Degrees of Kevin Bacon. The original code for the game can be found at: Pragmatic Programmers - Everyday JRuby, with modest updates for Ruby and Amazon hosting by me. The game is one that actor Kevin Bacon brought on himself:
In a February 1994 Premiere magazine interview about the film The River Wild, Kevin Bacon commented that he had worked with everybody in Hollywood or someone who's worked with them.
Our game tests this premise. You play it by naming any other actor and trying to trace the chain of personal connections that links Kevin Bacon with that actor. Let's try it with my favorite character actor, the late John Cazale (note: Cazale acted in 5 movies during his life; all 5 were nominated for Best Picture, and 3 won. In fact, Cazale appeared in one further picture after his death, and THAT was nominated for Best Picture, too!).
So let's let the computer play:
Not bad! John Cazale has a "Kevin Bacon #" of 2, as he is not Kevin Bacon himself (#0), but he's acted with Bruno Kirby, who has a Bacon # of 1. The game is a fun one, and on playing it you discover that Kevin Bacon is basically correct -- practically the whole movie industry has a Bacon # of 3 or less. The game offers other wrinkles as well: we can look up Athletes as Actors here:

We can also make up our own categories, such as "Famous 4-letter celebrities who also acted in films" such as the Cher-Bono linkage below:
The point here is that tracing social linkages is actually a pretty hard problem, as (in this case) each Movie casts many Actors, and each Actor appears in many Movies. This is not unsolvable, but it's a pretty challenging problem if all of our data is kept in a relational database, where every extra hop in the chain means another join. If instead our data is stored in a graph database, the problem is a lot easier and possibly a lot faster to solve as well. A graph database, such as the Neo4j database used here, can reasonably be expected to ship with shortest-path traversal (Dijkstra's algorithm, for example) built in, and the command to solve the Kevin Bacon problem can be a one-line command as easy as:
database.shortest_path 'Cher', 'Bono'
This is the command I executed to produce the output shown above. I won't go into the programming or cloud setup in this posting, but if you're curious you can download and review the code at Six Degrees of Graph Databases. To run your own demo you'll need JRuby (as shortest_path draws on core Java libraries) and the Neo4j graph database, but the setup is pretty straightforward from there (and I'll write on it if anybody's interested).
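For readers who want to see the idea without installing anything, here is a minimal sketch in plain Ruby of what a shortest-path lookup over a Movie/Actor graph involves. It is not the Neo4j API and not the code behind the demo: the `CASTS` sample data, `build_graph`, `shortest_path` and `bacon_number` are all names made up for this illustration, and it uses a simple breadth-first search rather than Dijkstra's algorithm, which gives the same answer when every link counts equally.

```ruby
# Hand-picked sample data (an assumption for illustration only,
# not the dataset used by the demo): movie title => some cast members.
CASTS = {
  'The Godfather Part II' => ['Al Pacino', 'John Cazale', 'Bruno Kirby'],
  'Dog Day Afternoon'     => ['Al Pacino', 'John Cazale'],
  'Sleepers'              => ['Bruno Kirby', 'Kevin Bacon'],
  'Apollo 13'             => ['Kevin Bacon', 'Tom Hanks']
}.freeze

# Build an undirected adjacency list linking each actor to each movie
# they appear in. Assumes no actor shares a name with a movie title.
def build_graph(casts)
  graph = Hash.new { |hash, key| hash[key] = [] }
  casts.each do |movie, actors|
    actors.each do |actor|
      graph[actor] << movie
      graph[movie] << actor
    end
  end
  graph
end

# Breadth-first search. Returns the alternating actor/movie path from
# `from` to `to`, or nil if either name is unknown or no path exists.
def shortest_path(graph, from, to)
  return nil unless graph.key?(from) && graph.key?(to)

  previous = { from => nil } # node => the node we reached it from
  queue = [from]
  until queue.empty?
    node = queue.shift
    break if node == to

    graph[node].each do |neighbor|
      next if previous.key?(neighbor)

      previous[neighbor] = node
      queue << neighbor
    end
  end
  return nil unless previous.key?(to)

  # Walk backwards from the target to rebuild the path.
  path = []
  node = to
  while node
    path.unshift(node)
    node = previous[node]
  end
  path
end

# Every actor-to-actor hop passes through one movie, so the number of
# hops is half the number of edges on the path.
def bacon_number(graph, actor, center = 'Kevin Bacon')
  path = shortest_path(graph, actor, center)
  path && (path.length - 1) / 2
end

GRAPH = build_graph(CASTS)
puts shortest_path(GRAPH, 'John Cazale', 'Kevin Bacon').join(' -> ')
puts bacon_number(GRAPH, 'John Cazale')
```

On this tiny sample the path runs from John Cazale through The Godfather Part II to Bruno Kirby, and through Sleepers to Kevin Bacon, for a Bacon # of 2. A graph database does the same kind of traversal for you, over millions of nodes, without loading everything into memory first.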
There are a number of key lessons here, and there's a reason I started this posting with a picture of the Cantina at Mos Eisley from Star Wars.
- Anything can be spammed -- if you want a close relationship with your customers, you're going to have to get social with them.
- Social relationships aren't easy to formalize -- a lot of the data is fuzzy, and most of the relationships are many-to-many.
- Graph databases like Neo4j are a great match for capturing social information, relatively easy to program, and fit well with a dynamic-language "cloud" world.
Social graphing is the kind of problem that offers great solutions if you JUST HAVE THE RIGHT TOOLS!
Finally, about Mos Eisley above. My personal social graph of people I worked with at Oracle probably has nearly a thousand people with a "Repko #" of 1, and untold tens of thousands with numbers of 3 or less. Two key points on that:
- I worked at other tech companies (Apple, HP, Microsoft) at various points in the past, and have similar communities at each of them
- I'm not unusual at this -- we've all worked in the past with lots of great people!
I've often described Oracle as the "Star Wars Bar Scene" of high tech -- at some point everybody in the universe will wander through there. My badge number there was around 10,000, and I believe that, if they still number badges, the numbers may (with Oracle's acquisitions) top 250,000 now.
We've all worked with lots of great people....