Monday, January 10, 2011

NoSQL and the Cloud: Enabling Modern Marketing

A conversation is a dialogue, not a monologue. -- Truman Capote

In the earliest days of the World Wide Web, if you could just present images and text of your products and company then you were mission-accomplished for Marketing on the web. User interest made this modest offering compelling -- enough that every company had to put up a website, and an initial level of user expectations was set.

In the second era of web marketing your web presence began to interact with your users. Interested users could submit information to you on the web, get specific information back, and complete simple transactions, and each of these actions began to build a custom web conversation.

Every user action on the web leaves electronic footprints, and the ability to read these footprints analytically is what now enables a third era of web marketing. In the first era the website really didn't care who you were, and in the second era the site managed an increasingly-rich cookie to keep its history of you. In the third era that cookie is reverting to a unique user:session identifier as the back-end now manages a rich store of user information. As the web evolves from Informational to Social, marketing has already passed from caring about who you are to seeking what you want, and our views on what you want are increasingly determined from who you know and what your influences are.
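The shift from a rich cookie to a bare identifier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `profiles` dictionary stands in for whatever back-end store a real site would use, and the function names (`new_session`, `record_page_view`, `record_search`) are hypothetical, not part of any particular product.

```python
import uuid

# Hypothetical in-memory stand-in for the back-end user store.
# In a real deployment this would be a database, not a dict.
profiles = {}


def new_session():
    """Create an opaque session id; this is all the cookie would carry."""
    session_id = uuid.uuid4().hex
    profiles[session_id] = {"pages_viewed": [], "searches": []}
    return session_id


def record_page_view(session_id, url):
    """Append a page view to the profile kept on the back-end."""
    profiles[session_id]["pages_viewed"].append(url)


def record_search(session_id, query):
    """Append a search query to the profile kept on the back-end."""
    profiles[session_id]["searches"].append(query)


# The browser only ever sees cookie_value; the history lives server-side.
cookie_value = new_session()
record_page_view(cookie_value, "/products/widget")
record_search(cookie_value, "blue widget")
```

The cookie stays small and carries no history of its own, while the profile behind it can grow as rich as the analytics require.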

The new marketing era takes advantage of new techniques to capture and analyze data, and offers a new toolset to divine business information from this rich data collection. Let's take a look at the What and the Where of modern web marketing.

The What - Web Data Tools and Data Stores

Every user interaction with a web site -- every link clicked, every search entered, and every transaction started -- leaves a detailed record, and this data can be mined to produce better customer understanding and more-focused marketing and sales efforts. To support a conversation with your customers your web tools need to support a rich set of services:

  • Componentization and templates for site content
  • Workflows that support rapid modification and publication of content
  • An API for programmatic access and integration
  • Intelligent meta-data associated with each content item

Componentization and templates are critical for product management in web marketing, as you will want to track user and customer conversations at a specific content or SKU level. Workflows and APIs are essential to making your web presence a living document and escaping the static-page past. A rich, powerful data store is essential to supporting content and ongoing customer conversations, and the data model for web marketing is one of the areas that has undergone the greatest transformation in arriving at the current web marketing era.

The Relational data model -- a model originally designed for 50's-era accounting data and long the backbone of Enterprise computing -- in many cases no longer fits the needs of modern marketing. For rich, schema-less customer interactions and analytics a rigorous relational model just doesn't make sense anymore. To better fit the new generation of rich, unstructured user data, a new family of data store approaches has become prominent, characterized by the explanation that they are "Not Only SQL" -- the "NoSQL" family of data stores.

NoSQL solutions have risen to prominence in many retail and social web companies because the rigor and restrictions of a purely Relational model simply don't match their data processing needs. We can see this by looking at the data processing challenges that a company like Facebook faces:

  • 570 billion page views per month
  • 25 billion pieces of content, served by more than 30,000 servers
  • More photos than all other photo sites combined -- more than 3 billion photos uploaded every month, and 1.2 million photos served per second

Facebook is clearly not a 50's-era accounting department, and its data processing needs are vastly different from the ACID rows and columns that characterized the Relational data era. Facebook has adopted a rich set of tools to meet these data challenges:

  • Memcached. Facebook runs thousands of Memcached servers with tens of TB of cached data at any point in time
  • Cassandra (now replaced by HBase). Distributed storage with no single point of failure
  • Hadoop and Hive. Used for massive data analysis and marketing analytics

These data architectures are key to Facebook's growth and scalability and they underlie the growth of other companies like Yahoo, Foursquare and Twitter as well. These companies may represent the frontier of "big data" tools, but the core technologies that underlie their growth are generally available and are finding wide experimentation and adoption in the broader business community.

NoSQL approaches make sense for problem domains outside the traditional relational world, and can give vastly better performance for certain families of uses:

  • Frequently-written, rarely-read data (like web hit counters, or data from logging devices or space-probes) works well in key-value stores like Redis or Voldemort, or document-oriented databases like MongoDB
  • Frequently-read, seldom written or updated data (such as the Facebook statistics above) benefit from several NoSQL data approaches: Memcached for transient data caching, Cassandra or HBase for searching, and Hadoop and Hive for data analysis
  • High-availability applications which demand minimal downtime do well with clustered, redundant data stores like Riak or Cassandra
  • Data that will be sync'd across multiple locations can benefit from the replication features of a database like CouchDB, MongoDB or Tokyo Cabinet
  • Transient data (like web sessions and caches) do well in transient key-value data stores like Voldemort or Memcached
  • Big data arising from business or web analytics that may not follow any apparent schema but which will still require rich (possibly parallel) querying will do well in the family of access tools like Hadoop
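Two of the patterns above, the write-heavy hit counter and the transient web session, can be illustrated with a toy key-value store. This is a minimal sketch in plain Python, not the API of Redis, Voldemort, Memcached or any other product named here; the class `TinyKeyValueStore`, its methods, and the explicit `now` parameter (used so that expiry can be shown without waiting) are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class TinyKeyValueStore:
    """Toy in-memory store: counters plus values that expire after a TTL."""

    def __init__(self):
        self._counters = defaultdict(int)
        self._values = {}  # key -> (value, expires_at)

    def incr(self, key, amount=1):
        """Write-heavy pattern: bump a counter and return its new value."""
        self._counters[key] += amount
        return self._counters[key]

    def set(self, key, value, ttl_seconds, now=None):
        """Transient pattern: store a value that expires after ttl_seconds."""
        now = time.time() if now is None else now
        self._values[key] = (value, now + ttl_seconds)

    def get(self, key, now=None):
        """Return the value, or None if it is missing or has expired."""
        now = time.time() if now is None else now
        entry = self._values.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._values[key]  # drop the expired entry on read
            return None
        return value


store = TinyKeyValueStore()

# A hit counter is written on every page view and read only occasionally.
for _ in range(3):
    store.incr("hits:/home")

# A web session is kept for 60 seconds from an assumed clock value of 1000.0.
store.set("session:abc", {"cart": ["sku-1"]}, ttl_seconds=60, now=1000.0)
```

Neither operation needs a schema, a join or a multi-row transaction, which is a large part of why simple key-value stores suit these workloads.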

NoSQL tools are particularly well-adapted to the latest generation of content management systems. A good CMS is about much more than just "individualization" and serving up general web content. A state-of-the-art CMS today is focused on providing better answers to two basic questions:

  • What are consumers looking for? and matching that information with
  • What do you know about them?

New data technologies enable the best modern CMS's to capture and map consumer intent to deliver relevant online ads across multiple channels. New CMS applications and data technologies enable massive scale -- and dramatically improve customer touch and conversation.

The Where - Web Marketing in "The Cloud"

The cloud -- virtualized web hosting available in increments -- is a great fit for web marketing because it originally arose out of the needs of web retail. Amazon's retail data processing has long been focused on responsiveness in the extreme: "Black Friday" (_Note: the day following the US Thanksgiving holiday, a retail crush that is traditionally the start of holiday shopping_). The processing power to meet the peak demands of "Black Friday," coupled with virtualization technology from offerings such as Xen, enabled Amazon to bundle up processing power in virtualized chunks. Thus was born the Amazon Web Services offering, which along with similar offerings from competitors has created the market known as "cloud computing."

With cloud computing, a web browser is all that is required to create online servers and server farms, storage, queuing, monitoring, database, backup and recovery and electronic commerce: in short, an entire virtual data center, available by-the-hour. With the cloud it's possible to spin up market tests and rich data analysis for modest hourly fees with no capital expenses.

So what can the cloud provide? The cloud provides the best imaginable platform for product trials, focus groups and general customer analytics and "predictive marketing." Cloud systems:

  • Offer low latency and fast (sub-50ms) response times
  • Can map/identify visitors across touchpoints
  • Can support specific campaigns or marketing events with no capital costs
  • Are highly available and scalable to support massively parallel data processing
  • Offer secure access to first-party data and integration with third-party data stores
  • Support new families of analytic tools that can be applied to customer "big data"

The magic does not lie in cloud-parts: practically all the components used for cloud apps run and offer the same benefits in a cloudless world. Still, the cloud offers unique opportunities: Suppose you have a customer-data analysis application ready for pilot, but with all your customer data it'll take 100 servers to run. Here the cloud can come to your rescue -- spin up the app on AWS on 100 servers over a long weekend, and your life-changing pilot will come in at under $1000. Try comparing that to the cost of renting data center space for 100 servers, buying or leasing all the computer and network hardware, setting it all up, running the test, and then tearing it all down again in a long weekend.
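The arithmetic behind the under-$1000 figure is easy to check. The sketch below assumes a three-day weekend and an hourly rate of $0.10 per server; that rate is an illustrative assumption, not a quoted AWS price, and actual charges depend on instance type, storage and data transfer.

```python
def pilot_cost(servers, hours, hourly_rate_usd):
    """Total compute cost when every server is billed by the hour."""
    return servers * hours * hourly_rate_usd


SERVERS = 100
HOURS = 3 * 24                  # a three-day long weekend
ASSUMED_HOURLY_RATE_USD = 0.10  # assumption for illustration only

cost = pilot_cost(SERVERS, HOURS, ASSUMED_HOURLY_RATE_USD)
print(f"{SERVERS} servers x {HOURS} hours = ${cost:,.2f}")
```

At the assumed rate, 7,200 server-hours come to about $720, which leaves some headroom under $1000 for storage and bandwidth.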

What makes the cloud magical is its flexibility -- and when combined with web standards and tools and modern development practices, it's possible to "solve" the IT equation -- specific rich solutions generated quickly and reliably, and built from standard parts that are familiar to any IT shop.

With the right plans and tools it's possible to engage in an ongoing conversation with your customers. Your website and systems track their interests, and data from their web interactions and transactions can mesh seamlessly with the data you already keep in your existing systems. With the right tools and a strong implementation, the web becomes your interpersonal channel to your customers.