Technology Case Studies
Businesses Harnessing the Power of Complex Events Processing
Posted 09-29-2009
A new approach to analyzing large volumes of data, called stream processing or complex-events processing (CEP), is changing how companies keep up with the world and react to it. The technology’s aim is to sift through large, fast-changing streams of data as quickly as possible and identify patterns and correlations that signify meaningful events or opportunities to take profitable action.

In traditional database setups, which form the basis for virtually all enterprise software applications today, data is first collected, then organized in a highly structured way, and then cross-indexed for rapid searching. Only after all that, hours or even days after its creation, is the data finally ready for any kind of analysis. CEP systems, in contrast, are designed to analyze floods of data virtually at the moment each item is generated, with no pre-processing. That calls for highly specialized software and, in the most extreme cases, specially engineered hardware, too.

How fast is fast? Try several hundred thousand messages processed in one second using a standard, single-core microprocessor. And more processors help: IBM recently unveiled a CEP product called System S that, running on a 1,424-core computer, analyzes five million messages per second for customer TD Securities.

How it works

Potential applications for CEP run the gamut, from algorithmic trading in fast-paced financial markets to interpreting torrents of live battlefield data, from managing far-flung supply chains to enabling massive multiplayer Internet games to securing IT systems and networks against intruders. Curt Monash, principal at Monash Research, Acton, Massachusetts, identifies two broad classes of CEP apps. One, as used in financial trading, centers on low-latency analysis of data, identifying significant data and events near-instantly. The other focuses on filtering data to find the most significant records, which may get stored for analysis at a later time.
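As a rough illustration of the low-latency class, the sketch below (written in Python, not taken from any CEP product) keeps a short sliding window of recent price ticks in memory and checks each new tick the moment it arrives, with nothing written to disk or indexed first. The names SpikeDetector and on_tick, the five-second window, the 2 percent threshold, and the sample ticks are all assumptions made up for this example.

```python
from collections import deque


class SpikeDetector:
    """Hypothetical detector: flags a price move of at least `threshold`
    (as a fraction) relative to the oldest tick still inside `window` seconds.

    All state lives in memory, and each tick is examined once, on arrival.
    Comparing against the oldest tick is a simplification; a real engine
    would likely support richer patterns.
    """

    def __init__(self, window=5.0, threshold=0.02):
        self.window = window
        self.threshold = threshold
        self.recent = {}  # symbol -> deque of (timestamp, price)

    def on_tick(self, ts, symbol, price):
        ticks = self.recent.setdefault(symbol, deque())
        # Evict ticks that have aged out of the time window.
        while ticks and ts - ticks[0][0] > self.window:
            ticks.popleft()
        alert = None
        if ticks:
            oldest_price = ticks[0][1]
            change = (price - oldest_price) / oldest_price
            if abs(change) >= self.threshold:
                alert = (symbol, ts, round(change, 4))
        ticks.append((ts, price))
        return alert


# Made-up sample stream of (timestamp in seconds, symbol, price).
stream = [
    (0.0, "ACME", 100.0),
    (1.0, "ACME", 100.5),
    (2.0, "ACME", 103.0),  # up 3 percent within 2 seconds of the first tick
    (9.0, "ACME", 103.1),  # earlier ticks have expired, so nothing to compare
]

detector = SpikeDetector(window=5.0, threshold=0.02)
alerts = [a for a in (detector.on_tick(*tick) for tick in stream) if a]
print(alerts)
```

The second class Monash describes, filtering, would use the same loop but pass the flagged records on to storage for later analysis instead of acting on them immediately.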
In technical terms, CEP achieves its high speed mainly by analyzing incoming data records entirely in main memory (RAM), with no need to call on comparatively slow hard disk drives. Traditional database systems, in contrast, store data on the hard disk, organized as rows and columns, and swap selected chunks in and out of high-speed memory as needed. For more speed, extra processors can be ganged together to work in parallel. And for the most extreme applications, such as analyzing data packets on a network to detect hacker activity, specialized silicon may be required: network processors designed solely for that task.

Speed, in short, is of the essence. In fact, by jacking up the speed of data analysis, CEP brings companies a big step closer to what business theorists have described as the “real-time enterprise.” Ideally, the enterprise should monitor changing business conditions moment by moment and, in response, reorient its internal operations and change its business processes on a proverbial dime. Until recently, business intelligence (BI) and data warehousing techniques, building on traditional database management, have helped managers and executives understand and delve into a company’s past performance. With CEP, BI tools could illuminate current performance and alert executives to situations that need immediate attention and action.

Business intelligence 2.0

Such tools might even fuse data from a variety of independent sources and, in effect, create entirely new information and unanticipated insights. For example, data indicating a particular combination of bad weather and unexpectedly low inventories in a certain geographic region might trigger a call for special logistics to move extra quantities of a much-in-demand product to store shelves in time to support an important retail promotion already underway.

CEP-based BI may have especially big implications for Web-based companies, such as Google, Yahoo, and Amazon.
They are able to collect mind-boggling quantities of information about their visitors’ behavior, down to every mouse-click on every Web page. And these firms try their best to keep up. Last year, Yahoo disclosed that it had assembled a 2-petabyte (roughly 2-million-gigabyte) data warehouse to analyze the activities of the 500 million visitors it serves each month. The Yahoo database processes 24 billion events a day, a huge stream of data. But with CEP technology, these events might be analyzed and acted on almost as they happen, for example by selecting just the right content and advertising for each and every person using the site.

As usually happens with promising new software technologies, a handful of startup companies have been funded by venture capitalists to pursue the CEP opportunity, even as established database companies (including IBM, Oracle, Sybase, Tibco, and Microsoft) move into the arena. Among the startups are Streambase (founded by relational database pioneer Michael Stonebraker), Truviso (originally launched as Amalgamated Insight), and EsperTech. Though nobody expects CEP to rival the traditional database market in terms of size or importance, it is, for now, one of the fastest-growing and most exciting segments of the overall data management field.

More is better

One obvious reason for all the excitement is the near-exponential growth in the amount of data that’s becoming available in seemingly every sector of the economy. Thanks to advances in microelectronics, the cost of physical sensors has plummeted, making it more viable than ever to measure temperatures, pressures, and speeds on seemingly every kind of machine and vehicle. BMW, for instance, builds dozens of such sensors into its cars, along with banks of microprocessors to analyze feedback from them.
One future scenario: A car might warn its driver of a dangerous driving condition, such as hydroplaning on a rainy highway, or it might alert the car’s dealer to schedule a service visit to check on what appears to be a faulty part.

What’s more, radio-frequency identification (RFID) tags are now low enough in cost to help track the movement of goods through complex supply chains, all the way from a factory in China to a distribution center in Illinois to a specific shelf in a Wal-Mart store. But with potentially millions of tagged pallets and boxes in the world, each one registering its location as it passes through a doorway or leaves a truck, traditional IT systems may easily be overwhelmed with data. That has made the RFID industry particularly interested in CEP, and companies such as SAP and Oracle are paying close attention to the opportunity.

Another highly interested party, albeit mum on such matters, is the intelligence establishment. The National Security Agency could use some form of CEP to filter the mountains of data it gleans from phone taps and eavesdropping on global flows of email. The CIA could use the technology to find tell-tale correlations between items in the floods of data it collects from field agents, newswires, and spy satellites.

Trucking companies, as analyst Monash points out, are interested in the technology for monitoring the movement of their vehicles, each one equipped with GPS and a radio link back to HQ. “It could provide an early-warning system, discovering if a truck were off-course, lost, or involved in an accident.”

As with traditional database management, making pattern-recognition engines reasonably easy to set up and easy to modify over time is one of the major technical problems that CEP engineers have had to solve. It’s one thing to build a program that can search for a specified set of patterns at high speed.
It’s quite another to produce a generalized solution whose search patterns can be altered on the fly without losing any speed. These are difficult technical challenges that have intrigued academic researchers for many years, and it is from their projects that most of today’s CEP companies originate: Telegraph at the University of California, Berkeley, for example, and the Aurora Project at Brown University.

Still, as technical problems get solved, CEP will likely weave its way into everyday life, perhaps even finding patterns in people’s daily activities and travels, second by second. After all, what is a cell phone but a potential sensor and source of rich data just waiting to be correlated with billions of others?







