IBM has taken out the corporate checkbook and, instead of using $100m to buy back a chunk of its shares from Wall Street, has decided to allocate the funds to new research and development efforts to help customers chew through big data.
The $100m, an IBM spokeswoman confirmed to El Reg, is incremental funding, not just boasting about money that was already allocated to IBM Research for various big data projects. In fact, the money is being allocated for pure research, such as what was done in prior years to take the open source Apache Hadoop big data chewer and convert it into InfoSphere BigInsights, a project that involved over 200 researchers at IBM and took more than four years. The result was a framework for taking unstructured data of the type generated by regular companies (not just hyperscale Web application providers) and turning it into data the top brass can use to run the business.
The most famous Hadoop-derived project, of course, is the Watson parallel question-answer system that took on humanity to play the Jeopardy! game show in January and won. Watson married the Apache Hadoop data chunking and processing system with Apache UIMA, an information management framework that IBM created back in 2005 and that has subsequently been contributed as open source code to the Apache Software Foundation. UIMA is short for Unstructured Information Management Architecture, and it performed the natural-language processing that parsed text and helped Watson figure out what a Jeopardy! clue was about.
UIMA can organize other unstructured information, such as audio and video streams. Hadoop, of course, was created by Yahoo! software engineers after they analyzed Google's research papers on its MapReduce techniques for sifting through the content on the Internet to index it for its own search engine and to support various other applications.
IBM has not commercialized Watson yet, but it is working on it. The company has teamed up with Nuance, which owns the popular Dragon speech recognition software and which has created a clinical language understanding (CLU) engine, to turn Watson from a game-playing superstar into a medical assistant. The Nuance CLU engine can record and transcribe what doctors say to it as they discuss their patients, and can also cope with structured and unstructured data and put that data into patient records. IBM and Nuance are going to plug this CLU front end into a clone of Watson and start feeding it medical data.
The goal back in February, when the partnership was inked, was to have a commercial product--and it is a fair guess that it will be a service or closed source software--available within 18 to 24 months. Doctors at Columbia University Medical Center and the University of Maryland School of Medicine are working with IBM and Nuance to figure out how to integrate a Doctor Watson appliance into medical rounds in hospitals and patient visits to doctors' offices.
The incremental $100m in R&D money will go into new technology and services research; it is not being pumped into BigInsights, Watson, or even the InfoSphere Streams "System S" streaming server. The latter machine, which made its debut in April 2009, was designed to chew through real-time data feeds (like stock tickers, news services, and such), culling them for information. Those products already have their funding and have either been commercialized or are in the process of being turned into something with a serial number and a price tag.
As you might expect, IBM would not be specific about where it was putting its incremental R&D money for big data research.
IBM has over 200 mathematicians working at its Research division, and over 8,000 consultants dedicated to business analytics and optimization (BAO); the company expects these areas to generate $16bn in annual revenues by 2015. The BAO area will account for about 20 per cent of IBM's revenue growth in the next five years, according to Mark Loughridge, IBM's chief financial officer.
In addition to announcing the funding for big data analytics research, Big Blue's Global Services behemoth said it was eating the company's own dog food. IBM announced twenty new services that use internally developed analytics engines to help customers better cope with their data center infrastructure, doing predictive analytics on servers, storage, data center capacity, and cloudy workloads running in the data center.
It will be interesting to see what these services cost and how well they work. To your typical system administrator, such know-it-all services could be as annoying as Microsoft's "Clippy" Office assistant, and about as valuable. Or valueless, in the case of Clippy. ®