On the software side of the question, the vendor with by far the largest market share in advanced analytic tools is SAS, which held 35.2% of the market in 2010, according to IDC figures. IBM, with its SPSS unit, held the second-highest share at 16.2%, while Microsoft was third with just 1.9% of the market. The also-ran commercial competitors tend to slam SAS in particular as the highest-cost provider. At least one comparison of statistical packages puts the vendor's package at the top of the price heap at about $6,000 per user for the first year and about half that in subsequent years. That's multiples of the cost of some of the other vendors' software.
Among the score of smaller vendors that each have less than 1% of the market are startups Alpine Data Labs and Revolution Analytics, both of which are using low software prices among their competitive weapons, as they try to grab share of a market for advanced analytic tools that grew 8.7% last year, according to IDC stats.
Alpine Data Labs' starting price is $100,000 per year for a 20-user subscription, but that's for a big-data in-database deployment that's tough to compare to the other two. Revolution says a $25,000-per-year deployment on a low-end server will comfortably support 8 to 10 users. Without sharing any of the prices quoted by others, I asked SAS for its latest entry-level pricing structure and the figures weren't as pricey as the competitors suggest. More on that below.
Founded last year, Alpine was incubated and spun out of Greenplum, the massively parallel processing (MPP) database vendor acquired by EMC last year. EMC is now among a handful of venture capital investors in the company, which entered the U.S. market in May.
The company's product is Alpine Miner, and given the company's MPP heritage, it's no surprise that the emphasis is on in-database processing. As the name suggests, this approach handles iterative modeling and scoring steps inside the database, taking advantage of MPP processing power and avoiding cumbersome and time-consuming movement of large data sets from the database off to a separate analytic server for analysis, and then copying results back to the database.
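To make the in-database idea concrete, here is a minimal sketch of scoring rows where they live rather than exporting them to an analytic server. It is not Alpine Miner's implementation: SQLite stands in for an MPP database, and the `customers` table, its columns, and the regression coefficients are all hypothetical.

```python
import sqlite3

# SQLite (in memory) stands in for the MPP database; an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, tenure REAL, spend REAL, score REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, tenure, spend) VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (2, 2.0, 1.0), (3, 10.0, 5.0)],
)

# Hypothetical coefficients of an already-fitted linear regression model.
INTERCEPT, B_TENURE, B_SPEND = 1.0, 0.5, 2.0

# Scoring happens in one SQL statement inside the database:
# no rows are pulled out, scored elsewhere, and copied back.
conn.execute(
    "UPDATE customers SET score = ? + ? * tenure + ? * spend",
    (INTERCEPT, B_TENURE, B_SPEND),
)

# Only the (small) result is read back for inspection.
scores = dict(conn.execute("SELECT id, score FROM customers ORDER BY id"))
print(scores)
```

On a real MPP system the same statement would be executed in parallel across the database's segments, which is where the claimed speed advantage comes from.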
The Greenplum database (now featured in EMC-powered appliances) is of course one of the databases Alpine Miner works with, but a 2.0 version of the product released last week added compatibility with Oracle Exadata and the PostgreSQL open-source database. It’s a Java-based product, and the upgrade also added time-series analyses, support for repeatable user-defined functions, and support for C or R programming, in addition to Java.
Alpine says its hallmark is ease of use, with a visual interface that lets "business users" select icons representing various analytic functions that can be run within the database against large data sets. No need for writing code or many of the kludgy steps associated with rival analytic products, says Alpine. I'm guessing those business users will have to be at least hip to the basic concepts and methods of statistical and predictive analytics, even if they don't have to be hard-core data jockeys or code slingers. Alpine Miner costs $100,000 for a 15-user perpetual license plus 22% maintenance per year. If you prefer expensing this cost, you can also subscribe to the on-premises software for $100,000 per year for 20 licenses with no maintenance fees.
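For a rough sense of how the two Alpine pricing options compare over time, the arithmetic can be sketched as follows. The figures are the ones quoted above; the assumptions (not stated in Alpine's pricing) are that the 22% maintenance is calculated on the $100,000 license fee and is charged every year, including the first. The function names are illustrative only.

```python
LICENSE_FEE = 100_000       # perpetual license, 15 users (quoted above)
MAINTENANCE_PCT = 22        # percent per year; assumed to apply to the license fee
SUBSCRIPTION_FEE = 100_000  # per year, 20 users, no maintenance (quoted above)


def perpetual_total(years: int) -> int:
    """Cumulative cost of the perpetual license after `years` years.

    Assumes maintenance is billed every year, starting in year one.
    """
    return LICENSE_FEE + LICENSE_FEE * MAINTENANCE_PCT * years // 100


def subscription_total(years: int) -> int:
    """Cumulative cost of the subscription after `years` years."""
    return SUBSCRIPTION_FEE * years


for years in (1, 2, 3, 5):
    p, s = perpetual_total(years), subscription_total(years)
    print(
        f"{years} yr: perpetual ${p:,} (${p / 15 / years:,.0f}/user/yr), "
        f"subscription ${s:,} (${s / 20 / years:,.0f}/user/yr)"
    )
```

Under these assumptions the subscription is cheaper in the first year, and the perpetual license pulls ahead in total cost from the second year on, although it covers 15 users rather than 20.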
Just how broad and deep can a newbie product be? There are 15 common modeling techniques available, including sampling, logistic regression, linear regression, decision tree, neural network, time series, and lift analyses. In the scoring vein, predictive operators apply logistic and linear regression, naive bayes, tree, and neural network models to dataset prediction.
Alpine Miner is essentially a "greatest hits" selection of the most popular algorithms and functions. There's little doubt it will add more functionality, but that's it for now.
SAS's entry-level Analytics Pro product (detailed below for price comparison) supports 16 statistical methods as well as a battery of data-visualization, mapping, and plotting options. But when you're ready to go deeper, SAS has more than 200 software products and applications, with lots more algorithms and techniques available.
The open source R programming language for statistical computing is even more extensive, with more than 3,000 community-developed analytical applications and more than 4,000 user-created packages available with specialized statistical techniques, graphical devices, import/export capabilities, and reporting tools.
Source: Information Week