IBM defines the role as an evolution of the typical data analyst, with similar training in computer science, statistics and analytics, but with added business acumen and the communicative abilities to relay their findings to CIOs, and in turn CEOs.
That definition itself is evidence of the role's importance. These wizards of ones and zeroes, where they are employed effectively, are charged with communicating their findings to senior decision makers.
The data scientist role is described as 'part analyst, part artist' by Anjul Bhambhri, Vice President of Big Data Products at IBM. "A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It's almost like a Renaissance individual who really wants to learn and bring change to an organisation".
Analytics solutions provider Pivotal sees the data scientist as a blend of soft and technical skills. On the technical side there should be a grounding in mathematics, statistics and machine learning, along with computer science and an understanding of technology. But a data scientist also needs knowledge of the domain they are working in, be it finance, bioinformatics or digital media.
Data science is now where computer science was a decade ago, says Noelle Sio, Senior Data Scientist for Pivotal - formerly EMC's Greenplum. The skillset may have seemed superfluous and the demand for such skills was certainly not widely comprehended when universities began offering degree courses, she tells AMEinfo.
The science of making money
As the New York Times noted last month, North Carolina State University has been offering a master's degree in analytics since 2007. Fully 100% of last year's graduates had job offers - 84 in total. The average salary offer was $89,100, and surpassed $100,000 for those with prior work experience.
But why do we suddenly require experts just to process business data when existing desktop solutions have been sufficient for so long?
"The limit of [Microsoft] Excel is about a million rows and other statistical software can run between one and ten million rows of data, although a colleague once told me his computer literally caught on fire while trying to do that. I've certainly brought down my fair share of big systems trying to calculate too much data," says Sio.
For most businesses, we're talking about petabytes of data - trillions of rows. This is information that can take well over a day just to transfer or back up, never mind to interrogate thoroughly enough to extract relevant information for business intelligence.
"The end result of a data science project isn't just a beautiful model, but a business action. The input for a data science team is a business question - how can we sell more? The data science team can turn that from a business problem into a maths problem, though it's not just about doing it in theory but figuring out how to do it in practice and executing that."
It may be that companies are initially reluctant to draft in a data scientist because of a lack of case studies and understanding, but it's more likely that they would simply struggle to find one. Such experts are scarce, and Sio herself predicts we will see more university degrees and graduate training programs emerge in the near future to meet the need.



Steven Bond, Reporter



