There’s a lot of hype around data scientists. You can blame big data and the cloud. Data scientists are lauded, hunted and positively desired by those wanting to squeeze the most from their information.
In accordance with such demand come large salaries – the average is $123,000 in the US. Like the Yanks of WWII, data scientists are overpaid, oversexed and over here – "here" being the cosy IT establishment we have been familiar with for decades as firms recruit outside the comp-sci arena.
But take a moment: do you know what these data scientists actually do?
Finding patterns in big data, yeah, but that's only one small part of a larger subject. Data scientists shape your world in unperceivable ways.
Data science is a broad term that finds a home in web analytics, machine learning, healthcare and biotechnology, amongst other fields. The only commonality is huge amounts of data from which to extract useful information.
We all know that as consumers we are mined for data when we connect to any website or service. What you probably don’t appreciate, however, is the degree to which data science wizards have been deployed, and how far their work extends on the other side of that site or service.
Take for example the happy prospect of booking a holiday. With big data, it becomes a whole new proposition. The website becomes all about the highly personalised experience for you, and only you. Frictionless is the word that is frequently used.
The site automatically adjusts itself around items that may appeal to you specifically. Obviously if you are presented with something that is compelling you are much more likely to commit to its purchase. The personalised experience is a byword for increased profits and also enhanced user happiness. It needs to be done right from the start though.
This is where a data scientist earns his pay. The personalisation effect comes from a number of places. Items that are collected and used include obvious information such as IP address, visit frequency and location (even without GPS, location is now fairly accurate thanks to extensive work on geo-location).
You probably knew this much. What you may not have expected, however, is how this data is used.
Using advanced (usually proprietary) algorithms, data science can make some educated assumptions about you and where you buy stuff. An example of this: if you log in from two different places frequently it is usually fairly safe to deduce that one location is home and the other is work.
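The home-versus-work deduction can be sketched in a few lines. This is only an illustration, not how any particular site does it: the function name `infer_home_and_work`, the location labels and the "office hours" rule (Monday to Friday, 09:00 to 17:59) are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def infer_home_and_work(visits):
    """Guess which of the two most frequent locations is home and which is work.

    `visits` is a list of (location_label, datetime) pairs. Both the labels
    and the office-hours heuristic below are hypothetical.
    """
    visits = list(visits)
    counts = Counter(loc for loc, _ in visits)
    top_two = [loc for loc, _ in counts.most_common(2)]
    if len(top_two) < 2:
        return None  # not enough distinct locations to say anything

    def office_share(loc):
        # Fraction of this location's visits that fall in weekday office hours.
        stamps = [ts for l, ts in visits if l == loc]
        in_office = sum(1 for ts in stamps if ts.weekday() < 5 and 9 <= ts.hour < 18)
        return in_office / len(stamps)

    # The location used mostly during office hours is assumed to be work.
    work = max(top_two, key=office_share)
    home = top_two[0] if work == top_two[1] else top_two[1]
    return {"home": home, "work": work}


if __name__ == "__main__":
    sample = [
        ("A", datetime(2015, 3, 2, 20)),  # Monday evening
        ("B", datetime(2015, 3, 2, 10)),  # Monday morning
        ("B", datetime(2015, 3, 3, 14)),  # Tuesday afternoon
        ("A", datetime(2015, 3, 7, 12)),  # Saturday lunchtime
    ]
    print(infer_home_and_work(sample))
```

A real system would presumably weigh far more signals, but the principle is the same: frequency plus time of day goes a long way.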
At this point algorithms can make an educated guess about your socio-economic grouping based on aggregate information from data collected from other nearby IP addresses and their search history, and such like.
All this may sound scary. But by letting big data logic, rather than emotion, make the decisions, and with more facts at its disposal, the site may well choose a better holiday for you than you would have picked yourself.
This data crunching allows the website to present offers that have proven popular with people that not only have the same economic grouping but also location, lifestyle and aspirations. In short, it can potentially pick a better experience.
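As a rough sketch of "offers that have proven popular with people like you", the idea is to group past bookings by segment and surface the most-booked offers for the visitor's segment. The segment keys, offer names and the function `top_offers_for_segment` are hypothetical, and a production recommender would be considerably more sophisticated.

```python
from collections import Counter, defaultdict


def top_offers_for_segment(bookings, segment, n=3):
    """Return the n most-booked offers among people in the same segment.

    `bookings` is an iterable of (segment, offer) pairs, where a segment is
    any hashable description of a group, e.g. (economic_group, region).
    """
    by_segment = defaultdict(Counter)
    for seg, offer in bookings:
        by_segment[seg][offer] += 1
    # An unseen segment simply yields an empty list.
    return [offer for offer, _ in by_segment[segment].most_common(n)]


if __name__ == "__main__":
    history = [
        (("B", "London"), "Algarve villa"),
        (("B", "London"), "Algarve villa"),
        (("B", "London"), "Paris weekend"),
        (("C", "Leeds"), "Blackpool break"),
    ]
    print(top_offers_for_segment(history, ("B", "London"), n=2))
```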
To build up an even more complete picture (and this is where it gets a bit scary), after you have done some initial browsing a batch job in the system can go out and trawl publicly available databases, such as the electoral roll, to find out more about you.
If it finds data about you, the offers can become even more personalised. It will know not only your name but also your household make-up.
Therefore, if you have two adults and two children under 16 it will know that certain places can have a big appeal. There are other closed databases that hold even more aggregated data than this but these vendors don’t advertise the fact for obvious reasons.
If you leave feedback about a holiday you went on, it will aggregate that and know that similar people will share similar viewpoints.
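One simple way to picture the feedback step is to average ratings per segment and per holiday, then use that average as the expected view of anyone else in the same segment. The names `segment_ratings` and `predicted_rating`, the segment labels and the 1-to-5 rating scale are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def segment_ratings(feedback):
    """Average the ratings left by each segment for each holiday.

    `feedback` is an iterable of (segment, holiday, rating) tuples.
    """
    grouped = defaultdict(list)
    for segment, holiday, rating in feedback:
        grouped[(segment, holiday)].append(rating)
    return {key: mean(ratings) for key, ratings in grouped.items()}


def predicted_rating(averages, segment, holiday, default=None):
    """Assume a new visitor rates a holiday as their segment did on average."""
    return averages.get((segment, holiday), default)


if __name__ == "__main__":
    reviews = [
        ("family", "Algarve", 5),
        ("family", "Algarve", 4),
        ("couple", "Algarve", 2),
    ]
    averages = segment_ratings(reviews)
    print(predicted_rating(averages, "family", "Algarve"))
```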
As the site collects more and more data on you it gets to know you better and better. At some point it may actually know you better than you know yourself, and it applies cold hard logic to what it presents. The system knows that if it shows you three or four customised offers you will more than likely love at least one of them.
That data feeds back for your next visit so it can fine-tune what it shows you in the future.
Find them, hold them
There are, however, big problems in big data land. Firstly, data scientists are in short supply. There are many people who purport to be data scientists but they tend not to understand the big picture, and they lack both the attitude for and the love of building answers to complex puzzles within big data sets.
Next, assuming you can find a data scientist, employing them and retaining them can be difficult. It is currently the “must have” thing for even medium-sized companies to have big data analytics. The problem occurs when the head honcho declares they need a 'big data' setup without fully understanding what big data is about, nor the most important thing: what questions they need answered.
Data scientists get hired and then the CxO starts asking for spreadsheets and PowerPoint presentations. Any decent data scientist will be hot footing it out of there quicker than you can say pivot table.
However, the biggest problem is privacy. Given that so much can be learned about you from so little there is the burning question of how far should data mining be able to go in order to try and sell you something? Within a few visits the sites will have a full and complete picture of you, like it or not.
Most people are prepared to give up some privacy for a highly personalised and frictionless experience, according to one data scientist that I spoke to – Kamil Bartocha, head of data science at Lastminute.com – but companies need to be aware of wide ranging privacy concerns and do all they can to maintain trust.
I concur with Kamil’s stance to a degree, but when the system knows you better than you know yourself and knows how to appeal to you with just what you want, is it perhaps an unfair fight?
Data science has become a force in technology, like it or not. It can be a force for driving more sales for retailers and banks as much as it can be a force for the selfless cause – drug modelling in Africa.
To quote US businessman John Wanamaker: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.”
OK, so Wanamaker also proposed the US buy Belgium in 1915 for $1m; we'll ignore that. But his quote on money will have resonance with many in business and government today, namely those spending money on activities without any tangible evidence of a return or benefit.
Big data and data scientists, therefore, raise the prospect that the children of Wanamaker can (at last) not just uncover which half of what they spend is wasted but also find new ways to make more.
And that’s got to be a good thing for the rest of us working in the next cubicle. ®