Big Data – A Sinking Ship?!?
The Disaster
Stockholm, 1628. It is a warm and sunny day in August. Everyone who can makes their way to the piers to witness the historic event. Many have even travelled for days to be part of the moment when the pride of the Swedish nation sets sail. The Vasa is the most powerful warship the Swedish navy has ever taken into service. She has unmatched firepower and might be the decisive factor in the raging war with Poland-Lithuania. Considering her cost, she had better be! The 64 massive bronze cannons and the rich decoration with hundreds of painted and gilded sculptures cost a fortune and make the Vasa a huge asset to the Swedish Kingdom.
On this clear and sunny summer day, the Vasa sinks on her maiden voyage, 1300 meters after setting sail, when the first light gust of wind fills part of her sails. Some 30 people drown.
What Happened?
How could this moment, which was supposed to be a triumph, turn into a devastating catastrophe?
The answer is shockingly simple. The ship’s design could not cope with the weight of so many heavy cannons: they raised her centre of mass too high for her to remain stable.
Another Golden Age
388 years later, history is repeating itself. Just as in the era the Dutch call “the Golden Age”, we are living in a time of adventure and opportunity. The ships of the VOC (Verenigde Oost-Indische Compagnie) brought back not only precious goods from their journeys to distant lands, but also insights, not to be underestimated, into the cultures, flora and fauna they discovered, as well as important naval knowledge.
Today, most companies find themselves surrounded by an ocean of data. Just like in the past, sailing this ocean can be an adventure that leads to fantastic new business opportunities. In addition, the journey can yield new insights and trends that help management make the right decisions. Another Golden Age! However, the journey is tricky and requires know-how and skill. As the Vasa showed, it is easy to capsize and sink when the know-how is missing and the approach is wrong.
Less Is More
There is another lesson to be learned here: less is sometimes more. Even for data analysis! It is widely believed that more data necessarily gives you more, and more reliable, insights. In most cases this is a dangerous assumption, and more often than not it is simply wrong! If you have sufficiently large statistics, i.e. huge datasets, modern algorithms can produce reliable insights even from polluted and incomplete data. But the vast majority of companies do not have enormous datasets like Google or Facebook. They do not analyse hundreds or thousands of terabytes of data per day to deliver internet-search results or to analyse their clients’ behaviour. Their datasets are a thousand or even a million times smaller. Here, good data quality is essential. “Garbage in, garbage out” remains a valid statement for the vast majority of analyses. The most effective and efficient approach is usually a ‘light’ and transparent analysis that is easy to verify and understand and is based on a few powerful variables. Trying to ‘squeeze’ all your data and variables into a black box and relying on the results is a risky enterprise, and in most cases it is doomed to fail without prior knowledge and a thorough understanding of the data. Less is sometimes more: the predictions of a model based on a machine-learning algorithm, for example, can become significantly worse when irrelevant variables are added.
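The effect of irrelevant variables can be tried out in a few lines of Python. The sketch below is only an illustration on synthetic data, not an analysis of any real dataset. It assumes a small dataset with a single informative variable and uses a plain nearest-neighbour classifier as a stand-in for "a machine-learning algorithm". All names (make_data, predict_1nn, accuracy) and all sizes are hypothetical choices made for this example.

```python
import random


def make_data(n, n_noise, rng):
    """Create n rows: one informative variable plus n_noise irrelevant ones."""
    rows = []
    for _ in range(n):
        signal = rng.random()                # the only variable that matters
        label = 1 if signal > 0.5 else 0     # the label depends on it alone
        noise = [rng.random() for _ in range(n_noise)]  # pure noise variables
        rows.append(([signal] + noise, label))
    return rows


def predict_1nn(train, x):
    """Return the label of the training row closest to x (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = min(train, key=lambda row: sq_dist(row[0], x))
    return nearest[1]


def accuracy(n_noise, seed=0, n_train=60, n_test=200):
    """Fraction of test rows classified correctly for a given number of noise variables."""
    rng = random.Random(seed)                # fixed seed keeps the run repeatable
    train = make_data(n_train, n_noise, rng)
    test = make_data(n_test, n_noise, rng)
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / n_test


if __name__ == "__main__":
    # Same model, same amount of data; only the number of useless variables changes.
    for n_noise in (0, 5, 20):
        print(f"{n_noise:2d} irrelevant variables: accuracy = {accuracy(n_noise):.2f}")
```

With only 60 training rows, the noise variables dominate the distance calculation and bury the one variable that carries information, so the accuracy should fall as more of them are added. Other algorithms are affected to different degrees, but the direction of the effect on small datasets is the same.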
The Vasa might have played an important part in the war with a few fewer cannons on board.
Protect yourself from the risk of sinking and ask for our advice on your (potential) data analysis.