Big data refers to massive, complex collections of data that grow continuously and include both structured data (such as the specifications of all the products a company sells) and unstructured data (such as metadata, social media posts, and web log data). In 2001, Gartner published this definition: "Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation." This three-Vs definition has since been extended with a fourth V, veracity, which expresses the need for data quality when using big data to solve problems and make decisions.
One never-ending source of data that organizations rely on is streaming data that comes into their IT systems from the large, interconnected web of devices they have access to. The Internet of Things (IoT) is part of this growing big data source. Data streaming in from IoT devices, from cars to wearables to mobile phones, already gives manufacturers, governments, media outlets, and advertisers a wealth of insight about their audiences. Another major source is social media data. This unstructured data can reveal nuanced patterns and preferences and has great value for marketing studies, sales and support insights, political forecasting, and sociological research. There are also several publicly available sources of data that anyone can use, such as data.gov (U.S. government) and the European Union Open Data Portal.
However, big data is of little value in its raw state. Organizations can derive value from it only by storing, processing, and analyzing it, and then using the resulting insights to shape the products and services they offer. The potential payoff is enormous. Big data is helping some businesses anticipate customer demand and build new products and services, and helping others analyze call logs, web visits, and social media posts to fix problems with their customer experience. It is also helping many companies detect patterns that signal fraud long before a human could detect them. Technologies are available to support the special requirements of this kind of project. For example, SUSE Linux Enterprise High Performance Computing offers a parallel computing platform that can handle big data analytics workloads.