Like cloud, mobile and social media, big data has had a profound effect on organizations and is reshaping the way we do business. But what is big data, and how does it differ from the ordinary data companies already hold on their networks?
Big data is a term for datasets so huge and complex that traditional data processing applications just aren't up to the job of dealing with them. The problem of data exceeding storage or compute capacity is nothing new, but new strategies and technologies are needed to gather, organize, and process these massive datasets and draw insights from them.
Elements of big data
Depending on who you ask, there are many different elements to big data, but the most common are the four Vs: volume, velocity, variety, and veracity. You can also include a fifth V: value.
Volume: The defining quality of big data is its volume. Not long ago, employees generated the bulk of the data found in organizations; now data is also generated by networks, systems, social media and Internet of Things devices. The volume of data that needs analyzing is huge.
Variety: Data comes from many different sources and in many different types, both structured (data that typically comes from a database and is therefore well organized and clearly defined) and unstructured (data from everywhere else, such as Twitter posts, emails, photos, videos, audio and documents, which is far more chaotic). This variety of unstructured data creates challenges for storing, processing, and analyzing data. Handling it is a fundamental concern of big data, and big data tools seek to process unstructured data and make sense of it.
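To make the structured/unstructured distinction concrete, here is a minimal Python sketch. The sample records, field names, and regular expressions are illustrative assumptions, not part of any particular big data tool: the point is only that a structured record already carries its schema, while free text must have structure extracted from it.

```python
import json
import re

# Structured record: fields are predefined and well organized,
# as they would be in a database row.
structured = {"user_id": 42, "amount": 19.99, "currency": "USD"}

# Unstructured record: free text, as from a tweet or email body.
unstructured = "Paid $19.99 today, great service! contact: jane@example.com"

def extract_features(text):
    """Pull some structure out of free text: dollar amounts and email addresses."""
    amounts = [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", text)]
    emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    return {"amounts": amounts, "emails": emails}

# The structured record needs no extraction step; its schema is explicit.
print(json.dumps(structured))
print(extract_features(unstructured))  # {'amounts': [19.99], 'emails': ['jane@example.com']}
```

Real big data pipelines apply far richer techniques (natural language processing, image and audio analysis), but the extraction step shown here is the basic shape of the problem.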
Velocity: With such a wide range of data coming from different sources, it is no surprise that the pace at which data arrives in an organization matters. The flow of data is huge and continuous: email, text messages, social media updates and credit card transactions arrive every minute of every day. Real-time data must be processed and analyzed quickly if it is to inform business decisions, which requires highly available systems with failover capabilities to cope with the data pipeline.
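A common pattern for dealing with a continuous flow is windowed stream processing: instead of storing everything before analyzing it, the system keeps only the events relevant to a recent time window. The sketch below, a simplified assumption rather than any specific streaming framework's API, counts events seen in the last 60 seconds of a simulated feed.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events observed in the last `window_seconds` of a continuous stream."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = deque()  # event timestamps, oldest first

    def record(self, timestamp):
        self.events.append(timestamp)
        # Evict timestamps that have fallen out of the window,
        # so memory use stays bounded no matter how long the stream runs.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def count(self):
        return len(self.events)

# Simulated feed: transactions arriving at t = 0, 10, 30 and 70 seconds.
counter = SlidingWindowCounter(window_seconds=60)
for t in (0, 10, 30, 70):
    counter.record(t)
print(counter.count())  # 2: only the events at t=30 and t=70 are within the last 60s
```

Production systems such as Kafka or Spark Streaming apply the same idea at a much larger scale, with partitioning and failover so the pipeline keeps running when individual nodes fail.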
Veracity: Given the volume, variety and velocity of incoming data, evaluating its quality can be a challenge, and data quality directly affects the quality of any analysis built on it. A big data project needs processes to keep data clean and to prevent dirty data from accumulating.
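One such process is a validation step that rejects dirty records before they reach storage or analysis. The sketch below uses hypothetical records and rules (a required user ID, a non-negative numeric amount) purely for illustration; a real project would define its own schema and checks.

```python
# Hypothetical raw records as they might arrive from mixed sources;
# field names and validation rules are illustrative assumptions.
raw_records = [
    {"user_id": "42", "amount": "19.99"},
    {"user_id": "",   "amount": "5.00"},   # missing user: dirty
    {"user_id": "7",  "amount": "-3.50"},  # negative amount: dirty
    {"user_id": "13", "amount": "8.25"},
]

def clean(record):
    """Return a validated, correctly typed record, or None if the record is dirty."""
    if not record.get("user_id"):
        return None
    try:
        amount = float(record["amount"])
    except (TypeError, ValueError):
        return None
    if amount < 0:
        return None
    return {"user_id": int(record["user_id"]), "amount": amount}

cleaned = [c for c in (clean(r) for r in raw_records) if c is not None]
print(len(cleaned))  # 2: only the valid records survive
```

Running validation at ingest time, rather than at analysis time, is what stops dirty data from quietly accumulating in the first place.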
Value: When the four Vs are put together, will the insights you collect from analysis be worthwhile to your organization? In the end, it is not the data itself that matters, but how intelligently you use it. An organization can hold a great deal of data, but if it is not used intelligently, it will deliver no value to the business.