These images don’t provide definitions, however, and definitions help us to understand what Big Data really is and where it is coming from. Big Data is unstructured data which, due to its nature, takes up more storage space and requires new technology to house and analyze.
Where is the big data coming from?
Big data is coming from everywhere. The social media phenomenon is one huge source, with 1 billion Facebook users globally, while Twitter has hit the 200 million mark (Source: http://mashable.com/2012/03/06/facebook-growth-slows/). According to the YouTube site, it has over 537 million videos available for viewing, with new content being added at a rate of one hour of video every second.
Besides purely social media sites, another source of big data is the typical business enterprise. As the choices for storing data have multiplied and the cost of storage has dropped, corporations can now keep data that was previously lost. What types of unstructured data do businesses have?
General business information
- Human resource files
- Email archives
- Project files and documentation
- Customer service correspondence
- Legal documents
- Shipping manifests
Industry-specific big data
Some big data is derived from industry-specific needs. Examples of industries known to have big data issues are:
- Healthcare industry: Medical records, including scans and images, which can be accessed by medical professionals all over the country.
- Insurance industry: Insurance companies now routinely take photographs for claims purposes, and these images can even travel with the claims process, allowing customers to check the status of their claims and the progress of repairs online.
- Shipping industry: Probably everyone has tracked a package and looked up the scanned signature of the receiver.
- Media industry: Companies in the media industry leverage their digital assets online and thus must keep the data in storage and in databases where it can be recalled easily when needed.
- Travel and hospitality industry: Some hotel chains maintain that they can serve you better by keeping your individual preferences on file. In addition, thanks to the practice of rating everything under the sun, the hospitality industry can collect customer feedback on its various properties with the goal of benchmarking and improving satisfaction.
- Energy sector: Oil and gas drilling equipment now produces data in real time to broadcast various parameters of the drill. Devices read and transmit data to allow for early warning of problems in the field.
Where is the big data going?
Currently, storage solutions for big data are outpacing data analytics tools. Some companies manage their data storage in-house and have merely ramped up the number of servers used to house the data.
Other companies may find that outsourcing the task of storing big data is a better option. One thing is certain, however: the mere existence of Big Data is driving innovation and creating a lot of opportunities in the Information Technology sector.
Cloud computing is one of the outgrowths of big data. Data storage in the cloud makes sense to some businesses because the storage of the data can be centralized while access is distributed to all end users.
What are the leading technologies to utilize big data?
Hadoop is the name of a technology developed to manage big data. Fundamentally, Hadoop allows software to be run in a distributed manner across very large datasets, so that thousands of nodes of computing power are leveraged to process the data much more quickly than if just a single node or a small number of nodes were used (http://radar.oreilly.com/2012/02/what-is-apache-hadoop.html). The engine of Hadoop, MapReduce, efficiently leverages the power of a network of computers by pushing work to available nodes for a processing task. This engine originated at Google, where it was used to reduce the time required to create web search indexes.
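To make the MapReduce idea more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a thread pool stands in for the cluster of nodes. It counts words across several chunks of text, which is the customary introductory MapReduce example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group every emitted value under its key (the word)."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each word's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=4):
    # Each chunk stands in for a block of data that a real cluster would
    # hand to a separate node; here the "nodes" are just local threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

The point of the pattern is that the map step needs no knowledge of any other chunk, so the work can be pushed to whichever nodes are available, and only the grouped results have to be brought together for the reduce step.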
Hadoop is being used at many large companies, including Facebook, LinkedIn, The New York Times, American Airlines, AOL and Twitter, among others.
The technologies developed thus far to leverage large datasets center on the ability to search and retrieve information.
How can organizations extract value from big data?
Before value can be extracted from these enormous datasets, each organization that is capturing data must have good data management processes to scrub and store the data. Many companies lack the resources to complete this fundamental step.
A comprehensive benchmarking study from The Economist Intelligence Unit, sponsored by SAS, indicates that the experience and value derived from big data vary with the individual business, the particular business model, and the discipline the business has adopted around its data processes.
In this study, only 22% of the 586 senior executives interviewed would characterize their organization as putting nearly all of the data that is of real value to good use, while 53% said they leverage about half of their organization’s valuable data.
Also quoted in the research findings is Stan Lepeak, Research Director in KPMG’s Shared Services and Outsourcing Advisory group, who notes, “The process of capturing is actually relatively easy, and these firms have gotten very good at it over the last 10 to 15 years.” He notes that the cost of the actual data, as well as the storage and data warehousing products needed to collect them, has dropped dramatically over the last decade. “But a number of them are struggling to extract value from the data. In particular, many are failing to organize them properly so that they can be analyzed and queried. And often they don’t have people with the skills to interpret the results.” You can find the complete study in PDF format at: http://www.managementthinking.eiu.com/sites/default/files/downloads/SAS_BigData_final_0.pdf
Some data managers concede that while they believe some of the data they capture has very high value, other information may be completely worthless. Their jobs are so fast-paced, however, that it is not possible to make the case that some data lacks value while the transactions accumulate constantly. Showing the value (or lack thereof) is not the responsibility of the IT teams.
The most common problem among companies that fail to extract value from their data is that they have too much data and too few resources: 45% of the respondents to the Economist Intelligence Unit study cite this issue as their biggest challenge.
The discipline of data science is emerging to meet the demand for data scientists that is anticipated due to the increasing prevalence of Big Data. It incorporates ideas from computer science, mathematics, statistical analysis, data visualization and social science. Hmm… maybe I can help NYU create such a program.
Does everyone believe in the value of big data?
There are data skeptics who believe that the talk of big data is a bunch of hype, and that it is never really necessary to capture all of the data associated with a process, precisely because of the volume. Since such volumes of data are unlikely ever to be mined to reveal their value, they argue, a sample of the data should be sufficient to determine the value of saving entire transactional histories. There will always be skeptics in the world, and their influence looms larger when a new concept is in the formative stages.
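The skeptics' sampling argument can be illustrated with a small Python sketch. It assumes the quantity of interest is a simple average, and the "transactions" are randomly generated stand-ins rather than real data. The function name `estimate_mean` and every number below are made up for illustration.

```python
import random
import statistics


def estimate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a simple random sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = rng.sample(values, sample_size)
    return statistics.mean(sample)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical transaction amounts standing in for a full history.
    transactions = [rng.uniform(5, 500) for _ in range(100_000)]

    full_mean = statistics.mean(transactions)
    sampled_mean = estimate_mean(transactions, sample_size=1_000)

    print(f"mean over all 100,000 records: {full_mean:.2f}")
    print(f"mean over a 1,000-record sample: {sampled_mean:.2f}")
```

For a summary statistic like this one, a 1% sample usually lands close to the full-data answer. The sketch says nothing about rare events or individual records, which a sample can miss entirely.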
In addition, some IT leaders believe that their organization just doesn’t have enough data for it to be classified as “Big Data.” For this group, it can be argued that the Big Data movement is less about the absolute size of the data being managed and more about the new tools and practices being deployed to maximize the efficiency of scrubbing and processing the data.
I hope this helped clarify for you what big data is, and how we will utilize it in the near future.
Rhonda Knehans Drake
President, Drake Direct