In this chapter, we are going to learn about what exactly the Big Data is, the classification of Big Data, and its features along with the suitable examples. Before, we can start Introduction to Big Data, let us refresh computer data first. Traditional data in terms of computer terminology refers to the characters (alphabets or numbers), or symbols on which the computer do the processing in order to get the meaningful outputs. Such data could be transmitted electronically over the internet as data packets and can be stored on the physical drives magnetically, optically, and mechanically for frequent data accesses or as a data backup for future use.
In this tutorial series we are going to cover the below topics:
Big Data Course Syllabus
Tutorial | Introduction to BigData |
Tutorial | Introduction to Hadoop Architecture, and Components |
Tutorial | Hadoop installation on Windows |
Tutorial | HDFS Read & Write Operation using Java API |
Tutorial | Hadoop MapReduce |
Tutorial | Hadoop MapReduce First Program |
Tutorial | Hadoop MapReduce and Counter |
Tutorial | Apache Sqoop |
Tutorial | Apache Flume |
Tutorial | Hadoop Pig |
Tutorial | Apache Oozie |
Tutorial | Big Data Testing |
To start with introduction to Big Data see different examples where Big Data used. The computer data but it is voluminous as compared to the traditional Data. It can be defined as the collection of data that is very vast in size and has the capability to grow multi-fold times exponentially over the period of time. In other words, the volume of data for Big Data is so immense that it can be stored or processed easily as compared to the traditionally available data management tools.
- The stock exchanges generate over terabytes of data every day.
- Social media sites such as Twitter, Facebook, Instagram, TikTok, LinkedIn, etc. generate hundreds of terabytes of data every day as a result of uploading videos, audios, photos, comments, etc.
- A multinational banking system generates Terabytes of payment data globally every day.
- Around 60 to 70 terabytes of data get generated through Boeing, and Airbus Jet engines every hour. If the radar system needs to collect the data for hundreds of flights every day, then Petabytes of data need to be collected and processed.
Classification of Big Data
With introduction to Big Data, it can be classified into the following types.
Organized or Structured Big Data:
As the name suggests, organized or structured Big Data is a fixed formatted data which can be stored, processed, and accessed easily. For example, the maintenance of student records by a university in the form of a fixed table which grows uniformly every year is an organized data. Student’s records here can be inserted, modified, and selected easily without any data processing hassles.
Student Name | Student ID | Address | Phone Number | National Identifier |
APARAJITA | ST123 | Toronto, ON | 647-123 (4567) | 123-456-789 |
MOHIT | ST124 | Mississauga, ON | 905-123 (4567) | 987-654-321 |
Unorganized or Unstructured Big Data:
As the name suggests, organized or structured Big Data is the raw data that cannot be processed and accessed easily. Here, the data storage structure is unknown and it is vast in size. Therefore, it poses many challenges in processing it in order to get the desired output data. A google search engine result is an example of unstructured data where it gets data from multiple sources and in multiple forms such as text, videos, and images to display on the web browser. Many organizations have voluminous data available to them but unfortunately, it is difficult for them to process the data in raw form and reap the benefits out of it.
Semi-Organized or Semi-structured Big Data:
It is an intermediate form of Big Data that occurs between structured and unstructured Big Data. An XML form of a student record is the best example of semi-structured Big Data. Here, the data appears to be organized in the XML form but it does not have the fixed form as a database table in DBMS.
Features of Big Data
Big data has the following features.
- Volume: The size of Big data is very enormous and hence it is considered as different from traditional computer data. As the data size increase over time, the Big Data can still maintain the data volume very easily without compromising the storage, access, and processing of data.
- Variety: Big data enables us to store data in various data varieties such as emails, videos, audios, photos, monitoring devices, PDFs, audios, etc. These different types of data forms are also known as heterogeneous data and without Big Data it poses great difficulties in terms of storage, processing, mining and analyzing data.
- Velocity: Velocity refers to the data generation speed in Big Data. It is the feature of Big data with which it can store and process the data rapidly in order to meet the demands of an organization. It deals with massive and continuous data flow from multiple data sources such as application logs, networks, business processes, social media sites, mobile devices, sensors, etc.
- Variability: Variability refers to the discrepancy which can be exposed by the data at times, therefore Big data can obstruct the process of being able to grip and cope up with the data efficiently.
Advantages of Big Data
The following are the advantages of Big Data.
- It can process voluminous data efficiently and quickly in order to provide improved customer services in terms of customer feedback, etc.
- It can store and process data from multiple sources very efficiently.
- It enables search engines to gather data from social sites such as Facebook, Twitter, LinkedIn, etc. to enable the end-users to get desired results.
- It can provide better operational efficiency for data warehouses to store payment data, trading data, etc.
Disadvantages of Big Data
- Data maintenance cost is slightly higher than traditional data storage using DBMS.
Over to you:
Today we started on introduction to Big Data, it is an advanced technology over DBMS which has limitations in terms of data storage, and processing. Big Data is capable to store voluminous data from multiple sources and multiple forms such as emails, videos, audios, photos, monitoring devices, PDFs, audios, etc. It can easily handle data growth rates with time. Big Data could be organized, unorganized or semi-structured. Big data has the vital features of Volume, Variety, Velocity, and Variability. Big data assist in data mining, decision making based on the business data available to an organization, and it can improve customer services as well.
>>> Checkout Big Data Tutorial List <<<
⇓ Subscribe Us ⇓
If you are not regular reader of this website then highly recommends you to Sign up for our free email newsletter!! Sign up just providing your email address below:
Happy Testing!!!
- Big Data: From Testing Perspective
- Tutorial 3: Hadoop installation on Windows
- Tutorial 5: Hadoop MapReduce
- Tutorial 6: Hadoop MapReduce First Program
- Tutorial 7: Hadoop MapReduce Example on Join Operation and Counter
- Big Data Testing Tutorial
- Tutorial 8: Apache Sqoop
- Tutorial 9: Apache Flume Tutorial
- Tutorial 10: Hadoop Pig
- Tutorial 11: Apache Oozie
1 thought on “Tutorial 1: Introduction to Big Data”
Nice introduction to the topic.