The difference between Big Data with Hadoop and Big Data without Hadoop: basic concepts
Hadoop traces its origins to around 2003 and was built to be free and open source. It was introduced more widely in 2008, the year it also set a record for sorting a terabyte of data. In the web world this was big news, although much of the web world was already running on MySQL and object brokerage. In the enterprise world, nobody cared. AT&T, for example, was already handling data at the terabyte scale, which counts as quite large, or "big".
That segment also already had the VLDB, short for Very Large Database, at that point in time. The point is that in the enterprise world a terabyte database was known and used, but very rare.
At the same time, enterprises needed to handle huge amounts of data and wanted larger-capacity, shareable databases.
First, it helps to understand the difference between Big Data and Hadoop.
Big Data?
The Internet is full of data, which is broadly either structured or unstructured. Around 2.5 quintillion bytes of data are generated per day, and every single person is expected to generate about 1.7 megabytes of data per second by this year. Collecting this data and storing it in a huge database is challenging. There are various tools for managing Big Data. The main challenges are capturing, curating, storing, searching, sharing, transferring, analyzing, and visualizing the data.
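To put the two figures above into more familiar units, here is a small back-of-the-envelope sketch. It assumes decimal (SI) units, where 1 MB is 10^6 bytes and 1 EB is 10^18 bytes; the function names are made up for this illustration.

```python
# Back-of-the-envelope conversions for the figures quoted above.
# Assumption: decimal (SI) units, so 1 MB = 10**6 bytes and 1 EB = 10**18 bytes.

SECONDS_PER_DAY = 24 * 60 * 60


def bytes_to_exabytes(n_bytes):
    """Convert a byte count to exabytes (hypothetical helper)."""
    return n_bytes / 10**18


def mb_per_second_to_gb_per_day(mb_per_second):
    """Convert a rate in MB per second to GB per day (hypothetical helper)."""
    return mb_per_second * SECONDS_PER_DAY / 1000


# 2.5 quintillion bytes generated per day
daily_total_eb = bytes_to_exabytes(2.5 * 10**18)

# 1.7 MB generated per person per second
per_person_gb_per_day = mb_per_second_to_gb_per_day(1.7)

print(f"Daily total: {daily_total_eb:.1f} EB")
print(f"Per person: {per_person_gb_per_day:.2f} GB per day")
```

So 2.5 quintillion bytes is 2.5 exabytes per day, and 1.7 MB per second works out to roughly 147 GB per person per day.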
Big Data is classified into three types:
structured data, unstructured data, and semi-structured data.
Unstructured data :
This data is not arranged in any structure, which makes it quite difficult to analyze. It is usually a mixture of different kinds of data, such as images, videos, or audio files.
Semi-Structured data :
This is a combination of structured and unstructured data. It does not follow a rigid schema the way a database table does, but it carries tags or markers that organize its content, as XML does.
Structured data :
This data has a proper, structured format. It is well organized, typically under an RDBMS schema, which makes it easier to process and analyze.
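The three types can be sketched with Python's standard library. This is only an illustration: the table, the XML snippet, and the byte string are made-up sample data, not taken from any real system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows that follow a fixed RDBMS schema (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Asha", "Chennai"))
structured_rows = conn.execute("SELECT id, name, city FROM customers").fetchall()
conn.close()

# Semi-structured: no fixed schema, but tags describe each piece (sample XML).
xml_text = "<customer id='1'><name>Asha</name><likes>3</likes></customer>"
root = ET.fromstring(xml_text)
semi_structured = {child.tag: child.text for child in root}

# Unstructured: raw bytes with no fields to query, standing in for an
# image, video, or audio file (placeholder bytes, not a real file).
unstructured = b"\x00\x01\x02\x03"

print(structured_rows)
print(semi_structured)
print(len(unstructured), "bytes of raw content")
```

The structured rows can be queried with SQL right away, the XML has to be parsed before its fields can be used, and the raw bytes offer nothing to query until some other tool interprets them.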
There are seven Vs in Big Data:
The first V is variety. Big Data contains varied data such as emails, images, audio, video, shares, likes, comments, and more.
The second V is velocity, the speed at which data is generated. This speed is very high: Facebook users, for example, generate millions of views per day.
The third V is volume. The term Big Data actually got its name from the huge sets of data produced every day. Walmart, for example, produces around 2.5 petabytes of data from customer transactions every day.
Veracity describes the uncertainty in Big Data, that is, how far the collected data can be trusted. Big decisions are made on the basis of this data, so that trust matters. The data is also used to learn what the majority of customers or the audience think about a product or service, and the success of that product or service is judged on this basis.
Value: All the data you have collected is supposed to have some basic value. Just collecting data is not enough. It has value only when it is processed or used for some purpose.
Variability: The data you have collected and stored will change over time. What sits in the database stays the same, but in the future the underlying statistics will have changed, so the database has to be updated to stay current.
Visualization: This covers the accessibility and readability of Big Data. Both are quite difficult to achieve because of the huge volume, and the task can feel complicated.
These topics are hard to learn alone from home, and a Big Data Hadoop training in Chennai can help.