


Types of Big Data Analytics

by admin
January 23, 2018

What is Big Data and how is it Useful?

As the name implies, the term Big Data can be applied to any internal or external enterprise information that can be used to make business forecasts, improve existing infrastructure, manage smart power grids, drive business intelligence, and support other applications.

Incidentally, this phenomenon is characterized by three main factors:

  1. Volume – the sheer amount of data, which can be too much for traditional systems to store and process

  2. Velocity – the speed at which data flows in and out, which makes timely examination difficult

  3. Variety – the range of data types and formats, which can be too broad for any single tool to take in

In other words, big data is typically used by enterprises to manage their business intelligence processes and programs. However, with relevant analytics, it can be harnessed to gain richer insights into business practices from a number of resources and transactions to unearth hidden trends and relationships.

Big Data Analytics in Action

There are several types of analytics that big data depends on.

Prescriptive Analytics


This type of analytics reveals what kind of actions should be taken, and determines future rules and regulations. These are quite valuable since they allow business owners to answer specific queries. Take the bariatric healthcare industry, for example. Prescriptive analytics can be applied to patient populations to measure how many patients are morbidly obese. That number can then be filtered further by adding categories such as diabetes and LDL cholesterol levels to determine the exact treatment. Some companies also use this kind of analysis to forecast sales leads, social media activity, CRM data, etc.

Diagnostic Analytics


These analytics analyze past data to determine why certain incidents happened. Say, you end up with an unsuccessful social media campaign; using a diagnostic big data analysis you can examine the number of posts that were put up, followers, fans, page views/reviews, pins etc that will allow you to sift the grain from the chaff so to speak. In other words, you can distill literally thousands of data into a single view to see what worked and what didn’t thus saving time and resources.
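As a toy Python illustration (the campaign event log below is invented), distilling many raw data points into a single view is just a roll-up of per-metric totals:

```python
from collections import Counter

# Raw social-media events from a hypothetical campaign log.
events = [
    {"metric": "post"}, {"metric": "page_view"}, {"metric": "page_view"},
    {"metric": "new_follower"}, {"metric": "page_view"},
]

# Collapse thousands of individual records into one per-metric summary.
single_view = Counter(e["metric"] for e in events)
# e.g. page_view: 3, post: 1, new_follower: 1
```

A real diagnostic pipeline would pull these events from an analytics API rather than a list, but the roll-up step is the same.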


Descriptive Analytics

This type of analytics is based on present processes and incoming data. Such analysis can help you determine valuable patterns that offer critical insights into important processes. For instance, it can help you assess credit risk, review past financial performance to determine how a customer might pay in the future, and even categorize your clientele according to their preferences and sales cycle. Descriptive analytics is typically consumed through a dashboard or simple email reports.

In a nutshell, we can say that harnessing the potential of big data can help entrepreneurs add context to their business data and get a more in-depth, focused view of their needs. With analytics, those massive volumes of information can be simplified into actionable steps that ensure accurate business decisions. In other words, if you can understand and demystify big data, you can increase your business value tenfold and leave your competitors in the dust to boot.

Predictive Analytics

This type of analytics involves the extraction of insight from current data sets to determine upcoming trends and outcomes. It cannot tell us exactly what will happen in the future, only what a business owner can expect under different scenarios. In other words, predictive analysis is an enabler of big data in that it amasses enormous amounts of data, such as customer info, historical data, and customer insight, in order to predict future scenarios. In this way it allows organizations to use large volumes of information to anticipate their clientele's future behavior.
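As a minimal Python sketch of the idea (with invented monthly sales figures), a predictive model fits past data and extrapolates forward; real predictive analytics uses far richer models, but the shape is the same: historical data in, forecast out.

```python
# Historical monthly sales (hypothetical numbers).
months = [1, 2, 3, 4]
sales = [100, 120, 140, 160]

# Ordinary least-squares fit of a straight line through the history.
n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x

# Extrapolate one month ahead.
forecast = slope * 5 + intercept  # expected sales for month 5
```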

Inferential Analytics

This type of data analytics takes a small sample of data to draw inferences about certain facets of a bigger whole, such as a large population. It considers the quantity the analyst cares about along with any uncertainty in the estimates, and its reliability depends heavily on the population and the sampling method.

Causal Analytics

This type of analytics allows big data analysts to figure out what can happen if they change a component or variable in a bigger scheme. The method typically involves a number of randomized studies, but non-randomized studies are also conducted at times to infer causation. Causal analytics is considered the 'gold standard' when it comes to analyzing large volumes of data, and it relies on randomized trial data sets.

Mechanistic Analytics

These take the most effort but pay off with clear results. Mechanistic analytics, as the name implies, allows big data analysts to understand the exact changes in procedures or variables that result in changes to individual objects. The results are typically determined by equations, as in engineering and the physical sciences, though they can be quite hard to infer. Additionally, if the analyst knows the equation but not its parameters, the parameters can be inferred through data analysis.

Big Data Market size in coming future


Big data is here to stay, and business owners couldn't be happier. The term emerged as a way to describe the large volumes of information that databases contain, manage, and maintain, but the concept has taken on a life of its own in the modern era. Now it refers not only to the information itself but also to a number of technologies that handle that information to solve complex problems.

It’s because of its flexible nature that investments in big data continue to grow on a global scale. In fact, according to Forbes, it will amount to a whopping $40 Billion dollars this year alone and will expand further by almost 14% in the coming 5 years. It is currently a $5 Billion business already and will easily reach new heights of success if new business models are enabled that can leverage big data to create more beneficial analytical abilities along with state of the art applications that can solve critical business issues in minutes rather than hours.

New approaches to this concept have made the IT sector better than ever, enabling game-changing models, in-depth business analytics, and app development that are just the tip of the iceberg. Big data has literally changed the way businesses compete to get ahead, along with the business models that aid such endeavors. It has also altered how enterprises view their databases, warehouses, and especially their business intelligence operations.

It’s no wonder big data is such a huge hit in Silicon Valley and is well on its way to becoming a global phenomenon in the next couple of years. Needless to say, the concept has changed the way businesses develop and is just beginning to gain momentum as a significant movement.

Importance of Big Data Analytics


The term itself is not new, and there is a very good reason it has endured. Companies across the globe, both large conglomerates and small startups, are utilizing its potential to gain valuable insight into existing operations, to drive future development, and to improve their customer service.

Take today's data, for instance. According to a study conducted by scientists at UC San Diego, by 2024 most businesses across the globe will have processed the digital equivalent of a gigantic stack of books that, placed on top of each other, could stretch from Earth to Neptune and back. At the rate global enterprises are focusing on big data, that feat would be repeated 20 times each year!

What is Big Data Analytics?

However, just why are so many enterprises dependent on this phenomenon? This is where analytics comes into the picture. The process refers to the examination of big data to reveal hidden patterns, significant correlations, and other useful information that business owners can use to improve decision-making and unearth new opportunities. Data scientists today use such techniques to access and simplify huge volumes of information where traditional analytics falls short.

To understand its importance, say your company has already collected large amounts of data in a multitude of combinations, formats, and stores. Manually analyzing billions of rows of data to figure out what is important is simply not possible. Big Data analytics allows you to go through said information in context via:

  • Predictive Analysis

  • Forecasting

  • Optimization

  • Text mining

These processes have allowed countless business owners to streamline their decision-making and pinpoint the best options for enterprise development. Additionally, entrepreneurs are harnessing the power of Big Data analytics to improve their businesses in ways such as:

Big Data Business Intelligence

Business Intelligence, or BI, refers to typical business reports, ad hoc reports, alerts, OLAP, and notifications based on analytics. The main aim of this process is to analyze the static past, which can then be used to determine future actions. When reporting involves extracting data from huge data sets, we call it Big Data business intelligence. However, the decisions that result from these methods are largely reactionary.

Big Analytics

This method is largely proactive and requires a hands-on approach involving optimization, predictive analytics, modeling, text mining, and statistical analysis on a large scale. These processes allow big data analysts to pinpoint weaknesses and strengths, and to figure out new and better decision-making practices for the future. This is where it gets interesting: using big data analytics, business owners can home in on and extract the relevant information for easy analysis.

In other words, big data analytics is more than just a one-time endeavor. If they are proactive with it, business owners can do wonders for their enterprises and stay ahead of their competitors with tactics the latter are not privy to.

Types of NoSql Databases


What is a NoSQL Database?

Typically expanded as 'non SQL' or 'not only SQL', a NoSQL database offers a mechanism for the storage and retrieval of data. The term actually encompasses a number of database technologies created to accommodate large volumes of data about users, products, and objects, along with demanding access patterns, performance requirements, and processing needs. For such workloads these can be more suitable than their relational counterparts, which struggle to handle the scale, speed, and agility demands of modern applications.

Types of NoSQL Databases

There are basically 4 types of these databases:

1. Key-value store – These are the least complex options and are designed to store data without a schema. Every piece of data is stored against an indexed key, thus the name.

2. Column Store – Column or wide-column stores are designed to hold large volumes of data as sections or columns, thus the name. This type of NoSQL database stores data in columns rather than rows, ensuring high performance and a scalable structure.

3. Document Database – These databases hold more complex data, and each document has its own key for easy retrieval. Document databases are designed to store, manage, and extract information that is mainly in document form, also called semi-structured data.

4. Graph – As the name implies, these NoSQL databases are based on graphs: they store data as interconnected components with variable relationships between them.
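As a toy illustration of the first type, a schema-less key-value store can be sketched in a few lines of Python (an in-memory model for illustration, not a production engine): every record is opaque data indexed by a single key.

```python
class KeyValueStore:
    """Toy schema-less key-value store: values are opaque, lookup is by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # no schema: any value shape is accepted

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KeyValueStore()
# Values of completely different shapes live side by side under their keys.
store.put("session:9f3", {"user": "alice", "cart": ["sku-1", "sku-7"]})
store.put("page:home", "<html>...</html>")
```

Real key-value engines add persistence, replication, and sharding on top of exactly this interface.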

Why NoSQL?

A NoSQL database allows easy and quick retrieval of complex data and ensures its availability on a consistent basis. Since these systems are built on a distributed architecture, failures can be handled quickly and effectively: if a node goes down, the others continue operating without data loss, ensuring consistent performance around the clock.

Map Reduce Interview Questions and Answers


  1. What is MapReduce?

MapReduce is a Java-based programming model of the Hadoop framework that provides scalability across Hadoop clusters.

  2. How does MapReduce work in Hadoop?

MapReduce distributes the workload into two different jobs, a Map job and a Reduce job, which can run in parallel. The Map job breaks the data sets down into key-value pairs (tuples). The Reduce job then takes the output of the Map job and combines the data tuples into a smaller set of tuples.
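The two jobs above can be sketched in plain Python (an illustrative word-count simulation, not Hadoop's actual Java API):

```python
from collections import defaultdict

def map_job(text):
    # Map phase: break the input into intermediate key-value pairs.
    return [(word, 1) for word in text.split()]

def reduce_job(pairs):
    # Reduce phase: combine all values sharing a key into a smaller set of tuples.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

counts = reduce_job(map_job("big data big insight"))
# counts == {"big": 2, "data": 1, "insight": 1}
```

In real Hadoop the map outputs are shuffled across the cluster so each reducer sees every pair for its keys; here both phases run in one process for clarity.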

  3. What is a 'key-value pair' in MapReduce?

A key-value pair is the intermediate data generated by the maps and sent to the reduces for generating the final output.

  4. What is the difference between the MapReduce engine and an HDFS cluster?

An HDFS cluster is the name given to the whole configuration of master and slaves where data is stored. The MapReduce engine is the programming module used to retrieve and analyze that data.

  5. Is map like a pointer?

No, Map is not like a pointer.

  6. Why is the number of splits equal to the number of maps?

The number of maps is equal to the number of input splits because we want the key-value pairs of all the input splits.

  7. Is a job split into maps?

No, a job is not split into maps. A split is created for the file; the file is placed on datanodes in blocks, and for each split a map is needed.

  8. How can you set an arbitrary number of mappers to be created for a job in Hadoop?

This is a trick question: you cannot set it directly. The number of mappers is determined by the number of input splits.

  9. How can you set an arbitrary number of reducers to be created for a job in Hadoop?

You can either do it programmatically, using the method setNumReduceTasks on the JobConf class, or set it up as a configuration setting.

  10. How will you write a custom partitioner for a Hadoop job?

The following steps are needed to write a custom partitioner:

– Create a new class that extends the Partitioner class

– Override the method getPartition

– In the wrapper that runs the MapReduce job, either add the custom partitioner to the job programmatically using the method setPartitionerClass, or add it to the job as a config file (if your wrapper reads from a config file or Oozie)

  11. What is the difference between the TextInputFormat and KeyValueInputFormat classes?

TextInputFormat reads lines of text files and provides the byte offset of each line as the key to the Mapper and the actual line as the value.

KeyValueInputFormat reads text files and parses each line into a key-value pair: everything up to the first tab character is sent as the key to the Mapper, and the remainder of the line is sent as the value.
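The KeyValueInputFormat split rule is easy to sketch in Python (illustrative only; the real class is part of Hadoop's Java API):

```python
def key_value_split(line, separator="\t"):
    # Everything up to the first separator is the key; the remainder is the
    # value, mirroring KeyValueInputFormat's behaviour.
    key, _, value = line.partition(separator)
    return key, value

key, value = key_value_split("user42\tclicked ad 7")
# key == "user42", value == "clicked ad 7"
```

Note that a line with no tab yields the whole line as the key and an empty value, which matches how the Hadoop class degrades.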

  12. What is a Combiner?

The Combiner is a "mini-reduce" process which operates only on data generated by a mapper. The Combiner receives as input all data emitted by the Mapper instances on a given node, and its output is then sent to the Reducers instead of the output from the Mappers.
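A combiner's effect can be sketched in Python (an illustrative simulation with made-up mapper output, not Hadoop's API): it pre-aggregates one node's map output before anything crosses the network.

```python
from collections import defaultdict

def combine(mapper_output):
    # Pre-sum counts locally so fewer pairs are shuffled to the reducers.
    local = defaultdict(int)
    for key, value in mapper_output:
        local[key] += value
    return list(local.items())

# One node's mapper emitted four pairs; the combiner forwards only two.
emitted = [("spark", 1), ("hadoop", 1), ("spark", 1), ("spark", 1)]
combined = combine(emitted)
```

Because the combiner's output feeds the same reduce logic, it must be associative and commutative (sums and counts qualify; averages do not, unless restructured).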

  13. If no custom partitioner is defined in Hadoop, how is data partitioned before it is sent to the reducer?

The default partitioner computes a hash value for the key and assigns the partition based on this result.
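In Python pseudo-form (Hadoop's real HashPartitioner is Java and uses the key's hashCode, so the actual hash values differ), the idea looks like:

```python
def default_partition(key, num_reducers):
    # Hash the key, then take it modulo the number of reducers; Python's %
    # on a positive modulus always yields a non-negative partition index.
    return hash(key) % num_reducers
```

The essential property is determinism within a job: every occurrence of the same key maps to the same partition, so one reducer sees all of that key's values.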

  14. Have you ever used Counters in Hadoop? Give an example scenario.

Anybody who claims to have worked on a Hadoop project is expected to have used Counters, for example to count the number of malformed records skipped while parsing input.


  15. Is it possible to provide multiple inputs to Hadoop? If yes, how can you give multiple directories as input to a Hadoop job?

Yes. The input format class provides methods to add multiple directories as input to a Hadoop job.

  16. Is it possible to have Hadoop job output in multiple directories? If yes, how?

Yes, by using the MultipleOutputs class.

  17. What are the basic parameters of a Mapper?

The basic parameters of a Mapper are its input and output key-value types, for example LongWritable and Text as input, and Text and IntWritable as output (as in the classic word-count job).

  18. What is the function of the MapReduce partitioner?

The function of the MapReduce partitioner is to make sure that all the values of a single key go to the same reducer, which eventually helps distribute the map output evenly over the reducers.

  19. What is the difference between an Input Split and an HDFS Block?

The logical division of data is known as a Split, while the physical division of data is known as an HDFS Block.

  20. What are the main configuration parameters that the user needs to specify to run a MapReduce job?

The user of the MapReduce framework needs to specify:

  • Job’s input locations in the distributed file system

  • Job’s output location in the distributed file system

  • Input format

  • Output format

  • Class containing the map function

  • Class containing the reduce function

  • JAR file containing the mapper, reducer and driver classes

What are the advantages of BIG DATA Analytics, and how will it impact the future


In the present era, big data analytics is no longer used only for experimenting. Many companies have begun to achieve far more tangible results with it, and they are expanding their efforts to cover more data and models. The term describes the collection, availability, and processing of huge volumes of streaming data in real time, characterized by the three V's: volume, velocity, and variety. To make more accurate decisions, companies are combining their marketing, customer, sales, and transactional data with external data and social conversations, such as stock prices, news, and weather, in order to identify correlations and build statistically valid models.

  1. Timely:   It can save plenty of time, since knowledge workers reportedly spend up to 60% of each working day attempting to find and manage data.

  2. Accessible:   Half of senior executives report that accessing the right data is difficult, so anything that makes the right data easier to reach is valuable.

  3. Trustworthy:  On average, only around 29% of companies measure the monetary cost of poor data quality. Even simple things, like monitoring customer contact information updates across multiple systems, can save a company millions of dollars.

  4. Relevant:   Keeping irrelevant data is a curse for a database, since it complicates filtering. Yet statistics say around 43% of companies have tools that are unable to filter out junk data. Something as simple as filtering customers out of your web analytics can provide insight into your acquisition efforts.

  5. Secure: With data hosting and modern technology, companies can secure their infrastructure; a security breach costs a company an average of $214 per compromised record, and with this technology a company can save up to 1.6% of its revenue per year.

Big data technology, like cloud-based analytics, can also provide substantial cost advantages. Comparing it directly with traditional architectures such as data marts and warehouses is difficult, because the functionality differs; but it is the magnitude of the price difference that drives the improvements.

Companies that traditionally drew on many data sources to understand customers found it difficult to integrate them in real time and act on them. Analytics has always played a major role in attempts to improve decision making, and big data doesn't change that. Nowadays large organizations are in search of both better and faster decisions, and they are finding them. Driven by speed and compatibility, several companies are now speeding up their decisions using this powerful technology.

Perhaps the most astounding use of this analytics is to create and innovate new products and services for customers. Online companies have been doing this for perhaps a decade, but now primarily offline firms are doing it as well: companies that have made major investments in new service models for their industrial products are using analytics. So the direction of this era is pretty clear, and it's always advisable to say: don't wait too long to adopt!

3 Technologies Empowering Digital Marketing


Digital marketing has become the most important type of marketing in the world. The biggest advantage that digital marketing has over traditional marketing is that it is targeted. When you publish an ad in a newspaper, you hope that a few of the people who see the ad will be interested in your products. With digital marketing, advertisers can choose who sees their ads, which allows for much greater ROI. It also allows smaller businesses to advertise at low costs since they only pay for the people who actually end up looking at their ads.

All this targeting happens thanks to data analytics. Big data plays a big part in the process; without it, targeting would be impossible. In order to create a marketing profile of customers, systems have to comb through billions of online searches to surface the patterns and trends that help marketers target the right people. This is not easy work, but it is worth the time and cost put into it. There are many exciting technologies that allow marketers to reap the benefits of big data, such as:


MongoDB

MongoDB is a fantastic document database for marketing analytics. Analytics is all about the relations between different variables: marketers need to predict what type of person would be interested in a certain product, and they use such relationship data (for example, this person seems interested in comic books, so they might want to buy comic-book-themed apparel) to power online ad networks. MongoDB's advantages are its JSON-like document format and its ability to handle dynamic schemas.
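The relationship-data idea can be sketched with JSON-like documents in plain Python (the collection, field names, and people below are invented for illustration; this is not the MongoDB/pymongo API itself):

```python
# Documents with a dynamic schema: no fixed columns, just queryable fields.
shoppers = [
    {"name": "Ana", "interests": ["comic books", "film"]},
    {"name": "Ben", "interests": ["cycling"]},
    {"name": "Cara"},  # no interests field at all; still a valid document
]

def match_audience(collection, interest):
    # Select shoppers whose documents record the given interest.
    return [doc["name"] for doc in collection if interest in doc.get("interests", [])]

audience = match_audience(shoppers, "comic books")  # ["Ana"]
```

In MongoDB proper, the same selection would be a find query on the collection; the point here is only that documents of varying shape can be queried uniformly.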

Analytics with R

R is a godsend for people who need to work with data analytics and processing. It is among the most used statistical tools in the world and for good reason. It combines many different data analysis techniques in one place. R empowers advertisers and developers both by putting everything they need within their reach. Another great advantage of R is its ability to create engaging and beautiful data representations which allow non-technical people to understand its results.

Scala & Spark

Spark, commonly programmed in Scala, is one of the fastest engines out there for processing data on a very large scale. When things start to get serious and you are handling millions of rows of data, Scala and Spark are the way to do it at the greatest speed possible. The combination is being used more and more often due to its advantages over plain Python, which can be very slow for today's large-scale data processing needs, while Scala and Spark can easily deliver in the same situations.

There are many other similar technologies that have made big data processing possible. The advertisers and developers of tomorrow will have to make themselves familiar with big data technologies. As the Internet of Things becomes a reality, the available data will only increase and the need for processing even more data than we do now will become apparent.