
Literature Review on Big Data Analytics Methods

Submitted: 18 February 2019 Reviewed: 14 May 2019 Published: 24 October 2019

DOI: 10.5772/intechopen.86843


From the Edited Volume

Social Media and Machine Learning

Edited by Alberto Cano


Companies and industries are faced with a huge amount of raw data, which contain information and knowledge hidden within them. Moreover, the format, size, variety, and velocity of generated data make it complex for industries to use them efficiently and effectively. This complexity in data analysis and interpretation leads organizations to deploy advanced tools and techniques to overcome the difficulties of managing raw data. Big data analytics is an advanced method with the capability to manage such data. It deploys machine learning (ML) techniques and deep learning (DL) methods to benefit from gathered data. In this research, both ML and DL methods are discussed, and an ML/DL deployment model for Internet of Things (IoT) data is proposed.

  • big data analytics
  • machine learning
  • deep learning

Author Information

Iman Raeesi Vanani

  • Information Technology Management, Allameh Tabataba’i University, Iran

Setareh Majidian *

*Address all correspondence to: [email protected]

1. Introduction

The digital era, with its opportunities and complexity, overwhelms industries and markets that face a huge amount of potential information in each transaction. Being aware of the value of gathered data and benefiting from its hidden knowledge creates a new paradigm in this era, one that redefines the meaning of power for corporations. The power of information leads organizations toward agility and toward achieving their goals. Big data analytics (BDA) enables industries to describe, diagnose, predict, and prescribe, and to cognitively uncover hidden growth opportunities, leading them toward business value [ 68 ]. BDA deploys advanced analytical techniques to create knowledge from an exponentially increasing amount of data, which supports the decision-making process by reducing its complexity [ 43 ]. BDA needs novel and sophisticated algorithms that process and analyze real-time data and produce high-accuracy analytics. Machine learning and deep learning contribute complex algorithms to this process, selected according to the problem at hand [ 28 ].

In this research, a literature review on big data analytics, deep learning and its algorithms, and machine learning and related methods is presented. As a result, a conceptual model is provided that shows the relations among the algorithms and helps researchers and practitioners deploy BDA on IoT data.

The DL and ML methods discussed in this research are outlined in Figure 1 .

Figure 1. The big data analytics methods in this research.

2. Big data and big data analytics

One of the vital consequences of the digital world is the creation of large collections of raw data. Managing this valuable asset, which comes in different shapes and sizes, according to the organization’s needs requires managerial attention. Big data has the power to affect all parts of society, from social life to education and everything in between. As the amount of data increases, especially in technology-based companies, managing raw data becomes much more important. The variety, velocity, and volume of raw big data call for advanced tools to overcome its complexity and uncover its hidden content. Therefore, big data analytics has been proposed for “experimentation,” “simulations,” “data analysis,” and “monitoring.” Machine learning, as one of the BDA tools, provides a basis for predictive analysis from supervised and unsupervised data input. In fact, the power of machine learning analytics depends directly on the data input: the more exact and accurate the input data, the more effective the analytical performance. Also, deep learning, as a subfield of machine learning, is deployed to extract knowledge from hidden trends in data [ 28 ].

3. Big data analytics

In the digital era, with a growing rate of data production, big data has emerged, characterized by high volume, variety, veracity, velocity, and value. Big data is hard to analyze, which requires organizations to adopt new analytical approaches and tools to overcome the complexity and massiveness of different types of data (structured, semistructured, and unstructured). The sophisticated technique that copes with the complexity of big data by analyzing huge volumes of data is known as big data analytics [ 50 ]. The term big data analytics was first coined by Chen, Chiang, et al. (2012), who pointed out the relation between business intelligence and analytics, which has strong ties with data mining and statistical analysis [ 11 ].

Big data analytics supports organizations in innovation, productivity, and competition [ 16 ]. It has been defined as the techniques deployed to uncover hidden patterns and bring insight into interesting relations in context by examining, processing, discovering, and exhibiting the results [ 69 ]. Reducing complexity and handling the cognitive burden in a knowledge-based society pave the way toward the advantages of big data analytics. The most vital factor in the success of big data analytics is feature identification: the crucial features that strongly affect the results should be defined. This is followed by identifying correlations between the inputs and a given dynamic target, which may change over time [ 69 ].

As a result of the fast evolution of big data analytics, e-business and dense global connectivity have flourished. Governments also take advantage of big data analytics to provide better services to their citizens [ 69 ].

In a business context, big data can be managed and analyzed through big data analytics, which is a specific application of this field. Big data gathered from social media can also be managed efficiently through the big data analytics process. In this way, customer behavior can be understood and the five features of big data, namely volume, velocity, value, variety, and veracity, can be handled. Big data analytics not only helps businesses build a comprehensive view of consumer behavior but also helps organizations be more innovative and effective in deploying strategies [ 14 ]. Small and medium-sized companies use big data analytics to mine their semistructured big data, which results in better product recommendation systems and improved website design [ 19 ]. As noted in Ref. [ 9 ], firms gain advantages by applying big data analytics technologies and techniques to their massive data to improve their performance.

According to Ref. [ 19 ], the importance of big data analytics lies in the fact that the decision-making process is supported by insight, which results from processing diverse data. This turns decision-making into an evidence-based process. Insight extraction from big data is divided into two main processes, data management and data analytics: the former refers to technology support for gathering, storing, and preparing data for analysis, and the latter refers to the techniques deployed to analyze data and extract knowledge from it. Thus, big data analytics is known as a subprocess of insight extraction. Big data analytics tools include text analytics, audio analytics, video analytics, social media analytics, and predictive analytics. It can be inferred that big data analytics is the main tool for analyzing and interpreting all kinds of digital information [ 35 ]. The processes involved are data storage, data management, data analysis, and data visualization [ 9 ].

Big data analytics has the potential to create value effectively and efficiently in both the operational and strategic activities of an organization, and it acts as a game changer in augmenting productivity [ 20 ].

Industry practitioners believe that big data analytics is the next ‘blue ocean’ that brings opportunities for organizations [ 33 ], and it is known as “the fourth paradigm of science” [ 70 ].

The fields of machine learning (ML) and deep learning (DL) have expanded to deal with BDA. Fields such as “medicine,” “Internet of Things (IoT),” and “search engines” deploy ML to explore the predictive features of big data; in other words, ML generalizes learned patterns to predict future data. Feature construction and data representation are two main elements of ML. DL, a subfield of ML inspired by the way the human brain processes neural signals, is deployed to extract useful data from big data [ 28 ].

4. Big data analytics and deep learning

Deep learning was introduced in the 1940s [ 71 ], but the birth of deep learning algorithms is dated to 2006, when Hinton introduced the greedy layer-wise learning method to overcome a deficiency of neural network (NN) training: getting trapped in local optima instead of reaching the optimal point, a problem that is exacerbated when the training data are insufficient. The underlying idea of Hinton’s method is to use unsupervised learning before layer-by-layer training takes place [ 72 ].

Inspired by the hierarchical structure of the human brain, deep learning algorithms extract complex hidden features with a high level of abstraction. When massive amounts of unstructured data are present, the layered architecture of deep learning algorithms works effectively. The goal of deep learning is to deploy multiple transformation layers, each of which produces an output representation [ 42 ]. Big data analytics can exploit all the untapped knowledge learned by deep learning. The main feature of deep learning, extracting underlying features from huge amounts of data, makes it a beneficial tool for big data analytics [ 42 ].

Deep learning algorithms include the following [ 24 ]:

  • convolutional neural networks (CNN)
  • restricted Boltzmann machines
  • autoencoders
  • sparse coding

4.1 Convolutional neural networks (CNN)

CNN, a type of deep learning algorithm inspired by the neural network model, has an architecture of “convolutional layers” and “subsampling layers.” In the multi-instance setting, data are represented as bags of instances, in which each data point is a set of instances [ 73 ].

CNN is characterized by three features, namely “local field,” “subsampling,” and “weight sharing,” and comprises three kinds of layers: an input layer; hidden layers, which consist of “convolutional layers” and “subsampling layers”; and an output layer. In the hidden part of the network, convolutional and subsampling layers alternate. CNN training is done in two phases: a “feed forward” pass, in which the result of each level is fed into the next level, and a “back propagation” pass, which corrects errors and deviations by propagating training errors backward through the hierarchy [ 74 ]. In the first layer, a convolution operation applies various filters to each instance; then a nonlinear transformation function maps the result of the previous phase into a nonlinear space. After that, the transformed representation enters a max-pooling layer, which represents the bag of instances. This step takes the maximum response of each instance obtained in the filtering step. The resulting representation, built from the maximum responses, can be used to predict the status of instances in each class, which leads to the construction of a classification model [ 73 ].

CNN comprises a feature identifier, which automatically learns features extracted from the data and has two components: convolutional and pooling layers. Another element of CNN is the multilayer perceptron, which passes the learned features into the classification phase [ 3 ].
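The convolution, nonlinearity, and max-pooling sequence described above can be sketched in plain Python for a single 2-D input and a single filter. This is a minimal sketch under stated assumptions, not the multi-instance CNN of [ 73 ]: the function names, the hand-picked filter, the ReLU nonlinearity, and the 2×2 pooling window are illustrative choices, and a real CNN learns its filters by back propagation.

```python
# Hypothetical helper names; an illustrative sketch, not a library API.

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (what CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same kernel weights are reused at every position (weight sharing),
            # and each output only sees a kh x kw patch (local field).
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def relu(feature_map):
    """Nonlinear transformation applied element-wise (an assumed choice)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Subsampling: keep the maximum response in each non-overlapping window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + a][j + b]
                           for a in range(size) for b in range(size)))
        pooled.append(row)
    return pooled


# Hypothetical 5x5 input containing a vertical edge, and a vertical-edge filter.
image = [[0, 0, 1, 1, 1]] * 5
kernel = [[-1, 1], [-1, 1]]

features = max_pool(relu(conv2d(image, kernel)))
print(features)
```

The pooled map keeps a strong response only where the edge lies, which is the kind of compact feature that the multilayer perceptron stage would then classify.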

4.2 Deep neural network (DNN)

Advances in computational algorithms and methods have introduced a deep architecture for supervised data, called the deep neural network (DNN) [ 3 ]. It originates from shallow artificial neural networks (SANN), which are related to artificial intelligence (AI) [ 30 ].

Because the hierarchical architecture of DL can represent nonlinear information in a set of layers, DNN deploys a layered architecture with complex functions to deal with complexity through a high number of layers [ 3 ].

DNN is known as one of the most prominent tools for classification [ 49 ] because of its outstanding performance in complex classification problems. One of the most challenging issues in DNN is its training performance: as an optimization problem, training minimizes an objective function with a large number of parameters in a multidimensional search space. Therefore, finding a proper optimization algorithm for training a DNN requires a high level of attention. A DNN can be constructed from a stacked denoising autoencoder (SDAE) structure [ 75 ], with a number of cascaded autoencoder layers and a softmax classifier. The autoencoder layers use raw data to generate novel features, and the softmax classifier then classifies these features accurately. These components complement each other, which helps DNN perform its main task, classification, effectively. Gradient descent (GD), an optimization method, can be deployed in linear problems without complex objective functions, especially in DNN training; the main condition of this procedure is that the values of the optimization parameters be near the optimal solution [ 6 ]. According to Ref. [ 30 ], DNN, with its deep architecture, is deployed as a prediction model.
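The softmax classifier and gradient descent mentioned above can be illustrated with a single softmax layer trained by batch gradient descent on the cross-entropy loss. This is a deliberately small sketch: it omits the stacked denoising autoencoder layers of [ 75 ], and the function names, learning rate, epoch count, and toy data are assumptions chosen for illustration.

```python
import math

# Hypothetical helper names; a sketch of a softmax output layer, not a full DNN.


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax(X, y, n_classes, lr=0.5, epochs=500):
    """Batch gradient descent on the mean cross-entropy loss of a linear softmax layer."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_features for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            scores = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(n_classes)]
            p = softmax(scores)
            for c in range(n_classes):
                # d(loss)/d(score_c) = p_c - 1[c == label]
                err = p[c] - (1.0 if c == label else 0.0)
                gb[c] += err
                for j in range(n_features):
                    gW[c][j] += err * x[j]
        # Take one step against the averaged gradient.
        for c in range(n_classes):
            b[c] -= lr * gb[c] / len(X)
            for j in range(n_features):
                W[c][j] -= lr * gW[c][j] / len(X)
    return W, b


def predict(W, b, x):
    scores = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]
    return max(range(len(W)), key=lambda c: scores[c])


# Toy, linearly separable data (hypothetical): class 0 near (0, 0), class 1 near (1, 1).
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y = [0, 0, 1, 1]
W, b = train_softmax(X, y, n_classes=2)
print([predict(W, b, x) for x in X])
```

Because the parameters start at zero rather than near an optimum, this toy problem relies on its loss being convex; in a deep network the same update rule is applied to every layer through back propagation, where the starting point matters much more.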

4.3 Recurrent neural network (RNN)

RNN, a network of neuron-like nodes, was developed in the 1980s. The nodes are interconnected and can be divided into input, hidden, and output neurons, which respectively receive the data, transform it, and generate results. Each neuron has a time-varying real-valued activation, and every synapse has an adjustable real-valued weight [ 66 ]. Neural networks perform outstandingly not only in learning and approximation [ 105 ] but also in nonlinear dynamic system modeling using present data [ 29 , 52 ]. RNN, a human brain–inspired algorithm, has been derived from the artificial neural network, although the two are slightly different. Fields such as “associative memories,” “image processing,” “pattern recognition,” “signal processing,” “robotics,” and “control” have been the focus of RNN research [ 67 ]. Through its feedback and feed forward connections, RNN can take a comprehensive view of past information and use it to adjust to sudden changes. RNN can also use time-varying data recursively, which simplifies the neural network architecture. Its simplicity and dynamic features work effectively in real-time problems [ 40 ]. Another capability of RNN is processing temporal data hierarchically, using multiple layers of abstraction to represent dynamic features [ 18 ]. RNN can connect signals at different levels, which brings significant processing power along with huge amounts of memory space [ 45 ].
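The recurrence described above, in which each hidden state depends on the current input and on the previous hidden state through feedback connections, can be sketched as the forward pass of a simple (Elman-style) RNN. The function name, weights, sizes, and tanh activation are assumptions for illustration; training (for example, back propagation through time) is not shown.

```python
import math

# Hypothetical function name and weights; a forward pass only, no training.


def rnn_forward(inputs, W_x, W_h, b, h0):
    """Run h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a sequence; return all hidden states."""
    h = h0
    states = []
    for x in inputs:
        # The comprehension reads the previous h before the new one is assigned.
        h = [
            math.tanh(
                sum(W_x[i][j] * x[j] for j in range(len(x)))     # input contribution
                + sum(W_h[i][k] * h[k] for k in range(len(h)))   # feedback from previous state
                + b[i]
            )
            for i in range(len(b))
        ]
        states.append(h)
    return states


# Hypothetical 1-input, 2-unit network fed a single pulse followed by zeros.
W_x = [[0.5], [-0.5]]
W_h = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
sequence = [[1.0], [0.0], [0.0]]
states = rnn_forward(sequence, W_x, W_h, b, h0=[0.0, 0.0])
for t, h in enumerate(states):
    print(t, [round(v, 4) for v in h])
```

After the pulse, the input is zero, yet the hidden state stays nonzero and decays gradually: the feedback weights carry past information forward in time.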

5. Big data analytics and machine learning

Machine learning has been defined as a set of predictive algorithms that interpret data and learn from it rather than following an explicitly structured program. The three main categories of ML are supervised, unsupervised, and reinforcement learning [ 47 ], and learning proceeds through the “data preprocessing,” “learning,” and “evaluation” phases. Preprocessing transforms raw data into a form that can be used in the learning phase and comprises steps such as cleaning, extracting, transforming, and combining the data. In the evaluation phase, a data set is selected, and performance evaluation, statistical tests, and estimation of errors or deviations take place, which may lead to modifying the parameters selected in the learning process [ 76 ].

Supervised learning, the first category, analyzes the features that are critical for classification using given training data. The algorithm is trained on these data and then used on unlabeled test data. The output generated for the unlabeled data is a classification if it is discrete or a regression if it is continuous. In contrast, ML can identify patterns without a training process, which is called unsupervised ML. In this category, when patterns of characteristics are used to group the data, cluster analysis is formed, and when the hidden rules of the data are recognized, another form of ML, association, is formed [ 77 ]. In other words, the main process of unsupervised ML, or clustering, is to find natural groupings in unlabeled data. In this process, the data points in each of the K clusters are much more similar to one another than to points in other clusters according to a similarity measure. The three categories of unsupervised ML are “hierarchical,” “partitioned,” and “overlapping” techniques. “Agglomerative” and “divisive” are the two kinds of hierarchical methods. In agglomerative methods, each element starts as a separate cluster and tends to merge into larger clusters, whereas in divisive methods, the whole set starts as one cluster and is divided into smaller clusters. “Partitioned” methods create several disjoint clusters from the data set without any hierarchical structure, and “overlapping” techniques try to find fuzzy or defuzzified partitions by “relaxing the mutually disjoint constraint.” Among all unsupervised learning techniques, K-means attracts particular attention. “Simplicity” and “effectiveness” are two main characteristics of unsupervised techniques [ 47 ].
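K-means, singled out above, is usually computed with Lloyd’s iteration: assign each point to its nearest center, move each center to the mean of its assigned points, and stop when the assignments no longer change. A minimal sketch, assuming Euclidean distance and, for simplicity, using the first k points as initial centers (practical implementations typically use random or k-means++ seeding); the function name and data are hypothetical.

```python
import math

# Hypothetical function name; Lloyd's algorithm with naive initialization.


def kmeans(points, k, max_iter=100):
    """Return (labels, centers); initial centers are the first k points (illustrative choice)."""
    centers = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centers


pts = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.2), (0.1, 0.4), (5.2, 4.8), (4.9, 5.3)]
labels, centers = kmeans(pts, k=2)
print(labels, centers)
```

The result depends on the initial centers, which is the K-means weakness that later sections address with genetic algorithms.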

5.1 Machine learning and fuzzy logic

Fuzzy logic, proposed by Lotfi Zadeh (1965), has been deployed in many fields, from engineering to data analysis and everything in between. Machine learning also benefits from fuzzy logic, since fuzzy methods support inductive inference. Fuzzy extensions have been developed in areas such as “fuzzy rule induction,” “fuzzy decision trees,” “fuzzy nearest neighbor estimation,” and “fuzzy support vector machines” [ 27 ].

5.2 Machine learning and classification methods

Classification is one of the most critical aspects of ML [ 23 ] and the initial phase in data analytics [ 17 ]. Prior studies have applied it to new fields such as face recognition and even handwriting recognition. According to [ 23 ], classification algorithms are divided into two categories: offline and online. In the offline approach, a static data set is used for training; once training is finished, the classifier stops learning and its structure cannot be modified. The online category, in contrast, is a “one-pass” type that learns from new data as they arrive. The prominent features of the data are stored in memory and kept until the processed training data is erased. Incremental and evolving processes are the two main approaches in the online category; evolving processes handle changing data patterns in unstable environments through an evolving system structure and continuously updated meta-parameters [ 23 ].
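The “one-pass” online idea can be illustrated with an incremental nearest-centroid classifier: each arriving sample updates its class’s running mean and is then discarded, so the model keeps only a compact summary (centroids and counts). This is a hypothetical minimal example of online learning, not one of the specific incremental or evolving methods surveyed in [ 23 ]; the class and method names are assumptions.

```python
import math

# Hypothetical class and method names; illustrates one-pass (online) learning only.


class OnlineCentroidClassifier:
    """Stores per-class running means, never the raw training samples."""

    def __init__(self):
        self.centroids = {}  # label -> running mean vector
        self.counts = {}     # label -> number of samples seen so far

    def learn_one(self, x, label):
        if label not in self.centroids:
            self.centroids[label] = list(x)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        mean = self.centroids[label]
        # Incremental mean update: m_n = m_{n-1} + (x - m_{n-1}) / n
        for j, xj in enumerate(x):
            mean[j] += (xj - mean[j]) / n

    def predict_one(self, x):
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


clf = OnlineCentroidClassifier()
stream = [((0.0, 0.0), "a"), ((4.0, 4.0), "b"), ((0.2, 0.4), "a"), ((3.8, 4.2), "b")]
for x, label in stream:  # samples arrive one at a time and are not stored
    clf.learn_one(x, label)
print(clf.predict_one((0.1, 0.1)), clf.centroids["a"])
```

Unlike an offline classifier, this model can keep learning after deployment, because each update needs only the new sample and the stored summary.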

The support vector machine (SVM) was proposed in 1995 by Cortes and Vapnik and, owing to its outstanding learning performance, is used to solve multidimensional classification and regression problems [ 64 ]. SVM constructs a hyperplane in a high-dimensional space that divides the data into two categories; the main objective of the method is to find the hyperplane with the greatest margin between the two categories [ 10 ]. “Statistical learning theory,” the “Vapnik-Chervonenkis (VC) dimension,” and the “kernel method” are the underlying factors in the development of SVM [ 78 ], which achieves good generalization from a limited number of learning patterns by following structural risk minimization [ 22 ].
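The maximum-margin idea can be sketched with a linear SVM trained by subgradient descent on the regularized hinge loss, in the style of the Pegasos algorithm. This swaps in a simple primal solver for illustration; it is not the quadratic-programming or kernel formulation, and the function names, regularization constant `lam`, epoch count, and data are assumptions. Labels must be +1 or -1.

```python
# Hypothetical function names; a Pegasos-style linear SVM sketch, not a production solver.


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Subgradient descent on lam/2*||w||^2 + mean hinge loss.
    A constant 1.0 feature is appended so the last weight acts as a (regularized) bias."""
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:  # fixed order for reproducibility; Pegasos samples randomly
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink w (gradient of the regularizer, which widens the margin) ...
            w = [(1 - eta * lam) * wj for wj in w]
            # ... and, if the point lies inside the margin, push w toward classifying it correctly.
            if margin < 1:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def svm_predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print(w, [svm_predict(w, x) for x in X])
```

Only points with margin below 1 (the would-be support vectors) move the hyperplane; points already classified with a comfortable margin only trigger the shrinking step.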

The k-nearest neighbor (KNN) algorithm, one of the most important data mining algorithms, was first introduced for classification problems and was later extended to pattern recognition and machine learning research. Expert systems also use KNN to solve classification problems. However, KNN has the following drawbacks:

  • It is highly dependent on the value of the parameter K, which determines the size of the neighborhood.
  • It lacks the ability to discriminate between far and close neighbors.
  • Overlapping or noise may occur when neighbors are close [ 80 ].

Three main KNN classifiers that focus on the k nearest neighbors of the test sample in every class are as follows:

“Local mean-based k-nearest neighbor classifier (LMKNN)”: although this method reduces the negative influence of outliers, LMKNN is prone to misclassification because it takes a single value of k as the neighborhood size per class and applies it to all classes.

“Local mean-based pseudo nearest neighbor classifier (LMPNN)”: LMPNN combines the LMKNN and pseudo nearest neighbor (PNN) methods and is known as a good classifier that uses “multi-local mean vectors of k-nearest neighbors and pseudo nearest neighbor based on the multi-local mean vectors for each class.” This technique gives more consideration to outlier points and to sensitivity to k. However, because all classes have the same weight, differences in the information carried by the nearest samples cannot be fully recognized in classification [ 81 ].

“Multi-local means-based k-harmonic nearest neighbor classifier (MLMKHNN)”: MLMKHNN, an extension of KNN, uses the harmonic mean distance as its classification decision rule. It computes multi-local mean vectors of the k nearest neighbors in each class for every query sample, and the harmonic mean distance to these vectors is then used for classification [ 82 ]. These methods are designed to reach different classification decisions [ 81 ].
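Plain KNN takes a majority vote among the k nearest training samples; LMKNN, as described above, instead takes the k nearest samples within each class, averages them into a local mean vector, and assigns the query to the class whose local mean is closest. Both are sketched below under simple assumptions (Euclidean distance, the same k for every class, arbitrary tie-breaking); the function names and data are hypothetical, and the PNN-based and harmonic-mean variants are not shown.

```python
import math
from collections import Counter

# Hypothetical function names; minimal KNN and LMKNN sketches.


def knn_predict(train, query, k):
    """train: list of (vector, label). Majority vote among the k nearest neighbors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def lmknn_predict(train, query, k):
    """Local mean-based KNN: one local mean of the k nearest neighbors per class."""
    best_label, best_dist = None, math.inf
    for label in {lab for _, lab in train}:
        members = sorted((vec for vec, lab in train if lab == label),
                         key=lambda vec: math.dist(vec, query))[:k]
        local_mean = [sum(c) / len(members) for c in zip(*members)]
        d = math.dist(local_mean, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.1), "A"),
         ((3.0, 3.0), "B"), ((3.2, 2.9), "B"), ((2.9, 3.1), "B")]
print(knn_predict(train, (1.1, 1.0), k=3), lmknn_predict(train, (2.8, 2.7), k=2))
```

Averaging neighbors within each class is what dampens the effect of a single outlier, while the fixed k shared by all classes is the limitation noted for LMKNN.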

In 2006, Huang et al. proposed the extreme learning machine (ELM) as a classification method based on a single-hidden-layer feedforward neural network [ 92 ]. In this hidden layer, the input weights and biases are randomly generated, and the least squares method is used to determine the output weights analytically [ 17 ], which differentiates this method from traditional methods. Learning thus consists of finding the transformation matrix [ 93 , 94 , 95 , 96 , 97 , 98 , 99 , 100 , 101 , 102 , 103 ], which minimizes the sum-of-squares error function. The result of this minimization is then used for classification or dimensionality reduction [ 48 ]. Neural networks are divided into two categories, feedforward and feedback neural networks, and ELM belongs to the first category, which has a strong learning ability, especially in solving highly complex nonlinear functions. ELM combines this feature with fast learning to solve the training problem of traditional feedforward neural networks analytically, without iteration, and at a higher speed than traditional neural networks [ 13 ].
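The ELM training rule above (random, fixed input weights and biases; output weights solved by least squares in one step, without iteration) can be sketched as follows. Assumptions: sigmoid hidden units, a single output, and a small ridge term added to the normal equations for numerical stability (the basic ELM uses the Moore-Penrose pseudoinverse instead). The linear solver is written out in plain Python so the block has no dependencies; all names and data are hypothetical.

```python
import math
import random

# Hypothetical function names; a dependency-free ELM sketch.


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A is square)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def hidden(X, W, b):
    """Hidden-layer output matrix H with sigmoid activation."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(W[j], x)) + b[j])))
             for j in range(len(W))] for x in X]


def train_elm(X, t, n_hidden=10, ridge=1e-3, seed=0):
    rng = random.Random(seed)
    # Input weights and biases are drawn at random and never trained.
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden(X, W, b)
    # Output weights: least squares via (H^T H + ridge*I) beta = H^T t, solved once.
    HtH = [[sum(row[i] * row[j] for row in H) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    Htt = [sum(row[i] * ti for row, ti in zip(H, t)) for i in range(n_hidden)]
    return W, b, solve(HtH, Htt)


def elm_predict(model, X):
    W, b, beta = model
    return [sum(h * bj for h, bj in zip(row, beta)) for row in hidden(X, W, b)]


# Hypothetical binary task with targets +1 / -1; classify by the sign of the output.
X = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]]
t = [-1.0, -1.0, 1.0, 1.0]
model = train_elm(X, t)
print([1 if v > 0 else -1 for v in elm_predict(model, X)])
```

The only learned quantities are the output weights, obtained from one linear solve, which is why ELM trains much faster than networks tuned by iterative back propagation.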

Despite the efficiency of ELM in classification problems, binary classification problems reveal a deficiency of ELM, because such problems require a parallel training phase. In the twin extreme learning machine (TELM), this is solved by simultaneously training two nonparallel classification hyperplanes, which are then used for classification. Each hyperplane is obtained from a minimization problem that keeps it close to one class and far away from the other class [ 60 ]. ELM is at the center of attention in data stream classification research [ 83 ].

5.3 Machine learning and clustering

Clustering, as an unsupervised learning method, aims to create groups of clusters whose members share similar characteristics and are dissimilar to the members of other clusters [ 84 ]. The interpoint distance between observations within a cluster is small in comparison with their distance to points in other clusters [ 36 ]. “Exploratory pattern-analysis,” “grouping,” “decision-making,” and “machine-learning situations” are some of the main applications of clustering. Clustering techniques fall into five groups: “hierarchical clustering,” “partitioning clustering,” “density-based clustering,” “grid-based clustering,” and “model-based clustering” [ 84 ]. Clustering problems are also divided into two categories: generative and discriminative approaches. The former maximizes the probability of sample generation and is used for learning generative models, and the latter uses pairwise similarities to maximize intracluster similarity and minimize similarity between clusters [ 63 ].

Important clustering methods such as K-means clustering, kernel K-means, spectral clustering, and density-based clustering algorithms have been at the center of research for several decades. In K-means clustering, each data point is assigned to the nearest center, which makes the method unable to detect nonspherical clusters. Kernel K-means and spectral clustering map the data into a feature space and then apply K-means clustering; kernel K-means obtains the feature space through a kernel function, and spectral clustering through a graph model. Spectral clustering additionally uses eigendecomposition techniques [ 26 ]. K-means clustering works effectively for clustering multidimensional numerical data [ 85 ].

DBSCAN is representative of density-based clustering, in which clusters are areas of higher density separated from the rest of the data set. This method does not require the number of clusters to be specified a priori. Instead, it relies on user-defined parameters to create clusters, and the clustering result is somewhat sensitive to these parameters [ 84 ].
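DBSCAN’s user-defined parameters are usually a neighborhood radius (`eps` below) and the minimum number of points (`min_pts`) needed to make a point a dense “core” point; clusters grow outward from core points, and points reachable from no core point are labeled noise. A compact O(n²) sketch, assuming Euclidean distance; the function name and parameter values are illustrative.

```python
import math

# Hypothetical function name; a straightforward O(n^2) DBSCAN sketch.


def dbscan(points, eps, min_pts):
    """Return a label per point: 0, 1, 2, ... for clusters and -1 for noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        # The neighborhood includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # only core points keep expanding the cluster
                queue.extend(j_nbrs)
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1),   # dense group
       (10, 10), (10, 11), (11, 10),     # second dense group
       (5, 20)]                           # isolated point
print(dbscan(pts, eps=1.5, min_pts=3))
```

The number of clusters (two here) emerges from the data rather than being fixed in advance, but shrinking `eps` would turn every point into noise, which illustrates the parameter sensitivity noted above.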

5.4 Machine learning and evolutionary methods

The main goal of optimization problems is to find an optimal solution among a set of alternatives. Finding the best solution becomes difficult when the search area is large. Heuristic algorithms offer different techniques for finding good solutions, but they may fail to find the best one. Population-based algorithms were developed to overcome this deficiency and find the best alternative [ 7 ].

5.5 Genetic algorithms (GA)

GA is defined as a randomized search that tries to find near-optimal solutions in complex, high-dimensional environments. In GA, chromosomes, which are strings of genes, are the main parameters of the technique, and each chromosome represents a point in the search space. A collection of chromosomes is called a population. After a random population is created, the goodness of each string is evaluated with an objective (fitness) function. Selected strings, some with multiple copies, then enter the mating pool. Through crossover and mutation, a new generation of strings is created from them. This process continues until a termination condition is met. “Image processing,” “neural network,” and “machine learning” are some examples of application fields for genetic algorithms [ 38 ]. GA, as a nature-inspired algorithm, is based on genetics and natural selection [ 31 ].
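The GA loop described above (random population, fitness evaluation, selection into a mating pool, crossover, mutation, repeat until a termination condition) can be sketched on bit-string chromosomes. The fitness function here (the number of 1 bits, the “OneMax” toy problem), tournament selection, one-point crossover, elitism, the mutation rate, and the generation limit are all illustrative assumptions, as is the function name.

```python
import random

# Hypothetical function name and parameter values; a minimal GA sketch.


def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=100,
                      crossover_rate=0.9, mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    # Random initial population of bit-string chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        # Tournament selection: fitter strings are more likely to enter the mating pool.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(population, key=fitness)
        if fitness(best) == n_bits:  # termination condition for this toy problem
            break
        next_gen = [best[:]]  # elitism: carry the best string over unchanged
        while len(next_gen) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


best = genetic_algorithm(fitness=sum)  # OneMax: fitness = number of 1 bits
print(sum(best), best)
```

Replacing `fitness` with a domain-specific score, such as a clustering metric, turns the same loop into the applications discussed in this section.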

GA tries to find the optimal solution regardless of the starting point [ 104 ]; GA also has the potential to find optimal clusterings with respect to clustering metrics [ 38 ]. Filter and wrapper search are the two main approaches of GA in feature selection. The former evaluates features by using heuristic data characteristics such as correlation, and the latter assesses the goodness of a GA solution by using a machine learning algorithm [ 53 ]. The K-means algorithm converges to a local optimum determined by its initial seed values, so the generated clusters depend on those values. By searching for initial seed values that yield near-optimal or optimal clusterings, GA outperforms the K-means algorithm and compensates for this shortcoming [ 4 ]. Knowledge discovery from databases is another application of GA, where it is used to build a “classifier system” and for “mining association rules” [ 58 ].

Feature selection is a vital problem in big data, which usually contains many features describing the target concepts; choosing an appropriate set of features during preprocessing has traditionally been a main task of data mining. Feature selection methods are divided into two groups: those independent of the learning algorithm, which use the filter approach, and those dependent on the learning algorithm, which use the wrapper approach. The main drawback of the filter approach is that it is independent of the learning algorithm, whereas the optimal feature set may depend on it. In contrast, the wrapper approach works better because it uses the learning algorithm to evaluate every feature set. The main problem of this approach is its computational complexity, which can be overcome by using GA in feature selection [ 56 ].
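The filter approach can be illustrated by ranking features with a score that involves no learning algorithm, here the absolute Pearson correlation between each feature and the target. This is a hypothetical minimal sketch with assumed function names and data; a wrapper approach would instead train and evaluate a learning algorithm on each candidate feature subset, which is what makes it more accurate but more expensive.

```python
import math

# Hypothetical function names; a filter-style feature ranking sketch.


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0  # constant feature carries no information


def filter_rank(X, y, top_k):
    """Rank feature indices by |correlation with y|; no learning algorithm is involved."""
    columns = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k], scores


# Hypothetical data: feature 0 tracks y, feature 1 is noise, feature 2 is constant.
X = [[1.0, 0.3, 5.0], [2.0, 0.9, 5.0], [3.0, 0.1, 5.0], [4.0, 0.7, 5.0]]
y = [1.1, 1.9, 3.2, 3.9]
selected, scores = filter_rank(X, y, top_k=1)
print(selected, [round(s, 3) for s in scores])
```

The ranking is cheap because each feature is scored once, but it ignores how features interact inside a particular learner, which is the drawback of the filter approach described above.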

5.6 Ant colony optimization (ACO)

The ant colony optimization (ACO) method was proposed by Dorigo [ 17 ] as a population-based stochastic method [ 15 ]. The method is biologically inspired by the food-seeking behavior of real ants; in other words, this bionic algorithm is deployed to find optimal paths [ 44 ]. When ants start to seek food, they deposit a chemical substance known as pheromone on the ground while moving toward the food source. The shorter the path between the food source and the nest, the larger the amount of pheromone on it. New ants in the system tend to choose the path with the greater amount of pheromone. Over time, through this positive feedback, all ants choose the shortest path, which is marked by the greatest amount of pheromone [ 86 ]. Recent applications of ant colony optimization include the traveling salesman problem, scheduling, structural and concrete engineering, digital image processing, electrical engineering, clustering, routing optimization [ 41 ], data mining [ 32 ], robot path planning [ 87 ], and deep learning [ 39 ].

The main advantages of ACO are:

Easy integration with other algorithms

Use of distributed parallel computing (e.g., intelligent search)

Better optimization performance than other swarm intelligence methods

High speed and high accuracy

Robustness in finding a quasi-optimal solution [ 41 ]

As stated above, the emitted pheromone causes individuals to cluster around the optimal position. In big data analytics, ant colony clustering is applied on a grid board to cluster data objects [ 21 ].

The ACO procedure consists of three steps:

Initializing the pheromone trail

Using the pheromone trail to construct solutions

Updating the pheromone trail

Each ant builds a complete solution using a probabilistic state transition rule that depends on the pheromone levels. The pheromone update has two phases: in the evaporation phase, a fraction of the pheromone evaporates, and in the reinforcement phase, pheromone is deposited according to the fitness of each solution. These steps repeat until a termination condition is met [ 46 ].
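The three steps and the update rule above can be sketched in Python for a small traveling salesman instance. This is an illustrative Ant System-style sketch, not the exact procedure of [ 46 ]: the transition weight tau^alpha * (1/d)^beta, the deposit q/L, and all parameter values are assumptions, and the termination condition is simply a fixed number of iterations.

```python
import math
import random

def tour_length(tour, dist):
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def aco_tsp(coords, n_ants=10, iterations=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    tau = [[1.0] * n for _ in range(n)]          # step 1: initialize pheromone trails
    best_tour, best_len = None, float("inf")
    for _ in range(iterations):
        tours = []
        for _ in range(n_ants):                  # step 2: each ant builds a complete tour
            start = rng.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # probabilistic state transition rule: pheromone^alpha * (1/distance)^beta
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour, tour_length(tour, dist)))
        for i in range(n):                       # step 3a: evaporation on every edge
            for j in range(n):
                tau[i][j] *= 1 - rho
        for tour, length in tours:               # step 3b: reinforcement, shorter tour = more pheromone
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len

# six cities on a regular hexagon with unit side: the optimal tour is the perimeter
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
tour, length = aco_tsp(hexagon)
print(tour, round(length, 6))
```

Raising beta makes ants greedier toward short edges, while raising rho makes the colony forget old trails faster; both are tuning choices rather than fixed parts of ACO.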

Ant colony decision tree (ACDT) is a variant of ant colony optimization that builds decision trees as the algorithm runs. Because it is nondeterministic, a different decision tree may be created in every execution. The ACDT algorithm is based on pheromone trails on the edges combined with the heuristics used in classical decision tree algorithms.

The multilevel ant colony algorithm was proposed because single-level ant colony optimization was found unable to reach optimal solutions efficiently: when an item's quantity values span a very large range, the search takes too long. In this approach, the maximum quantity of each item is first determined from the transactions, and a rough set of membership functions is derived; these are then improved by a refining process at subsequent levels as the search space shrinks. As a result, the search ranges differ across levels. The solution derived at each level serves as the input to the next level, which searches the smaller space needed to fine-tune the membership functions [ 88 ]. Tsang and Kwong proposed ant colony clustering for anomaly detection [ 65 ].

5.7 Bee colony optimization (BCO)

The BCO algorithm is inspired by the behavior of honey bees and is widely used in optimization problems such as the "traveling salesman problem," "internet hosting center," and vehicle routing, among others. Karaboga proposed the artificial bee colony (ABC) algorithm in 2005. The main features of ABC are its simplicity, its ease of use, and the small number of parameters that need to be controlled in optimization problems. "Face recognition," "high-dimensional gene expression," and "speech segment classification" are examples in which ABC and ACO have been used to select and optimize features over a large search space. ABC uses three types of bees: "employed bees (EBees)," "onlooker bees (OBees)," and "scout bees." First, the food sources are positioned; the employed bees, whose number equals the number of food sources, then pass nectar information to the onlooker bees, whose number equals the number of employed bees. This information is used to exploit each food source until it is exhausted, after which scout bees search for new food sources. The amount of nectar indicates the quality of a solution [ 25 , 55 ].
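A compact Python sketch of the ABC cycle described above (employed, onlooker, and scout phases) minimizing a simple test function follows. The neighbor rule v = x + phi * (x - x_k) on one random dimension, the fitness 1 / (1 + f), the abandonment limit, and all parameter values follow common presentations of ABC and are assumptions here, not details given in [ 25 , 55 ].

```python
import random

def abc_minimize(f, dim, bounds, n_sources=10, cycles=200, limit=20, seed=0):
    """Artificial bee colony sketch: employed, onlooker and scout phases."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def fitness(value):
        # common ABC fitness for minimization: a larger value means more nectar
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    sources = [random_source() for _ in range(n_sources)]
    values = [f(s) for s in sources]
    trials = [0] * n_sources                 # cycles without improvement, per source
    best_val, best_x = float("inf"), None

    def try_neighbor(i):
        # perturb one dimension of source i relative to a random partner k != i
        k = rng.choice([m for m in range(n_sources) if m != i])
        j = rng.randrange(dim)
        cand = sources[i][:]
        cand[j] += rng.uniform(-1.0, 1.0) * (cand[j] - sources[k][j])
        cand[j] = min(max(cand[j], lo), hi)
        v = f(cand)
        if v < values[i]:                    # greedy selection
            sources[i], values[i], trials[i] = cand, v, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_sources):           # employed bees: one per food source
            try_neighbor(i)
        weights = [fitness(v) for v in values]
        for _ in range(n_sources):           # onlooker bees pick sources by nectar amount
            try_neighbor(rng.choices(range(n_sources), weights=weights)[0])
        i_best = min(range(n_sources), key=values.__getitem__)
        if values[i_best] < best_val:        # memorize the best source found so far
            best_val, best_x = values[i_best], sources[i_best][:]
        worst = max(range(n_sources), key=trials.__getitem__)
        if trials[worst] > limit:            # a scout replaces an exhausted source
            sources[worst] = random_source()
            values[worst] = f(sources[worst])
            trials[worst] = 0
    return best_x, best_val

def sphere(x):
    return sum(v * v for v in x)

best_x, best_val = abc_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```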

The BCO method consists of two alternating passes: a forward pass, in which bees explore and gather new information, and a backward pass, in which bees return to the hive and share information about the new alternatives they have found.

In this method, exploration starts with a bee trying to discover a full path. When it leaves the hive, it randomly observes the dances of other bees, each of which carries a movement array known as the "preferred path." This preferred path, a full path previously discovered by a partner bee, guides the bee's foraging toward the final destination. The bee keeps moving from one node to the next until it reaches the final destination. To choose the next node, the bee uses a heuristic that combines two factors, arc fitness and a distance heuristic, so shorter arcs are more likely to be selected [ 7 ]. The BCO algorithm uses two parameters, alpha and beta, which relate to the exploitation and exploration processes, respectively [ 8 ].
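The node-selection step can be sketched in Python as a transition probability proportional to (arc fitness)^alpha * (1/distance)^beta. The arc-fitness rule is an assumption for illustration: the next node on the preferred path receives fitness lam, and the remaining candidate nodes share 1 - lam equally; when the preferred node is not among the candidates, all candidates get equal fitness. The function name is hypothetical.

```python
def bco_next_node_probs(current, candidates, preferred_next, dist,
                        alpha=1.0, beta=1.0, lam=0.7):
    """Hypothetical BCO transition rule: P(j) ~ fitness(i, j)^alpha * (1 / d(i, j))^beta."""
    others = [j for j in candidates if j != preferred_next]
    fitness = {}
    for j in candidates:
        if j == preferred_next:
            fitness[j] = lam                         # arc that follows the preferred path
        elif preferred_next in candidates:
            fitness[j] = (1.0 - lam) / len(others)   # remaining arcs share the rest
        else:
            fitness[j] = 1.0 / len(candidates)       # no preferred arc available
    weights = {j: fitness[j] ** alpha * (1.0 / dist[current][j]) ** beta for j in candidates}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

dist = [[0, 1, 1, 1],
        [1, 0, 1, 1],
        [1, 1, 0, 1],
        [1, 1, 1, 0]]
probs = bco_next_node_probs(0, [1, 2, 3], preferred_next=2, dist=dist)
print(probs)
```

With equal distances only arc fitness matters, so the preferred node gets probability lam; shortening an arc in dist raises that node's probability through the (1/d)^beta term.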

5.8 Particle swarm optimization (PSO)

PSO is inspired by biological organisms, in particular the ability of animals in a group to work together to find a desired location within an area. Kennedy and Eberhart introduced it in 1995 as a stochastic population-based algorithm characterized by its search for a global optimum, its ease of implementation, and the small number of parameters that need tuning. Its efficient search makes it a useful tool for a wide range of optimization problems [ 59 ].

The search solves a nonlinear optimization problem in a real-valued search space by iterating toward the optimal point. Every potential solution is treated as a particle with features such as a current position and velocity in a multidimensional search space; each particle's position is updated using its own experience and the best position found by its neighbors, and an objective function evaluates each particle's fitness. The best solution found in each iteration is kept in memory: the best position found by an individual particle is called its personal best (pbest), and the best position among its neighbors is called the global best (gbest) [ 89 ]. The balance between global and local search can be adjusted through the inertia weight, and this trade-off in each iteration is one of the critical success factors of PSO [ 59 ]. PSO has been applied to artificial neural networks, pattern classification, and fuzzy control [ 5 ]. The algorithm was developed from the social metaphor of "bird flocking and fish schooling," and it works by sharing social information among the particles of the swarm [ 12 ].
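The pbest/gbest update with an inertia weight can be sketched as a global-best PSO in Python. The velocity rule v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x) is the standard inertia-weight form; the particular values of w, c1, and c2 are commonly used defaults chosen here as assumptions, and the neighborhood is the whole swarm.

```python
import random

def pso_minimize(f, dim, bounds, n_particles=20, iterations=100,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best PSO with an inertia weight w."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # global best position
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]                          # inertia
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # own experience
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # swarm experience
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best_x, best_val = pso_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

A larger w keeps particles moving and favors global exploration, while a smaller w damps the velocity and favors local refinement, which is the global/local trade-off mentioned above.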

5.9 Firefly algorithm (FA)

The firefly algorithm was introduced by Yang [ 16 ]. FA assumes that fireflies are unisex, so each firefly is attracted to any other firefly regardless of gender. Brightness is the main source of attraction: less bright fireflies move toward brighter ones, and both attractiveness and perceived brightness decrease as distance increases. A firefly's brightness is determined by the landscape of the fitness function [ 90 ], so a brighter firefly represents a better solution. In the full attraction model, every firefly is attracted to all brighter ones; when many fireflies are attracted to the same brighter firefly (as measured by fitness value), they become similar to one another, and convergence during the search slows down.

FA is a swarm intelligence algorithm inspired by the flashing behavior of fireflies. In some cases, FA works better than the genetic algorithm (GA) and PSO. "Unit commitment," "energy conservation," and "complex networks" are examples of its application areas [ 61 ]. Oscillations may occur when a huge number of fireflies are attracted to the same light source, and the search becomes time-consuming. To overcome these issues, neighborhood attraction FA (NaFA) was introduced, in which each firefly is attracted only to some brighter fireflies within a predefined neighborhood [ 62 ].
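The attraction rule described above can be sketched in Python as a basic firefly algorithm that minimizes an objective, where a lower objective value means a brighter firefly. The attractiveness beta0 * exp(-gamma * r^2), the random step scaled by alpha, the decay of alpha, and the parameter values are assumptions for illustration; this is the full attraction model (each firefly moves toward every brighter one), not NaFA.

```python
import math
import random

def firefly_minimize(f, dim, bounds, n=20, iterations=100,
                     beta0=1.0, gamma=0.1, alpha=0.2, decay=0.97, seed=0):
    """Basic firefly algorithm: a lower objective value means a brighter firefly."""
    rng = random.Random(seed)
    lo, hi = bounds
    scale = hi - lo

    def clip(v):
        return min(max(v, lo), hi)

    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    light = [f(x) for x in xs]
    b = min(range(n), key=light.__getitem__)
    best_x, best_val = xs[b][:], light[b]
    for _ in range(iterations):
        for i in range(n):
            for j in range(n):
                if light[j] < light[i]:      # j is brighter, so i is attracted to j
                    r2 = sum((a - c) ** 2 for a, c in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)   # attraction fades with distance
                    xs[i] = [clip(a + beta * (c - a) + alpha * (rng.random() - 0.5) * scale)
                             for a, c in zip(xs[i], xs[j])]
                    light[i] = f(xs[i])
                    if light[i] < best_val:
                        best_x, best_val = xs[i][:], light[i]
        b = min(range(n), key=light.__getitem__)
        # the brightest firefly has no one to follow, so it takes a random step
        xs[b] = [clip(a + alpha * (rng.random() - 0.5) * scale) for a in xs[b]]
        light[b] = f(xs[b])
        if light[b] < best_val:
            best_x, best_val = xs[b][:], light[b]
        alpha *= decay                       # shrink random steps over time
    return best_x, best_val

best_x, best_val = firefly_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

Because every firefly is pulled toward every brighter one, the inner double loop costs O(n^2) objective evaluations per iteration; restricting attraction to a neighborhood, as NaFA does, reduces both that cost and the oscillation described above.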

5.10 Tabu search algorithm (TS)

Tabu search is a metaheuristic presented by Glover and Laguna (1997). It builds on local search and tries to improve it so that, over consecutive iterations, the search can reach a globally optimized solution. It uses a local heuristic search process and can be applied to combinatorial optimization problems [ 2 ]. The search is flexible because it uses adaptive memory, and it proceeds over a series of iterations. In each iteration, the current solution has neighboring solutions that can be reached through a "move," and the search stops when no better solution can be found [ 37 ]. Aspiration criteria are critical in TS: they guide the search by allowing a forbidden (tabu) move to be accepted when it leads to a solution better than any found so far. Each solution satisfies the constraints of the problem, so all solutions are feasible, although the search can be time-consuming. The TS process relies on a tabu list (TL), a short-term memory that keeps only recent moves; when the memory is full, the oldest move is deleted [ 1 ].

The main idea of TS is to move toward unexplored regions of the solution space, which gives the search an opportunity to escape local optima. Recent moves are declared "tabu" (forbidden), which prevents the search from revisiting previous solutions. The method has been shown to produce high-quality solutions over its iterations [ 57 ].
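A minimal Python sketch of tabu search on a toy 0/1 knapsack problem follows; the instance is hypothetical and chosen only for illustration. A move flips one item in or out, the tabu list is a fixed-length FIFO of recently flipped items, only feasible neighbors are considered, the best admissible move is taken even when it is worse than the current solution, and an aspiration criterion admits a tabu move that beats the best solution found so far.

```python
from collections import deque

def tabu_search_knapsack(values, weights, capacity, iterations=50, tenure=2):
    """Tabu search for a 0/1 knapsack with one-item flip moves."""
    n = len(values)

    def total(items, x):
        return sum(v for v, bit in zip(items, x) if bit)

    current = [0] * n                       # start from the empty (feasible) knapsack
    best, best_val = current[:], 0
    tabu = deque(maxlen=tenure)             # short-term memory: recently flipped items
    for _ in range(iterations):
        candidates = []
        for i in range(n):
            neighbor = current[:]
            neighbor[i] = 1 - neighbor[i]   # move: flip item i in or out
            if total(weights, neighbor) > capacity:
                continue                    # consider feasible solutions only
            val = total(values, neighbor)
            if i in tabu and val <= best_val:
                continue                    # tabu move and the aspiration criterion fails
            candidates.append((val, i, neighbor))
        if not candidates:
            break                           # no admissible move left
        val, i, current = max(candidates)   # best admissible move, even if it is worse
        tabu.append(i)                      # deque drops the oldest entry when full
        if val > best_val:
            best, best_val = current[:], val
    return best, best_val

values, weights, capacity = [10, 7, 5, 4], [5, 4, 3, 2], 7
best, best_val = tabu_search_knapsack(values, weights, capacity)
print(best, best_val)
```

Using deque(maxlen=tenure) gives the "delete the oldest move when the memory is full" behavior of the tabu list directly.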

6. Big data analytics and Internet of Things (IOT)

The Internet of Things (IOT) focuses on creating an intelligent environment in which things interact with each other through sensing, processing, communicating, and actuating. Because IOT sensors gather huge amounts of raw data that need to be processed and analyzed, powerful tools are required to support the analytics process, which motivates applying BDA and its methods to IOT-based data. Ref. [ 51 ] proposed a four-layer model showing how BDA can help IOT-based systems work better; the model consists of data generation, sensor communication, data processing, and data interpretation. It has been suggested that, beyond 2020, cognitive processing and optimization will be applied to IOT data processing [ 34 ]. In IOT-based systems, signals acquired from sensors are gathered and processed frame by frame or in batch mode. The gathered data are then used for feature extraction, followed by a classification stage in which machine learning algorithms classify the data [ 54 ]. These machine learning methods can be supervised, semisupervised, or unsupervised [ 54 ]. At the decision-making level, which includes pattern recognition, deep learning methods such as RNN, DNN, CNN, and ANN can be used to discover knowledge. Optimization methods can be used to create optimized clusters of IOT data [ 91 ].

Figure 2 shows the IOT process. Data are gathered from sensors and then filtered; at this level, denoising and data cleansing take place, and features are extracted for the classification phase. After preprocessing, decisions are made on the basis of deep learning methods ( Table 1 ). Deep learning and machine learning algorithms can both be used to analyze data generated by IOT devices, especially in the classification and decision-making phases. Depending on the data type, either supervised or unsupervised techniques can be used in the classification phase, whereas both deep learning and machine learning algorithms are suitable for the decision-making phase.
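The processing chain of Figure 2 (cleansing and denoising, frame-by-frame feature extraction, then classification) can be sketched in plain Python. All function names are hypothetical, the moving-average filter and the (mean, standard deviation) frame features are simple assumptions, and a nearest-centroid classifier stands in for the machine learning or deep learning models listed in Table 1.

```python
import math
import statistics

def clean(readings):
    """Data cleansing: drop missing (None) or non-finite sensor readings."""
    return [r for r in readings if r is not None and math.isfinite(r)]

def denoise(signal, window=3):
    """Denoising: centered moving average (shorter window at the edges)."""
    half = window // 2
    return [statistics.mean(signal[max(0, i - half):i + half + 1]) for i in range(len(signal))]

def extract_features(signal, frame_size=8):
    """Frame-by-frame feature extraction: (mean, population std) of each full frame."""
    frames = [signal[i:i + frame_size] for i in range(0, len(signal) - frame_size + 1, frame_size)]
    return [(statistics.mean(fr), statistics.pstdev(fr)) for fr in frames]

def preprocess(readings, frame_size=8):
    return extract_features(denoise(clean(readings)), frame_size)

class NearestCentroid:
    """Stand-in for the ML/DL classifier of the decision-making phase."""

    def fit(self, X, y):
        self.centroids = {}
        for label in dict.fromkeys(y):       # unique labels, in first-seen order
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab])) for x in X]

# hypothetical training streams: an idle sensor around 1.0 and an active one around 5.0
idle = [1.0 + (0.1 if i % 2 else -0.1) for i in range(16)]
active = [5.0 + (0.5 if i % 2 else -0.5) for i in range(16)]
X_train = preprocess(idle) + preprocess(active)
y_train = ["idle"] * 2 + ["active"] * 2
model = NearestCentroid().fit(X_train, y_train)

# a new stream with a missing value and a NaN that cleansing must remove
stream = [1.05] * 8 + [None, float("nan")] + [4.9] * 8
print(model.predict(preprocess(stream)))
```

Each stage is a plain function, so a real deployment could swap the moving average for another filter or the nearest-centroid stand-in for a trained network without touching the other stages.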

Figure 2. IOT process.

Table 1. Deep learning and machine learning techniques on IOT phases.

7. Future research directions

For future endeavors, it is proposed to apply big data analytics methods to IOT fog and edge computing, using powerful analytical tools to extract hidden patterns from data gathered by sensors. Fog computing is a technology implemented close to the end user that provides local processing and storage to support various devices and sensors. Health care systems benefit from fog computing in IOT because it supports mobility and reliability; health care data acquisition and the processing and storage of real-time data take place in the edge, fog, and cloud layers [ 47 ]. Future research could focus on how machine learning algorithms can provide techniques for fog computing. IOT data captured from smart houses need analytical algorithms to handle the complexity of offline and online data in processing, classification, next-best-action recommendation, and pattern recognition [ 81 ]. Hospital information systems (HIS) create "life sciences data," "clinical data," "administrative data," and "social network data." These data sources support illness prediction, medical research, and the management and control of disease [ 39 ]. Big data analytics could therefore be a future research subject, helping HIS with data processing and disease pattern recognition.

Smart houses generate highly complex real-time data, which calls for big data analytics to manage this complexity. Classical data analysis methods have lost ground to evolutionary methods of classification and clustering. Graphics processing units (GPUs) used for machine learning and data mining benefit large-scale datasets [ 7 ] and lower the cost of data analytics. Another direction for future research is working with frameworks such as Spark, an in-memory computation framework, with which optimization problems can be solved with the help of big data analytics [ 20 ].

Natural language processing (NLP) for text classification can be combined with methods such as CNN and RNN, which can achieve higher accuracy in less time (Li et al., 2018).

Predictive analytics, offered by big data analytics, develops predictive models that analyze large volumes of both structured and unstructured data in order to identify hidden patterns and relationships between variables in the near future [ 76 ]. Big data analytics can support cognitive computing, and behavior pattern recognition uses deep learning techniques to predict future actions; for example, such techniques are used to predict cancer in health care systems [ 59 ]. Big data analytics also helps organizations understand their problems [ 13 ].

Future research can therefore focus both on new application areas for machine learning and deep learning algorithms on gathered sensor data and on combinations of techniques that produce globally optimal solutions with higher accuracy and lower cost. Researchers can address existing industrial problems through the combined application of machine learning and deep learning techniques, which may yield optimized solutions at lower cost and higher speed. They can also apply established algorithms in new industrial areas to solve problems, create insight, and identify hidden patterns.

In summary, Figure 3 outlines these directions for future research.

Figure 3. Future research on big data analytics (BDA).

8. Conclusion

This chapter has attempted to give an overview of big data analytics and its subfields, machine learning and deep learning. As noted earlier, big data analytics emerged to overcome the complexity of data management and to bring knowledge into organizations to improve their performance. In this chapter, DNN, RNN, and CNN have been introduced as deep learning methods, and classification, clustering, and evolutionary techniques have been reviewed, with a brief look at selected techniques in each field. The application of machine learning and deep learning to IOT-based data has also been shown, with the aim of making IOT data analytics more powerful in the classification and decision-making phases. Because IOT sensors generate data at high speed, big data analytics methods have been widely used to analyze real-time data and address the complexity of data processing. Hospital information systems (HIS), smart cities, and smart houses benefit from timely data processing on fog and cloud platforms. These methods are used not only to form a clear picture of the clusters and classes in the data but also to gain insight into future behavior through pattern recognition. Researchers have proposed a wide variety of future research, from customer pattern recognition to the prediction of illnesses such as cancer, all within the area of big data analytics algorithms.

Acknowledgments

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

  • 1. Bożejko W et al. Parallel tabu search for the cyclic job shop scheduling problem. Computers & Industrial Engineering. 2018;113:512-524
  • 2. Kiziloz H, Dokeroglu T. A robust and cooperative parallel tabu search algorithm for the maximum vertex weight clique problem. Computers & Industrial Engineering. 2018;118:54-66
  • 3. Acharya U et al. Automated detection of coronary artery disease using different durations of ECG segments with convolutional neural network. Knowledge-Based Systems. 2017;132:62-71
  • 4. Babu GP, Murty M. A near-optimal initial seed value selection in K-means algorithm using a genetic algorithm. Pattern Recognition Letters. 1993;14(10):763-769
  • 5. Bonyadi MR, Michalewicz Z. Particle swarm optimization for single objective continuous space problems: A review. Evolutionary Computation. 2017;25(1):1-54
  • 6. Caliskan A et al. Classification of high resolution hyperspectral remote sensing data using deep neural networks. Engineering Applications of Artificial Intelligence. 2018;67:14-23
  • 7. Cano A. A survey on graphic processing unit computing for large-scale data mining. WIREs Data Mining and Knowledge Discovery. 2017;8(1):e1232. DOI: 10.1002/widm.1232
  • 8. Caraveo C et al. Optimization of fuzzy controller design using a new bee colony algorithm with fuzzy dynamic parameter adaptation. Applied Soft Computing. 2016;43:131-142
  • 9. Castillo O, Amador-Angulo L. A generalized type-2 fuzzy logic approach for dynamic parameter adaptation in bee colony optimization applied to fuzzy controller design. Information Sciences. 2018;460-461:476-496
  • 10. Chen J et al. The synergistic effects of IT-enabled resources on organizational capabilities and firm performance. Information and Management. 2012;49(3-4):140-152
  • 11. Chou J et al. Metaheuristic optimization within machine learning-based classification system for early warnings related to geotechnical problems. Automation in Construction. 2016;68:65-80
  • 12. Côrte-Real N et al. Assessing business value of Big Data Analytics in European firms. Journal of Business Research. 2017;70:379-390
  • 13. Côrte-Real N et al. Unlocking the drivers of big data analytics value in firms. Journal of Business Research. 2019;97:160-173
  • 14. Delice Y et al. A modified particle swarm optimization algorithm to mixed-model two-sided assembly line balancing. Journal of Intelligent Manufacturing. 2017;28(1):23-36
  • 15. Ding S et al. Extreme learning machine: Algorithm, theory and applications. Artificial Intelligence Review. 2015;44(1):103-115
  • 16. Dong J, Yang C. Business value of big data analytics: A systems-theoretic approach and empirical test. Information & Management. 2018. [In Press]
  • 17. Dorigo M. Ant Colony Optimization: New Optimization Techniques in Engineering. Berlin Heidelberg: Springer-Verlag; 1991. pp. 101-117
  • 18. Esposito C et al. A knowledge-based platform for Big Data analytics based on publish/subscribe services and stream processing. Knowledge-Based Systems. 2015;79:3-17
  • 19. Feng L et al. Rough extreme learning machine: A new classification method based on uncertainty measure. Neurocomputing. 2019;325:269-282
  • 20. Gonzalez-Lopez J et al. Distributed nearest neighbor classification for large-scale multi-label data on spark. Future Generation Computer Systems. 2018;87:66-82
  • 21. Gallicchio C et al. Deep reservoir computing: A critical experimental analysis. Neurocomputing. 2017;268:87-99
  • 22. Gandomi A, Haider M. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management. 2015;35(2):137-144
  • 23. German F et al. Do retailers benefit from deploying customer analytics? Journal of Retailing. 2014;90:587-593
  • 24. Ghosh A et al. Aggregation pheromone density based data clustering. Information Sciences. 2008;178:2816-2831
  • 25. Gonzalez-Abril L et al. Handling binary classification problems with a priority class by using support vector machines. Applied Soft Computing. 2017;61:661-669
  • 26. Gu X, Angelov P. Self-organizing fuzzy logic classifier. Information Sciences. 2018;447:36-51
  • 27. Guo Y et al. Deep learning for visual understanding: A review. Neurocomputing. 2016;187:27-48
  • 28. Harfouchi F et al. Modified multiple search cooperative foraging strategy for improved artificial bee colony optimization with robustness analysis. Soft Computing. 2017;22(19)
  • 29. Huang J et al. A clustering method based on extreme learning machine. Neurocomputing. 2018;227:108-119
  • 30. Hüllermeier E. Does machine learning need fuzzy logic? Fuzzy Sets and Systems. 2015;281:292-299
  • 31. Jan B et al. Deep learning in big data analytics: A comparative study. Computers and Electrical Engineering. 2017;75:1-13
  • 32. Jiang P, Chen J. Displacement prediction of landslide based on generalized regression neural networks with K-fold cross-validation. Neurocomputing. 2016;198:40-47
  • 33. Jiang S et al. Modified genetic algorithm-based feature selection combined with pre-trained deep neural network for demand forecasting in outpatient department. Expert Systems With Applications. 2017;82:216-230
  • 34. Ko Y. How to use negative class information for Naive Bayes classification. Information Processing and Management. 2017;53(6):1255-1268
  • 35. Koonce D, Tsai S. Using data mining to find patterns in genetic algorithm solutions to a job shop schedule. Computers & Industrial Engineering. 2000;38(3):361-374
  • 36. Kozak J, Boryczka U. Collective data mining in the ant colony decision tree approach. Information Sciences. 2016;372:126-147
  • 37. Kwon O et al. Data quality management, data usage experience and acquisition intention of big data analytics. International Journal of Information Management. 2014;34(3):387-394
  • 38. Lee I, Lee K. The Internet of Things (IoT): Applications, investments, and challenges for enterprises. Business Horizons. 2015;58(4):1-10
  • 39. Li J et al. Medical big data analysis in hospital information system. In: Big Data on Real-World Applications. 2016. Chapter 4
  • 40. Loebbecke C, Picot A. Reflections on societal and business model transformation arising from digitization and big data analytics: A research agenda. Journal of Strategic Information Systems. 2015;24(3):149-157
  • 41. Lohrmann C, Luukka P. A novel similarity classifier with multiple ideal vectors based on k-means clustering. Decision Support Systems. 2018;111:27-37
  • 42. Martí R et al. Tabu search for the dynamic bipartite drawing problem. Computers and Operations Research. 2018;91:1-12
  • 43. Maulik U et al. Genetic algorithm-based clustering technique. Pattern Recognition. 2000;33(9):1455-1465
  • 44. Mavrovouniotis M, Yang S. Training neural networks with ant colony optimization algorithms for pattern classification. Soft Computing. 2015;19(6):1511-1522
  • 45. Miao Z et al. Robust tracking control of uncertain dynamic nonholonomic systems using recurrent neural networks. Neurocomputing. 2014;142:216-227
  • 46. Mohan B, Baskaran R. A survey: Ant colony optimization based recent research and implementation on several engineering domain. Expert Systems with Applications. 2012;39(4):4618-4627
  • 47. Mutlag AA et al. Enabling technologies for fog computing in health care IoT systems. Future Generation Computer Systems. 2019;90:62-78
  • 48. Najafabadi M et al. Deep learning applications and challenges in big data analytics. Journal of Big Data. 2015:1-21. DOI: 10.1186/s40537-014-0007-7
  • 49. Nguyen T et al. Big data analytics in supply chain management: A state-of-the-art literature review. Computers and Operations Research. 2018;98:254-264
  • 50. Ning J et al. A best-path-updating information-guided ant colony optimization algorithm. Information Sciences. 2018;433-434:142-162
  • 51. Osipov V, Osipova M. Space–time signal binding in recurrent neural networks with controlled elements. Neurocomputing. 2018;308:194-204
  • 52. Panda M, Abraham A. Hybrid evolutionary algorithms for classification data mining. Neural Computing & Applications. 2014;26(3):507-523
  • 53. Peng H et al. An unsupervised learning algorithm for membrane computing. Information Sciences. 2015;304:80-91
  • 54. Peng Y et al. Orthogonal extreme learning machine for image classification. Neurocomputing. 2017;266:458-464
  • 55. Qawaqneh Z et al. Age and gender classification from speech and face images by jointly fine-tuned deep neural networks. Expert Systems With Applications. 2017;85:78-86
  • 56. Ramsingh J, Bhuvaneswari V. An efficient map reduce-based hybrid NBC-TFIDF algorithm to mine the public sentiment on diabetes mellitus—A big data approach. Journal of King Saud University – Computer and Information Sciences. 2018. [In Press]
  • 57. Rathore M et al. Urban planning and building smart cities based on the Internet of Things using Big Data analytics. Computer Networks. 2016;101:63-80
  • 58. Ruan X, Zhang Y. Blind sequence estimation of MPSK signals using dynamically driven recurrent neural networks. Neurocomputing. 2014;129:421-427
  • 59. Sekaran K et al. Deep learning convolutional neural network (CNN) with Gaussian mixture model for predicting pancreatic cancer. Multimedia Tools and Applications. 2019:1-15. DOI: 10.1007/s11042-019-7419-5
  • 60. Shah S, Kusiak A. Data mining and genetic algorithm based gene/SNP selection. Artificial Intelligence in Medicine. 2004;31(3):183-196
  • 61. Shanthamallu U et al. A brief survey of machine learning methods and their sensor and IoT applications. In: 2017 8th International Conference on Information, Intelligence, Systems & Applications (IISA). 2017. DOI: 10.1109/IISA.2017.8316459
  • 62. Shunmugapriya P, Kanmani S. A hybrid algorithm using ant and bee colony optimization for feature selection and classification (AC-ABC Hybrid). Swarm and Evolutionary Computation. 2017;36:27-36
  • 63. Sikora R, Piramuthu S. Framework for efficient feature selection in genetic algorithm based data mining. European Journal of Operational Research. 2007;180(2):723-737
  • 64. Silva M, Cunha C. A tabu search heuristic for the uncapacitated single allocation p-hub maximal covering problem. European Journal of Operational Research. 2017;262(3):954-965
  • 65. Srinivasa KG et al. A self-adaptive migration model genetic algorithm for data mining applications. Information Sciences. 2007;177(20):4295-4313
  • 66. Taherkhani M, Safabakhsh R. A novel stability-based adaptive inertia weight for particle swarm optimization. Applied Soft Computing. 2016;38:281-295
  • 67. Wan Y et al. Twin extreme learning machines for pattern classification. Neurocomputing. 2017;260:235-244
  • 68. Wang H et al. Firefly algorithm with neighborhood attraction. Information Sciences. 2017;382-383:374-387
  • 69. Wang H et al. Randomly attracted firefly algorithm with neighborhood search and dynamic parameter adjustment mechanism. Soft Computing. 2017;21(18):5325-5339
  • 70. Wang Q et al. Local kernel alignment based multi-view clustering using extreme learning machine. Neurocomputing. 2018;275:1099-1111
  • 71. Wu J et al. A patent quality analysis and classification system using self-organizing maps with support vector machine. Applied Soft Computing. 2016;41:305-316
  • 72. Zhang L, Zhang Q. A novel ant-based clustering algorithm using the kernel method. Information Sciences. 2011;181:4658-4672
  • 73. Zhang X et al. An overview of recent developments in Lyapunov–Krasovskii functionals and stability criteria for recurrent neural networks with time-varying delays. Neurocomputing. 2018;313:392-401
  • 74. Zhu S, Shen Y. Robustness analysis for connection weight matrix of global exponential stability recurrent neural networks. Neurocomputing. 2013;101:370-374
  • 75. Wang Y et al. Integrated big data analytics-enabled transformation model: Application to health care. Information and Management. 2018;55(1):64-79
  • 76. Wang Y, Hajli N. Exploring the path to big data analytics success in healthcare. Journal of Business Research. 2017;70:287-299
  • 77. Iqbal R et al. Big data analytics: Computational intelligence techniques and application areas. Technological Forecasting & Social Change. 2018. [In Press]
  • 78. Wamba S et al. Big data analytics and firm performance: Effects of dynamic capabilities. Journal of Business Research. 2017;70:356-365
  • 79. Zhang Q et al. A survey on deep learning for big data. Information Fusion. 2018;42:146-157
  • 80. Liu W et al. A survey of deep neural network architectures and their applications. Neurocomputing. 2017;234:11-26
  • 81. Yassine A et al. IoT big data analytics for smart homes with fog and cloud computing. Future Generation Computer Systems. 2019;91:563-573
  • 82. Yin Z et al. A-optimal convolutional neural network. Neural Computing and Applications. 2016;30(7):2295-2304
  • 83. Wang S et al. Convolutional neural network-based hidden Markov models for rolling element bearing fault identification. Knowledge-Based Systems. 2018;144:65-76
  • 84. Shi X et al. Tracking topology structure adaptively with deep neural networks. Neural Computing and Applications. 2017;30(11):3317-3326
  • 85. Zhou L et al. Machine learning on big data: Opportunities and challenges. Neurocomputing. 2017;237:350-361
  • 86. Tack C. Artificial intelligence and machine learning | applications in musculoskeletal physiotherapy. Musculoskeletal Science and Practice. 2018;39:164-169
  • 87. Tang L et al. A novel perspective on multiclass classification: Regular simplex support vector machine. Information Sciences. 2018;480:324-338
  • 88. Xia M et al. A hybrid method based on extreme learning machine and k-nearest neighbor for cloud classification of ground-based visible cloud image. Neurocomputing. 2015;160:238-249
  • 89. Onan A. A fuzzy-rough nearest neighbor classifier combined with consistency-based subset evaluation and instance selection for automated diagnosis of breast cancer. Expert Systems with Applications. 2015;42(20):6844-6852
  • 90. Gou J et al. A generalized mean distance-based k-nearest neighbor classifier. Expert Systems With Applications. 2019;115:356-372
  • 91. Pan Z et al. A new k-harmonic nearest neighbor classifier based on the multi-local means. Expert Systems With Applications. 2017;67:115-125
  • 92. Xu S, Wang J. Dynamic extreme learning machine for data stream classification. Neurocomputing. 2017;238:433-449
  • 93. Du G et al. Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowledge-Based Systems. 2016;99:135-145
  • 94. Yu S et al. Two improved k-means algorithms. Applied Soft Computing. 2018;68:747-755
  • 95. Tabakhi S et al. Gene selection for microarray data classification using a novel ant colony optimization. Neurocomputing. 2015;168:1024-1036
  • 96. Liu H et al. A path planning approach for crowd evacuation in buildings based on improved artificial bee colony algorithm. Applied Soft Computing. 2018;68:360-376
  • 97. Hong T et al. A multi-level ant-colony mining algorithm for membership functions. Information Sciences. 2012;182(1):3-14
  • 98. Kuo RJ et al. Integration of growing self-organizing map and bee colony optimization algorithm for part clustering. Computers & Industrial Engineering. 2018;120:251-265
  • 99. Verma O et al. Opposition and dimensional based modified firefly algorithm. Expert Systems With Applications. 2016;44:168-176
  • 100. Janakiraman S. A hybrid ant colony and artificial bee colony optimization algorithm-based cluster head selection for IoT. Procedia Computer Science. 2018;143:360-366
  • 101. Tsai C et al. Metaheuristic algorithms for healthcare: Open issues and challenges. Computers and Electrical Engineering. 2016;53:421-434
  • 102. Villarrubia G et al. Artificial neural networks used in optimization problems. Neurocomputing. 2018;272:10-16
  • 103. Wari E, Zhu W. A survey on metaheuristics for optimization in food manufacturing. Applied Soft Computing. 2016;46:328-343
  • 104. Wu J et al. Evolving RBF neural networks for rainfall prediction using hybrid particle swarm optimization and genetic algorithm. Neurocomputing. 2015;148:136-142
  • 105. Yang F et al. A new approach to non-fragile state estimation for continuous neural networks with time-delays. Neurocomputing. 2016;197:205-211

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


  • Survey Paper
  • Open access
  • Published: 18 December 2021

A new theoretical understanding of big data analytics capabilities in organizations: a thematic analysis

  • Renu Sabharwal
  • Shah Jahan Miah (ORCID: orcid.org/0000-0002-3783-8769)

Journal of Big Data, volume 8, Article number 159 (2021)


Big Data Analytics (BDA) usage in industry has increased markedly in recent years. BDA is a data-driven tool that facilitates informed decision-making, and the need for BDA capability in organizations is recognized, but few studies have communicated an understanding of BDA capabilities in a way that can enhance our theoretical knowledge of using BDA in the organizational domain. Big Data has been defined in various ways, and this research explores the past literature on the classification of BDA and its capabilities. We conducted a literature review using the PRISMA methodology and integrated a thematic analysis using NVIVO12. By applying the five steps of the PRISMA framework to a sample of 70 articles, we generate five themes, which are informed by organization development theory, and develop a novel empirical research model, which we submit for validity assessment. Our findings aim to improve the effectiveness and usage of BDA applications in various organizations.

Introduction

Organizations today continuously harvest user data (e.g., data collections) to improve their business efficiencies and practices. Significant volumes of stored data, or data on electronic transactions, are used to support decision-making, with managers, policymakers, and executive officers now routinely embracing technology to transform these abundant raw data into useful information. Data analysis is complex, but one data-handling method is widely applied: “Big Data Analytics” (BDA), the application of advanced analytic techniques, including data mining, statistical analysis, and predictive modeling, to big datasets as a new business intelligence practice [1]. BDA uses computational intelligence techniques to transform raw data into information that can be used to support decision-making.

Because decision-making in organizations has become increasingly reliant on Big Data, analytical applications have grown in importance for evidence-based decision-making [2]. A systematic review of Big Data stream analysis, using rigorous and methodical approaches to identify trends in Big Data stream tools and to analyze techniques, technologies, and methods, is becoming increasingly important [3]. Organizational factors such as organizational resource adjustment, environmental acceptance, and organizational management are related to implementing BDA capability and enhancing its benefits through BDA technologies [4]. It is evident from past literature that BDA supports the organizational decision-making process by developing a suitable theoretical understanding, but extending existing theories remains a significant challenge. Improved BDA capability can help ensure that organizational products and services are continuously optimized to meet the evolving needs of consumers.

Previous systematic reviews have focused on future BDA adoption challenges [5, 6, 7] or technical innovation aspects of Big Data analytics [8, 9]. Numerous studies have also examined Big Data issues in specific domains, including: quality of Big Data in financial service organizations [10]; organizational value creation from BDA usage [11]; application of Big Data in health organizations [9]; decision improvement using Big Data in health [12]; application of Big Data in transport organizations [13]; relationships between Big Data and data quality in financial domains [14]; and quality of Big Data and its impact on government organizations [15].

While research on BDA has increased progressively, its capabilities and how organizations may exploit them are less well studied [16]. We apply the PRISMA framework [17] and qualitative thematic analysis to create a model that defines the relationship between Big Data analytics capabilities (BDAC) and organizational development (OD). The proposed research presents an overview of BDA capabilities and how organizations can use them, and discusses its implications for future research. Specifically, we (1) identify key themes regarding BDAC in state-of-the-art BDA research, and (2) show an alignment with organizational development theory through a new empirical research model, which will be submitted for validity assessment in future research on BDAC in organizations.

According to [20], a systematic literature review begins by describing the key approach and establishing definitions for key concepts. We use a six-phase process to identify, analyze, and sequentially report themes using NVIVO 12.

Study background

Many forms of BDA exist to meet the specific decision-support demands of different organizations. There are three classes of BDA: (1) descriptive, dealing with straightforward questions about what is happening or has happened and why (e.g., identifying ‘opportunities and problems’), using descriptive statistics such as historical insights; (2) predictive, dealing with questions such as what will or is likely to happen, by exploring data patterns with relatively complex statistics, simulation, and machine-learning algorithms (e.g., to identify trends in sales activities, or to forecast customer behavior and purchasing patterns); and (3) prescriptive, dealing with questions about what should be happening and how to influence it, using complex descriptive and predictive analytics with mathematical optimization, simulation, and machine-learning algorithms (e.g., many large-scale companies have adopted prescriptive analytics to optimize production or solve scheduling and inventory management issues) [18]. Regardless of the type of BDA analysis performed, its application significantly affects tangible and intangible resources within an organization.
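As a rough illustration of the three classes, the Python sketch below applies each one to a small monthly sales series. The sales figures, the margin and holding-cost parameters, and the function names `describe`, `forecast_next`, and `best_order` are hypothetical; the linear trend and the single-scenario profit rule stand in for the far richer statistics, machine learning, and optimization that the cited studies describe.

```python
from statistics import mean

# Hypothetical monthly unit sales (illustrative data, not from the reviewed studies).
sales = [120, 135, 150, 160, 178, 190]

def describe(series):
    """Descriptive: what has happened? Summary statistics of past values."""
    changes = [b - a for a, b in zip(series, series[1:])]
    return {"mean": mean(series), "min": min(series), "max": max(series),
            "avg_change": mean(changes)}

def forecast_next(series):
    """Predictive: what is likely to happen? Extrapolate an ordinary
    least-squares linear trend one step ahead."""
    xs = range(len(series))
    x_bar, y_bar = mean(xs), mean(series)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(series)

def best_order(expected_demand, candidates, unit_margin=5.0, holding_cost=2.0):
    """Prescriptive: what should we do? Choose the order quantity with the
    highest profit, assuming demand equals the forecast (a simplification)."""
    def profit(quantity):
        sold = min(quantity, expected_demand)
        leftover = max(quantity - expected_demand, 0)
        return unit_margin * sold - holding_cost * leftover
    return max(candidates, key=profit)

summary = describe(sales)                  # mean 155.5, average change 14
next_month = forecast_next(sales)          # about 204.4 units
order = best_order(next_month, [180, 190, 200, 210, 220])  # 210
print(summary, round(next_month, 1), order)
```

Each step feeds the next, which mirrors the text's point that prescriptive analytics builds on descriptive and predictive results.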

Previous studies on BDA

BDA tools or techniques are used to analyze Big Data (such as social media or substantial transactional data) to support strategic decision-making [19] in different domains (e.g., tourism, supply chain, healthcare), and numerous studies have developed and evaluated BDA solutions to improve organizational decision support. We categorize previous studies into two main groups: those that relate to the development of new BDA requirements and functionalities in a specific problem domain, and those that focus on more intrinsic, non-technical aspects such as BDAC development or value creation through their impact on particular aspects of the business. Examples of reviews focusing on technical or problem-solving aspects are detailed in Table 1.

The second group of literature examines BDA in an organizational context, such as improving firm performance using Big Data analytics in specific business domains [26]. Several studies show that BDA contributes to different aspects of organizational performance [20, 24, 25, 27, 28, 29] (Table 2). Other research examines BDA as a means of improving data utilization and decision-support quality. For example, [30] explained how BDAC might be developed to improve managerial decision-making processes, and [4] conducted a thematic analysis of 15 firms to identify the factors related to the success of BDA capability development in supply chain management (SCM).

Potential applications of BDA

Many retail organizations use analytical approaches to gain commercial advantage and organizational success [31]. Modern organizations increasingly invest in BDA projects to reduce costs, make more accurate decisions, and plan future business. For example, Amazon was one of the first online retailers and has maintained its innovation in the development and use of BDA [31]. Successful examples of BDA use in business sectors include the following.

Retail: business organizations use BDA for dynamic (surge) pricing [32], adjusting product or service prices based on demand and supply. For instance, Amazon uses dynamic pricing to raise prices when product demand surges.

Hospitality: Marriott Hotels, the largest hospitality company, with a rapidly increasing number of hotels and customers served, uses BDA to improve sales [33].

Entertainment: Netflix uses BDA to retain clientele and increase sales and profits [ 34 , 35 ].

Transportation: Uber uses BDA [36] to capture Big Data from consumers and identify the best routes to destinations. Uber Eats, despite competing with other delivery companies, delivers food in the shortest possible time.

Foodservice: McDonald's continuously updates information with BDA; following a recent shift in food quality, it now sells healthy food to consumers [37] and has adopted a dynamic menu [38].

Finance: American Express has used BDA for a long time and was one of the first companies to understand the benefits of using BDA to improve business performance [39]. Big Data is collected on how consumers make online and offline purchases, and predictions are made about how they will shop in the future.

Manufacturing: General Electric manufactures and distributes products such as wind turbines, locomotives, airplane engines, and ship engines [40]. Processing large amounts of data from electricity networks, meteorological information systems, and geographical information systems can bring benefits to the existing power system, including improved customer service and social welfare in the era of Big Data.

Online business: music streaming websites are increasingly popular and continue to grow in size and scope because consumers want a customized streaming service [ 41 ]. Many streaming services (e.g., Apple Music, Spotify, Google Music) use various BDA applications to suggest new songs to consumers.
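The article does not say how streaming services generate their song suggestions. One widely used technique is item-based collaborative filtering, sketched below in Python under stated assumptions: the play counts, user names, and song names are hypothetical, and this is not a description of what Apple Music, Spotify, or Google Music actually run. Each song is represented by its vector of play counts across users, and unheard songs are scored by their cosine similarity to the songs a listener already plays.

```python
from math import sqrt

# Hypothetical listening history: user -> {song: play count}.
plays = {
    "ana": {"song_a": 5, "song_b": 3},
    "ben": {"song_a": 4, "song_b": 4, "song_c": 1},
    "cho": {"song_b": 2, "song_c": 5, "song_d": 4},
    "dev": {"song_c": 4, "song_d": 5},
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def item_vectors(history):
    """Represent each song by its play counts across all users (fixed user order)."""
    users = sorted(history)
    songs = sorted({s for listened in history.values() for s in listened})
    return {s: [history[u].get(s, 0) for u in users] for s in songs}

def recommend(history, user, k=2):
    """Rank songs the user has not played by play-weighted similarity to played songs."""
    vectors = item_vectors(history)
    heard = history[user]
    scores = {
        song: sum(count * cosine(vec, vectors[h]) for h, count in heard.items())
        for song, vec in vectors.items() if song not in heard
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend(plays, "ana"))  # ['song_c', 'song_d']
```

For "ana", `song_c` ranks first because listeners who play `song_b` also tend to play `song_c`. Production systems add implicit-feedback weighting, scaling to millions of items, and other signals that this sketch omits.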

Organization value assessment with BDA

Specific performance measures must be established that depend on a number of organizational contextual factors, such as the organization's goals, its external environment, and the organization itself. When considering these contexts for using BDA to strengthen process innovation skills, it is important to note that the approach required to achieve positive results depends on the combination of these factors and on the area in which BDA is deployed [42].

Organizational development and BDA

To support organizational decision-making for growth, effective processes are required for operations such as continuous diagnosis, action planning, and the implementation and evaluation of BDA. Lewin’s Organizational Development (OD) theory regards these processes as having the goal of transferring knowledge and skills to an organization, mainly to improve its problem-solving capacity and to manage future change. Beckhard [43] defined OD in terms of the internal dynamics of an organization, in which a collection of individuals works as a group to improve organizational effectiveness, capability, work performance, and the ability to adjust culture, policies, practices, and procedure requirements.

OD is ‘a system-wide application and transfer of behavioral science knowledge to the planned development, improvement, and reinforcement of the strategies, structures, and processes that lead to organization effectiveness’ [44], and has three concepts: organizational climate, culture, and capability [45]. Organizational climate is ‘the mood or unique personality of an organization’ [45], which includes shared perceptions of policies, practices, and procedures; climate features also include leadership, communication, participative management, and role clarity. Organizational culture involves shared basic assumptions, values, norms, behavioral patterns, and artifacts, and is defined by [46] as a pattern of shared basic assumptions that a group learned by solving problems of external adaptation and internal integration (p. 38). Organizational capacity (OC) refers to the organization's function, such as the production of services or products or the maintenance of organizational operations, and has four components: resource acquisition, organization structure, production subsystem, and accomplishment [47]. Organizational culture and climate affect an organization’s capacity to operate adequately (Fig. 1).

Fig. 1: Framework of modified organizational development theory [45]

Research methodology

Our systematic literature review follows a research process for gathering, analyzing, examining, and evaluating research [48] in accordance with the PRISMA framework [49]. We use keywords to search for articles related to BDA applications, following a five-stage process.

Stage 1: design development

We establish a research question to guide the search strategy, article selection, and analysis and synthesis process, defining the aim, scope, and specific research goals following the guidelines, procedures, and policies of the Cochrane Handbook for Systematic Reviews of Interventions [50]. The design review process is directed by the research question: what are the consistent definitions of BDA, its unique attributes, objections, and business revolution, including improving the decision-making process and organizational performance with BDA? Table 3 is created from the outcome of the search performed using the keywords Organizational BDAC, Big Data, and BDA.

Stage 2: inclusion and elimination criteria

To meet the standards of a systematic review, we apply inclusion and exclusion criteria to our search for research articles in four databases: Science Direct, Web of Science, IEEE (Institute of Electrical and Electronics Engineers), and Springer Link. The inclusion criteria cover articles on ‘Big Data in Organization’ published in English between 2015 and 2021. We use essential keywords to identify the most relevant articles, using truncation, wildcards, and appropriate Boolean operators (Table 4).
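The actual search strings are listed in Table 4 and are not reproduced here. As a sketch of how truncation, single-character wildcards, and Boolean operators combine, the Python code below evaluates a hypothetical query against article titles. The `*` and `?` conventions, the nested-tuple query format, and the helper names `term_to_regex` and `matches` are our assumptions; each database implements its own search syntax.

```python
import re

def term_to_regex(term):
    """Compile one search term. '*' truncates (any trailing word characters)
    and '?' stands for exactly one word character; matching ignores case."""
    pattern = "".join(
        r"\w*" if ch == "*" else r"\w" if ch == "?" else re.escape(ch)
        for ch in term
    )
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)

def matches(text, query):
    """Evaluate a query: a term string, or a tuple ("AND" | "OR", sub-query, ...)."""
    if isinstance(query, str):
        return term_to_regex(query).search(text) is not None
    operator, *operands = query
    results = (matches(text, q) for q in operands)
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    raise ValueError(f"unsupported operator: {operator}")

# Hypothetical query: ("big data" OR BDA*) AND organi?ation*
QUERY = ("AND", ("OR", "big data", "BDA*"), "organi?ation*")

for title in ["Big data analytics capability in organisations",
              "Organizational value of BDA",
              "BDAC adoption in firms"]:
    print(title, "->", matches(title, QUERY))
```

Here `organi?ation*` accepts both "organisation" and "organization" spellings and their suffixes, while `BDA*` also captures "BDAC"; the last title fails only because it lacks an organization term.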

Stage 3: literature sources and search approach

Research articles are screened on their keywords and abstracts, after which 8062 are retained (Table 5). Only articles whose keywords include terms such as Big Data, BDA, or BDAC, and whose abstracts focus on the organizational domain, are selected.

Stage 4: assess the quality of full papers

At this stage, each of the 161 research articles that remained after Stage 3 (Table 6) is assessed independently by the authors against several quality criteria: credibility, that is, whether the article is well presented, and relevance, that is, whether the article is situated in the organizational domain.

Stage 5: literature extraction and synthesis process

At this stage, only journal articles and conference papers are selected. Articles whose full texts were not open access were excluded, reducing our sample to 70 papers (see Footnote 1 and Table 7).
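The selection stages above act as a funnel: each stage removes records that fail its criterion, and PRISMA reporting records how many remain after each step. The Python sketch below models that funnel under stated assumptions. The `Record` fields, the keyword set, and the stage predicates are our simplified, hypothetical encodings of Stages 2 to 5 (Stage 1 defines the design rather than filtering), and the quality check is reduced to a single yes/no flag.

```python
from dataclasses import dataclass

# Illustrative keyword set; the article's full search terms are in Table 4.
TARGET_KEYWORDS = {"big data", "bda", "bdac"}

@dataclass(frozen=True)
class Record:
    title: str
    year: int
    language: str
    keywords: frozenset          # lower-case author keywords
    organizational_focus: bool   # abstract focuses on the organizational domain
    passes_quality: bool         # credibility and relevance check (Stage 4)
    venue_type: str              # "journal", "conference", "report", ...
    open_access: bool            # full text openly available

# Each stage is (label, predicate); records failing the predicate are excluded.
STAGES = [
    ("Stage 2: 2015-2021, English",
     lambda r: 2015 <= r.year <= 2021 and r.language == "English"),
    ("Stage 3: keywords and abstract",
     lambda r: bool(r.keywords & TARGET_KEYWORDS) and r.organizational_focus),
    ("Stage 4: quality assessment", lambda r: r.passes_quality),
    ("Stage 5: open-access journal or conference paper",
     lambda r: r.venue_type in {"journal", "conference"} and r.open_access),
]

def screen(records):
    """Apply the stages in order; return the survivors and the count after each stage."""
    counts = [("identified", len(records))]
    for label, keep in STAGES:
        records = [r for r in records if keep(r)]
        counts.append((label, len(records)))
    return records, counts
```

Keeping the per-stage counts, rather than only the final list, is what allows the numbers reported in Tables 5 to 7 to be reproduced from the underlying records.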

Meta-analysis of selected papers

Of the 70 papers satisfying our selection criteria, the distribution by publication year and type (journal or conference paper) reveals an increasing trend in Big Data analytics research over the last 6 years (Table 6). Additionally, journals produced more BDA papers than conference proceedings (Fig. 2), a difference that may have been affected in 2020–2021 by COVID-19, when many conferences were canceled and fewer conference proceedings were published.

Fig. 2: Distribution of publications by year and publication type

Of the 70 research articles, 6% were published in 2015, 13% in 2016, 14% in 2017, 16% in 2018, 20% in 2019, 21% in 2020, and 10% in 2021 (until May).
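A quick arithmetic check makes the year distribution concrete. The Python sketch below backs out the approximate number of articles per year implied by the reported percentages. Because the percentages are rounded, these counts are our reconstruction, not figures reported by the authors, and any single year could differ by one article.

```python
# Year shares (%) as reported in the text; 2021 covers January to May only.
SHARES = {2015: 6, 2016: 13, 2017: 14, 2018: 16, 2019: 20, 2020: 21, 2021: 10}
TOTAL_ARTICLES = 70

def implied_counts(shares, total):
    """Back out approximate article counts from rounded percentage shares."""
    return {year: round(pct * total / 100) for year, pct in shares.items()}

counts = implied_counts(SHARES, TOTAL_ARTICLES)
full_years = [counts[y] for y in range(2015, 2021)]  # exclude the partial year 2021

print(sum(SHARES.values()))   # 100: the shares cover all articles
print(counts)                 # {2015: 4, 2016: 9, 2017: 10, 2018: 11, 2019: 14, 2020: 15, 2021: 7}
print(sum(counts.values()))   # 70
print(all(a < b for a, b in zip(full_years, full_years[1:])))  # True
```

The implied counts sum back to 70 and rise in every complete year from 2015 to 2020, which is consistent with the increasing trend described above.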

Thematic analysis is used to identify, analyze, and report patterns (themes) within data, producing an insightful analysis that answers particular research questions [51].

Combining NVIVO with thematic analysis improves the results: Judger [52] maintained that computer-assisted data analysis coupled with manual checks improves the trustworthiness, credibility, and validity of findings (p. 6).
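To make the idea of computer-assisted coding with manual checks concrete, the Python sketch below tags text segments with themes from a keyword code book and sets aside any segment it cannot code for a human reviewer. This is a toy stand-in, not how NVIVO works internally: the code book, theme names, and example segments are hypothetical and do not reproduce the article's themes in Table 11.

```python
import re
from collections import Counter

# Hypothetical code book (theme -> indicative keywords); not the article's Table 11.
CODE_BOOK = {
    "decision-making": {"decision", "decisions", "insight", "insights"},
    "performance": {"performance", "efficiency", "profit", "profits"},
    "culture": {"culture", "learning", "skills"},
}

def code_segments(segments, code_book=CODE_BOOK):
    """Tag each segment with every theme whose keywords it contains.
    Segments matching no theme are set aside for manual review."""
    coded, needs_review = [], []
    for segment in segments:
        words = set(re.findall(r"[a-z]+", segment.lower()))
        themes = sorted(theme for theme, kws in code_book.items() if words & kws)
        if themes:
            coded.append((segment, themes))
        else:
            needs_review.append(segment)
    return coded, needs_review

def theme_frequencies(coded):
    """Count how many coded segments mention each theme."""
    return Counter(theme for _, themes in coded for theme in themes)

segments = [
    "BDA improves managerial decisions and firm performance.",
    "A data-driven culture supports organizational learning.",
    "Cloud storage costs are falling.",
]
coded, needs_review = code_segments(segments)
print(theme_frequencies(coded))
print(needs_review)  # ['Cloud storage costs are falling.']
```

The automated pass handles the bulk of the coding, while the review list is where the manual checks that Judger recommends would be applied.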

Defining big data

Of the 70 articles, 33 provide a clear, replicable definition of Big Data; five representative definitions are presented in Table 8.

Defining BDA

Of the 70 sample articles, 21 clearly define BDA. Four representative definitions are presented in Table 9. Some definitions emphasize the tools and processes used to derive new insights from Big Data.

Defining Big Data analytics capability

Only 16% of articles focus on Big Data characteristics; one of them identifies challenges and issues in adopting and implementing Big Data acquisition in organizations [42]. That study found that BDAC uses the large volumes of data generated by different devices and people to increase efficiency and generate more profit. The potential value of BDA capability may exceed what a business expects: [60] showed that professional services, manufacturing, and retail face structural barriers that can be overcome with the use of Big Data. We define BDAC as the combined ability to store, process, and analyze large amounts of data to provide meaningful information to users. BDAC has four dimensions: data integration, analytical, predictive, and data interpretation (Table 10).

Using NVIVO12, we identify highly relevant outstanding research issues, which we group into five themes (Fig. 3). Table 11 shows the four units of analysis used when combining NVIVO with thematic analysis: Big Data, BDA, BDAC, and BDA themes. We classify the five BDA themes manually to ensure accuracy and a detailed, appropriate interpretation, and we suggest how future researchers might approach these problems using a research model.

Fig. 3: Thematic analysis using NVIVO 12

Manyika et al. [63] considered that BDA could help an organization improve its decision-making, minimize risks, gain valuable insights that would otherwise remain hidden, create innovative business models, and improve performance.

The five themes presented in Table 11 identify limitations of the existing literature, which are examined in our research model (Fig. 4) using four hypotheses. This theoretical model identifies the organizational and individual levels as being influenced by organizational climate, culture, and capacity. The model can help explain how BDA can be used to improve organizational and individual performance.

Fig. 4: The framework of organizational development theory [64]

The research model development process

We analyze the literature using a new research method driven by the connection between BDAC and the resource-based view, which includes three types of resources used in the IS capacity literature [65, 66, 67, 68]: tangible (financial and physical), human skills (employees’ knowledge and skills), and intangible (organizational culture and organizational learning). Seven factors enable firms to create BDAC [16] (Fig. 5).

Fig. 5: Classification of Big Data resources (adapted from [16])

To develop a robust model, tangible, intangible, and human resources should be implemented in an organization and contribute to shaping the decision-making process. This research model recognizes that BDAC enhances OD, strengthening organizational strategies and the relationship between Big Data resources and OD. Figure 6 depicts a theoretical framework illustrating how BDA resources influence innovation sustainability and OD, where innovation sustainability helps identify market opportunities, predict customer needs, and analyze customer purchase decisions [69].

Fig. 6: Theoretical framework illustrating how BDA resources influence innovation sustainability and organizational development (adapted from [68])

Miller [70] considered data a strategic business asset and recommended that businesses and academics collaborate to improve knowledge of Big Data skills and capability across an organization; Miller also concluded that every profession, whether business or technology, will be affected by Big Data and analytics. Gobble [71] proposed that an organization should develop new technologies to provide the supplements necessary to enhance growth. Big Data represents a revolution in science and technology, and a data-rich smart city is the expected future that can be developed using Big Data [72]. Galbraith [73] reported the obstacles and opportunities that an organization attempting to develop BDAC might experience. We found no literature that combined Big Data analytics capability and organizational development or discussed the interaction between them.

Because little empirical evidence exists regarding the connection between OD and BDA or their characteristics and features, our model (Fig. 7) fills an important gap by directly connecting BDAC and OD, and illustrates how BDAC affects OD through the organizational concepts of capacity, culture, and climate, and their future resources. Because BDAC can assist OD through the implementation of new technologies [15, 26, 57], we hypothesize:

Fig. 7: Proposed interpretation in the research model

H1: A positive relationship exists between Organizational Development and BDAC.

OC relies heavily on OD, with OC representing a resource that requires development in an organization. Because OD can improve OC [44, 45], we hypothesize that:

H2: A positive relationship exists between Organizational Development and Organizational Capability.

The implementation or adoption of BDAC affects organizational culture [46]. Big Data enables an organization to improve inefficient practices, whether in marketing, retail, or media. We hypothesize that:

H3: A positive relationship exists between BDAC and Organizational Culture.

Because BDAC adoption can affect organizational climate, that is, the policies, practices, and measures associated with an organization's employee experience [74], and can improve both the business climate and individual performance, we hypothesize that:

H4: A positive relationship exists between BDAC and Organizational Climate.
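Hypotheses H1 to H4 each posit a positive relationship between two constructs, and the authors defer testing them to a later validation phase. As a hedged sketch of one simple way such a relationship could be checked on survey data, the Python code below runs a one-sided permutation test on the Pearson correlation between two construct scores. The scores are hypothetical, the function names are ours, and a full validation of the model would more likely rely on structural equation modeling, which this sketch does not attempt.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """One-sided test of 'correlation > 0': shuffle y to break any real link,
    and report how often a shuffled correlation is at least the observed one."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(x, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)

# Hypothetical mean Likert scores per respondent for two constructs (e.g., H4).
bdac_scores = [2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.5, 4.8]
climate_scores = [2.4, 2.6, 3.1, 3.3, 3.8, 4.0, 4.6, 4.7]

r, p = permutation_p_value(bdac_scores, climate_scores)
print(f"r = {r:.2f}, one-sided p = {p:.4f}")
```

A permutation test avoids distributional assumptions, which suits small pilot samples; the one-sided direction matches the hypotheses' claim of a positive rather than merely non-zero relationship.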

Our research is based on the need to develop a framework model in relation to OD theory, because modern organizations cannot ignore BDA, its future development, or its association with theoretical understanding. Therefore, we aim to demonstrate current trends in capabilities and provide a framework that improves understanding of BDAC for future research.

Despite the hype that surrounds Big Data, the organizational development and structure through which it yields competitive gains have remained largely underexplored in empirical studies. By conducting a systematic literature review and recording what is known to date, we distinguish the five prominent, highly relevant themes discussed in an earlier section. The research model in Fig. 7 relates these five thematic areas, shows how they affect one another's performance, and offers some ideas on how researchers could approach these problems.

The number of published papers on Big Data is increasing. Between 2015 and May 2021, the highest proportion of articles in any single year (21%) was published in 2020. These figures reflect our selection criteria: articles were retrieved from only four databases (Science Direct, Web of Science, IEEE, and Springer Link) and restricted to English-language articles on ‘Big Data in Organization’. While BDAC can improve business-related outcomes, including more effective marketing, new revenue opportunities, customer personalization, and improved operational efficiency, existing literature has focused on only one or two aspects of BDAC. Our research model (Fig. 7) represents the relationship between BDAC and OD to better understand their impacts on OC. We argue that the proposed model can enhance knowledge of BDAC and help it better meet organizational requirements, ensuring improved products and services that optimize consumer outcomes.

Considerable research has been conducted in many different contexts, such as the health sector and education about Big Data, but according to past literature, how to use BDAC within an organization for development purposes remains an open issue. The full potential of BDA and what it can offer must be leveraged to gain a commercial advantage. Therefore, we summarize past relevant literature into themes and propose a research model for business based on that literature [61].

While we explored Springer Link, IEEE, Science Direct, and Web of Science (which index high-impact journal and conference papers), the possibility exists that some relevant journals were missed. Our research is constrained by our selection criteria, including year, language (English), and peer-reviewed journal articles (we omitted reports, grey journals, and web articles).

A growing number of organizations are trying to use Big Data and organizational analytics to analyze available data and support decision-making. These organizations seek to harness the full potential that Big Data and organizational analytics offer for gaining competitive advantage. However, because Big Data and organizational analytics are generally considered a new, innovative business paradigm, there is little research on how to handle and leverage them effectively. While past literature has shown the advantages of using Big Data in various settings, there is a lack of theoretically grounded research on how best to use these solutions to gain competitive advantage. This research recognizes the need to explore BDA through a comprehensive approach. Therefore, we summarize the proposed developments related to BDA themes, for which empirical understanding is still limited.

To this end, this research proposes a new research model that relates earlier studies regarding BDAC to organizational culture. The research model provides a reference for the more extensive implementation of Big Data technologies in an organizational context. While the hypotheses in the research model are stated at a high level and can be read as an addition to the theoretical lens, they are framed so that they can be adapted for organizational development. This research offers an original perspective on the Big Data literature, because the vast majority of that literature focuses on tools, infrastructure, technical aspects, and network analytics. The proposed framework contributes to research on Big Data and its capability in organizational development by covering a gap not addressed in past literature. The research model can also be viewed as value-adding knowledge for managers and executives seeking to create benefits in their organizations through the use of Big Data, BDA, and BDAC.

We identify five themes to leverage BDA in an organization and gain a competitive advantage. We present a research model and four hypotheses to bridge gaps in research between BDA and OD. The purpose of this model and these hypotheses is to guide research toward a better understanding of how BDA implementation can affect an organization. The model will be carried into the next phase of our study, in which we will test its validity.

Availability of data and materials

Data will be supplied upon request.

Appendix A is submitted as a supplementary file for review.

Abbreviations

  • IEEE: The Institute of Electrical and Electronics Engineers
  • BDA: Big Data Analytics
  • BDAC: Big Data Analytics Capabilities
  • OD: Organizational Development
  • OC: Organizational Capacity

References

1. Russom P. Big data analytics. TDWI Best Practices Report, Fourth Quarter. 2011;19(4):1–34.

2. Mikalef P, Boura M, Lekakos G, Krogstie J. Big data analytics and firm performance: findings from a mixed-method approach. J Bus Res. 2019;98:261–76.

3. Kojo T, Daramola O, Adebiyi A. Big data stream analysis: a systematic literature review. J Big Data. 2019;6(1):1–30.

4. Jha AK, Agi MA, Ngai EW. A note on big data analytics capability development in supply chain. Decis Support Syst. 2020;138:113382.

5. Posavec AB, Krajnović S. Challenges in adopting big data strategies and plans in organizations. In: 2016 39th international convention on information and communication technology, electronics and microelectronics (MIPRO). IEEE. 2016. p. 1229–34.

6. Madhlangobe W, Wang L. Assessment of factors influencing intent-to-use Big Data Analytics in an organization: pilot study. In: 2018 IEEE 20th International Conference on High-Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE. 2018. p. 1710–5.

7. Saetang W, Tangwannawit S, Jensuttiwetchakul T. The effect of technology-organization-environment on adoption decision of big data technology in Thailand. Int J Electr Comput Eng. 2020;10(6):6412–22. https://doi.org/10.11591/ijece.v10i6.pp6412-6422.

8. Pei L. Application of Big Data technology in construction organization and management of engineering projects. J Phys Conf Ser. 2020. https://doi.org/10.1088/1742-6596/1616/1/012002.

9. Marashi PS, Hamidi H. Business challenges of Big Data application in health organization. In: Khajeheian D, Friedrichsen M, Mödinger W, editors. Competitiveness in Emerging Markets. Springer: Cham; 2018. p. 569–84. https://doi.org/10.1007/978-3-319-71722-7_28.

10. Haryadi AF, Hulstijn J, Wahyudi A, Van Der Voort H, Janssen M. Antecedents of big data quality: an empirical examination in financial service organizations. In: 2016 IEEE International Conference on Big Data (Big Data). IEEE. 2016. p. 116–21.

11. George JP, Chandra KS. Asset productivity in organisations at the intersection of Big Data Analytics and supply chain management. In: Chen JZ, Tavares J, Shakya S, Iliyasu A, editors. Image Processing and Capsule Networks. ICIPCN 2020. Advances in Intelligent Systems and Computing, vol 1200. Springer: Cham; 2020. p. 319–30.

12. Sousa MJ, Pesqueira AM, Lemos C, Sousa M, Rocha Á. Decision-making based on big data analytics for people management in healthcare organizations. J Med Syst. 2019;43(9):1–10.

13. Du G, Zhang X, Ni S. Discussion on the application of big data in rail transit organization. In: Wu TY, Ni S, Chu SC, Chen CH, Favorskaya M, editors. International conference on smart vehicular technology, transportation, communication and applications. Springer: Cham; 2018. p. 312–8.

14. Wahyudi A, Farhani A, Janssen M. Relating big data and data quality in financial service organizations. In: Al-Sharhan SA, Simintiras AC, Dwivedi YK, Janssen M, Mäntymäki M, Tahat L, Moughrabi I, Ali TM, Rana NP, editors. Conference on e-Business, e-Services and e-Society. Springer: Cham; 2018. p. 504–19.

15. Alkatheeri Y, Ameen A, Isaac O, Nusari M, Duraisamy B, Khalifa GS. The effect of big data on the quality of decision-making in Abu Dhabi Government organisations. In: Sharma N, Chakrabati A, Balas VE, editors. Data management, analytics and innovation. Springer: Singapore; 2020. p. 231–48.

16. Gupta M, George JF. Toward the development of a big data analytics capability. Inf Manag. 2016;53(8):1049–64.

17. Selçuk AA. A guide for systematic reviews: PRISMA. Turk Arch Otorhinolaryngol. 2019;57(1):57.

18. Tiwari S, Wee HM, Daryanto Y. Big data analytics in supply chain management between 2010 and 2016: insights to industries. Comput Ind Eng. 2018;115:319–30.

19. Miah SJ, Camilleri E, Vu HQ. Big Data in healthcare research: a survey study. J Comput Inform Syst. 2021;7:1–3.

20. Mikalef P, Pappas IO, Krogstie J, Giannakos M. Big data analytics capabilities: a systematic literature review and research agenda. Inf Syst e-Business Manage. 2018;16(3):547–78.

Nguyen T, Zhou L, Spiegler V, Ieromonachou P, Lin Y. Big data analytics in supply chain management: a state-of-the-art literature review. Comput Oper Res. 2018;98:254–64.


Günther WA, Mehrizi MHR, Huysman M, Feldberg F. Debating big data: a literature review on realizing value from big data. J Strateg Inf Syst. 2017;26(3):191–209.

Rialti R, Marzi G, Ciappei C, Busso D. Big data and dynamic capabilities: a bibliometric analysis and systematic literature review. Manag Decis. 2019;57(8):2052–68.

Wamba SF, Gunasekaran A, Akter S, Ren SJ, Dubey R, Childe SJ. Big data analytics and firm performance: effects of dynamic capabilities. J Bus Res. 2017;70:356–65.

Wang Y, Hajli N. Exploring the path to big data analytics success in healthcare. J Bus Res. 2017;70:287–99.

Akter S, Wamba SF, Gunasekaran A, Dubey R, Childe SJ. How to improve firm performance using big data analytics capability and business strategy alignment? Int J Prod Econ. 2016;182:113–31.

Kwon O, Lee N, Shin B. Data quality management, data usage experience and acquisition intention of big data analytics. Int J Inf Manage. 2014;34(3):387–94.

Chen DQ, Preston DS, Swink M. How the use of big data analytics affects value creation in supply chain management. J Manag Info Syst. 2015;32(4):4–39.

Kim MK, Park JH. Identifying and prioritizing critical factors for promoting the implementation and usage of big data in healthcare. Inf Dev. 2017;33(3):257–69.

Popovič A, Hackney R, Tassabehji R, Castelli M. The impact of big data analytics on firms’ high value business performance. Inf Syst Front. 2018;20:209–22.

Hewage TN, Halgamuge MN, Syed A, Ekici G. Big data techniques of Google, Amazon, Facebook and Twitter. J Commun. 2018;13(2):94–100.

BenMark G, Klapdor S, Kullmann M, Sundararajan R. How retailers can drive profitable growth through dynamic pricing. McKinsey & Company. 2017. https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-drive-profitable-growth-through-dynamic-pricing. Accessed 13 Mar 2021.

Richard B. Hotel chains: survival strategies for a dynamic future. J Tour Futures. 2017;3(1):56–65.

Fouladirad M, Neal J, Ituarte JV, Alexander J, Ghareeb A. Entertaining data: business analytics and Netflix. Int J Data Anal Inf Syst. 2018;10(1):13–22.

Hadida AL, Lampel J, Walls WD, Joshi A. Hollywood studio filmmaking in the age of Netflix: a tale of two institutional logics. J Cult Econ. 2020;45:1–26.

Harinen T, Li B. Using causal inference to improve the Uber user experience. Uber Engineering. 2019. https://eng.uber.com/causal-inference-at-uber/. Accessed 10 Mar 2021.

Anaf J, Baum FE, Fisher M, Harris E, Friel S. Assessing the health impact of transnational corporations: a case study on McDonald’s Australia. Glob Health. 2017;13(1):7.

Wired. McDonald's Bites on Big Data; 2019. https://www.wired.com/story/mcdonalds-big-data-dynamic-yield-acquisition

Bernard Marr & Co. American Express: how big data and machine learning benefits consumers and merchants. 2018. https://www.bernardmarr.com/default.asp?contentID=1263

Zhang Y, Huang T, Bompard EF. Big data analytics in smart grids: a review. Energy Informatics. 2018;1(1):8.

HBS. Next Big Sound—moneyball for music? Digital Initiative. 2020. https://digital.hbs.edu/platform-digit/submission/next-big-sound-moneyball-for-music/ . Accessed 10 Apr 2021.

Mneney J, Van Belle JP. Big data capabilities and readiness of South African retail organisations. In: 2016 6th International Conference-Cloud System and Big Data Engineering (Confluence). IEEE. 2016. p. 279–86.

Beckhard R. Organizational issues in the team delivery of comprehensive health care. Milbank Mem Fund Q. 1972;50:287–316.

Cummings TG, Worley CG. Organization development and change. 8th ed. Mason: Thomson South-Western; 2009.

Glanz K, Rimer BK, Viswanath K, editors. Health behavior and health education: theory, research, and practice. San Francisco: Wiley; 2008.

Schein EH. Organizational culture and leadership. San Francisco: Jossey-Bass; 1985.

Prestby J, Wandersman A. An empirical exploration of a framework of organizational viability: maintaining block organizations. J Appl Behav Sci. 1985;21(3):287–305.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62(10):e1–34.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Moher D. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.

Higgins JP, Green S, Scholten RJPM. Maintaining reviews: updates, amendments and feedback. Cochrane handbook for systematic reviews of interventions. 31; 2008.

Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101.

Judger N. The thematic analysis of interview data: an approach used to examine the influence of the market on curricular provision in Mongolian higher education institutions. Hillary Place Papers, University of Leeds. 2016;3:1–7.

Khine P, Shun W. Big data for organizations: a review. J Comput Commun. 2017;5:40–8.

Zan KK. Prospects for using Big Data to improve the effectiveness of an education organization. In: 2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). IEEE. 2019. p. 1777–9.

Ekambaram A, Sørensen AØ, Bull-Berg H, Olsson NO. The role of big data and knowledge management in improving projects and project-based organizations. Procedia Comput Sci. 2018;138:851–8.

Rialti R, Marzi G, Silic M, Ciappei C. Ambidextrous organization and agility in big data era: the role of business process management systems. Bus Process Manag J. 2018;24(5):1091–109.

Wang Y, Kung L, Gupta S, Ozdemir S. Leveraging big data analytics to improve quality of care in healthcare organizations: a configurational perspective. Br J Manag. 2019;30(2):362–88.

De Mauro A, Greco M, Grimaldi M, Ritala P. In (Big) Data we trust: value creation in knowledge organizations—introduction to the special issue. Inf Proc Manag. 2018;54(5):755–7.

Batistič S, Van Der Laken P. History, evolution and future of big data and analytics: a bibliometric analysis of its relationship to performance in organizations. Br J Manag. 2019;30(2):229–51.

Jokonya O. Towards a conceptual framework for big data adoption in organizations. In: 2015 International Conference on Cloud Computing and Big Data (CCBD). IEEE. 2015. p. 153–160.

Mikalef P, Krogstie J, Pappas IO, Pavlou P. Exploring the relationship between big data analytics capability and competitive performance: the mediating roles of dynamic and operational capabilities. Inf Manag. 2020;57(2):103169.

Shuradze G, Wagner HT. Towards a conceptualization of data analytics capabilities. In: 2016 49th Hawaii International Conference on System Sciences (HICSS). IEEE. 2016. p. 5052–64.

Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Hung Byers A. Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute. 2011. https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/big-data-the-next-frontier-for-innovation. Accessed XX(day) XXX (month) XXXX (year).

Wu YK, Chu NF. Introduction of the transtheoretical model and organisational development theory in weight management: a narrative review. Obes Res Clin Pract. 2015;9(3):203–13.

Grant RM. Contemporary strategy analysis: Text and cases edition. Wiley; 2010.

Bharadwaj AS. A resource-based perspective on information technology capability and firm performance: an empirical investigation. MIS Q. 2000;24(1):169–96.

Chae HC, Koh CH, Prybutok VR. Information technology capability and firm performance: contradictory findings and their possible causes. MIS Q. 2014;38:305–26.

Santhanam R, Hartono E. Issues in linking information technology capability to firm performance. MIS Q. 2003;27(1):125–53.

Hao S, Zhang H, Song M. Big data, big data analytics capability, and sustainable innovation performance. Sustainability. 2019;11:7145. https://doi.org/10.3390/su11247145.

Miller S. Collaborative approaches needed to close the big data skills gap. J Organ Des. 2014;3(1):26–30.

Gobble MM. Outsourcing innovation. Res Technol Manag. 2013;56(4):64–7.

Keller SA, Koonin SE, Shipp S. Big data and city living–what can it do for us? Signif (Oxf). 2012;9(4):4–7.

Galbraith JR. Organizational design challenges resulting from big data. J Organ Des. 2014;3(1):2–13.

Schneider B, Ehrhart MG, Macey WH. Organizational climate and culture. Annu Rev Psychol. 2013;64:361–88.


Acknowledgements

Not applicable


Author information

Authors and Affiliations

Newcastle Business School, University of Newcastle, Newcastle, NSW, Australia

Renu Sabharwal & Shah Jahan Miah


Contributions

The first author conducted the research, while the second author ensured quality standards and rewrote the entire findings, linking them to the underlying theories. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shah Jahan Miah.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Sabharwal, R., Miah, S.J. A new theoretical understanding of big data analytics capabilities in organizations: a thematic analysis. J Big Data 8, 159 (2021). https://doi.org/10.1186/s40537-021-00543-6


Received: 17 August 2021

Accepted: 16 November 2021

Published: 18 December 2021

DOI: https://doi.org/10.1186/s40537-021-00543-6


  • Organization
  • Systematic literature review
  • Big Data Analytics capabilities
  • Organizational Development Theory
  • Organizational Climate
  • Organizational Culture


A Literature Review on Big Data Analytics Capabilities



A systematic literature review on the use of big data analytics in humanitarian and disaster operations

Abhilash Kondraganti

University of Liverpool Management School, Chatham Street, Liverpool, L69 7ZH UK

Gopalakrishnan Narayanamurthy

Hossein Sharifi

Associated Data

Not applicable.

At the start of this review, 168 million individuals required humanitarian assistance; by its conclusion, the number had risen to 235 million. Humanitarian aid is critical not only for dealing with a once-in-a-century pandemic but also for providing assistance amid civil conflicts, surging natural disasters, and other kinds of emergencies. The reliability of technology in supporting humanitarian and disaster operations has never been more pertinent and significant than it is right now. The ever-increasing volume of data, together with innovations in the field of data analytics, presents an opportunity for the humanitarian sector. Given that the interaction between big data and humanitarian and disaster operations will be crucial in the years ahead, this systematic literature review offers a comprehensive overview of big data analytics in humanitarian and disaster settings. In addition to presenting the descriptive aspects of the reviewed literature, the results include a review of existing reviews and describe the current state of research by disaster category, disaster phase, disaster location, and the big data sources used. A framework is also developed to explain why researchers employ different big data sources in different crisis situations. In particular, the study uncovered considerable research gaps across disaster groups, disaster phases, and disaster regions, emphasising that the focus is on reactive interventions rather than preventive approaches. Such reactive measures alone will merely compound a crisis, as has been the reality in many COVID-19-affected countries. Implications for practice and policy-making are also discussed.

Introduction

Humanitarian crises have been on the rise (UN OCHA, 2020) and, owing to the increasing complexity of human societies, are threatening livelihoods more than ever. According to UNDRR (2020b), the number of natural disasters doubled from the period 1980–1999 to the period 2000–2019. Responding to these events and crises is extremely costly for global society; for example, according to the Financial Tracking Service (2021), funding requirements in 2020 were estimated at $38.54 billion. While securing such funds is increasingly challenging, the bigger issue is operational cost-effectiveness, which would prevent excessive reliance on funding alone.

As in most aspects of modern society, new and emerging technologies have been a major part of new solutions to old and new problems. For instance, disaster relief operations are mainly logistical, accounting for 60 to 80% of total humanitarian relief spending (Lacourt & Radosta, 2019; Van Wassenhove, 2006). Owing to a lack of analysis and the duplication of relief efforts, an estimated 35 to 40% of these logistical expenses are wasted (Day et al., 2012; Kwapong Baffoe & Luo, 2020). The crucial, uncertain, and intricate nature of field operations necessitates swift decision-making (Knox Clarke & Campbell, 2020). Furthermore, the field of humanitarian and disaster operations (HDO) is diversifying with the engagement of individuals, including volunteers and crowdsourcing participants, who may not be closely associated or affiliated with humanitarian organisations and who lack adequate training. As a result, the deployment of new technologies, particularly big data analytics (BDA), has become a critical component in resolving concerns about collaboration, efficiency, and efficacy in crisis and relief operations (Dubey et al., 2019; Jeble et al., 2019; UN OCHA, 2021). HDO has undergone considerable transformations over the years, from traditional volunteers to digital volunteers (Behl et al., 2021a, 2021b), and from conventional donations to technology-driven crowdfunding platforms (Behl & Dutta, 2020; Behl et al., 2021a, 2021b).
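Read together, the two ranges cited above imply a rough share of total relief spending that is lost to wasted logistics. The arithmetic below is a back-of-the-envelope sketch, not a figure reported by the cited studies; it assumes the two ranges can simply be multiplied at their low and high ends.

```latex
\begin{aligned}
\text{lower bound} &= \underbrace{0.60}_{\text{logistics share}} \times \underbrace{0.35}_{\text{wasted share}} = 0.21,\\
\text{upper bound} &= 0.80 \times 0.40 = 0.32.
\end{aligned}
```

Under that assumption, roughly 21% to 32% of total humanitarian relief spending would be lost to wasted logistical expenses.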

Evidently, the application of BDA in the humanitarian and disaster sector has been rare (Centre for Humanitarian Data, 2019) and lags well behind the commercial business sector. Moreover, the usefulness of these technologies has been a matter of debate. For example, while Swaminathan (2018) argues that incorporating BDA may enable humanitarian organisations to achieve operational improvements, Sharma and Joshi (2019) contend that data do not accurately reflect the situation on the ground and that relying heavily on BDA may undermine humanitarian operations. One line of disagreement concerns the social and human sides of humanitarian operations, where the core humanitarian principle of being humane (UN OCHA, 2010) may be challenging to uphold in a data-driven, non-human context.

In our study, HDO are defined as operational activities involved in any stage of a humanitarian crisis or disaster, including mitigation initiatives, preparedness efforts, relief-related activities, and recovery-associated actions. The future of HDO in light of BDA is an important topic that has been only partially addressed, leaving many questions and challenges open. While the literature on the subject has been growing, it still does not encompass all of the existing collective views, challenges, aspects of the use of new technologies (here, BDA), and ways ahead for the sector. Such challenges are further intensified by the scope and depth of the problems at hand, including the types of disasters; contextual aspects of the problem, such as geographical, social, and economic issues; and the complexity of adopting, successfully applying, and managing the implications of new technologies.

Academic research on the use of BDA in HDO has yet to mature. The use of technology in HDO surged at one point, particularly after the 2010 Haiti earthquake (Burns, 2015; Ragini et al., 2018; Read et al., 2016; Sandvik et al., 2014), but has largely remained a point of discussion. A decade later, another disaster, the unprecedented COVID-19 pandemic, has brought attention back to BDA, as data-driven decision-making has increased significantly (Gazi & Gazis, 2020). But what has happened in the last 10 years, how far has the field come, and what key issues should the research community consider that need new insights and answers? This research addresses these questions by reviewing the state of academic research on BDA in HDO. The article aims to deliver a thorough review of the subject matter as well as insights into areas on which future research could focus. The main objective of this review is to examine and evaluate how BDA has been employed across disasters, disaster phases, and disaster locations in the field of HDO, in order to assess how far the research has progressed so far. To achieve this, three research questions (RQs) were designed for this study:

  • RQ1. How has the research on the application of BDA for HDO evolved over time?
  • RQ2. What is the status of the BDA application across different disaster categories, disaster phases, disaster locations, and what different types of big data have been used?
  • RQ3. What are the key theoretical lenses used to examine and explain BDA application in HDO?

The study contributes to the subject domain by offering a research background on the state of disaster types and a review of extant literature reviews in the field. The method used to undertake the review, including the review protocol, search strategy, and article quality assessment, is outlined in the following section. The outcomes of the review are reported and discussed in the sections that follow. Finally, the paper offers areas for further research to enhance the application of BDA in the HDO sector and notes the review's limitations.

Research background

This research integrates humanitarian crisis operations and disaster operations. Humanitarian operations take place to alleviate human suffering where local mechanisms are inadequate to provide the necessary assistance (ReliefWeb, 2008). Disaster operations, on the other hand, include activities carried out before, during, and after a disaster to save lives, reduce economic damage, and restore normalcy (Altay & Green III, 2006). This section discusses the current state of different disaster categories and evaluates existing literature reviews in the field to assess the field's progress.

State of the disaster types

Before the review proper, this study needs to establish what types of disasters exist and how they have been classified over the years, so that the disasters reported in the reviewed articles can be recorded against a standardised list. Moreover, adhering to a standard list of disasters leads to better reporting and easier comparisons.

The exploration revealed that there is no single norm for disaster types. Scholars initially classified disasters into two types, ‘natural’ and ‘man-made’ (Berren et al., 1980; de Boer, 1990) or ‘natural’ and ‘human-induced’ (Gray, 1982), but new types, ‘industrial’ (Taylor, 1990) and ‘hybrid’ (Shaluf et al., 2001; Shaluf, 2007a, 2007b), were added to the list in later years. In their review of disasters in operations management, Altay and Green III (2006) divided disasters mainly into natural and man-made, and the follow-up review by Galindo and Batta (2013) retains the same division. These were altered again in the last decade, with the description changing to natural and human-made or human-induced disasters (Khan et al., 2020). Disasters in the twenty-first century are not static: humanity has witnessed, and continues to witness, new and different kinds of modern disasters (De Smet et al., 2012). Hence, the types of disasters keep changing over the years. Eshghi and Larson (2008) reviewed twentieth-century disasters to build a new classification and explained that the variance in the initial classifications stems from differences in how disasters and their impacts were described. Although the categorisation is inconsistent and changes over time, the natural and human-induced categories are commonly used and considered broad generic groups.

As Lukić et al. (2013) suggested, natural disasters can be categorised by the physical cause of the incident. Further, a common classification is necessary for global standards, as it helps assess disasters without hazard bias, threshold bias, or accounting bias. Guha-Sapir and Below (2002) assessed and compared three well-known global disaster datasets: EM-Dat (by CRED), NatCatSERVICE (by Munich Re), and Sigma (by Swiss Re). One of the key issues that surfaced from this comparison is the lack of standardised methods and definitions, differences mainly attributed to discrepancies in disaster typology. To overcome this, the EM-Dat and NatCatSERVICE databases came together to implement a standard disaster classification, which was reviewed and agreed upon by other databases and OCHA (Wirtz et al., 2014). The new classification provides two generic categories, natural and technological, which together cover the entire disaster spectrum. The first generic category, natural disasters, is further divided into six groups, namely biological, climatological, extraterrestrial, geophysical, hydrological, and meteorological. The second generic category, technological disasters, takes the place of human-induced disasters and covers three groups: industrial, transport, and miscellaneous (Guha-Sapir, 2008). The new classification hierarchy is built on a ‘triggering event’ logic (Below et al., 2009). The same classification has been used in CRED’s annual disaster statistical review since the 2007 report and has been followed by many other databases.

However, the Integrated Research on Disaster Risk (IRDR) programme, sponsored by the United Nations Office for Disaster Risk Reduction (UNDRR), tested the operational viability of the new classification provided by CRED and Munich Re in national databases and concluded that implementing it in national databases is difficult. The reason given was that national databases operate primarily at the peril level, whereas the CRED classification is a top-down approach in which each bottom-level disaster type is associated exclusively with one sub-type, and therefore with one main type, as shown in Fig. 1. This led IRDR to revise the existing framework. In the revised classification, the relationship between a peril and the main disaster event is not exclusive, meaning that a peril can be linked to multiple main-event disaster categories, as illustrated in Fig. 1. The main-level classification of natural disasters, however, remains the same in the IRDR classification (IRDR, 2014).

Fig. 1. Disaster classification: CRED (2008) by Guha-Sapir (2008) versus IRDR (2014). Source: compilation by author
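The difference between CRED's exclusive hierarchy and IRDR's non-exclusive peril links is, structurally, the difference between a tree and a many-to-many relation. The Python sketch below illustrates only that structural difference. The peril names (`peril_x`, `peril_y`) and their group links are hypothetical placeholders, not entries from either classification, and `perils_by_group` and `is_exclusive` are illustrative helpers.

```python
from collections import defaultdict

# Hypothetical peril names for illustration only; they are NOT entries
# from the CRED (2008) or IRDR (2014) classifications.

# CRED-style (exclusive): each peril has exactly one parent group,
# so the structure is a tree and a plain dict of strings suffices.
cred_style = {
    "peril_x": "geophysical",
    "peril_y": "hydrological",
}

# IRDR-style (non-exclusive): a peril may link to several main-event
# groups, so each peril maps to a set of groups (many-to-many).
irdr_style = {
    "peril_x": {"geophysical", "hydrological"},
    "peril_y": {"hydrological"},
}


def perils_by_group(mapping):
    """Invert a peril -> group(s) mapping into group -> sorted perils."""
    inverted = defaultdict(set)
    for peril, groups in mapping.items():
        # Treat a single parent (tree case) as a one-element set.
        if isinstance(groups, str):
            groups = {groups}
        for group in groups:
            inverted[group].add(peril)
    # Sort groups and perils so the output is deterministic.
    return {group: sorted(perils) for group, perils in sorted(inverted.items())}


def is_exclusive(mapping):
    """True if every peril belongs to exactly one group (a strict hierarchy)."""
    return all(isinstance(g, str) or len(g) == 1 for g in mapping.values())


print(perils_by_group(cred_style))
print(perils_by_group(irdr_style))
print(is_exclusive(cred_style), is_exclusive(irdr_style))
```

In the tree-shaped mapping each peril sits under exactly one group; in the many-to-many mapping `peril_x` appears under two groups, which is the kind of link the revised IRDR scheme permits.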

The bottom level is a highly uneven segment of the disaster typology; it changes from one event to another depending on the specific occurrence and the causes of loss. A great deal of work has gone into CRED's disaster classification since the beginning of the twenty-first century, and its initiative through EM-Dat to improve and standardise the classification has opened doors for academics and United Nations (UN) organisations to implement the classification in their own areas of work.

IRDR focuses only on natural events, whereas UNDRR’s latest work takes an all-event approach following the Sendai Framework (UNDRR, 2020a). The new list avoids a hierarchical approach, arguing that a hierarchical structure cannot adequately capture the dynamic relationships between events, and instead uses a non-hierarchical, or flat, list (UNDRR, 2020a). Figure 2 depicts the generic and group-level disasters in CRED, IRDR, and UNDRR.

Fig. 2. Comparison of the first-level classification between CRED, IRDR, and UNDRR.

Global disaster databases and UNDRR classify disasters by their causative dimension, and this has been the popular choice. This study does not use the peril-level classification to categorise the disasters in articles; it considers only the generic disaster group (e.g. natural disaster) and the first-level disaster group (e.g. geophysical). This review uses CRED’s classification of natural disasters because it is simple, distinguishes among all natural disasters, and, more importantly, separates them from non-natural disasters. The remaining disasters in the review are identified as human-induced disasters.
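The coding rule described above can be expressed as a small lookup. The sketch below is a hypothetical illustration, not the authors' actual coding procedure: `CRED_NATURAL_GROUPS` simply lists the six CRED natural groups named earlier in this section, `generic_group` is an illustrative helper, and the example inputs are invented.

```python
# The six natural-disaster groups of CRED's classification (Guha-Sapir, 2008),
# as listed earlier in this section.
CRED_NATURAL_GROUPS = frozenset({
    "biological", "climatological", "extraterrestrial",
    "geophysical", "hydrological", "meteorological",
})


def generic_group(first_level_group: str) -> str:
    """Return the generic category for a first-level disaster group.

    Groups in CRED's natural list are coded 'natural'; any other group
    (e.g. CRED's technological groups) is coded 'human-induced', mirroring
    the rule described in the text. Matching ignores case and whitespace.
    """
    group = first_level_group.strip().lower()
    return "natural" if group in CRED_NATURAL_GROUPS else "human-induced"


# Hypothetical article codings, used only to show the rule in action.
for g in ["Geophysical", "hydrological", "transport", "biological"]:
    print(g, "->", generic_group(g))
```

A flat set lookup is enough here because the review works only at the generic and first-level group, not at the peril level.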

Review of reviews

There were no reviews in the field of BDA and HDO before 2016. Although research in the field has been marginal over the years, it has recently accelerated as a result of the volatile world we now live in. Furthermore, the discipline is becoming more interconnected and multidisciplinary, making it difficult to keep up with ongoing research and remain at the cutting edge (Snyder, 2019). This research identified 13 review studies and literature surveys conducted thus far; examination shows that 77% of them are not comprehensive. That is, they look at only one type of disaster (Balti et al., 2020), one particular disaster phase (Cumbane & Gidófalvi, 2019), one form of big data source (Wang & Ye, 2018), or one element of disaster (Sarker et al., 2020a), or at the combination of multiple technologies (Khan et al., 2020). Table 1 summarises all thirteen studies identified and briefly describes each review's emphasis. The identified reviews take several forms, including systematic literature review (SLR), literature review (LR), literature survey (LS), and systematic literature survey (SLS).

Table 1. Total number of existing reviews in the field.
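The 77% figure is consistent with the counts reported in this section. Assuming that the non-comprehensive reviews are exactly the 13 identified reviews minus the three full-scale reviews named in the next paragraph, the share works out as follows:

```latex
\frac{13 - 3}{13} = \frac{10}{13} \approx 0.769 \approx 77\%
```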

A review study covering the intersection of BDA and HDO is considered a full-scale review. Full-scale literature reviews are very few in this field, and only three have been identified from the list: Akter and Wamba (2019), R1; Gupta et al. (2019), R2; and Sharma and Joshi (2019), R3. The first two reviews, conducted around the same time, offered more positive perspectives, outlining the benefits of big data and emphasising the need for its use in humanitarian and disaster settings. The third review attempted to present the challenges and negative effects of using big data in relief operations. The search string used to shortlist studies is clearly stated in R1 and R2, but not in R3, which states that studies were obtained from several databases but does not report the search string that would allow scholars to reproduce the results for further verification. The review period varied across these papers: R2 and R3 did not restrict themselves to a specific period, whereas R1 acknowledges that it found very few articles before 2010 and therefore focused on studies published between 2010 and 2017. The primary reason for undertaking an SLR in addition to these three papers is that close to 70% of articles on the review topic were published in the last 3 years, that is, after the research for R1 and R2 was conducted. More specifically, our study contains only seven articles that were reviewed in either R1 or R2. This further confirms the need for an SLR in the field that revisits the review areas not covered in R3 (even if they were covered in R1 and R2), such as Classification: by research methodologies, Classification: by disaster phase, Disaster occurrence (year), and Theoretical underpinnings.
The management subject area has been largely overlooked: R1 includes just over five papers from the management field, and R2 fewer than five. The current study fares marginally better, with roughly 10% of papers coming from the management domain, but this is nowhere near the share of the two most preferred subject areas in the field.

Although the inclusion and exclusion criteria of the first two reviews share a few similarities and differ in many respects, both studies discuss theoretical underpinnings. R1 highlighted the under-representation of theories in the field and suggested a few theories as a future research direction. R2, by contrast, set out to understand the field through an organisational theory lens as one of its research objectives. We cannot overstate the importance of theory in the field of BDA in HDO. Although R3 does not report its search criteria, an essential element of any review, it stands out in this nascent field because it offers a different view of big data in humanitarian relief: its negative effects. The review divided articles into three groups: supportive, mixed, and critical. Drawing on the critical group, it discussed eight challenges, including ethical concerns, errors caused by language or culture, and issues with the existence of big data itself.

Table 2 compares the three full-scale reviews with the current review to show how full-scale reviews are advancing in the field of BDA and HDO. The assessment is based on the review results and on how the authors classified the extant research. The review area labelled 'distribution' in the table is descriptive: it reports how articles are distributed in the database each review drew on. Because descriptive results were not considered for this review, we do not present any distribution categories. The remaining review areas, highlighted in bold, compile the outcomes that emerged from reviewing each set of papers. As the first full-scale review in the field, R1 mostly produced a basic analysis, while also classifying papers into three review areas. R2, which focused on humanitarian supply chain papers, provided fewer descriptive outcomes and more analysis, identifying enablers of and concerns about big data. R3, published after the first two, is the least descriptive and classified articles by real disaster cases and by the data sources used. This review provides a comprehensive overview of the field, incorporating the lessons learnt from the previous three reviews. The reader should bear in mind that the table does not in any way measure the quality of these reviews.

Table 2. Comparison between three reviews in the field.

This study identified seven review areas to examine, chosen to represent the two review themes, HDO and BDA. From the perspective of disaster and humanitarian crisis management, it is essential first to analyse an event in terms of what it is (disaster type), what stage it is in (disaster phase), and where it occurred (disaster location). We added when it occurred (disaster year) to observe whether scholars choose recent or historical events. From the standpoint of BDA, we are interested in the types of big data (sources of big data) used or examined in previous studies. Finally, because we still regard this as a nascent field, we examined the types of research undertaken (research methodologies) and the theories applied (theoretical underpinnings) to assess how far the field has come.

Methodology

A systematic literature review (SLR) is a research approach used to gather and critically evaluate the current state of knowledge on a study topic in order to address research questions. An SLR was chosen for four reasons. First and foremost, it brings clarity to the overall process through a review protocol and a carefully planned search strategy (Booth et al., 2012). Second, the authors wished to prevent bias in performing the study, particularly selection and publication bias, and SLR principles help to reduce such bias and produce more accurate results (Becheikh et al., 2006). Third, the review process must be transparent throughout (Booth et al., 2012), and fourth, it must be reproducible by other researchers interested in extending this research (Booth et al., 2012). This review adopts the principles of Denyer and Tranfield (2009) and Tranfield et al. (2003), two commonly employed SLR approaches in management that are also preferred in the operations and supply chain domain (El Baz et al., 2019; Gligor & Holcomb, 2012; Tachizawa & Wong, 2014).

Review protocol

The review protocol guides the second stage of the SLR process shown in Fig. 3, 'conducting a literature review', which is the core of this research. The goal of the protocol is to eliminate researcher bias (Tranfield et al., 2003); therefore, a search strategy with a clear set of rules was used to find the relevant journal articles. Choosing an appropriate citation database also facilitates the search for existing literature, and Scopus was selected for this review. Scopus is regarded as the most comprehensive multidisciplinary database, with more journal coverage than Web of Science (Aghaei Chadegani et al., 2013).

Fig. 3. Systematic literature review process.

Source: Adapted from Tranfield et al. (2003) and Denyer and Tranfield (2009)

Search strategy

The efficacy of an SLR depends on the search strategy used to shortlist academic literature through inclusion and exclusion criteria (Snyder, 2019). Using Boolean operators, we created a Scopus search string that captures both BDA and HDO. The authors were aware that adding more keywords could significantly narrow the search and omit relevant literature, so the search string was kept as broad as feasible rather than restrictive. Because the field is evolving rapidly, the keywords were selected with care. BDA is split into two terms, 'big data' and 'analytics', because some papers may use either term in the keywords, abstract, or title rather than the complete phrase. These keywords are linked with two others, 'humanitarian' and 'disaster', which represent the field of HDO. The complete search string is shown below.

(("analytics" AND "humanitarian") OR ("analytics" AND "disaster") OR ("big data" AND "humanitarian") OR ("big data" AND "disaster"))
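The search string is a disjunction of four AND pairs, which is logically the same as requiring at least one BDA term and at least one HDO term. The Python sketch below, offered only as an illustration, checks that equivalence over every combination of term presence and then applies the factored form to a piece of record text. It uses naive case-insensitive substring matching, which is an assumption and only approximates how Scopus matches quoted phrases; the function names and sample records are hypothetical.

```python
from itertools import product


def search_string(a: bool, b: bool, h: bool, d: bool) -> bool:
    """The string as written; each flag means the term occurs in the record.

    a = "analytics", b = "big data", h = "humanitarian", d = "disaster".
    """
    return (a and h) or (a and d) or (b and h) or (b and d)


def factored(a: bool, b: bool, h: bool, d: bool) -> bool:
    """Distributed form: (analytics OR big data) AND (humanitarian OR disaster)."""
    return (a or b) and (h or d)


def matches(record_text: str) -> bool:
    """Hypothetical local filter using plain substring checks (not Scopus's rules)."""
    text = record_text.lower()
    return factored("analytics" in text, "big data" in text,
                    "humanitarian" in text, "disaster" in text)


# The two forms agree on all 16 combinations of term presence.
assert all(search_string(*flags) == factored(*flags)
           for flags in product([False, True], repeat=4))

print(matches("Big data analytics in disaster response"))   # True
print(matches("Predictive analytics for retail demand"))    # False
```

The factored form makes the design choice visible: a record needs only one of the two BDA terms, not the full phrase, which is why splitting BDA into 'big data' and 'analytics' broadens the search.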

The search criteria, shown in Table 3, consist of five levels applied within Scopus to select relevant articles. Entering the search string returned 1,563 articles at the first level. Because the field is multidisciplinary, five suitable subject areas were included at level two, reducing the total to 1,354. Only peer-reviewed articles are considered in this review, which limited the total to 483 at level three. The rationale for analysing only published material is that it can improve the review's quality, because most publications undergo a thorough peer-review process (Light & Pillemer, 1984). Restricting the search to journal papers at level four reduced the number to 468. Finally, filtering for papers written in English yielded 417 articles. Although no constraint was placed on publication year, the earliest paper dates back to 2009, as seen in Fig. 4. Data collection began in April 2020, with the first search conducted on April 29th and follow-up searches on July 23rd and December 31st of the same year to update the sample.
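The five filtering levels form a funnel. The short Python sketch below uses only the counts reported above to compute how many records each level removed and the share retained; the level labels are paraphrases chosen for illustration, not Scopus field names.

```python
# Article counts after each Scopus filtering level, as reported in the text.
LEVELS = [
    ("search string", 1563),
    ("five subject areas", 1354),
    ("peer-reviewed articles", 483),
    ("journal papers", 468),
    ("English language", 417),
]


def funnel(levels):
    """For each level after the first, return (label, removed, share retained)."""
    rows = []
    for (_, prev), (label, cur) in zip(levels, levels[1:]):
        rows.append((label, prev - cur, cur / prev))
    return rows


for label, removed, kept in funnel(LEVELS):
    print(f"{label:<24} removed {removed:>4}  kept {kept:.1%}")

overall = LEVELS[-1][1] / LEVELS[0][1]
print(f"overall retained: {overall:.1%}")  # overall retained: 26.7%
```

The largest cut, 871 records, comes at level three, when the sample is restricted to peer-reviewed articles; overall, about 26.7% of the initial 1,563 records reach the abstract screening stage.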

Fig. 4. Research publication over the years by the number of articles.

Table 3. Search criteria results in Scopus.

Abstract and full-text review

The search results were further shortlisted using inclusion and exclusion criteria. The abstracts of all 417 papers were studied closely, and where the authors judged an abstract insufficient to establish an article's relevance, the full text was consulted. This procedure eliminated around 62% of the papers, leaving 160 for full-text review. One example of an omitted article is 'Predicting Heart Diseases from Large Scale IoT Data Using a Map-Reduce Paradigm' (Abd & Manaa, 2020). Although this article does not discuss humanitarian or disaster operations, it appeared in the results because its abstract contains the key terms 'big data' and 'disaster'.

This study is concept-centric, with a framework designed to capture the key themes in each study to achieve comprehensiveness (Webster & Watson, 2002). For full-text papers, the inclusion criterion rests on a single question: 'Is the article at the intersection of BDA and HDO?' Articles were evaluated by classifying them into three distinct categories. As Table 4 shows, category one contains the publications most relevant to the study topic. For instance, Dubey et al.'s (2018) article 'Big data and predictive analytics in humanitarian supply chains: Enabling visibility and coordination in the presence of swift trust' focuses on both humanitarian operations and BDA and is therefore listed in category one. Category two contains marginally relevant articles, such as 'Disaster management in the digital age' (Talley, 2020), which discusses various technologies that can be used in disaster management, including BDA. Articles in category three are unrelated and do not contribute to this review. For example, Mann's (2018) paper, 'Left to Other Peoples' Devices? A Political Economy Perspective on the Big Data Revolution in Development', shifts the focus of data for development (D4D) to economic development and is therefore placed in category three. This review considered the 86 studies in categories one and two, 13 of which were reviews. We report findings for the remaining conceptual, empirical, and model papers, totalling 73 articles.
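The screening counts can be cross-checked with simple arithmetic. The Python sketch below uses only the figures reported in this sub-section; note that the size of category three (74) is not stated explicitly but follows from the 160 full-text papers minus the 86 in categories one and two, assuming each full-text paper was assigned to exactly one category.

```python
def screening_counts(screened: int, full_text: int, included: int, reviews: int) -> dict:
    """Derive the screening figures from the reported totals.

    screened  : abstracts screened (after the Scopus filters)
    full_text : papers kept for full-text review
    included  : papers in categories 1 and 2
    reviews   : review papers among the included ones
    """
    return {
        "eliminated_at_abstract": screened - full_text,
        "eliminated_share": (screened - full_text) / screened,
        "category_3": full_text - included,  # assumes one category per paper
        "analysed": included - reviews,      # conceptual, empirical, and model papers
    }


counts = screening_counts(screened=417, full_text=160, included=86, reviews=13)
print(counts["eliminated_at_abstract"], f"{counts['eliminated_share']:.0%}")  # 257 62%
print(counts["category_3"], counts["analysed"])                              # 74 73
```

The derived values agree with the reported elimination rate of around 62% and with the final sample of 73 articles.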

Table 4. Full-text review results.

This section reports the main findings from the final set of papers, identified using the fit assessment criteria, and organises them into seven review areas across six sub-sections. The first sub-section outlines which disasters receive the most attention and where research is inadequate. The second reveals which disaster phases are most popular among academics. The third focuses on disaster locations, on how many studies examine real-world disasters, and on which disaster groups these belong to. The fourth covers the big data sources used in the research and which of them are common in each disaster phase. The fifth briefly discusses studies that draw on theories, and the final sub-section presents the research methodologies used in the articles.

Disaster categories

Scholars have placed more emphasis on natural hazards: as Fig. 5 shows, natural disasters account for more than half of the disasters examined in the literature. Within the first generic group, 'natural disasters', scholars studied geophysical disasters such as volcanic activity, earthquakes, and tsunamis, along with hydrological disasters such as floods and heavy rain. Floods and earthquakes are researchers' predominant choices in this category of sudden-onset disasters. Interest in geophysical disasters revolves around situational awareness prior to the disaster (Amato et al., 2019), public emotion (Yang et al., 2019), supply chain resilience (Papadopoulos et al., 2017), and information exchange behaviour (Li et al., 2018). Demand estimation for shelters (X. Zhang et al., 2020b) and the development of an information system to help logistics operations reach affected people (Warnier et al., 2020) were also prioritised. Scholars investigated various aspects of the hydrological disaster group, including responding to the disaster through sentiment analysis (Ragini et al., 2018), bridging the information gap between responding organisations (van den Homberg et al., 2018), and understanding the severity of the disaster (Kankanamge et al., 2020). Academics in the hydrological group were also interested in forecasting disasters (Puttinaovarat & Horkaew, 2019) and estimating the need for relief supplies (Lin et al., 2020). Researchers are also paying attention to another sudden-onset disaster group, meteorological disasters, which include hurricanes and typhoons.
Research in this group focuses on understanding the needs of affected people and how their priorities change (Malawani et al., 2020), examining societal impacts (Zhang et al., 2020a), understanding human activities during disasters (Liu et al., 2020), identifying sociodemographic factors that influence disaster response (Fan et al., 2020), and studying public behaviour (Chae et al., 2014). Research in these three disaster groups is diverse and has received much emphasis, not only because these disasters are more common but also because of the economic damage and fatalities they inflict. The work in these three groups is entirely empirical and model-based, and most of it (77%) focuses on real disaster cases.

Fig. 5. Disaster group in articles, separated by generic and first-level disaster group.

Climatological disasters have received very little attention, with work focusing on wildfires and a study on heatwaves. Research on the biological disaster group is negligible, with only one publication addressing an epidemic crisis. A couple of studies examined multiple disasters within the natural disaster generic group, with earthquakes among them. The remaining papers in the independent category of natural disasters are generic and not specific to any disaster group. Human-induced disasters have rarely been examined, even though, according to the Swiss Re (2021) report, 37% of disasters reported in 2018 were caused by humans, with a 10-year average of more than 30%. Bahir and Peled (2016) attempted to identify the location of conflict by analysing textual messages, whereas Rogstadius et al. (2013) and Zhang et al. (2016) studied situational awareness during civil war and riots, respectively. The human-induced disaster segment holds potential: big data such as satellite imagery and mobile data could help those working in the field monitor how a situation is developing and act more effectively. These talking points, however, must be translated into better research and then tested in the field. Several articles did not cover either generic disaster group; these comprise conceptual and empirical work mainly related to the humanitarian supply chain in general, ethics, and privacy. In one publication, the technology was evaluated in a non-disaster context, so it was not allocated to any disaster group. There are also articles covering both natural and human-induced disasters, most of which are general and discuss humanitarian principles (Sandvik et al., 2017) and humanitarian data sets (Bell et al., 2021).

Disaster phase

In previous research, disaster occurrences and scenarios are divided into four phases: mitigation, preparedness, response, and recovery (Cumbane & Gidófalvi, 2019; Kankanamge et al., 2020; Sarker et al., 2020a). Figure 6 shows the distribution of articles across these four stages, together with additional categories. The combined number of articles on mitigation, preparedness, and recovery is less than a third of the number on the response stage, a drastic imbalance between the four stages. Work in the mitigation phase has so far focused on two aspects. The first is nowcasting disaster impact and forecasting disasters to mitigate significant risks (Avvenuti et al., 2017; Puttinaovarat & Horkaew, 2019; Qayum et al., 2020), and the second is gaining a better understanding of people's emotions and situations to help minimise the impact (Amato et al., 2019; Yang et al., 2019; Zamarreño-Aramendia et al., 2020). In the preparedness phase, Bag et al. (2021) sought to identify the barriers to employing BDA in the humanitarian supply chain and their interrelationships. This empirical work is timely because research on the preparedness stage is scarce in the context of BDA in HDO, and it should help to broaden the conversation. Research on preparedness for sudden-onset disasters in particular needs to increase considerably: because the preparation window is much shorter in such cases, near-real-time and real-time data matter more. Response has been scholars' top priority over the years, as it has been for practitioners and policymakers. The response phase is the most intensive, and established mechanisms are more likely to be overwhelmed in this phase than in any other.
Accordingly, extant disaster response articles cover all disaster groups except biological, use all types of data sources, and span all regions. The focus needs to shift, however, because acting early can have substantial benefits for HDO. According to a Boston Consulting Group (2015) report, the financial benefit can be as much as double: spending one dollar before a disaster can save two dollars during the response, and early action can also save one week of response time on average. Such anticipatory action may also save lives. Articles focusing on more than one phase are categorised as 'multiple', and all of them use the same big data source, social media (SM). Although this segment combines multiple phases, all of its articles centre on response and recovery (crisis management), with only one study also covering preparedness. In these crisis management phases, authors focus on evaluating the sentiments of affected people, the severity of disaster damage, and data management procedures. The work of Shan et al. (2019) is notable because their model for measuring disaster damage evaluates both physical and emotional damage to people in real time.

Fig. 6. Articles based on disaster phase.

A significant number of studies address the complete disaster cycle, nearly half of them conceptual papers. Most of this work is general discussion, alongside articles on organisational mindfulness (Amaye et al., 2016) and group privacy (Gerdes, 2020). Also worth noting is the work of Iglesias et al. (2020) on a reference architecture for big data and the critical components the system requires. The authors further highlighted potential uses of big data in each crisis phase for a variety of tasks, based on the core capabilities of the National Response Framework. The empirical and model work, on the other hand, concentrated on significant conditions across phases, such as coordination in the supply chain (Dubey et al., 2018, 2019), crisis communication (Jin & Spence, 2020; Kibanov et al., 2017), and public behaviour (Chae et al., 2014). A considerable number of publications did not examine any disaster phase and are therefore classified into the independent category. This category includes papers on the hype around big data (Read et al., 2016), challenges (Bell et al., 2021), big data in digital humanitarian practices (Burns, 2015), and the ethics of big data (Taylor, 2016).

Disaster locations and occurrence

Disasters can strike any region, and no place is immune, especially to natural disasters. However, some regions suffer more severely, in both economic damage and human casualties. Asia remains disaster-prone and is one of the world's most severely affected regions (Swiss Re, 2021), so it is unsurprising that scholars favour events in this region, as Fig. 7 shows. The Americas are the second most studied region; however, academics favour the United States over South America, with seven of eight articles focusing on the United States, particularly on hurricanes. One reason for the strong emphasis on North America is economic loss. Although human losses in North America are among the lowest in the world, its economic losses are the highest, accounting for nearly 52% of global losses (Swiss Re, 2021). Other regions, namely Europe, Oceania, and Africa, have received much less attention. Africa has the second-highest disaster-related mortality after Asia (Swiss Re, 2021), yet it receives the least attention in this domain, even though disaster and humanitarian assistance there can be crucial. While a couple of papers focus on multiple locations (Chaudhuri & Bose, 2020; Mulder et al., 2016), the vast majority fall into the independent category, which includes all conceptual and some empirical articles whose investigation is not location-driven. Scholars studied disaster areas using a variety of technological platforms, including data processing and analytical tools such as Apache Spark (Avvenuti et al., 2018; Ragini et al., 2018), ScatterBlogs (Thom et al., 2016), and Weka (Kankanamge et al., 2020), along with programming languages such as R (Malawani et al., 2020; Sangameswar et al., 2017) and Python (Shan et al., 2019; Warnier et al., 2020).

Fig. 7. Region of disaster.

Figure 8 categorises disasters by the year in which they occurred. Scholars selected disasters that occurred between 2011 and 2019, and the average gap between a disaster and the publication of research on it is three and a half years. The 2012 Atlantic hurricane season was one of the most expensive on record, and the fact that more studies selected disasters from 2012 than from any other year (Fig. 8) reflects increased scholarly interest in Hurricane Sandy in the United States. Disasters that spanned two different years also featured in a relatively large number of papers. Because relatively little research uses actual disasters as cases, the general category contains a large number of publications that do not focus on real-world disasters. The articles that did study actual disasters are broken down by region in Appendix 2 to show which disasters were most prominent in each region. Except for biological disasters, each group has at least one study of a real disaster. Research on climatological disasters was conducted only in Europe and Oceania, whereas hydrological disasters dominate in Asia.

Fig. 8. Year of disaster based on real cases in articles.

Sources of big data

This review also examined the literature according to the data used, and Fig. 9 displays all the big data sources investigated. Spatial data, including satellite, aerial, and map data, mainly serve as a visual aid for humanitarian and disaster responders, and researchers rely on these sources for greater accuracy (Lin et al., 2020; Nagendra et al., 2020). Ofli et al. (2016) chose aerial over satellite imaging because aerial imagery takes less time to process. Aerial imagery is useful for assessing small-scale disasters because it is cheaper, delivers data in a timely manner, and avoids capture problems since it is taken below the clouds (Meier, 2015). However, the choice depends on the timing and scale of the crisis: satellite data are beneficial for gathering precise texture information over a much larger area and for approximately measuring height-related data, which can help quantify damage intensity on the ground (Yu et al., 2018). Examining crowdsourced data, Mulder et al. (2016) pointed out that by the time the data reach decision-makers, the original 'crowd' (often affected people) may have been eliminated from the information flow. Adding to this critical viewpoint, Givoni (2016) advocated a cautious approach after studying two crowdsourcing platforms, MicroMappers and Missing Maps. Despite considerable technological developments, mobile phone data remain important, and scholars have explored passive (positioning) and active (SMS) data to fill information gaps such as affected people's locations and needs (Cinnamon et al., 2016; Nasim & Ramaraju, 2019). The most striking finding is the use of SM data in a large number of publications; its use is not confined to developed countries but spans every region and all disaster phases, and it is employed in the majority of disaster groups.
The work of Behl and Dutta (2019) further confirms that scholars have used SM data extensively. A significant number of articles are general, with no emphasis on data sources, and a large proportion of them are conceptual studies.

Fig. 9. Big data sources across disaster phases in articles.

Figure 9 also compares the use of the various data sources across disaster phases. It shows that academics combined multiple data sources to identify the needs of affected people more often at the response stage than at any other stage. To resolve discrepancies between multiple data sources, Griffith et al. (2019) stated that the data must be cleaned extensively and cross-referenced between sources. Disaster responders often need to consider the impact of a disaster on infrastructure, situation reports that update the ground reality, and information describing risk levels, which necessitates the use of several data sources (Warnier et al., 2020). Moreover, scholars who studied the response phase used every data source shown in Fig. 9. SM dominates not only in the response phase but also in studies of multiple phases, where it is the main source used. SM is not the preferred data source for examining disaster preparedness and recovery. According to Yan and Pedraza-Martinez (2019), people's engagement with SM typically grows from preparedness to response and declines from response to recovery, although scholars have occasionally turned to SM platforms for the mitigation phase.
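Griffith et al.'s (2019) advice to clean data and cross-reference sources can be illustrated with a minimal Python sketch. It is not their procedure: the (source, area, need) tuple layout, the whitespace and case normalisation, and the two-source corroboration threshold are all assumptions made for illustration, and the sample reports are invented placeholders rather than data from any study.

```python
from collections import defaultdict


def normalise(value: str) -> str:
    """Hypothetical cleaning step: collapse whitespace and lower-case."""
    return " ".join(value.split()).lower()


def corroborated_needs(reports, min_sources: int = 2) -> dict:
    """Keep (area, need) pairs reported by at least `min_sources` distinct sources.

    reports: iterable of (source, area, need) tuples.
    Duplicate reports from the same source count once, because sources are
    collected in a set.
    """
    sources_by_need = defaultdict(set)
    for source, area, need in reports:
        sources_by_need[(normalise(area), normalise(need))].add(source)
    return {key: sorted(srcs)
            for key, srcs in sources_by_need.items()
            if len(srcs) >= min_sources}


reports = [
    ("social media", "District A ", "Water"),
    ("sms", "district  a", "water"),
    ("social media", "District A", "water"),   # repeat from the same source
    ("crowdsourcing", "District B", "shelter"),  # single source, not corroborated
]
print(corroborated_needs(reports))
# {('district a', 'water'): ['sms', 'social media']}
```

Raising `min_sources` trades coverage for confidence; a real pipeline would also need to reconcile place names against a gazetteer and align timestamps, which this sketch ignores.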

Within SM, scholars preferred Twitter (20 articles) and Weibo (4 articles) to explore solutions to HDO-related challenges. Twitter's apparent dominance stems mostly from its huge volumes of publicly accessible, easily understood data and, most significantly, from the timeliness of that data (Thom et al., 2016). Nevertheless, over-reliance on Twitter may introduce bias from depending on a single SM platform (Avvenuti et al., 2018), and its data are noisier (Sherchan et al., 2017). Weibo has featured in a few articles, but these studies were all conducted in Asia. In one instance, academics used multiple SM data sources, including Twitter. According to the researchers, using multiple SM networks provides a comprehensive view of how a disaster unfolds (Chaudhuri & Bose, 2020; Sherchan et al., 2017). Organisations may choose data sources for a variety of reasons, including the best fit for their circumstances in a disaster, availability, or even their financial capacity to acquire data. While all three reasons appear rational, organisations should strive to choose on the first basis, the best fit for the type, phase, and location of the disaster. Data cleansing and processing are key components of data analysis, accounting for 80% of the overall analysis effort (Griffith et al., 2019); selecting an adequate data source is therefore critical for operational efficiency.

Theoretical underpinnings

Theory is needed not only because of the intrinsic complexity of the HDO field but also because of the contexts in which humanitarian and disaster operations occur (Galindo & Batta, 2013; Oloruntoba et al., 2019). Most of the examined articles did not apply any theory, and where theories were used, no clear preference emerged; indeed, no single theory appeared more than once. The limited use of theory might be attributed to the importance of applied research in the HDO field, where practitioners place a high value on practical relevance (Oloruntoba et al., 2019). Akter and Wamba (2019) noted this limited use of theory; however, theory usage appears to have improved modestly over the last couple of years. Despite recent advances, theory development in HDO is still in its infancy, and there is both a need and an opportunity for researchers to integrate, expand, or even contradict theories to advance knowledge and close gaps (Oloruntoba et al., 2019).

Amaye et al. (2016) integrated organisational mindfulness processes with information system design theory to develop a mindfulness-based information systems assessment framework for better decision-making in emergency management. Seeking to explain resilience, Papadopoulos et al. (2017) employed the TOSE resilience theoretical framework to investigate the use of big data in humanitarian supply chain networks for sustainability. Testing their theory empirically, the authors demonstrated that the exchange of quality information in relief operations, public–private partnerships, and swift trust act as enablers of resilience in the humanitarian supply chain. Dubey et al. (2018) adopted a contingent resource-based view, treating big data predictive analytics as an organisational capability that can help create visibility and build coordination, a relationship that swift trust may affect. Prasad et al. (2018) used resource dependence theory to investigate the interaction between non-governmental organisations (NGOs) and supply chain partners, and how this relationship affects the power dynamic in big data generation. According to the authors, supply chain partners can compel NGOs to employ BDA in their operations, a proposition they tested empirically in three NGOs. Li et al. (2018), on the other hand, investigated population behaviour during a disaster using social exchange theory, a sociological theory. The authors focused on people who were not affected by the earthquake and on whether their actions on social platforms differed from those of people who were affected. Further, Dubey et al. (2019) applied organisational information processing theory to the humanitarian setting and empirically demonstrated that BDA capability has a favourable effect on both collaborative performance and swift trust.

Jeble et al. (2019) developed a conceptual model by interlinking two theories: the resource-based view and social capital. Their model treats big data and predictive analytics as a capability built on tangible, intangible, and human resources. These resources are combined with social aspects such as trust, participation, social norms, and networks to help improve performance in humanitarian supply chains. Yan and Pedraza-Martinez (2019) used social support theory to explore what motivates SM users to respond to relief organisations' posts during disasters. They also examined which forms of social support are associated with user interactions with these organisations. Road and distribution networks change once a disaster strikes, yet pre-disaster transportation models do not account for disaster-related disruptions. Warnier et al. (2020) used graph theory to investigate how disaster-affected populations can be reached through these networks. They examined the transportation network using a variety of metrics, including centrality measures, dynamic network properties, and intrinsic network properties. Susha (2020) built a critical success factor (CSF) theoretical framework, identified several factors for establishing data collaboratives, and narrowed them down to the most relevant ones.
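The network reasoning can be made concrete with a minimal sketch under stated assumptions. This is not a reproduction of Warnier et al.'s (2020) model or metrics. It uses a hypothetical toy road network in which the node names, edges, and washed-out segment are all invented. It illustrates two basic graph-theoretic ideas: breadth-first search, to find which villages a relief depot can still reach once one road is disrupted, and degree centrality, as one simple centrality measure.

```python
from collections import deque

# Hypothetical toy road network (undirected): a relief depot and five villages.
# Illustration only; not the network or metrics used by Warnier et al. (2020).
ROADS = [
    ("Depot", "V1"), ("Depot", "V2"), ("V1", "V3"),
    ("V2", "V3"), ("V3", "V4"), ("V4", "V5"),
]

def build_graph(edges):
    """Build an adjacency-set representation from an undirected edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def reachable(graph, source):
    """Return every node reachable from source, using breadth-first search."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def degree_centrality(graph):
    """Share of the other nodes each node links to directly (assumes >= 2 nodes)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

before = build_graph(ROADS)
# Assumption for illustration: the disaster washes out the V3-V4 road segment.
after = build_graph([e for e in ROADS if set(e) != {"V3", "V4"}])

cut_off = reachable(before, "Depot") - reachable(after, "Depot")
print("Cut off from depot:", sorted(cut_off))

centrality = degree_centrality(before)
print("Most central node before disruption:", max(centrality, key=centrality.get))
```

In this toy case, dropping a single edge cuts two villages off from the depot, which is the kind of post-disaster change that a pre-disaster transportation model would miss. Real analyses would typically use richer measures, such as betweenness centrality, and a dedicated graph library.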

Research methodologies

We also examined the research methodologies used in the articles to understand where research is heading and which methods scholars prefer for studying the field. Figure 10 depicts the distribution of articles across review, conceptual, and empirical and model papers. Although the field is not yet mature, empirical and model research predominates so far. Within this category, qualitative research (interviews, case studies, ethnography) is commonly used to study activities in the disaster response stage, but with a substantially lower rate of theory use. By contrast, survey-based quantitative research is scarce but covers events across various disaster phases, and its rate of theory use is considerably higher than that of qualitative studies. In addition, a few researchers employed mixed methods, focusing on supply chains and situational awareness across disaster phases.

Figure 10. Research methods in articles.

Scholars who employed models conducted extensive research on the Asian region, and half of these studies focused on real-world disasters. They developed models using MLP neural networks, Apache Spark, and Python to better understand the needs of those affected. Relying exclusively on traditional approaches is no longer viable, so scholars have either shifted to entirely new data types or combined traditional data with new ones. Conceptual papers mostly adopt an advocacy style, discussing the need for BDA or how to approach it, and a few highlight the difficulties of using the technology in HDO. However, theory development and framework-related work are less visible in this category. “Appendix 4” lists the conceptual papers as well as the empirical and model studies. The reviews are discussed at length earlier, in the 'review of reviews' section.

HDO management has progressed over the years in several areas: establishing international mechanisms (UNDRR, 2015), assessing the needs for relief supplies (Apte et al., 2016), and improving community-based disaster management (Zhang et al., 2013). Unfortunately, these developments have not curbed the growing number of fatalities, affected populations, and economic losses, which rose substantially from 1980–1999 to 2000–2019. Existing research that incorporates BDA into HDO focuses on disaster groups that are prominent because they occur frequently, such as geophysical, hydrological, and meteorological disasters. Concentrating solely on frequent disasters does not help the field progress. Climatological and biological disasters should not be overlooked simply because they occur less often than other disasters, since lower frequency does not imply smaller impact. For instance, COVID-19 began as an infectious disease outbreak and has evolved into an ongoing pandemic and a significant humanitarian crisis. If the early stages of an event are neglected, a hazard can turn into a disaster and a humanitarian crisis. Mami Mizutori, the UN Special Representative for Disaster Risk Reduction, has stated that “it is time to recognise that there is no such thing as a natural disaster” (UNDRR, 2021). Her statement highlights that we bear a great deal of responsibility for resolving this and that our actions have to be decisive. Increasing research on less-studied disaster groups alone will not suffice. Research on BDA in HDO is also heavily concentrated on the disaster response phase. The combined research on the other three phases (mitigation, preparedness, and recovery) amounts to less than a third of the research on response. Mitigation and preparedness strategies have been widely discussed as ways to lessen the negative effects of humanitarian crises and disasters (Asghar et al., 2006; Oteng-Ababio, 2013). A pilot study by Boston Consulting Group (2015) found that preparedness investments by two UN organisations, UNICEF and WFP, in three countries were effective in saving time and cost. If this is so evident, why is research not moving in this direction to investigate how BDA can add further value? One reason is that traditional HDOs are reactive, with relief agencies waiting for a disaster to unfold before initiating any humanitarian aid (Goldschmidt & Kumar, 2016). In 2020, UN OCHA launched a pilot programme of anticipatory humanitarian action in Bangladesh, using predictive analytics to intervene before a flood occurred. As a result, more people were reached, aid became cheaper and faster, and the quality of assistance improved (UN OCHA, 2021). Seeing this initiative through required many firsts within a central humanitarian agency. Large-scale pilots of this kind are unlikely, on their own, to drive smaller third-sector organisations. This could change if more research on pre-disaster phases involving local aid organisations is conducted.

Each HDO is distinct, just as each humanitarian crisis or disaster is unique. Not every nation has the same emergency response systems, and impacts vary. Similarly, research coverage differs across regions, with a substantial body of research on some regions and little on others. However, the fact that Africa and South America were not represented in any of the 30 reviewed papers on actual disaster cases is cause for concern. Disasters and humanitarian crises pose a high to very high risk in Africa and a medium to high risk in South America (Thow et al., 2020). More focus on these regions, especially Africa, would be particularly valuable. Much humanitarian work and funding is directed there, and these funds must be used effectively. Data availability and the variability of multiple data sources can make it challenging to study Africa and South America. Nonetheless, the Humanitarian Data Exchange (HDX) currently includes data grids for 27 locations, 19 of which are in these two regions (Centre for Humanitarian Data, 2021). SM platforms remain popular data sources for examining real disaster cases and humanitarian crises (21 of the 30 articles). This is appropriate as long as privacy, ethical, and validation concerns are addressed. SM cannot, however, be a one-size-fits-all solution for every data-related problem in HDO. An additional data source, such as authoritative data, would complement SM data and could address validation concerns (Wang & Ye, 2018). A dialogue should be initiated to determine which big data sources are best suited to each disaster group, why, and how this can improve HDO efficiency. Based on existing research, a framework has been constructed in “Appendix 3” that shows what led scholars to select specific big data sources across several disaster groups. For a few disaster groups this was not possible owing to a lack of empirical investigation. For those groups, the authors' views are included and indicated in bold in the same table. The framework was created by combining existing taxonomies of big data sources from UN Global Pulse (2012) and Qadir et al. (2016).

Scientific interest in the topic has increased greatly. The year 2015 can be considered an inflection point, and research has advanced rapidly since then. We therefore took stock of advances in the field, as 2019 and 2020 saw a large volume of work. The topic is multidisciplinary, and integrating work from multiple subject areas will help the field grow so that practitioners can make better use of it. Nevertheless, researchers from the management subject area could give the topic more attention to enrich it holistically.

Implications for practice

Some scholars highlighted how their findings can be put into practice. Prasad et al. (2018) argued that, before an intervention, third-sector organisations must identify the important data attributes. They must also identify how these attributes change expected outcomes such as lead times and cost. Yan and Pedraza-Martinez (2019) discussed how the use of SM as a data source might be enhanced. They suggested that relief organisations use SM platforms to share actionable information that reaches volunteers and donors. Furthermore, public authorities' use of SM as a big data source should not be reactive, which requires a cultural shift in these organisations. Zamarreño-Aramendia et al. (2020) made several recommendations on how authorities can use SM. Fan et al. (2020), on the other hand, emphasised that response managers should account for population size when using SM to allocate relief supplies, so as to address spatial inequality. Also noteworthy is the work of Kontokosta and Malik (2018), who showed how multiple big data sources can help reach the most affected people with the least resilience capacity. Their Resilience to Emergencies and Disasters Index (REDI) is aimed at community organisations.
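The following sketch offers a hedged illustration of why population size matters when SM is used to direct relief. It normalises post counts by population. The areas, post counts, and populations are hypothetical. Per-capita normalisation is one simple adjustment assumed here for illustration, not the method used by Fan et al. (2020).

```python
# Hypothetical figures for illustration only; not data from Fan et al. (2020).
AREAS = {
    # area name: (SM posts requesting help, resident population)
    "Urban core": (900, 600_000),
    "Suburb": (150, 120_000),
    "Rural district": (40, 20_000),
}

def posts_per_10k(posts: int, population: int) -> float:
    """Normalise a raw post count to posts per 10,000 residents."""
    return posts / population * 10_000

def rank_by_raw_posts(areas):
    """Rank areas by raw post volume, which tends to favour populous areas."""
    return sorted(areas, key=lambda name: areas[name][0], reverse=True)

def rank_by_rate(areas):
    """Rank areas by posts per 10,000 residents instead."""
    return sorted(areas, key=lambda name: posts_per_10k(*areas[name]), reverse=True)

print("By raw posts:     ", rank_by_raw_posts(AREAS))
print("Per 10k residents:", rank_by_rate(AREAS))
for name, (posts, pop) in AREAS.items():
    print(f"{name}: {posts_per_10k(posts, pop):.1f} posts per 10k residents")
```

With these invented numbers, the rural district ranks last by raw volume but has the highest rate of help requests relative to its population. In practice, post volume is only a rough proxy for need and would be combined with other data sources.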

Future research directions

Table 5 outlines possible directions for future studies from a big data standpoint, in which one or several big data sources can be employed to carry out the research.

Table 5. Future research directions in the field of BDA and HDO.

In recent times, the most devastating disaster groups have been biological and climatological. Yet academics have mostly overlooked them, with only fleeting references in the current literature. The biological disaster group may receive much more attention from researchers in the coming years as a result of COVID-19. Is it necessary to wait for a major climate crisis to unfold before expanding research on the climatological disaster group? Scholars need to draw attention to these understudied groups within the natural disaster generic group. They also need to balance research across the human-induced disaster generic group, in order to evaluate whether BDA is effective in particular disaster groups. If the knowledge gap between these two generic groups widens further, it might lead to inconsistent assumptions and rationales for BDA across the HDO spectrum. Griffith et al. (2019) consider humanitarian logistics an immature field from an analytical standpoint. They argue that solutions developed through research may not be usable in actual disaster settings because of their computational burden. The academic community should take this into account. Rather than only developing models, researchers should strive to provide techniques, tools, and prospective solutions that can be used in real HDO settings.

At the start of this review, 168 million people required various forms of humanitarian relief. By the end of the study, that figure had increased to 235 million. There is no time to waste, and certainly no data to be lost. Organisations in this field, such as NGOs, disaster management agencies, and other humanitarian societies, need to explore the use of BDA with the same tenacity as profit-driven enterprises while keeping ethical issues in check. Fortunately, academic research in this field is growing rapidly, with 2019 and 2020 accounting for more than half of all the research reviewed. Although significant progress has been made in the management subject domain, its total contribution has remained minimal. Because the field is multidisciplinary, various subject areas make important contributions. However, scholars from the management domain need to engage more in advancing it. This study aimed to address three research questions and to examine the topic in a systematic and more integrated manner. First, research on the application of BDA in HDO has increased substantially in recent years. This growth demonstrates academics' interest in, and ability to investigate, whether big data can improve the way humanitarian and disaster management operates. Second, BDA application in the field remains uneven across disaster locations, disaster categories, and disaster phases, and research effort has not been directed where it is most critical. Emphasising disaster response while overlooking the other three phases (mitigation, preparedness, and recovery) will not lead to comprehensive development of the field. Additionally, heavy reliance on SM as a big data source raises factual, bias, and ethical concerns that need to be addressed. Third, the discipline visibly lacks theoretical frameworks. Although this appears to have improved recently, the yearly proportion of publications that adopt a theoretical viewpoint is not encouraging. Despite these significant findings, the review has a few limitations, of which the authors were aware when undertaking it.

Limitations

There are three key limitations. The first concerns database selection, the second an exclusion criterion that was not part of the five-level search criteria, and the third the use of SLR as a method. Although the choice of database is justified in this study, future studies with additional resources and time could incorporate Web of Science as a further database. This addition may bring a few more publications into the evaluation process and offer a richer view of the subject. The second limitation stems from a Scopus-specific feature. To filter results, the database provides two options: 'Exclude' and 'Limit to'. Subject area is one of the available filters, but the 'Limit to' condition does not yield a unique split of papers by subject area. Because Scopus can assign a paper to several subject areas, it is not possible to obtain a unique count of articles per subject area using 'Limit to'. The authors nevertheless used 'Limit to' rather than 'Exclude', because 'Exclude' removes every publication tagged with an excluded subject area. Since each article is tagged with multiple subject areas, this would also remove publications carrying subject areas of particular interest to the authors. Finally, although the review was rigorous, the authors may have omitted a few studies because they did not fit the pre-defined inclusion/exclusion criteria.
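A small set-based sketch illustrates the overlap problem described above. The articles and subject-area tags are hypothetical. The two filter functions only mimic the semantics described in the text: 'Limit to' keeps a paper carrying any selected area, and 'Exclude' drops a paper carrying any excluded area. They are not Scopus's API or its exact query logic.

```python
# Hypothetical toy records: each article carries one or more subject-area tags.
# This mimics, but does not reproduce, how Scopus can tag one paper with several areas.
ARTICLES = {
    "A1": {"Business", "Computer Science"},
    "A2": {"Computer Science", "Engineering"},
    "A3": {"Business", "Social Sciences"},
    "A4": {"Medicine"},
    "A5": {"Decision Sciences", "Medicine"},
}

def limit_to(records, areas):
    """Assumed 'Limit to' semantics: keep an article carrying at least one selected area."""
    return {aid for aid, tags in records.items() if tags & areas}

def exclude(records, areas):
    """Assumed 'Exclude' semantics: drop an article carrying any excluded area."""
    return {aid for aid, tags in records.items() if not (tags & areas)}

def per_area_counts(records, ids, areas):
    """Count retained articles per selected area; multi-tagged articles count more than once."""
    return {area: sum(1 for aid in ids if area in records[aid]) for area in sorted(areas)}

wanted = {"Business", "Decision Sciences", "Social Sciences"}
unwanted = {"Computer Science", "Engineering", "Medicine"}

kept = limit_to(ARTICLES, wanted)
counts = per_area_counts(ARTICLES, kept, wanted)
print("Unique articles kept:", sorted(kept))
print("Per-area counts:", counts, "sum =", sum(counts.values()))

# Excluding the unwanted areas also drops A1 and A5, even though each
# carries a wanted tag alongside an unwanted one.
print("Left after 'Exclude':", sorted(exclude(ARTICLES, unwanted)))
```

Here three unique articles survive 'Limit to', yet the per-area tallies sum to four because A3 is counted under both Business and Social Sciences. This double counting is why no unique per-area split is available. 'Exclude', meanwhile, keeps only A3.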

Appendix 1: Final list of articles considered for review by publishing year in Scopus

Appendix 2: Real case disasters (by disaster group) employed in articles based on the disaster region

Appendix 3: The selection and rationale behind big data sources in each disaster group

Appendix 4: Methodological distribution of existing research

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Abburu S. GIS based interoperable platform for disaster data exchange using OGC standards and spatial query. International Journal of Web Portals. 2017;9(1):29–51. doi: 10.4018/IJWP.2017010103.
  • Abd FM, Manaa ME. Predicting heart diseases from large scale IoT data using a map-reduce paradigm. Open Computer Science. 2020;10(1):422–430. doi: 10.1515/comp-2020-0204.
  • Aghaei Chadegani A, Salehi H, Md Yunus MM, Farhadi H, Fooladi M, Farhadi M, Ale Ebrahim N. A comparison between two main academic literature collections: Web of science and scopus databases. Asian Social Science. 2013;9(5):18–26. doi: 10.5539/ass.v9n5p18.
  • Akhtar P, Osburg VS, Kabra G, Ullah S, Shabbir H, Kumari S. Coordination and collaboration for humanitarian operational excellence: Big data and modern information processing systems. Production Planning and Control. 2020. doi: 10.1080/09537287.2020.1834126.
  • Akter S, Wamba SF. Big data and disaster management: A systematic review and agenda for future research. Annals of Operations Research. 2019;283(1–2):939–959. doi: 10.1007/s10479-017-2584-2.
  • Altay N, Green WG III. OR/MS research in disaster operations management. European Journal of Operational Research. 2006;175(1):475–493. doi: 10.1016/j.ejor.2005.05.016.
  • Amato F, Moscato V, Picariello A, Sperlì G. Extreme events management using multimedia social networks. Future Generation Computer Systems. 2019;94:444–452. doi: 10.1016/j.future.2018.11.035.
  • Amaye A, Neville K, Pope A. BigPromises: Using organisational mindfulness to integrate big data in emergency management decision making. Journal of Decision Systems. 2016;25:76–84. doi: 10.1080/12460125.2016.1187419.
  • Apte A, Gonçalves P, Yoho K. Capabilities and competencies in humanitarian operations. Journal of Humanitarian Logistics and Supply Chain Management. 2016;6(2):240–258. doi: 10.1108/JHLSCM-04-2015-0020.
  • Asghar S, Alahakoon D, Churilov L. A comprehensive conceptual model for disaster management. Journal of Humanitarian Assistance. 2006;1360(0222):1–15.
  • Avvenuti M, Cresci S, Del Vigna F, Fagni T, Tesconi M. CrisMap: A big data crisis mapping system based on damage detection and geoparsing. Information Systems Frontiers. 2018;20(5):993–1011. doi: 10.1007/s10796-018-9833-z.
  • Avvenuti M, Cresci S, La Polla MN, Meletti C, Tesconi M. Nowcasting of earthquake consequences using big social data. IEEE Internet Computing. 2017;21(6):37–45. doi: 10.1109/MIC.2017.4180834.
  • Bag S, Gupta S, Wood L. Big data analytics in sustainable humanitarian supply chain: Barriers and their interactions. Annals of Operations Research. 2021. doi: 10.1007/s10479-020-03790-7.
  • Bahir E, Peled A. Geospatial extreme event establishing using social network’s text analytics. GeoJournal. 2016;81(3):337–350. doi: 10.1007/s10708-015-9622-x.
  • Balti H, Ben Abbes A, Mellouli N, Farah IR, Sang Y, Lamolle M. A review of drought monitoring with big data: Issues, methods, challenges and research directions. Ecological Informatics. 2020;60:1–17. doi: 10.1016/j.ecoinf.2020.101136.
  • Becheikh N, Landry R, Amara N. Lessons from innovation empirical studies in the manufacturing sector: A systematic review of the literature from 1993–2003. Technovation. 2006;26(5–6):644–664. doi: 10.1016/j.technovation.2005.06.016.
  • Behl A, Chavan M, Jain K, Sharma I, Pereira VE, Zhang JZ. The role of organizational culture and voluntariness in the adoption of artificial intelligence for disaster relief operations. International Journal of Manpower. 2021. doi: 10.1108/IJM-03-2021-0178.
  • Behl A, Dutta P. Humanitarian supply chain management: A thematic literature review and future directions of research. Annals of Operations Research. 2019;283(1–2):1001–1044. doi: 10.1007/s10479-018-2806-2.
  • Behl A, Dutta P. Engaging donors on crowdfunding platform in Disaster Relief Operations (DRO) using gamification: A Civic Voluntary Model (CVM) approach. International Journal of Information Management. 2020;54:102140. doi: 10.1016/j.ijinfomgt.2020.102140.
  • Behl A, Dutta P, Luo Z, Sheorey P. Enabling artificial intelligence on a donation-based crowdfunding platform: A theoretical approach. Annals of Operations Research. 2021. doi: 10.1007/s10479-020-03906-z.
  • Bell D, Lycett M, Marshan A, Monaghan A. Exploring future challenges for big data in the humanitarian domain. Journal of Business Research. 2021;131:453–468. doi: 10.1016/j.jbusres.2020.09.035.
  • Below R, Wirtz A, Guha-Sapir D. Disaster category classification and peril terminology for operational purposes. Centre for Research on the Epidemiology of Disasters; 2009. Retrieved from https://www.cred.be/node/564
  • Berren MR, Beigel A, Ghertner S. A typology for the classification of disasters. Community Mental Health Journal. 1980;16(2):103–111. doi: 10.1007/BF00778582.
  • Booth A, Papaioannou D, Sutton A. Systematic approaches to a successful literature review. SAGE; 2012.
  • Boston Consulting Group. UNICEF/WFP return on investment for emergency preparedness study. 2015. Retrieved from https://www.wfp.org/publications/unicefwfp-return-investment-emergency-preparedness-study
  • Burger A, Oz T, Kennedy WG, Crooks AT. Computational social science of disasters: Opportunities and challenges. Future Internet. 2019;11(5):1–31. doi: 10.3390/fi11050103.
  • Burns R. Rethinking big data in digital humanitarianism: Practices, epistemologies, and social relations. GeoJournal. 2015;80(4):477–490. doi: 10.1007/s10708-014-9599-x.
  • Centre for Humanitarian Data. Predictive analytics. 2019. Retrieved from https://centre.humdata.org/predictive-analytics/
  • Centre for Humanitarian Data. The state of open humanitarian data 2021: Assessing data availability across humanitarian crises. 2021. Retrieved from https://reliefweb.int/report/world/state-open-humanitarian-data-2021-assessing-data-availability-across-humanitarian
  • Chae J, Thom D, Jang Y, Kim S, Ertl T, Ebert DS. Public behavior response analysis in disaster events utilizing visual analytics of microblog data. Computers and Graphics (Pergamon). 2014;38(1):51–60. doi: 10.1016/j.cag.2013.10.008.
  • Chaudhuri N, Bose I. Exploring the role of deep neural networks for post-disaster decision support. Decision Support Systems. 2020;130:113234. doi: 10.1016/j.dss.2019.113234.
  • Cinnamon J, Jones SK, Adger WN. Evidence and future potential of mobile phone data for disease disaster management. Geoforum. 2016;75:253–264. doi: 10.1016/j.geoforum.2016.07.019.
  • Cumbane SP, Gidófalvi G. Review of big data and processing frameworks for disaster response applications. ISPRS International Journal of Geo-Information. 2019;8(9):1–23. doi: 10.3390/ijgi8090387.
  • Day JM, Melnyk SA, Larson PD, Davis EW, Whybark DC. Humanitarian and disaster relief supply chains: A matter of life and death. Journal of Supply Chain Management. 2012;48(2):21–36. doi: 10.1111/j.1745-493X.2012.03267.x.
  • de Boer J. Definition and classification of disasters: Introduction of a disaster severity scale. Journal of Emergency Medicine. 1990;8(5):591–595. doi: 10.1016/0736-4679(90)90456-6.
  • De Smet H, Lagadec P, Leysen J. Disasters out of the box: A new ballgame? Journal of Contingencies and Crisis Management. 2012;20(3):138–148. doi: 10.1111/j.1468-5973.2012.00666.x.
  • Denyer D, Tranfield D. Producing a systematic review. 2009.
  • Dubey R, Gunasekaran A, Childe SJ, Roubaud D, Fosso Wamba S, Giannakis M, Foropon C. Big data analytics and organizational culture as complements to swift trust and collaborative performance in the humanitarian supply chain. International Journal of Production Economics. 2019;210:120–136. doi: 10.1016/j.ijpe.2019.01.023.
  • Dubey R, Luo Z, Gunasekaran A, Akter S, Hazen BT, Douglas MA. Big data and predictive analytics in humanitarian supply chains: Enabling visibility and coordination in the presence of swift trust. International Journal of Logistics Management. 2018;29(2):485–512. doi: 10.1108/IJLM-02-2017-0039.
  • El Baz J, Laguir I, Stekelorum R. Logistics and supply chain management research in Africa: A systematic literature review and research agenda. International Journal of Logistics Management. 2019;30(1):8–38. doi: 10.1108/IJLM-09-2017-0242.
  • Eshghi K, Larson RC. Disasters: Lessons from the past 105 years. Disaster Prevention and Management: An International Journal. 2008;17(1):62–82. doi: 10.1108/09653560810855883.
  • Fan C, Esparza M, Dargin J, Wu F, Oztekin B, Mostafavi A. Spatial biases in crowdsourced data: Social media content attention concentrates on populous areas in disasters. Computers, Environment and Urban Systems. 2020;83:1–12. doi: 10.1016/j.compenvurbsys.2020.101514.
  • Fan C, Zhang C, Yahja A, Mostafavi A. Disaster City Digital Twin: A vision for integrating artificial and human intelligence for disaster management. International Journal of Information Management. 2021;56:102049. doi: 10.1016/j.ijinfomgt.2019.102049.
  • Fast L. Diverging data: Exploring the epistemologies of data collection and use among those working on and in conflict. International Peacekeeping. 2017;24(5):706–732. doi: 10.1080/13533312.2017.1383562.
  • Fathi R, Thom D, Koch S, Ertl T, Fiedrich F. VOST: A case study in voluntary digital participation for collaborative emergency management. Information Processing and Management. 2020;57(4):102174. doi: 10.1016/j.ipm.2019.102174.
  • Financial Tracking Service. Trends in response plan/appeal requirements. 2021. Retrieved from https://fts.unocha.org/appeals/overview/2021
  • Galindo G, Batta R. Review of recent developments in OR/MS research in disaster operations management. European Journal of Operational Research. 2013;230(2):201–211. doi: 10.1016/j.ejor.2013.01.039.
  • Gazi T, Gazis A. Humanitarian aid in the age of COVID-19: A review of big data crisis analytics and the general data protection regulation. International Review of the Red Cross. 2020;102(913):75–94. doi: 10.1017/S1816383121000084.
  • Gerdes A. A moderate interpretation of group privacy illustrated by cases from disaster management. Journal of Contingencies and Crisis Management. 2020;28(4):446–452. doi: 10.1111/1468-5973.12336.
  • Givoni M. Between micro mappers and missing maps: Digital humanitarianism and the politics of material participation in disaster response. Environment and Planning D: Society and Space. 2016;34(6):1025–1043. doi: 10.1177/0263775816652899.
  • Gligor DM, Holcomb MC. Understanding the role of logistics capabilities in achieving supply chain agility: A systematic literature review. Supply Chain Management. 2012;17(4):438–453. doi: 10.1108/13598541211246594.
  • Goldschmidt KH, Kumar S. Humanitarian operations and crisis/disaster management: A retrospective review of the literature and framework for development. International Journal of Disaster Risk Reduction. 2016;20:1–13. doi: 10.1016/j.ijdrr.2016.10.001.
  • Goswami S, Chakraborty S, Ghosh S, Chakrabarti A, Chakraborty B. A review on application of data mining techniques to combat natural disasters. Ain Shams Engineering Journal. 2018;9(3):365–378. doi: 10.1016/j.asej.2016.01.012.
  • Graham C, Thompson C, Wolcott M, Pollack J, Tran M. A guide to social media emergency management analytics: Understanding its place through Typhoon Haiyan tweets. Statistical Journal of the IAOS. 2015;31(2):227–236. doi: 10.3233/sji-150893.
  • Gray RA. Disasters: Natural, nuclear, and classificatory. RQ. 1982;22(1):42–47.
  • Greenough PG, Nelson EL. Beyond mapping: A case for geospatial analytics in humanitarian health. Conflict and Health. 2019;13(1):1–14. doi: 10.1186/s13031-019-0234-9.
  • Griffith DA, Boehmke B, Bradley RV, Hazen BT, Johnson AW. Embedded analytics: Improving decision support for humanitarian logistics operations. Annals of Operations Research. 2019;283(1–2):247–265. doi: 10.1007/s10479-017-2607-z.
  • Guha-Sapir D, Below R. Quality and accuracy of disaster data: A comparative analyse of 3 global data sets. 2002. Retrieved from https://www.cred.be/node/288
  • Guha-Sapir D. Disaster data: A balanced perspective. CRED Crunch; 2008.
  • Gupta S, Altay N, Luo Z. Big data in humanitarian supply chain management: A review and further research directions. Annals of Operations Research. 2019;283(1–2):1153–1173. doi: 10.1007/s10479-017-2671-4.
  • Iglesias CA, Favenza A, Carrera Á. A big data reference architecture for emergency management. Information (Switzerland). 2020;11(12):1–24. doi: 10.3390/info11120569.
  • IRDR. Peril classification and hazard glossary. 2014. Retrieved from https://www.irdrinternational.org/knowledge_pool/publications/173
  • Jeble S, Kumari S, Venkatesh VG, Singh M. Influence of big data and predictive analytics and social capital on performance of humanitarian supply chain: Developing framework and future research directions. Benchmarking. 2019;27(2):606–633. doi: 10.1108/BIJ-03-2019-0102.
  • Jin X, Spence PR. Understanding crisis communication on social media with CERC: Topic model analysis of tweets about Hurricane Maria. Journal of Risk Research. 2020. doi: 10.1080/13669877.2020.1848901.
  • Kankanamge N, Yigitcanlar T, Goonetilleke A, Kamruzzaman M. Determining disaster severity through social media analysis: Testing the methodology with South East Queensland Flood tweets. International Journal of Disaster Risk Reduction. 2020;42:101360. doi: 10.1016/j.ijdrr.2019.101360.
  • Khan A, Gupta S, Gupta SK. Multi-hazard disaster studies: Monitoring, detection, recovery, and management, based on emerging technologies and optimal techniques. International Journal of Disaster Risk Reduction. 2020;47:101642. doi: 10.1016/j.ijdrr.2020.101642.
  • Khoury BJ. Logistics data analytics alongside voucher programme phases. Journal of Humanitarian Logistics and Supply Chain Management. 2019;9(3):332–351. doi: 10.1108/JHLSCM-06-2018-0050.
  • Kibanov M, Stumme G, Amin I, Lee JG. Mining social media to inform peatland fire and haze disaster management. Social Network Analysis and Mining. 2017;7(1):579–600. doi: 10.1007/s13278-017-0446-1.
  • Knox Clarke P, Campbell L. Decision-making at the sharp end: A survey of literature related to decision-making in humanitarian contexts. Journal of International Humanitarian Action. 2020;5:1–14. doi: 10.1186/s41018-020-00068-2.
  • Kontokosta CE, Malik A. The Resilience to Emergencies and Disasters Index: Applying big data to benchmark and validate neighborhood resilience capacity. Sustainable Cities and Society. 2018;36:272–285. doi: 10.1016/j.scs.2017.10.025.
  • Kwapong Baffoe BO, Luo W. Humanitarian relief sustainability: A framework of humanitarian logistics digital business ecosystem. Transportation Research Procedia. 2020;48:363–387. doi: 10.1016/j.trpro.2020.08.032.
  • Lacourt M, Radosta M. Strength in numbers – Towards a more efficient humanitarian aid: Pooling logistics resources. 2019. Retrieved from https://reliefweb.int/report/world/strength-numbers-towards-more-efficient-humanitarian-aid-pooling-logistics-resources
  • Landwehr PM, Wei W, Kowalchuck M, Carley KM. Using tweets to support disaster planning, warning and response. Safety Science. 2016;90:33–47. doi: 10.1016/j.ssci.2016.04.012.
  • Li L, Zhang Q, Tian J, Wang H. Characterizing information propagation patterns in emergencies: A case study with Yiliang Earthquake. International Journal of Information Management. 2018;38(1):34–41. doi: 10.1016/j.ijinfomgt.2017.08.008.
  • Light RJ, Pillemer DB. Summing up. The science of reviewing research. Harvard University Press; 1984. [ Google Scholar ]
  • Lin A, Wu H, Liang G, Cardenas-Tristan A, Wu X, Zhao C, Li D. A big data-driven dynamic estimation model of relief supplies demand in urban flood disaster. International Journal of Disaster Risk Reduction. 2020; 49 :101682. doi: 10.1016/j.ijdrr.2020.101682. [ CrossRef ] [ Google Scholar ]
  • Liu Z, Du Y, Yi J, Liang F, Ma T, Pei T. Quantitative estimates of collective geo-tagged human activities in response to typhoon Hato using location-aware big data. International Journal of Digital Earth. 2020; 13 (9):1072–1092. doi: 10.1080/17538947.2019.1645894. [ CrossRef ] [ Google Scholar ]
  • Lukić T, Gavrilov MB, Marković SB, Komac B, Zorn M, Mladan D, Dordević J, Milanović M, Vasiljević DA, Vujičić MD, Kuzmanović B, Prentović R. Classification of natural disasters between the legislation and application: Experience of the Republic of Serbia. Acta Geographica Slovenica. 2013; 53 (SPL.1):149–164. doi: 10.3986/AGS53301. [ CrossRef ] [ Google Scholar ]
  • Madianou M. Technocolonialism: Digital innovation and data practices in the humanitarian response to refugee crises. Social Media and Society. 2019; 5 (3):1–13. doi: 10.1177/2056305119863146. [ CrossRef ] [ Google Scholar ]
  • Malawani AD, Nurmandi A, Purnomo EP, Rahman T. Social media in aid of post disaster management. Transforming Government: People, Process and Policy. 2020; 14 (2):237–260. doi: 10.1108/TG-09-2019-0088. [ CrossRef ] [ Google Scholar ]
  • Mann L. Left to other peoples’ devices? A political economy perspective on the Big Data revolution in development. Development and Change. 2018; 49 (1):3–36. doi: 10.1111/dech.12347. [ CrossRef ] [ Google Scholar ]
  • Meier P. Digital humanitarians: How big data is changing the face of humanitarian response. Taylor and Francis; 2015. [ Google Scholar ]
  • Mulder F, Ferguson J, Groenewegen P, Boersma K, Wolbers J. Questioning Big Data: Crowdsourcing crisis data towards an inclusive humanitarian response. Big Data and Society. 2016; 3 (2):1–13. doi: 10.1177/2053951716662054. [ CrossRef ] [ Google Scholar ]
  • Nagendra NP, Narayanamurthy G, Moser R. Management of humanitarian relief operations using satellite big data analytics: The case of Kerala floods. Annals of Operations Research. 2020 doi: 10.1007/s10479-020-03593-w. [ CrossRef ] [ Google Scholar ]
  • Nasim M, Ramaraju GV. Using passive anonymous mobile positioning data & aggregation analytics to enhance tool-sets for flood relief agencies. International Journal of Engineering and Advanced Technology. 2019; 8 (5):657–663. [ Google Scholar ]
  • Ofli F, Meier P, Imran M, Castillo C, Tuia D, Rey N, Briant J, Millet P, Reinhard F, Parkan M, Joost S. Combining human computing and machine learning to make sense of big (aerial) data for disaster response. Big Data. 2016; 4 (1):47–59. doi: 10.1089/big.2014.0064. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Oloruntoba R, Hossain GF, Wagner B. Theory in humanitarian operations research. Annals of Operations Research. 2019; 283 (1–2):543–560. doi: 10.1007/s10479-016-2378-y. [ CrossRef ] [ Google Scholar ]
  • Oteng-Ababio M. ‘Prevention is better than cure’: Assessing Ghana's preparedness (capacity) for disaster management. Jàmbá: Journal of Disaster Risk Studies. 2013; 5 (2):1–11. doi: 10.4102/jamba.v5i2.75. [ CrossRef ] [ Google Scholar ]
  • Papadopoulos T, Gunasekaran A, Dubey R, Altay N, Childe SJ, Fosso-Wamba S. The role of Big Data in explaining disaster resilience in supply chains for sustainability. Journal of Cleaner Production. 2017; 142 :1108–1118. doi: 10.1016/j.jclepro.2016.03.059. [ CrossRef ] [ Google Scholar ]
  • Park M, Jung D, Lee S, Park S. Heatwave damage prediction using random forest model in Korea. Applied Sciences (switzerland) 2020; 10 (22):1–12. doi: 10.3390/app10228237. [ CrossRef ] [ Google Scholar ]
  • Prasad S, Zakaria R, Altay N. Big data in humanitarian supply chain networks: A resource dependence perspective. Annals of Operations Research. 2018; 270 (1–2):383–413. doi: 10.1007/s10479-016-2280-7. [ CrossRef ] [ Google Scholar ]
  • Puttinaovarat S, Horkaew P. Application programming interface for flood forecasting from geospatial big data and crowdsourcing data. International Journal of Interactive Mobile Technologies. 2019; 13 (11):137–156. doi: 10.3991/ijim.v13i11.11237. [ CrossRef ] [ Google Scholar ]
  • Qadir J, Ali A, ur Rasool R, Zwitter A, Sathiaseelan A, Crowcroft J. Crisis analytics: Big data-driven crisis response. Journal of International Humanitarian Action. 2016; 1 (1):1–21. doi: 10.1186/s41018-016-0013-9. [ CrossRef ] [ Google Scholar ]
  • Qayum A, Ahmad F, Arya R, Singh RK. Predictive modeling of forest fire using geospatial tools and strategic allocation of resources: EForestFire. Stochastic Environmental Research and Risk Assessment. 2020; 34 (12):2259–2275. doi: 10.1007/s00477-020-01872-3. [ CrossRef ] [ Google Scholar ]
  • Ragini JR, Anand PMR, Bhaskar V. Big data analytics for disaster response and recovery through sentiment analysis. International Journal of Information Management. 2018; 42 :13–24. doi: 10.1016/j.ijinfomgt.2018.05.004. [ CrossRef ] [ Google Scholar ]
  • Read R, Taithe B, Mac Ginty R. Data hubris? Humanitarian information systems and the mirage of technology. Third World Quarterly. 2016; 37 (8):1314–1331. doi: 10.1080/01436597.2015.1136208. [ CrossRef ] [ Google Scholar ]
  • ReliefWeb. (2008). Glossary of humanitarian terms . Retrived from, https://reliefweb.int/report/world/reliefweb-glossary-humanitarian-terms-enko
  • Rogstadius J, Vukovic M, Teixeira CA, Kostakos V, Karapanos E, Laredo JA. CrisisTracker: Crowdsourced social media curation for disaster awareness. IBM Journal of Research and Development. 2013; 57 (5):4:1–4:13. doi: 10.1147/JRD.2013.2260692. [ CrossRef ] [ Google Scholar ]
  • Romascanu A, Ker H, Sieber R, Greenidge S, Lumley S, Bush D, Morgan S, Zhao R, Brunila M. Using deep learning and social network analysis to understand and manage extreme flooding. Journal of Contingencies and Crisis Management. 2020; 28 (3):251–261. doi: 10.1111/1468-5973.12311. [ CrossRef ] [ Google Scholar ]
  • Sandvik KB, Gabrielsen Jumbert M, Karlsrud J, Kaufmann M. Humanitarian technology: A critical research agenda. International Review of the Red Cross. 2014; 96 (893):219–242. doi: 10.1017/S1816383114000344. [ CrossRef ] [ Google Scholar ]
  • Sandvik KB, Jacobsen KL, McDonald SM. Do no harm: A taxonomy of the challenges of humanitarian experimentation. International Review of the Red Cross. 2017; 99 (904):319–344. doi: 10.1017/S181638311700042X. [ CrossRef ] [ Google Scholar ]
  • Sangameswar MV, Nagabhushana Rao M, Satyanarayana S. An algorithm for identification of natural disaster affected area. Journal of Big Data. 2017; 4 (1):1–11. doi: 10.1186/s40537-017-0096-1. [ CrossRef ] [ Google Scholar ]
  • Sarker MNI, Peng Y, Yiran C, Shouse RC. Disaster resilience through big data: Way to environmental sustainability. International Journal of Disaster Risk Reduction. 2020; 51 :101769. doi: 10.1016/j.ijdrr.2020.101769. [ CrossRef ] [ Google Scholar ]
  • Sarker MNI, Yang B, Lv Y, Huq ME, Kamruzzaman MM. Climate change adaptation and resilience through big data. International Journal of Advanced Computer Science and Applications. 2020; 11 (3):533–539. doi: 10.14569/IJACSA.2020.0110368. [ CrossRef ] [ Google Scholar ]
  • Shah SA, Seker DZ, Hameed S, Draheim D. The rising role of big data analytics and IoT in disaster management: Recent advances, taxonomy and prospects. IEEE Access. 2019; 7 :54595–54614. doi: 10.1109/ACCESS.2019.2913340. [ CrossRef ] [ Google Scholar ]
  • Shaluf IM. Disaster types. Disaster Prevention and Management: An International Journal. 2007; 16 (5):704–717. doi: 10.1108/09653560710837019. [ CrossRef ] [ Google Scholar ]
  • Shaluf IM. An overview on disasters. Disaster Prevention and Management: An International Journal. 2007; 16 (5):687–703. doi: 10.1108/09653560710837000. [ CrossRef ] [ Google Scholar ]
  • Shaluf IM, Ahmadun FR, Said AM. A review of disaster and crisis. Disaster Prevention and Management: An International Journal. 2001; 12 (1):24–32. doi: 10.1108/09653560310463829. [ CrossRef ] [ Google Scholar ]
  • Shan S, Zhao F, Wei Y, Liu M. Disaster management 2.0: A real-time disaster damage assessment model based on mobile social media data—A case study of Weibo (Chinese Twitter) Safety Science. 2019; 115 :393–413. doi: 10.1016/j.ssci.2019.02.029. [ CrossRef ] [ Google Scholar ]
  • Sharma P, Joshi A. Challenges of using big data for humanitarian relief: Lessons from the literature. Journal of Humanitarian Logistics and Supply Chain Management. 2019; 10 (4):423–446. doi: 10.1108/JHLSCM-05-2018-0031. [ CrossRef ] [ Google Scholar ]
  • Sherchan W, Pervin S, Butler CJ, Lai JC, Ghahremanlou L, Han B. Harnessing Twitter and Instagram for disaster management. IBM Journal of Research and Development. 2017; 61 (6):81–812. doi: 10.1147/JRD.2017.2729238. [ CrossRef ] [ Google Scholar ]
  • Snyder H. Literature review as a research methodology: An overview and guidelines. Journal of Business Research. 2019; 104 :333–339. doi: 10.1016/j.jbusres.2019.07.039. [ CrossRef ] [ Google Scholar ]
  • Song X, Zhang H, Akerkar RA, Huang H, Guo S, Zhong L, Ji Y, Opdahl AL, Purohit H, Skupin A, Pottathil A, Culotta A. Big data and emergency management: Concepts, methodologies, and applications. IEEE Transactions on Big Data. 2020 doi: 10.1109/TBDATA.2020.2972871. [ CrossRef ] [ Google Scholar ]
  • Susha I. Establishing and implementing data collaborations for public good: A critical factor analysis to scale up the practice. Information Polity. 2020; 25 (1):3–24. doi: 10.3233/IP-180117. [ CrossRef ] [ Google Scholar ]
  • Swaminathan JM. Big data analytics for rapid, impactful, sustained, and efficient (RISE) humanitarian operations. Production and Operations Management. 2018; 27 (9):1696–1700. doi: 10.1111/poms.12840. [ CrossRef ] [ Google Scholar ]
  • Swiss Re. (2021). Natural catastrophes in 2020 . S. R. M. Ltd. Retrived from, https://www.swissre.com/institute/research/sigma-research/sigma-2021-01.html
  • Tachizawa EM, Wong CY. Towards a theory of multi-tier sustainable supply chains: A systematic literature review. Supply Chain Management. 2014; 19 :643–653. doi: 10.1108/SCM-02-2014-0070. [ CrossRef ] [ Google Scholar ]
  • Talley JW. Disaster management in the digital age. IBM Journal of Research and Development. 2020; 64 (1–2):1:1–1:5. doi: 10.1147/JRD.2019.2954412. [ CrossRef ] [ Google Scholar ]
  • Taylor AJ. A pattern of disasters and victims. Disasters. 1990; 14 (4):291–300. doi: 10.1111/j.1467-7717.1990.tb01074.x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Taylor L. The ethics of big data as a public good: Which public? Whose good? Philosophical Transactions of the Royal Society a: Mathematical, Physical and Engineering Sciences. 2016; 374 (2083):20160126. doi: 10.1098/rsta.2016.0126. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Thom D, Krüger R, Ertl T. Can twitter save lives? A broad-scale study on visual social media analytics for public safety. IEEE Transactions on Visualization and Computer Graphics. 2016; 22 (7):1816–1829. doi: 10.1109/TVCG.2015.2511733. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Thow, A., Vernaccini, L., Poljansek, K., & Marin Ferrer, M. (2020). INFORM report 2020: Shared evidence for managing crisis and disaster . P. O. o. t. E. Union. Retrived from, https://publications.jrc.ec.europa.eu/repository/handle/JRC120275
  • Tomaszewski B, MacEachren AM. Geovisual analytics to support crisis management: Information foraging for geo-historical context. Information Visualization. 2012; 11 (4):339–359. doi: 10.1177/1473871612456122. [ CrossRef ] [ Google Scholar ]
  • Tranfield D, Denyer D, Smart P. Towards a methodology for developing evidence-informed management knowledge by means of systematic review. British Journal of Management. 2003; 14 (3):207–222. doi: 10.1111/1467-8551.00375. [ CrossRef ] [ Google Scholar ]
  • Tullis JA, Kar B. Where is the provenance? Ethical replicability and reproducibility in GIScience and its critical applications. Annals of the American Association of Geographers. 2020 doi: 10.1080/24694452.2020.1806029. [ CrossRef ] [ Google Scholar ]
  • UN OCHA. (2010). OCHA on message: Humanitarian principles . Retrived from, https://www.unocha.org/node/897
  • UN OCHA. (2020). Global humanitarian overview 2021 . Retrived from, https://reliefweb.int/report/world/global-humanitarian-overview-2021-enarfres
  • UN OCHA. (2021). Acting before the flood an anticipatory humanitarian action pilot in Bangladesh . Retrived from, https://reliefweb.int/report/bangladesh/acting-flood-anticipatory-humanitarian-action-pilot-bangladesh-march-2021
  • UN Global Pulse. (2012). Big data for development: Challenges and opportunities . Retrived from, https://www.unglobalpulse.org/document/big-data-for-development-opportunities-and-challenges-white-paper/
  • UNDRR. (2015). Sendai framework for disaster risk reduction 2015–2030 . Retrived from, https://www.undrr.org/publication/sendai-framework-disaster-risk-reduction-2015-2030
  • UNDRR. (2020a). Hazard definition and classification review . Retrived from, https://www.undrr.org/publication/hazard-definition-and-classification-review
  • UNDRR. (2020b). Human cost of disasters an overview of the last 20 years 2000–2019 .
  • UNDRR. (2021). Sendai framework 6th anniversary: Time to recognize there is no such thing as a natural disaster - we're doing it to ourselves . Retrived from, https://www.undrr.org/news/sendai-framework-6th-anniversary-time-recognize-there-no-such-thing-natural-disaster-were
  • van den Homberg M, Monné R, Spruit M. Bridging the information gap of disaster responders by optimizing data selection using cost and quality. Computers and Geosciences. 2018; 120 :60–72. doi: 10.1016/j.cageo.2018.06.002. [ CrossRef ] [ Google Scholar ]
  • Van Wassenhove LN. Blackett memorial lecture humanitarian aid logistics: Supply chain management in high gear. Journal of the Operational Research Society. 2006; 57 (5):475–489. doi: 10.1057/palgrave.jors.2602125. [ CrossRef ] [ Google Scholar ]
  • Wang J, Meyer MC, Wu Y, Wang Y. Maximum data-resolution efficiency for fog-computing supported spatial big data processing in disaster scenarios. IEEE Transactions on Parallel and Distributed Systems. 2019; 30 (8):1826–1842. doi: 10.1109/TPDS.2019.2896143. [ CrossRef ] [ Google Scholar ]
  • Wang J, Sato K, Guo S, Chen W, Wu J. Big data processing with minimal delay and guaranteed data resolution in disaster areas. IEEE Transactions on Vehicular Technology. 2019; 68 (4):3833–3842. doi: 10.1109/TVT.2018.2889094. [ CrossRef ] [ Google Scholar ]
  • Wang J, Wu Y, Yen N, Guo S, Cheng Z. Big data analytics for emergency communication networks: A survey. IEEE Communications Surveys and Tutorials. 2016; 18 (3):1758–1778. doi: 10.1109/COMST.2016.2540004. [ CrossRef ] [ Google Scholar ]
  • Wang Z, Ye X. Social media analytics for natural disaster management. International Journal of Geographical Information Science. 2018; 32 (1):49–72. doi: 10.1080/13658816.2017.1367003. [ CrossRef ] [ Google Scholar ]
  • Warnier M, Alkema V, Comes T, Van de Walle B. Humanitarian access, interrupted: Dynamic near real-time network analytics and mapping for reaching communities in disaster-affected countries. Or Spectrum. 2020; 42 (3):815–834. doi: 10.1007/s00291-020-00582-0. [ CrossRef ] [ Google Scholar ]
  • Webster J, Watson RT. Analyzing the past to prepare for the future: Writing a literature review. MIS Quarterly: Management Information Systems. 2002; 26 (2):xiii–xxiii. [ Google Scholar ]
  • Wirtz A, Kron W, Löw P, Steuer M. The need for data: Natural disasters and the challenges of database management. Natural Hazards. 2014; 70 (1):135–157. doi: 10.1007/s11069-012-0312-4. [ CrossRef ] [ Google Scholar ]
  • Wu X, Cao Y, Xiao Y, Guo J. Finding of urban rainstorm and waterlogging disasters based on microblogging data and the location-routing problem model of urban emergency logistics. Annals of Operations Research. 2020; 290 (1–2):865–896. doi: 10.1007/s10479-018-2904-1. [ CrossRef ] [ Google Scholar ]
  • Yan L, Pedraza-Martinez AJ. Social media for disaster management: Operational value of the social conversation. Production and Operations Management. 2019; 28 (10):2514–2532. doi: 10.1111/poms.13064. [ CrossRef ] [ Google Scholar ]
  • Yang T, Xie J, Li G, Mou N, Li Z, Tian C, Zhao J. Social media big data mining and spatio-temporal analysis on public emotions for disaster mitigation. ISPRS International Journal of Geo-Information. 2019; 8 (1):1–23. doi: 10.3390/ijgi8010029. [ CrossRef ] [ Google Scholar ]
  • Yu M, Yang C, Li Y. Big data in natural disaster management: A review. Geosciences (switzerland) 2018; 8 (5):1–26. doi: 10.3390/geosciences8050165. [ CrossRef ] [ Google Scholar ]
  • Zamarreño-Aramendia G, Cristòfol FJ, De-San-eugenio-vela J, Ginesta X. Social-media analysis for disaster prevention: Forest fire in artenara and valleseco, Canary Islands. Journal of Open Innovation: Technology, Market, and Complexity. 2020; 6 (4):1–18. doi: 10.3390/joitmc6040169. [ CrossRef ] [ Google Scholar ]
  • Zhang C, Yao W, Yang Y, Huang R, Mostafavi A. Semiautomated social media analytics for sensing societal impacts due to community disruptions during disasters. Computer-Aided Civil and Infrastructure Engineering. 2020; 35 (12):1331–1348. doi: 10.1111/mice.12576. [ CrossRef ] [ Google Scholar ]
  • Zhang J, Ahlbrand B, Malik A, Chae J, Min Z, Ko S, Ebert DS. A visual analytics framework for microblog data analysis at multiple scales of aggregation. Computer Graphics Forum. 2016; 35 (3):441–450. doi: 10.1111/cgf.12920. [ CrossRef ] [ Google Scholar ]
  • Zhang X, Yi L, Zhao D. Community-based disaster management: A review of progress in China. Natural Hazards. 2013; 65 (3):2215–2239. doi: 10.1007/s11069-012-0471-3. [ CrossRef ] [ Google Scholar ]
  • Zhang X, Yu J, Chen Y, Wen J, Chen J, Yin Z. Supply-demand analysis of urban emergency shelters based on spatiotemporal population estimation. International Journal of Disaster Risk Science. 2020; 11 (4):519–537. doi: 10.1007/s13753-020-00284-9. [ CrossRef ] [ Google Scholar ]

Has Digital Financial Inclusion Curbed Carbon Emissions Intensity? Considering Technological Innovation and Green Consumption in China

  • Published: 11 March 2024


  • Ao Yang
  • Mao Yang
  • Fuyong Zhang
  • Aza Azlina Md Kassim
  • Peixu Wang

Humans and nature share a common destiny, and carbon emission intensity (CEI) affects the sustainable development of both the economy and the environment. With the transformation of traditional finance and the advent of big data, digital financial inclusion (DFI) plays an increasingly important role in reducing carbon emissions. Using Chinese provincial data from 2011 to 2022, this paper applies panel mediation-effect, threshold-effect, and spatial econometric models to test the impact of DFI on CEI. The main conclusions are as follows: (1) DFI, the breadth of DFI (DFI_B), the depth of DFI use (DFI_D), and the degree of digitalization (DIG) all significantly inhibit provincial CEI, with DFI_D having the most significant inhibitory effect. (2) The mediation-effect model shows that DFI can reduce regional CEI by raising the levels of technological innovation and green consumption. (3) The heterogeneity test finds that DFI has a more significant inhibitory effect on CEI in the central and western provinces. (4) Further analysis finds a single threshold effect in the influence of DFI and the level of economic development on CEI. (5) The spatial Durbin model (SDM) shows that DFI and CEI have positive spatial spillover effects; DFI significantly suppresses local CEI while increasing CEI in surrounding areas. The results enrich research on the application of DFI to CEI and provide empirical evidence for implementing carbon emission reduction more effectively and building a "Beautiful China".
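For orientation, the three model families named in the abstract are commonly written in the generic textbook forms below. This is a hedged sketch, not the paper's own specification, which the abstract does not give: M_it stands for an assumed mediator (technological innovation or green consumption), X_it for unspecified control variables, q_it for an assumed threshold variable (for example, the level of economic development) with threshold value gamma, mu_i and lambda_t for region and year fixed effects, and w_ij for the entries of an assumed row-standardized spatial weight matrix W.

```latex
% (a) Stepwise mediation model (assumed form): total effect c, path a (DFI -> M),
%     path b (M -> CEI), direct effect c'. Fixed effects and controls are
%     estimated separately in each equation.
\begin{aligned}
CEI_{it} &= \alpha_1 + c\,DFI_{it} + \phi_1' X_{it} + \mu_i + \lambda_t + \varepsilon^{(1)}_{it}\\
M_{it}   &= \alpha_2 + a\,DFI_{it} + \phi_2' X_{it} + \mu_i + \lambda_t + \varepsilon^{(2)}_{it}\\
CEI_{it} &= \alpha_3 + c'\,DFI_{it} + b\,M_{it} + \phi_3' X_{it} + \mu_i + \lambda_t + \varepsilon^{(3)}_{it}
\end{aligned}

% (b) Single-threshold panel model (Hansen-type, assumed form):
%     the slope on DFI switches from beta_1 to beta_2 once q_it exceeds gamma.
CEI_{it} = \mu_i + \beta_1\,DFI_{it}\,\mathbb{1}(q_{it} \le \gamma)
         + \beta_2\,DFI_{it}\,\mathbb{1}(q_{it} > \gamma) + \phi' X_{it} + \varepsilon_{it}

% (c) Spatial Durbin model (assumed form): spatial lags of both the outcome
%     (coefficient rho) and the regressor (coefficient theta).
CEI_{it} = \rho \sum_{j} w_{ij}\,CEI_{jt} + \beta\,DFI_{it}
         + \theta \sum_{j} w_{ij}\,DFI_{jt} + \phi' X_{it} + \mu_i + \lambda_t + \varepsilon_{it}
```

Read this way, a mediated (indirect) effect is indicated when a and b are both significant, with size a times b; a single threshold means one estimated gamma splits the observations into two regimes with slopes beta_1 and beta_2; and in the SDM, because the rho term feeds each region's CEI back through its neighbors, beta and theta are not marginal effects by themselves, which is why studies of this kind usually report direct and indirect (spillover) effects derived from them.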


Data Availability

Most of the data generated or analyzed during this study are included in this published article. The rest of the datasets used for analysis can be found in the CSMAR Database [ https://data.csmar.com/ ] and the WIND Database [ https://www.wind.com.cn/ ].

Aziz, A., & Naima, U. (2021). Rethinking digital financial inclusion: Evidence from Bangladesh. Technology in Society, 64 , 101509. https://doi.org/10.1016/j.techsoc.2020.101509

Article   Google Scholar  

Biswas, A., & Roy, M. (2015). Leveraging factors for sustained green consumption behavior based on consumption value perceptions: Testing the structural model. Journal of Cleaner Production, 95 , 332–340. https://doi.org/10.1016/j.jclepro.2015.02.042

Chang, L., Iqbal, S., & Chen, H. (2023). Does financial inclusion index and energy performance index co-move? Energy Policy, 174 , 113422. https://doi.org/10.1016/j.enpol.2023.113422

Chen, J., Cui, H., Xu, Y., & Ge, Q. (2021). Long-term temperature and sea-level rise stabilization before and beyond 2100: Estimating the additional climate mitigation contribution from China’s recent 2060 carbon neutrality pledge. Environmental Research Letters, 16 (7), 074032. https://doi.org/10.1088/1748-9326/ac0cac

Article   ADS   CAS   Google Scholar  

Chen, S., Tan, Z., Wang, J., Zhang, L., He, X., & Mu, S. (2023). Spatial and temporal evolution of synergizing the reduction of pollution and carbon emissions and examination on comprehensive pilot effects–Evidence from the national eco-industrial demonstration parks in China. Environmental Impact Assessment Review, 101 , 107147. https://doi.org/10.1016/j.eiar.2023.107147

Cheng, X., Yao, D., Qian, Y., Wang, B., & Zhang, D. (2023). How does fintech influence carbon emissions: Evidence from China’s prefecture-level cities. International Review of Financial Analysis , 102655. https://doi.org/10.1016/j.irfa.2023.102655

Connolly, J., & Prothero, A. (2008). Green consumption: Life-politics, risk and contradictions. Journal of Consumer Culture, 8 (1), 117–145. https://doi.org/10.1177/1469540507086422

Ding, X., Gao, L., Wang, G., & Nie, Y. (2022). Can the development of digital financial inclusion curb carbon emissions? Empirical test from spatial perspective. Frontiers in Environmental Science, 10 , 2093. https://doi.org/10.3389/fenvs.2022.1045878

Dong, F., Zhu, J., Li, Y., Chen, Y., Gao, Y., Hu, M., ... & Sun, J. (2022). How green technology innovation affects carbon emission efficiency: Evidence from developed countries proposing carbon neutrality targets. Environmental Science and Pollution Research , 29(24), 35780–35799. https://doi.org/10.1007/s11356-022-18581-9

Eisenman, J. (2023). Locating Africa in China’s community of shared future for mankind: A relational approach. Journal of International Development, 35 (1), 65–78. https://doi.org/10.1002/jid.3674

Fareed, Z., Rehman, M. A., Adebayo, T. S., Wang, Y., Ahmad, M., & Shahzad, F. (2022). Financial inclusion and the environmental deterioration in Eurozone: The moderating role of innovation activity. Technology in Society, 69 , 101961. https://doi.org/10.1016/j.techsoc.2022.101961

Feng, S., Chong, Y., Yu, H., Ye, X., & Li, G. (2022). Digital financial development and ecological footprint: Evidence from green-biased technology innovation and environmental inclusion. Journal of Cleaner Production, 380 , 135069. https://doi.org/10.1016/j.jclepro.2022.135069

Ge, H., Li, B., Tang, D., Xu, H., & Boamah, V. (2022). Research on digital inclusive finance promoting the integration of rural three-industry. International Journal of Environmental Research and Public Health, 19 (6), 3363. https://doi.org/10.3390/ijerph19063363

Article   PubMed   PubMed Central   Google Scholar  

Gilg, A., Barr, S., & Ford, N. (2005). Green consumption or sustainable lifestyles? Identifying the Sustainable Consumer. Futures, 37 (6), 481–504. https://doi.org/10.1016/j.futures.2004.10.016

Gleim, M. R., Smith, J. S., Andrews, D., & Cronin, J. J., Jr. (2013). Against the green: A multi-method examination of the barriers to green consumption. Journal of Retailing, 89 (1), 44–61. https://doi.org/10.1016/j.jretai.2012.10.001

Guo, Q., Wu, Z., Ding, C., Akbar, M. W., & Guo, T. (2023). An empirical analysis of the nexus between digital financial inclusion, industrial structure distortion, and China’s energy intensity. Environmental Science and Pollution Research , 1–15. https://doi.org/10.1007/s11356-023-25323-y

He, Z., Chen, H., Hu, J., & Zhang, Y. (2022). The impact of digital inclusive finance on provincial green development efficiency: Empirical evidence from China. Environmental Science and Pollution Research, 29 (60), 90404–90418. https://doi.org/10.1007/s11356-022-22071-3

Article   PubMed   Google Scholar  

Hong, M., Tian, M., & Wang, J. (2022). Digital Inclusive Finance, agricultural industrial structure optimization and agricultural green total factor productivity. Sustainability, 14 (18), 11450. https://doi.org/10.3390/su141811450

Hu, D., Fang, X., & DiGiovanni, Y. M. (2023). Technological progress, financial constrains, and digital financial inclusion. Small Business Economics, 1–29. https://doi.org/10.1007/s11187-023-00745-7

Jiang, C., & Ma, X. (2019). The impact of financial development on carbon emissions: A global perspective. Sustainability, 11 (19), 5241. https://doi.org/10.3390/su11195241

Kazemzadeh, E., Fuinhas, J. A., Salehnia, N., Koengkan, M., & Silva, N. (2023). Exploring necessary and sufficient conditions for carbon emission intensity: A comparative analysis. Environmental Science and Pollution Research, 30 (43), 97319–97338. https://doi.org/10.1007/s11356-023-29260-8

Ke, N., Lu, X., Kuang, B., & Zhang, X. (2023). Regional disparities and evolution trend of city-level carbon emission intensity in China. Sustainable Cities and Society, 88 , 104288. https://doi.org/10.1016/j.scs.2022.104288

Khan, K., Luo, T., Ullah, S., Rasheed, H. M. W., & Li, P. H. (2023). Does digital financial inclusion affect CO2 emissions? Evidence from 76 emerging markets and developing economies (EMDE’s). Journal of Cleaner Production, 420 , 138313. https://doi.org/10.1016/j.jclepro.2023.138313

Konadu, R., Ahinful, G. S., Boakye, D. J., & Elbardan, H. (2022). Board gender diversity, environmental innovation and corporate carbon emissions. Technological Forecasting and Social Change, 174 , 121279. https://doi.org/10.1016/j.techfore.2021.121279

Lee, C. C., & Wang, F. (2022). How does digital inclusive finance affect carbon intensity? Economic Analysis and Policy, 75 , 174–190. https://doi.org/10.1016/j.eap.2022.05.010

Lee, C. C., Wang, F., & Lou, R. (2022). Digital financial inclusion and carbon neutrality: Evidence from non-linear analysis. Resources Policy, 79 , 102974. https://doi.org/10.1016/j.resourpol.2022.102974

Li, G., Fang, X., & Liu, M. (2021). Will digital inclusive finance make economic development greener? Evidence from China. Frontiers in Environmental Science , 452. https://doi.org/10.3389/fenvs.2021.762231

Li, B., Liao, M., Yuan, J., & Zhang, J. (2023). Green consumption behavior prediction based on fan-shaped search mechanism fruit fly algorithm optimized neural network. Journal of Retailing and Consumer Services, 75 , 103471.

Li, J., & Li, B. (2022). Digital inclusive finance and urban innovation: Evidence from China. Review of Development Economics, 26 (2), 1010–1034. https://doi.org/10.1111/rode.12846

Li, J., Wu, Y., & Xiao, J. J. (2020). The impact of digital finance on household consumption: Evidence from China. Economic Modelling, 86 , 317–326. https://doi.org/10.1016/j.econmod.2019.09.027

Li, W., & Fan, Y. (2023). Influence of green finance on carbon emission intensity: Empirical evidence from China based on spatial metrology. Environmental Science and Pollution Research, 30 (8), 20310–20326. https://doi.org/10.1007/s11356-022-23523-6

Li, X., & Sui, S. (2023). Unraveling the influence and mechanism of digital inclusive finance on household financial substitution: Evidence from China. Asia Pacific Journal of Marketing and Logistics . https://doi.org/10.1108/APJML-09-2022-0799

Li, Y., Wang, M., Liao, G., & Wang, J. (2022). Spatial spillover effect and threshold effect of digital financial inclusion on farmers’ income growth—Based on provincial data of China. Sustainability, 14 (3), 1838. https://doi.org/10.3390/su14031838

Liang, J., Bai, W., Li, Q., Zhang, X., & Zhang, L. (2022). Dynamic mechanisms and institutional frameworks of China’s green development: An analysis from the perspective of collaboration. Sustainability, 14 (11), 6491. https://doi.org/10.3390/su14116491

Liu, J., Yu, Q., Chen, Y., & Liu, J. (2022a). The impact of digital technology development on carbon emissions: A spatial effect analysis for China. Resources, Conservation and Recycling, 185 , 106445. https://doi.org/10.1016/j.resconrec.2022.106445

Article   CAS   Google Scholar  

Liu, N., Hong, C., & Sohail, M. T. (2022b). Does financial inclusion and education limit CO2 emissions in China? A new perspective. Environmental Science and Pollution Research, 29 (13), 18452–18459. https://doi.org/10.1007/s11356-021-17032-1

Liu, X. J., Jin, X. B., Luo, X. L., & Zhou, Y. K. (2023). Multi-scale variations and impact factors of carbon emission intensity in China. Science of the Total Environment, 857 , 159403. https://doi.org/10.1016/j.scitotenv.2022.159403

Article   ADS   CAS   PubMed   Google Scholar  

Lv, M., & Bai, M. (2021). Evaluation of China’s carbon emission trading policy from corporate innovation. Finance Research Letters, 39, 101565. https://doi.org/10.1016/j.frl.2020.101565

Ma, X. W., Ye, Y., Shi, X. Q., & Zou, L. L. (2016). Decoupling economic growth from CO2 emissions: A decomposition analysis of China’s household energy consumption. Advances in Climate Change Research, 7(3), 192–200. https://doi.org/10.1016/j.accre.2016.09.004

Mukalayi, N. M., & Inglesi-Lotz, R. (2023). Digital financial inclusion and energy and environment: Global positioning of sub-Saharan African countries. Renewable and Sustainable Energy Reviews, 173, 113069. https://doi.org/10.1016/j.rser.2022.113069

Murshed, M., Ahmed, R., Khudoykulov, K., Kumpamool, C., Alrwashdeh, N. N. F., & Mahmood, H. (2023). Can enhancing financial inclusivity lower climate risks by inhibiting carbon emissions? Contextual evidence from emerging economies. Research in International Business and Finance, 65, 101902. https://doi.org/10.1016/j.ribaf.2023.101902

Ng, P. M., Cheung, C. T., Lit, K. K., Wan, C., & Choy, E. T. (2024). Green consumption and sustainable development: The effects of perceived values and motivation types on green purchase intention. Business Strategy and the Environment, 33(2), 1024–1039. https://doi.org/10.1002/bse.3535

Peattie, K. (2010). Green consumption: Behavior and norms. Annual Review of Environment and Resources, 35, 195–228. https://doi.org/10.1146/annurev-environ-032609-094328

Rawtani, D., Gupta, G., Khatri, N., Rao, P. K., & Hussain, C. M. (2022). Environmental damages due to war in Ukraine: A perspective. Science of the Total Environment, 850, 157932. https://doi.org/10.1016/j.scitotenv.2022.157932

Rosa, E. A., & Dietz, T. (2012). Human drivers of national greenhouse-gas emissions. Nature Climate Change, 2(8), 581–586. https://doi.org/10.1038/nclimate1506

Shahbaz, M., Li, J., Dong, X., & Dong, K. (2022). How financial inclusion affects the collaborative reduction of pollutant and carbon emissions: The case of China. Energy Economics, 107, 105847. https://doi.org/10.1016/j.eneco.2022.105847

Shen, M., Huang, W., Chen, M., Song, B., Zeng, G., & Zhang, Y. (2020). (Micro)plastic crisis: Un-ignorable contribution to global greenhouse gas emissions and climate change. Journal of Cleaner Production, 254, 120138. https://doi.org/10.1016/j.jclepro.2020.120138

Shi, F., Ding, R., Li, H., & Hao, S. (2022). Environmental regulation, digital financial inclusion, and environmental pollution: An empirical study based on the spatial spillover effect and panel threshold effect. Sustainability, 14(11), 6869. https://doi.org/10.3390/su14116869

Song, X., & Chen, Z. (2023). Pathways for an island energy transition under climate change: The case of Chongming Island, China. Frontiers in Energy Research, 11. https://doi.org/10.3389/fenrg.2023.1126411

Sun, Y., Liu, N., & Zhao, M. (2019). Factors and mechanisms affecting green consumption in China: A multilevel analysis. Journal of Cleaner Production, 209, 481–493. https://doi.org/10.1016/j.jclepro.2018.10.241

Tang, H. L., Liu, J. M., Mao, J., & Wu, J. G. (2020). The effects of emission trading system on corporate innovation and productivity-empirical evidence from China’s SO2 emission trading system. Environmental Science and Pollution Research, 27, 21604–21620. https://doi.org/10.1007/s11356-020-08566-x

Testa, F., Pretner, G., Iovino, R., Bianchi, G., Tessitore, S., & Iraldo, F. (2021). Drivers to green consumption: A systematic review. Environment, Development and Sustainability, 23, 4826–4880. https://doi.org/10.1007/s10668-020-00844-5

Wang, H., & Guo, J. (2022). Impacts of digital inclusive finance on CO2 emissions from a spatial perspective: Evidence from 272 cities in China. Journal of Cleaner Production, 355, 131618. https://doi.org/10.1016/j.jclepro.2022.131618

Wang, L., Liao, Y., Yang, L., Li, H., Ye, B., & Wang, W. (2016). Emergency response to and preparedness for extreme weather events and environmental changes in China. Asia Pacific Journal of Public Health, 28(2_suppl), 59S–66S. https://doi.org/10.1177/1010539514549763

Wang, L., Wang, Y., Sun, Y., Han, K., & Chen, Y. (2022a). Financial inclusion and green economic efficiency: Evidence from China. Journal of Environmental Planning and Management, 65(2), 240–271. https://doi.org/10.1080/09640568.2021.1881459

Wang, W., Gao, P., & Wang, J. (2023). Nexus among digital inclusive finance and carbon neutrality: Evidence from company-level panel data analysis. Resources Policy, 80, 103201. https://doi.org/10.1016/j.resourpol.2022.103201

Wang, X., Wang, X., Ren, X., & Wen, F. (2022b). Can digital financial inclusion affect CO2 emissions of China at the prefecture level? Evidence from a spatial econometric approach. Energy Economics, 109, 105966. https://doi.org/10.1016/j.eneco.2022.105966

Wang, X., & Zhong, M. (2023). Can digital economy reduce carbon emission intensity? Empirical evidence from China’s smart city pilot policies. Environmental Science and Pollution Research, 30(18), 51749–51769. https://doi.org/10.1007/s11356-023-26038-w

Wu, M., Guo, J., Tian, H., & Hong, Y. (2022). Can digital finance promote peak carbon dioxide emissions? Evidence from China. International Journal of Environmental Research and Public Health, 19(21), 14276. https://doi.org/10.3390/ijerph192114276

Wu, X., & Chang, H. (2023). Impact of digital inclusive finance on household tourism consumption: Evidence from China. European Journal of Innovation Management. https://doi.org/10.1108/EJIM-09-2022-0527

Xiao, Y., Ma, D., Zhang, F., Zhao, N., Wang, L., Guo, Z., ... & Xiao, Y. (2023). Spatiotemporal differentiation of carbon emission efficiency and influencing factors: From the perspective of 136 countries. Science of the Total Environment, 879, 163032. https://doi.org/10.1016/j.scitotenv.2023.163032

Xiong, M., Li, W., Teo, B. S. X., & Othman, J. (2022). Can China’s digital inclusive finance alleviate rural poverty? An empirical analysis from the perspective of regional economic development and an income gap. Sustainability, 14(24), 16984. https://doi.org/10.3390/su142416984

Xiong, M., Li, W., Xian, B. T. S., & Yang, A. (2023). Digital inclusive finance and enterprise innovation—Empirical evidence from Chinese listed companies. Journal of Innovation & Knowledge, 8(1), 100321. https://doi.org/10.1016/j.jik.2023.100321

Xu, D., Liu, E., Duan, W., & Yang, K. (2022). Consumption-driven carbon emission reduction path and simulation research in steel industry: A case study of China. Sustainability, 14(20), 13693. https://doi.org/10.3390/su142013693

Xu, P., Ye, P., Jahanger, A., Huang, S., & Zhao, F. (2023). Can green credit policy reduce corporate carbon emission intensity: Evidence from China’s listed firms. Corporate Social Responsibility and Environmental Management. https://doi.org/10.1002/csr.2506

Xue, L., & Zhang, X. (2022). Can digital financial inclusion promote green innovation in heavily polluting companies? International Journal of Environmental Research and Public Health, 19(12), 7323. https://doi.org/10.3390/ijerph19127323

Xue, Q., Feng, S., Chen, K., & Li, M. (2022). Impact of digital finance on regional carbon emissions: An empirical study of sustainable development in China. Sustainability, 14(14), 8340. https://doi.org/10.3390/su14148340

Yan, X., Deng, Y., Peng, L., & Jiang, Z. (2023). Study on the impact of digital economy development on carbon emission intensity of urban agglomerations and its mechanism. Environmental Science and Pollution Research, 30(12), 33142–33159. https://doi.org/10.1007/s11356-022-24557-6

Yang, L., Wang, L., & Ren, X. (2022b). Assessing the impact of digital financial inclusion on PM2.5 concentration: Evidence from China. Environmental Science and Pollution Research, 1–8. https://doi.org/10.1007/s11356-021-17030-3

Yang, A., Huan, X., Teo, B. S. X., & Li, W. (2023a). Has green finance improved China’s ecological and livable environment? Environmental Science and Pollution Research, 1–15. https://doi.org/10.1007/s11356-023-25484-w

Yang, B., Ma, F., Deng, W., & Pi, Y. (2022a). Digital inclusive finance and rural household subsistence consumption in China. Economic Analysis and Policy, 76, 627–642. https://doi.org/10.6981/FEM.202101_2(1).0039

Yang, M., Chen, H., Long, R., & Yang, J. (2023b). How does government regulation shape residents’ green consumption behavior? A multi-agent simulation considering environmental values and social interaction. Journal of Environmental Management, 331, 117231.

Yu, C., Jia, N., Li, W., & Wu, R. (2022). Digital inclusive finance and rural consumption structure–evidence from Peking University digital inclusive financial index and China household finance survey. China Agricultural Economic Review, 14(1), 165–183. https://doi.org/10.1108/CAER-10-2020-0255

Zhang, Q., Li, J., Li, Y., & Huang, H. (2023a). Coupling analysis and driving factors between carbon emission intensity and high-quality economic development: Evidence from the Yellow River Basin China. Journal of Cleaner Production, 423, 138831. https://doi.org/10.1016/j.jclepro.2023.138831

Zhang, R., Wu, K., Cao, Y., & Sun, H. (2023b). Digital inclusive finance and consumption-based embodied carbon emissions: A dual perspective of consumption and industry upgrading. Journal of Environmental Management, 325, 116632. https://doi.org/10.1016/j.jenvman.2022.116632

Zhang, X., Sun, H., & Wang, T. (2022). Impact of financial inclusion on the efficiency of carbon emissions: Evidence from 30 provinces in China. Energies, 15(19), 7316. https://doi.org/10.3390/en15197316

Zhang, Y. J. (2011). The impact of financial development on carbon emissions: An empirical analysis in China. Energy Policy, 39(4), 2197–2203. https://doi.org/10.1016/j.enpol.2011.02.026

Zhang, Y., Xiao, C., & Zhou, G. (2020). Willingness to pay a price premium for energy-saving appliances: Role of perceived value and energy efficiency labeling. Journal of Cleaner Production, 242, 118555. https://doi.org/10.1016/j.jclepro.2019.118555

Zhao, H., Yang, Y., Li, N., Liu, D., & Li, H. (2021). How does digital finance affect carbon emissions? Evidence from an Emerging Market. Sustainability, 13(21), 12303. https://doi.org/10.3390/su132112303

Zheng, H., & Li, X. (2022). The impact of digital financial inclusion on carbon dioxide emissions: Empirical evidence from Chinese provinces data. Energy Reports, 8, 9431–9440. https://doi.org/10.1016/j.egyr.2022.07.050

Zhengning, P. U., & Jinhua, F. E. I. (2022). The impact of digital finance on residential carbon emissions: Evidence from China. Structural Change and Economic Dynamics, 63, 515–527. https://doi.org/10.1016/j.strueco.2022.07.006

Zhong, R., He, Q., & Qi, Y. (2022). Digital economy, agricultural technological progress, and agricultural carbon intensity: Evidence from China. International Journal of Environmental Research and Public Health, 19(11), 6488. https://doi.org/10.3390/ijerph19116488

Zhu, Y., & Lin, B. (2004). Sustainable housing and urban construction in China. Energy and Buildings, 36(12), 1287–1297. https://doi.org/10.1016/j.enbuild.2003.11.007

Zou, X. Y., Peng, X. Y., Zhao, X. X., & Chang, C. P. (2023). The impact of extreme weather events on water quality: International evidence. Natural Hazards, 115(1), 1–21. https://doi.org/10.1007/s11069-022-05548-9

Author information

Authors and Affiliations

School of Business, Sias University, Zhengzhou, 451100, China

School of Economics and Trade, Henan University of Technology, Zhengzhou, 450001, China

Ao Yang, Fuyong Zhang & Peixu Wang

Graduate School of Management, Management and Science University, 40100, Shah Alam, Selangor, Malaysia

Ao Yang, Aza Azlina Md Kassim & Peixu Wang

Corresponding author

Correspondence to Mao Yang.

Ethics declarations

Ethics Approval

All sources have been cited appropriately, and no ethical issues arose during this study.

Consent to Participate

All authors of the article consent to participate.

Consent for Publication

All authors of the article consent to publish.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Yang, A., Yang, M., Zhang, F. et al. Has Digital Financial Inclusion Curbed Carbon Emissions Intensity? Considering Technological Innovation and Green Consumption in China. J Knowl Econ (2024). https://doi.org/10.1007/s13132-024-01902-3

Received: 04 December 2023

Accepted: 04 March 2024

Published: 11 March 2024

DOI: https://doi.org/10.1007/s13132-024-01902-3

Keywords

  • Digital financial inclusion
  • Carbon emission intensity
  • Technological innovation
  • Beautiful China initiative

COMMENTS

  1. Literature Review on Big Data Analytics Methods

    Literature Review on Big Data Analytics Methods DOI: http://dx.doi.org/10.5772/intechopen.86843 has a neighbor point that can be reached via "move.

  2. Big Data Analytics: A Literature Review Paper

    Big Data Analytics: A Literature Review Paper. MapReduce nodes and the HDFS work together. … including log files, sensor data, or anything of the …

  3. Big Data Analytics: A Literature Review Paper

    Abstract. In the information era, enormous amounts of data have become available on hand to decision makers. Big data refers to datasets that are not only big, but also high in variety and velocity, which makes them difficult to handle using traditional tools and techniques. Due to the rapid growth of such data, solutions need to be studied and ...

  4. Literature Review on Big Data Analytics Methods

    In this research, a literature review on big data analytics, deep learning and its algorithms, and machine learning and related methods has been considered. As a result, a conceptual model is provided to show the relation of the algorithms that helps researchers and practitioners in deploying BDA on IOT data.

  5. Big data analytics capabilities: a systematic literature review and

    This poses a novel perspective on big data literature, since the vast majority focuses on tools, technical methods (e.g. data mining, textual analysis, and sentiment analysis), network analytics, and infrastructure. ... The main argument of this systematic literature review is that the value of big data does not solely rely on the technologies ...

  6. Debating big data: A literature review on realizing value from big data

    2.2. Analysis and synthesis of the literature. Our analysis focused on summarizing and analyzing existing theories on big data value realization, highlighting prevailing debates related to this topic, and identifying supporting evidence and gaps in the literature (Jones and Gatrell, 2014). Our aim was to provide new insights that can contribute to future research and thus, to go beyond merely ...

  7. Big data stream analysis: a systematic literature review

    Recently, big data streams have become ubiquitous due to the fact that a number of applications generate a huge amount of data at a great velocity. This made it difficult for existing data mining tools, technologies, methods, and techniques to be applied directly on big data streams due to the inherent dynamic characteristics of big data. In this paper, a systematic review of big data streams ...

  8. A Systematic Literature Review and Future Perspectives for Handling Big

    Big data analytics in cancer disease-based systematic literature review is offered, acting as a road map for experts in the area to spot and deal with problems caused by new developments. A comprehensive analysis of the issues and challenges posed by deep learning-based healthcare big data analytics is given, along with a look ahead.

  9. Big data analytics in healthcare: a systematic literature review

    Big data analytics in healthcare: a systematic literature review. The current study performs a systematic literature review (SLR) to synthesise prior research on the applicability of big data analytics (BDA) in healthcare. The SLR examines the outcomes of 41 studies, and presents them in a comprehensive framework.

  10. Big data analytics: a literature review

    Dazhi Chong. With more and more data generated, it has become a big challenge for traditional architectures and infrastructures to process large amounts of data within an acceptable time and resources. In order to efficiently extract value from these data, organizations need to find new tools and methods specialized for big data processing.

  11. Full article: A systematic literature review on the use of big data for

    1. Introduction. Big data has attracted substantial interest from researchers as a means to obtain insights to find solutions, new strategies, and to uncover hidden potentials for an array of purposes (Mayer-Schonberger & Cukier, 2013). Big data is a term that primarily describes datasets that are so large, unstructured and complex that they require advanced and unique technologies to ...

  12. History, Evolution and Future of Big Data and Analytics: A Bibliometric

    Big data and performance. In management literature, at least, a cross-disciplinary overview of the BDA discussion is lacking. ... (i.e. performance in organizations). Potentially, as a result, our review does not replicate the big data research streams in healthcare, education and public management/government included in previous work (Fosso ...

  13. Towards the Use of Big Data in Healthcare: A Literature Review

    We conducted a literature review using the Scopus database over the period 2010-2020. The article selection process involved five steps: the planning and identification of studies, the evaluation of articles, the extraction of results, the summary, and the dissemination of the audit results. ... Big Data in Science and Healthcare: A Review of ...

  14. Big-data business models: A critical literature review and

    In particular, our review uses three major criteria (big-data business model types, dimensions, and deployment) to assess the state of the big-data business model literature and identify shortcomings in this literature. On this basis, we derive and discuss five central research perspectives (supply chain, stakeholder, ethics, national, and ...

  15. A new theoretical understanding of big data analytics capabilities in

    This research poses an original point of view on Big Data literature since, by far majority focuses on tools, infrastructure, technical aspects, and network analytics. ... Adebiyi A. Big data stream analysis: a systematic literature review. J Big Data. 2019;6(1):1-30. Google Scholar Jha AK, Agi MA, Ngai EW. A note on big data analytics ...

  16. A comprehensive and systematic literature review on the big data

    Systematic literature review (SLR) is a research methodology that examines data and findings of the researchers relative to specified questions [46, 47]. It aims to find as much relevant research on the defined questions as possible and to use explicit methods to identify what can reliably be said based on these studies [48, 49]. This section provides an SLR to understand the BDM techniques in ...

  17. Privacy Prevention of Big Data Applications: A Systematic Literature Review

    This Systematic Literature Review is designed to identify any privacy gap in Big Data applications, in particular in cybercrimes and various areas of the Internet, aspects related to the privacy of Big Data applications, such as techniques used to identify the privacy gaps and the security issues of the Big Data review, which may lead to ...

  18. A Literature Review on Big Data Analytics Capabilities

    by conducting an in-depth literature review. We adopted a systematic literature review approach and studied academic articles published between 2010 and 2018. We used Scopus and Web of Science (WoS) databases to find published studies related to big data analytics capabilities, twenty-five (25) of which met the selection criteria.

  19. A comprehensive and systematic literature review on the big data

    "Big data" technologies are a new generation of distributed architectures and technologies that provide distributed data mining capabilities to inexpensively, ... Systematic literature review (SLR) is a research methodology that examines data and findings of the researchers relative to specified questions [46, 47].

  20. Big Data Analytics: A Literature Review Perspective

    For industry, a literature review helps with examining areas in big data analytics that are already mature as well as identifying problems that have been solved and those that have not been solved yet. This clarity helps investors and businesses to think positively about big data (Lee et al., 2014; Chen, M. et al., 2014).

  21. A systematic literature review on the use of big data analytics in

    This review further examined the literature on the basis of data utilised for research, and Fig. 9 displays all of the big data sources investigated. Spatial data including satellite, aerial, and map data mainly serve as a visual aid for humanitarian and disaster responders, and researchers rely on these data sources to obtain greater accuracy ...

  22. Sustainability

    With the help of big data analytical techniques, the average sample size has significantly increased from 142 for articles on and before 2010 to 4005 for articles after 2010. ... Chen, C. Science Mapping: A Systematic Review of the Literature. J. Data Inf. Sci. 2017, 2, 1-40. [Google Scholar] Van Eck, N.J.; Waltman, L. Software survey ...

  23. A systematic literature review of big data adoption in ...

    The in-depth literature review on big data from the study of Chen et al. reveals that the adoption of big data in e-commerce and market intelligence catches the most attention with top five topics on competitive advantage, big data, data warehousing, decision support, and customer relationship management.

  24. Perspective of using green walls to achieve better energy efficiency

    The continuous state of development of the human society imply a big effort and demand to generate and use more and more energy to power our lifestyle. The impact of human activity has negative effects on the environment and thus solutions to maintain or increase our development rate but in the same time to decrease the pressure on the environment must be developed. In this context, the ...

  25. A systematic literature review on the use of big data analytics in

    At the start of this review, 168 million individuals required humanitarian assistance, at the conclusion of the research, the number had risen to 235 million. Humanitarian aid is critical not just for dealing with a pandemic that occurs once every century, but more for assisting amid civil conflicts, surging natural disasters, as well as other kinds of emergencies. Technology's dependability ...

  26. Has Digital Financial Inclusion Curbed Carbon Emissions ...

    Humans and nature are a community of destiny, and carbon emission intensity (CEI) affects the economy's and environment's sustainable development. With the transformation of traditional finance and the advent of big data, digital inclusive finance (DFI) plays an increasingly important role in coping with carbon emission reduction. Based on China's local data from 2011 to 2022, this paper ...