• Open access
  • Published: 20 April 2017

Internet big data and capital markets: a literature review

  • Minjian Ye 1 &
  • Guangzhong Li 1  

Financial Innovation volume  3 , Article number:  6 ( 2017 ) Cite this article

12k Accesses

18 Citations

4 Altmetric

Metrics details

Research in various academic disciplines has undergone tremendous changes in the era of big data. Everyone is talking about big data nowadays, but how exactly is it being applied in research on financial studies?

This study summarizes the sources of Internet big data for research related to capital markets and the analytical methods that have been used in the literature. In addition, it presents a review of the research findings based on Internet big data in the field of capital markets and proposes suggestions for future studies in which big data can be applied to examine issues related to capital markets.

(1) Internet big data sources related to present capital market research can be categorized into forum-type data, microblog-type data and search class data. (2) As for research about investors’ sentiments on the basis of Internet big data, the main methods of sentiment analysis include building an inventory of lexical categories, using dictionaries for analysis of lexical categories, and machine learning. (3) Many studies address whether Internet big data can predict capital markets. However, they reach no consistent conclusions, which could be due to limitations of sample and analysis method used. (4) Data collection technique and analysis methods require further improvements.

Introduction

This is the era of information explosion and a world overwhelmed by numbers and digits. Research by the International Data Corporation indicates that the global data volume is expected to reach 35 zettabytes (ZB) Footnote 1 by 2020. Beyond that, the trend of growth doubling every 2 years will be maintained. This implies that we have entered the era of big data. Simultaneously, the term “big data” has been mentioned repeatedly in both commercial applications and academic research. The World Economic Forum Footnote 2 has claimed that big data is a new type of asset class. In Forbes magazine, Rotella has labeled big data “the new oil.” Footnote 3 In addition, research on big data within academia is increasing rapidly; a summary of the number of papers published in 2010–2015 with “big data” as the subject and listed in the CNKI and Web of Science databases is illustrated in Fig.  1 .

Number of papers published in 2010–2015 with “big data” as the subject

The growing emphasis that academia has placed on research on big data can be discerned intuitively. In fact, academic research on the topic has surged in the last 3 years. Furthermore, other events show that big data plays an important role in the developmental process of modern societies. These include the launch of the Big Data Research and Development Initiative Footnote 4 by the Obama Administration in 2012 and the special report on big data Footnote 5 published by the United Nations Global Pulse.

Big data has various unique characteristics, including volume, velocity, variety, and veracity. The application of big data in the financial field has developed rapidly, and such applications are highly valued by various fields in society. The most direct application is the development of various fund products based on big data. The earliest index product based on big data to be launched in China was Dingtoubao by Galaxy Asset Management Co. This product mainly invests in the Teng An 100 Index, a stock market index jointly created by Tencent Holdings Limited and Ji An Financial Information and managed by China Securities Index Co., Ltd. (CSI). This was soon followed by three fund management companies: GF, China Southern, and Bosera. These companies successively developed big data indexes jointly with owners of Internet big data resources, including Baidu, Sina, and Alibaba. This has led to the launch of separate funds, such as the GF’s Baifa 100 Index A, China Southern’s Big Data 100 Index and Big Data 300 Index A, and Bosera’s CSI Taojin Big Data 100 Index A. Footnote 6

As of October 2016, there were 19 funds in the Chinese market named after big data, with a combined scale of funds amounting to 15 billion RMB. Footnote 7 One of the funds with outstanding performance is the Dacheng CSI 360 Internet + Big Data 100 Index Fund, which ranked second among stock-oriented funds that were newly launched in 2016. From the second quarter, it ranked top among all the big data funds and was in the top 10 in the Chinese market for stock-oriented funds. Furthermore, it achieved the highest cumulative gain of 25.6%, which not only delighted its fund investors but also surprised those in the big data fund industry. Footnote 8

As big data has been incorporated successfully into business, how should big data be applied in academic research related to capital markets? To answer this question, we must first clearly ascertain where big data is located and who owns it. At present, it seems that big data is controlled mainly by several Internet companies and public service sectors. Some organizations or departments that are relatively rich in data resources are illustrated in Fig.  2 . However, considering data availability, most research on big data in the financial field has focused on big data from social networking platforms and search engines. Hence, the review subjects for this study are studies on big data from forums, microblogs, and search engines related to the capital markets. The review was undertaken from the perspective of data acquisition.

Distribution of big data

The main review topics are as follows: (i) big data indexes used in research on capital markets and analysis methods and (ii) research findings on big data indexes related to the field of capital markets. These two aspects are discussed in "Internet big data related to capital markets" and "Application of Internet big data to capital markets " sections, respectively. "Conclusions and future research" section provides recommendations on future research directions.

Internet big data related to capital markets

Observing investors’ decision-making processes was difficult prior to the advent of the Internet era. Furthermore, it was difficult to directly observe information that investors are concerned about and their views concerning the market. With the ubiquity of information technology and the Internet, an increasing number of investors are gathering information from the Internet for analysis. Through a click of the mouse and inputs via the keyboard, such investment decision behavior has resulted in the accumulation of a huge amount of unstructured information in the virtual network. For example, Google processed 2 million searches per minute in 2012, whereas Twitter users posted approximately 100,000 tweets per minute (James 2012 ). Over time, these types of data gradually have become the basic sources of big data for research related to capital markets.

Specifically, most research on Internet big data related to capital markets can be categorized into forum-type, microblog-type, and search-type data. Given the significant differences in the structures of the big data, each type is explained and analyzed separately in the following subsections.

Forum-type data

Internet forums provide users with a platform for communication and interaction that transcends time and geographical region. Within the same message board of a forum, investors are able to interact with other investors worldwide. Forums on stocks and shares are generally categorized into separate message boards for different stocks; therefore, investors can have discussions and exchanges on those message boards corresponding to the stocks of their interest. On forum sites, the general approach is to automatically place that the newest posts and posts on a message board that are being discussed most intensely on the home pages. In addition, message board managers evaluate the content quality of posts and bump the good ones to the top or pin these as recommended posts, thereby making them easily found within the sub-board that contains all the pinned or sticky posts. This means that investors can quickly obtain the latest, relatively important, and high-quality information on the stocks that they are interested in by checking the relevant message boards. However, pinned posts are screened and selected by message board managers, indicating that a certain degree of subjectivity might be involved. Moreover, since individuals are unable to customize the various message boards to focus on information related to a particular topic, it might be inefficient to obtain information from stock message boards given investors’ limited attention. In addition, the timeliness of interactions on the boards is poor.

The main Chinese stock message boards include Eastmoney.com, Homeway.com, Iguba.com, and Taoguba.com. For existing studies, the stock message boards of Eastmoney.com are the main data sources selected to represent Chinese forums. Examples of such studies include those by Ackert et al. ( 2016 ), Dong and Xiao ( 2011 ), Huang et al. ( 2016 ), Nan ( 2015 ), Shen et al. ( 2013 ) and Zheng et al. ( 2015 ). Several probable reasons exist for this situation, as follows. (i) Eastmoney.com has always been ranked the most popular Footnote 9 among Chinese stock message boards, its users are active, and the volume of its cumulative data is huge. (ii) The time period over which its public data has been saved is longer. For example, its earliest saved data on Gree Electric Appliances Inc. dates back to August 29, 2007, whereas data from Homeway.com are available only for the 3 months. (iii) Information on Eastmoney.com is richer and more detailed. For example, studies by Dong and Xiao ( 2011 ) and Huang et al. ( 2016 ) required identifying of the geographical region of users who posted messages. Only Eastmoney.com can provide relatively sufficient data to meet such a requirement.

In terms of the basic data indicators, the main types selected for existing studies generally can be classified as quantitative and content class indicators. The former includes the number of posts, clicks, shares, and investors participating in discussions (Ackert et al. 2016 ; Dong and Xiao 2011 ; Huang et al. 2016 ; Jiang et al. 2015 ; Shen et al. 2013 ; Zheng et al. 2015 ). Simultaneously, most literature has evaluated the information contents of forum posts using textual analysis to construct the content class indicators. The main types include indexes based on numbers of opposers and supporters and the degree of divergence in opinions (Antweiler and Frank 2004 ; Das et al. 2005 ; Nan 2015 ; Sabherwal et al. 2011 ; Zhang and Swanson 2010 ). In a study on information exchange over the Internet and herding behavior in the stock market, Zheng et al. ( 2015 ) selected the natural logarithm for the total number of daily posts as the proxy variable for investors’ communication.

Presently, there are two main text classification methods for constructing indicators of market sentiments: manual classification and machine learning. Studies based on Chinese forums mostly have relied on manual classification with supplementation from simple programming classification (Ackert et al. 2016 ; Nan 2015 ; Shen et al. 2013 ). In his study on the impact of online public opinions on private placement of listed firms, Shen et al. ( 2013 ) used web crawler technology to collect all posts that contain the word “additional issue.” He then randomly selected 3,000 posts from the overall samples for manual classification. In addition, 26 words that appeared in those selected posts were deemed to reflect the tendency to oppose. These words included “opposition,” “failure,” and “misappropriation of funds.” Finally, a program was used to identify whether posts contained any opposing content so that all posts could be categorized as either the opposing or supporting type. The proportions of each type of posts relative to the total number of posts were used to construct the indexes for opposition and support, which were used to gauge the direction of online public opinions.

Based on Shen et al. ( 2013 ), Nan ( 2015 ) calculated the squared difference between the indexes for opposition and support. This led to the construction of a variable for the degree of divergence in opinions. Similarly, Ackert et al. ( 2016 ) manually set up keywords reflecting bullish or bearish market sentiments and then used a program to identify the market sentiments reflected in the posts, thereby building an indicator for trends in market sentiments.

For research on English forums, machine learning or tagging of posts is used to determine textual sentiment. Antweiler and Frank ( 2004 ) collected text information from posts made on message boards in Yahoo! Finance and Raging Bull forums that corresponded to 45 stocks. The Naive Bayes algorithm was applied to classify textual sentiment, which was used as the basis for constructing the indicator for the degree of divergence. Separately, Sabherwal et al. ( 2011 ) selected data from posts on forums in TheLion.com. Similarly, the Naive Bayes classifier method was applied to build indicators for market sentiments and degree of divergence. Leung and Ton ( 2015 ) captured text information from posts made on the HotCopper forum for more than 2,000 stocks and then applied the Naive Bayes algorithm to classify market sentiments and build an indicator for the degree of divergence.

Besides the Naive Bayes method, some scholars have used the maximum entropy method for text classification. Zhang and Swanson ( 2010 ) collected data from posts on message boards in Yahoo! Finance that corresponded to 30 U.S. listed companies. The maximum entropy model was used for text classification, leading to the construction of an indicator for market sentiments. In addition, some scholars have realized that the reliability ranking of the person who made the post affects the level of trust of other investors regarding its contents. This causes posts to have different weightings within the overall indicator for market sentiments. Hence, it might be more appropriate to use the reliability-weighted average method instead of a simple weighted average method to build the indicator for overall trends in market sentiments.

Forums on TheLion.com have an exclusive feature that indicates the reliability level of its users. Thus, Sabherwal et al. ( 2011 ) used the related data to create a reliability-weighted overall indicator for market sentiments. Gu et al. ( 2006 ) calculated the reliability scores for forum users based on the forecast accuracy of their previous posts, creating a reliability-weighted overall indicator for market sentiments. After 2004, some of the data sources (e.g., Yahoo! Finance) began allowing users to tag their personal sentiments about the market when making posts. This enables scholars to directly use the self-disclosed sentiments and tag these for text classification (Kim and Kim 2014 ).

In addition, some stock message boards have unique data that provide the appropriate context for studies in specific fields. For example, among Chinese stock message boards, only Eastmoney.com provides data on the IP addresses of users who have posted a message. This unique feature facilitates the study on home bias in investors communication. Dong and Xiao ( 2011 ) employed this feature; through users’ respective IP addresses, the authors identified the geographical regions of participants involved in discussions on stock message boards. Looking at the message board of a particular stock, the authors compared the number of investors from a particular province engaged in discussions against the total number of investors having discussions on that board. This enabled the authors to measure the local level of exchange for that stock in the province. Extending the same approach to other provinces, the authors established the differences in levels of exchange between two provinces for a particular stock. Similarly, Huang et al. ( 2016 ) utilized the IP data of Eastmoney.com and constructed a quantitative indicator of local bias in investor attention. The indicator directly measures the magnitude of home bias in information exchange.

On the HotCopper forum, if investors publish posts that do not comply with stipulated regulations, board managers can make the necessary revisions and state the reason(s) for doing so beneath the affected posts. Tapping into this feature, Delort et al. ( 2011 ) examined the reasons for revisions and identified posts that contained keywords relating to the manipulation of stock prices. The event studies method was then used to analyze those posts in order to examine the situation of stock price manipulation known as “pump and dump”.

Microblog-type data

This type of data completely differs from forum-type data both in terms of data structure and means of information exchange. In terms of data structure, microblogging on a specific stock does not occur on message boards dedicated to that stock. Hence, the method used for collecting samples definitely impacts the conclusion. On the other hand, the efficiency of information exchange via microblogging is much faster than that via forum-type information. The transmission mode of microblog-type data is that of external chain reactions, which is much more efficient than the internal accumulation mode for forum-type data. Considering that investors have limited attention, microblogs help increase investors’ efficiency in accepting information. This is because the follow classification method and push mechanism of microblogging can filter irrelevant information more effectively.

Presently, the main microblogging platforms in China are Sina and Tencent, whereas Twitter is the main microblogging platform outside China. For research related to capital markets, most scholars typically have used the Sina microblog as the source for Chinese microblog-type data. For the English equivalent, Twitter is the main data source.

The basic data used for empirical study are quantity-type indicators, including the number of microblogs published, shares, and comments made (Cheng and Lin 2013 ; Mao et al. 2012 ; Ruiz et al. 2012 ; Sul et al. 2016 ; Xu and Chen 2016 ). The main content-type indicators include the frequency with which specific keywords appear or sentiment indexes (Bartov et al. 2015 ; Bollen et al. 2011a , 2011b ; Cheng and Lin 2013 ; Sprenger et al. 2014a ; Xu and Chen 2016 ). In addition, some scholars have used the large amount of microblogging data on user-follower relationships to construct social network-type indicators (Ruiz et al. 2012 ; Yang et al. 2015 ).

For research based on Sina microblog’s big data, Xu and Chen ( 2016 ) used web search technology to collect all content published by the official microblogs of listed companies. Next, 104 keywords were used to classify the microblogs into disclosure and nondisclosure types. The keywords were then grouped and used to categorize information disclosed by the official microblogs of listed companies into one of the following four categories: those related to the company’s business; finance; research and development; and reputation. The number of disclosures made by a listed company via its microblog in a day is treated as the company’s level of disclosure, and the frequency with which the disclosure type of keywords appears in the company’s microblog was used to measure the disclosed information contents. Considering that microblogs contain relatively substantial amounts of irrelevant information, Xu and Chen ( 2016 ) measured the level of noise in microblogs by subtracting the disclosure type of microblogs from the total number of microblogs published on the same day.

Separately, Cheng and Lin ( 2013 ) selected the microblogs of five accredited financial media organizations. Data on the number of microblogs published and commentaries made via these microblogs were used as the sources of sample data. Using sentiment analysis techniques, Cheng and Lin ( 2013 ) quantitatively calculated the amount of change in market sentiments at various levels, including the word, sentence, and overall post. The findings were then synthesized to become an indicator for investors’ daily market sentiments. In addition, other scholars have used the event studies method to analyze changes in the stock market performance of listed companies prior and subsequent to the launch of their official Sina microblogs (Jin et al. 2016 ).

With the exception of Mao et al. ( 2012 ), who examined the correlation between the number of posts and stock market from the levels of individual stocks, industries, and indexes, most other scholars who have used Twitter’s big data for research have focused only on indicators of market sentiments. Sprenger et al. ( 2014b ) collected Twitter data related to listed companies, used the Naive Bayes algorithm for text classification of market sentiments, and constructed a daily overall indicator to reflect those sentiments. Sul et al. ( 2016 ) collated information in tweets related to listed companies, examined the text contents, and determined market sentiments by analyzing the lexical categories (parts of speech) using the Harvard-IV dictionary. The authors similarly established a daily overall indicator of market sentiments.

For analytical purposes, Bartov et al. ( 2015 ) constructed four tendency indicators of market sentiments using the enhanced Naive Bayes classifier, Harvard-IV dictionary, and inventory of lexical categories by Loughran and McDonald ( 2011 ). After gathering information from Twitter on the numbers of posts, fans, and shares, Zhang et al. ( 2011a ) extracted vocabulary related to the words “hope” and “fear.” He then used the daily frequency of occurrence of such vocabulary as the indicator of market sentiments.

Some scholars have conducted in-depth mining of relationship class data found in microblogging data to undertake social network research. Ruiz et al. ( 2012 ) collected two main categories of data. The first category pertained to the numbers of posts and shares; the second category related to statistical variables of social networks, including the connected component of social networks and centrality of distribution. Yang et al. ( 2015 ) collected data on the personal information of Twitter users, in addition to relationship class data on users and their followers, which facilitated the development of a financial social networking structure within Twitter.

Search class data

Currently, Baidu and Google are the mainstream Chinese and English search engines, respectively. In China, Baidu has the most extensive market coverage. The majority of investors obtain information related to the stock market via this search engine (Zhang et al. 2014b ). At the global level, Google’s market share far exceeds that of other products in the same category. Most studies on the U.S. market have used the search data in Google Trends ( www.google.com/trends/ ) as their study sample. Search class data directly reflects users’ behavioral patterns for accessing information. In addition, keywords can capture the contents of users’ information needs. The search behavior of an individual directly expresses the user’s information needs, while the aggregate search behavior reflects the level of attention paid to a particular event, or the demand for certain types of information.

The basic search class data are mainly those utilizing the number of searches to measure the popularity of a topic and those that use search results to measure information contents. Although Baidu does not publish raw data on the number of searches, it has built separate indicators on the levels of media and users’ attention on a topic by applying big data analytical techniques to the raw data. Most studies have adopted these two indicators as proxies for attention levels. An example is Liu et al. ( 2014 ), who considered the absolute value of data on users’ and the media’s levels of attention as proxy variables for attention levels. However, Zhang et al. ( 2014b ) indicated that the absolute values of attention indicators often ignore the impact of company size on the level of attention received. The authors proposed that instead of the magnitude of the absolute value, the relative level of its changes is more worthy of concern. Based on this principle, Zhang et al. ( 2014b ) used the relative value of the Baidu indicator to measure search intensity. Based on that, the authors further constructed an indicator of positive rate of change in online search intensity, which acted as the proxy for changing trends in the search behavior of investors in the market.

In addition to data on search intensity, Baidu’s data on search results are also valuable. Most studies tend to focus on a particular network platform for data collection, and Baidu’s search results can reflect practically all of the content information found on the Chinese Internet. From this aspect and based on a mining algorithm for text semantics, Zhang et al. ( 2011b ) used Baidu search keywords to “search for X number of relevant pages.” With the search results, the authors used various techniques such as keyword selection, application of the “advanced search” function in the search engine, processing of date formats, and data cleaning to construct an open source information indicator. This indicator allows for continuous day-by-day observation of variables and facilitates the examination of the impact of open source information on asset pricing.

Google publishes two types of data that can be applied for research: (i) raw search data and (ii) search indicators that have been standardized. These allow for more diversified types of indicators to be constructed during research. Joseph et al. ( 2011 ) collected search data on the Standard & Poor’s (S&P) 500 Index’s component stocks. The authors directly defined the standardized search indicators as indicators of investor sentiments. After collecting the search data on the Russell 3000 Index’s component stocks, Da et al. ( 2011a ) defined the raw data on search numbers as the search volume index (SVI). After taking the logarithm of that index and performing median adjustment, the authors proposed an adjusted SVI, which they used to measure the level of investor attention. Takeda and Wakao ( 2014 ) collated Google search data on the Japanese market and built three indicators of search intensity.

Some scholars have been concerned that the weekend effect and seasonal variations might affect search volumes (Da et al. 2015 ; Drake et al. 2012 ). As such, these scholars often eliminated time trends from the raw data. Drake et al. ( 2012 ) argued that online search behavior reflects investor demand for public information. Thus, after collecting daily search data for the S&P 500 Index’s component stocks, the authors performed regression of the SVI against a time dummy variable. The residual value was then used to construct an abnormal SVI for measuring changes in demand for information. Da et al. ( 2015 ) selected 118 words from the Harvard IV-4 and Lasswell value dictionaries as keywords for searches. Furthermore, the authors used the Google search engine to obtain the raw search data. The latter was subjected to both first-order difference and seasonal trend adjustments to construct the Financial and Economic Attitudes Revealed by Search (FEARS) Index for measuring market sentiments.

Summary of usage of Internet big data

The discussion in " Forum-type data " to " Search class data " subsections indicates that the main research direction for big data is the impact of investor sentiments on the financial sector (Antweiler and Frank 2004 ; Bartov et al. 2015 ; Sprenger et al. 2014a ; Sabherwal et al. 2011 ). At present, the main methods for judging sentiments include building an inventory of lexical categories, using dictionaries for analysis of lexical categories, and machine learning.

The first method mostly relies on the collection of text needed for research. A large number of sentiment-type keywords are selected from the sample text to build an inventory of lexical categories, which are used as the judgment criterion for classification of all the text (Nan 2015 ; Shen et al. 2013 ; Xu and Chen 2016 ; Zhang et al. 2011a ). The second method is based on the lexical classifications in dictionaries. The relevant keywords are extracted and used for the classification of collated text (Bartov et al. 2015 ; Da et al. 2015 ; Sul et al. 2016 ). In the field of financial research, the main distinction between these two methods lies in the sources of keywords: the source of the first method is text information from the samples, whereas the sources for the second method are actual dictionaries or inventories of lexical categories developed on the basis of dictionaries. Although the first method seems more subjective, this method, which is based on manual screening of keywords from the samples, might be more appropriate in the Chinese context. This is due to vast linguistic variations such that the lexical category of a Chinese word might not be completely consistent across different contexts.

The main dictionary selected for the second method is the Harvard IV-4 (Da et al. 2011a ). However, many vocabularies that are not considered negative in the financial context are grouped into the negative lexical category in the Harvard IV-4 (Loughran and McDonald 2011 ). To address this issue, Loughran and McDonald ( 2011 ) developed an inventory of lexical categories that is more appropriate for financial research. However, Sul et al. ( 2016 ) highlighted that the inventory by Loughran and McDonald ( 2011 ) is more suitable for the analysis of formal financial documents (e.g., 10 K filings). Since microblog class text employs colloquial and popular terminologies, the Harvard-IV dictionary is still deemed more appropriate for their analysis.

The third method-machine learning-uses computer algorithms for the classification of textual sentiment. The main steps of machine learning are as follows: (i) select a specific text as the corpus training set and manually classify the words contained within, (ii) use a computer algorithm, such as the Naive Bayes algorithm, to train the text in the training set and establish the judgment rules for text classification, and (iii) apply the judgment rules to all text classifications.

The Naive Bayes algorithm is currently one of the most popular training algorithms for classifying textual sentiment (Antweiler and Frank 2004 ; Leung and Ton 2015 ; Sabherwal et al. 2011 ). Through simple comparison, Antweiler and Frank ( 2004 ) found that the accuracy of algorithm classification is actually higher than that of manual classification. Other than the Naive Bayes algorithm, some scholars have also used the maximum entropy model for text classification (Zhang and Swanson 2010 ).

During the process of data collection, posts related to stock information within forum class big data are often concentrated in the same message boards. This makes it relatively simple to determine the corresponding relationship between posts and stocks when collecting data. However, this process is relatively more complex for data of the microblog and search classes. This is because these two classes of data do not have specific boundaries that segregate the various stock data.

When acquiring search data, scholars have to collect corresponding data for a stock based on the keywords relevant to it. The accuracy of the search data is closely related to the keywords used for the search. Keywords available for scholars to choose from include stock code, company name, and names of a company’s main products (Da et al. 2011a , 2011b ). Keywords used for empirical research must be carefully selected by considering the special features of the research topic. Often, some stock codes or company names are similar to keywords for other objects and events. Examples include the Chinese stock-Zhangjiajie, which also is a city name, and American company Apple, which is also a kind of fruit. Search data for these types of keywords would often contain a lot of other irrelevant information, which must be removed during data collection (Da et al. 2011a ).

For the acquisition of microblog class big data related to stocks, it is similarly necessary to select appropriate search rules for text screening to ensure that the search contents contain relevant information. In their study on Twitter, Sprenger et al. ( 2014a ) used hashtags to collect data from text information. Sul et al. ( 2016 ) used the $ symbol and a stock’s abbreviation as the search criterion.

Application of Internet big data to capital markets

Internet big data and stock market performance.

The Internet has revolutionized the manner in which information is transmitted and the pattern by which investors process information (Barber and Odean 2001 ; Moat et al. 2014 ). At the present stage, big data from forums, microblogs, and search engines are mainly used to examine their impact on stock market performance. Many scholars believe that the various indicators constructed on the basis of big data (e.g., those of market sentiments, divergence in opinions, and level of attention) impact multiple variables, including stock returns, trading volumes, and volatility. Information extracted from Internet big data can definitely explain stock market performance to a certain extent (Alanyali et al. 2013 ; Bordino et al. 2012 ; Gloor et al. 2009 ; Siganos 2013 ; Sprenger et al. 2014b ; Wysocki 1998 ).

At the level of individual stocks, Wysocki ( 1998 ) initially found that the number of stock-related posts published at night is related to trading volumes. Using posts data made on Yahoo! Finance and Raging Bull, Antweiler and Frank ( 2004 ) discovered that information contained in the posts can help forecast stock volatility and stock returns, although the latter does not have any economic significance. In addition, the greater the divergence in market sentiments, the larger the stock trading volume. Sprenger et al. ( 2014b ) demonstrated that sentiments contained in tweets are significantly correlated with stock returns. The greater the number of daily microblogs, the larger the stock trading volume. Significant correlations were observed between divergence in market sentiments reflected on the microblogs and stock volatility.

Zhang et al. (2011b) conducted searches in Baidu using keywords. Furthermore, the authors defined the number of web pages returned from the searches as measurement indicators for the amount of information contained in social media. After analysis of information on social media and asset pricing, the authors found that the former is a rich source of effective information that affects abnormal stock returns. In addition, Vlastakis and Markellos ( 2012 ) found a significant and positive correlation between the volumes of Google searches and both market trading volumes and volatilities. When investors’ level of risk aversion increases, Google search volumes also increase. Ruiz et al. ( 2012 ) examined the user-follower relationships established in microblogs and concluded that the connection components within social networks and the number of nodes in interaction graphs are significantly correlated with trading volumes and stock prices, with the correlation being stronger for the former than the latter. Nevertheless, trading strategies developed on the basis of correlation between stock prices are superior to basic trading strategies.

Empirical research by Zhang et al. ( 2014b ) revealed that the intensity of online searches by investors impacts short-term stock returns, short-term trading volumes, and cumulative returns. In addition, investors’ online searches have stronger explanatory power and better forecasting ability on the stock market compared with the traditional variables of investors’ sentiments and level of attention. Shen et al. ( 2013 ) discovered that for a company facing negative public opinion about its private placement, the excess returns on stocks subsequent to the private placement notice are significantly negative.

Da et al. ( 2011a ) found that an increase in search volumes often meant a rise in stock prices over the subsequent fortnight, in addition to price reversals within the year. A similar reversal effect exists in the Chinese stock market (Yu and Zhang 2012 ; Zhang et al. 2014a ). Other scholars have analyzed the link between search volumes and initial public offering (IPO) premiums. Da et al. ( 2011a ) found that an increase in searches prior to IPO indicates greater gains on the first day when the shares are listed. The study by Song et al. ( 2011 ), based on data from Google Trends, indicated that online search volumes for a pre-IPO stock have better explanatory power and forecasting ability on a company’s level of stock sales, excess returns on the first day of trading, and long-term performance. Search volumes can explain 23% of the first-day excess returns and 10% or more of long-term cumulative returns. In addition, results from the empirical research by Nan ( 2015 ) indicate a significant and positive correlation between divergence of online opinions in stock message boards and IPO premiums.

For the indexes, Zhang et al. (2011a) found that the proportion of emotion-related vocabulary on microblogs is significantly and negatively correlated with the Dow Jones Index, NASDAQ Index, and S&P 500 Index. However, the proportion is significantly and positively correlated with the Volatility Index. For the Dow Jones Index, Bollen et al. ( 2011b ) showed that the addition of sentiment-type indicators significantly improves forecasting results. The accuracy of analysis for the daily direction of change in the Dow Jones Index was 86.7%, while the mean absolute percentage error decreased by 6%.

Da et al. ( 2015 ) found that the FEARS Index developed using Google’s search data can predict trends in short-term returns, volatility changes, and capital flows of mutual funds. Furthermore, the addition of Twitter data improves the model’s forecasting accuracy of the S&P 500 Index. Cheng and Lin ( 2013 ) showed that sentiment indicators of investors on social media are positively correlated with stock market index returns and trading volumes. The impact of those two factors on sentiment indicators can last for more than 40 trading days.

Mao et al. ( 2012 ) undertook a comprehensive analysis of the behavioral characteristics of Internet big data and various aspects of the stock market. They further analyzed the correlation between the number of Twitter posts and the stock market from the levels of individual stocks, industries, and indexes. Furthermore, they showed that Twitter could help predict stock market performance, especially at the indexes level.

However, the results of some empirical research has revealed that Internet big data does not improve forecasting results on the stock market. Kim and Kim ( 2014 ) highlighted that market sentiments contained in posts on message boards cannot predict future returns, volatilities, and trading volumes of stocks. In addition, the findings of Tumarkin and Whitelaw ( 2001 ) demonstrated that information in posts on message boards cannot predict stock returns or excess trading volumes, thereby supporting the efficient market hypothesis. Although Zhao et al. ( 2013 ) proved a positive correlation between search intensities on Baidu and stock returns, the rate of change in the level of attention is not a significant risk factor. The authors concluded that search intensities on Baidu do not systematically affect stock returns.

Although Internet big data was found to impact stock market performance, studies have shown that stock market performance similarly affects the behavioral characteristics of online investors. Kim and Kim ( 2014 ) discovered that sentiments in investors’ posts are affected by past performance of the stock. Zhang et al. ( 2014b ) indicated that although the stock market can affect online searches, online searching behaviors affect and predict stock market performance to a greater extent. The endogenous problem that exists between online searches and stock returns only has a minimal impact on forecasting ability.

Some scholars believe that the forecasting results of Internet big data on the stock market are affected by other factors. Many of these authors have discussed the relationship between the weight ratio of investor information and forecasting ability. Gu et al. ( 2006 ) weighed each post’s recommendation by its author’s credibility based on the accuracy of his/her past posts. The authors proved that a credibility-weighted recommendation of a stock message board can predict stock returns but a simple-weighted recommendation cannot. Yang et al. ( 2015 ) found that for the Twitter social network, weighted sentiment indicators based on critical nodes have better forecasting ability for the financial market than do general sentiment indicators. Zhang et al. ( 2016 ) identified financial users who were invited and certified by Sina Weibo as “celebrities.” Through event study analysis, the authors determined that posts made by celebrities could significantly predict stock returns compared to those made by ordinary users. The former contained more future public information and current private information, whereas the latter mostly comprised outdated information, indicating that the role of ordinary users tended toward one of information follower rather than of provider.

However, other scholars have learned that investors who are more influential might not publish information with better forecasting ability. According to Sul et al. ( 2016 ), tweets published by users with more followers are unable to predict stock returns, but those by fewer followers significantly impact future stock returns. In this regard, the authors believed that information disseminated by the former is quickly reflected in the stock prices and, hence, is not predictive. This reason was supported when analyzing the number of shares. Sul et al. ( 2016 ) further found that the more a piece of information is shared, the poorer is its forecasting results and vice versa. A trading strategy based on the aforementioned findings can achieve annualized returns of 11–15%.

Considering that Internet big data has the advantage of geographical identification, some scholars have introduced home bias into their studies on forecasting ability. Dong and Xiao ( 2011 ) established that the phenomenon of home bias exists in communication on stock message boards. There is a greater probability that investors in stock message boards will participate in discussions about local stock information. This home bias significantly impacts stock prices. The larger the proportion of local investors involved in information exchange, the higher are stock prices.

Huang et al. ( 2016 ) used IP data from Eastmoney.com to construct a quantitative indicator of investors concerned about home bias. The construction of this indicator is more refined compared with that by Dong and Xiao ( 2011 ). Their findings demonstrated that the situation in which investors are concerned about home bias is more severe in less developed regions and that the level of concern is affected by market size, turnover rate, and name of securities. Ackert et al. ( 2016 ) found that the advice of opinion leaders has greater investment value. In addition, the authors were more concerned about corporations from “home” and, thus, were more accurate when making related forecasts.

Nevertheless, other scholars have held the view that Internet big data’s forecasting ability with regard to stock market performance is affected by other factors, including the difficulty of a stock being arbitraged, level of attention on the company concerned, event type, and disclosure environment. Joseph et al. ( 2011 ) compiled search data on S&P 500 component stocks and defined search intensity as the indicator for investors’ market sentiments. Search intensity is considered to forecast weekly stock returns and trading volumes steadily. Moreover, the relationship between returns and search volume might be affected by the difficulty of a stock being arbitraged. For companies that received less attention from the market, Blankespoor et al. ( 2013 ) established that the posting of information via Twitter can reduce the degree of information asymmetry. Furthermore, the authors established that a positive correlation exists between the level of information dissemination and stock liquidity. In addition, Sprenger et al. ( 2014a ) indicated that advanced stock returns for good news are higher than those for bad news and that the impact of news events on stock markets significantly differs for various event types.

The stocks of a small company are small cap and have weak profitability and poor underwriting capacity. Nan ( 2015 ) discovered that the IPO premiums for such stocks are more vulnerable to divergence of opinions among investors on stock message boards. Xu and Chen ( 2016 ) empirically revealed that disclosure via microblogs could significantly increase same day excess returns and excess trading volumes for the company’s stocks. The level of increase is affected not only by the degree of disclosure intensity and information density of the disclosure but also by noise information in the microblogs. In addition, when microblogs are used to disseminate information that has already been made public, the resultant market response will be stronger than an announcement that is not circulated via microblogs. The impact of disclosures via microblogs is greater for companies that are relatively out of the limelight, and the effect of such disclosures on the trading behavior of individual investors is more significant.

Internet big data and other research related to capital markets

Other studies have been conducted in areas outside of stock market performance. These studies have found that Internet big data can predict the performance of companies. Da et al. ( 2011b ) demonstrated that the search intensity for companies with main products can predict their corporate profitability upon listing. The forecasting effect is especially significant for corporations that have fewer products and for growth companies. In addition, Bartov et al. ( 2015 ) discovered that overall sentiment indicators can predict a company’s quarterly profitability, in addition to excess earnings after announcement of its quarterly earnings data. This forecasting effect is more pronounced for companies whose information environments are poorer. Shen et al. ( 2013 ) found that for companies facing more negative public opinions, the probability of their performance declining after implementation of private placement is higher.

Other scholars’ area of interest is the impact of Internet big data on the regulatory mechanism. An example is the empirical research by Shen et al. ( 2013 ), who found that companies facing negative public opinions about theirs private placements have a significantly lower probability of their private placement proposals being approved by the relevant departments after evaluation. However, such negative public opinions do not significantly impact the probability of the private placement proposal being passed at shareholders’ meetings. Separately, studies about the herding effect have demonstrated that the immediate and next day effects are weakened by online communication. Dynamic interactions exist between this effect in the stock market and online communication; the latter can weaken the herding effect, suppress the continued spread of herding behavior, and improve market efficiency (Zheng et al. 2015 ).

Conclusions and future research

From the literature review, Internet big data sources related to present capital markets research can be categorized into forum-type data, microblog-type data and search class data. Based on these data, researchers can build some more complicated variables to analyze traditional questions. With regards to research about investors’ sentiments based on Internet big data, the main methods of sentiments analysis include building an inventory of lexical categories, using dictionaries for analysis of lexical categories, and machine learning. Many studies analyze whether Internet big data can predict capital markets. but they reach no consistent conclusions, which might be due to sample and analysis methodlimitations.

Through the summary stated above, we believe that research on Internet big data and capital markets has achieved some results. Nevertheless, there remains room for improvement and enhancement in the methods used for data collection and analysis, as well as the areas covered in the research. In most existing studies about Internet big data and capital markets, data collection was mainly undertaken for forum, microblog, and search classes of big data. From the perspective of data acquisition, the majority of the samples in the literature were treated as representative samples. These included the component stocks of the various indexes and specified stocks of high-tech enterprises. The majority of the samples did not strictly comply with the definition of research based on full samples, which should be an important characteristic of big data research. Moreover, the data source was usually a single and particular platform. Relatively few studies have analyzed data that were comprehensively collected from multiple platforms. Overall research based on big data would benefit if there were bigger breakthroughs in terms of data collection and information aggregation.

The time spans of many studies tended to be relatively short because these were limited by the short traceback time of data from many platforms. In addition, the degree of replicability of research conclusions by other scholars was low, and the conclusions did not facilitate multiangle studies for a specific issue using the same benchmarks as the basis. This issue can be resolved only through the establishment of specialized databases that cater to research. This would require the cooperation of corporations that are sources of big data through the implementation of an appropriate method.

Presently, Twitter has been officially authorized and is building the GNIP database for supporting research (Bartov et al. 2015 ). However, specialized databases have not yet been developed for other forum- and microblog-class platforms. Separately, most study samples were collected by the study teams themselves programming. During the initial sample collection stage, it was inevitable that microblog- and search-class big data face the problem of noise interference. The direction of future research for sample collection algorithms is to identify methods for the accurate identification and collection of fuzzy but relevant information (Godbole et al. 2007 ). Big data research samples will undoubtedly be more comparable if specialized databases were to be established through the application of unified technical means across big data platforms, which are suitable for research use.

From the perspective of data analysis, analytical methods used in current studies have remained relatively simple and have room for improvement in terms of accuracy level. Taking the textual analysis technique as an example, Koppel and Shtrimberg ( 2006 ) found that the classification accuracy of machine learning algorithms can reach 70.3 and 65.9% for intra-sample and out-of-sample classifications, respectively. In addition, high overall accuracy in classification of textual sentiment has been achieved through the machine learning method. However, even higher accuracy levels would be beneficial for eliminating noise interference in studies, thereby ensuring stability of the conclusions (Nardo et al. 2016 ).

This is especially the case for textual analysis of Chinese big data. Research on big data and capital markets mainly depends on manual identification, which involves subjective judgments, causing differences to remain between keywords selected in most studies. This situation introduces a certain degree of interference to the stability of conclusions. Many highly effective algorithms for mining of textual semantics have been introduced progressively into the field of financial research. These include different classifier algorithms coupled with a voting theme (Das and Chen 2007 ), support vector machines (De Choudhury et al. 2008 ), and five-stage filtering (Bettman et al. 2010 ).

At present, studies on Internet big data are mainly focused on their forecasting effects on stock market performances. Relatively few scholars have discussed the impact of Internet big data on corporate behavior. In the era of big data, it is possible to apply Internet big data to the forecasting of not only stock market performance but also companies’ performance levels (Da et al. 2011b ). Furthermore, Shen et al. ( 2013 ) indicated that such data can be used to forecast the probability of decline in companies’ future performances. Bartov et al. ( 2015 ) found that overall sentiment indicators can predict companies’ quarterly profitability.

Simultaneously, the literature has examined the impact of Internet big data on regulatory mechanisms (Shen et al. 2013 ). In modern societies, companies’ management teams are in environments surrounded by massive volumes of information. Are a company’s decisions on capital structure, corporate governance, and other corporate behavior somehow affected by these Internet big data? These relationships await further research by scholars.

In computer terminology, ZB refers to one sextillion bytes. The term KB in our daily usage is the acronym for “kilobyte.”

World Economic Forum: Unlocking the value of personal data: From collection to usage, 2013.

Rotella, P. (2012, April 2). Is data the new oil? Forbes.

Obama administration unveils “Big data” initiative: Announces $200 million in new R&D investments. http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2_pdf , 2012.

UN Global Pulse. Big data for development: Challenges & opportunities, 2012.

http://finance.sina.com.cn/money/fund/jjpl/2015-12-31/doc-ifxncyar6085226.shtml .

The size of funds based on big data is derived from the website and the total size is calculated by the author http://fund.eastmoney.com/data/fundsearch.html?spm=search&key=%E5%A4%A7%E6%95%B0%E6%8D%AE#key%E5%A4%A7%E6%95%B0%E6%8D%AE .

http://funds.hexun.com/2016-10-31/186671597.html .

According to the statistics by China Webmaster ( http://top.chinaz.com/ ), the overall popularity ranking of Eastmoney.com’s stock message boards far exceeds that of other stock message boards.

Ackert LF, Jiang L, Lee HS et al. (2016) Influential investors in online stock forums [J]. Int Rev Financ Anal 45:39–46

Article   Google Scholar  

Alanyali M, Moat HS, Preis T (2013) Quantifying the relationship between financial news and the stock market [J]. Sci Rep 3:3578

Antweiler W, Frank MZ (2004) Is all that talk just noise? The information content of internet stock message boards [J]. J Financ 59(3):1259–1294

Barber BM, Odean T (2001) The internet and the investor [J]. J Econ Perspect 15(1):41–54

Bartov E, Faurel L, Mohanram PS (2015) Can Twitter Help Predict Firm-Level Earnings and Stock Returns? Available at SSRN 2631421

Google Scholar  

Bettman JL, Hallett AG, Sault S (2010) Exploring the impact of electronic message board takeover rumors on the US equity market. SSRN working paper

Blankespoor E, Miller GS, White HD (2013) The role of dissemination in market liquidity: Evidence from firms’ use of Twitter™ [J]. Account Rev 89(1):79–112

Bollen J, Mao H, Pepe A (2011a) Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena [J]. ICWSM 11:450–453

Bollen J, Mao H, Zeng X (2011b) Twitter mood predicts the stock market [J]. J Comput Sci 2(1):1–8

Bordino I, Battiston S, Caldarelli G et al (2012) Web search queries can predict stock market volumes [J]. PLoS One 7(7):e40014

Cheng W, Lin J (2013) Investors’ market sentiments on social media and stock market indices [J]. Manage Sci 05:111–119

Da Z, Engelberg J, Gao P (2011a) In search of attention [J]. J Financ 66(5):1461–1499

Da Z, Engelberg J, Gao P (2011b) In search of fundamentals [C]. AFA 2012 Chicago Meetings Paper

Da Z, Engelberg J, Gao P (2015) The sum of all FEARS investor sentiment and asset prices [J]. Rev Financ Stud 28(1):1–32

Das SR, Chen MY (2007) Yahoo! for Amazon: Sentiment extraction from small talk on the web [J]. Manag Sci 53(9):1375–1388

Das S, Martínez‐Jerez A, Tufano P (2005) eInformation: A clinical study of investor discussion and sentiment [J]. Financ Manag 34(3):103–137

De Choudhury M, Sundaram H, John A et al (2008) Can blog communication dynamics be correlated with stock market activity? [C]. Proceedings of the nineteenth ACM conference on Hypertext and hypermedia. ACM 2008:55–60

Delort JY, Arunasalam B, Milosavljevic M et al (2011) The Impact of Manipulation in Internet Stock Message Boards. Int J Bank Account Finance 8(4):1–18

Dong D, Xiao Z (2011) Home bias in the exchange of stock information and its impact on stock prices: evidences from stock message boards [J]. Manage World 01:52–61, +188

Drake MS, Roulstone DT, Thornock JR (2012) Investor information demand: Evidence from Google searches around earnings announcements [J]. J Account Res 50(4):1001–1040

Gloor PA, Krauss J, Nann S et al (2009) Web science 2.0: Identifying trends through semantic social network analysis [C]. Computational Science and Engineering, 2009. CSE’09. International Conference on. IEEE 4:222–215

Godbole N, Srinivasaiah M, Skiena S (2007) Large-Scale Sentiment Analysis for News and Blogs [J]. ICWSM 7(21):219–222

Gu B, Konana P, Liu A et al (2006) Identifying information in stock message boards and its implications for stock market efficiency [C]. Workshop on Information Systems and Economics, Los Angeles

Huang Y, Qiu H, Wu Z (2016) Local bias in investor attention: Evidence from China’s Internet stock message boards [J]. J Empir Financ 38:338–354

James J (2012) Data never sleeps: How much data is generated every minute [J]. Domo Blog 2012:8

Jiang C, Liang K, Ding Y, Liu S, Liu Y (2015) Forecasting stock behaviors based on social media [J]. Chin J Manag Sci 01:17–24

Jin X, Shen D, Zhang W (2016) Has microblogging changed stock market behavior? Evidence from China [J]. Physica A: Stat Mech Appl 452:151–156

Joseph K, Wintoki MB, Zhang Z (2011) Forecasting abnormal stock returns and trading volume using investor sentiment: Evidence from online search [J]. Int J Forecast 27(4):1116–1127

Kim SH, Kim D (2014) Investor sentiment from internet message postings and the predictability of stock returns [J]. J Econ Behav Organ 107:708–729

Koppel M, Shtrimberg I (2006) Good news or bad news? Let the market decide [M]. Computing attitude and affect in text: Theory and applications. Springer Netherlands., pp 297–301

Book   Google Scholar  

Leung H, Ton T (2015) The impact of internet stock message boards on cross-sectional returns of small-capitalization stocks [J]. J Bank Financ 55:37–55

Liu F, Ye Q, Li YJ (2014) Impacts of interactions between news attention and investor attention onstock returns: Empirical investigation on financial shares in China. J Manag Sci Chin 17(1):72–85

Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10‐Ks [J]. J Financ 66(1):35–65

Mao Y, Wei W, Wang B et al (2012) Correlating S&P 500 stocks with Twitter data [C]. Proceedings of the first ACM international workshop on hot topics on interdisciplinary social networks research. ACM 69:72

Moat HS, Preis T, Olivola CY et al (2014) Using big data to predict collective behavior in the real world [J]. Behav Brain Sci 37(01):92–93

Nan X (2015) Impact of divergence of opinions among online investors on IPO premiums in the new media era: a mining method based on data from stock message boards [J]. Chin Soft Sci 10:155–165

Nardo M, Petracco M, Naltsidis M (2016) Walking down Wall Street with a tablet: a survey of stock market predictions using the web [J]. J Econ Surv 30(2):356–369

Ruiz EJ, Hristidis V, Castillo C et al (2012) Correlating financial time series with micro-blogging activity [C]. Proceedings of the fifth ACM international conference on Web search and data mining. ACM 2012:513–522

Sabherwal S, Sarkar SK, Zhang Y (2011) Do internet stock message boards influence trading? Evidence from heavily discussed stocks with no fundamental news [J]. J Bus Finance Account 38(9–10):1209–1237

Shen Y, Yang J, Li P (2013) Corporate governance role of Internet public opinions: empirical evidences based on private placements [J]. Nankai Bus Rev 03:80–88

Siganos A (2013) Google attention and target price run ups [J]. Int Rev Financ Anal 29:219–226

Song S, Cao H, Yang K (2011) Investor concerns and IPO anomalies: empirical evidence from online search volumes [J]. Econ Res J S1:145–155

Sprenger TO, Sandner PG, Tumasjan A et al (2014a) News or Noise? Using Twitter to Identify and Understand Company‐specific News Flow [J]. J Bus Finance Account 41(7–8):791–830

Sprenger TO, Tumasjan A, Sandner PG et al (2014b) Tweets and trades: The information content of stock microblogs [J]. Eur Financ Manag 20(5):926–957

Sul HK, Dennis AR, Yuan LI (2016) Trading on Twitter: Using Social Media Sentiment to Predict Stock Returns [J]. Decision Sciences

Takeda F, Wakao T (2014) Google search intensity and its relationship with returns and trading volume of Japanese stocks [J]. Pac Basin Financ J 27:1–18

Tumarkin R, Whitelaw RF (2001) News or noise? Internet postings and stock prices [J]. Financ Anal J 57(3):41–51

Vlastakis N, Markellos RN (2012) Information demand and stock market volatility [J]. J Bank Financ 36(6):1808–1821

Wysocki PD (1998) Cheap talk on the web: The determinants of postings on stock message boards [J]. University of Michigan Business School Working Paper., p 98025

Xu W, Chen D (2016) Role of information disclosed by the media: empirical evidence from Sina Weibo [J]. J Financ Res 03:157–173

Yang SY, Mo SYK, Liu A (2015) Twitter financial community sentiment and its predictive relationship to stock market movement [J]. Quant Finan 15(10):1637–1656

Yu Q, Zhang B (2012) Investors’ limited attention and stock returns: an empirical study using the Baidu Index as indicator of the level of attention [J]. J Financ Res 08:152–165

Zhang Y, Swanson PE (2010) Are day traders bias free? Evidence from internet stock message boards [J]. J Econ Financ 34(1):96–112

Zhang X, Fuehres H, Gloor PA (2011a) Predicting stock market indicators through twitter “I hope it is not as bad as I fear” [J]. Procedia Soc Behav Sci 26:55–62

Zhang Y, Zhang W, Jin X, Xiong X (2011b) Does the Internet know more? Open source information and asset pricing [J]. Syst Eng Theory Pract 04:577–586

Zhang X, Fuehres H, Gloor PA (2012) Predicting asset value through twitter buzz [M]. Advances in Collective Intelligence 2011. Springer Berlin Heidelberg., pp 23–34

Zhang J, Liao W, Zhang R (2014a) Impact of ordinary investors’ level of attention on the volume and price of stock market transactions: an empirical study based on the Baidu Index [J]. Account Res 08:52–59, +97

Zhang Y, Li Y, Su Z, Zhang Z (2014b) Can online searches be used to forecast stock market performance? [J]. J Financ Res 02:193–206

Zhang Y, An Y, Feng X et al (2016) Celebrities and ordinaries in social networks: Who knows more information? [J]. Finance Research Letters

Zhao L, Lu Z, Wang Z (2013) Public’s search for stocks on Baidu: empirical research on the relationship between stock returns and search volume on Baidu [J]. J Financ Res 04:183–195

Zheng Y, Dong D, Zhu H (2015) Can online exchange of stock information weaken the herding effect? An analysis based on the Chinese stock market [J]. Manage Rev 06:58–67

Download references

Acknowledgements

This paper is funded by National Nature Sciences Foundation of China (No. 71372148).

National Nature Sciences Foundation of China (No. 71372148).

Authors’ contribution

GL gave the main idea of the review paper and gave some improvement suggestions. MY collected references and wrote the main body for the paper. Both authors read and approved the final manuscript.

Authors’ information

Minjian Ye is a PhD student at Sun Yat-Sen Business School, Sun Yat-Sen University. His research interests are corporate governance, capital structure, and big data analysis.

Guangzhong Li is a professor of finance at Sun Yat-Sen Business School, Sun Yat-Sen University. He has published several papers in Review of Finance、Journal of Corporate Finance、Journal of International Money and Finance、Journal of Business Finance and Accounting、Journal of Comparative Economics. His research interests are corporate finance, financial institution, and big data analysis.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and affiliations.

Sun Yat-Sen Business School, Sun Yat-Sen University, Guangzhou, 510275, China

Minjian Ye & Guangzhong Li

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Guangzhong Li .

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article.

Ye, M., Li, G. Internet big data and capital markets: a literature review. Financ Innov 3 , 6 (2017). https://doi.org/10.1186/s40854-017-0056-y

Download citation

Received : 06 December 2016

Accepted : 26 March 2017

Published : 20 April 2017

DOI : https://doi.org/10.1186/s40854-017-0056-y

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Internet big data
  • Financial studies
  • Capital markets

literature review on capital market

Book cover

The Impact of Globalization on International Finance and Accounting pp 385–393 Cite as

A Literature Review of Financial Performance Measures and Value Relevance

  • Nattarinee Kopecká 2  
  • Conference paper
  • First Online: 30 December 2017

1833 Accesses

3 Citations

Part of the book series: Springer Proceedings in Business and Economics ((SPBE))

Performance measurement comprises several metrics and applications used as a benchmark in business sectors for both internal and external users. For managers, it expresses whether company’s targets are reached and as a way of evaluating risks and returns for shareholders. A variety of performance measures are utilized to almost every operational process, and the area is rather vast. Therefore, the aim of the study is to find out what kinds of financial tools are better linked to market value. The result of the study shows that financial measures appear to be favorable measures for companies providing relevant and meaningful information to shareholders. Especially, return on investment (ROI) and earnings are significantly relevant to market value, while the superiority of EVA still remains unclear. Above all, companies still prefer traditional financial measures to other financial tools.

This is a preview of subscription content, log in via an institution .

Alibad S, Dorestani A, Balsara N (2013) The most value relevant accounting performance measure by industry. J Account Finance. http://www.na-businesspress.com . Accessed 16 Jul 2016

Almasan AC, Grosu C (2010) Financial measures for performance measurements in a regulated environment. Paper presented at the 5th international conference on economy and management transformation, West University of Timisoara, Romania, 24–26 Oct 2010

Google Scholar  

Barney JB (2002) Gaining and sustaining competitive advantage. Prentice Hall, Upper Saddle River

Black AP, Wright PD, Bachman JE (1998) In search of shareholder value: managing the drivers of performance. Financial Times Prentice Hall, London

Barton J, Hansen B, Pownall G (2010) Which performance measures do investor value the most and why? Account Rev 85:18–19

Article   Google Scholar  

Bhasin L (2016) Disclosure of EVA in the financial statement: experience of an asian economy. https://www.academia.edu . Accessed 22 Sept 2016

Ewoh AIE (2011) Performance measurement in an era of new public management. http://digitalcommons.kennesaw.edu . Accessed 10 Jul 2016

Francis J, Schipper K, Vincent L (2003) The relative and incremental explanatory power of earnings and alternative (to earnings) performance measures for returns. Contemp Account Res 1:121–164. https://doi.org/10.1506/XVQV-NQ4A-08EX-FC8A

Gentry RJ, Shen W (2010) The relationship between accounting and market measures of firm financial performance : how strong is it? J Manag Issues 22:514–530

Holthausen RW, Watts RL (2001) The relevance of the value-relevance literature for financial accounting standard setting. J Account Econ 31:3–75

Kaplan R, Norton D (1992) The balanced scorecard: measures that drive performance, vol 70. Harvard Business Reviews Press, Boston, pp 71–79

Kamath GB (2015) The impact of intellectual capital on financial performance and market valuation of firm in India. International Letters of Social and Humanistic Sciences. http://wwwscipress.com/ILSHS.48.107 . Accessed 18 Sept 2016

Knáplová A, Pavelková D, Chodúr M (2011) Měření a říyení výkonnosti podniku, Praha

Neely A, Mills J, Platts K, Gregory M, Richards H (1994) Realizing strategy through measurement. Int J Oper Prod Manag 14:52–140

Neely A, Gregory M, Platts K (2005) Performance measurement system design. A literature review and research agenda. Int J Oper Prod Manag 15:80–166

Patel RP, Patel M (2012) Impact of EVA on share price. International Journal of Contemporary Business Studies. A Study of Indian Private Sector Banks. https://ssm.com/abstract=2097467 . Accessed 12 Jul 2016

Pathirawasam C (2010) Value relevance of accounting information: evidence from Sri Lanka. Int J Res Commer Manag 8(1):13–20

Richard PJ, Devinney TM, Yip GS, Johnson G (2009) Measuring organizational performance: towards methodological best practice. J Manag. http://jom.sagepub.com/cgi/content/refs/35/3/718 . Accessed 20 Sept 2016

Shan YG (2014) Value relevance, earning management and corporate governance in China. http://www.business.adelaide.edu.au . Accessed 25 Jul 2016

Stewart GB (1994) EVA: fact and fantasy. J Appl Corp Financ 7(2):71–87. https://doi.org/10.1111/j.1745-622.1994.tb00406x

Sorter GH, Gans MS, Rosenfield P, Shannon RM, Streit RG (1974) Objectives of financial statements. American Institute of Certified Public Accountants, New York, pp 3–66

Venkatraman N, Ramanujam V (1986) Measurement of business performance in strategy research: a comparison of approaches. Acad Manag Rev 11:801–814

Download references

Acknowledgments

This paper has been dedicated to the research project "Analysis of strategic management ac-counting relation to company management and performance" (supported by Internal Grant Agency, No. IG 71/2017).

Author information

Authors and affiliations.

Department of Management Accounting, University of Economic, Prague, Nam. W. Churchilla 4, 130 67, Prague 3, Czech Republic

Nattarinee Kopecká

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Nattarinee Kopecká .

Editor information

Editors and affiliations.

Faculty of Finance and Accounting, University of Economics, Prague, Prague, Czech Republic

David Procházka

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper.

Kopecká, N. (2018). A Literature Review of Financial Performance Measures and Value Relevance. In: Procházka, D. (eds) The Impact of Globalization on International Finance and Accounting. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-319-68762-9_42

Download citation

DOI : https://doi.org/10.1007/978-3-319-68762-9_42

Published : 30 December 2017

Publisher Name : Springer, Cham

Print ISBN : 978-3-319-68761-2

Online ISBN : 978-3-319-68762-9

eBook Packages : Economics and Finance Economics and Finance (R0)

Share this paper

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research
  • Browse All Articles
  • Newsletter Sign-Up

CapitalMarkets →

No results found in working knowledge.

  • Were any results found in one of the other content buckets on the left?
  • Try removing some search filters.
  • Use different search filters.

This site uses cookies to optimize functionality and give you the best possible experience. If you continue to navigate this website beyond this page, cookies will be placed on your browser. To learn more about cookies, click here .

Academia.edu no longer supports Internet Explorer.

To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to  upgrade your browser .

Enter the email address you signed up with and we'll email you a reset link.

  • We're Hiring!
  • Help Center

paper cover thumbnail

INDIAN STOCK MARKET -REVIEW OF LITERATURE

Profile image of bhushan pawar

Related Papers

Bonfring International Journal

In India, the history of capital markets dates back to the 18th century when East India Company securities traded the country. The present study is largely based on the available secondary data. The statistical data regarding growth of the capital markets was available from various websites. Capital markets help to channelize surplus funds into productive use. Generally, this market trades mostly in long-term securities. The important divisions of the capital market are stock market, bond market and primary, secondary markets. Primary markets deal with the trade of new issues of stocks and other securities, whereas secondary market deals with the exchange of existing or previously-issued securities. Our finding is that during the first and second five year plans, the Government emphasized on the development of agriculture and public undertakings. The Public sector undertaking was healthier than Private undertakings, but shares were not listed in the stock exchange. More over controller of Capital Issue (CCI) closely supervised everything. A number of investors were interested to invest their savings in debentures instead of company deposits. We conclude that Capital markets were not well organized and developed during the British rule. But in the present scenario, we find that Capital markets are well developed after the introduction of SEBI. Through provision of long term loans, the capital market brings about effective functioning of various sectors of the economy. A sound and efficient capital market is one of the most instrumental factors in the economic development of a nation.

literature review on capital market

Dr Shruti Mishra

India had an internet user base of about 462 million as of Dec 2016 and buoyed by Internet penetration in rural areas, the number of web users in India will see a two-fold rise at 730 million by 2020. Despite being the second-largest user base in world, only behind China (650 million, 48% of population), the penetration of e-commerce is low compared to markets like the U.S (266 million, 84%), or France (54 M, 81%), but is growing at an unprecedented rate, adding around 6 million new entrants every month. The industry consensus is that growth is at an inflection point. In India, cash on delivery is the most preferred payment method, accumulating 75% of the e-retail activities. Demand for international consumer products (including long tail items) is growing much faster than in-country supply from authorized distributors and e-commerce offerings. This paper will give the scenario of present and future of e-commerce business taking consideration of demonetization in country.

MANISH SHAW

José G. Vargas-Hernández

BRICS are being considered as “biggest richest innovative countries association”. Their role and contribution are of paramount significance in the present globalize world. Emergence of BRICS as an economic bloc is also seen as an alternative to EU. Keeping the increasing significance of the bloc; the aim of this work is to identify strategies that enable emerging economies of the BRICS group play a bigger role globally, these structural changes have been made in their political and economic reforms, which together with the business sector has been progress scorch significant breakthrough towards international market, since society and organizations are functional, and institutional organizations for the development of a society and the economy. The role of BRICS bank may go a long way in the promotion of development in BRICS in particular and world in general.

International Journal of Recent Advances in Multidisciplinary Research

Raja Mannar Badur

Indian financial market has seen an extraordinary volatility in the last few years. Since the year 2002, Indian market has grown from a much volatile condition to growth phenomena, from a SENSEX point of 5500 in December 2003 to 13,787 in December 2006 and crossed the mark of 20,000 in the year 2007 and again in 2013. Due to various reasons, the stock market has also experienced drastic decline to even less than 8,000 points in 2008. It is not because of only the domestic market but also the international investors. There are many other variables which contribute to the positive growth of the stock market. FIIs investment is considered to be one of the biggest push after the economic fundamentals. There is no doubt that the liberalisation of the FII flows into the Indian Capital Market since 1993 has had a considerable impact on Indian stock market. The present paper is an attempt to explore the FDIs investment behaviour and its relationship with GDP, SENSEX and NIFTY movement. Further, an attempt is made to develop an understanding of the dynamics of the trading behaviour of FDIs and effect on the Indian stock market. The study is covers the period, financial year 2000-2001 to 2016-17on GDP, BSE Sensex and Nifty and FII activity. It provides the evidence of significant positive correlation between FDI activity and effects on Indian Capital Market. The analysis also finds that the movements in the Indian Capital Market are fairly explained by the FDI net inflows

Arvind Kumar

IOSR Journals

Abstract: Financial market is a place which provides a place for investment and helps in enhancing the income in terms of return. The main aim of financial market is to create cash flow in the market, so that individuals can take investment decision without any fear. Every investor would like to get required rate of return with minimum risk. To attain the objective of high return with minimum risk, various instruments, practices and strategies have been devised and developed in the recent past. After privatization and globalization financial market has entered into a new phase of global integration and liberalization. On the one hand integration of the Indian capital market with global market open the boundaries for investment to everyone, which also helps in increasing the cash flow, on the other hand there has increased in financial risk as the frequent changes in the interest rates, currency exchange rate and stock prices. To overcome from the increased financial risk a risk minimizing tool were launched by NSE during the year 2001, and that tool was Derivatives. This study helps in analyzing the facts behind launching of financial derivative by NSE India and how derivatives help in the growth of share market in India. The case will cover introduction, contextual note, various arguments and the results, remaining problems and new ingenuities regarding financial derivatives of NSE India. Key Words: Financial Market, Return, Risk, Globalization, Privatization, Derivatives

Rakesh Gupta

Hypothesis of Market Efficiency is an important concept for the investors who wish to hold internationally diversified portfolios. With increased movement of investments across international boundaries owing to the integration of world economies, the understanding of efficiency of the emerging markets is also gaining greater importance. In this paper we test the weak form efficiency in the framework of random

Prafulla Kumar Sinha

RELATED PAPERS

srilaxmi challagundla , Dr. Abdul Majeeb Pasha. Shaik

eDITOR'S nOTE tHE tEAM

Pramod Yadav

Dipankor Coondoo

Samuel Onyuma, PhD

SSRN Electronic Journal

Manoj Kumar

  •   We're Hiring!
  •   Help Center
  • Find new research papers in:
  • Health Sciences
  • Earth Sciences
  • Cognitive Science
  • Mathematics
  • Computer Science
  • Academia ©2024

IMAGES

  1. (PDF) Conceptual study of the difference between the money market and

    literature review on capital market

  2. Market Segmentation-Review of Literature

    literature review on capital market

  3. Literature review on working capital

    literature review on capital market

  4. (PDF) THE IMPORTANCE OF FINANCIAL ACCOUNTING FOR THE FUNCTIONING OF

    literature review on capital market

  5. Literature review on working capital

    literature review on capital market

  6. (PDF) REVIEW OF LITERATURE ON WORKING CAPITAL MANAGEMENT AND FUTURE

    literature review on capital market

VIDEO

  1. Economics of Capital Market, Important Questions, Fifth Semester BA Economics, Calicut University

  2. Capital market theory

  3. Perspectives on Capital Liquidity in the Banking System

  4. Invest Safely: Beware of Online Influencers while investing in Capital Markets (Malayalam 45 secs)

  5. Capital market and features of capital market (class 12 business studies)

  6. Introduction to Capital Market Participants

COMMENTS

  1. Herding behaviour in the capital market: What do we know and ...

    The literature review of 279 papers provides insight regarding the structure of knowledge on herding behaviour in the capital market. We developed a systematic literature review that resulted in four dimensions related to herding behaviour: driver, research context, covariate, and impact of herding behaviour.

  2. PDF Literature Review on Capital Market

    (2014) [2] define the capital market as a long-term capital market, which refers to securities financing, trading and fund borrowing and lending for more than one year, which can also be called the medium and long-term capital market. In the searched literature, there are many classification methods for the capital market.

  3. (PDF) Stock Markets: An Overview and A Literature Review

    A stock exchange, also called a securities exchange or. bourse is the name given to the facility for engaging in buying and selling of shares of. stock or bonds or other financial instruments. For ...

  4. Theory of Capital Markets: A Review of Literature

    Theory of Capital Markets Fall 1996 Theory of Capital Markets: A Review of Literature Worapot Ongkrutaraksa, Ph.D. Abstract The main purpose of this essay is to revisit the relevant theory and evidence regarding the informationally efficient capital markets. It explores the normative theory of perfect capital markets, the stochastic notion of ...

  5. 106942 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on CAPITAL MARKETS. Find methods information, sources, references or conduct a literature review on ...

  6. Internet big data and capital markets: a literature review

    From the literature review, Internet big data sources related to present capital markets research can be categorized into forum-type data, microblog-type data and search class data. Based on these data, researchers can build some more complicated variables to analyze traditional questions.

  7. New evidence on practical implications of the CAPM

    The Capital Asset Pricing Model (hereafter CAPM) (Black, 1972; Lintner, 1965; Sharpe, 1964) has proven to be the most commonly applied financial model for explaining stock returns.This model claims that the expected required return on an individual stock i E(r i) is equal to the risk-free rate r f (for approximation, a 1-month T-bill return) with an addition of the systematic risk coefficient ...

  8. 1125 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on CAPITAL MARKET RESEARCH. Find methods information, sources, references or conduct a literature review ...

  9. State capitalism and capital markets: Comparing securities exchanges in

    Yet, in the existing state capitalism literature, capital markets have only been systematically analysed with regard to state ownership of listed firms. Such processes of financialization 1 have often not been a focus of analysis (see the section 'Finance and state capitalism' for a literature review).

  10. Literature Review

    This chapter critically reviews the existing literature in some areas of business that impact investment appraisal decisions. These areas include capital budgeting, corporate governance, capital markets, accounting practices, and investment appraisal methods. To discuss these concepts and their interrelationships, some relevant theories are ...

  11. Internet big data and capital markets: a literature review

    This study summarizes the sources of Internet big data for research related to capital markets and the analytical methods that have been used in the literature, and proposes suggestions for future studies in which big data can be applied to examine issues related toCapital markets. BackgroundResearch in various academic disciplines has undergone tremendous changes in the era of big data.

  12. A Literature Review of Financial Performance Measures and Value

    The results of the review are the following: the study based on the model of corporate market valuation of nonfinancial indicators (human capital, knowledge, and the reputation of firms) applying among US firms from 2006 to 2009 shows that the model positively affects firms' performance, especially sales growth and net profit margin.

  13. Full article: Influencing factors that determine capital structure

    The journal articles chosen for this capital structure determinants literature review study cover a time span of 7 years that is, from 2014 to 2020. This time scope criterion was drawn from the review of literature on the determinants of capital structure conducted by Kumar et al. (Citation 2017) from the period 1972 to 2013.

  14. Capital Markets: Articles, Research, & Case Studies on Capital Markets

    Reliance on high leverage is one distinctive component of the bank business model. This study suggests that the aggregate United States banking sector was relatively inefficient between 1960 and 2015. The falling costs of new production technologies in capital markets may further advantage capital markets over banks. 08 May 2017.

  15. Capital Markets Development : Causes, Effects, and Sequencing

    Daily Updates of the Latest Projects & Documents. This note consolidates and summarizes the theoretical and empirical research produced in the past 20 years on the causes, effects, and sequencing of capital markets .

  16. Market Timing and Capital Structure: A Critical Literature Review

    This study seeks to critically review capital structure theories and issues related to the impact of market timing on capital structure. 1.3 Objectives of the research The main objective of the study is to conduct a review of existing literature to determine the impacts of market timing on capital structure of the firms.

  17. (PDF) Determinants of capital structure: A literature review

    227. DETERMINANTS OF CAPITAL STRUCTURE: A. LITERATURE REVIEW. Athenia Bongani Sibindi*. *University of South Africa, Department of F inance, Risk Management and Banking, P.O Box 392, UNI SA, South ...

  18. The Effect of Capital Structure on Profitability: An Empirical Panel

    The review of some of the major studies is presented in Table 1 so as to develop a clear understanding of available literature on the effect of capital structure on profitability. Although many research studies have been undertaken in the field of capital structure and profitability, but very few studies explain the effect of capital structure ...

  19. Environmental, Social, and Governance (ESG) disclosure: A literature review

    The increased interest in CSR among capital market participants is accompanied by the expansion of academic research related to CSR in the accounting literature In Appendix A we provide a summary of all CSR-related publications up to 2021 published in the Accounting, Organizations and Society, British Accounting Review, Contemporary Accounting Research, Journal of Accounting Economics, Journal ...

  20. PDF Impact of Derivatives on Indian Capital Market

    issues throughout the world by the researchers. The review of literature and study is presented in four sections: first, the r eview of studies fundamental to capital market of India; second, the r eview of studies elementar y to the testing about the capital market efficiency; thir d, the review of studies

  21. INDIAN STOCK MARKET -REVIEW OF LITERATURE

    The review of literature has brought to light that Enlistment of corporate securities in more than one stock exchange at the same time improves liquidity of securities and functioning of stock exchange- According to Gupta. There is existence of wild speculation in the Indian stock market-According to L.C. Gupta.

  22. (PDF) Indian Capital Market: A Review

    Review of Literature. Subir Gokarn (1996) in his research paper "Indian Capital Market Reforms, 1992-96 An Asse ssment" ... 'Indian capital market review: Issues, dimensions and performance.

  23. Small and medium-sized enterprises in emerging markets and foreign

    The rest of this study is organized as follows. First, after the introduction section, the literature review of the various CSFs is depicted in Section II. Section III reveals research methods, where MCDM, the step-wise weight assessment ratio analysis, the sampling procedure, and the Delphi method are presented in detail.

  24. (PDF) Research on the Indian Capital Market: A Review

    For the. purpose of the review, resea rch has been defined as doc toral dissertations, pap ers published in. academic journals, books (including expository, but excluding ob viously popular books ...