cluster analysis in marketing research example

Cluster Analysis for Marketers: The Ultimate Guide


When you want to analyze your marketing data, it is simply not realistic to look at each customer separately. True, it is beneficial to collect and store rich data for each customer; however, it is impossible to organize and communicate analyses that look at thousands or millions of individual customer records at the same time. Making decisions at a strategic level would be impractical.

Our brains simply cannot process information at such a granular level. At the same time, we know that we don't want to oversimplify it down to a one-size-fits-all approach. There has to be a middle ground where the customer’s voice is adequately heard, even if some segmentation of the user base is required.

In fact, there is an elegant way to approach the challenge of segmenting customers. It is called cluster analysis, and it is one of the most accessible and explainable ways to apply machine learning to marketing data.

Why cluster analysis?

Let's take a step back before diving into this technique. It’s important to understand how cluster analysis differs from other approaches. If the goal is to segment customers, why can't you do this segmentation manually?

Well, you can. In fact, if you work with web analytics tools like Google Analytics, you are probably used to manually defining traffic and user segments of interest in order to keep the analysis focused on the right places.

This approach is very common, and for good reason, but it has its limitations.

While it can be effective when working with a small number of user dimensions, it does not scale well when there are many user attributes to consider. Luckily, when the human brain reaches its limit, advanced analytics and machine learning can provide solutions.

Prepare your data first

Cluster analysis is a fascinating technique and one of the top advanced analytics methods used in marketing.

To work effectively with clustering, your organization first needs to carefully prepare its data.

You'll want to make sure your basic digital marketing reporting needs are well taken care of. Having a solid automated data and reporting pipeline in place will free up resources, reduce human errors, and, most importantly, improve data quality.

The quantity and diversity of data also play a key role, because most advanced marketing analytics techniques, such as clustering, perform significantly better with larger volumes of granular data collected from a variety of sources.

The way you handle, process, and utilize your data affects your company's position on the analytics maturity curve. The higher you climb the curve, the more advanced your analysis methods are and the more insights you get from raw data.

Improvado's analytics maturity model


Improvado can help with all of these aspects of your preparation before you dive into advanced marketing analytics, from automating your marketing reports to collecting and storing granular level data.

Use Cases for Marketing

Customer clustering is one of the best-known applications of cluster analysis. It helps marketers group together customers with similar stories. Once you become familiar with the technique, there is no shortage of other marketing-related areas where you can meaningfully apply it.

Customer use case

You can cluster customers based on the many types of characteristics available about them and their behavior. For example, clustering can be based on:

  • Customer browsing activity
  • Customer demographics
  • Recency, frequency, and monetary value of a customer
  • Items bought by a customer
  • Offline customer behavior
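To make the recency, frequency, and monetary value attributes listed above more concrete, here is a minimal sketch that derives them from raw order rows. The order tuples, the `rfm_features` helper, and the reference date are hypothetical and exist only for this illustration.

```python
from datetime import date

# Hypothetical order log: (customer_id, order_date, order_value)
orders = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2023, 11, 20), 15.0),
]

def rfm_features(orders, today):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_order, frequency, monetary = {}, {}, {}
    for customer, day, value in orders:
        # Recency is measured from the most recent order date.
        last_order[customer] = max(day, last_order.get(customer, day))
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0.0) + value
    return {
        c: ((today - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }

features = rfm_features(orders, today=date(2024, 3, 31))
```

Each customer ends up as one row of three numbers, which is the granular, numeric shape a clustering algorithm expects.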

Product use case

Another interesting use case is product clustering, which can be based on attributes of products such as:

  • When the product was purchased
  • Who purchased the product
  • In which store the product was purchased 

SEO use case

Likewise, you can apply cluster analysis to SEO keywords if you have data available about:

  • Keyword rankings
  • Difficulty score
  • Authority score   

By the way, we've built an SEO dashboard template that can help you better track your content marketing metrics, including sessions, visits, bounce rate, and more.

SEO dashboard for cluster analysis

How clustering works

The basic concept

Now that you have seen how useful clustering is in a marketing context, it's time to gain some intuition on how it works. Incidentally, if you have been wondering how a machine learning technique can work in practice for marketing, this will give you a good sense. In fact, clustering is among the most widely used unsupervised machine learning techniques.

Why unsupervised? Because there isn't any ground truth that we want the machine to learn or predict. Instead, we want the data itself to reveal its natural structure. Sound confusing? It's not. To make the concept clearer, let's look at a simple example.

A simple example

Imagine you are in charge of a T-shirt company that wants to customize the fit of T-shirts for its customers. You have sample data regarding the height and weight of your customers. This is how the data looks when plotted in two dimensions:

What the clustering algorithm does is label each customer—represented by a point in the graph—according to the optimal cluster that it can be matched to. The key is to make clusters as homogeneous as possible. 

Key definitions

How are the clusters determined? The idea is to form clusters in a way that maximizes the similarity between the points of each group. “How is similarity defined?” you might ask. It is expressed through the distance between each pair of points: the smaller the distance, the more similar the two points.

How can you measure that distance? This is where the Pythagorean theorem comes in (you might have heard of it in geometry class). If you have the x and y values of two points (in our example, the weight and height measurements of two customers), you can calculate the straight-line distance between them. This simple calculation, based on a classic theorem, is the foundation of the clustering algorithm.
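As a small illustration of that calculation, the sketch below computes the straight-line (Euclidean) distance between two customers. The function name and the sample measurements are made up for this example; the same function also works with more than two attributes.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers as (weight_kg, height_cm)
customer_a = (70.0, 175.0)
customer_b = (73.0, 179.0)

# Differences of 3 and 4 give a distance of 5 (a 3-4-5 right triangle).
distance = euclidean_distance(customer_a, customer_b)
```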

A marketing example

Hopefully by now, the information in this article has helped you to start connecting the dots.

Next, forget about heights and weights and think about some more realistic scenarios. With two variables, clustering might seem easy and intuitive, but this is no longer the case once you start adding customer attributes. Beyond three attributes, it is no longer possible to plot the data directly.

Instead of measurements like height and weight, you now have variables such as customer income, age, purchase value, and so on. You can calculate the distances in the same way as in the simple example above until you find the optimal clusters.

K-means algorithm

This last step, however, cannot happen in one go. It happens iteratively, following one of the several clustering algorithms available. The most common one is called k-means, which, as we'll see, comes with some favorable properties.
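To give a feel for what "iteratively" means here, the following is a minimal, unoptimized sketch of the basic k-means loop: assign every point to its nearest centroid, move each centroid to the mean of its members, and repeat until nothing moves. The `kmeans` function, its parameters, and the sample customers are illustrative assumptions, not the implementation used by any particular tool.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means loop; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical (height_cm, weight_kg) pairs forming two obvious groups
customers = [(160, 55), (162, 58), (158, 54), (185, 90), (188, 94), (183, 88)]
centroids, labels = kmeans(customers, k=2)
```

On this toy data the first three customers end up in one cluster and the last three in the other. Which cluster is numbered 0 or 1 depends on the random start, which is why the cluster numbers themselves carry no meaning.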

Once the algorithm determines the optimal clusters, the ball is back in your court.

The marketer's role

As a marketer, you need to use your domain knowledge, intuition, and experience to give descriptive names to the clusters produced by the algorithm and, of course, ensure that the outcomes make sense from a practical and business standpoint. You might want, for example, to experiment with adding or removing one or more of the initial attributes and then rerun the algorithm to check if it produces more meaningful clusters.

Applying the clustering technique

  • To prepare for clustering, you'll need to have granular level data for each customer, each product, etc. This technique simply doesn't work with aggregate data.
  • Ideally, if your data lives in different places, you’ll want to collect it and store it in a data warehouse such as BigQuery, Redshift, or Snowflake for easy access. Remember that Improvado is here to help you with this.
  • Before applying the technique, you'll need to make sure that the data is numeric or converted into a numeric form so that the mathematical distances can be calculated.
  • You might also need to normalize the data of the various attributes if they are expressed in different scales. One way to do this is to rescale each attribute so that its values range between zero and one, while preserving their relative order and spacing.
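The last two preparation steps above can be sketched in a few lines. This example, with hypothetical helper names and made-up customer attributes, converts a categorical column into 0/1 indicator columns and rescales numeric columns to the range zero to one (min-max scaling).

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:  # constant column: there is no spread to preserve
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def one_hot(values):
    """Turn a categorical column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

# Hypothetical customer attributes on very different scales
income = [30_000, 60_000, 90_000]
age = [20, 35, 50]
channel = ["email", "search", "email"]

scaled_income = min_max_scale(income)
scaled_age = min_max_scale(age)
encoded_channel = one_hot(channel)  # columns in sorted order: email, search

# One numeric row per customer, ready for distance calculations
rows = [
    [inc, a, *ch]
    for inc, a, ch in zip(scaled_income, scaled_age, encoded_channel)
]
```

Without the rescaling, income would dominate every distance simply because its raw numbers are thousands of times larger than ages.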

Once the data is ready from a preprocessing point of view, there are a few options as to how to apply the algorithm:

  • If you have data scientists on your team, they can use open source tools such as the programming languages R or Python for such tasks.
  • SaaS and other analytics tools like Tableau have integrated functionality that lets users perform clustering in a drag-and-drop fashion.
  • With the right add-on packages, it is also possible to carry out clustering in Excel.
  • These days, another very convenient way to do this is via BigQuery, especially if you are familiar with SQL syntax. Clustering can be implemented in a few lines of SQL code, with the option to immediately visualize the results.

Cluster analysis in practice

The image below shows what the outcome of a cluster analysis might look like in practice. This particular example is from Tableau, which provides a built-in function for clustering. A large number of products have been grouped into three distinct clusters, based on their sales value and profit ratio.

The clustering algorithm could have included many more variables. But even with just these two, the result of the analysis can be really informative. For instance, if you are in charge of marketing and product strategy, you now have a data-driven way to prioritize the products based on which “performance” cluster they belong to. Notice also the presence of some outliers that might require your special attention!

Clustering, despite its merits, is not the perfect solution for all segmentation use cases. Here are some pros and cons of clustering to keep in mind:

Pros:

  • It is a very interpretable technique and is easy to visualize.
  • It is efficient to implement and can scale to large data sets with millions of records.
  • It is dynamic. The definitions of clusters evolve as data changes.
  • It can be used as a data exploration technique to better understand data before making decisions.

Cons:

  • The result is not deterministic. With k-means, the starting points are typically chosen at random, so different executions of the algorithm might return different results.
  • With k-means clustering, the marketer must predefine the number of clusters, which is not always an easy, straightforward decision.
  • Some preprocessing of the data is needed before applying the technique, as discussed in the section on applying the clustering technique.

Great, now that all the steps have been followed and some interesting clusters have been produced, what’s next?

Well, there are many options depending on the exact use case.

For clustering of customers and prospects, you can use the clusters to:

  • customize your re-targeting and re-marketing strategies
  • better adjust promotional and other types of marketing messages
  • customize the product for the various personas to better fit their needs
  • personalize the website design and UI

When clustering is used on the product level, it is possible to better capture cross- and up-selling opportunities between the different product clusters.

Clustering is a great fit for marketing. It reveals the natural structure in marketing data. It is a great tool for data exploration, and it is relatively easy to explain and visualize. It is also one of the most accessible machine learning techniques for marketing. It is very effective in clustering customers, products, keywords, ad groups, you name it!


Cluster Analysis Guide with Examples

Explore the power of cluster analysis with our comprehensive guide. Learn the definition, types, and examples of this statistical method to gain insights into complex relationships in your data.

Table of contents

  • What is Cluster Analysis?
  • Types of Cluster Analysis with Examples
  • Benefits of cluster analysis
  • Drawbacks of using cluster analysis
  • How to use Cluster Analysis?

As a widely used statistical method, cluster analysis helps to identify groups of similar objects within a dataset, making it a valuable tool in fields such as market research, biology, and psychology. In this guide, we cover the definition of cluster analysis, explore its different types, and provide practical examples of its applications. By the end of this guide, you will have a thorough understanding of cluster analysis and its benefits, enabling you to make informed decisions when it comes to analyzing your own data.

What is a Cluster Analysis?

Cluster analysis is a statistical method used to group items into clusters based on how closely associated they are. It is an exploratory analysis that identifies structures within data sets and tries to identify homogeneous groups of cases. Cluster analysis can handle binary, nominal, ordinal, and scale data, and it is often used in conjunction with other analyses such as discriminant analysis. The purpose of cluster analysis is to find similar groups of subjects based on a global measure over the whole set of characteristics.

Cluster analysis has many real-world applications, such as in unsupervised machine learning, data mining, statistics, graph analytics, image processing, and numerous physical and social science applications. In marketing, cluster analysis is used to segment customers into groups based on their purchasing behavior or preferences. In healthcare, it is used to identify patient subgroups with similar characteristics or treatment outcomes. In investor trading, cluster analysis is used to develop a diversified portfolio by grouping stocks that exhibit high correlations in returns into one basket, those slightly less correlated in another, and so on.

Types of Cluster Analysis with Examples

  • Hierarchical clustering
  • K-means clustering
  • DBSCAN (density-based spatial clustering of applications with noise)
  • Fuzzy c-means clustering
  • Mean shift clustering
  • Affinity propagation
  • Spectral clustering

Benefits of cluster analysis

  • Identify Hidden Patterns and Trends – One of the primary benefits of cluster analysis is its ability to reveal hidden patterns and trends within datasets. By grouping similar data points together, cluster analysis can help users uncover relationships and structures that may not be immediately apparent. This can lead to valuable insights, driving innovation and improving decision-making processes.
  • Enhance Decision-Making – Cluster analysis enables organizations to make more informed decisions by providing a clear understanding of the relationships and patterns within their data. By identifying clusters, decision-makers can better target resources, develop tailored marketing strategies, and optimize product offerings to meet the needs of different customer segments or market niches.
  • Improve Data Organization and Visualization – Cluster analysis can help simplify complex datasets by organizing data points into meaningful groups. This organization makes it easier to visualize and analyze large amounts of data, enabling users to quickly identify trends, outliers, and potential areas of interest. Additionally, clustering can be used to create more effective data visualizations, such as heatmaps or dendrograms, which can enhance communication and understanding of data-driven insights.
  • Enhance Customer Segmentation – By applying cluster analysis to customer data, businesses can segment their customer base into distinct groups based on various attributes, such as demographics, purchasing behavior, and product preferences. This segmentation enables companies to tailor their marketing strategies and product offerings to better meet the needs of specific customer segments, ultimately leading to increased customer satisfaction and loyalty.
  • Streamline Anomaly Detection – Cluster analysis can be used to identify outliers or anomalies in datasets, which can be crucial for detecting fraud, network intrusions, or equipment failures. By grouping data points based on their similarities, cluster analysis can effectively separate normal data from anomalous events, allowing organizations to quickly identify and address potential issues.
  • Optimize Resource Allocation – In industries such as logistics, manufacturing, or urban planning, cluster analysis can help optimize resource allocation by identifying patterns in spatial or temporal data. For instance, by clustering delivery addresses or manufacturing facilities based on their geographic proximity, organizations can reduce transportation costs and improve overall efficiency.
  • Facilitate Machine Learning and Predictive Analytics – Cluster analysis plays a critical role in machine learning and predictive analytics by serving as a preprocessing step for other techniques. For instance, clustering can be used to reduce the dimensionality of data before applying classification or regression algorithms, improving the performance and accuracy of predictive models. Additionally, cluster analysis can help identify subgroups within datasets, which can be used to develop more targeted machine learning models or generate more nuanced predictions.

Drawbacks of using cluster analysis

  • Choice of Distance Metric and Clustering Algorithm – The effectiveness of cluster analysis depends on the choice of distance metric and clustering algorithm. Different distance metrics, such as Euclidean, Manhattan, or cosine distance, can produce varying results. Choosing the most appropriate metric for your dataset is crucial, as an unsuitable metric may lead to poor clustering results or misinterpretation of the data.
  • Sensitivity to Initial Conditions and Outliers – Some clustering algorithms, such as K-means, are sensitive to initial conditions, meaning that different initializations can lead to different clustering results. This sensitivity can result in inconsistent outcomes, making it challenging to determine the optimal solution. Outliers can also significantly impact the performance of clustering algorithms. In some cases, the presence of outliers may cause clusters to become skewed or distorted, leading to inaccurate or misleading results. Robust algorithms that can handle outliers, such as DBSCAN, may be more suitable for such situations.
  • Determining the Optimal Number of Clusters – Deciding on the optimal number of clusters is often a challenging task. In some algorithms, such as K-means, the number of clusters must be predefined, which can be problematic if the true number of clusters is unknown. Users must rely on heuristics or validation measures, such as the silhouette score or elbow method, to estimate the best number of clusters. These methods, however, may not always provide a definitive answer and are subject to interpretation.
  • Scalability and Computational Complexity – Cluster analysis can become computationally expensive and time-consuming, particularly for large datasets. Some algorithms, such as hierarchical clustering, have high computational complexity, making them unsuitable for handling large amounts of data. In such cases, users may need to consider more efficient algorithms or implement techniques such as dimensionality reduction or data sampling to improve performance.
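To illustrate why the choice of distance metric matters, the sketch below compares the three metrics named above on one pair of points. The function names and the two spend vectors are hypothetical and were chosen so that the metrics disagree.

```python
import math

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each attribute."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus cosine similarity: compares direction and ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical spend vectors: (groceries, electronics)
light_buyer = (3.0, 4.0)
heavy_buyer = (30.0, 40.0)  # same purchase mix, ten times the volume

d_euclidean = euclidean(light_buyer, heavy_buyer)
d_manhattan = manhattan(light_buyer, heavy_buyer)
d_cosine = cosine_distance(light_buyer, heavy_buyer)
```

Euclidean and Manhattan distance treat these two customers as far apart because of the difference in volume, while cosine distance treats them as essentially identical because their purchase mix is the same. Which view is right depends on what the segmentation is meant to capture.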

How to use Cluster Analysis

Analyzing clustering data is a crucial step in uncovering hidden patterns and structures within your dataset. By following a systematic approach, you can effectively identify meaningful groups and gain valuable insights from your data.

  • Preparing Your Data – Before diving into cluster analysis, it’s essential to prepare your data by cleaning and preprocessing it. This process may involve removing outliers, handling missing values, and scaling or normalizing features. Proper data preparation ensures that your clustering analysis produces accurate and meaningful results.
  • Choosing the Right Clustering Algorithm – There are various clustering algorithms available, each with its strengths and weaknesses. Consider the sample size, distribution, and shape of your dataset when selecting the most appropriate algorithm. Remember that no single algorithm is universally applicable, so it’s essential to choose the one that best suits your specific data characteristics.
  • Determining the Optimal Number of Clusters – Several heuristics can help you estimate a suitable number of clusters:
      • Elbow Method: Plot the within-cluster sum of squares (or the variance explained) against the number of clusters. Look for the “elbow” point, where adding more clusters results in only marginal improvements.
      • Silhouette Score: Calculate the silhouette score for different numbers of clusters and choose the one with the highest score.
      • Gap Statistic: Compare the within-cluster dispersion to that expected under a reference distribution and choose the number of clusters where the gap is the largest.
  • Applying the Clustering Algorithm – Once you have chosen the appropriate algorithm and determined the optimal number of clusters, apply the algorithm to your dataset. Most programming languages and data analysis tools, such as Python, R, or Excel, offer functions, libraries, or add-ins for performing cluster analysis. Be sure to fine-tune any algorithm-specific parameters to ensure the best results.
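The elbow method mentioned above can be sketched as follows: run k-means for several values of k and record the within-cluster sum of squares (WCSS) each time. This is a simplified illustration, not a production implementation. To keep the output reproducible, it starts from a deterministic farthest-point initialization instead of the usual random start, and all function names and data points are made up for the example.

```python
import math

def farthest_point_init(points, k):
    """Deterministic start: the first point, then repeatedly the point
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans_wcss(points, k, iterations=20):
    """Run a basic k-means and return the within-cluster sum of squares."""
    centroids = farthest_point_init(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return sum(
        math.dist(p, centroids[j]) ** 2
        for j, c in enumerate(clusters)
        for p in c
    )

# Hypothetical data: three tight 2x2 squares of points, far apart
points = [
    (x + dx, y + dy)
    for x, y in [(0, 0), (10, 0), (0, 10)]
    for dx in (0, 1)
    for dy in (0, 1)
]
wcss = {k: kmeans_wcss(points, k) for k in range(1, 5)}
```

On this toy data, WCSS drops sharply up to k = 3 and barely improves at k = 4, so the "elbow" sits at three clusters. Real data rarely produces such a clean bend, which is why the choice remains a judgment call.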

In conclusion, cluster analysis is a powerful data mining technique that uncovers hidden patterns and structures within large datasets by grouping similar data points together. This guide has explored the fundamental concepts and techniques of cluster analysis, providing a strong foundation for leveraging this valuable tool in research and organizations.

One essential takeaway is the significance of understanding various clustering algorithms, each with its unique strengths and weaknesses. Selecting the most suitable algorithm for your dataset, such as K-means, hierarchical clustering, DBSCAN, or spectral clustering, is critical for obtaining accurate and meaningful results. Additionally, determining the optimal number of clusters, preparing data, and evaluating clustering results are crucial steps in the process.

FAQ on Cluster Analysis

What is cluster analysis, and why is it important?

Cluster analysis is a data mining technique that groups similar data points together based on their attributes, uncovering hidden patterns and structures within large datasets. It is important because it enables researchers and organizations to gain valuable insights, make informed decisions, and drive innovation.

How do I choose the right clustering algorithm for my data?

Selecting the right clustering algorithm depends on factors such as dataset size, distribution, and shape. Some popular algorithms include K-means, hierarchical clustering, DBSCAN, and spectral clustering. It's essential to understand the strengths and weaknesses of each algorithm and choose the one that best suits your data's unique characteristics.

How can I determine the optimal number of clusters for my dataset?

Several methods can help guide your decision, such as the Elbow Method, Silhouette Score, and Gap Statistic. Each method aims to identify the number of clusters that maximizes within-cluster cohesion and between-cluster separation, leading to meaningful and interpretable results.
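As a rough illustration of how the silhouette score rewards cohesion and separation, here is a minimal pure-Python sketch. For each point it compares `a`, the mean distance to the other members of its own cluster, with `b`, the mean distance to the members of the nearest other cluster, and averages `(b - a) / max(a, b)` over all points. The function name and the sample data are assumptions made for this example, and the labeling must contain at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs 2 or more clusters)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical one-attribute data with two obvious groups
points = [(0,), (1,), (10,), (11,)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the structure
bad = silhouette_score(points, [0, 1, 0, 1])   # groups cut across it
```

A labeling that follows the natural groups scores close to 1, while one that cuts across them scores below 0.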

How do I analyze cluster data?

Evaluating clustering results can be done through visual inspection, by plotting data points and color-coding them based on cluster assignments, or using metrics such as Silhouette Score and Adjusted Rand Index (ARI) to measure clustering performance. Visualizations like scatter plots, heatmaps, or dendrograms can also provide insights into the relationships between data points and overall data structure.
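The Adjusted Rand Index compares two labelings of the same items, for example a clustering result against known segments, or two runs of the same algorithm. The sketch below is a minimal implementation based on pair counting. The function name and sample labels are illustrative, and it assumes at least two items.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two labelings, corrected for chance (1.0 = identical grouping)."""
    n = len(labels_a)
    # Pairs of items that share a cluster in both labelings
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items that share a cluster in each labeling separately
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

reference = [0, 0, 0, 1, 1, 1]
renamed = ["x", "x", "x", "y", "y", "y"]  # same grouping, different names
shuffled = [0, 0, 1, 1, 0, 1]             # grouping only partly agrees

ari_same = adjusted_rand_index(reference, renamed)
ari_mixed = adjusted_rand_index(reference, shuffled)
```

Because the index only looks at which items are grouped together, renaming the clusters does not change the score, and a grouping that agrees no better than chance scores around zero or slightly below.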


Handbook of Market Research, pp. 221–249

Cluster Analysis in Marketing Research

  • Thomas Reutterer &
  • Daniel Dan
  • Reference work entry
  • First Online: 03 December 2021

Cluster analysis is an exploratory tool for compressing data into a smaller number of groups or representing points. The latter aims at sufficiently summarizing the underlying data structure and as such can serve the analyst for further consideration instead of dealing with the complete data set. Because of this data compression property, cluster analysis remains an essential part of the marketing analyst’s toolbox in today’s data-rich business environment. This chapter gives an overview of the various approaches and methods for cluster analysis and links them with the most relevant marketing research contexts. We also provide pointers to the specific packages and functions for performing cluster analysis using the R ecosystem for statistical computing. A substantial part of this chapter is devoted to the illustration of applying different clustering procedures to a reference data set of shopping basket data. We briefly outline the general approach of the considered techniques, provide a walk-through for the corresponding R code required to perform the analyses, and offer some interpretation of the results.

This is a preview of subscription content, log in via an institution .

Buying options

  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
  • Available as EPUB and PDF
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Adams, R. A., & Fournier, J. J. (2003). Sobolev spaces (Pure and applied mathematics) (Vol. 140). Amsterdam: Elsevier.

Google Scholar  

Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis . Beverly Hills: Sage.

Book   Google Scholar  

Anderberg, M. R. (1973). Cluster analysis for applications . New York: Academic.

Arabie, P., & Lawrence, J. H. (1994). Cluster analysis in marketing research. In R. P. Bagozzi (Ed.), Advanced methods of marketing research (pp. 160–189). Cambridge, MA: Blackwell.

Arabie, P., & Lawrence, J. H. (1996). An overview of combinatorial data analysis. Clustering and classification (pp. 5–63). Singapore: World Scientific.

Arabie, P., Carroll, J. D., DeSarbo, W., & Wind, J. (1981). Overlapping clustering: A new method for product positioning. Journal of Marketing Research, 28 (3), 310–317.

Article   Google Scholar  

Bock, H. H. (1974). Automatische Klassifikation . Göttingen: Vandenhoeck & Ruprecht.

Boztuğ, Y., & Reutterer, T. (2008). A combined approach for segment-specific market basket analysis. European Journal of Operational Research, 187 (1), 294–312.

Breugelmans, E., Boztuğ, Y., & Reutterer, T. (2010). A multistep approach to derive targeted category promotions. Working paper series of the Marketing Science Institute, MSI report no. 10-118, Cambridge, MA.

Büschken, J., & Allenby, G. M. (2016). Sentence-based text analysis for customer reviews. Marketing Science, 35 (6), 953–975.

Cattell, R. B. (1943). The description of personality: Basic traits resolved into clusters. Journal of Abnormal and Social Psychology, 38 (4), 476–506.

Chapman, C., & McDonnell Feit, E. (2019). Segmentation: Clustering and classification. R for marketing research and analytics (pp. 299–338). New York: Springer.

Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1 (2), 224–227.

Decker, R. (2005). Market basket analysis by means of a growing neural network. The International Review of Retail, Distribution and Consumer Research, 15 (2), 151–169.

DeSarbo, W. S., Ajay, K. M., & Lalita, A. M. (1993). Non-spatial tree models for the assessment of competitive market structure: An integrated review of the marketing and psychometric literature. In J. Eliashberg & G. L. Lilien (Eds.), Handbooks in operations research and management science (Vol. 5, pp. 193–257). Amsterdam: Elsevier.

Dimitriadou, E., Dolničar, S., & Weingessel, A. (2002). An examination of indexes for determining the number of clusters in binary data sets. Psychometrika, 67 (1), 137–159.

Dolnicar, S., & Leisch, F. (2003). Winter tourist segments in Austria: Identifying stable vacation styles using bagged clustering techniques. Journal of Travel Research, 41 (3), 281–292.

Dolnicar, S., Grün, B., Leisch, F., & Schmidt, K. (2014). Required sample sizes for data-driven market segmentation analyses in tourism. Journal of Travel Research, 53 (3), 296–306.

Dolnicar, S., Grün, B., & Leisch, F. (2018). Market segmentation analysis. Understanding it, doing it, and making it useful . Singapore: Springer.

Dréze, X., & Hoch, S. J. (1998). Exploiting the installed base using cross-merchandising and category destination programs. International Journal of Research in Marketing, 15 (5), 459–471.

Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis: Wiley series in probability and statistics . New York: Wiley.

Farris, J. S. (1969). On the cophenetic correlation coefficient. Systematic Zoology, 18 (3), 279–285.

Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97 (458), 611–631.

Fraley, C., & Raftery, A. E. (2003). Enhanced model-based clustering, density estimation, and discriminant analysis software: MCLUST. Journal of Classification, 20 (2), 263–286.

Frühwirth-Schnatter, S. (2006). Finite mixture and Markov switching models . New York: Springer Science & Business Media.

Ghesmoune, M., Lebbah, M., & Azzag, H. (2016). State-of-the-art on clustering data streams. Big Data Analytics, 1 (13), 1–27.

Grover, R., & Srinivasan, V. (1987). A simultaneous approach to market segmentation and market structuring. Journal of Marketing Research, 24 , 139–153.

Hartigan, J. A. (1975). Clustering algorithms . New York: Wiley.

Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society: Series C: Applied Statistics, 28 (1), 100–108.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). Unsupervised learning. In The elements of statistical learning (pp. 485–585). New York: Springer.

Chapter   Google Scholar  

Hennig, C., Meila, M., Murtagh, F., & Rocci, R. (2015). Handbook of cluster analysis . Boca Raton/London/New York: CRC Press.

Hornik, K. (2004). Cluster ensembles. In C. Weihs, W. Gaul (Eds.), Classification – The ubiquitous challenge. Proceedings of the 28th annual conference of the Gesellschaft für Klassifikation E.V (pp. 65–72). Heidelberg: University of Dortmund/Springer.

Hornik, K. (2005). A clue for cluster ensembles. Journal of Statistical Software, 14 (12), 1–25.

Hruschka, H. (1986). Market definition and segmentation using fuzzy clustering methods. International Journal of Research in Marketing, 3 (2), 117–134.

Hruschka, H., & Natter, M. (1986). Comparing performance of feedforward neural nets and K-means for cluster-based market segmentation. European Journal of Operational Research, 114 (2), 346–353.

Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data . Upper Saddle River: Prentice-Hall.

Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis . Hoboken: Wiley.

Leisch, F. (2006). A toolbox for k-centroids cluster analysis. Computational Statistics & Data Analysis, 51 (2), 526–544.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1 (14), 281–297.

Manchanda, P., Ansari, A., & Gupta, S. (1999). The “shopping basket”: A model for multicategory purchase incidence decisions. Marketing Science, 18 (2), 95–114.

Mazanec, J. A. (1999). Simultaneous positioning and segmentation analysis with topologically ordered feature maps: A tour operator example. Journal of Retailing and Customer Services, 6 (4), 219–235.

Mazanec, J. A., & Strasser, H. (2000). A nonparametric approach to perceptions-based market segmentation: Foundations (Vol. 1). Wien: Springer.

McLachlan, G. J., & Basford, K. E. (1988). Mixture models: Inference and applications to clustering . New York: Marcel Dekker.

Mild, A., & Reutterer, T. (2003). An improved collaborative filtering approach for predicting cross-category purchases based on binary market basket data. Journal of Retailing and Consumer Services, 10 (3), 123–133.

Milligan, G. W., & Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50 (2), 159–179.

Mooi, E., Sarstedt, M., & Mooi-Reci, I. (2018). Data. In Market research (pp. 27–50). Singapore: Springer.

Netzer, O., Feldman, R., Goldenberg, J., & Fresko, M. (2012). Mine your own business: Market-structure surveillance through text mining. Marketing Science, 31 (3), 521–543.

Ng, R. T., & Han, J. (2002). CLARANS: A method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 14 (5), 1003–1016.

Punj, G., & Stewart, D. W. (1983). Cluster analysis in marketing research: Review and suggestions for application. Journal of Marketing Research, 20 (2), 134–148.

R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing . Vienna: R Development Core Team.

Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66 , 846850.

Rao, V. R., & Sabavala, D. J. (1981). Inference of hierarchical choice processes from panel data. Journal of Consumer Research, 8 (1), 85–96.

Reutterer, T. (1998). Competitive market structure and segmentation analysis with self-organizing feature maps. In P. Anderson (Ed.), Proceedings of the 27th EMAC conference. Track 5: Marketing research (pp. 85–105). Stockholm: EMAC.

Reutterer, T. (2003). Bestandsaufnahme und aktuelle Entwicklungen bei der Segmentierungsanalyse von Produktmarkten. Journal für Betriebswirtschaft, 53 (2), 52–74.

Reutterer, T., & Natter, M. (2000). Segmentation-based competitive analysis with MULTICLUS and topology representing networks. Computers & Operations Research, 27 (11–12), 1227–1247.

Reutterer, T., Mild, A., Natter, M., & Taudes, A. (2006). A dynamic segmentation approach for targeting and customizing direct marketing campaigns. Journal of Interactive Marketing, 20 (3–4), 43–57.

Reutterer, T., Hahsler, M., & Hornik, K. (2007). Data mining und marketing am beispiel der explorativen warenkorbanalyse. Marketing ZFP, 29 (3), 163–180.

Reutterer, T., Hornik, K., March, N., & Gruber, K. (2017). A data mining framework for targeted category promotions. Journal of Business Economics, 87 (3), 337–358.

Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20 , 53–65.

Russell, G. J., & Petersen, A. (2000). Analysis of cross category dependence in market basket selection. Journal of Retailing, 76 (3), 367–392.

Russell, G. J., Ratneshwar, S., Schocker, A. D., Bell, D., Bodapat, A., Degeratu, A., Hildebrandt, L., Kim, N., Ramaswami, S., & Shankar, V. H. (1999). Multiple-category decision-making: Review and synthesis. Marketing Letters, 10 (3), 319–332.

Saraçli, S., Doğan, N., & Doğan, I. (2013). Comparison of hierarchical cluster analysis methods by cophenetic correlation. Journal of Inequalities and Applications, 2013 (1), 203.

Sneath, P. H. (1957). Some thoughts on bacterial classification. Journal of General Microbiology, 17 , 184–200.

Sokal, R. R., & Sneath, P. H. A. (1963). Principles of numerical taxonomy (A series of books in biology). San Francisco: W.H. Freeman.

Späth, H. (1977). Cluster-analyse – Algorithmen zur Objektklassifizierung und Datenreduktion (2nd ed.). München/Wien: Oldenbourg Wissenschaftsverlag.

Srivastava, R. K., Leone, R. P., & Shocker, A. D. (1981). Market structure analysis: Hierarchical clustering of products based on substitution-in-use. Journal of Marketing, 45 (3), 38–48.

Srivastava, R. K., Alpert, M. I., & Shocker, A. D. (1984). A customer-oriented approach for determining market structures. Journal of Marketing, 48 (2), 32–45.

Strasser, H. (2000). Reduction of complexity. In J. Mazanec & H. Strasser (Eds.), A nonparametric approach to perceptions-based market segmentation: Foundations (pp. 99–140). Wien/New York: Springer.

Strehl, A., & Ghosh, J. (2003). Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15 (2), 208–230.

Struyf, A., Hubert, M., & Rousseeuw, P. (1996). Clustering in an object-oriented environment. Journal of Statistical Software, 1 (4), 1.

Tirunillai, S., & Tellis, G. J. (2014). Mining marketing meaning from online chatter: Strategic brand analysis of big data using Latent Dirichlet allocation. Journal of Marketing Research, 51 (4), 463–479.

Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions . Chichester: Wiley.

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58 , 236–244.

Wedel, M., & Kamakura, W. A. (2000). Market segmentation – Conceptual and methodological foundations . New York: Springer.

Download references

Author information

Authors and affiliations.

Department of Marketing, WU Vienna University of Economics and Business, Vienna, Austria

Thomas Reutterer

Department of New Media, Modul University Vienna, Vienna, Austria


Corresponding author

Correspondence to Thomas Reutterer .

Editor information

Editors and affiliations.

Department of Business-to-Business Marketing, Sales, and Pricing, University of Mannheim, Mannheim, Germany

Christian Homburg

Department of Marketing & Sales Research Group, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Martin Klarmann

Marketing & Sales Department, University of Mannheim, Mannheim, Germany

Arnd Vomberg


Copyright information

© 2022 Springer Nature Switzerland AG

About this entry

Cite this entry.

Reutterer, T., Dan, D. (2022). Cluster Analysis in Marketing Research. In: Homburg, C., Klarmann, M., Vomberg, A. (eds) Handbook of Market Research. Springer, Cham. https://doi.org/10.1007/978-3-319-57413-4_11


DOI : https://doi.org/10.1007/978-3-319-57413-4_11

Published : 03 December 2021

Publisher Name : Springer, Cham

Print ISBN : 978-3-319-57411-0

Online ISBN : 978-3-319-57413-4

eBook Packages : Business and Management Reference Module Humanities and Social Sciences Reference Module Business, Economics and Social Sciences


Cluster Analysis: What it is & How to Use It

Bucket research data into groups to make statistical inferences with cluster analysis. Learn how to use the method with its different types.

Brands and organizations rely on data to draw inferences and conclusions about what their customers think. Cluster analysis is a core data analysis technique in market research: it helps brands identify trends and find groups of customers who are similar in demographics, purchase behavior, likes and dislikes, and more.

In the market research process, this method sorts information into smaller groups, which helps show how different groups of individuals behave under similar circumstances. Organizations and researchers may define clusters differently, depending on their pre-defined criteria for what counts as a meaningful cluster, but the underlying idea of the analysis is the same.

Content Index

  • What is cluster analysis?
  • Hierarchical clustering
  • Centroid-based clustering
  • Distribution-based clustering
  • Density-based clustering
  • Cluster analysis examples
  • Cluster Analysis with QuestionPro

What is cluster analysis?

Cluster analysis is a statistical method that groups a set of objects into distinct clusters, so that objects in the same cluster are more similar to one another than to objects in other clusters. As a form of exploratory data analysis, it helps brands, organizations, and researchers spot trends in their data and check hypotheses and explicit assumptions.

It is a common technique in statistical data analysis and is used in many fields, including pattern recognition, machine learning, insights management in market research, data scrubbing, and bioinformatics.

The objective of cluster analysis is to find groups whose members share underlying characteristics while the groups themselves behave differently from one another. For example, banks can use qualitative and quantitative data to plot trends in claims processing among clients. Cluster analysis helps them identify potentially fraudulent claims and better understand consumer behavior.


Cluster Analysis Methods 

Cluster analysis helps researchers and statisticians make better sense of data and make better decisions. The data can come from qualitative or quantitative research; in either case, the analysis is typically run in a research platform and the results are plotted on a graph. Several cluster analysis methods exist to suit different research needs.

Note that, unless there is a mathematical reason to prefer a specific method, the clustering method has to be chosen experimentally. Let us look at the most commonly used cluster analysis methods.


Hierarchical clustering or connectivity-based clustering analysis  

Hierarchical clustering, also called connectivity-based clustering, is one of the most commonly used methods in cluster analysis. In its agglomerative (bottom-up) form, objects that are similar to one another are first grouped into small clusters.

These clusters are then merged, step by step, with the clusters most similar to them to form larger clusters. The central premise of this method is that objects that are closer together are more related than objects that are farther apart.

The other form of hierarchical clustering is the divisive (top-down) method, where you start with the full data set and split it into progressively smaller clusters of similar objects. In both forms, a linkage criterion defines how the distance between two clusters is measured. It is important to note that this method does not produce a single partitioning of the data: it produces a hierarchy, and the analyst chooses the level at which to cut it into clusters.
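The agglomerative variant can be sketched in a few lines. This is a minimal illustration, assuming Python with NumPy and SciPy is available; the customer values are invented, and Ward linkage is only one of several possible linkage criteria.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two features per customer
# (for example spend and visits), already on comparable scales.
customers = np.array([
    [1.0, 1.1],
    [1.2, 0.9],
    [0.9, 1.0],
    [5.0, 5.2],
    [5.1, 4.9],
    [4.8, 5.0],
])

# Agglomerative clustering: start with one cluster per customer and
# repeatedly merge the two closest clusters. "ward" is one linkage
# criterion; "single", "complete" and "average" are alternatives.
merge_tree = linkage(customers, method="ward")

# The result is a full hierarchy, not a single partition. Cutting the
# tree so that at most two clusters remain gives one flat solution.
labels = fcluster(merge_tree, t=2, criterion="maxclust")
print(labels)
```

Cutting the same tree at a different level (for example `t=3`) gives a finer partition without re-running the clustering, which is the practical benefit of keeping the whole hierarchy.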

Centroid-based clustering

In this clustering method, each cluster is represented by a single central vector, the centroid. The k-means algorithm places k centroids in the data space and assigns each object to its nearest centroid, so that the distance between the objects and the centroid of their cluster is minimized.

A drawback of this technique is that the number of clusters, k, has to be defined at the outset, which constrains the analysis and how the data can be represented.
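To make the assignment and update steps concrete, here is a small from-scratch sketch of k-means (Lloyd's algorithm) in NumPy. The function name `k_means`, its parameters, and the toy data are assumptions made for illustration; in practice a library implementation would normally be used.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct, randomly chosen data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids no longer move: converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data with two visibly separate groups.
customers = np.array([
    [1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
    [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],
])
labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

Note that `k` is an input here: the algorithm never decides on its own how many clusters the data contain, which is the drawback described above.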

Distribution-based clustering

The distribution-based clustering method groups objects that most likely belong to the same statistical distribution. It is closely tied to statistical modeling: each cluster is treated as a random sample of objects drawn from one distribution.

This model works well when you need to capture correlations between attributes. Its drawback is that the form of the distributions has to be assumed in advance, so the clustering can be biased when the objects do not actually follow the assumed distributions.
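One common way to do distribution-based clustering is a Gaussian mixture model, which is an assumption here rather than something the text prescribes. The sketch below uses scikit-learn's `GaussianMixture` on simulated order values; the two "spender" groups and their parameters are invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Hypothetical one-dimensional data: order values drawn from two
# different normal distributions (low and high spenders).
low_spenders = rng.normal(loc=20.0, scale=3.0, size=200)
high_spenders = rng.normal(loc=60.0, scale=5.0, size=200)
order_values = np.concatenate([low_spenders, high_spenders]).reshape(-1, 1)

# Fit a mixture of two Gaussians. The number of components and the
# Gaussian shape are modelling assumptions made up front.
gmm = GaussianMixture(n_components=2, random_state=0).fit(order_values)

labels = gmm.predict(order_values)            # hard cluster assignment
membership = gmm.predict_proba(order_values)  # soft membership per cluster

print(np.sort(gmm.means_.ravel()))  # estimated cluster means
```

Unlike k-means, the model returns a membership probability for every object and cluster, which is useful for customers who sit between two segments.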

Density-based clustering

The density-based clustering method is the fourth commonly used cluster analysis technique. Here, clusters are defined as areas of higher density than the rest of the data set. Objects in the sparse areas that separate clusters are usually treated as noise or border points.

DBSCAN is the most commonly used density-based clustering method. A drawback is that it needs a drop in density to detect the border between two clusters, which does not suit data whose clusters overlap or change density gradually.
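A short sketch of DBSCAN using scikit-learn follows. The points and the `eps` and `min_samples` values are chosen by hand for this toy example and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical points: two dense regions and one isolated point.
points = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],   # dense region A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],   # dense region B
    [3.0, 9.0],                                        # isolated point
])

# eps: neighbourhood radius; min_samples: number of points (including
# the point itself) needed within eps for a point to count as dense.
labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(points)

# Points in sparse areas receive the label -1, meaning noise.
print(labels)
```

The number of clusters is not specified anywhere: it follows from the density settings, and the isolated point is reported as noise instead of being forced into a cluster.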

Cluster analysis examples

Cluster analysis is widely used across industries, business functions, and research fields. To illustrate its usefulness in research, let us look at the two examples below.

Cluster analysis in retail marketing

Brands traditionally use cluster analysis to make sense of purchase behavior and trends by applying demographic segmentation to their customer base. Factors usually considered include geographical location, sex, age, and annual family income.

These parameters highlight how different consumer groups make different purchase decisions, and retail giants use this information to decide how to market to each audience. This also helps maximize the return on marketing spend while reducing customer churn.

Cluster analysis in sports sciences

Another everyday use case of cluster analysis is in sports. Data scientists, researchers, doctors, team management, and scouts look at how similar players fare in different scenarios and how effective they are in their sport. Players are grouped by body type, age, position, and similar criteria to compare their effectiveness.

Cluster Analysis with QuestionPro 

Looking at the correct data and analyzing it is highly beneficial for researchers and brands. Using a mature research platform like QuestionPro allows you to collect research data and helps you run advanced analysis within the tool to give you the insights that matter. 

With QuestionPro, it is possible to understand your customers and other research objects better and to make decisions quickly. Leverage the power of the enterprise-grade research suite today!


Cluster analysis: Insights into target groups, markets & products

Appinio Research · 06.12.2023 · 14min read

Statistical data analysis using cluster analysis

Gaining profound insights into target groups and markets, fortifying customer loyalty, propelling developmental endeavors, and evaluating the risks associated with a product launch: all these feats become attainable through a great tool, cluster analysis. This method unravels patterns and correlations embedded within vast datasets.

In the following article, we delve into the essence of cluster analysis, tracing its origins, highlighting its merits for both market research and companies, delineating the prerequisites for a successful analysis, and uncovering the untapped potential that cluster analysis holds for optimizing your marketing strategies.

What is a cluster analysis?

Cluster analysis emerges as a versatile powerhouse in the realm of market research. 

This statistical method enables the identification of patterns and clusters within data, where shared characteristics or properties bind elements together. 

Homogeneous groups, referred to as 'clusters,' encapsulate akin data points or objects. 

Through this segmentation, companies can acquire specific insights into their customers, products, or markets, subsequently translating these revelations into strategic marketing initiatives.

When does cluster analysis add value?

Cluster analysis emerges as a valuable asset for companies seeking profound market and target group insights. 

Its utility becomes most pronounced when companies grapple with a substantial volume of customers or products, and categorizing them into distinct groups is a must. 

This segmentation facilitates a more focused approach to diverse customer segments, fostering the development of personalized marketing strategies and target group approaches. Ultimately, cluster analysis serves as a catalyst for enhancing competitiveness, enabling companies to deploy resources more efficiently and make informed decisions in marketing and product development.

A brief history of cluster analysis in market research

The inception of cluster analysis dates back to the 1930s. However, it was during the 1950s and 1960s that diverse approaches to cluster analysis took shape, capturing the imagination of both market research and marketing spheres. 

Recognizing its potential, companies embraced this analytical tool to segment customer data and pinpoint target groups, paving the way for tailored approaches and optimized marketing strategies.

The wider availability of computers in the 1980s marked a turning point, rendering cluster analysis more accessible and efficient.

Today, it enjoys unprecedented popularity. 

Technological strides, including big data and advanced statistical software like SPSS, coupled with real-time, app-based data analysis, have elevated cluster analysis to an indispensable status for business success. 

It serves as a vital element for understanding market segments, identifying customer needs, and formulating competitive advantages.

Applications of cluster analysis

Cluster analysis finds its foothold not only in market research but extends its reach to diverse fields, showcasing its versatility in classifying customers or data into homogeneous target groups. 

This method proves invaluable for identifying patterns and correlations, paving the way for the development of personalized marketing strategies.

Beyond market research, cluster analysis finds application in various sectors:

  • The social sciences leverage cluster analysis to segment population groups and unveil behavioral patterns. 
  • In healthcare, it aids in crafting personalized treatment plans for patient groups.
  • The finance domain benefits from portfolio optimization and risk minimization.
  • In biology, the tool is instrumental in unraveling genetic patterns and family trees, contributing to the exploration of specific causes of diseases.
  • In the realm of mobility planning and logistics, it facilitates the examination of traffic flows, enabling the planning of more efficient routes.

The advantages and disadvantages of cluster analysis

Advantages

Cluster analysis emerges as a transformative tool, propelling companies to new heights by offering a plethora of opportunities.

  • Recognizing patterns and structures: In the vast landscape of data, cluster analysis unveils hidden patterns and structures, providing invaluable insights.
  • Target group identification and segmentation: It facilitates the precise identification and segmentation of target groups, laying the groundwork for targeted marketing strategies.
  • Personalizing marketing strategies: With a focus on personalization, cluster analysis empowers companies to tailor marketing strategies, thus enhancing overall efficiency.
  • Optimizing products and services: Companies can refine their products and services by leveraging the findings of cluster analysis, ensuring alignment with customer needs.
  • Informed decision-making: Serving as a cornerstone for corporate strategy and marketing plans, cluster analysis provides a sound basis for informed decision-making.

Disadvantages

However, like any method, cluster analysis has its drawbacks:

  • Subjectivity in cluster selection: The selection of clusters and determining their number can be subjective, introducing an element of interpretation.
  • Resource-intensive for large datasets: Dealing with large datasets can be computationally intensive and resource-consuming, potentially slowing down the analysis process.
  • Impact of outliers: Individual data points acting as outliers may adversely affect clustering accuracy.
  • Assumption risks: Analyses are susceptible to inaccuracies if built on incorrect assumptions regarding data classification.
  • Risk of over-clustering: There is a risk of creating an excessive number of clusters, potentially leading to representations that are no longer truly reflective of the underlying data.

Navigating these considerations judiciously allows companies to harness the full potential of cluster analysis while being mindful of its limitations.

Clusters: a robust foundation for further analyses

The outcomes of cluster analysis serve a dual purpose. Firstly, they provide an excellent launchpad for targeted marketing initiatives. Secondly, these clusters form a solid groundwork for further investigations through regression analysis, factor analysis, or TURF analysis.

  • Regression analysis: This method explores relationships between individual variables, shedding light on the effectiveness or inefficacy of marketing activities. It also unravels intricate connections between distinct segments.
  • Factor analysis: Aimed at simplifying complex datasets, factor analysis sifts through the intricacies to unveil the most crucial factors. This process identifies additional similarities between objects within a cluster, enhancing the depth of understanding.
  • TURF analysis: Short for total unduplicated reach and frequency, TURF analysis uses available data to determine which combination of products or marketing measures reaches the largest number of customers.

By leveraging clusters as a robust database, companies can not only fine-tune their marketing strategies but also delve deeper into the intricate dynamics and relationships within their customer segments.
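The reach calculation at the heart of TURF analysis can be illustrated with a few lines of plain Python. This is a simplified sketch: the respondents, the flavours, and the helper names `reach` and `best_bundle` are invented, and only reach (not frequency) is computed.

```python
from itertools import combinations

# Hypothetical survey answers: for each respondent, the set of product
# variants they said they would buy.
preferences = {
    "r1": {"vanilla", "chocolate"},
    "r2": {"vanilla", "chocolate"},
    "r3": {"vanilla", "chocolate"},
    "r4": {"vanilla"},
    "r5": {"mango"},
    "r6": {"mango"},
}

def reach(offered, preferences):
    """Share of respondents who like at least one of the offered items."""
    offered = set(offered)
    reached = sum(1 for liked in preferences.values() if liked & offered)
    return reached / len(preferences)

def best_bundle(preferences, size):
    """Exhaustively search for the bundle of `size` items with the highest reach."""
    items = sorted(set().union(*preferences.values()))
    return max(combinations(items, size), key=lambda b: reach(b, preferences))

bundle = best_bundle(preferences, size=2)
print(bundle, reach(bundle, preferences))
```

In this toy data, vanilla and chocolate are the two most popular flavours, yet together they reach only four of six respondents because their audiences overlap. Pairing vanilla with the less popular mango reaches everyone, which is the "unduplicated" part of TURF.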

Essential prerequisites for effective cluster analysis

The efficacy of a cluster analysis hinges on the foundation of robust data. 

For this purpose, it is imperative to normalize or scale the data, ensuring comparability and facilitating the classification into clusters. 

The clusters themselves necessitate distinctly recognizable characteristics or variables. Equally pivotal is the selection of an appropriate algorithm and analysis software, such as SPSS, coupled with the critical task of determining the optimal number of clusters.

Key requirements include:

  • Normalized or scaled data: A prerequisite for meaningful comparisons and cluster classification.
  • Clearly defined characteristics or variables: Essential for the identification and differentiation of clusters.
  • Appropriate algorithm: The right algorithm is key in ensuring the accuracy and relevance of the analysis.
  • Suitable analysis software: Leveraging advanced tools like SPSS enhances the efficiency and accuracy of the cluster analysis.
  • Optimal cluster number determination: Striking the right balance in determining the number of clusters is crucial for precision.

Above all, a clear comprehension of the analysis objectives is foundational. This understanding is essential for interpreting clusters meaningfully and extracting strategic insights that can steer informed decision-making.
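Why scaling matters can be shown with a small NumPy sketch. The customer values below are invented, and z-score standardization is just one common scaling choice (min-max scaling is another).

```python
import numpy as np

# Hypothetical customers: age in years, annual income in euros.
raw = np.array([
    [25.0, 30_000.0],
    [35.0, 32_000.0],
    [45.0, 90_000.0],
    [55.0, 88_000.0],
])

def standardize(x):
    """Z-score each column: subtract its mean, divide by its standard deviation.

    Assumes no column is constant (a constant column would divide by zero).
    """
    return (x - x.mean(axis=0)) / x.std(axis=0)

scaled = standardize(raw)

# Distance between the first two customers before and after scaling.
before = np.linalg.norm(raw[0] - raw[1])        # driven almost entirely by income
after = np.linalg.norm(scaled[0] - scaled[1])   # age and income both contribute
print(before, after)
```

Without scaling, the 2,000 euro income gap swamps the 10-year age gap simply because income is measured in larger numbers, so a distance-based clustering would effectively ignore age.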

Methodologies in Cluster Analysis

Cluster analysis employs various methodologies, or methods, depending on the objectives and data scenarios. Among the plethora of approaches, two stand out as the most common: hierarchical cluster analysis and k-means.

  • Hierarchical cluster analysis: This method constructs a tree structure encompassing all data points, ranging from individual data points to larger clusters. Clusters can manifest at different hierarchical levels, with two fundamental directions: agglomerative (bottom-up) and divisive (top-down). It provides a comprehensive perspective, offering insights into hierarchical relationships among data points.
  • k-means: An iterative technique that assigns data points to k clusters, where k is set before the analysis. The objective is to group similar data points within the same clusters, ensuring each cluster exhibits similar characteristics. It facilitates the identification of patterns and extraction of trends by grouping data points based on their similarities.

These methods offer distinct advantages based on the nature of the data and the analytical objectives. Selecting the most appropriate methodology is pivotal to the success and relevance of the cluster analysis.

Example of different group distributions as a result of a cluster analysis

Real-world application of cluster analysis

Consider a scenario where a company aims to connect with younger target groups, seeking a profound understanding of their needs to tailor individualized marketing initiatives.

To achieve this, the company conducts a survey gathering demographic data, including age, gender, place of residence, interests, and more. 

The collected data undergoes analysis through cluster analysis to categorize customers into distinct groups.

  • Data collection: Demographic information such as age, gender, location, and interests is gathered through a comprehensive survey.
  • Cluster analysis: Leveraging cluster analysis, the collected data is meticulously analyzed to identify patterns and commonalities. This results in the segmentation of customers into distinct groups based on shared characteristics.
  • Strategic insights: Armed with the segmented customer groups, the company gains valuable insights into the unique needs and preferences of younger target demographics.
  • Targeted marketing measures: With a nuanced understanding of each customer cluster, the company can craft targeted marketing strategies tailored to the specific characteristics and preferences of each group.
  • Enhanced engagement: By aligning marketing measures with the identified customer clusters, the company maximizes its outreach and engagement with the younger target groups.

In this example, cluster analysis serves as a powerful tool, enabling the company to not only comprehend the diverse needs of younger demographics but also to strategically tailor marketing initiatives, fostering a more personalized and impactful connection with their target audience.

Cluster analysis in nine steps

Cluster analysis stands out as an invaluable tool for extracting coherent patterns and groups from vast datasets, offering profound insights for refining marketing strategies and targeting specific audience segments. 

Here is an overview of the typical nine-step process in cluster analysis:

  • Data preparation: Before diving into the analysis, ensure the data, whether customer information or product characteristics, is meticulously collected, complete, and standardized.
  • Variable selection: Identify the relevant variables and characteristics for the analysis. These may include demographic data, purchasing behavior, or product features.
  • Data normalization: Normalize the data so that variables measured in different units or value ranges are comparable and contribute on a similar scale to the distance calculations.
  • Method selection: Choose the appropriate cluster analysis method based on the specific objectives and data characteristics.
  • Number of clusters determination: Decide on the optimal number of clusters to divide the data. This can be achieved through visual inspection or statistical methods like the elbow criterion.
  • Implementation of cluster analysis: Assign each data point to a specific cluster using statistical software such as SPSS.
  • Results interpretation: Analyze the formed clusters to identify distinctive features and differences between groups. Extract marketing strategies or product enhancements based on these key characteristics.
  • Validation and implementation of results: Critically review and validate results using internal or external validation methods. Implement targeted marketing strategies or adjust products and services to cater to the unique needs of individual clusters.
  • Monitoring and adaptation: Continuously monitor the effectiveness of implemented measures and adapt strategies as needed. Employ cluster analysis as an ongoing tool to identify market changes and evolving customer behaviors, ensuring flexibility in strategy adjustments.
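
The data normalization step can be illustrated with a short sketch. It assumes z-score standardization (one common choice among several) and uses made-up survey rows; the function name `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column std."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical survey rows: (age in years, monthly spend in euros)
customers = [(18, 120.0), (22, 80.0), (25, 300.0), (19, 100.0)]
scaled = standardize(customers)
```

After this step every column has mean 0 and standard deviation 1, so a variable with a large value range such as spend no longer dominates the distance between two customers.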

Why companies should use cluster analysis

Cluster analysis emerges as a great tool, empowering market researchers and companies to distill intricate data into lucid and interpretable patterns. 

By categorizing customers or products into clusters, this method facilitates the precise identification of target groups, paving the way for the development of tailor-made marketing strategies. This strategic approach can give you a competitive edge.

Here's why companies should leverage cluster analysis:

  • Precision in target group identification: Cluster analysis allows for the precise identification of target groups, enabling companies to understand and cater to the unique needs of diverse customer segments.
  • Tailor-made marketing strategies: By grouping customers or products into clusters, companies can develop tailor-made marketing strategies that resonate with the specific characteristics and preferences of each cluster.
  • Enhanced competitive advantages: The insights derived from cluster analysis contribute to the formulation of strategies that enhance competitive advantages. This is especially crucial in navigating today's highly competitive business landscape.
  • Improved market understanding: Companies gain a nuanced understanding of market diversity, empowering them to navigate and respond effectively to the dynamic landscapes of various markets.
  • Data-driven decision-making: Harnessing cluster analysis enables companies to make data-driven decisions, ensuring strategic choices are informed and optimized for success.

In a business world marked by intense competition, the adoption of cluster analysis emerges as an invaluable asset, offering a pathway to better comprehend markets, meet customer needs, and make informed strategic decisions.

Research Method

Cluster Analysis – Types, Methods and Examples

Cluster Analysis

Cluster analysis, also known as clustering, is a statistical technique used in machine learning and data mining that involves the grouping of objects or points in such a way that objects in the same group, also known as a cluster, are more similar to each other than to those in other groups. It is a main task of exploratory data analysis and is used in various fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

Cluster Analysis in Research

Cluster analysis is used to group a set of objects or observations into subsets called clusters. The goal is to identify inherent patterns, similarities, or relationships within the data by organizing the objects so that those within the same cluster are more similar to each other than to those in other clusters.

Cluster Analysis Methodology

Cluster analysis is a multi-step process and the specific steps can vary somewhat depending on the specific technique being used. However, the general methodology is typically similar and can be outlined as follows:

  • Data preparation : The data you plan to cluster must be gathered, cleaned, and preprocessed. This can involve dealing with missing or erroneous data, transforming data into a usable format, normalizing data so that different scales can be compared, and reducing dimensionality if the data has a high number of variables.
  • Feature selection : This step involves deciding which variables or features will be used for clustering. The selected features should be relevant to the clustering task. Irrelevant or redundant features can distort the structure of the data and lead to poor clustering results.
  • Choice of clustering algorithm : Different clustering algorithms are suitable for different types of data and different clustering tasks. Some algorithms, like K-means, work best with spherical clusters of similar size, while others, like DBSCAN, can handle clusters of different shapes and sizes.
  • Parameter setting : Most clustering algorithms have parameters that need to be set before the algorithm can run. For example, the K-means algorithm requires the number of clusters to be specified in advance. These parameters can have a big impact on the clustering results, so they need to be chosen carefully.
  • Clustering : Run the clustering algorithm on your data. This will typically involve an iterative process where the algorithm continually adjusts the clusters until it finds the best fit for the data.
  • Cluster validation : After the clusters have been formed, it’s important to validate the results to ensure they make sense. This can involve statistical testing, comparison to known classes, or domain-specific validation methods.
  • Interpretation of results : The final step is to interpret the clustering results. This can involve analyzing the characteristics of each cluster, visualizing the clusters, or using the clusters for some subsequent analysis.

Types of Cluster Analysis

Cluster analysis is a versatile process with various types that can be used depending on the specific needs of a task. Here are some common types of cluster analysis:

Partitioning Clustering

This type of clustering divides data into a set of mutually exclusive clusters. The most well-known method in this category is the K-means clustering algorithm, where ‘K’ refers to the pre-specified number of clusters. These methods typically start with a random partitioning of data and refine it through an iterative process.

Hierarchical Clustering

This type of clustering creates a tree of clusters. Hierarchical clustering not only clusters the data but also builds a hierarchy of clusters, which can be drawn as a tree diagram (a dendrogram). It comes in two flavors:

  • Agglomerative (Bottom-Up) : Each data point starts in its own cluster and pairs of clusters are merged as one moves up the hierarchy.
  • Divisive (Top-Down) : All data points start in one cluster, and splits are performed recursively as one moves down the hierarchy.

Density-Based Clustering

These types of algorithms look for areas in the feature space where there are high densities of observations. The most famous of these is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). It works by defining a neighborhood around a data point and if there are a minimum number of points within this neighborhood then a cluster is started.

Grid-Based Clustering

These algorithms quantize the feature space into a finite number of cells that form a grid structure, and then perform all clustering operations on that grid. Their primary advantage is fast processing time, which typically depends on the number of cells in each dimension of the quantized space rather than on the number of data points.
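
As a rough sketch of the quantization idea, the snippet below maps each point to a grid cell and keeps the cells that hold enough points. It covers only this first step; full grid-based methods (STING and CLIQUE are well-known examples) go on to combine neighboring dense cells into clusters. The function name `dense_cells` and its parameters are assumptions made for illustration.

```python
from collections import Counter

def dense_cells(points, cell_size, min_points):
    """Map each point to a grid cell and return the cells holding at least min_points points."""
    counts = Counter(
        tuple(int(coord // cell_size) for coord in point) for point in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_points}

points = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.9), (5.1, 5.2), (9.7, 0.4)]
print(dense_cells(points, cell_size=1.0, min_points=2))  # {(0, 0): 3}
```

Once the points are binned, later steps work on the cell counts only, which is why the cost depends on the number of cells and not on the number of observations.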

Model-Based Clustering

These algorithms hypothesize a model for each cluster and find the best fit of the data to that model. A common example is the Gaussian Mixture Model, which is typically fitted with the Expectation-Maximization (EM) algorithm. The advantage here is that the model provides a probabilistic framework for estimating the characteristics of the process generating the data.

Subspace Clustering or Biclustering

In standard clustering, an object belongs to exactly one cluster and all dimensions are used to measure similarity. In subspace clustering, an object can belong to more than one cluster, and each cluster is associated with a subset of the dimensions. This type of clustering is particularly useful for high-dimensional data, where groups may only be apparent in some of the features.

Cluster Analysis Formulas

Here are some of the key formulas and mathematical concepts used in various cluster analysis methods.

K-means Clustering :

The main objective in K-means is to minimize the within-cluster variance, which is typically measured by Euclidean distance. The formula for Euclidean distance between two points x and y in n dimensional space is:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Where n is the number of dimensions, x_i is the i -th coordinate of point x , and y_i is the i -th coordinate of point y .
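
A minimal sketch of this distance in code, assuming points are given as equal-length sequences of numbers (the function name `euclidean` is illustrative):

```python
import math

def euclidean(x, y):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```

Python's standard library offers the same calculation as `math.dist`, which the sketch mirrors term by term.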

The objective function for K-means, which needs to be minimized, is the sum of the squared Euclidean distances from each data point to the center (centroid) of the cluster it was assigned to. The formula looks like this:

$$J = \sum_{i=1}^{m} \sum_{j=1}^{K} w_{ij} \, \lVert x_i - v_j \rVert^2$$

Where m is the number of data points, K is the number of clusters, x_i here denotes the i -th data point (not a coordinate), v_j is the centroid of cluster j , ||x_i - v_j|| is the Euclidean distance from data point i to the centroid of cluster j , and w_ij equals 1 if point i belongs to cluster j and 0 otherwise.
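
To make the objective concrete, here is a small sketch that evaluates it for a given assignment. It uses squared distances, as in the usual within-cluster sum of squares formulation, and represents w_ij implicitly through a list of cluster labels. The data and the name `kmeans_objective` are made up for illustration.

```python
def kmeans_objective(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]  # w_ij = 1 only for the assigned cluster j
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total

points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
centroids = [(1.0, 2.0), (9.0, 8.0)]
labels = [0, 0, 1, 1]
print(kmeans_objective(points, centroids, labels))  # 4.0
```

Swapping any label in this example raises the value, which is the sense in which K-means looks for the assignment and centroids that make the objective as small as possible.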

Hierarchical Clustering :

In hierarchical clustering, we use different types of linkage methods to find the distance between clusters, which can be single linkage (minimum distance), complete linkage (maximum distance), average linkage, and centroid linkage. Here are the formulas for single and complete linkage methods:

Single Linkage: d(S,T) = min {d(s,t) : s ∈ S, t ∈ T}

Complete Linkage: d(S,T) = max {d(s,t) : s ∈ S, t ∈ T}

Where S and T are two different clusters, s and t are any two points in clusters S and T respectively, and d(s,t) is the distance between points s and t .
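
The two linkage formulas translate almost directly into code. The sketch below also includes a naive agglomerative loop that uses them; it is meant for small illustrative inputs only, and the function names are assumptions, not part of any particular library.

```python
import math
from itertools import combinations

def single_linkage(S, T):
    """Minimum distance between any point in S and any point in T."""
    return min(math.dist(s, t) for s in S for t in T)

def complete_linkage(S, T):
    """Maximum distance between any point in S and any point in T."""
    return max(math.dist(s, t) for s in S for t in T)

def agglomerate(points, n_clusters, linkage=single_linkage):
    """Naive bottom-up clustering: repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]  # every point starts in its own cluster
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

S = [(0.0, 0.0), (0.0, 1.0)]
T = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(S, T))    # 3.0
print(complete_linkage(S, T))  # sqrt(26), about 5.099
```

Passing `linkage=complete_linkage` to `agglomerate` switches the merge criterion without changing the loop, which is how the choice of linkage shapes the resulting hierarchy.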

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) :

There isn’t a single formula for DBSCAN like there is for K-means or distance measures in hierarchical clustering. DBSCAN involves more of a procedural algorithm, with key concepts like ε (eps) which is the maximum distance between two samples for them to be considered as in the same neighborhood, and minimum samples which is the minimum number of samples in a neighborhood for a data point to qualify as a core point.
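
The procedure can still be sketched in a few lines. The version below is a simplified, brute-force illustration of the idea (every neighborhood query scans all points), with `eps` and `min_samples` playing the roles described above; it is not an optimized implementation.

```python
import math

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at (50, 50) has no neighbors within eps, so it is labeled as noise instead of being forced into a cluster.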

Examples of Cluster Analysis

Examples of Cluster Analysis are as follows:

  • A company wants to launch a new product, and it first needs to identify its target market. By conducting a cluster analysis on its customer data (considering variables such as age, income, past purchasing behavior, geographical location, etc.), the company can identify distinct groups of customers who may respond differently to the new product. For instance, a segment may consist of high-income young adults who are early technology adopters, and they can be targeted with specific marketing strategies.
  • Hospitals and health systems use cluster analysis to improve patient care and operational efficiency. For example, a hospital may group patients based on their symptoms, medical history, and demographics to predict health outcomes and personalize treatments. Also, cluster analysis can be used to identify patterns in the admission rates and optimize staffing and resource allocation accordingly.
  • Banks and financial institutions often use clustering techniques for credit scoring. By clustering clients based on their credit history, income, and other financial data, they can predict the risk of default for new clients and make informed decisions on loan approvals.
  • Telecom companies can use cluster analysis to understand the usage patterns of their customers. This can be based on calling behavior, data usage, recharge patterns, etc. The insights obtained can then be used for customer segmentation, targeted marketing, and customer churn prediction.
  • Online retailers can use cluster analysis for product recommendation systems. By clustering users who have similar browsing and purchasing behaviors, they can recommend products that similar users have liked or bought in the past.
  • Cities and municipalities can use cluster analysis to optimize public transportation routes. By clustering areas based on demand, distance, population density, etc., they can design bus or train routes that efficiently serve the needs of the community.
  • Educational institutions can use cluster analysis to group students based on their performance, learning styles, interests, etc. This can help in personalizing teaching methods, identifying students who may need additional support, and creating effective academic programs.

Applications of Cluster Analysis

Cluster analysis is widely used across many disciplines and industries, given its ability to uncover hidden patterns and groupings within data. Here are some of its key applications:

  • Business and Marketing : In customer segmentation, businesses use clustering to group customers based on similar behaviors or preferences. This enables targeted marketing, improves customer service, and aids in product development.
  • Healthcare and Medicine : Clustering is used for patient classification based on symptoms, genetics, or response to treatments. This can guide diagnoses and therapeutic strategies. It’s also used in genomic research, such as clustering genes with similar expression patterns.
  • Finance : Financial institutions use cluster analysis for portfolio management, risk analysis, and customer segmentation. For instance, customers can be grouped based on their credit scores, income levels, and investment behaviors, allowing for customized financial advice.
  • Environment : Clustering can help in identifying geographical areas with similar climate patterns or biodiversity, which is useful for environmental management and conservation planning.
  • Information Technology : In data mining, clustering is used to discover patterns and associations in large datasets. In cybersecurity, it’s used for anomaly detection to identify unusual patterns or activities.
  • Social Science : Cluster analysis is used to identify groups with similar social behaviors, attitudes, or characteristics. For instance, it can be used to segment populations based on socio-economic variables.
  • Transportation : Cities can use clustering to identify busy hubs or traffic patterns, helping in urban planning and public transport route optimization.
  • Education : Clustering is used to group students based on their learning patterns and performance. This can inform differentiated instruction strategies and early intervention efforts.
  • Astronomy : Astronomers use cluster analysis to categorize stars and galaxies based on their properties.
  • Telecommunications : Telecommunication companies use clustering for network traffic analysis, infrastructure optimization, and customer segmentation.

When to use Cluster Analysis

Cluster analysis is a useful tool when you want to explore your data to find patterns or groupings. Here are some instances where it would be appropriate to use cluster analysis:

  • Understanding Variations : If you have a large amount of data and you want to understand the differences and similarities within your data, cluster analysis can be an effective tool. It allows you to identify the structures in your data and group similar data together.
  • Exploratory Data Analysis : If you are in the early stages of your research and are not sure what you are looking for, cluster analysis can help you to identify patterns, spot anomalies, test hypotheses, or check assumptions.
  • Feature Engineering : Cluster analysis can be used to create new features that can capture the underlying structures in the data. These new features can be used to improve the performance of machine learning models.
  • Segmentation : If you need to segment your market, customers, users, or any other type of entity, cluster analysis can be an effective approach. For example, it is commonly used in marketing to identify different customer segments based on their buying behavior or preferences.
  • Dimensionality Reduction : If your data is high-dimensional (i.e., it has a large number of features), cluster analysis can help simplify it, for example by grouping similar features together or by replacing many raw features with a single cluster label. This can make the data easier to visualize or to work with.
  • Anomaly Detection : Cluster analysis can be used to detect outliers or anomalies in your data. Anything that doesn’t fit well into any of the identified clusters may be considered an anomaly and could be worth investigating.
  • Preprocessing : Cluster analysis can also be used as a preprocessing step for other machine learning algorithms. For instance, you could use cluster analysis to group your data, then train a separate machine learning model for each cluster.

Advantages of Cluster Analysis

Advantages of Cluster Analysis are as follows:

  • Unsupervised Learning : Cluster analysis doesn’t require labeled data, making it a useful tool for exploratory analysis. It can find patterns and structures in the data that may not be immediately apparent.
  • Versatility : Clustering can be applied across a wide range of disciplines and fields. Whether it’s market segmentation in business, image segmentation in computer vision, or pattern discovery in genomics, cluster analysis has a variety of uses.
  • Simplicity : Some clustering algorithms, such as K-means, are relatively simple to understand and implement. This makes them accessible to analysts and researchers.
  • Insight Extraction : Cluster analysis helps in uncovering meaningful insights from complex and large datasets. This is particularly useful in big data applications where manually interpreting data would be impractical.
  • Data Summarization : Clustering provides a way to summarize the data by grouping similar observations together. This is useful in large datasets where the sheer volume of data makes it hard to analyze individual data points.
  • Anomaly Detection : Clustering can help in identifying outliers or anomalies. Points that are not part of any cluster or are far from the rest of the points in their cluster could be considered anomalous.
  • Preprocessing Step : Clustering can be used as a preprocessing step in machine learning and data mining to improve computational efficiency or the performance of algorithms.
  • Feature Creation : Clustering can be used to create new features that can be used in other machine learning models. For example, cluster assignments or distances to cluster centroids could be used as new features.
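
The feature creation and anomaly detection points above can be sketched together. The example assumes centroids are already available from an earlier clustering run; the centroid values, the threshold, and the function names are hypothetical.

```python
import math

def centroid_features(point, centroids):
    """Distances to every centroid plus the index of the nearest one."""
    distances = [math.dist(point, c) for c in centroids]
    nearest = min(range(len(centroids)), key=distances.__getitem__)
    return {"cluster": nearest, "distances": distances}

def is_anomalous(point, centroids, threshold):
    """Flag a point whose nearest centroid is farther away than the threshold."""
    return min(math.dist(point, c) for c in centroids) > threshold

centroids = [(1.0, 2.0), (9.0, 8.0)]  # hypothetical, e.g. from a previous K-means run
features = centroid_features((1.0, 5.0), centroids)
print(features["cluster"])                         # 0
print(is_anomalous((30.0, 30.0), centroids, 5.0))  # True
```

The threshold has to be chosen for the data at hand; a fixed value like the one here is only a placeholder.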

Disadvantages of Cluster Analysis

Disadvantages of Cluster Analysis are as follows:

  • Subjectivity : One of the main challenges with cluster analysis is the interpretation of the results. As it’s an unsupervised learning technique, the clusters are not pre-defined and their interpretation can be subjective and not always straightforward.
  • Choosing the Number of Clusters : In some clustering methods like K-means, the number of clusters needs to be specified beforehand. Choosing an inappropriate number of clusters can lead to poor clustering performance. Although there are methods to help determine the optimal number of clusters, they often provide a range rather than a definitive answer.
  • Sensitivity to Initialization and Local Optima : Some algorithms, such as K-means, are sensitive to the initial choice of centroids. Different initializations may yield different results. Also, these algorithms can sometimes converge to a local optimum rather than the global optimum.
  • Assumptions about Cluster Shape and Size : Many clustering algorithms make certain assumptions about the shape and size of the clusters. For instance, K-means assumes that clusters are spherical and roughly equal in size. If these assumptions are not met, the clustering results may be poor.
  • Difficulty with High-Dimensional Data : Clustering can become challenging when dealing with high-dimensional data. The distance between points becomes less meaningful in high-dimensional spaces (a problem often referred to as the “curse of dimensionality”), which can degrade the performance of clustering algorithms.
  • Sensitivity to Noise and Outliers : Many clustering algorithms are sensitive to noise and outliers in the data. A few unusual data points can significantly influence the shape and size of the clusters.
  • Scalability : Some clustering methods can be computationally intensive, especially with large datasets. This could make them unsuitable for applications that require real-time clustering of streaming data.
  • Lack of Predictive Power : Unlike supervised learning models, clustering models typically do not predict an outcome or a target variable. They are primarily used for understanding the underlying structure of the data.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

Cluster analysis

Quick definition: Cluster analysis is a form of exploratory data analysis in which observations are divided into groups that share common characteristics. Those groups are compared and contrasted with other groups to derive information about the observations.

Key takeaways:

  • Cluster analysis allows organizations to better understand their customers by identifying individuals with similar traits, which can inform how the organization communicates with those customers.
  • There are five main clustering approaches. The most common are K-means clustering and hierarchical, or hierarchy, clustering. The clustering approach an organization takes depends on what is being analyzed and why.
  • To ensure accurate cluster analysis, choose helpful variables (behavior, geography, demographics, etc.) to evaluate the observations, cluster the observations into the right number of groups, and create clusters with high intra-cluster similarity and low inter-cluster similarity.

The following questions were answered in an interview with John Bates, the director of product management for Predictive Marketing Solutions and Analytics Premium for Adobe Marketing Cloud.

  • What is cluster analysis?
  • What is the purpose of clustering?
  • What are the different types of clustering?
  • What are the characteristics of a good cluster analysis?
  • How do you perform cluster analysis?
  • What do you do with the results of a cluster analysis?
  • How do you make sure your cluster analysis is accurate?
  • Why is cluster analysis important for business strategy?
  • How often do organizations update clusters?

What is a cluster analysis?

Cluster analysis is a type of unsupervised classification, meaning it doesn’t have any predefined classes, definitions, or expectations up front. It’s a statistical data mining technique used to group observations so that those in the same cluster are similar to each other but unlike the observations in other clusters.

An individual sorting out the chocolates from a sampler box is a good metaphor for understanding clustering. The person may have preferences for certain types of chocolate.

When they sift through their box, there are lots of ways they can group that chocolate. They can group it by milk chocolate vs. dark chocolate, nuts vs. no nuts, fruit filling, nougat, etc.

The process of separating pieces of candy into piles of similar candy based on those characteristics is clustering. We do it all the time.

What is the purpose of clustering?

The general purpose of cluster analysis in marketing is to construct groups or clusters while ensuring that the observations are as similar as possible within a group.

Ultimately, the purpose depends on the application. In marketing, clustering helps marketers discover distinct groups of customers in their customer base. They then use this knowledge to develop targeted marketing campaigns.

For example, clustering may help an insurance company identify groups of motor insurance policyholders with a high average claim cost.

The purpose behind clustering depends on how a company intends to use it, which is largely informed by the industry, the business unit, and what the company is trying to accomplish.

What are the different types of clustering?

There are five different major clustering approaches:

  • Partitioning algorithms
  • Hierarchy algorithms
  • Density-based algorithms
  • Grid-based algorithms
  • Model-based algorithms

The most common clustering approaches are partitioning and hierarchy algorithms.

The main difference between the two is that partitioning algorithms look to create various partitions and then evaluate them by some criterion, while hierarchy-based algorithms decompose, or split information, based on a criterion.

K-means clustering is probably the most common partitioning algorithm. It’s generally used when the number of classes is fixed in advance. An analyst tells the algorithm how many clusters they want to divide the observations into.

Then each cluster is represented by the center of the cluster, or the mean. It's an efficient option, but it does have some weaknesses. It’s only applicable when the mean is defined and the number of clusters is determined in advance.

It also doesn't deal well with outliers, so if there are observations that are very different from the rest, K-means isn’t the best option.

Another type of algorithm is called expectation maximization (EM). EM is a type of partitioning algorithm, but it's model-based. It works similarly to K-means.

However, instead of assigning each observation to a single cluster so as to maximize the differences in means across the variables, EM clustering computes the probability of cluster membership, that is, the likelihood that a single observation falls into a particular cluster.

It uses probability distributions to calculate that number.

The great thing about EM is that cluster membership is not mutually exclusive. A customer can have some probability of being associated with multiple clusters.

They will typically get assigned to the one with the highest probability, but they may also have a lot of characteristics or traits with another cluster.
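
To illustrate these membership probabilities, the sketch below computes them for a single observation under a one-dimensional Gaussian mixture. It shows only the probability calculation (the "expectation" half of EM), with the cluster parameters assumed to be known; the segment values and function names are invented for the example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def membership_probabilities(x, components):
    """E-step for one observation: posterior probability of each cluster."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [value / total for value in weighted]

# Hypothetical segments as (mixing weight, mean monthly spend, std)
components = [(0.5, 50.0, 10.0), (0.5, 100.0, 10.0)]
probs = membership_probabilities(75.0, components)  # halfway between both means
```

A customer spending 75 sits exactly between the two segments and gets equal probabilities, while a customer spending 55 would be assigned almost entirely to the first segment.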

The purpose of hierarchical clustering is to create a hierarchy of groups. This can be done with either an agglomerative process, which starts with each observation in its own cluster and then pairs up similar observations over multiple levels, or a divisive process, which starts with all the observations in a single cluster and then breaks them into different groups.

A hierarchy cluster is like a data visualization tree. You can see how people start together and then divide out based on different criteria. Hierarchical clustering is great for the end user to be able to see those relationships.

What are the characteristics of a good cluster analysis?

A good clustering method will produce high-quality clusters, which means there is high similarity between observations in a single cluster and low similarity between observations in different clusters.
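
One way to put a number on this idea is the silhouette coefficient, a standard measure that is not mentioned in the interview and is used here only as an illustration. For each observation it compares the mean distance to its own cluster with the mean distance to the nearest other cluster; values near 1 indicate tight, well-separated clusters. The function below is a simple sketch for small data sets.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Collect distances from p to every other point, grouped by cluster label
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(math.dist(p, q))
        own = by_cluster.pop(label, None)
        if not own or not by_cluster:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the visible gap
bad = silhouette_score(points, [0, 1, 0, 1])   # groups cut across the gap
```

Here `good` comes out close to 1 and `bad` is negative, matching the intuition of high similarity within clusters and low similarity between them.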

The quality of the clustering result depends on both the similarity measure used by the method and its implementation. The quality is also measured by the method’s ability to discover some or all hidden patterns that may exist within the data.

A lot of this is evaluated using what’s called a “distance.” Clustering algorithms use a distance measure or metric to determine how to separate observations in the different groups.

The most common one is Euclidean distance, the straight-line distance between two points, such as two observations or an observation and a cluster center, but there are many options.

A distance measure often shows how close an observation is to the cluster's mean, or average value, and the choice of measure influences the shape of the clusters the algorithm can find.

What are the disadvantages of cluster analysis, and how can companies avoid problems?

Cluster analysis in marketing is an exploratory technique. It's not about making predictions.

In the case of expectation maximization, given the algorithm, it might look at the probability distribution of the data and the probability of assignment to a cluster. That said, it's not making any predictions regarding what those people are likely to do next.

All EM is really doing is helping make sense of data across lots of different variables for a given observation. Companies can only look at a couple of data sets simultaneously and see patterns.

These models are helpful for evaluating lots of data to identify those patterns and then group people who are similar to one another across those traits.

The advantages are that it helps in exploration. It helps inform strategy—how a company might think about their marketing campaigns or make business decisions—but it’s not the end.

Cluster analysis also looks only at known customers. When a new customer begins to interact with a business and the business does not have all the necessary data yet, the customer is an unknown quantity.

They haven't been authenticated, so the company has very little information about them (for instance, where the customer lives). A cluster analysis is also static: the assignments reflect a single point in time and pertain only to the data that was put into it.

It’s important to regularly re-evaluate clustering and re-apply analysis. If new data comes in, it should be incorporated into the analysis. It’s important never to get too fixated on individual cluster assignments.

Allow clusters to be fluid. And remember to evaluate how customers may move between clusters based on certain interactions they have with the business.

How do you perform cluster analysis?

The first step of cluster analysis is usually to choose the analysis method, which will depend on the size of the data and the types of variables.

Hierarchical clustering, for example, is appropriate for small data sets, while K-means clustering is more appropriate for moderately large data sets and when the number of clusters is known in advance.

Large data sets usually contain a mixture of different types of variables, and they generally require a two-step procedure.

After you decide on what method of analysis to use, start the process by choosing the number of cases to subdivide into homogeneous groups or clusters. Those cases, or observations, can be any subject, person, or thing you want to analyze.

Next, choose the variables to include. There could be 1,000 variables, or even 10,000 or 25,000. The number and types of variables chosen will determine what type of algorithm should be used.

Then decide whether to standardize those variables in some way, so that every variable contributes equally to the distance or similarity between the cases. However, the analysis can be run with both standardized and unstandardized variables.
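One common way to standardize is the z-score, which rescales each variable to a mean of 0 and a standard deviation of 1. The sketch below assumes that approach; the income and visit figures are invented for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable cannot separate cases, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical variables on very different scales.
income = [30000, 50000, 70000, 90000]   # dollars per year
visits = [1, 2, 3, 4]                   # store visits per month

z_income = standardize(income)
z_visits = standardize(visits)

print([round(z, 3) for z in z_income])  # [-1.342, -0.447, 0.447, 1.342]
print([round(z, 3) for z in z_visits])  # [-1.342, -0.447, 0.447, 1.342]
```

Without standardizing, a distance calculation would be dominated by income simply because its numbers are larger. After standardizing, both variables carry the same weight.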

Each analysis method has a different approach. For K-means clustering, select the number of clusters, then the algorithm iteratively estimates the cluster means and assigns each case to the cluster for which its distance to the cluster mean is the smallest.
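The K-means loop described above can be sketched in a few lines of Python. This is a simplified illustration rather than a production implementation: the points are made up, and the starting centers are passed in by hand so the result is repeatable (real tools usually pick them randomly or with a smarter seeding rule).

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning cases and re-estimating cluster means."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each case joins the cluster with the nearest mean.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: recompute each cluster mean from its current members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the center of an empty cluster
        if new_centroids == centroids:
            break  # converged: the means stopped moving
        centroids = new_centroids
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```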

For hierarchical clustering, choose a statistic that quantifies how far apart or similar two cases are.

Next, select a method for forming the groups. Finally, determine how many clusters are needed to represent the data, which is done by looking at how similar the clusters are at each split (or merge) in the hierarchy.
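As an illustration of the hierarchical approach, the sketch below builds clusters bottom-up: every case starts alone, and the two closest clusters are merged until the requested number remains. It assumes Euclidean distance and "single linkage" (distance between the two closest members) as the grouping method; both are choices made for this example, and the points are invented.

```python
import math

def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering on a small list of points."""
    clusters = [[i] for i in range(len(points))]  # start: one case per cluster
    merges = []  # (members_a, members_b, distance), in merge order

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3], [4]]
```

The recorded merge distances are what a dendrogram plots: a large jump between one merge and the next suggests a natural place to stop.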

What do you do with the results of a cluster analysis?

Depending on the clustering method, there's usually an associated visualization. That's very common for investigating the results. In the case of K-means, it’s common to use a scatter plot on X and Y axes that shows how far apart the groups of observations are.

By using that type of visualization, those groupings become very clear. In the case of hierarchical clustering, a visualization called a dendrogram is used, which shows the splits in the tree and where it is cut to form clusters.

Why is cluster analysis important for business strategy?

Cluster analysis can benefit a company in multiple ways, including how they market their products.

It can affect whom they market those products to, what retention and sales strategies might be employed, and how they might evaluate prospective customers.

They can cluster current customers and determine their lifetime value relative to their propensity for attrition, and that can inform how they communicate with different customers and how to identify new high-value customers.

How do you make sure your cluster analysis is accurate?

When looking at the accuracy of a cluster, there are three important factors: cluster tendency, number of clusters, and clustering quality.

Before evaluating cluster performance, make sure the data set you’re working with has clustering tendency, which means that it doesn’t contain uniformly distributed points.

For example, it doesn’t benefit the analysis to include a variable like “species” if every observation has the same value for it. There are statistical methods for assessing clustering tendency.

Number of clusters is a required parameter for K-means clustering, but it’s useful for evaluating accuracy in other methods as well. By identifying how many clusters a team intends to work with, they can group observations in the best way to derive helpful insights.

Too few clusters means putting together observations that aren’t similar enough to take action, while too many clusters will divide your observations up too much to be useful.

Clustering quality looks at the level of similarity within a cluster and among separate clusters.

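There are multiple methods for measuring clustering quality, including the adjusted Rand index, the Fowlkes-Mallows score, mutual information-based scores, and homogeneity and completeness. These measures compare cluster assignments against a known reference grouping, so they apply when such a reference is available.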
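As one example of these quality measures, the adjusted Rand index can be computed by hand from pair counts. The sketch below follows the standard formula; the segment labels are invented, and it assumes a trusted reference grouping exists to compare against.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cases."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))  # contingency table cells
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Count pairs of cases that fall together in each cell, row, and column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one big cluster
    return (sum_cells - expected) / (maximum - expected)

known_segments = ["a", "a", "a", "b", "b", "b"]
same_grouping = [1, 1, 1, 0, 0, 0]   # same partition, different cluster names
one_mistake = [1, 1, 0, 0, 0, 0]     # one customer placed in the wrong group

print(adjusted_rand_index(known_segments, same_grouping))  # 1.0
print(round(adjusted_rand_index(known_segments, one_mistake), 3))  # 0.324
```

A score of 1.0 means the two groupings match exactly (cluster names do not matter), while scores near 0 mean the agreement is about what chance would produce.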

How often do organizations update clusters?

It often depends on the use case. A high-tech retailer like Best Buy might use clusters at the highest level to align the entire enterprise on personas.

Every employee, from those in the call centers to the individuals in the stores themselves, can look at every customer and classify them into the cluster or persona they most align with.

The company won’t change those clusters very often because they inform a higher-level strategy across the entire business.

But then, within certain departments, you might have micro clusters. Given one of those higher-level clusters, companies may want to cluster individuals more often because they are moving through different life cycle stages of the sales process.

Once they’ve clustered their customers, the clusters eventually become stale, so companies might re-cluster those individuals depending on how long the sales cycle is.

An Introduction to Cluster Analysis

What is Cluster Analysis?

Cluster analysis is a statistical method used to group similar objects into respective categories. It can also be referred to as segmentation analysis, taxonomy analysis, or clustering.

The goal of performing a cluster analysis is to sort different objects or data points into groups in a manner that the degree of association between two objects is high if they belong to the same group, and low if they belong to different groups.

Cluster analysis differs from many other statistical methods due to the fact that it’s mostly used when researchers do not have an assumed principle or fact that they are using as the foundation of their research.

This analysis technique is typically performed during the exploratory phase of research, since unlike techniques such as factor analysis, it doesn’t make any distinction between dependent and independent variables. Instead, cluster analysis is leveraged mostly to discover structures in data without providing an explanation or interpretation.

Put simply, cluster analysis discovers structures in data without explaining why those structures exist. 

For example, when cluster analysis is performed as part of market research, specific groups can be identified within a population. The analysis of these groups can then determine how likely a population cluster is to purchase products or services. If these groups are defined clearly, a marketing team can then target different clusters with tailored communication.

Common Applications of Cluster Analysis 

Marketing

Marketers commonly use cluster analysis to develop market segments, which allow for better positioning of products and messaging. This allows a company to better position itself, explore new markets, and develop products that specific clusters find relevant and valuable.

Insurance  

Insurance companies often leverage cluster analysis if there are a high number of claims in a given region. This enables them to learn exactly what is driving this increase in claims.  

Geology  

For cities on fault lines, geologists use cluster analysis to evaluate seismic risk and the potential weaknesses of earthquake-prone regions. By considering the results of this research, residents can do their best to prepare for and mitigate potential damage.

Putting Clustering into Context

It’s easy to overthink cluster analysis, but our brains naturally cluster data on a regular basis in order to simplify the world around us. Whether we realize it or not, we deal with clustering in practically every aspect of our day-to-day lives.

For example, a group of friends sitting at the same table in a restaurant can be considered a cluster. 

In grocery stores, goods of a similar nature are grouped together in order to make shopping more convenient and efficient.

This list of events during which we use clustering in our everyday lives could go on forever, but perhaps it makes more sense to consider a more classic, archetypal example.

In biology, humans belong to the following clusters: primates, mammals, amniotes, vertebrates, and animals. In this example, note that as we move down the chain of clusters, humans show fewer and fewer similarities to the other members of the group. Humans have more in common with primates than they do with other mammals, and more in common with mammals than they do with all animals in general.

The Benefits of Cluster Analysis

Clustering allows researchers to identify and define patterns between data elements. 

Revealing these patterns between data points helps to distinguish and outline structures which might not have been apparent before, but which give significant meaning to the data once they are discovered.

Once a clearly defined structure emerges from the dataset at hand, informed decision-making becomes much easier.

The Different Types of Cluster Analysis

There are three primary methods used to perform cluster analysis:  

Hierarchical Cluster

This is the most common method of clustering. It creates a series of models with cluster solutions from 1 (all cases in one cluster) to n (each case is an individual cluster). This approach also works with variables instead of cases. Hierarchical clustering can group variables together in a manner similar to factor analysis.

Finally, hierarchical cluster analysis can handle nominal, ordinal, and scale data. But, remember not to mix different levels of measurement into your study.

K-Means Cluster

This method is used to quickly cluster large datasets. Here, researchers define the number of clusters prior to performing the actual study. This approach is useful when testing different models with a different assumed number of clusters.

Two-Step Cluster

This method uses a cluster algorithm to identify groupings by performing pre-clustering first, and then performing hierarchical methods. Two-step clustering is best for handling larger datasets that would otherwise take too long a time to calculate with strictly hierarchical methods. 

Essentially, two-step cluster analysis is a combination of hierarchical and k-means cluster analysis. It can handle both scale and ordinal data, and it automatically selects the number of clusters.

What Does The Clustering Process Look Like?

Step #1: Build and Distribute a Survey

Your survey should be designed to include multiple measures of propensity to purchase and the preferences for the product at hand. It should be distributed to your population of interest, and your sample size should be large enough to inform statistically-based decisions.

Step #2: Analyze Response Data

It’s considered best practice to perform a factor analysis on your survey to minimize the factors being clustered. If after your factor analysis it’s concluded that a handful of questions are measuring the same thing, you should combine these questions prior to performing your cluster analysis. 

After reducing your data by factoring, perform the cluster analysis and decide how many clusters seem appropriate, and record those cluster assignments. You’ll now be able to view the means of all of your factors across clusters.
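Viewing the means of your factors across clusters is a simple group-and-average step. The sketch below assumes each respondent has already been given a cluster assignment; the factor names and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cluster_means(rows, assignments):
    """Average every factor within each cluster."""
    grouped = defaultdict(list)
    for row, cluster in zip(rows, assignments):
        grouped[cluster].append(row)
    return {
        cluster: {name: mean(r[name] for r in members) for name in members[0]}
        for cluster, members in grouped.items()
    }

# Hypothetical factor scores per respondent, on a 1 to 5 scale.
rows = [
    {"price_sensitivity": 4.0, "brand_loyalty": 2.0},
    {"price_sensitivity": 5.0, "brand_loyalty": 1.0},
    {"price_sensitivity": 1.0, "brand_loyalty": 5.0},
    {"price_sensitivity": 2.0, "brand_loyalty": 4.0},
]
assignments = [0, 0, 1, 1]  # cluster recorded for each respondent

profile = cluster_means(rows, assignments)
print(profile[0])  # {'price_sensitivity': 4.5, 'brand_loyalty': 1.5}
print(profile[1])  # {'price_sensitivity': 1.5, 'brand_loyalty': 4.5}
```

In this made-up case, cluster 0 could reasonably be named something like "bargain hunters" and cluster 1 "brand loyalists", which is the kind of naming the next step relies on.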

Step #3: Take Informed Action!

Comb through your data to identify differences in the means of factors, and name your clusters based on these differences. These differences between clusters are then able to inform your marketing, allowing you to target precise groups of customers with the right message, at the right time, in the right manner.

What Is Cluster Analysis?

Cluster analysis is a data analysis method that clusters (or groups) objects that are closely associated within a given data set. When performing cluster analysis, we assign characteristics (or properties) to each group. Then we create what we call clusters based on those shared properties. Thus, clustering is a process that organizes items into groups using unsupervised machine learning algorithms.

Cluster analysis is a useful and straightforward tool for understanding data patterns. The main goal of clustering is to identify the clusters and group them accordingly. We can also use cluster analysis to identify anomalies or outliers, which are cases that stand out from the rest of the data. We use anomalies mostly to identify areas or cases that need further investigation. For example, banks use anomaly detection to fight fraud.

When Is Cluster Analysis Useful?

Cluster analysis helps us understand data and detect patterns. In certain cases, it provides a great starting point for further analysis. In other cases, it can give you the greatest insights from the data. Here are some cases when cluster analysis is more appropriate than other methods like standard deviation or correlation.

Should I Use Cluster Analysis?

  • If you have large and unstructured data sets, it can be expensive and time-consuming to label groups manually. In this case, cluster analysis provides the best solution to divide your data into groups.
  • When you don’t know the number of clusters in advance, cluster analysis can provide the first insight into groups that are available in your data set.
  • When you need to detect outliers in your data, cluster analysis provides an effective method compared to traditional outlier detection methods, such as standard deviation.
  • Cluster analysis can help you detect anomalies. While outliers are observations distant from the mean, they don’t necessarily represent abnormalities. On the other hand, anomalies relate to identifying rare events or observations that deviate greatly from the mean.

Applications of Cluster Analysis 

Cluster analysis has applications in many disparate industries and fields. Here’s a list of some disciplines that make use of this methodology.

  • Marketing: Cluster analysis is popular in marketing, especially in customer segmentation. This method of analysis helps to both target customer segments and perform sales analysis by groups.
  • Business Operations: Businesses can optimize their processes and reduce costs by analyzing clusters and identifying similarities and differences between data points. For example, you can identify patterns in customer data and improve customer support processes for a particular group that may require special attention.
  • Earth Observation: Using a clustering algorithm, you can create a pixel mask for objects in an image. For example, you can use image segmentation to classify vegetation or built-up areas in a satellite image.
  • Data Science: We can use cluster analysis for predictive analytics. By applying machine learning techniques to clusters, we can create predictive models to make inferences about a particular data set.

Types of Clustering Methods

Centroid-based clustering and density-based clustering are two of the most widely used clustering methods.

Centroid-Based Clustering

This type of clustering calculates clusters based on a central point which may or may not be part of the data set. For centroid-based clustering, you can use the K-means clustering algorithm, which divides the data set into k clusters. Data points belong to the cluster with the nearest mean or cluster point.

Density-Based Clustering

Density-based clustering deals with the density of the data points. The clusters are tied to a threshold — a given number that indicates the minimum number of points in a given cluster radius. Density-based clustering is an effective way to identify noise and separate it from the clusters. The most widely used density-based clustering algorithm is density-based spatial clustering of applications with noise (DBSCAN).
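To show how the threshold and radius work together, here is a compact sketch of the DBSCAN idea in plain Python. The parameter names `eps` (the radius) and `min_pts` (the minimum number of points within that radius) are conventional but chosen here for illustration, and the points are made up. It is a teaching version and is not tuned for large data sets.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (5, 5) has too few neighbors within the radius, so it is labeled as noise rather than being forced into a cluster.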

Example of Cluster Analysis 

The following example shows you how to use the centroid-based clustering algorithm to cluster 30 different points into five groups. You can plot points on a two-dimensional graph, as shown in the graphs below. 

On the left, we have a random distribution of the 30 points. The first iteration of the K-means clustering divides the points into five groups, with each cluster represented by a different color, as shown in the center graph. 

The algorithm will then iteratively move the points from one cluster to another until the points are grouped optimally. The end result will be five distinct clusters, as shown in the graph on the right.

Cluster Analysis image of three graphs illustrating the process described by the author above.

What is cluster analysis? Overview and examples

Cluster analysis can be a powerful data-mining tool for any organisation or research project. Here we break down what it is, when it’s useful and why – with plenty of examples along the way.

What is cluster analysis?

Cluster analysis is a statistical method for processing data. It works by organising items into groups – or clusters – based on how closely associated they are.

Cluster analysis, like dimension reduction analysis ( factor analysis ), is concerned with data collection in which the variables have not been partitioned beforehand into criterion vs. predictor subsets.

The objective of cluster analysis is to find similar groups of subjects, where the “similarity” between each pair of subjects represents a unique characteristic of the group vs. the larger population/sample. Strong differentiation between groups is indicated through separate clusters; a single cluster indicates extremely homogeneous data.

Cluster analysis is an unsupervised learning technique, meaning that the data has no predefined group labels and you often don’t know how many clusters exist before running the model. Unlike many other statistical methods, cluster analysis is typically used when there is no assumption made about the likely relationships within the data. It provides information about where associations and patterns in data exist, but not what those might be or what they mean.

When should cluster analysis be used?

Cluster analysis is for when you’re looking to segment or categorise a dataset into groups based on similarities, but aren’t sure what those groups should be.

While it’s tempting to use cluster analysis in many different research projects, it’s important to know when it’s genuinely the right fit. Here are three of the most common scenarios where cluster analysis proves its worth.

Exploratory data analysis

When you have a new dataset and are in the early stages of understanding it, cluster analysis can provide a much-needed guide.

By forming clusters, you can get a read on potential patterns or trends that could warrant deeper investigation.

Market segmentation

This is a golden application for cluster analysis, especially in the business world. When you aim to target your products or services more effectively, understanding your customer base becomes paramount.

Cluster analysis can carve out specific customer segments based on buying habits, preferences or demographics, allowing for tailored marketing strategies that resonate more deeply.

Resource allocation

Be it in healthcare, manufacturing, logistics or many other sectors, resource allocation is often one of the biggest challenges. Cluster analysis can be used to identify which groups or areas require the most attention or resources, enabling more efficient and targeted deployment.

How is cluster analysis used?

The most common use of cluster analysis is classification. Subjects are separated into groups so that each subject is more similar to other subjects in its group than to subjects outside the group.

In a  market research  context, cluster analysis might be used to identify categories like age groups, earnings brackets, urban, rural or suburban location.

In marketing, cluster analysis can be used for  audience segmentation , so that different customer groups can be targeted with the most relevant messages.

Healthcare researchers might use cluster analysis to find out whether different geographical areas are linked with high or low levels of certain illnesses, so they can investigate possible local factors contributing to health problems.

Employers, on the other hand, could use cluster analysis to identify groups of employees who have similar feelings about workplace culture, job satisfaction or career development. With this data, HR departments can tailor their initiatives to better suit the needs of specific clusters, like offering targeted training programs or improving office amenities.

Whatever the application, data cleaning is an essential preparatory step for successful cluster analysis. Clustering works at a data-set level where every point is assessed relative to the others, so the data must be as complete as possible.

Cluster analysis in action: A step-by-step example

Here’s how an online bookstore used cluster analysis to transform its raw data into actionable insights.

Step one: Creating the objective

The bookstore’s aim is to provide more personalized book recommendations to its customers. The belief is that by curating book selections that will be more appealing to subgroups of its customers, the bookstore will see an increase in sales.

Step two: Using the right data

The bookstore has its own historical sales data, including two key variables: ‘favorite genre’, which includes categories like sci-fi, romance and mystery; and ‘average spend per visit’.

The bookstore opts to home in on these two factors as they are likely to provide the most actionable insights for personalized marketing strategies.

Step three: Choosing the best approach

After settling on the variables, the next decision is determining the right analytical approach.

The bookstore opts for K-means clustering for the ‘average spend per visit’ variable because it’s numerical – and therefore scalar data. For ‘favorite genre’, which is categorical – and therefore non-scalar data – they choose K-medoids.

Step four: Running the algorithm

With everything set, it’s time to crunch the numbers. The bookstore runs the K-means and K-medoids clustering algorithms to identify clusters within their customer base.

The aim is to create three distinct clusters, each encapsulating a specific customer profile based on their genre preferences and spending habits.

Step five: Validating the clusters

Once the algorithms have done their work, it’s important to check the quality of the clusters. For this, the bookstore looks at intracluster and intercluster distances.

A low intracluster distance means customers within the same group are similar, while a high intercluster distance ensures the groups are distinct from each other. In other words, the customers within each group are similar to one another, and the groups are distinct from one another.

Step six: Interpreting the results

Now that the clusters are validated, it’s time to dig into what they actually mean. Each cluster should represent a specific customer profile based solely on ‘favorite genre’ and ‘average spend per visit’.

For example, one cluster might consist of customers who are keen on sci-fi and tend to spend less than $20, while another cluster could be those who prefer romance novels and are in the $20-40 spending range.

Step seven: Applying the findings

The final step is all about action. Armed with this new understanding of their customer base, the bookstore can now tailor its marketing strategies.

Knowing what specific subgroups like to read and how much they’re willing to spend, the store can send out personalised book recommendations or offer special discounts to those specific clusters – aiming to increase sales and customer satisfaction.

Cluster analysis algorithms

Your choice of cluster analysis algorithm is important, particularly when you have mixed data. In major statistics packages you’ll find a range of preset algorithms ready to number-crunch your matrices.

K-means and K-medoids are two of the most suitable clustering methods. In both cases, K is the number of clusters.

k-means and k-medoids clustering

The K-means algorithm establishes the presence of clusters by finding their centroid points. A centroid point is the average of all the data points in the cluster. By iteratively assessing the Euclidean distance between each point in the dataset and the centroids, each point can be assigned to the cluster with the nearest centroid.

The centroid points are random to begin with and will change each time as the process is carried out. K-means is commonly used in cluster analysis, but it has a limitation in being mainly useful for scalar data.

K-medoids works in a similar way to K-means, but rather than using mean centroid points which don’t equate to any real points from the dataset, it establishes medoids, which are real interpretable data-points.

The K-medoids clustering algorithm offers an advantage for survey data analysis as it is suitable for both categorical and scalar data. This is because it is not tied to Euclidean distance: it can use any dissimilarity measure between the medoid point and its neighbours, including measures that combine a number of different categories or variables.

K-medoids is less common than K-means in clustering analysis, but is often used when a more robust method that’s less sensitive to outliers is needed.
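Here is a rough sketch of K-medoids on mixed data, loosely modelled on the bookstore example of genre and spend. Several things are assumptions made for illustration: the customer records, the simple alternating update (full implementations such as PAM use a more thorough swap search), and the `mixed_distance` function, which averages a genre mismatch with a range-scaled spend gap in the spirit of Gower distance.

```python
def mixed_distance(a, b, spend_range):
    """Average of a 0/1 genre mismatch and a range-scaled spend difference."""
    genre_part = 0.0 if a[0] == b[0] else 1.0
    spend_part = abs(a[1] - b[1]) / spend_range
    return (genre_part + spend_part) / 2

def k_medoids(items, medoids, dist, max_iter=100):
    """Alternate between assigning items and re-picking each cluster's medoid."""
    medoids = list(medoids)  # indices into items
    labels = []
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        labels = [
            min(range(len(medoids)), key=lambda k: dist(item, items[medoids[k]]))
            for item in items
        ]
        # Update step: the new medoid is the member closest to all the others.
        new_medoids = []
        for k, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:
                new_medoids.append(old)
                continue
            new_medoids.append(
                min(members,
                    key=lambda m: sum(dist(items[m], items[j]) for j in members))
            )
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return labels, medoids

# Hypothetical customers: (favorite genre, average spend per visit in dollars).
customers = [("sci-fi", 12.0), ("sci-fi", 15.0), ("sci-fi", 18.0),
             ("romance", 30.0), ("romance", 35.0), ("romance", 38.0)]
spend_range = max(s for _, s in customers) - min(s for _, s in customers)

def dist(a, b):
    return mixed_distance(a, b, spend_range)

labels, medoids = k_medoids(customers, medoids=[0, 3], dist=dist)
print(labels)                           # [0, 0, 0, 1, 1, 1]
print([customers[m] for m in medoids])  # [('sci-fi', 15.0), ('romance', 35.0)]
```

Each medoid is an actual customer record, so it can be read directly as a "typical" member of its cluster.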

Measuring clusters using intracluster and intercluster distances

Evaluating the quality of clustering involves a two-pronged approach: assessing intracluster and intercluster distances.

Intracluster distance  is the distance between the data points inside the cluster. If there is a strong clustering effect present, this should be small (more homogenous).

Intercluster distance  is the distance between data points in different clusters. Where strong clustering exists, these should be large (more heterogenous).

In an ideal clustering scenario, you’d use both measures to gauge how good your clusters are. Low intracluster distances – known as high intra-cluster similarity – mean items in the same cluster are similar, which is good; high intercluster distances – known as low inter-cluster similarity – mean different clusters are well-separated, which is also good.

Using both measures gives you a fuller picture of how effective your clustering is.
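Both measures can be computed with simple pairwise averages. The sketch below assumes Euclidean distance and averages over all pairs of points; other definitions (for example, distances between cluster centers) are also in use. The points and labels are made up to contrast a sensible grouping with a poor one.

```python
import math
from itertools import combinations

def mean_intracluster_distance(points, labels):
    """Average distance between pairs of points that share a cluster."""
    dists = [math.dist(p, q)
             for (p, lp), (q, lq) in combinations(zip(points, labels), 2)
             if lp == lq]
    return sum(dists) / len(dists)

def mean_intercluster_distance(points, labels):
    """Average distance between pairs of points in different clusters."""
    dists = [math.dist(p, q)
             for (p, lp), (q, lq) in combinations(zip(points, labels), 2)
             if lp != lq]
    return sum(dists) / len(dists)

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
poor_labels = [0, 1, 0, 1]  # pairs each point with a distant one

print(mean_intracluster_distance(points, good_labels))  # 2.0
print(mean_intracluster_distance(points, poor_labels))  # 10.0
```

With the good grouping, the intracluster distance is small and the intercluster distance is large. With the poor grouping the relationship flips, which is the warning sign to look for.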

Key considerations in cluster analysis

When getting started with cluster analysis, it makes sense to start with methods that assign each data point to a single, distinct cluster. It’s commonly accepted that within each cluster, the data points share similarities.

The assumption here is that your data set is composed of different, unordered classes, and that none of these classes are inherently more important than the others. In some cases, however, we may also view these classes as hierarchical in nature, with sub-classes within them – here we could apply hierarchical clustering and hierarchical cluster analysis.

Cluster analysis is often a “preliminary” step. That means before you even start, you’re not applying any previous judgments to split up your data; you’re working on the notion that natural clusters should exist within the data.

This initial approach differs from techniques like discriminant analysis, where you have a dependent variable guiding the classification. In cluster analysis, however, the focus is purely on inherent similarities within the data collection itself.

So, the key questions for cluster analysis would be:

  • What metrics will you use to measure the similarity between data points, and how will each variable be weighted when calculating this measure?
  • Once you’ve determined the similarities, what methods will you use to form the clusters?
  • After forming clusters, what descriptive metrics will help define the nature of each cluster?
  • Assuming you’ve adequately described your clusters, what can you infer about their statistical significance?


Non-scalar data in cluster analysis

So far, we’ve mainly talked about scalar data – things that differ from each other by degrees along a scale, such as numerical quantity or degree. But what about items that are non-scalar and can only be sorted into categories?

When you’re dealing with categories such as colour, species and shape, you can’t easily measure the “distance” between data points like you can with scalar data. Various techniques, like using dummy variables or specialised distance measures, can be employed to include non-scalar data in your cluster analysis.

Dummy variables are a way to convert categories into a format that can be provided to a mathematical model. For example, if you have a colour category with options like red, blue and green, you could create a separate “dummy” column for each colour, marking it as 1 if it applies and 0 if it doesn’t.
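The colour example can be sketched in plain Python as follows. The list of values and the `dummy_encode` helper are hypothetical, written only to show the idea. In practice a data library would normally do this step for you.

```python
# Hypothetical colour values for four respondents.
colours = ["red", "blue", "green", "blue"]


def dummy_encode(values, prefix="colour"):
    """Turn one categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [
        {f"{prefix}_{c}": int(v == c) for c in categories}
        for v in values
    ]
    return categories, rows


categories, dummy_rows = dummy_encode(colours)
for original, encoded in zip(colours, dummy_rows):
    print(original, encoded)
```

Each row ends up with exactly one column set to 1, so the encoded data can be passed to a distance calculation that expects numbers.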

Specialised distance measures, on the other hand, are custom calculations designed to figure out how “far apart” different categories are from each other. For example, if you’re clustering based on movie genres, a specialised measure might decide that “action” and “adventure” are closer to each other than “action” and “romance”.

Ideally, the data for cluster analysis is all of one type: categorical, interval or ordinal. Using a mix of these types can complicate the analysis, as you’ll need to figure out how to meaningfully compare different kinds of data. It’s doable, but it adds an extra layer of complexity you’ll need to account for.
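One way to compare records that mix numeric and categorical fields is a Gower-style distance. This technique is not named in the text above and is used here only as an illustration. Numeric differences are scaled by each variable's range, categorical fields contribute 0 when they match and 1 when they differ, and the results are averaged. The customer records and field names below are hypothetical.

```python
# Hypothetical customers with two numeric fields and one categorical field.
customers = [
    {"age": 25, "spend": 20.0, "genre": "action"},
    {"age": 27, "spend": 25.0, "genre": "action"},
    {"age": 60, "spend": 90.0, "genre": "romance"},
]
NUMERIC = ("age", "spend")
CATEGORICAL = ("genre",)


def feature_ranges(rows):
    """Range (max - min) of each numeric field, used for scaling."""
    return {f: max(r[f] for r in rows) - min(r[f] for r in rows) for f in NUMERIC}


def gower_distance(a, b, ranges):
    """Average of per-field dissimilarities, each between 0 and 1."""
    parts = []
    for f in NUMERIC:
        # Range-scaled absolute difference; 0 if the field never varies.
        parts.append(abs(a[f] - b[f]) / ranges[f] if ranges[f] else 0.0)
    for f in CATEGORICAL:
        # Simple matching: same category = 0, different category = 1.
        parts.append(0.0 if a[f] == b[f] else 1.0)
    return sum(parts) / len(parts)


ranges = feature_ranges(customers)
d_similar = gower_distance(customers[0], customers[1], ranges)
d_different = gower_distance(customers[0], customers[2], ranges)
print("similar pair:", round(d_similar, 3))
print("different pair:", round(d_different, 3))
```

A pairwise distance like this can then be fed to a method that accepts arbitrary dissimilarities, such as K-medoids or hierarchical clustering.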

Cluster analysis and factor analysis

When you’re dealing with a large number of variables – for example a lengthy or complex survey – it can be useful to simplify your data before performing cluster analysis so that it’s easier to work with. Using factors reduces the number of dimensions that you’re clustering on, and can result in clusters that are more reflective of the true patterns in the data.

Factor analysis is a technique for taking large numbers of variables and combining those that relate to the same underlying factor or concept, so that you end up with a smaller number of dimensions. For example, factor analysis might help you replace questions – like “Did you receive good service?”, “How confident were you in the agent you spoke to?” and “Did we resolve your query?” – with a single factor: customer satisfaction.

This way you can reduce messiness and complexity in your data and arrive more quickly at a manageable number of clusters.


Statology

Statistics Made Easy

5 Examples of Cluster Analysis in Real Life

Cluster analysis is a technique used in machine learning that attempts to find clusters of observations within a dataset.

The goal of cluster analysis is to find clusters such that the observations within each cluster are quite similar to each other, while observations in different clusters are quite different from each other.

The following examples show how cluster analysis is used in various real-life situations.

Example 1: Retail Marketing

Retail companies often use clustering to identify groups of households that are similar to each other.

For example, a retail company may collect the following information on households:

  • Household income
  • Household size
  • Head of household occupation
  • Distance from nearest urban area

They can then feed these variables into a clustering algorithm to perhaps identify the following clusters:

  • Cluster 1: Small family, high spenders
  • Cluster 2: Large family, high spenders
  • Cluster 3: Small family, low spenders
  • Cluster 4: Large family, low spenders

The company can then send personalized advertisements or sales letters to each household based on how likely they are to respond to specific types of advertisements.

Example 2: Streaming Services

Streaming services often use clustering analysis to identify viewers who have similar behavior.

For example, a streaming service may collect the following data about individuals:

  • Minutes watched per day
  • Total viewing sessions per week
  • Number of unique shows viewed per month

Using these metrics, a streaming service can perform cluster analysis to identify high-usage and low-usage users so that it knows which users to spend most of its advertising dollars on.
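A minimal K-means sketch for this kind of usage data might look like the following. The six users and their numbers are hypothetical. K-means normally starts from random centroids, but this sketch assumes fixed starting points (the first k rows) so that it is reproducible. The variables are standardized first so that minutes watched do not dominate the distance calculation.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical users: (minutes per day, sessions per week, unique shows per month)
usage = [
    (20, 3, 2),
    (180, 14, 9),
    (35, 4, 3),
    (210, 18, 12),
    (15, 2, 1),
    (160, 12, 8),
]


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]


def k_means(rows, k, max_iter=100):
    """Basic K-means; assumes the first k rows as starting centroids."""
    centroids = list(rows[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(row, centroids[j])) for row in rows
        ]
        if new_labels == labels:  # nothing changed, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


labels, centroids = k_means(standardize(usage), k=2)
for user, label in zip(usage, labels):
    print(user, "-> cluster", label)
```

With this data the light viewers and the heavy viewers fall into separate clusters. On real data you would typically try several random starts and more than one value of k.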

Example 3: Sports Science

Data scientists for sports teams often use clustering to identify players that are similar to each other. 

For example, professional basketball teams may collect the following information about players:

  • Points per game
  • Rebounds per game
  • Assists per game
  • Steals per game

They can then feed these variables into a clustering algorithm to identify players that are similar to each other so that they can have these players practice with each other and perform specific drills based on their strengths and weaknesses.

Example 4: Email Marketing

Many businesses use cluster analysis to identify consumers who are similar to each other so they can tailor the emails they send in a way that maximizes revenue.

For example, a business may collect the following information about consumers:

  • Percentage of emails opened
  • Number of clicks per email
  • Time spent viewing email

Using these metrics, a business can perform cluster analysis to identify consumers who use email in similar ways and tailor the types of emails and frequency of emails they send to different clusters of customers.

Example 5: Health Insurance

Actuaries at health insurance companies often use cluster analysis to identify “clusters” of consumers that use their health insurance in specific ways.

For example, an actuary may collect the following information about households:

  • Total number of doctor visits per year
  • Total household size
  • Total number of chronic conditions per household
  • Average age of household members

An actuary can then feed these variables into a clustering algorithm to identify households that are similar. The health insurance company can then set monthly premiums based on how often they expect households in specific clusters to use their insurance.

Additional Resources

The following tutorials explain how to perform various types of cluster analysis using statistical programming languages:

  • How to Perform K-Means Clustering in Python
  • How to Perform K-Means Clustering in R
  • How to Perform K-Medoids Clustering in R
  • How to Perform Hierarchical Clustering in R


Published by Zach

