Table of Contents

  • What Is Data Collection?
  • Why Do We Need Data Collection?
  • What Are the Different Data Collection Methods?
  • Data Collection Tools
  • The Importance of Ensuring Accurate and Appropriate Data Collection
  • Issues Related to Maintaining the Integrity of Data Collection
  • What Are Common Challenges in Data Collection?
  • What Are the Key Steps in the Data Collection Process?
  • Data Collection Considerations and Best Practices

What is Data Collection? Definition, Types, Tools, and Techniques

The process of gathering and analyzing accurate data from various sources to find answers to research problems, identify trends and probabilities, and evaluate possible outcomes is known as data collection. Knowledge is power, information is knowledge, and data is information in digitized form, at least as defined in IT. Hence, data is power. But before you can leverage that data into a successful strategy for your organization or business, you need to gather it. That’s your first step.

So, to help you get the process started, we shine a spotlight on data collection. What exactly is it? Believe it or not, it’s more than just doing a Google search! Furthermore, what are the different types of data collection? And what kinds of data collection tools and data collection techniques exist?

If you want to get up to speed on what the data collection process involves, you’ve come to the right place.


Data collection is the process of collecting and evaluating information or data from multiple sources to find answers to research problems, answer questions, evaluate outcomes, and forecast trends and probabilities. It is an essential phase in all types of research, analysis, and decision-making, including that done in the social sciences, business, and healthcare.

Accurate data collection is necessary to make informed business decisions, ensure quality assurance, and maintain research integrity.

During data collection, researchers must identify the data types, the sources of data, and the methods being used. We will soon see that there are many different data collection methods. Research, commercial, and government fields all rely heavily on data collection.

Before an analyst begins collecting data, they must answer three questions first:

  • What’s the goal or purpose of this research?
  • What kinds of data are they planning on gathering?
  • What methods and procedures will be used to collect, store, and process the information?

Additionally, we can break up data into qualitative and quantitative types. Qualitative data covers descriptions such as color, size, quality, and appearance. Quantitative data, unsurprisingly, deals with numbers, such as statistics, poll numbers, percentages, etc.
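As a minimal illustration (with hypothetical field names), a single survey record often mixes both kinds of data, and the two types are typically separated before analysis:

    # A hypothetical survey response mixing qualitative and quantitative fields
    response = {
        "favorite_color": "blue",        # qualitative: a description
        "product_rating": "excellent",   # qualitative: a category
        "age": 34,                       # quantitative: a count
        "monthly_spend_usd": 120.50,     # quantitative: a measurement
    }

    # Split the record by data type for different kinds of analysis
    qualitative = {k: v for k, v in response.items() if isinstance(v, str)}
    quantitative = {k: v for k, v in response.items() if isinstance(v, (int, float))}
    print(qualitative)   # {'favorite_color': 'blue', 'product_rating': 'excellent'}
    print(quantitative)  # {'age': 34, 'monthly_spend_usd': 120.5}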

Before a judge makes a ruling in a court case or a general creates a plan of attack, they must have as many relevant facts as possible. The best courses of action come from informed decisions, and information and data are synonymous.

The concept of data collection isn’t a new one, as we’ll see later, but the world has changed. There is far more data available today, and it exists in forms that were unheard of a century ago. The data collection process has had to change and grow with the times, keeping pace with technology.

Whether you’re in the world of academia, trying to conduct research, or part of the commercial sector, thinking of how to promote a new product, you need data collection to help you make better choices.

Now that you know what data collection is and why we need it, let's take a look at the different methods of data collection. While the phrase “data collection” may sound all high-tech and digital, it doesn’t necessarily entail things like computers, big data, and the internet. Data collection could mean a telephone survey, a mail-in comment card, or even some guy with a clipboard asking passersby some questions. But let’s see if we can sort the different data collection methods into a semblance of organized categories.

Primary and secondary methods of data collection are two approaches used to gather information for research or analysis purposes. Let's explore each data collection method in detail:

1. Primary Data Collection:

Primary data collection involves the collection of original data directly from the source or through direct interaction with the respondents. This method allows researchers to obtain firsthand information specifically tailored to their research objectives. There are various techniques for primary data collection, including:

a. Surveys and Questionnaires: Researchers design structured questionnaires or surveys to collect data from individuals or groups. These can be conducted through face-to-face interviews, telephone calls, mail, or online platforms.

b. Interviews: Interviews involve direct interaction between the researcher and the respondent. They can be conducted in person, over the phone, or through video conferencing. Interviews can be structured (with predefined questions), semi-structured (allowing flexibility), or unstructured (more conversational).

c. Observations: Researchers observe and record behaviors, actions, or events in their natural setting. This method is useful for gathering data on human behavior, interactions, or phenomena without direct intervention.

d. Experiments: Experimental studies involve the manipulation of variables to observe their impact on the outcome. Researchers control the conditions and collect data to draw conclusions about cause-and-effect relationships.

e. Focus Groups: Focus groups bring together a small group of individuals who discuss specific topics in a moderated setting. This method helps in understanding opinions, perceptions, and experiences shared by the participants.

2. Secondary Data Collection:

Secondary data collection involves using existing data collected by someone else for a purpose different from the original intent. Researchers analyze and interpret this data to extract relevant information. Secondary data can be obtained from various sources, including:

a. Published Sources: Researchers refer to books, academic journals, magazines, newspapers, government reports, and other published materials that contain relevant data.

b. Online Databases: Numerous online databases provide access to a wide range of secondary data, such as research articles, statistical information, economic data, and social surveys.

c. Government and Institutional Records: Government agencies, research institutions, and organizations often maintain databases or records that can be used for research purposes.

d. Publicly Available Data: Data shared by individuals, organizations, or communities on public platforms, websites, or social media can be accessed and utilized for research.

e. Past Research Studies: Previous research studies and their findings can serve as valuable secondary data sources. Researchers can review and analyze the data to gain insights or build upon existing knowledge.
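As a small, hedged sketch of working with secondary data in practice: a published dataset (for example, a CSV file released by a government agency) can be loaded and profiled in a few lines. The file name below is a placeholder, not a real dataset.

    import pandas as pd

    # Placeholder file name: substitute the actual published dataset you are using
    df = pd.read_csv("national_health_survey_2020.csv")

    print(df.shape)       # how many records and variables the original collectors recorded
    print(df.columns)     # which variables are available for secondary analysis
    print(df.describe())  # summary statistics for the numeric variables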

Now that we’ve explained the various techniques, let’s narrow our focus even further by looking at some specific tools. For example, we mentioned interviews as a technique, but we can further break that down into different interview types (or “tools”).

Word Association

The researcher gives the respondent a set of words and asks them what comes to mind when they hear each word.

Sentence Completion

Researchers use sentence completion to understand what kind of ideas the respondent has. This tool involves giving an incomplete sentence and seeing how the interviewee finishes it.

Role-Playing

Respondents are presented with an imaginary situation and asked how they would act or react if it was real.

In-Person Surveys

The researcher asks questions in person.

Online/Web Surveys

These surveys are easy to accomplish, but some users may be unwilling to answer truthfully, if at all.

Mobile Surveys

These surveys take advantage of the increasing proliferation of mobile technology. Mobile collection surveys rely on mobile devices like tablets or smartphones to conduct surveys via SMS or mobile apps.

Phone Surveys

No researcher can call thousands of people at once, so they need a third party to handle the chore. However, many people have call screening and won’t answer.

Observation

Sometimes, the simplest method is the best. Researchers who make direct observations collect data quickly and easily, with little intrusion or third-party bias. Naturally, it’s only effective in small-scale situations.

Accurate data collection is crucial to preserving the integrity of research, regardless of the field of study or the preferred way of defining data (quantitative or qualitative). Errors are less likely to occur when the right data-gathering tools are used, whether they are brand new, updated versions of existing tools, or already available.

The effects of incorrectly collected data include the following -

  • Erroneous conclusions that waste resources
  • Decisions that compromise public policy
  • Inability to answer research questions accurately
  • Harm to human or animal participants
  • Misleading other researchers into pursuing fruitless lines of inquiry
  • Findings that cannot be replicated or validated

While the degree of impact from flawed data collection may vary by discipline and the type of investigation, the potential for disproportionate harm is greatest when flawed study findings are used to support recommendations for public policy.

Let us now look at the various issues that we might face while maintaining the integrity of data collection.

The main justification for maintaining data integrity is to support the detection of errors in the data-gathering process, whether they were introduced intentionally (deliberate falsification) or unintentionally (systematic or random errors).

Quality assurance and quality control are two strategies that help protect data integrity and guarantee the scientific validity of study results.

Each strategy is used at various stages of the research timeline:

  • Quality assurance - activities that take place before data gathering begins
  • Quality control - activities that are performed both during and after data collection

Let us explore each of them in more detail now.

Quality Assurance

Because quality assurance takes place before data collection begins, its primary goal is "prevention" (i.e., forestalling problems with data collection). Prevention is the best way to protect the accuracy of data collection, and the clearest example of this proactive step is the uniformity of protocol created in a thorough and exhaustive procedures manual for data collection.

Poorly written manuals increase the likelihood of failing to spot problems and errors early in the research effort. These shortcomings can show up in several ways:

  • Failure to specify the exact subjects and methods for training or retraining the staff members who will collect data
  • Incomplete lists of the items to be collected
  • No system in place to track modifications to procedures that may occur as the investigation progresses
  • A vague description of the data collection instruments to be used instead of detailed, step-by-step instructions on administering tests
  • Uncertainty about the timing, procedure, and identity of the person or people responsible for examining the data
  • Incomprehensible guidelines for using, adjusting, and calibrating the data collection equipment

Now, let us look at how to ensure Quality Control.


Quality Control

Although quality control activities (detection/monitoring and intervention) take place both during and after data collection, their specifics should be meticulously detailed in the procedures manual. A clearly defined communication structure is a prerequisite for establishing monitoring systems. Once data collection problems are discovered, there should be no ambiguity about the flow of information between the principal investigators and staff. A poorly designed communication system promotes lax oversight and reduces opportunities for error detection.

Detection or monitoring can take the form of direct staff observation during site visits or conference calls, or of frequent and routine reviews of data reports to spot discrepancies, extreme values, or invalid codes. Site visits might not be appropriate for all disciplines. Still, without routine auditing of records, whether qualitative or quantitative, it will be challenging for investigators to confirm that data gathering is taking place in accordance with the methods defined in the manual. Additionally, quality control identifies the appropriate responses, or "actions," to fix flawed data gathering procedures and reduce recurrences.
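As a rough sketch of what such routine monitoring can look like when records are kept electronically, the checks below flag invalid codes, out-of-range values, and missing entries. The column names, valid codes, and ranges are hypothetical; in practice they would come from the procedures manual.

    import pandas as pd

    df = pd.read_csv("collected_responses.csv")  # hypothetical export of collected records

    # Valid codes and ranges would be defined in the procedures manual
    VALID_SEX_CODES = {"M", "F", "U"}
    AGE_RANGE = (18, 99)

    invalid_codes = df[~df["sex"].isin(VALID_SEX_CODES)]
    out_of_range = df[(df["age"] < AGE_RANGE[0]) | (df["age"] > AGE_RANGE[1])]
    missing_dates = df[df["survey_date"].isna()]

    # Discrepancies are reported back to the investigators for follow-up
    print(f"{len(invalid_codes)} rows with invalid sex codes")
    print(f"{len(out_of_range)} rows with out-of-range ages")
    print(f"{len(missing_dates)} rows missing a survey date")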

Examples of data collection problems that call for immediate action include:

  • Fraud or misbehavior
  • Systematic mistakes, procedure violations 
  • Individual data items with errors
  • Issues with certain staff members or a site's performance 

In the social and behavioral sciences, where primary data collection involves human subjects, researchers are trained to include one or more secondary measures that can be used to verify the quality of the information being obtained from the subject.

For instance, a researcher conducting a survey might be interested in learning more about the prevalence of risky behaviors among young adults as well as the social factors that influence the propensity for and frequency of those behaviors. Let us now explore the common challenges with regard to data collection.

There are some prevalent challenges faced while collecting data; let us explore a few of them to understand and avoid them.

Data Quality Issues

The main threat to the broad and successful application of machine learning is poor data quality. Data quality must be your top priority if you want to make technologies like machine learning work for you. Let's talk about some of the most prevalent data quality problems in this blog article and how to fix them.

Inconsistent Data

When working with various data sources, it's conceivable that the same information will have discrepancies between sources. The differences could be in formats, units, or occasionally spellings. The introduction of inconsistent data might also occur during firm mergers or relocations. Inconsistencies in data have a tendency to accumulate and reduce the value of data if they are not continually resolved. Organizations that have heavily focused on data consistency do so because they only want reliable data to support their analytics.
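A minimal sketch of resolving such inconsistencies before analysis, assuming two hypothetical sources that record the same fields with different spellings and units:

    import pandas as pd

    # Two hypothetical sources describing the same people in different formats
    source_a = pd.DataFrame({"country": ["USA", "U.K."], "height_cm": [178, 165]})
    source_b = pd.DataFrame({"country": ["United States", "United Kingdom"], "height_in": [70, 65]})

    # Standardize spellings and units before combining the sources
    source_a["country"] = source_a["country"].replace({"USA": "United States", "U.K.": "United Kingdom"})
    source_b["height_cm"] = source_b["height_in"] * 2.54
    source_b = source_b.drop(columns=["height_in"])

    combined = pd.concat([source_a, source_b], ignore_index=True)
    print(combined)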

Data Downtime

Data is the driving force behind the decisions and operations of data-driven businesses. However, there may be brief periods when their data is unreliable or not ready. Customer complaints and subpar analytical outcomes are only two of the ways this data unavailability can significantly impact businesses. A data engineer spends about 80% of their time updating, maintaining, and guaranteeing the integrity of the data pipeline. The lengthy operational lead time from data capture to insight also means a high marginal cost for answering the next business question.

Schema modifications and migration problems are just two examples of the causes of data downtime. Data pipelines can be difficult due to their size and complexity. Data downtime must be continuously monitored, and it must be reduced through automation.
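One very small example of such automated monitoring is a freshness check that alerts when the newest record in a table is older than an agreed window; the table and column names below are hypothetical:

    import pandas as pd

    # Hypothetical pipeline output with a timestamp recorded at load time
    df = pd.read_csv("pipeline_output.csv", parse_dates=["loaded_at"])

    FRESHNESS_WINDOW = pd.Timedelta(hours=6)
    latest = df["loaded_at"].max()

    # Alert if the newest record falls outside the agreed freshness window
    if pd.Timestamp.now() - latest > FRESHNESS_WINDOW:
        print(f"Possible data downtime: last load was at {latest}")
    else:
        print("Pipeline data is within the freshness window")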

Ambiguous Data

Even with thorough oversight, some errors can still occur in massive databases or data lakes. The issue becomes more overwhelming when data is streaming in at high speed. Spelling mistakes can go unnoticed, formatting difficulties can occur, and column headings can be misleading. Such ambiguous data can cause a number of problems for reporting and analytics.


Duplicate Data

Streaming data, local databases, and cloud data lakes are just a few of the sources of data that modern enterprises must contend with. They might also have application and system silos. These sources are likely to duplicate and overlap each other quite a bit. For instance, duplicate contact information has a substantial impact on customer experience. If certain prospects are ignored while others are engaged repeatedly, marketing campaigns suffer. The likelihood of biased analytical outcomes increases when duplicate data are present. It can also result in ML models with biased training data.
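A minimal sketch of deduplication, assuming a hypothetical contact table where the email address identifies a person:

    import pandas as pd

    contacts = pd.DataFrame({
        "email": ["ana@example.com", "ana@example.com", "raj@example.com"],
        "name":  ["Ana Silva", "Ana S.", "Raj Patel"],
    })

    # Treat rows sharing an email address as the same contact and keep the first occurrence
    deduped = contacts.drop_duplicates(subset=["email"], keep="first")
    print(f"Removed {len(contacts) - len(deduped)} duplicate contact(s)")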

Too Much Data

While we emphasize data-driven analytics and its advantages, too much data is itself a data quality problem. There is a risk of getting lost in an abundance of data when searching for information pertinent to your analytical efforts. Data scientists, data analysts, and business users devote 80% of their work to finding and organizing the appropriate data. As data volume increases, other data quality problems become more serious, particularly when dealing with streaming data and large files or databases.

Inaccurate Data

For highly regulated businesses like healthcare, data accuracy is crucial. Recent experience has made it more important than ever to improve data quality for COVID-19 and future pandemics. Inaccurate information does not give you a true picture of the situation and cannot be used to plan the best course of action. Personalized customer experiences and marketing strategies underperform if your customer data is inaccurate.

Data inaccuracies can be attributed to a number of things, including data degradation, human error, and data drift. Worldwide data decay occurs at a rate of about 3% per month, which is quite concerning. Data integrity can be compromised while data is being transferred between different systems, and data quality can deteriorate over time.
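Taking the roughly 3% monthly decay figure at face value, a quick back-of-the-envelope calculation shows how fast accuracy erodes if records are never refreshed:

    # Assumed decay rate of ~3% of records going stale per month
    monthly_decay = 0.03
    still_accurate_after_year = (1 - monthly_decay) ** 12
    print(f"About {still_accurate_after_year:.0%} of records remain accurate after a year")
    # About 69% of records remain accurate after a year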

Hidden Data

The majority of businesses use only a portion of their data, with the remainder sometimes lost in data silos or discarded in data graveyards. For instance, the customer service team might not receive client data from sales, missing an opportunity to build more precise and comprehensive customer profiles. Hidden data means missed opportunities to develop novel products, enhance services, and streamline processes.

Finding Relevant Data

Finding relevant data is not easy. There are several factors we need to consider while trying to find relevant data, including -

  • Relevant domain
  • Relevant demographics
  • Relevant time period
  • Many other factors, depending on the study

Data that is not relevant to our study on any of these factors is effectively obsolete, and we cannot proceed with its analysis. This can lead to incomplete research or analysis, repeated rounds of data collection, or shutting down the study.

Deciding the Data to Collect

Determining what data to collect is one of the most important steps in the process and should be decided at the outset. We must choose the subjects the data will cover, the sources we will use to gather it, and the quantity of information we will require. Our answers to these questions will depend on our aims, or what we expect to achieve using the data. As an illustration, we may choose to gather information on the categories of articles that website visitors between the ages of 20 and 50 most frequently access. We can also decide to compile data on the typical age of all clients who made a purchase from the business over the previous month.

Not addressing this can lead to duplicated effort, collection of irrelevant data, or the ruin of the study as a whole.
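Continuing the illustration above, once visit data with ages and article categories exists, the question "which categories do 20-to-50-year-olds read most?" is a short query; the file and column names here are hypothetical:

    import pandas as pd

    visits = pd.read_csv("site_visits.csv")  # hypothetical analytics export

    # Which article categories do visitors aged 20-50 access most often?
    target = visits[(visits["visitor_age"] >= 20) & (visits["visitor_age"] <= 50)]
    top_categories = target["article_category"].value_counts().head(5)
    print(top_categories)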

Dealing With Big Data

Big data refers to exceedingly large data sets with more intricate and diversified structures. These traits typically make storing and analyzing the data, and applying further methods of extracting results, more challenging. The term refers especially to data sets that are so enormous or intricate that conventional data processing tools are insufficient - the overwhelming amount of structured and unstructured data that a business faces on a daily basis.

The amount of data produced by healthcare applications, the internet, social networking sites, sensor networks, and many other businesses is rapidly growing as a result of recent technological advancements. Big data refers to the vast volume of data created from numerous sources in a variety of formats at extremely fast rates. Dealing with this kind of data is one of the many challenges of data collection and is a crucial step toward collecting effective data.

Low Response and Other Research Issues

Poor design and low response rates have been shown to be two issues with data collection, particularly in health surveys that use questionnaires. This might lead to an insufficient or inadequate supply of data for the study. Creating an incentivized data collection program might be beneficial in this case to get more responses.

Now, let us look at the key steps in the data collection process.

There are five key steps in the data collection process. They are explained briefly below -

1. Decide What Data You Want to Gather

The first thing that we need to do is decide what information we want to gather. We must choose the subjects the data will cover, the sources we will use to gather it, and the quantity of information that we would require. For instance, we may choose to gather information on the categories of products that an average e-commerce website visitor between the ages of 30 and 45 most frequently searches for. 

2. Establish a Deadline for Data Collection

The process of creating a strategy for data collection can now begin. We should set a deadline for our data collection at the outset of the planning phase. Some forms of data we might want to collect continuously; for instance, we might want to build a technique for tracking transactional data and website visitor statistics over the long term. However, if we are tracking data for a particular campaign, we will track it over a defined time frame. In these situations, we will have a schedule for when we will begin and finish gathering data.

3. Select a Data Collection Approach

At this stage, we will select the data collection technique that will serve as the foundation of our data gathering plan. To choose the best gathering strategy, we must take into account the type of information we wish to gather, the time frame over which we will receive it, and the other factors we have decided on.

4. Gather Information

Once our plan is complete, we can put it into action and begin gathering data. We can store and organize our data in a data management platform (DMP). We need to be careful to follow the plan and keep an eye on how it's progressing. Especially if we are collecting data regularly, it may be helpful to set up a timetable for checking in on how the data gathering is going. As circumstances change and we learn new details, we might need to amend the plan.

5. Examine the Information and Apply Your Findings

It's time to examine our data and arrange our findings after we have gathered all of our information. The analysis stage is essential because it transforms unprocessed data into insightful knowledge that can be applied to better our marketing plans, goods, and business judgments. The analytics tools included in our DMP can be used to assist with this phase. We can put the discoveries to use to enhance our business once we have discovered the patterns and insights in our data.

Let us now look at some data collection considerations and best practices that one might follow.

We must plan carefully before spending time and money traveling to the field to gather data. Effective data collection strategies can help us collect richer and more accurate data while saving time and resources.

Below, we will be discussing some of the best practices that we can follow for the best results -

1. Take Into Account the Price of Each Extra Data Point

Once we have decided on the data we want to gather, we need to make sure to take the expense of doing so into account. Our surveyors and respondents will incur additional costs for each additional data point or survey question.

2. Plan How to Gather Each Data Piece

There is a dearth of freely accessible data. Sometimes the data exists, but we may not have access to it. For instance, unless we have a compelling reason, we cannot openly view another person's medical records. Several types of information can also be challenging to measure.

Consider how time-consuming and difficult it will be to gather each piece of information while deciding what data to acquire.

3. Think About Your Choices for Data Collecting Using Mobile Devices

Mobile-based data collecting can be divided into three categories -

  • IVRS (interactive voice response system) - calls respondents and asks them pre-recorded questions.
  • SMS data collection - sends a text message to the respondent, who can then answer the questions by text on their phone.
  • Field surveyors - can enter data directly into an interactive questionnaire while speaking to each respondent, thanks to smartphone apps.

We need to make sure to select the appropriate tool for our survey and responders because each one has its own disadvantages and advantages.

4. Carefully Consider the Data You Need to Gather

It's all too easy to get information about anything and everything, but it's crucial to only gather the information that we require. 

It is helpful to consider these 3 questions:

  • What details will be helpful?
  • What details are available?
  • What specific details do you require?

5. Remember to Consider Identifiers

Identifiers, or details describing the context and source of a survey response, are just as crucial as the information about the subject or program that we are actually researching.

In general, adding more identifiers will enable us to pinpoint our program's successes and failures with greater accuracy, but moderation is the key.
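As a tiny illustration of the distinction, the record below separates hypothetical identifier fields (describing the context of the response) from the subject data itself:

    # Hypothetical survey record: identifiers describe the context of the response,
    # while the remaining fields describe the subject or program being studied
    record = {
        # identifiers
        "response_id": "r-000481",
        "collected_at": "2023-05-14T10:32:00Z",
        "surveyor_id": "field-07",
        "site": "Clinic B",
        # subject data
        "age": 29,
        "uses_program_services": True,
    }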

6. Data Collecting Through Mobile Devices is the Way to Go

Although collecting data on paper is still common, modern data collection relies heavily on mobile devices. They enable us to gather many different types of data at relatively lower cost, and they are both accurate and quick. With the boom in low-cost Android devices available today, there aren't many reasons not to pick mobile-based data collection.


1. What is data collection with example?

Data collection is the process of collecting and analyzing information on relevant variables in a predetermined, methodical way so that one can respond to specific research questions, test hypotheses, and assess results. Data collection can be either qualitative or quantitative. Example: A company collects customer feedback through online surveys and social media monitoring to improve their products and services.

2. What are the primary data collection methods?

As is well known, gathering primary data is costly and time intensive. The main techniques for gathering data are observation, interviews, questionnaires, schedules, and surveys.

3. What are data collection tools?

The term "data collecting tools" refers to the tools/devices used to gather data, such as a paper questionnaire or a system for computer-assisted interviews. Tools used to gather data include case studies, checklists, interviews, occasionally observation, surveys, and questionnaires.

4. What’s the difference between quantitative and qualitative methods?

While qualitative research focuses on words and meanings, quantitative research deals with figures and statistics. You can systematically measure variables and test hypotheses using quantitative methods. You can delve deeper into ideas and experiences using qualitative methodologies.

5. What are quantitative data collection methods?

While there are numerous ways to collect quantitative information, the methods indicated above (probability sampling, interviews, questionnaires, observation, and document review) are the most typical and frequently employed, whether collecting information offline or online.

6. What is mixed methods research?

User research that includes both qualitative and quantitative techniques is known as mixed methods research. For deeper user insights, mixed methods research combines insightful user data with useful statistics.

7. What are the benefits of collecting data?

Collecting data offers several benefits, including:

  • Knowledge and Insight
  • Evidence-Based Decision Making
  • Problem Identification and Solution
  • Validation and Evaluation
  • Identifying Trends and Predictions
  • Support for Research and Development
  • Policy Development
  • Quality Improvement
  • Personalization and Targeting
  • Knowledge Sharing and Collaboration

8. What’s the difference between reliability and validity?

Reliability is about consistency and stability, while validity is about accuracy and appropriateness. Reliability focuses on the consistency of results, while validity focuses on whether the results are actually measuring what they are intended to measure. Both reliability and validity are crucial considerations in research to ensure the trustworthiness and meaningfulness of the collected data and measurements.



Data Collection – Methods, Types and Examples


Definition:

Data collection is the process of gathering and collecting information from various sources to analyze and make informed decisions based on the data collected. This can involve various methods, such as surveys, interviews, experiments, and observation.

In order for data collection to be effective, it is important to have a clear understanding of what data is needed and what the purpose of the data collection is. This can involve identifying the population or sample being studied, determining the variables to be measured, and selecting appropriate methods for collecting and recording data.

Types of Data Collection

Types of Data Collection are as follows:

Primary Data Collection

Primary data collection is the process of gathering original and firsthand information directly from the source or target population. This type of data collection involves collecting data that has not been previously gathered, recorded, or published. Primary data can be collected through various methods such as surveys, interviews, observations, experiments, and focus groups. The data collected is usually specific to the research question or objective and can provide valuable insights that cannot be obtained from secondary data sources. Primary data collection is often used in market research, social research, and scientific research.

Secondary Data Collection

Secondary data collection is the process of gathering information from existing sources that have already been collected and analyzed by someone else, rather than conducting new research to collect primary data. Secondary data can be collected from various sources, such as published reports, books, journals, newspapers, websites, government publications, and other documents.

Qualitative Data Collection

Qualitative data collection is used to gather non-numerical data such as opinions, experiences, perceptions, and feelings, through techniques such as interviews, focus groups, observations, and document analysis. It seeks to understand the deeper meaning and context of a phenomenon or situation and is often used in social sciences, psychology, and humanities. Qualitative data collection methods allow for a more in-depth and holistic exploration of research questions and can provide rich and nuanced insights into human behavior and experiences.

Quantitative Data Collection

Quantitative data collection is used to gather numerical data that can be analyzed using statistical methods. This data is typically collected through surveys, experiments, and other structured data collection methods. Quantitative data collection seeks to quantify and measure variables, such as behaviors, attitudes, and opinions, in a systematic and objective way. This data is often used to test hypotheses, identify patterns, and establish correlations between variables. Quantitative data collection methods allow for precise measurement and generalization of findings to a larger population. It is commonly used in fields such as economics, psychology, and natural sciences.

Data Collection Methods

Data Collection Methods are as follows:

Surveys

Surveys involve asking questions to a sample of individuals or organizations to collect data. Surveys can be conducted in person, over the phone, or online.

Interviews

Interviews involve a one-on-one conversation between the interviewer and the respondent. Interviews can be structured or unstructured and can be conducted in person or over the phone.

Focus Groups

Focus groups are group discussions that are moderated by a facilitator. Focus groups are used to collect qualitative data on a specific topic.

Observation

Observation involves watching and recording the behavior of people, objects, or events in their natural setting. Observation can be done overtly or covertly, depending on the research question.

Experiments

Experiments involve manipulating one or more variables and observing the effect on another variable. Experiments are commonly used in scientific research.
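As a minimal sketch of the core mechanic, an experiment might randomly assign participants to a treatment group (which sees the manipulated variable) and a control group (which does not):

    import random

    participants = [f"p{i:03d}" for i in range(1, 101)]
    random.shuffle(participants)

    # Manipulate one variable: half the participants see the new version, half see the old one
    treatment, control = participants[:50], participants[50:]
    print(len(treatment), "participants in treatment,", len(control), "in control")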

Case Studies

Case studies involve in-depth analysis of a single individual, organization, or event. Case studies are used to gain detailed information about a specific phenomenon.

Secondary Data Analysis

Secondary data analysis involves using existing data that was collected for another purpose. Secondary data can come from various sources, such as government agencies, academic institutions, or private companies.

How to Collect Data

The following are some steps to consider when collecting data:

  • Define the objective : Before you start collecting data, you need to define the objective of the study. This will help you determine what data you need to collect and how to collect it.
  • Identify the data sources : Identify the sources of data that will help you achieve your objective. These sources can be primary sources, such as surveys, interviews, and observations, or secondary sources, such as books, articles, and databases.
  • Determine the data collection method : Once you have identified the data sources, you need to determine the data collection method. This could be through online surveys, phone interviews, or face-to-face meetings.
  • Develop a data collection plan : Develop a plan that outlines the steps you will take to collect the data. This plan should include the timeline, the tools and equipment needed, and the personnel involved.
  • Test the data collection process: Before you start collecting data, test the data collection process to ensure that it is effective and efficient.
  • Collect the data: Collect the data according to the plan you developed in step 4. Make sure you record the data accurately and consistently.
  • Analyze the data: Once you have collected the data, analyze it to draw conclusions and make recommendations.
  • Report the findings: Report the findings of your data analysis to the relevant stakeholders. This could be in the form of a report, a presentation, or a publication.
  • Monitor and evaluate the data collection process: After the data collection process is complete, monitor and evaluate the process to identify areas for improvement in future data collection efforts.
  • Ensure data quality: Ensure that the collected data is of high quality and free from errors. This can be achieved by validating the data for accuracy, completeness, and consistency.
  • Maintain data security: Ensure that the collected data is secure and protected from unauthorized access or disclosure. This can be achieved by implementing data security protocols and using secure storage and transmission methods.
  • Follow ethical considerations: Follow ethical considerations when collecting data, such as obtaining informed consent from participants, protecting their privacy and confidentiality, and ensuring that the research does not cause harm to participants.
  • Use appropriate data analysis methods : Use appropriate data analysis methods based on the type of data collected and the research objectives. This could include statistical analysis, qualitative analysis, or a combination of both.
  • Record and store data properly: Record and store the collected data properly, in a structured and organized format. This will make it easier to retrieve and use the data in future research or analysis.
  • Collaborate with other stakeholders : Collaborate with other stakeholders, such as colleagues, experts, or community members, to ensure that the data collected is relevant and useful for the intended purpose.
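As a minimal sketch of how the planning steps above can be made concrete, a small machine-readable plan (all field names illustrative) can be checked for completeness before fieldwork begins:

    # Illustrative data collection plan, reviewed before any fieldwork begins
    plan = {
        "objective": "Measure satisfaction of customers who purchased in the last month",
        "data_sources": ["online survey", "CRM purchase records"],
        "method": "online questionnaire",
        "timeline": {"start": "2023-06-01", "end": "2023-06-30"},
        "variables": ["age", "purchase_amount", "satisfaction_score"],
        "personnel": ["survey coordinator", "data analyst"],
    }

    # A simple pre-collection check that the plan covers the required elements
    required = {"objective", "data_sources", "method", "timeline", "variables"}
    missing = required - plan.keys()
    print("Plan is complete" if not missing else f"Plan is missing: {missing}")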

Applications of Data Collection

Data collection methods are widely used in different fields, including social sciences, healthcare, business, education, and more. Here are some examples of how data collection methods are used in different fields:

  • Social sciences : Social scientists often use surveys, questionnaires, and interviews to collect data from individuals or groups. They may also use observation to collect data on social behaviors and interactions. This data is often used to study topics such as human behavior, attitudes, and beliefs.
  • Healthcare : Data collection methods are used in healthcare to monitor patient health and track treatment outcomes. Electronic health records and medical charts are commonly used to collect data on patients’ medical history, diagnoses, and treatments. Researchers may also use clinical trials and surveys to collect data on the effectiveness of different treatments.
  • Business : Businesses use data collection methods to gather information on consumer behavior, market trends, and competitor activity. They may collect data through customer surveys, sales reports, and market research studies. This data is used to inform business decisions, develop marketing strategies, and improve products and services.
  • Education : In education, data collection methods are used to assess student performance and measure the effectiveness of teaching methods. Standardized tests, quizzes, and exams are commonly used to collect data on student learning outcomes. Teachers may also use classroom observation and student feedback to gather data on teaching effectiveness.
  • Agriculture : Farmers use data collection methods to monitor crop growth and health. Sensors and remote sensing technology can be used to collect data on soil moisture, temperature, and nutrient levels. This data is used to optimize crop yields and minimize waste.
  • Environmental sciences : Environmental scientists use data collection methods to monitor air and water quality, track climate patterns, and measure the impact of human activity on the environment. They may use sensors, satellite imagery, and laboratory analysis to collect data on environmental factors.
  • Transportation : Transportation companies use data collection methods to track vehicle performance, optimize routes, and improve safety. GPS systems, on-board sensors, and other tracking technologies are used to collect data on vehicle speed, fuel consumption, and driver behavior.

Examples of Data Collection

Examples of Data Collection are as follows:

  • Traffic Monitoring: Cities collect real-time data on traffic patterns and congestion through sensors on roads and cameras at intersections. This information can be used to optimize traffic flow and improve safety.
  • Social Media Monitoring : Companies can collect real-time data on social media platforms such as Twitter and Facebook to monitor their brand reputation, track customer sentiment, and respond to customer inquiries and complaints in real-time.
  • Weather Monitoring: Weather agencies collect real-time data on temperature, humidity, air pressure, and precipitation through weather stations and satellites. This information is used to provide accurate weather forecasts and warnings.
  • Stock Market Monitoring : Financial institutions collect real-time data on stock prices, trading volumes, and other market indicators to make informed investment decisions and respond to market fluctuations in real-time.
  • Health Monitoring : Medical devices such as wearable fitness trackers and smartwatches can collect real-time data on a person’s heart rate, blood pressure, and other vital signs. This information can be used to monitor health conditions and detect early warning signs of health issues.

Purpose of Data Collection

The purpose of data collection can vary depending on the context and goals of the study, but generally, it serves to:

  • Provide information: Data collection provides information about a particular phenomenon or behavior that can be used to better understand it.
  • Measure progress : Data collection can be used to measure the effectiveness of interventions or programs designed to address a particular issue or problem.
  • Support decision-making : Data collection provides decision-makers with evidence-based information that can be used to inform policies, strategies, and actions.
  • Identify trends : Data collection can help identify trends and patterns over time that may indicate changes in behaviors or outcomes.
  • Monitor and evaluate : Data collection can be used to monitor and evaluate the implementation and impact of policies, programs, and initiatives.

When to use Data Collection

Data collection is used when there is a need to gather information or data on a specific topic or phenomenon. It is typically used in research, evaluation, and monitoring and is important for making informed decisions and improving outcomes.

Data collection is particularly useful in the following scenarios:

  • Research : When conducting research, data collection is used to gather information on variables of interest to answer research questions and test hypotheses.
  • Evaluation : Data collection is used in program evaluation to assess the effectiveness of programs or interventions, and to identify areas for improvement.
  • Monitoring : Data collection is used in monitoring to track progress towards achieving goals or targets, and to identify any areas that require attention.
  • Decision-making: Data collection is used to provide decision-makers with information that can be used to inform policies, strategies, and actions.
  • Quality improvement : Data collection is used in quality improvement efforts to identify areas where improvements can be made and to measure progress towards achieving goals.

Characteristics of Data Collection

Data collection can be characterized by several important characteristics that help to ensure the quality and accuracy of the data gathered. These characteristics include:

  • Validity : Validity refers to the accuracy and relevance of the data collected in relation to the research question or objective.
  • Reliability : Reliability refers to the consistency and stability of the data collection process, ensuring that the results obtained are consistent over time and across different contexts.
  • Objectivity : Objectivity refers to the impartiality of the data collection process, ensuring that the data collected is not influenced by the biases or personal opinions of the data collector.
  • Precision : Precision refers to the degree of accuracy and detail in the data collected, ensuring that the data is specific and accurate enough to answer the research question or objective.
  • Timeliness : Timeliness refers to the efficiency and speed with which the data is collected, ensuring that the data is collected in a timely manner to meet the needs of the research or evaluation.
  • Ethical considerations : Ethical considerations refer to the ethical principles that must be followed when collecting data, such as ensuring confidentiality and obtaining informed consent from participants.

Advantages of Data Collection

There are several advantages of data collection that make it an important process in research, evaluation, and monitoring. These advantages include:

  • Better decision-making : Data collection provides decision-makers with evidence-based information that can be used to inform policies, strategies, and actions, leading to better decision-making.
  • Improved understanding: Data collection helps to improve our understanding of a particular phenomenon or behavior by providing empirical evidence that can be analyzed and interpreted.
  • Evaluation of interventions: Data collection is essential in evaluating the effectiveness of interventions or programs designed to address a particular issue or problem.
  • Identifying trends and patterns: Data collection can help identify trends and patterns over time that may indicate changes in behaviors or outcomes.
  • Increased accountability: Data collection increases accountability by providing evidence that can be used to monitor and evaluate the implementation and impact of policies, programs, and initiatives.
  • Validation of theories: Data collection can be used to test hypotheses and validate theories, leading to a better understanding of the phenomenon being studied.
  • Improved quality: Data collection is used in quality improvement efforts to identify areas where improvements can be made and to measure progress towards achieving goals.

Limitations of Data Collection

While data collection has several advantages, it also has some limitations that must be considered. These limitations include:

  • Bias : Data collection can be influenced by the biases and personal opinions of the data collector, which can lead to inaccurate or misleading results.
  • Sampling bias : Data collection may not be representative of the entire population, resulting in sampling bias and inaccurate results.
  • Cost : Data collection can be expensive and time-consuming, particularly for large-scale studies.
  • Limited scope: Data collection is limited to the variables being measured, which may not capture the entire picture or context of the phenomenon being studied.
  • Ethical considerations : Data collection must follow ethical principles to protect the rights and confidentiality of the participants, which can limit the type of data that can be collected.
  • Data quality issues: Data collection may result in data quality issues such as missing or incomplete data, measurement errors, and inconsistencies.
  • Limited generalizability : Data collection may not be generalizable to other contexts or populations, limiting the generalizability of the findings.



Data Collection: What It Is, Methods & Tools + Examples


Let’s face it, no one wants to make decisions based on guesswork or gut feelings. The most important objective of data collection is to ensure that the data gathered is reliable and packed to the brim with juicy insights that can be analyzed and turned into data-driven decisions. There’s nothing better than good statistical analysis.


Collecting high-quality data is essential for conducting market research, analyzing user behavior, or just trying to get a handle on business operations. With the right approach and a few handy tools, gathering reliable and informative data is well within reach.

So, let’s get ready to collect some data because when it comes to data collection, it’s all about the details.

What is Data Collection?

Data collection is the procedure of collecting, measuring, and analyzing accurate insights for research using standard validated techniques.

Put simply, data collection is the process of gathering information for a specific purpose. It can be used to answer research questions, make informed business decisions, or improve products and services.

To collect data, we must first identify what information we need and how we will collect it. We can also evaluate a hypothesis based on collected data. In most cases, data collection is the primary and most important step for research. The approach to data collection is different for different fields of study, depending on the required information.


There are many ways to collect information when doing research. The data collection methods that the researcher chooses will depend on the research question posed. Some data collection methods include surveys, interviews, tests, physiological evaluations, observations, reviews of existing records, and biological samples. Let’s explore them.


Data Collection Methods

Phone vs. Online vs. In-Person Interviews

Essentially there are four choices for data collection – in-person interviews, mail, phone, and online. There are pros and cons to each of these modes.

  • In-person interviews - Pros: In-depth and a high degree of confidence in the data. Cons: Time-consuming, expensive, and can be dismissed as anecdotal.
  • Mail - Pros: Can reach anyone and everyone - no barrier. Cons: Expensive, data collection errors, lag time.
  • Phone - Pros: High degree of confidence in the data collected, reach almost anyone. Cons: Expensive, cannot self-administer, need to hire an agency.
  • Online - Pros: Cheap, can self-administer, very low probability of data errors. Cons: Not all your customers might have an email address/be on the internet, and customers may be wary of divulging information online.

In-person interviews are always better, but the big drawback is the trap you might fall into if you don’t do them regularly. Conducting interviews regularly is expensive, and not conducting enough interviews might give you false positives. Validating your research is almost as important as designing and conducting it.

We’ve seen many instances where, if the results of completed research do not match up with the “gut feel” of upper management, they are dismissed as anecdotal and a “one-time” phenomenon. To avoid such traps, we strongly recommend that data collection be done on an ongoing and regular basis.


This will help you compare and analyze the change in perceptions according to marketing for your products/services. The other issue here is sample size. To be confident with your research, you must interview enough people to weed out the fringe elements.
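As a rough guide to "enough people," the standard sample size formula for estimating a proportion shows how many completed responses are needed for a given margin of error (the figures below assume 95% confidence and the most conservative proportion of 0.5):

    import math

    z = 1.96        # z-score for 95% confidence
    p = 0.5         # most conservative assumed proportion
    margin = 0.05   # desired margin of error (+/- 5 percentage points)

    # Sample size for estimating a proportion: n = z^2 * p * (1 - p) / margin^2
    n = (z ** 2) * p * (1 - p) / margin ** 2
    print(math.ceil(n))  # about 385 completed responses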

A couple of years ago there was a lot of discussion about online surveys and their statistical analysis plan. The fact that not every customer had internet connectivity was one of the main concerns.


Although some of the discussions are still valid, the reach of the internet as a means of communication has become vital in the majority of customer interactions. According to the US Census Bureau, the number of households with computers has doubled between 1997 and 2001.


In 2001, nearly 50% of households had a computer. Nearly 55% of all households with an income of more than $35,000 have internet access, which jumps to 70% for households with an annual income of $50,000. This data is from the US Census Bureau for 2001.

There are primarily three modes of data collection that can be employed to gather feedback - mail, phone, and online. The method actually used for data collection is really a cost-benefit analysis. There is no slam-dunk solution; you need to weigh the risks and advantages associated with each medium.

Keep in mind that reach, in this comparison, is defined as “All U.S. Households.” In most cases, you need to look at how many of your customers are online to determine your actual reach. If all your customers have email addresses, you have 100% reach of your customers.

Another important thing to keep in mind is the ever-increasing dominance of cellular phones over landline phones. United States FCC rules prevent automated dialing of cellular phone numbers, and there is a noticeable trend toward people having a cellular phone as their only voice communication device.

This makes it impossible to reach cellular-only customers who are dropping home phone lines in favor of going entirely wireless. Even if automated dialing is not used, another FCC rule prohibits phoning anyone who would have to pay for the call.


Multi-Mode Surveys

Multi-mode surveys, in which the data is collected via different modes (online, paper, phone, etc.), are another way to go. It is fairly straightforward to run an online survey and have data-entry operators enter data from the phone and paper surveys into the same system. That system can also be used to collect data directly from respondents.


Data collection is an important aspect of research. Let’s consider an example of a mobile manufacturer, company X, which is launching a new product variant. To conduct research about features, price range, target market, competitor analysis, etc., data has to be collected from appropriate sources.

The marketing team can conduct various data collection activities such as online surveys or focus groups.

The survey should ask the right questions about features and pricing, such as “What are the top 3 features you expect from an upcoming product?”, “How much are you likely to spend on this product?” or “Which competitors provide similar products?”

For conducting a focus group, the marketing team should decide the participants and the mediator. The topic of discussion and objective behind conducting a focus group should be clarified beforehand to conduct a conclusive discussion.

Data collection methods are chosen depending on the available resources. For example, conducting questionnaires and surveys would require the least resources, while focus groups require moderately high resources.

Feedback is a vital part of any organization’s growth. Whether you conduct regular focus groups to elicit information from key players, or your account manager calls up all your marquee accounts to find out how things are going, these are all processes to see things through your customers’ eyes: How are we doing? What can we do better?

Online surveys are just another medium to collect feedback from your customers, employees, and anyone else your business interacts with. With the advent of do-it-yourself tools for online surveys, data collection on the internet has become easy, cheap, and effective.


It is a well-established marketing fact that acquiring a new customer is 10 times more difficult and expensive than retaining an existing one. This is one of the fundamental driving forces behind the extensive adoption and interest in CRM and related customer retention tactics.

In a research study conducted by Rice University professor Dr. Paul Dholakia and Dr. Vicki Morwitz, published in Harvard Business Review, the authors found that simply asking customers how an organization was performing proved, by itself, to be an effective customer retention strategy.

In the study, conducted over the course of a year, one set of customers was sent a satisfaction and opinion survey and the other set was not surveyed. Over the following year, the group that took the survey had twice as many people continuing and renewing their loyalty toward the organization.


The research study offered a couple of interesting reasons, rooted in consumer psychology, for this phenomenon:

  • Satisfaction surveys appeal to the customer’s desire to be coddled and induce positive feelings. This stems from the part of human psychology that wants to “appreciate” a product or service it already likes or prefers. The survey feedback collection method is simply a medium for conveying this: the survey is a vehicle to “interact” with the company and reinforces the customer’s commitment to the company.
  • Surveys may increase awareness of auxiliary products and services. Surveys can be considered modes of both inbound and outbound communication. They are generally thought of as a data collection and analysis tool, but most people are unaware that consumer surveys can also serve as a medium for distributing information. A few caveats are important here:
    a. In most countries, including the US, “selling under the guise of research” is illegal.
    b. However, we all know that some information is inevitably conveyed while collecting information.
    c. Disclaimers may be included in the survey to ensure respondents are aware of this. For example: “We will collect your opinion and inform you about products and services that have come online in the last year…”
  • Induced judgments: The very act of asking people for feedback can prompt them to form an opinion on something they otherwise would not have thought about. This is a subtle yet powerful effect, comparable to the “product placement” strategy used to market products in mass media such as movies and television shows – one example being the extensive and exclusive use of the Mini Cooper in the blockbuster movie “The Italian Job.” This strategy is questionable and should be used with great caution.

Surveys should be considered a critical tool in the customer journey dialog. The best thing about surveys is their ability to carry “bi-directional” information. The research conducted by Paul Dholakia and Vicki Morwitz shows that surveys not only get you information that is critical for your business, but also enhance and build upon the relationship you have established with your customers.

Recent technological advances have made it incredibly easy to conduct real-time surveys and opinion polls. Online tools make it easy to frame questions and answers and create surveys on the web. Distributing surveys via email, website links, or even integration with online CRM tools like Salesforce.com has made online surveying a quick-win solution.

So, you’ve decided to conduct an online survey. There are a few questions in your mind that you would like answered, and you are looking for a fast and inexpensive way to find out more about your customers, clients, etc.

First and foremost, you need to decide what the objectives of the study are. Ensure that you can phrase these objectives as questions or measurements. If you can’t, you are better off looking at other data sources such as focus groups and other qualitative methods. The data collected via online surveys is predominantly quantitative in nature.

Review the basic objectives of the study. What are you trying to discover? What actions do you want to take as a result of the survey? Answers to these questions help in validating the collected data. Online surveys are just one way of collecting and quantifying data.


  • Visualize all of the relevant information items you would like to have. What will the output survey research report look like? What charts and graphs will be prepared? What information do you need to be assured that action is warranted?
  • Assign ranks to each topic (1, 2, and so on) according to priority, listing the most important topics first. Revisit these items to ensure that the objectives, topics, and information you need are appropriate. Remember, you can’t solve the research problem if you ask the wrong questions.
  • How easy or difficult is it for the respondent to provide information on each topic? If it is difficult, is there an alternative way to gain insights by asking a different question? This is probably the most important step. Online surveys have to be precise, clear, and concise. Given the nature of the internet and the distractions involved, if your questions are too difficult to understand, the survey dropout rate will be high.
  • Create an unbiased sequence for the topics. Make sure that the questions asked first do not bias the results of the subsequent questions. Sometimes providing too much information, or disclosing the purpose of the study, can create bias. Once you have decided on a series of topics, you have the basic structure of the survey. It is always advisable to add an introductory paragraph before the survey to explain the project objective and what is expected of the respondent. It is also sensible to include a thank-you note as well as information about where to find the results of the survey when they are published.
  • Page breaks – The attention span of respondents can be very low when it comes to a long scrolling survey, so add page breaks wherever appropriate. That said, a single question per page can also hamper response rates, as it increases the time to complete the survey and raises the chances of dropout.
  • Branching – Create smart and effective surveys by implementing branching wherever required. Eliminate text such as “If you answered No to Q1, then answer Q4” – it annoys respondents and increases survey dropout rates. Design online surveys using branching logic so that appropriate questions are automatically routed based on previous responses (see the sketch after this list).
  • Write the questions. Initially, write more survey questions than you need, and then keep only those best suited to the survey. Divide the survey into sections so that respondents do not get confused by a long list of questions.
  • Sequence the questions so that they are unbiased.
  • Repeat all of the steps above to find any major holes. Are the questions really answered? Have someone review it for you.
  • Time the length of the survey. A survey should take less than five minutes. At three to four research questions per minute, you are limited to about 15 questions; one open-ended text question counts as three multiple-choice questions. Most online survey tools will record the time respondents take to answer questions.
  • Include a few open-ended survey questions that support your survey objective. These will form a short feedback survey.
  • Send the project survey by email to your test group, and then email them the feedback survey afterward.
  • This way, your test group can provide their opinion about the functionality as well as the usability of your project survey by filling out the feedback survey.
  • Make changes to your questionnaire based on the received feedback.
  • Send the survey out to all your respondents!
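To make the branching idea from the list above concrete, here is a minimal sketch of routing logic in Python. The survey, question ids, and answer options are all hypothetical, and real survey tools implement this for you; the point is simply how an answer can determine the next question automatically.

```python
# Minimal sketch of survey branching: each question can map an answer to the id
# of the next question, so respondents are routed automatically instead of
# reading "If you answered No to Q1, go to Q4".
questions = {
    "q1": {"text": "Have you bought cat food online before?",
           "branch": {"No": "q4"}},   # non-buyers skip the usage questions
    "q2": {"text": "How often do you order per month?"},
    "q3": {"text": "Which brand did you order most recently?"},
    "q4": {"text": "What would make you try ordering online?"},
}
order = ["q1", "q2", "q3", "q4"]       # default sequence when no branch applies


def next_question(current_id, answer):
    """Return the id of the next question, honoring any branch for this answer."""
    branch = questions[current_id].get("branch", {})
    if answer in branch:
        return branch[answer]
    idx = order.index(current_id)
    return order[idx + 1] if idx + 1 < len(order) else None


print(next_question("q1", "No"))   # q4 -- branch fires
print(next_question("q1", "Yes"))  # q2 -- default sequence
```

A “No” to the first question routes the respondent straight to the relevant follow-up, with no skip instructions cluttering the question text.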

Online surveys have, over the course of time, evolved into an effective alternative to expensive mail or telephone surveys. However, you must be aware of a few conditions that need to be met for online surveys. If you are trying to survey a sample representing the target population, please remember that not everyone is online.

Moreover, not everyone is receptive to an online survey either. Younger demographic segments are generally more inclined to respond to an online survey.


Good survey design is crucial for accurate data collection. From question-wording to response options, let’s explore how to create effective surveys that yield valuable insights with our tips to survey design.

  • Writing Great Questions for data collection

Writing great questions can be considered an art. Art always requires a significant amount of hard work, practice, and help from others.

The questions in a survey need to be clear, concise, and unbiased. A poorly worded question or a question with leading language can result in inaccurate or irrelevant responses, ultimately impacting the data’s validity.

Moreover, the questions should be relevant and specific to the research objectives. Questions that are irrelevant or do not capture the necessary information can lead to incomplete or inconsistent responses too.

  • Avoid loaded or leading words or questions

A small change in wording can produce markedly different results. Words such as could, should, and might are all used for almost the same purpose, but may produce a 20% difference in agreement with a question. For example, “The management could… should… might… have shut the factory.”

Intense words representing control or action, such as prohibit, produce similar effects. For example, “Do you believe Donald Trump should prohibit insurance companies from raising rates?”

Sometimes the content is just biased. For instance, “You wouldn’t want to go to Rudolpho’s Restaurant for the organization’s annual party, would you?”

  • Misplaced questions

Questions should always be asked in the intended context, and questions placed out of order or without a clear purpose should be avoided. Generally, a funnel approach should be used: generic questions go in the initial section of the questionnaire as a warm-up, specific questions follow, and demographic or geographic questions come toward the end.

  • Mutually non-overlapping response categories

Multiple-choice answer options should be mutually exclusive so that they provide distinct choices. Overlapping answer options frustrate the respondent and make interpretation difficult at best. Questions should also always be precise.

For example: “Do you like orange juice?”

This question is vague. In what terms is the liking for orange juice to be rated – sweetness, texture, price, nutrition, etc.?

  • Avoid the use of confusing/unfamiliar words

Asking about industry-related terms such as caloric content, bits, bytes, MBS, and other jargon and acronyms can confuse respondents. Ensure that the audience understands your language level, terminology, and, above all, the question you are asking.

  • Non-directed questions give respondents excessive leeway

In survey design for data collection, non-directed questions can give respondents excessive leeway, which can lead to vague and unreliable data. These types of questions are also known as open-ended questions, and they do not provide any structure for the respondent to follow.

For instance, a non-directed question like “ What suggestions do you have for improving our shoes?” can elicit a wide range of answers, some of which may not be relevant to the research objectives. Some respondents may give short answers, while others may provide lengthy and detailed responses, making comparing and analyzing the data challenging.

To avoid these issues, it’s essential to ask direct questions that are specific and have a clear structure. Closed-ended questions, for example, offer structured response options and can be easier to analyze as they provide a quantitative measure of respondents’ opinions.

  • Never force questions

There will always be questions that cross privacy lines. Since privacy is an important issue for most people, these questions should either be eliminated from the survey or not be made mandatory. Survey questions about income, family income, status, and religious and political beliefs should always be handled with care, as they are considered intrusive and respondents can choose not to answer them.

  • Unbalanced answer options in scales

Unbalanced answer options in scales such as the Likert scale and the semantic differential scale may be appropriate in some situations and biased in others. For example, when analyzing a pattern in eating habits, one study used a quantity scale that placed obese people in the middle of the scale, with the polar ends representing starvation and an irrationally large amount to consume. There are also cases where poor performance is rarely expected – hospital service, for instance – and an unbalanced scale may then be the better fit.

  • Questions that cover two points

In survey design for data collection, questions that cover two points can be problematic for several reasons. These types of questions are often called “double-barreled” questions and can cause confusion for respondents, leading to inaccurate or irrelevant data.

For instance, a question like “Do you like the food and the service at the restaurant?” covers two points, the food and the service, and it assumes that the respondent has the same opinion about both. If the respondent only liked the food, their opinion of the service could affect their answer.

It’s important to ask one question at a time to avoid confusion and ensure that the respondent’s answer is focused and accurate. This also applies to questions with multiple concepts or ideas. In these cases, it’s best to break down the question into multiple questions that address each concept or idea separately.

  • Dichotomous questions

Dichotomous questions are used when you want a distinct answer, such as Yes/No or Male/Female. For example, the question “Do you think this candidate will win the election?” can only be answered Yes or No.

  • Avoid the use of long questions

Long questions increase completion time, which generally leads to a higher survey dropout rate. Multiple-choice questions are the longest and most complex, while open-ended questions are the shortest and easiest to answer.

Data collection is an essential part of the research process, whether you’re conducting scientific experiments, market research, or surveys. The methods and tools used for data collection will vary depending on the research type, the sample size required, and the resources available.

Several data collection methods include surveys, observations, interviews, and focus groups. We learn each method has advantages and disadvantages, and choosing the one that best suits the research goals is important.

With the rise of technology, many tools are now available to facilitate data collection, including online survey software and data visualization tools. These tools can help researchers collect, store, and analyze data more efficiently and with greater accuracy.

By understanding the various methods and tools available for data collection, we can develop a solid foundation for conducting research. With these research skills , we can make informed decisions, solve problems, and contribute to advancing our understanding of the world around us.

Analyze your survey data to gauge in-depth market drivers, including competitive intelligence, purchasing behavior, and price sensitivity, with QuestionPro.

You will obtain accurate insights with various techniques, including conjoint analysis, MaxDiff analysis, sentiment analysis, TURF analysis, heatmap analysis, etc. Export quality data to external in-depth analysis tools such as SPSS and R Software, and integrate your research with external business applications. Everything you need for your data collection. Start today for free!





5 Collecting Data in Your Classroom

ESSENTIAL QUESTIONS

  • What sort of methodological considerations are necessary to collect data in your educational context?
  • What methods of data collection will be most effective for your study?
  • What are the affordances and limitations associated with your data collection methods?
  • What does it mean to triangulate data, and why is it necessary?

As you develop an action plan for your action research project, you will be thinking about the primary task of conducting research, and probably contemplating the data you will collect. It is likely you have asked yourself questions related to the methods you will be using, how you will organize the data collection, and how each piece of data is related within the larger project. This chapter will help you think through these questions.

Data Collection

The data collection methods used in educational research have originated from a variety of disciplines (anthropology, history, psychology, sociology), which has resulted in a variety of research frameworks to draw upon. As discussed in the previous chapter, the challenge for educator-researchers is to develop a research plan and related activities that are focused and manageable to study. While human beings like structure and definitions, especially when we encounter new experiences, educators frequently set aside the accepted research frameworks and rely on subjective knowledge drawn from their own pedagogical experiences when taking on the role of educator-researcher in educational settings. Relying on this subjective knowledge enables teachers to engage more effectively as researchers in their educational context and to modify their data collection methodologies accordingly. Subjective knowledge negotiates the traditional research frameworks with the data collection possibilities of their practice, while also considering their unique educational context. This empowers educators, as researchers utilizing action research, to be powerful agents for change in educational contexts.

Thinking about Types of Data

Whether the research design is qualitative, quantitative, or mixed-methods, it will determine the methods or ways you use to collect data. Qualitative research designs focus on collecting data that is relational, interpretive, subjective, and inductive, whereas a typical quantitative study collects data that is deductive, statistical, and objective.

Qualitative data is often in the form of language, while quantitative data typically involves numbers. Quantitative researchers require large numbers of participants for validity, while qualitative researchers use a smaller number of participants and can even use a single participant (Hatch, 2002). In the past, quantitative and qualitative educational researchers rarely interacted, sometimes held each other’s work in contempt, and even published articles in separate journals based on their distinct theoretical orientations toward data collection. Overall, there is now a greater appreciation for both quantitative and qualitative approaches, with scholars finding distinct value in each, yet in many circles the debate continues over which approach is more beneficial for educational research and in educational contexts.

The goal of qualitative data collection is to build a complex and nuanced description of social or human problems from multiple perspectives. The flexibility and ability to use a variety of data collection techniques reflect a distinct stance on research. Qualitative researchers are able to capture conversations and everyday language, as well as situational attitudes and beliefs. Qualitative data collection can be fitted to the study, with the goal of collecting the most authentic data, not necessarily the most objective. To researchers who strictly use quantitative methods, qualitative methods may seem wholly unstructured, eclectic, and idiosyncratic; however, for qualitative researchers these characteristics are advantageous to their purpose. Quantitative research depends upon structure and is bounded by the need to find relationships among variables and units of measurement. Quantitative research helps make sense of large amounts of data. Both quantitative and qualitative research help us address education challenges by better identifying what is happening, with the goal of identifying why it is happening and how we can address it.

Most educator-researchers who engage in research projects in schools and classrooms utilize qualitative methodologies for their data collection. Educator-researchers also use mixed methods that focus on qualitative methods, but also use quantitative methods, such as surveys, to provide a multidimensional approach to inquiring about their topic. While qualitative methods may feel more comfortable, there is a methodological rationale for using quantitative research.

Research methodologists use two distinct forms of logic to describe research: induction and deduction. Inductive approaches are focused on developing new or emerging theories, by explaining the accumulation of evidence that provides meaning to similar circumstances. Deductive approaches move in the opposite direction, and create meaning about a particular situation by reasoning from a general idea or theory about the particular circumstances. While qualitative approaches are inductive – observe and then generate theories, for example – qualitative researchers will typically initiate studies with some preconceived notions of potential theories to support their work.

Flexible Research Design

A researcher’s decisions about data collection and activities involve personal choice, yet the choice of data sources must be responsive to the proposed project and topic. Logically, researchers will use whatever validated methods help them to address the issue they are researching and will develop a research plan around activities to implement those methods. While a research plan is important to conducting valid research in schools and classrooms, a research plan should also be flexible in design, allowing data to emerge and the best data for addressing the research questions to be found. In this way, a research plan is recommended, but data collection methods are not always known in advance. As you, the educator-researcher, interact with participants, you may find it necessary to continue the research with additional data sources to better address the question at the center of your research. When educators are both researchers and participants in their study, it is especially important to keep an open mind about the wide range of research methodologies. All in all, educator-researchers should understand that there are varied and multiple paths to move from research questions to addressing those questions.

Mixed Methods

As mentioned above, mixed methods is the use of both qualitative and quantitative methods. Researchers generally use mixed methods to clarify findings from the initial method of data collection, and the approach gives the educator-researcher increased flexibility in data collection. Mixed-methods studies often combine precise measurements (e.g., grades, test scores, surveys) with in-depth qualitative data that gives those measurements meaningful detail. The key advantage is that quantitative details sharpen qualitative conclusions: vague terms such as usually, some, or most can be replaced with a number or quantity, such as a percentage, an average, or the mean, median, and/or mode. One challenge for educator-researchers is that mixed methods require more time and resources to complete the study, and more familiarity with both qualitative and quantitative data collection methods.
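As a small illustration of that substitution, the sketch below turns a vague claim like “most students improved” into percentages and averages using Python’s standard library; the score changes are invented for the example.

```python
# Replace "most students improved" with concrete figures.
from statistics import mean, median, mode

# Hypothetical change in test scores (post minus pre) for ten students.
score_changes = [5, 8, 0, 12, 7, 7, -2, 10, 7, 4]

improved = sum(1 for change in score_changes if change > 0)
print(f"{improved / len(score_changes):.0%} of students improved")  # 80% of students improved
print("mean change:", mean(score_changes))          # 5.8
print("median change:", median(score_changes))      # 7.0
print("most common change:", mode(score_changes))   # 7
```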

Mixed methods in educator research, even if quantitative methods are only used minimally, provide an opportunity to clarify findings, fill gaps in understanding, and cross-check data. For example, if you are looking at the use of math journals to better engage students and improve their math scores, it would be helpful to understand their abilities in math and reading before analyzing the math journals. Therefore, looking at their test scores might give you some nuanced understanding of why some students improved more than others after using the math journals. Pre- and post-surveys would also provide valuable information in terms of students’ attitudes and beliefs about math and writing. In line with thinking about pre- and post-surveys, some researchers suggest using either qualitative or quantitative approaches in different phases of the research process. In the previous example, pre- and post test scores may quantitatively demonstrate growth or improvement after implementing the math journal; however, the qualitative data would provide detailed evidence as to why the math journals contributed to growth or improvement in math. Quantitative methods can establish relationships among variables, while qualitative methods can explain factors underlying those same relationships.

I caution the reader at this point to not simply think of qualitative methodologies as anecdotal details to quantitative reports. I only highlight mixed methods to introduce the strength of such studies, and to aid in moving educational research methodology away from the binary thinking of quantitative vs. qualitative. In thinking about data collection, possible data sources include questionnaires or surveys, observations (video or written notes), collaboration (meetings, peer coaching), interviews, tests and records, pictures, diaries, transcripts of video and audio recordings, personal journals, student work samples, e-mail and online communication, and any other pertinent documents and reports. As you begin to think about data collection you will consider the available materials and think about aspects discussed in the previous chapter: who, what, where, when, and how. Specifically:

  • Who are the subjects or participants for the study?
  • What data is vital evidence for this study?
  • Where will the data be collected?
  • When will the data be collected?
  • How will the data be collected?

If you find you are having trouble identifying data sources that support your initial question, you may need to revise your research question – and make sure what you are asking is researchable or measurable. The research question can always change throughout the study, but it should only change in relation to the data being collected.

Participant Data

As an educator, your possible participant selection pool is narrower than most researchers encounter; however, it is important to be clear about participants’ role in the data design and collection. A study can involve one participant or multiple participants, and participants often serve as the primary source of data in the research process. Most studies by educator-researchers utilize purposeful sampling – in other words, they select participants who will be able to provide the most relevant information for the study. The study design therefore relies upon the participants and the information they can provide. What follows is a description of some data collection methods, which include: surveys or questionnaires, individual or group interviews, observations, field notes or diaries, narratives, documents, and elicitation.

Surveys, or questionnaires, are a research instrument frequently used to gather data about participants’ feelings, beliefs, and attitudes with regard to the research topic or activities. Surveys are often used for large sample sizes with the intent of generalizing from a sample population to a larger population. Surveys can be used with any number of participants and can be administered at different times during the study, such as pre-activity and post-activity, with the same participants to determine whether changes have occurred over the course of the activity, or simply over time. Researchers like surveys and questionnaires as an instrument because they can be distributed and collected easily – especially with all of the recent online application possibilities (e.g., Google, Facebook, etc.). Surveys come in several forms: closed-ended, open-ended, or a mix of the two. Closed-ended surveys are typically multiple-choice questions or scales (e.g., 1–5, most likely to least likely) that allow participants to rate or select a response for each question. These responses can easily be tabulated into meaningful number representations, like percentages. For example, Likert scales are often used with a five-point range, with options such as strongly agree, agree, neutral, disagree, and strongly disagree. Open-ended surveys consist of prompts for participants to add their own perspectives in short answer or limited word responses. Open-ended surveys are not always as easy to tabulate, but can provide more detail and description.
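As a quick sketch of how closed-ended responses become percentages, the snippet below tallies a handful of hypothetical Likert answers; the responses and their ordering are made up for illustration.

```python
# Tabulate closed-ended (Likert) responses into percentages.
from collections import Counter

responses = ["agree", "strongly agree", "neutral", "agree", "disagree",
             "agree", "strongly agree", "neutral", "agree", "strongly disagree"]

counts = Counter(responses)
total = len(responses)
for option in ["strongly agree", "agree", "neutral", "disagree", "strongly disagree"]:
    print(f"{option:>17}: {counts.get(option, 0) / total:.0%}")
# strongly agree 20%, agree 40%, neutral 20%, disagree 10%, strongly disagree 10%
```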

Interviews and Focus Groups

Interviews are frequently used by researchers because they often produce some of the most worthwhile data. Interviews allow researchers to obtain candid verbal perspectives through structured or semi-structured questioning. Interview questions, either structured or semi-structured, are related to the research question or research activities to gauge the participants’ thoughts, feelings, motivations, and reflections. Some research relies on interviewing as the primary data source, but most often interviews are used to strengthen and support other data sources. Interviews can be time consuming, but interviews are worthwhile in that you can gather richer and more revealing information than other methods that could be utilized (Koshy, 2010). Lincoln and Guba (1985) identified five outcomes of interviewing:

Outcomes of Interviewing

  • Here and now explanations;
  • Reconstructions of past events and experiences;
  • Projections of anticipated experiences;
  • Verification of information from other sources (triangulation);
  • Verification of constructions developed by the researcher (member checking) (p. 268).

As mentioned above, interviews typically take two forms: structured and semi-structured. In terms of interviews, structured means that the researcher identifies a certain number of questions, in a prescribed sequence, and the researcher asks each participant these questions in the same order. Structured interviews qualitatively resemble surveys and questionnaires because they are consistent, easy to administer, provide direct responses, and make tabulation and analysis more consistent. Structured interviews use an interview protocol to organize questions, and maintain consistency.

Semi-structured interviews have a prescribed set of questions and protocol, just like structured interviews, but the researcher does not have to follow those questions or order explicitly. The researcher should ask the same questions to each participant for comparison reasons, but semi-structured interviews allow the researcher to ask follow-up questions that stray from the protocol. The semi-structured interview is intended to allow for new, emerging topics to be obtained from participants. Semi-structured questions can be included in more structured protocols, which allows for the participant to add additional information beyond the formal questions and for the researcher to return to preplanned formal questions after the participant responds. Participants can be interviewed individually or collectively, and while individual interviews are time-consuming, they can provide more in-depth information.

When considering more than two participants for an interview, researchers will often use a focus group format. Focus group interviews typically involve three to ten participants and seek to gain socially dependent perspectives or organizational viewpoints. When using focus group interviews with students, researchers often find them beneficial because they allow students’ reflections and ideas to build off of each other. This is important because oftentimes students feel shy or hesitant to share their ideas with adults, but once another student sparks or confirms their idea, belief, or opinion, they are more willing to share. Focus group interviews are very effective as pre- and post-activity data sources. Researchers can use either a structured or semi-structured interview protocol for focus group interviews; however, with multiple participants it may be difficult to maintain the integrity of a structured protocol.

Observations

One of the simplest, and most natural, forms of data collection is to engage in formal observation. Observing humans in a setting provides us contextual understanding of the complexity of human behavior and interrelationships among groups in that setting. If a researcher wants to examine the ways teachers approach a particular area of pedagogical practice, then observation would be a viable data collection tool. Formal observations are truly unique and allow the researcher to collect data that cannot be obtained through other data sources. Ethnography is a qualitative research design that provides a descriptive account based on researchers’ observations and explorations to examine the social dynamics present in cultures and social systems – which includes classrooms and schools. Taken from anthropology, the ethnographer uses observations and detailed note taking, along with other forms of mapping or making sense of the context and relationships within. For Creswell (2007), several guidelines provide structure to an observation:

Structuring Observations

  • Identify what to observe
  • Determine the role you will assume — observer or participant
  • Design observational protocol for recording notes
  • Record information such as physical situation, particular events and activities
  • Thank participants and inform them of the use of and their accessibility to the data (pp. 132– 134)

As an educator-researcher, you may take on a role that exceeds that of an observer and participate as a member of the research setting. In this case, the data sources would be called participant observation to clearly identify the degree of involvement you have in the study. In participant observation, the researcher embeds themselves in the actions of the participants. It is important to understand that participant observation will provide completely different data, in comparison to simply observing someone else. Ethnographies, or studies focused completely on observation as a data source, often extend longer than other data sources, ranging from several months to even years. Extended time provides the researcher the ability to obtain more detailed and accurate information, because it takes time to observe patterns and other details that are significant to the study. Self-study is another consideration for educators, if they want to use observation and be a participant observer. They can use video and audio recordings of their activities to use as data sources and use those as the source of observation.

Field Diaries and Notes

Utilizing a field diary, or keeping field notes, can be a very effective and practical data collection method. In purpose, a field diary or notes keep a record of what happens during the research activities. They can be useful in tracking how and why your ideas and the research process evolved. Many educators keep daily notes about their classes, and in many ways this is a more focused and narrower version of documenting the daily happenings of a class. A field diary or notes can also serve as an account of your reflections and commentary on your study, and can be a starting place for your data analysis and interpretations. Field diaries or notes are typically valuable when researchers begin to write about their project because they allow them to draw upon their authentic voice. The reflective process that a diary represents can also serve as an additional layer of professional learning for researchers. The format and length of a field diary or notes will vary depending on the researcher and the topic; however, the ultimate goal should be to facilitate data collection and analysis.

Data narratives and stories are a fairly new form of formalized data. While researchers have collected bits and pieces of narratives in other forms of data, asking participants to compose a narrative (either written, spoken, or performed) as a whole allows researchers to examine how participants embrace the complexities of the context and social interactions. Humans are programmed to engage with and share narratives to develop meaningful and experiential knowledge. Educator autobiographies bring to life personal stories shaped by knowledge, values, and feelings that developed from their classroom experiences. Narrative data includes three primary areas: temporality, sociality, and place (Clandinin & Connelly, 2000). In terms of temporality, narratives have a past, present, and future because stories are time-based and transitional. Sociality highlights the social relationships in narratives as well as personal and moral dispositions. Place includes the spaces where the narratives happen. Furthermore, bell hooks (1991) notes that narratives, or storytelling, as inquiry can be a powerful way to study how contexts are influenced by power structures, often linking and intersecting the structural dynamics of social class, race, and gender to highlight the struggle.

Documents provide a way to collect data that is unobtrusive to the participant. Documents are unobtrusive data because they are collected without modifying or distracting the research context. Educational settings maintain records on all sorts of activities in schools: content standards, state mandates, student discipline records, student attendance, student assessments, performance records, parental engagement, records of how teachers spend PTO money, etc. Documents often provide background and contextual material, offering a snapshot of school policies, demographic information, ongoing records over a period of time, and contextual details from the site of the research study. Documents can be characterized, similarly to historical research, as primary and secondary. Examples of primary materials are first-hand sources from someone in the educational context, such as minutes from a school board or faculty meeting, photographs, video recordings, and letters. Examples of secondary sources typically include analyses or interpretations of a primary source by others, such as texts, critiques, and reviews. Both types of sources are especially valuable in action research.

Elicitation Methods

We have talked about several methods of data collection that each have useful ways of documenting, inquiring, and thinking about the research question. However, how does a researcher engage participants in ways that allow them to demonstrate what they know, feel, think, or believe? Asking participants directly about their thinking, feeling, or beliefs will only take you so far depending on the comfort and rapport the participant has with the researcher. There are always a variety of hurdles in extracting participants’ knowledge. Even the manner in which questions are framed and the way researchers use materials in the research process are equally important in getting participants to provide reliable, comparable, and valid responses. Furthermore, all individuals who participate in research studies vary in their ability to recall and report what they know, and this affects the value of traditional data collection, especially structured and semi-structured interviewing. In particular, participants’ knowledge or other thinking of interest may be implicit and difficult for them to explicate in simple discussion.

Elicitation methods help researchers uncover unarticulated participant knowledge through a potential variety of activities. Researchers will employ elicitation methods and document the participants’ actions and typically the description of why they took those particular actions. Educators may be able to relate the process of elicitation methods to a “think aloud” activity in which the researcher wants to record or document the activity. Elicitation methods can take many forms. What follows are some basic ideas and formats for elicitation methods.

Brainstorming/Concept Map

Most educators are probably familiar with the process of brainstorming or creating a concept map. These can be very effective elicitation methods when the researcher asks the participant to create a concept map or representation of brainstorming, and then asks the participant to explain the connections between concepts or ideas on the brainstorming or concept map.

Sorting

Sorting provides an engaging way to gather data from your participants. Sorting, as you can imagine, involves participants sorting, grouping, or categorizing objects or photographs in meaningful ways. Once participants have sorted the objects or photographs, the researcher records or documents the participant explaining why they sorted or grouped them the way they did. As a former history teacher, I would often use sorting to assess my students’ understanding of related concepts and events in a world history class, using pictures as the means for students to sort and demonstrate what they understood from the unit. For a broader discussion of elicitation techniques in history education, see Barton (2015).

Listing/Ranking

Listing can be an effective way to examine participants’ thinking about a topic. Researchers can have participants construct a list in many different ways to fit the focus of the study and then have the participants explain their list. For example, if an educator was studying middle school students’ perceptions of careers, they could ask them to complete three lists: careers most in demand, careers requiring the most education/training, and careers of most interest.

Then, once participants have filled out the lists, the most important part is documenting them explaining their thinking, and why they filled out the lists the way they did. As you may imagine, in this example, every participant would have a list that is different based on their personal interests.

Researchers can also elicit responses by simply giving participants a prompt, and then asking them to recall whatever they know about that prompt. Researchers will have the participants do this in some sort of demonstrative activity. For example, at the end of a world history course, I might ask students to explain what “culture” means to them and to explain their thinking.

Re-articulation (writing or drawing)

A unique way to engage participants in elicitation methods is to have them write about, rewrite, or draw visual representations of either life experiences or literature that they have read. For example, you could ask them to rewrite a part of the literature they did not like, add a part they thought should be there, or simply extend the ending. Participants can either write or draw these re-articulations. I find that drawing works just as well because, again, the goal is to have the participant describe their thinking based on the activity.

Scenario Decision-Making

Elicitation methods can also examine skills. Researchers can provide participants scenarios and ask them to make decisions. The researchers can document those decisions and analyze the extent to which the participant understands the skill.

Document, Photograph, or Video Analysis

This is the most basic form of elicitation: the researcher provides a document, photograph, or video for the participant to examine, and then asks questions about the participant’s interpretations of it. One method that supports this sort of elicitation is to ask participants to provide images from their everyday worlds – for example, asking students to document the literacy examples in their homes (i.e., pictures of calendars, bookshelves, etc.). With the availability of one-to-one technology, such as iPads, participant documentation is easier than ever.

There are many more methods of data collection as well, along with many variations of the methods described above. The goal is to find the data collection methods that will give you the best data to answer your research question. If you are unsure, there is nothing wrong with collecting more data than you need to make sure you use effective methods – the only thing you have to lose is time!

Use of Case Studies

Case studies are a popular way of studying phenomena in their settings using qualitative methodology. Case studies typically encompass qualitative studies that look closely at what happens when researchers collect data, analyze the data, and present the results. Case studies can focus on a single case or examine a phenomenon across multiple cases. Case studies frame research in a way that allows for rich description of data and depth of analysis.

An advantage of using case study design is that the reader often identifies with the case or phenomena, as well as the participants in the study. Yin (2003) describes case study methodology as inquiry that investigates a contemporary phenomenon within its authentic context. Case studies are particularly appropriate when the boundaries and relationship between the phenomenon and the context are not clear. Case studies relate well with the processes involved in action research. Critics of action research case studies sometimes criticize the inevitable subjectivity, just like general criticisms of action research. Case studies provide researchers opportunities to explore both the how and the why of phenomena in context, while being both exploratory and descriptive.

We want to clarify the differences between methodologies and methods of research. There are methodologies of research, like case study and action research, and methods of data collection. Methodologies like ethnography, narrative inquiry, and case study draw from some similar methods of data collecting that include interviews, collection of artifacts (writings, drawings, images), and observations. The differences between the methodologies include the time-frame for research; the boundaries of the research; and the epistemology.

Triangulation of Data

Triangulation is a method used by qualitative researchers to check and establish trustworthiness in their studies by using and analyzing multiple (three or more) data collection methods to address a research question and develop a consistency of evidence from data sources or approaches. Thus, triangulation facilitates trustworthiness of data through cross verification of evidence, to support claims, from more than two data collection sources. Triangulation also tests the consistency of findings obtained through different data sources and instruments, while minimizing bias in the researcher’s interpretations of the data.

If we think about the example of studying the use of math journals in an elementary classroom, the researcher would want to collect at least three sources of data – the journal prompts, assessment scores, and interviews. When the researcher is analyzing the data, they will want to find themes or evidence across all three data sources to address their research question. In a very basic analysis, if the students demonstrated a deeper level of reflection about math in the journals, their assessment scores improved, and their interviews demonstrated they had more confidence in their number sense and math abilities – then, the researcher could conclude, on a very general level, that math journals improved their students’ math skills, confidence, or abilities. Ideally, the study would examine specific aspects of math to enable deeper analysis of math journals, but this example demonstrates the basic idea of triangulation. In this example, all of the data provided evidence that the intervention of a math journal improved students’ understanding of math, and the three data sources provided trustworthiness for this claim.

Data Collection Checklist

  • Based on your research question, what data might you need?
  • What are the multiple ways you could collect that data?
  • How might you document this data, or organize it so that it can be analyzed?
  • What methods are most appropriate for your context and timeframe?
  • How much time will your data collection require? How much time can you allow for?
  • Will you need to create any data sources (e.g., interview protocol, elicitation materials)?
  • Do your data sources all logically support the research question, and each other?
  • Does your data collection provide for multiple perspectives?
  • How will your data achieve triangulation in addressing the research question?
  • Will you need more than three data sources to ensure triangulation of data?

Action Research Copyright © by J. Spencer Clark; Suzanne Porath; Julie Thiele; and Morgan Jobe is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.


Complete Guide to Data Collection for Data Science: Step-by-Step

Did you know that data collection is one of the most time-consuming steps in the process of data science? But it's definitely not as terrifying as data cleaning.

The explosion in data production is making all organizations more data-driven.

Collecting data is the new trend in the market and this post is an informative piece on everything you should know about data collection.

Here's what you can expect to learn today:

  • The idea behind data science
  • What is data collection in data science
  • The process of data collection

Before we get into the details of the data collection process, let me briefly introduce you to the idea behind data science.

If you're already familiar with data science, skip to the next section.

If I were to pick one of the most defining periods of technological advancements, it would be the  Big Data era .

Data science became a thing due to two important reasons:

  • Acceleration in the power of data processing through the introduction of the graphics processing unit (GPU)
  • Production of massive amounts of data

Did you know that the human species is producing 2.5 quintillion bytes of data every single day? Every time you surf the web, hit a "like" on Instagram, or share a cat meme, you produce data.

What's the smart thing to do with so much data? Process it? Extract insightful information and correlations?

That's right!

Data science is the art of extracting insightful information from data.

Have you ever wondered why Google throws ads at you about what clothes to buy? The reason is that Google collects data about which websites you interact with, what products interest you, and so on. It then targets you with ads relevant to your interests.

That's how you use data to grow businesses, and that's why most organizations are becoming more data-driven by the day.

However, the process of extracting insightful information is not as straightforward as the example I gave above. You have to be able to identify a problem statement, collect relevant data, and then go about cleaning, processing, analyzing, and extracting useful data.

Today we will focus on data collection.

Data collection is the process of accumulating data that's required to solve a problem statement.

What do I mean by a problem statement?

All data science projects (all projects really) start with a problem that needs a solution. There's always something you can solve or improve.

Step-by-step guide to data collection

Data collection happens in steps, and it's important to understand that it is an iterative and repetitive process: after the first round of collecting data, you will probably need to repeat the process.

In the below sections, you can read about the steps you can take to collect your data.

Identify a problem statement

The most vital step is to identify and pinpoint the exact question that needs to be answered.

For example, let's say your online cat food business is not producing enough sales. Your problem statement would be: find ways to attract more customers and improve your sales.

Once you have identified your problem and the outcome you want, you can work backward. In this case, you can start off by taking a look at the audience you are targeting.

Maybe you need to target a wider age group, or you may want to learn more about what type of cat owners shop online, such as their geographic location, gender, ethnicity, and so on.

Collecting more data is often about collecting the right type of data . Thus, the first step is to understand what problem needs solving and how you can go about solving it.

Determine what type of data is needed

The next step is to consider what type of data you must collect.

Is it quantitative or qualitative?

Accessing and processing quantitative data is easier because it involves raw numbers and digits. On the other hand, processing qualitative data, such as customer reviews or feedback, is more complex.

Segregating the different types of data from the moment of data collection can be useful while performing data processing down the line.
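As a rough sketch of what that segregation might look like in practice, the short pandas snippet below splits a small, made-up customer dataset into numeric (quantitative) and text (qualitative) columns; all column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical customer data mixing quantitative and qualitative fields
df = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "age": [34, 27, 45],
    "monthly_spend": [59.90, 12.50, 80.00],
    "favourite_brand": ["Whiskas", "Purina", "Felix"],
    "feedback": ["Great delivery", "Too pricey", "Cat loves it"],
})

# Quantitative columns: raw numbers that are easy to aggregate and model
quantitative = df.select_dtypes(include="number")

# Qualitative columns: free text and categories that need extra processing
qualitative = df.select_dtypes(exclude="number")

print(quantitative.columns.tolist())  # ['customer_id', 'age', 'monthly_spend']
print(qualitative.columns.tolist())   # ['favourite_brand', 'feedback']
```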

Decide on your data sources

Once you have an idea about what data you need, start looking into whether the data is within your organization or if you'll require third-party or external data.

In many cases, acquiring external data is worthwhile: it keeps you on par with competitors, who will probably also invest in third-party data. Be prepared to pay for data, and keep your legal team close.

At this point, it's worth pausing to consider the ethical issues around data collection and data privacy.

Make sure your audience is  fully aware  of the data you're collecting about them. You don't want to fall into a data scandal, such as the one in which  Facebook and Cambridge Analytica  were involved. If your organization is buying data from another corporation, your legal team must be careful to consider all data privacy clauses.

Collecting data from government organizations is also common, and some data scientists use surveys to collect data.

Another practice is to build a user persona based on existing data. For instance, if your organization already has insights into the type of people who buy sports gear, that information can be used to create a user persona. This approach is common when there is not enough data available.

Create a timeline

Now it's time to identify the time frame within which the data is most useful.

For example, do you need end-to-end data about how a customer lands on an e-commerce website, or only specific pieces, such as the user's search history, geography, and background?

Identifying the timeline is  key to getting the exact type of data you need  to solve your problem statement.

A potential lead may generate data at different stages, and it's your job to effectively evaluate which data is most relevant.
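As a minimal sketch of what narrowing the timeline can look like, the snippet below filters a made-up event log down to a 90-day window before a chosen reference date; the column names, dates, and window length are placeholders to adapt to your own problem statement.

```python
import pandas as pd

# Hypothetical event log with one row per customer interaction
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "event": ["visit", "purchase", "visit", "visit"],
    "timestamp": pd.to_datetime(
        ["2024-01-05", "2024-03-20", "2023-11-30", "2024-04-02"]
    ),
})

# Keep only the window that matters for the problem statement,
# e.g. the 90 days before a chosen reference date
reference_date = pd.Timestamp("2024-04-30")
recent = events[events["timestamp"] >= reference_date - pd.Timedelta(days=90)]
print(recent)
```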

Collect your data

To effectively collect data, devise a plan that addresses all the questions relevant to  securely collecting data.

If you're collecting data from a third party or a stakeholder, make sure all requirements and privacy issues are considered.

Additionally, create a plan for  how you will store the data.  Make sure your organization has the right tools and infrastructure to  manage and process the data .

You also need to  establish a systematic approach for storing all the different types of data  so that you can later combine and further process them.

For example, storing transactional data is relatively easy, since there are plenty of tools that arrange such data in a tabular format. On the other hand, unstructured data can be more difficult to manage and store due to its loose format.

Therefore, you must devise a plan to collect your data and make the processing simpler.
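A small sketch of what such a plan might look like in Python, assuming pandas plus the standard library: structured transactions go into a SQLite table, while loosely formatted reviews are kept as JSON documents. The file names, schema, and example records are purely illustrative.

```python
import json
import sqlite3
import pandas as pd

# Structured, transactional data fits naturally into a tabular store
transactions = pd.DataFrame({
    "order_id": [1, 2],
    "amount": [19.99, 42.00],
    "country": ["US", "DE"],
})
conn = sqlite3.connect("collected_data.db")
transactions.to_sql("transactions", conn, if_exists="replace", index=False)
conn.close()

# Unstructured data (e.g. raw customer reviews) can be kept as JSON documents
reviews = [
    {"order_id": 1, "text": "Arrived quickly, cat approved."},
    {"order_id": 2, "text": "Packaging was damaged."},
]
with open("reviews.json", "w", encoding="utf-8") as f:
    json.dump(reviews, f, indent=2)
```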

Panoply can help!

Panoply offers a simple, hassle-free solution to many of the problems involved in collecting and storing data. You can go through their free demo to see how you can store, manage, and access your data.

Collecting our thoughts

During the process of data analysis, you'll get new revelations about additional data that's required, thus making data science an iterative process.

The act of retrieving useful insights requires identifying and collecting the right kind of data. Once your organization has the right data, it becomes easier to process and analyze it.

Data collection can play a vital part in helping businesses grow.

By understanding what data collection is and the various steps you should consider while collecting data for your data science project, you can gain valuable insights into how to apply the information for future growth and development.


Data collection in research: Your complete guide

Last updated: 31 January 2023

Reviewed by: Cathy Heath

In the late 16th century, Francis Bacon coined the phrase "knowledge is power," which implies that knowledge is a powerful force, like physical strength. In the 21st century, knowledge in the form of data is unquestionably powerful.

But data isn't something you just have - you need to collect it. This means utilizing a data collection process and turning the collected data into knowledge that you can leverage into a successful strategy for your business or organization.

Believe it or not, there's more to data collection than just conducting a Google search. In this complete guide, we shine a spotlight on data collection, outlining what it is, types of data collection methods, common challenges in data collection, data collection techniques, and the steps involved in data collection.


  • What is data collection?

There are two specific data collection techniques: primary and secondary data collection. Primary data collection is the process of gathering data directly from sources. It's often considered the most reliable data collection method, as researchers can collect information directly from respondents.

Secondary data collection is data that has already been collected by someone else and is readily available. This data is usually less expensive and quicker to obtain than primary data.

  • What are the different methods of data collection?

There are several data collection methods, which can be either manual or automated. Manual data collection typically involves recording data by hand, with pen and paper, while automated data collection uses software to gather data from online sources, such as social media, website data, transaction data, etc.

Here are the five most popular methods of data collection:

Surveys

Surveys are a very popular method of data collection that organizations can use to gather information from many people. Researchers can conduct multi-mode surveys that reach respondents in different ways, including in person, by mail, over the phone, or online.

As a method of data collection, surveys have several advantages. For instance, they are relatively quick and easy to administer, you can be flexible in what you ask, and they can be tailored to collect data on various topics or from certain demographics.

However, surveys also have several disadvantages. For instance, they can be expensive to administer, and the results may not represent the population as a whole. Additionally, survey data can be challenging to interpret. It may also be subject to bias if the questions are not well-designed or if the sample of people surveyed is not representative of the population of interest.

Interviews

Interviews are a common method of collecting data in social science research. You can conduct interviews in person, over the phone, or even via email or online chat.

Interviews are a great way to collect qualitative and quantitative data . Qualitative interviews are likely your best option if you need to collect detailed information about your subjects' experiences or opinions. If you need to collect more generalized data about your subjects' demographics or attitudes, then quantitative interviews may be a better option.

Interviews are relatively quick and very flexible, allowing you to ask follow-up questions and explore topics in more depth. The downside is that interviews can be time-consuming and expensive due to the amount of information to be analyzed. They are also prone to bias, as both the interviewer and the respondent may have certain expectations or preconceptions that may influence the data.

Direct observation

Observation is a direct way of collecting data. It can be structured (with a specific protocol to follow) or unstructured (simply observing without a particular plan).

Organizations and businesses use observation as a data collection method to gather information about their target market, customers, or competition. Businesses can learn about consumer behavior, preferences, and trends by observing people using their products or services.

There are two types of observation: participatory and non-participatory. In participatory observation, the researcher is actively involved in the observed activities. This type of observation is used in ethnographic research , where the researcher wants to understand a group's culture and social norms. Non-participatory observation is when researchers observe from a distance and do not interact with the people or environment they are studying.

There are several advantages to using observation as a data collection method. It can provide insights that may not be apparent through other methods, such as surveys or interviews. Researchers can also observe behavior in a natural setting, which can provide a more accurate picture of what people do and how and why they behave in a certain context.

There are some disadvantages to using observation as a method of data collection. It can be time-consuming, intrusive, and expensive to observe people for extended periods. Observations can also be tainted if the researcher is not careful to avoid personal biases or preconceptions.

Automated data collection

Business applications and websites are increasingly collecting data electronically to improve the user experience or for marketing purposes.

There are a few different ways that organizations can collect data automatically. One way is through cookies, which are small pieces of data stored on a user's computer. They track a user's browsing history and activity on a site, measuring levels of engagement with a business’s products or services, for example.

Another way organizations can collect data automatically is through web beacons. Web beacons are small images embedded on a web page to track a user's activity.

Finally, organizations can also collect data through mobile apps, which can track user location, device information, and app usage. This data can be used to improve the user experience and for marketing purposes.

Automated data collection is a valuable tool for businesses, helping improve the user experience or target marketing efforts. Businesses should aim to be transparent about how they collect and use this data.

Sourcing data through information service providers

Organizations need to be able to collect data from a variety of sources, including social media, weblogs, and sensors. The process to do this and then use the data for action needs to be efficient, targeted, and meaningful.

In the era of big data, organizations are increasingly turning to information service providers (ISPs) and other external data sources to help them collect data to make crucial decisions. 

Information service providers help organizations collect data by offering personalized services that suit the specific needs of the organizations. These services can include data collection, analysis, management, and reporting. By partnering with an ISP, organizations can gain access to the newest technology and tools to help them to gather and manage data more effectively.

There are also several tools and techniques that organizations can use to collect data from external sources, such as web scraping, which collects data from websites, and data mining, which involves using algorithms to extract data from large data sets. 

Organizations can also use APIs (application programming interfaces) to collect data from external sources. APIs allow organizations to access data stored in another system and to share and integrate it into their own systems.
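A minimal sketch of API-based collection in Python is shown below; the endpoint URL, parameters, and token are placeholders rather than any real provider's API.

```python
import requests

# Placeholder endpoint for an external data provider (not a real service)
API_URL = "https://api.example-provider.com/v1/records"

def fetch_records(page: int, token: str) -> list:
    """Fetch one page of records from the provider's API."""
    response = requests.get(
        API_URL,
        params={"page": page, "per_page": 100},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()["records"]

# Example usage: collect the first three pages into one list
# all_records = [r for p in range(1, 4) for r in fetch_records(p, token="...")]
```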

Finally, organizations can also use manual methods to collect data from external sources. This can involve contacting companies or individuals directly to request the data needed for a particular analysis.

  • What are common challenges in data collection?

There are many challenges that researchers face when collecting data. Here are five common examples:

Big data environments

Data collection can be a challenge in big data environments for several reasons. First, the data may be located in many different places, such as archives, libraries, or online systems. The sheer volume of data can also make it difficult to identify the most relevant data sets.

Second, the complexity of data sets can make it challenging to extract the desired information. Third, the distributed nature of big data environments can make it difficult to collect data promptly and efficiently.

It is therefore important to have a well-designed data collection strategy that considers the specific needs of the organization and which data sets are most relevant. Alongside this, consider the tools and resources available to support data collection and to protect the data from unintended use.

Data bias

Data bias is a common challenge in data collection. It occurs when data is collected from a sample that is not representative of the population of interest.

There are different types of data bias, but some common ones include selection bias, self-selection bias, and response bias. Selection bias can occur when the collected data does not represent the population being studied. For example, if a study only includes data from people who volunteer to participate, that data may not represent the general population.

Self-selection bias can also occur when people self-select into a study, such as by taking part only if they think they will benefit from it. Response bias happens when people respond in a way that is not honest or accurate, such as by only answering questions that make them look good. 

These types of data bias present a challenge because they can lead to inaccurate results and conclusions about behaviors, perceptions, and trends. Data bias can be mitigated by identifying potential sources of bias in advance and setting guidelines for eliminating them.
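To make the effect of self-selection concrete, the toy simulation below compares a random sample with a volunteer sample drawn from the same made-up population; the numbers are arbitrary and only meant to show how the estimate shifts.

```python
import random

random.seed(0)

# Population where true support for a proposal is exactly 50%
population = [1] * 5000 + [0] * 5000

# Random sample: roughly unbiased estimate of support
random_sample = random.sample(population, 500)

# Self-selected sample: supporters are three times as likely to volunteer
volunteers = [x for x in population if random.random() < (0.3 if x else 0.1)]

print(sum(random_sample) / len(random_sample))  # close to 0.50
print(sum(volunteers) / len(volunteers))        # well above 0.50 (biased)
```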

Lack of quality assurance processes

One of the biggest challenges in data collection is the lack of quality assurance processes. This can lead to several problems, including incorrect data, missing data, and inconsistencies between data sets.

Quality assurance is important because there are many data sources, and each source may have different levels of quality or corruption. There are also different ways of collecting data, and data quality may vary depending on the method used. 

There are several ways to improve quality assurance in data collection. These include developing clear and consistent goals and guidelines for data collection, implementing quality control measures, using standardized procedures, and employing data validation techniques. By taking these steps, you can ensure that your data is of adequate quality to inform decision-making.
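As one small example of a data validation technique, the sketch below runs a few basic checks (row count, duplicate rows, missing values) over a made-up survey extract using pandas; the fields and checks would naturally differ in a real quality assurance process.

```python
import pandas as pd

def basic_quality_checks(df: pd.DataFrame) -> dict:
    """Run a few simple validation checks on a collected dataset."""
    return {
        "row_count": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_per_column": df.isna().sum().to_dict(),
    }

# Hypothetical survey extract with a deliberate gap and a duplicate row
survey = pd.DataFrame({
    "respondent_id": [1, 2, 2, 4],
    "age": [29, 41, 41, None],
    "answer": ["yes", "no", "no", "yes"],
})

print(basic_quality_checks(survey))
# {'row_count': 4, 'duplicate_rows': 1,
#  'missing_per_column': {'respondent_id': 0, 'age': 1, 'answer': 0}}
```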

Limited access to data

Another challenge in data collection is limited access to data. This can be due to several reasons, including privacy concerns, the sensitive nature of the data, security concerns, or simply the fact that data is not readily available.

Legal and compliance regulations

Most countries have regulations governing how data can be collected, used, and stored. In some cases, data collected in one country may not be used in another. This means gaining a global perspective can be a challenge. 

For example, if a company is required to comply with the EU General Data Protection Regulation (GDPR), it may not be able to collect data from individuals in the EU without their explicit consent. This can make it difficult to collect data from a target audience.

Legal and compliance regulations can be complex, and it's important to ensure that all data collected is done so in a way that complies with the relevant regulations.

  • What are the key steps in the data collection process?

There are five steps involved in the data collection process. They are:

1. Decide what data you want to gather

Have a clear understanding of the questions you are asking, and then consider where the answers might lie and how you might obtain them. This saves time and resources by avoiding the collection of irrelevant data, and helps maintain the quality of your datasets. 

2. Establish a deadline for data collection

Establishing a deadline for data collection helps you avoid collecting too much data, which can be costly and time-consuming to analyze. It also allows you to plan for data analysis and prompt interpretation. Finally, it helps you meet your research goals and objectives and allows you to move forward.

3. Select a data collection approach

The data collection approach you choose will depend on different factors, including the type of data you need, available resources, and the project timeline. For instance, if you need qualitative data, you might choose a focus group or interview methodology. If you need quantitative data , then a survey or observational study may be the most appropriate form of collection.

4. Gather information

When collecting data for your business, identify your business goals first. Once you know what you want to achieve, you can start collecting data to reach those goals. The most important thing is to ensure that the data you collect is reliable and valid. Otherwise, any decisions you make using the data could result in a negative outcome for your business.

5. Examine the information and apply your findings

As a researcher, it's important to examine the data you're collecting and analyzing before you apply your findings, because misleading data leads to inaccurate conclusions. Ask yourself: is the data what you were expecting? Is it similar to other datasets you have looked at?

There are many scientific ways to examine data, but some common methods include:

  • looking at the distribution of data points
  • examining the relationships between variables
  • looking for outliers

By taking the time to examine your data and noticing any patterns, strange or otherwise, you can avoid making mistakes that could invalidate your research.
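A brief pandas sketch of these three checks is shown below, using a made-up dataset; the variables and the interquartile-range rule for outliers are illustrative choices, not the only way to examine data.

```python
import pandas as pd

# Hypothetical dataset of ad spend vs. sales; all values are made up
df = pd.DataFrame({
    "ad_spend": [100, 120, 90, 110, 105, 950],  # last value looks suspicious
    "sales":    [20, 25, 18, 22, 21, 24],
})

print(df.describe())  # distribution of data points
print(df.corr())      # relationships between variables

# Simple outlier check using the interquartile range (IQR) rule
q1, q3 = df["ad_spend"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["ad_spend"] < q1 - 1.5 * iqr) | (df["ad_spend"] > q3 + 1.5 * iqr)]
print(outliers)  # flags the 950 row for a closer look
```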

  • How qualitative analysis software streamlines the data collection process

Knowledge derived from data does indeed carry power. However, if you don't convert the knowledge into action, it will remain a resource of unexploited energy and wasted potential.

Luckily, data collection tools enable organizations to streamline their data collection and analysis processes and leverage the derived knowledge to grow their businesses. For instance, qualitative analysis software can be highly advantageous in data collection by streamlining the process, making it more efficient and less time-consuming.

Qualitative analysis software also provides a structure for data collection and analysis, helping ensure that data is of high quality. It can help uncover patterns and relationships that would otherwise be difficult to discern, and in some cases it can replace more expensive data collection methods, such as focus groups or surveys.

Overall, qualitative analysis software can be valuable for any researcher looking to collect and analyze data. By increasing efficiency, improving data quality, and providing greater insights, qualitative software can help to make the research process much more efficient and effective.





4.5 Data Collection Methods

Choosing the most appropriate and practical data collection method is an important decision that must be made carefully. It is important to recognise that the quality of data collected in a qualitative manner is a direct reflection of the skill and competence of the researcher. Advanced interpersonal skills are required, especially the ability to accurately interpret and respond to subtle participant behavior in a variety of situations. Interviews, focus groups and observations are the primary methods of data collection used in qualitative healthcare research (Figure 4.7). 62


Interviews can be used to explore individual participants’ views, experiences, beliefs and motivations. There are three fundamental types of research interviews: structured, semi-structured and unstructured.

Structured interviews, also known as standardised open-ended interviews, are carefully prepared ahead of time, and each participant is asked the same question in a certain sequence. 63 A structured interview is essentially an oral questionnaire in which a pre-determined list of questions is asked, with little or no variation and no room for follow-up questions to answers that require further clarification. 63 Structured interviews are relatively quick and easy to develop and use and are especially useful when you need clarification on a specific question. 63 However, by its very nature, it allows only a limited number of participant responses, so it is of little use if “depth” is desired. This approach resists improvisation and the pursuit of intuition but can promote consistency among participants. 63

Semi-structured interviews, also known as the general interview guide approach, include an outline of questions to ensure that all pertinent topics are covered. 63 A semi-structured interview consists of a few key questions that help define the area to be explored but also allow the interviewer or respondent to diverge and explore ideas or responses in more detail. 64 This interview format is used most frequently in healthcare, as it provides participants with some guidance about what to talk about. The flexibility of this approach, especially when compared to structured interviews, is that it allows participants to discover or refine important information that may not have been previously considered relevant by the research team. 63

Unstructured interviews, also known as informal conversational interviews, consist of questions that are spontaneously generated in the natural flow of conversation, reflect no preconceptions or ideas, and have little or no organisation. 65 Such conversations can easily start with an opening question such as, “Can you tell me about your experience at the clinic?” It then proceeds primarily based on the initial response. Unstructured interviews tend to be very lengthy (often hours), lack pre-set interview questions, and provide little guidance on what to talk about, which can be difficult for participants. 63

As a result, they are often considered only when great “depth” is required, little is known about the subject, or another viewpoint on a known issue is requested. 63 Significant freedom in unstructured interviews allows for more intuitive and spontaneous exchanges between the researcher and the participants. 63

Advantages and Disadvantages

Interviews can be conducted by phone, face-to-face, or online, depending on participants’ preferences and availability. Participants are often flattered to be asked; they make the time to speak with you and reward you with candour. 66 Interviews usually offer the flexibility to schedule sessions at the interviewees’ convenience. 66 They also introduce less observer or participant bias, since other participants’ experiences or opinions do not influence the interviewee, and they give interviewees ample talk time without requiring them to spend time listening to others. Additionally, the interviewer can observe the non-verbal behaviour of the interviewee and potentially record it as data. 66

Interviews also have inherent weaknesses. Conducting interviews can be very costly and time-consuming. 66 Interviews also provide less anonymity, which is usually a major concern for many respondents. 66 Nonetheless, qualitative interviews can be a valuable tool to help uncover meaning and understanding of phenomena. 66


Focus group

Focus groups are group interviews that explore participants’ knowledge and experiences and how and why individuals act in various ways. 67   This method involves bringing a small group together to discuss a specific topic or issue. The groups typically include 6-8 participants and are conducted by an experienced moderator who follows a topic guide or interview guide. 67 The conversations can be audio or videotaped and then transcribed, depending on the researchers’ and participants’ preferences. In addition, focus groups can include an observer who records nonverbal parts of the encounter, potentially with the help of an observation guide. 67

Advantages and disadvantages

Focus groups effectively bring together homogenous groups of people with relevant expertise and experience on a specific issue and can offer comprehensive information. 67 They are often used to gather information about group dynamics, attitudes, and perceptions and can provide a rich source of data. 67

Disadvantages include less control over the process and a lower level of participation by each individual. 67 Also, focus group moderators, as well as those responsible for data processing, require prior experience. Focus groups are less suitable for discussing sensitive themes as some participants may be reluctant to express their opinions in a group environment. 67 Furthermore, it is important to watch for the creation of “groupthink” or dominance of certain group members, as group dynamics and social dynamics can influence focus groups. 67

Observation

Observations involve the researcher observing and recording the behaviour and interactions of individuals or groups in a natural setting. 67 Observations are especially valuable for gaining insights about a specific situation and real behaviour. They can be participant (the researcher participates in the activity) or non-participant (the researcher observes from a distance) in nature. 67 The observer in participant observations is a member of the observed context, such as a nurse working in an intensive care unit. The observer is “on the outside looking in” in non-participant observations, i.e. present but not a part of the scenario, attempting not to impact the environment by their presence. 67 During the observation, the observer notes everything or specific elements of what is happening around them, such as physician-patient interactions or communication between different professional groups. 67

The advantages of performing observations include reducing the gap between the researcher and the subject of study; issues may be uncovered that the researcher was unaware of but that are relevant to gaining a greater understanding of the research. 67 However, observation can be time-consuming, as the researcher may need to observe behaviour or interactions for an extended period to collect enough data. In addition, observations can be influenced by the researcher’s biases, which can affect the accuracy and validity of the data collected. 68

An Introduction to Research Methods for Undergraduate Health Profession Students Copyright © 2023 by Faith Alele and Bunmi Malau-Aduli is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.

6.894: Interactive Data Visualization

Assignment 2: Exploratory Data Analysis

In this assignment, you will identify a dataset of interest and perform an exploratory analysis to better understand the shape & structure of the data, investigate initial questions, and develop preliminary insights & hypotheses. Your final submission will take the form of a report consisting of captioned visualizations that convey key insights gained during your analysis.

Step 1: Data Selection

First, you will pick a topic area of interest to you and find a dataset that can provide insights into that topic. To streamline the assignment, we've pre-selected a number of datasets for you to choose from.

However, if you would like to investigate a different topic and dataset, you are free to do so. If working with a self-selected dataset, please check with the course staff to ensure it is appropriate for the course. Be advised that data collection and preparation (also known as data wrangling) can be a very tedious and time-consuming process. Be sure you have sufficient time to conduct exploratory analysis after preparing the data.

After selecting a topic and dataset – but prior to analysis – you should write down an initial set of at least three questions you'd like to investigate.

Step 2: Exploratory Visual Analysis

Next, you will perform an exploratory analysis of your dataset using a visualization tool such as Tableau. You should consider two different phases of exploration.

In the first phase, you should seek to gain an overview of the shape & structure of your dataset. What variables does the dataset contain? How are they distributed? Are there any notable data quality issues? Are there any surprising relationships among the variables? Be sure to also perform "sanity checks" for patterns you expect to see!

In the second phase, you should investigate your initial questions, as well as any new questions that arise during your exploration. For each question, start by creating a visualization that might provide a useful answer. Then refine the visualization (by adding additional variables, changing sorting or axis scales, filtering or subsetting data, etc. ) to develop better perspectives, explore unexpected observations, or sanity check your assumptions. You should repeat this process for each of your questions, but feel free to revise your questions or branch off to explore new questions if the data warrants.
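If you opt for a programming-based tool rather than Tableau, a first pass over the overview phase might look something like the sketch below (pandas and Matplotlib); the CSV path is a placeholder for whichever dataset you selected in Step 1.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder path: substitute the dataset you selected
df = pd.read_csv("my_dataset.csv")

# Overview of shape & structure
print(df.shape)          # number of rows and columns
print(df.dtypes)         # variable types
print(df.isna().sum())   # missing values per column (a common quality issue)
print(df.describe())     # distributions of numeric variables

# Quick visual sanity check of every numeric variable
df.hist(figsize=(10, 8))
plt.tight_layout()
plt.savefig("overview_histograms.png")
```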

  • Final Deliverable

Your final submission should take the form of a Google Docs report – similar to a slide show or comic book – that consists of 10 or more captioned visualizations detailing your most important insights. Your "insights" can include important surprises or issues (such as data quality problems affecting your analysis) as well as responses to your analysis questions. To help you gauge the scope of this assignment, see this example report analyzing data about motion pictures . We've annotated and graded this example to help you calibrate for the breadth and depth of exploration we're looking for.

Each visualization image should be a screenshot exported from a visualization tool, accompanied with a title and descriptive caption (1-4 sentences long) describing the insight(s) learned from that view. Provide sufficient detail for each caption such that anyone could read through your report and understand what you've learned. You are free, but not required, to annotate your images to draw attention to specific features of the data. You may perform highlighting within the visualization tool itself, or draw annotations on the exported image. To easily export images from Tableau, use the Worksheet > Export > Image... menu item.

The end of your report should include a brief summary of main lessons learned.

Recommended Data Sources

To get up and running quickly with this assignment, we recommend exploring one of the following provided datasets:

World Bank Indicators, 1960–2017. The World Bank has tracked global human development through indicators such as climate change, economy, education, environment, gender equality, health, and science and technology since 1960. The linked repository contains indicators that have been formatted to facilitate use with Tableau and other data visualization tools. However, you're also welcome to browse and use the original data by indicator or by country. Click on an indicator category or country to download the CSV file.

Chicago Crimes, 2001–present (click Export to download a CSV file). This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.

Daily Weather in the U.S., 2017 . This dataset contains daily U.S. weather measurements in 2017, provided by the NOAA Daily Global Historical Climatology Network . This data has been transformed: some weather stations with only sparse measurements have been filtered out. See the accompanying weather.txt for descriptions of each column .

Social mobility in the U.S. . Raj Chetty's group at Harvard studies the factors that contribute to (or hinder) upward mobility in the United States (i.e., will our children earn more than we will). Their work has been extensively featured in The New York Times. This page lists data from all of their papers, broken down by geographic level or by topic. We recommend downloading data in the CSV/Excel format, and encourage you to consider joining multiple datasets from the same paper (under the same heading on the page) for a sufficiently rich exploratory process.

The Yelp Open Dataset provides information about businesses, user reviews, and more from Yelp's database. The data is split into separate files ( business , checkin , photos , review , tip , and user ), and is available in either JSON or SQL format. You might use this to investigate the distributions of scores on Yelp, look at how many reviews users typically leave, or look for regional trends about restaurants. Note that this is a large, structured dataset and you don't need to look at all of the data to answer interesting questions. In order to download the data you will need to enter your email and agree to Yelp's Dataset License .

Additional Data Sources

If you want to investigate datasets other than those recommended above, here are some possible sources to consider. You are also free to use data from a source different from those included here. If you have any questions on whether your dataset is appropriate, please ask the course staff ASAP!

  • data.boston.gov - City of Boston Open Data
  • MassData - State of Massachusetts Open Data
  • data.gov - U.S. Government Open Datasets
  • U.S. Census Bureau - Census Datasets
  • IPUMS.org - Integrated Census & Survey Data from around the World
  • Federal Elections Commission - Campaign Finance & Expenditures
  • Federal Aviation Administration - FAA Data & Research
  • fivethirtyeight.com - Data and Code behind the Stories and Interactives
  • Buzzfeed News
  • Socrata Open Data
  • 17 places to find datasets for data science projects

Visualization Tools

You are free to use one or more visualization tools in this assignment. However, in the interest of time and for a friendlier learning curve, we strongly encourage you to use Tableau . Tableau provides a graphical interface focused on the task of visual data exploration. You will (with rare exceptions) be able to complete an initial data exploration more quickly and comprehensively than with a programming-based tool.

  • Tableau - Desktop visual analysis software . Available for both Windows and MacOS; register for a free student license.
  • Data Transforms in Vega-Lite . A tutorial on the various built-in data transformation operators available in Vega-Lite.
  • Data Voyager , a research prototype from the UW Interactive Data Lab, combines a Tableau-style interface with visualization recommendations. Use at your own risk!
  • R , using the ggplot2 library or with R's built-in plotting functions.
  • Jupyter Notebooks (Python) , using libraries such as Altair or Matplotlib .

Data Wrangling Tools

The data you choose may require reformatting, transformation or cleaning prior to visualization. Here are tools you can use for data preparation. We recommend first trying to import and process your data in the same tool you intend to use for visualization. If that fails, pick the most appropriate option among the tools below. Contact the course staff if you are unsure what might be the best option for your data!

Graphical Tools

  • Tableau Prep - Tableau Desktop provides basic facilities for data import, transformation & blending; Tableau Prep is a more sophisticated data preparation tool.
  • Trifacta Wrangler - Interactive tool for data transformation & visual profiling.
  • OpenRefine - A free, open source tool for working with messy data.

Programming Tools

  • JavaScript data utilities and/or the Datalib JS library .
  • Pandas - Data table and manipulation utilities for Python.
  • dplyr - A library for data manipulation in R.
  • Or, the programming language and tools of your choice...

The assignment score is out of a maximum of 10 points. Submissions that squarely meet the requirements will receive a score of 8. We will determine scores by judging the breadth and depth of your analysis, whether visualizations meet the expressiveness and effectiveness principles, and how well-written and synthesized your insights are.

We will use the following rubric to grade your assignment. Note, rubric cells may not map exactly to specific point scores.

Submission Details

This is an individual assignment. You may not work in groups.

Your completed exploratory analysis report is due by noon on Wednesday 2/19 . Submit a link to your Google Doc report using this submission form . Please double check your link to ensure it is viewable by others (e.g., try it in an incognito window).

Resubmissions. Resubmissions will be regraded by teaching staff, and you may earn back up to 50% of the points lost in the original submission. To resubmit this assignment, please use this form and follow the same submission process described above. Include a short 1 paragraph description summarizing the changes from the initial submission. Resubmissions without this summary will not be regraded. Resubmissions will be due by 11:59pm on Saturday, 3/14. Slack days may not be applied to extend the resubmission deadline. The teaching staff will only begin to regrade assignments once the Final Project phase begins, so please be patient.

  • Due: 12pm, Wed 2/19
  • Recommended Datasets
  • Example Report
  • Visualization & Data Wrangling Tools
  • Submission form

Research for Medical Imaging and Radiation Sciences, pp. 97–157

Data Collection, Analysis, and Interpretation

  • Mark F. McEntee, University College Cork, Cork, Ireland
  • First Online: 03 January 2022

Often it has been said that proper prior preparation prevents poor performance. Many of the mistakes made in research have their origins back at the point of data collection. Perhaps it is natural human instinct not to plan; we learn from our experiences. However, it is crucial when it comes to the endeavours of science that we do plan our data collection with analysis and interpretation in mind. In this section on data collection, we will review some fundamental concepts of experimental design, sample size estimation, the assumptions that underlie most statistical processes, and ethical principles.
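The chapter itself covers these topics in depth; purely as an illustration of what sample size estimation can involve, the sketch below applies the standard formula n = z^2 p(1 - p) / d^2 for estimating a proportion, with arbitrary example numbers.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """Sample size needed to estimate a proportion p to within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 for 95% confidence
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Example: expected prevalence 50%, +/- 5% margin of error, 95% confidence
print(sample_size_proportion(p=0.5, margin=0.05))  # 385
```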




Cite this chapter: McEntee, M.F. (2021). Data Collection, Analysis, and Interpretation. In: Seeram, E., Davidson, R., England, A., McEntee, M.F. (eds) Research for Medical Imaging and Radiation Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-79956-4_6


Module 10: Statistics: Collecting Data

Assignment: Collecting Data Problem Set

  • What is the population of this survey?
  • What is the size of the population?
  • What is the size of the sample?
  • Give the sample statistic for the proportion of voters surveyed who said they were supporting the education bill.
  • Based on this sample, we might expect how many of the representatives to support the education bill?
  • Give the sample statistic for the proportion of voters surveyed who said they’d vote for Brown.
  • Based on this sample, we might expect how many of the 9500 voters to vote for Brown?
  • Identify the most relevant source of bias in this situation: A survey asks the following: Should the mall prohibit loud and annoying rock music in clothing stores catering to teenagers?
  • Identify the most relevant source of bias in this situation: To determine opinions on voter support for a downtown renovation project, a surveyor randomly questions people working in downtown businesses.
  • Identify the most relevant source of bias in this situation: A survey asks people to report their actual income and the income they reported on their IRS tax form.
  • Identify the most relevant source of bias in this situation: A survey randomly calls people from the phone book and asks them to answer a long series of questions.
  • Identify the most relevant source of bias in this situation: A survey asks the following: Should the death penalty be permitted if innocent people might die?
  • Identify the most relevant source of bias in this situation: A study seeks to investigate whether a new pain medication is safe to market to the public. They test by randomly selecting 300 men from a set of volunteers.
  • In a study, you ask the subjects their age in years. Is this data qualitative or quantitative?
  • In a study, you ask the subjects their gender. Is this data qualitative or quantitative?
  • Does this describe an observational study or an experiment: The temperature on randomly selected days throughout the year was measured.
  • Does this describe an observational study or an experiment? A group of students are told to listen to music while taking a test and their results are compared to a group not listening to music.
  • In a study, the sample is chosen by separating all cars by size, and selecting 10 of each size grouping. What is the sampling method?
  • In a study, the sample is chosen by writing everyone’s name on a playing card, shuffling the deck, then choosing the top 20 cards. What is the sampling method?
  • Which is the treatment group?
  • Which is the control group (if there is one)?
  • Is this study blind, double-blind, or neither?
  • Is this best described as an experiment, a controlled experiment, or a placebo controlled experiment?
  • Is this a sampling or a census?
  • Is this an observational study or an experiment?
  • Are there any possible sources of bias in this study?
  • This study involves two kinds of non-random sampling: (1) Subjects are not randomly sampled from some specified population and (2) Subjects are not randomly assigned to groups. Which problem is more serious? What effect on the results does each have?
  • A farmer believes that playing Barry Manilow songs to his peas will increase their yield. Describe a controlled experiment the farmer could use to test his theory.
  • A sports psychologist believes that people are more likely to be extroverted as adults if they played team sports as children. Describe two possible studies to test this theory. Design one as an observational study and the other as an experiment. Which is more practical?

Exploration

  • What is the population of this study?
  • List two reasons why the data may differ.
  • Can you tell if one researcher is correct and the other one is incorrect? Why?
  • Would you expect the data to be identical? Why or why not?
  • If the first researcher collected her data by randomly selecting 40 states and then selecting 1 person from each of those states, what sampling method is that?
  • If the second researcher collected his data by choosing 40 patients he knew, what sampling method would that researcher have used? What concerns would you have about this data set, based on the data collection method?
  • Find a newspaper or magazine article, or the online equivalent, describing the results of a recent study (the results of a poll are not sufficient). Give a summary of the study’s findings, then analyze whether the article provided enough information to determine the validity of the conclusions. If not, produce a list of things that are missing from the article that would help you determine the validity of the study. Look for the things discussed in the text: population, sample, randomness, blind, control, placebos, etc.

Download the assignment from one of the links below (.docx or .rtf):

Collecting Data Problem Set: Word Document

Collecting Data Problem Set: Rich Text Format

  • Data Collection Sheet: Types + [Template Examples]


One of the things you can’t do without, as an organization, is data collection. To make sense of this raw information for your business, you need to organize it in a data collection sheet.

Interestingly, there are many types of data collection sheets, but choosing the right one for your business can be difficult. After reading this article, you will be familiar with different types of data collection sheets and also know how to use Formplus for data collection . 

What is a Data Collection Sheet?

Simply put, a data collection sheet is a tool that is used to collect and organize data. It can also be defined as a worksheet that helps you to collect, process, and make sense of information from multiple data sources.

Typically, a data collection sheet is divided into 3 columns. The first column contains different sets of data variables while the second column is used for data tallying. The third column is used for recording the total value of each data variable. 
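
As a rough sketch of that three-column layout, the short Python example below tallies a handful of hypothetical observations and prints the variable, tally, and total columns; the categories and counts are invented purely for illustration.

```python
# A minimal sketch of the three-column layout described above:
# column 1 = data variable, column 2 = tally marks, column 3 = total.
# The observations are hypothetical.

observations = ["red", "blue", "red", "green", "red", "blue"]

sheet = {}
for value in observations:
    sheet[value] = sheet.get(value, 0) + 1

print(f"{'Variable':<10}{'Tally':<10}{'Total':>5}")
for variable, total in sheet.items():
    print(f"{variable:<10}{'|' * total:<10}{total:>5}")
```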

Data collection sheets are very useful in quantitative research because they help you to gather, record, and organize different numerical values from the research variables. This makes it easier for you to arrive at your research outcomes. Common types of data collection sheets include check sheets, tally sheets, and frequency tables. 

Types of Data Collection Sheets

  • Check Sheet

A check sheet is a tool that is used to collect data from respondents in real time. It also records the location where the data was collected, and it is usually a blank form designed for the quick, easy, and efficient recording of the desired information.

Data is recorded in the sheet using unique “marks” or checks. It is best to make use of a data check sheet when the information is being collected from the same location or the same person is handling the data collection process. This way, it is easy for you to objectively record responses and prevent data disruptions. 

A check sheet is an effective tool for collecting data on the frequency of events, problems, or defects, for identifying patterns and defect locations, and for identifying defect causes.

  • Tally Sheet

A tally sheet is a type of check sheet that is used to record quantitative data from form respondents. Quantitative data refers to data that can be quantified; that is, information that has numerical or statistical value.

A tally sheet is also known as a tabular check sheet because it is used to collect quantifiable data and determine the frequency of a specific occurrence in the research context. You’re probably familiar with this data collection sheet type from numerous simple arithmetic lessons back in primary school. 

  • Frequency Table

A frequency table is a statistical tool that is used to collect information on the number of times a research variable occurs in the research environment. This data set, when interpreted, can provide great insights into consumer behavior patterns in market research, among other things. 

In some cases, researchers merge this with the tally sheet to objectively capture data during a systematic investigation. Typically, the data variables are arranged in the table in ascending order, and the frequency (number of times they occur) is placed in a corresponding column. 
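
A minimal frequency-table sketch follows, using hypothetical responses: Python's collections.Counter does the counting, and the values are listed in ascending order alongside their frequencies.

```python
from collections import Counter

# Hypothetical responses: number of purchases reported by each respondent.
responses = [2, 5, 3, 2, 1, 5, 2, 4, 3, 2]

frequency = Counter(responses)

print(f"{'Value':>5} {'Frequency':>10}")
for value in sorted(frequency):          # variables arranged in ascending order
    print(f"{value:>5} {frequency[value]:>10}")
```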

Uses of Data Collection Sheets

A data collection sheet is a systematic tool for collecting and analyzing data in research. Quantitative researchers use data collection sheets to track different numerical values in the course of the systematic investigation. 

  • It Saves Time

Using a data collection sheet helps you to be more efficient when carrying out a systematic investigation. With a data collection sheet, you can fill out and organize your data quickly and efficiently.

  • Data Categorization

A data collection sheet makes data categorization easy. You can place data variables in categories as you create different columns in your sheet. 

  • Research Reporting

It is a useful tool in research reporting. You can include a copy of your data sheet in your research report to help other parties understand how and why your data was captured. 

  • It makes it easy for you to collect and process large volumes of data at once. 

Data Collection Sheet Templates 

  • Yearbook Form

Do you want to create a yearbook for your school or business? Use this Formplus yearbook form template to easily collect and organize data from different respondents. You can include different form fields to help you collect a variety of information from the respondents and export the data as CSV or sheet. 

  • Vendor Registration Form

Use this vendor registration form to collect information from individuals and organizations who wish to showcase their goods and services on your business platform. With this form template, you can easily organize vendors’ bio-data, place vendors in defined categories, and prevent any confusion. 

  • Student Data Sheet Template

With this student data sheet template, you can easily collect and process personal information from your students including contact details, parents/guardians’ information, email address, and home address. You can edit this template to suit your school’s needs in the easy-to-use form builder.

  • School Admission Form

This school admission form template can be modified to suit different needs. You can use it to collect different information from new and prospective students in your school such as contact details, email addresses, parents/guardians’ information, and other similar details. 

  • Online Conference Registration Form

Use this online conference registration form to swiftly collect information from prospective event attendees. With this form, you will be able to get all the data you need to ensure the success of your event. You can edit this template in the drag-and-drop form builder.

  • Manuscript Submission Form

This manuscript submission form makes it easier for publishers to collect manuscript submissions from interested writers and review these documents in time. With the file upload field, you can collect documents of any file size, directly in your manuscript submission form.  

  • Demographic Survey

Whether in market research or qualitative data collection, the Formplus demographic survey makes it easy for you to gather different types of information from your target audience. You can use the email invitation feature to share your form with the respondents.  

Primary vs Secondary Data Collection

Primary data collection is collecting information directly from the data source without going through any existing sources. It is a very common method for research projects, and the data obtained can be shared publicly for subsequent research.

Secondary data collection, on the other hand, is collecting data that has been sourced in the past by someone else. This is usually available for others to use.

The major difference between these two methods is that researchers can collect the most recent data when conducting primary research, which may not be the case for secondary data.


How to Create a Data Collection Sheet with Formplus  

Formplus is an online tool that you can use to easily collect and process data from your target audience. In the drag-and-drop form builder, you can create your data collection sheet from scratch or edit any of the available templates to help you gather information from form respondents. 

Here is a simple guide to creating a data collection sheet with Formplus.

  • Log into your Formplus account. If you do not have a Formplus account, you can create one here. 
  • In the drag-and-drop builder, add the fields you need for your data collection sheet. As you add each field, you can click on the “pencil” icon to access field editing options.
  • Save your data collection sheet to access the builder’s customization section. Here, there are numerous features you can make use of to change the outlook of your form. You can add preferred background images, change your form font, or insert your organization’s logo in the datasheet. 
  • Google Sheets Integration: Formplus allows you to easily collaborate with your team members on form data and responses via Google Sheets integration. With this integration, you can automatically update form responses in Google Sheets without having to import or export data.
  • You can also generate custom visual reports, that is, graphs and charts, by clicking on the different fields and data sets in the form analytics dashboard.

Why Use Formplus for Data Collection? 

Formplus has multiple form features that make data collection easy and seamless. Whether in research or quantitative data collection, Formplus forms help you to gather, organize, and process tons of information in no time. Let’s get familiar with some amazing data collection features on Formplus.

  • Offline Forms

Formplus allows you to collect data from respondents even when they are offline. This means that form respondents can fill out and submit their data collection sheet when they do not have a stable internet connection or any access to the internet. 

With offline forms, you do not have to worry about poor or zero internet access obstructing your data collection process. Any information entered in offline mode will automatically be uploaded to your preferred cloud storage whenever internet connectivity is restored.

  • Data Storage

We also offer secure storage for your information so that you do not have to worry about losing data at any time. In addition, you can choose where your data gets stored: in the Formplus secure servers or in your preferred external cloud storage system.

Presently, Formplus supports Google Drive, OneDrive, and Dropbox cloud storage systems. After syncing your Formplus account to any of these external cloud storage systems, you can easily receive file uploads and organize form data in your preferred storage account.

  • Multi-Option Fields

There are over 30 form fields for you to choose from in the form builder and this means that you can seamlessly collect different types of information from respondents. You can add date/time fields to your form and also carry out date/time calculations. 

You can also receive files directly in your form. Formplus allows respondents to upload files of any kind and size in your data collection sheet. Also, you can collect digital signatures and receive payments right in your form. 

  • Download as CSV/PDF

With Formplus, you also get the option to download form responses in CSV/PDF format. You can also export your form data to Google Sheets to help you organize your data effectively and collaborate with your team members seamlessly.
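
If you download your responses as a CSV file, a few lines of Python are enough to load them for further analysis. The file name and the "City" column used below are hypothetical placeholders, not part of Formplus itself.

```python
import csv
from collections import Counter

# Hypothetical export file and column name; adjust to match your own form.
with open("form_responses.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(f"Total submissions: {len(rows)}")

# Example: count submissions by a (hypothetical) "City" field.
by_city = Counter(row.get("City", "Unknown") for row in rows)
print(by_city.most_common(5))
```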

  • Form Analytics

The form analytics dashboard in the Formplus builder displays important analytics and metrics for your data collection process. Here, you can access form metrics like the total number of form submissions, the total number of form views, and the geographical locations from where form submissions were made. 

You can also build custom visual reports using the builder’s report summary tool. Simply click on the form field or data category to automatically display your data as custom graphs and charts. 

  • Teams and Collaboration

With teams and collaboration, you can easily keep all your team members in sync as you work on forms, data, and responses. Formplus allows you to add important collaborators to your shared account so that everyone can work on the data collection sheet together. 

As the administrator of the shared account, you can assign roles, grant permissions, and restrict access to folders and form data. This feature works with an audit trail that allows you to track any changes and/or suggestions made to your data collection sheet. 

Conclusion 

Data collection can be challenging, but working with a data collection sheet makes this process a lot easier. Apart from helping you to organize your data efficiently, a data collection sheet also helps you to save time by making data categorization less cumbersome. 

In this article, we’ve shared in-depth information on different types of data collection sheets and we’ve also introduced you to multiple data collection sheet templates on Formplus. Formplus makes it easier for you to create and modify different data collection sheets without delay. 




National Research Council; Division of Behavioral and Social Sciences and Education; Commission on Behavioral and Social Sciences and Education; Committee on Basic Research in the Behavioral and Social Sciences; Gerstein DR, Luce RD, Smelser NJ, et al., editors. The Behavioral and Social Sciences: Achievements and Opportunities. Washington (DC): National Academies Press (US); 1988.


5 Methods of Data Collection, Representation, and Analysis

This chapter concerns research on collecting, representing, and analyzing the data that underlie behavioral and social sciences knowledge. Such research, methodological in character, includes ethnographic and historical approaches, scaling, axiomatic measurement, and statistics, with its important relatives, econometrics and psychometrics. The field can be described as including the self-conscious study of how scientists draw inferences and reach conclusions from observations. Since statistics is the largest and most prominent of methodological approaches and is used by researchers in virtually every discipline, statistical work draws the lion’s share of this chapter’s attention.

Problems of interpreting data arise whenever inherent variation or measurement fluctuations create challenges to understand data or to judge whether observed relationships are significant, durable, or general. Some examples: Is a sharp monthly (or yearly) increase in the rate of juvenile delinquency (or unemployment) in a particular area a matter for alarm, an ordinary periodic or random fluctuation, or the result of a change or quirk in reporting method? Do the temporal patterns seen in such repeated observations reflect a direct causal mechanism, a complex of indirect ones, or just imperfections in the data? Is a decrease in auto injuries an effect of a new seat-belt law? Are the disagreements among people describing some aspect of a subculture too great to draw valid inferences about that aspect of the culture?

Such issues of inference are often closely connected to substantive theory and specific data, and to some extent it is difficult and perhaps misleading to treat methods of data collection, representation, and analysis separately. This report does so, as do all sciences to some extent, because the methods developed often are far more general than the specific problems that originally gave rise to them. There is much transfer of new ideas from one substantive field to another—and to and from fields outside the behavioral and social sciences. Some of the classical methods of statistics arose in studies of astronomical observations, biological variability, and human diversity. The major growth of the classical methods occurred in the twentieth century, greatly stimulated by problems in agriculture and genetics. Some methods for uncovering geometric structures in data, such as multidimensional scaling and factor analysis, originated in research on psychological problems, but have been applied in many other sciences. Some time-series methods were developed originally to deal with economic data, but they are equally applicable to many other kinds of data.

  • In economics: large-scale models of the U.S. economy; effects of taxation, money supply, and other government fiscal and monetary policies; theories of duopoly, oligopoly, and rational expectations; economic effects of slavery.
  • In psychology: test calibration; the formation of subjective probabilities, their revision in the light of new information, and their use in decision making; psychiatric epidemiology and mental health program evaluation.
  • In sociology and other fields: victimization and crime rates; effects of incarceration and sentencing policies; deployment of police and fire-fighting forces; discrimination, antitrust, and regulatory court cases; social networks; population growth and forecasting; and voting behavior.

Even such an abridged listing makes clear that improvements in methodology are valuable across the spectrum of empirical research in the behavioral and social sciences as well as in application to policy questions. Clearly, methodological research serves many different purposes, and there is a need to develop different approaches to serve those different purposes, including exploratory data analysis, scientific inference about hypotheses and population parameters, individual decision making, forecasting what will happen in the event or absence of intervention, and assessing causality from both randomized experiments and observational data.

This discussion of methodological research is divided into three areas: design, representation, and analysis. The efficient design of investigations must take place before data are collected because it involves how much, what kind of, and how data are to be collected. What type of study is feasible: experimental, sample survey, field observation, or other? What variables should be measured, controlled, and randomized? How extensive a subject pool or observational period is appropriate? How can study resources be allocated most effectively among various sites, instruments, and subsamples?

The construction of useful representations of the data involves deciding what kind of formal structure best expresses the underlying qualitative and quantitative concepts that are being used in a given study. For example, cost of living is a simple concept to quantify if it applies to a single individual with unchanging tastes in stable markets (that is, markets offering the same array of goods from year to year at varying prices), but as a national aggregate for millions of households and constantly changing consumer product markets, the cost of living is not easy to specify clearly or measure reliably. Statisticians, economists, sociologists, and other experts have long struggled to make the cost of living a precise yet practicable concept that is also efficient to measure, and they must continually modify it to reflect changing circumstances.

Data analysis covers the final step of characterizing and interpreting research findings: Can estimates of the relations between variables be made? Can some conclusion be drawn about correlation, cause and effect, or trends over time? How uncertain are the estimates and conclusions and can that uncertainty be reduced by analyzing the data in a different way? Can computers be used to display complex results graphically for quicker or better understanding or to suggest different ways of proceeding?

Advances in analysis, data representation, and research design feed into and reinforce one another in the course of actual scientific work. The intersections between methodological improvements and empirical advances are an important aspect of the multidisciplinary thrust of progress in the behavioral and social sciences.

  • Designs for Data Collection

Four broad kinds of research designs are used in the behavioral and social sciences: experimental, survey, comparative, and ethnographic.

Experimental designs, in either the laboratory or field settings, systematically manipulate a few variables while others that may affect the outcome are held constant, randomized, or otherwise controlled. The purpose of randomized experiments is to ensure that only one or a few variables can systematically affect the results, so that causes can be attributed. Survey designs include the collection and analysis of data from censuses, sample surveys, and longitudinal studies and the examination of various relationships among the observed phenomena. Randomization plays a different role here than in experimental designs: it is used to select members of a sample so that the sample is as representative of the whole population as possible. Comparative designs involve the retrieval of evidence that is recorded in the flow of current or past events in different times or places and the interpretation and analysis of this evidence. Ethnographic designs, also known as participant-observation designs, involve a researcher in intensive and direct contact with a group, community, or population being studied, through participation, observation, and extended interviewing.
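
To make the two roles of randomization concrete, here is a minimal Python sketch with a hypothetical subject pool: random sampling draws a representative sample for a survey design, while random assignment splits recruited subjects into treatment and control groups for an experiment.

```python
import random

random.seed(0)
population = [f"person_{i}" for i in range(1000)]   # hypothetical population

# Survey design: randomization selects a representative sample from the population.
survey_sample = random.sample(population, k=100)

# Experimental design: randomization assigns recruited subjects to conditions.
subjects = random.sample(population, k=40)
random.shuffle(subjects)
treatment_group = subjects[:20]
control_group = subjects[20:]

print(len(survey_sample), len(treatment_group), len(control_group))
```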

Experimental Designs

Laboratory experiments.

Laboratory experiments underlie most of the work reported in Chapter 1, significant parts of Chapter 2, and some of the newest lines of research in Chapter 3. Laboratory experiments extend and adapt classical methods of design first developed, for the most part, in the physical and life sciences and agricultural research. Their main feature is the systematic and independent manipulation of a few variables and the strict control or randomization of all other variables that might affect the phenomenon under study. For example, some studies of animal motivation involve the systematic manipulation of amounts of food and feeding schedules while other factors that may also affect motivation, such as body weight, deprivation, and so on, are held constant. New designs are currently coming into play largely because of new analytic and computational methods (discussed below, in “Advances in Statistical Inference and Analysis”).

Two examples of empirically important issues that demonstrate the need for broadening classical experimental approaches are open-ended responses and lack of independence of successive experimental trials. The first concerns the design of research protocols that do not require the strict segregation of the events of an experiment into well-defined trials, but permit a subject to respond at will. These methods are needed when what is of interest is how the respondent chooses to allocate behavior in real time and across continuously available alternatives. Such empirical methods have long been used, but they can generate very subtle and difficult problems in experimental design and subsequent analysis. As theories of allocative behavior of all sorts become more sophisticated and precise, the experimental requirements become more demanding, so the need to better understand and solve this range of design issues is an outstanding challenge to methodological ingenuity.

The second issue arises in repeated-trial designs when the behavior on successive trials, even if it does not exhibit a secular trend (such as a learning curve), is markedly influenced by what has happened in the preceding trial or trials. The more naturalistic the experiment and the more sensitive the measurements taken, the more likely it is that such effects will occur. But such sequential dependencies in observations cause a number of important conceptual and technical problems in summarizing the data and in testing analytical models, which are not yet completely understood. In the absence of clear solutions, such effects are sometimes ignored by investigators, simplifying the data analysis but leaving residues of skepticism about the reliability and significance of the experimental results. With continuing development of sensitive measures in repeated-trial designs, there is a growing need for more advanced concepts and methods for dealing with experimental results that may be influenced by sequential dependencies.

Randomized Field Experiments

The state of the art in randomized field experiments, in which different policies or procedures are tested in controlled trials under real conditions, has advanced dramatically over the past two decades. Problems that were once considered major methodological obstacles—such as implementing randomized field assignment to treatment and control groups and protecting the randomization procedure from corruption—have been largely overcome. While state-of-the-art standards are not achieved in every field experiment, the commitment to reaching them is rising steadily, not only among researchers but also among customer agencies and sponsors.

The health insurance experiment described in Chapter 2 is an example of a major randomized field experiment that has had and will continue to have important policy reverberations in the design of health care financing. Field experiments with the negative income tax (guaranteed minimum income) conducted in the 1970s were significant in policy debates, even before their completion, and provided the most solid evidence available on how tax-based income support programs and marginal tax rates can affect the work incentives and family structures of the poor. Important field experiments have also been carried out on alternative strategies for the prevention of delinquency and other criminal behavior, reform of court procedures, rehabilitative programs in mental health, family planning, and special educational programs, among other areas.

In planning field experiments, much hinges on the definition and design of the experimental cells, the particular combinations needed of treatment and control conditions for each set of demographic or other client sample characteristics, including specification of the minimum number of cases needed in each cell to test for the presence of effects. Considerations of statistical power, client availability, and the theoretical structure of the inquiry enter into such specifications. Current important methodological thresholds are to find better ways of predicting recruitment and attrition patterns in the sample, of designing experiments that will be statistically robust in the face of problematic sample recruitment or excessive attrition, and of ensuring appropriate acquisition and analysis of data on the attrition component of the sample.
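
As a rough illustration of the sample-size side of such planning, the sketch below applies the standard normal-approximation formula for comparing two group means: the number of cases per cell is roughly 2 times the square of (the critical z for alpha/2 plus the z for the desired power), divided by the squared standardized effect size. The effect size, alpha, and power values here are hypothetical planning inputs, not figures from the experiments discussed in this chapter.

```python
from scipy.stats import norm

def n_per_cell(effect_size, alpha=0.05, power=0.80):
    """Approximate cases needed per cell for a two-group comparison of means."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

# Hypothetical planning value: a standardized effect of 0.4 standard deviations.
print(round(n_per_cell(effect_size=0.4)))   # roughly 98 per cell
```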

Also of major significance are improvements in integrating detailed process and outcome measurements in field experiments. To conduct research on program effects under field conditions requires continual monitoring to determine exactly what is being done (the process) and how it corresponds to what was projected at the outset. Relatively unintrusive, inexpensive, and effective implementation measures are of great interest. There is, in parallel, a growing emphasis on designing experiments to evaluate distinct program components in contrast to summary measures of net program effects.

Finally, there is an important opportunity now for further theoretical work to model organizational processes in social settings and to design and select outcome variables that, in the relatively short time of most field experiments, can predict longer-term effects: For example, in job-training programs, what are the effects on the community (role models, morale, referral networks) or on individual skills, motives, or knowledge levels that are likely to translate into sustained changes in career paths and income levels?

Survey Designs

Many people have opinions about how societal mores, economic conditions, and social programs shape lives and encourage or discourage various kinds of behavior. People generalize from their own cases, and from the groups to which they belong, about such matters as how much it costs to raise a child, the extent to which unemployment contributes to divorce, and so on. In fact, however, effects vary so much from one group to another that homespun generalizations are of little use. Fortunately, behavioral and social scientists have been able to bridge the gaps between personal perspectives and collective realities by means of survey research. In particular, governmental information systems include volumes of extremely valuable survey data, and the facility of modern computers to store, disseminate, and analyze such data has significantly improved empirical tests and led to new understandings of social processes.

Within this category of research designs, two major types are distinguished: repeated cross-sectional surveys and longitudinal panel surveys. In addition, and cross-cutting these types, there is a major effort under way to improve and refine the quality of survey data by investigating features of human memory and of question formation that affect survey response.

Repeated cross-sectional designs can either attempt to measure an entire population—as does the oldest U.S. example, the national decennial census—or they can rest on samples drawn from a population. The general principle is to take independent samples at two or more times, measuring the variables of interest, such as income levels, housing plans, or opinions about public affairs, in the same way. The General Social Survey, collected by the National Opinion Research Center with National Science Foundation support, is a repeated cross-sectional database that was begun in 1972. One methodological question of particular salience in such data is how to adjust for nonresponses and “don’t know” responses. Another is how to deal with self-selection bias. For example, to compare the earnings of women and men in the labor force, it would be mistaken to first assume that the two samples of labor-force participants are randomly selected from the larger populations of men and women; instead, one has to consider and incorporate in the analysis the factors that determine who is in the labor force.
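
One simple, widely used adjustment for differential nonresponse is post-stratification weighting, in which each respondent is weighted by the ratio of a group's population share to its share of the realized sample. The sketch below uses invented figures purely for illustration.

```python
# Hypothetical population shares and realized sample counts by age group.
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
sample_counts = {"18-34": 120, "35-54": 200, "55+": 280}

n = sum(sample_counts.values())
weights = {group: population_share[group] / (sample_counts[group] / n)
           for group in population_share}

for group, w in weights.items():
    print(f"{group}: weight = {w:.2f}")   # groups underrepresented in the sample get weights above 1
```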

In longitudinal panels, a sample is drawn at one point in time and the relevant variables are measured at this and subsequent times for the same people. In more complex versions, some fraction of each panel may be replaced or added to periodically, such as expanding the sample to include households formed by the children of the original sample. An example of panel data developed in this way is the Panel Study of Income Dynamics (PSID), conducted by the University of Michigan since 1968 (discussed in Chapter 3).

Comparing the fertility or income of different people in different circumstances at the same time to find correlations always leaves a large proportion of the variability unexplained, but common sense suggests that much of the unexplained variability is actually explicable. There are systematic reasons for individual outcomes in each person’s past achievements, in parental models, upbringing, and earlier sequences of experiences. Unfortunately, asking people about the past is not particularly helpful: people remake their views of the past to rationalize the present and so retrospective data are often of uncertain validity. In contrast, generation-long longitudinal data allow readings on the sequence of past circumstances uncolored by later outcomes. Such data are uniquely useful for studying the causes and consequences of naturally occurring decisions and transitions. Thus, as longitudinal studies continue, quantitative analysis is becoming feasible about such questions as: How are the decisions of individuals affected by parental experience? Which aspects of early decisions constrain later opportunities? And how does detailed background experience leave its imprint? Studies like the two-decade-long PSID are bringing within grasp a complete generational cycle of detailed data on fertility, work life, household structure, and income.

Advances in Longitudinal Designs

Large-scale longitudinal data collection projects are uniquely valuable as vehicles for testing and improving survey research methodology. In ways that lie beyond the scope of a cross-sectional survey, longitudinal studies can sometimes be designed—without significant detriment to their substantive interests—to facilitate the evaluation and upgrading of data quality; the analysis of relative costs and effectiveness of alternative techniques of inquiry; and the standardization or coordination of solutions to problems of method, concept, and measurement across different research domains.

Some areas of methodological improvement include discoveries about the impact of interview mode on response (mail, telephone, face-to-face); the effects of nonresponse on the representativeness of a sample (due to respondents’ refusal or interviewers’ failure to contact); the effects on behavior of continued participation over time in a sample survey; the value of alternative methods of adjusting for nonresponse and incomplete observations (such as imputation of missing data, variable case weighting); the impact on response of specifying different recall periods, varying the intervals between interviews, or changing the length of interviews; and the comparison and calibration of results obtained by longitudinal surveys, randomized field experiments, laboratory studies, onetime surveys, and administrative records.
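
As a toy illustration of one item on that list, imputation of missing data, the sketch below fills gaps with the mean of the observed values; actual longitudinal studies rely on far more careful, model-based imputation.

```python
# Hypothetical panel measurements with missing values coded as None.
incomes = [42000, None, 38500, 51000, None, 47200]

observed = [x for x in incomes if x is not None]
mean_value = sum(observed) / len(observed)

# Replace each missing value with the mean of the observed values.
imputed = [x if x is not None else mean_value for x in incomes]
print(imputed)
```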

It should be especially noted that incorporating improvements in methodology and data quality has been and will no doubt continue to be crucial to the growing success of longitudinal studies. Panel designs are intrinsically more vulnerable than other designs to statistical biases due to cumulative item non-response, sample attrition, time-in-sample effects, and error margins in repeated measures, all of which may produce exaggerated estimates of change. Over time, a panel that was initially representative may become much less representative of a population, not only because of attrition in the sample, but also because of changes in immigration patterns, age structure, and the like. Longitudinal studies are also subject to changes in scientific and societal contexts that may create uncontrolled drifts over time in the meaning of nominally stable questions or concepts as well as in the underlying behavior. Also, a natural tendency to expand over time the range of topics and thus the interview lengths, which increases the burdens on respondents, may lead to deterioration of data quality or relevance. Careful methodological research to understand and overcome these problems has been done, and continued work as a component of new longitudinal studies is certain to advance the overall state of the art.

Longitudinal studies are sometimes pressed for evidence they are not designed to produce: for example, in important public policy questions concerning the impact of government programs in such areas as health promotion, disease prevention, or criminal justice. By using research designs that combine field experiments (with randomized assignment to program and control conditions) and longitudinal surveys, one can capitalize on the strongest merits of each: the experimental component provides stronger evidence for causal statements that are critical for evaluating programs and for illuminating some fundamental theories; the longitudinal component helps in the estimation of long-term program effects and their attenuation. Coupling experiments to ongoing longitudinal studies is not often feasible, given the multiple constraints of not disrupting the survey, developing all the complicated arrangements that go into a large-scale field experiment, and having the populations of interest overlap in useful ways. Yet opportunities to join field experiments to surveys are of great importance. Coupled studies can produce vital knowledge about the empirical conditions under which the results of longitudinal surveys turn out to be similar to—or divergent from—those produced by randomized field experiments. A pattern of divergence and similarity has begun to emerge in coupled studies; additional cases are needed to understand why some naturally occurring social processes and longitudinal design features seem to approximate formal random allocation and others do not. The methodological implications of such new knowledge go well beyond program evaluation and survey research. These findings bear directly on the confidence scientists—and others—can have in conclusions from observational studies of complex behavioral and social processes, particularly ones that cannot be controlled or simulated within the confines of a laboratory environment.

Memory and the Framing of Questions

A very important opportunity to improve survey methods lies in the reduction of nonsampling error due to questionnaire context, phrasing of questions, and, generally, the semantic and social-psychological aspects of surveys. Survey data are particularly affected by the fallibility of human memory and the sensitivity of respondents to the framework in which a question is asked. This sensitivity is especially strong for certain types of attitudinal and opinion questions. Efforts are now being made to bring survey specialists into closer contact with researchers working on memory function, knowledge representation, and language in order to uncover and reduce this kind of error.

Memory for events is often inaccurate, biased toward what respondents believe to be true—or should be true—about the world. In many cases in which data are based on recollection, improvements can be achieved by shifting to techniques of structured interviewing and calibrated forms of memory elicitation, such as specifying recent, brief time periods (for example, in the last seven days) within which respondents recall certain types of events with acceptable accuracy. Respondents are also sensitive to the order in which questions are asked, as a well-known pair of survey questions about happiness illustrates:

  • “Taking things altogether, how would you describe your marriage? Would you say that your marriage is very happy, pretty happy, or not too happy?”
  • “Taken altogether how would you say things are these days—would you say you are very happy, pretty happy, or not too happy?”

Presenting this sequence in both directions on different forms showed that the order affected answers to the general happiness question but did not change the marital happiness question: responses to the specific issue swayed subsequent responses to the general one, but not vice versa. The explanations for and implications of such order effects on the many kinds of questions and sequences that can be used are not simple matters. Further experimentation on the design of survey instruments promises not only to improve the accuracy and reliability of survey research, but also to advance understanding of how people think about and evaluate their behavior from day to day.

Comparative Designs

Both experiments and surveys involve interventions or questions by the scientist, who then records and analyzes the responses. In contrast, many bodies of social and behavioral data of considerable value are originally derived from records or collections that have accumulated for various nonscientific reasons, quite often administrative in nature, in firms, churches, military organizations, and governments at all levels. Data of this kind can sometimes be subjected to careful scrutiny, summary, and inquiry by historians and social scientists, and statistical methods have increasingly been used to develop and evaluate inferences drawn from such data. Some of the main comparative approaches are cross-national aggregate comparisons, selective comparison of a limited number of cases, and historical case studies.

Among the more striking problems facing the scientist using such data are the vast differences in what has been recorded by different agencies whose behavior is being compared (this is especially true for parallel agencies in different nations), the highly unrepresentative or idiosyncratic sampling that can occur in the collection of such data, and the selective preservation and destruction of records. Means to overcome these problems form a substantial methodological research agenda in comparative research. An example of the method of cross-national aggregative comparisons is found in investigations by political scientists and sociologists of the factors that underlie differences in the vitality of institutions of political democracy in different societies. Some investigators have stressed the existence of a large middle class, others the level of education of a population, and still others the development of systems of mass communication. In cross-national aggregate comparisons, a large number of nations are arrayed according to some measures of political democracy and then attempts are made to ascertain the strength of correlations between these and the other variables. In this line of analysis it is possible to use a variety of statistical cluster and regression techniques to isolate and assess the possible impact of certain variables on the institutions under study. While this kind of research is cross-sectional in character, statements about historical processes are often invoked to explain the correlations.
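
The computational core of such a cross-national comparison can be sketched in a few lines: array nations on a democracy measure and a candidate explanatory variable, then estimate their correlation and a least-squares slope. The country scores below are invented solely for illustration.

```python
from scipy.stats import linregress

# Hypothetical aggregate scores for eight nations.
education_index = [0.62, 0.71, 0.55, 0.80, 0.66, 0.74, 0.58, 0.69]
democracy_index = [5.1, 6.8, 4.2, 8.3, 5.9, 7.4, 4.8, 6.5]

# Simple least-squares regression of democracy score on education index.
result = linregress(education_index, democracy_index)
print(f"r = {result.rvalue:.2f}, slope = {result.slope:.1f}")
```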

More limited selective comparisons, applied by many of the classic theorists, involve asking similar kinds of questions but over a smaller range of societies. Why did democracy develop in such different ways in America, France, and England? Why did northeastern Europe develop rational bourgeois capitalism, in contrast to the Mediterranean and Asian nations? Modern scholars have turned their attention to explaining, for example, differences among types of fascism between the two World Wars, and similarities and differences among modern state welfare systems, using these comparisons to unravel the salient causes. The questions asked in these instances are inevitably historical ones.

Historical case studies involve only one nation or region, and so they may not be geographically comparative. However, insofar as they involve tracing the transformation of a society’s major institutions and the role of its main shaping events, they involve a comparison of different periods of a nation’s or a region’s history. The goal of such comparisons is to give a systematic account of the relevant differences. Sometimes, particularly with respect to the ancient societies, the historical record is very sparse, and the methods of history and archaeology mesh in the reconstruction of complex social arrangements and patterns of change on the basis of few fragments.

Like all research designs, comparative ones have distinctive vulnerabilities and advantages: One of the main advantages of using comparative designs is that they greatly expand the range of data, as well as the amount of variation in those data, for study. Consequently, they allow for more encompassing explanations and theories that can relate highly divergent outcomes to one another in the same framework. They also contribute to reducing any cultural biases or tendencies toward parochialism among scientists studying common human phenomena.

One main vulnerability in such designs arises from the problem of achieving comparability. Because comparative study involves studying societies and other units that are dissimilar from one another, the phenomena under study usually occur in very different contexts—so different that in some cases what is called an event in one society cannot really be regarded as the same type of event in another. For example, a vote in a Western democracy is different from a vote in an Eastern bloc country, and a voluntary vote in the United States means something different from a compulsory vote in Australia. These circumstances make for interpretive difficulties in comparing aggregate rates of voter turnout in different countries.

The problem of achieving comparability appears in historical analysis as well. For example, changes in laws and enforcement and recording procedures over time change the definition of what is and what is not a crime, and for that reason it is difficult to compare the crime rates over time. Comparative researchers struggle with this problem continually, working to fashion equivalent measures; some have suggested the use of different measures (voting, letters to the editor, street demonstration) in different societies for common variables (political participation), to try to take contextual factors into account and to achieve truer comparability.

A second vulnerability is controlling variation. Traditional experiments make conscious and elaborate efforts to control the variation of some factors and thereby assess the causal significance of others. In surveys as well as experiments, statistical methods are used to control sources of variation and assess suspected causal significance. In comparative and historical designs, this kind of control is often difficult to attain because the sources of variation are many and the number of cases few. Scientists have made efforts to approximate such control in these cases of “many variables, small N.” One is the method of paired comparisons. If an investigator isolates 15 American cities in which racial violence has been recurrent in the past 30 years, for example, it is helpful to match them with 15 cities of similar population size, geographical region, and size of minorities—such characteristics are controls—and then search for systematic differences between the two sets of cities. Another method is to select, for comparative purposes, a sample of societies that resemble one another in certain critical ways, such as size, common language, and common level of development, thus attempting to hold these factors roughly constant, and then seeking explanations among other factors in which the sampled societies differ from one another.
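
The paired-comparison idea can be sketched as a nearest-neighbor match on the control characteristics. The cities and figures below are hypothetical, and the distance function is a deliberately crude stand-in for a real matching procedure.

```python
# Hypothetical cities: (name, population in thousands, percent minority).
study_cities = [("A", 820, 34), ("B", 410, 22), ("C", 1500, 41)]
candidate_controls = [("X", 790, 31), ("Y", 430, 25), ("Z", 1450, 44), ("W", 300, 12)]

def distance(a, b):
    # Crude similarity measure on the matching variables (population, percent minority).
    return abs(a[1] - b[1]) / 100 + abs(a[2] - b[2])

pairs = []
available = list(candidate_controls)
for city in study_cities:
    match = min(available, key=lambda c: distance(city, c))
    available.remove(match)            # each control city is used at most once
    pairs.append((city[0], match[0]))

print(pairs)   # each study city matched to its most similar control city
```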

Ethnographic Designs

Traditionally identified with anthropology, ethnographic research designs are playing increasingly significant roles in most of the behavioral and social sciences. The core of this methodology is participant-observation, in which a researcher spends an extended period of time with the group under study, ideally mastering the local language, dialect, or special vocabulary, and participating in as many activities of the group as possible. This kind of participant-observation is normally coupled with extensive open-ended interviewing, in which people are asked to explain in depth the rules, norms, practices, and beliefs through which (from their point of view) they conduct their lives. A principal aim of ethnographic study is to discover the premises on which those rules, norms, practices, and beliefs are built.

The use of ethnographic designs by anthropologists has contributed significantly to the building of knowledge about social and cultural variation. And while these designs continue to center on certain long-standing features—extensive face-to-face experience in the community, linguistic competence, participation, and open-ended interviewing—there are newer trends in ethnographic work. One major trend concerns its scale. Ethnographic methods were originally developed largely for studying small-scale groupings known variously as village, folk, primitive, preliterate, or simple societies. Over the decades, these methods have increasingly been applied to the study of small groups and networks within modern (urban, industrial, complex) society, including the contemporary United States. The typical subjects of ethnographic study in modern society are small groups or relatively small social networks, such as outpatient clinics, medical schools, religious cults and churches, ethnically distinctive urban neighborhoods, corporate offices and factories, and government bureaus and legislatures.

As anthropologists moved into the study of modern societies, researchers in other disciplines—particularly sociology, psychology, and political science—began using ethnographic methods to enrich and focus their own insights and findings. At the same time, studies of large-scale structures and processes have been aided by the use of ethnographic methods, since most large-scale changes work their way into the fabric of community, neighborhood, and family, affecting the daily lives of people. Ethnographers have studied, for example, the impact of new industry and new forms of labor in “backward” regions; the impact of state-level birth control policies on ethnic groups; and the impact on residents in a region of building a dam or establishing a nuclear waste dump. Ethnographic methods have also been used to study a number of social processes that lend themselves to its particular techniques of observation and interview—processes such as the formation of class and racial identities, bureaucratic behavior, legislative coalitions and outcomes, and the formation and shifting of consumer tastes.

Advances in structured interviewing (see above) have proven especially powerful in the study of culture. Techniques for understanding kinship systems, concepts of disease, color terminologies, ethnobotany, and ethnozoology have been radically transformed and strengthened by coupling new interviewing methods with modern measurement and scaling techniques (see below). These techniques have made possible more precise comparisons among cultures and identification of the most competent and expert persons within a culture. The next step is to extend these methods to study the ways in which networks of propositions (such as boys like sports, girls like babies) are organized to form belief systems. Much evidence suggests that people typically represent the world around them by means of relatively complex cognitive models that involve interlocking propositions. The techniques of scaling have been used to develop models of how people categorize objects, and they have great potential for further development, to analyze data pertaining to cultural propositions.

Ideological Systems

Perhaps the most fruitful area for the application of ethnographic methods in recent years has been the systematic study of ideologies in modern society. Earlier studies of ideology were in small-scale societies that were rather homogeneous. In these studies researchers could report on a single culture, a uniform system of beliefs and values for the society as a whole. Modern societies are much more diverse both in origins and number of subcultures, related to different regions, communities, occupations, or ethnic groups. Yet these subcultures and ideologies share certain underlying assumptions or at least must find some accommodation with the dominant value and belief systems in the society.

The challenge is to incorporate this greater complexity of structure and process into systematic descriptions and interpretations. One line of work carried out by researchers has tried to track the ways in which ideologies are created, transmitted, and shared among large populations that have traditionally lacked the social mobility and communications technologies of the West. This work has concentrated on large-scale civilizations such as China, India, and Central America. Gradually, the focus has generalized into a concern with the relationship between the great traditions—the central lines of cosmopolitan Confucian, Hindu, or Mayan culture, including aesthetic standards, irrigation technologies, medical systems, cosmologies and calendars, legal codes, poetic genres, and religious doctrines and rites—and the little traditions, those identified with rural, peasant communities. How are the ideological doctrines and cultural values of the urban elites, the great traditions, transmitted to local communities? How are the little traditions, the ideas from the more isolated, less literate, and politically weaker groups in society, transmitted to the elites?

India and southern Asia have been fruitful areas for ethnographic research on these questions. The great Hindu tradition was present in virtually all local contexts through the presence of high-caste individuals in every community. It operated as a pervasive standard of value for all members of society, even in the face of strong little traditions. The situation is surprisingly akin to that of modern, industrialized societies. The central research questions are the degree and the nature of penetration of dominant ideology, even in groups that appear marginal and subordinate and have no strong interest in sharing the dominant value system. In this connection the lowest and poorest occupational caste—the untouchables—serves as an ultimate test of the power of ideology and cultural beliefs to unify complex hierarchical social systems.

Historical Reconstruction

Another current trend in ethnographic methods is their convergence with archival methods. One joining point is the application of descriptive and interpretative procedures used by ethnographers to reconstruct the cultures that created historical documents, diaries, and other records, to interview history, so to speak. For example, a revealing study showed how the Inquisition in the Italian countryside between the 1570s and 1640s gradually worked subtle changes in an ancient fertility cult in peasant communities; the peasant beliefs and rituals assimilated many elements of witchcraft after learning them from their persecutors. A good deal of social history—particularly that of the family—has drawn on discoveries made in the ethnographic study of primitive societies. As described in Chapter 4, this particular line of inquiry rests on a marriage of ethnographic, archival, and demographic approaches.

Other lines of ethnographic work have focused on the historical dimensions of nonliterate societies. A strikingly successful example in this kind of effort is a study of head-hunting. By combining an interpretation of local oral tradition with the fragmentary observations that were made by outside observers (such as missionaries, traders, colonial officials), historical fluctuations in the rate and significance of head-hunting were shown to be partly in response to such international forces as the great depression and World War II. Researchers are also investigating the ways in which various groups in contemporary societies invent versions of traditions that may or may not reflect the actual history of the group. This process has been observed among elites seeking political and cultural legitimation and among hard-pressed minorities (for example, the Basque in Spain, the Welsh in Great Britain) seeking roots and political mobilization in a larger society.

Ethnography is a powerful method to record, describe, and interpret the system of meanings held by groups and to discover how those meanings affect the lives of group members. It is a method well adapted to the study of situations in which people interact with one another and the researcher can interact with them as well, so that information about meanings can be evoked and observed. Ethnography is especially suited to exploration and elucidation of unsuspected connections; ideally, it is used in combination with other methods—experimental, survey, or comparative—to establish with precision the relative strengths and weaknesses of such connections. By the same token, experimental, survey, and comparative methods frequently yield connections, the meaning of which is unknown; ethnographic methods are a valuable way to determine them.

  • Models for Representing Phenomena

The objective of any science is to uncover the structure and dynamics of the phenomena that are its subject, as they are exhibited in the data. Scientists continuously try to describe possible structures and ask whether the data can, with allowance for errors of measurement, be described adequately in terms of them. Over a long time, various families of structures have recurred throughout many fields of science; these structures have become objects of study in their own right, principally by statisticians, other methodological specialists, applied mathematicians, and philosophers of logic and science. Methods have evolved to evaluate the adequacy of particular structures to account for particular types of data. In the interest of clarity we discuss these structures in this section and the analytical methods used for estimation and evaluation of them in the next section, although in practice they are closely intertwined.

A good deal of mathematical and statistical modeling attempts to describe the relations, both structural and dynamic, that hold among variables that are presumed to be representable by numbers. Such models are applicable in the behavioral and social sciences only to the extent that appropriate numerical measurement can be devised for the relevant variables. In many studies the phenomena in question and the raw data obtained are not intrinsically numerical, but qualitative, such as ethnic group identifications. The identifying numbers used to code such questionnaire categories for computers are no more than labels, which could just as well be letters or colors. One key question is whether there is some natural way to move from the qualitative aspects of such data to a structural representation that involves one of the well-understood numerical or geometric models or whether such an attempt would be inherently inappropriate for the data in question. The decision as to whether or not particular empirical data can be represented in particular numerical or more complex structures is seldom simple, and strong intuitive biases or a priori assumptions about what can and cannot be done may be misleading.

Recent decades have seen rapid and extensive development and application of analytical methods attuned to the nature and complexity of social science data. Examples of nonnumerical modeling are increasing. Moreover, the widespread availability of powerful computers is probably leading to a qualitative revolution: it is affecting not only the ability to compute numerical solutions to numerical models but also the ability to work out the consequences of all sorts of structures that do not involve numbers at all. The following discussion gives some indication of the richness of past progress and of future prospects, although it is by necessity far from exhaustive.

In describing some of the areas of new and continuing research, we have organized this section on the basis of whether the representations are fundamentally probabilistic or not. A further useful distinction is between representations of data that are highly discrete or categorical in nature (such as whether a person is male or female) and those that are continuous in nature (such as a person’s height). Of course, there are intermediate cases involving both types of variables, such as color stimuli that are characterized by discrete hues (red, green) and a continuous luminance measure. Probabilistic models lead very naturally to questions of estimation and statistical evaluation of the correspondence between data and model. Those that are not probabilistic involve additional problems of dealing with and representing sources of variability that are not explicitly modeled. At the present time, scientists understand some aspects of structure, such as geometries, and some aspects of randomness, as embodied in probability models, but do not yet adequately understand how to put the two together in a single unified model. Table 5-1 outlines the way we have organized this discussion and shows where the examples in this section lie.

Table 5-1. A Classification of Structural Models.

Probability Models

Some behavioral and social sciences variables appear to be more or less continuous, for example, utility of goods, loudness of sounds, or risk associated with uncertain alternatives. Many other variables, however, are inherently categorical, often with only two or a few values possible: for example, whether a person is in or out of school, employed or not employed, identifies with a major political party or political ideology. And some variables, such as moral attitudes, are typically measured in research with survey questions that allow only categorical responses. Much of the early probability theory was formulated only for continuous variables; its use with categorical variables was not really justified, and in some cases it may have been misleading. Recently, very significant advances have been made in how to deal explicitly with categorical variables. This section first describes several contemporary approaches to models involving categorical variables, followed by ones involving continuous representations.

Log-Linear Models for Categorical Variables

Many recent models for analyzing categorical data of the kind usually displayed as counts (cell frequencies) in multidimensional contingency tables are subsumed under the general heading of log-linear models, that is, linear models in the natural logarithms of the expected counts in each cell in the table. These recently developed forms of statistical analysis allow one to partition variability due to various sources in the distribution of categorical attributes, and to isolate the effects of particular variables or combinations of them.
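
As a rough illustration (not drawn from the report), the independence form of such a model can be fit to a small contingency table with a Poisson generalized linear model; the counts below are hypothetical, and the statsmodels library is assumed to be available.

```python
# A minimal sketch: fitting an independence log-linear model to a small
# contingency table with a Poisson GLM. All counts are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Cell counts for a 2x2 table: employment status by union membership (made up).
table = pd.DataFrame({
    "employed": ["yes", "yes", "no", "no"],
    "union":    ["yes", "no", "yes", "no"],
    "count":    [120, 340, 15, 95],
})

# log(expected count) = intercept + employed effect + union effect (no interaction).
model = smf.glm("count ~ C(employed) + C(union)",
                data=table,
                family=sm.families.Poisson()).fit()
print(model.summary())

# The residual deviance measures how badly the independence model fits; adding
# the interaction term C(employed):C(union) would saturate the 2x2 table.
print("residual deviance:", model.deviance)
```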

Log-linear models were first developed and used by statisticians and sociologists and then found extensive application in other social and behavioral science disciplines. When applied, for instance, to the analysis of social mobility, such models separate factors of occupational supply and demand from other factors that impede or propel movement up and down the social hierarchy. With such models, for example, researchers discovered the surprising fact that occupational mobility patterns are strikingly similar in many nations of the world (even among disparate nations like the United States and most of the Eastern European socialist countries), and from one time period to another, once allowance is made for differences in the distributions of occupations. The log-linear and related kinds of models have also made it possible to identify and analyze systematic differences in mobility among nations and across time. As another example of applications, psychologists and others have used log-linear models to analyze attitudes and their determinants and to link attitudes to behavior. These methods have also diffused to and been used extensively in the medical and biological sciences.

Regression Models for Categorical Variables

Models that permit one variable to be explained or predicted by means of others, called regression models, are the workhorses of much applied statistics; this is especially true when the dependent (explained) variable is continuous. For a two-valued dependent variable, such as alive or dead, models and approximate theory and computational methods for one explanatory variable were developed in biometry about 50 years ago. Computer programs able to handle many explanatory variables, continuous or categorical, are readily available today. Even now, however, the accuracy of the approximate theory on given data is an open question.
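
A minimal sketch of such a categorical-outcome regression, using scikit-learn and simulated data, might look like the following; the variable names and coefficients are hypothetical.

```python
# A minimal sketch of logistic regression for a two-valued dependent variable
# with several explanatory variables. Data are simulated, not real.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical predictors (e.g., standardized age and exposure) and a binary outcome.
X = rng.normal(size=(200, 2))
true_coef = np.array([1.5, -0.8])
p = 1.0 / (1.0 + np.exp(-(X @ true_coef)))   # true event probabilities
y = rng.binomial(1, p)                        # observed 0/1 outcomes

model = LogisticRegression().fit(X, y)
print("estimated coefficients:", model.coef_)
print("predicted probability for a new case:",
      model.predict_proba([[0.2, -1.0]])[0, 1])
```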

Using classical utility theory, economists have developed discrete choice models that turn out to be somewhat related to the log-linear and categorical regression models. Models for limited dependent variables, especially those that cannot take on values above or below a certain level (such as weeks unemployed, number of children, and years of schooling), have been used profitably in economics and in some other areas. For example, censored normal variables (called tobits in economics), in which observed values outside certain limits are simply counted, have been used in studying decisions to go on in school. It will require further research and development to incorporate information about limited ranges of variables fully into the main multivariate methodologies. In addition, with respect to the assumptions about distribution and functional form conventionally made in discrete response models, some new methods are now being developed that show promise of yielding reliable inferences without making unrealistic assumptions; further research in this area promises significant progress.

One problem arises from the fact that many of the categorical variables collected by the major data bases are ordered. For example, attitude surveys frequently use a 3-, 5-, or 7-point scale (from high to low) without specifying numerical intervals between levels. Social class and educational levels are often described by ordered categories. Ignoring order information, which many traditional statistical methods do, may be inefficient or inappropriate, but replacing the categories by successive integers or other arbitrary scores may distort the results. (For additional approaches to this question, see sections below on ordered structures.) Regression-like analysis of ordinal categorical variables is quite well developed, but their multivariate analysis needs further research. New log-bilinear models have been proposed, but to date they deal specifically with only two or three categorical variables. Additional research extending the new models, improving computational algorithms, and integrating the models with work on scaling promise to lead to valuable new knowledge.

Models for Event Histories

Event-history studies yield the sequence of events that respondents to a survey sample experience over a period of time; for example, the timing of marriage, childbearing, or labor force participation. Event-history data can be used to study educational progress, demographic processes (migration, fertility, and mortality), mergers of firms, labor market behavior, and even riots, strikes, and revolutions. As interest in such data has grown, many researchers have turned to models that pertain to changes in probabilities over time to describe when and how individuals move among a set of qualitative states.

Much of the progress in models for event-history data builds on recent developments in statistics and biostatistics for life-time, failure-time, and hazard models. Such models permit the analysis of qualitative transitions in a population whose members are undergoing partially random organic deterioration, mechanical wear, or other risks over time. With the increased complexity of event-history data that are now being collected, and the extension of event-history data bases over very long periods of time, new problems arise that cannot be effectively handled by older types of analysis. Among the problems are repeated transitions, such as between unemployment and employment or marriage and divorce; more than one time variable (such as biological age, calendar time, duration in a stage, and time exposed to some specified condition); latent variables (variables that are explicitly modeled even though not observed); gaps in the data; sample attrition that is not randomly distributed over the categories; and respondent difficulties in recalling the exact timing of events.
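
The sketch below illustrates the basic hazard-model idea with a Cox proportional-hazards fit, assuming the third-party lifelines package; the spell lengths, censoring indicators, and covariates are all hypothetical, and a small ridge penalty is added only to stabilize the tiny illustrative sample.

```python
# A minimal sketch of a hazard (event-history) model using the lifelines
# package. Variable names and data are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical spell data: months until first marriage, whether the event was
# observed before the survey ended (censoring), and two covariates.
data = pd.DataFrame({
    "months":    [14, 30, 7, 52, 22, 41, 9, 60],
    "observed":  [1, 1, 1, 0, 1, 0, 1, 0],    # 0 = censored at end of study
    "education": [12, 16, 10, 18, 12, 16, 11, 14],
    "urban":     [1, 0, 1, 1, 0, 0, 1, 0],
})

# penalizer adds a small ridge term to keep the tiny example numerically stable.
cph = CoxPHFitter(penalizer=0.1)
cph.fit(data, duration_col="months", event_col="observed")
cph.print_summary()   # hazard ratios for education and urban residence
```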

Models for Multiple-Item Measurement

For a variety of reasons, researchers typically use multiple measures (or multiple indicators) to represent theoretical concepts. Sociologists, for example, often rely on two or more variables (such as occupation and education) to measure an individual’s socioeconomic position; educational psychologists ordinarily measure a student’s ability with multiple test items. Despite the fact that the basic observations are categorical, in a number of applications this is interpreted as a partitioning of something continuous. For example, in test theory one thinks of the measures of both item difficulty and respondent ability as continuous variables, possibly multidimensional in character.

Classical test theory and newer item-response theories in psychometrics deal with the extraction of information from multiple measures. Testing, which is a major source of data in education and other areas, results in millions of test items stored in archives each year for purposes ranging from college admissions to job-training programs for industry. One goal of research on such test data is to be able to make comparisons among persons or groups even when different test items are used. Although the information collected from each respondent is intentionally incomplete in order to keep the tests short and simple, item-response techniques permit researchers to reconstitute the fragments into an accurate picture of overall group proficiencies. These new methods provide a better theoretical handle on individual differences, and they are expected to be extremely important in developing and using tests. For example, they have been used in attempts to equate different forms of a test given in successive waves during a year, a procedure made necessary in large-scale testing programs by legislation requiring disclosure of test-scoring keys at the time results are given.
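
For concreteness, here is a minimal sketch of the simplest item-response model, the one-parameter (Rasch) logistic model, in which a respondent's ability is estimated by maximum likelihood from a short pattern of right and wrong answers to items of known difficulty; all numbers are hypothetical.

```python
# A minimal sketch of the Rasch (one-parameter logistic) item-response model.
# Given calibrated item difficulties, a respondent's ability is estimated by
# maximizing the likelihood of the observed right/wrong pattern.
import numpy as np
from scipy.optimize import minimize_scalar

difficulties = np.array([-1.0, -0.3, 0.2, 0.9, 1.6])  # hypothetical item difficulties
responses    = np.array([1, 1, 1, 0, 0])               # 1 = correct, 0 = incorrect

def neg_log_likelihood(theta):
    p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))  # P(correct | ability theta)
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

theta_hat = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded").x
print(f"estimated ability: {theta_hat:.2f}")
```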

An example of the use of item-response theory in a significant research effort is the National Assessment of Educational Progress (NAEP). The goal of this project is to provide accurate, nationally representative information on the average (rather than individual) proficiency of American children in a wide variety of academic subjects as they progress through elementary and secondary school. This approach is an improvement over the use of trend data on university entrance exams, because NAEP estimates of academic achievements (by broad characteristics such as age, grade, region, ethnic background, and so on) are not distorted by the self-selected character of those students who seek admission to college, graduate, and professional programs.

Item-response theory also forms the basis of many new psychometric instruments, known as computerized adaptive testing, currently being implemented by the U.S. military services and under additional development in many testing organizations. In adaptive tests, a computer program selects items for each examinee based upon the examinee’s success with previous items. Generally, each person gets a slightly different set of items and the equivalence of scale scores is established by using item-response theory. Adaptive testing can greatly reduce the number of items needed to achieve a given level of measurement accuracy.
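
The item-selection step can be sketched very simply under the Rasch model, where an item is most informative when its difficulty is closest to the current ability estimate; the item pool and ability value below are hypothetical.

```python
# A minimal sketch of adaptive item selection: given a provisional ability
# estimate, choose the unused item with maximum Fisher information, which for
# the Rasch model is p * (1 - p). Item pool and estimate are hypothetical.
import numpy as np

theta_hat = 0.4                                    # provisional ability estimate
item_pool = np.array([-2.0, -0.5, 0.3, 1.1, 2.4])  # difficulties of unused items

p = 1.0 / (1.0 + np.exp(-(theta_hat - item_pool)))
information = p * (1 - p)
next_item = int(np.argmax(information))
print("administer item with difficulty", item_pool[next_item])
```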

Nonlinear, Nonadditive Models

Virtually all statistical models now in use impose a linearity or additivity assumption of some kind, sometimes after a nonlinear transformation of variables. Imposing these forms on relationships that do not, in fact, possess them may well result in false descriptions and spurious effects. Unwary users, especially of computer software packages, can easily be misled. But more realistic nonlinear and nonadditive multivariate models are becoming available. Extensive use with empirical data is likely to force many changes and enhancements in such models and stimulate quite different approaches to nonlinear multivariate analysis in the next decade.

Geometric and Algebraic Models

Geometric and algebraic models attempt to describe underlying structural relations among variables. In some cases they are part of a probabilistic approach, such as the algebraic models underlying regression or the geometric representations of correlations between items in a technique called factor analysis. In other cases, geometric and algebraic models are developed without explicitly modeling the element of randomness or uncertainty that is always present in the data. Although this latter approach to behavioral and social sciences problems has been less researched than the probabilistic one, there are some advantages in developing the structural aspects independent of the statistical ones. We begin the discussion with some inherently geometric representations and then turn to numerical representations for ordered data.

Although geometry is a huge mathematical topic, little of it seems directly applicable to the kinds of data encountered in the behavioral and social sciences. A major reason is that the primitive concepts normally used in geometry—points, lines, coincidence—do not correspond naturally to the kinds of qualitative observations usually obtained in behavioral and social sciences contexts. Nevertheless, since geometric representations are used to reduce bodies of data, there is a real need to develop a deeper understanding of when such representations of social or psychological data make sense. Moreover, there is a practical need to understand why geometric computer algorithms, such as those of multidimensional scaling, work as well as they apparently do. A better understanding of the algorithms will increase the efficiency and appropriateness of their use, which becomes increasingly important with the widespread availability of scaling programs for microcomputers.

Over the past 50 years several kinds of well-understood scaling techniques have been developed and widely used to assist in the search for appropriate geometric representations of empirical data. The whole field of scaling is now entering a critical juncture in terms of unifying and synthesizing what earlier appeared to be disparate contributions. Within the past few years it has become apparent that several major methods of analysis, including some that are based on probabilistic assumptions, can be unified under the rubric of a single generalized mathematical structure. For example, it has recently been demonstrated that such diverse approaches as nonmetric multidimensional scaling, principal-components analysis, factor analysis, correspondence analysis, and log-linear analysis have more in common in terms of underlying mathematical structure than had earlier been realized.

Nonmetric multidimensional scaling is a method that begins with data about the ordering established by subjective similarity (or nearness) between pairs of stimuli. The idea is to embed the stimuli into a metric space (that is, a geometry with a measure of distance between points) in such a way that distances between points corresponding to stimuli exhibit the same ordering as do the data. This method has been successfully applied to phenomena that, on other grounds, are known to be describable in terms of a specific geometric structure; such applications were used to validate the procedures. Such validation was done, for example, with respect to the perception of colors, which are known to be describable in terms of a particular three-dimensional structure known as the Euclidean color coordinates. Similar applications have been made with Morse code symbols and spoken phonemes. The technique is now used in some biological and engineering applications, as well as in some of the social sciences, as a method of data exploration and simplification.
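
As a hedged illustration of the idea, nonmetric multidimensional scaling can be run on a small matrix of (hypothetical) dissimilarity judgments using scikit-learn's MDS implementation with the metric option turned off.

```python
# A minimal sketch of nonmetric multidimensional scaling with scikit-learn:
# pairwise dissimilarity judgments are embedded in two dimensions so that
# interpoint distances reproduce the observed ordering as closely as possible.
import numpy as np
from sklearn.manifold import MDS

# Hypothetical symmetric dissimilarities among four stimuli (larger = less similar).
dissimilarities = np.array([
    [0.0, 2.0, 5.0, 6.0],
    [2.0, 0.0, 4.0, 5.5],
    [5.0, 4.0, 0.0, 1.5],
    [6.0, 5.5, 1.5, 0.0],
])

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarities)
print(coords)       # 2-D coordinates for the four stimuli
print(mds.stress_)  # badness-of-fit of the recovered configuration
```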

One question of interest is how to develop an axiomatic basis for various geometries using as a primitive concept an observable such as the subject’s ordering of the relative similarity of one pair of stimuli to another, which is the typical starting point of such scaling. The general task is to discover properties of the qualitative data sufficient to ensure that a mapping into the geometric structure exists and, ideally, to discover an algorithm for finding it. Some work of this general type has been carried out: for example, there is an elegant set of axioms based on laws of color matching that yields the three-dimensional vectorial representation of color space. But the more general problem of understanding the conditions under which the multidimensional scaling algorithms are suitable remains unsolved. In addition, work is needed on understanding more general, non-Euclidean spatial models.

Ordered Factorial Systems

One type of structure common throughout the sciences arises when an ordered dependent variable is affected by two or more ordered independent variables. This is the situation to which regression and analysis-of-variance models are often applied; it is also the structure underlying the familiar physical identities, in which physical units are expressed as products of the powers of other units (for example, energy has the unit of mass times the square of the unit of distance divided by the square of the unit of time).

There are many examples of these types of structures in the behavioral and social sciences. One example is the ordering of preference of commodity bundles—collections of various amounts of commodities—which may be revealed directly by expressions of preference or indirectly by choices among alternative sets of bundles. A related example is preferences among alternative courses of action that involve various outcomes with differing degrees of uncertainty; this is one of the more thoroughly investigated problems because of its potential importance in decision making. A psychological example is the trade-off between delay and amount of reward, yielding those combinations that are equally reinforcing. In a common, applied kind of problem, a subject is given descriptions of people in terms of several factors, for example, intelligence, creativity, diligence, and honesty, and is asked to rate them according to a criterion such as suitability for a particular job.

In all these cases and a myriad of others like them the question is whether the regularities of the data permit a numerical representation. Initially, three types of representations were studied quite fully: the dependent variable as a sum, a product, or a weighted average of the measures associated with the independent variables. The first two representations underlie some psychological and economic investigations, as well as a considerable portion of physical measurement and modeling in classical statistics. The third representation, averaging, has proved most useful in understanding preferences among uncertain outcomes and the amalgamation of verbally described traits, as well as some physical variables.

For each of these three cases—adding, multiplying, and averaging—researchers know what properties or axioms of order the data must satisfy for such a numerical representation to be appropriate. On the assumption that one or another of these representations exists, and using numerical ratings by subjects instead of ordering, a scaling technique called functional measurement (referring to the function that describes how the dependent variable relates to the independent ones) has been developed and applied in a number of domains. What remains problematic is how to encompass at the ordinal level the fact that some random error intrudes into nearly all observations and then to show how that randomness is represented at the numerical level; this continues to be an unresolved and challenging research issue.

During the past few years considerable progress has been made in understanding certain representations inherently different from those just discussed. The work has involved three related thrusts. The first is a scheme of classifying structures according to how uniquely their representation is constrained. The three classical numerical representations are known as ordinal, interval, and ratio scale types. For systems with continuous numerical representations and of scale type at least as rich as the ratio one, it has been shown that only one additional type can exist. A second thrust is to accept structural assumptions, like factorial ones, and to derive for each scale the possible functional relations among the independent variables. And the third thrust is to develop axioms for the properties of an order relation that leads to the possible representations. Much is now known about the possible nonadditive representations of both the multifactor case and the one where stimuli can be combined, such as combining sound intensities.

Closely related to this classification of structures is the question: What statements, formulated in terms of the measures arising in such representations, can be viewed as meaningful in the sense of corresponding to something empirical? Statements here refer to any scientific assertions, including statistical ones, formulated in terms of the measures of the variables and logical and mathematical connectives. These are statements for which asserting truth or falsity makes sense. In particular, statements that remain invariant under certain symmetries of structure have played an important role in classical geometry, dimensional analysis in physics, and in relating measurement and statistical models applied to the same phenomenon. In addition, these ideas have been used to construct models in more formally developed areas of the behavioral and social sciences, such as psychophysics. Current research has emphasized the commonality of these historically independent developments and is attempting both to uncover systematic, philosophically sound arguments as to why invariance under symmetries is as important as it appears to be and to understand what to do when structures lack symmetry, as, for example, when variables have an inherent upper bound.

Many subjects do not seem to be correctly represented in terms of distances in continuous geometric space. Rather, in some cases, such as the relations among meanings of words—which is of great interest in the study of memory representations—a description in terms of tree-like, hierarchical structures appears to be more illuminating. This kind of description appears appropriate both because of the categorical nature of the judgments and the hierarchical, rather than trade-off, nature of the structure. Individual items are represented as the terminal nodes of the tree, and groupings by different degrees of similarity are shown as intermediate nodes, with the more general groupings occurring nearer the root of the tree. Clustering techniques, requiring considerable computational power, have been and are being developed. Some successful applications exist, but much more refinement is anticipated.
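
A minimal sketch of such a tree-like representation, assuming SciPy's hierarchical clustering routines and hypothetical dissimilarities among word meanings:

```python
# A minimal sketch of building a hierarchical (tree-like) representation from
# pairwise dissimilarities with agglomerative clustering. Data are hypothetical.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

words = ["dog", "wolf", "cat", "car", "truck"]
# Hypothetical dissimilarities among word meanings (symmetric, zero diagonal).
d = np.array([
    [0.0, 1.0, 3.0, 8.0, 8.5],
    [1.0, 0.0, 3.5, 8.2, 8.6],
    [3.0, 3.5, 0.0, 7.9, 8.1],
    [8.0, 8.2, 7.9, 0.0, 1.2],
    [8.5, 8.6, 8.1, 1.2, 0.0],
])

tree = linkage(squareform(d), method="average")        # condensed distances in, tree out
info = dendrogram(tree, labels=words, no_plot=True)    # terminal nodes = individual words
print(info["ivl"])   # leaf ordering implied by the tree
print(tree)          # merge history: which clusters join, and at what height
```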

Network Models

Several other lines of advanced modeling have progressed in recent years, opening new possibilities for empirical specification and testing of a variety of theories. In social network data, relationships among units, rather than the units themselves, are the primary objects of study: friendships among persons, trade ties among nations, cocitation clusters among research scientists, interlocking among corporate boards of directors. Special models for social network data have been developed in the past decade, and they give, among other things, precise new measures of the strengths of relational ties among units. A major challenge in social network data at present is to handle the statistical dependence that arises when the units sampled are related in complex ways.
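
A minimal sketch of the relational point of view, assuming the networkx package and a hypothetical friendship network, computes a few standard network measures rather than statistics on individual units:

```python
# A minimal sketch of treating relationships, not units, as the data, using
# the networkx package. The friendship ties are hypothetical.
import networkx as nx

ties = [("Ann", "Bea"), ("Ann", "Cal"), ("Bea", "Cal"),
        ("Cal", "Dev"), ("Dev", "Eli")]
g = nx.Graph(ties)

print(nx.density(g))                 # share of possible ties that are present
print(nx.clustering(g))              # local "clique-ishness" around each person
print(nx.betweenness_centrality(g))  # who brokers between otherwise distant actors
```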

  • Statistical Inference and Analysis

As was noted earlier, questions of design, representation, and analysis are intimately intertwined. Some issues of inference and analysis have been discussed above as related to specific data collection and modeling approaches. This section discusses some more general issues of statistical inference and advances in several current approaches to them.

Causal Inference

Behavioral and social scientists use statistical methods primarily to infer the effects of treatments, interventions, or policy factors. Previous chapters included many instances of causal knowledge gained this way. As noted above, the large experimental study of alternative health care financing discussed in Chapter 2 relied heavily on statistical principles and techniques, including randomization, in the design of the experiment and the analysis of the resulting data. Sophisticated designs were necessary in order to answer a variety of questions in a single large study without confusing the effects of one program difference (such as prepayment or fee for service) with the effects of another (such as different levels of deductible costs), or with effects of unobserved variables (such as genetic differences). Statistical techniques were also used to ascertain which results applied across the whole enrolled population and which were confined to certain subgroups (such as individuals with high blood pressure) and to translate utilization rates across different programs and types of patients into comparable overall dollar costs and health outcomes for alternative financing options.

A classical experiment, with systematic but randomly assigned variation of the variables of interest (or some reasonable approach to this), is usually considered the most rigorous basis from which to draw such inferences. But random samples or randomized experimental manipulations are not always feasible or ethically acceptable. Then, causal inferences must be drawn from observational studies, which, however well designed, are less able to ensure that the observed (or inferred) relationships among variables provide clear evidence on the underlying mechanisms of cause and effect.

Certain recurrent challenges have been identified in studying causal inference. One challenge arises from the selection of background variables to be measured, such as the sex, nativity, or parental religion of individuals in a comparative study of how education affects occupational success. The adequacy of classical methods of matching groups on background variables and adjusting for covariates needs further investigation. Statistical adjustment of biases linked to measured background variables is possible, but it can become complicated. Current work in adjustment for selectivity bias is aimed at weakening implausible assumptions, such as normality, when carrying out these adjustments. A second challenge is that, even after adjustment has been made for the measured background variables, other, unmeasured variables are almost always still affecting the results (such as family transfers of wealth or reading habits). Analysis of how the conclusions might change if such unmeasured variables could be taken into account is essential in attempting to make causal inferences from an observational study, and systematic work on useful statistical models for such sensitivity analyses is just beginning.

A third challenge arises from the need to distinguish among competing hypotheses when the explanatory variables are measured with different degrees of precision. Both the estimated size and the significance of an effect are diminished when it is measured with large error, and the coefficients of other correlated variables are affected even when those variables are measured perfectly. Similar results arise from conceptual errors, when one measures only proxies for a theoretical construct (such as years of education to represent amount of learning). In some cases, there are procedures for simultaneously or iteratively estimating both the precision of complex measures and their effect on a particular criterion.

Although complex models are often necessary to infer causes, once their output is available, it should be translated into understandable displays for evaluation. Results that depend on the accuracy of a multivariate model and the associated software need to be subjected to appropriate checks, including the evaluation of graphical displays, group comparisons, and other analyses.

New Statistical Techniques

Internal Resampling

One of the great contributions of twentieth-century statistics was to demonstrate how a properly drawn sample of sufficient size, even if it is only a tiny fraction of the population of interest, can yield very good estimates of most population characteristics. When enough is known at the outset about the characteristic in question—for example, that its distribution is roughly normal—inference from the sample data to the population as a whole is straightforward, and one can easily compute measures of the certainty of inference, a common example being the 95 percent confidence interval around an estimate. But population shapes are sometimes unknown or uncertain, and so inference procedures cannot be so simple. Furthermore, more often than not, it is difficult to assess even the degree of uncertainty associated with complex data and with the statistics needed to unravel complex social and behavioral phenomena.

Internal resampling methods attempt to assess this uncertainty by generating a number of simulated data sets similar to the one actually observed. The definition of similar is crucial, and many methods that exploit different types of similarity have been devised. These methods provide researchers the freedom to choose scientifically appropriate procedures and to replace procedures that are valid under assumed distributional shapes with ones that are not so restricted. Flexible and imaginative computer simulation is the key to these methods. For a simple random sample, the “bootstrap” method repeatedly resamples the obtained data (with replacement) to generate a distribution of possible data sets. The distribution of any estimator can thereby be simulated and measures of the certainty of inference be derived. The “jackknife” method repeatedly omits a fraction of the data and in this way generates a distribution of possible data sets that can also be used to estimate variability. These methods can also be used to remove or reduce bias. For example, the ratio-estimator, a statistic that is commonly used in analyzing sample surveys and censuses, is known to be biased, and the jackknife method can usually remedy this defect. The methods have been extended to other situations and types of analysis, such as multiple regression.
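
Both ideas can be sketched in a few lines of Python with NumPy; the sample and the ratio statistic below are hypothetical, and the jackknife standard-error formula is the standard leave-one-out version.

```python
# A minimal sketch of the bootstrap and the jackknife applied to a simple
# statistic (a ratio estimator). The sample is simulated, not real.
import numpy as np

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=3.0, size=50)    # e.g. household expenditures
x = y + rng.normal(scale=1.0, size=50)          # e.g. household incomes
ratio = y.sum() / x.sum()                       # the statistic of interest

# Bootstrap: resample cases with replacement and recompute the statistic.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y), size=len(y))
    boot.append(y[idx].sum() / x[idx].sum())
print("bootstrap standard error:", np.std(boot))

# Jackknife: leave out one case at a time and recompute the statistic.
jack = np.array([np.delete(y, i).sum() / np.delete(x, i).sum() for i in range(len(y))])
n = len(y)
jack_se = np.sqrt((n - 1) / n * np.sum((jack - jack.mean()) ** 2))
print("jackknife standard error:", jack_se)
```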

There are indications that under relatively general conditions, these methods, and others related to them, allow more accurate estimates of the uncertainty of inferences than do the traditional ones that are based on assumed (usually, normal) distributions when that distributional assumption is unwarranted. For complex samples, such internal resampling or subsampling facilitates estimating the sampling variances of complex statistics.

An older and simpler, but equally important, idea is to use one independent subsample in searching the data to develop a model and at least one separate subsample for estimating and testing a selected model. Otherwise, it is next to impossible to make allowances for the excessively close fitting of the model that occurs as a result of the creative search for the exact characteristics of the sample data—characteristics that are to some degree random and will not predict well to other samples.

Robust Techniques

Many technical assumptions underlie the analysis of data. Some, like the assumption that each item in a sample is drawn independently of other items, can be weakened when the data are sufficiently structured to admit simple alternative models, such as serial correlation. Usually, these models require that a few parameters be estimated. Assumptions about shapes of distributions, normality being the most common, have proved to be particularly important, and considerable progress has been made in dealing with the consequences of different assumptions.

More recently, robust techniques have been designed that permit sharp, valid discriminations among possible values of parameters of central tendency for a wide variety of alternative distributions by reducing the weight given to occasional extreme deviations. It turns out that by giving up, say, 10 percent of the discrimination that could be provided under the rather unrealistic assumption of normality, one can greatly improve performance in more realistic situations, especially when unusually large deviations are relatively common.

These valuable modifications of classical statistical techniques have been extended to multiple regression, in which procedures of iterative reweighting can now offer relatively good performance for a variety of underlying distributional shapes. They should be extended to more general schemes of analysis.
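
A minimal sketch of iteratively reweighted robust regression, assuming statsmodels and a simulated data set with one gross outlier, shows how the robust slope resists the contamination that distorts ordinary least squares:

```python
# A minimal sketch of robust (iteratively reweighted) regression with a Huber
# weight function, compared with ordinary least squares. Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)
y[0] += 25.0                                   # one gross outlier

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
robust = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # downweights extreme residuals
print("OLS slope:   ", ols.params[1])
print("robust slope:", robust.params[1])
```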

In some contexts—notably the most classical uses of analysis of variance—the use of adequate robust techniques should help to bring conventional statistical practice closer to the best standards that experts can now achieve.

Many Interrelated Parameters

In trying to give a more accurate representation of the real world than is possible with simple models, researchers sometimes use models with many parameters, all of which must be estimated from the data. Classical principles of estimation, such as straightforward maximum-likelihood, do not yield reliable estimates unless either the number of observations is much larger than the number of parameters to be estimated or special designs are used in conjunction with strong assumptions. Bayesian methods do not draw a distinction between fixed and random parameters, and so may be especially appropriate for such problems.

A variety of statistical methods have recently been developed that can be interpreted as treating many of the parameters as random quantities, or as similar to random quantities, even if they are regarded as representing fixed quantities to be estimated. Theory and practice demonstrate that such methods can improve on the simpler fixed-parameter methods from which they evolved, especially when the number of observations is not large relative to the number of parameters. Successful applications include college and graduate school admissions, where the quality of a student's previous school is treated as a random parameter when the data are insufficient to estimate it well separately. Efforts to create appropriate models using this general approach for small-area estimation and undercount adjustment in the census are important potential applications.
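
The flavor of these methods can be sketched with a simple empirical-Bayes-style shrinkage of noisy group means toward the overall mean; the school-level data are simulated, and this is only an illustrative approximation to the fuller models described above.

```python
# A minimal sketch of shrinking many noisy group means toward the grand mean,
# an empirical-Bayes-style adjustment. The school-level data are simulated.
import numpy as np

rng = np.random.default_rng(3)
n_schools, n_per_school = 40, 8
true_means = rng.normal(loc=50, scale=5, size=n_schools)
scores = true_means[:, None] + rng.normal(scale=12, size=(n_schools, n_per_school))

school_means = scores.mean(axis=1)
grand_mean = school_means.mean()
within_var = scores.var(axis=1, ddof=1).mean() / n_per_school   # sampling variance of each mean
between_var = max(school_means.var(ddof=1) - within_var, 1e-9)  # estimated spread of true means

shrink = between_var / (between_var + within_var)   # 0 = pool completely, 1 = no pooling
shrunken = grand_mean + shrink * (school_means - grand_mean)
print("shrinkage factor:", round(shrink, 2))
```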

Missing Data

In data analysis, serious problems can arise when certain kinds of (quantitative or qualitative) information are partially or wholly missing. Various approaches to dealing with these problems have been or are being developed. One method developed recently for dealing with certain aspects of missing data is called multiple imputation: each missing value in a data set is replaced by several values representing a range of possibilities, with statistical dependence among missing values reflected by linkage among their replacements. It is currently being used to handle a major problem of incompatibility between the 1980 and previous Bureau of the Census public-use tapes with respect to occupation codes. The extension of these techniques to address such problems as nonresponse to income questions in the Current Population Survey has been examined in exploratory applications with great promise.
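
One rough way to sketch the idea is with scikit-learn's experimental IterativeImputer, drawing several stochastic completions of the same incomplete data set; the records are hypothetical, and a full multiple-imputation analysis would also apply formal combining rules to the per-imputation results.

```python
# A hedged sketch of multiple imputation: the same incomplete data set is
# completed several times with posterior draws, and analyses can then be run
# on each copy and combined. Records are hypothetical.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([
    [25.0, 32000.0],
    [38.0, np.nan],      # missing income
    [51.0, 61000.0],
    [np.nan, 45000.0],   # missing age
    [44.0, 52000.0],
])

completed = []
for m in range(5):   # five completed data sets
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    completed.append(imputer.fit_transform(X))

# Compare results across imputations, e.g. the mean income estimate and its spread.
incomes = np.array([c[:, 1].mean() for c in completed])
print("pooled estimate:", incomes.mean(), "between-imputation spread:", incomes.std())
```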

Computer Packages and Expert Systems

The development of high-speed computing and data handling has fundamentally changed statistical analysis. Methodologies for all kinds of situations are rapidly being developed and made available for use in computer packages that may be incorporated into interactive expert systems. This computing capability offers the hope that much data analysis will be done more carefully and more effectively than previously and that better strategies for data analysis will move from the practice of expert statisticians, some of whom may not have tried to articulate their own strategies, to both wide discussion and general use.

But powerful tools can be hazardous, as witnessed by occasional dire misuses of existing statistical packages. Until recently the only strategies available were to train more expert methodologists or to train substantive scientists in more methodology, but without continual updating such training tends to become outmoded. Now there is the opportunity to capture in expert systems the current best methodological advice and practice. If that opportunity is exploited, standard methodological training of social scientists will shift to emphasizing strategies for using good expert systems—including understanding the nature and importance of the comments such systems provide—rather than how to patch together something on one’s own. With expert systems, almost all behavioral and social scientists should become able to conduct any of the more common styles of data analysis more effectively and with more confidence than all but the most expert do today. However, the difficulties in developing expert systems that work as hoped for should not be underestimated. Human experts cannot readily explicate all of the complex cognitive network that constitutes an important part of their knowledge. As a result, the first attempts at expert systems were not especially successful (as discussed in Chapter 1). Additional work is expected to overcome these limitations, but it is not clear how long it will take.

Exploratory Analysis and Graphic Presentation

The formal focus of much statistics research in the middle half of the twentieth century was on procedures to confirm or reject precise, a priori hypotheses developed in advance of collecting data—that is, procedures to determine statistical significance. There was relatively little systematic work on realistically rich strategies for the applied researcher to use when attacking real-world problems with their multiplicity of objectives and sources of evidence. More recently, a species of quantitative detective work, called exploratory data analysis, has received increasing attention. In this approach, the researcher seeks out possible quantitative relations that may be present in the data. The techniques are flexible and include an important component of graphic representations. While current techniques have evolved for single responses in situations of modest complexity, extensions to multiple responses and to single responses in more complex situations are now possible.

Graphic and tabular presentation is a research domain in active renaissance, stemming in part from suggestions for new kinds of graphics made possible by computer capabilities, for example, hanging histograms and easily assimilated representations of numerical vectors. Research on data presentation has been carried out by statisticians, psychologists, cartographers, and other specialists, and attempts are now being made to incorporate findings and concepts from linguistics, industrial and publishing design, aesthetics, and classification studies in library science. Another influence has been the rapidly increasing availability of powerful computational hardware and software, now available even on desktop computers. These ideas and capabilities are leading to an increasing number of behavioral experiments with substantial statistical input. Nonetheless, criteria of good graphic and tabular practice are still too much matters of tradition and dogma, without adequate empirical evidence or theoretical coherence. To broaden the respective research outlooks and vigorously develop such evidence and coherence, extended collaborations between statistical and mathematical specialists and other scientists are needed, a major objective being to understand better the visual and cognitive processes (see Chapter 1) relevant to effective use of graphic or tabular approaches.

Combining Evidence

Combining evidence from separate sources is a recurrent scientific task, and formal statistical methods for doing so go back 30 years or more. These methods include the theory and practice of combining tests of individual hypotheses, sequential design and analysis of experiments, comparisons of laboratories, and Bayesian and likelihood paradigms.

There is now growing interest in more ambitious analytical syntheses, which are often called meta-analyses. One stimulus has been the appearance of syntheses explicitly combining all existing investigations in particular fields, such as prison parole policy, classroom size in primary schools, cooperative studies of therapeutic treatments for coronary heart disease, early childhood education interventions, and weather modification experiments. In such fields, a serious approach to even the simplest question—how to put together separate estimates of effect size from separate investigations—leads quickly to difficult and interesting issues. One issue involves the lack of independence among the available studies, due, for example, to the effect of influential teachers on the research projects of their students. Another issue is selection bias, because only some of the studies carried out, usually those with “significant” findings, are available, and because the literature search may not turn up all relevant studies that are available. In addition, experts agree, although informally, that the quality of studies from different laboratories and facilities differs appreciably and that such information probably should be taken into account. Inevitably, the studies to be included used different designs and concepts and controlled or measured different variables, making it difficult to know how to combine them.
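
The simplest version of that first question, pooling independent effect-size estimates by inverse-variance weighting under a fixed-effect assumption, can be sketched as follows; the five study results are hypothetical.

```python
# A minimal sketch of fixed-effect meta-analysis: separate effect-size
# estimates are pooled by inverse-variance weighting. Study results are hypothetical.
import numpy as np

effects   = np.array([0.30, 0.12, 0.45, 0.22, 0.05])    # estimated effect sizes
variances = np.array([0.02, 0.01, 0.05, 0.015, 0.008])  # their sampling variances

weights = 1.0 / variances
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"pooled effect: {pooled:.3f}  (standard error {pooled_se:.3f})")
```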

Rich, informal syntheses, allowing for individual appraisal, may be better than catch-all formal modeling, but the literature on formal meta-analytic models is growing and may be an important area of discovery in the next decade, relevant both to statistical analysis per se and to improved syntheses in the behavioral and social and other sciences.

  • Opportunities and Needs

This chapter has cited a number of methodological topics associated with behavioral and social sciences research that appear to be particularly active and promising at the present time. As throughout the report, they constitute illustrative examples of what the committee believes to be important areas of research in the coming decade. In this section we describe recommendations for an additional $16 million annually to facilitate both the development of methodologically oriented research and, equally important, its communication throughout the research community.

Methodological studies, including early computer implementations, have for the most part been carried out by individual investigators with small teams of colleagues or students. Occasionally, such research has been associated with quite large substantive projects, and some of the current developments of computer packages, graphics, and expert systems clearly require large, organized efforts, which often lie at the boundary between grant-supported work and commercial development. As such research is often a key to understanding complex bodies of behavioral and social sciences data, it is vital to the health of these sciences that research support continue on methods relevant to problems of modeling, statistical analysis, representation, and related aspects of behavioral and social sciences data. Researchers and funding agencies should also be especially sympathetic to the inclusion of such basic methodological work in large experimental and longitudinal studies. Additional funding for work in this area, both in terms of individual research grants on methodological issues and in terms of augmentation of large projects to include additional methodological aspects, should be provided largely in the form of investigator-initiated project grants.

Ethnographic and comparative studies also typically rely on project grants to individuals and small groups of investigators. While this type of support should continue, provision should also be made to facilitate the execution of studies using these methods by research teams and to provide appropriate methodological training through the mechanisms outlined below.

Overall, we recommend an increase of $4 million in the level of investigator-initiated grant support for methodological work. An additional $1 million should be devoted to a program of centers for methodological research.

Many of the new methods and models described in the chapter, if and when adopted to any large extent, will demand substantially greater amounts of research devoted to appropriate analysis and computer implementation. New user interfaces and numerical algorithms will need to be designed and new computer programs written. And even when generally available methods (such as maximum-likelihood) are applicable, model application still requires skillful development in particular contexts. Many of the familiar general methods that are applied in the statistical analysis of data are known to provide good approximations when sample sizes are sufficiently large, but their accuracy varies with the specific model and data used. To estimate the accuracy requires extensive numerical exploration. Investigating the sensitivity of results to the assumptions of the models is important and requires still more creative, thoughtful research. It takes substantial efforts of these kinds to bring any new model on line, and the need becomes increasingly important and difficult as statistical models move toward greater realism, usefulness, complexity, and availability in computer form. More complexity in turn will increase the demand for computational power. Although most of this demand can be satisfied by increasingly powerful desktop computers, some access to mainframe and even supercomputers will be needed in selected cases. We recommend an additional $4 million annually to cover the growth in computational demands for model development and testing.

Interaction and cooperation between the developers and the users of statistical and mathematical methods need continual stimulation—both ways. Efforts should be made to teach new methods to a wider variety of potential users than is now the case. Several ways appear effective for methodologists to communicate to empirical scientists: running summer training programs for graduate students, faculty, and other researchers; encouraging graduate students, perhaps through degree requirements, to make greater use of the statistical, mathematical, and methodological resources at their own or affiliated universities; associating statistical and mathematical research specialists with large-scale data collection projects; and developing statistical packages that incorporate expert systems in applying the methods.

Methodologists, in turn, need to become more familiar with the problems actually faced by empirical scientists in the laboratory and especially in the field. Several ways appear useful for communication in this direction: encouraging graduate students in methodological specialties, perhaps through degree requirements, to work directly on empirical research; creating postdoctoral fellowships aimed at integrating such specialists into ongoing data collection projects; and providing for large data collection projects to engage relevant methodological specialists. In addition, research on and development of statistical packages and expert systems should be encouraged to involve the multidisciplinary collaboration of experts with experience in statistical, computer, and cognitive sciences.

A final point has to do with the promise held out by bringing different research methods to bear on the same problems. As our discussions of research methods in this and other chapters have emphasized, different methods have different powers and limitations, and each is designed especially to elucidate one or more particular facets of a subject. An important type of interdisciplinary work is the collaboration of specialists in different research methodologies on a substantive issue, examples of which have been noted throughout this report. If more such research were conducted cooperatively, the power of each method pursued separately would be increased. To encourage such multidisciplinary work, we recommend increased support for fellowships, research workshops, and training institutes.

Funding for fellowships, both pre- and postdoctoral, should be aimed at giving methodologists experience with substantive problems and at upgrading the methodological capabilities of substantive scientists. Such targeted fellowship support should be increased by $4 million annually, of which $3 million should be for predoctoral fellowships emphasizing the enrichment of methodological concentrations. The new support needed for research workshops is estimated to be $1 million annually. And new support needed for various kinds of advanced training institutes aimed at rapidly diffusing new methodological findings among substantive scientists is estimated to be $2 million annually.

Source: National Research Council, Commission on Behavioral and Social Sciences and Education, Committee on Basic Research in the Behavioral and Social Sciences; Gerstein DR, Luce RD, Smelser NJ, et al., editors. The Behavioral and Social Sciences: Achievements and Opportunities. Washington, DC: National Academies Press; 1988. Chapter 5, Methods of Data Collection, Representation, and Analysis.

Data Collection Methods | Step-by-Step Guide & Examples

Published on 4 May 2022 by Pritha Bhandari.

Data collection is a systematic process of gathering observations or measurements. Whether you are performing research for business, governmental, or academic purposes, data collection allows you to gain first-hand knowledge and original insights into your research problem.

While methods and aims may differ between fields, the overall process of data collection remains largely the same. Before you begin collecting data, you need to consider:

  • The aim of the research
  • The type of data that you will collect
  • The methods and procedures you will use to collect, store, and process the data

To collect high-quality data that is relevant to your purposes, follow these four steps.

Table of contents

  • Step 1: Define the aim of your research
  • Step 2: Choose your data collection method
  • Step 3: Plan your data collection procedures
  • Step 4: Collect the data
  • Frequently asked questions about data collection

Step 1: Define the aim of your research

Before you start the process of data collection, you need to identify exactly what you want to achieve. You can start by writing a problem statement: what is the practical or scientific issue that you want to address, and why does it matter?

Next, formulate one or more research questions that precisely define what you want to find out. Depending on your research questions, you might need to collect quantitative or qualitative data:

  • Quantitative data is expressed in numbers and graphs and is analysed through statistical methods.
  • Qualitative data is expressed in words and analysed through interpretations and categorisations.

If your aim is to test a hypothesis, measure something precisely, or gain large-scale statistical insights, collect quantitative data. If your aim is to explore ideas, understand experiences, or gain detailed insights into a specific context, collect qualitative data.

If you have several aims, you can use a mixed methods approach that collects both types of data.

For example, in a mixed methods study of how employees perceive their managers:

  • Your first aim is to assess whether there are significant differences in perceptions of managers across different departments and office locations.
  • Your second aim is to gather meaningful feedback from employees to explore new ideas for how managers can improve.


Step 2: Choose your data collection method

Based on the data you want to collect, decide which method is best suited for your research.

  • Experimental research is primarily a quantitative method.
  • Interviews, focus groups, and ethnographies are qualitative methods.
  • Surveys, observations, archival research, and secondary data collection can be quantitative or qualitative methods.

Carefully consider what method you will use to gather data that helps you directly answer your research questions.

Step 3: Plan your data collection procedures

When you know which method(s) you are using, you need to plan exactly how you will implement them. What procedures will you follow to make accurate observations or measurements of the variables you are interested in?

For instance, if you’re conducting surveys or interviews, decide what form the questions will take; if you’re conducting an experiment, make decisions about your experimental design.

Operationalisation

Sometimes your variables can be measured directly: for example, you can collect data on the average age of employees simply by asking for dates of birth. However, often you’ll be interested in collecting data on more abstract concepts or variables that can’t be directly observed.

Operationalisation means turning abstract conceptual ideas into measurable observations. When planning how you will collect data, you need to translate the conceptual definition of what you want to study into the operational definition of what you will actually measure.

For example, to measure the abstract concept of leadership quality:

  • You ask managers to rate their own leadership skills on 5-point scales assessing the ability to delegate, decisiveness, and dependability.
  • You ask their direct employees to provide anonymous feedback on the managers regarding the same topics.
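As a rough sketch of what this operationalisation could look like in practice (my own illustration, not part of the guide), the concept of leadership quality might be scored as the mean of the three 5-point rating items; all names and numbers below are invented.

```python
# Minimal sketch: operationalising "leadership quality" as the mean of three
# hypothetical 5-point rating items (delegation, decisiveness, dependability).
from statistics import mean

# Invented responses: one dictionary of item ratings per manager (1 = poor, 5 = excellent).
responses = {
    "manager_a": {"delegation": 4, "decisiveness": 5, "dependability": 3},
    "manager_b": {"delegation": 2, "decisiveness": 3, "dependability": 4},
}

def leadership_score(ratings: dict) -> float:
    """Operational definition: the mean of the three rating items."""
    return mean(ratings.values())

for manager, ratings in responses.items():
    print(manager, round(leadership_score(ratings), 2))
```

The same operational definition would then be applied to the anonymous employee ratings, so the two sources of data remain directly comparable.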

You may need to develop a sampling plan to obtain data systematically. This involves defining a population, the group you want to draw conclusions about, and a sample, the group you will actually collect data from.

Your sampling method will determine how you recruit participants or obtain measurements for your study. To decide on a sampling method you will need to consider factors like the required sample size, accessibility of the sample, and time frame of the data collection.
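As a small illustration of what a sampling plan can translate into (my own assumption, not a step prescribed by the guide), a simple random sample can be drawn from a hypothetical sampling frame of employee IDs; fixing the random seed documents the draw so it can be reproduced.

```python
# Minimal sketch: draw a simple random sample from a hypothetical sampling frame.
import random

population = [f"EMP{n:04d}" for n in range(1, 1001)]  # invented frame of 1,000 employee IDs
sample_size = 100

random.seed(42)  # fixed seed so the draw can be documented and reproduced
sample = random.sample(population, sample_size)

print(len(sample), sample[:5])
```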

Standardising procedures

If multiple researchers are involved, write a detailed manual to standardise data collection procedures in your study.

This means laying out specific step-by-step instructions so that everyone in your research team collects data in a consistent way – for example, by conducting experiments under the same conditions and using objective criteria to record and categorise observations.

This helps ensure the reliability of your data, and you can also use it to replicate the study in the future.

Creating a data management plan

Before beginning data collection, you should also decide how you will organise and store your data.

  • If you are collecting data from people, you will likely need to anonymise and safeguard the data to prevent leaks of sensitive information (e.g. names or identity numbers) – a small pseudonymisation sketch follows this list.
  • If you are collecting data via interviews or pencil-and-paper formats, you will need to perform transcriptions or data entry in systematic ways to minimise distortion.
  • You can prevent loss of data by having an organisation system that is routinely backed up.
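To make the first point concrete, here is a minimal pseudonymisation sketch of my own (not taken from the guide): participant names are replaced with a keyed hash so records can still be linked without storing identities. The field names and the secret key are placeholders.

```python
# Minimal sketch: pseudonymise participant identifiers with a keyed hash before storage.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-key-kept-outside-the-dataset"  # placeholder

def pseudonymise(identifier: str) -> str:
    """Return a stable pseudonym so records can be linked without storing names."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:12]

record = {"name": "Jane Doe", "department": "Sales", "rating": 4}
record["participant_id"] = pseudonymise(record.pop("name"))
print(record)  # the name is gone; only the pseudonym remains
```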

Step 4: Collect the data

Finally, you can implement your chosen methods to measure or observe the variables you are interested in.

For example, closed-ended survey questions might ask participants to rate their manager’s leadership skills on scales from 1 to 5. The data produced is numerical and can be statistically analysed for averages and patterns.

To ensure that high-quality data is recorded in a systematic way, here are some best practices:

  • Record all relevant information as and when you obtain data. For example, note down whether or how lab equipment is recalibrated during an experimental study.
  • Double-check manual data entry for errors.
  • If you collect quantitative data, you can assess the reliability and validity to get an indication of your data quality.
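For the last point, one common indicator of reliability for a multi-item rating scale is Cronbach’s alpha. The sketch below is my own illustration with invented responses (rows are respondents, columns are the three 1-to-5 leadership items), not a procedure taken from the guide.

```python
# Minimal sketch: item means and Cronbach's alpha for an invented response matrix.
from statistics import mean, pvariance

responses = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 3, 2],
    [4, 4, 5],
]

def cronbach_alpha(rows):
    k = len(rows[0])                          # number of items
    items = list(zip(*rows))                  # scores grouped per item
    item_var = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)

print("item means:", [round(mean(item), 2) for item in zip(*responses)])
print("alpha:", round(cronbach_alpha(responses), 2))
```

Values of alpha close to 1 suggest the items measure the same underlying construct consistently; very low values suggest the scale needs revision.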

Frequently asked questions about data collection

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organisations.

When conducting research, collecting original data has significant advantages:

  • You can tailor data collection to your specific research aims (e.g., understanding the needs of your consumers or user testing your website).
  • You can control and standardise the process for high reliability and validity (e.g., choosing appropriate measurements and sampling methods).

However, there are also some drawbacks: data collection can be time-consuming, labour-intensive, and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to test a hypothesis by systematically collecting and analysing data, while qualitative methods allow you to explore ideas and experiences in depth.

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.

Operationalisation means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioural avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data, it’s important to consider how you will operationalise the variables that you want to measure.



CPE/CSC 484-S13 User-Centered Design and Development


Assignment 4: User Data Collection

Team Assignment

This assignment is to be performed in a team of about 3-5 people, preferably the same team as the project.

Goals and Objectives

  • Get practical experience with the design, planning, and conduct of data collection activities.
  • Perform an evaluation of the collected data with particular emphasis on usability.
  • Utilize the collected data to improve the design of a system or product.
  • Collect information for adjustments to the design and development process used.

Description

In this assignment you will select a topic or task you need user input for, choose a data collection method appropriate for this purpose, set up a data collection station, and conduct a sample session with outside participants, who should be as close to the intended users as possible. If possible, the topic should be related to your team project; if your team feels that it is not suitable, you can select a different topic for the user data collection task, but please talk to me first.

Look at this activity from the perspective of team members who need to justify to their boss the overhead and expenses involved in this data collection activity. If you add up the actual expenses, the time spent by the developers, the time spent by the participants, and possibly the costs associated with the use of external participants, you will quickly see that such activities can become quite expensive. So it is very important to determine what you want to achieve, to carefully plan the setup and procedures, to perform a thorough evaluation of the collected data, and to complete this assignment professionally.

Data Collection Station

The most frequently used methods are video recording (either with a video camera or a webcam), audio recording (with a tape recorder, dictation device, or electronic voice recorder), and the logging of computer activities (keystrokes, mouse movements). Morae and similar usability evaluation suites provide these capabilities in an integrated manner. During the S12 quarter, I expect all teams to use Morae for their data collection activities in order to gain experience with it. We can also get additional equipment from Cal Poly's Media Services department; please let me know as soon as possible if your team requires specialized equipment.
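As a lightweight complement to the recording tools above (my own sketch, not a course requirement), a note-taker can log timestamped observations to a CSV file during the session; the file name and fields are arbitrary choices.

```python
# Minimal sketch: append timestamped observation notes to a CSV session log.
import csv
from datetime import datetime

LOG_FILE = "session_log.csv"  # placeholder file name

def log_event(participant: str, note: str) -> None:
    """Append one timestamped observation to the session log."""
    with open(LOG_FILE, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([datetime.now().isoformat(timespec="seconds"), participant, note])

log_event("P01", "hesitated on the checkout screen for about 20 seconds")
log_event("P01", "asked whether the back button would lose entered data")
```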

Participant Consent

Experiments involving humans (and possibly also those involving animals) are subject to laws and guidelines. Cal Poly has a Human Subjects Committee, which maintains a Web page on this topic at http://rgp.calpoly.edu/indexHS.html. While classroom activities like ours are exempt from the more stringent requirements (such as approval of the experiments), you need to be aware of these policies and some of the laws they are based on. In general, you should tell participants in advance what you expect them to do, what the potential difficulties and risks are, and what you plan to do with the data collected. This is of course most essential if you are collecting personal data. I strongly recommend refraining from collecting potentially sensitive data like home addresses, home phone numbers, racial or ethnic background, religion or faith, sexual orientation, etc. Additionally, I cannot possibly imagine a scenario for this exercise where it would be legitimate for you to collect data like Social Security, credit card, or bank information.

For the more common activities like questionnaires, surveys, or focus groups, it is common practice to use "informed consent" forms that participants sign at the beginning. You can find examples on the Human Subjects Committee Web page (see above, or http://rgp.calpoly.edu/formsHS.html for a template to be used with anonymous questionnaires).

Submission and Deadline

You can view the deliverable for this activity as a report to your boss, or to venture capitalists who might be interested in providing funding for your project. Of course, the main emphasis is to convince them that your overall project is great, but in this particular part, you can actually collect data that show them what others think of it.

At the end of the next lab period after the data collection station event, post the materials you used on the Trac Wiki, or an alternative repository if confidentiality is required. This should include:

  • A brief description of the topic, and the reasons you selected it.
  • A description of the data collection technique you chose, and why.
  • Materials used during the data collection itself, such as consent forms, explanations of the procedure, questionnaires, feedback forms, etc.
  • Excerpts from protocols of data collection activities. If possible, include some typical events, and also some unexpected ones.
  • A summary of the overall outcome, including common themes in the activities and reactions of the users, observations that confirmed your expectations, and findings that surprised you. It is also possible that you have conflicting evidence, which could indicate diversity among the participants or deficiencies in your data collection technique; you should address this as well.
  • An evaluation of the overall data collection activity, including suggestions for improvement, and an assessment of the suitability of the data collection technique you chose.

Criteria: Collection Station

  • Difficulty of chosen collection method
  • Consent form (specific to the method)
  • Script, schedule, questionnaire, etc. available
  • Setup of the station (equipment, facilities)
  • Interaction with the participants
  • Cleanup after the event

Criteria: Documents

  • Quality of the documents
  • Capturing of the data (e.g. audio/video recording, notes, computer)
  • Organization of data (spreadsheet, database)
  • Evaluation of data
  • Discussion of the results
  • Presentation of the results
  • Evaluation of the overall experiment

Some materials from last year's projects and assignments are available on the respective team Wikis; see http://users.csc.calpoly.edu/~fkurfess/Courses/484/S12/Project/Project/Teams.html for the ones from last year, and http://users.csc.calpoly.edu/~fkurfess/Courses/Recent-Courses.shtml for a list of courses going back to 2000 (not all projects are available anymore, and some are on confidential repositories).

Collection of Data

A fundamental question to be considered at the outset is whether the collection of data should be done by complete enumeration or by sampling. In complete enumeration, each and every individual of the group to which the data are to relate is covered, and information is gathered for each individual separately.

In sampling, only some individuals forming a representative part of the group are covered, either because the group is too large or because the items on which information is sought are too numerous.

Complete enumeration may lead to greater accuracy and greater refinement in analysis, but it may be a very expensive and time-consuming operation. A sample designed and taken with care can produce results that may be sufficiently accurate for the purpose of the enquiry, and it can save much time and money.
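A quick calculation with invented numbers illustrates why a carefully drawn sample is often accurate enough: the approximate 95% margin of error of a sample proportion depends on the sample size, not on the size of the group being studied.

```python
# Minimal sketch: approximate 95% margin of error for a simple random sample proportion.
import math

def margin_of_error(p: float, n: int) -> float:
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (400, 1_600, 10_000):
    print(f"n = {n:>6}: +/- {margin_of_error(0.5, n):.3f}")
# A sample of 1,600 already gives roughly +/- 2.5 percentage points, at a tiny
# fraction of the cost of enumerating every individual in a large group.
```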

The information sought may be gathered from the individuals of the whole group (called the population) or from those of the sample by one of three methods:

i) The questionnaire method
ii) The interviewer method
iii) The method of direct observation

Questionnaire method: Each informant (or respondent) is provided with a questionnaire, usually sent by mail with return postage prepaid, and is asked to supply the information in the form of answers to the questions.

This method can be effective only when the informants have attained a certain level of education.

The drawback of the method is that the informants may not take sufficient interest in the enquiry even if they are sufficiently enlightened. Consequently, the data may involve a high percentage of non-response and thus fail to reflect the true state of the field of enquiry.

Interviewer method: Enumerators go from one informant to another and elicit the required information. This method is used in population censuses. It also has to be employed when the informants are not all literate or, even if literate, have not attained the requisite educational level.

For instance, if one is interested in family income and expenditure on different items, one may arrange to interview the head of each family and collect the information sought from him. The data collected by this method are likely to be more accurate, since a tactful investigator may persuade the informant to supply the required information, and the meaning of each question may be properly explained to him so that the answer may be correct and to the point.

Method of Direct Observation: The enquirer or his assistants get the data directly from the field of enquiry without having to depend on the co-operation of informants.

When data are needed on the heights and weights of, say, 200 college students, they will be approached individually, and the height of each (say in cm) will be measured with a tape and the weight (say in kg) with a weighing balance.

If data are needed on the sentence lengths in a novel by, say, Bankimchandra, the enquirer himself will go through the book and note the length of each sentence, i.e., the number of words it contains.

On the other hand, if data are required on the incidence of blindness among a group of people, one will just observe each member of the group and note whether he or she is or is not blind.

The direct method of data collection may, therefore, involve either measurement or counting, or both.
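The sentence-length example above lends itself to a small script. The sketch below is my own illustration: the file name is a placeholder, and splitting on ., ! and ? is only a rough approximation of sentence boundaries.

```python
# Minimal sketch: count the number of words in each sentence of a text file.
import re

with open("novel.txt", encoding="utf-8") as f:   # placeholder input file
    text = f.read()

sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
lengths = [len(s.split()) for s in sentences]

print("sentences observed:", len(lengths))
print("first few lengths :", lengths[:10])
```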



U.S. Food and Drug Administration


CDER Center for Clinical Trial Innovation (C3TI)

Selective Safety Data Collection (SSDC) Demonstration Project

Selective Safety Data Collection (SSDC) offers an innovative approach to facilitate the conduct of large-scale efficacy and safety trials through the purposeful reduction in the collection of certain types of data for drugs or biologics with a well-characterized safety profile. C3TI aims to promote the adoption of SSDC principles into appropriate drug development programs. The SSDC demonstration project aims to partner with sponsors to strategically streamline data collection in some late-stage pre-approval or post-approval trials, creating benefits such as reducing the burden of collecting unnecessary safety data, eliminating unnecessary expense and allocating resources to the relevant objectives of a study, and facilitating trial conduct to answer important scientific questions on long-term efficacy and safety of drugs and biologics.

CDER partnership with sponsors to implement SSDC in appropriately identified trials will result in improved understanding of its real-world applicability, demonstration of its ability to facilitate efficient clinical trials, identification of potential challenges that programs encounter and ways to address those challenges, and promotion of best practices. This demonstration project intends to provide lessons learned from prospectively planned clinical trials integrating SSDC through close communication and collaboration among sponsors and CDER subject matter experts and review staff. Ideal studies include specific late-stage pre- or post-marketing studies of drugs where the safety profile, with respect to commonly occurring adverse events, is well-understood and documented.

Benefits of participation

By participating, sponsor(s) would receive additional CDER engagement support for trial design and implementation aspects, which includes leaders across several CDER offices (e.g., Office of New Drugs, Office of Medical Policy, Office of Translational Sciences). Engagement may include access to additional coordination support with CDER subject matter experts and an inspection process that is fit-for-purpose for the innovative design (i.e., focused on a quality by design approach).

Eligibility Criteria for SSDC Demonstration Project Proposals

  • The sponsor has an active pre-Investigational New Drug (IND) or IND for the product(s) included in the proposal.
  • Trial for an approved drug, seeking a new indication in a similar population to that in which it was already approved
  • Trial for an approved drug, seeking to expand the label to include additional endpoints in the same patient population
  • Safety trial investigating a very specific safety concern (e.g., a PMR under FDAAA)
  • Trial designed to provide additional evidence of efficacy when current data support a well-characterized safety profile
  • Sponsors participating in demonstration projects will be expected to share select details of their clinical trials and the implementation of clinical trial innovations as they progress, starting as early as the finalization of study design. This sharing may include updates, lessons learned, and relevant insights gathered during the demonstration trial. It is understood that these shared details will reflect general principles and innovative aspects, while maintaining the necessary confidentiality of proprietary or sensitive information.

Instructions on how to submit a proposal can be found on the C3TI Demonstration Program Proposal Submission webpage.


Export Planning Data to CSV Files

You can replicate sourcing rules and assignment set data from your Oracle Fusion Cloud Supply Planning instance to another instance. When exporting data to CSV files, you can export a subset of the measures that are listed in a plan's measure catalog.

Replicate Sourcing Rules and Assignment Set Data

You can replicate sourcing rules and assignment set data from your Oracle Fusion Cloud Supply Planning instance to another instance or application without manually reentering data. For example, you can replicate the sourcing rules in your Oracle Supply Chain Planning production instance into a reporting system or a test instance.

Use the Export Supply Chain Planning Data scheduled process job to extract sourcing rules and assignment set data into a comma-delimited values (CSV) file format. The generated file is in the standard file-based import (FBDI) file format that you can then use to do the following:

Check for missing or incorrect sourcing rules or sourcing assignments. For example, does a particular item-organization have a valid sourcing assignment?

Replicate and then modify existing sourcing rules and assignment sets in a spreadsheet. You can then upload the new setups into another instance or application to create new sourcing rule and assignment set setups.

Copy the sourcing rule and assignment set setups from one instance to another instance or application, such as from a test instance to a production instance.

The format of the CSV files, including the header content, is the same as the format of the corresponding files in the source instance. This enables you to import the same data directly into another production or test instance. Use the Load Planning Data from Files process to upload your imported files to staging tables.

The following steps show you how to export sourcing rules and assignment set data to a CSV file:

1. In the Scheduled Processes work area, click the Schedule New Process button on the Overview page.
2. In the Schedule New Process dialog box, search for and select Export Supply Chain Planning Data, and then click OK.
3. In the Process Details dialog box, Basic Options section, select Sourcing Rules and Assignment Sets from the Entities to Export drop-down list.
4. Optionally, click the Filter button to filter your export with specific assignment sets. If you don't select any filters, the process exports all sourcing rules and assignment sets.
5. In the Process Details dialog box, click the Submit button to submit the scheduled process job.

After you submit your job, you will receive a confirmation with a confirmation number. The planning process appends the confirmation number to the extracted CSV file names, such as AssignmentSets_65960.csv and SourcingRules_65960.csv, where 65960 is the confirmation number.
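As one way to act on the first use listed earlier (verifying that a particular item-organization has a valid sourcing assignment), the exported file can be scanned after the job completes. The sketch below is my own and is not an Oracle-documented procedure; the column names ("Item", "Organization", "Sourcing Rule") are placeholders, so check the headers of your own extract before relying on them.

```python
# Minimal sketch: flag expected item-organizations that have no sourcing assignment
# in the exported CSV. Column names are placeholders, not documented Oracle headers.
import csv

expected = {("AS54888", "M1"), ("AS54888", "M2")}   # hypothetical item-organizations to verify
assigned = set()

with open("AssignmentSets_65960.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row.get("Sourcing Rule"):
            assigned.add((row.get("Item"), row.get("Organization")))

missing = expected - assigned
print("item-organizations without a sourcing assignment:", missing or "none")
```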

Export Measures Along With Planning Data

While exporting planning data, you can include a subset of measures that are part of a plan's measure catalog.

You can assign a measure catalog other than a plan’s extract measure catalog defined in Plan options, in the Manage Plans UI. This improves the versatility and quality of data extracts required for planning.

Depending on your planning cycle and data needs, extract only those measures that are most relevant for the planning activity. For example, for a given plan you can export demand related measures early in a planning cycle and supply related measures later in the cycle.

You can also assign a measure catalog when running the Prepare Planning Data for Export scheduled process.


Abortion data collection bill latest flare-up over reproductive rights in NH

Protestors gathered in Manchester after the Dobbs decision in June 2022.

On a party-line vote, Republicans in the New Hampshire Senate last week approved a bill that would require abortion providers to share certain data about the procedures they perform with state public health officials.

Forty-six other states already have similar laws in place, making New Hampshire an outlier in the dissemination of abortion statistics.

“I've heard debates on the floor many times that we just don’t have the information, we don’t have the data,” Republican Sen. Regina Birdsell said on the Senate floor last week. “Well, guess what: This will do it.”

But for Democrats, the proposal, which was added late in the legislative process and therefore not subject to a public hearing, is the GOP’s latest attempt to chip away at abortion rights and curtail personal freedoms.

“Ever since the Dobbs decision, we have been living in a dystopian horror show with control of pregnant bodies the main plot line,” Democrat Sen. Debra Altschiller said during debate on the measure.

New Hampshire Republicans have tried unsuccessfully numerous times in recent years to require abortion providers to release certain statistics. The latest effort calls for providers to share the date and location of each abortion, the method used, including if a medication was prescribed, as well as share the state of residence of the pregnant patient, and the gestational age of the fetus.

The state Department of Health and Human Services would then publish data annually on abortions in New Hampshire, though the bill doesn’t clarify if the information would be released in an aggregated form, or if the county or even the zip code of the provider would be disclosed.

Democrats argued that level of data shared publicly could put providers at risk for harassment or other targeting; they also questioned how gestational age should be determined by the provider, since the bill lacks any detail.

Scene from an anti-abortion rally in Manchester, New Hampshire in June of 2022.

“This amendment would potentially require a government-forced, potentially medically unnecessary, intrusive trans-vaginal ultrasound,” Sen. Becky Whitley said during a debate last Friday that grew tense at times. “That should send chills down the spine of every woman in the state.”

Senate Majority Leader Sharon Carson, a Republican, rose to her feet, saying she was baffled by the claim.

“I can’t believe what I’m hearing here. I really and truly cannot,” said Carson. “There’s no requirement for any kind of testing here. No ultrasound, no nothing.”

Carson accused Democrats of spreading misinformation about the bill, and in a statement this week reiterated that an ultrasound is not the only way providers could determine age in compliance with the bill.

In practice, providers say ultrasounds are performed before abortions when it makes sense for the patient. But there are other ways to determine gestational age, including using the date of the last menstrual cycle. States including Maine, Massachusetts and Vermont permit abortion providers to estimate the age of the fetus using that information.

A tool for sound policy, or for scoring political points

Abortion providers in New Hampshire say they aren’t opposed to producing and sharing protected, anonymized data, as long as it is used to advance public health policy.

“However, where we need clarity whenever we consider the request to supply abortion data is really we need to know specifically or with some clarity, what the anticipated public health benefit is and how the data may be used,” said Sandi Denoncour, executive director of Lovering Health Center in Greenland.

New Hampshire State House, Concord, NH.

Abortion rights supporters point to what they see as a history of states using reporting requirements to bog down abortion providers with paperwork. Other states have also required providers to collect invasive or what they see as irrelevant information about the patient, including their history of contraceptive use.

“They're not really being used for public health purposes,” said Rachel Jones, a researcher with the Guttmacher Institute, one of the country’s leading research institutions on abortion. “They're being used to further stigmatize abortion and increase the burden on the facilities that provide this care.”

While New Hampshire, California, Maryland, and New Jersey are the only states without reporting mandates, Guttmacher's website does maintain abortion statistics for procedures performed in New Hampshire.

In 2023, Guttmacher estimates there were 2,400 abortions performed in the state.

That data is based on voluntary reporting by local clinics, including Lovering Health Center and Planned Parenthood of Northern New England.

Those clinics willingly share aggregate abortion numbers, they said, because they trust Guttmacher to use the data for research purposes.

After clearing the state Senate on a party-line vote, the bill mandating the reporting of abortion statistics now heads to the New Hampshire House, where it will get a full public hearing and could be amended.

Gov. Chris Sununu has previously said he supports the state collecting data.



Secret Rift Over Data Center Fueled Push to Expand Reach of Surveillance Program

Privacy advocates are raising alarms about a mysterious provision the House added to a surveillance bill last week. The Senate is likely to vote on the bill later this week.


By Charlie Savage

Charlie Savage has been writing about national security and legal policy, including surveillance, for more than two decades. He reported from Washington.

A hidden dispute over whether a data center for cloud computing must cooperate with a warrantless surveillance program prompted the House last week to add a mysterious provision to a bill extending the program, according to people familiar with the matter.

The disclosure helps clarify the intent behind an amendment that has alarmed privacy advocates as Senate leaders try to swiftly pass the bill, which would add two more years to a wiretapping law known as Section 702. The provision would add to the types of service providers that could be compelled to participate in the program, but it is written in enigmatic terms that make it hard to understand what it is supposed to permit.

Data centers are centralized warehouses of computer servers that can be accessed over the internet from anywhere in the world. In the cloud computing era, they are increasingly operated by third parties that rent out the storage space and computing power that make other companies’ online services work.

Even as national security officials described the provision as a narrow fix to a technical issue, they have declined to explain a classified court ruling from 2022 to which the provision is a response, citing the risk of tipping off foreign adversaries. Privacy advocates, for their part, have portrayed the amendment as dangerous, so broadly worded that it could be used to draft ordinary service people — like cable installers, janitors or plumbers who can gain physical access to office computer equipment — to act as spies.

Under Section 702, the government may collect, without a warrant and from U.S. companies like Google and AT&T, the communications of foreigners abroad who have been targeted for intelligence or counterterrorism purposes — even when they are communicating with Americans. Enacted in 2008, it legalized a form of the warrantless surveillance program President George W. Bush began after the terrorist attacks of Sept. 11, 2001.

Specifically, after the court that oversees national security surveillance approves the government’s annual requests seeking to renew the program and setting rules for it, the administration sends directives to “electronic communications service providers” that require them to participate. If any such entity balks, the court decides whether it must cooperate.

Last August, the government partly declassified court rulings centered on the dispute. The surveillance court in 2022, and an appeals court panel a year later, sided with an unidentified company that had objected to being compelled to participate in the program because it believed one of its services did not fit the necessary criteria.

The details were redacted. But according to the people familiar with the matter, who spoke on the condition of anonymity to discuss a sensitive matter, the judges found that a data center service does not fit the legal definition of an “electronic communications service provider” because it does not itself give its users the ability to send or receive electronic messages.

Unredacted portions in both rulings suggested that Congress update the definition if the interpretation was a problem. “If the government believes that the scope of Section 702 directives should be broadened as a matter of national security policy, its recourse is with Congress,” wrote Judge Rudolph Contreras, then the presiding judge of the surveillance court.

And the appellate panel noted that the definition invoked in Section 702 traces back to a law Congress wrote in 1986, meaning that it was “premised on internet architecture now almost 40 years old.” They added, “Any unintended gap in coverage revealed by our interpretation is, of course, open to reconsideration by the branches of government whose competence and constitutional authority extend to statutory revision.”

In an interview, Matthew G. Olsen, the head of the Justice Department’s national security division, said the push for the provision was being driven by a way that communications technology had evolved since Congress wrote Section 702 in 2008. But he declined to address whether the rise of data centers was the specific catalyst.

“Over the past 15 years, there has been a shift from reliance on only a handful of major backbone internet providers,” he said. “As technology changes, we have to go back to the fundamental purpose of 702, which is about foreign adversaries who are using U.S. infrastructure.”

Mr. Olsen also stressed that the law only permits targeting the communications of foreigners abroad and that its use is subject to oversight by all three branches.

Privacy advocates have put forward a far more disturbing interpretation of what the provision might do. In recent days, for example, the office of a leading privacy-minded senator, Ron Wyden, Democrat of Oregon, has circulated a warning that the provision could be used to conscript someone with access to a journalist’s laptop to extract communications between that journalist and a hypothetical foreign source who was targeted for intelligence.

“Even if a law is pitched as addressing a specific situation, history shows that intelligence agencies will use every inch of authority Congress provides to spy on Americans,” Mr. Wyden said in a statement, calling the provision “a breathtaking expansion of Section 702, which should terrify anyone who cares about Americans’ rights.”

A co-sponsor of the provision, Representative Jim Himes of Connecticut, the ranking Democrat on the House Intelligence Committee, expressed frustration about such worries.

“The privacy groups — and I admire their commitment to civil liberties — but they have been suggesting that this is bringing back the Stasi,” he said in an interview. “What they are doing is massive exaggerating here, as they have done throughout the whole reauthorization process to try to generate fear.”

As lawmakers debated whether to renew Section 702, Mr. Himes and his co-sponsor, his Republican counterpart, Representative Michael R. Turner of Ohio, put forward an amendment to expand the definition of who could receive a directive. Under their changes, it would also encompass “any other service provider who has access to equipment that is being or may be used to transmit or store wire or electronic communications.”

Privacy advocates expressed dismay, saying that by its plain text, the amendment could be used to force companies that offer wireless internet service to customers — like coffee shops and hotels — to tap those networks for warrantless surveillance, scooping up Americans’ messages to and from foreign targets.

Mr. Turner and Mr. Himes ultimately narrowed the amendment, adding a series of carve-outs. Those include restricting directives toward entities that primarily serve as dwellings, community facilities, food service establishments or other public accommodations.

The amendment passed, 236 to 186.

Still, as the bill heads to the Senate, privacy advocates have warned that the wording remains unacceptably broad. Sean Vitka, policy director for the civil liberties group Demand Progress, said that even if the Biden administration did not intend to use the provision so expansively, there was no guarantee that a future administration would agree.

“This change can be used to turn innumerable scores of Americans into secret government spies, posing a severe threat to hundreds of thousands of big and small businesses and their many millions of customers, clients and users,” he said.

In theory, the Senate could further narrow the language to exclude the most alarming scenarios being floated by critics of the provision. In that case, however, the bill would have to go back to the House, and given the legislative calendar, there may be little time for that step.

Although Section 702 is written in a way that would allow the program to continue operating until early April 2025 even if the statute expires on Friday, Senate leaders appear determined to prevent any lapse in the law.





Four Mistakes That Leaders Should Avoid With AI


There needs to be a balance between technology and humanity

Artificial intelligence (AI) is one of the hottest trends in business today – with good reason. AI is set to transform the way we live and work. According to Bloomberg Intelligence, generative AI alone is set to become a $1.3 trillion market by 2032.

Businesses around the world are rushing to experiment with AI in the hope of gaining strategic advantage. But they risk losing large sums of money if they don’t adopt AI tools that can deliver the right results. Research by intelligent data infrastructure company NetApp revealed only half of U.K. businesses actually understand how AI can benefit their operations, while just 20% have a strong understanding of how they can harness AI technology.

So, what are the key mistakes that leaders need to avoid when implementing AI systems?

AI mistake #1: Sacrificing insight for automation

“AI can help to inform and facilitate decisions, but as a leader, you need to take ownership of every decision,” says Steve Oriola, CEO of Unbounce, a software company that creates AI-powered landing pages. “Automation without insight leaves performance up to chance, driving results that you can’t articulate or replicate. Even if the business is supported by AI, leaders are still accountable for making informed and measured decisions so they can identify why targets are hit or missed.”

Oriola recommends that when using AI to improve performance, leaders should look for a tool that feels like an insightful advisor, helping you to make informed decisions more efficiently. He explains: “The ideal tool is transparent, so you can ensure it works with your existing processes, clearly articulates how it improves performance, and uses its insights and data elsewhere.”


You’ll know you’ve found the right tool when it feels like an extension of your team, Oriola argues. “You wouldn’t trust an underqualified employee with critical aspects of your business, so you should hold AI tools to the same standards.”

AI mistake #2: Using AI to replace, not enhance

Leaders who see AI as a way to replace human labor and cut costs are being short-sighted in their approach. “It’s imperative that we harness AI as a tool to augment, not replace, human ingenuity,” says Christie Horsman, vice president of marketing at online course platform Thinkific. “We’re very deliberate in using AI within the products we create to augment the expression of a creator’s unique genius, not to replace it.”

Horsman says that having a framework helps to minimize the risk that human creativity is sidelined. “By setting clear guidelines around the ethics and principles that matter most to your company, your people and your customers, you can more clearly navigate the complexities of AI integration,” she says, “ensuring that technology complements and enhances human ingenuity rather than competes with it.”

In Horsman’s experience, encouraging this type of symbiotic relationship between teams and AI “acts to amplify creativity and this leads to more innovation and better products for our customers”.

AI mistake #3: Overlooking the balance between technology and humanity

Technological enhancements should not come at the expense of humanity since that would defeat the purpose of those enhancements. Taking the healthcare sector as an example, leaders shouldn’t allow AI to overshadow the “irreplaceable decision-making and compassion of healthcare professionals”, says Dr Daan Dohmen, professor of digital transformation in healthcare at the Open University and CEO of home care platform Luscii.

He adds: “Gradually integrating AI, while fostering trust and identifying synergies between AI and human intuition, is crucial.”

Dohmen believes that prioritizing data privacy is non-negotiable when it comes to the deployment of AI systems in health. He says: “Our focus should be on using AI to enhance care quality and accessibility, and ensuring decisions and treatments are both informed and personalized, without neglecting the vital role of human empathy.”
