Writing Research Papers

Writing a Literature Review

When writing a research paper on a specific topic, you will often need to include an overview of any prior research that has been conducted on that topic.  For example, if your research paper is describing an experiment on fear conditioning, then you will probably need to provide an overview of prior research on fear conditioning.  That overview is typically known as a literature review.  

Please note that a full-length literature review article may be suitable for fulfilling the requirements for the Psychology B.S. Degree Research Paper.  For further details, please check with your faculty advisor.

Different Types of Literature Reviews

Literature reviews come in many forms.  They can be part of a research paper, for example as part of the Introduction section.  They can be one chapter of a doctoral dissertation.  Literature reviews can also “stand alone” as separate articles by themselves.  For instance, some journals, such as Annual Review of Psychology and Psychological Bulletin, typically publish full-length review articles.  Similarly, in courses at UCSD, you may be asked to write a research paper that is itself a literature review (such as, with an instructor’s permission, in fulfillment of the B.S. Degree Research Paper requirement).  Alternatively, you may be expected to include a literature review as part of a larger research paper (such as part of an Honors Thesis). 

Literature reviews can be written using a variety of different styles.  These may differ in the way prior research is reviewed as well as the way in which the literature review is organized.  Examples of stylistic variations in literature reviews include: 

  • Summarization of prior work vs. critical evaluation. In some cases, prior research is simply described and summarized; in other cases, the writer compares, contrasts, and may even critique prior research (for example, by discussing the strengths and weaknesses of prior studies).
  • Chronological vs. categorical and other types of organization. In some cases, the literature review begins with the oldest research and advances until it concludes with the latest research.  In other cases, research is discussed by category (such as in groupings of closely related studies) without regard for chronological order.  In yet other cases, research is discussed in terms of opposing views (such as when different research studies or researchers disagree with one another).

Overall, all literature reviews, whether they are written as a part of a larger work or as separate articles unto themselves, have a common feature: they do not present new research; rather, they provide an overview of prior research on a specific topic. 

How to Write a Literature Review

When writing a literature review, it can be helpful to rely on the following steps.  Please note that these steps apply not only to a literature review that becomes part of a larger article; they can also be used for writing a full-length article that is itself a literature review (although such reviews are typically more detailed and exhaustive; for more information, please refer to the Further Resources section of this page).

Steps for Writing a Literature Review

1. Identify and define the topic that you will be reviewing.

The topic, which is commonly a research question (or problem) of some kind, needs to be identified and defined as clearly as possible.  You need to have an idea of what you will be reviewing in order to effectively search for references and to write a coherent summary of the research on it.  At this stage it can be helpful to write down a description of the research question, area, or topic that you will be reviewing, as well as to identify any keywords that you will be using to search for relevant research.

2. Conduct a literature search.

Use a range of keywords to search databases such as PsycINFO and any others that may contain relevant articles.  You should focus on peer-reviewed, scholarly articles.  Published books may also be helpful, but keep in mind that peer-reviewed articles are widely considered to be the “gold standard” of scientific research.  Read through titles and abstracts, select and obtain articles (that is, download, copy, or print them out), and save your searches as needed.  For more information about this step, please see the Using Databases and Finding Scholarly References section of this website.

3. Read through the research that you have found and take notes.

Absorb as much information as you can.  Read through the articles and books that you have found, and as you do, take notes.  The notes should include anything that will be helpful in advancing your own thinking about the topic and in helping you write the literature review (such as key points, ideas, or even page numbers that index key information).  Some references may turn out to be more helpful than others; you may notice patterns or striking contrasts between different sources; and some sources may refer to yet other sources of potential interest.  This is often the most time-consuming part of the review process.  However, it is also where you get to learn about the topic in great detail.  For more details about taking notes, please see the “Reading Sources and Taking Notes” section of the Finding Scholarly References page of this website.

4. Organize your notes and thoughts; create an outline.

At this stage, you are close to writing the review itself.  However, it is often helpful to first reflect on all the reading that you have done.  What patterns stand out?  Do the different sources converge on a consensus?  Or not?  What questions remain unresolved?  You should look over your notes (it may also be helpful to reorganize them) and, as you do, think about how you will present this research in your literature review.  Are you going to summarize or critically evaluate?  Are you going to use a chronological or other type of organizational structure?  It can also be helpful to create an outline of how your literature review will be structured.

5. Write the literature review itself and edit and revise as needed.

The final stage involves writing.  When writing, keep in mind that literature reviews are generally characterized by a summary style in which prior research is described in enough detail to explain critical findings but without an exhaustive level of detail (if readers want to learn about all the specific details of a study, they can look up the references that you cite and read the original articles themselves).  However, the degree of emphasis given to individual studies may vary (more or less detail may be warranted depending on how critical or unique a given study was).  After you have written a first draft, you should read it carefully and then edit and revise as needed.  You may need to repeat this process more than once.  It may be helpful to have another person read through your draft(s) and provide feedback.

6. Incorporate the literature review into your research paper draft.

After the literature review is complete, you should incorporate it into your research paper (if you are writing the review as one component of a larger paper).  Depending on the stage your paper is at, this may involve merging your literature review into a partially complete Introduction section, writing the rest of the paper around the literature review, or other processes.

Further Tips for Writing a Literature Review

Full-length literature reviews

  • Many full-length literature review articles use a three-part structure: Introduction (where the topic is identified and any trends or major problems in the literature are introduced), Body (where the studies that comprise the literature on that topic are discussed), and Discussion or Conclusion (where major patterns and points are discussed and the general state of what is known about the topic is summarized).

Literature reviews as part of a larger paper

  • An “express method” of writing a literature review for a research paper is as follows: first, write a one paragraph description of each article that you read. Second, choose how you will order all the paragraphs and combine them in one document.  Third, add transitions between the paragraphs, as well as an introductory and concluding paragraph. 1
  • A literature review that is part of a larger research paper typically does not have to be exhaustive. Rather, it should contain most or all of the significant studies about a research topic but not tangential or loosely related ones. 2   Generally, literature reviews should be sufficient for the reader to understand the major issues and key findings about a research topic.  You may however need to confer with your instructor or editor to determine how comprehensive you need to be.

Benefits of Literature Reviews

By summarizing prior research on a topic, literature reviews have multiple benefits.  These include:

  • Literature reviews help readers understand what is known about a topic without having to find and read through multiple sources.
  • Literature reviews help “set the stage” for later reading about new research on a given topic (such as if they are placed in the Introduction of a larger research paper). In other words, they provide helpful background and context.
  • Literature reviews can also help the writer learn about a given topic while in the process of preparing the review itself. In researching and writing the literature review, the writer gains expertise on the topic.

Downloadable Resources

  • How to Write APA Style Research Papers (a comprehensive guide) [PDF]
  • Tips for Writing APA Style Research Papers (a brief summary) [PDF]
  • Example APA Style Research Paper (for B.S. Degree – literature review) [PDF]

Further Resources

How-To Videos     

  • Writing Research Paper Videos
  • UCSD Library Psychology Research Guide: Literature Reviews

External Resources

  • Developing and Writing a Literature Review from North Carolina A&T State University
  • Example of a Short Literature Review from York College CUNY
  • How to Write a Review of Literature from UW-Madison
  • Writing a Literature Review from UC Santa Cruz  
  • Pautasso, M. (2013). Ten Simple Rules for Writing a Literature Review. PLoS Computational Biology, 9(7), e1003149. doi:10.1371/journal.pcbi.1003149

1 Ashton, W. Writing a short literature review. [PDF]

2 Carver, L. (2014). Writing the research paper [Workshop]. Prepared by S. C. Pan for UCSD Psychology.


50+ Research Topics for Psychology Papers

How to Find Psychology Research Topics for Your Student Paper

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Steven Gans, MD is board-certified in psychiatry and is an active supervisor, teacher, and mentor at Massachusetts General Hospital.

  • Specific Branches of Psychology
  • Topics Involving a Disorder or Type of Therapy
  • Human Cognition
  • Human Development
  • Critique of Publications
  • Famous Experiments
  • Historical Figures
  • Specific Careers
  • Case Studies
  • Literature Reviews
  • Your Own Study/Experiment

Are you searching for a great topic for your psychology paper? Sometimes it seems like coming up with a psychology research topic is more challenging than the actual research and writing. Fortunately, there are plenty of great places to find inspiration, and the following list contains just a few ideas to help get you started.

Finding a solid topic is one of the most important steps when writing any type of paper. It can be particularly important when you are writing a psychology research paper or essay. Psychology is such a broad field, so you want to find a topic that allows you to adequately cover the subject without becoming overwhelmed with information.

In some cases, such as in a general psychology class, you might have the option to select any topic from within psychology's broad reach. Other instances, such as in an  abnormal psychology  course, might require you to write your paper on a specific subject such as a psychological disorder.

As you begin your search for a topic for your psychology paper, it is first important to consider the guidelines established by your instructor.

Research Topics Within Specific Branches of Psychology

The key to selecting a good topic for your psychology paper is to select something that is narrow enough to allow you to really focus on the subject, but not so narrow that it is difficult to find sources or information to write about.

One approach is to narrow your focus down to a subject within a specific branch of psychology. For example, you might start by deciding that you want to write a paper on some sort of social psychology topic. Next, you might narrow your focus down to how persuasion can be used to influence behavior.

Other social psychology topics you might consider include:

  • Prejudice and discrimination (e.g., homophobia, sexism, racism)
  • Social cognition
  • Person perception
  • Social control and cults
  • Persuasion, propaganda, and marketing
  • Attraction, romance, and love
  • Nonverbal communication
  • Prosocial behavior

Psychology Research Topics Involving a Disorder or Type of Therapy

Exploring a psychological disorder or a specific treatment modality can also be a good topic for a psychology paper. Some potential abnormal psychology topics include:

  • Eating disorders
  • Borderline personality disorder
  • Seasonal affective disorder
  • Schizophrenia
  • Antisocial personality disorder
  • Profile a type of therapy (e.g., cognitive-behavioral therapy, group therapy, psychoanalytic therapy)

Topics of Psychology Research Related to Human Cognition

Some of the possible topics you might explore in this area include thinking, language, intelligence, and decision-making. Other ideas might include:

  • False memories
  • Speech disorders
  • Problem-solving

Topics of Psychology Research Related to Human Development

In this area, you might opt to focus on issues pertinent to early childhood, such as language development, social learning, or childhood attachment, or you might instead opt to concentrate on issues that affect older adults, such as dementia or Alzheimer's disease.

Some other topics you might consider include:

  • Language acquisition
  • Media violence and children
  • Learning disabilities
  • Gender roles
  • Child abuse
  • Prenatal development
  • Parenting styles
  • Aspects of the aging process

Do a Critique of Publications Involving Psychology Research Topics

One option is to consider writing a critique paper of a published psychology book or academic journal article. For example, you might write a critical analysis of Sigmund Freud's Interpretation of Dreams or you might evaluate a more recent book such as Philip Zimbardo's  The Lucifer Effect: Understanding How Good People Turn Evil .

Professional and academic journals are also great places to find materials for a critique paper. Browse through the collection at your university library to find titles devoted to the subject that you are most interested in, then look through recent articles until you find one that grabs your attention.

Topics of Psychology Research Related to Famous Experiments

There have been many fascinating and groundbreaking experiments throughout the history of psychology, providing ample material for students looking for an interesting term paper topic. In your paper, you might choose to summarize the experiment, analyze the ethics of the research, or evaluate the implications of the study. Possible experiments that you might consider include:

  • The Milgram Obedience Experiment
  • The Stanford Prison Experiment
  • The Little Albert Experiment
  • Pavlov's Conditioning Experiments
  • The Asch Conformity Experiment
  • Harlow's Rhesus Monkey Experiments

Topics of Psychology Research About Historical Figures

One of the simplest ways to find a great topic is to choose an interesting person in the  history of psychology  and write a paper about them. Your paper might focus on many different elements of the individual's life, such as their biography, professional history, theories, or influence on psychology.

While this type of paper may be historical in nature, there is no need for this assignment to be dry or boring. Psychology is full of fascinating figures rife with intriguing stories and anecdotes. Consider such famous individuals as Sigmund Freud, B.F. Skinner, Harry Harlow, or one of the many other  eminent psychologists .

Psychology Research Topics About a Specific Career

Another possible topic, depending on the course in which you are enrolled, is to write about specific career paths within the  field of psychology . This type of paper is especially appropriate if you are exploring different subtopics or considering which area interests you the most.

In your paper, you might opt to explore the typical duties of a psychologist, how much people working in these fields typically earn, and the different employment options that are available.

Topics of Psychology Research Involving Case Studies

One potentially interesting idea is to write a  psychology case study  of a particular individual or group of people. In this type of paper, you will provide an in-depth analysis of your subject, including a thorough biography.

Generally, you will also assess the person, often using a major psychological theory such as  Piaget's stages of cognitive development  or  Erikson's eight-stage theory of human development . It is also important to note that your paper doesn't necessarily have to be about someone you know personally.

In fact, many professors encourage students to write case studies on historical figures or fictional characters from books, television programs, or films.

Psychology Research Topics Involving Literature Reviews

Another possibility that would work well for a number of psychology courses is to do a literature review of a specific topic within psychology. A literature review involves finding a variety of sources on a particular subject, then summarizing and reporting on what these sources have to say about the topic.

Literature reviews are generally found in the  introduction  of journal articles and other  psychology papers , but this type of analysis also works well for a full-scale psychology term paper.

Topics of Psychology Research Based on Your Own Study or Experiment

Many psychology courses require students to design an actual psychological study or perform some type of experiment. In some cases, students simply devise the study and then imagine the possible results that might occur. In other situations, you may actually have the opportunity to collect data, analyze your findings, and write up your results.

Finding a topic for your study can be difficult, but there are plenty of great ways to come up with intriguing ideas. Start by considering your own interests as well as subjects you have studied in the past.

Online sources, newspaper articles, books , journal articles, and even your own class textbook are all great places to start searching for topics for your experiments and psychology term papers. Before you begin, learn more about  how to conduct a psychology experiment .

What This Means For You

After looking at this brief list of possible topics for psychology papers, it is easy to see that psychology is a very broad and diverse subject. While this variety makes it possible to find a topic that really catches your interest, it can sometimes make it very difficult for some students to select a good topic.

If you are still stumped by your assignment, ask your instructor for suggestions and consider a few from this list for inspiration.


By Kendra Cherry, MSEd

Big Data in Psychology: Introduction to Special Issue

Lisa L. Harlow

Department of Psychology, University of Rhode Island

Frederick L. Oswald

Department of Psychology, Rice University

The introduction to this special issue on psychological research involving big data summarizes the highlights of 10 articles that address a number of important and inspiring perspectives, issues, and applications. Four common themes that emerge in the articles with respect to psychological research conducted in the area of big data are mentioned, including: 1. The benefits of collaboration across disciplines, such as those in the social sciences, applied statistics, and computer science. Doing so assists in grounding big data research in sound theory and practice, as well as in affording effective data retrieval and analysis. 2. Availability of large datasets on Facebook, Twitter, and other social media sites that provide a psychological window into the attitudes and behaviors of a broad spectrum of the population. 3. Identifying, addressing, and being sensitive to ethical considerations when analyzing large datasets gained from public or private sources. 4. The unavoidable necessity of validating predictive models in big data by applying a model developed on one dataset to a separate set of data or hold-out sample. Translational abstracts that summarize the articles in very clear and understandable terms are included in Appendix A , and a glossary of terms relevant to big data research discussed in the articles is presented in Appendix B .

Big data involves the storage, retrieval, and analysis of large amounts of information and has been gaining interest in the scientific literature writ large since the 1990s. As a catch-all term, big data has also been referred to by a number of other related terms, such as data mining, knowledge discovery in databases, data or predictive analytics, or data science. The domain has traditionally been associated with computer science, statistics, and business, and now it is clearly, quickly, and usefully making inroads into psychological research and applied practice. There is a healthy and growing infrastructure for dealing with big data, some of it open source and free to use. For example, Hadoop (a name originally based on that of a child’s toy elephant) is a widely used open source file system and framework. Alongside such frameworks, MySQL is a widely used open source database system built on SQL (structured query language). It provides powerful capabilities to “Select” a specific group of entities, “From” a specific database or set of files, “Where” one or more specific conditions hold. For example, an academic researcher could select and analyze data based on student identification numbers from class records in several majors, where the GPA is less than 2.0. In turn, this could allow for the possibility of strategic data-driven interventions with these students to offer enrichment or tutoring that would bolster their grades and improve their chances of staying in school and succeeding. Once big data are queried and refined, they can be analyzed with a number of tools, increasingly with commonly known software such as R and Python.
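
As a concrete illustration of the Select/From/Where query just described, here is a minimal sketch in Python. The table and column names are hypothetical, and the standard-library sqlite3 module stands in for a MySQL client; the shape of the SQL statement is the same.

```python
# A minimal sketch of the query described above: select student records from
# class files where the GPA is below 2.0. Table and column names are
# hypothetical; sqlite3 (Python standard library) stands in for a MySQL client.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class_records (student_id TEXT, major TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO class_records VALUES (?, ?, ?)",
    [("S001", "PSYC", 1.8), ("S002", "PSYC", 3.4), ("S003", "BIOL", 1.9)],
)

# SELECT ... FROM ... WHERE, as in the example in the text
rows = conn.execute(
    "SELECT student_id, major, gpa FROM class_records WHERE gpa < 2.0"
).fetchall()
print(rows)  # students who might be offered tutoring or enrichment
```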

Who is using big data? Business industries in this area abound (e.g., insurance, manufacturing, retail, pharmaceuticals, transportation, utilities, law, gaming, eBay, telecommunication, hotels). Social media is also prominently involved (e.g., Google, Facebook, LinkedIn, Yahoo, Twitter). Various academic disciplines also have a visible presence (e.g., genomics, medicine, and environmental sciences, the latter often using spatial geographic information systems, or GIS). There are several journals in this area, including the open access and peer-reviewed journal Big Data, founded in 2013 and currently edited by Dhar. Their web page ( http://www.liebertpub.com/overview/big-data/611/ ) boasts comprehensive coverage and a broad audience, yet it does not mention psychology or even the broader social sciences. At least two other journals were founded in 2014: the open access Journal of Big Data, edited by Furht and Khoshgoftaar, and Big Data Research, edited by Wu and Palpanas. Likewise, these two journals also do not appear to be directed to those in psychology or the larger social sciences. Similarly, a quick Google search in September 2016 for “big data book” revealed more than 48 million results, although it is noteworthy that none of the big data books listed on the front page are specifically directed to social science fields. Noting all of this is not to indict the current state of big data for neglecting psychology—quite the opposite: Psychology and the social sciences should be proactive and take advantage of a real opportunity in front of them. The timing is ripe, now that the big data movement has matured beyond many of its fads.

So, where does psychology fit into the field of big data or related areas such as computational social science? There are a number of areas in which psychology could and has begun to weigh in, such as wellness, mental health, depression, substance use, behavioral health, behavior change, social media, workplace well-being and effectiveness, student learning and adjustment, and behavioral genetics. A number of recent books of interest to psychology researchers have been published ( Alvarez, 2016 ; Cioffi-Revilla, 2014 ; Mayer-Schönberger & Cukier, 2013 ; McArdle & Ritschard, 2014 , to name a few). Researchers are studying topics such as health and the human condition in big datasets comprising thousands of individuals, such as in the Kavli Human Project ( http://kavlihumanproject.org/ ; Azmak et al., 2015 ). In a similar vein, Fawcett (2016) discusses the analysis of what is called the quantified self in which individuals collect data on themselves (e.g., number of steps, heart rate, sleep patterns) using personal trackers such as Fitbit, Jawbone, iPhone, and similar devices. Researchers envision studies that could link such personal data to health and productivity to reveal patterns or links between behavior and various outcomes of interest.

It is apparent that big data or data science is here to stay, with or without psychology. This broad-and-growing field offers a unique opportunity for interested psychological scientists to be involved in addressing the complex technical, substantive, and ethical challenges with regard to storing, retrieving, analyzing, and verifying large datasets. Big data science can be instrumental in collaboratively working to uncover and illuminate cogent and robust patterns in psychological data that directly or indirectly involve human behavior, cognition, and affect over time and within sociocultural systems. These psychological patterns, in turn, give meaning to non-psychological data (e.g., medical data involving health-related interventions; booms and busts tied to financial investing behavior). The big data community, and big data themselves, can together propel psychological science forward.

In this special issue, we offer 10 articles that focus on various aspects of big data and how they can be used by applied researchers in psychology and other social science fields. One of the common themes of these articles is also clearly evident in federal funding announcements for big data projects: Psychologists and psychology benefit from the collaboration and contributions of other disciplines—and vice-versa. For example, such collaborations can incorporate cutting-edge breakthroughs from computer science that can help access and analyze large amounts of data, as well as theory and behavioral science from across the social sciences that offer insight into the areas that are most in need of understanding, prediction, and intervention.

A second theme is that data are widely available in open forums such as Facebook, Twitter, and other social media sites, and can offer the opportunity to identify trends and patterns that are important to address. For example, tapping the content of Google activity could indicate geographic areas where users are inquiring about various flu or other symptoms, thus pointing to areas in which it may be important to focus health intervention efforts. The psychological nature of the query content might allow for early planning in targeting the intervention (e.g., judging the level of knowledge and concern about the health problem and its related symptoms and treatment). Note that when big data analyses incidentally detect a useful signal in the noise of social media data, one’s discoveries and research efforts need not stop there; researchers can develop new construct-driven measures that help amplify those signals that may have initially been discovered serendipitously.

A third general theme is that it is critically important to consider and carefully attend to the ethical issues of big data projects, including data acquisition and security, the protection of the identity of the users who often inadvertently provide extensive data, and decisions about how the information will be used and interpreted vis-à-vis the nature of the audience or stakeholders involved.

A fourth shared theme of these articles is that it is essential to develop theories and hypotheses on an initial training set of data and then verify those findings with other validation datasets, either from a hold-out sample of the original data or from separate, independent data. With the existence of large datasets that often may not have had an overriding theory or set of hypotheses guiding their formation, an initial analysis of big data is often at the exploratory or data mining level. One or more subsequent analyses of separate data may be needed to be able to generalize past the initial data, particularly as there can be a large number of variables that are relevant to prediction, but not necessarily the best measures that one could obtain with additional foresight and planning. Given a large number of incidental variables, and given the flexible modeling afforded by big data analyses, it is perhaps more important than ever to avoid over-interpreting what might be considered a modern-day version of the classic “crud factor” (Meehl, 1990, p. 108), namely where researchers could find the appearance of relationships between variables in a large dataset that are robustly upheld (e.g., through cross-validation), yet these relationships may change or dissipate over time, as the nature of the relevant sample, population, and the phenomenon under study change as well. Each of the articles in this special issue addresses one or more of these four themes in relatively easy-to-understand presentations of how big data can be used by researchers in psychology.
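
To make this fourth theme concrete, the following minimal sketch (using scikit-learn and synthetic data, purely for illustration) fits an exploratory model on a training set and then checks its performance on a hold-out sample; a large drop in accuracy from training to hold-out data is the signature of overfitting discussed above.

```python
# A minimal sketch of the train/hold-out workflow described above: fit an
# exploratory model on training data, then check whether the discovered
# relationships hold up in data the model has never seen. Synthetic data only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 120))               # many incidental predictors
y = 0.5 * X[:, 0] + rng.normal(size=400)      # only one predictor truly matters

X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("training R^2:", round(r2_score(y_train, model.predict(X_train)), 2))
print("hold-out R^2:", round(r2_score(y_hold, model.predict(X_hold)), 2))
# With many incidental predictors and one real signal, the training fit looks
# far better than the hold-out fit: the modern "crud factor" concern above.
```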

A summary of the highlights of the articles is presented below, followed by Appendix A, which provides translational abstracts (TAs) of the articles, briefly describing the essence of the papers in clearly understandable language. Appendix B includes a Glossary of some of the major terms used in the 10 articles, providing brief descriptions of each and an indication of which articles refer to these terms. To be clear, the Glossary is not intended to provide an exhaustive list of big data concepts; it is more of a summary of some of the ideas and practices that are referred to in these special issue articles so that readers can have a reference of the terminology and find out which special issue articles are discussing them. To help identify which terms are included in the Glossary in Appendix B, these terms are italicized in this introductory article, although not necessarily in the separate articles themselves.

The first article, by Chen and Wojcik, offers an excellent guide to conducting behavioral science research on large datasets. In addition to describing some background and concepts, they provide three tutorials in the supplemental materials in which interested readers can move through the steps. Their first tutorial clearly indicates how to acquire the congressional speech data through application programming interfaces (APIs) that reflect specific procedures needed to acquire data from a site. Their second tutorial demonstrates how these data are analyzed using procedures known as latent semantic analysis (LSA) and latent Dirichlet allocation (LDA) topic modeling, both of which can be used to assess the co-occurrence of words in a dataset based on underlying topics and relationships between documents. Other terms, common to the big data community and discussed in their main article and their third tutorial, include bag of words, stop words, support vector machines, machine learning, and supervised learning algorithms (see also our Glossary in Appendix B of this article). Chen and Wojcik also provide two appendices to help apply the material they discuss. Their Appendix A provides the Python code for acquiring data from the Congressional Daily Digest that are discussed in the first and second tutorials, and the use of MySQL. Their Appendix B offers a checklist for conducting research with big data.
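
As a small, hedged illustration of the bag-of-words and LDA topic-modeling ideas mentioned above, the following sketch uses scikit-learn on a few invented documents; it is not Chen and Wojcik's own code or data.

```python
# A minimal sketch of bag-of-words + LDA topic modeling, using scikit-learn.
# The toy "speeches" are invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the committee debated the education budget and school funding",
    "senators discussed healthcare costs and insurance coverage",
    "the budget amendment addressed school lunch funding",
    "a hearing on hospital insurance and patient coverage",
]

# Bag of words: count word occurrences, dropping common stop words
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Fit a two-topic LDA model and inspect the top words per topic
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:]]
    print(f"topic {k}:", top)
```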

In the second article, Landers et al. discuss web scraping, an automated process that can quickly extract data from websites behind the scenes. Behavioral scientists are increasingly involved in this type of research, within academia and in organizations, determining the pulse of social consciousness and norms on web sources such as Facebook, Twitter, Instagram, and Google. Along with delineating potential benefits of web scraping, Landers et al. also provide their expert advice on the need to emphasize theory in such a project. In particular, they discuss what they call theory of the data source or data source theory to help ensure the relevance and meaningfulness of data that are obtained from web scraping. Although there are not yet exact standards on the ethics of scraping the web for data, Landers et al. suggest that the APA Ethical Principles of Psychologists and Code of Conduct (2010), along with those from the Data Science Association, can suggest policies and procedures for collecting data in a responsible manner that respects the participants and the research field in which conclusions will be shared. Assessing large datasets that are gleaned or scraped from the web using the theory-driven method suggested by Landers et al. can help lessen the possibility that the findings are just happenstances of a large collection of information.
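
The following is a minimal sketch of the basic scraping pattern Landers et al. describe (request a page, parse the HTML, extract elements of interest). The URL and CSS selector are placeholders rather than a real data source, and requests plus BeautifulSoup are just one common Python toolset; any actual scraping should respect a site's terms of service and robots.txt, consistent with the ethical guidance above.

```python
# A minimal web-scraping sketch: request a page, parse its HTML, and pull out
# the elements of interest. The URL and the CSS selector are placeholders.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/public-posts"       # placeholder URL
response = requests.get(url, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
posts = [p.get_text(strip=True) for p in soup.select("div.post-text")]  # placeholder selector

print(f"scraped {len(posts)} posts")
```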

The third article, by Kosinski et al., discusses how to use large databases collected from the web to understand and predict a relevant outcome. Their paper is a tutorial that describes an example of using Facebook digital footprint data, stored in what is called a user-footprint matrix, to predict personality characteristics. The authors analyze input from over 100,000 Facebook users (see the myPersonality project, http://www.mypersonality.org/ ; Kosinski, Matz, Gosling, Popov, & Stillwell, 2015) using dimension-reduction procedures such as singular value decomposition (SVD), which is computationally easy to use as a method for conducting principal components analysis. The Kosinski et al. article also discusses a clustering procedure known as latent Dirichlet allocation (LDA) to help form dimensions with similar content from large datasets of text or counts of words or products. Findings from an LDA model can be visually depicted in a heatmap that shows darker colors when a trait or characteristic is more correlated with one of the LDA clusters. Thus, you can see at a glance the patterns that characterize each cluster.
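
To illustrate the dimension-reduction step in very rough form, the sketch below applies truncated SVD to a random, sparse user-footprint matrix; the data are toy values, not the myPersonality data, and scikit-learn is used here only as a convenient stand-in.

```python
# A minimal sketch of reducing a sparse user-footprint matrix (users x items)
# with truncated SVD so each user is summarized by a handful of components.
# The matrix here is random toy data, not the myPersonality data.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

footprint = sparse_random(500, 2000, density=0.01, random_state=0)  # users x items

svd = TruncatedSVD(n_components=5, random_state=0)
user_components = svd.fit_transform(footprint)   # 500 users x 5 components

print(user_components.shape)
# These low-dimensional user scores could then serve as predictors of
# personality traits, as in the Kosinski et al. tutorial.
```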

In the fourth article, Kern et al. discuss the analysis of big data found on social media, such as on Facebook and Twitter. The authors discuss several steps in acquiring, processing, and quantifying these kinds of data, so as to make them more manageable for statistical analyses. The authors describe the World Well-Being Project and use LDA or latent semantic analysis, which helps reduce large amounts of text-based information into a smaller set of relevant dimensions. They also discuss a procedure known as differential language analysis, encouraging the use of database management systems that pervade the world of business and increasingly are being implemented in psychological research. Cautioning that results could be specific to a particular dataset and need to be further tested with independent data, Kern et al. explain and implement the k-fold cross-validation method, which tests a prediction model across repeated subsets of a large dataset to support the robustness of the findings. The authors also discuss prediction methods such as the lasso (i.e., least absolute shrinkage and selection operator), a regression method for robust prediction based on screening a large set of predictors and weighting the selected predictors conservatively (i.e., with lower magnitudes than traditional OLS regression). They also caution against ecological fallacies, whereby researchers derive erroneous conclusions about individuals and subgroups based on results from a larger group of data, and exception fallacies, when a conclusion is drawn based on outliers (exceptions) in the data that may stand out but may not fully represent the group. Not everyone uses social media, and some use it far more often or idiosyncratically than others. Still, these authors are optimistic about the amount and richness of the data that can be gleaned from social media, and the insights that can be gained from such data.
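
A minimal sketch of the k-fold cross-validation and lasso ideas described above is shown below, using scikit-learn and synthetic predictors; it is meant only to show the shape of the workflow, not to reproduce the authors' analyses.

```python
# A minimal sketch of k-fold cross-validation with a lasso: screen a large set
# of (synthetic) language-derived predictors, keep conservatively weighted
# ones, and judge the model by its performance across folds.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score, KFold

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 200))                          # e.g., topic or word-use features
y = X[:, :3] @ np.array([0.6, -0.4, 0.3]) + rng.normal(size=400)

lasso = Lasso(alpha=0.1)
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(lasso, X, y, cv=cv, scoring="r2")
print("10-fold R^2:", round(scores.mean(), 3))

lasso.fit(X, y)
print("nonzero coefficients:", int((lasso.coef_ != 0).sum()), "of", X.shape[1])
```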

In the fifth article, Jones, Wojcik, Sweeting, and Silver examine the content of Twitter posts after three different traumatic events (violence on or near college campuses), applying linguistic analyses to the text for negative emotional responses. They discuss a procedure known as Linguistic Inquiry and Word Count and the R-based twitteR package to analyze such data. Using an innovative approach, the authors identify pertinent Twitter users as people who follow relevant community networks tied to the geographical area of the event, and they are careful to compare results with control groups that are not similarly geographically situated, to help ensure that results were driven by the event rather than by other contemporaneous events that were more geographically widespread. Overall, this work demonstrates how psychological themes can be reliably extracted and related to region- and time-dependent events, similar to prior related work in the health arena.
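
LIWC itself is proprietary, but the following crude sketch conveys the general dictionary-based idea: score each post by the proportion of words appearing in a small negative-emotion word list. The word list and example posts are invented for illustration.

```python
# A crude, minimal stand-in for dictionary-based linguistic analysis: the
# proportion of words in each post that appear in a small negative-emotion
# word list. LIWC is far more extensive; these posts and words are invented.
negative_words = {"afraid", "angry", "sad", "hurt", "scared", "terrible", "worried"}

tweets = [
    "so scared and worried about everyone on campus tonight",
    "beautiful sunny afternoon at the library",
    "this is terrible, I am so sad for the victims",
]

for text in tweets:
    tokens = text.lower().split()
    score = sum(t.strip(".,!?") in negative_words for t in tokens) / len(tokens)
    print(f"{score:.2f}  {text}")
```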

In the sixth article, Stanley and Byrne contribute a theory-driven approach to big-data modeling of human memory (i.e., long-term knowledge storage and retrieval), testing two theoretical models that predict the tags that users apply to Twitter and Stack Overflow posts. Incorporating but going beyond the psychological tenet that “past behavior predicts future behavior,” the current models robustly predict how and to what extent this tenet applies given the nature, recency, and frequency of past behavior. This paper exemplifies an important general point, that big-data analyses benefit from being theory-driven, demonstrating how theories can develop in their usefulness as a joint function of empirical competition (i.e., deciding which model affords better prediction) and empirical cooperation (i.e., demonstrating how model ensembles might account for the data more robustly than models taken individually). The authors discuss the use of an ACT-R based Bayesian model and a random permutation model to understand and clarify predictions about links between processes and outcomes.
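
As a rough illustration of the recency-and-frequency logic such memory models formalize, the sketch below ranks a user's candidate tags by an ACT-R style base-level activation; the tag history is invented, and this is only a toy baseline, not the authors' full models.

```python
# A toy sketch of "past behavior predicts future behavior": rank candidate tags
# by an ACT-R style base-level activation, ln(sum of t^-d) over the times since
# each prior use, so frequent and recent tags score highest. Invented data.
import math

def base_level_activation(times_since_use, d=0.5):
    return math.log(sum(t ** -d for t in times_since_use))

# Hours since each prior use of each tag by one hypothetical user
tag_history = {
    "python": [2, 30, 100, 400],
    "regex": [500, 900],
    "pandas": [5, 8],
}

ranked = sorted(tag_history, key=lambda tag: base_level_activation(tag_history[tag]), reverse=True)
print(ranked)  # tags such a baseline would predict the user applies next
```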

The seventh article, by Brandmaier et al., discusses ensemble methods that the authors developed, one of which, called structural equation model (SEM) trees, combines decision trees (also called recursive partitioning methods) and SEM to understand the nature of a large dataset. These authors suggest an extended method called SEM forests that allows researchers to generate and test hypotheses, combining both data- and theory-based approaches. These and other methods, such as latent class analysis and multiple sample SEM, help in assessing distinct clusters in the data. Several methods are described to gauge how effectively an SEM forest is modeling the data, such as examining variable importance based on out-of-bag samples from the SEM trees, as well as case proximity and, conversely, an average dissimilarity metric, the latter indicating a case’s novelty. Brandmaier et al. provide two examples to demonstrate the use of SEM forests. Interested researchers can conduct similar analyses using Brandmaier’s (2015) semtree package, written in R, with the supplemental material providing the R code for these examples.
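
SEM trees and forests themselves are implemented in the R semtree package. As a rough Python analogue of just the ensemble variable-importance idea (ordinary random forests, not SEM trees), the sketch below grows many trees on bootstrap samples and aggregates how much each predictor matters.

```python
# Not SEM trees: an ordinary random forest, used only to illustrate how an
# ensemble of trees yields a robust variable-importance measure. Synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
y = 2 * X[:, 3] + X[:, 7] ** 2 + rng.normal(size=300)   # only two predictors matter

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for i, imp in enumerate(forest.feature_importances_):
    print(f"x{i}: {imp:.3f}")   # predictors 3 and 7 should dominate
```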

In the eighth article, Miller, Lubke, McArtor, and Bergeman detail a new method, based on decision trees, for detecting robust nonlinearities and interactions in large data sets. Called multivariate gradient boosted trees, this method extends a well-established machine-learning or statistical learning theory method. Whereas most predictive models in the big data arena seek to predict a single criterion, the present approach considers multiple criteria to be predicted (as does the Beaton et al. partial least squares correspondence analysis method). Such exploration is useful for informing and refining theories, measures, and models that take a more deductive approach. To do this, a boosted tree-based model for each outcome is fit separately, where the goal is to minimize cross-validated prediction error across all outcomes. An advantage of tree-based methods comes in detecting complex predictive relationships (interactions and nonlinearities) without having to specify their functional form beforehand. In the current approach, tree models can be compared across outcomes, and the explained covariance between pairs of outcomes can also be explored. The authors illustrate this approach using measures of psychological well-being as predictors of multiple psychological and physical health outcomes. Interested readers can apply this method to their own data with Miller’s R-based mvtboost package.
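
The sketch below illustrates only the core per-outcome step described above: fit a gradient boosted tree model separately for each outcome and judge it by cross-validated prediction error. The full mvtboost approach (an R package) also decomposes explained covariance between outcomes; this scikit-learn version on synthetic data is just a simplified stand-in.

```python
# Per-outcome gradient boosting with cross-validated prediction error, as a
# simplified stand-in for the multivariate boosted-trees idea. Synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 20))                              # e.g., well-being predictors
Y = np.column_stack([
    X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=500),    # outcome 1 (nonlinear effect)
    -X[:, 2] + X[:, 0] * X[:, 3] + rng.normal(size=500),    # outcome 2 (interaction)
])

for j in range(Y.shape[1]):
    model = GradientBoostingRegressor(random_state=0)
    cv_r2 = cross_val_score(model, X, Y[:, j], cv=5, scoring="r2").mean()
    print(f"outcome {j}: cross-validated R^2 = {cv_r2:.2f}")
```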

In the ninth article, Chapman, Weiss, and Duberstein consider measure-development models that focus squarely on predictive validity using a machine-learning approach that challenges—and complements—traditional approaches to measure development involving psychometric reliability. The proposed approach seeks out additional model complexity so long as it is justified by increased prediction; the approach incorporates k-fold cross-validation methods to avoid model overfitting. Almost two decades ago, McDonald’s (1999) classic book, Test theory: A unified treatment, also suggested that measures of a construct judged to be similar should not only demonstrate psychometric reliability, but also show similar relationships with measures of other constructs in a larger nomological net. The current big-data paper reflects one important step toward advancing this general idea, discussing procedures and terms such as the elastic net, expected prediction error, generalized cross-validation error, stochastic gradient boosting, and supervised principal components analysis, as well as the R-based packages glmnet and superpc.
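
As a hedged illustration of the prediction-focused approach, the sketch below uses an elastic net with k-fold cross-validation to decide how much complexity (how many retained items) is justified by out-of-sample prediction; scikit-learn stands in for the R packages named in the article, and the data are synthetic.

```python
# Elastic net with k-fold cross-validation: retain items only insofar as they
# improve out-of-sample prediction. scikit-learn stand-in; synthetic data.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 100))                 # e.g., a large item pool
y = X[:, :5] @ rng.normal(size=5) + rng.normal(size=300)

enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=10, random_state=0).fit(X, y)
kept = int((enet.coef_ != 0).sum())
print(f"items retained by the elastic net: {kept} of {X.shape[1]}")
print(f"chosen penalty (alpha): {enet.alpha_:.3f}, l1_ratio: {enet.l1_ratio_}")
```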

In the tenth and final article, Beaton, Dunlop, and Abdi jointly analyze genetic, behavioral, and structural MRI data in a tutorial for a generalized version of partial least squares called partial least squares correspondence analysis (PLSCA). The method can handle disparate data types that are on widely different scales, as might become increasingly common in large and complex data sets. In particular, their methods can accommodate categorical data when analyzing relationships between two sets of multivariate data, where traditional analyses assume the data for each variable are continuous (or, even more strictly, multivariate normal). These authors have developed a freely available R package, TExPosition, which allows readers to apply the PLSCA method to their own data.
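
PLSCA itself is available through the authors' TExPosition package in R. The Python sketch below only shows the general shape of the problem: two blocks of categorical variables measured on the same people are dummy-coded and then related with ordinary PLS (not PLSCA), purely as an illustration; the variables and data are invented.

```python
# Not PLSCA: ordinary PLS on dummy-coded categorical blocks, only to show the
# shape of relating two multivariate tables measured on the same people.
import numpy as np
import pandas as pd
from sklearn.cross_decomposition import PLSCanonical

rng = np.random.default_rng(5)
n = 200
genotypes = pd.DataFrame({
    "snp1": rng.choice(["AA", "AG", "GG"], size=n),
    "snp2": rng.choice(["CC", "CT", "TT"], size=n),
})
behavior = pd.DataFrame({
    "diagnosis": rng.choice(["control", "MCI", "AD"], size=n),
    "apoe_carrier": rng.choice(["yes", "no"], size=n),
})

X = pd.get_dummies(genotypes).to_numpy(dtype=float)   # dummy-coded block 1
Y = pd.get_dummies(behavior).to_numpy(dtype=float)    # dummy-coded block 2

pls = PLSCanonical(n_components=2).fit(X, Y)
X_scores, Y_scores = pls.transform(X, Y)
print(X_scores.shape, Y_scores.shape)   # latent scores linking the two tables
```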

In closing, we hope you find something of interest to you in one or more of the 10 articles we present in this special issue on the use of big data in psychology. We recognize that other articles may approach these topics differently, and likewise, many other big data topics will be discussed in the future. We look forward to continued tutorials and other research publications in Psychological Methods that share even more about how to apply innovative and informative big data methods to meaningful and relevant data of interest to researchers in psychology and related social science fields.

Acknowledgments

The co-editors (Harlow and Oswald) would like to thank the authors and reviewers who contributed to this special issue. We also would like to offer much appreciation and thanks to our manuscript coordinator, Meleah Ladd, who has played an integral part in helping to make every aspect of our work better and more enjoyable, and especially so with this special issue. Lisa Harlow also extends thanks to the National Institutes of Health grant G20RR030883.

Appendix A: Translational Abstracts (TAs) for the 10 Special Issue Articles

The massive volume of data that now covers a wide variety of human behaviors offers researchers in psychology an unprecedented opportunity to conduct innovative theory- and data-driven field research. This article is a practical guide to conducting big data research, covering the practices of acquiring, managing, processing, and analyzing data. It is accompanied by three tutorials that walk through the acquisition of real text data, the analysis of that text data, and the use of an algorithm to classify data into different categories. Big data practitioners in academia, industry, and the community have built a comprehensive base of tools and knowledge that makes big data research accessible to researchers in a broad range of fields. However, big data research does require knowledge of software programming and a different analytical mindset. For those willing to acquire the requisite skills, innovative analyses of unexpected or previously untapped data sources can offer fresh ways to develop, test, and extend theories. When conducted with care and respect, big data research can become an essential complement to traditional research.

One of the biggest challenges for psychology researchers is finding high-quality sources of data to address research questions of interest. Often, researchers rely on simply giving surveys to undergraduate students, which can cause problems when trying to draw conclusions about human behavior in general. To work around these problems, sometimes researchers actually watch people in real life, or observe via the web, taking notes on their behaviors to be analyzed later. But this process is time-consuming, difficult, and error-prone. In this paper, we provide a tutorial on a technique that can be used to create datasets summarizing actual human behavior on the internet in an automated way, partially solving both of these problems. This big data technique, called web scraping, takes advantage of a programming language called Python commonly used by data scientists. We also introduce a new related concept, called data source theories, as a way to address a common criticism of many big data approaches – specifically, that because the analytic techniques are “data-driven,” they tend to take advantage of luck more so than psychology’s typical approaches. As a result of this tendency, researchers sometimes draw conclusions that do not reflect reality beyond their dataset. In creating a data source theory, researchers precisely account for why the data they found exist and test the hypotheses implied by that theory with additional analyses. Thus, we combine the strengths of psychology (i.e., high-quality measurement and rich theory) with those of data science (i.e., flexibility and power in analysis).

Humans are increasingly migrating to the digital environment, producing large amounts of digital footprints of behaviors, communication, and social interactions. Analyzing big datasets of such footprints presents unique methodological challenges, but could greatly further our understanding of individuals, groups, and societies. This tutorial provides an accessible introduction to the crucial methods used in big data analysis. We start by listing potential data sources, and explain how to efficiently store and prepare data for the analysis. We then show the reader how to reduce the dimensionality of big datasets and extract patterns from them. Finally, we demonstrate how to employ such data to build prediction models. The text is accompanied by examples of R code and a sample dataset, allowing the reader to put their new skills into practice.

Many people spend considerable time on social media sites such as Facebook and Twitter, expressing thoughts, emotions, behaviors, and more. The massive data that are available provide researchers with opportunities to study people within their real-world contexts, at a scale previously impossible for psychological research. However, typical psychological methods are inadequate for dealing with the size and messiness of such data. Modern computational linguistics strategies offer tools and techniques, and numerous resources are available, but there is little guidance for psychologists on where to even begin. We provide an introduction to help guide such research. We first consider how to acquire social media data and transform it from meaningless characters into words, phrases, and topics. Both top down theory driven approaches and bottom up data-driven approaches can be used to describe characteristics of individuals, groups, and communities, and to predict other outcomes. We then provide several examples from our own work, looking at personality and well-being. However, the power and potential of social media language data also brings responsibility. We highlight challenges and issues that need to be considered, including how data are accessed, processed, analyzed, and interpreted, and ever-evolving ethical issues. Social media has become a valuable part of social life, and there is much we can learn by cautiously bringing together the tools of computer science with the theories and insights of psychology.

Capturing a snapshot of emotional responses of a community soon after a collective trauma (e.g., school shooting) is difficult. However, because of its rapid distribution and widespread use, social media such as Twitter may provide an immediate window into a community’s emotional response. Nonetheless, locating Twitter users living in communities that have experienced collective traumas is challenging. Prior researchers have either used the extremely small number of geo-tagged tweets (3–6%) to identify residents of affected communities or used hashtags to collect tweets without certainty of the users’ location. We offer an alternative: identify a subset of local community Twitter accounts (e.g., city hall), identify followers of those accounts, and download their tweets for content analysis. Across three case studies of college campus killings (i.e., UC-Santa Barbara, Northern Arizona University, Umpqua Community College), we demonstrate the utility of this method for rapidly investigating negative emotion expression among likely community members. Using rigorous longitudinal quasi-experimental designs, we randomly selected Twitter users from each impacted community and matched control communities to compare patterns of negative emotion expression in users’ tweets. Despite variation in the severity of violence across cases, similar patterns of increased negative emotion expression were visible in tweets posted by followers of Twitter accounts in affected communities after the killings compared to before the violence. Tweets from community-based Twitter followers in matched control communities showed no change in negative emotion expression over time. Using localized Twitter data offers promise in studying community-level response in the immediate aftermath of collective traumas.

The growth of social media and user-created content on online sites provides unique opportunities to study models of long-term memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a long-term memory retrieval problem, two long-term memory models were tested on millions of posts and tweets and evaluated on how accurately they predict a user’s chosen tags. An uncompressed model and a compressed model (in terms of how information is stored in long-term memory) were tested on the large datasets. The results show that a user’s past tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the compressed model, which previously used only context. Also, an attentional weight term in the uncompressed model was linked to a natural language processing method used to attenuate common words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the compressed model performed comparably to the uncompressed model without including word order. This shows that the strength of the compressed model is not in the ability to represent word order, but rather in the way in which information is efficiently compressed. The results of the large-scale exploration show how the architecture of the two memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains.

Building models fully informed by theory is impossible when data sets are large and their relations to theory not yet specified. In such instances, researchers may start with a core model guided by theory, and then face the problem of which additional variables should be included and which may be omitted. Structural equation model (SEM) trees, a combination of SEM and decision trees, offer a principled solution to this selection problem. SEM trees hierarchically split empirical data into homogeneous groups sharing similar data patterns by recursively selecting optimal predictors of these differences from a potentially large set of candidates. SEM forests are an extension of SEM trees, consisting of ensembles of SEM trees each built on a random sample of the original data. By aggregating the predictive information contained in a forest, researchers obtain a measure of variable importance that is more robust than corresponding measures from single trees. Variable importance indicates which variables may be missing from researchers’ models and may guide revisions of the underlying theory. In summary, SEM trees and forests serve as a data-driven tool for the improvement of theory-guided latent variable models. By combining the flexibility of SEM as a generic modeling technique with the potential of trees and forests to account for diverse and interactive predictors, SEM trees and forests serve as a powerful tool for hypothesis generation and theory development.

Collecting data from smartphones, watches, or websites is a promising development for psychological research. However, exploring these data sets can be challenging because there are often extremely large numbers of possible variables that could be used to predict an outcome of interest. In addition, there is often not much established theory that could help in making a selection. Using standard statistical models such as regression for data exploration can be inconvenient because these methods are not designed to handle large data. In the worst case, using simple statistical models can be misleading. For example, simply testing the correlation between predictors and outcomes will likely miss predictors with effects that are not approximately linear. In this paper we suggest using a machine learning method called ‘gradient boosted decision trees’. This approach can detect predictors with many different kinds of effects, yet is easy to use compared with fitting many different statistical models. We extend this method to multivariate outcomes and implement our approach in the R package mvtboost, which is freely available on CRAN. To illustrate the approach, we analyze predictors of psychological well-being and show how to estimate, tune, and interpret the results. The analysis showed, for example, that above-average control of internal states in particular is associated with increased personal growth. Experimental results from statistical simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy. It exceeds or matches the performance of other cutting-edge machine learning methods over a wide range of conditions.

Researchers are often faced with problems that involve predicting an important outcome, based on a large number of factors that may be plausibly related to that outcome. Traditional methods for null hypothesis significance tests of one or a small number of specific predictors are not optimal for such problems. Machine learning, reformulated in a statistics framework as Statistical Learning Theory (SLT), offers a powerful alternative. We review the fundamental tenets of SLT, which center around constructing models that maximize predictive accuracy. Importantly, these models prioritize predictive accuracy in new data, external to the sample used to build the models. We illustrate three common SLT algorithms exemplifying this principle, in the psychometric task of developing a personality scale to predict future mortality. We conclude by reviewing some of the diverse contexts in which SLT models might be useful. These contexts are unified by research problems that do not seek to test a single or small number of null hypotheses, but instead involve accurate prediction of an outcome based on a large amount of potentially relevant data.

For nearly a century, detecting the genetic contributions to cognitive and behavioral phenomena has been a core interest for psychological research, and that interest is even stronger now. Today, the collection of genetic data is both simple and inexpensive. As a consequence, a vast amount of genetic data is collected across disciplines as diverse as experimental and clinical psychology, cognitive sciences, and neurosciences. However, such an explosion in data collection can make data analyses very difficult. This difficulty is especially relevant when we wish to identify relationships within and between genetic data and, for example, cognitive and neuropsychological batteries. To alleviate such problems, we have developed a multivariate approach to make these types of analyses easier and to better identify the relationships between multiple genetic markers and multiple behavioral or cognitive phenomena. Our approach—called partial least squares correspondence analysis (PLSCA)—generalizes partial least squares and identifies the information common to two different data tables measured on the same participants. PLSCA is specifically tailored for the analysis of complex data that may exist in a variety of measurement scales (e.g., categorical, ordinal, interval, or ratio scales). In our paper, we present—in a tutorial format—how PLSCA works, how to use it, and how to interpret its results. We illustrate PLSCA with genetic, behavioral, and neuroimaging data from the Alzheimer’s Disease Neuroimaging Initiative. Finally, we make available R code and data examples so that those interested can easily learn and use the technique.

Appendix B: Glossary of Some of the Major Terms Used in the 10 Special Issue Articles

ACT-R based Bayesian models draw on the ACT-R theory of declarative memory, which can be operationalized as a big data predictive model reflecting how declarative memory processes (e.g., exposure, learning, recall, forgetting) affect behavioral outcomes. The predictive model incorporates a version of the Naïve Bayes method, such that any piece of knowledge is assigned a prior probability of being retrieved by the user, independent of all other pieces of available knowledge, which is then weighted by the information in the current context to yield a posterior distribution and prediction. See Stanley and Byrne.
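
In generic Naïve Bayes notation (this is a standard statement of the method rather than a quotation from Stanley and Byrne), the retrieval probability of a knowledge item i given the set of contextual cues C is

    P(i \mid C) \;\propto\; P(i) \prod_{c \in C} P(c \mid i)

where the prior P(i) reflects the user’s history with item i and the product reweights that prior by how strongly each cue in the current context is associated with i; the item with the highest posterior is the predicted retrieval (e.g., the predicted tag).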

APA Ethical Principles of Psychologists and Code of Conduct (2010), along with those from the Data Science Association, suggest policies and procedures for collecting data in a responsible manner that respects the participants and the research field in which conclusions will be shared. See Landers et al.

Application Programming Interfaces ( APIs ) refer to sets of procedures that software programs use to request and access data in a systematic way from other software sources (APIs can be web-based or platform-specific). See Chen and Wojcik; Jones et al.; Kern et al.; and Stanley and Byrne.

Average dissimilarity is a general term indicating how different a case tends to be from the rest of the data. See Brandmaier et al.

Bag of words conveys word frequency in a relevant text (e.g., sentence, paragraph, entire document), without retaining the ordering or context of the words. See Chen and Wojcik.
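
A minimal base-R sketch of the idea; the example sentence and the whitespace tokenizer are placeholders:

    # Bag of words: count word frequencies, discarding order and context
    text   <- "the quick brown fox jumps over the lazy dog and the fox runs"
    tokens <- unlist(strsplit(tolower(text), "\\s+"))   # deliberately simple tokenizer
    bag    <- table(tokens)                             # the "bag" is just a frequency table
    bag    # e.g., "the" appears three times and "fox" twice; their positions are not retained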

Case proximity is a general term for the similarity between entities in a data set; cases with low proximity to the rest of the data stand out as potential outliers. See Brandmaier et al.

Crud factor ( Meehl, 1990 , p. 108) is a general term used to indicate that in any psychological domain, measures of constructs are all correlated with one another, at some overall level. Traditional analyses have dealt with this, as will big data analyses. See Harlow and Oswald.

Data source theory refers to a well-thought out theoretical rationale, developed on the basis of the available variables in a given set, to support the nature of the data and the findings derived from them. Researchers working with big data projects are encouraged to have a data source theory to guide exploration, analyses, and empirical results in large data sets. See Landers et al.

Database management system ( DBMS ) is a structure that can store, update, and retrieve large amounts of data that can be accrued in research studies. See Kern et al.

Data Science Association ( http://www.datascienceassn.org/ ) is an educational group that offers guidelines for researchers to follow regarding ethics and other matters relevant to organizations. See Landers et al.

Decision trees ( also called recursive partitioning methods ) are models that apply a series of cutoffs on predictor variables, such that at each stage of selecting a predictor and cutoff point, the two groups created by the cutoff are as separated (i.e., internally coherent and externally distinct) as possible on the outcome variable. Decision trees model complex interactions, because each split of the tree on a given predictor is dependent on all splits from the previous predictors. See Brandmaier et al., and Miller et al.
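
As a brief illustration, the rpart package (one of several recursive-partitioning implementations in R) can be used as follows; the simulated outcome, which depends on an interaction between two predictors, is hypothetical:

    library(rpart)

    set.seed(1)
    n  <- 500
    x1 <- rnorm(n); x2 <- rnorm(n)
    y  <- ifelse(x1 > 0, 2 * x2, -2 * x2) + rnorm(n)   # effect of x2 reverses with the sign of x1

    fit <- rpart(y ~ x1 + x2, data = data.frame(y, x1, x2))
    fit   # printed output shows the chosen predictors and cutoff points at each split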

Differential language analysis ( DLA ) is an empirical method used to extract underlying dimensions of words or phrases, without making a priori assumptions about the structure of the language, and then to relate these dimensions to outcomes of interest. See Kern et al.

Digital footprint refers to data that can be obtained from various sources such as the web, the media, and other forums in which publicly available information is posted by or stored regarding individuals or events. These kinds of data can be stored in what is called a User-Footprint Matrix. See Kosinski et al.

Ecological fallacies are incorrect conclusions made about individual people or entities that are derived from information that summarizes a larger group. For example, if a census found that higher educational levels were associated with higher income, it would not necessarily be true that everyone with high income had a high level of education. Simpson’s paradox is an extreme example, where each within-group relationship may be different from or even the opposite of a between-group relationship. See Kern et al.
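
A small numeric illustration of Simpson’s paradox in R (the data are simulated for this purpose): within each of two groups the association between x and y is negative, yet pooling the groups produces a positive correlation because one group is higher on both variables.

    set.seed(2)
    g <- rep(c(1, 2), each = 100)
    x <- rnorm(200) + ifelse(g == 1, 0, 4)                # group 2 is shifted up on x
    y <- -0.5 * (x - ifelse(g == 1, 0, 4)) +              # negative slope within each group
         ifelse(g == 1, 0, 4) + rnorm(200, sd = 0.5)      # group 2 is also shifted up on y

    cor(x[g == 1], y[g == 1])   # negative within group 1
    cor(x[g == 2], y[g == 2])   # negative within group 2
    cor(x, y)                   # positive when the groups are pooled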

Elastic net refers to a regression model that linearly weights the penalty functions from two regression models: the lasso regression model (applying an L1 penalty that conducts variable selection and shrinkage of non-zero weights) and the ridge regression model (applying an L2 penalty that applies shrinkage, does not select variables, and will include correlated predictors, unlike lasso ). See Chapman et al.
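
A compact sketch using the glmnet package (see the glmnet entry below); the data are simulated, and the value alpha = 0.5 is an arbitrary mixing choice, with alpha = 1 corresponding to the lasso and alpha = 0 to ridge regression:

    library(glmnet)

    set.seed(3)
    n <- 200; p <- 50
    x <- matrix(rnorm(n * p), n, p)
    y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)   # only the first two predictors matter

    cv_fit <- cv.glmnet(x, y, alpha = 0.5)      # cross-validation selects the penalty strength lambda
    coef(cv_fit, s = "lambda.min")              # most coefficients are shrunk to exactly zero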

Ensemble methods involve the use of predictions across several models. The idea is that combining predictions across models tends to be an improvement over the predictions taken from any single model in isolation. An example of an ensemble method is the structural equation model random forests (see this term, below). See Brandmaier et al.

Exception fallacies involve mistaken conclusions about a group that are derived from a few unrepresentative instances in which an event, term, or characteristic occurs frequently. For example, if one or two participants in a dataset mention the word “sad” many times, it could falsely be surmised that the group as a whole experienced depression. See Kern et al.

Expected prediction error ( EPE ) is an index of accuracy for a predictive model, decomposed into: (a) squared bias (systematic model over- or under-prediction across data sets), (b) variance (fluctuation in the model parameter estimates across data sets), and (c) irreducible error variance (variance that cannot be explained by any model). Expected prediction error captures the bias-variance tradeoff : Models that are too simple will under-fit the data and show high bias yet low variance in the EPE formula; models that are too complex will over-fit the data and show low bias yet high variance in the EPE formula. See Chapman et al.
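
In the standard squared-error notation of the statistical learning literature (stated here generically rather than quoted from Chapman et al.), the decomposition at a point x_0 is

    \mathrm{EPE}(x_0) = \bigl[\mathrm{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2 + \mathrm{Var}\bigl(\hat{f}(x_0)\bigr) + \sigma^2_{\varepsilon}

where \hat{f} is the fitted model and \sigma^2_{\varepsilon} is the irreducible error variance; overly simple models tend to inflate the first term, and overly complex models the second.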

Generalized cross-validation error indicates the target that is to be minimized (the loss function) in k -fold cross-validation: e.g., the sum of squared errors, the sum of absolute errors, or the Gini coefficient for dichotomous outcomes. See Chapman et al.

glmnet is a computer package written in R code by Friedman, Hastie, Simon and Tibshirani (2016) that fits lasso and elastic-net models, with the ability to graph model solutions across the entire path of relevant tuning parameters. See Chapman et al.

Heatmaps plot the relationships among variables and/or clusters, using colors or shading to indicate the strength of those relationships. See Kosinski et al.

k-fold cross-validation involves partitioning a large dataset into k subsets of equal size. First, a model is developed on ( k -1) partitions of the data – the “training” data set; then predicted values from the model are obtained on the k th partition of the data that was held out – the “test” data set. This process is repeated k times so that every partition serves once as the test set, and all data therefore have predicted values from models in which they did not participate. See Chapman et al., and Kern et al.
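
A minimal base-R sketch of 5-fold cross-validation for a simple regression; the simulated data and the use of mean squared error as the loss are illustrative choices:

    set.seed(4)
    n <- 100
    d <- data.frame(x = rnorm(n))
    d$y <- 2 * d$x + rnorm(n)

    k     <- 5
    folds <- sample(rep(1:k, length.out = n))   # randomly assign each case to one of k folds
    mse   <- numeric(k)

    for (i in 1:k) {
      train  <- d[folds != i, ]                 # fit on k - 1 folds (the training data)
      test   <- d[folds == i, ]                 # evaluate on the held-out fold (the test data)
      fit    <- lm(y ~ x, data = train)
      pred   <- predict(fit, newdata = test)
      mse[i] <- mean((test$y - pred)^2)
    }

    mean(mse)   # cross-validated estimate of prediction error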

Lasso ( least absolute shrinkage and selection operator ) is a regression method that helps screen out predictor variables that are not contributing much to a model relative to the others. See Kern et al.

Latent class analysis can help explain the heterogeneity in a set of data by clustering individuals into unobserved types, based on observed multivariate features. Features may be continuous or categorical in nature. See Brandmaier et al.

Latent Dirichlet allocation ( LDA ) is a method that models words within a corpus as being attributable to a smaller set of unobserved categories (topics) that are empirically derived. See Chen and Wojcik.
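
A toy sketch using the tm and topicmodels packages; the four documents and the choice of two topics are arbitrary:

    library(tm)
    library(topicmodels)

    docs <- c("the cat sat on the mat with another cat",
              "dogs and cats make friendly pets",
              "stock markets fell as investors sold shares",
              "the market rally lifted bank shares")

    corpus <- VCorpus(VectorSource(docs))
    dtm    <- DocumentTermMatrix(corpus,
                                 control = list(removePunctuation = TRUE, stopwords = TRUE))

    lda_fit <- LDA(dtm, k = 2, control = list(seed = 5))   # two latent topics
    terms(lda_fit, 4)    # most probable words for each topic
    topics(lda_fit)      # most likely topic for each document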

Latent semantic analysis ( LSA ) involves the examination of different texts, where it is assumed that the use of similar words can reveal common themes across different sources. See Chen and Wojcik, and Kern et al.

Linguistic inquiry and word count ( LIWC ) is a commercial analysis tool for matching target words (words within the corpus being analyzed) to dictionary words (words in the LIWC dictionary). Target words are then characterized by the coded features of their matching dictionary words, such as their tense and part of speech, psychological characteristics (e.g., affect, motivation, cognition), and type of concern (e.g., work, home, religion, money). See Jones et al.

Machine learning , which has also been called statistical learning theory , is a generic term that refers to computational procedures for identifying patterns and developing models that improve the prediction of an outcome of interest. See Chapman et al.; Chen and Wojcik; Harlow and Oswald; Kern et al.; and Miller et al.

Multiple sample structural equation modeling ( SEM ) helps in testing differences across the different clusters that emerge, to identify the patterns of heterogeneity. See Brandmaier et al.
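
As an illustration with the lavaan package (which is not necessarily the software used by Brandmaier et al.), a one-factor model can be compared across clusters by passing a grouping variable; the data below are simulated and the variable names are placeholders:

    library(lavaan)

    set.seed(6)
    d <- data.frame(cluster = rep(c("a", "b"), each = 150))
    f <- rnorm(300)
    d$y1 <- f + rnorm(300, sd = 0.5)
    d$y2 <- 0.8 * f + rnorm(300, sd = 0.5)
    d$y3 <- 0.7 * f + rnorm(300, sd = 0.5)

    model <- 'f =~ y1 + y2 + y3'

    fit_free  <- cfa(model, data = d, group = "cluster")   # parameters estimated freely in each group
    fit_equal <- cfa(model, data = d, group = "cluster",
                     group.equal = "loadings")             # loadings constrained equal across groups

    anova(fit_free, fit_equal)   # chi-square difference test for heterogeneity between clusters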

Multivariate gradient boosted trees involve a nonparametric regression method that applies the idea of stochastic gradient boosting to trees (see stochastic gradient boosting ). Trees are fitted iteratively to the residuals obtained from previous trees, while seeking to optimize cross-validated prediction across multiple outcomes (not just one). See Miller et al.

mvtboost is a package written in R code by Miller that implements multivariate gradient boosted trees, allowing the user to tune and explore the model. See Miller et al.

MyPersonality project ( http://www.mypersonality.org/ ; Kosinski, Matz, Gosling, Popov, and Stillwell, 2015) stores the scores from dozens of psychological questionnaires as well as Facebook profile data of over six million participants. See Kosinski, et al.

MySQL is an open-source database management system that uses structured query language (SQL) to store and retrieve the large amounts of data accrued in big data projects. See Harlow and Oswald, and Chen and Wojcik.

Novelty refers to how different a case is from the rest of the data, showing little proximity and more dissimilarity. See Brandmaier et al.

Out-of-bag samples are portions of a larger dataset that do not participate in the development of a predictive model and can therefore be used to generate predicted values (and error). Out-of-bag samples are similar to the test sample data referred to previously in k -fold cross-validation. See Brandmaier et al.

Partial least squares correspondence analysis ( PLSCA ) is a generalization of partial least squares that can extract relationships from two separate sets of data measured on the same sample. In particular, PLSCA is useful for handling both categorical and continuous data types (e.g., genetic single-nucleotide polymorphisms that are categorical, and behavioral data that are roughly continuous). Permutation tests and bootstrapping are applied to conduct statistical inference for the overall fit of the model as well as inference on the stability of each obtained component. See Beaton et al.

Random permutation model is an approach for determining whether to preserve information about word order in text analytics, in case doing so provides additional predictive information. Permutations create uncorrelated vectors as a point of contrast with the actual ordering. See Stanley and Byrne.

semtree is a computer package that was developed by ( Brandmaier, 2015 ; http://brandmaier.de/semtree/ ) and written in R. It can be used to analyze SEM tree and forest methods to help explore and discern clusters or subgroups within a large dataset. See Brandmaier et al. and related references.
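
A rough sketch of the intended workflow, under the assumption that the template model is fitted with lavaan and that the data columns not used in the model serve as candidate split variables; the simulated data are hypothetical, and the exact arguments should be checked against the package documentation:

    library(lavaan)
    library(semtree)

    set.seed(7)
    n   <- 400
    age <- sample(20:80, n, replace = TRUE)
    grp <- rbinom(n, 1, 0.5)
    f   <- rnorm(n, mean = 0.02 * age)            # the latent mean drifts with age
    dat <- data.frame(age, grp,
                      y1 = f + rnorm(n, sd = 0.5),
                      y2 = 0.8 * f + rnorm(n, sd = 0.5),
                      y3 = 0.7 * f + rnorm(n, sd = 0.5))

    template <- cfa('f =~ y1 + y2 + y3', data = dat)   # template SEM

    tree   <- semtree(template, data = dat)     # recursively split on covariates such as age or grp
    forest <- semforest(template, data = dat)   # ensemble of trees grown on resampled data
    varimp(forest)                              # aggregated variable importance across the forest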

Singular value decomposition ( SVD ) is a procedure used to reduce a large set of variables or items to a smaller set of dimensions. It is one approach to conducting a principal components analysis . See Kosinski et al.
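
A base-R illustration on simulated data, reducing ten noisy measures of a single underlying dimension:

    set.seed(8)
    n      <- 200
    latent <- rnorm(n)
    X      <- sapply(1:10, function(j) latent + rnorm(n, sd = 0.7))   # 10 correlated variables
    X      <- scale(X)

    s <- svd(X)                      # X = U D V'
    round(s$d^2 / sum(s$d^2), 2)     # proportion of variance captured by each dimension

    summary(prcomp(X))               # the same reduction via principal components analysis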

Stack Overflow is an online question-and-answer forum for programmers (using R, Python, and otherwise). See Stanley and Byrne.

Stochastic gradient boosting is a general term for an iterative regression method in which the predictor entered first has the strongest functional relationship with the outcome; residuals are then created, and the same rule is applied with the residuals as the new outcome. At each iteration, only a subset of the data is used, which helps develop more robust models (and out-of-bag prediction errors can be obtained from the data left out of that iteration). The learning rate and number of iterations are, loosely, inversely related (a low learning rate, or small improvement in prediction at each step, generally means more iterations), and both can be tuned through cross-validation. See Chapman et al.
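
A single-outcome illustration with the gbm package (mvtboost, described above, extends the same idea to multiple outcomes); the simulated data and tuning values are arbitrary:

    library(gbm)

    set.seed(9)
    n  <- 1000
    x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
    y  <- sin(2 * pi * x1) + 2 * (x2 > 0.5) + rnorm(n, sd = 0.3)   # nonlinear and step effects
    d  <- data.frame(y, x1, x2, x3)

    fit <- gbm(y ~ ., data = d,
               distribution = "gaussian",
               n.trees = 2000,           # number of boosting iterations
               shrinkage = 0.01,         # low learning rate, so more iterations are needed
               interaction.depth = 2,    # depth of each small tree
               bag.fraction = 0.5,       # the "stochastic" part: subsample half the data per step
               cv.folds = 5)             # cross-validation to choose the number of trees

    best <- gbm.perf(fit, method = "cv") # iteration that minimizes cross-validated error
    summary(fit, n.trees = best)         # relative influence of each predictor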

Stop words are words that are not essential to a phrase or text and therefore can be omitted to help keep a file more concise. Examples of stop words include “an” and “the” or other similarly nondescript words that can be deleted from a large database (e.g., Twitter, Facebook) and do not need to be analyzed. See Chen and Wojcik; Kern et al.; and Stanley and Byrne.
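
A base-R sketch; the stop-word list here is a tiny hand-made example (packages such as tm ship with fuller lists):

    tokens    <- c("the", "students", "completed", "the", "survey", "on", "a", "phone")
    stopwords <- c("the", "a", "an", "on", "of", "and")   # minimal illustrative list

    tokens[!tokens %in% stopwords]
    # "students" "completed" "survey" "phone"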

Structural equation model (SEM) forests are classification procedures that combine SEM and decision-tree or SEM-tree methods to understand the nature of subgroups that exist in a large dataset. SEM forests extend the method of SEM trees by resampling the data to form aggregates of SEM trees that should have less bias and more stability. See Brandmaier et al.

Structural equation model (SEM) trees combine the methods of decision trees and SEM to conduct theory-guided analysis of large datasets. SEM trees are useful in examining a theoretically based prediction model, but can be unstable when random variation in the data is inadvertently featured in a decision tree. See Brandmaier et al.

superpc is a computer package written in R code by Bair and Tibshirani (2010) that conducts the procedure known as supervised principal components analysis, a term that is defined below. See Chapman et al.

Supervised learning algorithms are procedures that are developed on a training dataset and then used to build models (e.g., regression models) that predict an outcome from one or more variables. See Chen and Wojcik.

Supervised principal components analysis ( SPCA ) is a generalization of principal components regression that first selects predictors with meaningful univariate relationships with the outcome and then performs principal components analysis. Cross-validation is used to determine the appropriate threshold for variable selection and the number of principal components to retain. See Chapman et al.
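
The superpc package (see the entry above) implements the full procedure; the base-R sketch below only illustrates the two-step logic with simulated data and a made-up screening threshold, which in practice would be tuned by cross-validation:

    set.seed(10)
    n <- 300; p <- 40
    X <- matrix(rnorm(n * p), n, p)
    y <- X[, 1] + X[, 2] + X[, 3] + rnorm(n)    # only three predictors relate to the outcome

    # Step 1: screen predictors by the strength of their univariate association with y
    scores <- abs(apply(X, 2, function(xj) cor(xj, y)))
    keep   <- which(scores > 0.15)

    # Step 2: principal components on the retained predictors, then regress y on the first component
    pcs <- prcomp(X[, keep, drop = FALSE], scale. = TRUE)
    summary(lm(y ~ pcs$x[, 1]))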

TExPosition is a computer package written in R code by Beaton and colleagues that implements partial least squares correspondence analysis (this latter term being defined previously). See Beaton et al.

Theory of the data source is the process whereby a larger conceptual framework is adopted when analyzing and interpreting findings from a large dataset, particularly one obtained for another purpose, such as with web scraping of generally available data. See Landers et al.

twitteR is a package written in R code by Jeff Gentry that accesses the Twitter API (see glossary entry on this term), which then allows one to extract subsets of Twitter data found online, search the data, and subject the data to text analyses. See Jones et al.
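
A minimal sketch of the typical workflow; the credentials and the account name are placeholders that must be replaced with values obtained from Twitter’s developer site:

    library(twitteR)

    setup_twitter_oauth(consumer_key    = "YOUR_KEY",
                        consumer_secret = "YOUR_SECRET",
                        access_token    = "YOUR_TOKEN",
                        access_secret   = "YOUR_TOKEN_SECRET")

    tweets <- userTimeline("cityhall_account", n = 200)   # recent tweets from a hypothetical account
    df     <- twListToDF(tweets)                          # convert to a data frame for text analysis
    head(df$text)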

User-footprint matrix holds information obtained from sources such as the web or various records and lists. See Kosinski et al.

Variable importance is a term indicating how much the inclusion of a specific variable will reduce the degree of uncertainty in a model (or models) of interest. The uncertainty criterion and the model must, of course, be mathematically formalized. See Brandmaier et al.
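
For a flavor of the idea outside the SEM setting, random forests report an analogous importance measure; the randomForest example below uses simulated data in which one predictor is irrelevant:

    library(randomForest)

    set.seed(11)
    n  <- 500
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
    y  <- 2 * x1 + x2^2 + rnorm(n)                 # x3 contributes nothing

    fit <- randomForest(y ~ x1 + x2 + x3,
                        data = data.frame(y, x1, x2, x3),
                        importance = TRUE)

    importance(fit)   # x1 and x2 reduce prediction error far more than x3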

Web scraping is a process that culls large amounts of data from web pages to be used in observational or archival data collection projects. See Landers et al.
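
A minimal sketch with the rvest package; the URL and the CSS selector are placeholders, and scraping should respect a site’s terms of service and robots.txt:

    library(rvest)

    page     <- read_html("https://example.com/announcements")   # placeholder URL
    headings <- html_elements(page, "h2")                        # placeholder CSS selector
    html_text(headings)                                          # extract the text for later analysis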

World Well-Being Project ( WWBP , http://www.wwbp.org/ ) involves a collaboration with researchers from psychology and computer science. The project draws on language data from social media to study evidence for well-being that can be revealed through themes of interpersonal relationships, successful achievements, involvement with activities, and indication of meaning and purpose in life. See Kern et al.

A draft of a portion of this introduction was previously presented in Harlow, L. L., & Spahn, R. (2014, October). Big data science: Is there a role for psychology? Abstract for the Society of Multivariate Experimental Psychology, Nashville, TN.

Contributor Information

Lisa L. Harlow, Department of Psychology, University of Rhode Island.

Frederick L. Oswald, Department of Psychology, Rice University.

  • Alvarez RM. Computational social science: Discovery and prediction. Cambridge: Cambridge University Press; 2016.
  • APA. Ethical principles of psychologists and code of conduct. 2010. Retrieved September 28, 2016 from: http://www.apa.org/ethics/code/
  • Azmak O, Bayer H, Caplin A, Chun M, Glimcher P, Koonin S, Patrinos A. Using big data to understand the human condition: The Kavli HUMAN Project. Big Data. 2015;3:173–188.
  • Bair E, Tibshirani R. superpc: Supervised principal components. R package version 1.07. 2010. Retrieved from http://www-stat.stanford.edu/~tibs/superpc
  • Brandmaier AM. semtree: Recursive partitioning of structural equation models in R [Computer software manual]. 2015. Retrieved from http://www.brandmaier.de/semtree
  • Cioffi-Revilla C. Introduction to computational social science: Principles and applications. London: Springer-Verlag; 2014.
  • Fawcett T. Mining the quantified self: Personal knowledge discovery as a challenge for data science. Big Data. 2016;3:249–266.
  • Friedman J, Hastie T, Simon N, Tibshirani R. glmnet: Lasso and elastic-net regularized generalized linear models. R package 2.0-6. 2016. Retrieved from https://cran.r-project.org/web/packages/glmnet/glmnet.pdf
  • Gentry J. Package ‘twitteR’, version 1.1.9. 2016. Retrieved from https://cran.r-project.org/web/packages/twitteR/twitteR.pdf
  • Mayer-Schönberger V, Cukier K. Big data: A revolution that will transform how we live, work, and think. Boston: Houghton Mifflin Harcourt; 2013.
  • McArdle JJ, Ritschard G, editors. Contemporary issues in exploratory data mining in the behavioral sciences. New York: Routledge; 2014.
  • McDonald RP. Test theory: A unified treatment. New York: Routledge; 1999.
  • Meehl PE. Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry. 1990;1:108–141.
