Computational Social Science Skills


Our devices track our behavior in ways that create unimaginable amounts of information, information that can be useful for understanding human behavior. Not surprisingly, the widespread availability of these data parallels an increasing interest in data science and related topics (see Figure 1). Social scientists are increasingly using “big data” to examine theoretically grounded research questions. Yet few social scientists, especially psychologists, have the skills and training experiences needed to engage with this rapidly growing area of expertise called “data science.”

Ideally, data science skills should be linked with traditional forms of scientific inquiry, including social science theory, research methods and statistics. Below, we briefly describe this exciting new area of social science inquiry, covering big data acquisition, processing and visualization, as well as two entry-level analytic techniques.

Computational social science (CSS) lies at the intersection of social science theory, traditional statistical and research methods, and computer science (see Figure 2) and is rapidly gaining traction in psychological science (e.g., Eiler et al., 2018; Jones et al., 2017; Ritter et al., 2014; Tamburrini et al., 2015; Ueda et al., 2017; Youyou, Kosinski, & Stillwell, 2015). Students with CSS skills can use data to examine theoretically sound research questions in virtually any area that interests them. These skills also improve their post-baccalaureate opportunities.

Figure 1. Google Search Trends

Figure 2. Primary (gray) and secondary (white) learning goals.

Unfortunately, most psychology curricula do not incorporate data science skills, and our university is no exception. As a workaround, during the 2017-18 academic year we designed a year-long research lab with a set of learning goals, objectives and activities for introducing, practicing and applying basic CSS skills.

Unlike a typical classroom setting, the educational experience was a collaborative, experiential learning atmosphere that employed several pedagogical strategies including self-directed learning, peer-mentoring, guest lecturing, in-lab presentations and scientific conference dissemination. Students worked collaboratively with peers outside of the lab and relied on the guidance and leadership of more senior mentors on all projects. They also independently sought out and employed other resources online and shared them with the group. Social media data (e.g., Twitter, Reddit) were selected due to broad applicability across interests, textual nature and amenability to network analysis.

CSS Skills

Below, we briefly discuss the CSS skills we identified as important and include links, reference articles and example papers that can be used by anyone entering this arena.

Programming

While there are many programming languages, R seems to have the shortest learning curve while maximizing capability. It is free and open-source, which means resources abound. R is good for data manipulation, visualization and analysis. Moreover, annotating code, which is easy to do within the R environment, enhances transparency and replicability. Good starter learning resources can be found at the R Project for Statistical Computing website, Cookbook for R, and R for Data Science.

Data Acquisition and Ethics

A wide range of tools is available for ethically gathering organic online and social media data. Most websites provide documentation (i.e., via a robots.txt file or licensing documentation) or an application programming interface (API) that specifies how one can interact with the data. A variety of R packages that facilitate data acquisition are available at the Comprehensive R Archive Network or through RStudio directly.
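As one hedged illustration of the robots.txt check mentioned above, the sketch below parses a set of robots.txt rules and asks whether a given page may be fetched. It uses Python's standard-library urllib.robotparser rather than R purely to keep the example dependency-free. The robots.txt content, the user-agent name research-bot and the example.org URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_fetch_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for page in ("https://example.org/public/page",
                 "https://example.org/private/data"):
        allowed = is_fetch_allowed(EXAMPLE_ROBOTS_TXT, "research-bot", page)
        print(page, "allowed" if allowed else "disallowed")
```

In practice the rules would be read from the site's own robots.txt file. A permissive robots.txt does not replace reading the site's terms of service or API documentation.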

Data Visualization and Manipulation

Data visualization skills are relevant for all stages of research, from initially visualizing large amounts of data to displaying results. While Excel can be useful for simple visualizations, the capabilities of R far exceed those of Excel. A good starting point is the R package tidyverse, which includes a range of tools specific to data science that share an underlying grammatical structure across toolkits.

Text and Network Analysis

Linguistic analysis software (LAS) identifies language indicators including themes, emotion, sentiment, evaluation, structure and others. Try the Linguistic Inquiry and Word Count (LIWC; Tausczik & Pennebaker, 2010), the Sentiment Analysis and Social Cognition Engine (SEANCE; Crossley, Kyle, & McNamara, 2017), and the Evaluative Lexicon (Rocklage, Rucker, & Nordgren, 2018). Network analysis is a natural partner to LAS in that correlations between variables derived from language analysis can be used to form the network (e.g., Eiler et al., 2018). Minimally priced point-and-click programs are available (UCINET, Borgatti, Everett, & Freeman, 2002; NetDraw, Borgatti, 2002) in addition to the more advanced analytics available in R (e.g., statnet; Handcock et al., 2016). For an introduction, see Hanneman & Riddle’s (2005) free online book.
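To make the idea of forming a network from correlations concrete, here is a minimal sketch. It is not the procedure used in the papers cited above. It assumes each post has already been scored on a few language variables (the variable names and scores below are invented), computes Pearson correlations between the variables, and keeps an edge wherever the absolute correlation meets an arbitrary threshold of 0.5. It is written in plain Python so that it is self-contained; dedicated packages such as statnet in R offer far richer tools.

```python
from itertools import combinations
from math import sqrt

# Hypothetical scores for five posts on four language variables.
LANGUAGE_SCORES = {
    "positive_emotion": [1, 2, 3, 4, 5],
    "social_words": [2, 4, 6, 8, 10],
    "negative_emotion": [5, 4, 3, 2, 1],
    "analytic": [3, 1, 4, 1, 5],
}


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no correlation information
    return cov / sqrt(var_x * var_y)


def correlation_network(scores, threshold=0.5):
    """Return {(var_a, var_b): r} for every pair with |r| >= threshold."""
    edges = {}
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


network = correlation_network(LANGUAGE_SCORES)

if __name__ == "__main__":
    for (a, b), r in network.items():
        print(f"{a} -- {b}: r = {r:.2f}")
```

The nodes here are language variables and the edges are strong correlations between them. The threshold is a design choice that should be justified by theory or sensitivity checks, not taken as given.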

Get Started

For interested readers, we have described the CSS lab experience course in detail, complete with an example syllabus and detailed explanations of learning goals, lab activities and required exercises. Please contact us for a pre-print. We welcome the opportunity to interact with others who want guidance on how to get started.

Heidi A Wayment
Brian Alfred Eiler

Below are some helpful references, organized by topic.

General Articles on CSS and Big Data

Chen, E. E., & Wojcik, S. P. (2016). A practical guide to big data research in psychology. Psychological Methods, 21(4), 458–474. doi:10.1037/met0000111

Eiler, B. A., Doyle, P. C., Al-Kire, R. L., & Wayment, H. A. (2018). Teaching introductory computational psychology skills to undergraduate students in a research experience setting. Manuscript submitted for publication.

Grolemund, G., & Wickham, H. (2017). R for data science. Retrieved from http://r4ds.had.co.nz/

Gorakala, S. K. (October, 2013). Fetch Twitter data using R. Retrieved from https://www.r-bloggers.com/fetch-twitter-data-using-r/

Hargittai, E. (2015). Is bigger always better? Potential biases of big data derived from social network sites. The Annals of the American Academy of Political & Social Science, 659, 63-76.

Kim, A. E., Hansen, H. M., Murphy, J., Richards, A. K., Duke, J., & Allen, J. A. (2013). Methodological considerations in analyzing Twitter data. Oxford University Press. doi:10.1093/jncimonographs/lgt026

Kosinski, M., Wang, Y., Lakkaraju, H., & Leskovec, J. (2016). Mining big data to extract patterns and predict real-life outcomes. Psychological Methods, 21(4), 493-506. doi:10.1037/met0000105

Landers, R. N., Brusso, R., Cavanaugh, K., & Collmus, A. B. (2016). A primer on theory-driven web scraping: Automatic extraction of big data from the internet for use in psychological research. Psychological Methods, 21(4), 475-492. doi:10.1037/met0000081

Psychological Studies using CSS Skills

Cavolo, K. M., Eiler, B. A., & Wayment, H. A. (under review). Novel examination of self-evaluation processes through data mining and textual analysis. Journal of Language and Social Psychology.

Eiler, B. A., Al-Kire, R. L., Doyle, P. C., & Wayment, H. A. (in press). Power and trust dynamics of sexual violence: A textual analysis of Nassar victim impact statements and #MeToo disclosures on Twitter. Journal of Clinical Sport Psychology.

Gillath, O., Karantzas, G. C., & Selcuk, E. (2017). A net of friends: Investigating friendship by integrating attachment theory and social network analysis. Personality and Social Psychology Bulletin, 43(11), 1546-1565. doi:10.1177/0146167217719731

Jones, N. M., Thompson, R. R., Schetter, C. D., & Silver, R. C. (2017). Distress and rumor exposure on social media during a campus lockdown. Proceedings of the National Academy of Sciences of the USA, 114, 11663-11668. doi:10.1073/pnas.1708518114

Jones, N., Wojcik, S. P., Sweeting, J., & Silver, R. C. (2016). Tweeting negative emotion: An investigation of Twitter data in the aftermath of violence on college campuses. Psychological Methods, 21(4), 526-541. doi:10.1037/met0000099

Ritter, R. S., Preston, J. L., & Hernandez, I. (2014). Happy tweets: Christians are happier, more socially connected, and less analytical than atheists on Twitter. Social Psychological and Personality Science, 5(2), 243-249. doi:10.1177/1948550613492345

Tamburrini, N., Cinnirella, M., Jansen, V. A., & Bryden, J. (2015). Twitter users change word usage according to conversation-partner social identity. Social Networks, 40, 84-89. doi:10.1016/j.socnet.2014.07.004

Ueda, M., Mori, K., Matsubayashi, T., & Sawada, Y. (2017). Tweeting celebrity suicides: Users’ reaction to prominent suicide deaths on Twitter and subsequent increases in actual suicide. Social Science & Medicine, 189, 158-166. doi:10.1016/j.socscimed.2017.06.032

Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences of the United States of America, 112(4), 1036-1040. doi:10.1073/pnas.1418680112

Text Analysis

Crossley, S. A., Kyle, K., & McNamara, D. S. (2017). Sentiment Analysis and Social Cognition Engine (SEANCE): An automatic tool for sentiment, social cognition, and social-order analysis. Behavior Research Methods, 49(3), 803–821. doi:10.3758/s13428-016-0743-z

Gefen, D., Endicott, J. E., Fresneda, J. E., Miller, J., & Larsen, K. R. (2017). A guide to text analysis with latent semantic analysis in R with annotated code: Studying online reviews and the Stack Exchange community. Communications of the Association for Information Systems, 41(1), 450–496. doi:10.17705/1CAIS.04121

Pennebaker, J. W., Chung, C. K., Ireland, M., Gonzales, A., & Booth, R. J. (2007). The development and psychometric properties of LIWC2007. Austin, TX: University of Texas at Austin.

Rocklage, M. D., & Fazio, R. H. (2015). The evaluative lexicon: Adjective use as a means of assessing and distinguishing attitude valence, extremity, and emotionality. Journal of Experimental Social Psychology, 56, 214-227. doi:10.1016/j.jesp.2014.10.005

Rocklage, M. D., Rucker, D. D., & Nordgren, L. F. (2018). The evaluative lexicon 2.0: The measurement of emotionality, extremity, and valence in language. Behavior Research Methods, 50, 1327-1344. doi:10.3758/s13428-017-0975-6

Network Analysis

Borgatti, S. P. (2002). NetDraw Software for Network Visualization. Lexington, KY: Analytic Technologies.

Borgatti, S. P., Everett, M. G., & Freeman, L. C. (2002). Ucinet for Windows: Software for Social Network Analysis. Harvard, MA: Analytic Technologies.

Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892–895. doi:10.1126/science.1165821

Butts, C. T. (2008). Social network analysis: A methodological introduction. Asian Journal of Social Psychology, 11(1), 13–41. doi:10.1111/j.1467-839X.2007.00241.x

Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., & Morris, M. (2003). statnet: Software tools for the statistical modeling of network data. Retrieved from http://statnetproject.org

Hanneman, R. A., & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California, Riverside. Retrieved from http://faculty.ucr.edu/~hanneman/

Montazeri, F., de Bildt, A., Dekker, V., & Anderson, G. M. (2018). Network analysis of anxiety in the autism realm. Journal of Autism and Developmental Disorders, 1–12. doi:10.1007/s10803-018-3474-4

Re-posted with permission from the American Psychological Association’s Psychology Teacher Network.

About the Authors

Heidi Wayment, PhD
Dr. Wayment received her doctorate in social and health psychology from UCLA in 1992 and, following a post-doctoral fellowship, has been teaching at Northern Arizona University since 1996. Her research focuses on issues related to self-identity, stress and coping, and predictors of health behavior. For the past decade her research has focused on the “quiet ego” and characteristics associated with balance and growth, and their relationship to stress, coping and resilience, for which she was recognized with NAU’s Outstanding Scholarship Award in 2018. During her tenure at NAU she has served as department chair (2004-07, 2012-15) and associate dean (2016-18), and has helped strengthen undergraduate research opportunities. She is a fellow of the Society of Experimental Social Psychology, Association for Psychological Science and Western Psychological Association. Her interest in data science skills began in about 2010, and she is currently enrolled in the master’s program through the University of Mannheim’s International Program in Survey and Data Science.
Brian Eiler, PhD
Dr. Eiler earned his doctorate in experimental psychology from the University of Cincinnati and is currently a post-doctoral fellow at Northern Arizona University. His research applies the lens of complexity science to phenomena at the intersection of social, health and industrial/organizational psychology. This framework is not a single theory, but rather encompasses a range of disciplines and techniques (e.g., graph theory, dynamical systems, statistical physics, data science, computational and simulation modeling). His recent work has examined individual, social and cultural influences on concussion reporting in collegiate football, disclosures of experiences of sexual violence, group-based workplace inequality, and language on Twitter related to politics and pop culture. His teaching and supervisory activities center on mentorship, skill acquisition and engagement. He has also been a champion for diversity, equity and inclusion through policy and program development and evaluation in academia and as an organizational consultant.