Thursday, Jun. 7, 2018
The prevailing use of technology in education has significantly changed how data are collected and analyzed in schools and universities in recent years. With the emergence of new forms of data, such as system log data (i.e., records of users’ interactions with a digital learning environment), researchers are increasingly able to collect rich, longitudinal information on all individuals and apply advanced analytical techniques, such as data mining and learning analytics, to assess challenges, identify problems, and provide evidence to guide decision-making.
This newly published book, Learning Analytics Goes to School: A Collaborative Approach to Improving Education, written by Dr. Andrew Krumm, Dr. Barbara Means and Dr. Marie Bienkowski, describes multiple efforts to use data-intensive research methods to improve teaching and learning by building long-term partnerships between researchers and practitioners. In this blog, I introduce the contribution that the Teachers College (TC), Columbia University Education Leadership Data Analytics (ELDA) research team made to this book and summarize several key points highlighted in the book.
This collaborative research project was funded through a grant from the National Science Foundation (NSF DRL-1444621) in collaboration with SRI International, Summit Public Schools, and Teachers College, Columbia University.
In October 2016, Prof. Alex Bowers and four members of the TC ELDA team, Lauren Fox, SunMin Lee, Elizabeth Monroe and Yilin Pan, participated in a DataSprint with the evaluation and assessment team of Summit Public Schools and researchers from the Center for Technology in Learning, SRI International, led by Dr. Andrew Krumm. A “DataSprint” is a collaborative approach for rapidly developing data products, change ideas, and tests of change ideas.
In this DataSprint, the ELDA team aimed to apply hierarchical cluster analysis to the system log data of Summit Public Schools’ Personalized Learning Platform and visualize the results using heatmaps to help practitioners better understand student use of the platform. The DataSprint was composed of three phases. In the first phase (30 days), the SRI International team cleaned the logfile dataset and identified several potential problems in collaboration with Summit Public Schools. Meanwhile, the TC ELDA team attended a training session on hierarchical cluster analysis and heatmaps led by Yihan Zhao, and held multiple group meetings to familiarize themselves with the demo data and generate mock-up R code. In the second phase, the three teams gathered in Menlo Park, CA, and worked together for two days. Participants were divided into three groups, each working on one of these three problems:
- What is the relationship between external and internal measures of student learning and achievement?
- What is the relationship between students’ activity in a digital learning environment and external measures of student achievement?
- How many distinct learning behaviors can be measured using digital learning environment data?
In each group, the team members first held a discussion to specify the research question and identify measures that are meaningful in practice. Then team members from SRI International and Teachers College conducted the analysis, while team members from Summit Public Schools worked on interpreting the results and putting together the slides. The process was iterative and interactive, as each group jointly decided what to explore, how to address the research question, and how to interpret the results within the local context. In the third phase of the DataSprint, researchers from Teachers College cleaned the data files and R code files and shared them with the Summit Public Schools team so that they could replicate the same analysis in the future. All team members summarized their observations and thoughts, which eventually contributed to the book.
Highlights of this book
Data and Methods Used in Educational Data-Intensive Research
The increasing use of technology in education enables data-intensive researchers to collect and utilize data from digital learning environments (e.g., intelligent tutoring systems, learning management systems, and massive open online courses), administrative data systems (e.g., student information systems and statewide longitudinal data systems), and sensors and recording devices (e.g., location, physical movement, and speech data collected from fitness sensors). These data differ from traditional educational datasets in volume, velocity, variety, veracity, coverage, and centrality.
To address the challenges of analyzing these data, Krumm and his colleagues recommend a five-step workflow: prepare, wrangle, explore, model and communicate. The authors then demonstrate how to apply the five steps of the workflow using a hypothetical example. For example, hierarchical cluster analysis with a heatmap is introduced as a useful tool to identify and visualize structural patterns in the data.
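To make the clustering idea concrete, here is a minimal, self-contained sketch of average-linkage hierarchical clustering on mock log-file features. The DataSprint itself used R (where base functions such as hclust and heatmap serve this purpose); this Python version, with invented student IDs and usage counts, only illustrates the technique of repeatedly merging the closest pair of clusters.

```python
# Hypothetical sketch: agglomerative (hierarchical) clustering of students
# by platform-usage profiles, using average linkage. All data are mock;
# the actual DataSprint analysis was done in R on Summit's real log files.
from math import sqrt

# Mock log-derived features per student: (logins, content views, assessments taken)
students = {
    "s1": (40, 120, 10),
    "s2": (42, 110, 12),
    "s3": (5, 15, 1),
    "s4": (6, 20, 2),
    "s5": (25, 60, 6),
}

def distance(a, b):
    # Euclidean distance between two feature vectors.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2):
    # Mean pairwise distance between members of the two clusters.
    return sum(distance(students[i], students[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

def hierarchical_cluster(k):
    # Start with each student in its own cluster; merge the closest pair
    # (by average linkage) until only k clusters remain.
    clusters = [[sid] for sid in students]
    while len(clusters) > k:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

print(hierarchical_cluster(2))  # → [['s1', 's2'], ['s3', 's4', 's5']]
```

In a heatmap visualization, the rows (students) would then be reordered by this merge tree so that students with similar usage patterns sit next to each other.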
Legal and Ethical Issues in Using Educational Data
Krumm and his colleagues point out that researchers should follow federal legislation and make ethical decisions about privacy control and information security when collecting and analyzing data. The Family Educational Rights and Privacy Act (FERPA) requires that schools obtain consent from parents and eligible students before disclosing any student information to third parties, except under a specified set of exemptions. De-identified data are subject to fewer restrictions. The ethical principles include “doing no harm through the use of data, doing good, making sure that the applications and results are just, fair and equitable; and ensuring that individuals have agency in terms of decision making or, at least transparency into what decisions are being made about them” (p. 71).
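As a toy illustration of the de-identification idea (not legal guidance — FERPA de-identification has its own specific requirements), the sketch below drops direct identifiers and replaces real student IDs with random pseudonyms before a dataset is shared. The record fields here are hypothetical.

```python
# Toy de-identification sketch (illustration only, not FERPA compliance):
# drop direct identifiers and pseudonymize student IDs before sharing.
import secrets

records = [
    {"student_id": "A123", "name": "Jane Doe", "logins": 40},
    {"student_id": "B456", "name": "John Roe", "logins": 5},
]

def deidentify(rows, drop=("name",)):
    pseudonyms = {}  # real ID -> pseudonym; kept separately, never shared
    out = []
    for row in rows:
        sid = row["student_id"]
        # Assign each real ID a random pseudonym the first time it appears.
        pseudonyms.setdefault(sid, "p" + secrets.token_hex(4))
        clean = {k: v for k, v in row.items() if k not in drop}
        clean["student_id"] = pseudonyms[sid]
        out.append(clean)
    return out, pseudonyms
```

Keeping the pseudonym mapping separate lets the data owner re-link records if needed while the shared file itself contains no direct identifiers.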
Foundations of Collaborative Data-Intensive Improvement
Krumm and his colleagues introduce the model of collaborative data-intensive improvement (CDI) to bring researchers and education practitioners together for data-intensive research. CDI is highly influenced by three factors in education. First, federal policies, such as No Child Left Behind and the Every Student Succeeds Act, have pushed education practitioners to use data to guide their decision-making. Second, with this emerging trend of evidence-based decision-making, new forms of collaborative education research have been developed, such as translational/implementation research that translates laboratory findings into real application in classrooms and schools with a deep understanding of the contexts, Design-Based Implementation Research in which researchers and practitioners co-design the research agenda, and improvement science that focuses on the variation in outcomes and how outcomes might be improved. Third, in recent years, although research that uses detailed system log data from digital learning systems has increased, the big data approach has not been widely adopted in collaborative research partnerships. CDI borrows techniques for working with practitioners from improvement science and Design-Based Implementation Research, and uses detailed and frequent data to explore how improvements can be made.
Supporting Conditions for Collaborative Data-Intensive Improvement
Based on the experience of working with Summit Public Schools and other partners (e.g., the Carnegie Foundation for the Advancement of Teaching and the Carnegie Math Pathways) on data-intensive improvement, Krumm and his colleagues summarize three characteristics that distinguish CDI from traditional research-practitioner partnerships: “(1) research questions and topics were based on the needs of practitioners; (2) the primary audience for data products was the partnership, and (3) researchers and practitioners co-develop change ideas” (pp. 126-127). In addition, four conditions were in place for researchers and practitioners to come together to develop data products and change ideas in the CDI with Summit Public Schools. First, the partnership between researchers and practitioners must be grounded in trust. Second, an explicit improvement method organized multiple elements of the partnership’s work. Third, learning events provided opportunities for members of the partnership to collaborate and build knowledge. Fourth, common workflows and accompanying tools supported data-intensive research, improvement activities and project coordination.
Five phases of Collaborative Data-Intensive Improvement
To facilitate the use of CDI by other researchers, Krumm and his colleagues outline a five-phase process for organizing a CDI project. Phase I involves setting up a partnership. The main tasks include identifying project members, clarifying problems to address and defining the aim of the partnership. In Phase II, researchers and practitioners jointly develop the practical improvement theory that enables the partnership to reach the aim. The aim in Phase I and the theory in Phase II jointly shape the activities of data wrangling, exploration and modeling in Phase III. In Phase IV, the results of the data-intensive analysis are translated to change ideas on how to improve the practice. In Phase V, members of the partnership test out the change ideas in real learning environments.
In a CDI process, all participants are likely to make some changes in their usual practice. Education researchers need to get comfortable with identifying research questions jointly with practitioners and developing research designs that are more “engineering- and design-based” (p. 157). For example, they may work on quick, small tests of differences in means rather than conducting a rigorous randomized controlled trial. As a result, they may find scholarly journals are not receptive to articles based on CDI work. Educational data scientists need to expand their work from data analysis to intervention development and testing. In addition, they will need to wrangle and explore data from multiple sources to complement the data from a particular learning environment. Education leaders and frontline practitioners will find that they are committing to achieving a specific goal and using data, rather than gut instinct, to shape decisions about future instruction.
Krumm and his colleagues share five takeaways that they learned from multiple CDI projects. First, formulating meaningful variables is central to the work of CDI. Second, data scientists need to make sure they’re working on a real problem of practice. Third, researchers have to leave their desks to understand teaching and learning. Fourth, data for generating change ideas should not be confused with data for other purposes. Fifth, the CDI team should set up data security procedures and data use agreements before touching any individual-level data.
Krumm, A., Means, B., & Bienkowski, M. (2018). Learning Analytics Goes to School: A Collaborative Approach to Improving Education. New York: Routledge.