Reimagining Education Data Analytics

Creating a longitudinal view of lifelong learner experiences is a decades-old great idea that remains maddeningly difficult to achieve. And without this view of student activities and outcomes, it also remains maddeningly difficult to understand what’s working in education – for both the individual student and across the broader learning ecosystem.

P-20W datasets aren’t meeting expectations

A few states have linked pre-school, K-12, post-secondary, and workforce data into “P-20W” datasets with varied and limited success, often handicapped by data latency, poor fidelity, and gaps. Early education and workforce data remains sparse, and critical data points are missing from adjacent state agencies whose work affects student success, such as health and human services, foster care, and job training.

The extended time that it takes for data to move from individual education settings, through intermediaries, and into a P-20W data aggregator limits the usefulness and impact of the data. Researchers, practitioners, students and parents are only able to see data through a rearview mirror, but in this case, the data is 18 months old and is definitely not “closer than it appears.” The opportunity for practitioners and parents to influence a student’s path is lost.

In addition, nuanced research questions remain difficult to answer due to poor data fidelity. As critical information moves between organizations, rigid, pre-defined standards structures are applied and data is held back at each step. Researchers are forced to bypass the P-20W datasets that were supposed to help them, and they must go back to individual institutions to access critical missing information, which takes time.

Individual state and agency attempts to tackle these problems are inefficient and waste limited public and philanthropic funds. Quite simply, these data problems are not unique to states or even to the education domain but are solvable through applying common, well-understood and proven data analytics patterns.

So what if…

So what if there were a way for states to leverage common patterns and practices to jump-start and accelerate their efforts to develop a longitudinal view of lifelong learner experiences? What if data generated in educational settings could be made available in days instead of months? What if the loss of data fidelity could be reduced or eliminated? What if leveraging modern privacy methodologies could unlock more data? What if we could finally answer the question of “what works in education”?

Changing perceptions and capabilities to unlock opportunities

Parent, practitioner, and political enthusiasm for developing a robust longitudinal view of students is high, and practical technical solutions are widely available. The topic of linking learner data from multiple sources to answer research questions and to provide personalized learning and services has historically raised some privacy concerns. Many parents, however, after a pandemic year of cobbled-together home learning applications that cannot provide an integrated view of their student’s lessons, homework, status or progress, are now questioning whether the status quo represents the state of the art.

At the same time, cloud-based data analytics solutions have emerged that make acquiring, curating, linking, aggregating, storing and visualizing disparate but related data more approachable. Major cloud vendors have strengthened their offerings, focusing on lowering the investment required to implement advanced data analytics and visualization capabilities.

In the past, lengthy implementation times meant building state-level longitudinal datasets was a multi-year process, subject to changes in funding and sponsorship. Lower costs and fewer barriers to implementing cloud-based analytics have significantly reduced time-to-value, making deployments possible within political cycles.

A&M is working on modern data pipelines for education

A&M is engaged in a series of projects with states and philanthropic organizations to assess states’ approaches to modernizing their P-20W systems and to deliver meaningful, linked datasets for lifelong learners. Through our work, we have canvassed a broad set of stakeholders, including education institutions, academics, researchers, cloud vendors, education technology companies, and state-level P-20W data aggregators, as well as data analytics thought leaders outside of education, to identify opportunities, wants, needs, and gaps.

Most recently, A&M conducted an in-depth evaluation of one state-level data aggregator to identify critical capabilities and assess the agency’s ability to deliver. We looked at the landscape of possible solution providers to assess how they could be leveraged to fill capability gaps. A&M applied best practice patterns to develop a conceptual solution design, mapped possible solutions from the landscape analysis onto the design, and developed a roadmap and cost model to help inform the agency’s decision process.

With a target solution design, A&M helped guide the state-level agency through a decision process to select specific solution providers and technologies and assisted the agency in developing a concrete design and defining their projects to deliver the design. A&M worked with the agency staff to develop a proof-of-concept to demonstrate the selected technologies and validate hypotheses developed during the concrete solution design. Now, A&M is continuing to provide technology and program management assistance as the agency implements the project roadmap.

What’s next?

A&M is engaging more states in formalizing their solution design and roadmap for developing P-20W data systems. A&M is also working with states and philanthropies to develop a “playbook” that states can use to guide and jumpstart their P-20W journeys. We are exploring how technology companies could collaborate on developing a shared architecture vision and technology “accelerators” that could transform the months-long process of getting to a proof-of-concept into weeks and could be leveraged as a starting point for more formal implementation efforts.

We’re also exploring opportunities to work on a few particularly “messy” problems that would be best solved collectively. These are challenges that would benefit from multiple states and technology providers joining forces to develop common solutions that can be shared.

First is privacy for individual-level student information. The tension between making the most information available to a wide set of stakeholders and adhering to strict privacy laws tends to be resolved in favor of privacy over availability; however, new approaches to privacy can increase the level of directional information made available while continuing to provide strict privacy at the student level.
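To make “directional information with student-level privacy” concrete, here is a minimal sketch of one possible approach. The article does not name a specific technique; differential privacy is our assumption for illustration, and the function names, group labels, and epsilon value are hypothetical. The sketch adds Laplace noise to group-level counts so that published totals stay roughly accurate while no single student’s presence can be confidently inferred.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # is a Laplace(0, scale) random variable.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, rng):
    """Return group counts with Laplace noise added (illustrative only).

    Assumes each student appears in exactly one group, so adding or
    removing one student changes one count by 1 (sensitivity of 1).
    A smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # Rounding and clamping at zero are post-processing steps and do not
    # weaken the privacy guarantee of the noisy values.
    return {
        group: max(0, round(n + laplace_noise(scale, rng)))
        for group, n in counts.items()
    }
```

Large groups remain directionally useful because the noise is small relative to the count, while very small groups are obscured. A production system would also need to track the cumulative privacy budget across releases, which this sketch does not attempt.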

Second is linking. As more varied and rich data is added to P-20W datasets, creating purposefully-linked data becomes increasingly difficult. As data moves beyond the K-12 domain, a shared linking identifier is less likely to exist, and if datasets from external non-state sources are targeted, one will almost certainly not be available. There are data science approaches to linking data without a common identifier that can be applied. Additional requirements will also need to be met, such as being able to unlink and relink data in the case of false positives and to re-identify individual data for small cohort interventions.
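As a rough illustration of linking without a common identifier, the sketch below scores candidate record pairs on name similarity and date of birth, keeps the best match above a threshold, and lets a reviewer unlink a false positive so that it is not re-created on the next run. This is a simplified sketch under our own assumptions, not a description of any state’s system: the field names, weights, threshold, and the LinkTable class are all hypothetical, and real record linkage typically uses more fields and a calibrated probabilistic model.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Case-insensitive string similarity in the range [0, 1].
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a, rec_b):
    # Hypothetical weights: fuzzy name match plus exact date-of-birth match.
    name = similarity(rec_a["name"], rec_b["name"])
    dob = 1.0 if rec_a["dob"] == rec_b["dob"] else 0.0
    return 0.6 * name + 0.4 * dob


class LinkTable:
    """Holds links between two datasets and remembers rejected pairs."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.links = {}       # source id -> (target id, score)
        self.blocked = set()  # (source id, target id) pairs marked as false positives

    def link_all(self, source, target):
        # Re-running this method relinks records, skipping blocked pairs.
        for s_id, s_rec in source.items():
            best = None
            for t_id, t_rec in target.items():
                if (s_id, t_id) in self.blocked:
                    continue
                score = match_score(s_rec, t_rec)
                if score >= self.threshold and (best is None or score > best[1]):
                    best = (t_id, score)
            if best is not None:
                self.links[s_id] = best
            else:
                self.links.pop(s_id, None)

    def unlink(self, s_id):
        # Remove a false-positive link and block it from being recreated.
        t_id, _ = self.links.pop(s_id)
        self.blocked.add((s_id, t_id))
```

Keeping the links in a separate table, rather than merging the records, is what makes unlinking and relinking cheap. Re-identification for small cohort interventions would require access controls and audit logging that are outside the scope of this sketch.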

What’s working in education?

The purposefully-linked longitudinal view of lifelong learner experiences that has remained elusive is coming more clearly into focus and is now achievable. This view will unlock capabilities to deliver precision interventions to learners, reduce hurdles around understanding learner context as they transition between institutions, and foster an ecosystem of solution provider innovation.

Finally, expanding the number of states with P-20W datasets that facilitate sharing of data at scale with practitioners, researchers, institutions, parents and teachers will start to answer the questions around what works in education.