Experiments become harder to interpret when experimental units influence one another in ways we cannot observe. In an infectious-disease study, treating one patient can reduce exposure for nearby patients through lower transmission. On social platforms, showing a message to a subset of users changes what others see because reactions, shares, and ranking algorithms shift the content that spreads. These indirect effects can blur the link between who was treated and how outcomes change. The central question is how to analyze the results of an experiment in such settings. In this context, I study how the effects of an intervention spread through a population over time and what these spreading patterns reveal about the hidden structure underlying them. The goal is to help researchers and organizations run experiments that stay accurate even when real-world interactions make the data messy.
Two units observed over three periods illustrate how an intervention can transmit across time and between individuals through multiple interference pathways. Source: https://www.pnas.org/doi/10.1073/pnas.2322232121
Suppose I tell my friend Alice a piece of information. I never see her talk to anyone else, but the next day Bob also knows it. Even without observing their conversation, the shift in what Bob knows tells me something passed between them. In many experiments, outcomes move in the same way. When an intervention touches some individuals, its effects appear gradually in others, and the patterns of these changes trace the hidden pathways through which influence travels.
A simple social network with two influencers. This figure illustrates how randomized treatment helps reveal an unobserved network structure. Randomizing the intervention activates a representative sample of the links through which influence can travel. The resulting changes in outcomes reflect these activated pathways, providing indirect evidence of how the network shapes the evolution of outcomes over time. Source: https://arxiv.org/pdf/2511.21675
Causal Message Passing introduces the core idea behind this approach. The paper shows that if we watch how outcomes shift from one time step to the next, we can recover evolution rules that capture how influence moves through a network. Even when the network is never observed, the pattern of outcome changes carries enough structure to reveal how treatment effects propagate. This insight allows us to reconstruct what would have happened under different assignments by replaying the evolution with alternative treatments [1].
Estimation strategy. Experimental data is used to estimate evolution mappings f through supervised learning, which are then applied recursively to generate desired counterfactuals. Treatment allocations w and w′ share identical initial columns, serving as initialization for our recursive approach. Source: https://arxiv.org/pdf/2502.01106
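To make the two-step recipe concrete, here is a minimal sketch, assuming (purely for illustration) that the population state can be summarized by the mean outcome and the treated fraction; the toy data-generating process, the linear model, and the all-treated allocation are stand-ins rather than the papers' actual method:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy experiment: mean outcome mu_t and treated fraction pi_t over T
# periods, standing in for the population summaries that an evolution
# mapping f would consume.
T = 20
pi = rng.uniform(0.2, 0.8, size=T)             # observed treatment intensities
mu = np.zeros(T)
for t in range(T - 1):
    mu[t + 1] = 0.7 * mu[t] + 0.5 * pi[t] + rng.normal(0, 0.01)

# Step 1: learn the one-step evolution mapping f: (mu_t, pi_t) -> mu_{t+1}
# by supervised learning on observed transitions.
X = np.column_stack([mu[:-1], pi[:-1]])
y = mu[1:]
f = LinearRegression().fit(X, y)

# Step 2: replay the evolution recursively under an alternative
# allocation w', starting from the same initial state.
mu_cf = np.zeros(T)
mu_cf[0] = mu[0]                               # shared initialization
pi_cf = np.ones(T)                             # counterfactual: everyone treated
for t in range(T - 1):
    mu_cf[t + 1] = f.predict([[mu_cf[t], pi_cf[t]]])[0]

print(f"observed final mean outcome:       {mu[-1]:.3f}")
print(f"counterfactual (all-treated) mean: {mu_cf[-1]:.3f}")
```

The structural point is that the same fitted mapping is reused at every step: a counterfactual trajectory needs only the shared initial state and the alternative allocation w′.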
We also study how to turn this idea into a practical tool. Specifically, we show how to learn the evolution rules from data and how to validate the estimation model. To this end, we introduce a bootstrap-style resampling method that creates multiple distinct samples from a single experiment while preserving the underlying evolution mechanisms. These samples then feed into a tailored cross-validation scheme that selects the most reliable model for the causal question at hand.
Our resampling approach builds on a key finding: groups with different treatment histories follow the same evolution rules even though their outcome levels differ. For example, always-treated and never-treated individuals may have very different outcomes, yet their outcome distributions evolve in parallel. By grouping units based on treatment history, we can construct many credible pseudo-samples from one experiment, which leads to more precise estimation [2].
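A stylized sketch of the grouping idea, not the paper's exact resampling algorithm: units are bucketed by their full treatment history, and each pseudo-sample resamples units within buckets, so the mix of histories, and hence the evolution mechanisms, is preserved while the units representing them vary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy panel: n units, T periods, binary treatments and outcomes.
n, T = 200, 4
W = rng.integers(0, 2, size=(n, T))            # treatment assignments
Y = W.cumsum(axis=1) * 0.3 + rng.normal(0, 1, size=(n, T))

# Group units by their full treatment history, e.g. (1, 0, 1, 1).
histories = {}
for i in range(n):
    histories.setdefault(tuple(W[i]), []).append(i)

def pseudo_sample(rng):
    """Resample units within each treatment-history group, so every
    pseudo-sample keeps the same mix of histories as the experiment."""
    idx = np.concatenate([
        rng.choice(members, size=len(members), replace=True)
        for members in histories.values()
    ])
    return Y[idx], W[idx]

# Many pseudo-samples from a single experiment.
samples = [pseudo_sample(rng) for _ in range(100)]
print(f"{len(histories)} history groups, {len(samples)} pseudo-samples")
```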
Counterfactual Cross-Validation. Time horizon is partitioned into blocks for leave-one-out validation. Models are trained on the remaining blocks and evaluated via MSE to select optimal configurations. Source: https://arxiv.org/pdf/2502.01106
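The scheme in the figure can be sketched as follows; the two candidate models and the five-block split are illustrative choices, not the paper's configuration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)

# Toy transitions: features (mu_t, pi_t) and targets mu_{t+1} over T periods.
T = 60
X = rng.normal(size=(T, 2))
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 0.05, size=T)

# Candidate evolution models (illustrative choices).
candidates = {
    "linear": LinearRegression(),
    "quadratic": make_pipeline(PolynomialFeatures(2), LinearRegression()),
}

# Leave-one-block-out over contiguous time blocks, scored by MSE.
blocks = np.array_split(np.arange(T), 5)
scores = {}
for name, model in candidates.items():
    errs = []
    for held_out in blocks:
        train = np.setdiff1d(np.arange(T), held_out)
        model.fit(X[train], y[train])
        errs.append(np.mean((model.predict(X[held_out]) - y[held_out]) ** 2))
    scores[name] = np.mean(errs)

best = min(scores, key=scores.get)
print({k: round(v, 4) for k, v in scores.items()}, "-> selected:", best)
```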
The follow-up work on evolution-based models [3] explains why and when these methods work. It takes an axiomatic view, showing when different treatment scenarios can be represented by the same underlying mechanism and what features of the system make this possible. The paper also identifies cases where this assumption breaks down, clarifying the limits of evolution-based estimation and highlighting the structural conditions required for these tools to remain valid.
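One informal way to state the representability condition at stake, as a paraphrase rather than the paper's actual axioms: a single evolution mapping f must govern the outcome summaries under every treatment allocation w,

```latex
% Paraphrase of the representability condition (not the paper's axioms):
% one mapping f must serve all treatment allocations w.
\[
  \mu^{(w)}_{t+1} \;=\; f\bigl(\mu^{(w)}_{t},\, w_{t}\bigr)
  \qquad \text{for every allocation } w \text{ and period } t,
\]
```

where μ_t^(w) summarizes the outcome distribution at time t under allocation w. Evolution-based estimation is reliable exactly when such a shared f exists, and the paper maps out when it does and does not.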
Real experiments rarely reveal ground-truth counterfactuals, which makes it difficult to know when a method is working and when it is not. We address this by building a benchmark toolbox (CausalMP) with several realistic simulators that reflect different types of interference:

- The NYC Taxi Routes simulator uses real trip data to create an urban network, where strong daily and weekly cycles create challenging time trends for causal estimation.
- The Belief Adoption Model simulates opinion diffusion through coordination incentives, where interactions and demographics shape how beliefs spread in a heterogeneous way.
- The Exercise Encouragement simulator focuses on digital health interventions, combining peer influence with weekly behavioral cycles.
- The Data Center simulator models a server farm where routing rules create implicit interference across machines.
- The Auction Model captures strategic spillovers in competitive markets, where promotional treatments on one object shift bidding behavior on others.

These environments, all publicly available, provide fully controlled settings with known counterfactuals, allowing rigorous evaluation of evolution-based estimators and other methods [2].
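Because the simulators expose ground truth, evaluation can follow a simple pattern; the sketch below is generic and does not reflect the CausalMP API. We run a toy environment under both counterfactual allocations to get the true total treatment effect, then score an estimator against it (here, a deliberately naive difference in means):

```python
import numpy as np

def simulate(treated_frac, n=500, T=10, spillover=2.0, seed=0):
    """Toy stand-in for a benchmark environment with interference:
    each unit's outcome depends on its own treatment and on the
    population treated fraction (the spillover channel)."""
    rng = np.random.default_rng(seed)
    w = (rng.random(n) < treated_frac).astype(float)
    y = np.zeros(n)
    for _ in range(T):
        y = 0.5 * y + w + spillover * w.mean() + rng.normal(0, 0.1, n)
    return y, w

# Ground-truth total treatment effect: all treated vs. none treated,
# from two full counterfactual runs (only possible in a simulator).
y_all, _ = simulate(1.0)
y_none, _ = simulate(0.0)
tte_true = y_all.mean() - y_none.mean()

# A naive difference-in-means estimate from a 50/50 experiment; it
# ignores the spillover channel and is biased under interference.
y, w = simulate(0.5)
tte_naive = y[w == 1].mean() - y[w == 0].mean()

print(f"true TTE: {tte_true:.3f}   naive estimate: {tte_naive:.3f}")
```

The naive estimate misses the spillover channel entirely, which is exactly the kind of failure these benchmarks are built to expose.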
Within this collection, AI-driven simulations play a special role. They create rich behavior without hand-coding every detail and allow us to test methods in settings that mimic real platforms.
The LLM-based Social Network simulator uses AI agents connected through synthetic social networks to model how content spreads on a platform. Agents receive posts, react to what they see, and influence one another through their engagement patterns. The experiment compares a control setting, where each user sees a random ordering of posts, with a treatment setting that weights posts by friend engagement. This design creates clear interference pathways, making it an environment for testing whether causal effect estimators can recover the effects of a ranking algorithm when influence flows through user interactions [2].
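The two arms can be pictured with a toy ranking rule; the post IDs and engagement scores below are invented for illustration and are not the simulator's code:

```python
import random

# Toy feed of (post_id, friend_engagement_score) pairs; scores are
# illustrative stand-ins for the engagement signals agents emit.
feed = [("p1", 0.1), ("p2", 0.9), ("p3", 0.4), ("p4", 0.7)]

def control_ranking(feed, rng):
    """Control arm: each user sees a random ordering of posts."""
    shuffled = feed[:]
    rng.shuffle(shuffled)
    return [post for post, _ in shuffled]

def treatment_ranking(feed):
    """Treatment arm: posts weighted by friend engagement, so one
    user's reactions change what friends see next (interference)."""
    return [post for post, score in sorted(feed, key=lambda x: -x[1])]

rng = random.Random(0)
print("control:  ", control_ranking(feed, rng))
print("treatment:", treatment_ranking(feed))
```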
The LLM-based election simulator models a synthetic population built from Census demographics and a Twitter social network. Each node is an LLM agent with its own behavioral profile that generates posts, reacts to content, and updates its voting intentions based on the flow of information around it. The experiment mirrors a large-scale mobilization study (the 61-million-user Facebook voting experiment): users receive either an informational message or a socially reinforced message, and their reactions shape what others encounter through replies, mentions, and repost chains. These interaction patterns create strong and uneven interference, where the influence of a single message can echo across the network as conversations evolve over time [4].
Overview of the LLM Social-Political Mobilization (LLM-SocioPol) simulator. U.S. Census and Twitter data are first used to construct a realistic population and social graph. Each agent’s profile is then enhanced with demographic attributes and political-stance scores before being assigned an LLM model (GPT-4.1, GPT-4.1-Mini, or GPT-4.1-Nano) reflecting individual sophistication. Within the simulation environment, agents manage their follow relationships, engage with and create posts, process social-influence cues, and continuously update their voting intentions. The example panel shows a representative user profile, a treated feed containing a social-message prompt, and the 0–4 voting-likelihood scale for outcome measurement. Source: https://arxiv.org/pdf/2510.26494
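Schematically, one simulation step can be caricatured as the loop below, with the LLM call stubbed out; the real prompts, models, and network are described in [4], and every name here is illustrative:

```python
import random

rng = random.Random(0)

def llm_react(profile, feed):
    """Stub for the LLM call that produces an updated voting intention
    on the 0-4 scale; the real simulator prompts GPT-4.1-class models."""
    nudge = 0.1 * sum(1 for post in feed if "vote" in post)
    return min(4, max(0, profile["intention"] + nudge))

# Toy population: each agent has a voting intention and followees.
agents = {
    i: {"intention": rng.randint(0, 4), "follows": rng.sample(range(10), 3)}
    for i in range(10)
}
posts = {i: [] for i in agents}

# Treated agents receive a social mobilization message in their feed.
treated = set(rng.sample(sorted(agents), 5))

for _ in range(3):                             # simulation steps
    for i, profile in agents.items():
        feed = [p for j in profile["follows"] for p in posts[j]]
        if i in treated:
            feed.append("social message: your friends plan to vote")
        profile["intention"] = llm_react(profile, feed)
        if profile["intention"] >= 3:          # engaged agents repost,
            posts[i].append("I plan to vote!")  # spreading the influence

print({i: a["intention"] for i, a in agents.items()})
```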
[1] S. Shirani and M. Bayati, “Causal Message Passing for Experiments with Unknown and General Network Interference,” Proceedings of the National Academy of Sciences, vol. 121, no. 40, p. e2322232121, 2024.
[2] S. Shirani, Y. Luo, W. Overman, R. Xiong, and M. Bayati, “Can We Validate Counterfactual Estimations in the Presence of General Network Interference?” Working paper, 2025.
[3] S. Shirani and M. Bayati, “On Evolution-Based Models for Experimentation Under Interference,” Working paper, 2025.
[4] S. Shirani and M. Bayati, “Simulating and Experimenting with Social Media Mobilization Using LLM Agents,” Working paper, 2025.