“I believe that the ultimate goal of computer science research isn't just about devising faster algorithms or more efficient computational methods. It finds its purpose when we design computer systems and user interfaces to empower individuals.” That is how CS Professor Yongjoo Park described the philosophy motivating his research.
Park said, “While refining computational efficiency is integral, the overarching aim of this project is to simplify complex causal reasoning for the user. That is, ensuring ‘effortless’ access to causal data exploration allows individuals to conceptualize problems in a more abstract way, freeing them from the intricacies of lower-level probabilistic reasoning and challenges in scaling the computation to large volumes of data.”
The abstract for “CARE: Interactive Systems for Scalable, Causal Data Science” states:
“Advances in Machine Learning, coupled with advances in scalable data processing, have resulted in highly accurate predictions of quantities of interest. Yet, despite the advances in ML and data systems, we cannot easily answer questions related to causal inference in observational data settings, of interest to business, academia, and the public at large. For example, a business may ask: did low salary cause high attrition? (causal inference); what would have been the effect on sales last year had we increased advertising expenditure targeted at women? (a counterfactual inference); an academic may ask: did improved educational attainment cause wage increase? (a causal question); a member of the public might ask: did lack of exercise cause my gain in weight? (a causal question).”
Sundaram added:
“The entire data science pipeline, from the way data is organized, analyzed, and visualized, implicitly focuses on association between variables. Remarkably, in many cases, causal inference is possible with observational data without conducting randomized experiments. However, the data science infrastructure to do this effortlessly (i.e., with large datasets, interactively) does not exist despite advances in our theoretical understanding of how to accomplish this. This is exactly why this grant can have a huge impact on society. If we can enable citizens to infer causation as easily as they can infer association between variables today, it will lead to better-informed decisions.”
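The gap between association and causation that Sundaram describes can be shown with a small worked example. The Python sketch below is illustrative only and is not the CARE system's method: the counts are hypothetical, the variable names (`x` for a treatment, `y` for an outcome, `z` for a confounder) are assumptions made for this example, and the adjustment is valid only under the assumption that `z` is the sole common cause of `x` and `y`. It compares the plain association P(y | x) with the backdoor-adjusted estimate of P(y | do(x)).

```python
# Hypothetical counts: (z, x) -> (number of units, number with y = 1).
# z is assumed to be the only confounder, influencing both x and y.
COUNTS = {
    (0, 1): (100, 90),
    (0, 0): (400, 320),
    (1, 1): (400, 200),
    (1, 0): (100, 40),
}


def naive_rate(x):
    """P(y=1 | x): the plain association, ignoring z."""
    n = sum(c[0] for (_, xi), c in COUNTS.items() if xi == x)
    k = sum(c[1] for (_, xi), c in COUNTS.items() if xi == x)
    return k / n


def adjusted_rate(x):
    """P(y=1 | do(x)) by backdoor adjustment: sum over z of P(y=1 | x, z) * P(z)."""
    total = sum(n for n, _ in COUNTS.values())
    rate = 0.0
    for z in {z for z, _ in COUNTS}:
        # Marginal probability of this stratum of the confounder.
        p_z = sum(c[0] for (zi, _), c in COUNTS.items() if zi == z) / total
        n, k = COUNTS[(z, x)]
        rate += (k / n) * p_z
    return rate


naive_diff = naive_rate(1) - naive_rate(0)
adjusted_diff = adjusted_rate(1) - adjusted_rate(0)
print(f"association:  {naive_diff:+.2f}")
print(f"intervention: {adjusted_diff:+.2f}")
```

With these made-up counts, the association suggests that `x` lowers the outcome rate, while the adjusted estimate suggests it raises it. A tool that reports only associations could therefore point a user toward the wrong decision.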
Park and Sundaram aim to build a “scalable, CAusal-RElational (CARE) data system for end-to-end causal data exploration based on a core insight that for effective causal exploration, the system must be designed to let users experience causality by allowing explicit, real-time interventions with causal data modeling, do-calculus querying, and intervention-centric visualization.”
Park noted that construction will be followed by “user evaluation to quantify how effectively users can explore data, understand causal relationships, and test counterfactual scenarios. We anticipate that existing tools may be good enough for understanding associations, but they may fall short in reasoning what-if scenarios, often misleading users. For our tests, we will collect various causal exploration tasks from online and use small and large datasets.”