Misailovic Wins NSF CAREER Award to Develop Tools to Improve Applications That Rely on Noisy Data
A large and growing number of computing applications rely on mountains of inherently imperfect data to make critical decisions. Assistant Professor Sasa Misailovic wants to create tools that help programmers debug software operating in this messy environment, and has won National Science Foundation backing to do it.
Misailovic has won an NSF CAREER Award for his proposal to investigate whether static program analysis can serve as the foundation for those tools, and to develop coursework that trains students in the probabilistic and statistical ways of thinking that can improve the reliability of their applications.
“How do we help software developers in a world of uncertainty? The work that I am doing is mainly connecting the theory – so building the theoretical framework for reasoning about it – and building practical tools that can help programmers understand where noise comes from and how that affects computations,” Misailovic said. “We’re looking for ways to bring debugging into this world where there are no strict, single ideal values for variables in your programs.”
The five-year award will provide Misailovic with $500,000.
The data Misailovic wants to help developers handle more effectively is known as noisy data – data in which what is useful or needed is obscured by extraneous or corrupt data. It is ubiquitous, a product of imperfections in the sensors gathering data in many applications, the models used in computation, and even in the computations themselves.
“We either have to ask program developers to think about every aspect of handling those imprecisions, or we can go and try to think, ‘How can we help them to introduce principled programming techniques for handling noise?’” Misailovic said.
The proposal accepted by the NSF grew out of two projects led by his students:
- ProbFuzz, a framework for systematically testing probabilistic programming systems, which automate parts of the Bayesian inference tasks used in machine learning, computer vision and statistics. Its development was led by PhD student Saikat Dutta.
- PSense, a system for evaluating the sensitivity of probabilistic programs. Zixin Huang, now a PhD student, led its development as an undergraduate.
Probabilistic programming languages aim to accelerate the use of efficient Bayesian inference – a means of updating the probability of a particular outcome as more information becomes available – by providing an intuitive framework for software developers. These languages represent probabilistic models as programs that assign probability distributions to variables in the code and condition the model on observed data.
“So imagine now that you’re writing a program that has probability distributions in them (and) they have some values in them for initial beliefs,” Misailovic said. “You change something about the assumptions, and you want to see how much that affects the results of that program.”
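The idea can be sketched in a few lines of plain Python. This is an illustrative toy, not code from ProbFuzz or PSense: a coin-flip program whose bias is itself a random variable drawn from a prior, conditioned on observed flips via simple rejection sampling.

```python
# A minimal sketch of a probabilistic program: the coin's bias is a random
# variable with a prior distribution, and the model is conditioned on data.
# The function name and rejection-sampling approach are illustrative only.
import random

def coin_model(observations, num_samples=100_000):
    """Approximate the posterior over a coin's bias given observed flips,
    by accepting a sampled bias only if it reproduces the observations."""
    accepted = []
    for _ in range(num_samples):
        bias = random.random()                          # prior: Uniform(0, 1)
        flips = [random.random() < bias for _ in observations]
        if flips == observations:                       # condition on the data
            accepted.append(bias)
    return accepted

random.seed(0)
posterior = coin_model([True, True, True, False])
# With a uniform prior and 3 heads in 4 flips, the exact posterior is
# Beta(4, 2), whose mean is 4/6 ≈ 0.667; the sample mean should be close.
print(sum(posterior) / len(posterior))
```

Changing the prior ("initial beliefs") or the observed flips changes the posterior the program computes, which is exactly the kind of dependence Misailovic's analyses aim to quantify.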
During the project, he plans to evaluate two types of analyses for probabilistic programs. Sensitivity analysis examines how changes in an independent variable affect a dependent variable or output. Semantic differencing computes how structural changes to the code affect the results of probabilistic inference.
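Sensitivity analysis can be illustrated with a standard Beta-Binomial coin model, where the posterior has a closed form. The sketch below, with hypothetical helper names, perturbs a prior hyperparameter and measures how much the inference result moves: with little data the prior assumption matters a lot, and with lots of data it barely matters.

```python
# A sketch of sensitivity analysis for a probabilistic program: perturb a
# prior hyperparameter and measure the change in the inferred result.
# The Beta-Binomial conjugate update is standard; function names are illustrative.

def posterior_mean(alpha, beta, heads, tails):
    """Posterior mean of a coin's bias under a Beta(alpha, beta) prior
    after observing `heads` heads and `tails` tails (conjugate update)."""
    return (alpha + heads) / (alpha + beta + heads + tails)

def sensitivity(alpha, beta, heads, tails, eps=1e-4):
    """Central finite-difference estimate of d(posterior mean)/d(alpha):
    how strongly the result depends on the prior assumption."""
    hi = posterior_mean(alpha + eps, beta, heads, tails)
    lo = posterior_mean(alpha - eps, beta, heads, tails)
    return (hi - lo) / (2 * eps)

# Few observations: the result is sensitive to the prior.
print(sensitivity(1, 1, heads=3, tails=1))
# Many observations: the data swamps the prior, and sensitivity shrinks.
print(sensitivity(1, 1, heads=3000, tails=1000))
```

Semantic differencing asks a related question at the level of code edits rather than parameter nudges: given two versions of `posterior_mean` (or of the model itself), how do the inference results differ?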
Misailovic wants to develop a range of techniques that use these types of analysis to find errors in probabilistic programming systems and improve the performance of applications that rely on noisy data.
Beyond that, he plans to develop a course that trains computer scientists to think more statistically, so they can improve the development of applications that operate in the presence of noise and uncertainty.