Forsyth’s New Textbook Covers Foundational Big Data Concepts

2/18/2018 David Mercer, CS @ ILLINOIS

“Probability and Statistics for Computer Science” touches on big data and machine learning topics that haven't typically been covered in undergraduate classrooms.

Professor David A. Forsyth hopes that “Probability and Statistics for Computer Science,” the new textbook he’s authored, will soon be found on university campuses across the country.

The book touches on big data and machine learning topics that haven’t typically been seen in undergraduate classrooms, and Forsyth says the book and the concepts it introduces will help undergraduates navigate the rapidly changing world of computer science.

“Some of them are sort of elementary—basic probability, basic statistics. But it also covers some stuff that traditionally undergraduates haven’t seen and they need to see—classification, for example. Clustering is another example. Regression is another example,” said Forsyth, who is the Fulton Watson Copp Chair in Computer Science.

“Most computer science undergraduates don’t get shown that stuff, and that’s a mistake because once they leave the doors of their institution they’re going to hit it big time in the real world,” he said.

The book is the product of years of work and grew out of a decision in the Department of Computer Science several years ago to revise its statistics and probability curriculum to deal with the rise of big data and machine learning. That decision reflected changes that now demand a working knowledge of data, classification, regression, and a number of other areas, Forsyth said.

“You don’t have to be an expert in any of these areas, but if your reaction to something is, ‘Classification, what’s that?,’ you’re going to have problems,” he said. “It’s a big change and a fairly recent change. But everybody is out there classifying this or predicting that or whatever.”

Forsyth believes the textbook he wrote is vastly better than what he would have been able to author 20 years ago because of what he can borrow from others: real data now widely published on the Internet. That includes everything from life-and-death information from the Federal Emergency Management Agency to data published by an Australian pizza chain to support its claim that its pizzas are always bigger than those sold by Domino’s.

Made-up data used to be the standard for textbooks, but data from academics, government agencies and other sources gives students the chance to work through real-world problems and deal with the sometimes time-consuming headaches that come with them: questionable, dated or just plain flawed data.

“Can you trust the number? Is the number meaningful? So the exercise is quite valuable that way,” Forsyth said. “(And) it’s not like you’re going to spend half an hour with this, come up with the answer, and then move on with your life.”

Forsyth previously co-authored “Computer Vision: A Modern Approach,” and he has another textbook in the works that he hopes to finish later this year.

The process of writing a textbook is a long one. He estimates the first draft of “Probability and Statistics for Computer Science” was finished in 2012.

By the time he was finished, the book had been rejected by one publisher, reviewed by people from a number of institutions, and shaped by even more feedback from students, colleagues and others.

“I owe lots of favors to all sorts of people,” Forsyth said.

This story was published February 18, 2018.