When engineers working on the recently released documentary “The Beatles: Get Back” contacted Paris Smaragdis, he agreed to help construct an AI-based system to clean up audio featuring the most famous rock ‘n’ roll band in history.
From the first time he sat down at a synthesizer, Illinois CS professor Paris Smaragdis knew that he wanted to learn more about how technology could make or alter music.
What followed was a career in academia centered on one question in artificial intelligence research: What does it mean to take a stream of sound and break it down into its individual components?
This key interest ran through his Master’s, PhD and postdoctoral research at the Massachusetts Institute of Technology. It fueled his academic career here and as a research scientist with Adobe Research. It led to his becoming an IEEE Fellow in 2015 and chair of the IEEE Audio and Acoustic Signal Processing Technical Committee. And it helped him produce widely published research and more than 40 patents.
Through it all, nothing he’s accomplished has been more “mind-bending” than the work he recently completed with a team of engineers to boost the audio quality of director Peter Jackson’s documentary “The Beatles: Get Back.”
“I remember growing up as a kid, listening through Beatles cassettes while I sat in the backyard. I began to understand that The Beatles weren’t just a big deal because of the music. They were light-years ahead of others because of the music production,” Smaragdis said. “Being in a position in which I could not only see how they produced their music, but also deconstruct it and undo the mixing, was definitely a mind-bending experience.”
This consultation began with a couple of phone calls.
Smaragdis heard from the engineering team working on the documentary, who asked if he could help clean up a treasure trove of old Beatles audio. The problem: the audio was incredibly messy.
The audio came mostly from a single microphone that picked up sounds from a room full of musicians, producers, friends and collaborators. The documentary’s team wanted to extract speech and music from these recordings so it could properly portray the development of an iconic body of music to a new audience.
“That first phone call started with them asking me if this could even be done. I told them that if they tried it 10 years ago, I would say there was no way it was possible. But we’ve had several great advancements over these years that, I believed, could make this possible,” Smaragdis said. “I will say, though, when I did hear some of the first sample recordings, I started freaking out a bit, because of the audio quality.”
Subsequent conversations quickly eased the professor’s mind, however.
Smaragdis said that the engineering team was up to date and well informed about how to advance the effort. That allowed the group to focus on a concept that took him out of his comfort zone a bit.
Rather than setting the bar at something passable for an academic paper, the group needed the resulting sound produced at movie-level quality. For an academic paper, Smaragdis said, you simply need quantitative evidence that your idea works; the sound itself usually doesn’t have to meet as high a bar.
“In this case we had to ensure that this sounded good to seasoned audio engineers working in the industry, which is a very high bar,” Smaragdis said. “Soon I realized this process was much more of an art than a science, which was good for me – it was satisfying work and took me down a different path.
“This wasn’t about finding one scientific truth; it was about finding that truth and then tweaking it until it became amazing to listen to.”
The entire process took about nine months to complete.
Most of this work leaned on the research developments Smaragdis noted took place over the last 10 years. Before then, computers still struggled to make sense of complex signals such as music, where multiple things are happening simultaneously. More recently, the data-oriented methods that Smaragdis works with have tapped into machine learning models, yielding more sophisticated systems that are now good enough to deliver seamless results.
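The core idea behind this kind of source separation can be illustrated with a technique Smaragdis helped popularize in his published research: non-negative matrix factorization (NMF) of a spectrogram, which decomposes a mixture into a few spectral patterns and their activations over time. The sketch below is a toy illustration on a synthetic spectrogram, not the documentary's actual system (whose details are not public); all sizes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "spectrogram": two sources, each a rank-1 pattern
# (spectral shape x activation over time), a toy stand-in for a
# real STFT magnitude of a mixed recording.
freq_bins, frames = 64, 100
w1 = np.zeros(freq_bins); w1[10] = 1.0          # source 1: energy at bin 10
w2 = np.zeros(freq_bins); w2[40] = 1.0          # source 2: energy at bin 40
h1 = np.sin(np.linspace(0, 6, frames)) + 1.1    # source 1 activation in time
h2 = np.cos(np.linspace(0, 6, frames)) + 1.1    # source 2 activation in time
V = np.outer(w1, h1) + np.outer(w2, h2)         # the observed mixture

# Rank-2 NMF via Lee-Seung multiplicative updates (Euclidean cost):
# V ≈ W @ H, with W holding spectral bases and H their activations.
K = 2
W = rng.random((freq_bins, K)) + 1e-3
H = rng.random((K, frames)) + 1e-3
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Each learned basis should concentrate on one source's frequency
# bin, i.e. the mixture has been "unmixed" into its components.
peaks = sorted(np.argmax(W, axis=0))
print([int(p) for p in peaks])
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")
```

In a real pipeline the separated magnitudes would be turned into masks and applied back to the mixture's STFT to resynthesize each source; modern systems replace the linear factorization with learned neural models, which is what made movie-quality results feasible.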
“This was a large team, and I only worked on a narrow aspect of a system called ‘Mal,’” Smaragdis said. “My focus was on how we could get all of the sounds we encountered separated and identifiable, so we could identify Paul (McCartney) speaking over here, or George (Harrison) speaking over there.”
In the end, the months-long process resulted in a clean sound worthy of the documentary and the monumental band.
Smaragdis, upon viewing the film, realized what the work meant.
“The thing that I found nice about it is the documentary proved that you could connect generations through The Beatles,” he said. “Of course, my parents were The Beatles generation, so they were psyched to see it. But I’ve also heard younger kids say that they loved the documentary, and that they’ve discussed it with people who were original fans of the band. That’s something I was excited about.”