Computer Scientist Publishes Manifesto for Expressive Algorithmic Music

A new five-year research project aims to understand how humans compute music.

Michael Byrne

Image: Goombung/Shutterstock

Gerhard Widmer occupies a peculiar place within computer science research. His focus of study is computational perception, a relatively unknown subfield located at the intersection of psychology, neuroscience, and machine learning. Broadly, it questions how far computing can advance without a deeper understanding of how the human brain perceives the world that is to be computed. We want computers to have perception, yet we know little about the workings of our own human perception.

In the current ACM Transactions on Intelligent Systems and Technology, Widmer offers a manifesto for Music Information Research (MIR), a vast new field that encompasses everything from music recommender systems to automated music recognition and transcription systems to original algorithmically generated music.

Widmer sees some crucial blind spots in MIR and a need for a refocusing of the field, arguing that his manifesto offers a path toward a "qualitative leap in musically intelligent systems." It's pretty interesting.

The manifesto (his term) starts with a list of problems—stuff that computers just can't do or can't do well enough:

(1) Distinguish between songs that I might find boring or interesting.

(2) Return (among other pieces) Beethoven's piano sonata op.81a (Les Adieux) when asked for a piece of classical music with a surprise at the beginning.

(3) Classify Tom Jobim's or João Gilberto's rendition of Garota de Ipanema as more relaxed and "flowing" than Frank Sinatra's.

(4) Play along with human musicians (e.g., accompany a soloist in a piano concerto) in a musically sympathetic way, recognising, anticipating, and adapting to the musicians' expressive way of playing (dramatic, lyrical, sober, etc.).

So, how do we actually solve these things? A key point made in the manifesto (among several) is that we need computers that are capable of more than just processing music as data and patterns. It's one thing to compute the structure of a piece of music, but another to compute a piece of music in terms of how music is perceived by humans. Music is a process of perception. "We need to remember that the ultimate place of music is in the human mind," Widmer writes.

The problem in understanding human perception of music in terms of computer science is that the models we currently have for that perception just aren't very computer science-y. They're imprecise, ambiguous, and often loaded with contradictions, Widmer notes. It doesn't have to stay that way, however.

The manifesto also makes the point that music perception and appreciation are both learned capabilities. We're not really hardwired for them. As such, they're targets for machine learning. "This is now the time for the MIR community to embark on massive feature/representation learning endeavours—much like the current trend in image analysis, which starts to produce quite spectacular results," Widmer writes. "Given the computational and data-related demands, the MIR community should join forces and pool its resources, efforts, and learned models (in cases where the training data cannot be shared)—and indeed, it has already begun to do so."

Which brings us to Widmer's ultimate point: a new five-year European Research Council-funded research endeavour called Con Espressione. Its goal is the characterization and recognition of the expressive aspects of music, particularly as realized through musical performance. There's a lot to this, from algorithmically extracting data from audio recordings and live performances to developing new models for describing musical perception and for generating expressive music.

A key part of that is an online "game" in which players are asked to listen to excerpts from five Mozart sonata renditions played by different pianists and then supply adjectives best describing the "character" of the performances. It's a start, anyway.