
Sony’s AI drums up beats for songs



AI might soon become an invaluable tool in musicians’ compositional arsenals, if recent developments are any indication. In July, Montreal-based startup Landr raised $26 million for a product that analyzes musical styles to create bespoke sets of audio processors, while OpenAI and Google earlier this year debuted online creation tools that tap music-generating algorithms.

Inspired by this and other recent work, researchers at Sony investigated a machine learning model for conditional kick-drum track generation. Given an existing song and a low-dimensional code that encodes the relationship between that song and the to-be-generated material, the AI creates a variety of “musically plausible” kick-drum patterns. The codes can also be carried from one song to another irrespective of differences in tempo and time shift (i.e., changes in speed or in position in time).

“We propose a model architecture … that encodes rhythmic interactions of the kick drum versus bass and snare patterns. Each mapping code captures local relations between kick vs bass and snare inputs, such that an entire track is associated to a sequence of mapping codes,” explained the coauthors. “Rather than controlling the characteristics of the generated material directly, it offers control over how the generated material relates to the conditioning material.”

To train the AI system, the researchers compiled a data set of 665 pop, rock, and electronic songs in which the rhythm instruments (bass, kick, and snare) were available as separate 44.1kHz audio tracks. (Contextual signals consisted of two input maps for beat and downbeat probabilities as well as maps for the onset functions of snare and bass.) Next, they rendered an audio file of the kick drum by placing a drum sample on every amplitude peak that remained after thresholding. They introduced dynamics by scaling the volume of the sample from 70% for peaks at the threshold to 100% for peaks with the maximum value.
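The rendering step described above can be illustrated with a small sketch. It is not the researchers' code: the function names (`peak_gains`, `render_kick`), the simple local-maximum peak test, and the linear interpolation between 70% and 100% are all assumptions made for illustration, and the input is a toy list of numbers rather than real audio.

```python
def peak_gains(envelope, threshold, min_gain=0.7, max_gain=1.0):
    """Return (index, gain) pairs for local peaks at or above the threshold.

    Assumption: gain grows linearly from min_gain (peak at the threshold)
    to max_gain (peak equal to the largest value in the envelope).
    """
    top = max(envelope)
    peaks = []
    for i in range(1, len(envelope) - 1):
        value = envelope[i]
        is_local_peak = value > envelope[i - 1] and value >= envelope[i + 1]
        if not is_local_peak or value < threshold:
            continue
        # Position of this peak between the threshold (0.0) and the maximum (1.0).
        frac = (value - threshold) / (top - threshold) if top > threshold else 1.0
        peaks.append((i, min_gain + frac * (max_gain - min_gain)))
    return peaks


def render_kick(envelope, threshold, sample):
    """Place a scaled copy of `sample` at every surviving peak position."""
    out = [0.0] * (len(envelope) + len(sample))
    for index, gain in peak_gains(envelope, threshold):
        for k, s in enumerate(sample):
            out[index + k] += gain * s
    return out


# Toy example: three peaks, one of which falls below the threshold.
envelope = [0.0, 0.5, 0.0, 1.0, 0.0, 0.2, 0.0]
kick_sample = [1.0, 0.5]  # hypothetical two-value "drum sample"
track = render_kick(envelope, threshold=0.5, sample=kick_sample)
```

In this toy input, the peak sitting exactly at the threshold is rendered at 70% volume, the largest peak at 100%, and the peak below the threshold is dropped.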




In a series of experiments, they used the AI system both to conditionally generate drum patterns and to transfer style, that is, to apply rhythm patterns inferred from one song to induce similar patterns in another. Additionally, they created time-stretched versions of songs at 80%, 90%, 110%, and 120% of the original tempo and determined a mapping code for each.
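To make the tempo percentages concrete, here is a minimal sketch of what time-stretching does to event positions. It assumes drum hits are represented as onset times in seconds, which is a simplification chosen for illustration (the article describes stretching the songs themselves); `stretch_onsets` is a hypothetical helper, not part of the researchers' system.

```python
def stretch_onsets(onset_times, tempo_factor):
    """Rescale onset times (seconds) for playback at tempo_factor x original tempo.

    A factor of 0.8 (80% tempo) slows the song down, so events move later;
    a factor of 1.2 (120% tempo) speeds it up, so events move earlier.
    """
    if tempo_factor <= 0:
        raise ValueError("tempo_factor must be positive")
    return [t / tempo_factor for t in onset_times]


onsets = [0.0, 0.5, 1.0, 1.5]  # hypothetical kick onsets at 120 BPM
versions = {factor: stretch_onsets(onsets, factor) for factor in (0.8, 0.9, 1.1, 1.2)}
```

The pattern itself is unchanged across the four versions; only the spacing between hits differs, which is the kind of variation a tempo-invariant mapping code is meant to ignore.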

Here’s one song pre-processing (Gypsy Love):

And here’s that same song with AI-generated drum patterns:

The team notes that the reconstructions aren’t perfect in part due to the model’s “invariance,” but they point out that the accuracy for the validation set was similar to that for the training set.

“We have shown that the mapping codes are largely tempo and time-invariant and that musically plausible kick drum tracks can be generated given a snare and bass track either by sampling a mapping code or through style transfer, by inferring the mapping code from another song,” wrote the coauthors, who leave to future work applying the same approach to snare drum and bass track generation.