The Echo Nest, a music intelligence platform powering smarter music apps across the web and devices, announces a major addition to the Million Song Dataset, a free resource of music data for the commercial and academic sectors that was released in conjunction with Columbia University and partially funded by the National Science Foundation.
The Million Song Dataset now includes fully anonymized music listening activity data from a sample of over 1.5 million fans, cross-referenced to the songs in the dataset. This fan activity data is represented through The Echo Nest's "Taste Profiles," which reflect listeners' real-time music activity. The new feature allows developers and researchers to build and test music recommendation algorithms using accurate acoustic and contextual metadata for one million songs. They can also see how anonymous listeners interacted with those songs in real-world scenarios.
The upshot: more efficient music information retrieval (MIR) research, easier development of commercial music services, and, ultimately, more powerful and accurate taste predictions in apps for fans and the music industry.
Accurate recommendations are crucial to media businesses. Until now, however, developers and researchers have had to build their own, much smaller datasets before they could test and improve their music recommendation technologies. The Million Song Dataset solves that problem for free, and it now includes anonymous real-world usage data in the form of Echo Nest Taste Profiles, something that is impossible to derive from private, homegrown datasets.
“The lack of quality music activity data available for research has needlessly hobbled recommendation research for a long time,” said Brian Whitman, CTO of The Echo Nest. “I’m excited to be able to contribute a selection of Echo Nest Taste Profile data to the Million Song Dataset for scientists and developers to learn from and build amazing music experiences.”
“The listener behavior described by the Taste Profiles provides a critical missing link for the Million Song Dataset,” said Dan Ellis, Professor of Electrical Engineering at Columbia University, who organizes the Million Song Dataset effort. “Now researchers can fully explore the relationships between aspects such as audio features, lyrics, tags, and the user preferences behind the taste profiles, all on a single, well-defined, common dataset. We hope this will be a research benchmark for years to come.”
To ensure that the usage data is completely anonymous and cannot be traced back to any individual, The Echo Nest is contributing only anonymous session IDs, and only activity involving songs that are in the Million Song Dataset. The session IDs in the Taste Profiles have also been randomly scrambled so that there is no link to the original user session.
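For readers who want a concrete picture of the two safeguards described above (keeping only activity on dataset songs, and scrambling session IDs), the following Python sketch shows one way they could be implemented. It is illustrative only: the function name `anonymize_activity`, the (session ID, song ID, play count) record layout, and the use of random 40-character hex tokens are assumptions, not a description of The Echo Nest's actual pipeline.

```python
import secrets

def anonymize_activity(events, msd_song_ids):
    """Hypothetical sketch: filter listening events to dataset songs and
    replace each session ID with a random token.

    events: iterable of (session_id, song_id, play_count) tuples (assumed layout).
    msd_song_ids: set of song IDs that appear in the Million Song Dataset.
    """
    token_for = {}  # original session ID -> random token; never published
    anonymized = []
    for session_id, song_id, play_count in events:
        if song_id not in msd_song_ids:
            continue  # drop activity that does not overlap with the dataset
        if session_id not in token_for:
            # Random token, not a hash, so it cannot be recomputed from the original ID.
            token_for[session_id] = secrets.token_hex(20)
        anonymized.append((token_for[session_id], song_id, play_count))
    # The mapping is discarded when the function returns, leaving no link back.
    return anonymized
```

Because the tokens are drawn at random rather than derived from the original IDs, the same listener's plays stay grouped together (which recommendation research needs), while discarding the mapping removes any path back to the original session.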
Developers, researchers, and other interested parties can visit labrosa.ee.columbia.edu/millionsong for directions on how to download and use the Million Song Dataset in their work.
About The Echo Nest:
The Echo Nest powers smarter music applications for leading media companies and thousands of independent developers, with a customer base that reaches over 100 million music fans every month across over 220 applications. With the world's only machine learning system that actively reads about and listens to music everywhere on the web, The Echo Nest opens up the largest repository of dynamic music data in the world (over 5 billion data points on over 30 million songs) to help developers reshape the experience of playing, and playing with, music. The Echo Nest was co-founded by two MIT PhDs. Investors include Matrix Partners, Commonwealth Capital Ventures, and three co-founders of MIT Media Lab.