Presentation What Shazam doesn't want you to know

Roy always wants to know how things work, to the smallest detail. This session will focus on music recognition like Shazam and SoudHound. Those "magic" programs that identify songs by listening to it. After this session you'll not only know how to implement this in Java. You'll also have learned how a microphone works, how the human ear works, how to capture and analyze sound in Java SE, what the Fourier Transformation does, and of course how those music recognition algorithms do their magic. Also, after publishing this information on my blog I've received a couple of patent infringement claims from Shazam's patent holders. Can you really be sued after a weekend of programming and releasing the source code?

Speakers


Slides

Wh a t Sh a za m d o e s n 't w a n t y o u t o k n o w

Wh a t Sh a za m d o e s n 't w a n t y o u t o k n o w Roy van Rijn Software Craftsman

I m a g i n e t h i s ...

I m a g i n e t h i s ...

Friday afternoon

Friday afternoon 3

Getting inspiration

Getting inspiration 4

Singing along...

Singing along... 5

Meet Stewie!

Meet Stewie! 6

M u s i c m at c h i n g

M u s i c m at c h i n g

Music matching

Music matching · Shazam is magic... alien technology! 1) Start the application. 2) Let it listen for +/- 20 seconds. 3) It tells you: 8

Sat u r d a y m o r n i n g

Sat u r d a y m o r n i n g

The microphone

The microphone · Specifying the audio format: private AudioFormat getFormat() { float sampleRate = 44100; int sampleSizeInBits = 8; int channels = 1; boolean signed = true; boolean bigEndian = true; return new AudioFormat( sampleRate, sampleSizeInBits, channels, signed, bigEndian); } 10

The microphone

The microphone · Accessing the microphone final AudioFormat format = getFormat(); DataLine.Info info = new DataLine.Info(TargetDataLine.class, format); final TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info); line.open(format); line.start(); 11

The microphone

The microphone · Reading the sound: out = new ByteArrayOutputStream(); running = true; try { while (running) { int count = line.read(buffer, 0, buffer.length); if (count > 0) { out.write(buffer, 0, count); } } out.close(); } catch (IOException e) { throw new RuntimeException(e); } 12

Inside the byte array

Inside the byte array 0 0 0 1 2 5 0 -1 -3 -4 -5 -2 0 1 2 0 2 (etc) 13

Plotting a graph

Plotting a graph 14

The human ear

The human ear · Membrane, cochlear, brain 15

Plotting a graph

Plotting a graph 14

The microphone

The microphone · Specifying the audio format: private AudioFormat getFormat() { float sampleRate = 44100; int sampleSizeInBits = 8; int channels = 1; boolean signed = true; boolean bigEndian = true; return new AudioFormat( sampleRate, sampleSizeInBits, channels, signed, bigEndian); } 10

The microphone

The microphone · Reading the sound: out = new ByteArrayOutputStream(); running = true; try { while (running) { int count = line.read(buffer, 0, buffer.length); if (count > 0) { out.write(buffer, 0, count); } } out.close(); } catch (IOException e) { throw new RuntimeException(e); } 12

Plotting a graph

Plotting a graph 14

Frequencies...?

Frequencies...? · Hertz · The amount of cycles per second · i.e. one sine wave. 16

Time vs Frequency

Time vs Frequency · To match music we need frequencies, not waves. 17

Fourier Transformation

Fourier Transformation 18

Fourier Transformation

Fourier Transformation 19

Fourier Transformation

Fourier Transformation 18

Fourier Transformation

Fourier Transformation 19

Fourier Transformation

Fourier Transformation 20

Fourier Transformation

Fourier Transformation 21

Fourier Transformation

Fourier Transformation 22

Fourier Transformation

Fourier Transformation · Excellent explanation by Stuart Riffle ­ http://altdevblogaday.com 23

Frequency domain

Frequency domain · We've lost track of time! 24

Windowing

Windowing · Solution: Apply transformation on pieces byte audio[] = out.toByteArray(); final int amountSlices = audio.length/SLICE_SIZE; Complex[][] results = new Complex[amountChucks][]; for(int slice = 0;slice < amountSlices; slice++) { Complex[] complex = new Complex[SLICE_SIZE]; for(int i = 0;iSpectrum Analyzer Spectrum Analyzer · From wikipedia: Spectum Analyzer A spectrum analyzer or spectral analyzer is a device used to examine the spectral composition of some electrical, acoustic, or optical waveform. 26

Windowing

Windowing · Solution: Apply transformation on pieces byte audio[] = out.toByteArray(); final int amountSlices = audio.length/SLICE_SIZE; Complex[][] results = new Complex[amountChucks][]; for(int slice = 0;slice < amountSlices; slice++) { Complex[] complex = new Complex[SLICE_SIZE]; for(int i = 0;iSpectrum Analyzer Spectrum Analyzer · From wikipedia: Spectum Analyzer A spectrum analyzer or spectral analyzer is a device used to examine the spectral composition of some electrical, acoustic, or optical waveform. 26

A p h ex Tw i n

De m o : A p h ex Tw i n

Demo

Su n d a y

Su n d a y

Matching the song

Matching the song · Determine the key points (in ranges): 34 41 39 41 40 42 40 47 40 53 40 48 39 45 40 42 40 41 38 42 92 117 106 117 81 109 89 84 81 113 129 130 129 121 129 132 135 125 121 131 186 218 191 217 208 260 247 251 232 245 (etc...) 29

Something to match against

Something to match against · Playing/decoding MP3 files: ­ JLayer (real time MP3 decoder) · jl1.0.1.jar ­ MP3SPI (Java plugin, based on JLayer) · mp3spi1.9.4.jar ­ Tritonus (implementation of Java Sound API) · tritonus_share.jar 30

Something to match against

Something to match against · Harvesting my music collection: public void harvest(File rootDirectory) { String[] itemsInDirectory = rootDirectory.list(); for(String itemInDirectory:itemsInDirectory) { if(itemInDirectory.endsWith(".mp3")) { //Assume mp3 file File mp3File = new File(mp3Directory, itemInDirectory); captureAudio(mp3File); } else if(new File(mp3Directory, itemInDirectory).isDirectory()) { //Directory? Recurse! harvest(new File(mp3Directory, itemInDirectory)); } } } 31

What we have now

What we have now · We have: ­ Set of +/- 3000 files of reference data (songs) ­ Way of capturing key moments with microphone · Lets do some matching! 32

Hash function

Hash function · Create a single hash per slice private static final int FUZ_FACTOR = 2; private long hash(String line) { String[] p = line.split("\t"); long p1 = Long.parseLong(p[0]); long p2 = Long.parseLong(p[1]); long p3 = Long.parseLong(p[2]); long p4 = Long.parseLong(p[3]); // long p5 = Long.parseLong(p[5]); // Not using the fifth point currently return (p4-(p4%FUZ_FACTOR)) * 100000000 + (p3-(p3%FUZ_FACTOR)) * 100000 + (p2-(p2%FUZ_FACTOR)) * 100 + (p1-(p1%FUZ_FACTOR)); } 33

Matching algorithm #1

Matching algorithm #1 1) 2) 3) 4) Load all the reference hashes Listen to the microphone and generate hashes Find all matching hashes Return the reference-song with most hits · This worked (a bit) but produced a lot of mis-hits · How can we improve this? 34

Matching algorithm #2

Matching algorithm #2 · Microphone, sample #1, matches with: · Song 4, sample #4 · Song 6, sample #9 · Microphone, sample #2, matches with: · Song 4, sample #6 · Song 6, sample #10 · Song 8, sample #4 · Microphone, sample #3, matches with: · Song 4, sample #5 · Song 5, sample #3 35

Matching algorithm #2

Matching algorithm #2 · Microphone, sample #1, matches with: · Song 4, sample #4: · Song 6, sample #9 4­1=3 9­1=8 · Microphone, sample #2, matches with: · Song 4, sample #6 · Song 6, sample #10 · Song 8, sample #4 6­2=4 10 ­ 2 = 8 4­2=2 · Microphone, sample #3, matches with: · Song 4, sample #5 · Song 5, sample #2 5­3=2 2 ­ 3 = -1 36

Matching algorithm #2

Matching algorithm #2 · Now we group the results: 2x: 1x: 1x: 1x: 1x: 1x: Song 6 with offset 8 Song 4 with offset 2 Song 4 with offset 3 Song 4 with offset 4 Song 8 with offset 2 Song 5 with offset -1 37

Recap

Recap · Music matching (Shazam, SoundHound) isn't magic · Searching can be done very fast: ­ Searching for hashes: O(1) ­ Align and match · The Fourier Transformation is like a spirograph: 38

Demo

Other uses for this algorithm

Other uses for this algorithm · What are other uses for this algorithm: ­ Speech recognition? · Probably not.. ­ Detecting duplicate songs in your music collection? · Yes! Took 5 minutes for crude implementation ­ Subtitle synchronisation in India ! 40

Landmark Digital Services

Landmark Digital Services ... Landmark Digital Services owns the patents that cover the algorithm used as the basis for your recently posted "Creating Shazam In Java". While it is not Landmark's intention to alienate those in the Open Source and Music Information Retrieval community, Landmark must request that you do not ship, deploy or post the code presented in your post. Landmark also requests that in the future you do not ship, deploy or post any portions or versions of this code in its current state or in any modified state. We hope you understand our position and that we would be legally remiss not to make this request. We appreciate your immediate attention and response. ... 41

Getting information

Getting information · After this email I contacted: ­ Arnoud Engelfriet (Dutch IT lawyer, patent attorney) ­ Free Software Foundation ­ And others. 42

Now the blogpost?

Now the blogpost? · From another email: As I'm sure you are aware, your blogpost may be viewed internationally. As a result, you may contribute to someone infringing our patents in any part of the world. While we trust your good intentions, yes, we would like you to refrain from releasing the code at all and to remove the blogpost explaining the algorithm. 43

No way...

No way... · My reply was short and concise: I'm sorry, I can't comply. The blogpost will absolutely not be removed. Good luck. 44

Qu e s t i o n s ?

Qu e s t i o n s ?