Spectrogram Analysis of KLIK Music

A volunteer of ours is doing a massive do-over of the KLIK music library (one of the many tasks that people don’t think would ever need to be done), and at the same time we’ve decided it would be a good time to get rid of songs in the library that aren’t “high quality.”

What exactly defines “high quality”?  While this term is usually subjective, depending on how much of an audio snob you are, there are some objective attributes that a “high quality” file will have.  Some may say “a higher bitrate file will naturally sound better, so there’s your objective basis.”  This is not always true, though, as the audio may have sounded like garbage to begin with and then been encoded at 256kbps.  For a concrete example, download this MP3 (deliberately encoded at 320kbps to try to minimize loss), and take a look at its spectrogram:

As you can see, there is a significant difference between the audio before the 30-second mark and the audio after it.  Both halves are the same 30-second clip of a song, but each is from a different source.  The first 30 seconds was provided by a volunteer, in a file that was encoded at 192kbps MP3, and the second 30 seconds was provided by iTunes.

The version on the left is missing virtually all frequencies above 10,000 hertz.  This causes the song to sound “dull.”  You can hear it pretty clearly in the MP3 file above (although you will probably not hear any difference on the standard iPod headphones; you will need nicer headphones to hear the full difference).

In short, there are a lot of songs in the library that would likely be considered “low quality.”  Too quiet, missing frequencies above 10k hertz, etc.

Here is an example of a “good looking” MP3 file (this is Lady Gaga’s “Born This Way”):

This particular file is a bit distorted, but nevertheless, it contains a much fuller frequency spectrum than the previous example.

So, here’s the problem.  As mentioned before, “bitrate” is an allegedly “good measure” of sound quality.  However, as we just showed, a high-bitrate file may still sound bad.  We’ve decided it’d be best to give “spectral analysis” a shot.

Each of those spectrograms takes about 2-3 seconds to produce for a whole song.  Span that over 15,000 songs, and that’s a pretty hefty chunk of time (somewhere between 8 and 12.5 hours).  How do we speed it up?  There are two things we are doing:

  1. Converting stereo to mono for the spectrogram analysis (while the conversion from stereo to mono in itself takes time, the analysis of the data takes far less time)
  2. Analyzing only the first 120 seconds as opposed to the whole song.

These two things give us a pretty “clean” sample as to what the song looks like.

Since we’re not smart enough to simply write a program to generate the spectrogram but not display it, we’re relying on the coolest tool in the world, called SoX, to render the spectrograms as PNG files.  That means all we have to do is write a little program that analyzes PNG files.

Fortunately, the .NET Framework makes it pretty simple.  It’s really just two lines of code:

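// Load the spectrogram PNG that SoX generated (needs: using System.Drawing;)
Bitmap bmp = new Bitmap("C:\\Users\\Jake\\Desktop\\sox\\sox-14.3.1\\spectrogram.png");
// Read the color of the single pixel at x = 246, y = 272
Color clr = bmp.GetPixel(246, 272);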

Pretty spiffy, eh?  Determining the pixel color is fast, and it’s even faster when we’re only dealing with grayscale.  As such, we’ve configured SoX to create the spectrograms in grayscale, like so:

While we meek humans may not be able to tell much of a difference between grayscale colors, the averages that the computer calculates can tell a world of difference.
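The exact SoX invocation isn’t shown in this post, so here is a rough sketch of what a command covering the steps above (mono mix-down, first 120 seconds, grayscale PNG) might look like.  It is written in Python rather than the C# used above, purely to keep it short, and the function name and file names are made up for illustration.  The SoX pieces it relies on are the `remix -` effect (mix all channels to mono), the `trim` effect, and the `spectrogram` effect with `-m` (monochrome) and `-o` (output file).

```python
def build_sox_command(song_path, png_path, seconds=120):
    """Return the argument list for a SoX run that writes a grayscale spectrogram.

    Hypothetical helper: the post does not show its real command line.
    """
    return [
        "sox", song_path,
        "-n",                 # null output: we only want the picture, not audio
        "remix", "-",         # mix all input channels down to mono
        "trim", "0", str(seconds),  # keep only the first `seconds` seconds
        "spectrogram",
        "-m",                 # monochrome (grayscale) image
        "-o", png_path,       # where to write the PNG
    ]


command = build_sox_command("song.mp3", "spectrogram.png")
print(" ".join(command))
# prints: sox song.mp3 -n remix - trim 0 120 spectrogram -m -o spectrogram.png
```

To actually run it, the list could be handed to `subprocess.run(command, check=True)` on a machine where SoX is installed with MP3 support.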

The first thing we did was establish the frequency range that we were most interested in.  After examining about five to seven spectrograms, we decided the best area of interest was between 10,000 hertz and 16,000 hertz.  Many files cut off after 16,000 hertz, much like this one.  Files that did not extend into this range didn’t sound too great.  As such, we found the X and Y bounds of this area across the whole PNG file.
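We found our bounds by looking at the image, but the same rows could be estimated with a bit of arithmetic.  The Python sketch below assumes the frequency axis is linear, that the top row of the plot area corresponds to the highest frequency shown (22,050 hertz for a 44.1 kHz file), and that the bottom row is 0 hertz.  The plot height and offset in the example are placeholder values, not the real dimensions of our PNG files.

```python
def frequency_to_row(freq_hz, plot_top, plot_height, max_freq_hz=22050):
    """Map a frequency to a pixel row of the spectrogram's plot area.

    Assumes a linear frequency axis with max_freq_hz on the top row
    (plot_top) and 0 Hz on the bottom row (plot_top + plot_height - 1).
    """
    if not 0 <= freq_hz <= max_freq_hz:
        raise ValueError("frequency is outside the range shown in the plot")
    fraction_from_top = 1 - freq_hz / max_freq_hz
    return plot_top + round(fraction_from_top * (plot_height - 1))


# Placeholder geometry: a plot area 257 pixels tall starting at row 0.
top_row = frequency_to_row(16000, plot_top=0, plot_height=257)
bottom_row = frequency_to_row(10000, plot_top=0, plot_height=257)
print("rows to analyze:", top_row, "through", bottom_row)
```

Higher frequencies sit nearer the top of the image, so the 16,000 hertz row comes out as the smaller of the two numbers.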

Now that we had the pixels to analyze, the next step was to find the best “resolution.”  It turned out to be reasonably fast to simply analyze every pixel and take the average of all of them to determine if the file would be admissible.  It took about half a second per file to add up and average all these pixels (about 75,294 of them).  However, skipping every other pixel sped things up even further.  We ran a little test to see how many pixels we could skip per iteration and still get accurate results.  This is what we came up with (with a different spectrogram, not listed here, that would fail the quality test [hence such low numbers]):

Step 1: 277 milliseconds to get a result of 45 (correct answer)
Step 2: 57  milliseconds to get a result of 45
Step 3: 23  milliseconds to get a result of 44
Step 4: 14  milliseconds to get a result of 46
Step 5: 9   milliseconds to get a result of 39

On each line, the step size increased by 1.  So, for instance, the 277 millisecond line advanced by 1 pixel on each iteration, the next line by 2 pixels, then 3 pixels, and so on.  Stepping 5 pixels at a time gave fairly inaccurate results, so we decided to go forward with 3 pixels, as it gave fairly accurate results along with a significant performance improvement.
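For the curious, the sampling idea can be sketched in a few lines of Python.  This is not our actual C# loop; it assumes the region of interest has already been read into rows of grayscale values (0-255) and that the step is applied in plain reading order across the whole region.  The function name and the tiny sample grid are made up for illustration.

```python
def sampled_average(pixels, step=1):
    """Average grayscale values, visiting only every `step`-th pixel.

    `pixels` is a list of rows; pixels are visited in reading order
    (left to right, top to bottom). step=1 looks at every pixel.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    flat = [value for row in pixels for value in row]
    sample = flat[::step]
    if not sample:
        raise ValueError("no pixels to average")
    return sum(sample) / len(sample)


region = [
    [10, 20, 30],
    [40, 50, 60],
]
print(sampled_average(region, step=1))  # 35.0 (every pixel)
print(sampled_average(region, step=2))  # 30.0 (pixels 10, 30, 50)
print(sampled_average(region, step=3))  # 25.0 (pixels 10, 40)
```

On a grid this small the sampled averages drift quickly; with tens of thousands of pixels, a step of 2 or 3 stays much closer to the full average, which matches what we saw in the timings above.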

Consider this: just by skipping every other pixel, you will get your results 55 minutes sooner than if you analyzed every pixel (across all 15,000 songs).  There is even an 8 minute difference between stepping 2 pixels and stepping 3 pixels.  Overall, it’s pretty incredible how many resources sampling will save.
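Those savings come straight from the timings above multiplied across the library.  Here is the arithmetic as a small Python sketch, assuming every one of the 15,000 songs takes the same per-file time as our test file (the function name is just for illustration):

```python
def minutes_saved(slow_ms, fast_ms, song_count=15000):
    """Total minutes saved across the library when each file gets faster."""
    saved_ms_per_song = slow_ms - fast_ms
    return saved_ms_per_song * song_count / 60000  # 60,000 ms per minute


print(minutes_saved(277, 57))  # 55.0 (every pixel vs. every 2nd pixel)
print(minutes_saved(57, 23))   # 8.5  (every 2nd pixel vs. every 3rd pixel)
```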

Up next: how KLIK will be using statistical analysis to determine the best “average intensity of 10,000-16,000 Hz” threshold for musical quality!  (In other words, how we’ll use stats to figure out the minimum acceptable average intensity of sound between 10,000 and 16,000 hertz.  Maybe that’ll make more sense.)

We forgot about February. Oops.

In truth, no news is generally good news.  In honesty, we just don’t have anyone who is dedicated to reporting the news!  If you’re interested, please volunteer with us. 🙂

There are a lot of things happening.  Here’s the basic scoop:

Want to go to Elitch Gardens for $25?! So do we.  We’re selling tickets again this year as part of Elitch Gardens’ Coasters for Caring program.  We earn $5 for each ticket sold, and all money comes directly to our organization and is put into use immediately — no waiting for a check from Elitch’s at the end of the program.  If you want to buy one, scoot on over here, or call us at 970-281-5545.

Want to donate to us using an eBay auction?  It’s now possible! We’re finally listed in the MissionFish database, which means you can donate to KLIK using an eBay auction.  Got something around your house you don’t want?  Donate a portion of the proceeds directly to KLIK, and you get extra exposure due to the fact that a portion of your auction is going to charity!  You can learn more over on eBay about how the program works.  It also means we can experiment with using things like eBay for selling in-kind donations that we don’t particularly need, as well as the possibility of selling advertising time on eBay (it could be an interesting experiment).

Silence for Sound?  TBD. We’re still trying to work with the Fort Collins Downtown Business Association to secure a date to use the Old Town Stage for our next Silence for Sound adventure.  Once we have a date, things can start rolling with bands.  Volunteers are already on active look-out for donated items for auction.

Colorado Meth Project partnership. We are excited to announce (even if it is a bit premature) that the Colorado Meth Project is going to be sponsoring KLIK Radio.  There will be an on-air interview in the coming future, and we will be airing their “not even once” ads throughout the day to raise awareness of the dangers of methamphetamine use, as well as its usage here in Colorado.

8z Real Estate, also known as COHomeFinder.com, has decided to sponsor us for another year!  Yay!  Head on over to our Sponsors page to check out their link and information.

On that note, new advertising rates are coming soon. Now that businesses are actually showing interest in advertising with us (apparently 4 years old is the mark where people say “okay, I guess they’re not going anywhere…”), we’re coming up with a better pricing model for advertising.  If you are interested in buying advertising on our station, please contact us.

The board of directors transition to college students and alumni mixed with high school students is complete. In case you missed the transition, head on over to our board of directors page to see who the current directors are.  Hopefully we can get some pics or bios up later, too.

We’re getting closer to closing the gap in funding to purchase our telephony software.  We’re getting closer to being able to purchase a $1,700 piece of software which will allow us to manage up to 6 telephone calls simultaneously.  It will also add functionality for conference calling and allowing callers who are currently on-hold to hear the caller on air – something that currently doesn’t happen.  The upgrade will also require an upgraded sound card, which is adding to the delay in launching the final stage in our upgrade process called “KLIK Interactive” which started last year.

PaCE interns are starting to take over more and more software development. That means I can start leaving behind the day-to-day programming work and focus more on keeping the organization going.

The open source Podcaster is actually being tried out by others beyond KLIK. Yes, people, the Podcaster has gained a little bit of attention.  For a while, I was receiving 10 e-mails a day or so about the Podcaster producing errors on other people’s computers (not impressive, I know, but based on the errors, it was always the user’s fault.  And, in my defense, the error handling mechanism works, so HA!)  There are a few developers from the CodePlex community who have signed on, but I have yet to get in touch with them, as I really have very little time to work on the Podcaster at the moment.  In about a month or so, I hope to really pick up the pace and work on it in my spare time outside of my newly-landed job at FM Global (yay!).  While I am in their employment for approximately 7 months, I should have a good deal of after-work time to put into the Podcaster, since I won’t have to worry about school.  So!  The Podcaster is under way.

An open source version of our Uploader is coming to the blog…soon. We created a song uploader that we simply call “The Uploader.”  It basically copies files from members’ computers to our studio computers and adds them to our music library automatically.  It uses several pieces of open source software, both GPL and LGPL (ffmpeg, an ID3 library, and LAME MP3 to name a few), and we need to release the source code for it.  The problem is that it contains several hard-coded passwords, host names, and usernames for KLIK servers that we don’t want out and about in the public.  We’ll be redacting those, then releasing the code (so it’s pretty much there and complete, but without the security risks for us 🙂 ).  Anyway, if for some reason you are already familiar with the Uploader project and are wondering where the source code is, it’s coming.

That’s it!  Have a nice day!
