I'm a student, and as it happens, students in Finland wear overalls to most of the events we attend.
It is really common to customize your overalls. For example, I have switched one of my sleeves and made a custom hood to keep me warm on these winter nights. Here is a picture from my Instagram so you can see what I'm talking about. I wanted to create LED strips that would blink along my sleeves and sync to my environment and/or the music playing on my phone. I figured this shouldn't be too much of a hassle to hack together.
Keep in mind that this project took me a lot of time and effort. I'm simplifying the story a bit to keep it at least remotely entertaining for those who do not want to go deep into mathematics or signal processing.
It was very clear from the beginning that a device as small (but remarkably powerful for its size) as the ESP32 would probably not have enough power for signal processing. I already had quite a good background in Java, but to my surprise that didn't carry me very far in this project. In addition, I quickly figured out what music visualization actually consists of:
- Windowing (Hann, Hamming, Blackman-Harris)
- Fast, discrete, and partial Fourier transforms
- Mel filtering
- Matrix operations
To my horror, it seemed that my project just kept growing, with the next university period closing in very fast. The Android Visualizer API also seemed very old and discouraging (8-bit mono audio, partial samples).
I spent most of my winter break coding and reading the Android documentation. My goal was just to make a working mockup UI that I could develop the visualization against. At first the whole concept of Activities, Fragments, Services, Binders, etc. was very new and difficult for me to grasp. With the help of open source and GitHub, I crawled through dozens of projects file by file to really understand how everything should be built. After learning the very basics of creating a UI in Android Studio, I started to implement the Bluetooth functionality, which was way more complex than I had ever thought (who would've thought?).
This part of the development wasn't all that interesting, since it just involved basic Android ecosystem implementation, so from now on we will focus on the algorithm used by the visualization itself, which is far more entertaining.
I studied the API and found that it provided a getFft method. However, I was pretty sceptical that it would even work, since the API was introduced back in Gingerbread. I created a base visualizer and an extended bar visualizer class so I could plot the byte array on the screen: the x-axis would be the index into the array and the y-axis the values.
A sine wave should show up as a single clear, symmetrical peak in the visualizer; definitely not one ascending and descending in different manners, nor should there be any other lines in the results. So the results were just what I had expected, given my skepticism. Fortunately the API had another method, getWaveForm(byte[]), so I needed to do the Fourier transform myself (or find a library that did it reliably). JTransforms, TarsosDSP, and others had many similarities but always produced very different outcomes. I decided to stick with JTransforms because it supported double precision and was the fastest.
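To make the idea of "a sine wave should peak at exactly one bin" concrete, here is a minimal sketch of going from a waveform buffer to a magnitude spectrum. This is a naive DFT for illustration only; the names are mine, and in practice a library like JTransforms does the transform far faster:

```java
public class DftDemo {
    // Naive DFT: returns magnitudes for bins 0..n/2 of a real-valued signal.
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mags = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            mags[k] = Math.hypot(re, im);
        }
        return mags;
    }

    public static void main(String[] args) {
        int n = 256;
        double[] sine = new double[n];
        // A sine with exactly 8 cycles in the buffer should land on bin 8.
        for (int t = 0; t < n; t++) sine[t] = Math.sin(2 * Math.PI * 8 * t / n);
        double[] mags = magnitudes(sine);
        int peak = 0;
        for (int k = 1; k < mags.length; k++) if (mags[k] > mags[peak]) peak = k;
        System.out.println(peak); // a clean sine peaks at exactly one bin
    }
}
```

If the buffer really contains a clean sine, the magnitude array is near zero everywhere except that one bin, which is exactly the symmetrical single-peak picture described above.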
With the manual FFT the end results were a lot clearer, but they lacked the symmetrical shape I was aiming for. I had to retreat and do some reading, and I learned that to remove the spectral leakage I needed to do some windowing.
The most common windowing methods were Hann, Hamming, Blackman-Harris, etc., with Hann being the most recommended for general purposes. I tried all of them and found that Hann indeed produced the best results.
However, I only found methods that worked with float-typed arrays, so I had to rewrite and combine a few of them. The end result was spectacular (2nd picture).
In addition to this, I found that to actually get a good, evenly spaced window I had to have some sort of rolling or overlapping happening when a new sample is generated from the source. I implemented an FFT history that kept the given n amount of past captured samples in memory and performed the FFT and Mel filtering on the whole buffer. Currently I'm using a rolling window size of 2, but raising this would not be a problem. The effect of changing it is reflected later in the exponential filters.
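The FFT history idea can be sketched as a small ring buffer: each new capture is appended, the oldest one dropped, and the transform then runs over the concatenation. The class and method names here are illustrative, not the project's actual code:

```java
import java.util.ArrayDeque;

public class FftHistory {
    private final int windows; // how many past captures to keep (rolling window size)
    private final ArrayDeque<double[]> history = new ArrayDeque<>();

    FftHistory(int windows) { this.windows = windows; }

    // Push a new capture and get back the concatenation of the kept captures,
    // which is what the FFT + Mel filtering would then run on.
    double[] push(double[] capture) {
        history.addLast(capture);
        if (history.size() > windows) history.removeFirst();
        int total = 0;
        for (double[] c : history) total += c.length;
        double[] joined = new double[total];
        int pos = 0;
        for (double[] c : history) {
            System.arraycopy(c, 0, joined, pos, c.length);
            pos += c.length;
        }
        return joined;
    }

    public static void main(String[] args) {
        FftHistory h = new FftHistory(2); // rolling window size of 2, as in the post
        h.push(new double[]{1, 2});
        double[] joined = h.push(new double[]{3, 4});
        System.out.println(joined.length + " " + joined[0] + " " + joined[2]);
    }
}
```

Raising the window size just means the deque holds more captures, at the cost of a longer (and therefore more smeared in time) FFT input.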
I was looking around Scott Lawson's project and found that it used Mel filtering, with a rather custom-built implementation in the repo. I found a few examples for Java, but they were way more complex than I needed them to be. I settled for the Sphinx4 library's implementation. To keep the dependencies as small as possible, I copied the actual class files into my project and made some changes to get them working there. To keep the story short, the Mel-frequency cepstrum is used to transform the data in a way that represents human hearing. Even though we measure sound "loudness" in decibels (and definitely not in magnitudes, which is what the FFT produces), the human ear doesn't work on a purely logarithmic scale: we hear certain frequencies a lot better, so even in music visualization, "just" transforming the magnitudes to decibels would not give the best results.
After filtering the FFT output with the Mel filter, you have as many spectrum elements as the FFT bin number you have set. I won't go into detail, but this is usually set between 8 and 24, depending on how many different frequency areas you want to visualize. I also set the visualizer to work between 50 Hz and 20,000 Hz, which is roughly the spectrum our ears can hear.
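The core of the Mel idea fits in a few lines: a conversion between Hz and mels, and filter centers spaced evenly on the mel scale between the chosen 50 Hz and 20,000 Hz limits. The exact constants vary slightly between libraries; this sketch uses the common 2595/700 formulation, not necessarily what Sphinx4 does internally:

```java
public class MelDemo {
    // Common mel-scale conversions
    static double hzToMel(double hz)  { return 2595.0 * Math.log10(1.0 + hz / 700.0); }
    static double melToHz(double mel) { return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0); }

    // Center frequencies for `bins` filters, evenly spaced on the mel scale
    static double[] centers(double loHz, double hiHz, int bins) {
        double loMel = hzToMel(loHz), hiMel = hzToMel(hiHz);
        double[] out = new double[bins];
        for (int i = 0; i < bins; i++) {
            out[i] = melToHz(loMel + (hiMel - loMel) * (i + 1) / (bins + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] c = centers(50, 20000, 16);
        // Spacing is perceptual: low bins packed tightly, high bins far apart.
        System.out.println(c[1] - c[0] < c[15] - c[14]);
        // The two conversions round-trip.
        System.out.println(Math.abs(melToHz(hzToMel(1000)) - 1000) < 1e-9);
    }
}
```

Each filter in the bank is then a triangle centered on one of these frequencies, which is why the low end of the spectrum gets far more bins than the high end, matching how we hear.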
It turns out that initializing the filter bank with the values given to it requires a lot of processing power. So far I had my Fragment tied to the Visualizer API event listener, which was running on the main UI thread (something you should definitely not do), so I had to rethink and redesign the backend. At the time of writing it's still not perfect, but I'm planning to move the front end to Flutter, which would give me more freedom to develop the service structure even further.
I decided to put this into its own section, since the following parts combined create the whole abstraction.
At this point I was almost giving up. One huge topic after another, and then stumbling onto a problem that required me to do interpolation, was almost the breaking point. Turns out this was actually the fastest part of the whole project (I think I just had beginner's luck here).
So the problem was that I needed to get the Mel-filtered data into a size fitting the actual LED strip, and the only feasible way of doing that without horrible time complexity was to perform some sort of interpolation on the dataset. Thankfully Apache Commons Math had an interpolator, though using it was a challenge in itself. Still, I got it working pretty quickly.
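Stripped of library machinery, the resize is just linear interpolation: for each LED, find the fractional position in the Mel array and blend the two neighboring bins. This is a hand-rolled sketch of the idea, not the Apache Commons Math code:

```java
public class ResampleDemo {
    // Linearly interpolate `src` onto `dstLen` evenly spaced points.
    static double[] resize(double[] src, int dstLen) {
        double[] dst = new double[dstLen];
        for (int i = 0; i < dstLen; i++) {
            double pos = (double) i * (src.length - 1) / (dstLen - 1);
            int lo = (int) Math.floor(pos);
            int hi = Math.min(lo + 1, src.length - 1);
            double frac = pos - lo;
            dst[i] = src[lo] * (1 - frac) + src[hi] * frac; // blend the two neighbors
        }
        return dst;
    }

    public static void main(String[] args) {
        double[] mel = {0, 10, 20, 30};  // e.g. 4 Mel bins
        double[] leds = resize(mel, 7);  // stretched over 7 LEDs
        System.out.println(leds[0] + " " + leds[3] + " " + leds[6]);
    }
}
```

Each output value costs constant time, so resizing stays linear in the LED count no matter how the bin count and strip length relate.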
With a great idea from that project, I realized that for the best end results I would need to apply exponential filtering to my correctly sized dataset. The exponential filter is just a fancy name for smoothly pulling the values in the given array toward new values, at rates given to the object when it is initialized. In addition to the rolling window, this produces the different decay and rise behavior of the RGB LEDs. Combining this with some matrix manipulation with ND4J results in a truly terrific visualization in the end. Even though the class is very small, its use cases differ greatly. This was also one of the more complex ideas of the whole project, even though the code doesn't seem that complex.
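A sketch of the filter, modeled loosely on the ExpFilter idea in Scott Lawson's project (the class name, field names, and chosen alphas here are mine): the trick is using a different smoothing factor depending on whether a value is rising or falling, which gives the LEDs a fast attack and a slow fade.

```java
public class ExpFilter {
    private final double alphaRise, alphaDecay;
    private final double[] value;

    ExpFilter(double[] init, double alphaRise, double alphaDecay) {
        this.value = init.clone();
        this.alphaRise = alphaRise;
        this.alphaDecay = alphaDecay;
    }

    // Classic exponential smoothing, but with a per-element choice of alpha:
    // rising values respond quickly, falling values fade out slowly.
    double[] update(double[] input) {
        for (int i = 0; i < value.length; i++) {
            double alpha = input[i] > value[i] ? alphaRise : alphaDecay;
            value[i] = alpha * input[i] + (1 - alpha) * value[i];
        }
        return value;
    }

    public static void main(String[] args) {
        ExpFilter f = new ExpFilter(new double[]{0.0}, 0.9, 0.1);
        double up   = f.update(new double[]{1.0})[0]; // jumps up fast (about 0.9)
        double down = f.update(new double[]{0.0})[0]; // decays slowly (about 0.81)
        System.out.println(Math.abs(up - 0.9) < 1e-9 && Math.abs(down - 0.81) < 1e-9);
    }
}
```

Running one filter per color channel with different alphas is what makes R, G, and B react to the music at visibly different speeds.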
Mapping the values
This was actually half done already before even starting the whole visualization, but to keep the narrative together I decided it would be best to keep this near the ESP32 section.
After getting the three R, G, and B arrays filtered, I needed to send them over to the ESP32. I wanted to transform the values in the arrays into byte arrays. This was a headache in itself, since just casting the values in Java with (byte) would sometimes end up with an overflown byte. This was because I needed to send the values as unsigned bytes, which just required a small tweak, and then everything worked correctly (of course, this took me an embarrassingly long time to figure out). With the values cast to the correct type, I created a byte stream with the Java standard libraries and sent it via my Bluetooth serial service. At the time of writing, the WiFi method is not yet implemented in the Android app, but the Java side of it is already done.
The sent array has a length of 4 * LED count. This is because the first byte is for the actual LED index and the three bytes following it are for the RGB values (represented in the picture at the bottom of this post).
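Packing the payload, and the unsigned-byte pitfall, can be shown in a few lines. This is an illustrative sketch (the helper name is mine), demonstrating both the 4-bytes-per-LED layout and why a naive signed read of a value above 127 goes wrong:

```java
public class PacketDemo {
    // Build the 4 * ledCount payload: [index, r, g, b] for each LED.
    static byte[] pack(int[][] rgb) {
        byte[] out = new byte[rgb.length * 4];
        for (int i = 0; i < rgb.length; i++) {
            out[i * 4]     = (byte) i;         // LED index
            out[i * 4 + 1] = (byte) rgb[i][0]; // R
            out[i * 4 + 2] = (byte) rgb[i][1]; // G
            out[i * 4 + 3] = (byte) rgb[i][2]; // B
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] p = pack(new int[][]{{255, 128, 0}});
        // Java bytes are signed, so 255 reads back as -1 unless the sign
        // bits are masked away with & 0xFF.
        System.out.println(p[1] + " " + (p[1] & 0xFF));
    }
}
```

The bit pattern on the wire is correct either way; the masking only matters when a value is read back as an integer, which is what made the bug so easy to miss.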
ESP32 and ESP8266
In the beginning I had a version made only for the ESP8266 (NodeMCU) variant, which used WiFi to send UDP packets. I already knew from the start that Bluetooth would be the more natural way, keeping portability in mind. When developing the Android app, I decided that for the time being I would settle for Bluetooth serial, since it's much easier to implement than BLE. Of course, a natural development path in the future is to switch the power-hungry serial mode to BLE for power consumption reasons.
Thankfully the ESP32 had a library that implemented the Bluetooth serial mode, but as stated, it really implemented just the serial part; I had to learn and code the receiver part myself. It wasn't anything complex, but working with C++ is always a fun ride. After grabbing the byte stream from Bluetooth, I placed the data into a char variable (treated as an unsigned byte). From there on out, visualizing was pretty straightforward using the NeoPixelBus library.
Hurray! We have a working visualizer when you start playing music!
For more technical documentation, refer to the Technical Specifications