Hi. I'm Bohdan Trotsenko.
And this is how my name looks on a spectrum:
I have invented the simplest acoustic model:
A bit of explanation:
- The video shows frequencies (the spectrum), with lower frequencies shown lower.
- The blue part is a preview (seeing frequencies only at the moment they are played
seems too "sudden" to perceive).
- The frequencies are normalized, so if you see the voice part "take over" while engine, music, and other sounds fade,
that's because the voice part is more significant and normalization makes the other sounds fade.
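The fading effect of normalization can be illustrated with a tiny sketch. It assumes the simplest possible scheme, dividing each spectrum column by its own peak; the exact normalization used in the videos is not stated here, and `normalize_frame` and the band values are hypothetical.

```python
def normalize_frame(magnitudes):
    """Scale one spectrum column so its strongest frequency becomes 1.0.

    Assumption: per-column peak normalization (the scheme actually used
    in the videos may differ).
    """
    peak = max(magnitudes, default=0.0)
    if peak == 0.0:
        return [0.0 for _ in magnitudes]
    return [m / peak for m in magnitudes]

# Hypothetical raw magnitudes for three bands: engine, music, voice.
quiet_voice = normalize_frame([4.0, 2.0, 1.0])
loud_voice = normalize_frame([4.0, 2.0, 16.0])

print(quiet_voice)  # the engine band is the brightest one
print(loud_voice)   # the engine band fades although its raw level is unchanged
```

In the second column the engine band is exactly as loud as before, but it is drawn four times dimmer because the voice band now sets the scale.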
I have more videos on my YouTube channel
and on the dedicated Facebook page.
Questions and answers
- The simplest?
I can probably write it in about 30 lines of quite simple Python (no dependencies) – it's that simple.
It's so simple that I sometimes think it could be found by a brute-force search over the simplest algorithms applied to an input.
But that's just author's bias, of course: I had to go through a couple hundred non-working ones over months
to find this one.
It's not Fourier. (Wow, I compete with
It's extremely lightweight – I can produce a spectrum using less than 10 * N * F
It's highly likely that I have invented the most efficient/lightweight spectrum algorithm ever!
It's also conceptually the simplest. I think I could convey the idea in 5 sentences
to anyone who has passed Code Jam's Round 1 (or, maybe, Round 2).
One A4 sheet of information would be enough for a 14-year-old (probably a somewhat math-inclined one).
- More on comparison with FFT
- The Fourier transform works perfectly on a periodic signal.
To apply it to sound, a 'sliding window' trick is used.
Moreover, since an arbitrary window doesn't match up at its ends, an additional 'smoothing' (a window function) must be applied.
Pick a window too wide, and you blur anything that changes quickly in time (you're likely to lose short high-frequency events).
Pick a window too narrow, and you can't analyze low frequencies.
All this suggests that there's a better way.
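For reference, the conventional approach described above (sliding window plus end smoothing) can be sketched in dependency-free Python. This is not the algorithm from the videos. It assumes a Hann window as the smoothing and uses a naive DFT in place of a real FFT; the names `hann`, `dft_magnitudes`, and `spectrogram` are made up for this illustration.

```python
import cmath
import math

def hann(length):
    """Hann window: tapers both ends of a frame to zero (the 'smoothing')."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def dft_magnitudes(frame):
    """Naive O(W^2) DFT of one frame; an FFT computes the same values faster."""
    size = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size)))
        for k in range(size // 2 + 1)
    ]

def spectrogram(samples, window_len, stride):
    """Slide a fixed-length window over the signal and transform each frame."""
    taper = hann(window_len)
    frames = []
    for start in range(0, len(samples) - window_len + 1, stride):
        chunk = samples[start:start + window_len]
        frames.append(dft_magnitudes([s * w for s, w in zip(chunk, taper)]))
    return frames

# Test tone with exactly 8 cycles per 64-sample window.
signal = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
frames = spectrogram(signal, window_len=64, stride=32)
peak_bin = max(range(len(frames[0])), key=frames[0].__getitem__)
print(len(frames), len(frames[0]), peak_bin)
```

The window length and stride are the two knobs the trade-off above is about: every frame costs a full transform, and its length fixes which frequencies can be told apart.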
Please compare analysis of modem handshake sound: FFT
Or this FFT vs mine.
Or violin sound decomposed.
Facebook's wav2letter uses a 25 ms sliding window and a 10 ms stride.
I simply don't need this.
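To put the 25 ms / 10 ms figures in concrete terms, here is a small arithmetic sketch. The 16 kHz sample rate is an assumption (a common rate for speech, not stated above), and `framing` is a hypothetical helper.

```python
def framing(sample_rate_hz, window_ms, stride_ms, duration_s):
    """Return (window length, stride length, frame count, bin spacing in Hz)."""
    window_len = sample_rate_hz * window_ms // 1000   # samples per window
    stride_len = sample_rate_hz * stride_ms // 1000   # samples between window starts
    total = int(sample_rate_hz * duration_s)
    n_frames = max(0, (total - window_len) // stride_len + 1)
    bin_spacing_hz = sample_rate_hz / window_len      # equals 1 / window duration
    return window_len, stride_len, n_frames, bin_spacing_hz

# Assumed: 16 kHz audio, one second long.
print(framing(16000, 25, 10, 1))  # (400, 160, 98, 40.0)
```

So one second of audio becomes 98 overlapping frames of 400 samples each, and the transform's frequency bins are 40 Hz apart, which is simply 1 / 25 ms.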
Technically, a single FFT can be computed in O(N·log N) at best; but since the sliding window's length is fixed,
processing the whole signal takes O(N) time.
However, in practice my method would need 10-100
times fewer operations to build a spectrogram.
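The linear-time claim for the windowed FFT can be checked with a rough count. This sketch uses W·log2(W) as an order-of-magnitude proxy for the cost of one FFT of length W (real implementations differ by a constant factor); the 512-sample window, the 160-sample stride, and the function name are assumptions for illustration. It says nothing about the cost of my method.

```python
import math

def windowed_fft_ops(n_samples, window_len, stride):
    """Rough operation count for a sliding-window FFT over n_samples."""
    n_frames = max(0, (n_samples - window_len) // stride + 1)
    per_frame = window_len * math.log2(window_len)  # fixed, independent of n_samples
    return n_frames * per_frame

one_second = windowed_fft_ops(16_000, window_len=512, stride=160)
ten_seconds = windowed_fft_ops(160_000, window_len=512, stride=160)
print(one_second, ten_seconds, ten_seconds / one_second)
```

Ten times more samples gives roughly ten times more work, because the per-frame cost is a constant once the window length is fixed.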
- More on comparison with ...
- I tried searching the web for an existing solution. What I have is far from
the log power spectrum,
the Mel-frequency cepstrum, or
linear predictive coding
(which is fairly obvious after the statement on simplicity, but I'm constantly asked about this).
- Possible applications
- The algorithm can greatly improve the quality of home speakers and online speech-to-text services.
It can save a lot of battery on phones (especially for hot phrase detection)
and even enable speech recognition on watches and earbuds!
It's certainly useful for all kinds of signal processing. E.g. real-time software-defined radios could raise their limits.
- Can be improved
- ... but I need to spend an additional 100-200 hours on research.
Plan A. Cooperate with a Fortune 500 company; potentially patent the algorithm (which is questionable given the simplicity);
grant a license for 5 years.
Plan B. Proceed with creating a speech-to-text service.
- Can I try it?
Me in social networks:
My facebook page,
my stackoverflow page,
Other (somewhat outdated) links:
Old coding blog,
old thoughts on life.
Updated on: Feb 22, 2020.