Audio AI Reading Notes
A curated list of papers and articles I’ve read recently, focused on neural audio, DSP, and music AI. Each entry includes the original source and my key takeaways. This page is being updated regularly. More […]
This project trains a 1D convolutional neural network with time embeddings to remove noise from audio, inspired by Rectified Flow Models (RFM). What is a Rectified Flow Model? An RFM learns a velocity field that […]
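A minimal sketch of the rectified-flow idea, assuming the flow runs along a straight line from a noisy signal x0 (t = 0) to a clean signal x1 (t = 1). The names rectified_flow_pair and euler_sample are hypothetical, and NumPy stands in for the 1D CNN, which in this setup would be trained to predict v_target from (x_t, t).

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Build one training example for a straight-line flow from x0 to x1.

    Returns the interpolated point x_t and the target velocity, which is
    constant along a straight path.
    """
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

def euler_sample(velocity_fn, x0, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x
```

With a perfectly learned velocity field the path is straight, so even a few Euler steps would land on x1; a trained network only approximates this.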
Recently, I experimented with a quick way to measure how similar two audio tracks are — not by comparing raw waveforms, but by looking at their spectral patterns. Using PyTorch and Torchaudio, I load two […]
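One possible way to compare spectral patterns, sketched with NumPy rather than the PyTorch and Torchaudio setup mentioned above. The choice of a time-averaged magnitude spectrum and cosine similarity is an assumption, as are the function names and the frame_len and hop defaults; both signals are assumed to share a sample rate and to be at least one frame long.

```python
import numpy as np

def mean_spectrum(signal, frame_len=1024, hop=512):
    """Average magnitude spectrum over Hann-windowed frames."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def spectral_similarity(a, b, frame_len=1024, hop=512):
    """Cosine similarity between the average spectra of two signals."""
    sa = mean_spectrum(a, frame_len, hop)
    sb = mean_spectrum(b, frame_len, hop)
    denom = np.linalg.norm(sa) * np.linalg.norm(sb) + 1e-12  # avoid divide-by-zero
    return float(np.dot(sa, sb) / denom)
```

A value near 1 means the two tracks concentrate energy in the same frequency regions; it says nothing about timing or phase.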
I recently built a lightweight speech denoiser using a GRU-based recurrent neural network that operates directly on raw audio frames. What I did: Implemented a SimpleDenoiserRNN with a single GRU layer and a linear output […]
I took a plain mono audio file and transformed it into a stereo track that sounds like it’s coming from a real direction in space — using HRTF data from a SOFA file. HRTF (Head-Related […]
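The core step can be sketched as a pair of convolutions, one per ear. This is a minimal sketch that assumes the left and right head-related impulse responses (HRIRs) for the chosen direction have already been read from the SOFA file and have equal length; the function name binauralize and the toy impulse responses are hypothetical.

```python
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair.

    Returns an array of shape (2, len(mono) + len(hrir) - 1).
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

# Toy HRIRs (assumption, not real measurements): a source on the left
# reaches the right ear two samples later and at half the level.
toy_left = np.array([1.0, 0.0, 0.0, 0.0])
toy_right = np.array([0.0, 0.0, 0.5, 0.0])
stereo = binauralize(np.array([1.0, 2.0, 3.0]), toy_left, toy_right)
```

Real HRIRs encode the same kind of interaural time and level differences, plus the spectral filtering of the head and outer ear.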
This is a basic implementation of neural audio fingerprinting using PyTorch and torchaudio. It serves as a solid foundation for tasks like music identification, audio search, and deep audio retrieval. CNN architecture, sample rate, and […]
This is a basic neural audio codec implementation using a convolutional encoder and decoder, along with a quantizer. The encoder compresses raw audio into a low-dimensional latent vector. A simple scalar quantizer then rounds the […]
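The quantization step can be illustrated in isolation. This is a minimal sketch of a uniform scalar quantizer, assuming the latent is rounded to a fixed grid; the function name scalar_quantize and the step size are hypothetical, and NumPy stands in for the convolutional encoder and decoder around it.

```python
import numpy as np

def scalar_quantize(latent, step=0.1):
    """Round each latent value to the nearest multiple of `step`.

    Returns the integer indices (what a codec would store or transmit)
    and the dequantized values (what the decoder would receive).
    """
    indices = np.round(latent / step).astype(np.int64)
    return indices, indices * step

z = np.array([0.04, -0.26, 0.5])
idx, z_q = scalar_quantize(z, step=0.1)
```

The reconstruction error per value is bounded by half the step size, so a smaller step trades bitrate for fidelity.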
Here, I morph between two points (A to B), through an optional control point, and compare arc-length-based sampling to linear interpolation. The main idea comes from my implementation for curve point distribution in 2D latent […]
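A minimal sketch of the comparison, assuming the path through the control point is a quadratic Bezier curve (the curve type is an assumption; the function names are hypothetical). Sampling uniformly in the parameter t bunches points where the curve moves slowly, while arc-length sampling spaces them evenly along the path.

```python
import numpy as np

def quadratic_bezier(a, c, b, t):
    """Evaluate a quadratic Bezier from a to b with control point c."""
    t = np.asarray(t, dtype=float)[:, None]
    return (1 - t) ** 2 * a + 2 * (1 - t) * t * c + t ** 2 * b

def arc_length_samples(a, c, b, n_points, n_dense=2000):
    """Return n_points spaced (approximately) evenly along the curve."""
    t_dense = np.linspace(0.0, 1.0, n_dense)
    pts = quadratic_bezier(a, c, b, t_dense)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length
    targets = np.linspace(0.0, cum[-1], n_points)   # evenly spaced lengths
    t_even = np.interp(targets, cum, t_dense)       # invert length -> t
    return quadratic_bezier(a, c, b, t_even)

A = np.array([0.0, 0.0])
C = np.array([0.0, 3.0])   # control point
B = np.array([4.0, 0.0])
uniform_t = quadratic_bezier(A, C, B, np.linspace(0.0, 1.0, 11))
even_arc = arc_length_samples(A, C, B, 11)
```

Both point sets start at A and end at B; only the spacing in between differs.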
*Testing Stage*: I tried adding a pretrained mono CNN into a JUCE plugin to experiment with real-time audio processing, mainly thinking about a vocal denoiser. The process turned out simpler than I expected, and […]
To find the global minimum of a selected function, I compared brute-force sweeping with the Nesterov Accelerated Gradient (NAG) optimisation technique. By applying small deltaX increments, I obtained local extrema in the range of […]
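A minimal sketch of the two approaches on a stand-in function. The function f(x) = x^4 - 3x^2 + x, the sweep range, and the learning rate and momentum values are all assumptions chosen for illustration; delta_x plays the role of the deltaX increment.

```python
def f(x):
    # Hypothetical test function with one global and one local minimum.
    return x ** 4 - 3 * x ** 2 + x

def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1

def brute_force_min(func, lo, hi, delta_x):
    """Evaluate func on a grid with spacing delta_x and return the best x."""
    n = int(round((hi - lo) / delta_x))
    xs = [lo + i * delta_x for i in range(n + 1)]
    return min(xs, key=func)

def nag_minimize(grad, x0, lr=0.01, momentum=0.9, n_steps=500):
    """Nesterov Accelerated Gradient: the gradient is taken at a lookahead point."""
    x, v = x0, 0.0
    for _ in range(n_steps):
        lookahead = x + momentum * v
        v = momentum * v - lr * grad(lookahead)
        x = x + v
    return x

x_sweep = brute_force_min(f, -3.0, 3.0, 0.001)
x_nag_right = nag_minimize(grad_f, x0=1.5)    # starts in the right-hand basin
x_nag_left = nag_minimize(grad_f, x0=-1.0)    # starts in the left-hand basin
```

The sweep finds the global minimum regardless of shape, at the cost of one evaluation per grid point. NAG needs far fewer evaluations, but with these settings it settles in whichever basin it starts in, so the starting point matters.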
Simpson’s Rule is a classical numerical integration technique used to approximate definite integrals, especially when the integrand is difficult or impossible to integrate analytically. While it is a fundamental tool in numerical analysis and calculus, […]
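For reference, a minimal sketch of the composite form of Simpson's Rule, which fits a parabola through each pair of adjacent subintervals. The function name simpson and the default n are illustrative choices.

```python
import math

def simpson(func, a, b, n=100):
    """Approximate the integral of func over [a, b] with n subintervals.

    n must be even, because each parabola spans two subintervals.
    """
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2   # odd interior nodes get 4, even get 2
        total += weight * func(a + i * h)
    return total * h / 3

approx = simpson(math.sin, 0.0, math.pi)   # exact value is 2
```

The rule is exact for polynomials up to degree three, and its error shrinks with the fourth power of the step size for smooth integrands.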
One data augmentation technique for music datasets is transposition. In my opinion, it is quite useful for OMR training (the image-to-notation process that turns a score into music notation), for features like stem directions and clusters. However, […]
Content coming soon.