Audio AI Reading Notes

A curated list of papers and articles I’ve read recently, focused on neural audio, DSP, and music AI. Each entry includes the original source and my key takeaways. This page is updated regularly. More […]


Denoising with Rectified Flow Models

This project trains a 1D convolutional neural network with time embeddings to remove noise from audio, inspired by Rectified Flow Models (RFM). What is a Rectified Flow Model? An RFM learns a velocity field that […]
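A minimal sketch of the rectified-flow mechanics in plain Python rather than PyTorch. It assumes the noisy clip is the start point x0 (t = 0) and the clean clip is the end point x1 (t = 1). The trained 1D CNN is replaced by an "oracle" velocity that already knows the answer, so the sketch only shows the moving parts. During training, the model would see a point x_t on the straight path plus its time t and regress the constant velocity x1 - x0. At inference, that velocity is integrated with Euler steps. All function names are hypothetical.

```python
import math
import random


def interpolate(x0, x1, t):
    """Point on the straight path between x0 (t = 0) and x1 (t = 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Regression target for the velocity model; constant along a straight path."""
    return [b - a for a, b in zip(x0, x1)]


def training_example(x0, x1, rng):
    """One (x_t, t, target) triple; a real model would take (x_t, t) and predict the target."""
    t = rng.random()
    return interpolate(x0, x1, t), t, velocity_target(x0, x1)


def euler_sample(x_start, velocity_fn, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x_start)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noisy = [s + rng.gauss(0.0, 0.1) for s in clean]

x_t, t, v = training_example(noisy, clean, rng)  # inputs and target for one training step


def oracle(x, t):
    # Stand-in for the trained CNN: returns the exact straight-line velocity.
    return velocity_target(noisy, clean)


restored = euler_sample(noisy, oracle, steps=10)
max_error = max(abs(r - c) for r, c in zip(restored, clean))  # only float rounding remains
```

With a perfect velocity, the straight path means Euler integration lands on the clean signal. A learned velocity is only approximately straight, which is why several integration steps are typically used.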


Cosine Similarity

Recently, I experimented with a quick way to measure how similar two audio tracks are: not by comparing raw waveforms, but by looking at their spectral patterns. Using PyTorch and torchaudio, I load two […]
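A rough sketch of the comparison in plain Python. It assumes a single frame per track and uses a naive DFT instead of the torchaudio spectrogram transforms from the post. The example contrasts two tones that share a frequency but differ in phase and level. Their waveforms are orthogonal, so their cosine similarity is near 0, while their magnitude spectra match, so their similarity is near 1. The helper names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes of one real-valued frame, bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
        for b in range(n // 2 + 1)
    ]


def cosine_similarity(a, b):
    """Dot product of two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


N = 64
tone = [math.sin(2 * math.pi * 8 * k / N) for k in range(N)]
shifted = [0.5 * math.cos(2 * math.pi * 8 * k / N) for k in range(N)]  # same pitch, new phase and level
other = [math.sin(2 * math.pi * 20 * k / N) for k in range(N)]         # different pitch

wave_sim = cosine_similarity(tone, shifted)  # near 0: the waveforms are orthogonal
spec_sim = cosine_similarity(magnitude_spectrum(tone), magnitude_spectrum(shifted))  # near 1
diff_sim = cosine_similarity(magnitude_spectrum(tone), magnitude_spectrum(other))    # near 0
```

Discarding phase is what makes the spectral comparison robust to time offsets and level changes that would defeat a raw waveform comparison.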


Simple RNN for Speech Denoising

I recently built a lightweight speech denoiser using a GRU-based recurrent neural network that operates directly on raw audio frames. What I did: Implemented a SimpleDenoiserRNN with a single GRU layer and a linear output […]


HRTF Spatialization: Mono to Stereo

I took a plain mono audio file and transformed it into a stereo track that sounds like it’s coming from a real direction in space, using HRTF data from a SOFA file. HRTF (Head-Related […]
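The core operation can be sketched as two convolutions. The mono signal is filtered with the left-ear and right-ear head-related impulse responses (HRIRs) for the chosen direction. This plain-Python sketch assumes the two HRIRs have already been read from the SOFA file, share the audio's sample rate, and have equal length. The toy HRIRs below are hypothetical. They stand in for measured data and use a simple interaural delay and level difference for a source on the listener's left.

```python
def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal to stereo with one HRIR per ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical toy HRIRs: the right ear hears the sound 3 samples later and at half level.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.5]

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
left, right = spatialize(impulse, hrir_left, hrir_right)
```

For real audio lengths, an FFT-based convolution such as scipy.signal.fftconvolve is much faster than this direct double loop.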


Neural Audio Fingerprinting

This is a basic implementation of neural audio fingerprinting using PyTorch and torchaudio. It serves as a solid foundation for tasks like music identification, audio search, and deep audio retrieval. CNN architecture, sample rate, and […]


Neural Audio Codec

This is a basic neural audio codec implementation using a convolutional encoder and decoder, along with a quantizer. The encoder compresses raw audio into a low-dimensional latent vector. A simple scalar quantizer then rounds the […]
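A minimal sketch of a uniform scalar quantizer. It assumes a fixed step size; the value 0.25 here is a hypothetical choice. Each latent value is divided by the step, rounded to an integer code for storage or transmission, and multiplied back before decoding. This keeps the reconstruction error for each value at most half a step.

```python
def scalar_quantize(latent, step=0.25):
    """Round each latent value to the nearest multiple of `step`.

    Returns the integer codes (what would be stored) and the dequantized
    values (what the decoder would receive).
    """
    indices = [round(z / step) for z in latent]
    values = [i * step for i in indices]
    return indices, values


latent = [0.1, -0.37, 0.62, 1.0]
codes, dequantized = scalar_quantize(latent)
```

Rounding has zero gradient almost everywhere. When the quantizer sits inside a trained network, a common workaround is the straight-through estimator, which passes gradients through the rounding unchanged.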


Arc-Length-Based Sampling in Latent Space

Here, I morph from point A to point B through an optional control point and compare arc-length-based sampling with linear interpolation. The main idea comes from my implementation of curve point distribution in 2D latent […]
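A sketch of the comparison. It assumes the morph path is a quadratic Bézier curve from A to B with control point C, and it uses 2D points for readability. Placing C at the midpoint of A and B gives a straight line. Linear interpolation steps evenly in the curve parameter t, which bunches samples wherever the curve moves slowly. Arc-length sampling instead builds a dense lookup table of cumulative length and inverts it, so consecutive samples are evenly spaced along the path. The function names and the `resolution` parameter are hypothetical.

```python
import math


def bezier(a, c, b, t):
    """Quadratic Bezier from a to b with control point c (2D tuples)."""
    u = 1.0 - t
    return tuple(u * u * a[i] + 2 * u * t * c[i] + t * t * b[i] for i in range(2))


def linear_t_samples(a, c, b, n):
    """Evenly spaced in the parameter t; spacing along the curve is uneven."""
    return [bezier(a, c, b, k / (n - 1)) for k in range(n)]


def arc_length_samples(a, c, b, n, resolution=1000):
    """Evenly spaced by arc length, using a dense polyline lookup table."""
    ts = [k / resolution for k in range(resolution + 1)]
    pts = [bezier(a, c, b, t) for t in ts]
    cum = [0.0]
    for p, q in zip(pts, pts[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1]
    out, j = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the table segment that contains the target length.
        while j < resolution - 1 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        frac = 0.0 if seg == 0.0 else (target - cum[j]) / seg
        out.append(bezier(a, c, b, ts[j] + frac * (ts[j + 1] - ts[j])))
    return out


def spread(points):
    """(max gap - min gap) / mean gap between consecutive samples."""
    gaps = [math.dist(p, q) for p, q in zip(points, points[1:])]
    return (max(gaps) - min(gaps)) / (sum(gaps) / len(gaps))


A, C, B = (0.0, 0.0), (0.9, 1.0), (1.0, 0.0)
linear = linear_t_samples(A, C, B, 9)
uniform = arc_length_samples(A, C, B, 9)
```

Comparing `spread(linear)` with `spread(uniform)` makes the difference visible. The first is large because the curve slows down near its apex. The second stays small and is limited only by chord-versus-arc differences and the table resolution.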


Pretrained Mono CNN in JUCE Plugin

*Testing Stage*: I tried integrating a pretrained mono CNN into a JUCE plugin to experiment with real-time audio processing, mainly with a vocal denoiser in mind. The process turned out simpler than I expected, and […]


Local Extrema and Nesterov Accelerated Gradient

To find the global minimum of a selected function, I compared a brute-force sweep with the Nesterov Accelerated Gradient (NAG) optimisation technique. By stepping through the range in small deltaX increments, I obtained the local extrema in the range of […]
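A sketch of both approaches on a hypothetical test function, f(x) = x^4 - 3x^2 + x, which has two local minima. The sweep steps through [-2, 2] in deltaX increments. It flags a sample as a local minimum or maximum when that sample is lower or higher than both of its neighbours. NAG evaluates the gradient at the look-ahead point x + momentum * v before updating the velocity. The learning rate, momentum, iteration count, and start points are assumptions.

```python
def f(x):
    return x**4 - 3 * x**2 + x


def grad_f(x):
    return 4 * x**3 - 6 * x + 1


def sweep_extrema(func, lo, hi, delta_x=1e-3):
    """Brute-force sweep: compare each sample with its two neighbours."""
    n = int(round((hi - lo) / delta_x))
    xs = [lo + i * delta_x for i in range(n + 1)]
    ys = [func(x) for x in xs]
    minima, maxima = [], []
    for i in range(1, n):
        if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]:
            minima.append(xs[i])
        elif ys[i] > ys[i - 1] and ys[i] > ys[i + 1]:
            maxima.append(xs[i])
    return minima, maxima


def nag_minimize(grad, x0, lr=0.005, momentum=0.9, iters=500):
    """Nesterov accelerated gradient descent in one dimension."""
    x, v = x0, 0.0
    for _ in range(iters):
        lookahead = x + momentum * v  # gradient is taken at the look-ahead point
        v = momentum * v - lr * grad(lookahead)
        x += v
    return x


minima, maxima = sweep_extrema(f, -2.0, 2.0)
global_x = min(minima, key=f)         # lowest of the local minima found by the sweep
nag_local = nag_minimize(grad_f, 1.5)    # settles in the right-hand basin
nag_global = nag_minimize(grad_f, -1.5)  # settles in the left-hand basin
```

The sweep finds both minima and picks the lower one near x ≈ -1.301. NAG started at 1.5 stops at the local minimum near x ≈ 1.131 and only reaches the global minimum when it starts in that basin. This shows why a coarse sweep is a useful companion to a gradient method.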


Arc Length by Simpson’s Rule

Simpson’s Rule is a classical numerical integration technique used to approximate definite integrals, especially when the integrand is difficult or impossible to integrate analytically. While it is a fundamental tool in numerical analysis and calculus, […]
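Applied to arc length, Simpson's Rule approximates L = ∫ sqrt(1 + f'(x)²) dx over [a, b]. Below is a minimal sketch in plain Python. It assumes f is differentiable and its derivative is known in closed form. The test curve y = x² on [0, 1] is a hypothetical example, chosen because its exact arc length, sqrt(5)/2 + asinh(2)/4 ≈ 1.4789, is available for comparison.

```python
import math


def simpson(g, a, b, n=100):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)  # weights 1, 4, 2, 4, ..., 4, 1
    return total * h / 3


def arc_length(df, a, b, n=100):
    """Arc length of y = f(x) on [a, b], given the derivative df."""
    return simpson(lambda x: math.sqrt(1.0 + df(x) ** 2), a, b, n)


approx = arc_length(lambda x: 2 * x, 0.0, 1.0)    # f(x) = x**2, so f'(x) = 2x
exact = math.sqrt(5) / 2 + math.asinh(2) / 4
```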


Data Augmentation for MIDI: Transposing

One data augmentation technique for music datasets is transposition. In my opinion, it is quite useful for OMR (optical music recognition) training, the image-to-notation process (score to music notation), for features like stem directions and note clusters. However, […]
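A minimal sketch of transposition as augmentation. It assumes notes are stored as hypothetical (pitch, start, duration) tuples with MIDI pitch numbers. A real pipeline would read and write files with a MIDI library, and for OMR the transposed data would still need to be rendered as score images, which is not shown. Each shift adds a semitone offset to every pitch, and a transposed copy is discarded if any note leaves the valid MIDI range of 0 to 127.

```python
MIDI_MIN, MIDI_MAX = 0, 127


def transpose(notes, semitones):
    """Shift every pitch; return None if any note would leave the MIDI range."""
    shifted = [(pitch + semitones, start, dur) for pitch, start, dur in notes]
    if any(not MIDI_MIN <= p <= MIDI_MAX for p, _, _ in shifted):
        return None
    return shifted


def augment(notes, shifts=range(-6, 6)):
    """One transposed copy per shift, keyed by the shift; out-of-range copies are skipped."""
    out = {}
    for s in shifts:
        t = transpose(notes, s)
        if t is not None:
            out[s] = t
    return out


triad = [(60, 0.0, 1.0), (64, 0.0, 1.0), (67, 0.0, 1.0)]  # C major: C4, E4, G4
copies = augment(triad)
```

Shifts from -6 to +5 cover all 12 pitch classes exactly once, while keeping each copy within a tritone of the original register.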