Data Augmentation for MIDI: Transposing

One data augmentation technique for music datasets is transposition. In my opinion, it is quite useful for OMR (optical music recognition) training, that is, the image-to-notation process from score to music notation, because it varies notational details such as stem directions and note clusters. However, I disagree with using it for musical training, because transposition creates artificial scores that were never composed by real musicians. In that sense, it manipulates the perceptual reality of the dataset.

Here is a script that transposes every file in a folder of MIDI data in one go. The “mido” library makes MIDI data very easy to work with. If you want to work with MusicXML files, “music21” is another great library. I decided to clamp transposed notes to the valid MIDI range (0 to 127) to handle edge cases. A different approach could be used; this is open for discussion.

import os
from mido import MidiFile, MidiTrack

def transpose_midi_folder(input_folder, output_folder, semitone_shift):
    os.makedirs(output_folder, exist_ok=True)

    for file in os.listdir(input_folder):
        # Accept both common MIDI extensions (.mid and .midi)
        stem, ext = os.path.splitext(file)
        if ext.lower() not in (".mid", ".midi"):
            continue

        input_path = os.path.join(input_folder, file)
        output_path = os.path.join(output_folder, f"{stem}_transposed_{semitone_shift:+d}.mid")

        try:
            mid = MidiFile(input_path)
            # Keep the source file's type and timing resolution (a bare MidiFile() defaults to type 1)
            transposed = MidiFile(type=mid.type, ticks_per_beat=mid.ticks_per_beat)

            for track in mid.tracks:
                new_track = MidiTrack()
                for msg in track:
                    # Only note messages are shifted; all other messages are copied unchanged
                    if msg.type in ('note_on', 'note_off'):
                        new_note = msg.note + semitone_shift
                        new_note = max(0, min(127, new_note))  # Clamp to MIDI range
                        new_msg = msg.copy(note=new_note)  # copy() keeps time, channel and velocity
                        new_track.append(new_msg)
                    else:
                        new_track.append(msg)
                transposed.tracks.append(new_track)

            transposed.save(output_path)
            print(f"Saved: {output_path}")
        except Exception as e:
            print(f"Error processing {file}: {e}")

# === CONFIG ===
input_folder = "your_midi_folder"
output_folder = "transposed_midi_folder"
semitone_shift = 2  # e.g. 2 = two semitones up, -3 = three semitones down

# === RUN ===
transpose_midi_folder(input_folder, output_folder, semitone_shift)
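
Clamping maps every out-of-range note onto 0 or 127, so distinct pitches at the edge of the range can collapse into one. As one possible alternative, here is a small sketch that either checks up front whether a shift fits, so the file can be skipped, or folds stray notes back into range by whole octaves. The helper names shift_fits and fold_into_range are my own and are not part of mido; they work on plain note numbers, so they could be called in place of the clamp line above.

```python
MIDI_MIN, MIDI_MAX = 0, 127  # valid MIDI note numbers


def shift_fits(notes, semitone_shift):
    """Return True if every note stays within 0..127 after the shift."""
    notes = list(notes)
    if not notes:
        return True  # nothing to transpose, so nothing can go out of range
    lowest = min(notes) + semitone_shift
    highest = max(notes) + semitone_shift
    return MIDI_MIN <= lowest and highest <= MIDI_MAX


def fold_into_range(note):
    """Move an out-of-range note by whole octaves until it is valid.

    This keeps the pitch class (C stays C), unlike clamping.
    """
    while note > MIDI_MAX:
        note -= 12
    while note < MIDI_MIN:
        note += 12
    return note


if __name__ == "__main__":
    c_major = [60, 64, 67]
    print(shift_fits(c_major, 2))    # shift stays inside the MIDI range
    print(shift_fits([126], 2))      # 128 would be out of range
    print(fold_into_range(126 + 2))  # folded down one octave
```

Skipping files that do not fit keeps the augmented data free of distorted notes, while octave folding keeps every file but changes the register of a few notes. Which trade-off is better probably depends on the dataset.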

[1] Mido – MIDI Objects for Python

[2] End-to-end optical music recognition for piano form sheet music