Data Augmentation for MIDI: Transposing
One of the data augmentation techniques for music datasets is transposing. In my opinion, it is quite useful for OMR (optical music recognition) training, i.e. the image-to-notation process (score image to music notation), where it adds variety in things like stem directions and clusters. However, I disagree with using it for musical training, because transposing creates artificial scores that were never composed by real musicians. In that sense, it manipulates the perceptual reality in the dataset.
Here, I implemented a transposing script that processes a whole folder of MIDI files in one go. The “mido” library is very easy to use for MIDI data. If you want to work with MusicXML files, “music21” is another great library. To handle edge cases, I decided to clamp transposed notes to the valid MIDI range (0 to 127). A different approach could be used; this is open for discussion.
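As a small aside on the clamping choice, here is a minimal sketch that compares three possible policies on plain lists of MIDI note numbers. It does not use mido. The names `transpose_notes` and `fold_into_range` and the `policy` argument are hypothetical, introduced only for illustration. The full script that follows uses only the "clamp" behavior.

```python
def fold_into_range(note, low=0, high=127):
    """Shift an out-of-range note by whole octaves until it fits in [low, high]."""
    while note > high:
        note -= 12
    while note < low:
        note += 12
    return note


def transpose_notes(notes, semitone_shift, policy="clamp"):
    """Transpose a list of MIDI note numbers under one of three policies.

    "clamp": pin out-of-range notes to 0 or 127 (what the script does).
    "fold":  move out-of-range notes by octaves, keeping the pitch class.
    "skip":  return None if any note would leave the range.
    """
    shifted = [n + semitone_shift for n in notes]
    if policy == "clamp":
        return [max(0, min(127, n)) for n in shifted]
    if policy == "fold":
        return [fold_into_range(n) for n in shifted]
    if policy == "skip":
        return shifted if all(0 <= n <= 127 for n in shifted) else None
    raise ValueError(f"unknown policy: {policy}")


notes = [60, 64, 126]
print(transpose_notes(notes, 2, "clamp"))  # [62, 66, 127]
print(transpose_notes(notes, 2, "fold"))   # [62, 66, 116]
print(transpose_notes(notes, 2, "skip"))   # None
```

The trade-off shows up in the last note. Clamping moves 126 up by only one semitone, so the interval to the other notes changes. Folding keeps the right pitch class but jumps down an octave. Skipping leaves the music untouched but produces no augmented copy for that shift.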
import os
from mido import MidiFile, MidiTrack


def transpose_midi_folder(input_folder, output_folder, semitone_shift):
    os.makedirs(output_folder, exist_ok=True)

    for file in os.listdir(input_folder):
        if not file.lower().endswith(".mid"):
            continue

        input_path = os.path.join(input_folder, file)
        output_path = os.path.join(output_folder, file[:-4] + f"_transposed_{semitone_shift:+d}.mid")

        try:
            mid = MidiFile(input_path)
            transposed = MidiFile()
            transposed.ticks_per_beat = mid.ticks_per_beat  # Ensure timing stays intact

            for track in mid.tracks:
                new_track = MidiTrack()
                for msg in track:
                    if msg.type in ['note_on', 'note_off']:
                        new_note = msg.note + semitone_shift
                        new_note = max(0, min(127, new_note))  # Clamp to MIDI range
                        new_msg = msg.copy(note=new_note, time=msg.time)
                        new_track.append(new_msg)
                    else:
                        new_track.append(msg)
                transposed.tracks.append(new_track)

            transposed.save(output_path)
            print(f"Saved: {output_path}")
        except Exception as e:
            print(f"Error processing {file}: {e}")


# === CONFIG ===
input_folder = "your_midi_folder"
output_folder = "transposed_midi_folder"
semitone_shift = 2  # (e.g. 2 -> two semitones up, -3 three semitones down)

# === RUN ===
transpose_midi_folder(input_folder, output_folder, semitone_shift)

[1] Mido – MIDI Objects for Python
[2] End-to-end optical music recognition for piano form sheet music