

=======================================================================
Statistical research on MIDI files.
=======================================================================
MIDI stands for 'Musical Instrument Digital Interface'.

Before my retirement there were some interesting publications about
mathematical research into the style of works in literature and music.
In literature it was possible to scan books for the frequency of
expressions, sentence lengths, word choice etc., as long as those
works were available in computer-readable form; tons of cards
were punched in order to provide that.

In music, it was somewhat harder. Someone wanting to do the same kind of
research needed to encode musical scores painstakingly by hand. Worse,
there was no computer-readable representation for music comparable to
EBCDIC and ASCII for written language. The MIDI standard was published
in the early 1980s, and the Standard MIDI File format that followed it
allows a portable representation of musical data across platforms.

Currently one can find literally tens of thousands of MIDI files on the
internet containing music as diverse as Bach and Metallica.  Because
of the standard format of these files, it is possible to reproduce the
music on any General MIDI capable sequencer and have it sound roughly
the same on all platforms, except for instrument quality.  Because the
music is represented note by note, it is possible to edit, transpose,
print scores, change instruments, tempo, etc. with little effort.

MIDI files can easily be created by using various computer and synthesizer
packages (cakewalk, playmidi, winjammer or similar).  A user can either
enter notes directly or have the computer record a live performance from
a MIDI instrument.  We can even scan cleanly printed sheet music and
transform the resulting TIFF file into a MIDI file using a program like
MIDISCAN (commercially available).

So it might be of some interest to set up statistics that allow comparisons
and classifications of comparable musical works.

The strictly formal definition of MIDI terms helps to satisfy the
statistical requirement of 'ceteris paribus' (all other things being
equal): every file encodes pitch, duration and loudness in the same way.

It is also possible to write a program that scans MIDI texts for given
tunes or motifs (retrieval).

MIDI defines notes numerically by semitones with middle C = 60.  Therefore
C sharp/D flat would be 61.  An octave below middle C would be 48.  The
numbers range from 0 to 127, which allows for a little more than 10 octaves
of range (the lowest notes lie below the range of human hearing); most
equipment does not implement the full range.  In equal temperament, each
semitone has 2^(1/12) times the frequency of the preceding note.  Pitch
bend may also be applied to vary the pitch away from the standard tuning
in small steps (generally 1/4096th of a semitone, up to one whole step).
Standard tuning is also definable on most hardware, but some devices are
fixed to A above middle C = 440 Hz.
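
The note arithmetic above can be sketched in a few lines of Python. This
is only an illustration and not part of the programs described here. The
function names are hypothetical, the octave label "C4" for note 60 is one
common convention among several, and the bend range of two semitones is
the usual default, which a device may override.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(note):
    """Name a MIDI note number, assuming the convention that 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


def note_frequency(note, bend=0, bend_range=2.0, a4=440.0):
    """Equal-tempered frequency in Hz of a MIDI note number.

    bend is the pitch bend as a signed offset from the centre position
    (-8192..8191); bend_range is the number of semitones reached at full
    deflection (assumed default: 2, i.e. one whole step).
    """
    semitones_from_a4 = (note - 69) + bend_range * bend / 8192.0
    return a4 * 2.0 ** (semitones_from_a4 / 12.0)


if __name__ == "__main__":
    for n in (48, 60, 61, 69):
        print(n, note_name(n), round(note_frequency(n), 2))
```

With these assumptions note 69 (the A above middle C) comes out at exactly
440 Hz and middle C at about 261.63 Hz; one bend step changes the pitch by
2/8192 = 1/4096 of a semitone.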

Four programs have been developed so far, all still under construction,
i.e. not free of bugs.

1) The ftp server ftp.gwd.de contains under the path
      pub/linux/mirrors/sunsite/apps/sound/players
   the file playmidi-2.3.tgz by Nathan Laredo.
   The program 'mftext' (Linux executable) and its source 'mftext.c'
   come from an older version of that package and are included here.
   mftext translates MIDI files into readable text, and a shell
   script 'filter' reduces that text to serve our purpose.
  
2) 'midstat', source code printed below. It computes statistics on intervals,
   lengths of tones and the frequency of chords (major, minor, seventh
   for any key).
   The main object of observation is a vector of 128 int values containing
   the volumes (velocities) of the simultaneously sounding tones. The cell
   number of the vector represents the pitch (0 through 127). Tones that
   do not sound are set to 0. Each tone event updates this vector (array).
   To find chords, each tone is mirrored into the lowest octave by
   converting its number modulo 12. When 3 or more tones are sounding at
   the same time, the corresponding bits of a short int number are set and
   this value is looked up in a list of well-known chords.
   The intervals are measured as differences between the highest-pitched
   tones in succession.

3) 'tonarten' finds, for any stretch of a piece, which key it is in.
   (However, a good MIDI file should contain a MIDI key signature event.)
   The program can probably be used for statements about the composer's
   way of modulating (source code printed below).
   The program works like this: take one tone and eliminate from
   a list of all keys those which do not contain the given tone.
   Continue with the next tone and so on until all keys are
   eliminated. Then go one step back and take the keys that were still
   left. When only one is left, the passage was undoubtedly in this
   key. (This will probably be foiled by accidentals.)
   
4) 'motive' compares a given melody with all the subsequences of a
   song and computes the correlation coefficient, thus finding
   phrases and similarities (also listed below). A correlation
   coefficient close to 1 indicates a probable quotation.
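
The chord lookup and interval measurement described for 'midstat' can be
sketched as follows. This is not the midstat source, only a Python
illustration of the same idea. All names are hypothetical, the chord list
is limited to major, minor and (assumed) dominant seventh chords, and
intervals are taken at each note-on, which may differ from what midstat
actually does.

```python
KEY_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Chord shapes as semitone steps above the root (an assumed, minimal list).
CHORD_SHAPES = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
    "seventh": (0, 4, 7, 10),
}


def build_chord_table():
    """Map a 12-bit pitch-class mask (bit 0 = C) to a chord name."""
    table = {}
    for root in range(12):
        for kind, steps in CHORD_SHAPES.items():
            mask = 0
            for step in steps:
                mask |= 1 << ((root + step) % 12)
            table[mask] = f"{KEY_NAMES[root]} {kind}"
    return table


CHORD_TABLE = build_chord_table()


def chord_of(velocities):
    """Name the chord in a 128-cell velocity vector, or return None."""
    mask = 0
    sounding = 0
    for pitch, velocity in enumerate(velocities):
        if velocity > 0:
            mask |= 1 << (pitch % 12)  # mirror into the lowest octave
            sounding += 1
    if sounding < 3:
        return None
    return CHORD_TABLE.get(mask)


def top_pitch(velocities):
    """Highest sounding pitch, or None if nothing sounds."""
    for pitch in range(127, -1, -1):
        if velocities[pitch] > 0:
            return pitch
    return None


def top_intervals(events):
    """Intervals between successive highest pitches.

    events is a list of (pitch, velocity) pairs; velocity 0 means note off.
    The top pitch is sampled after every note-on.
    """
    velocities = [0] * 128
    intervals = []
    previous = None
    for pitch, velocity in events:
        velocities[pitch] = velocity
        if velocity == 0:
            continue
        top = top_pitch(velocities)
        if previous is not None:
            intervals.append(top - previous)
        previous = top
    return intervals
```

For example, a vector in which only the cells 60, 64 and 67 are non-zero
is reported as "C major", and a melody that moves from 60 to 64 to 62
yields the intervals 4 and -2.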

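The elimination procedure described for 'tonarten' can be illustrated
with a short Python sketch. It is not the tonarten source. The names are
hypothetical, and the list of keys is assumed to be the twelve major
scales only (a natural minor key contains the same tones as its relative
major, so pure elimination cannot tell the two apart).

```python
KEY_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)

# For every major key, the set of pitch classes (0 = C) it contains.
KEYS = {
    name: {(root + step) % 12 for step in MAJOR_STEPS}
    for root, name in enumerate(KEY_NAMES)
}


def key_segments(pitches):
    """Split a list of MIDI pitches into passages with their possible keys.

    Returns a list of (start, end, keys) tuples, where pitches[start:end]
    is the passage and keys is the sorted list of keys that survived the
    elimination for that passage.
    """
    segments = []
    candidates = set(KEYS)
    start = 0
    for i, pitch in enumerate(pitches):
        remaining = {k for k in candidates if pitch % 12 in KEYS[k]}
        if not remaining:
            # All keys eliminated: go one step back, report what was
            # left, and start a new passage with the current tone.
            segments.append((start, i, sorted(candidates)))
            start = i
            remaining = {k for k in KEYS if pitch % 12 in KEYS[k]}
        candidates = remaining
    if pitches:
        segments.append((start, len(pitches), sorted(candidates)))
    return segments
```

A complete C major scale leaves only the key "C". The three tones
C, E and G alone leave "C", "F" and "G", which shows why short passages
often remain ambiguous.
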
These programs are activated by typing their names and giving the name
of the reduced MIDI file and the channels to be considered
as parameters.
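
The search performed by 'motive' can be sketched in Python as a sliding
window with the Pearson correlation coefficient. This is an illustration
under assumptions, not the motive source: the names and the threshold of
0.95 are hypothetical, and only pitches are compared, not durations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def find_motif(song, motif, threshold=0.95):
    """Return (start, coefficient) for every window that resembles motif."""
    hits = []
    m = len(motif)
    for start in range(len(song) - m + 1):
        r = pearson(motif, song[start:start + m])
        if r >= threshold:
            hits.append((start, r))
    return hits
```

Because the coefficient ignores a constant offset, a transposed copy of
the motif is found with a coefficient of 1. Very short motifs correlate
highly with many passages by chance, so the threshold needs care.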


Everybody is invited to participate in our discussion, and any cooperation
will be highly appreciated. Please find the bugs in the programs listed
above, contribute further programs, replace our programs with better
ones, and tell us about similar activities and ftp servers with MIDI
and wrk files.


Please contact

jmau@gwdg.de and/or gkoch@gwdg.de

Papers:

1) W. Fucks & J. Lauter, Exaktwissenschaftliche Musikanalyse, Koeln 1965.
   In: Forschungsberichte des Landes Nordrhein-Westfalen 1519.

2) W. Fucks, Mathematische Analyse der Formstruktur von Musik, Koeln 1958.
   In: Forschungsberichte des Wirtschafts- und Verkehrsministeriums
   Nordrhein-Westfalen 357.

3) R. Wille, Musiktheorie und Mathematik, Darmstadt 1985, Fachbereich
   Mathematik, Technische Hochschule Darmstadt, 870.


The programs mentioned above can be found at:

FTP      gwdg.de/pub/sound/midi
==============================================================================
