Mark Sheeky Artworks

Auto-Phoneme

I hate meetings, and fortunately, I rarely have to go to any. One time-consuming problem is taking the minutes: making notes and keeping a record. Today I thought of a new way to speed this up.

A live meeting on Microsoft Teams or Zoom could have its audio recorded, and transcribed into written text too, though the transcription is not very accurate. An accurate transcription might be ideal, but it would have to recognise every word and handle every language, as well as new, invented words, as a human stenographer would.

The sound could be recorded, but this is a huge increase in data storage compared with written notes, even for a 48 kbps (low quality) MP3 file. Meetings tend to be long, numerous and frequent, and are almost ubiquitous in the modern world. The quantity of data is a cost, as is the work of all of the transcription; the red tape of transcription can be a significant drain on any organisation.
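To put a rough number on that storage cost, here is a small sketch of the arithmetic for a constant-bitrate 48 kbps recording. The function name and the workload figures (five one-hour meetings a week for fifty weeks) are my own illustrative assumptions, and container overhead is ignored.

```python
def mp3_bytes(bitrate_kbps: int, minutes: int) -> int:
    """Approximate size in bytes of a constant-bitrate audio stream."""
    bits = bitrate_kbps * 1000 * minutes * 60
    return bits // 8


# One hour at 48 kbps.
one_hour = mp3_bytes(48, 60)
print(one_hour / 1_000_000, "MB per one-hour meeting")  # 21.6

# Hypothetical workload: 5 one-hour meetings a week, 50 weeks a year.
meetings_per_year = 5 * 50
print(one_hour * meetings_per_year / 1_000_000_000, "GB per year")  # 5.4
```

That is for one person's meetings only; an organisation would multiply it by every team that keeps recordings.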

I thought that the two could be crossed to solve the transcription problem. Rather than whole words, phonemes (the short units of sound that make up words) could be extracted from the source audio and, during playback, reproduced artificially at the same time and volume as in the recording. This phoneme-coded data could be a simple list (phoneme/time/volume), and the resulting audio stream should be intelligible while needing very little storage. The lack of transcription of the words would be a benefit: it would support all languages and all future words, and would not require a dictionary or database. Current transcriptions from audio are not designed to be played back at the correct time and volume, yet this information is emotionally very important to the meaning of phrases, and it is something which is lost in all but full recordings.
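The phoneme/time/volume list could be sketched as below. This is only an illustration under stated assumptions: the six-byte record layout, the tiny ARPAbet-style phoneme table, and the rate of about 12 phonemes per second are all hypothetical choices of mine. Recognising phonemes from audio and synthesising them on playback are not shown.

```python
import struct
from dataclasses import dataclass

# Hypothetical phoneme table; a real one would cover a full phoneme inventory.
PHONEMES = ["HH", "AH", "L", "OW"]

# Assumed record layout: phoneme ID (1 byte), start time in ms (4 bytes),
# volume 0-255 (1 byte). "<" means little-endian with no padding.
RECORD = struct.Struct("<BIB")


@dataclass
class PhonemeEvent:
    phoneme_id: int  # index into PHONEMES
    time_ms: int     # when the phoneme starts in the meeting
    volume: int      # loudness, 0 (silent) to 255 (loudest)


def encode(events: list[PhonemeEvent]) -> bytes:
    """Pack the event list into a flat byte string."""
    return b"".join(
        RECORD.pack(e.phoneme_id, e.time_ms, e.volume) for e in events
    )


def decode(data: bytes) -> list[PhonemeEvent]:
    """Unpack a byte string back into the event list."""
    return [PhonemeEvent(*fields) for fields in RECORD.iter_unpack(data)]


def bytes_per_hour(phonemes_per_second: int = 12) -> int:
    """Storage for one hour of continuous speech at an assumed phoneme rate."""
    return RECORD.size * phonemes_per_second * 3600


# "hello", spoken quietly at the start of a meeting.
hello = [
    PhonemeEvent(0, 0, 90),
    PhonemeEvent(1, 70, 110),
    PhonemeEvent(2, 150, 100),
    PhonemeEvent(3, 230, 95),
]
data = encode(hello)
print(len(data), "bytes for", [PHONEMES[e.phoneme_id] for e in decode(data)])
print(bytes_per_hour(), "bytes per hour")  # 259200
```

Under these assumptions an hour of speech takes roughly a quarter of a megabyte, against about 21.6 MB for the same hour as a 48 kbps MP3. Because only sounds are stored, nothing in the format depends on any particular language or vocabulary.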