Browsing by Author "Fonseca, Nuno"
Now showing 1 - 4 of 4
- AES white paper: best practices in network audio
  Publication · Bouillot, Nicolas; Cohen, Elizabeth; Cooperstock, Jeremy R.; Floros, Andreas; Fonseca, Nuno; Foss, Richard; Goodman, Michael; Grant, John; Gross, Kevin; Harris, Steven; Harshbarger, Brent; Heyraud, Joffrey; Jonsson, Lars; Narus, John; Page, Michael; Snook, Tom; Tanaka, Atau; Trieger, Justin; Zanghieri, Umberto
  Analog audio needs a separate physical circuit for each channel. Each microphone in a studio or on a stage, for example, must have its own circuit back to the mixer, and routing of the signals is inflexible. Digital audio is frequently wired in a similar way to analog, although several channels can share a single physical circuit (e.g., up to 64 with AES10), reducing the number of cores needed in a cable. Routing of signals is still inflexible, and any change to the equipment in a location is liable to require new cabling. Networks allow much more flexibility: any piece of equipment plugged into the network can communicate with any other. However, installers of audio networks need to be aware of a number of issues that affect audio signals but are not important for data networks and are not addressed by current IT networking technologies such as IP. This white paper examines these issues and provides guidance that can help installers and users build successful networked systems.
- Impulse response upmixing using particle systems
  Publication · Fonseca, Nuno
  With the increase in the computational power of DSPs and CPUs, impulse responses (IRs) and the convolution process have become a popular approach to recreating audio effects such as reverb. But although many IR repositories exist, most IRs cover only mono or stereo. This paper presents an approach to impulse response upmixing using particle systems. Through a reverse-engineering process, a particle system is created that is capable of reproducing the original impulse response. By re-rendering the obtained particle system with virtual microphones, an upmixed result can be obtained. Depending on the type of virtual microphone, several different output formats can be supported, ranging from stereo to surround, and including binaural support, Ambisonics, or even custom speaker scenarios (VBAP).
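  The re-rendering step can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation: the particle parameters (arrival time, amplitude, azimuth), the cardioid virtual-microphone model, and all numeric values are assumptions chosen for the example; extracting particles from a real IR is not shown.

  ```python
  import numpy as np

  def render_particles(particles, length, mic_azimuths):
      """Render (time, amplitude, azimuth) particles through simple cardioid
      virtual microphones, producing one output channel per microphone."""
      out = np.zeros((len(mic_azimuths), length))
      for t, amp, az in particles:
          for ch, mic_az in enumerate(mic_azimuths):
              # Cardioid pickup pattern: 0.5 * (1 + cos(source angle - mic angle)).
              gain = 0.5 * (1.0 + np.cos(az - mic_az))
              out[ch, t] += amp * gain
      return out

  # Toy particle set: a frontal direct sound plus two lateral reflections.
  particles = [(0, 1.0, 0.0), (10, 0.5, np.pi / 2), (25, 0.25, -np.pi / 2)]
  # Two virtual microphones angled left/right yield a stereo upmix of the IR.
  stereo_ir = render_particles(particles, 32, mic_azimuths=[-np.pi / 4, np.pi / 4])
  ```

  Swapping in more virtual microphones (e.g., five azimuths for a surround layout) changes only the `mic_azimuths` list, which is what makes the particle representation format-agnostic.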
- Segment Level Voice Conversion with Recurrent Neural Networks
  Publication · Ramos, Miguel Varela; Black, Alan W.; Astudillo, Ramon Fernandez; Trancoso, Isabel; Fonseca, Nuno
  Voice conversion techniques aim to modify a subject's voice characteristics in order to mimic those of another person. Due to the difference in utterance length between source and target speaker, state-of-the-art voice conversion systems often rely on a frame-alignment pre-processing step. This step aligns entire utterances with algorithms such as dynamic time warping (DTW), which introduce errors that hinder system performance. In this paper we present a new technique that avoids the alignment of entire utterances at the frame level, while keeping local context during training. For this purpose, we combine an RNN model with phoneme- or syllable-level information obtained from a speech recognition system. This system divides the utterances into segments, which can then be grouped into overlapping windows, providing the context the model needs to learn temporal dependencies. We show that with this approach, notable improvements can be attained over a state-of-the-art RNN voice conversion system on the CMU ARCTIC database. It is also worth noting that with this technique it is possible to halve the training data size and still outperform the baseline.
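  The grouping of recognized segments into overlapping windows can be sketched as below. This is an assumed illustration of the windowing idea only; the function name, window size, hop, and the toy phone sequence are not from the paper.

  ```python
  def segment_windows(segments, size=3, hop=1):
      """Group consecutive phoneme/syllable segments into overlapping windows,
      giving an RNN local temporal context without whole-utterance alignment."""
      return [segments[i:i + size]
              for i in range(0, len(segments) - size + 1, hop)]

  # Toy phone sequence from a recognizer; each window shares size - hop
  # segments with its neighbour, preserving context across boundaries.
  phones = ["sil", "h", "eh", "l", "ow", "sil"]
  windows = segment_windows(phones, size=3, hop=1)
  ```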
- Singing voice resynthesis using concatenative-based techniques
  Publication · Fonseca, Nuno
  Singing has an important role in our lives, and although synthesizers have been trying to replicate every musical instrument for decades, it was only during the last nine years that commercial singing synthesizers started to appear, offering the ability to merge music and text, i.e., singing. These solutions may produce realistic results in some situations, but they require time-consuming processes and experienced users. The goal of this research work is to develop, create, or adapt techniques that allow the resynthesis of the singing voice, i.e., that allow the user to directly control a singing voice synthesizer using his or her own voice. The synthesizer should replicate, as closely as possible, the same melody, the same phonetic sequence, and the same musical performance. Initially, some work explored resynthesizing piano recordings with evolutionary approaches, using genetic algorithms in which a population of individuals (candidate solutions) representing sequences of music notes evolves over time, trying to match an original audio stream. Later, the focus returned to the singing voice, exploring techniques such as Hidden Markov Models and Neural Network Self-Organizing Maps, among others. Finally, a concatenative unit selection approach was chosen as the core of a singing voice resynthesis system. By extracting energy, pitch, and phonetic information (MFCC, LPC), and using it within a phonetic-similarity Viterbi-based unit selection system, a sequence of frames from an internal sound library is chosen to replicate the original audio performance. Although audio artifacts still exist, preventing its use in professional applications, the concept of a new audio tool was created that shows high potential for future work, not only on the singing voice but in other musical and speech domains.
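  A Viterbi-based unit selection of the kind described can be sketched as a generic dynamic program over library frames. This is not the thesis's actual system: the squared-distance costs, the `w_concat` weight, and the one-dimensional toy features stand in for the real MFCC/LPC similarity measures.

  ```python
  import numpy as np

  def viterbi_unit_selection(targets, units, w_concat=0.5):
      """Pick one library unit per target frame, minimising the summed
      target cost (feature distance to the input frame) plus concatenation
      cost (feature distance between consecutive chosen units)."""
      T, U = len(targets), len(units)
      target_cost = np.array([[np.sum((t - u) ** 2) for u in units] for t in targets])
      concat_cost = np.array([[np.sum((a - b) ** 2) for b in units] for a in units])
      cost = np.full((T, U), np.inf)
      back = np.zeros((T, U), dtype=int)
      cost[0] = target_cost[0]
      for t in range(1, T):
          for j in range(U):
              # Best predecessor for unit j at frame t.
              step = cost[t - 1] + w_concat * concat_cost[:, j]
              back[t, j] = int(np.argmin(step))
              cost[t, j] = step[back[t, j]] + target_cost[t, j]
      # Backtrack from the cheapest final unit.
      path = [int(np.argmin(cost[-1]))]
      for t in range(T - 1, 0, -1):
          path.append(int(back[t, path[-1]]))
      return path[::-1]

  # Toy input: three target frames that exactly match three library units,
  # so the selected sequence should simply follow the library in order.
  targets = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
  units = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
  path = viterbi_unit_selection(targets, units)
  ```

  The concatenation term is what distinguishes this from a per-frame nearest-neighbour lookup: it penalises jumps between dissimilar library frames, which is the usual source of audible discontinuities in concatenative synthesis.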
