Browsing by Author "Fonseca, Nuno"
- AES white paper: best practices in network audio
  Publication. Bouillot, Nicolas; Cohen, Elizabeth; Cooperstock, Jeremy R.; Floros, Andreas; Fonseca, Nuno; Foss, Richard; Goodman, Michael; Grant, John; Gross, Kevin; Harris, Steven; Harshbarger, Brent; Heyraud, Joffrey; Jonsson, Lars; Narus, John; Page, Michael; Snook, Tom; Tanaka, Atau; Trieger, Justin; Zanghieri, Umberto
  Analog audio needs a separate physical circuit for each channel. Each microphone in a studio or on a stage, for example, must have its own circuit back to the mixer, and routing of the signals is inflexible. Digital audio is frequently wired in a similar way to analog. Although several channels can share a single physical circuit (e.g., up to 64 with AES10), thus reducing the number of cores needed in a cable, routing of signals is still inflexible, and any change to the equipment in a location is liable to require new cabling. Networks allow much more flexibility: any piece of equipment plugged into the network is able to communicate with any other. However, installers of audio networks need to be aware of a number of issues that affect audio signals but are not important for data networks and are not addressed by current IT networking technologies such as IP. This white paper examines these issues and provides guidance that can help installers and users build successful networked systems.
- Concatenative singing voice resynthesis
  Publication. Fonseca, Nuno; Ferreira, Anibal; Rocha, Ana Paula
  The concept of capturing the sound of “something” for later replication is not new, and it is used in many synthesizers. Capturing sounds and using them as an audio effect, however, is less common. This paper presents an approach to the resynthesis of a singing voice, based on concatenative techniques, that uses pre-recorded audio material as a high-level semantic audio effect, replacing an original audio recording with the sound of a different singer while trying to keep the same musical/phonetic performance.
- Impulse response upmixing using particle systems
  Publication. Fonseca, Nuno
  With the increase in the computational power of DSPs and CPUs, impulse responses (IRs) and the convolution process are becoming a very popular approach to recreating audio effects such as reverb. But although many IR repositories exist, most IRs are only mono or stereo. This paper presents an approach to impulse response upmixing using particle systems. Through a reverse engineering process, a particle system is created that is capable of reproducing the original impulse response. By re-rendering the obtained particle system with virtual microphones, an upmixing result can be obtained. Depending on the type of virtual microphone, several different output formats can be supported, ranging from stereo to surround, and including binaural support, Ambisonics, or even custom speaker scenarios (VBAP).
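The convolution process that this abstract takes as its starting point can be sketched in a few lines. The direct-form implementation and toy signals below are purely illustrative (they are not code from the paper, and real convolution reverbs use FFT-based methods for speed):

```python
def convolve(dry, ir):
    """Apply an impulse response to a dry signal by direct convolution.

    Each input sample triggers a scaled, delayed copy of the IR; the
    copies overlap and sum, which is how convolution reverb works.
    """
    wet = [0.0] * (len(dry) + len(ir) - 1)
    for i, s in enumerate(dry):
        for j, h in enumerate(ir):
            wet[i + j] += s * h
    return wet

# A unit impulse as input reproduces the IR itself.
print(convolve([1.0, 0.0], [0.5, 0.25]))  # [0.5, 0.25, 0.0]
```

Upmixing as described in the abstract would then replace the single IR with several re-rendered IRs (one per virtual microphone), convolving the dry signal once per output channel.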
- Segment Level Voice Conversion with Recurrent Neural Networks
  Publication. Ramos, Miguel Varela; Black, Alan W.; Astudillo, Ramon Fernandez; Trancoso, Isabel; Fonseca, Nuno
  Voice conversion techniques aim to modify a subject’s voice characteristics in order to mimic those of another person. Due to the difference in utterance length between source and target speakers, state-of-the-art voice conversion systems often rely on a frame-alignment pre-processing step. This step aligns entire utterances with algorithms such as dynamic time warping (DTW) that introduce errors, hindering system performance. In this paper we present a new technique that avoids the alignment of entire utterances at the frame level while keeping the local context during training. For this purpose, we combine an RNN model with the use of phoneme- or syllable-level information obtained from a speech recognition system, which segments the utterances into units that can then be grouped into overlapping windows, providing the context the model needs to learn the temporal dependencies. We show that with this approach notable improvements can be attained over a state-of-the-art RNN voice conversion system on the CMU ARCTIC database. It is also worth noting that with this technique it is possible to halve the training data size and still outperform the baseline.
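The DTW alignment step this paper avoids follows a standard cumulative-cost recursion. In the sketch below, the scalar sequences and the absolute-difference local cost are illustrative stand-ins for the acoustic feature frames a real system would align:

```python
def dtw_cost(a, b):
    """Cumulative cost of the best monotonic alignment between a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] stretched against b
                cost[i][j - 1],      # b[j-1] stretched against a
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]

# A stretched copy of a sequence still aligns at zero cost.
print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The stretching moves are exactly where alignment errors creep in when source and target utterances differ in length, which motivates the paper's segment-level alternative.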
- Singing voice resynthesis using concatenative-based techniques
  Publication. Fonseca, Nuno
  Singing has an important role in our lives, and although synthesizers have been trying to replicate every musical instrument for decades, it was only during the last nine years that commercial singing synthesizers started to appear, allowing music and text to be merged, i.e., singing. These solutions may produce realistic results in some situations, but they require time-consuming processes and experienced users. The goal of this research work is to develop, create, or adapt techniques that allow the resynthesis of the singing voice, i.e., that allow the user to directly control a singing voice synthesizer using his/her own voice. The synthesizer should replicate, as closely as possible, the same melody, the same phonetic sequence, and the same musical performance. Initially, some work was developed on resynthesizing piano recordings with evolutionary approaches, using genetic algorithms, where a population of individuals (candidate solutions) representing sequences of music notes evolves over time, trying to match an original audio stream. Later, the focus returned to the singing voice, exploring techniques such as Hidden Markov Models and Neural Network Self-Organizing Maps, among others. Finally, a concatenative unit selection approach was chosen as the core of a singing voice resynthesis system. By extracting energy, pitch, and phonetic information (MFCC, LPC), and using it within a phonetic-similarity Viterbi-based unit selection system, a sequence of frames from an internal sound library is chosen to replicate the original audio performance. Although audio artifacts still exist, preventing its use in professional applications, the concept of a new audio tool was created that presents high potential for future work, not only on the singing voice but in other musical and speech domains.
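Viterbi-based unit selection, the core of the system described above, can be sketched as a shortest-path search that trades a target cost (how well a library unit matches the input frame) against a join cost (how smoothly consecutive units concatenate). The scalar "features", absolute-difference costs, and `join_weight` parameter below are illustrative assumptions, not the actual MFCC/LPC features or costs of this work:

```python
def select_units(target, library, join_weight=0.1):
    """Pick one library unit per target frame via Viterbi search,
    minimizing target cost (fit to input) plus join cost (smoothness).
    """
    n, m = len(target), len(library)
    # First frame: pure target cost, no predecessor.
    best = [abs(library[j] - target[0]) for j in range(m)]
    back = []  # back[t-1][j] = best predecessor unit for unit j at frame t
    for t in range(1, n):
        new, ptr = [0.0] * m, [0] * m
        for j in range(m):
            # Join cost penalizes jumps between consecutive library units.
            k = min(range(m),
                    key=lambda k: best[k] + join_weight * abs(library[j] - library[k]))
            ptr[j] = k
            new[j] = (abs(library[j] - target[t])
                      + best[k] + join_weight * abs(library[j] - library[k]))
        best = new
        back.append(ptr)
    # Backtrack from the cheapest final unit.
    j = min(range(m), key=best.__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]

print(select_units([0.1, 1.1, 2.1], [0.0, 1.0, 2.0]))  # [0, 1, 2]
```

In the actual system each "unit" would be a sound-library frame described by energy, pitch, and phonetic features, with the target cost replaced by a phonetic similarity measure.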
- Singing voice resynthesis using vocal sound libraries
  Publication. Fonseca, Nuno; Ferreira, Aníbal
  Although resynthesis may seem a simple analysis/synthesis process, it is quite a complex task, even more so when it comes to recreating a singing voice. This paper presents a system whose goal is to start with an original audio stream of someone singing and recreate the same performance (melody, phonetics, dynamics) using an internal vocal sound library (choir or solo voice). By extracting dynamics and pitch information, and looking for phonetic similarities between the original audio frames and the frames of the sound library, a completely new audio stream is created. The obtained audio results, although not perfect (mainly due to the existence of audio artifacts), show that this technological approach may become an extremely powerful audio tool.
