Search Results
Now showing 1 - 3 of 3
- AES White Paper: Best Practices in Network Audio (Publication). Bouillot, Nicolas; Cohen, Elizabeth; Cooperstock, Jeremy R.; Floros, Andreas; Fonseca, Nuno; Foss, Richard; Goodman, Michael; Grant, John; Gross, Kevin; Harris, Steven; Harshbarger, Brent; Heyraud, Joffrey; Jonsson, Lars; Narus, John; Page, Michael; Snook, Tom; Tanaka, Atau; Trieger, Justin; Zanghieri, Umberto. Analog audio needs a separate physical circuit for each channel. Each microphone in a studio or on a stage, for example, must have its own circuit back to the mixer. Routing of the signals is inflexible. Digital audio is frequently wired in a similar way to analog, although several channels can share a single physical circuit (e.g., up to 64 with AES10), reducing the number of cores needed in a cable. Routing of signals is still inflexible, and any change to the equipment in a location is liable to require new cabling. Networks allow much more flexibility: any piece of equipment plugged into the network is able to communicate with any other. However, installers of audio networks need to be aware of a number of issues that affect audio signals but are not important for data networks and are not addressed by current IT networking technologies such as IP. This white paper examines these issues and provides guidance to installers and users that can help them build successful networked systems. [A minimal packet-timing sketch illustrating these constraints follows this list.]
- Impulse Response Upmixing Using Particle Systems (Publication). Fonseca, Nuno. With the increase in the computational power of DSPs and CPUs, impulse responses (IRs) and the convolution process are becoming a very popular approach to recreating audio effects such as reverb. But although many IR repositories exist, most IRs consider only mono or stereo. This paper presents an approach to impulse response upmixing using particle systems. Using a reverse-engineering process, a particle system is created that is capable of reproducing the original impulse response. By re-rendering the obtained particle system with virtual microphones, an upmixed result can be obtained. Depending on the type of virtual microphone, several different output formats can be supported, ranging from stereo to surround, and including binaural support, Ambisonics, or even custom speaker scenarios (VBAP). [A sketch of the re-rendering step appears after this list.]
- Segment Level Voice Conversion with Recurrent Neural Networks (Publication). Ramos, Miguel Varela; Black, Alan W.; Astudillo, Ramon Fernandez; Trancoso, Isabel; Fonseca, Nuno. Voice conversion techniques aim to modify a subject's voice characteristics in order to mimic those of another person. Due to the difference in utterance length between source and target speaker, state-of-the-art voice conversion systems often rely on a frame-alignment pre-processing step. This step aligns the entire utterances with algorithms such as dynamic time warping (DTW), which introduce errors that hinder system performance. In this paper we present a new technique that avoids the alignment of entire utterances at the frame level while keeping the local context during training. For this purpose, we combine an RNN model with the use of phoneme- or syllable-level information obtained from a speech recognition system. This system divides the utterances into segments, which can then be grouped into overlapping windows, providing the context the model needs to learn temporal dependencies. We show that with this approach, notable improvements can be attained over a state-of-the-art RNN voice conversion system on the CMU ARCTIC database. It is also worth noting that with this technique it is possible to halve the training data size and still outperform the baseline. [A sketch of the segment-windowing step appears after this list.]
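The white paper's central concern is that audio imposes timing constraints that ordinary IT traffic does not. As a rough illustration of the arithmetic involved, here is a minimal Python sketch; the sample rate, packet interval, channel count, and jitter figure are assumed values chosen for illustration, not numbers taken from the paper.

```python
# Minimal sketch (not from the white paper) of why timing matters for
# networked audio: packets arrive roughly every millisecond, so every
# millisecond of jitter must be absorbed by buffering, which adds latency.

SAMPLE_RATE_HZ = 48_000      # assumed professional sample rate
PACKET_TIME_MS = 1.0         # assumed packet interval
CHANNELS = 64                # AES10 (MADI) carries up to 64 channels per link
BITS_PER_SAMPLE = 24         # assumed word length

samples_per_packet = int(SAMPLE_RATE_HZ * PACKET_TIME_MS / 1000)
payload_bytes = samples_per_packet * CHANNELS * BITS_PER_SAMPLE // 8
packets_per_second = 1000 / PACKET_TIME_MS

print(f"{samples_per_packet} samples per channel in each packet")
print(f"{payload_bytes} payload bytes per packet for {CHANNELS} channels")
print(f"{packets_per_second:.0f} packets per second")

# A receiver must buffer at least the worst-case network jitter; each extra
# millisecond of buffering adds a millisecond of end-to-end latency, a
# constraint that bulk data traffic simply does not have.
jitter_ms = 2.0              # hypothetical worst-case jitter
buffer_samples = int(SAMPLE_RATE_HZ * jitter_ms / 1000)
print(f"~{buffer_samples} samples of buffering to absorb {jitter_ms} ms jitter")
```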
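The upmixing paper's key step is re-rendering an extracted particle system through virtual microphones. Below is a minimal Python/NumPy sketch of that re-rendering idea under simplifying assumptions (a handful of hand-picked particles, simple cardioid patterns, no diffuse tail); it illustrates the concept and is not the author's implementation.

```python
import numpy as np

# Sketch of the re-rendering step: given "particles" assumed to have been
# extracted from a mono impulse response (delay in samples, amplitude,
# azimuth of arrival), render them through virtual microphones to build a
# multichannel IR. The reverse-engineering (extraction) step is not shown.

def cardioid_gain(azimuth_rad, mic_azimuth_rad):
    """Gain of a virtual cardioid microphone aimed at mic_azimuth_rad."""
    return 0.5 * (1.0 + np.cos(azimuth_rad - mic_azimuth_rad))

def render_particles(particles, mic_azimuths_rad, ir_length):
    """particles: iterable of (delay_samples, amplitude, azimuth_rad)."""
    out = np.zeros((len(mic_azimuths_rad), ir_length))
    for delay, amp, az in particles:
        for ch, mic_az in enumerate(mic_azimuths_rad):
            out[ch, int(delay)] += amp * cardioid_gain(az, mic_az)
    return out

# Hypothetical particles: direct sound plus two early reflections.
particles = [(0, 1.0, 0.0), (480, 0.5, np.pi / 3), (960, 0.3, -np.pi / 2)]

# Stereo pair of virtual cardioids at +/-45 degrees; swapping in more
# microphones (a surround layout, an Ambisonics encoder, or a custom VBAP
# speaker set) would yield the other output formats mentioned in the abstract.
stereo_ir = render_particles(particles, [np.pi / 4, -np.pi / 4], ir_length=1024)
print(stereo_ir.shape)  # (2, 1024)
```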
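The voice-conversion paper replaces whole-utterance DTW alignment with phoneme- or syllable-level segments grouped into overlapping windows. The following Python/NumPy sketch shows one plausible way to build such windows from recogniser-provided segment boundaries; the function name, window size, and data are illustrative assumptions, and the RNN training itself is omitted.

```python
import numpy as np

# Sketch of the windowing idea: acoustic frames are grouped by
# phoneme/syllable segments (boundaries assumed to come from a speech
# recogniser), and adjacent segments are joined into overlapping windows
# that preserve local context for a recurrent model.

def segment_windows(frames, boundaries, window_size=3, hop=1):
    """frames: (n_frames, n_features) array.
    boundaries: list of (start, end) frame indices, one per segment.
    Returns a list of arrays, each covering `window_size` consecutive
    segments and advancing by `hop` segments, so successive windows overlap."""
    windows = []
    for i in range(0, len(boundaries) - window_size + 1, hop):
        start = boundaries[i][0]
        end = boundaries[i + window_size - 1][1]
        windows.append(frames[start:end])
    return windows

# Hypothetical data: 100 frames of 40-dim features and six phoneme segments.
frames = np.random.randn(100, 40)
boundaries = [(0, 15), (15, 30), (30, 55), (55, 70), (70, 85), (85, 100)]
for w in segment_windows(frames, boundaries):
    print(w.shape)  # variable-length windows, each spanning three segments
```

Each window would then be fed to the recurrent model as one training sequence, which is how local temporal context is kept without aligning entire utterances frame by frame.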
