AudioResearchBlog

Covering all audio related stuff with special focus on programming and digital signal processing

Archive for the 'speech' Category

Talk: ‘Tackling the Acoustic Front-end for Distant-Talking Automatic Speech’ in Buenos Aires

Posted by hordia on 1st September 2007

I got word via the IEEE Argentina mail newsletter that next Monday, 3 September, the talk ‘Tackling the Acoustic Front-end for Distant-Talking Automatic Speech’ will be given by Prof. Dr.-Ing. Walter Kellermann, a Distinguished Lecturer of the IEEE Signal Processing Society. It takes place at the IEEE / CICOMRA offices, and admission is free.

 

Abstract

With the ever-growing interest in ‘natural’ hands-free acoustic human/machine interfaces, the need for according distant-talking automatic speech recognition (ASR) systems increases. Considering interactive TV as a challenging exemplary application scenario, we investigate the structural problems presented by noisy and reverberant multi-source environments with unpredictable interference and acoustic echoes of loudspeaker signals, and discuss current acoustic signal processing techniques to enhance the input to the actual ASR system. Special attention is paid to reverberation, which affects speech recognizers much more than human listeners, and a recently published method incorporating a reverberation model on the feature level of ASR is discussed.

 

About the speaker (for more details see this link)

Walter Kellermann is Professor for communications at the Chair of Multimedia Communications and Signal Processing of the University of Erlangen-Nuremberg, Germany. His current research interests include speech signal processing, array signal processing, adaptive filtering, and its applications to acoustic human/machine interfaces. He received the Dipl.-Ing. (univ.) degree in Electrical Engineering from the University of Erlangen-Nuremberg in 1983, and the Dr.-Ing. degree (‘with distinction’) from the Technical University Darmstadt, Germany, in 1988. From 1989 to 1990, he was a Postdoctoral Member of Technical Staff at AT&T Bell Laboratories, Murray Hill, NJ. In 1990, he joined Philips Kommunikations Industrie, Nuremberg, Germany. From 1993 to 1999 he was a professor at the Fachhochschule Regensburg before he joined the University Erlangen-Nuremberg as a professor and head of the audio research laboratory in 1999 (for more see http://www.LNT.de/audio). In 1999 he co-founded the consulting firm DSP Solutions. Dr. Kellermann authored or co-authored eight book chapters and more than 100 refereed papers in journals and conference proceedings. He served as a guest editor to various journals, as an associate editor and guest editor to IEEE Transactions on Speech and Audio Processing from 2000 to 2004, and presently serves as associate editor to the EURASIP Journals on Signal Processing and on Advances in Signal Processing. He was the general chair of the 5th International Workshop on Microphone Arrays in 2003 and the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics in 2005. For 2007 and 2008 he is a Distinguished Lecturer of the IEEE Signal Processing Society.

 

Talk details

Date and time: Monday, 3 September, at 19:00
Venue: Auditorio IEEE/CICOMRA, Av. Córdoba 744 Piso 1 B, Buenos Aires
Registration: There is no fee, but prior registration is requested through the web form available here. Alternatively, register by e-mail at sec.argentina@ieee.org, citing ‘Conferencia SPS-01’, or by phone at IEEE / CICOMRA, (011) 4325 8839.

There are not many talks or much activity around this kind of thing here, so I will try to go… and afterwards, to make time to write a review of it.


Posted in audio, acoustics, signal processing, Castellano, speech, talks, conferences | No Comments »

Introduction to CLAM

Posted by hordia on 12th July 2007

CLAM is a full-fledged framework for research and development in audio and music (this includes end-user applications as well). It offers a conceptual model and tools for the analysis, synthesis and processing of audio signals. It has a very friendly interface, is Free Software, is cross-platform, and is written in C++ (many of its applications run in real time).

Although it originated at a university in Barcelona, Spain, documentation in Spanish about this framework is scarce, so I decided to write a short introduction to the basics, with links (most of them in English, mind you) for anyone who wants to go further. I think it can help more than one person get started.

Basically there are two profiles: the end user (of the applications) and developers who want to write their own programs on top of this framework.

At the moment it consists of 4 main programs:
 
NetworkEditor:
An application for connecting modules into a processing network, in the style of pd (but much friendlier), MaxMSP or Reaktor (or, for Matlab users, like Simulink). These networks run in real time and can use jack, portaudio, LADSPA or VST as a backend.
One of its most interesting features is that the network can be exported and then run with a graphical interface designed in QTDesigner (both programs export to XML, which is then run with the Prototyper application).

I recommend watching this presentation: “Visual prototyping of audio applications”

In other words, a user who is not a programmer can build complex plugins or applications without writing a single line of code. It is also very useful for prototyping future applications or developments.
It is currently being integrated with LADSPA and lv2, and the plan is to further strengthen the ability to use external plugins (as just another module) inside the NetworkEditor and, vice versa, to use these networks as plugins in other applications.

For anyone who wants to get started, I recommend this:

 
Annotator:
A program for making transcriptions, in the style of Sonic Visualiser. Very powerful, with features that make it unique.

To learn more:

 
SMSTools:
An audio signal analyzer in the style of wavesurfer that supports several kinds of visualization, such as spectrograms and everything derived from the Sinusoids + Residual model, as well as complex transformations based on that model (gender change, pitch-shifting, morph, etc.) and much more (see the tutorial).
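To give a feel for what a Sinusoids + Residual decomposition means, here is a minimal sketch in Python. It is not how SMSTools or CLAM implement it: it handles a single short frame, picks only the strongest spectral peak, and uses a plain DFT. The function names and the toy signal are assumptions made up for this illustration.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform; fine for one short frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def split_sinusoid_residual(frame):
    """Split a frame into its strongest sinusoid and the leftover residual."""
    n = len(frame)
    spectrum = dft(frame)
    # Search positive-frequency bins only (skip DC and the mirrored half).
    peak = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    amplitude = 2.0 * abs(spectrum[peak]) / n
    phase = cmath.phase(spectrum[peak])
    sinusoid = [amplitude * math.cos(2 * math.pi * peak * t / n + phase)
                for t in range(n)]
    # Whatever the sinusoid does not explain is the residual.
    residual = [s - p for s, p in zip(frame, sinusoid)]
    return peak, sinusoid, residual


# Toy frame (hypothetical data): a cosine exactly on bin 5 plus a small ripple.
N = 64
frame = [0.8 * math.cos(2 * math.pi * 5 * t / N + 0.3) + 0.05 * (t % 3 - 1)
         for t in range(N)]
peak_bin, sinusoid, residual = split_sinusoid_residual(frame)
```

A real analysis would track many peaks across overlapping windowed frames and model the residual as filtered noise; the point here is only that the signal is split into a part explained by sinusoids and a part that is not.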

 
Voice2MIDI:
Converts voice to MIDI. It is discussed in this linuxjournal article.
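The last step of any voice-to-MIDI conversion is mapping a detected pitch to a note number. The sketch below shows only that step, using the standard equal-temperament relation with A4 = 440 Hz = MIDI note 69. It assumes the pitch has already been estimated somehow, and it is not taken from Voice2MIDI's source; the function names are made up for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_midi(freq_hz, a4_hz=440.0):
    """Nearest MIDI note number for a frequency (A4 is note 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # 12 semitones per octave, one octave per doubling of frequency.
    return int(round(69 + 12 * math.log2(freq_hz / a4_hz)))


def midi_to_name(note):
    """Note name with octave, using the convention that note 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"
```

For example, `midi_to_name(frequency_to_midi(261.63))` gives the name of middle C under this convention.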

 
For developers, it serves as an environment for easily building their own applications or as a tool for prototyping future implementations.

I recommend:

If you like, you can contribute to the project by sending code ‘patches’ to the main developers, and even become a ‘developer’ after having sent several of them.

 
In short, it is quite a large and ambitious project. There are even many more developments built on it that are not in the ‘main package’, but I thought it useful to give a general overview, because it is so wide-ranging that it may be a bit dizzying for someone who is just hearing about it.

As I said, it is cross-platform and available for GNU/Linux (with packages for several distributions), Windows and Mac (see more and download).

Other interesting links:



Posted in audio, algorithms, effects, signal processing, music, free software, programming, GNU/Linux, GPL, c++, noise, libraries, midi, publications, projects, Castellano, CLAM, standards, speech, GSoC2007, GUI, library | No Comments »

A ‘summer’ of code starts for me

Posted by hordia on 17th April 2007

Last week I got accepted into Google’s Summer of Code program, so I will be busy with it over the summer… ehm, s/summer/winter here… ;-)

I’m very happy about that!

Google granted 6 student slots to CLAM, so it’s a big success!!! All applications are listed here.

The scope of my application may vary (or change totally! see below) because I still have to meet with my mentor and adjust some details. Indeed, it seems it could end up being another of my GSoC applications. Beyond that, of course, it will be released under the GPL.

ATM, it will be:

Title: Educative Vowel Synthesizer
Mentor: Pau Arumí Albó
License: GNU General Public License (GPL)
Abstract: The main goal of this project is to build an application that lets the user synthesize different vowels by placing a point within the vowel triangle and, in reverse, places a dot on the triangle given an input vowel from the microphone. For example, this is useful to students who want to check their pronunciation. It includes displaying the mouth position for the vowel, visualizing the spectral peaks (and identifying their effect), and changing the pitch and vocal tract characteristics.
A teacher could limit the set of vowels to the ones used in a particular language, such as Catalan or English, so that the students see just the relevant ones for the exercise. It also includes some didactic games about identifying vowels by their spectral content.
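As a rough illustration of the two directions described in the abstract (point to vowel, and measured vowel to point), here is a small Python sketch. It treats the vowel triangle as a triangle in the plane of the first two formants (F1, F2). The formant values are approximate, illustrative figures for the corner vowels and are an assumption of this example, as are the function names; nothing here comes from the actual proposal or from CLAM.

```python
import math

# Approximate (F1, F2) in Hz for the corner vowels of the triangle.
# Illustrative values only; real speakers vary a lot.
CORNER_VOWELS = {
    "i": (270.0, 2290.0),
    "a": (730.0, 1090.0),
    "u": (300.0, 870.0),
}


def nearest_vowel(f1, f2):
    """Return the corner vowel closest to the point (f1, f2) in formant space."""
    return min(CORNER_VOWELS,
               key=lambda v: math.dist((f1, f2), CORNER_VOWELS[v]))


def point_in_triangle(weights):
    """Blend the corner vowels with barycentric weights into one (F1, F2) point."""
    total = sum(weights.values())
    f1 = sum(w * CORNER_VOWELS[v][0] for v, w in weights.items()) / total
    f2 = sum(w * CORNER_VOWELS[v][1] for v, w in weights.items()) / total
    return f1, f2
```

A synthesizer would feed the resulting (F1, F2) pair into resonant filters driven by a glottal pulse train, and the microphone path would first need a formant estimator. Plain Euclidean distance in Hz is also a simplification, since it weights F2 more heavily than a perceptual scale would.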

I really don’t know ATM if it’s going to change or not, but that is what the GSoC page says so far ;-)

CONFIRMED: it’s going to change, maybe to something with real-time SMS transformations and real-time voice2midi. News soon… here: “My GSoC2007 application: Real-time spectral transformations”

My mentor will be Pau Arumí Albó, one of the main developers of the CLAM project and a member of Universitat Pompeu Fabra (Barcelona, Spain). I met him recently on IRC channels and mailing lists, and he seems very kind. He’s a free software enthusiast and teaches software engineering. He has also developed other free software, such as testfarm and MiniCppUnit, and has many publications. Recently, for example, he was at the LAC2007 conference with David García Garzón, presenting the work “Visual prototyping of audio applications”.

I had heard a bit about CLAM before this GSoC (I had played with it a little too), and starting to develop with it was indeed on my (long) to-do list. But I didn’t know that they were open to new developers, so as soon as I found out about its GSoC participation (the bad part was that this was not long before the deadline) I had no doubts about applying. Luckily, Google extended the deadline a little, and there was enough time to submit (‘only’) 4 applications, though I would have submitted 20 (the GSoC limit!) if I had had enough time. During that time I was thinking, researching CLAM, imagining, reading some source code and documentation and, of course, writing proposals, hehehe. It may seem that I am exaggerating, but it’s true :-) From the start I had discarded the idea of applying to a different organization.

My first goal was to have a good opportunity to develop for or on top of this framework, and I had decided to stay close to the project whatever the outcome. I came here to learn a lot and give my best. They gave me a warm welcome from the beginning. I hope to be able to continue developing for CLAM after completing GSoC (and completing it well, of course!).

Some days ago I introduced myself to the CLAM community here, and I have also become a member of its planet! Great!

And last but not least, I want to thank the entire CLAM development group for the opportunity! Thanks!!!



Posted in audio, algorithms, programming, GNU/Linux, GPL, c++, libraries, projects, English, CLAM, speech, GSoC2007 | No Comments »

 