Proposals #17968
Closed
VoIP call center
10%
Updated by Ernad Husremović almost 17 years ago
http://www.freeswitch.org/node/42
We have just finished the interface for our new speech recognition abstraction layer. With the new API, a speech recognizer can be attached in the background, react to particular speech, and turn the recognized text into events that are passed to the channel the same way as DTMF and text messages. We wrapped this up in a high-level interface and made an example IVR that illustrates a pizza ordering system written in pure JavaScript.
http://www.freeswitch.org/eg/js/asr/pizza_js.html
The best part is that we can also operate the speech recognizer at 16 kHz, which vastly improves accuracy. It is even possible to start a bridged call and have the recognizer still fire events back to your script, for example for on-the-fly voice-powered hangup or transfer.
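The idea of delivering recognized speech to the channel "the same way as DTMF" can be illustrated with a minimal sketch. This is not the FreeSWITCH API: `ChannelEvent`, `Channel`, and `make_pizza_handler` are hypothetical names invented for this illustration, and the code only shows one handler receiving both kinds of input.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChannelEvent:
    kind: str  # "dtmf" or "speech" (hypothetical labels)
    data: str  # the digit pressed, or the recognized text


class Channel:
    """Toy channel that delivers DTMF and speech results to the same callbacks."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[ChannelEvent], None]] = []

    def on_input(self, handler: Callable[[ChannelEvent], None]) -> None:
        self._handlers.append(handler)

    def push(self, event: ChannelEvent) -> None:
        # Every registered handler sees every event, whatever its source.
        for handler in self._handlers:
            handler(event)


def make_pizza_handler(order: List[str]) -> Callable[[ChannelEvent], None]:
    """Build a handler that records pizza sizes from key presses or speech."""
    sizes = {"1": "small", "2": "medium", "3": "large"}

    def handle(event: ChannelEvent) -> None:
        if event.kind == "dtmf" and event.data in sizes:
            order.append(sizes[event.data])
        elif event.kind == "speech":
            for word in event.data.lower().split():
                if word in sizes.values():
                    order.append(word)

    return handle


order: List[str] = []
channel = Channel()
channel.on_input(make_pizza_handler(order))
channel.push(ChannelEvent("dtmf", "2"))
channel.push(ChannelEvent("speech", "I would like a large pizza"))
print(order)  # ['medium', 'large']
```

Because the script only sees events, the same handler could in principle stay registered while a call is bridged, which is the scenario the announcement describes.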
Updated by Ernad Husremović almost 17 years ago
- Status changed from New to Assigned
Updated by Ernad Husremović almost 17 years ago
http://en.wikipedia.org/wiki/Pattern_recognition
What is HTK?
The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research although it has been used for numerous other applications including research into speech synthesis, character recognition and DNA sequencing. HTK is in use at hundreds of sites worldwide.
HTK consists of a set of library modules and tools available in C source form. The tools provide sophisticated facilities for speech analysis, HMM training, testing and results analysis. The software supports HMMs using both continuous density mixture Gaussians and discrete distributions and can be used to build complex HMM systems. The HTK release contains extensive documentation and examples.
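To make "HMMs with discrete distributions" concrete, here is a minimal sketch of the forward algorithm, which computes the likelihood of an observation sequence under a discrete HMM. It is plain Python and does not use or imitate HTK's tools. The two states, the two observation symbols, and all probabilities are made-up values for illustration.

```python
from typing import Dict, List


def forward_likelihood(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> float:
    """Return P(observations) under a discrete HMM using the forward algorithm."""
    if not observations:
        return 1.0  # the empty sequence has probability 1

    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model: frames are either "silence" or "speech",
# and each frame emits a coarse energy symbol, "low" or "high".
STATES = ["silence", "speech"]
START_P = {"silence": 0.6, "speech": 0.4}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
}
EMIT_P = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

likelihood = forward_likelihood(["low", "high", "high"], STATES, START_P, TRANS_P, EMIT_P)
print(f"P(low, high, high) = {likelihood:.4f}")
```

Real recognizers work with continuous feature vectors and log probabilities to avoid underflow. The sketch keeps raw probabilities only for readability.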
HTK was originally developed at the Machine Intelligence Laboratory (formerly known as the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED) where it has been used to build CUED's large vocabulary speech recognition systems (see CUED HTK LVR). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research Laboratory Ltd was established. HTK was sold by Entropic until 1999 when Microsoft bought Entropic. Microsoft has now licensed HTK back to CUED and is providing support so that CUED can redistribute HTK and provide development support via the HTK3 web site. See History of HTK for more details.
While Microsoft retains the copyright to the original HTK code, everybody is encouraged to make changes to the source code and contribute them for inclusion in HTK3.
Updated by Ernad Husremović almost 17 years ago
http://wiki.freeswitch.org/wiki/Mod_pocketsphinx
Pocketsphinx is an open source speech recognition engine developed by Carnegie Mellon. mod_pocketsphinx allows FreeSWITCH™ to recognize speech.
- Works on Windows, Mac and Linux
- 8 kHz and 16 kHz acoustic models
- Semi-continuous recognition
- Great for smaller grammars
Updated by Ernad Husremović almost 17 years ago
http://www.speech.cs.cmu.edu/pocketsphinx/
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop. It is released under the same permissive license as Sphinx itself.
Updated by Ernad Husremović almost 17 years ago
http://cmusphinx.sourceforge.net/html/system.php
A complete speech recognition system will include data prepared using tools from outside sources, as well as programs available from this site.
Minimally, such a system will have an acoustic model trainer and a decoder, using audio data, a dictionary, and a language model possibly created outside. This page gives you pointers to tools and data that will allow you to create a full speech recognition system. Keep in mind, though, that building a working system requires knowledge in speech processing that this site cannot provide.
- Audio Data
- Open Source Models
- Dictionary
- Language Model
- Acoustic Model Trainer
- Decoder
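As a rough illustration of two of the components listed above, the sketch below pairs a tiny pronunciation dictionary with a bigram language model using add-one smoothing. It does not use any CMU Sphinx tools or file formats. The dictionary entries, the three-sentence training corpus, and all function names are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical pronunciation dictionary: word -> phoneme sequence.
DICTIONARY: Dict[str, List[str]] = {
    "large": ["L", "AA", "R", "JH"],
    "small": ["S", "M", "AO", "L"],
    "pizza": ["P", "IY", "T", "S", "AH"],
}


def to_phonemes(sentence: str) -> List[str]:
    """Expand a sentence into phonemes; raises KeyError for unknown words."""
    phonemes: List[str] = []
    for word in sentence.split():
        phonemes.extend(DICTIONARY[word])
    return phonemes


def train_bigram(sentences: List[str]) -> Tuple[Counter, Counter, int]:
    """Count bigrams and their left contexts; return the counts and vocabulary size."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:])
        contexts.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence: str, contexts: Counter, bigrams: Counter, vocab_size: int) -> float:
    """Log probability of a sentence under the add-one smoothed bigram model."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words[:-1], words[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


contexts, bigrams, vocab_size = train_bigram(["large pizza", "small pizza", "large pizza"])
print(to_phonemes("large pizza"))
print(log_prob("large pizza", contexts, bigrams, vocab_size))
print(log_prob("pizza large", contexts, bigrams, vocab_size))
```

A decoder would combine scores like these with acoustic model scores to pick the most likely word sequence; the word order seen in training ("large pizza") scores higher here than the reversed order.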
Updated by Ernad Husremović almost 17 years ago
- % Done changed from 0 to 10
Updated by Ernad Husremović almost 17 years ago
- Speech Synthesizer (TTS)
- Speech Recognizer (ASR)
- Speaker Verifier (SV)
- Speech Recorder (SR)
- SIP (MRCPv2), RTSP (MRCPv1) session management
- SDP offer/answer model
- RTP media streaming
UniMRCP is an open source, cross-platform MRCP implementation that provides everything required for MRCP client-side and server-side deployment. UniMRCP bundles the SIP/MRCPv2, RTSP, SDP and RTP stacks and provides an MRCP-version-independent, user-level interface for integration.
Everybody is welcome to join the community, use the project, and improve it by participating in discussions, raising issues, and providing patches.
UniMRCP PocketSphinxPlugin
PocketSphinx Plugin Available
posted Jul 1, 2009 9:59 AM by Arsen Chaloyan
I would like to announce the availability of the PocketSphinx ASR plugin for the UniMRCP server.
The PocketSphinx UniMRCP server can be used with any MRCP-compliant client that supports JSGF grammars.
Currently supported ASR features are as follows:
Methods:
- DEFINE-GRAMMAR
- RECOGNIZE
- GET-RESULT
- START-INPUT-TIMERS
- STOP
Events:
- START-OF-INPUT
- RECOGNITION-COMPLETE
Header fields:
- Noinput-Timeout
- Recognition-Timeout
- Completion-Cause
- Completion-Reason
- Save-Waveform
Grammar: JSGF
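Since the plugin takes JSGF grammars, a small example of what such a grammar looks like may help. The sketch below builds a one-rule JSGF grammar from a list of phrases. The helper name `build_jsgf` and the grammar and rule names are assumptions for illustration, and nothing here is taken from the UniMRCP or PocketSphinx code.

```python
from typing import List


def build_jsgf(grammar_name: str, rule_name: str, phrases: List[str]) -> str:
    """Return a JSGF grammar with one public rule listing the phrases as alternatives."""
    if not phrases:
        raise ValueError("at least one phrase is required")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


grammar_text = build_jsgf("pizza", "size", ["small", "medium", "large"])
print(grammar_text)
```

A client would typically send text like this as the body of a DEFINE-GRAMMAR request, or inline with RECOGNIZE; the exact message framing depends on the MRCP client in use and is not shown here.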
For the instructions on how to build and configure PocketSphinx with UniMRCP refer to
Updated by Ernad Husremović about 16 years ago
- Status changed from Assigned to Rejected