Boranbayev A.S.
L.N. Gumilyov Eurasian National University, Astana, Kazakhstan
METHODS OF DEVELOPMENT OF MULTILINGUAL VOICE ENABLED INFORMATION SYSTEMS
This paper addresses issues related to the development of multilingual voice portals in Kazakhstan, where Russian is one of the main languages and many people are studying English. I will discuss the technology behind these voice applications and explain why I think that VoiceXML (Voice Extensible Markup Language) is a very good way to develop speech recognition applications that are used over the telephone. I will also present an approach to building a multilingual voice application using technologies such as VoiceXML [1].
With a voice portal, instead of talking to your computer, you are essentially talking to a web site, and you are doing this over the phone. Not everyone has easy access to the Internet, because of geographical location or a lack of computer skills. On the other hand, many people have access to a telephone or a wireless network. Many enterprises already combine Internet and speech recognition technologies into services that provide easy access to the Internet by telephone. Such services make it possible to access the Internet from anywhere, using a telephone or a Voice over Internet Protocol (VoIP) connection [2].
Many people in Kazakhstan have access to telephone lines and personal computers, but very few have Internet access. Voice portals make Web information accessible through a voice interface, and as such they have a major role to play in Kazakhstan. Until now, voice portals have been almost nonexistent in Kazakhstan, unlike in developed countries such as the United States.
A voice application consists of technology components that enable interaction between the voice application server and the user. These components are a dialogue manager, a speech recognition module, a language understanding module, and a text-to-speech (speech synthesis) module. The dialogue manager dictates the sequence of prompts and responses (the dialogue states), interfaces to information and audio databases, and manages telephony functions, for instance call transfer in a directory enquiries system. Additional modules, such as speaker verification, can be included. In a multilingual voice portal, a language identification module would be added, responsible for switching automatically between Russian and English.
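The component architecture above can be sketched in Python to show the control flow. This is a minimal illustration under stated assumptions, not the portal's implementation: the per-language recognition, understanding, and synthesis functions and the language identifier are hypothetical stubs that operate on text instead of audio, and the names `LanguageModules`, `DialogueManager`, and `handle_turn` are invented for this example. The dialogue manager asks the language identification module which language the caller is using, then routes the turn through that language's recognizer, understanding module, and TTS.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class LanguageModules:
    """Per-language components (hypothetical stubs standing in for real engines)."""
    recognize: Callable[[str], str]    # speech recognition: audio -> text
    understand: Callable[[str], str]   # language understanding: text -> intent
    synthesize: Callable[[str], str]   # text-to-speech: text -> audio
    prompts: Dict[str, str]            # intent -> reply text in this language


class DialogueManager:
    """Picks the language for each turn, then runs recognition -> understanding -> TTS."""

    def __init__(self, modules: Dict[str, LanguageModules],
                 identify_language: Callable[[str], str],
                 default_language: str = "ru") -> None:
        self.modules = modules
        self.identify_language = identify_language
        self.language = default_language

    def handle_turn(self, audio: str) -> str:
        # Switch only to a language for which modules are available.
        detected = self.identify_language(audio)
        if detected in self.modules:
            self.language = detected
        m = self.modules[self.language]
        intent = m.understand(m.recognize(audio))
        reply = m.prompts.get(intent, m.prompts["unknown"])
        return m.synthesize(reply)


# Toy language identifier (an assumption): any Cyrillic character means Russian.
def toy_language_id(audio: str) -> str:
    return "ru" if any("\u0400" <= ch <= "\u04ff" for ch in audio) else "en-US"


def make_modules(keyword: str, prompts: Dict[str, str]) -> LanguageModules:
    # Toy stubs: "audio" is plain text; "<audio>...</audio>" marks synthesized output.
    return LanguageModules(
        recognize=lambda audio: audio.lower(),
        understand=lambda text: "balance" if keyword in text else "unknown",
        synthesize=lambda text: f"<audio>{text}</audio>",
        prompts=prompts,
    )


manager = DialogueManager(
    {
        "en-US": make_modules("balance", {"balance": "Please say your ID number.",
                                           "unknown": "Sorry, please repeat."}),
        "ru": make_modules("баланс", {"balance": "Назовите ваш номер.",
                                      "unknown": "Повторите, пожалуйста."}),
    },
    toy_language_id,
)
print(manager.handle_turn("My balance, please"))  # <audio>Please say your ID number.</audio>
print(manager.handle_turn("Мой баланс"))          # <audio>Назовите ваш номер.</audio>
```

Keeping one set of modules per language means the dialogue manager can change languages mid-call simply by updating its current language.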
VoiceXML has some built-in support for multilingualism, but because Russian belongs to the class of languages that are not written in Roman script, implementing our voice portals involves certain difficulties. VoiceXML was designed for creating audio dialogues that include speech recognition, speech synthesis, audio playback, and telephony. VoiceXML supports mixed-initiative conversations, in which the caller and the system take turns driving the conversation [3]. An important component of the overall voice portal system is speaker verification, which checks the voice signature of the speaker.
When developing a multilingual voice portal for Russian, adopting a standard such as VoiceXML raises some problems: dialogue modes are largely limited to directed dialogue, semantic interpretations are confined to key/value pairs in the grammar, support for multilingualism is limited, and there is no place for language identification.
VoiceXML provides limited support for multilingualism through the xml:lang attribute, which can be specified on prompts and grammars [3]. For example, the system could prompt in both English and Russian as follows:
<prompt xml:lang="en-US">How are you doing?</prompt>
<prompt xml:lang="ru">Как у вас дела?</prompt>
Similarly, to recognize this phrase in English and Russian, one needs to specify a grammar for each language:
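<grammar xml:lang="en-US" version="1.0" root="greeting">
  <rule id="greeting">How are you doing</rule>
</grammar>
<grammar xml:lang="ru" version="1.0" root="greeting">
  <rule id="greeting">Как у вас дела</rule>
</grammar>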
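Because the Russian prompts and grammars contain Cyrillic, hand-editing mixed-language markup is error-prone (wrong quote characters, missing attributes, encoding problems). One option, sketched here as an assumption rather than the approach used in this paper, is to generate the paired elements with Python's standard `xml.etree.ElementTree`, which serializes the XML-namespace attribute as `xml:lang` and encodes Cyrillic correctly. The helper name `bilingual_elements` and the rule id `greeting` are illustrative; the grammars follow the W3C SRGS XML form (`version`, `root`, and a `rule` element).

```python
import xml.etree.ElementTree as ET

# The XML namespace URI; ElementTree serializes this attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def bilingual_elements(phrases):
    """Build a <prompt> and an inline SRGS <grammar> for each language/phrase pair."""
    elements = []
    for lang, phrase in phrases.items():
        prompt = ET.Element("prompt", {XML_LANG: lang})
        prompt.text = phrase
        grammar = ET.Element(
            "grammar", {XML_LANG: lang, "version": "1.0", "root": "greeting"}
        )
        rule = ET.SubElement(grammar, "rule", {"id": "greeting"})
        rule.text = phrase.rstrip("?")  # punctuation is not a spoken token
        elements.extend([prompt, grammar])
    return elements


phrases = {"en-US": "How are you doing?", "ru": "Как у вас дела?"}
for element in bilingual_elements(phrases):
    print(ET.tostring(element, encoding="unicode"))
```

Each printed element is a self-contained, correctly quoted fragment; adding a third language would only require another entry in `phrases`.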
Although Russian is one of the major languages in terms of the number of speakers, speech processing research for Russian has lagged behind that for English. A very promising approach to Russian orthography is romanization, or transliteration, of Russian text, which can be combined with automatic diacritization into an approach that can be described as auto-romanization. The biggest advantage of auto-romanization is that English and Russian can be mixed in the same VoiceXML document without worrying about which encoding standard to use. The main disadvantage is that romanized Russian is harder to write, especially for native speakers in Kazakhstan, who normally write Russian in Cyrillic.
Auto-romanization in VoiceXML can be implemented either through the VoiceXML object element or as part of the text-to-speech module.
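To illustrate the romanization half of auto-romanization, here is a minimal Python sketch that transliterates Cyrillic Russian into plain ASCII. The mapping table is a simplified scheme chosen for this example (loosely in the style of common English-language romanizations, with hard and soft signs dropped); it is not a standard the paper prescribes, and it performs no diacritization. Where such a step would run, behind the object element or in a TTS front end, is left open. Because the output is pure ASCII, romanized Russian can sit next to English in one VoiceXML document without encoding concerns.

```python
# Simplified, hypothetical Cyrillic-to-Latin table (not an official standard).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def romanize(text: str) -> str:
    """Transliterate Russian Cyrillic into ASCII Latin letters, keeping other characters."""
    out = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                  # Latin letters, digits, punctuation pass through
        elif ch.isupper():
            out.append(latin.capitalize())  # e.g. Ж -> Zh keeps sentence case readable
        else:
            out.append(latin)
    return "".join(out)


print(romanize("Как у вас дела?"))  # Kak u vas dela?
```

A production system would also need the diacritization step and a scheme agreed with the recognizer's pronunciation dictionary; this sketch covers only the character mapping.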
To demonstrate the ideas presented in this paper, an example voice application was built: the Bill Payment Demo. In this automated system, the balance due on each account is recorded and then posted on the Internet. Callers can enquire about their balance due by saying their unique account ID number. This is a good example of a mixed-initiative dialogue, since the caller is expected to say the ID number. CGI/Perl scripts running on an Apache server were used to extract information from a central database.
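The lookup step of the demo can be sketched as follows. The original used CGI/Perl scripts on Apache; to keep all examples in one language, this is a Python stand-in that uses the standard `sqlite3` module with an in-memory database in place of the central database. The table name `accounts`, its columns, the helper names, and the sample rows are all hypothetical.

```python
import sqlite3
from typing import Optional


def open_demo_database() -> sqlite3.Connection:
    """Create an in-memory stand-in for the central billing database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance_due REAL NOT NULL)")
    conn.executemany(
        "INSERT INTO accounts (id, balance_due) VALUES (?, ?)",
        [("1001", 2500.0), ("1002", 0.0)],
    )
    return conn


def balance_due(conn: sqlite3.Connection, spoken_id: str) -> Optional[float]:
    """Return the balance for a recognized ID number, or None if it is unknown."""
    account_id = spoken_id.strip()
    if not account_id.isdigit():
        return None  # reject anything that is not an ID number
    # Parameterized query: the caller's input is never spliced into the SQL text.
    row = conn.execute(
        "SELECT balance_due FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0] if row else None


conn = open_demo_database()
print(balance_due(conn, "1001"))  # 2500.0
print(balance_due(conn, "9999"))  # None
```

Returning `None` for unknown IDs lets the dialogue layer decide whether to re-prompt the caller or transfer the call.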
Next, I describe what had to be done for the multilingual Russian/English implementation. Whenever the application switched to a different language, the construct xml:lang="language" had to be used, where language was either en-US or ru. This had to be done for both the prompt and the grammar elements. Russian recognition was implemented using Cyrillic script. The Voice Web Server that I used did not have a Russian TTS engine linked to its VoiceXML environment, so the Russian prompt audio had to be pre-recorded. To help with the pronunciation of words for the recognition dictionary and with the recording of prompts, I used a commercial diacritizer. We still need to find a way to integrate a text-to-speech engine with a diacritization and text normalization front end.
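The language switching and pre-recorded audio described above can be combined in one prompt builder. The Python sketch below is an illustration under assumptions, not the code used for the demo: English prompts are emitted as text for a TTS engine, Russian prompts as VoiceXML audio elements that reference recorded files, and every prompt carries the xml:lang attribute. The prompt keys, the file naming scheme `prompts/ru/<key>.wav`, and the helper name `build_prompt` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The XML namespace URI; ElementTree serializes this attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical prompt catalogue: English text goes to TTS, Russian points at recordings.
PROMPTS = {
    "ask_id": {"en-US": "Please say your ID number.", "ru": "prompts/ru/ask_id.wav"},
    "goodbye": {"en-US": "Thank you. Goodbye.", "ru": "prompts/ru/goodbye.wav"},
}
# Languages with no TTS engine on the platform (assumption mirroring the text).
RECORDED_LANGUAGES = {"ru"}


def build_prompt(key: str, lang: str) -> ET.Element:
    """Return a <prompt> for the caller's current language, always tagged with xml:lang."""
    prompt = ET.Element("prompt", {XML_LANG: lang})
    content = PROMPTS[key][lang]
    if lang in RECORDED_LANGUAGES:
        ET.SubElement(prompt, "audio", {"src": content})  # play the pre-recorded file
    else:
        prompt.text = content  # let the TTS engine speak the text
    return prompt


for lang in ("en-US", "ru"):
    print(ET.tostring(build_prompt("ask_id", lang), encoding="unicode"))
```

If a Russian TTS engine becomes available, removing "ru" from `RECORDED_LANGUAGES` (and storing text instead of file paths) would switch those prompts to synthesis without touching the dialogue logic.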
References:
1. Boranbayev A.S. Developing applications using speech recognition and VoiceXML // Proceedings of the International Conference “The Theory of Functions and Computing Methods”. Astana, 2007, pp. 66-68.
2. Boranbayev A.S. The future of IVR and the continuous progress in speech recognition technology // Proceedings of the VI Kazakhstan-Russian International Scientific and Practical Conference “Mathematical Modeling of Scientific, Technological and Environmental Problems in the Oil and Gas Industry”. Astana, 2007, pp. 82-86.
3. W3C Voice Browser: http://www.w3.org/Voice/