Speech synthesis

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.[1]

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely “synthetic” voice output.[2]
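As a rough illustration of the diphone units mentioned above, the sketch below turns a phone sequence into the list of diphone names a concatenative synthesizer would look up in its database. The phone symbols, the `_` silence marker, and the `a-b` naming scheme are assumptions made for this example, not the conventions of any particular system.

```python
def to_diphones(phones, silence="_"):
    """Return the diphone names covering a phone sequence.

    A diphone spans the transition from one phone to the next, so the
    sequence is padded with silence at both ends and adjacent pairs are
    joined. The "a-b" naming is a hypothetical convention.
    """
    padded = [silence, *phones, silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    # Four phones need five diphones, including the silence transitions.
    print(to_diphones(["h", "e", "l", "o"]))
```

A sequence of n phones always needs n + 1 diphones, which is why a modest diphone inventory can cover arbitrary text while whole-word storage cannot.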

The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s.

Overview of a typical TTS system

A text-to-speech system (or “engine”) is composed of two parts:[3] a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end—often referred to as the synthesizer—then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations),[4] which is then imposed on the output speech.
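The text normalization step described above can be sketched in a few lines. This is a minimal example under stated assumptions: the abbreviation table is hypothetical and tiny, only standalone integers from 0 to 999 are expanded, and tokens are split on whitespace. A real front-end has to resolve ambiguity from context (for example, "St." as "Street" or "Saint").

```python
import re

# Hypothetical abbreviation table; a real lexicon is far larger and
# context-sensitive.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text):
    """Expand known abbreviations and short numbers into written-out words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,3}", token):
            out.append(number_to_words(int(token)))
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(normalize("Dr. Smith lives at 42 Elm St."))
```

The output of this stage is plain written-out text, which the front-end would then pass to grapheme-to-phoneme conversion.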

Here is a non-exhaustive comparison of speech synthesis programs:


Festival offers a general framework for building speech synthesis systems and includes examples of various modules. As a whole it offers full text-to-speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, from Java, and through an Emacs interface. Festival is multilingual (currently British and American English, and Spanish), though English is the most advanced. Other groups release new languages for the system, and full tools and documentation for building new voices are available through Carnegie Mellon’s FestVox project (http://festvox.org).

The system is written in C++ and uses the Edinburgh Speech Tools Library for its low-level architecture, with a Scheme (SIOD) based command interpreter for control. Documentation is provided in the FSF texinfo format, which can generate a printed manual, info files, and HTML.

Festival is free software. Festival and the speech tools are distributed under an X11-type licence allowing unrestricted commercial and non-commercial use alike.


Text to Speech engine for English and many other languages. Compact size with clear but artificial pronunciation. Available as a command-line program with many options, a shared library for Linux, and a Windows SAPI5 version.

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. http://espeak.sourceforge.net

eSpeak uses a “formant synthesis” method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

eSpeak is available as:

  • A command line program (Linux and Windows) to speak text from a file or from stdin.
  • A shared library version for use by other programs. (On Windows this is a DLL).
  • A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface.
  • eSpeak has been ported to other platforms, including Android, Mac OSX and Solaris.

Features:

  • Includes different Voices, whose characteristics can be altered.
  • Can produce speech output as a WAV file.
  • SSML (Speech Synthesis Markup Language) is partially supported, as is HTML.
  • Compact size. The program and its data, including many languages, total about 2 Mbytes.
  • Can be used as a front-end to MBROLA diphone voices, see mbrola.html. eSpeak converts text to phonemes with pitch and length information.
  • Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
  • Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
  • Development tools are available for producing and tuning phoneme data.
  • Written in C.
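The command-line program can also be driven from a script. The sketch below assumes the `espeak` executable is installed and on the PATH, and uses its `-v` (voice), `-s` (speed in words per minute), and `-w` (write a WAV file) options. The helper names `build_espeak_command` and `speak` are made up for this example.

```python
import subprocess


def build_espeak_command(text, voice="en", speed_wpm=None, wav_path=None):
    """Build the argument list for an espeak invocation (hypothetical helper)."""
    cmd = ["espeak", "-v", voice]
    if speed_wpm is not None:
        cmd += ["-s", str(speed_wpm)]   # speaking rate in words per minute
    if wav_path is not None:
        cmd += ["-w", wav_path]         # write audio to a WAV file instead of playing it
    cmd.append(text)
    return cmd


def speak(text, **options):
    """Run espeak; raises CalledProcessError if the program reports failure."""
    subprocess.run(build_espeak_command(text, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call speak(...) where espeak is installed.
    print(build_espeak_command("Hello world", speed_wpm=160, wav_path="hello.wav"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the text contains spaces or punctuation.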

Search Engine Optimization (SEO)

The Internet is the busiest street in the world, but the traffic to each page depends mainly on its ranking in search engines such as Google. The art of placing among the top results is known as Search Engine Optimization (SEO).


Google keeps the mechanics of its ranking secret, and those mechanics change continuously. It is a fairly erratic process: those who manage to reach the first page of results for a given set of search terms tend to stay there whatever they do and whatever content they have, as long as Google does not ban them, for reasons that are equally erratic and mysterious. In other words, SEO is an extreme sport.

References, resources, and examples

Google Privacy Policy

We’re getting rid of over 60 different privacy policies across Google and replacing them with one that’s a lot shorter and easier to read. Our new policy covers multiple products and features, reflecting our desire to create one beautifully simple and intuitive experience across Google.
We believe this stuff matters, so please take a few minutes to read our updated Privacy Policy and Terms of Service at http://www.google.com/policies. These changes will take effect on March 1, 2012.
Got questions?
We’ve got answers.
Visit our FAQ at http://www.google.com/policies/faq to read more about the changes. (We figured our users might have a question or twenty-two.)

States Move on Privacy Law

Over two dozen privacy laws have passed this year in more than 10 states, in places as different as Oklahoma and California.
For Internet companies, the patchwork of rules across the country means keeping a close eye on evolving laws to avoid overstepping.

Quick Response Code

QR code (abbreviated from Quick Response Code) is the trademark for a type of matrix barcode (or two-dimensional barcode) first designed for the automotive industry in Japan. A barcode is an optically machine-readable label that is attached to an item and records information related to that item. The information encoded by a QR code may be made up of four standardized types (“modes”) of data (numeric, alphanumeric, byte/binary, and Kanji) or, through supported extensions, virtually any type of data.[1]
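To make the data modes concrete, the sketch below picks the most compact of three modes for a given string. It relies on the QR alphanumeric set being the 45 characters 0-9, A-Z, space, and `$%*+-./:`. Kanji mode and mixed-mode segmentation are deliberately left out, and the function name `choose_mode` is an assumption for this example, not part of any library.

```python
DIGITS = set("0123456789")
# The 45-character QR alphanumeric set: digits, uppercase letters,
# space, and nine symbols.
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def choose_mode(data):
    """Return the most compact single mode able to hold the whole string.

    Simplified: Kanji mode is not considered, and the input is never
    split into segments with different modes.
    """
    if all(ch in DIGITS for ch in data):
        return "numeric"
    if all(ch in ALPHANUMERIC for ch in data):
        return "alphanumeric"
    return "byte"


if __name__ == "__main__":
    for sample in ["0123456789", "HELLO WORLD", "hello world"]:
        print(sample, "->", choose_mode(sample))
```

Lowercase letters fall outside the alphanumeric set, which is why an uppercase URL can be encoded more compactly than the same URL in lowercase.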

The QR Code system has become popular outside the automotive industry due to its fast readability and greater storage capacity compared to standard UPC barcodes. Applications include product tracking, item identification, time tracking, document management, general marketing, and much more.[2]

A QR code consists of black modules (square dots) arranged in a square grid on a white background, which can be read by an imaging device (such as a camera) and processed using Reed-Solomon error correction until the image can be appropriately interpreted; data is then extracted from patterns present in both horizontal and vertical components of the image.[2]


PHP QR Code is an open source (LGPL) library for generating QR Code, a 2-dimensional barcode. It is based on the libqrencode C library and provides an API for creating QR Code barcode images (PNG, and JPEG thanks to GD2). It is implemented purely in PHP, with no external dependencies (except GD2 if needed).

Some of the library’s features include:

  • Supports QR Code versions (size) 1-40
  • Numeric, Alphanumeric, 8-bit and Kanji encoding. (Kanji encoding has not been fully tested; if you have Japanese encoding enabled, you can contribute by verifying it 🙂 )
  • Implemented purely in PHP, no external dependencies except GD2
  • Exports to PNG, JPEG images, also exports as bit-table
  • TCPDF 2-D barcode API integration
  • Easy to configure
  • Data cache for calculation speed-up
  • The provided merge tool helps deploy the library as one big dependency-less file, simple to “include and do not worry”
  • Debug data dump, error logging, time benchmarking
  • API documentation
  • Detailed examples
  • 100% Open Source, LGPL Licensed
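The version numbers 1-40 in the list above map directly to symbol size: in the QR Code standard, a version v symbol has 17 + 4v modules per side. The sketch below computes that in Python for illustration only. It is not part of the PHP QR Code API, and the function name is made up.

```python
def modules_per_side(version):
    """Side length, in modules, of a QR Code symbol of the given version."""
    if not 1 <= version <= 40:
        raise ValueError("QR Code versions run from 1 to 40")
    return 17 + 4 * version


if __name__ == "__main__":
    for v in (1, 2, 40):
        side = modules_per_side(v)
        print(f"version {v}: {side} x {side} modules")
```

Version 1 is therefore a 21 x 21 grid and version 40 a 177 x 177 grid, not counting the blank quiet zone around the symbol.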

It’s official: Google buys mapping app developer Waze

Google is buying the crowdsourced mapping app developer Waze, in a move geared toward adding more real-time navigation tools to its own Maps software, the company announced Tuesday.

Apple hosts live webcast of WWDC keynote today at 10 a.m. PT/ 1 p.m. ET

Apple will webcast the keynote of its developers conference live starting at 10 a.m. PT, but the webcast will be available only on the company’s own hardware, or via an OS X-powered virtual machine.

Forget the keynote. WWDC is still about the developers

As usual, the Apple rumor mill has been on overdrive as WWDC nears. But all the hype about anything CEO Tim Cook might reveal misses the point, says columnist Ryan Faas. WWDC is still about developers.
