Machine Translation and Post-Editing

Machine Translation

Machine Translation (MT) involves the use of sophisticated computer technologies that automate the translation of text from one language (the source language) to another (the target language) while minimizing human intervention. MT was considered a revolutionary concept in the 1950s; however, the unattainable timetable for large-scale development set by scientists working on the Georgetown-IBM Experiment resulted in a shift away from MT until the 1980s. The Georgetown-IBM Experiment consisted of the automatic translation of Russian sentences into English in a highly specialized field (organic chemistry). The experiment garnered worldwide media attention at the time and was considered successful to a degree. Still, this preliminary success could not prevent a general delay of progress in the field of MT: the translation algorithm used was not sophisticated enough to support the overly ambitious expectations that had initially been forecast.

The onset of the 1980s computer revolution was the catalyst for renewed interest in MT and its potential use in business. Computer power has increased exponentially over the past thirty years, and investment in MT has followed suit as companies position themselves and battle for market share. During this time, the localization industry has led the transition from crude translation models that could handle only simple sentences to the highly sophisticated models seen today.

Recently, the convergence of translation memory (TM) and machine translation technologies has been critically important to MT development within the localization industry. Translation memory is a technology built on a large database of language pairs (individual words, phrases, sentences, and paragraphs) that offers translators suggestions based on previous translations. These databases are in a constant state of evolution because human translators readily utilize and update them when performing translations. A major benefit of translation memory is that it increases translator efficiency and lowers overall costs by reusing previously translated content. Translation memory promotes consistency and speed because the same sentence never has to be translated twice.
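To make the idea concrete, here is a minimal, hypothetical sketch of how a translation memory lookup might work: a stored source segment is reused directly when it matches exactly, and offered as a fuzzy suggestion when it is merely similar. The example sentences, the tm_lookup function, and the similarity threshold are illustrative assumptions, not features of any particular TM product.

```python
from difflib import SequenceMatcher

# A toy translation memory: source segments mapped to previously
# approved target-language translations (English -> French here).
translation_memory = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "The device must be cleaned daily.": "L'appareil doit être nettoyé quotidiennement.",
}

def tm_lookup(segment, memory, threshold=0.75):
    """Return the best stored translation for a segment and its match score.

    An exact match is reused as-is; otherwise the most similar stored
    segment above the threshold is offered as a fuzzy match for the
    translator to review and adapt.
    """
    if segment in memory:
        return memory[segment], 1.0  # exact match: the sentence never has to be translated twice
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_score >= threshold:
        return memory[best_source], best_score  # fuzzy match suggestion
    return None, best_score  # no usable match: translate from scratch, then add it to the memory

print(tm_lookup("Click the Save button.", translation_memory))    # exact reuse
print(tm_lookup("Click the Cancel button.", translation_memory))  # fuzzy suggestion to edit
```

Real TM tools add segmentation, terminology, and context handling on top of this, but the core reuse-or-suggest loop is the same.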

The year 2006 marked quite possibly the most important innovation in the history of machine translation: the release of Google’s trillion-word corpus. Google MT is built on the latest innovation in machine translation, Statistical Machine Translation (SMT). At a high level, SMT uses computer algorithms that estimate probability distributions over words and phrases from vast amounts of text and then select the most probable translation. The massive size of Google MT’s corpus is an attempt to cover as many language combinations as possible. In this sense, Google MT epitomizes the convergence of machine translation and translation memory technologies.
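As a rough illustration of the statistical idea, the sketch below estimates bigram probabilities from a tiny invented corpus. This is only the language-model half of SMT, greatly simplified; the sentences and function names are assumptions made for the example and bear no relation to Google’s actual system.

```python
from collections import Counter

# A tiny monolingual corpus standing in for a massive word collection
# (hypothetical sentences, for illustration only).
corpus = [
    "the contract is signed",
    "the contract is valid",
    "the agreement is signed",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_probability(prev_word, word):
    """P(word | prev_word), estimated from corpus counts."""
    if unigrams[prev_word] == 0:
        return 0.0
    return bigrams[(prev_word, word)] / unigrams[prev_word]

# The model prefers word sequences it has seen more often, which is how
# a statistical system ranks candidate translations for fluency.
print(bigram_probability("is", "signed"))  # 2/3: "is signed" follows "is" in two of three sentences
print(bigram_probability("is", "broken"))  # 0.0: an unseen combination
```

A full SMT system combines a language model like this with a translation model learned from aligned bilingual text, which is why the size and quality of the underlying corpus matter so much.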

How Businesses Benefit from Machine Translation

In today’s global economy, businesses must distinguish themselves in the international market in order to stay competitive. As a result, translating business documentation into various target languages is becoming increasingly important.

Unlike source-content creation, which usually involves a single language, translating the same document into many different languages can be exhausting and costly because the effort multiplies with each additional language. Therefore, automating the translation process to improve production efficiency and cut costs has always been a sought-after solution.

Although MT may not completely replace human translators anytime soon, it is a legitimate tool for translating commercial, technical, and function-oriented text. This is because such documents tend to contain a great deal of repetition and require regular updates, work that is a waste of time for translators and a waste of money for customers. By freeing human translators from these repetitive tasks, MT can significantly improve translation efficiency while reducing project costs.

As technologies have improved, the localization industry has experienced a paradigm shift and reliance on MT is increasing. This trend will certainly continue into the future. Global businesses appreciate the decreased costs and increased speed with which MT localizes documentation, but limitations in the technology still exist. To compensate for these technological constraints, human translation is still needed to fill the critical post-editing void left by machine translation.

The MT Debate

Experts within the localization industry agree that the biggest drawback to machine translation is the system’s inability to adequately deal with the subtleties of language and translation. Because of these shortcomings, MT systems make decisions and output translations based on incomplete data, which puts the accuracy of translations at risk.

Google acknowledges that the idiosyncrasies of language are extraordinarily difficult to replicate and that even the world’s most sophisticated software coupled with massive amounts of data (the technology driving Google MT) doesn’t approach the fluency of a native speaker or possess the skill of a professional translator.

Further illustrating the challenges facing Google MT, Google’s trillion-word corpus is not annotated, so the accuracy of its translations is a concern within the localization industry. Google believes that more raw data is better than a smaller amount of annotated data. Prominent individuals within the localization industry agree, however, that there is no substitute for professional translators in cases where the accuracy of translations is of the utmost importance.

Jaap van der Meer, director of the Translation Automation User Society (TAUS), raised some interesting points about the current state of machine translation. Mr. van der Meer considered the negative ramifications of a for-profit company holding a monopoly on information within the MT industry and opined about the potentially debilitating effects a single information-holding powerhouse would have on the future of machine translation.

With these considerations in mind, many questions remain: If a single company accrues a majority share in the MT sweepstakes, how much longer will its translation services be free of charge? When translations are accumulated from a variety of sources, what is the risk of incorporating contaminated data into a massive corpus? Competition breeds efficiency and innovation, but the localization industry must contend with how to compete against single sources vying for supremacy within the MT industry.

Regardless of personal opinion, one fact within the MT debate remains: raw MT output tends to be only 50% to 70% accurate. Even the most cutting-edge technology is not a one-for-one replacement for in-country, professional translators. Therefore, it is important for businesses to identify translation services like Stepes that combine innovative post-editing technology with world-class translators.

The MT Post-Editing Solution

Post-MT proofreading and editing is the process by which professionally trained human translators systematically review and edit machine-translated content to remove linguistic errors, such as incorrect translations and poor readability. Unlike the traditional localization process, customers now either provide a localization vendor with a document that has already been partially translated by machine or ask the vendor to apply MT to the source text itself. The localization vendor then uses professionally trained human translators to proofread and edit the machine-translated material and delivers the final content as a finished product.

Accurate translations should be the lifeblood of any professional localization company. A client should never receive a final deliverable that has not been reviewed and verified by a professionally trained translator with subject matter expertise. Take, for example, the stringent quality requirements for translations in the life sciences and medical device industries: 100% translation accuracy is a matter of life and death.

The sophistication of MT technology has increased exponentially, and global businesses are incorporating it more and more frequently into their daily operations. No one can deny the increased efficiency and savings that the technology can offer. At the same time, however, its inability to properly express the many subtleties of human expression has created a demand for post-editing services, which is why humans and machines remain heavily intertwined in the translation dance, and will be for some time to come.

Glossary

Machine Translation (MT): The use of sophisticated computer technologies that automate the translation of text from one language (source language) into another language (target language), with minimal human involvement.

Translation Memory (TM): A large database of language pairs (individual words, phrases, sentences, paragraphs) created by human translators.

Language Pairs: Bilingual and/or multilingual text segments that match words from a source language to their equivalents in a target language.

Statistical Language Model: Assigns a probability to a short sequence of words (an n-gram) by means of a probability distribution.

Quality Assurance (QA): Verifying the accuracy and quality of work completed at specific, predetermined points in the production process.

Corpus: A collection of recorded utterances used as a basis for the descriptive analysis of a language.