Translating With OmegaT
from www.linux.com
Thursday February 17, 2005 (08:00 AM GMT)
By: Dmitri Popov
Although computers have yet to take over the business of language translation (if they ever will!), they have become a common part of the translation process. Many professional translators use computer-assisted translation (CAT) tools such as TRADOS, Déjà Vu, and WordFast. But a lesser-known yet excellent open source CAT application called OmegaT can help as well.
Before you begin exploring OmegaT yourself, you should understand how it, or any CAT tool, works. OmegaT is a so-called translation memory application; that is, it doesn't translate texts for you. Instead, it stores pieces of text (called 'segments') and their corresponding translations in a file called a 'translation memory.' During translation, OmegaT divides the text you are translating into segments. When you select a segment for translation, OmegaT scans the translation memory for possible matches and displays any it finds in a separate window. You can then insert the closest matching translation into the text.
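To make the idea concrete, here is a toy translation memory in Python. It is only a sketch under simple assumptions: the sentence-splitting rule, the segment function, and the TranslationMemory class are hypothetical and do not reflect how OmegaT actually segments text or stores its memory.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately naive rule: a segment ends after ".", "!" or "?"
# followed by whitespace. Real CAT tools use configurable segmentation rules.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(text: str) -> list[str]:
    """Split a text into sentence-like segments."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


class TranslationMemory:
    """Stores source segments together with their translations."""

    def __init__(self) -> None:
        self._pairs: dict[str, str] = {}

    def add(self, source: str, target: str) -> None:
        self._pairs[source] = target

    def lookup(self, source: str) -> str | None:
        """Return the stored translation of an identical segment, or None."""
        return self._pairs.get(source)


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.add("Le fichier est ouvert.", "The file is open.")
    for seg in segment("Le fichier est ouvert. Cliquez sur OK."):
        print(seg, "->", tm.lookup(seg))
```

The first segment finds its stored translation; the second has none, so lookup returns None. An identical-text lookup like this only ever finds exact matches.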
There are two types of matches: exact and fuzzy. If the segment in the current text is identical to the one stored in the translation memory, you have an exact match. In the real world, however, you rarely have exact matches -- some words or forms in a selected segment vary from the segment in the translation memory. Luckily, OmegaT supports partial matches, which in translation lingo are called 'fuzzy matches.' This means that OmegaT can find segments in the translation memory that are not identical to, but are similar to, the one in the current text.
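The difference between exact and fuzzy matches can be sketched in a few lines of Python. OmegaT has its own way of scoring similarity; this example substitutes the standard library's difflib.SequenceMatcher ratio computed over words, and the 0.5 threshold is an arbitrary assumption, so treat the scores as illustrative only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity between 0.0 and 1.0, using difflib's ratio()."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def classify(segment: str, stored: str, threshold: float = 0.5) -> str:
    """Label a stored segment as an exact, fuzzy, or no match."""
    if segment == stored:
        return "exact"
    # The 0.5 cut-off is a hypothetical value chosen for this example.
    return "fuzzy" if similarity(segment, stored) >= threshold else "none"


if __name__ == "__main__":
    stored = "The file is open."
    for current in ("The file is open.", "The file is now open.", "Press OK to continue."):
        print(f"{current!r}: {classify(current, stored)}")
```

With these inputs, the first sentence is an exact match, the second (one extra word) scores about 0.89 and counts as fuzzy, and the third shares no words with the stored segment.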
Before OmegaT can be really useful, you have to use it for some time to build up a usable translation memory. The good news is that OmegaT works with translation memories in TMX (Translation Memory eXchange) format, which is supported by almost every CAT tool on the market, so you can easily reuse existing translation memories and exchange memories with other users.
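TMX is an XML format: each <tu> (translation unit) holds one <tuv> per language, and each <tuv> wraps its text in a <seg> element. As a rough illustration, the Python sketch below pulls source/target pairs out of a TMX document with the standard library's ElementTree. The read_tmx function and the sample document are made up for this example, and it flattens any inline markup inside <seg> to plain text, which a real tool would not do.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# TMX 1.4 marks each variant's language with xml:lang, which ElementTree
# reports under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny hand-written TMX document, used only for demonstration.
SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en-US" srclang="fr-FR" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="fr-FR"><seg>Le fichier est ouvert.</seg></tuv>
      <tuv xml:lang="en-US"><seg>The file is open.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(xml_text: str, src: str, tgt: str) -> dict[str, str]:
    """Return {source segment: target segment} for units containing both languages."""
    root = ET.fromstring(xml_text)
    src, tgt = src.lower(), tgt.lower()
    pairs: dict[str, str] = {}
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Flatten any inline markup inside <seg> to plain text.
                variants[lang] = "".join(seg.itertext())
        if src in variants and tgt in variants:
            pairs[variants[src]] = variants[tgt]
    return pairs


if __name__ == "__main__":
    print(read_tmx(SAMPLE, "fr-FR", "en-US"))
```

Because every unit carries all its language variants, the same file can be read in either direction, which is part of what makes TMX convenient to exchange.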
OmegaT advantages
While OmegaT lacks some of the advanced features of commercial packages, it has quite a few strong points.
OmegaT is a Java-based application, which means that it can run on Windows, Linux, and Mac OS X. Most commercial CAT tools are available for Windows only.
OmegaT can work with multiple translation memories. You can easily combine several translation memories and use them for a particular project.
Tools such as TRADOS or WordFast are tied to Microsoft Word, meaning that if you want to use them you have to use Word. OmegaT is a standalone application that doesn't dictate which word processor you use (although OpenOffice.org significantly increases its usefulness).
OmegaT is distributed under the GPL, free of charge. Since many CAT tools are aimed at professional translators, they tend to be expensive.
Installing and using OmegaT
OmegaT requires the free Java Runtime Environment (JRE), so install the JRE on your computer first. Once the JRE is installed, download OmegaT and install it.
Now it's time to create your first translation project:
Launch OmegaT.
Create a new project by selecting File > Create New Project from the menu bar.
Choose a directory for your project, give it a name, and save it.
You will then see a dialogue window that allows you to specify directories and language settings for the project. Use the default folder paths unless you have a very good reason to change them. Enter the language and locale codes for your source and target languages, for example FR-FR for French (France) and EN-US for English (United States). Press OK when done. (The actual codes are not particularly important unless you intend to exchange translation memories with other users. In that case you should use the ISO language codes.)
OmegaT will create a project folder (also called 'project root directory') containing five subdirectories: /glossary, /source, /omegat, /target, and /tm.
Quit OmegaT by choosing File > Quit.
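If you script around your projects (for example, to copy source files in automatically), a quick check that a project root has the layout described above can save some head-scratching. This Python sketch is a hypothetical helper, not part of OmegaT; it only checks for the five folder names listed above.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# The five subdirectories the article lists for a new project root.
EXPECTED_DIRS = ("glossary", "source", "omegat", "target", "tm")


def missing_project_dirs(root: str | Path) -> list[str]:
    """Return the expected subdirectories that are absent from a project root."""
    base = Path(root)
    return [name for name in EXPECTED_DIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real project.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "source").mkdir()
        print(missing_project_dirs(tmp))
```

An empty list means all five folders are present.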
Now you have to add the documents you want to translate (also called 'source documents') to your project. OmegaT supports plain text, HTML, XHTML, StarOffice, and OpenOffice.org formats (including Writer, Calc, and Impress). To add source files in these formats to the project, put them into the /source folder.
The good thing about OmegaT is that you can work with several documents in different formats within one project; that is, one project can contain Writer documents, HTML pages, Impress presentations, and so on. Although OmegaT cannot work with Microsoft Office documents directly, you can convert them into OpenOffice.org formats, translate them, and then convert them back to the original formats.
Now that the project is populated with the necessary files, you are ready to do some translation work. Launch OmegaT, and choose File > Open. Point to the project folder and double-click on the omegat.project file inside it. If your project contains more than one source file, OmegaT opens the Project Files window, where you can choose the document you want. The Project Files window also shows the number of segments in each document, which comes in handy when you need to estimate the amount of work.
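OmegaT reports these counts for you, but the idea behind the estimate is simple enough to sketch. The Python below approximates segment counts for plain-text files only, using a naive sentence-splitting rule; the function names and the rule are assumptions, and OmegaT's own numbers will differ, especially for formatted documents.

```python
from __future__ import annotations

import re
from pathlib import Path

# A naive, hypothetical sentence rule standing in for real segmentation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def count_segments(text: str) -> int:
    """Count sentence-like segments in a text."""
    return sum(1 for s in _SENTENCE_END.split(text) if s.strip())


def segments_per_file(source_dir: str | Path) -> dict[str, int]:
    """Map each .txt file in source_dir to its estimated segment count."""
    return {
        path.name: count_segments(path.read_text(encoding="utf-8"))
        for path in sorted(Path(source_dir).glob("*.txt"))
    }


if __name__ == "__main__":
    counts = segments_per_file("source")  # hypothetical path to a project's /source folder
    print(counts, "total:", sum(counts.values()))
```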
Once you have opened a document you can begin translating it. The translation process using OmegaT is straightforward:
Place the cursor in the target field of the first segment, between the tags <segment 0001> and <end segment>.
Type in your translation and delete the original text. Press Enter to confirm the translation and to jump to the next segment. Repeat this process with each segment until you have translated all of the text.
Select File > Save, then File > Compile. The Compile command does two important things: it creates a translated version of the source document (target text), and generates a translation memory.
Quit OmegaT by choosing File > Quit.
You will find your translated file in the /target folder, and the updated (or new) translation memory in the /tm folder. The translation memory can then be used with any other translation project.
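OmegaT writes the translation memory itself when you compile, so there is nothing to code here. Still, if you are curious what such a file contains, the sketch below builds a minimal TMX 1.4 document from source/target pairs with Python's ElementTree. The write_tmx function and the creationtool value "tm-sketch" are invented for illustration, and the output is not meant to match OmegaT's file byte for byte.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml" prefix (xml:lang).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def write_tmx(pairs: dict[str, str], src: str, tgt: str) -> str:
    """Build a minimal TMX 1.4 document with one <tu> per source/target pair."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        "creationtool": "tm-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": src,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for source, target in pairs.items():
        tu = ET.SubElement(body, "tu")
        # One <tuv> per language, each holding the text in a <seg>.
        for lang, text in ((src, source), (tgt, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(write_tmx({"Le fichier est ouvert.": "The file is open."}, "fr-FR", "en-US"))
```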
Using OmegaT with existing translation memories
If you translate a text using an existing translation memory, OmegaT displays possible matches for the active segment in the upper part of the Match and Glossary Viewer window. OmegaT can display up to five fuzzy matches, and you can pick the closest one with the Edit > Select Fuzzy Match # command, where # is the number of the match. You can then paste the selected match into the active segment using Edit > Insert Translation (inserts the match at the cursor position) or Edit > Overwrite Translation (replaces the active segment text with the match).
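Ranking a handful of candidates is the part of this workflow that is easiest to picture in code. The Python sketch below keeps at most five entries above a similarity threshold, best first, so that entry #1 corresponds to the closest match. The scoring (difflib's word-level ratio), the 0.5 threshold, and the best_matches function are all assumptions for illustration; OmegaT uses its own matching algorithm.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def best_matches(
    segment: str, memory: dict[str, str], limit: int = 5, threshold: float = 0.5
) -> list[tuple[float, str, str]]:
    """Return up to `limit` (score, source, target) entries, best match first."""
    words = segment.lower().split()
    scored = []
    for source, target in memory.items():
        score = SequenceMatcher(None, words, source.lower().split()).ratio()
        if score >= threshold:  # hypothetical cut-off for "close enough"
            scored.append((score, source, target))
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    memory = {
        "The file is open.": "Le fichier est ouvert.",
        "The file is closed.": "Le fichier est fermé.",
        "Press OK.": "Appuyez sur OK.",
    }
    matches = best_matches("The file is now open.", memory)
    for rank, (score, source, target) in enumerate(matches, 1):
        print(f"#{rank} {score:.2f} {source} -> {target}")
```

Here "The file is open." ranks first (about 0.89) and "The file is closed." second (about 0.67); "Press OK." falls below the threshold and is dropped.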
OmegaT also allows you to search translation memories and project files. Choose Edit > Search Translation Memory to open the Search dialogue window, enter the word or phrase you wish to search for in the Search for field, and press Search.
OmegaT supports keyword and exact searches. Keyword searches find text fragments containing all the search words, similar to a search using the AND operator (red AND balloons), and can only find whole words. An exact search finds text fragments that contain the search term exactly as entered, whether it is one word or a phrase. Exact searches can look through source and target segments in the current project, translation memories (files in /tm), and any file in a format OmegaT can read, in any selected directory and its subdirectories.
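The two search modes behave quite differently, and a small Python sketch shows how. This is an illustration of the semantics described above, not OmegaT's search code: the function names are hypothetical, and making both searches case-insensitive is an assumption of this example.

```python
from __future__ import annotations

import re


def _words(text: str) -> list[str]:
    """Lower-cased whole words in a text."""
    return re.findall(r"\w+", text.lower())


def keyword_search(fragments: list[str], query: str) -> list[str]:
    """Fragments containing every query word as a whole word, in any order (AND)."""
    wanted = set(_words(query))
    return [f for f in fragments if wanted <= set(_words(f))]


def exact_search(fragments: list[str], phrase: str) -> list[str]:
    """Fragments containing the phrase as typed (case-insensitive by assumption)."""
    needle = phrase.lower()
    return [f for f in fragments if needle in f.lower()]


if __name__ == "__main__":
    fragments = ["Red balloons float.", "The balloons are red.", "A red balloon."]
    print(keyword_search(fragments, "red balloons"))
    print(exact_search(fragments, "red balloons"))
    print(exact_search(fragments, "red balloon"))
```

A keyword search for "red balloons" finds the first two fragments regardless of word order but skips "A red balloon.", because "balloons" is not a whole word there. An exact search for "red balloons" finds only the first fragment, while an exact search for "red balloon" also matches inside "Red balloons", since it is not limited to whole words.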
We've just touched upon the basics of OmegaT, and there is much more to the application than meets the eye. If you want to get the most out of using OmegaT, be sure to read its documentation. Even if you only need to translate a business letter or a product sheet every now and then, you can benefit from using OmegaT.
Dmitri Popov is a freelance contributor and an avid OpenOffice.org user. His articles have appeared in Russian, British, and Danish computer magazines.