A Translation Memory (TM) is a database made up of segments of original source text and the translations of each segment. These segments can be sentences, paragraphs, headings, etc. A pair of a source segment and its translated segment is called a translation unit (TU). A TU is created and stored in the TM as the source text is translated into the target language.
Using TMs has several benefits. Translators can reuse existing translations, which speeds up the translation process and reduces costs. TMs also help ensure that translations are consistent, especially when, for example, different translators are working on the same project for a specific client.
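To make the TU concept concrete, here is a minimal Python sketch of a TM for a single language pair. The names `TranslationUnit`, `TranslationMemory`, `confirm` and `lookup` are hypothetical and only illustrate the data structure; they are not a Memsource API, and the sketch covers exact reuse only, not fuzzy matches.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TranslationUnit:
    """A translation unit: one source segment paired with its translation."""
    source: str
    target: str


class TranslationMemory:
    """Toy in-memory TM for one language pair (hypothetical, not a Memsource API)."""

    def __init__(self, source_lang: str, target_lang: str) -> None:
        self.source_lang = source_lang
        self.target_lang = target_lang
        self._units: Dict[str, TranslationUnit] = {}

    def confirm(self, source: str, target: str) -> None:
        # Store a TU when the translator confirms a segment. In this toy
        # model, confirming the same source again overwrites the old TU.
        self._units[source] = TranslationUnit(source, target)

    def lookup(self, source: str) -> Optional[TranslationUnit]:
        # Exact reuse only: returns None unless the source text is identical.
        return self._units.get(source)


tm = TranslationMemory("en", "de")
tm.confirm("Save the file.", "Speichern Sie die Datei.")
reused = tm.lookup("Save the file.")
print(reused.target if reused else "no match")
print(tm.lookup("Close the file."))
```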
Creating a TM
- Add new TUs directly from within Memsource Editor while translating. A TU is added as soon as the translated segment is confirmed.
- Import a TM from other translation tools (in TMX format) or from MS Excel.
- Align previously translated documents and import the result as XLS into a Memsource translation memory.
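For the TMX route, the sketch below shows the basic shape of a TMX file (one `<tu>` per translation unit, one `<tuv>` per language, the text in `<seg>`) and how its TUs could be read with Python's standard `xml.etree.ElementTree`. The function name `read_tmx` and the sample content are made up for illustration, and the sketch ignores inline markup and metadata that real TMX files may carry.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="example" creationtoolversion="1.0" o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="de"><seg>Speichern Sie die Datei.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the window.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(xml_text: str, src: str, tgt: str) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for TUs that contain both languages."""
    root = ET.fromstring(xml_text)
    pairs: List[Tuple[str, str]] = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Fall back to a plain "lang" attribute in case a file uses one.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() joins all text inside <seg>, including inline elements.
                segs[lang] = "".join(seg.itertext())
        # TUs missing either language (like the second one above) are skipped.
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


for source, target in read_tmx(SAMPLE_TMX, "en", "de"):
    print(source, "->", target)
```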
Consistent segmentation is crucial for retrieving the best TM match. Segmentation rules in Memsource Cloud reflect the specifics of each supported language and can be customized if needed. However, keep in mind that importing Jobs with poor segmentation (e.g. poorly formatted Word files) or applying customized segmentation can affect the retrieved TM match value. An example can be seen in the picture below: the sentence in the second and third segments was manually broken into two lines. As a result, the CAT pane shows only a 63% match, precisely because the second half of the sentence is missing from segment no. 2.
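The effect of poor segmentation can be illustrated with a rough similarity score. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in; this is an assumption for illustration only, since Memsource's actual scoring formula is not described here, so the percentages will differ from what the CAT pane shows. It compares one TM source against a complete sentence and against the two halves produced by a manual line break.

```python
from difflib import SequenceMatcher


def match_percent(segment: str, tm_source: str) -> int:
    """Rough similarity in percent; a stand-in, not Memsource's scoring."""
    return round(SequenceMatcher(None, segment, tm_source).ratio() * 100)


tm_source = "Click Save to store the document in the project folder."

# Consistent segmentation: the whole sentence is one segment.
full_segment = "Click Save to store the document in the project folder."

# Poor segmentation: a manual line break split the sentence into two segments.
first_half = "Click Save to store the document"
second_half = "in the project folder."

print(match_percent(full_segment, tm_source))  # 100
print(match_percent(first_half, tm_source))    # lower: part of the sentence is missing
print(match_percent(second_half, tm_source))
```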
Assigning a TM
Up to 10 TMs can be assigned to each language pair, so a project with two target languages can have up to 20 different TMs. However, a large number of large TMs can slow down the Analysis and Pre-translation processes.
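Because the limit applies per language pair, the maximum number of TM assignments grows with the number of target languages. A small, hypothetical check (the function names and the dictionary layout are illustrative, not part of Memsource) might look like this:

```python
from typing import Dict, List

MAX_TMS_PER_LANGUAGE_PAIR = 10  # limit stated in this article


def max_tms_for_project(target_language_count: int) -> int:
    """Maximum TM assignments: the per-pair limit times the number of target languages."""
    return MAX_TMS_PER_LANGUAGE_PAIR * target_language_count


def check_assignment(assignments: Dict[str, List[str]]) -> None:
    """Raise ValueError if any target language exceeds the per-pair limit.

    `assignments` maps a target language code to the TM names assigned to it.
    """
    for target, tms in assignments.items():
        if len(tms) > MAX_TMS_PER_LANGUAGE_PAIR:
            raise ValueError(
                f"{target}: {len(tms)} TMs assigned, limit is {MAX_TMS_PER_LANGUAGE_PAIR}"
            )


print(max_tms_for_project(2))  # 20
check_assignment({"de": ["TM_DE"], "fr": ["TM_FR_1", "TM_FR_2"]})  # within the limit
```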
Memsource allows you to add a TM to a project with the same language but a different locale. Generally, all languages with the same prefix can be added (en, en_gb, en_uk...). For example, if you create a DE_EN TM, you can assign it to both DE-en-GB and DE-en-US projects. Keep in mind that all TUs will be stored as EN only, with no distinction between US and GB; therefore, using this TM for strictly GB or US projects might be inaccurate.
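The prefix rule can be sketched as a comparison of the part of the language code before the first `_` or `-`. The helper names below are hypothetical, and the sketch assumes codes written like `en`, `en_gb` or `en-US`; the last lines show why TUs stored under the bare prefix no longer record whether they were written for GB or US.

```python
def language_prefix(code: str) -> str:
    """Return the language part of a code: 'en_gb' -> 'en', 'en-US' -> 'en'."""
    return code.replace("-", "_").split("_", 1)[0].lower()


def tm_fits_project(tm_language: str, project_language: str) -> bool:
    """Treat a TM as assignable if both codes share the same prefix."""
    return language_prefix(tm_language) == language_prefix(project_language)


print(tm_fits_project("en", "en_gb"))  # True
print(tm_fits_project("en", "en-US"))  # True
print(tm_fits_project("en", "de"))     # False

# TUs saved from both projects land under the same key, "en", so the
# locale the translation was written for is lost.
stored = {language_prefix(lang) for lang in ("en_gb", "en-US")}
print(stored)  # {'en'}
```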
TM for workflow steps
When selecting TMs for projects with a workflow, the user can decide whether the same translation memories should be selected for all of the project's workflow steps, or whether each workflow step should have its own TM setup.
If you use proofreading on a regular basis, it is recommended to:
- Create a TM for proofreaders only (TM_REV), where only reviewed translations will be stored, and always assign it as READ in the Translation step and WRITE in the Revision step.
- Create a TM for translation only (TM_TRA) and assign it as WRITE in the Translation step and READ in the Revision step.
- You can even set a 2% penalty on TM_TRA so that matches from TM_REV always have priority (101% matches from TM_TRA will be shown as 99% matches; see "Setting Penalties" below).
- After the project is completed and all translations have been reviewed and saved in TM_REV, you can delete TM_TRA.
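The recommended setup amounts to a per-step READ/WRITE table. The sketch below is a hypothetical Python model (the `Access` flag, `WORKFLOW_TMS`, `tms_with` and `confirm_segment` are invented for illustration) showing that a translator's confirmation lands only in TM_TRA, while the reviewed version lands only in TM_REV.

```python
from enum import Flag, auto
from typing import Dict, List


class Access(Flag):
    READ = auto()
    WRITE = auto()


# Recommended setup: TM_REV is read during translation and written during
# revision; TM_TRA is the other way round.
WORKFLOW_TMS: Dict[str, Dict[str, Access]] = {
    "Translation": {"TM_REV": Access.READ, "TM_TRA": Access.WRITE},
    "Revision": {"TM_REV": Access.WRITE, "TM_TRA": Access.READ},
}


def tms_with(step: str, access: Access) -> List[str]:
    """Names of the TMs that have the given access in a workflow step."""
    return sorted(name for name, granted in WORKFLOW_TMS[step].items() if access in granted)


def confirm_segment(step: str, source: str, target: str,
                    memories: Dict[str, Dict[str, str]]) -> None:
    """Save a confirmed segment into every TM that is writable in this step."""
    for name in tms_with(step, Access.WRITE):
        memories[name][source] = target


memories: Dict[str, Dict[str, str]] = {"TM_REV": {}, "TM_TRA": {}}
confirm_segment("Translation", "Save the file.", "Datei speichern.", memories)
confirm_segment("Revision", "Save the file.", "Speichern Sie die Datei.", memories)
print(memories["TM_TRA"])  # {'Save the file.': 'Datei speichern.'}
print(memories["TM_REV"])  # {'Save the file.': 'Speichern Sie die Datei.'}
```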
Please note that you can first select a TM for all workflow steps and all languages, and afterwards select a specific TM for one step and one language (if needed). However, this does not work in the reverse order: you cannot select a TM for a specific step and language and afterwards select a TM for all steps and all languages.
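The ordering rule can be modeled as a project-wide default plus step- and language-specific overrides, where setting the default is refused once overrides exist. This is a hypothetical sketch of the behavior described above (the class and method names are invented), not Memsource's implementation; in particular, it assumes an override fully replaces the default for that step and language.

```python
from typing import Dict, List, Optional, Tuple


class WorkflowTmSettings:
    """Hypothetical model: one default for all steps/languages plus overrides."""

    def __init__(self) -> None:
        self.default: Optional[List[str]] = None
        self.overrides: Dict[Tuple[str, str], List[str]] = {}

    def set_for_all(self, tms: List[str]) -> None:
        # The selection for all steps and languages has to come first.
        if self.overrides:
            raise RuntimeError("select TMs for all steps and languages before specific ones")
        self.default = list(tms)

    def set_for(self, step: str, language: str, tms: List[str]) -> None:
        # A specific selection for one step and one language.
        self.overrides[(step, language)] = list(tms)

    def tms_for(self, step: str, language: str) -> List[str]:
        # Assumption: a specific selection replaces the default entirely.
        return self.overrides.get((step, language), self.default or [])


settings = WorkflowTmSettings()
settings.set_for_all(["TM_MAIN"])
settings.set_for("Revision", "de", ["TM_REV"])
print(settings.tms_for("Translation", "de"))  # ['TM_MAIN']
print(settings.tms_for("Revision", "de"))     # ['TM_REV']

try:
    settings.set_for_all(["TM_OTHER"])  # reverse order is refused
except RuntimeError as err:
    print("refused:", err)
```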
Setting Penalties for Translation Memories
In the Editor's CAT pane, penalized matches are displayed in a new order (a 101% match penalized by 2% becomes a 99% match), with a small arrow indicating the penalty.
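The penalty arithmetic is a subtraction of percentage points followed by re-sorting. A minimal Python sketch, assuming penalties are stored per TM name (the `TmMatch` class and `apply_penalties` function are hypothetical):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TmMatch:
    tm_name: str
    target: str
    score: int  # match value in percent; 101 for an in-context match


def apply_penalties(matches: List[TmMatch], penalties: Dict[str, int]) -> List[TmMatch]:
    """Subtract each TM's penalty (in percentage points) and sort best first."""
    penalized = [
        TmMatch(m.tm_name, m.target, max(0, m.score - penalties.get(m.tm_name, 0)))
        for m in matches
    ]
    return sorted(penalized, key=lambda m: m.score, reverse=True)


matches = [
    TmMatch("TM_TRA", "Datei speichern.", 101),
    TmMatch("TM_REV", "Speichern Sie die Datei.", 100),
]
for m in apply_penalties(matches, {"TM_TRA": 2}):
    print(m.tm_name, m.score)
# TM_REV 100
# TM_TRA 99
```

The original match list is left unchanged, so the unpenalized values remain available if the penalty is later removed.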
Matches in Memsource
- 101% match – In-context match
The segment is identical to a segment stored in the translation memory, including the context. These 101% matches are always overwritten in the TM when the segment is confirmed in the editor. There are 3 types of in-context match, which can be set for jobs (see TM Match Context and Optimization):
- Preceding and following segment - the default setting.
- ID context - based on the segment's key (available for specific file formats only).
- No context - only the source and target are searched and saved in the TM.
- 100% match
- Fuzzy matches
Differences between the source segment and the TM match are also shown in the CAT pane.
- Subsegment matches - S
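The three context options differ in what is stored and compared alongside the source text. The sketch below is an illustrative Python model under that assumption (the mode names `"neighbours"`, `"id"` and `"none"` and all helper names are invented); it returns 101 when the context also matches, 100 when only the source matches, and leaves fuzzy and subsegment matching out.

```python
from typing import Dict, List, Optional, Tuple

ContextKey = Tuple[str, ...]


def context_key(segments: List[str], i: int, mode: str,
                ids: Optional[List[str]] = None) -> ContextKey:
    """Build the key stored with segment i under the given context mode."""
    source = segments[i]
    if mode == "neighbours":  # preceding and following segment (default)
        previous = segments[i - 1] if i > 0 else ""
        following = segments[i + 1] if i + 1 < len(segments) else ""
        return (source, previous, following)
    if mode == "id":  # segment key, only where the file format provides one
        if ids is None:
            raise ValueError("ID context needs segment IDs")
        return (source, ids[i])
    if mode == "none":  # source text only
        return (source,)
    raise ValueError(f"unknown context mode: {mode}")


def save_document(segments: List[str], targets: List[str], mode: str,
                  tm: Dict[ContextKey, str], ids: Optional[List[str]] = None) -> None:
    """Store every confirmed segment under its context key."""
    for i, target in enumerate(targets):
        tm[context_key(segments, i, mode, ids)] = target


def exact_match(segments: List[str], i: int, mode: str, tm: Dict[ContextKey, str],
                ids: Optional[List[str]] = None) -> Optional[int]:
    """Return 101, 100 or None (fuzzy matches are not modeled)."""
    if mode != "none" and context_key(segments, i, mode, ids) in tm:
        return 101
    if any(key[0] == segments[i] for key in tm):
        return 100
    return None


tm: Dict[ContextKey, str] = {}
doc_a = ["Introduction", "Click Save.", "Done."]
save_document(doc_a, ["Einleitung", "Klicken Sie auf Speichern.", "Fertig."], "neighbours", tm)

doc_b = ["Overview", "Click Save.", "Done."]
print(exact_match(doc_a, 1, "neighbours", tm))  # 101: same neighbours
print(exact_match(doc_b, 1, "neighbours", tm))  # 100: preceding segment differs
print(exact_match(doc_b, 0, "neighbours", tm))  # None
```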