Terms can be imported into a term base from Excel (XLS or XLSX) and TBX files. When importing terms, you can decide whether they should be imported as new (meaning that new terms will be created) or whether the existing terms in the term base should be updated.
To launch the Import dialogue window, first click on the name of your term base. Then, click the Import button in the Import/Export section of the page.
Note: The maximum size of a file that can be uploaded is 1 GB.
Importing an XLS or XLSX File
If you do not have a TBX file ready, using an Excel file is the easiest way to import terms into a term base. A plain list of terms can be imported, but more complex terminology imports are also supported (importing synonyms, terms with various attributes, etc.).
Important: Only terms from the first sheet in your Excel file are imported.
If, for example, you have an Excel file with terms in English, German, and Italian, the first thing you need to do is organize the terms into columns, where each column represents a language. Then, in the first row, use the appropriate language code for each language. In this case, they will be en, de, and it. This will ensure that each term is assigned to the correct language.
When you're done, save the Excel file and click on the Import button.
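The column layout described above can be sketched in Python. This is only an illustration of the sheet structure: the `build_sheet` helper and the sample terms are hypothetical, each row is assumed to hold one concept, and writing the rows to an actual Excel file would require a spreadsheet library, which is not shown here.

```python
# Hypothetical helper: lays out terms as rows for the first sheet of an Excel file.
def build_sheet(languages, concepts):
    """Return a header row of language codes followed by one row per concept."""
    rows = [list(languages)]
    for concept in concepts:
        # Leave an empty cell where a concept has no term in a language.
        rows.append([concept.get(lang, "") for lang in languages])
    return rows


# Sample data for illustration only.
concepts = [
    {"en": "agreement", "de": "Vereinbarung", "it": "accordo"},
    {"en": "invoice", "de": "Rechnung", "it": "fattura"},
]

sheet = build_sheet(["en", "de", "it"], concepts)
for row in sheet:
    print("\t".join(row))
```

The first row carries the language codes, so every term below it lands in the column of its language.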
Let's say we want to import the same terms as above. However, this time we also want to import a synonym for one of the English terms ("contract" as a synonym for "agreement"). To do this, we need to add a second en column to the Excel file. The header row will now be en, en, de, it. Place the synonym "contract" in the new en column, in the same row as the word "agreement".
Important: Any synonyms must have their own column with the appropriate language code in the header.
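A minimal sketch of the synonym layout, assuming each concept is given as a list of terms per language. The `build_sheet_with_synonyms` helper and the sample data are hypothetical; the point is only that a language code is repeated in the header once for every synonym column.

```python
# Hypothetical helper: one column per term, repeating a language code for synonyms.
def build_sheet_with_synonyms(languages, concepts):
    # Columns needed per language = the largest number of terms any concept has in it.
    widths = {
        lang: max(len(concept.get(lang, [])) for concept in concepts)
        for lang in languages
    }
    header = [lang for lang in languages for _ in range(widths[lang])]
    rows = [header]
    for concept in concepts:
        row = []
        for lang in languages:
            terms = concept.get(lang, [])
            # Pad with empty cells so every row has the same number of columns.
            row.extend(terms + [""] * (widths[lang] - len(terms)))
        rows.append(row)
    return rows


# Sample data for illustration only.
concepts = [
    {"en": ["agreement", "contract"], "de": ["Vereinbarung"], "it": ["accordo"]},
    {"en": ["invoice"], "de": ["Rechnung"], "it": ["fattura"]},
]

sheet = build_sheet_with_synonyms(["en", "de", "it"], concepts)
for row in sheet:
    print("\t".join(row))
```

Concepts without a synonym simply leave the extra en cell empty.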
Importing Terms with Attributes
The following attributes can be imported together with terms:
- CID (Concept ID - the concept includes the source and all its targets)
- TID (Term ID - the ID of a specific term in a specific language)
- status (either New or Approved)
- forbidden (True or False)
- preferred (True or False)
- case (meaning "case sensitive"; either True or False)
- exact (meaning "exact match"; either True or False)
- created_by (only Memsource usernames are supported)
- created_at (date and time)
- modified_by (only Memsource usernames are supported)
- modified_at (date and time)
To import attributes, add any number of attribute columns next to each term column. Put the name of the attribute in the header and the attribute values in the rows below. For example, status and case columns can be added to import status and case-sensitivity information. Then, save the Excel file and click on the Import button.
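The attribute columns can be sketched as follows. This assumes that "next to" means the attribute columns sit immediately to the right of the term column they describe; the `build_header` and `build_row` helpers and the sample values are hypothetical.

```python
# Hypothetical helpers: each term column is followed by its attribute columns.
def build_header(languages, attributes):
    header = []
    for lang in languages:
        header.append(lang)        # term column, headed by the language code
        header.extend(attributes)  # attribute columns, headed by the attribute name
    return header


def build_row(concept, languages, attributes):
    row = []
    for lang in languages:
        entry = concept.get(lang, {})
        row.append(entry.get("term", ""))
        # Attribute values go in the same row as the term they belong to.
        row.extend(str(entry.get(name, "")) for name in attributes)
    return row


# Sample data for illustration only.
languages = ["en", "de"]
attributes = ["status", "case"]
concept = {
    "en": {"term": "agreement", "status": "Approved", "case": "False"},
    "de": {"term": "Vereinbarung", "status": "New", "case": "True"},
}

header = build_header(languages, attributes)
row = build_row(concept, languages, attributes)
print("\t".join(header))
print("\t".join(row))
```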
Importing Terms with Challenging Morphology
There are several ways that morphology can be handled in Memsource.
- By default, new terms will have their match type set to Fuzzy. This works well for words such as "agreement", since Memsource will also match longer words, like "agreements", to this term. (A word is matched as long as its suffix is not longer than 50% of the term.)
- The Fuzzy match type will not work well for extremely short words, such as abbreviations. It is therefore advisable to set the match type for such terms to exact. This is done by adding a column called exact and entering true in that column.
- To improve matching for terms with rich morphology (for instance, when part of the word changes), a boundary between the word stem (the part that does not change) and the suffix (the part that does change) can be defined by inserting a pipe character ("|"). For example, the Czech term smíšen|ý can also appear as smíšeného, smíšenou, etc. Putting the "|" character before the ý ensures that all three forms will be considered matches.
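The three options above can be illustrated with a simplified sketch. This is not Memsource's actual matching algorithm: the `is_match` function is hypothetical, it ignores case sensitivity, and it assumes that a term containing a pipe matches any word that begins with the stem.

```python
# Hypothetical, simplified model of the matching rules described above.
def is_match(term, word, exact=False):
    if exact:
        # Exact match type: the word must be identical to the term.
        return word == term
    if "|" in term:
        # Pipe notation: only the stem before the "|" has to be present.
        stem = term.split("|", 1)[0]
        return word.startswith(stem)
    # Fuzzy match type: the word may extend the term by a suffix
    # that is no longer than 50% of the term.
    if not word.startswith(term):
        return False
    suffix_length = len(word) - len(term)
    return suffix_length <= 0.5 * len(term)


print(is_match("agreement", "agreements"))   # fuzzy: short suffix
print(is_match("EU", "EUR"))                 # fuzzy: unwanted match for an abbreviation
print(is_match("EU", "EUR", exact=True))     # exact: no match
print(is_match("smíšen|ý", "smíšeného"))     # stem match
```

The second call shows why very short terms benefit from the exact match type: a one-letter suffix is still within 50% of a two-letter term.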
TBX Import Format
Memsource also supports the TBX format for terminology imports (and exports). TBX is a "loose" standard, which means that if a TBX file created in software other than Memsource is imported, some of the metadata may not be imported. However, the import from MultiTerm TBX has been optimized, and the following metadata should be imported correctly into Memsource:
- Timestamps (created at, last modified at)
- The value of the <descrip type="usageNote"> element, which is imported to the usageExample attribute of all the terms of the concept
- The value of the <descrip type="note"> element, which is imported to the note attribute of all the terms of the concept
Note: If you need to transfer terminology between two Memsource term bases, use the TBX format, because within the Memsource environment the data will be imported correctly.
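The mapping of the descrip elements listed above can be sketched with Python's standard XML parser. The sample fragment is a simplified, hypothetical TBX entry (real exports contain a header and more structure), and `read_terms` only mirrors the mapping described in this section; it is not Memsource's importer.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified, hypothetical TBX fragment for illustration only.
SAMPLE_TBX = """<martif type="TBX" xml:lang="en">
  <text><body>
    <termEntry id="c1">
      <descrip type="usageNote">Use in legal documents.</descrip>
      <descrip type="note">Reviewed by the legal team.</descrip>
      <langSet xml:lang="en"><tig><term>agreement</term></tig></langSet>
      <langSet xml:lang="de"><tig><term>Vereinbarung</term></tig></langSet>
    </termEntry>
  </body></text>
</martif>"""


def read_terms(tbx_text):
    """Copy concept-level usageNote and note values onto every term of the concept."""
    root = ET.fromstring(tbx_text)
    terms = []
    for entry in root.iter("termEntry"):
        # Collect the descrip elements that sit directly under the concept.
        descrips = {d.get("type"): (d.text or "") for d in entry.findall("descrip")}
        for lang_set in entry.findall("langSet"):
            for term in lang_set.iter("term"):
                terms.append({
                    "lang": lang_set.get(XML_LANG),
                    "term": term.text,
                    "usageExample": descrips.get("usageNote", ""),
                    "note": descrips.get("note", ""),
                })
    return terms


terms = read_terms(SAMPLE_TBX)
for term in terms:
    print(term)
```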
SDL TBX.XML File
SDL Trados uses a special TBX.XML format instead of the standard TBX format. Because of this, it has different specifications for import.