Import Content to a Term Base

External terminology (or glossary) files can be imported into a Memsource term base in Excel (XLSX) or TBX format. The maximum size of an uploaded file is 1 GB.

To import content, follow these steps:

  1. From a term base page, click Import.

    The Import TBX/XLSX window opens.

  2. Select a file type for import:

    • XLSX

      An XML-based file format for spreadsheet applications.

      XLSX is the easiest way to import terms into a term base. A plain list of terms can be imported, but more complex terminology imports are also supported (import of synonyms, morphology, terms with various attributes, etc.).

    • TBX

      An exchange format for use in other CAT tools. Can also be used for editing content in external tools such as Okapi Olifant.

  3. Select options:

    • Create new terms

    • Update existing terms

    • Strict locale matching

      Prevents the import of a language if it has a different locale than the project.

      Example:

      A file designated EN_US will not be imported into a term base designated only EN rather than EN_US.
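The strict locale check can be pictured as a comparison of language codes. The sketch below is illustrative only: the function name is hypothetical, and the behavior with the option switched off (comparing only the base language) is an assumption, since the text above describes only the strict case.

```python
def should_import(file_locale: str, target_locale: str, strict: bool) -> bool:
    """Decide whether a language from the file is imported (illustrative only)."""
    file_code = file_locale.lower().replace("-", "_")
    target_code = target_locale.lower().replace("-", "_")
    if strict:
        # Strict matching: language and locale must be identical.
        return file_code == target_code
    # Assumption: without strict matching only the base language is compared.
    return file_code.split("_")[0] == target_code.split("_")[0]


print(should_import("EN_US", "EN", strict=True))     # False
print(should_import("EN_US", "EN_US", strict=True))  # True
```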

Term Metadata in a Term Base

Every term in a term base has a list of attributes that can be exported to, or imported from, TBX or XLSX files. Some of these attributes can be edited directly in the term settings or externally in the XLSX or TBX file.

The attributes of a term base (Client, Domain, etc.) have no effect on the individual term attributes.

XLSX Files

  • CID

    Memsource Concept ID (needed for reimport of updated terms). The term concept includes the source and all its targets and synonyms.

  • concept_domain

  • concept_subdomain

  • concept_url

  • concept_definition

  • concept_note

  • TID

    Memsource Term ID (needed for reimport of updated terms). The ID of the specific term in the specific language.

  • {Language code}

    A term's language code based on our supported languages.

  • status

    Either New or Approved

  • forbidden

    True or False

  • preferred

    True or False

  • case

    Whether the term is case sensitive. Either True or False.

  • exact

    Whether the term requires an exact match. Either True or False.

  • note

    Only the target note will be displayed in the Editor

  • usage

    Only the target usage will be displayed in the Editor

  • POS

    Part of Speech

  • gender

  • number

  • short_translation

  • term_type

  • created_by

    Only Memsource usernames are supported

  • created_at

    Date and time of the term creation

  • modified_by

    Only Memsource usernames are supported

  • modified_at

    Date and time of the last modification of the term

TBX Files

  • <descrip type="conceptId">

    Memsource Concept ID (needed for reimporting updated terms). The Term concept includes the source and all its targets and synonyms.

  • <descrip type="conceptDefinition">

  • <descrip type="conceptDomain">

  • <descrip type="conceptNote">

  • <descrip type="conceptSubdomain">

  • <descrip type="conceptUrl">

  • <langSet xml:lang="cs">

    A term's language code based on supported languages.

  • <termNote type="termId">

    Memsource Term ID (needed for reimporting updated terms). This is the ID of the specific term in the specific language.

  • <note>

    Term's note

  • <termNote type="partOfSpeech">

  • <termNote type="grammaticalGender">

  • <termNote type="grammaticalNumber">

  • <termNote type="usageNote">

  • <termNote type="forbidden"> 

    True or False

  • <termNote type="preferred"> 

    True or False

  • <termNote type="exactMatch"> 

    True or False

  • <termNote type="status"> 

    New or Approved

  • <termNote type="caseSensitive"> 

    True or False

  • <termNote type="createdBy"> 

    Memsource Username

  • <termNote type="createdAt"> 

    Unix time

  • <termNote type="lastModifiedBy"> 

    Memsource Username

  • <termNote type="lastModifiedAt"> 

    Unix time

  • <termNote type="shortTranslation">

  • <termNote type="termType">
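To see how these elements fit together, the following minimal sketch reads a small TBX-like fragment with Python's standard library. The nesting used here (termEntry, langSet, tig, term) follows the general TBX layout and is an assumption; a file exported from Memsource may be structured differently, so check a real export before relying on it. The sample values and the read_terms helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample; element nesting is assumed, not taken from a real export.
SAMPLE = """<termEntry>
  <descrip type="conceptDomain">Law</descrip>
  <langSet xml:lang="en">
    <tig>
      <term>Agreement</term>
      <termNote type="status">Approved</termNote>
      <termNote type="preferred">True</termNote>
    </tig>
  </langSet>
  <langSet xml:lang="de">
    <tig>
      <term>Abkommen</term>
      <termNote type="status">New</termNote>
    </tig>
  </langSet>
</termEntry>"""


def read_terms(entry_xml):
    """Return (concept attributes, list of per-term attribute dicts)."""
    entry = ET.fromstring(entry_xml)
    concept = {d.get("type"): d.text for d in entry.findall("descrip")}
    terms = []
    for lang_set in entry.findall("langSet"):
        for tig in lang_set.findall("tig"):
            attrs = {n.get("type"): n.text for n in tig.findall("termNote")}
            terms.append(
                {"lang": lang_set.get(XML_LANG), "term": tig.findtext("term"), **attrs}
            )
    return concept, terms


concept, terms = read_terms(SAMPLE)
print(concept)
print(terms)
```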

Prepare XLSX for Import to a Term Base

XLSX files must be formatted in a specific manner before being imported into a term base.

To prepare the file, follow these steps:

  1. In the XLSX file, organize all terms into columns with each column representing one language.

  2. In the first row, apply the language code for each language.

    Example:

          A                 B                      C
      1   en                de                     it
      2   Agreement         Abkommen               accordo
      3   Joint Committee   Gemischte Kommission   Commissione mista
      4   Federal Council   Bundesrat              Consiglio federale

  3. Save the file.

Synonyms

Synonyms can be accommodated by adding a second column with the same language code.

Example:

en                en         de
Agreement         Contract   Abkommen
Joint Committee              Gemischte Kommission
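The synonym layout above can also be generated from structured data. The following minimal sketch builds the rows only; writing them to an .xlsx file would need a spreadsheet application or library and is not shown. The function name and the input shape (a list of terms per language) are assumptions made for illustration.

```python
def build_synonym_rows(concepts, languages):
    """Return a header row of language codes followed by one row per concept.

    A language gets as many columns as its longest synonym list, and the
    language code is repeated in the header for each of those columns.
    """
    widths = {}
    for lang in languages:
        longest = max((len(c.get(lang, [])) for c in concepts), default=0)
        widths[lang] = max(longest, 1)

    header = [lang for lang in languages for _ in range(widths[lang])]
    rows = [header]
    for concept in concepts:
        row = []
        for lang in languages:
            terms = list(concept.get(lang, []))
            # Pad with empty cells so every row lines up with the header.
            row.extend(terms + [""] * (widths[lang] - len(terms)))
        rows.append(row)
    return rows


concepts = [
    {"en": ["Agreement", "Contract"], "de": ["Abkommen"]},
    {"en": ["Joint Committee"], "de": ["Gemischte Kommission"]},
]
for row in build_synonym_rows(concepts, ["en", "de"]):
    print("\t".join(row))  # tab-separated text can be pasted into a spreadsheet
```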

Terms with Attributes

Terms can be imported with specified attributes. Some are generated by Memsource and are available only in files exported from a Memsource TB.

To apply an attribute to a term, follow these steps:

  1. Place a column with the attribute name after each term or synonym column.

  2. Place the value of the attribute in the row with the associated term.
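As a rough illustration of these two steps, the sketch below lays out a header in which each term column is followed by its attribute columns. The function name and input shape are hypothetical, and the attribute names (status, note) are taken from the XLSX attribute list; compare the result with a file exported from a Memsource term base before importing.

```python
def build_attribute_rows(entries, languages, attributes):
    """Return rows where each language column is followed by its attribute columns."""
    header = []
    for lang in languages:
        header.append(lang)
        header.extend(attributes)

    rows = [header]
    for entry in entries:
        row = []
        for lang in languages:
            term = entry.get(lang, {})
            row.append(term.get("term", ""))
            # Attribute values go in the same row as the term they describe.
            row.extend(term.get(attr, "") for attr in attributes)
        rows.append(row)
    return rows


entries = [
    {
        "en": {"term": "Agreement", "status": "Approved", "note": "legal"},
        "de": {"term": "Abkommen", "status": "New"},
    }
]
for row in build_attribute_rows(entries, ["en", "de"], ["status", "note"]):
    print("\t".join(row))
```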

Terms with Challenging Morphology

Terms that are being imported follow the same morphology rules as terms created directly in a term base.

Apart from working with synonyms and Fuzzy/Exact matches, a pipe character can be added as a boundary between the word stem (the part that does not change) and the suffix (the part that does change).

Example:

The term smíšen|ý in Czech can also come up as smíšeného, smíšenou, etc. Putting the | character before the ý ensures that all three endings will be considered matches.
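The effect of the pipe can be pictured as prefix matching on the stem. This is only a simplified illustration under that assumption; how Memsource actually matches terms is not described here, and the function name is hypothetical.

```python
def matches_term(term_with_pipe: str, word: str) -> bool:
    """Return True if the word starts with the stem that precedes the pipe."""
    stem, _, _suffix = term_with_pipe.partition("|")
    return word.startswith(stem)


for word in ["smíšený", "smíšeného", "smíšenou", "smích"]:
    print(word, matches_term("smíšen|ý", word))
```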

TBX Import Format

Memsource also supports the TBX format for terminology imports (and exports). The TBX standard is a "loose" standard. If a TBX file is imported from another CAT tool, some metadata may not get imported.

See Term Metadata in a Term Base for more details.

When transferring terminology between two Memsource term bases, use the TBX format. Within the Memsource environment, the data will be imported correctly.

SDL Trados uses a special TBX.XML format and it has different specifications for import.

Multiterm TBX

The import process from Multiterm TBX files has been optimized and the following metadata will be imported into Memsource:

  • Timestamps (created at, last modified at)

  • The value of the <descrip type="usageNote"> element, imported to the usage attribute of all terms of the concept

  • The value of the <descrip type="note"> element, imported to the note attribute of all terms of the concept

Import TBX.xml from SDL Trados

SDL Trados does not support the TBX format for term bases and uses the XML format with a TBX schema. Importing this XML format is supported but not with all attributes.

Attributes specified for the whole term concept will be added to the Note of every individual term (each language, each synonym, etc.).

Imported attributes:

  • Source

  • Target

  • Synonyms

  • Date of Creation

  • Date of Modification

  • Names of Author and Reviewer

    These will be imported only if the name is the same as the username of an existing Memsource user. You can either edit the names in the TBX.xml or add the users to Memsource.

  • Customized Attributes

    These will be imported into the term’s Note. Every attribute will have a separate line starting with the attribute’s name. For example:

    • Origin: Wikipedia

    • Theme: Law

    • Status: New

Edit the TBX.xml Before Import

To make the best use of your data, edit the TBX.xml file before importing it to Memsource. Open the file in a text editor whose Search and Replace feature supports multiline regular expressions (such as Notepad++).

Editing Note, Usage and Status

Customized attributes in TBX.xml files have the following format. The actual attribute names will vary because they are not standardized:

<descripGrp>
<descrip type="Comment">term =API= should not be translated</descrip>
</descripGrp>
<descripGrp>
<descrip type="Definition">API = application programming interface</descrip>
</descripGrp>
<descripGrp>
<descrip type="Example">Memsource offers a set of API calls.</descrip>
</descripGrp>
<descripGrp>
<descrip type="Status">confirmed</descrip>
</descripGrp>

These attributes will be automatically imported into the Note in Memsource:

  • Comment: term =API= should not be translated

  • Definition: API = application programming interface

  • Example: Memsource offers a set of API calls.

  • Status: confirmed

This behavior can be changed. For example, to:

  • Import only the Comment as a Memsource Note

  • Import the Example as Usage

  • Import the Status as Approved or New

  • Skip the import of the Definition

edit the TBX.xml file to match the Memsource format for TBX files:

<note>term =API= should not be translated</note>
<termNote type="usageNote">Memsource offers a set of API calls.</termNote>
<termNote type="status">Approved</termNote>

Changing Comment to Note

Search:

<descripGrp>.[^\<]+<descrip type="Comment">([^\<]+)</descrip>.[^\<]+</descripGrp>

Replace:

<note>\1</note>

Changing Example to Usage

Search:

<descripGrp>.[^\<]+<descrip type="Example">([^\<]+)</descrip>.[^\<]+</descripGrp>

Replace:

<termNote type="usageNote">\1</termNote>

Setting Status to Approved 

Search:

<descripGrp>.[^\<]+<descrip type="Status">[^\<]+</descrip>.[^\<]+</descripGrp>

Replace:

<termNote type="status">Approved</termNote>

Deleting Definition

Search:

<descripGrp>.[^\<]+<descrip type="Definition">([^\<]+)</descrip>.[^\<]+</descripGrp>

Replace:

Leave the Replace field empty.
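The same four replacements can be scripted instead of run by hand in an editor. The sketch below is an adaptation, not the exact patterns above: it uses \s* between tags so that it also works on files with LF line endings and no indentation. The function names are hypothetical, and the attribute names (Comment, Example, Status, Definition) are the sample names from this article; real files will use their own.

```python
import re


def descrip_pattern(name: str) -> re.Pattern:
    """Match one <descripGrp> holding a single <descrip> of the given type."""
    return re.compile(
        r'<descripGrp>\s*<descrip type="' + re.escape(name)
        + r'">([^<]+)</descrip>\s*</descripGrp>'
    )


def convert_custom_attributes(xml_text: str) -> str:
    # Comment becomes the Memsource note.
    xml_text = descrip_pattern("Comment").sub(r"<note>\1</note>", xml_text)
    # Example becomes the usage note.
    xml_text = descrip_pattern("Example").sub(
        r'<termNote type="usageNote">\1</termNote>', xml_text
    )
    # Any Status value is mapped to Approved, as in the manual replacement.
    xml_text = descrip_pattern("Status").sub(
        '<termNote type="status">Approved</termNote>', xml_text
    )
    # Definition is dropped, together with the line break that follows it.
    xml_text = re.sub(descrip_pattern("Definition").pattern + r"\s*", "", xml_text)
    return xml_text


sample = """<descripGrp>
<descrip type="Comment">term =API= should not be translated</descrip>
</descripGrp>
<descripGrp>
<descrip type="Definition">API = application programming interface</descrip>
</descripGrp>
<descripGrp>
<descrip type="Example">Memsource offers a set of API calls.</descrip>
</descripGrp>
<descripGrp>
<descrip type="Status">confirmed</descrip>
</descripGrp>"""

print(convert_custom_attributes(sample))
```

Work on a copy of the TBX.xml file, and review the result before importing it.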

Adding an Author to Note

To add an author to a Note, remove the author from the <transacGrp / origination> element and add it to a <descrip> element.

<transacGrp>
<transac type="terminologyManagementTransactions">origination</transac>
<date>2006-09-27T11:25:19</date>
<transacNote type="responsibility">MikeS</transacNote>
</transacGrp>

should be replaced by:

<transacGrp>
<transac type="terminologyManagementTransactions">origination</transac>
<date>2006-09-27T11:25:19</date>
</transacGrp>
<descripGrp>
<descrip type="author">MikeS</descrip>
</descripGrp>

The regular expression will be:

Search:

(origination</transac>.[^\<]+<date>[^\<]+</date>.[^\<]+)<transacNote type="responsibility">([^\<]+)</transacNote>.[^\<]+</transacGrp>

Replace:

\1</transacGrp>\r\n<descripGrp>\r\n<descrip type="author">\2</descrip>\r\n</descripGrp>

Adding Edited by to a Note

To add Edited by to a Note, remove the editor from the <transacGrp / modification> element and add it to a <descrip> element.

<transacGrp>
<transac type="terminologyManagementTransactions">modification</transac>
<date>2006-09-27T11:25:19</date>
<transacNote type="responsibility">lauraB</transacNote>
</transacGrp>

should be replaced by:

<transacGrp>
<transac type="terminologyManagementTransactions">modification</transac>
<date>2006-09-27T11:25:19</date>
</transacGrp>
<descripGrp>
<descrip type="Edited by">lauraB</descrip>
</descripGrp>

The regular expression will be:

Search:

(modification</transac>.[^\<]+<date>[^\<]+</date>.[^\<]+)<transacNote type="responsibility">([^\<]+)</transacNote>.[^\<]+</transacGrp>

Replace:

\1</transacGrp>\r\n<descripGrp>\r\n<descrip type="Edited by">\2</descrip>\r\n</descripGrp>
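
The author and editor rewrites follow the same pattern, so they can be handled by one scripted helper. This is a minimal sketch with a hypothetical function name; it uses \s* between tags rather than the exact editor patterns above, and it writes LF line breaks instead of \r\n.

```python
import re


def move_responsibility(tbx_xml: str, transaction: str, label: str) -> str:
    """Move the responsibility name of one transaction type into a <descrip>."""
    pattern = re.compile(
        r"(" + re.escape(transaction) + r"</transac>\s*<date>[^<]+</date>\s*)"
        r'<transacNote type="responsibility">([^<]+)</transacNote>\s*</transacGrp>'
    )

    def rewrite(match: re.Match) -> str:
        # Keep the transaction and date, close the group, then add the name
        # as a custom attribute so that it ends up in the term's Note.
        return (
            f"{match.group(1)}</transacGrp>\n"
            "<descripGrp>\n"
            f'<descrip type="{label}">{match.group(2)}</descrip>\n'
            "</descripGrp>"
        )

    return pattern.sub(rewrite, tbx_xml)


sample = """<transacGrp>
<transac type="terminologyManagementTransactions">origination</transac>
<date>2006-09-27T11:25:19</date>
<transacNote type="responsibility">MikeS</transacNote>
</transacGrp>"""

print(move_responsibility(sample, "origination", "author"))
```

Calling the helper with "modification" and "Edited by" covers the editor case in the same way.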