File Import settings

XML

Introduction

The XML file filter by default imports files with the following extensions:

  • .xml

Default XML Settings

The XML file format was not designed primarily for translation, so additional settings are sometimes needed to ensure that the appropriate content is extracted for translation.

XML file settings can be modified when adding a new job for translation to a project. The import options described below configure the Memsource XML import filter.

The default settings (which have an asterisk in the Elements field) import all XML elements for translation. To import files with anything other than this default behavior, use one of the two options below:

  • Plain import rules
  • XPath

Plain Import Rules

This method gives you a simple way of specifying which elements and attributes should be extracted for translation. You can customize these rules using the following options:

  • Elements: Only the selected elements (for example name, title, para) will be imported. Use an asterisk (*) to import all elements.
  • Attributes: Only the selected attributes (for example note) will be imported for translation. Use an asterisk (*) to import all attributes.
  • Translatable inline elements: If the Identify inline elements automatically option is selected, Memsource will import all elements in the translatable text as Translatable inline elements.
  • Non-translatable inline elements: The selected inline elements (for example productname) will be converted into Memsource tags and their content will not be translatable.
  • Identify inline elements automatically: With this option enabled, elements that are neighbors of text nodes will be automatically converted to inline tags.
  • Elements (processed as HTML): The selected elements (for example code) will be processed as HTML. Please note that HTML Import Settings such as Preserve Whitespace or Break segment on <br> tag can be used for these elements.
  • Locked elements: The selected elements will be imported as Locked.
  • Locked attributes: The selected attributes will be imported as Locked.
  • Import XML entities: When selected, XML entities declared in the DTD will be included for translation.
  • A line break creates a new segment: This option should be used rarely. Normally, a new line in an XML file should not create a new segment.
  • Segment XML: Deselect this option if the text should not be segmented.
  • Convert to Memsource tags (use regexp): Use a regular expression to convert specific text to tags.
  • Convert to character entities: Enter a comma-separated list of the character references that should be used in the output file. For example, to have quotes (") written as &quot; and the character Σ written as &#x3A3;, enter &quot;,&#x3A3;. Please note that & and < are always exported as &amp; and &lt; respectively.
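
As a rough illustration of what the Convert to character entities option does, the Python sketch below applies a comma-separated list of character references to a piece of output text. The function name convert_to_entities is hypothetical and the logic is an assumption made for this example; it is not Memsource's actual implementation. It also relies on Python's html.unescape, which knows HTML entity names in addition to the XML ones.

```python
import html

def convert_to_entities(text, entity_list):
    """Hypothetical helper: write the listed characters as character references."""
    # & and < are always exported as references, whatever the list contains.
    mapping = {"&": "&amp;", "<": "&lt;"}
    for ref in entity_list.split(","):
        ref = ref.strip()
        char = html.unescape(ref)  # "&quot;" -> '"', "&#x3A3;" -> "Σ"
        # Keep only references that resolve to a single, not yet mapped character.
        if ref and len(char) == 1 and char not in mapping:
            mapping[char] = ref
    # Map each character once so inserted references are never escaped again.
    return "".join(mapping.get(c, c) for c in text)

converted = convert_to_entities('Say "Σ" & a < b', "&quot;,&#x3A3;")
print(converted)
```

Characters that are not in the list, and are not & or <, are written to the output unchanged.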

XML Settings Using XPath

Translatable content can also be defined using the XPath query language. This method allows for the creation of complex import rules and offers some additional features that the Plain Import Rules lack, but it requires familiarity with XPath. The following additional options are available:

  • Context key: This is the value that constitutes TM context (101% matches) if applicable.
  • Context note: Import elements or attributes as the context note for each element.
  • Max. target length: Import elements or attributes as the maximum target length for each element.
  • Preserve whitespaces: Leave this empty to preserve whitespace only in elements that use xml:space="preserve". Enter //* to preserve all whitespace in all elements, or enter any other XPath expression.
  • Convert character entities: Enter a comma-separated list of the character references (such as &quot; or &#x3A3;) that are required in the output file. Please note that & and < are always exported as &amp; and &lt; respectively.
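
The effect of xml:space="preserve" can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the function collect_text and the sample element names are hypothetical, and collapsing runs of whitespace is assumed here as the non-preserving behavior rather than taken from Memsource.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def collect_text(elem, preserve=False, out=None):
    """Hypothetical helper: gather element text, keeping whitespace only where asked."""
    if out is None:
        out = []
    # xml:space applies to the element and its descendants until overridden.
    setting = elem.get(XML_SPACE)
    if setting == "preserve":
        preserve = True
    elif setting == "default":
        preserve = False
    if elem.text and elem.text.strip():
        text = elem.text if preserve else " ".join(elem.text.split())
        out.append((elem.tag, text))
    for child in elem:
        collect_text(child, preserve, out)
    return out

doc = ET.fromstring(
    "<doc><para>  loose   text </para>"
    '<code xml:space="preserve">  keep   this </code></doc>'
)
result = collect_text(doc)
print(result)
```

In this sample, only the code element keeps its leading, trailing, and repeated spaces.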

A subset of XPath 1.0 is accepted with the following limitations:

  • Axis in step
    • Supported: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self
    • Not supported: following, preceding, following-sibling, preceding-sibling, namespace
  • Predicate
    • Supported: conditions on the current node or ancestor nodes and their properties (attributes, namespaces)
    • Not supported (examples): position numbers; the axes child::, descendant::, descendant-or-self::, following::, preceding::, following-sibling::, preceding-sibling::; the function last()

Please note that the XPath expression should define the elements and/or attributes whose text/value should be translated and not the actual text node. See our article on XPath for more information.
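
The difference between selecting an element and selecting its text node can be sketched in Python. The element and attribute names below are made up for the example, and Python's ElementTree implements its own small XPath subset, which is not the same as the subset Memsource accepts; the sketch only shows the idea of an expression that targets elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<book>"
    "<title>User Guide</title>"
    '<para translate="yes">Press the button.</para>'
    '<para translate="no">SKU-1234</para>'
    "</book>"
)

# The expression selects para ELEMENTS, filtered by a predicate on an
# attribute of the current node. It does not address the text nodes.
selected = doc.findall(".//para[@translate='yes']")

# The translatable text is then read from each selected element.
texts = [el.text for el in selected]
print(texts)
```

In full XPath 1.0 syntax the same selection would be written as //para[@translate='yes'], whereas //para[@translate='yes']/text() would address the text nodes themselves, which is what the note above advises against.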

HTML Preview with XSLT stylesheet

XSLT (Extensible Stylesheet Language Transformations) can be used to transform XML documents into HTML for in-context preview purposes. Memsource currently supports XSLT 2.0.

To import an XML file with a stylesheet (XSL or XSLT), navigate to the bottom of the XML import settings and choose XSLT file. When you import an XML file with a stylesheet, the Preview Translation feature (found in the Editor by clicking on Document and selecting Preview Translation) will generate an HTML preview instead of an XML one. (Please see this video for more information.) Once used for file import, the XSLT file can also be downloaded from the File Import Settings page.

CDATA in XML file

CDATA stands for Character Data. A CDATA section is a block of text that the parser does not treat as markup, even if it contains characters that would otherwise be recognized as markup. Escaping such characters with the predefined entities &lt;, &gt;, and &amp; takes extra typing and makes the text harder to read, so a CDATA section can be used instead.

If the source file contains CDATA and the Segment XML option is selected, Memsource will wrap every segment in its own CDATA section in the Completed file.

Source:

<text><![CDATA[Translatable text A. Translatable text B.]]></text>

Target:

<text><![CDATA[Translatable text A.]]><![CDATA[ ]]><![CDATA[Translatable text B.]]></text>

This is the correct behavior. The Completed file is valid XML and the XML viewer will display the text correctly as "Translatable text A. Translatable text B."
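
One way to check this yourself is to parse both versions and compare the resulting text. The Python sketch below does so with the standard library's ElementTree parser, chosen here only for illustration; any conforming XML parser should behave the same way.

```python
import xml.etree.ElementTree as ET

source = "<text><![CDATA[Translatable text A. Translatable text B.]]></text>"
target = (
    "<text><![CDATA[Translatable text A.]]><![CDATA[ ]]>"
    "<![CDATA[Translatable text B.]]></text>"
)

# Adjacent CDATA sections are joined into one run of character data.
source_text = ET.fromstring(source).text
target_text = ET.fromstring(target).text

print(source_text == target_text)
```

Both documents yield the same character data, so splitting the content into several CDATA sections does not change what an XML consumer reads.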