File Import Settings

XML (Extensible Markup Language)

The XML file format is not designed for translation and requires additional settings for successful import.

Default settings are marked with an asterisk (*) and will import all XML elements for translation. Import options can be used to change the import behavior.
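As a rough illustration of what an element rule does, the following Python sketch keeps the text of elements whose names appear in a list, with an asterisk (*) matching every element. The helper `select_elements` and the sample document are hypothetical and only approximate the importer: the sketch reads each element's own text and ignores attributes, inline elements, and tail text.

```python
import xml.etree.ElementTree as ET
from typing import List


def select_elements(xml_text: str, names: List[str]) -> List[str]:
    """Hypothetical plain import rule: return the text of elements named in `names`.

    An entry of "*" matches every element. Only each element's own leading
    text is collected; children, tails, and attributes are ignored.
    """
    root = ET.fromstring(xml_text)
    texts = []
    for elem in root.iter():  # document order, root included
        if ("*" in names or elem.tag in names) and elem.text and elem.text.strip():
            texts.append(elem.text.strip())
    return texts


sample = "<doc><title>Intro</title><para>Hello.</para><id>42</id></doc>"
print(select_elements(sample, ["title", "para"]))  # ['Intro', 'Hello.']
print(select_elements(sample, ["*"]))              # ['Intro', 'Hello.', '42']
```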

File Types

  • .XML

Import Options

Plain Import Rules

  • Elements

    Only the selected elements (e.g. name, title, para) are imported. An asterisk (*) imports all elements.

  • Attributes

    Only the selected attributes (e.g. name, title, para) are imported. An asterisk (*) imports all attributes.

  • Translatable inline elements

    If the Identify inline elements automatically option is selected, all elements in the translatable text are imported as Translatable inline elements.

  • Non-translatable inline elements

    The selected inline elements (e.g. name, title, para) are converted into Memsource tags, and their content is not translatable.

  • Identify inline elements automatically

    Elements that are neighbors of text nodes will be automatically converted to inline tags.

  • Elements (processed as HTML)

    The content of the selected elements is processed as HTML. HTML import settings such as Preserve whitespaces or Break tag (<br/>) creates new segment can then be applied to these elements.

  • Locked elements

    The selected elements will be imported as Locked.

  • Locked attributes

    The selected attributes will be imported as Locked.

  • Import XML entities

    XML entities declared in the DTD (Document Type Definition) are imported for translation.

  • Segment XML

    Deselect if segmentation is not desired.

  • Import comments

  • Convert to Memsource tags

    Apply regular expressions to convert specified text to tags.

  • Convert to character entities

    Enter a comma-separated list of character references. The matching characters are written as these references in the output file.

    Example:

    To export quotation marks (") as &quot; and the character Σ as &#x3A3;, enter &quot;,&#x3A3;. The characters & and < are always exported as &amp; and &lt;, respectively.
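The Convert to character entities behavior described above can be approximated in a few lines of Python. This is a sketch under the assumption that each configured reference replaces its character verbatim, while & and < are always escaped; `parse_entity_list` and `escape_text` are hypothetical helpers, not part of Memsource.

```python
import html


def parse_entity_list(setting: str) -> dict:
    """Map each character named in a comma-separated setting to the reference as written."""
    refs = {}
    for item in setting.split(","):
        item = item.strip()
        if item:
            refs[html.unescape(item)] = item  # e.g. '&quot;' -> '"', '&#x3A3;' -> 'Σ'
    return refs


def escape_text(text: str, refs: dict) -> str:
    """Escape & and < unconditionally; replace configured characters with their references."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        else:
            out.append(refs.get(ch, ch))
    return "".join(out)


refs = parse_entity_list("&quot;,&#x3A3;")
print(escape_text('Say "Σ" & <go>', refs))  # Say &quot;&#x3A3;&quot; &amp; &lt;go>
```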

XML Settings Using XPath

Using the XPath query language allows for the creation of complex import rules and some additional features unavailable in Plain Import Rules.

The XPath expression should select the elements and/or attributes whose text or value should be translated, not the text nodes themselves.

Familiarity with XPath is recommended before using this option.
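To illustrate the difference between selecting an element and selecting its text node, the Python sketch below uses xml.etree.ElementTree. ElementTree implements only a small XPath subset (a leading //title is written .//title, and text() is not available), and the sample document is invented for illustration. The expressions select elements; their text is read afterwards, which mirrors the advice above.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<book><title lang="en">Manual</title>'
    "<chapter><title>Setup</title><para>Install it.</para></chapter></book>"
)

# The expression selects the <title> elements themselves (like //title) ...
titles = doc.findall(".//title")
# ... and their text is read from the selected elements.
print([t.text for t in titles])  # ['Manual', 'Setup']

# Attributes are reached through their element: the lang value of titles that have one.
print([t.get("lang") for t in doc.findall(".//title[@lang]")])  # ['en']
```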

Context note, Context key, and Max. target length are not processed for files with more than 10,000 XML elements.

  • Context key

    Constitutes TM context (101% matches) if applicable.

  • Context note

    Import elements or context attributes for each element.

  • Max. target length

    Import elements or the maximum target length for each element.

  • Preserve whitespaces

    Leave empty to preserve whitespace only in elements that have xml:space='preserve'. Enter //* to preserve whitespace in all elements, or enter any other XPath expression to select specific elements.
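The XML 1.0 specification defines the xml:space attribute, whose value is inherited by descendant elements unless a descendant overrides it. The Python sketch below resolves that inheritance for a given element. It illustrates the XML rule only and does not describe how Memsource implements Preserve whitespaces; `preserves_space` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml: prefix under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def preserves_space(root: ET.Element, target: ET.Element) -> bool:
    """Return True if the nearest xml:space on target or its ancestors is 'preserve'."""
    parent = {child: p for p in root.iter() for child in p}  # child -> parent map
    node = target
    while node is not None:
        value = node.get(XML_SPACE)
        if value is not None:
            return value == "preserve"
        node = parent.get(node)
    return False  # no xml:space anywhere: the default handling applies


root = ET.fromstring(
    '<doc><code xml:space="preserve"><line>  a  </line></code>'
    '<p>  b  </p><pre xml:space="default">x</pre></doc>'
)
print(preserves_space(root, root.find("code/line")))  # True (inherited from <code>)
print(preserves_space(root, root.find("p")))          # False
```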

HTML Preview with XSLT stylesheet

XSLT language (Extensible Stylesheet Language Transformations) can be used to transform XML documents into HTML format for in-context preview purposes. Memsource currently supports XSLT 2.0.

Click Choose file to import a stylesheet.

Click Download XSLT to download the stylesheet after file import.

CDATA in XML file

CDATA stands for character data. A CDATA section is a block of text that the parser does not interpret as markup, so characters such as <, >, and & can be written literally. Writing the predefined entities &lt;, &gt;, and &amp; instead requires extra typing and makes the markup hard to read; in such cases, a CDATA section can be used.

If the source file contains CDATA and the Segment XML option is selected, each segment is wrapped in its own CDATA section in the Completed file.

CDATA content is only segmented where there is a clear segment break, such as punctuation or spacing.

Source:

<text><![CDATA[Translatable text A. Translatable text B.]]></text>

Target:

<text><![CDATA[Translatable text A.]]><![CDATA[ ]]><![CDATA[Translatable text B.]]></text>

The Completed file is valid XML, and an XML viewer displays the text correctly as Translatable text A. Translatable text B.
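The claim that the split CDATA sections read back as the original text can be checked with any conforming parser. The Python sketch below parses the Source and Target samples with xml.etree.ElementTree, which merges adjacent CDATA sections into one run of character data.

```python
import xml.etree.ElementTree as ET

source = "<text><![CDATA[Translatable text A. Translatable text B.]]></text>"
target = (
    "<text><![CDATA[Translatable text A.]]><![CDATA[ ]]>"
    "<![CDATA[Translatable text B.]]></text>"
)

# Adjacent CDATA sections become one text value, so both documents expose the same content.
print(ET.fromstring(target).text)  # Translatable text A. Translatable text B.
print(ET.fromstring(source).text == ET.fromstring(target).text)  # True
```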

Application Specific Settings

WordPress XML

Recommended settings for WordPress XML:

  • XML

    XPath

  • Elements & attributes

    //*[local-name()='encoded']|//description|//title

  • Elements (processed as HTML)

    //*[local-name()='encoded']|//description|//title

  • Convert to Memsource tags

    (\[[^\]]++\])++

Select Preserve whitespaces under HTML settings.
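To see what the recommended Convert to Memsource tags expression captures, the Python sketch below applies an equivalent pattern to a sample post. Python's re module accepts possessive quantifiers such as ++ only from version 3.11, so the sketch uses the greedy form (?:\[[^\]]+\])+, which matches the same text because [^\]]+ can never consume the closing bracket. The numbered {1} placeholders and the `protect_shortcodes` helper are hypothetical stand-ins for Memsource tags.

```python
import re

# Greedy equivalent of (\[[^\]]++\])++: one or more adjacent [bracketed] runs.
SHORTCODE = re.compile(r"(?:\[[^\]]+\])+")


def protect_shortcodes(text):
    """Replace each run of [shortcodes] with a numbered placeholder; return text and originals."""
    found = []

    def repl(match):
        found.append(match.group(0))
        return "{" + str(len(found)) + "}"

    return SHORTCODE.sub(repl, text), found


text, tags = protect_shortcodes('See [caption id="a1"][/caption] and [gallery ids="1,2"].')
print(text)  # See {1} and {2}.
print(tags)  # ['[caption id="a1"][/caption]', '[gallery ids="1,2"]']
```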

Multilingual XML

Multilingual files are imported as multiple bilingual jobs, with languages mapped before import. They are marked with a multilingual XML icon in the Jobs table. If the file is imported into several target languages, the Completed file contains all target languages.

Memsource supports XML files in which both source and target elements are present for every paragraph, even if the target is empty. When source and target segmentation differ, the source segmentation takes precedence.

  • When creating a job, select Multilingual XML from the File Type pane before applying Import Options. If not specified, the file will be imported as standard XML.

  • The tag content of the source XML file can be displayed in the Editor by clicking Expand Tags in the Tool menu, and edited by pressing F2.

All language elements must be descendants of the same trans-unit element, and one language element cannot be nested inside another.

Example:

Sample of partially translated text from English to German and French. All <tuv lang="en">, <tuv lang="de"> and <tuv lang="fr"> are children of the same <tu> element.

<?xml version="1.0" encoding="utf-8"?>
<root>
Not translatable text.
<tu note="context note" key="ID 254" maxlen="16"> 
  <tuv lang="en">
    <seg>First segment.</seg>
  </tuv>
  <tuv lang="de">
    <seg>Erstes Segment.</seg>
  </tuv>
  <tuv lang="fr">
    <seg></seg>
  </tuv>
</tu>
<tu note="another context note" key="ID 255" maxlen="18"> 
  <tuv lang="en">
    <seg>Second segment.</seg>
  </tuv>
  <tuv lang="de">
    <seg></seg>
  </tuv>
  <tuv lang="fr">
    <seg></seg>
  </tuv>
</tu>
</root>

Import Options

Multilingual XML import options must be written in the XPath query language. The values below refer to the example above. Each XPath expression selects the elements whose text should be translated, not the text nodes themselves.

  • Elements containing source and target sub-elements

    //tu

  • Elements containing source text

    tuv[@lang='en']/seg (in relation to the parent element //tu)

  • Elements containing target text

    tuv[@lang='de']/seg (in relation to the parent element //tu)

  • Elements containing target text

    tuv[@lang='fr']/seg (in relation to the parent element //tu)

  • Non-translatable inline elements

    All elements in source or target are considered Translatable inline elements unless specified here as Non-translatable inline elements.

  • Convert to Memsource tags

    Apply regular expressions to convert specified text to tags.

  • Context key

    Specify a context key that is saved with the segment to the Translation Memory and used for match context.

  • Context note

    Import elements or context attributes for each element.

  • Max. target length

    Import elements or the maximum target length for each element.

  • Convert to character entities

    Enter a comma-separated list of character references. The matching characters are written as these references in the output file.

    Example:

    To export quotation marks (") as &quot; and the character Σ as &#x3A3;, enter &quot;,&#x3A3;. The characters & and < are always exported as &amp; and &lt;, respectively.

  • Use HTML subfilter

    Imports HTML tags contained in the text so that HTML file import settings can be applied to them. Paragraph tags (<p>) create new segments even if Segment multilingual XML is deselected.

  • Segment multilingual XML

    Text is segmented according to the general segmentation rules instead of being imported as one segment per element.

    Caution

    Applying Segment multilingual XML to a file that contains target text may result in a different number of segments in the source than in the target.
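The XPath options above are simple enough that Python's xml.etree.ElementTree can evaluate them directly, since it supports predicates such as [@lang='en'] in relative paths. The sketch below reads a condensed copy of the sample file. The `extract` function and its output dictionary layout are hypothetical; they only show how the source, target, context note, context key, and max. length values relate to each <tu> element.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
Not translatable text.
<tu note="context note" key="ID 254" maxlen="16">
  <tuv lang="en"><seg>First segment.</seg></tuv>
  <tuv lang="de"><seg>Erstes Segment.</seg></tuv>
  <tuv lang="fr"><seg></seg></tuv>
</tu>
<tu note="another context note" key="ID 255" maxlen="18">
  <tuv lang="en"><seg>Second segment.</seg></tuv>
  <tuv lang="de"><seg></seg></tuv>
  <tuv lang="fr"><seg></seg></tuv>
</tu>
</root>"""


def extract(xml_text, source_lang, target_langs):
    """Collect source text, per-language target text, and <tu> attributes for each unit."""
    root = ET.fromstring(xml_text)
    units = []
    for tu in root.findall(".//tu"):  # //tu: elements containing source and target sub-elements
        src = tu.find(f"tuv[@lang='{source_lang}']/seg")  # source text, relative to <tu>
        if src is None:
            continue  # no source element, nothing to translate
        maxlen = tu.get("maxlen")
        unit = {
            "source": src.text or "",
            "context_note": tu.get("note"),
            "context_key": tu.get("key"),
            "max_length": int(maxlen) if maxlen else None,
        }
        for lang in target_langs:
            tgt = tu.find(f"tuv[@lang='{lang}']/seg")  # target text, relative to <tu>
            unit[lang] = (tgt.text or "") if tgt is not None else ""
        units.append(unit)
    return units


for unit in extract(SAMPLE, "en", ["de", "fr"]):
    print(unit)
```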

Example:

If a multilingual XML file uses a namespace, the XPath expressions could be as follows:

  • Elements containing source and target sub-elements

    //*[local-name()='trans-unit']

  • Elements containing source text

    *[local-name()='source']

  • Elements containing target text

    *[local-name()='target']
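Python's ElementTree has no local-name() function, but since Python 3.8 its {*} wildcard matches a tag name in any namespace, which plays the same role. The sketch below uses a minimal XLIFF 1.2 document as an assumed example of a namespaced multilingual file and shows why a plain tag name finds nothing.

```python
import xml.etree.ElementTree as ET

XLIFF = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file source-language="en" target-language="de" datatype="plaintext" original="demo">
    <body>
      <trans-unit id="1">
        <source>Hello</source>
        <target>Hallo</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

root = ET.fromstring(XLIFF)

# Every tag is really {urn:oasis:names:tc:xliff:document:1.2}trans-unit, so this finds nothing.
print(root.findall(".//trans-unit"))  # []

# {*} matches the local name in any namespace, like *[local-name()='trans-unit'].
pairs = [
    (tu.find("{*}source").text, tu.find("{*}target").text)
    for tu in root.findall(".//{*}trans-unit")
]
print(pairs)  # [('Hello', 'Hallo')]
```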
