Segmentation rules (the way the original text is split into segments) can be customized to your needs.
One way is to add new abbreviations to the XLSX file (text containing the new abbreviation will not be split into two segments), as described in our Segmentation User Manual.
The other way is to edit the regular expressions in the SRX file, as described below.
Segmentation User Manual | Regular Expression | Excel segmentation
What rules can be changed in the SRX file? For example:
- Import text from Excel without segmentation (one cell = one segment)
- Import text containing a line break as one segment instead of two
- Don't use a semicolon (or any other character) as a segment separator
- Use a colon (or any other character) as a segment separator
- Remove an abbreviation from the list (text will then be segmented after it)
The rules are character based, which means that only a single character can be used as a segment separator (a group of characters, for example <p>, cannot be used as a segment separator).
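To make the rule model concrete, here is a minimal Python sketch of how SRX-style rules are commonly evaluated: a break is considered at a position when the text before it matches a rule's beforebreak pattern and the text after it matches the afterbreak pattern, and the first matching rule decides. The rule list, the `segment` function and the sample sentence are hypothetical illustrations, not Memsource's actual engine or default rules.

```python
import re

# Hypothetical rules in priority order: (break?, beforebreak, afterbreak).
RULES = [
    (False, r"\b(?:Mr|Dr)\.", r"\s"),    # abbreviation: never break here
    (True, r"[.?!:]", r"\s+[A-Z]"),      # sentence end, then space + capital
]

def segment(text, rules=RULES):
    """Split text at every position where a break="yes" rule wins."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for do_break, before, after in rules:
            # beforebreak must match text ending at pos,
            # afterbreak must match text starting at pos.
            if re.search(f"(?:{before})\\Z", text[:pos]) and re.match(after, text[pos:]):
                if do_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule wins
    segments.append(text[start:])
    return segments

print(segment("Ask Dr. Smith. He knows: Yes."))
# ['Ask Dr. Smith.', ' He knows:', ' Yes.']
```

In this sketch the whitespace after a break stays at the start of the next segment. Dropping `:` from the second rule's beforebreak pattern keeps "He knows: Yes." together, and removing the first rule makes the text break after "Dr." as well.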
Download SRX file:
- Go to Setup - Segmentation
- Click on Export SRX/CSV.
- Select Format: SRX and Language (the source language of your project) and hit the Download button.
- Save the SRX file to your PC.
Edit the SRX file:
- Open it in a text editor (for example Notepad++, which is free to download)
- Edit it with the help of the Regular Expression article or the Excel segmentation example
- <rule break="no"> marks the rules where the segment will not be broken, i.e. the list of abbreviations
- <rule> <beforebreak> - regular expression for the character before the break (for example the end of a sentence: ". ? ! :"). If, for example, you don't want to segment text after a colon, simply delete : from the <beforebreak> element of every <rule>.
- <rule> <afterbreak> - regular expression for the character after the break (for example the start of a new sentence: a space and a capital letter)
- Save the modified SRX file.
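The same colon edit can also be scripted instead of done by hand. The following is a sketch only, using Python's standard xml.etree.ElementTree: the sample SRX content is a made-up, simplified file (a real export may carry an XML header and a namespace, which ElementTree may re-prefix on output unless it is registered), and the helper names `drop_char_from_class` and `remove_separator` are hypothetical. It assumes the colon appears as a literal inside a simple character class such as [\.\?!:], and it leaves other colons (for example in a (?: group) untouched.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, simplified SRX content for illustration only.
SAMPLE_SRX = r"""<srx version="2.0">
  <body>
    <languagerules>
      <languagerule languagerulename="en">
        <rule break="no">
          <beforebreak>\bMr\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <rule break="yes">
          <beforebreak>[\.\?!:]</beforebreak>
          <afterbreak>\s+[A-Z]</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
  </body>
</srx>"""

def drop_char_from_class(pattern, char=":"):
    """Remove char from simple [...] character classes in a regex."""
    def strip(match):
        cleaned = match.group(0).replace(char, "")
        # Keep the class unchanged rather than produce an invalid empty one.
        return cleaned if cleaned != "[]" else match.group(0)
    return re.sub(r"\[[^\]]*\]", strip, pattern)

def remove_separator(srx_text, char=":"):
    """Return SRX text with char removed from every <beforebreak> class."""
    root = ET.fromstring(srx_text)
    for elem in root.iter():
        # Compare local names so a namespaced file is handled too.
        if elem.tag.rsplit("}", 1)[-1] == "beforebreak" and elem.text:
            elem.text = drop_char_from_class(elem.text, char)
    return ET.tostring(root, encoding="unicode")

print(remove_separator(SAMPLE_SRX))
```

Review the output before uploading it: a regular expression that uses the colon in some other way than this sketch assumes still needs a manual edit.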
Upload the new segmentation rules to Memsource and use them for import:
- Go to Setup - Segmentation and hit the New button
- Select the Language, enter a Name (e.g. "New Segment after Semicolon") and choose the modified SRX file. Check the Primary check box only if you want to make the custom segmentation your primary segmentation for the language. Hit the Create button.
- If everything goes well, the message "The segmentation file has been uploaded successfully." will appear and the new file will be listed on the Segmentation page.
- Now go to your project and hit the "New" button to create a new job.
- In File Import Settings expand Segmentation and select your custom segmentation rule.
- Hit the Create button to add the job(s) to your project, segmented with your custom segmentation rules.