Memsource Legacy API

Memsource Legacy API—Analytics Aggregations

Important:

Memsource Legacy API will be deprecated in 2020. It is disabled for all new organization accounts created after May 7th, 2019.
Please use Memsource REST API instead.

Deep-Dive Introduction

Let's start right away with an example of how to run a simple analytics aggregation—getting the total number of jobs you have in your organization:

  • Open up your favorite tool for running HTTP requests (we will use Postman here).
  • Log in to Memsource through your PM or Admin account, and obtain your login token using one of the login APIs.
  • POST a request to "api/v3/analytics/jobPart?token=<your_login_token>" with the following JSON body:
{
  "aggregations": {
    "data": {
      "children": {
        "type": "jobPartType"
      }
    }
  }
}
  • You should get a response looking like this:
{
  "hits": {
    "total": 359
  },
  "aggregations": {
    "data": {
      "doc_count": 14417
    }
  }
}
  • For us, the interesting part of the response is under the "aggregations" field, where an aggregation named "data" responded that it has found 14,417 matching documents—shown in the "doc_count" field. That is how many jobs have been created in our example organization.
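
For readers who prefer a script to Postman, the steps above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the host in BASE_URL and the token value are placeholders, and sending the body with a JSON content type is an assumption. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host; substitute the base URL of your Memsource instance.
BASE_URL = "https://memsource.example.com/web/"


def build_analytics_request(index, token, body, base_url=BASE_URL):
    """Build (but do not send) a POST request to api/v3/analytics/<index>."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{base_url}api/v3/analytics/{index}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The query from the example above: count all jobs in the organization.
count_jobs = {
    "aggregations": {
        "data": {"children": {"type": "jobPartType"}}
    }
}

request = build_analytics_request("jobPart", "YOUR_LOGIN_TOKEN", count_jobs)
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen and decode the response body with json.load.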

Total Number of Source Words

  • Let's build on the previous example, and try to obtain the total number of source words in all jobs from our organization.
  • We POST to the same URL as before, but the body will look like this:
{
  "aggregations": {
    "data": {
      "children": {
        "type": "jobPartType"
      },
      "aggs": {
        "wordCount": {
          "sum": {
            "field": "data.volume.words"
          }
        }
      }
    }
  }
}
  • The result will look something like this:
{
  "hits": {
    "total": 359
  },
  "aggregations": {
    "data": {
      "doc_count": 14417,
      "wordCount": {
        "value": 6893067
      }
    }
  }
}
  • We can see that some pattern in the query and response is beginning to appear. Let's look at a few more examples, and then try to explain the different parts of what we see.
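
The response mirrors the query: every named aggregation in the request appears under the same name in the result. As a small illustration, the sample response above can be read in Python like this (the variable names are only illustrative):

```python
import json

# Sample response copied from the example above.
response_text = """
{
  "hits": {"total": 359},
  "aggregations": {
    "data": {
      "doc_count": 14417,
      "wordCount": {"value": 6893067}
    }
  }
}
"""

response = json.loads(response_text)
data = response["aggregations"]["data"]

job_count = data["doc_count"]            # documents matched by the "data" aggregation
word_count = data["wordCount"]["value"]  # result of the nested "sum" aggregation

print(f"{job_count} jobs, {word_count} source words")
# prints: 14417 jobs, 6893067 source words
```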

Total Number of Source Words Divided by Target Language

  • Query:
{
  "aggregations": {
    "data": {
      "children": {
        "type": "jobPartType"
      },
      "aggs": {
        "byTargetLanguage": {
          "terms": {
            "field": "job.targetLanguage",
            "size": 3
          },
          "aggs": {
            "wordCount": {
              "sum": {
                "field": "data.volume.words"
              }
            }
          }
        }
      }
    }
  }
}
  • Result:
{
  "hits": {
    "total": 359
  },
  "aggregations": {
    "data": {
      "byTargetLanguage": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 10071,
        "buckets": [
          {
            "doc_count": 1835,
            "wordCount": {
              "value": 702721
            },
            "key": "cs"
          },
          {
            "doc_count": 1491,
            "wordCount": {
              "value": 2602529
            },
            "key": "de"
          },
          {
            "doc_count": 1020,
            "wordCount": {
              "value": 92676
            },
            "key": "fi"
          }
        ]
      },
      "doc_count": 14417
    }
  }
}
  • Here we can see a new entity in the response—buckets. When we ask the analytics module to split the data by some category, the result is represented as a list of buckets. Each bucket contains a key that defines what data this bucket represents (in our case a target language) and a value (other aggregations) specific just to this part of the data set.
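
A bucket list is easy to turn into a lookup table. The sketch below uses the result above, trimmed to the fields it needs. It also checks that the three returned buckets plus "sum_other_doc_count" add up to the total number of jobs, which is how the effect of "size": 3 shows up in the response.

```python
# Response from the example above, trimmed to the fields used here.
response = {
    "aggregations": {
        "data": {
            "doc_count": 14417,
            "byTargetLanguage": {
                "sum_other_doc_count": 10071,
                "buckets": [
                    {"key": "cs", "doc_count": 1835, "wordCount": {"value": 702721}},
                    {"key": "de", "doc_count": 1491, "wordCount": {"value": 2602529}},
                    {"key": "fi", "doc_count": 1020, "wordCount": {"value": 92676}},
                ],
            },
        }
    }
}

data = response["aggregations"]["data"]
by_language = data["byTargetLanguage"]

# Map each bucket key (a target language) to its nested "wordCount" value.
words = {b["key"]: b["wordCount"]["value"] for b in by_language["buckets"]}
print(words)  # {'cs': 702721, 'de': 2602529, 'fi': 92676}

# Only three buckets were returned; jobs in all other languages are
# counted in "sum_other_doc_count".
listed = sum(b["doc_count"] for b in by_language["buckets"])
print(listed + by_language["sum_other_doc_count"] == data["doc_count"])  # True
```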

Total Number of Jobs Divided by Project Status

  • Query:
{
  "aggregations": {
    "projectStatus": {
      "terms": {
        "field": "project.status"
      },
      "aggs": {
        "data": {
          "children": {
            "type": "jobPartType"
          }
        }
      }
    }
  }
}
  • Result:
{
  "hits": {
    "total": 359
  },
  "aggregations": {
    "projectStatus": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "doc_count": 326,
          "data": {
            "doc_count": 14318
          },
          "key": "NEW"
        },
        {
          "doc_count": 31,
          "data": {
            "doc_count": 89
          },
          "key": "COMPLETED"
        },
        {
          "doc_count": 1,
          "data": {
            "doc_count": 4
          },
          "key": "ASSIGNED"
        },
        {
          "doc_count": 1,
          "data": {
            "doc_count": 6
          },
          "key": "DECLINED_BY_VENDOR"
        }
      ]
    }
  }
}
  • This query is a little different from the previous ones. The part that tells the Analytics module to split the data by project status comes before the ever-present "data" aggregation. We will discuss the reason for this later.
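
Each bucket in this result carries two counts. The sketch below reads them from the result above. Treating the outer "doc_count" as the number of project-level (parent) documents is an interpretation rather than something the response states, but it is consistent with the numbers: the outer counts add up to "hits.total" (359) and the inner counts add up to the job total from the first example (14417).

```python
# Buckets copied from the result above.
buckets = [
    {"key": "NEW", "doc_count": 326, "data": {"doc_count": 14318}},
    {"key": "COMPLETED", "doc_count": 31, "data": {"doc_count": 89}},
    {"key": "ASSIGNED", "doc_count": 1, "data": {"doc_count": 4}},
    {"key": "DECLINED_BY_VENDOR", "doc_count": 1, "data": {"doc_count": 6}},
]

# status -> (outer count, job count from the nested "data" aggregation)
rows = {b["key"]: (b["doc_count"], b["data"]["doc_count"]) for b in buckets}

total_outer = sum(outer for outer, _ in rows.values())
total_jobs = sum(jobs for _, jobs in rows.values())

for status, (outer, jobs) in rows.items():
    print(f"{status}: {outer} parent documents, {jobs} jobs")
print(total_outer, total_jobs)  # 359 14417
```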

Total Number of Jobs From New Projects

  • Query:
{
  "filter": {
    "term": {
      "project.status": "NEW"
    }
  },
  "aggregations": {
    "data": {
      "children": {
        "type": "jobPartType"
      }
    }
  }
}
  • Result:
{
  "hits": {
    "total": 326
  },
  "aggregations": {
    "data": {
      "doc_count": 14318
    }
  }
}
  • Here, we have introduced a new field in the query—"filter". We use this to first narrow down the data we are interested in before doing the aggregations.
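
Because the filter sits next to "aggregations" at the top level of the body, it can be added to any of the earlier queries without touching the aggregations themselves. A minimal sketch, where with_term_filter is a hypothetical helper and not part of the API:

```python
def with_term_filter(query, field, value):
    """Return a copy of `query` restricted to documents where `field` equals `value`."""
    filtered = dict(query)  # shallow copy; the original query is left unchanged
    filtered["filter"] = {"term": {field: value}}
    return filtered


# The plain job count query from the first example.
count_jobs = {
    "aggregations": {
        "data": {"children": {"type": "jobPartType"}}
    }
}

new_project_jobs = with_term_filter(count_jobs, "project.status", "NEW")
print(new_project_jobs["filter"])  # {'term': {'project.status': 'NEW'}}
```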

Analytics User Manual

Memsource Analytics are based on Elasticsearch

The datastore running all the aggregation queries is built on top of Elasticsearch (currently version 1.7.x). Elasticsearch provides powerful, real-time search and aggregation capabilities, and our Analytics API forms only a thin, security-enhanced layer on top of it. This means that the query language we have used in the examples above is actually the query and aggregation language of Elasticsearch itself.

  • The language used in the "filter" field of the query JSON is fully described in the Elasticsearch Query DSL documentation.
  • The language used in the "aggregations" field of the query JSON is fully described in the Elasticsearch Aggregations documentation.

Indexes

Different types of data live in different indexes. For example, job data lives in the "jobPart" index, and costs data lives in the "costs" index. You can specify which data you want to work with by specifying the index in the aggregation API endpoint URL (see Analytics Api V3). The reference manual below describes the data model of each index in more detail.

Parent-Child Documents

Throughout the deep-dive examples, we have consistently used an aggregation called "data". The reason is that the information about a job is split across two documents in Elasticsearch. In Elasticsearch terminology, these documents are in a "parent-child" relationship. For example, information about a job's project lives in the parent document, while information about the job itself lives in the child document. Depending on which information we are interested in, we put our aggregations either before the data parent-child aggregation (for parent fields) or after it (for child fields). The data aggregation itself should be present in all our queries. Keep in mind that the data aggregation looks a little different for each index. This is documented in the reference manual below.
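
The placement rule can be made concrete with a small sketch. Here data_aggregation is a hypothetical helper (not part of the API) that builds the "data" aggregation for a given child type. Aggregations on child fields go inside it, while aggregations on parent fields wrap it.

```python
def data_aggregation(child_type, child_aggs=None):
    """Build the parent-child "data" aggregation, optionally with nested aggregations."""
    agg = {"children": {"type": child_type}}
    if child_aggs:
        agg["aggs"] = child_aggs
    return {"data": agg}


# Child field (job.targetLanguage): the aggregation goes AFTER "data", nested inside it.
by_language = {
    "aggregations": data_aggregation(
        "jobPartType",
        {"byTargetLanguage": {"terms": {"field": "job.targetLanguage"}}},
    )
}

# Parent field (project.status): the aggregation goes BEFORE "data", wrapping it.
by_status = {
    "aggregations": {
        "projectStatus": {
            "terms": {"field": "project.status"},
            "aggs": data_aggregation("jobPartType"),
        }
    }
}
```

The second query is identical to the "Total Number of Jobs Divided by Project Status" example above.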

Analytics Reference Manual

The last piece of the puzzle is the actual data model in each of the indexes. We need to know what fields of the documents we need to reference in the queries and which of the documents (parent or child) hold what information. The documents will be described using typed JSON. We will start with common JSON snippets used throughout the indexes and then define the indexes themselves. When a field in a JSON description is described by another JSON description, we use "<...>" notation to refer to it. For each index, we will define the data aggregation, parent document, and child document. This should give you all the information you need to run your own queries.

Common Data

Analysis

{
  "id": string,
  "createdBy": <User>,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "innerId": number,
  "name": string,
  "tags": string[],
  "trashed": boolean,
  "trashedBy": <User>,
  "type": string
}

Assignment

{
  "name": string,
  "linguist": <User>,
  "vendor": <Vendor>
}

Automation Widget

{
  "id": string,
  "createdBy": <User>,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "name": string,
  "tags": string[],
  "trashed": boolean,
  "urlId": string
}

Buyer

{
  "id": string,
  "name": string
}

Client

{
  "id": string,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "externalId": string,
  "name": string,
  "tags": string[],
  "trashed": boolean,
}

CostCenter

{
  "id": string,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "name": string,
  "tags": string[],
  "trashed": boolean
}

Domain

{
  "id": string,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "name": string,
  "tags": string[],
  "trashed": boolean
}

Job

{
  "id": string,
  "createdBy": <User>,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "extension": string,
  "fileName": string,
  "groupCount": number,
  "innerId": string,
  "languagePair": string,
  "lastModified": Date,
  "localePair": string,
  "sourceLanguage": string,
  "sourceLocale": string,
  "tags": string[],
  "targetLanguage": string,
  "targetLocale": string,
  "taskId": string,
  "trashed": boolean,
  "trashedBy": <User>,
  "uid": string
}

JobPart

{
  "id": string,
  "assignedTo": <Assignment>,
  "beginIndex": number,
  "buyer": <Buyer>,
  "createdBy": <User>,
  "dateCreated": Date,
  "dateDue": Date,
  "endIndex": number,
  "groupCount": number,
  "innerId": string,
  "lastModified": Date,
  "level": number,
  "status": string,
  "tags": string[],
  "uid": string,
  "workflowStep": <WorkflowStep>
}

MtEngine

{
  "id": string,
  "deleted": boolean,
  "default_": boolean,
  "includeTags": boolean,
  "name": string,
  "tags": string[],
  "type": string
}

NetRateScheme

{
  "id": string,
  "createdBy": <User>,
  "dateCreated": Date,
  "dateDeleted": Date,
  "default_": boolean,
  "deleted": boolean,
  "externalId": string,
  "name": string,
  "tags": string[]
}

PriceList

{
  "id": string,
  "createdBy": <User>,
  "currency": string,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "default_": boolean,
  "deleted": boolean,
  "name": string,
  "tags": string[],
  "trashed": boolean,
  "unit": string
}

Project

{
  "id": string,
  "buyer": <Buyer>,
  "client": <Client>,
  "costCenter": <CostCenter>,
  "createdBy": <User>,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateDue": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "domain": <Domain>,
  "innerId": number,
  "languagePairs": string[],
  "localePairs": string[],
  "mtEngine": <MtEngine>,
  "name": string,
  "note": string,
  "owner": <User>,
  "sourceLanguage": string,
  "sourceLocale": string,
  "status": string,
  "subDomain": <SubDomain>,
  "tags": string[],
  "targetLanguages": string[],
  "targetLocales": string[],
  "trashed": boolean,
  "trashedBy": <User>,
  "uid": string,
  "vendor": <Vendor>
}

Quote

{
  "id": string,
  "createdBy": <User>,
  "currency": string,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "innerId": string,
  "name": string,
  "status": string,
  "tags": string[],
  "trashed": boolean,
  "trashedBy": <User>,
  "unit": string
}

Service

{
  "id": string,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "name": string,
  "publicName": string,
  "tags": string[],
  "trashed": boolean,
  "type": string
}

SubDomain

{
  "id": string,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "name": string,
  "tags": string[],
  "trashed": boolean
}

User

{
  "id": number,
  "active": boolean,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "email": string,
  "firstName": string,
  "fullName": string,
  "innerId": number,
  "jobTitle": string,
  "lastName": string,
  "locale": string,
  "note": string,
  "role": string,
  "tags": string[],
  "timeZone": string,
  "trashed": boolean,
  "userName": string
}

Vendor

{
  "id": string,
  "candidate": boolean,
  "dateCreated": Date,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "deleted": boolean,
  "tags": string[],
  "token": string,
  "trashed": boolean
}

WorkflowStep

{
  "id": string,
  "abbreviation": string,
  "dateDeleted": Date,
  "dateTrashed": Date,
  "name": string,
  "order": number,
  "tags": string[],
  "trashed": boolean
}

Analysis Index

Data Aggregation

"data": {
  "children": {
    "type": "analysisType"
  }
}

Parent Document

{
  "automationWidget": <AutomationWidget>,
  "project": <Project>,
  "service": <Service>
}

Child Document

{
  "analysis": <Analysis>,
  "jobPart": <JobPart>,
  "job": <Job>,
  "netRateScheme": <NetRateScheme>,

  "priority": number,
  "data": {
    "mt": {
      "match0": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match100": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match50": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match75": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match85": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match95": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      }
    },
    "repetitions": {
      "characters": number,
      "pages": number,
      "percent": number,
      "segments": number,
      "words": number
    },
    "tm": {
      "match0": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match100": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match101": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match50": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match75": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match85": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      },
      "match95": {
        "characters": number,
        "pages": number,
        "percent": number,
        "segments": number,
        "words": number
      }
    },
    "total": {
      "characters": number,
      "pages": number,
      "percent": number,
      "segments": number,
      "words": number
    }
  }
}
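
As an illustration of how field paths follow from this model, the sketch below builds one "sum" aggregation per TM match band over the "words" field of the child document. The aggregation names (tm_match0_words and so on) are arbitrary labels chosen for this example.

```python
import json

# TM match bands listed under "data.tm" in the child document above.
TM_BANDS = ["match0", "match50", "match75", "match85", "match95", "match100", "match101"]

# One "sum" aggregation per band; the field path mirrors the JSON nesting.
band_aggs = {
    f"tm_{band}_words": {"sum": {"field": f"data.tm.{band}.words"}}
    for band in TM_BANDS
}

analysis_query = {
    "aggregations": {
        "data": {
            "children": {"type": "analysisType"},
            "aggs": band_aggs,
        }
    }
}

print(json.dumps(analysis_query, indent=2))
```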

API Index

Data Aggregation

"data": {
  "children": {
    "type": "apiType"
  }
}

Parent Document

{
  "user": <User>
}

Child Document

{
  "request": {
    "date": Date,
    "host": string,
    "ipv4": string,
    "ipv6": string,
    "location": string,
    "method": string
  },
  "response": {
    "date": Date,
    "duration": number,
    "status": number
  },
  "api": {
    "action": string,
    "asynch": boolean,
    "type": string,
    "uri": string,
    "ver": string
  }
}
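
A possible query against this index, sketched under the assumption that "api.action" is suitable for a "terms" aggregation: the average response duration for the most frequent API actions. Both fields live in the child document, so both aggregations sit inside "data". The unit of "response.duration" is not stated in the model above.

```python
import json


def duration_by_action(size=10):
    """Build a query: average response duration for the `size` most frequent API actions."""
    return {
        "aggregations": {
            "data": {
                "children": {"type": "apiType"},
                "aggs": {
                    "byAction": {
                        "terms": {"field": "api.action", "size": size},
                        "aggs": {
                            "avgDuration": {"avg": {"field": "response.duration"}}
                        },
                    }
                },
            }
        }
    }


print(json.dumps(duration_by_action(size=5), indent=2))
```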

Costs Index

Data Aggregation

"data": {
  "children": {
    "type": "costsType"
  }
}

Parent Document

{
  "automationWidget": <AutomationWidget>,
  "project": <Project>,
  "service": <Service>
}

Child Document

{
  "analysis": <Analysis>,
  "job": <Job>,
  "jobPart": <JobPart>,
  "netRateScheme": <NetRateScheme>,
  "priceList": <PriceList>,
  "quote": <Quote>,
  "workflowStep": <WorkflowStep>,
  
  "priority": number,
  "data": {
    "mt": {
      "match0": number,
      "match100": number,
      "match50": number,
      "match75": number,
      "match85": number,
      "match95": number
    },
    "repetitions": number,
    "tm": {
      "match0": number,
      "match100": number,
      "match101": number,
      "match50": number,
      "match75": number,
      "match85": number,
      "match95": number
    },
    "total": number
  }
}
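
A sketch of a query against this index: the sum of "data.total" per price list currency, restricted to completed projects. Interpreting "data.total" as the total cost amount is an assumption based on the field name. The filter on "project.status" follows the pattern shown in the deep-dive examples.

```python
import json

costs_by_currency = {
    # Narrow down to completed projects first (parent document field).
    "filter": {"term": {"project.status": "COMPLETED"}},
    "aggregations": {
        "data": {
            "children": {"type": "costsType"},
            "aggs": {
                # Child document fields: split by currency, then sum the totals.
                "byCurrency": {
                    "terms": {"field": "priceList.currency"},
                    "aggs": {"total": {"sum": {"field": "data.total"}}},
                }
            },
        }
    },
}

print(json.dumps(costs_by_currency, indent=2))
```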

jobPart Index

Data Aggregation

"data": {
  "children": {
    "type": "jobPartType"
  }
}

Parent Document

{
  "automationWidget": <AutomationWidget>,
  "project": <Project>,
  "service": <Service>
}

Child Document

{
  "job": <Job>,
  "jobPart": <JobPart>,

  "data": {
    "counts": {
      "chars": {
        "total": number,
        "confirmed": number,
        "notConfirmed": number,
        "locked": number,
        "notLocked": number,
        "confirmedAndLocked": number,
        "notConfirmedAndLocked": number,
        "completed": number,
        "notCompleted": number
      },
      "groups": {
        "total": number
      },
      "segments": {
        "total": number,
        "confirmed": number,
        "notConfirmed": number,
        "locked": number,
        "notLocked": number,
        "confirmedAndLocked": number,
        "notConfirmedAndLocked": number,
        "completed": number,
        "notCompleted": number,
        "mt": {
          "postEdited": number,
          "relevant": number,
          "notRelevant": number
        },
        "qa": {
          "checked": number,
          "notChecked": number
        }
      },
      "words": {
        "total": number,
        "confirmed": number,
        "notConfirmed": number,
        "locked": number,
        "notLocked": number,
        "confirmedAndLocked": number,
        "notConfirmedAndLocked": number,
        "completed": number,
        "notCompleted": number
      },
      "qa": {
        "warnings": number,
        "ignoredWarnings": number,
        "notIgnoredWarnings": number
      }
    }
  }
}
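
As a closing illustration, the counts in this model could be combined into a simple progress figure: the share of confirmed words across all jobs. This is a sketch under the assumption that the field paths follow the nesting shown above. The names progress_query, confirmed_ratio, totalWords, and confirmedWords are hypothetical and chosen for this example only.

```python
def progress_query():
    """Build a query summing total and confirmed word counts over all jobs."""
    return {
        "aggregations": {
            "data": {
                "children": {"type": "jobPartType"},
                "aggs": {
                    "totalWords": {"sum": {"field": "data.counts.words.total"}},
                    "confirmedWords": {"sum": {"field": "data.counts.words.confirmed"}},
                },
            }
        }
    }


def confirmed_ratio(response):
    """Return confirmed words divided by total words, or 0.0 when there are no words."""
    data = response["aggregations"]["data"]
    total = data["totalWords"]["value"]
    confirmed = data["confirmedWords"]["value"]
    return confirmed / total if total else 0.0
```

Post the body returned by progress_query to the jobPart endpoint, then pass the decoded JSON response to confirmed_ratio.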