Azure has a lot of cool stuff regarding AI. One part is the Azure Cognitive Services family.
The feature I want to explain in a bit more detail in this post is the document translation feature of the Azure Translator Service. It is in preview at the moment. With this feature we can translate a whole batch of documents from one language to another automatically, while the original document structure stays the same. I used it to translate some Word .docx documents.
The service supports the translation of
- Office documents (Excel, Outlook, Word, PowerPoint)
- PDF files
- HTML
and more, for a complete list see supported documents
Custom translations and custom glossaries are also supported.
Steps to do:
- Create a Translator Service
- Create a Storage Account
- Create a ClientApp for the HTTP Post
Create a Translator Service
We need to create a Translator Service which is a separate resource and not part of the Cognitive Services resource.
Make sure to choose the S1 tier to be able to translate documents, and note the name you have chosen. That name forms the custom domain endpoint needed for the translation request. Simply replace <mycustomendpoint> in the following URL with it:
https://<mycustomendpoint>.cognitiveservices.azure.com/translator/text/batch/v1.0-preview.1
We also take note of the subscription key.
Create a Storage Account
The next part is a Storage Account (general-purpose v2) with two blob containers, one for the source files and one for the translated target files.
To access the files, two Shared Access Signatures (SAS) have to be created:
- The SAS for the source container should have read and list permissions
- The SAS for the target container should have write and list permissions
Note the generated signatures.
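The two signatures can also be generated in code. The following is a minimal sketch, assuming the Azure.Storage.Blobs NuGet package and a connection string that contains the account key. The container names "source" and "target", the environment variable STORAGE_CONNECTION_STRING and the two-hour lifetime are hypothetical values chosen for illustration.

```csharp
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

static class SasSketch
{
    // Returns the container URL with a SAS token appended.
    // Assumption: the connection string includes the account key;
    // without it the client cannot sign a SAS.
    public static Uri CreateContainerSas(
        string connectionString,
        string containerName,
        BlobContainerSasPermissions permissions,
        TimeSpan lifetime)
    {
        var container = new BlobContainerClient(connectionString, containerName);
        if (!container.CanGenerateSasUri)
        {
            throw new InvalidOperationException(
                "The client cannot sign a SAS. Use a connection string with an account key.");
        }
        return container.GenerateSasUri(permissions, DateTimeOffset.UtcNow.Add(lifetime));
    }

    static void Main()
    {
        // Hypothetical environment variable holding the storage connection string.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING")
            ?? throw new InvalidOperationException("STORAGE_CONNECTION_STRING is not set.");

        // Source container: read and list. Target container: write and list.
        Uri sourceSas = CreateContainerSas(connectionString, "source",
            BlobContainerSasPermissions.Read | BlobContainerSasPermissions.List,
            TimeSpan.FromHours(2));
        Uri targetSas = CreateContainerSas(connectionString, "target",
            BlobContainerSasPermissions.Write | BlobContainerSasPermissions.List,
            TimeSpan.FromHours(2));

        Console.WriteLine(sourceSas);
        Console.WriteLine(targetSas);
    }
}
```

The lifetime should be long enough to cover the whole translation job, since the service reads and writes the blobs after the request has been accepted.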
Create a ClientApp for the HTTP Post
The batch translation job is started through an HTTP POST, in this case demonstrated with a .NET Core console app. Choose Postman or other tools if you prefer.
We have to set up the correct (custom) endpoint, the subscription key and both SAS in the JSON string. The example will translate from German to English.
Program.cs:
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class Program
{
    // Replace <mycustomendpoint> with the name of the Translator resource.
    private static readonly string endpoint = "https://<mycustomendpoint>.cognitiveservices.azure.com/translator/text/batch/v1.0-preview.1";
    private static readonly string route = "/batches";
    private static readonly string subscriptionKey = "key1";
    // Replace SAS-source and SAS-target with the SAS of the source and target container.
    private static readonly string json = ("{\"inputs\": [{\"source\": {\"sourceUrl\": \"SAS-source\",\"storageSource\": \"AzureBlob\",\"language\": \"de\" }, \"targets\": [{\"targetUrl\": \"SAS-target\",\"storageSource\": \"AzureBlob\",\"category\": \"general\",\"language\": \"en\"}]}]}");

    static async Task Main(string[] args)
    {
        using HttpClient client = new HttpClient();
        using HttpRequestMessage request = new HttpRequestMessage();
        request.Method = HttpMethod.Post;
        request.RequestUri = new Uri(endpoint + route);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        request.Content = new StringContent(json, Encoding.UTF8, "application/json");

        HttpResponseMessage response = await client.SendAsync(request);
        // Await the body instead of blocking on .Result.
        string result = await response.Content.ReadAsStringAsync();
        if (response.IsSuccessStatusCode)
        {
            Console.WriteLine($"Status code: {response.StatusCode}");
            Console.WriteLine();
            Console.WriteLine("Response Headers:");
            Console.WriteLine(response.Headers);
        }
        else
        {
            // Print the status and the error body so a failure can be diagnosed.
            Console.WriteLine($"Error: {response.StatusCode}");
            Console.WriteLine(result);
        }
    }
}
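The hand-escaped JSON string in the listing above is easy to get wrong. As an alternative, here is a small sketch that builds the same request body with System.Text.Json. The class name RequestBodySketch and the method Build are made up for this example; the property names are taken from the JSON string used in this post.

```csharp
using System;
using System.Text.Json;

static class RequestBodySketch
{
    // Builds the request body for one source container and one target container.
    // The anonymous-type property names become the JSON property names.
    public static string Build(
        string sourceSas,
        string targetSas,
        string sourceLanguage,
        string targetLanguage)
    {
        var body = new
        {
            inputs = new[]
            {
                new
                {
                    source = new
                    {
                        sourceUrl = sourceSas,
                        storageSource = "AzureBlob",
                        language = sourceLanguage
                    },
                    targets = new[]
                    {
                        new
                        {
                            targetUrl = targetSas,
                            storageSource = "AzureBlob",
                            category = "general",
                            language = targetLanguage
                        }
                    }
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

Calling RequestBodySketch.Build("SAS-source", "SAS-target", "de", "en") yields a string that can be passed to StringContent in place of the json field. The serializer takes care of escaping, which matters because a SAS contains characters such as & and %.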
Starting the program creates the translation job. For more details, the job can be queried with a GET request and the job ID, which is returned in the Operation-Location response header.
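A sketch of such a status query could look like the following. It assumes that the Operation-Location header holds the absolute URL of the job, so the GET request can be sent to it directly. The class and method names are hypothetical, and the response body is returned as raw JSON because its exact shape is not covered in this post.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

static class JobStatusSketch
{
    // Reads the job URL from the response of the POST request.
    // Assumption: the header value is an absolute URL.
    public static Uri ReadOperationLocation(HttpResponseMessage response)
    {
        if (response.Headers.TryGetValues("Operation-Location", out var values))
        {
            return new Uri(values.First());
        }
        throw new InvalidOperationException("The response has no Operation-Location header.");
    }

    // Sends the GET request with the same subscription key and returns the JSON body.
    public static async Task<string> GetJobStatusJsonAsync(
        HttpClient client,
        Uri operationLocation,
        string subscriptionKey)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, operationLocation);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

        using HttpResponseMessage response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

In the console app, ReadOperationLocation(response) would be called right after the successful POST, and GetJobStatusJsonAsync can then be repeated with a short delay until the returned JSON reports that the job has finished.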
The result: Every document from the source Blob container is completely translated and stored in the target Blob container.
For deeper diving, check:
Happy translations!