What is HTML and why is it important for website translation?
Every original phrase that you translate in SiteTran is extracted from a website’s HTML.
You don't need to know much about HTML to use SiteTran, but it helps to have a basic understanding, so here it is:
HTML is the language that websites are written in. Web browsers use it to generate web pages, including the one you’re on right now. HTML elements (the building blocks of HTML code) organize and separate a website’s content and give it different functions: text, buttons, images, videos, forms, select boxes, text areas, and so on. Styles and positioning can then be applied to those elements.
Example Page:
Below is an example "page". We'll use it to understand how HTML works:
This is a Heading
This is a normal phrase. It would be straightforward to translate.
This is another phrase. The p element in HTML represents a “paragraph”.
This is a link to an external webpage.
This is some text in bold, and this is some text in italic.
1. List item 1
2. List item 2
3. List item 3
Example Page HTML:
<!DOCTYPE html>
<html>
<head>
<title>This is a title</title>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is a normal phrase. It would be straightforward to translate.</p>
<p>This is another phrase. The &lt;p&gt; element in HTML represents a “paragraph”.</p>
<img src="https://www.sitetran.com/images/Green-black-sitetran-no-background.svg" alt="This is the alt text of an image. It should describe the image." width="500">
<p>This is a <a href="https://www.example.com">link</a> to an external webpage.</p>
<p>This is some text in <strong>bold</strong>, and this is some text in <em>italic</em>.</p>
<ol>
<li>List item 1</li>
<li>List item 2</li>
<li>List item 3</li>
</ol>
</body>
</html>
In the example above, you can see the “This is a Heading” element, which is created from this HTML code: <h1>This is a Heading</h1>
It's called an h1 element, and it consists of an "opening" tag <h1> (used at the start of the element) and a “closing” tag </h1> (used at the end of the element). The closing tag has a slash “/” before the element/tag name h1.
After SiteTran extracts the contents of that element, the phrase “This is a Heading” would get added to the translator interface.
In SiteTran, the contents of each element are treated as distinct phrases. As a result, a sentence is sometimes broken up into multiple phrases, which is necessary to accommodate the different styling or spacing requirements of the website. Each phrase in SiteTran is shown exactly as the text appears in the original HTML of the website!
Understand how SiteTran extracts HTML and presents it in the translation table:
From the example.html code above, let’s look at the code that generates the following sentence:
This is a link to an external webpage.
This sentence is actually broken up into 3 distinct phrases in SiteTran (as well as the original HTML of the site):
- [This is a ]
- [link]
- [ to an external webpage.]
You can see why when looking at the HTML:
<p>This is a <a href="https://www.example.com">link</a> to an external webpage.</p>
Because the pieces of text are either in separate elements or split up by an element between them, SiteTran treats them as distinct phrases. So when you translate, make sure you understand what comes before and what comes after. Context matters!
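For readers who are curious about the mechanics, the splitting described above can be sketched in a few lines of Python using the standard library's html.parser module. This is only an illustration of the idea, not SiteTran's actual extraction code; the names PhraseCollector and extract_phrases are made up for this example.

```python
from html.parser import HTMLParser


class PhraseCollector(HTMLParser):
    """Collects each run of text between two tags as a separate phrase."""

    def __init__(self):
        super().__init__()
        self.phrases = []

    def handle_data(self, data):
        # Called for each run of text found between two tags.
        if data.strip():  # ignore whitespace-only runs between elements
            self.phrases.append(data)


def extract_phrases(html):
    """Hypothetical helper: return the text phrases found in an HTML string."""
    collector = PhraseCollector()
    collector.feed(html)
    collector.close()
    return collector.phrases


snippet = '<p>This is a <a href="https://www.example.com">link</a> to an external webpage.</p>'
print(extract_phrases(snippet))
# ['This is a ', 'link', ' to an external webpage.']
```

The leading and trailing spaces are kept on purpose, because they are part of the text in the original HTML and are needed when the phrases are put back together.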
Note: Word order can vary between languages. This means that words sometimes need to be moved from one phrase to another in the translation to keep proper grammatical structure!
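As a rough illustration of moving words between phrases, the Python sketch below puts three phrases back into the example paragraph. The Japanese translation is an assumption made for this example (it is not SiteTran output), and template and render are hypothetical names. In Japanese, the words for "to an external webpage" come before "link", so they move from the third phrase into the first, while the link itself stays on the word for "link".

```python
# The example paragraph with three slots, one per phrase.
template = '<p>{0}<a href="https://www.example.com">{1}</a>{2}</p>'

# Original phrases, in the order they appear in the HTML.
original = ["This is a ", "link", " to an external webpage."]

# Illustrative Japanese translation (an assumption for this example).
# "to an external webpage" has moved from phrase 3 into phrase 1.
translated = ["これは外部ウェブページへの", "リンク", "です。"]


def render(phrases):
    """Hypothetical helper: fill the three phrase slots of the template."""
    return template.format(*phrases)


print(render(original))
print(render(translated))
```

The number of phrases and the HTML around them stay the same. Only the words inside each phrase change.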