Difference between revisions of "GPG guide/Repo Restructure"

From LibrePlanet
Latest revision as of 10:42, 28 April 2016

Current processes for git usage (FSF only)

Translators work in enc/master (or in their own branches on enc, merging with master when they are done). At any given time, some of the translations in enc/master are ready for publication and some are not.

Once translators notify the FSF that their language is ready, the FSF reviews the translation in enc/master by viewing it online at https://enc-dev0.fsf.org. Once we are satisfied, we manually copy the directory for the language from enc/master to enc-live/master locally, test it locally, and then commit and push directly to the live site.

Note: The testing environment is farther from the live environment than is ideal, but the translation will already have been tested on the dev server and on your local machine, so it is unlikely that something will go wrong once it is live. Still, this system is sloppy and produces poor git records of the publishing process.

Zak's Proposed New System

Note: If you have tweaks or corrections, please make them in this section. If you'd like to propose an entirely separate new system (which you are more than welcome to do), please start a new section below.

The current method is problematic because all the translators are working on the same branch (enc/master). If one language is finished and another is not, the FSF cannot publish the finished language by merging enc/master into enc-live/master, because that would also publish the unfinished language. Instead, the FSF manually copies a finished language's directory from a local checkout of enc/master to a local checkout of enc-live/master. This is not best practice, because it skips testing in an environment that matches the one where the translation will be displayed in enc-live.

This is a proposal for a new system (as well as a transition plan from the old system).

Branch structure

Email Self-Defense has two repositories, which I'll refer to here as enc (for development and testing) and enc-live (the contents of which are served directly to the live site). enc has many branches, but enc-live has only one:

    • enc/master - Translators should never push anything here unless directed by the FSF. Only for final testing during the FSF's publishing process. Hosted at https://enc-dev0.fsf.org/master.
    • enc/LANGUAGECODE-translation - one such branch for each language. This is where first-time translations and translation updates are developed.
    • live/master - the only branch in the live repo, hosted at https://emailselfdefense.fsf.org. This is edited only by the FSF, and only by merging from origin/master.

Translating

When developing a new translation or working on any update or change to an existing translation, translators should work in the corresponding branch for their translation (origin/LANGUAGECODE-translation). If it doesn't exist, please create it from the master branch.

When your translation is complete and peer reviewed, send an email to campaigns@fsf.org (FIXME COPY THIS PART). The FSF will review your branch, merge it into master, test it by reviewing that everything displays correctly on https://enc-dev0.fsf.org/next-version, and then publish by pushing enc/master to enc-live/master.

Reviewing your work during translation

Previously, the recommended way to review your work was to push to enc/master and view it on the Web at https://enc-dev0.fsf.org. However, with the new system, we aren't allowing partially finished work on enc/master. From now on, you should set up your own development server on your computer and use it to review your work. (FIXME LINK TO SOME RESOURCES FOR THIS) If you have trouble setting up your development environment, please reach out for help to the esd-translators list. Do not push to master!

Developing a new version (FSF only, English only)

Copy the current contents of enc/master into enc/next-version and edit it. A normal merge most likely won't work -- you'll have to actually copy or use another git technique.

When it's ready, merge enc/next-version into enc/master and then delete everything in enc/next-version and replace it with a single page that explains there is no new version in development at the moment.

Transition plan

Translators should remove any branches on the FSF's server that they are not actually using. The FSF removes the confusing and outdated origin/live branch (which is not actually live anywhere).

There is currently some unfinished translation work in enc/master. Translators: **please create a new translation branch from master and work from there.** In one month, Zak will create a backup snapshot of enc/master and then will **overwrite master with a current copy of live**.


Therese's proposal: translate PO files

Why switch to the PO system

Not having easy access to the displayed translated pages is a problem for translators right now because they are responsible for keeping the structure of their pages intact. A system that minimizes their dealings with HTML tags would reduce the need to view a full-featured page while it is being worked on. This is what PO4A does, with the help of the Gettext tools.

The PO4A suite

It includes many programs. We would essentially use three of them:

  • po4a-gettextize extracts translatable strings from the original version of a document, and lists them in PO format. In the resulting POT (PO template) file, each original string (msgid) is followed by an empty msgstr which will hold the translation. The same POT is used by all the teams to initiate their own PO file. PO4A has modules for many formats, including XHTML (based on the XML module).
  • po4a-translate generates the translated page from the PO and original XHTML.
  • po4a-updatepo transfers modifications and new strings from the original to the POs, and adds “fuzzy” markers to the modified strings. The obsolete strings are kept and can be reused.

In addition:

  • po4a will do the job of all three if the proper configuration file is provided. According to the man page, it is possible to share POs between different documents. When a common string is updated in one PO, its counterparts in the other POs are automatically updated. Another way to do the same thing is to have a single “big” POT file (and one big PO per language) for all the pages. This method has been tested and it works fine.
  • Several other utilities are available in PO4A, and the Gettext tools can be used independently.

Pros and cons

On the plus side:

  • The translator doesn't do anything manually, except translating.
  • Updating translations is very easy.
  • The msgids and msgstrs are conveniently displayed in a PO editor, which often has elaborate features such as a translation memory. A simple text editor can also be used.
  • The validity of POs and regenerated pages can be checked automatically.
  • The PO-sharing feature is equivalent to a templating system, from the translators' point of view.

On the minus side:

  • The PO format is rather strict, and the original page has to be well-formed XHTML.
  • Converting an existing translation to the PO format can be rather frustrating if it doesn't have exactly the same structure as the original. But this happens only once.
  • Any small change to the original requires an update, even when it doesn't affect translations.
  • po4a-updatepo may propose a translation for a new string if it is similar to an existing one, in which case the string is marked fuzzy. This type of fuzz requires very careful checking.
  • To take advantage of the PO-sharing feature, the strings that are common to different originals have to be strictly identical. HTML tags may be different as long as they don't affect the text itself. For instance <li><p>...</p></li> can be replaced with <li>...</li> (except for initial conversion of an existing translation).
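The point about fuzzy strings can be illustrated with a small Python sketch. It uses the standard difflib module as a stand-in for the similarity matching done by the Gettext tools, whose actual algorithm is different; the function names and the sample strings are hypothetical.

```python
import difflib


def propose_fuzzy(new_msgid, existing, cutoff=0.6):
    """Return the translation of the most similar known msgid, or None.

    `existing` maps msgids to their translations. difflib is only a
    stand-in here; it is not the matcher used by Gettext.
    """
    matches = difflib.get_close_matches(
        new_msgid, list(existing), n=1, cutoff=cutoff
    )
    return existing[matches[0]] if matches else None


def updated_entry(new_msgid, existing):
    """Render a simplified PO entry, flagged fuzzy when a suggestion is reused."""
    suggestion = propose_fuzzy(new_msgid, existing)
    if suggestion is None:
        return f'msgid "{new_msgid}"\nmsgstr ""\n'
    return f'#, fuzzy\nmsgid "{new_msgid}"\nmsgstr "{suggestion}"\n'


if __name__ == "__main__":
    known = {"Enable encryption": "Activer le chiffrement"}
    # The suggested translation means the opposite of the new string.
    print(updated_entry("Disable encryption", known))
```

In this example the new string "Disable encryption" inherits the translation of "Enable encryption", which is exactly the kind of suggestion that has to be checked by hand before the fuzzy marker is removed.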

My conclusion:
All in all, the pros outweigh the cons. If I have to choose between translating POs and setting up a web server, I will choose the former any time.

==> What is your conclusion?

Implementation

Templating

Since the language list is not supposed to be translated, the POT is made from a file that doesn't include it; inclusion is done after regeneration of the page, and customization (addition of the “current” class) is done just before inclusion, or immediately afterwards.
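As a rough illustration of this include step, the following Python sketch customizes a language list and inserts it into a regenerated page. The placeholder comment, the markup of the list, and the function names are all assumptions made for the example; the real pages may mark the insertion point differently.

```python
# Hypothetical placeholder left in the stripped page where the list belongs.
PLACEHOLDER = "<!-- include: language-list -->"

# Hypothetical, untranslated language list shared by all pages.
LANGUAGE_LIST = (
    '<ul id="languages">\n'
    '<li><a href="/en">English</a></li>\n'
    '<li><a href="/fr">français</a></li>\n'
    "</ul>"
)


def mark_current(language_list, lang):
    """Add the "current" class to the link of the page's own language."""
    return language_list.replace(
        f'<a href="/{lang}">', f'<a class="current" href="/{lang}">', 1
    )


def include_language_list(page, language_list, lang):
    """Customize the list, then include it in the regenerated page."""
    return page.replace(PLACEHOLDER, mark_current(language_list, lang), 1)


if __name__ == "__main__":
    page = f"<body>\n{PLACEHOLDER}\n<p>Bonjour</p>\n</body>"
    print(include_language_list(page, LANGUAGE_LIST, "fr"))
```

Here customization happens just before inclusion; doing it immediately afterwards would work the same way on the assembled page.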

We may also have other includes: javascript, and header (doctype + html tag + untranslatable meta, link, and optional style elements, some of which need to be customized for RTL languages).

The 3 home pages (index.html, mac.html and windows.html) could be templated further (kitchen-style).

Structure of the repo

1st possibility: one development branch for each language

This is what Zak proposes. To make a clear distinction between branches and subdirectories, branches are in parentheses.

  • enc/en(next): stripped originals being worked on
  • enc/en(master): current stripped originals
  • enc(master): current POT and common tools (includes, scripts and makefiles)
  • enc/LANG(master): up-to-date POs
  • enc/LANG(dev): POs being worked on and language-specific scripts (e.g. to customize the language list or the header)
  • enc/en(untracked): complete originals
  • enc/LANG(untracked): complete translations
  • live/en(master): published originals
  • live/LANG(master): published translations

I see a problem with the following statement: “Translators should never push anything here unless directed by the FSF.” For the FSF to determine whether a translation should be pushed, the development branches (dev) have to be checked out one by one (unless there are as many copies of the repo on the FSF's machine as there are languages). This may become tedious.

2nd possibility: one module for each language

It is very easy to create modules from the current repo without losing any history (tested).

Advantages:

  • Each module only has the history of its own files. It is thus much lighter than the complete repo (2.5 to 13 MB, depending on the language).
  • No need to switch branches, something that may be problematic for people who are new to Git. Translators push to their module's master branch whenever they wish to save their work, instead of either pushing to their own branch or merging with master.
  • No need to switch branches to see the current state of translations.

Disadvantages:

  • Setting up the working directory is a little more complicated than in the current setup: translators have to check out the main repo (enc), then 3 modules (LANG, en, and static); the FSF has to check out all the modules, but this can be done automatically.
  • Modules need to be updated individually, but this too can be done by a script.
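A script for the setup step might look like the Python sketch below. It only builds the git commands and does not run them. It assumes that each module is a separate Git repository cloned into a subdirectory of the main checkout; the base URL and the repository names are hypothetical placeholders.

```python
# Hypothetical base URL; replace with the real location of the repositories.
BASE_URL = "https://example.org/git"


def clone_commands(lang, base_url=BASE_URL):
    """List the git commands a translator would run to set up a working directory.

    The main repo (enc) is cloned first, then the three modules named in the
    text: the translator's own language, en, and static.
    """
    commands = [["git", "clone", f"{base_url}/enc.git", "enc"]]
    modules = []
    for module in (lang, "en", "static"):
        if module not in modules:  # avoid cloning "en" twice for English
            modules.append(module)
    for module in modules:
        commands.append(
            ["git", "clone", f"{base_url}/{module}.git", f"enc/{module}"]
        )
    return commands


if __name__ == "__main__":
    for command in clone_commands("fr"):
        print(" ".join(command))
```

Each command list can be passed to subprocess.run once the real URLs are known; updating the modules individually could be scripted the same way with git pull.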

From now on, we assume that each language has its own module. Each directory contains the same files as above, only there is no distinction between the up-to-date POs and the POs being worked on.

Creating the POT and the POs

This is done with po4a-gettextize. When there is no translation, it makes a POT file. A big POT already exists for version 4.0.

When a translation exists and has exactly the same structure as the original (same number of strings, same HTML tags), po4a-gettextize makes a PO. If the structures are slightly different, there are ways to make them fit artificially. The alternative is to fill in the POT manually.

A tip: make the PO from the version of the original document it corresponds to.
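The two cases can be sketched as a small Python helper that assembles the po4a-gettextize command line. The options used (-f for the format, -m for the original, -l for an existing translation, -p for the output file) are the ones documented in the po4a-gettextize man page; the file names are hypothetical.

```python
def gettextize_command(master, output, localized=None, fmt="xhtml"):
    """Build the po4a-gettextize command line.

    Without `localized`, the output is a POT (empty msgstrs). With it, the
    existing translation is paired with the original to produce a PO.
    """
    command = ["po4a-gettextize", "-f", fmt, "-m", master]
    if localized is not None:
        command += ["-l", localized]
    command += ["-p", output]
    return command


if __name__ == "__main__":
    # Hypothetical file names.
    print(" ".join(gettextize_command("en/index.html", "esd.pot")))
    print(" ".join(gettextize_command("en/index.html", "fr.po", "fr/index.html")))
```

Following the tip above, the original passed with -m should be the version that the existing translation was made from.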

Updating translations or translating from scratch

  • When a new version is ready, the FSF updates the POT and propagates the modifications to the POs in enc/LANG (po4a-updatepo).
  • The language teams work on the new or modified strings. During the translation process, the PO can easily be converted to generic HTML for proofreading.
  • When a team decides that their translation is ready, they push the POs to the master branch of their module. A pre-receive hook makes sure the POs are valid (msgcat), and will produce well-formed XHTML (xmllint). A similar script already exists.
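The update and validation steps above could be scripted roughly as follows. This Python sketch is not the existing hook: it only shows one plausible shape for the checks, using documented options of po4a-updatepo (-f, -m, -p), msgcat (-o) and xmllint (--noout). The file names are hypothetical, and the runner is injectable so the logic can be exercised without the tools installed.

```python
import subprocess


def updatepo_command(master, po_file, fmt="xhtml"):
    """Build the command that propagates changes in the original to a PO."""
    return ["po4a-updatepo", "-f", fmt, "-m", master, "-p", po_file]


def check_commands(po_file, html_file):
    """Commands a hook could run: parse the PO, then check the generated page."""
    return [
        ["msgcat", "-o", "/dev/null", po_file],  # fails if the PO is malformed
        ["xmllint", "--noout", html_file],       # fails if not well-formed XML
    ]


def run_checks(po_file, html_file, run=subprocess.run):
    """Return True only if every check exits with status 0."""
    return all(
        run(command).returncode == 0
        for command in check_commands(po_file, html_file)
    )


if __name__ == "__main__":
    print(" ".join(updatepo_command("en/index.html", "fr/fr.po")))
    for command in check_commands("fr/fr.po", "fr/index.html"):
        print(" ".join(command))
```

A real pre-receive hook would also have to extract the pushed files from the incoming commits before checking them, which is left out here.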

Note: Translators should be allowed to fix trivial errors in the English version (without ever touching the text itself, of course).

Generating and publishing translated pages

The FSF generates pages from POs in the enc/LANG directory, and checks them. These files should be ignored by Git because they can be regenerated at any time.
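The generation step can be sketched as a Python helper that builds one po4a-translate command per page. The options (-f, -m, -p, -l) are those documented in the po4a-translate man page; the directory layout and file names are assumptions made for the example.

```python
def translate_command(master, po_file, output, fmt="xhtml"):
    """Build the po4a-translate command that regenerates one translated page."""
    return ["po4a-translate", "-f", fmt, "-m", master, "-p", po_file, "-l", output]


def translate_all(pages, lang):
    """One command per page, assuming a single big PO per language."""
    return [
        translate_command(f"en/{page}", f"{lang}/{lang}.po", f"{lang}/{page}")
        for page in pages
    ]


if __name__ == "__main__":
    # The three home pages named earlier in this proposal.
    for command in translate_all(["index.html", "mac.html", "windows.html"], "fr"):
        print(" ".join(command))
```

The generated files land next to the PO in the language directory, which is why they need to be listed in that module's ignore file.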

Note: Regeneration should only take place after a team pushes their updated POs, and before the original is further modified. Otherwise, perfectly good translated strings will be replaced with English strings.

If the pages look good, they are moved or copied to the live/master working directory, committed, and pushed. Merging the two repos wouldn't do the trick because the translated pages are untracked.