GPG guide/Repo Restructure

From LibrePlanet
Revision as of 15:22, 21 February 2016 by Tgodef (talk | contribs) (The PO4A suite)

Current processes for git usage (FSF only)

Translators work in enc/master (or in their own branches on enc, merging with master when they are done). At any given time, some of the translations in enc/master are ready for publication and some are not.

Once translators notify the FSF that their language is ready, the FSF reviews the translation in enc/master by viewing it online at https://enc-dev0.fsf.org. Once we are satisfied, we manually copy the directory for the language from enc/master to a local checkout of enc-live/master, test it locally, and then commit and push directly to the live site.

  • The testing environment is farther from the live environment than is ideal, but the translation will already have been tested on the dev server and on your local machine, so it is unlikely that something will go wrong once it is live. Still, this system is sloppy and produces poor git records of the publishing process.

Zak's Proposed New System

  • If you have tweaks or corrections, please make them in this section. If you'd like to propose an entirely separate new system (which you are more than welcome to do), please start a new section below.

The current method is problematic because all the translators are working on the same branch (enc/master). If one language is finished and another is not, the FSF cannot publish the finished language by merging enc/master into enc-live/master, because that would also publish the unfinished language. Instead, the FSF manually copies a finished language's directory from a local checkout of enc/master to a local checkout of enc-live/master. This is not best practice, because it forgoes testing in an environment that matches the one where the translation will be displayed in enc-live.

This is a proposal for a new system (as well as a transition plan from the old system).

Branch structure

Email Self-Defense has two repositories, which I'll refer to here as enc (for development and testing) and enc-live (the contents of which are served directly to the live site). enc has many branches, but enc-live has only one:

    • enc/master - Translators should never push anything here unless directed by the FSF. It is used only for final testing during the FSF's publishing process. Hosted at https://enc-dev0.fsf.org/master.
    • enc/LANGUAGECODE-translation - One such branch for each language. This is where first-time translations and translation updates are developed.
    • enc-live/master - The only branch in the live repo, hosted at https://emailselfdefense.fsf.org. It is edited only by the FSF, and only by merging from enc/master.

Translating

When developing a new translation or working on any update or change to an existing translation, translators should work in the corresponding branch for their translation (origin/LANGUAGECODE-translation). If it doesn't exist, please create it from the master branch.
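
The branch setup described above could be scripted roughly as follows. This is a minimal sketch, not an FSF-provided tool: the language code `fr`, the remote name `origin`, and the helper names are assumptions for illustration. It only prints the git commands, so you can review them before running anything.

```python
import shlex


def translation_branch(lang_code: str) -> str:
    """Return the branch name for a language, following LANGUAGECODE-translation."""
    return f"{lang_code}-translation"


def branch_setup_commands(lang_code: str, remote: str = "origin") -> list[list[str]]:
    """Build the git commands that create a translation branch from master.

    The remote name is an assumption; adjust it to match your clone.
    """
    branch = translation_branch(lang_code)
    return [
        ["git", "fetch", remote],
        # Start the new branch from the remote master, without tracking master.
        ["git", "checkout", "--no-track", "-b", branch, f"{remote}/master"],
        # Publish the branch and make it the upstream for later pushes.
        ["git", "push", "-u", remote, branch],
    ]


if __name__ == "__main__":
    for cmd in branch_setup_commands("fr"):  # "fr" is an example language code
        print(shlex.join(cmd))
```

If the branch already exists on the server, only the fetch and a plain checkout of the existing branch are needed.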

When your translation is complete and peer reviewed, send an email to campaigns@fsf.org (FIXME COPY THIS PART). The FSF will review your branch, merge it into master, test it by reviewing that everything displays correctly on https://enc-dev0.fsf.org/next-version, and then publish by pushing enc/master to enc-live/master.
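
The review-and-publish sequence above might look like the following from the FSF side. This is a sketch only: the remote names `origin` (for enc) and `enc-live`, and the language code `fr`, are assumptions, and the check on the dev server between the two pushes remains a manual step.

```python
import shlex


def publish_steps(lang_code: str, dev_remote: str = "origin",
                  live_remote: str = "enc-live") -> list[tuple[str, list[str]]]:
    """Return (description, git command) pairs for publishing one language.

    Both remote names are assumptions about how the local clone is set up.
    """
    branch = f"{lang_code}-translation"
    return [
        ("fetch the translators' latest work", ["git", "fetch", dev_remote]),
        ("switch to master", ["git", "checkout", "master"]),
        ("merge the reviewed branch",
         ["git", "merge", "--no-ff", f"{dev_remote}/{branch}"]),
        ("push to enc for testing on the dev server",
         ["git", "push", dev_remote, "master"]),
        # Run this last step only after the pages have been checked by hand.
        ("publish to the live repository",
         ["git", "push", live_remote, "master"]),
    ]


if __name__ == "__main__":
    for description, cmd in publish_steps("fr"):
        print(f"# {description}")
        print(shlex.join(cmd))
```

Using `--no-ff` keeps one merge commit per published translation, which would give the clearer git record of the publishing process that the current system lacks.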

Reviewing your work during translation

Previously, the recommended way to review your work was to push to enc/master and view it on the Web at https://enc-dev0.fsf.org. However, with the new system, we aren't allowing partially finished work on enc/master. From now on, you should set up your own development server on your computer and use it to review your work. (FIXME LINK TO SOME RESOURCES FOR THIS) If you have trouble setting up your development environment, please reach out for help to the esd-translators list. Do not push to master!

Developing a new version (FSF only, English only)

Copy the current contents of enc/master into enc/next-version and edit it. A normal merge most likely won't work -- you'll have to actually copy or use another git technique.

When it's ready, merge enc/next-version into enc/master and then delete everything in enc/next-version and replace it with a single page that explains there is no new version in development at the moment.

Transition plan

Translators should remove any branches on the FSF's server that they are not actually using. The FSF will remove the confusing and outdated origin/live branch (which is not actually live anywhere).

There is currently some unfinished translation work in enc/master. Translators: **please make a new translation branch for your language from master and work from there.** In one month, Zak will create a backup snapshot of enc/master and then will **overwrite master with a current copy of live**.


Therese's proposal: translate PO files

Why switch to the PO system

Not having easy access to the displayed translated pages is a problem for translators right now because they are responsible for keeping the structure of their pages intact. A system that minimizes their dealings with HTML tags would reduce the need to view a full-featured page while it is being worked on. This is what PO4A does, with the help of the Gettext tools.

The PO4A suite

It includes many programs. We would essentially use three of them:

  • po4a-gettextize extracts translatable strings from the original version of a document, and lists them in PO format. In the resulting POT (PO template) file, each original string (msgid) is followed by an empty msgstr which will hold the translation. The same POT is used by all the teams to initiate their own PO file. PO4A has modules for many formats, including XHTML (based on the XML module).
  • po4a-translate generates the translated page from the PO and original XHTML.
  • po4a-updatepo transfers modifications and new strings from the original to the POs, and adds “fuzzy” markers to the modified strings. The obsolete strings are kept and can be reused.

In addition:

  • po4a will do the job of all three if the proper configuration file is provided. According to the man page, it is possible to share POs between different documents. When a common string is updated in one PO, its counterparts in the other POs are automatically updated. Another way to do the same thing is to have a single PO for all the pages. TODO: test this.
  • Several other utilities are available in PO4A, and the Gettext tools can be used independently.
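
As a rough illustration of the configuration-file approach, the snippet below renders a minimal po4a configuration that uses a single POT, and one PO per language, for all pages. The page names, the `po/` directory layout, and the language list are hypothetical; the `[po4a_langs]`, `[po4a_paths]`, and `[type: xhtml]` lines follow the syntax described in the po4a man page. Like the sharing feature itself, this has not been tested against the Email Self-Defense pages.

```python
def render_po4a_config(langs, pages, pot="po/esd.pot", po_pattern="po/$lang.po"):
    """Render a po4a configuration sharing one PO per language across pages.

    All paths are placeholders; po4a expands $lang for each listed language.
    """
    lines = [
        "[po4a_langs] " + " ".join(langs),
        f"[po4a_paths] {pot} $lang:{po_pattern}",
    ]
    for page in pages:
        stem, ext = page.rsplit(".", 1)
        # Each translated page is written next to the original, e.g. index.fr.html.
        lines.append(f"[type: xhtml] {page} $lang:{stem}.$lang.{ext}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical languages and page names.
    print(render_po4a_config(["fr", "de"], ["index.html", "next_steps.html"]), end="")
```

Saving the output as, say, po4a.cfg and running `po4a po4a.cfg` would then update the POT and the POs and regenerate the translated pages in one pass.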

Pros and cons

On the plus side:

  • The translator doesn't do anything manually, except translating.
  • Updating translations is very easy.
  • The msgid's and msgstr's are conveniently displayed in a PO editor which often has elaborate features such as a translation memory. A simple text editor can also be used.
  • The validity of POs and regenerated pages can be checked automatically.
  • The PO-sharing feature is equivalent to a templating system, from the translators' point of view.

On the minus side:

  • The PO format is rather strict, and the original page has to be valid XHTML.
  • To make use of the PO-sharing feature, the strings which are common to different originals have to be strictly identical. HTML tags may be different as long as they don't affect the text itself. For instance <li><p>...</p></li> can be replaced with <li>...</li>.
  • Any small change to the original requires an update, even when it doesn't affect translations.
  • po4a-updatepo may propose a translation for a new string if it is similar to an existing one, in which case the string is marked fuzzy. This type of fuzz requires very careful checking.
  • Converting an existing translation to the PO format can be rather frustrating if it doesn't have exactly the same structure as the original. But this happens only once.
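
To make the fuzzy check less error-prone, a translator could list every fuzzy entry before declaring a PO finished. The sketch below is a deliberately simple reader for the common PO layout (entries separated by blank lines, a `#, fuzzy` flag line); it is not a full PO parser, and the sample strings are invented for illustration.

```python
import ast


def fuzzy_msgids(po_text: str) -> list[str]:
    """Return the msgid of every entry flagged fuzzy in a PO file's text."""
    result = []
    for entry in po_text.strip().split("\n\n"):
        lines = entry.splitlines()
        # Flag lines start with "#," and may carry several flags.
        if not any(ln.startswith("#,") and "fuzzy" in ln for ln in lines):
            continue
        parts, in_msgid = [], False
        for ln in lines:
            if ln.startswith("msgid "):
                in_msgid = True
                ln = ln[len("msgid "):]
            elif ln.startswith("msgstr"):
                in_msgid = False
            if in_msgid and ln.startswith('"'):
                # PO strings use C-style quoting, which literal_eval handles
                # for the usual escapes such as \n and \".
                parts.append(ast.literal_eval(ln))
        msgid = "".join(parts)
        if msgid:  # skip the PO header, whose msgid is empty
            result.append(msgid)
    return result


# Invented sample: the first entry is fuzzy, the second is not.
SAMPLE = '''\
#, fuzzy
msgid "Install the programs"
msgstr "Installez le programme"

msgid "Next steps"
msgstr "Prochaines étapes"
'''

if __name__ == "__main__":
    for msgid in fuzzy_msgids(SAMPLE):
        print(msgid)
```

The Gettext tool msgattrib, run with `--only-fuzzy`, produces a similar listing and copes with every PO variant, so it is the safer choice where it is installed.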

My conclusion:
All in all, the pros outweigh the cons. If I have to choose between translating POs and setting up a web server, I will choose the former any time.

==> What is your conclusion?

Implementation

Branch structure

  • enc/next-version: as in Zak's proposal.
  • enc/LANGUAGECODE-translation:
    • PO files being worked on (tracked);
    • Other files, either tracked (e.g. scripts), or untracked (e.g. regenerated HTML).
  • enc/master:
    • Current original pages and derived POT files (tracked).
    • Up-to-date POs (tracked), and derived translated pages (untracked).
  • live/master: publishable pages (tracked).

Creating the POs

This is done with po4a-gettextize. When there is no translation, it makes a POT file. When there is a translation which has exactly the same structure as the original (same number of strings, same HTML tags), it makes a PO. If the structures are slightly different, there are ways to make them fit artificially. The alternate method is to fill up the POT manually.

A tip: make the PO from the version of the original document it corresponds to.
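
The two cases above differ only in whether an existing translation is passed to po4a-gettextize. A small sketch of the command lines, assuming hypothetical file names (index.html, index.fr.html, po/index.pot, po/fr.po); the `-f`, `-m`, `-l`, and `-p` options are those documented for po4a-gettextize:

```python
import shlex


def gettextize_command(master, po_out, localized=None, fmt="xhtml"):
    """Build a po4a-gettextize command line.

    Without `localized`, the output is a POT with empty msgstr entries.
    With `localized`, po4a-gettextize pairs original and translated strings,
    which only works when both documents have the same structure.
    """
    cmd = ["po4a-gettextize", "-f", fmt, "-m", master, "-p", po_out]
    if localized is not None:
        cmd += ["-l", localized]
    return cmd


if __name__ == "__main__":
    # No translation yet: make a POT (file names are placeholders).
    print(shlex.join(gettextize_command("index.html", "po/index.pot")))
    # Existing translation with matching structure: make a PO.
    print(shlex.join(gettextize_command("index.html", "po/fr.po", "index.fr.html")))
```

Following the tip above, `master` should be the version of the original that the existing translation was made from, not necessarily the current one.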

Updating translations or translating from scratch

  • When a new version is ready, the FSF updates the POTs and propagates the modifications to the POs in enc/LANGUAGECODE-translation (po4a-updatepo).
  • The language teams work on the new or modified strings. During the translation process, the PO can easily be converted into generic HTML for proofreading.
  • When a team decides that their translation is ready, they push the POs to enc/master. A pre-receive hook makes sure they are valid (msgcat), and will produce valid XHTML (xmllint).
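
The three steps above map onto a handful of commands, sketched here as command-line builders. File names are placeholders, and how a pre-receive hook would extract the pushed files is left out; only the checks themselves are shown. The assumption is that msgcat is used purely as a syntax check (its output is discarded) and that xmllint checks well-formedness rather than validating against a DTD.

```python
import shlex


def update_po_command(master, po, fmt="xhtml"):
    """po4a-updatepo: carry changes in the original over to an existing PO."""
    return ["po4a-updatepo", "-f", fmt, "-m", master, "-p", po]


def proofread_command(master, po, out, keep=0, fmt="xhtml"):
    """po4a-translate: build a page from the PO for proofreading.

    keep=0 asks po4a-translate to write the page even if few strings are
    translated yet; untranslated strings appear in English.
    """
    return ["po4a-translate", "-f", fmt, "-m", master, "-p", po,
            "-l", out, "-k", str(keep)]


def validation_commands(po, page):
    """Checks a pre-receive hook might run: PO syntax, then XML well-formedness."""
    return [
        ["msgcat", "-o", "/dev/null", po],
        ["xmllint", "--noout", page],
    ]


if __name__ == "__main__":
    # Placeholder file names for a French translation of one page.
    print(shlex.join(update_po_command("index.html", "po/fr.po")))
    print(shlex.join(proofread_command("index.html", "po/fr.po", "index.fr.html")))
    for cmd in validation_commands("po/fr.po", "index.fr.html"):
        print(shlex.join(cmd))
```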

Note: Translators should be allowed to fix trivial errors in the English version (without ever touching the text itself, of course).

Generating and publishing translated pages

The FSF generates pages from POs in the enc/master working directory, and checks them. These files should be ignored by Git because they can be regenerated any time.

Note: Regeneration should only take place after a team pushes their updated POs, and before the original is further modified. Otherwise, perfectly good translated strings will be replaced with English strings.

If the pages look good, they are moved or copied to the live/master working directory, committed, and pushed. Merging enc/master into live/master wouldn't do the trick, since the translated pages are not tracked in enc/master.
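
The copy step could be scripted so that only generated pages reach the live checkout. A minimal sketch, assuming both working directories sit side by side on the FSF's machine and that translated pages can be selected with a file-name pattern; the directory names and the pattern are hypothetical.

```python
import shutil
from pathlib import Path


def copy_pages(build_dir: str, live_dir: str, pattern: str = "*.html") -> list[str]:
    """Copy generated pages into the live checkout and return their names.

    Only files directly inside build_dir that match the pattern are copied;
    anything else (POs, scripts) stays behind.
    """
    src, dest = Path(build_dir), Path(live_dir)
    copied = []
    for page in sorted(src.glob(pattern)):
        shutil.copy2(page, dest / page.name)
        copied.append(page.name)
    return copied


if __name__ == "__main__":
    # Hypothetical checkout locations; adjust before use.
    enc_checkout, live_checkout = Path("enc"), Path("enc-live")
    if enc_checkout.is_dir() and live_checkout.is_dir():
        for name in copy_pages(str(enc_checkout), str(live_checkout), "*.fr.html"):
            print(f"copied {name}")
```

After the copy, the usual `git add`, `git commit`, and `git push` in the live checkout complete the publication, and the commit records exactly which pages changed.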