[Padre-dev] localizing plugin documentation

Enrique Nell perl_nell at telefonica.net
Thu Apr 30 17:15:49 PDT 2009


Hi

On Apr 30, 2009, at 3:31 PM, Joaquin Ferrero wrote:

> We used OmegaT for the Perl Spanish Project:

I have been using OmegaT lately. It hangs frequently on Mac OS X, but
otherwise it's quite nice: it provides fuzzy matching and handles
document updates conveniently, so I think it is a good option for
translating documentation.

The issue here is that most current computer-assisted translation (CAT)
tools, like OmegaT, work at the segment level, and POD is not very
friendly in this respect: paragraphs are formatted using line breaks,
so a typical segment (sentence) is broken across several lines. For
instance, a CAT tool will find 3 segments in the following
single-sentence paragraph:

If a message can be controlled by the C<warnings> pragma, its warning
category is included with the classification letter in the description
below.

There's no point in using this kind of translation memory tool if we
are not going to process complete sentences. So, to handle this
correctly, we should pre-process the POD files and either remove these
"inner" line breaks or turn them into a literal "\n". Is there any CPAN
module available for this? Is it possible with gettext? (I don't know
much about it... good references are welcome.)

Some time ago I developed a short program to solve a similar problem: I
had to remove some of the line breaks from a set of PDF export files to
get full sentences, in order to align them and create a translation
memory. It was based on some heuristics and the output wasn't perfect,
but it fixed a high percentage of the issues. Perhaps some of you know
better ways of doing this.
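A minimal sketch of the pre-processing step discussed above (joining the "inner" line breaks of POD paragraphs), assuming a pure .pod file with no surrounding Perl code and only POD's basic paragraph rules: paragraphs are separated by blank lines, a paragraph whose first line starts with whitespace is verbatim and must keep its line breaks, and a paragraph starting with "=" is a command. The function name `unwrap_pod` is hypothetical, not an existing CPAN module, and the sketch ignores `=begin`/`=end` and `=for` regions.

```python
import re


def unwrap_pod(text: str, joiner: str = " ") -> str:
    """Join the hard-wrapped lines of each ordinary POD paragraph.

    Hypothetical helper. Assumes a pure .pod document. Verbatim
    paragraphs (first line starts with whitespace) and command
    paragraphs (first line starts with "=") are left untouched.
    Pass joiner="\\n" to keep a literal backslash-n marker instead
    of a space.
    """
    # Paragraphs are separated by one or more blank lines
    # (lines that are empty or contain only spaces/tabs).
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    out = []
    for para in paragraphs:
        if para[:1] in (" ", "\t", "="):
            # Verbatim (code) or command paragraph: keep as-is.
            out.append(para)
        else:
            # Ordinary paragraph: one segment per paragraph.
            lines = [line.strip() for line in para.split("\n")]
            out.append(joiner.join(line for line in lines if line))
    return "\n\n".join(out) + "\n"
```

Because POD formatters treat a line break inside an ordinary paragraph as plain whitespace, joining with a single space should not change how the paragraph renders, while a CAT tool now sees the whole sentence as one segment.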

> The translation is at sentence level, and the process is very fast  
> if the players interchange the translation memories.
> Well, in theory...

In a typical project you won't have enough repetitions and high fuzzy
matches to get a speed boost (except for naïve texts); that would
require working at a finer granularity (the chunk level). But it
certainly helps improve consistency and makes updating the files a lot
easier: OmegaT pre-translates the new version of the file using the
translation memory and also shows the differences for fuzzy matches.

Enrique

