
Topic: Transifex Translation Memory Fill-Up

Nordfriese
Topic Opener
Joined: 2017-01-17, 18:07
Posts: 2051
OS: Debian Testing
Version: Latest master
Ranking
One Elder of Players
Location: 0x55555d3a34c0
Posted at: 2022-05-21, 09:29

Transifex offers an option to "automatically translate phrases with exact matches from the Translation Memory" when a source string is updated. This reduces work when the same string appears in multiple contexts, but may also lead to incorrect autogenerated translations in case a string needs to be disambiguated in a target language but not in English. Should we enable this option, or is it better to leave it off?


tothxa
Joined: 2021-03-24, 12:44
Posts: 481
OS: antix / Debian
Version: some new PR I'm testing
Ranking
Tribe Member
Posted at: 2022-05-22, 13:36

Can we get some statistics on how much it would fill without actually doing it? Does Transifex mark these auto-filled translations? (I once uploaded some offline translations with such draft translator remarks, and Transifex just ignored them.) Between different source files it may be less risky, but IMO different contexts should be respected and manually reviewed.

I think a bigger issue is a minor fix in the English text, especially if it's just punctuation, a typo, or English-specific grammar that doesn't require changing translations that were done properly. gettext and Transifex handle these as completely new strings, so the translation is lost and must be restored manually from the translation memory. But of course these are also better checked manually, because there's no automatic way to tell whether a small change requires a change in a given translation. Again, ideally these should be marked as minor changes that need re-examination.
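A minimal sketch of how such a "minor change" flag could be computed, assuming a plain character-similarity check is acceptable as a first filter. `classify_change` and its 0.9 threshold are hypothetical; this is not something gettext or Transifex are claimed to provide. A "minor" result can only mark the old translation for review: even a one-letter edit like "Soldier" to "Soldiers" may need a different translation.

```python
from difflib import SequenceMatcher


def classify_change(old_source: str, new_source: str, threshold: float = 0.9) -> str:
    """Classify an edit to an English source string.

    Hypothetical helper: returns "unchanged", "minor" (similarity ratio at or
    above `threshold`) or "major". The 0.9 default is an arbitrary assumption.
    "minor" means: keep the old translation as a draft for a translator to
    re-examine, never accept it automatically.
    """
    if old_source == new_source:
        return "unchanged"
    # ratio() is 2*M/T: M = matching characters, T = combined length of both strings.
    ratio = SequenceMatcher(None, old_source, new_source).ratio()
    return "minor" if ratio >= threshold else "major"


print(classify_change("Build a new road.", "Build a new road"))  # minor (punctuation only)
print(classify_change("Soldier", "Soldiers"))  # minor, yet the translation may still change
print(classify_change("Attack", "Dismantle building"))  # major (unrelated text)
```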


Nordfriese
Topic Opener
Joined: 2017-01-17, 18:07
Posts: 2051
OS: Debian Testing
Version: Latest master
Ranking
One Elder of Players
Location: 0x55555d3a34c0
Posted at: 2022-05-22, 14:30

Transifex doesn't offer such statistics AFAIK, so I wrote a quick program (attached) to gather them. Plural forms and contexts are ignored for the sake of simplicity. These are the numbers of duplicate strings per language in current master that would be filled in:

Language  Translated  Untranslated  Duplicates
ar              1330          6320         243
bg              3988          3662         289
br               866          6784         115
ca              7650             0           0
cs              7242           408           6
da              5578          2072         190
de              7645             5           0
el              2175          5475         220
en_GB           3585          4065         290
en_US             63          7587           6
eo              1247          6403         236
es              6447          1203          64
eu               331          7319          45
fa                93          7557          16
fi              7461           189          13
fr              7234           416           9
fy              1067          6583         210
ga                29          7621           0
gd              6592          1058          81
gl              1080          6570         185
he               308          7342          94
hi                33          7617           4
hr               741          6909         150
hu              7483           167           6
id               183          7467          48
ig                79          7571          24
it              5023          2627         140
ja              4537          3113         193
ka                32          7618           0
ko              6827           823          36
krl               75          7575          11
la               950          6700         202
lt               601          7049          72
ms              1335          6315         159
nb              2058          5592         229
nds             7645             5           0
nl              6030          1620         160
nn               747          6903         138
pl              5678          1972         227
pt              3959          3691         232
pt_BR           2211          5439         207
ro               271          7379          66
ru              7397           253           1
sk              3657          3993         244
sl              1285          6365         129
sr               206          7444          30
sr_RS             29          7621           0
sv              4600          3050         336
tr               621          7029         114
uk               932          6718         217
zh_CN           3373          4277         191
zh_TW            282          7368          62
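The attached program isn't reproduced here. As a rough illustration of what the three columns mean, here is a minimal sketch, assuming the translation memory for a language holds only this project's existing translations in that language and, as in the statistics, ignoring plural forms and matching on the English text alone. The function name, tuple layout, resource names and example data are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# (resource, msgctxt, msgid, msgstr) for ONE target language; msgstr == "" means untranslated.
Entry = Tuple[str, Optional[str], str, str]


def language_stats(entries: Iterable[Entry]) -> Tuple[int, int, int]:
    """Return (translated, untranslated, duplicates) for one language.

    "Duplicates" counts untranslated entries whose English msgid is already
    translated somewhere else, i.e. what an exact-match fill-up could fill in.
    msgctxt is deliberately ignored when matching.
    """
    entries = list(entries)
    translated_sources = {msgid for _, _, msgid, msgstr in entries if msgstr}
    translated = sum(1 for *_, msgstr in entries if msgstr)
    untranslated = len(entries) - translated
    duplicates = sum(
        1 for _, _, msgid, msgstr in entries if not msgstr and msgid in translated_sources
    )
    return translated, untranslated, duplicates


# Toy data made up for illustration, not taken from the real catalogs:
example_entries = [
    ("widelands", None, "Attack", "Angreifen"),
    ("tribes", None, "Attack", ""),  # same English text, untranslated: would be filled
    ("tribes", "empire", "Soldier", "Soldat"),
    ("tribes", "barbarians", "Soldier", ""),  # also filled, because context is ignored
    ("maps", None, "Harbor", ""),  # no match anywhere: stays untranslated
]

print(language_stats(example_entries))  # (2, 3, 2)
```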

This Transifex option is only on/off; we can't customise it to differentiate by resource or context.

There's unfortunately no way to treat two source strings as "similar" in gettext AFAIK.

Edited: 2022-05-22, 14:32

hessenfarmer
Joined: 2014-12-11, 23:16
Posts: 2730
Ranking
One Elder of Players
Location: Bavaria
Posted at: 2022-05-22, 16:00

Using this option would be problematic for all soldier-related translations. In English they are all "soldiers", but in German, for example, we have "Soldat" for the Empire and Atlanteans, "Krieger" for the Barbarians and Frisians, and "Kriegerin" for the Amazons.
I would really prefer to stick with manual checking and to use the suggested translation memory strings in these cases.
IIRC, we made these soldier strings translatable differently by using pgettext; before that, they were treated as the same string with multiple occurrences.
So AFAIK, 100% matching strings are translated the same unless they are separated by a function like pgettext.
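To illustrate the risk, here is a toy simulation of an exact-match fill-up, assuming the translation memory lookup compares only the English source text and not the pgettext context. The context names are hypothetical (not the actual msgctxt values in the code base); the German words are the ones from this post.

```python
from typing import Dict, Optional, Tuple

# A catalog maps (msgctxt, msgid) -> msgstr; "" means untranslated.
Catalog = Dict[Tuple[Optional[str], str], str]


def fill_from_memory(catalog: Catalog, memory: Dict[str, str]) -> Catalog:
    """Simulate exact-match fill-up: every untranslated entry whose English
    msgid appears in the translation memory gets the stored translation.
    Assumption: the lookup ignores msgctxt entirely. Returns a new dict."""
    return {
        (msgctxt, msgid): msgstr or memory.get(msgid, "")
        for (msgctxt, msgid), msgstr in catalog.items()
    }


german: Catalog = {
    ("empire", "Soldier"): "Soldat",
    ("atlanteans", "Soldier"): "Soldat",
    ("barbarians", "Soldier"): "",  # correct: "Krieger"
    ("frisians", "Soldier"): "",  # correct: "Krieger"
    ("amazons", "Soldier"): "",  # correct: "Kriegerin"
}
memory = {"Soldier": "Soldat"}  # learned from the Empire/Atlantean strings

for (tribe, _), translation in fill_from_memory(german, memory).items():
    print(f"{tribe:<11} {translation}")
```

Every tribe ends up with "Soldat", so the three tribes that need "Krieger" or "Kriegerin" would silently receive a plausible but wrong translation. Keying the lookup on (msgctxt, msgid) instead would leave those entries empty for a translator to fill in.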


tothxa
Joined: 2021-03-24, 12:44
Posts: 481
OS: antix / Debian
Version: some new PR I'm testing
Ranking
Tribe Member
Posted at: 2022-05-22, 17:17

300 strings, about 4% of the total, is indeed in the range that is annoying to some, but not annoying enough for others to easily give up manual checking. I'm in the latter camp. ;-)

