The ATA Chronicle - September/October 2021 - 36

Figure 4: Fine-tuning settings in OPUS-CAT
Figure 5: Tracking fine-tuning progress with BLEU scores. From: Nieminen 2021

customize.yml file (see Figure 3b), which can be opened with Notepad++ or by clicking Open fine-tune setting in text editor in the Settings tab of the GUI (see Figure 2).

Consider using more demanding validation sets (e.g., your own domain-specific parallel Unicode text files with 100-300 sentence pairs) instead of letting the system split your TMX. And if you know what you're doing: try repeated training (with the same or different TMs) by replacing the model file in the base model directory with a similarly named file from the fine-tuned model directory and deleting the latter. As a last resort, try increasing the learning rate (roughly, the proportion by which the parameters of your model are adjusted after each training batch) from, for example, learn-rate:0.00002 to, say, learn-rate:0.00005 in the customize.yml file. But beware of overfitting: the system may effectively "memorize" your training data and become unable to generalize beyond them. This is a bit like manually oversteering a self-driving car into a wreck.

If you have a good in-domain glossary, try to repeat the training with it and see what happens. (I haven't done it yet.) Finally, don't use any of your training sentences in validation or testing. That would run afoul of the first principle of machine learning! ("Never use your training data in testing!") I suggest using your CAT tool to filter out any TM matches > 75% from both.

So far, my decidedly
unscientific experimentation with all of that has produced mixed results. For example, using in-domain resources enabled the system to correct its initial translation of "fibrotic infiltration" from фибротическая инфильтрация to фибриозная инфильтрация. But the fine-tuned model sometimes misses or mistranslates entire fragments in the test sentences, such as "positive end expiratory pressure," while generating very fluent output. Post-editing it would definitely require extra care. Repeated training or increasing the number of epochs improves the translation of some parts of a test sentence but degrades others. Fine-tuning with a mammoth TM (general health care, about 350K units) and a more narrowly specialized custom validation set took over six hours, degraded both in- and out-of-domain performance, but did surprisingly well on that famous "garden path" sentence, translating "The horse raced past the barn fell" as Лошадь, которая бежала мимо сарая, упала ("The horse, which ran past the barn, fell"), and beating Google Translate, DeepL, and all my previous OPUS-CAT models. Neural networks are
www.atanet.org
