The ultimate goal of AI is a world that takes care of itself, so that humans can finally become a pack of carefree, barefooted, long-haired hippies. (← irony)

Posts

Lessons learned from analyzing values in multilingual encoders and what it means for LLMs

This post is a retrospective on two studies on multilingual sentence embeddings we published a year ago, with comments on what I think people analyzing LLMs today should take away from them. In late 2022, we (it was mainly the work of Kathy Hämmerl from...

en  ⏰ 4.0 min

Highlights from Machine Translation and Multilinguality in May 2024

Here are short summaries of three pre-prints that I enjoyed reading in May. Zero-Shot Tokenizer Transfer Folks from the University of Cambridge and the University of Edinburgh propose a nice trick for changing the vocabulary of an already trained language model. They train a hyper-network...

mtml-highlights  en  ⏰ 2.0 min

Highlights from Machine Translation and Multilinguality in April 2024

Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation Folks from the University of the Basque Country prepared an English-Spanish dataset for natural language inference (i.e., deciding if sentences follow from each other, are in contradiction, or have nothing to do with each other)...

mtml-highlights  en  ⏰ 2.2 min

Highlights from Machine Translation and Multilinguality in March 2024

Did Translation Models Get More Robust Without Anyone Even Noticing? Folks from Lisbon study how robust the newest MT systems are against source-side noise. Machine translation using large models, including translation-specific NLLB or via LLMs (such as Tower or GPT-3.5), is much more robust both...

mtml-highlights  en  ⏰ 1.6 min

Highlights from Machine Translation and Multilinguality in February 2024

With a new month, here are a few papers that I noticed on arXiv in February. Linear-time Minimum Bayes Risk Decoding with Reference Aggregation A preprint from the University of Zurich proposes a linear time version of Minimum Bayes Risk (MBR) decoding in machine translation....
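Since MBR decoding keeps coming up in these summaries, here is a minimal sketch of what plain (quadratic-time) MBR does; the utility function and all names are illustrative stand-ins, and the paper's contribution is precisely avoiding the inner loop over references by aggregating them first.

```python
def mbr_decode(hypotheses, utility):
    """Standard quadratic-time MBR: pick the hypothesis with the highest
    average utility against all sampled hypotheses, which double as
    pseudo-references."""
    best, best_score = None, float("-inf")
    for hyp in hypotheses:
        # Expected utility, approximated as the mean over the samples.
        score = sum(utility(hyp, ref) for ref in hypotheses) / len(hypotheses)
        if score > best_score:
            best, best_score = hyp, score
    return best

# Toy usage with a crude unigram-overlap utility (a stand-in for ChrF/COMET).
def unigram_f1(hyp, ref):
    h, r = set(hyp.split()), set(ref.split())
    return 2 * len(h & r) / (len(h) + len(r)) if h or r else 0.0

samples = ["the cat sat", "a cat sat", "the cat sits", "dogs bark"]
print(mbr_decode(samples, unigram_f1))
```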

mtml-highlights  en  ⏰ 2.0 min

Highlights from Machine Translation and Multilinguality in December 2023 and January 2024

Many things happened in the field in December: EMNLP took place, Google released Gemini, and Mixtral appeared. January was seemingly not that packed with new events, but plenty of new interesting work popped up on arXiv. Predicting Human Translation Difficulty with Neural Machine Translation Folks from the...

mtml-highlights  en  ⏰ 2.7 min

Highlights from Machine Translation and Multilinguality in November 2023

Here are a couple of articles that caught my attention in November. Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles A team from Johns Hopkins University published a pre-print that belongs to the currently trendy genre: stuff we can do with...

mtml-highlights  en  ⏰ 2.4 min

Highlights from Machine Translation and Multilinguality in October 2023

Here is my monthly summary of what papers on multilinguality and machine translation I found the most noteworthy during October 2023. There were 2,881 preprints in the computation and language category on arXiv (a new record number), so there is a big chance that there...

mtml-highlights  en  ⏰ 3.3 min

Highlights from Machine Translation and Multilinguality in summer 2023

Here are short summaries of the papers I liked the most during the (academic) summer. Also, this time, I am posting both on GitHub pages and on Medium. mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs The preprint from the University of Würzburg presents a recipe for...

mtml-highlights  en  ⏰ 2.4 min

Highlights from Machine Translation and Multilinguality in June 2023

Here are the preprints that I found the most interesting in June 2023. Exploring the Relationship between Alignment and Cross-lingual Transfer in Multilingual Transformers Folks from LORIA (a French research institute) and Posos (a French company) study the relationship between cross-lingual representation alignment and cross-lingual...

mtml-highlights  en  ⏰ 3.1 min

Speeding up arXiv browsing

Staying up to date with the newest NLP work is a tough job, and reading about new research takes a significant amount of my time. For several years, one of my work routines has been skimming over the arXiv digest. I open a few preprints,...
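As a flavor of what such automation can look like (this is only a minimal illustration using the public arXiv API, not the author's actual pipeline), the newest computation-and-language preprints can be fetched in a few lines:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Query the public arXiv API for the newest cs.CL (computation and
# language) preprints, sorted by submission date.
URL = ("http://export.arxiv.org/api/query?search_query=cat:cs.CL"
       "&sortBy=submittedDate&sortOrder=descending&max_results=10")

with urllib.request.urlopen(URL) as response:
    feed = ET.fromstring(response.read())

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the feed
for entry in feed.iter(f"{ATOM}entry"):
    title = " ".join(entry.find(f"{ATOM}title").text.split())
    link = entry.find(f"{ATOM}id").text
    print(f"{title}\n  {link}")
```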

en  automated-academic  ⏰ 4.9 min

Highlights from Machine Translation and Multilinguality in May 2023

Here are a few papers I found most interesting in the flood of new pre-prints on arXiv. There was ACL’s camera-ready deadline and the start of the EMNLP anonymity period, so there were many more papers than usual. What is the best recipe for character-level...

mtml-highlights  en  ⏰ 2.9 min

Highlights from Machine Translation and Multilinguality in April 2023

Here is my monthly summary of the new papers and preprints I liked the most during the previous month. Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis Several institutions in China did a thorough evaluation of how large language models work for...

mtml-highlights  en  ⏰ 2.9 min

Few words on Natural Language Processing and User Autonomy

As natural language processing (NLP) finds its way from university labs and becomes a crucial element of many user-facing technologies (machine translation, search, language-model-based assistants), people start to get concerned about the ethics of this technology. When people talk about NLP ethics, the main topics...

en  ⏰ 6.4 min

Highlights from Machine Translation and Multilinguality in March 2023

Here is what I found the most interesting in MT and multilinguality in March. I only feature two papers (both from Microsoft, by coincidence), not because there were too few on arXiv, but because I did not manage to read that much this month. DiTTO: A...

mtml-highlights  en  ⏰ 2.0 min

Highlights from Machine Translation and Multilinguality in February 2023

There were plenty of interesting pre-prints on arXiv in February. Here is a brief summary of three that I think are cool but could get lost in the hundreds of papers that went public. The unreasonable effectiveness of few-shot learning for machine translation Folks from...

mtml-highlights  en  ⏰ 2.1 min

Questions and answers about ChatGPT and large language models

Czech version of the post There’s been a lot of media coverage of ChatGPT and language models lately, and I feel like not everything is being said quite right. That’s why I have prepared some questions and answers that hopefully help clarify what they are talking about....

popularization  en  ⏰ 10.4 min

Otázky a odpovědi o ChatGPT a velkých jazykových modelech

English version of the post Lately, the media have been writing quite a lot about ChatGPT and language models, and I feel that not everything that is being said is quite right. That is why I have prepared some questions and answers that will hopefully help clarify what is actually being talked about....

popularization  cs  ⏰ 9.3 min

Highlights from Machine Translation and Multilinguality in December 2022 and January 2023

Here is what I found interesting on arXiv in December 2022 and January 2023. At the beginning of January, there were relatively few new pre-prints in general. But now it is catching momentum again, with more papers appearing every day. BLOOM+1: Adding Language Support to...

mtml-highlights  en  ⏰ 3.1 min

Why don't people use character-level MT? – One year later

In this post, I comment on our (i.e., myself, Helmut Schmid and Alex Fraser) year-old paper “Why don’t people use character-level machine translation,” published in Findings of ACL 2022. Here, I will (besides briefly summarizing the paper’s main message) mostly comment on what I learned...

en  ⏰ 4.0 min

Notes from EMNLP 2022

Last week I was at EMNLP in Abu Dhabi. Besides losing my passport and figuring out what to do on such an occasion (many thanks to the personnel of the Czech embassy in Abu Dhabi), I had plenty of interesting conversations and saw many interesting...

en  ⏰ 4.6 min

Highlights from Machine Translation and Multilinguality in November 2022

Here are my monthly highlights from papers on machine translation and multilinguality that appeared on arXiv in November 2022. A preprint with 19 authors from 13 institutions presents something like the T0 model: but instead of starting with the (more or less) monolingual T5 model, they...

mtml-highlights  en  ⏰ 2.1 min

Highlights from Machine Translation and Multilinguality in October 2022

Here are my monthly highlights from papers on machine translation and multilinguality that appeared on arXiv, many of them preprints from the upcoming EMNLP conference. Folks from Amazon published a pre-print that introduces a simple method for making pre-trained multilingual representations more robust towards...

mtml-highlights  en  ⏰ 2.6 min

Highlights from Machine Translation and Multilinguality in September 2022

Here are my monthly highlights from papers on machine translation and multilinguality. A preprint from the Nara Institute of Science and Technology shows that target-language-specific fully connected layers in the Transformer decoder improve multilingual and zero-shot MT compared to the current practice of using a special...
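To make the idea concrete, here is a tiny PyTorch sketch of a decoder sub-layer that keeps one feed-forward block per target language and routes by a language ID; module and parameter names are mine, not the paper's.

```python
import torch
import torch.nn as nn

class LanguageSpecificFFN(nn.Module):
    """One feed-forward sub-layer per target language; the rest of the
    Transformer decoder stays shared. Names and sizes are illustrative."""
    def __init__(self, languages, d_model=512, d_ff=2048):
        super().__init__()
        self.ffns = nn.ModuleDict({
            lang: nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for lang in languages})

    def forward(self, x, lang):
        # Route the hidden states through the block of the target language.
        return self.ffns[lang](x)

ffn = LanguageSpecificFFN(["de", "cs"])
hidden = torch.randn(8, 16, 512)   # (batch, length, d_model)
out = ffn(hidden, "de")            # same shape as the input
```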

mtml-highlights  en  ⏰ 2.4 min

Highlights from Machine Translation and Multilinguality in August 2022

There were not many papers I made notes about in August (likely because I was on vacation most of it). Anyway, here are three papers that I think should not be forgotten just because they went out in August. A paper by folks from JHU,...

mtml-highlights  en  ⏰ 1.2 min

Highlights from Machine Translation and Multilinguality in July 2022

Here is my monthly summary of what I found worth reading on arXiv in the past month. A preprint from JHU studies zero-shot cross-lingual transfer using pretrained multilingual representation and comes to the conclusion that it is an under-specified optimization problem. In other words, with...

mtml-highlights  en  ⏰ 2.1 min

Highlights from Machine Translation and Multilinguality in May and June 2022

After a while, here is a dump of what I found most interesting on arXiv about machine translation and multilinguality, covering May and June of this year. Google Research published a pre-print of their NAACL paper: SCONES (Single-label Contrastive Objective for Non-Exclusive Sequences). The paper...

mtml-highlights  en  ⏰ 1.4 min

Notes from ACL 2022

Here are some of my notes and comments on what I had a chance to see at ACL in Dublin last week (my first in-person conference since 2019). ACL D&I 60-60 initiative ACL announced its 60-60 initiative: for the 60th birthday of ACL, all materials...

en  ⏰ 3.5 min

Highlights from Machine Translation and Multilinguality 04/2022

Another month is over, so here is my overview of what I found most interesting in machine translation and multilinguality. Rotation ciphers as regularizers A paper accepted to ACL 2022 from Simon Fraser University experiments with using rotation ciphers on the source side of MT...

mtml-highlights  en  ⏰ 2.3 min

Highlights from Machine Translation and Multilinguality in March 2022

Here is a monthly summary of what I found most interesting on arXiv this month in machine translation and multilinguality. This month was the camera-ready deadline for ACL 2022, so many of the interesting papers were accepted to ACL. Overlapping BPE When training, BPE merges...

mtml-highlights  en  ⏰ 2.1 min

Highlights from Machine Translation and Multilinguality 02/2022

After 100 MT Weekly posts (which took me 130 weeks to write), I realized that weekly blogging is impossible while weekly teaching. So I decided to change the format of the post and write monthly summaries of what I found most interesting in machine translation...

mtml-highlights  en  ⏰ 2.3 min

Machine Translation Weekly 100: IGLUE as cool as igloo, multilingual and multimodal benchmark

This week I would like to feature a new multimodal-multilingual benchmark called IGLUE, presented in a pre-print that went out last Friday. The authors are from many places around the world: University of Copenhagen, Mila – Quebec Artificial Intelligence Institute, University of Cambridge, TU Darmstadt,...

mt-weekly  en  ⏰ 2.1 min

Machine Translation Weekly 99: Vícejazyčné jazykové modely občas také můžou mít problémy

Multilingual language models, and the technologies built on top of them, play a substantial role in making tools accessible that until recently were available only to speakers of large languages in the richer part of the planet. They make it possible (to some extent) to represent text in different languages in a unified way. Machine learning models trained in one language...

mt-weekly  en  ⏰ 4.8 min

Machine Translation Weekly 99: Multilingual models can also be evil

In a report published in December on arXiv, Google DeepMind tries to categorize major ethical and societal issues connected to large language models. The report probably does not say anything that was not known before, but I like the way they categorize the issues they...

mt-weekly  en  ⏰ 4.0 min

Machine Translation Weekly 98: XGLM: GPT-3 for 30 languages

By the end of the year, Meta AI (previously Facebook AI) published a pre-print introducing a multilingual version of GPT-3 called XGLM. As its title – Few-shot Learning with Multilingual Language Models – suggests, it explores the few-shot learning capabilities. The main takeaways are: Good...

mt-weekly  en  ⏰ 2.2 min

Machine Translation Weekly 97: Multilingual and Non-autoregressive MT at the same time

Multilingual machine translation models look very promising, especially for low-resource languages that can benefit from similar patterns in similar languages. A new preprint with authors from the University of Maryland and Google Research studies how these results transfer to non-autoregressive machine translation models. The title...

mt-weekly  en  ⏰ 1.7 min

Machine Translation Weekly 96: On Evaluation of Non-Autoregressive MT Systems

I often review papers on non-autoregressive machine translation and tend to repeat the same things in my reviews. The papers often compare non-comparable things to show the non-autoregressive models in a better light. Apart from the usual flaws in MT evaluation, non-autoregressive papers often (with...

mt-weekly  en  ⏰ 2.5 min

Machine Translation Weekly 95: Minimum Bayes Risk Decoding – the Cooler the Metric, the Cooler it gets

This week I am returning to a topic that I follow with fascination (cf. MT Weekly #20, #61, #63, and #66) without actually doing any research myself – decoding in machine translation models. The preprint I will discuss today comes from Google Research and has...

mt-weekly  en  ⏰ 2.6 min

Machine Translation Weekly 94: Notes from WMT 2021

After the notes from EMNLP 2021, here is also an unsorted list of some observations from the Conference on Machine Translation. Facebook AI won in many translation directions (though by no means in all of them) in the news task with a multilingual system. At the...

mt-weekly  en  ⏰ 1.6 min

Machine Translation Weekly 93: Notes from EMNLP 2021

Another big NLP conference is over and here are my notes about the paper that I liked the most. My general impression was sort of similar to what I got from ACL this year. It seems to me that the field is progressing towards some...

mt-weekly  en  ⏰ 4.5 min

Machine Translation Weekly 92: Multilingual Machine Translation with Plug-and-Play Embeddings

Deep learning models are prone to so-called catastrophic forgetting when finetuned on slightly different data than they were originally trained on. Often, they also badly generalize when confronted with data that do not exactly look like those they were trained on. On the other hand,...

mt-weekly  en  ⏰ 1.7 min

Machine Translation Weekly 91: Zero-Shot Machine Translation with a Universal Encoder from Pre-trained Representations

How many times have you heard someone saying that multilingual BERT or similar models could be used as a universal encoder in machine translation? I heard that (and said that) many times, but never heard about someone who actually did that, until now. Folks from...

mt-weekly  en  ⏰ 2.0 min

Machine Translation Weekly 90: The Surprising Multilinguality of Large Language Models

This week, I am going to share my amazement and doubts about what could be called the surprising multilinguality of large language models. By large language models, I mean the really large ones that I can hardly run myself, trained on huge, hardly curated data...

mt-weekly  en  ⏰ 3.1 min

Machine Translation Weekly 89: BPE and Memorization

Similar to last week, I will discuss a paper about input segmentation. The paper is not directly about machine translation or multilinguality but brings interesting insights for Transformer models in general. The title of the paper is How BPE affects memorization in Transformers, it has...

mt-weekly  en  ⏰ 1.7 min

Machine Translation Weekly 88: Text Segmentation and Multilinguality

With the semester start, it is also time to renew MT Weekly. My new year’s resolution was to make it to 100 issues, so let’s see if I can keep it. Today, I will talk about a paper by my colleagues from LMU Munich that...

mt-weekly  en  ⏰ 2.7 min

Machine Translation Weekly 87: Notes from ACL 2021

The story of the science fiction novel Roadside Picnic by Arkady and Boris Strugatsky (mostly known via Tarkovsky’s 1979 film Stalker) takes place after an extraterrestrial event called the Visitation. Some aliens stopped by, made a roadside picnic, and left behind plenty of weird and...

mt-weekly  en  ⏰ 2.8 min

Machine Translation Weekly 86: The Wisdom of the WMT Crowd

Most of the papers that I comment on and review here present novel and cool ideas on how to improve something in machine translation or multilingual NLP. On the other hand, the WMT submissions are different. People want to get the best translation quality and...

mt-weekly  en  ⏰ 2.5 min

Machine Translation Weekly 85: The Incredibility of MT Evaluation

This week, I will comment on a paper that quantifies and exactly measures the dimensions of the elephant in the room of machine translation: the lack of empirical support of claims in research papers on machine translation. The title of the paper is Scientific Credibility...

mt-weekly  en  ⏰ 2.3 min

Machine Translation Weekly 84: Order Agnostic Cross-Entropy

I tend to be a little biased against autoregressive models. The way they operate: say exactly one subword, think for a while, and then say again exactly one subword, just does not sound natural to me. Moreover, with current models, a subword can be anything...

mt-weekly  en  ⏰ 1.8 min

Machine Translation Weekly 83: On Language Identity and Zero-Shot Transfer

This week I will comment on two papers on zero-shot cross-lingual model transfer which do not focus on the representation quality but on the transfer itself. The title of the first one is Language Embeddings for Typology and Cross-lingual Transfer Learning and has authors from...

mt-weekly  en  ⏰ 2.4 min

Machine Translation Weekly 82: Multimodal Translation and the Visual Context

This week I am going to discuss (and criticize) a paper on multimodal machine translation that attempts to once again evaluate if and how the visual information could be useful in machine translation. The title of the paper is Good for Misconceived Reasons: An Empirical...

mt-weekly  en  ⏰ 3.0 min

Machine Translation Weekly 81: Unsupervised MT and Parallel Sentence Mining

This week I am going to briefly comment on a paper that uses unsupervised machine translation to improve unsupervised scoring for parallel data mining. The title of the paper is Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining, it has authors from Charles University and...

mt-weekly  en  ⏰ 1.3 min

Machine Translation Weekly 80: Deontological ethics and MT

At this year’s NAACL, there will be a paper that tries to view NLP from the perspective of deontological ethics and promotes an unusual and very insightful view on NLP ethics. The title of the paper is Case Study: Deontological Ethics in NLP, it was...

mt-weekly  en  ⏰ 3.2 min

My most amazing Makefile for CL papers

Automation of stuff that does not need to be automated at all is one of my favorite procrastination activities. As an experienced (and most of the time unsuccessful) submitter to conferences organized by ACL (ACL, NAACL, EACL, EMNLP), I spent a lot of procrastinating...

en  ⏰ 3.7 min

Machine Translation Weekly 79: More context in MT

The lack of broader context is one of the main problems in machine translation and in NLP in general. People have tried various methods with actually quite mixed results. A recent preprint from Unbabel introduces an unusual quantification of context-awareness and, based on that, does some...

mt-weekly  en  ⏰ 1.7 min

Machine Translation Weekly 78: Multilingual Hate Speech Detection

This week I will comment on a preprint Cross-lingual hate speech detection based on multilingual domain-specific word embeddings by authors from the University of Chile. The pre-print evaluates the possibility of cross-lingual transfer of models for hate speech detection, i.e., training a model in one...

mt-weekly  en  ⏰ 1.9 min

Machine Translation Weekly 77: Reference-free Evaluation

This week, I will comment on a paper by authors from the University of Maryland and Google Research on reference-free evaluation of machine translation, which seems quite disturbing to me and suggests there is a lot about current MT models we still don’t quite...

mt-weekly  en  ⏰ 2.4 min

Machine Translation Weekly 76: Zero-shot MT with pre-trained encoder

Using pre-trained multilingual representation as a universal encoder for machine translation might seem like an obvious thing to try: train a decoder into one target language using one or several source languages and you get a translation from 100 languages into the target language. This...

mt-weekly  en  ⏰ 1.7 min

Machine Translation Weekly 75: Outbound Translation

This week, I will comment on a paper by my good old friends from Charles University in collaboration with the University of Edinburgh, the University of Sheffield, and the University of Tartu within the Bergamot project. The main goal of the project is to develop...

mt-weekly  en  ⏰ 2.4 min

Machine Translation Weekly 74: Architectures we will hear about in MT

This week, I would like to feature three recent papers with innovations in neural architectures that I think might become important in MT and multilingual NLP during the next year. But of course, I might be wrong, in MT Weekly 27, I self-assuredly claimed that...

mt-weekly  en  ⏰ 2.9 min

Machine Translation Weekly 73: Non-autoregressive MT with Latent Codes

Today, I will comment on a paper on non-autoregressive machine translation that shows a neat trick for increasing output fluency. The title of the paper is Non-Autoregressive Translation by Learning Target Categorical Codes, it has authors from several Chinese private and public institutions, and it will appear...

mt-weekly  en  ⏰ 2.5 min

Machine Translation Weekly 72: Self-Training for Zero-Shot MT

This week, I will have a look at a pre-print that describes an unconventional setup for zero-shot machine translation. The title of the pre-print is Self-Learning for Zero-Shot Neural Machine Translation and was written by authors from the University of Trento. First of all, I...

mt-weekly  en  ⏰ 2.2 min

Machine Translation Weekly 71: Explaining Random Feature Attention

Transformers are the neural architecture that underlies most of the current state-of-the-art machine translation and natural language processing in general. One of its major drawbacks is the quadratic complexity of the underlying self-attention mechanism, which in practice limits the sequence length that could be processed...
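For context, here is vanilla scaled dot-product self-attention in a few lines of numpy; the explicit n×n score matrix is exactly the quadratic cost that random feature attention approximates away (this sketch shows the baseline, not the paper's method).

```python
import numpy as np

def self_attention(Q, K, V):
    """Vanilla scaled dot-product attention. The score matrix has shape
    (n, n), so time and memory grow quadratically with sequence length n —
    the bottleneck that random-feature approximations remove."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (n, n): the quadratic part
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 128, 64
Q = K = V = np.random.randn(n, d)
out = self_attention(Q, K, V)              # shape (n, d)
```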

mt-weekly  en  ⏰ 4.1 min

Machine Translation Weekly 70: Loss Masking instead of Data Filtering

This week, I will have a closer look at a recent pre-print introducing an alternative for parallel data filtering for machine translation training. The title of the pre-print is Gradient-guided Loss Masking for Neural Machine Translation and comes from CMU and Google. Training data cleanness...

mt-weekly  en  ⏰ 2.7 min

Machine Translation Weekly 69: One-Shot learning in MT

This week I will discuss a paper about the one-shot vocabulary learning abilities of machine translation models. The title of the paper is Continuous Learning in Neural Machine Translation using Bilingual Dictionaries, and it will be presented at EACL in May this year. A very similar idea...

mt-weekly  en  ⏰ 2.5 min

Machine Translation Weekly 68: Pre-editing of MT inputs

Today, I am going to comment on a paper that systematically explores something that probably many MT users do: pre-editing (editing the source sentence) to get a better output from an MT system that is treated as a black box. The title of the...

mt-weekly  en  ⏰ 1.8 min

Machine Translation Weekly 67: Where does the language neutrality of mBERT reside?

If someone told me ten years ago, when I was a freshly graduated bachelor of computer science, that there would be models producing multilingual sentence representations allowing zero-shot model transfer, I would have hardly believed such a prediction. If they added that the models...

mt-weekly  en  ⏰ 3.0 min

Machine Translation Weekly 66: Means against ends of sentences

This week I am going to revisit the mystery of decoding in neural machine translation for one more time. It has been more than a year ago when Felix Stahlberg and Bill Byrne discovered the very disturbing feature of neural machine translation models – that...

mt-weekly  en  ⏰ 2.0 min

Machine Translation Weekly 65: Sequence-to-sequence models and substitution ciphers

Today, I am going to talk about a recent pre-print on sequence-to-sequence models for deciphering substitution ciphers. Doing such a thing was somewhere at the bottom of my todo list for a few years, I suggested it as a thesis topic to several master students...

mt-weekly  en  ⏰ 2.0 min

Machine Translation Weekly 64: Non-autoregressive Models Strike Back

Half a year ago I featured here (MT Weekly 45) a paper that questions the contribution of non-autoregressive models to computational efficiency. It showed that a model with a deep encoder (that can be parallelized) and a shallow decoder (that works sequentially) reaches the same...

mt-weekly  en  ⏰ 3.7 min

Machine Translation Weekly 63: Maximum Aposteriori vs. Minimum Bayes Risk decoding

This week I will have a look at the best paper from this year’s COLING that brings an interesting view on inference in NMT models. The title of the paper is “Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine...

mt-weekly  en  ⏰ 2.5 min

Machine Translation Weekly 62: The EDITOR

Papers about new models for sequence-to-sequence modeling have always been my favorite genre. This week I will talk about a model called EDITOR that was introduced in a pre-print of a paper that will appear in the TACL journal with authors from the University of...

mt-weekly  en  ⏰ 2.8 min

Machine Translation Weekly 61: Decoding and diversity

This week I will comment on a short paper from Carnegie Mellon University and Amazon that shows a simple analysis of the diversity of machine translation outputs. The title of the paper is Decoding and Diversity in Machine Translation and it will be presented at...

mt-weekly  en  ⏰ 1.9 min

Machine Translation Weekly 60: Notes about WMT 2020 Shared Tasks

This week, I will follow up on last week’s post and comment on the news from this year’s WMT, which was collocated with EMNLP. As every year, there were many shared tasks on various types of translation and evaluation of machine translation. News translation task...

mt-weekly  en  ⏰ 2.3 min

Machine Translation Weekly 59: Notes from EMNLP 2020

Another large NLP conference that had to take place in a virtual environment, EMNLP 2020, is over, and here are my notes from the conference. ACL in the summer had most Q&A sessions on Zoom, which meant most of the authors waiting forever...

mt-weekly  en  ⏰ 7.1 min

Machine Translation Weekly 58: Poisoning machine translation

Today, I am going to talk about a topic that is rather unknown to me: the safety and vulnerability of machine translation. I will comment on a paper Targeted Poisoning Attacks on Black-Box Neural Machine Translation by authors from the University of Melbourne and Facebook...

mt-weekly  en  ⏰ 2.5 min

Machine Translation Weekly 57: Document-level MT with Context Masking

This week, I am going to discuss the paper “Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation” by authors from Alibaba Group. The preprint of the paper appeared a month ago on arXiv and will be presented at this...

mt-weekly  en  ⏰ 2.4 min

Machine Translation Weekly 56: Beam Search and Models' Surprisal

Last year an EMNLP paper “On NMT Search Errors and Model Errors: Cat Got Your Tongue?” (that I discussed in MT Weekly 20) showed a mindblowing property of neural machine translation models that the most probable target sentence is not necessarily the best target sentence....

mt-weekly  en  ⏰ 3.0 min

Machine Translation Weekly 55: Social Polarization Seen through Word Embeddings

This week, I am going to have a closer look at a paper that creatively uses methods for bilingual word embeddings for social media analysis. The paper’s preprint was uploaded last week on arXiv. The title is “We Don’t Speak the Same Language: Interpreting Polarization...

mt-weekly  en  ⏰ 2.1 min

Neuronové sítě a strojový překlad

The article originally appeared in last year’s December issue of the journal Rozhledy matematicko-fyzikální. What is machine translation? Under machine translation, most people probably imagine Google Translate, and most people have probably also tried how it works. Those who use the translator more often may have noticed that roughly...

cs  popularization  ⏰ 10.9 min

Machine Translation Weekly 54: Nearest Neighbor MT

This week, I will discuss Nearest Neighbor Machine Translation, a paper from this year’s ICML that takes advantage of overlooked representation learning capabilities of machine translation models. This paper’s idea is pretty simple and is basically the same as in the previous work on nearest...

mt-weekly  en  ⏰ 1.8 min

Machine Translation Weekly 53: Code-Switching Pre-training for NMT

After a short break, MT weekly is again here, and today I will talk about a paper “CSP: Code-Switching Pre-training for Neural Machine Translation” that will appear at this year’s virtual EMNLP. The paper proposes a new and surprisingly elegant way of monolingual pre-training for...

mt-weekly  en  ⏰ 2.1 min

Machine Translation Weekly 52: Human Parity in Machine Translation

This week I am going to have a look at a paper by my former colleagues from Prague “Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals” that was published in Nature Communications. The paper systematically studies machine translation...

mt-weekly  en  ⏰ 2.6 min

Machine Translation Weekly 51: Machine Translation without Embeddings

Over the few years that neural models have been the state of the art in machine translation, the architectures have become quite standardized. There is a vocabulary of several thousand discrete input/output units. As the first step, the inputs are represented by static embeddings which get encoded...

mt-weekly  en  ⏰ 2.7 min

Machine Translation Weekly 50: Language-Agnostic Multilingual Representations

Pre-trained multilingual representations promise to make the current best NLP model available even for low-resource languages. With a truly language-neutral pre-trained multilingual representation, we could train a task-specific model for English (or another language with available training data) and such a model would work for...

mt-weekly  en  ⏰ 3.4 min

Machine Translation Weekly 49: Paraphrasing using multilingual MT

It is a well-known fact that when you have a hammer, everything looks like a nail. It is a less-known fact that when you have a sequence-to-sequence model, everything looks like machine translation. One example of this thinking is the paper Paraphrase Generation as Zero-Shot...

mt-weekly  en  ⏰ 2.2 min

Machine Translation Weekly 48: MARGE

This week, I will comment on a recent pre-print by Facebook AI titled Pre-training via Paraphrasing. The paper introduces a model called MARGE (indeed, they want to say it belongs to the same family as BART by Facebook) that uses a clever way of denoising...

mt-weekly  en  ⏰ 3.2 min

Machine Translation Weekly 47: Notes from the ACL

In this extremely long post, I will not focus on one paper as I usually do, but instead will show my brief, but still infinitely long notes from this year’s ACL. Many people already commented on the virtual format of the conference. I will spare...

mt-weekly  en  ⏰ 9.5 min

Machine Translation Weekly 46: The News GPT-3 has for Machine Translation

Back in 2013, a friend of mine enthusiastically told me how excited he was about deep learning democratizing AI (and, by the way, saying it was not relevant for NLP at all): there was no need for large CPU clusters, all you needed was buying a gaming...

mt-weekly  en  ⏰ 3.7 min

Machine Translation Weekly 45: Deep Encoder, Shallow Decoder, and the Fall of Non-autoregressive models

Researchers concerned with machine translation speed invented several methods that are supposed to significantly speed up the translation while maintaining as much as possible of the translation quality of the state-of-the-art models. The methods are usually based on generating as many words as possible in...

mt-weekly  en  ⏰ 2.7 min

Machine Translation Weekly 44: Tangled up in BLEU (and not blue)

For quite a while, machine translation has been approached as a behaviorist simulation. Don’t you know what a good translation is? It does not matter, you can just simulate what humans do. Don’t you know how to measure if something is a good translation? It does...

mt-weekly  en  ⏰ 2.9 min

Machine Translation Weekly 43: Dynamic Programming Encoding

One of the narratives people (including me) love to associate with neural machine translation is that we got rid of all linguistic assumptions about the text and let the neural networks learn their own way, independent of what people think about language. It sounds cool,...

mt-weekly  en  ⏰ 2.9 min

Machine Translation Weekly 42: Unsupervised Multimodal Machine Translation

Several weeks ago, I discussed a paper that showed how parallel data between two languages can be used to improve unsupervised translation between one of the two languages and a third one. This week, I will have a look at a similar idea applied in...

mt-weekly  en  ⏰ 2.7 min

Machine Translation Weekly 41: Translating Fast and Slow

Recently, I came across a paper that announces a dataset release. The dataset is called PuzzLing and collects translation puzzles from the international linguistic olympiad. In machine translation jargon, I would say it provides extremely small training data to learn how to translate unknown languages....

mt-weekly  en  ⏰ 2.9 min

Machine Translation Weekly 40: Getting Massively Multilingual Again

More than half a year ago (in MT Weekly 10), I discussed massively multilingual models by Google. They managed to train two large models: one translating from 102 languages into English, the other one from English into 102 languages. This approach seemed to help a...

mt-weekly  en  ⏰ 1.9 min

O datovém kolonializmu

English version of the post On this blog, I usually comment on research papers about machine translation that tend to be no longer than ten pages. This time I will make an exception and write about a book that is several hundred pages long. It deals with some important societal issues that...

cs  ⏰ 8.5 min

On Data Colonialism

Czech version of the post On this blog, I usually review papers that are around 10 pages long. This time, I am going to write about a book that is several hundred pages long and discusses important issues that I believe people dealing with AI should...

en  ⏰ 9.6 min

Machine Translation Weekly 39: Formal Hierarchy of Recurrent Architectures

Before the Transformer architecture was invented, recurrent networks were the most prominent architectures used in machine translation and the rest of natural language processing. It is quite surprising how little we still know about the architectures from the theoretical perspective. People often repeat a claim...

mt-weekly  en  ⏰ 2.9 min

Machine Translation Weekly 38: Taking Care about Reference Sentences

In the past week, there were quite a lot of papers on machine translation on arXiv, at least a few of them every day. Let me have a look at one that tackles an important topic – machine translation evaluation – from a quite unusual...

mt-weekly  en  ⏰ 3.0 min

Machine Translation Weekly 37: Backtranslation and Domain Adaptation

It is sometimes fascinating to observe how each step of training neural machine translation systems gets one by one picked up by the research community, analyzed to the tiniest detail and turned into a complex recipe. Data augmentation by back-translation used to be a pretty...

mt-weekly  en  ⏰ 2.1 min

Machine Translation Weekly 36: Sign Language Translation

This week, I am going to have a look at a topic that I was not thinking about much before: sign language translation. I will comment on a bachelor thesis from Ecole Polytechnique in Paris that was uploaded to arXiv earlier this week. It was...

mt-weekly  en  ⏰ 2.6 min

Machine Translation Weekly 35: Word Translation of Transformer Layers

I always rationalized the encoder-decoder architecture as a conditional language model. The decoder is the language model, the component that “knows” the target language (whatever it means) and uses the encoder as an external memory, so it does not forget what it was talking about....

mt-weekly  en  ⏰ 2.2 min

Machine Translation Weekly 34: Echo State Neural Machine Translation

This week I am going to write a few notes on the paper Echo State Neural Machine Translation by Google Research from a few weeks ago. Echo state networks are a rather weird idea: initialize the parameters of a recurrent neural network randomly, keep them fixed and...
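As a toy illustration of the echo-state idea (my own numpy sketch, not the paper's setup): the recurrent weights are random and frozen, and only a linear readout on top of the collected states would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_state = 16, 256

# Random, *frozen* recurrent weights -- never updated during training.
W_in = rng.normal(scale=0.5, size=(d_state, d_in))
W_rec = rng.normal(size=(d_state, d_state))
# Rescale so the spectral radius is below 1 (the "echo state property").
W_rec *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_rec)))

def reservoir_states(inputs):
    """Run the fixed random RNN and collect its hidden states."""
    h = np.zeros(d_state)
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W_rec @ h)
        states.append(h)
    return np.stack(states)

# Only a linear readout on top of these states would be trained,
# e.g., by ridge regression against the desired outputs.
X = rng.normal(size=(10, d_in))
H = reservoir_states(X)      # (10, d_state), frozen features
```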

mt-weekly  en  ⏰ 1.6 min

Machine Translation Weekly 33: Document-level translation via self-fine-tuning

This week, I will have a look at a recent pre-print that presents an interesting method for document-level machine translation that is quite different from all previous ones. The title of the paper is Capturing document context inside sentence-level neural machine translation models with self-training and...

mt-weekly  en  ⏰ 2.0 min

Machine Translation Weekly 32: BERT in Machine Translation

I am pretty sure everyone has tried to use BERT as a machine translation encoder, and whoever says otherwise keeps trying. Representations from BERT brought improvements in most natural language processing tasks, so why would machine translation be an exception? Well, because it is not that easy....

mt-weekly  en  ⏰ 2.5 min

Machine Translation Weekly 31: Fixing Transformer's Heads

This week, I am going to comment on a paper that appeared on arXiv on Tuesday and raised quite a lot of interest on Twitter. The title of the paper is Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation and it describes work that has...

mt-weekly  en  ⏰ 4.1 min

Machine Translation Weekly 30: A Multilingual View of Unsupervised Machine Translation

This week, for the third time in recent weeks, I am going to review a paper that primarily focuses on unsupervised machine translation. The title of the paper is A Multilingual View on Unsupervised Machine Translation and it describes again work done...

mt-weekly  en  ⏰ 5.7 min

Machine Translation Weekly 29: Encode, Tag, Realize - sequence transformation by learned edit operations

This week I will have a look at a paper from last year’s EMNLP that introduces a relatively simple architecture for sequence generation when the target sequence is very similar to the source sequence. The title of the paper is “Encode, Tag, Realize: High-Precision Text...

mt-weekly  en  ⏰ 4.0 min

Machine Translation Weekly 28: mBART – Multilingual Pretraining of Sequence-to-sequence Models

The trend of model pre-training and task-specific fine-tuning has finally fully hit machine translation as well. After being used for some time for unsupervised machine translation training, at the end of January, Facebook published mBART, a pre-trained sequence-to-sequence model for 25 languages at the same time....

mt-weekly  en  ⏰ 4.0 min

Machine Translation Weekly 27: Explaining the Reformer

This week I will review a paper that is not primarily about machine translation but about a neural architecture that can make a big impact on machine translation and natural language processing in general. This post is about Google’s Reformer, a neural architecture that is...

mt-weekly  en  ⏰ 8.9 min

Machine Translation Weekly 26: Unsupervised Machine Translation

One of the hottest topics in machine translation and one of the topics I ignored so far is unsupervised machine translation, i.e., machine translation trained without the use of any parallel data. I will go through a seven-month-old paper published at this year’s ACL titled...

mt-weekly  en  ⏰ 6.5 min

Machine Translation Weekly 25: Weaknesses of Reinforcement Learning for NMT

Back in 2016, one of the trendy topics was reinforcement learning and other forms of optimizing NMT directly towards some more relevant metrics rather than using cross-entropy of the conditional word distributions. Standard machine translation models are trained to maximize single-word conditional distribution, which is...

mt-weekly  en  ⏰ 2.8 min

Machine Translation Weekly 24: Cross-Lingual Ability of Multilingual BERT

After the Christmas holidays, I will once again have a look at multilingual BERT. I already discussed multilingual BERT on this blog once when I reviewed a paper that explored some cross-lingual and multilingual properties of multilingual BERT. This week’s paper does more in-depth experiments...

mt-weekly  en  ⏰ 3.3 min

Machine Translation Weekly 23: Word Sense Disambiguation in Neural Machine Translation

This week, I would like to give some thoughts about word senses and representation contextualization in machine translation. I will start by explaining why I think the current way of writing about word senses in NLP is kind of misleading and why I think we...

mt-weekly  en  ⏰ 4.3 min

Machine Translation Weekly 22: Understanding Knowledge Distillation in Non-Autoregressive Machine Translation

Last week, I discussed a paper claiming that forward-translation might be a better data augmentation technique than back-translation. This week, I will follow with a paper that touches a similar topic, but in a slightly different context. The title of the paper is Understanding Knowledge Distillation in Non-Autoregressive Machine Translation and was...

mt-weekly  en  ⏰ 3.8 min

Machine Translation Weekly 21: On Translationese and Back-Translation

Does WMT speak translationese? And who else speaks translationese? Is the success of back-translation fake news? These are the questions implicitly asked by the authors of a paper called Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation that was uploaded on arXiv earlier...

mt-weekly  en  ⏰ 3.4 min

Machine Translation Weekly 20: Search and Model Errors in Neural Machine Translation

This week, I will have a look at a paper from this year’s EMNLP that got a lot of attention on Twitter this week. If there was an award for the most disturbing machine translation paper, this would be a good candidate. The title of...

mt-weekly  en  ⏰ 4.1 min

Machine Translation Weekly 19: Domain Robustness

This week, I will briefly have a look at a paper that discusses another major problem of current machine translation which is domain robustness. The problem is very well analyzed in a paper from the University of Zurich called Domain Robustness in Neural Machine Translation...

mt-weekly  en  ⏰ 2.4 min

Machine Translation Weekly 18: BPE Dropout

Everyone who followed natural language processing on Twitter last week must have noticed a paper called BPE-Dropout: Simple and Effective Subword Regularization that introduces a simple way of adding stochastic noise into text segmentation to increase model robustness. It sounds complicated, but it is fairly easy. As...
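The core trick really is easy: during training, each applicable BPE merge is skipped with some probability, so the same word receives different segmentations across batches. A simplified sketch of the idea (the merge table and word are toy examples; the real implementation works over the full learned merge table):

```python
import random

def bpe_dropout_segment(word, merges, p_drop=0.1):
    """Greedy BPE segmentation where each applicable merge is randomly
    skipped with probability p_drop, yielding stochastic segmentations."""
    symbols = list(word)
    while True:
        # Collect applicable merges (by priority), each surviving the drop.
        candidates = [(merges[(a, b)], i)
                      for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
                      if (a, b) in merges and random.random() > p_drop]
        if not candidates:
            return symbols
        _, i = min(candidates)  # apply the highest-priority surviving merge
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]

merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
print(bpe_dropout_segment("lower", merges, p_drop=0.3))
# With p_drop=0.0 this always yields ['low', 'er']; with dropout it
# sometimes falls back to finer segmentations such as ['lo', 'w', 'e', 'r'].
```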

mt-weekly  en  ⏰ 2.7 min

Machine Translation Weekly 17: When is Document-Level Context Useful?

One of the biggest limitations of current machine translation systems is they only work with isolated sentences. The systems need to guess when it comes to phenomena that cross the (often rather arbitrary) sentence boundaries. The typical example that is mentioned everywhere is the translation...

mt-weekly  en  ⏰ 4.0 min

Machine Translation Weekly 16: Hybrid character-level and word-level machine translation

One of the topics I am currently dealing with in my research is character-level modeling for neural machine translation. Therefore, I was glad to see a paper that appeared on arXiv last week called On the Importance of Word Boundaries in Character-level Neural Machine Translation that shows an interesting...

mt-weekly  en  ⏰ 5.3 min

Machine Translation Weekly 15: How Multilingual is Multilingual BERT?

This week, I will slightly depart from machine translation and have a look at a paper How Multilingual is Multilingual BERT by Google Research. BERT, the Sesame Street muppet that recently colonized the whole area of natural language processing is a model trained to predict...

mt-weekly  en  ⏰ 4.0 min

Machine Translation Weekly 14: Modeling Confidence in Sequence-to-Sequence Models

Neural machine translation is based on machine learning—we collect training data, pairs of parallel sentences which we hope represent how language is used in the two languages, and train models using the data. When the model is trained, the more the input resembles the sentences...

mt-weekly  en  ⏰ 2.8 min

Machine Translation Weekly 13: Long Distance Dependencies

Let us follow up on the gender paper and have a look at other cases where machine translation does not work as well as we would like it to work. This time, we will have a look at a paper that talks about grammatically complex...

mt-weekly  en  ⏰ 3.4 min

Machine Translation Weekly 12: Memory-Augmented Networks

Five years ago when deep learning slowly started to be cool, there was a paper called Neural Turing Machines (which are not really Turing machines, but at least they are neural in a narrow technical sense). The paper left me with a foolishly naive impression...

mt-weekly  en  ⏰ 3.0 min

Machine Translation Weekly 11: Gender and Machine Translation

It’s time to talk about gender—why things go wrong with gender in machine translation and what people do about it. Some languages have gendered nouns (German), some have gendered almost everything (Czech, French), and some gender only a few pronouns (English). Let’s say you want to translate...

mt-weekly  en  ⏰ 2.9 min

Machine Translation Weekly 10: Massively Multilingual Neural Machine Translation

The holiday period is over and I almost settled in my new place of operation which is the Ludwig-Maximilian University of Munich, and now there is nothing that can prevent me from continuing with weekly reports on what is new in the world of machine...

mt-weekly  en  ⏰ 2.7 min

Machine Translation Weekly 9: Shared Task on Machine Translation Robustness

Machine translation is typically trained on bilingual data that can be found on the Internet. It mostly comes from international government and non-government organizations, commercial web presentations, books, movie subtitles, etc. Therefore, most of the text is quite formal and almost without typos and certainly...

mt-weekly  en  ⏰ 3.0 min

Machine Translation Weekly 8: A Generalized Framework of Sequence Generation

This week’s post contains more math than usual. I will talk about a paper that unifies several decoding algorithms in MT using one simple equation. The paper is called A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models, it comes from New...

mt-weekly  en  ⏰ 4.8 min

Machine Translation Weekly 7: Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

Remember two years ago when all tech-related servers enthusiastically reported that a translator by Google created its own language? These reports were based on a paper that was published in TACL in summer 2017, after its pre-print had been available on arXiv since November 2016. The paper...

mt-weekly  en  ⏰ 3.6 min

Machine Translation Weekly 6: Probing the Need for Visual Context in Multimodal Machine Translation

This week, we will have a look at a paper that won the best short paper award at NAACL 2019. The name of the paper is Probing the Need for Visual Context in Multimodal Machine Translation and it was written by friends of mine from...

mt-weekly  en  ⏰ 2.5 min

Machine Translation Weekly 5: Revisiting Low-Resource Neural Machine Translation

Let’s continue with pre-prints of papers which are going to appear at ACL this year and have a look at another paper that comes from the University of Edinburgh, titled Revisiting Low-Resource Neural Machine Translation: A Case Study. This paper is a reaction to an...

mt-weekly  en  ⏰ 2.2 min

Machine Translation Weekly 4: Analyzing Multi-Head Self-Attention

With the ACL camera-ready deadline slowly approaching, future ACL papers start to pop up on arXiv. One of those which went public just a few days ago is a paper called Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned...

mt-weekly  en  ⏰ 4.2 min

Machine Translation Weekly 3: Constant-Time Machine Translation with Conditional Masked Language Models

This week, we will have a look at a brand-new method for non-autoregressive machine translation published a few weeks ago on arXiv by Facebook AI Research, two days before the anonymity period for the EMNLP conference. Most models for neural machine translation work autoregressively. When...

mt-weekly  en  ⏰ 3.0 min

Machine Translation Weekly 2: BERTScore

Last week, there was a paper on arXiv that introduces a method for MT evaluation using BERT sentence representations. The metric seems to be the new state of the art in MT evaluation. Its name is BERTScore and it was developed at Cornell University. MT evaluation...

mt-weekly  en  ⏰ 2.9 min

Machine Translation Weekly 1: Bidirectional Decoding

This is the first post from a series in which I will try to come up with summaries of some of the latest papers and other news on machine translation. The main goal of this exercise is to force myself to read new papers regularly...

mt-weekly  en  ⏰ 3.1 min

Tak trochu fake news o umělé inteligenci

English version of the post When I read in the media about technologies that use machine learning, or artificial intelligence as it is now called, I am often surprised at how the news stories are not only oversimplified but, above all, misleading. It certainly happens in all fields, but getting outraged...

popularization  cs  ⏰ 5.4 min

Kind of fake news on artificial intelligence

Czech version of the post While reading news stories on research or products involving deep learning, I am often surprised how inaccurate and misleading the news stories are. It is probably a problem of almost all expert fields that happen to appear in the media; luckily, they do...

popularization  en  ⏰ 5.9 min

Nesnesitelná soutěživost umělých inteligentů

English version of the post Since I was little, I thought biathlon was a weird sport. It puzzled me how anyone came up with the idea of competing in things as different as cross-country skiing and shooting. A slightly bigger surprise came when I learned that the modern pentathlon exists. Thanks to...

popularization  cs  ⏰ 9.1 min

Further, faster, stronger, dear AI

Czech version of the post Since I was a little boy, I was astonished by how weird a sport biathlon is. I couldn’t imagine how someone could possibly invent a combination of cross-country skiing and shooting. It blew my mind when I found out there is an even weirder combination...

popularization  en  ⏰ 6.8 min

Computational Linguistics in the 21st century – a private manifesto of its perpetual student

Czech version of the post In this essay, I would like to sum up my opinions on what the role of computational linguistics is, why people should concern themselves with it, what I believe its current problems are, and most importantly why it is a very exciting field...

popularization  en  ⏰ 8.2 min

Počítačová lingvistika 21. století – soukromý manifest jejího věčného studenta

English version of the post In this post, I will try to give a biased and engaged summary of what computational (or, if you like, mathematical) linguistics is, what its current problems are, and why it is nevertheless a fascinating field worth pursuing. Computational linguistics is something between...

popularization  cs  ⏰ 7.3 min

Deep learning a psaní I/Y

English version of the post In recent years, deep learning (machine learning with neural networks) has become a buzzword of the technological world. We can read articles about what artificial intelligence can do (the worn-out term artificial intelligence is readily replaced by machine intelligence) – how it can solve machine translation,...

popularization  cs  ⏰ 10.1 min

Spell checking of y and i in Czech using deep learning

Czech version of the post In recent years, deep learning (machine learning with neural networks) became a frequently used buzzword in the technological world. We can find plenty of articles on how machine intelligence (a new, probably sexier term for artificial intelligence) can solve machine translation,...

popularization  en  ⏰ 9.9 min

What is Neural Machine Translation Capable of?

Czech version of the post A year ago at EMNLP in Lisbon, I saw a paper called On Statistical Machine Translation and Translation Theory by Christian Hardmeier. He was standing in front of his poster and almost apologized to everybody who passed by that the...

popularization  en  ⏰ 4.7 min

Co dovede neuronový překlad?

English version of the post A year ago at the EMNLP conference in Lisbon, I spotted a paper called On Statistical Machine Translation and Translation Theory by Christian Hardmeier. He stood in front of his poster and to everyone who stopped by his poster...

popularization  cs  ⏰ 4.2 min

subscribe via RSS