Originally, Blink used fallback fonts if font loading did not finish within 3 seconds, but this timeout was not defined in any specification. Blink now changes the timeout adaptively to improve performance on slow connections. If the font-display feature is enabled, this change happens only when 'auto' is specified, to follow the font-display spec; otherwise, it always happens. The definition of a slow connection may be changed during the field trial.
Specification: http://tabatkins.github.io/specs/css-font-display/
Status: Specification being incubated in a Community Group
Status in Chromium
Blink components: Blink>Fonts
Implementation status: Browser Intervention (tracking bug: http://crbug.com/578029)
Estimated milestones:
- Desktop: Shipping in 49
- Android: Shipping in 49
- iOS: Shipping in 49
Consensus & Standardization
After a feature ships in Chrome, the values listed here are not guaranteed to be up to date.
- Firefox: No signal
- Safari: No signal
- Web Developers: No signals
Owners
toyoshim@chromium.org, kenjibaheux@chromium.org
Comments
In Chrome 49, the feature was enabled as a user-agent intervention, triggered on 2G cellular connections. In Chrome 53, the network quality estimator began to trigger the intervention on effectively slow networks, and this estimator-based intervention was enabled by default in Chrome 59. Developer tools show a warning message whenever the intervention is triggered for a font. Web site owners can opt out of the intervention by setting CSS font-display to a value other than 'auto'.
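As a rough sketch of the decision logic described above (the timeout values and connection categories here are hypothetical illustrations, not Blink's actual constants or implementation):

```python
# Hedged sketch of the intervention's decision logic. Values are
# hypothetical; Blink's real implementation is in C++ and differs.
def font_fallback_timeout(effective_connection, font_display):
    # The intervention applies only when font-display is 'auto'
    # (or unspecified); any other value opts the font out.
    if font_display not in (None, "auto"):
        return 3.0  # the historical 3-second default
    # On effectively slow networks, shorten the timeout so text
    # renders in a fallback font sooner.
    if effective_connection in ("slow-2g", "2g"):
        return 0.0  # show the fallback immediately (hypothetical)
    return 3.0

print(font_fallback_timeout("2g", "auto"))  # 0.0 -> fallback right away
print(font_fallback_timeout("2g", "swap"))  # 3.0 -> intervention opted out
```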
Search tags
Font, Fonts, WebFont, WebFonts, Intervention, User Agent Intervention
Category: Network / Connectivity
Feature type: New feature incubation
Last updated on 2022-10-29
Web technology for developers
The open Web presents incredible opportunities for developers. To take full advantage of these technologies, you need to know how to use them. Below you’ll find links to our Web technology documentation.
- Documentation for Web developers: The Web Developer Guide provides useful how-to content to help you actually use Web technologies to do what you want or need to do.
- Tutorials for Web developers: Tutorials to take you step-by-step through learning HTML, CSS, JavaScript, and Web APIs.
- Accessibility: Enabling as many people as possible to use websites, even when those people's abilities are limited in some way.
- Performance: Making content as available and interactive as possible, as soon as possible.
- Security: Protecting users from data leaks and data theft, side-channel attacks, and attacks such as cross-site scripting, content injection, and click-jacking.
Web technology references
- Web APIs: JavaScript programming APIs you can use to build apps on the Web.
- HTML: HTML provides the fundamental building blocks for structuring Web documents and apps.
- CSS: Cascading Style Sheets are used to describe the appearance of Web documents and apps.
- JavaScript: JavaScript is the Web's native programming language.
- WebAssembly: WebAssembly allows programs written in C, C++, Rust, Swift, C#, Go, and more to run on the Web.
- Events: Events are what you build Web apps to react to; for example, when a Web page finishes loading, or a user selects something, presses a key, resizes a window, submits a form, or pauses a video.
- HTTP: HTTP is the fundamental Internet protocol for fetching documents, stylesheets, scripts, images, videos, fonts, and other resources over the Web, and for sending data back to Web servers.
- Media: Formats, codecs, protocols, APIs, and techniques for embedding and streaming video, audio, and image content in Web documents and apps.
- SVG: Scalable Vector Graphics lets you create images that scale smoothly to any size.
- MathML: MathML lets you display complex mathematical notation on the Web.
- Web Components: Web Components are custom elements that you can define and reuse in your Web apps.
- WebDriver: WebDriver is a browser-automation mechanism for remotely controlling a browser by emulating the actions of a real person using the browser. It's widely used for cross-browser testing of Web apps.
- Web Extensions: Web Extensions are a way for you to give users enhanced capabilities in their browsers, for doing things such as blocking ads and other content, customizing the appearance of pages, and more.
- Web App Manifests: Web App Manifests let you enable users to install Web apps to their device home screens, with aspects such as portrait/landscape screen orientation and display mode (e.g., full screen) pre-set.
- Progressive Web Apps (PWAs): Progressive Web Apps provide a user experience similar to native mobile apps.
Developer tools documentation
- Firefox Developer Tools: Documentation for the set of web-developer tools built into Firefox.
- Chrome DevTools: Documentation for the set of web-developer tools built into Chrome.
- Safari Web Inspector: Documentation for the set of web-developer tools built into Safari.
- Edge DevTools: Documentation for the set of web-developer tools built into Edge.
Training efficient neural network models for Firefox Translations
By Evgeny Pavlov
Posted on June 7, 2022 in Featured Article, Firefox, and Machine Translation
Machine Translation is an important tool for expanding the accessibility of web content. Usually, people use cloud providers to translate web pages. State-of-the-art Neural Machine Translation (NMT) models are large and often require specialized hardware like GPUs to run inference in real-time.
If people were able to run a compact Machine Translation (MT) model on their local machine's CPU without sacrificing translation accuracy, it would help to preserve privacy and reduce costs.
The Bergamot project is a collaboration between Mozilla, the University of Edinburgh, Charles University in Prague, the University of Sheffield, and the University of Tartu, with funding from the European Union's Horizon 2020 research and innovation programme. It brings MT to the local environment, providing small, high-quality, CPU-optimized NMT models. The Firefox Translations web extension builds on the output of the Bergamot project to bring local translations to Firefox.
In this article, we will discuss the components used to train our efficient NMT models. The project is open-source, so you can give it a try and train your model too!
Architecture
NMT models are trained as language pairs, translating from language A to language B. The training pipeline was designed to train translation models for a language pair end-to-end, from environment configuration to exporting the ready-to-use models. The pipeline run is completely reproducible given the same code, hardware and configuration files.
The complexity of the pipeline comes from the requirement to produce an efficient model. We use Teacher-Student distillation to compress a high-quality but resource-intensive teacher model into an efficient CPU-optimized student model that still has good translation quality. We explain this further in the Compression section.
The pipeline includes many steps: compiling components, downloading and cleaning datasets, training teacher, student, and backward models, decoding, quantization, evaluation, etc. (more details below). The pipeline can be represented as a Directed Acyclic Graph (DAG).
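As a toy illustration, the dependency structure can be sketched in Python and executed in topological order; the step names below are simplified placeholders, not the pipeline's real targets:

```python
# Toy sketch of the pipeline as a DAG. Step names are simplified
# placeholders, not the real pipeline targets.
from graphlib import TopologicalSorter  # Python 3.9+

steps = {
    "clean_data": {"download_data"},
    "train_backward": {"clean_data"},
    "augment_data": {"train_backward"},          # back-translation
    "train_teacher": {"augment_data"},
    "decode_distillation_data": {"train_teacher"},
    "train_student": {"decode_distillation_data"},
    "quantize": {"train_student"},
    "evaluate": {"quantize"},
    "export": {"evaluate"},
}

# Run the steps in an order that respects the dependencies.
for step in TopologicalSorter(steps).static_order():
    print("running:", step)
```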

The workflow is file-based and employs self-sufficient scripts that use data on disk as input, and write intermediate and output results back to disk.
We use the Marian Neural Machine Translation engine. It is written in C++ and designed to be fast. The engine is open-sourced and used by many universities and companies, including Microsoft.
Training a quality model
The first task of the pipeline is to train a high-quality model that will be compressed later. The main challenge at this stage is to find a good parallel corpus that contains translations of the same sentences in both source and target languages and then apply appropriate cleaning procedures.
Datasets
It turns out there are many open-source parallel datasets for machine translation available on the internet. The most interesting project that aggregates such datasets is OPUS. The Annual Conference on Machine Translation also collects and distributes some datasets for competitions, for example, WMT21 Machine Translation of News. Another great source of MT corpora is the Paracrawl project.
[Screenshot: the OPUS dataset search interface]
It is possible to use any dataset on disk, but automating dataset downloads from open-source resources makes adding new language pairs easy, and whenever a dataset is expanded we can easily retrain the model to take advantage of the additional data. Make sure to check the licenses of the open-source datasets before using them.
Data cleaning
Most open-source datasets are somewhat noisy. Good examples are crawled websites and translations of subtitles. Texts from websites can be poor-quality automatic translations or contain unexpected HTML, and subtitles are often free-form translations that change the meaning of the text.
It is well known in the world of Machine Learning (ML) that if we feed garbage into the model, we get garbage as a result. Dataset cleaning is probably the most crucial step in the pipeline for achieving good quality.
We employ some basic cleaning techniques that work for most datasets, like removing sentences that are too short or too long and filtering ones with an unrealistic source-to-target length ratio. We also use bicleaner, a pre-trained ML classifier that attempts to indicate whether the training example in a dataset is a reversible translation. We can then remove low-scoring translation pairs that may be incorrect or otherwise add unwanted noise.
Automation is necessary when your training set is large. However, it is always recommended to look at your data manually in order to tune the cleaning thresholds and add dataset-specific fixes to get the best quality.
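A minimal sketch of the basic length filters mentioned above; the thresholds are arbitrary examples, and the real pipeline, including the bicleaner step, is more involved:

```python
# Minimal sketch of basic parallel-corpus cleaning: length limits and a
# source/target length-ratio filter. Thresholds are arbitrary examples.
def keep_pair(src: str, trg: str,
              min_words: int = 1, max_words: int = 100,
              max_ratio: float = 2.5) -> bool:
    src_len, trg_len = len(src.split()), len(trg.split())
    if not (min_words <= src_len <= max_words):
        return False
    if not (min_words <= trg_len <= max_words):
        return False
    # Drop pairs with an unrealistic source-to-target length ratio.
    ratio = max(src_len, trg_len) / max(1, min(src_len, trg_len))
    return ratio <= max_ratio

pairs = [("Hello world", "Olá mundo"),
         ("Hello", "Uma tradução suspeitamente longa desta frase curta")]
clean = [p for p in pairs if keep_pair(*p)]
print(clean)  # only the first pair survives the ratio filter
```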
Data augmentation
There are more than 7000 languages spoken in the world and most of them are classified as low-resource for our purposes, meaning there is little parallel corpus data available for training. In these cases, we use a popular data augmentation strategy called back-translation.
Back-translation is a technique to increase the amount of training data available by adding synthetic translations. We get these synthetic examples by training a translation model from the target language to the source language. Then we use it to translate monolingual data from the target language into the source language, creating synthetic examples that are added to the training data for the model we actually want, from the source language to the target language.
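Schematically, back-translation looks like the sketch below, where backward_translate is a hypothetical stand-in for a real target-to-source model (in practice, a Marian decoder):

```python
# Schematic sketch of back-translation. `backward_translate` stands in
# for a real target->source model; it is hypothetical.
def backward_translate(target_sentence: str) -> str:
    ...  # decode target-language text into the source language

def back_translate(monolingual_target: list[str]) -> list[tuple[str, str]]:
    synthetic_pairs = []
    for trg in monolingual_target:
        src = backward_translate(trg)       # synthetic source side
        synthetic_pairs.append((src, trg))  # authentic target side
    return synthetic_pairs

# The synthetic pairs are then mixed into the training data for the
# forward (source -> target) model we actually want.
```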
The model
Finally, when we have a clean parallel corpus, we train a big transformer model to reach the best quality we can.
Once the model converges on the augmented dataset, we fine-tune it on the original parallel corpus that doesn’t include synthetic examples from back-translation to further improve quality.
Compression
The trained model can be 800Mb or more in size depending on configuration and requires significant computing power to perform translation (decoding). At this point, it’s generally executed on GPUs and not practical to run on most consumer laptops. In the next steps we will prepare a model that works efficiently on consumer CPUs.
Knowledge distillation
The main technique we use for compression is Teacher-Student Knowledge Distillation. The idea is to decode a lot of text from the source language into the target language using the heavy model we trained (Teacher) and then train a much smaller model with fewer parameters (Student) on these synthetic translations. The student is supposed to imitate the teacher’s behavior and demonstrate similar translation quality despite being significantly faster and more compact.
We also augment the parallel corpus data with monolingual data in the source language for decoding. This improves the student by providing additional training examples of the teacher’s behavior.
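In outline, the distillation data generation looks like the sketch below; teacher_translate and train_student are hypothetical stand-ins for the real Marian invocations:

```python
# Outline of Teacher-Student distillation data generation. The helper
# functions are hypothetical stand-ins for the real Marian tooling.
def teacher_translate(source_sentences):
    ...  # decode with the big teacher model (possibly an ensemble)

def train_student(pairs):
    ...  # train the small CPU-friendly model on the synthetic pairs

def distill(parallel_src, monolingual_src):
    # Decode both the parallel corpus sources and extra monolingual
    # source-language data with the teacher...
    sources = list(parallel_src) + list(monolingual_src)
    synthetic_targets = teacher_translate(sources)
    # ...then train the student to imitate the teacher's outputs.
    train_student(list(zip(sources, synthetic_targets)))
```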
Ensemble
Another trick is to use not just one teacher but an ensemble of 2-4 teachers independently trained on the same parallel corpus. It can boost quality a little bit at the cost of having to train more teachers. The pipeline supports training and decoding with an ensemble of teachers.
Quantization
One more popular technique for model compression is quantization. We use 8-bit quantization which essentially means that we store weights of the neural net as int8 instead of float32. It saves space and speeds up matrix multiplication on inference.
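The idea can be shown with a tiny NumPy sketch of per-tensor symmetric quantization; the scheme actually used for the student models is more refined:

```python
# Tiny NumPy sketch of symmetric 8-bit weight quantization.
# Real inference engines (including Marian's) use more refined schemes.
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights).max() / 127.0       # map max magnitude to int8 range
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantized = q.astype(np.float32) * scale  # approximate reconstruction
print("max abs error:", np.abs(weights - dequantized).max())
# Each weight now takes 1 byte instead of 4, and int8 matrix
# multiplication is faster on CPUs.
```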
Other tricks
Other features worth mentioning, but beyond the scope of this already lengthy article, are the specialized neural network architecture of the student model, half-precision decoding by the teacher model to speed it up, lexical shortlists, training of word alignments, and fine-tuning of the quantized student.
Yes, it’s a lot! Now you can see why we wanted to have an end-to-end pipeline.
How to learn more
This work is based on a lot of research. If you are interested in the science behind the training pipeline, check out the reference publications listed in the training pipeline repository README and across the wider Bergamot project. Edinburgh's Submissions to the 2020 Machine Translation Efficiency Task is a good academic starting point. Check this tutorial by Nikolay Bogoychev for a more practical and operational explanation of the steps.
Results
The final student model is 47 times smaller and 37 times faster than the original teacher model and has only a small quality decrease!
Benchmarks for the en-pt model on the Flores dataset:

| Model | Size | Total number of parameters | Dataset decoding time on 1 CPU core | Quality, BLEU |
| --- | --- | --- | --- | --- |
| Teacher | 798Mb | 192.75M | 631s | 52.5 |
| Student, quantized | 17Mb | 15.7M | 17.9s | 50.7 |
We evaluate results using the MT-standard BLEU score, which essentially represents how similar the translated and reference texts are. This method is not perfect, but it has been shown that BLEU scores correlate well with human judgment of translation quality.
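For example, a corpus-level BLEU score can be computed with the sacrebleu Python package; the sentences below are made up for illustration:

```python
# Computing a corpus-level BLEU score with sacrebleu (pip install sacrebleu).
# The hypothesis/reference sentences are made-up examples.
import sacrebleu

hypotheses = ["the cat sat on the mat", "hello world"]
references = [["the cat sat on the mat", "hi world"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(round(bleu.score, 1))  # 0-100; higher means closer to the references
```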
We have a GitHub repository with all the trained models and evaluation results, where we compare the accuracy of our models to popular cloud providers' APIs. Some of our models perform similarly to, or even outperform, the cloud providers, which is a great result taking into account our models' efficiency, reproducibility, and open-source nature.
For example, here you can see evaluation results for the English to Portuguese model trained by Mozilla using open-source data only.

[Figure: evaluation results comparing the en-pt model to cloud translation APIs]
Anyone can train models and contribute them to our repo. Those contributions can be used in the Firefox Translations web extension and other places (see below).
Scaling
It is of course possible to run the whole pipeline on one machine, though it may take a while. Some steps of the pipeline are CPU bound and difficult to parallelize, while other steps can be offloaded to multiple GPUs. Most of the official models in the repository were trained on machines with 8 GPUs. A few steps, like teacher decoding during knowledge distillation, can take days even on well-resourced single machines. So to speed things up, we added cluster support to be able to spread different steps of the pipeline over multiple nodes.
Workflow manager
To manage this complexity we chose Snakemake, which is very popular in the bioinformatics community. It uses file-based workflows, allows specifying step dependencies in Python, supports containerization, and integrates with different cluster software. We considered alternative solutions that focus on job scheduling, but ultimately chose Snakemake because it was more ergonomic for one-run experimentation workflows.
Example of a Snakemake rule (dependencies between rules are inferred implicitly):

```
rule train_teacher:
    message: "Training teacher on all data"
    log: f"{log_dir}/train_teacher{{ens}}.log"
    conda: "envs/base.yml"
    threads: gpus_num * 2
    resources: gpu=gpus_num
    input:
        rules.merge_devset.output,
        train_src=f'{teacher_corpus}.{src}.gz',
        train_trg=f'{teacher_corpus}.{trg}.gz',
        bin=ancient(trainer),
        vocab=vocab_path
    output:
        model=f'{teacher_base_dir}{{ens}}/{best_model}'
    params:
        prefix_train=teacher_corpus,
        prefix_test=f"{original}/devset",
        dir=directory(f'{teacher_base_dir}{{ens}}'),
        args=get_args("training-teacher-base")
    shell: '''bash pipeline/train/train.sh \
        teacher train {src} {trg} "{params.prefix_train}" \
        "{params.prefix_test}" "{params.dir}" \
        "{input.vocab}" {params.args} >> {log} 2>&1'''
```
Cluster support
To parallelize workflow steps across cluster nodes we use Slurm resource manager. It is relatively simple to operate, fits well for high-performance experimentation workflows, and supports Singularity containers for easier reproducibility. Slurm is also the most popular cluster manager for High-Performance Computers (HPC) used for model training in academia, and most of the consortium partners were already using or familiar with it.
How to start training
The workflow is quite resource-intensive, so you’ll need a pretty good server machine or even a cluster. We recommend using 4-8 Nvidia 2080-equivalent or better GPUs per machine.
Clone https://github.com/mozilla/firefox-translations-training and follow the instructions in the README for configuration.
The most important part is to find parallel datasets and properly configure settings based on your available data and hardware. You can learn more about this in the readme.
How to use the existing models
The existing models are shipped with the Firefox Translations web extension, enabling users to translate web pages in Firefox. The models are downloaded to the local machine on demand. The web extension uses these models with the bergamot-translator Marian wrapper compiled to WebAssembly.
Also, there is a playground website at https://mozilla.github.io/translate where you can input text and translate it right away. Translation still runs locally, but is served from a static website instead of a browser extension.
If you are interested in efficient NMT inference on the server, you can try a prototype HTTP service that uses bergamot-translator compiled natively rather than to WASM.
Or follow the build instructions in the bergamot-translator readme to directly use the C++, JavaScript WASM, or Python bindings.
Conclusion
It is fascinating how far Machine Translation research has come in recent years. Local high-quality translations are the future, and it is becoming more and more practical for companies and researchers to train such models even without access to proprietary data or large-scale computing power.
We hope that Firefox Translations will set a new standard for privacy-preserving, efficient, open-source machine translation accessible to all.
Acknowledgements
I would like to thank all the participants of the Bergamot project for making this technology possible, my teammates Andre Natal and Abhishek Aggarwal for the incredible work they have done bringing Firefox Translations to life, Lonnen for managing the project and editing this blog post, and, of course, the awesome Mozilla community for helping with localization of the web extension and testing its early builds.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825303 🇪🇺
About Evgeny Pavlov
Evgeny is a Senior Software Engineer at Mozilla, working on Applied Machine Learning projects.
Snakemake.io
Readability
With Snakemake, data analysis workflows are defined via an easy-to-read, adaptable, yet powerful specification language on top of Python. Each rule describes a step in an analysis, defining how to obtain output files from input files. Dependencies between rules are determined automatically.
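For instance, a minimal rule might look like this (the file paths are invented for illustration):

```
# A minimal illustrative Snakemake rule; the file paths are invented.
# Snakemake derives the execution order from input/output dependencies.
rule count_lines:
    input:
        "data/corpus.txt"
    output:
        "results/line_count.txt"
    shell:
        "wc -l < {input} > {output}"
```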
Portability
By integrating with the Conda package manager and container virtualization, all software dependencies of each workflow step are automatically deployed upon execution.
Modularization
Rapidly implement analysis steps via direct script and Jupyter notebook integration. Easily create and employ re-usable tool wrappers and split your data analysis into well-separated modules.
Transparency
Automatic, interactive, self-contained reports ensure full transparency from results down to used steps, parameters, code, and software.

Scalability

Workflows scale seamlessly from single-core machines to multicore servers, clusters, or the cloud, without modification of the workflow definition and with automatic avoidance of redundant computations.

[Figure: scaling from a workstation to a compute server, a cluster, grid computing, and cloud computing]
Creating an environment from an environment.yaml file
First, make sure to activate the conda base environment with:

```
conda activate base
```

The environment.yaml file can be used to install all required software into an isolated Conda environment named snakemake-RawIlluminaPipeline via:

```
mamba env create --name snakemake-RawIlluminaPipeline --file environment.yaml
```

Or, using conda:

```
conda env create --name snakemake-RawIlluminaPipeline --file environment.yaml
```

To activate this environment, use:

```
conda activate snakemake-RawIlluminaPipeline
```

To deactivate an active environment, use:

```
conda deactivate
```
Running the pipeline with toy data
The input data must meet the following requirements: it must be inside the reads/ folder and be named consistently with the following scheme:

```
ls reads
# clock_10K.1.fastq.gz  clock_10K.2.fastq.gz
```

In this case, the sample identifier is clock_10K.
We can draw the workflow to be followed by executing the following command:

```
snakemake --dag results/clock_10K_prokka | dot -Tsvg > docs/dag_prokka.svg
```