![](23-11_SEAMLESS_BlogHero_11.17.jpg)
# Seamless Intro
Seamless is a family of AI models that enable more natural and authentic communication across languages. SeamlessM4T is a massively multilingual, multimodal machine translation model supporting around 100 languages. SeamlessM4T serves as the foundation for SeamlessExpressive, a model that preserves elements of prosody and voice style across languages, and SeamlessStreaming, a model supporting simultaneous translation and streaming ASR for around 100 languages. SeamlessExpressive and SeamlessStreaming are combined into Seamless, a unified model that delivers multilingual, real-time, and expressive translation.
## Links
### Demos
| | SeamlessM4T v2 | SeamlessExpressive | SeamlessStreaming |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------- |
| Demo | [SeamlessM4T v2 Demo](https://seamless.metademolab.com/m4t?utm_source=github&utm_medium=web&utm_campaign=seamless&utm_content=readme) | [SeamlessExpressive Demo](https://seamless.metademolab.com/expressive?utm_source=github&utm_medium=web&utm_campaign=seamless&utm_content=readme) | |
| HuggingFace Space Demo | [🤗 SeamlessM4T v2 Space](https://huggingface.co/spaces/facebook/seamless-m4t-v2-large) | [🤗 SeamlessExpressive Space](https://huggingface.co/spaces/facebook/seamless-expressive) | [🤗 SeamlessStreaming Space](https://huggingface.co/spaces/facebook/seamless-streaming) |
### Papers
- [Seamless](https://ai.facebook.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/)
- [EMMA](https://ai.meta.com/research/publications/efficient-monotonic-multihead-attention/)
- [SONAR](https://ai.meta.com/research/publications/sonar-expressive-zero-shot-expressive-speech-to-speech-translation/)
### Blog
[AI at Meta Blog](https://ai.meta.com/research/seamless-communication/)
## Tutorial
An exhaustive [tutorial](Seamless_Tutorial.ipynb), given at the NeurIPS 2023 Seamless EXPO, is a one-stop shop for learning how to use the entire suite of Seamless models. Please feel free to play with the notebook.
## SeamlessM4T
SeamlessM4T is our foundational all-in-one **M**assively **M**ultilingual and **M**ultimodal **M**achine **T**ranslation model delivering high-quality translation for speech and text in nearly 100 languages.
SeamlessM4T models support the tasks of:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)
:star2: We are releasing SeamlessM4T v2, an updated version with our novel *UnitY2* architecture. This new model improves over SeamlessM4T v1 in quality as well as inference latency in speech generation tasks.
To learn more about the collection of SeamlessM4T models, the approach used in each, their language coverage and their performance, visit the [SeamlessM4T README](docs/m4t/README.md) or [🤗 Model Card](https://huggingface.co/facebook/seamless-m4t-v2-large).
> [!NOTE]
> SeamlessM4T is also available in the 🤗 Transformers library. Visit [this section](docs/m4t/README.md#transformers-usage) for more details.
## SeamlessExpressive
SeamlessExpressive is a speech-to-speech translation model that captures certain underexplored aspects of prosody such as speech rate and pauses, while preserving the style of one's voice and high content translation quality.
To learn more about SeamlessExpressive models, visit the [SeamlessExpressive README](docs/expressive/README.md) or [🤗 Model Card](https://huggingface.co/facebook/seamless-expressive).
## SeamlessStreaming
SeamlessStreaming is a streaming translation model. The model supports speech as input modality and speech/text as output modalities.
The SeamlessStreaming model supports the following tasks:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Automatic speech recognition (ASR)
To learn more about SeamlessStreaming models, visit the [SeamlessStreaming README](docs/streaming/README.md) or [🤗 Model Card](https://huggingface.co/facebook/seamless-streaming).
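The task lists for SeamlessM4T and SeamlessStreaming can be summarized as a small lookup table. The following is only an illustrative sketch built from the lists in this README: the names `TASKS`, `MODEL_TASKS`, and `models_supporting` are hypothetical and are not part of the `seamless_communication` package.

```python
# Input and output modality for each task abbreviation used in this README.
TASKS = {
    "S2ST": ("speech", "speech"),
    "S2TT": ("speech", "text"),
    "T2ST": ("text", "speech"),
    "T2TT": ("text", "text"),
    "ASR": ("speech", "text"),
}

# Tasks each model supports, copied from the bullet lists above.
MODEL_TASKS = {
    "SeamlessM4T": {"S2ST", "S2TT", "T2ST", "T2TT", "ASR"},
    "SeamlessStreaming": {"S2ST", "S2TT", "ASR"},
}


def models_supporting(task):
    """Return the sorted names of the models that list the given task."""
    task = task.upper()
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    return sorted(name for name, tasks in MODEL_TASKS.items() if task in tasks)


if __name__ == "__main__":
    for task, (source, target) in TASKS.items():
        print(f"{task}: {source} -> {target}: {', '.join(models_supporting(task))}")
```

The table makes one difference easy to see: SeamlessStreaming takes only speech as input, so the text-input tasks (T2ST and T2TT) are listed for SeamlessM4T alone.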
## Seamless
The Seamless model is the unified model for expressive streaming speech-to-speech translations.
## What's new
- [12/18/2023] We are open-sourcing our Conformer-based [W2v-BERT 2.0 speech encoder](#w2v-bert-20-speech-encoder) as described in Section 3.2.1 of the [paper](https://arxiv.org/pdf/2312.05187.pdf), which is at the core of our Seamless models.
- [12/14/2023] We are releasing the Seamless [tutorial](#tutorial) given at NeurIPS 2023.
# Quick Start
## Installation
> [!NOTE]
> One of the prerequisites is [fairseq2](https://github.com/facebookresearch/fairseq2), which has pre-built packages available only
> for Linux x86-64 and Apple-silicon Mac computers. In addition, fairseq2 depends on [libsndfile](https://github.com/libsndfile/libsndfile), which
> might not be installed on your machine. If you experience any installation issues, please refer to the fairseq2
> [README](https://github.com/facebookresearch/fairseq2) for further instructions.
```bash
pip install .
```
> [!NOTE]
> Transcribing inference audio to compute metrics uses [Whisper](https://github.com/openai/whisper#setup), which is installed automatically. Whisper in turn requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system; it is available from most package managers.
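The two notes above can be turned into a quick pre-install check. This is a minimal sketch that uses only the Python standard library. The function name `check_prerequisites` is hypothetical, the platform strings it compares against are assumptions about what `platform.system()` and `platform.machine()` report, and it does not check for libsndfile.

```python
import platform
import shutil


def check_prerequisites(system=None, machine=None, ffmpeg_found=None):
    """Return a list of warnings; an empty list means nothing was flagged.

    Every argument defaults to a value detected on the current machine.
    They are parameters only so that other values can be tried out.
    """
    system = platform.system() if system is None else system
    machine = (platform.machine() if machine is None else machine).lower()
    if ffmpeg_found is None:
        ffmpeg_found = shutil.which("ffmpeg") is not None

    problems = []
    # Pre-built fairseq2 packages: Linux x86-64 and Apple-silicon Macs only.
    linux_x86 = system == "Linux" and machine in ("x86_64", "amd64")
    apple_silicon = system == "Darwin" and machine in ("arm64", "aarch64")
    if not (linux_x86 or apple_silicon):
        problems.append(
            f"no pre-built fairseq2 package is expected for {system}/{machine}"
        )
    # Whisper, used when computing metrics, needs the ffmpeg CLI.
    if not ffmpeg_found:
        problems.append("ffmpeg was not found on PATH (needed by Whisper)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
```

A warning here does not necessarily mean installation will fail. For example, fairseq2 may still be buildable from source on other platforms, as described in its README.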
## Running inference
### SeamlessM4T Inference
Here's an example of using the CLI from the root directory to run inference.
S2ST task:
```bash
m4t_predict <path_to_input_audio> --task s2st --tgt_lang <tgt_lang> --output_path <path_to_save_audio>
```
T2TT task:
```bash
m4t_predict <input_text> --task t2tt --tgt_lang <tgt_lang> --src_lang <src_lang>
```
Please refer to the [inference README](src/seamless_communication/cli/m4t/predict) for detailed instructions on how to run inference and for the list of supported source and target languages for the speech and text modalities.
For running S2TT/ASR natively (without Python) using GGML, please refer to [the unity.cpp section](#unitycpp).
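When calling the CLI from a script, it can help to assemble the argument list programmatically. The sketch below uses only the flags shown above (`--task`, `--tgt_lang`, `--src_lang`, `--output_path`). The helper name `build_m4t_command` is hypothetical, passing the input as the first positional argument is an assumption, and the language codes `eng` and `fra` are illustrative; check the inference README for the supported codes.

```python
import shlex


def build_m4t_command(input_value, task, tgt_lang, src_lang=None, output_path=None):
    """Build an argument list for the m4t_predict CLI.

    input_value is an audio path for speech tasks or raw text for text
    tasks. Passing it as the first positional argument is an assumption.
    """
    cmd = ["m4t_predict", input_value, "--task", task, "--tgt_lang", tgt_lang]
    if src_lang is not None:
        cmd += ["--src_lang", src_lang]
    if output_path is not None:
        cmd += ["--output_path", output_path]
    return cmd


if __name__ == "__main__":
    # T2TT: the source language is given explicitly for text input.
    t2tt = build_m4t_command("Hello, world.", "t2tt", tgt_lang="fra", src_lang="eng")
    # S2ST: the translated audio is written to output_path.
    s2st = build_m4t_command("input.wav", "s2st", tgt_lang="fra", output_path="output.wav")
    for cmd in (t2tt, s2st):
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(cmd))
    # To execute a command, pass the list to subprocess.run(cmd, check=True).
```

Building a list rather than a single string avoids shell-quoting problems when the input text contains spaces or punctuation.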
### SeamlessExpressive Inference
> [!NOTE]
> Please check the [section](#seamlessexpressive-models) on how to download the model.
Here's an example of using the CLI from the root directory to run inference.
```bash
expressivity_predict <path_to_input_audio> --tgt_lang <tgt_lang> --model_name seamless_expressivity --vocoder_name vocoder_pretssel --output_path <path_to_save_audio>
```
### SeamlessStreaming and Seamless Inference
The [Streaming Evaluation README](src/seamless_communication/cli/streaming) has detailed instructions for running evaluations for the SeamlessStreaming and Seamless models. The CLI has a `--no-scoring` option that can be used to skip the scoring part and just run inference.
Please check the inference [README](src/seamless_communication/inference) for more details.
## Running SeamlessStreaming Demo
You can duplicate the [SeamlessStreaming HF space](https://huggingface.co/spaces/facebook/seamless-streaming?duplicate=true) to run the streaming demo.
You can also run the demo loca