Commit e72f9b8

Merge pull request #295 from TensorSpeech/feat-streaming
Keras 3, Kaggle, CLI, Streaming
2 parents: 141deeb + 4a4dece

148 files changed (+6900, -2927 lines)

.dockerignore

Lines changed: 2 additions & 0 deletions

@@ -1,2 +1,4 @@
 LibriSpeech
 Models
+.venv*
+venv*

.pylintrc

Lines changed: 7 additions & 1 deletion

@@ -3,7 +3,7 @@
 # A comma-separated list of package or module names from where C extensions may
 # be loaded. Extensions are loading into the active Python interpreter and may
 # run arbitrary code.
-extension-pkg-allow-list=pydantic,tensorflow
+extension-pkg-allow-list=pydantic
 
 # A comma-separated list of package or module names from where C extensions may
 # be loaded. Extensions are loading into the active Python interpreter and may
@@ -120,6 +120,12 @@ disable=too-few-public-methods,
 consider-using-f-string,
 fixme,
 unused-variable,
+pointless-string-statement,
+too-many-lines,
+abstract-method,
+too-many-ancestors,
+import-outside-toplevel,
+too-many-positional-arguments,
 
 # Enable the message, report, category or checker with the given id(s). You can
 # either give multiple identifier separated by comma (,) or put this option

.vscode/settings.json

Lines changed: 30 additions & 36 deletions

@@ -1,37 +1,31 @@
 {
-  "[python]": {
-    "editor.defaultFormatter": "ms-python.black-formatter"
-  },
-  "autoDocstring.docstringFormat": "numpy",
-  "black-formatter.args": [
-    "--config",
-    "${workspaceFolder}/pyproject.toml"
-  ],
-  "black-formatter.path": [
-    "${interpreter}",
-    "-m",
-    "black"
-  ],
-  "editor.codeActionsOnSave": {
-    "source.fixAll": "explicit",
-    "source.organizeImports": "explicit"
-  },
-  "editor.formatOnSave": true,
-  "isort.args": [
-    "--settings-file",
-    "${workspaceFolder}/pyproject.toml"
-  ],
-  "pylint.args": [
-    "--rcfile=${workspaceFolder}/.pylintrc"
-  ],
-  "pylint.path": [
-    "${interpreter}",
-    "-m",
-    "pylint"
-  ],
-  "python.analysis.fixAll": [
-    "source.unusedImports",
-    "source.convertImportFormat"
-  ],
-  "python.analysis.importFormat": "absolute"
-}
+  "[python]": {
+    "editor.defaultFormatter": "ms-python.black-formatter",
+    "editor.tabSize": 4
+  },
+  "[markdown]": {
+    "editor.tabSize": 2,
+    "editor.indentSize": 2,
+    "editor.detectIndentation": false
+  },
+  "[json]": {
+    "editor.tabSize": 2
+  },
+  "[yaml]": {
+    "editor.tabSize": 2
+  },
+  "autoDocstring.docstringFormat": "numpy",
+  "black-formatter.args": ["--config", "${workspaceFolder}/pyproject.toml"],
+  "black-formatter.path": ["${interpreter}", "-m", "black"],
+  "editor.codeActionsOnSave": {
+    "source.fixAll": "explicit",
+    "source.organizeImports": "explicit"
+  },
+  "editor.formatOnSave": true,
+  "isort.args": ["--settings-file", "${workspaceFolder}/pyproject.toml"],
+  "pylint.args": ["--rcfile=${workspaceFolder}/.pylintrc"],
+  "pylint.path": ["${interpreter}", "-m", "pylint"],
+  "python.analysis.fixAll": ["source.unusedImports", "source.convertImportFormat"],
+  "python.analysis.importFormat": "absolute",
+  "markdown.extension.list.indentationSize": "inherit"
+}

Dockerfile

Lines changed: 4 additions & 4 deletions

@@ -1,4 +1,4 @@
-FROM tensorflow/tensorflow:2.3.2-gpu
+FROM tensorflow/tensorflow:2.18.0-gpu
 
 RUN apt-get update \
     && apt-get upgrade -y \
@@ -9,8 +9,8 @@ RUN apt-get update \
 RUN apt clean && apt-get clean
 
 # Install dependencies
-COPY requirements.txt /
-RUN pip --no-cache-dir install -r /requirements.txt
+COPY requirements*.txt /
+RUN pip --no-cache-dir install -r /requirements.txt -r /requirements.cuda.txt
 
 # Install rnnt_loss
 COPY scripts /scripts
@@ -21,4 +21,4 @@ RUN if [ "$install_rnnt_loss" = "true" ] ; \
     && ./scripts/install_rnnt_loss.sh \
     else echo 'Using pure TensorFlow'; fi
 
-RUN echo "export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" >> /root/.bashrc
+RUN echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" >> /root/.bashrc
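The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` expansion in the last Dockerfile line is worth noting: `${VAR:+word}` expands to `word` only when `VAR` is set and non-empty, so a colon is prepended only when there is an existing value to append. This avoids a dangling `:` (an empty entry in `LD_LIBRARY_PATH` means the current directory). A quick illustration outside the Dockerfile:

```shell
# Case 1: variable unset -> no trailing colon is emitted.
unset LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Case 2: variable already set -> the old value is appended after a colon.
LD_LIBRARY_PATH=/opt/libs
echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

The first echo prints `export LD_LIBRARY_PATH=/usr/local/cuda/lib64`; the second prints `export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/opt/libs`.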

README.md

Lines changed: 18 additions & 33 deletions

@@ -34,7 +34,6 @@ TensorFlowASR implements some automatic speech recognition architectures such as
 - [Installing from source (recommended)](#installing-from-source-recommended)
 - [Installing via PyPi](#installing-via-pypi)
 - [Installing for development](#installing-for-development)
-  - [Install for Apple Sillicon](#install-for-apple-sillicon)
 - [Running in a container](#running-in-a-container)
 - [Training \& Testing Tutorial](#training--testing-tutorial)
 - [Features Extraction](#features-extraction)
@@ -61,6 +60,8 @@ TensorFlowASR implements some automatic speech recognition architectures such as
 
 - **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100))
   See [examples/models/transducer/conformer](./examples/models/transducer/conformer)
+- **Streaming Conformer** (Reference: [http://arxiv.org/abs/2010.11395](http://arxiv.org/abs/2010.11395))
+  See [examples/models/transducer/conformer](./examples/models/transducer/conformer)
 - **ContextNet** (Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191))
   See [examples/models/transducer/contextnet](./examples/models/transducer/contextnet)
 - **RNN Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621))
@@ -74,62 +75,46 @@ TensorFlowASR implements some automatic speech recognition architectures such as
 
 For training and testing, you should use `git clone` for installing necessary packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.)
 
+**NOTE ONLY FOR APPLE SILICON**: TensorFlowASR requires python >= 3.12
+
+See the `requirements.[extra].txt` files for extra dependencies
+
 ### Installing from source (recommended)
 
 ```bash
 git clone https://github.com/TensorSpeech/TensorFlowASR.git
 cd TensorFlowASR
-# Tensorflow 2.x (with 2.x.x >= 2.5.1)
-pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"
+pip3 install -e . # or ".[cuda]" if using GPU
 ```
 
-For anaconda3:
+For **anaconda3**:
 
 ```bash
-conda create -y -n tfasr tensorflow-gpu python=3.8 # tensorflow if using CPU, this makes sure conda install all dependencies for tensorflow
+conda create -y -n tfasr python=3.11 # tensorflow if using CPU, this makes sure conda install all dependencies for tensorflow
 conda activate tfasr
-pip install -U tensorflow-gpu # upgrade to latest version of tensorflow
 git clone https://github.com/TensorSpeech/TensorFlowASR.git
 cd TensorFlowASR
-# Tensorflow 2.x (with 2.x.x >= 2.5.1)
-pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"
+pip3 install -e . # or ".[cuda]" if using GPU
 ```
 
-### Installing via PyPi
+For **colab with TPUs**:
 
 ```bash
-# Tensorflow 2.x (with 2.x >= 2.3)
-pip3 install "TensorFlowASR[tf2.x]" # or pip3 install "TensorFlowASR[tf2.x-gpu]"
+pip3 install -e ".[tpu]" -f https://storage.googleapis.com/libtpu-tf-releases/index.html
 ```
 
-### Installing for development
+### Installing via PyPi
 
 ```bash
-git clone https://github.com/TensorSpeech/TensorFlowASR.git
-cd TensorFlowASR
-pip3 install -e ".[dev]"
-pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]" or ".[tf2.x-apple]" for apple m1 machine
+pip3 install "TensorFlowASR" # or "TensorFlowASR[cuda]" if using GPU
 ```
 
-### Install for Apple Sillicon
-
-Due to tensorflow-text is not built for Apple Sillicon, we need to install it with the prebuilt wheel file from [sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon](https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon)
+### Installing for development
 
 ```bash
 git clone https://github.com/TensorSpeech/TensorFlowASR.git
 cd TensorFlowASR
-pip3 install -e "." # or pip3 install -e ".[dev] for development # or pip3 install "TensorFlowASR[dev]" from PyPi
-pip3 install tensorflow~=2.14.0 # change minor version if you want
-```
-
-Do this after installing TensorFlowASR with tensorflow above
-
-```bash
-TF_VERSION="$(python3 -c 'import tensorflow; print(tensorflow.__version__)')" && \
-TF_VERSION_MAJOR="$(echo $TF_VERSION | cut -d'.' -f1,2)" && \
-PY_VERSION="$(python3 -c 'import platform; major, minor, patch = platform.python_version_tuple(); print(f"{major}{minor}");')" && \
-URL="https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon" && \
-pip3 install "${URL}/releases/download/v${TF_VERSION_MAJOR}/tensorflow_text-${TF_VERSION_MAJOR}.0-cp${PY_VERSION}-cp${PY_VERSION}-macosx_11_0_arm64.whl"
+pip3 install -e ".[apple,dev]"
 ```
 
 ### Running in a container
@@ -139,7 +124,6 @@ docker-compose up -d
 ```
 
 
-
 ## Training & Testing Tutorial
 
 - For training, please read [tutorial_training](./docs/tutorials/training.md)
@@ -165,7 +149,7 @@ See [tflite_convertion](./docs/tutorials/tflite.md)
 
 ## Pretrained Models
 
-Go to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZxtLOQfl?usp=sharing)
+See the results on each example folder, e.g. [./examples/models//transducer/conformer/results/sentencepiece/README.md](./examples/models//transducer/conformer/results/sentencepiece/README.md)
 
 ## Corpus Sources
 
@@ -183,6 +167,7 @@ Go to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZx
 | Vivos | [https://ailab.hcmus.edu.vn/vivos](https://www.kaggle.com/datasets/kynthesis/vivos-vietnamese-speech-corpus-for-asr) | 15h |
 | InfoRe Technology 1 | [InfoRe1 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/datasets/infore/25hours.zip) | 25h |
 | InfoRe Technology 2 (used in VLSP2019) | [InfoRe2 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/datasets/infore/audiobooks.zip) | 415h |
+| VieitBud500 | [https://huggingface.co/datasets/linhtran92/viet_bud500](https://huggingface.co/datasets/linhtran92/viet_bud500) | 500h |
 
 ## How to contribute
docs/tokenizers.md

Lines changed: 5 additions & 6 deletions

@@ -1,27 +1,26 @@
-# Tokenizers
-
 - [Tokenizers](#tokenizers)
   - [1. Character Tokenizer](#1-character-tokenizer)
   - [2. Wordpiece Tokenizer](#2-wordpiece-tokenizer)
   - [3. Sentencepiece Tokenizer](#3-sentencepiece-tokenizer)
 
+# Tokenizers
 
 ## 1. Character Tokenizer
 
-See [librespeech config](../examples/configs/librispeech/characters/char.yml.j2)
+See [librespeech config](../examples/datasets/librispeech/characters/char.yml.j2)
 
 This splits the text into characters and then maps each character to an index. The index starts from 1 and 0 is reserved for blank token. This tokenizer only used for languages that have a small number of characters and each character is not a combination of other characters. For example, English, Vietnamese, etc.
 
 ## 2. Wordpiece Tokenizer
 
-See [librespeech config](../examples/configs/librispeech/wordpiece/wp.yml.j2) for wordpiece splitted by whitespace
+See [librespeech config](../examples/datasets/librispeech/wordpiece/wp.yml.j2) for wordpiece splitted by whitespace
 
-See [librespeech config](../examples/configs/librispeech/wordpiece/wp_whitespace.yml.j2) for wordpiece that whitespace is a separate token
+See [librespeech config](../examples/datasets/librispeech/wordpiece/wp_whitespace.yml.j2) for wordpiece that whitespace is a separate token
 
 This splits the text into words and then splits each word into subwords. The subwords are then mapped to indices. Blank token can be set to <unk> as index 0. This tokenizer is used for languages that have a large number of words and each word can be a combination of other words, therefore it can be applied to any language.
 
 ## 3. Sentencepiece Tokenizer
 
-See [librespeech config](../examples/configs/librispeech/sentencepiece/sp.yml.j2)
+See [librespeech config](../examples/datasets/librispeech/sentencepiece/sp.yml.j2)
 
 This splits the whole sentence into subwords and then maps each subword to an index. Blank token can be set to <unk> as index 0. This tokenizer is used for languages that have a large number of words and each word can be a combination of other words, therefore it can be applied to any language.
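The character tokenizer scheme described in `docs/tokenizers.md` (indices start from 1, with 0 reserved for the blank token) can be sketched in plain Python. This is a hypothetical minimal illustration, not the library's actual tokenizer API; the function names `build_char_vocab`, `encode`, and `decode` are invented for the example:

```python
def build_char_vocab(corpus):
    """Map each unique character to an index; 0 is reserved for the blank token."""
    chars = sorted(set("".join(corpus)))
    # Index starts from 1, as described in the doc above.
    return {c: i + 1 for i, c in enumerate(chars)}

def encode(text, vocab):
    """Convert a string into a list of character indices."""
    return [vocab[c] for c in text]

def decode(indices, vocab):
    """Convert indices back to text, skipping the blank token (0)."""
    inv = {i: c for c, i in vocab.items()}
    return "".join(inv[i] for i in indices if i != 0)

vocab = build_char_vocab(["hello world"])
assert decode(encode("hello", vocab), vocab) == "hello"
```

Wordpiece and sentencepiece tokenizers generalize this idea from single characters to learned subword units, which is why they scale to languages with large vocabularies.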

docs/tutorials/testing.md

Lines changed: 22 additions & 18 deletions

@@ -1,29 +1,33 @@
+- [Testing Tutorial](#testing-tutorial)
+  - [1. Installation](#1-installation)
+  - [2. Prepare transcripts files](#2-prepare-transcripts-files)
+  - [3. Prepare config file](#3-prepare-config-file)
+  - [4. Run testing](#4-run-testing)
+
 # Testing Tutorial
 
 These commands are example for librispeech dataset, but we can apply similar to other datasets
 
-## 1. Install packages
-
-If you use google colab, it's recommended to use the tensorflow version pre-installed on the colab itself
+## 1. Installation
 
 ```bash
-pip uninstall -y TensorFlowASR # uninstall for clean install if needed
-pip install ".[tf2.x]"
+./setup.sh [tpu|gpu|cpu] install
 ```
 
 ## 2. Prepare transcripts files
 
 This is the example for preparing transcript files for librispeech data corpus
 
 ```bash
-python scripts/create_librispeech_trans.py \
+python examples/datasets/librispeech/prepare_transcript.py \
     --directory=/path/to/dataset/test-clean \
     --output=/path/to/dataset/test-clean/transcripts.tsv
 ```
 
 Do the same thing with `test-clean`, `test-other`
 
-For other datasets, you must prepare your own python script like the `scripts/create_librispeech_trans.py`
+For other datasets, please make your own script to prepare the transcript files, take a look at the [`prepare_transcript.py`](../../examples/datasets/librispeech/prepare_transcript.py) file for more reference
 
 ## 3. Prepare config file
 
@@ -33,27 +37,27 @@ Please take a look in some examples for config files in `examples/*/*.yml.j2`
 
 The config file is the same as the config used for training
 
-## 4. [Optional][Required if not exists] Generate vocabulary and metadata
+The inputs, outputs and other options of vocabulary are defined in the config file
+
+For example:
 
-Use the same vocabulary file used in training
+```jinja2
+{% import "examples/datasets/librispeech/sentencepiece/sp.yml.j2" as decoder_config with context %}
+{{decoder_config}}
 
-```bash
-python scripts/prepare_vocab_and_metadata.py \
-    --config-path=/path/to/config.yml.j2 \
-    --datadir=/path/to/datadir
+{% import "examples/models/transducer/conformer/small.yml.j2" as config with context %}
+{{config}}
 ```
 
-The inputs, outputs and other options of vocabulary are defined in the config file
-
-## 5. Run testing
+## 4. Run testing
 
 ```bash
-python examples/test.py \
+tensorflow_asr test \
     --config-path /path/to/config.yml.j2 \
     --dataset_type slice \
     --datadir /path/to/datadir \
     --outputdir /path/to/modeldir/tests \
     --h5 /path/to/modeldir/weights.h5
 ## See others params
-python examples/test.py --help
+tensorflow_asr test --help
 ```
docs/tutorials/tflite.md

Lines changed: 2 additions & 2 deletions

@@ -11,14 +11,14 @@
 ## Conversion
 
 ```bash
-python3 examples/tflite.py \
+tensorflow_asr tflite \
     --config-path=/path/to/config.yml.j2 \
     --h5=/path/to/weight.h5 \
     --bs=1 \ # Batch size
     --beam-width=0 \ # Beam width, set >0 to enable beam search
     --output=/path/to/output.tflite
 ## See others params
-python examples/tflite.py --help
+tensorflow_asr tflite --help
 ```
 
 ## Inference
