Implementation of Deep Speech 2 in neon
This repository contains an implementation of Baidu SVAIL's Deep Speech 2 model in neon. Much of the model is readily available in mainline neon; to also support the CTC cost function, we have included a neon-compatible wrapper for Baidu's Warp-CTC.

Deep Speech 2 models are computationally intensive and can require long training times. Even with near-perfect GPU utilization, training on a dataset large enough to yield respectable performance can take up to a week. Please keep this in mind when exploring this repo.

https://github.com/NervanaSystems/deepspeech/archive/master.zip

We have used this code to train models on both the Wall Street Journal (81 hours) and Librispeech (1000 hours) datasets. The WSJ dataset is available only through the LDC; Librispeech, however, can be downloaded freely from the OpenSLR Librispeech corpus page (http://www.openslr.org/12).

The model presented here uses a basic argmax-based decoder:

1. Choose the most probable character in each frame.
2. Collapse the resulting output string according to CTC's rules: first merge repeated characters, then remove blank characters.
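As a rough illustration of this greedy decoding scheme, here is a minimal sketch; the alphabet layout and blank index are assumptions for illustration, not the repo's actual implementation:

import itertools
import numpy as np

# Illustrative alphabet; index 0 plays the role of the CTC blank here.
ALPHABET = "_ abcdefghijklmnopqrstuvwxyz'"
BLANK = 0

def greedy_ctc_decode(probs):
    """probs: (num_frames, num_chars) array of per-frame character probabilities."""
    best = probs.argmax(axis=1)                        # most probable char per frame
    merged = (k for k, _ in itertools.groupby(best))   # collapse repeated characters
    return "".join(ALPHABET[k] for k in merged if k != BLANK)  # then drop blanks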
After decoding, you might expect outputs like this when trained on WSJ data:

Ground truth:  united presidential is a life insurance company
Model output:  younited presidentiol is a lefe in surance company

Ground truth:  that was certainly true last week
Model output:  that was sertainly true last week

Ground truth:  we're not ready to say we're in technical default a spokesman said
Model output:  we're now ready to say we're intechnical default a spokesman said
Or outputs like this when trained on Librispeech (see "Decoding and evaluating a trained model"):

Ground truth:  this had some effect in calming him
Model output:  this had some offectind calming him

Ground truth:  he went in and examined his letters but there was nothing from carrie
Model output:  he went in an examined his letters but there was nothing from carry

Ground truth:  the design was different but the thing was clearly the same
Model output:  the design was differampat that thing was clarly the same
Getting Started
neon 2.3.0 and the aeon dataloader (v1.0.0) must both be installed.

Clone the repo: git clone https://github.com/NervanaSystems/deepspeech.git && cd deepspeech.

Within a neon virtualenv, run pip install -r requirements.txt.

Run make to build warp-ctc.
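As a quick sanity check that the prerequisites are in place, a minimal sketch, assuming the active virtualenv is the one where neon and aeon were installed:

# Minimal import check for the prerequisites listed above.
# Run inside the neon virtualenv; aeon's Python entry point
# is the DataLoader class.
import neon
from aeon import DataLoader  # noqa: F401

print("neon and aeon imported successfully")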

Training a model
1. Prepare a manifest file for your dataset.
How to do this depends on the specifics of the dataset; see the sketch below for the general idea, and the Librispeech recipe that follows for a concrete example.
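Conceptually, a manifest pairs each audio file in the dataset with its transcript file, one pair per line. A minimal sketch of the idea (the comma-separated layout and the helper below are illustrative assumptions; the authoritative format is whatever data/ingest_librispeech.py emits):

# Hypothetical manifest writer: pairs each audio file with its transcript.
# The comma-separated layout is an assumption for illustration only.
import glob
import os

def write_manifest(audio_dir, transcript_dir, manifest_path, ext="flac"):
    with open(manifest_path, "w") as f:
        for audio in sorted(glob.glob(os.path.join(audio_dir, "*." + ext))):
            stem = os.path.splitext(os.path.basename(audio))[0]
            transcript = os.path.join(transcript_dir, stem + ".txt")
            f.write("{},{}\n".format(audio, transcript))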

Example: Librispeech recipe
A recipe for ingesting Librispeech data is provided in data/ingest_librispeech.py. Note that Librispeech provides distinct datasets for training and validation, and each set must be ingested separately. Additionally, we'll have to get around the quirky way that the Librispeech data is distributed; after "unpacking" the archives, we should re-pack them in a consistent manner.

To be more precise, Librispeech data is distributed in zipped tar files, e.g. train-clean-100.tar.gz for training and dev-clean.tar.gz for validation. Upon unpacking, each archive creates a directory named LibriSpeech, so trying to unpack both files together in the same directory is a bad idea. To get around this, try something like:

$ mkdir librispeech && cd librispeech
$ wget http://www.openslr.org/resources/12/train-clean-100.tar.gz
$ wget http://www.openslr.org/resources/12/dev-clean.tar.gz
$ tar xvzf dev-clean.tar.gz LibriSpeech/dev-clean  --strip-components=1
$ tar xvzf train-clean-100.tar.gz LibriSpeech/train-clean-100  --strip-components=1
Follow the above prescription and you will have the training data in a subdirectory librispeech/train-clean-100 and the validation data in a subdirectory librispeech/dev-clean. To ingest the data, run the python script on the directory where you've unpacked the clean training data, followed by the paths where you want the script to write the transcripts and training manifests for that dataset:

$ python data/ingest_librispeech.py <absolute path to train-clean-100 directory> <absolute path to directory to write transcripts to> <absolute path to where to write training manifest to>
For example, if the train-clean-100 directory is located at /usr/local/data/librispeech/train-clean-100, run:

$ python data/ingest_librispeech.py  /usr/local/data/librispeech/train-clean-100  /usr/local/data/librispeech/train-clean-100/transcripts_dir  /usr/local/data/librispeech/train-clean-100/train-manifest.csv
which would create a training manifest file named train-manifest.csv. Similarly, if the dev-clean directory is located at /usr/local/data/librispeech/dev-clean, run:

$ python data/ingest_librispeech.py  /usr/local/data/librispeech/dev-clean  /usr/local/data/librispeech/dev-clean/transcripts_dir  /usr/local/data/librispeech/dev-clean/val-manifest.csv
To train on the full 1000 hours, ingest the train-clean-360 (360 hour) and train-other-500 (500 hour) datasets in the same way. The manifest files can then be concatenated with a simple:

$ cat /path/to/100_hour_manifest.csv /path/to/360_hour_manifest.csv /path/to/500_hour_manifest.csv > /path/to/1000_hour_manifest.csv
2a. Train a new model
$ python train.py --manifest train:<training manifest> --manifest val:<validation manifest> -e <num_epochs> -z <batch_size> -s </path/to/model_output.pkl> [-b <backend>]
where <training manifest> is the path to the training manifest file produced during ingestion (for the example above, /usr/local/data/librispeech/train-clean-100/train-manifest.csv) and <validation manifest> is the path to the validation manifest file.
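For instance, a concrete invocation using the manifests created above might look like the following (the epoch count and batch size are illustrative, not recommendations):

$ python train.py --manifest train:/usr/local/data/librispeech/train-clean-100/train-manifest.csv --manifest val:/usr/local/data/librispeech/dev-clean/val-manifest.csv -e 16 -z 32 -s /path/to/model_output.pkl -b gpu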

2b. Continue training after pause on a previous model
For a previously-trained model that wasn't trained for the full time needed, it's possible to resume training by passing the --model_file </path/to/pre-trained_model> argument to train.py. For example, you could continue training the pre-trained model from our Model Zoo. That model was trained on 1000 hours of speech from the Librispeech corpus for 16 epochs, attaining a Character Error Rate (CER) of 14% without using a language model. You could continue training it for, say, an additional 4 epochs by calling:

$ python train.py --manifest train:<training manifest> --manifest val:<validation manifest> -e 20 -z <batch_size> -s </path/to/model_output.prm> --model_file </path/to/pre-trained_model> [-b <backend>]
which will save the new model to model_output.prm (note that -e 20 specifies the total epoch count: the 16 epochs already trained plus the 4 additional ones).

Decoding and evaluating a trained model
After you have a trained model, it's easy to evaluate its performance on any given dataset. Simply create a manifest file and then call:

$ python evaluate.py --manifest val:/path/to/manifest.csv --model_file /path/to/saved_model.prm
replacing the file paths as needed. By default, it prints CERs (Character Error Rates). To instead print WERs (Word Error Rates), include the argument --use_wer.
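For reference, CER is the character-level Levenshtein edit distance between the model transcript and the ground truth, normalized by the ground-truth length, and WER is the same computation over words. A minimal sketch of the standard definitions (not necessarily the repo's exact code):

# Character/Word Error Rate via Levenshtein distance (standard definitions).
def edit_distance(ref, hyp):
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    return float(edit_distance(ref, hyp)) / len(ref)

def wer(ref, hyp):
    return float(edit_distance(ref.split(), hyp.split())) / len(ref.split())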

For example, you could evaluate our pre-trained model from our Model Zoo. To evaluate the pre-trained model, follow these steps:

Download some test data from the Librispeech ASR corpus and prepare a manifest file for it, following the prescription provided above.

Download the pre-trained DS2 model from our Model Zoo.

Run the evaluate.py script on the pre-trained model and the manifest file for the test data, as described above.

Optionally, inspect the transcripts produced by the trained model by appending the argument --inference_file <name_of_file_to_save_results_to.pkl>. This dumps the model's transcripts, together with the corresponding "ground truth" transcripts, to a pickle file.
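A small sketch of how one might inspect such a dump (the layout shown, pairs of ground-truth and model transcripts, is an assumption for illustration; print the loaded object first to confirm its actual structure):

# Load the --inference_file dump and print a few transcript pairs.
import pickle

with open("results.pkl", "rb") as f:  # hypothetical file name
    results = pickle.load(f)

for truth, prediction in list(results)[:5]:
    print("REF:", truth)
    print("HYP:", prediction)
    print()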

