Domain Adaptation Via Teacher Student Learning For Speech Recognition

Presented by Christabella Irwanto

Need for data-efficiency

Transfer learning

Domain adaptation

Speaker adaptation

Setting the context

T/S learning

Parallel data

| Source domain | Target domain | How to simulate? |
| --- | --- | --- |
| Clean speech | Noisy speech | Add noise |
| Close-talk speech | Far-field speech | Apply RIR, add noise |
| Adults | Children | Voice morphing |
| Original speech | Compressed speech | Apply codec |
| Wideband speech | Narrowband speech | Downsample/filter |
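As an illustration of the first row (clean speech to noisy speech), here is a minimal sketch of simulating a parallel noisy utterance by mixing noise into a clean waveform at a chosen signal-to-noise ratio. The function name `add_noise_at_snr`, the synthetic sine "speech", and the Gaussian noise are assumptions made for this example; a real pipeline would use recorded speech and recorded noise.

```python
import math
import random


def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so the mixture has the requested SNR in dB.

    Both inputs are plain lists of float samples (hypothetical helper,
    not taken from the cited papers).
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]

    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise has zero power")

    # Choose the scale so that clean_power / (scale^2 * noise_power)
    # equals 10^(snr_db / 10).
    scale = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(clean, noise)]


# Toy stand-ins for a clean utterance and a noise recording (1 s at 16 kHz).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# `clean` and `noisy` now form one frame-aligned parallel pair.
noisy = add_noise_at_snr(clean, noise, snr_db=10.0)
```

Because the noisy signal is derived sample by sample from the clean one, the two stay time-aligned, which is what lets the teacher and student see corresponding frames.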

Architecture

Domain adaptation of a (teacher) acoustic model that is well-trained with source-domain transcribed data to a target domain (Li et al., 2017)
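To make the setup in this caption concrete, here is a minimal sketch of a T/S training objective as we understand it from Li et al. (2017): the teacher scores the source-domain frame, the student scores the parallel target-domain frame, and the student is trained to match the teacher's posteriors through a soft cross-entropy (equivalent to KL divergence up to a constant in the teacher). The function names and the toy logits are assumptions for illustration, and no transcription labels are used.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def ts_loss(teacher_logits, student_logits):
    """Average soft cross-entropy between teacher and student posteriors.

    teacher_logits[t] comes from the teacher on source-domain frame t;
    student_logits[t] comes from the student on the parallel
    target-domain frame t. (Hypothetical helper for illustration.)
    """
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        p_teacher = [math.exp(lp) for lp in log_softmax(t_frame)]
        log_p_student = log_softmax(s_frame)
        total -= sum(pt * ls for pt, ls in zip(p_teacher, log_p_student))
    return total / len(teacher_logits)


# Toy example: one frame, three output classes (e.g. senones).
teacher = [[2.0, 0.5, -1.0]]
student_close = [[1.8, 0.6, -0.9]]
student_far = [[-1.0, 0.5, 2.0]]

loss_close = ts_loss(teacher, student_close)
loss_far = ts_loss(teacher, student_far)
```

In this sketch a student whose posteriors sit closer to the teacher's gets a lower loss, and the minimum is reached when the two distributions coincide.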

How T/S learning works

Successes

Advantages

Shortfalls of T/S learning

IT/S, i.e. knowledge distillation

Conditional T/S

The student network exclusively uses the soft posteriors from the teacher as the training target when the teacher is correct and uses the hard label instead when the teacher is wrong (Meng et al., 2019).
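The selection rule described above can be sketched per frame as follows. This is an illustrative reading of the sentence, not the authors' implementation: "teacher is correct" is taken to mean that the teacher's top-scoring class equals the ground-truth label, and the function names and toy values are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def conditional_ts_loss(teacher_logits, student_logits, labels):
    """Per-frame switch between soft teacher targets and hard labels."""
    total = 0.0
    for t_frame, s_frame, label in zip(teacher_logits, student_logits, labels):
        log_p_student = log_softmax(s_frame)
        teacher_pred = max(range(len(t_frame)), key=t_frame.__getitem__)
        if teacher_pred == label:
            # Teacher is correct: train on its soft posteriors only.
            p_teacher = [math.exp(lp) for lp in log_softmax(t_frame)]
            total -= sum(pt * ls for pt, ls in zip(p_teacher, log_p_student))
        else:
            # Teacher is wrong: fall back to the one-hot ground truth.
            total -= log_p_student[label]
    return total / len(labels)


# Toy example: the teacher's top class is 0 in both frames.
teacher = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
student = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
labels = [0, 1]  # frame 0: teacher correct, frame 1: teacher wrong

loss = conditional_ts_loss(teacher, student, labels)
```

Unlike interpolating soft and hard targets with a fixed weight, this rule needs no tuning parameter, but it does require ground-truth labels for the adaptation data.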

Results

Adaptive T/S

T/S learning for AED models

T/S learning for unsupervised domain adaptation of an AED model for E2E ASR. The two orange lines signify the two-level knowledge transfer (Meng et al., 2019).

AT/S learning for AED models

AT/S for supervised domain adaptation of an AED model for E2E ASR (Meng et al., 2019).

AT/S results

ASR WER (%) on the HK speaker test set for far-field AED models trained with CE, and for AED models adapted to 3400 hours of far-field Microsoft Cortana data by various T/S learning methods for E2E ASR (Meng et al., 2019).

Conclusion

Resources

Questions for assignment

  1. Give some examples of transfer learning and domain adaptation in speech. Explain in terms of domains and tasks, as defined by Pan & Yang.
  2. What is teacher/student learning useful for?
  3. What are the drawbacks of teacher/student learning in general (whether soft, interpolated, conditional, adaptive etc.)?

Bibliography

Li, J., Seltzer, M. L., Wang, X., Zhao, R., & Gong, Y. (2017). Large-scale domain adaptation via teacher-student learning. arXiv preprint arXiv:1708.05466.

Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering.

Kamath, U., Liu, J., & Whitaker, J. (2019). Deep learning for NLP and speech recognition. Springer.

Wang, K., Zhang, J., Wang, Y., & Xie, L. (2018). Empirical evaluation of speaker adaptation on DNN based acoustic model. arXiv preprint arXiv:1803.10146.

Meng, Z., Li, J., Gaur, Y., & Gong, Y. (2019). Domain adaptation via teacher-student learning for end-to-end speech recognition. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 268–275).

Meng, Z., Li, J., Zhao, Y., & Gong, Y. (2019). Conditional teacher-student learning. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6445–6449).

Meng, Z., Chen, Z., Mazalov, V., Li, J., & Gong, Y. (2017). Unsupervised adaptation with domain separation networks for robust speech recognition. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 214–221).