FocalCodec Inference Result Comparison

Chinese Demo — AISHELL-1 Test Set
File Name: BAC009S0907W0372  |  Speaker: S0907  |  Transcript: 二零一五年二月一日 (February 1, 2015)

| Model | Frame Rate | Bitrate | Overall dCER ↓ | Overall MOS_Q ↑ | Sample dCER ↓ | Sample MOS_Q ↑ | Audio |
|---|---|---|---|---|---|---|---|
| Original | | | | | | | 🔊 |
| Pretrained FocalCodec 50Hz | 50 Hz | 650 bps | 0.00% | 2.34 | 0.00% | 3.08 | 🔊 |
| Pretrained FocalCodec 25Hz | 25 Hz | 325 bps | 0.00% | 2.43 | 0.00% | 2.84 | 🔊 |
| Pretrained FocalCodec 12.5Hz | 12.5 Hz | 163 bps | 46.15% | 2.83 | 0.00% | 3.26 | 🔊 |
| Pretrained FocalCodec-S 50Hz 2k | 50 Hz | 550 bps | 15.38% | 2.51 | 0.00% | 3.08 | 🔊 |
| Pretrained FocalCodec-S 50Hz 4k | 50 Hz | 600 bps | 30.77% | 2.46 | 0.00% | 2.26 | 🔊 |
| Pretrained FocalCodec-S 50Hz 65k | 50 Hz | 800 bps | 15.38% | 2.62 | 0.00% | 3.06 | 🔊 |
| Exp D — Stage 2 | 25 Hz | 275 bps | 6.03% | 1.46 | 0.00% | 1.53 | 🔊 |
| Exp E — Stage 1 | 50 Hz | 550 bps | 6.17% | 2.96 | 0.00% | 2.19 | 🔊 |
| Exp E — Stage 2 | 25 Hz | 275 bps | 13.3% | 2.74 | 0.00% | 2.25 | 🔊 |
| Exp E — Stage 3 | 12.5 Hz | 137.5 bps | 43.6% | 2.42 | 76.92% | 2.54 | 🔊 |
| Exp F — Stage 2 | 25 Hz | 275 bps | 14.5% | 2.68 | 0.00% | 2.34 | 🔊 |
| Exp F — Stage 3 | 12.5 Hz | 137.5 bps | 39.51% | 2.31 | 0.00% | 2.09 | 🔊 |
| Exp G — Stage 2 | 25 Hz | 275 bps | 17.4% | 2.65 | 15.38% | 2.06 | 🔊 |
| Exp H — Stage 2 | 25 Hz | 275 bps | 16.4% | 2.71 | 15.38% | 2.35 | 🔊 |
| Exp I — Stage 2 | 25 Hz | 275 bps | 16.6% | 2.95 | 15.38% | 2.81 | 🔊 |
| Exp J — Stage 2 ⭐ | 25 Hz | 275 bps | 4.15% | 1.59 | 0.00% | 1.50 | 🔊 |
| Exp K — Stage 3 | 12.5 Hz | 137.5 bps | 16.0% | 1.33 | 15.38% | 1.12 | 🔊 |
| Focal v6 (Dual-Stream+GAN) | 25 Hz | 550 bps | 4.76% | 1.82 | 0.00% | 1.60 | 🔊 |
| Focal v6-24kHz (Dual-Stream+GAN) | 25 Hz | 550 bps | 6.28% | 2.11 | 0.00% | 2.30 | 🔊 |
| Focal Sem-6bit | 25 Hz | 425 bps | 8.01% | 1.44 | 0.00% | 1.34 | 🔊 |
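
The bitrates listed above are consistent with the usual relation for a token-based codec: bitrate = frame rate × bits per frame, where each codebook of size N contributes log2(N) bits. The sketch below reproduces several of the listed values under that assumption. The codebook sizes (2^13 for the pretrained FocalCodec rows; 2^11, 2^12 and 2^16 for the 2k, 4k and 65k variants; 2^11 for the 275 bps and 137.5 bps experiments) are inferred from the numbers on this page and are not confirmed configuration values. The helper name `bitrate_bps` is hypothetical.

```python
import math


def bitrate_bps(frame_rate_hz: float, codebook_sizes: list[int]) -> float:
    """Return bits per second as frame rate times total bits per frame."""
    # Each codebook of size N needs log2(N) bits to index one entry.
    bits_per_frame = sum(math.log2(size) for size in codebook_sizes)
    return frame_rate_hz * bits_per_frame


# Assumed codebook sizes, inferred from the listed bitrates (not confirmed).
examples = {
    "50 Hz, one 2^13 codebook": (50.0, [2**13]),
    "25 Hz, one 2^13 codebook": (25.0, [2**13]),
    "12.5 Hz, one 2^13 codebook": (12.5, [2**13]),
    "50 Hz, one 2^11 codebook (2k)": (50.0, [2**11]),
    "50 Hz, one 2^12 codebook (4k)": (50.0, [2**12]),
    "50 Hz, one 2^16 codebook (65k)": (50.0, [2**16]),
    "25 Hz, one 2^11 codebook": (25.0, [2**11]),
    "12.5 Hz, one 2^11 codebook": (12.5, [2**11]),
}

for label, (rate, sizes) in examples.items():
    print(f"{label}: {bitrate_bps(rate, sizes):g} bps")
```

Under the same relation, 12.5 Hz with 13 bits per frame gives 162.5 bps, which the table rounds to 163 bps. The 550 bps and 425 bps entries at 25 Hz correspond to 22 and 17 bits per frame respectively; how those bits are split across streams is not stated here.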
English Demo — LibriSpeech test-clean
File Name: 3729-6852-0011  |  Speaker: 3729  |  Transcript: "I had a name I believe in my young days but I have forgotten it since I have been in service."

| Model | Frame Rate | Bitrate | Overall dWER ↓ | Overall MOS_Q ↑ | Sample dWER ↓ | Sample MOS_Q ↑ | Audio |
|---|---|---|---|---|---|---|---|
| Original | | | | | | | 🔊 |
| Pretrained FocalCodec 50Hz | 50 Hz | 650 bps | 0.00% | 3.28 | 0.00% | 3.66 | 🔊 |
| Pretrained FocalCodec 25Hz | 25 Hz | 325 bps | 0.00% | 3.61 | 0.00% | 3.71 | 🔊 |
| Pretrained FocalCodec 12.5Hz | 12.5 Hz | 163 bps | 0.00% | 3.61 | 0.00% | 3.15 | 🔊 |
| Pretrained FocalCodec-S 50Hz 2k | 50 Hz | 550 bps | 0.00% | 2.64 | | | 🔊 |
| Pretrained FocalCodec-S 50Hz 4k | 50 Hz | 600 bps | 0.00% | 2.78 | | | 🔊 |
| Pretrained FocalCodec-S 50Hz 65k | 50 Hz | 800 bps | 0.00% | 3.34 | | | 🔊 |
| Exp E — Stage 1 | 50 Hz | 550 bps | 3.87% | 3.32 | 5.88% | 2.52 | 🔊 |
| Exp E — Stage 2 | 25 Hz | 275 bps | 30.6% | 2.95 | 41.18% | 2.48 | 🔊 |
| Exp E — Stage 3 | 12.5 Hz | 137.5 bps | 96.2% | 2.51 | 76.47% | 2.27 | 🔊 |
| Exp F — Stage 2 | 25 Hz | 275 bps | 27.7% | 2.82 | 23.53% | 2.88 | 🔊 |
| Exp F — Stage 3 | 12.5 Hz | 137.5 bps | 93.14% | 2.34 | 4.76% | 2.46 | 🔊 |
| Exp G — Stage 2 | 25 Hz | 275 bps | 10.1% | 3.02 | | 3.04 | 🔊 |
| Exp H — Stage 2 | 25 Hz | 275 bps | 11.8% | 3.07 | | 2.40 | 🔊 |
| Exp I — Stage 2 | 25 Hz | 275 bps | 11.3% | 3.31 | 0.00% | 3.11 | 🔊 |
| Exp J — Stage 2 ⭐ | 25 Hz | 275 bps | 3.02% | 2.00 | 0.00% | 1.99 | 🔊 |
| Exp K — Stage 3 | 12.5 Hz | 137.5 bps | 10.0% | 1.44 | 11.76% | 1.46 | 🔊 |
| Focal v6 (Dual-Stream+GAN) | 25 Hz | 550 bps | 4.52% | 2.10 | 0.00% | 2.09 | 🔊 |
| Focal v6-24kHz (Dual-Stream+GAN) | 25 Hz | 550 bps | 4.56% | 2.50 | 0.00% | 2.58 | 🔊 |
| Focal Sem-6bit | 25 Hz | 425 bps | 4.87% | 1.29 | 4.76% | 1.08 | 🔊 |
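
For readers unfamiliar with the error-rate columns, the following is a minimal sketch of a word error rate computed as word-level edit distance divided by reference length. It assumes that dWER is an ordinary WER between two ASR transcripts (one from the original audio, one from the reconstructed audio); the ASR model, text normalization, and exact scoring used for this page are not stated here, so treat this as illustrative only. The function name `word_error_rate` and the example hypothesis are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = ("I had a name I believe in my young days but I have "
             "forgotten it since I have been in service")
# Hypothetical transcript with a single substituted word.
hypothesis = reference.replace("forgotten", "forgot")

print(f"{100 * word_error_rate(reference, hypothesis):.2f}%")
```

With the 21-word transcript above, one word error corresponds to 1/21 ≈ 4.76%. A character error rate follows the same recipe with the strings split into characters instead of words.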