The BABEL speech corpus is a corpus of recorded speech materials from five Central and Eastern European languages. Intended for use in speech technology applications, it was funded by a grant from the European Union and completed in 1998. It is distributed by the European Language Resources Association.

Development of the BABEL Project

Following the creation of a speech corpus of European Union languages by the SAM project, funding was granted by the European Union for the creation along similar lines of a speech corpus of languages of Central and Eastern Europe, with the name of BABEL.

The initial impetus came from the SAM (Speech Assessment Methods) project funded by the European Union as ESPRIT Project #1541 in 1987–89. This project was conducted by an international group of phoneticians, and was applied in the first instance to the European Communities languages Danish, Dutch, English, French, German, and Italian (by 1989). SAM produced many speech research tools (including the SAMPA computer-based phonetic transcription which was also used for the BABEL project) and a corpus of recorded speech material distributed on CD-ROM. A proposal was made to the European Union under the Copernicus initiative in 1994, with the objective of creating a corpus of spoken Bulgarian, Estonian, Hungarian, Polish and Romanian, and Grant #1304 was awarded for this. A pilot project to create a small corpus of spoken Bulgarian was carried out jointly by the Universities of Sofia (Bulgaria) and Reading (U.K.). The initial meeting of the whole project team took place at the University of Reading in 1995.

Recorded material

Since the objective was to produce material suitable for use in speech technology applications, the digital recordings were made in strictly controlled conditions in recording studios. For each language the material had the following composition:

Many-talker set: 30 males and 30 females each read 100 numbers, 3 connected-speech passages and 5 "filler" sentences (to provide further instances of some items) or 4 passages if no fillers were needed.
Few-talker set: 5 males and 5 females, normally selected from the above group, each read 5 blocks of 100 numbers, 15 passages and 25 filler sentences, plus 5 lists of syllables.
Very-few-talker set: 1 male and 1 female selected from the above read 5 blocks of syllables, with and without carrier sentences.

Membership of the BABEL Project

Project Director: Peter Roach (University of Reading)

Project leaders in Central and Eastern Europe

Bulgaria: initially, A. Misheva until her death in 1995, then S. Dimitrova (University of Sofia).
Estonia: E. Meister (University of Tallinn)
Hungary: K. Vicsi (Technical University of Budapest)
Poland: R. Gubrynowicz (Polish Academy of Sciences) and W. Gonet (University of Lublin)
Romania: M. Boldea (University of Timișoara)

Project members in Western Europe

France: L. Lamel (LIMSI, Paris); A. Marchal (CNRS)
Germany: W. Barry (Saarland University); K. Marasek (University of Stuttgart)
United Kingdom: J. Wells (University College London); P. Roach (University of Reading)

Project outcomes

An intermediate project assessment meeting was held in Lublin, Poland, in 1996. Work then continued until a final assessment and presentation of outcomes in Granada, Spain, at the First International Conference on Language Resources and Evaluation, in 1998. The project was completed in December 1998.

The resulting set of corpora was then supplied to the European Language Resources Association. ELRA is exclusively responsible for distributing the material to users via their website.

At the time of its completion, BABEL was the largest high-quality speech database available for research purposes in languages such as Hungarian and Estonian. It has been used for research into topics such as pronunciation modeling and automatic speech recognition. The project was also part of what has been called the most significant recent development in corpus linguistics – the increasing range of languages covered by corpus data, which promises to bring to a wider range of languages the benefits that corpus linguistics has brought to the study of Western European languages.

References

D. Chan, A. Fourcin, D. Gibbon, B. Granstrom, M. Huckvale, G. Kokkinakis, K. Kvale, L. Lamel, B. Lindberg, A. Moreno, J. Mouropoulos, F. Senia, I. Trancoso, C. Veld & J. Zeiliger, "EUROM – A Spoken Language Resource for the EU", in Eurospeech'95, Proceedings of the 4th European Conference on Speech Communication and Speech Technology. Madrid, Spain, 18–21 September 1995. Vol 1, pp. 867-870
"EUROM1 – Multilingual Speech Corpus". University College London. Retrieved 2015-01-19.
Misheva, A., Dimitrova, S., Filipov, V., Grigorova, E., Nikov, M., Roach, P. and Arnfield, S. ‘Bulgarian Speech Database: a pilot study’, Proceedings of Eurospeech ‘95, Madrid, vol. 1, pp. 859-862 (1995)
Roach, P., S.Arnfield, W.Barry, S.Dimitrova, M.Boldea, A.Fourcin, W.Gonet, R.Gubrynowicz, E.Hallum, L.Lamel, K.Marasek, A.Marchal, E.Meister, K.Vicsi (1998). ‘BABEL: A Database Of Central And Eastern European Languages’, Proceedings of the First International Conference on Language Resources and Evaluation, eds. A. Rubio et al, Granada, Vol. 1, pp. 371-4.
"Search results for: babel". European Language Resources Association. Retrieved 2015-01-18.
Fegyó, Tibor; Péter Mihajlik; Péter Tatai; Géza Gordos (2001). "Pronunciation modeling in Hungarian number recognition." In INTERSPEECH, pp. 1465-1468.
Alumae, Tanel (2004). Large vocabulary continuous speech recognition for Estonian using morpheme classes. INTERSPEECH, Jeju, Korea. pp. 389–392.
Mihajlik, Péter; Révész, Tibor; Tatai, Péter (2002-11-01). "Phonetic transcription in automatic speech recognition" (PDF). Acta Linguistica Hungarica. 49 (3): 407–425. doi:10.1556/ALing.49.2002.3-4.9.
McEnery, Tony (2001). Corpus Linguistics: An Introduction. Oxford University Press. p. 188. ISBN 9780748611652.