Computer audition

Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, describes these systems as "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents."

Inspired by models of human audition, CA deals with questions of representation, transduction, grouping, use of musical knowledge and general sound semantics for the purpose of performing intelligent operations on audio and music signals by the computer. Technically this requires a combination of methods from the fields of signal processing, auditory modelling, music perception and cognition, pattern recognition, and machine learning, as well as more traditional methods of artificial intelligence for musical knowledge representation.

Applications

Just as computer vision differs from image processing, computer audition differs from audio engineering in that it deals with understanding audio rather than processing it. It also differs from problems of speech understanding by machine, since it deals with general audio signals, such as natural sounds and musical recordings.

Applications of computer audition vary widely and include search for sounds, genre recognition, acoustic monitoring, music transcription, score following, audio texture, music improvisation, emotion in audio and so on.

Related disciplines

Computer audition overlaps with the following disciplines:

Music information retrieval: methods for search and analysis of similarity between music signals.
Auditory scene analysis: understanding and description of audio sources and events.
Computational musicology and mathematical music theory: use of algorithms that employ musical knowledge for analysis of music data.
Computer music: use of computers in creative musical applications.
Machine musicianship: audition driven interactive music systems.

Areas of study

Since audio signals are interpreted by the human ear-brain system, that complex perceptual mechanism should be simulated somehow in software for "machine listening". In other words, to perform on par with humans, the computer should hear and understand audio content much as humans do. Analyzing audio accurately involves several fields: electrical engineering (spectrum analysis, filtering, and audio transforms); artificial intelligence (machine learning and sound classification); psychoacoustics (sound perception); cognitive sciences (neuroscience and artificial intelligence); acoustics (physics of sound production); and music (harmony, rhythm, and timbre). Furthermore, audio transformations such as pitch shifting, time stretching, and sound object filtering should be perceptually and musically meaningful. For best results, these transformations require perceptual understanding of spectral models, high-level feature extraction, and sound analysis/synthesis. Finally, structuring and coding the content of an audio file (sound and metadata) could benefit from efficient compression schemes, which discard inaudible information in the sound. Computational models of music and sound perception and cognition can lead to a more meaningful representation and to a more intuitive digital manipulation and generation of sound and music in musical human-machine interfaces.

The study of CA could be roughly divided into the following sub-problems:

Representation: signal and symbolic. This aspect deals with time-frequency representations, both in terms of notes and spectral models, including pattern playback and audio texture.
Feature extraction: sound descriptors, segmentation, onset, pitch and envelope detection, chroma, and auditory representations.
Musical knowledge structures: analysis of tonality, rhythm, and harmonies.
Sound similarity: methods for comparison between sounds, sound identification, novelty detection, segmentation, and clustering.
Sequence modeling: matching and alignment between signals and note sequences.
Source separation: methods of grouping of simultaneous sounds, such as multiple pitch detection and time-frequency clustering methods.
Auditory cognition: modeling of emotions, anticipation and familiarity, auditory surprise, and analysis of musical structure.
Multi-modal analysis: finding correspondences between textual, visual, and audio signals.

Representation issues

Computer audition deals with audio signals that can be represented in a variety of fashions, from direct encoding of digital audio in two or more channels to symbolically represented synthesis instructions. Audio signals are usually represented in terms of analogue or digital recordings. Digital recordings are samples of acoustic waveform or parameters of audio compression algorithms. One of the unique properties of musical signals is that they often combine different types of representations, such as graphical scores and sequences of performance actions that are encoded as MIDI files.

Because general audio signals usually comprise multiple sound sources, it is hard to devise a parametric representation for them, unlike speech signals, which can be efficiently described in terms of specific models (such as the source-filter model). Parametric audio representations usually use filter banks or sinusoidal models to capture multiple sound parameters, sometimes increasing the representation size in order to capture internal structure in the signal. Additional types of data that are relevant for computer audition are textual descriptions of audio contents, such as annotations, reviews, and visual information in the case of audio-visual recordings.

Features

Description of contents of general audio signals usually requires extraction of features that capture specific aspects of the audio signal. Generally speaking, one could divide the features into signal or mathematical descriptors such as energy and description of spectral shape; statistical characterization such as change or novelty detection; and special representations that are better adapted to the nature of musical signals or the auditory system, such as logarithmic growth of sensitivity (bandwidth) in frequency or octave invariance (chroma).

Since parametric models in audio usually require very many parameters, the features are used to summarize properties of multiple parameters in a more compact or salient representation.

Musical knowledge

Finding specific musical structures is possible by using musical knowledge as well as supervised and unsupervised machine learning methods. Examples of this include detection of tonality according to distribution of frequencies that correspond to patterns of occurrence of notes in musical scales, distribution of note onset times for detection of beat structure, distribution of energies in different frequencies to detect musical chords and so on.

Sound similarity and sequence modeling

Comparison of sounds can be done by comparison of features with or without reference to time. In some cases an overall similarity can be assessed by close values of features between two sounds. In other cases, when temporal structure is important, methods of dynamic time warping need to be applied to "correct" for different temporal scales of acoustic events. Finding repetitions and similar sub-sequences of sonic events is important for tasks such as texture synthesis and machine improvisation.

Source separation

Since one of the basic characteristics of general audio is that it comprises multiple simultaneously sounding sources, such as multiple musical instruments, people talking, machine noises or animal vocalization, the ability to identify and separate individual sources is very desirable. Unfortunately, there are no methods that can solve this problem in a robust fashion. Existing methods of source separation sometimes rely on correlation between different audio channels in multi-channel recordings. The ability to separate sources from stereo signals requires different techniques than those usually applied in communications, where multiple sensors are available. Other source separation methods rely on training or clustering of features in mono recordings, such as tracking harmonically related partials for multiple pitch detection. Some methods, before explicit recognition, rely on revealing structures in data without knowing the structures (like recognizing objects in abstract pictures without attributing meaningful labels to them) by finding the least complex data representations, for instance describing audio scenes as generated by a few tone patterns and their trajectories (polyphonic voices) and acoustical contours drawn by a tone (chords).

Auditory cognition

Listening to music and general audio is commonly not a task-directed activity. People enjoy music for various poorly understood reasons, which are commonly attributed to the emotional effect of music due to creation of expectations and their realization or violation. Animals attend to signs of danger in sounds, which could be either specific or general notions of surprising and unexpected change. Generally, this creates a situation where computer audition cannot rely solely on detection of specific features or sound properties and has to come up with general methods of adapting to a changing auditory environment and monitoring its structure. This consists of analysis of larger repetition and self-similarity structures in audio to detect innovation, as well as the ability to predict local feature dynamics.

Multi-modal analysis

Among the available data for describing music, there are textual representations, such as liner notes, reviews and criticisms that describe the audio contents in words. In other cases human reactions such as emotional judgements or psycho-physiological measurements might provide an insight into the contents and structure of audio. Computer audition tries to find relations between these different representations in order to provide this additional understanding of the audio contents.
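
The signal descriptors named in the discussion of features (energy and spectral shape) can be illustrated with a minimal sketch. It assumes a single frame of samples held in a Python list and uses a naive discrete Fourier transform from the standard library only. The function names `rms_energy`, `magnitude_spectrum`, `spectral_centroid` and `sine_frame`, the 8000 Hz sample rate, and the 256-sample frame length are all illustrative choices, not something the article prescribes.

```python
import cmath
import math


def rms_energy(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) transform."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz, one simple spectral-shape descriptor."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful centroid
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def sine_frame(freq_hz, sample_rate=8000, length=256):
    """Hypothetical test signal: one frame of a pure sinusoid."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(length)]


# Two frames with equal energy but different spectral shape.
low = sine_frame(250.0)
high = sine_frame(2000.0)
```

Both frames have the same RMS energy, while the spectral centroid separates them. This is the sense in which a small set of features can summarize a signal more compactly than its raw samples. A practical system would more likely use a fast Fourier transform and windowed, overlapping frames.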

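The dynamic time warping mentioned in the discussion of sound similarity can be sketched as follows. This is the textbook dynamic-programming formulation over two one-dimensional feature sequences, with absolute difference as the local cost. The function name `dtw_distance` and the example contours are hypothetical, and a real system would typically compare multi-dimensional feature vectors rather than single numbers.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a only, advance in b only, advance in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# A feature contour, the same contour at half speed, and a different contour.
contour = [0.0, 1.0, 2.0, 1.0, 0.0]
slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
other = [2.0, 1.0, 0.0, 1.0, 2.0]
```

Here `dtw_distance(contour, slow)` is zero, because warping absorbs the change in tempo, whereas `dtw_distance(contour, other)` is positive. A frame-by-frame comparison could not even be applied to the first pair, since the two sequences differ in length.
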
See also

3D sound localization
Audio signal processing
List of emerging technologies
Medical intelligence and language engineering lab
Music and artificial intelligence
Sound recognition

External links

UCSD Computer Audition Lab
George Tzanetakis' Computer Audition Resources
Shlomo Dubnov's Tutorial on Computer Audition
Department of Electrical Engineering, IIT (Bangalore)
Sound and Music Computing, Aalborg University Copenhagen, Denmark
