Subjective video quality

Subjective video quality is video quality as experienced by humans. It is concerned with how video is perceived by a viewer (also called "observer" or "subject") and designates their opinion on a particular video sequence. It is related to the field of Quality of Experience. Measuring subjective video quality is necessary because objective quality assessment algorithms such as PSNR have been shown to correlate poorly with subjective ratings. Subjective ratings may also be used as ground truth to develop new algorithms.

Subjective video quality tests are psychophysical experiments in which a number of viewers rate a given set of stimuli. These tests are quite expensive in terms of time (preparation and running) and human resources and must therefore be carefully designed. In such tests, typically, SRCs ("Sources", i.e. original video sequences) are treated with various conditions (HRCs, for "Hypothetical Reference Circuits") to generate PVSs ("Processed Video Sequences").

The main idea of measuring subjective video quality is similar to the mean opinion score (MOS) evaluation for audio. To evaluate the subjective video quality of a video processing system, the following steps are typically taken:

1. Choose original, unimpaired video sequences for testing
2. Choose settings of the system that should be evaluated
3. Apply settings to the SRC, which results in the test sequences (PVSs)
4. Choose a test method, describing how sequences are presented to viewers and how their opinion is collected
5. Invite a panel of viewers
6. Carry out testing in a specific environment (e.g. a laboratory context) and present each PVS in a certain order to every viewer
7. Calculate rating results for individual PVSs, SRCs and HRCs, e.g. the MOS
Below, some examples of standardized testing procedures are explained.

ACR (Absolute Category Rating): each sequence is rated individually on the ACR scale. The labels on the scale are "bad", "poor", "fair", "good", and "excellent", and they are translated to the values 1, 2, 3, 4 and 5 when calculating the MOS.

ACR-HR (Absolute Category Rating with Hidden Reference): a variation of ACR in which the original unimpaired source sequence is shown in addition to the impaired sequences, without informing the subjects of its presence (hence, "hidden"). The ratings are calculated as differential scores between the reference and the impaired versions. The differential score is defined as the score of the PVS minus the score given to the hidden reference, plus the number of points on the scale. For example, if a PVS is rated as "poor" (2) and its corresponding hidden reference as "good" (4) on the five-point scale, the resulting rating is 2 − 4 + 5 = 3 (see the sketch following these examples). When these ratings are averaged, the result is not a MOS, but a differential MOS ("DMOS").

SSCQE (Single Stimulus Continuous Quality Evaluation): a longer sequence is rated continuously over time using a slider device (a variation of a fader), on which subjects rate the current quality. Samples are taken in regular intervals, resulting in a quality curve over time rather than a single quality rating.
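As a concrete illustration of the ACR-HR differential-score arithmetic above, here is a minimal Python sketch; the ratings, the helper name differential_scores, and the five-point scale are invented for illustration and are not part of any recommendation.

    # Sketch: ACR-HR differential scores and a DMOS value on a 5-point scale.
    def differential_scores(pvs_scores, reference_scores, scale_points=5):
        # differential score = PVS score - hidden reference score + number of scale points
        return [p - r + scale_points for p, r in zip(pvs_scores, reference_scores)]

    pvs_scores = [2, 3, 2, 4]        # invented ratings given to the impaired sequence
    reference_scores = [4, 4, 3, 5]  # ratings given to the hidden reference by the same viewers

    diffs = differential_scores(pvs_scores, reference_scores)
    dmos = sum(diffs) / len(diffs)
    print(diffs, dmos)               # [3, 4, 4, 4] 3.75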
Crowdsourcing has recently been used for subjective video quality evaluation and, more generally, in the context of Quality of Experience (QoE). Here, viewers give ratings using their own computer, at home, rather than taking part in a subjective quality test in laboratory rooms. While this method allows for obtaining more results than in traditional subjective tests at lower costs, the validity and reliability of the gathered responses must be carefully checked.
Subjective quality tests can be done in any environment. However, due to possible influence factors from heterogeneous contexts, it is typically advised to perform tests in a neutral environment, such as a dedicated laboratory room. Such a room may be sound-proofed, with walls painted in neutral grey, and using properly calibrated light sources. Several recommendations specify these conditions. Controlled environments have been shown to result in lower variability in the obtained scores.
Opinions of viewers are typically averaged into the mean opinion score (MOS). To this aim, the labels of categorical scales may be translated into numbers. For example, the responses "bad" to "excellent" can be mapped to the values 1 to 5, and then averaged. MOS values should always be reported with their statistical confidence intervals so that the general agreement between observers can be evaluated.
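A minimal sketch of that calculation, assuming ratings have already been mapped to the values 1 to 5 and using a normal approximation for the 95% confidence interval; the label mapping, the data, and the function name are illustrative assumptions.

    # Sketch: MOS and an approximate 95% confidence interval for one PVS.
    import math

    LABELS = {"bad": 1, "poor": 2, "fair": 3, "good": 4, "excellent": 5}

    def mos_with_ci(ratings, z=1.96):
        scores = [LABELS[r] if isinstance(r, str) else r for r in ratings]
        n = len(scores)
        mos = sum(scores) / n
        sd = math.sqrt(sum((s - mos) ** 2 for s in scores) / (n - 1))  # sample std. dev.
        half_width = z * sd / math.sqrt(n)
        return mos, (mos - half_width, mos + half_width)

    print(mos_with_ci(["good", "fair", "excellent", "good", "fair"]))  # invented ratings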
There is an ongoing discussion in the QoE community as to whether a viewer's cultural, social, or economic background has a significant impact on the obtained subjective video quality results. A systematic study involving six laboratories in four countries found no statistically significant impact of the subject's language and culture / country of origin on video quality ratings.
Brunnström and Barkowsky have provided calculations for estimating the minimum number of subjects necessary based on existing subjective tests. They claim that in order to ensure statistically significant differences when comparing ratings, a larger number of subjects than usually recommended may be needed.
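The sketch below shows the kind of back-of-the-envelope sample-size estimate such an analysis involves, using the standard normal-approximation power formula for comparing two mean ratings; this is not the procedure of the cited authors, and the effect size and standard deviation are invented examples.

    # Sketch: rough number of subjects per condition needed to detect a given MOS
    # difference with a two-sample comparison (normal-approximation power formula).
    import math
    from scipy.stats import norm

    def subjects_per_condition(delta_mos, sd, alpha=0.05, power=0.8):
        z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
        z_power = norm.ppf(power)           # desired statistical power
        n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta_mos ** 2
        return math.ceil(n)

    # Invented numbers: detect a 0.5 MOS difference when ratings have std. dev. 0.8.
    print(subjects_per_condition(delta_mos=0.5, sd=0.8))  # about 41 subjects per condition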
However, most recommendations for the number of subjects have been designed for measuring video quality encountered by a home television or PC user, where the range and diversity of distortions tend to be limited (e.g., to encoding artifacts only). Given the large range and diversity of impairments that may occur on videos captured with mobile devices and/or transmitted over wireless networks, a larger number of human subjects may generally be required.
Viewers are also called "observers" or "subjects". A certain minimum number of viewers should be invited to a study, since a larger number of subjects increases the reliability of the experiment outcome, for example by reducing the standard deviation of averaged ratings. Furthermore, there is a risk of having to exclude subjects for unreliable behavior during rating.
Which method to choose largely depends on the purpose of the test and possible constraints in time and other resources. Some methods may have fewer context effects (i.e. where the order of stimuli influences the results), which are unwanted test biases. In ITU-T P.910, it is noted that methods such as DCR should be used for testing the fidelity of transmission, especially in high-quality systems. ACR and ACR-HR are better suited for qualification tests and – due to giving absolute results – comparison of systems. The PC method has a high discriminatory power, but it requires longer test sessions.
There are many ways to select proper sequences, system settings, and test methodologies. A few of them have been standardized. They are thoroughly described in several ITU-R and ITU-T recommendations, among them ITU-R BT.500 and ITU-T P.910. While there is an overlap in certain aspects, the BT.500 recommendation has its roots in broadcasting, whereas P.910 focuses on multimedia content. A standardized testing method usually describes aspects such as how many times and in which order each PVS should be viewed, whether ratings are taken once per stimulus (e.g. after presentation) or continuously, and whether ratings are absolute, i.e. referring to one stimulus only, or relative (comparing two or more stimuli).
The minimum number of subjects that are required for a subjective video quality study is not strictly defined. According to ITU-T, any number between 4 and 40 is possible, where 4 is the absolute minimum for statistical reasons, and inviting more than 40 subjects has no added value. In general, at least 15 observers should participate in the experiment. They should not be directly involved in picture quality evaluation as part of their work and should not be experienced assessors. In other documents, it is also claimed that a minimum of 10 subjects is needed to obtain meaningful averaged ratings.
Typically, a system should be tested with a representative number of different contents and content characteristics. For example, one may select excerpts from contents of different genres, such as action movies, news shows, and cartoons. The length of the source video depends on the purpose of the test, but typically, sequences of no less than 10 seconds are used. Sources should be of pristine quality, with no visible coding artifacts or other properties that would lower the quality of the original sequence. The amount of motion and spatial detail should also cover a broad range, which ensures that the test contains sequences of different complexity.
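One common way to quantify "motion" and "spatial detail" when selecting sources is the spatial information (SI) and temporal information (TI) measures described in ITU-T P.910. The sketch below is a simplified version of those measures for grayscale frames stored as NumPy arrays; the frame source and array shapes are assumptions made for illustration.

    # Sketch: simplified spatial information (SI) and temporal information (TI)
    # for a list of grayscale frames (2-D NumPy arrays of equal size).
    import numpy as np
    from scipy.ndimage import sobel

    def spatial_information(frames):
        # Std. dev. of the Sobel-filtered frame, maximum taken over time.
        return max(np.hypot(sobel(f, axis=0), sobel(f, axis=1)).std() for f in frames)

    def temporal_information(frames):
        # Std. dev. of the frame-to-frame difference, maximum taken over time.
        return max((b - a).std() for a, b in zip(frames[:-1], frames[1:]))

    frames = [np.random.rand(144, 176) for _ in range(30)]  # stand-in for decoded frames
    print(spatial_information(frames), temporal_information(frames))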
Often, additional measures are taken before evaluating the results. Subject screening is a process in which viewers whose ratings are considered invalid or unreliable are rejected from further analysis. Invalid ratings are hard to detect, as subjects may have rated without looking at a video, or cheated during the test. The overall reliability of a subject can be determined by various procedures, some of which are outlined in ITU-R and ITU-T recommendations. For example, the correlation between a person's individual scores and the overall MOS, evaluated for all sequences, is a good indicator of their reliability in comparison with the remaining test participants.
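A minimal sketch of that correlation check, assuming a ratings matrix with one row per subject and one column per PVS; the threshold of 0.75 is an arbitrary illustration, not a value prescribed by the recommendations.

    # Sketch: flag subjects whose scores correlate poorly with the overall MOS.
    import numpy as np

    def unreliable_subjects(ratings, threshold=0.75):
        """ratings: array of shape (num_subjects, num_pvs)."""
        mos = ratings.mean(axis=0)                  # per-PVS mean over all subjects
        flagged = []
        for i, row in enumerate(ratings):
            r = np.corrcoef(row, mos)[0, 1]         # Pearson correlation with the MOS
            if r < threshold:
                flagged.append((i, r))
        return flagged

    ratings = np.array([[5, 4, 2, 1], [4, 4, 3, 1], [1, 2, 5, 5]])  # toy data
    print(unreliable_subjects(ratings))             # subject 2 rates against the trend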
While rating stimuli, humans are subject to biases. These may lead to different and inaccurate scoring behavior and consequently result in MOS values that are not representative of the "true quality" of a stimulus. In recent years, advanced models have been proposed that aim at formally describing the rating process and subsequently recovering noisiness in subjective ratings. According to Janowski et al., subjects may have an opinion bias that generally shifts their scores, as well as a scoring imprecision that depends on the subject and the stimulus to be rated. Li et al. have proposed to differentiate between subject inconsistency and content ambiguity.
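To make the idea concrete, the sketch below fits a deliberately simplified additive model, observed score = true quality + per-subject bias + noise, by plain averaging. The cited papers use more elaborate maximum-likelihood formulations, so this is only an illustration of the decomposition, with made-up data.

    # Sketch: toy decomposition of ratings into per-PVS quality and per-subject bias.
    import numpy as np

    def decompose(ratings):
        """ratings: shape (num_subjects, num_pvs). Returns (quality, bias, residual)."""
        grand_mean = ratings.mean()
        bias = ratings.mean(axis=1) - grand_mean          # how much each subject over/under-rates
        quality = (ratings - bias[:, None]).mean(axis=0)  # bias-corrected per-PVS estimate
        residual = ratings - quality[None, :] - bias[:, None]
        return quality, bias, residual

    ratings = np.array([[4.0, 2.0, 5.0], [3.0, 1.0, 4.0], [5.0, 3.0, 5.0]])  # toy data
    quality, bias, residual = decompose(ratings)
    print(quality, bias)   # e.g. one subject consistently rating about one point lower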
Many parameters of the viewing conditions may influence the results, such as room illumination, display type, brightness, contrast, resolution, viewing distance, and the age and educational level of viewers. It is therefore advised to report this information along with the obtained ratings.
DSCQS (Double Stimulus Continuous Quality Scale): the viewer sees an unimpaired reference and the impaired sequence in a random order. They are allowed to re-view the sequences, and then rate the quality of both on a continuous scale labeled with the ACR categories.
Another recommendation, ITU-T P.913, gives researchers more freedom to conduct subjective quality tests in environments different from a typical testing laboratory, while still requiring them to report all details necessary to make such tests reproducible.
DSIS (Double Stimulus Impairment Scale) and DCR (Degradation Category Rating): both refer to the same method. The viewer sees an unimpaired reference video, then the same video impaired, and after that they are asked to vote on the second video using a so-called impairment scale (from "impairments are imperceptible" to "impairments are very annoying").

PC (Pair Comparison): instead of comparing an unimpaired and an impaired sequence, different impairment types (HRCs) are compared with each other. All possible combinations of HRCs should be evaluated.
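Since the PC method requires every pairing of the conditions under test, here is a small sketch of how such a presentation plan could be generated; the condition names and the per-viewer shuffling policy are illustrative assumptions.

    # Sketch: enumerate all condition pairs for a Pair Comparison (PC) test.
    import itertools
    import random

    hrcs = ["codec_A_1mbps", "codec_A_3mbps", "codec_B_1mbps", "codec_B_3mbps"]

    pairs = list(itertools.combinations(hrcs, 2))  # every unordered pair exactly once
    random.shuffle(pairs)                          # randomize presentation order per viewer
    for left, right in pairs:
        print(left, "vs", right)                   # 6 comparisons for 4 conditions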
The design of the HRCs depends on the system under study. Typically, multiple independent variables are introduced at this stage, and they are varied over a number of levels. For example, to test the quality of a video codec, independent variables may be the video encoding software, a target bitrate, and the target resolution of the processed sequence. It is advised to select settings that result in ratings which cover the full quality range. In other words, assuming an ACR scale, the test should show sequences that viewers would rate from bad to excellent.
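A sketch of how such a full-factorial set of HRCs could be enumerated before encoding; the specific encoders, bitrates, and resolutions are invented examples.

    # Sketch: build the list of HRCs as the cross product of encoding parameters.
    from itertools import product

    encoders = ["x264", "x265"]
    bitrates_kbps = [500, 1500, 4000]
    resolutions = ["1280x720", "1920x1080"]

    hrcs = [
        {"encoder": e, "bitrate_kbps": b, "resolution": r}
        for e, b, r in product(encoders, bitrates_kbps, resolutions)
    ]
    print(len(hrcs))  # 2 * 3 * 2 = 12 processing conditions per source sequence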
Viewers should be non-experts in the sense of not being professionals in the field of video coding or related domains. This requirement is introduced to avoid potential subject bias.
The results of subjective quality tests, including the used stimuli, are called databases. A number of subjective picture and video quality databases based on such studies have been made publicly available by research institutes. These databases – some of which have become de facto standards – are used globally by television, cinematic, and video engineers to design and test objective quality models, since the developed models can be trained against the obtained subjective data. An overview of publicly available databases has also been compiled, and video assets have been made available in the Consumer Digital Video Library.

References

ITU-R BT.500: Methodology for the subjective assessment of the quality of television pictures, 2012.
ITU-T Rec. P.910: Subjective video quality assessment methods for multimedia applications, 2008.
ITU-T P.913: Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment, 2014.
ITU-T Tutorial: Objective perceptual assessment of video quality: Full reference television, 2004.
Pinson, M. H.; Janowski, L.; Pepion, R.; Huynh-Thu, Q.; Schmidmer, C.; Corriveau, P.; Younkin, A.; Le Callet, P.; Barkowsky, M. (October 2012). "The Influence of Subjects and Environment on Audiovisual Subjective Tests: An International Study". IEEE Journal of Selected Topics in Signal Processing, 6(6): 640–651.
Brunnström, Kjell; Barkowsky, Marcus (2018). "Statistical quality of experience analysis: on planning the sample size and statistical significance testing". Journal of Electronic Imaging, 27(5): 053013.
Janowski, Lucjan; Pinson, Margaret (2015). "The Accuracy of Subjects in a Quality Experiment: A Theoretical Subject Model". IEEE Transactions on Multimedia, 17(12): 2210–2224.
Li, Zhi; Bampis, Christos G. (2017). "Recover Subjective Quality Scores from Noisy Measurements". 2017 Data Compression Conference (DCC), pp. 52–61.
Hossfeld, Tobias; et al. (2014). "Best Practices for QoE Crowdtesting: QoE Assessment With Crowdsourcing". IEEE Transactions on Multimedia, 16(2): 541–558.
Hossfeld, Tobias; Hirth, Matthias; Redi, Judith; Mazza, Filippo; Korshunov, Pavel; Naderi, Babak; Seufert, Michael; Gardlo, Bruno; Egger, Sebastian (October 2014). "Best Practices and Recommendations for Crowdsourced QoE - Lessons learned from the Qualinet Task Force 'Crowdsourcing'". hal-01078761.
Winkler, Stefan (2009). "On the properties of subjective ratings in video quality experiments". Proc. Quality of Multimedia Experience.
Pinson, Margaret; Wolf, Stephen (2003). "Comparing Subjective Video Quality Testing Methodologies". SPIE Video Communications and Image Processing Conference, Lugano, Switzerland.

External links

Video Quality Experts Group
Consumer Digital Video Library