Prof. Dr. K. R. Rao, IEEE Fellow, University of Texas at Arlington, USA
Taking a human visual system (HVS) based approach, Digital Video Image Quality and Perceptual Coding (DVIQPC) outlines the principles, metrics and standards associated with perceptual coding, as well as the latest techniques and applications. It discusses the latest developments in vision research as they relate to HVS-based video and image coding. It covers subjective and objective assessment methods, quantitative quality metrics (including vision-model-based digital video impairment metrics), and test criteria and procedures. It examines in detail the post-filtering and reconstruction issues associated with color bleeding, blocking, ringing and temporal fluctuation artifacts, along with methods to reduce or eliminate them. It also focuses on picture quality assessment criteria. It poses new challenges to vision research and to the transfer of vision science to imaging and visual communication systems engineering. It also poses a theoretical and practical challenge: how to define the concept of psychovisual redundancy quantitatively, and how to establish a theoretical bound for perceptually lossless coding.
K. R. Rao received the Ph.D. degree in electrical engineering from The University of New Mexico, Albuquerque, in 1966. Since 1966, he has been with the University of Texas at Arlington, where he is currently a professor of electrical engineering. He, along with two other researchers, introduced the Discrete Cosine Transform in 1975, which has since become very popular in digital signal processing. He is the co-author of the books “Orthogonal Transforms for Digital Signal Processing” (Springer-Verlag, 1975), also recorded for the blind in Braille by the Royal National Institute for the Blind; “Fast Transforms: Analyses and Applications” (Academic Press, 1982); and “Discrete Cosine Transform: Algorithms, Advantages, Applications” (Academic Press, 1990). He has edited a benchmark volume, “Discrete Transforms and Their Applications” (Van Nostrand Reinhold, 1985), and coedited a benchmark volume, “Teleconferencing” (Van Nostrand Reinhold, 1985). He is co-author of the books “Techniques and Standards for Image/Video/Audio Coding” (Prentice Hall, 1996), “Packet Video Communications over ATM Networks” (Prentice Hall, 2000) and “Multimedia Communication Systems” (Prentice Hall, 2002). He has coedited a handbook, “The Transform and Data Compression Handbook” (CRC Press, 2001). His other books include “Digital Video Image Quality and Perceptual Coding” (with H. R. Wu; Taylor and Francis, 2006), “Introduction to Multimedia Communications: Applications, Middleware, Networking” (with Z. S. Bojkovic and D. A. Milovanovic; Wiley, 2006) and “Discrete Cosine and Sine Transforms” (with V. Britanak and P. Yip; Elsevier, 2007). Some of his books have been translated into Japanese, Chinese, Korean and Russian and have also been published as Asian (paperback) editions. He has been an external examiner for graduate students from universities in Australia, Canada, Hong Kong, India, Singapore, Thailand and Taiwan.
He has been a visiting professor at several universities (in Australia, Japan, Korea, Singapore and Thailand) for periods ranging from 3 weeks to 7 1/2 months. He has conducted workshops and tutorials on video/audio coding and standards worldwide. He has supervised students at the Masters (59) and Doctoral (29) levels. He has published extensively in refereed journals and has been a consultant to industry, research institutes, law firms and academia. He is a Fellow of the IEEE.
Prof. Dr. M. Rupp, Vienna University of Technology, Austria
The deployment of third generation mobile networks enabled new real-time multimedia services like video call, conferencing and streaming. The real-time nature of these services excludes the possibility of end-to-end retransmissions. Therefore, errors affecting the quality of the received service are inevitable. The aim of error resilience methods is to minimize the impact of errors on the end-user quality.
In this talk, the effect of errors at different positions in the video stream and the possibility of detecting them will be discussed. Typically, within an IP video stream, the presence of errors in an IP packet can be detected by means of a simple checksum. Thus, the IP packet size determines the resolution of the error detection. To limit the rate increase due to packet headers, the IP packets are rather large, and the loss of one packet means the loss of a considerable part of a picture. Currently, erroneous IP packets are discarded at the receiver and the corresponding missing parts of the video sequence are concealed. However, the discarded IP packets may still contain correctly received information. If this information is also used, a substantial improvement in end-user quality can be obtained.
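The packet-level detection described above can be illustrated with a minimal sketch. It assumes a 16-bit ones'-complement checksum in the style of RFC 1071 computed over the payload only; the function names and the receiver logic are hypothetical and are not taken from the talk. The point it shows is the coarse resolution: one flipped bit causes the whole packet to be discarded, although almost all of its bytes arrived correctly.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def receive(packets):
    """Hypothetical receiver: keep a payload only if its checksum matches.

    `packets` is a list of (payload, checksum) pairs; a failed packet is
    replaced by None, i.e. the whole packet is discarded.
    """
    return [
        payload if internet_checksum(payload) == checksum else None
        for payload, checksum in packets
    ]


def count_matching_bytes(sent: bytes, received: bytes) -> int:
    """Number of byte positions that arrived unchanged."""
    return sum(a == b for a, b in zip(sent, received))


# One 1024-byte packet with a single bit flipped in transit.
sent = bytes(range(256)) * 4
checksum = internet_checksum(sent)
corrupted = bytearray(sent)
corrupted[500] ^= 0x01
corrupted = bytes(corrupted)

delivered = receive([(sent, checksum), (corrupted, checksum)])
intact_bytes = count_matching_bytes(sent, corrupted)
```

Here the second packet is dropped as a unit even though `intact_bytes` shows that all but one byte were received correctly, which is the information the abstract proposes to exploit.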
If the access network technology is known, an appropriate cross-layer design enables easier error detection and allows for further improvements in error resilience. This talk focuses on the UMTS access network. Error resilience methods can be further improved by appropriate scheduling of the video stream. Here, link-error-aware and distortion-aware concepts and their combinations will be discussed and their performance demonstrated.
Markus Rupp received his Dipl.-Ing. degree in 1988 at the University of Saarbruecken, Germany, and his Dr.-Ing. degree in 1993 at the Technische Universitaet Darmstadt, Germany, where he worked with Eberhardt Haensler on designing new algorithms for acoustical and electrical echo compensation. From November 1993 until July 1995 he held a postdoctoral position at the University of California, Santa Barbara, with Sanjit Mitra, where he worked with Ali H. Sayed on a robustness description of adaptive filters with impacts on neural networks and active noise control. From October 1995 until August 2001 he was a member of the Technical Staff in the Wireless Technology Research Department of Bell Labs, where he worked on various topics related to adaptive equalization and rapid implementation for IS-136, 802.11 and UMTS. He is presently a full professor of Digital Signal Processing in Mobile Communications at the Technical University of Vienna. He was an associate editor of IEEE Transactions on Signal Processing from 2002 to 2005, is currently an associate editor of the EURASIP Journal on Applied Signal Processing (JASP) and of the EURASIP Journal on Embedded Systems (JES), and is an elected AdCom member of EURASIP. He has authored or co-authored more than 200 papers and patents on adaptive filtering, wireless communications and rapid prototyping, as well as automatic design methods.
Prof. Dr. L. Onural, Bilkent University, Ankara, Turkey
A consortium of 19 European institutions, led by Bilkent University, has been focusing on all technical aspects of 3DTV: 3D scene capture, representation, compression, transmission and display are the main technical building blocks. Fundamental signal processing issues associated with scalar wave propagation, diffraction and holography are also of prime interest. Advanced versions of stereoscopic viewing, integral-imaging-based systems, volumetric displays and holographic displays are among the candidate techniques for presenting 3DTV to the viewer. Current 3D monitors are mostly stereoscopic; however, experimental holographic displays have also been demonstrated. It is envisioned that future 3DTV systems will decouple the capture and display steps: 3D scenes will be captured by some means, such as multi-camera systems, and this data will then be converted to an abstract 3D representation using computer graphics techniques. The display will then access this abstract data to generate the 3D video for the observer.
Levent Onural received his Ph.D. in ECE from SUNY at Buffalo in 1985 (BS, MS from METU). He was a Fulbright scholar. After a research assistant professor position at the ECE Department of SUNYAB, he joined the EEE Department of Bilkent University, where he is a full Professor at present. His current research interests are in image and video processing, 3DTV, holographic TV, and signal processing aspects of optical wave propagation. Dr. Onural received a TUBITAK award in 1995 and an IEEE Third Millennium Medal in 2000. He served IEEE as the Director of IEEE Region 8 (Europe, Middle East and Africa) and as the Secretary of IEEE. He was a member of the IEEE Board of Directors, the IEEE Executive Committee and the IEEE Assembly. He is an Associate Editor of IEEE TCSVT. Currently, he is leading the EC-funded 3DTV project as the coordinator; 19 institutions (about 200 researchers) are contributing to 3DTV.
Prof. Dr. H. Wechsler, George Mason University, Fairfax, Virginia, USA
The ability to recognize objects in general, and living creatures in particular, in photographs or video clips is a critical enabling technology for a wide range of applications including health care, human-computer intelligent interaction, search engines for image retrieval and data mining, industrial and personal robotics, surveillance and security, and transportation. Despite almost 50 years of research, however, today’s object recognition systems are still largely unable to handle the extraordinarily wide range of appearances assumed by common objects (including human faces) in typical images. Some of the challenges for modern pattern recognition that have to be addressed in order to advance and make practical both detection and categorization include open set recognition, occlusion and masking, change detection and time-varying imagery, lack of enough data for training, and proper performance evaluation and error analysis. Open set recognition operates under the assumption that not all the test (unknown) probes have mates in the gallery (training set); occlusion and masking hide and disguise parts of the input; image contents vary across both the spatial and temporal dimensions; the amount of data available for learning and adaptation is limited; and errors are not uniformly distributed across patterns. The recognition-by-parts approach proposed here to address the challenges listed above is driven by transduction and boosting. Transduction employs local estimation and inference to find a compatible labeling of the joint training and test data. Active learning further promotes the recognition process by making incremental choices about what is best to learn, and when, in order to accumulate the evidence needed to disambiguate among alternative interpretations. The interplay between labeled (“training”) and unlabeled (“test”) data points mediates between semi-supervised learning and transduction.
The additional information coming from the unlabeled data points includes constraints and hints about the meaningful relations and regularities affecting their very discrimination. Boosting combines, in an iterative fashion, part-based, model-free and non-parametric simple weak classifiers, whose contents and relative ranking are driven by their “strangeness” characteristics. The scope of the proposed approach also covers stream-based data points and includes change detection. The benefits of the proposed discriminative recognition-by-parts approach include a priori setting of rejection thresholds, no need for image segmentation, and robustness to occlusion, clutter and disguise. Examples drawn from biometrics illustrate the proposed approach and show its feasibility and utility.
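The abstract does not define “strangeness” or the transductive labeling rule, so the following is only a sketch under stated assumptions. It assumes a k-nearest-neighbour strangeness (the sum of the k smallest distances to samples of the same class divided by the sum of the k smallest distances to samples of other classes) and a simple p-value test for open set rejection. All function names, the `min_p` threshold and the toy data are hypothetical; boosting and the part-based representation are not shown.

```python
import math


def knn_strangeness(x, label, samples, labels, k=1):
    """Assumed definition: k-NN same-class distance over other-class distance."""
    same = sorted(math.dist(x, s) for s, l in zip(samples, labels) if l == label)
    other = sorted(math.dist(x, s) for s, l in zip(samples, labels) if l != label)
    return sum(same[:k]) / (sum(other[:k]) + 1e-12)  # guard against zero


def training_strangeness(samples, labels, k=1):
    """Leave-one-out strangeness of every training (gallery) sample."""
    values = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest_samples = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        values.append(knn_strangeness(x, y, rest_samples, rest_labels, k))
    return values


def classify_open_set(probe, samples, labels, k=1, min_p=0.2):
    """Try each putative label; reject (None) if no label is credible.

    The p-value of a putative label is the fraction of samples, probe
    included, that are at least as strange as the probe under that label.
    """
    train_alpha = training_strangeness(samples, labels, k)
    best_label, best_p = None, 0.0
    for label in sorted(set(labels)):
        alpha = knn_strangeness(probe, label, samples, labels, k)
        p = (sum(a >= alpha for a in train_alpha) + 1) / (len(train_alpha) + 1)
        if p > best_p:
            best_label, best_p = label, p
    return (best_label if best_p >= min_p else None), best_p


# Toy gallery: two well-separated clusters in the plane.
gallery = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
gallery_labels = ["a"] * 4 + ["b"] * 4

known_label, known_p = classify_open_set((0.5, 0.5), gallery, gallery_labels)
unknown_label, unknown_p = classify_open_set((5, 5), gallery, gallery_labels)
```

With this toy data, the probe inside cluster "a" is accepted with that label, while the probe halfway between the clusters is stranger than every gallery sample under either label and is rejected, which is the open set behaviour the abstract describes. Because the rejection rule is a p-value threshold, it can be fixed before seeing any probes.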
Harry Wechsler received the PhD degree in information and computer science from the University of California, Irvine. Currently, he is a professor of computer science and director for the Center of Distributed and Intelligent Computation at George Mason University (GMU). His research in the field of intelligent systems focuses on computational vision, image and signal processing, data mining, and machine learning and pattern recognition, with applications to biometrics, face recognition, gait analysis and performance evaluation; augmented cognition and HCI; change detection and link analysis; and video processing and surveillance. He has published more than 250 scientific papers, serves on the editorial boards of major scientific publications, and is the author of Computational Vision (Academic Press, 1990). As a leading researcher in face recognition, he organized and directed in 1997 the seminal NATO Advanced Study Institute (ASI) on “Face Recognition: From Theory to Applications”, whose proceedings were published by Springer in 1998. His book Reliable Face Recognition Methods, which breaks new ground in applied modern pattern recognition and biometrics, was published by Springer in the fall of 2006. Dr. Wechsler has directed at GMU the design and development of FERET, which has become the standard facial database for benchmark studies and experimentation. He was elected an IEEE Fellow in 1992 for “contributions to spatial/spectral image representations and neural networks and their theoretical integration and application to human and machine perception” and an IAPR (International Association for Pattern Recognition) Fellow in 1998. He was granted (together with his former doctoral students) two patents by the USPTO in 2004, on fractal image compression using quad-q-learning (licensed in 2006) and on feature-based classification (for face recognition).
Two additional patents (together with his former doctoral students), on open set (face) recognition (and intrusion/outlier detection) and on change detection using martingales, are now pending with the USPTO.