Special Sessions – VCIP 2026

Date TBA Time TBA Location TBA

Abstract

Video coding is a cornerstone of modern visual communication, enabling applications such as video streaming, conferencing, broadcasting, cloud gaming, immersive media, and intelligent visual services. While international standards such as H.265/HEVC, AV1, AV2, and H.266/VVC have continuously improved compression efficiency, emerging visual applications are driving video coding beyond traditional rate–distortion optimization. Future coding systems must simultaneously address compression performance, perceptual quality, task utility, deployment feasibility, model generalization, and standardization requirements. In response, next-generation video coding is evolving through multiple complementary directions, including advanced coding tools extending existing standards, end-to-end neural video coding, machine- and task-oriented compression, generative visual communication, and immersive media coding. These developments are fostering the convergence of conventional codec evolution, learned compression, application-aware representations, and emerging standardization efforts. In line with these trends, we invite contributions on advanced video coding tools, neural video coding, perceptual and task-oriented compression, machine vision and vision-language-model-based communication, generative video communication, immersive media coding, codec deployment, and future video coding standardization.

Organizers

Zhuoyuan Li (The Hong Kong Polytechnic University, HK)
Bolin Chen (DAMO Academy, Alibaba Group)
Yichi Zhang (Purdue University, US)
Li Li (University of Science and Technology of China, CN)
Giuseppe Valenzise (Université Paris-Saclay, FR)
Yan Ye (DAMO Academy, Alibaba Group)

Date TBA Time TBA Location TBA

Abstract

Recent advances in visual sensing, computational imaging, neural representations, and multimodal learning are transforming the way visual data are acquired, processed, communicated, and understood. Modern visual systems increasingly rely on high-dimensional visual data that extend beyond conventional RGB images and videos to include event streams, light fields, hyperspectral and polarization imagery, LiDAR, time-of-flight sensing, neural scene representations, 3D Gaussian splats, and hybrid multimodal sensing. These data capture rich spatial, temporal, geometric, spectral, and cross-modal information, enabling more robust visual processing under challenging conditions such as fast motion, low light, occlusion, missing modalities, and distribution shift. At the same time, the growing complexity and volume of high-dimensional visual data create new challenges in acquisition, restoration, compression, representation, quality assessment, perception, and reasoning. Emerging solutions increasingly integrate imaging, communication, perception, and multimodal intelligence to support reliable visual understanding and decision making. In line with these developments, we invite contributions on computational imaging and novel sensing systems, event-based and multimodal vision, high-dimensional visual restoration and enhancement, learned compression, implicit and neural representations, quality assessment, cross-modal fusion and alignment, robust visual perception, vision-language reasoning, trustworthy AI, and efficient visual communication for next-generation visual systems.

Organizers

Haowen Bai (Nanyang Technological University, SG)
Rui Zhao (Nanyang Technological University, SG)
Zeyu Xiao (National University of Singapore, SG)
Taewoo Kim (INSAIT, BG)
Hadi Amirpour (University of Klagenfurt, AT)
Tae Hyun Kim (Hanyang University, KR)

Date TBA Time TBA Location TBA

Abstract

With the increasing demand for immersive visual media, lenslet video has become an important representation format for dense light fields, whether captured by plenoptic cameras, camera arrays, and moving gantry systems or rendered from computer-generated scenes. Lenslet video preserves rich spatial and angular information and is useful for multiview rendering, refocusing, glasses-free 3D display, industrial inspection, immersive communication, and future visual media services. At the same time, its large data volume and strong spatial-angular-temporal redundancy make efficient compression and processing highly challenging.

This special session focuses on recent advances in lenslet video coding and processing, with particular attention to current standardization activities in ISO/IEC JTC 1/SC 29/WG 4. The working group has launched the Lenslet Video Coding standardization activity, and the work is now moving from exploration and Call for Proposals (CfP) evaluation toward working draft development. This makes VCIP 2026 a timely venue to bring together researchers working on coding tools, codec-agnostic preprocessing and postprocessing, metadata design, rendering-oriented quality preservation, and objective and subjective evaluation of lenslet video.

The session welcomes contributions on technologies that improve compression efficiency while preserving lenslet reconstruction quality and rendered multiview quality. Topics include, but are not limited to:

Lenslet video coding tools and codec-agnostic coding frameworks
Microimage cropping, alignment, relocalization, and reconstruction
Disparity-guided pixel rearrangement, shuffling, filtering, and luminance correction
Metadata representation, bitstream packing, and compatibility with existing video codecs
VVC-compatible and future-codec-compatible coding architectures
Objective and subjective quality assessment of lenslet video and rendered views
Multiview rendering, view synthesis, and refocusing from lenslet video
Calibration and microimage center localization for plenoptic cameras
In-camera processing and acquisition methods for lenslet video
Applications of lenslet video in immersive communication, 3D display, inspection, robotics, and visual analytics

By gathering recent academic and standardization-related progress, this session aims to provide a clear picture of where lenslet video coding stands now, what technical problems remain open, and how the research community can contribute to the next stage of interoperable immersive light field media.

Organizers

Xin Jin (Tsinghua University, CN)
Mehrdad Teratani (Nagoya University, JP)