Computer Vision and Pattern Recognition

Concerned with enabling computers to interpret and understand visual inputs such as images and videos.

MatSynth: A Modern PBR Materials Dataset

root January 12, 2024 0

The article introduces MatSynth, a dataset comprising more than 4,000 ultra-high-resolution PBR materials. These materials are key to defining how light interacts with the surfaces of virtual objects…

Computation and Language, Computer Vision and Pattern Recognition

LEGO: Language Enhanced Multi-modal Grounding Model

root January 12, 2024 0

The LEGO model is a multi-modal model that emphasizes both global and local information across different modalities. Unlike existing models, which focus mainly on global information, the LEGO model can…

Computation and Language, Computer Vision and Pattern Recognition

PALP: Prompt Aligned Personalization of Text-to-Image Models

root January 12, 2024 0

The article discusses a new concept called ‘prompt-aligned personalization’ for improving the performance of text-to-image models. It addresses the limitations of existing personalization methods, such as compromising the ability to…

Computer Vision and Pattern Recognition

Gaussian Shadow Casting for Neural Characters

root January 12, 2024 0

This research presents a new shadow model named Gaussian Shadow Casting (GSC). The model can reconstruct 3D neural characters from videos, with improved shadows and shading, even in challenging outdoor…

Computer Vision and Pattern Recognition

Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors

root January 12, 2024 0

The article presents a new method for visual dubbing, which is the process of generating lip motions of an actor in a video to synchronize with given audio. The method,…

Artificial Intelligence, Computer Vision and Pattern Recognition

E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

root January 12, 2024 0

The article introduces E$^{2}$GAN, a novel approach for efficient training of GANs for image-to-image translation. This technique uses data distillation from large-scale text-to-image diffusion models, such as Stable Diffusion, for…

Computer Vision and Pattern Recognition

Distilling Vision-Language Models on Millions of Videos

root January 12, 2024 0

The research aims to replicate the success of image-text data for video-language models. The researchers fine-tuned a video-language model from a strong image-language baseline with synthesized instructional data. The adapted…

Artificial Intelligence, Computer Vision and Pattern Recognition

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

root January 11, 2024 0

The article introduces a novel audio-video recognition approach called the Audio-Video Transformer (AVT), which uses an effective spatio-temporal representation to improve action recognition. The research reduces cross-modality complexity via an audio-video…

Artificial Intelligence, Computation and Language

FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild

root January 11, 2024 0

The article introduces ‘FunnyNet-W’, a model that relies on cross- and self-attention for visual, audio, and text data to predict funny moments in videos. Unlike most methods that rely on…

Computation and Language, Computer Vision and Pattern Recognition

ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

root January 11, 2024 0

The article introduces ‘Anim-400K’, a large-scale dataset designed to aid in the automated end-to-end dubbing of video content. With 60% of online content published in English and only 18.8% of…
