root, Author at BytesArchive

February 11, 2024

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion

This paper introduces CREMA, a new and efficient modality-fusion framework designed to improve video reasoning. By leveraging…

February 11, 2024

The Segment Anything Model (SAM) is a widely used tool for image processing, but its application in…

February 11, 2024

The article introduces SPHINX-X, a series of Multi-modality Large Language Models (MLLMs) built on SPHINX. This…

February 10, 2024

The paper introduces a new class of diffusion models for image data built on a state space architecture…

February 10, 2024

The article discusses the challenge of 6D object pose estimation and the improved accuracy achieved by incorporating…

February 10, 2024

The paper presents a new dataset called DAPlankton for developing and benchmarking domain adaptation methods for image…

February 10, 2024

The article presents a novel training strategy for deep denoisers in signal and image processing. The strategy…

February 10, 2024

The article presents a new method for estimating robot pose from RGB images, even when robot internal…

February 10, 2024

This research presents an ordinal regression framework for assessing disease severity in chest radiographs using deep learning…

February 10, 2024

This article introduces DiffSpeaker, a new model for speech-driven 3D facial animation. Traditional models use either Diffusion…