Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images
This article presents a new approach to learning geometry, such as depth and surface normals, from images…
The paper discusses CREMA, a modality-fusion framework aimed at enhancing the efficiency of multimodal compositional video reasoning…
This article presents Mamba-ND, a design that extends the Mamba architecture to multi-dimensional data. Transformers, the de-facto…
The Segment Anything Model (SAM) is a popular image processing tool, known for its segmentation accuracy, variety of…
The article introduces a novel method called Point-VOS for Video Object Segmentation (VOS). Traditional VOS methods require…
The paper presents a novel method for generating PBR images directly, avoiding the challenges and inaccuracies associated…
The article introduces SPHINX-X, a Multi-modality Large Language Model (MLLM) series, which is an enhancement of the…
The study presents InstaGen, a novel method to enhance an object detector’s ability by training on a synthetic dataset…
The study aimed to identify machine learning models that could efficiently categorize tweets concerning eating disorders. Over…
This research focuses on understanding the information encoded in speech processing by using vector representations of speech…