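Differences between 3D Understanding and Scene Recognition
3D Understanding
The ability to recognize objects and scenes in 3D space and to understand their spatial context.
- Object Reconstruction: reconstructing the shape of a 3D object from one or more images.
- Depth Estimation: estimating per-pixel depth from a 2D image.
- 3D Pose Estimation: estimating an object's position and orientation in 3D space.
- Point Cloud Processing: analyzing point clouds from sensors such as LiDAR to understand the environment.
- LLMs can be multimodal, learning from images, video, and other data types in addition to text.
- Text descriptions help a model understand 3D objects and scenes. For example, it can learn that the word "chair" covers many different shapes and link that knowledge to 3D models.
- The knowledge an LLM accumulates from large text corpora can be applied to 3D understanding tasks, injecting priors about object shape, components, and the relationships between objects.
- LLMs can generate natural-language descriptions of 3D data, which makes 3D understanding more intuitive.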
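The depth estimation and point cloud items above are linked by a simple geometric step: once a depth value is known for each pixel, a pinhole camera model can lift the image into a point cloud. The following is a minimal sketch of that step, not a reference implementation. The function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth map are all assumptions made for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point3D]:
    """Lift each pixel (u, v) with depth z to a 3D point in the camera frame.

    Assumes an ideal pinhole camera with no lens distortion:
        x = (u - cx) * z / fx,  y = (v - cy) * z / fy
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as a missing measurement
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Hypothetical 2x2 depth map (meters) with one missing pixel.
depth_map = [[2.0, 2.0], [0.0, 4.0]]
cloud = backproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Real pipelines usually vectorize this and also apply the camera-to-world pose. The loop form is kept here only to make the per-pixel formula visible.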
Research directions in 3D Scene Understanding
Multi-modal 3D Scene Understanding, taxonomy from "Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation" (https://arxiv.org/pdf/2310.15676)
- 3D+2D Scene Understanding
    - Multi-modal Outdoor 3D Object Detection
        - Camera-LiDAR Projection Based
        - Attention Mechanism Based
        - Cross-modal Transformer Based
    - Multi-modal Indoor 3D Object Detection
    - Multi-modal Outdoor 3D Semantic Segmentation
        - Interactive Fusion Based
        - Knowledge Distillation Based
        - Unsupervised Domain Adaptation Based
    - Multi-modal Indoor 3D Semantic Segmentation
- 3D+Language Scene Understanding
    - 3D Visual Grounding
        - Two-Stage Detect-then-Match
        - One-Stage Language-Guided
    - 3D Dense Captioning
        - Two-Stage Detect-then-Describe
        - One-Stage Parallel Detect-Describe
    - 3D Question Answering
    - Text-Driven 3D Scene Generation
        - Mesh Based
        - Neural Radiance Field Based
    - Open-Vocabulary 3D Recognition
    - Miscellaneous
        - Joint 3D Grounding and Captioning
        - 3D Vision-Language Pre-Training
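The "Camera-LiDAR Projection Based" branch of this taxonomy rests on one geometric operation: mapping a LiDAR point into the camera image so that 3D points and 2D pixels can be associated. The following is a minimal sketch of that projection, assuming an ideal pinhole camera and a known rigid LiDAR-to-camera transform. The function name `project_lidar_point` and all numeric values are hypothetical and are not taken from the survey.

```python
from typing import Optional, Sequence, Tuple


def project_lidar_point(
    p_lidar: Sequence[float],
    R: Sequence[Sequence[float]],  # 3x3 rotation, LiDAR frame -> camera frame
    t: Sequence[float],            # translation, LiDAR frame -> camera frame
    fx: float, fy: float, cx: float, cy: float,
) -> Optional[Tuple[float, float]]:
    """Return the pixel (u, v) for a LiDAR point, or None if it is behind the camera."""
    # Rigid transform into the camera frame: p_cam = R @ p_lidar + t
    xc, yc, zc = [
        sum(R[i][j] * p_lidar[j] for j in range(3)) + t[i] for i in range(3)
    ]
    if zc <= 0:  # points at or behind the image plane cannot be projected
        return None
    # Perspective division followed by the intrinsic mapping.
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# Hypothetical calibration: sensors aligned, so R is the identity and t is zero.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_lidar_point((1.0, 2.0, 4.0), identity, (0.0, 0.0, 0.0),
                            fx=100.0, fy=100.0, cx=50.0, cy=40.0)
behind = project_lidar_point((0.0, 0.0, -1.0), identity, (0.0, 0.0, 0.0),
                             fx=100.0, fy=100.0, cx=50.0, cy=40.0)
```

A fusion method would then sample image features at `pixel` and attach them to the 3D point. Checking that the pixel falls inside the image bounds is left out of this sketch.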
Related work (still being organized)
- https://github.com/geomagical/GeoSynth
- https://scenefun3d.github.io/ (CVPR2024 Oral)
- https://opennerf.github.io/ (ICRA 2024)
- https://chat-with-nerf.github.io/ (ICRA 2024)
- https://pengsongyou.github.io/openscene (CVPR2023)
- https://www.lerf.io/ (ICCV2023)
- https://www.garfield.studio/ (CVPR2024)
- https://opensun3d.github.io/#challenge (1st Workshop on Open-Vocabulary 3D Scene Understanding)
- https://lifuguan.github.io/gpnerf-pages/ (GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding) ![[Pasted image 20240608165922.png]]
- UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes (https://rozdavid.github.io/unscene3d) ![[Pasted image 20240608170704.png]]
- MiKASA: Multi-Key-Anchor Scene-Aware Transformer for 3D Visual Grounding (https://github.com/dfki-av/mikasa-3dvg)
- Oryon: Open-Vocabulary Object 6D Pose Estimation (https://jcorsetti.github.io/oryon/) ![[Pasted image 20240608172222.png]]
- Memory-based Adapters for Online 3D Scene Perception
- Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
- PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
- Multi-Level Neural Scene Graphs for Dynamic Urban Environments
- Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration
- Panoptic Lifting for 3D Scene Understanding with Neural Fields (https://nihalsid.github.io/panoptic-lifting/). Affiliations: Technical University of Munich; Meta Reality Labs ![[Pasted image 20240609144326.png]]
- https://3d-grand.github.io/
- https://openobj.github.io/
Awesome list
- https://github.com/bertjiazheng/awesome-scene-understanding
- https://github.com/ActiveVisionLab/Awesome-LLM-3D
- https://github.com/zchoi/Awesome-Embodied-Agent-with-LLMs
3D Reconstruction
- View rendering?
- Mesh reconstruction?
- RGB? or RGB-D?
- On-device? or Cloud?
3D Reconstruction
- Multi-view Reconstruction
    - Structure-from-Motion (SfM)
    - Multi-view Stereo (MVS)
- Single-view Reconstruction
    - Image-based 3D Reconstruction
    - Depth Estimation
- 3D Shape Reconstruction
    - Shape-from-X (Shape-from-Silhouette, Shape-from-Texture)
    - Volumetric methods
    - Surface Reconstruction
- Neural Reconstruction
    - Neural Radiance Fields (NeRF)
    - Implicit Neural Representations
    - Diffusion Models
- Dynamic 3D Reconstruction
    - 4D Reconstruction
    - Temporal Coherence in Reconstruction
- Semantic Reconstruction
    - Semantic 3D Reconstruction
    - Scene understanding and Segmentation
- Point Cloud processing
- Gaussian Splatting
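To make the Neural Radiance Fields (NeRF) entry in this outline concrete, the sketch below shows the discrete volume rendering step that NeRF-style methods use to turn samples along one camera ray into a pixel color. Each sample has opacity alpha_i = 1 - exp(-sigma_i * delta_i) and is weighted by the transmittance accumulated in front of it. The function name `composite_ray` and the sample values are assumptions for illustration. In a real system the densities and colors would come from a trained network.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def composite_ray(
    sigmas: Sequence[float],   # volume density at each sample along the ray
    colors: Sequence[Color],   # RGB emitted at each sample
    deltas: Sequence[float],   # distance between adjacent samples
) -> Tuple[Color, float]:
    """Front-to-back alpha compositing of samples along a single ray.

    Returns the rendered color and the transmittance left after the last sample.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    out = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2]), transmittance


# Hypothetical ray: empty space (zero density) followed by a dense green surface.
rgb, remaining = composite_ray(
    sigmas=[0.0, 50.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    deltas=[0.5, 1.0],
)
```

The red sample contributes nothing because its density is zero, so the rendered color is dominated by the green surface behind it.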
Project list
- HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video (https://zc-alexfan.github.io/hold) ![[Pasted image 20240608163636.png]]
- RoHM: Robust Human Motion Reconstruction via Diffusion (https://sanweiliti.github.io/ROHM/ROHM.html) ![[results_prox_rgbd_init_full.mp4]]
- SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors (https://daveredrum.github.io/SceneTex/) ![[Pasted image 20240608164407.png]]
- SuperPrimitive: Scene Reconstruction at a Primitive Level (https://makezur.github.io/SuperPrimitive/)
- MultiDiff: Consistent Novel View Synthesis from a Single Image
- SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream
- SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
- Synergistic Global-space Camera and Human Reconstruction from Videos
- Towards Detailed and Robust 3D Clothed Human Reconstruction with High-Frequency and Low-Frequency Information of Parametric Body Models
- Snapshot Lidar: Fourier embedding of amplitude and phase for single-image depth reconstruction
- Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments
- G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images (https://llrtt.github.io/G-NeRF-Demo/). Affiliations: South China University of Technology; The University of Adelaide; Guangzhou Shiyuan Electronics Co., Ltd.; Pazhou Lab
- Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle (https://nju-3dv.github.io/projects/Gaussian-Flow/). Affiliations: Nanjing University; Alibaba Group; Fudan University
- VOODOO 3D: VOlumetric pOrtrait Disentanglement fOr Online 3D head reenactment (https://p0lyfish.github.io/voodoo3d/). Affiliations: MBZUAI; ETH Zurich; VinAI Research; Pinscreen
- NARUTO: Neural Active Reconstruction from Uncertain Target Observations (https://oppo-us-research.github.io/NARUTO-website/). Affiliations: OPPO US Research Center; Clemson University; Indiana University
- Diffusion Time-step Curriculum for One Image to 3D Generation (https://paperswithcode.com/paper/diffusion-time-step-curriculum-for-one-image)
- NTO3D: Neural Target Object 3D Reconstruction with Segment Anything (https://github.com/ucwxb/NTO3D?tab=readme-ov-file)
- G3DR: Generative 3D Reconstruction in ImageNet (https://preddy5.github.io/g3dr_website/)
- NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis (https://sinoyou.github.io/nelf-pro/)
- OpenVLA: An Open-Source Vision-Language-Action Model (https://openvla.github.io/)
3D Scene Editing
Project list
- InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields (https://ivrl.github.io/InNeRF360/) ![[Pasted image 20240608171728.png]]
- Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
- Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
- TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
- HOIAnimator: Text-Prompt Human-Object Animations Generation with Perceptive Diffusion Models
- ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing