VLA/VLM Engineer
Position: Application Deployment Division
Openings: 1
Application deadline: Not yet determined

Job Summary:

The VLA/VLM Engineer will design and deploy vision-language-action models that enable humanoid robots to perceive scenes, understand context, and act intelligently. The role bridges computer vision, natural language processing, and decision-making, empowering robots to interpret human instructions, recognize behaviors, and perform real-world tasks safely and autonomously.
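
As a rough, hypothetical illustration of that perception-reasoning-action cycle, the Python sketch below shows how the pieces fit together; robot, vlm, and policy and all of their methods are placeholder names, not a real API:

    # Hypothetical perceive -> understand -> act loop; every name here is a
    # placeholder standing in for real perception, VLM, and control components.
    def run_episode(robot, vlm, policy, instruction: str) -> None:
        while not robot.task_done():
            obs = robot.get_observation()             # RGB/Depth, LiDAR, audio
            context = vlm.describe(obs, instruction)  # vision-language grounding
            action = policy.decide(context, obs)      # VLA policy -> motor command
            robot.execute(action)                     # hand off to the control layer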

Position: Vision-Language-Action (VLA/VLM) Engineer
Department: AI Application
Location: Gia Lam, Ha Noi
Reports to: Head of AI Application

Key Responsibilities:

  • Design, train, and deploy vision-language-action (VLA) models that combine visual perception, language understanding, and robotic control.
  • Develop multimodal AI pipelines that process RGB/Depth video, LiDAR, and audio data for perception and reasoning.
  • Implement VLMs (e.g., LLaVA, GPT-4V, BLIP-2, Florence-2, Kosmos-2, or Gemini-based models) for scene understanding and natural interaction with humans (see the sketch after this list).
  • Integrate AI models with the humanoid robot’s control and decision layers, enabling the robot to interpret human commands and respond intelligently.
  • Build and optimize prompt-based reasoning and visual grounding systems for human-robot dialogue and situational awareness.
  • Conduct experiments in simulation and real-world environments to test perception–reasoning–action loops.
  • Collaborate with cross-functional teams (Vision, RL, Motion, Control, Conversation) to ensure seamless end-to-end integration.
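
For concreteness, the snippet below is a minimal sketch of the VLM-based scene understanding mentioned above, using a public BLIP-2 checkpoint through the Hugging Face transformers library; the model choice, prompt, and image path are illustrative assumptions, not a prescribed stack:

    import torch
    from PIL import Image
    from transformers import Blip2Processor, Blip2ForConditionalGeneration

    # Load a public BLIP-2 checkpoint (illustrative; any supported VLM would do).
    processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
    model = Blip2ForConditionalGeneration.from_pretrained(
        "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
    ).to("cuda")

    # Ask a grounded question about a single RGB frame (hypothetical file name).
    image = Image.open("scene.jpg")
    prompt = "Question: What objects are on the table? Answer:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(processor.decode(output_ids[0], skip_special_tokens=True))

On a robot, the same query would typically run through a quantized or TensorRT-compiled model on edge hardware rather than a full-precision GPU checkpoint.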

Requirements:

Must Have:

  • Bachelor’s, Master’s, or PhD degree in Computer Science, Artificial Intelligence, Robotics, or a related field.
  • Strong programming skills in Python and C++; experience with PyTorch or TensorFlow.
  • Solid understanding of multimodal learning (vision + language + action) and transformer architectures (ViT, CLIP, BLIP, Flamingo, or LLaVA).
  • Hands-on experience training or fine-tuning VLMs or LLMs with vision input for image/video captioning, grounding, or reasoning.
  • Experience with dataset preparation and annotation for multimodal tasks (e.g., VQA, instruction-following, embodied navigation).
  • Knowledge of deployment and inference optimization on edge hardware such as NVIDIA Jetson, AGX Orin, or RTX platforms.
  • Familiarity with ROS/ROS2, real-time inference, and integrating AI perception modules into robotic systems (a minimal node sketch follows this list).
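
As one way to picture the last requirement, here is a minimal ROS 2 (rclpy) node sketch of the subscribe-infer-publish pattern; the topic names, node name, and the stubbed-out inference step are all assumptions for illustration:

    import rclpy
    from rclpy.node import Node
    from sensor_msgs.msg import Image
    from std_msgs.msg import String

    class PerceptionBridge(Node):
        # Subscribes to camera frames, runs a (stubbed) VLM inference step,
        # and publishes the result for downstream decision layers.
        def __init__(self):
            super().__init__("perception_bridge")
            self.sub = self.create_subscription(
                Image, "/camera/color/image_raw", self.on_frame, 10)
            self.pub = self.create_publisher(
                String, "/perception/scene_description", 10)

        def on_frame(self, msg: Image):
            # Placeholder for the real inference call (e.g., a TensorRT-optimized
            # VLM running on a Jetson-class device).
            caption = f"frame at t={msg.header.stamp.sec}s: <scene description>"
            self.pub.publish(String(data=caption))

    def main():
        rclpy.init()
        rclpy.spin(PerceptionBridge())
        rclpy.shutdown()

    if __name__ == "__main__":
        main()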

Nice to Have:

  • Experience with Vision-Language-Action models (RT-2, OpenVLA, PaLM-E, or GR-1).
  • Research or publications in robotics, multimodal AI, or embodied intelligence (CVPR, ICRA, NeurIPS, CoRL, RSS).
  • Experience in robotic control via language (e.g., natural language navigation, human instruction following).
  • Understanding of reinforcement learning with multimodal feedback.
  • Familiarity with safety alignment and hallucination mitigation for large vision-language models.

Benefits:

  • Competitive salary and benefits package (salary negotiable).
  • Opportunities for professional development and career growth.
  • Flexible work arrangements.
  • A collaborative and innovative work environment where ideas are valued and creativity is encouraged.
