We systematically compared single-modality training with multimodal joint optimization (MVC). Result: MVC consistently achieved the highest performance, outperforming the single-modality baselines by 2-6%.
UniMolV2 emerged as the most robust chemical backbone for multimodal tasks (CPS=1.0), followed by KPGT.