Modalities · World Models · Industry Pillars
Not a peer modality — a system-level paradigm that integrates all modalities below. Learns 3D spatial structure, physical laws, causal logic, and environmental interaction to simulate how the real world operates.
Text, dialogue, code, logical reasoning, knowledge induction. The foundation for reasoning and human-computer interaction.
Independent acoustic dimension. Speech recognition, synthesis, sound-field understanding, and voiceprint analysis.
Image + Video. Static 2D parsing and feature recognition, plus temporal understanding of continuous frames and dynamic scenes.
Physical-world interaction: robotic control, manipulation, locomotion, and the mapping from perception to action.
Raw material of AI. Collection, cleaning, labeling, general corpora, and vertical-industry data. Sets the upper limit of model capability.
Logical brain of AI. Foundational architectures, training paradigms, fine-tuning, and alignment strategies.
Physical carrier of AI. GPUs, NPUs, server clusters, and edge devices that execute training and inference.
Physical foundation for large-scale AI. Power supply, data-center cooling, carbon footprint. Rising to co-equal pillar status as frontier-model power demand reaches gigawatt scale.