The rapid expansion of the metaverse and of augmented reality (AR) and virtual reality (VR) applications is driving growing demand for effective 3D scene description technologies. However, a noticeable disconnect remains between academic research and industry-driven standardization efforts. Academic work often focuses on semantic richness, for example through the development of 3D scene graphs. Industry standards instead prioritize interoperability, as exemplified by MPEG's extensions to the Khronos Graphics Language Transmission Format (glTF). This survey seeks to bridge this divide by systematically reviewing and comparing pivotal contributions from both academia and industry on 3D scene description technologies and their standardization. Our analysis highlights notable differences in methodologies, data formats, and core objectives. Key challenges include the need for unified data representations and for standardized evaluation benchmarks that support broader integration. This survey emphasizes the urgent need for closer collaboration between academia and industry and proposes potential pathways toward a unified framework that can accelerate the real-world adoption of advanced 3D scene description technologies.