A Deep Feature Fusion Framework for Accurate and Efficient Cross-Modal Retrieval in Digital Libraries

Sun, X.; Meng, W.; Zhang, X.

Journal of Engineering, Project, and Production Management, 2026, 16(6), 2025-185

Xiaoyu Sun¹, Wenjie Meng², and Xuesong Zhang³

¹ Associate Researcher, Library of the China University of Petroleum (East China), Qingdao, 266580, China, E-mail: sunxiaoyu0818@126.com (corresponding author)
² Librarian, Library of the China University of Petroleum (East China), Qingdao, 266580, China
³ Associate Researcher, Library of the China University of Petroleum (East China), Qingdao, 266580, China

Project Management

Received August 27, 2025; revised December 21, 2025 and June 5, 2026; accepted June 9, 2026

Available online June 17, 2026

Abstract: To address the multi-modal resource retrieval needs of digital libraries, archives, and knowledge bases, this study proposes a Feature Fusion Cross Modal Hashing (FFCMH) model. The approach constructs a specialized dataset through multi-level filtering and cross-modal denoising, employs autoencoders for feature fusion, and enhances image-text extraction through semantic segmentation, thereby supporting efficient retrieval. Experimental results show that FFCMH outperforms existing mainstream models, such as Locality-Sensitive Hashing, Semantic Topic Multi-modal Hashing (STMH), and Deep Cross Modal Hashing (DCMH), on metrics including recall rate and average precision on professional datasets. For instance, on the Flickr-25k dataset, FFCMH achieves a maximum recall rate of 95.5% for Image-To-Text Cross Modal (I2TCM) retrieval and 86.7% for Text-To-Image Cross Modal (T2ICM) retrieval. FFCMH also shows clear advantages in retrieval accuracy and efficiency on self-built datasets, with average precision values of 0.948 for I2TCM and 0.938 for T2ICM, while requiring significantly less retrieval time than the other models. The model thus provides technical support for libraries seeking efficient and precise cross-modal resource retrieval.

Keywords: Library management, cross-modal hashing, feature fusion, semantic segmentation, resource retrieval.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Requests for reprints and permissions at eppm.journal@gmail.com.

Citation: Sun, X., Meng, W., and Zhang, X. (2026). A Deep Feature Fusion Framework for Accurate and Efficient Cross-Modal Retrieval in Digital Libraries. Journal of Engineering, Project, and Production Management, 16(6), 2025-185.

DOI: 10.32738/JEPPM-2025-185
