CyberBERT: A Semantic Search Framework for Security Terminologies Using Transformer Models

Authors

  • Rudolf Sinaga, Faculty of Computer Science, Dinamika Bangsa University
  • Frangky, Faculty of Computer Science, Dinamika Bangsa University

DOI:

https://doi.org/10.70062/globalscience.v1i4.179

Keywords:

Semantic Search; Cybersecurity Ontologies; Transformer Models; Terminology Alignment; Semantic Interoperability

Abstract

The rapid expansion of cybersecurity standards and threat intelligence frameworks has led to significant semantic fragmentation among security terminologies, hindering effective information retrieval and interoperability across systems. Traditional keyword-based search cannot capture the contextual meaning of security terms, particularly within formal frameworks such as NIST, MITRE ATT&CK, and CWE. This study addresses the problem by proposing CyberBERT, a transformer-based semantic search framework that aligns cybersecurity terminologies through deep contextual representations and ontology-driven reasoning.
Research Objectives: The primary objective of this research is to develop a semantic retrieval model that captures conceptual relationships between security terms beyond lexical similarity.
Methodology: The proposed methodology fine-tunes a BERT-based model on the NIST Glossary corpus, combining masked language modeling and triplet loss objectives to produce discriminative semantic embeddings. These embeddings are then aligned with cybersecurity ontologies, including MITRE ATT&CK and CWE, to improve semantic consistency and explainability. Retrieval ranks terms by cosine similarity in a 768-dimensional embedding space and is evaluated with Mean Reciprocal Rank (MRR) and Precision@K.
Results: CyberBERT achieves an MRR of 0.832, outperforming domain-adapted baselines such as SecureBERT and CyBERT. Ontology alignment improves semantic accuracy by over 6%, and robustness evaluations show resilience against adversarial linguistic perturbations. A t-SNE visualization shows coherent semantic clusters aligned with the five core functions of the NIST Cybersecurity Framework (Identify, Protect, Detect, Respond, Recover).
Conclusions: CyberBERT bridges semantic gaps across cybersecurity terminologies by combining transformer-based contextual learning with ontological reasoning. The framework offers a robust, interpretable, and scalable solution for semantic search that supports interoperability and knowledge discovery in cybersecurity operations and standards harmonization.
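The abstract names four technical ingredients: a triplet loss objective, cosine-similarity retrieval in an embedding space, and evaluation with MRR and Precision@K. The sketch below shows the textbook form of each so the reported MRR of 0.832 can be read concretely; it is not the authors' implementation. Assumptions: the triplet loss uses cosine distance (1 minus cosine similarity) with a hypothetical margin of 0.5 (the abstract states neither the distance nor the margin), the masked language modeling objective and the BERT encoder are not shown, and the 3-dimensional vectors and term names are toy stand-ins for 768-dimensional CyberBERT embeddings.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 0.5) -> float:
    """Margin-based triplet loss; cosine distance and margin=0.5 are assumptions.

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def rank_terms(query: Vector, index: Dict[str, Vector]) -> List[str]:
    """Return indexed term names sorted by descending cosine similarity to the query."""
    return sorted(index, key=lambda term: cosine_similarity(query, index[term]), reverse=True)


def mean_reciprocal_rank(rankings: List[List[str]], relevant: List[Set[str]]) -> float:
    """MRR: average over queries of 1/rank of the first relevant result (0 if none)."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for position, term in enumerate(ranked, start=1):
            if term in rel:
                total += 1.0 / position
                break
    return total / len(rankings) if rankings else 0.0


def precision_at_k(ranked: List[str], rel: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for term in ranked[:k] if term in rel) / k


# Hypothetical toy embeddings standing in for 768-d CyberBERT vectors.
TOY_INDEX = {
    "phishing": [0.9, 0.1, 0.0],
    "malware": [0.1, 0.9, 0.1],
    "ransomware": [0.2, 0.8, 0.3],
}
TOY_QUERY = [0.1, 1.0, 0.0]

ranked = rank_terms(TOY_QUERY, TOY_INDEX)
print(ranked)  # ['malware', 'ransomware', 'phishing']
print(mean_reciprocal_rank([ranked], [{"ransomware"}]))  # 0.5
print(precision_at_k(ranked, {"malware", "ransomware"}, k=2))  # 1.0
```

MRR rewards only the position of the first relevant hit, while Precision@K counts every relevant hit in the top K, which is why papers of this kind often report both.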

References

Akbar, K. A., Rahman, F. I., Singhal, A., Khan, L., & Thuraisingham, B. (2023). The Design and Application of a Unified Ontology for Cyber Security. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 14424 LNCS, 23–41. https://doi.org/10.1007/978-3-031-49099-6_2

Al Daoud, E., Al Daoud, L., Asassfeh, M., Al-Shaikh, A., Al-Sherideh, A. S., & Afaneh, S. (2024). Enhancing Cybersecurity with Transformers: Preventing Phishing Emails and Social Media Scams. Proceedings - 2024 IEEE Conference on Dependable and Secure Computing, DSC 2024, 31–36. https://doi.org/10.1109/DSC63325.2024.00017

Biderman, D., Portes, J., Ortiz, J. J. G., Paul, M., Greengard, P., Jennings, C., King, D., Havens, S., Chiley, V., Frankle, J., Blakeney, C., & Cunningham, J. P. (2024). LoRA Learns Less and Forgets Less. Transactions on Machine Learning Research, 2024. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85207034728&partnerID=40&md5=ff7954441d7c9b6e6ea1c5a72b679723

Fu, K., & Dai, J. (2025). Semantic-Aware Framework for Backdoor Detection in AI Models. International Journal on Semantic Web and Information Systems, 21(1). https://doi.org/10.4018/IJSWIS.378675

Gyawali, S., Jiang, Y., & Huang, J. (2025). In-Progress: Augmenting Explainable AI with LLMs to Enhance User Trust in Intelligent Transportation Systems. Proceedings - 46th IEEE Symposium on Security and Privacy Workshops, SPW 2025, 358–360. https://doi.org/10.1109/SPW67851.2025.00051

Hussain, A., Saadia, A., & Alserhani, F. M. (2025). Ransomware detection and family classification using fine-tuned BERT and RoBERTa models. Egyptian Informatics Journal, 30, 100645. https://doi.org/10.1016/j.eij.2025.100645

Jahangir, H., Goel, S. K., & Khurana, S. (2024). Scaling Up the Transformers: A Survey of Training and Inference Optimization Techniques. 2024 International Conference on Electrical, Electronics and Computing Technologies, ICEECT 2024. https://doi.org/10.1109/ICEECT61758.2024.10739061

Jbene, M., Tigani, S., Chehri, A., Chaibi, H., & Saadane, R. (2023). Tracking Dialog States in Goal-Oriented Dialogues using a BERT-Based Siamese Network. Procedia Computer Science, 225, 80–87. https://doi.org/10.1016/j.procs.2023.09.094

Karakoltzidis, A., Battistelli, C. L., Bossa, C., Bouman, E. A., Garmendia Aguirre, I., Iavicoli, I., Jeddi, M. Z., Karakitsios, S., Leso, V., Løfstedt, M., Magagna, B., Sarigiannis, D., Schultes, E., Soeteman-Hernández, L. G., Subramanian, V., & Nymark, P. (2024). The FAIR principles as a key enabler to operationalize safe and sustainable by design approaches. RSC Sustainability, 2(11), 3464–3477. https://doi.org/10.1039/d4su00171k

Khoshvaght, H., Permala, R. R., Razmjou, A., & Khiadani, M. (2025). A critical review on selecting performance evaluation metrics for supervised machine learning models in wastewater quality prediction. Journal of Environmental Chemical Engineering, 13(6), 119675. https://doi.org/10.1016/j.jece.2025.119675

Kollapally, N. M., Geller, J., Keloth, V. K., He, Z., & Xu, J. (2025). Ontology enrichment using a large language model: Applying lexical, semantic, and knowledge network-based similarity for concept placement. Journal of Biomedical Informatics, 168, 104865. https://doi.org/10.1016/j.jbi.2025.104865

Kurnia, R., Brata, Z. A., Nelistiani, G. A., Heo, S., Kim, H., & Kim, H. (2025). Toward Robust Security Orchestration and Automated Response in Security Operations Centers with a Hyper-Automation Approach Using Agentic Artificial Intelligence. Information (Switzerland), 16(5). https://doi.org/10.3390/info16050365

Leblanc, A., Robin, J., Ben Rabah, N., Huang, Z., & Le Grand, B. (2025). Rethinking Cybersecurity Ontology Classification and Evaluation: Towards a Credibility-Centered Framework. https://www.linkedin.com/company/autonomic-cybersecurity-with-adversarial-learning-and-

Li, T., Mohamedikbal, S., Bestry, M., Batley, J., & Edwards, D. (2025). Pangenomics combined with artificial intelligence and precision breeding can accelerate crop improvement. Current Opinion in Plant Biology, 88, 102825. https://doi.org/10.1016/j.pbi.2025.102825

Liu, Z., Lyn, J., Zhu, W., Tian, X., & Graham, Y. (2024). ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024, 1, 622–641. https://doi.org/10.18653/v1/2024.naacl-long.35

Lu, Z., Wang, X., Arafin, M. T., Yang, H., Liu, Z., Zhang, J., & Qu, G. (2024). An RRAM-Based Computing-in-Memory Architecture and Its Application in Accelerating Transformer Inference. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 32(3), 485–496. https://doi.org/10.1109/TVLSI.2023.3345651

Momcilovic, T. B., Buesser, B., Zizzo, G., Purcell, M., & Balta, D. (2024). Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation. CEUR Workshop Proceedings, 3793, 121–128. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85208283226&partnerID=40&md5=2be6d1e85223a03b9147402cf6dfd07f

Moreira, J., Donkers, A., Pauwels, P., Bektas, E., & van Ee, T. (2024). Onto4Reuse: Towards an Ontology Reuse Framework for Knowledge-intensive Software Engineering. CEUR Workshop Proceedings, 3882. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85214569511&partnerID=40&md5=d98e534875c2885ed43d088c53510b07

Oliveira, N., Sousa, N., & Praça, I. (2022). A Search Engine for Scientific Publications: A Cybersecurity Case Study. Lecture Notes in Networks and Systems, 327 LNNS, 108–118. https://doi.org/10.1007/978-3-030-86261-9_11

Ranade, P., Piplai, A., Joshi, A., & Finin, T. (2021). CyBERT: Contextualized Embeddings for the Cybersecurity Domain. Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021, 3334–3342. https://doi.org/10.1109/BigData52589.2021.9671824

Sánchez-Zas, C., Larriva-Novo, X., Villagrá, V. A., Solera-Cotanilla, S., & Sanz-Rodrigo, M. (2026). Dynamic characterisation of cyberattacks based on the MITRE ATT&CK framework applied to the optimisation of a mitigation selection process. Future Generation Computer Systems, 177, 108272. https://doi.org/10.1016/j.future.2025.108272

Sarker, I. H., Janicke, H., Mohammad, N., Watters, P., & Nepal, S. (2024). AI Potentiality and Awareness: A Position Paper from the Perspective of Human-AI Teaming in Cybersecurity. Lecture Notes in Networks and Systems, 874 LNNS, 140–149. https://doi.org/10.1007/978-3-031-50887-5_14

Shah, V. H., & Maniar, R. (2024). A Comprehensive Review of Ontologies in Cybersecurity. In Advanced Cyber Security Techniques for Data, Blockchain, IoT, and Network Protection (pp. 1–20). https://doi.org/10.4018/979-8-3693-9225-6.ch001

Tilbury, J., & Flowerday, S. (2024). Humans and Automation: Augmenting Security Operation Centers. Journal of Cybersecurity and Privacy, 4(3), 388–409. https://doi.org/10.3390/jcp4030020

Wan, M., Liu, X., An, S., Tan, A., Jin, X., & Sheng, C. (2026). Security script arrangement based on enhanced BERT for cooperative defense in networked control systems. Expert Systems with Applications, 298, 129753. https://doi.org/10.1016/j.eswa.2025.129753

Wang, Z., Liang, J., He, R., Wang, Z., & Tan, T. (2025). LoRA-Pro: Are Low-Rank Adapters Properly Optimized? 13th International Conference on Learning Representations, ICLR 2025, 40167–40188. https://www.scopus.com/inward/record.uri?eid=2-s2.0-105010205526&partnerID=40&md5=28b3e3b3ff18fd0cad230e670899681e

Yan, S., Wang, Z., & Dobolyi, D. (2025). An explainable framework for assisting the detection of AI-generated textual content. Decision Support Systems, 196, 114498. https://doi.org/10.1016/j.dss.2025.114498

Zhang, H., Xu, B., Xiao, S., Zhang, C., & Ji, L. (2026). Zero- and few-shot Chinese cybersecurity event detection via meta-distillation learning. Information Processing and Management, 63(1). https://doi.org/10.1016/j.ipm.2025.104344

Zhang, J., Muhamed, A., Anantharaman, A., Wang, G., Chen, C., Zhong, K., Cui, Q., Xu, Y., Zeng, B., Chilimbi, T., & Chen, Y. (2023). ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2, 1128–1136. https://doi.org/10.18653/v1/2023.acl-short.97

Zheng, W., Yin, L., Chen, X., Ma, Z., Liu, S., & Yang, B. (2021). Knowledge base graph embedding module design for Visual question answering model. Pattern Recognition, 120, 108153. https://doi.org/10.1016/j.patcog.2021.108153

Published

2025-12-29

How to Cite

Sinaga, R., & Frangky. (2025). CyberBERT: A Semantic Search Framework for Security Terminologies Using Transformer Models. Global Science: Journal of Information Technology and Computer Science, 1(4), 1–16. https://doi.org/10.70062/globalscience.v1i4.179