Turkish Journal of Electrical Engineering and Computer Sciences
Author ORCID Identifier
SAMET TENEKECİ: 0000-0001-8875-4111
HÜSEYİN ÜNLÜ: 0000-0001-9906-6066
BURAK KEÇECİ: 0009-0006-2813-1303
MUHAMMED İNCİR: 0009-0001-7829-6775
ONUR DEMİRÖRS: 0000-0001-6601-3937
Abstract
Software Size Measurement (SSM) is crucial for estimating required project effort as well as budget and schedule. However, many small and medium-sized companies struggle to apply objective SSM due to limited resources and lack of expertise. This often leads to inaccurate estimates and project overruns. There is a need for practical, low-resource solutions that support these tasks without requiring expert involvement. Motivated by this challenge, this study proposes an automated software size measurement approach that formulates the measurement task as supervised regression over natural language requirements, using domain-adapted transformer models. We construct large-scale Turkish and English software engineering corpora to pre-train two models: SE-BERT and SE-BERTurk. These models are fine-tuned on a multilingual, organization-specific dataset annotated with COSMIC Function Points (CFP) and MicroM size by domain experts. We evaluate the models using various regression and classification metrics. Results show that SE-BERT improves exact match accuracy from 66.9% to 68.2% compared to BERT, while SE-BERTurk improves from 65.7% to 69.3% over BERTurk. Both models also achieve lower normalized errors than previous domain-adapted baselines BERT_SE and RE-BERT, demonstrating superior generalization. These findings highlight the effectiveness of domain-specific pre-training for software engineering tasks and its potential to support accurate software size estimation, especially in low-resource languages like Turkish and in real-world, organization-specific contexts.
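As a rough illustration of the evaluation described in the abstract, the sketch below scores regression outputs against expert-assigned sizes. The abstract does not define its metrics, so the choices here are assumptions: exact match is taken to mean that the prediction, rounded to the nearest integer, equals the expert label, and mean absolute error stands in as a simple regression metric. The function names and sample values are hypothetical and are not taken from the paper.

```python
import math


def round_half_up(value):
    """Round to the nearest integer, with .5 going up (assumed rounding rule)."""
    return math.floor(value + 0.5)


def exact_match_accuracy(predicted, actual):
    """Share of requirements whose rounded prediction equals the expert label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one requirement is needed")
    hits = sum(1 for p, a in zip(predicted, actual) if round_half_up(p) == a)
    return hits / len(actual)


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and expert-assigned size."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one requirement is needed")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical model outputs and expert labels for four requirements.
predicted_sizes = [3.2, 4.6, 1.9, 7.4]
expert_sizes = [3, 5, 2, 6]

accuracy = exact_match_accuracy(predicted_sizes, expert_sizes)
mae = mean_absolute_error(predicted_sizes, expert_sizes)
```

Rounding before comparison lets one regression model be scored both as a regressor and as a classifier over integer size values, which is one plausible reading of "regression and classification metrics" in the abstract.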
DOI
10.55730/1300-0632.4172
Keywords
Software size measurement, natural language processing, COSMIC, MicroM
First Page
234
Last Page
247
Publisher
The Scientific and Technological Research Council of Türkiye (TÜBİTAK)
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
TENEKECİ, S, ÜNLÜ, H, KEÇECİ, B, İNCİR, M. E, & DEMİRÖRS, O (2026). Automated software size measurement using multilingual domain-adapted language models. Turkish Journal of Electrical Engineering and Computer Sciences 34 (2): 234-247. https://doi.org/10.55730/1300-0632.4172
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons