Turkish Journal of Electrical Engineering and Computer Sciences

Author ORCID Identifier

SAMET TENEKECİ: 0000-0001-8875-4111

HÜSEYİN ÜNLÜ: 0000-0001-9906-6066

BURAK KEÇECİ: 0009-0006-2813-1303

MUHAMMED İNCİR: 0009-0001-7829-6775

ONUR DEMİRÖRS: 0000-0001-6601-3937

Abstract

Software Size Measurement (SSM) is crucial for estimating project effort, budget, and schedule. However, many small and medium-sized companies struggle to apply objective SSM due to limited resources and a lack of expertise. This often leads to inaccurate estimates and project overruns. There is a need for practical, low-resource solutions that support these tasks without requiring expert involvement. Motivated by this challenge, this study proposes an automated software size measurement approach that formulates the measurement task as supervised regression over natural language requirements, using domain-adapted transformer models. We construct large-scale Turkish and English software engineering corpora to pre-train two models: SE-BERT and SE-BERTurk. These models are fine-tuned on a multilingual, organization-specific dataset annotated with COSMIC Function Points (CFP) and MicroM size by domain experts. We evaluate the models using various regression and classification metrics. Results show that SE-BERT improves exact match accuracy from 66.9% to 68.2% compared to BERT, while SE-BERTurk improves it from 65.7% to 69.3% compared to BERTurk. Both models also achieve lower normalized errors than the previous domain-adapted baselines BERT_SE and RE-BERT, demonstrating superior generalization. These findings highlight the effectiveness of domain-specific pre-training for software engineering tasks and its potential to support accurate software size estimation, especially in low-resource languages like Turkish and in real-world, organization-specific contexts.

DOI

10.55730/1300-0632.4172

Keywords

Software size measurement, natural language processing, COSMIC, MicroM

First Page

234

Last Page

247

Publisher

The Scientific and Technological Research Council of Türkiye (TÜBİTAK)

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
