Supporting High-Level to Low-Level Requirements Coverage Reviewing with Large Language Models (MSR 2024 - Technical Papers)

Who

Anamaria-Roberta Hartl, Christoph Mayr-Dorn, Atif Mashkoor, Alexander Egyed

Track

MSR 2024 Technical Papers

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 15 Apr 2024 14:12 - 14:24 at Grande Auditório - Software Quality. Chair(s): Gopi Krishnan Rajbahadur

Abstract

Refining high-level requirements into low-level requirements is a common task, especially in safety-critical systems engineering. The objective is to capture every important aspect of a high-level requirement in a low-level requirement, ensuring a complete and correct implementation of the system's features. To this end, standards and regulations for safety-critical systems require reviewing whether each high-level requirement is covered by all of its low-level requirements, so that no aspect is missed. Automatically supporting this coverage review is difficult because high-level and low-level requirements reside at different levels of abstraction, are written largely in natural language, and often use different vocabulary. Unfortunately, this problem has received noticeably little attention from the research community.

With the rise of Large Language Models (LLMs), which have been trained on a huge corpus of text and hence might "understand" the context of high-level and low-level requirements, we would expect them to help address this problem. This paper presents the first study to explore how well LLMs perform at checking requirements coverage. For evaluation, we selected requirements from five publicly available data sets and evaluated whether GPT-3.5 and GPT-4 can detect whether the traced low-level requirements cover a high-level requirement. While GPT-3.5 with a zero-shot plus explanation prompting strategy correctly classifies covered high-level requirements across four projects, it correctly identifies incomplete coverage caused by a single removed low-level requirement with 99.7% recall across the complete evaluation data set.
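The evaluation described above removes a single low-level requirement from a trace and asks whether the model notices the resulting gap. The Python sketch below illustrates that setup under stated assumptions; it is not the authors' implementation. The prompt wording, the "yes/no on the first line" answer format, and every helper name (`TraceLink`, `make_instances`, `build_prompt`, `parse_answer`, `incomplete_recall`) are hypothetical, and `ask` stands in for any call to GPT-3.5 or GPT-4.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TraceLink:
    """A high-level requirement (HLR) and the low-level requirements (LLRs) traced to it."""
    hlr: str
    llrs: List[str]


def make_instances(link: TraceLink) -> List[Tuple[str, List[str], bool]]:
    """Return (hlr, llrs, is_covered) triples: the full trace, plus one
    incomplete variant per LLR with exactly that LLR removed."""
    instances = [(link.hlr, list(link.llrs), True)]
    for i in range(len(link.llrs)):
        reduced = link.llrs[:i] + link.llrs[i + 1:]
        instances.append((link.hlr, reduced, False))
    return instances


def build_prompt(hlr: str, llrs: List[str]) -> str:
    """Hypothetical zero-shot prompt that also asks the model to explain its verdict."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(llrs))
    return (
        f"High-level requirement:\n{hlr}\n\n"
        f"Low-level requirements:\n{numbered}\n\n"
        "Do the low-level requirements fully cover the high-level requirement? "
        "Answer 'yes' or 'no' on the first line, then explain your reasoning."
    )


def parse_answer(reply: str) -> bool:
    """True ('covered') if the first line of the reply starts with 'yes'."""
    lines = reply.strip().splitlines()
    return bool(lines) and lines[0].strip().lower().startswith("yes")


def incomplete_recall(links: List[TraceLink], ask: Callable[[str], str]) -> float:
    """Recall on the 'incomplete coverage' class:
    incomplete instances flagged as not covered / all incomplete instances."""
    flagged = total = 0
    for link in links:
        for hlr, llrs, covered in make_instances(link):
            if covered:
                continue  # only single-removal variants count toward this recall
            total += 1
            if not parse_answer(ask(build_prompt(hlr, llrs))):
                flagged += 1
    return flagged / total if total else 0.0


if __name__ == "__main__":
    link = TraceLink(
        "The system shall authenticate users securely.",
        ["Validate the password hash.", "Lock the account after 3 failures.", "Use TLS for all logins."],
    )

    def stub_model(prompt: str) -> str:
        # Hypothetical stand-in for an LLM call: says 'yes' only if every LLR is present.
        if all(r in prompt for r in link.llrs):
            return "yes\nAll aspects are covered."
        return "no\nAn aspect of the high-level requirement is missing."

    print(incomplete_recall([link], stub_model))  # prints 1.0 for this stub
```

The complete (unmodified) instance is still generated so that accuracy on covered high-level requirements could be measured the same way; the recall function simply skips it, since the 99.7% figure in the abstract refers to the incomplete-coverage cases.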

Link to Preprint

https://drive.google.com/file/d/1LzqBEmcEaoYYz-f_KIAy1J6zJwRL0PK8/view?usp=sharing

Authorizer Link

https://dl.acm.org/doi/10.1145/3643991.3644922

DOI

https://doi.org/10.1145/3643991.3644922

Anamaria-Roberta Hartl

Johannes Kepler University Linz

Austria

Christoph Mayr-Dorn

Johannes Kepler University Linz

Atif Mashkoor

Johannes Kepler University Linz

Austria

Alexander Egyed

Johannes Kepler University Linz

Austria

Session Program

Mon 15 Apr
Displayed time zone: Lisbon

14:00 - 15:30 Software Quality (Technical Papers / Registered Reports / Data and Tool Showcase Track) at Grande Auditório. Chair(s): Gopi Krishnan Rajbahadur, Centre for Software Excellence, Huawei, Canada

14:00 (12m talk) Not all Dockerfile Smells are the Same: An Empirical Evaluation of Hadolint Writing Practices by Experts [Technical Papers]. Giovanni Rosa (University of Molise), Simone Scalabrino (University of Molise), Gregorio Robles (Universidad Rey Juan Carlos), Rocco Oliveto (University of Molise)
14:12 (12m talk) Supporting High-Level to Low-Level Requirements Coverage Reviewing with Large Language Models [Technical Papers]. Anamaria-Roberta Hartl (Johannes Kepler University Linz), Christoph Mayr-Dorn (Johannes Kepler University Linz), Atif Mashkoor (Johannes Kepler University Linz), Alexander Egyed (Johannes Kepler University Linz)
14:24 (12m talk) On the Executability of R Markdown Files [Technical Papers]. Md Anaytul Islam (Lakehead University), Muhammad Asaduzzaman (University of Windsor), Shaowei Wang (Department of Computer Science, University of Manitoba, Canada)
14:36 (12m talk) APIstic: A Large Collection of OpenAPI Metrics [Technical Papers]. Souhaila Serbout (Software Institute @ USI), Cesare Pautasso (Software Institute, Faculty of Informatics, USI Lugano)
14:48 (6m talk) Improving Automated Code Reviews: Learning From Experience [Technical Papers]. Hong Yi Lin (The University of Melbourne), Patanamon Thongtanunam (University of Melbourne), Christoph Treude (Singapore Management University), Wachiraphan (Ping) Charoenwet (The University of Melbourne)
14:55 (4m talk) Multi-faceted Code Smell Detection at Scale using DesigniteJava 2.0 [Data and Tool Showcase Track]. Tushar Sharma (Dalhousie University)
14:59 (4m talk) SATDAUG - A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt [Data and Tool Showcase Track]. Edi Sutoyo (Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen), Andrea Capiluppi (University of Groningen)
15:03 (4m talk) Curated Email-Based Code Reviews Datasets [Data and Tool Showcase Track]. Mingzhao Liang (The University of Melbourne), Wachiraphan (Ping) Charoenwet (The University of Melbourne), Patanamon Thongtanunam (University of Melbourne)
15:07 (4m talk) TestDossier: A Dataset of Tested Values Automatically Extracted from Test Execution [Data and Tool Showcase Track]. Andre Hora (UFMG)
15:11 (4m talk) Greenlight: Highlighting TensorFlow APIs Energy Footprint [Data and Tool Showcase Track]. Saurabhsingh Rajput (Dalhousie University), Maria Kechagia (University College London), Federica Sarro (University College London), Tushar Sharma (Dalhousie University)
15:15 (5m talk) When Code Smells Meet ML: On the Lifecycle of ML-specific Code Smells in ML-enabled Systems [Registered Reports]. Gilberto Recupito (University of Salerno), Giammaria Giordano (University of Salerno), Filomena Ferrucci (University of Salerno), Dario Di Nucci (University of Salerno), Fabio Palomba (University of Salerno)
15:20 (5m talk) Comparison of Static Analysis Architecture Recovery Tools for Microservice Applications [Registered Reports]. Simon Schneider (Hamburg University of Technology), Alexander Bakhtin (University of Oulu), Xiaozhou Li (University of Oulu), Jacopo Soldani (University of Pisa, Italy), Antonio Brogi (Università di Pisa), Tomas Cerny (University of Arizona), Riccardo Scandariato (Hamburg University of Technology), Davide Taibi (University of Oulu and Tampere University)