Regression testing comprises techniques which are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner in new versions. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
Regression testing approaches assist developers in uncovering faults in new software versions, compared to previous versions. One such approach is test case prioritization (TCP): it reorders tests to execute the most important ones first, to find faults sooner on average. TCP has been extensively studied in unit testing research (Rothermel et al. 1999; Rothermel et al. 2001; Elbaum et al. 2001, 2002; Tonella et al. 2006; Zhang et al. 2009b; Mei et al. 2012; Yoo and Harman 2012; Zhang et al. 2013; Hao et al. 2014; Henard et al. 2016; Luo et al. 2016; Luo et al. 2018; Luo et al. 2019). The unit-testing equivalent for testing performance is software microbenchmarking. However, software microbenchmarks take substantially longer to execute, often multiple hours or even days (Huang et al. 2014; Stefan et al. 2017; Laaber and Leitner 2018), which is a compelling reason to apply TCP so that important performance changes are captured sooner. Unfortunately, compared to functional regression testing, performance regression testing is not as intensively studied. So far, the focus has been on predicting the performance impact of code changes on commits to decide whether performance tests should be run at all (Huang et al. 2014; Sandoval Alcocer et al. 2016), on prioritizing microbenchmarks according to the expected performance change size (Mostafa et al. 2017), or on selecting microbenchmarks that are most likely to detect a performance regression (de Oliveira et al. 2017; Alshoaibi et al. 2019; Chen et al. 2020).
Applying traditional TCP techniques to software microbenchmarks could work well due to their similarities to unit tests, i.e., a suite contains many microbenchmarks, they are defined in code, they are self-contained and therefore rearrangeable, and they operate at the granularity level of statements and methods. In addition, existing research builds on the assumption that traditional TCP techniques can be used as baselines for TCP on microbenchmarks (Mostafa et al. 2017). However, traditional TCP techniques might also behave differently when used to prioritize microbenchmarks, for the following reasons: (1) They rank tests based on coverage information, under the assumption that a test covering more statements, branches, or functions is more likely to find defects. However, performance changes might not be associated with the number of covered elements, but with the performance impact of each of these elements (e.g., a change to a loop variable potentially has a bigger impact than one to multiple conditional statements (Jin et al. 2012)). (2) Whereas unit tests have a clearly defined binary outcome (pass or fail), software microbenchmarks result in distributions of performance counters, indicating probabilistic results. (3) The reliability of software microbenchmark results and, consequently, of the detected performance changes depends on how rigorously one conducts the measurements. Hence, the effectiveness of TCP techniques could be compromised by performance measurement inaccuracies.
To investigate whether these underlying differences between unit tests and software microbenchmarks lead to measurable differences in the usefulness of existing TCP techniques, we empirically study traditional coverage-based prioritization techniques along multiple dimensions: (1) greedy prioritization strategies that rank benchmarks either by their total coverage or by the additional coverage not yet covered by already ranked benchmarks (see the sketch below), (2) benchmark granularity at either method or parameter level, (3) coverage information with method granularity extracted either dynamically or statically, and (4) different coverage-type-specific parameterizations. In total, our study compares 54 unique TCP technique instantiations. Research has shown that the studied dimensions affect TCP effectiveness and coverage precision (Rothermel et al. 2001; Elbaum et al. 2002; Hao et al. 2014; Henard et al. 2016; Luo et al. 2016; Luo et al. 2019; Reif et al. 2016; Reif et al. 2019).
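To make the two greedy strategies concrete, the following is a minimal, hypothetical Java sketch of coverage-based prioritization (not the implementation used in the study): the total strategy ranks benchmarks by the size of their covered element set, while the additional strategy repeatedly picks the benchmark that covers the most elements not yet covered by previously ranked benchmarks. All names are illustrative.

```java
import java.util.*;

// Sketch of coverage-based greedy TCP (illustrative only).
// Each benchmark name maps to the set of covered elements (e.g., method names).
public class GreedyPrioritization {

    // Total strategy: order benchmarks by descending total coverage size.
    static List<String> totalStrategy(Map<String, Set<String>> coverage) {
        List<String> order = new ArrayList<>(coverage.keySet());
        order.sort((a, b) -> coverage.get(b).size() - coverage.get(a).size());
        return order;
    }

    // Additional strategy: repeatedly pick the benchmark that adds the most
    // elements not yet covered by already ranked benchmarks.
    // Ties (and benchmarks adding no new coverage) are ordered arbitrarily here.
    static List<String> additionalStrategy(Map<String, Set<String>> coverage) {
        List<String> order = new ArrayList<>();
        Set<String> covered = new HashSet<>();
        Set<String> remaining = new HashSet<>(coverage.keySet());
        while (!remaining.isEmpty()) {
            String best = null;
            int bestGain = -1;
            for (String b : remaining) {
                Set<String> gain = new HashSet<>(coverage.get(b));
                gain.removeAll(covered);
                if (gain.size() > bestGain) {
                    bestGain = gain.size();
                    best = b;
                }
            }
            order.add(best);
            covered.addAll(coverage.get(best));
            remaining.remove(best);
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> coverage = Map.of(
            "benchA", Set.of("m1", "m2", "m3"),
            "benchB", Set.of("m1", "m2"),
            "benchC", Set.of("m4"));
        System.out.println(totalStrategy(coverage));      // e.g., [benchA, benchB, benchC]
        System.out.println(additionalStrategy(coverage)); // e.g., [benchA, benchC, benchB]
    }
}
```

Note that the additional strategy recomputes the remaining coverage gain after every pick, which is one reason it typically incurs a higher prioritization overhead than the total strategy.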
As study objects, we select 10 Java open-source software (OSS) projects with comprehensive Java Microbenchmark Harness (JMH) suites, comprising 1,829 unique microbenchmarks with 6,460 unique parameterizations across 161 versions, to which we apply all prioritization techniques.
Our main contribution is a first large-scale empirical comparison of TCP techniques applied to software microbenchmarks, which can serve as a reference point for future research when deciding which techniques and parameters to choose as baselines.
Software microbenchmarking is a performance testing technique that measures certain performance metrics, such as execution time, throughput, or memory utilization, of small code units. These small code units are usually individual methods or statements, which makes software microbenchmarking comparable to unit tests in functional testing. In the remainder of the paper, we use both benchmark and microbenchmark to refer to software microbenchmarks.
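For illustration, a minimal JMH microbenchmark could look as follows; the class and the measured operation are hypothetical and only sketch the general structure of the benchmarks studied here.

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Hypothetical JMH microbenchmark measuring the average execution time
// of a small code unit (here: parsing an integer from a string).
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class ParseBenchmark {

    // JMH executes the benchmark once per parameter value.
    @Param({"42", "123456789"})
    public String input;

    @Benchmark
    public int parse() {
        return Integer.parseInt(input);
    }
}
```

Each @Param value yields a separate benchmark parameterization, which is why the study distinguishes between method-level and parameter-level benchmark granularity.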
Test case prioritization (TCP) describes a set of techniques that make the regression testing effort in software evolution, i.e., when new versions are submitted for testing, more effective. The idea is to reorder the execution sequence of individual test cases in a test suite, such that tests executed earlier have a higher potential of exposing important faults than tests executed later. TCP has been extensively studied for functional unit tests (Yoo and Harman 2012), but, to the best of our knowledge, only one work applies TCP to performance tests, namely Mostafa et al. (2017).
To investigate whether TCP techniques originating from unit testing research are applicable to software microbenchmarks, we conduct a laboratory experiment (Stol and Fitzgerald 2018) on open-source Java projects with JMH software microbenchmark suites. The study compares the effectiveness and efficiency (i.e., dependent variables) of different TCP techniques, exploring a wide variety of parameter combinations (i.e., independent variables).
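As background for the effectiveness measure, the classic APFD metric from unit testing TCP (Rothermel et al. 1999) is shown below; APFD-P, as named in the abstract, adapts this idea from binary fault detection to performance changes (stated here only as a rough characterization under our assumptions, not as the paper's exact definition).

```latex
% Classic APFD for a suite of n tests and m faults,
% where TF_i is the position of the first test that exposes fault i.
\mathrm{APFD} = 1 - \frac{\sum_{i=1}^{m} TF_i}{n \cdot m} + \frac{1}{2n}
```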
To study TCP for software microbenchmarks, we select 10 OSS Java libraries. Because of the time-intensive nature of rigorously executing benchmarks, it is infeasible to conduct a study such as ours on, for example, all projects that have JMH suites. Therefore, we aim to select a diverse set of projects from different domains, with varying benchmark suite sizes, and a multitude of versions to apply TCP on. To this end, we perform purposive sampling (Baltes and Ralph 2020) of GitHub projects based on a list of 1,545 projects with JMH suites from Laaber et al. (2020).
To the best of our knowledge, this is the largest data set of software microbenchmark executions across multiple versions to date. Details, including the exact versions and commit hashes used, can be found in our replication package (Laaber et al. 2021b).
Generalizability of our study is mostly concerned with the choice of projects and versions. We selected 10 Java OSS projects in 161 versions and with 6,460 distinct JMH benchmark parameterizations. Although we cannot generalize our findings to all Java/JMH projects, the data set created for this study is, to the best of our knowledge, the most extensive microbenchmarking data set to date. More projects would have rendered our study infeasible because of the time-intensive nature of running rigorous performance experiments. We picked Java because benchmark suites written in it are long-running (Laaber and Leitner 2018; Laaber et al. 2020) and, hence, would benefit from TCP. Regarding the benchmark framework, JMH is the de facto standard for Java at the time of study (Stefan et al. 2017; Leitner and Bezemer 2017). We selected large, well-known, and popular projects from different domains to investigate high-quality software. However, the results might not generalize to closed-source or industrial software, other programming languages, or even other software written in Java.