Progress Towards Development of a Machine Learning-Based Automated Test System

Abras, Jennifer (HPCMP CREATE)

Co-Authors:
Todd Tuckey
Nathan Hariharan

Category:
CFD

Machine learning refers to a broad class of mathematical, statistical, and optimization methods that infer useful information from observed data. Kernel methods are one widely used class, of which the support vector machine (SVM) is the best known. The versatility of the SVM extends to the problem of novelty detection. This semi-supervised approach is well suited to applications whose primary purpose is to detect novelty rather than to assign classifications to a field of data. SVMs are particularly advantageous because the trained model depends only on the support vectors, the subset of points that defines the decision boundary. The density of the dataset therefore becomes irrelevant to the problem, and the method can support a wide variety of datasets.
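As a minimal sketch of SVM-based novelty detection, the example below uses scikit-learn's OneClassSVM. This is an illustration only: the abstract does not state which library, kernel, or parameters the authors use, and the synthetic two-feature training data, the RBF kernel, and the value of nu are all assumptions made here.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical training set: 200 two-feature samples standing in for
# previously accepted results (synthetic data, not from the source).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu is an upper bound on the fraction of training points treated as
# outliers and a lower bound on the fraction kept as support vectors.
# The kernel and nu are assumed values chosen for illustration.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
detector.fit(X_train)

# predict() returns +1 for points consistent with the training data
# and -1 for points judged novel.
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])
labels = detector.predict(X_new)

# Only the support vectors are retained to define the boundary.
n_support = detector.support_vectors_.shape[0]
print("labels:", labels)
print("support vectors:", n_support, "of", len(X_train), "training points")
```

In this sketch the point far from the training cloud is the one expected to be reported as novel, and the fitted model keeps only a subset of the training points as support vectors, which is the property the paragraph above relies on.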

In the present effort, novelty detection is applied to aid verification and validation of physics-based simulation software. The software in question is a multi-physics code that incorporates computational fluid dynamics components along with other, non-aerodynamic capabilities. The HPCMP CREATE™-AV Kestrel automated testing system (ATS) covers all of these capabilities and hence contains a wide variety of aerodynamic content and output data types. The limiting factor in the current process is the number of features monitored for each ATS case. A comprehensive ATS report with only a few plots per case spans hundreds of pages for the analyst to process. Expanding this report to accommodate hundreds or thousands of features per case would make it untenable for human review without errors creeping in.

Novelty detection may be applied to provide additional "eyes" on the problem. A properly trained ML model can routinely watch the hundreds or thousands of features that the analyst would like to monitor but cannot, and can alert the analyst to potential problems through a single table added to the existing report. An automated system has been developed with this goal in mind. The process is demonstrated on six cases covering a variety of simulations, each exercising a different target area of interest and each presenting a different challenge. For some cases, the challenge is the nature of the underlying aerodynamic data; for others, it is how the data distribution manifests in the standard space as data accumulate. In all cases, the current process is shown to provide the analyst with a reasonable assessment of the need for further investigation.
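The idea of condensing per-feature flags into a single table can be sketched as follows. This is not the authors' implementation: the function name summarize_flags, the case and feature names, and the table layout are hypothetical, and the labels follow the common +1 (consistent) / -1 (novel) convention as an assumption.

```python
from collections import defaultdict


def summarize_flags(predictions):
    """Collapse per-feature novelty labels into one row per case.

    predictions: iterable of (case, feature, label) tuples, where label
    is +1 (consistent with history) or -1 (flagged as novel).
    Returns rows of (case, n_monitored, n_flagged, flagged_features),
    sorted by case name.
    """
    monitored = defaultdict(int)
    flagged = defaultdict(list)
    for case, feature, label in predictions:
        monitored[case] += 1
        if label == -1:
            flagged[case].append(feature)
    return [
        (case, monitored[case], len(flagged[case]), sorted(flagged[case]))
        for case in sorted(monitored)
    ]


# Hypothetical labels for two made-up cases (illustrative only).
predictions = [
    ("case_A", "lift_coefficient", 1),
    ("case_A", "drag_coefficient", -1),
    ("case_B", "lift_coefficient", 1),
]

rows = summarize_flags(predictions)
print(f"{'case':<8} {'monitored':>9} {'flagged':>7}  features")
for case, n_monitored, n_flagged, names in rows:
    print(f"{case:<8} {n_monitored:>9} {n_flagged:>7}  {', '.join(names) or '-'}")
```

A table of this shape stays one row per case no matter how many features are monitored, so the analyst reads only the flagged entries instead of every plot.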