Machine Learning-Based Cloud Forecast Models and GPU Architectures on a DoD HPC System

Nguyen, Chuyen (NRL-Monterey)

Co-Authors:
Nguyen, Chuyen
Nachamkin, Jason
Surratt, Melinda
Sidoti, David
Gull, Jacob
Bienkowski, Adam

Category:
Climate/Weather/Ocean Modeling

General-Purpose Graphics Processing Units (GP-GPUs) are a powerful technology for accelerating data-parallel machine learning algorithms. Several GPU architectures and programming models have emerged in recent years and are being applied within the DoD High-Performance Computing (HPC) community. The Naval Research Laboratory (NRL) is developing machine learning-based forecasts of cloud cover by fusing numerical weather prediction (NWP) model output and satellite data. These models were built for the dual purpose of understanding NWP model errors and improving the accuracy and sensitivity of the forecasts. Our framework implements a U-Net convolutional neural network with features extracted from clouds observed by the Geostationary Operational Environmental Satellite (GOES-16) as well as clouds predicted by the Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS). The machine learning models were developed and run on NVIDIA's parallel architectures using the Compute Unified Device Architecture (CUDA) GP-GPU programming model on the Vulcanite HPC system. We will discuss the system constructs and features that directly improved our models' run time and performance. The large number of multiprocessors enabled our models to complete the full learning pipeline (loading, preprocessing, data splitting, training, testing, and evaluation) on a 24 GB three-dimensional dataset: approximately 8 million trainable parameters were processed in a total running time of 32 minutes. Our models used all of the optimization strategies available on the Vulcanite system. In this presentation, we compare model runtime and performance across different levels of GP-GPU implementation. An overview of the framework and comparative assessments of results for the same machine learning architecture running with 2, 4, and 8 GPUs and on 1, 2, and 4 nodes of Vulcanite will be presented.
This comprehensive study will establish connections among programming models, GPU implementations, and applications.
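The multi-GPU runs above rely on the data-parallel training pattern: each device computes gradients on its own shard of the batch, and the shard gradients are averaged (an all-reduce) before a single shared weight update. The following is a minimal, library-free sketch of that principle only; the toy linear model, function names, and learning rate are illustrative assumptions, not part of the NRL framework or its U-Net implementation.

```python
def grad_mse_linear(w, xs, ys):
    """Gradient of mean squared error for the toy model y ~ w*x on one shard."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, xs, ys, n_devices, lr=0.01):
    """One SGD step with the batch split evenly across n_devices shards.

    Each simulated "GPU" computes a gradient on its shard; averaging the
    shard gradients reproduces the full-batch gradient, so adding devices
    reduces wall-clock time without changing the resulting update.
    """
    shard = len(xs) // n_devices  # assumes batch divides evenly across devices
    grads = [
        grad_mse_linear(w, xs[i * shard:(i + 1) * shard],
                           ys[i * shard:(i + 1) * shard])
        for i in range(n_devices)
    ]
    g = sum(grads) / n_devices    # the "all-reduce": average shard gradients
    return w - lr * g
```

With equal-sized shards, the update is identical whether the batch is processed on 1, 2, or 8 simulated devices, which is why runtime comparisons across GPU counts (as in the presentation) can hold the learning trajectory fixed while varying only the hardware configuration.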