Authors: Nicole Bonge, Prof. Jihong Zhang

Affiliation: University of Arkansas

Workshop Date: Thursday, March 20, 2025

1 Agenda

  1. Intro to parallelization
  2. Pinnacle Portal
  3. Simulation 1 Demonstration
  4. HPC in Terminal
  5. Simulation 2 Activity

2 Getting Started

  1. Make sure the Workshop Materials (Sim1-AHPC.R file, Sim2-AHPC.R file, and Sim2-Datasets folder) are easily accessible on your personal computer.

  2. If you use Projects in R (which I recommend), use the R-in-AHPC folder containing (1) the R-in-AHPC.Rproj Project, (2) the “Code” folder, with the Sim1-AHPC.R and Sim2-AHPC.R files, and (3) the “Sim2-Datasets” folder.

  3. If you have not yet made an AHPCC account, request one at hpc.uark.edu/hpc-support/user-account-requests/internal.php

3 Intro to Parallelization¹

3.1 Serial vs. Parallel Processing

  • Suppose we have a series of functions to run, \(f_1\), \(f_2\), and \(f_3\).

  • Serial processing: Run \(f_1\) until it completes; nothing else can run while it is working. Once \(f_1\) finishes, \(f_2\) begins, and the process repeats.

  • Parallel processing: All \(f_i\) functions start simultaneously and run to completion.

3.2 The Serial-Parallel Scale

  • A problem can range from “inherently serial” to “perfectly parallel” (or “embarrassingly parallel”).

  • Inherently serial: A problem that cannot be parallelized at all.

    • Example: \(f_2\) depends on the output of \(f_1\) before it can begin. In this case, parallel processing wouldn’t help and might take longer than on a single core.
  • Perfectly parallel: There is absolutely no dependency between iterations, and all functions can start simultaneously.

    • Monte Carlo and statistical simulations usually fall into this category.
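The distinction is easy to see in code; a minimal sketch:

```r
# Inherently serial: each iteration needs the previous result,
# so iteration i cannot start before iteration i - 1 finishes
x <- numeric(10)
x[1] <- 1
for (i in 2:10) x[i] <- x[i - 1] * 2

# Perfectly parallel: iterations are independent,
# so they could all run at the same time
y <- sapply(1:10, function(i) i^2)
```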

3.3 Vocabulary

  • HPC: High performance computing. Implies a program that is too large, or takes too long, to reasonably run on a desktop computer.

  • Core: A general term for either a single processor on your own computer or a single machine in a cluster.

  • Cluster: A collection of cores working together, either across a networked group of machines or within your personal computer.

  • Process: A single version of R (or any program). Each core runs a single process, and a process typically runs a single function.

3.4 Parallelization Analogy

  • In AHPC, we can run up to 32 cores per session.

  • Imagine having 15,000 jobs. Distributing these jobs among 32 friends will take much less time than doing 15,000 jobs alone.

  • Once one friend (core) finishes the job they’re working on, that friend begins the next job in the list.

  • There are diminishing returns for adding cores. Giving each friend (core) instructions takes time, and the friend telling you the results takes time, too.

3.5 The parallel & doParallel Packages

  • Step 1: Load the parallel package in R: library(parallel)

    library(parallel)
  • Step 2: Check the number of cores you have access to with detectCores().

    detectCores()
    [1] 8
    Tip

    Leave one core free when you’re running a simulation.

    n_cores <- detectCores() - 1
    n_cores
    [1] 7
  • Step 3: Use foreach loops

    The doParallel package allows foreach “loops”, similar to for loops.

    library(doParallel)

    Before using a foreach loop, you must make a cluster via the makeCluster function, then use the registerDoParallel() function:

    cl <- makeCluster(n_cores)
    registerDoParallel(cl)
    Example: foreach loop
    foreach(i = 1:20) %dopar% {
      print(paste0("sqrt(", i, ") = ", round(sqrt(i), digits = 4)))
    }
    [[1]]
    [1] "sqrt(1) = 1"
    
    [[2]]
    [1] "sqrt(2) = 1.4142"
    
    [[3]]
    [1] "sqrt(3) = 1.7321"
    
    [[4]]
    [1] "sqrt(4) = 2"
    
    [[5]]
    [1] "sqrt(5) = 2.2361"
    
    [[6]]
    [1] "sqrt(6) = 2.4495"
    
    [[7]]
    [1] "sqrt(7) = 2.6458"
    
    [[8]]
    [1] "sqrt(8) = 2.8284"
    
    [[9]]
    [1] "sqrt(9) = 3"
    
    [[10]]
    [1] "sqrt(10) = 3.1623"
    
    [[11]]
    [1] "sqrt(11) = 3.3166"
    
    [[12]]
    [1] "sqrt(12) = 3.4641"
    
    [[13]]
    [1] "sqrt(13) = 3.6056"
    
    [[14]]
    [1] "sqrt(14) = 3.7417"
    
    [[15]]
    [1] "sqrt(15) = 3.873"
    
    [[16]]
    [1] "sqrt(16) = 4"
    
    [[17]]
    [1] "sqrt(17) = 4.1231"
    
    [[18]]
    [1] "sqrt(18) = 4.2426"
    
    [[19]]
    [1] "sqrt(19) = 4.3589"
    
    [[20]]
    [1] "sqrt(20) = 4.4721"
    Tip

    Always stop the cluster at the end of your parallelization using the stopCluster function.

    stopCluster(cl)
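By default, foreach returns a list with one element per iteration (as in the output above). The .combine argument tells foreach how to merge results; a brief sketch using .combine = c to get a vector instead:

```r
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# c() concatenates each iteration's result into a single numeric vector
squares <- foreach(i = 1:5, .combine = c) %dopar% i^2
squares
# [1]  1  4  9 16 25

stopCluster(cl)
```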

3.6 Demonstration: foreach Efficiency²

How quickly can we square the first 1,000, 10,000, or 100,000 integers? It might take a while using serial processing, but we can speed up the process with multiple cores. Let’s see how using more cores shortens computation time.

3.6.1 Demonstration Overview

The test() function does the following:

  • Creates and registers a new cluster with n_cores CPU cores (specified by user), and stops the cluster after computation.

  • Uses foreach to square the first n_iter integers (specified by user).

  • Keeps track of the time needed in total, the time needed for the squaring computations, and the time spent communicating with the cores.

Note: test() does not store the squared integers. We’re only concerned about timing for this demonstration.

test <- function(n_cores, n_iter){
  # Record start time
  time_start <- Sys.time()

  # Create and register cluster
  cl <- makeCluster(n_cores)
  registerDoParallel(cl)

  # Record this run's computation start time
  time_start_processing <- Sys.time()

  # Do the processing
  results <- foreach(i = 1:n_iter) %dopar% {
    i^2
  }

  # Record this run's computation stop time
  time_finish_processing <- Sys.time()

  # Stop the cluster
  stopCluster(cl)

  # Keep track of the end time
  time_end <- Sys.time()

  # Create report 
  total_time <- round(difftime(time_end, time_start, units = "secs"), digits = 5)
  
  compute_time <- round(difftime(time_finish_processing, time_start_processing, units = "secs"), digits = 5)
  
  overhead_time <- round((total_time - compute_time), digits = 5)
  
  out <- data.frame(
    Cores = n_cores,
    Iterations = as.integer(n_iter),
    Total.Time = total_time,
    Compute.Time = compute_time,
    Overhead.Time = overhead_time)
  
  # Return the report
  return(out)
}

3.6.2 Demonstration Execution

# Using 1, 4, and (almost) all cores
cores <- c(1, 4, detectCores()-1)

# Varying the number of replications
replications <- c(1000, 10000, 100000)

# Initializing data frame to store results
results <- data.frame()

for(n in 1:length(cores)){
  for(r in 1:length(replications)){
    # running test with specified number of cores & replications
    out1 <- test(cores[n], replications[r])
    
    # appending out1 to results data frame
    results <- rbind(results, out1)
  }
}

# view results
results

3.6.3 Demonstration Results

library(ggplot2)
library(tidyverse)
library(patchwork)
library(reshape2)

res.vis <- results |>
  # results already contains Overhead.Time, so no extra mutate is needed
  dplyr::select(Cores, Iterations, Total.Time, Compute.Time, Overhead.Time) |>
  melt(id.vars = c("Cores", "Iterations"), value.name = "Time") |>
  filter(variable != "Total.Time")

res.vis <- res.vis |>
  mutate(variable = factor(res.vis$variable, 
                           levels = c("Compute.Time", "Overhead.Time")), 
         Cores = as.factor(res.vis$Cores),
         Time = as.numeric(res.vis$Time))

results.vis1000 <- res.vis |> 
  filter(Iterations == 1000) 

runtime1000 <- results.vis1000 |>
  ggplot(aes(x = Cores, y = Time)) +
  geom_bar(position = "stack", stat = "identity", aes(fill = variable)) +
  scale_fill_manual(labels = c("Compute.Time" = "Computation Time", 
                               "Overhead.Time" = "Overhead Time"),
                    values = c("cyan3", "coral")) +
  labs(title = "Run Time with \n 1,000 Iterations",
       x = "Number of Cores", 
       y = "Time (Seconds)",
       fill = "Time Type") 


results.vis10000 <- res.vis |> 
  filter(Iterations == 10000) 

runtime10000 <- results.vis10000 |>
  ggplot(aes(x = Cores, y = Time)) +
  geom_bar(position = "stack", stat = "identity", aes(fill = variable)) +
  scale_fill_manual(labels = c("Compute.Time" = "Computation Time", 
                               "Overhead.Time" = "Overhead Time"),
                    values = c("cyan3", "coral")) +
  labs(title = "Run Time with \n 10,000 Iterations",
       x = "Number of Cores", 
       y = "Time (Seconds)",
       fill = "Time Type") 



results.vis100000 <- res.vis |> 
  filter(Iterations == 100000) 

runtime100000 <- results.vis100000 |>
  ggplot(aes(x = Cores, y = Time)) +
  geom_bar(position = "stack", stat = "identity", aes(fill = variable)) +
  scale_fill_manual(labels = c("Compute.Time" = "Computation Time", 
                               "Overhead.Time" = "Overhead Time"),
                    values = c("cyan3", "coral")) +
  labs(title = "Run Time with \n 100,000 Iterations",
       x = "Number of Cores", 
       y = "Time (Seconds)",
       fill = "Time Type") 

runtime1000 + runtime10000 + runtime100000 +
  plot_layout(guides = "collect",
              ncol = 3) &
  theme(legend.position = "bottom")

Observations:

  • More iterations require more computation & overhead time.

  • With more cores, iterations take less computation time but more overhead time.

4 Interactive Sessions in the Pinnacle Portal

4.1 Starting an Interactive Session

4.2 Launch R-Studio

  • Once your job is active, the “My Interactive Sessions” page looks like this:

  • Clicking “Launch R-Studio” takes you to the interactive session.

4.3 Checking Active Jobs Queue

  • In Pinnacle Portal, Menu > Jobs > Active Jobs

  • Active Jobs are sorted alphabetically by queue.

4.4 Tips for Interactive Sessions in R

  • Tip 1: Closing RStudio ends your Interactive Session.

  • Tip 2: The read_csv function (from the readr package) can cause RStudio to terminate the Interactive Session. Use base R’s read.csv instead.
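For example, loading one of the workshop datasets inside the Interactive Session (the file name below is a placeholder; use the actual names in your Sim2-Datasets folder):

```r
# Base R's read.csv is safe to use in the Interactive Session
dat <- read.csv("Sim2-Datasets/dataset1.csv")  # placeholder file name
head(dat)
```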

4.5 Uploading Data

  • Menu > Files > Home Directory

  • I put my files in the Desktop for quick access.

  • Make new folders using the “New Dir” button.

  • Upload file(s) using the “Upload” button.

    Tip

    To upload multiple files, compress the files/folder on your computer and upload a single zipped (.zip) file.

4.6 Downloading Files

  • Tip: For multiple files, compress the files/folder in the Interactive Session (or in Terminal).

  • In the Files App, click on the file, then click the “Download” button.

5 Simulation 1 Demonstration: Type I Error

5.1 Simulation Rationale

  • In this study, we will demonstrate the Type I Error rates for One-Way ANOVA.

  • Recall, we use One-Way ANOVA to detect group mean differences.

  • For three groups, the null hypothesis is \(H_0:\mu_1=\mu_2=\mu_3\).

  • Type I Error is a false positive (rejecting the null hypothesis when the null hypothesis is true).

  • We (the researchers) determine \(\alpha\), the probability of committing a Type I error (usually .05, sometimes .01 or .001).

5.2 Simulation Overview

  • In this simulation, we will generate (simulate) 10,000 datasets of three groups with equal means.

  • We will perform an ANOVA test on each dataset, and record whether the resulting \(p\)-value is significant.

  • When \(\alpha=.05\), we expect ~500/10,000 significance tests to reject \(H_0\), even though \(H_0\) is true.

  • When \(\alpha=.01\), we expect ~100/10,000 significance tests to reject \(H_0\).

  • When \(\alpha=.001\), we expect ~10/10,000 significance tests to reject \(H_0\).
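The full simulation lives in Sim1-AHPC.R; the core idea can be sketched as below. The group size (30 per group) and the standard normal distribution are assumptions for illustration, and the actual workshop code may differ:

```r
library(doParallel)

n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
registerDoParallel(cl)

n_reps <- 10000
p_values <- foreach(i = 1:n_reps, .combine = c) %dopar% {
  # Three groups with identical means, so H0 is true by construction
  dat <- data.frame(
    y     = rnorm(90, mean = 0, sd = 1),
    group = factor(rep(1:3, each = 30))
  )
  # Extract the one-way ANOVA p-value
  summary(aov(y ~ group, data = dat))[[1]][["Pr(>F)"]][1]
}

stopCluster(cl)

# Empirical Type I error rate; should be close to alpha = .05
mean(p_values < .05)
```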

5.3 Simulation 1 in AHPC Interactive Session: Steps

  1. Upload Sim1-AHPC.R code file from your computer using the Files App in the Pinnacle Portal.

    • If you are using the R-in-AHPC folder, upload the .zip file using the Files App.
  2. Begin Interactive Session.

  3. Run R Script in the Interactive Session.

  4. Compress Results file in the Interactive Session.

  5. End Interactive Session.

  6. Download (zipped) results from Files App to your personal computer, then analyze!

5.4 Video: Simulation 1 in AHPC

6 Submitting Jobs in Terminal

6.1 Connect to HPCC from the Terminal

Type the following command into your terminal on Mac (or PowerShell on Windows). It will prompt for your university password (“[username]@hpc-portal2.hpc.uark.edu’s password:”). After you enter the password, you should be connected to the Pinnacle login node.

Code
ssh [username]@hpc-portal2.hpc.uark.edu

6.2 Create a new folder and Upload the file

  • Use the following command to create a new folder in your AHPCC home directory.
Code
mkdir R-in-HPCC

Open a new terminal on your local machine and type the following commands to upload the R file and job file to the folder R-in-HPCC:

Code
scp Sim1-HPCC.R [username]@hpc-portal2.hpc.uark.edu:/home/[username]/R-in-HPCC
scp job.sh [username]@hpc-portal2.hpc.uark.edu:/home/[username]/R-in-HPCC
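The job.sh file referenced above is a SLURM batch script; its contents are not shown in this handout. A minimal sketch might look like the following, where the partition name, time limit, and module name are assumptions — check the AHPCC documentation for the correct values:

```bash
#!/bin/bash
#SBATCH --job-name=sim1            # job name shown in the queue
#SBATCH --nodes=1                  # run on a single node
#SBATCH --ntasks-per-node=32       # request 32 cores (assumption)
#SBATCH --time=01:00:00            # wall-clock limit (assumption)
#SBATCH --partition=comp01         # partition name (assumption)

module load R                      # module name may differ on AHPCC

Rscript Sim1-HPCC.R
```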

6.3 Submit the job task

Code
pinnacle-l3:[username]:~/R-in-HPCC$ sbatch job.sh
Submitted batch job 637189

6.4 Check Results

Code
pinnacle-l3:[username]:~/R-in-HPCC$ ls Type1-Results/

7 Simulation 2 Exercise: Power

  • Your turn! Use the Sim2-AHPC.R file to run Simulation 2.
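Simulation 2 uses the pre-generated datasets in the Sim2-Datasets folder. Conceptually, power is the rejection rate when \(H_0\) is false; a minimal sketch (the group means, sample sizes, and number of replications below are illustrative assumptions, not the actual Sim2 design):

```r
library(doParallel)

cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

# Groups have unequal means, so H0 is false; the rejection
# rate now estimates statistical power rather than Type I error
p_values <- foreach(i = 1:5000, .combine = c) %dopar% {
  dat <- data.frame(
    y     = c(rnorm(30, 0), rnorm(30, 0.5), rnorm(30, 1)),
    group = factor(rep(1:3, each = 30))
  )
  summary(aov(y ~ group, data = dat))[[1]][["Pr(>F)"]][1]
}

stopCluster(cl)

mean(p_values < .05)  # estimated power at alpha = .05
```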

8 Wrapping up

9 Thank you!

Footnotes

  1. Information in this section is adapted from Dr. Josh Errickson’s Notes on Parallel Processing in R.

  2. Demonstration adapted from https://www.appsilon.com/post/r-doparallel.