5 min read

Volleyball Waffle Chart

First we will load in our data from the GitHub URL.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
logs <- read_csv('https://raw.githubusercontent.com/dwillis/NCAAWomensVolleyballData/main/data/ncaa_womens_volleyball_matchstats_2022.csv')
## Rows: 6077 Columns: 36
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (3): team, opponent, home_away
## dbl  (31): team_score, opponent_score, s, kills, errors, total_attacks, hit_...
## lgl   (1): result
## date  (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s take a look.

logs
## # A tibble: 6,077 × 36
##    date       team     oppon…¹ home_…² result team_…³ oppon…⁴     s kills errors
##    <date>     <chr>    <chr>   <chr>   <lgl>    <dbl>   <dbl> <dbl> <dbl>  <dbl>
##  1 2022-08-26 A&M-Cor… Nebras… Away    NA           0       3     3    20     22
##  2 2022-08-26 A&M-Cor… Pepper… Away    NA           0       3     3    36     21
##  3 2022-08-27 A&M-Cor… Tulsa   Away    NA           2       3     5    75     22
##  4 2022-08-30 A&M-Cor… UTRGV   Away    NA           2       3     5    78     24
##  5 2022-09-02 A&M-Cor… Sam Ho… Home    NA           0       3     3    41     22
##  6 2022-09-03 A&M-Cor… SMU     Home    NA           0       3     3    41     18
##  7 2022-09-03 A&M-Cor… Indiana Home    NA           0       3     3    25     17
##  8 2022-09-06 A&M-Cor… UTRGV   Home    NA           0       3     3    34     22
##  9 2022-09-09 A&M-Cor… Texas … Away    NA           0       3     3    33     20
## 10 2022-09-10 A&M-Cor… Rice    Away    NA           0       3     3    36     21
## # … with 6,067 more rows, 26 more variables: total_attacks <dbl>,
## #   hit_pct <dbl>, assists <dbl>, aces <dbl>, s_err <dbl>, digs <dbl>,
## #   r_err <dbl>, block_solos <dbl>, block_assists <dbl>, b_err <dbl>,
## #   pts <dbl>, bhe <dbl>, defensive_kills <dbl>, defensive_errors <dbl>,
## #   defensive_total_attacks <dbl>, defensive_hit_pct <dbl>,
## #   defensive_assists <dbl>, defensive_aces <dbl>, defensive_s_err <dbl>,
## #   defensive_digs <dbl>, defensive_r_err <dbl>, defensive_block_solos <dbl>, …

We want to group these game logs by team and then summarize them by adding up each game’s blocks, kills, and aces. We use a special formula to calculate points scored from blocks. We’ll store this in a new dataframe called totals.

totals <- logs %>% group_by(team) %>% 
  summarize(total_kills = sum(kills), total_blocks = sum(block_solos) + (round(sum(block_assists)/2)), total_aces = sum(aces))
totals
## # A tibble: 345 × 4
##    team                         total_kills total_blocks total_aces
##    <chr>                              <dbl>        <dbl>      <dbl>
##  1 A&M-Corpus Christi Islanders         991          111         96
##  2 Abilene Christian Wildcats           804           93         93
##  3 Air Force Falcons                    871          175        120
##  4 Akron Zips                           797          111         77
##  5 Alabama A&M Bulldogs                 646          108         89
##  6 Alabama Crimson Tide                 879          147        138
##  7 Alabama St. Lady Hornets             831          128        130
##  8 Alcorn Lady Braves                   480           80         66
##  9 American Eagles                      910          159        115
## 10 App State Mountaineers               818          151        108
## # … with 335 more rows

If we filter by team, we can find Maryland and make a variable that has the season totals.

totals %>% filter(team == "Maryland Terrapins, Terps")
## # A tibble: 1 × 4
##   team                      total_kills total_blocks total_aces
##   <chr>                           <dbl>        <dbl>      <dbl>
## 1 Maryland Terrapins, Terps         767          228        133
md <- c("Kills" = 767, "Blocks" = 228, "Aces" = 133)

Let’s try and plot this on a waffle chart. We need to load in our waffle library and format the plot so we can understand it.

library(waffle)
waffle(
        md, 
        rows = 10, 
        title="Maryland Volleyball's Offense", 
        xlab="1 square = 1 point", 
        colors = c("red", "orange", "yellow")
)

Well… the squares are too small. What if we increase the scale of each square and add some more default rows?

md <- md/2
waffle(
        md, 
        rows = 14, 
        title="Maryland Volleyball's Offense", 
        xlab="1 square = 2 points", 
        colors = c("red", "orange", "yellow")
)

Better. But now we should compare to another team and see how the Terps stack up. I picked Michigan because that’s where my dad went.

totals %>% filter(team == "Michigan Wolverines")
## # A tibble: 1 × 4
##   team                total_kills total_blocks total_aces
##   <chr>                     <dbl>        <dbl>      <dbl>
## 1 Michigan Wolverines         748          128         88

We’ll make another variable for their totals and divide by two to match the same scale as before. If we use the iron function we can plot the two teams side by side.

mich <- c("Kills" = 748, "Blocks" = 128, "Aces" = 88)
mich <- mich/2
iron(
waffle(
        mich, 
        rows = 14, 
        title="Michigan Volleyball's Offense", 
        xlab="1 square = 2 points", 
        colors = c("red", "orange", "yellow")
),
waffle(
        md, 
        rows = 14, 
        title="Maryland Volleyball's Offense", 
        xlab="1 square = 2 points", 
        colors = c("red", "orange", "yellow")
)
)

Almost done. We need to make sure each square is the same size across teams so that we can compare. Michigan scored less points, so they need to be padded with whitespace. We’ll take the difference of the two vectors and add it to Michigan’s. After we divide by 2, this should fill in the difference with white squares instead of enlarging the squares.

mich <- c("Kills" = 748, "Blocks" = 128, "Aces" = 88, sum(md - mich)*2)
mich <- mich/2
iron(
waffle(
        mich, 
        rows = 14, 
        title="Michigan Volleyball's Offense", 
        xlab="1 square = 2 points", 
        colors = c("red", "orange", "yellow", "white")
),
waffle(
        md, 
        rows = 14, 
        title="Maryland Volleyball's Offense", 
        xlab="1 square = 2 points", 
        colors = c("red", "orange", "yellow")
)
)

Cool! Maryland has scored more points than Michigan. Both teams get about the same amount of points from kills though. Maryland only has 8 more. Maryland gets a lot more blocks and a handful more aces though. This is what really creates the gap between the two teams’ totals.