Occasional tidyverse tasks

tidyverse
Author

Josef Fruehwald

These tasks are “occasional”, until they’re not, and then you’ll be glad to have them.

stringr

stringr has lots of good functions for common string operations.

Squishing

library(tidyverse)

a <- "   hello    "
print(a)
[1] "   hello    "
b <- str_squish(a)
print(b)
[1] "hello"
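
The example above only has whitespace at the ends, so it can't show what sets squishing apart from trimming. Here is a small sketch of the difference, using a made-up string (`messy` is my own example, not from the notes): str_trim() only strips the ends, while str_squish() also collapses runs of whitespace inside the string.

```r
library(stringr)

# a made-up string with extra whitespace at the ends and in the middle
messy <- "  hello     big   world  "

# str_trim() only strips whitespace at the two ends
trimmed <- str_trim(messy)

# str_squish() strips the ends and collapses internal runs to one space
squished <- str_squish(messy)

print(trimmed)
print(squished)
```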

Detecting

x <- c("Sunday", "Monday", "March", "Thursday")

str_detect(x, "day")
[1]  TRUE  TRUE FALSE  TRUE
str_extract(x, "day")
[1] "day" "day" NA    "day"
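
The pattern is a regular expression, so you can be pickier about where the match happens. This is a minimal sketch on the same vector; the anchors and the extra helpers (str_subset(), summing the logical vector) are my additions rather than anything the notes above rely on.

```r
library(stringr)

x <- c("Sunday", "Monday", "March", "Thursday")

# "^" anchors the match to the start of the string
starts_with_m <- str_detect(x, "^M")

# "$" anchors to the end; str_subset() returns the matching strings themselves
day_names <- str_subset(x, "day$")

# TRUE counts as 1, so summing a detection vector counts the matches
n_days <- sum(str_detect(x, "day"))

print(starts_with_m)
print(day_names)
print(n_days)
```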

Combining

first_name <- c("John", "Paul", "George", "Ringo")
last_name <- c("Lennon", "McCartney", "Harrison", "Starr")

band <- str_c(first_name, last_name, sep = " ")
print(band)
[1] "John Lennon"     "Paul McCartney"  "George Harrison" "Ringo Starr"    
str_flatten(band, collapse = ", ")
[1] "John Lennon, Paul McCartney, George Harrison, Ringo Starr"
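
If the flattened string is going into a sentence, str_flatten() can also use a different separator before the final item. A small sketch, assuming a reasonably recent stringr (I believe the `last` argument arrived in version 1.5.0):

```r
library(stringr)

first_name <- c("John", "Paul", "George", "Ringo")
last_name <- c("Lennon", "McCartney", "Harrison", "Starr")

# str_c() is vectorised: one combined string per band member
band <- str_c(first_name, last_name, sep = " ")

# str_flatten() collapses the whole vector into a single string;
# `last` replaces the separator before the final element
band_sentence <- str_flatten(band, collapse = ", ", last = ", and ")

print(band_sentence)
```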

“Glue”ing

str_glue(
  "I know {first_name} is in the band."
)
I know John is in the band.
I know Paul is in the band.
I know George is in the band.
I know Ringo is in the band.
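
Whatever is inside the curly braces is evaluated as R code, so you can use more than one variable, or call a function inside the braces. A minimal sketch; the `instrument` vector is something I've added for illustration.

```r
library(stringr)

first_name <- c("John", "Paul", "George", "Ringo")
# extra vector added for this example
instrument <- c("guitar", "bass", "guitar", "drums")

# each {...} is evaluated as R code, element by element
band_lines <- str_glue(
  "{first_name} plays {str_to_upper(instrument)}."
)

print(band_lines)
```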

forcats

forcats is useful for working with factors, especially for changing the order of their levels.

library(palmerpenguins)
library(phonTools)
# stat_dots(), used below, comes from ggdist
library(ggdist)

data("pb52")
penguins |> 
  count(
    species
  ) ->
  species_count

Columns in annoying orders

species_count |> 
  ggplot(
    aes(species, n)
  )+
    geom_col()

pb52 |> 
  ggplot(
    aes(
      vowel, 
      f1,
      color = vowel,
      fill = vowel
    )
  )+
    stat_dots(
      side = "both",
      layout = "hex"
    )

Reordering with fct_reorder()

species_count |> 
  mutate(
    species = fct_reorder(
      species,
      desc(n)
    )
  ) |> 
  ggplot(
    aes(species, n)
  )+
    geom_col()

pb52 |> 
  mutate(
    vowel = fct_reorder(
      vowel,
      desc(f1),
      .fun = mean
    )
  ) |> 
  ggplot(
    aes(
      vowel,
      f1, 
      color = vowel,
      fill = vowel
    )
  )+
    stat_dots(
      side = "both",
      layout = "hex"
    )
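
To see what fct_reorder() is doing without making a plot, you can look at the factor levels directly. This is a sketch with made-up toy values (not the penguins or pb52 data). It also uses forcats' own `.desc = TRUE` argument, which is an alternative to wrapping the sorting variable in desc().

```r
library(forcats)

# toy data: three categories with made-up counts
category <- factor(c("a", "b", "c"))
n <- c(2, 10, 5)

# default levels are alphabetical
original_levels <- levels(category)

# reorder levels by n, ascending and then descending
ascending <- fct_reorder(category, n)
descending <- fct_reorder(category, n, .desc = TRUE)

# with several values per level, .fun summarises them first
# (the default summary is the median)
vowel <- factor(c("i", "i", "a", "a", "u", "u"))
f1 <- c(300, 320, 750, 850, 350, 370)
by_mean <- fct_reorder(vowel, f1, .fun = mean, .desc = TRUE)

print(levels(ascending))
print(levels(descending))
print(levels(by_mean))
```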

lubridate

Dates are, in general, a mess. For example:

# this will change
# every time the site rebuilds
time_now <- now()
time_now
[1] "2024-09-20 15:20:18 UTC"

That looks like a reasonable time for us to read with our eyes, but if we convert it to a numeric value:

as.numeric(time_now)
[1] 1726845618

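That number is how many seconds have passed since midnight UTC on January 1, 1970 (the Unix epoch). A single count of seconds is the most stable way to represent dates and times, because it doesn’t depend on time zones or on how the date is written out.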
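
You can go in the other direction too. A small sketch using the number printed above: as_datetime() treats a plain number as seconds since the epoch and, by default, displays the result in UTC.

```r
library(lubridate)

# the numeric value printed above
seconds <- 1726845618

# numbers are interpreted as seconds since 1970-01-01 00:00:00 UTC
back_to_time <- as_datetime(seconds)

# zero seconds is the epoch itself
epoch <- as_datetime(0)

print(back_to_time)
print(epoch)
```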

Creating and parsing dates

# iso
ymd("2024-09-16")
[1] "2024-09-16"
# US
mdy("September 16, 2024")
[1] "2024-09-16"
# most of the world
dmy("16 September 2024")
[1] "2024-09-16"
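
The reason there are separate parsing functions is that the same date gets written in different orders, and all-numeric dates are ambiguous unless you say which order you mean. Once parsed, the result is a real Date that you can pull pieces out of or do arithmetic with. A minimal sketch (the numeric date strings are my own examples):

```r
library(lubridate)

d <- ymd("2024-09-16")

# the same day written in US order and in day-first order
same_day <- c(mdy("09/16/2024"), dmy("16/09/2024"))
all_equal <- all(same_day == d)

# pull out components
yr <- year(d)
mon <- month(d)
# with week_start = 1, Monday is day 1
weekday_number <- wday(d, week_start = 1)

# date arithmetic
thirty_days_on <- d + days(30)

print(all_equal)
print(thirty_days_on)
```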

dplyr

consecutive_id()

If you have data with “streaks” in it, consecutive_id() will return a vector of IDs, one per streak: the ID goes up by one every time the value changes.

commits <- c(
  "Oakley",
  "Skyler", "Skyler",
  "Robin",
  "Oakley", "Oakley"
)

consecutive_id(commits)
[1] 1 2 2 3 4 4
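
Those IDs are mostly useful as a grouping variable. Here is a sketch that counts how long each streak is, assuming dplyr 1.1.0 or later (for consecutive_id() and `.by`); the column names are my own. Notice that the two Oakley streaks stay separate, which a plain count() by author wouldn't do.

```r
library(dplyr)

commits <- c(
  "Oakley",
  "Skyler", "Skyler",
  "Robin",
  "Oakley", "Oakley"
)

tibble(author = commits) |>
  mutate(
    # a new ID starts every time the author changes
    streak = consecutive_id(author)
  ) |>
  summarise(
    .by = streak,
    author = first(author),
    n_commits = n()
  ) ->
  streaks

print(streaks)
```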

I’ve used this to re-create transcript-like tables from forced-aligned data.

transcript <- read_csv("data/transcript.csv")
head(transcript)

# keep only the word tiers
transcript |> 
  filter(
    str_detect(
      tier,
      "words"
    )
  ) ->
  words

words |> 
  # drop the first row of each tier
  dplyr::slice(
    .by = tier,
    -1
  ) |>
  # interleave the tiers in time order
  arrange(tmin) |> 
  drop_na() |> 
  mutate(
    # a new turn starts whenever the tier changes
    turn  = consecutive_id(tier),
    speaker = str_remove(tier, " - words")
  ) ->
  words_turn

head(words_turn)

library(gt)

words_turn |> 
  summarise(
    .by = turn,
    speaker = first(speaker),
    start = min(tmin),
    end = max(tmax),
    phrase = str_flatten(text, collapse = " ")
  ) |> 
  gt() |> 
  fmt_number(
    columns = c(start, end),
    decimals = 3
  ) ->
  transcript_tbl

transcript_tbl
turn  speaker  start   end     phrase
1     IVR      0.672   10.832  well uh mister scott i have a number of uh things i'd like to ask you about i wonder if you'd just mind uh answering questions uh one
2     KY25A    10.702  11.002  yeah
3     IVR      10.832  13.352  after another if you if
4     KY25A    13.312  13.652  well
5     IVR      13.352  13.952  i remind
6     KY25A    13.652  14.972  now you
7     IVR      13.952  15.002  you
8     KY25A    15.002  15.312  might
9     IVR      15.212  15.312  of
10    KY25A    15.452  15.942  start
11    IVR      15.522  15.812  uh
12    KY25A    15.942  20.642  that i was born in eighteen sixty seven
13    IVR      21.922  24.712  mhm and that makes you how old ninety
14    KY25A    24.352  24.732  nintey
15    IVR      24.712  25.162  three
16    KY25A    24.732  25.172  three
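
Since data/transcript.csv isn't included here, this is a self-contained sketch of the same idea on a tiny invented data frame. The rows and values are hypothetical, and the column names just mirror the ones used above. The important step is arrange(tmin): the rows start out grouped by tier, and sorting by time is what interleaves the speakers so that consecutive_id() can find the turns.

```r
library(dplyr)
library(stringr)

# hypothetical toy data, grouped by tier the way aligner output often is
toy_words <- tibble(
  tier = c("A - words", "A - words", "A - words", "B - words"),
  tmin = c(0.0, 0.5, 1.5, 1.0),
  tmax = c(0.5, 1.0, 2.0, 1.5),
  text = c("hello", "there", "bye", "hi")
)

toy_words |>
  # put the words in time order so the speakers interleave
  arrange(tmin) |>
  mutate(
    turn = consecutive_id(tier),
    speaker = str_remove(tier, " - words")
  ) |>
  # one row per turn
  summarise(
    .by = turn,
    speaker = first(speaker),
    start = min(tmin),
    end = max(tmax),
    phrase = str_flatten(text, collapse = " ")
  ) ->
  toy_turns

print(toy_turns)
```

Without the arrange() step, all of speaker A's words would land in a single turn, because the tier value only changes once in the unsorted rows.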

Reuse

CC-BY 4.0

Citation

BibTeX citation:
@online{fruehwald,
  author = {Fruehwald, Josef},
  title = {Occasional Tidyverse Tasks},
  url = {https://lin611-2024.github.io/notes/side-notes/content/tidyverse-examples.html},
  langid = {en}
}
For attribution, please cite this work as:
Fruehwald, Josef. n.d. “Occasional Tidyverse Tasks.” https://lin611-2024.github.io/notes/side-notes/content/tidyverse-examples.html.