Occasional tidyverse tasks
These tasks are “occasional”, until they’re not, and then you’ll be glad to have them.
stringr
Lots of good string operations, all with names that start with str_.
Squishing
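"Squishing" presumably refers to stringr's str_squish(), which trims whitespace from both ends of a string and also collapses every internal run of whitespace (spaces, tabs, newlines) into a single space. Here is a small sketch comparing it with str_trim(), which only touches the ends. The messy strings are made up for illustration.

```r
library(stringr)

# made-up strings with messy spacing
messy <- c("  John   Lennon ", "Paul\tMcCartney", "George  Harrison  ")

# str_trim() only removes whitespace at the start and end
trimmed <- str_trim(messy)
# trimmed[1] is "John   Lennon" (inner spaces untouched)

# str_squish() also collapses internal whitespace to a single space
squished <- str_squish(messy)
# squished is c("John Lennon", "Paul McCartney", "George Harrison")
```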
Detecting
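```r
library(stringr)

x <- c("Sunday", "Monday", "March", "Thursday")

# does each string contain "day"?
str_detect(x, "day")
#> [1] TRUE TRUE FALSE TRUE

# pull out the matching text (NA when there is no match)
str_extract(x, "day")
#> [1] "day" "day" NA "day"
```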
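The pattern in str_detect() is a regular expression, not just a literal piece of text, so anchors like ^ (start of string) and $ (end of string) work. A short sketch using the same x vector, plus str_subset(), which keeps only the matching strings, and the negate argument, which flips the result:

```r
library(stringr)

x <- c("Sunday", "Monday", "March", "Thursday")

# "^M": starts with a capital M (matching is case-sensitive)
starts_m <- str_detect(x, "^M")
# c(FALSE, TRUE, TRUE, FALSE)

# "day$": keep only the strings that end in "day"
day_words <- str_subset(x, "day$")
# c("Sunday", "Monday", "Thursday")

# negate = TRUE returns TRUE where there is *no* match
no_day <- str_detect(x, "day", negate = TRUE)
# c(FALSE, FALSE, TRUE, FALSE)
```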
Combining
```r
first_name <- c("John", "Paul", "George", "Ringo")
last_name <- c("Lennon", "McCartney", "Harrison", "Starr")

# combine the two vectors element by element
band <- str_c(first_name, last_name, sep = " ")
print(band)
#> [1] "John Lennon" "Paul McCartney" "George Harrison" "Ringo Starr"

# collapse the whole vector into one string
str_flatten(band, collapse = ", ")
#> [1] "John Lennon, Paul McCartney, George Harrison, Ringo Starr"
```
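Two more str_c() behaviours worth knowing, sketched below: a length-1 input gets recycled across the whole vector, and the collapse argument does the same job as str_flatten(). Also, unlike base R's paste(), str_c() returns NA if any input is missing rather than pasting in the text "NA".

```r
library(stringr)

first_name <- c("John", "Paul", "George", "Ringo")

# the length-1 string is recycled for every name
in_band <- str_c(first_name, " is in the band")
# in_band[1] is "John is in the band"

# collapse = flattens the result into a single string
all_names <- str_c(first_name, collapse = ", ")
# "John, Paul, George, Ringo"

# missing values propagate instead of becoming the text "NA"
with_missing <- str_c("Ringo", NA)
# NA
```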
“Glue”ing
```r
str_glue(
  "I know {first_name} is in the band."
)
#> I know John is in the band.
#> I know Paul is in the band.
#> I know George is in the band.
#> I know Ringo is in the band.
```
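Anything inside the braces is evaluated as R code, not just looked up as a variable name, and values can also be passed in as named arguments. A short sketch (the sentences are just illustrations):

```r
library(stringr)

first_name <- c("John", "Paul", "George", "Ringo")

# {} can hold any R expression; it is evaluated once per element
sentences <- str_glue("{first_name} has {str_length(first_name)} letters.")
# "John has 4 letters." "Paul has 4 letters." "George has 6 letters." "Ringo has 5 letters."

# named arguments are available inside the braces too
one_line <- str_glue("{who} is in the band.", who = "Ringo")
# "Ringo is in the band."
```

str_glue() returns a glue object; wrap it in as.character() if you need a plain character vector.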
forcats
Useful for doing things to factors, R’s data type for categorical variables like penguin species or vowel categories.
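```r
library(dplyr)
library(palmerpenguins)
library(phonTools)

# Peterson & Barney (1952) vowel formant data
data("pb52")

# number of penguins of each species
penguins |>
  count(
    species
  ) ->
  species_count
```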
Columns in annoying orders
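```r
library(ggplot2)
library(ggdist)

# one column of dots per vowel, in the default level order
pb52 |>
  ggplot(
    aes(
      vowel,
      f1,
      color = vowel,
      fill = vowel
    )
  ) +
  stat_dots(
    side = "both",
    layout = "hex"
  )
```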
Reordering with fct_reorder()
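A minimal sketch of fct_reorder() on a tiny, made-up set of vowel measurements (the F1 values are invented for illustration). By default, factor() puts levels in alphabetical order; fct_reorder() instead sorts the levels by a summary of another variable, the median unless you pass a different .fun. Applied to the plot above, something like mutate(vowel = fct_reorder(vowel, f1)) before ggplot() should order the vowel columns by median F1.

```r
library(tibble)
library(dplyr)
library(forcats)

# made-up F1 values (Hz), two tokens per vowel
vowels <- tibble(
  vowel = c("i", "i", "a", "a", "u", "u"),
  f1 = c(300, 320, 750, 800, 350, 330)
)

# default: levels in alphabetical order
levels(factor(vowels$vowel))
#> [1] "a" "i" "u"

# reorder levels by the median f1 of each vowel (i = 310, u = 340, a = 775)
vowels <- vowels |>
  mutate(vowel = fct_reorder(vowel, f1))
levels(vowels$vowel)
#> [1] "i" "u" "a"

# .desc = TRUE gives the reverse order
levels(fct_reorder(vowels$vowel, vowels$f1, .desc = TRUE))
#> [1] "a" "u" "i"
```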
lubridate
Dates are, in general, a mess. For example:
```r
library(lubridate)

# this will change
# every time the site rebuilds
time_now <- now()
time_now
#> [1] "2024-09-20 15:20:18 UTC"
```
That looks like a reasonable time for us to read with our eyes, but if we convert it to a numeric value:
```r
as.numeric(time_now)
#> [1] 1726845618
```
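That number is how many seconds have passed since midnight UTC on January 1, 1970 (the “Unix epoch”). Counting seconds from one fixed reference point is a simple, unambiguous way for computers to store dates and times, even if it’s unreadable for us.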
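The conversion also runs the other way: as_datetime() turns a count of seconds back into a date-time (in UTC by default). One wrinkle, shown in the sketch below, is that plain dates (no time of day) are stored as days since the epoch rather than seconds.

```r
library(lubridate)

# seconds since the epoch back into a readable date-time
as_datetime(1726845618)
#> [1] "2024-09-20 15:20:18 UTC"

# the epoch itself is second zero
as.numeric(ymd_hms("1970-01-01 00:00:00"))
#> [1] 0

# a Date counts *days* since the epoch, not seconds
as.numeric(ymd("2024-09-16"))
#> [1] 19982
```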
Creating and parsing dates
```r
# ISO 8601: year-month-day
ymd("2024-09-16")
#> [1] "2024-09-16"

# US: month-day-year
mdy("September 16, 2024")
#> [1] "2024-09-16"

# most of the world: day-month-year
dmy("16 September 2024")
#> [1] "2024-09-16"
```
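Once a date is parsed, lubridate can pull out its parts and do arithmetic with it. The sketch below also shows one of the ways dates are a mess: adding one month to January 31 asks for a February 31 that doesn’t exist, so plain + gives NA, while lubridate’s %m+% operator rolls back to the last real day of the month.

```r
library(lubridate)

d <- ymd("2024-09-16")

# pull out pieces of the date
year(d)
#> [1] 2024
month(d)
#> [1] 9

# weekday as a number; with week_start = 7, Sunday is 1
wday(d, week_start = 7)
#> [1] 2

# add a period of time
d + days(20)
#> [1] "2024-10-06"

# there is no February 31, so this is NA...
ymd("2024-01-31") + months(1)
#> [1] NA
# ...while %m+% rolls back to the end of the month
ymd("2024-01-31") %m+% months(1)
#> [1] "2024-02-29"
```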
dplyr
consecutive_id()
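If you have data with “streaks” in it, consecutive_id() (added in dplyr 1.1.0) returns a vector of IDs that goes up by one every time the value changes, so every element of the same streak shares an ID.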
```r
commits <- c(
  "Oakley",
  "Skyler", "Skyler",
  "Robin",
  "Oakley", "Oakley"
)
consecutive_id(commits)
#> [1] 1 2 2 3 4 4
```
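Once every streak has an ID, the usual grouping tools work on streaks instead of on values. A minimal sketch with the same made-up commit authors, stored in a tibble: summarising .by the streak ID gives each run’s author and length, whereas count(author) would lump Oakley’s two separate streaks into a single total of 3.

```r
library(tibble)
library(dplyr)

commits_df <- tibble(
  author = c("Oakley", "Skyler", "Skyler", "Robin", "Oakley", "Oakley")
)

streaks <- commits_df |>
  # a new ID every time the author changes
  mutate(streak = consecutive_id(author)) |>
  # one row per streak: who it was and how long it lasted
  summarise(
    .by = streak,
    author = first(author),
    n_commits = n()
  )

streaks
# authors: Oakley, Skyler, Robin, Oakley; n_commits: 1, 2, 1, 2
```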
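I’ve used this to re-create a readable, turn-by-turn transcript from force-aligned data. In the code below, transcript is a data frame of aligned intervals with tier, tmin, tmax, and text columns, where each speaker has their own “words” tier.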
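```r
library(dplyr)
library(tidyr)
library(stringr)

# keep only the word tiers (one per speaker)
transcript |>
  filter(
    str_detect(
      tier,
      "words"
    )
  ) ->
  words

words |>
  # drop the first interval in each tier
  dplyr::slice(
    .by = tier,
    -1
  ) |>
  # interleave all speakers' words in time order
  arrange(tmin) |>
  drop_na() |>
  mutate(
    # a new turn starts whenever the tier (speaker) changes
    turn = consecutive_id(tier),
    speaker = str_remove(tier, " - words")
  ) ->
  words_turn

head(words_turn)
```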
```r
library(gt)

# collapse each turn into one row of the transcript
words_turn |>
  summarise(
    .by = turn,
    speaker = first(speaker),
    start = min(tmin),
    end = max(tmax),
    phrase = str_flatten(text, collapse = " ")
  ) |>
  gt() |>
  fmt_number(
    columns = c(start, end),
    decimals = 3
  ) ->
  transcript_tbl

transcript_tbl
```
turn | speaker | start | end | phrase |
---|---|---|---|---|
1 | IVR | 0.672 | 10.832 | well uh mister scott i have a number of uh things i'd like to ask you about i wonder if you'd just mind uh answering questions uh one |
2 | KY25A | 10.702 | 11.002 | yeah |
3 | IVR | 10.832 | 13.352 | after another if you if |
4 | KY25A | 13.312 | 13.652 | well |
5 | IVR | 13.352 | 13.952 | i remind |
6 | KY25A | 13.652 | 14.972 | now you |
7 | IVR | 13.952 | 15.002 | you |
8 | KY25A | 15.002 | 15.312 | might |
9 | IVR | 15.212 | 15.312 | of |
10 | KY25A | 15.452 | 15.942 | start |
11 | IVR | 15.522 | 15.812 | uh |
12 | KY25A | 15.942 | 20.642 | that i was born in eighteen sixty seven |
13 | IVR | 21.922 | 24.712 | mhm and that makes you how old ninety |
14 | KY25A | 24.352 | 24.732 | nintey |
15 | IVR | 24.712 | 25.162 | three |
16 | KY25A | 24.732 | 25.172 | three |
Reuse
CC-BY 4.0
Citation
BibTeX citation:
```bibtex
@online{fruehwald,
  author = {Fruehwald, Josef},
  title = {Occasional Tidyverse Tasks},
  url = {https://lin611-2024.github.io/notes/side-notes/content/tidyverse-examples.html},
  langid = {en}
}
```
For attribution, please cite this work as:
Fruehwald, Josef. n.d. “Occasional Tidyverse Tasks.” https://lin611-2024.github.io/notes/side-notes/content/tidyverse-examples.html.