Importing, exporting, joining, and pivoting data

Unit 4

Goals for today

Name the basic commands for importing and exporting data
Identify commands for joining data
State the basic principles of the tidy data concept
List the {tidyr} commands for pivoting data

Importing and exporting data

Importing rectangular data

install.packages("readxl")
library(readxl)

Importing rectangular data

`readr`

Function	Delimiter
`read_csv()`	,
`read_csv2()`	;
`read_tsv()`	⇥ (tab)
`read_delim()`	User-defined

`readxl`

Function	File type
`read_excel()`	xls or xlsx
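
As a minimal sketch of how the delimiter decides which function fits, the following reads the same semicolon-separated text twice. The inline text and the column names `stadt` and `preis` are made up for illustration; `I()` tells readr to treat the string as literal data rather than as a file path.

```r
library(readr)

# Hypothetical semicolon-separated data with a decimal comma
text <- "stadt;preis\nBasel;1,5\nBern;2,25\n"

# read_csv2() assumes ";" as delimiter and "," as decimal mark
a <- read_csv2(I(text), show_col_types = FALSE)

# read_delim() needs both stated explicitly
b <- read_delim(
  I(text),
  delim = ";",
  locale = locale(decimal_mark = ","),
  show_col_types = FALSE
)
```

Both calls should yield a numeric `preis` column; `read_csv2()` is simply a shortcut for this common European convention.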

Exporting rectangular data

install.packages("writexl")
library(writexl)

Exporting rectangular data

`readr`

write_csv()
write_csv2()
write_tsv()
write_delim()

`writexl`

write_xlsx()

Reading data

df <- read_delim("data/ogd_10130.csv", delim = ";")
glimpse(df)

Rows: 1,487
Columns: 13
$ date               <date> 2024-11-01, 2024-10-01, 2024-09-01, 2024-08-01, 20…
$ `station/location` <chr> "BAS", "BAS", "BAS", "BAS", "BAS", "BAS", "BAS", "B…
$ station_name       <chr> "Basel / Binningen", "Basel / Binningen", "Basel / …
$ gre000m0           <dbl> 54, 80, 139, 247, 247, 216, 198, 161, 120, 69, 44, …
$ hto000m0           <dbl> 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, …
$ nto000m0           <dbl> 72, 76, 69, 48, 59, 76, 76, 80, 81, 83, 79, 80, 82,…
$ prestam0           <dbl> 984.7, 980.0, 978.5, 979.3, 979.0, 977.6, 976.4, 97…
$ rre150m0           <dbl> 53.5, 81.4, 81.2, 25.8, 62.2, 76.4, 166.7, 39.3, 78…
$ sre000m0           <dbl> 5349, 4917, 7705, 17098, 13519, 9151, 9188, 7279, 6…
$ tre200m0           <dbl> 6.2, 12.9, 15.6, 22.2, 21.0, 18.7, 14.9, 11.1, 9.3,…
$ tre200mn           <dbl> -4.4, 4.8, 3.8, 10.3, 12.6, 8.3, 6.3, 0.4, 1.4, 0.3…
$ tre200mx           <dbl> 16.1, 21.8, 32.1, 35.4, 34.5, 32.2, 27.1, 28.8, 20.…
$ ure200m0           <dbl> 84.3, 85.8, 78.2, 68.7, 70.5, 73.4, 75.0, 68.6, 74.…

Writing data

fussball_weltmeister

# A tibble: 10 × 3
    jahr weltmeisterschaft titeltraeger
   <int> <chr>             <chr>       
 1  2023 Frauen            Spanien     
 2  2022 Männer            Argentinien 
 3  2019 Frauen            USA         
 4  2018 Männer            Frankreich  
 5  2015 Frauen            USA         
 6  2014 Männer            Deutschland 
 7  2011 Frauen            Japan       
 8  2010 Männer            Spanien     
 9  2007 Frauen            Deutschland 
10  2006 Männer            Italien

write_csv(x = fussball_weltmeister, file = "data/fussball_weltmeister.csv")

Reading the data back in

read_csv("data/fussball_weltmeister.csv")

# A tibble: 10 × 3
    jahr weltmeisterschaft titeltraeger
   <dbl> <chr>             <chr>       
 1  2023 Frauen            Spanien     
 2  2022 Männer            Argentinien 
 3  2019 Frauen            USA         
 4  2018 Männer            Frankreich  
 5  2015 Frauen            USA         
 6  2014 Männer            Deutschland 
 7  2011 Frauen            Japan       
 8  2010 Männer            Spanien     
 9  2007 Frauen            Deutschland 
10  2006 Männer            Italien
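
Note that `jahr` was an `<int>` before writing and comes back as `<dbl>`: a CSV file stores no type information, so `read_csv()` has to guess. One way to pin the type is the `col_types` argument. The sketch below uses a two-row excerpt of the table and a temporary file, both chosen only for illustration.

```r
library(readr)
library(tibble)

# Two-row excerpt of the table above, with jahr stored as integer
weltmeister <- tibble(
  jahr = c(2023L, 2022L),
  titeltraeger = c("Spanien", "Argentinien")
)

# Write to a temporary file (illustrative path, not the course's data/ folder)
pfad <- tempfile(fileext = ".csv")
write_csv(weltmeister, pfad)

# State the column types explicitly instead of letting readr guess
wieder <- read_csv(
  pfad,
  col_types = cols(
    jahr = col_integer(),
    titeltraeger = col_character()
  )
)
```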

`rio`

import(), export(), convert()

install.packages("rio")
library(rio)

x <- import("mtcars.csv") 
y <- import("mtcars.rds") # R data format 
z <- import("mtcars.sav") # SPSS
u <- import("mtcars.xlsx")
w <- import("mtcars.json")

export(mtcars, "mtcars.csv") # first argument is the object to write
export(list(mtcars = mtcars, penguins = penguins), "mtcars-penguins.xlsx") # multiple sheets, named after the list elements
import("mtcars-penguins.xlsx", which = "penguins") # select one sheet
import_list("mtcars-penguins.xlsx") # import multiple objects

Other formats

readRDS() and saveRDS()

Saving intermediate results as CSV is unreliable when specific variable types need to be preserved
read_csv() cannot know which levels a factor variable has
Alternative: RDS files, R's native file format
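
To illustrate the difference, here is a small sketch with a made-up factor column (`groesse`, including an unused level) written once as RDS and once as CSV to temporary files.

```r
library(readr)

# Hypothetical data: a factor with three levels, one of them unused
df <- data.frame(
  groesse = factor(c("klein", "gross"), levels = c("klein", "mittel", "gross"))
)

rds_pfad <- tempfile(fileext = ".rds")
csv_pfad <- tempfile(fileext = ".csv")

saveRDS(df, rds_pfad)   # base R: stores the object including its attributes
write_csv(df, csv_pfad) # readr: stores plain text only

aus_rds <- readRDS(rds_pfad)
aus_csv <- read_csv(csv_pfad, show_col_types = FALSE)

levels(aus_rds$groesse) # all three levels survive, in the original order
class(aus_csv$groesse)  # the CSV round trip returns a character column
```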

Variable names

wetter <- read_delim("data/ogd_12030.csv")
names(wetter)

 [1] "Datum"                       "Jahr"                       
 [3] "Globalstrahlung in W/m2"     "Gesamtschneemenge"          
 [5] "Gesamtbewölkung"             "Luftdruck in hPa"           
 [7] "Niederschlag"                "Sonnenscheindauer"          
 [9] "Tagesmittel Lufttemperatur"  "Tagesminimum Lufttemperatur"
[11] "Tagesmaximum Lufttemperatur" "Relative Luftfeuchtigkeit"

wetter |> 
  filter(Globalstrahlung in W/m2 > 111)

Error in parse(text = input): <text>:2:26: unexpected 'in'
1: wetter |> 
2:   filter(Globalstrahlung in
                            ^

Variable names - backticks `

wetter |> 
  filter(`Globalstrahlung in W/m2` > 111)

# A tibble: 8,013 × 12
   Datum       Jahr `Globalstrahlung in W/m2` Gesamtschneemenge Gesamtbewölkung
   <date>     <dbl>                     <dbl>             <dbl>           <dbl>
 1 2001-02-15  2001                       113                 0              17
 2 2001-02-19  2001                       117                 0              42
 3 2001-02-26  2001                       113                 0              13
 4 2001-02-27  2001                       118                 0              88
 5 2001-03-24  2001                       122                 0              88
 6 2001-03-25  2001                       112                 0              92
 7 2001-04-02  2001                       218                 0              83
 8 2001-04-05  2001                       188                 0              50
 9 2001-04-13  2001                       203                 0              63
10 2001-04-14  2001                       160                 0              79
# ℹ 8,003 more rows
# ℹ 7 more variables: `Luftdruck in hPa` <dbl>, Niederschlag <dbl>,
#   Sonnenscheindauer <dbl>, `Tagesmittel Lufttemperatur` <dbl>,
#   `Tagesminimum Lufttemperatur` <dbl>, `Tagesmaximum Lufttemperatur` <dbl>,
#   `Relative Luftfeuchtigkeit` <dbl>

Tedious

Variable names - `{readr}` function

wetter <- read_delim(
  "data/ogd_12030.csv",
  skip = 1, # skip the original header row, which col_names replaces
  col_names = c(
    "datum",
    "jahr",
    "globalstrahlung_in_w_m2",
    "gesamtschneemenge",
    "gesamtbewolkung",
    "luftdruck_in_h_pa",
    "niederschlag",
    "sonnenscheindauer",
    "tagesmittel_lufttemperatur",
    "tagesminimum_lufttemperatur",
    "tagesmaximum_lufttemperatur",
    "relative_luftfeuchtigkeit"
  )
)
names(wetter)

 [1] "datum"                       "jahr"                       
 [3] "globalstrahlung_in_w_m2"     "gesamtschneemenge"          
 [5] "gesamtbewolkung"             "luftdruck_in_h_pa"          
 [7] "niederschlag"                "sonnenscheindauer"          
 [9] "tagesmittel_lufttemperatur"  "tagesminimum_lufttemperatur"
[11] "tagesmaximum_lufttemperatur" "relative_luftfeuchtigkeit"

Also tedious

Variable names - `{janitor}`

# install.packages("janitor")
library(janitor)

wetter <- read_delim("data/ogd_12030.csv")

wetter |> 
  clean_names() |> 
  names()

 [1] "datum"                       "jahr"                       
 [3] "globalstrahlung_in_w_m2"     "gesamtschneemenge"          
 [5] "gesamtbewolkung"             "luftdruck_in_h_pa"          
 [7] "niederschlag"                "sonnenscheindauer"          
 [9] "tagesmittel_lufttemperatur"  "tagesminimum_lufttemperatur"
[11] "tagesmaximum_lufttemperatur" "relative_luftfeuchtigkeit"
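
Once the names are cleaned, the `filter()` call that failed earlier works without backticks. A minimal sketch, using a made-up two-row stand-in (`wetter_demo`) for the weather data so that it runs without the CSV file:

```r
library(dplyr)
library(janitor)

# Hypothetical stand-in for the weather data, with non-syntactic column names
wetter_demo <- tibble(
  `Globalstrahlung in W/m2` = c(100, 120),
  `Luftdruck in hPa` = c(980, 975)
)

bereinigt <- wetter_demo |>
  clean_names()

# No backticks needed after clean_names()
sonnig <- bereinigt |>
  filter(globalstrahlung_in_w_m2 > 111)
```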

Practical 04a: Importing and exporting data

prak-04a-import-export.qmd

30:00

Break ☕ 🍵 🍜

10:00

Data transformation with `dplyr`

Rows: select, arrange

Columns: select, arrange, rename, create

Groups: summarize, count

Tables: join

Joining data with `dplyr`

We…

have several data frames

want to bring them together

Data: Women in science

Ada Lovelace, Marie Curie, Janaki Ammal, Chien-Shiung Wu, Katherine Johnson, Rosalind Franklin, Vera Rubin, Gladys West, Flossie Wong-Staal, Jennifer Doudna

name	profession
Ada Lovelace	Mathematician
Marie Curie	Physicist and Chemist
Janaki Ammal	Botanist
Chien-Shiung Wu	Physicist
Katherine Johnson	Mathematician
Rosalind Franklin	Chemist
Vera Rubin	Astronomer
Gladys West	Mathematician
Flossie Wong-Staal	Virologist and Molecular Biologist
Jennifer Doudna	Biochemist

name	birth_year	death_year
Janaki Ammal	1897	1984
Chien-Shiung Wu	1912	1997
Katherine Johnson	1918	2020
Rosalind Franklin	1920	1958
Vera Rubin	1928	2016
Gladys West	1930	NA
Flossie Wong-Staal	1947	NA
Jennifer Doudna	1964	NA

name	known_for
Ada Lovelace	first computer algorithm
Marie Curie	theory of radioactivity, discovery of elements polonium and radium, first woman to win a Nobel Prize
Janaki Ammal	hybrid species, biodiversity protection
Chien-Shiung Wu	confim and refine theory of radioactive beta decy, Wu experiment overturning theory of parity
Katherine Johnson	calculations of orbital mechanics critical to sending the first Americans into space
Vera Rubin	existence of dark matter
Gladys West	mathematical modeling of the shape of the Earth which served as the foundation of GPS technology
Flossie Wong-Staal	first scientist to clone HIV and create a map of its genes which led to a test for the virus
Jennifer Doudna	one of the primary developers of CRISPR, a ground-breaking technology for editing genomes

Desired output

name	profession	birth_year	death_year	known_for
Ada Lovelace	Mathematician	NA	NA	first computer algorithm
Marie Curie	Physicist and Chemist	NA	NA	theory of radioactivity, discovery of elements polonium and radium, first woman to win a Nobel Prize
Janaki Ammal	Botanist	1897	1984	hybrid species, biodiversity protection
Chien-Shiung Wu	Physicist	1912	1997	confim and refine theory of radioactive beta decy, Wu experiment overturning theory of parity
Katherine Johnson	Mathematician	1918	2020	calculations of orbital mechanics critical to sending the first Americans into space
Rosalind Franklin	Chemist	1920	1958	NA
Vera Rubin	Astronomer	1928	2016	existence of dark matter
Gladys West	Mathematician	1930	NA	mathematical modeling of the shape of the Earth which served as the foundation of GPS technology
Flossie Wong-Staal	Virologist and Molecular Biologist	1947	NA	first scientist to clone HIV and create a map of its genes which led to a test for the virus
Jennifer Doudna	Biochemist	1964	NA	one of the primary developers of CRISPR, a ground-breaking technology for editing genomes

Inputs: three data frames

names(professions)

[1] "name"       "profession"

names(dates)

[1] "name"       "birth_year" "death_year"

names(works)

[1] "name"      "known_for"

nrow(professions)

[1] 10

nrow(dates)

[1] 8

nrow(works)

[1] 9

Joining data frames

***_join(x, y)

left_join(x, y): all rows from x
right_join(x, y): all rows from y
full_join(x, y): all rows from x and y
inner_join(x, y): only rows with a match in both x and y
semi_join(x, y): rows from x that have a match in y, keeping only the columns of x
anti_join(x, y): rows from x without a match in y

Example

For the next slides

# A tibble: 3 × 2
     id var_x
  <dbl> <chr>
1     1 x1   
2     2 x2   
3     3 x3

# A tibble: 3 × 2
     id var_y
  <dbl> <chr>
1     1 y1   
2     2 y2   
3     4 y4

`left_join()`

left_join(x, y)

# A tibble: 3 × 3
     id var_x var_y
  <dbl> <chr> <chr>
1     1 x1    y1   
2     2 x2    y2   
3     3 x3    <NA>

`left_join()`

professions |>
  left_join(dates)

# A tibble: 10 × 4
   name               profession                         birth_year death_year
   <chr>              <chr>                                   <dbl>      <dbl>
 1 Ada Lovelace       Mathematician                              NA         NA
 2 Marie Curie        Physicist and Chemist                      NA         NA
 3 Janaki Ammal       Botanist                                 1897       1984
 4 Chien-Shiung Wu    Physicist                                1912       1997
 5 Katherine Johnson  Mathematician                            1918       2020
 6 Rosalind Franklin  Chemist                                  1920       1958
 7 Vera Rubin         Astronomer                               1928       2016
 8 Gladys West        Mathematician                            1930         NA
 9 Flossie Wong-Staal Virologist and Molecular Biologist       1947         NA
10 Jennifer Doudna    Biochemist                               1964         NA

`right_join()`

right_join(x, y)

# A tibble: 3 × 3
     id var_x var_y
  <dbl> <chr> <chr>
1     1 x1    y1   
2     2 x2    y2   
3     4 <NA>  y4

`right_join()`

professions |>
  right_join(dates)

# A tibble: 8 × 4
  name               profession                         birth_year death_year
  <chr>              <chr>                                   <dbl>      <dbl>
1 Janaki Ammal       Botanist                                 1897       1984
2 Chien-Shiung Wu    Physicist                                1912       1997
3 Katherine Johnson  Mathematician                            1918       2020
4 Rosalind Franklin  Chemist                                  1920       1958
5 Vera Rubin         Astronomer                               1928       2016
6 Gladys West        Mathematician                            1930         NA
7 Flossie Wong-Staal Virologist and Molecular Biologist       1947         NA
8 Jennifer Doudna    Biochemist                               1964         NA

`full_join()`

full_join(x, y)

# A tibble: 4 × 3
     id var_x var_y
  <dbl> <chr> <chr>
1     1 x1    y1   
2     2 x2    y2   
3     3 x3    <NA> 
4     4 <NA>  y4

`full_join()`

dates |>
  full_join(works)

# A tibble: 10 × 4
   name               birth_year death_year known_for                           
   <chr>                   <dbl>      <dbl> <chr>                               
 1 Janaki Ammal             1897       1984 hybrid species, biodiversity protec…
 2 Chien-Shiung Wu          1912       1997 confim and refine theory of radioac…
 3 Katherine Johnson        1918       2020 calculations of orbital mechanics c…
 4 Rosalind Franklin        1920       1958 <NA>                                
 5 Vera Rubin               1928       2016 existence of dark matter            
 6 Gladys West              1930         NA mathematical modeling of the shape …
 7 Flossie Wong-Staal       1947         NA first scientist to clone HIV and cr…
 8 Jennifer Doudna          1964         NA one of the primary developers of CR…
 9 Ada Lovelace               NA         NA first computer algorithm            
10 Marie Curie                NA         NA theory of radioactivity,  discovery…
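
The remaining joins from the list (`inner_join()`, `semi_join()`, `anti_join()`) are not shown on the slides. A short sketch using the same `x` and `y` tibbles as in the example, rebuilt here so the code runs on its own:

```r
library(dplyr)

# The two example tibbles from the slides
x <- tibble(id = c(1, 2, 3), var_x = c("x1", "x2", "x3"))
y <- tibble(id = c(1, 2, 4), var_y = c("y1", "y2", "y4"))

# Rows whose id appears in both tables, with columns from both
inner <- inner_join(x, y, by = "id")

# Rows of x that have a match in y; columns of y are not added
semi <- semi_join(x, y, by = "id")

# Rows of x that have no match in y
anti <- anti_join(x, y, by = "id")
```

`semi_join()` and `anti_join()` are filtering joins: they never add columns, which makes `anti_join()` handy for finding records that are missing from a second table.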

Everything in a single pipeline

professions |>
  left_join(dates) |>
  left_join(works)

# A tibble: 10 × 5
   name               profession                 birth_year death_year known_for
   <chr>              <chr>                           <dbl>      <dbl> <chr>    
 1 Ada Lovelace       Mathematician                      NA         NA first co…
 2 Marie Curie        Physicist and Chemist              NA         NA theory o…
 3 Janaki Ammal       Botanist                         1897       1984 hybrid s…
 4 Chien-Shiung Wu    Physicist                        1912       1997 confim a…
 5 Katherine Johnson  Mathematician                    1918       2020 calculat…
 6 Rosalind Franklin  Chemist                          1920       1958 <NA>     
 7 Vera Rubin         Astronomer                       1928       2016 existenc…
 8 Gladys West        Mathematician                    1930         NA mathemat…
 9 Flossie Wong-Staal Virologist and Molecular …       1947         NA first sc…
10 Jennifer Doudna    Biochemist                       1964         NA one of t…
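
When no key is given, dplyr joins on all columns the two tables share and prints a message naming them. Stating the key explicitly makes the intent visible and silences that message. A sketch with a two-person excerpt of the three data frames, rebuilt by hand here as an assumption so the code is self-contained:

```r
library(dplyr)

# Hand-built excerpts of the three input tables (illustration only)
professions <- tibble(
  name = c("Ada Lovelace", "Janaki Ammal"),
  profession = c("Mathematician", "Botanist")
)
dates <- tibble(
  name = "Janaki Ammal",
  birth_year = 1897,
  death_year = 1984
)
works <- tibble(
  name = c("Ada Lovelace", "Janaki Ammal"),
  known_for = c("first computer algorithm", "hybrid species, biodiversity protection")
)

# Same pipeline as above, with the join key spelled out
scientists <- professions |>
  left_join(dates, by = join_by(name)) |>
  left_join(works, by = join_by(name))
```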

`join_by()`

mitarbeiter <- tibble(
  id = c(1, 2, 3),
  name = c("Alice", "Bob", "Charlie")
)
mitarbeiter

# A tibble: 3 × 2
     id name   
  <dbl> <chr>  
1     1 Alice  
2     2 Bob    
3     3 Charlie

gehälter <- tibble(
  persid = c(1, 2, 4),
  gehalt = c(50000, 60000, 70000)
)
gehälter

# A tibble: 3 × 2
  persid gehalt
   <dbl>  <dbl>
1      1  50000
2      2  60000
3      4  70000

`join_by()`

ergebnis <- mitarbeiter |>
  left_join(gehälter, join_by(id == persid))
ergebnis

# A tibble: 3 × 3
     id name    gehalt
  <dbl> <chr>    <dbl>
1     1 Alice    50000
2     2 Bob      60000
3     3 Charlie     NA

Practical 04b: Joining data

prak-04b-join-firmen.qmd

20:00

Break ☕ 🍵 🍜

10:00

Tidy Data

“Happy families are all alike; every unhappy family is unhappy in its own way.” – Leo Tolstoy

“Tidy datasets are all alike, but every messy dataset is messy in its own way.” – Hadley Wickham

tidy = orderly, clean, neat.

Tidy Data

Every variable must have its own column
Every observation must have its own row
Every value must have its own cell

❓

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
Adelie	Torgersen	39.1	18.7	181	3750	male	2007
Adelie	Torgersen	39.5	17.4	186	3800	female	2007
Adelie	Torgersen	40.3	18.0	195	3250	female	2007
Adelie	Torgersen	NA	NA	NA	NA	NA	2007
Adelie	Torgersen	36.7	19.3	193	3450	female	2007
Adelie	Torgersen	39.3	20.6	190	3650	male	2007
Adelie	Torgersen	38.9	17.8	181	3625	female	2007
Adelie	Torgersen	39.2	19.6	195	4675	male	2007
Adelie	Torgersen	34.1	18.1	193	3475	NA	2007
Adelie	Torgersen	42.0	20.2	190	4250	NA	2007
Adelie	Torgersen	37.8	17.1	186	3300	NA	2007
Adelie	Torgersen	37.8	17.3	180	3700	NA	2007
Adelie	Torgersen	41.1	17.6	182	3200	female	2007
Adelie	Torgersen	38.6	21.2	191	3800	male	2007

❓

Variable year as a row

❓

Variable year as a row

Row as a summary (mean)

❓

Variable year as a row

Row as a summary (mean)

3 columns for one variable

Tidying data with `tidyr`

Reshaping/pivoting data (widening, lengthening)

Splitting cells
Handling missing values (NA)

Pivoting data

Not this…

but this!

Pivoting data

wide |>
  pivot_longer(
    cols = x:z,
    names_to = "key",
    values_to = "val"
  )

long |>
  pivot_wider(
    names_from = key,
    values_from = val
  )
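
The two calls above are inverses of each other. A self-contained sketch of the round trip, with a small made-up `wide` table (the `id` column and the cell values are assumptions for illustration):

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per id, one column per key
wide <- tibble(
  id = 1:2,
  x = c("a", "b"),
  y = c("c", "d"),
  z = c("e", "f")
)

# Lengthen: column names x, y, z move into `key`, cell values into `val`
long <- wide |>
  pivot_longer(
    cols = x:z,
    names_to = "key",
    values_to = "val"
  )

# Widen again: `key` supplies the column names, `val` the cell values
wide_again <- long |>
  pivot_wider(
    names_from = key,
    values_from = val
  )
```

Note that `names_to` and `values_to` take strings because they create new columns, while `names_from` and `values_from` take bare names because they refer to existing ones.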

country	year	cases	population
Afghanistan	1999	745	19987071
Afghanistan	2000	2666	20595360
Brazil	1999	37737	172006362
Brazil	2000	80488	174504898
China	1999	212258	1272915272
China	2000	213766	1280428583

country	type	1999	2000
Afghanistan	cases	745	2666
Afghanistan	population	19987071	20595360
Brazil	cases	37737	80488
Brazil	population	172006362	174504898
China	cases	212258	213766
China	population	1272915272	1280428583

country	year	type	count
Afghanistan	1999	cases	745
Afghanistan	1999	population	19987071
Afghanistan	2000	cases	2666
Afghanistan	2000	population	20595360
Brazil	1999	cases	37737
Brazil	1999	population	172006362
Brazil	2000	cases	80488
Brazil	2000	population	174504898
China	1999	cases	212258
China	1999	population	1272915272
China	2000	cases	213766
China	2000	population	1280428583

country	year	rate
Afghanistan	1999	745/19987071
Afghanistan	2000	2666/20595360
Brazil	1999	37737/172006362
Brazil	2000	80488/174504898
China	1999	212258/1272915272
China	2000	213766/1280428583

country	year	cases	population
Afghanistan	1999	745	19987071
Afghanistan	2000	2666	20595360
Brazil	1999	37737	172006362
Brazil	2000	80488	174504898
China	1999	212258	1272915272
China	2000	213766	1280428583

table1 |>
  mutate(rate = cases / population * 10000)

# A tibble: 6 × 5
  country      year  cases population  rate
  <chr>       <dbl>  <dbl>      <dbl> <dbl>
1 Afghanistan  1999    745   19987071 0.373
2 Afghanistan  2000   2666   20595360 1.29 
3 Brazil       1999  37737  172006362 2.19 
4 Brazil       2000  80488  174504898 4.61 
5 China        1999 212258 1272915272 1.67 
6 China        2000 213766 1280428583 1.67

country	year	type	count
Afghanistan	1999	cases	745
Afghanistan	1999	population	19987071
Afghanistan	2000	cases	2666
Afghanistan	2000	population	20595360
Brazil	1999	cases	37737
Brazil	1999	population	172006362
Brazil	2000	cases	80488
Brazil	2000	population	174504898
China	1999	cases	212258
China	1999	population	1272915272
China	2000	cases	213766
China	2000	population	1280428583

table2 |>
  pivot_wider(
    names_from = type,
    values_from = count
  ) |>
  mutate(rate = cases / population * 10000)

# A tibble: 6 × 5
  country      year  cases population  rate
  <chr>       <dbl>  <dbl>      <dbl> <dbl>
1 Afghanistan  1999    745   19987071 0.373
2 Afghanistan  2000   2666   20595360 1.29 
3 Brazil       1999  37737  172006362 2.19 
4 Brazil       2000  80488  174504898 4.61 
5 China        1999 212258 1272915272 1.67 
6 China        2000 213766 1280428583 1.67

country	year	rate
Afghanistan	1999	745/19987071
Afghanistan	2000	2666/20595360
Brazil	1999	37737/172006362
Brazil	2000	80488/174504898
China	1999	212258/1272915272
China	2000	213766/1280428583

table3 |>
  separate_wider_delim(
    cols = rate,
    delim = "/",
    names = c("cases", "population")
  ) |>
  mutate(
    # separate_wider_delim() outputs character
    cases = as.numeric(cases),
    population = as.numeric(population),
    rate = cases / population * 10000
  )

# A tibble: 6 × 5
  country      year  cases population  rate
  <chr>       <dbl>  <dbl>      <dbl> <dbl>
1 Afghanistan  1999    745   19987071 0.373
2 Afghanistan  2000   2666   20595360 1.29 
3 Brazil       1999  37737  172006362 2.19 
4 Brazil       2000  80488  174504898 4.61 
5 China        1999 212258 1272915272 1.67 
6 China        2000 213766 1280428583 1.67

country	type	1999	2000
Afghanistan	cases	745	2666
Afghanistan	population	19987071	20595360
Brazil	cases	37737	80488
Brazil	population	172006362	174504898
China	cases	212258	213766
China	population	1272915272	1280428583

❓

table4 |> 
  pivot_longer(cols = `1999`:`2000`, names_to = "year")

# A tibble: 12 × 4
   country     type       year       value
   <chr>       <chr>      <chr>      <dbl>
 1 Afghanistan cases      1999         745
 2 Afghanistan cases      2000        2666
 3 Afghanistan population 1999    19987071
 4 Afghanistan population 2000    20595360
 5 Brazil      cases      1999       37737
 6 Brazil      cases      2000       80488
 7 Brazil      population 1999   172006362
 8 Brazil      population 2000   174504898
 9 China       cases      1999      212258
10 China       cases      2000      213766
11 China       population 1999  1272915272
12 China       population 2000  1280428583

country	type	1999	2000
Afghanistan	cases	745	2666
Afghanistan	population	19987071	20595360
Brazil	cases	37737	80488
Brazil	population	172006362	174504898
China	cases	212258	213766
China	population	1272915272	1280428583

table4 |> 
  pivot_longer(cols = `1999`:`2000`, names_to = "year") |> 
  pivot_wider(names_from = type, values_from = value) |> 
  mutate(rate = cases / population * 10000)

# A tibble: 6 × 5
  country     year   cases population  rate
  <chr>       <chr>  <dbl>      <dbl> <dbl>
1 Afghanistan 1999     745   19987071 0.373
2 Afghanistan 2000    2666   20595360 1.29 
3 Brazil      1999   37737  172006362 2.19 
4 Brazil      2000   80488  174504898 4.61 
5 China       1999  212258 1272915272 1.67 
6 China       2000  213766 1280428583 1.67
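
In the result above, `year` is a character column, because `pivot_longer()` takes the values from column names. If a numeric year is wanted, one option is the `names_transform` argument. A sketch using only the Afghanistan rows of the table, typed in by hand so the code is self-contained:

```r
library(tidyr)
library(dplyr)

# Afghanistan rows of the table above, rebuilt by hand
table4 <- tibble(
  country = c("Afghanistan", "Afghanistan"),
  type = c("cases", "population"),
  `1999` = c(745, 19987071),
  `2000` = c(2666, 20595360)
)

tidy <- table4 |>
  pivot_longer(
    cols = `1999`:`2000`,
    names_to = "year",
    names_transform = list(year = as.integer) # convert "1999" to 1999L
  ) |>
  pivot_wider(names_from = type, values_from = value) |>
  mutate(rate = cases / population * 10000)
```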

Practical 04c: Pivoting data

prak-04c-pivot.qmd

30:00

Workflow: Code style

R (1993) \(\leftarrow\) S (1976) \(\leftarrow\) APL (1962)

The tidyverse Style Guide

“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.” – Hadley Wickham

# Good
df |>
  mutate(
    sum_xy = x + y,
    prod_xy = x * y
  ) |>
  arrange(sum_xy)

# Bad
df|>mutate(  sum_xy=x+y,prod_xy=x*y)|>arrange( sum_xy )

`styler`

install.packages("styler")
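
Once installed, styler can reformat code automatically. A minimal sketch, assuming a styler version recent enough to handle the native pipe `|>`: `style_text()` takes code as a string and returns the restyled lines. The input is the "Bad" example from the previous slide.

```r
library(styler)

# The badly formatted example, passed as a string
bad <- "df|>mutate(  sum_xy=x+y,prod_xy=x*y)|>arrange( sum_xy )"

# Restyle according to the tidyverse style guide (styler's default)
restyled <- style_text(bad)
print(restyled)
```

For whole files, `style_file()` applies the same rules to a script or Quarto document on disk; in RStudio the same functionality is available through the Addins menu.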

Practical: Code style

Reformat the following code in a new Quarto file:

library( palmerpenguins )
library(tidyverse   )

penguins|>filter( species=="Adelie" )|>group_by(island)|>summarize(n=n(),mean_bill=
mean(bill_length_mm,na.rm=TRUE))|>filter(n>10)

penguins|>filter(   species=="Chinstrap",island%in%c("Dream","Biscoe"),flipper_length_mm>190,
body_mass_g<4000)|>group_by(sex)|>summarize(
mean_mass=mean(body_mass_g,na.rm=TRUE),count=n())|>filter(count>5)

10:00

Danke! 🌔

Slides created via revealjs and Quarto.

Access slides as PDF.

All material is licensed under Creative Commons Attribution Share Alike 4.0 International.
