Data analysis II

Webscraping: Exercise

Laurent Bergé

University of Bordeaux, BxSE

Fall 2022

Exercise

Use the Google doc to obtain all the addresses of the websites created in class.

For all the webpages:

  • extract the content of the highest-level header. (h1 is the highest level. If h1 is missing, take h2, and so on. If the page has no header at all, report that the header is missing. If several headers share that level, keep the content of each one.)
  • format the result as a data.frame with one column for the website URL and one column for the header content.
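
The header-selection rule above can be sketched as follows. This is only an illustration of the logic, written in Python with the standard library. The exercise asks for a data.frame, so the actual solution is presumably expected in R. The names `HeaderCollector`, `highest_headers` and `header_table` are hypothetical, the function takes HTML that has already been downloaded, and a missing header is encoded as `None` by assumption.

```python
from html.parser import HTMLParser

# Header tags ordered from highest to lowest level.
HEADER_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class HeaderCollector(HTMLParser):
    """Collect the text of every h1..h6 element, grouped by tag name."""

    def __init__(self):
        super().__init__()
        self.found = {tag: [] for tag in HEADER_TAGS}
        self._current = None  # header tag currently open, if any
        self._buffer = []     # text pieces seen inside that header

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; ignore headers nested in headers.
        if tag in HEADER_TAGS and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            self.found[tag].append(text)
            self._current = None


def highest_headers(html):
    """Return the texts of all headers at the highest level present."""
    parser = HeaderCollector()
    parser.feed(html)
    parser.close()
    for tag in HEADER_TAGS:
        if parser.found[tag]:
            return parser.found[tag]
    return []  # no header on the page


def header_table(pages):
    """Build one row per header from a dict mapping URL to HTML source.

    A page without any header yields a single row whose header is None
    (an assumed encoding of "header is missing").
    """
    rows = []
    for url, html in pages.items():
        headers = highest_headers(html)
        if not headers:
            rows.append({"url": url, "header": None})
        for text in headers:
            rows.append({"url": url, "header": text})
    return rows
```

Each row holds one URL and one header, so a page with two h1 elements contributes two rows. The same long layout maps directly onto a two-column data.frame.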