class: center, middle, inverse, title-slide

.title[
# Data analysis II
]
.subtitle[
## Webscraping: Exercise
]
.author[
### Laurent Bergé
]
.institute[
### University of Bordeaux, BxSE
]
.date[
### Fall 2022
]

---

# Exercise

Use the [Google doc](https://docs.google.com/document/d/1lp7zB1Ael7iYAUTUmB-u2bl7AXqVeggtRB97xU2mFrs/edit) to obtain the addresses of all the websites created in class.

For each webpage:

- extract the content of the highest-level header. (**h1** is the highest level. If **h1** is missing, take **h2**, and so on. If the page has no header at all, report that the header is missing. If there are several headers at that level, keep the content of all of them.)
- store the results in a `data.frame` with one column for the website URL and one column for the content of the header.
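
---

# Exercise: one possible sketch

The header-extraction step can be sketched as follows. This is a minimal sketch, not a reference solution. It assumes the `rvest` package is installed. The function names `highest_header` and `headers_for_url` are hypothetical, as are the example URLs and page contents, which are built in memory so that the code runs without network access. A missing header is reported as `NA`.

```r
library(rvest)

# Return the text of the highest-level header(s) of a parsed page.
# Levels are tried from h1 down to h6; all headers found at the
# first non-empty level are returned. NA means "no header at all".
highest_header <- function(page) {
  for (tag in paste0("h", 1:6)) {
    nodes <- html_elements(page, tag)
    if (length(nodes) > 0) {
      return(html_text2(nodes))
    }
  }
  NA_character_
}

# One row per header found (the URL is recycled across rows).
# `page` defaults to downloading the URL; passing it explicitly
# allows offline use.
headers_for_url <- function(url, page = read_html(url)) {
  data.frame(url = url, header = highest_header(page))
}

# Offline demo on in-memory pages (hypothetical URLs and content)
pages <- list(
  "https://example.org/a" = minimal_html("<h2>First</h2><h2>Second</h2><h3>Ignored</h3>"),
  "https://example.org/b" = minimal_html("<p>No header here</p>")
)

res <- do.call(rbind, Map(headers_for_url, names(pages), pages))
rownames(res) <- NULL
print(res)
```

With the real addresses stored in a character vector `urls` (to be filled in from the Google doc), the same helper could be applied with `do.call(rbind, lapply(urls, headers_for_url))`. Pages that fail to load would raise an error here, so wrapping `read_html()` in `tryCatch()` may be needed.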