Web scraping

Dr. Mine Dogucu

A Brief Introduction to HTML & CSS

Hypertext Markup Language

Cascading Style Sheets

An ugly web page

HTML document outline

Paragraphs

<a href="https://www.r-project.org/">R</a>

HTML tag: <a> </a>

Attribute (name): href

Attribute (value): https://www.r-project.org/

Content: R
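
The parts labelled above can also be read back programmatically. The following is a minimal sketch, assuming the rvest package (version 1.0 or later) is installed; the object names `doc`, `link`, `tag_name`, `href_value`, and `content` are illustrative and not part of the original slides.

```r
library(rvest)

# Parse a one-element HTML fragment; minimal_html() wraps it in a full document.
doc <- minimal_html('<a href="https://www.r-project.org/">R</a>')

# Select the first <a> element in the document.
link <- html_element(doc, "a")

tag_name   <- html_name(link)          # the HTML tag
href_value <- html_attr(link, "href")  # the value of the href attribute
content    <- html_text(link)          # the content between the tags
```

Each helper returns one of the pieces from the anatomy above: the tag, the attribute value, and the content.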

Spans

Styling

Web Scraping

library(rvest)
library(tidyverse)

The rvest package has several functions that help with web scraping.

robotstxt::paths_allowed("http://www.criterion.com")
[1] TRUE
robotstxt::paths_allowed("http://www.facebook.com")
[1] FALSE

What we see

What we want

The first five rows of the .csv file we would like to produce.

Selector Gadget

The Selector Gadget browser extension can help us find the CSS classes of the elements we want to scrape.
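
Once a class name is known, it can be used as a CSS selector by prefixing it with a dot. The sketch below runs on a small hypothetical HTML fragment rather than a live page; the class name `g-title` and the placeholder titles are assumptions made for illustration. It uses `html_elements()`, which is the current rvest (1.0 and later) name for `html_nodes()`.

```r
library(rvest)

# Hypothetical fragment: the class "g-title" and the titles are placeholders.
doc <- minimal_html('
  <table>
    <tr><td class="g-spine">1</td><td class="g-title">Title A</td></tr>
    <tr><td class="g-spine">2</td><td class="g-title">Title B</td></tr>
  </table>
')

# ".g-title" matches every element whose class attribute includes g-title.
titles <- doc |>
  html_elements(".g-title") |>
  html_text()
```

The same pattern works for any class that Selector Gadget reports: swap in the class name and keep the leading dot.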

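We should write scraping code in .R scripts rather than .qmd files, since rendering a .qmd re-runs its code and would scrape the site again each time.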

The Criterion Collection is a video distribution company known for “important movies”.

page <- read_html("https://www.criterion.com/shop/browse/list?sort=spine_number")

Even though it is hard to see, all the data we want is stored in the object called page.

page
{html_document}
<html class="no-js" lang="en">
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
[2] <body class="criterion-collection page__storebrowsing default wildsand__b ...
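
To get a feel for what such an object holds without contacting a website, one can parse a literal HTML string. This is a minimal sketch with a made-up document; `demo_page`, `child_names`, and `page_title` are hypothetical names, and the HTML is not taken from the Criterion site.

```r
library(rvest)

# read_html() also accepts a literal HTML string, so no network access is needed.
demo_page <- read_html(
  "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
)

# The top-level children correspond to the [1] and [2] entries in the printout.
child_names <- demo_page |>
  html_children() |>
  html_name()

# Any element inside the document can be reached with a CSS selector.
page_title <- demo_page |>
  html_element("title") |>
  html_text()
```

The printed summary of a parsed page only shows these top-level nodes; everything else is still reachable through selectors.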

Scraping Spine Numbers

page |> 
  html_nodes(".g-spine")
{xml_nodeset (1660)}
 [1] <th class="g-spine">Spine #</th>
 [2] <td class="g-spine">\n1\n</td>
 [3] <td class="g-spine">\n2\n</td>
 [4] <td class="g-spine">\n3\n</td>
 [5] <td class="g-spine">\n4\n</td>
 [6] <td class="g-spine">\n5\n</td>
 [7] <td class="g-spine">\n6\n</td>
 [8] <td class="g-spine">\n7\n</td>
 [9] <td class="g-spine">\n8\n</td>
[10] <td class="g-spine">\n9\n</td>
[11] <td class="g-spine">\n10\n</td>
[12] <td class="g-spine">\n11\n</td>
[13] <td class="g-spine">\n12\n</td>
[14] <td class="g-spine">\n13\n</td>
[15] <td class="g-spine">\n14\n</td>
[16] <td class="g-spine">\n15\n</td>
[17] <td class="g-spine">\n16\n</td>
[18] <td class="g-spine">\n17\n</td>
[19] <td class="g-spine">\n18\n</td>
[20] <td class="g-spine">\n19\n</td>
...
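
Note that the first node in the output is the `<th>` header cell, which shares the `g-spine` class with the data cells. One possible way to leave it out, shown here as a sketch on a small hypothetical fragment that mimics the structure above, is to qualify the class with the tag name. The names `doc`, `all_cells`, and `data_cells` are illustrative.

```r
library(rvest)

# Hypothetical fragment shaped like the node set printed above.
doc <- minimal_html('
  <table>
    <tr><th class="g-spine">Spine #</th></tr>
    <tr><td class="g-spine">\n1\n</td></tr>
    <tr><td class="g-spine">\n2\n</td></tr>
  </table>
')

# ".g-spine" matches the header cell and the data cells.
all_cells <- html_elements(doc, ".g-spine")

# "td.g-spine" matches only <td> elements with that class, skipping the <th>.
data_cells <- html_elements(doc, "td.g-spine")
```

Whether to drop the header at this stage or later, after converting to text, is a matter of preference.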

Scraping Spine Numbers

page |>
  html_nodes(".g-spine") |>
  html_text()
   [1] "Spine #"  "\n1\n"    "\n2\n"    "\n3\n"    "\n4\n"    "\n5\n"   
   [7] "\n6\n"    "\n7\n"    "\n8\n"    "\n9\n"    "\n10\n"   "\n11\n"  
  [13] "\n12\n"   "\n13\n"   "\n14\n"   "\n15\n"   "\n16\n"   "\n17\n"  
  [19] "\n18\n"   "\n19\n"   "\n20\n"   "\n21\n"   "\n22\n"   "\n23\n"  
  [25] "\n24\n"   "\n25\n"   "\n26\n"   "\n27\n"   "\n28\n"   "\n29\n"  
  [31] "\n30\n"   "\n31\n"   "\n32\n"   "\n33\n"   "\n34\n"   "\n35\n"  
  [37] "\n36\n"   "\n37\n"   "\n38\n"   "\n39\n"   "\n40\n"   "\n41\n"  
  [43] "\n42\n"   "\n43\n"   "\n44\n"   "\n45\n"   "\n46\n"   "\n47\n"  
  [49] "\n48\n"   "\n49\n"   "\n50\n"   "\n51\n"   "\n52\n"   "\n53\n"  
  [55] "\n54\n"   "\n55\n"   "\n56\n"   "\n57\n"   "\n58\n"   "\n59\n"  
  [61] "\n60\n"   "\n61\n"   "\n62\n"   "\n63\n"   "\n64\n"   "\n65\n"  
  [67] "\n66\n"   "\n67\n"   "\n68\n"   "\n69\n"   "\n70\n"   "\n71\n"  
  [73] "\n72\n"   "\n73\n"   "\n74\n"   "\n75\n"   "\n76\n"   "\n77\n"  
  [79] "\n78\n"   "\n79\n"   "\n80\n"   "\n81\n"   "\n82\n"   "\n83\n"  
  [85] "\n84\n"   "\n85\n"   "\n86\n"   "\n87\n"   "\n88\n"   "\n88\n"  
  [91] "\n89\n"   "\n90\n"   "\n91\n"   "\n92\n"   "\n93\n"   "\n94\n"  
  [97] "\n95\n"   "\n96\n"   "\n97\n"   "\n98\n"   "\n99\n"   "\n100\n" 
 [103] "\n101\n"  "\n102\n"  "\n103\n"  "\n104\n"  "\n105\n"  "\n106\n" 
 [109] "\n107\n"  "\n108\n"  "\n109\n"  "\n110\n"  "\n111\n"  "\n112\n" 
 [115] "\n113\n"  "\n114\n"  "\n115\n"  "\n116\n"  "\n117\n"  "\n118\n" 
 [121] "\n119\n"  "\n120\n"  "\n121\n"  "\n122\n"  "\n123\n"  "\n124\n" 
 [127] "\n125\n"  "\n126\n"  "\n127\n"  "\n128\n"  "\n129\n"  "\n130\n" 
 [133] "\n131\n"  "\n132\n"  "\n133\n"  "\n134\n"  "\n135\n"  "\n136\n" 
 [139] "\n137\n"  "\n138\n"  "\n139\n"  "\n140\n"  "\n141\n"  "\n142\n" 
 [145] "\n143\n"  "\n144\n"  "\n145\n"  "\n146\n"  "\n147\n"  "\n148\n" 
 [151] "\n149\n"  "\n150\n"  "\n151\n"  "\n152\n"  "\n153\n"  "\n154\n" 
 [157] "\n155\n"  "\n156\n"  "\n157\n"  "\n158\n"  "\n159\n"  "\n160\n" 
 [163] "\n161\n"  "\n162\n"  "\n163\n"  "\n164\n"  "\n165\n"  "\n166\n" 
 [169] "\n167\n"  "\n168\n"  "\n169\n"  "\n170\n"  "\n171\n"  "\n172\n" 
 [175] "\n173\n"  "\n174\n"  "\n175\n"  "\n176\n"  "\n177\n"  "\n178\n" 
 [181] "\n179\n"  "\n180\n"  "\n181\n"  "\n182\n"  "\n183\n"  "\n184\n" 
 [187] "\n185\n"  "\n186\n"  "\n187\n"  "\n188\n"  "\n189\n"  "\n190\n" 
 [193] "\n191\n"  "\n192\n"  "\n193\n"  "\n194\n"  "\n195\n"  "\n196\n" 
 [199] "\n197\n"  "\n198\n"  "\n199\n"  "\n200\n"  "\n201\n"  "\n202\n" 
 [205] "\n203\n"  "\n204\n"  "\n205\n"  "\n206\n"  "\n207\n"  "\n208\n" 
 [211] "\n209\n"  "\n210\n"  "\n211\n"  "\n213\n"  "\n214\n"  "\n215\n" 
 [217] "\n216\n"  "\n217\n"  "\n218\n"  "\n219\n"  "\n220\n"  "\n221\n" 
 [223] "\n222\n"  "\n223\n"  "\n224\n"  "\n225\n"  "\n226\n"  "\n227\n" 
 [229] "\n228\n"  "\n229\n"  "\n230\n"  "\n231\n"  "\n232\n"  "\n233\n" 
 [235] "\n234\n"  "\n235\n"  "\n236\n"  "\n237\n"  "\n238\n"  "\n239\n" 
 [241] "\n240\n"  "\n241\n"  "\n242\n"  "\n243\n"  "\n244\n"  "\n245\n" 
 [247] "\n246\n"  "\n247\n"  "\n248\n"  "\n249\n"  "\n250\n"  "\n251\n" 
 [253] "\n252\n"  "\n253\n"  "\n254\n"  "\n255\n"  "\n256\n"  "\n257\n" 
 [259] "\n258\n"  "\n259\n"  "\n260\n"  "\n261\n"  "\n262\n"  "\n263\n" 
 [265] "\n264\n"  "\n265\n"  "\n266\n"  "\n267\n"  "\n268\n"  "\n269\n" 
 [271] "\n270\n"  "\n271\n"  "\n272\n"  "\n273\n"  "\n274\n"  "\n275\n" 
 [277] "\n276\n"  "\n277\n"  "\n278\n"  "\n279\n"  "\n280\n"  "\n281\n" 
 [283] "\n282\n"  "\n283\n"  "\n284\n"  "\n285\n"  "\n286\n"  "\n287\n" 
 [289] "\n288\n"  "\n289\n"  "\n290\n"  "\n291\n"  "\n292\n"  "\n293\n" 
 [295] "\n294\n"  "\n295\n"  "\n296\n"  "\n297\n"  "\n298\n"  "\n299\n" 
 [301] "\n300\n"  "\n301\n"  "\n302\n"  "\n303\n"  "\n304\n"  "\n305\n" 
 [307] "\n306\n"  "\n307\n"  "\n308\n"  "\n309\n"  "\n310\n"  "\n311\n" 
 [313] "\n312\n"  "\n313\n"  "\n314\n"  "\n315\n"  "\n316\n"  "\n317\n" 
 [319] "\n318\n"  "\n319\n"  "\n320\n"  "\n321\n"  "\n322\n"  "\n323\n" 
 [325] "\n324\n"  "\n325\n"  "\n326\n"  "\n327\n"  "\n328\n"  "\n329\n" 
 [331] "\n330\n"  "\n331\n"  "\n332\n"  "\n333\n"  "\n334\n"  "\n335\n" 
 [337] "\n336\n"  "\n337\n"  "\n338\n"  "\n339\n"  "\n340\n"  "\n341\n" 
 [343] "\n342\n"  "\n343\n"  "\n344\n"  "\n345\n"  "\n346\n"  "\n347\n" 
 [349] "\n348\n"  "\n349\n"  "\n350\n"  "\n351\n"  "\n352\n"  "\n353\n" 
 [355] "\n354\n"  "\n355\n"  "\n356\n"  "\n357\n"  "\n358\n"  "\n359\n" 
 [361] "\n360\n"  "\n361\n"  "\n362\n"  "\n363\n"  "\n364\n"  "\n365\n" 
 [367] "\n366\n"  "\n367\n"  "\n368\n"  "\n369\n"  "\n370\n"  "\n371\n" 
 [373] "\n372\n"  "\n373\n"  "\n374\n"  "\n375\n"  "\n376\n"  "\n377\n" 
 [379] "\n378\n"  "\n379\n"  "\n380\n"  "\n381\n"  "\n382\n"  "\n383\n" 
 [385] "\n384\n"  "\n385\n"  "\n386\n"  "\n387\n"  "\n388\n"  "\n389\n" 
 [391] "\n390\n"  "\n391\n"  "\n392\n"  "\n393\n"  "\n394\n"  "\n395\n" 
 [397] "\n396\n"  "\n397\n"  "\n398\n"  "\n399\n"  "\n400\n"  "\n401\n" 
 [403] "\n402\n"  "\n403\n"  "\n404\n"  "\n405\n"  "\n406\n"  "\n407\n" 
 [409] "\n408\n"  "\n409\n"  "\n410\n"  "\n411\n"  "\n412\n"  "\n413\n" 
 [415] "\n414\n"  "\n415\n"  "\n416\n"  "\n417\n"  "\n418\n"  "\n419\n" 
 [421] "\n420\n"  "\n421\n"  "\n422\n"  "\n423\n"  "\n424\n"  "\n425\n" 
 [427] "\n426\n"  "\n427\n"  "\n428\n"  "\n429\n"  "\n430\n"  "\n431\n" 
 [433] "\n432\n"  "\n433\n"  "\n434\n"  "\n435\n"  "\n436\n"  "\n437\n" 
 [439] "\n438\n"  "\n439\n"  "\n440\n"  "\n441\n"  "\n442\n"  "\n443\n" 
 [445] "\n444\n"  "\n445\n"  "\n446\n"  "\n447\n"  "\n448\n"  "\n449\n" 
 [451] "\n450\n"  "\n451\n"  "\n452\n"  "\n453\n"  "\n454\n"  "\n455\n" 
 [457] "\n456\n"  "\n457\n"  "\n458\n"  "\n459\n"  "\n460\n"  "\n461\n" 
 [463] "\n462\n"  "\n463\n"  "\n464\n"  "\n465\n"  "\n466\n"  "\n467\n" 
 [469] "\n468\n"  "\n469\n"  "\n470\n"  "\n471\n"  "\n472\n"  "\n473\n" 
 [475] "\n474\n"  "\n475\n"  "\n476\n"  "\n477\n"  "\n478\n"  "\n479\n" 
 [481] "\n480\n"  "\n481\n"  "\n482\n"  "\n483\n"  "\n484\n"  "\n485\n" 
 [487] "\n486\n"  "\n487\n"  "\n488\n"  "\n489\n"  "\n490\n"  "\n491\n" 
 [493] "\n492\n"  "\n493\n"  "\n494\n"  "\n495\n"  "\n496\n"  "\n497\n" 
 [499] "\n498\n"  "\n499\n"  "\n500\n"  "\n501\n"  "\n502\n"  "\n503\n" 
 [505] "\n504\n"  "\n505\n"  "\n506\n"  "\n507\n"  "\n508\n"  "\n509\n" 
 [511] "\n510\n"  "\n511\n"  "\n512\n"  "\n513\n"  "\n514\n"  "\n515\n" 
 [517] "\n516\n"  "\n517\n"  "\n518\n"  "\n519\n"  "\n520\n"  "\n521\n" 
 [523] "\n522\n"  "\n523\n"  "\n524\n"  "\n525\n"  "\n526\n"  "\n527\n" 
 [529] "\n528\n"  "\n529\n"  "\n530\n"  "\n531\n"  "\n532\n"  "\n533\n" 
 [535] "\n534\n"  "\n535\n"  "\n536\n"  "\n537\n"  "\n538\n"  "\n539\n" 
 [541] "\n540\n"  "\n541\n"  "\n542\n"  "\n543\n"  "\n544\n"  "\n545\n" 
 [547] "\n546\n"  "\n547\n"  "\n548\n"  "\n549\n"  "\n550\n"  "\n551\n" 
 [553] "\n552\n"  "\n553\n"  "\n554\n"  "\n555\n"  "\n556\n"  "\n557\n" 
 [559] "\n558\n"  "\n559\n"  "\n560\n"  "\n561\n"  "\n562\n"  "\n563\n" 
 [565] "\n564\n"  "\n565\n"  "\n566\n"  "\n567\n"  "\n568\n"  "\n569\n" 
 [571] "\n570\n"  "\n571\n"  "\n572\n"  "\n573\n"  "\n574\n"  "\n575\n" 
 [577] "\n576\n"  "\n577\n"  "\n578\n"  "\n579\n"  "\n580\n"  "\n581\n" 
 [583] "\n582\n"  "\n583\n"  "\n584\n"  "\n585\n"  "\n586\n"  "\n587\n" 
 [589] "\n588\n"  "\n589\n"  "\n590\n"  "\n591\n"  "\n592\n"  "\n593\n" 
 [595] "\n594\n"  "\n595\n"  "\n596\n"  "\n597\n"  "\n598\n"  "\n599\n" 
 [601] "\n600\n"  "\n601\n"  "\n602\n"  "\n603\n"  "\n604\n"  "\n605\n" 
 [607] "\n606\n"  "\n607\n"  "\n608\n"  "\n609\n"  "\n610\n"  "\n611\n" 
 [613] "\n612\n"  "\n613\n"  "\n614\n"  "\n615\n"  "\n616\n"  "\n617\n" 
 [619] "\n618\n"  "\n619\n"  "\n620\n"  "\n621\n"  "\n622\n"  "\n623\n" 
 [625] "\n624\n"  "\n625\n"  "\n626\n"  "\n627\n"  "\n628\n"  "\n629\n" 
 [631] "\n630\n"  "\n631\n"  "\n632\n"  "\n633\n"  "\n634\n"  "\n635\n" 
 [637] "\n636\n"  "\n637\n"  "\n638\n"  "\n639\n"  "\n640\n"  "\n641\n" 
 [643] "\n642\n"  "\n643\n"  "\n644\n"  "\n645\n"  "\n646\n"  "\n647\n" 
 [649] "\n648\n"  "\n649\n"  "\n650\n"  "\n651\n"  "\n652\n"  "\n653\n" 
 [655] "\n654\n"  "\n655\n"  "\n656\n"  "\n657\n"  "\n658\n"  "\n659\n" 
 [661] "\n660\n"  "\n661\n"  "\n662\n"  "\n663\n"  "\n664\n"  "\n665\n" 
 [667] "\n666\n"  "\n667\n"  "\n668\n"  "\n669\n"  "\n670\n"  "\n671\n" 
 [673] "\n672\n"  "\n673\n"  "\n674\n"  "\n675\n"  "\n676\n"  "\n677\n" 
 [679] "\n678\n"  "\n679\n"  "\n680\n"  "\n681\n"  "\n682\n"  "\n683\n" 
 [685] "\n684\n"  "\n685\n"  "\n686\n"  "\n687\n"  "\n688\n"  "\n689\n" 
 [691] "\n690\n"  "\n691\n"  "\n692\n"  "\n693\n"  "\n694\n"  "\n695\n" 
 [697] "\n696\n"  "\n697\n"  "\n698\n"  "\n699\n"  "\n700\n"  "\n701\n" 
 [703] "\n702\n"  "\n703\n"  "\n704\n"  "\n705\n"  "\n706\n"  "\n707\n" 
 [709] "\n708\n"  "\n709\n"  "\n710\n"  "\n711\n"  "\n712\n"  "\n713\n" 
 [715] "\n714\n"  "\n715\n"  "\n716\n"  "\n717\n"  "\n718\n"  "\n719\n" 
 [721] "\n720\n"  "\n721\n"  "\n722\n"  "\n723\n"  "\n724\n"  "\n725\n" 
 [727] "\n726\n"  "\n727\n"  "\n728\n"  "\n729\n"  "\n730\n"  "\n731\n" 
 [733] "\n732\n"  "\n733\n"  "\n734\n"  "\n735\n"  "\n736\n"  "\n737\n" 
 [739] "\n738\n"  "\n739\n"  "\n740\n"  "\n741\n"  "\n742\n"  "\n743\n" 
 [745] "\n744\n"  "\n745\n"  "\n746\n"  "\n747\n"  "\n748\n"  "\n749\n" 
 [751] "\n750\n"  "\n751\n"  "\n752\n"  "\n753\n"  "\n754\n"  "\n755\n" 
 [757] "\n756\n"  "\n757\n"  "\n758\n"  "\n759\n"  "\n760\n"  "\n761\n" 
 [763] "\n762\n"  "\n763\n"  "\n764\n"  "\n765\n"  "\n766\n"  "\n767\n" 
 [769] "\n768\n"  "\n769\n"  "\n770\n"  "\n771\n"  "\n772\n"  "\n773\n" 
 [775] "\n774\n"  "\n775\n"  "\n776\n"  "\n777\n"  "\n778\n"  "\n779\n" 
 [781] "\n780\n"  "\n781\n"  "\n782\n"  "\n783\n"  "\n784\n"  "\n785\n" 
 [787] "\n786\n"  "\n787\n"  "\n788\n"  "\n789\n"  "\n790\n"  "\n791\n" 
 [793] "\n792\n"  "\n793\n"  "\n794\n"  "\n795\n"  "\n796\n"  "\n797\n" 
 [799] "\n798\n"  "\n799\n"  "\n800\n"  "\n801\n"  "\n802\n"  "\n803\n" 
 [805] "\n804\n"  "\n805\n"  "\n806\n"  "\n807\n"  "\n808\n"  "\n809\n" 
 [811] "\n810\n"  "\n811\n"  "\n812\n"  "\n813\n"  "\n814\n"  "\n815\n" 
 [817] "\n816\n"  "\n817\n"  "\n818\n"  "\n819\n"  "\n820\n"  "\n821\n" 
 [823] "\n822\n"  "\n823\n"  "\n824\n"  "\n825\n"  "\n826\n"  "\n827\n" 
 [829] "\n828\n"  "\n829\n"  "\n830\n"  "\n831\n"  "\n832\n"  "\n833\n" 
 [835] "\n834\n"  "\n835\n"  "\n836\n"  "\n837\n"  "\n838\n"  "\n839\n" 
 [841] "\n840\n"  "\n841\n"  "\n842\n"  "\n843\n"  "\n844\n"  "\n845\n" 
 [847] "\n846\n"  "\n847\n"  "\n848\n"  "\n849\n"  "\n850\n"  "\n851\n" 
 [853] "\n852\n"  "\n853\n"  "\n854\n"  "\n855\n"  "\n856\n"  "\n857\n" 
 [859] "\n858\n"  "\n859\n"  "\n860\n"  "\n861\n"  "\n862\n"  "\n863\n" 
 [865] "\n864\n"  "\n865\n"  "\n866\n"  "\n867\n"  "\n868\n"  "\n869\n" 
 [871] "\n870\n"  "\n871\n"  "\n872\n"  "\n873\n"  "\n874\n"  "\n875\n" 
 [877] "\n876\n"  "\n877\n"  "\n878\n"  "\n879\n"  "\n880\n"  "\n881\n" 
 [883] "\n882\n"  "\n883\n"  "\n884\n"  "\n885\n"  "\n886\n"  "\n887\n" 
 [889] "\n888\n"  "\n889\n"  "\n890\n"  "\n891\n"  "\n892\n"  "\n893\n" 
 [895] "\n894\n"  "\n895\n"  "\n896\n"  "\n897\n"  "\n898\n"  "\n899\n" 
 [901] "\n900\n"  "\n901\n"  "\n902\n"  "\n903\n"  "\n904\n"  "\n905\n" 
 [907] "\n906\n"  "\n907\n"  "\n908\n"  "\n909\n"  "\n910\n"  "\n911\n" 
 [913] "\n912\n"  "\n913\n"  "\n914\n"  "\n915\n"  "\n916\n"  "\n917\n" 
 [919] "\n918\n"  "\n919\n"  "\n920\n"  "\n921\n"  "\n922\n"  "\n923\n" 
 [925] "\n924\n"  "\n925\n"  "\n926\n"  "\n927\n"  "\n928\n"  "\n929\n" 
 [931] "\n930\n"  "\n931\n"  "\n932\n"  "\n933\n"  "\n934\n"  "\n935\n" 
 [937] "\n936\n"  "\n937\n"  "\n938\n"  "\n939\n"  "\n940\n"  "\n941\n" 
 [943] "\n942\n"  "\n943\n"  "\n944\n"  "\n945\n"  "\n946\n"  "\n947\n" 
 [949] "\n948\n"  "\n949\n"  "\n950\n"  "\n951\n"  "\n952\n"  "\n953\n" 
 [955] "\n954\n"  "\n955\n"  "\n956\n"  "\n957\n"  "\n958\n"  "\n959\n" 
 [961] "\n960\n"  "\n961\n"  "\n962\n"  "\n963\n"  "\n964\n"  "\n965\n" 
 [967] "\n966\n"  "\n967\n"  "\n968\n"  "\n969\n"  "\n970\n"  "\n971\n" 
 [973] "\n972\n"  "\n973\n"  "\n974\n"  "\n975\n"  "\n976\n"  "\n977\n" 
 [979] "\n978\n"  "\n979\n"  "\n980\n"  "\n981\n"  "\n982\n"  "\n983\n" 
 [985] "\n984\n"  "\n985\n"  "\n986\n"  "\n987\n"  "\n988\n"  "\n989\n" 
 [991] "\n990\n"  "\n991\n"  "\n992\n"  "\n993\n"  "\n994\n"  "\n995\n" 
 [997] "\n996\n"  "\n997\n"  "\n998\n"  "\n999\n"  "\n1000\n" "\n1001\n"
[1003] "\n1002\n" "\n1003\n" "\n1004\n" "\n1005\n" "\n1006\n" "\n1007\n"
[1009] "\n1008\n" "\n1009\n" "\n1010\n" "\n1011\n" "\n1012\n" "\n1013\n"
[1015] "\n1014\n" "\n1015\n" "\n1016\n" "\n1017\n" "\n1018\n" "\n1019\n"
[1021] "\n1020\n" "\n1021\n" "\n1022\n" "\n1023\n" "\n1024\n" "\n1025\n"
[1027] "\n1026\n" "\n1027\n" "\n1028\n" "\n1029\n" "\n1030\n" "\n1031\n"
[1033] "\n1032\n" "\n1033\n" "\n1034\n" "\n1035\n" "\n1036\n" "\n1037\n"
[1039] "\n1038\n" "\n1039\n" "\n1040\n" "\n1041\n" "\n1042\n" "\n1043\n"
[1045] "\n1044\n" "\n1045\n" "\n1046\n" "\n1047\n" "\n1048\n" "\n1049\n"
[1051] "\n1050\n" "\n1051\n" "\n1052\n" "\n1053\n" "\n1054\n" "\n1055\n"
[1057] "\n1056\n" "\n1057\n" "\n1058\n" "\n1059\n" "\n1060\n" "\n1061\n"
[1063] "\n1062\n" "\n1063\n" "\n1064\n" "\n1065\n" "\n1066\n" "\n1067\n"
[1069] "\n1068\n" "\n1069\n" "\n1070\n" "\n1071\n" "\n1072\n" "\n1073\n"
[1075] "\n1074\n" "\n1075\n" "\n1076\n" "\n1077\n" "\n1078\n" "\n1079\n"
[1081] "\n1080\n" "\n1081\n" "\n1082\n" "\n1083\n" "\n1084\n" "\n1085\n"
[1087] "\n1086\n" "\n1087\n" "\n1088\n" "\n1089\n" "\n1090\n" "\n1091\n"
[1093] "\n1092\n" "\n1093\n" "\n1094\n" "\n1095\n" "\n1096\n" "\n1097\n"
[1099] "\n1098\n" "\n1099\n" "\n1100\n" "\n1101\n" "\n1102\n" "\n1103\n"
[1105] "\n1104\n" "\n1105\n" "\n1106\n" "\n1107\n" "\n1108\n" "\n1109\n"
[1111] "\n1110\n" "\n1111\n" "\n1112\n" "\n1113\n" "\n1114\n" "\n1115\n"
[1117] "\n1116\n" "\n1117\n" "\n1118\n" "\n1119\n" "\n1120\n" "\n1121\n"
[1123] "\n1122\n" "\n1123\n" "\n1124\n" "\n1125\n" "\n1126\n" "\n1127\n"
[1129] "\n1128\n" "\n1129\n" "\n1130\n" "\n1131\n" "\n1132\n" "\n1133\n"
[1135] "\n1134\n" "\n1135\n" "\n1136\n" "\n1137\n" "\n1138\n" "\n1139\n"
[1141] "\n1140\n" "\n1141\n" "\n1142\n" "\n1143\n" "\n1144\n" "\n1145\n"
[1147] "\n1146\n" "\n1147\n" "\n1148\n" "\n1149\n" "\n1150\n" "\n1151\n"
[1153] "\n1152\n" "\n1153\n" "\n1154\n" "\n1155\n" "\n1156\n" "\n1157\n"
[1159] "\n1158\n" "\n1159\n" "\n1160\n" "\n1161\n" "\n1162\n" "\n1163\n"
[1165] "\n1164\n" "\n1165\n" "\n1166\n" "\n1167\n" "\n1168\n" "\n1169\n"
[1171] "\n1170\n" "\n1171\n" "\n1172\n" "\n1173\n" "\n1174\n" "\n1175\n"
[1177] "\n1176\n" "\n1177\n" "\n1178\n" "\n1179\n" "\n1180\n" "\n1181\n"
[1183] "\n1182\n" "\n1183\n" "\n1184\n" "\n1185\n" "\n1186\n" "\n1187\n"
[1189] "\n1188\n" "\n1189\n" "\n1190\n" "\n1191\n" "\n1192\n" "\n1193\n"
[1195] "\n1194\n" "\n1195\n" "\n1196\n" "\n1197\n" "\n1198\n" "\n1199\n"
[1201] "\n1200\n" "\n1201\n" "\n1202\n" "\n1203\n" "\n1204\n" "\n1205\n"
[1207] "\n1206\n" "\n1207\n" "\n1208\n" "\n1209\n" "\n1210\n" "\n1211\n"
[1213] "\n1212\n" "\n1213\n" "\n1214\n" "\n1215\n" "\n1216\n" "\n"      
[1219] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1225] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1231] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1237] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1243] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1249] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1255] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1261] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1267] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1273] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1279] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1285] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1291] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1297] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1303] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1309] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1315] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1321] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1327] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1333] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1339] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1345] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1351] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1357] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1363] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1369] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1375] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1381] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1387] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1393] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1399] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1405] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1411] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1417] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1423] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1429] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1435] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1441] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1447] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1453] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1459] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1465] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1471] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1477] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1483] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1489] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1495] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1501] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1507] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1513] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1519] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1525] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1531] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1537] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1543] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1549] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1555] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1561] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1567] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1573] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1579] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1585] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1591] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1597] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1603] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1609] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1615] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1621] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1627] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1633] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1639] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1645] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1651] "\n"       "\n"       "\n"       "\n"       "\n"       "\n"      
[1657] "\n"       "\n"       "\n"       "\n"      
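
The output ends with a long run of entries that contain only a newline, so they carry no spine number. A small sketch of how one might count such entries, using a short hypothetical vector in place of the real output (the names `raw_text`, `trimmed`, and `n_blank` are illustrative):

```r
# Hypothetical stand-in for the scraped text: a header, two numbers, two blanks.
raw_text <- c("Spine #", "\n1\n", "\n2\n", "\n", "\n")

# trimws() strips leading and trailing whitespace, including newlines.
trimmed <- trimws(raw_text)

# Entries that are empty after trimming held nothing but whitespace.
n_blank <- sum(trimmed == "")
```

Knowing how many entries are blank helps when deciding how to handle them later, for example by treating them as missing values.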

Scraping Spine Numbers

page |>
  html_nodes(".g-spine") |>
  html_text() |>
  str_remove_all("\n")
   [1] "Spine #" "1"       "2"       "3"       "4"       "5"       "6"      
   [8] "7"       "8"       "9"       "10"      "11"      "12"      "13"     
  [15] "14"      "15"      "16"      "17"      "18"      "19"      "20"     
  [22] "21"      "22"      "23"      "24"      "25"      "26"      "27"     
  [29] "28"      "29"      "30"      "31"      "32"      "33"      "34"     
  [36] "35"      "36"      "37"      "38"      "39"      "40"      "41"     
  [43] "42"      "43"      "44"      "45"      "46"      "47"      "48"     
  [50] "49"      "50"      "51"      "52"      "53"      "54"      "55"     
  [57] "56"      "57"      "58"      "59"      "60"      "61"      "62"     
  [64] "63"      "64"      "65"      "66"      "67"      "68"      "69"     
  [71] "70"      "71"      "72"      "73"      "74"      "75"      "76"     
  [78] "77"      "78"      "79"      "80"      "81"      "82"      "83"     
  [85] "84"      "85"      "86"      "87"      "88"      "88"      "89"     
  [92] "90"      "91"      "92"      "93"      "94"      "95"      "96"     
  [99] "97"      "98"      "99"      "100"     "101"     "102"     "103"    
 [106] "104"     "105"     "106"     "107"     "108"     "109"     "110"    
 [113] "111"     "112"     "113"     "114"     "115"     "116"     "117"    
 [120] "118"     "119"     "120"     "121"     "122"     "123"     "124"    
 [127] "125"     "126"     "127"     "128"     "129"     "130"     "131"    
 [134] "132"     "133"     "134"     "135"     "136"     "137"     "138"    
 [141] "139"     "140"     "141"     "142"     "143"     "144"     "145"    
 [148] "146"     "147"     "148"     "149"     "150"     "151"     "152"    
 [155] "153"     "154"     "155"     "156"     "157"     "158"     "159"    
 [162] "160"     "161"     "162"     "163"     "164"     "165"     "166"    
 [169] "167"     "168"     "169"     "170"     "171"     "172"     "173"    
 [176] "174"     "175"     "176"     "177"     "178"     "179"     "180"    
 [183] "181"     "182"     "183"     "184"     "185"     "186"     "187"    
 [190] "188"     "189"     "190"     "191"     "192"     "193"     "194"    
 [197] "195"     "196"     "197"     "198"     "199"     "200"     "201"    
 [204] "202"     "203"     "204"     "205"     "206"     "207"     "208"    
 [211] "209"     "210"     "211"     "213"     "214"     "215"     "216"    
 [218] "217"     "218"     "219"     "220"     "221"     "222"     "223"    
 [225] "224"     "225"     "226"     "227"     "228"     "229"     "230"    
 [232] "231"     "232"     "233"     "234"     "235"     "236"     "237"    
 [239] "238"     "239"     "240"     "241"     "242"     "243"     "244"    
 [246] "245"     "246"     "247"     "248"     "249"     "250"     "251"    
 [253] "252"     "253"     "254"     "255"     "256"     "257"     "258"    
 [260] "259"     "260"     "261"     "262"     "263"     "264"     "265"    
 [267] "266"     "267"     "268"     "269"     "270"     "271"     "272"    
 [274] "273"     "274"     "275"     "276"     "277"     "278"     "279"    
 [281] "280"     "281"     "282"     "283"     "284"     "285"     "286"    
 [288] "287"     "288"     "289"     "290"     "291"     "292"     "293"    
 [295] "294"     "295"     "296"     "297"     "298"     "299"     "300"    
 [302] "301"     "302"     "303"     "304"     "305"     "306"     "307"    
 [309] "308"     "309"     "310"     "311"     "312"     "313"     "314"    
 [316] "315"     "316"     "317"     "318"     "319"     "320"     "321"    
 [323] "322"     "323"     "324"     "325"     "326"     "327"     "328"    
 [330] "329"     "330"     "331"     "332"     "333"     "334"     "335"    
 [337] "336"     "337"     "338"     "339"     "340"     "341"     "342"    
 [344] "343"     "344"     "345"     "346"     "347"     "348"     "349"    
 [351] "350"     "351"     "352"     "353"     "354"     "355"     "356"    
 [358] "357"     "358"     "359"     "360"     "361"     "362"     "363"    
 [365] "364"     "365"     "366"     "367"     "368"     "369"     "370"    
 [372] "371"     "372"     "373"     "374"     "375"     "376"     "377"    
 [379] "378"     "379"     "380"     "381"     "382"     "383"     "384"    
 [386] "385"     "386"     "387"     "388"     "389"     "390"     "391"    
 [393] "392"     "393"     "394"     "395"     "396"     "397"     "398"    
 [400] "399"     "400"     "401"     "402"     "403"     "404"     "405"    
 [407] "406"     "407"     "408"     "409"     "410"     "411"     "412"    
 [414] "413"     "414"     "415"     "416"     "417"     "418"     "419"    
 [421] "420"     "421"     "422"     "423"     "424"     "425"     "426"    
 [428] "427"     "428"     "429"     "430"     "431"     "432"     "433"    
 [435] "434"     "435"     "436"     "437"     "438"     "439"     "440"    
 [442] "441"     "442"     "443"     "444"     "445"     "446"     "447"    
 [449] "448"     "449"     "450"     "451"     "452"     "453"     "454"    
 [456] "455"     "456"     "457"     "458"     "459"     "460"     "461"    
 [463] "462"     "463"     "464"     "465"     "466"     "467"     "468"    
 [470] "469"     "470"     "471"     "472"     "473"     "474"     "475"    
 [477] "476"     "477"     "478"     "479"     "480"     "481"     "482"    
 [484] "483"     "484"     "485"     "486"     "487"     "488"     "489"    
 [491] "490"     "491"     "492"     "493"     "494"     "495"     "496"    
 [498] "497"     "498"     "499"     "500"     "501"     "502"     "503"    
 [505] "504"     "505"     "506"     "507"     "508"     "509"     "510"    
 [512] "511"     "512"     "513"     "514"     "515"     "516"     "517"    
 [519] "518"     "519"     "520"     "521"     "522"     "523"     "524"    
 [526] "525"     "526"     "527"     "528"     "529"     "530"     "531"    
 [533] "532"     "533"     "534"     "535"     "536"     "537"     "538"    
 [540] "539"     "540"     "541"     "542"     "543"     "544"     "545"    
 [547] "546"     "547"     "548"     "549"     "550"     "551"     "552"    
 [554] "553"     "554"     "555"     "556"     "557"     "558"     "559"    
 [561] "560"     "561"     "562"     "563"     "564"     "565"     "566"    
 [568] "567"     "568"     "569"     "570"     "571"     "572"     "573"    
 [575] "574"     "575"     "576"     "577"     "578"     "579"     "580"    
 [582] "581"     "582"     "583"     "584"     "585"     "586"     "587"    
 [589] "588"     "589"     "590"     "591"     "592"     "593"     "594"    
 [596] "595"     "596"     "597"     "598"     "599"     "600"     "601"    
 [603] "602"     "603"     "604"     "605"     "606"     "607"     "608"    
 [610] "609"     "610"     "611"     "612"     "613"     "614"     "615"    
 [617] "616"     "617"     "618"     "619"     "620"     "621"     "622"    
 [624] "623"     "624"     "625"     "626"     "627"     "628"     "629"    
 [631] "630"     "631"     "632"     "633"     "634"     "635"     "636"    
 [638] "637"     "638"     "639"     "640"     "641"     "642"     "643"    
 [645] "644"     "645"     "646"     "647"     "648"     "649"     "650"    
 [652] "651"     "652"     "653"     "654"     "655"     "656"     "657"    
 [659] "658"     "659"     "660"     "661"     "662"     "663"     "664"    
 [666] "665"     "666"     "667"     "668"     "669"     "670"     "671"    
 [673] "672"     "673"     "674"     "675"     "676"     "677"     "678"    
 [680] "679"     "680"     "681"     "682"     "683"     "684"     "685"    
 [687] "686"     "687"     "688"     "689"     "690"     "691"     "692"    
 [694] "693"     "694"     "695"     "696"     "697"     "698"     "699"    
 [701] "700"     "701"     "702"     "703"     "704"     "705"     "706"    
 [708] "707"     "708"     "709"     "710"     "711"     "712"     "713"    
 [715] "714"     "715"     "716"     "717"     "718"     "719"     "720"    
 [722] "721"     "722"     "723"     "724"     "725"     "726"     "727"    
 [729] "728"     "729"     "730"     "731"     "732"     "733"     "734"    
 [736] "735"     "736"     "737"     "738"     "739"     "740"     "741"    
 [743] "742"     "743"     "744"     "745"     "746"     "747"     "748"    
 [750] "749"     "750"     "751"     "752"     "753"     "754"     "755"    
 [757] "756"     "757"     "758"     "759"     "760"     "761"     "762"    
 [764] "763"     "764"     "765"     "766"     "767"     "768"     "769"    
 [771] "770"     "771"     "772"     "773"     "774"     "775"     "776"    
 [778] "777"     "778"     "779"     "780"     "781"     "782"     "783"    
 [785] "784"     "785"     "786"     "787"     "788"     "789"     "790"    
 [792] "791"     "792"     "793"     "794"     "795"     "796"     "797"    
 [799] "798"     "799"     "800"     "801"     "802"     "803"     "804"    
 [806] "805"     "806"     "807"     "808"     "809"     "810"     "811"    
 [813] "812"     "813"     "814"     "815"     "816"     "817"     "818"    
 [820] "819"     "820"     "821"     "822"     "823"     "824"     "825"    
 [827] "826"     "827"     "828"     "829"     "830"     "831"     "832"    
 [834] "833"     "834"     "835"     "836"     "837"     "838"     "839"    
 [841] "840"     "841"     "842"     "843"     "844"     "845"     "846"    
 [848] "847"     "848"     "849"     "850"     "851"     "852"     "853"    
 [855] "854"     "855"     "856"     "857"     "858"     "859"     "860"    
 [862] "861"     "862"     "863"     "864"     "865"     "866"     "867"    
 [869] "868"     "869"     "870"     "871"     "872"     "873"     "874"    
 [876] "875"     "876"     "877"     "878"     "879"     "880"     "881"    
 [883] "882"     "883"     "884"     "885"     "886"     "887"     "888"    
 [890] "889"     "890"     "891"     "892"     "893"     "894"     "895"    
 [897] "896"     "897"     "898"     "899"     "900"     "901"     "902"    
 [904] "903"     "904"     "905"     "906"     "907"     "908"     "909"    
 [911] "910"     "911"     "912"     "913"     "914"     "915"     "916"    
 [918] "917"     "918"     "919"     "920"     "921"     "922"     "923"    
 [925] "924"     "925"     "926"     "927"     "928"     "929"     "930"    
 [932] "931"     "932"     "933"     "934"     "935"     "936"     "937"    
 [939] "938"     "939"     "940"     "941"     "942"     "943"     "944"    
 [946] "945"     "946"     "947"     "948"     "949"     "950"     "951"    
 [953] "952"     "953"     "954"     "955"     "956"     "957"     "958"    
 [960] "959"     "960"     "961"     "962"     "963"     "964"     "965"    
 [967] "966"     "967"     "968"     "969"     "970"     "971"     "972"    
 [974] "973"     "974"     "975"     "976"     "977"     "978"     "979"    
 [981] "980"     "981"     "982"     "983"     "984"     "985"     "986"    
 [988] "987"     "988"     "989"     "990"     "991"     "992"     "993"    
 [995] "994"     "995"     "996"     "997"     "998"     "999"     "1000"   
[1002] "1001"    "1002"    "1003"    "1004"    "1005"    "1006"    "1007"   
[1009] "1008"    "1009"    "1010"    "1011"    "1012"    "1013"    "1014"   
[1016] "1015"    "1016"    "1017"    "1018"    "1019"    "1020"    "1021"   
[1023] "1022"    "1023"    "1024"    "1025"    "1026"    "1027"    "1028"   
[1030] "1029"    "1030"    "1031"    "1032"    "1033"    "1034"    "1035"   
[1037] "1036"    "1037"    "1038"    "1039"    "1040"    "1041"    "1042"   
[1044] "1043"    "1044"    "1045"    "1046"    "1047"    "1048"    "1049"   
[1051] "1050"    "1051"    "1052"    "1053"    "1054"    "1055"    "1056"   
[1058] "1057"    "1058"    "1059"    "1060"    "1061"    "1062"    "1063"   
[1065] "1064"    "1065"    "1066"    "1067"    "1068"    "1069"    "1070"   
[1072] "1071"    "1072"    "1073"    "1074"    "1075"    "1076"    "1077"   
[1079] "1078"    "1079"    "1080"    "1081"    "1082"    "1083"    "1084"   
[1086] "1085"    "1086"    "1087"    "1088"    "1089"    "1090"    "1091"   
[1093] "1092"    "1093"    "1094"    "1095"    "1096"    "1097"    "1098"   
[1100] "1099"    "1100"    "1101"    "1102"    "1103"    "1104"    "1105"   
[1107] "1106"    "1107"    "1108"    "1109"    "1110"    "1111"    "1112"   
[1114] "1113"    "1114"    "1115"    "1116"    "1117"    "1118"    "1119"   
[1121] "1120"    "1121"    "1122"    "1123"    "1124"    "1125"    "1126"   
[1128] "1127"    "1128"    "1129"    "1130"    "1131"    "1132"    "1133"   
[1135] "1134"    "1135"    "1136"    "1137"    "1138"    "1139"    "1140"   
[1142] "1141"    "1142"    "1143"    "1144"    "1145"    "1146"    "1147"   
[1149] "1148"    "1149"    "1150"    "1151"    "1152"    "1153"    "1154"   
[1156] "1155"    "1156"    "1157"    "1158"    "1159"    "1160"    "1161"   
[1163] "1162"    "1163"    "1164"    "1165"    "1166"    "1167"    "1168"   
[1170] "1169"    "1170"    "1171"    "1172"    "1173"    "1174"    "1175"   
[1177] "1176"    "1177"    "1178"    "1179"    "1180"    "1181"    "1182"   
[1184] "1183"    "1184"    "1185"    "1186"    "1187"    "1188"    "1189"   
[1191] "1190"    "1191"    "1192"    "1193"    "1194"    "1195"    "1196"   
[1198] "1197"    "1198"    "1199"    "1200"    "1201"    "1202"    "1203"   
[1205] "1204"    "1205"    "1206"    "1207"    "1208"    "1209"    "1210"   
[1212] "1211"    "1212"    "1213"    "1214"    "1215"    "1216"    ""       
[1219] ""        ""        ""        ""        ""        ""        ""       
[1226] ""        ""        ""        ""        ""        ""        ""       
[1233] ""        ""        ""        ""        ""        ""        ""       
[1240] ""        ""        ""        ""        ""        ""        ""       
[1247] ""        ""        ""        ""        ""        ""        ""       
[1254] ""        ""        ""        ""        ""        ""        ""       
[1261] ""        ""        ""        ""        ""        ""        ""       
[1268] ""        ""        ""        ""        ""        ""        ""       
[1275] ""        ""        ""        ""        ""        ""        ""       
[1282] ""        ""        ""        ""        ""        ""        ""       
[1289] ""        ""        ""        ""        ""        ""        ""       
[1296] ""        ""        ""        ""        ""        ""        ""       
[1303] ""        ""        ""        ""        ""        ""        ""       
[1310] ""        ""        ""        ""        ""        ""        ""       
[1317] ""        ""        ""        ""        ""        ""        ""       
[1324] ""        ""        ""        ""        ""        ""        ""       
[1331] ""        ""        ""        ""        ""        ""        ""       
[1338] ""        ""        ""        ""        ""        ""        ""       
[1345] ""        ""        ""        ""        ""        ""        ""       
[1352] ""        ""        ""        ""        ""        ""        ""       
[1359] ""        ""        ""        ""        ""        ""        ""       
[1366] ""        ""        ""        ""        ""        ""        ""       
[1373] ""        ""        ""        ""        ""        ""        ""       
[1380] ""        ""        ""        ""        ""        ""        ""       
[1387] ""        ""        ""        ""        ""        ""        ""       
[1394] ""        ""        ""        ""        ""        ""        ""       
[1401] ""        ""        ""        ""        ""        ""        ""       
[1408] ""        ""        ""        ""        ""        ""        ""       
[1415] ""        ""        ""        ""        ""        ""        ""       
[1422] ""        ""        ""        ""        ""        ""        ""       
[1429] ""        ""        ""        ""        ""        ""        ""       
[1436] ""        ""        ""        ""        ""        ""        ""       
[1443] ""        ""        ""        ""        ""        ""        ""       
[1450] ""        ""        ""        ""        ""        ""        ""       
[1457] ""        ""        ""        ""        ""        ""        ""       
[1464] ""        ""        ""        ""        ""        ""        ""       
[1471] ""        ""        ""        ""        ""        ""        ""       
[1478] ""        ""        ""        ""        ""        ""        ""       
[1485] ""        ""        ""        ""        ""        ""        ""       
[1492] ""        ""        ""        ""        ""        ""        ""       
[1499] ""        ""        ""        ""        ""        ""        ""       
[1506] ""        ""        ""        ""        ""        ""        ""       
[1513] ""        ""        ""        ""        ""        ""        ""       
[1520] ""        ""        ""        ""        ""        ""        ""       
[1527] ""        ""        ""        ""        ""        ""        ""       
[1534] ""        ""        ""        ""        ""        ""        ""       
[1541] ""        ""        ""        ""        ""        ""        ""       
[1548] ""        ""        ""        ""        ""        ""        ""       
[1555] ""        ""        ""        ""        ""        ""        ""       
[1562] ""        ""        ""        ""        ""        ""        ""       
[1569] ""        ""        ""        ""        ""        ""        ""       
[1576] ""        ""        ""        ""        ""        ""        ""       
[1583] ""        ""        ""        ""        ""        ""        ""       
[1590] ""        ""        ""        ""        ""        ""        ""       
[1597] ""        ""        ""        ""        ""        ""        ""       
[1604] ""        ""        ""        ""        ""        ""        ""       
[1611] ""        ""        ""        ""        ""        ""        ""       
[1618] ""        ""        ""        ""        ""        ""        ""       
[1625] ""        ""        ""        ""        ""        ""        ""       
[1632] ""        ""        ""        ""        ""        ""        ""       
[1639] ""        ""        ""        ""        ""        ""        ""       
[1646] ""        ""        ""        ""        ""        ""        ""       
[1653] ""        ""        ""        ""        ""        ""        ""       
[1660] ""       

Scraping Spine Numbers

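spine <- page |> 
  html_nodes(".g-spine") |>   # select elements with the CSS class g-spine
  html_text() |>              # extract the text inside each element
  str_remove_all("\n")        # remove the newline characters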
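
To see what each step of this pipeline does, here is a minimal sketch that runs on a small hand-written table instead of the live page. The HTML snippet and the name toy_page are made up for illustration; minimal_html() is an rvest helper that builds a small document from a string.

```r
library(rvest)
library(stringr)

# Hypothetical snippet that mimics the structure of the spine column
toy_page <- minimal_html('
  <table>
    <tr><th class="g-spine">Spine #</th></tr>
    <tr><td class="g-spine">\n1\n</td></tr>
    <tr><td class="g-spine">\n2\n</td></tr>
  </table>')

toy_spine <- toy_page |>
  html_nodes(".g-spine") |>   # every element with class g-spine, header included
  html_text() |>              # keep only the text inside each element
  str_remove_all("\n")        # drop the newlines surrounding each value

toy_spine
```

The header cell comes along with the data cells because it shares the same class. In rvest 1.0.0 and later, html_elements() is the newer name for html_nodes(); both select the same elements.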

Scraping Titles

title <- page |> 
  html_nodes(".g-title") |> 
  html_text() |> 
  str_remove_all("\n")

Scraping Directors

director <- page |> 
  html_nodes(".g-director") |> 
  html_text() |> 
  str_remove_all("\n")

Scraping Years

year <- page |> 
  html_nodes(".g-year") |> 
  html_text() |> 
  str_remove_all("\n")
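
The four scraping steps above differ only in the CSS class. One possible way to avoid the repetition is a small helper function. This is a sketch, not part of rvest: the name scrape_text() and the toy page are assumptions made up for illustration.

```r
library(rvest)
library(stringr)

# Hypothetical helper: text of every element that matches a CSS selector
scrape_text <- function(page, css) {
  page |>
    html_nodes(css) |>
    html_text() |>
    str_remove_all("\n")
}

# Made-up page standing in for the real one
toy_page <- minimal_html('
  <table>
    <tr><th class="g-title">Title</th><th class="g-year">Year</th></tr>
    <tr><td class="g-title">\nFilm A\n</td><td class="g-year">\n1937\n</td></tr>
  </table>')

toy_title <- scrape_text(toy_page, ".g-title")
toy_year  <- scrape_text(toy_page, ".g-year")
```

With the real page object, the same call would look like scrape_text(page, ".g-director").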

Putting Everything in a Tibble

criterion <- tibble(spine = spine,
                    title = title,
                    director = director,
                    year = year) |> 
  slice(-1) |>                        # drop the first row, which holds the column headers
  mutate(spine = as.numeric(spine),   # convert from character to numeric
         year = as.numeric(year)) 
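
The scraped spine vector ends with many empty strings, and as.numeric() turns each of them into NA. The sketch below shows this on made-up vectors (toy_spine and toy_title are placeholders, not real data) and shows one optional way to drop those rows. Whether to drop them is a judgment call, since those rows may still hold a title.

```r
library(tibble)
library(dplyr)

# Made-up vectors shaped like the scraped ones:
# a header, two numbered rows, and one row with an empty spine number
toy_spine <- c("Spine #", "1", "2", "")
toy_title <- c("Title", "Film A", "Film B", "Film C")

toy <- tibble(spine = toy_spine,
              title = toy_title) |>
  slice(-1) |>                        # drop the header row
  mutate(spine = as.numeric(spine))   # "" becomes NA

toy

# Optional: keep only the rows that have a spine number
toy_numbered <- toy |>
  filter(!is.na(spine))
```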

Writing the Dataset

readr::write_csv(criterion, here::here("data/criterion.csv"))
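
write_csv() does not create folders, so the data folder has to exist before the line above runs. A minimal sketch of the write-and-check step, assuming a temporary folder in place of the project's data folder and a made-up two-row tibble:

```r
library(readr)
library(tibble)

# Made-up data standing in for the criterion tibble
toy <- tibble(spine = c(1, 2),
              title = c("Film A", "Film B"))

# Assumption: a temporary folder is used here instead of the project folder
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE)   # create the folder if it is missing
out_file <- file.path(out_dir, "criterion.csv")

write_csv(toy, out_file)

# Read the file back to confirm it was written as expected
read_back <- read_csv(out_file, show_col_types = FALSE)
read_back
```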

Considerations

Finding data online does not grant permission to scrape and use the data.

  1. Is it ethical? Be especially mindful when using data from human subjects. Check with your Institutional Review Board (IRB) for regulations on research with human subjects data.

  2. Is it legal? Check terms of use. Commercial use may not be permitted.