| 02:25:27 | <klea> | lovely https://w.wiki/ WMF seems to have a url shortener: https://w.wiki/94fS example url |
| 02:35:40 | <@JAA> | Yes, dumps are available, too: https://dumps.wikimedia.org/other/shorturls/ |
| 02:36:19 | <klea> | oh good |
| 03:22:08 | | DogsRNice_ quits [Read error: Connection reset by peer] |
| 03:30:03 | | tzt quits [Quit: tzt] |
| 03:30:30 | | tzt (tzt) joins |
| 06:10:10 | | ArchivalEfforts quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
| 06:10:19 | | ArchivalEfforts joins |
| 11:10:08 | | archiveDrill quits [Ping timeout: 256 seconds] |
| 14:29:19 | | pabs quits [Ping timeout: 272 seconds] |
| 14:32:18 | | pabs (pabs) joins |
| 15:02:53 | | that_lurker quits [Ping timeout: 272 seconds] |
| 15:07:34 | | that_lurker (that_lurker) joins |
| 15:13:45 | <justauser> | Huh. Do we have https://goo.gl/fb/mthfjx -style covered? |
| 15:20:41 | <nstrom|m> | yeah we did those w/ goo-gl-grab |
| 15:21:38 | <klea> | https://archive.org/details/UrlteamWebCrawls?tab=collection&query=goo-gl |
| 15:21:56 | <klea> | curl -Ls https://archive.org/download/urlteam_2025-08-25-00-17-01/goo-gl.2025-08-25-00-17-01.zip/goo-gl%2F______.txt.xz | unxz -dc |
| 15:22:04 | <klea> | ofc for all items, not just that one |
| 15:25:53 | <klea> | tho maybe this query is better: https://archive.org/details/UrlteamWebCrawls?tab=collection&and%5B%5D=subject%3A%22goo-gl%22 |
| 15:27:28 | <justauser> | tl;dr looks painful. |
| 15:27:57 | <justauser> | I wouldn't be surprised if someone/something has the data in one piece. |
| 15:28:15 | <justauser> | Maybe datechnoman already includes those in stash? |
| 15:28:48 | <justauser> | Or that many of them were fed down #//? |
| 15:29:27 | <klea> | im not sure. |
| 15:29:44 | <klea> | i can try to download those if you want. |
| 15:29:48 | <justauser> | That's exactly why I asked. |
| 15:30:33 | <klea> | iirc i wrote a python script to get the torrent links, that's easy to repurpose |
| 15:31:04 | <klea> | oh no that used the scrape endpoint thingy :p |
| 15:31:21 | <klea> | i'll just use jq |
| 15:33:55 | <klea> | aaa zip has date |
| 15:39:14 | <klea> | huh, the number of underscores changes :( |
| 15:39:48 | <klea> | and more than one file in some zips |
| 16:03:13 | <klea> | justauser: if you want to download them, here: <https://transfer.archivete.am/WbXO3/urls.txt>, i'm too lazy to extract them and all that sorry |
| 16:03:14 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/WbXO3/urls.txt |
| 17:04:18 | | TastyWiener95 quits [Ping timeout: 256 seconds] |
| 17:08:17 | | TastyWiener95 (TastyWiener95) joins |
| 18:33:40 | | DogsRNice joins |
| 21:41:52 | | jesterjunk_ joins |
| 21:45:03 | | jesterjunk quits [Ping timeout: 272 seconds] |
| 22:35:35 | <@JAA> | That's the URLTeam data, which is easy to process. goo-gl-grab and #urlteamwasright are separate. |
| 23:16:55 | | atphoenix__ (atphoenix) joins |
| 23:20:03 | | atphoenix_ quits [Ping timeout: 272 seconds] |
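
The quirks klea ran into with the URLTeam zips (a varying number of underscores in the member name, and more than one file in some zips) can be handled by matching on directory and extension instead of an exact file name. This is a minimal sketch, not anyone's actual script. It assumes every relevant member sits under `goo-gl/` and ends in `.txt.xz`, as in the `curl` example above; the function name `iter_lines` is hypothetical, and the sketch makes no claim about the line format inside the files.

```python
import io
import lzma
import zipfile


def iter_lines(zip_bytes, prefix="goo-gl/", suffix=".txt.xz"):
    """Yield (member_name, line) for every xz-compressed text member of one zip.

    Assumption: relevant members start with `prefix` and end with `suffix`;
    the number of underscores in between is deliberately not checked.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if not (name.startswith(prefix) and name.endswith(suffix)):
                continue  # skip anything that is not a goo-gl dump file
            # Decompress the member as a stream instead of loading it whole.
            with zf.open(name) as raw, lzma.open(
                raw, "rt", encoding="utf-8", errors="replace"
            ) as text:
                for line in text:
                    yield name, line.rstrip("\n")
```

Feeding it the bytes of each zip listed in urls.txt, one after another, would give a single stream of lines across all items.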