| 00:00:02 | | dm4v quits [Client Quit] |
| 00:02:08 | | dm4v joins |
| 00:02:10 | | dm4v is now authenticated as dm4v |
| 00:02:10 | | dm4v quits [Changing host] |
| 00:02:10 | | dm4v (dm4v) joins |
| 00:08:11 | | arkhive quits [Remote host closed the connection] |
| 00:19:34 | | Wingy quits [Remote host closed the connection] |
| 00:20:19 | | Wingy (Wingy) joins |
| 00:36:26 | <pabs> | could someone archive these two directories and their subdirs? https://www.acc.umu.se/~maswan/debian-push/ https://ftp.mgts.by/debian-mirror/ |
| 00:36:47 | | pabs is always wary of things that live in random people's home directories |
| 01:03:24 | | dm4v quits [Ping timeout: 265 seconds] |
| 01:05:37 | | dm4v joins |
| 01:05:39 | | dm4v is now authenticated as dm4v |
| 01:05:39 | | dm4v quits [Changing host] |
| 01:05:39 | | dm4v (dm4v) joins |
| 01:06:18 | <@JAA> | Ryz: Grabbing the 4Players.de forums with qwarc now. Their server's quite performant, so shouldn't be a problem to grab it all in time. I'm also continuously grabbing new content as it's posted and should get everything to within a few minutes of the actual shutdown. |
| 01:06:39 | <@JAA> | I'm only fetching the topic page HTML, nothing else. |
| 01:15:25 | <Ryz> | Woo <#>; |
| 01:15:50 | <Ryz> | Hello pabs, what's your reason for archiving these two links? |
| 01:16:03 | <Ryz> | Also uhh, I'm not sure if this is just repeated mirrored content~ |
| 01:17:08 | <pabs> | Ryz: they are referenced by Debian's website, but really should be part of the website itself or some other official location and I'm always wary of things that live in random people's home directories randomly going away |
| 01:18:01 | <@JAA> | ++ |
| 01:19:23 | <Ryz> | ...Huh~ |
| 01:21:05 | <Ryz> | Running both of 'em pabs |
| 01:21:17 | <pabs> | great, thanks |
| 01:21:41 | <Ryz> | They were both small though, huh~ |
| 01:24:52 | <@JAA> | ETA for 4players.de forum topics: 4-5 hours. Afterwards, it'll start fetching the individual post URLs. |
| 01:27:50 | <Ryz> | Meanwhile I'm mining https://www.acc.umu.se/ out for additional goodies :p |
| 01:29:36 | <pabs> | acc.umu.se hosts lots of Linux distro mirrors |
| 01:36:20 | <Ryz> | It's time for an impromptu website archiving adventure >>; |
| 01:38:43 | | pabs quits [Client Quit] |
| 01:40:08 | | pabs (pabs) joins |
| 02:57:38 | | Ruthalas quits [Quit: END OF LINE] |
| 02:58:33 | | Ruthalas (Ruthalas) joins |
| 03:01:32 | | qw3rty_ joins |
| 03:05:19 | | qw3rty__ quits [Ping timeout: 252 seconds] |
| 03:20:09 | | Ruthalas quits [Client Quit] |
| 03:32:16 | | TheTechRobo quits [Ping timeout: 252 seconds] |
| 03:32:46 | | TheTechRobo (TheTechRobo) joins |
| 04:00:52 | | HP_Archivist quits [Ping timeout: 252 seconds] |
| 04:04:20 | | pawbs joins |
| 04:27:03 | | qw3rty__ joins |
| 04:30:49 | | qw3rty_ quits [Ping timeout: 258 seconds] |
| 04:36:39 | | qwertyasdfuiopghjkl joins |
| 05:21:02 | | Arcorann quits [Ping timeout: 258 seconds] |
| 05:33:36 | | pabs quits [Remote host closed the connection] |
| 06:37:09 | | Ctrl-S quits [Read error: Connection reset by peer] |
| 06:37:11 | | Ctrl-S joins |
| 06:39:14 | | @jrwr_ quits [Ping timeout: 622 seconds] |
| 06:39:48 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 06:41:51 | | jrwr_ joins |
| 07:15:31 | | Arcorann (Arcorann) joins |
| 07:21:57 | | Wingy quits [Remote host closed the connection] |
| 07:22:51 | | Wingy (Wingy) joins |
| 08:10:29 | | qwertyasdfuiopghjkl35 joins |
| 08:11:32 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 09:51:34 | <@OrIdow6> | I am saving fastSWF, by the way |
| 09:52:50 | <@OrIdow6> | Going by UTC, currently on track to finish a few hours before |
| 10:15:09 | | pabs (pabs) joins |
| 10:19:04 | | qwertyasdfuiopghjkl35 is now known as qwertyasdfuiopghjkl |
| 10:29:37 | | Barto_ (Barto) joins |
| 10:29:37 | | Barto quits [Read error: Connection reset by peer] |
| 10:31:23 | | Barto_ is now known as Barto |
| 10:46:26 | | drexler quits [Remote host closed the connection] |
| 11:12:42 | | mutantmonkey quits [Remote host closed the connection] |
| 11:13:17 | | mutantmonkey (mutantmonkey) joins |
| 11:25:03 | | mutantmonkey quits [Remote host closed the connection] |
| 11:26:25 | | mutantmonkey (mutantmonkey) joins |
| 11:56:05 | <h2ibot> | Sanqui edited Webzone.ee (+150): https://wiki.archiveteam.org/?diff=47315&oldid=47313 |
| 12:48:14 | | HP_Archivist (HP_Archivist) joins |
| 12:49:58 | | britmob256364 quits [Ping timeout: 252 seconds] |
| 13:13:04 | | britmob256364 joins |
| 13:50:23 | | benjinsmith joins |
| 13:50:52 | | benjins quits [Ping timeout: 258 seconds] |
| 13:53:33 | | benjinsmith is now known as benjins |
| 13:53:43 | | benjins is now authenticated as benjins |
| 14:25:35 | | Wingy quits [Remote host closed the connection] |
| 14:26:27 | | Wingy (Wingy) joins |
| 14:27:54 | <@arkiver> | ping me when the data is on IA OrIdow6 :) |
| 14:32:47 | | paul2520 (paul2520) joins |
| 14:41:37 | | Arcorann quits [Ping timeout: 252 seconds] |
| 15:17:11 | <ThreeHM> | Regarding 4players.de: Have we considered grabbing their self-hosted videos? I've done a quick scrape of the page for all videos that were labelled as "4players exclusive" (about 4.5k), downloaded the JWPlayer XML playlist files and extracted the video URLs from those. Looks like just under 200 GiB to download all of those at the lowest quality. |
| 15:20:11 | <ThreeHM> | Here is the URL list if anyone wants to grab them: https://share.ctrl-c.xyz/gz?17dbbafdb7368fa9fe6721b460e15744/4players_urls_video_smallest.txt.gz |
| 15:21:14 | <ThreeHM> | And the XML files that contain URLs for all available qualities: https://share.ctrl-c.xyz/9f4ac4641887407fb8d0feb216fe7262/4players_video_xml.tar.xz |
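ThreeHM's process (scrape the "4players exclusive" video pages, download each JWPlayer XML playlist, then pull the video URLs out of it) could look roughly like the sketch below. It is not ThreeHM's actual script. It assumes each playlist holds `source` elements (in some namespace) with a `file` attribute and a `label` such as `720p`. The real 4players files may be structured differently. `pick_urls` and the sample XML, including its `urn:example:jwplayer` namespace, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # Strip any "{namespace}" prefix so matching does not depend on the
    # exact JWPlayer namespace URI (an assumption; it may differ per file).
    return tag.rsplit("}", 1)[-1]


def _height(label: str) -> int:
    # Turn labels like "720p" or "HD 1080p" into an integer; unknown -> 0.
    m = re.search(r"(\d+)", label or "")
    return int(m.group(1)) if m else 0


def pick_urls(xml_text: str) -> tuple[str, str]:
    """Return (smallest, largest) video URL from one playlist (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    sources = [
        (_height(el.get("label", "")), el.get("file"))
        for el in root.iter()
        if _local(el.tag) == "source" and el.get("file")
    ]
    if not sources:
        raise ValueError("no <source file=...> elements found")
    sources.sort(key=lambda s: s[0])
    # With a single rendition, smallest == largest, which is one way
    # duplicates end up in a "largest quality" URL list.
    return sources[0][1], sources[-1][1]


SAMPLE = """<rss version="2.0" xmlns:jwplayer="urn:example:jwplayer">
<channel><item>
  <jwplayer:source file="https://example.invalid/v_360.mp4" label="360p"/>
  <jwplayer:source file="https://example.invalid/v_1080.mp4" label="1080p"/>
  <jwplayer:source file="https://example.invalid/v_720.mp4" label="720p"/>
</item></channel></rss>"""

print(pick_urls(SAMPLE))
```

Running one playlist per video through `pick_urls` and collecting the first and second elements would give "lowest quality" and "highest quality" lists like the ones shared above.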
| 15:24:47 | | Wingy quits [Remote host closed the connection] |
| 15:25:33 | | Wingy (Wingy) joins |
| 15:29:28 | <@JAA> | ThreeHM: Could you also share the list of the video pages and the XML playlist files so we can grab those as well? |
| 15:32:12 | <ThreeHM> | JAA: https://share.ctrl-c.xyz/gz?2004cac0fba788b8bc4405e6faf6d49c/4players_urls_videopages.txt.gz https://share.ctrl-c.xyz/gz?440efde4ab0c405d6458ef0c8fdcfdab/4players_urls_video_xml.txt.gz |
| 15:32:44 | <ThreeHM> | Although the AB job might have already grabbed some of those |
| 15:33:00 | <@JAA> | Thanks |
| 15:33:12 | <@JAA> | Yep, but I'm not worried about duplication of a few hundred HTML/XML files. :-) |
| 15:37:23 | <@JAA> | ThreeHM: I see these are the lowest-resolution videos. Any idea about the rough size of the highest-resolution versions? |
| 15:43:34 | <ThreeHM> | JAA: I keep getting my IP blocked from accessing the site while trying to get the total size of those URLs via HTTP HEAD requests, so I don't have an exact number. My guess is a few TB (likely <10) from a crawl that got banned after about half the URLs. |
| 15:44:49 | <@JAA> | Hmm, that doesn't sound too bad except for the deadline. |
| 15:50:19 | <ThreeHM> | I'm sort of suspecting they might keep the site for a bit longer since they're still releasing new content right now (they've just published a review about an hour ago) |
| 15:59:43 | <@JAA> | Hmm, maybe. Have a list of those high-res videos? |
| 16:02:53 | <ThreeHM> | https://share.ctrl-c.xyz/gz?2475c45b22ae7c7bfc7d15d5ca4e1c44/4players_urls_video_largest.txt.gz - That list likely has duplicates since some videos only have a single resolution available |
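The highest-quality list repeats a URL whenever a video has only one rendition, so it can be deduplicated before it goes to a downloader. This is a minimal sketch that assumes each list is gzipped text with one URL per line, as the `.txt.gz` names suggest. `dedupe_urls` and `dedupe_gz_file` are hypothetical helpers.

```python
import gzip
from typing import Iterable


def dedupe_urls(lines: Iterable[str]) -> list[str]:
    """Drop blank lines and repeated URLs, keeping first-seen order."""
    seen: set[str] = set()
    out: list[str] = []
    for line in lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            out.append(url)
    return out


def dedupe_gz_file(src: str, dst: str) -> int:
    """Rewrite a gzipped one-URL-per-line list without duplicates; returns count kept."""
    with gzip.open(src, "rt", encoding="utf-8") as f:
        urls = dedupe_urls(f)
    with gzip.open(dst, "wt", encoding="utf-8") as f:
        f.writelines(u + "\n" for u in urls)
    return len(urls)


# Hypothetical usage on a locally downloaded copy of the list:
# dedupe_gz_file("4players_urls_video_largest.txt.gz",
#                "4players_urls_video_largest.dedup.txt.gz")
```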
| 16:04:06 | <ThreeHM> | Also, here is a list of all thumbnails: https://share.ctrl-c.xyz/gz?bb1f8630d29bbca96e9b72585cf22b75/4players_urls_video_thumbs.txt.gz |
| 16:05:10 | <ThreeHM> | JAA: ^ |
| 16:08:56 | <@JAA> | Cheers |
| 16:10:38 | | spirit joins |
| 16:10:40 | | wyatt8740 joins |
| 16:11:49 | | wyatt8750 quits [Ping timeout: 252 seconds] |
| 16:15:03 | <@JAA> | I'm getting an estimate of 517 GiB from a 5% sample. |
| 16:15:59 | <@JAA> | Oh wait no, that was a 2% sample. So ~1300 GiB then. |
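JAA's correction is just a rescaling. An estimate that was scaled up as if 5% had been sampled, when only 2% actually was, has to be multiplied by 0.05 / 0.02. That gives 517 GiB × 2.5 ≈ 1293 GiB, which matches the "~1300 GiB" above. JAA's actual sampling method isn't described. The following is a minimal sketch of sample-based size estimation in general. `fetch_size` is a hypothetical callable, for example one that sends an HTTP HEAD request and reads `Content-Length`. Servers may omit that header, or block the client, as ThreeHM ran into.

```python
import random
from typing import Callable, Sequence


def estimate_total_bytes(urls: Sequence[str], fraction: float,
                         fetch_size: Callable[[str], int],
                         seed: int = 0) -> float:
    """Size a random sample of roughly `fraction` of the URLs and scale up.

    fetch_size is a hypothetical callable returning one URL's size in bytes.
    """
    k = max(1, round(len(urls) * fraction))
    sample = random.Random(seed).sample(list(urls), k)
    sampled = sum(fetch_size(u) for u in sample)
    # Scale by the fraction actually sampled (k / N), not the nominal one.
    return sampled * len(urls) / k


def rescale(estimate: float, assumed_fraction: float, actual_fraction: float) -> float:
    """Fix an estimate that was scaled up using the wrong sampling fraction."""
    return estimate * assumed_fraction / actual_fraction


print(round(rescale(517, 0.05, 0.02), 1))  # 1292.5 (GiB)
```

Scaling by `len(urls) / k` rather than by `1 / fraction` keeps the estimate consistent even when rounding changes the sample size slightly.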
| 16:22:06 | | wyatt8750 joins |
| 16:23:22 | | wyatt8740 quits [Ping timeout: 252 seconds] |
| 16:28:52 | | wyatt8750 quits [Ping timeout: 252 seconds] |
| 16:30:08 | | wyatt8740 joins |
| 17:02:09 | <@JAA> | DemoDrop shut down about an hour ago after explicitly having 2021-11-01 00:00Z as the deadline in its shutdown header. :-| |
| 17:03:05 | <@JAA> | Looks like I managed to get about 288k streams and missed about 60k. |
| 17:12:00 | | spirit quits [Client Quit] |
| 17:35:58 | | pabs quits [Ping timeout: 252 seconds] |
| 17:36:31 | | pabs (pabs) joins |
| 17:51:05 | | Wingy quits [Remote host closed the connection] |
| 17:51:57 | | Wingy (Wingy) joins |
| 18:18:28 | <Ryz> | Hmm, since 4players.de is located in Germany, that means right now checking https://www.timeanddate.com/worldclock/germany - we have around 3-4 hours until it hits October 31 over there~ |
| 18:19:20 | <Ryz> | After that, we have no idea if it'll shut down when it hits 12am there or at the end of that day~ |
| 18:19:31 | <Ryz> | Could be in the middle |
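Ryz's "3-4 hours" can be checked by converting midnight at the start of 31 October 2021 in Germany to UTC. The log timestamps appear to be UTC, which is consistent with the 22:14 message below saying midnight in Germany has passed. 31 October 2021 was also the day EU summer time ended, at 01:00 UTC, so midnight still fell under CEST (UTC+2). The sketch below hard-codes that EU rule (an assumption made to avoid depending on a time-zone database) and does not handle historical rule changes. In practice `zoneinfo.ZoneInfo("Europe/Berlin")` does the same job. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_sunday_utc(year: int, month: int) -> datetime:
    """Midnight UTC on the last Sunday of the given month."""
    if month == 12:
        first_of_next = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        first_of_next = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    last_day = first_of_next - timedelta(days=1)
    # weekday(): Monday == 0 ... Sunday == 6; step back to the nearest Sunday.
    return last_day - timedelta(days=(last_day.weekday() - 6) % 7)


def german_offset(instant_utc: datetime) -> timedelta:
    """Assumed EU rule: CEST (UTC+2) from 01:00 UTC on the last Sunday of March
    until 01:00 UTC on the last Sunday of October, otherwise CET (UTC+1)."""
    start = last_sunday_utc(instant_utc.year, 3).replace(hour=1)
    end = last_sunday_utc(instant_utc.year, 10).replace(hour=1)
    return timedelta(hours=2) if start <= instant_utc < end else timedelta(hours=1)


def german_midnight_in_utc(year: int, month: int, day: int) -> datetime:
    """UTC instant at which the given calendar day begins in Germany."""
    for hours in (2, 1):
        candidate = datetime(year, month, day, tzinfo=timezone.utc) - timedelta(hours=hours)
        # Keep the offset that is actually in force at the candidate instant.
        if german_offset(candidate) == timedelta(hours=hours):
            return candidate
    raise ValueError("no consistent offset")


midnight = german_midnight_in_utc(2021, 10, 31)
print(midnight)  # 2021-10-30 22:00:00+00:00
now = datetime(2021, 10, 30, 18, 18, 28, tzinfo=timezone.utc)  # Ryz's message time
print(midnight - now)  # 3:41:32
```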
| 18:29:29 | <@JAA> | Apart from a handful of errors, I have all forum contents I think. The post URLs are currently stalled for unrelated reasons, but will most likely not finish in time anyway. |
| 18:33:21 | <@JAA> | Also, the original announcement at https://www.4players.de/4players.php/spielinfonews/Allgemein/3926/2199762/4Players-Mehr_als_14500_Berichte_ueber_200000_News_und_ca_60000_Videos_Das_Magazin_bedankt_sich_fuer_eure_Treue_und_verabschiedet_sich_im_Herbst.html says that 'as of the cutoff date of 31 October 2021, editorial support of the portal will be discontinued', which I read as 'no new content anymore', |
| 18:33:27 | <@JAA> | not 'website goes down immediately', but who knows... |
| 19:15:29 | | Wingy quits [Remote host closed the connection] |
| 19:16:13 | | Wingy (Wingy) joins |
| 20:08:41 | | qwertyasdfuiopghjkl40 joins |
| 20:10:44 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 20:17:45 | | qwertyasdfuiopghjkl40 is now known as qwertyasdfuiopghjkl |
| 20:17:45 | <ThreeHM> | Since we're grabbing videos from 4players: They have another 57k videos listed under "Trailers". Seems to all be 3rd party content supplied by game studios which is why I've collected all their original productions first (I'm guessing most trailers can easily be found elsewhere, but at 57k videos dating back to 2001 - that might not be the case for all of them). I'm currently |
| 20:17:47 | <ThreeHM> | downloading and processing the XMLs; I'll post the lists once the crawl is done in case we want to grab those videos as well. |
| 20:21:30 | <@JAA> | Small correction: My grab of the forums is not quite complete yet. There's one particular topic with 35k pages. Yes, pages, not posts. There are almost 530k posts in that single topic... As you might expect, it's taking its sweet time to load: https://forum.4pforen.4players.de/viewtopic.php?f=63&t=13672 |
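JAA's figures imply roughly 530,000 / 35,000 ≈ 15 posts per page in that topic. The `viewtopic.php?f=…&t=…` URL suggests phpBB. Assuming phpBB's `start` offset parameter addresses later pages, the page URLs for one topic could be enumerated as shown below. The 15-posts-per-page setting is inferred from those two numbers rather than confirmed. JAA's qwarc spec isn't shown in the log, so this is only an illustration. `topic_page_urls` is a hypothetical helper.

```python
import math
from urllib.parse import urlencode


def topic_page_urls(base: str, forum_id: int, topic_id: int,
                    total_posts: int, per_page: int) -> list[str]:
    """Enumerate paginated topic URLs using a phpBB-style `start` offset (assumed)."""
    pages = math.ceil(total_posts / per_page)
    urls = []
    for page in range(pages):
        params = {"f": forum_id, "t": topic_id}
        if page:  # the first page is the bare topic URL
            params["start"] = page * per_page
        urls.append(f"{base}?{urlencode(params)}")
    return urls


urls = topic_page_urls("https://forum.4pforen.4players.de/viewtopic.php",
                       63, 13672, total_posts=530_000, per_page=15)
print(len(urls))  # 35334
print(urls[1])    # the full URL, ending in ?f=63&t=13672&start=15
```

At one request per page, that is on the order of 35k sequential fetches for a single topic, which helps explain why it is taking so long.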
| 20:22:55 | <@JAA> | That's been running for 18+ hours and is a bit over half-way done now. |
| 20:39:43 | <Frogging101> | My mister poll ADULT:ON grab is pretty well done now. I should probably upload it or something |
| 20:39:57 | <Frogging101> | I just finished tweaking the script to grab the poll directory |
| 20:40:43 | <Frogging101> | I'll admit it's kind of bothersome that I probably can't get this into wayback |
| 20:41:22 | <Frogging101> | who has time to download 23G of WARCs to browse some old polls :P |
| 21:24:43 | <ThreeHM> | Ok, here are my URL lists for the trailers hosted on 4players: |
| 21:24:51 | <ThreeHM> | Video view pages: https://share.ctrl-c.xyz/gz?d6220801ff5415826c48ac6f41628155/4players_urls_trailers_pages.txt.gz |
| 21:24:58 | <ThreeHM> | XML playlists: https://share.ctrl-c.xyz/gz?d343b5f5c8e4ee75b18daef92107e6d9/4players_urls_trailers_xml.txt.gz |
| 21:25:03 | <ThreeHM> | Thumbnails: https://share.ctrl-c.xyz/gz?3ded57402e2fab266bba6575257c9464/4players_trailers_thumbnails.txt.gz |
| 21:25:13 | <ThreeHM> | Videos, lowest quality: https://share.ctrl-c.xyz/gz?3ff5cfe5cf551fe4ff4f1d2e8b8c400b/4players_trailers_smallest.txt.gz |
| 21:25:19 | <ThreeHM> | Videos, highest quality: https://share.ctrl-c.xyz/gz?ed4733b8e038e252f7beb62e7e156bb9/4players_trailers_largest.txt.gz |
| 21:25:30 | <ThreeHM> | Subtitles: https://share.ctrl-c.xyz/gz?744c0f4f40402ae5e8350cbdd6574c31/4players_trailers_subtitles.txt.gz |
| 21:52:20 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 22:14:43 | <Ryz> | We're past 12AM in Germany in regards to 4players.de - no shutdown when that time struck~ |
| 22:22:30 | | C4K3 quits [Remote host closed the connection] |
| 22:37:52 | | BlueMaxima joins |
| 22:43:45 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 22:46:43 | | C4K3 joins |
| 22:46:43 | | C4K3 is now authenticated as C4K3 |
| 22:59:12 | | paul2520 quits [Remote host closed the connection] |
| 23:47:38 | | HP_Archivist (HP_Archivist) joins |