| 00:00:24 | | march_happy (march_happy) joins |
| 00:08:20 | | BlueMaxima_ joins |
| 00:09:00 | | wickedplayer494 quits [Remote host closed the connection] |
| 00:09:00 | | BlueMaxima quits [Remote host closed the connection] |
| 00:09:00 | | fuzzy8021 quits [Remote host closed the connection] |
| 00:09:00 | | qwertyasdfuiopghjkl_ quits [Client Quit] |
| 00:09:11 | | fuzzy8021 (fuzzy8021) joins |
| 00:09:17 | | lunik17 quits [Client Quit] |
| 00:09:24 | | lunik17 joins |
| 00:12:35 | | wickedplayer494 joins |
| 00:12:35 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 00:22:33 | | fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173.224.25.67))] |
| 00:22:39 | | fuzzy8021 (fuzzy8021) joins |
| 00:23:34 | | Jens-Rex (JensRex) joins |
| 00:23:49 | | jacobk quits [Client Quit] |
| 00:23:49 | | JensRex quits [Client Quit] |
| 00:23:49 | | wickedplayer494 quits [Remote host closed the connection] |
| 00:23:53 | | jacobk joins |
| 00:23:55 | | qwertyasdfuiopghjkl_ joins |
| 00:38:23 | | Megame quits [Client Quit] |
| 00:39:23 | | jacobk quits [Client Quit] |
| 00:39:23 | | qwertyasdfuiopghjkl_ quits [Remote host closed the connection] |
| 00:39:30 | | jacobk joins |
| 00:50:13 | | HP_Archivist quits [Client Quit] |
| 01:01:53 | | sonick quits [Client Quit] |
| 01:04:06 | | zack joins |
| 01:04:34 | | zack quits [Remote host closed the connection] |
| 01:09:18 | | qwertyasdfuiopghjkl_ joins |
| 01:43:27 | | HP_Archivist (HP_Archivist) joins |
| 01:46:44 | | andrew quits [Read error: Connection reset by peer] |
| 01:54:05 | | march_happy quits [Remote host closed the connection] |
| 01:57:43 | | march_happy (march_happy) joins |
| 03:10:56 | | sonick (sonick) joins |
| 03:11:25 | | qwertyasdfuiopghjkl_ quits [Remote host closed the connection] |
| 03:18:20 | | jacobk quits [Client Quit] |
| 03:18:25 | | jacobk joins |
| 03:32:28 | | march_happy quits [Ping timeout: 268 seconds] |
| 03:32:41 | | march_happy (march_happy) joins |
| 03:37:42 | | march_happy quits [Ping timeout: 276 seconds] |
| 03:38:22 | | march_happy (march_happy) joins |
| 03:45:53 | | march_happy quits [Ping timeout: 265 seconds] |
| 03:46:20 | | march_happy (march_happy) joins |
| 03:52:46 | | qwertyasdfuiopghjkl_ joins |
| 03:53:21 | | qwertyasdfuiopghjkl_ is now known as qwertyasdfuiopghjkl |
| 04:12:10 | | DLoader quits [Quit: DLoader] |
| 04:19:03 | | DLoader joins |
| 04:21:34 | | lukash799 joins |
| 04:30:03 | | lukash799 quits [Client Quit] |
| 04:32:15 | | lukash799 joins |
| 04:48:05 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 04:50:34 | | HP_Archivist (HP_Archivist) joins |
| 04:52:09 | | benjins quits [Read error: Connection reset by peer] |
| 04:53:44 | | benjins joins |
| 05:11:41 | | mrfooooo0 joins |
| 05:24:38 | | mrfooooo06 joins |
| 05:25:31 | | mrfooooo0 quits [Client Quit] |
| 05:25:31 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 05:25:31 | | mrfooooo06 is now known as mrfooooo0 |
| 05:30:36 | | BlueMaxima_ quits [Read error: Connection reset by peer] |
| 05:34:46 | | wickedplayer494 joins |
| 05:34:50 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 05:44:50 | | eroc1990 quits [Quit: The Lounge - https://thelounge.chat] |
| 05:50:37 | | eroc1990 (eroc1990) joins |
| 06:08:15 | | DLoader_ joins |
| 06:12:53 | | thuban1 joins |
| 06:12:59 | | Church_ (Church) joins |
| 06:13:00 | | FalconK_ (FalconK) joins |
| 06:13:34 | | Hackerpcs_1 (Hackerpcs) joins |
| 06:13:38 | | Hackerpcs quits [Ping timeout: 265 seconds] |
| 06:13:38 | | Church quits [Ping timeout: 265 seconds] |
| 06:13:38 | | thuban quits [Ping timeout: 265 seconds] |
| 06:13:38 | | FalconK quits [Ping timeout: 265 seconds] |
| 06:13:38 | | DLoader quits [Ping timeout: 265 seconds] |
| 06:13:38 | | @JAA quits [Ping timeout: 265 seconds] |
| 06:13:47 | | DLoader_ is now known as DLoader |
| 06:13:55 | | JAA (JAA) joins |
| 06:13:55 | | @ChanServ sets mode: +o JAA |
| 06:18:41 | | AlsoJAA (JAA) joins |
| 06:18:41 | | @ChanServ sets mode: +o AlsoJAA |
| 06:31:04 | | qwertyasdfuiopghjkl_ joins |
| 06:32:26 | | Island_ quits [Read error: Connection reset by peer] |
| 06:38:24 | | qwertyasdfuiopghjkl_ is now known as qwertyasdfuiopghjkl |
| 06:56:06 | | sec^nd quits [Ping timeout: 255 seconds] |
| 06:57:55 | | sec^nd (second) joins |
| 07:00:30 | | sec^nd quits [Remote host closed the connection] |
| 07:01:15 | | sec^nd (second) joins |
| 07:30:22 | | sonick quits [Client Quit] |
| 07:54:58 | | yawkat (yawkat) joins |
| 08:56:44 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 09:43:29 | <@Sanqui> | arkiver: I will deduplicate the list and enter what's new. Thank you |
| 10:21:22 | | adia4 (adia) joins |
| 10:29:16 | | sonick (sonick) joins |
| 10:41:30 | | HackMii quits [Remote host closed the connection] |
| 10:42:31 | | HackMii (hacktheplanet) joins |
| 11:56:14 | | qwertyasdfuiopghjkl_ joins |
| 11:57:47 | | qwertyasdfuiopghjkl_ is now known as qwertyasdfuiopghjkl |
| 12:00:22 | | benjins is now authenticated as benjins |
| 12:03:41 | | mrfooooo0 quits [Ping timeout: 268 seconds] |
| 12:48:23 | | tzt quits [Read error: Connection reset by peer] |
| 12:51:40 | | Sluggs joins |
| 12:55:29 | | Arcorann quits [Ping timeout: 268 seconds] |
| 13:22:12 | | mut4ntm0nkey quits [Remote host closed the connection] |
| 13:22:43 | | mut4ntm0nkey (mutantmonkey) joins |
| 13:40:49 | | mrfooooo0 joins |
| 14:01:14 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 15:26:48 | <@arkiver> | Sanqui: sounds good! |
| 15:35:45 | | Island joins |
| 15:47:59 | | mrfooooo0 quits [Ping timeout: 265 seconds] |
| 16:19:57 | <@Sanqui> | <@Sanqui> arkiver: 3482 new domains from your set of 13352 |
| 16:19:57 | <@Sanqui> | <@Sanqui> thank you very much! |
| 16:19:57 | <@Sanqui> | <@Sanqui> (sweb.cz) |
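Sanqui's deduplication step is a set difference: which of arkiver's domains were not already in his list. Below is a minimal Python sketch. It assumes both lists are plain text with one bare domain or URL per line. The function names and the hostname normalisation rules are hypothetical; this is not the tooling Sanqui actually used.

```python
from urllib.parse import urlsplit


def normalize_host(entry: str) -> str:
    """Reduce a URL or bare domain to a lowercase hostname (assumed normalisation)."""
    entry = entry.strip()
    if "://" not in entry:
        entry = "http://" + entry  # urlsplit only recognises the host after a scheme
    host = urlsplit(entry).hostname or ""  # .hostname is already lowercased
    return host.rstrip(".")


def new_domains(already_archived, candidates):
    """Return sorted hosts from `candidates` that are not in `already_archived`."""
    seen = {normalize_host(e) for e in already_archived if e.strip()}
    fresh = {normalize_host(e) for e in candidates if e.strip()}
    return sorted(fresh - seen)


if __name__ == "__main__":
    mine = ["http://foo.sweb.cz/", "bar.sweb.cz"]
    theirs = ["FOO.sweb.cz", "https://baz.sweb.cz/index.html", "bar.sweb.cz"]
    print(new_domains(mine, theirs))  # ['baz.sweb.cz']
```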
| 16:25:44 | <h2ibot> | Sanqui edited Sweb.cz (+307): https://wiki.archiveteam.org/?diff=49176&oldid=49174 |
| 16:28:33 | | mrfooooo0 joins |
| 16:29:47 | | mrfooooo0 quits [Remote host closed the connection] |
| 16:31:23 | <@Sanqui> | arkiver: that said, your domains include a lot of non-sweb urls like http://www.synagoga-slatina.atlasweb.cz/ |
| 16:31:38 | <@Sanqui> | (not that atlasweb.cz shouldn't also be archived at some point ,'D) |
| 16:33:03 | | HP_Archivist quits [Client Quit] |
| 16:59:06 | <@arkiver> | Sanqui: oops, sorry about that |
| 16:59:26 | <@arkiver> | can you filter those out, or should I? |
| 16:59:31 | <@Sanqui> | too late |
| 16:59:33 | <@Sanqui> | they're getting archived |
| 16:59:38 | <@arkiver> | fun :P |
| 16:59:38 | <@Sanqui> | we should archive atlasweb.cz at some point anyway |
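The filtering arkiver offered to do separates hosts that really sit under sweb.cz from lookalikes on other Czech hosts, such as `www.synagoga-slatina.atlasweb.cz`. A sketch follows. It assumes a "match" means the host is exactly `sweb.cz` or one of its subdomains, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit


def is_under(url: str, suffix: str = "sweb.cz") -> bool:
    """True if the URL's host is `suffix` itself or a subdomain of it."""
    if "://" not in url:
        url = "http://" + url
    host = (urlsplit(url).hostname or "").rstrip(".")
    # Requiring the dot prevents e.g. "notsweb.cz" from matching "sweb.cz".
    return host == suffix or host.endswith("." + suffix)


def split_by_domain(urls, suffix: str = "sweb.cz"):
    """Partition URLs into (matching, other) by host suffix, preserving order."""
    matching, other = [], []
    for u in urls:
        (matching if is_under(u, suffix) else other).append(u)
    return matching, other
```

For example, `split_by_domain(["http://www.synagoga-slatina.atlasweb.cz/", "http://abc.sweb.cz/"])` puts the atlasweb.cz URL in the second list. That list can be kept separately for a later atlasweb.cz job.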
| 17:09:53 | <h2ibot> | Switchnode edited Deathwatch (+289, /* 2022 */ add blog.siol.net): https://wiki.archiveteam.org/?diff=49177&oldid=49171 |
| 17:14:08 | <@arkiver> | Sanqui: for blog.siol.net reported by HCross I have a list here (may be incomplete) https://transfer.archivete.am/10qXvT/blog.siol.net.txt |
| 17:14:12 | <@arkiver> | 1700 sites |
| 17:14:17 | <@arkiver> | I hope AB is enough for that? |
| 17:14:51 | <@Sanqui> | yeah, for sweb.cz I put in 4000 domain batches, but that's also because half of them are typically already dead |
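Splitting a long domain list into job-sized pieces, like Sanqui's 4000-domain sweb.cz batches, is a plain chunking step. Only the batch size comes from the chat. The sketch leaves out writing each batch to a file for upload.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 4000) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items, in original order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

At this size, arkiver's roughly 1700 blog.siol.net sites would fit in a single batch.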
| 17:15:16 | <@Sanqui> | does archivebot !a < work without http:// prefixes? |
| 17:15:21 | <@arkiver> | also, do you know how many sweb.cz sites that you had in your lists previously that were not in the list I created? |
| 17:15:32 | <@arkiver> | Sanqui: no, needs http or https |
| 17:15:41 | <@Sanqui> | OK, noted, I will handle it |
| 17:15:41 | <@arkiver> | i think |
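arkiver thought, though he wasn't certain, that `!a <` needs every entry to carry an `http://` or `https://` scheme. Sanqui said he would handle that. A sketch of that preprocessing, assuming `http://` is an acceptable default for bare domains:

```python
def with_scheme(entries, scheme: str = "http"):
    """Prefix bare domains with a scheme; keep entries that already have http(s)://."""
    out = []
    for raw in entries:
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines in the input list
        if entry.lower().startswith(("http://", "https://")):
            out.append(entry)
        else:
            out.append(f"{scheme}://{entry}")
    return out
```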
| 17:16:42 | <@HCross> | arkiver: i have bad news |
| 17:16:44 | <@HCross> | it's all wordpress |
| 17:16:49 | <@HCross> | hilariously butchered wordpress |
| 17:17:40 | <@Sanqui> | arkiver: 3.5k domains from the 13k you provided were new, I previously archived 144k domains, that means 131k of them you didn't know about |
| 17:18:04 | <@Sanqui> | (but some of them may never have appeared online -- because I also derived sweb.com/[username] to [username].sweb.cz) |
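Sanqui describes deriving `[username].sweb.cz` hosts from path-style `/[username]` URLs. The chat writes the path-style form as `sweb.com/[username]`. The sketch below assumes path-style URLs look like `http://www.sweb.cz/username/...` or `http://sweb.cz/username`, because the exact forms in his sources aren't given. The character filter on usernames is also an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit


def subdomain_from_path(url: str, base: str = "sweb.cz") -> Optional[str]:
    """Map http://[www.]sweb.cz/<user>/... to <user>.sweb.cz; None if it doesn't apply."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").rstrip(".")
    if host not in (base, "www." + base):
        return None  # only the path-style form on the main host is rewritten
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    user = segments[0].lower()
    # Assumed filter: hostname labels may only use ASCII letters, digits and hyphens,
    # which also rejects file names such as "index.html".
    if not user.isascii() or not all(c.isalnum() or c == "-" for c in user):
        return None
    return f"{user}.{base}"
```

As Sanqui notes, a host derived this way may never actually have existed, so some of these will simply fail to resolve.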
| 17:18:31 | <@arkiver> | Sanqui: oof |
| 17:18:46 | <@arkiver> | it was a very incomplete list then, good to know |
| 17:18:50 | <@Sanqui> | wait I may have miscalculated that (I'm bad at math) |
| 17:18:51 | <@arkiver> | ah |
| 17:18:54 | <@arkiver> | hmm |
| 17:18:57 | <@Sanqui> | but you get the idea |
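Sanqui flags his own arithmetic as possibly off. The worked numbers below take the chat's figures at face value. They assume 144k is the rounded size of his set before the 3,482 new domains were added, and that both lists contain deduplicated hostnames.

```python
arkiver_total = 13_352     # domains in arkiver's list
new_to_sanqui = 3_482      # of those, not previously archived by Sanqui
sanqui_previous = 144_000  # "previously archived 144k" (rounded in the chat)

overlap = arkiver_total - new_to_sanqui    # domains present in both lists
only_sanqui = sanqui_previous - overlap    # Sanqui's domains missing from arkiver's list
naive = sanqui_previous - arkiver_total    # the subtraction behind the "131k" figure

print(f"overlap: {overlap}")
print(f"in Sanqui's set but not arkiver's: ~{only_sanqui}")
print(f"naive estimate: ~{naive}")
```

On these assumptions the gap is roughly 134k rather than 131k. The 131k figure subtracts all 13,352 of arkiver's domains, including the 3,482 that Sanqui did not already have.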
| 17:19:20 | <@arkiver> | HCross: i'm not expert on AB - what is the consequence of that for AB? |
| 17:19:22 | <@Sanqui> | I got my urls from Bing scrape, CDX, and mwlinkscrape (including czech wikipedia) |
| 17:19:31 | <@arkiver> | nice sources |
| 17:21:08 | <@Sanqui> | I still want to scrape some warcs for more domains but depends on if I find the time |
| 17:21:33 | <@Sanqui> | where did you source yours arkiver? |
| 17:24:21 | | Iki1 quits [Ping timeout: 268 seconds] |
| 17:49:00 | <joepie91|m> | does urlteam also scrape t.co links? |
| 17:52:01 | <Peetz0r|m> | 👀 |
| 17:52:06 | <joepie91|m> | :p |
| 18:22:49 | <@JAA> | I've said it before in #archivebot but it easily gets swallowed by the noise there, so for visibility: stay away from !a < if at all possible. It has lots of pitfalls that may lead to either missing content or overzealous recursion, depending on the URL list, cross-links, and even retrieval order. This is why it isn't documented (and won't be until there are changes made in that area). |
| 18:23:11 | <@JAA> | Sanqui, arkiver: ^ |
| 18:23:45 | <@Sanqui> | oh dear |
| 18:24:11 | <@Sanqui> | I've always had good experience with it, and I just did all of sweb with it. I can't think of an alternative way |
| 18:26:20 | <@JAA> | It should be fine if there are no links between the sites appearing in the list (assuming it's all plain domains without paths, else it gets more complicated). But if there are, various things can go wrong. |
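JAA's condition for `!a <` being reasonably safe is that no site in the list links to another site in the list. One way to check that condition ahead of time is sketched below. It assumes you already have each site's outlinks, for example from an earlier crawl. The function and its input shape are hypothetical, and this is not part of ArchiveBot.

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lowercase hostname of a URL or bare domain."""
    return (urlsplit(url if "://" in url else "http://" + url).hostname or "").rstrip(".")


def cross_links(outlinks: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Return sorted (source_host, target_host) pairs where one listed site links to another."""
    listed = {host_of(site) for site in outlinks}
    pairs = set()
    for site, links in outlinks.items():
        src = host_of(site)
        for link in links:
            dst = host_of(link)
            if dst in listed and dst != src:  # ignore self-links and unlisted hosts
                pairs.add((src, dst))
    return sorted(pairs)
```

An empty result means the "no links between the sites" case JAA describes. Any pairs returned are candidates for the missing-content or over-recursion problems he warns about.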
| 18:27:31 | <@JAA> | One alternative might be to use wget-at or wpull with --span-hosts (and no --span-hosts-allow) and a --domains filter, although that would skip external page requisites and outlinks. Or wpull with a custom accept_url hook that overrides its internal filtering. |
| 18:28:07 | <@JAA> | Neither of these are *good* alternatives though. |
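JAA's first alternative combines `--span-hosts` with a `--domains` filter and deliberately omits `--span-hosts-allow`. The sketch below only assembles such a command line; it does not run a crawl. The binary name, the input file name and the WARC name are assumptions that depend on your setup. As JAA points out, this approach skips page requisites and outlinks hosted outside the listed domains.

```python
import shlex
from typing import List, Sequence


def build_span_hosts_command(tool: str, input_file: str,
                             domains: Sequence[str], warc_name: str) -> List[str]:
    """Assemble argv for wget-at or wpull: recurse across hosts, limited by --domains."""
    return [
        tool,                            # e.g. "wpull" or "wget-at" (install-dependent)
        "--input-file", input_file,      # one URL per line
        "--recursive",
        "--span-hosts",                  # allow leaving the starting host...
        "--domains", ",".join(domains),  # ...but only into these domains
        "--warc-file", warc_name,
        # Intentionally no --span-hosts-allow, per JAA's suggestion.
    ]


if __name__ == "__main__":
    cmd = build_span_hosts_command("wpull", "urls.txt", ["sweb.cz"], "sweb-batch-01")
    print(shlex.join(cmd))
```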
| 18:29:48 | <schwarzkatz|m> | JAA: I don’t think I got these kinds of errors while going through my url list of lacartoonerie |
| 19:18:36 | <schwarzkatz|m> | however, there seem to be database errors on some pages, and grab-site did not notice them (maybe they were 200s?) |
| 19:24:13 | | daxxy quits [Quit: bye] |
| 19:24:27 | | daxxy (daxxy) joins |
| 19:24:49 | | daxxy quits [Client Quit] |
| 19:25:02 | | daxxy (daxxy) joins |
| 19:30:17 | <@JAA> | schwarzkatz|m: Oof. What do those errors look like? |
| 19:36:22 | <schwarzkatz|m> | I already deleted the warc locally, so I don't know really, sorry |
| 19:41:56 | <schwarzkatz|m> | since I can't even find it via the search on archive.org, here it is if you want to take a look https://archive.org/details/forum.lacartoonerie.com-2022-11-11-24a72456-00000.warc |
| 19:42:34 | <@JAA> | Ack, thanks. |
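schwarzkatz suspected the database errors were served with status 200. If so, a status-based check would not flag them. A post-crawl check could look for error text in successful responses, as sketched below. The marker strings are hypothetical placeholders, because the actual error text was not quoted in the chat. Reading the `(url, status, body)` tuples out of the WARC, for example with the warcio library, is left out.

```python
from typing import Iterable, List, Tuple

# Hypothetical markers; the real error text from the forum was not shown in the chat.
ERROR_MARKERS = (b"database error", b"sql syntax", b"could not connect to database")


def soft_errors(responses: Iterable[Tuple[str, int, bytes]]) -> List[str]:
    """Return URLs whose status is 200 but whose body contains a known error marker."""
    flagged = []
    for url, status, body in responses:
        lowered = body.lower()  # case-insensitive match against the lowercase markers
        if status == 200 and any(m in lowered for m in ERROR_MARKERS):
            flagged.append(url)
    return flagged
```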
| 20:10:19 | | lukash799 quits [Read error: Connection reset by peer] |
| 20:10:22 | <h2ibot> | Fidel edited List of websites excluded from the Wayback Machine (+21, Add pawoo.net, as it is now (2022-11-21)…): https://wiki.archiveteam.org/?diff=49178&oldid=49151 |
| 20:28:15 | | yano1 is now known as yano |
| 20:42:34 | | daxxy_ (daxxy) joins |
| 20:46:37 | | daxxy quits [Ping timeout: 268 seconds] |
| 20:57:23 | | BlueMaxima joins |
| 21:00:30 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=49179&oldid=49178 |
| 21:06:49 | | eroc1990 quits [Client Quit] |
| 21:29:03 | | HP_Archivist (HP_Archivist) joins |
| 21:37:12 | | sec^nd quits [Ping timeout: 255 seconds] |
| 21:42:38 | | sec^nd (second) joins |
| 22:00:55 | | tzt (tzt) joins |
| 22:20:31 | | eroc1990 (eroc1990) joins |
| 22:27:54 | | eroc1990 quits [Client Quit] |
| 22:28:30 | | sec^nd quits [Ping timeout: 255 seconds] |
| 22:30:46 | | sec^nd (second) joins |
| 22:35:51 | | eroc1990 (eroc1990) joins |
| 22:52:50 | <schwarzkatz|m> | how is illegal material on archive.org handled? Or in site grabs in general? |
| 22:54:11 | <schwarzkatz|m> | site owner wants to know before eventually helping us save their site |
| 23:00:20 | <ivan> | generally if something causes a problem the item is darked and no one ever sees it again |
| 23:01:08 | <schwarzkatz|m> | hm, so it gets reported through archive.org and they will handle it all? |
| 23:04:47 | <schwarzkatz|m> | the previous owner cannot be held responsible, right? |
| 23:05:26 | <schwarzkatz|m> | that was a dumb question, I'm sorry :/ |
| 23:05:39 | <ivan> | "the previous owner" you mean the original website? |
| 23:05:44 | <schwarzkatz|m> | yes |
| 23:07:02 | <schwarzkatz|m> | probably best if I just paste the question of the owner here: |
| 23:07:02 | <schwarzkatz|m> | There could also be illegal material on there, as I've had problems with it in the past. How will this be handled? |
| 23:07:12 | <ivan> | if I collect evidence that someone broke the law and put it on archive.org, can they be held responsible for breaking the law? |
| 23:07:17 | <ivan> | perhaps |
| 23:07:21 | <schwarzkatz|m> | ivan, I thank you for helping me, I just don't want to say something wrong |
| 23:08:36 | <ivan> | if you can just archive in such a way to not collect illegal material that would be optimal |
| 23:10:15 | | sec^nd quits [Remote host closed the connection] |
| 23:10:19 | <ivan> | you can PM me and I'll tell you the odds of someone getting prosecuted |
| 23:10:19 | <ivan> | haha |
| 23:10:30 | <schwarzkatz|m> | that's probably impossible... we're looking at a file hoster here, how would you even determine if it is illegal via scripting :D |
| 23:11:16 | <ivan> | putting those on IA seems dubious |
| 23:11:34 | <ivan> | collect known-good links from forums and archive those |
| 23:11:55 | | sec^nd (second) joins |
| 23:11:57 | <schwarzkatz|m> | it's a pomf clone |
| 23:12:28 | <schwarzkatz|m> | well, it's way older than pomf |
| 23:12:28 | <schwarzkatz|m> | losing that amount of files would be insane |
| 23:18:21 | | sec^nd quits [Remote host closed the connection] |
| 23:18:41 | | sec^nd (second) joins |
| 23:27:54 | | sec^nd quits [Ping timeout: 255 seconds] |
| 23:30:08 | | sec^nd (second) joins |
| 23:32:12 | | march_happy quits [Remote host closed the connection] |
| 23:32:45 | | march_happy (march_happy) joins |
| 23:40:48 | | lukash799 joins |
| 23:58:55 | | sec^nd quits [Remote host closed the connection] |