00:00:24march_happy (march_happy) joins
00:08:20BlueMaxima_ joins
00:09:00wickedplayer494 quits [Remote host closed the connection]
00:09:00BlueMaxima quits [Remote host closed the connection]
00:09:00fuzzy8021 quits [Remote host closed the connection]
00:09:00qwertyasdfuiopghjkl_ quits [Client Quit]
00:09:11fuzzy8021 (fuzzy8021) joins
00:09:17lunik17 quits [Client Quit]
00:09:24lunik17 joins
00:12:35wickedplayer494 joins
00:22:33fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173.224.25.67))]
00:22:39fuzzy8021 (fuzzy8021) joins
00:23:34Jens-Rex (JensRex) joins
00:23:49jacobk quits [Client Quit]
00:23:49JensRex quits [Client Quit]
00:23:49wickedplayer494 quits [Remote host closed the connection]
00:23:53jacobk joins
00:23:55qwertyasdfuiopghjkl_ joins
00:38:23Megame quits [Client Quit]
00:39:23jacobk quits [Client Quit]
00:39:23qwertyasdfuiopghjkl_ quits [Remote host closed the connection]
00:39:30jacobk joins
00:50:13HP_Archivist quits [Client Quit]
01:01:53sonick quits [Client Quit]
01:04:06zack joins
01:04:34zack quits [Remote host closed the connection]
01:09:18qwertyasdfuiopghjkl_ joins
01:43:27HP_Archivist (HP_Archivist) joins
01:46:44andrew quits [Read error: Connection reset by peer]
01:54:05march_happy quits [Remote host closed the connection]
01:57:43march_happy (march_happy) joins
03:10:56sonick (sonick) joins
03:11:25qwertyasdfuiopghjkl_ quits [Remote host closed the connection]
03:18:20jacobk quits [Client Quit]
03:18:25jacobk joins
03:32:28march_happy quits [Ping timeout: 268 seconds]
03:32:41march_happy (march_happy) joins
03:37:42march_happy quits [Ping timeout: 276 seconds]
03:38:22march_happy (march_happy) joins
03:45:53march_happy quits [Ping timeout: 265 seconds]
03:46:20march_happy (march_happy) joins
03:52:46qwertyasdfuiopghjkl_ joins
03:53:21qwertyasdfuiopghjkl_ is now known as qwertyasdfuiopghjkl
04:12:10DLoader quits [Quit: DLoader]
04:19:03DLoader joins
04:21:34lukash799 joins
04:30:03lukash799 quits [Client Quit]
04:32:15lukash799 joins
04:48:05HP_Archivist quits [Read error: Connection reset by peer]
04:50:34HP_Archivist (HP_Archivist) joins
04:52:09benjins quits [Read error: Connection reset by peer]
04:53:44benjins joins
05:11:41mrfooooo0 joins
05:24:38mrfooooo06 joins
05:25:31mrfooooo0 quits [Client Quit]
05:25:31qwertyasdfuiopghjkl quits [Client Quit]
05:25:31mrfooooo06 is now known as mrfooooo0
05:30:36BlueMaxima_ quits [Read error: Connection reset by peer]
05:34:46wickedplayer494 joins
05:44:50eroc1990 quits [Quit: The Lounge - https://thelounge.chat]
05:50:37eroc1990 (eroc1990) joins
06:08:15DLoader_ joins
06:12:53thuban1 joins
06:12:59Church_ (Church) joins
06:13:00FalconK_ (FalconK) joins
06:13:34Hackerpcs_1 (Hackerpcs) joins
06:13:38Hackerpcs quits [Ping timeout: 265 seconds]
06:13:38Church quits [Ping timeout: 265 seconds]
06:13:38thuban quits [Ping timeout: 265 seconds]
06:13:38FalconK quits [Ping timeout: 265 seconds]
06:13:38DLoader quits [Ping timeout: 265 seconds]
06:13:38@JAA quits [Ping timeout: 265 seconds]
06:13:47DLoader_ is now known as DLoader
06:13:55JAA (JAA) joins
06:13:55@ChanServ sets mode: +o JAA
06:18:41AlsoJAA (JAA) joins
06:18:41@ChanServ sets mode: +o AlsoJAA
06:31:04qwertyasdfuiopghjkl_ joins
06:32:26Island_ quits [Read error: Connection reset by peer]
06:38:24qwertyasdfuiopghjkl_ is now known as qwertyasdfuiopghjkl
06:56:06sec^nd quits [Ping timeout: 255 seconds]
06:57:55sec^nd (second) joins
07:00:30sec^nd quits [Remote host closed the connection]
07:01:15sec^nd (second) joins
07:30:22sonick quits [Client Quit]
07:54:58yawkat (yawkat) joins
08:56:44qwertyasdfuiopghjkl quits [Client Quit]
09:43:29<@Sanqui>arkiver: I will deduplicate the list and enter what's new. Thank you
10:21:22adia4 (adia) joins
10:29:16sonick (sonick) joins
10:41:30HackMii quits [Remote host closed the connection]
10:42:31HackMii (hacktheplanet) joins
11:56:14qwertyasdfuiopghjkl_ joins
11:57:47qwertyasdfuiopghjkl_ is now known as qwertyasdfuiopghjkl
12:03:41mrfooooo0 quits [Ping timeout: 268 seconds]
12:48:23tzt quits [Read error: Connection reset by peer]
12:51:40Sluggs joins
12:55:29Arcorann quits [Ping timeout: 268 seconds]
13:22:12mut4ntm0nkey quits [Remote host closed the connection]
13:22:43mut4ntm0nkey (mutantmonkey) joins
13:40:49mrfooooo0 joins
14:01:14qwertyasdfuiopghjkl quits [Client Quit]
15:26:48<@arkiver>Sanqui: sounds good!
15:35:45Island joins
15:47:59mrfooooo0 quits [Ping timeout: 265 seconds]
16:19:57<@Sanqui>arkiver: 3482 new domains from your set of 13352
16:19:57<@Sanqui>thank you very much!
16:19:57<@Sanqui>(sweb.cz)
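
The deduplication step mentioned above (keep only the domains that are not already in the archived set) can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used; the function name `new_domains` and the sample domains are hypothetical, and it assumes domains are compared case-insensitively.

```python
def new_domains(candidates, already_archived):
    """Return candidate domains not yet archived, deduplicated, in first-seen order."""
    # Normalise to lowercase, since hostnames are case-insensitive.
    seen = {d.strip().lower() for d in already_archived}
    fresh = []
    for domain in candidates:
        key = domain.strip().lower()
        if key and key not in seen:
            seen.add(key)  # also drops repeats within the incoming list
            fresh.append(key)
    return fresh


# Hypothetical sample data, for illustration only.
archived = ["alice.sweb.cz", "bob.sweb.cz"]
incoming = ["Alice.sweb.cz", "carol.sweb.cz", "carol.sweb.cz", "dave.sweb.cz"]
print(new_domains(incoming, archived))
```

Keeping first-seen order means the output can be split into batches without reshuffling the contributor's list.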
16:25:44<h2ibot>Sanqui edited Sweb.cz (+307): https://wiki.archiveteam.org/?diff=49176&oldid=49174
16:28:33mrfooooo0 joins
16:29:47mrfooooo0 quits [Remote host closed the connection]
16:31:23<@Sanqui>arkiver: that said, your domains include a lot of non-sweb urls like http://www.synagoga-slatina.atlasweb.cz/
16:31:38<@Sanqui>(not that atlasweb.cz shouldn't also be archived at some point ,'D)
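
One way to keep URLs like the atlasweb.cz one out of a sweb.cz list is to compare parsed hostnames rather than match substrings, since the text "sweb.cz" also occurs inside "atlasweb.cz". The sketch below is an assumption about how such a filter could look, not how either list was built; `is_under` is a hypothetical helper, and it expects URLs that already carry a scheme.

```python
from urllib.parse import urlsplit


def is_under(url, parent="sweb.cz"):
    """True if the URL's host is `parent` itself or a subdomain of it."""
    # hostname is None for entries without a scheme, so fall back to "".
    host = (urlsplit(url).hostname or "").lower()
    # Requiring the leading dot rejects hosts that merely end in the same letters.
    return host == parent or host.endswith("." + parent)


urls = [
    "http://example.sweb.cz/",  # hypothetical sweb.cz site
    "http://www.synagoga-slatina.atlasweb.cz/",
]
print([u for u in urls if is_under(u)])
```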
16:33:03HP_Archivist quits [Client Quit]
16:59:06<@arkiver>Sanqui: oops, sorry about that
16:59:26<@arkiver>can you filter those out, or should I?
16:59:31<@Sanqui>too late
16:59:33<@Sanqui>they're getting archived
16:59:38<@arkiver>fun :P
16:59:38<@Sanqui>we should archive atlasweb.cz at some point anyway
17:09:53<h2ibot>Switchnode edited Deathwatch (+289, /* 2022 */ add blog.siol.net): https://wiki.archiveteam.org/?diff=49177&oldid=49171
17:14:08<@arkiver>Sanqui: for blog.siol.net reported by HCross I have a list here (may be incomplete) https://transfer.archivete.am/10qXvT/blog.siol.net.txt
17:14:12<@arkiver>1700 sites
17:14:17<@arkiver>I hope AB is enough for that?
17:14:51<@Sanqui>yeah, for sweb.cz I put in 4000 domain batches, but that's also because half of them are typically already dead
17:15:16<@Sanqui>does archivebot !a < work without http:// prefixes?
17:15:21<@arkiver>also, do you know how many sweb.cz sites in your previous lists were not in the list I created?
17:15:32<@arkiver>Sanqui: no, needs http or https
17:15:41<@Sanqui>OK, noted, I will handle it
17:15:41<@arkiver>i think
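
If the list file does need a scheme on every line, as suggested above, bare domains can be prefixed before the list is uploaded. A minimal sketch, assuming plain `http://` is an acceptable default; `with_scheme` is a hypothetical helper and the bare domain in the sample is made up.

```python
def with_scheme(entry, scheme="http"):
    """Prefix a bare domain with a scheme; leave full URLs untouched."""
    entry = entry.strip()
    if entry.lower().startswith(("http://", "https://")):
        return entry
    return f"{scheme}://{entry}"


lines = ["example.sweb.cz", "https://blog.siol.net/", " http://www.synagoga-slatina.atlasweb.cz/ "]
print([with_scheme(line) for line in lines])
```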
17:16:42<@HCross>arkiver: i have bad news
17:16:44<@HCross>it's all wordpress
17:16:49<@HCross>hilariously butchered wordpress
17:17:40<@Sanqui>arkiver: 3.5k domains from the 13k you provided were new. I previously archived 144k domains, so that means 131k of them you didn't know about
17:18:04<@Sanqui>(but some of them may never have appeared online -- because I also derived sweb.com/[username] to [username].sweb.cz)
17:18:31<@arkiver>Sanqui: oof
17:18:46<@arkiver>it was a very incomplete list then, good to know
17:18:50<@Sanqui>wait I may have miscalculated that (I'm bad at math)
17:18:51<@arkiver>ah
17:18:54<@arkiver>hmm
17:18:57<@Sanqui>but you get the idea
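
The estimate can be worked through with the figures quoted in the channel. This is only a sketch: 144k is a rounded number, and it assumes every domain in the provided list that was not new was already in the previously archived set.

```python
provided = 13352               # domains in the provided list
new = 3482                     # of those, not previously archived
previously_archived = 144_000  # approximate figure quoted in the channel

# Domains that appear in both the provided list and the earlier archive.
overlap = provided - new

# Previously archived domains that the provided list did not contain.
missing_from_provided = previously_archived - overlap

print(overlap, missing_from_provided)
```

Under those assumptions the overlap is 9870 domains, which puts the number missing from the provided list at roughly 134k rather than 131k.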
17:19:20<@arkiver>HCross: I'm no expert on AB - what is the consequence of that for AB?
17:19:22<@Sanqui>I got my URLs from a Bing scrape, CDX, and mwlinkscrape (including Czech Wikipedia)
17:19:31<@arkiver>nice sources
17:21:08<@Sanqui>I still want to scrape some warcs for more domains but depends on if I find the time
17:21:33<@Sanqui>where did you source yours arkiver?
17:24:21Iki1 quits [Ping timeout: 268 seconds]
17:49:00<joepie91|m>does urlteam also scrape t.co links?
17:52:01<Peetz0r|m>👀
17:52:06<joepie91|m>:p
18:22:49<@JAA>I've said it before in #archivebot but it easily gets swallowed by the noise there, so for visibility: stay away from !a < if at all possible. It has lots of pitfalls that may lead to either missing content or overzealous recursion, depending on the URL list, cross-links, and even retrieval order. This is why it isn't documented (and won't be until there are changes made in that area).
18:23:11<@JAA>Sanqui, arkiver: ^
18:23:45<@Sanqui>oh dear
18:24:11<@Sanqui>I've always had good experience with it, and I just did all of sweb with it. I can't think of an alternative way
18:26:20<@JAA>It should be fine if there are no links between the sites appearing in the list (assuming it's all plain domains without paths, else it gets more complicated). But if there are, various things can go wrong.
18:27:31<@JAA>One alternative might be to use wget-at or wpull with --span-hosts (and no --span-hosts-allow) and a --domains filter, although that would skip external page requisites and outlinks. Or wpull with a custom accept_url hook that overrides its internal filtering.
18:28:07<@JAA>Neither of these is a *good* alternative though.
18:29:48<schwarzkatz|m>JAA: I don’t think I got these kinds of errors while going through my url list of lacartoonerie
19:18:36<schwarzkatz|m>however, there seem to be database errors on some pages, and grab-site did not notice them (maybe they were 200s?)
19:24:13daxxy quits [Quit: bye]
19:24:27daxxy (daxxy) joins
19:24:49daxxy quits [Client Quit]
19:25:02daxxy (daxxy) joins
19:30:17<@JAA>schwarzkatz|m: Oof. What do those errors look like?
19:36:22<schwarzkatz|m>I already deleted the warc locally, so I don't know really, sorry
19:41:56<schwarzkatz|m>since I can't even find it via the search on archive.org, here it is if you want to take a look https://archive.org/details/forum.lacartoonerie.com-2022-11-11-24a72456-00000.warc
19:42:34<@JAA>Ack, thanks.
20:10:19lukash799 quits [Read error: Connection reset by peer]
20:10:22<h2ibot>Fidel edited List of websites excluded from the Wayback Machine (+21, Add pawoo.net, as it is now (2022-11-21)…): https://wiki.archiveteam.org/?diff=49178&oldid=49151
20:28:15yano1 is now known as yano
20:42:34daxxy_ (daxxy) joins
20:46:37daxxy quits [Ping timeout: 268 seconds]
20:57:23BlueMaxima joins
21:00:30<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=49179&oldid=49178
21:06:49eroc1990 quits [Client Quit]
21:29:03HP_Archivist (HP_Archivist) joins
21:37:12sec^nd quits [Ping timeout: 255 seconds]
21:42:38sec^nd (second) joins
22:00:55tzt (tzt) joins
22:20:31eroc1990 (eroc1990) joins
22:27:54eroc1990 quits [Client Quit]
22:28:30sec^nd quits [Ping timeout: 255 seconds]
22:30:46sec^nd (second) joins
22:35:51eroc1990 (eroc1990) joins
22:52:50<schwarzkatz|m>how is illegal material on archive.org handled? Or in site grabs in general?
22:54:11<schwarzkatz|m>site owner wants to know before eventually helping us save their site
23:00:20<ivan>generally if something causes a problem the item is darked and no one ever sees it again
23:01:08<schwarzkatz|m>hm, so it gets reported through archive.org and they will handle it all?
23:04:47<schwarzkatz|m>the previous owner cannot be held responsible, right?
23:05:26<schwarzkatz|m>that was a dumb question, I'm sorry :/
23:05:39<ivan>"the previous owner" you mean the original website?
23:05:44<schwarzkatz|m>yes
23:07:02<schwarzkatz|m>probably best if I just paste the question of the owner here:
23:07:02<schwarzkatz|m>There could also be illegal material on there, as I've had problems with it in the past. How will this be handled?
23:07:12<ivan>if I collect evidence that someone broke the law and put it on archive.org, can they be held responsible for breaking the law?
23:07:17<ivan>perhaps
23:07:21<schwarzkatz|m>ivan, I thank you for helping me, I just don't want to say something wrong
23:08:36<ivan>if you can just archive in such a way to not collect illegal material that would be optimal
23:10:15sec^nd quits [Remote host closed the connection]
23:10:19<ivan>you can PM me and I'll tell you the odds of someone getting prosecuted
23:10:19<ivan>haha
23:10:30<schwarzkatz|m>that's probably impossible... we're looking at a file hoster here, how would you even determine if it is illegal via scripting :D
23:11:16<ivan>putting those on IA seems dubious
23:11:34<ivan>collect known-good links from forums and archive those
23:11:55sec^nd (second) joins
23:11:57<schwarzkatz|m>it's a pomf clone
23:12:28<schwarzkatz|m>well, it's way older than pomf
23:12:28<schwarzkatz|m>losing that amount of files would be insane
23:18:21sec^nd quits [Remote host closed the connection]
23:18:41sec^nd (second) joins
23:27:54sec^nd quits [Ping timeout: 255 seconds]
23:30:08sec^nd (second) joins
23:32:12march_happy quits [Remote host closed the connection]
23:32:45march_happy (march_happy) joins
23:40:48lukash799 joins
23:58:55sec^nd quits [Remote host closed the connection]