00:04:23archiveDrill quits [Quit: The Lounge - https://thelounge.chat]
00:15:17dabs quits [Read error: Connection reset by peer]
00:18:33<Flashfire42>https://itch.io/t/5149036/reindexing-adult-nsfw-content
00:19:13<@JAA>→ #scratchtheitch (already posted there)
00:31:52dabs joins
00:37:47Guest58 joins
00:40:33hamouda joins
00:45:37Wohlstand quits [Quit: Wohlstand]
00:47:12<hamouda>Hi everyone, I'm back after archive.org was hacked. I'm the OP of this post on Reddit: https://www.reddit.com/r/Archiveteam/comments/1gdszot/archiving_archives_of_highly_important_lost_forums/ . You told me to wait until the website had fully recovered before these archives could be scraped. I want them as WARC 1.1 so I can convert them to ZIM.
00:47:12<hamouda>any ideas? thank you for giving me this opportunity.
00:50:53<pokechu22>Are you specifically interested in just the 5 pages linked in that reddit post, or the entirety of https://al-maktaba.org/ (which I'm not sure how to do, since the main page redirects)
00:53:04<pokechu22>ah, each "book" has multiple pages corresponding to individual threads, hmm
00:53:21<hamouda>they are not pages, yes.
00:54:58<hamouda>They are related to the shamela.ws website. It's one of the best libraries on the web.
00:58:01<pokechu22>I've started an archivebot job: http://archivebot.com/?initialFilter=ahlalhdeeth.com - WARCs will appear at https://archive.fart.website/archivebot/viewer/job/13g82 when it finishes
00:58:12Webuser903083 joins
01:00:08<hamouda>thank you so much. will this be WARC 1.1 or 1.0?
01:02:54<pokechu22>ArchiveBot generates WARC 1.0. I'm not sure what actually changed with WARC 1.1; https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/ seems to have some new features but they probably aren't required
01:03:32<pokechu22>You could try https://archive.fart.website/archivebot/viewer/job/20240103195343cadvi to see if it works as expected
01:06:12Webuser220582 joins
01:07:38<hamouda>OK, will this crawl get the pages or just the titles of the book? See for example this title: https://al-maktaba.org/book/31621/5 and its pages are https://al-maktaba.org/book/31621/6#p1 https://al-maktaba.org/book/31621/7#p1 https://al-maktaba.org/book/31621/8#p1 etc.
01:08:10Webuser903083 quits [Client Quit]
01:09:53<pokechu22>Yes, it should get those
01:10:53<hamouda>that's great. thank you.
01:11:14<@JAA>Re WARC 1.0 vs 1.1, see the two diff* files here: https://github.com/JustAnotherArchivist/warc-specifications/tree/comparison-1.0-v-1.1/specifications/warc-format/warc-1.1
01:21:57<hamouda>Good. I asked about WARC 1.1 because I'm using warc2zim to convert WARC 1.1 files and play them in the Kiwix app. I haven't tried converting from WARC 1.0 yet, so I'll test whether it works normally, but I think it should.
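[Editor's note] Since the question above is whether a given file is WARC 1.0 or 1.1, here is a minimal sketch of checking that before conversion. It relies only on the fact that every WARC record begins with a version line such as "WARC/1.0". The function name warc_version is hypothetical, and the sketch assumes a seekable stream that is either plain or gzip-compressed; it is not part of warc2zim or ArchiveBot.

```python
import gzip
import io


def warc_version(stream):
    """Return the version (e.g. '1.0') declared by the first record of a WARC stream.

    Hypothetical helper: assumes `stream` is a seekable binary stream holding
    either a plain WARC file or a gzip-compressed one.
    """
    magic = stream.read(2)
    stream.seek(0)
    if magic == b"\x1f\x8b":  # gzip magic bytes
        stream = gzip.GzipFile(fileobj=stream)
    first_line = stream.readline().strip()
    if not first_line.startswith(b"WARC/"):
        raise ValueError("stream does not start with a WARC version line")
    return first_line[len(b"WARC/"):].decode("ascii")


if __name__ == "__main__":
    # Tiny in-memory record, gzip-compressed the way .warc.gz files are.
    sample = b"WARC/1.0\r\nWARC-Type: warcinfo\r\nContent-Length: 0\r\n\r\n\r\n\r\n"
    print(warc_version(io.BytesIO(gzip.compress(sample))))
```

For a file on disk, the same function could be called on a handle opened in binary mode, for example open(path, "rb").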
01:27:42etnguyen03 (etnguyen03) joins
01:31:42nicolas17_ joins
01:32:59nicolas17 quits [Ping timeout: 260 seconds]
01:37:29etnguyen03 quits [Client Quit]
01:46:53etnguyen03 (etnguyen03) joins
01:47:08nicolas17 joins
01:47:34nicolas17_ quits [Ping timeout: 260 seconds]
02:28:03Guest58 quits [Client Quit]
02:29:16etnguyen03 quits [Remote host closed the connection]
02:32:19dabs quits [Client Quit]
03:03:00kansei quits [Quit: ZNC 1.10.1 - https://znc.in]
03:08:17<h2ibot>TriangleDemon edited YouTube (+48, UPDATE): https://wiki.archiveteam.org/?diff=56659&oldid=56658
03:10:10kansei (kansei) joins
03:14:18Guest58 joins
03:16:04Hackerpcs quits [Quit: Hackerpcs]
03:23:22Hackerpcs (Hackerpcs) joins
03:28:02nicolas17_ joins
03:28:29nicolas17 quits [Ping timeout: 260 seconds]
03:29:38Guest58 quits [Read error: Connection reset by peer]
03:30:05Guest58 joins
03:41:22nicolas17_ is now known as nicolas17
03:47:13Guest58_ joins
03:48:24Guest58_ quits [Client Quit]
03:50:39Guest58 quits [Ping timeout: 260 seconds]
04:18:52Guest58 joins
04:21:36devkev0 (devkev) joins
04:23:19ATinySpaceMarine quits [Ping timeout: 260 seconds]
04:23:19devkev quits [Ping timeout: 260 seconds]
04:23:19devkev0 is now known as devkev
04:24:43ATinySpaceMarine joins
04:29:53evergreen quits [Quit: Ping timeout (120 seconds)]
04:30:02evergreen joins
04:30:19khaoohs quits [Ping timeout: 260 seconds]
04:30:34lennier2_ quits [Ping timeout: 240 seconds]
04:44:50GradientCat quits [Quit: Connection closed for inactivity]
04:47:34midou quits [Ping timeout: 240 seconds]
04:55:03midou joins
05:01:59hamouda quits [Quit: Ooops, wrong browser tab.]
05:07:17Guest58 quits [Read error: Connection reset by peer]
05:18:50i_have_n0_idea37 quits [Quit: The Lounge - https://thelounge.chat]
05:20:49i_have_n0_idea37 (i_have_n0_idea) joins
05:21:05nicolas17_ joins
05:23:24nicolas17 quits [Ping timeout: 260 seconds]
05:23:50khaoohs joins
05:23:53Webuser220582 quits [Quit: Ooops, wrong browser tab.]
05:38:55khaoohs quits [Remote host closed the connection]
05:39:12khaoohs joins
05:40:13Guest58 joins
05:58:18feed quits [Quit: Limnoria 2024.12.20]
05:58:28feed (feed) joins
06:11:47nicolas17_ is now known as nicolas17
06:13:26PredatorIWD25 joins
06:14:32Island quits [Read error: Connection reset by peer]
06:33:25awauwa (awauwa) joins
06:47:49lennier2_ joins
07:01:00Guest58 quits [Client Quit]
07:04:58Webuser580573 joins
07:10:42Webuser580573 quits [Client Quit]
07:18:54Sokar quits [Read error: Connection reset by peer]
07:20:51Sokar joins
07:33:43Guest58 joins
08:43:12<h2ibot>TriangleDemon edited YouTube (+99, Add YouTube crawls): https://wiki.archiveteam.org/?diff=56660&oldid=56659
08:45:12<h2ibot>TriangleDemon edited Yahoo! Video (+87, Add data crawls): https://wiki.archiveteam.org/?diff=56661&oldid=47342
08:47:00Dada joins
08:47:12<h2ibot>TriangleDemon edited TikTok (+38, Add data crawls): https://wiki.archiveteam.org/?diff=56662&oldid=56095
08:49:13<h2ibot>TriangleDemon edited Rumble (+213): https://wiki.archiveteam.org/?diff=56663&oldid=51579
08:50:13<h2ibot>TriangleDemon uploaded File:Rumble.png: https://wiki.archiveteam.org/?title=File%3ARumble.png
08:50:14<h2ibot>TriangleDemon uploaded File:Rumble homepage.png: https://wiki.archiveteam.org/?title=File%3ARumble%20homepage.png
08:52:13<h2ibot>TriangleDemon edited Rumble (+87): https://wiki.archiveteam.org/?diff=56666&oldid=56663
09:26:04celestial quits [Ping timeout: 260 seconds]
09:27:05celestial joins
09:36:34TheEnbyperor_ quits [Ping timeout: 260 seconds]
09:36:44TheEnbyperor_ (TheEnbyperor) joins
09:37:34TheEnbyperor quits [Ping timeout: 240 seconds]
09:38:41TheEnbyperor_ is now known as TheEnbyperor
09:38:43TheEnbyperor_ joins
09:54:57sec^nd quits [Remote host closed the connection]
09:58:35sec^nd (second) joins
09:59:20BornOn420 quits [Remote host closed the connection]
09:59:58BornOn420 (BornOn420) joins
10:02:36LunarianBunny1147 (LunarianBunny1147) joins
10:04:34Lunarian1 quits [Ping timeout: 260 seconds]
11:00:03Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
11:02:50Bleo182600722719623455222 joins
11:16:28egallager joins
12:24:28Deksor joins
12:25:09Snivy quits [Ping timeout: 260 seconds]
12:25:27<Deksor>Hello, I just realized that anandtech.com is now fully gone (only the forum is online)
12:25:27<Deksor>Is the ETA of the archive up to date ? https://wiki.archiveteam.org/index.php/AnandTech :(
12:27:11<Deksor>Also, where could I download such an archive?
12:47:44<that_lurker>it should be in the wayback machine. Do you want a local copy?
12:58:36<Deksor>Yes.
13:08:44egallager quits [Read error: Connection reset by peer]
13:08:56egallager joins
13:35:12ericgallager joins
13:36:54egallager quits [Ping timeout: 260 seconds]
14:23:18<that_lurker>Deksor: https://archive.fart.website/archivebot/viewer/job/20240901213047bvqa8 this is most likely the latest copy
14:29:09<Deksor>thanks !
14:37:08PredatorIWD25 quits [Read error: Connection reset by peer]
14:40:27PredatorIWD25 joins
14:41:00IDK (IDK) joins
14:49:49midou quits [Ping timeout: 260 seconds]
15:00:07midou joins
15:06:37Barto quits [Quit: WeeChat 4.7.0]
15:06:56Webuser623865 joins
15:10:08Webuser623865 quits [Client Quit]
15:11:03Barto (Barto) joins
16:05:15Island joins
16:29:04awauwa quits [Quit: awauwa]
16:33:26archiveDrill joins
16:35:07GradientCat (GradientCat) joins
16:49:55BennyOtt quits [Quit: ZNC 1.10.1 - https://znc.in]
16:50:56BennyOtt (BennyOtt) joins
16:54:13BennyOtt quits [Client Quit]
16:56:03BennyOtt (BennyOtt) joins
17:23:49nicolas17 quits [Ping timeout: 260 seconds]
17:26:16nicolas17 joins
17:38:53UwU joins
17:44:53<mgrandi>https://cpb.org/pressroom/Corporation-Public-Broadcasting-Addresses-Operations-Following-Loss-Federal-Funding , Should probably throw the site in archive bot
17:47:11<pokechu22>https://archive.fart.website/archivebot/viewer/job/20250503221447lj30p - looks like that took 16 hours when we did it a few months ago
17:47:42<pokechu22>will run it in a bit once the election stuff finishes
17:58:54HP_Archivist (HP_Archivist) joins
18:17:20<yano>https://blog.google/technology/developers/googl-link-shortening-update/
18:18:01justaguy is now known as mystique_altrosky
18:24:54<Jens>Many such cases.
18:25:57<nulldata>https://www.cnn.com/2025/08/01/media/trump-cpb-corporation-public-media-shuts-down
18:26:25<@JAA>→ #UncleSamsArchive and #urlteamwasright respectively
19:15:58<h2ibot>Anonymoususer852 edited Talk:Tracker (+2, Typo "In" → "Done".): https://wiki.archiveteam.org/?diff=56667&oldid=56640
19:15:59<h2ibot>Anonymoususer852 edited Talk:Tracker (+13, Place references before my signature.): https://wiki.archiveteam.org/?diff=56668&oldid=56667
19:22:16Barto quits [Quit: WeeChat 4.7.0]
19:24:19Barto (Barto) joins
19:26:31archiveDrill4 joins
19:27:14archiveDrill quits [Ping timeout: 240 seconds]
19:27:14archiveDrill4 is now known as archiveDrill
19:37:34<hexagonwin>When using grab-site, is it OK to have a very large ignore list (2.7M)? I scraped a website but it's missing quite a few pages, so I'm trying to make it scrape everything except what I already have.
19:48:13CYBERDEV joins
19:52:39<pokechu22>That's probably not going to perform well
19:53:07<pokechu22>it might make more sense to somehow add those specific URLs to the database as done, but I don't know exactly how to go about doing that
19:53:36IDK quits [Quit: Connection closed for inactivity]
19:56:21cuphead2527480 (Cuphead2527480) joins
20:29:38FiTheArchiver joins
20:32:14anonymoususer852 quits [Ping timeout: 260 seconds]
20:59:14anonymoususer852 (anonymoususer852) joins
21:03:51etnguyen03 (etnguyen03) joins
21:14:25GradientCat quits [Quit: Connection closed for inactivity]
21:21:55Larsenv quits [Quit: The Lounge - https://thelounge.chat]
21:26:27etnguyen03 quits [Client Quit]
21:26:48etnguyen03 (etnguyen03) joins
21:35:21<@JAA>grab-site's ignores are a bit different than AB's, and it might scale a bit better, but 2.7 million sounds like it'll be slow regardless.
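[Editor's note] The scaling concern discussed above can be illustrated with a small sketch. It does not reproduce how grab-site or ArchiveBot implement ignores; it only contrasts the general cost of testing each URL against a list of patterns with an exact-match lookup of already-fetched URLs, which is closer to the "mark them as done" idea. All names and the example.com URLs are hypothetical.

```python
import re


def ignored_by_patterns(url, patterns):
    """Linear scan: the URL is tested against patterns until one matches.

    With millions of patterns, a URL that matches none of them pays for
    every single pattern check.
    """
    return any(p.search(url) for p in patterns)


def already_fetched(url, seen):
    """Hash lookup: the cost does not grow with the number of stored URLs."""
    return url in seen


# Hypothetical data: URLs already archived, plus one that is still missing.
have = ["https://example.com/page/1", "https://example.com/page/2"]
candidates = have + ["https://example.com/page/3"]

# One anchored, escaped pattern per known URL (what a huge ignore list amounts to).
patterns = [re.compile("^" + re.escape(u) + "$") for u in have]
seen = set(have)

todo_by_patterns = [u for u in candidates if not ignored_by_patterns(u, patterns)]
todo_by_set = [u for u in candidates if not already_fetched(u, seen)]

print(todo_by_patterns)
print(todo_by_set)
```

Both approaches select the same remaining URL here; the difference is only in how the work per URL grows as the list of known URLs gets larger.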
21:36:36etnguyen03 quits [Client Quit]
21:37:15etnguyen03 (etnguyen03) joins
21:41:46Larsenv (Larsenv) joins
21:44:24<h2ibot>TriangleDemon edited GeoCities (+32, Add data crawl): https://wiki.archiveteam.org/?diff=56669&oldid=55244
21:45:24<h2ibot>TriangleDemon edited Sketch (+46, Add data crawl): https://wiki.archiveteam.org/?diff=56670&oldid=49246
21:46:32Larsenv quits [Remote host closed the connection]
21:47:25<h2ibot>TriangleDemon edited Karayou.com (+0, update): https://wiki.archiveteam.org/?diff=56671&oldid=50362
21:49:09Larsenv (Larsenv) joins
21:49:25<h2ibot>TriangleDemon edited Colors! (-1): https://wiki.archiveteam.org/?diff=56672&oldid=56597
22:08:56etnguyen03 quits [Remote host closed the connection]
22:10:11etnguyen03 (etnguyen03) joins
22:16:42Yakov joins
22:18:04SootBector quits [Remote host closed the connection]
22:19:16SootBector (SootBector) joins
22:29:00Yakov quits [Changing host]
22:29:00Yakov (Yakov) joins
22:36:01cuphead2527480 quits [Client Quit]
22:37:41lunik1 quits [Quit: :x]
22:38:13lunik1 joins
22:39:30Dada quits [Remote host closed the connection]
22:45:42Wohlstand (Wohlstand) joins
23:01:49nicolas17_ joins
23:03:54nicolas17 quits [Ping timeout: 260 seconds]
23:04:23Hackerpcs quits [Quit: Hackerpcs]
23:15:15FiTheArchiver quits [Read error: Connection reset by peer]
23:20:19nicolas17_ is now known as nicolas17
23:27:28cuphead2527480 (Cuphead2527480) joins
23:44:54UwU quits [Ping timeout: 240 seconds]
23:47:00UwU- joins
23:47:28etnguyen03 quits [Client Quit]
23:50:19hackbug quits [Remote host closed the connection]
23:50:52UwU- quits [Client Quit]
23:53:04hackbug (hackbug) joins
23:54:39UwU joins
23:58:44UwU quits [Client Quit]
23:59:22UwU joins