00:00:32Arcorann (Arcorann) joins
00:06:05Perk quits [Client Quit]
00:09:13Perk joins
00:13:37Perk quits [Client Quit]
00:15:13DLoader quits [Quit: DLoader]
00:15:33DLoader (DLoader) joins
00:15:39Perk joins
00:24:26<eightthree>should I use wpull, wget-lua or something else for downloading random website domains? wpull seems not in the aur, but will it work just fine in a python venv if my distro discourages installing directly through pip3 install?
00:24:46<eightthree>I'd be looking to get as close to browsing the actual website (only using local data and maybe even code, perhaps the same search engine like lucene or whatever the website uses),
00:26:52Perk quits [Client Quit]
00:28:11Perk joins
00:49:57Wohlstand quits [Client Quit]
01:08:35<thuban>eightthree: you probably want https://github.com/ArchiveTeam/grab-site/ (and a viewer like https://replayweb.page/ to browse the resulting warc)
01:09:46<pabs>eightthree: if it was me I'd just ask folks to run it in ArchiveBot, that doesn't have a JS interpreter or do page interactions though, which are often needed to get everything
01:14:39<thuban>fireonlive: you're still monitoring the archivebot websocket for project urls, correct?
01:15:01<fireonlive>indeed
01:15:17pabs is too (for code, wikis and Mailman/2)
01:15:26<thuban>cool, ty
01:15:35<thuban>oh lol, that explains why i couldn't remember who was doing it
01:16:16<fireonlive>:)
01:16:34<thuban>just double-checking since i noticed mediafire links in some of these scanlation blog jobs
01:17:57<fireonlive>ah ye
01:36:20MrMcNuggets joins
01:37:13MrMcNuggets quits [Client Quit]
01:47:32<eightthree>pabs: im confused, if I want to save the page as if it were downloaded/viewed with js and everything as though a real user browsing in a browser, are you saying wpull and wget-lua don't reproduce the resulting page "bug for bug" and "bit for bit", or is it the archivebot and the specific grab-site that doesn't? Or both? I don't think my website has a grab-site project associated with it, unless I just use the generic one that...
01:47:37<eightthree>... isn't tailored to any specific site?
01:48:38<@JAA>eightthree: None of these tools know what JS is.
01:49:17<@JAA>If a site is very script-heavy, you may need brozzler.
01:49:58<@JAA>But exact and functional reproduction of script-heavy sites is hard to impossible.
01:50:08<@JAA>In the general case, anyway.
01:51:08<eightthree>thuban: I think I'll try locally hosting that https://github.com/webrecorder/replayweb.page, hopefully it works just as well as the site, thanks!
01:52:27<@JAA>That's only for playback, not for archival.
01:53:04<pabs>eightthree: think about a game written in JS, if you don't play it in a browser to the end and do all the side quests, you won't get everything. some JS websites are similar
01:53:28<@JAA>That's a good analogy! :-)
01:53:44<pabs>horrifying one but yeah :)
01:55:25<nicolas17>I remember a point-and-click Flash game (Myst style), each possible place you could be standing at was a different .swf
01:56:05<pabs>whoa
01:56:28<pabs>eightthree: which sites are you interested in btw?
01:58:57<nicolas17>pabs: well that's better than actual Myst/Riven where each possible place you can be standing at and each possible *state* the room can be in (door open / door closed?) is a different bitmap image
02:05:08<TheTechRobo>JAA: Would you happen to still have your process for retrieving the highest-quality audio from The Artists Union on WBM?
02:05:47TheTechRobo would rather not dig through logs with TheLounge's awful search functionality
02:06:35<@JAA>You're the second one tonight to confuse its perceived awful search with its actual general awfulness. ;-)
02:06:58<@JAA>I did it with the CDX.
02:07:11<TheTechRobo>JAA: Oh, yeah, TL sucks, but its search sucks especially. :-)
02:07:23<TheTechRobo>Trouble is, I still haven't gotten around to making my replacement for it.
02:07:34<thuban>TheTechRobo: recent discussion re tau: https://hackint.logs.kiska.pw/archiveteam-bs/20240224
02:07:35<TheTechRobo>Whether I like it or not, it checks the most boxes out of any IRC client I've seen so far.
02:07:37<@JAA>Find the relevant WARC via WBM headers, sort the corresponding CDX by offset (field 10 or whatever it is), then look at the nearby responses.
02:07:52<TheTechRobo>thuban: Ah thanks, forgor about public logs.
02:08:17<@JAA>There are at least two or three different URL patterns for the audio URLs, so that's the most reliable method.
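A minimal sketch of the CDX step JAA describes above, written in Python. It assumes the classic space-separated 11-field CDX layout, in which the original URL is the third field and the record offset is the tenth; JAA hedges on the field number himself, so both indices are parameters rather than fixed values. The function name `neighbours_by_offset` and the sample lines are hypothetical and only illustrate the idea of sorting by offset and reading the records stored next to a known capture.

```python
from typing import Iterable, List, Tuple


def neighbours_by_offset(
    cdx_lines: Iterable[str],
    target_url: str,
    window: int = 2,
    url_field: int = 2,      # assumption: original URL is field 3 (0-based index 2)
    offset_field: int = 9,   # assumption: offset is field 10 (0-based index 9)
) -> List[Tuple[int, str]]:
    """Return (offset, url) pairs stored near target_url in one WARC's CDX."""
    records = []
    for raw in cdx_lines:
        line = raw.strip()
        # Skip blank lines and the " CDX N b a ..." header line.
        if not line or line.startswith("CDX"):
            continue
        fields = line.split(" ")
        if len(fields) <= max(url_field, offset_field):
            continue
        try:
            offset = int(fields[offset_field])
        except ValueError:
            continue  # not a numeric offset, so not the layout we assumed
        records.append((offset, fields[url_field]))

    # Records written back to back in the WARC end up adjacent after sorting.
    records.sort()

    nearby = set()
    for i, (_, url) in enumerate(records):
        if url == target_url:
            nearby.update(records[max(0, i - window): i + window + 1])
    return sorted(nearby)


sample = [
    " CDX N b a m s k r M S V g",
    "com,example)/page 20240101000000 https://example.com/page text/html 200 AAA - - 512 3000 example.warc.gz",
    "com,example)/audio.mp3 20240101000001 https://example.com/audio.mp3 audio/mpeg 200 BBB - - 9000 3512 example.warc.gz",
    "com,example)/ 20240101000000 https://example.com/ text/html 200 CCC - - 400 100 example.warc.gz",
]

for offset, url in neighbours_by_offset(sample, "https://example.com/page", window=1):
    print(offset, url)
```

The input is expected to be the CDX for a single WARC, since offsets are only comparable within one file. Matching on position rather than on URL shape is what makes this robust against the several audio URL patterns mentioned above.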
02:08:34<nicolas17>my phone autocomplete learned the word "forgor" recently
02:08:40<fireonlive>The Lounge best irc client
02:08:46<TheTechRobo>Fun fact about The Lounge I learned a few weeks ago: Its sqlite database option is literally just a table of JSON objects. No wonder it can only show messages it's loaded into memory already.
02:09:12<@JAA>Yeah, the awfulness is fractal.
02:09:28<fireonlive>yeah.... best.. database.. design.. 🥲
02:11:04<TheTechRobo>TheLounge--
02:11:05<eggdrop>[karma] 'TheLounge' now has -1 karma!
02:11:11<TheTechRobo>or is it with a space
02:11:20<fireonlive>!karma The Lounge
02:11:21<eggdrop>[karma] "The Lounge" has -1 karma
02:11:28<fireonlive>The Lounge--
02:11:30<eggdrop>[karma] 'The Lounge' now has -3 karma!
02:11:33<fireonlive>some ppl like https://github.com/glowing-bear/glowing-bear
02:11:37<fireonlive>but it uploads to imgur i think
02:11:38<@JAA>The automatic history deletion in the next release is going to surprise a bunch of people.
02:12:04<TheTechRobo>fireonlive: That actually looks really cool
02:12:06<TheTechRobo>JAA: The what?
02:12:10<@JAA>:-)
02:12:19<fireonlive>rip me history
02:12:31<fireonlive>it's disabled by default though isn't it?
02:12:38<@JAA>The 'data hoarder' option only deletes 'low-value' messages.
02:12:39<TheTechRobo>Where's that meme when I need it?
02:12:48<@JAA>As I said, the awfulness is fractal.
02:12:52<@JAA>https://github.com/thelounge/thelounge/pull/4799
02:13:11<thuban>ಠ_ಠ
02:13:18<@JAA>fireonlive: Might be, yeah. I wouldn't trust them to not enable it by default though.
02:13:27<fireonlive>true
02:13:34<fireonlive>maybe i should look into glowing bear
02:13:43<fireonlive>and replacing imgur with.. not that
02:13:59<TheTechRobo>yeah that'll be way easier than what I wanted to do
02:14:14<@JAA>The Lounge--
02:14:15<eggdrop>[karma] 'The Lounge' now has -4 karma!
02:15:20<TheTechRobo>> and then provides some nice features on top of that, like embedding images, videos, and other content
02:15:20<TheTechRobo>please tell me that's configurable
02:15:20Lord_Nightmare (Lord_Nightmare) joins
02:17:15<fireonlive>the ddos must continue
02:22:50<eightthree>I'm trying to join #down-the-tube and failing 4+ times through heisenbridge. Is there anything different about the room, like +R or +r, just in case? Is the room functional? Otherwise I'll keep trying to sort it out with other bridge users in case it's failing at that level
02:26:19<icedice>Convos is another option: https://convos.chat/
02:28:12<@JAA>eightthree: #down-the-tube has the same modes as this channel, -p.
02:47:10^ quits [Ping timeout: 255 seconds]
02:47:49^ (^) joins
02:49:35<TheTechRobo>icedice: This feels imposing https://lounge.thetechrobo.ca/uploads/3ffe46ce618ea206/image.png
02:52:38<h2ibot>Ryz edited List of websites excluded from the Wayback Machine (+28, Added https://www.tamindir.com/): https://wiki.archiveteam.org/?diff=51989&oldid=51982
03:00:39<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51990&oldid=51989
03:14:10Perk quits [Read error: Connection reset by peer]
03:16:32Perk joins
03:16:33Perk7 joins
03:16:34Perk quits [Remote host closed the connection]
03:16:34Perk7 is now known as Perk
03:38:59^ quits [Remote host closed the connection]
03:39:13^ (^) joins
03:48:34fireonlive is now known as \
03:48:40\ is now known as fireonlive
04:04:32<thuban>aw, this (spanish-language) scanlation blog has a bunch of taringa links :(
04:05:48<@JAA>F
04:36:41<thuban>and this other one a bunch of zippyshare :(
04:46:44GNU_world joins
05:14:51grid joins
06:15:14qwertyasdfuiopghjkl quits [Client Quit]
06:16:22<nicolas17>thuban: not on WBM?
06:17:25<thuban>didn't check, since we can't do anything about it at this point
06:24:47qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
06:28:36Ruthalas59 quits [Client Quit]
06:39:21bladem quits [Read error: Connection reset by peer]
06:39:31BlueMaxima quits [Read error: Connection reset by peer]
06:42:05Naruyoko5 quits [Quit: Leaving]
06:53:46Ruthalas59 (Ruthalas) joins
06:59:58Naruyoko joins
07:03:13pabs quits [Ping timeout: 255 seconds]
07:05:09pabs (pabs) joins
07:24:45grid quits [Client Quit]
07:26:12<thuban>hm, do we have a regex for pastebin we can add to the url lists template?
07:32:53<fireonlive>i've just been using the imgur one but with pastebin\.com instead
08:19:29GNU_world quits [Ping timeout: 272 seconds]
09:00:01Bleo182600 quits [Client Quit]
09:01:22Bleo182600 joins
09:27:39GNU_world joins
09:32:02line joins
09:35:29line_ quits [Ping timeout: 272 seconds]
09:47:02<icedice><TheTechRobo> icedice: This feels imposing https://lounge.thetechrobo.ca/uploads/3ffe46ce618ea206/image.png
09:47:03<icedice>oof
09:47:14<icedice>See you guys later
09:47:16icedice quits [Client Quit]
10:15:23VickoSaviour joins
10:18:18<VickoSaviour>I was just searching the wiki for fun when I checked the Friendster archive. Looking at it, surely it was a big and a close one, like a Google+ project... Now, when I checked the website that was supposed to be offline, I saw that it is online (probably relaunching, copyright 2023) and it has an early access participation. Someone could edit the
10:18:19<VickoSaviour>wiki page...
10:22:11<VickoSaviour>Also, while I'm still on IRC, can someone tell me what happened to the DeviantArt project? We have about 1 million items, and it is stuck at 1.31 items... Did we finish backing up the groups or did it get stuck on those items?
10:24:44JaffaCakes118 (JaffaCakes118) joins
10:28:25JaffaCakes118_2 quits [Ping timeout: 255 seconds]
10:33:17JaffaCakes118 quits [Remote host closed the connection]
10:47:12JaffaCakes118 (JaffaCakes118) joins
11:02:33<imer>VickoSaviour: I believe deviantart blocked the UA and the deadline ran out, so code patch to change UA again hasn't been applied since it's too late?
11:02:51<imer>nevermind lol arkiver just patched it
11:04:37<qwertyasdfuiopghjkl>According to https://www.deviantart.com/team/journal/Convert-your-group-to-a-new-design-994388001, "all Groups will be migrated by [2024-04-08]"
11:05:03<imer>ah, topic said 25th
11:05:38<imer>we should head over to #devianttart if there's more to talk about :)
11:15:45kiryu quits [Remote host closed the connection]
11:17:29kiryu joins
11:17:29kiryu quits [Changing host]
11:17:29kiryu (kiryu) joins
11:30:18vukky quits [Quit: @ERROR: max connections (-1) reached -- try again later]
11:37:43Wohlstand (Wohlstand) joins
11:54:12kiryu quits [Client Quit]
11:59:42kiryu joins
11:59:42kiryu quits [Changing host]
11:59:42kiryu (kiryu) joins
12:06:20vukky (vukky) joins
12:08:14Wohlstand quits [Client Quit]
12:48:22GNU_world quits [Ping timeout: 255 seconds]
12:57:13Letur quits [Quit: Client Quit]
12:59:25Arcorann quits [Ping timeout: 272 seconds]
13:00:03Letur joins
13:05:12sec^nd quits [Ping timeout: 255 seconds]
13:10:21sec^nd (second) joins
13:40:14GNU_world joins
14:02:37zhongfu quits [Ping timeout: 255 seconds]
14:22:45zhongfu (zhongfu) joins
14:28:43line quits [Ping timeout: 272 seconds]
14:30:22line joins
15:05:52JaffaCakes118 quits [Remote host closed the connection]
15:33:53wickerz quits [Quit: The Lounge - https://thelounge.chat]
15:34:15wickerz joins
16:10:41tzt quits [Ping timeout: 272 seconds]
16:11:30tzt (tzt) joins
16:56:17abirkill- (abirkill) joins
16:58:07abirkill quits [Ping timeout: 255 seconds]
16:58:07abirkill- is now known as abirkill
17:01:03<pokechu22>from #archivebot: 14:53 <youbanana> Have you guys gotten google podcasts yet? It'll be shutting down in 3 days and I couldn't find a full scrape of it in the viewer.
17:06:48<pokechu22>do we have anything going on for that?
17:07:29<pokechu22>deathwatch says date is unknown
17:40:07<pokechu22>https://podcasts.google.com/ says the date is April 2 though
17:40:58<h2ibot>Pokechu22 edited Deathwatch (+53, /* 2024 */ April 2 for Google Podcasts): https://wiki.archiveteam.org/?diff=51991&oldid=51956
17:43:25<c3manu>!ig 4bu2dpgytytcjp7bnhjfbudc2 ^https?://www\.tametick\.com/
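A small sketch of what the ignore pattern in the command above matches, using Python's `re` module. It assumes the pattern is applied as a regular-expression search against each candidate URL; the helper name `is_ignored` and the test URLs are hypothetical.

```python
import re

# The pattern from the !ig command above, copied verbatim.
IGNORE = re.compile(r"^https?://www\.tametick\.com/")


def is_ignored(url: str) -> bool:
    """True if the URL would be skipped under the pattern above."""
    return IGNORE.search(url) is not None


for url in (
    "http://www.tametick.com/games/",
    "https://www.tametick.com/",
    "https://tametick.com/",
    "https://example.com/?next=https://www.tametick.com/",
):
    print(url, is_ignored(url))
```

The leading `^` anchors the match to the start of the URL, so links that merely contain the hostname in a query string are not caught, and the escaped dots keep `.` from matching arbitrary characters.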
18:21:28Doranwen (Doranwen) joins
18:24:08Unholy23613166180851599738 (Unholy2361) joins
18:24:25Unholy23613166180851599738 quits [Client Quit]
18:24:46Unholy23613166180851599738 (Unholy2361) joins
18:25:35Dango360 quits [Ping timeout: 272 seconds]
18:27:26icedice (icedice) joins
18:33:49Dango360 (Dango360) joins
18:37:47Dango360_ joins
18:41:37Dango360 quits [Ping timeout: 255 seconds]
19:10:31icedice quits [Client Quit]
19:26:18icedice (icedice) joins
19:44:21VickoSaviour quits [Client Quit]
19:52:57nertzy joins
20:12:41<fireonlive>-+rss- A. K. Dewdney has died: https://lfpress.remembering.ca/obituary/alexander-dewdney-1089463499 https://news.ycombinator.com/item?id=39886272
20:13:33<fireonlive>Dan Lynch Has Died (SRI, Arpanet, Internet) https://www.internethalloffame.org/2021/04/19/dan-lynchs-love-brilliant-complexity-fuels-early-internet-development-growth/ https://news.ycombinator.com/item?id=39887275
20:14:22<h2ibot>That lurker edited List of websites excluded from the Wayback Machine (+22, Added ntcore.com): https://wiki.archiveteam.org/?diff=51992&oldid=51990
20:16:22grid joins
20:20:09Dango360_ quits [Client Quit]
20:20:29Dango360 (Dango360) joins
20:28:00jacksonchen666 quits [Ping timeout: 255 seconds]
20:39:44that_lurker scrolls up and wonders why lounge got minus karma and then found out why
20:39:49<that_lurker>https://lounge.kuhaon.fun/folder/36e26e279eeb5fd9/fuuuck-nicolas-cage.gif
20:44:08tt joins
20:44:47jacksonchen666 (jacksonchen666) joins
20:45:08tt quits [Client Quit]
20:52:18eightthree quits [Remote host closed the connection]
20:54:07eightthree joins
20:56:20eightthree quits [Remote host closed the connection]
21:00:28<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51993&oldid=51992
21:01:24eightthree joins
21:09:36jacksonchen666 quits [Remote host closed the connection]
21:10:08jacksonchen666 (jacksonchen666) joins
21:15:36eightthree quits [Remote host closed the connection]
21:16:49eightthree joins
21:21:03JaffaCakes118 (JaffaCakes118) joins
21:27:14pixel leaves [Error from remote client]
21:27:15pixel (pixel) joins
21:28:20eightthree quits [Remote host closed the connection]
21:29:26eightthree joins
21:33:41<nicolas17>fireonlive: SRI is where Siri came from
21:38:48eightthree quits [Remote host closed the connection]
21:50:57eightthree joins
21:51:48eightthree quits [Remote host closed the connection]
21:52:36eightthree joins
22:03:38eightthree quits [Remote host closed the connection]
22:04:49eightthree joins
22:09:33eightthree quits [Remote host closed the connection]
22:11:58eightthree joins
22:13:12BlueMaxima joins
22:21:25<fireonlive>:o
22:21:35<fireonlive>no wonder it's shitty, apple didn't make it
22:24:22teacold66 joins
22:24:29<teacold66>Could someone archive https://forum.kaspersky.com/ with archivebot please (not much coverage after mid 2023)
22:28:47midou quits [Ping timeout: 272 seconds]
22:30:54teacold66 quits [Client Quit]
22:36:15grid quits [Client Quit]
22:38:44nertzy quits [Remote host closed the connection]
22:38:49midou joins
22:40:58<@JAA>thuban: https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/extract-urls-for-archiveteam-projects
22:44:13<@JAA>Oof re Google Podcasts
23:05:35eightthree quits [Remote host closed the connection]
23:06:56eightthree joins
23:09:27Bleo182600 quits [Client Quit]
23:09:46Bleo182600 joins
23:33:45Arcorann (Arcorann) joins