00:02:00hackbug quits [Client Quit]
00:03:55hackbug (hackbug) joins
00:38:08whoami quits [Client Quit]
00:39:29whoami (whoami) joins
00:49:20programmerq quits [Ping timeout: 252 seconds]
00:53:29programmerq (programmerq) joins
02:17:56second (second) joins
02:18:21sec^nd quits [Ping timeout: 245 seconds]
02:18:22second is now known as sec^nd
02:44:38<pabs>I heard that someone inside Google has been trying to get rid of feedburner for years.
02:44:41<pabs>asked them to do a proper transition and also contact us before it goes away
03:14:32_19100 leaves
03:56:53Island quits [Read error: Connection reset by peer]
03:58:18user__ quits [Remote host closed the connection]
03:58:31user__ joins
04:05:35BlueMaxima quits [Read error: Connection reset by peer]
04:40:15TastyWiener95 (TastyWiener95) joins
04:42:54jamesatjaminit quits [Ping timeout: 252 seconds]
04:43:52geezabiscuit quits [Read error: Connection reset by peer]
04:44:19jamesatjaminit (jamesatjaminit) joins
04:44:40geezabiscuit (geezabiscuit) joins
05:02:12sonick (sonick) joins
05:03:59hackbug quits [Ping timeout: 265 seconds]
05:14:45<h2ibot>JustAnotherArchivist edited Zippyshare.com (+172, Update infobox): https://wiki.archiveteam.org/?diff=49604&oldid=49575
06:19:08Arcorann (Arcorann) joins
06:36:44user__ quits [Remote host closed the connection]
06:40:32umgr036 joins
06:41:20umgr036 quits [Remote host closed the connection]
06:41:33umgr036 joins
07:00:24DiscantX quits [Ping timeout: 252 seconds]
08:10:47qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
08:19:42benjins quits [Read error: Connection reset by peer]
08:19:54michaelblob quits [Read error: Connection reset by peer]
08:20:19michaelblob (michaelblob) joins
08:20:24benjins joins
08:21:02Jake quits [Client Quit]
08:21:46Jake (Jake) joins
08:21:59Lord_Nightmare quits [Quit: ZNC - http://znc.in]
08:22:20Lord_Nightmare (Lord_Nightmare) joins
09:43:29benjins quits [Remote host closed the connection]
09:43:38benjins joins
10:02:18<@OrIdow6^2>Shutterfly share sites is dead
10:03:12<@OrIdow6^2>I did not figure out in time how to generate image URLs
10:04:36TastyWiener95 quits [Client Quit]
10:05:13hitgrr8 joins
10:09:27<@OrIdow6^2>Appears that the DNS trick works to some extent
10:10:57<@OrIdow6^2>Well, I should go asleep now, but if it's still "up" in the morning I guess I'll just take a best guess at that image URL generation
10:12:56<@OrIdow6^2>Which probably isn't that hard, but I was trying to be a perfectionist :|
11:08:56dan_a quits [Client Quit]
11:12:51dan_a (dan_a) joins
11:24:12eroc1990 quits [Client Quit]
11:24:50eroc1990 (eroc1990) joins
11:30:45benjinsm joins
11:31:44benjins quits [Remote host closed the connection]
11:31:44qwertyasdfuiopghjkl quits [Client Quit]
11:38:06drin joins
11:38:35hitgrr8 quits [Client Quit]
11:38:35geezabiscuit quits [Client Quit]
11:38:35Terbium quits [Client Quit]
11:38:41Terbium joins
11:38:57drin is now known as geezabiscuit
11:39:43hitgrr8 joins
11:42:15qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
11:57:39hackbug (hackbug) joins
12:21:53<thuban>grab-site works with a pyenv 3.8 venv but not with a system 3.7 venv, because the latter looks for libre2.so.9 and chokes on libre2.so.10. why? idk -_-
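thuban's grab-site failure is a classic soname mismatch: the re2 extension compiled in one environment was linked against libre2.so.9, while the other environment's system library is libre2.so.10. A hedged diagnostic sketch — the `ldd` path shown in the comment is an assumption, not taken from the log:

```python
import re
from typing import Optional

def find_libre2_soname(ldd_output: str) -> Optional[str]:
    """Return the libre2 soname (e.g. 'libre2.so.9') referenced in `ldd` output.

    A compiled module that was linked against libre2.so.9 will fail to load
    on a system that only ships libre2.so.10 -- exactly the failure above.
    """
    m = re.search(r"(libre2\.so\.\d+)", ldd_output)
    return m.group(1) if m else None

# Hypothetical usage: inspect the re2 extension inside the broken venv, e.g.
#   ldd venv/lib/python3.7/site-packages/_re2*.so
# and feed the output to find_libre2_soname() to see which soname it wants.
```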
12:34:46Sluggs quits [Excess Flood]
12:36:25Sluggs joins
13:03:46Arcorann quits [Ping timeout: 252 seconds]
14:05:17_19100 (themillenniumbug) joins
14:05:55zhongfu quits [Ping timeout: 252 seconds]
14:12:59zhongfu (zhongfu) joins
14:22:49umgr036 quits [Remote host closed the connection]
14:41:54benjinsm is now known as benjins
14:54:18umgr036 joins
14:55:06umgr036 quits [Remote host closed the connection]
14:55:19umgr036 joins
14:59:08VerifiedJ quits [Quit: The Lounge - https://thelounge.chat]
14:59:33VerifiedJ (VerifiedJ) joins
15:00:33AnotherIki joins
15:04:24Iki1 quits [Ping timeout: 252 seconds]
15:27:28spirit quits [Client Quit]
15:36:22Island joins
15:45:01DopefishJustin quits [Remote host closed the connection]
15:45:53umgr036 quits [Remote host closed the connection]
15:49:35jacksonchen666 (jacksonchen666) joins
15:52:42DopefishJustin joins
15:56:20Sophira joins
16:01:47<Sophira>Hi there. I'm the owner of a site that's been running for the last 10-11 years or so dedicated to the TV Tropes ARG "The Wall Will Fall" ( http://twwf.info/ and its linked subdomains ). I intended to shut it down in December but haven't been able to bring myself to do so yet. The domain expires in a week, though, and I would prefer not to renew it if possible. Not all of the forum is archived on
16:01:53<Sophira>web.archive.org as for much of its life the forums had restrictive robots.txt files. I removed them a while back but there's a lot that still isn't in the archive. Is it possible to request that the sites be archived?
16:02:05<Sophira>As the owner I'm willing to help in any way I can for this to happen.
16:02:22<Sophira>(I was one of the original puppetmasters on the ARG.)
16:02:48<Sophira>Actually no my mistake, I think the forums had some kind of bot detection IIRC.
16:02:48Nulo quits [Read error: Connection reset by peer]
16:02:55Nulo joins
16:23:55<thuban>Sophira: yes, certainly! do you have a sitemap you can provide?
16:39:17<Sophira>thuban: I don't. I should be able to create one, I think, though it might take a while. Would a list of URLs suffice?
16:39:39<thuban>a list of urls would be perfect
16:39:48<Sophira>Also bear in mind that this will cover several different hostnames, though they're all under the umbrella domain of twwf.info.
16:41:11<thuban>that should be fine
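Since a plain list of URLs is all that's needed, one way to build it is a small same-host link walk. A minimal stdlib-only sketch of the extraction step (the fetching loop and any twwf.info specifics are left to the operator; the hostnames in the test are illustrative):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect absolute hrefs from an HTML page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.add(urljoin(self.base_url, value))

def same_host_links(html, base_url):
    """Return links found in `html` that stay on `base_url`'s host."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    return sorted(u for u in parser.links if urlparse(u).netloc == host)
```

Running this over each page, queueing the unseen results, and printing the visited set gives exactly the kind of URL list requested above.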
16:41:32<Sophira>Okay. I'll do what I can, then! It might take a while though, as I say. Is there any kind of special processing you would normally do for phpBB forums and Wordpress sites?
16:42:06<pokechu22>wordpress and phpbb can both be done with archivebot without much issue
16:42:59<pokechu22>(in that the annoying stuff has mostly already been solved with some standard ignoresets)
16:44:05<Sophira>Awesome. One thing to bear in mind is that many of the sites will link to each other in forum posts and blog comments, so those 'external' links will need to be rewritten accordingly.
16:45:40<pokechu22>Yeah, archivebot doesn't do that super well - it only recurses within a single domain and saves individual outlinks. If each of the wordpress/phpBB forums has a front page where everything can be accessed that won't be as much of a problem though
16:46:45<pokechu22>there isn't any super good way to rewrite them with archivebot as-is :/
16:47:48<thuban>i think there's a miscommunication here--archivebot (like archiveteam) does not rewrite anything
16:48:31<Sophira>Oh, even links within the same site?
16:50:26<thuban>archivebot _follows_ links, but it won't alter anything. if an old post on foo.twwf.info links to bar.com (which is now copied at bar.twwf.info), it will be saved exactly as it is, including the link to bar.com
16:51:56<pokechu22>http://twwf.info/ says that links like that are already rewritten though so that might not be a problem
16:52:51<Sophira>Yeah, I use mod_filter on the server in order to do domain substitution like that.
16:53:46<Sophira>But yes, all links to sites that have been archived to a subdomain beneath twwf.info are rewritten automatically.
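Sophira's actual server configuration isn't shown in the log; a hypothetical Apache sketch of this kind of on-the-fly domain substitution, using mod_substitute dispatched through the output filter chain (the example domain on the left-hand side is made up):

```apache
# Rewrite archived off-site domains to their twwf.info mirrors in HTML output.
# Requires mod_filter and mod_substitute to be loaded.
<Location "/">
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s|http://juliet-blog.example.com|http://juliet.ezblog.twwf.info|ni"
</Location>
```

Note that this rewrites the *served* pages, which is why the links arrive at ArchiveBot already pointing at the twwf.info subdomains.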
16:54:10<thuban>what did you then mean by "those 'external' links will need to be rewritten"?
16:54:52VerifiedJ quits [Client Quit]
16:55:32VerifiedJ (VerifiedJ) joins
16:56:27umgr036 joins
16:56:44<Sophira>I mean 'external' in that, for example, some users making comments on Romeo's blog site, romeo.ezblog.twwf.info, have links to Juliet's blog site, which are rewritten to juliet.ezblog.twwf.info automatically. From your point of view, juliet.ezblog.twwf.info will be a different site from romeo.ezblog.twwf.info, right?
16:57:50<Sophira>That's what I mean by 'external', and that's why I put the word in quotes - because they're still under twwf.info, but from the point of view that they use two different hostnames, they could be considered two different sites.
16:58:25<thuban>ah. so by "rewrite" you only mean 'consider as part of the same site'.
16:58:29<Sophira>The main page at http://twwf.info/ (sorry, no HTTPS) links to all the various sites and they should all be accessible.
16:58:38<pokechu22>Yes, but that won't cause issues with doing two separate jobs that recurse over all of http://romeo.ezblog.twwf.info/ and http://juliet.ezblog.twwf.info/ (the pages that are linked between them would get saved twice, but that's probably fine)
16:58:59<pokechu22>It'd be an issue for http://xovr.twwf.info/ though and any deep links that aren't reachable from the front page
16:59:40<thuban>archivebot's subdomain handling is complicated™, but a complete sitemap will render it moot
17:00:56<thuban>(or, failing that (eg for the forums), a complete list of subdomains)
17:01:42<pokechu22>My thought is that doing an !a on each of the forums and blogs would get good enough coverage of those; wordpress and phpbb are usually fine for discovering pages even without a sitemap (though wordpress generally generates a sitemap anyways; seems like there isn't one in this case (too old?))
17:02:03Island_ joins
17:04:12<pokechu22>I might as well just try it and see how it goes... Sophira, any parameters on rate-limiting? Archivebot's default is 3 concurrent requests, with a 250-375 millisecond wait after each request
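ArchiveBot's pacing as pokechu22 describes it — a fixed number of concurrent requests, each followed by a short randomized wait — can be sketched with an asyncio semaphore. This is an illustration of the policy, not ArchiveBot's actual code; `fetch` is a caller-supplied coroutine:

```python
import asyncio
import random

async def fetch_politely(urls, fetch, concurrency=3):
    """Fetch `urls` with at most `concurrency` in flight, sleeping
    250-375 ms after each request (ArchiveBot's stated default)."""
    sem = asyncio.Semaphore(concurrency)

    async def one(url):
        async with sem:
            result = await fetch(url)
            # Randomized politeness delay before releasing the slot.
            await asyncio.sleep(random.uniform(0.250, 0.375))
            return result

    # gather() preserves input order in its results.
    return await asyncio.gather(*(one(u) for u in urls))
```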
17:04:54hitgrr8_ joins
17:05:03VerifiedJ8 (VerifiedJ) joins
17:05:32Sophira_ joins
17:05:34<Sophira_>Okay. An example of a page on xovr.twwf.info, btw, would be http://xovr.twwf.info/i_xukb3tnd.php . Entering the password "Gurt" (case-sensitive) would then show an image. I assume in these cases I should give both the pages themselves and the image URLs.
17:05:53VerifiedJ quits [Client Quit]
17:05:53hitgrr8 quits [Client Quit]
17:05:53Sluggs quits [Client Quit]
17:05:53qwertyasdfuiopghjkl quits [Client Quit]
17:05:53AnotherIki quits [Remote host closed the connection]
17:05:53Island quits [Remote host closed the connection]
17:05:53Sophira quits [Remote host closed the connection]
17:05:53VerifiedJ8 is now known as VerifiedJ
17:06:00Sophira_ is now known as Sophira
17:06:41<Sophira>(also, the last thing I saw was thuban saying subdomain handling is complicated™.)
17:07:32<thuban>Sophira: https://hackint.logs.kiska.pw/archiveteam-bs/20230328#c340128
17:07:58qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
17:08:27Sluggs joins
17:10:13<Sophira>Ah, thank you! Odd that my message sent but I didn't see anybody else's. Oh well. As for rate-limiting, I imagine that'll be fine. The sites themselves aren't really used any more so there won't really be any disturbances.
17:12:11<Sophira>Subdomain-wise, I *think* all the subdomains are on twwf.info's front page. Let me just double-check.
17:17:15<Sophira>Yeah, they're all listed, I believe. Also just to note, the only sites that are Wordpress/phpBB-based are watchthefootage.twwf.info, forum.watchthefootage.twwf.info, and all the *.ezblog.twwf.info subdomains.
17:19:37<Sophira>Actually, that said, I would also like to archive the phpBB forum at forum.twwf.info. It's not listed in the main table because it only became a thing after the ARG itself, but it has a lot on it.
17:19:52<Sophira>(not so active any more though, heh)
17:21:44<pokechu22>Yeah, I can do that too. Last active Dec 31, 2022 is fairly good as far as inactive forums go :P
17:22:56<pokechu22>I've started on the ezblog ones
17:23:11<Sophira>Awesome, thank you <3
17:24:52<Sophira>Heee. I like the "and not" in the User-Agent string.
17:28:49<Sophira>So does this mean that, with regard to the sitemap, I don't need to bother with grabbing all the post URLs and such from the databases?
17:29:00<Sophira>Or should I do that anyway?
17:29:21<pokechu22>For the wordpress ones? It's probably not necessary
17:30:21<pokechu22>It might be useful after everything's been saved to verify that it's actually complete, though (but that would have to be in a few days)
17:35:48<Sophira>That makes sense. Okay.
17:52:55<pokechu22>Based on http://watchthefootage.twwf.info/ there's also several twitter accounts linked with it - I can save those via socialbot. Is there a more complete list than the ones in the sidebar?
17:54:38IDK (IDK) joins
17:58:29<Sophira>One moment...
18:00:50<kiska>I hate npm... I broke etherpad :(
18:03:06<Sophira>I can't think of any other Twitter accounts to archive. I think it's complete.
18:07:14jamesatjaminit quits [Client Quit]
18:07:24_19100 quits [Client Quit]
18:11:13_19100 (themillenniumbug) joins
18:32:15sadsa joins
18:32:22sadsa quits [Remote host closed the connection]
18:46:58automato83 quits [Ping timeout: 252 seconds]
18:53:03<kiska>Fuck that was annoying... pad.notkiska.pw is back online
19:07:51qwertyasdfuiopghjkl quits [Remote host closed the connection]
19:08:27qwertyasdfuiopghjkl joins
19:12:26jamesatjaminit (jamesatjaminit) joins
19:13:46jamesatjaminit quits [Remote host closed the connection]
19:20:30automato83 joins
19:21:37Barto quits [Ping timeout: 252 seconds]
19:24:03jamesatjaminit (jamesatjaminit) joins
19:28:39TastyWiener95 (TastyWiener95) joins
19:46:29AnotherIki joins
19:50:33Pingerfowder quits [Quit: ZNC - https://znc.in]
19:50:42Pingerfowder (Pingerfowder) joins
19:53:42@rewby quits [Ping timeout: 252 seconds]
20:01:37jacksonchen666 quits [Client Quit]
20:08:52dan_a quits [Client Quit]
20:12:00user__ joins
20:14:58umgr036 quits [Ping timeout: 252 seconds]
20:23:48Barto (Barto) joins
20:26:41<qwertyasdfuiopghjkl>Sophira: From searching for links to twitter.com on https://tvtropes.org/pmwiki/pmwiki.php/Recap/TheWallWillFall and the other tvtropes wiki pages linked from that, I found https://twitter.com/DeadCatInABox , https://twitter.com/GurtTheLimeMan and https://twitter.com/RADIOVOIDREBEL that look related but weren't listed on the sidebar
20:32:40dan_a (dan_a) joins
20:40:24dan_a quits [Client Quit]
20:47:47dan_a (dan_a) joins
21:06:43rewby (rewby) joins
21:06:43@ChanServ sets mode: +o rewby
21:11:26hitgrr8_ quits [Client Quit]
21:16:06user__ quits [Read error: Connection reset by peer]
21:16:29user__ joins
22:28:19<@OrIdow6^2>Shutterfly share sites is indeed usable with the DNS trick
22:32:24BlueMaxima joins
23:35:44<driib>Hi, I've been running a warrior on the telegrab project for some short while and got interested to check what kind of data ends up published on IA. I tried to download and inspect one package from https://archive.org/details/archiveteam_telegram but ran into some issues. 1) I cannot unzstd the megawarc file due to a "Decoding error (36) :
23:35:44<driib>Dictionary mismatch"; the internet says it's due to an external dictionary use but I can't seem to find one on https://archive.org/download/archiveteam_telegram_20230327203637_2cf0eb8f, for example. 2) https://github.com/internetarchive/warctools does not seem to include tools to deal with megawarcs or zstd, what CLI tools do you recommend if I
23:35:44<driib>want to look into the payload of a single item? Thank you all for your patience with my noob questions! Hope I put em in the right channel too.
23:37:12<pokechu22>Pretty sure this is the right channel but I don't have an answer beyond that
23:37:15<@OrIdow6^2>The dict is in a skippable frame at the beginning of the zstd
23:37:47<@OrIdow6^2>I wrote an awful tool to extract them a while back, I believe someone else wrote a better one, but if no one comes around with that in a bit I can give you the old one
23:39:31<@OrIdow6^2>(It's in the skippable frame, and furthermore it itself is compressed with vanilla zstd)
23:48:54user__ quits [Remote host closed the connection]
23:49:52umgr036 joins
23:55:12<@OrIdow6^2>driib: Alright, actually I've made a little new one without the dependency issue https://transfer.archivete.am/sW9PL/get_zstd_dict_simple.py
23:55:33<@OrIdow6^2>This takes the name of the .warc.zst as its argument and puts the compressed dict to stdout