| 00:02:00 | | hackbug quits [Client Quit] |
| 00:03:55 | | hackbug (hackbug) joins |
| 00:38:08 | | whoami quits [Client Quit] |
| 00:39:29 | | whoami (whoami) joins |
| 00:49:20 | | programmerq quits [Ping timeout: 252 seconds] |
| 00:53:29 | | programmerq (programmerq) joins |
| 02:17:56 | | second (second) joins |
| 02:18:21 | | sec^nd quits [Ping timeout: 245 seconds] |
| 02:18:22 | | second is now known as sec^nd |
| 02:44:38 | <pabs> | I heard that someone inside Google has been trying to get rid of feedburner for years. |
| 02:44:41 | <pabs> | asked them to do a proper transition and also contact us before it goes away |
| 03:14:32 | | _19100 leaves |
| 03:56:53 | | Island quits [Read error: Connection reset by peer] |
| 03:58:18 | | user__ quits [Remote host closed the connection] |
| 03:58:31 | | user__ joins |
| 04:05:35 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 04:40:15 | | TastyWiener95 (TastyWiener95) joins |
| 04:42:54 | | jamesatjaminit quits [Ping timeout: 252 seconds] |
| 04:43:52 | | geezabiscuit quits [Read error: Connection reset by peer] |
| 04:44:19 | | jamesatjaminit (jamesatjaminit) joins |
| 04:44:40 | | geezabiscuit (geezabiscuit) joins |
| 05:02:12 | | sonick (sonick) joins |
| 05:03:59 | | hackbug quits [Ping timeout: 265 seconds] |
| 05:14:45 | <h2ibot> | JustAnotherArchivist edited Zippyshare.com (+172, Update infobox): https://wiki.archiveteam.org/?diff=49604&oldid=49575 |
| 06:19:08 | | Arcorann (Arcorann) joins |
| 06:36:44 | | user__ quits [Remote host closed the connection] |
| 06:40:32 | | umgr036 joins |
| 06:41:20 | | umgr036 quits [Remote host closed the connection] |
| 06:41:33 | | umgr036 joins |
| 07:00:24 | | DiscantX quits [Ping timeout: 252 seconds] |
| 08:10:47 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 08:19:42 | | benjins quits [Read error: Connection reset by peer] |
| 08:19:54 | | michaelblob quits [Read error: Connection reset by peer] |
| 08:20:19 | | michaelblob (michaelblob) joins |
| 08:20:24 | | benjins joins |
| 08:21:02 | | Jake quits [Client Quit] |
| 08:21:46 | | Jake (Jake) joins |
| 08:21:59 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 08:22:20 | | Lord_Nightmare (Lord_Nightmare) joins |
| 09:43:29 | | benjins quits [Remote host closed the connection] |
| 09:43:38 | | benjins joins |
| 10:02:18 | <@OrIdow6^2> | Shutterfly share sites is dead |
| 10:03:12 | <@OrIdow6^2> | I did not figure out in time how to generate image URLs |
| 10:04:36 | | TastyWiener95 quits [Client Quit] |
| 10:05:13 | | hitgrr8 joins |
| 10:09:27 | <@OrIdow6^2> | Appears that the DNS trick works to some extent |
| 10:10:57 | <@OrIdow6^2> | Well, I should go to sleep now, but if it's still "up" in the morning I guess I'll just take a best guess at that image URL generation |
| 10:12:56 | <@OrIdow6^2> | Which probably isn't that hard, but I was trying to be a perfectionist :| |
| 11:08:56 | | dan_a quits [Client Quit] |
| 11:12:51 | | dan_a (dan_a) joins |
| 11:24:12 | | eroc1990 quits [Client Quit] |
| 11:24:50 | | eroc1990 (eroc1990) joins |
| 11:30:45 | | benjinsm joins |
| 11:31:44 | | benjins quits [Remote host closed the connection] |
| 11:31:44 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 11:38:06 | | drin joins |
| 11:38:35 | | hitgrr8 quits [Client Quit] |
| 11:38:35 | | geezabiscuit quits [Client Quit] |
| 11:38:35 | | Terbium quits [Client Quit] |
| 11:38:41 | | Terbium joins |
| 11:38:57 | | drin is now known as geezabiscuit |
| 11:39:43 | | hitgrr8 joins |
| 11:42:15 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 11:57:39 | | hackbug (hackbug) joins |
| 12:21:53 | <thuban> | grab-site works with a pyenv 3.8 venv but not with a system 3.7 venv, because the latter looks for libre2.so.9 and chokes on libre2.so.10. why? idk -_- |
| 12:34:46 | | Sluggs quits [Excess Flood] |
| 12:36:25 | | Sluggs joins |
| 13:03:46 | | Arcorann quits [Ping timeout: 252 seconds] |
| 14:05:17 | | _19100 (themillenniumbug) joins |
| 14:05:55 | | zhongfu quits [Ping timeout: 252 seconds] |
| 14:12:59 | | zhongfu (zhongfu) joins |
| 14:22:49 | | umgr036 quits [Remote host closed the connection] |
| 14:41:54 | | benjinsm is now known as benjins |
| 14:42:01 | | benjins is now authenticated as benjins |
| 14:54:18 | | umgr036 joins |
| 14:55:06 | | umgr036 quits [Remote host closed the connection] |
| 14:55:19 | | umgr036 joins |
| 14:59:08 | | VerifiedJ quits [Quit: The Lounge - https://thelounge.chat] |
| 14:59:33 | | VerifiedJ (VerifiedJ) joins |
| 15:00:33 | | AnotherIki joins |
| 15:04:24 | | Iki1 quits [Ping timeout: 252 seconds] |
| 15:27:28 | | spirit quits [Client Quit] |
| 15:36:22 | | Island joins |
| 15:45:01 | | DopefishJustin quits [Remote host closed the connection] |
| 15:45:53 | | umgr036 quits [Remote host closed the connection] |
| 15:49:35 | | jacksonchen666 (jacksonchen666) joins |
| 15:52:42 | | DopefishJustin joins |
| 15:52:42 | | DopefishJustin is now authenticated as DopefishJustin |
| 15:56:20 | | Sophira joins |
| 16:01:47 | <Sophira> | Hi there. I'm the owner of a site that's been running for the last 10-11 years or so dedicated to the TV Tropes ARG "The Wall Will Fall" ( http://twwf.info/ and its linked subdomains ). I intended to shut it down in December but haven't been able to bring myself to do so yet. The domain expires in a week, though, and I would prefer not to renew it if possible. Not all of the forum is archived on |
| 16:01:53 | <Sophira> | web.archive.org as for much of its life the forums had restrictive robots.txt files. I removed them a while back but there's a lot that still isn't in the archive. Is it possible to request that the sites be archived? |
| 16:02:05 | <Sophira> | As the owner I'm willing to help in any way I can for this to happen. |
| 16:02:22 | <Sophira> | (I was one of the original puppetmasters on the ARG.) |
| 16:02:48 | <Sophira> | Actually no my mistake, I think the forums had some kind of bot detection IIRC. |
| 16:02:48 | | Nulo quits [Read error: Connection reset by peer] |
| 16:02:55 | | Nulo joins |
| 16:23:55 | <thuban> | Sophira: yes, certainly! do you have a sitemap you can provide? |
| 16:39:17 | <Sophira> | thuban: I don't. I should be able to create one, I think, though it might take a while. Would a list of URLs suffice? |
| 16:39:39 | <thuban> | a list of urls would be perfect |
| 16:39:48 | <Sophira> | Also bear in mind that this will cover several different hostnames, though they're all under the umbrella domain of twwf.info. |
| 16:41:11 | <thuban> | that should be fine |
| 16:41:32 | <Sophira> | Okay. I'll do what I can, then! It might take a while though, as I say. Is there any kind of special processing you would normally do for phpBB forums and Wordpress sites? |
| 16:42:06 | <pokechu22> | wordpress and phpbb can both be done with archivebot without much issue |
| 16:42:59 | <pokechu22> | (in that the annoying stuff has mostly already been solved with some standard ignoresets) |
| 16:44:05 | <Sophira> | Awesome. One thing to bear in mind is that many of the sites will link to each other in forum posts and blog comments, so those 'external' links will need to be rewritten accordingly. |
| 16:45:40 | <pokechu22> | Yeah, archivebot doesn't do that super well - it only recurses within a single domain and saves individual outlinks. If each of the wordpress/phpBB forums has a front page where everything can be accessed that won't be as much of a problem though |
| 16:46:45 | <pokechu22> | there isn't any super good way to rewrite them with archivebot as-is :/ |
| 16:47:48 | <thuban> | i think there's a miscommunication here--archivebot (like archiveteam) does not rewrite anything |
| 16:48:31 | <Sophira> | Oh, even links within the same site? |
| 16:50:26 | <thuban> | archivebot _follows_ links, but it won't alter anything. if an old post on foo.twwf.info links to bar.com (which is now copied at bar.twwf.info), it will be saved exactly as it is, including the link to bar.com |
| 16:51:56 | <pokechu22> | http://twwf.info/ says that links like that are already rewritten though so that might not be a problem |
| 16:52:51 | <Sophira> | Yeah, I use mod_filter on the server in order to do domain substitution like thata. |
| 16:52:55 | <Sophira> | ^that/. |
| 16:53:02 | <Sophira> | ...pretend I typed that correctly. |
| 16:53:46 | <Sophira> | But yes, all links to sites that have been archived to a subdomain beneath twwf.info are rewritten automatically. |
| 16:54:10 | <thuban> | what did you then mean by "those 'external' links will need to be rewritten"? |
| 16:54:52 | | VerifiedJ quits [Client Quit] |
| 16:55:32 | | VerifiedJ (VerifiedJ) joins |
| 16:56:27 | | umgr036 joins |
| 16:56:44 | <Sophira> | I mean 'external' in that, for example, some users making comments on Romeo's blog site, romeo.ezblog.twwf.info, have links to Juliet's blog site, which are rewritten to juliet.ezblog.twwf.info automatically. From your point of view, juliet.ezblog.twwf.info will be a different site from romeo.ezblog.twwf.info, right? |
| 16:57:50 | <Sophira> | That's what I mean by 'external', and that's why I put the word in quotes - because they're still under twwf.info, but from the point of view that they use two different hostnames, they could be considered two different sites. |
| 16:58:05 | <Sophira> | twwf.info |
| 16:58:11 | <Sophira> | Er. |
| 16:58:25 | <thuban> | ah. so by "rewrite" you only mean 'consider as part of the same site'. |
| 16:58:29 | <Sophira> | The main page at http://twwf.info/ (sorry, no HTTPS) links to all the various sites and they should all be accessible. |
| 16:58:38 | <pokechu22> | Yes, but that won't cause issues with doing two separate jobs that recurse over all of http://romeo.ezblog.twwf.info/ and http://juliet.ezblog.twwf.info/ (the pages that are linked between them would get saved twice, but that's probably fine) |
| 16:58:59 | <pokechu22> | It'd be an issue for http://xovr.twwf.info/ though and any deep links that aren't reachable from the front page |
| 16:59:40 | <thuban> | archivebot's subdomain handling is complicated™, but a complete sitemap will render it moot |
| 17:00:56 | <thuban> | (or, failing that (eg for the forums), a complete list of subdomains) |
| 17:01:42 | <pokechu22> | My thought is that doing an !a on each of the forums and blogs would get good enough coverage of those; wordpress and phpbb are usually fine for discovering pages even without a sitemap (though wordpress generally generates a sitemap anyways; seems like there isn't one in this case (too old?)) |
| 17:02:03 | | Island_ joins |
| 17:04:12 | <pokechu22> | I might as well just try it and see how it goes... Sophira, any parameters on rate-limiting? Archivebot's default is 3 concurrent requests, with a 250-375 millisecond wait after each request |
| 17:04:54 | | hitgrr8_ joins |
| 17:05:03 | | VerifiedJ8 (VerifiedJ) joins |
| 17:05:32 | | Sophira_ joins |
| 17:05:34 | <Sophira_> | Okay. An example of a page on xovr.twwf.info, btw, would be http://xovr.twwf.info/i_xukb3tnd.php . Entering the password "Gurt" (case-sensitive) would then show an image. I assume in these cases I should give both the pages themselves and the image URLs. |
| 17:05:53 | | VerifiedJ quits [Client Quit] |
| 17:05:53 | | hitgrr8 quits [Client Quit] |
| 17:05:53 | | Sluggs quits [Client Quit] |
| 17:05:53 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 17:05:53 | | AnotherIki quits [Remote host closed the connection] |
| 17:05:53 | | Island quits [Remote host closed the connection] |
| 17:05:53 | | Sophira quits [Remote host closed the connection] |
| 17:05:53 | | VerifiedJ8 is now known as VerifiedJ |
| 17:06:00 | | Sophira_ is now known as Sophira |
| 17:06:14 | <Sophira> | I'm not sure if my last message sent because of the ping timeout, so: |
| 17:06:18 | <Sophira> | Okay. An example of a page on xovr.twwf.info, btw, would be http://xovr.twwf.info/i_xukb3tnd.php . Entering the password "Gurt" (case-sensitive) would then show an image. I assume in these cases I should give both the pages themselves and the image URLs. |
| 17:06:41 | <Sophira> | (also, the last thing I saw was thuban saying subdomain handling is complicated™.) |
| 17:07:32 | <thuban> | Sophira: https://hackint.logs.kiska.pw/archiveteam-bs/20230328#c340128 |
| 17:07:58 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 17:08:27 | | Sluggs joins |
| 17:10:13 | <Sophira> | Ah, thank you! Odd that my message sent but I didn't see anybody else's. Oh well. As for rate-limiting, I imagine that'll be fine. The sites themselves aren't really used any more so there won't really be any disturbances. |
| 17:12:11 | <Sophira> | Subdomain-wise, I *think* all the subdomains are on twwf.info's front page. Let me just double-check. |
| 17:17:15 | <Sophira> | Yeah, they're all listed, I believe. Also just to note, the only sites that are Wordpress/phpBB-based are watchthefootage.twwf.info, forum.watchthefootage.twwf.info, and all the *.ezblog.twwf.info subdomains. |
| 17:19:37 | <Sophira> | Actually, that said, I would also like to archive the phpBB forum at forum.twwf.info. It's not listed in the main table because it only became a thing after the ARG itself, but it has a lot on it. |
| 17:19:52 | <Sophira> | (not so active any more though, heh) |
| 17:21:44 | <pokechu22> | Yeah, I can do that too. Last active Dec 31, 2022 is fairly good as far as inactive forums go :P |
| 17:22:56 | <pokechu22> | I've started on the ezblog ones |
| 17:23:11 | <Sophira> | Awesome, thank you <3 |
| 17:24:52 | <Sophira> | Heee. I like the "and not" in the User-Agent string. |
| 17:28:49 | <Sophira> | So does this mean that with regard to the site map that I don't need to bother with grabbing all the post URLs and such from the databases? |
| 17:29:00 | <Sophira> | Or should I do that anyway? |
| 17:29:21 | <pokechu22> | For the wordpress ones? It's probably not necessary |
| 17:30:21 | <pokechu22> | It might be useful after everything's been saved to verify that it's actually complete, though (but that would have to be in a few days) |
| 17:35:48 | <Sophira> | That makes sense. Okay. |
| 17:52:55 | <pokechu22> | Based on http://watchthefootage.twwf.info/ there's also several twitter accounts linked with it - I can save those via socialbot. Is there a more complete list than the ones in the sidebar? |
| 17:54:38 | | IDK (IDK) joins |
| 17:58:29 | <Sophira> | One moment... |
| 18:00:50 | <kiska> | I hate npm... I broke etherpad :( |
| 18:03:06 | <Sophira> | I can't think of any other Twitter accounts to archive. I think it's complete. |
| 18:07:14 | | jamesatjaminit quits [Client Quit] |
| 18:07:24 | | _19100 quits [Client Quit] |
| 18:11:13 | | _19100 (themillenniumbug) joins |
| 18:32:15 | | sadsa joins |
| 18:32:22 | | sadsa quits [Remote host closed the connection] |
| 18:46:58 | | automato83 quits [Ping timeout: 252 seconds] |
| 18:53:03 | <kiska> | Fuck that was annoying... pad.notkiska.pw is back online |
| 19:07:51 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 19:08:27 | | qwertyasdfuiopghjkl joins |
| 19:12:26 | | jamesatjaminit (jamesatjaminit) joins |
| 19:13:46 | | jamesatjaminit quits [Remote host closed the connection] |
| 19:20:30 | | automato83 joins |
| 19:21:37 | | Barto quits [Ping timeout: 252 seconds] |
| 19:24:03 | | jamesatjaminit (jamesatjaminit) joins |
| 19:28:39 | | TastyWiener95 (TastyWiener95) joins |
| 19:46:29 | | AnotherIki joins |
| 19:50:33 | | Pingerfowder quits [Quit: ZNC - https://znc.in] |
| 19:50:42 | | Pingerfowder (Pingerfowder) joins |
| 19:53:42 | | @rewby quits [Ping timeout: 252 seconds] |
| 20:01:37 | | jacksonchen666 quits [Client Quit] |
| 20:08:52 | | dan_a quits [Client Quit] |
| 20:12:00 | | user__ joins |
| 20:14:58 | | umgr036 quits [Ping timeout: 252 seconds] |
| 20:23:48 | | Barto (Barto) joins |
| 20:26:41 | <qwertyasdfuiopghjkl> | Sophira: From searching for links to twitter.com on https://tvtropes.org/pmwiki/pmwiki.php/Recap/TheWallWillFall and the other tvtropes wiki pages linked from that, I found https://twitter.com/DeadCatInABox , https://twitter.com/GurtTheLimeMan and https://twitter.com/RADIOVOIDREBEL that look related but weren't listed on the sidebar |
| 20:29:07 | | qwertyasdfuiopghjkl is now authenticated as qwertyasdfuiopghjkl |
| 20:32:40 | | dan_a (dan_a) joins |
| 20:40:24 | | dan_a quits [Client Quit] |
| 20:47:47 | | dan_a (dan_a) joins |
| 21:06:43 | | rewby (rewby) joins |
| 21:06:43 | | @ChanServ sets mode: +o rewby |
| 21:11:26 | | hitgrr8_ quits [Client Quit] |
| 21:16:06 | | user__ quits [Read error: Connection reset by peer] |
| 21:16:29 | | user__ joins |
| 22:28:19 | <@OrIdow6^2> | Shutterfly share sites is indeed usable with the DNS trick |
| 22:32:24 | | BlueMaxima joins |
| 23:35:44 | <driib> | Hi, I've been running a warrior on the telegrab project for some short while and got interested to check what kind of data ends up published on IA. I tried to download and inspect one package from https://archive.org/details/archiveteam_telegram but ran into some issues. 1) I cannot unzstd the megawarc file due to a "Decoding error (36) : Dictionary mismatch"; the internet says it's due to an external dictionary use but I can't seem to find one on https://archive.org/download/archiveteam_telegram_20230327203637_2cf0eb8f, for example. 2) https://github.com/internetarchive/warctools does not seem to include tools to deal with megawarcs or zstd, what CLI tools do you recommend if I want to look into the payload of a single item? Thank you all for your patience with my noob questions! Hope I put em in the right channel too. |
| 23:37:12 | <pokechu22> | Pretty sure this is the right channel but I don't have an answer beyond that |
| 23:37:15 | <@OrIdow6^2> | The dict is in a skippable frame at the beginning of the zstd |
| 23:37:47 | <@OrIdow6^2> | I wrote an awful tool to extract them a while back, I believe someone else wrote a better one, but if no one comes around with that in a bit I can give you the old one |
| 23:39:31 | <@OrIdow6^2> | (It's in the skippable frame, and furthermore it itself is compressed with vanilla zstd) |
| 23:48:54 | | user__ quits [Remote host closed the connection] |
| 23:49:52 | | umgr036 joins |
| 23:55:12 | <@OrIdow6^2> | driib: Alright, actually I've made a little new one without the dependency issue https://transfer.archivete.am/sW9PL/get_zstd_dict_simple.py |
| 23:55:33 | <@OrIdow6^2> | This takes the name of the warc.zst as its argument and puts the compressed dict to stdout |
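
The dictionary layout OrIdow6^2 describes above (a skippable frame at the start of the file, holding a dictionary that may itself be zstd-compressed) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the linked get_zstd_dict_simple.py: the function names are hypothetical, it accepts any magic number in the range the Zstandard format reserves for skippable frames, and it only pulls out the raw frame payload without decompressing it.

```python
import struct
from typing import BinaryIO

# Zstandard reserves magic numbers 0x184D2A50..0x184D2A5F for skippable frames.
SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F
# Magic number of an ordinary zstd frame (bytes 28 B5 2F FD on disk).
ZSTD_FRAME_MAGIC = 0xFD2FB528


def read_leading_skippable_frame(stream: BinaryIO) -> bytes:
    """Return the payload of the skippable frame at the start of `stream`.

    Hypothetical helper: assumes the dictionary sits in the very first frame.
    """
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("file too short to contain a frame header")
    # Both fields are 4-byte little-endian unsigned integers.
    magic, size = struct.unpack("<II", header)
    if not SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
        raise ValueError(f"first frame is not skippable (magic {magic:#010x})")
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("skippable frame is truncated")
    return payload


def looks_zstd_compressed(payload: bytes) -> bool:
    """Report whether `payload` starts with the regular zstd frame magic."""
    return len(payload) >= 4 and struct.unpack_from("<I", payload)[0] == ZSTD_FRAME_MAGIC
```

Opening the downloaded file with `open(path, "rb")` and passing it to `read_leading_skippable_frame` would yield the stored dictionary bytes. If `looks_zstd_compressed` returns true for them, they would need one pass through plain `zstd -d` first; the result could then, presumably, be given to `zstd -d -D` when decompressing the megawarc itself.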