| 00:06:55 | | etnguyen03 (etnguyen03) joins |
| 00:11:42 | | rohvani joins |
| 00:15:43 | <klea> | tbh i don't think i have done much |
| 00:15:49 | <klea> | also burnout fun .p |
| 00:15:55 | <klea> | oh this is -ot chat |
| 00:16:12 | | Dada quits [Remote host closed the connection] |
| 00:37:17 | | etnguyen03 quits [Client Quit] |
| 00:58:34 | <steering> | Guest: me irl |
| 00:59:12 | | Shard795915 quits [Client Quit] |
| 00:59:39 | <steering> | > The page "MediaWiki:Editnotice-N" is loaded for an entire namespace, where "N" is the namespace number. For example "MediaWiki:Editnotice-3" is loaded for user talk pages. |
| 00:59:45 | <steering> | mediawiki why? |
| 00:59:48 | | Shard795915 (Shard) joins |
| 01:00:03 | <steering> | why is it not MW:Editnotice-Talk smh |
| 01:00:55 | <steering> | seems like MW:Talkpagetext would be better (all talk namespaces) |
| 01:03:14 | <klea> | oh i thought talkpagetext was displayed when viewing talk pages too, not only when editing. |
| 01:03:30 | <steering> | hmm, it is unclear |
| 01:03:39 | <steering> | "This corresponds to all "MediaWiki:Editnotice-X" messages together, where X is an odd number" leads me to believe its shown on edit though |
| 01:05:25 | | etnguyen03 (etnguyen03) joins |
| 01:54:11 | | Shard795915 quits [Ping timeout: 272 seconds] |
| 01:56:46 | | Shard795915 (Shard) joins |
| 02:39:06 | | Island joins |
| 03:27:47 | | epoch (epoch) joins |
| 03:33:29 | | HP_Archivist quits [Quit: Leaving] |
| 03:37:22 | | sec^nd quits [Ping timeout: 256 seconds] |
| 03:38:33 | | sec^nd (second) joins |
| 03:47:37 | | midou quits [Read error: Connection reset by peer] |
| 03:55:43 | | Guest quits [Quit: Guest] |
| 03:56:05 | | Guest joins |
| 03:56:15 | | Guest quits [Client Quit] |
| 03:56:27 | | Guest joins |
| 03:57:12 | | midou joins |
| 04:04:10 | | midou quits [Read error: Connection reset by peer] |
| 04:08:09 | | midou joins |
| 04:08:13 | | HP_Archivist (HP_Archivist) joins |
| 04:09:05 | | nothere_ quits [Ping timeout: 272 seconds] |
| 04:10:29 | | etnguyen03 quits [Remote host closed the connection] |
| 04:38:02 | | nothere joins |
| 04:46:49 | | Island quits [Read error: Connection reset by peer] |
| 04:53:04 | | DogsRNice quits [Read error: Connection reset by peer] |
| 04:54:43 | <pabs> | JAA: Canonical folks sent a mail asking for an IP address we will use for archiving the Ubuntu MoinMoin wikis and also the vBulletin forums that were at http://ubuntuforums.org/ (currently redirects to Discourse, but they would open it back up for us) |
| 04:54:53 | <pabs> | I'm guessing AB for the latter |
| 04:55:16 | <pabs> | MoinMoin we need someone to write a script, so that's a bit trickier |
| 05:04:38 | | Kotomind joins |
| 05:09:15 | | beardicus quits [Ping timeout: 272 seconds] |
| 05:09:27 | | beardicus (beardicus) joins |
| 05:37:05 | <@JAA> | pabs: Oh, right, thanks for the reminder. |
| 05:37:25 | <@JAA> | We should be able to do MoinMoin through AB as well. But yes, needs some scripts. |
| 05:49:41 | | Wohlstand (Wohlstand) joins |
| 06:01:42 | | nexussfan quits [Read error: Connection reset by peer] |
| 06:10:32 | <HP_Archivist> | JAA: Any interest in proactively looking at gov domains in Greenland...? |
| 06:10:40 | <HP_Archivist> | Unsure if anyone has done that yet |
| 06:29:41 | | midou quits [Ping timeout: 272 seconds] |
| 06:30:12 | | midou joins |
| 06:41:36 | | midou quits [Read error: Connection reset by peer] |
| 06:51:01 | | midou joins |
| 06:54:38 | | Webuser979078 joins |
| 06:56:46 | | Webuser979078 quits [Client Quit] |
| 07:21:52 | | midou quits [Ping timeout: 256 seconds] |
| 07:22:40 | <h2ibot> | PaulWise edited Software (-20, about software/code for archiving, not about…): https://wiki.archiveteam.org/?diff=60015&oldid=59179 |
| 07:40:54 | | midou joins |
| 07:41:56 | <pabs> | JAA: reminder for MoinMoin though, non-sequential diffs are a serious issue with no adequate solution yet (the ignores are still buggy). so I asked Canonical to hide the non-sequential diffs from AB :) |
| 07:42:07 | <pabs> | https://wiki.archiveteam.org/index.php/MoinMoin |
| 07:42:35 | <pabs> | https://wiki.archiveteam.org/index.php/ArchiveBot/Ignore/NonSequentialIntegers |
| 08:02:45 | <h2ibot> | Hans5958 edited Bre.ad (-38): https://wiki.archiveteam.org/?diff=60016&oldid=59577 |
| 08:10:19 | | beastbg8__ joins |
| 08:11:01 | | midou quits [Ping timeout: 272 seconds] |
| 08:14:49 | | beastbg8_ quits [Ping timeout: 272 seconds] |
| 08:18:57 | | midou joins |
| 08:22:33 | | LddPotato quits [Read error: Connection reset by peer] |
| 08:23:23 | | LddPotato (LddPotato) joins |
| 08:24:04 | | Webuser339099 joins |
| 08:25:00 | | Webuser339099 quits [Client Quit] |
| 08:33:14 | | LddPotato quits [Read error: Connection reset by peer] |
| 08:34:00 | | LddPotato (LddPotato) joins |
| 08:44:00 | | LddPotato quits [Read error: Connection reset by peer] |
| 08:45:09 | | LddPotato (LddPotato) joins |
| 08:54:38 | <triplecamera|m> | Hi. I'd like to make some small WARC dumps by myself with wpull, the same crawler used by archivebot. However, it cannot run on the latest Python. What magic is archivebot using to make it work? |
| 08:55:59 | <triplecamera|m> | <https://github.com/ArchiveTeam/wpull> last commit was 4 years ago |
| 08:57:30 | <pabs> | AB uses old software IIRC. if you want to share the links here we can do them in AB |
| 09:01:52 | <h2ibot> | PaulWise edited .ps (+53, add data in sidebar, indicate it is private): https://wiki.archiveteam.org/?diff=60017&oldid=58998 |
| 09:10:10 | <triplecamera|m> | pabs: Well, no, thanks. I think I should try wget. |
| 09:10:43 | <pabs> | wget doesn't make standards-compliant WARC https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem |
| 09:11:26 | <triplecamera|m> | 🤔 |
| 09:14:09 | <triplecamera|m> | So are there any recommended web crawlers for personal use? |
| 09:16:45 | <pabs> | only recommended stuff on the page is grab-site (uses wpull) and wget-at https://github.com/ArchiveTeam/wget-lua |
| 09:20:43 | <triplecamera|m> | Well, seems that wpull is the best option. |
| 09:21:23 | <triplecamera|m> | I will try to run wpull on Python 3.7 / 3.8. Thank you. |
| 09:22:36 | <@JAA> | pabs: I wouldn't be doing a recursive crawl nor grab the diff pages at all. I do need to finish those scripts though. |
| 09:23:41 | <pabs> | hmm, I wonder if non-recursive would miss uploaded files? |
| 09:23:55 | <@JAA> | HP_Archivist: That may be a good idea, yeah. |
| 09:24:50 | <HP_Archivist> | JAA: Sure thing. I'll circle back later today about it. |
| 09:25:40 | <@JAA> | triplecamera|m: You need 3.6 for wpull, but wpull on its own isn't very pleasant due to a few CLI bugs. grab-site uses a fork of wpull that supports newer Python versions. |
| 09:31:45 | <triplecamera|m> | JAA: 😕 |
| 09:31:54 | <triplecamera|m> | OK, I will have a try. |
| 09:34:37 | | nathang2184 quits [Ping timeout: 272 seconds] |
| 09:42:31 | | nathang2184 joins |
| 09:55:00 | <triplecamera|m> | pabs: By the way, has this been reported to the wget maintainers? I can still see angle brackets in the latest wget. |
| 09:57:27 | <pabs> | looks like yes, my browser history has these wget warc URLs https://savannah.gnu.org/bugs/?64203 https://savannah.gnu.org/bugs/?func=detailitem&item_id=47281 |
| 10:02:34 | <pabs> | hmm, the wget bug says <http://example.com/> is correct but the wiki says that it's the standard but shouldn't be done |
| 10:02:58 | <@arkiver> | i believe it was a bug/error in the standard |
| 10:03:06 | <@arkiver> | it was not supposed to be defined as such |
| 10:03:26 | <pabs> | ah, I see, that bug was what caused wget to make the change to add <> |
| 10:06:12 | <@arkiver> | yes |
| 10:07:42 | <pabs> | looks like that being bogus was never reported in the wget savannah bugs system |
| 10:08:01 | <h2ibot> | PaulWise edited The WARC Ecosystem (+137, add refs related to wget WARC issues): https://wiki.archiveteam.org/?diff=60018&oldid=59464 |
| 10:09:23 | <triplecamera|m> | I just checked the specs. The angle brackets were required in WARC 1.0, but were removed in WARC 1.1. |
| 10:10:09 | <triplecamera|m> | > NOTE: in WARC 1.0 standard (ISO 28500:2009), uri was defined as "<" <'URI' per RFC 3986> ">". This rule has been changed to meet requests from implementers. |
| 10:17:02 | <h2ibot> | PaulWise edited The WARC Ecosystem (+113, mention the WARC/1.1 removal of brackets around…): https://wiki.archiveteam.org/?diff=60019&oldid=60018 |
| 10:20:11 | | Shard795915 quits [Quit: Ping timeout (120 seconds)] |
| 10:20:26 | | Shard795915 (Shard) joins |
| 10:22:12 | | Dada joins |
| 10:23:09 | | Wohlstand quits [Quit: Wohlstand] |
| 10:29:31 | <@JAA> | The angle brackets were in the 1.0 spec grammar but not in the 1.0 spec examples, and no software other than wget ever wrote (or supported reading) them. |
| 10:29:47 | <@JAA> | Yes, this has been discussed with the wget devs before. |
| 10:30:15 | <@JAA> | (On IRC) |
| 11:11:08 | <h2ibot> | Movses edited Academia.edu (+1, IRC #archiveteam-bs <pokechu22> I don't think…): https://wiki.archiveteam.org/?diff=60020&oldid=59049 |
| 11:11:09 | <h2ibot> | Hans5958 deleted Template:Partially saved (Deleted to make way for move from…) |
| 11:11:10 | <h2ibot> | Hans5958 moved Template:Partiallysaved to Template:Partially saved: https://wiki.archiveteam.org/?title=Template%3APartially%20saved |
| 11:11:11 | <h2ibot> | Hans5958 moved Template:Selfsaved to Template:Self-saved: https://wiki.archiveteam.org/?title=Template%3ASelf-saved |
| 11:11:12 | <h2ibot> | Hans5958 moved Template:Onhiatus to Template:On hiatus: https://wiki.archiveteam.org/?title=Template%3AOn%20hiatus |
| 11:11:13 | <h2ibot> | Hans5958 deleted Template:On hiatus (Deleted to make way for move from…) |
| 11:24:24 | | LddPotato quits [Read error: Connection reset by peer] |
| 11:25:32 | | LddPotato (LddPotato) joins |
| 11:26:19 | <@arkiver> | JAA: i approved those renames on the Templates, but if they should not have been, we can turn them back |
| 11:26:28 | <@arkiver> | (see last three h2ibot messages) |
| 11:30:58 | <@JAA> | arkiver: Seems fine with me. |
| 11:31:18 | <@arkiver> | alright |
| 11:32:11 | <h2ibot> | JustAnotherArchivist edited Academia.edu (-1, Reverted edits by…): https://wiki.archiveteam.org/?diff=60028&oldid=60020 |
| 11:32:14 | <@JAA> | But this one was a misunderstanding about what the 'project status' row means. |
| 11:36:28 | | LddPotato quits [Read error: Connection reset by peer] |
| 11:37:06 | | LddPotato (LddPotato) joins |
| 11:48:07 | | LddPotato quits [Read error: Connection reset by peer] |
| 11:48:46 | | LddPotato (LddPotato) joins |
| 11:49:32 | | quartermaster quits [Quit: Connection closed for inactivity] |
| 12:00:00 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:42 | | Bleo182600722719623455222 joins |
| 12:52:50 | | Dada quits [Remote host closed the connection] |
| 13:05:50 | | Dada joins |
| 13:29:39 | | Wohlstand (Wohlstand) joins |
| 13:40:51 | <triplecamera|m> | <triplecamera|m> "By the way, has this been..." <- ...So, is anyone trying to file a bug report for wget? |
| 13:41:37 | | Shard795915 quits [Ping timeout: 272 seconds] |
| 13:42:50 | | Shard795915 (Shard) joins |
| 13:47:03 | <chrismrtn> | triplecamera|m: The problem has been acknowledged in the wget mailing list before, and a solution was proposed by one of the developers at https://lists.gnu.org/archive/html/bug-wget/2024-11/msg00010.html (although no action has been taken, AFAIK) |
| 14:03:22 | | ArcadianMaggie quits [Read error: Connection reset by peer] |
| 14:06:22 | | Dada quits [Remote host closed the connection] |
| 14:08:37 | | Dada joins |
| 14:18:27 | <triplecamera|m> | chrismrtn: Well, that's unfortunate... Is there a way to remind them? |
| 14:27:19 | | SootBector quits [Remote host closed the connection] |
| 14:28:27 | | SootBector (SootBector) joins |
| 14:36:17 | | Webuser876363 joins |
| 14:36:51 | <Webuser876363> | anyone here? |
| 14:37:41 | | Webuser876363 quits [Client Quit] |
| 14:38:32 | <klea> | very great, less than a minute of wait time :( |
| 14:49:55 | | SootBector quits [Remote host closed the connection] |
| 14:51:03 | | SootBector (SootBector) joins |
| 14:59:45 | | Woodie joins |
| 14:59:45 | | Woodie is now authenticated as Woodie |
| 15:02:45 | | ArcadianMaggie joins |
| 15:42:48 | | ArcadianMaggie quits [Ping timeout: 256 seconds] |
| 15:43:26 | | ArcadianMaggie joins |
| 16:05:59 | | chrismrtn quits [Quit: leaving] |
| 16:12:33 | | chrismrtn (chrismrtn) joins |
| 16:28:46 | | DogsRNice joins |
| 17:04:17 | | DogsRNice_ joins |
| 17:08:05 | | DogsRNice quits [Ping timeout: 272 seconds] |
| 17:16:57 | | ThreeHM quits [Ping timeout: 272 seconds] |
| 17:19:01 | | ThreeHM (ThreeHeadedMonkey) joins |
| 17:32:16 | | AlsoHP_Archivist joins |
| 17:33:25 | | HP_Archivist quits [Ping timeout: 272 seconds] |
| 17:49:55 | <justauser> | It looks like nobody used mwlinkscrape for a while. |
| 17:50:27 | <justauser> | Are there more projects that could benefit from it, other than obvious candidates on the wiki? |
| 18:08:55 | | Wohlstand quits [Quit: Wohlstand] |
| 18:37:55 | <nulldata> | Speaking of, Louis Rossmann had a video recently asking for people to add archive links of any URLs in https://consumerrights.wiki/ articles. |
| 18:42:22 | <justauser> | So, just throw it into AB and let it grab offsite links? |
| 18:46:55 | | Wohlstand (Wohlstand) joins |
| 19:06:46 | | AlsoHP_Archivist quits [Client Quit] |
| 19:07:02 | | HP_Archivist (HP_Archivist) joins |
| 19:09:17 | <h2ibot> | Sjeben edited Deathwatch (+281, /* 2020s */ 2026-05 Pittsburgh Post-Gazette): https://wiki.archiveteam.org/?diff=60029&oldid=60011 |
| 19:09:18 | <h2ibot> | Sjeben edited Deathwatch (-1, /* 2026-05 */): https://wiki.archiveteam.org/?diff=60030&oldid=60029 |
| 19:12:13 | | Kotomind quits [Ping timeout: 272 seconds] |
| 19:48:50 | | leo60228 quits [Read error: Connection reset by peer] |
| 19:48:53 | | leo60228 (leo60228) joins |
| 19:53:42 | | Wohlstand quits [Client Quit] |
| 20:07:17 | <klea> | i think we should update that script a little lol |
| 20:13:44 | | etnguyen03 (etnguyen03) joins |
| 20:30:40 | | Wohlstand (Wohlstand) joins |
| 20:32:08 | | unlobito quits [Quit: Quit.] |
| 20:37:24 | | Wohlstand quits [Client Quit] |
| 21:06:20 | | unlobito (unlobito) joins |
| 21:06:54 | | PC joins |
| 21:07:21 | <PC> | is this the place for twitter stuff? just saw this https://twitter.com/i/status/2008905412930634045 |
| 21:07:23 | <eggdrop> | nitter: https://nitter.net/i/status/2008905412930634045 |
| 21:08:12 | <PC> | was already preparing a list of URLs to ask to be archived in case they can be now (since i've noticed some recent ones showing up as JSONs on the WBM), guess i ought to finish that ASAP before folks delete stuff (understandably so, but they shouldn't have to!) |
| 21:12:14 | | Webuser105901 joins |
| 21:15:12 | | Webuser105901 quits [Client Quit] |
| 21:25:11 | | etnguyen03 quits [Remote host closed the connection] |
| 21:28:28 | | programmerq quits [Ping timeout: 256 seconds] |
| 21:43:46 | | ArcadianMaggie quits [Ping timeout: 256 seconds] |
| 22:02:53 | <Guest> | i think its practically the same as before || old tos: https://x.com/en/tos#current:~:text=the%20United%20States-,These%20Terms%20of%20Service%20(%E2%80%9CTerms%E2%80%9D)%20govern%20your%20and%20other%20users%E2%80%99%20access,using%20the%20Services%20you%20agree%20to%20be%20bound%20by%20these%20Terms.,-These%20Terms%20are |
| 22:02:54 | <eggdrop> | nitter: https://nitter.net/en/tos |
| 22:02:59 | <Guest> | new tos: https://x.com/en/tos#current:~:text=the%20United%20States-,These%20Terms%20of%20Service%20(%E2%80%9CTerms%E2%80%9D)%20govern%20your%20relationship%20with%20us%20and,using%20the%20Services%20you%20agree%20to%20be%20bound%20by%20these%20Terms.,-These%20Terms%20are |
| 22:02:59 | <eggdrop> | nitter: https://nitter.net/en/tos |
| 22:03:34 | <Guest> | those are "copy link to highlight"'s so if anyone wants the raw url its: https://x.com/en/tos |
| 22:03:34 | <eggdrop> | nitter: https://nitter.net/en/tos |
| 22:04:54 | <Guest> | old: information, text, links, graphics, photos, audio, videos, or other materials or arrangements of materials uploaded, downloaded or appearing on the Services (collectively referred to as “Content”). |
| 22:04:59 | <Guest> | new: information, text, links, graphics, photos, audio, videos, or other materials or arrangements of materials uploaded, downloaded or appearing on the Services (collectively referred to as “Content”). |
| 22:05:41 | <Guest> | i think grok just hallucinated it, cc PC |
| 22:07:47 | <PC> | oh, good to know, thank you for verifying. i'm afraid that it might still cause at least some panic, given that i just saw it without context (and ironically, was too busy archiving twitter itself to verify), so others will have too |
| 22:08:58 | <PC> | but hopefully not much |
| 22:14:55 | <Guest> | the same language exists in a 2019 version of the tos (when it was still twitter): https://archive.ph/5bPGs |
| 22:16:14 | <Guest> | "audio" is not included in the definition of content in a 2017 copy: https://archive.ph/hIcgY |
| 22:31:52 | | Dada quits [Remote host closed the connection] |
| 22:33:16 | | Dada joins |
| 23:04:53 | <h2ibot> | Klea edited Wallhaven (-47, Remolve note that it's new because it's no…): https://wiki.archiveteam.org/?diff=60031&oldid=58325 |
| 23:07:53 | <h2ibot> | Klea edited Wallhaven (-8, Note that you can download NSFW wallpapers from…): https://wiki.archiveteam.org/?diff=60032&oldid=60031 |
| 23:10:28 | | agtsmith quits [Ping timeout: 256 seconds] |
| 23:19:10 | | nexussfan (nexussfan) joins |
| 23:23:59 | | ArcadianMaggie joins |
| 23:35:57 | <h2ibot> | Klea edited Wallhaven (+86, Mention the fact it has forums section): https://wiki.archiveteam.org/?diff=60033&oldid=60032 |
| 23:41:58 | <h2ibot> | Klea edited URLTeam (+190, /* "Official" shorteners */ Add whvn.cc): https://wiki.archiveteam.org/?diff=60034&oldid=59340 |
| 23:54:00 | <klea> | i wonder, if a project to archive wallhaven were to be done, would all collections also be archived? (and in any case such projects will likely mean continuous archival because new content) |
| 23:54:09 | <klea> | also, it has a forums side |
| 23:58:33 | | Yakov quits [Quit: Ping timeout (120 seconds)] |
| 23:58:49 | | Yakov joins |