00:06:55etnguyen03 (etnguyen03) joins
00:11:42rohvani joins
00:15:43<klea>tbh i don't think i have done much
00:15:49<klea>also burnout fun .p
00:15:55<klea>oh this is -ot chat
00:16:12Dada quits [Remote host closed the connection]
00:37:17etnguyen03 quits [Client Quit]
00:58:34<steering>Guest: me irl
00:59:12Shard795915 quits [Client Quit]
00:59:39<steering>> The page "MediaWiki:Editnotice-N" is loaded for an entire namespace, where "N" is the namespace number. For example "MediaWiki:Editnotice-3" is loaded for user talk pages.
00:59:45<steering>mediawiki why?
00:59:48Shard795915 (Shard) joins
01:00:03<steering>why is it not MW:Editnotice-Talk smh
01:00:55<steering>seems like MW:Talkpagetext would be better (all talk namespaces)
01:03:14<klea>oh i thought talkpagetext was displayed when viewing talk pages too, not only when editing.
01:03:30<steering>hmm, it is unclear
01:03:39<steering>"This corresponds to all "MediaWiki:Editnotice-X" messages together, where X is an odd number" leads me to believe it's shown on edit though
01:05:25etnguyen03 (etnguyen03) joins
01:54:11Shard795915 quits [Ping timeout: 272 seconds]
01:56:46Shard795915 (Shard) joins
02:39:06Island joins
03:27:47epoch (epoch) joins
03:33:29HP_Archivist quits [Quit: Leaving]
03:37:22sec^nd quits [Ping timeout: 256 seconds]
03:38:33sec^nd (second) joins
03:47:37midou quits [Read error: Connection reset by peer]
03:55:43Guest quits [Quit: Guest]
03:56:05Guest joins
03:56:15Guest quits [Client Quit]
03:56:27Guest joins
03:57:12midou joins
04:04:10midou quits [Read error: Connection reset by peer]
04:08:09midou joins
04:08:13HP_Archivist (HP_Archivist) joins
04:09:05nothere_ quits [Ping timeout: 272 seconds]
04:10:29etnguyen03 quits [Remote host closed the connection]
04:38:02nothere joins
04:46:49Island quits [Read error: Connection reset by peer]
04:53:04DogsRNice quits [Read error: Connection reset by peer]
04:54:43<pabs>JAA: Canonical folks sent a mail asking for an IP address we will use for archiving the Ubuntu MoinMoin wikis and also the vBulletin forums that were at http://ubuntuforums.org/ (currently redirects to Discourse, but they would open it back up for us)
04:54:53<pabs>I'm guessing AB for the latter
04:55:16<pabs>For MoinMoin we need someone to write a script, so that's a bit trickier
05:04:38Kotomind joins
05:09:15beardicus quits [Ping timeout: 272 seconds]
05:09:27beardicus (beardicus) joins
05:37:05<@JAA>pabs: Oh, right, thanks for the reminder.
05:37:25<@JAA>We should be able to do MoinMoin through AB as well. But yes, needs some scripts.
05:49:41Wohlstand (Wohlstand) joins
06:01:42nexussfan quits [Read error: Connection reset by peer]
06:10:32<HP_Archivist>JAA: Any interest in proactively looking at gov domains in Greenland...?
06:10:40<HP_Archivist>Unsure if anyone has done that yet
06:29:41midou quits [Ping timeout: 272 seconds]
06:30:12midou joins
06:41:36midou quits [Read error: Connection reset by peer]
06:51:01midou joins
06:54:38Webuser979078 joins
06:56:46Webuser979078 quits [Client Quit]
07:21:52midou quits [Ping timeout: 256 seconds]
07:22:40<h2ibot>PaulWise edited Software (-20, about software/code for archiving, not about…): https://wiki.archiveteam.org/?diff=60015&oldid=59179
07:40:54midou joins
07:41:56<pabs>JAA: reminder for MoinMoin though, non-sequential diffs are a serious issue with no adequate solution yet (the ignores are still buggy). so I asked Canonical to hide the non-sequential diffs from AB :)
07:42:07<pabs>https://wiki.archiveteam.org/index.php/MoinMoin
07:42:35<pabs>https://wiki.archiveteam.org/index.php/ArchiveBot/Ignore/NonSequentialIntegers
08:02:45<h2ibot>Hans5958 edited Bre.ad (-38): https://wiki.archiveteam.org/?diff=60016&oldid=59577
08:10:19beastbg8__ joins
08:11:01midou quits [Ping timeout: 272 seconds]
08:14:49beastbg8_ quits [Ping timeout: 272 seconds]
08:18:57midou joins
08:22:33LddPotato quits [Read error: Connection reset by peer]
08:23:23LddPotato (LddPotato) joins
08:24:04Webuser339099 joins
08:25:00Webuser339099 quits [Client Quit]
08:33:14LddPotato quits [Read error: Connection reset by peer]
08:34:00LddPotato (LddPotato) joins
08:44:00LddPotato quits [Read error: Connection reset by peer]
08:45:09LddPotato (LddPotato) joins
08:54:38<triplecamera|m>Hi. I'd like to make some small WARC dumps by myself with wpull, the same crawler used by archivebot. However, it cannot run on the latest Python. What magic is archivebot using to make it work?
08:55:59<triplecamera|m><https://github.com/ArchiveTeam/wpull> last commit was 4 years ago
08:57:30<pabs>AB uses old software IIRC. if you want to share the links here we can do them in AB
09:01:52<h2ibot>PaulWise edited .ps (+53, add data in sidebar, indicate it is private): https://wiki.archiveteam.org/?diff=60017&oldid=58998
09:10:10<triplecamera|m>pabs: Well, no, thanks. I think I should try wget.
09:10:43<pabs>wget doesn't make standards-compliant WARC https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem
09:11:26<triplecamera|m>🤔
09:14:09<triplecamera|m>So are there any recommended web crawlers for personal use?
09:16:45<pabs>only recommended stuff on the page is grab-site (uses wpull) and wget-at https://github.com/ArchiveTeam/wget-lua
09:20:43<triplecamera|m>Well, seems that wpull is the best option.
09:21:23<triplecamera|m>I will try to run wpull on Python 3.7 / 3.8. Thank you.
09:22:36<@JAA>pabs: I wouldn't do a recursive crawl or grab the diff pages at all. I do need to finish those scripts though.
09:23:41<pabs>hmm, I wonder if non-recursive would miss uploaded files?
09:23:55<@JAA>HP_Archivist: That may be a good idea, yeah.
09:24:50<HP_Archivist>JAA: Sure thing. I'll circle back later today about it.
09:25:40<@JAA>triplecamera|m: You need 3.6 for wpull, but wpull on its own isn't very pleasant due to a few CLI bugs. grab-site uses a fork of wpull that supports newer Python versions.
09:31:45<triplecamera|m>JAA: 😕
09:31:54<triplecamera|m>OK, I will have a try.
09:34:37nathang2184 quits [Ping timeout: 272 seconds]
09:42:31nathang2184 joins
09:55:00<triplecamera|m>pabs: By the way, has this been reported to the wget maintainers? I can still see angle brackets in the latest wget.
09:57:27<pabs>looks like yes, my browser history has these wget warc URLs https://savannah.gnu.org/bugs/?64203 https://savannah.gnu.org/bugs/?func=detailitem&item_id=47281
10:02:34<pabs>hmm, the wget bug says <http://example.com/> is correct but the wiki says that it's the standard but shouldn't be done
10:02:58<@arkiver>i believe it was a bug/error in the standard
10:03:06<@arkiver>it was not supposed to be defined as such
10:03:26<pabs>ah, I see, that bug was what caused wget to make the change to add <>
10:06:12<@arkiver>yes
10:07:42<pabs>looks like that being bogus was never reported in the wget savannah bugs system
10:08:01<h2ibot>PaulWise edited The WARC Ecosystem (+137, add refs related to wget WARC issues): https://wiki.archiveteam.org/?diff=60018&oldid=59464
10:09:23<triplecamera|m>I just checked the specs. The angle brackets were required in WARC 1.0, but were removed in WARC 1.1.
10:10:09<triplecamera|m>> NOTE: in WARC 1.0 standard (ISO 28500:2009), uri was defined as "<" <'URI' per RFC 3986> ">". This rule has been changed to meet requests from implementers.
10:17:02<h2ibot>PaulWise edited The WARC Ecosystem (+113, mention the WARC/1.1 removal of brackets around…): https://wiki.archiveteam.org/?diff=60019&oldid=60018
10:20:11Shard795915 quits [Quit: Ping timeout (120 seconds)]
10:20:26Shard795915 (Shard) joins
10:22:12Dada joins
10:23:09Wohlstand quits [Quit: Wohlstand]
10:29:31<@JAA>The angle brackets were in the 1.0 spec grammar but not in the 1.0 spec examples, and no software other than wget ever wrote (or supported reading) them.
10:29:47<@JAA>Yes, this has been discussed with the wget devs before.
10:30:15<@JAA>(On IRC)
11:11:08<h2ibot>Movses edited Academia.edu (+1, IRC #archiveteam-bs <pokechu22> I don't think…): https://wiki.archiveteam.org/?diff=60020&oldid=59049
11:11:09<h2ibot>Hans5958 deleted Template:Partially saved (Deleted to make way for move from…)
11:11:10<h2ibot>Hans5958 moved Template:Partiallysaved to Template:Partially saved: https://wiki.archiveteam.org/?title=Template%3APartially%20saved
11:11:11<h2ibot>Hans5958 moved Template:Selfsaved to Template:Self-saved: https://wiki.archiveteam.org/?title=Template%3ASelf-saved
11:11:12<h2ibot>Hans5958 moved Template:Onhiatus to Template:On hiatus: https://wiki.archiveteam.org/?title=Template%3AOn%20hiatus
11:11:13<h2ibot>Hans5958 deleted Template:On hiatus (Deleted to make way for move from…)
11:24:24LddPotato quits [Read error: Connection reset by peer]
11:25:32LddPotato (LddPotato) joins
11:26:19<@arkiver>JAA: i approved those renames of the Templates, but if they should not have been, we can turn them back
11:26:28<@arkiver>(see last three h2ibot messages)
11:30:58<@JAA>arkiver: Seems fine with me.
11:31:18<@arkiver>alright
11:32:11<h2ibot>JustAnotherArchivist edited Academia.edu (-1, Reverted edits by…): https://wiki.archiveteam.org/?diff=60028&oldid=60020
11:32:14<@JAA>But this one was a misunderstanding about what the 'project status' row means.
11:36:28LddPotato quits [Read error: Connection reset by peer]
11:37:06LddPotato (LddPotato) joins
11:48:07LddPotato quits [Read error: Connection reset by peer]
11:48:46LddPotato (LddPotato) joins
11:49:32quartermaster quits [Quit: Connection closed for inactivity]
12:00:00Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:42Bleo182600722719623455222 joins
12:52:50Dada quits [Remote host closed the connection]
13:05:50Dada joins
13:29:39Wohlstand (Wohlstand) joins
13:40:51<triplecamera|m><triplecamera|m> "By the way, has this been..." <- ...So, is anyone trying to file a bug report for wget?
13:41:37Shard795915 quits [Ping timeout: 272 seconds]
13:42:50Shard795915 (Shard) joins
13:47:03<chrismrtn>triplecamera|m: The problem has been acknowledged in the wget mailing list before, and a solution was proposed by one of the developers at https://lists.gnu.org/archive/html/bug-wget/2024-11/msg00010.html (although no action has been taken, AFAIK)
14:03:22ArcadianMaggie quits [Read error: Connection reset by peer]
14:06:22Dada quits [Remote host closed the connection]
14:08:37Dada joins
14:18:27<triplecamera|m>chrismrtn: Well, that's unfortunate... Is there a way to remind them?
14:27:19SootBector quits [Remote host closed the connection]
14:28:27SootBector (SootBector) joins
14:36:17Webuser876363 joins
14:36:51<Webuser876363>anyone here?
14:37:41Webuser876363 quits [Client Quit]
14:38:32<klea>very great, less than a minute of wait time :(
14:49:55SootBector quits [Remote host closed the connection]
14:51:03SootBector (SootBector) joins
14:59:45Woodie joins
15:02:45ArcadianMaggie joins
15:42:48ArcadianMaggie quits [Ping timeout: 256 seconds]
15:43:26ArcadianMaggie joins
16:05:59chrismrtn quits [Quit: leaving]
16:12:33chrismrtn (chrismrtn) joins
16:28:46DogsRNice joins
17:04:17DogsRNice_ joins
17:08:05DogsRNice quits [Ping timeout: 272 seconds]
17:16:57ThreeHM quits [Ping timeout: 272 seconds]
17:19:01ThreeHM (ThreeHeadedMonkey) joins
17:32:16AlsoHP_Archivist joins
17:33:25HP_Archivist quits [Ping timeout: 272 seconds]
17:49:55<justauser>It looks like nobody used mwlinkscrape for a while.
17:50:27<justauser>Are there more projects that could benefit from it, other than the obvious candidates on the wiki?
18:08:55Wohlstand quits [Quit: Wohlstand]
18:37:55<nulldata>Speaking of, Louis Rossmann had a video recently asking for people to add archive links of any URLs in https://consumerrights.wiki/ articles.
18:42:22<justauser>So, just throw it into AB and let it grab offsite links?
18:46:55Wohlstand (Wohlstand) joins
19:06:46AlsoHP_Archivist quits [Client Quit]
19:07:02HP_Archivist (HP_Archivist) joins
19:09:17<h2ibot>Sjeben edited Deathwatch (+281, /* 2020s */ 2026-05 Pittsburgh Post-Gazette): https://wiki.archiveteam.org/?diff=60029&oldid=60011
19:09:18<h2ibot>Sjeben edited Deathwatch (-1, /* 2026-05 */): https://wiki.archiveteam.org/?diff=60030&oldid=60029
19:12:13Kotomind quits [Ping timeout: 272 seconds]
19:48:50leo60228 quits [Read error: Connection reset by peer]
19:48:53leo60228 (leo60228) joins
19:53:42Wohlstand quits [Client Quit]
20:07:17<klea>i think we should update that script a little lol
20:13:44etnguyen03 (etnguyen03) joins
20:30:40Wohlstand (Wohlstand) joins
20:32:08unlobito quits [Quit: Quit.]
20:37:24Wohlstand quits [Client Quit]
21:06:20unlobito (unlobito) joins
21:06:54PC joins
21:07:21<PC>is this the place for twitter stuff? just saw this https://twitter.com/i/status/2008905412930634045
21:07:23<eggdrop>nitter: https://nitter.net/i/status/2008905412930634045
21:08:12<PC>was already preparing a list of URLs to ask to be archived in case they can be now (since i've noticed some recent ones showing up as JSONs on the WBM), guess i ought to finish that ASAP before folks delete stuff (understandably so, but they shouldn't have to!)
21:12:14Webuser105901 joins
21:15:12Webuser105901 quits [Client Quit]
21:25:11etnguyen03 quits [Remote host closed the connection]
21:28:28programmerq quits [Ping timeout: 256 seconds]
21:43:46ArcadianMaggie quits [Ping timeout: 256 seconds]
22:02:53<Guest>i think its practically the same as before || old tos: https://x.com/en/tos#current:~:text=the%20United%20States-,These%20Terms%20of%20Service%20(%E2%80%9CTerms%E2%80%9D)%20govern%20your%20and%20other%20users%E2%80%99%20access,using%20the%20Services%20you%20agree%20to%20be%20bound%20by%20these%20Terms.,-These%20Terms%20are
22:02:54<eggdrop>nitter: https://nitter.net/en/tos
22:02:59<Guest>new tos: https://x.com/en/tos#current:~:text=the%20United%20States-,These%20Terms%20of%20Service%20(%E2%80%9CTerms%E2%80%9D)%20govern%20your%20relationship%20with%20us%20and,using%20the%20Services%20you%20agree%20to%20be%20bound%20by%20these%20Terms.,-These%20Terms%20are
22:02:59<eggdrop>nitter: https://nitter.net/en/tos
22:03:34<Guest>those are "copy link to highlight" links, so if anyone wants the raw url it's: https://x.com/en/tos
22:03:34<eggdrop>nitter: https://nitter.net/en/tos
22:04:54<Guest>old: information, text, links, graphics, photos, audio, videos, or other materials or arrangements of materials uploaded, downloaded or appearing on the Services (collectively referred to as “Content”).
22:04:59<Guest>new: information, text, links, graphics, photos, audio, videos, or other materials or arrangements of materials uploaded, downloaded or appearing on the Services (collectively referred to as “Content”).
22:05:41<Guest>i think grok just hallucinated it, cc PC
22:07:47<PC>oh, good to know, thank you for verifying. i'm afraid that it might still cause at least some panic, given that i just saw it without context (and ironically, was too busy archiving twitter itself to verify), so others will have too
22:08:58<PC>but hopefully not much
22:14:55<Guest>the same language exists in a 2019 version of the tos (when it was still twitter): https://archive.ph/5bPGs
22:16:14<Guest>"audio" is not included in the definition of content in a 2017 copy: https://archive.ph/hIcgY
22:31:52Dada quits [Remote host closed the connection]
22:33:16Dada joins
23:04:53<h2ibot>Klea edited Wallhaven (-47, Remolve note that it's new because it's no…): https://wiki.archiveteam.org/?diff=60031&oldid=58325
23:07:53<h2ibot>Klea edited Wallhaven (-8, Note that you can download NSFW wallpapers from…): https://wiki.archiveteam.org/?diff=60032&oldid=60031
23:10:28agtsmith quits [Ping timeout: 256 seconds]
23:19:10nexussfan (nexussfan) joins
23:23:59ArcadianMaggie joins
23:35:57<h2ibot>Klea edited Wallhaven (+86, Mention the fact it has forums section): https://wiki.archiveteam.org/?diff=60033&oldid=60032
23:41:58<h2ibot>Klea edited URLTeam (+190, /* "Official" shorteners */ Add whvn.cc): https://wiki.archiveteam.org/?diff=60034&oldid=59340
23:54:00<klea>i wonder, if a project to archive wallhaven were to be done, would all collections also be archived? (and in any case such a project would likely mean continuous archival because of new content)
23:54:09<klea>also, it has a forums side
23:58:33Yakov quits [Quit: Ping timeout (120 seconds)]
23:58:49Yakov joins