00:03:47<ericgallager>all the tesseract language packages use the 3-letter variants
00:04:55<@JAA>Consistency is good. If you need to cover languages that don't have a 2-char code, you might as well go with 3-char codes for all.
00:05:13<klea>i need to cover the language JAA
00:08:56stepney1411 (stepney141) joins
00:11:31stepney141 quits [Ping timeout: 272 seconds]
00:11:32stepney1411 is now known as stepney141
00:13:12cyanbox_ joins
00:16:50cyanbox quits [Ping timeout: 256 seconds]
00:28:22epoch quits [Quit: leaving]
00:41:46ducky quits [Ping timeout: 256 seconds]
00:47:34<h2ibot>Klea uploaded File:DocumentCloud Homepage.png (Homepage of the DocumentCloud website,…): https://wiki.archiveteam.org/?title=File%3ADocumentCloud%20Homepage.png
00:47:35<h2ibot>Klea created DocumentCloud (+987, Created page with "{{underconstruction}}…): https://wiki.archiveteam.org/?title=DocumentCloud
00:54:07Webuser860424 joins
00:54:35<h2ibot>Klea created URLs project (+18, Redirected page to [[URLs]]): https://wiki.archiveteam.org/?title=URLs%20project
00:54:47Webuser860424 quits [Client Quit]
00:54:57Webuser443086 joins
01:02:36<klg>Arawan Jamamadí?
01:06:22<ericgallager>anyone know what happened to freesf2 dot com? I tried checking the Wayback Machine, but it seems to only have grabs from after it was parked by some Chinese group unrelated to the original owners...
01:06:37<ericgallager>I'm coming to it from: https://forums.wesnoth.org/viewtopic.php?t=28736
01:09:40<klea>estimated shutdown time?
01:10:43<klea>ericgallager: <https://web.archive.org/web/20110207142202/http://freesf2.com/> ?
01:10:43Webuser443086 quits [Client Quit]
01:11:56<ericgallager>ok yeah that looks right; I guess a redirect was getting in my way previously...
01:11:59<ericgallager>klea++
01:11:59<eggdrop>[karma] 'klea' now has 9 karma!
01:12:22<klea>you're welcome :)
01:13:43nexussfan quits [Remote host closed the connection]
01:15:10nexussfan (nexussfan) joins
01:52:48Webuser780995 joins
01:53:00Webuser780995 quits [Client Quit]
01:55:48<h2ibot>Cooljeanius edited TikTok (+32, Separate references from notes): https://wiki.archiveteam.org/?diff=60094&oldid=59862
01:57:02<klea>oh, i did in fact exclude notes from that bulk job of adding reference sections i did a while ago apparently.
02:00:36sec^nd quits [Remote host closed the connection]
02:00:49<h2ibot>Cooljeanius edited TikTok (+308, /* Notes */ add note about vxtiktok, and…): https://wiki.archiveteam.org/?diff=60095&oldid=60094
02:00:56sec^nd (second) joins
02:09:02ThreeHM quits [Ping timeout: 256 seconds]
02:22:10ThreeHM (ThreeHeadedMonkey) joins
02:43:31ATinySpaceMarine quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
02:43:59ATinySpaceMarine joins
02:53:01<ivan>klea: if you send me a sample I can give you the output from PDF Expert on macOS, which is probably the best non-enterprise-grade OCR thing available
02:53:59<klea>ivan: i got it with https://olocr.com/ocr/persian already, i only needed enough to be able to search in the interwebs for more telegram links to hopefully have #telegrab get the post that was mentioned on that site.
02:54:15<ivan>okay
02:54:21<klea>thanks anyways.
02:57:03Hackerpcs_1 quits [Quit: Hackerpcs_1]
03:08:01Hackerpcs (Hackerpcs) joins
03:10:07mls quits [Ping timeout: 272 seconds]
03:20:05Webuser281296 joins
03:20:21Webuser281296 quits [Client Quit]
03:22:05Arachnophine quits [Quit: Arachnophine]
03:23:49<pabs>c3manu: thx, re-ran it
03:31:41mls (mls) joins
04:41:19midou quits [Ping timeout: 272 seconds]
04:54:18ducky (ducky) joins
04:59:03ducky quits [Ping timeout: 272 seconds]
05:00:17midou joins
05:08:42midou quits [Read error: Connection reset by peer]
05:10:39ducky (ducky) joins
05:11:02DogsRNice quits [Read error: Connection reset by peer]
05:12:13Kabaya quits [Quit: Ping timeout (120 seconds)]
05:12:19Kabaya joins
05:13:12nepeat quits [Quit: ZNC - https://znc.in]
05:13:35nepeat (nepeat) joins
05:21:37midou joins
05:26:16<h2ibot>PaulWise edited DocumentCloud (+169, mention the "Download File" links /cc klea): https://wiki.archiveteam.org/?diff=60096&oldid=60092
05:59:56nexussfan quits [Quit: Konversation terminated!]
06:49:18ArchivalEfforts quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
06:49:27ArchivalEfforts joins
07:01:52midou quits [Remote host closed the connection]
07:01:55midou joins
07:12:12ThreeHM quits [Ping timeout: 256 seconds]
07:12:53lennier2 joins
07:17:22takamori300 joins
07:17:51takamori300 quits [Client Quit]
07:25:20ThreeHM (ThreeHeadedMonkey) joins
08:03:28Arachnophine (Arachnophine) joins
08:17:26Arachnophine8 (Arachnophine) joins
08:17:31Arachnophine quits [Read error: Connection reset by peer]
08:17:31Arachnophine8 is now known as Arachnophine
09:37:46Dada joins
09:43:35ducky quits [Remote host closed the connection]
09:54:54ducky (ducky) joins
10:02:35ducky quits [Remote host closed the connection]
10:02:39ducky (ducky) joins
10:07:29ducky quits [Ping timeout: 272 seconds]
10:19:01ducky (ducky) joins
10:23:57ducky quits [Ping timeout: 272 seconds]
10:27:35fangfufu quits [Quit: ZNC 1.9.1+deb2+b3 - https://znc.in]
10:35:24ducky (ducky) joins
11:15:03<c3manu>i love it when people keep archives of (their) old stuff.. https://iranian.com/Features/archive.html
11:26:58<@arkiver>c3manu: nice! definitely archive it
11:28:38<c3manu>arkiver: it's running, i discovered that archive afterwards. from the frontpage iranian.com just looks like a wordpress that hasn't been updated since 2021 :)
11:29:05<c3manu>i guess i ignored the left column entirely ^^"
11:31:03fangfufu (fangfufu) joins
11:33:07<h2ibot>Cooljeanius edited DocumentCloud (+192, Write a lede so that it is no longer under…): https://wiki.archiveteam.org/?diff=60097&oldid=60096
11:41:54sonick (sonick) joins
11:49:27twiswist quits [Quit: twiswist]
12:00:03Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:49Bleo182600722719623455222 joins
12:10:12<h2ibot>Manu edited Governments/Germany (+375, /* Government Site Builder (GSB) */): https://wiki.archiveteam.org/?diff=60098&oldid=59512
12:31:08Webuser983101 joins
12:32:30Webuser983101 quits [Client Quit]
12:40:35Dada quits [Remote host closed the connection]
12:40:47Dada joins
12:54:18<h2ibot>Manu edited Deathwatch (+162, 2026-06-30: blogs.sapo.pt to be discontinued): https://wiki.archiveteam.org/?diff=60099&oldid=60045
12:55:18<h2ibot>Manu edited Deathwatch (+10, blogs.sapo.pt is a blogfarm): https://wiki.archiveteam.org/?diff=60100&oldid=60099
12:56:13<c3manu>Just discovered by accident that the Portuguese blog farm https://blogs.sapo.pt/ will be discontinued end of June this year. No idea yet how many blogs they're hosting, but on the help blog (https://ajuda.blogs.sapo.pt/descontinuacao-do-sapo-blogs-87653) they say it existed for 23 years.
13:01:24<justauser>User blogs as subdomains - simplifies discovery a bit.
13:01:49<justauser>Crawling https://blogs.sapo.pt/posts/ultimos could find at least some of them.
13:02:42<c3manu>yep, i was just about to say. i'm running it in #archivebot right now, so we can probably find most (if not all of them) from the logs, depending on how long it will take
13:02:47<justauser>But this list is only 5 pages/5 hours deep.
13:03:35<justauser>Some other areas of the main website are also good in that regard, but who cares if we are doing AB.
13:03:58<justauser>Probably needs a DPoS?
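Since user blogs live on subdomains, the discovery idea above (crawl listing pages or reuse crawl logs, then collect hostnames) can be sketched roughly as follows. This is only an illustration: the function name `blog_hosts` and the assumption that the input is plain text with one or more URLs per line are not from the discussion.

```python
import re
from urllib.parse import urlsplit

# Loose URL matcher; trailing punctuation does not matter because only
# the hostname is used afterwards.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def blog_hosts(lines, parent="blogs.sapo.pt"):
    """Return the sorted set of subdomains of `parent` seen in `lines`."""
    suffix = "." + parent
    hosts = set()
    for line in lines:
        for url in URL_RE.findall(line):
            try:
                host = (urlsplit(url).hostname or "").lower()
            except ValueError:
                continue  # malformed URL, skip it
            # Keep only real subdomains, not the parent site itself.
            if host.endswith(suffix):
                hosts.add(host)
    return sorted(hosts)
```

Feeding it the lines of a crawl log or a saved listing page would give a candidate list of blogs; it says nothing about blogs that never appear in the input.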
13:04:32SootBector quits [Remote host closed the connection]
13:05:11<c3manu>awww /directory.bml doesn't exist ^^
13:05:13<justauser>https://blogsquentes.blogs.sapo.pt/
13:05:19<c3manu>from robots.txt: Disallow: /directory.bml
13:05:43SootBector (SootBector) joins
13:05:50<c3manu>justauser: oh nice
13:06:24<justauser>Individual blogs have sitemaps, value unclear.
13:06:55<justauser>I hope it's not a bot trap.
13:07:29<justauser>No AB runs under this domain, so no idea if it ever existed.
13:08:40<c3manu>robots.txt of blogs.sapo.pt also requests a delay for bots (10)
13:09:13<justauser>Depends on a bot.
13:09:31<justauser>5s for Bing, 20s for Yandex, none for Google.
13:09:53<justauser>We probably won't care...
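For reference, per-bot crawl delays like the ones quoted above can be read with Python's standard `urllib.robotparser`. The robots.txt text below is a made-up stand-in that only mirrors the numbers mentioned in the chat (10 by default, 5 for Bing, 20 for Yandex, none for Google); the real file and its exact user-agent tokens were not quoted, so treat them as assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, reconstructed from the values mentioned in chat.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /directory.bml

User-agent: bingbot
Crawl-delay: 5

User-agent: Yandex
Crawl-delay: 20

User-agent: Googlebot
Disallow: /directory.bml
"""

def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)
for agent in ("bingbot", "Yandex", "Googlebot", "ArchiveBot"):
    # crawl_delay() returns None when the matching group sets no delay.
    print(agent, rules.crawl_delay(agent))
```

A bot with no group of its own falls back to the `*` group, which is why the delay a crawler sees "depends on a bot".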
13:17:22<justauser>Securitytrails has over 10K subdomains, but refuses to show more.
13:18:02<justauser>And DNSHistory is a lot more trigger-happy again.
13:18:45<justauser>Near the New Year, I could get thousands of pages per day. Now I'm banned after a dozen.
13:22:20<c3manu>rude
13:23:48<justauser>Sometimes I feel they have a human watching logs scroll by.
13:24:31<justauser>But perhaps it's just an adaptive system that bans more under high load, and other scrapers were taking their holiday break.
13:48:31nulldata-alt1 quits [Ping timeout: 272 seconds]
13:49:58<justauser>https://delistedgames.com/ - perhaps something to be watched by feed bot?
14:18:54<klea>I guess things like migrations for services like <https://www.servicerocket.com/resources/opsgenie-end-of-support-what-it-means-and-what-to-do-next> really can't easily be found, and thus shouldn't be added to DW?, or should they?
14:24:45NatTheCat (NatTheCat) joins
14:29:01<cruller>Speaking of which, there are several websites in Japan that list online games scheduled for service termination.
14:32:02Sluggs quits [Excess Flood]
14:35:45<cruller>weekly: https://gamebiz.jp/news/418977 https://dengekionline.com/article/202601/62842
14:36:29<cruller>monthly: https://game.watch.impress.co.jp/docs/kikaku/2074862.html https://dengekionline.com/article/202512/62039
14:37:43<justauser>https://transfer.archivete.am/14h1BV/blogs.sapo.pt_securitytrails.txt
14:37:43<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/14h1BV/blogs.sapo.pt_securitytrails.txt
14:38:11<justauser>Repeating later could plausibly return different results.
14:40:55<klea>time to make another few channels to pour links to :p
14:41:39Sluggs (Sluggs) joins
14:42:00<klea>cruller: it seems gamebiz.jp only has an RSS feed at <https://gamebiz.jp/feed.rss>, since i don't speak Japanese, I can't really do much with those, but i guess time to set up a new channel for it.
14:43:44<cruller>I'm concerned that smartphone apps and PWAs often can't be archived using standard methods. Even so, their shutdown announcements should probably be compiled somewhere.
14:45:30<klea>cruller: i made #gameaway, i might wait for that_lurker to put the feed bot there, or if im too lazy, i'll run another rss2irc bot there.
14:45:38<klea>feed url: <https://delistedgames.com/feed/>
14:52:06<@arkiver>maxmodels project will be started tomorrow
14:52:17<cruller>I monitor shutdown announcements via https://news.ceek.jp/. It crawls Japanese news sites and provides full-text search. Those search results can be subscribed to as an RSS feed.
14:53:18<klea>cruller: give in #gameaway the rss urls you want to have machines monitor.
14:56:42<cruller>Yeah, the query I usually use generates too much noise and isn't suitable for that purpose, so I'll create dedicated queries.
14:56:56<klea>perfect
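A feed watcher of the kind discussed here boils down to: fetch the RSS, skip items already seen, optionally filter by keyword to cut noise. A minimal standard-library sketch, assuming plain RSS 2.0 (`<item>` with `<title>` and `<link>`); the function name `new_items` and the keyword filter are illustrative and not how the existing feed bot or rss2irc work.

```python
import xml.etree.ElementTree as ET

def new_items(rss_xml, seen_links, keywords=()):
    """Return (title, link) pairs not yet in `seen_links`.

    If `keywords` is non-empty, keep only items whose title contains at
    least one keyword (case-insensitive). `seen_links` is updated in place.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if not link or link in seen_links:
            continue
        if keywords and not any(k.lower() in title.lower() for k in keywords):
            continue
        seen_links.add(link)
        fresh.append((title, link))
    return fresh
```

Calling it repeatedly with the same `seen_links` set only reports each link once, which is the behaviour one would want before relaying items to a channel.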
14:59:40<h2ibot>Klea edited List of websites excluded from the Wayback Machine (+67, add angelfire.lycos.com): https://wiki.archiveteam.org/?diff=60101&oldid=60084
15:01:40<justauser>Was already there as www.
15:02:41<h2ibot>Klea edited Angelfire (+47, angelfire.lycos.com is excluded from the WBM): https://wiki.archiveteam.org/?diff=60102&oldid=59197
15:03:35<klea>oh sorry
15:05:41<h2ibot>Klea edited List of websites excluded from the Wayback Machine (-69, it seems .angelfire.lycos.com is excluded): https://wiki.archiveteam.org/?diff=60103&oldid=60101
15:06:07<klea>huh
15:06:11DogsRNice joins
15:06:12<justauser>www. and non-www are equivalent for WBM.
15:06:24<justauser>We even have it documented somewhere.
15:06:34<klea>oh
15:07:25<klea>justauser: i thought that a leading www. meant that only the www. subdomain was excluded rather than any subdomain for the parent. https://web.archive.org/web/2/https://meow.angelfire.lycos.com/
15:07:50<klea>and if www. and non-www are equivalent for the exclusions list, shouldn't we consider removing the www.'s with the bot?, i guess maybe?
15:09:31<justauser>IDK if subdomains are normally excluded.
15:09:42<justauser>Maybe there never were a reason to.
15:09:56<justauser>s/were/was/
15:10:09<justauser>And besides, we are in #internetarchive land.
15:11:31<klea>aaah, yeah sorry.
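If www. and non-www entries really are equivalent for the exclusion list, as stated above, a bot could collapse them with something like the sketch below. The helper names are hypothetical, and whether the wiki bot should do this at all was left open in the discussion.

```python
def strip_www(host):
    """Lower-case a hostname and drop one leading 'www.' label."""
    host = host.strip().lower().rstrip(".")
    rest = host[4:]
    # Only strip when something domain-like remains (avoid 'www.com' -> 'com').
    if host.startswith("www.") and "." in rest:
        return rest
    return host

def dedupe_hosts(hosts):
    """Collapse www./non-www duplicates, keeping first-seen order."""
    seen = {}
    for host in hosts:
        seen.setdefault(strip_www(host), None)
    return list(seen)
```

Other subdomains are deliberately left alone, since it was unclear whether excluding a parent domain also excludes its subdomains.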
15:12:07<cruller>Sometimes I think, “It would be great if I could replicate everything needed to generate HTTP messages instead of just capturing them.” But that would essentially mean becoming the service provider myself.
15:12:44peaches quits [Ping timeout: 256 seconds]
15:14:27peaches joins
15:14:29Shard1 (Shard) joins
15:17:11Shard quits [Ping timeout: 272 seconds]
15:17:11Shard1 is now known as Shard
15:21:40sonick quits [Quit: Connection closed for inactivity]
15:27:44<justauser>That's what URLTeam and WikiTeam do, to some extent.
15:30:20<cruller>Definitely.
15:35:02<justauser>klea tried to calculate the "efficiency" of AB for me.
15:35:39<justauser>Ratio of website size as observed by AB to space taken on the server.
15:36:08<justauser>The results weren't exciting at all.
15:46:45BornOn420 quits [Remote host closed the connection]
15:53:33BornOn420 (BornOn420) joins
16:04:33<justauser>https://transfer.archivete.am/i9Kat/blogs.sapo.pt_crux.txt
16:04:34<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/i9Kat/blogs.sapo.pt_crux.txt
16:12:05<justauser>blogs.sapo.pt has custom domains, for example https://bricopoupar.com/.
16:14:02<justauser>Some blogs killed earlier: https://ajuda.blogs.sapo.pt/sapo-blogs-internacional-64856
16:16:56<c3manu>justauser++
16:16:56<eggdrop>[karma] 'justauser' now has 6 karma!
16:26:11<justauser>Most of the custom domains should be here: https://transfer.archivete.am/ro5HS/blogs.sapo.pt_custom_dnshistory.txt
16:26:13<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/ro5HS/blogs.sapo.pt_custom_dnshistory.txt
16:32:15<cruller>A summary article on how to find sites on a specific server would be helpful.
16:32:40<cruller>It would be quite similar to [[Finding subdomains]] though.
16:33:21<justauser>SecurityTrails, DNSHistory, SSL certs, perhaps Shodan.
16:43:22<cruller>I wish WBM CDX had an IP field.
16:44:02<justauser>WARCs do have it, FWIW, but indexing by it would be a waste of space and time.
16:46:18andybak joins
16:47:28<andybak>Hi. I just heard from an ex-employee that Sketchfab is being gradually wound down in favour of Epic's Fab. There are a lot of cultural institutions with data on there. Is this something people are aware of?
16:48:13<andybak>(i did check the wiki but there was no recent mention of it)
16:49:01<cruller>justauser: For indexing a few GB of warc files, I can include any fields, but I have no idea what the cost would be at the scale of WBM...
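For small collections, an IP index can be built straight from the `WARC-IP-Address` header of response records. The sketch below uses only the standard library and a deliberately simple reader for an uncompressed WARC stream; the function names are made up, and a real tool would more likely use a proper WARC library and handle edge cases this ignores.

```python
from collections import defaultdict

def iter_warc_headers(stream):
    """Yield one dict of lower-cased WARC headers per record in `stream`."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separating records
        if not line.startswith(b"WARC/"):
            raise ValueError("not at a WARC record boundary")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record body; only the headers are needed here.
        stream.read(int(headers.get("content-length", 0)))
        yield headers

def index_by_ip(stream):
    """Map each server IP to the sorted target URIs of its response records."""
    index = defaultdict(set)
    for headers in iter_warc_headers(stream):
        ip = headers.get("warc-ip-address")
        uri = headers.get("warc-target-uri")
        if headers.get("warc-type") == "response" and ip and uri:
            index[ip].add(uri)
    return {ip: sorted(uris) for ip, uris in index.items()}
```

For the usual per-record gzipped `.warc.gz` files, passing `gzip.open(path, "rb")` as the stream should work, since Python's gzip module reads concatenated members as one stream. This says nothing about the cost at Wayback Machine scale, which is the open question above.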
16:52:52<justauser>andybak: No.
16:54:44<@arkiver>andybak: do they already have a public mention of it?
16:56:56<justauser>"Blog" link broken for me.
17:08:14andybak quits [Client Quit]
17:17:08Shard quits [Client Quit]
17:23:35nyakase3 is now known as nyakase
17:27:26Shard1 (Shard) joins
17:37:32<that_lurker>ref. #archiveteam Would be odd to see them discontinue tenor as it's heavily used in messaging
17:37:56<that_lurker>though it is google so ¯\_(ツ)_/¯
17:41:32Cronfox quits [Quit: No Ping reply in 180 seconds.]
17:42:38Cronfox (Cronfox) joins
17:45:20<@arkiver>nyakase: would it be possible to paste the text on transfer.archivete.am from that email? or else, forward to arkiver@protonmail.com?
17:47:00<nyakase>that_lurker: I agree, but they *are* discontinuing the API, so that would end GIF picker integrations unless they have an alternative not mentioned in the email I got
17:47:26<nyakase>arkiver: I get a Google Safe Browsing warning when I go to that page
17:47:36<justauser>Known issue, sorry.
17:47:48<justauser>Probably someone uploaded something bad.
17:49:46<nyakase>https://transfer.archivete.am/yDE1v/tenorapimail.txt
17:49:46<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/yDE1v/tenorapimail.txt
17:50:22andybak joins
17:50:54<nyakase>It doesn't say anything about the website or GIFs themselves so those should be fine for now
17:52:09<DigitalDragons>I see a public announcement at the bottom of https://support.google.com/tenor/answer/10455265#whatll-happen-to-the-tenor-api&zippy=%2Cwhatll-happen-to-the-tenor-api too
17:53:46<andybak>arkiver - sorry. got disconnected. no - no public mention i'm aware of. i was on a zoom call with uk museum community people. i joined late but just caught a really clear mention of the fact from someone who seemed to have direct knowledge
17:54:42<andybak>also mentioned that content was already disappearing as organisations where moving away due to platform changes
17:54:58<andybak>where / were
17:56:08ice quits [Read error: Connection reset by peer]
17:57:29ice joins
18:13:20Wake quits [Quit: The Lounge - https://thelounge.chat]
18:13:40Wake joins
18:19:22<justauser>https://transfer.archivete.am/gbrz4/blogs.sapo.pt_ddg.txt
18:19:22<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/gbrz4/blogs.sapo.pt_ddg.txt
18:19:42<klea>Given <https://oldgate.org/> says it'll be closing its doors in 2 days, i wonder, should we run domains from <https://status.oldgate.org/> on AB?
18:19:47<klea>> We have bad news: OGS, effective January 15th, 2025, Oldgate Studios will be closing its metaphoric doors. Since launching on March 15th, 2022, this online platform has been a place for creativity, collaboration, and connection.
18:20:07lumidify quits [Remote host closed the connection]
18:20:11klea bonks klea
18:20:18<klea>it's 2026 not 2025, so they closed a year ago.
18:21:38<justauser>Won't hurt either way.
18:21:40<andybak>oh - while i'm here (which isn't very often) - i'd like to thank everyone involved in archiving Google Poly. We've spent a lot of time over the last year making sure that archive can be viewed as closely as possible to how it was originally: https://icosa.gallery/
18:22:18<klea>i wonder if we should have a wiki page for archive viewer project things
18:22:44<klea>andybak: do you mind if i add that link to the wiki?
18:22:46<justauser>arkiver: Not urgent, but look towards blogs.sapo.pt when you have time. It looks like a future DPoS.
18:23:00<andybak>klea please do
18:23:55<andybak>A few notes. We're linking to archive.org for alternative formats but we've copied the primary viewable files to our own backblaze B2 account because of CORS issues. We had some support from the Internet Archive but then they got hit with those lawsuits and went quiet on us. We were hoping they could help with the storage issues.
18:24:54<andybak>A few other websites made your scrape available back when Poly shut down but they didn't handle format conversion very well and a lot of the files were unviewable.
18:26:08<h2ibot>Klea edited Google Poly (+194, Added archive viewing link): https://wiki.archiveteam.org/?diff=60104&oldid=59906
18:26:27lumidify (lumidify) joins
18:27:41<klea>I'm thinking about the idea of making a page to list uptime monitor pages, which typically have more outlinks to sites run by the same organizations (tho the uptime monitoring instances i've seen are all uptime kumas and seem to be for smallish indieweb projects such as <https://status.vulpinecitrus.info/status/default>)
18:28:13<klea>huh, that one doesn't have outlinks, but a human could get potential dns names from it, through the process of brute forcing.
18:32:42NatTheCat2 (NatTheCat) joins
18:33:54NatTheCat quits [Ping timeout: 256 seconds]
18:33:54NatTheCat2 is now known as NatTheCat
18:38:22<justauser>https://transfer.archivete.am/2jpEb/blogs.sapo.pt_ip_securitytrails.txt
18:38:22<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/2jpEb/blogs.sapo.pt_ip_securitytrails.txt
18:45:10sknebel quits [Quit: No Ping reply in 180 seconds.]
18:47:34sknebel (sknebel) joins
18:52:58<andybak>ok. i'm going to have to go soon but can i get an email or something to discuss the sketchfab thing? IRC is too synchronous for my brain chemistry
18:53:47<andybak>arkiver ?
18:57:34<klea>andybak: <mailto:arkiver@protonmail.com> arkiver <at> protonmail <DOT> com
19:08:59sknebel quits [Ping timeout: 272 seconds]
19:10:10lumidify quits [Ping timeout: 256 seconds]
19:15:15<h2ibot>Klea edited URLTeam (+445, /* "Official" shorteners */ Add fans.ly…): https://wiki.archiveteam.org/?diff=60105&oldid=60034
19:16:15<h2ibot>Klea edited URLTeam (-33, Duplicate references section (blame me)): https://wiki.archiveteam.org/?diff=60106&oldid=60105
19:16:28anarcat quits [Quit: rebooting]
19:16:50Webuser559881 joins
19:16:51<eggdrop>[tell] Webuser559881: [2026-01-08T14:05:15Z] <klea> No, those channels aren't logged.
19:17:40<Webuser559881>This guy just died: https://www.youtube.com/@RealCoffeewithScottAdams/featured
19:17:44Webuser559881 quits [Client Quit]
19:18:25anarcat (anarcat) joins
19:20:00<klea>Scott Adams: <https://scottadams.locals.com> <https://x.com/ScottAdamsSays> <https://dilbert.com/>
19:20:02<eggdrop>nitter: https://nitter.net/ScottAdamsSays
19:23:21<klea>Death note: https://transfer.archivete.am/inline/P3Tlu/Scott-Adams-FinalMessage-p1.png https://transfer.archivete.am/inline/2Rz4R/Scott-Adams-FinalMessage-p2.png
19:27:44<klea>as text: https://transfer.archivete.am/inline/vlxXw/Scott-Adams-FinalMessage.txt
19:47:24<h2ibot>Manu edited Discourse/active (+51, Add forum.sailfishos.org): https://wiki.archiveteam.org/?diff=60107&oldid=60083
19:49:58<@JAA>Huh, SketchFab, bit of a throwback. We were doing their URL shortener in #urlteam for a while several years ago, until they asked us to stop.
20:11:27<h2ibot>Manu edited Discourse/active (+47, Add mixxx.discourse.group): https://wiki.archiveteam.org/?diff=60108&oldid=60107
20:32:27andybak quits [Client Quit]
20:36:38Barto quits [Quit: WeeChat 4.8.1]
20:36:55Barto (Barto) joins
21:13:51Cuphead2527480 (Cuphead2527480) joins
21:22:45twiswist (twiswist) joins
21:39:19Shard1 quits [Quit: Im doing something rq. Il brb]