00:00:04matoro quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
00:01:02DopefishJustin quits [Remote host closed the connection]
00:01:51matoro joins
00:02:50midou joins
00:17:53etnguyen03 (etnguyen03) joins
00:20:39<@JAA>Uh, right
00:26:38DopefishJustin joins
00:39:07chaoticbee (chaoticbee) joins
01:05:32cyanbox joins
01:08:37makeworld quits [Remote host closed the connection]
01:16:07<h2ibot>PaulWise edited ArchiveBot (-145, use variables to make the script shorter, more…): https://wiki.archiveteam.org/?diff=57793&oldid=57658
01:19:07<h2ibot>PaulWise edited ArchiveBot (-2, use <pre>): https://wiki.archiveteam.org/?diff=57794&oldid=57793
01:34:41<pabs>justauser|m: can you add that wiki to Deathwatch?
02:14:52ducky quits [Ping timeout: 260 seconds]
02:15:52ducky (ducky) joins
02:50:00andrewnyr quits [Quit: Ping timeout (120 seconds)]
02:57:22midou quits [Ping timeout: 256 seconds]
03:02:22pabs quits [Read error: Connection reset by peer]
03:03:44pabs (pabs) joins
03:05:42midou joins
03:17:55Wohlstand (Wohlstand) joins
03:30:53Island quits [Read error: Connection reset by peer]
03:31:53<nicolas17>agh
03:32:18<nicolas17>https://opensource.samsung.com/uploadSearch?searchValue=SCH-E329I this (and a few others) are supposed to be PDFs
03:32:28<nicolas17>instead they have a magic number of "<## NASCA DRM FILE - VER1.00 ##>"
03:32:48<nicolas17>whatever, let's archive them anyway for completeness and to preserve evidence of samsung's fuckup, right?
03:33:06<nicolas17>"error uploading SCH-E329I.pdf: Uploaded content is unacceptable. - error checking pdf file"
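A minimal sketch of the magic-number check at issue, in Python; the filename and DRM header come straight from the log above, and the read length is arbitrary:
```python
# Distinguish a real PDF from one of the NASCA-wrapped files by leading bytes.
with open("SCH-E329I.pdf", "rb") as fh:
    head = fh.read(34)
if head.startswith(b"%PDF"):
    print("looks like a real PDF")
elif head.startswith(b"<## NASCA DRM FILE"):
    print("NASCA DRM wrapper, not a PDF")
else:
    print("unknown format:", head[:16])
```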
03:45:12PredatorIWD256 joins
03:45:50Guest58 joins
03:46:00Guest58 quits [Client Quit]
03:46:05midou quits [Ping timeout: 272 seconds]
03:47:21PredatorIWD25 quits [Ping timeout: 272 seconds]
03:47:21PredatorIWD256 is now known as PredatorIWD25
03:54:29midou joins
04:03:15etnguyen03 quits [Remote host closed the connection]
04:22:50Guest58 joins
04:26:31Shard79 quits [Quit: Ping timeout (120 seconds)]
04:26:43Shard79 (Shard) joins
04:33:35nothere quits [Ping timeout: 272 seconds]
04:54:37nothere_ joins
06:00:36Guest58 quits [Client Quit]
06:01:27Guest58 joins
06:06:41Guest58 quits [Ping timeout: 272 seconds]
06:10:59DogsRNice quits [Read error: Connection reset by peer]
06:16:06Guest58 joins
06:35:11midou quits [Ping timeout: 272 seconds]
07:01:19<steering>nicolas17: https://cpcex.sec.samsung.net/Windchill/ext/cpcex/common/gate/jsp/guideDrmEN.jsp#1 amazing
07:01:19Guest58 quits [Client Quit]
07:04:22<steering>some sort of fs filter or something to recognize that header, seems lovely
07:04:43midou joins
07:04:47<nicolas17>I assume IA is just seeing that it's not a valid PDF
07:05:00<steering>oh yeah I mean for samsung's use of it
07:09:32midou quits [Ping timeout: 256 seconds]
07:25:39midou joins
07:39:00midou quits [Ping timeout: 256 seconds]
07:47:25Guest58 joins
07:49:05midou joins
07:52:35twse346865 joins
07:53:43<twse346865>consider hiding the privacy and disclaimer links in the wiki's footer by editing the LocalSettings.php file. there's a MediaWiki extension called FooterManager to do this.
07:54:53<twse346865>in the LocalSettings.php file, there are lines for $wgFooterManagerLinks['privacy'] and $wgFooterManagerLinks['disclaimer'], which are set to true. please change them to false!
07:58:54<nicolas17>https://data.nicolas17.xyz/samsung-grab/ 13 files pending, don't let ungeskriptet do everything :p
08:02:48midou quits [Ping timeout: 256 seconds]
08:16:58skyrocket quits [Ping timeout: 256 seconds]
08:18:07<twse346865>InMotion Hosting did a tutorial on editing the footer links in MediaWiki, find it at inmotionhosting dot com/support/edu/mediawiki/edit-footer-mediawiki.
08:21:03twse346865 quits [Client Quit]
08:21:58skyrocket joins
08:23:53twse525053 joins
08:24:18<twse525053>on the wiki's Special:Version page, there is no entry shown for FooterManager.
08:28:52cmlow quits [Ping timeout: 256 seconds]
08:29:51lennier2_ joins
08:30:05<twse525053>the team could have downloaded the FooterManager extension from MediaWiki and disabled the privacy policy and disclaimer links by setting the $wgFooterManagerLinks['privacy'] and $wgFooterManagerLinks['disclaimer'] entries to false.
08:30:44<twse525053>and in the LocalSettings.php file of the MediaWiki installation being used, there is no FooterManager extension line!
08:32:59lennier2 quits [Ping timeout: 272 seconds]
08:35:07<twse525053>taking a look at Wayback Machine snapshots of the Archiveteam:Privacy policy and Archiveteam:General disclaimer pages, all the snapshots return a 404 Not Found error, and they are shown in orange.
08:38:23<twiswist>How do I view the history of an individual file in an Internet Archive upload? I remember there being a page (like /details/ and /download/, but something else, or appended to the end of the download URL) that showed a log of operations performed on the file, the most interesting of which is whether the file was generated by IA or originally uploaded by the uploader
08:38:48<twiswist>It's not discoverable anywhere (or I'm just overlooking it) but I swear I've stumbled across it before
08:41:09<twiswist>It's supposed to be in my browser history but isn't
08:41:36<twse525053>in the MediaWiki page for Extension:FooterManager, it says that the extension is no longer available for download and has been archived.
08:48:46<twse525053>InMotion Hosting provided the FooterManager extension for download on its support page, in a comment from 2013.
08:49:43<twse525053>please add the URL bobsgame dot com (was excluded on December 15th, 2012) to the wiki page: "List of websites excluded from the Wayback Machine/Former exclusions". the edits won't go through if I don't create an account!
08:53:56AK (AK) joins
08:56:40<twse525053>the footer links to the privacy policy and disclaimer are MediaWiki:Privacy and MediaWiki:Disclaimer. please edit the LocalSettings.php file of the MediaWiki installation used by Archiveteam to remove the privacy policy and disclaimer links from the footer!
08:59:43AK quits [Client Quit]
09:01:19<chrismrtn>twiswist: Is the file named like itemIdentifierHere_files.xml what you are looking for? It doesn't show a history, but it does show if a file is original or a derivative (generated by IA)
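For reference, the same original-vs-derivative information is exposed by IA's public metadata API; a minimal sketch, with a placeholder item identifier:
```python
# List which files of an IA item are original uploads vs IA-generated derivatives.
import requests

item = "some-item-identifier"  # placeholder, substitute a real identifier
meta = requests.get(f"https://archive.org/metadata/{item}", timeout=30).json()
for f in meta.get("files", []):
    print(f"{f.get('source', '?'):10} {f['name']}")
```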
09:43:48<c3manu>Rince: thanks, will do! :) i think i'd need a more or less complete checklist, even if it would be straightforward for some.
09:44:07<c3manu>masterx244|m: the latter :)
09:48:28twse525053 quits [Client Quit]
10:04:14<masterx244|m>twiswist: you mean the /history/ page showing the task logs?
10:05:17<masterx244|m>c3manu: that makes it much easier than fighting the F5 wars.
10:05:17<masterx244|m>also: waiting for guru3 to open to snatch up my usual DECT extension
10:08:43midou joins
10:34:12ducky quits [Ping timeout: 260 seconds]
10:35:36ducky (ducky) joins
10:38:20szczot3k|t (szczot3k) joins
10:41:06szczot3k|t quits [Client Quit]
10:41:15szczot3k|t (szczot3k) joins
11:17:31<katia>Yayyyy 39c3
11:17:34<katia>Hype hype hype hype
12:00:02Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:46Bleo182600722719623455222 joins
12:17:41<masterx244|m>let's hope that we manage the AT nerd meetup this time
12:27:51etnguyen03 (etnguyen03) joins
12:51:18etnguyen03 quits [Client Quit]
12:55:31etnguyen03 (etnguyen03) joins
13:07:44ducky quits [Ping timeout: 260 seconds]
13:42:13<katia>Yaaaa
13:46:29chaoticbee quits [Ping timeout: 272 seconds]
14:03:35nine quits [Ping timeout: 272 seconds]
14:07:08nine joins
14:07:08nine quits [Changing host]
14:07:08nine (nine) joins
14:20:45AK (AK) joins
15:00:26chaoticbee (chaoticbee) joins
15:21:36ducky (ducky) joins
15:25:29cyanbox quits [Read error: Connection reset by peer]
15:26:33nicolas17 quits [Ping timeout: 272 seconds]
15:33:34<h2ibot>Justauser edited Deathwatch (+343, /* 2026 */ computersciencewiki.org, wormbase.org): https://wiki.archiveteam.org/?diff=57795&oldid=57781
15:41:25emanuele6 (emanuele6) joins
15:46:44Guest58_ joins
15:46:44Guest58 quits [Read error: Connection reset by peer]
15:50:52<c3manu>masterx244|m: you didn’t call dibs on yours? ;)
15:54:36<c3manu>katia: \o/
16:00:51<emanuele6>katia: \o/
16:11:55nicolas17 (nicolas17) joins
16:12:48Wohlstand quits [Quit: Wohlstand]
16:40:52Wohlstand (Wohlstand) joins
17:05:24etnguyen03 quits [Quit: Konversation terminated!]
17:06:34etnguyen03 (etnguyen03) joins
17:09:02Guest58_ quits [Client Quit]
17:11:46ThreeHM quits [Quit: WeeChat 4.7.1]
17:12:35ThreeHM (ThreeHeadedMonkey) joins
17:15:43Snivy quits [Quit: Ping timeout (120 seconds)]
17:15:56Snivy (Snivy) joins
17:16:02Snivy quits [Client Quit]
17:16:53Snivy (Snivy) joins
17:26:19etnguyen03 quits [Client Quit]
17:44:39etnguyen03 (etnguyen03) joins
18:33:03etnguyen03 quits [Client Quit]
18:44:26etnguyen03 (etnguyen03) joins
18:46:40Sluggs quits [Excess Flood]
18:47:02<h2ibot>Manu edited Distributed recursive crawls (+51, Candidates: Add ildb.nadir.org): https://wiki.archiveteam.org/?diff=57796&oldid=57786
18:48:48skyrocket quits [Read error: Connection reset by peer]
18:51:18Sluggs (Sluggs) joins
18:53:34<emanuele6>'twasn't me
19:03:57skyrocket joins
19:06:19pokechu22 quits [Read error: Connection reset by peer]
19:08:59pokechu22 (pokechu22) joins
19:23:08epoch joins
19:25:34<epoch>https://hackaday.com/2025/11/07/oldversion-com-archive-facing-shutdown-due-to-financing-issues/ dunno if anyone has mentioned this yet or if anyone is interested
19:35:53<nicolas17>downloads themselves are POST
19:37:01emanuele6 is now known as Manu
19:38:45<@JAA>Do we know the total size (or even a rough estimate)?
19:39:01<nicolas17>we should probably contact them
19:39:57<@JAA>(The financial troubles have been known for at least a month, by the way.)
19:47:18<nicolas17>hm I think I can do some scraping and get the total size
19:48:37<@JAA>That'd be great. Even an extrapolated estimate is fine. Just to get a sense of the scale.
19:49:34<nicolas17>something to note
19:49:36<nicolas17><span class="viewmore clickable" onclick="getpage('/windows/software/office/')">
19:49:53<nicolas17>function getpage(page) { window.location = page; }
19:49:54<nicolas17>sir have you heard of normal links
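A sketch of scraping those JS pseudo-links with bs4 (mentioned below), under the assumption that the navigation elements all carry an onclick of the form quoted above:
```python
# Recover the pseudo-links hidden in onclick="getpage('...')" attributes,
# since the site uses JS navigation instead of normal <a href> links.
import re
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.oldversion.com/windows/", timeout=30).text
soup = BeautifulSoup(html, "html.parser")
for el in soup.find_all(onclick=True):
    m = re.match(r"getpage\('([^']+)'\)", el["onclick"])
    if m:
        print(m.group(1))
```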
19:50:33<@JAA>I'd think it can't be huge. 30k versions, maybe tens of MB per version on average. That'd put it at the scale of a terabyte or a couple.
19:51:41<masterx244|m>but POST messes with wayback-ability depending on how the URLs work. in the worst case a static-item form needs to be derived from the downloaded WARCs
19:52:07<nicolas17>I have little hope for WBM replay due to that POST, yeah...
19:53:17<masterx244|m>thx for the reminder. had to check a set of currently running zip uploads to see if something failed
19:53:30<@JAA>I think it can work, although it'd be annoying to navigate. But let's not worry about that for now.
19:55:22Wohlstand quits [Quit: Wohlstand]
19:58:24<nicolas17>hm do I remember how to use bs4
19:59:08<@JAA>grep :-)
20:14:33Manu is now known as emanuele6
20:18:00<nicolas17>ok I have a list of all apps, there are indeed 1963 of them
20:21:59NeonGlitch (NeonGlitch) joins
20:22:36ljcool2006 quits [Quit: Leaving]
20:26:55Aoede_ quits [Read error: Connection reset by peer]
20:29:06<@JAA>Is the sitemap complete?
20:29:22<nicolas17>I didn't think to check if there even was a sitemap >_<
20:29:29Aoede (Aoede) joins
20:29:40<@JAA>Heh
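A minimal completeness check along the lines JAA suggests; the /sitemap.xml path is an assumption, and the site may not publish one at all:
```python
# Count <loc> entries in the sitemap to compare against the scraped app list.
import re
import requests

xml = requests.get("http://www.oldversion.com/sitemap.xml", timeout=30).text
urls = re.findall(r"<loc>([^<]+)</loc>", xml)
print(len(urls), "URLs listed in the sitemap")
```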
20:29:50<nicolas17>I'm fetching every app to get the list of versions now, so far it looks like your 1TB estimate was pretty good
20:30:15<nicolas17>25% fetched, 869 GB extrapolated
20:31:57<nicolas17>oh it will actually be smaller due to android being smaller files
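The extrapolation being done here, spelled out; the byte count is an assumed running total, the other numbers come from the log:
```python
# Scale the bytes seen so far by the fraction of apps fetched.
apps_total = 1963
apps_fetched = 491        # ~25% of the list at this point
bytes_seen_gb = 217.0     # assumed running total
estimate_gb = bytes_seen_gb * apps_total / apps_fetched
print(f"extrapolated total: {estimate_gb:.0f} GB")  # ~ the 869 GB figure above
```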
20:32:17<@JAA>Ah
20:32:31<katia>nicolas17, you know there's a warc right
20:32:32<nicolas17>I went in order so I got all Windows first
20:32:35<nicolas17>katia: the what
20:32:42<katia>nothing
20:32:48<@JAA>Also, I've forgotten to check for sitemaps often enough. :-)
20:33:14<nicolas17>katia: WHERE
20:33:43<@JAA>There's an AB job (which obviously didn't grab any of the actual software).
20:34:47<emanuele6>it's not me
20:35:09<nicolas17>my script reached http://www.oldversion.com/android/com-flipkart-android/ and crashed T_T
19:35:25<masterx244|m>and that's usually the time when some WARC parsing can be necessary to figure out the final stage of requests
20:35:42<@JAA>What's the last estimate? That'll be good enough.
20:36:17<emanuele6>I guess around 1123GB
20:36:27<emanuele6>we'll see how close I was
20:37:14<nicolas17>I fetched 1172 apps, 21909 app versions, 350 GB
20:38:24<katia>nicolas17, you don't get ratelimited?
20:38:37<nicolas17>now I'm getting slowdowns and 502s
20:38:44<katia>ah
20:42:36<nicolas17>yeah this looks like 500 gigs
20:44:10<katia>i'm going to guess 263278956.29 KB
20:44:49<@JAA>Because you have a copy already?
20:44:56<katia>no i parsed the warc
20:45:00<@JAA>Ah :-D
20:45:47<katia>https://vyxg5mxrl.i.katia.sh/2025-11-10-oldversion.com-warc-parse.py.txt
20:46:47<nicolas17>well how come I got 514322400 KB
20:46:53arch_ (arch) joins
20:47:01arch quits [Ping timeout: 272 seconds]
20:47:04arch_ is now known as arch
20:47:09<katia>dunno i can't read the code in your computer trivially at the moment
20:49:50<nicolas17>https://transfer.archivete.am/inline/Xc00w/oldversion.com-appversions.txt
20:49:59<@JAA>Well, 'hundreds of GB, probably under a TB' is fine enough for me. :-)
20:51:29<katia>ah i omitted android
20:52:10<katia>and mac i wrote macos i guess
20:54:20<katia>517673105.54 KB
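A sketch of the kind of WARC parse linked above (not katia's actual script); the filename and the human-readable size format are assumptions:
```python
# Walk the AB job's WARC and sum the "NN MB"-style size figures on the
# oldversion.com version pages, keeping the total in KB.
import re
from warcio.archiveiterator import ArchiveIterator

UNITS = {"KB": 1, "MB": 1024, "GB": 1024 ** 2}
total_kb = 0.0
with open("oldversion.com-ab-job.warc.gz", "rb") as fh:  # hypothetical filename
    for record in ArchiveIterator(fh):
        if record.rec_type != "response":
            continue
        body = record.content_stream().read().decode("utf-8", "replace")
        for num, unit in re.findall(r"([\d.]+)\s*(KB|MB|GB)\b", body):
            total_kb += float(num) * UNITS[unit]
print(f"{total_kb:.2f} KB")
```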
21:03:28<twiswist>chrismrtn: Yes, that contains what I was looking for (derivative vs original), thank you!
21:14:01etnguyen03 quits [Client Quit]
21:14:06nine quits [Quit: See ya!]
21:14:20nine joins
21:14:20nine quits [Changing host]
21:14:20nine (nine) joins
21:21:19ducky quits [Read error: Connection reset by peer]
21:32:06ducky (ducky) joins
22:01:43etnguyen03 (etnguyen03) joins
22:09:01<@arkiver>not sure how the rate limiting of oldversion is
22:09:08<@arkiver>do we need a warrior project? or can AB handle it?
22:09:24@arkiver would be happy to set up a warrior project if that is what we need
22:09:54<emanuele6>no war
22:10:03<@JAA>I think it should be feasible without DPoS.
22:10:11<@JAA>AB can't do it though due to POST.
22:10:37<@JAA>I'll look at it more closely sometime this week.
22:15:03wickedplayer494 quits [Ping timeout: 272 seconds]
22:16:50<@arkiver>JAA: alright!
22:16:55<@arkiver>if needed though, i'm happy to make one :)
22:17:10<@arkiver>at under a TB though, it's definitely not needed for size, maybe for IPs
22:33:22<nicolas17>hmm replay might work
22:34:05<nicolas17>the website sends a POST, but all the data is in the URL (no body) and if you do a GET to the same URL it still works
22:34:34<nicolas17>what does WBM do if you send a POST to an archived URL?
22:34:51<nicolas17>error, or pretends it's a GET and returns the archived response?
22:36:27<nicolas17>looks like the latter! so this might Just Work for replay!
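The replay experiment described here, as a quick test; the target URL is illustrative, and the bare "2" timestamp just redirects to the nearest capture:
```python
# POST to a Wayback URL and see whether it answers like a GET.
import requests

wb = "https://web.archive.org/web/2/http://www.oldversion.com/windows/winrar-2-00"
r = requests.post(wb, timeout=60)
print(r.status_code, len(r.content))  # archived response, as if it were a GET
```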
22:42:31<nicolas17>actually there are POST fields, but the server doesn't seem to care, the data in the URL is enough
22:45:33<nicolas17>URL data has the current timestamp, so it's likely it expires, but I don't know what the expiration is
22:53:00<@arkiver>nicolas17: should work then
22:53:15<@arkiver>i believe (didn't check now) that it just returns data to a POST as if it were a GET
22:54:21<@arkiver>another strategy i've used sometimes is that most POST requests don't check any "extra" parameters you add, so you can add identifiable information in there, so the POSTed URLs can still be found/looked up manually (or with customization in the Wayback Machine at some point)
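A sketch of that lookup-marker trick; the at_mark parameter name is made up, and BASE64DATA stands in for the real payload:
```python
# Most endpoints ignore unknown parameters, so tagging the POSTed URL with an
# identifying one makes the capture easier to find in the Wayback Machine later.
import requests

url = ("http://software.oldversion.com/download.php"
       "?f=BASE64DATA&at_mark=winrar-2-00")  # extra param ignored by the server
resp = requests.post(url, timeout=60)
print(resp.status_code)
```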
22:57:19<nicolas17>oh that should work without needing extra params
22:57:49<nicolas17>the URL will be unique
22:57:57<nicolas17>what I was checking is if you can just follow links in a browser and get a working download, and I *think* you can
22:58:38<@arkiver>sounds good :)
23:00:36Guest58 joins
23:00:59nine quits [Client Quit]
23:01:12nine joins
23:01:12nine quits [Changing host]
23:01:12nine (nine) joins
23:02:18<nicolas17>http://www.oldversion.com/windows/winrar-2-00 download button does a POST (with a timestamp and signature) to http://www.oldversion.com/windows/download/winrar-2-00, then that page does a POST to http://software.oldversion.com/download.php?f=<base64 data> which gets the actual file; the timestamp is in the base64 data too
23:03:06<nicolas17>that timestamp/signature seems to expire quick so those 3 should be done sequentially
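The three-step chain, sketched with requests; the form-field extraction is illustrative (real field names unverified), and the steps run back-to-back because the signature expires quickly:
```python
import re
import requests

s = requests.Session()

# Step 1: the app version page, whose download button POSTs a timestamp+signature
page = s.get("http://www.oldversion.com/windows/winrar-2-00", timeout=30).text
fields = dict(re.findall(r'name="([^"]+)"[^>]*value="([^"]*)"', page))

# Step 2: the intermediate download page
dl = s.post("http://www.oldversion.com/windows/download/winrar-2-00",
            data=fields, timeout=30).text

# Step 3: the actual file from software.oldversion.com/download.php?f=<base64>
m = re.search(r'(https?://software\.oldversion\.com/download\.php\?f=[^"\']+)', dl)
if m:
    blob = s.post(m.group(1), timeout=120)
    with open("winrar-2-00.exe", "wb") as fh:  # hypothetical filename
        fh.write(blob.content)
```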
23:10:47wickedplayer494 joins