00:04:25<mgrandi>But yeah I'll take a look tonight and see if it works for portal maps
00:27:35<Pedrosso>Great. I wouldn't like to be the one to do it but if nobody else would at the moment, then perhaps
00:32:43<mgrandi>Poke me if I forget but I'll spend some time tonight documenting it and put it on GitHub
00:34:29tbc1887 quits [Client Quit]
00:35:14kiryu joins
00:35:14kiryu quits [Changing host]
00:35:14kiryu (kiryu) joins
00:39:11tbc1887 (tbc1887) joins
01:15:14pabs quits [Remote host closed the connection]
01:29:49Naruyoko5 joins
01:31:50Naruyoko quits [Ping timeout: 240 seconds]
01:48:28<flashfire42>https://en.wikipedia.org/wiki/Republic_of_Artsakh has just ceased to exist in the last few days
01:50:26<pokechu22>Fortunately thanks to gooshka we've been running https://artsakhlib.am/ for a while (but that site's also super slow and errors out if you run it faster)
01:50:39<pokechu22>I think we might have run some of their other sites too, but it'd be good to re-run them
02:18:53enim3n joins
02:21:32enim3n quits [Remote host closed the connection]
02:21:32qwertyasdfuiopghjkl quits [Remote host closed the connection]
02:31:58pabs (pabs) joins
04:02:54Hackerpcs quits [Quit: Hackerpcs]
04:06:38Hackerpcs (Hackerpcs) joins
04:37:24atphoenix_ quits [Remote host closed the connection]
04:38:08atphoenix_ (atphoenix) joins
05:25:24BlueMaxima quits [Read error: Connection reset by peer]
05:36:20qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
06:03:50nulldata quits [Ping timeout: 240 seconds]
06:07:31Arcorann (Arcorann) joins
06:09:16nulldata (nulldata) joins
06:13:52Island quits [Read error: Connection reset by peer]
06:42:18c3manu (c3manu) joins
06:56:12sec^nd quits [Remote host closed the connection]
06:58:07sec^nd (second) joins
07:27:01<@arkiver>i'm looking into a project for hardware info since they have crazy strict rate limits
07:30:17c3manu quits [Remote host closed the connection]
07:30:46<@arkiver>if there are any "official" sites or youtube channels of https://en.wikipedia.org/wiki/Republic_of_Artsakh we should archive them in #archivebot (sites) and #down-the-tube (youtube)
07:32:05<flashfire42>https://www.spyur.am/
07:34:38<fireonlive>arkiver: hardware info?
07:42:59<@arkiver>fireonlive: went read only on january 1, see deathwatch
07:43:39<fireonlive>ah! thanks
07:43:51<fireonlive>i went google instead for some reason -_-
07:44:01<fireonlive>i of all people should know the wiki :D
07:50:47DogsRNice quits [Read error: Connection reset by peer]
07:51:43<pokechu22>https://www.spyur.am seems to have strict cloudflare unfortunately :/
07:52:02<pokechu22>(though it also sounds like it's Armenia in general, not Artsakh)
07:52:26rdosus joins
08:29:15rdosus quits [Client Quit]
08:41:33Megame (Megame) joins
08:52:03<fireonlive>manu|m & others here: looks like c3 is going to be deleting a number of matrix channels for the event (via the irc bridge to #37c3-hall-1 and others)
08:52:19<fireonlive>unsure if there's a way to save matrix channels & its attachments/threads/etc or ?
08:52:33<fireonlive>message: <admintechnicaladministrationnoc> PSA: This channel is a candidate for deletion. If you think this is a mistake, please let us know by replying to this message. Otherwise we are going to delete the channel in a few days. Thanks for using the matrix event chat, we are happy to hear your feedback:
08:52:33<fireonlive>https://events.ccc.de/congress/2023/hub/wiki/Feedback/
08:53:14<fireonlive>account for that seems to be @admin:events.ccc.de
08:53:19<fireonlive>(/whois)
09:02:51<fireonlive>(just in bed, but just got a ping that they're doing this for data privacy reasons, before we rush into this)
09:03:07<fireonlive>(will respond/ask qs tomorrow)
09:17:53<@arkiver>JAA: what directory contained the bulk of the size of archive.mozilla.org from your recent scan?
09:18:04<@arkiver>(CC Ryz )
10:00:01Bleo18260 quits [Client Quit]
10:01:21Bleo18260 joins
10:17:52qwertyasdfuiopghjkl quits [Remote host closed the connection]
10:46:31qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
11:30:25eroc1990 quits [Read error: Connection reset by peer]
11:32:24eroc1990 (eroc1990) joins
12:03:22bocci (bocci) joins
12:35:20le0n quits [Ping timeout: 240 seconds]
12:40:41le0n (le0n) joins
12:46:15Arcorann quits [Ping timeout: 272 seconds]
13:11:27bocci quits [Remote host closed the connection]
13:16:16bocci (bocci) joins
13:36:34kiryu_ joins
13:40:05kiryu quits [Ping timeout: 272 seconds]
13:41:21kiryu_ quits [Ping timeout: 272 seconds]
13:51:47jacksonchen666 is now known as RJHacker67335
13:51:52jacksonchen666 (jacksonchen666) joins
13:52:33jacksonchen666 quits [Client Quit]
13:53:21RJHacker67335 quits [Ping timeout: 255 seconds]
14:16:49Naruyoko5 quits [Ping timeout: 272 seconds]
14:18:11qwertyasdfuiopghjkl quits [Remote host closed the connection]
14:53:46qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
14:58:56BenjaminKrausseDB joins
15:03:10<BenjaminKrausseDB>Hi all, I'm trying to get an old download from the Microsoft Download Center, which no longer seems to be available. I stumbled upon this page (https://wiki.archiveteam.org/index.php/Microsoft_Download_Center) which states that everything was archived. I found the file I'm looking for in the index (msxml6_SDK.msi), and the way I understand it, that
15:03:10<BenjaminKrausseDB>file should be findable in https://archive.org/details/archiveteam_microsoft_download?sort=title . However, I am completely confused as to how to find the file there. It seems to me that a number of files are bunched together into large downloads, but I can't figure out for the life of me in which one of those large downloads the file I'm looking
15:03:11<BenjaminKrausseDB>for is located. Is there any documentation or something that I'm missing?
15:06:26<@Sanqui>BenjaminKrausseDB: You probably want to download the index https://archive.org/details/microsoft_download_center_html_index_2020-08 which will tell you which warc contains which URL (file), and then use something like pywb to replay the warc and extract the file.
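A minimal sketch of the extraction step Sanqui describes, using the warcio library rather than a full pywb replay (an assumption; either works). The WARC filename and target URL are placeholders to be substituted with the values found in the index:

```python
# Pull a single file out of a WARC with warcio (pip install warcio).
# The WARC filename and TARGET URL below are placeholders -- substitute
# the values the HTML index points you to.
from warcio.archiveiterator import ArchiveIterator

TARGET = "https://download.microsoft.com/.../msxml6_SDK.msi"  # placeholder

with open("example.warc.gz", "rb") as stream:  # gzip handled transparently
    for record in ArchiveIterator(stream):
        if (record.rec_type == "response"
                and record.rec_headers.get_header("WARC-Target-URI") == TARGET):
            with open("msxml6_SDK.msi", "wb") as out:
                out.write(record.content_stream().read())
            break
```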
15:11:07<BenjaminKrausseDB>Thanks for the link, I found the file I'm looking for in there, I'm just not sure where to go from there. Or is it the ID I'm looking for?
15:12:12<BenjaminKrausseDB>Essentially this is what I found:
15:12:22<BenjaminKrausseDB>~~~<h3 id="3988"><a href="#3988">•</a>Microsoft Core XML Services (MSXML) 6.0 </h3><p>MSXML 6.0 (MSXML6) has improved reliability, security, conformance with the XML 1.0 and XML Schema 1.0 W3C Recommendations, and compatibility with System.Xml 2.0.</p>
15:12:22<BenjaminKrausseDB>href="https://web.archive.org/web/20200801/https://www.microsoft.com/en-us/download/details.aspx?id=3988">Original page</a>)</p>
15:12:23<BenjaminKrausseDB>href="https://web.archive.org/web/20200801/https://download.microsoft.com/download/2/e/0/2e01308a-e17f-4bf9-bf48-161356cf9c81/msxml6.msi">msxml6.msi</a> (1.5MB)</p>
15:12:23<BenjaminKrausseDB>href="https://web.archive.org/web/20200801/https://download.microsoft.com/download/2/e/0/2e01308a-e17f-4bf9-bf48-161356cf9c81/msxml6_ia64.msi">msxml6_ia64.msi</a> (3.6MB)</p>~~~
15:13:11<@Sanqui>Those archive.org links seem to work for me and start a download
15:13:18<@Sanqui>so I guess that's exactly what you want!
15:13:45BenjaminKrausseDB2 joins
15:14:12<@Sanqui>BenjaminKrausseDB2: <@Sanqui> Those archive.org links seem to work for me and start a download
15:14:12<@Sanqui><@Sanqui> so I guess that's exactly what you want!
15:14:51<BenjaminKrausseDB>OK, weird, they're not working here. I'll try those links on a different device...
15:16:15<@Sanqui>if the download doesn't start, try putting "id_" after the timestamp in the url, as such:
15:16:20<@Sanqui>https://web.archive.org/web/20200803205234id_/https://download.microsoft.com/download/2/e/0/2e01308a-e17f-4bf9-bf48-161356cf9c81/msxml6_ia64.msi
15:16:27<@Sanqui>might have better compatibility
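In script form, the same trick looks like this (the URL is the one from the log; the requests library is assumed available):

```python
# Fetch the raw, unmodified capture from the Wayback Machine: the "id_"
# ("identity") flag after the timestamp disables the replay toolbar and
# URL rewriting, so you get the original bytes back.
import requests

url = ("https://web.archive.org/web/20200803205234id_/"
       "https://download.microsoft.com/download/2/e/0/"
       "2e01308a-e17f-4bf9-bf48-161356cf9c81/msxml6_ia64.msi")

resp = requests.get(url, timeout=120)
resp.raise_for_status()
with open("msxml6_ia64.msi", "wb") as f:
    f.write(resp.content)
```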
15:18:18<nicolas17>yes that goes directly to a 3MB binary file
15:18:51<BenjaminKrausseDB>OK, it worked on my phone. I suspect my work network is blocking something (although usually it says something, not sure what my IT department pulled off this time). Thanks for the help!
15:19:22<@Sanqui>No prob, good luck getting that Itanic working!
15:19:50nic9070 quits [Ping timeout: 240 seconds]
15:20:05nic9070 (nic) joins
15:20:49bocci_ joins
15:23:57bocci quits [Ping timeout: 272 seconds]
15:26:08<BenjaminKrausseDB>Thanks, I think I'll need the luck the way this has been going up until now '=D
15:26:25BenjaminKrausseDB2 quits [Ping timeout: 265 seconds]
15:29:27Naruyoko5 joins
15:31:00treora quits [Remote host closed the connection]
15:31:02treora joins
15:31:09treora quits [Remote host closed the connection]
15:31:11treora joins
15:40:01<BenjaminKrausseDB>Got it working! Thanks for the help and all the work you guys do!
15:40:33BenjaminKrausseDB2 joins
15:40:47BenjaminKrausseDB quits [Remote host closed the connection]
15:45:30BenjaminKrausseDB2 quits [Remote host closed the connection]
15:58:20bocci_ quits [Ping timeout: 240 seconds]
15:59:14bocci_ joins
16:03:23Deewiant quits [Remote host closed the connection]
16:20:19riku quits [Ping timeout: 272 seconds]
16:21:26<fireonlive>^_^
16:51:14Deewiant (Deewiant) joins
17:10:24JayEmbee quits [Quit: WeeChat 2.3]
17:12:53bocci_ quits [Read error: Connection reset by peer]
17:14:32bocci_ joins
17:15:20riku (riku) joins
17:19:24Kitty (Kitty) joins
17:20:29bocci_ quits [Ping timeout: 272 seconds]
17:21:15bocci_ joins
17:27:42treora quits [Remote host closed the connection]
17:27:45treora joins
17:42:17treora quits [Remote host closed the connection]
17:42:21treora joins
17:50:48poetav__ joins
17:53:20bocci_ quits [Ping timeout: 240 seconds]
17:53:42<h2ibot>FireonLive edited Deathwatch (+371, add bear.community): https://wiki.archiveteam.org/?diff=51457&oldid=51455
17:53:49<fireonlive>that was fast
17:54:23<fireonlive>luck of the cron
17:57:19bocci_ joins
17:57:42<h2ibot>FireonLive edited Current Projects (+78, add pastebin): https://wiki.archiveteam.org/?diff=51458&oldid=51407
17:57:43<h2ibot>FireonLive edited Pastebin (+24, DPoS): https://wiki.archiveteam.org/?diff=51459&oldid=47706
17:59:42<h2ibot>FireonLive edited Pastebin (+23, add CTA, make more secure): https://wiki.archiveteam.org/?diff=51460&oldid=51459
18:00:50poetav__ quits [Ping timeout: 240 seconds]
18:03:56Island joins
18:07:28JayEmbee (JayEmbee) joins
18:39:18IRC2DC joins
18:50:17<thuban>speaking of pastebin, i've noticed that the project code makes no attempt to extract outlinks from paste content. is that a deliberate choice?
18:55:47<fireonlive>hmmm. lots of spam there, but i think it's an older project so maybe not?
18:56:33<thuban>yeah, hence my uncertainty
18:58:43<fireonlive>arkiver?
19:00:06<thuban>could be a good source for links to filesharing projects (like mediafire or zippyshare) since it's often used as an agglomerator
19:01:28<thuban>(i know of at least one subreddit that bans download links, to avoid the attention of site admins, but tacitly encourages pastebins of same)
19:12:53<bocci_>speaking of hidden URLs, have projects ever made an effort to catch base64-encoded urls
19:14:07<bocci_>using rot13 or base64, some file sharing communities hide mega, mediafire URLs from bots that issue DMCA takedowns
19:15:33<nicolas17>I question if those particular links are the kind of thing we want to archive >.>
19:16:10Doranwen quits [Quit: bbl]
19:16:20<bocci_>sure
19:18:15<thuban>bocci_: no, afaik no projects have ever implemented that kind of filter-evasion matching
19:18:18<thuban>(there's some attempt to repair broken urls, but mainly for accidental syntax-mangling)
19:19:01<bocci_>thanks, i just wanted to know/make it known
19:19:56<bocci_>an example of a history of these encoded links being used:
19:19:57<bocci_>https://warosu.org/ic/thread/6960541#p6960541
19:20:16<thuban>nicolas17: it can be legit. i remember doing a bunch of those manually during the zippyshare project--they were video game mods from some forum crawl
19:28:09<fireonlive>!tell Doranwen do you have a wiki account?
19:28:09<eggdrop>[tell] ok, I'll tell Doranwen when they join next
19:28:43<fireonlive>ah yeah, base64 has been used a lot in /r/piracy wiki i think?
19:28:47<fireonlive>or some reddit wiki
19:29:22<bocci_>for the record, the strings aren't random or encrypted
19:29:35<bocci_>a base64-encoded https link always starts with aHR0cHM6Ly
19:30:11<bocci_>and mediafire links aren't hard to spot once you memorize the pattern
19:30:25<bocci_>https://www.mediafire.com/file/not-real
19:30:31<bocci_>https://www.mediafire.com/file/some-file
19:30:42<bocci_>aHR0cHM6Ly93d3cubWVkaWFmaXJlLmNvbS9maWxlL25vdC1yZWFsCg==
19:30:47<bocci_>aHR0cHM6Ly93d3cubWVkaWFmaXJlLmNvbS9maWxlL3NvbWUtZmlsZQo=
19:31:03<fireonlive>i guess you'd want to look for aHR0cHM6Ly8 and aHR0cDovLw (https:// and http://)
19:31:10<fireonlive>oh no 8
19:31:25<fireonlive>interesting idea though i like it
19:33:10<thuban>would miss protocol-stripped links, but you'd have to get really aggressively heuristic to catch the general case, soz
19:33:16<thuban>interesting, i concur
19:35:40<bocci_>i think you can find protocol-stripped links automatically without some crazy heuristic
19:35:47<bocci_>if you limit yourself to some hosts
19:36:32<bocci_>d3d3Lm1lZGlhZmlyZS5jb20K = www.mediafire.com
19:37:08<bocci_>it's such a specific string, you wouldn't have any false positives
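A rough sketch of the detection idea bocci_ and fireonlive are discussing (this is not an existing project filter): find runs of base64-alphabet characters, try decoding them, and keep anything that comes out looking like a URL. The example string is the one pasted above:

```python
# Find base64-encoded URLs hidden in paste text: locate runs of base64
# alphabet characters, decode them, and keep results that start with
# http:// or https://. A sketch, not existing project code.
import base64
import re

B64_RUN = re.compile(r"[A-Za-z0-9+/]{12,}={0,2}")

def decode_candidate(token: str) -> str | None:
    # Pad to a multiple of 4 so b64decode accepts truncated runs.
    padded = token + "=" * (-len(token) % 4)
    try:
        text = base64.b64decode(padded).decode("utf-8").strip()
    except Exception:
        return None
    return text if text.lower().startswith(("http://", "https://")) else None

def find_hidden_urls(paste: str) -> list[str]:
    return [url for token in B64_RUN.findall(paste)
            if (url := decode_candidate(token)) is not None]

print(find_hidden_urls(
    "see aHR0cHM6Ly93d3cubWVkaWFmaXJlLmNvbS9maWxlL25vdC1yZWFsCg=="))
# -> ['https://www.mediafire.com/file/not-real']
```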
19:37:21<thuban>correct, but due to the way we backfeed discovered urls between projects, that could get awkward to maintain
19:38:27<bocci_>i have no idea about that
19:41:18<fireonlive>i suppose for pastebin itself someone could make something bespoke to scrape the warcs
19:48:07<thuban>fireonlive: someone has :P
19:51:24<thuban>by which i mean JAA's done a horrible one-liner a couple of times.
19:54:39<thuban>bocci_: basically, if a project discovers outlinks, it sends them to the general urls project (#//), which checks them against the list of site-specific projects and forwards them appropriately if there's a match
19:54:48<thuban>if every project were to discover obfuscated outlinks to a specific list of hosts, then every project would need the list of site-specific projects
19:55:42<thuban>and keeping an n:n system consistent is hell compared to 1:n
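As a toy illustration of the 1:n shape thuban describes (all names here are made up; the real backfeed plumbing is project infrastructure, not this): every project sends discovered URLs to one central router, and only the router needs to know the hostname-to-project mapping.

```python
# Toy 1:n dispatch: one central router holds the hostname -> project
# mapping; individual projects just forward everything to it. The
# mapping and project names are hypothetical.
from urllib.parse import urlparse

SITE_PROJECTS = {
    "www.mediafire.com": "mediafire",
    "mega.nz": "mega",
}
GENERAL_PROJECT = "urls"  # the catch-all (#//)

def route(url: str) -> str:
    host = urlparse(url).hostname or ""
    return SITE_PROJECTS.get(host, GENERAL_PROJECT)

assert route("https://www.mediafire.com/file/some-file") == "mediafire"
assert route("https://example.com/page") == "urls"
```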
19:55:43<fireonlive>ah :D
19:56:51<fireonlive>hmmmm. i guess you could use those 'indicators' for b64 http/https and do further local processing if found?
19:57:00<fireonlive>then ship it to urls as normal?
19:57:14<thuban>right
19:57:25<fireonlive>sounds fun :)
20:14:41<fireonlive>-+rss- Niklaus Wirth Passed Away: https://twitter.com/Bertrand_Meyer/status/1742613897675178347 https://news.ycombinator.com/item?id=38858012
20:14:42<eggdrop>nitter: https://nitter.net/Bertrand_Meyer/status/1742613897675178347
20:16:25<qwertyasdfuiopghjkl>You would also need to account for all the different possible capitalizations of http:// and https:// since that would change the base64
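(The decode-first sketch above sidesteps this, since it lowercases after decoding. If you instead want to match raw prefixes as fireonlive suggests, the fixed prefixes for every capitalization can be enumerated; only the first 4n//3 base64 characters of an n-byte input are independent of the bytes that follow. A sketch, not existing project code:)

```python
# Enumerate the unambiguous base64 prefixes for every capitalization of
# "http://" and "https://". Only the first len(data) * 4 // 3 output
# characters are independent of the bytes that follow, so truncate there.
import base64
from itertools import product

def safe_prefix(s: str) -> str:
    data = s.encode()
    return base64.b64encode(data).decode()[: len(data) * 4 // 3]

def case_variants(s: str):
    pools = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in s]
    return ("".join(p) for p in product(*pools))

prefixes = {safe_prefix(v) for scheme in ("http://", "https://")
            for v in case_variants(scheme)}
# e.g. safe_prefix("https://") == "aHR0cHM6Ly" -- the trailing 8 or 9
# depends on the next byte, hence the "oh no 8" above.
```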
20:38:49BlueMaxima joins
20:45:31c3manu (c3manu) joins
21:17:26<nicolas17>iOS 17.3 beta 2 was released today, and soon it was discovered that it caused iPhones with a certain feature enabled to boot-loop, so 3 hours later it was pulled from the update server
21:18:27<nicolas17>they *might* delete the actual files from the CDN too... sum of all variants is 239GB, is this too much? would it work on AB or urls?
21:18:29<nicolas17>JAA: ^
21:22:36Naruyoko5 quits [Remote host closed the connection]
21:22:57Naruyoko5 joins
21:23:59<bocci_>dumb question: what's wrong with just downloading the files and uploading to an archive.org collection if you wish to archive them
21:24:47<nicolas17>I could, and I have done that for files that were *already* deleted but I recovered from elsewhere
21:25:07<nicolas17>but then it won't work on WBM
21:25:22<bocci_>oh
21:26:03<bocci_>i've felt wrong for using the WBM for large files
21:26:04<nicolas17>and with my Internet it would take 20 hours to upload, but upload speeds *to IA* are usually worse
21:27:35<bocci_>i kinda had the sense that directly hitting images/files on the WBM was an unintended effect of saving web pages
21:28:05<bocci_>wayback machine is for webpages
21:28:12<bocci_>i think im wrong
21:28:21<nicolas17>idk, that's why I'm asking first :P
21:35:25<thuban>bocci_: nothing wrong with having files in the wbm--in fact it's good, because it's more authoritative _and_ more discoverable than just having them somewhere on archive.org
21:35:30<thuban>(if you find a link somewhere and it's dead, it's a lot easier to plug the url into the wbm than to search around and maybe find a relevant item and maybe find the file within the item and hope it's correct)
21:35:34<thuban>buuut there's a lot of duct tape involved, so idk how large is too large either
21:36:37<nicolas17>it's 34 files from 6363 MiB to 7756 MiB
21:37:28<bocci_>in total or each?
21:38:02<nicolas17>as I said total is 239GB x_x
21:39:00<nicolas17>MiB|url: https://paste.debian.net/1302977/
21:43:38<pokechu22>nicolas17: doing it via AB is probably fine
21:44:25<pokechu22>just got to make sure it ends up on firepipe (1.44 TiB free) or addax (524 GiB free) per http://archivebot.com/pipelines
21:45:54<pokechu22>an !ao < list of https://transfer.archivete.am/inline/zkuP2/ios_17.3_beta_2_cdn_urls.txt (which deliberately includes that paste at the top as a small file) should be fine, I'll run it unless you've got a different plan
22:01:49Megame quits [Client Quit]
22:22:02jacksonchen666 (jacksonchen666) joins
22:35:27c3manu quits [Remote host closed the connection]
22:50:41neggles quits [Quit: bye friends - ZNC - https://znc.in]
22:52:55neggles (neggles) joins
22:56:59simon816 quits [Remote host closed the connection]
22:59:36<@JAA>arkiver: Re archive.mozilla.org, I don't remember, but I believe I posted the link to the full JSONL scan output here some weeks ago.
23:00:42<audrooku|m>Is jsonl the same as ndjson?
23:02:26<@JAA>thuban: Can confirm, have written such horrible one-liners. 60% of the time, they work every time!
23:03:05<@JAA>audrooku|m: Yes
23:04:33<@JAA>Also referred to as 'JSON Lines' and some other variations. But .jsonl is the common file extension, and application/jsonl is the proposed media type.
23:05:13<@JAA>Also 'Line-Delimited JSON', which has absolutely no potential of confusion with the entirely unrelated JSON-LD.
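Concretely, JSON Lines is just one complete JSON value per line, so a huge file can be processed line by line instead of parsed whole. The fields below are illustrative, not the actual schema of the Mozilla scan output:

```python
# JSON Lines / ndjson: one JSON value per line. Read and parse lazily,
# one record at a time. Field names here are made up for illustration.
import json

sample = (
    '{"url": "https://archive.mozilla.org/pub/a", "size": 123}\n'
    '{"url": "https://archive.mozilla.org/pub/b", "size": 456}\n'
)

for line in sample.splitlines():
    record = json.loads(line)
    print(record["url"], record["size"])
```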
23:05:50Hackerpcs quits [Ping timeout: 240 seconds]
23:06:14<@JAA>nicolas17, pokechu22: Yes, fine with AB. Large pipeline's a good idea, but if all pipelines are full, !ao < should end up on firepipe-ao anyway (unless that's full as well, didn't check).
23:06:41<@JAA>(Of course, firepipe-ao won't run jobs queued with --pipeline.)
23:07:20<thuban><@JAA> arkiver: [...] I believe I posted the link to the full JSONL scan output here some weeks ago.
23:07:21<pokechu22>It looked good as of an hour ago (I also see you got rid of addax-ao, which I guess makes sense because firepipe-ao receives jobs much faster)
23:07:23<thuban>https://transfer.archivete.am/a0mjU/archive.mozilla.org-files.jsonl.zst
23:07:29<thuban>(https://hackint.logs.kiska.pw/archiveteam-bs/20231118#c390573)
23:08:56<@JAA>pokechu22: Yeah, that's why. jap-addax-ao was taking a minute or more to dequeue a job, just horrendous.
23:09:34Hackerpcs (Hackerpcs) joins
23:10:08<pokechu22>It's running (ab job ew2dbtuft08uz2xe0tf4lhlcv)
23:12:54<@JAA>:-)
23:13:26<fireonlive>^_^
23:13:55<thuban>JAA, any thoughts on the wiki changes suggested in #//?
23:27:34<nulldata>https://www.polygon.com/24024266/kim-kardashian-mobile-game-shutting-down-glu-mobile
23:30:50simon816 (simon816) joins
23:30:51<nulldata>https://www.eurogamer.net/stray-souls-developer-shuts-down-following-publishers-closure-cyberbullying-and-poor-sales
23:31:40<nulldata>Doesn't look like Stray Souls has a website anymore, but they do have a Twitter if someone could throw it in AB. https://twitter.com/jukaistudio
23:31:41<eggdrop>nitter: https://nitter.net/jukaistudio
23:34:28simon816 quits [Client Quit]
23:35:42<fireonlive>added it to next on the pad for when one of the two active finish
23:44:39<nicolas17>https://developer.apple.com/documentation/ios-ipados-release-notes/ios-ipados-17_3-release-notes now finally acknowledging the issue
23:47:24<fireonlive>archivebotted
23:50:50simon816 (simon816) joins