00:01:19azalea_sh__ (azalea_sh_) joins
00:02:33azalea_sh_ quits [Ping timeout: 272 seconds]
00:02:59azalea_sh__ quits [Remote host closed the connection]
00:03:12azalea_sh_ (azalea_sh_) joins
00:03:51<azalea_sh_>My client seems to love throwing me out, if I miss something or go offline later, please reach out to aza@tabby.ing
00:08:43azalea_sh_ quits [Read error: Connection reset by peer]
00:08:52azalea_sh_ (azalea_sh_) joins
00:08:57azalea_sh_ quits [Read error: Connection reset by peer]
00:09:22azalea_sh_ (azalea_sh_) joins
00:09:38azalea_sh_ quits [Read error: Connection reset by peer]
00:10:21azalea_sh_ (azalea_sh_) joins
00:11:15<azalea_sh_>I give up on this client, I'll reconnect to IRC tomorrow but will be reachable by email until then and afterwards too, I forgot what a pain IRC can be
00:11:17azalea_sh_ quits [Remote host closed the connection]
00:14:58etnguyen03 quits [Client Quit]
00:29:01<h2ibot>Nintendofan885 edited Talk:Deathwatch (+16, should link to the AT IRC page): https://wiki.archiveteam.org/?diff=58604&oldid=58603
00:30:54etnguyen03 (etnguyen03) joins
00:31:48<dendory>Seems like the apps were just deleted from the app stores and content unavailable, with no prior warning, according to user reports: https://www.reddit.com/r/amino/
00:46:54etnguyen03 quits [Client Quit]
00:50:05<h2ibot>PaulWise edited ArchiveBot/Ignore (+154, drupal: use ident; url from ws:// - /cc klea): https://wiki.archiveteam.org/?diff=58605&oldid=58573
01:06:46notarobot17 quits [Ping timeout: 256 seconds]
01:10:28JayEmbee (JayEmbee) joins
01:15:53<@arkiver>i will send azalea_sh_ an email now
01:17:05<pabs>klea: personally I don't much like {{URL}} (I prefer the browser do it) but if it is going to exist then having an equivalent for SWH/Codearchiver links sounds good
01:18:58<pabs>JAA: re archive.today and {{URL}}, I already modified the Template:URL page, so you can see the effect on any page. before the link was to the list of all archives of a page, now it is to the latest archive of a page, with a date and the original URL in the URL
01:20:45<pabs>(the list of all archives only links to those short URLs, but /timegate/ redirects to the date-based URLs)
01:22:12SootBector quits [Remote host closed the connection]
01:23:18SootBector (SootBector) joins
01:24:56AlsoHP_Archivist quits [Quit: Leaving]
01:25:12HP_Archivist (HP_Archivist) joins
01:25:37<@JAA>pabs: Hmm. List of all archives is what we do for the WBM, isn't it?
01:25:51<@JAA>Yeah
01:26:02<pabs>ah, so it is
01:26:10<pabs>reverting then
01:27:20<h2ibot>PaulWise edited Template:Url (-9, Undo revision 58569 by…): https://wiki.archiveteam.org/?diff=58606&oldid=58569
01:37:44v01d joins
01:41:15etnguyen03 (etnguyen03) joins
01:43:23<h2ibot>PaulWise edited Facebook (+80, ArchiveBot, shorten Mnbot): https://wiki.archiveteam.org/?diff=58607&oldid=58579
01:44:48<pabs>klea: re the repo URL thing, maybe just an optional parameter code=1 to enable those links. since for most links you probably also want the IA save too?
01:45:17<pabs>(and auto-enable the code links for major code hosting sites)
02:10:27<h2ibot>PaulWise edited LinkedIn (+261, add AB/SPN/archive.is/Mnbot status): https://wiki.archiveteam.org/?diff=58608&oldid=58581
02:11:27<h2ibot>PaulWise edited LinkedIn (+17, -u stealth): https://wiki.archiveteam.org/?diff=58609&oldid=58608
02:38:42<pabs>https://news.bloomberglaw.com/bankruptcy-law/robot-vacuum-roomba-maker-files-for-bankruptcy-after-35-years
02:40:40<nicolas17>...35?
02:44:50<nicolas17>"iRobot no longer design or manufacture its own robots and buy in outsourced products from another company and have their own branding applied"
02:45:09<nicolas17>ok so there was nothing but the brand left anyway
02:45:11jason joins
02:50:33<h2ibot>Nyakase edited Alive... OR ARE THEY (-210, Remove Amino, mobile app delisted from stores…): https://wiki.archiveteam.org/?diff=58610&oldid=58555
02:52:33<h2ibot>PaulWise edited Forum list (+8312, more forums): https://wiki.archiveteam.org/?diff=58611&oldid=56227
03:04:11<egallager>pabs: I approve of the idea of adding a `code=1` parameter to the URL template
03:06:42<cruller>Maybe I should have created [[SPN]] and redirected it to [[Internet_Archive/Save_Page_Now]]?
03:10:36<h2ibot>Cruller created SPN (+44, Redirected page to…): https://wiki.archiveteam.org/?title=SPN
03:10:51<cruller>It's not too late
03:15:47<dendory>From Reddit: Chinese Catholic Archive facing imminent shutdown due to government defunding during Christmas. https://old.reddit.com/r/Archiveteam/comments/1pmw0fv/urgent_chinese_catholic_archive_facing_imminent/
03:23:58cyanbox_ joins
03:27:18cyanbox quits [Ping timeout: 256 seconds]
03:33:09cyan_box joins
03:37:15cyanbox_ quits [Ping timeout: 272 seconds]
03:38:40<h2ibot>Nicolas17v2 edited 짱공유닷컴 (-38, update status): https://wiki.archiveteam.org/?diff=58613&oldid=58211
03:41:13<cruller>dendory: Is 万有真原 itself sending an SOS? I'm not doubting evrosil, though.
03:45:42<h2ibot>Nicolas17v2 edited 짱공유닷컴 (+54, they did shut down at the announced time): https://wiki.archiveteam.org/?diff=58614&oldid=58613
03:57:56hackbug quits [Remote host closed the connection]
03:58:02etnguyen03 quits [Remote host closed the connection]
04:00:20hackbug (hackbug) joins
04:11:47Island quits [Read error: Connection reset by peer]
05:03:53<h2ibot>Cruller edited Deathwatch (+410, /* 2025 */ Add Amino): https://wiki.archiveteam.org/?diff=58616&oldid=58554
05:18:04DogsRNice quits [Read error: Connection reset by peer]
05:40:59<h2ibot>Cooljeanius edited Library of Alexandria (+26, add language): https://wiki.archiveteam.org/?diff=58617&oldid=58500
05:44:37<klea>pabs: yeah true
06:00:59nexussfan quits [Read error: Connection reset by peer]
06:01:05nexussfan (nexussfan) joins
06:20:06jason quits [Remote host closed the connection]
06:26:03h2ibot quits [Remote host closed the connection]
06:26:41h2ibot (h2ibot) joins
06:31:42nexussfan quits [Client Quit]
06:33:00cyan_box quits [Read error: Connection reset by peer]
06:35:40cyanbox joins
07:28:24<h2ibot>A053096 edited Alive... OR ARE THEY (+112, /* Watchlist */): https://wiki.archiveteam.org/?diff=58618&oldid=58610
07:28:25<h2ibot>Fmprod edited Deathwatch (+186, Added icebergcharts.com): https://wiki.archiveteam.org/?diff=58619&oldid=58616
07:28:26<h2ibot>NichardRixon edited List of websites excluded from the Wayback Machine (+26, Added one site): https://wiki.archiveteam.org/?diff=58620&oldid=58308
07:28:36<@JAA>There's a pending edit by 'Brad' that adds a bunch of things to Deathwatch and removes a bunch of other entries. Does someone want to deal with partially reverting that?
08:53:00h|nest (h) joins
08:57:05rohvani quits [Ping timeout: 272 seconds]
08:57:18<h|nest>There doesn't appear to be a channel for https://gitea.arpa.li/ArchiveTeam/http2irc, so I'll ask here: how would I go about setting up an instance of it?
08:58:18h|nest is now known as h|ca2
09:03:43tertu2 (tertu) joins
09:04:41tertu quits [Ping timeout: 272 seconds]
09:07:32rohvani joins
09:08:14evergreen58 joins
09:10:24<@JAA>h|ca2: Probably trial and error and reading code. Packaging and documenting it is somewhere on my todo list. From when I wrote it over six years ago until like a week ago, I believe only a single person wanted to use it, so it hasn't been a priority at all.
09:11:39evergreen5 quits [Ping timeout: 272 seconds]
09:11:39evergreen58 is now known as evergreen5
09:14:16<@JAA>s/use it/run their own instance/
09:19:13<h|ca2>Is that person who I'd think of? If so, I'll probably ask them elsewhere
09:19:50<h|ca2>I just realized I probably misunderstood, sorry
09:20:17<@JAA>The person you're probably thinking of is the one from a week ago.
09:21:09<h|ca2>Okay, just asked them, you should be able to see the message where I did
09:21:16<@JAA>:-)
09:38:04Wohlstand (Wohlstand) joins
09:44:45NF885 (NF885) joins
09:46:14<NF885>looks quite weird with https://archive.org/details/archiveteam?sort=-addeddate being only AB and daily mail atm due to the AT account not uploading since yesterday
09:47:02Dada joins
09:47:06<masterx244|m>maybe a random fail somewhere gumming up IA uploads and requiring a manual whack with a wrench
09:47:28<masterx244|m>(had that a few days ago where a random error interrupted a few uploads, had to remove the done parts from CSV and then rerun)
09:49:46Webuser969416 joins
09:50:09<NF885>you also had the collection thing fail a couple of days which is why the inbox currently has a lot of items
09:58:02Webuser969416 quits [Client Quit]
10:02:11NF885 quits [Client Quit]
10:16:07nepeat quits [Quit: ZNC - https://znc.in]
10:17:26nepeat (nepeat) joins
10:19:46<hexagonwin>is brozzler good for large scale (~200GB) crawling? i keep having various issues with browsertrix
10:19:48<h2ibot>Usernam edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=58621&oldid=58620
10:20:49<hexagonwin>(browsertrix keeps showing this "Link extraction timed out" and "Error creating WACZ", i can easily find missing pages)
10:54:27v01d quits [Remote host closed the connection]
12:16:50simon816 quits [Remote host closed the connection]
12:20:51simon816 (simon816) joins
12:59:36Wohlstand quits [Quit: Wohlstand]
13:03:07<TheTechRobo>FWIW, I am running an instance of http2irc myself in Docker
13:05:07Webuser459143 joins
13:05:36Webuser459143 quits [Client Quit]
13:05:47<TheTechRobo>I use python:3.9-slim and install aiohttp ircstates toml
13:06:36<TheTechRobo>The configuration is pretty straightforward. I did struggle a bit with setting up a SASL certificate but that was probably a skill issue on my behalf :P
13:08:25<h|ca2>TheTechRobo: am I likely to have success with a venv and those three packages?
13:10:27<TheTechRobo>Probably, assuming it's compatible with whatever python version you're using. I don't know if there's any particulars about that
13:11:25Dada quits [Remote host closed the connection]
13:11:37Dada joins
13:16:21<h|ca2>Going to try on...
13:16:58<h|ca2>whatever debian 13 has (tried to run python3 --version in my http2irc container but no python installed)
13:20:08<h|ca2>that is 3.13
13:28:19<h|ca2>For future people: apparently cargo is needed for a dependency when py 3.13 is used
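A minimal sketch for checking a venv before starting http2irc, assuming the only third-party dependencies are the three packages TheTechRobo lists (aiohttp, ircstates, toml). The helper name `missing_modules` is made up for illustration and is not part of http2irc.

```python
import importlib.util

# Packages named in the discussion above; the list is taken from chat,
# not from any official http2irc requirements file.
REQUIRED = ("aiohttp", "ircstates", "toml")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all listed dependencies are importable")
```

Run it with the venv's interpreter. An empty result only says the modules can be found, not that the installed versions work with http2irc.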
13:29:02<@arkiver>hexagonwin: i'm not sure, but when it comes to WARC integrity, there are serious problems with webrecorder tools
13:29:11<@arkiver>so i'd definitely recommend switching to brozzler if possible
13:29:41<hexagonwin>thanks. since some backup is still better than nothing i was trying whatever tool i can lol
13:59:59sec^nd quits [Remote host closed the connection]
14:00:36sec^nd (second) joins
14:05:15<@arkiver>hexagonwin: yep that is correct
14:07:41Webuser935491 joins
14:08:13<hexagonwin>arkiver: regarding bufftoon (which is shutting down about 40hrs from now) i got a list of urls that should be downloaded https://p.z80.kr/bufftoon_urls_251215
14:08:18<hexagonwin>this can be downloaded with a dumb crawler (no custom logic) since most useful pages (the comics itself) can be discovered by <a> tags
14:08:21<@arkiver>i was about to ask :)
14:08:26<@arkiver>you were able to get a list of IDs?
14:08:37<@arkiver>they seem somewhat sequential, but also not fully
14:08:58<Webuser935491>Hi, may I ask for this to be backed up? https://dphk.org/ - this is website of Democratic Party in HK which said yesterday they will shut down bc of pressure from the authorities. [1]
14:08:58<Webuser935491>[1] https://www.thetimes.com/world/asia/article/hong-kongs-democratic-party-disbands-lm2psllc7
14:09:06<hexagonwin>so the only way to discover new comics seems to be via the 'genre' page like https://bufftoon.plaync.com/tag/1?currentType=webtoon for fantasy
14:09:10<@arkiver>hexagonwin: actually, is this all? seems very little
14:09:36<hexagonwin>i simply iterated $i from 1 to 100000 on this url https://api-bufftoon.plaync.com/v1/tag/$i/series?contentsType=webtoon&offset=0&limit=1000
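The enumeration hexagonwin describes can be sketched roughly as below. Only the URL template and the 1 to 100000 range come from the message above. The function name is hypothetical, and nothing is assumed about the API's response format, so the sketch only builds the request URLs.

```python
from typing import Iterator

# URL template quoted in the chat, with the tag ID as the only variable part.
TAG_SERIES_URL = (
    "https://api-bufftoon.plaync.com/v1/tag/{tag_id}/series"
    "?contentsType=webtoon&offset=0&limit=1000"
)


def tag_series_urls(first: int = 1, last: int = 100000) -> Iterator[str]:
    """Yield one API URL per tag ID in the inclusive range first..last."""
    for tag_id in range(first, last + 1):
        yield TAG_SERIES_URL.format(tag_id=tag_id)


if __name__ == "__main__":
    # Print a short sample instead of the full 100000 URLs.
    for url in tag_series_urls(1, 3):
        print(url)
```

Since the site appears to rate limit (see later in the log), anything that fetches these URLs should probably do so slowly.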
14:09:44<@arkiver>that works i guess
14:09:54<@arkiver>if it's this few, we'll not have a warrior project
14:10:07<@arkiver>Webuser935491: yes, archiving it
14:10:15<hexagonwin>maybe there's some missing, idk. ideally if you can maybe grab all pages like https://bufftoon.plaync.com/series/697057 then get the genre pages from all of them recursively
14:10:23<hexagonwin>i don't have the knowledge to do so,,
14:10:30<Webuser935491>arkiver thanks
14:10:44<@arkiver>Webuser935491: do they have a youtube channel?
14:11:05<hexagonwin>(the ID for each genre page can be seen from <script> inside that /series/ page's html)
14:11:06<@arkiver>hexagonwin: i'll try to do a quick check over all IDs
14:11:52<hexagonwin>i think the site is pretty simple structurally, so you'll be easily able to figure out stuff like the comments section
14:12:16<Webuser935491>arkiver Youtube: https://www.youtube.com/user/dphkdphk
14:12:19<hexagonwin>if there's anything ambiguous etc please ping me
14:12:59<@arkiver>hexagonwin: if it's this few, i'm not sure about a warrior project
14:13:05<hexagonwin>it seems like they do issue an IP ban though
14:13:05<Webuser935491>arkiver
14:13:05<Webuser935491>Facebook: https://www.facebook.com/thedphk/
14:13:05<Webuser935491>Twitter: https://www.facebook.com/thedphk/
14:13:14<@arkiver>hmm
14:13:53<hexagonwin>for only the comics i also don't think warrior is needed, but to grab non static stuff like comments i do think we'll need a custom crawler
14:20:16<hexagonwin>oh wait, maybe it wasn't an ip ban? some urls are always giving 400 while some still work
14:23:30<hexagonwin>arkiver: ah, i figured it out. not an IP ban, but they do seem to have rate limiting. the comic image link changes every time lmao
14:23:52<hexagonwin>i mean urls like https://secure-bufftoon.gscdn.plaync.com/v?_lsu_sa_=3d5546(long ID)
14:29:19<klea>JAA: could you send me the edit by 'Brad'?, i'm unsure if i'll get to it soon, so i'd rather it not get approved
14:30:13<@arkiver>klea: why do you want the edit to be not approved?
14:30:16<@arkiver>looking at it too
14:30:59<klea>i mean, it could be approved
14:31:13<klea>it'd just take time to do the partial reversion potentially, so i'd like it not to be live for too long
14:31:25<@arkiver>are you Brad?
14:31:33<h2ibot>Wyatt1267 uploaded File:DeviantArtLogo.png: https://wiki.archiveteam.org/?title=File%3ADeviantArtLogo.png
14:31:41<klea>arkiver: no
14:31:59<klea>i guess if you want, approve it, and i'll try to revert it partially now
14:33:11<@arkiver>it's a significant change, also deletes several entries. some of which indeed do not have a deadline anymore, others i don't remember
14:33:27<klea>oh
14:33:36<klea>i thought it'd be reverting the removals only
14:37:34<h2ibot>Cooljeanius edited Alive... OR ARE THEY (+11, /* Watchlist */ copyedit "Kemono" entry a bit): https://wiki.archiveteam.org/?diff=58623&oldid=58618
14:59:32Wohlstand (Wohlstand) joins
15:13:39<h2ibot>Zen edited List of websites excluded from the Wayback Machine (+24, https://purotora.com/): https://wiki.archiveteam.org/?diff=58624&oldid=58621
15:51:27azalea_sh_ (azalea_sh_) joins
15:51:58<azalea_sh_>am back from yesterday!
15:52:23<azalea_sh_>with a better connection that is
15:55:41<@arkiver>hi
15:55:48<azalea_sh_>so yeah regarding amino: the main reason for thinking that the platform is down for good is that they've removed their A record for the main domain and stripped their fastly configs for their API and additionally got rid of basically the entire team
15:56:03<@arkiver>azalea_sh_: i was going to send you an email, but we can do this here too
15:56:12<@arkiver>note that this channel is publicly logged
15:56:18<azalea_sh_>Sure! Thats fine w/ me
15:56:23<@arkiver>you mentioned there's fully public data in your list and "not so public" data
15:56:33<@arkiver>what do you need to separate them?
15:56:39<@arkiver>we can certainly archive the few TBs of public data
15:56:44<@arkiver>if you have a list of URLs for us
15:57:01<@arkiver>the data would become available in the Wayback Machine (web.archive.org)
15:57:27Dada quits [Remote host closed the connection]
15:57:39Dada joins
15:57:46<azalea_sh_>I probably would be able to filter by checking against the community ID which has a field for specifying whether it's public or private
15:58:00<azalea_sh_>it's a fair bit of data so aggregations always take hours lol
15:58:28<azalea_sh_>I can put together a list of all of the public media files though, that shouldn't be a big problem, I *think*
16:01:12<azalea_sh_>Mm there's a slight problem, a lot of communities put themselves to private to protect themselves from the massive bot problem
16:02:06<azalea_sh_>I can probably get by that with some manual review
16:02:41<azalea_sh_>Is there a preferred upload forum for the URL list i can note for later? or just text.archivete.am?
16:04:08<justauser>transfer.archivete.am
16:04:18<azalea_sh_>:+1:
16:10:56<justauser>Rob Reiner died. No website, but some social media presence.
16:27:05ichdasich quits [Quit: trixie]
16:29:16ichdasich joins
16:30:05cyanbox quits [Read error: Connection reset by peer]
16:32:36<azalea_sh_>Running first rudimentary link extraction for the completely public subset, I'll also be making a urls file which includes semi public communities (set to private but were publicly accessible, a lot of the platform's largest communities fall into that category)
18:02:50<@arkiver>thanks azalea_sh_ :)
18:06:25FiTheArchiver joins
19:10:25Webuser935491 quits [Quit: Ooops, wrong browser tab.]
19:20:36<azalea_sh_>okay okay, update: I've extracted the fully public links, cleaned them, will do some dedup after transferring them to my server then I can upload
19:20:56<azalea_sh_>its >4.7G so I'll compress and then upload in parts
19:22:08<azalea_sh_>luckily my server has good bandwidth so at least that part should be fast lol
19:25:21Juest quits [Ping timeout: 272 seconds]
19:26:40Juest (Juest) joins
19:27:29FiTheArchiver quits [Client Quit]
19:49:01<azalea_sh_>Everything uploaded!
19:51:02flotwig quits [Read error: Connection reset by peer]
19:51:15<azalea_sh_>https://transfer.archivete.am/PUqRe/amino.md has the links & reconstruction commands (Guessing that wasn't necessary but yeah)
19:51:15<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/PUqRe/amino.md
19:51:16flotwig joins
19:51:43<azalea_sh_>Format is single TXT 1 URL/line
19:53:26<azalea_sh_>1.5G gz file split to 4 gzpart, decompresses to 4.7G txt w/ 54,915,836 lines
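A rough way to sanity-check the line count without writing the 4.7G text file to disk, assuming the four parts are plain byte-range splits of one single-member gzip file. The actual part names and reconstruction commands are in amino.md and are not reproduced here, so the file names in the usage note are placeholders.

```python
import zlib
from pathlib import Path
from typing import Iterable


def count_lines_in_split_gzip(parts: Iterable[Path], chunk_size: int = 1 << 20) -> int:
    """Stream-decompress a gzip file split into ordered parts and count newlines."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    lines = 0
    for part in parts:
        with open(part, "rb") as fh:
            while chunk := fh.read(chunk_size):
                lines += decomp.decompress(chunk).count(b"\n")
    # Flush whatever the decompressor still buffers.
    lines += decomp.flush().count(b"\n")
    return lines
```

Hypothetical usage: `count_lines_in_split_gzip(sorted(Path(".").glob("*.gzpart")))`. The parts must be passed in their original order. The result counts newline characters, so it is one short of the URL count if the last line has no trailing newline.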
19:54:24<azalea_sh_>I'll be gathering the semi public data overnight, i forgot how long the aggregations take lol
19:58:41<@arkiver>perfect, thank you
19:58:47<@arkiver>this should be a few TB only right?
19:58:50<@arkiver>azalea_sh_: ^
20:01:06<azalea_sh_>Right
20:01:45<azalea_sh_>My estimate is 6-7TB: after scanning >1.5M of those links, the file sizes average out to that (with respect to mediatype and subdomain)
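One way such an estimate could be computed is sketched below, assuming the sample is grouped by (mediatype, subdomain) and each group's mean sampled size is scaled by that group's link count. This is a guess at the method, not azalea_sh_'s actual script, and all names and numbers are illustrative. For scale, 6-7TB over 54,915,836 links works out to roughly 110 to 130 kB per file.

```python
from statistics import mean
from typing import Dict, Hashable, Sequence


def estimate_total_bytes(
    link_counts: Dict[Hashable, int],
    sampled_sizes: Dict[Hashable, Sequence[int]],
) -> float:
    """Extrapolate total size: per-group link count times mean sampled file size."""
    total = 0.0
    for group, count in link_counts.items():
        total += count * mean(sampled_sizes[group])
    return total


if __name__ == "__main__":
    # Made-up groups and sizes, purely to show the calculation.
    counts = {("image", "cdn-a"): 10, ("video", "cdn-b"): 2}
    samples = {("image", "cdn-a"): [100, 300], ("video", "cdn-b"): [1000]}
    print(estimate_total_bytes(counts, samples))
```

Grouping before averaging matters when a few large media types, such as video, would otherwise skew a single overall mean.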
20:03:34<@arkiver>JAA: do you think this may be something we can simply put in AB?
20:06:11<@arkiver>azalea_sh_: did you come across any limits? like rate limiting, or IP banning?
20:07:26<azalea_sh_>nope, worked both from residential IP as well as commercial IP from a VPS, it didn't ratelimit even at >80 parallel connections
20:07:58<azalea_sh_>even after >24h of a trial run I never ran into any limits
20:11:51DogsRNice joins
20:29:43khaoohs quits [Read error: Connection reset by peer]
20:30:26<@arkiver>sounds good
20:30:38khaoohs joins
20:33:19khaoohs_ joins
20:33:37lennier2_ joins
20:36:22lennier2 quits [Ping timeout: 256 seconds]
20:36:55khaoohs quits [Ping timeout: 272 seconds]
20:38:38Dango360 quits [Ping timeout: 256 seconds]
20:51:25<Yakov>Chrome/Firefox still falsely flag transfer.archivete.am as malicious: https://transparencyreport.google.com/safe-browsing/search?url=http:%2F%2Ftransfer.archivete.am%2F&hl=en
20:52:14<Yakov>For reason: "Send visitors to harmful websites" https://img.yakov.cloud/LyZYM.png
21:01:52flotwig quits [Ping timeout: 256 seconds]
21:03:05flotwig joins
21:19:09<that_lurker>report it as safe
21:20:00<that_lurker>https://www.virustotal.com/gui/url-analysis/u-bbdbb8e3a8c7a99126a15a893ef54d9e9237f63cdfee348b00491295545eb8b9-d9be2e1f
21:29:39<Yakov>Best would be if someone at AT could check search console to see what URL specifically is being flagged
21:30:02<Yakov>Anyways, I reported it as safe here: https://safebrowsing.google.com/safebrowsing/report_phish/
21:37:05Shard7959 quits [Ping timeout: 272 seconds]
21:44:07Shard7959 (Shard) joins
21:48:55NeonGlitch quits [Quit: Textual IRC Client: www.textualapp.com]
21:57:22NeonGlitch (NeonGlitch) joins
22:59:21cyanbox joins
23:06:39monoxane quits [Quit: estoy fuera]
23:14:56nexussfan (nexussfan) joins
23:15:25Wohlstand quits [Quit: Wohlstand]
23:17:47abirkill quits [Ping timeout: 272 seconds]
23:19:44v01d joins
23:21:51etnguyen03 (etnguyen03) joins
23:24:40Dada quits [Remote host closed the connection]
23:31:23monoxane (monoxane) joins
23:34:22abirkill (abirkill) joins
23:46:55abirkill quits [Ping timeout: 272 seconds]
23:50:35HackMii quits [Remote host closed the connection]
23:50:57HackMii (hacktheplanet) joins
23:52:07<pabs>bugzilla.kernel.org is to be shut down at some point, I'm sending them an email now. https://lwn.net/SubscriberLink/1050177/e2d325a7a3262cbf/ /cc c3manu JAA
23:56:39<that_lurker>are they moving to another system or just email?
23:59:25etnguyen03 quits [Client Quit]