| 00:02:28 | | killsushi joins |
| 00:05:53 | <andrew> | wow, apparently opening wpull.db (even read only) over Samba from another computer causes all sorts of bad things to happen |
| 00:06:33 | <andrew> | wtf is "Fatal Python error: Bus error" supposed to mean |
| 00:06:44 | <nicolas17> | is there a wpull.db-shm and wpull.db-wal too? |
| 00:06:48 | <andrew> | yes |
| 00:07:17 | <pokechu22> | Yeah, using a network filesystem is one of the strategies listed at https://sqlite.org/howtocorrupt.html |
| 00:07:18 | <nicolas17> | okay so the database is in "WAL mode", and will totally fuck up if you access it over a network, because the data in the -shm file is not actually accessible via shared memory |
| 00:08:16 | <andrew> | anyways, I just did a horrible thing where I ran grab-site, paused it, then ran some shell spaghetti to generate a bunch of insert statements to massage the database, unpaused and quickly re-paused grab-site to release the lock, then waited for the insertion to complete |
| 00:08:24 | <nicolas17> | "All processes using a database must be on the same host computer; WAL does not work over a network filesystem." |
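
A minimal sketch of how one might check whether a SQLite file is in WAL mode before deciding how to access it, run on the machine that holds the file. The helper name `journal_mode` and the throwaway demo database are assumptions for illustration; nothing here touches a real wpull.db.

```python
import os
import sqlite3
import tempfile


def journal_mode(db_path):
    """Return the journal mode ('wal', 'delete', ...) of a local SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA journal_mode").fetchone()[0]
    finally:
        conn.close()


# Demo on a throwaway database (a stand-in, not the real wpull.db).
demo_dir = tempfile.mkdtemp()
demo_db = os.path.join(demo_dir, "demo.db")

conn = sqlite3.connect(demo_db)
conn.execute("PRAGMA journal_mode=WAL")  # WAL mode is persistent once set
conn.execute("CREATE TABLE t (x)")
conn.commit()
conn.close()

mode = journal_mode(demo_db)
print(mode)
```

If the mode is `wal`, the safer route suggested by the quote above is to work on the host that owns the file, or to copy the database together with its `-wal` and `-shm` companions while nothing is writing to it, rather than opening it over a network share.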
| 00:09:14 | <andrew> | so theoretically now with a bunch of URLs in the tables marked "done", wpull *should* ignore them when continuing this crawl, right? |
| 00:09:49 | | tbc1887 (tbc1887) joins |
| 00:10:25 | <andrew> | (echo 'begin transaction;'; pv ../already-done.txt | while read line; do echo "insert into url_strings(url) values ('${line}'); insert into queued_urls(url_string_id, parent_url_string_id, root_url_string_id, status, try_count, level, inline_level, link_type, priority, status_code) values (LAST_INSERT_ROWID(), 0, 0, 'done', 1, 0, null, null, 0, |
| 00:10:25 | <andrew> | null);"; done; echo 'commit;') | sqlite3 -cmd '.timeout 10000' wpull.db |
| 00:10:28 | <andrew> | totally cool and normal! |
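
The shell pipeline above builds SQL by pasting each URL between single quotes, which would break on any URL that itself contains a quote. A hedged alternative sketch using Python's `sqlite3` module with bound parameters follows. The table and column names are taken from the pasted command; the `CREATE TABLE` statements and the `mark_urls_done` helper are simplified assumptions for the demo, not wpull's actual schema or API.

```python
import sqlite3


def mark_urls_done(conn, urls):
    """Insert each URL as an already-completed queue entry, in one transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        for url in urls:
            cur = conn.execute(
                "INSERT INTO url_strings (url) VALUES (?)", (url,)
            )
            conn.execute(
                "INSERT INTO queued_urls (url_string_id, parent_url_string_id, "
                "root_url_string_id, status, try_count, level, inline_level, "
                "link_type, priority, status_code) "
                "VALUES (?, 0, 0, 'done', 1, 0, NULL, NULL, 0, NULL)",
                (cur.lastrowid,),
            )


# Demo against an in-memory database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_strings (id INTEGER PRIMARY KEY, url TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE queued_urls (id INTEGER PRIMARY KEY, url_string_id INTEGER, "
    "parent_url_string_id INTEGER, root_url_string_id INTEGER, status TEXT, "
    "try_count INTEGER, level INTEGER, inline_level INTEGER, link_type TEXT, "
    "priority INTEGER, status_code INTEGER)"
)

# The first URL contains a single quote, which the quoted-string approach
# would turn into invalid SQL.
mark_urls_done(conn, ["https://example.com/it's-a-page", "https://example.com/b"])

rows = conn.execute(
    "SELECT u.url, q.status FROM queued_urls q "
    "JOIN url_strings u ON u.id = q.url_string_id ORDER BY u.id"
).fetchall()
print(rows)
```

As in the original pipeline, this would still need the crawler paused so the database lock is free while the transaction runs.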
| 00:10:44 | <pokechu22> | JAA has a wpull2-requeue script (and a few other related things) somewhere -- I thought in https://github.com/JustAnotherArchivist/little-things but I can't find them |
| 00:10:59 | <pokechu22> | oh, they're in https://gitea.arpa.li/JustAnotherArchivist/little-things, the one on github is outdated |
| 00:12:43 | <fullpwnmedia> | also, my friend wants to know how he can submit his warcs to the wbm |
| 00:12:45 | <andrew> | anyways, now we wait for the insertion of the 5.5 million input URLs |
| 00:14:11 | <fullpwnmedia> | actually, ive got like 1.2 tb worth of windows updates, can you guys help me get this on the wbm? |
| 00:14:18 | <fullpwnmedia> | its like 80,000 |
| 00:16:05 | <andrew> | fullpwnmedia: aren't Windows updates not available on the web in a way that's supported by the WBM? |
| 00:16:57 | <fullpwnmedia> | what do you mean? theyre just a bunch of urls |
| 00:17:16 | <fullpwnmedia> | you can download them from the web |
| 00:17:30 | <fullpwnmedia> | either through windows catalog or directly |
| 00:17:43 | <andrew> | do you have WARCs? |
| 00:17:52 | <@JAA> | fullpwnmedia: Not possible in general. Accepting WARCs from random people would make the WBM useless because anyone could insert manipulated data. You can still upload them to IA, but they won't be in the WBM. |
| 00:18:26 | <fullpwnmedia> | JAA cheers. well in that case could i supply you with a txt filled with urls? |
| 00:18:42 | <fullpwnmedia> | andrew no, i have them plainly downloaded |
| 00:18:58 | <@JAA> | That would work, or even better a method of how to obtain the URLs. I've long wanted to launch a software binary archival project. |
| 00:19:02 | <andrew> | fullpwnmedia: reconstructing WARCs is a big no-no |
| 00:19:56 | <fullpwnmedia> | yeah. so the txt i have has english updates at the top of the list as that would be the target language |
| 00:20:18 | <fullpwnmedia> | theres like 20?? updates in the txt that dont work. (time out) |
| 00:20:58 | <fireonlive> | they're available on https://www.catalog.update.microsoft.com/Home.aspx but i'm not sure of a way to list *everything*; there's just the search box which limits it to 1k results |
| 00:21:07 | <fullpwnmedia> | theyre older updates since those have a higher chance of getting removed |
| 00:21:19 | <fullpwnmedia> | fireonlive the txt i have has over 80,000 |
| 00:21:21 | | Nulo quits [Read error: Connection reset by peer] |
| 00:21:25 | | Nulo joins |
| 00:23:09 | <fireonlive> | ah nice |
| 00:23:11 | <fullpwnmedia> | JAA https://wormhole.app/KRlvk#HerK7iRVb6Sr6TahbDfSKA theres the txt. it will be removed after 24 hours or after 100 downloads |
| 00:24:06 | <@JAA> | fullpwnmedia: Please put it on https://transfer.archivete.am/ instead. |
| 00:24:13 | <fullpwnmedia> | gotcha |
| 00:31:34 | | PredatorIWD joins |
| 00:32:34 | <fullpwnmedia> | JAA https://transfer.archivete.am/wcojJ/wuurls-sorted |
| 00:32:58 | <@JAA> | Thanks |
| 00:38:07 | <tbc1887> | What are the chances of someone updating the tracker wiki page to explain the stats at the bottom? E.g. "item_request_serve_rate" |
| 00:40:28 | | umgr036 quits [Client Quit] |
| 00:43:55 | <@JAA> | AT documenting stuff? Slim to none. :-P |
| 00:44:26 | <@JAA> | But yeah, would be good to have a canonical explanation of it. |
| 00:48:21 | | Arcorann (Arcorann) joins |
| 00:49:50 | <Ryz> | I wonder if someone who's more or less non-technical, headspace-wise, would be able to write the documentation xD |
| 00:53:58 | <imer> | having tooltips in the web ui itself (<abbr> or the like) would be better imo |
| 00:54:40 | <fireonlive> | ye a nice hoverability discoverability |
| 00:55:28 | <imer> | havent looked into that yes, its on my mental "things to look at" list though |
| 00:55:32 | <imer> | yet* |
| 00:59:07 | <@JAA> | Or just a title attribute. |
| 01:00:16 | <@JAA> | I guess <abbr> is the semantically correct way to do it these days. |
| 01:03:25 | <tbc1887> | Yeah, like I look at those values and I guess at what some of them mean but the others I'm just like uhhhhhhhh |
| 01:10:46 | <@JAA> | Same really. I still don't know what the different RTT values mean exactly. |
| 01:16:19 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 01:34:10 | | DopefishJustin quits [Remote host closed the connection] |
| 01:37:57 | <fireonlive> | suggestion for photobucket channel name; photofuckit |
| 01:38:00 | <fireonlive> | lol |
| 01:39:19 | | HP_Archivist quits [Ping timeout: 265 seconds] |
| 01:52:26 | <andrew> | I just did some naughty SQL commands on this running wpull job's database and I only crashed DB Browser for SQLite twice! |
| 01:52:30 | | nicolas17_ joins |
| 01:53:20 | | nicolas17 quits [Ping timeout: 265 seconds] |
| 01:53:36 | | nicolas17_ is now known as nicolas17 |
| 01:54:57 | | nicolas17 quits [Client Quit] |
| 01:57:00 | | nicolas17 joins |
| 02:14:43 | <Doranwen> | Oof, I know one place that used tons of Photobucket - all the icon accounts on LiveJournal |
| 02:14:53 | <Doranwen> | I remember *everyone* used Photobucket for those back in the day. |
| 02:15:43 | | Doranwen recalls at one point poking at URLs and figuring out how to access the *non*watermarked pics from albums, but doesn't recall how she did that now. |
| 02:15:46 | <Terbium> | the update page says photos/videos for (formerly) free users who not be deleted unless they violate TOS. But i don't think we can trust them on that lol |
| 02:15:55 | <Terbium> | will* not |
| 02:16:08 | <@JAA> | And 'not deleted' doesn't necessarily mean 'still accessible'. |
| 02:16:39 | <Terbium> | too bad "kickthebucket" channel name is taken |
| 02:16:46 | <@JAA> | The FAQ also mentions that 'when your account is deactivated, you will lose access to your Bucket *and your images*' (emph mine). |
| 02:16:59 | <Terbium> | what about just "fucket"? |
| 02:17:04 | <andrew> | photobitbucket? |
| 02:17:16 | <Terbium> | "phucket" |
| 02:17:22 | <Terbium> | ooh i like that one ^ |
| 02:18:24 | <@JAA> | Something about leaky buckets perhaps? |
| 02:21:04 | <Terbium> | buckethead? |
| 02:21:07 | <andrew> | I can't think of anything better than the names stated above |
| 02:21:40 | <Terbium> | it has to be a pun + funny |
| 02:21:45 | <andrew> | wait, this help page was published 16 days ago https://support.photobucket.com/hc/en-us/articles/13209408103572-Changes-to-the-Photobucket-Free-Account |
| 02:21:48 | <andrew> | the trial is 7 days long |
| 02:21:53 | <andrew> | so what exactly is the timetable? |
| 02:22:44 | <@JAA> | Yeah, that's the big question. We haven't heard of this before, so I suspect they didn't send out an email about it to affected accounts or similar. |
| 02:22:57 | <Terbium> | i heard about it from a user who said they got emails |
| 02:23:11 | <Terbium> | don't have a screenshot though |
| 02:23:15 | <@JAA> | Ah, could you ask them for a copy of the email? |
| 02:24:08 | <Terbium> | andrew: i think photobucket has a free tier (not a free trial tier) that this mentions. I could be wrong |
| 02:24:13 | <@JAA> | Hmm, maybe it already happened: https://old.reddit.com/r/mildlyinfuriating/comments/13azxy8/lmao_okay_photobucket_rip_my_2007_account_images/ |
| 02:25:18 | <@JAA> | I wonder if it has anything to do with Imgur's purge. People flooding Photobucket etc. |
| 02:26:29 | <Terbium> | I can't email Discord continue to offer free file/video/media uploads forever either |
| 02:26:35 | <Terbium> | imagine* not email |
| 02:27:05 | <andrew> | they did recently raise the free user upload limit to 25 MB |
| 02:27:09 | <andrew> | *MiB |
| 02:27:39 | <andrew> | maybe they are already profitable with all those Nitro subscriptions? |
| 02:29:31 | <Terbium> | but they don't limit by a user i think. Theoretically, someone can upload unlimited number of 25 MiB files |
| 02:29:58 | <Terbium> | if my understanding is correct, it would be trivial for a 16 line python script to upload 1 TiB of worthless data to Discord |
| 02:30:11 | <andrew> | yes, but if you do that you will likely get banned very quickly |
| 02:30:43 | <Terbium> | you can quite easily make accounts though. You can spread it out to do let's say 50-100GB per account |
| 02:30:45 | <fireonlive> | the image in the email is quite... the wrong tone lol |
| 02:31:20 | <Terbium> | with 20 discord accounts at 100 GB each (low enough to not be suspicious), I can put 2 TB on discord for free |
| 02:31:40 | <andrew> | Discord does have pretty strong anti-bot/abuse mechanisms though |
| 02:31:50 | <andrew> | they will pretty quickly throw up phone verification, etc. |
| 02:32:45 | <Terbium> | maybe, we don't know the storage stats for Discord sadly. I just can't imagine it being sustainable |
| 02:33:06 | <Terbium> | unless the storage consumption is much less than I expect |
| 02:33:20 | | Terbium quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.] |
| 02:33:52 | | Terbium joins |
| 02:35:23 | <Terbium> | i'm just expecting discord to eventually make a major policy change to save money like ending their current unlimited chat retention or purging old files |
| 02:37:33 | <@JAA> | They sadly aren't publishing numbers often, but they weren't profitable as of 2020. |
| 02:44:23 | <Terbium> | what about "photobusted" |
| 02:45:07 | <fireonlive> | discord would be a toughie since there's so many servers and the urls are basically unbruteforcable https://cdn.discordapp.com/attachments/480497101676216340/1107974860195500072/Ryujinx_Nvidia_Profile_Screenshot_2023.05.16_-_12.04.39.55.png |
| 02:45:29 | <fireonlive> | filename has to be spot on as well; and it's not like every server lets you autojoin and see everything hm |
| 02:45:37 | <fireonlive> | i liked my suggestion of photofuckit :p |
| 02:53:38 | <Doranwen> | One of the users in the Reddit thread linked above called it "bhotophucket", lol. |
| 02:58:35 | <@JAA> | Maybe botophucket to swap it properly. |
| 03:01:04 | <Terbium> | phuckthebucket? |
| 03:01:39 | <andrew> | Terbium: that's... a bit much |
| 03:01:56 | <Terbium> | bucketinphuket (phuket the thai city)? |
| 03:01:58 | <fireonlive> | maybe if we were doing a hub site |
| 03:02:44 | <Terbium> | fireonlive: like github? |
| 03:03:57 | <fireonlive> | >_> |
| 03:04:00 | <fireonlive> | not quite |
| 03:05:19 | | sonick (sonick) joins |
| 03:29:35 | | DopefishJustin joins |
| 03:29:35 | | DopefishJustin is now authenticated as DopefishJustin |
| 03:33:52 | <myself> | foetobucket |
| 03:38:33 | <Terbium> | phonybucket |
| 03:39:21 | <andrew> | phototrashcan |
| 03:39:25 | <andrew> | photobin |
| 03:40:18 | <andrew> | it's been a while since I've seen a working photobucket link, maybe it's already way too late? |
| 03:43:18 | <JTL> | photobucket sucking isn't a new thing |
| 03:48:52 | <imer> | JTL: photosucket? ;p |
| 03:49:04 | <@JAA> | I was just thinking the exact same thing. lol |
| 03:49:16 | <JTL> | imer: nice |
| 03:49:29 | <andrew> | I find it funny how we're bikeshedding about the channel name |
| 03:49:47 | <@JAA> | It's a critical part of starting an AT project. |
| 03:50:09 | <andrew> | and that's before we even know we're actually going to do the project! |
| 04:01:26 | <Terbium> | I agree with JAA, the name is the most important thing |
| 04:01:43 | <Terbium> | Not the code, warriors, or the volunteers, but the IRC channel name |
| 04:02:36 | <nicolas17> | is there any sane library to parse warcs? I'm about to do the insane thing of parsing it myself |
| 04:03:32 | <Terbium> | I would say warcio, but the mere mention of that name will give JAA an aneurysm |
| 04:06:32 | <fireonlive> | i do the cool kid method of zgrep |
| 04:08:25 | <@JAA> | warcio is okay for parsing WARCs as long as you don't care about exact header integrity, which usually only matters in weird edge cases. |
| 04:08:31 | | Ketchup901 quits [Remote host closed the connection] |
| 04:08:40 | | Ketchup901 (Ketchup901) joins |
| 04:08:48 | <@JAA> | It just shouldn't ever be used for writing WARCs until those awful bugs are fixed. |
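
For the "parsing it myself" route, here is a rough stdlib-only sketch of reading records from an uncompressed WARC stream: a version line, named header fields, a blank line, then `Content-Length` bytes of payload followed by two CRLFs. The names `iter_warc_records` and `make_record` are hypothetical helpers, the demo records are synthetic and omit fields a real WARC requires, and this deliberately ignores compression (.warc.gz, .warc.zst), repeated header fields, and folded header lines. It is not how warcio works internally.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # blank separator lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.partition(b":")
            # Note: a repeated field name overwrites the earlier value here.
            headers[name.strip().decode("utf-8")] = value.strip().decode("utf-8")
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_record(record_type, body):
    """Build a synthetic, minimal record for the demo (not a complete WARC record)."""
    return b"".join([
        b"WARC/1.0\r\n",
        b"WARC-Type: ", record_type, b"\r\n",
        b"Content-Length: ", str(len(body)).encode("ascii"), b"\r\n",
        b"\r\n",
        body,
        b"\r\n\r\n",
    ])


sample = make_record(b"warcinfo", b"software: demo\r\n") + make_record(
    b"resource", b"hello\r\n\r\nworld"
)
records = list(iter_warc_records(io.BytesIO(sample)))
for headers, payload in records:
    print(headers["WARC-Type"], len(payload))
```

Because the payload is read by length rather than by scanning for a delimiter, blank lines inside a payload do not confuse the parser.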
| 04:16:08 | <nicolas17> | hm this is interesting |
| 04:20:52 | <nicolas17> | .warc.zst has a separate zstd record for each file right? if I grab 160MB worth of warc, decompress it, and recompress it with zstd -19, I get 147MB, if I use xz -9, I *also* get 147MB at a much bigger CPU cost :D |
| 04:25:18 | <Terbium> | Are you compressing the entire WARC file (multiple records) at once or per record? |
| 04:26:08 | <Jake> | (Also, are you talking about a 160MB gzipped WARC file?) |
| 04:28:24 | <andrew> | nicolas17: each record is compressed individually, but there's also a dictionary |
| 04:28:39 | <andrew> | that dictionary really helps improve the compression ratio of the possibly tiny records |
| 04:28:48 | <andrew> | assuming the dictionary was trained correctly |
| 04:29:53 | | decky_e quits [Ping timeout: 252 seconds] |
| 04:30:18 | <andrew> | fun fact: in my experience, in some scenarios hand-selected/brute-forced dictionaries (and sometimes "dictionaries") outperform the built-in dictionary trainer by an order of magnitude |
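
To illustrate why a shared dictionary helps when every record is compressed on its own, here is a small sketch that uses zlib's preset-dictionary feature as a stand-in for zstd dictionaries (the Python standard library has historically not shipped zstd). The sample records and the hand-picked dictionary are invented for the demo; real .warc.zst files use a zstd dictionary, and the actual gains depend on the data.

```python
import zlib


def compress(record, zdict=None):
    """Deflate one record by itself, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(record) + comp.flush()


def decompress(blob, zdict=None):
    """Inverse of compress(); the same dictionary must be supplied."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Many tiny, similar records (invented HTTP-style headers).
prefix = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Server: nginx\r\n"
    b"X-Request-Id: "
)
records = [prefix + str(i).encode("ascii") + b"\r\n\r\n" for i in range(100)]

# Hand-selected dictionary: the bytes the records have in common.
dictionary = prefix

plain_total = sum(len(compress(r)) for r in records)
dict_total = sum(len(compress(r, dictionary)) for r in records)
print(plain_total, dict_total)
```

Each record stays independently decompressible, which matches the per-record layout described above, but the decompressor needs access to the same dictionary.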
| 04:31:35 | | decky_e (decky_e) joins |
| 04:32:52 | | Dango360_ (Dango360) joins |
| 04:33:50 | <flashfire42> | Is it too late to suggest PhotoKickTheBucket |
| 04:34:50 | | DopefishJustin quits [Ping timeout: 252 seconds] |
| 04:35:13 | <flashfire42> | It wasnt that bad a name that someone had to leave |
| 04:35:23 | | Dango360 quits [Ping timeout: 252 seconds] |
| 04:36:30 | | DopefishJustin (DopefishJustin) joins |
| 04:36:30 | | decky_e quits [Ping timeout: 252 seconds] |
| 04:36:43 | <pabs> | KickThePhotoBucket |
| 04:36:49 | <pabs> | kind of long though |
| 04:37:30 | | decky_e (decky_e) joins |
| 04:37:31 | <fireonlive> | "CHANNELLEN=50 :are supported by this server" |
| 04:37:37 | <fireonlive> | we got 50 characters to work with! |
| 04:39:12 | <fireonlive> | make that.. 48 |
| 04:39:48 | | TastyWiener95 quits [Ping timeout: 252 seconds] |
| 04:41:06 | <fireonlive> | it's been too long since i've been intimate with the IRC RFCs :( |
| 04:44:32 | <@JAA> | #curiousincidentoftheconfusedcatinthenighttime is our record holder so far, and I cringe every time I read it. |
| 04:46:33 | <fireonlive> | lmfao |
| 04:49:46 | <myself> | I really like photosuckit / photosucket, chiefly its potential for another round of bikeshedding about the penultimate letter. |
| 04:52:01 | <fireonlive> | a lot of these names are just asking for <reporter> why do you care about just saving porn? |
| 04:52:07 | <fireonlive> | especially mine haha |
| 04:52:14 | <andrew> | but is there even anything to save? |
| 04:52:23 | <fireonlive> | not sure |
| 04:52:46 | <fireonlive> | we've been too busy with paint swatches i suppose |
| 04:54:00 | <@JAA> | I'd like to answer with a song: The internet is really, really great... |
| 04:54:18 | | decky_e quits [Remote host closed the connection] |
| 04:54:34 | | decky_e joins |
| 04:55:34 | | killsushi quits [Ping timeout: 252 seconds] |
| 04:56:45 | <fireonlive> | yes |
| 04:57:01 | <fireonlive> | my submissions via !a can attest to that |
| 05:25:41 | | hitgrr8 joins |
| 05:26:23 | <@arkiver> | i see photobucket is not going to actually delete data? |
| 05:37:07 | <nicolas17> | Terbium: I was recompressing it all at once, not per record, so it's not surprising it got smaller; it was just interesting that zstd -19 and xz -9 gave the same results |
| 05:41:03 | <@JAA> | arkiver: It's ambiguous at best. They say people will lose access to their images but also that they aren't deleted. |
| 05:42:21 | <@JAA> | arkiver: Also, in case you missed it in the spam above, there are some signs that they've already 'deactivated' accounts: https://old.reddit.com/r/mildlyinfuriating/comments/13azxy8/lmao_okay_photobucket_rip_my_2007_account_images/ |
| 05:58:11 | | tsblock (tsblock) joins |
| 06:07:03 | | Minkafighter quits [Client Quit] |
| 06:07:36 | | Minkafighter joins |
| 06:55:08 | | icedice (icedice) joins |
| 06:56:57 | <icedice> | JAA: Have you had time to extract Imgur links from Bulbagarden Forums, Serebii Forums, Pokemon-Trainer.com, and the newly crawled links from The PokéCommunity? |
| 07:05:04 | | Island quits [Read error: Connection reset by peer] |
| 07:12:10 | | umgr036 joins |
| 07:12:33 | <@JAA> | icedice: Negative, been too busy with other things. |
| 07:13:00 | | umgr036 quits [Remote host closed the connection] |
| 07:13:15 | | umgr036 joins |
| 07:19:00 | <icedice> | Ok, roger that |
| 07:19:25 | <icedice> | Hopefully there's time before Imgur finishes their data wipe |
| 07:36:01 | | dumbgoy_ quits [Ping timeout: 265 seconds] |
| 07:36:57 | | user__ joins |
| 07:39:50 | | umgr036 quits [Ping timeout: 252 seconds] |
| 07:41:18 | | user__ quits [Ping timeout: 252 seconds] |
| 07:45:05 | | icedice quits [Client Quit] |
| 07:58:21 | | icedice (icedice) joins |
| 08:04:54 | | icedice quits [Client Quit] |
| 08:05:37 | | icedice (icedice) joins |
| 08:29:14 | | jo2 joins |
| 08:44:19 | | decky_e quits [Read error: Connection reset by peer] |
| 08:46:47 | <@JAA> | TIL https://archive.matrix.org/ |
| 08:53:03 | | decky_e (decky_e) joins |
| 08:53:53 | | fuzzy8021 quits [Read error: Connection reset by peer] |
| 08:55:05 | | fuzzy8021 (fuzzy8021) joins |
| 09:09:57 | <h2ibot> | Aladork edited URLTeam (+131, added 1url.cz): https://wiki.archiveteam.org/?diff=49805&oldid=49792 |
| 09:09:58 | <h2ibot> | ProudQuebecois edited URLTeam (+539, /* Alive */): https://wiki.archiveteam.org/?diff=49806&oldid=49805 |
| 09:09:59 | <h2ibot> | Todb moved ArchiveBot/CVE to CVE References (Shouldn't be a subpage of ArchiveBot.): https://wiki.archiveteam.org/?title=CVE%20References |
| 09:40:27 | | tsblock quits [Read error: Connection reset by peer] |
| 09:59:08 | | Voyage joins |
| 10:07:29 | <masterX244> | do we have a channel for photobucket round 2? |
| 10:08:33 | <@JAA> | Not yet, see above for channel name suggestions. |
| 10:09:42 | <@JAA> | Although it's not clear whether a project even makes sense now or they already disabled the affected accounts anyway. |
| 10:10:07 | <masterX244> | photofuckit if we accept more unorthodox names could work, too |
| 10:10:39 | <masterX244> | lemme hunt my linklists, I think i got a few links to poke around |
| 10:11:17 | <masterX244> | http://i1006.photobucket.com/albums/af181/polewskdComradeDave/20160126_195428_zpsbiae3q5f.jpg |
| 10:11:24 | <masterX244> | still up (random image of a stormtrooper armor part) |
| 10:13:57 | <@JAA> | I like photosucket, photofucket, botophucket. Maybe a slight preference for the first one since 'Photobucket sucks' has been a common saying for at least a decade now, but they're all good. |
| 10:22:37 | | jo2 quits [Remote host closed the connection] |
| 10:31:57 | <datechnoman> | I'm with JAA_ on the Photosucket channel name. It just sucks |
| 10:44:53 | | Lambro_D joins |
| 11:17:07 | <lennier1> | Apparently they have a history of suddenly changing TOS and holding images for ransom: https://www.theverge.com/2017/7/4/15919224/photobucket-broken-images-amazon-ebay-etsy-paid-update |
| 11:17:45 | <icedice> | Why even pay them when other free image hosts like Imgbox are more reliable? |
| 11:18:05 | <icedice> | And support bigger file sizes |
| 11:26:51 | <lennier1> | Some people on Twitter saying their account is already deactivated. https://twitter.com/minanohime/status/1659429160681635840 https://twitter.com/sorunort/status/1659414501165350912 https://twitter.com/21magz/status/1659216152470470656 |
| 11:27:28 | <lennier1> | https://support.photobucket.com/hc/en-us/articles/200724494-What-Happens-to-My-Account-When-My-Subscription-Expires- |
| 11:28:09 | <masterX244> | and even paid accounts are indirectly endangered since failcket might go bellyup unexpectedly |
| 11:28:27 | <lennier1> | "What Happens When My Account is Deactivated? ... If your account was public, it will no longer be viewable" |
| 11:35:11 | <nstrom|m> | +1 for photosucket, has the double meaning of us sucking down their data |
| 11:45:06 | | tbc1887 quits [Client Quit] |
| 11:46:24 | | tbc1887 (tbc1887) joins |
| 11:57:05 | <Terbium> | We need a committee and 2 rounds of voting, peer review, and formal scientific studies to ensure we choose the best possible channel name |
| 12:02:09 | | jacksonchen666 (jacksonchen666) joins |
| 12:16:09 | | HP_Archivist (HP_Archivist) joins |
| 12:20:19 | | Letur joins |
| 12:33:35 | | birdjj2 joins |
| 12:34:36 | | Voyage quits [Client Quit] |
| 12:35:41 | | birdjj quits [Ping timeout: 265 seconds] |
| 12:35:42 | | birdjj2 is now known as birdjj |
| 12:45:15 | | Chris5010 quits [Quit: ] |
| 13:01:21 | | Chris5010 (Chris5010) joins |
| 13:15:55 | | jacksonchen666 quits [Client Quit] |
| 13:31:05 | | Ryz quits [Ping timeout: 252 seconds] |
| 13:31:23 | | IDK_ quits [Read error: Connection reset by peer] |
| 13:39:27 | | IDK_ joins |
| 13:39:51 | | Ryz (Ryz) joins |
| 13:58:36 | | Chris5010 quits [Ping timeout: 252 seconds] |
| 14:28:12 | | dumbgoy_ joins |
| 14:38:04 | | Chris5010 (Chris5010) joins |
| 15:09:23 | | GNU_world quits [Ping timeout: 265 seconds] |
| 15:47:01 | <icedice> | Is Pushshift archived anywhere btw? Seems like their API is no longer accessible: "Check back in the next few weeks for updates. - Pushshift team (May 19, 2023)" |
| 15:49:35 | <Terbium> | icedice: pushshift torrents are available (2 TiB) and widely seeded |
| 15:49:49 | <Terbium> | unless you require a ready-to-use API |
| 15:50:05 | <Terbium> | or need data past March 2023 |
| 15:52:55 | <icedice> | Ideally I'm looking for a search interface that utilizes Pushshift, but had the foresight to mirror it |
| 15:53:03 | <icedice> | And I doubt any such website exists |
| 15:54:08 | <Terbium> | most Pushshift front ends like camas/unddit/reveddit talk directly to the pushshift API to avoid rehosting the database themselves. |
| 15:54:25 | <icedice> | I think I found one: https://socialgrep.com/search |
| 15:55:25 | <icedice> | No idea if redditsearchtool.com and redditsearch.io still work since they haven't worked in my main browser since forever |
| 15:55:29 | <Terbium> | huh, i guess it makes sense for socialgrep to self host it since they are selling paid plans for their service |
| 15:55:52 | <Terbium> | the other free frontends are generally unpaid or rely on donations so they don't have the resources to selfhost |
| 15:55:56 | <icedice> | Yeah |
| 15:56:27 | <icedice> | Let's not give SocialGrep a reason to paywall the entire thing |
| 15:56:53 | <icedice> | i.e. be kind towards their servers in case there's anyone here thinking of utilizing it on a big scale |
| 15:57:45 | <Terbium> | i believe there were a lot of free services abusing the pushshift api as well since it was free and easy to use |
| 15:57:58 | <icedice> | Yeah |
| 15:58:35 | <Terbium> | im currently loading the pushshift dataset into elasticsearch on a couple of my servers |
| 15:58:52 | <icedice> | This one was pretty cool: https://reddit-web-downloader.vercel.app/ |
| 15:59:11 | <icedice> | https://github.com/M4p4/reddit-web-downloader |
| 15:59:35 | <Terbium> | oh, that uses the limited reddit api (max 1000 posts) |
| 16:00:07 | <icedice> | That might explain why other methods can rip more |
| 16:00:20 | <Terbium> | other methods just generally rip off of pushshift |
| 16:00:49 | <icedice> | Or load the json files from Reddit |
| 16:00:55 | <icedice> | Which is probably also limited |
| 16:01:08 | <icedice> | Yeah, limited to 1000 posts |
| 16:01:32 | <Terbium> | I think currently there's no way on any of the Reddit API endpoints to go back more than 1000 posts |
| 16:02:02 | <icedice> | Yeah, you just have to rip a bunch of different combinations of stuff to get more |
| 16:03:31 | <icedice> | Terbium: Reddit Web Downloader stopped working at the same time Pushshift took down their API though |
| 16:03:48 | <icedice> | Maybe just a coincidence |
| 16:05:03 | <Terbium> | what i see most apps do is first use pushshift to grab historical data (get ids of posts) then use the official reddit API to grab the post data |
| 16:05:32 | <icedice> | That would explain it, yeah |
| 16:12:54 | <kpcyrd> | socialgrep data seems pretty old |
| 16:24:31 | <icedice> | Probably because Pushshift got banned by Reddit |
| 16:32:21 | <Terbium> | i wouldn't be surprised if socialgrep was just mirroring pushshift data to avoid crawling reddit themselves |
| 16:32:40 | <Terbium> | saves them a lot of effort too |
| 16:36:43 | <@JAA> | We have a channel for Reddit, please keep this there. #shreddit |
| 16:38:40 | <@arkiver> | so, #photosucket ? JAA imer anarcat |
| 16:42:14 | <masterX244> | +1 from me, too. |
| 16:42:44 | <icedice> | +1 |
| 16:45:26 | | Arcorann quits [Ping timeout: 252 seconds] |
| 16:50:15 | | Kinille quits [] |
| 16:50:56 | | Kinille (Kinille) joins |
| 16:52:15 | | Megame (Megame) joins |
| 16:56:17 | <TheTechRobo> | +1 |
| 17:11:13 | | Island joins |
| 17:16:02 | | Island quits [Ping timeout: 252 seconds] |
| 17:18:34 | <Barto> | +1 |
| 17:19:48 | <andrew> | +0.5 |
| 17:20:21 | <Terbium> | +0.05 |
| 17:20:35 | <@arkiver> | still looks positive |
| 17:20:45 | <@arkiver> | good enough for me |
| 17:27:27 | <masterX244> | the IA got IPV6? |
| 17:27:54 | <@JAA> | Nope |
| 17:28:20 | <masterX244> | frick.... was hoping to save a IPV4 on a tiny vserver at hetzner as a Pipe booster to the IA |
| 17:28:28 | | Island joins |
| 17:29:15 | | Island_ joins |
| 17:31:37 | <fireonlive> | +1 |
| 17:32:56 | | Island quits [Ping timeout: 265 seconds] |
| 17:34:29 | <fireonlive> | if anyone was using the pushshift API for data between 202303 and the plug being pulled (202305?) it's shut down now |
| 17:34:35 | <fireonlive> | via https://www.reddit.com/r/pushshift/comments/13mhuzq/api_has_been_taken_down/ |
| 17:35:06 | <fireonlive> | oh, 04 to 05; i think the last dump was 03 |
| 17:35:45 | <@JAA> | Again, we have a channel for Reddit. → #shreddit |
| 17:35:49 | | Island__ joins |
| 17:36:32 | <fireonlive> | sorry wasn't sure if that was relevant specifically |
| 17:36:37 | <fireonlive> | shall pass the message on :) |
| 17:36:48 | <@JAA> | It has been discussed extensively already. |
| 17:36:48 | | Island_ quits [Ping timeout: 265 seconds] |
| 17:37:34 | <fireonlive> | ah; this was the api.pushshift.io finally just not giving any responses instead of not ingesting new data |
| 17:37:38 | <fireonlive> | but i shall shut up now |
| 17:39:55 | | dumbgoy__ joins |
| 17:43:44 | | dumbgoy_ quits [Ping timeout: 252 seconds] |
| 17:47:24 | | Island__ quits [Ping timeout: 252 seconds] |
| 17:48:05 | | Island joins |
| 17:49:43 | <fireonlive> | nvm i'm just an idiot lol |
| 17:52:22 | | Island_ joins |
| 17:54:32 | | Island quits [Ping timeout: 252 seconds] |
| 17:59:09 | | Island joins |
| 17:59:30 | | Island_ quits [Ping timeout: 252 seconds] |
| 18:11:34 | | Island_ joins |
| 18:12:34 | | Island quits [Ping timeout: 265 seconds] |
| 18:18:15 | | Island joins |
| 18:19:20 | | Island_ quits [Ping timeout: 265 seconds] |
| 18:21:03 | | Island_ joins |
| 18:24:10 | | Island quits [Ping timeout: 265 seconds] |
| 18:36:53 | | Island_ quits [Ping timeout: 252 seconds] |
| 18:40:57 | | Megame quits [Client Quit] |
| 18:47:52 | | Island joins |
| 18:59:38 | | HP_Archivist quits [Ping timeout: 252 seconds] |
| 19:10:07 | | HP_Archivist (HP_Archivist) joins |
| 19:20:53 | | Island quits [Ping timeout: 252 seconds] |
| 19:24:57 | | Island joins |
| 19:49:30 | <tech234a> | https://news.ycombinator.com/item?id=36014778 “Gfycat has been down for two days due to an expired SSL certificate” |
| 19:52:04 | <fireonlive> | no one works on the weekend i guess o_O |
| 19:53:17 | <fireonlive> | hm, expired wednesday |
| 19:53:27 | <fireonlive> | not a great sign |
| 19:54:33 | <@JAA> | → #deadcat |
| 20:01:35 | | Island quits [Ping timeout: 252 seconds] |
| 20:08:12 | <icedice> | Are Gfycat operating RedGifs or did they sell that part off to someone else? |
| 20:16:24 | <@JAA> | RedGIFs was sold shortly after Gfycat banned NSFW content. |
| 20:16:56 | <icedice> | Makes sense |
| 20:18:17 | <@JAA> | ... at least I think it was, can't find a source for that right now. |
| 20:19:43 | <icedice> | Given that Gfycat was acquired by Snap (Snapchat) it would have to have been sold off |
| 20:20:07 | <icedice> | Those guys want no part of anything that isn't family friendly |
| 20:21:12 | <icedice> | I always wondered why I stopped seeing Gfycat being used as a gif host and why Discord just added Giphy and Tenor, but not Gfycat |
| 20:21:14 | <fireonlive> | redgifs shows "Vergil Services, Inc." for DMCA contact and gfycat shows ... GfyCat Inc |
| 20:21:20 | <icedice> | I guess the answer is that i fell off |
| 20:21:49 | <icedice> | * it fell |
| 20:23:56 | | Lambro_D quits [Read error: Connection reset by peer] |
| 20:27:39 | <@JAA> | s/it/the front/ |
| 20:29:17 | <icedice> | I guess link redirects to RedGifs will stop working as well |
| 20:29:50 | <icedice> | Though it's not exactly rocket science for users to get there themselves when the IDs remain the same |
| 20:36:58 | | sonick quits [Client Quit] |
| 21:02:10 | | Island joins |
| 21:05:24 | | decky_e quits [Remote host closed the connection] |
| 21:05:43 | <vokunal|m> | how do you find channels that aren't in the matrix space list? |
| 21:07:35 | | Island quits [Ping timeout: 252 seconds] |
| 21:08:33 | <@JAA> | The wiki lists most of them, although it's sometimes a bit outdated and there isn't a single easy list. |
| 21:08:48 | | Island joins |
| 21:12:00 | <h2ibot> | JustAnotherArchivist edited Photobucket (+320, Add IRC channel, free account purge, references): https://wiki.archiveteam.org/?diff=49809&oldid=48537 |
| 21:12:22 | <vokunal|m> | I see the channel names, but I don't know how to search it |
| 21:12:38 | <vokunal|m> | like putting #deadcat in the search bar doesn't show anything |
| 21:13:35 | <@JAA> | Yeah, many of our channels aren't publicly listed due to past spam episodes. |
| 21:15:48 | <@JAA> | (I assume 'the search bar' on Matrix gets translated to searching the output of /LIST or similar.) |
| 21:16:26 | <nicolas17> | no, I think if nobody ever joined an IRC channel from matrix, there won't be a matching matrix room for it, and it won't appear in the matrix search |
| 21:16:53 | <@JAA> | Ah. Do secret rooms appear there? If so, eww. |
| 21:17:10 | <@JAA> | I.e. channels with +s, which makes them not appear in /LIST. |
| 21:17:57 | <nicolas17> | not sure, it seems plausible that the bridge would convert +s to its equivalent matrix-room visibility settings but idk if it actually works that way |
| 21:20:26 | <vokunal|m> | if there isn't a room for it because no one joined from matrix then how would anyone join a new room from matrix? I don't know how to tell if a room is unhidden |
| 21:24:57 | <FireFly> | I think there's been privacy leaks like that before with the public directory listing |
| 21:25:07 | <FireFly> | as in, including IRC-secret channels on it |
| 21:25:17 | <FireFly> | might be fixed though |
| 21:27:17 | <nicolas17> | vokunal|m: what's this room called in matrix? # archiveteam-bs : hackint.org? |
| 21:28:09 | <nicolas17> | seems so |
| 21:28:23 | <vokunal|m> | yeah |
| 21:28:37 | <nicolas17> | so join #deadcat:hackint.org |
| 21:29:12 | <vokunal|m> | ah yeah that worked |
| 21:29:21 | <vokunal|m> | I never even bothered looking at the url |
| 21:29:27 | <@JAA> | #// is the one that's challenging on Matrix, or so I've heard. :-) |
| 21:29:41 | <nicolas17> | oof |
| 21:29:43 | <vokunal|m> | yeah it shows as !IhBFmCKJgcfpMnEJqT:hackint.org |
| 21:39:34 | | hitgrr8 quits [Client Quit] |
| 21:45:43 | | Island quits [Ping timeout: 265 seconds] |
| 21:46:50 | <pokechu22> | Tue, 25 Apr 2023 10:59 -- Global (Global@services.hackint.org): [Network Notice] hexa- - The matrix-appservice-irc fixed an issue in the latest release, where rooms with +s (secret) cmode set would accidentally be leaked into the matrix channel directory, if a matrix user was joined to it. The fix has been deployed on 2023/04/14 around 7pm UTC and the rooms are not visible |
| 21:46:52 | <pokechu22> | anymore. |
| 21:47:08 | <pokechu22> | +s was leaked in the past, but not anymore, dunno if this was also announced on hackint.org or not |
| 21:49:13 | <hexa-> | it was only announced here on IRC |
| 21:53:07 | <h2ibot> | Hans5958 edited URLTeam (-2944, Checking round on 2023-05-20): https://wiki.archiveteam.org/?diff=49810&oldid=49806 |
| 21:53:08 | <h2ibot> | Hans5958 edited URLTeam/Dead (+4024, Checking round on 2023-05-20): https://wiki.archiveteam.org/?diff=49811&oldid=49797 |
| 21:59:18 | | Island joins |
| 22:00:09 | <h2ibot> | JAABot edited URLTeam/Dead (+0): https://wiki.archiveteam.org/?diff=49812&oldid=49811 |
| 22:15:12 | | Island quits [Ping timeout: 265 seconds] |
| 22:15:55 | | Island joins |
| 22:23:52 | | Island quits [Ping timeout: 252 seconds] |
| 22:29:28 | | Island joins |
| 22:36:47 | | Island_ joins |
| 22:40:00 | | Island quits [Ping timeout: 252 seconds] |
| 22:42:16 | | Island_ quits [Ping timeout: 265 seconds] |
| 22:57:29 | | sonick (sonick) joins |
| 22:59:14 | | TheTechRobo quits [Read error: Connection reset by peer] |
| 23:01:13 | | TheTechRobo (TheTechRobo) joins |
| 23:08:00 | | Icyelut|2 (Icyelut) joins |
| 23:11:20 | | Icyelut quits [Ping timeout: 252 seconds] |
| 23:13:10 | | marto_8 quits [Quit: zzzzz] |
| 23:13:40 | | marto_8 joins |
| 23:17:03 | | TastyWiener95 (TastyWiener95) joins |
| 23:17:16 | | decky_e (decky_e) joins |
| 23:20:59 | | BlueMaxima joins |
| 23:22:55 | | Island joins |
| 23:28:56 | | Island quits [Ping timeout: 252 seconds] |
| 23:42:56 | | Island joins |
| 23:54:36 | | TastyWiener95 quits [Client Quit] |