00:00:47Arcorann (Arcorann) joins
00:09:32TheTechRobo is now known as TheTechRoboWithAMoustache
00:09:47TheTechRoboWithAMoustache is now known as TheTechRobo
00:13:22parfait (kdqep) joins
00:13:57<Terbium>TheTechRobo: wpull isn't that difficult to install i would say
00:15:00<@JAA>As long as you have a supported (i.e. long EOL) version of Python.
00:15:07<@JAA>And you deal with the broken dependencies.
00:15:12<@JAA>And you deal with the broken CLI.
00:15:31<Terbium>I'm using Python 3.12 with it, with the latest Tornado and dependencies :)
00:15:35<@JAA>But other than *that*, it's just fine. :-)
00:15:45<@JAA>Yes, but not the regular wpull. :-)
00:16:15<Terbium>Close enough for me (it's just missing the "ludios_" prefix in the name haha)
00:19:25<Terbium>Back to working on my PR into ludios_wpull, hopefully will get a Python 3.11+ version into master branch within the next 1-2 weeks
00:21:14<Terbium>wpull++
00:21:15<eggdrop>[karma] 'wpull' now has 1 karma!
00:21:35<TheTechRobo>Wget-AT is love, Wget-AT is life
00:22:33<fireonlive>oh! you're the one committing into the python 3.11 branch of pull
00:22:35<fireonlive>wpull
00:22:40<fireonlive>nice to meet you Terbium :3
00:22:58<Terbium>Nice to meet you (again) :)
00:25:16<fireonlive>ye again x3
00:25:18<fireonlive>:)
00:25:48<Terbium>I mostly lurk hehe
00:26:14<fireonlive>:3
00:26:20<fireonlive>a watchful eye
00:30:46<@JAA>Ah, now that makes sense. :-)
00:31:13<fireonlive>we were wondering who the mysterious committer was
00:31:16<fireonlive>:p
00:31:26<fireonlive>(#at-changes)
00:32:56<Terbium>oh what?
00:33:21<ScenarioPlanet>Here's a question about wget-at: is it possible to send & save a POST request with some formdata with a single command? Or does it need any .lua script for that?
00:33:27<Terbium>lol, I've been talking to Ivan on and off about it, didn't realize it was stirring up some confusion
00:34:19<fireonlive>there's a newly founded channel (thanks to nulldata) that posts new commits in ArchiveTeam repos
00:34:36<fireonlive>and stuff related to issues/PRs
00:34:51<fireonlive>oh and docker images/wiki changes
00:34:56<Terbium>I had a private fork with numerous changes/modernizations for wpull along with grab-site, with CI/CD + docker. just recently got some time to work on migrating some changes to the mainline repo
00:35:07<fireonlive>oh awesome :)
00:35:23<fireonlive>nice to see :3
00:36:20<fireonlive>oh hey, our wiki got linked https://news.ycombinator.com/item?id=34734177#34737936 (google alert came in)
00:36:34<fireonlive>well, one of our secondary wikis
00:40:20<Terbium>still a shame this hasn't moved along in the 2 years i've been watching it: https://github.com/facebook/zstd/pull/2349
00:40:59<Terbium>zstandard's pretty nice, thought about converting my WARC datasets to zstd, but never got around to it
00:44:14hitgrr8 quits [Client Quit]
00:47:41<fireonlive>:(
00:48:17fireonlive pokes the developers of archivebox too
00:48:51<Terbium>archivebox is great, shame it doesn't do recursive crawling
00:49:13<@JAA>And shame its WARCs are weird because wget.
00:49:37<fireonlive>ye, need to switch to wget-at i suppose
00:49:46<fireonlive>recursive would be cool :)
00:49:48<@JAA>I was just wondering how hard it would be to make that work.
00:50:06<fireonlive>i do have my instance still up, though i've barely used it since JAA pooped on it
00:50:09<fireonlive>:P
00:50:15<@JAA>:-P
00:50:26<@JAA>I mean, at least wget doesn't corrupt data or similar.
00:50:29<@JAA>:-)
00:50:34<Terbium>JAA you should use a toilet instead of fireonlive's archivebox instance :)
00:51:20<fireonlive>xP
00:51:57fireonlive attempts to remember wget's warc faults
00:52:01<Terbium>JS rendering is still a big pain to deal with, I'm resorting to Chromium browsers for archiving since wpull and wget-at don't cut it for those
00:52:27<@JAA>Yeah, brozzler I guess for that.
00:52:36<fireonlive>ah! https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem
00:53:18<fireonlive>yeah, the new world of javascript for everything and soon HTTP/2 and HTTP/3
00:53:37<Terbium>currently using warcprox with chromium containers in Kubernetes
00:54:01<fireonlive>ooh :3
00:54:10<@JAA>Neat
00:54:12<Terbium>brozzler was a bit too vertically integrated for my liking when i looked at it 3-4 years ago
00:54:14<fireonlive>is cloudflare happy with you?
00:55:04<Terbium>using a couple proxies and captcha solvers to work around buttflare. Still a pain to deal with
00:55:52<fireonlive>ahh
00:56:25<fireonlive>prowlarr had an integration for one of those pay as you go services, thought it quite neat
00:58:05<Terbium>"neat"ly burning a hole in my wallet :P
00:58:53<fireonlive>:P
00:59:26<fireonlive>gotta set up a free site with "something people want" and in order to get access to said content they solve captchas for you
00:59:33<fireonlive>:D
01:00:43<Terbium>We could make it an AT project, outsourcing captchas to volunteers lol
01:00:58<fireonlive>haha
01:01:24<Terbium>But in all seriousness, hcaptchas are pretty difficult compared to recaptcha
01:01:37<fireonlive>>_<
01:03:13<fireonlive>-+rss/#hackernews- Indexing a Billion Pages: https://blog.mwmbl.org/articles/indexing-a-billion/ https://news.ycombinator.com/item?id=38744224
01:03:22<fireonlive>wonder if they deal with the same
01:03:32<fireonlive>also oops, meant to offtopic that lol
01:04:42<qwertyasdfuiopghjkl>something like a captcha solving leaderboard
01:04:58<fireonlive>gotta gamify it :3
01:05:54<thuban>you jest, but we did have a leaderboard for joining yahoo groups
01:07:32<Terbium>Yep, I was there for yahoo groups lol
01:07:37<Terbium>That one was insane
01:17:11<fireonlive>ooh, that sounds fun lol
01:17:15<fireonlive>'fun'
02:39:02<flashfire42|m>Fireonlive constantly doing captchas? And not even getting paid. Not my idea of fun
02:39:26<fireonlive>ah were you a yahooligan?
03:00:07Hackerpcs quits [Client Quit]
03:02:10Hackerpcs (Hackerpcs) joins
03:12:20parfait quits [Ping timeout: 240 seconds]
03:16:55<Ryz>Hmm, outsourcing ReCAPTCHAs to volunteers; I partipicated in that madness before with Yahoo Groups...I had some interesting image finds from doing the solving~
03:17:05<Ryz>*participated
03:17:22<Terbium>Yahoo Groups was a big pain due to all the private groups :(
03:23:30<Ryz>I wouldn't mind doing reCAPTCHA stuff for internet archiving purposes, similar to the Yahoo Groups stuff; do 'em when I'm bored or something
03:31:42<Ryz>I do ponder if something like that would be possible in general, via #// ...?
04:03:08DogsRNice joins
04:17:28<Ryz>Hmm, interesting, Mwmbl has a Firefox extension that when installed and enabled, uses computer resources for web crawling, https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/
04:34:02<Pedrosso>https://twitter.com/Mineteria the twitter of a Minecraft server which years back got merged into another. I'm surprised the twitter account is still up
04:34:03<eggdrop>nitter: https://nitter.net/Mineteria
04:59:05IDK quits [Quit: Connection closed for inactivity]
05:09:55<pabs>Pedrosso: add it to https://pad.notkiska.pw/p/archivebot-twitter
05:10:43<Pedrosso>will do
05:16:16qwertyasdfuiopghjkl quits [Remote host closed the connection]
05:34:03qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
05:43:24DogsRNice quits [Read error: Connection reset by peer]
06:02:40<fireonlive>in which fireonlive types too many words on the etherpad
06:03:05<fireonlive>too much pad not enough ether 🥴
06:25:18wyatt8750 joins
06:25:19wyatt8740 quits [Read error: Connection reset by peer]
06:38:49pabs quits [Ping timeout: 272 seconds]
06:39:48<Doranwen>Oh, the Yahoo Groups captchas bring back so many memories… all that training AI to read rumble strips as crosswalks, lol.
06:41:19pabs (pabs) joins
06:41:38<Doranwen>The other top people who worked on the fandom side of the project were revisiting the history of it the other day and remembering some of the captcha discussion and joking. Me, I think about YG all the time, but that's because I'm still sorting metadata. So many weird groups that used to exist.
06:44:08<thuban>i still have a folder full of screenshots of 'ceci n'est pas un pipe' situations
06:45:11<thuban>*une
06:47:53<Ryz>Wouldn't mind doing more of the Captcha stuff <#>;
06:49:59<Ryz>Saying this because I recall years ago, there's a website that uses Google reCAPTCHA for the sole purpose of doing it for a high score~
06:50:40<flashfire42>You realise that was probably aiding a spam operation?
06:50:51BlueMaxima quits [Read error: Connection reset by peer]
06:52:12<Ryz>It was like years ago, like, I think a decade ago?
06:52:44<Ryz>It was back then when reCAPTCHAs were all about transcribing text from books because of Google Books at the time
06:53:29<Ryz>I do know that after solving them enough times, it gives a much harder version, I'm assuming because of having to solve them a bunch in a row
06:57:06<@JAA>Yeah, there are lots of services where you can spend your time as a Mechanical Turk to earn pennies and help spam bots break captchas.
06:57:35<fireonlive>tfw mturk banned me
06:57:38<fireonlive>>:(
06:58:03<Ryz>Aww yeah, this is what I used to see when I was doing it for a high score: https://3.bp.blogspot.com/-SnVfcK0v9Lc/Ur4F_bIMyEI/AAAAAAAACcc/S5inQO5jFTU/s1600/reCAPTCHA+don't+type.jpg
07:00:59ymgve quits [Ping timeout: 272 seconds]
07:02:58qwertyasdfuiopghjkl quits [Remote host closed the connection]
07:05:55ymgve joins
07:09:59qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
07:31:15Megame (Megame) joins
07:47:12pabs quits [Remote host closed the connection]
07:51:51pabs (pabs) joins
07:54:09inedia (inedia) joins
08:51:03qwertyasdfuiopghjkl quits [Client Quit]
09:23:34qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
09:28:20Naruyoko quits [Remote host closed the connection]
09:28:41Naruyoko joins
10:00:05Bleo18260 quits [Client Quit]
10:01:26Bleo18260 joins
10:03:15project10 quits [Quit: .]
10:04:40<ShadowJonathan>Heya, could I request two websites for re-scrape? One is showing signs of bit rot (and is behind cloudflare), while the other I had already requested scrape while the site was having a rough period, but since it's stable now, I'd wanna request again since then it's assured all pages got captured
10:20:12project10 (project10) joins
10:35:03Hackerpcs quits [Ping timeout: 272 seconds]
10:35:57Hackerpcs (Hackerpcs) joins
10:37:33hitgrr8 joins
11:03:58vukky quits [Quit: @ERROR: max connections (-1) reached -- try again later]
11:09:53vukky (vukky) joins
11:11:29Island quits [Read error: Connection reset by peer]
11:14:10ScenarioPlanet quits [Client Quit]
11:18:23vukky quits [Read error: Connection reset by peer]
11:18:26vukky1 (vukky) joins
11:51:11<thuban>ShadowJonathan: go ahead, but be aware that we may not be able to do much through cloudflare protection
11:51:32<ShadowJonathan>ait, the CF-protected fanfic site is www.fanfiction.net
11:51:40<thuban>ah
11:51:47<ShadowJonathan>the head domain fanfiction.net stopped resolving, and thats why im kinda panicking
11:51:56<ShadowJonathan>or well, its a signal of bitrot and neglect, for me
11:52:23<ShadowJonathan>the other site, the well-working one, is www.cyoc.net, a "choose your own adventure" submission website, but NSFW
11:53:04<thuban>yeah, we've discussed ffn on a number of occasions (including when that started happening)
11:53:15<thuban>but cloudflare's a bitch
11:53:38<ShadowJonathan>ah
11:53:40<ShadowJonathan>alrighty then :(
11:56:46<thuban>there are theoretical plans of attack, but it's a lot of dev work that nobody's had the time to do :(
11:57:17<thuban>someone should be along to queue the other site in a bit
12:19:17<ShadowJonathan>alrighty, thanks
12:23:31ScenarioPlanet (ScenarioPlanet) joins
13:28:50Arcorann quits [Ping timeout: 240 seconds]
13:37:59jacksonchen666 (jacksonchen666) joins
13:48:36jacksonchen666 quits [Client Quit]
14:20:07qwertyasdfuiopghjkl quits [Client Quit]
14:30:39systwi quits [Ping timeout: 272 seconds]
14:38:44systwi (systwi) joins
14:57:15systwi quits [Ping timeout: 272 seconds]
15:00:12ScenarioPlanet quits [Client Quit]
15:37:12HP_Archivist (HP_Archivist) joins
15:37:40aninternettroll_ (aninternettroll) joins
15:37:56aninternettroll quits [Read error: Connection reset by peer]
15:37:56aninternettroll_ is now known as aninternettroll
15:45:32systwi (systwi) joins
16:26:38c3manu (c3manu) joins
16:50:26DogsRNice joins
17:07:50<Vokun>The art of second person story telling is underutilized. They should make a non nsfw site. This is an interesting concept
17:59:51andrew quits [Quit: Ping timeout (120 seconds)]
18:00:12andrew (andrew) joins
18:09:04<Doranwen>ShadowJonathan: I've taken the precautions of downloading all the fics I might ever want to read, via fichub-cli, but that's as good as I know how to do.
18:12:44<Doranwen>My "I wish this were being actively worked on" is LiveJournal but #recordedjournal hasn't had any activity in a long while. The only archiving tool out there currently is something someone (not an AT person) cooked up to use that requires Excel macros. I run Linux and don't have MSOffice so can't use it at all, alas.
18:44:42Megame quits [Client Quit]
19:10:59hackbug quits [Remote host closed the connection]
19:12:46hackbug (hackbug) joins
19:14:08<pokechu22>ShadowJonathan: I'm pretty sure our job for cyoc.net is complete (or was complete when it was done a few months ago) - I did check to make sure all pages were captured after the fact
19:14:41<ShadowJonathan>ah alright
19:14:49<ShadowJonathan>i might've forgotten that, or that might've slipped my mind
19:15:08<ShadowJonathan>i still remember the anxiety of trying to download it, so maybe thats that
19:16:08<pokechu22>Specifically I think I did a second job when the site started being faster where I saved all of the user pages, and then I checked to make sure all of the stories linked from those had been saved by the first job
19:17:35<pokechu22>... and then did one additional job that covered the missed ones (which were mainly new chapters posted afterwards: https://archive.fart.website/archivebot/viewer/domain/urls-transfer.archivete.am-www.cyoc.net_missed_chapters.txt)
19:30:35<ShadowJonathan>alrighty, thanks :)
20:07:47Island joins
20:20:20c3manu quits [Max SendQ exceeded]
20:20:44c3manu (c3manu) joins
20:20:46c3manu quits [Max SendQ exceeded]
20:21:11c3manu (c3manu) joins
20:21:14c3manu quits [Max SendQ exceeded]
20:22:07c3manu (c3manu) joins
20:22:47c3manu quits [Max SendQ exceeded]
20:23:08c3manu (c3manu) joins
20:43:04BlueMaxima joins
21:04:24that_lurker quits [Quit: I am most likely running a system update]
21:04:40that_lurker (that_lurker) joins
21:14:53that_lurker quits [Client Quit]
21:15:19that_lurker (that_lurker) joins
21:15:41<h2ibot>OrIdow6 edited FanFiction.Net (+565, On false negatives during replay due to…): https://wiki.archiveteam.org/?diff=51414&oldid=48810
22:00:16jacksonchen666 (jacksonchen666) joins
22:32:22hitgrr8 quits [Client Quit]
22:34:50c3manu quits [Client Quit]
23:21:20Arcorann (Arcorann) joins
23:36:56lennier2_ quits [Client Quit]
23:41:33lennier1 (lennier1) joins
23:51:26andrew quits [Client Quit]
23:56:13andrew (andrew) joins