00:04:11pcr leaves
00:19:02aaeaston joins
00:26:04aaeaston quits [Client Quit]
00:59:12Mineroboter joins
01:01:42Mineroboter_ quits [Ping timeout: 250 seconds]
01:02:22dm4v_ joins
01:02:41dm4v quits [Read error: Connection reset by peer]
01:02:41dm4v_ is now known as dm4v
01:02:41dm4v quits [Changing host]
01:02:41dm4v (dm4v) joins
01:16:46aaeaston joins
01:36:06aaeaston quits [Client Quit]
01:38:51pcr joins
01:56:13lun4 quits [Quit: Ping timeout (120 seconds)]
01:56:35lun4 (lun4) joins
02:30:37blankie (blankie) joins
02:59:11JackThompson joins
03:00:32Jack_Thompson quits [Ping timeout: 258 seconds]
03:05:58Jack_Thompson joins
03:08:12JackThompson quits [Ping timeout: 258 seconds]
03:26:18BlueMaxima joins
03:38:34crispyalice2 quits [Ping timeout: 250 seconds]
03:38:44pcr leaves
03:38:46pcr joins
03:50:32qw3rty_ joins
03:54:12qw3rty__ quits [Ping timeout: 258 seconds]
04:01:36etnguyen03 quits [Client Quit]
04:02:31crispyalice2 (crispyalice2) joins
04:38:17Stilett0 quits [Ping timeout: 258 seconds]
04:42:14<@OrIdow6>Alright, before I read the backlog, I have been gone for ~3 days
04:42:19<@OrIdow6>Maybe it was less, I can't remember
04:43:01<@OrIdow6>Anyhow, aimix-z is thankfully still up (and hopefully will remain so), I have a Japanese proxy that works accessing it, so hopefully I can do that
04:43:33<@OrIdow6>Sorry for sort of vanishing suddenly, and so close to a deadline, hopefully I'll have these things running
04:43:54lennier2 joins
04:46:43lennier1 quits [Ping timeout: 258 seconds]
04:46:46lennier2 is now known as lennier1
04:49:27<@OrIdow6>gazorpazorp: I recall themadpro looking into something subtitle-related a while ago (or it could have been tech 234 a, I got them mixed up at first)
05:03:20<@OrIdow6>By the way, if anyone knows of a Bintray user that has a small (2-4) number of packages, with a small number of repos, with a small number of versions, that would be nice
05:03:55<@OrIdow6>Because multiple versions tend to make it blow up into thousands of requests, which makes it hard to test
05:13:18blankie quits [Remote host closed the connection]
05:56:18atphoenix_ (atphoenix) joins
05:56:48atphoenix quits [Ping timeout: 250 seconds]
06:38:35blankie (blankie) joins
07:02:53LeGoupil joins
07:10:12Doran (Doranwen) joins
07:10:28Doranwen quits [Ping timeout: 258 seconds]
07:49:16hooway joins
07:53:18<Jake>Orldow6: I have one with a single package, single repo, and one version. https://bintray.com/lightshed This guy also has 9 repos and 4 packages. https://bintray.com/jaycroaker
07:58:46HP_Archivist quits [Ping timeout: 258 seconds]
08:29:55Arcorann (Arcorann) joins
08:39:40britmob joins
08:39:55Webuser796 joins
08:41:42britm0b quits [Ping timeout: 258 seconds]
09:17:07dm4v_ joins
09:17:09dm4v quits [Read error: Connection reset by peer]
09:17:13dm4v_ is now known as dm4v
09:17:25dm4v quits [Changing host]
09:17:25dm4v (dm4v) joins
09:19:09blankie quits [Remote host closed the connection]
09:22:51HP_Archivist (HP_Archivist) joins
09:24:34blankie (blankie) joins
09:26:46blankie quits [Remote host closed the connection]
09:30:47Webuser79699 joins
09:33:32Webuser796 quits [Ping timeout: 244 seconds]
09:37:38Kenshin quits [Quit: ZNC - http://znc.in]
09:37:52blankie (blankie) joins
09:38:31shoghicp quits [Quit: My znc bouncer found a childhood friend and left me all alone, how will I survive now? Again?!]
09:39:26Webuser796 joins
09:40:19Kenshin joins
09:40:26blank_x joins
09:40:44blankie quits [Remote host closed the connection]
09:42:50Webuser79699 quits [Ping timeout: 244 seconds]
09:56:50HP_Archivist quits [Ping timeout: 258 seconds]
10:08:48blank_x quits [Client Quit]
10:09:13blankie (blankie) joins
10:16:11BlueMaxima quits [Read error: Connection reset by peer]
10:29:38Mateon1 quits [Remote host closed the connection]
10:29:54Mateon1 joins
10:50:13hooway_ joins
10:50:13hooway quits [Read error: Connection reset by peer]
10:54:56Wayward quits [Ping timeout: 250 seconds]
10:56:04Wayward (wayward) joins
11:10:09Webuser796 quits [Ping timeout: 244 seconds]
11:18:20blankie quits [Remote host closed the connection]
11:19:36<themadpro>gazorpazorp and orldow6: It's us alright
11:22:59<themadpro>we have been trying to build a subtitle/caption alliance for the past year or so ever since YouTube removed closed captions, and most work so far has been LIMITED to YouTube.
11:23:34Iki joins
11:24:02<themadpro>We could consider adding this to the backlog, but we have got quite a lot of things ahead of us already.
11:24:29<themadpro>Notably, Jopik is working on publishing a bunch of credits he had gathered from Metadata scrapes over the years.
11:27:13Matthww quits [Client Quit]
11:31:32<themadpro>We're active on Discord, but I might as well grab the channel for it on IRC #scc
11:41:44Matthww joins
11:52:13hooway_ quits [Read error: Connection reset by peer]
11:52:13hooway joins
11:53:46blankie (blankie) joins
11:57:35@Fusl quits [Ping timeout: 258 seconds]
11:57:43sonick joins
12:01:34Fusl (Fusl) joins
12:01:34@ChanServ sets mode: +o Fusl
12:02:08<gazorpazorp>Thanks for answering, themadpro. :)
12:18:57LeGoupil quits [Client Quit]
12:21:27Iki quits [Ping timeout: 244 seconds]
12:24:17LeighR (LeighR) joins
12:29:31etnguyen03 (etnguyen03) joins
12:52:37atphoenix_ is now known as atphoenix
13:23:57pcr leaves
13:23:59pcr joins
13:41:09spirit joins
13:42:06Iki joins
13:54:08sonick quits [Remote host closed the connection]
14:03:42blankie quits [Ping timeout: 258 seconds]
14:52:02hooway quits [Read error: Connection reset by peer]
14:52:12hooway joins
14:52:38<@Kaz>so _that's_ why he wanted to know
14:52:40<@Kaz>smh
14:54:24<@EggplantN>kekw
14:54:29<@EggplantN>oh
14:54:34<@EggplantN>kekw man died recently
14:54:36<@EggplantN>did we archive him
14:55:13<Vukky>reddit archive probably contains at least one meme of him
14:57:46<thuban>afaict he did not have a twitter
15:09:28<spirit>SketchTheCow: could you update the description of https://archive.org/details/atomicgamer?tab=about with these two snippets https://pastebin.com/raw/BNs915Ya ? thanks!
15:10:01Arcorann quits [Ping timeout: 258 seconds]
16:02:35Iki quits [Ping timeout: 244 seconds]
16:36:40spirit quits [Client Quit]
16:38:32<etnguyen03>just curious, is atdash.meo.ws no longer public? (trying to see where my workers are)
16:39:42<@EggplantN>For now, no it is not
16:39:51<@EggplantN>people have abused it slightly recently
16:40:05<@EggplantN>I.e. viewing 7 days with 5s refresh, and it’s been causing issues
16:41:40<etnguyen03>okay cool
16:53:32Iki joins
16:56:18<Jake>Epic Games acquired ArtStation, a portfolio, digital assets marketplace, kinda website. They say no changes to branding, and lower fees for their marketplace. https://magazine.artstation.com/2021/04/artstation-is-joining-the-epic-games-family/
17:07:27hilda quits [Client Quit]
17:08:07hilda joins
17:09:50<atphoenix>"no changes" famous last words
18:02:37<Ryz>Jake, atphoenix, launched some archives on ArtStation;
18:03:00<Ryz>Earlier I archived individual peoples' ArtStation accounts during the Activision Blizzard mess earlier this year~
18:04:22<@JAA>EggplantN: Re findmypast.com, login required, so not possible with AB.
18:05:23<@JAA>Also, 'During a Free Access period or Free Weekend, you may access a maximum of 200 Records per 24hr period (Free Access Limit).' per the T&C.
18:26:53<Jake>Ryz: Awesome, thank you very much! :)
18:27:14<Jake>(and yeah, I think ArtStation is quite large, may not be great for AB.)
18:30:38<Ryz>Jake, could maybe have it as a surface grab? Usually companies being acquired, I usually archive the whole websites and their related subdomains and other websites~
18:36:29hooway quits [Read error: Connection reset by peer]
18:36:39hooway joins
18:40:21LeighR quits [Remote host closed the connection]
18:52:44Vukky quits [Client Quit]
18:52:56Vukky joins
19:05:25<Jake>Ryz: Yeah, not sure what the best approach here is! A blogpost in 2018 said they had 3.4m monthly users. (https://magazine.artstation.com/2018/03/artstation-marketplace-alpha/)
19:06:37Gereon (Gereon) joins
19:33:31mls (mls) joins
19:43:08shoghicp (shoghicp) joins
19:49:35<masterX244>could be a case for the warrior
19:50:14<masterX244>(i think we need to do an outlink-crawl (including i.stack.imgur.com) on the stackexchange dumps, too, another valuable source of relevant links there)
19:50:25Stiletto joins
19:51:04<masterX244>might be worth the time to hack a tool together that extracts the URLs from the dump and then does a diff against the last run to find new outlinks to insert into the URLs project
19:54:03<@hook54321>JAA: I'm adding a section on connecting via mobile to what you wrote on https://wiki.archiveteam.org/index.php/Archiveteam:IRC, feel free to change or move it if you think there's a better place.
19:56:22<@JAA>hook54321: Sounds good!
19:58:42<Jake>masterX244: I believe someone does have a tool to extract certain outlinks from large groups of WARCs. (I believe rewby?)
19:59:06<masterX244>stackexchange is a XML dump (already dumped regularly to archive.org)
19:59:13<rewby>I was about to say, SE is xml
19:59:20<rewby>One I'm working with for a uni project actually
19:59:29<rewby>Painful file to work with
19:59:30<masterX244>jake: should be able to quickly hack together a tool for that job
19:59:39<Jake>Ah sorry, didn't realize it was XML.
19:59:46<rewby>Yeah, it's quite interesting
20:00:04<masterX244>did a really ugly crawler recently for the Trackmania exchange to get all track and replay pages
20:00:05<rewby>Basically every detail of the stackexchange platform (and all sites under it) is archived
20:00:38<masterX244>result file is running through my grab-site instance atm and uploaded to archive.org every 50GB
20:02:01<masterX244>pulling down those XMLs now to find the quickest way to process the files
20:02:25<masterX244>(one smaller to my computer and a full dump over to my server, shit internet so main processing is done at server to avoid that bottleneck)
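The extract-then-diff idea discussed above could look roughly like the following. This is only a minimal sketch, not masterX244's actual tool: it assumes the dump is a flat XML file of `<row>` elements whose post HTML sits in a `Body` attribute, and the names `extract_urls` and `new_outlinks` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Rough URL pattern (an assumption): stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(xml_file, attribute="Body"):
    """Stream <row> elements and collect every URL found in one attribute."""
    urls = set()
    # iterparse reads the file incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "row":
            urls.update(URL_RE.findall(elem.get(attribute, "")))
        # Drop the element's parsed data to limit memory use on large dumps.
        elem.clear()
    return urls


def new_outlinks(current, previous):
    """Return URLs seen in this run but not in the previous one, sorted."""
    return sorted(current - previous)
```

Calling `extract_urls` on each dump file and passing the result, together with the URL set saved from the last run, to `new_outlinks` gives the list of links that would still need queueing.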
20:03:14<@OrIdow6>So Aimix-Z is still blocking me
20:03:39<@OrIdow6>I have my suspicions for how they're doing it, not completely sure though
20:03:52<@OrIdow6>Could just be that they're blocking anyone who quickly accesses it
20:04:15<@OrIdow6>Something to work on later: headless browser warriors, maybe using Selenium or whatever
20:04:17<masterX244>preventive crawl or immediate danger atm?
20:04:19<@OrIdow6>*Webdriver
20:04:29<@OrIdow6>Jake: Thanks
20:04:45<Jake>No problem.
20:05:12<@OrIdow6>masterX244: See Deathwatch, technically should have already gone down
20:14:46Vukky quits [Client Quit]
20:18:25Vukky (Vukky) joins
20:21:53<@EggplantN>OrIdow6 can we provide any infra to assist you?
20:21:58<@EggplantN>as in Dual E5v3 with a /23?
20:22:00<@EggplantN>or more?
20:25:21<@HCross>Needs to be Japanese
20:31:14<nyany>oof
20:31:22<nyany>that's a wallet burner
20:33:24<@EggplantN>HCross vultr?
20:33:55<@HCross>Could do, but you wouldn’t be able to bring your own IP with a decent geolocation
20:34:02godane (godane) joins
20:34:07<@EggplantN>ah shit they're fucked geo aren't they
20:34:11<nyany>probably
20:34:23<nyany>Linode is the same way
20:34:51<@EggplantN>linode is in JP?
20:35:20<nyany>Yeah
20:35:39<nyany>When I geolocate the IPs it usually comes up as like Atlanta
20:37:07<@Kaz>AK: bought
20:40:30<thuban>i registered a fresh alternatehistory.com account to use with grab-site, but new accounts require admin approval before you can see the forums that are getting deleted :<
20:40:35NF885 (NF885) joins
20:41:19<thuban>if my shit doesn't get confirmed before i get grab-site set up i might just use my personal account and sit on the results
20:49:22<AK>Enjoy it Kaz, well worth it imo
20:51:27<@EggplantN>what has Kaz bought
20:52:54<@Kaz>staycation
20:53:07<gazorpazorp>Is there someone who does PR for ArchiveTeam? I've been reading articles on Yahoo! Answers shutting down and not one of them mentioned ArchiveTeam or anything related to archiving. Some way to coordinate in contacting writers to edit their page would be a nice thing to set up
20:53:49<Ajay>There are many that do mention AT
20:53:52<gazorpazorp>Or when an article talks about censorship of reddit or whatever we archive - a good reminder would be the Wayback Machine and ways to add to it (via ArchiveTeam)
20:54:10Jake8 (Jake) joins
20:54:28<@EggplantN>yes there is a PR person gazorpazorp
20:54:40Jake quits [Ping timeout: 250 seconds]
20:54:40Jake8 is now known as Jake
20:54:40<@EggplantN>his name is Jason Scott
20:54:49<@EggplantN>aka TextFiles/SketchTheCo_W
20:55:44<masterX244>Jake and rewby: Xml parser rigged. Waiting for the full XMLs arriving at my server.
20:56:11<Jake>Nice
20:56:16<gazorpazorp>That's great, @EggplantN and Ajay. Thanks
21:01:15<thuban>ok, question about grab-site:
21:02:46<thuban>what's the most correct way to get threads only from specific forums? (threads are under generic 'threads' urls, not per-forum.)
21:04:51<masterX244>Enumerate the URLs of all forum thread list pages that you want to get the threads from, then add the forum index URL as an ignore (ignores don't ignore starting URLs) so it doesn't go into other subforums
21:06:07<thuban>oh cool, thanks
21:07:24<thuban>i figured i'd be doing that enumeration, but i wasn't sure whether i'd end up in other fora via miscellaneous ui...
21:08:21<masterX244>or blacklist thread urls, too if you also enumerated all of them, that way both main escape paths are blocked
21:09:55<thuban>the other option i considered would be to enumerate the threads of interest and just use no-parent (since thread pages are children)
21:11:13<thuban>but to enumerate the threads, i'd have to get them from the thread list pages somehow, and it felt silly to do that manually if i could figure out a way to get grab-site to do it for me
21:11:37<masterX244>last time that i needed to do that i hacked together a quick and dirty C# program
21:11:58<thuban>grab-site definitely doesn't have a whitelist mode, right? (ignore everything _except_ /threads/ urls?)
21:12:31<masterX244>regex allows a match-anything-except pattern, but direct links to other threads allow escaping that way
21:12:39<thuban>ah yeah
21:13:21<thuban>though i'm not sure that's as much of a concern
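The "match anything except" trick mentioned above is a negative lookahead. A minimal sketch in plain Python `re` follows; the forum URL layout and the helper name `is_ignored` are assumptions for illustration, and whether grab-site accepts this exact pattern as an ignore is not confirmed here.

```python
import re

# Matches any URL under the (assumed) forum root that does NOT continue
# with "threads/", thanks to the negative lookahead (?!threads/).
IGNORE_EXCEPT_THREADS = re.compile(
    r"^https?://www\.alternatehistory\.com/forum/(?!threads/)"
)


def is_ignored(url):
    """True if the URL would be skipped by the ignore pattern above."""
    return IGNORE_EXCEPT_THREADS.search(url) is not None
```

As noted in the discussion, this only filters by URL shape: a thread page that links directly to a thread in another forum still passes the pattern.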
21:33:15Iki quits [Ping timeout: 244 seconds]
21:33:45superkuh joins
21:35:50NF885 quits [Ping timeout: 244 seconds]
21:44:23<@OrIdow6>EggplantN HCross: Thanks for the offer, right now I'm sort of busy, it is possible that I will be able to bypass the geographic thing with Accept-Language as that seemed to extend the time before I got banned from a Japanese IP
21:44:31<@OrIdow6>Well, I or anyone else
21:46:10<@OrIdow6>Anyhow, at present it's in limbo, where it should have been shut down but hasn't
21:46:20<@OrIdow6>Well, as of a few hours ago
22:06:43Sylirana (Sylirana) joins
22:17:39<thuban>does grab-site --1 (no recursion) disable offsite links?
22:18:06sonick joins
22:19:52<thuban>ugh, wait
22:20:59<thuban>i don't want --1, i want no-parent (like the default archivebot behavior). is that the default for grab-site too?
22:27:40hooway_ joins
22:27:40hooway quits [Read error: Connection reset by peer]
22:28:01<@JAA>thuban: https://github.com/ArchiveTeam/grab-site/blob/132064a24eeedbad2881128f932fca8b0c56ac64/libgrabsite/main.py#L221
22:28:18<@JAA>Yes, --no-parent is the default.
22:28:20<thuban>ty JAA :)
22:29:29<thuban>unfortunately when trying `grab-site --input-file ~/misc/at/ah-urls.txt --igsets=forums --wpull-args=--load-cookies=/tmp/alternatehistory.com_cookies.txt`, i get the following errors:
22:29:47<thuban>"sqlalchemy.exc.InvalidRequestError: Could not evaluate current criteria in Python: "Cannot evaluate Select". Specify 'fetch' or False for the synchronize_session execution option.", followed by "CRITICAL Sorry, Wpull unexpectedly crashed."
22:30:11<@JAA>Which SQLAlchemy version?
22:30:46<thuban>1.4.12
22:31:06hooway_ quits [Read error: Connection reset by peer]
22:31:11<@JAA>Try a 1.3.x version instead. At least standard wpull broke in a number of ways with 1.4.
22:31:26<@JAA>Actually yeah, exactly that error: https://github.com/ArchiveTeam/wpull/issues/463
22:31:39hooway joins
22:31:52<@JAA>(grab-site uses a fork, but it's close enough in this respect I believe.)
22:32:33<thuban>ok, trying again with 1.3.24...
22:33:20<thuban>and it's working :) thanks!
22:34:53<thuban>my one concern is that i'm currently seeing only urls from the input file, not page prerequisites or subsequent pages of threads; are those all queued at the end?
22:36:26<@JAA>Yes, wpull does breadth-first recursion.
22:36:36<thuban>ok, good to know.
22:49:51BlueMaxima joins
23:05:32hooway quits [Client Quit]
23:37:00<@JAA>OrIdow6: You working on Bintray?
23:39:11<@OrIdow6>JAA: The last day or so, no, though there is a semi-working grab script
23:39:23<@OrIdow6>In the sense that it
23:39:34<@OrIdow6>gets the essential data but not the interface stuff
23:39:43<@JAA>I see.
23:39:48<@JAA>I'll try to get some discovery done.
23:39:59<@OrIdow6>I did some already, let me find it
23:41:10<@OrIdow6>Was simple, I just searched for alphanumerical strings on the user search - could not figure out how not to do approximate matching
23:42:01<@OrIdow6>https://transfer.archivete.am/6z4Kw/search.py
23:43:05<@JAA>Yeah, that was more or less what I had in mind as well.
23:43:11<@JAA>Sadly, pagination breaks at 10k.
23:43:54<@OrIdow6>Yeah
23:44:50<@JAA>What queries did you run?
23:45:57<@OrIdow6>Um
23:46:57<@OrIdow6>0a-zo apparently, not sure how that's being sorted
23:49:23<@OrIdow6>https://transfer.archivete.am/RCW9x/run - what ran successfully
23:49:38<@OrIdow6>Judging from the size of stdout
23:50:13<@OrIdow6>https://transfer.archivete.am/qtfr5/stdout_sorted.txt.zstandard - results
23:54:48<@JAA>Huh, didn't run p-z on the second character? I'm seeing results on those.
23:57:54<@OrIdow6>I think I stopped it once it plateaued
23:58:04<@JAA>Ah
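One way to work around a pagination cap in this kind of discovery is to split any query that hits the cap into longer ones. The sketch below is not the linked search.py: `search` is a hypothetical callable returning the user names for a query string, and it assumes exact prefix matching, which the real user search reportedly does not do.

```python
import itertools
import string
from collections import deque

ALPHABET = string.digits + string.ascii_lowercase


def discover(search, cap=10_000, start_length=2, max_length=5):
    """Collect names by querying prefixes, splitting queries that hit the cap."""
    queue = deque(
        "".join(chars) for chars in itertools.product(ALPHABET, repeat=start_length)
    )
    found = set()
    while queue:
        prefix = queue.popleft()
        results = search(prefix)
        found.update(results)
        # A result set at the cap is probably truncated, so retry with
        # every one-character extension of this prefix.
        if len(results) >= cap and len(prefix) < max_length:
            queue.extend(prefix + char for char in ALPHABET)
    return found
```

The `max_length` bound is only there to keep the number of requests finite if some prefix never drops below the cap.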