| 00:04:11 | | pcr leaves |
| 00:19:02 | | aaeaston joins |
| 00:19:22 | | aaeaston is now authenticated as aaeaston |
| 00:26:04 | | aaeaston quits [Client Quit] |
| 00:59:12 | | Mineroboter joins |
| 01:01:42 | | Mineroboter_ quits [Ping timeout: 250 seconds] |
| 01:02:22 | | dm4v_ joins |
| 01:02:41 | | dm4v quits [Read error: Connection reset by peer] |
| 01:02:41 | | dm4v_ is now known as dm4v |
| 01:02:41 | | dm4v is now authenticated as dm4v |
| 01:02:41 | | dm4v quits [Changing host] |
| 01:02:41 | | dm4v (dm4v) joins |
| 01:16:46 | | aaeaston joins |
| 01:16:53 | | aaeaston is now authenticated as aaeaston |
| 01:36:06 | | aaeaston quits [Client Quit] |
| 01:38:51 | | pcr joins |
| 01:56:13 | | lun4 quits [Quit: Ping timeout (120 seconds)] |
| 01:56:35 | | lun4 (lun4) joins |
| 02:30:37 | | blankie (blankie) joins |
| 02:59:11 | | JackThompson joins |
| 03:00:32 | | Jack_Thompson quits [Ping timeout: 258 seconds] |
| 03:05:58 | | Jack_Thompson joins |
| 03:08:12 | | JackThompson quits [Ping timeout: 258 seconds] |
| 03:26:18 | | BlueMaxima joins |
| 03:38:34 | | crispyalice2 quits [Ping timeout: 250 seconds] |
| 03:38:44 | | pcr leaves |
| 03:38:46 | | pcr joins |
| 03:50:32 | | qw3rty_ joins |
| 03:54:12 | | qw3rty__ quits [Ping timeout: 258 seconds] |
| 04:01:36 | | etnguyen03 quits [Client Quit] |
| 04:02:31 | | crispyalice2 (crispyalice2) joins |
| 04:38:17 | | Stilett0 quits [Ping timeout: 258 seconds] |
| 04:42:14 | <@OrIdow6> | Alright, before I read the backlog, I have been gone for ~3 days |
| 04:42:19 | <@OrIdow6> | Maybe it was less, I can't remember |
| 04:43:01 | <@OrIdow6> | Anyhow, aimix-z is thankfully still up (and hopefully will remain so), I have a Japanese proxy that works accessing it, so hopefully I can do that |
| 04:43:33 | <@OrIdow6> | Sorry for sort of vanishing suddenly, and so close to a deadline, hopefully I'll have these things running |
| 04:43:54 | | lennier2 joins |
| 04:46:43 | | lennier1 quits [Ping timeout: 258 seconds] |
| 04:46:46 | | lennier2 is now known as lennier1 |
| 04:49:27 | <@OrIdow6> | gazorpazorp: I recall themadpro looking into something subtitle-related a while ago (or it could have been tech 234 a, I got them mixed up at first) |
| 05:03:20 | <@OrIdow6> | By the way, if anyone knows of a Bintray user that has a small (2-4) number of packages, with a small number of repos, with a small number of versions, that would be nice |
| 05:03:55 | <@OrIdow6> | Because multiple versions tend to make it blow up into thousands of requests, which makes it hard to test |
| 05:13:18 | | blankie quits [Remote host closed the connection] |
| 05:56:18 | | atphoenix_ (atphoenix) joins |
| 05:56:48 | | atphoenix quits [Ping timeout: 250 seconds] |
| 06:38:35 | | blankie (blankie) joins |
| 07:02:53 | | LeGoupil joins |
| 07:10:12 | | Doran (Doranwen) joins |
| 07:10:28 | | Doranwen quits [Ping timeout: 258 seconds] |
| 07:49:16 | | hooway joins |
| 07:53:18 | <Jake> | OrIdow6: I have one with a single package, single repo, and one version. https://bintray.com/lightshed This guy also has 9 repos and 4 packages. https://bintray.com/jaycroaker |
| 07:58:46 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 08:29:55 | | Arcorann (Arcorann) joins |
| 08:39:40 | | britmob joins |
| 08:39:55 | | Webuser796 joins |
| 08:41:42 | | britm0b quits [Ping timeout: 258 seconds] |
| 09:17:07 | | dm4v_ joins |
| 09:17:09 | | dm4v quits [Read error: Connection reset by peer] |
| 09:17:13 | | dm4v_ is now known as dm4v |
| 09:17:25 | | dm4v is now authenticated as dm4v |
| 09:17:25 | | dm4v quits [Changing host] |
| 09:17:25 | | dm4v (dm4v) joins |
| 09:19:09 | | blankie quits [Remote host closed the connection] |
| 09:22:51 | | HP_Archivist (HP_Archivist) joins |
| 09:24:34 | | blankie (blankie) joins |
| 09:26:46 | | blankie quits [Remote host closed the connection] |
| 09:30:47 | | Webuser79699 joins |
| 09:33:32 | | Webuser796 quits [Ping timeout: 244 seconds] |
| 09:37:38 | | Kenshin quits [Quit: ZNC - http://znc.in] |
| 09:37:52 | | blankie (blankie) joins |
| 09:38:31 | | shoghicp quits [Quit: My znc bouncer found a childhood friend and left me all alone, how will I survive now? Again?!] |
| 09:39:26 | | Webuser796 joins |
| 09:40:19 | | Kenshin joins |
| 09:40:26 | | blank_x joins |
| 09:40:44 | | blankie quits [Remote host closed the connection] |
| 09:42:50 | | Webuser79699 quits [Ping timeout: 244 seconds] |
| 09:56:50 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 10:08:48 | | blank_x quits [Client Quit] |
| 10:09:13 | | blankie (blankie) joins |
| 10:16:11 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 10:29:38 | | Mateon1 quits [Remote host closed the connection] |
| 10:29:54 | | Mateon1 joins |
| 10:50:13 | | hooway_ joins |
| 10:50:13 | | hooway quits [Read error: Connection reset by peer] |
| 10:54:56 | | Wayward quits [Ping timeout: 250 seconds] |
| 10:56:04 | | Wayward (wayward) joins |
| 11:10:09 | | Webuser796 quits [Ping timeout: 244 seconds] |
| 11:18:20 | | blankie quits [Remote host closed the connection] |
| 11:19:36 | <themadpro> | gazorpazorp and OrIdow6: It's us alright |
| 11:22:59 | <themadpro> | we have been trying to build a subtitle/caption alliance for the past year or so ever since YouTube removed closed captions, and most work so far has been LIMITED to YouTube. |
| 11:23:34 | | Iki joins |
| 11:24:02 | <themadpro> | We could consider adding this to the backlog, but we have got quite a lot of things ahead of us already. |
| 11:24:29 | <themadpro> | Notably, Jopik is working on publishing a bunch of credits he had gathered from Metadata scrapes over the years. |
| 11:27:13 | | Matthww quits [Client Quit] |
| 11:31:32 | <themadpro> | We're active on Discord, but I might as well grab the channel for it on IRC #scc |
| 11:41:44 | | Matthww joins |
| 11:52:13 | | hooway_ quits [Read error: Connection reset by peer] |
| 11:52:13 | | hooway joins |
| 11:53:46 | | blankie (blankie) joins |
| 11:57:35 | | @Fusl quits [Ping timeout: 258 seconds] |
| 11:57:43 | | sonick joins |
| 12:01:34 | | Fusl (Fusl) joins |
| 12:01:34 | | @ChanServ sets mode: +o Fusl |
| 12:02:08 | <gazorpazorp> | Thanks for answering, themadpro. :) |
| 12:18:57 | | LeGoupil quits [Client Quit] |
| 12:21:27 | | Iki quits [Ping timeout: 244 seconds] |
| 12:24:17 | | LeighR (LeighR) joins |
| 12:29:31 | | etnguyen03 (etnguyen03) joins |
| 12:52:37 | | atphoenix_ is now known as atphoenix |
| 13:23:57 | | pcr leaves |
| 13:23:59 | | pcr joins |
| 13:41:09 | | spirit joins |
| 13:42:06 | | Iki joins |
| 13:54:08 | | sonick quits [Remote host closed the connection] |
| 14:03:42 | | blankie quits [Ping timeout: 258 seconds] |
| 14:52:02 | | hooway quits [Read error: Connection reset by peer] |
| 14:52:12 | | hooway joins |
| 14:52:38 | <@Kaz> | so _that's_ why he wanted to know |
| 14:52:40 | <@Kaz> | smh |
| 14:54:24 | <@EggplantN> | kekw |
| 14:54:29 | <@EggplantN> | oh |
| 14:54:34 | <@EggplantN> | kekw man died recently |
| 14:54:36 | <@EggplantN> | did we archive him |
| 14:55:13 | <Vukky> | reddit archive probably contains at least one meme of him |
| 14:57:46 | <thuban> | afaict he did not have a twitter |
| 15:09:28 | <spirit> | SketchTheCow: could you update the description of https://archive.org/details/atomicgamer?tab=about with these two snippets https://pastebin.com/raw/BNs915Ya ? thanks! |
| 15:10:01 | | Arcorann quits [Ping timeout: 258 seconds] |
| 16:02:35 | | Iki quits [Ping timeout: 244 seconds] |
| 16:36:40 | | spirit quits [Client Quit] |
| 16:38:32 | <etnguyen03> | just curious, is atdash.meo.ws no longer public? (trying to see where my workers are) |
| 16:39:42 | <@EggplantN> | For now, no it is not |
| 16:39:51 | <@EggplantN> | people have abused it slightly recently |
| 16:40:05 | <@EggplantN> | I.e. viewing 7 days with a 5s refresh, and it’s been causing issues |
| 16:41:40 | <etnguyen03> | okay cool |
| 16:53:32 | | Iki joins |
| 16:56:18 | <Jake> | Epic Games acquired ArtStation, a portfolio, digital assets marketplace, kinda website. They say no changes to branding, and lower fees for their marketplace. https://magazine.artstation.com/2021/04/artstation-is-joining-the-epic-games-family/ |
| 17:07:27 | | hilda quits [Client Quit] |
| 17:08:07 | | hilda joins |
| 17:09:50 | <atphoenix> | "no changes" famous last words |
| 18:02:37 | <Ryz> | Jake, atphoenix, launched some archives on ArtStation; |
| 18:03:00 | <Ryz> | Earlier I archived individual peoples' ArtStation accounts during the Activision Blizzard mess earlier this year~ |
| 18:04:22 | <@JAA> | EggplantN: Re findmypast.com, login required, so not possible with AB. |
| 18:05:23 | <@JAA> | Also, 'During a Free Access period or Free Weekend, you may access a maximum of 200 Records per 24hr period (Free Access Limit).' per the T&C. |
| 18:26:53 | <Jake> | Ryz: Awesome, thank you very much! :) |
| 18:27:14 | <Jake> | (and yeah, I think ArtStation is quite large, may not be great for AB.) |
| 18:30:38 | <Ryz> | Jake, could maybe have it as a surface grab? Usually companies being acquired, I usually archive the whole websites and their related subdomains and other websites~ |
| 18:36:29 | | hooway quits [Read error: Connection reset by peer] |
| 18:36:39 | | hooway joins |
| 18:40:21 | | LeighR quits [Remote host closed the connection] |
| 18:52:44 | | Vukky quits [Client Quit] |
| 18:52:56 | | Vukky joins |
| 19:05:25 | <Jake> | Ryz: Yeah, not sure what the best approach here is! A blogpost in 2018 said they had 3.4m monthly users. (https://magazine.artstation.com/2018/03/artstation-marketplace-alpha/) |
| 19:06:37 | | Gereon (Gereon) joins |
| 19:33:31 | | mls (mls) joins |
| 19:43:08 | | shoghicp (shoghicp) joins |
| 19:49:35 | <masterX244> | could be a case for the warrior |
| 19:50:14 | <masterX244> | (i think we need to do an outlink crawl (including i.stack.imgur.com) on the stackexchange dumps, too, another valuable source of relevant links there) |
| 19:50:25 | | Stiletto joins |
| 19:51:04 | <masterX244> | might be worth the time to hack a tool together that extracts the URLs from the dump and then diffs them against the last run to find new outlinks to insert into the URLs project |
| 19:54:03 | <@hook54321> | JAA: I'm adding a section on connecting via mobile to what you wrote on https://wiki.archiveteam.org/index.php/Archiveteam:IRC, feel free to change or move it if you think there's a better place. |
| 19:56:22 | <@JAA> | hook54321: Sounds good! |
| 19:58:42 | <Jake> | masterX244: I believe someone does have a tool to extract certain outlinks from large groups of WARCs. (I believe rewby?) |
| 19:59:06 | <masterX244> | stackexchange is an XML dump (already dumped regularly to archive.org) |
| 19:59:13 | <rewby> | I was about to say, SE is xml |
| 19:59:20 | <rewby> | One I'm working with for a uni project actually |
| 19:59:29 | <rewby> | Painful file to work with |
| 19:59:30 | <masterX244> | jake: should be able to quickly hack together a tool for that job |
| 19:59:39 | <Jake> | Ah sorry, didn't realize it was XML. |
| 19:59:46 | <rewby> | Yeah, it's quite interesting |
| 20:00:04 | <masterX244> | did a really ugly crawler recently for the Trackmania exchange to get all track and replay pages |
| 20:00:05 | <rewby> | Basically every detail of the stackexchange platform (and all sites under it) is archived |
| 20:00:38 | <masterX244> | result file is running through my grab-site instance atm and uploaded to archive.org every 50GB |
| 20:02:01 | <masterX244> | pulling down those XMLs now to find the quickest way to process the files |
| 20:02:25 | <masterX244> | (one smaller to my computer and a full dump over to my server, shit internet so main processing is done at server to avoid that bottleneck) |
| 20:03:14 | <@OrIdow6> | So Aimix-Z is still blocking me |
| 20:03:39 | <@OrIdow6> | I have my suspicions for how they're doing it, not completely sure though |
| 20:03:52 | <@OrIdow6> | Could just be that they're blocking anyone who quickly accesses it |
| 20:04:15 | <@OrIdow6> | Something to work on later: headless browser warriors, maybe using Selenium or whatever |
| 20:04:17 | <masterX244> | preventive crawl or immediate danger atm? |
| 20:04:19 | <@OrIdow6> | *Webdriver |
| 20:04:29 | <@OrIdow6> | Jake: Thanks |
| 20:04:45 | <Jake> | No problem. |
| 20:05:12 | <@OrIdow6> | masterX244: See Deathwatch, technically should have already gone down |
| 20:14:46 | | Vukky quits [Client Quit] |
| 20:18:25 | | Vukky (Vukky) joins |
| 20:21:53 | <@EggplantN> | OrIdow6 can we provide any infra to assist you? |
| 20:21:58 | <@EggplantN> | as in Dual E5v3 with a /23? |
| 20:22:00 | <@EggplantN> | or more? |
| 20:25:21 | <@HCross> | Needs to be Japanese |
| 20:31:14 | <nyany> | oof |
| 20:31:22 | <nyany> | that's a wallet burner |
| 20:33:24 | <@EggplantN> | HCross vultr? |
| 20:33:55 | <@HCross> | Could do, but you wouldn’t be able to bring your own IP with a decent geolocation |
| 20:34:02 | | godane (godane) joins |
| 20:34:07 | <@EggplantN> | ah shit they're fucked geo arent they |
| 20:34:11 | <nyany> | probably |
| 20:34:23 | <nyany> | Linode is the same way |
| 20:34:51 | <@EggplantN> | linode is in JP? |
| 20:35:20 | <nyany> | Yeah |
| 20:35:39 | <nyany> | When I geolocate the IPs it usually comes up as like Atlanta |
| 20:37:07 | <@Kaz> | AK: bought |
| 20:40:30 | <thuban> | i registered a fresh alternatehistory.com account to use with grab-site, but new accounts require admin approval before you can see the forums that are getting deleted :< |
| 20:40:35 | | NF885 (NF885) joins |
| 20:41:19 | <thuban> | if my shit doesn't get confirmed before i get grab-site set up i might just use my personal account and sit on the results |
| 20:49:22 | <AK> | Enjoy it Kaz, well worth it imo |
| 20:51:27 | <@EggplantN> | what has Kaz bought |
| 20:52:54 | <@Kaz> | staycation |
| 20:53:07 | <gazorpazorp> | Is there someone who does PR for ArchiveTeam? I've been reading articles on Yahoo! Answers shutting down and not one of them mentioned ArchiveTeam or anything related to archiving. Some way to coordinate in contacting writers to edit their page would be a nice thing to set up |
| 20:53:49 | <Ajay> | There are many that do mention AT |
| 20:53:52 | <gazorpazorp> | Or when an article talks about censorship of reddit or whatever we archive - a good reminder would be the Wayback Machine and ways to add to it (via ArchiveTeam) |
| 20:54:10 | | Jake8 (Jake) joins |
| 20:54:28 | <@EggplantN> | yes there is a PR person gazorpazorp |
| 20:54:40 | | Jake quits [Ping timeout: 250 seconds] |
| 20:54:40 | | Jake8 is now known as Jake |
| 20:54:40 | <@EggplantN> | his name is Jason Scott |
| 20:54:49 | <@EggplantN> | aka TextFiles/SketchTheCo_W |
| 20:55:44 | <masterX244> | Jake and rewby: Xml parser rigged. Waiting for the full XMLs arriving at my server. |
| 20:56:11 | <Jake> | Nice |
| 20:56:16 | <gazorpazorp> | That's great, @EggplantN and Ajay. Thanks |
| 21:01:15 | <thuban> | ok, question about grab-site: |
| 21:02:46 | <thuban> | what's the most correct way to get threads only from specific forums? (threads are under generic 'threads' urls, not per-forum.) |
| 21:04:51 | <masterX244> | Enumerate the URLs of all forum thread list pages that you want to get the threads from, then add the forum index URL as an ignore (ignores don't ignore starting URLs) so it doesn't go into other subforums |
| 21:06:07 | <thuban> | oh cool, thanks |
| 21:07:24 | <thuban> | i figured i'd be doing that enumeration, but i wasn't sure whether i'd end up in other fora via miscellaneous ui... |
| 21:08:21 | <masterX244> | or blacklist thread urls, too if you also enumerated all of them, that way both main escape paths are blocked |
| 21:09:55 | <thuban> | the other option i considered would be to enumerate the threads of interest and just use no-parent (since thread pages are children) |
| 21:11:13 | <thuban> | but to enumerate the threads, i'd have to get them from the thread list pages somehow, and it felt silly to do that manually if i could figure out a way to get grab-site to do it for me |
| 21:11:37 | <masterX244> | last time that i needed to do that i hacked together a quick and dirty C# program |
| 21:11:58 | <thuban> | grab-site definitely doesn't have a whitelist mode, right? (ignore everything _except_ /threads/ urls?) |
| 21:12:31 | <masterX244> | regex allows a match-anything-except pattern, but direct links to other threads allow escaping that way |
| 21:12:39 | <thuban> | ah yeah |
| 21:13:21 | <thuban> | though i'm not sure that's as much of a concern |
| 21:33:15 | | Iki quits [Ping timeout: 244 seconds] |
| 21:33:45 | | superkuh joins |
| 21:35:50 | | NF885 quits [Ping timeout: 244 seconds] |
| 21:44:23 | <@OrIdow6> | EggplantN HCross: Thanks for the offer, right now I'm sort of busy, it is possible that I will be able to bypass the geographic thing with Accept-Language as that seemed to extend the time before I got banned from a Japanese IP |
| 21:44:31 | <@OrIdow6> | Well, I or anyone else |
| 21:46:10 | <@OrIdow6> | Anyhow, at present it's in limbo, where it should have been shut down but hasn't |
| 21:46:20 | <@OrIdow6> | Well, as of a few hours ago |
| 22:06:43 | | Sylirana (Sylirana) joins |
| 22:17:39 | <thuban> | does grab-site --1 (no recursion) disable offsite links? |
| 22:18:06 | | sonick joins |
| 22:19:52 | <thuban> | ugh, wait |
| 22:20:59 | <thuban> | i don't want --1, i want no-parent (like the default archivebot behavior). is that the default for grab-site too? |
| 22:27:40 | | hooway_ joins |
| 22:27:40 | | hooway quits [Read error: Connection reset by peer] |
| 22:28:01 | <@JAA> | thuban: https://github.com/ArchiveTeam/grab-site/blob/132064a24eeedbad2881128f932fca8b0c56ac64/libgrabsite/main.py#L221 |
| 22:28:18 | <@JAA> | Yes, --no-parent is the default. |
| 22:28:20 | <thuban> | ty JAA :) |
| 22:29:29 | <thuban> | unfortunately when trying `grab-site --input-file ~/misc/at/ah-urls.txt --igsets=forums --wpull-args=--load-cookies=/tmp/alternatehistory.com_cookies.txt`, i get the following errors: |
| 22:29:47 | <thuban> | "sqlalchemy.exc.InvalidRequestError: Could not evaluate current criteria in Python: "Cannot evaluate Select". Specify 'fetch' or False for the synchronize_session execution option.", followed by "CRITICAL Sorry, Wpull unexpectedly crashed." |
| 22:30:11 | <@JAA> | Which SQLAlchemy version? |
| 22:30:46 | <thuban> | 1.4.12 |
| 22:31:06 | | hooway_ quits [Read error: Connection reset by peer] |
| 22:31:11 | <@JAA> | Try a 1.3.x version instead. At least standard wpull broke in a number of ways with 1.4. |
| 22:31:26 | <@JAA> | Actually yeah, exactly that error: https://github.com/ArchiveTeam/wpull/issues/463 |
| 22:31:39 | | hooway joins |
| 22:31:52 | <@JAA> | (grab-site uses a fork, but it's close enough in this respect I believe.) |
| 22:32:33 | <thuban> | ok, trying again with 1.3.24... |
| 22:33:20 | <thuban> | and it's working :) thanks! |
| 22:34:53 | <thuban> | my one concern is that i'm currently seeing only urls from the input file, not page prerequisites or subsequent pages of threads; are those all queued at the end? |
| 22:36:26 | <@JAA> | Yes, wpull does breadth-first recursion. |
| 22:36:36 | <thuban> | ok, good to know. |
| 22:49:51 | | BlueMaxima joins |
| 23:05:32 | | hooway quits [Client Quit] |
| 23:37:00 | <@JAA> | OrIdow6: You working on Bintray? |
| 23:39:11 | <@OrIdow6> | JAA: The last day or so, no, though there is a semi-working grab script |
| 23:39:23 | <@OrIdow6> | In the sense that it |
| 23:39:34 | <@OrIdow6> | gets the essential data but not the interface stuff |
| 23:39:43 | <@JAA> | I see. |
| 23:39:48 | <@JAA> | I'll try to get some discovery done. |
| 23:39:59 | <@OrIdow6> | I did some already, let me find it |
| 23:41:10 | <@OrIdow6> | Was simple, I just searched for alphanumerical strings on the user search - could not figure out how not to do approximate matching |
| 23:42:01 | <@OrIdow6> | https://transfer.archivete.am/6z4Kw/search.py |
| 23:43:05 | <@JAA> | Yeah, that was more or less what I had in mind as well. |
| 23:43:11 | <@JAA> | Sadly, pagination breaks at 10k. |
| 23:43:54 | <@OrIdow6> | Yeah |
| 23:44:50 | <@JAA> | What queries did you run? |
| 23:45:57 | <@OrIdow6> | Um |
| 23:46:57 | <@OrIdow6> | 0a-zo apparently, not sure how that's being sorted |
| 23:49:23 | <@OrIdow6> | https://transfer.archivete.am/RCW9x/run - what ran successfully |
| 23:49:38 | <@OrIdow6> | Judging from the size of stdout |
| 23:50:13 | <@OrIdow6> | https://transfer.archivete.am/qtfr5/stdout_sorted.txt.zstandard - results |
| 23:54:48 | <@JAA> | Huh, didn't run p-z on the second character? I'm seeing results on those. |
| 23:57:54 | <@OrIdow6> | I think I stopped it once it plateaued |
| 23:58:04 | <@JAA> | Ah |
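
The discovery approach discussed above (querying the user search with short alphanumeric strings, with pagination breaking at 10k results) could be sketched roughly as below. This is not the linked search.py, whose contents are not shown in the log. The `search` callable, `make_fake_search`, the `max_len` limit, and the rule of refining a query whenever it returns at least `cap` results are all assumptions made for illustration. It also assumes that a longer query only ever returns a subset of what a shorter one matches, which approximate matching may not guarantee.

```python
import string
from typing import Callable, Iterable, List, Set

ALPHABET = string.ascii_lowercase + string.digits
RESULT_CAP = 10_000  # the log says pagination breaks at 10k


def discover(search: Callable[[str], Iterable[str]],
             alphabet: str = ALPHABET,
             cap: int = RESULT_CAP,
             max_len: int = 4) -> Set[str]:
    """Collect names by querying short strings, refining any query that hits the cap."""
    found: Set[str] = set()
    pending: List[str] = list(alphabet)  # start with every single character
    while pending:
        query = pending.pop()
        results = list(search(query))
        found.update(results)
        # A capped result list is probably truncated, so try longer queries.
        if len(results) >= cap and len(query) < max_len:
            pending.extend(query + ch for ch in alphabet)
    return found


def make_fake_search(names: Iterable[str], cap: int) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for the real search: substring match, truncated at cap."""
    ordered = sorted(names)

    def search(query: str) -> List[str]:
        return [n for n in ordered if query in n][:cap]

    return search


if __name__ == "__main__":
    names = ["alice", "alan", "albert", "bob", "zed9"]
    print(sorted(discover(make_fake_search(names, cap=2), cap=2)))
```

A real run would replace `make_fake_search` with a function that pages through the site's search results. Stopping once new queries stop adding names, as described in the log, would be a reasonable extra cutoff on top of `max_len`.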