00:21:31Hackerpcs (Hackerpcs) joins
00:24:24cascode quits [Ping timeout: 260 seconds]
00:24:43cascode joins
00:30:19lemuria (lemuria) joins
01:15:31Webuser123339 joins
01:18:46Webuser123339 quits [Client Quit]
01:19:46Wohlstand quits [Quit: Wohlstand]
01:36:44nicolas17 quits [Ping timeout: 260 seconds]
01:39:50<h2ibot>Cooljeanius edited Itch.io (+51, Use URL template more; copyedit; add explicit…): https://wiki.archiveteam.org/?diff=56589&oldid=56588
01:40:27Island quits [Read error: Connection reset by peer]
01:40:44Island joins
01:40:50<h2ibot>Cooljeanius edited Itch.io (-2, Fix heading level for references section): https://wiki.archiveteam.org/?diff=56590&oldid=56589
01:46:33nicolas17 joins
01:49:31cuphead2527480 quits [Quit: Connection closed for inactivity]
01:51:54cascode quits [Ping timeout: 260 seconds]
01:52:19cascode joins
01:59:29cascode quits [Ping timeout: 260 seconds]
02:02:18cascode joins
02:02:59nicolas17 quits [Ping timeout: 260 seconds]
02:03:46cascode quits [Read error: Connection reset by peer]
02:04:03cascode joins
02:04:43nicolas17 joins
02:12:52<pokechu22>All of the data JSONs for itch.io have finished downloading. But it doesn't directly mark NSFW games, only ones that are tagged NSFW (and that's not a complete list; e.g. there are games tagged "porn" but not "nsfw"). I'll try to come up with a list of tags based on wiktionary I guess
02:16:13etnguyen03 quits [Client Quit]
02:16:34etnguyen03 (etnguyen03) joins
02:24:26<pokechu22>... ok, turns out there are a lot of words in subcategories of https://en.wiktionary.org/wiki/Category:en:Sex which I don't really want to fully understand
02:25:52BornOn420 quits [Remote host closed the connection]
02:25:55<BlankEclair>> a pussy and a pulse
02:25:59<BlankEclair>interesting requirements
02:26:08nicolas17_ joins
02:26:26BornOn420 (BornOn420) joins
02:26:54nicolas17 quits [Ping timeout: 260 seconds]
02:28:37BornOn420 quits [Max SendQ exceeded]
02:29:07BornOn420 (BornOn420) joins
02:31:17BornOn420 quits [Max SendQ exceeded]
02:31:29flotwig quits [Read error: Connection reset by peer]
02:31:41flotwig joins
02:31:48BornOn420 (BornOn420) joins
02:55:20etnguyen03 quits [Remote host closed the connection]
02:55:38<pokechu22>... and now I'm having trouble with the reverse https://en.wikipedia.org/wiki/Scunthorpe_problem where I was picking up https://demonixis.itch.io/mars-extraction due to s-ex becoming sex
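The false positive above comes from matching against the hyphen-joined slug ("mars-extraction" → "marsextraction", which contains "sex"). A minimal sketch of word-boundary matching that avoids it, with a hypothetical word list standing in for the Wiktionary-derived one:

```python
import re

# Hypothetical stand-in for the Wiktionary-derived word list.
NSFW_WORDS = {"sex", "porn", "nsfw", "hentai"}

def looks_nsfw(slug: str) -> bool:
    """Check a game slug word-by-word rather than as one joined string,
    so 'mars-extraction' (mar-S-EX-traction) is not a false positive."""
    words = re.split(r"[-_]+", slug.lower())
    return any(w in NSFW_WORDS for w in words)

assert looks_nsfw("my-porn-game")
assert not looks_nsfw("mars-extraction")  # joined form would contain "sex"
```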
03:06:07flotwig quits [Client Quit]
03:07:12flotwig joins
03:23:02BearFortress_ joins
03:25:49BearFortress quits [Ping timeout: 260 seconds]
03:30:29lennier2_ quits [Ping timeout: 260 seconds]
03:31:48lennier2 joins
03:54:41GradientCat quits [Quit: Connection closed for inactivity]
04:03:17Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…]
04:27:12<pokechu22>I'm running a second list (https://transfer.archivete.am/DVdSZ/itch.io_nsfw_games.txt.zst) with 87267 + 24611 games in #archivebot; this still won't get downloads but we can prioritize potentially-NSFW games. Note that https://web.archive.org/web/20250716040724/https://itch.io/games/nsfw only says there's 28,144 NSFW games so this is a lot of extra ones, but this is the
04:27:13<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/DVdSZ/itch.io_nsfw_games.txt.zst)
04:27:14<pokechu22>easiest thing to do
04:30:30nicolas17_ is now known as nicolas17
04:31:10eythian quits [Quit: http://quassel-irc.org - Chat comfortabel. Waar dan ook.]
04:32:37eythian joins
04:38:13Guest58 joins
04:39:17datechnoman (datechnoman) joins
04:44:34DogsRNice quits [Ping timeout: 260 seconds]
04:45:25DogsRNice joins
04:47:12DogsRNice_ joins
04:49:17chrismeller8 quits [Quit: chrismeller8]
04:49:54DogsRNice quits [Ping timeout: 240 seconds]
04:49:55chrismeller8 (chrismeller) joins
04:56:43dabs quits [Read error: Connection reset by peer]
05:24:16DogsRNice_ quits [Read error: Connection reset by peer]
06:32:22Island quits [Read error: Connection reset by peer]
06:41:14HP_Archivist quits [Ping timeout: 240 seconds]
06:42:04<gamer191-1|m><pokechu22> "I'm running a second list (https..." <- If it’s small enough to run in archivebot, why not just archive all games?
06:43:40<pokechu22>I've also got one running all games, but they have rate-limiting, so running the whole list of games will probably take about a week just for the initial page HTML (longer to also get images and whatever else it discovers)

06:44:10<pokechu22>note that both jobs *won't* be able to get downloads (even for free or name your price games) since those are POST-based and archivebot doesn't do that
06:57:51awauwa (awauwa) joins
07:02:30<gamer191-1|m><pokechu22> "I've also got one running all..." <- Nice
07:02:46<gamer191-1|m><pokechu22> "note that both jobs *won't* be..." <- Oh, what’s the plan for those?
07:04:17<gamer191-1|m>Out of interest, are you archiving the data.json files (I’m not saying you necessarily should, they probably aren’t worth archiving)
07:10:23<pokechu22>Yes; I did all of the data.json files split across 6 archivebot !ao < list jobs for speed reasons. They were needed to be able to search all games by tags, given that the whole problem is that the actual search isn't showing NSFW games
07:11:01<pokechu22>OrIdow6 is looking into handling downloads
07:13:33<pokechu22>oh, and the data.json files also include prices, so they can be used to identify free/name your price games
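A sketch of the kind of scan over downloaded data.json files described above, flagging tagged-NSFW and free games. The field names ("tags", "price") are assumptions for illustration; the real itch.io data.json schema may differ.

```python
import json

def scan(paths, wanted_tags):
    """Split downloaded data.json files into tag matches and free games.
    Field names here ("tags", "price") are assumed, not confirmed."""
    matched, free = [], []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        tags = {t.lower() for t in data.get("tags", [])}
        if tags & wanted_tags:          # any overlap with the word list
            matched.append(path)
        if data.get("price") == "$0.00":
            free.append(path)
    return matched, free
```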
07:23:47<h2ibot>Manu edited Discourse/archived (+87, Queued forums.pimoroni.com): https://wiki.archiveteam.org/?diff=56591&oldid=56458
07:30:25<BlankEclair>fyi, some files in a game can be free, and some can be paid
07:30:27<BlankEclair>https://exodrifter.itch.io/gender-dysphoria
07:30:38<BlankEclair>binaries are free, ost is >= US$2
07:40:23<BlankEclair>unrelated, but in a personal context and a twitter account to burn, what's the best way to archive a twitter account?
07:45:11HP_Archivist (HP_Archivist) joins
08:22:56<h2ibot>Exorcism edited Discourse/archived (+275): https://wiki.archiveteam.org/?diff=56592&oldid=56591
08:27:45<c3manu>BlankEclair: we currently have none. we had a few nitter instances we could use for that a while ago, but those have stopped working (can't just take any nitter instance, since #archivebot's load+traffic would probably cause an unreasonable amount of costs for whoever is running it)
08:28:20<c3manu>twitter has done its best to close off the page to crawlers etc. (unless you have that Russian bot subscription or whatever)
08:28:26<BlankEclair>i probably should've clarified that i'm fine w/ making my own dump ^^;
08:28:42<c3manu>for now we just collect the accounts in case archiving them gets possible again: https://pad.notkiska.pw/p/archivebot-twitter
08:28:44<c3manu>oh, okay
08:28:58<c3manu>but that's gonna have the same problem pretty much
08:29:03<c3manu>(i think)
08:29:48<BlankEclair>> in a personal context and a twitter account to burn
08:29:49<BlankEclair>:p
08:31:00<c3manu>how much stuff do you have to archive?
08:31:51<BlankEclair>just one account, though i don't know how big it is
08:32:28<c3manu>hm.. i can't think of anything reasonable off the top of my head
08:33:01<BlankEclair>ouch ^^;
08:35:15<pabs>the wiki page has a few workarounds, but they aren't very useful for archiving
08:43:43<BlankEclair>would a self-hosted nitter instance be, while overkill, effective?
08:48:24pabs quits [Ping timeout: 260 seconds]
08:48:59HP_Archivist quits [Ping timeout: 260 seconds]
08:52:02pabs (pabs) joins
08:54:29<PredatorIWD25>BlankEclair: I have an old, basic extractor config for gallery-dl for Twitter that should archive everything off of someone's wall, including retweets and metadata of the posts. You should definitely double check if it all works properly though and if this solution is enough for you.
08:54:32<PredatorIWD25>Config: https://pastebin.com/tFTqS19i Example command after you log into your Twitter account on Firefox: gallery-dl --cookies-from-browser firefox --config gallery-dl_config.conf https://x.com/x/
08:54:32<eggdrop>nitter: https://nitter.net/x/
09:29:14Aoede quits [Ping timeout: 260 seconds]
09:34:08<@OrIdow6>BlankEclair: Would holding down the end key and using SingleFile be enough for you?
09:34:28APOLLO03 joins
09:45:00HP_Archivist (HP_Archivist) joins
09:48:20APOLLO03a joins
09:48:30APOLLO03a quits [Client Quit]
09:49:34APOLLO03 quits [Ping timeout: 240 seconds]
09:50:36APOLLO03 joins
10:02:18Hackerpcs quits [Quit: Hackerpcs]
10:02:36APOLLO03 quits [Client Quit]
10:05:12APOLLO03 joins
10:05:27<BlankEclair>OrIdow6: not fully sure how singlefile works, but i don't _think_ that it would save full-res images
10:12:30APOLLO03 quits [Client Quit]
10:26:38GradientCat (GradientCat) joins
10:48:45Dada joins
10:48:54HP_Archivist quits [Ping timeout: 240 seconds]
11:00:01Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
11:02:46Bleo182600722719623455222 joins
11:38:45ineffyble (ineffyble) joins
11:45:00HP_Archivist (HP_Archivist) joins
11:57:50etnguyen03 (etnguyen03) joins
12:02:09cuphead2527480 (Cuphead2527480) joins
12:38:28APOLLO03 joins
12:44:36etnguyen03 quits [Client Quit]
12:48:14HP_Archivist quits [Ping timeout: 240 seconds]
13:09:29<cruller>How about requesting your Twitter data from https://twitter.com/settings/your_twitter_data/request_data, extracting the URLs of individual tweets from there, and saving them with gallery-dl or something similar?
13:09:29<eggdrop>nitter: https://nitter.net/settings/your_twitter_data/request_data
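For the self-archiving approach suggested above, the export's `data/tweets.js` file can be turned into a URL list for gallery-dl. This is a sketch assuming the usual export format (`window.YTD.tweets.part0 = [...]` with an `id_str` per tweet); verify against your own export before relying on it.

```python
import json

def tweet_urls(tweets_js_path, username):
    """Extract per-tweet URLs from data/tweets.js in a Twitter data export.
    Assumes the file is a JS assignment of a JSON array (unverified)."""
    with open(tweets_js_path, encoding="utf-8") as f:
        text = f.read()
    payload = text[text.index("["):]  # strip the "window.YTD..." prefix
    entries = json.loads(payload)
    return [f"https://x.com/{username}/status/{e['tweet']['id_str']}"
            for e in entries]
```

The resulting list can be fed to gallery-dl one URL at a time.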
13:26:10GradientCat quits [Client Quit]
13:36:12FiTheArchiver joins
13:37:45FiTheArchiver quits [Client Quit]
13:44:58HP_Archivist (HP_Archivist) joins
14:21:57cuphead2527480 quits [Client Quit]
14:24:21etnguyen03 (etnguyen03) joins
14:45:56etnguyen03 quits [Client Quit]
14:48:14HP_Archivist quits [Ping timeout: 240 seconds]
14:50:08etnguyen03 (etnguyen03) joins
15:09:00xkey quits [Quit: WeeChat 4.6.3]
15:09:10xkey (xkey) joins
15:35:11grill (grill) joins
15:44:59HP_Archivist (HP_Archivist) joins
16:02:59nine quits [Ping timeout: 260 seconds]
16:04:09<h2ibot>OrIdow6 edited Itch.io (+152, /* Site structure notes */): https://wiki.archiveteam.org/?diff=56593&oldid=56590
16:15:34grill quits [Ping timeout: 240 seconds]
16:17:46grill (grill) joins
16:22:14nicolas17 quits [Ping timeout: 260 seconds]
16:23:24nicolas17 joins
16:27:58nine joins
16:27:58nine quits [Changing host]
16:27:58nine (nine) joins
16:32:07etnguyen03 quits [Client Quit]
16:48:29HP_Archivist quits [Ping timeout: 260 seconds]
16:48:33aars quits [Quit: The Lounge - https://thelounge.chat]
16:51:55GradientCat (GradientCat) joins
16:52:53etnguyen03 (etnguyen03) joins
16:53:52aars joins
17:04:37etnguyen03 quits [Client Quit]
17:16:23HP_Archivist (HP_Archivist) joins
17:23:29nine quits [Ping timeout: 260 seconds]
17:23:42etnguyen03 (etnguyen03) joins
17:24:19ywaltjs joins
17:24:20<h2ibot>TriangleDemon edited Colors! (+61): https://wiki.archiveteam.org/?diff=56594&oldid=56577
17:25:28<ywaltjs>hey, could someone help me with setting up a custom seesaw kit setup ? i did try to build wget-at from the github repo, but i still encounter compatibility issues... is there a link for the fully working prebuilt binary ?
17:26:41Webuser902243 joins
17:26:46<Webuser902243>Can I write here
17:26:48<Webuser902243>good
17:27:21<h2ibot>TriangleDemon edited DeviantArt (+29): https://wiki.archiveteam.org/?diff=56595&oldid=53247
17:27:56<Webuser902243>Guys a very old Russian-speaking tech forum is going to die very soon, oszone.net. If someone could archive it in its entirety people including myself would be very grateful
17:30:00<Webuser902243>c3manu thanks!
17:31:29egallager quits [Quit: This computer has gone to sleep]
17:31:38Webuser902243 quits [Client Quit]
17:33:35awauwa quits [Quit: awauwa]
17:35:25NomToxic joins
17:36:06pattiobear joins
17:45:01Wohlstand (Wohlstand) joins
17:48:14emily quits [Quit: ZNC 1.10.1 - https://znc.in]
17:49:14pseudorizer (pseudorizer) joins
17:55:22Wohlstand quits [Client Quit]
17:55:36Wohlstand (Wohlstand) joins
18:05:05nine joins
18:05:05nine quits [Changing host]
18:05:05nine (nine) joins
18:22:31Island joins
18:29:45<nicolas17>cruller: that only works for saving *your own* account
18:33:17NomToxic quits [Client Quit]
18:47:00Shyy4 joins
18:50:37etnguyen03 quits [Client Quit]
18:54:29cuphead2527480 (Cuphead2527480) joins
19:09:04nine quits [Ping timeout: 260 seconds]
19:11:40FortalezaDelGuerrero (FortalezaDelGuerrero) joins
19:12:11nine joins
19:12:11nine quits [Changing host]
19:12:11nine (nine) joins
19:12:58FortalezaDelGuerrero leaves
19:14:55FortalezaDelGuerrero (FortalezaDelGuerrero) joins
19:16:07Webuser498073 joins
19:16:19Webuser498073 quits [Client Quit]
19:17:46pattiobear quits [Client Quit]
19:18:56egallager joins
19:24:04nine quits [Client Quit]
19:24:16nine joins
19:24:16nine quits [Changing host]
19:24:16nine (nine) joins
19:27:45<h2ibot>Cooljeanius edited Colors! (+15, /* Problems */ copyedit): https://wiki.archiveteam.org/?diff=56596&oldid=56594
19:29:26<ywaltjs>hey, could someone help me with setting up a custom seesaw kit setup ? i did try to build wget-at from the github repo, but i still encounter compatibility issues... is there a link for the fully working prebuilt binary ?
19:29:45<h2ibot>Cooljeanius edited Colors! (+24, /* Site structure */ copyedit): https://wiki.archiveteam.org/?diff=56597&oldid=56596
19:39:06Hackerpcs (Hackerpcs) joins
19:42:25FortalezaDelGuerrero leaves
19:50:00dabs joins
20:11:52<TheTechRobo>ywaltjs: There is a Docker image in the wget-lua repository that should work for building it.
20:16:34grill quits [Ping timeout: 240 seconds]
20:23:17<gamer191-1|m>Random thought: aren’t “clean connections” mostly unnecessary nowadays due to https? Perhaps there should be an “unclean connection” mode that blocks http traffic (and just fails the job if it attempts to make an http request), since I assume most jobs would still pass and that would remove most of the connection requirements
20:24:19Wohlstand quits [Client Quit]
20:28:24Guest58 quits [Read error: Connection reset by peer]
20:30:01Guest58 joins
20:31:41<TheTechRobo>I think the policy is more of a "better safe than sorry" thing. Also, plenty of connections do still inject stuff into DNS (and we don't support DNS over HTTPS yet).
20:32:39<katia>also - things can still be blocked - see .it internet
20:35:22<murb>katia: is that more a problem with consumer internet connections?
20:37:48<katia>murb, it's *all* isps
20:39:42<murb>sounds expensive.
20:41:06<katia>murb, in order to run an isp in .it you have to sign up for the censorship platform
20:41:15<katia>and censor what they push within 30 mins
20:41:20<katia>can be cidrs or hostnames
20:41:24<katia>they even support ipv6
20:41:32<murb>wtf do you do with hostnames?
20:41:35<katia>dns
20:42:14<katia>https://github.com/fuckpiracyshield/data/blob/main/Manuale%20Piracy%20Shield%20ISP.pdf
20:42:40<murb>so you're a small biz, you take a rack or two, a couple of transit feeds, some peering. would you be expected to sign up, given you're not providing much to 3rd parties?
20:42:48<katia>Stato "Open": durante questo stato, i provider possono accedere al ticket. La
20:42:48<katia>durata di un ticket è di 30 minuti
20:42:52<murb>i mean you might not even be a buisness.
20:43:01<katia>murb, i guess all your upstreams would implement it already?
20:43:45<murb>doesn't that cause much breakage?
20:43:57<katia>sure does
20:44:01<murb>what if the people who control the DNS for the hostnames are not entirely nice people?
20:44:31<katia>hmm?
20:44:45<murb>katia: i make my evil censored hostname resolve to critical infra.
20:45:01<katia>ah
20:45:14<katia>do it murb
20:45:33Wohlstand (Wohlstand) joins
20:47:15etnguyen03 (etnguyen03) joins
20:48:57<murb>nice would be a list of the IPs used by the platform by say major ISPs for such resolution.
20:49:08<murb>so you can give different answers depending on who is asking.
20:49:13<katia>hehe
20:49:18<murb>(maybe also consider the TTL)
20:49:50<murb>and or other fingerprinting.
20:50:30Guest58 quits [Client Quit]
20:58:04<cruller>nicolas17: Yes, in any case, enumerating tweets and archiving individual tweets may need to be separated into different steps, I think.
20:59:54ywaltjs quits [Quit: Ooops, wrong browser tab.]
21:00:24<cruller>I still clearly remember TheTechRobo saying that the former was the main problem.
21:01:14GradientCat quits [Quit: Connection closed for inactivity]
21:03:20sec^nd quits [Remote host closed the connection]
21:04:51<cruller>I'm wondering how effective https://github.com/webrecorder/browsertrix-behaviors is for enumerating tweets. (I know there is a problem with their WARC writing.)
21:05:21FortalezaDelGuerrero (FortalezaDelGuerrero) joins
21:05:47FortalezaDelGuerrero leaves
21:09:54cascode quits [Ping timeout: 240 seconds]
21:10:34nine quits [Ping timeout: 240 seconds]
21:12:19cascode joins
21:12:26nine joins
21:12:26nine quits [Changing host]
21:12:26nine (nine) joins
21:24:34etnguyen03 quits [Client Quit]
21:29:15trix quits [Quit: trix]
21:29:55trix (trix) joins
21:40:46Larsenv quits [Remote host closed the connection]
21:41:22Larsenv (Larsenv) joins
21:43:46etnguyen03 (etnguyen03) joins
21:49:34nine quits [Ping timeout: 240 seconds]
21:50:13nine joins
21:50:13nine quits [Changing host]
21:50:13nine (nine) joins
22:29:24etnguyen03 quits [Client Quit]
22:30:36Webuser123593 joins
22:35:09Webuser123593 quits [Client Quit]
22:48:15Dada quits [Remote host closed the connection]
23:13:22lennier2_ joins
23:16:24lennier2 quits [Ping timeout: 260 seconds]
23:41:12GradientCat (GradientCat) joins
23:48:51JayEmbee (JayEmbee) joins