00:21:31 | | Hackerpcs (Hackerpcs) joins |
00:24:24 | | cascode quits [Ping timeout: 260 seconds] |
00:24:43 | | cascode joins |
00:30:19 | | lemuria (lemuria) joins |
01:15:31 | | Webuser123339 joins |
01:18:46 | | Webuser123339 quits [Client Quit] |
01:19:46 | | Wohlstand quits [Quit: Wohlstand] |
01:36:44 | | nicolas17 quits [Ping timeout: 260 seconds] |
01:39:50 | <h2ibot> | Cooljeanius edited Itch.io (+51, Use URL template more; copyedit; add explicit…): https://wiki.archiveteam.org/?diff=56589&oldid=56588 |
01:40:27 | | Island quits [Read error: Connection reset by peer] |
01:40:44 | | Island joins |
01:40:50 | <h2ibot> | Cooljeanius edited Itch.io (-2, Fix heading level for references section): https://wiki.archiveteam.org/?diff=56590&oldid=56589 |
01:46:33 | | nicolas17 joins |
01:49:31 | | cuphead2527480 quits [Quit: Connection closed for inactivity] |
01:51:54 | | cascode quits [Ping timeout: 260 seconds] |
01:52:19 | | cascode joins |
01:59:29 | | cascode quits [Ping timeout: 260 seconds] |
02:02:18 | | cascode joins |
02:02:59 | | nicolas17 quits [Ping timeout: 260 seconds] |
02:03:46 | | cascode quits [Read error: Connection reset by peer] |
02:04:03 | | cascode joins |
02:04:43 | | nicolas17 joins |
02:12:52 | <pokechu22> | All of the data JSONs for itch.io have finished downloading. But it doesn't directly mark NSFW games, only ones that are tagged NSFW (and that's not a complete list; e.g. there are games tagged "porn" but not "nsfw"). I'll try to come up with a list of tags based on wiktionary I guess |
02:16:13 | | etnguyen03 quits [Client Quit] |
02:16:34 | | etnguyen03 (etnguyen03) joins |
02:24:26 | <pokechu22> | ... ok, turns out there are a lot of words in subcategories of https://en.wiktionary.org/wiki/Category:en:Sex which I don't really want to fully understand |
02:25:52 | | BornOn420 quits [Remote host closed the connection] |
02:25:55 | <BlankEclair> | > a pussy and a pulse |
02:25:59 | <BlankEclair> | interesting requirements |
02:26:08 | | nicolas17_ joins |
02:26:26 | | BornOn420 (BornOn420) joins |
02:26:54 | | nicolas17 quits [Ping timeout: 260 seconds] |
02:28:37 | | BornOn420 quits [Max SendQ exceeded] |
02:29:07 | | BornOn420 (BornOn420) joins |
02:31:17 | | BornOn420 quits [Max SendQ exceeded] |
02:31:29 | | flotwig quits [Read error: Connection reset by peer] |
02:31:41 | | flotwig joins |
02:31:48 | | BornOn420 (BornOn420) joins |
02:55:20 | | etnguyen03 quits [Remote host closed the connection] |
02:55:38 | <pokechu22> | ... and now I'm having trouble with the reverse https://en.wikipedia.org/wiki/Scunthorpe_problem where I was picking up https://demonixis.itch.io/mars-extraction due to s-ex becoming sex |
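A minimal sketch of the false positive pokechu22 describes above, assuming it came from stripping separators before searching for keywords. The keyword set and function names here are hypothetical; the real list was built from Wiktionary categories and is not shown in the log. Matching on whole tokens instead of on the squashed string avoids picking up "mars-extraction":

```python
import re

# Hypothetical keyword set for illustration only.
NSFW_WORDS = {"sex", "porn", "nsfw"}


def naive_match(text: str) -> bool:
    """Strip every separator, then substring-search (reproduces the false positive)."""
    squashed = re.sub(r"[^a-z0-9]+", "", text.lower())
    return any(word in squashed for word in NSFW_WORDS)


def token_match(text: str) -> bool:
    """Split on separators and compare whole tokens against the keyword set."""
    tokens = {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}
    return not tokens.isdisjoint(NSFW_WORDS)


if __name__ == "__main__":
    for slug in ("mars-extraction", "some-porn-game"):
        print(slug, naive_match(slug), token_match(slug))
```

Token matching trades one failure mode for another: it will miss keywords glued to other words with no separator, so it may need to be combined with a reviewed exceptions list.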
03:06:07 | | flotwig quits [Client Quit] |
03:07:12 | | flotwig joins |
03:23:02 | | BearFortress_ joins |
03:25:49 | | BearFortress quits [Ping timeout: 260 seconds] |
03:30:29 | | lennier2_ quits [Ping timeout: 260 seconds] |
03:31:48 | | lennier2 joins |
03:54:41 | | GradientCat quits [Quit: Connection closed for inactivity] |
04:03:17 | | Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…] |
04:27:12 | <pokechu22> | I'm running a second list (https://transfer.archivete.am/DVdSZ/itch.io_nsfw_games.txt.zst) with 87267 + 24611 games in #archivebot; this still won't get downloads but we can prioritize potentially-NSFW games. Note that https://web.archive.org/web/20250716040724/https://itch.io/games/nsfw only says there's 28,144 NSFW games so this is a lot of extra ones, but this is the |
04:27:13 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/DVdSZ/itch.io_nsfw_games.txt.zst) |
04:27:14 | <pokechu22> | easiest thing to do |
04:30:30 | | nicolas17_ is now known as nicolas17 |
04:31:10 | | eythian quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.] |
04:32:37 | | eythian joins |
04:38:13 | | Guest58 joins |
04:39:17 | | datechnoman (datechnoman) joins |
04:44:34 | | DogsRNice quits [Ping timeout: 260 seconds] |
04:45:25 | | DogsRNice joins |
04:47:12 | | DogsRNice_ joins |
04:49:17 | | chrismeller8 quits [Quit: chrismeller8] |
04:49:54 | | DogsRNice quits [Ping timeout: 240 seconds] |
04:49:55 | | chrismeller8 (chrismeller) joins |
04:56:43 | | dabs quits [Read error: Connection reset by peer] |
05:24:16 | | DogsRNice_ quits [Read error: Connection reset by peer] |
06:32:22 | | Island quits [Read error: Connection reset by peer] |
06:41:14 | | HP_Archivist quits [Ping timeout: 240 seconds] |
06:42:04 | <gamer191-1|m> | <pokechu22> "I'm running a second list (https..." <- If it’s small enough to run in archivebot, why not just archive all games? |
06:43:40 | <pokechu22> | I've also got one running all games, but they have rate-limiting where running the whole list of games will take probably about a week to just do the initial page HTML (longer to also get images and whatever else it'll discover) |
06:44:10 | <pokechu22> | note that both jobs *won't* be able to get downloads (even for free or name your price games) since those are POST-based and archivebot doesn't do that |
06:57:51 | | awauwa (awauwa) joins |
07:02:30 | <gamer191-1|m> | <pokechu22> "I've also got one running all..." <- Nice |
07:02:46 | <gamer191-1|m> | <pokechu22> "note that both jobs *won't* be..." <- Oh, what’s the plan for those? |
07:04:17 | <gamer191-1|m> | Out of interest, are you archiving the data.json files? (I’m not saying you necessarily should, they probably aren’t worth archiving) |
07:10:23 | <pokechu22> | Yes; I did all of the data.json files split across 6 archivebot !ao < list jobs for speed reasons. They were needed to be able to search all games by tags, given that the whole problem is that the actual search isn't showing NSFW games |
07:11:01 | <pokechu22> | OrIdow6 is looking into handling downloads |
07:13:33 | <pokechu22> | oh, and the data.json files also include prices, so they can be used to identify free/name your price games |
07:23:47 | <h2ibot> | Manu edited Discourse/archived (+87, Queued forums.pimoroni.com): https://wiki.archiveteam.org/?diff=56591&oldid=56458 |
07:30:25 | <BlankEclair> | fyi, some files in a game can be free, and some can be paid |
07:30:27 | <BlankEclair> | https://exodrifter.itch.io/gender-dysphoria |
07:30:38 | <BlankEclair> | binaries are free, ost is >= US$2 |
07:40:23 | <BlankEclair> | unrelated, but in a personal context and a twitter account to burn, what's the best way to archive a twitter account? |
07:45:11 | | HP_Archivist (HP_Archivist) joins |
08:22:56 | <h2ibot> | Exorcism edited Discourse/archived (+275): https://wiki.archiveteam.org/?diff=56592&oldid=56591 |
08:27:45 | <c3manu> | BlankEclair: we currently have none. we had a few nitter instances we could use for that a while ago, but those have stopped working (can't just take any nitter instance, since #archivebot's load+traffic would probably cause an unreasonable amount of costs for whoever is running it) |
08:28:20 | <c3manu> | twitter has done its best to close off the page to crawlers etc. (unless you have that Russian bot subscription or whatever) |
08:28:26 | <BlankEclair> | i probably should've clarified that i'm fine w/ making my own dump ^^; |
08:28:42 | <c3manu> | for now we just collect the accounts in case archiving them becomes possible again: https://pad.notkiska.pw/p/archivebot-twitter |
08:28:44 | <c3manu> | oh, okay |
08:28:58 | <c3manu> | but that's gonna have the same problem pretty much |
08:29:03 | <c3manu> | (i think) |
08:29:48 | <BlankEclair> | > in a personal context and a twitter account to burn |
08:29:49 | <BlankEclair> | :p |
08:31:00 | <c3manu> | how much stuff do you have to archive? |
08:31:51 | <BlankEclair> | just one account, though i don't know how big it is |
08:32:28 | <c3manu> | hm.. i can't think of anything reasonable off the top of my head |
08:33:01 | <BlankEclair> | ouch ^^; |
08:35:15 | <pabs> | the wiki page has a few workarounds, but they aren't very useful for archiving |
08:43:43 | <BlankEclair> | would a self-hosted nitter instance be, while overkill, effective? |
08:48:24 | | pabs quits [Ping timeout: 260 seconds] |
08:48:59 | | HP_Archivist quits [Ping timeout: 260 seconds] |
08:52:02 | | pabs (pabs) joins |
08:54:29 | <PredatorIWD25> | BlankEclair: I have an old, basic extractor config for gallery-dl for Twitter that should archive everything off of someone's wall, including retweets and metadata of the posts. You should definitely double check if it all works properly though and if this solution is enough for you. |
08:54:32 | <PredatorIWD25> | Config: https://pastebin.com/tFTqS19i Example command after you log into your Twitter account on Firefox: gallery-dl --cookies-from-browser firefox --config gallery-dl_config.conf https://x.com/x/ |
08:54:32 | <eggdrop> | nitter: https://nitter.net/x/ |
09:29:14 | | Aoede quits [Ping timeout: 260 seconds] |
09:34:08 | <@OrIdow6> | BlankEclair: Would holding down the end key and using SingleFile be enough for you? |
09:34:28 | | APOLLO03 joins |
09:45:00 | | HP_Archivist (HP_Archivist) joins |
09:48:20 | | APOLLO03a joins |
09:48:30 | | APOLLO03a quits [Client Quit] |
09:49:34 | | APOLLO03 quits [Ping timeout: 240 seconds] |
09:50:36 | | APOLLO03 joins |
10:02:18 | | Hackerpcs quits [Quit: Hackerpcs] |
10:02:36 | | APOLLO03 quits [Client Quit] |
10:05:12 | | APOLLO03 joins |
10:05:27 | <BlankEclair> | OrIdow6: not fully sure how singlefile works, but i don't _think_ that it would save full-res images |
10:12:30 | | APOLLO03 quits [Client Quit] |
10:26:38 | | GradientCat (GradientCat) joins |
10:48:45 | | Dada joins |
10:48:54 | | HP_Archivist quits [Ping timeout: 240 seconds] |
11:00:01 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
11:02:46 | | Bleo182600722719623455222 joins |
11:38:45 | | ineffyble (ineffyble) joins |
11:45:00 | | HP_Archivist (HP_Archivist) joins |
11:57:50 | | etnguyen03 (etnguyen03) joins |
12:02:09 | | cuphead2527480 (Cuphead2527480) joins |
12:38:28 | | APOLLO03 joins |
12:44:36 | | etnguyen03 quits [Client Quit] |
12:48:14 | | HP_Archivist quits [Ping timeout: 240 seconds] |
13:09:29 | <cruller> | How about requesting your Twitter data from https://twitter.com/settings/your_twitter_data/request_data, extracting the URLs of individual tweets from there, and saving them with gallery-dl or something similar? |
13:09:29 | <eggdrop> | nitter: https://nitter.net/settings/your_twitter_data/request_data |
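A rough sketch of the "extract the URLs of individual tweets" step cruller suggests, under assumptions not stated in the log: that the downloaded archive contains a JavaScript file whose content is an assignment such as `window.YTD.tweets.part0 = [...]`, and that each entry carries an `id_str` field, possibly nested under a `tweet` key. The function name and the URL pattern are illustrative; check them against an actual export before relying on this.

```python
import json


def tweet_urls(js_text: str, screen_name: str) -> list[str]:
    """Build status URLs from the text of an archive tweets file.

    Assumes the file is a JS assignment whose right-hand side is a JSON array.
    """
    # Drop the assumed "window.YTD... = " prefix by starting at the first "[".
    entries = json.loads(js_text[js_text.index("["):])
    urls = []
    for entry in entries:
        tweet = entry.get("tweet", entry)  # tolerate both nested and flat entries
        urls.append(f"https://x.com/{screen_name}/status/{tweet['id_str']}")
    return urls


if __name__ == "__main__":
    sample = 'window.YTD.tweets.part0 = [{"tweet": {"id_str": "123"}}]'
    for url in tweet_urls(sample, "example_user"):
        print(url)
```

The resulting list could then be written one URL per line and passed to a downloader such as the gallery-dl setup mentioned earlier in the log.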
13:26:10 | | GradientCat quits [Client Quit] |
13:36:12 | | FiTheArchiver joins |
13:37:45 | | FiTheArchiver quits [Client Quit] |
13:44:58 | | HP_Archivist (HP_Archivist) joins |
14:21:57 | | cuphead2527480 quits [Client Quit] |
14:24:21 | | etnguyen03 (etnguyen03) joins |
14:45:56 | | etnguyen03 quits [Client Quit] |
14:48:14 | | HP_Archivist quits [Ping timeout: 240 seconds] |
14:50:08 | | etnguyen03 (etnguyen03) joins |
15:09:00 | | xkey quits [Quit: WeeChat 4.6.3] |
15:09:10 | | xkey (xkey) joins |
15:35:11 | | grill (grill) joins |
15:44:59 | | HP_Archivist (HP_Archivist) joins |
16:02:59 | | nine quits [Ping timeout: 260 seconds] |
16:04:09 | <h2ibot> | OrIdow6 edited Itch.io (+152, /* Site structure notes */): https://wiki.archiveteam.org/?diff=56593&oldid=56590 |
16:15:34 | | grill quits [Ping timeout: 240 seconds] |
16:17:46 | | grill (grill) joins |
16:22:14 | | nicolas17 quits [Ping timeout: 260 seconds] |
16:23:24 | | nicolas17 joins |
16:27:58 | | nine joins |
16:27:58 | | nine is now authenticated as nine |
16:27:58 | | nine quits [Changing host] |
16:27:58 | | nine (nine) joins |
16:32:07 | | etnguyen03 quits [Client Quit] |
16:48:29 | | HP_Archivist quits [Ping timeout: 260 seconds] |
16:48:33 | | aars quits [Quit: The Lounge - https://thelounge.chat] |
16:51:55 | | GradientCat (GradientCat) joins |
16:52:53 | | etnguyen03 (etnguyen03) joins |
16:53:52 | | aars joins |
17:04:37 | | etnguyen03 quits [Client Quit] |
17:16:23 | | HP_Archivist (HP_Archivist) joins |
17:23:29 | | nine quits [Ping timeout: 260 seconds] |
17:23:42 | | etnguyen03 (etnguyen03) joins |
17:24:19 | | ywaltjs joins |
17:24:20 | <h2ibot> | TriangleDemon edited Colors! (+61): https://wiki.archiveteam.org/?diff=56594&oldid=56577 |
17:25:28 | <ywaltjs> | hey, could someone help me with setting up a custom seesaw kit setup? i did try to build wget-at from the github repo, but i still encounter compatibility issues... is there a link for a fully working prebuilt binary? |
17:26:41 | | Webuser902243 joins |
17:26:46 | <Webuser902243> | Can I write here |
17:26:48 | <Webuser902243> | good |
17:27:21 | <h2ibot> | TriangleDemon edited DeviantArt (+29): https://wiki.archiveteam.org/?diff=56595&oldid=53247 |
17:27:56 | <Webuser902243> | Guys a very old Russian-speaking tech forum is going to die very soon, oszone.net. If someone could archive it in its entirety people including myself would be very grateful |
17:30:00 | <Webuser902243> | c3manu thanks! |
17:31:29 | | egallager quits [Quit: This computer has gone to sleep] |
17:31:38 | | Webuser902243 quits [Client Quit] |
17:33:35 | | awauwa quits [Quit: awauwa] |
17:35:25 | | NomToxic joins |
17:36:06 | | pattiobear joins |
17:42:46 | | NomToxic is now authenticated as NomToxic |
17:45:01 | | Wohlstand (Wohlstand) joins |
17:48:14 | | emily quits [Quit: ZNC 1.10.1 - https://znc.in] |
17:49:14 | | pseudorizer (pseudorizer) joins |
17:55:22 | | Wohlstand quits [Client Quit] |
17:55:36 | | Wohlstand (Wohlstand) joins |
18:05:05 | | nine joins |
18:05:05 | | nine is now authenticated as nine |
18:05:05 | | nine quits [Changing host] |
18:05:05 | | nine (nine) joins |
18:22:31 | | Island joins |
18:29:45 | <nicolas17> | cruller: that only works for saving *your own* account |
18:33:17 | | NomToxic quits [Client Quit] |
18:47:00 | | Shyy4 joins |
18:50:37 | | etnguyen03 quits [Client Quit] |
18:54:29 | | cuphead2527480 (Cuphead2527480) joins |
19:09:04 | | nine quits [Ping timeout: 260 seconds] |
19:11:40 | | FortalezaDelGuerrero (FortalezaDelGuerrero) joins |
19:12:11 | | nine joins |
19:12:11 | | nine is now authenticated as nine |
19:12:11 | | nine quits [Changing host] |
19:12:11 | | nine (nine) joins |
19:12:58 | | FortalezaDelGuerrero leaves |
19:14:55 | | FortalezaDelGuerrero (FortalezaDelGuerrero) joins |
19:16:07 | | Webuser498073 joins |
19:16:19 | | Webuser498073 quits [Client Quit] |
19:17:46 | | pattiobear quits [Client Quit] |
19:18:56 | | egallager joins |
19:24:04 | | nine quits [Client Quit] |
19:24:16 | | nine joins |
19:24:16 | | nine is now authenticated as nine |
19:24:16 | | nine quits [Changing host] |
19:24:16 | | nine (nine) joins |
19:27:45 | <h2ibot> | Cooljeanius edited Colors! (+15, /* Problems */ copyedit): https://wiki.archiveteam.org/?diff=56596&oldid=56594 |
19:29:26 | <ywaltjs> | hey, could someone help me with setting up a custom seesaw kit setup? i did try to build wget-at from the github repo, but i still encounter compatibility issues... is there a link for a fully working prebuilt binary? |
19:29:45 | <h2ibot> | Cooljeanius edited Colors! (+24, /* Site structure */ copyedit): https://wiki.archiveteam.org/?diff=56597&oldid=56596 |
19:39:06 | | Hackerpcs (Hackerpcs) joins |
19:42:25 | | FortalezaDelGuerrero leaves |
19:50:00 | | dabs joins |
20:11:52 | <TheTechRobo> | ywaltjs: There is a Docker image in the wget-lua repository that should work for building it. |
20:16:34 | | grill quits [Ping timeout: 240 seconds] |
20:23:17 | <gamer191-1|m> | Random thought: aren’t “clean connections” mostly unnecessary nowadays due to https? Perhaps there should be an “unclean connection” mode that blocks http traffic (and just fails the job if it attempts to make an http request) since I assume most jobs would still pass and that would remove most of the connection requirements |
20:24:19 | | Wohlstand quits [Client Quit] |
20:28:24 | | Guest58 quits [Read error: Connection reset by peer] |
20:30:01 | | Guest58 joins |
20:31:41 | <TheTechRobo> | I think the policy is more of a "better safe than sorry" thing. Also, plenty of connections do still inject stuff into DNS (and we don't support DNS over HTTPS yet). |
20:32:39 | <katia> | also - things can still be blocked - see .it internet |
20:35:22 | <murb> | katia: is that more a problem with consumer internet connections? |
20:37:48 | <katia> | murb, it's *all* isps |
20:39:42 | <murb> | sounds expensive. |
20:41:06 | <katia> | murb, in order to run an isp in .it you have to sign up for the censorship platform |
20:41:15 | <katia> | and censor what they push within 30 mins |
20:41:20 | <katia> | can be cidrs or hostnames |
20:41:24 | <katia> | they even support ipv6 |
20:41:32 | <murb> | wtf do you do with hostnames? |
20:41:35 | <katia> | dns |
20:42:14 | <katia> | https://github.com/fuckpiracyshield/data/blob/main/Manuale%20Piracy%20Shield%20ISP.pdf |
20:42:40 | <murb> | so you're a small biz, you take a rack or two, a couple of transit feeds, some peering. would you be expected to sign up, given you're not providing much to 3rd parties? |
20:42:48 | <katia> | "Open" state: during this state, providers can access the ticket. The |
20:42:48 | <katia> | duration of a ticket is 30 minutes |
20:42:52 | <murb> | i mean you might not even be a business. |
20:43:01 | <katia> | murb, i guess all your upstreams would implement it already? |
20:43:45 | <murb> | doesn't that cause much breakage? |
20:43:57 | <katia> | sure does |
20:44:01 | <murb> | what if the people who control the DNS for the hostnames are not entirely nice people? |
20:44:31 | <katia> | hmm? |
20:44:45 | <murb> | katia: i make my evil censored hostname resolve to critical infra. |
20:45:01 | <katia> | ah |
20:45:14 | <katia> | do it murb |
20:45:33 | | Wohlstand (Wohlstand) joins |
20:47:15 | | etnguyen03 (etnguyen03) joins |
20:48:57 | <murb> | nice would be a list of the IPs used by the platform by say major ISPs for such resolution. |
20:49:08 | <murb> | so you can give different answers depending on who is asking. |
20:49:13 | <katia> | hehe |
20:49:18 | <murb> | (maybe also consider the TTL) |
20:49:50 | <murb> | and or other fingerprinting. |
20:50:30 | | Guest58 quits [Client Quit] |
20:58:04 | <cruller> | nicolas17: Yes, in any case, enumerating tweets and archiving individual tweets may need to be separated into different steps, I think. |
20:59:54 | | ywaltjs quits [Quit: Ooops, wrong browser tab.] |
21:00:24 | <cruller> | I still clearly remember TheTechRobo saying that the former was the main problem. |
21:01:14 | | GradientCat quits [Quit: Connection closed for inactivity] |
21:03:20 | | sec^nd quits [Remote host closed the connection] |
21:04:51 | <cruller> | I'm wondering how effective https://github.com/webrecorder/browsertrix-behaviors is for enumerating tweets. (I know there is a problem with their WARC writing.) |
21:05:21 | | FortalezaDelGuerrero (FortalezaDelGuerrero) joins |
21:05:47 | | FortalezaDelGuerrero leaves |
21:09:54 | | cascode quits [Ping timeout: 240 seconds] |
21:10:34 | | nine quits [Ping timeout: 240 seconds] |
21:12:19 | | cascode joins |
21:12:26 | | nine joins |
21:12:26 | | nine is now authenticated as nine |
21:12:26 | | nine quits [Changing host] |
21:12:26 | | nine (nine) joins |
21:24:34 | | etnguyen03 quits [Client Quit] |
21:29:15 | | trix quits [Quit: trix] |
21:29:55 | | trix (trix) joins |
21:40:46 | | Larsenv quits [Remote host closed the connection] |
21:41:22 | | Larsenv (Larsenv) joins |
21:43:46 | | etnguyen03 (etnguyen03) joins |
21:49:34 | | nine quits [Ping timeout: 240 seconds] |
21:50:13 | | nine joins |
21:50:13 | | nine is now authenticated as nine |
21:50:13 | | nine quits [Changing host] |
21:50:13 | | nine (nine) joins |
22:29:24 | | etnguyen03 quits [Client Quit] |
22:30:36 | | Webuser123593 joins |
22:35:09 | | Webuser123593 quits [Client Quit] |
22:48:15 | | Dada quits [Remote host closed the connection] |
23:13:22 | | lennier2_ joins |
23:16:24 | | lennier2 quits [Ping timeout: 260 seconds] |
23:41:12 | | GradientCat (GradientCat) joins |
23:48:51 | | JayEmbee (JayEmbee) joins |