00:01:49 | <@JAA> | → #telegrab |
00:02:20 | <nicolas17> | JAA: I believe telegram was just an example |
00:03:21 | <kiska> | eightthree: For my grafana dashboard please limit yourself to "last 24 hours" |
00:03:37 | <@JAA> | Fair |
00:03:51 | <nicolas17> | yes, I used 30 days only to take that screenshot, definitely don't use long periods *and* auto-update |
00:04:34 | <kiska> | nicolas17: If you did that, I will end you |
00:04:51 | <nicolas17> | kiska: https://transfer.archivete.am/inline/5vyqw/screenshot.png was me |
00:05:01 | <kiska> | I saw it |
00:05:26 | <kiska> | Do you need a "todo" graph? |
00:05:34 | <nicolas17> | idk |
00:05:51 | <nicolas17> | I took the existing graph and set min=0 |
00:06:09 | <nicolas17> | and then I realized I had to zoom out a lot to make the line not look entirely flat :D |
00:06:22 | <fireonlive> | last 365 days update every 30s? |
00:06:49 | <kiska> | I don't have data for that period + I will end you |
00:06:58 | <fireonlive> | is that a promise |
00:07:01 | <fireonlive> | :3 |
00:07:17 | <nicolas17> | kiska: https://shipadick.com/collections/ship-a-brick/products/ship-a-brick-custom-message |
00:07:26 | <nicolas17> | get this delivered through his window |
00:18:08 | | Wohlstand (Wohlstand) joins |
00:36:07 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
00:45:15 | <h2ibot> | Mewtwodestroyer edited List of websites excluded from the Wayback Machine (+19, added http://667u.com/. I can't believe dj…): https://wiki.archiveteam.org/?diff=51698&oldid=51697 |
00:57:59 | | icedice (icedice) joins |
00:58:05 | | jasons (jasons) joins |
01:00:18 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51699&oldid=51698 |
01:15:20 | | Wohlstand quits [Client Quit] |
01:25:16 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
01:45:38 | | icedice quits [Client Quit] |
01:57:50 | | jasons quits [Ping timeout: 240 seconds] |
02:18:15 | | lya joins |
02:18:46 | | lya quits [Remote host closed the connection] |
02:18:51 | | lya joins |
02:21:14 | | lya quits [Remote host closed the connection] |
02:40:58 | | pabs quits [Remote host closed the connection] |
02:50:50 | | pabs (pabs) joins |
03:01:50 | | jasons (jasons) joins |
03:17:29 | <fireonlive> | dj rainbow ejaculation please :( |
03:33:03 | <eightthree> | fireonlive: not to mention that Rank (fowl) Sinatra smell |
03:33:43 | <eightthree> | +XD |
03:34:07 | <fireonlive> | remember, clean behind your foreskin! |
03:35:02 | <eightthree> | is AI used, and if not, what is used to remove "undesirable" material from the queue/archive? i.e. how is moderation achieved? |
03:36:37 | <eightthree> | https://wiki.archiveteam.org/index.php/Porn ok... |
03:38:21 | <eightthree> | https://wiki.archiveteam.org/index.php/Special:WhatLinksHere/Porn |
03:38:21 | <eightthree> | > No pages link to Porn. |
03:38:21 | <eightthree> | so that's why I never stumbled on this in the wiki... |
03:40:42 | | Ruthalas593 (Ruthalas) joins |
03:41:03 | <fireonlive> | i dunno if there's a list of them but there's probably a few pages that aren't linked to by anything |
03:41:20 | <fireonlive> | ah, https://wiki.archiveteam.org/index.php?title=Special:LonelyPages&limit=500&offset=0 |
03:41:26 | | Ruthalas59 quits [Read error: Connection reset by peer] |
03:41:27 | | Ruthalas593 is now known as Ruthalas59 |
03:42:25 | <fireonlive> | TIL we once did a press release: https://wiki.archiveteam.org/index.php/Archive_Team_press_releases |
04:00:20 | | jasons quits [Ping timeout: 240 seconds] |
04:29:34 | <fireonlive> | eightthree: unfortunately there is no way to only save gay porn |
04:29:38 | <fireonlive> | so we must just save it all |
04:31:04 | <eightthree> | fireonlive: my human advisors tell me it is recommended to laugh at this joke, hahaha |
04:31:17 | <fireonlive> | ;) |
04:44:08 | | DogsRNice quits [Read error: Connection reset by peer] |
05:03:54 | | jasons (jasons) joins |
05:50:20 | <h2ibot> | Pokechu22 edited Jira (+180, issues.apache.org failed, will need to update…): https://wiki.archiveteam.org/?diff=51700&oldid=51687 |
06:01:19 | | jasons quits [Ping timeout: 272 seconds] |
06:03:51 | | IRC2DC quits [Ping timeout: 272 seconds] |
07:00:41 | | BlueMaxima quits [Read error: Connection reset by peer] |
07:03:38 | <h2ibot> | Hina.K edited URLTeam (+43, /* Alive */): https://wiki.archiveteam.org/?diff=51701&oldid=51653 |
07:04:37 | | jasons (jasons) joins |
07:31:09 | | Arcorann (Arcorann) joins |
08:01:01 | | jasons quits [Ping timeout: 272 seconds] |
08:33:08 | | Overlordz joins |
09:04:17 | | jasons (jasons) joins |
09:15:56 | | jacksonchen666 (jacksonchen666) joins |
09:27:11 | | Lixusboooo joins |
09:27:22 | | Lixusboooo quits [Remote host closed the connection] |
09:36:21 | <jacksonchen666> | re queer.af: IP address is 65.108.48.233, only DNS doesn't resolve anymore (https://tech.lgbt/@ShadowJonathan/111917612478836930). not updating deathwatch yet. |
09:57:09 | | SootBector quits [Remote host closed the connection] |
09:57:32 | | SootBector (SootBector) joins |
10:00:06 | | Bleo18260 quits [Client Quit] |
10:01:30 | | Bleo18260 joins |
10:01:59 | | jasons quits [Ping timeout: 272 seconds] |
10:10:11 | | Island quits [Read error: Connection reset by peer] |
10:21:54 | | jacksonchen666 quits [Client Quit] |
10:26:32 | <@OrIdow6^2> | Re logging, I believe I read that the old logger let you prevent a message from being logged by prefixing it with something, do we have that now? |
10:26:37 | | @OrIdow6^2 is now known as @OrIdow6 |
11:01:34 | <h2ibot> | OrIdow6 edited Fediverse (+591, The history of why we don't archive it): https://wiki.archiveteam.org/?diff=51704&oldid=36059 |
11:01:35 | <h2ibot> | OrIdow6 edited Fediverse (+8, Fix template usage): https://wiki.archiveteam.org/?diff=51705&oldid=51704 |
11:01:36 | <h2ibot> | OrIdow6 edited Fediverse (+15, Fix template usage, part 2): https://wiki.archiveteam.org/?diff=51706&oldid=51705 |
11:02:00 | | igloo22225 quits [Quit: The Lounge - https://thelounge.chat] |
11:02:25 | | igloo22225 (igloo22225) joins |
11:03:35 | <h2ibot> | OrIdow6 edited Mastodon (+139, We don't archive it): https://wiki.archiveteam.org/?diff=51707&oldid=50243 |
11:04:09 | <thuban> | OrIdow6: cf https://wiki.archiveteam.org/index.php?title=Mastodon&diff=prev&oldid=50243 |
11:04:45 | <@OrIdow6> | thuban: Huh, thanks |
11:05:18 | <@OrIdow6> | Have we actually done this in the last year? |
11:05:28 | | jasons (jasons) joins |
11:05:30 | <@OrIdow6> | Or just talked about changing this policy |
11:05:54 | <@OrIdow6> | Mostly I feel some explanation is owed to the curious for why it's in place |
11:06:01 | <thuban> | i don't think so, it's kind of moot due to the technical issues |
11:06:47 | <@OrIdow6> | ..... any compact description of the technical issues that could go up on the wiki? Or you can edit it yourself |
11:07:00 | <@OrIdow6> | ("....." not meant to be mocking) |
11:08:22 | <thuban> | current mastodon doesn't work without js, so archivebot can't handle it, and we don't have a mastodon project (in part due to the historical policy, in part because mastodon devs have made vague noises about fixing it, and in part for the usual reasons) |
11:08:33 | <@OrIdow6> | Ack, will update |
11:08:39 | <@OrIdow6> | Thanks |
11:10:23 | <thuban> | (iirc you can get individual toots as embeds, but that's quite limited obviously) |
11:20:48 | <aninternettroll> | can't you use the API? |
11:21:19 | <aninternettroll> | i guess it would require more work though |
11:23:07 | <thuban> | right, that would be 'a mastodon project', which we haven't done |
11:52:45 | <h2ibot> | OrIdow6 edited Mastodon (-10, No longer a policy, apparently): https://wiki.archiveteam.org/?diff=51711&oldid=51707 |
12:01:50 | | jasons quits [Ping timeout: 240 seconds] |
12:26:52 | <h2ibot> | OrIdow6 edited Fediverse (+396, /* ArchiveTeam and the Fediverse */ Not banned,…): https://wiki.archiveteam.org/?diff=51712&oldid=51706 |
12:37:09 | | Arcorann quits [Ping timeout: 272 seconds] |
12:43:01 | | Wohlstand (Wohlstand) joins |
13:05:35 | | jasons (jasons) joins |
13:06:08 | | ScenarioPlanet (ScenarioPlanet) joins |
13:53:24 | | ell1 (ell) joins |
13:54:45 | | ell1 quits [Client Quit] |
13:55:08 | | ell1 (ell) joins |
14:03:50 | | jasons quits [Ping timeout: 240 seconds] |
14:16:50 | | ^ quits [Ping timeout: 240 seconds] |
14:17:25 | | ^ (^) joins |
14:35:06 | | SootBector quits [Remote host closed the connection] |
14:37:00 | | SootBector (SootBector) joins |
14:42:33 | | SootBector quits [Remote host closed the connection] |
14:43:00 | | SootBector (SootBector) joins |
15:06:05 | | Wohlstand quits [Remote host closed the connection] |
15:07:42 | | jasons (jasons) joins |
15:30:03 | | sdomi quits [Ping timeout: 272 seconds] |
16:05:31 | | jasons quits [Ping timeout: 272 seconds] |
16:23:19 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
16:33:40 | | Darken (Darken) joins |
16:38:12 | <aninternettroll> | How would one start a project? Let's say that I wanted to archive mastodon, what are the requirements? |
16:44:59 | | Ketchup901 quits [Remote host closed the connection] |
16:48:17 | | Ketchup901 (Ketchup901) joins |
16:52:48 | | that_lurker quits [Quit: I am most likely running a system update] |
16:54:01 | <thuban> | aninternettroll: read and understand the code of prior dpos projects (https://wiki.archiveteam.org/index.php?title=Category:DPoS_project), then write your own with an appropriate pipeline.py (receive items from tracker, invoke wget) and <project>.lua (process response data and make additional requests as appropriate |
16:54:03 | <thuban> | https://github.com/ArchiveTeam/wget-lua/wiki#wget-with-lua-hooks) |
16:54:13 | <thuban> | this isn't very legible at present, sorry |
16:55:18 | | that_lurker (that_lurker) joins |
17:03:14 | | zhongfu_ (zhongfu) joins |
17:03:32 | | zhongfu quits [Read error: Connection reset by peer] |
17:08:22 | | jasons (jasons) joins |
17:12:01 | <aninternettroll> | Can I get a bit of a TL;DR? What more is there than to get a giant list of URLs and ask interested people to archive them? |
17:14:13 | | icedice (icedice) joins |
17:23:57 | | icedice quits [Client Quit] |
17:32:21 | | icedice (icedice) joins |
17:36:24 | <thuban> | aninternettroll: most sites that can be handled that way go through archivebot. a dpos is really only necessary when (1) there are excessively strict rate limits (rare) or (2) it's non-trivial to generate the list of urls (common). |
17:36:34 | <thuban> | a conceptually simple 'item' like 'a user' or 'a post' involves potentially many web pages (user info, pages of comments), page assets (images, javascript), and api interactions (more javascript), all of which have to be retrieved for the item to render properly in the wayback machine. |
17:36:39 | <thuban> | it's often not possible to deduce those assets and api calls in advance; they have to be generated dynamically based on what we find in the page data. |
17:36:47 | <thuban> | in addition, many sites don't have a public index of all items. we can't simply get a giant list of all users or all posts; we have to start with what we can find and dynamically 'discover' new items through features like authorship (post->user), timelines (user->post), replies (post->post), and mentions (user->user). |
17:37:46 | <thuban> | the site-specific project code is what does this dynamic generation. does that answer your question? |
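The dynamic discovery thuban describes above can be sketched as a breadth-first walk over item relations. This is a minimal illustration only, not code from any ArchiveTeam project: the `discover` function, the `user:`/`post:` item names, and the `TOY_SITE` data are all hypothetical, and a real DPoS project would do this inside its tracker and Lua hooks rather than in one Python loop.

```python
from collections import deque

def discover(seeds, fetch_links):
    """Breadth-first discovery: start from known items, follow their relations."""
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        item = queue.popleft()
        order.append(item)
        # fetch_links stands in for "retrieve the item and parse what it references"
        for neighbour in fetch_links(item):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order

# Hypothetical toy site: timelines (user->post), authorship (post->user),
# replies (post->post) and mentions are all just edges in one graph.
TOY_SITE = {
    "user:alice": ["post:1"],
    "post:1": ["user:alice", "post:2", "user:bob"],
    "post:2": ["user:bob"],
    "user:bob": ["post:2"],
}

def toy_fetch_links(item):
    return TOY_SITE.get(item, [])

found = discover(["user:alice"], toy_fetch_links)
print(found)
```

Starting from a single known user, the walk reaches every item that is connected to it, which is why no public index of all users or posts is needed.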
17:41:16 | <aninternettroll> | What does DPoS stand for? |
17:44:47 | <thuban> | "Distributed Preservation of Service". https://wiki.archiveteam.org/index.php/DPoS |
17:47:33 | <aninternettroll> | So what would such a DPoS script do for let's say mastodon? Is its goal to find all the relevant links to be archived to make the page load? Could the script just go to the relevant API endpoints and call it a day? |
17:49:15 | <aninternettroll> | Anyway, thanks! I got the answer I needed at least (if you have a big list of URLs go to #archivebot) |
17:49:18 | <imer> | if you want wayback machine playback on archive.org it needs to visit everything a normal page load needs, and then you usually grab any extra info or legacy pages |
17:49:33 | <imer> | ... that might be linked to* |
17:51:11 | <aninternettroll> | How much does archivebot handle? I would assume it can get the pages a page links to (<a href="" />), but I guess it doesn't run javascript to fetch API calls? |
17:51:45 | <imer> | yes |
17:53:11 | | Darken quits [Ping timeout: 272 seconds] |
17:58:52 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
18:03:57 | | jasons quits [Ping timeout: 272 seconds] |
18:04:07 | <fireonlive> | mastodon is broken in AB (and i guess WBM?) now due to the way it is atm |
18:04:27 | <fireonlive> | /embed after a single post url works though |
18:04:41 | <aninternettroll> | "due to the way it is"? Anything in particular? |
18:04:49 | <fireonlive> | but the whole you need js/it's a single-page app(?) thing ruins it i believe |
18:04:59 | <fireonlive> | started with version 4 |
18:05:31 | <fireonlive> | they also refused to add any kind of nojs fallback sadly |
18:05:59 | <aninternettroll> | I thought the wayback machine did run javascript as well |
18:06:27 | <fireonlive> | hmm might just be an AB issue |
18:08:29 | <thuban> | wbm does run javascript, but the javascript and anything it requests have to be in the wbm |
18:08:51 | <thuban> | archivebot does not run javascript and therefore can't make all the relevant api requests |
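To illustrate why a crawler that does not run JavaScript misses the API requests, here is a rough sketch of static link extraction using Python's standard `html.parser`. It is not ArchiveBot's actual code, and both HTML snippets are made-up examples: one ordinary static page and one single-page-app shell.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href/src attribute values, which is all a non-JS crawler can see."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# Hypothetical pages for illustration only.
STATIC_PAGE = '<a href="/about">About</a><img src="/logo.png">'
JS_SHELL = '<div id="app"></div><script src="/app.js"></script>'

print(extract_links(STATIC_PAGE))
print(extract_links(JS_SHELL))
```

For the shell page the only URL found is the script itself; whatever API endpoints that script would call at runtime never appear in the markup, so they are never queued.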
18:10:17 | <fireonlive> | sheesh the wayback machine is in its dialup era today |
18:10:19 | | pseudorizer quits [Quit: ZNC 1.8.2 - https://znc.in] |
18:10:20 | <fireonlive> | https://web.archive.org/web/20240212180822/https://infosec.exchange/@radareorg/111919836786322370 |
18:10:23 | <fireonlive> | just made this |
18:10:44 | <fireonlive> | it doesn’t appear to work correctly |
18:10:59 | | pseudorizer (pseudorizer) joins |
18:11:34 | <fireonlive> | if you toggle save a screenshot some otherwise broken sites work in screenshot only |
18:12:47 | <katia> | is there something like archivebot but with a browser? |
18:13:59 | <pokechu22> | There used to be chromebot but that's been disabled for a while due to it producing broken files (I don't entirely know the details) |
18:14:40 | | rktk quits [Read error: Connection reset by peer] |
18:16:30 | <thuban> | katia: different interface obviously, but that's how spn works (though spn is fucking slammed lately) |
18:17:01 | <fireonlive> | https://github.com/internetarchive/warcprox |
18:17:17 | <fireonlive> | https://github.com/internetarchive/brozzler |
18:18:12 | | rktk (rktk) joins |
18:21:19 | | icedice quits [Client Quit] |
18:24:41 | <DigitalDragons> | I wonder if other fediverse software (Sharkey, Pleroma) have the same problem as mastodon |
18:26:26 | <fireonlive> | hmm. if they’re js hellholes perhaps |
18:26:58 | <@JAA> | Pleroma has been a JS hell from the start. It was a design goal, I think. |
18:28:47 | <katia> | thuban, thinking more wrt. selfhosting |
18:30:05 | | Darken (Darken) joins |
18:30:08 | <@JAA> | katia: Yeah, look at brozzler. |
18:30:28 | <katia> | oh, nice, is this save page now? :D |
18:30:32 | | icedice (icedice) joins |
18:30:51 | <@JAA> | I believe it's the backend for SPN(2), but not entirely sure. |
18:32:00 | <thuban> | it is https://blog.archive.org/2019/10/23/the-wayback-machines-save-page-now-is-new-and-improved/ |
18:32:15 | <DigitalDragons> | https://web.archive.org/web/20240212182740/https://social.digitaldragon.dev/notes/9p1my38vep7b0009 Sharkey (and by extension I assume its upstream Misskey) doesn't seem to work either |
18:32:42 | <@JAA> | Ah, hadn't heard of Sharkey before, but Misskey is also broken, yeah. |
18:33:02 | <@JAA> | Note that in some cases SPN will actually archive all relevant information, but it just won't play back correctly. |
18:34:16 | <@JAA> | The API response behind that Mastodon example above was actually archived: https://web.archive.org/web/20240212180825/https://infosec.exchange/api/v1/statuses/111919836786322370 |
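The two URLs mentioned in this discussion (the `/embed` variant of a post and the `/api/v1/statuses/<id>` response) can both be derived from a post URL. The sketch below is a hedged illustration: the `related_urls` helper is hypothetical, and it assumes the post URL ends in the numeric status ID, as in the infosec.exchange example above.

```python
from urllib.parse import urlsplit

def related_urls(status_url):
    """Derive the embed and API URLs for a Mastodon post URL.

    Assumes the last path segment is the status ID
    (e.g. https://host/@user/<id>); other URL shapes are not handled.
    """
    parts = urlsplit(status_url)
    status_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    base = f"{parts.scheme}://{parts.netloc}"
    return {
        "embed": status_url.rstrip("/") + "/embed",
        "api": f"{base}/api/v1/statuses/{status_id}",
    }

urls = related_urls("https://infosec.exchange/@radareorg/111919836786322370")
print(urls["embed"])
print(urls["api"])
```

A project script would still need to fetch the page assets and any further API calls for playback; this only covers the two URLs named in the conversation.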
19:00:11 | | icedice quits [Client Quit] |
19:00:16 | | jacksonchen666 (jacksonchen666) joins |
19:01:18 | | icedice (icedice) joins |
19:07:08 | | jasons (jasons) joins |
19:14:28 | | Island joins |
19:41:10 | | jacksonchen666 is now authenticated as * |
19:41:10 | | jacksonchen666 is now known as RJHacker58248 |
19:41:12 | | RJHacker58248 quits [Ping timeout: 255 seconds] |
19:41:14 | | jacksonchen666 (jacksonchen666) joins |
19:41:40 | <fireonlive> | oh great |
19:42:36 | <h2ibot> | JacksonChen666 edited Deathwatch (+230, update queer.af: date of shutdown is probably…): https://wiki.archiveteam.org/?diff=51715&oldid=51689 |
19:45:15 | | anarcat quits [Quit: rebooting] |
19:45:55 | | anarcat (anarcat) joins |
20:07:27 | | jasons quits [Ping timeout: 272 seconds] |
20:27:38 | | thuban is now authenticated as thuban |
20:38:16 | | JustMeCorne quits [Remote host closed the connection] |
20:50:47 | | BlueMaxima joins |
20:58:58 | <h2ibot> | Pedrosso edited Swedish Public TV (+302, /* SVT Play */ clarified vital signs): https://wiki.archiveteam.org/?diff=51717&oldid=51638 |
21:10:19 | | jasons (jasons) joins |
21:13:00 | <h2ibot> | Barto edited Votes in Switzerland/2024-03-03 (+821, Add Wallis / Valais): https://wiki.archiveteam.org/?diff=51718&oldid=51695 |
21:13:04 | <Barto> | JAA: ^ |
21:13:10 | <@JAA> | :-) |
21:13:31 | <Barto> | got some 403 in AB though, which is a shame |
21:22:02 | | anarchat (anarcat) joins |
21:22:12 | | anarcat quits [Client Quit] |
21:41:11 | | wyatt8740 quits [Ping timeout: 272 seconds] |
21:42:29 | | wyatt8740 joins |
21:53:03 | | jacksonchen666 quits [Client Quit] |
21:57:39 | | Dango360_ quits [Ping timeout: 272 seconds] |
22:00:54 | | ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
22:01:01 | | ThetaDev joins |
22:05:19 | | Naruyoko quits [Quit: Leaving] |
22:05:50 | | jasons quits [Ping timeout: 240 seconds] |
22:35:14 | | sm joins |
22:35:18 | | Dango360 (Dango360) joins |
22:35:24 | | sm quits [Remote host closed the connection] |
22:45:54 | | anarchat quits [Read error: Connection reset by peer] |
22:46:04 | | anarcat (anarcat) joins |
23:05:24 | <h2ibot> | Barto edited Votes in Switzerland/2024-03-03 (+886, Add Fribourg / Freiburg): https://wiki.archiveteam.org/?diff=51719&oldid=51718 |
23:05:52 | <Barto> | :-) |
23:08:16 | | Wohlstand (Wohlstand) joins |
23:09:27 | | jasons (jasons) joins |
23:18:33 | | Ketchup901 quits [Ping timeout: 255 seconds] |
23:18:52 | | Ketchup901 (Ketchup901) joins |
23:20:26 | <fireonlive> | :3 |
23:21:49 | | Ketchup901 quits [Remote host closed the connection] |
23:22:01 | | Ketchup901 (Ketchup901) joins |
23:52:02 | | Ketchup901 quits [Remote host closed the connection] |
23:52:20 | | Ketchup901 (Ketchup901) joins |