00:00:42 | | decky_e quits [Read error: Connection reset by peer] |
00:01:36 | | decky_e joins |
00:04:27 | | sec^nd quits [Ping timeout: 255 seconds] |
00:05:14 | | sec^nd (second) joins |
00:08:55 | | zhongfu_ quits [Quit: cya losers] |
00:09:10 | | zhongfu (zhongfu) joins |
00:10:16 | <h2ibot> | Petchea edited Tumblr (+538, /* History */): https://wiki.archiveteam.org/?diff=51760&oldid=51596 |
00:12:16 | <h2ibot> | Petchea edited Tumblr (-43, /* History */): https://wiki.archiveteam.org/?diff=51761&oldid=51760 |
00:17:39 | | wickedplayer494 quits [Remote host closed the connection] |
00:21:10 | | nic8 (nic) joins |
00:23:08 | | nic quits [Ping timeout: 255 seconds] |
00:23:09 | | nic8 is now known as nic |
00:51:05 | | nic quits [Ping timeout: 272 seconds] |
01:02:26 | | nic (nic) joins |
01:06:59 | | kiryu joins |
01:06:59 | | kiryu is now authenticated as kiryu |
01:06:59 | | kiryu quits [Changing host] |
01:06:59 | | kiryu (kiryu) joins |
01:10:23 | | nic quits [Ping timeout: 255 seconds] |
01:13:17 | | etnguyen03 (etnguyen03) joins |
01:17:12 | | nic (nic) joins |
01:38:44 | | nic quits [Ping timeout: 255 seconds] |
01:41:16 | | nic (nic) joins |
01:47:44 | | nic quits [Ping timeout: 255 seconds] |
01:50:41 | | HP_Archivist (HP_Archivist) joins |
01:51:15 | | etnguyen03 quits [Ping timeout: 272 seconds] |
01:51:43 | | Wohlstand quits [Client Quit] |
01:54:59 | | Wohlstand (Wohlstand) joins |
01:56:43 | | etnguyen03 (etnguyen03) joins |
02:54:03 | | Megame (Megame) joins |
03:14:45 | | kiryu_ joins |
03:18:39 | | kiryu quits [Ping timeout: 272 seconds] |
03:19:17 | | nulldata quits [Ping timeout: 272 seconds] |
03:20:53 | <h2ibot> | Megame edited Deathwatch (+140, https://www.neonmob.com/ - Feb 29): https://wiki.archiveteam.org/?diff=51762&oldid=51744 |
03:24:54 | <h2ibot> | PaulWise edited YouTube (+191, link to YouTube Video Finder by TheTechRobo): https://wiki.archiveteam.org/?diff=51763&oldid=51742 |
03:27:28 | | nulldata (nulldata) joins |
03:30:21 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
03:34:40 | | Lord_Nightmare (Lord_Nightmare) joins |
03:36:29 | | jacksonchen666 (jacksonchen666) joins |
03:38:34 | | jacksonchen666 quits [Remote host closed the connection] |
03:39:01 | | jacksonchen666 (jacksonchen666) joins |
03:45:45 | | jacksonchen666 quits [Remote host closed the connection] |
03:46:21 | | jacksonchen666 (jacksonchen666) joins |
03:50:08 | | jacksonchen666 quits [Remote host closed the connection] |
03:50:41 | | jacksonchen666 (jacksonchen666) joins |
03:51:39 | | jacksonchen666 quits [Remote host closed the connection] |
03:52:16 | | jacksonchen666 (jacksonchen666) joins |
03:57:01 | | Wohlstand quits [Client Quit] |
04:01:42 | | jacksonchen666 quits [Remote host closed the connection] |
04:02:09 | | jacksonchen666 (jacksonchen666) joins |
04:05:49 | | jacksonchen666 quits [Remote host closed the connection] |
04:06:18 | | jacksonchen666 (jacksonchen666) joins |
04:08:09 | | jacksonchen666 quits [Remote host closed the connection] |
04:08:33 | | jacksonchen666 (jacksonchen666) joins |
04:14:18 | | BlueMaxima quits [Client Quit] |
04:34:13 | | belthesar quits [Remote host closed the connection] |
04:34:14 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
04:35:40 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
04:37:25 | | qwertyasdfuiopghjkl quits [Excess Flood] |
04:37:37 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
05:05:15 | | Island quits [Read error: Connection reset by peer] |
05:11:34 | | michaelblob quits [Read error: Connection reset by peer] |
05:19:14 | | etnguyen03 quits [Ping timeout: 255 seconds] |
05:22:57 | | DogsRNice quits [Read error: Connection reset by peer] |
05:26:56 | | etnguyen03 (etnguyen03) joins |
05:32:06 | <pabs> | not sure if here or #urlteam for this, but anyway: so the deb.li (Debian short URL service) maintainer says it has 2.5M urls at the moment, and an export would be possible. I guess that's OKish for AB or better for #// (?) however the maintainer asks the best way to keep things updated, like what to do to re-archive updated custom short links and archive new custom/random short links |
05:33:37 | <pabs> | hmm, maybe just a page of recently created/updated links, regularly crawled by #// from urls-sources? |
05:34:21 | <pabs> | /cc JAA arkiver - IIRC urls-sources changes are paused for now |
05:35:02 | <@JAA> | If we care about completeness, that's probably not the ideal solution since there's no reasonable way to verify whether things succeeded through #//, the regular stuff doesn't run all that regularly when we're backlogged, etc. |
05:35:19 | <@JAA> | But #// would be perfect for the redirect targets, probably. |
05:36:23 | <@JAA> | A page of the N last created/updated links would work fine. I could run something with qwarc to retrieve that very regularly. I already run a bunch of things like that. |
05:37:27 | <@JAA> | Could also do the redirects that way, dump out the target URLs, and then throw that into #// periodically. |
05:38:15 | <thuban> | could also have a service regularly submit items to urlteam through backfeed, if we wanted to keep it under that umbrella, right? |
05:38:29 | <thuban> | i guess that wouldn't do anything with the targets though |
05:38:32 | <@JAA> | URLTeam doesn't support such things. It only does auto-generated IDs. |
05:38:37 | <thuban> | oh, huh |
05:38:42 | <@JAA> | And it doesn't produce WARCs. |
05:38:50 | <thuban> | oh, _huh_ |
05:39:00 | <@JAA> | It's an old, special project. :-) |
05:39:03 | <thuban> | til |
05:39:32 | <@JAA> | It would in theory make sense to keep it under that umbrella, except for these little details, yeah. |
05:40:10 | <thuban> | my next question was going to be whether urlteam submitted targets to #// (and if not, whether it should), but i think i can guess the answer now... |
05:41:30 | <@JAA> | No, but datechnoman has been submitting them slowly. |
05:41:45 | <@JAA> | Or at least a subset of them, all would be a *lot* of work. |
05:41:51 | <thuban> | nice |
05:51:08 | | Ruthalas590 (Ruthalas) joins |
05:51:35 | | Ruthalas59 quits [Quit: Ping timeout (120 seconds)] |
05:51:35 | | Ruthalas590 is now known as Ruthalas59 |
06:04:02 | <pabs> | snapshot.debian.org improvement planning: https://lists.debian.org/msgid-search/87ttm2gxit.fsf@nordberg.se |
06:04:12 | <pabs> | 2) find new main storage |
06:04:19 | <pabs> | - 1 x physical server with > 150T (potentially slow) disk for storage |
06:04:23 | | etnguyen03 quits [Remote host closed the connection] |
06:11:48 | <@arkiver> | that's a whole lot of snapshots :P |
06:20:27 | | Arcorann (Arcorann) joins |
06:21:28 | <h2ibot> | Petchea edited Tumblr (+404, /* History */): https://wiki.archiveteam.org/?diff=51764&oldid=51761 |
06:33:34 | <@arkiver> | pabs: yeah if you have a list of URLs that were shortened, they would be very welcome in #// ! |
06:34:29 | | TastyWiener9541 (TastyWiener95) joins |
06:35:37 | | TastyWiener954 quits [Ping timeout: 272 seconds] |
06:35:37 | | TastyWiener9541 is now known as TastyWiener954 |
06:35:48 | <pabs> | hmm what about the short URLs, ie not the redirect targets. guess that's AB only, but it will also get the redirect targets, so... |
06:36:20 | <pabs> | arkiver: snapshot.d.o grows at 8TB per year |
06:36:50 | <@JAA> | pabs: Hence why I suggested qwarc. :-) |
06:37:16 | <pabs> | oh for the initial dump too? |
06:37:32 | <pabs> | thought you meant for the updates |
06:38:06 | <@JAA> | Sure, it's easy enough. |
06:40:30 | <pabs> | cool |
06:54:34 | <h2ibot> | Petchea edited Deathwatch (+131, /* 2024 */ Manyland): https://wiki.archiveteam.org/?diff=51765&oldid=51762 |
06:54:35 | <h2ibot> | Petchea edited Deathwatch (-33, /* 2024 */ removed the old entry): https://wiki.archiveteam.org/?diff=51766&oldid=51765 |
07:22:19 | | BearFortress joins |
07:26:15 | | Megame quits [Client Quit] |
07:28:49 | | Barto quits [Ping timeout: 272 seconds] |
07:29:06 | | Barto (Barto) joins |
07:31:59 | | superkuh quits [Ping timeout: 255 seconds] |
07:31:59 | | Doomaholic quits [Ping timeout: 255 seconds] |
07:35:17 | | jacksonchen666 quits [Remote host closed the connection] |
07:35:17 | | sec^nd quits [Remote host closed the connection] |
07:36:09 | | jacksonchen666 (jacksonchen666) joins |
07:36:11 | | sec^nd (second) joins |
07:41:10 | | sec^nd quits [Remote host closed the connection] |
07:41:39 | | sec^nd (second) joins |
08:00:03 | | jacksonchen666 quits [Remote host closed the connection] |
08:00:03 | | sec^nd quits [Remote host closed the connection] |
08:00:28 | | sec^nd (second) joins |
08:00:32 | | jacksonchen666 (jacksonchen666) joins |
08:12:18 | | sec^nd quits [Remote host closed the connection] |
08:12:34 | | sec^nd (second) joins |
08:17:41 | | eyes joins |
08:38:34 | | superkuh joins |
08:42:55 | | mcint quits [Ping timeout: 272 seconds] |
08:43:16 | | mcint (mcint) joins |
09:12:47 | | Doranwen quits [Ping timeout: 255 seconds] |
09:14:44 | | qwertyasdfuiopghjkl quits [Client Quit] |
09:25:21 | | Doomaholic (Doomaholic) joins |
09:32:43 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
09:49:05 | | cas joins |
09:50:18 | <cas> | https://www.intanibase.com/iad_general/financial.aspx I found out today that this site is in financial trouble. Afaik there isn't a specific date as to when it's shutting down, so I presume it's not at the point of shutting down for good; however, its state appears to remain precarious as it still keeps this announcement on display |
09:51:05 | <@JAA> | cas: Thanks, I'll throw it into ArchiveBot. |
09:51:27 | <cas> | No problem. The announcement was made at a far earlier date than when I first found this, so Idk how long they've been in trouble for. Regardless, they appear to still be in trouble as they've chosen to keep it on display |
09:53:11 | <cas> | thanks again for your quick input, lol, seems like im posting at a lucky time haha |
09:54:36 | <@JAA> | Oh right, it's ASP.NET, so depending on how the different sections of the site work, a simple recursive crawl might not be sufficient. Might look into that later. |
09:56:14 | <@JAA> | When something gets posted here, someone will pick it up eventually, but yeah, it usually takes a bit longer than 47 seconds. lol |
10:00:02 | | Bleo18260 quits [Client Quit] |
10:01:19 | | Bleo18260 joins |
10:04:59 | | Gereon quits [Ping timeout: 255 seconds] |
10:05:10 | <cas> | lol lucky me then hah |
10:05:19 | <cas> | anywho, Im gonna go tho, thanks for responding to me JAA |
10:05:22 | | cas quits [Remote host closed the connection] |
10:15:42 | | Gereon (Gereon) joins |
10:35:34 | <datechnoman> | Yes i do feed URLTeam urls into #// but only when the queue is running low which is hardly ever |
10:35:56 | <datechnoman> | I keep them all tracked in an excel document, all the items that have been submitted, how many, when etc |
11:20:43 | | BearFortress quits [Client Quit] |
11:21:13 | | Barto quits [Read error: Connection reset by peer] |
11:21:23 | | Barto (Barto) joins |
11:32:01 | | Barto quits [Ping timeout: 272 seconds] |
11:32:32 | | Barto (Barto) joins |
11:53:26 | | michaelblob (michaelblob) joins |
11:58:13 | | decky_e quits [Read error: Connection reset by peer] |
11:58:59 | | decky_e joins |
12:46:27 | | jacksonchen666 quits [Client Quit] |
12:46:40 | | jacksonchen666 (jacksonchen666) joins |
12:54:21 | | Arcorann quits [Ping timeout: 272 seconds] |
13:54:09 | | beastbg8 quits [Read error: Connection reset by peer] |
14:01:59 | | etnguyen03 (etnguyen03) joins |
14:05:48 | | beastbg8 (beastbg8) joins |
14:09:29 | | decky joins |
14:10:40 | | Ketchup901 quits [Quit: No Ping reply in 180 seconds.] |
14:11:37 | | decky_e quits [Ping timeout: 272 seconds] |
14:12:02 | | Ketchup901 (Ketchup901) joins |
14:13:35 | | eyes quits [Client Quit] |
14:22:06 | | Ketchup901 quits [Remote host closed the connection] |
14:22:17 | | Ketchup901 (Ketchup901) joins |
14:25:44 | | decky quits [Read error: Connection reset by peer] |
14:28:14 | | nepeat quits [Ping timeout: 255 seconds] |
14:30:29 | | etnguyen03 quits [Ping timeout: 255 seconds] |
14:33:02 | | igloo22225 quits [Client Quit] |
14:33:15 | | igloo22225 (igloo22225) joins |
14:33:47 | | nepeat (nepeat) joins |
14:33:57 | | decky_e joins |
14:34:34 | | decky_e quits [Read error: Connection reset by peer] |
14:36:11 | | igloo22225 quits [Client Quit] |
14:36:22 | | igloo22225 (igloo22225) joins |
14:37:52 | | etnguyen03 (etnguyen03) joins |
14:42:01 | | mcint quits [Ping timeout: 272 seconds] |
14:42:21 | | mcint (mcint) joins |
14:42:39 | | igloo22225 quits [Ping timeout: 272 seconds] |
14:43:08 | | igloo22225 (igloo22225) joins |
14:45:20 | | etnguyen03 quits [Ping timeout: 255 seconds] |
14:48:21 | | nepeat quits [Ping timeout: 272 seconds] |
14:48:21 | | dave quits [Ping timeout: 272 seconds] |
14:58:26 | | decky_e joins |
14:58:43 | | etnguyen03 (etnguyen03) joins |
15:03:35 | | nepeat (nepeat) joins |
15:04:03 | | dave (dave) joins |
15:35:03 | | katocala quits [Read error: Connection reset by peer] |
15:35:16 | | katocala (katocala) joins |
15:40:44 | | ell quits [Client Quit] |
15:41:33 | | ell (ell) joins |
16:28:28 | <balrog> | https://bsky.app/profile/friede.bsky.social/post/3klzdhhullt2n -- reputable reports that vice.com may be going down imminently |
16:28:41 | <balrog> | looks like there's a may 2023 fullsite AB grab |
16:32:01 | | yawkat quits [Read error: Connection reset by peer] |
16:33:00 | <fireonlive> | https://bsky.app/profile/socialismforall.bsky.social/post/3klzec7t2pm2e < 🤦🏻‍♂️ |
16:34:43 | <balrog> | ugh yeah I know right |
16:36:00 | <nicolas17> | "entire site" = front page? |
16:44:03 | <nicolas17> | fireonlive: is that only the front page? |
16:44:34 | <nicolas17> | if so, anyone with a bsky account to go correct them? |
16:46:23 | | etnguyen03 quits [Ping timeout: 255 seconds] |
16:53:35 | <fireonlive> | nicolas17: looks like it yeah |
17:06:17 | | yawkat (yawkat) joins |
17:21:08 | <fireonlive> | i guess we should probably throw all vice youtube channels in down the tube too eh |
17:25:42 | <kdy> | @nicolas17 : sure, I replied |
17:27:14 | <joepie91|m> | balrog: did the fullsite grab include the local versions? like the NL one |
17:35:42 | <balrog> | joepie91|m: I wouldn't know, maybe Ryz would |
17:41:09 | | AramZS joins |
17:43:03 | <joepie91|m> | I do feel like a new fullsite grab would be a good idea |
17:43:08 | <AramZS> | It appears that vice.com is in danger of being shut down - https://twitter.com/janusrose/status/1760683182389956757 |
17:43:09 | <eggdrop> | nitter: https://farside.link/nitter/janusrose/status/1760683182389956757 |
17:43:10 | <joepie91|m> | given their pace of publication |
17:44:03 | <AramZS> | Ah yeah, someone else is on here noting this. I agree, considering pace of publication it would likely be good to make sure the archive has a latest update |
17:49:20 | <@JAA> | The job from last year did include all languages, but note that it took half a year to complete. |
17:49:45 | <@JAA> | But that was with outlinks. We can skip those this time, I think. |
17:50:27 | <balrog> | (fwiw nitter is pretty much dead) |
17:50:46 | <@JAA> | Yeah, two more days or so until it will be completely dead. |
17:51:49 | <joepie91|m> | yeah outlinks don't seem like the priority this time |
17:52:10 | <joepie91|m> | perhaps we can also skip any articles pre 2023? |
17:52:17 | <joepie91|m> | as those are presumably already archived |
17:52:19 | <balrog> | There doesn't seem to be an easy way to skip based on date |
17:52:23 | <balrog> | err, based on URL |
17:52:36 | <joepie91|m> | oh right, they don't have the date in the URL... annoying |
17:54:10 | <AramZS> | Not sure if it is helpful, but you could use their sitemap to control by date - https://www.vice.com/en/sitemap/ |
17:54:23 | <AramZS> | Their robots.txt has sitemaps for each language they support too |
17:54:24 | | BearFortress joins |
17:55:52 | <@JAA> | We could do an !ao < job with just the articles from the recent sitemaps, yeah. A recursive job would still recurse to older parts of the site. |
17:56:06 | <@JAA> | (Of course, we can also do both.) |
18:02:33 | <AramZS> | Looks like 2023 onward is covered by the sitemap starting at https://www.vice.com/en/sitemap/articles?before=1668784610395&after=1664802000000 and going to the top |
18:03:44 | <AramZS> | Sorry, starting after that one I mean, at https://www.vice.com/en/sitemap/articles?before=1674148955507&after=1668787492681 |
18:06:39 | <joepie91|m> | note that that is only /en/ |
18:06:45 | <joepie91|m> | does not include /nl/ for example |
18:07:54 | <@JAA> | I'm dealing with it. |
18:08:03 | <joepie91|m> | ok, :) |
18:08:43 | <AramZS> | Yeah, robots.txt has sitemaps for each language - https://www.vice.com/ar/sitemap/, https://www.vice.com/de/sitemap/, etc... |
18:08:56 | | lennier2_ joins |
18:10:49 | <@JAA> | I'm skipping over all sitemaps with a 'before' date before when last year's AB job started (1683020669). |
18:12:17 | | lennier2 quits [Ping timeout: 272 seconds] |
18:12:55 | <AramZS> | :+1: |
18:13:49 | <@JAA> | 15654 URLs in those newer sitemaps. Not too bad. |
18:15:52 | <@JAA> | For the record: curl -s https://www.vice.com/robots.txt | grep sitemap | sed 's,^.* ,,' | xargs curl -sv -w $'\n' 2> >(grep '^> GET\|^< HTTP' >&2) | grep -Po '<loc>\K[^<]*' | sed 's,&amp;,\&,g' | perl -pe 's,^(.*[?&]before=(\d+).*)$,\2 \1,' | awk '$1 < 1683020669000 { next; } {print $2}' | xargs curl -sv -w $'\n' 2> >(grep '^> GET\|^< HTTP' >&2) | grep -Po '<loc>\K[^<]*' |
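The date filter in the middle of that pipeline (the perl and awk steps) can be sketched in Python. This is a minimal illustration, not the command that was actually run: the function name `is_newer_sitemap` is hypothetical, the third URL in the list is made up for the example, and dropping URLs that have no `before` parameter is an assumption. The cutoff and the first two URLs come from the log above.

```python
from urllib.parse import urlparse, parse_qs

# Start of last year's ArchiveBot job in Unix seconds (value quoted in the log).
CUTOFF_SECONDS = 1683020669


def is_newer_sitemap(url, cutoff_seconds=CUTOFF_SECONDS):
    """Return True if the sitemap's 'before' bound (milliseconds) is at or after the cutoff."""
    query = parse_qs(urlparse(url).query)
    before = query.get("before")
    if not before:
        # Assumption for this sketch: sitemap URLs without 'before' are dropped.
        return False
    # The sitemap parameters are in milliseconds, the cutoff is in seconds.
    return int(before[0]) >= cutoff_seconds * 1000


sitemaps = [
    "https://www.vice.com/en/sitemap/articles?before=1668784610395&after=1664802000000",
    "https://www.vice.com/en/sitemap/articles?before=1674148955507&after=1668787492681",
    # Hypothetical URL, only here to show a sitemap that passes the filter.
    "https://www.vice.com/en/sitemap/articles?before=1690000000000&after=1683020669000",
]
newer = [u for u in sitemaps if is_newer_sitemap(u)]
```

Here `newer` keeps only the last entry, since the first two sitemaps end before the earlier job started.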
18:19:38 | <@JAA> | First attempt got only 404s for some reason, running fine now. |
18:21:10 | <fireonlive> | JAA++ |
18:21:10 | <eggdrop> | [karma] 'JAA' now has 21 karma! |
18:26:25 | | IDK (IDK) joins |
18:30:36 | <balrog> | https://www.mediaite.com/media/its-apocalyptic-vice-staffers-brace-for-worst-after-anonymous-tip-warns-site-will-be-deleted-entirely/ |
18:40:50 | | etnguyen03 (etnguyen03) joins |
18:52:49 | | eyes joins |
19:06:10 | | Island joins |
19:16:06 | | Maika (Maika) joins |
19:51:17 | | Island_ joins |
19:53:17 | <that_lurker> | does someone remember how big the last Vice grab was |
19:54:15 | | Island quits [Ping timeout: 272 seconds] |
20:01:41 | | etnguyen03 quits [Ping timeout: 255 seconds] |
20:13:54 | <AK> | Oh god, not Vice again 😆 |
20:15:43 | <AK> | that_lurker, yes: https://archive.fart.website/archivebot/viewer/job/202305020944293m7tt |
20:16:04 | <AK> | Absolutely massive is the answer |
20:16:32 | <that_lurker> | Oh god. All I remember was that it took a while |
20:17:16 | <AK> | I seem to have this great habit of starting a massive AB job and then running nothing again for another 6 months |
20:17:46 | <@JAA> | Yeah, but that run included offsite links, and that usually increases the size ten-fold. |
20:18:50 | <AK> | True true, it makes me wish for that hypothetical state where offsite links can get thrown to #// to be done over time, where the main onsite is still asap in ab |
20:19:46 | <@JAA> | I'm kind of doing that manually currently (and will do so for the rerun), but yeah. |
20:23:03 | <AK> | Ahh yeah I forget you have magical ways to get those links |
20:23:11 | <@JAA> | :-) |
20:32:42 | <that_lurker> | Did AB grab .zip files from sites as well? |
20:33:03 | <@JAA> | Example? |
20:33:34 | | etnguyen03 (etnguyen03) joins |
20:33:52 | <that_lurker> | https://download.highonandroid.com/ could use a complete grab, but it seems to be runnin on a potato |
20:34:04 | <that_lurker> | s/runnin/ruinning |
20:34:25 | <that_lurker> | (╯°□°)╯︵ ┻━┻ typing is hard it seems :-P |
20:35:04 | <that_lurker> | but yeah, the site has for example to roms for oneplus that oneplus no longer supplies |
20:35:48 | <@JAA> | Oh, I thought you were talking about VICE. |
20:36:02 | <@JAA> | Nice expired cert there. |
20:36:38 | <@JAA> | Doesn't look very potato-y to me. |
20:37:07 | <that_lurker> | Sometimes its somewhat fast, but I think the one who is hosting this is running it at home |
20:37:52 | <that_lurker> | not 100% sure on that, but it is really slow at times. |
20:40:00 | <that_lurker> | and now that im looking at it. It seems the whole highonandroid.com has become inactive. Forums are down already it seems |
20:40:05 | | IDK quits [Client Quit] |
20:48:51 | <@JAA> | Forums work here (with an expired cert) but are very slow. |
20:49:23 | <@JAA> | Oh, zedomax, that rings a bell. |
20:57:09 | <@JAA> | Main site started, I'll throw them in slowly as they finish to not overload the potato. |
20:58:15 | <@JAA> | Lots of dead links to the now-parked stockroms.net domain. |
20:59:16 | <that_lurker> | 👍thanks <3 |
21:01:11 | <@JAA> | The !ao < job with new VICE articles is done with the initial list, so at least the most important parts should be mostly covered. |
21:03:37 | <fireonlive> | :) |
21:05:13 | | IDK (IDK) joins |
21:13:25 | <thuban> | i asked this in #archivebot a while back but i think it got lost in the shuffle: what are people's thoughts on running an archiveteam nitter using an actual twitter account (either with the privacydev fork or using account tokens as guest tokens)? https://github.com/zedeus/nitter/issues/1156 |
21:13:30 | <thuban> | although this is contrary to our usual policy, it seems justifiable to me on the basis of (1) twitter's significance and (2) the fact that we would not be rendering any* non-public tweets public, only circumventing rate limits |
21:13:34 | <thuban> | * except possibly tweets marked nsfw? i'm not familiar with the details there |
21:14:09 | <fireonlive> | would the rate limits be high enough? |
21:14:28 | <fireonlive> | i hear real accounts are quite low so it would be hard to grab a full account for example |
21:15:23 | <aninternettroll> | seems like most nitter instances nowadays are always at the rate limit. Not sure how well it would work for a project on archiveteam's scale |
21:15:38 | <thuban> | fireonlive: are they? i would have thought they'd be higher than guest accounts |
21:16:34 | <thuban> | aninternettroll: those are public nitters; this would be a nitter only for use with archivebot and with jobs queued at a limited rate |
21:17:44 | <fireonlive> | i could be wrong.. just from glancing over the somewhat high volume github emails |
21:18:10 | <fireonlive> | i think there's someone here who setup a nitter using a personal account who could hopefully weigh in though |
21:19:04 | <thuban> | my understanding was that _search_ was super low, so people couldn't scrape, but account pagination/individual tweets were ok |
21:19:14 | <thuban> | but maybe i am wrong |
21:19:58 | <fireonlive> | ahh perhaps |
21:20:04 | <fireonlive> | i'd be for it fwiw |
21:20:30 | <fireonlive> | if IA was happy with that in the WBM.. a IP-limited instance |
21:20:35 | <fireonlive> | could even get people to donate accounts perhaps |
21:20:43 | <fireonlive> | burner accounts that is |
21:21:18 | <thuban> | not a fan of that last, we'd have to verify that they were proper burners (no friends at minimum) |
21:22:02 | <fireonlive> | hmm yeah |
21:26:53 | <katia> | i've setup my own nitter with my own account (yolo, idc) and just lightly browsing around i have been seeing rate limit in the logs |
21:28:04 | <katia> | the limiting factor is definitely throwaway accounts at this point since guest tokens cannot be made anymore as far as i understand |
21:29:07 | <thuban> | i see, interesting--i guess _one_ real account might have lower rate limits than _many guest accounts (like nitter used) |
21:29:26 | <fireonlive> | we had something like 20k tokens at one point |
21:30:04 | <katia> | https://gist.github.com/cmj/998f59680e3549e7f181057074eccaa3 |
21:30:10 | <fireonlive> | !nitterstatus |
21:30:11 | <eggdrop> | [AT/nitter/status] accounts remaining: 7408, limited accounts 0, oldest: 2023-12-25T03:35:07Z, average: 2024-01-13T01:26:07Z, newest: 2024-01-24T03:35:17Z |
21:30:28 | <fireonlive> | well that's more than before :p |
21:32:38 | <katia> | there's also this but i think it did not work recently https://gist.github.com/cmj/d9227bf821039998b18f151a1a73ff35 |
21:33:47 | <that_lurker> | Do Twitter Blue accounts have bigger limits? One paid archiving account would not be that bad |
21:47:26 | | Island_ quits [Ping timeout: 255 seconds] |
21:47:34 | | Island joins |
21:56:39 | | sec^nd quits [Ping timeout: 255 seconds] |
21:58:05 | | Shjosan quits [Quit: Am sleepy (-, – )…zzzZZZ] |
21:58:44 | | Shjosan (Shjosan) joins |
22:00:39 | | sec^nd (second) joins |
22:02:54 | | AramZS quits [Client Quit] |
22:05:34 | | Maika quits [Read error: Connection reset by peer] |
22:10:52 | | Chris5010 quits [Remote host closed the connection] |
22:27:56 | | etnguyen03 quits [Ping timeout: 255 seconds] |
22:41:25 | | jacksonchen666 quits [Remote host closed the connection] |
22:42:35 | | sd96 joins |
22:43:13 | | sd96 quits [Remote host closed the connection] |
22:47:05 | | Wohlstand (Wohlstand) joins |
22:48:11 | <fireonlive> | vice confirmed: https://variety.com/2024/digital/news/vice-cease-publishing-layoff-hundreds-ceo-1235919843/ |
22:48:17 | <steering> | pfft |
22:56:03 | <balrog> | doesn't answer if they're stopping publishing or taking down |
23:03:19 | | etnguyen03 (etnguyen03) joins |
23:09:04 | | egallager joins |
23:10:05 | <egallager> | hey so is the Vice shutdown under control? |
23:13:12 | <nicolas17> | it's known, people are working on it |
23:13:16 | <nicolas17> | idk if we can call it "under control" |
23:13:40 | <@JAA> | If a raging open fire is 'under control'... sure. :-) |
23:16:31 | <fireonlive> | raging open fire... you rang? |
23:16:42 | <fireonlive> | :3 |
23:17:56 | <egallager> | "The capture will start in ~6 seconds because we are doing a lot of captures of www.vice.com right now." ok well I guess that's a good sign, at least... |
23:20:35 | <@JAA> | We are not archiving using the Wayback Machine. It's far too slow. |
23:20:48 | <@JAA> | But the data will all show up in the Wayback Machine later. |
23:21:33 | <thuban> | egallager: we did a complete crawl of vice.com back in may; today we're doing a retrieval of all newer articles (which has finished the article pages and is just working on page assets now), another crawl of the site, and a couple of subdomains http://archivebot.com/?initialFilter=vice |
23:23:13 | <egallager> | I didn't quite catch, did they include a date for when the shutdown would occur? |
23:24:34 | <thuban> | no (nor did they mention whether vice.com would go offline or merely stop publishing) |
23:25:17 | <egallager> | ok, so I guess that'd be why the Deathwatch page hasn't been updated yet, then... |
23:26:20 | <@JAA> | I guess we can put it down as Unknown. |
23:26:43 | <fireonlive> | was there a separate page for 'probably dying' |
23:27:09 | <fireonlive> | ah no i'm thinking of firedrill |
23:27:11 | <fireonlive> | different thing |
23:27:22 | <@JAA> | There's [[Alive... OR ARE THEY]] (aka [[Fire Drill]]), but I'm not sure anyone ever looks at that, and yeah, not quite. |
23:27:56 | <egallager> | ah, apparently someone created this: rebrand.ly/vice-archives (Google Drive link) |
23:28:41 | <thuban> | i should do the big fire drill edit i talked about |
23:28:59 | <fireonlive> | hmm spreadsheets |
23:29:04 | <fireonlive> | thuban: yeah i'd go for it |
23:29:08 | <thuban> | i think i was putting it off until after i went through the deathwatch backlog. i should do that too... |
23:29:22 | <@JAA> | Heh |
23:30:27 | <fireonlive> | huh, articles dated 1970 eh |
23:30:48 | <fireonlive> | Thu, 01 Jan 1970 00:00:00 GMT |
23:31:21 | <fireonlive> | http://vice.com/en/article/kwzv73/grajauexm lol |
23:32:26 | <egallager> | that's the start of the unix epoch, isn't it? |
23:35:08 | <fireonlive> | indeed :) |
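A timestamp of 0 formats to exactly the date quoted above. The short sketch below shows this, assuming (this is not confirmed in the log) that the article's date field was missing or zero on the server side.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Unix timestamp 0 is the start of the epoch, expressed here in UTC.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# Format as an HTTP-style date string, as seen in the message above.
stamp = format_datetime(epoch, usegmt=True)
print(stamp)
```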
23:35:34 | | wickedplayer494 joins |
23:35:53 | | wickedplayer494 is now authenticated as wickedplayer494 |
23:49:03 | | egallager quits [Remote host closed the connection] |
23:53:27 | | magmaus3 quits [Quit: Ping timeout (120 seconds)] |
23:54:35 | | magmaus3 (magmaus3) joins |