00:00:25 | | etnguyen03 (etnguyen03) joins |
00:13:37 | <h2ibot> | Vokunal edited Deathwatch (+0): https://wiki.archiveteam.org/?diff=51121&oldid=51120 |
00:13:38 | <h2ibot> | JustAnotherArchivist changed the user rights of User:Vokunal |
00:34:55 | | pseudorizer quits [Ping timeout: 272 seconds] |
00:37:06 | | Megame quits [Client Quit] |
01:12:37 | | kiryu quits [Remote host closed the connection] |
01:13:33 | | itachi1706 quits [Ping timeout: 272 seconds] |
01:13:52 | | kiryu joins |
01:13:52 | | kiryu is now authenticated as kiryu |
01:13:52 | | kiryu quits [Changing host] |
01:13:52 | | kiryu (kiryu) joins |
01:15:02 | | itachi1706 (itachi1706) joins |
01:31:17 | | etnguyen03 quits [Ping timeout: 272 seconds] |
01:47:25 | | etnguyen03 (etnguyen03) joins |
02:07:12 | | etnguyen03 quits [Ping timeout: 265 seconds] |
02:17:19 | | etnguyen03 (etnguyen03) joins |
02:31:00 | | parfait_ joins |
02:34:45 | | parfait quits [Ping timeout: 265 seconds] |
02:49:32 | <Pedrosso> | With URL-needing projects like #down-the-tube, when the tracker says there are 0 to do, does that mean that the system literally has no more urls to go off of? Or that it's just not willing to allocate any right now? |
02:50:05 | <nicolas17> | when the youtube tracker says there are 0 to do, it means there are no more urls in the youtube queue, yeah |
02:50:31 | <nicolas17> | the youtube project is not trying to archive all of youtube (that would be infeasible), it has to be actually important videos |
02:50:44 | <nicolas17> | if it reaches 0, great, we have more capacity for the other projects |
02:51:24 | <Pedrosso> | Alright, that's what I wanted/needed to know. Thanks |
03:00:59 | <Pedrosso> | On a separate curiosity, I've been wondering from a previous conversation if it'd be possible (and, if possible, whether it should be done) to get all the failed imgur outlinks from the logs of AB projects and run those through the imgur warrior. |
03:02:41 | <pabs> | yes, you "just" need to download all the AB logs from IA, parse them, upload the lists and submit to #imgone |
03:03:05 | <pabs> | and maybe make a service for that, since other projects will want some processing too |
03:04:05 | <nicolas17> | pabs: are warcs public for imgur? for many projects they aren't :( |
03:05:00 | <pabs> | sounded like Pedrosso was talking about warcs for AB not #imgone? |
03:05:07 | <Pedrosso> | I was, I was |
03:05:13 | | pabs not sure about imgur warcs tho |
03:05:46 | <pabs> | btw the AB warcs are linked from https://archive.fart.website/archivebot/viewer/ |
03:05:47 | <nicolas17> | ah |
03:06:08 | <Pedrosso> | Also, pabs, what exactly do you mean by making a service for that? |
03:06:29 | <thuban> | nicolas17: they're both public |
03:07:06 | <pabs> | Pedrosso: as in a server with some code that does this all day long, and lets people add processing and flows. ie if AB finds a wiki, it should go to #wikibot |
03:07:28 | <pabs> | so the service would parse the warcs and connect that link |
03:09:12 | <Pedrosso> | That sounds like a good idea. Tho I individually don't have enough knowledge nor experience here to begin to think about executing that |
03:09:32 | | Pedrosso quits [Remote host closed the connection] |
03:09:53 | | Pedrosso joins |
03:11:04 | <@JAA> | There is a tool for WARC extraction, although that would have slightly different results than log parsing. |
03:11:18 | | Pedrosso29 joins |
03:11:22 | <@JAA> | s/extraction/scraping/ I guess, extracting links that appear in WARCs. |
03:11:22 | <Pedrosso29> | Sry bout the disconnect/reconnect, if it shows |
03:12:24 | <pabs> | I think this was less about scraping the HTML in WARCs and more about sending the 429ed imgur requests from AB to #imgone |
03:12:35 | <Pedrosso29> | ^ |
03:12:37 | <@JAA> | Yeah, they're not equivalent. |
03:12:50 | <@JAA> | WARC scraping would produce more results but also requires munching more data. |
03:13:03 | <pabs> | but really, both could be useful. indeed, tons more data for scraping though |
03:13:34 | <Pedrosso29> | The former I suppose would be more specific to what I originally asked, the latter would be far more general and fit with the service idea |
03:13:36 | <pabs> | could do scraping only for the AB jobs without offsite links |
03:14:23 | | Pedrosso quits [Ping timeout: 265 seconds] |
03:14:47 | <pabs> | anyway, it's good to start simple and work up from there: manually do this, then hackily automate parts, then betterise the automation, then package it into a service |
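A minimal sketch of the "start simple" step discussed above: pulling the imgur URLs that hit a 429 out of already-downloaded log text, so the list can be submitted to #imgone. The real ArchiveBot log layout is not described here, so the assumption (hypothetical) is only that a status code and the URL appear on the same line; the function name and both regular expressions are illustrative, not part of any existing tooling.

```python
import re

# Assumption: each relevant log line carries the URL and a bare "429" token.
# The real AB log format may differ; adjust both patterns after inspecting a log.
IMGUR_URL = re.compile(
    r"https?://(?:[a-z0-9-]+\.)?imgur\.com/[^\s\"'<>]+", re.IGNORECASE
)
# Standalone 429: not glued to word characters or a path slash (avoids /429abc.jpg).
STATUS_429 = re.compile(r"(?<![\w/])429(?!\w)")


def failed_imgur_urls(lines):
    """Return sorted, de-duplicated imgur URLs from lines that mention status 429."""
    found = set()
    for line in lines:
        if not STATUS_429.search(line):
            continue
        for url in IMGUR_URL.findall(line):
            # Drop punctuation that log formatting may have attached to the URL.
            found.add(url.rstrip(".,:;)"))
    return sorted(found)


if __name__ == "__main__":
    sample = [
        "429 https://i.imgur.com/abc.jpg",
        "200 https://i.imgur.com/ok.png",
        "Fetched https://imgur.com/a/xyz: 429 Too Many Requests",
    ]
    for url in failed_imgur_urls(sample):
        print(url)
```

The output is one URL per line, which is the shape a manually uploaded list would need before any of this is automated further.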
03:14:51 | | Pedrosso29 is now known as Pedrosso |
03:15:58 | <thuban> | it's a nice thought, but it would duplicate some of the logic for cross-project dispatch we do already and i'm not sure what the best strategy for eventually rationalizing that would be |
03:16:10 | <thuban> | s/dispatch/backfeed/ |
03:17:34 | <pabs> | are there any docs for that? I hadn't heard of any cross-project dispatch yet |
03:17:57 | <@JAA> | #// dispatches to Telegram and (soon?) Imgur. |
03:17:58 | <nicolas17> | pabs: #// already sends telegram links to #telegrab |
03:18:03 | <nicolas17> | how that works behind the scenes, I don't know |
03:18:12 | <pabs> | ah, interesting... |
03:18:21 | <thuban> | there's loads but it's all done haphazardly inline https://github.com/ArchiveTeam/urls-grab/blob/master/urls.lua#L1688 |
03:18:29 | <@JAA> | No Imgur yet. arkiver, here's a reminder. ;-) |
03:18:31 | <nicolas17> | oh ew |
03:18:44 | <nicolas17> | I expected something server side rather than the worker for one project submitting into another |
03:21:49 | <thuban> | the logical thing might be to have a central url clearinghouse that identified all specially-handled urls and forwarded them to the appropriate projects (and either sent the rest to #// or, possibly configurably, dropped them as might be more appropriate for archivebot) |
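A rough sketch of the clearinghouse idea above, assuming routing is purely URL-pattern-based: each URL is matched against a table of host patterns and forwarded to the matching project, with everything else either sent to #// or dropped. The routing table, host patterns, and function name are hypothetical; only the channel names come from this discussion, and this is not how urls-grab currently does it.

```python
import re
from urllib.parse import urlsplit

# Hypothetical routing table. Host patterns are illustrative assumptions;
# the project channels are the ones named in the conversation.
ROUTES = [
    (re.compile(r"(^|\.)imgur\.com$"), "#imgone"),
    (re.compile(r"(^|\.)t\.me$"), "#telegrab"),
    (re.compile(r"(^|\.)mediafire\.com$"), "#mediaonfire"),
]


def route(url, fallback="#//"):
    """Return the project a URL should go to, or `fallback` if none matches.

    Pass fallback=None to drop unmatched URLs instead (the configurable
    behaviour suggested for archivebot).
    """
    host = (urlsplit(url).hostname or "").lower()
    for pattern, project in ROUTES:
        if pattern.search(host):
            return project
    return fallback


if __name__ == "__main__":
    for u in ("https://i.imgur.com/abc.jpg", "https://example.com/page"):
        print(u, "->", route(u))
```

Keeping the table in one server-side place is the point of the design: a new rule would take effect without waiting for worker updates.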
03:22:28 | <pabs> | yes |
03:25:42 | <thuban> | in practice all new projects send outlinks to #// anyway, so (if eg telegram links to mediafire or whatever) they do get to the appropriate projects eventually |
03:29:49 | <Pedrosso> | So a mediafire outlink from the AB will be sent to #// where it'll be sent to #mediaonfire? |
03:30:34 | <@JAA> | Only DPoS projects send things to #//. AB does not. |
03:30:51 | <Pedrosso> | Ah, I see I see |
03:32:46 | <thuban> | right. and bundling that queueing with archival makes it not compose well with archivebot, plus it's a needless round-trip, plus it requires the code to actually opt in (when looking for an example i was surprised to find that apparently pastebin doesn't queue outlinks at all) |
03:34:35 | | BlueMaxima_ quits [Read error: Connection reset by peer] |
03:37:52 | | Pedrosso quits [Remote host closed the connection] |
03:39:45 | <thuban> | plus changes require #// worker updates to take effect (minor considering how most people run it, but still) |
03:48:41 | <thuban> | idk, i can think of some cases in which you really do need the original discovery context and not just the url (nitter/mastodon instances, blogs at custom domains). but i think all we actually do at present is url-pattern-based |
03:54:09 | <thuban> | s/discovery context/page structure/ (i can't actually think of any examples where you need the discovery context) |
04:14:20 | | dumbgoy_ quits [Read error: Connection reset by peer] |
04:22:44 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
04:58:09 | | icedice2 quits [Client Quit] |
04:58:20 | | DogsRNice quits [Read error: Connection reset by peer] |
05:00:14 | | etnguyen03 quits [Ping timeout: 265 seconds] |
05:02:09 | | etnguyen03 (etnguyen03) joins |
05:10:18 | | Wohlstand quits [Remote host closed the connection] |
05:31:15 | <fireonlive> | here we go, here we go again… https://x.com/dexerto/status/1722958208807891046?s=12 |
05:31:16 | <eggdrop> | nitter: https://nitter.net/dexerto/status/1722958208807891046 |
05:31:45 | | etnguyen03 quits [Client Quit] |
05:34:58 | <@JAA> | https://i.kym-cdn.com/entries/icons/original/000/029/223/cover2.jpg |
05:52:56 | <h2ibot> | Tech234a edited List of websites excluded from the Wayback Machine/Partial exclusions (+52, Add early Apple Store): https://wiki.archiveteam.org/?diff=51122&oldid=50493 |
05:56:57 | <h2ibot> | Petchea edited Tumblr (+107, /* History */): https://wiki.archiveteam.org/?diff=51123&oldid=51113 |
06:07:04 | <mgrandi> | https://abcnews.go.com/Technology/wireStory/jezebel-sharp-edged-feminist-website-shutting-after-16-104768751 I don't see it on deathwatch or mentioned here |
06:07:39 | <@JAA> | Indeed, but it's running through AB already. |
06:08:39 | <@JAA> | Didn't realise it was part of G/O. Another one for the list, I guess. |
06:12:06 | | kdqep__ joins |
06:13:40 | | parfait (kdqep) joins |
06:14:03 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+184, /* 2023 */ Add Jezebel): https://wiki.archiveteam.org/?diff=51124&oldid=51121 |
06:16:07 | | parfait_ quits [Ping timeout: 265 seconds] |
06:16:36 | | kdqep__ quits [Ping timeout: 265 seconds] |
06:52:05 | | nicolas17 quits [Client Quit] |
07:01:01 | | lennier2_ joins |
07:04:25 | | lennier2 quits [Ping timeout: 272 seconds] |
07:28:50 | | hitgrr8 joins |
09:02:13 | | Earendil7 quits [Ping timeout: 272 seconds] |
09:10:59 | | Earendil7 (Earendil7) joins |
09:12:27 | | Island quits [Read error: Connection reset by peer] |
09:28:32 | | pseudorizer (pseudorizer) joins |
10:00:01 | | Bleo1 quits [Client Quit] |
10:01:21 | | Bleo1 joins |
10:20:53 | | Megame (Megame) joins |
10:51:08 | | parfait quits [Ping timeout: 265 seconds] |
11:07:53 | | Chris5010 (Chris5010) joins |
11:30:55 | | lennier2 joins |
11:30:57 | | BearFortress_ joins |
11:34:38 | | BearFortress quits [Ping timeout: 265 seconds] |
11:35:36 | | lennier2_ quits [Ping timeout: 265 seconds] |
12:38:38 | | Wohlstand (Wohlstand) joins |
12:38:49 | | Arcorann quits [Ping timeout: 272 seconds] |
13:07:16 | | Pedrosso joins |
13:31:02 | | Pedrosso quits [Remote host closed the connection] |
13:37:22 | | benjins joins |
13:38:18 | | benjins2_ joins |
13:40:53 | | benjinsm quits [Ping timeout: 272 seconds] |
13:40:53 | | benjins2__ quits [Ping timeout: 272 seconds] |
13:41:21 | | Pedrosso joins |
13:41:41 | | benjinsm joins |
13:45:08 | | benjinsmi joins |
13:45:37 | | benjins quits [Ping timeout: 265 seconds] |
13:47:04 | | benjinsm quits [Ping timeout: 265 seconds] |
13:51:01 | | benjinsmi quits [Ping timeout: 272 seconds] |
13:51:30 | | benjins joins |
13:58:09 | | simon8162 quits [Quit: ZNC 1.8.2 - https://znc.in] |
13:58:26 | | simon816 (simon816) joins |
14:02:51 | | zupolufa joins |
14:02:55 | | benjins quits [Read error: Connection reset by peer] |
14:33:22 | | benjins joins |
14:38:49 | | etnguyen03 (etnguyen03) joins |
14:46:39 | | icedice (icedice) joins |
14:51:14 | <Barto> | pabs: poor TheTechRobo he may get the hug of death of HN :D |
14:52:25 | | Pedrosso quits [Remote host closed the connection] |
15:43:38 | <TheTechRobo> | wtf send help https://lounge.thetechrobo.ca/uploads/f2d379beb39b7321/IMG_2421.jpeg |
15:46:55 | | etnguyen03 quits [Ping timeout: 272 seconds] |
15:48:11 | | etnguyen03 (etnguyen03) joins |
16:06:34 | | DogsRNice joins |
16:09:21 | | hackbug77 quits [Remote host closed the connection] |
16:11:25 | | hackbug (hackbug) joins |
16:15:02 | <Barto> | that's pretty moderate so far |
16:17:59 | <TheTechRobo> | 1.7k now |
16:21:00 | | hackbug quits [Client Quit] |
16:24:49 | | hackbug (hackbug) joins |
16:27:18 | | hackbug quits [Client Quit] |
16:29:01 | | hackbug (hackbug) joins |
16:40:17 | <@arkiver> | TheTechRobo: congrats on getting on front page :) |
16:40:21 | <@arkiver> | very nice tool as well! |
16:40:28 | <@arkiver> | JAA: whoops |
16:40:31 | <@arkiver> | thanks for the reminder |
16:43:36 | | dumbgoy joins |
16:46:08 | <TheTechRobo> | arkiver: :D |
17:06:02 | | nicolas17 joins |
17:12:52 | | zupolufa quits [Remote host closed the connection] |
17:25:05 | | rohvani quits [Ping timeout: 272 seconds] |
17:30:57 | | icedice quits [Client Quit] |
17:40:21 | <ScenarioPlanet> | https://transfer.archivete.am/jxHWG/static.spore.com-ids-2016-fix.txt.zst - Fixed line 538397 and broken sorting |
17:40:32 | <ScenarioPlanet> | Pedrosso pokechu22 ^ |
17:43:54 | | ehmry joins |
17:50:08 | | icedice (icedice) joins |
18:00:29 | <h2ibot> | JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=51125&oldid=51117 |
18:12:41 | | tertu2 (tertu) joins |
18:13:51 | | tertu quits [Ping timeout: 272 seconds] |
18:46:59 | | rohvani joins |
19:16:45 | | benjinsm joins |
19:16:53 | | benjins quits [Remote host closed the connection] |
19:16:53 | | itachi1706 quits [Client Quit] |
19:17:18 | | itachi1706 (itachi1706) joins |
19:20:52 | | benjinsm is now known as benjins |
19:20:53 | | benjins is now authenticated as benjins |
19:38:25 | | shinji257 quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
19:57:09 | | error joins |
20:02:05 | | shinji257 (shinji257) joins |
20:18:48 | | error quits [Read error: Connection reset by peer] |
20:19:10 | | error joins |
20:19:22 | <ScenarioPlanet> | error: Hello there |
20:22:18 | <error> | howdy |
20:28:23 | | shinji257 quits [Client Quit] |
20:28:31 | | shinji257 (shinji257) joins |
20:35:38 | <ScenarioPlanet> | Also I think you should change your nickname (with /nick new_nickname) or you'll get pinged every time someone uses the "error" word |
20:37:06 | <error> | fair lol |
20:37:18 | | error is now known as redlattice |
20:37:22 | <redlattice> | changed |
20:50:20 | | redlattice quits [Client Quit] |
20:59:09 | | BearFortress_ quits [Ping timeout: 272 seconds] |
21:17:38 | | Pedrosso joins |
21:18:56 | | BlueMaxima joins |
21:20:39 | | pseudorizer quits [Client Quit] |
21:23:03 | | pseudorizer (pseudorizer) joins |
21:24:07 | | Pedrosso quits [Remote host closed the connection] |
21:29:53 | | Wohlstand quits [Read error: Connection reset by peer] |
21:30:04 | | Wohlstand (Wohlstand) joins |
21:30:19 | | Wohlstand quits [Client Quit] |
21:34:41 | | Megame quits [Client Quit] |
21:35:19 | | Pedrosso joins |
21:46:12 | | Deewiant_ (Deewiant) joins |
21:46:16 | | jodizzle_ (jodizzle) joins |
21:46:20 | | neggles_ (neggles) joins |
21:46:22 | | emily (pseudorizer) joins |
21:46:47 | | null joins |
21:47:23 | | pie_[bnc] joins |
21:47:54 | | simon8162 (simon816) joins |
21:48:01 | | kdy_ (kdy) joins |
21:48:27 | | JAA_ (JAA) joins |
21:48:27 | | @ChanServ sets mode: +o JAA_ |
21:48:30 | | Iroha (Kitty) joins |
21:48:42 | | SketchCow joins |
21:48:43 | | murb_ (murb) joins |
21:48:46 | | anarchat (anarcat) joins |
21:48:47 | | bleb joins |
21:48:50 | | betamax_ (betamax) joins |
21:48:52 | | klg (klg) joins |
21:48:54 | | ats_ (ats) joins |
21:48:56 | | Ophanim_ joins |
21:49:03 | | lumidify_ (lumidify) joins |
21:49:04 | | nimaje1 joins |
21:49:07 | | automato83 joins |
21:49:15 | | rewby1 (rewby) joins |
21:49:15 | | @ChanServ sets mode: +o rewby1 |
21:49:16 | | nickofni1 (nickofnicks) joins |
21:49:21 | | @rewby quits [Killed (NickServ (GHOST command used by rewby1))] |
21:49:21 | | maxfan8_ (maxfan8) joins |
21:49:23 | | @rewby1 is now known as @rewby |
21:49:23 | | dave3 (dave) joins |
21:50:13 | | D00maholic (Doomaholic) joins |
21:50:28 | | rohvani4 joins |
21:50:58 | | JTL1 (jtl) joins |
21:52:02 | | pokechu22_ (pokechu22) joins |
21:52:05 | | pie_ quits [Quit: No Ping reply in 180 seconds.] |
21:52:05 | | mattx433 quits [Client Quit] |
21:52:05 | | TheTechRobo quits [Client Quit] |
21:52:06 | | rohvani quits [Client Quit] |
21:52:06 | | pseudorizer quits [Client Quit] |
21:52:06 | | rohvani4 is now known as rohvani |
21:52:10 | | rktk quits [Remote host closed the connection] |
21:52:10 | | etnguyen03 quits [Remote host closed the connection] |
21:52:10 | | nyany quits [Client Quit] |
21:52:10 | | maxfan8 quits [Remote host closed the connection] |
21:52:10 | | automato1 quits [Remote host closed the connection] |
21:52:10 | | xkey quits [Remote host closed the connection] |
21:52:10 | | aismallard quits [Remote host closed the connection] |
21:52:10 | | nyany_ (nyany) joins |
21:52:14 | | Earendil7 quits [Client Quit] |
21:52:14 | | ehmry quits [Client Quit] |
21:52:14 | | qwertyasdfuiopghjkl quits [Client Quit] |
21:52:14 | | neggles quits [Quit: bye friends - ZNC - https://znc.in] |
21:52:14 | | simon816 quits [Client Quit] |
21:52:14 | | h3ndr1k quits [Remote host closed the connection] |
21:52:14 | | neggles_ is now known as neggles |
21:52:15 | | JensRex quits [Quit: No Ping reply in 180 seconds.] |
21:52:15 | | JTL quits [Remote host closed the connection] |
21:52:15 | | colona quits [Remote host closed the connection] |
21:52:15 | | @AlsoJAA quits [Remote host closed the connection] |
21:52:15 | | murb quits [Remote host closed the connection] |
21:52:25 | | klg_ quits [Remote host closed the connection] |
21:52:25 | | jodizzle quits [Remote host closed the connection] |
21:52:25 | | cm quits [Remote host closed the connection] |
21:52:25 | | betamax quits [Remote host closed the connection] |
21:52:25 | | Ophanim quits [Remote host closed the connection] |
21:52:25 | | @JAA quits [Remote host closed the connection] |
21:52:25 | | nickofnicks quits [Remote host closed the connection] |
21:52:25 | | plcp quits [Remote host closed the connection] |
21:52:25 | | Deewiant quits [Remote host closed the connection] |
21:52:25 | | pokechu22 quits [Quit: WeeChat 4.0.2] |
21:52:25 | | nicolas17 quits [Remote host closed the connection] |
21:52:25 | | Fiszl quits [Remote host closed the connection] |
21:52:25 | | Elizabeth quits [Remote host closed the connection] |
21:52:25 | | anarcat quits [Remote host closed the connection] |
21:52:25 | | Pedrosso quits [Remote host closed the connection] |
21:52:25 | | kdy quits [Remote host closed the connection] |
21:52:25 | | Doomaholic quits [Remote host closed the connection] |
21:52:25 | | lumidify quits [Remote host closed the connection] |
21:52:25 | | Hackerpcs quits [Remote host closed the connection] |
21:52:25 | | nimaje quits [Remote host closed the connection] |
21:52:25 | | dave2 quits [Remote host closed the connection] |
21:52:25 | | erenrich_ quits [Remote host closed the connection] |
21:52:25 | | SketchCo1 quits [Remote host closed the connection] |
21:52:25 | | dx quits [Remote host closed the connection] |
21:52:25 | | jodizzle_ is now known as jodizzle |
21:52:26 | | nicolas17_ joins |
21:52:28 | | h3ndr1k (h3ndr1k) joins |
21:52:30 | | @JAA_ is now known as @JAA |
21:52:39 | | pokechu22_ is now known as pokechu22 |
21:52:40 | | Earendil7 (Earendil7) joins |
21:52:40 | | Pedrosso joins |
21:52:40 | | JensRex (JensRex) joins |
21:52:49 | | katocala quits [Ping timeout: 260 seconds] |
21:52:56 | | kdy_ is now known as kdy |
21:53:03 | | katocala joins |
21:53:12 | | Elizabeth (Elizabeth) joins |
21:53:16 | | xkey (xkey) joins |
21:53:37 | | aismallard joins |
21:53:45 | | Hackerpcs (Hackerpcs) joins |
21:53:49 | | negge_ joins |
21:53:50 | | AlsoJAA (JAA) joins |
21:53:50 | | @ChanServ sets mode: +o AlsoJAA |
21:53:51 | | erenrich joins |
21:56:05 | | Deewiant_ is now known as Deewiant |
21:56:41 | | negge quits [Ping timeout: 265 seconds] |
21:56:41 | | ats quits [Ping timeout: 265 seconds] |
21:57:10 | | dx joins |
21:57:20 | | colona (colona) joins |
22:00:28 | | ThetaDev_ quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
22:00:36 | | ThetaDev joins |
22:07:39 | <tomodachi94> | Does anyone know if Fextralife (https://fextralife.com) has been grabbed ever? Specifically curious about their wikis, which seem like a goldmine |
22:07:44 | <tomodachi94> | (Wiki page already created for those interested) |
22:12:36 | | katocala is now authenticated as katocala |
22:22:30 | <Pedrosso> | I don't see anything on https://archive.org/search?query=originalurl%3A%28%2Afextralife%2A%29 however, idk if there's possibly another way of searching for it |
22:23:07 | <Pedrosso> | If it's really a goldmine of wikis, maybe move this to #wikiteam? |
22:24:34 | <Pedrosso> | Disregard that last statement, as after all it's the entire website |
22:27:24 | <Pedrosso> | tomodachi94: I believe #archivebot automatically moves wikis to #wikibot when it discovers them, so I'd suggest you repeat this in #archivebot so that an admin can submit it |
22:28:01 | <Pedrosso> | do ask them if it does move them automatically as I don't know |
22:29:46 | <thuban> | archivebot is a self-contained system and doesn't submit anything to any other tooling |
22:30:57 | <Pedrosso> | Strange, when I had asked AB to archive a website with a wiki in it, it sent it there. Perhaps I misinterpreted it |
22:31:57 | | mossssss joins |
22:32:20 | <mossssss> | does anyone know if blogger/blogspot is in the warrior? |
22:32:31 | <mossssss> | or if there's even an initiative to archive it? |
22:33:45 | <@JAA> | Pedrosso: The originalurl search only works for wikis specifically dumped by WikiTeam tooling. Basically nobody else sets that metadata field. Certainly not AB. |
22:34:04 | <@JAA> | And no, AB does not submit anything elsewhere. That was done manually. |
22:34:16 | <thuban> | tomodachi94: doesn't look like it https://archive.fart.website/archivebot/viewer/?q=fextralife |
22:34:49 | | Overlordz__ quits [Quit: Leaving] |
22:34:58 | <@JAA> | mossssss: It isn't yet, but we're aware of the situation. Unfortunately, it doesn't seem to be possible to enumerate the blogs or similar. |
22:35:40 | <@JAA> | Looks like that (Google inactive accounts etc.) was never added to Deathwatch though. |
22:37:06 | | Island joins |
22:37:40 | <mossssss> | oh no!!! that's so frustrating that there's no way to do it |
22:37:48 | <Pedrosso> | Very frustrating indeed. |
22:38:04 | <mossssss> | i'm stressed because i know there's so much stuff on there that is totally going to be lost |
22:38:08 | <fireonlive> | :( |
22:38:28 | <fireonlive> | google is really on a 'HOW much are we storing???' kick lately |
22:38:49 | <Pedrosso> | Could you specify? |
22:38:56 | | hitgrr8 quits [Client Quit] |
22:39:10 | <@JAA> | If there's anything you particularly care about, feel free to ask in #archivebot about archiving it. Blogger blogs work fairly well (except for some pagination mess and the 'dynamic view' script hells). |
22:42:30 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+635, /* 2023 */ Add Google's inactive accounts purge): https://wiki.archiveteam.org/?diff=51126&oldid=51124 |
22:44:04 | <mossssss> | this is perhaps a bit backwards but is there a way to do it through individual bloggers profiles? probably half of profiles aren't visible but the ones that are usually have 1-3 blogs on them |
22:44:04 | | mossssss quits [Client Quit] |
22:45:06 | | mossssss joins |
22:45:42 | <@JAA> | The profile IDs are much too large to be bruteforced, and IIRC there's quite a bit of rate limiting on the profile pages. |
22:46:18 | <mossssss> | yeah - that makes sense. it's just the only half-plausible solution i can come up with lol |
22:51:05 | <Pedrosso> | JAA: I've grabbed the wiki links from tomodachi94's suggested website. https://transfer.archivete.am/ErUPC/wikilinks.txt |
22:54:30 | | JTL1 is now known as JTL |
22:54:43 | <Ryz> | I mentioned the Blogger thing multiple times 2-3 months ago... |
22:55:25 | <@JAA> | Yes, it was discussed extensively in May. |
22:55:46 | <@JAA> | But since we have no way of discovering blogs, really... |
22:56:30 | <Ryz> | Not even Blogger ID numeration? |
22:56:35 | <Ryz> | *ID number |
22:56:39 | <Ryz> | Even if it's rate limited? |
23:00:57 | <thuban> | a bit, yeah (https://wiki.archiveteam.org/index.php/Blogger#Strategy, https://hackint.logs.kiska.pw/archiveteam-bs/20230910#c378934) |
23:02:29 | <thuban> | on a related note, anyone know whether blog names or user ids can be extracted from blogspot image cdn urls? parts don't look entirely random, but i'm not sure |
23:02:37 | | Arcorann (Arcorann) joins |
23:02:44 | <Ryz> | Yeah, it's one of the reasons why I became slightly to somewhat more inactive in ArchiveBot s: |
23:11:01 | <Ryz> | There seems to be an implicit feeling that Blogger may be deemed less important than YouTube or other stuff |
23:12:22 | <Ryz> | Even though it uses less space than something video related |
23:15:44 | <thuban> | i actually had no idea about this--must have missed the discussion |
23:20:12 | <pabs> | Barto, arkiver, TheTechRobo: oh, didn't think it would reach the front page :) |
23:21:46 | <@JAA> | thuban: Yeah, that's why this should've been on Deathwatch from the start. :-| |
23:22:36 | <Barto> | pabs: muahaha |
23:22:40 | <Barto> | congratz |
23:22:51 | <Pedrosso> | congratz indeed |
23:23:50 | <fireonlive> | pabs raking in that HN karma :p |
23:25:26 | <pabs> | re blogger, a while back I found you can scrape blog front pages for profile links, scrape profiles for blog front page links, and you probably get an ever-expanding list |
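The back-and-forth expansion described above can be sketched as a breadth-first crawl. This is not pabs's script; it is an illustration under stated assumptions: profile pages are taken to live at `https://www.blogger.com/profile/<digits>`, blogs at `<name>.blogspot.com`, and the caller supplies a hypothetical `fetch(url)` that returns HTML text and deals with rate limits and captchas.

```python
import re
from collections import deque

# Assumed URL shapes (illustrative): numeric profile IDs and blogspot subdomains.
PROFILE = re.compile(r"https?://www\.blogger\.com/profile/\d+")
BLOG = re.compile(r"https?://[a-z0-9-]+\.blogspot\.com", re.IGNORECASE)


def enumerate_blogs(seed_blogs, fetch, limit=1000):
    """Expand a seed list: blog front page -> profiles -> more blogs.

    `fetch` is a caller-supplied function (hypothetical) returning page HTML.
    `limit` is a soft cap on how many blogs are collected.
    """
    seen_blogs = set(seed_blogs)
    seen_profiles = set()
    queue = deque(seed_blogs)
    while queue and len(seen_blogs) < limit:
        blog = queue.popleft()
        for profile in PROFILE.findall(fetch(blog)):
            if profile in seen_profiles:
                continue
            seen_profiles.add(profile)
            for found in BLOG.findall(fetch(profile)):
                found = found.lower()
                if found not in seen_blogs:
                    seen_blogs.add(found)
                    queue.append(found)
    return sorted(seen_blogs)
```

Injecting `fetch` keeps the traversal separate from the network layer, so the same loop works with a polite, throttled downloader or with canned pages for testing. The result only covers blogs reachable through visible profiles, so it would be extensive rather than complete.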
23:25:53 | <Ryz> | I'm not even sure even adding it on Deathwatch when it was announced would help |
23:26:15 | <Pedrosso> | I don't imagine it'd be complete but quite extensive |
23:26:23 | <mossssss> | it would be nice to try |
23:26:28 | <Pedrosso> | It would |
23:27:17 | <@JAA> | Ryz: It does help. It wasn't really on my radar anymore until some people brought it up again a couple days ago (on Reddit and via email). |
23:27:46 | | murb_ is now known as murb |
23:28:19 | | @JAA summons the arkiver. |
23:29:01 | <fireonlive> | https://mkx9delh5a.execute-api.ca-central-1.amazonaws.com/uploads/c7743b41c33e6600/arkiver.png |
23:29:02 | <fireonlive> | it is time. |
23:29:06 | <fireonlive> | arkiver |
23:29:59 | <pabs> | my hacky script for blogger/blogspot enumeration: https://transfer.archivete.am/RAiXa/archive-blogspot.sh |
23:30:16 | <pabs> | (note the captchas you get really hamper the process) |
23:30:41 | <fireonlive> | anyone here work at google? :p |
23:30:59 | <pabs> | and my list of blogs I wanted to AB: https://transfer.archivete.am/XWpXt/blogspot.com-blogs.txt |
23:31:25 | <Pedrosso> | ooh, never mind, pabs: that (the script) does look like it'd be complete |
23:31:35 | <Ryz> | I have so many Blogspot websites to process too |
23:31:43 | | TheTechRobo (TheTechRobo) joins |
23:32:01 | <pabs> | sorry for the traffic bump TheTechRobo :) |
23:32:23 | <mossssss> | same - i may just send them in the other channel if i need to |
23:32:29 | | pabs reached out to a Google person he knows |
23:32:44 | <pabs> | (not in the right dept tho) |
23:33:02 | <fireonlive> | 🤞 |
23:33:11 | <katia> | fireonlive, you can tell if they say (opinions my own) |
23:33:18 | <fireonlive> | haha |
23:33:21 | <fireonlive> | true true |
23:33:32 | <TheTechRobo> | pabs: All good! :-) |
23:34:12 | <fireonlive> | ah! https://news.ycombinator.com/item?id=38228481 :) |
23:40:41 | <h2ibot> | PaulWise edited Blogger (+294, add second strategy): https://wiki.archiveteam.org/?diff=51127&oldid=47348 |
23:42:42 | <h2ibot> | PaulWise edited Blogger (+116, add list of blogs found with the second strategy): https://wiki.archiveteam.org/?diff=51128&oldid=51127 |
23:45:01 | <thuban> | here's a list of 144 blogs extracted from my irc logs (excluding #archivebot but not other archiveteam channels): https://transfer.archivete.am/2sEI9/blogspot_blogs_from_irc_logs.txt |
23:46:27 | <thuban> | (some of these are from topics of channels i scanned during the freenode implosion--i had totally forgotten about that) |
23:48:34 | <mossssss> | does this mean we might be able to do it?? (I would be SO relieved lol - even some is better than none) |
23:49:43 | <h2ibot> | Tomodachi94 created Fextralife (+458, Create page): https://wiki.archiveteam.org/?title=Fextralife |
23:49:44 | <h2ibot> | Tomodachi94 uploaded File:Fextralife banner.png: https://wiki.archiveteam.org/?title=File%3AFextralife%20banner.png |
23:50:04 | <@JAA> | One potential concern is that many blogs will not be at risk, and I guess we don't have a good way of identifying which ones are. |
23:51:18 | <mossssss> | yeah - i know it's any google account that hasn't been touched in 2 years - but that doesn't necessarily mean that the blogs are representative of the accounts |
23:52:37 | <@JAA> | Any blog with a post in the past 2 years would *probably* be fine, but scheduled posts are a thing, so it's not reliable. |
23:53:41 | <mossssss> | omg i totally forgot about that... |
23:55:23 | | mossssss90 joins |
23:55:31 | <mossssss90> | not sure why it keeps disconnecting me lol so annoying |
23:55:46 | | mossssss quits [Remote host closed the connection] |