| 00:00:51 | | Wingy1 quits [Remote host closed the connection] |
| 00:01:42 | | Wingy1 (Wingy) joins |
| 00:16:27 | | Specular joins |
| 01:00:39 | <@JAA> | My 4Players forums grab finished. Apart from 72 URLs that probably all resulted in SQL errors (e.g. https://forum.4pforen.4players.de/viewtopic.php?p=41325 ), I should have everything in there. The continuous archival of new content is obviously still running. |
| 01:00:41 | | dm4v quits [Read error: Connection reset by peer] |
| 01:02:56 | | dm4v joins |
| 01:02:58 | | dm4v is now authenticated as dm4v |
| 01:02:58 | | dm4v quits [Changing host] |
| 01:02:58 | | dm4v (dm4v) joins |
| 02:02:20 | | dm4v quits [Read error: Connection reset by peer] |
| 02:03:12 | | dm4v joins |
| 02:03:14 | | dm4v is now authenticated as dm4v |
| 02:03:14 | | dm4v quits [Changing host] |
| 02:03:14 | | dm4v (dm4v) joins |
| 02:20:46 | | ddd quits [Client Quit] |
| 02:20:54 | | qwertyasdfuiopghjkl87 joins |
| 02:22:04 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 02:24:48 | | Specular quits [Client Quit] |
| 02:45:28 | | paul2520 quits [Remote host closed the connection] |
| 03:06:55 | | qwertyasdfuiopghjkl87 is now known as qwertyasdfuiopghjkl |
| 03:20:15 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 03:21:58 | | ThreeHM quits [Ping timeout: 252 seconds] |
| 03:23:50 | | ThreeHM (ThreeHeadedMonkey) joins |
| 03:53:06 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
| 03:53:29 | | monoxane quits [Read error: Connection reset by peer] |
| 03:54:07 | | monoxane (monoxane) joins |
| 03:55:08 | | pabs (pabs) joins |
| 04:21:53 | | qw3rty__ joins |
| 04:25:48 | | qw3rty_ quits [Ping timeout: 258 seconds] |
| 05:27:21 | | fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173-224-26-244.ptcnet.net))] |
| 05:27:27 | | fuzzy8021 (fuzzy8021) joins |
| 05:41:40 | | fuzzy8021 quits [Read error: Connection reset by peer] |
| 05:49:52 | | fuzzy8021 (fuzzy8021) joins |
| 06:03:41 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 06:40:05 | | Stiletto quits [Remote host closed the connection] |
| 06:40:17 | | Stiletto joins |
| 07:01:18 | | Stilett0 joins |
| 07:03:22 | | Stiletto quits [Ping timeout: 265 seconds] |
| 07:03:44 | | sec^nd quits [Ping timeout: 258 seconds] |
| 07:08:29 | | sec^nd (second) joins |
| 07:29:52 | | Wingy1 quits [Read error: Connection reset by peer] |
| 07:30:51 | | Wingy1 (Wingy) joins |
| 07:34:25 | | Wingy1 quits [Remote host closed the connection] |
| 07:35:11 | | Wingy1 (Wingy) joins |
| 08:00:07 | | britmob256364 quits [Quit: britmob256364] |
| 08:40:18 | | monoxane4 (monoxane) joins |
| 08:42:56 | | monoxane quits [Ping timeout: 265 seconds] |
| 08:42:57 | | monoxane4 is now known as monoxane |
| 09:28:48 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 09:32:12 | | qwertyasdfuiopghjkl joins |
| 11:28:08 | | sec^nd quits [Remote host closed the connection] |
| 11:28:31 | | sec^nd (second) joins |
| 11:53:00 | | Wingy1 quits [Remote host closed the connection] |
| 11:54:05 | | Wingy1 (Wingy) joins |
| 11:57:52 | | monoxane quits [Ping timeout: 252 seconds] |
| 12:05:41 | | britmob2563647 joins |
| 12:59:34 | | Wingy1 quits [Remote host closed the connection] |
| 13:00:30 | | Wingy1 (Wingy) joins |
| 13:07:39 | | HP_Archivist (HP_Archivist) joins |
| 13:46:54 | | paul2520 (paul2520) joins |
| 14:02:43 | | Arcorann quits [Ping timeout: 258 seconds] |
| 14:10:10 | | Wingy1 quits [Remote host closed the connection] |
| 14:11:00 | | Wingy1 (Wingy) joins |
| 15:26:03 | | cpina_ joins |
| 15:26:03 | | cpina quits [Read error: Connection reset by peer] |
| 15:28:28 | <h2ibot> | Tech234a edited Current Projects (+69, Google Sites and Webs are up well past their…): https://wiki.archiveteam.org/?diff=47701&oldid=47697 |
| 15:31:28 | <h2ibot> | Tech234a edited Tumblr (+149, Tumblr was acquired by Automattic in 2019): https://wiki.archiveteam.org/?diff=47702&oldid=46129 |
| 16:09:31 | | Wingy1 quits [Remote host closed the connection] |
| 16:10:23 | | Wingy1 (Wingy) joins |
| 16:24:43 | | Wingy1 quits [Remote host closed the connection] |
| 16:25:35 | | Wingy1 (Wingy) joins |
| 16:34:09 | | Wingy1 quits [Remote host closed the connection] |
| 16:35:20 | | Wingy1 (Wingy) joins |
| 16:55:26 | | Wingy1 quits [Remote host closed the connection] |
| 16:56:14 | | Wingy1 (Wingy) joins |
| 17:10:33 | | HackMii quits [Ping timeout: 258 seconds] |
| 17:28:49 | | G4te_Keep3r4 joins |
| 17:28:58 | | G4te_Keep3r quits [Ping timeout: 252 seconds] |
| 17:28:58 | | G4te_Keep3r4 is now known as G4te_Keep3r |
| 17:38:51 | | HackMii (hacktheplanet) joins |
| 17:39:14 | | Wingy1 quits [Remote host closed the connection] |
| 17:39:59 | | temporarily joins |
| 17:40:07 | | Wingy1 (Wingy) joins |
| 17:42:11 | <temporarily> | archive.today one captcha after another but no go. last ok maybe some weeks ago. from bionicbeaver standard firefox private and normal. Can anyone PLZ verify this issue? THANKS!!! |
| 17:43:08 | <temporarily> | .de IP no proxy |
| 17:46:54 | <h2ibot> | Thezt edited List of websites excluded from the Wayback Machine (+20): https://wiki.archiveteam.org/?diff=47703&oldid=47699 |
| 17:47:24 | | LeGoupil joins |
| 17:55:19 | <temporarily> | with 4everproxy dot com i am reaching the site, but for archiving captchas not working through webproxy, even in normal 'cookies on'-mode of firefox |
| 17:56:01 | <rewby> | I'm pretty sure that archive.today is not us |
| 17:56:36 | <russss> | yeah different archive |
| 17:58:35 | <temporarily> | so maybe firefox on ubuntu18 too old, running on usb, cannot update. and/or .de ip's locked? |
| 17:59:18 | <rewby> | We don't know. We're not the people you're looking for. |
| 17:59:32 | <temporarily> | thx for answer, is there somewhere you know of? |
| 18:00:30 | <temporarily> | sorry i read the profound article on https://wiki.archiveteam.org/index.php/Archive.today looks like quite educated |
| 18:00:56 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=47704&oldid=47703 |
| 18:09:13 | <temporarily> | still, PLZ, if it is not too much of a burden to you, i kindly ask you trying out if you can pass the captchas without problems i.e. eternal loop without entry. THX and never mind if you cannot. Right time is of essence, yeah |
| 18:10:33 | <russss> | temporarily: FWIW I do use it quite regularly and I haven't seen any captchas here. |
| 18:20:29 | <temporarily> | whilst the archiveteam wiki itself details 'Constant reCAPTCHAs'? |
| 18:25:59 | <rewby> | The wiki is out of date all the time |
| 18:28:40 | <@JAA> | temporarily: You can contact the person behind archive.whatever at https://blog.archive.today/ask |
| 18:30:43 | <@JAA> | I can access the site and past snapshots without issues, but trying to submit something yields a captcha. |
| 18:31:16 | <@JAA> | Which is how it has been for a long time now, I believe. |
| 18:33:34 | <temporarily> | thank you very much JAA, haven't thought of it till now since just such a prob and minutes ago found the way via webproxy 4everproxy, will try, great, hopefully the ask wont take a captcha too.. otherwise i guess bionicbeaver standard firefox has become TOO old maybe? |
| 18:34:53 | <temporarily> | and thank you VERY MUCH for trying! kindness will prevail for sure! |
| 18:34:53 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 18:35:04 | <h2ibot> | JustAnotherArchivist created Template:IA collection (+28, Create separate template for collections for…): https://wiki.archiveteam.org/?title=Template%3AIA%20collection |
| 18:36:04 | <h2ibot> | JustAnotherArchivist edited Pastebin (+48, Link to collection): https://wiki.archiveteam.org/?diff=47706&oldid=47389 |
| 18:37:04 | <h2ibot> | JustAnotherArchivist edited Reddit (+46, Link to collection): https://wiki.archiveteam.org/?diff=47707&oldid=47382 |
| 18:37:05 | <h2ibot> | JustAnotherArchivist edited URLs (+44, Link to collection): https://wiki.archiveteam.org/?diff=47708&oldid=47627 |
| 18:38:45 | | HP_Archivist (HP_Archivist) joins |
| 18:46:19 | <temporarily> | great, the captcha at the blogs' ask site worked! hopefully helpful for others as well, thx again |
| 18:48:09 | <@JAA> | :-) |
| 18:51:07 | <h2ibot> | JustAnotherArchivist edited CodePlex (+399, Link to data and other infobox updates): https://wiki.archiveteam.org/?diff=47709&oldid=47527 |
| 18:53:07 | <h2ibot> | JustAnotherArchivist edited Facepunch Forums (+114, Link to data and other infobox updates): https://wiki.archiveteam.org/?diff=47710&oldid=47557 |
| 18:54:51 | | Wingy1 quits [Read error: Connection reset by peer] |
| 18:55:43 | | Wingy1 (Wingy) joins |
| 18:55:47 | | Rodeo joins |
| 18:55:58 | <Rodeo> | hello :-) |
| 18:56:05 | <Rodeo> | quick question from Belgium |
| 18:56:10 | <Rodeo> | I would like to archive a Belgian blogger service, bloggen.be, that I fear will go offline soon |
| 18:56:16 | <Rodeo> | How can I set up an archiving campaign for that specific site myself |
| 18:57:45 | <@JAA> | Huh, almost 200k blogs. Impressive. (I wonder how much of that is spam.) |
| 18:58:38 | <@JAA> | And around 3.3M blog posts it seems. |
| 18:58:44 | <@JAA> | So yeah, this is not small. |
| 18:59:20 | <Rodeo> | How do you check the number of blogs and posts so quickly? |
| 18:59:32 | <@JAA> | The number of blogs is listed on https://www.bloggen.be/toplijsten_blog_blogs_bloggen.php |
| 18:59:46 | <@JAA> | 'Totaal aantal blogs: 196.518' |
| 19:00:03 | <Rodeo> | check |
| 19:00:06 | <@JAA> | For the post count, I just went to a random blog with a recent post and looked at the post ID. |
| 19:00:58 | <@JAA> | What's the reason for your concern that it might vanish soon? |
| 19:01:28 | <@JAA> | And also, how soon is 'soon'? :-) |
| 19:03:50 | <Rodeo> | Their service is managed by a single Dutchman, all blogs were offline for a 4-5 week period (July-August), So yeah, I think it won't last long |
| 19:03:51 | <temporarily> | All of a sudden, passing the captchas works again, maybe i sat on a spammer's dynamic ip?? Anyways, thank you very much for your help, kindness JAA |
| 19:03:56 | <rewby> | 3.3M posts seem relatively tame compared to some of the stuff we do |
| 19:06:33 | <@JAA> | rewby: Well yeah, my point was rather that it's quite a sizeable thing if you're not already familiar with web archival since Rodeo said they want to do it themselves. |
| 19:06:41 | <rewby> | Ah like that, yeah |
| 19:06:58 | <rewby> | I was gonna say this feels within qwarc territory, but I don't think anyone but you knows how to make qwarc dance |
| 19:07:09 | <@JAA> | Yep :-/ |
| 19:07:31 | <rewby> | I've been meaning to learn either qwarc or warrior dev, but I'm so damn short on time always |
| 19:07:55 | <@JAA> | Yeah, I'd suggest to wait with learning how to tame qwarc until the next version anyway. |
| 19:08:01 | <rewby> | Noted |
| 19:08:21 | <rewby> | Is this trackerv2 kinda next version or something we can expect within a year? |
| 19:08:37 | <@JAA> | :-) |
| 19:09:00 | <@JAA> | I intend to work on it over the next two months, provided not too many new deadlines come up. |
| 19:09:05 | <rewby> | Fair enough |
| 19:09:08 | <rewby> | I wish you good luck on that |
| 19:09:23 | <rewby> | Let me know when it's done |
| 19:09:24 | <@JAA> | Well, first pywarc, then qwarc on top of that. |
| 19:09:25 | <rewby> | I'm curious |
| 19:09:55 | <rewby> | Anyway, I'll stop distracting the conversation from the bloggen.be platform |
| 19:11:06 | <rewby> | They appear to have an alphabetical list of blogs (https://www.bloggen.be/zoeken_blogs_alfabetisch.php?letter=A), so that's a nice starting point to enumerate the blogs |
| 19:11:27 | <@JAA> | I'm thinking of just throwing it into AB. |
| 19:11:47 | <@JAA> | Since it's all on one domain, that should work fine. |
| 19:12:43 | <rewby> | Could do yeah |
| 19:12:50 | <@JAA> | I could qwarc the contents of course, but there's custom styling and stuff it seems. |
| 19:13:13 | <@JAA> | Handling that with qwarc is a pain. |
| 19:14:07 | <Rodeo> | ok, I checked out ArchiveBot, that seems to be a good starting point |
| 19:14:33 | <rewby> | Yeah. If archivebot doesn't work I can always try and make heritrix deal with it. |
| 19:16:06 | <rewby> | JAA: Slightly worried that if you put it in AB you might fill up a pipeline. I don't know if AB chunks and uploads tasks while it's still grabbing or if it does each task in one go. |
| 19:16:17 | <rewby> | fill up being eat all the disk space |
| 19:16:28 | <@JAA> | It uploads 5 GiB chunks. |
| 19:16:33 | <rewby> | Ah okay |
| 19:16:41 | <rewby> | You can tell I'm not much of an advanced AB user |
| 19:16:46 | <@JAA> | We have two jobs with over 11 TiB each currently. :-P |
| 19:16:55 | <rewby> | Fair enough |
| 19:17:02 | <rewby> | I'd say try AB. See what happens |
| 19:17:10 | <rewby> | If nothing else, you can always just kill it |
| 19:17:14 | <@JAA> | I'm more worried about outlinks and the cookie jar performance issues. |
| 19:17:14 | <Rodeo> | sooo.. I should give the !archive https://www.bloggen.be/overzicht_alle_blogs_bloggen.php |
| 19:17:38 | <@JAA> | Would be a shame to skip the outlinks though. |
| 19:17:40 | <Rodeo> | correction, so I should launch the command https://www.bloggen.be/overzicht_alle_blogs_bloggen.php |
| 19:17:45 | <rewby> | JAA: I can try heritrix. I have stuff for outlinks |
| 19:18:04 | <rewby> | But that'll have to wait a week or so pending me dealing with exams |
| 19:19:11 | <rewby> | But yeah, I have a slightly modified version of heritrix that will archive a (set of) domain(s) based on (a list of) regex(es). All links it finds that aren't either embedded in the page (like images or scripts or css) end up in a big zst file |
| 19:19:11 | <@JAA> | Started an AB job with outlinks, let's see how it goes. :-) |
| 19:19:23 | <@JAA> | Ah, neat. |
| 19:19:27 | <rewby> | Yeah, it's quite useful |
| 19:19:40 | <rewby> | A pita to configure properly, but given about 3 hours I can make it happen |
| 19:19:48 | <rewby> | I then just let my server nom |
| 19:19:55 | <rewby> | Last time I did this we fed the outlinks.zst into urls |
| 19:20:31 | <rewby> | Actually no, it wasn't urls. Someone else ran a grab-site on them |
| 19:20:35 | <rewby> | But oh well |
| 19:21:10 | <@JAA> | Yeah, that would have the same cookie jar issue though. |
| 19:23:36 | <rewby> | See, I didn't run the outlinks myself |
| 19:23:42 | <rewby> | I just gave those to HCross to do with as he saw fit |
| 19:23:57 | <rewby> | I thought he was just gonna throw them in urls, but he chose that method |
| 19:24:23 | <@JAA> | If you're unaware: Python's cookie jar implementation is quite poor performance-wise. It's a big dict structure, and it iterates over every domain (or even every cookie) in the jar on every HTTP request. |
| 19:25:03 | <rewby> | I knew it was bad performance wise. I didn't know that's what it did |
| 19:25:05 | <rewby> | that's kinda bad |
| 19:25:19 | <@JAA> | Bigger AB jobs with many outlinks often run into that wall eventually. |
| 19:25:28 | <@JAA> | I assumed it'd be like millions of cookies. Nope, 10k does it. |
| 19:25:47 | <rewby> | That's the law of O(n^k) stuff, it bites you much sooner than you think |
| 19:25:56 | <@JAA> | Yeah |
| 19:26:57 | | treora quits [Remote host closed the connection] |
| 19:27:58 | | treora joins |
| 19:29:02 | | Rodeo leaves |
| 19:45:07 | | treora quits [Client Quit] |
| 19:49:18 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 19:50:45 | | treora joins |
| 19:51:58 | | HP_Archivist (HP_Archivist) joins |
| 19:53:58 | | lennier1 quits [Client Quit] |
| 19:55:36 | | lennier1 (lennier1) joins |
| 20:03:26 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 20:22:54 | | AlsoHP_Archivist joins |
| 20:26:26 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 20:28:11 | | Wingy1 quits [Remote host closed the connection] |
| 20:29:02 | | Wingy1 (Wingy) joins |
| 20:30:17 | | AlsoHP_Archivist quits [Client Quit] |
| 20:42:34 | | temporarily quits [Ping timeout: 244 seconds] |
| 21:22:02 | | LeGoupil quits [Client Quit] |
| 21:50:38 | <h2ibot> | Jake edited Periscope (+41, Add the collection): https://wiki.archiveteam.org/?diff=47711&oldid=47625 |
| 21:50:39 | <h2ibot> | Jake edited Webs (+36, Add link to collection): https://wiki.archiveteam.org/?diff=47712&oldid=47623 |
| 21:50:40 | <h2ibot> | Jake edited MediaFire (+41, Add link to collection): https://wiki.archiveteam.org/?diff=47713&oldid=47622 |
| 21:50:41 | <h2ibot> | Jake edited Google Drive (+43, Add link to collection): https://wiki.archiveteam.org/?diff=47714&oldid=47694 |
| 21:50:42 | <h2ibot> | Jake edited Chrome Web Store (+54, Add link to collection): https://wiki.archiveteam.org/?diff=47715&oldid=47609 |
| 21:55:19 | <Jake> | (Unrelated, but this project never got completed? I think? https://wiki.archiveteam.org/index.php/LiveJournal ) |
| 21:58:48 | <@JAA> | Yeah, we should go through all upcoming/inprogress projects and clean that up. |
| 21:58:53 | <@JAA> | Lots of dead things from years ago in there. |
| 21:59:14 | | Stilett0 is now known as Stiletto |
| 21:59:39 | <h2ibot> | Jake edited YouTube (+47, Add link to collection): https://wiki.archiveteam.org/?diff=47716&oldid=47692 |
| 21:59:40 | <h2ibot> | Jake edited Google Sites (+51, Add link to collection): https://wiki.archiveteam.org/?diff=47717&oldid=47353 |
| 21:59:41 | <h2ibot> | JustAnotherArchivist changed the user rights of User:Jake |
| 21:59:53 | <@JAA> | Jake: ^ You're automoderated now. |
| 21:59:57 | <Jake> | https://wiki.archiveteam.org/index.php/Ownlog.com seems incomplete as well. (and I'm unsure if this ever got started? https://wiki.archiveteam.org/index.php/JamiiForums ) |
| 21:59:59 | <Jake> | Thanks! |
| 22:00:09 | <@JAA> | JamiiForums never started. |
| 22:02:40 | <h2ibot> | Jake edited Roblox (+92, Added links to the collection (I believe both…): https://wiki.archiveteam.org/?diff=47718&oldid=47691 |
| 22:04:40 | <h2ibot> | Jake edited Google Poly (+50, Add link to collection): https://wiki.archiveteam.org/?diff=47719&oldid=47616 |
| 22:05:40 | <h2ibot> | Jake edited Bintray (+47, Add link to collection): https://wiki.archiveteam.org/?diff=47720&oldid=47634 |
| 22:10:42 | <h2ibot> | Jake edited Tinkercad (+219, Add links to the 3 items.): https://wiki.archiveteam.org/?diff=47721&oldid=47635 |
| 22:15:54 | | Wingy1 quits [Remote host closed the connection] |
| 22:16:43 | <h2ibot> | Jake edited Angelfire (+23, Add link to AB job.): https://wiki.archiveteam.org/?diff=47722&oldid=41026 |
| 22:16:47 | | Wingy1 (Wingy) joins |
| 22:19:44 | <h2ibot> | Jake edited Flickr (+46, Add link to collection): https://wiki.archiveteam.org/?diff=47723&oldid=45998 |
| 22:25:44 | <h2ibot> | Jake edited NewsGrabber (+41, Add link to collection): https://wiki.archiveteam.org/?diff=47724&oldid=47499 |
| 22:28:45 | <h2ibot> | Jake edited Yahoo! Groups (+43, Add link to collection): https://wiki.archiveteam.org/?diff=47725&oldid=47337 |
| 22:30:54 | | Wingy1 quits [Remote host closed the connection] |
| 22:31:42 | | Wingy1 (Wingy) joins |
| 22:31:46 | <h2ibot> | Jake edited Quizlet (+55, Move the links to the infobox): https://wiki.archiveteam.org/?diff=47726&oldid=40157 |
| 22:51:27 | | BlueMaxima joins |
| 22:51:50 | <h2ibot> | Jake edited Tumblr (+38, Add link to collection.): https://wiki.archiveteam.org/?diff=47727&oldid=47702 |
| 22:52:50 | <h2ibot> | Jake edited Yuku.com (+36, Add link to collection): https://wiki.archiveteam.org/?diff=47728&oldid=47491 |
| 22:53:22 | | TheTechRobo quits [Remote host closed the connection] |
| 22:57:36 | | TheTechRobo (TheTechRobo) joins |
| 23:00:54 | | Wingy1 quits [Remote host closed the connection] |
| 23:00:56 | | Arcorann (Arcorann) joins |
| 23:01:51 | | Wingy1 (Wingy) joins |
| 23:05:27 | | driib7 quits [Read error: Connection reset by peer] |
| 23:05:37 | | driib7 (driib) joins |
| 23:15:06 | | Wingy1 quits [Remote host closed the connection] |
| 23:15:57 | | Wingy1 (Wingy) joins |
| 23:46:40 | | Wingy1 quits [Remote host closed the connection] |
| 23:47:31 | | Wingy1 (Wingy) joins |
| 23:52:39 | | HP_Archivist (HP_Archivist) joins |
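
A note on the cookie jar remark at 19:24:23: the per-request iteration over stored domains can be observed with the standard library's `http.cookiejar`. The sketch below is illustrative only. It uses CPython's stock `CookieJar`, not whatever ArchiveBot runs internally, and `CountingPolicy`, `make_cookie`, `domain_checks_per_request`, and the `siteN.example` hostnames are made-up names for the demo. It counts how many stored domains the jar asks the policy about for one outgoing request.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar, DefaultCookiePolicy


class CountingPolicy(DefaultCookiePolicy):
    """Hypothetical helper: counts per-domain checks made by the jar."""

    def __init__(self):
        super().__init__()
        self.domain_checks = 0

    def domain_return_ok(self, domain, request):
        # The jar calls this for each domain it holds cookies for.
        self.domain_checks += 1
        return super().domain_return_ok(domain, request)


def make_cookie(domain):
    """Build a minimal session cookie for the given (made-up) domain."""
    return Cookie(
        version=0, name="session", value="x",
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def domain_checks_per_request(n_domains):
    """Fill a jar with one cookie per domain, then prepare ONE request."""
    policy = CountingPolicy()
    jar = CookieJar(policy=policy)
    for i in range(n_domains):
        jar.set_cookie(make_cookie(f"site{i}.example"))

    # Only site0.example matches, but the jar still walks its domain table.
    request = urllib.request.Request("http://site0.example/")
    jar.add_cookie_header(request)
    return policy.domain_checks


if __name__ == "__main__":
    for n in (10, 1_000, 10_000):
        print(n, "domains in jar ->", domain_checks_per_request(n), "checks")
```

With CPython's implementation, the number of checks grows with the number of domains in the jar even though only one of them matches the request. That is consistent with the slowdown described for jobs that pick up cookies from many outlinked hosts.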