00:00:51Wingy1 quits [Remote host closed the connection]
00:01:42Wingy1 (Wingy) joins
00:16:27Specular joins
01:00:39<@JAA>My 4Players forums grab finished. Apart from 72 URLs that probably all resulted in SQL errors (e.g. https://forum.4pforen.4players.de/viewtopic.php?p=41325 ), I should have everything in there. The continuous archival of new content is obviously still running.
01:00:41dm4v quits [Read error: Connection reset by peer]
01:02:56dm4v joins
01:02:58dm4v quits [Changing host]
01:02:58dm4v (dm4v) joins
02:02:20dm4v quits [Read error: Connection reset by peer]
02:03:12dm4v joins
02:03:14dm4v quits [Changing host]
02:03:14dm4v (dm4v) joins
02:20:46ddd quits [Client Quit]
02:20:54qwertyasdfuiopghjkl87 joins
02:22:04qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
02:24:48Specular quits [Client Quit]
02:45:28paul2520 quits [Remote host closed the connection]
03:06:55qwertyasdfuiopghjkl87 is now known as qwertyasdfuiopghjkl
03:20:15HP_Archivist quits [Ping timeout: 258 seconds]
03:21:58ThreeHM quits [Ping timeout: 252 seconds]
03:23:50ThreeHM (ThreeHeadedMonkey) joins
03:53:06pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
03:53:29monoxane quits [Read error: Connection reset by peer]
03:54:07monoxane (monoxane) joins
03:55:08pabs (pabs) joins
04:21:53qw3rty__ joins
04:25:48qw3rty_ quits [Ping timeout: 258 seconds]
05:27:21fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173-224-26-244.ptcnet.net))]
05:27:27fuzzy8021 (fuzzy8021) joins
05:41:40fuzzy8021 quits [Read error: Connection reset by peer]
05:49:52fuzzy8021 (fuzzy8021) joins
06:03:41BlueMaxima quits [Read error: Connection reset by peer]
06:40:05Stiletto quits [Remote host closed the connection]
06:40:17Stiletto joins
07:01:18Stilett0 joins
07:03:22Stiletto quits [Ping timeout: 265 seconds]
07:03:44sec^nd quits [Ping timeout: 258 seconds]
07:08:29sec^nd (second) joins
07:29:52Wingy1 quits [Read error: Connection reset by peer]
07:30:51Wingy1 (Wingy) joins
07:34:25Wingy1 quits [Remote host closed the connection]
07:35:11Wingy1 (Wingy) joins
08:00:07britmob256364 quits [Quit: britmob256364]
08:40:18monoxane4 (monoxane) joins
08:42:56monoxane quits [Ping timeout: 265 seconds]
08:42:57monoxane4 is now known as monoxane
09:28:48qwertyasdfuiopghjkl quits [Client Quit]
09:32:12qwertyasdfuiopghjkl joins
11:28:08sec^nd quits [Remote host closed the connection]
11:28:31sec^nd (second) joins
11:53:00Wingy1 quits [Remote host closed the connection]
11:54:05Wingy1 (Wingy) joins
11:57:52monoxane quits [Ping timeout: 252 seconds]
12:05:41britmob2563647 joins
12:59:34Wingy1 quits [Remote host closed the connection]
13:00:30Wingy1 (Wingy) joins
13:07:39HP_Archivist (HP_Archivist) joins
13:46:54paul2520 (paul2520) joins
14:02:43Arcorann quits [Ping timeout: 258 seconds]
14:10:10Wingy1 quits [Remote host closed the connection]
14:11:00Wingy1 (Wingy) joins
15:26:03cpina_ joins
15:26:03cpina quits [Read error: Connection reset by peer]
15:28:28<h2ibot>Tech234a edited Current Projects (+69, Google Sites and Webs are up well past their…): https://wiki.archiveteam.org/?diff=47701&oldid=47697
15:31:28<h2ibot>Tech234a edited Tumblr (+149, Tumblr was acquired by Automattic in 2019): https://wiki.archiveteam.org/?diff=47702&oldid=46129
16:09:31Wingy1 quits [Remote host closed the connection]
16:10:23Wingy1 (Wingy) joins
16:24:43Wingy1 quits [Remote host closed the connection]
16:25:35Wingy1 (Wingy) joins
16:34:09Wingy1 quits [Remote host closed the connection]
16:35:20Wingy1 (Wingy) joins
16:55:26Wingy1 quits [Remote host closed the connection]
16:56:14Wingy1 (Wingy) joins
17:10:33HackMii quits [Ping timeout: 258 seconds]
17:28:49G4te_Keep3r4 joins
17:28:58G4te_Keep3r quits [Ping timeout: 252 seconds]
17:28:58G4te_Keep3r4 is now known as G4te_Keep3r
17:38:51HackMii (hacktheplanet) joins
17:39:14Wingy1 quits [Remote host closed the connection]
17:39:59temporarily joins
17:40:07Wingy1 (Wingy) joins
17:42:11<temporarily>archive.today gives me one captcha after another, but I can't get through. It last worked maybe a few weeks ago. I'm on Bionic Beaver with the standard Firefox, in both private and normal mode. Can anyone PLZ verify this issue? THANKS!!!
17:43:08<temporarily>.de IP no proxy
17:46:54<h2ibot>Thezt edited List of websites excluded from the Wayback Machine (+20): https://wiki.archiveteam.org/?diff=47703&oldid=47699
17:47:24LeGoupil joins
17:55:19<temporarily>with 4everproxy dot com I can reach the site, but the captchas for archiving don't work through the web proxy, even in Firefox's normal 'cookies on' mode
17:56:01<rewby>I'm pretty sure that archive.today is not us
17:56:36<russss>yeah different archive
17:58:35<temporarily>so maybe Firefox on Ubuntu 18 is too old (running from USB, cannot update), and/or .de IPs are blocked?
17:59:18<rewby>We don't know. We're not the people you're looking for.
17:59:32<temporarily>thx for answer, is there somewhere you know of?
18:00:30<temporarily>sorry, I read the in-depth article on https://wiki.archiveteam.org/index.php/Archive.today and it looks quite well-informed
18:00:56<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=47704&oldid=47703
18:09:13<temporarily>still, PLZ, if it is not too much of a burden, could you try whether you can pass the captchas without problems, i.e. without ending up in an endless loop with no entry? THX, and never mind if you cannot. Right, time is of the essence, yeah
18:10:33<russss>temporarily: FWIW I do use it quite regularly and I haven't seen any captchas here.
18:20:29<temporarily>whilst the archiveteam wiki itself details 'Constant reCAPTCHAs'?
18:25:59<rewby>The wiki is out of date all the time
18:28:40<@JAA>temporarily: You can contact the person behind archive.whatever at https://blog.archive.today/ask
18:30:43<@JAA>I can access the site and past snapshots without issues, but trying to submit something yields a captcha.
18:31:16<@JAA>Which is how it has been for a long time now, I believe.
18:33:34<temporarily>thank you very much JAA, I hadn't thought of that until now. Minutes ago I found a way in via the 4everproxy web proxy; will try. Great, hopefully the ask page won't need a captcha too.. otherwise I guess the standard Firefox on Bionic Beaver has become TOO old maybe?
18:34:53<temporarily>and thank you VERY MUCH for trying! kindness will prevail for sure!
18:34:53HP_Archivist quits [Ping timeout: 258 seconds]
18:35:04<h2ibot>JustAnotherArchivist created Template:IA collection (+28, Create separate template for collections for…): https://wiki.archiveteam.org/?title=Template%3AIA%20collection
18:36:04<h2ibot>JustAnotherArchivist edited Pastebin (+48, Link to collection): https://wiki.archiveteam.org/?diff=47706&oldid=47389
18:37:04<h2ibot>JustAnotherArchivist edited Reddit (+46, Link to collection): https://wiki.archiveteam.org/?diff=47707&oldid=47382
18:37:05<h2ibot>JustAnotherArchivist edited URLs (+44, Link to collection): https://wiki.archiveteam.org/?diff=47708&oldid=47627
18:38:45HP_Archivist (HP_Archivist) joins
18:46:19<temporarily>great, the captcha on the blog's ask page worked! hopefully helpful for others as well, thx again
18:48:09<@JAA>:-)
18:51:07<h2ibot>JustAnotherArchivist edited CodePlex (+399, Link to data and other infobox updates): https://wiki.archiveteam.org/?diff=47709&oldid=47527
18:53:07<h2ibot>JustAnotherArchivist edited Facepunch Forums (+114, Link to data and other infobox updates): https://wiki.archiveteam.org/?diff=47710&oldid=47557
18:54:51Wingy1 quits [Read error: Connection reset by peer]
18:55:43Wingy1 (Wingy) joins
18:55:47Rodeo joins
18:55:58<Rodeo>hello :-)
18:56:05<Rodeo>quick question from Belgium
18:56:10<Rodeo>I would like to archive a Belgian blogger service, bloggen.be, that I fear will go offline soon
18:56:16<Rodeo>How can I set up an archiving campaign for that specific site myself?
18:57:45<@JAA>Huh, almost 200k blogs. Impressive. (I wonder how much of that is spam.)
18:58:38<@JAA>And around 3.3M blog posts it seems.
18:58:44<@JAA>So yeah, this is not small.
18:59:20<Rodeo>How do you check the number of blogs and posts so quickly?
18:59:32<@JAA>The number of blogs is listed on https://www.bloggen.be/toplijsten_blog_blogs_bloggen.php
18:59:46<@JAA>'Totaal aantal blogs: 196.518'
19:00:03<Rodeo>check
19:00:06<@JAA>For the post count, I just went to a random blog with a recent post and looked at the post ID.
19:00:58<@JAA>What's the reason for your concern that it might vanish soon?
19:01:28<@JAA>And also, how soon is 'soon'? :-)
19:03:50<Rodeo>Their service is managed by a single Dutchman, and all blogs were offline for a 4-5 week period (July-August). So yeah, I think it won't last long
19:03:51<temporarily>All of a sudden, passing the captchas works again; maybe I was on a spammer's former dynamic IP?? Anyway, thank you very much for your help and kindness, JAA
19:03:56<rewby>3.3M posts seem relatively tame compared to some of the stuff we do
19:06:33<@JAA>rewby: Well yeah, my point was rather that it's quite a sizeable thing if you're not already familiar with web archival since Rodeo said they want to do it themselves.
19:06:41<rewby>Ah like that, yeah
19:06:58<rewby>I was gonna say this feels within qwarc territory, but I don't think anyone but you knows how to make qwarc dance
19:07:09<@JAA>Yep :-/
19:07:31<rewby>I've been meaning to learn either qwarc or warrior dev, but I'm so damn short on time always
19:07:55<@JAA>Yeah, I'd suggest to wait with learning how to tame qwarc until the next version anyway.
19:08:01<rewby>Noted
19:08:21<rewby>Is this trackerv2 kinda next version or something we can expect within a year?
19:08:37<@JAA>:-)
19:09:00<@JAA>I intend to work on it over the next two months, provided not too many new deadlines come up.
19:09:05<rewby>Fair enough
19:09:08<rewby>I wish you good luck on that
19:09:23<rewby>Let me know when it's done
19:09:24<@JAA>Well, first pywarc, then qwarc on top of that.
19:09:25<rewby>I'm curious
19:09:55<rewby>Anyway, I'll stop distracting the conversation from the bloggen.be platform
19:11:06<rewby>They appear to have an alphabetical list of blogs (https://www.bloggen.be/zoeken_blogs_alfabetisch.php?letter=A), so that's a nice starting point to enumerate the blogs
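A minimal sketch of the enumeration idea in the message above, assuming the `letter` parameter accepts every letter from A to Z (only `letter=A` appears in the log) and ignoring any pagination or non-letter buckets the index may have. The helper name is hypothetical; it only builds seed URLs and does not fetch anything.

```python
import string
from urllib.parse import urlencode

# Base URL taken from the message above; everything else is an assumption.
INDEX_URL = "https://www.bloggen.be/zoeken_blogs_alfabetisch.php"


def alphabetical_seed_urls(letters=string.ascii_uppercase):
    """Return one index URL per letter (hypothetical helper, no network access)."""
    return [f"{INDEX_URL}?{urlencode({'letter': letter})}" for letter in letters]


if __name__ == "__main__":
    # Print the seed list, e.g. to feed into a crawler of your choice.
    for url in alphabetical_seed_urls():
        print(url)
```

Whether these pages actually list every blog, and whether they paginate, would need to be checked against the live site before relying on them.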
19:11:27<@JAA>I'm thinking of just throwing it into AB.
19:11:47<@JAA>Since it's all on one domain, that should work fine.
19:12:43<rewby>Could do yeah
19:12:50<@JAA>I could qwarc the contents of course, but there's custom styling and stuff it seems.
19:13:13<@JAA>Handling that with qwarc is a pain.
19:14:07<Rodeo>ok, I checked out ArchiveBot, that seems to be a good starting point
19:14:33<rewby>Yeah. If archivebot doesn't work I can always try and make heritrix deal with it.
19:16:06<rewby>JAA: Slightly worried that if you put it in AB you might fill up a pipeline. I don't know if AB chunks and uploads tasks while it's still grabbing or if it does each task in one go.
19:16:17<rewby>fill up being eat all the disk space
19:16:28<@JAA>It uploads 5 GiB chunks.
19:16:33<rewby>Ah okay
19:16:41<rewby>You can tell I'm not much of an advanced AB user
19:16:46<@JAA>We have two jobs with over 11 TiB each currently. :-P
19:16:55<rewby>Fair enough
19:17:02<rewby>I'd say try AB. See what happens
19:17:10<rewby>If nothing else, you can always just kill it
19:17:14<@JAA>I'm more worried about outlinks and the cookie jar performance issues.
19:17:14<Rodeo>sooo.. I should give the !archive https://www.bloggen.be/overzicht_alle_blogs_bloggen.php
19:17:38<@JAA>Would be a shame to skip the outlinks though.
19:17:40<Rodeo>correction, so I should launch the command https://www.bloggen.be/overzicht_alle_blogs_bloggen.php
19:17:45<rewby>JAA: I can try heritrix. I have stuff for outlinks
19:18:04<rewby>But that'll have to wait a week or so pending me dealing with exams
19:19:11<rewby>But yeah, I have a slightly modified version of heritrix that will archive a (set of) domain(s) based on (a list of) regex(es). All links it finds that aren't either embedded in the page (like images or scripts or css) end up in a big zst file
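The routing described in the message above can be sketched roughly as follows. This is not Heritrix configuration, only a plain Python illustration of the idea: links matching a scope regex are crawled, out-of-scope page requisites are still fetched, and everything else is set aside as an outlink. The scope pattern, function names and example URLs are assumptions, and the zst compression step is left out.

```python
import re

# Hypothetical scope rule: anything on bloggen.be counts as in scope.
SCOPE_PATTERNS = [re.compile(r"^https?://(www\.)?bloggen\.be/")]


def route_link(url, embedded, scope_patterns=SCOPE_PATTERNS):
    """Decide what a crawler does with a discovered link.

    Returns "crawl" for in-scope links, "fetch" for out-of-scope page
    requisites (images, scripts, CSS), and "outlink" for everything else.
    """
    if any(pattern.search(url) for pattern in scope_patterns):
        return "crawl"
    if embedded:
        return "fetch"
    return "outlink"


def split_links(links, scope_patterns=SCOPE_PATTERNS):
    """Partition (url, embedded) pairs into a fetch queue and an outlink list."""
    queue, outlinks = [], []
    for url, embedded in links:
        action = route_link(url, embedded, scope_patterns)
        (outlinks if action == "outlink" else queue).append(url)
    return queue, outlinks
```

The outlink list is what would later be written out and handed to a separate job, as discussed in the following messages.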
19:19:11<@JAA>Started an AB job with outlinks, let's see how it goes. :-)
19:19:23<@JAA>Ah, neat.
19:19:27<rewby>Yeah, it's quite useful
19:19:40<rewby>A pita to configure properly, but given about 3 hours I can make it happen
19:19:48<rewby>I then just let my server nom
19:19:55<rewby>Last time I did this we fed the outlinks.zst into urls
19:20:31<rewby>Actually no, it wasn't urls. Someone else ran a grab-site on them
19:20:35<rewby>But oh well
19:21:10<@JAA>Yeah, that would have the same cookie jar issue though.
19:23:36<rewby>See, I didn't run the outlinks myself
19:23:42<rewby>I just gave those to HCross to do with as he saw fit
19:23:57<rewby>I thought he was just gonna throw them in urls, but he chose that method
19:24:23<@JAA>If you're unaware: Python's cookie jar implementation is quite poor performance-wise. It's a big dict structure, and it iterates over every domain (or even every cookie) in the jar on every HTTP request.
19:25:03<rewby>I knew it was bad performance wise. I didn't know that's what it did
19:25:05<rewby>that's kinda bad
19:25:19<@JAA>Bigger AB jobs with many outlinks often run into that wall eventually.
19:25:28<@JAA>I assumed it'd be like millions of cookies. Nope, 10k does it.
19:25:47<rewby>That's the law of O(n^k) stuff, it bites you much sooner than you think
19:25:56<@JAA>Yeah
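A small sketch of the behaviour JAA describes, using only the standard library's `http.cookiejar`. It assumes CPython's implementation, where building the Cookie header asks the policy about every domain stored in the jar. The counting policy, helper names, domains and cookie values are all made up for illustration, and nothing here is taken from ArchiveBot itself.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar, DefaultCookiePolicy


class CountingPolicy(DefaultCookiePolicy):
    """Cookie policy that counts how many stored domains are examined."""

    def __init__(self):
        super().__init__()
        self.domain_checks = 0

    def domain_return_ok(self, domain, request):
        # Called by the jar once per stored domain while building a header.
        self.domain_checks += 1
        return super().domain_return_ok(domain, request)


def make_cookie(domain, value):
    """Build a simple session cookie for one (made-up) domain."""
    return Cookie(
        version=0, name="k", value=value, port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )


def domain_checks_for_one_request(n_domains):
    """Fill a jar with n_domains cookies, then count checks for one request."""
    policy = CountingPolicy()
    jar = CookieJar(policy)
    for i in range(n_domains):
        jar.set_cookie(make_cookie(f"site{i}.example", f"v{i}"))
    request = urllib.request.Request("http://site0.example/")
    jar.add_cookie_header(request)  # no network access, only header building
    return policy.domain_checks, request.get_header("Cookie")


if __name__ == "__main__":
    for n in (10, 1_000, 10_000):
        checks, _ = domain_checks_for_one_request(n)
        print(f"{n} stored domains: {checks} domain checks for one request")
```

Since every request pays a cost proportional to the number of stored domains, the total work over a crawl grows roughly with requests times domains, which fits a job slowing down well before the jar looks large.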
19:26:57treora quits [Remote host closed the connection]
19:27:58treora joins
19:29:02Rodeo leaves
19:45:07treora quits [Client Quit]
19:49:18HP_Archivist quits [Read error: Connection reset by peer]
19:50:45treora joins
19:51:58HP_Archivist (HP_Archivist) joins
19:53:58lennier1 quits [Client Quit]
19:55:36lennier1 (lennier1) joins
20:03:26qwertyasdfuiopghjkl quits [Remote host closed the connection]
20:22:54AlsoHP_Archivist joins
20:26:26HP_Archivist quits [Ping timeout: 258 seconds]
20:28:11Wingy1 quits [Remote host closed the connection]
20:29:02Wingy1 (Wingy) joins
20:30:17AlsoHP_Archivist quits [Client Quit]
20:42:34temporarily quits [Ping timeout: 244 seconds]
21:22:02LeGoupil quits [Client Quit]
21:50:38<h2ibot>Jake edited Periscope (+41, Add the collection): https://wiki.archiveteam.org/?diff=47711&oldid=47625
21:50:39<h2ibot>Jake edited Webs (+36, Add link to collection): https://wiki.archiveteam.org/?diff=47712&oldid=47623
21:50:40<h2ibot>Jake edited MediaFire (+41, Add link to collection): https://wiki.archiveteam.org/?diff=47713&oldid=47622
21:50:41<h2ibot>Jake edited Google Drive (+43, Add link to collection): https://wiki.archiveteam.org/?diff=47714&oldid=47694
21:50:42<h2ibot>Jake edited Chrome Web Store (+54, Add link to collection): https://wiki.archiveteam.org/?diff=47715&oldid=47609
21:55:19<Jake>(Unrelated, but this project never got completed? I think? https://wiki.archiveteam.org/index.php/LiveJournal )
21:58:48<@JAA>Yeah, we should go through all upcoming/inprogress projects and clean that up.
21:58:53<@JAA>Lots of dead things from years ago in there.
21:59:14Stilett0 is now known as Stiletto
21:59:39<h2ibot>Jake edited YouTube (+47, Add link to collection): https://wiki.archiveteam.org/?diff=47716&oldid=47692
21:59:40<h2ibot>Jake edited Google Sites (+51, Add link to collection): https://wiki.archiveteam.org/?diff=47717&oldid=47353
21:59:41<h2ibot>JustAnotherArchivist changed the user rights of User:Jake
21:59:53<@JAA>Jake: ^ You're automoderated now.
21:59:57<Jake>https://wiki.archiveteam.org/index.php/Ownlog.com seems incomplete as well. (and I'm unsure if this ever got started? https://wiki.archiveteam.org/index.php/JamiiForums )
21:59:59<Jake>Thanks!
22:00:09<@JAA>JamiiForums never started.
22:02:40<h2ibot>Jake edited Roblox (+92, Added links to the collection (I believe both…): https://wiki.archiveteam.org/?diff=47718&oldid=47691
22:04:40<h2ibot>Jake edited Google Poly (+50, Add link to collection): https://wiki.archiveteam.org/?diff=47719&oldid=47616
22:05:40<h2ibot>Jake edited Bintray (+47, Add link to collection): https://wiki.archiveteam.org/?diff=47720&oldid=47634
22:10:42<h2ibot>Jake edited Tinkercad (+219, Add links to the 3 items.): https://wiki.archiveteam.org/?diff=47721&oldid=47635
22:15:54Wingy1 quits [Remote host closed the connection]
22:16:43<h2ibot>Jake edited Angelfire (+23, Add link to AB job.): https://wiki.archiveteam.org/?diff=47722&oldid=41026
22:16:47Wingy1 (Wingy) joins
22:19:44<h2ibot>Jake edited Flickr (+46, Add link to collection): https://wiki.archiveteam.org/?diff=47723&oldid=45998
22:25:44<h2ibot>Jake edited NewsGrabber (+41, Add link to collection): https://wiki.archiveteam.org/?diff=47724&oldid=47499
22:28:45<h2ibot>Jake edited Yahoo! Groups (+43, Add link to collection): https://wiki.archiveteam.org/?diff=47725&oldid=47337
22:30:54Wingy1 quits [Remote host closed the connection]
22:31:42Wingy1 (Wingy) joins
22:31:46<h2ibot>Jake edited Quizlet (+55, Move the links to the infobox): https://wiki.archiveteam.org/?diff=47726&oldid=40157
22:51:27BlueMaxima joins
22:51:50<h2ibot>Jake edited Tumblr (+38, Add link to collection.): https://wiki.archiveteam.org/?diff=47727&oldid=47702
22:52:50<h2ibot>Jake edited Yuku.com (+36, Add link to collection): https://wiki.archiveteam.org/?diff=47728&oldid=47491
22:53:22TheTechRobo quits [Remote host closed the connection]
22:57:36TheTechRobo (TheTechRobo) joins
23:00:54Wingy1 quits [Remote host closed the connection]
23:00:56Arcorann (Arcorann) joins
23:01:51Wingy1 (Wingy) joins
23:05:27driib7 quits [Read error: Connection reset by peer]
23:05:37driib7 (driib) joins
23:15:06Wingy1 quits [Remote host closed the connection]
23:15:57Wingy1 (Wingy) joins
23:46:40Wingy1 quits [Remote host closed the connection]
23:47:31Wingy1 (Wingy) joins
23:52:39HP_Archivist (HP_Archivist) joins