00:06:00wickedplayer494 quits [Ping timeout: 265 seconds]
01:16:11kiryu quits [Client Quit]
01:19:24Arcorann (Arcorann) joins
01:27:01icedice quits [Client Quit]
01:33:42wickedplayer494 joins
01:41:17etnguyen03 quits [Ping timeout: 252 seconds]
01:47:52<pabs>arkiver: there are definitely times when AB is at or above the job limit + 5 pending, especially when flashfire42 is around doing ISP stuff. I tend to avoid doing proactive stuff a fair bit, unless we are more than a few jobs below the limit and things seem quiet
01:48:28<pabs>and when/if snscrape comes back, there will be a big backlog of twitter archiving to do
01:49:01<flashfire42>Sorry bout that hahaha
01:50:23<pabs>and there will be other situations where higher peak capacity is useful; e.g. the Adelaide University merger is going to be tons of jobs due to the many subdomains
01:51:53<pabs>no, I think you're doing good stuff flashfire42 :)
01:52:25<pabs>anyway, I'm sure we will always be able to reach whatever the job limit is :)
02:12:09etnguyen03 (etnguyen03) joins
02:25:17owen joins
02:25:41etnguyen03 quits [Ping timeout: 265 seconds]
02:30:59<owen>What's the easiest way to archive a mid-sized portion of a website? (ex. example.com/stuff-to-archive/*)
02:33:25threedeeitguy3 quits [Ping timeout: 265 seconds]
02:33:32threedeeitguy3 (threedeeitguy) joins
02:36:40<pabs>owen: archivebot
02:37:02<pabs>just pass us the URL and we will run it, then everything happens automatically after that
02:37:35<pabs>that needs a directory index or some other link based mechanism that lists all subcontents though
02:38:50BlueMaxima quits [Read error: Connection reset by peer]
02:40:14etnguyen03 (etnguyen03) joins
02:47:27DogsRNice__ joins
02:47:35threedeeitguy3 quits [Client Quit]
02:50:38DogsRNice__ quits [Remote host closed the connection]
02:50:50DogsRNice__ joins
02:51:47DogsRNice_ quits [Ping timeout: 265 seconds]
02:54:33threedeeitguy39 (threedeeitguy) joins
03:01:40owen quits [Client Quit]
03:05:49DogsRNice__ quits [Read error: Connection reset by peer]
03:06:10<pokechu22>The alternative technique is saving the entire site :P
03:12:39<project10>AB job 8k26biu6lro5cb6vi3awnu3z8 is a chonky one
03:20:06<@arkiver>#shreddit is restarted
03:20:42<@arkiver>pabs: so, we're holding back now?
03:21:20<pabs>some folks are occasionally yeah
03:24:06decagon__ joins
03:27:26krvme quits [Ping timeout: 252 seconds]
03:27:52<Ryz>Reminder: inactive user content on Google (excluding YouTube) may start being deleted in December 2023 S:
03:33:29<@arkiver>when IA is all fine again with taking data, got great plans for expanding our archiving - or well especially plans for #//
03:33:48<@arkiver>we'll significantly increase our coverage of 'important stuff'
03:36:55<fireonlive>awesome possum
03:44:29etnguyen03 quits [Ping timeout: 252 seconds]
03:47:21etnguyen03 (etnguyen03) joins
04:00:18<h2ibot>JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=50742&oldid=50671
04:00:26BigBrain_ quits [Ping timeout: 245 seconds]
04:00:36<pabs>Ryz: does that include public blogspot/blogger/etc stuff?
04:02:26BigBrain_ (bigbrain) joins
04:09:59kiryu (kiryu) joins
04:10:12kiryu quits [Client Quit]
04:11:31<Ryz>pabs, yes...S:
04:13:40<Ryz>The problem with Blogger user number IDs is that they give 429s pretty easily, at least when running ArchiveBot, which is why I would want this to get off the ground as soon as possible...
04:13:46<Ryz>arkiver?
04:27:40<pabs>fuck
04:29:36pabs . o O 0 ( #Y )
04:31:42pabs has 1323 URLs in his blogspot archive TODO...
04:31:50<fireonlive>x_x
04:33:26<pabs>ISTR with blogspot it is easy to enumerate lots of blogspot blogs starting with one blog: see what other blogs that author has, and same for all the commenters
04:33:59pabs checks shell history for some terrible oneliners
04:34:19<pabs>also there's tons of spammers on blogspot
04:35:45<fireonlive>yeah one of the sites i want to get archived eventually is just 99% overrun with spam (it's also js-hell-frontend-on-top-of-phpBB2) :/
04:35:47<fireonlive>sad to se
04:35:49<fireonlive>see
04:36:36kiryu (kiryu) joins
04:38:48<pabs>https://transfer.archivete.am/gJyh0/blogspot-profile-enumerator.sh
04:40:59<pabs>https://transfer.archivete.am/sKHm2/pabs-archive-blogspot-todo.txt
04:43:53etnguyen03 quits [Client Quit]
05:17:26Chris5010 quits [Ping timeout: 252 seconds]
05:22:30Chris5010 (Chris5010) joins
05:30:16kiryu quits [Remote host closed the connection]
05:30:50<shinji257>I got a couple of tasks that keep getting stuck at "Lua runtime error: reddit.lua:286: attempt to call global 'unicode_codepoint_as_utf8' (a nil value)"? They are reddit project tasks.
05:36:35kiryu (kiryu) joins
05:47:14<imer>shinji257: that's known I think, #shreddit is the project channel :)
05:47:33<imer>Just waiting for a fix, should be sorted later today
05:55:12erkinalp quits [Remote host closed the connection]
06:31:08nicolas17 quits [Ping timeout: 252 seconds]
07:01:11dumbgoy quits [Ping timeout: 265 seconds]
07:05:01Unholy236131661808515 quits [Remote host closed the connection]
07:06:36Unholy236131661808515 (Unholy2361) joins
07:26:10themadpro (themadpro) joins
08:07:23nulldata quits [Ping timeout: 252 seconds]
08:10:42nulldata (nulldata) joins
08:44:20kiryu quits [Client Quit]
08:45:23Island quits [Read error: Connection reset by peer]
08:51:31kiryu (kiryu) joins
09:00:44nulldata quits [Ping timeout: 252 seconds]
09:04:03nulldata (nulldata) joins
09:15:55<pabs>does anyone know if AB looks at <a href> links inside HTML comments?
09:17:49Exorcism is now known as Exorcism_
09:24:37Exorcism (exorcism) joins
09:35:31themadpro quits [Client Quit]
09:43:01<mgrandi>https://www.msn.com/en-us/news/technology/atari-pulls-nostalgia-power-move-and-buys-homebrew-community-forum/ar-AA1grqaA, I've heard rumblings that they are going to purge boards on https://forums.atariage.com, dunno how easy it is to archive, it's an Invision forum board
09:52:28<pabs>there is an AB job in progress
09:52:46<pabs>and the forum has been saved before, 2021 or 2019 IIRC
09:53:08<pabs>unfortunately we had to restart the job a couple of times and slow it down a fair bit
09:57:56<mgrandi>Awesome
09:59:17<pabs>got the main website too and some other subdomains
10:00:01railen63 quits [Remote host closed the connection]
10:00:17railen63 joins
10:57:52JensRex quits []
10:58:24JensRex (JensRex) joins
13:12:10aa joins
13:14:00aa quits [Remote host closed the connection]
13:26:56Arcorann quits [Ping timeout: 252 seconds]
14:03:08railen63 quits [Remote host closed the connection]
14:05:02railen63 joins
14:05:10Webuser10794 joins
14:05:43Webuser10794 quits [Remote host closed the connection]
14:06:17Webuser10794 joins
14:07:26Webuser693 joins
14:07:34driib quits [Quit: The Lounge - https://thelounge.chat]
14:08:31Webuser10794 quits [Remote host closed the connection]
14:09:05<Webuser693>Hey, do you have the video link, it's called https://www.youtube.com/watch?v=fUVrK6089fs
14:11:37Webuser693 quits [Remote host closed the connection]
14:12:22driib (driib) joins
14:25:11DogsRNice joins
14:25:56etnguyen03 (etnguyen03) joins
14:43:11kiryu quits [Client Quit]
14:47:59nncandy joins
14:48:31nncandy quits [Remote host closed the connection]
14:49:32etnguyen03 quits [Ping timeout: 265 seconds]
14:50:47<shinji257>imer: acknowledged
14:54:22gfhh quits [Ping timeout: 265 seconds]
14:55:35etnguyen03 (etnguyen03) joins
14:57:23gfhh joins
15:11:51kiryu joins
15:11:51kiryu quits [Changing host]
15:11:51kiryu (kiryu) joins
15:28:45dumbgoy joins
15:37:34<h2ibot>Bzc6p edited ArchiveTeam Domains (+37, /* archiveteam.hu */ Lecsű is discontinued): https://wiki.archiveteam.org/?diff=50743&oldid=50703
15:42:34<h2ibot>Bzc6p edited Deathwatch (-3, /* 2023 */ fix grammar): https://wiki.archiveteam.org/?diff=50744&oldid=50741
15:43:35<h2ibot>Bzc6p edited Valhalla (+0, /* Physical Options */ typo): https://wiki.archiveteam.org/?diff=50745&oldid=50740
16:00:44fede joins
16:01:03<fede>hello
16:01:25<fede>is this like an archiving project?
16:02:43<that_lurker>This is the team that does the projects. You can find info about current and old archiving projects in the wiki https://wiki.archiveteam.org/index.php/Main_Page
16:03:06<that_lurker>On the page of every project you can also find the corresponding irc channel.
16:04:04zhongfu quits [Ping timeout: 258 seconds]
16:04:13<fede>there's no everyplay archive right?
16:05:49zhongfu (zhongfu) joins
16:07:06<imer>https://wiki.archiveteam.org/index.php/Everyplay doesn't look like it
16:07:25<fede>thats so sad
16:07:33<fede>i lost all my videos
16:11:56AmAnd0A quits [Ping timeout: 252 seconds]
16:12:41<TheTechRobo>yeah, I unfortunately haven't been able to find anyone who archived it
16:12:45AmAnd0A joins
16:19:01kiryu quits [Client Quit]
16:22:31Naruyoko quits [Quit: Leaving]
16:26:14qw3rty quits [Ping timeout: 252 seconds]
16:27:55iCaotix quits [Read error: Connection reset by peer]
16:28:09iCaotix joins
16:39:52gfhh quits [Read error: Connection reset by peer]
16:40:22Naruyoko joins
16:42:27gfhh joins
16:53:45szczot3k quits [Ping timeout: 265 seconds]
16:53:55kiryu (kiryu) joins
16:57:29szczot3k (szczot3k) joins
17:15:46icedice (icedice) joins
17:21:18etnguyen03 quits [Ping timeout: 265 seconds]
17:25:39lunik173 quits [Ping timeout: 265 seconds]
17:32:44szczot3k quits [Client Quit]
17:33:21szczot3k (szczot3k) joins
17:45:53fede quits [Remote host closed the connection]
17:46:52AmAnd0A quits [Read error: Connection reset by peer]
17:47:19AmAnd0A joins
17:48:21jacksonchen666 quits [Ping timeout: 245 seconds]
18:07:27AmAnd0A quits [Read error: Connection reset by peer]
18:09:34AmAnd0A joins
18:09:56AlsoHP_Archivist joins
18:12:16AmAnd0A quits [Read error: Connection reset by peer]
18:12:34AmAnd0A joins
18:24:11<h2ibot>Exorcism edited DokuWiki (+92): https://wiki.archiveteam.org/?diff=50746&oldid=50527
18:26:11<h2ibot>Exorcism edited Wordpress.com (+106): https://wiki.archiveteam.org/?diff=50747&oldid=28940
18:36:04Hackerpcs quits [Quit: Hackerpcs]
18:38:35Hackerpcs (Hackerpcs) joins
18:40:12AlsoHP_Archivist quits [Client Quit]
18:43:52AmAnd0A quits [Read error: Connection reset by peer]
18:43:56AmAnd0A joins
18:57:20erkinalp joins
19:02:05<@JAA>pabs: AB parses the HTML and then walks the element tree. It shouldn't see anything in comments.
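[Editor's sketch, not ArchiveBot's actual code: the behaviour JAA describes can be illustrated with Python's standard-library html.parser, used here as a stand-in for whatever parser AB really uses. The LinkCollector class, the extract_links helper, and the SAMPLE markup are all hypothetical names made up for this illustration. A real HTML parser hands comments over as opaque text, so an <a href> inside a comment never shows up as an element.]

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags only (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only real <a> elements reach this callback. Comment bodies are
        # delivered to handle_comment() as plain text and are ignored here.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the hrefs of all <a> elements found outside comments."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample: one live link, one link hidden inside an HTML comment.
SAMPLE = '<p><a href="/visible">x</a><!-- <a href="/hidden">y</a> --></p>'

if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

[Under these assumptions only /visible is collected. A crawler that instead scanned the raw page text with a regular expression would also pick up /hidden, which is the difference pabs was asking about.]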
19:04:31driib quits [Client Quit]
19:16:34qwertyasdfuiopghjkl quits [Client Quit]
19:18:53<@arkiver>thuban: is there any update on the orange sites coming back?
19:19:21<h2ibot>Myusernameisanything edited University Web Hosting (-7, Changing not saved yet tag to lost.): https://wiki.archiveteam.org/?diff=50748&oldid=47676
19:19:22<h2ibot>Myusernameisanything edited List of websites excluded from the Wayback Machine (+57, Added 2 links): https://wiki.archiveteam.org/?diff=50749&oldid=50702
19:19:23<h2ibot>Myusernameisanything edited BluWiki (+10, If there are about 20 dumps, it is partially…): https://wiki.archiveteam.org/?diff=50750&oldid=27576
19:19:24<h2ibot>Gridkr edited List of websites excluded from the Wayback Machine (+20, Add https://nexo.com/): https://wiki.archiveteam.org/?diff=50751&oldid=50749
19:20:59driib (driib) joins
19:33:49qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
19:35:01BigBrain_ quits [Ping timeout: 245 seconds]
19:36:00AmAnd0A quits [Read error: Connection reset by peer]
19:36:07AmAnd0A joins
19:36:38AmAnd0A quits [Read error: Connection reset by peer]
19:36:56AmAnd0A joins
19:37:18BigBrain_ (bigbrain) joins
19:55:42etnguyen03 (etnguyen03) joins
20:00:29<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=50754&oldid=50751
20:04:32<h2ibot>JustAnotherArchivist edited SoundCloud (+236, Datetimeify, add 2019 projectn't, add…): https://wiki.archiveteam.org/?diff=50755&oldid=48897
20:09:14Island joins
20:41:59etnguyen03 quits [Ping timeout: 252 seconds]
20:56:00Miki57 joins
21:00:08Miki_57 quits [Ping timeout: 252 seconds]
21:08:51erkinalp quits [Remote host closed the connection]
21:15:43<project10>#archivebot jobs submit discovered things into the backfeed system, yes?
21:19:23AmAnd0A quits [Ping timeout: 252 seconds]
21:20:06AmAnd0A joins
21:28:51<TheTechRobo>i don’t think so, assuming you mean e.g. queuing imgur URLs in #imgone
21:30:36<@JAA>project10: No, there's zero interaction between AB and DPoS projects.
21:36:33<project10>well the genesis of my question was seeing #telegrab items submitted via AB (job 1ty54jgyh2n6iv2ri6o0gbbbp)
21:37:25fireonlive quits [Excess Flood]
21:37:57fireonlive (fireonlive) joins
21:42:07AmAnd0A quits [Read error: Connection reset by peer]
21:42:32AmAnd0A joins
21:44:17<@JAA>That's just me archiving URLs shared in AT channels so our logs aren't full of dead links in the future.
21:44:21iCaotix quits [Read error: Connection reset by peer]
21:44:36<project10>oh :)
21:46:06iCaotix joins
21:46:32<fireonlive>JAA++
21:47:10<that_lurker>we need commode points system here as well :P
21:54:01etnguyen03 (etnguyen03) joins
21:56:03nicolas17 joins
21:56:55lunik173 joins
22:23:26<fireonlive>JAA++
22:23:27<eggdrop>karma for 'JAA' is now 1
22:23:30<fireonlive>lol
22:28:41etnguyen03 quits [Ping timeout: 252 seconds]
22:31:05<nicolas17>2 files remaining and I'll finish getting the listing of all yahoo-videos .tar.bz2 files
22:31:35etnguyen03 (etnguyen03) joins
22:32:13<nicolas17>my intention was to get *.tar.bz2 first while I wrote a more efficient script to get the .tar lists, which of course I haven't actually started yet so I'll have to continue the .tar files the slow way
22:38:50<@JAA>++fireonlive
22:39:03<fireonlive>f
22:39:13<@JAA>Pff, doesn't even understand pre-incrementing.
22:39:29<fireonlive>:p
22:44:58<TheTechRobo>eggdrop—
22:45:26<TheTechRobo>Oh thanks the lounge i really needed that transformation
22:45:56<@JAA>The Lounge--
22:45:57<eggdrop>[karma] 'The Lounge' is now at -1
22:46:33<@JAA>The Lounge--
22:46:35<eggdrop>[karma] 'The Lounge' is now at -1
22:46:43<@JAA>Ah, works with a normal space, too. :-)
22:46:51<Terbium>The Lounge++
22:46:51<eggdrop>[karma] 'The Lounge' is now at 0
22:46:54<fireonlive>TheTechRobo: i do believe that was iOS
22:46:56<fireonlive>:P
22:47:05<TheTechRobo>Oh thanks apple then
22:47:10<Terbium>iPhone--\
22:47:12<Terbium>iPhone--
22:47:13<eggdrop>[karma] 'iPhone' is now at -1
22:47:18<fireonlive>!
22:47:26<TheTechRobo>Dictating how I type letters, thanks Timmy
22:48:28<fireonlive>>not knowing how to configure text replacement
22:49:39<@JAA>This can go in -ot now. :-)
22:49:49<fireonlive>:)
22:49:51<@JAA>Apparently my FuzzyMemories.TV crawl is nearly done.
22:50:39<@JAA>It has a bit of pagination to hunt down but has already retrieved most /watch/ pages and the accompanying videos (that aren't 404s).
22:52:52BearFortress quits [Ping timeout: 265 seconds]
22:52:57<@JAA>Specifically, video IDs go to 4794, and my crawl has retrieved 4668 as of a couple minutes ago.
22:53:16<@JAA>~100 GiB so far
22:55:35<@JAA>4054 actual videos as of just now based on some crude log grepping.
23:00:51benjinsm joins
23:00:56Naruyoko5 joins
23:04:28Naruyoko quits [Ping timeout: 265 seconds]
23:04:28benjins quits [Ping timeout: 265 seconds]
23:09:39eythian quits [Client Quit]
23:10:03eythian joins
23:27:16icedice quits [Client Quit]
23:30:46BlueMaxima joins
23:36:53eggdrop quits [Ping timeout: 252 seconds]
23:39:26AmAnd0A quits [Remote host closed the connection]
23:39:38AmAnd0A joins
23:44:02AmAnd0A quits [Ping timeout: 252 seconds]
23:47:58systwi quits [Ping timeout: 265 seconds]
23:48:07systwi__ (systwi) joins
23:48:19eggdrop (eggdrop) joins
23:48:59railen63 quits [Remote host closed the connection]
23:53:27railen63 joins
23:56:54octylFractal|m joins