00:08:02Lord_Nightmare quits [Quit: ZNC - http://znc.in]
00:11:17Lord_Nightmare (Lord_Nightmare) joins
00:14:01Wohlstand quits [Quit: Wohlstand]
00:27:09^ quits [Ping timeout: 272 seconds]
00:30:26SootBector quits [Remote host closed the connection]
00:31:38SootBector (SootBector) joins
00:42:00ichdasich quits [Remote host closed the connection]
00:46:58tekulvw (tekulvw) joins
00:49:17etnguyen03 quits [Client Quit]
00:51:46tekulvw quits [Ping timeout: 268 seconds]
00:54:47^ (^) joins
00:56:55Barto quits [Ping timeout: 272 seconds]
01:03:43Barto (Barto) joins
01:09:35archiveDrill quits [Ping timeout: 272 seconds]
01:29:13etnguyen03 (etnguyen03) joins
01:29:33tekulvw (tekulvw) joins
01:34:19tekulvw quits [Ping timeout: 268 seconds]
01:36:40archiveDrill joins
01:39:54tekulvw (tekulvw) joins
01:45:03tekulvw quits [Ping timeout: 272 seconds]
01:48:24tekulvw (tekulvw) joins
01:48:31<nstrom|m>ty imer, see that now. hard to keep track of chat in telegrab because of bot activity
01:48:56<@imer>#telegrab-chat ? :)
01:53:17tekulvw quits [Ping timeout: 272 seconds]
01:58:29TunaLobster quits [Quit: So long and thanks for all the fish]
01:59:22TunaLobster joins
01:59:45Webuser378569 joins
01:59:54tekulvw (tekulvw) joins
02:01:18Webuser378569 quits [Client Quit]
02:35:43dxrt_ is now known as dxrt
02:35:43dxrt quits [Changing host]
02:35:43dxrt (dxrt) joins
02:35:43@ChanServ sets mode: +o dxrt
02:37:21<datechnoman>Omg I didn't know that that channel existed! >.<
02:45:19Hackerpcs quits [Remote host closed the connection]
02:46:13Hackerpcs (Hackerpcs) joins
02:46:26chunkynutz609 joins
02:47:09chunkynutz60 quits [Read error: Connection reset by peer]
02:47:09chunkynutz609 is now known as chunkynutz60
02:59:47tekulvw quits [Ping timeout: 272 seconds]
03:07:01tekulvw (tekulvw) joins
03:11:45tekulvw quits [Ping timeout: 268 seconds]
03:23:03tekulvw (tekulvw) joins
03:27:39tekulvw quits [Ping timeout: 272 seconds]
03:41:59etnguyen03 quits [Client Quit]
03:42:58etnguyen03 (etnguyen03) joins
03:50:46DogsRNice_ quits [Read error: Connection reset by peer]
03:51:30tekulvw (tekulvw) joins
03:56:09tekulvw quits [Ping timeout: 268 seconds]
04:02:25tekulvw (tekulvw) joins
04:07:22Island quits [Read error: Connection reset by peer]
04:07:27etnguyen03 quits [Remote host closed the connection]
04:11:48tekulvw quits [Remote host closed the connection]
04:12:06tekulvw (tekulvw) joins
04:19:35tekulvw quits [Ping timeout: 272 seconds]
04:21:06tekulvw (tekulvw) joins
04:25:45tekulvw quits [Ping timeout: 268 seconds]
04:27:01tekulvw (tekulvw) joins
04:32:15tekulvw quits [Ping timeout: 272 seconds]
04:35:50fireatseaparks quits [Quit: Textual IRC Client: www.textualapp.com]
04:36:18fireatseaparks (fireatseaparks) joins
04:38:36tekulvw (tekulvw) joins
04:57:49tekulvw quits [Ping timeout: 268 seconds]
05:02:07tekulvw (tekulvw) joins
05:03:10Webuser760137 joins
05:04:36n9nes quits [Ping timeout: 268 seconds]
05:06:07n9nes joins
05:06:39Webuser760137 quits [Client Quit]
05:12:09tekulvw quits [Ping timeout: 272 seconds]
05:14:49tekulvw (tekulvw) joins
05:21:52tekulvw quits [Ping timeout: 268 seconds]
05:25:53Webuser463246 joins
05:26:04tekulvw (tekulvw) joins
05:36:51tekulvw quits [Remote host closed the connection]
05:37:09tekulvw (tekulvw) joins
05:50:00Webuser463246 quits [Client Quit]
05:52:06tekulvw quits [Ping timeout: 268 seconds]
05:54:36sec^nd quits [Remote host closed the connection]
05:55:07sec^nd (second) joins
05:55:14tekulvw (tekulvw) joins
05:57:51nexussfan quits [Quit: Konversation terminated!]
06:00:17tekulvw quits [Ping timeout: 272 seconds]
06:04:17tekulvw (tekulvw) joins
06:09:21tekulvw quits [Ping timeout: 268 seconds]
06:30:00tekulvw (tekulvw) joins
06:33:12<@arkiver>datechnoman: me neither :P
06:33:24nine quits [Quit: See ya!]
06:33:38nine joins
06:33:38nine quits [Changing host]
06:33:38nine (nine) joins
06:33:53<datechnoman>I feel so stupid. The amount of times I thought to myself we needed a channel for #telegrab... it was there all along...
06:46:31<datechnoman>Hoping we can keep images in the mix as they are a gold mine
07:04:51tekulvw quits [Ping timeout: 268 seconds]
07:04:58<pabs>justauser: I haven't yet mailed them about the git, IIRC that isn't affected
07:23:27tekulvw (tekulvw) joins
07:28:19tekulvw quits [Ping timeout: 272 seconds]
07:45:53runxiyu quits [Quit: ZNC 1.8.2+deb3.1+deb12u1 - https://znc.in]
08:00:35tekulvw (tekulvw) joins
08:05:17tekulvw quits [Ping timeout: 268 seconds]
08:08:51<DlugasnyPL>When you grab Telegram content, you surely end up with duplicate files (for example, pictures shared between channels and users). Do you have any mechanism that removes duplicate content (hash checks etc.)? Or do you simply store the complete stream?
08:18:57HP_Archivist quits [Read error: Connection reset by peer]
08:44:32JTL quits [Remote host closed the connection]
08:45:14JTL (JTL) joins
09:16:32TheEnbyperor quits [Read error: Connection reset by peer]
09:16:33TheEnbyperor_ is now known as TheEnbyperor
09:17:12linuxgemini3 (linuxgemini) joins
09:18:40linuxgemini quits [Ping timeout: 268 seconds]
09:18:41linuxgemini3 is now known as linuxgemini
09:26:55TheEnbyperor_ (TheEnbyperor) joins
09:43:09tekulvw (tekulvw) joins
09:48:17tekulvw quits [Ping timeout: 272 seconds]
10:16:41Webuser955198 joins
10:17:03<Webuser955198>
10:17:12<Webuser955198>Not allowed,,?
10:18:24Webuser955198 quits [Client Quit]
11:02:34Sluggs quits [Excess Flood]
11:03:48evergreen5 quits [Quit: Bye]
11:04:14evergreen56 joins
11:07:41Sluggs (Sluggs) joins
11:11:24Sluggs quits [Excess Flood]
11:18:49Sluggs (Sluggs) joins
11:27:53Sluggs quits [Excess Flood]
11:28:07sec^nd quits [Ping timeout: 245 seconds]
11:28:31sec^nd (second) joins
11:33:29Sluggs (Sluggs) joins
11:35:15<h2ibot>Cooljeanius edited Substack (+23, +cat): https://wiki.archiveteam.org/?diff=60512&oldid=60496
11:40:15<h2ibot>Cooljeanius edited Medium (+93, copyedit (wikify, add category)): https://wiki.archiveteam.org/?diff=60513&oldid=60479
11:41:05<DlugasnyPL>I have compared the WARCs generated by wget-at and Browsertrix (which I'm currently using to archive pages), and I know the difference between the two tools. What do you think about creating a small team (a subgroup of Archive Team) that records pages using Browsertrix or any other crawler able to properly execute JavaScript and the other site components that wget-at cannot handle? I'm talking about WARCs that a user
11:41:05<DlugasnyPL>can download and browse, with the replayed page being roughly 90-100% the same as the live one. What do you think?
11:44:16<h2ibot>Cooljeanius created Scribd (+196, Created page with "{{stub}} '''Scribd''' is a…): https://wiki.archiveteam.org/?oldid=60514
11:44:17<h2ibot>Cooljeanius edited ArchiveBot/Monitoring (+4, /* Ideas */ Wikify ([[Scribd]])): https://wiki.archiveteam.org/?diff=60515&oldid=58718
11:45:16<h2ibot>Cooljeanius edited Academia.edu (+49, mention [[Scribd]]): https://wiki.archiveteam.org/?diff=60516&oldid=60028
12:00:01Bleo1826007227196234552220 quits [Quit: The Lounge - https://thelounge.chat]
12:01:18<h2ibot>Manu edited Discourse/archived (+96, Queued forums.whonix.org): https://wiki.archiveteam.org/?diff=60517&oldid=60488
12:02:41Bleo1826007227196234552220 joins
12:03:18<h2ibot>Manu edited Discourse/archived (+100, Queued community.ntppool.org): https://wiki.archiveteam.org/?diff=60518&oldid=60517
12:09:07DlugasnyPL quits [Read error: Connection reset by peer]
12:09:10DlugasnyPL joins
12:43:32oxtyped_ joins
12:43:34tekulvw (tekulvw) joins
12:43:43oxtyped quits [Ping timeout: 272 seconds]
12:43:46oxtyped_ is now known as oxtyped
12:48:20tekulvw quits [Ping timeout: 268 seconds]
13:11:46Arcorann quits [Ping timeout: 268 seconds]
13:27:54PredatorIWD257 joins
13:31:51PredatorIWD25 quits [Ping timeout: 272 seconds]
13:31:51PredatorIWD257 is now known as PredatorIWD25
13:32:44benjins3 quits [Ping timeout: 268 seconds]
13:33:05FiTheArchiver joins
13:35:13FiTheArchiver quits [Client Quit]
13:42:36useretail quits [Remote host closed the connection]
13:43:37useretail joins
14:21:40Dada joins
14:33:12<cruller>To quote https://github.com/iipc/warcaroo/blob/main/cdp/README.md, "In the future, we might switch to the webdriver-bidi protocol once it covers everything we need."
14:34:09<cruller>Is such a future possible? (Of course, I hope so.)
14:37:33Dada quits [Remote host closed the connection]
14:38:45Dada joins
15:07:04benjins3 joins
15:11:29Island joins
15:12:33knecht quits [Ping timeout: 272 seconds]
15:19:08knecht (knecht) joins
15:19:31Cuphead2527480 (Cuphead2527480) joins
15:30:28nulldata-alt1 (nulldata) joins
15:49:30tekulvw (tekulvw) joins
15:54:34tekulvw quits [Ping timeout: 268 seconds]
16:02:20tekulvw (tekulvw) joins
16:06:54tekulvw quits [Ping timeout: 268 seconds]
16:11:54<justauser>Yakov, klea: Can't reproduce. blog.archive.today redirects me to https://archive-is.tumblr.com/, which is still good.
16:12:13<klea>Yeah, it also redirects me to that.
16:12:26<klea>I supposed that's what Yakov meant by them no longer being on tumblr.
16:14:21<justauser>They are, last post today and certainly nothing was wiped.
16:17:55<klea>Oh ok.
16:19:06sec^nd quits [Remote host closed the connection]
16:19:31sec^nd (second) joins
16:21:00<h2ibot>Klea edited RTVE Play (-97): https://wiki.archiveteam.org/?diff=60519&oldid=58295
16:38:26CYBERDEV quits [Quit: Leaving]
16:41:30ducky quits [Remote host closed the connection]
16:41:45ducky (ducky) joins
16:42:22CYBERDEV joins
16:57:09kansei (kansei) joins
16:58:42kansei- quits [Ping timeout: 268 seconds]
17:00:13APOLLO03 quits [Ping timeout: 272 seconds]
17:01:18DlugasnyPL quits [Quit: Leaving]
17:04:39@Sanqui quits [Ping timeout: 272 seconds]
17:07:33Sanqui joins
17:07:38Sanqui quits [Changing host]
17:07:38Sanqui (Sanqui) joins
17:07:38@ChanServ sets mode: +o Sanqui
17:21:14<hexagonwin>i'm developing a python+warcio based crawler for some web service (needs auth). does anyone know of a good way to keep track of downloaded URLs?
17:22:02<hexagonwin>i'd prefer to have it default to not re-visiting URLs, as the contents are (mostly) unchanging and the same URLs are repeated across many different pages.
17:22:16<justauser>Warrior tracker uses a Bloom filter.
17:22:24<justauser>AB uses an SQLite DB.
17:22:42<hexagonwin>i'll take a look, thanks
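A minimal sketch of the SQLite approach mentioned above, assuming only Python's standard `sqlite3` module. The `SeenUrls` class, its table name, and its method names are hypothetical and are not taken from ArchiveBot's actual schema or code; the idea is simply that a primary key on the URL lets the database answer "have I fetched this before?" across restarts.

```python
import sqlite3


class SeenUrls:
    """Persistent set of already-fetched URLs (hypothetical helper)."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to survive crawler restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY)"
        )
        self.db.commit()

    def mark(self, url):
        """Record url; return True if it was new, False if already seen."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO seen (url) VALUES (?)", (url,)
        )
        self.db.commit()
        # rowcount is 0 when the primary key conflict caused the row to be ignored.
        return cur.rowcount == 1

    def __contains__(self, url):
        row = self.db.execute(
            "SELECT 1 FROM seen WHERE url = ?", (url,)
        ).fetchone()
        return row is not None


if __name__ == "__main__":
    seen = SeenUrls()
    for url in ["https://example.org/a", "https://example.org/a"]:
        if seen.mark(url):
            print("fetch", url)
        else:
            print("skip", url)
```

Calling `mark()` before each request and skipping the fetch when it returns False gives the "don't re-visit by default" behaviour. A Bloom filter, as in the Warrior tracker case, would trade exactness for a much smaller memory footprint.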
17:22:50<justauser>And warcio is discouraged IIRC.
17:22:56<hexagonwin>oh is it?
17:23:09<justauser>Writing WARCs: No. Has long-standing bugs regarding correct preservation of data as sent by the server.[6][7]
17:23:14<hexagonwin>oops
17:23:37<hexagonwin>i couldn't find documentation on using wpull as a library (and not a command line util) so i was using warcio instead :/
17:24:23<justauser>Maybe warc-for-humans is considered production-grade now? /cc DigitalDragons
17:27:13tekulvw (tekulvw) joins
17:32:00tekulvw quits [Ping timeout: 268 seconds]
17:33:05Ryz quits [Quit: Ping timeout (120 seconds)]
17:42:37@dxrt quits [Quit: ZNC - http://znc.sourceforge.net]
17:44:18croissant quits [Quit: Leaving]
17:46:27pokechu22 quits [Ping timeout: 272 seconds]
17:47:19tekulvw (tekulvw) joins
17:47:49dxrt joins
17:48:35dxrt quits [Changing host]
17:48:35dxrt (dxrt) joins
17:48:35@ChanServ sets mode: +o dxrt
17:51:21klea gives out the link to the big warc table: https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem#Information
17:51:48pokechu22 (pokechu22) joins
17:52:09tekulvw quits [Ping timeout: 272 seconds]
17:53:35<hexagonwin>only wget-at, wpull, warcprox, qwarc seem to be usable.. wget-at wouldn't be ideal for integrating in a python script, i can't find documentation for wpull and qwarc, warcprox hardly looks ideal since it's a proxy
17:58:46croissant joins
18:00:07<hexagonwin>so according to https://github.com/webrecorder/warcio/issues/128 , this seems to be the commit that causes non-standard WARCs in warcio: https://github.com/webrecorder/warcio/pull/45
18:00:46<hexagonwin>would it be okay to use if this is reverted? since I can use other libraries for reading the WARC
18:01:01cm quits [Ping timeout: 272 seconds]
18:01:50Dada quits [Remote host closed the connection]
18:03:24Dada joins
18:03:59cm joins
18:07:46nulldata-alt1 quits [Client Quit]
18:14:03<justauser>https://transfer.archivete.am/p0BMm/pomf.lain.la_final.txt
18:14:03<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/p0BMm/pomf.lain.la_final.txt
18:15:28nulldata-alt1 (nulldata) joins
18:32:30HP_Archivist (HP_Archivist) joins
18:35:12nulldata-alt1 quits [Client Quit]
18:35:29nulldata-alt1 (nulldata) joins
18:39:02Cuphead2527480 quits [Quit: Connection closed for inactivity]
18:40:39kansei- (kansei) joins
18:42:11kansei quits [Ping timeout: 272 seconds]
18:47:12Ryz (Ryz) joins
18:56:33nulldata-alt1 quits [Client Quit]
18:56:50nulldata-alt1 (nulldata) joins
19:04:15irisfreckles13 joins
19:04:27tekulvw (tekulvw) joins
19:09:26tekulvw quits [Ping timeout: 268 seconds]
19:16:23kansei (kansei) joins
19:17:01kansei- quits [Ping timeout: 272 seconds]
19:18:50nulldata-alt1 quits [Client Quit]
19:19:07nulldata-alt1 (nulldata) joins