00:05:15<katocala>!a https://southernafrica.iwmi.cgiar.org/ --useragent firefox --igset blogs
00:05:30<katocala>blah
00:37:44<imer>mh, is there a recommended way of handling "ia upload" of many files? had it give up earlier, and after restarting it seems to just reupload/overwrite files it already uploaded.
00:37:50<imer>--delete maybe, if that deletes files once it's uploaded them? but that'd nuke the files on my end..
00:39:30systwi_ (systwi) joins
00:39:32systwi quits [Ping timeout: 252 seconds]
00:41:32<fireonlive>you can use -c to checksum and not upload existing files
00:41:49<fireonlive>"-c, --checksum Skip based on checksum. [default: False]"
00:42:15<@JAA>Make sure there are no pending tasks on the item before using that though: https://github.com/jjjake/internetarchive/issues/289
00:42:29<fireonlive>ah!
00:42:43<imer>that explains why it wasn't working JAA, thanks. i'll give it time to catch up
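A minimal sketch of the idea behind skipping by checksum, as discussed above: hash each local file and upload only those whose MD5 is absent from, or differs from, what the item already holds. This is an illustration, not the internetarchive library's own code; `md5_of`, `files_needing_upload` and the `remote_md5s` mapping are hypothetical names, and the mapping is assumed to have been fetched only after the item's pending tasks have finished.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_upload(local_paths, remote_md5s):
    """Return the paths whose MD5 is missing from, or differs in, remote_md5s.

    remote_md5s is a hypothetical {file name: hex MD5} mapping standing in
    for the item's file listing (an assumption for this sketch).
    """
    pending = []
    for path in map(Path, local_paths):
        if remote_md5s.get(path.name) != md5_of(path):
            pending.append(path)
    return pending
```

If the remote listing is stale because tasks are still queued, every file looks missing and gets uploaded again, which matches the behaviour imer described.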
00:43:11<fireonlive>is there an easy recursive upload? i had some problems with directories being skipped initially
00:43:19<fireonlive>'dupes_db' for example
00:44:12systwi_ is now known as systwi
00:50:20etnguyen03 quits [Ping timeout: 252 seconds]
00:50:51<@JAA>The CLI can upload directories. But if dupes_db in grab-site means the same thing as on AB, ignore its existence and don't upload it.
00:51:22<@JAA>It's a 1 TiB file of NULs.
01:04:50etnguyen03 (etnguyen03) joins
01:15:43<fireonlive>ahh it was a few hundred MB yeah
01:15:50BlueMaxima joins
01:15:54<fireonlive>..i think
01:16:07<fireonlive>ah no, KB
01:16:45<@JAA>I guess grab-site does that differently then.
01:17:24<@JAA>AB fallocates a 1 TiB file there for each job, but the dedupe plugin is broken anyway, so the file is never actually written to.
01:19:17<fireonlive>ahh ok :X yeah that would immediately out of space my little dedi at the moment
01:20:02<fireonlive>i had the links in internet archive.. somewhere
01:20:10<fireonlive>somehow, i don't know how, i have too many channels now
01:20:13<fireonlive>weird thing that xD
01:20:48<@JAA>It's just a sparse file. It takes 4 KiB or something like that on disk.
01:21:32<@JAA>Er right, not fallocate, something else.
01:23:07<fireonlive>ahh ok
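A small sketch of why a sparse file can report a huge size while using almost no disk. It assumes a POSIX system and a filesystem that supports sparse files, and it uses `truncate` purely for illustration; the log does not say which call AB actually uses. `make_sparse_file` and `apparent_and_allocated` are hypothetical helper names.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as fh:
        fh.truncate(size)  # extends the file; unwritten ranges read back as NULs


def apparent_and_allocated(path):
    """Return (apparent size, bytes allocated on disk) for path."""
    st = os.stat(path)
    # st_blocks is unavailable on Windows; on Linux it counts 512-byte units.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dupes_db")
        make_sparse_file(path, 1 << 30)  # 1 GiB here rather than 1 TiB
        print(apparent_and_allocated(path))
```

On a filesystem with sparse-file support, the second number should stay far below the first, which is why such a file would not fill a small disk.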
01:43:37yasomi joins
01:45:44yasomi quits [Changing host]
01:45:44yasomi (yasomi) joins
01:48:11yasomi quits [Client Quit]
01:48:23yasomi (yasomi) joins
02:01:38<HP_Archivist>JAA: I forget, but what's the average time delay for !ao links to show up in WBM these days?
02:16:28etnguyen03 quits [Ping timeout: 265 seconds]
02:26:25etnguyen03 (etnguyen03) joins
03:40:50etnguyen03 quits [Ping timeout: 252 seconds]
04:59:05BlueMaxima quits [Client Quit]
05:26:37hitgrr8 joins
05:32:08Barto quits [Ping timeout: 252 seconds]
05:39:28g0tmk quits [Ping timeout: 265 seconds]
06:01:29IDK (IDK) joins
06:09:26pabs quits [Ping timeout: 265 seconds]
06:12:02pabs (pabs) joins
06:16:44<Doranwen>The fun of setting up to scan a book - only to find out that one of the two Bluetooth clickers I have either needs a new battery or just isn't working, period. They're both the same age, so I have to wonder if it's not the clicker just gone wonky. I remember it had trouble connecting before, but now the phone doesn't see it at all.
06:19:12<Doranwen>Can't scan without it or my hands would be reflected in the pictures. Ah well, I guess that project is going to happen another evening.
06:25:36<pabs>https://www.anandtech.com/show/18901/big-leap-for-hdds-32-tb-hamr-drive-is-coming-40tb-on-horizon https://news.ycombinator.com/item?id=36253499
06:27:54spirit quits [Client Quit]
06:47:47Ivan2261 joins
06:49:04Ivan226 quits [Ping timeout: 265 seconds]
06:49:10<fireonlive>https://twitter.com/llm_sec/status/1667573374426701824?s=12
06:49:13<fireonlive>lmao
07:43:23JackThompson3 quits [Ping timeout: 252 seconds]
07:44:24nito-kihk joins
07:46:15JackThompson3 joins
08:07:50TastyWiener95 quits [Quit: Ping timeout (120 seconds)]
08:08:42TastyWiener95 (TastyWiener95) joins
08:44:10nito-kihk quits [Client Quit]
08:44:22nito-kihk (nito-kihk) joins
10:45:49etnguyen03 (etnguyen03) joins
10:59:37etnguyen03 quits [Client Quit]
10:59:46drin joins
11:02:30geezabiscuit quits [Ping timeout: 252 seconds]
11:02:30drin is now known as geezabiscuit
11:05:33drin joins
11:09:13geezabis- joins
11:09:35geezabiscuit quits [Ping timeout: 265 seconds]
11:10:02geezabis- is now known as geezabiscuit
11:11:50drin quits [Ping timeout: 252 seconds]
11:19:10drin joins
11:21:44geezabiscuit quits [Ping timeout: 252 seconds]
11:21:44drin is now known as geezabiscuit
11:26:59icedice (icedice) joins
11:28:29vegbrasil quits [Remote host closed the connection]
11:38:14Letur quits [Ping timeout: 252 seconds]
11:42:04<pabs>https://theconversation.com/succession-on-the-tibetan-plateau-whats-at-stake-in-the-battle-over-the-dalai-lamas-reincarnation-202353
11:49:24<masterX244>The dupes_db should be a memmapped file. on grab-site it fails at the large size and falls back to smaller sizes, and then dupes_db is used, but that file is just a helper file and holds no useful data once the crawl has finished
11:49:42vegbrasil joins
11:59:41c3manu (c3manu) joins
12:24:11etnguyen03 (etnguyen03) joins
12:30:42vegbrasi_ joins
12:34:20vegbrasil quits [Ping timeout: 252 seconds]
12:34:50icedice quits [Client Quit]
12:40:03icedice (icedice) joins
12:55:36anityatva joins
13:00:10anityatva quits [Remote host closed the connection]
13:00:20vegbrasi_ quits [Remote host closed the connection]
13:00:49vegbrasil joins
13:10:25AmAnd0A quits [Ping timeout: 265 seconds]
13:11:08AmAnd0A joins
13:18:23diggan (diggan) joins
13:54:54Letur joins
14:45:16emberquill quits [Quit: The Lounge - https://thelounge.chat]
14:46:09emberquill (emberquill) joins
15:03:17<nicolas17>masterX244: I wonder how many dupes_db's were uploaded to archive.org...
15:46:57<masterX244>not sure. i usually only upload the *.warc.gz's (and sometimes data that i crunched as input for the crawl when i bruteforced/pre-checked something. had to bruteforce once via Tor to avoid losing an IP address for site archival after already burning 2 on an initial enumerate, there was a bug with high server load that must have tripped off monitoring)
15:52:00<fireonlive>someone told me the whole folder was good but gave me a couple cleanup instructions
15:52:19<fireonlive>like if there’s a -wal with wpull.db
15:52:43<fireonlive>notes are on laptop and i’m on phone atm but i think that was about it
15:58:32<masterX244>the WARCs are the juice. i sometimes got jobs where the wpull.db is mostly garbage (rabbit holes and other shit, grab-site finds urls that shouldn't exist for some odd reason). had some jobs where 9GB of a 10GB wpull db was garbage to ignore
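A rough sketch of the "only upload the *.warc.gz's" approach mentioned above: walk a job directory and keep just the WARC files, leaving helper files such as dupes_db and wpull.db behind. The directory layout and the `warcs_to_upload` name are assumptions for illustration; whether to keep the other files is a judgment call, as the discussion shows.

```python
from pathlib import Path


def warcs_to_upload(job_dir):
    """Return the sorted *.warc.gz files under job_dir, skipping everything else."""
    return sorted(p for p in Path(job_dir).rglob("*.warc.gz") if p.is_file())
```

The resulting list could then be handed to whichever upload tool is in use.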
16:02:50imer quits [Quit: Ping timeout (120 seconds)]
16:03:16imer (imer) joins
16:09:24<nicolas17>yeah
16:09:33<nicolas17>I'm just curious how many people *didn't* do that cleanup properly :D
16:21:03<fireonlive>ahh
17:03:34g0tmk joins
17:13:48yano quits [Quit: WeeChat, the better IRC client, https://weechat.org/]
17:16:44yano (yano) joins
17:55:00vegbrasi_ joins
17:58:00etnguyen03 quits [Ping timeout: 265 seconds]
17:58:29vegbrasil quits [Ping timeout: 265 seconds]
17:59:56vegbrasi_ quits [Ping timeout: 265 seconds]
18:09:22vegbrasil joins
18:15:53vegbrasil quits [Ping timeout: 252 seconds]
18:23:20vegbrasil joins
18:51:08Barto (Barto) joins
19:10:31etnguyen03 (etnguyen03) joins
19:23:17Ivan2261 is now known as Ivan226
19:30:39pabs quits [Read error: Connection reset by peer]
19:45:55AmAnd0A quits [Read error: Connection reset by peer]
19:46:12AmAnd0A joins
20:08:02driib quits [Quit: The Lounge - https://thelounge.chat]
20:08:59driib (driib) joins
20:29:10katocala quits [Remote host closed the connection]
20:35:01c3manu quits [Client Quit]
20:35:09driib quits [Remote host closed the connection]
20:35:27jlwoodwa joins
20:35:41driib (driib) joins
20:41:09Catdurid joins
20:49:55driib quits [Remote host closed the connection]
20:50:26driib (driib) joins
20:57:07<bigdata>does anyone know where i can find datasets similar to what used to be on https://opendata.rapid7.com ? particularly the dns dumps, but the rest (webpages and ports) would also be nice to have
21:02:02katocala joins
21:08:02geezabiscuit quits [Ping timeout: 252 seconds]
21:09:49geezabiscuit (geezabiscuit) joins
21:40:20etnguyen03 quits [Ping timeout: 265 seconds]
21:41:18Mateon1 quits [Ping timeout: 265 seconds]
21:46:50etnguyen03 (etnguyen03) joins
21:51:00Mateon1 joins
22:02:24hitgrr8 quits [Client Quit]
22:11:32BlueMaxima joins
22:46:52jlwoodwa quits [Ping timeout: 252 seconds]
23:18:54pabs (pabs) joins
23:27:05MactasticMendez (MactasticMendez) joins
23:35:53g0tmk quits [Remote host closed the connection]