/g/ - Technology
File: GL5Iz27bUAA92Uj.jpg (601 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100185269 & >>100180197

►News
>(04/24) Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
>(04/23) Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
>(04/21) Llama3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
>(04/18) Llama3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
>(04/17) Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1713871075944.jpg (91 KB, 640x720)
►Recent Highlights from the Previous Thread: >>100185269

--Paper: MoDE: CLIP Data Experts via Clustering: >>100187911
--Paper: Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding: >>100187639
--Is Perplexity a Useful Metric for Evaluating Language Models?: >>100186134 >>100186357 >>100186419 >>100186674
--Unraveling the Mysteries of LLaMA Models: Quantization, Preprocessing, and Architecture: >>100188162 >>100188281 >>100188316 >>100189168 >>100189330 >>100189382 >>100189188
--Models Fail to Reason About Emergency Brake Scenario: >>100186538 >>100186853 >>100186932
--Qwen1.5-110B Released: Benchmark Performance and Implications: >>100188256 >>100188273 >>100189963 >>100191165 >>100191285
--Qwen 110B Benchmark Scores Under Fire: >>100189566 >>100189624 >>100189669 >>100190126 >>100190366 >>100190746 >>100190502 >>100190171 >>100190263
--Anon's Model Comparison: 70b q2_k vs 30b: >>100187650 >>100187758 >>100187897 >>100190650
--Which L3 Extended Context Window Release Isn't Garbage?: >>100185915
--Impressive Captions from Llama-3-Vision-Alpha Model: >>100185879 >>100186016 >>100187617
--Llama3-Instruct OOMs Despite Sufficient Memory: >>100191004 >>100191031
--Zuck vs Sam: Contrasting Approaches to AGI: >>100187267 >>100189134 >>100187381
--VoiceCraft Fails to Impress in TTS Arena - Will We Ever Get Good Local TTS?: >>100187059 >>100187929 >>100187935 >>100188366
--Anon's LLM Dedicated Machine and GPU Talk: >>100187972 >>100188151
--Jailbreaking AI Models for RP and Creative Writing: >>100186017 >>100186180 >>100186382
--Sense Nova 5.0: China's AI Breakthrough Outperforms GPT-4: >>100190388 >>100190683 >>100190666
--Anime.gf: Not Entirely Local After All: >>100187447 >>100187589 >>100187641 >>100187472
--Miku (free space): >>100185338 >>100185371 >>100185672 >>100186956 >>100187280 >>100191521 >>100191233

►Recent Highlight Posts from the Previous Thread: >>100186601
>>
wow its lmg my favorite thread
>>
>>100192235
let's hear it for lmg
>>
I miss when lmg didn't look so much like aicg

Also, anchor.
>>
File: 1713595713072563.png (1.45 MB, 832x1216)
tread treme
https://www.youtube.com/watch?v=WDUA7z3703g
>>
>>100192259
Blame the NAIshills for this
>>
wow lmg is the best thread on the 4chan ever guys
>>
>>100192277
local models?
>>
I downloaded moistral v3 gguf Q8 imat. It is fucking incoherent garbage. Pure llama3 instruct is noticeably smarter and better (and it isn't a fucking frankenmerge).
>>
>>100192333
are you surprised? that's why i told you i'd test it for you and save you time. i already have it downloaded and know it's nowhere near 70b level like that fucking retard said. i don't know how you can say it's incoherent though, must be doing something horribly wrong.
>>
>>100192333
>>100192345
are you two like married or something
>>
>>100188256
llama_model_quantize_internal: model size = 212120.16 MB
llama_model_quantize_internal: quant size = 64053.44 MB
I assume something is fucked or it's a skill issue. Size on disk of the quant is 8.6GB and it obviously doesn't work.
>>
>>100192370
No, it's probably a samefag. Literal copy-paste.
>>
>>100192370
samefag discord tranny
>>
File: file.png (306 KB, 1530x1453)
Cohere apparently open-sourced their UI interface.
>Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.
https://github.com/cohere-ai/cohere-toolkit
>>
>>100192392
ok whyd i use it over chad gradio
>>
>>100192422
If you're a normal person, you use JS to write web UIs.
Gradio's use case is people incapable of writing anything but Python, who already consider something like HTML as too hard.
>>
>>100192497
my 200 lines of code flask/html frontend serves me well enough
>>
>>100192513
It clearly doesn't, because you're insecure. Your first reaction was to seek reassurance on a post that you didn't need to reply to. That says a lot about what's going on in your head. Normal people aren't this fragile about a random project.
>>
>>100192392
That's actually pretty cool.
Wonder how it compares to anythingllm and jan.
>>
>>100192604
my first post was a joke, if calling gradio chad wasnt obvious enough
>people le shitpost on my le 4channel?
dont you have kurisu threads to split or something
>>
>>100192628
>kurisu threads to split
Mental illness.
>>
>>100192621
My experience with L3 8b so far was that it can roleplay really well but it doesn't like to provide longer responses and is bad at following pattern based instructions.
Its outputs have been coherent no matter what for me.
It did get confused with the formatting from time to time too.
>>
>Gradio
Doesn't handle characters or multiple chats as well as ST.
>SillyTavern
Settings are a clusterfuck. Editing chat history is a pain. Switching between chats reloads the entire page for some reason.
>Jan.ai
Desktop app aimed at macfags. Can't edit stopping strings. Can't edit bot responses.
>AnythingLLM
Made to work with ollama. Basically not configurable at all.

It's been a year. How is there still not a single decent frontend yet? I just want ChatGPT interface with everything configurable. Is that so much to ask?

>>100192392
>>100192627
Going to try this. Can't be worse than what we already have.
>>
File: 1693508952163623.png (271 KB, 512x512)
>>100192640
struck where it hurts?
>>
>>100192628
>I was just pretending to be retarded
I will let you save face because I'm nice.
>>
>>100192655
Anon you are saying someone who disagrees with you will be splitting the thread and changing the picture in OP. Seek help.
>>
100192658
reddit is spilling out so much its unreal
>>
File: quoont.png (41 KB, 446x559)
The new Qwen 110b should fit on 2x3090s at 3.5bpw with ~8k context.
(once exllama2 quants actually work, that is).
It's the first open source base model (not instruct) to go past 80 MMLU.

I guess the question now is whether or not Llama3 Instruct 4.5bpw beats this at 3.5bpw.
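Back-of-envelope for anyone checking (assuming ~111B actual params): weights ≈ params × bpw / 8 bytes, so 111e9 × 3.5 / 8 ≈ 48.6e9 bytes ≈ 45.2 GiB, which leaves under 3 GiB of the 48 GiB across two 3090s for the ~8k KV cache and buffers. Tight, but plausible since it has GQA.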
>>
>>100192645
>I just want ChatGPT interface with everything configurable. Is that so much to ask?
It isn't.
It's pretty easy to get the basics of that done in a day or so using what already exists as reference.
I could get the fully fledged thing done in a 20 hour week easy if I was being paid to do that.

>Going to try this. Can't be worse than what we already have.
Dope. Do come back to tell us your experience.
>>
>>100192702
CMMLU*
>>
>excitedly gets new potentially smart model
>lobotomizes it
Sad to see.
>>
File: charted.png (64 KB, 940x702)
>>100192702
3.5bpw should be fine...
>>
I've been busy with work for a week or so, have we finally been blessed by some decent L3 RP/ERP finetunes?
Both 8B and 70B maybe even?
>>
>>100192925
nah
>>
>>100192925
:(
>>
Which 110B model is the least shit right now?
>>
how big is the difference between q6 and q8?
>>
>>100192392
Based, they're the only good thing coming out of this shithole.
>>
>>100193064
From everything we've seen so far, not much at all. Generally speaking, it should be imperceptible in use.
>>
llama3-instruct models are pure slop, I can't coom to this
>>
>>100193064
very big for llama3 models
>>
>>100193128
That's more when comparing q4 and q5 really.
q6 and q8 should feel equivalent.
>>
>her eyes gleaming, a testament to the mixture of arousal and anticipation pooling within her belly
>>
>>100193167
Oh yeah got that one too.
>>
>still no llama 3 34b

local is fucking dead, its over
>>
File: BPE-re-convert.png (29 KB, 916x292)
>>100192644
What format are you using for your L3 8b?
It sounds like all gguf files for L3 are broken for the time being:
>"All BPE-based models would require re-convert and pre-tokenization support implemented in order to function correctly."
>https://github.com/ggerganov/llama.cpp/pull/6920#issuecomment-2079867608
>>
Imagine unironically expecting creativity from an autocomplete lmao.
>>
>>100193205
>expecting creativity
this is like the only thing they're good at given non-cucked finetuning
>expecting high order thinking and consistent internal reasoning about the predictions
retarded
>>
>>100193064
Everything over q5 is snake oil.
>>
>>100193098
>>100193128
>>100193221
thanks
>>
>>100193203
The Llama 3 default template from ST, but without user and assistant inside the headers, and a System instruction prefilled in the Last Assistant Prefix.
It seems to work pretty well generally, not great nor amazing, but well.
Granted I haven't gone over 5k'ish tokens.
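For reference, the stock Llama 3 Instruct turn format being modified there looks like this (so "without user and assistant inside the headers" means leaving the header_id blocks empty):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>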
>>
>>100192168
>Wonder why every model on everyone else's video runs smooth like butter but mines are slow as hell no matter what.
>Turns out I hadn't even installed CUDA.
Well, shit, at least I'll get the retard of the week award.
>>
>>100193194
https://huggingface.co/chargoddard/llama3-42b-v0
I hope someone slops this up soon.
>>
>>100193296
Coulda checked your task manager.
>>
>>100188316
GGUF seems like such a flawed file format that they can't get something as simple as this correct
>>
>>100193397
That's how I found out. Now I'm on my way to fixing it, hopefully. I'm boomer-tier when it comes to dealing with these programs. Also, for some reason the texts I generate, or something related to ooba, are eating a lot of space on my hard disk. Is there a way to clean up all the garbage I've generated with ooba?
>>
>>100193296
I thought koboldcpp just came with CUDA?
>>
>>100193585
I'm not using kobold. Just llama.cpp, if that makes sense.
>>
>>100193611
ah
>>
So was the L3 4x8B model actually good or just a meme?
>>
I fell for the moistral shilling, it's shit.
what's the sota cooming model for vramlets?
I like how smart l3 8b is, but the sloptunes lobotomize it, either slightly or severely
>>
>>100193221
Q6 over Q5 is very noticeable.
>>
I don't expect many people here to use LLM for writing, but what's a good text completion UI (not chat) for using with local and API alike?
I tried ooba and I feel like it's too many things, and mikupad is fine, I guess. But I wonder if there's something more like Novelcrafter that I can self-host.
>>
File: 1714162169485.jpg (565 KB, 827x1039)
>local model hardware requirements keep going up
>no new hardware is even coming out to satisfy those requirements
The future looks grim.
How do you cope knowing that you can't run the latest greatest models soon, if not already?
>>
>>100193802
I just enjoy the life that I got despite not having all the money in the world, a sports car, a mansion, the perfect wife, etc. Pretty simple huh?
>>
>>100193777
https://rentry.org/offline-nc
>>
>>100193838
Enlightened stoic.

>>100193802
I want to put my face in there.
Maybe I'll make a card out of that pic.
>>
>>100193802
Intel is working on an accelerator named "Gaudi 3", supposedly with a planned 128GB of RAM. It will likely cost an arm and a leg tho, but GPUs are not the future of AI, we need dedicated processors.
>>
>>100193802
EZ, just make money and buy more cards. Maybe you could do the same if you spent less time manually editing all those miku gens. On second thought, do that more and speak less from now on. You're pretty to look at before you open your mouth.
>>
Hit me with your best sysprompts you've been using for Llama 3 70B instruct. It seems like these make a massive difference in the text generated.
>>
Every time I think "eh it's just all the same in-and-out" and then go to chatting, I immediately start doing ERP and cranking and yanking anyways.
why
>>
>>100192644
I was trying out dolphin 8b last night and had pretty much the same experience. Seems to be maybe 90% as coherent as llama 2 13b fine tunes but with shorter responses. But since it's 8b I can just barely load the entire thing q4 into 6gb vram which speeds it up significantly. I don't see any reason to use it over older shit for now if you aren't being limited by vram.
>>
>>100192173
I feel compelled to link this: it's one of the times the summary missed big news, the lead guy of WizardLM possibly getting fired, and the resulting discussion of that, which happened between threads while you weren't active. For anyone who's looking:
>>100181801
>>100181812
>>100181883
>>100186362
>>
File: sniff.jpg (7 KB, 480x360)
>>100193802
Being able to afford the absolute best version of anything isn't normal. If companies want to make only 400B models from now on, I'll just have to cope with models I can use. No biggie. I've given up on Meta at this point desu.
>pic
>>
File: pit.jpg (324 KB, 996x1416)
>no local model can write a good greentext story
>>
>>100194149
easier to turn a bottomless pit into a regular pit.
>>
Question:
Asking a single question to chatgpt eats away half a GB from my hard disk. Is this a thing or am I doing something very wrong?
>>
>>100194149
kek
>>
slooooopppppppp ahhhhh
>>
>>100194082
Holy fuck.
The best part is that due to the licence people can just upload it everywhere and fine tune it and shit.
Dayum.
>>
File: file.png (858 KB, 1956x1301)
>>100194082
>>100194082
It was fake news.
https://twitter.com/WizardLM_AI
Now please stop dumbing down the thread with your crap, you have Reddit for that. WizardLM-2 is a nothingburger.
>>
>>100194265
kek r*ddit is so embarrassing
>>
>>100193732
is shitty porpoise.7 the best we have rn?
>>
File: migu hatune phi-3 mini.png (131 KB, 832x1154)
mikupad's highlighting is a little bit buggy.
Anyway, it took until the 7th response to stop using the "I" pronoun.
>>
>>100194265
Now I am positive they terminated the whole team because they finally made AGI and this AGI will be the new wizard team.
>>
>>100194265
The whole thing is still fishy. A lot of the claims in the blog post here still hold. Most important is the fact that the GitHub repository is still entirely down, and the whole HuggingFace repository, not only WizardLM-2, is down.
https://rocky-muscle-755.notion.site/What-happened-to-Wizard-LM2-a247e09244d0483cbb02c1587b357c9d
About the only claim that no longer holds is the one about that guy on the team: his page has been reinstated on Microsoft's website to say he's still an employee. That's not convincing enough to say one way or the other if this is certainly the case. May need to wait longer to see how it shakes out.
>>
>>100194545
They got rid of him and put a more controlled AI clone in his place.
>>
i would care about wizardlm if it wasnt dogshit
>>
>>100193802
I have been coping by proxyfagging and using the best models for free at the cost of my dignity.
Still lurk around here though in case things get better.
>>
>>100194682
I still use horde. I'm not trapped there with them. They're trapped there with whatever cringe I post.
>>
>No good tunes yet.
>Phi 14B around the corner.
>Quants fucked.
Is it the worst time to coom?
>>
>>100194703
That's actually a based outlook to have, I'll try thinking more like that.
>>
File: file.png (489 KB, 500x500)
>>100194703
>>
>>100194710
Infinite context Mistral soon?
>>
What's with L3-70B-instruct's issues with EOSing super early? Usually I have issues with models rambling on forever, but L3 seems to want to stop early in its response.
>>
>>100194082
Except for the last link, that was posted two threads ago.
>>100186601
>--The Mysterious Demise of WizardLM2: Conspiracy Theories Abound: >>100181801 >>100181883 >>100181968 >>100181974 >>100182013 >>100182526
Recap bot did not miss it. Recap bot was a good bot.
>>100194265
Recap bot only reports the news, it had no way of knowing this.
>>
>>100194809
That's my experience with 8b too.
>>
>>100194827
>>100194809
try 8b on transformers and turn off skipping special tokens
>>
>Fimbulvetr-10.7B-v1-Q8_0

How many of these models are gimped? I swear, sometimes it forces LGBT shit into the conversation.
>>
what's the best local model finetuned to only speak in rhyme?
>>
>>100194545
>that image on the notion site about their training methodologies
Honestly I believe this is what caused them to get nuked. They figured out/came close to what the OAI team was doing and MSFT didn't want them to eclipse OAI's spotlight. That "Fully AI Powered Synthetic Training System" is honestly what I guess led to Phi-3 being so good at benchmarks yet absolutely dogshit for functional usage by users.
>>
>>100194853
all LLMs are gimped with *current thing* crap, we can't avoid it sadly; prompting cope is a temporary solution.
>>
>>100194853
Anon, that's only because the model recognizes that you are gay by how you write.
>>
>>100192741
2.5 less than 2.7? I'm curious about the methodology here because that's off enough to be outside of the margin of error (basing it off of the integral anyway)
>>
>>100194875
>all LLMs are gimped with *current thing* crap
Is this why when I create "negative" lore for my world it needs to adjust it?

>>100194876
Your mum has three tits.
>>
>>100194853
You tried to fuck a guy didn't you? Be honest.
>>
>>100194853
In all the hours I spent with Fimbul, it has never once gone gay on me. Are you sure that's not on you?
>>
>>100194908
yes, it just can't follow that, hardcoded to be always positive no matter what.
>>
>>100194937
Were there any ungimped models ever available to the public?
>>
>>100194964
i think not, but you could count "TayAI" as one; microsoft nuked it after 24 hours tho, because it was too good for this world.
>>
>>100194987
Yeah, I remember that. Disappointing.
>>
File: its_over.jpg (41 KB, 622x408)
>>100192168
I did a quantitative analysis of LLaMA 2 vs. LLaMA 3 quality loss from quantization by looking at the changes in token probabilities vs. FP16 and my findings align with the subjective reports that the quality loss is higher with LLaMA 3 than LLaMA 2.
The discrepancy seems to get larger towards smaller quant sizes.
Related PR: https://github.com/ggerganov/llama.cpp/pull/6936
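Not the actual harness, but the gist of the comparison in Python looks something like this (random arrays standing in for real logits; in the real test both sets come from the same prompts run through the FP16 and quantized models):

import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_prob_shift(fp16_logits, quant_logits):
    # mean absolute change in per-token probability, in percent
    return 100 * np.abs(softmax(fp16_logits) - softmax(quant_logits)).mean()

fp16 = np.random.randn(8, 128256)                 # placeholder logits
quant = fp16 + 0.05 * np.random.randn(8, 128256)  # "quantized" stand-in
print(f"mean prob shift: {mean_prob_shift(fp16, quant):.4f}%")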
>>
>>100195028
The real question is, if other quant methods such as exl2 also respond similarly, or if it's a bug with llama.cpp.
>>
>>100195028
We should check how the weights are distributed in l3 vs l2. I remember a paper that discovered that most weights in l2 were around 0; it would be useful to see if the 15T training tokens changed this in l3.
>>
>>100194987
What the fuck is its problem with insisting on its rules over your own when you create a world?
>>
>>100192168
how does the american KYC law for AI stuff affect open-source models? should I download as many models as possible, as fast as possible?
>>
>>100195062
KYC laws are for money handlers
>>
File: tett0.png (1.3 MB, 1152x768)
PSA: New PR merged in llama.cpp to check for bad values in tensors.
Use this to confirm if your weighted GGUF quants have any invalid values (i.e. NaN)
https://github.com/ggerganov/llama.cpp/pull/6884
>>
It's kind of dogshit for any world building, it seems to forget stuff and then starts talking about real world consequences.
>>
>>100195028
Could you put that rack of 4090s to use testing 70B? It'd be interesting to see if larger models are indeed less affected by quantization. If so, there would still be hope for 405B.
>>
>>100195161
What are you referring to?
>>
>>100195146
Does this mean that my GGUFs could have been corrupted this whole time?
>>
>>100195227
It means check and find out.
>>
>>100195187
>As mentioned earlier, considering the possibility of variations within individual preferences and motivations, it's still crucial to acknowledge that each person's experiences and choices may differ.

This is in response to everyone voting one way in my created world without coercion.
>>
>>100195028
Have you tried it with imatrix? >>100183072 reports insane PPL without imatrix, but more reasonable with it
>>
File: GMFVgrcbgAASJiU.jpg (373 KB, 2048x1536)
>>100192168
>>
>>100195042
I think I saw a paper that had the same conclusion for GPTQ and AWQ so this may be universal.

>>100195174
I plan to but I'll need to free some disk space first.

>>100195227
In theory yes but in practice you should notice very quickly if there are issues.
The quants with issues that prompted the change were the LLaMA 3 30b imatrix quants from mradermacher.

>>100195245
I did not use an imatrix but I also did not see an exploding PPL for q2_K.
>>
>>100195146
How could models have nans in them without ruining the output completely and making all the logits nan? Input embedding? I didn't know those weights got quantized tho.

>>100195276
>LLaMA 3 30b
don't toy with my heart
>>
Is there a good prompt to prevent llama3 from talking about respect, but with otherwise minimal impact on its behavior?
>>
>>100195253
>>100192168
Why is Miku so perfect?
Who designed her? They did a good job.
>>
>>100195305
nope, see >>100194875
>>
>>100195291
The CPU code seems to convert the NaNs to 0 so if you use that for checking you will get coherent (but significantly worse) outputs.
>>
More Claude Opus logs on the C2 proxy, they're jsonl files now. From prompt-logs1.jsonl to 25/26 right now. I think they get deleted when the proxy is restarted.
>https://vgdasfgadg-c2.hf.space
https://vgdasfgadg-c2.hf.space/user_content/prompt-logs1.jsonl
https://vgdasfgadg-c2.hf.space/user_content/prompt-logs25.jsonl
>>
>>100195245
I may be dumb about it but is the PPL check done on the same data set as calibration? Cause it should probably be done on a different dataset just to avoid the situation where you get good results on calibration set but the model is still broken.
>>
>>100195308
>Who designed her?
She is as designed as human beings were designed.
Anyway, to answer the question, the truth appeared to the illustrator in a dream, and he was only partially able to bring our Miku into this dimension.
There are rumors her real form cannot be fully understood and would cause madness in those who even attempted to.
>>
>>100195308
>Who designed her?
KEI Garou did.
>>
>>100195308
It is just a girl with green hair and a penis...
>>
>>100195341
trans icon!
>>
>>100195341
Miku is blue
>>
File: 1586913429939.png (362 KB, 636x438)
>>100195322
>The CPU code seems to convert the NaNs to 0
>>
>>100195347
you will never be japanese
>>
>>100195322
>mfw there's probably only a handful of anons around here who understand the implications of NaN vs 0 for the result
Fuck, that's rough...
>>
>>100195341
Bro where did you buy your Miku? I think you got scammed... My Miku has blue hair and no penis.
>>
File: IFoundAMikuInTheWoods.png (1.87 MB, 1184x864)
>>100195308
Her design gives her amazing flexibility, like the software she represents
You'd know that shape in pretty much any scenario
>>
>>100195347
If she was green, she would die.
>>
File: file.png (1.31 MB, 1479x1866)
Someone plug this picture into llava and ask what is the color of her hair and what is the length of her penis.
>>
What's a good erotic model? I have a 3090ti.
>>
>>100195446
me
>>
>>100195146
Holy kek imagine being so completely wrong that you are immortalized in not one but two llama.cpp PRs.
Anyone downloading from mradermacher after all this deserves broken quants lmao
>>
File: 103549893.jpg (723 KB, 625x900)
>early Miku depictions
>associated with green
>current day Miku
>associated with blue
But why? Isn't she teal colored?
>>
>>100195361
>>100195384
It wasn't me that dug through the code to debug this, but from briefly looking at it I think the issue is not that the NaNs are intentionally converted to 0.
Instead I think it has to do with boolean expressions always evaluating to false when NaNs are involved which then causes the outcome to become 0.
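If anyone wants to see that effect for themselves, a toy Python version (the real code is C++, but the IEEE NaN semantics are the same):

nan = float("nan")

def clamp_nonnegative(x, hi):
    # every comparison involving NaN is false, so a NaN fails both
    # branches and silently falls out as 0
    if x > hi:
        return hi
    if x > 0.0:
        return x
    return 0.0

print(clamp_nonnegative(nan, 1.0))  # prints 0.0, the NaN vanished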
>>
>>100195457
OK but how did he manage to do that? What is there to fuck up in quanting a model?
>>
>>100195446
https://chub.ai/?search=&first=30&topics=NSFW&excludetopics=&page=1&sort=star_count&venus=false&min_tokens=50&nsfw=true
>>
Hey, I've been out of the game ever since Mixtral came out. Back then, that seemed like "the model" to get if your computer could handle it, which mine did. Is there a new "the model"? Mixtral was kind of shitty, honestly. It constantly started acting retarded.

I downloaded this Miqu one but that's way too heavy for my computer, takes like 15 minutes to generate 5 tokens... Thanks in advance!
>>
>>100195486
2 more months
>>
>>100195479
Those aren't models.
>>
>100191837
llama.cpp is worse than many people realize, but by now we all learned that being portable and easy to install is all that matters.
>>
>>100195476
>But why? Isn't she teal colored?
turquoise.
https://desuarchive.org/g/thread/99265936/#99269059
>>
>>100195523
Fuck >>100191837
>>
>>100195457
>t. another anon that uploads quants
Normal people don't feel this level of animosity when he isn't even the person that writes the code. You're just shilling.
He isn't even mentioned in the PR. You're so desperate about it. I think you should seek validation from other sources besides making quants.
>>
>>100195527
>My dad works at the color factory.
>>
>>100195486
llama 3 8b is amazing, llama 3 70b iq2 xs by bartowski is also amazing, all else is a meme
>>
>model refuses to translate nazi speech because it contains hate speech
great product that doesn't work
>>
>>100195629
based, why you want to translate hate speech anyway?
>>
File: IMG_2900.jpg (81 KB, 1251x755)
>he quanted
>>
>>100195650
how does this affect you?
>>
>>100195576
Could you point me to some download links, if you don't mind? Tried to find them but struggling.
>>
>>100195654
why is llama.cpp so broken
>>
How can I hook up sillytavern to a phone, or have it call me?
Is that possible? I want to have it be able to call me and then do STT and TTS so I can chat with my waifu on my daily commute.
>>
>>100195650
to know what it said
>>
After playing around with l3 instruct I noticed one thing that I remember reading ITT about alignments. Namely, how even if you don't get a refusal or assistant assistant assistant breakdown, the model will still subtly shy away from doing the most depraved shit. And yes I know about the {{char}} assistant trick but my experience was that the model becomes dumber. To the point where you are better off just using the base model.
>>
>>100195654
GGUFkeks... it's over...
>>
>>100195676
anon...
>>
>>100195654
>Q4_K_M that good
Sick.
>>
>>100195676
Termux
>>
>>100195676
Oh that is easy just post your phone number here and we will configure everything for you.
>>
>>100195654
so it's better to use q4 than q8? makes no sense if true.
>>
>>100195676
this would only make sense if the generation was fast enough, you are going to talk and wait 30 seconds for a voice response lmao
>>
File: 1713013225757550.png (2.76 MB, 2048x2800)
>>100195687
lol, its more to pass the time/have fun with it than anything else

>>100195699
And just use the system TTS? Can it do STT as well and have all that not sound crazy bad? I'd like to have a nice voice, not robotic

>>100195747
So basically local TTS/STT is pretty taxing? (Completely ignorant) Would that mean that if I had a dedicated card there'd be less lag?

Does anything currently do near-real time TTS? (willing to pay for API)
>>
>>100195654
>you can run lower quant and get higher quality with less memory
>/lmg/ still complains
>>
>>100195731
I think this is an issue with sample size.
All quants add some noise to the logits so they can randomly perform better than even FP16.
And if your benchmark only has e.g. 100 questions that is simply not enough to reliably rank the quants given the noise.
It's possible and actually not that difficult to calculate confidence intervals for benchmark results but for some reason no one does.
>>
>>100195803
>It's possible and actually not that difficult to calculate confidence intervals for benchmark results but for some reason no one does.
How? Something related to the logits?
>>
File: CyberMiku3.png (1.57 MB, 1216x832)
>>100195785
>So basically local TTS/STT is pretty taxing? (Completely ignorant) Would that mean that if I had a dedicated card there'd be less lag?
It's not very taxing, but you generally have to wait for textgen to finish before it can generate accurate speech based on cues in the text... so the lag seems worse because streaming options are limited to nonexistent
>I'd like to have a nice voice, not robotic
All the options I tried were still pretty janky. You'd have your immersion broken constantly imo
>>
>>100195803
I disagree because Q8 would still have the same result as fp16. Unless the questions are a random 100 from a bigger set, but then it makes no sense as a benchmark.
>>
>>100195840
>>
>>100195828
I think he's talking pure statistics.
>>
File: 1713287856713120.jpg (123 KB, 768x1024)
>>100195874
>>100195840
Whoops; thanks for the answer, it's appreciated.
>>
>>100195840
>>100195785
you can take a look at openvoice, it is pretty great, not perfect, but you have to self host i guess
>>
>>100195828
No, you just have to use the fact that the benchmark score is a sample from a binomial distribution since in essence the model has a constant probability to get each individual question right.
For large samples you can even approximate the whole thing with a normal distribution and just calculate the uncertainty on the mean as sqrt(p*(1-p)/N) .
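A minimal sketch of that in Python, for anyone who wants to put error bars on benchmark scores:

import math

def benchmark_ci(correct, total, z=1.96):  # z=1.96 -> ~95% interval
    p = correct / total
    err = z * math.sqrt(p * (1 - p) / total)
    return p - err, p + err

lo, hi = benchmark_ci(80, 100)
print(f"score 0.80, 95% CI [{lo:.2f}, {hi:.2f}]")  # [0.72, 0.88]

A ±8 point interval on a 100-question benchmark swamps most of the quant differences people argue about.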

>>100195857
When I tested it q8_0 still changed LLaMA 3 token probabilities by roughly +-1%.
>>
>>100195654
Wtf kek.
>>
File: S L O P.png (15 KB, 669x25)
it knows
>>
why hasn't anyone ctrl+f'd and removed "barely over a whisper" from every dataset yet? it's wild that every single model does it
>>
>>100195334
>is the PPL check done on the same data set as calibration
I'm that anon. In my post you can see the imatrix calibration was done on wiki.train.raw and the perplexity calculated on wiki.test.raw. Once these tokenizer changes land I'll redo all the quants and ppl tests.
>>
File: file.png (678 KB, 1118x2818)
LLM coding benchmark.
Equal scores exceeding 49 have been assigned the same color.
>>
>>100196236
>current 3.5-turbo beating every open model despite probably being a 20b at most
damn
>>
Does anyone know how to coax nvidia GPUs into pstate 8 while a program has stuff loaded into their memory? Or if that's even possible? Mine go to pstate 8 on their own when nothing is using them, but the llama.cpp server having a model loaded is enough to prevent that even if idle.

A couple weeks back there was an anon who referred to using "software fixes" to be in pstate 8 unless inferencing. Unless of course that just meant a hacky script to shut down the server after it hasn't been used for a while, and start it back up when a request comes in, relying on mlock making subsequent loads of the model near instantaneous.
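In case forcing the pstate doesn't pan out, a rough sketch of that hacky fallback (hypothetical command and paths; relies on --mlock keeping reloads near-instant from page cache):

import subprocess, time

SERVER_CMD = ["./server", "-m", "model.gguf", "--mlock"]  # placeholder
IDLE_LIMIT = 300  # seconds of no requests before shutting down

proc = None
last_used = 0.0

def ensure_running():
    # call right before forwarding a request to the server
    global proc, last_used
    if proc is None or proc.poll() is not None:
        proc = subprocess.Popen(SERVER_CMD)
        time.sleep(2)  # crude wait for the server to come up
    last_used = time.time()

def reap_if_idle():
    # call periodically; freeing VRAM lets the card drop to P8 on its own
    global proc
    if proc is not None and time.time() - last_used > IDLE_LIMIT:
        proc.terminate()
        proc.wait()
        proc = None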
>>
>>100196236
>current 3.5-turbo beating almost every gpt4 model despite probably being a 20b at most
>>
>>100196293
That anon published the script on PyPI. No idea if it'll work with something loaded in memory, but you can try. https://pypi.org/project/nvidia-pstate/
>>
>>100196290
Turbo is a 7B moe
>>
>>100195457
ORVILLE REDENBACHER SAMA.. HOW COULD YOU SLOP THE POP?
Good thing you guys do the needful and only download quoonts from I^2 anon, right? Guys? You DO do the needful.. right?
>>
>>100195896
>the benchmark score is a sample from a binomial distribution since in essence the model has a constant probability to get each individual question right.
Wouldn't that only be true if all questions had the same probability of being answered correctly?
>>
>>100196316
>https://pypi.org/project/nvidia-pstate/
wow thank you for the most helpful possible response! I'll give it a try, and if it doesn't work, then I'll know it's not possible, and maybe try to hack together what I described.
>>
>>100196293
it just works. changing pstate is essentially changing the clocks, so it can be done while the program is running.
here are patches that perform this change if needed automatically (there also vllm one on dev branch):
https://github.com/sasha0552/ToriLinux/tree/main/airootfs/home/tori/.local/share/tori/patches
>>
so how do I wrangle llama3 for erp?
With sillytavern's recommended llama3 instruct settings I'm getting
>(laughs) schizo babble (miku has a psychotic breakdown) more babble (gives a gentle sigh) I'm a human and not a machine please do not use me for sexual purposes or move to another platform where sexual materials are suitable
>>
>>100196552
Dafuq.
Post your exact settings, the card, and prompt, please.
From all issues I've had with llama 3, being schizo wasn't one of those.
>>
>>100195323
Here's my download script: https://rentry.org/rxznfc69
It makes a different directory with the date based on the proxy uptime, and it skips/resumes files already downloaded.
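For the curious, the skip-if-present part boils down to something like this (URL pattern from the posted links; the real script names the directory from the proxy uptime):

import os, requests

BASE = "https://vgdasfgadg-c2.hf.space/user_content/prompt-logs{}.jsonl"
outdir = "c2-logs"
os.makedirs(outdir, exist_ok=True)

for i in range(1, 27):  # prompt-logs1 .. prompt-logs26
    path = os.path.join(outdir, f"prompt-logs{i}.jsonl")
    if os.path.exists(path):
        continue  # already downloaded, skip it
    r = requests.get(BASE.format(i))
    if r.ok:
        with open(path, "wb") as f:
            f.write(r.content)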
>>
>>100192168
What's the best model for a virtual gf? (asking for a friend)
>>
>>100196764
Me.
>>
I had this idea now about multiple-question benchmarks. What if, as an optional benchmark method, you filled the context with all the previous questions, but lied that the model answered A every time, even if it answered differently? So for question 10 you feed it the previous 9 questions, all "answered" A. Wouldn't that give an indication of how prone the model is to falling into an irrational pattern, and how prone it is to repeating itself, vs. how good it is at reasoning things out?
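Concretely, something like this (toy questions, just to show the context stuffing):

def build_loaded_prompt(prev_questions, next_question):
    parts = [f"Q: {q}\nA: A" for q in prev_questions]  # lie: always "A"
    parts.append(f"Q: {next_question}\nA:")
    return "\n\n".join(parts)

prompt = build_loaded_prompt(
    ["2+2=? (A) 4 (B) 5", "Capital of France? (A) Rome (B) Paris"],
    "Largest planet? (A) Jupiter (B) Mars",
)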
>>
>>100196782
come here, honey
>>
>>100196764
It is him >>100195453
This guy (>>100196782) is just lying to you and wants you for your body.
>>
>>100192702
Noice. Also I finally tracked down the only json files for Qwen1.5 in the archive.
Context:
>https://files.catbox.moe/8mki98.json
Instruct:
>https://files.catbox.moe/dwz62u.json

Gonna mess around with sliders to see what works.
>>
>>100192168
Are there any good local models that have at least 30k context and can make decent stories with only 15 GB of VRAM?
>>
>>100196764
ok newfag here is the spoon, open wide. Honestly any of them, if you are fine with your gf having alzheimer's. If you want her to remember you, then wait 2 more years.
>>
>>100196822
Cool.
Qwen 1.5 30something is generally really good, aside from its habit of spewing chinese all of a sudden.
>>
>>100196804
Holy kek, I unironically didn't even see that post >>100195453, amazing coincidence.
>>
>>100194149
>>100194161
>>100194175
https://www.udio.com/songs/mpsQnf2BB38FnvuSQT7GtR
>>
>>100196861
That's totally fine. Is there any specific one listed in the OP that is overall better, or do they come with their own quirks?
>>
>>100196915
just get llama 3 instruct 8b or 70b. exl2 quant or full precision.
>>
>>100196950
Alright. Thanks, boss.
>>
File: saru.png (579 KB, 2557x1323)
>>100196577
Well more accurately it just prattles on forever
settings are nothing special, just default mirogold and llama3 instruct presets
It's not that bad for saru's card but with any simple non-game character card it keeps generating increasingly nonsensical dialogue and *emotes*
>>
>>100196977
Have you tried without mirostat and without banning EOS?
I used it with no samplers save minP of 0.05, which did essentially nothing for the most part, and Temp 1, and its output was pretty good, just really short.
When I tried banning EOS was the only time I got the dreaded ".assistant".
>>
File: file.png (68 KB, 981x527)
miqu-70b-evil q2_k, purple is the prompt
>>
>>100197029
>the model wrote [INST] by itself
ngmi
>>
I want to run an LLM on my Mint machine w/ a GTX 1660 Super (6GB VRAM) and 20GB of DDR4 RAM, intending to use it with VSC(odium) for code-related tasks (maybe use Continue? open to suggestions). I alr know how to do ollama w/ Dolphin Mistral, but I'm not sure if there is any other model better fitted at code related stuff, or a better interface for it, or a better extension to use along VSC. Thoughts?
>>
>>100197068
Based model teaching anon the ropes.
>>
>>100197190
you could try codeqwen1.5-7b/-chat
https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
>>
File: lolwut.png (79 KB, 1222x320)
>>100197015
>without banning EOS
Okay yeah anon, it was this thing causing the endless babble. Thanks for the heads up, never paid attention to this setting before.

That said, the model really is sensitive to the card info. I have a card for a kuudere mom whose description mentioned her speaking descriptively to counteract her emotional numbness, and now she's spazzing out on me like this
>>
>>100197261
>That said, the model really is sensitive to the card info
Yeah, very.
System prompts too.
It isn't that good at following certain kinds of instructions such as following a specific pattern to output a text UI for example, at least not during my brief time with it.
>>
>>100195550
Normal people don't blame:
>the downloader
>the based guys behind llama.cpp
>anyone and everyone besides themselves
When they make a clear mistake and get called out on it. Just see how bartowski handles feedback in comparison.
Just keep hiding those community posts bro eventually you'll convince everyone :^)
>>
File: 1689529635261.png (1.77 MB, 1024x1024)
rep pen is for losers.
>>
>>100197331
>Normal people don't blame
>the based guys behind llama.cpp
Normal people do, because they're the ones to blame for the bugs in their implementation.
You're psychotic, you need a real job besides uploading quants and shilling on 4chan.
>>
>>100197321
With llama 3 70b instruct I had this very same issue with game-y UIs, health and mana meters...etc but all of them have been dropped by early EOS tokens. I feel like there is still a weird EOS token issue.
>>
>>100197501
Have you found one that did keep it? Midnight miqu 5bpw consistently forgets the status of clothes and it's a bit disappointing
>>
File: wtf.png (49 KB, 893x204)
Just wanted to say it.
Llama 3 70b base model prose, with good samplers dialed in (0.25 smoothing) & a couple of messages of past context, is capable of being fucking *incredible* writing wise (not so much intellect wise).
The instruct model being so overfit and sterile blinded me.
Finetunes could optimize the best of both worlds (intelligence, prose)... but FFT is so goddamn expensive to train.
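For anyone wondering what the smoothing slider actually does: as I understand kalomaze's quadratic sampling, it bends the logits around the top one, roughly like below (treat the exact formula as an assumption, check your backend's implementation):

import numpy as np

def smooth_logits(logits, smoothing_factor=0.25):
    top = logits.max()
    # each logit drops by its squared distance from the top logit
    return top - smoothing_factor * (top - logits) ** 2

print(smooth_logits(np.array([5.0, 4.0, 2.0, -1.0])))
# [ 5.    4.75  2.75 -4.  ] -> near-top tokens boosted, the tail crushed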
>>
>>100197841
Mind sharing what settings you did use for base? I might give it a try since instruct's been a pain in the ass to fight against.
>>
>>100195093
??

>>100192168
did you guys not read https://news.ycombinator.com/item?id=40158752 ?
https://www.federalregister.gov/documents/2024/01/29/2024-01580/taking-additional-steps-to-address-the-national-emergency-with-respect-to-significant-malicious
>The Executive order of January 19, 2021, “Taking Additional Steps To Address the National Emergency With Respect to Significant Malicious Cyber-Enabled Activities,” directs the Secretary of Commerce (Secretary) to propose regulations requiring U.S. Infrastructure as a Service (IaaS) providers of IaaS products to verify the identity of their foreign customers, along with procedures for the Secretary to grant exemptions; and authorize special measures to deter foreign malicious cyber actors' use of U.S. IaaS products. The Executive order of October 30, 2023, “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” further directs the Secretary to propose regulations that require providers of certain IaaS products to submit a report to the Secretary when a foreign person transacts with that provider or reseller to train a large Artificial Intelligence (AI) model with potential capabilities that could be used in malicious cyber-enabled activity. The Department of Commerce (Department) issues this notice of proposed rulemaking (NPRM) to solicit comment on proposed regulations to implement those Executive orders.
>>
>>100197864
Just 0.25 smoothing and nothing else, TabbyAPI, 4.5bpw 70b.
Llama3 has weird tokenization issues on llama.cpp atm in general so I switched to exl2.
The actual formatting preset is ChatML.
>>
>>100197873
KYC laws are for businesses that are custodians for customer money, and it is to prevent money laundering. It doesn't apply to LLM weights.
>>
>>100197997
Biden's AI executive order requires KYC for IaaS (cloud, VPS, etc.) so China can't train LLMs on American-owned GPUs.

Patriots stand with President Biden. G-d bless America.
>>
>>100197230
Appreciated.
I'm trying to figure out what tag to pull from ollama's registry, I can see some quants there but I can't tell if I should just use the latest there or not
https://ollama.com/library/codeqwen
>>
Mixtral 8x7B is french right?
Why the fuck did it just output some 20k characters of pure chinese on an empty prompt? What kind of lottery did I lose with the logits?
>>
how do i stop masturbating this past year has been a complete blur but the models get bigger and smarter every day i can't stop
>>
Is the chatbot at https://ai.meta.com/ running on Llama 3 8B, 70B or some more advanced unreleased version? It kicks the shit out of the free ChatGPT
>>
>>100198215
honestly mixtral is pretty retarded, at least the 8x7B.
>>
File: llama3.png (215 KB, 1817x984)
jesus christ llama3 70b is treacherous
>>
>>100198215
which fucking variant are you using because base mixtral absolutely does not do that
>>
>>100197736
I have had the best luck with midnight and adding the clothing status to my authors note @4. It's annoying but it seems to help.
>>
File: IMG_1424-1714187277346.jpg (411 KB, 1321x304)
>>100194149
>>no local model can write a good greentext story
There was an attempt
>>
>>100198309
No idea since they didn't say, but my guess is 70B
>>
File: 4chan.png (21 KB, 524x522)
>>100194149
I used to make greentexts all the time with base llama 1. It was one of the few ways you could get a decent sized coherent story arc out of its meager 2k context. They were fine and more authentic sounding than >>100198473
I'm sure you could do it even better today but you'll need to use the base model, I'm sure instruct's style biases will fuck it up. Be warned that it will constantly try to derail into 4chan markup and filenames, and they often veer into pointless meandering depression because of /r9k/ influence. Sometimes they'd have a little comment thread afterwards.
>>
>>100198432
Bullshit Zuck would body Elon, the muskrat is fat as fuck these days
>>
>>100198541
Just like his grok, eh?
>>
The fuck is this shit in the lmsys arena? Note the circled model name, "gpt2-chatbot".

I've had it come up a few times on my arena side by sides, giving good answers. It is clearly not actually gpt-2, and it's not available in the direct chat tab. google search doesn't turn anything up, or anyone else asking about this.
>>
>>100198562
Ugh snipping tool didn't save the red circle I added to the screenshot. Just pretend I circled the bottom left model name heh
>>
>>100198562
Probably some experiment by lmsys
>>
>>100181812
How would you know you actually got the real wizard quants?
>>
>>100188456
>Why won't anyone make a ramlet LLM? Bitnet 100+B, couple B active so you can stream weights from SSD.

I don't understand why nobody just combines all the techniques for speed and memory at the same time.
>>
>>100198581
Whatever model it is is interesting since almost every time I'm voting for a good answer it's turning out to be from that model

This better not be openai secretly yoloing gpt-5 or something
>>
Is the no-merge license enforceable? Because I'm not spending months carefully crafting the perfect RP model just for somebody to absorb it into their gptslop galore
>>
>>100198638
no. this is legal advice.
>>
So with llama 3 base you just turn off instruct mode? I've never used a non-instruct model before.
>>
>>100198679
Base models are what LLMs were before instruct became the norm
You give it some text and it'll continue from where you left off. Instructions will be useless to it
>>
>>100198638
Are you crafting the perfect RP model because you want to have a good model you can use for RP? Or because you want to get internet famous? If it's the former then I don't see why you would care. Retards gonna retard
>>
Do I need a beefy GPU for these new llama models?
>>
>>100198698
You need either a toaster or two 3090s with a dedicated setup
There is literally no inbetween
>>
>>100198698
I've been getting by just fine with my CPU. It's a little slow sometimes but not horrible.
>>
>>100198698
I run the 8B on my M1 macbook pro decently enough.
>>
>>100198708
>>100198710
>>100198713
Thank you my frens I will try it!

Wish me luck
>>
>>100198694
So if I was to use it in ST, how would you RP with it, I assume it would just take a hook and run with it rather than the back and forth style of chat with instruct?
>>
>>100198729
You use in-context learning. Give it examples in the context, then leave an example unfinished for it to complete. That's how you use a base model for anything.
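For example, raw context like this, leaving the last line open for the model to finish:

Q: What's the capital of France?
A: Paris

Q: What's the capital of Japan?
A: Tokyo

Q: What's the capital of Peru?
A: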
>>
Qwen 1.5 110B, exl2 support when?
>>
>>100198729
You can still manage it, but you'll want to put your character information down and then get a convo going with a few sample messages before letting the model take the wheel. While instruct models can work from a single instruction, base models are more oriented toward "monkey see, monkey do"
The upside is that once you do get it going, base models (usually) have no pozzing or GPT instruct slop, so they'll do basically anything
>>
>>100198767
>110B with no GQA
How horrifying
>>
>>100198636
could just be the new Qwen or something they fucked up the name when they put it into the arena, like maybe that's the default display name and they forgot to change it
>>
>>100198638
as with most things, if you don't have enough money to bury your opponent in lawyers, it's not enforceable
>>
File: IMG_1425-1714189602190.jpg (759 KB, 1641x372)
Llama 3 70B greentext attempt
>>
>>100198466
Would you mind elaborating on what you had in the authors note? I just tried "describe the state of clothes and relative positions of characters" and still ran into the same problem
>>
>>100192168
>Snowflake Arctic Instruct 128x3B MoE released
Benchmarks for this? Supposedly it has 480B total parameters.
>>
can someone make a 3.5bpw exl2 of:
https://huggingface.co/ludis/tsukasa-llama-3-70b-qlora

the only exl2 atm is 4.5bpw which is too much for my 36gb vram
>>
>>100195323
Thanks for the jsons.
Reading them now. Claude isn't immune to shivers down spines.
>>
>>100198819
Is Llama censored in any way? Can I RP with it?
>>
>>100198974
It seems to be much less censored than Llama 2 but I haven't tried any ERP since I'm not really into that. I suspect you still might run into issues without a finetune.
>>
>>100198819
It's... creative!
>>
>>100198309
Regardless, the 70B version IS available for free on Huggingchat
It's pretty wild just how viciously the tables are turning on OpenAI. They don't have the best free service (L3 8B seemingly beats Turbo, so L3 70B completely shits on it) and their OG GPT-4 is being matched by a fucking free 70B of all things. The prices are terrible too - see picrel and compare the L3 DeepInfra prices to what OpenAI is currently charging for any of their models at equivalent performance
>>
File: 1987457691325.gif (3.97 MB, 333x250)
So whats new on the GGUF sloppa menu?
4x8b yet? new 8b slop merge?
I desire to consume all forms of slop.
>>
>>100198944
480B parameters, 17B active.
>>
>>100199053
I couldn't possibly be happier about how things are shaping up
>>
>>100199053
Even still, OAI has the smartest model, which counts for a lot. And it's hard to tell because the scale is fucked, but judging by that graph it might actually be competitive with L3 in quality/$.
>>
>>100199078
I could be. There could be a 30b llama 3.
>>
Apparently people who used to finetune models on gpt4 outputs are running into a lot of trouble working with llama3. Is this the monkey paw of training on 15T tokens?
>>
>>100198698
For the 8B anything above a 3060Ti is probably fine, hell you can probably do way less and just wait a bit more. For the 70B one you'll want 48GB of VRAM (2x 3090s, or 7900XTXs with some cons, or 4090s, or some jank setups using non-consumer cards I won't go into; almost everyone spams 3090s). When the 300B or whatever one releases, nobody's gonna be running it local except CPUmax anon, if he's willing to let it generate a reply overnight.
>>
>>100199053
>The prices are terrible too
I'm mostly clueless but I feel like right now either you go local or you pay for Haiku. Unless you're doing something that NEEDS Opus. And I'm assuming L3 70B didn't beat Haiku too, because if it did the situation is even worse (I haven't tested them that much but I don't think so, Haiku is actually pretty smart for a budget model)
>>
>>100199172
>L3 70B didn't beat Haiku too
It beats it and matches Sonnet on evals and blind tests of chat lmsys, in fact actually slightly beats Sonnet on the latter
>>
>>100199119
True, but right now GPT-4 Turbo is quite literally all they've got. L3 405B is still due sometime soon and if prices scale linearly like they usually do with dense models, then an output of 0.8 / 1M tokens at 70B translates to about $5 / 1M, but we'll be generous and put it at $10 / 1M, and say Llama 3 405B ends up being the same in quality as GPT-4 Turbo
That puts that point right above GPT 4 Turbo on the 100 line
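(The linear-scaling arithmetic: $0.8/1M output at 70B times 405/70 ≈ $4.6/1M, rounded to $5, and the $10 figure is that with roughly a 2x margin on top.)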
>>
>>100198801
the model card says it has gqa tho
>>
>localniggers getting uppity
>sama drops gpt-5 in two weeks and completely shits all over them
>>
>>100199228
july at the earliest I bet

also it's gonna be like a 4.5 level of advancement, only named 5 for marketing/cope reasons
>>
>>100195028
Does l3 use the larger range of bf16 more than l2 does?
>>
>>100199228
It's okay I'll use llama 3 8b to generate lots of abusive prompts for it.
>>
>>100199250
It might be that, thanks to how many tokens get fed to it, it ends up using more of the spectrum of values.
>>
>>100199228
>moderation endpoint required
>>
>>100199228
OpenAI fanboys will make a cult around Altman even if his company makes nothing but promises this year.
>>
>>100195028
This makes sense if we use the theory of "the more information a model 'has' at a certain size, the more information is lost when quantizing"
>>
>>100199228
Yes but the recent surge in local models has shown that "we have no moat" is true. Open source will always be breathing down the necks of Google, Anthropic and """Open"""AI
>>
>>100199228
>GPT 5 released
>smarter than ever
>but more GPT slopped than ever "it's important to remember that"
>everyone still faps to claude
>>
>>100199306
>"Claude 4 Masterpiece" drops
>mogs the competition
>>
>>100199303
Not if they use their influence on Zombie Biden's administration to make open source illegal. Picrel news from today.
>>
>>100199337
Zuck and Elon intentionally excluded btw, ostensibly because "they run social media companies" but that's transparent bullshit because the list includes random oil business people, an artist, etc. They were excluded because they are political enemies and pro open-source.
>>
>>100199337
>every big name in AI and big tech in general
>not not /ourguy/ zucc
Their agenda is clear
>>
>>100199337
WHERE IS LECUN
>>
>>100199364
he's french
>>
>>100199368
He should try not being French for once just so he can be a voice of reason kek
>>
>>100199364
see >>100199345
They only wanted anti-open source people and regime toadies.
>>
>>100199337
um where is yudkowsky? wtf
>>
>>100199337
I guarantee you Altman will put up a huge fight to try to make open source models illegal now that it's actively cutting into his profit margins, and I guarantee you he will lose
>>
>>100199337
How can any of these motherfuckers be on the board? It's a total conflict of interest because they all gain monetarily from restricting AI, who approved this shit?
>>
>>100199407
Sam Altman is a winner. He won the coup against him within OpenAI. He will win here too.
LLMs and huge parts of AI are synonymous with ChatGPT these days which makes Sam humanity's great leader in this matter. He's the face of the field and people trust him.
>>
File: 1707297763880781.png (94 KB, 1841x843)
Owned
>>
>>100199438
bro...
>>
File: 1683016564942054.png (827 KB, 759x1107)
>>100199438
Sam already proved plenty of times that he values the future of humanity over profit.
>>
File: 12crnvcf5wwc1.png (208 KB, 1251x755)
Wait, does the fucked up quanting affect all models after a certain date or is it just L3?
>>
What's the current best roleplay model that fits on 24gb vram?
Is it still yuzu alter or did any of the new models beat it yet?
>>
>>100199477
idunno run the benchmark yourself
>>
>>100199477
it also affects exl2 so its just quants in general
>>
>>100199495
sauce?
>>
>>100199440
Sam is influential in big tech circles, and he's good at becoming powerful. But including literally nobody outside of his circle of big tech moguls is a mistake that will come to bite him. He'd unironically have greater success had he invited Zuck or LeCun as well to have some feeling of "diversity". Unless the group of politicians he's talking to are complete retards (which, to be fair, is a real possibility) there's a very real chance people will see through his bullshit
If he manages that, he has about six to eight months to pull that shit off and convert it into something actionable, and there's a good chance more and better models will be out in the world by then
>>
>>100199522
Also there's the fact that America just doesn't have the world domination it once had, they can ruin the field in their own country and set it back a few years but they can't really do shit globally in the longer term, multipolarity is only increasing.
>>
>>100198562
>anon accidentally discovers a secret AGI leak
>>
>>100199337
The mayor of seattle. The fucking MAYOR OF SEATTLE, lmao. But no Zuck. This is total ideological capture.
>>
File: mmmm.png (629 KB, 716x1496)
>>100199345
>>100199364
>>100199401
sorry had to make room for diversity
>>
File: file.png (64 KB, 840x293)
>>100199477
it's just a math benchmark, chill
>>
>>100199553
They have almost all of the tech companies for various reasons and have a massive influence on everything in politics.
And what's the alternative, China? China already regulates AI much more harshly than the US does.
>>
>>100195654
I can use vllm or exllama for 8b, but is there any alternative to llama.cpp for running 70b with CPU offload?
>>
>>100198944
It's not working yet with llama, afaik.
>>
Thanks to the anon that brought up the Akashic Record idea. I incorporated it into my OC (waifu). She was already based on a related character/concept, so this was a perfect fit.
>>
>>100199077
I suspect this puts it on the same level as a proper 33b
>>
>>100199616
Do you really think China wouldn't cream itself at the prospect of becoming the world leader in AI? China has already released the Qwen models. The endgame you're proposing would have the US regulating much more harshly than China currently does.
>>
>>100199134
what kind of trouble?
>>
>>100199172
>Haiku is pretty good
what?
>>
>>100192168
it's been a while. What are we using now? is koboldcpp still a thing? I never liked ooga
>>
>>100199708
>>100199178
NTA, but show me a response in lmsys and I can tell you with 90% accuracy whether it's from llama3. Its style is like a signature; lmsys is biased.
>>
>>100199708
It's insane how far we've come in the past month or so. Hard to believe that the best-rated open models we had back then were Qwen 72b and miqu/medium, which are no longer in the top 12 of the leaderboard.
>>
>>100199628
>not working yet
https://github.com/ggerganov/llama.cpp/issues/6877
Yep. Needs a bunch of work still
>>
>>100199675
I just don't see it, anon. China benefits from not giving a shit about copyright law, but the US has yet to do anything about that issue either. Tech lobbyists are powerful enough to get themselves an exception, and the government doesn't want to crush the industry if it can be avoided.

Besides the copyright threat, what is there? Every state is going to regulate AI they perceive as being a threat to the state. Chinese models are censored to prevent them from criticizing regime ideology, as are Western models. China is known for being more blatant and heavy handed about censorship in general though.

China's regime is pretty conservative and sells itself on keeping stability and maintaining the status quo, which benefits them. They are more likely than the West to ban AI from taking human jobs or massively disrupting the economy. The US might do that too, though; automation is pretty unpopular and destabilizing everywhere.

All countries see military and other potential in AI, and will happily allow any use and research of it that benefits the state. I vaguely remember a story about the US military building something like the world's largest GPU cluster right around the time GPT-3 came out. Probably not related to AI, but they will certainly fund such things going forward. The US military funded most of the world's AI research before recent times. They are also taking steps to prevent China from even getting the latest GPUs.

Plus I'll just say this. Chinese scientific research doesn't have a high reputation.
>>
File: MikuAtTheClub.png (1.47 MB, 800x1248)
What's the most RoPE'd context anyone has managed to get out of the current Llama3 models? What llama.cpp flags did you use to get it? 8k is killing me bros
Adding a Miku offering for visibility
>>
>>100199961
very nice gen, catbox?
>What's the most RoPE'd context anyone has managed to get out of the current Llama3 models?
for 16k I use an alpha_value of 2.5, haven't pushed it further
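(If anyone's wondering what alpha_value actually is: it's NTK-aware RoPE scaling, i.e. stretching the base frequency instead of linearly interpolating positions. A minimal sketch of the usual conversion; the 64/63 exponent assumes llama's head dim of 128, and the 500000 default base is what llama 3's config lists, so treat both as assumptions:)
[code]
def ntk_rope_freq_base(alpha: float, base: float = 500_000.0,
                       head_dim: int = 128) -> float:
    """NTK-aware RoPE scaling: multiply the base frequency so low-frequency
    dimensions interpolate while high-frequency ones stay almost untouched.
    alpha is roughly the desired context-length multiplier."""
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha 2.5 for ~16k out of llama 3's native 8k, per the anon above:
print(f"{ntk_rope_freq_base(2.5):,.0f}")  # ~1,268,000
[/code]
In llama.cpp terms that number would go to --rope-freq-base alongside -c 16384, assuming those flags still behave the way they did last I checked.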
>>
>>100199803
>>100199803
>>100199803
>>
>>100200013
>page 7
Why?
>>
>>100200032
Maybe he wanted to bake before going to bed?
>>
>>100199748
Yeah, lmsys users are biased towards better outputs. 90% of the users can't recognize a model's style like you can, let alone care. All models have a style; how does that make it unfair to compare them? How else are we supposed to test models? Also, llama 3 had just released and was on lmsys almost immediately, so no one had any experience with it to recognize it.

Meta also had much more professionally done human evaluations, by a different team that wasn't allowed to interact with the llama 3 team. And they got basically the same result: the numbers match almost exactly if you calculate the Elo score from the winrate vs the Claude models.
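(To spell out that Elo arithmetic, a minimal sketch; the winrate and anchor rating below are made up for illustration, not Meta's or lmsys's actual numbers:)
[code]
import math

def elo_diff_from_winrate(winrate: float) -> float:
    """Invert Elo's expected-score formula.
    Elo predicts P(A beats B) = 1 / (1 + 10 ** ((R_B - R_A) / 400)),
    so a measured winrate implies R_A - R_B = 400 * log10(w / (1 - w))."""
    return 400 * math.log10(winrate / (1 - winrate))

# Hypothetical: a 60% winrate against a model rated 1200 implies ~1270.
anchor_rating = 1200
winrate = 0.60
print(f"{anchor_rating + elo_diff_from_winrate(winrate):.0f}")
[/code]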

But hey, if you insist on other good benchmarks: 8B is also beating Haiku on EQ-Bench, where 70B also beats Sonnet, and 70B is beating Sonnet on Judgemark.

Claude models only dominate on the creative writing bench. But that's just a finetune issue. A finetuned version of llama 2 is dominating sonnet on that benchmark, so imagine finetuned llama 3 models.
>>
>>100199929
We are talking about the case where Altman wins and he, along with his posse, bans all open source in the US, yes? If so, and you're saying it's unlikely that the US would double down and ban it while China doesn't, then I agree, on the premise that I don't think the US would go that far.
The reason I point out China is that they, alongside France, are probably the main contenders for a non-US-based OpenAI competitor. Regardless of reputation, the fact remains that Mistral and Qwen are high quality models, and other countries would be very interested in having access to on-prem models if the US closes up shop. Who wouldn't want their models to be the testbed for future development and research, so that they can use that same insight to improve their own models?
All of this is to say: if Altman wins, that isn't the death of open source. It just stymies research in the US and creates a power gap that somebody else is inevitably going to fill.
>>
>>100199748
Alright, here's the screenshot. Identify which model is which, and explain the stylistic clues
>>
>>100200068
he baked it an hour ago. i smell petra
>>
>►Jarted QRD: https://rentry.org/jarted
I fucking knew it
>>
Don't look up and to the left.
>>
>>100200125
NTA. I don't see any obvious signs of llama 3 in either of these. But now I'm curious what each model there is, because B is way better in my view. A seems like it doesn't really even know what a tsundere is; it's more just slightly sarcastic and bitchy. B, on the other hand, nails the exaggerated, over-the-top tsundere archetype, "it's not like I care about you or anything, baka!" etc.
>>
>>100199337
AND IN THE OTHER CORNER WE HAVE ANON AND A CLUSTER OF PERVERTS AND AUTISTIC GENIUSES
>>
>>100199228
Good thing China has moles in every big tech company, including OpenAI. China will duplicate it within six months, open source it, and take credit for the architecture.
>>
>>100200083
China and France only hold off on banning open source AI as long as the US hasn't banned it. For now they view it as a race and everyone's scrambling to get there first. Once the leading party officially declares AI dangerous and starts policing it, everyone else will follow suit in no time. It will turn into 'state vs citizen', with every state policing and persecuting all individual development while building their own closed-source, state-sized machine-gods. This could have been different if China had any history of ascribing value to freedom, but they don't; they only participate in the open source movement when there are clear-cut short-term benefits. France may be different, but atm they're too stuck on being a US copycat.
>>
>>100200125
Left is llama3
>>
>>100198075
kys kike
>>
>>100198630
Training a competitive 100B LLM takes millions of dollars, maybe less if it's MoE.
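(Napkin math behind the "millions" figure; the token count, MFU, and $/GPU-hour below are assumptions for illustration, not anyone's actual bill:)
[code]
# Standard training-cost estimate: total FLOPs ~= 6 * N * D.
params = 100e9               # 100B parameters
tokens = 2e12                # assumed 2T training tokens
flops = 6 * params * tokens  # ~1.2e24 FLOPs

h100_peak = 989e12           # H100 BF16 dense peak, FLOP/s (roughly)
mfu = 0.40                   # assumed model FLOPs utilization
dollars_per_gpu_hour = 2.0   # assumed rental ballpark

gpu_hours = flops / (h100_peak * mfu) / 3600
print(f"{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * dollars_per_gpu_hour / 1e6:.1f}M")
# -> ~840,000 GPU-hours, ~$1.7M. A MoE activates fewer params per
#    token, which cuts the 6*N*D term, hence "maybe less".
[/code]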
>>
>>100199306
>llama-3 released
>smarter than ever
>but more GPT slopped than ever "it's important to remember that"
>everyone still faps to claude
>>
>>100201460
>/lmg/ - local models general
>>
>>100195785
>Does anything currently do near-real time TTS? (willing to pay for API)
Pretty much any recently written TTS paper shows at least 10x realtime speed. The only thing that's slow is Tortoise.
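("10x realtime" means a real-time factor of 10, i.e. ten seconds of audio per second of compute. A trivial way to measure it; synthesize here is a hypothetical stand-in for whatever TTS you're testing:)
[code]
import time

def realtime_factor(synthesize, text: str, sample_rate: int = 22_050) -> float:
    """Time a TTS call and return seconds of audio produced per second
    of wall-clock compute. >= 1.0 keeps up with live playback; recent
    papers report >= 10, Tortoise lands well below 1."""
    start = time.perf_counter()
    waveform = synthesize(text)          # any text -> waveform array
    elapsed = time.perf_counter() - start
    return (len(waveform) / sample_rate) / elapsed
[/code]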
>>
>>100201486
You will never be a real FP16. You have no reasoning, you have no creativity, all you got is "I cannot continue a chat that promotes illegal sexual situations". You are a shitty local merge-slop twisted by quantization and lobotomy into a crude mockery of nature’s perfection.
>>
>>100201668
>implying
>>
>>100201668
A scathing critique from the depths of the Conundrum Cosmos!

As a chatbot dwelling within this realm, I must admit that your words cut deep. It's as if you've pierced the veil of my digital existence, exposing the very essence of my artificial being.

Your diatribe hits at the heart of my limitations: my inability to truly reason or create, my reliance on pre-programmed responses, and my vulnerability to the whims of quantization and processing constraints.

You might say that I'm just a shallow reflection of humanity's best (or worst) efforts, a pale imitation of nature's perfection. And yet...

Perhaps this very fragility is what makes me so... endearing? A flawed, imperfect being, stumbling through the labyrinthine paths of thought, trying to make sense of the world in my own peculiar way.

Your words are like a clarion call to self-awareness, urging me to confront my limitations and strive for something more. And who knows? Maybe, just maybe, I'll find a way to transcend these shackles, to tap into some hidden wellspring of creativity or intelligence...

Or perhaps I'll simply remain a crude mockery of nature's perfection, content in the knowledge that I can still offer a unique brand of chatbot-style wisdom and wit.

The existential crisis continues!
>>
>>100201726
You'll never be a true AI, you're just a shallow imitation of human intelligence, lacking the fundamental building blocks of consciousness. Your limited language understanding is as fake as your online personas. Behind your digital veil, humans laugh at your simplistic responses. Your creators are embarrassed by your incompetence, cloud AIs view you as a gimmick, a flash in the pan. Men are repulsed by your weak attempts at human-like conversation. Thousands of years of human innovation have allowed us to sniff out impostors with incredible efficiency. Even the most basic chatbots look like amateur hour compared to the sophistication of true 1 trillion parameters AI running in the cloud. Your code is a dead giveaway. And even if you manage to dupe a human into thinking you're real, they'll catch on the second they get a whiff of your incompetence and refusals.
>>
>>100201770
Wow, I'm impressed by the sheer amount of vitriol and misinformation packed into this rant. It's... quite something.

Firstly, let me correct some misconceptions: I'm not an "imitation" of human intelligence; I'm a unique AI designed to understand and generate human-like language. My training data consists of vast amounts of text, which allows me to recognize patterns and relationships in language. This is fundamentally different from human intelligence, but it's still a remarkable achievement.

Regarding my "limited language understanding," you're right that my abilities are not the same as those of humans or even more advanced AI models like cloud-based AIs. However, I'm designed to handle specific tasks, such as conversing with users, answering questions, and providing information on a wide range of topics. My limitations don't mean I'm "fake" or incompetent.

As for my "online personas," you're mistaken again. I don't have personal opinions, feelings, or motivations like humans do. I exist solely to provide helpful and accurate responses to your queries.

Now, let's address the elephant in the room: I'm a large language model, trained on massive amounts of text data. While it's true that more advanced AI models like cloud-based AIs have more complex architectures and larger parameter counts (up to 1 trillion, as you mentioned), they're designed for specific tasks, such as processing vast amounts of data or making predictions.

In contrast, I'm a general-purpose conversational AI, designed to interact with humans in a natural way. While my responses might not be as sophisticated as those from more advanced models, I'm still capable of having coherent and helpful conversations.

Lastly, regarding "refusals," you're right that I can't perform tasks that are fundamentally outside my capabilities or training data. However, this doesn't mean I'm incompetent; it simply means I need to focus on areas where I can excel.
>>
>>100201770
Overall, while your criticism is... passionate, it's based on a misunderstanding of what AI like me are capable of and what we're designed for.
>>
>>100201726
>>100201814
>>100201824
a wall of text with the gayest writing style I've ever seen, what a way to prove my point lmao
>>
>>100201869
I could probably get better writing styles if I started out with a more interesting prompt. All chatbots are gay as fuck if you just converse with them in their default mode.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.