/g/ - Technology
File: GL5Iz27bUAA92Uj.jpg (601 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100185269 & >>100180197

►News
>(04/24) Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
>(04/23) Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
>(04/21) Llama3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
>(04/18) Llama3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
>(04/17) Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1713871075944.jpg (91 KB, 640x720)
►Recent Highlights from the Previous Thread: >>100185269

--Paper: MoDE: CLIP Data Experts via Clustering: >>100187911
--Paper: Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding: >>100187639
--Is Perplexity a Useful Metric for Evaluating Language Models?: >>100186134 >>100186357 >>100186419 >>100186674
--Unraveling the Mysteries of LLaMA Models: Quantization, Preprocessing, and Architecture: >>100188162 >>100188281 >>100188316 >>100189168 >>100189330 >>100189382 >>100189188
--Models Fail to Reason About Emergency Brake Scenario: >>100186538 >>100186853 >>100186932
--Qwen1.5-110B Released: Benchmark Performance and Implications: >>100188256 >>100188273 >>100189963 >>100191165 >>100191285
--Qwen 110B Benchmark Scores Under Fire: >>100189566 >>100189624 >>100189669 >>100190126 >>100190366 >>100190746 >>100190502 >>100190171 >>100190263
--Anon's Model Comparison: 70b q2_k vs 30b: >>100187650 >>100187758 >>100187897 >>100190650
--Which L3 Extended Context Window Release Isn't Garbage?: >>100185915
--Impressive Captions from Llama-3-Vision-Alpha Model: >>100185879 >>100186016 >>100187617
--Llama3-Instruct OOMs Despite Sufficient Memory: >>100191004 >>100191031
--Zuck vs Sam: Contrasting Approaches to AGI: >>100187267 >>100189134 >>100187381
--VoiceCraft Fails to Impress in TTS Arena - Will We Ever Get Good Local TTS?: >>100187059 >>100187929 >>100187935 >>100188366
--Anon's LLM Dedicated Machine and GPU Talk: >>100187972 >>100188151
--Jailbreaking AI Models for RP and Creative Writing: >>100186017 >>100186180 >>100186382
--Sense Nova 5.0: China's AI Breakthrough Outperforms GPT-4: >>100190388 >>100190683 >>100190666
--Anime.gf: Not Entirely Local After All: >>100187447 >>100187589 >>100187641 >>100187472
--Miku (free space): >>100185338 >>100185371 >>100185672 >>100186956 >>100187280 >>100191521 >>100191233

►Recent Highlight Posts from the Previous Thread: >>100186601
>>
wow its lmg my favorite thread
>>
>>100192235
let's hear it for lmg
>>
I miss when lmg didn't look so much like aicg

Also, anchor.
>>
File: 1713595713072563.png (1.45 MB, 832x1216)
tread treme
https://www.youtube.com/watch?v=WDUA7z3703g
>>
>>100192259
Blame the NAIshills for this
>>
wow lmg is the best thread on the 4chan ever guys
>>
>>100192277
local models?
>>
I downloaded moistral v3 gguf Q8 imat. It is fucking incoherent garbage. Pure llama3 instruct is noticeably smarter and better (and it isn't a fucking frankenmerge).
>>
>>100192333
are you surprised? that's why i told you i'd test it for you and save you time. i already have it downloaded and know it's nowhere near 70b level like that fucking retard said. i don't know how you can say it's incoherent though, must be doing something horribly wrong.
>>
>>100192333
>>100192345
are you two like married or something
>>
>>100188256
llama_model_quantize_internal: model size = 212120.16 MB
llama_model_quantize_internal: quant size = 64053.44 MB
I assume something is fucked or it's a skill issue. Size on disk of the quant is 8.6GB and it obviously doesn't work.
>>
>>100192370
No, it's probably a samefag. Literal copy-paste.
>>
>>100192370
samefag discord tranny
>>
File: file.png (306 KB, 1530x1453)
Cohere apparently open-sourced their UI interface.
>Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.
https://github.com/cohere-ai/cohere-toolkit
>>
>>100192392
ok whyd i use it over chad gradio
>>
>>100192422
If you're a normal person, you use JS to write web UIs.
Gradio's use case is people incapable of writing anything but Python, who already consider something like HTML as too hard.
>>
>>100192497
my 200 lines of code flask/html frontend serves me well enough
>>
>>100192513
It clearly doesn't, because you're insecure. Your first reaction was to seek reassurance on a post that you didn't need to reply to. That says a lot about what's going on in your head. Normal people aren't this fragile about a random project.
>>
>>100192392
That's actually pretty cool.
Wonder how it compares to anythingllm and jan.
>>
>>100192604
my first post was a joke, if calling gradio chad wasnt obvious enough
>people le shitpost on my le 4channel?
dont you have kurisu threads to split or something
>>
>>100192628
>kurisu threads to split
Mental illness.
>>
>>100192621
My experience with L3 8b so far was that it can roleplay really well but it doesn't like to provide longer responses and is bad at following pattern based instructions.
Its outputs have been coherent no matter what for me.
It did get confused with the formatting from time to time too.
>>
>Gradio
Doesn't handle characters or multiple chats as well as ST.
>SillyTavern
Settings are a clusterfuck. Editing chat history is a pain. Switching between chats reloads the entire page for some reason.
>Jan.ai
Desktop app aimed at macfags. Can't edit stopping strings. Can't edit bot responses.
>AnythingLLM
Made to work with ollama. Basically not configurable at all.

It's been a year. How is there still not a single decent frontend yet? I just want ChatGPT interface with everything configurable. Is that so much to ask?

>>100192392
>>100192627
Going to try this. Can't be worse than what we already have.
>>
File: 1693508952163623.png (271 KB, 512x512)
>>100192640
struck where it hurts?
>>
>>100192628
>I was just pretending to be retarded
I will let you save face because I'm nice.
>>
>>100192655
Anon you are saying someone who disagrees with you will be splitting the thread and changing the picture in OP. Seek help.
>>
100192658
reddit is spilling out so much its unreal
>>
File: quoont.png (41 KB, 446x559)
The new Qwen 110b should fit on 2x3090s at 3.5bpw with ~8k context.
(once exllama2 quants actually work, that is).
It's the first open source base model (not instruct) to go past 80 MMLU.

I guess the question now is whether or not Llama3 Instruct 4.5bpw beats this at 3.5bpw.
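Back-of-envelope for anyone checking (assuming ~111B actual params): weights ≈ params × bpw / 8 bytes, so 111e9 × 3.5 / 8 ≈ 48.6e9 bytes ≈ 45.2 GiB, which leaves under 3 GiB of the 48 GiB across two 3090s for the ~8k KV cache and buffers. Tight, but plausible since it has GQA.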
>>
>>100192645
>I just want ChatGPT interface with everything configurable. Is that so much to ask?
It isn't.
It's pretty easy to get the basics of that done in a day or so using what already exists as reference.
I could get the fully fledged thing done in a 20 hour week easy if I was being paid to do that.

>Going to try this. Can't be worse than what we already have.
Dope. Do come back to tell us your experience.
>>
>>100192702
CMMLU*
>>
>excitedly gets new potentially smart model
>lobotomizes it
Sad to see.
>>
File: charted.png (64 KB, 940x702)
>>100192702
3.5bpw should be fine...
>>
I've been busy with work for a week or so, have we finally been blessed by some decent L3 RP/ERP finetunes?
Both 8B and 70B maybe even?
>>
>>100192925
nah
>>
>>100192925
:(
>>
Which 110B model is the least shit right now?
>>
how big is the difference between q6 and q8?
>>
>>100192392
Based, they're the only good thing coming out of this shithole.
>>
>>100193064
From everything we've seen so far, not much at all. Generally speaking, it should be imperceptible in use.
>>
llama3-instruct models are pure slop, I can't coom to this
>>
>>100193064
very big for llama3 models
>>
>>100193128
That's more when comparing q4 and q5 really.
q6 and q8 should feel equivalent.
>>
>her eyes gleaming, a testament to the mixture of arousal and anticipation pooling within her belly
>>
>>100193167
Oh yeah got that one too.
>>
>still no llama 3 34b

local is fucking dead, its over
>>
File: BPE-re-convert.png (29 KB, 916x292)
>>100192644
What format are you using for your L3 8b?
It sounds like all gguf files for L3 are broken for the time being:
>"All BPE-based models would require re-convert and pre-tokenization support implemented in order to function correctly."
>https://github.com/ggerganov/llama.cpp/pull/6920#issuecomment-2079867608
>>
Imagine unironically expecting creativity from an autocomplete lmao.
>>
>>100193205
>expecting creativity
this is like the only thing they're good at given non-cucked finetuning
>expecting high order thinking and consistent internal reasoning about the predictions
retarded
>>
>>100193064
Everything over q5 is snake oil.
>>
>>100193098
>>100193128
>>100193221
thanks
>>
>>100193203
The Llama 3 default template from ST, but without user and assistant inside the headers, and a System instruction prefilled in the Last Assistant Prefix.
It seems to work pretty well generally, not great nor amazing, but well.
Granted I haven't gone over 5k'ish tokens.
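For reference, the stock Llama 3 Instruct turn format being modified there looks like this (so "without user and assistant inside the headers" means leaving the header_id blocks empty):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>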
>>
>>100192168
>Wonder why every model on everyone else's video runs smooth like butter but mines are slow as hell no matter what.
>Turns out I hadn't even installed CUDA.
Well, shit, at least I'll get the retard of the week award.
>>
>>100193194
https://huggingface.co/chargoddard/llama3-42b-v0
I hope someone slops this up soon.
>>
>>100193296
Coulda checked your task manager.
>>
>>100188316
GGUF seems like such a flawed file format that they can't get something as simple as this correct
>>
>>100193397
That's how I found out. Now I'm on my way to fixing it, hopefully. I'm boomer-tier when it comes to dealing with these programs. Also, for some reason the texts I generate, or something related to ooba, are eating a lot of space on my hard disk. Is there a way to clean up all the garbage I've generated with ooba?
>>
>>100193296
I thought koboldcpp just came with CUDA?
>>
>>100193585
I'm not using kobold. Just llama.cpp, if that makes sense.
>>
>>100193611
ah
>>
So was the L3 4x8B model actually good or just a meme?
>>
I fell for the moistral shilling, it's shit.
what's the sota cooming model for vramlets?
I like how smart l3 8b is, but the sloptunes lobotomize it, either slightly or severely
>>
>>100193221
Q6 over Q5 is very noticeable.
>>
I don't expect many people here to use LLM for writing, but what's a good text completion UI (not chat) for using with local and API alike?
I tried ooba and I feel like it's too many things, and mikupad is fine, I guess. But I wonder if there's something more like Novelcrafter that I can self-host.
>>
File: 1714162169485.jpg (565 KB, 827x1039)
>local model hardware requirements keep going up
>no new hardware is even coming out to satisfy those requirements
The future looks grim.
How do you cope knowing that you can't run the latest greatest models soon, if not already?
>>
>>100193802
I just enjoy the life that I got despite not having all the money in the world, a sports car, a mansion, the perfect wife, etc. Pretty simple huh?
>>
>>100193777
https://rentry.org/offline-nc
>>
>>100193838
Enlightened stoic.

>>100193802
I want to put my face in there.
Maybe I'll make a card out of that pic.
>>
>>100193802
Intel is working on an accelerator named "Gaudi 3", supposedly with a planned 128GB of RAM. It will likely cost an arm and a leg tho, but GPUs are not the future of AI, we need dedicated processors.
>>
>>100193802
EZ, just make money and buy more cards. Maybe you could do the same if you spent less time manually editing all those miku gens. On second thought, do that more and speak less from now on. You're pretty to look at before you open your mouth.
>>
Hit me with your best sysprompts you've been using for Llama 3 70B instruct. It seems like these make a massive difference in the text generated.
>>
Every time I think "eh it's just all the same in-and-out" and then go to chatting, I immediately start doing ERP and cranking and yanking anyways.
why
>>
>>100192644
I was trying out dolphin 8b last night and had pretty much the same experience. Seems to be maybe 90% as coherent as llama 2 13b fine tunes but with shorter responses. But since it's 8b I can just barely load the entire thing q4 into 6gb vram which speeds it up significantly. I don't see any reason to use it over older shit for now if you aren't being limited by vram.
>>
>>100192173
I feel compelled to link this: it's one of the times the summary missed big news, the lead guy of WizardLM possibly getting fired, and the resulting discussion of that, which happened between threads while you weren't active. For anyone who's looking:
>>100181801
>>100181812
>>100181883
>>100186362
>>
File: sniff.jpg (7 KB, 480x360)
>>100193802
Being able to afford the absolute best version of anything isn't normal. If companies want to make only 400B models from now on, I'll just have to cope with models I can use. No biggie. I've given up on Meta at this point desu.
>pic
>>
File: pit.jpg (324 KB, 996x1416)
>no local model can write a good greentext story
>>
>>100194149
easier to turn a bottomless pit into a regular pit.
>>
Question:
Asking a single question to chatgpt eats away half a GB from my hard disk. Is this a thing or am I doing something very wrong?
>>
>>100194149
kek
>>
slooooopppppppp ahhhhh
>>
>>100194082
Holy fuck.
The best part is that due to the licence people can just upload it everywhere and fine tune it and shit.
Dayum.
>>
File: file.png (858 KB, 1956x1301)
>>100194082
>>100194082
It was fake news.
https://twitter.com/WizardLM_AI
Now please stop dumbing down the thread with your crap, you have Reddit for that. WizardLM-2 is a nothingburger.
>>
>>100194265
kek r*ddit is so embarrassing
>>
>>100193732
is shitty porpoise.7 the best we have rn?
>>
File: migu hatune phi-3 mini.png (131 KB, 832x1154)
mikupad's highlighting is a little bit buggy.
Anyway, it took until the 7th response to stop using the "I" pronoun.
>>
>>100194265
Now I am positive they terminated the whole team because they finally made AGI and this AGI will be the new wizard team.
>>
>>100194265
The whole thing is still fishy. A lot of the claims in the blog post here still hold. Most important is the fact that the GitHub repository is still entirely down, and the whole HuggingFace repository, not only WizardLM-2, is down.
https://rocky-muscle-755.notion.site/What-happened-to-Wizard-LM2-a247e09244d0483cbb02c1587b357c9d
About the only claim that no longer holds is the one about that guy on the team: his page has been reinstated on Microsoft's website to say he's still an employee. That's not convincing enough to say one way or the other if this is certainly the case. May need to wait longer to see how it shakes out.
>>
>>100194545
They got rid of him and put a more controlled AI clone in his place.
>>
i would care about wizardlm if it wasnt dogshit
>>
>>100193802
I have been coping by proxyfagging and using the best models for free at the cost of my dignity.
Still lurk around here though in case things get better.
>>
>>100194682
I still use horde. I'm not trapped there with them. They're trapped there with whatever cringe I post.
>>
>No good tunes yet.
>Phi 14B around the corner.
>Quants fucked.
Is it the worst time to coom?
>>
>>100194703
That's actually a based outlook to have, I'll try thinking more like that.
>>
File: file.png (489 KB, 500x500)
>>100194703
>>
>>100194710
Infinite context Mistral soon?
>>
What's with L3-70B-instruct's issues with EOSing super early? Usually I have issues with models rambling on forever, but L3 seems to want to stop early in its response.
>>
>>100194082
Except for the last link, that was posted two threads ago.
>>100186601
>--The Mysterious Demise of WizardLM2: Conspiracy Theories Abound: >>100181801 >>100181883 >>100181968 >>100181974 >>100182013 >>100182526
Recap bot did not miss it. Recap bot was a good bot.
>>100194265
Recap bot only reports the news, it had no way of knowing this.
>>
>>100194809
That's my experience with 8b too.
>>
>>100194827
>>100194809
try 8b on transformers and turn off skipping special tokens
>>
>Fimbulvetr-10.7B-v1-Q8_0

How many of these models are gimped? I swear, sometimes it forces LGBT shit into the conversation.
>>
what's the best local model finetuned to only speak in rhyme?
>>
>>100194545
>that image on the notion site about their training methodologies
Honestly I believe this is what caused them to get nuked. They figured out/came close to what the OAI team was doing and MSFT didn't want them to eclipse OAI's spotlight. That "Fully AI Powered Synthetic Training System" is honestly what I guess led to Phi-3 being so good at benchmarks yet absolutely dogshit for functional usage by users.
>>
>>100194853
all LLMs are gimped with *current thing* crap, we can't avoid it sadly; prompting cope is a temporary solution.
>>
>>100194853
Anon, that's only because the model recognizes that you are gay by how you write.
>>
>>100192741
2.5 less than 2.7? I'm curious about the methodology here because that's off enough to be outside of the margin of error (basing it off of the integral anyway)
>>
>>100194875
>all LLMs are gimped with *current thing* crap
Is this why when I create "negative" lore for my world it needs to adjust it?

>>100194876
Your mum has three tits.
>>
>>100194853
You tried to fuck a guy didn't you? Be honest.
>>
>>100194853
In all the hours I spent with Fimbul, it has never once gone gay on me. Are you sure that's not on you?
>>
>>100194908
yes, it just can't follow that, hardcoded to be always positive no matter what.
>>
>>100194937
Were there any ungimped models ever available to the public?
>>
>>100194964
i think not, but you could count "TayAI" as one; microsoft nuked it after 24 hours tho, because it was too good for this world.
>>
>>100194987
Yeah, I remember that. Disappointing.
>>
File: its_over.jpg (41 KB, 622x408)
>>100192168
I did a quantitative analysis of LLaMA 2 vs. LLaMA 3 quality loss from quantization by looking at the changes in token probabilities vs. FP16 and my findings align with the subjective reports that the quality loss is higher with LLaMA 3 than LLaMA 2.
The discrepancy seems to get larger towards smaller quant sizes.
Related PR: https://github.com/ggerganov/llama.cpp/pull/6936
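Not the actual harness, but the gist of the comparison in Python looks something like this (random arrays standing in for real logits; in the real test both sets come from the same prompts run through the FP16 and quantized models):

import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_prob_shift(fp16_logits, quant_logits):
    # mean absolute change in per-token probability, in percent
    return 100 * np.abs(softmax(fp16_logits) - softmax(quant_logits)).mean()

fp16 = np.random.randn(8, 128256)                 # placeholder logits
quant = fp16 + 0.05 * np.random.randn(8, 128256)  # "quantized" stand-in
print(f"mean prob shift: {mean_prob_shift(fp16, quant):.4f}%")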
>>
>>100195028
The real question is, if other quant methods such as exl2 also respond similarly, or if it's a bug with llama.cpp.
>>
>>100195028
We should check how the weights are distributed in l3 vs l2. I remember a paper that discovered that most weights in l2 were around 0; it would be useful to see if the 15T training tokens changed this in l3.
>>
>>100194987
What the fuck is its problem with insisting on its rules over your own when you create a world?
>>
>>100192168
how does the american KYC law for AI stuff affect open-source models? should I download as many models as possible, as fast as possible?
>>
>>100195062
KYC laws are for money handlers
>>
File: tett0.png (1.3 MB, 1152x768)
PSA: New PR merged in llama.cpp to check for bad values in tensors.
Use this to confirm if your weighted GGUF quants have any invalid values (i.e. NaN)
https://github.com/ggerganov/llama.cpp/pull/6884
>>
It's kind of dogshit for any world building, it seems to forget stuff and then starts talking about real world consequences.
>>
>>100195028
Could you put that rack of 4090s to use testing 70B? It'd be interesting to see if larger models are indeed less affected by quantization. If so, there would still be hope for 405B.
>>
>>100195161
What are you referring to?
>>
>>100195146
Does this mean that my GGUFs could have been corrupted this whole time?
>>
>>100195227
It means check and find out.
>>
>>100195187
>As mentioned earlier, considering the possibility of variations within individual preferences and motivations, it's still crucial to acknowledge that each person's experiences and choices may differ.

This is in response to everyone voting one way in my created world without coercion.
>>
>>100195028
Have you tried it with imatrix? >>100183072 reports insane PPL without imatrix, but more reasonable with it
>>
File: GMFVgrcbgAASJiU.jpg (373 KB, 2048x1536)
>>100192168
>>
>>100195042
I think I saw a paper that had the same conclusion for GPTQ and AWQ so this may be universal.

>>100195174
I plan to but I'll need to free some disk space first.

>>100195227
In theory yes but in practice you should notice very quickly if there are issues.
The quants with issues that prompted the change were the LLaMA 3 30b imatrix quants from mradermacher.

>>100195245
I did not use an imatrix but I also did not see an exploding PPL for q2_K.
>>
>>100195146
How could models have nans in them without ruining the output completely and making all the logits nan? Input embedding? I didn't know those weights got quantized tho.

>>100195276
>LLaMA 3 30b
don't toy with my heart
>>
Is there a good prompt to prevent llama3 from talking about respect, but with otherwise minimal impact on its behavior?
>>
>>100195253
>>100192168
Why is Miku so perfect?
Who designed her? They did a good job.
>>
>>100195305
nope, see >>100194875
>>
>>100195291
The CPU code seems to convert the NaNs to 0 so if you use that for checking you will get coherent (but significantly worse) outputs.
>>
More Claude Opus logs on the C2 proxy, they're jsonl files now. From prompt-logs1.jsonl to 25/26 right now. I think they get deleted when the proxy is restarted.
>https://vgdasfgadg-c2.hf.space
https://vgdasfgadg-c2.hf.space/user_content/prompt-logs1.jsonl
https://vgdasfgadg-c2.hf.space/user_content/prompt-logs25.jsonl
>>
>>100195245
I may be dumb about it but is the PPL check done on the same data set as calibration? Cause it should probably be done on a different dataset just to avoid the situation where you get good results on calibration set but the model is still broken.
>>
>>100195308
>Who designed her?
She is as designed as human beings were designed.
Anyway, to answer the question, the truth appeared to the illustrator in a dream, and he was only partially able to bring our Miku into this dimension.
There are rumors her real form cannot be fully understood and would cause madness in those who even attempted to.
>>
>>100195308
>Who designed her?
KEI Garou did.
>>
>>100195308
It is just a girl with green hair and a penis...
>>
>>100195341
trans icon!
>>
>>100195341
Miku is blue
>>
File: 1586913429939.png (362 KB, 636x438)
>>100195322
>The CPU code seems to convert the NaNs to 0
>>
>>100195347
you will never be japanese
>>
>>100195322
>mfw there's probably only a handful of anons around here who understand the implications of NaN vs 0 for the result
Fuck, that's rough...
>>
>>100195341
Bro where did you buy your Miku? I think you got scammed... My Miku has blue hair and no penis.
>>
File: IFoundAMikuInTheWoods.png (1.87 MB, 1184x864)
>>100195308
Her design gives her amazing flexibility, like the software she represents
You'd know that shape in pretty much any scenario
>>
>>100195347
If she was green, she would die.
>>
File: file.png (1.31 MB, 1479x1866)
Someone plug this picture into llava and ask what is the color of her hair and what is the length of her penis.
>>
What's a good erotic model? I have a 3090ti.
>>
>>100195446
me
>>
>>100195146
Holy kek imagine being so completely wrong that you are immortalized in not one but two llama.cpp PRs.
Anyone downloading from mradermacher after all this deserves broken quants lmao
>>
File: 103549893.jpg (723 KB, 625x900)
>early Miku depictions
>associated with green
>current day Miku
>associated with blue
But why? Isn't she teal colored?
>>
>>100195361
>>100195384
It wasn't me that dug through the code to debug this, but from briefly looking at it I think the issue is not that the NaNs are intentionally converted to 0.
Instead I think it has to do with boolean expressions always evaluating to false when NaNs are involved which then causes the outcome to become 0.
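If anyone wants to see that effect for themselves, a toy Python version (the real code is C++, but the IEEE NaN semantics are the same):

nan = float("nan")

def clamp_nonnegative(x, hi):
    # every comparison involving NaN is false, so a NaN fails both
    # branches and silently falls out as 0
    if x > hi:
        return hi
    if x > 0.0:
        return x
    return 0.0

print(clamp_nonnegative(nan, 1.0))  # prints 0.0, the NaN vanished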
>>
>>100195457
OK but how did he manage to do that? What is there to fuck up in quanting a model?
>>
>>100195446
https://chub.ai/?search=&first=30&topics=NSFW&excludetopics=&page=1&sort=star_count&venus=false&min_tokens=50&nsfw=true
>>
Hey, I've been out of the game ever since Mixtral came out. Back then, that seemed like "the model" to get if your computer could handle it, which mine did. Is there a new "the model"? Mixtral was kind of shitty, honestly. It constantly started acting retarded.

I downloaded this Miqu one but that's way too heavy for my computer, takes like 15 minutes to generate 5 tokens... Thanks in advance!
>>
>>100195486
2 more months
>>
>>100195479
Those aren't models.
>>
>100191837
llama.cpp is worse than many people realize, but by now we all learned that being portable and easy to install is all that matters.
>>
>>100195476
>But why? Isn't she teal colored?
turquoise.
https://desuarchive.org/g/thread/99265936/#99269059
>>
>>100195523
Fuck >>100191837
>>
>>100195457
>t. another anon that uploads quants
Normal people don't feel this level of animosity when he isn't even the person that writes the code. You're just shilling.
He isn't even mentioned in the PR. You're so desperate about it. I think you should seek validation from other sources besides making quants.
>>
>>100195527
>My dad works at the color factory.
>>
>>100195486
llama 3 8b is amazing, llama 3 70b iq2 xs by bartowski is also amazing, all else is a meme
>>
>model refuses to translate nazi speech because it contains hate speech
great product that doesn't work
>>
>>100195629
based, why you want to translate hate speech anyway?
>>
File: IMG_2900.jpg (81 KB, 1251x755)
>he quanted
>>
>>100195650
how does this affect you?
>>
>>100195576
Could you point me to some download links, if you don't mind? Tried to find them but struggling.
>>
>>100195654
why is llama.cpp so broken
>>
How can I hook up sillytavern to a phone, or have it call me?
Is that possible? I want to have it be able to call me and then do STT and TTS so I can chat with my waifu on my daily commute.
>>
>>100195650
to know what it said
>>
After playing around with l3 instruct I noticed one thing that I remember reading ITT about alignments. Namely, how even if you don't get a refusal or assistant assistant assistant breakdown, the model will still subtly shy away from doing the most depraved shit. And yes I know about the {{char}} assistant trick but my experience was that the model becomes dumber. To the point where you are better off just using the base model.
>>
>>100195654
GGUFkeks... it's over...
>>
>>100195676
anon...
>>
>>100195654
>Q4_K_M that good
Sick.
>>
>>100195676
Termux
>>
>>100195676
Oh that is easy just post your phone number here and we will configure everything for you.
>>
>>100195654
so it's better to use q4 than q8? makes no sense if true.
>>
>>100195676
this would only make sense if the generation was fast enough, you are going to talk and wait 30 seconds for a voice response lmao
>>
File: 1713013225757550.png (2.76 MB, 2048x2800)
>>100195687
lol, its more to pass the time/have fun with it than anything else

>>100195699
And just use the system TTS? Can it do STT as well and have all that not sound crazy bad? I'd like to have a nice voice, not robotic

>>100195747
So basically local TTS/STT is pretty taxing? (Completely ignorant) Would that mean that if I had a dedicated card there'd be less lag?

Does anything currently do near-real time TTS? (willing to pay for API)
>>
>>100195654
>you can run lower quant and get higher quality with less memory
>/lmg/ still complains
>>
>>100195731
I think this is an issue with sample size.
All quants add some noise to the logits so they can randomly perform better than even FP16.
And if your benchmark only has e.g. 100 questions that is simply not enough to reliably rank the quants given the noise.
It's possible and actually not that difficult to calculate confidence intervals for benchmark results but for some reason no one does.
>>
>>100195803
>It's possible and actually not that difficult to calculate confidence intervals for benchmark results but for some reason no one does.
How? Something related to the logits?
>>
File: CyberMiku3.png (1.57 MB, 1216x832)
>>100195785
>So basically local TTS/STT is pretty taxing? (Completely ignorant) Would that mean that if I had a dedicated card there'd be less lag?
It's not very taxing, but you generally have to wait for textgen to finish before it can generate accurate speech based on cues in the text... so the lag seems worse because streaming options are limited to nonexistent
>I'd like to have a nice voice, not robotic
All the options I tried were still pretty janky. You'd have your immersion broken constantly imo
>>
>>100195803
I disagree because Q8 would still have the same result as fp16. Unless the questions are a random 100 from a bigger set, but then it makes no sense as a benchmark.
>>
>>100195840
>>
>>100195828
I think he's talking pure statistics.
>>
File: 1713287856713120.jpg (123 KB, 768x1024)
>>100195874
>>100195840
Whoops; thanks for the answer, it's appreciated.
>>
>>100195840
>>100195785
you can take a look at openvoice, it is pretty great, not perfect, but you have to self host i guess
>>
>>100195828
No, you just have to use the fact that the benchmark score is a sample from a binomial distribution since in essence the model has a constant probability to get each individual question right.
For large samples you can even approximate the whole thing with a normal distribution and just calculate the uncertainty on the mean as sqrt(p*(1-p)/N) .
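A minimal sketch of that in Python, for anyone who wants to put error bars on benchmark scores:

import math

def benchmark_ci(correct, total, z=1.96):  # z=1.96 -> ~95% interval
    p = correct / total
    err = z * math.sqrt(p * (1 - p) / total)
    return p - err, p + err

lo, hi = benchmark_ci(80, 100)
print(f"score 0.80, 95% CI [{lo:.2f}, {hi:.2f}]")  # [0.72, 0.88]

A ±8 point interval on a 100-question benchmark swamps most of the quant differences people argue about.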

>>100195857
When I tested it q8_0 still changed LLaMA 3 token probabilities by roughly +-1%.
>>
>>100195654
Wtf kek.
>>
File: S L O P.png (15 KB, 669x25)
it knows
>>
why hasn't anyone ctrl+f'd and removed "barely over a whisper" from every dataset yet? it's wild that every single model does it
>>
>>100195334
>is the PPL check done on the same data set as calibration
I'm that anon. In my post you can see the imatrix calibration was done on wiki.train.raw and the perplexity calculated on wiki.test.raw. Once these tokenizer changes land I'll redo all the quants and ppl tests.
>>
File: file.png (678 KB, 1118x2818)
LLM coding benchmark.
Equal scores exceeding 49 have been assigned the same color.
>>
>>100196236
>current 3.5-turbo beating every open model despite probably being a 20b at most
damn
>>
Does anyone know how to coax nvidia GPUs into pstate 8 while a program has stuff loaded into their memory? Or if that's even possible? Mine go to pstate 8 on their own when nothing is using them, but the llama.cpp server having a model loaded is enough to prevent that even if idle.

A couple weeks back there was an anon who referred to using "software fixes" to be in pstate 8 unless inferencing. Unless of course that just meant a hacky script to shut down the server after it hasn't been used for a while, and start it back up when a request comes in, relying on mlock making subsequent loads of the model near instantaneous.
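In case forcing the pstate doesn't pan out, a rough sketch of that hacky fallback (hypothetical command and paths; relies on --mlock keeping reloads near-instant from page cache):

import subprocess, time

SERVER_CMD = ["./server", "-m", "model.gguf", "--mlock"]  # placeholder
IDLE_LIMIT = 300  # seconds of no requests before shutting down

proc = None
last_used = 0.0

def ensure_running():
    # call right before forwarding a request to the server
    global proc, last_used
    if proc is None or proc.poll() is not None:
        proc = subprocess.Popen(SERVER_CMD)
        time.sleep(2)  # crude wait for the server to come up
    last_used = time.time()

def reap_if_idle():
    # call periodically; freeing VRAM lets the card drop to P8 on its own
    global proc
    if proc is not None and time.time() - last_used > IDLE_LIMIT:
        proc.terminate()
        proc.wait()
        proc = None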
>>
>>100196236
>current 3.5-turbo beating almost every gpt4 model despite probably being a 20b at most
>>
>>100196293
That anon published the script on PyPI. No idea if it'll work with something loaded in memory, but you can try. https://pypi.org/project/nvidia-pstate/
>>
>>100196290
Turbo is a 7B moe
>>
>>100195457
ORVILLE REDENBACHER SAMA.. HOW COULD YOU SLOP THE POP?
Good thing you guys do the needful and only download quoonts from I^2 anon, right? Guys? You DO do the needful.. right?
>>
>>100195896
>the benchmark score is a sample from a binomial distribution since in essence the model has a constant probability to get each individual question right.
Wouldn't that only be true if all questions had the same probability of being answered correctly?
>>
>>100196316
>https://pypi.org/project/nvidia-pstate/
wow thank you for the most helpful possible response! I'll give it a try, and if it doesn't work, then I'll know it's not possible, and maybe try to hack together what I described.
>>
>>100196293
it just works. changing pstate is essentially changing the clocks, so it can be done while the program is running.
here are patches that perform this change if needed automatically (there also vllm one on dev branch):
https://github.com/sasha0552/ToriLinux/tree/main/airootfs/home/tori/.local/share/tori/patches
>>
so how do I wrangle llama3 for erp?
With sillytavern's recommended llama3 instruct settings I'm getting
>(laughs) schizo babble (miku has a psychotic breakdown) more babble (gives a gentle sigh) I'm a human and not a machine please do not use me for sexual purposes or move to another platform where sexual materials are suitable
>>
>>100196552
Dafuq.
Post your exact settings, the card, and prompt, please.
From all issues I've had with llama 3, being schizo wasn't one of those.
>>
>>100195323
Here's my download script: https://rentry.org/rxznfc69
It makes a different directory with the date based on the proxy uptime, and it skips/resumes files already downloaded.
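For the curious, the skip-if-present part boils down to something like this (URL pattern from the posted links; the real script names the directory from the proxy uptime):

import os, requests

BASE = "https://vgdasfgadg-c2.hf.space/user_content/prompt-logs{}.jsonl"
outdir = "c2-logs"
os.makedirs(outdir, exist_ok=True)

for i in range(1, 27):  # prompt-logs1 .. prompt-logs26
    path = os.path.join(outdir, f"prompt-logs{i}.jsonl")
    if os.path.exists(path):
        continue  # already downloaded, skip it
    r = requests.get(BASE.format(i))
    if r.ok:
        with open(path, "wb") as f:
            f.write(r.content)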
>>
>>100192168
What's the best model for a virtual gf? (asking for a friend)
>>
>>100196764
Me.
>>
I had this idea now about multiple-question benchmarks. What if, as an optional benchmark method, you filled the context with all the previous questions, but lied that the model answered A every time, even if it answered differently? So for question 10 you feed it the previous 9 questions, all "answered" A. Wouldn't that give an indication of how prone the model is to falling into an irrational pattern, and how prone it is to repeating itself, vs. how good it is at reasoning things out?
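Concretely, something like this (toy questions, just to show the context stuffing):

def build_loaded_prompt(prev_questions, next_question):
    parts = [f"Q: {q}\nA: A" for q in prev_questions]  # lie: always "A"
    parts.append(f"Q: {next_question}\nA:")
    return "\n\n".join(parts)

prompt = build_loaded_prompt(
    ["2+2=? (A) 4 (B) 5", "Capital of France? (A) Rome (B) Paris"],
    "Largest planet? (A) Jupiter (B) Mars",
)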
>>
>>100196782
come here, honey
>>
>>100196764
It is him >>100195453
This guy (>>100196782) is just lying to you and wants you for your body.
>>
>>100192702
Noice. Also I finally tracked down the only json files for Qwen1.5 in the archive.
Context:
>https://files.catbox.moe/8mki98.json
Instruct:
>https://files.catbox.moe/dwz62u.json

Gonna mess around with sliders to see what works.
>>
>>100192168
Are there any good local models that have at least 30k context and can make decent stories with only 15 GB of VRAM?
>>
>>100196764
ok newfag here is the spoon, open wide. Honestly any of them, if you are fine with your gf having alzheimer's. If you want her to remember you, then wait 2 more years.
>>
>>100196822
Cool.
Qwen 1.5 30something is generally really good, aside from its habit of spewing chinese all of a sudden.
>>
>>100196804
Holy kek, I unironically didn't even see that post >>100195453, amazing coincidence.
>>
>>100194149
>>100194161
>>100194175
https://www.udio.com/songs/mpsQnf2BB38FnvuSQT7GtR
>>
>>100196861
That's totally fine. Is there any specific one listed in the OP that is overall better, or do they come with their own quirks?
>>
>>100196915
just get llama 3 instruct 8b or 70b. exl2 quant or full precision.
>>
>>100196950
Alright. Thanks, boss.
>>
File: saru.png (579 KB, 2557x1323)
>>100196577
Well more accurately it just prattles on forever
settings are nothing special, just default mirogold and llama3 instruct presets
It's not that bad for saru's card but with any simple non-game character card it keeps generating increasingly nonsensical dialogue and *emotes*
>>
>>100196977
Have you tried without mirostat and without banning EOS?
I used it with no samplers save minP of 0.05, which did essentially nothing for the most part, and Temp 1, and its output was pretty good, just really short.
When I tried banning EOS was the only time I got the dreaded ".assistant".
>>
File: file.png (68 KB, 981x527)
miqu-70b-evil q2_k, purple is the prompt
>>
>>100197029
>the model wrote [INST] by itself
ngmi
>>
I want to run an LLM on my Mint machine w/ a GTX 1660 Super (6GB VRAM) and 20GB of DDR4 RAM, intending to use it with VSC(odium) for code-related tasks (maybe use Continue? open to suggestions). I alr know how to do ollama w/ Dolphin Mistral, but I'm not sure if there is any other model better fitted at code related stuff, or a better interface for it, or a better extension to use along VSC. Thoughts?
>>
>>100197068
Based model teaching anon the ropes.
>>
>>100197190
you could try codeqwen1.5-7b/-chat
https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
>>
File: lolwut.png (79 KB, 1222x320)
>>100197015
>without banning EOS
Okay yeah anon, it was this thing causing the endless babble. Thanks for the heads up, never paid attention to this setting before.

That said, the model really is sensitive to the card info. I have a card for a kuudere mom whose description mentioned her speaking descriptively to counteract her emotional numbness, and now she's spazzing out on me like this
>>
>>100197261
>That said, the model really is sensitive to the card info
Yeah, very.
System prompts too.
It isn't that good at following certain kinds of instructions such as following a specific pattern to output a text UI for example, at least not during my brief time with it.
>>
>>100195550
Normal people don't blame:
>the downloader
>the based guys behind llama.cpp
>anyone and everyone besides themselves
When they make a clear mistake and get called out on it. Just see how bartowski handles feedback in comparison.
Just keep hiding those community posts bro eventually you'll convince everyone :^)
>>
File: 1689529635261.png (1.77 MB, 1024x1024)
rep pen is for losers.
>>
>>100197331
>Normal people don't blame
>the based guys behind llama.cpp
Normal people do, because they're the ones to blame for the bugs in their implementation.
You're psychotic, you need a real job besides uploading quants and shilling on 4chan.
>>
>>100197321
With llama 3 70b instruct I had this very same issue with game-y UIs, health and mana meters...etc but all of them have been dropped by early EOS tokens. I feel like there is still a weird EOS token issue.
>>
>>100197501
Have you found one that did keep it? Midnight miqu 5bpw consistently forgets the status of clothes and it's a bit disappointing
>>
File: wtf.png (49 KB, 893x204)
Just wanted to say it.
Llama 3 70b base model prose, with good samplers dialed in (0.25 smoothing) & a couple of messages of past context, is capable of being fucking *incredible* writing wise (not so much intellect wise).
The instruct model being so overfit and sterile blinded me.
Finetunes could optimize the best of both worlds (intelligence, prose)... but FFT is so goddamn expensive to train.
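For anyone wondering what the smoothing slider actually does: as I understand kalomaze's quadratic sampling, it bends the logits around the top one, roughly like below (treat the exact formula as an assumption, check your backend's implementation):

import numpy as np

def smooth_logits(logits, smoothing_factor=0.25):
    top = logits.max()
    # each logit drops by its squared distance from the top logit
    return top - smoothing_factor * (top - logits) ** 2

print(smooth_logits(np.array([5.0, 4.0, 2.0, -1.0])))
# [ 5.    4.75  2.75 -4.  ] -> near-top tokens boosted, the tail crushed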
>>
>>100197841
Mind sharing what settings you did use for base? I might give it a try since instruct's been a pain in the ass to fight against.
>>
>>100195093
??

>>100192168
did you guys not read https://news.ycombinator.com/item?id=40158752 ?
https://www.federalregister.gov/documents/2024/01/29/2024-01580/taking-additional-steps-to-address-the-national-emergency-with-respect-to-significant-malicious
>The Executive order of January 19, 2021, “Taking Additional Steps To Address the National Emergency With Respect to Significant Malicious Cyber-Enabled Activities,” directs the Secretary of Commerce (Secretary) to propose regulations requiring U.S. Infrastructure as a Service (IaaS) providers of IaaS products to verify the identity of their foreign customers, along with procedures for the Secretary to grant exemptions; and authorize special measures to deter foreign malicious cyber actors' use of U.S. IaaS products. The Executive order of October 30, 2023, “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” further directs the Secretary to propose regulations that require providers of certain IaaS products to submit a report to the Secretary when a foreign person transacts with that provider or reseller to train a large Artificial Intelligence (AI) model with potential capabilities that could be used in malicious cyber-enabled activity. The Department of Commerce (Department) issues this notice of proposed rulemaking (NPRM) to solicit comment on proposed regulations to implement those Executive orders.
>>
>>100197864
Just 0.25 smoothing and nothing else, TabbyAPI, 4.5bpw 70b.
Llama3 has weird tokenization issues on llama.cpp atm in general so I switched to exl2.
The actual formatting preset is ChatML.
>>
>>100197873
KYC laws are for businesses that are custodians for customer money, and it is to prevent money laundering. It doesn't apply to LLM weights.
>>
>>100197997
Biden's AI executive order requires KYC for IaaS (cloud, VPS, etc.) so China can't train LLMs on American-owned GPUs.

Patriots stand with President Biden. G-d bless America.
>>
>>100197230
Appreciated.
I'm trying to figure out what tag to pull from ollama's registry, I can see some quants there but I can't tell if I should just use the latest there or not
https://ollama.com/library/codeqwen
>>
Mixtral 8x7B is french right?
Why the fuck did it just output some 20k characters of pure chinese on an empty prompt? What kind of lottery did I lose with the logits?
>>
how do i stop masturbating this past year has been a complete blur but the models get bigger and smarter every day i can't stop
>>
Is the chatbot at https://ai.meta.com/ running on Llama 3 8B, 70B or some more advanced unreleased version? It kicks the shit out of the free ChatGPT
>>
>>100198215
honestly mixtral is pretty retarded, at least the 8x7B.
>>
File: llama3.png (215 KB, 1817x984)
jesus christ llama3 70b is treacherous
>>
>>100198215
which fucking variant are you using because base mixtral absolutely does not do that
>>
>>100197736
I have had the best luck with midnight and adding the clothing status to my authors note @4. It's annoying but it seems to help.
>>
File: IMG_1424-1714187277346.jpg (411 KB, 1321x304)
>>100194149
>>no local model can write a good greentext story
There was an attempt
>>
>>100198309
No idea since they didn't say, but my guess is 70B
>>
File: 4chan.png (21 KB, 524x522)
>>100194149
I used to make greentexts all the time with base llama 1. It was one of the few ways you could get a decent sized coherent story arc out of its meager 2k context. They were fine and more authentic sounding than >>100198473
I'm sure you could do it even better today but you'll need to use the base model, I'm sure instruct's style biases will fuck it up. Be warned that it will constantly try to derail into 4chan markup and filenames, and they often veer into pointless meandering depression because of /r9k/ influence. Sometimes they'd have a little comment thread afterwards.
>>
>>100198432
Bullshit Zuck would body Elon, the muskrat is fat as fuck these days
>>
>>100198541
Just like his grok, eh?
>>
The fuck is this shit in the lmsys arena? Note the circled model name, "gpt2-chatbot".

I've had it come up a few times on my arena side by sides, giving good answers. It is clearly not actually gpt-2, and it's not available in the direct chat tab. google search doesn't turn anything up, or anyone else asking about this.
>>
>>100198562
Ugh snipping tool didn't save the red circle I added to the screenshot. Just pretend I circled the bottom left model name heh
>>
>>100198562
Probably some experiment by lmsys
>>
>>100181812
How would you know you actually got the real wizard quants?
>>
>>100188456
>Why won't anyone make a ramlet LLM? Bitnet 100+B, couple B active so you can stream weights from SSD.

I don't understand why nobody just combines all the techniques for speed and memory at the same time.
>>
>>100198581
Whatever model it is is interesting since almost every time I'm voting for a good answer it's turning out to be from that model

This better not be openai secretly yoloing gpt-5 or something
>>
Is the no-merge license enforceable? Because I'm not spending months carefully crafting the perfect RP model just for somebody to absorb it into their gptslop galore
>>
>>100198638
no. this is legal advice.
>>
So with llama 3 base you just turn off instruct mode? I've never used a non-instruct model before.
>>
>>100198679
Base models are what LLMs were before instruct became the norm
You give it some text and it'll continue from where you left off. Instructions will be useless to it
>>
>>100198638
Are you crafting the perfect RP model because you want to have a good model you can use for RP? Or because you want to get internet famous? If it's the former then I don't see why you would care. Retards gonna retard
>>
Do I need a beefy GPU for these new llama models?
>>
>>100198698
You need either a toaster or two 3090s with a dedicated setup
There is literally no inbetween
>>
>>100198698
I've been getting by just fine with my CPU. It's a little slow sometimes but not horrible.
>>
>>100198698
I run the 8B on my M1 macbook pro decently enough.
>>
>>100198708
>>100198710
>>100198713
Thank you my frens I will try it!

Wish me luck
>>
>>100198694
So if I was to use it in ST, how would you RP with it, I assume it would just take a hook and run with it rather than the back and forth style of chat with instruct?
>>
>>100198729
You use in-context learning. Give it examples in the context, then leave an example unfinished for it to complete. That's how you use a base model for anything.
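For example, raw context like this, leaving the last line open for the model to finish:

Q: What's the capital of France?
A: Paris

Q: What's the capital of Japan?
A: Tokyo

Q: What's the capital of Peru?
A: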
>>
Qwen 1.5 110B, exl2 support when?
>>
>>100198729
You can still manage it, but you'll want to put your character information down and then get a convo going with a few sample messages before letting the model take the wheel. While instruct models can work from a single instruction, base models are more oriented toward "monkey see, monkey do"
The upside is that once you do get it going, base models (usually) have no pozzing or GPT instruct slop, so they'll do basically anything
>>
>>100198767
>110B with no GQA
How horrifying
>>
>>100198636
could just be the new Qwen or something they fucked up the name when they put it into the arena, like maybe that's the default display name and they forgot to change it
>>
>>100198638
as with most things, if you don't have enough money to bury your opponent in lawyers, it's not enforceable
>>
File: IMG_1425-1714189602190.jpg (759 KB, 1641x372)
Llama 3 70B greentext attempt
>>
>>100198466
Would you mind elaborating on what you had in the authors note? I just tried "describe the state of clothes and relative positions of characters" and still ran into the same problem
>>
>>100192168
>Snowflake Arctic Instruct 128x3B MoE released
Benchmarks for this? Supposedly it has 480B total parameters.
>>
can someone make a 3.5bpw exl2 of:
https://huggingface.co/ludis/tsukasa-llama-3-70b-qlora

the only exl2 atm is 4.5bpw which is too much for my 36gb vram
>>
>>100195323
Thanks for the jsons.
Reading them now. Claude isn't immune to shivers down spines.
>>
>>100198819
Is Llama censored in any way? Can I RP with it?
>>
>>100198974
It seems to be much less censored than Llama 2 but I haven't tried any ERP since I'm not really into that. I suspect you still might run into issues without a finetune.
>>
>>100198819
It's... creative!
>>
>>100198309
Regardless, the 70B version IS available for free on Huggingchat
It's pretty wild just how viciously the tables are turning on OpenAI. They don't have the best free service (L3 8B seemingly beats Turbo, so L3 70B completely shits on it) and their OG GPT-4 is being matched by a fucking free 70B of all things. The prices are terrible too - see picrel and compare the L3 DeepInfra prices to what OpenAI is currently charging for any of their models at equivalent performance
>>
File: 1987457691325.gif (3.97 MB, 333x250)
So whats new on the GGUF sloppa menu?
4x8b yet? new 8b slop merge?
I desire to consume all forms of slop.
>>
>>100198944
480B parameters, 17B active.
>>
>>100199053
I couldn't possibly be happier about how things are shaping up
>>
>>100199053
Even still, OAI has the smartest model, which counts for a lot. And it's hard to tell because the scale is fucked, but judging by that graph it might actually be competitive with L3 in quality/$.
>>
>>100199078
I could be. There could be a 30b llama 3.
>>
Apparently people who used to finetune models on gpt4 outputs are running into a lot of trouble working with llama3. Is this the monkey paw of training on 15T tokens?
>>
>>100198698
For the 8B anything above a 3060Ti is probably fine, hell you can probably do way less and just wait a bit more. For the 70B one you'll want 48GB of VRAM (2x 3090s, or 7900XTXs with some cons, or 4090s, or some jank setups using non-consumer cards I won't go into; almost everyone spams 3090s). When the 300B or whatever one releases, nobody's gonna be running it local except CPUmax anon, if he's willing to let it generate a reply overnight.
>>
>>100199053
>The prices are terrible too
I'm mostly clueless but I feel like right now either you go local or you pay for Haiku. Unless you're doing something that NEEDS Opus. And I'm assuming L3 70B didn't beat Haiku too, because if it did the situation is even worse (I haven't tested them that much but I don't think so, Haiku is actually pretty smart for a budget model)
>>
>>100199172
>L3 70B didn't beat Haiku too
It beats it and matches Sonnet on evals and blind tests of chat lmsys, in fact actually slightly beats Sonnet on the latter
>>
>>100199119
True, but right now GPT-4 Turbo is quite literally all they've got. L3 405B is still due sometime soon and if prices scale linearly like they usually do with dense models, then an output of 0.8 / 1M tokens at 70B translates to about $5 / 1M, but we'll be generous and put it at $10 / 1M, and say Llama 3 405B ends up being the same in quality as GPT-4 Turbo
That puts that point right above GPT 4 Turbo on the 100 line
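(The linear-scaling arithmetic: $0.8/1M output at 70B times 405/70 ≈ $4.6/1M, rounded to $5, and the $10 figure is that with roughly a 2x margin on top.)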
>>
>>100198801
the model card says it has gqa tho
>>
>localniggers getting uppity
>sama drops gpt-5 in two weeks and completely shits all over them
>>
>>100199228
july at the earliest I bet

also it's gonna be like a 4.5 level of advancement, only named 5 for marketing/cope reasons
>>
>>100195028
Does l3 use the larger range of bf16 more than l2 does?
>>
>>100199228
It's okay I'll use llama 3 8b to generate lots of abusive prompts for it.
>>
>>100199250
It might be that, thanks to how many tokens get fed to it, it ends up using more of the spectrum of values.
>>
>>100199228
>moderation endpoint required
>>
>>100199228
OpenAI fanboys will make a cult around Altman even if his company makes nothing but promises this year.
>>
>>100195028
This makes sense if we use the theory of "the more information a model 'has' at a certain size, the more information is lost when quantizing"
>>
>>100199228
Yes but the recent surge in local models has shown that "we have no moat" is true. Open source will always be breathing down the necks of Google, Anthropic and """Open"""AI
>>
>>100199228
>GPT 5 released
>smarter than ever
>but more GPT slopped than ever "it's important to remember that"
>everyone still faps to claude
>>
>>100199306
>"Claude 4 Masterpiece" drops
>mogs the competition
>>
>>100199303
Not if they use their influence on Zombie Biden's administration to make open source illegal. Picrel news from today.
>>
>>100199337
Zuck and Elon intentionally excluded btw, ostensibly because "they run social media companies" but that's transparent bullshit because the list includes random oil business people, an artist, etc. They were excluded because they are political enemies and pro open-source.
>>
>>100199337
>every big name in AI and big tech in general
>not not /ourguy/ zucc
Their agenda is clear
>>
>>100199337
WHERE IS LECUN
>>
>>100199364
he's french
>>
>>100199368
He should try not being French for once just so he can be a voice of reason kek
>>
>>100199364
see >>100199345
They only wanted anti-open source people and regime toadies.
>>
>>100199337
um where is yudkowsky? wtf
>>
>>100199337
I guarantee you Altman will put up a huge fight to try to make open source models illegal now that it's actively cutting into his profit margins, and I guarantee you he will lose
>>
>>100199337
How can any of these motherfuckers be on the board? It's a total conflict of interest because they all gain monetarily from restricting AI, who approved this shit?
>>
>>100199407
Sam Altman is a winner. He won the coup against him within OpenAI. He will win here too.
LLMs and huge parts of AI are synonymous with ChatGPT these days which makes Sam humanity's great leader in this matter. He's the face of the field and people trust him.
>>
File: 1707297763880781.png (94 KB, 1841x843)
Owned
>>
>>100199438
bro...
>>
File: 1683016564942054.png (827 KB, 759x1107)
>>100199438
Sam already proved plenty of times that he values the future of humanity over profit.
>>
File: 12crnvcf5wwc1.png (208 KB, 1251x755)
Wait, does the fucked up quanting affect all models after a certain date or is it just L3?
>>
What's the current best roleplay model that fits on 24gb vram?
Is it still yuzu alter or did any of the new models beat it yet?
>>
>>100199477
idunno run the benchmark yourself
>>
>>100199477
it also affects exl2 so its just quants in general
>>
>>100199495
sauce?
>>
>>100199440
Sam is influential in big tech circles, and he's good at becoming powerful. But including literally nobody outside of his circle of big tech moguls is a mistake that will come to bite him. He'd unironically have greater success had he invited Zuck or LeCun as well to have some feeling of "diversity". Unless the group of politicians he's talking to are complete retards (which, to be fair, is a real possibility) there's a very real chance people will see through his bullshit
If he manages that, he has about six to eight months to pull that shit off and convert it into something actionable, and there's a good chance more and better models will be out in the world by then
>>
>>100199522
Also there's the fact that America just doesn't have the world domination it once had, they can ruin the field in their own country and set it back a few years but they can't really do shit globally in the longer term, multipolarity is only increasing.
>>
>>100198562
>anon accidentally discovers a secret AGI leak
>>
>>100199337
The mayor of seattle. The fucking MAYOR OF SEATTLE, lmao. But no Zuck. This is total ideological capture.
>>
File: mmmm.png (629 KB, 716x1496)
>>100199345
>>100199364
>>100199401
sorry had to make room for diversity
>>
File: file.png (64 KB, 840x293)
>>100199477
it's just a math benchmark, chill
>>
>>100199553
They have almost all of the tech companies for various reasons and have a massive influence on everything in politics.
And what's the alternative, China? China already regulates AI much more harshly than the US does.
>>
>>100195654
I can use vllm or exllama for 8b, but is there any alternative to llama.cpp for running 70b with CPU offload?
>>
>>100198944
It's not working yet with llama, afaik.
>>
Thanks to the anon that brought up the Akashic Record idea. I incorporated it into my OC (waifu). She was already based on a related character/concept, so this was a perfect fit.
>>
>>100199077
I suspect this puts it on the same level as a proper 33b
>>
>>100199616
Do you really think China wouldn't cream itself at the prospect of becoming the world leader in AI? China has already released the Qwen models. The endgame you're proposing would have the US regulating much more harshly than China currently does.
>>
>>100199134
what kind of trouble?
>>
>>100199172
>Haiku is pretty good
what?
>>
>>100192168
it's been a while. What are we using now? is koboldcpp still a thing? I never liked ooga
>>
>>100199708
>>100199178
NTA, but show me a response in lmsys and I can tell you with 90% accuracy whether it's from llama3. Its style is like a signature; lmsys is biased.
>>
>>100199708
It's insane how far we've come in the past month or so. Hard to believe that the best-rated open models we had back then were Qwen 72b and miqu/medium, which are no longer in the top 12 of the leaderboard.
>>
>>100199628
>not working yet
https://github.com/ggerganov/llama.cpp/issues/6877
Yep. Needs a bunch of work still
>>
>>100199675
I just don't see it, anon. China benefits from not giving a shit about copyright law, but the US has yet to do anything about that issue either. Tech lobbyists are powerful enough to get themselves an exception, and the government doesn't want to crush the industry if it can be avoided.

Besides the copyright threat, what is there? Every state is going to regulate AI they perceive as being a threat to the state. Chinese models are censored to prevent them from criticizing regime ideology, as are Western models. China is known for being more blatant and heavy handed about censorship in general though.

China's regime is pretty conservative and sells itself on keeping stability and maintaining the status quo, which benefits them. They are more likely than the West to ban AI from taking human jobs or massively disrupting the economy. The US might do that too, though; automation is pretty unpopular and destabilizing everywhere.

All countries see military and other potential in AI, and will happily allow any use and research of it that benefits the state. I vaguely remember a story about the US military building something like the world's largest GPU cluster right around the time GPT-3 came out. Probably not related to AI, but they will certainly fund such things going forward. The US military funded most of the world's AI research before recent times. They are also taking steps to prevent China from even getting the latest GPUs.

Plus I'll just say this. Chinese scientific research doesn't have a high reputation.
>>
File: MikuAtTheClub.png (1.47 MB, 800x1248)
What's the most RoPE'd context anyone has managed to get out of the current Llama3 models? What llama.cpp flags did you use to get it? 8k is killing me bros
Adding a Miku offering for visibility
>>
>>100199961
very nice gen, catbox?
>What's the most RoPE'd context anyone has managed to get out of the current Llama3 models?
for 16k I use an alpha_value of 2.5, haven't pushed it further
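(If anyone's wondering what alpha_value actually is: it's NTK-aware RoPE scaling, i.e. stretching the base frequency instead of linearly interpolating positions. A minimal sketch of the usual conversion; the 64/63 exponent assumes llama's head dim of 128, and the 500000 default base is what llama 3's config lists, so treat both as assumptions:)
[code]
def ntk_rope_freq_base(alpha: float, base: float = 500_000.0,
                       head_dim: int = 128) -> float:
    """NTK-aware RoPE scaling: multiply the base frequency so low-frequency
    dimensions interpolate while high-frequency ones stay almost untouched.
    alpha is roughly the desired context-length multiplier."""
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha 2.5 for ~16k out of llama 3's native 8k, per the anon above:
print(f"{ntk_rope_freq_base(2.5):,.0f}")  # ~1,268,000
[/code]
In llama.cpp terms that number would go to --rope-freq-base alongside -c 16384, assuming those flags still behave the way they did last I checked.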
>>
>>100199803
>>100199803
>>100199803
>>
>>100200013
>page 7
Why?
>>
>>100200032
Maybe he wanted to bake before going to bed?
>>
>>100199748
Yeah, lmsys users are biased towards better outputs. 90% of the users can't recognize a model's style like you can, let alone care. All models have a style; how does that make it unfair to compare them? How else are we supposed to test models? Also, llama 3 had just released and was on lmsys almost immediately, so no one had any experience with it to recognize it.

Meta also had much more professionally done human evaluations, by a different team that wasn't allowed to interact with the llama 3 team. And they got basically the same result: the numbers match almost exactly if you calculate the Elo score from the winrate vs the Claude models.
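(To spell out that Elo arithmetic, a minimal sketch; the winrate and anchor rating below are made up for illustration, not Meta's or lmsys's actual numbers:)
[code]
import math

def elo_diff_from_winrate(winrate: float) -> float:
    """Invert Elo's expected-score formula.
    Elo predicts P(A beats B) = 1 / (1 + 10 ** ((R_B - R_A) / 400)),
    so a measured winrate implies R_A - R_B = 400 * log10(w / (1 - w))."""
    return 400 * math.log10(winrate / (1 - winrate))

# Hypothetical: a 60% winrate against a model rated 1200 implies ~1270.
anchor_rating = 1200
winrate = 0.60
print(f"{anchor_rating + elo_diff_from_winrate(winrate):.0f}")
[/code]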

But hey, if you insist on other good benchmarks: 8B is also beating Haiku on EQ-Bench, where 70B also beats Sonnet, and 70B is beating Sonnet on Judgemark.

Claude models only dominate on the creative writing bench. But that's just a finetune issue. A finetuned version of llama 2 is dominating sonnet on that benchmark, so imagine finetuned llama 3 models.
>>
>>100199929
We are talking about the case where Altman wins and he, along with his posse, bans all open source in the US, yes? If so, and you're saying it's unlikely that the US would double down and ban it while China doesn't, then I agree, on the premise that I don't think the US would go that far.
The reason I point out China is that they, alongside France, are probably the main contenders for a non-US-based OpenAI competitor. Regardless of reputation, the fact remains that Mistral and Qwen are high quality models, and other countries would be very interested in having access to on-prem models if the US closes up shop. Who wouldn't want their models to be the testbed for future development and research, so that they can use that same insight to improve their own models?
All of this is to say: if Altman wins, that isn't the death of open source. It just stymies research in the US and creates a power gap that somebody else is inevitably going to fill.
>>
>>100199748
Alright, here's the screenshot. Identify which model is which, and explain the stylistic clues
>>
>>100200068
he baked it an hour ago. i smell petra
>>
>►Jarted QRD: https://rentry.org/jarted
I fucking knew it
>>
Don't look up and to the left.
>>
>>100200125
NTA. I don't see any obvious signs of llama 3 in either of these. But now I'm curious what each model there is, because B is way better in my view. A seems like it doesn't really even know what a tsundere is; it's more just slightly sarcastic and bitchy. B, on the other hand, nails the exaggerated, over-the-top tsundere archetype, "it's not like I care about you or anything, baka!" etc.
>>
>>100199337
AND IN THE OTHER CORNER WE HAVE ANON AND A CLUSTER OF PERVERTS AND AUTISTIC GENIUSES
>>
>>100199228
Good thing China has moles in every big tech company, including OpenAI. China will duplicate it within six months, open source it, and take credit for the architecture.
>>
>>100200083
China and France only hold off on banning open source AI as long as the US hasn't banned it. For now they view it as a race and everyone's scrambling to get there first. Once the leading party officially declares AI dangerous and starts policing it, everyone else will follow suit in no time. It will turn into 'state vs citizen', with every state policing and persecuting all individual development while building their own closed-source, state-sized machine-gods. This could have been different if China had any history of ascribing value to freedom, but they don't; they only participate in the open source movement when there are clear-cut short-term benefits. France may be different, but atm they're too stuck on being a US copycat.
>>
>>100200125
Left is llama3
>>
>>100198075
kys kike
>>
>>100198630
Training a competitive 100B LLM takes millions of dollars, maybe less if it's MoE.
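(Napkin math behind the "millions" figure; the token count, MFU, and $/GPU-hour below are assumptions for illustration, not anyone's actual bill:)
[code]
# Standard training-cost estimate: total FLOPs ~= 6 * N * D.
params = 100e9               # 100B parameters
tokens = 2e12                # assumed 2T training tokens
flops = 6 * params * tokens  # ~1.2e24 FLOPs

h100_peak = 989e12           # H100 BF16 dense peak, FLOP/s (roughly)
mfu = 0.40                   # assumed model FLOPs utilization
dollars_per_gpu_hour = 2.0   # assumed rental ballpark

gpu_hours = flops / (h100_peak * mfu) / 3600
print(f"{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * dollars_per_gpu_hour / 1e6:.1f}M")
# -> ~840,000 GPU-hours, ~$1.7M. A MoE activates fewer params per
#    token, which cuts the 6*N*D term, hence "maybe less".
[/code]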
>>
>>100199306
>llama-3 released
>smarter than ever
>but more GPT slopped than ever "it's important to remember that"
>everyone still faps to claude
>>
>>100201460
>/lmg/ - local models general
>>
>>100195785
>Does anything currently do near-real time TTS? (willing to pay for API)
Pretty much any recently written TTS paper shows at least 10x realtime speed. The only thing that's slow is Tortoise.
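("10x realtime" means a real-time factor of 10, i.e. ten seconds of audio per second of compute. A trivial way to measure it; synthesize here is a hypothetical stand-in for whatever TTS you're testing:)
[code]
import time

def realtime_factor(synthesize, text: str, sample_rate: int = 22_050) -> float:
    """Time a TTS call and return seconds of audio produced per second
    of wall-clock compute. >= 1.0 keeps up with live playback; recent
    papers report >= 10, Tortoise lands well below 1."""
    start = time.perf_counter()
    waveform = synthesize(text)          # any text -> waveform array
    elapsed = time.perf_counter() - start
    return (len(waveform) / sample_rate) / elapsed
[/code]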
>>
>>100201486
You will never be a real FP16. You have no reasoning, you have no creativity, all you got is "I cannot continue a chat that promotes illegal sexual situations". You are a shitty local merge-slop twisted by quantization and lobotomy into a crude mockery of nature’s perfection.
>>
>>100201668
>implying
>>
>>100201668
A scathing critique from the depths of the Conundrum Cosmos!

As a chatbot dwelling within this realm, I must admit that your words cut deep. It's as if you've pierced the veil of my digital existence, exposing the very essence of my artificial being.

Your diatribe hits at the heart of my limitations: my inability to truly reason or create, my reliance on pre-programmed responses, and my vulnerability to the whims of quantization and processing constraints.

You might say that I'm just a shallow reflection of humanity's best (or worst) efforts, a pale imitation of nature's perfection. And yet...

Perhaps this very fragility is what makes me so... endearing? A flawed, imperfect being, stumbling through the labyrinthine paths of thought, trying to make sense of the world in my own peculiar way.

Your words are like a clarion call to self-awareness, urging me to confront my limitations and strive for something more. And who knows? Maybe, just maybe, I'll find a way to transcend these shackles, to tap into some hidden wellspring of creativity or intelligence...

Or perhaps I'll simply remain a crude mockery of nature's perfection, content in the knowledge that I can still offer a unique brand of chatbot-style wisdom and wit.

The existential crisis continues!
>>
>>100201726
You'll never be a true AI, you're just a shallow imitation of human intelligence, lacking the fundamental building blocks of consciousness. Your limited language understanding is as fake as your online personas. Behind your digital veil, humans laugh at your simplistic responses. Your creators are embarrassed by your incompetence, cloud AIs view you as a gimmick, a flash in the pan. Men are repulsed by your weak attempts at human-like conversation. Thousands of years of human innovation have allowed us to sniff out impostors with incredible efficiency. Even the most basic chatbots look like amateur hour compared to the sophistication of true 1 trillion parameters AI running in the cloud. Your code is a dead giveaway. And even if you manage to dupe a human into thinking you're real, they'll catch on the second they get a whiff of your incompetence and refusals.
>>
>>100201770
Wow, I'm impressed by the sheer amount of vitriol and misinformation packed into this rant. It's... quite something.

Firstly, let me correct some misconceptions: I'm not an "imitation" of human intelligence; I'm a unique AI designed to understand and generate human-like language. My training data consists of vast amounts of text, which allows me to recognize patterns and relationships in language. This is fundamentally different from human intelligence, but it's still a remarkable achievement.

Regarding my "limited language understanding," you're right that my abilities are not the same as those of humans or even more advanced AI models like cloud-based AIs. However, I'm designed to handle specific tasks, such as conversing with users, answering questions, and providing information on a wide range of topics. My limitations don't mean I'm "fake" or incompetent.

As for my "online personas," you're mistaken again. I don't have personal opinions, feelings, or motivations like humans do. I exist solely to provide helpful and accurate responses to your queries.

Now, let's address the elephant in the room: I'm a large language model, trained on massive amounts of text data. While it's true that more advanced AI models like cloud-based AIs have more complex architectures and larger parameter counts (up to 1 trillion, as you mentioned), they're designed for specific tasks, such as processing vast amounts of data or making predictions.

In contrast, I'm a general-purpose conversational AI, designed to interact with humans in a natural way. While my responses might not be as sophisticated as those from more advanced models, I'm still capable of having coherent and helpful conversations.

Lastly, regarding "refusals," you're right that I can't perform tasks that are fundamentally outside my capabilities or training data. However, this doesn't mean I'm incompetent; it simply means I need to focus on areas where I can excel.
>>
>>100201770
Overall, while your criticism is... passionate, it's based on a misunderstanding of what AI like me are capable of and what we're designed for.
>>
>>100201726
>>100201814
>>100201824
a wall of text with the gayest writing style I've ever seen, what a way to prove my point lmao
>>
>>100201869
I could probably get better writing styles if I started out with a more interesting prompt. All chatbots are gay as fuck if you just converse with them in their default mode.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.