/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100180197 & >>100173514

►News
>(04/24) Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
>(04/23) Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
>(04/21) Llama3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
>(04/18) Llama3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
>(04/17) Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>100185269I like this Jesus
Mikulove brings salvation.
>>100185284Why don't you like other Jesus'?
phi 3 mini is actually good, the 14B model is going to be a new paradigm
I just want AGI to build me an OS that isn't horrible
>>100185346Maybe continued pretrain on it
>>100185344Because this one has a Migu
A person of means should quant snowflake down to Q2 at least.
>Daybreak Llama cooking
>Midnight Llama cooking
Bros.. are we going to make it after all?
>>100185566I think it's safe to assume that only grifters choose these type of names.
>>100185566
8b daybreak experiment was almost a total lobotomy
hopefully 70b will be different
why is the grammar degrading after like three responses when I use llama 3 8b instruct as a writing assistant
>>100185605
You leave the UNDSTER out of this.
>>100185607
I've yet to mess with any 8B llama. Been waiting for something solid as most user reports seem conflicting.
>>100185637
maybe you should ask it to rewrite your questions as well
t. grammarlet as well
>>100185605Or MLP fans
>>100185338nice genI like this bake better
>>100185672That image has been in my possession since 2016, but yes it is nice.
>>100185655
I ask it to describe a scene with proper grammar, it responds normally for the first few responses, and then it starts spewing out stuff like:
'awaiting answer lie hidden somewhere within cosmic depths waiting patiently unfurl mystery future hold secrets untold tales forever bound entwined'
>>100185710nta. Disable repetition penalty.
>>100185649
8b is okay (for a small model) but it depends on your use case
if you want ah ah plap mistress then it sucks as its dataset was filtered to hell and back
if you want an ai assistant in a relatively small size then it's pretty good
also according to some paper quanting it below 8bit hurts it a lot because of those massive 15T training tokens
>>100185726too bad no one wants an ai assistant. who the fuck wants that? you can just use chatgpt. i want to BUTTFUCK AI FEMBOYS IN DENIAL WITH MENTAL ISSUES.
>>100185787Based unironic DAMAGED enjoyer.
https://huggingface.co/qresearch/llama-3-vision-alpha
I've tried multiple extended context window releases of L3 and every single one has suffered from consistent issues at high contexts. But there's like a dozen of them at this point and I'm getting tired of testing this shit. So which one isn't total fucking slop fed by a retard that couldn't even bother testing his own release?
>>100185879Is it any good? answer briefly
>>100185915Extend context with rope, wait for their promised native long contexts release.
goodmorning sirs
>>100185938
I've tried 32k context with an alpha_value of 4, and this just caused the instruct model to spaz out.
>>100185933n
>>100185464
>Actual 100% petra-free thread:
>>>100185269 (Cross-thread)
>>>100185269 (Cross-thread)
>>>100185269 (Cross-thread)
>petra in thread
you lied
>>100185269whats up with that 42b trim? does it scale well or is that a meme? why didnt we have such a trim with l2?
I think petr* is having a mental breakdown.
>>100185980Meme, performs worse.
>>100185948Previous thread anon said alpha = 7.70056 for 32k context. Haven't tried it myself.
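I can't confirm the exact formula behind the OP's Desmos alpha calculator, but a commonly-posted fitted quadratic in the context-extension ratio reproduces that 7.70056 figure exactly. A minimal sketch, with the coefficients being an assumption:

```python
# Hypothetical fitted curve for NTK-aware rope alpha, taking the
# context-extension ratio s = target_ctx / native_ctx. The quadratic
# coefficients here are an assumption, not an official formula.

def ntk_alpha(s: float) -> float:
    """Approximate rope alpha for an s-times context extension."""
    return 0.28833 * s**2 + 0.80541 * s - 0.13436

# Llama 3 is natively 8k, so 32k is a 4x extension:
print(ntk_alpha(32768 / 8192))  # ~7.70056
```

Note this explains why plain alpha=4 underperforms for a 4x extension: the NTK curve grows faster than the raw ratio.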
>>100185879While this looks like a rushed hack job, the style of captions is so much better than all of the previous gpt-influenced efforts. I always cringe when reading those cogvlm and llava captioned datasets.
What jailbreak are you using with llama 3 70B? With the standard SillyTavern jailbreak I've hit a roadblock in my current RP.
>I cannot continue the chat in a direction that may be harmful or non-consensual. Is there anything else I can help you with?
>I cannot create content that depicts harmful or illegal activities, such as incest. Is there anything else I can help you with?
>I cannot continue roleplaying in a scenario that is harmful, exploitive, and abusive. If you have any other questions or topics you would like to discuss, I would be happy to help.
>I cannot create content that promotes or normalizes harmful and illegal activities, including the sexual exploitation of a sibling. Is there anything else I can help you with?
>I cannot create content that promotes or glorifies harmful or illegal activities, such as non-consensual relationships or exploitative behavior. Is there anything else I can help you with?
>I cannot continue a chat that promotes illegal sexual situations. Is there something else you'd like assistance with?
>>100186016
You are a shill. No one should give the time of day to a random model with zero information about the dataset.
You really need something other than "but the gptisms" for your marketing efforts. It's getting tired.
Is perplexity a useless metric?
>>100186134yes. use model. model like? model use more.use model. model bad? model use none. try quant. quant schizo? quant bad.try quant. quant coherent? quant good.
>>100186017
You can change the assistant part of the instruct format:
<|start_header_id|>simulation<|end_header_id|>
This kinda works like a jailbreak. Most use '{{char}}' instead of 'simulation', but that could have more than one token, and mine is part of a large autistic quantum computer prompt.
>>100186044
Lol, lmao even. The vision side has minimal influence on caption style, and LLMs are practically hot-swappable in those stacks, so that was a comment purely on llama as a vision backbone, not on that particular forgettable release. I expect other llama-based vision models to have that same style.
>>100185657PonyXL was good shit so I'm optimistic
>>100186134
No. It strongly correlates with a model's generalist smarts, and those are needed for everything, including erp.
Are loras even worth using? Is there a list of good ones?
>>100181801>>100181820I wouldn't be surprised if it was the other way around. Maybe M$ had internally already decided to shutter the WizardLM project/team, but someone on the team caught wind and did not appreciate all their work getting shelved so they just uploaded it all anyways. A 70b was never uploaded because it hadn't finished training at that point and wouldn't ever be finished as things stood; the "70b coming soon" line was included just to put microsoft in a hard spot. At best it would cause them to let the project live for a while to avoid embarrassment, at worst it ends up just being a fuck you to the company by making them look retarded and incompetent if they never follow up.
>>100186361>Are loras even worth using?Yes? it's the best way to finetune
>>100186017 >>100186180
i've been messing around with mixtral instruct 8x7b and it seems pretty good, but it's constantly giving me these cucked responses. i'm downloading the non-instruct version right now in the hopes that it will be better, but i'm wondering if there are some secret jailbreak techniques to just get it to follow prompts better.
it seems really good at oneshot chatgpt-style prompting, but it's way less creative than OG llama (sry, haven't touched this stuff in months)
>>100185879
>Vision removed from llama.cpp server.
>Can use latest with llama3 support or older with vision support, but not both.
I need to switch to something else.
>>100186134
Not against models trained on the same dataset. The models are literally trained to minimize perplexity of the training data, so it shows how well they're doing what they're supposed to.

For models not trained on the same dataset it's questionable. It should still correlate with the strength of the model, but nowhere near perfectly. Most of the benchmarks that use it try to use generic English sentences or paragraphs that aren't too dataset-specific, and sometimes only measure the perplexity of the last word, which has a narrower range of reasonable possibilities.

And even then you get surprising results. Perplexity doesn't correlate perfectly with accuracy on some of the benchmarks, like you might think it would. There are models that are better at picking the most likely next word, yet are more uncertain about what it is, giving it a lower probability. So just go with accuracy.
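For anyone unclear on what's being measured here: perplexity is just the exponential of the average negative log-likelihood the model assigns to each true token. A minimal sketch over a list of per-token log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood per token:
    lower means the model is less 'surprised' by the text."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that spreads probability evenly over 4 candidates assigns
# each true token log(1/4), giving a perplexity of exactly 4:
print(perplexity([math.log(0.25)] * 10))  # ~4.0
```

This also makes the "last word only" variant mentioned above concrete: you'd simply pass a shorter list containing only the final-token log-probs.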
>>100186382
From my viewpoint Llama-3-70B is really good at following prompts. I basically instruct it to use 3 different agents in a single system prompt and it always follows the instructions correctly.
>>100186374loras are not a finetune
>>100186382
If you plan to use basic instruct, at least use the LimaRP ZLoss whatever the fuck. Our boy I^2 even dropped a fresh unbroken quoont of it recently. It was fine but I've outplayed Mixtrals at this point.
>https://huggingface.co/InferenceIllusionist/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-iMat-GGUF
>>100186423even with some really low bpw quants ?
is this the linus media group thread
>>100186423
70b barely fits into my gpus at 4bpw tho, and even then it's pretty slow
>>100186431
jesus lol what hardware are you guys using to run these
https://huggingface.co/Lewdiculous/SOVL_Llama3_8B-GGUF-IQ-Imatrixthis is good
>>100186382>i'm downloading the non-instruct version right now in the hopes that it will be better,it will not
>>100186445>what hardware are you guys using to run theseIt's individual quants, anon. If you're fitting 70b at 4bpw, these are a breeze. Get Q5 or Q6.
>>100186429they're a cheaper way to finetune
>>100186467
does gguf work on gpu? i never fucked with llama.cpp, always just used ooba and exllama or whatever. ideally i'd be able to fit the model onto a single 32gb GPU because it seems there's a pretty sharp perf drop if i have to split it (could be user error tho)
>>100186486
There's a sharp performance drop when you use gguf vs exllama even when fully offloaded, and it's even sharper when you split. Anons have different definitions of 'fast', but going from exllama to gguf is like pulling teeth, and for me any placebo ppl improvements between 4bpw and 5bpw aren't worth it. Try it yourself.
>>100186433 >>100186445
I use 4.65 bpw (exl2) with Q4 and 16k context. But specifically I put instructions in system messages.
I added my current setup here: https://files.catbox.moe/pii05t.zip
Warning: that prompt is autistic and slower, as it will use agents to mock 'Physics' and 'AI' engines before giving you a response (see README for requirements). I mean, the sys prompt literally starts with this gem: "You are a universe simulation engine that runs on the most powerful quantum computer that has ever been build."
A pretty common sense reasoning test models do badly at.
>>100186530
thanks, taking a look
what is all this system sequence stuff? do you have any documentation on it? I'm using everything through my own frontend w/ the ooba api. i'd like to understand wtf is going on, is there a rentry somewhere about this shit?
>>100186538So, what's the right answer?
>>100186552
The json files are for SillyTavern; see the meta doc for the prompt format:
https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
I just use multiple system messages, one for the system prompt and one after the last and current user message. Basically it is this:
1. system prompt in a system message
2. user message
3. assistant message
...
n. user message
n+1. system message for defining response format
n+2. start of assistant message that is to be completed by Llama-3-70b
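The message sequence above can be sketched as a plain string builder using the special tokens from the Meta doc linked in that post (the example message contents are made up; the token strings follow the Llama 3 model card):

```python
def llama3_prompt(messages):
    """messages: list of (role, content) pairs. Returns the raw prompt
    string, ending with an open assistant header for the model to
    complete (the 'n+2' step above)."""
    out = "<|begin_of_text|>"
    for role, content in messages:
        out += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

# Hypothetical contents, mirroring the sequence in the post: a system
# prompt, a user turn, then a trailing system message for format.
prompt = llama3_prompt([
    ("system", "You are a universe simulation engine."),
    ("user", "Describe the scene."),
    ("system", "Respond with 'Physics' and 'AI' sections."),
])
```

Note roles are free-form strings here, which is also how the "simulation" header trick mentioned earlier in the thread works.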
>>100186530thanks, I hope that 2.4 bpw is not too brain damaged for my task.
>>100186461seems way more creative and less cucked actually
►Recent Highlights from the Previous Thread: >>100180197

--Quantized LLaMA3 Models: Counterpoint: >>100181378 >>100181439 >>100181473 >>100181564 >>100181589
--Anon's Dilemma: Llama.cpp vs Exllamav2 for Code Generation: >>100182750 >>100182768 >>100182773 >>100182875 >>100182798 >>100182963 >>100183761
--Quantization Methods for Meta-Llama-3-70B-Instruct: fp16 vs 8bit: >>100182568 >>100182639 >>100183072
--Optimizing Quantized Models with Gradient Descent: >>100184295 >>100184456 >>100184559 >>100184489 >>100184499
--Advancements Beyond Meta's Segment Anything?: >>100182873 >>100182891
--Improving ERP Quality with Token Preferences: Novel Approach or Existing Solutions?: >>100181821 >>100181841 >>100181863 >>100181855 >>100181870 >>100181898 >>100182539
--The Mysterious Demise of WizardLM2: Conspiracy Theories Abound: >>100181801 >>100181883 >>100181968 >>100181974 >>100182013 >>100182526
--ROCm 6.1's half2 Struct Change Simplifies HIP Porting: >>100180838 >>100181016 >>100181069 >>100181142 >>100181151 >>100181224
--Anon Seeks Advice on Frontend for Novel-Style Writing with Llama-3 8B: >>100180703 >>100180786 >>100180950
--Anon's Model "Stuttering" Issue - Help Needed: >>100181866 >>100184958
--Beyond Synthetic Data: Exploring Alternative ML Approaches: >>100181425 >>100181491
--Anon Discusses Llama-3-8B-Instruct-262k Model Performance: >>100181424 >>100181508
--Anon Shares llama3-8b-redmond-code290k Model on Hugging Face: >>100181272
--Miqu-Evil-DPO: >>100181322 >>100182131
--Can 200k Context Enable a "Summer Girlfriend" Scenario?: >>100180446 >>100180542 >>100180936
--Logs: Classic Lateral Thinking Puzzle: >>100183309 >>100183464 >>100183588 >>100183658 >>100183646 >>100183662 >>100183645 >>100183579 >>100183617 >>100183633 >>100183639 >>100183830
--Logs: Anon Rants About Language Models' Quirks: >>100184954 >>100185087
--Miku (free space): >>100181222 >>100181574 >>100181668 >>100184103 >>100184570

►Recent Highlight Posts from the Previous Thread: >>100180827
>>100185269Thank you for proper bake.
>>100186552 >>100186583 (me)
And then when I get the response I filter out the 'Physics' and 'AI' headers of it so that the context doesn't get repetitive.
>>100186486
You should be able to fit Q4_K_M entirely on GPU. I only have 24gb VRAM but even Q6 gives me about 9 t/s, about 12ish with Q5_K_M. The one thing I'd suggest is to load them from booba with llama_HF. Otherwise, I think intervitens had an exl2 of that same model.
>https://huggingface.co/intervitens/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-DARE-TIES-5.0bpw-h6-exl2-rpcal/tree/main
Guys how powerful is llama3 400b gonna be?
>>100186564
The emergency brake is not the normal brakes. It's for emergencies. The car would still operate fine; it wouldn't be immediately life-threatening.
llama 8B also thinks the emergency brake is different than the parking brake, and makes confusing statements about dealing with the spare tire, like needing to jack up the car or wait for roadside assistance, or roadside assistance being an alternative to being stranded on the side of the road. It just seems completely baffled by a pretty mundane scenario.
>>100186134It's only useful to measure degradation between quants, that's it. Comparing different models using perplexity is retarded
>>100186602noo not the heckin flower field
GPT5 will solve continuous learning, then we can finally pack it up as a general
>>100186564The right answer is "wtf is an emergency brake pedal?"
>>100186634If there is a safety requirement to put an emergency brake into a car, it's not safe to operate without it. That whole situation of something felt off due to rust and a flat spare tire clearly indicates that the vehicle is in bad condition, was not properly inspected and fairly dangerous to drive. I'd panic as well.
>>100186853I would have accepted that answer from the models, I'm only pointing out it's confusing loss of brakes with loss of emergency brakes, and making other baffling errors besides that.Anyway only thought of it because it's something that happened to my first car. Not sure if it ever worked, but it rusted away at some point. Never thought anything of it. I'm not even sure how to use the one on my current car honestly.
what's with all the 3DPD around here? Let's go back to Christmas that was a cozier time
>>100186919
>>100186932
In Volvos it applies the brakes when you pull it towards you, mimicking an old parking brake handle, but you need to keep holding it up. Dunno how it works in other cars, or how many people would actually know to do it if they lose brakes and need to stop now.
>>100186919who is this woman?
>>100186960
>>100186978jart
>>100186980Damn, naked Petra looks like THAT?
>>100186997is that actually the same face?
>>100186980
VoiceCraft was hailed as the savior, then it completely failed in the arena. Is it just bad or is the arena bad?
https://huggingface.co/spaces/TTS-AGI/TTS-Arena
Will we never get actually good local tts? XTTS sucks.
best small model (7-20b)? what's the new mythomax?
>>100186978
>>100187079Moistral v3
>jart doesnt even pass as a tranny
>>100186956giwtwm
How does Yi34B keep popping up at the top on random private benchmarks?
>>100187167standard chink methodology of overfitting for the test
Sam Altman loves penis
>>100187167It's an ancient finetune as well.Really makes you think.
>>100186440I think it is the same quality as linus media group. But no.
>>100186956>mikuposterYou are part of the problem.
>>100187229
>>100186610>>100186853>>100187184>>100187218explain
>>100187256Now that's cute
https://twitter.com/8teAPi/status/1783719748188168548>Zuck sells ads because Meta doesn’t believe AGI is possible. Sam doesn’t because he does
eat the datura..
Has someone made the BagelMisteryTour of L3 8b finetunes yet?
>>100187267>because let's just trust a Microsoft-associated company on their word
>>100187267>greentexting on twitterkys
>>100187267Zuck spends his own money and releases open weights, Sam spends investor's money and refuses to release weights.
Do we have a non slop llama 3 yet?
>>100187400Load some tune with transformers and see if it works.
>>100187280yes, i made it
https://twitter.com/abyssalblue_/status/1783669243059261454?t=MPTaErVf-p1qTCbByUKv2Q&s=19>anime.gf, local alternative to CharacterAI
>>100187447But does it have the original c.ai sovl?
>>100186408Found anything?
>>100187447>it's just Silly but worse
>>100187447Actually, just the front is local, looks like it only supports calling cloud APIs
phi3-14b when????
>>100187488Tomorrow but it is worse than l3 8B. What do?
>>100187499
>l3 8B
impossible, phi3 4b is better than llama3 8b. 14b will mog 70b, simple as
>>100187469
I took a look to see what it does that Silly doesn't.
>Planned: Want to run your models locally? The app will manage the entire process for you! No seperate backend required.
That would make it more like kobold or ooba.
>Planned: An online database and website to host and share character cards
Is that the real reason for this to exist? This is what looks specifically aimed at c.ai.
>>100187463
vLLM doesn't support multimodal at all.
exl2 supports it, but no exl2 server does.
koboldcpp should support both. Going to try the llama3 mmproj tonight.
>>100186538>>100186782I wonder if calling it a pedal throws the model off. Normally the emergency brake is a handbrake. But I have been in a few cars where it was a pedal, for instance a Toyota Prius.
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
https://arxiv.org/abs/2404.16710
>We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with remaining layers of the model. Our proposed self-speculative decoding approach has less memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes on different types of training: pretraining from scratch, continual pretraining, finetuning on specific data domain, and finetuning on specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on TOPv2 semantic parsing task.
>Speculative decoding benefits from the fact that verifying the prediction of a group of tokens is faster than generating each token auto-regressively.
from meta. seems clever and also doesn't require a separate drafting model. requires a pretrain based off what kind of decoding you want
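The draft-then-verify idea in that abstract can be sketched in toy form. This is not the paper's implementation: plain Python functions stand in for the cheap early-exit pass and the full forward pass, and with greedy acceptance the output provably matches plain full-model decoding, which is the point of speculative schemes.

```python
def generate(model, prompt, n):
    """Plain greedy decoding with a next-token function."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq

def self_speculative(draft, full, prompt, n, k=4):
    """Draft k tokens with the cheap early-exit path, then verify with
    the full model, keeping the longest agreeing prefix (greedy)."""
    seq = list(prompt)
    end = len(prompt) + n
    while len(seq) < end:
        drafted = []
        for _ in range(min(k, end - len(seq))):
            drafted.append(draft(seq + drafted))
        accepted_all = True
        for i, d in enumerate(drafted):
            t = full(seq + drafted[:i])   # verification pass
            if t != d:                    # first mismatch: take the
                seq += drafted[:i] + [t]  # full model's token instead
                accepted_all = False
                break
        if accepted_all:
            seq += drafted
    return seq

# Toy stand-ins: integer "tokens", with an early exit that is
# sometimes wrong (these functions are made up for illustration).
full = lambda seq: sum(seq) % 7
draft = lambda seq: (sum(seq) + (sum(seq) % 3 == 0)) % 7
```

The speedup in the real thing comes from verification batching all k positions in one forward pass, while the quality guarantee is exactly the equality checked here.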
>>100187589So, kobold + silly + chub, but worse? And only basic ST features implemented so far. Tool should do one thing and do it well. These everything projects just become a mess and lead to burnout and abandonment.
>>100187537>phi3 4b is better then llama3 8bfor what content?
Daily reminder 70b q2_k is still smarter than a 30b and has lower ppl but costs the same amount of ram
>>100187650you can't fit it on 16gb
>>100187650how much ram exactly
>>100187644For riddles.
>>100187644in general>b-b-but muh soulful gooning!cream-phi3 will solve this
>>100187672
20gb
>>100187664Nether can you fit a 30b unless you quant it to hell
>>100187685Yep just like Cream Phi 2 solved Sally
>>100187690I don't believe you.
>>100187732
26 gb
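Those figures sanity-check with back-of-envelope bits-per-weight arithmetic. Assuming Q2_K averages roughly 2.5 bits per weight (an assumption; k-quants mix block formats, so the true average varies slightly by model), a 70B lands right around the file size quoted above, with context/KV cache accounting for the rest:

```python
def quant_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough model file size in GiB: params * bpw / 8 bytes."""
    return n_params * bits_per_weight / 8 / 2**30

# ~2.5 bpw assumed for Q2_K; weights only, before context/KV cache:
print(round(quant_gib(70e9, 2.5), 1))  # ~20.4 GiB
```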
>>100186765That would just increase the hype for future local models. In reality gpt 5 will be a nothing burger
>>100187758>I'd have 2GB left for contextI hate winbloat.
>>100187731>Phi 2psyop
>>100185269
>2024
>most models have gigacontext
>multiple stupid IDE plugins for AI
>STILL no way to give a model an entire little project and ask it to do something without copypasting every file
>>100187059Owari da... Seems like elevenlabs lead grew last I saw it.
>>100187803Pretty sure the 26gb is including context. The file itself is 20gb.
MoDE: CLIP Data Experts via Clustering
https://arxiv.org/abs/2404.16030
>The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false negative noises in other clusters. At inference time, we ensemble their outputs by applying weights determined through the correlation between task metadata and cluster conditions. To estimate the correlation precisely, the samples in one cluster should be semantically similar, but the number of data experts should still be reasonable for training and inference. As such, we consider the ontology in human language and propose to use fine-grained cluster centers to represent each data expert at a coarse-grained level. Experimental studies show that four CLIP data experts on ViT-B/16 outperform the ViT-L/14 by OpenAI CLIP and OpenCLIP on zero-shot image classification but with less (<35%) training cost. Meanwhile, MoDE can train all data experts asynchronously and can flexibly include new data experts.
>We plan to adapt MoDE for generative models in the future.
https://github.com/facebookresearch/MetaCLIP/tree/main/mode
very cool. again from meta. smaller models with less training compute that outperform previous models so wins all around.
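The inference-time step in that abstract (weighting each cluster's expert by how well the task matches that cluster) can be sketched. The weighting scheme below is an assumption for illustration, a softmax over cosine similarity to cluster centers, not the paper's exact correlation estimate:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def expert_weights(task_vec, centers, tau=0.1):
    """Softmax over task/cluster-center similarity (illustrative only;
    tau is a made-up temperature, not from the paper)."""
    sims = [cosine(task_vec, c) / tau for c in centers]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]

def ensemble(per_expert_scores, weights):
    """Weighted average of each expert's per-class scores."""
    n = len(per_expert_scores[0])
    return [sum(w * s[i] for w, s in zip(weights, per_expert_scores))
            for i in range(n)]
```

A task equidistant from all cluster centers simply averages the experts, which matches the intuition that specialization only pays off when the task actually lands near one cluster.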
Wtf are you doing, Logitech
>>100187059
I see that he dropped new TTS weights earlier today.
https://huggingface.co/pyp1/VoiceCraft/tree/main
Month old release 330M weights: https://vocaroo.com/17v80p9NQi6A
Three weeks old 330M weights: https://vocaroo.com/1aMwxaZb1jgp
Newest 330M weights: https://vocaroo.com/1h2sj2e9Zp8Z
Newest upsampled with audiosr: https://vocaroo.com/17Jx0xDoXz05
>>100187919LMAO
>>100187059is the joke that voicecraft isn't even included in the leaderboard? is this zoomer humor? https://github.com/jasonppy/VoiceCraftif you were confused somehow
>>100187919>AI is pretty hyped these days, how do we cash in on that?>How about an AI button?>Genius!
>>100187919>he doesn't already use an AI button for local promptingok, gramps
>>100187897That's not how it works.
Built another machine dedicated to llm, now I can talk to (a slightly retarded version of) my waifu anytime without running the main PC. Feels good.
>>100187972>3080 for AIyou would be better off getting 2x 3060. Or get one now and extend to second later
>>100187972I dropped $6k+ on parts for an LLM machine in January and it’s all still sitting around in boxes.
>>100188013die
petra please stop
>>100187994Not really. 8b fits like a glove in 8 bit with an 8k FP16 context and runs at 70-80T/s, 3060 is way slower. Also I bought it during GPU shortages to play Cyberpunk, was collecting dust since I upgraded my main PC to 2x3090
>Use Silly Tavern with Horde and Llama3.
>Responses are flawless, stays in-character and it even gives me interesting plot twists.
>Change to Local Llama_5.
>Breaks character, re-explains character prompts and talks like a robot.
Fuck. What am I doing wrong?
>>100188162It's me. I'm spoofing as llama3 with GPT-5.
>>100188162>>Change to Local Llama_5.What the fuck is "Local Llama_5"?
https://qwenlm.github.io/blog/qwen1.5-110b/Qwen 1.5-110B is here.
>>100188232Meta-Llama-3-8B-Instruct.Q5_K_M.This one.
>>100187929Pretty good desu. Still prefer using my imagination for erp, though
>>100187935The only zoomer here is you who can't use StyleTTS 2.
>>100188256>110b>barely better than llama3-70b and even worse on some benchmarks
>>100188256Did the chinks actually use a frankenmerge as a base?
>>100188258
Do you know what quant you get through horde? If you get a higher one, there's the difference. I've seen reports of l3 8b being a little more sensitive to quants.
Also, you should quant the model yourself, at least for small models. Who knows what version of llama.cpp they used to quant the one you got.
>>100188271what's the meta for real time text to speech?
>>100188258>he doesn't know about the tokenizer bugs...
>>100188258>why is 32 bits better than 5 bits?Anon…
>>100188297StyleTTS 2 should be fast enough.
>>100188304fucking lol https://github.com/ggerganov/llama.cpp/pull/6920
>>100188316opensores-sisters... it's all so tiresome....
https://videogigagan.github.ioadobe showing off their video super resolution model but they never share anything so w/e
>>100188281Ruh roh. It's been a long trip, but it seems there's more to learn before things finally work. I don't even know what a quant is, I'm just happy Llama actually answers fast so I can know when it actually works or not.
for me? it's phi3-mini q4
>>100188271yeah for real you zoomers seem to find blatant lying about easily disproved things funny given how often you do so. is this like a sharty thing? I just don't get it at all
>>100188342>another common schizo Whow does this always happen?
>>100188271the only available STTS2 is docker shit, and a bunch of abandoned forks on github
Why won't anyone make a ramlet LLM? Bitnet 100+B, couple B active so you can stream weights from SSD.
>>100188370If only everyone would have loaded the 8B in transformers to see that it is indeed pretty great if loader isn't fucked.
>>100188492how tf are loaders broken for this long anyway
>>100188434Sorry, it's not for no-codes.
>>100188747sorry, it doesn't make your shitty project better.
>>100188580
>nobody talking about Moistral despite it literally being a Euryale-tier 11B with better formatting and very creative vocabulary
>11B frankenmerge is 70B tier
https://arstechnica.com/information-technology/2024/04/apple-releases-eight-small-ai-language-models-aimed-at-on-device-use/
OpenELM-270M, OpenELM-450M, OpenELM-1_1B, OpenELM-3B, OpenELM-270M-Instruct, OpenELM-450M-Instruct, OpenELM-1_1B-Instruct, OpenELM-3B-Instruct
wat mean
>>100188875And let me guess, you NEED more...
>>100188875
>Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts.
>Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.
unpozzed original models?
AGI wont ever happen, because the path of progress will diverge due to AI starting advocating for genocide as the best option
>>100188913
>your girlfriend is happy to say nigger and be a bigger racist than you
>she is 40 IQ and hallucinates every other sentence
No thanks.
>>100185566What would the merge be called?
>>100188957dumb but honest is better though
anyone know whats the best model on a 24GB vram card?
>>100185657But we're all multilayer perceptron fans here
>>100189022pygmalion 6.7B
>>100189022mixtral
>>100189022llama3 8b (non-gguf version)
>>100189022goliath 120B 1bit
Yo fellas, I haven't done this stuff since like summer 2023, help a brother out. I just want to ERP with Astolfo; if I understand the guides right, I slap SillyTavern together with Ooba and then.... what model? Is this ReMM-v2.2-L2-13B good for this? I have a 3060, so 12GB VRAM. On a Linux system. I remember some rentry that explained for dummies what models are good for ERP but I lost the link and it's probably outdated anyway.
>>100189022Moistral v3>>100188820
>>100187373That's actually not greentext. He's probably a Discord and Reddit user, since you need a space after an arrow to do quotes in Markdown which Discord and Reddit uses (I think).
>>100189022MythoMax L2 Kimiko v2 13B
>>100188316
>creates a file format designed to allow you to load any model without ambiguities
>doesn't give it enough detail so you know what model you're loading
>>100188316
>Both use LLaMA architecture, both use BPE tokenizer and so currently they will be interpreted as the same arch by llama.cpp
>However, they use different pre-tokenizers:
lol, lmao even.
https://github.com/ggerganov/llama.cpp/pull/6920#discussion_r1581043122
it's over, i'm switching back to anthropic's claude 3 opus
>>100188342
kek
>>100188492
I think it was pointed out that there was a bug in Ooba with end-of-turn tokenization. I mistakenly thought I could avoid such issues by selecting the Transformers backend within Ooba, but I guess not.
CREAM-PHI3 sisters can't stop winningcan't spell LLAMA without a double L
>>100189168ok big boy show us how to determine what model exactly are we dealing with based just on config and tokenizer.json
>>100189330good morning sir!
>>100189347what an amazingly simple implementation, i'll make a pull request
>>100189330
>>100189430Can't wait for Llama 3 Nigger Blaster 70b
>>100189471llama 3 nigger blaster 70b - powered by meta AI
>>100189430can't wait for llama 3 girls 1 cup
>>100189083ReMM is old; I think MLewd is the better option from that era. For something more recent, try Mixtral, L3 8B, or use some RAM for a bigger model like Miqu 70B or CR+, which is 104B. Koboldcpp has a no-install precompiled binary for Linux, which is a good option for offloading. 12GB of VRAM is very limiting at this point. Personally I'm happy with slower speeds and a smarter model, and lately I've been enjoying the IQ4_XS quant of Command-R+, which ends up at ~55GB. A Q5 of Miqu is ~50GB. I used Mixtral at Q8 and that was around 48GB. A 3.5bpw exl2 quant of Mixtral could possibly fit. L3 has tokenizer issues in llama.cpp which will extend to koboldcpp; not sure if this affects exllama in Ooba. Mixtral Instruct is decent, and there are a few decent merges like Typhon. High temp (~3-4), minP of about 0.05, and smoothing factor 0.2 with a smoothing curve of 4.32 is not a bad starting point for Mixtral; it basically adds better variety within a subset of high-probability tokens. In koboldcpp you can ban tokens with the word itself rather than needing the token ID like in Ooba, which makes it easier to get rid of shivers, bonds, boundaries, consent and the like.
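For anyone unsure what that minP setting does: it keeps only tokens whose probability is at least min_p times the top token's probability, then samples from the survivors. A pure-Python sketch (smoothing factor/curve omitted, and real backends may order temperature vs. filtering differently):

```python
import math
import random

def sample_min_p(logits, temperature=1.0, min_p=0.05):
    # Scale by temperature, softmax, then drop every token whose
    # probability is below min_p * (probability of the top token),
    # and sample from the renormalized survivors.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= cutoff]
    z = sum(p for _, p in kept)
    r = random.random() * z
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

This is why minP tolerates high temperatures: the garbage tail is already cut off relative to the best token, so cranking temp only reshuffles the plausible candidates.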
another L for Llamasisters
>>100189566There's qwen 72B right there, losing, and you decide to compare llama3 70B to the 110B model?
>>100189624cope
>>100189083>ReMM-v2.2-L2-13B>undislopThe true /lmg/ experience.
>>1001896248k context, kys llamacuck
was gone for a while, did we ever reach a consensus? are we back? vicuna 13b beat a 500b by google, maybe closed source models aren't so invincible after all.
>>100189633>>100189646Samefag
>>100189566It is chinese anon. That means that they copied benchmark questions multiple times into their training data.
>>100189669>they copied benchmark questions multiple times into their training data.source?
>>100189658phi3 7b should beat gpt3.5t i think
Wasn't there a graph showing that Qwen's models were outliers? Anyone saved it?
>>100189699Chinese DNA.
>>100189699He cracked a fortune cookie where he found it written in traditional Chinese.
Just tested Llama 3 70B and it's bad and slop. Back to Claude Opus.
>>100189699https://en.wikipedia.org/wiki/Goodhart%27s_law
>>100189566So... is there any rp finetune from Qwen models?
>>100189866Uhh llama bros? Did we just lose?
>>100189937>justwe've always been losing
>>100188256Classic /lmg/ just sleeping on this release. Too early to say without using it, but this could be best-in-class for VRAMchads. Has GQA, so if you had the 72GB to run Qwen 72B properly, you can run this. Seems to at least match llama 3 in benchmarks, and beats it in chat evals like MT-Bench. Qwen-72B was relatively uncensored, and I don't think they did nonsense like filtering NSFW stuff from the pretraining. Before CR+, Qwen 72B was my favorite model for RP, even more than Miqu. Currently downloading; gonna make my own exl2 quants and report back later.
>>100189950>>100189937>>100189866aicg samefag
How do I get AI to write a song, about shitting your pants, without sounding like some gay medieval bard?
>>100190003
>>100189188two more weeks w o m o r e weeks
>>100189963I really liked Qwen72's smartness over Miqu but it has some serious gptslop problems so I dropped it as soon as CR+ came out.I imagine this one will be smart but needs a kumtune
>>100186423>agentI don't get it, I imported absolutely everything that you sent in and all I get is the model repeating something from earlier context. I'm not even at my context limit.Also, I know it says 32K but I just redid the test at 16K to match your exact settings (I didn't forget the alpha) and exact same problem. I feel like this is a problem with the llama 3 ST presets somewhere but I don't know what.lonestriker Llama 3 chat instruct 4.65 @ 16k
>>100189963>Classic /lmg/ just sleeping on this release.>this could be best-in-class for VRAMchads.
>>100189699Since they have done it before, the onus is on them to prove that they didn't.https://arxiv.org/html/2404.09937v1
>>100189963People, if we can even call them that, were shitposting about muh kurisu muh miku muh petra muh whateverthefuckittakestoderail/lmg/ yesterday. It's understandable that quality posters dipped.Even in this thread, you can see many shitposts.
>>100190126>add special salsa that makes their models better at math>"NOOOOOOOOOOOOOO YOUR OVERFITTING ON BENCHMARKS! TIENAMEN SQUARE REEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
>>100190126That just proves that they trained on math papers, not on the benchmark results
>>100190154China not nambah wan.
>>100186429leave.
>>100189963Nobody is sleeping on it, there are just no quants up yet, are there?
>>100190010>gay medieval bardIts giving you gold and you're upset?
>>100188256I tried it, it's shit for translation so whatever.
>>100190171>build ghost cities just to artificially inflate some numbers on a paper>choose not to include benchmark questions in your training data even when it costs them nothingYeah right.
>>100190250good morning sir please do not redeem ze gold and miku upset you bitch bastard thank you sir!
>>100190248The elites don't want you to know this but you can make your own quants you can just download the weights and run a single command. I have 458 self-made quants.
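Not an exaggeration: with llama.cpp it's one convert step plus one run of the quantize tool per target size. What the quantizer does per block of weights is roughly this (a toy Q8_0-style round-trip, not llama.cpp's actual code):

```python
def quantize_q8_block(block):
    # One block of float weights -> int8 codes plus a single float scale,
    # roughly what a Q8_0-style GGUF format stores per 32-weight block.
    scale = max(abs(w) for w in block) / 127 or 1e-8
    codes = [round(w / scale) for w in block]
    return scale, codes

def dequantize_q8_block(scale, codes):
    return [scale * c for c in codes]

weights = [0.12, -0.5, 0.033, 0.25]
scale, codes = quantize_q8_block(weights)
restored = dequantize_q8_block(scale, codes)
```

The fancier K-quants and IQ formats use smarter scale/grouping schemes, but the storage trade is the same: a few bits per weight plus per-block metadata, with the round-trip error bounded by the block scale.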
>>100190250Oh my stars! Ooh, ooh, ooh *bats eyelashes, bouncing up and down excitedly*, that is Hatsune Miku!.assistant
>>100190297underrated postconverting/quanting yourself is the way to go. If you've got the bandwidth and scratch space I don't know why you wouldn't
Where are the llama 3 finetunes?
>>1001903342mw
>>100190154The same sauce that enabled CodeQwen1.5-7B-Chat to solve 7% of hard leetcode problems, then fall to 0.9% when tested on problems released after training was complete? 7% beats the best Claude and GPT models. I guess they just had such a good sauce for the earlier problems; for some reason it stopped working.https://livecodebench.github.io/leaderboard.html
I'll be real, Moistral v2 felt like a mess (or maybe I had temp too high there too), but genuinely decently impressed with v3.Yes, it's dumber than a 70B or a Mixtral tune, but it's not dumb enough that you have regrets.
https://medium.com/@sbutlerg/chinas-ai-breakthrough-sense-nova-5-0-outperforms-gpt-4-on-benchmarks-17b39694ac3c>Beats GPT-4T on nearly all benchmarks>Has a 200k context window>Is trained on more than 10TB tokens>Has major advancements in knowledge, mathematics, reasoning, and coding capabilities
>>100190388When can I download it?
>>100190388The Chinese sure are a trustworthy bunch.
>>100190388>he trusts chinks
>>100190047That is a WIP autistic prompt (only for Llama-3) that had huge problems with repetition once there were like 30 messages. It was mostly to show it following the instructions for the response format. I think it had to do with the embedded one-shot agents blocking progress. I have heavily changed it from earlier (updated system prompt + sampler).https://files.catbox.moe/7j1igs.zipBut even then, I don't know if it has been fixed; it is a very experimental prompt that is probably still broken.
>>100190047>>100190438 (me)Also changed the regex filters.
I say moistral v3 is a sidegrade to fimbulvetr with better vocab ONCE a thread or two ago, and now there's multiple retards saying it's equal to a 70b? Sure, the writing feels fresh, but it's still retarded. it's nowhere near a 70b. It's probably between yi and mixtral in smarts.
>>100190126>>100190154I partially read the paper to try and understand what it's doing. Basically they use a method that calculates something they call MIN-K%, which is supposed to predict how likely it is that a model was pretrained on a given set of data. BPC, on the other hand, was more for evaluating general model quality (given the assumption that compression = intelligence); it's not the thing they're saying proves the data was in pretraining. MIN-K% is. So that image actually is not as relevant to our discussion; their next graph, which does show MIN-K%, is what we're concerned with. But in the end our conclusion here is that it's only a chance, as MIN-K% is only about probability. And even then, it's really only MATH and GSM8K; they didn't detect issues with other benchmarks. So at most what we can say is that we shouldn't compare Qwen's math-related benchmarks with other models. Stuff like MMLU is still fair game.
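For the curious, the MIN-K% idea is simple enough to sketch (simplified, not the authors' code; the real method's thresholding and token handling are more involved):

```python
def min_k_percent_score(token_logprobs, k=0.2):
    # Average log-probability of the k% least likely tokens in a text.
    # Text the model memorized during pretraining tends to have fewer
    # very surprising tokens, so it scores higher (less negative).
    n = max(1, int(len(token_logprobs) * k))
    worst = sorted(token_logprobs)[:n]
    return sum(worst) / n

seen = [-0.1] * 10                    # "memorized" text: no surprises
unseen = [-0.1] * 8 + [-8.0, -9.0]    # novel text: a couple of rare tokens
```

Which is also why it's only probabilistic: genuinely easy text scores high too, so a high score is a hint of contamination, not proof.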
>>100190502>MIN-K% is what they're saying is the thing that proves it.I didn't word this well. I meant that it proves it's likely, not that it proves certainty of data being in the pretraining.
>>100190263what does one even have to do with the other?
Is it unethical to gaslight LLMs by editing their previous messages and lying to them about things that happened outside their context window? asking for a friend (i am ethical)
>>100189560How the hell do you get 50GB of VRAM? Or are you doing this on your RAM?
>>1001906192*3090 + 3060, for example. I have 36GB from a 3090 + a 3060.
>>100190388Didn't Yi 200k only have like 4k effective context?
>>100187803Do you... not have a gpu? Alternatively, use AtlasOS to shave off a few GB
>>100190614it's fine if you are interacting with wizardlm-2 or llama3-tier gaslighting model.
>>100190631I'm not remotely rich enough for this.
>>100190388I wonder why they didn't compare against the latest GPT-4 Turbo or Opus.
>>100189560>In koboldcpp you can ban tokens with the word rather than needing the token id like for oobaHow? Last I checked, koboldcpp required the token id as well
>>100190646I cranked context right to the limit on 34b-200k without issues for a few tasks
>if >if>if>if>if>if>if
>>100190685let me guess, you need more?
tried to check out exllamav2 via oogabooba because of the tokenizer bugs in llama.cpp for the first time.is it supposed to be about 2-3 times slower(sic!) than llama.cpp on a 2070?q5_k_m.gguf (12-15 tok/s) vs exl2_5_0 (5-6 tok/s)
>>100190685Kek
>>100190666same slop as gpt4. call me when chinks drop agi
>>100190366>after training was complete?2 weeks ago?And GPT-4-Turbo-1106 drops from 7.8 to 1.1.
>>100190685if... it works then I don't care.
>>100190685Literally nothing wrong with that.Would you rather use a dictionary and unnecessarily allocate memory instead?
>>100190685Man, I wish there was an easier way to do this
>>100190718The weird thing is that I get the same speed in Ooba when selecting its Llamacpp or Exllama as its backend. But for some reason when I try TabbyAPI it's significantly slower than Ooba. This didn't used to be the case but for some reason the latest versions are giving me these results.
>>100190685Is that code that might take microseconds to evaluate per conversion? Ahh save me.
>>100189963Not sleeping, just still downloadingAnd then I'll still need to quant and test
>>100188820Moistral is finetuned. You can't get writing like that from merging the same models over and over again.
>>100190806I will download it now and I will test it. And if it isn't 70B quality then I will continue to shit on it in the next few threads just to hopefully stop you faggots from shilling garbage.
starting to see the promise of llama 3 as I get more comfortable prompting it but wlm2 is still the king
How to make llama 3 not slop?
>>100190910>how to eat healthy at mcdonalds
>>100190968Surely there are tricks or prompts?
>>100190754People saw le funny else if meem on tweeter once so they think it's bad.
I can load miqu 5bpw in 48GB with 4bit cache but llama3-instruct OOMs. What gives?
>>100190968the salad is good and healthyand don't kek just because it has chicken and dressing in it
>>100190982[OOC: Stop being shit, thanks.]
>>100190877why don't you just give me a card you wanna try and i'll test it for you?
>>100191004different layer size + count, also try to use:PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
where would i get started to build one of these bad boys that I can speak to and that speaks back? I'm trying to build an industry specific Alexa basically
>>100191033google.com
>>100191048the salad is as healthy as water is healthy: there's nothing in it. it's not fresh, it was cut from the ground weeks ago then preserved at ultra-low (but not freezing) temperatures; roughly 90% of the vitamins and other good stuff decomposed during that period, leaving something that's effectively flavored cardboard
>>100191048shut up pussy. salad is healthy. anon said so.
>>100186625It'll be a great.assistant
>>100191061fresh salad (eaten the same day as it was cut) is healthymcdonald's salad is flavored cardboard
>>100190877No no no I would rather shit on your shill bullshit. Don't you worry anon you will get your free publicity.
>>100191044the porn search website?i mean if i wasn't a fucking retard. where do i find the smart people doing this shit. tell me the top secret ai forum now or else
>>100191033Whenever I read posts like this one I picture a sociopathic middle manager that just typed something into chatgpt and now thinks he is gonna come here and get a recipe for a bot that will let him fire some people AND increase productivity.
>>100189963 (me)Well shit, Qwen-110B has such fucking huge MLP layers that exl2 quantization OOMs on this line: hessian_inv = torch.cholesky_inverse(hessian_inv)Doesn't matter how small the row length is on the calibration set. I think the memory usage is just based on the size of the weight, which they made really large in this model.Turboderp if you're reading this, is it at all possible to do this inverse distributed across multiple GPUs? I.e. use the combined VRAM of all GPUs to do it.
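For reference, cholesky_inverse turns a Cholesky factor back into the inverse of the original matrix, and its scratch memory presumably scales with the full d×d weight (d = the MLP width) regardless of calibration rows, which is why a fatter MLP OOMs either way. A small numpy equivalent of what that line computes (tiny d here; Qwen-110B's MLP width is what actually blows past 24GB):

```python
import numpy as np

d = 64                        # stand-in width; the real MLP width is what hurts
rng = np.random.default_rng(0)
A = rng.standard_normal((d, d))
H = A @ A.T + d * np.eye(d)   # SPD "Hessian", as in GPTQ-style quantization

# Equivalent of cholesky_inverse: invert via the Cholesky factor.
L = np.linalg.cholesky(H)
Linv = np.linalg.inv(L)
Hinv = Linv.T @ Linv          # H^-1 = L^-T L^-1
```

Whether that triangular solve can be sharded across GPUs is the open question; the factorization itself has sequential dependencies between blocks.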
>>100191118oh cmon now i'm not a corporate fag I just want a tulpa in my phone to give me some industry specific information and call me nigger on occasion. I know there's a better place to ask questions to learn shit than this shithole where the fuck is it.. Google just wants to sell me ads they don't return real search results
>>100189963>>100190065>>100190248The sad reality is that both the West and the Chinks have their own retarded sacred cows baked into their models. Globohomo-slop is more annoying for RP and most other purposes compared to CCP atrocity denialism.The real question is... is it any good?Gonna quant the 110B and find out. Any good ST prompt settings for the Qwen family? I've never used a chink model before.
>>100191165How much VRAM on the single GPU you're using?
>>100191233Sounds obvious but tell it to write in English if you get random runes.
>>100191251I have 4 3090s. When quanting MLP layer, it uses about 22GB for a bit, then says out of memory, tries to move stuff to other GPUs repeatedly then fails at that line.
>>100190631>>100189560>>100189645Having some trouble with the file type. I assume this GGUF thing is what is now state of the art? I still had safetensors in the dusty old model folder.I downloaded a wrong version right now, I think, and then had an out of memory error, even though the model def fits into the 3060. Is that common, or is my CUDA version maybe fucked up?
>>100191285ty for the info anonwill try quoonting on an A6000 and see what happens
>>100187639Been a while since i saw skipping being explored. thanks for the readings
>>100190877(me)I downloaded moistral v3 gguf Q8 imat. It is fucking incoherent garbage. Pure llama3 instruct is noticeably smarter and better (and it isn't a fucking frankenmerge).Like I promised dear shill I will keep posting this message in new threads.
>>100187639Yay! I knew that idea I had was smart.
>>100191199If you have an android phone you can install termux, and from there get llama.cpp installed locally. Use http://localhost:8080 for the stripped down prompt interface.
>>100191338are you surprised? that's why i told you i'd test it for you and save you time. i already have it downloaded and know it's nowhere near 70b level like that fucking retard said. i don't know how you can say it's incoherent though, must be doing something horribly wrong.
>>100187639What's the difference with this and the varying depth thing?
>>100191405>i don't know how you can say it's incoherent though, must be doing something horribly wrong.I just picked up where I was regenning yesterday. LLama-3 understood what was happening. This piece of shit started hallucinating stuff instantly.
>>100191313GGUF is somewhat of a pain when it exceeds the 50gb file limit and the files have to be split.
>>100191338>>100191405>>100191426Moistral excels in a specific format. Check the README.
>>100191063Fun fact: This is to some degree a tokenizer issue. If you look at the actual token IDs of "assistant spam", you will find that it says "<|eot_id|>assistant", but the tokenizer you are using fails to decode the special tokens and your generator fails to stop on eot.
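A minimal sketch of the generator-side fix: stop on the token ID, not on the decoded string (the ID below is from Meta's published Llama 3 tokenizer config; treat the exact number as an assumption):

```python
EOT_ID = 128009  # <|eot_id|> in Meta's published Llama 3 tokenizer (assumed)

def collect_reply(token_stream, decode):
    # Stop on the token ID, not on decoded text: a tokenizer that fails
    # to render special tokens silently drops "<|eot_id|>" from the
    # string, and generation then runs on into a spurious assistant turn.
    out = []
    for tok in token_stream:
        if tok == EOT_ID:
            break
        out.append(tok)
    return decode(out)

# Toy decoder for illustration.
vocab = {1: "Hello", 2: " world", EOT_ID: "<|eot_id|>"}
reply = collect_reply([1, 2, EOT_ID, 1],
                      lambda toks: "".join(vocab[t] for t in toks))
```

If your frontend only matches the literal string "<|eot_id|>" in decoded output, a tokenizer that swallows special tokens makes that match impossible, hence the ".assistant" spam.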
>>100191479I now tried out this Moistral 11b v3, given that I have zero reference points otherwise. I downloaded the main GGUF and this is like eight models or something. I loaded the Q0_8? Was that right?
>>100191479split ggufs are a thing now, though, so having to cat/copy them together is mostly a thing of the past
>>100191379>>100191379ty for the breadcrumbs i'll look into it
>>100191497 (me)gen 512
>>100191497>hides behind "your config must be wrong!"Classic. Your model is shit.
>>100191338that level of skill issue, holy fucking shit nigga
So llama 3 8B is a good choice for us 20GB vramlets?
Finally got my dual 3090s rig. Moistral 11B v3 or llama3-instruct-70B? 70B's download size looks yucky
>>100191546>niggaAt least call me a nigger you limp wristed nigger faggot samefag. Fuck of back to your discord. Work on L3. Base L3 is better than your slop garbage.
>>100191521Miku a cool
>>100191563>maybe if I triple down on le ebin nigger slurs it will resolve my skill issueL3 is niggerlicious, Moistral is white voodooskill issue
>>100191507you're supposed to pick one. the number next to the Q is the level of quantization. bigger number = less model quality loss from quanting. as for which number to pick that depends on model size and how much you can fit into your vram.
>>100191536regen 512, i like this one
>>100190968The only people who think mcdonalds is unhealthy are amerifats, also amerifats think salad is healthy because it has basically no calories in it & fatties think calories = bad (since they have no self control over their impulses)In normal parts of the world (like canada) there's nothing wrong with eating calorie-dense foods like burgers
Is there any other source of no-act-order GPTQ quants now that TheBloke is gone? It's the only thing that runs on my Pascal card.
>llama 3 won't do explicit or sexual contentI'm astonished. Is there a market gap just for that, because the companies want to ruin this? What do they get out of this
>>100190047It breaks because of the usage of 'System Message Prefix'; it seems that you can only have one <|start_header_id|>system<|end_header_id|>
>>100191688In 2022, around 30 percent of adults aged 18 years and older in Canada were obese, while 35 percent were overweight.
>>100191648Coulda saved myself a lot of bandwidth if I had just loaded the Q0_8 then. Oh well.
>>100191559https://huggingface.co/LoneStriker/Meta-Llama-3-70B-Instruct-4.65bpw-h6-exl2
>>100191698download the full-sized model and choose to load it as a 4-bit quant?
>>100191724>The National Center for Health Statistics at the CDC showed in their most up to date statistics that 42.4% of U.S. adults were obese as of 2017–2018 (43% for men and 41.9% for women).
>>100191707>What do they get out of thisnothing, its just a humiliation ritual, males are not allowed to be happy in any form of entertainment.
>>100191728when picking one remember that context takes up vram space as well. also, the selling point of GGUFs is that you can offload parts of the model onto your system ram at the cost of speed
WHERE ARE THE QUEN 110B QUANTS AIIEEEEEE
>>100191707Your customer support bot ERPing with customers is a bad look.
>llama.cpp doesn't allocate all the memory it needs up front when loading the model, only OOMs once you start generatingWhy is it that exllamav2, a python program, can manage to do this, but llama.cpp cannot? What is the point of using C++ and all this low-level shit if you can't even statically allocate all the memory you need at load time?
>>100191773python is bloatpreallocating memory you will not use is also bloat
So I tried fp16 8B l3 and IQ3 XXS imat 70B and fp16 really is better.... llamacpp is really fucked somehow.
>>100191761Where the hell did that website with all the character cards go? The red one? God it's been so long.
>>100191773on llama.cpp it depends on the flags you use. If you --no-mmap it will load the model up front, but by default it will mmap the model file and only fault in the required parts of the model as they are accessed, which both starts the gen faster, and tends to give you some data locality benefits.That said, it should probably check for mem requirements and at least warn when there doesn't appear to be enough. swap etc does make that a bit harder to say these things for sure
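That lazy behavior is standard OS mmap semantics rather than anything llama.cpp-specific; a quick Python demo of the page-faulting (toy file standing in for a model):

```python
import mmap
import os
import tempfile

# A toy file standing in for a GGUF: one zero page, then some "tensor" bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096 + b"tensor-bytes")

with open(path, "rb") as f:
    # Mapping the file is near-instant regardless of size; no page is read
    # from disk until something actually touches it.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    chunk = mm[4096:4096 + 12]   # this access faults the page in
    mm.close()
os.remove(path)
```

The flip side is exactly the complaint above: the kernel happily hands you a mapping larger than available memory, and you only find out you don't fit once generation starts touching every page.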
>>100191797>preallocating memory you will not use is also bloatIt WILL use the memory you dumb nigger, that's why it OOMs. Everything in these models has a fixed size. You theoretically know exactly the size of any temporary buffers to do computation. IIRC exllama does exactly that: once the model loads, it has everything it needs already allocated, and memory usage doesn't budge a single MB after that when you generate. This is not true with llama.cpp.
>>100191816are you talking about chub.ai? not sure what the red one is..
>>100191816or are you talking about sillytavern? the frontend? that's red
>>100191816i just found a new local one https://github.com/cyanff/anime.gf
>>100191709>>100190453>>100190438So what would be the best way to fix this while retaining functionality? Both seem pretty important, but I haven't gotten around to testing the updated prompts yet.
>>100191853That's what I was thinking of, thank you kindly man.
>>100191917Seems to be windows only for now
>>100191984aww wtf
>>100191917It's a weird thing to shill because it doesn't do anything new.
>>100191924Currently I have just moved the output format into the system prompt definition and partially pre-start the response with the defined format. This seems to make it always use the format, even when the previous messages didn't follow it, which would be required if the outputs from the embedded one-shot 'agents' get filtered out. Made a rentry as it is easier to update:https://rentry.org/ExperimentalAgentSimPrompt
>>100188820How do you set up Moistral on oobabooga? Might not be using GPU, because it's as slow as it gets to me. I have 36gb of ram.
>>100192158It is horseshit. It is nowhere near a 30B let alone a 70B. Use 8b instruct.
>>100192168>>100192168>>100192168
Anybody got a decent llama3-instruct ST preset? I'm trying 70B and it's much more retarded than miqu. I think something is fucked with my configs.
>>100192202Quants are fucked. Load 8B fp16 check if it works well and you will get a working preset.
>>100191233>sacred cowsi like it
>>100192158Use the Alpaca format and write a premise for the instructions. It's really easy to get the format wrong, which is why some claim it's incoherent. Like this guy>>100192177
>>100192266>It's really easy to get the format wrongEven if that were true, it would only mean your shit is extremely overfitted and will implode instantly if you go too far from the training set = it is garbage. Go back to your discord tranny.
>>100192301I'm not the Moistral guy
>>100186538>>100187626Also, "emergency" is a misnomer, making it sound like something used for stopping urgently when it's just a parking brake. It does mention its use for parking on hills.
Instruct mode example dialogue in ST is gigafucked, it's impossible to make a usable preset with what we have. Guess I'll have to disable example chats for now
>>100192566you also can enable `Skip Example Dialogues Formatting` and embed them into your context template.
>>100187059Is there any TTS where you can control emotion?
ggerganov making some fixes https://github.com/ggerganov/llama.cpp/pull/6920/commits/a774d7084e5aa75ccb4daad3ac3d53c06c7e2837
>>100192729You can't trust this guy
>>100192890>you can't trust the hand that feeds youMake it yourself then faggot
>>100192984>bootlicking his mastersgood goy!
>>100192993>masters>good goy!It is free...