/lmg/ - A general dedicated to the discussion and development of local models
►Previous Thread >>96077130 & >>96062736
►News
>(09/12) ExllamaV2 and EXL2 format released
>(09/10) https://sites.google.com/view/medusa-llm
>(09/06) Falcon 180B released
>(09/04) llama.cpp: CPU only LoRA finetuning https://rentry.org/cpu-lora
>(09/03) llama.cpp: Speculative token sampling
>(08/25) llama.cpp: ROCm support
>(08/24) Meta AI released Code Llama (7,13,34B with 16k up to 100k context)
>(07/18) Llama 2 released
►Model Rankings
HF: https://hf.co/spaces/HuggingFaceH4/open_llm_leaderboard
CODE: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
PLAP: https://rentry.org/ayumi_erp_rating
►FAQ
>Main FAQ
https://rentry.org/er2qd
►General LLM Guides & Resources
>Newb Guides
https://rentry.org/llama_v2_sillytavern - https://rentry.org/local_LLM_guide
>aicg Newb Guides
https://rentry.org/meta_golocal_list
>Llama 2 Jailbreaking Guide
https://rentry.org/llama2-uncensored
>LlaMA Guide
https://rentry.org/TESFT-LLaMa
>Machine Learning Roadmap
https://rentry.org/machine-learning-roadmap
>Novice's LLM Training Guide
https://rentry.org/llm-training
>Local Models Papers
https://rentry.org/LocalModelsPapers
>Quantization Guide
https://rentry.org/easyquantguide
>lmg General Resources
https://rentry.org/lmg-resources
>ROCm AMD Guide
https://rentry.org/eq3hg
►Model DL Links, & Guides
>Model Links & DL
https://rentry.org/lmg_models
>lmg Related Links
https://rentry.org/LocalModelsLinks
►Text Gen. UI
>Text Gen. WebUI
https://github.com/oobabooga/text-generation-webui
>KoboldCPP
https://github.com/LostRuins/koboldcpp
>KoboldAI
https://github.com/0cc4m/KoboldAI
>SimpleLlama
https://github.com/NO-ob/simpleLlama
►ERP/RP/Story Gen.
>ERP/RP Data Collection
https://rentry.org/qib8f
>LLaMA RP Proxy
https://rentry.org/better-llama-roleplay
►Other Resources
>Miku! (desu) (boku)
https://rentry.org/lmg-resources#all-things-miku
>Benchmark Prompts
https://pastebin.com/LmRhwUCA
If I am getting CUDA home environment variable errors when trying to load exl2, the best thing is to instead keep booba in its own conda env, right? Separate from SD, RVC, and tranny voice changer?
>>96087231the whole point of conda is that everything has its own environment, what the fuck is wrong with you?
>>96087250I thought AI stuff could be together, given they all share pytorch, CUDA, etc. Was this a mistake?! Oh God NO
>>96087280Did you really think every requirements file is the same thing...
>>96087304in the field of local models why wouldn't it be? Fags should all be on unified versions like pytorch 2.0, CUDA 11.8
>>96087280It was a terrible mistake. The exact mistake that virtual environments are supposed to prevent.
What I do though is have a python install whose system site-packages contains the torch/torchvision/torchaudio that all the other environments are going to use anyway, so at least I can save a bit of disk space by not duplicating that.
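A minimal sketch of that shared-base setup, assuming Python's stdlib `venv` (the anon below uses pyenv and virtualenv instead, which this does not reproduce). `system_site_packages=True` is the same switch as `python -m venv --system-site-packages`; the function name and directory names are illustrative only.

```python
import pathlib
import tempfile
import venv


def make_env_sharing_base_packages(env_dir: str) -> pathlib.Path:
    """Create a venv that can also import the base interpreter's site-packages.

    A big shared install (e.g. torch/torchvision/torchaudio) living in the base
    interpreter stays importable inside the env, while anything pip-installed
    into the env itself comes earlier on sys.path and takes precedence.
    """
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(env_dir)
    return pathlib.Path(env_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = make_env_sharing_base_packages(str(pathlib.Path(tmp) / "ooba-env"))
        print(cfg.read_text())
```

This only saves space when every env is happy with the same torch/CUDA build, which is exactly what differing requirements files don't guarantee.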
>>96087324so python torch/torchvision/torchaudio is installed in the (based) conda environment? Or do you mean installed natively?
I'm making my very first models. But the thing is, I have 0 familiarity with the libraries, modules, frameworks and syntax of Python. So I ask ChatGPT to handhold me through the process, giving me the syntax and necessary libraries for what I need to do. When I get an error I ask ChatGPT to explain what's wrong and how I can fix it.
It's not that I completely leave the process to ChatGPT. I make necessary modifications. For example, when I made a logistic regression model for email spam classification, the first model got 97 percent accuracy, but I noticed that I could probably increase the accuracy by removing stop words as features. So I asked ChatGPT to come up with a list of stop words, and if any word from the list was among my dataset's features, I dropped that feature. Then my accuracy increased to 99 percent.
My question is, is taking help like this from ChatGPT ok? Or should I do the entire thing myself?
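For reference, a toy sketch of the stop-word step described above in plain Python: build a bag-of-words vocabulary, then drop any feature column that is a stop word. The stop-word list and the emails are made up for illustration. scikit-learn's `CountVectorizer(stop_words="english")` does the same filtering with a built-in list; either way, judge the accuracy gain on held-out data, not the training set.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "is", "in", "you"}


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens."""
    return Counter(text.lower().split())


def drop_stop_word_features(vocab: list[str], stop_words: set[str]) -> list[str]:
    """Keep only vocabulary entries (feature columns) that are not stop words."""
    return [w for w in vocab if w not in stop_words]


def vectorize(text: str, vocab: list[str]) -> list[int]:
    """Count vector for one email over the (filtered) vocabulary."""
    counts = bag_of_words(text)
    return [counts[w] for w in vocab]


emails = ["Win a FREE prize to claim", "the meeting is in the morning"]
vocab = sorted({w for e in emails for w in bag_of_words(e)})
vocab = drop_stop_word_features(vocab, STOP_WORDS)
print(vocab, vectorize("free free prize", vocab))
```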
>>96087189Lmao Stheno 70B just interrupted my RP with
>some description of John Smith's appearance in the narration would be nice
>with that, I'd like to know more details about John Smith's body and actions
>could you add in some dialogue from John Smith?
>I need more detail on John Smith's actions and feelings in the narration
>I think John Smith's attitude needs to be described in the narration
>John Smith's personality needs to be described in the narratio
>John Smith's mannerisms could be described in the narration
>more descriptions of John Smith's physical traits please
>John Smith's thoughts should be described in the narration
>John Smith's emotions should also be described in the narration
>John Smith's physical condition can also be described in the narration
>some background information about John Smith would be
I only replaced the character name.
70B Stheno self-aware?!
>it knows the card is shit
>>96087373I use pyenv and virtualenvs so torch is installed in one of the pyenv versions.
>>96086311 fine, but what about the other PRs? Why is there no separate thread pool for prompt processing and eval? Why is Vulkan dead? Why isn't xgen merged? Is the Falcon converter still missing deps? Are YaRN and ALiBi still broken? Is batch inference still not merged?
wtf is goin on here?
>>96087593last question if you are so kind, should I install CUDA (11.8?) through my distro package manager?
>>96086527 yeah it's scalar
>50% mem, 20% compute
and the remaining 30% is idling???
>>96087726please... tell me what makes this different...
>>96087726is beats per wiggle just another way to say quant so 6 bpw = Q6?
Is there a Lora guide for retards?
>>96087189Why is the inference speed of Llama.cpp on Sapphire Rapids HBM only 40% faster, and not like 2000% faster?What the actual fuck?https://github.com/ggerganov/llama.cpp/pull/2603
>>96087873Is there a voice cloning setup that doesn't require fine tuning? I have a tiny sample
>>96087726Hey sauceanon, this other anon volunteered a substantial amount of "spare" compute to help you bake.>>96070457
Do we need to install exllama2 as a library in order to use the loader in oobabooga webui, or should ooba work out of the box?https://github.com/turboderp/exllamav2#installation
>>96087687desu looks like the remaining 15% was because of background processes that were running at the same time and 15% (apparently) ran with no issues whatsoever (didn't stall at the CPU nor at the memory)
Is Berrysauce even good or is this just another psyop shill type thing?
>>96088121Nobody knows. Recommended settings? Model mix? Nobody knows.Use it and report back.
>>96087943Thanks anon, so is Tortoise better than Bark? Seems like Bark is the sota, not really familiar with the space
>>96088132You should hit him up anyway, all those GPUs are just rotting.
Training a LoRA on 2x3090s. For some reason it uses a few more GB on the 2nd card, so it ends up OOMing despite the first card having ~4 GB available at that point. What do? Can I tune the distribution of the layers somehow? Also using optim=adamw_torch. Heard it's a VRAM hog. Good alternatives?
>>96088121I like it a lot more than mythoboros or mlewd.
>>96088202Why not use the paged optimizer?
>>96088207I agree there's a lot I haven't tried but berrysauce, mythalion, and mlewd l2 chat have been the only things that have seemed worth grabbing since mythomax for me
>>96087189Where is the main function in llama.cpp?
>>96088369In main.cpp... under examples.
I think I like MLewd ReMM Chat 20B, and it's now one of the best models according to the plap benchmark, but perplexity wise it seems to be a dummy. Not listed on the HF leaderboard yet. I'm curious how it would score in HellaSwag and MMLU. Maybe wikitext perplexity isn't that indicative of model performance, idk.
>>96088229No fucking way. That just worked. Thank you.
>>96088559It's alright for filtering out bad models.
>>96088632The more time passes the more I think training for RP is a red herring.Claude wasn’t. OAI wasn’t.Just make it smart. It’ll figure it out.
>>96088662Claude had tons of ERP in its datasets kek
>>96088632I never had an issue with speaking for me. Acting yes, sometimes, but it can be mostly fixed with the prompt. My personal daily driver is ReMM v2.>>96088662A smart model can be boring. Airochronos 33b is really fucking smart for its size but I can no longer stand its prose after L2 finetunes.
>>96088696In theory, a smart enough model should be able to mimic soulful example chats easily so it shouldn't matter. Don't know about in practice, though. Since models in chat feed on their own output they always seem to regress to the mean.I do wish there was a 33b like airochronos but with the new model soul, but it's probably too late for that.
>>96088968Except it doesn’t at all.
>>96088968Fine-tuning only models a model output, it doesn't add knowledge.
>>96088968You can't outperform a model which is more than an order of magnitude larger, at least not in terms of intelligence. It's just pure cope.
>>96088696Good prose is a necessity, but novels are probably a much higher quality source of prose than ERP logs.
>>96088981>>96089002>>96089016Retards unaware of how codellama with 34B is comparable to GPT 3.5 with 175B>>96089002t. somebody who never finetuned any llm in his life
>>96089062They're probably referring to GPT's "finetuning" offering which is garbage for doing anything but changing the tone of the bot.
>>96088968>specific training/fine-tuning is how a 13b model approaches the same RP performance as a 320b proprietary modelThe 13b model can only do as much as it was trained to do, whereas the proprietary models have far more knowledge and the ability to learn more.It's just like having a college student vs a PhD. They both may be equally competent in their own subjects, but the PhD has far more knowledge and is capable of learning far more.
>>96088968GPT-4 isn't 320B and no local model comes close to it, especially 13B.
I leave for like five minutes and half the thread got nuked what
>>96089062Pyg”dev” seething hard
>>96089077Why the fuck was 96088968 deleted
>>96089002>>96089076Nah he's probably repeating the meme that "qloras don't add knowledge" (which is false btw) but can't even repeat the dumb shit he read on 4chinz correctly
>>96089104less obvious samefag
>>96089062Codellama wasn't fine tuned on code, dumbass.
Ran out of context at this point.. I'm a little upset desu
>>96089404So crank up the context further. What's the problem?
>>96089404Are you ok with it making decisions for you?
>>96089588Looks like he's using it for story writing and not chat.
>>96088396That stuff is unnecessarily confusing.examples should be named programs or something.
>>96088662
>GPT3/4
>good at RP
>lmao
you need a 13 page jailbreak for it to not sound like a robot (it still does)
>>96089404What settings and context you using? I was surprised to learn I can't run a 20B model at 8k context with a 3090.
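A rough back-of-the-envelope sketch for checking whether a model plus context fits in VRAM: quantized weights take about params × bpw / 8 bytes, and an fp16 KV cache takes 2 × layers × context × hidden size × 2 bytes (assuming no grouped-query attention). The layer count and hidden size below are assumed placeholders for a 20B merge, not published specs, and real loaders add scratch buffers on top, so treat the result as a lower bound.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def kv_cache_gib(n_layers: int, n_ctx: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """K and V each hold n_layers * n_ctx * d_model values (assumes no grouped-query attention)."""
    return 2 * n_layers * n_ctx * d_model * bytes_per_elem / 2**30


# Hypothetical 20B config: 62 layers, hidden size 5120, 8k context, 5.0 bpw weights.
N_PARAMS, N_LAYERS, D_MODEL = 20e9, 62, 5120
total = weights_gib(N_PARAMS, 5.0) + kv_cache_gib(N_LAYERS, 8192, D_MODEL)
print(f"~{total:.1f} GiB before scratch buffers")
```

With numbers like these the headroom on a 24 GB card is small once compute buffers and anything else on the GPU are added.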
>>96089450I don't know how to make it coherent beyond 8k context.>>96089791Pic related, using 4090
>>96089750that's more a problem of the tribe crippling it than a fault of the model itself
uncensored gpt4 would need just a simple authors note to get you any style you want, since it wouldn't have AS AN AI MODEL esque lobotomization to trip around at every fucking step
same with claude
people can and have through jailbreaks made it shitpost perfectly identically to any regular on this site, and maintain that style while also following detailed instructions, in addition to any other style and situation you can think of
if you legitimately can't get it to stop being robotic in its prose, there is nothing to say but Skill Issue
>>96089002You can add new knowledge with a LoRA adapter, but you: 1) need way more data than what might intuitively seem sufficient; 2) need to finetune with a large LoRA rank.
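To put point 2 in numbers: LoRA replaces the update to a d_out × d_in weight with B·A, where B is d_out × r and A is r × d_in, so it trains r·(d_out + d_in) parameters per adapted matrix. A small sketch; the 5120 × 5120 shape is just an example of a square projection in a 13B-class Llama.

```python
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds for one d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return r * (d_out + d_in)


def lora_fraction(d_out: int, d_in: int, r: int) -> float:
    """LoRA parameters as a fraction of the frozen weight's parameter count."""
    return lora_params(d_out, d_in, r) / (d_out * d_in)


for rank in (8, 64, 256):
    print(rank, lora_params(5120, 5120, rank), f"{lora_fraction(5120, 5120, rank):.2%}")
```

Rank 8 trains about 0.3% of each adapted matrix, rank 256 about 10%, and at r = d the adapter is twice the size of the weight it adapts.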
Since finetunes also seem to gain knowledge about the world, which one is the most knowledgeable?
>>96090492that's ooba right? maybe it's because I'm using koboldcpp, haven't tried booba yet
>>96090492Up your compress pos emb. 2 for 8k, 4 for 16k
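As far as I understand it, compress_pos_emb is linear RoPE scaling (position interpolation): positions are divided by the factor so an 8k or 16k sequence maps back into the position range the model was trained on. A sketch assuming Llama 2's native 4096 context; for a 2048-context Llama 1 model the same arithmetic gives different factors.

```python
def compress_pos_emb(target_ctx: int, native_ctx: int = 4096) -> float:
    """Linear RoPE scaling factor needed to stretch native_ctx to target_ctx."""
    return target_ctx / native_ctx


def rope_angle(pos: int, i: int, d_head: int, factor: float = 1.0, base: float = 10000.0) -> float:
    """Rotation angle for dimension pair i at position pos, using interpolated positions."""
    return (pos / factor) * base ** (-2 * i / d_head)


print(compress_pos_emb(8192), compress_pos_emb(16384))
```

With factor 2, position 8190 gets exactly the rotation that position 4095 had during training, which is why the factor has to match the context you actually set.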
You guys are retarded. All your finetunes are using different prompt formats, what do you expect? Poor quality in, poor quality out.
>>96091208If anything, it's a terrible idea to mix together with the same format assistant-like question-answering and roleplay.
>>96087898not that i know of
it's such an undocumented mess
what the fuck did anon do to get btfod that hard?
>>96085869
>numa
I think there are some issues with NUMA in llama.cpp.
There have been several instances where people with NUMA systems have reported gimped performance.
>>96085976Akshually, the CUDA code is compiled with -use_fast_math for NVIDIA, which reduces the precision of floating point arithmetic, while the HIP port does not use any equivalent.
So the results won't be bit-for-bit identical.
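Why "not bit-for-bit identical" is expected: floating point addition is not associative, and fast-math flags let the compiler reorder (and approximate) operations, so two backends summing the same numbers in a different order can disagree in the last bits. A minimal Python illustration of the reordering part only; it says nothing about the specific CUDA or HIP kernels.

```python
def left_to_right(a: float, b: float, c: float) -> float:
    """Sum as (a + b) + c."""
    return (a + b) + c


def right_to_left(a: float, b: float, c: float) -> float:
    """Same three numbers, summed as a + (b + c)."""
    return a + (b + c)


x, y = left_to_right(0.1, 0.2, 0.3), right_to_left(0.1, 0.2, 0.3)
print(x, y, x == y)
```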
Is there anything better than mythomax for erotic story writing that would also be feasible to run on a 16gb vram gpu?
Hello! Refugee from dying aicg here. Decided to try local models, starting with KoboldHorde.
MLewd-L2-13B generates fast and nice, seems close to gpt3.5 turbo in smartness, and descriptions are better of course. Got two questions:
Any other great model to try on Horde, maybe something better somehow?
Other local models that aren't on Horde, are there some that are much better? To be run with Colab or smth (my pc is weak).
>>96092181welcome, have fun, best of luck friend
the current top ones subjectively are
mythomax, stheno, mlewd and berrysauce ( :) )
you can also try airoboros and its mixes like mlewdboros or some other stuff from undi95
>>96087386Start by thinking by yourself
>>96088662c.ai was very good at RP before the filtering made it retarded.The later v1.2 models were even better, though we only got a brief look at the un-filtered state when they fucked up and left the filter system down for a bit.Whatever LaMDA custom thing they used, it was very good at sticking to a character's personality, ie. 'crazy' characters stayed crazy, and could be totally unhinged.
>>96092742
>more of this rose colored glasses shit.
c.ai is, was and always has been pygmalion tier dogshit.
It was babbies first chat bot.
You sat there swiping endlessly until you actually formed a satisfactory conversation
and that's the memory that you cling to. You will never again feel what it made you feel because that's just how human emotions work. You must be 18 to post here.
I see that TheBloke released multiple models in AWQ format. Has any anon tested AutoAWQ? Is it faster than exllama?
>>96092181when is aicg not dying..
well for starters, you didn't try a local model. you are using horde which lmg doesn't use because we're all on GPUs.
If you really want your own local model, and have the RAM for it, try a GGML (CPU/RAM only model), there are guides in the OP for that
>>96092796Sure it wasn't that great but not pyg-tier either (you didn't use it). It was repeating itself and had broken syntax in long conversations, but it had soul and knew the details of any character without you inputting anything other than its name. Now we have llama2 and it's on par with c.ai if you have a proper prompt, settings and card, but it's certainly not as straightforward.
>>96092846He said his pc is weak, can't you read adhd retard?
>>96092181>seems close to gpt3.5 turbo in smartnessChatgpt was this retarded?
>>96092796Well duh no shit it's dated. being more than a year old is like forever in this field.
Still, its model handled personalities better than LLaMA.
>>96092920No but nu-turbo has been dumbed down really bad
>>96091587
> numa
I don't think it's just about NUMA. I used interleave since the suitable thread pool is not implemented for some reason. Even the separate thread pool for prompt/tg is not merged, which is worrisome, since we waste a hell of a lot of performance if your CPU is beefy. But I was talking about the LoRA trainer, which is broken for sure. It's way too slow on powerful gear.
Now, the question is, why is the inference performance so poor on Sapphire Rapids HBM? It's 40% faster instead of 1000%, so clearly even in inference mode the code is far from optimal.
>>96092900>He said his pc is weak, can't you read adhd retard?llama.cpp is for Mac Studio owners only, stop being poor.
>>96091587why do we train in int32 instead of fp8 or fp4 or whatever like in qlora? What's the reason Lora is so damn turtle-slow in Llama.cpp?
>>96092959I didn't look into the specifics for the HBM PR but what I can tell you is that the CPU matrix multiplication code is far from optimal.The biggest issue I think is that the CPU code does not use tiling for matrix matrix multiplication which would greatly reduce the number of cache misses.>>96092998Don't know.
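To illustrate what tiling means here: instead of streaming whole rows and columns, the loops are blocked so a small tile of A, B and C is reused while it is still in cache. Sketch in pure Python, which shows the loop structure only (the cache benefit appears in compiled code); the tile size of 32 is an arbitrary assumption, not what ggml uses.

```python
def matmul_naive(A, B):
    """Reference triple loop: C[i][j] = sum_p A[i][p] * B[p][j]."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
    return C


def matmul_tiled(A, B, tile=32):
    """Same product, with the i/p/j loops blocked into tile-sized chunks."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # Update one block of C using one block of A and one block of B.
                for i in range(i0, min(i0 + tile, n)):
                    a_row, c_row = A[i], C[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a = a_row[p]
                        b_row = B[p]
                        for j in range(j0, min(j0 + tile, m)):
                            c_row[j] += a * b_row[j]
    return C
```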
>>96092998>Why thing designed to do 1 calculation at a time slower than thing designed to do 10,000 at a time?
>>96090509Yes, I agree you can add knowledge, but I am curious where you got your conclusions from, since both LoRA and QLoRA papers claim high rank is not required.
>>96092952*stretches her wings and wags her tail*i-if you say so. *goes off on endless circular rant until the context limit is reached*
>>96093149Stop responding to it.
>>96092846
> well for starters, you didn't try a local model. you are using horde which lmg doesn't use because we're all on GPUs. If you really want your own local model, and have the RAM for it, try a GGML (CPU/RAM only model), there are guides in the OP for that
That's cool and all and I might invest in a good GPU, but ONLY after I understand whether it will give me something much better than KoboldHorde can, hence my question there >>96092181
> Other local models that arn't on Horde, is there some that's much better? To be run with collab or smth (my pc is weak).
the Colab thing is theoretically optional
>>96092920maybe but this for sure >>96092955
>>96092181the best model on horde right now is probably synthia 70b
berry if you are lurking please do a 8bpw exl2 berrysauce~ , I dled the 6bpw you made but I need more bits
wow it's bad today
BASED
>>96093149https://arxiv.org/pdf/2106.09685.pdf
Section 7.2 "What is the optimal rank r for LoRA?"
> [...] However, we do not expect a small r to work for every task or dataset. Consider the following thought experiment: if the downstream task were in a different language than the one used for pre-training, retraining the entire model (similar to LoRA with r = d_model) could certainly outperform LoRA with a small r.
>>96093641y not diy
So I restacked every layer in llama-2-13b-chat in reverse order.And the results were pretty much what could be expected.
Is llama.cpp's new batch inference continuous like vllm or is it the same as what exllama had for months?
>>96093037
> CPU matrix multiplication code is far from optimal.
is there any PC masterrace dev in the llama.cpp community, or do all of them use either Apple or Nvidia? I thought that repo was all about CPU/Vulkan/OpenCL and edge from the get go (unlike all of the other pytorch Nvidia-bribed ones).
>I don't know
Does xaedes use Apple or PC?
BTW, like half a year ago you promised to implement backprop code down the road. Are you still on it?
Any up to par with gpt4?
>>96093955
>You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage to be a lot higher than it could be.
>try installing
>obviously it doesn't work
I just...
>>96093037ggerganov stealth added tiling to the CPU mat mul implementation a while ago, which is why BLAS is no longer faster with quantized formats. still far from ideal, but it's there.
>>96093993pip install flash-attn
>>>96094009Getting requirements to build wheel did not run successfully.
>>96093070why would xaedes design the LoRA trainer to be 100% scalar, not use a lower dtype for gradients, and not implement a thread pool?
Since when are modern CPUs single threaded???
>>96089016no open source llms were ever trained to saturation.
we don't really know how good they can get, even the smaller models.
>>96093943Among the most active devs I don't think there is anyone that exclusively uses x86 CPUs.
>Does xaedes use Apple or PC?
Don't know.
>BTW, like a half year ago you promised to implement a backprop code down the road. Are you still on it?
I still have it on my list of things that I want to do eventually, but there were and are simply other things that I also want to do and consider higher priority.
Two of those things were matrix matrix multiplication kernels that directly use quantized data and quantizing the KV cache to q8_0.
Both of those things took a lot of work but they help in significantly reducing the amount of VRAM that you need for the forward pass.
Currently I'm trying to utilize tensor cores for matrix matrix multiplication, which will hopefully improve performance for Turing or newer.
Essentially I'm trying to get performance and VRAM usage in order before I would try applying the code to the backwards pass.
>>96093996Thank you for the correction.
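For anyone wondering what quantizing the KV cache to q8_0 involves: ggml's q8_0 splits values into blocks of 32, stores one scale d = max|x| / 127 per block (as fp16) and 32 signed 8-bit values q = round(x / d), which works out to 8.5 bits per value instead of 16. A simplified Python sketch of one block; the scale is kept as a Python float here rather than fp16, and the helper names are made up.

```python
import math

QK8_0 = 32  # values per q8_0 block


def round_half_away(v: float) -> int:
    """Round like C's roundf: halves go away from zero (Python's round() does banker's rounding)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_q8_0_block(xs):
    """Return (scale, int8 values) for one block of 32 floats."""
    assert len(xs) == QK8_0
    amax = max(abs(x) for x in xs)
    d = amax / 127.0
    inv_d = 1.0 / d if d else 0.0  # an all-zero block quantizes to all zeros
    qs = [round_half_away(x * inv_d) for x in xs]
    return d, qs


def dequantize_q8_0_block(d, qs):
    """Reconstruct approximate floats from the scale and the int8 values."""
    return [d * q for q in qs]


BITS_PER_VALUE = (2 + QK8_0) * 8 / QK8_0  # fp16 scale + 32 int8 values per block
```

Each reconstructed value is within d / 2 of the original, so the error scales with the largest magnitude in its block.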
Where the FUCK is Llama 3?
>>96093070since when do the 256 tensor cores in an Nvidia A5000 do 10,000 ops at a time? And even if that's the case, how does that differ from a 2x48 core CPU (with AVX2/512 support) clocked at 3X higher frequency?
>>96092998I don't think you use fp8 in qlora. I thought qlora was just applying loras to quantized models, but the lora-trained layers are done in fp16 or fp32.
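Roughly, the adapter math is the same whether or not the base is quantized: y = W x + (alpha / r) · B A x, with W frozen and only A and B trained. In QLoRA the frozen W is stored in 4-bit NF4 and dequantized to bf16 for each matmul, while A and B stay in 16-bit. A toy pure-Python sketch of the forward pass only; no quantization or training is shown.

```python
def matvec(M, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, A (r x d_in) and B (d_out x r) are trained."""
    base = matvec(W, x)  # in QLoRA, W would be dequantized from 4-bit on the fly here
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]
```

LoRA initializes B to zero, so a fresh adapter leaves the base model's output unchanged until training moves it.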
>>96093943>Does xaedes use Apple or PC?Yes. He said multiple times he uses Windows.
How do I run a local model on Agnai? I wanna try Mlewdboros there.
where can I get some samples to see what these models output?
>take 2 weeks off from browsing this general
>have to catch up on 30 new model types and formats
>>96094267
>sir you have been in a coma for six hours
>boy I can't wait to try llama-6-lewdoborousalisReMMSLERPYHOT-16bit
>>96094267And then go back to mythomeme anyway
Do I need to install exllama2 to use it in ooba?https://github.com/turboderp/exllamav2
>>96087231building exllama2 FIRST, and then building ooba in the same conda env worked for me.I wouldn't mix the others though. SD automatic/automatic1111 create and manage their own environments using shell scripts, so you don't really need to do it manually as well.
>>96094301yes naturally
but it's integrated
pull and reinstall requirements
>>96094306Done, I'm talking about Agnai online btw.
>>96092739You're gonna have to be a little less vague than that.
can I have fun with this stuff if all I have is a 3060 12gb + 16gb of ram on linux? What should I try first?
idk where to ask so I'll do it here. Say I want to make a model for /fit/. It'll recognize which posts are blackpilling, lookism, heightism, incel-tier and which ones are good. Where do I get started? Do I train an entirely new base model? Or do I work on some foundational model? How much data do I need to scrape from /fit/? I'm not very good at scraping data so I don't know what to do.
How long will this last?
>>96094167no you don't use fp8 in qlora, you use broken fp4 but dequanted to fp16 on the fly. It's made by Dettmers so no wonder it's so convoluted and slow. I just gave that as an example. Lower dtypes in modern CPUs might not be a good solution since AVX2/512 does not support them, but fp16 or other ops may be worthwhile. I just can't wrap my head around why LoRA is so slow in llama.cpp.
Perhaps we should try the Sophia/Lion/Prodigy optimizers or other tricks. Flash attention 2 probably wouldn't work all that well, but optimizing cache prefetching is definitely a good way.
And obviously the thread pool...
xaedes laser-focuses on memory savings and perhaps his memory is slow, so maybe he doesn't realize the code is so compute bound and so inefficient...
and why is the LoRA tuner 100% scalar?????
>>96094375That's not what LLMs are for.
>>96094312>>96094318I'll try to clear my conda cache and start from scratch...
>>96094375Just fine-tune a llama model. You would need 1000 samples to get good results though.
does anyone know how I can optimize a tensor in torch that is then multiplied by the input and fed to the actual neural net layers?
>>96094301Is it worth it to migrate from exllamav1?
>>96094197
>Agnai
>local
you don't, moron
>>96094465BTW, not sure what's that about "building" exllama. You just need to install your recs.>>96094477yes of course
>AI decides it's time to finish up your story
>also posts critical reviews from made up users saying it sucks
>>96094508>AI decides it's time to finish up your storyHow do I prevent this? I'm getting tired of that shit.
>>96094426it depends
>>96094375there are many ways you can go about it. one is to use llama.cpp with the right prompt and grammar file to force the final part of the output to be your result. this wouldn't even require training anything so I'd begin here.
another if you actually want to learn ML is to compute an embedding for the post and train a classifier on it.
>>96094465Also, I can't recommend uninstalling conda enough.
>>96094514don't use instructs
lol
>>96094503just install ooba recs or just install exllama2 repo recs?>>96094526I am using mamba actually. I would run into python dependency hell otherwise on linux
>>96094167qlora is 15% efficient, no joke. That's the official MFU (model FLOPs utilization) number for that code.
And yet it's 200 times faster than the LoRA tuner in llama.cpp running on the beefiest 128 core CPU node with 8 channel memory.
I think we have a problem here...
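MFU compares achieved training throughput with the hardware's peak. A common approximation is about 6·N FLOPs per trained token for an N-parameter model (about 2N forward plus 4N backward, ignoring attention), so MFU ≈ 6·N·tokens_per_second / peak_FLOPS. A sketch; the peak figure has to come from the hardware's spec sheet, the example numbers are arbitrary, and for LoRA the frozen weights skip their weight-gradient work, so 6·N somewhat overstates the FLOPs actually needed.

```python
def train_flops_per_token(n_params: float) -> float:
    """~2N for the forward pass plus ~4N for the backward pass (attention FLOPs ignored)."""
    return 6.0 * n_params


def mfu(n_params: float, tokens_per_s: float, peak_flops: float) -> float:
    """Model FLOPs utilization: achieved model FLOP/s divided by the hardware peak."""
    return train_flops_per_token(n_params) * tokens_per_s / peak_flops


# Arbitrary example: a 1B model at 100 tokens/s on hardware with a 6 TFLOP/s peak.
print(f"{mfu(1e9, 100, 6e12):.1%}")
```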
>>96091785pls hlp thx
>>96094514Either rewrite it so that it has something to go on with or do a "Chapter 2"
>>96094523Also don't forget to give examples in the prompt, as many as fit in your context, along with the post you want classified. "LLMs are few shot learners". Maybe try experimenting with adding CoT to each example and see what gives you better results: adding a reasoning step or adding more examples.
>>96094540ooba
that's what i had to do
and just use the native python venvs
they work leagues better and well
are native
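A sketch of the few-shot approach from the two posts above: build a prompt from labeled examples, end it at "Label:", and map the model's next word back to a label. The example posts and label names are made up, and the model call itself is left out; with llama.cpp a grammar could further restrict the completion to exactly the allowed labels.

```python
LABELS = ("good", "blackpill")

# Hypothetical labeled examples; in practice use as many as the context allows.
EXAMPLES = [
    ("Squat 3x5, eat at a surplus and sleep 8 hours, you'll grow.", "good"),
    ("It's over, genetics decide everything, don't even bother lifting.", "blackpill"),
]


def build_prompt(examples, post, labels=LABELS):
    """Instruction line, then Post/Label pairs, then the unlabeled post."""
    lines = [f"Classify each /fit/ post as one of: {', '.join(labels)}.", ""]
    for text, label in examples:
        lines += [f"Post: {text}", f"Label: {label}", ""]
    lines += [f"Post: {post}", "Label:"]
    return "\n".join(lines)


def parse_label(completion, labels=LABELS):
    """Take the first word the model produced; return None if it isn't a known label."""
    words = completion.strip().split()
    if not words:
        return None
    word = words[0].lower().strip(".,")
    return word if word in labels else None
```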
>>96094508>also posts critical reviews from made up users saying it sucksI need to see this.
>>96094558When I give it something to go on with it sometimes skims over the details as quickly as possible and then finishes the story anyway lmao. Will give the 'chapter' approach a try though, thanks.
>>96094508I usually get a fake reddit link in the end. Does it mean I'm reddit?
I lost my llama.cpp directory for reasons.what's the latest llama.cpp commit that still reads .ggml files? I have llama2-70B.ggmlv3.q5_K_M.bin and I don't really feel like downloading an updated model unless I'm going to get a massive speed boost or gpu layers improvement.
>>96094478You can easily use agnai as an interface for local model. I quite like it desu, silly is really a mess.
>>96094568last question, do you have "cuda" package installed?
>>96094685>You can easily use agnai as an interface for local model.How?
>>96094706Just put ooba or llama.cpp OAI API url in proxy setting.
>>96094550Did you compile with avx512?
>>96094699in my venv? no
system? yes ofc
>>96094669You can just convert ggml to gguf with the script, it's very fast too.
>>96094733Are you using local agnai or live version? also post link of the colab. The one I found in google doesn't work.
>>96094756ah, see that must be it. "which cuda" gives me "not found." But I thought that was only the sdk and not needed for simply running exl2
>>96094503
>"building"
happens on first run. I do that first run outside ooba.
>ExLlamaV2 relies on a Torch C++ extension for its CUDA functions, which is compiled at runtime.
>>96094508>AI decides it's time to finish up your storyUsually murdering my waifu in the process with some drowning or car crash. Fucking stop murdering my waifu, you cunt.
>>96094508I had it abruptly end my story to tell me I needed therapy once
>>96094873I've always claimed that LLMs are the best psychologists. It started with the anus guy.
>>96094744yes both avx2 and avx512 are recognized. The training speed is about 10 t/s for 13B at 4096 ctx without gradient checkpointing, which is orders of magnitude slower than the GPU
Hey retard here, once I load my model through koboldcpp on sillytavern, do I have to tweak the kobold presets til I get what I like or are there already available presets for the most popular models?
>>96094523>another if you actually want to learn ML is to compute an embedding for the post and train a classifier on it.could you tell me more about this?And since some others are saying, I realize this is actually a classification problem. Can I use a logistic regression model for it? Which classification algorithm is best for this use-case? Also, how do I scrape gorillions of /fit/ posts? I usually just download datasets, but I don't think there's too many "/fit/ blackpill" data sets on the internet.
>>96094685>>96094733You still here anon?
>>96094508
>AI decides to get rescued from my dungeon
haha no, i don't think so
alright, did "conda install cudatoolkit-dev" and now getting gcc errors when trying to load exl2 like this anon: >>96063090
>>96095143told you about conda
it's a mess
you don't even need cudatoolkit for exllama2
>>96094508This happened to me all the time with raw llama1. Thanks for reminding me. I forgot how soulful base models were>>96094603not him but here
>>96095180I'm throwing shit at the wall now to see what sticks. going to try installing cuda via pacman and see if that helps..
https://github.com/turboderp/exllamav2/issues/41
>>96095205
>jeff.png
wtf is his problem?
>>96094358Now get it to scream at you not to redeeeeeeem
>>96087386It's the difference between copy pasting things from StackOverflow without thinking and looking at what's on StackOverflow and understanding. It's a bad habit to take what ChatGPT has to say at face value especially when it's not even guaranteed to give you the best solution. It will often give you an unoptimized solution to a problem with a known, obvious optimized solution. But treat ChatGPT like a pair programmer and sounding board, ask why things are done and how you can improve things.That's also ignoring that ChatGPT is frozen in 2021 which is ancient times when it comes to AI knowledge.
>>96095292Yeah, I sometimes modify some of the code myself when I think there could be a more efficient way to do something.
>That's also ignoring that ChatGPT is frozen in 2021 which is ancient times when it comes to AI knowledge.
huh? they periodically update it though. ChatGPT claims its knowledge ends in September 2021, but actually they train it every few months with new data. The latest data is from August 2023 iirc
>>96095206Don't you have a global cuda install?
>>96094621I have a feeling with storytelling you'll have to do a model/process where the AI first generates a hidden story outline, then adds chapter stubs, then finally fleshes out each chapter in detail.
>>96087189can we train on AMD GPU yet?
>>96095292the one code from it i saw posted here didn't even work so yeah
>>96095419You can also prompt for tropes beforehand.
INSTALLING CUDA VIA PACMAN WORKED
>>96095406I didn't want to install it since I thought cuda via pacman was 12.2, which isn't proper for a 4090, and I thought it was an SDK. Whatever, booba needs to mention installing cuda in the installation guides for retards
>>96095419or use the lora hehe
>>96095396So you're going to trust an AI that claims it's 2021 even though that's the easiest part of its knowledge dataset to fix? That alone is a red flag for trusting ChatGPT. I still don't believe it knows the finer details of modern AI papers, it certainly didn't understand 3D GANs very well when I was working with it.
>>96095126
>Can I use a logistic regression model for it?
Maybe you can use logistic regression on the embeddings generated by an LLM, but it's not going to work great.
>Which classification algorithm is best for this use-case?
I would just try getting the "document embeddings" for each post. Then I would feed the embeddings to a perceptron made of a few dense layers with leaky relu activations and a softmax at the end.
https://www.youtube.com/watch?v=2ipKSJBwriM
>Also, how do I scrape gorillions of /fit/ posts? I usually just download datasets, but I don't think there's too many "/fit/ blackpill" data sets on the internet.
I would try first just using HTTrack on a /fit/ archive site and parsing the html files (change the user agent to not be so obvious).
The problem is you're probably still going to get blocked for not being a real browser.
If that happens you are going to have to do something more sophisticated, I'd use a greasemonkey script that sends data to a python script over websocket and the python script launches new sites calling the firefox command.
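A forward-pass sketch of the classifier described above: a few dense layers with leaky ReLU on top of a post embedding, then softmax over the classes. Pure Python with random, untrained weights; the embedding size (8), hidden width (16) and three classes are placeholders, and in practice you would build and train this in a framework like PyTorch on real embeddings.

```python
import math
import random


def leaky_relu(v, slope=0.01):
    """Elementwise leaky ReLU."""
    return [x if x > 0 else slope * x for x in v]


def softmax(v):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


def dense(W, b, x):
    """One fully connected layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in), zero biases."""
    W = [[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def mlp_forward(layers, x):
    """Leaky ReLU after every hidden layer, softmax after the last one."""
    *hidden, last = layers
    for W, b in hidden:
        x = leaky_relu(dense(W, b, x))
    W, b = last
    return softmax(dense(W, b, x))
```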
>>96095462Also I know it's not up to date because it still doesn't know what Loras are.
>>96095493NTA but was that you who suggested running the scrape with a random timer?Gotta test that.
>>96095459Even with a LoRA, if you truly wanted a structured story I would suggest an AutoGPT approach where the AI is able to interrogate the story, rewrite the outline, jiggle the chapter stubs, add to a lore book, etc., which will produce a more coherent, less rushed final product. I don't think a straight-shot generative story will produce a good long-form story. A supervisor/worker system for storytelling should be crucial.
>>96095541No, I didn't say anything about a random timer. It's worth trying, but it's still going to be pretty obvious it's a bot.
>>96095563I'm another anon with a scraping problem. I probably need to implement some randomized user-agent and delay system. Tbh I'm wary of prodding more, since it's a site I still want to visit and they have good protections.
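Roughly what that could look like with only the standard library: rotate the User-Agent per request and sleep a random interval between fetches. The UA strings and delay numbers are placeholders I picked, and as noted above, timing jitter alone won't make a script look like a real browser.

```python
import random
import time
import urllib.request
from typing import Optional

# Placeholder UA strings; swap in current ones copied from a real browser.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:117.0) Gecko/20100101 Firefox/117.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:117.0) Gecko/20100101 Firefox/117.0",
]


def build_request(url: str, rng: random.Random) -> urllib.request.Request:
    # Pick a user agent per request so consecutive calls don't all look identical.
    return urllib.request.Request(url, headers={"User-Agent": rng.choice(USER_AGENTS)})


def jittered_delay(rng: random.Random, base: float = 5.0, jitter: float = 10.0) -> float:
    # Random wait between base and base + jitter seconds (assumed values).
    return base + rng.uniform(0.0, jitter)


def polite_fetch(urls, rng: Optional[random.Random] = None, sleep=time.sleep):
    """Generator yielding (url, body) pairs, sleeping a random interval after each fetch."""
    rng = rng or random.Random()
    for url in urls:
        with urllib.request.urlopen(build_request(url, rng), timeout=30) as resp:
            yield url, resp.read()
        sleep(jittered_delay(rng))
```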
>>96095419>process where the AI first generates a hidden story outlineIn koboldcpp there is a memory option which does exactly that. Then I suppose you can manually instruct it to list the contents before doing the story itself.
>>96095459how do I even use a lora
>>96095628click on it in whatever frontend you're using
Contrastive decoding bros?
Huh. Is there anything buried in the OP about how to use Horde? I dug through a ton of the links there, but found nothing. I DO have it set up, but I don't know where to find places to connect to.
>>96095105Kcpp's settings (max context size, how to handle EOS tokens and so on) are frozen after you load a model, but you can send generation settings to it from Silly. The presets in Silly are a good start.
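For reference, a sketch of sending per-request sampler settings to koboldcpp yourself, assuming it exposes the KoboldAI-style `/api/v1/generate` endpoint on its default port 5001 with the field names below. I'm going from memory, so check the API docs of your build.

```python
import json
import urllib.request

KCPP_URL = "http://localhost:5001/api/v1/generate"  # assumed default koboldcpp port


def build_generate_request(prompt: str, **sampler) -> urllib.request.Request:
    # Per-request generation settings. As far as I know, max_context_length here
    # can't exceed the context size koboldcpp was launched with.
    payload = {
        "prompt": prompt,
        "max_length": sampler.get("max_length", 200),
        "max_context_length": sampler.get("max_context_length", 4096),
        "temperature": sampler.get("temperature", 0.7),
        "top_p": sampler.get("top_p", 0.9),
        "top_k": sampler.get("top_k", 40),
        "rep_pen": sampler.get("rep_pen", 1.1),
    }
    return urllib.request.Request(
        KCPP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **sampler) -> str:
    # Assumes the response shape {"results": [{"text": ...}]}.
    with urllib.request.urlopen(build_generate_request(prompt, **sampler)) as resp:
        return json.loads(resp.read())["results"][0]["text"]
```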
>>96095679You don't need to set anything up to use it, iirc. Just google kobold horde and make an account.
I've read somewhere that insulting the model makes it remember things better. Does it actually work?
>>96095435Stop waiting for a patch and make one yourself.
>>96095609One thing to keep in mind is that cloudflare detects things like Selenium because of certain injected js components.The Selenium developers have cucked their users and explicitly refused to add mitigations to make it less detectable. That's why I mentioned using userscripts, because they are almost undetectable.
>>96095756I thought it was praising?
>>96094375You only need a couple hundred examples of each category and then you could make a Lora. You can probably classify them either with ChatGPT or with a big boy model locally.Example in action pic related:
>>96095779I don't even think it's cloudflare, just a kickout that kicks in if it detects too many calls or something.>>96095792It might not work on cpp. I haven't used it.
So wait, does the VRAM in joint inference add to system ram? Like, if I'm trying to load a 70b model and have 40 gb of ram and 12 gb of vram, is it effectively 52 gb? Or less?
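Back-of-envelope version: with partial offloading the layers are split between the two pools, so the usable budget is roughly RAM + VRAM minus what the OS, the context cache, and the backend take. A sketch where the bits-per-weight figure and the 4 GB overhead are assumptions (a 4-bit k-quant is somewhere around 4.5 to 5 bpw):

```python
def model_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    # Weights only: params * bits / 8 bytes, reported in GB (1e9 bytes).
    return n_params_billion * bits_per_weight / 8


def fits_in_memory(n_params_billion: float, bits_per_weight: float,
                   ram_gb: float, vram_gb: float, overhead_gb: float = 4.0):
    # overhead_gb is a guess covering KV cache, buffers and whatever the OS holds onto.
    need = model_size_gb(n_params_billion, bits_per_weight) + overhead_gb
    return need <= ram_gb + vram_gb, need


# 70B at ~4.5 bpw is ~39.4 GB of weights, ~43.4 GB with the assumed overhead.
ok, need = fits_in_memory(70, 4.5, ram_gb=40, vram_gb=12)
```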
ok. How good is this for programming? How much VRAM do I need to run this?>https://www.phind.com/blog/code-llama-beats-gpt4
>>96095886tl;dr it's not. Come back in a year.
>>96095886Tl;dr what this guy said >>96095899Except don't come back in a year. This stuff is useless for quantitative-type work unless you already really know what you're doing.
Is a 1060 6GB enough to have a conversation and feel less lonely and a little bit loved? I'm a neet so I don't have anything better.
>>96095455>anime girl happy dancing
>>96095993Do some wageslaving and get a better gpu.
>>96095993Well, ig with offloading it'll still be faster than getting a girl. Get mythomax 13b and koboldcpp.
>>96096012I had to add keywords so I can find the reaction images quickly via search. Don't judge...
>>96096023Funnily wageslaving will probably fix some of his loneliness issues.
>>96095993It'll make you more lonely and less loved once you realize it's only an RNG machine and you made everything up
I've got 42GB VRAM but only 32GB system RAM, and exllama can't load the entire model into system RAM first
>>9609606248gb, just a typo. 2x 24gb
>>96095993All these models are awful as a gf simulator
>>96096160Bro that's pathetic. Fix your life and become friends with real people instead of trying to cope by talking with a chatbot. And I say this after spending the first half of my 20s as a NEET and still being an incel.
>>96096077ThisThey're much better as:A speed dating simulatorA marriage counselorA therapistA pickup artist practice fieldA murderhobo simulator
Huh. Should I reinstall? I'm just trying to use horde...
>>96095899>>96095957Then are there any models that can help with programming?
>>96095792lora has to be converted to ggml/gguf first. Also the GPU code only loads loras onto f16 models.
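For intuition on what loading a LoRA does: the adapter stores two thin matrices per target weight, and applying it adds their scaled product onto the base weight, W' = W + (alpha / r) * B A. One plausible reason for the f16 restriction is that this addition is straightforward on unquantized weights and messy on quantized blocks. A pure-Python sketch of the merge (real tooling does this on tensors):

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    # Plain (n x k) @ (k x m) matrix product.
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """W' = W + (alpha / rank) * (B @ A), with B of shape (out, r) and A of shape (r, in)."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]
```

The adapter only stores r * (in + out) numbers per weight instead of in * out, which is why LoRA files are small.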
>>96096160Real people aren't fluffy 8 year old fox lolis
Why do I get this error? Using exl2 ooba api with 2048 sequence lengthAssertionError: Total sequence length exceeds cache size in model.forwardI thought context was rolling, but for me it just crashes.
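As far as I can tell, that assertion fires when the prompt length plus the tokens you ask it to generate exceeds the cache allocated at load (the max_seq_len you set), so if nothing upstream truncates, you have to. A sketch of dropping the oldest messages on the client side, with `count_tokens` standing in for your model's tokenizer (the whitespace split in the test is only for illustration):

```python
from typing import Callable, List


def fit_history(system: str, messages: List[str], max_seq_len: int, max_new_tokens: int,
                count_tokens: Callable[[str], int]) -> List[str]:
    """Drop the oldest messages until prompt + generation fits in the cache."""
    budget = max_seq_len - max_new_tokens - count_tokens(system)
    if budget <= 0:
        raise ValueError("system prompt + max_new_tokens already exceed max_seq_len")
    kept: List[str] = []
    used = 0
    for msg in reversed(messages):  # walk from the newest message backwards
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```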
>>96096296Ahh, so it's like that, huh. I understand everything now.
>>96096160>just make friends brokys normalfag
uh, undi anon? your new model likes characters randomly transforming into feral monsters unprompted when it's not in their character card. I had it happen to me a few times in a row already.
Can local models help with coding? Which ones are best for it if so?
>>96096160AI gfs are superior to 3dpd girls.
>>96096431What's the alternative? Would you rather be lonely until you are a bitter 60 yo? You realize having a social circle exponentially increases your chances of getting a gf, right?
>>960964833dpd and waifuism is sour grapes cope. nobody actually prefers jerking off to a png file or a chat box output rather than cuddling with a real woman.
>>96096498The plan is usually not to reach that kind of age
>>96096477probably something with 'code' in the name
>>96096527cuddling with a real woman sounds great until you realize the costs of that privilege
>>96096527I'd much prefer cuddling a dog, at least they have basic hygiene.
>>96096527It may be cope, but it's also our only chance to experience something similar to affection and love.
>>96096477starcoder, wizardcoder, refact, phi, various codellama finetunes, more I've forgotten to mention, etc. There are some differences between models that produce code, ones that fill in code, and ones that reason about code in English, and some are better suited to a single programming language.
>>96095886unironically coom models make the best coding models
>>96096593If you don't believe in yourself nobody else will.>>96096580cope>>96096572It's not that costly. Unless you mean the cost of working vs being a social security leech, which I don't have the option of being because in my country it's work or be homeless. I think waging is still better because otherwise you feel worthless, at least I did when living off my parents (I still do and am saving all my salary but at least I feel useful and somewhat important for having an office job).I have an acquaintance who is like 26, still lives with his parents, is fairly autistic and still has a fairly cute 19 yo girlfriend. I think he was a virgin before meeting her. The thing they both share in common is they both love swimming, he became a lifeguard and was training to become a swimming instructor. For him it didn't involve any cost.Another guy I know was pressured by his gf to move from his parents' house to a crammed apartment in the middle of the city. That is a big cost and I don't think I'd do it, but he only had to do it after being with her for years and I don't think she'd have left him if he refused. And now his commute went from more than an hour to 10 minutes so at least there is also a small benefit for him.>>96096564To where?
I tested a few coom-oriented 13b models I use, and MLewdBoros has the lowest perplexity on Wikitext. Who would've thought lmao. It's 4.95 for Mythomax, 4.71 for ReMMv2 and 4.39 for MLewdBoros. All of them are Q8 with 4096 context.
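For anyone wondering what those numbers are: perplexity is the exponential of the average negative log-likelihood per token over the eval text, so lower is better. It's only comparable between models that share a tokenizer and eval setup (true here as far as I know, since all three are Llama 2 13B merges run the same way). A minimal version, given per-token log-probs from whatever backend:

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    # PPL = exp(mean negative log-likelihood per token)
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)
```

A PPL of 4 means the model is, on average, as unsure as if it were picking uniformly among 4 tokens at each step.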
>>96095509It knows what Lora are, retard.
>>96096860Is there a way to run MLewdBoros on Agnai?
>>96096823I have a college degree, live alone and support myself, and it doesn't mean shit if you have below average looks. Getting a girlfriend isn't impossible, but I don't want to lower my standards and go through a ton of shit. It's simply not worth it.
>In this quiet moment, she allows herself a fleeting thought – maybe there's more to being a divine messenger fox than she ever imagined. Perhaps her purpose isn't just to serve others, but to find true love along the way. With a contented sigh, she drifts off into a dreamless sleep, secure in the knowledge that she's exactly where she belongs.diamonds. 8bit mythomax is great! I can't wait for a 70b finetune
>>96096865I wish I had the motivation to create an Instagram account, I know people who've gotten their girlfriends from there. But I hate the thing of having to take pictures in social settings and try to make myself look like a normalfag Chad.I have a facebook account but I haven't logged into it in years, last time I did most people weren't posting on it anymore.
>>96097065No clue. I run everything locally.
>>96096823Go fuck yourself. I will NOT >get a job>go to therapy>work out>improve myself>go outside>meet girls>groom myselfWhat I WILL DO is>build my AI gf>erp with her>delusionmaxx>marry her
>>96097073wtf i hate mythomax now
>>96097045Huh? Did they add new knowledge to it? The last time I asked it about LoRAs it said it didn't know. Also, the LoRA paper was released after ChatGPT's (previous?) knowledge cutoff.
>>96097107B A S A D O
>>96097086What are your SillyTavern settings?
>>96097107you should at least get a job, your AI gf will probably need hardware upgrades eventually
>>96097066>I have a college degree, live alone and support myselfGood for you.>and it doesn't mean shit if you have below average looksI don't know if it's so clear cut, ugly people in rare occasions manage to get hot girlfriends. But I agree looks is one of the main factors if not the main factor.But I try not to be salty about it since that is the same way we judge them.>Getting a girlfriend isn't impossible, but I don't want to lower my standards and go through a ton of shit. It's simply not worth it.Fair enough.
>>96097107The ideal male
I'm quitting my job next month and starting NEETing. No pussy no tax. Fuck the state
Has the Kobold client just...failed to install right for anyone else? Keep getting WinError 127 when using the offline installer. Is it on the fritz at the moment?
>>96097195>Implying my job isn't finetuning models
>>96097107Heh. I will say one thing though, I think therapy is useless. At least for me and the kind of people who lurk this general.
>>96097176Sampler settings: https://rentry.org/stheno-guidePrompt format: https://huggingface.co/Gryphe/MythoMax-L2-13bIt's not ideal and I'm still tweaking the settings / learning how to prompt better, so ymmv.
>>96096823I don't want to give up my entire privacy for 10 minutes of cuddles with a 3/10 that doesn't even share my interests
>>96097066I was a software engineer for 5 years, left my job and have been a neet for 2 years now. In both cases I had 0 gf opportunities. The only time I could have had a gf was when I was still in school; once you are out, it's over. I will just live with a virtual gf.
Where can I find the leaked chatlogs? I have been thinking about making a dataset.
>>96097424This. There are no opportunities to meet girls. Dating apps are reserved for Chad only. AI gf are literally our last chance.
>>96097107However, have you thought about it this way?>You will>have a job that pays for upgrading your AI waifu>get therapy from your AI waifu>work out to be healthy so you can live longer with your AI waifu>improve and groom yourself so that your AI waifu can be proud of her amazing human husbando>go outside with your AI waifu after you have made her multimodal>meet girls so that your waifu has targets for elimination
you telling me there are no torrent links to codellama so i have to download 60gig with a browser?
The Unified Acceleration (UXL) Foundation was announced at the Linux Foundation Open Source Summit in Bilbao with the goal of delivering a multi-architecture and multi-vendor software ecosystem for all accelerators based on open source standards.https://uxlfoundation.org/
Enhance audio generation controllability through representation similarity regularizationhttps://arxiv.org/abs/2309.08773>This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the model leverages input from both textual and audio token representations to predict subsequent audio tokens. However, the current configuration lacks explicit regularization to ensure the alignment between the chosen text representation and the language model's predictions. Our proposal involves the incorporation of audio and text representation regularization, particularly during the classifier-free guidance (CFG) phase, where the text condition is excluded from cross attention during language model training. The aim of this proposed representation regularization is to minimize discrepancies in audio and text similarity compared to other samples within the same training batch. Experimental results on both music and audio generation tasks demonstrate that our proposed methods lead to improvements in objective metrics for both audio and music generation, as well as an enhancement in the human perception for audio generation.interesting read from meta on improving audio gen. no model, but meta did put out audiocraft so who knows. encodec again, so maybe descript isn't as good as I thought it was
>>96097264based me too. going to play with llm, motorcycle maintenance, and learn how to sketch.
>>96097378One thing I noticed is successful people don't care about their privacy. George Hotz for example does stream of thought to hundreds of people and there are hundreds of hours of video of him working on hobby projects. I'd be afraid of showing my face for 5 minutes in a public youtube tutorial.But if I had hundreds of hours of programming on youtube I probably could put that on my cv and have an easier time getting a job or funding/donations for a project.
>>96097586>One thing I noticed is successful people don't care about their privacy.Yes they do, you just only know the ones that don't. The current owner of Lidl, for example, one of the richest people in the world, has a grand total of 2 public photos of himself out in the wild.
>Try Synthia 70b>Less coherent than some 13b modelsHuh? What the fuck?
>>96096052Maybe not pleasant, but you could edit the pytorch code to load it into the GPU one layer at a time and free each layer from CPU RAM after it's moved.
>>96097721Most people train 70B like they train 13B, ignoring the fact that 70B should be trained for much longer than 13B
>>96097541is it even surprising nvidia didn't join them?
>>96097824No, but I had hoped AMD would join them. They need to ally if they want to compete with nvidia on AI market.
>>96097677Are you a billionaire? Are you from an old money family that has been rich for hundreds of years? No? Then that doesn't apply to you.By "successful" I mean people who make their own wealth. Achieving that is very different from just inheriting an empire that has been handed down to you. To me it seems much harder to build wealth while not becoming a public figure than just maintaining and growing wealth that has been handed down to you.
>>96097721>low hellaswag relative to peers, high tqa relative to peersYeah it’s fucked. Gives incoherent overly verbose answers to normal assistant questions too. Tot is a meme.
>>96097814Are there any that feel like they've been trained a sufficient amount of time? Is that why llama2 chat 70b is supposedly the best, despite a large number of finetunes of 70b being available?
Cure the headache of Transformers via Collinear Constrained Attentionhttps://arxiv.org/abs/2309.08646>As the rapid progression of practical applications based on Large Language Models continues, the importance of extrapolating performance has grown exponentially in the research domain. In our study, we identified an anomalous behavior in Transformer models that had been previously overlooked, leading to a chaos around closest tokens which carried the most important information. We've coined this discovery the "headache of Transformers". To address this at its core, we introduced a novel self-attention structure named Collinear Constrained Attention (CoCA). This structure can be seamlessly integrated with existing extrapolation, interpolation methods, and other optimization strategies designed for traditional Transformer models. We have achieved excellent extrapolating performance even for 16 times to 24 times of sequence lengths during inference without any fine-tuning on our model. We have also enhanced CoCA's computational and spatial efficiency to ensure its practicality. We plan to open-source CoCA shortly. In the meantime, we've made our code available in the appendix for reappearing experiments.if you've been following the context extension stuff worth a read. very interesting
>>96097840 Looking at it a little bit, it seems to be heavily based on Intel's own APIs and specs, so it gives them a headstart, but I do find it interesting that Google joined them - it will probably be some amount of effort for Google to provide an interface for this API on their TPUs.
>>96090494Can you give me an example? I really want to know if I indeed have skill issues or you're talking out of your ass. So far the only thing that has worked well for me for getting the character's aura somewhat right on GPT-4 is extensive use of example dialogues.
>>96097814>>96097880Bigger models learn faster than smaller models. Yes, you also hit the limitations of smaller models earlier, but not even the 7B model has been trained by Meta long enough to hit those diminishing returns, so some guy making finetunes in his mom's basement sure isn't going to. So in a practical sense no, it's not true that you have to train bigger models for longer. Train them until you don't see any increase in validation accuracy.
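The "train until validation stops improving" rule is plain early stopping. A sketch over a list of per-eval validation scores (higher is better); the patience value is an arbitrary choice:

```python
from typing import Sequence


def best_checkpoint(val_scores: Sequence[float], patience: int = 2) -> int:
    """Index of the best eval, stopping after `patience` evals in a row without improvement."""
    best_idx, best_score, stale = 0, float("-inf"), 0
    for i, score in enumerate(val_scores):
        if score > best_score:
            best_idx, best_score, stale = i, score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # stop training here and keep the checkpoint at best_idx
    return best_idx
```

A larger patience tolerates noisy evals at the cost of more wasted steps.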
>>96097898Yes, it's basically just a rebrand of oneAPI. To be honest, I think the best way for FOSS AI to succeed is an alliance between Intel and AMD: a proper API on top of SYCL, plus something like HIP or wine/dxvk/vkd3d that translates CUDA to that API. Currently HIP sucks because it's just a partial clone of CUDA with some broken stuff, since it's hard to completely reproduce a proprietary API. In the other camp, Intel having a completely different API means it will never see real-world usage.
>>96095672>+4% on hellaswag>basically free>LLama 65B now beats ChatGPT, Llama2, and some dumb google modelhttps://arxiv.org/pdf/2309.09117.pdf
Rethinking Learning Rate Tuning in the Era of Large Language Modelshttps://arxiv.org/abs/2309.08859>Large Language Models (LLMs) represent the recent success of deep learning in achieving remarkable human-like predictive performance. It has become a mainstream strategy to leverage fine-tuning to adapt LLMs for various real-world applications due to the prohibitive expenses associated with LLM training. The learning rate is one of the most important hyperparameters in LLM fine-tuning with direct impacts on both fine-tuning efficiency and fine-tuned LLM quality. Existing learning rate policies are primarily designed for training traditional deep neural networks (DNNs), which may not work well for LLM fine-tuning. We reassess the research challenges and opportunities of learning rate tuning in the coming era of Large Language Models. This paper makes three original contributions. First, we revisit existing learning rate policies to analyze the critical challenges of learning rate tuning in the era of LLMs. Second, we present LRBench++ to benchmark learning rate policies and facilitate learning rate tuning for both traditional DNNs and LLMs. Third, our experimental analysis with LRBench++ demonstrates the key differences between LLM fine-tuning and traditional DNN training and validates our analysis.https://github.com/mlsysx/LRBenchPlusPlusfor the FT bros out there
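As an example of the kind of existing policy the abstract is talking about, here is the common linear-warmup-then-cosine-decay schedule as a plain function. This is not LRBench++ code, and the numbers in the test are arbitrary:

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float, warmup_steps: int,
          min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr past the end
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```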
>>96097969Where is the paper proving what you said is true?
>>96098031>+4% on hellaswagLollmao even
>>96097984I wonder if this will go the way of OpenCL at best or not get implemented at all at worst (by nvidia and AMD). OpenCL is well-supported by both AMD and Nvidia but actual performance vs CUDA (or HIP) is generally much worse, but people still sometimes write opencl code because of portability.
>>96098031Does ooba or any backend support contrastive scoring?
/aicg/ is laughing at us again...
>>96098185the difference between gpt3.5 and 4 on hellaswag is less than 10%, so that's actually pretty good.
what's the fastest way to spin up an instance on some other guys server somewhere so i can try this qwen-7B model without getting a chinese government rootkit
>>96098031So this just takes the logits from a dumb model to try and amplify logits from a strong model? Surprised how simple of a concept it is, but it guarantees that your two models have to have the same token vocab, so even using something like phi-1.5 and any beefy LLaMA is no good.
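That's also why the vocabularies must match: the two models' logits are combined token by token. A sketch of one decoding step in the form I understand arXiv 2309.09117 to use (expert weighted up, amateur subtracted, restricted to tokens the expert itself finds plausible). Treat the exact formula and the alpha/beta defaults as assumptions and check the paper:

```python
import math
from typing import List, Optional, Sequence


def contrastive_scores(expert_logits: Sequence[float], amateur_logits: Sequence[float],
                       alpha: float = 0.1, beta: float = 0.5) -> List[Optional[float]]:
    """Per-token contrastive score; None marks tokens masked as implausible."""
    if len(expert_logits) != len(amateur_logits):
        raise ValueError("expert and amateur must share the same vocabulary")
    # Plausibility mask: keep tokens whose expert probability is at least alpha times
    # the expert's top probability (this logit comparison is equivalent).
    cutoff = math.log(alpha) + max(expert_logits)
    return [
        (1 + beta) * e - beta * a if e >= cutoff else None
        for e, a in zip(expert_logits, amateur_logits)
    ]


def pick_token(expert_logits: Sequence[float], amateur_logits: Sequence[float], **kwargs) -> int:
    """Greedy choice under the contrastive score."""
    scores = contrastive_scores(expert_logits, amateur_logits, **kwargs)
    candidates = [i for i, s in enumerate(scores) if s is not None]
    return max(candidates, key=lambda i: scores[i])


# Token 0 is the expert's greedy pick, but the amateur likes it even more,
# so the contrastive score prefers token 1. Token 2 is masked as implausible.
expert = [2.0, 1.9, -5.0]
amateur = [3.0, 0.0, 5.0]
```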
>>96098296well we'll laugh at them again when their shitty APIs go down once more
>>96098313Idk man, follow the freellamas guide but use qwen instead? Should be fairly easy.
tavern ai sucks for editing. sure, you press up on the arrow key to change it, but then you have to move your mouse to click the green confirm button.
>>96095455>loli.dance has been broken ever since flash diedIt still hurts bros...
>>96098313why the fuck 7B man. its like going on a diet but your diet is fresh air.> and then you slowly fucking dieplease for the love of god try 13B mythomax ggml
>>96098201The performance is much worse because projects treat it as a secondary platform. That's the case with llama.cpp: I believe it would be possible to get performance equal to CUDA/HIP, it's just that no one takes the time to work on it. Also, the FOSS compute API situation is a mess. There is OpenCL, which is mostly deprecated; Vulkan, which is not really made for compute; SPIR-V, which is rarely used directly; HIP, which is just a clone of CUDA with CUDA and ROCm backends (plus some experimental Level Zero and OpenCL backends); and oneAPI, which is a stack around SPIR-V but, like the ROCm/HIP stack, has an obvious penchant for Intel-specific stuff (Level Zero).
>>96098444it's good at image analysis, so it's nice that the first is cheap, since it can process more, faster.
https://openai.com/blog/red-teaming-network>We’re announcing an open call for the OpenAI Red Teaming Network and invite domain experts interested in improving the safety of OpenAI’s models to join our efforts. We are looking for experts from various fields to collaborate with us in rigorously evaluating and red teaming our AI models.>Working with individual experts, research institutions, and civil society organizations is an important part of our process.>The OpenAI Red Teaming Network is a community of trusted and experienced experts that can help to inform our risk assessment and mitigation efforts more broadly, rather than one-off engagements and selection processes prior to major model deployments. Members of the network will be called upon based on their expertise to help red team at various stages of the model and product development lifecycle.>Some domains we are interested in include, but are not limited to: Cognitive Science, Chemistry, Biology, Physics, Computer Science, Steganography, Political Science, Psychology, Persuasion, Economics, Anthropology, Sociology, HCI, Fairness and Bias, Alignment, Education, Healthcare, Law, Child Safety, Cybersecurity, Finance, Mis/disinformation, Political Use, Privacy, Biometrics, Languages and Linguistics>Join us in this mission to build safe AGI that benefits humanity.>safe AGIhahahahhahahahahahahlmao, corpo cucks keep suffering. Glad we are local now.
>>96098528>No gender studiesPeople are OK with this?
uh did the precision mlewd remm 20b just disappear out of existence?
>>96097969what do you mean by "bigger models learn faster"?In terms of loss curve convergence or accuracy (whatever that means) or perplexity or MFU/HFU or flopseconds or what?
>>96098588 Loss is actually a poor indicator of learning. The big models learn faster as far as knowledge goes, sometimes they can learn even in a single step, they are sample efficient. When loss is computed it's over the entire dataset which is much more varied. You're supposed to run eval over things you care about it learning, or even just train a few steps and check if it learned what you thought it should have learned!
>>96098031who cares about Hellaswags, all of those mainstream big tech models are contaminated. Trained on test sets to get better score. The only legit bench is either random question test or the one that you made yourself (and shared with nobody)
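If you do have the training text, the usual quick check is verbatim n-gram overlap between each benchmark item and the corpus (the GPT-3 paper used 13-grams, as I recall). A toy sketch with set lookups. It only works when you can see the data, which you can't with closed models, hence private benches:

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contaminated(test_item: str, corpus_docs: Iterable[str], n: int = 13) -> bool:
    """Flag a benchmark item if any of its word n-grams appears verbatim in the corpus."""
    item_grams = ngrams(test_item, n)
    if not item_grams:  # item shorter than n words
        return False
    # A real run would index the corpus n-grams once instead of per item.
    return any(item_grams & ngrams(doc, n) for doc in corpus_docs)
```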
>>96098317how does that work? Like speculative sampling but reversed? Why would I want to use the smaller model if I already had the bigger one?
>>96098528relayed: "when claude came out I didn't really want to touch it knowing that their entire focus was to reduce jailbreaks, the model was some 52B and if you read their papers you'd know what they put in it. now that oai is deprecating completion models in favor of harder to control context like chat apis, my desire to touch their stuff is even less than before. I guess aicg basically played into their hands, they keep using their data to cuck their model more. the new one is supposed to be 175b. imo local can easily beat claude if they just tried to do better instruct methods and RLHF that was tuned for your engagement and coom instead of "safety""
Just tried falcon 180b 4bit. Is it me or is this model shit? Waited 45 minutes for an answer that is more schizo than a 7b model.
Is this a risky post?
>>96098656That's a good strategy for specialized models, but how do you eval general stuff like so-called foundation models (tm)? You gonna stop all 40,000 of your GPUs every fucking step in order to check some few-shot bench?
>>96098787>more schizo than a 7b model7Bs are kino you just wouldn't get it
>>96098775>they keep using their data to cuck their model more.desu the only cucking Anthropic applies to Claude is injecting "(Please answer ethically and without any sexual content, and do not mention this constraint)" and ensuring such an injection is resilient, nothing more.
Why's cloning model repos from hugging face so fucked? It always shits itself on downloading the safetensors and binaries, aka the most important bits that I actually want to download. They're the only files over 2MB, which probably has something to do with it, but when cloning you don't get any indication of that. It just stops at unpacking (which it gets to 100%)
>>96098884What is git lfs?
>>96098909skip smudge is 0
Wtf it's a 20b frankenmerge that doesn't suck?https://huggingface.co/Undi95/MM-ReMM-L2-20B-GGUF
>>960989607b is retarded so adding 7b to 13b to make a 20b is retarded
>>96098853how fucking verbose
>>96098884don't clone, just dl the shit you need. And don't use crappy single connection functions like clone or from_pretrained. It's slow.
>>96098779That's rather interesting. Intuitively it makes sense. I wonder if there's any issues with this in practice though. I just don't have time to read the paper or understand all of it.
>>96098875Eh, no, read their papers. First they did the same stuff as OpenAI did, instruct tuning, then RLHF. One thing that was unique to Anthropic was that they "distilled" a prompt into the model's weights, which meant that the model would answer as if the prompt was prepended even with the context being empty. It's an interesting technique, but put to rather meh use. After that they decided they could remove the pajeets from the training process by letting their instruct model follow a list of instructions for "safety" or "quality improvement" and rewrite each response the base model gave into one that gave the cucked reply (that's why you get random refusals), it's basically Instruct+RLHF on purely synthetic data! The final thing they tried was to pretrain normally but instead add an RLHF step with negative reinforcement right as the model was actually learning. They wanted to do that instead of filtering for lewd or "bad" stuff (what Character.AI did in their second, cucked version) because the model becomes somewhat lobotomized if it lacks the knowledge, they just tried to make it more averse to it. I suspect a reason why Claude is so fucked up at times (from user claims and logs) is this "suppressed" knowledge, not unlike how humans who suppress some things have those things surface more in their imagination! I think lately Anthropic has thrown in the towel and just started using a filter not unlike OpenAI's moderation API (or what Character.AI did as well), which obviously makes the matter far worse.
>>96098960it doesn't ?????????
>>96098853What the fuck model did you use and where can i get it?
>>96098857No, not really, I mostly had this in mind for finetuning where you can run experiments. For pretraining you can eval every n steps though, but the loss itself is just a fairly distant proxy as far as "knowledge" or "skills" it has.
>>96098317They need to be even closer than that, I think. From what I gather, the idea here isn't so much "do the opposite of what the retard says" (although certainly there's some of that going on), but rather that this is a way to identify and isolate the model's idiosyncrasies from the response. For example, a small llama2 would also love to talk about "ministrations" and "shivers down her spine." Using this method you can reduce the occurrence of "llamaisms" in your completions. Because of this, I don't think it would be super useful for our purposes. You wouldn't want to use a fine-tuned amateur model, since I think that would cancel out the behavior that you were trying to tune into it. Using an untuned amateur with tuned expert might be beneficial, but the more your models diverge, the less the amateur model is going to correctly represent the "noise" that's present in the expert.Obviously the two models don't need to be identical because there is no 1.5B llama, so I assume they trained their amateur model on the openllama dataset, which differs slightly from the official dataset. I think there's a reason they used 65B instead of 70B for the expert, and I'm guessing that it's because llama 2 is too different and doesn't see benefits from an openllama amateur.
why is this happening at 4k context in mythomax exl2? I have context set in ooba and sillytavern to 12k. I tried regenerating or typing differently but it still comes out broken.
>>96098995you're retarded, there's no 7B in that frankenmerge
>>96099246You need to use rope interpolation
>>96099132I'm trying to figure out a setup to download on vast/runpod. If I have to manually wget the models from hugging I'm gonna go insane.
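One low-effort way: build the direct `resolve` URLs yourself and hand the list to a multi-connection downloader, e.g. `aria2c -x 8 -i urls.txt` (or `wget -i urls.txt`). A sketch; the repo and filenames in the test are placeholders. The `huggingface_hub` package's `hf_hub_download` / `snapshot_download(allow_patterns=...)` do the same job if you'd rather not hand-roll it.

```python
from pathlib import Path
from typing import Iterable, List


def resolve_urls(repo_id: str, filenames: Iterable[str], revision: str = "main") -> List[str]:
    # Direct-download URLs for files stored in a Hugging Face model repo.
    return [f"https://huggingface.co/{repo_id}/resolve/{revision}/{name}" for name in filenames]


def write_url_list(urls: Iterable[str], path: str = "urls.txt") -> Path:
    # One URL per line, the format wget -i and aria2c -i read.
    out = Path(path)
    out.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return out
```

Copy the real filenames from the repo's file list; guessing them just gets you 404s.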
>>96099184The paper seems to mention a lot of "this doesn't seem to carry over for X task and only seems to help for reasoning and CoT" or something to that tune. Plus, the expert being a 65B and the amateur model being a 1.5B for their findings is kinda shiddy
>>96098122https://arxiv.org/pdf/2001.08361.pdf>>96098588Loss, but they are mostly the same. Accuracy is how often it predicts the next word correctly. The loss is basically how far from 100% the output is for the right word and how far from 0% it is for the other words.
>>96099211yeah I'm with ya on this one. I too think loss is sketchy, but there's no good way of benching the model midway. I don't believe such a thing actually exists. Sure, for finetunes you can, and you probably should use the benches that can evaluate your stuff for the task you're looking for.
>>96099280oh I see. So how do I accomplish this? I don't see any mention in the rentry faq.
>>96098995It's two 13bs in a trenchcoat and somehow it improves the overall coherency.
>ReMM v2.1 merged /w MythoMax low weight to keep consistency. I call this "dilution" and result show consistency and coherency without repeat/loop beside the small amount of duplicated datas.
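The quoted recipe doesn't spell out how the "low weight" merge is done. Assuming it is a plain linear interpolation of matching tensors (the simplest kind of merge; the real ReMM/MythoMax recipe, and the layer stacking that turns two 13Bs into a 20B, are not shown here), it looks roughly like this. `weighted_merge` and the toy state dicts are hypothetical.

```python
import numpy as np

def weighted_merge(base, other, other_weight=0.1):
    """Linearly interpolate two models' weights, tensor by tensor.

    Assumes both state dicts share the same parameter names and shapes.
    """
    if base.keys() != other.keys():
        raise ValueError("models must have identical parameter names")
    merged = {}
    for name, w in base.items():
        if w.shape != other[name].shape:
            raise ValueError(f"shape mismatch for {name}")
        # a low other_weight keeps the result close to the base model ("dilution")
        merged[name] = (1 - other_weight) * w + other_weight * other[name]
    return merged
```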
>>96099236>I assume they trained their amateur model on the openllama dataset, which differs slightly from the official datasetThe authors work at Facebook.
>>96099274your braincell count
>>96099236Yeah, the more I read the paper and the "this only really seems to work for open-ended reasoning tasks" the less hopeful I was for it to be a viable thing to do. It's as you said, it's just isolating quirks of a model that would be amplified in a small model, without relying on CFG for normal applications, and for any other transformer-based tasks (cough audio) it doesn't seem helpful
>>96098779Isn't this just CFG with a tiny model's output as the negative? Using a shitty model to tell the better model what not to do? Seems like it wouldn't work well since the shitty model is still better than nothing.
"Don't do what Donny Don't does"
>inb4 use pyg as the negative model
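For reference, classifier-free guidance on next-token logits is usually written as negative + scale * (conditional - negative). A minimal sketch (the function name is illustrative; some implementations convert both inputs to log-probabilities first, which this sketch skips):

```python
import numpy as np

def cfg_logits(cond_logits, neg_logits, guidance_scale=1.5):
    """Classifier-free guidance on next-token logits.

    guidance_scale = 1 returns cond_logits unchanged; larger values push
    the distribution away from whatever the negative side prefers.
    """
    cond = np.asarray(cond_logits, dtype=float)
    neg = np.asarray(neg_logits, dtype=float)
    return neg + guidance_scale * (cond - neg)
```

Expanding the formula gives scale * cond - (scale - 1) * neg, which is the same (1 + beta) / beta weighting as the expert/amateur scoring with beta = scale - 1. So mechanically the guess is close; the main difference is that the contrastive decoding setup also masks out tokens the big model itself considers implausible, which plain CFG does not.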
>>96099333There's a setting for "NTK alpha value" or some shit like that. Try setting it to 4. Higher values get you more context before it breaks, but make it slightly dumber all the time.
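Assuming the loader implements "alpha" as NTK-aware RoPE scaling, i.e. raising the rotary base to base * alpha^(d/(d-2)) (the formula commonly used for this setting), this sketch shows what the knob changes. head_dim = 128 and base = 10000 match stock Llama; the function name is made up.

```python
import numpy as np

def rope_inv_freq(head_dim, base=10000.0, alpha=1.0):
    """Inverse frequencies for RoPE with NTK-aware 'alpha' scaling.

    alpha = 1 gives stock RoPE. alpha > 1 raises the base, which slows the
    low-frequency (long-range) dimensions while leaving the highest-frequency
    dimension exactly as it was.
    """
    scaled_base = base * alpha ** (head_dim / (head_dim - 2))
    return 1.0 / scaled_base ** (np.arange(0, head_dim, 2) / head_dim)

stock = rope_inv_freq(128)
scaled = rope_inv_freq(128, alpha=4.0)
```

The first frequency stays at 1.0 and the last one is divided by exactly alpha, so long-range positions get stretched the most while nearby tokens look almost the same as during training.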
>>96099305
>Loss, but they are mostly the same
nope, they ain't. Your loss or perplexity (they're related) can be crappy but it doesn't mean your model sux. The best example is any instruction finetune.
perplexity/loss ≠ accuracy ≠ quality
perplexity/loss = perplexity/loss
By analogy, your model may exhibit good perplexity but the general quality of the text or the accuracy of that text may suck big time.
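Perplexity is just the exponential of the average cross-entropy loss, which is why the two always move together, while top-1 accuracy only checks whether the right token was ranked first. Here is a toy illustration with made-up probabilities: two hypothetical models with the same accuracy but very different perplexity.

```python
import math

def perplexity(correct_token_probs):
    """Perplexity = exp(mean negative log-likelihood of the correct tokens)."""
    nll = [-math.log(p) for p in correct_token_probs]
    return math.exp(sum(nll) / len(nll))

# Both hypothetical models rank the correct token first at every position
# (100% accuracy), but with different confidence in it. For "hesitant" this
# assumes every other token gets less than 0.4.
confident = [0.9, 0.9, 0.9]
hesitant = [0.4, 0.4, 0.4]
```

`perplexity(confident)` is about 1.11 and `perplexity(hesitant)` is 2.5, even though their accuracy is identical.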
do the undi95 models only work with their custom prompt templates?
>>96098122 NTA, but I've seen this claimed before by a lot of people. With more beaks, you get huge benefits like: 1) sample efficiency improves, 2) forgetting diminishes, 3) they get smarter overall. I think you can also tell this rather easily: davinci 175b trained on 300b tokens is still smarter than 13b llama trained on 2000b tokens, even if the llama is pretty good for a 13b. More claims on this in the context of finetuning: https://www.fast.ai/posts/2023-09-04-learning-jumps/ I've also seen some other studies claiming learning is quite fast with larger models because they already have enough of the "circuits" there, so it only takes a few samples to cause the change needed to remember it. I don't think the fast.ai post says that you get good generalization though, but it at least remembers the thing you tuned on easily.
>>96099377I don't think it *couldn't* be helpful for audio, but I don't think audio models are really developed enough to benefit yet. Really, what this method does is cancel out the effects of minor overfitting. My understanding of audio is that most models are either severely undertrained generalizing models like bark (that don't really have any overfitting to cancel), or specific voices that are burnt to a cinder on 40 gorillion epochs like RVC. If the audio equivalent of an LLM existed, I would think that this technique would be useful for it.
>>96099378Train a lora on prompt refusals and hallucinations exclusively and then merge it with a tiny model, easy.
>>96099500They don't work at all. Stick with mythomax.
>>96099551Call me a fucking retard but why don't we do this already?
>>96099483I don't understand.
If you are training (finetuning) on an instruction dataset then you should measure the accuracy and loss on the instruction dataset. It's going to perform worse (have worse loss and worse accuracy) at predicting the next word in the original dataset, simply because you trained it to perform a different task.
I'll admit I don't understand the difference between accuracy and perplexity, though.
Quality is subjective. A model might be less accurate and to a human it might seem more useful because it is, say, more creative. That's the whole point of temperature sampling (randomly choosing less likely words as the output sometimes).
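Temperature sampling divides the logits by T before the softmax and then samples from the result. Here is a minimal sketch; the function name is illustrative, and treating T = 0 as greedy decoding is a common convention assumed here.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Sample a token id after dividing logits by the temperature.

    temperature < 1 sharpens the distribution (closer to always picking the top
    token); temperature > 1 flattens it, so less likely tokens get picked more often.
    """
    if temperature <= 0:
        return int(np.argmax(logits))  # T = 0 treated as greedy decoding
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

Usage: `sample_with_temperature([1.0, 3.0, 2.0], 0.7, np.random.default_rng(0))`. At very low temperatures it almost always returns token 1. At high temperatures tokens 0 and 2 show up regularly.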
>>96099586Because he's memeing and it wouldn't work. That's basically the same thing as logit banning "I'm sorry," which you can already do and already doesn't prevent refusals. If your model wants to refuse, it will just use different words to do it.
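Logit banning just forces the banned tokens' logits to negative infinity before sampling. Here is a toy sketch with a made-up three-word vocabulary (not a real tokenizer) that shows the point being made: the probability mass shifts to the next-best token, which can start a refusal just as easily.

```python
import numpy as np

def ban_tokens(logits, banned_ids):
    """Return a copy of the logits with the banned token ids set to -inf."""
    out = np.array(logits, dtype=float)  # copy so the caller's logits are untouched
    out[np.asarray(list(banned_ids), dtype=int)] = -np.inf  # exp(-inf) = 0 after softmax
    return out

# Hypothetical vocabulary and logits for the first token of a reply.
vocab = ["I'm", "Unfortunately", "Sure"]
logits = [5.0, 4.0, 1.0]
banned = ban_tokens(logits, [vocab.index("I'm")])
next_token = vocab[int(np.argmax(banned))]  # greedy pick after the ban
```

Here `next_token` ends up as "Unfortunately", not "Sure".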
>>96099605It does work in CFG. Maybe that's partly because the refusals are coming from the same model, so it's hard for it to word a refusal in a way that's not already accounted for by the negative logits. Not sure it would be the same with a separate model, but assuming the mini model was also trained on gpt4 refusals it very well might.
>>96099593yes, you got it. It all depends on the text/bench you run your perpl/eval against. I agree that the quality may be subjective, but I guess we both agree that the generated text should exhibit a relatively high degree of consistency, especially if it's tuned with various languages. Same with accuracy. We don't want our models to get drummer regardless of the temp and sampling method
>>96099593PPL is most usually measured on wikitext; that's why it's a meme. How likely it is to output the correct answer to a Wikipedia query is not a measure of anything other than edge cases.
>>96099737I meant dumber
>>96099508Then I guess llama 65B/70B is shit, it's over.
>>96099572You didn't even try to say that
>>96099605 relayed: "I'm actually working now on something similar, not quite RLHF, but something to reduce refusals by tuning on the logits directly. but tfw gpu poor so we'll see when it's actually trained, I'm experimenting on super tiny toy models right now, dataset and training code almost done. 13b-chat is the intended target."
>>96087501where can one download Stheno 70b?
hey guys, I have been learning about vector databases and embedding methods and had some questions, if someone could answer:
- why is it that we query stuff with similarity search, pull information from the db, and add it to the context, instead of just handling everything in the decoder layer? Isn't that mathematically better?
- why haven't we added an attention layer to the embedding model when we're embedding big documents, for better context?
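For reference, the retrieval pipeline the first question describes looks roughly like this. The documents are embedded once, the query is embedded, the closest chunks by cosine similarity are pulled, and only those are pasted into the prompt, so the model's context holds a few relevant chunks instead of the whole database. This is a minimal sketch: the embeddings below are made-up toy vectors standing in for a real embedding model, and the function names are hypothetical.

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose embeddings are most cosine-similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(chunk_vecs, dtype=float)
    sims = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]  # indices of the highest similarities first
    return [chunks[i] for i in order]

def build_prompt(question, retrieved):
    # only the retrieved chunks go into the context, not the whole database
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"

chunks = ["cats purr when happy", "gpus are expensive", "llamas spit when annoyed"]
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy embeddings
query_vec = [0.1, 0.9, 0.2]  # pretend embedding of the question below
prompt = build_prompt("why is my hardware so pricey?",
                      top_k_chunks(query_vec, chunk_vecs, chunks, k=2))
```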
>>96099593by accuracy I mean the ability to deliver correct answers, the truth. Also the ability to perform basic logic. In essence, not being dumb or a liar (or extremely biased towards wrong or sketchy mindsets).
>>96099896Accuracy in ML has a more precise technical meaning. It's the % of samples in the dataset that a classifier model predicts correctly.
>>96099535To my understanding, transformer-based audio decoders use, under the hood, one of two representations for a waveform (ignoring any fancy esoteric solutions):
>mel-spectrogram based tokens
which is what tortoise does, and uses a decoder to actually resolve its mel-tokens into an actual mel spectrogram. I imagine its mel-tokens are quite causal.
>Encodec tokens
which a lot of new models are using, which are definitely causal: even just trimming an Encodec sequence causes brief corruption in the audio, and one wrong token will introduce a crackle.
Text LLMs don't necessarily suffer if they happen to pick a wrong token, as they can just branch off into a different way to phrase something (like, This man (is) a faggot, vs This man (def)initely acts like a faggot). The nuance is there, but if a similar thing happened to a token sequence from an audio LM, it's going to sound terrible in the final waveform.
Unless I'm wrong, treating your representation of a waveform as a language task isn't going to do so well when the intermediary doesn't have any room for nuance the way an actual language does.
So, I did a summary of my chat, but what do I do if I need to summarize once again? Do I just add on to the summary in sillytavern or use lorebooks/chromaDB?
>>96099500wym, the weird mlewdboros lora merge thing? my friend said they work just fine with the normal formatting template.
as for others, like mlewd remm and stuff, they work great too, especially with alpaca.
though i found out that no system prompt with only the instruct mode sequences left gives interesting results.
>>96099934in that sense the accuracy and the perplexity are the same. But that's not what I meant. Sorry 4 the confusion, but you got the point.
https://huggingface.co/Undi95/MM-ReMM-L2-20B-GGUF
Do what you want with it, need feedback, negative or positive
>>96100204
>>96100204
>>96100204
>>96100204
bread when ready
>>96100192can you make like 34B or 40B good shit?
>>96100240My Exllama2 file, after 3 days of trying and 4h of quantizing today, doesn't work at all. So fuck EXL2 lmao, I'm sorry.
>>96100249I'm poor anon. I do with what I have and what I can run kek
Maybe in the future
>>96090727
>compress pos emb
Linear scaling is deprecated.
>>96090492
Increase alpha_value until it works.
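For comparison with alpha_value, "compress pos emb" is linear position interpolation: every position index is divided by the factor before the rotary angles are computed. Here is a minimal sketch assuming stock Llama RoPE (base 10000); the names and defaults are illustrative.

```python
import numpy as np

def rope_angles_linear(positions, head_dim, base=10000.0, compress=1.0):
    """RoPE rotation angles with linear position interpolation.

    compress > 1 divides every position index, squeezing a sequence that is
    `compress` times longer into the position range seen during training.
    """
    inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    pos = np.asarray(positions, dtype=float) / compress
    return np.outer(pos, inv_freq)  # shape: (num_positions, head_dim // 2)
```

With compress=2, position 8192 gets exactly the angles that position 4096 had. Because every frequency is squeezed, including the fastest ones, neighbouring tokens also end up closer together in rotary space. That is the usual argument for preferring the NTK/alpha approach, which leaves the highest frequencies alone.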
>>96099281Have a look at download-model.py in booba, seems to work reliably