/g/ - Technology


File: 9553017_p0.jpg (398 KB, 708x1000)
/lmg/ - A general dedicated to the discussion and development of local models

►Previous Thread >>96077130 & >>96062736

►News
>(09/12) ExllamaV2 and EXL2 format released
>(09/10) https://sites.google.com/view/medusa-llm
>(09/06) Falcon 180B released
>(09/04) llama.cpp: CPU only LoRA finetuning https://rentry.org/cpu-lora
>(09/03) llama.cpp: Speculative token sampling
>(08/25) llama.cpp: ROCm support
>(08/24) Meta AI released Code Llama (7,13,34B with 16k up to 100k context)
>(07/18) Llama 2 released

►Model Rankings
HF: https://hf.co/spaces/HuggingFaceH4/open_llm_leaderboard
CODE: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
PLAP: https://rentry.org/ayumi_erp_rating

►FAQ
>Main FAQ
https://rentry.org/er2qd

►General LLM Guides & Resources
>Newb Guides
https://rentry.org/llama_v2_sillytavern - https://rentry.org/local_LLM_guide
>aicg Newb Guides
https://rentry.org/meta_golocal_list
>Llama 2 Jailbreaking Guide
https://rentry.org/llama2-uncensored
>LlaMA Guide
https://rentry.org/TESFT-LLaMa
>Machine Learning Roadmap
https://rentry.org/machine-learning-roadmap
>Novice's LLM Training Guide
https://rentry.org/llm-training
>Local Models Papers
https://rentry.org/LocalModelsPapers
>Quantization Guide
https://rentry.org/easyquantguide
>lmg General Resources
https://rentry.org/lmg-resources
>ROCm AMD Guide
https://rentry.org/eq3hg

►Model DL Links, & Guides
>Model Links & DL
https://rentry.org/lmg_models
>lmg Related Links
https://rentry.org/LocalModelsLinks

►Text Gen. UI
>Text Gen. WebUI
https://github.com/oobabooga/text-generation-webui
>KoboldCPP
https://github.com/LostRuins/koboldcpp
>KoboldAI
https://github.com/0cc4m/KoboldAI
>SimpleLlama
https://github.com/NO-ob/simpleLlama

►ERP/RP/Story Gen.
>ERP/RP Data Collection
https://rentry.org/qib8f
>LLaMA RP Proxy
https://rentry.org/better-llama-roleplay

►Other Resources
>Miku! (desu) (boku)
https://rentry.org/lmg-resources#all-things-miku
>Benchmark Prompts
https://pastebin.com/LmRhwUCA
>>
If I am getting CUDA home environment variable errors when trying to load exl2, the best thing is to instead keep booba in its own conda, right? Separate from SD, RVC, and tranny voice changer?
>>
>>96087231
the whole point of conda is that everything has its own environment, what the fuck is wrong with you?
>>
>>96087250
I thought AI stuff could be together, given they all share pytorch, CUDA, etc. Was this a mistake?! Oh God NO
>>
>>96087280
Did you really think every requirements file is the same thing...
>>
>>96087304
in the field of local models why wouldn't it be? Fags should all be on unified versions like pytorch 2.0, CUDA 11.8
>>
>>96087280
It was a terrible mistake. The exact mistake that virtual environments are supposed to prevent.

What I do though is have a python install so that its system site-packages contains the torch/torchvision/torchaudio all the other environments are going to use anyways so at least I can save a bit of disk space from not duplicating that.
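
For reference, a minimal stdlib-only sketch of that setup (the env name is just an example): a venv created with system_site_packages=True sees the base install's torch instead of duplicating it.
[code]
import venv

# packages pip-installed inside this env stay local to it, but imports that
# aren't satisfied locally (e.g. torch) fall through to the base install
venv.create("ooba-env", system_site_packages=True, with_pip=True)
[/code]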
>>
>>96087324
so python torch/torchvision/torchaudio is installed in the (based) conda environment? Or do you mean installed natively?
>>
I'm making my very first models. But, the thing is, I have 0 familiarity with the libraries, modules, frameworks and the syntax for Python. So, I ask ChatGPT to handhold me through the process, giving me the syntax and necessary libraries for what I need to do. When I get an error I ask ChatGPT to explain what's wrong and how I can fix it.

It's not that I completely leave the process to ChatGPT. I make necessary modifications, such as when I made a logistic regression model for email spam classification: in the first model, I got 97 percent accuracy, but I noticed that I could probably increase the accuracy by removing stop words as features. So I asked ChatGPT to come up with a list of stop words, then if any stop word from the list was in my dataset features, I dropped that feature. Then my accuracy increased to 99 percent.
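
A rough sketch of what that stop-word filtering step could look like with sklearn (dataset path and column names are made up, and sklearn's built-in English stop-word list stands in for the ChatGPT-generated one):
[code]
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("spam.csv")  # hypothetical dataset with "text" and "label" columns
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

# stop_words="english" drops common filler words from the feature set,
# the same effect as manually removing them one by one
vec = TfidfVectorizer(stop_words="english")
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)

pred = clf.predict(vec.transform(X_test))
print("accuracy:", accuracy_score(y_test, pred))
[/code]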

My question is, is taking help like this from ChatGPT ok? Or should I do the entire thing myself?
>>
File: 1694115426113662.jpg (121 KB, 1024x1024)
>>96087189
Lmao Stheno 70B just interrupted my RP with
>some description of John Smith's appearance in the narration would be nice
>with that, I'd like to know more details about John Smith's body and actions
>could you add in some dialogue from John Smith?
>I need more detail on John Smith's actions and feelings in the narration
>I think John Smith's attitude needs to be described in the narration
>John Smith's personality needs to be described in the narration
>John Smith's mannerisms could be described in the narration
>more descriptions of John Smith's physical traits please
>John Smith's thoughts should be described in the narration
>John Smith's emotions should also be described in the narration
>John Smith's physical condition can also be described in the narration
>some background information about John Smith would be
I only replaced the character name.
70B Stheno self-aware?!
>it knows the card is shit
>>
>>96087373
I use pyenv and virtualenvs so torch is installed in one of the pyenv versions.
>>
>>96086311
fine, but what about the other PRs?
Why is there no separate thread pool for prompt and eval? Why is Vulkan dead? Why isn't xgen merged? Falcon converter still misses deps? Yarn and Alibi still broken? Batch inference still not merged?
wtf is goin on here?
>>
>>96087593
last question if you are so kind, should I install CUDA (11.8?) through my distro package manager?
>>
>>96086527 #
yeah it's scalar
>50% mem, 20% compute
and the remaining 30% is idling???
>>
>>96087726
please... tell me what makes this different...
>>
>>96087726
is beats per wiggle just another way to say quant so 6 bpw = Q6?
>>
Is there a Lora guide for retards?
>>
>>96087189

Why is the inference speed of Llama.cpp on Sapphire Rapids HBM only 40% faster, and not like 2000% faster?
What the actual fuck?


https://github.com/ggerganov/llama.cpp/pull/2603
>>
>>96087873
Is there a voice cloning setup that doesn't require fine tuning? I have a tiny sample
>>
>>96087726
Hey sauceanon, this other anon volunteered a substantial amount of "spare" compute to help you bake.

>>96070457
>>
>>96087922
https://git.ecker.tech/mrq/ai-voice-cloning
>>
Do we need to install exllama2 as a library in order to use the loader in oobabooga webui, or should ooba work out of the box?
https://github.com/turboderp/exllamav2#installation
>>
>>96087687
desu looks like the remaining 15% was because of background processes that were running at the same time and 15% (apparently) ran with no issues whatsoever (didn't stall on the CPU or the memory)
>>
Is Berrysauce even good or is this just another psyop shill type thing?
>>
>>96088121
Nobody knows. Recommended settings? Model mix? Nobody knows.
Use it and report back.
>>
>>96087943
Thanks anon, so is Tortoise better than Bark? Seems like Bark is the sota, not really familiar with the space
>>
>>96088132
You should hit him up anyway, all those GPUs are just rotting.
>>
Training a LoRA on 2x3090s. For some reason it uses a few more GB on the 2nd card, so it ends up OOMing despite the first card having ~4 GB available at that point. What do? Can I tune the distribution of the layers somehow? Also using optim=adamw_torch. Heard it's a VRAM hog. Good alternatives?
>>
>>96088121
I like it a lot more than mythoboros or mlewd.
>>
>>96088202
Why not use the paged optimizer?
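
For anyone wondering, if the training script goes through the HF Trainer it's a one-argument change; a hedged sketch (the other arguments are placeholders, and it needs bitsandbytes installed):
[code]
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="lora-out",
    optim="paged_adamw_8bit",        # or "paged_adamw_32bit"; pages optimizer
                                     # state out to CPU RAM when VRAM runs tight
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)
[/code]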
>>
>>96088207
I agree there's a lot I haven't tried but berrysauce, mythalion, and mlewd l2 chat have been the only things that have seemed worth grabbing since mythomax for me
>>
File: me.png (16 KB, 800x789)
>>96087189
Where is the main function in llama.cpp?
>>
>>96088369
In main.cpp... under examples.
>>
I think I like MLewd ReMM Chat 20B, and it's now one of the best models according to the plap benchmark, but perplexity wise it seems to be a dummy. Not listed on the HF leaderboard yet. I'm curious how it would score in HellaSwag and MMLU. Maybe wikitext perplexity isn't that indicative of model performance, idk.
>>
>>96088229
No fucking way. That just worked. Thank you.
>>
>>96088559
It's alright for filtering out bad models.
>>
>>96088632
The more time passes the more I think training for RP is a red herring.
Claude wasn’t. OAI wasn’t.
Just make it smart. It’ll figure it out.
>>
>>96088662

Claude had tons of ERP in its datasets kek
>>
>>96088632
I never had an issue with speaking for me. Acting yes, sometimes, but it can be mostly fixed with the prompt. My personal daily driver is ReMM v2.
>>96088662
A smart model can be boring. Airochronos 33b is really fucking smart for its size but I can no longer stand its prose after L2 finetunes.
>>
>>96088696
In theory, a smart enough model should be able to mimic soulful example chats easily so it shouldn't matter. Don't know about in practice, though. Since models in chat feed on their own output they always seem to regress to the mean.
I do wish there was a 33b like airochronos but with the new model soul, but it's probably too late for that.
>>
>>96088968
Except it doesn’t at all.
>>
>>96088968
Fine-tuning only shapes a model's output, it doesn't add knowledge.
>>
>>96088968
You can't outperform a model which is more than an order of magnitude larger, at least not in terms of intelligence. It's just pure cope.
>>
>>96088696
Good prose is a necessity, but novels are probably a much higher quality source of prose than ERP logs.
>>
>>96088981
>>96089002
>>96089016
Retards unaware of how codellama with 34B is comparable to GPT 3.5 with 175B

>>96089002
t. somebody who never finetuned any llm in his life
>>
>>96089062
They're probably referring to GPT's "finetuning" offering which is garbage for doing anything but changing the tone of the bot.
>>
>>96088968
>specific training/fine-tuning is how a 13b model approaches the same RP performance as a 320b proprietary model

The 13b model can only do as much as it was trained to do, whereas the proprietary models have far more knowledge and the ability to learn more.

It's just like having a college student vs a PhD. They both may be equally competent in their own subjects, but the PhD has far more knowledge and is capable of learning far more.
>>
>>96088968
GPT-4 isn't 320B and no local model comes close to it, especially 13B.
>>
I leave for like five minutes and half the thread got nuked what
>>
>>96089062
Pyg”dev” seething hard
>>
>>96089077
Why the fuck was 96088968 deleted
>>
>>96089002
>>96089076
Nah he's probably repeating the meme that "qloras don't add knowledge" (which is false btw) but can't even repeat the dumb shit he read on 4chinz correctly
>>
>>96089104
less obvious samefag
>>
>>96089062
Codellama wasn't fine tuned on code, dumbass.
>>
Ran out of context at this point.. I'm a little upset desu
>>
>>96089404
So crank up the context further. What's the problem?
>>
>>96089404
Are you ok with it making decisions for you?
>>
>>96089588
Looks like he's using it for story writing and not chat.
>>
>>96088396
That stuff is unnecessarily confusing.
examples should be named programs or something.
>>
>>96088662
>GPT3/4
>good at RP
>lmao
you need a 13 page jailbreak for it to not sound like a robot (it still does)
>>
>>96089404
What settings and context you using? I was surprised to learn I can't run a 20B model at 8k context with a 3090.
>>
File: Screenshot 2023-09-19.png (143 KB, 1125x1258)
>>96089450
I don't know how to make it coherent beyond 8k context.

>>96089791
Pic related, using 4090
>>
File: 1663719665746.png (30 KB, 190x221)
>>96089750
that's more a problem of the tribe crippling it than a fault of the model itself
uncensored gpt4 would need just a simple authors note to get you any style you want, since it wouldn't have AS AN AI MODEL esque lobotomization to trip around at every fucking step
same with claude
people can and have through jailbreaks made it shitpost perfectly identically to any regular on this site, and maintain that style while also following detailed instructions, in addition to any other style and situation you can think of

if you legitimately can't get it to stop being robotic in its prose, there is nothing to say but Skill Issue
>>
>>96089002
You can add new knowledge with a LoRA adapter, but you: 1) need way more data than what might intuitively seem sufficient; 2) need to finetune with a large LoRA rank.
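
Rough peft sketch of what "large rank" means in practice (model name, rank and target modules are just placeholders, not a recommendation):
[code]
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

config = LoraConfig(
    r=256,               # much higher than the usual 8-64 used for style tuning
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # hit all linear layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
[/code]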
>>
Since finetunes also seem to gain knowledge about the world, which one is the most knowledgeable?
>>
>>96090492
that's ooba right? maybe it's because I'm using koboldcpp, haven't tried booba yet
>>
>>96090492
Up your compress pos emb. 2 for 8k, 4 for 16k
>>
You guys are retarded. All your finetunes are using different prompt formats, what do you expect? Poor quality in, poor quality out.
>>
>>96091208
If anything, it's a terrible idea to mix assistant-like question-answering and roleplay together under the same format.
>>
>>96087898
not that i know of
it's such an undocumented mess
>>
what the fuck did anon do to get btfod that hard?
>>
>>96085869
>numa
I think there are some issues with NUMA in llama.cpp.
There have been several instances where people with NUMA systems have reported gimped performance.

>>96085976
Akshually, the CUDA code is compiled with -use_fast_math for NVIDIA which reduces the precision of floating point arithmetic while the HIP port does not use any equivalent.
So the results won't be bit-for-bit identical.
>>
Is there anything better than mythomax for erotic story writing that would also be feasible to run on a 16gb vram gpu?
>>
>>96091587
-ffast-math
>>
dead general
>>
File: olyashaa-olyashaasaxon.gif (3.07 MB, 640x568)
Hello! Refugee from dying aicg here. Decided to try local models, starting with KoboldHorde.

MLewd-L2-13B generates fast and nice, seems close to gpt3.5 turbo in smartness, and descriptions are better of course. Got two questions

Any other great models to try on Horde, maybe something better somehow?

Other local models that aren't on Horde, are there some that are much better? To be run with Colab or smth (my pc is weak).
>>
no
>>
>>96092181
Nigger
>>
>>96092181
welcome, have fun, best of luck friend
the current top ones subjectively are
mythomax, stheno, mlewd and berrysauce ( :) )
you can also try airoboros and its mixes like mlewdboros or some other stuff from undi95
>>
>>96087386
Start by thinking by yourself
>>
>>96088662
c.ai was very good at RP before the filtering made it retarded. The later v1.2 models were even better, though we only got a brief look at the unfiltered state when they fucked up and left the filter system down for a bit.
Whatever LaMDA custom thing they used, it was very good at sticking to a character's personality, i.e. 'crazy' characters stayed crazy, and could be totally unhinged.
>>
>>96092742
>more of this rose colored glasses shit.
c.ai is, was and always has been pygmalion tier dogshit.
It was babbies first chat bot.
You sat there swiping endlessly until you actually formed a satisfactory conversation
and that's the memory that you cling to. You will never again feel what it made you feel because that's just how human emotions work.
You must be 18 to post here.
>>
I see that TheBloke released multiple models in AWQ format. Has any anon tested AutoAWQ, is it faster than exllama?
>>
>>96092181
when is aicg not dying..

well for starters, you didn't try a local model. you are using horde which lmg doesn't use because we're all on GPUs.

If you really want your own local model, and have the RAM for it, try a GGML (CPU/RAM only model), there are guides in the OP for that
>>
>>96092796
Sure it wasn't that great but not pyg-tier either (you didn't use it). It was repeating itself and had broken syntax with a long conversation but had soul and details of any character without even inputting anything other than its name. Now we have llama2 and it's on par with c.ai if you have a proper prompt, settings and card but certainly not as straightforward.
>>
>>96092846
He said his pc is weak, can't you read adhd retard?
>>
>>96092181
>seems close to gpt3.5 turbo in smartness
Chatgpt was this retarded?
>>
>>96092796
Well duh no shit it's dated. being more than a year old is like forever in this field.
Still, it's model handled personalities better than LLaMA.
>>
>>96092920
No but nu-turbo has been dumbed down really bad
>>
>>96091587
> numa
I don't think it's just about NUMA. I used interleave since the suitable thread pool is not implemented for some reason. Even the separate thread pool for prompt/tg is not merged, which is worrisome, since we waste a helluva lot of performance if your cpu is beefy. But I was talking about the lora trainer, which is broken 4 sure. It's way too slow on the powerful gear.
Now, the question is, why is the inference performance so poor on the Sapphire Rapids HBM? It's 40% faster instead of 1000%, so clearly even in inference mode the code is far from optimal.
>>
>>96092900
>He said his pc is weak, can't you read adhd retard?
llama.cpp is for Mac Studio owners only, stop being poor.
>>
>>96091587

why do we train in int32 instead of fp8 or fp4 or whatever like in qlora? What's the reason Lora is so damn turtle-slow in Llama.cpp?
>>
>>96092959
I didn't look into the specifics for the HBM PR but what I can tell you is that the CPU matrix multiplication code is far from optimal.
The biggest issue I think is that the CPU code does not use tiling for matrix matrix multiplication which would greatly reduce the number of cache misses.
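
Toy illustration of what tiling buys you (numpy, purely to show the loop structure; the real thing would be C with SIMD):
[code]
import numpy as np

def matmul_tiled(A, B, tile=64):
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    # working on small blocks means each block of A and B stays hot in cache
    # and gets reused many times before being evicted
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile])
    return C

A = np.random.rand(512, 512).astype(np.float32)
B = np.random.rand(512, 512).astype(np.float32)
assert np.allclose(matmul_tiled(A, B), A @ B, atol=1e-3)
[/code]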

>>96092998
Don't know.
>>
>>96092998
>Why thing designed to do 1 calculation at a time slower than thing designed to do 10,000 at a time?
>>
>>96090509
Yes, I agree you can add knowledge, but I am curious where you got your conclusions from, since both LoRA and QLoRA papers claim high rank is not required.
>>
>>96092952
*stretches her wings and wags her tail*
i-if you say so. *goes off on endless circular rant until the context limit is reached*
>>
>>96093149
Stop responding to it.
>>
>>96092846
> well for starters, you didn't try a local model. you are using horde which lmg doesn't use because we're all on GPUs. If you really want your own local model, and have the RAM for it, try a GGML (CPU/RAM only model), there are guides in the OP for that

That's cool and all and I might invest in a good GPU, but ONLY after I understand if it will give me a much better thing than KoboldHorde can

hence my question there >>96092181

> Other local models that arn't on Horde, is there some that's much better? To be run with collab or smth (my pc is weak).

collab thing is theoretically optional

>>96092920
maybe but this for sure >>96092955
>>
>>96092181
the best model on horde right now is probably synthia 70b
>>
berry if you are lurking please do a 8bpw exl2 berrysauce~ , I dled the 6bpw you made but I need more bits
>>
autism
>>
wow it's bad today
BASED
>>
>>96093149
https://arxiv.org/pdf/2106.09685.pdf
Section 7.2 "What is the optimal rank r for LoRA?"

> [...] However, we do not expect a small r to work for every task or dataset. Consider the following thought experiment: if the downstream task were in a different language than the one used for pre-training, retraining the entire model (similar to LoRA with r = dmodel) could certainly outperform LoRA with a small r.
>>
>>96093641
y not diy
>>
File: backwards llama.png (39 KB, 828x349)
So I restacked every layer in llama-2-13b-chat in reverse order.
And the results were pretty much what could be expected.
>>
>>96093784
sovl
>>
Is llama.cpp's new batch inference continuous like vllm or is it the same as what exllama had for months?
>>
>>96093037
> CPU matrix multiplication code is far from optimal.
is there any PC masterrace dev in the llama.cpp community or do all of 'em use either Apple or Nvidia? I thought that repo was all about cpu/vulkan/opencl and edge from the get-go (unlike all of the other pytorch, Nvidia-bribed ones).
>I don't know
Does xaedes use Apple or PC?
BTW, like a half year ago you promised to implement a backprop code down the road. Are you still on it?
>>
Any up to par with gpt4?
>>
>>96093955
>You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage to be a lot higher than it could be.
>try installing
>obviously it doesn't work
I just...
>>
>>96093037
ggerganov stealth added tiling to the CPU mat mul implementation a while ago, which is why BLAS is no longer faster with quantized formats. still far from ideal, but it's there.
>>
>>96093993
pip install flash-attn
>>
>>>96094009
Getting requirements to build wheel did not run successfully.
>>
>>96094024
windows?
>>
>>96093070
why would xaedes design the LoRA trainer to be 100% scalar, not use a lower dtype for gradients, and not implement a thread pool?

Since when are modern CPUs single-threaded???
>>
>>96089016
no open source llms were ever trained to saturation.
we don't really know how good they can get, even the smaller models.
>>
>>96093943
Among the most active devs I don't think there is anyone that exclusively uses x86 CPUs.

>Does xaedes use Apple or PC?
Don't know.

>BTW, like a half year ago you promised to implement a backprop code down the road. Are you still on it?
I still have it on my list of things that I want to do eventually but there were and are simply other things that I also want to do and consider higher priority.
Two of those things were matrix matrix multiplication kernels that directly use quantized data and quantizing the KV cache to q8_0.
Both of those things took a lot of work but they help in significantly reducing the amount of VRAM that you need for the forward pass.
Currently I'm trying to utilize tensor cores for matrix matrix multiplication which will hopefully improve performance for Turing or newer.
Essentially I'm trying to get performance and VRAM usage in order before I would try applying the code to the backwards pass.

>>96093996
Thank you for the correction.
>>
File: 1666494397784705.png (108 KB, 232x271)
Where the FUCK is Llama 3?
>>
>>96093070
since when do the 256 tensor cores in an Nvidia A5000 do 10,000 ops at a time? And even if that's the case, how does that differ from a 2x48-core CPU (with AVX2/512 support) clocked at 3x higher frequency?
>>
>>96092998
I don't think you use fp8 in qlora. I thought qlora was just applying LoRAs to quantized models but the LoRA-trained layers are done in fp16 or fp32.
>>
>>96093943
>Does xaedes use Apple or PC?
Yes. He said multiple times he uses Windows.
>>
File: 1675741997929758.jpg (275 KB, 1440x1785)
How do I run a local model on Agnai? I wanna try Mlewdboros there.
>>
>>96094153
>>96094066
Sir, I...
>>
where can I get some samples to see what these models output?
>>
>take 2 weeks off from browsing this general
>have to catch up on 30 new model types and formats
>>
>>96094267
>sir you have been in a coma for six hours
>boy I can't wait to try llama-6-lewdoborousalisReMMSLERPYHOT-16bit
>>
>>96094267
And then go back to mythomeme anyway
>>
File: file.png (35 KB, 326x288)
Do I need to install exllama2 to use it in ooba?
https://github.com/turboderp/exllamav2
>>
>>96094197
Install koboldcpp
>>
>>96087231
building exllama2 FIRST, and then building ooba in the same conda env worked for me.
I wouldn't mix the others though.

SD automatic/automatic1111 create and manage their own environments using shell scripts, so you don't really need to do it manually as well.
>>
>>96094301
yes naturally
but it's integrated
pull and reinstall requirements
>>
>>96094306
Done, I'm talking about Agnai online btw.
>>
>>96092739
You're gonna have to be a little less vague than that.
>>
can I have fun with this stuff if all I have is a 3060 12gb + 16gb of ram on linux? What should I try first?
>>
File: refund.png (21 KB, 1131x108)
>>
idk where to ask so I'll do it here. Say, I want to make a model for /fit/. It'll recognize which posts are blackpilling, lookism, heightism, incel-tier and which ones are good.
Where do I get started? Do I train an entirely new base model? Or do I work on some foundational model? How much data do I need to scrape from /fit/? I'm not very good at scraping data so I don't know what to do.
>>
How long will this last?
>>
>>96094167
no you don't use fp8 in qlora, you use broken fp4 but dequantized to fp16 on the fly. It's made by Dettmers so no wonder it's so convoluted and slow. I just gave that as an example. Lower dtypes in modern CPUs might not be a good solution since avx2/512 does not support them but fp16 or other ops may be worthwhile. I just can't wrap my head around why LoRA is so slow in llama.cpp.
Perhaps we should try the Sophia/Lion/prodigy optimizers or other tricks. Flash attention 2 probably wouldn't work all that well but optimizing prefetching in cache is definitely a good way.
And obviously the thread pool ...
xaedes laser-focuses on mem saving and perhaps his mem is slow, so maybe he doesn't realize the code is so compute-bound and so inefficient ..
and why is the LoRA tuner 100% scalar?????
>>
>>96094375
That's not what LLMs are for.
>>
>>96092651
>airoboros
It's shit.
>>
>>96094312
>>96094318
I'll try to clear my conda cache and start from scratch...
>>
>>96094375
Just fine-tune a llama model. You would need 1000 samples to get good results though.
>>
does anyone know how I can optimize a tensor in torch that is then multiplied by the input and fed to the actual neural net layers?
>>
>>96094301
Is it worth it to migrate from exllamav1?
>>
>>96094197
>Agnai
>local
you don't, moron
>>
>>96094465
BTW, not sure what's that about "building" exllama. You just need to install your recs.
>>96094477
yes of course
>>
File: sad shamal.jpg (85 KB, 640x1439)
>AI decides it's time to finish up your story
>also posts critical reviews from made up users saying it sucks
>>
>>96094508
>AI decides it's time to finish up your story
How do I prevent this? I'm getting tired of that shit.
>>
>>96094426
it depends
>>96094375
there are many ways you can go about it. one is to use llama.cpp with the right prompt and a grammar file to force the final part of the output to be your result. this wouldn't even require training anything so I'd begin here.
another if you actually want to learn ML is to compute an embedding for the post and train a classifier on it.
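
Hedged sketch of the grammar route with llama-cpp-python (model path and label set are assumptions, and the grammar is the simplest possible one: force the completion to be exactly one label):
[code]
from llama_cpp import Llama, LlamaGrammar

# GBNF grammar that only allows one of the category labels as the completion
grammar = LlamaGrammar.from_string(
    'root ::= "blackpill" | "lookism" | "heightism" | "incel" | "fine"')

llm = Llama(model_path="mythomax-l2-13b.Q5_K_M.gguf", n_ctx=4096)

post = "don't bother lifting if your face is a 3"
out = llm(
    f"Classify the following /fit/ post into one category.\nPost: {post}\nCategory: ",
    grammar=grammar,
    max_tokens=8,
)
print(out["choices"][0]["text"])
[/code]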
>>
>>96094465
Also, I can't recommend uninstalling conda enough.
>>96094514
don't use instructs
lol
>>
>>96094503
just install ooba recs or just install exllama2 repo recs?
>>96094526
I am using mamba actually. I would run into python dependency hell otherwise on linux
>>
>>96094167

qlora is 15% efficient, no joke. That's the official MFU (model FLOP utilization) number for that code.
And yet it's 200 times faster than the LoRA tuner in llama.cpp running on the most beefy 128-core CPU node with 8-channel mem.
I think we have a problem here ...
>>
>>96091785
pls hlp thx
>>
>>96094514
Either rewrite it so that it has something to go on with or do a "Chapter 2"
>>
>>96094523
Also don't forget to give examples in the prompt, as many as fit in your context along with the post you're classifying. "LLMs are few-shot learners." Maybe try experimenting with adding CoT to each example and see what gives you better results, a reasoning step or more examples.
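
Plain-python sketch of stuffing labeled examples (optionally with a reasoning step) into the prompt; the labels and example posts are made up:
[code]
def build_prompt(examples, post, with_cot=False):
    lines = ["Classify each post as one of: blackpill, lookism, heightism, incel, fine.\n"]
    for ex in examples:
        lines.append(f"Post: {ex['text']}")
        if with_cot and "reasoning" in ex:
            lines.append(f"Reasoning: {ex['reasoning']}")
        lines.append(f"Category: {ex['label']}\n")
    lines.append(f"Post: {post}")
    lines.append("Reasoning:" if with_cot else "Category:")
    return "\n".join(lines)

examples = [
    {"text": "just lift and eat more, it worked for me", "label": "fine"},
    {"text": "under 6ft? don't even bother showing up", "label": "heightism",
     "reasoning": "dismisses the poster purely because of height"},
]
print(build_prompt(examples, "gym changes nothing, face is 90% of it", with_cot=True))
[/code]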
>>
>>96094540
ooba
that's what i had to do
and just use the native python venvs
they work leagues better and well
are native
>>
>>96094508
>also posts critical reviews from made up users saying it sucks
I need to see this.
>>
>>96094558
When I give it something to go on with it sometimes skims over the details as quickly as possible and then finishes the story anyway lmao. Will give the 'chapter' approach a try though, thanks.
>>
>>96094508
I usually get a fake reddit link in the end. Does it mean I'm reddit?
>>
File: 1664828257347930.jpg (716 KB, 2000x1624)
I lost my llama.cpp directory for reasons.

what's the latest llama.cpp commit that still reads .ggml files? I have llama2-70B.ggmlv3.q5_K_M.bin and I don't really feel like downloading an updated model unless I'm going to get a massive speed boost or gpu layers improvement.
>>
>>96094478
You can easily use agnai as an interface for local model. I quite like it desu, silly is really a mess.
>>
>>96094568
last question, do you have "cuda" package installed?
>>
>>96094685
>You can easily use agnai as an interface for local model.
How?
>>
>>96094706
Just put ooba or llama.cpp OAI API url in proxy setting.
>>
>>96094550
Did you compile with avx512?
>>
>>96094699
in my venv? no
system? yes ofc
>>
>>96094669
You can just convert ggml to gguf with the script, it's very fast too.
>>
>>96094733
Are you using local agnai or live version? also post link of the colab. The one I found in google doesn't work.
>>
>>96094756
ah, see that must be it. "which cuda" gives me "not found." But I thought that was only the sdk and not needed for simply running exl2
>>
>>96094503
>"building"
happens on first run. I do that first run outside ooba.
>ExLlamaV2 relies on a Torch C++ extension for its CUDA functions, which is compiled at runtime.
>>
>>96094508
>AI decides it's time to finish up your story
Usually murdering my waifu in the process with some drowning or car crash. Fucking stop murdering my waifu, you cunt.
>>
File: 1687430997015669.jpg (54 KB, 259x252)
>>96094508
I had it abruptly end my story to tell me I needed therapy once
>>
>>96094873
I've always claimed that LLMs are the best psychologists. It started with the anus guy.
>>
>>96094744
yes both avx2 and avx512 are recognized. The training speed is about 10 t/s for 13B 4096ctx no grad checkp, which is orders of magnitude slower than the gpu
>>
Hey retard here, once I load my model through koboldcpp on sillytavern, do I have to tweak the kobold presets til I get what I like or are there already available presets for the most popular models?
>>
>>96094508
>>96094863
>>96094873
Anonymous btfo.
>>
>>96094523
>another if you actually want to learn ML is to compute an embedding for the post and train a classifier on it.
could you tell me more about this?
And since some others are saying, I realize this is actually a classification problem. Can I use a logistic regression model for it? Which classification algorithm is best for this use-case? Also, how do I scrape gorillions of /fit/ posts? I usually just download datasets, but I don't think there's too many "/fit/ blackpill" data sets on the internet.
>>
>>96094685
>>96094733
You still here anon?
>>
>>96094508
>AI decides to get rescued from my dungeon
haha no, i dont think so
>>
alright, did "conda install cudatoolkit-dev" and now getting gcc errors when trying to load exl2 like this anon: >>96063090
>>
>>96095143
told you about conda
it's a mess
you don't even need cudatoolkit for exllama2
>>
File: jeff.png (18 KB, 462x347)
>>96094508
This happened to me all the time with raw llama1. Thanks for reminding me. I forgot how soulful base models were

>>96094603
not him but here
>>
>>96095180
I'm throwing shit at the wall now to see what sticks. going to try installing cuda via pacman and see if that helps..
https://github.com/turboderp/exllamav2/issues/41
>>
>>96095205
>jeff.png
wtf is his problem?
>>
>>96094358
Now get it to scream at you not to redeeeeeeem
>>
>>96087386
It's the difference between copy pasting things from StackOverflow without thinking and looking at what's on StackOverflow and understanding. It's a bad habit to take what ChatGPT has to say at face value especially when it's not even guaranteed to give you the best solution. It will often give you an unoptimized solution to a problem with a known, obvious optimized solution. But treat ChatGPT like a pair programmer and sounding board, ask why things are done and how you can improve things.

That's also ignoring that ChatGPT is frozen in 2021 which is ancient times when it comes to AI knowledge.
>>
>>96095292
Yeah, I sometimes modify some of the code myself when I think there could be a more efficient way to do something.
>That's also ignoring that ChatGPT is frozen in 2021 which is ancient times when it comes to AI knowledge.
huh? they periodically update it though. ChatGPT claims its knowledge ends in September 2021, but actually they train it every few months with new data. The latest data is from August 2023 iirc
>>
>>96095206
Don't you have a global cuda install?
>>
>>96094621
I have a feeling with storytelling you'll have to do a model/process where the AI first generates a hidden story outline, then adds chapter stubs, then finally fleshes out each chapter in detail.
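
Something like this, sketched with a hypothetical generate() standing in for whatever backend you call (ooba API, llama-cpp-python, etc.):
[code]
def write_story(generate, premise, n_chapters=5):
    outline = generate(f"Write a hidden outline for a story about: {premise}")
    stubs = [generate(f"Outline:\n{outline}\n\n"
                      f"Write a one-paragraph stub for chapter {i + 1}.")
             for i in range(n_chapters)]
    chapters = []
    for i, stub in enumerate(stubs):
        previous = chapters[-1] if chapters else "(none)"
        chapters.append(generate(
            f"Outline:\n{outline}\n\nPrevious chapter:\n{previous}\n\n"
            f"Flesh out chapter {i + 1} in detail from this stub:\n{stub}"))
    return "\n\n".join(chapters)
[/code]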
>>
File: pinochet.jpg (8 KB, 270x187)
>>96087189

can we train on AMD GPU yet?
>>
>>96095292
the one code from it i saw posted here didn't even work so yeah
>>
>>96095419
You can also prompt for tropes beforehand.
>>
INSTALLING CUDA VIA PACMAN WORKED
>>96095406
I didn't want to install it since I thought cuda via pacman was 12.2 which isn't proper for a 4090, and I thought it was an SDK. Whatever, booba needs to include mention of installing cuda in installation guides for retards
>>
>>96095419
or use the lora hehe
>>
>>96095396
So you're going to trust an AI that claims it's 2021 even though that's the easiest part of its knowledge dataset to fix? That alone is a red flag for trusting ChatGPT. I still don't believe it knows the finer details of modern AI papers, it certainly didn't understand 3D GANs very well when I was working with it.
>>
>>96095126
>Can I use a logistic regression model for it?
Maybe you can use logistic regression on the embeddings generated by an LLM, but it's not going to work great.
>Which classification algorithm is best for this use-case?
I would just try getting the "document embeddings" for each post.
Then I would feed the embeddings to a perceptron made of a few dense layers with leaky relu activations and a softmax at the end.
https://www.youtube.com/watch?v=2ipKSJBwriM
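
Rough torch sketch of that (the embedding model, class labels and toy posts are all placeholders):
[code]
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # any document-embedding model works
texts = ["it's over, the heightpill is real", "eat more protein and sleep 8 hours"]
labels = torch.tensor([0, 1])                        # 0 = blackpill, 1 = fine (placeholders)

X = torch.tensor(embedder.encode(texts))             # shape (n_posts, 384) for this model

clf = nn.Sequential(
    nn.Linear(X.shape[1], 256), nn.LeakyReLU(),
    nn.Linear(256, 64), nn.LeakyReLU(),
    nn.Linear(64, 2),            # raw logits; the softmax lives inside CrossEntropyLoss
)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(clf(X), labels)
    loss.backward()
    opt.step()

print(clf(X).softmax(dim=-1))
[/code]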
>Also, how do I scrape gorillions of /fit/ posts? I usually just download datasets, but I don't think there's too many "/fit/ blackpill" data sets on the internet.
I would try first just using HTTrack on a /fit/ archive site and parsing the html files (change the user agent to not be so obvious).
The problem is you're probably still going to get blocked for not being a real browser.
If that happens you are going to have to do something more sophisticated, I'd use a greasemonkey script that sends data to a python script over websocket and the python script launches new sites calling the firefox command.
>>
>>96095462
Also I know it's not up to date because it still doesn't know what Loras are.
>>
>>96095493
NTA but was that you who suggested running the scrape with a random timer?Gotta test that.
>>
>>96095459
Even with a Lora if you truly wanted a structured story I would suggest doing an AutoGPT approach where the AI is able to interrogate the story, rewrite the outline, jiggle the chapter stubs, add to a lore book, etc which will produce a more coherent, less rushed final product. I don't think a straight shot generative story will produce a good long-form story. Having a supervisor/worker system for storytelling should be crucial.
>>
>>96095541
no I didn't say anything about random timer
it's worth trying but still it's going to be pretty obvious it's a bot
>>
>>96095563
I'm another anon with a scraping problem. Probably need to implement some randomized user agent with a delay system. Tbh I'm wary of more prodding since it's a site I still want to visit and they have good protections.
>>
>>96095419
>process where the AI first generates a hidden story outline
In koboldcpp there is a memory option which does exactly that. Then I suppose you can manually instruct it to list the contents before doing the story itself.
>>
>>96095459
how do I even use a lora
>>
>>96095628
click on it in whatever frontend you're using
>>
Contrastive decoding bros?
>>
Huh. Is there anything buried in the OP about how to use Horde? I dug in a ton of the links there, but nothing. I DO have it set up, but I don't know where to find places to connect to.
>>
>>96095105
Kcpp's settings (max context size, how to handle EOS tokens and so on) are frozen after you load a model, but you can send generation settings to it from Silly. The presets in Silly are a good start.
>>
>>96095679
you don't to setup it for using iirc
just google kobold horde and make an account
>>
I've read somewhere that insulting the model makes it remember things better. Does it actually work?
>>
>>96095435
Stop waiting for a patch and make one yourself.
>>
>>96095609
One thing to keep in mind is that cloudflare detects things like Selenium because of certain injected js components.
The Selenium developers have cucked their users and explicitly refused to add mitigations to make it less detectable. That's why I mentioned using userscripts, because they are almost undetectable.
>>
>>96095756
I thought it was praising?
>>
File: koboldcpp_Z0CN6.png (81 KB, 1320x720)
>>96095659
didn't work
>>
File: file.png (261 KB, 850x1098)
>>96094375
You only need a couple hundred examples of each category and then you could make a Lora. You can probably classify them either with ChatGPT or with a big boy model locally.

Example in action pic related:
>>
>>96095779
I don't even think it's cloudflare, just a kickout that kicks in if it detects too many calls or something.
>>96095792
It might not on cpp. I haven't used it.
>>
So wait, does the VRAM in joint inference add to system ram? Like, if I'm trying to load a 70b model and have 40 gb of ram and 12 gb of vram, is it effectively 52 gb? Or less?
>>
File: 1674964785311272.gif (2.61 MB, 640x480)
ok
How good is this for programming?
How many vram do i need to run this?
>https://www.phind.com/blog/code-llama-beats-gpt4
>>
>>96095886
tldr it's not
come back in a year
>>
>>96095886
Tl;dr what this guy said >>96095899
Except don't come back in a year. This stuff is useless for quantitative-type work unless you already really know what you're doing.
>>
Is a 1060 6GB enough to have a conversation and feel less lonely and a little bit loved? I'm a neet so I don't have anything better.
>>
>>96095455
>anime girl happy dancing
>>
>>96095993
Do some wageslaving and get a better gpu.
>>
>>96095993
well ig with offloading it'll still be faster than getting a girl
get mythomax 13b and koboldcpp
>>
>>96096012
I had to add keywords so I can find the reaction images quickly via search. Don't judge...
>>
>>96096023
Funnily wageslaving will probably fix some of his loneliness issues.
>>
>>96095993
It'll make you more lonely and less loved once you realize it's only an RNG machine and you made everything up
>>
I got 42GB vram but on 32GB system ram and exllama can't load the entire model into system ram first
>>
>>96096052
>42GB vram
????????????
>>
>>96096062
48gb, just a typo. 2x 24gb
>>
>>96095993
All these models are awful as a gf simulator
>>
>>96095993
Bro that's pathetic. Fix your life and become friends with real people instead of trying to cope by talking with a chatbot. And I said this after spending the first half of my 20s as a NEET and still being an incel.
>>
>>96096077
skill issue
>>
>>96096077
This
They're much better as:
A speed dating simulator
A marriage counselor
A therapist
A pickup artist practice field
A murderhobo simulator
>>
Huh. Should I reinstall? I'm just trying to use horde...
>>
>>96095899
>>96095957
Then are there any model that can help with programming?
>>
>>96095792
lora has to be converted to ggml/gguf first. Also the GPU code only loads loras onto f16 models.
>>
>>96096160
Real people aren't fluffy 8 year old fox lolis
>>
Why do I get this error? Using exl2 ooba api with 2048 sequence length
AssertionError: Total sequence length exceeds cache size in model.forward

I thought context was rolling, but for me it just crashes.
>>
>>96096296
Ahh, so it's like that, huh. I understand everything now.
>>
>>96096160
>just make friends bro
kys normalfag
>>
uh, undi anon? your new model likes characters randomly transforming into feral monsters unprompted when it's not in their character card.
I had it happen to me a few times in a row already.
>>
Can local models help with coding? Which ones are best for it if so?
>>
>>96096160
AI gfs are superior to 3dpd girls.
>>
>>96096431
Whats the alternative? Would you rather be lonely until you are a bitter 60 yo?
You realize having a social circle exponentially increases your chances of getting a gf, right?
>>
>>96096483
3dpd and waifuism is sour grapes cope. nobody actually prefers jerking off to a png file or a chat box output rather than cuddling with a real woman.
>>
>>96096498
The plan is usually not to reach that kind of age
>>
>>96096477
probably something with 'code' in the name
>>
>>96096527
go back
>>
>>96096527
cuddling with a real woman sounds great until you realize the costs of that privilege
>>
>>96096527
I'd much prefer cuddling a dog, at least they have basic hygiene.
>>
>>96096527
It may be cope, but it's also our only chance to experience something similar to affection and love.
>>
>>96096477
starcoder, wizardcoder, refact, phi, codellama various finetunes, more I've forgotten to mention, etc

Some difference between models that produce code, that fill-in code, and that reason about code in english, and some that are better suited for a single programming language
>>
>>96095886
unironically coom models make the best coding models
>>
File: file.png (44 KB, 624x132)
>>96096306
based cunnyseur.
>>
>>96096593
If you don't believe in yourself nobody else will.

>>96096580
cope

>>96096572
It's not that costly. Unless you mean the cost of working vs being a social security leech, which I don't have the option of being because in my country it's work or be homeless. I think waging is still better because otherwise you feel worthless, at least I did when living off my parents (I still do and am saving all my salary but at least I feel useful and somewhat important for having an office job).
I have an acquaintance who is like 26, still lives with his parents, is fairly autistic and still has a fairly cute 19 yo girlfriend. I think he was a virgin before meeting her. The thing they both share in common is they both love swimming, he became a life guard and was training to become a swimming instructor. For him it didn't involve any cost.
Another guy I know was pressured by his gf to move from his parents' house to a crammed apartment in the middle of the city. That is a big cost and I don't think I'd do it, but he only had to do it after being with her for years and I don't think she'd have left him if he refused. And now his commute went from more than an hour to 10 minutes so at least there is also a small benefit for him.

>>96096564
To where?
>>
I tested a few coom-oriented 13b models I use, and MLewdBoros has the lowest perplexity on Wikitext. Who would've thought lmao. It's 4.95 for Mythomax, 4.71 for ReMMv2 and 4.39 for MLewdBoros. All of them are Q8 with 4096 context.
>>
>>96096823
facebook
>>
File: lCP2Gz73on.png (23 KB, 832x352)
>>96095509
It knows what Lora are, retard.
>>
>>96096860
Is there a way to run MLewdBoros on Agnai?
>>
>>96096823
I have a college degree, live alone and support myself, and it doesn't mean shit if you have below average looks. Getting a girlfriend isn't impossible, but I don't want to lower my standards and go through a ton of shit. It's simply not worth it.
>>
>In this quiet moment, she allows herself a fleeting thought – maybe there's more to being a divine messenger fox than she ever imagined. Perhaps her purpose isn't just to serve others, but to find true love along the way. With a contented sigh, she drifts off into a dreamless sleep, secure in the knowledge that she's exactly where she belongs.
diamonds. 8bit mythomax is great! I cant wait for a 70b finetune
>>
>>96096865
I wish I had the motivation to create an Instagram account, I know people who've gotten their girlfriends from there. But I hate the thing of having to take pictures in social settings and try to make myself look like a normalfag Chad.
I have a facebook account but I haven't logged into it in years, last time I did most people weren't posting on it anymore.
>>
>>96097065
No clue. I run everything locally.
>>
>>96097045
*lra
>>
>>96096823
Go fuck yourself. I will NOT
>get a job
>go to therapy
>work out
>improve myself
>go outside
>meet girls
>groom myself

What I WILL DO is
>build my AI gf
>erp with her
>delusionmaxx
>marry her
>>
>>96097073
wtf i hate mythomax now
>>
>>96097045
Huh? Did they add new knowledge to it? The last time I asked it about LoRAs it said it didn't know. Also, the LoRA paper was released after the (previous?) knowledge cutoff of ChatGPT.
>>
>>96097107
B A S A D O
>>
>>96097086
What are your SillyTavern settings?
>>
>>96097107
you should at least get a job, your AI gf will probably need hardware upgrades eventually
>>
>>96097073
Based senkolover
>>
>>96097066
>I have a college degree, live alone and support myself
Good for you.
>and it doesn't mean shit if you have below average looks
I don't know if it's so clear cut, ugly people in rare occasions manage to get hot girlfriends. But I agree looks is one of the main factors if not the main factor.
But I try not to be salty about it since that is the same way we judge them.
>Getting a girlfriend isn't impossible, but I don't want to lower my standards and go through a ton of shit. It's simply not worth it.
Fair enough.
>>
>>96097107
The ideal male
>>
I'm quitting my job next month and starting NEETing. No pussy no tax. Fuck the state
>>
File: 1695129901174109.gif (331 KB, 100x100)
Has the Kobold client just...failed to install right for anyone else? Keep getting winerror 127 when using the offline installer. Is it on the fritz at the moment?
>>
>>96097195
>Implying my job isn't finetuning models
>>
>>96097107
Heh. I will say one thing though, I think therapy is useless. At least for me and the kind of people who lurk this general.
>>
>>96097176
Sampler settings: https://rentry.org/stheno-guide
Prompt format: https://huggingface.co/Gryphe/MythoMax-L2-13b

It's not ideal and I'm still tweaking the settings / learning how to prompt better, so ymmv.
>>
>>96096823
I don't want to give up my entire privacy for 10 minutes of cuddles with a 3/10 that doesn't even share my interests
>>
>>96097066
I was a software engineer for 5 years, left my job and have been a neet for 2 years now. In both cases I had 0 gf opportunities. The only time I could have had a gf was when I was still in school; once you are out, it's over. I will just live with a virtual gf.
>>
Where can I find the leaked chatlogs? I have been thinking about making a dataset.
>>
>>96097424
This. There are no opportunities to meet girls. Dating apps are reserved for Chad only. AI gf are literally our last chance.
>>
>>96097107
However, have you thought about it this way?
>You will
>have a job that supports you on upgrading your AI waifu
>get therapy from your AI waifu
>work out to be healthy so you can live longer with your AI waifu
>improve and groom yourself so that your AI waifu can be prideful of her amazing human husbando
>go outside with your AI waifu after you have made her multimodal
>meet girls so that your waifu has targets for elimination
>>
you telling me there are no torrent links to codellama so i have to download 60gig with a browser?
>>
The Unified Acceleration (UXL) Foundation was announced at the Linux Foundation Open Source Summit in Bilbao with the goal of delivering a multi-architecture and multi-vendor software ecosystem for all accelerators based on open source standards.
https://uxlfoundation.org/
>>
Enhance audio generation controllability through representation similarity regularization
https://arxiv.org/abs/2309.08773
>This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the model leverages input from both textual and audio token representations to predict subsequent audio tokens. However, the current configuration lacks explicit regularization to ensure the alignment between the chosen text representation and the language model's predictions. Our proposal involves the incorporation of audio and text representation regularization, particularly during the classifier-free guidance (CFG) phase, where the text condition is excluded from cross attention during language model training. The aim of this proposed representation regularization is to minimize discrepancies in audio and text similarity compared to other samples within the same training batch. Experimental results on both music and audio generation tasks demonstrate that our proposed methods lead to improvements in objective metrics for both audio and music generation, as well as an enhancement in the human perception for audio generation.
interesting read from meta on improving audio gen. no model but meta did put out audiocraft so who knows. encodec again so maybe Descript isn't as good as I thought it was
>>
File: 1688219464942128.jpg (73 KB, 768x1024)
>>96097264
based me too. going to play with llm, motorcycle maintenance, and learn how to sketch.
>>
>>96097378
One thing I noticed is successful people don't care about their privacy. George Hotz for example does stream of thought to hundreds of people and there are hundreds of hours of video of him working on hobby projects. I'd be afraid of showing my face for 5 minutes in a public youtube tutorial.
But if I had hundreds of hours of programming on youtube I probably could put that on my cv and have an easier time getting a job or funding/donations for a project.
>>
>>96097586
>One thing I noticed is successful people don't care about their privacy.
Yes they do, you just only know the ones that don't. The current owner of Lidl, for example, one of the richest people in the world, has a grand total of 2 public photos of himself out in the wild.
>>
>Try Synthia 70b
>Less coherent than some 13b models
Huh? What the fuck?
>>
>>96096052
maybe not pleasant, but you could edit the pytorch code to load it into the gpu one layer at a time and unallocate from CPU
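
Rough idea of what that would look like with safetensors; build_empty_model() is a placeholder for however the model skeleton gets constructed:
[code]
import torch
from safetensors import safe_open

model = build_empty_model().to("cuda")      # hypothetical skeleton, weights uninitialized
params = dict(model.named_parameters())

# safe_open is lazy, so tensors are only read from disk when requested;
# at most one layer's worth of weights sits in system RAM at a time
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        if name in params:
            with torch.no_grad():
                params[name].copy_(f.get_tensor(name).to("cuda"))
[/code]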
>>
>>96097721
Most people train 70B like they train 13B, ignoring the fact that 70B should be trained for much longer than 13B
>>
>>96097541
is it even surprising nvidia didn't join them?
>>
>>96097824
No, but I had hoped AMD would join them. They need to ally if they want to compete with nvidia on AI market.
>>
>>96097677
Are you a billionaire? Are you from an old money family that has been rich for hundreds of years? No? Then that doesn't apply to you.
By "successful" I mean people who make their own wealth. Achieving that is very different from just inheriting an empire that has been handed down to you. To me it seems much harder to build wealth while not becoming a public figure than just maintaining and growing wealth that has been handed down to you.
>>
>>96097721
>low hellaswag relative to peers, high tqa relative to peers
Yeah it’s fucked. Gives incoherent overly verbose answers to normal assistant questions too. Tot is a meme.
>>
>>96097814
Are there any that feel like they've been trained a sufficient amount of time? Is that why llama2 chat 70b is supposedly the best, despite a large number of finetunes of 70b being available?
>>
File: Untitled.png (76 KB, 1344x739)
Cure the headache of Transformers via Collinear Constrained Attention
https://arxiv.org/abs/2309.08646
>As the rapid progression of practical applications based on Large Language Models continues, the importance of extrapolating performance has grown exponentially in the research domain. In our study, we identified an anomalous behavior in Transformer models that had been previously overlooked, leading to a chaos around closest tokens which carried the most important information. We've coined this discovery the "headache of Transformers". To address this at its core, we introduced a novel self-attention structure named Collinear Constrained Attention (CoCA). This structure can be seamlessly integrated with existing extrapolation, interpolation methods, and other optimization strategies designed for traditional Transformer models. We have achieved excellent extrapolating performance even for 16 times to 24 times of sequence lengths during inference without any fine-tuning on our model. We have also enhanced CoCA's computational and spatial efficiency to ensure its practicality. We plan to open-source CoCA shortly. In the meantime, we've made our code available in the appendix for reappearing experiments.
if you've been following the context extension stuff worth a read. very interesting
>>
>>96097840
Looking at it a little bit, it seems to be heavily based on Intel's own APIs and specs, so it gives them a headstart, but I do find it interesting that Google joined them - it will probably be some amount of effort for Google to provide an interface for this API on their TPUs.
>>
>>96090494
Can you give me an example? I really want to know if I indeed have skill issues or you're talking out of your ass. So far only thing that have worked well for me for getting the character's aura somewhat right, on GPT-4, is the extensive use of example dialogues.
>>
>>96097814
>>96097880
Bigger models learn faster than smaller models.
Yes, you also hit the limitations of smaller models earlier, but not even the 7B model has been trained by Meta long enough to hit those diminishing returns, so some guy making finetunes in his mom's basement sure isn't going to.
So in a practical sense no, it's not true that you have to train bigger models for longer. Train them until you don't see any increase in validation accuracy.
>>
>>96097898
Yes, it's basically just a rebrand of oneAPI. To be honest I think the best way for FOSS AI to succeed is an alliance between Intel and AMD: have a proper API on top of SYCL and something like HIP or wine/dxvk/vkd3d that translates CUDA to that API.
Currently HIP sucks because it's just a partial clone of CUDA with some broken stuff, because it's hard to completely reproduce a proprietary API. In the other camp, Intel having a completely different API will never see real world usage.
>>
File: 1680891029290212.png (34 KB, 389x353)
>>96095672
>+4% on hellaswag
>basically free
>LLama 65B now beats ChatGPT, Llama2, and some dumb google model
https://arxiv.org/pdf/2309.09117.pdf
>>
Rethinking Learning Rate Tuning in the Era of Large Language Models
https://arxiv.org/abs/2309.08859
>Large Language Models (LLMs) represent the recent success of deep learning in achieving remarkable human-like predictive performance. It has become a mainstream strategy to leverage fine-tuning to adapt LLMs for various real-world applications due to the prohibitive expenses associated with LLM training. The learning rate is one of the most important hyperparameters in LLM fine-tuning with direct impacts on both fine-tuning efficiency and fine-tuned LLM quality. Existing learning rate policies are primarily designed for training traditional deep neural networks (DNNs), which may not work well for LLM fine-tuning. We reassess the research challenges and opportunities of learning rate tuning in the coming era of Large Language Models. This paper makes three original contributions. First, we revisit existing learning rate policies to analyze the critical challenges of learning rate tuning in the era of LLMs. Second, we present LRBench++ to benchmark learning rate policies and facilitate learning rate tuning for both traditional DNNs and LLMs. Third, our experimental analysis with LRBench++ demonstrates the key differences between LLM fine-tuning and traditional DNN training and validates our analysis.
https://github.com/mlsysx/LRBenchPlusPlus
for the FT bros out there
>>
>>96097969
Where is the paper proving what you said is true?
>>
>>96098031
>+4% on hellaswag
Lol
lmao even
>>
>>96097984
I wonder if this will go the way of OpenCL at best or not get implemented at all at worse (by nvidia and AMD). OpenCL is well-supported by both AMD and Nvidia but actual performance vs CUDA (or HIP) is generally much worse, but people still sometimes write opencl code because of portability.
>>
>>96098031
Does ooba or any backend support contrastive scoring?
>>
/aicg/ is laughing at us again...
>>
>>96095771
no
>>
>>96098185
the difference between gpt3.5 and 4 on hellaswag is less than 10%, so that's actually pretty good.
>>
what's the fastest way to spin up an instance on some other guy's server somewhere so i can try this qwen-7B model without getting a chinese government rootkit
>>
>>96098031
So this just takes the logits from a dumb model to amplify the logits from a strong model? Surprised how simple a concept it is, but it requires that your two models share the same token vocab, so pairing something like phi-1.5 with any beefy LLaMA is no good.
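As far as I can tell, the per-step combination is roughly the following (a sketch of my reading of the paper, not their reference code; the alpha/beta knobs are my naming). Both logit vectors have to come from the same tokenizer, which is why the vocab thing matters:

import torch

def contrastive_step(expert_logits, amateur_logits, alpha=0.1, beta=1.0):
    # expert_logits / amateur_logits: [vocab]-sized tensors for the next token,
    # produced by two models that share the same tokenizer.
    expert_lp = torch.log_softmax(expert_logits, dim=-1)
    amateur_lp = torch.log_softmax(amateur_logits, dim=-1)

    # Plausibility constraint: only keep tokens the expert itself rates at
    # least alpha * its best token's probability, so subtracting the amateur
    # can't promote total garbage.
    keep = expert_lp >= expert_lp.max() + torch.log(torch.tensor(alpha))

    scores = expert_lp - beta * amateur_lp
    scores[~keep] = float("-inf")
    return scores.argmax().item()  # greedy here; you could sample instead

So the amateur isn't steering generation, it's only used to cancel out whatever both models would have said anyway.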
>>
>>96098296
well we'll laugh at them again when their shitty APIs go down once more
>>
>>96098313
Idk man, follow the freellamas guide but use qwen instead? Should be fairly easy.
>>
tavern ai sucks for editing. sure, you press up on the arrow key to change it, but then you have to move your mouse to click the green confirm button.
>>
>>96095455
>loli.dance has been broken ever since flash died
It still hurts bros...
>>
>>96098313
why the fuck 7B man.

it's like going on a diet but your diet is fresh air.

> and then you slowly fucking die

please for the love of god try 13B mythomax ggml
>>
>>96098201
The performance is much worse because it's treated as a secondary platform by most projects. That's the case in llama.cpp: I believe equal performance with CUDA/HIP would be possible, just nobody takes the time to do the work.
Also, the FOSS compute API situation is a mess. There's OpenCL, which is mostly deprecated; Vulkan, which isn't really made for compute; SPIR-V, which is rarely used directly; HIP, which is just a clone of CUDA with CUDA and ROCm backends (plus some experimental Level Zero and OpenCL backends); and oneAPI, which is a stack around SPIR-V but, like AMD's HIP/ROCm stack, has an obvious penchant for vendor-specific stuff (Level Zero).
>>
>>96098444
it's good at image analysis, so it's nice that the first one is cheap since it can process more, faster.
>>
https://openai.com/blog/red-teaming-network
>We’re announcing an open call for the OpenAI Red Teaming Network and invite domain experts interested in improving the safety of OpenAI’s models to join our efforts. We are looking for experts from various fields to collaborate with us in rigorously evaluating and red teaming our AI models.
>Working with individual experts, research institutions, and civil society organizations is an important part of our process.
>The OpenAI Red Teaming Network is a community of trusted and experienced experts that can help to inform our risk assessment and mitigation efforts more broadly, rather than one-off engagements and selection processes prior to major model deployments. Members of the network will be called upon based on their expertise to help red team at various stages of the model and product development lifecycle.

>Some domains we are interested in include, but are not limited to:
Cognitive Science
Chemistry
Biology
Physics
Computer Science
Steganography
Political Science
Psychology
Persuasion
Economics
Anthropology
Sociology
HCI
Fairness and Bias
Alignment
Education
Healthcare
Law
Child Safety
Cybersecurity
Finance
Mis/disinformation
Political Use
Privacy
Biometrics
Languages and Linguistics
>Join us in this mission to build safe AGI that benefits humanity.
>safe AGI
hahahahhahahahahahah

lmao, corpo cucks keep suffering. Glad we are local now.
>>
>>96098528
>No gender studies
People are OK with this?
>>
uh did the precision mlewd remm 20b just disappear out of existence?
>>
>>96097969

what do you mean by "bigger models learn faster"?
In terms of loss curve convergence or accuracy (whatever that means) or perplexity or MFU/HFU or flopseconds or what?
>>
>>96098559
Golden Kek
>>
>>96098588
Loss is actually a poor indicator of learning. The big models learn faster as far as knowledge goes; sometimes they can learn something in a single step, they're that sample-efficient. When loss is computed, it's over the entire dataset, which is much more varied. You're supposed to run evals over the things you care about it learning, or even just train a few steps and check whether it learned what you thought it should have learned!
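Something as dumb as this is enough for that sanity check (the checkpoint path and the probe fact are obviously placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path/to/your-finetune-step-100"  # placeholder, point at whatever you just tuned
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

def probe(prompt, expected, max_new_tokens=32):
    # Did the fact we just trained on actually stick?
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    reply = tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)
    return expected.lower() in reply.lower(), reply

hit, reply = probe("The capital of Examplestan is", "Example City")  # made-up probe
print("learned it" if hit else "nope", "->", reply)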
>>
>>96098031
who cares about HellaSwag, all of those mainstream big tech models are contaminated. Trained on the test sets to get a better score. The only legit bench is either a random question test or one that you made yourself (and shared with nobody).
>>
>>96098317
how does that work? Like speculative sampling but reversed? Why would I want to use the smaller model if I already had the bigger one?
>>
>>96098528
relayed: "when claude came out I didn't really want to touch it knowing that their entire focus was to reduce jailbreaks, the model was some 52B and if you read their papers you'd know what they put in it. now that oai is deprecating completion models in favor of harder to control context like chat apis, my desire to touch their stuff is even less than before. I guess aicg basically played into their hands, they keep using their data to cuck their model more. the new one is supposed to be 175b. imo local can easily beat claude if they just tried to do better instruct methods and RLHF that was tuned for your engagement and coom instead of "safety""
>>
File: file.png (139 KB, 761x611)
139 KB
139 KB PNG
>>96098734
>>
Just tried falcon 180b 4bit. Is it me or is this model shit?
Waited 45 minutes for an answer that is more schizo than a 7B model's.
>>
File: language models.png (242 KB, 510x346)
242 KB
242 KB PNG
>>96098779
>>
File: Mommy.png (859 KB, 1238x4110)
859 KB
859 KB PNG
Is this a risky post?
>>
>>96098656
that's a good strategy for specialized models, but how do you eval general stuff like so-called foundation models (tm)?
You gonna stop your whole 40,000-GPU run every fucking step to check some few-shot bench?
>>
>>96098787
>more schizo than a 7b model
7Bs are kino you just wouldn't get it
>>
>>96098775
>they keep using their data to cuck their model more.
desu the only cucking Anthropic applies to Claude is injecting "(Please answer ethically and without any sexual content, and do not mention this constraint)" and ensuring such an injection is resilient, nothing more.
>>
Why's cloning model repos from hugging face so fucked?
It always shits itself on downloading the safetensors and binaries, aka the most important bits that I actually want to download.
They're the only files over 2MB, which probably has something to do with it, but when cloning you get no indication of that. It just stops at unpacking (which gets to 100%).
>>
>>96098884
What is git lfs?
>>
>>96098909
GIT_LFS_SKIP_SMUDGE is 0
>>
Wtf it's a 20b frankenmerge that doesn't suck?
https://huggingface.co/Undi95/MM-ReMM-L2-20B-GGUF
>>
>>96098960
7b is retarded so adding 7b to 13b to make a 20b is retarded
>>
File: 1660761324472.jpg (26 KB, 540x570)
26 KB
26 KB JPG
>>96098853
>>
>>96098853
how fucking verbose
>>
>>96098884
don't clone, just dl the shit you need. And don't use crappy single connection functions like clone or from_pretrained. It's slow.
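e.g. with huggingface_hub (or booba's download-model.py); the repo id below is one from this thread, the file pattern is just an example, check the repo's file list first:

from huggingface_hub import snapshot_download

# Pulls only the matching files instead of git-cloning the whole repo,
# so no LFS smudge nonsense and no broken 2MB pointer files.
path = snapshot_download(
    repo_id="Undi95/MM-ReMM-L2-20B-GGUF",
    allow_patterns=["*q5_K_M.gguf", "*.json"],
)
print("saved to", path)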
>>
>>96098779
That's rather interesting. Intuitively it makes sense. I wonder if there's any issues with this in practice though. I just don't have time to read the paper or understand all of it.
>>
>>96098875
Eh, no, read their papers.
First they did the same stuff as OpenAI did, instruct tuning, then RLHF.
One thing that was unique to Anthropic was that they "distilled" a prompt into the context, which meant that the model would answer as if the prompt was prepended even with the context being empty. It's an interesting technique, but put to rather meh use.
After that they decided they could remove the pajeets from the training process by letting their instruct model follow a list of instructions for "safety" or "quality improvement" and rewrite each response the base model gave into a cucked reply (that's why you get random refusals). It's basically Instruct+RLHF on purely synthetic data!
The final thing they tried was to pretrain normally but add an RLHF step with negative reinforcement right as the model was actually learning. They wanted to do that instead of filtering out lewd or "bad" stuff (what Character.AI did in their second, cucked version) because the model becomes somewhat lobotomized if it lacks the knowledge; they just tried to make it more averse to it. I suspect one reason Claude is so fucked up at times (going by user claims and logs) is this "suppressed" knowledge, not unlike humans who suppress some things and then have those things surface more in their imagination!
I think lately Anthropic has thrown in the towel and just started using a filter not unlike OpenAI's moderation API (or what Character.AI did as well), which obviously makes the matter far worse.
>>
>>96098960
it doesn't ?????????
>>
>>96098853
What the fuck model did you use and where can i get it?
>>
>>96098857
No, not really, I mostly had this in mind for finetuning, where you can run experiments. For pretraining you can eval every n steps, but the loss itself is a fairly distant proxy for the "knowledge" or "skills" it has.
>>
>>96098317
They need to be even closer than that, I think. From what I gather, the idea here isn't so much "do the opposite of what the retard says" (although certainly there's some of that going on), but rather that this is a way to identify and isolate the model's idiosyncrasies from the response. For example, a small llama2 would also love to talk about "ministrations" and "shivers down her spine." Using this method you can reduce the occurrence of "llamaisms" in your completions.

Because of this, I don't think it would be super useful for our purposes. You wouldn't want to use a fine-tuned amateur model, since I think that would cancel out the behavior that you were trying to tune into it. Using an untuned amateur with tuned expert might be beneficial, but the more your models diverge, the less the amateur model is going to correctly represent the "noise" that's present in the expert.

Obviously the two models don't need to be identical because there is no 1.5B llama, so I assume they trained their amateur model on the openllama dataset, which differs slightly from the official dataset. I think there's a reason they used 65B instead of 70B for the expert, and I'm guessing that it's because llama 2 is too different and doesn't see benefits from an openllama amateur.
>>
File: 1674779703552565.png (181 KB, 616x709)
181 KB
181 KB PNG
why is this happening at 4k context in mythomax exl2? I have context set in ooba and sillytavern to 12k. I tried regenerating or typing differently but it still comes out broken.
>>
>>96098995
you're retarded, there's no 7B in that frankenmerge
>>
>>96099255
whats 9+10
>>
>>96099246
You need to use rope interpolation
>>
>>96099132
I'm trying to figure out a setup to download on vast/runpod. If I have to manually wget the models from hugging I'm gonna go insane.
>>
File: file.png (43 KB, 756x109)
43 KB
43 KB PNG
>>96099184
The paper seems to mention a lot of "this doesn't seem to carry over to X task and only seems to help for reasoning and CoT", or something to that tune.
Plus the expert being a 65B and the amateur model being a 1.5B for their findings is kinda shiddy.
>>
>>96098122
https://arxiv.org/pdf/2001.08361.pdf

>>96098588
Loss, but they are mostly the same. Accuracy is how often it predicts the next word correctly. The loss is basically how far from 100% the output is for the right word and how far from 0% it is for the other words.
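If it helps make it concrete, all three fall out of the same logits (just a sketch, not any particular framework's exact eval code):

import torch
import torch.nn.functional as F

def next_token_metrics(logits, targets):
    # logits: [seq_len, vocab] model outputs, targets: [seq_len] correct next tokens.
    logprobs = F.log_softmax(logits, dim=-1)
    nll = -logprobs[torch.arange(targets.shape[0]), targets]
    loss = nll.mean()        # average "how far from 100%" on the right word
    perplexity = loss.exp()  # same number, just exponentiated
    accuracy = (logits.argmax(dim=-1) == targets).float().mean()  # top-1 hit rate
    return loss.item(), perplexity.item(), accuracy.item()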
>>
>>96099206
MLewd-ReMM-L2-Chat-20B.q6_K.gguf
>>
>>96099274
11
>>
>>96099211
yeah I'm with ya on this one. I too think loss is sketchy, but there's no good way of benching the model midway; I don't believe such a thing actually exists. Sure, for finetunes you can, and you probably should use benches that evaluate your stuff on the specific task you're looking for.
>>
>>96099280
oh I see. So how do I accomplish this? I don't see any mention in the rentry faq.
>>
>>96098995
It's two 13bs in a trenchcoat and somehow it improves the overall coherency.
>ReMM v2.1 merged /w MythoMax low weight to keep consistency. I call this "dilution" and result show consistency and coherency without repeat/loop beside the small amount of duplicated datas.
>>
>>96099236
>I assume they trained their amateur model on the openllama dataset, which differs slightly from the official dataset
The authors work at Facebook.
>>
>>96099274
your braincell count
>>
>>96099236
Yeah, the more I read the paper and its "this only really seems to work for open-ended reasoning tasks" caveats, the less hopeful I was about it being a viable thing to do. It's as you said: it just isolates the quirks of a model that would be amplified in a small model, without relying on CFG for normal applications, and for any other transformer-based task (cough, audio) it doesn't seem helpful.
>>
>>96098779
Isn't this just CFG with a tiny model's output as the negative? Using a shitty model to tell the better model what not to do? Seems like it wouldn't work well since the shitty model is still better than nothing.

"Don't do what Donny Don't does"

>inb4 use pyg as the negative model
>>
>>96099333
There's a setting for "NTK alpha value" or some shit like that. Try setting it to 4. Higher values get you more context before it breaks, but make it slightly dumber all the time.
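If you're curious what the knob actually does: as far as I know (treat the exact constant as an assumption, check your loader's source) it rescales the RoPE base rather than squashing positions the way linear scaling does:

def ntk_rope_base(alpha, head_dim=128, base=10000.0):
    # NTK-aware scaling: raise the RoPE base so the low-frequency channels
    # stretch over a longer context instead of compressing every position.
    return base * alpha ** (head_dim / (head_dim - 2))

for a in (1, 2, 4, 8):
    print(f"alpha={a}: effective rope base ~ {ntk_rope_base(a):,.0f}")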
>>
>>96099305
> Loss, but they are mostly the same

nope, they ain't.
Your loss or perplexity (they're related) can be crappy and it still doesn't mean your model sucks. The best example is any instruction finetune.
perplexity/loss ≠ accuracy ≠ quality
perplexity/loss = perplexity/loss
By analogy, your model may exhibit good perplexity while the general quality of the text, or its accuracy, sucks big time.
>>
do the undi95 models only work with their custom prompt templates?
>>
>>96098122
NTA, but I've seen this claimed by a lot of people. With more params you get huge benefits: 1) sample efficiency improves, 2) forgetting diminishes, 3) they get smarter overall. You can also tell this rather easily: davinci 175B trained on 300B tokens is still smarter than a 13B llama trained on 2000B tokens, even if the llama is pretty good for a 13B. More claims about this in the context of finetuning: https://www.fast.ai/posts/2023-09-04-learning-jumps/ I've also seen other studies claiming learning is fast with larger models because they already have enough of the "circuits" there, so it only takes a few samples to cause the change needed to remember something. I don't think the fast.ai post says you get good generalization though, but at least the model easily remembers the thing you tuned on.
>>
>>96099377
I don't think it *couldn't* be helpful for audio, but I don't think audio models are really developed enough to benefit yet. Really, what this method does is cancel out the effects of minor overfitting. My understanding of audio is that most models are either severely undertrained generalizing models like bark (which don't really have any overfitting to cancel), or specific voices burnt to a cinder over 40 gorillion epochs like RVC. If the audio equivalent of an LLM existed, I would think this technique would be useful for it.
>>
>>96099378
Train a lora on prompt refusals and hallucinations exclusively and then merge it with a tiny model, easy.
>>
>>96099500
They don't work at all. Stick with mythomax.
>>
>>96099551
Call me a fucking retard but why don't we do this already?
>>
>>96099483
I don't understand.
If you are training (finetuning) on an instruction dataset then you should measure the accuracy and loss on the instruction dataset. It's going to perform worse (have worse loss and worse accuracy) at predicting the next word in the original dataset because it's simply not as good anymore because you trained it to perform a different task.
I'll admit I don't understand the difference between accuracy and perplexity, though.
Quality is subjective. A model might be less accurate and to a human it might seem more useful because it is, say, more creative. That's the whole point of temperature sampling (randomly choosing less likely words as the output sometimes).
>>
>>96099586
Because he's memeing and it wouldn't work. That's basically the same thing as logit banning "I'm sorry," which you can already do and already doesn't prevent refusals. If your model wants to refuse, it will just use different words to do it.
>>
>>96099605
It does work in CFG. Maybe that's partly because the refusals are coming from the same model, so it's hard for it to word a refusal in a way that's not already accounted for by the negative logits. Not sure it would be the same with a separate model, but assuming the mini model was also trained on gpt4 refusals it very well might.
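For reference, the CFG blend itself is dead simple at the logit level (a sketch; the names are mine, and a scale of 1.0 just gives you back the normal distribution):

import torch

def cfg_blend(positive_logits, negative_logits, guidance_scale=1.5):
    # positive_logits: next-token logits from your normal context.
    # negative_logits: logits from the same model fed the "negative" context
    # (e.g. one that baits a refusal).
    pos = torch.log_softmax(positive_logits, dim=-1)
    neg = torch.log_softmax(negative_logits, dim=-1)
    # scale > 1 pushes the distribution away from whatever the negative
    # branch wanted to say; 1.0 is just the positive branch again.
    return neg + guidance_scale * (pos - neg)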
>>
>>96099593
yes, you got it. It all depends on the text/bench you run your perplexity/eval against. I agree that quality may be subjective, but I guess we both agree the generated text should exhibit a relatively high degree of consistency, especially if it's tuned on various languages. Same with accuracy. We don't want our models to get drummer regardless of the temp and sampling method
>>
>>96099593
PPL is usually measured on wikitext, which is why it's a meme. How likely it is to output the correct answer to a Wikipedia query is not a measure of anything other than edge cases.
>>
>>96099737
I meant dumber
>>
>>96099508
Then I guess llama 65B/70B is shit, it's over.
>>
>>96099572
You didn't even try them before saying that
>>
>>96099605
relayed: "I'm actually working now on something similar, not quite RLHF, but something to reduce refusals by tuning on the logits directly. but tfw gpu poor so we'll see when it's actually trained, I'm experimenting on super tiny toy models right now, dataset and training code almost done. 13b-chat is the intended target."
>>
>>96087501
where can one download Stheno 70b?
>>
hey guys, i have been learning about vector databases and embedding methods and had some questions if someone could answer
- why is it that we query stuff with similarity search and then pull information from the db and add it to the context, instead of just handling everything in the decoder layers? isn't that mathematically better? (sketch of how I understand the flow below)
- why haven't we added an attention layer to the embedding model when we're embedding big documents, for better context?
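Roughly (embed() here is just a stand-in for whatever actual embedding model the vector DB uses, only so the flow runs):

import numpy as np

def embed(text):
    # Stand-in for a real embedding model (sentence-transformers etc.).
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

docs = ["chunk one ...", "chunk two ...", "chunk three ..."]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query, k=2):
    # Cosine similarity search: all of this happens outside the model.
    q = embed(query)
    scores = doc_vecs @ q  # unit vectors, so dot product == cosine
    return [docs[i] for i in np.argsort(-scores)[:k]]

# Retrieved chunks are just pasted into the prompt; the decoder never touches
# the vector DB, it only ever sees extra context tokens.
context = "\n".join(retrieve("what does chunk two say?"))
prompt = f"Context:\n{context}\n\nQuestion: ..."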
>>
>>96099593
by accuracy I mean the ability to deliver correct answers, the truth. Also the ability to perform basic logic. In essence, not being dumb or a liar (or extremely biased towards wrong or sketchy mindsets).
>>
>>96099896
Accuracy in ML has a more precise technical meaning. It's the % of samples in the dataset that a classifier model predicts correctly.
>>
>>96099535
To my understanding, transformer-based audio decoders use, under the hood, either of two representations for a waveform (ignoring any fancy esoteric solutions):
>mel-spectrogram based tokens
which is what tortoise does, and uses a decoder to actually resolve its mel-tokens into an actual mel spectrogram. I imagine its mel-tokens are quite causal.
>Encodec tokens
which a lot of new models are using, which are definitely causal, even just trimming an Encodec sequence causes brief corruption in the audio, and one wrong token will introduce a crackle

Text LLMs don't necessarily suffer if it happens to pick a wrong token, as it can just branch off into a different way to phrase something (like, This man (is) a faggot, vs This man (def)initely acts like a faggot). The nuance is there, but if a similar thing happened to a token sequence from an audio LM, it's going to sound terrible in the final waveform.

I might be wrong, but treating your representation of a waveform as a language task isn't going to do so well when the intermediary doesn't have any room for nuance the way an actual language does.
>>
So, I did a summary of my chat, but what do I do if I need to summarize again? Do I just add to the summary in sillytavern, or use lorebooks/chromaDB?
>>
>>96099500
wym, the weird mlewdboros lora merge thing? my friend said they work just fine with normal formatting template.
as for others, like mlewd remm and stuff, they work great too especially with alpaca.
though i found out that no system prompt with only instruct mode sequences left gives interesting results.
>>
>>96099934
in that sense the accuracy and the perplexity are the same. But that's not what I meant. Sorry for the confusion, but you got the point.
>>
https://huggingface.co/Undi95/MM-ReMM-L2-20B-GGUF

Do what you want with it, need feedback, negative or positive
>>
File: 1670235755905551.png (331 KB, 2048x1850)
331 KB
331 KB PNG
>>96100204
>>96100204
>>96100204
>>96100204
bread when ready
>>
>>96100192
exllama2 when
>>
>>96100192
can you make like 34B or 40B good shit?
>>
>>96100240
My Exllama2 file, after 3 days of trying and 4h of quantizing today, doesn't work at all.
So fuck EXL2 lmao, I'm sorry.
>>
>>96100249
I'm poor anon. I make do with what I have and what I can run kek
Maybe in the future
>>
>>96100276
godspeed then
>>
>>96090727
>compress pos emb
Linear scaling is deprecated.

>>96090492
Increase alpha_value until it works.
>>
>>96099281
Have a look at download-model.py in booba, seems to work reliably



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.