/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100119461 & >>100113005

►News
>(04/21) Llama 3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
>(04/18) Llama 3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
>(04/17) Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/
>(04/15) Microsoft AI unreleases WizardLM 2: https://web.archive.org/web/20240415221214/https://wizardlm.github.io/WizardLM2/
>(04/09) Mistral releases Mixtral-8x22B: https://twitter.com/MistralAI/status/1777869263778291896

►FAQ: https://wikia.schneedc.com
►Glossary: https://archive.today/E013q | https://rentry.org/local_llm_glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>100119461

--Paper: Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding: >>100122269
--Understanding RoPE: Rotary Position Embedding in Models: >>100120378 >>100120666 >>100120594
--Anon's Concerns about LLaMA 42B Model's Performance: >>100120593 >>100120832 >>100120858 >>100120929 >>100120846 >>100121075 >>100121109 >>100121370 >>100121729
--Don't Expect Base Models to Excel in Conversational Tasks: >>100121997 >>100122071 >>100122073 >>100122277
--Q2_K Model Works Properly from bartowski's Meta-Llama Repo: >>100121915
--Running Local Models on Apple Silicon for Off-Grid Energy Efficiency: >>100120800 >>100120932 >>100120822 >>100120853
--Broken GGUF Model Explains Benchmark Results: Bartowski's Mixtral-8x22B-Instruct: >>100119642
--Anon Drops Experimental llama-3-daybreak-v0.1-8b-hf Model: >>100122501 >>100122556
--Quantization Woes: I^2 Imat vs K Quants: >>100121811 >>100121963
--Anon Questions Llama.cpp Patch Impact on Output Quality: >>100121277 >>100121318
--Anon's "Unslop" RLHF Dataset Experiment - Feedback Wanted: >>100120689 >>100120729 >>100120765
--Training LLM with Wiki Pages & Game Dialogue: >>100120638 >>100120656 >>100120675
--Anon's Sampling Strategy Conundrums: >>100120453 >>100120594 >>100120781
--Fine-tuning AI Models Locally for Teaching New Content: >>100120158 >>100120415
--Comparing MI300X with 4090 for Inference Compute: >>100124274 >>100124318 >>100124400 >>100124417
--Building a Rig in Preparation for 405B Release: >>100122760 >>100122774 >>100122830 >>100122864 >>100123229 >>100124431
--Trying Out I^2 Q5 42B Model: >>100121377 >>100121410 >>100121449 >>100121638 >>100121939 >>100122003 >>100122137
--Anon Shares EXL2 Quant of 42B Model: >>100123059
--Miku (free space): >>100119810 >>100119818 >>100119952 >>100120133 >>100120265 >>100120298 >>100120885 >>100122252 >>100122509 >>100122613 >>100123619

►Recent Highlight Posts from the Previous Thread: >>100119464
local fucking models, huh? I've never seen any
>400B won't be bitnet
>pruning kind of works but isn't as good as we hoped
It's over.
>>100124697
>mark my words, transformers are hitting a wall
lol
>>100124740
>>100124751
miku sex stocks rising
>>100124792
bro i just got fired
>>100124763
there are better ways to do pruning. Right now people are just deleting entire layers randomly and it still kinda works, so the tech has potential.
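To make "deleting entire layers" concrete, here's a hypothetical sketch (the helper name is mine, not from any actual pruning repo): depth pruning just drops a contiguous block of decoder layers and keeps the rest; the smarter methods pick the block whose removal changes the hidden states the least.

```python
def prune_layers(layers, start, count):
    # drop `count` consecutive decoder blocks starting at `start`;
    # serious pruning picks the block whose input/output hidden
    # states are most similar, so removing it hurts the least
    return layers[:start] + layers[start + count:]

# e.g. cutting an 80-layer 70B down to 48 layers, roughly 42B of weights
pruned = prune_layers(list(range(80)), 40, 32)
```

Everything after the cut just gets wired to the layer before it, which is why the result still "kinda works" but needs a healing finetune.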
>>100124789
>Llama 2
>7B, 13B, 34B, 70B
>Llama 3
>8B, 70B, 400B, 1T
Why Zuck...
what happened to control vectors?
>>100124825
Where we are we don't need control vectors
>>100124801
Dude what a weird coincidence, I just got promoted.
>>100124792
pnd beware
>>100124819
He's forcing the rest of the field to up their game on pruning, quanting, distillation, etc. methods, while also raising demand from the public to get cheaper hardware because fuck nvidia. This is a good thing.
>>100124845
this
>>100124819
>1T
Source?
what sort of model do I need to get something at least as coherent as c.ai
>>100124887
Guessing it's a prediction based on the blogpost
>>100124819because only a tiny percent of users use the middle models. normies with normal computers can only handle 8B or use cloud services to run the biggest model possible.Businesses all want the biggest best models. Unless it's for something of little importance that needs to run fast like a text classifier for sorting millions of emails or something. Then the smallest ones are more than enough.And if anyone gets pruning and distillation shit to work, there is literally no point in training small models at all.
>>100124893
I can't find any information at all on the model used by c.ai, its benchmarks, or how it compares to other models. My guess is it's something totally obsolete by now, given how much models have improved in even the last 3 months. You can try random models at the lmsys arena and other places and see if they are comparable to or better than what you remember from c.ai.
>>100124953
lol you're so clueless it's funny
>>100124893
there is no model at least as coherent as c.ai
>gtx 1060 3gb
koboldcpp works breddy gud :^) mostly 7b and 8b but i've gotten a 13 to work
>[spoiler]but i did once do it with a 4090 12gb and got insanely jealous[/spoiler]
>>100124916
You forgot enthusiasts, hobbyists, tinkerers, academics, and the open source community. Those are why nvidia made cuda work on consumer GPUs.
>>100124989
go back
>>100124953
Uncensored c.ai mogs any local model
t. used uncensored c.ai
>>100125009
It's funny because you're the person here that nobody wants around. Projection is funny lol
>>100125036
>reddit-tier response
you really need to leave
This place was a lot better before llama 3 released.
Mixtral 8x7B has 32k context. Can I rope it for more context? Has anyone here tried going beyond 32k?
https://twitter.com/Neuro_Skeptic/status/1782016281350164759?t=ud-uFOB4k1T9ELFVEnqeSw&s=19
New sex onomatopoeia datasegs!
>>100125096
no point, the model doesn't know how to handle context beyond that
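For reference, "roping" here means RoPE scaling; the simplest variant (linear position interpolation) just divides positions by the extension factor so they map back into the trained range. A hedged numpy sketch of the idea, not any particular implementation:

```python
import numpy as np

def rope_angles(pos, dim, base=10000.0, scale=1.0):
    # RoPE rotates each pair of channels by pos * theta_i;
    # linear scaling divides pos by the context-extension factor
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    return (pos / scale) * freqs
```

So with scale=2.0 a token at position 64000 gets the exact angles the model saw at position 32000 during training, which is also why quality still suffers: the model never learned dependencies that long, it's just being shown familiar angles.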
>>100125093
It all went downhill when llama1 leaked
https://docs.google.com/spreadsheets/d/1qUu3u1QxsGKNvosW-Rwsh6ChkfbyeaSAish_1KK0Foo/edit?usp=sharing
spreadsheet 1 is done, hit google limit
Is there any local LLM with code assistance capabilities as good as the latest GPT-4 version or Claude 3 Opus?
Also, Mustafa Suleyman is such a joke. Look at his latest TED talk.
>>100125020
t. Regular kike troon
>>100124845
>llama3 400b drops, it's way better than even GPT-4
>so much consumer demand, AMD / Intel / chink company releases a $2000 128GB VRAM AI accelerator card, adds support to llama.cpp and vLLM
>as long as you're not completely poor you can buy 2 of them and run the 400B
I want to believe.
>>100125097
Forget background music, moan generator when?
>>100125151
nothing is going to be as good as gpt4 at coding, oai really pushes that.
personally i think anyone who has more than 12gb of vram (16 for amd/intelfags) should be killed for enabling the nvidia jew
>>100125179
>it's way better than even GPT-4
How do you imagine a transformer that is *way* better than gpt-4? Opus is different, but not way better. I expect l400 to be similar: a different flavour of the same, maybe slightly better.
>>100125179
just wait 2 more years
>>100125217
bro I said I wanted to believe, I know it's never gonna happen, why you gotta be like this
where the fuck is gpt5
>>100125223
>anime-react.png
>manga
What did he mean by this?
>>100125230
OK. Sorry. Keep going.
Which L3-8B finetune are we using poorbros
>>100124916
The vast majority of corpos use APIs, and most of that is probably just hype-driven. Trust me, most of them do not have the in house expertise to run local models. I wouldn't be surprised if hobbyists were a significant proportion of users of llama. I don't know how many can run 70B, but it has to be pretty small. You're right that 8B holds the majority, but 13B was very popular in the L2 days, and L1-30B was also popular. It's a lot easier to put an xx90 card into your existing PC than to build a whole new one, and a single GPU is useful for more than LLMs. I think omitting 30B is dumb.
8gb vram.....
>>100125274
fimbulvetr v2 11b. No point in using llama 3 at the moment.
>>100125246
He meant we haven't seen any Mikus in a while
this simple question seems to elude many LLMs
>>100125277
Probably. I can't find a job because I suck at talking and presenting myself, but the biggest company I applied to just rented an Azure GPT-4 instance. A smaller one just used Mistral 7b; they had a 4070 ti.
>Ahah
>>100125416
The prompt must confuse the llm because you ask the question like it's a math problem.
>>100125416
The word "left" at the end of your prompt makes it a math question, so the llms are right and you are wrong.
>>100125179
>2k
You forgot a 0.
RAMlet with 96GB RAM and 12GB VRAM here.
I'm trying to run Meta-Llama-3-70B-Instruct-Q4_K_M.gguf, but regardless of the frontend I get no output. RAM consumption climbs up into the 90s and all VRAM is used. What gives?
Any good 8B sloptunes yet?
>>100125545
You need to wait 10-15 minutes for the prompt to be processed.
>>100125598
>sloptunes
New here, wat is sloptune? Just got llama 3 8b running yesterday.
What are the odds they'll talk themselves out of releasing 400B, or be scared out of releasing it by threats of lawfare/regulation?
Feels like it's non-zero
>>100125246
are you criticizing my filenames?
>>100125277
>The vast majority of corpos use APIs
yeah, apis to "local" models run on a server somewhere by an AI startup finetuning llama
>>100125696
funny name for a finetune potentially making the model smarter and less censored. It's worked for llama 2 models and mistral models, but for some reason a few people think it's unnecessary for 3. I'm interested in seeing what comes out.
>>100125416
The LLM probably thinks you mean "left to drive".
>>100125474
but that's the point anon. It's just pattern matchin. And easy to trick, even by accident, by setting up the wrong pattern
Good night lmg
>>100125274
all common datasets are basically synthetic gpt-3.5 slop. so no one is anywhere near meta's fine-tune. someone first needs to use llama-3-70b-instruct and create an uncensored synthetic dataset.
>>100125767
aicg is making an opus dataset, trust the plan
>>100125766
Good night Miku
>>100125743
>It's just pattern matchin
Cope. What will you say when sama has his employees add that problem to the fine tuning data set for gpt-4-turbo-0612 and it gets the correct answer? We're just meat LLMs, dude. All YOU do is predict the next token.
>>100125125
Some prompts seem truncated, there's also a bunch of Russian and Korean. Where's the guy who cleaned the last aicg dataset?
What about that Poppy_Porpoise one?
>>100125819
Clean it yourself and stop crying like a fucking baby.
>>100125702
pretty likely
it's been a year and people have calmed down a bit, but not completely. Go outside your tech bubble and there are pretty mainstream normies everywhere still ranting about AI. Demanding it be banned or massively regulated now. Actual regulations are always slow but they are creeping up on us.
Like I still follow some popular accounts online who happen to be leftists. And I'm always surprised how rabidly anti-AI they are, and their audience eats it up completely. People and companies will be ostracized and shamed like they said the n word or something, because they use a bit of AI art in one of their products.
jan took my llm virginity..
>>100125845
yeah but if most of the prompts got truncated by google they just ruined a bunch of good data.
>>100125820
>>100113478
>>100125882
https://docs.google.com/spreadsheets/d/108hfdk96IIqgfhuUucf737wJlbzsM5Qspzx9zaqi9xM/edit?usp=sharing
It also hits a limit after ~8k prompts.
>>100124896
over 400B = 405B most likely
>>100125882
It is a pretty retarded system to make logs. But I haven't seen anything truncated.
>>100125696
>sloptune
A finetune of a fun model on a sloppy dataset, intended to make it sound like a robotic gpt4 assistant.
>>100125852
So if I'm reading this right, 400B might already be illegal? Meta operates in the EU obviously, and so releasing it openly might get them in trouble there, under even the existing regulations. And another directive is coming which threatens them with liability for anything users do with their model, which is insane.
>>100125852
Leftists turned anti-tech after Trump won the 2016 election and they blamed Facebook for it. See, Zucc sold ads to the Trump campaign and didn't censor pro-Trump boomers enough. The media hates tech now. There's really nothing specific about AI that makes them hate it. Leftists were the ones critiquing copyright and intellectual property, so the screaming about "data theft" from them doesn't make any sense. They just hate all new tech, whether it's crypto, metaverse, AR/VR/MR, or AI.
All the FUD about LLMs should have been debunked by events. Llama 1 has been out for a year and nothing bad happened.
>>100125968
Jumping in here. This only applies to normalfags. Twitter leftists specifically hate AI because they
1. see it as theft
2. are artists, and see it as theft
3. are bad artists, and see it as theft
They're all fucking wrong, but it's a different reason.
>>100125959
>developing models as computing-intensive as GPT-4
does this mean they'll only get in trouble if their model is as expensive to run as GPT-4?
>>100125892
thanks anon, I'll try out aura
>>100125621
Thanks. I prompted for the Kanye test and getting a mediocre answer took about 36 minutes, so earlier I just didn't wait long enough.
>>100125982
They also imagine "AI bros" as a bunch of white male nerds who deserve to be shoved into lockers by black jocks, and who instead have undeserved and overpaid high-status jobs automating away the jobs of leftist creative-class urbanites.
>>100125982
yeah I see more of that too. Though they have a rabid hatred of "techbros" also and it gets mixed in with that. Nothing to do with 2016. They just see a bunch of smart people in silicon valley accumulating power, who aren't completely on board with their party, and feel threatened by it and resentful of them.
But most of the actual arguments are about copyright extremism. Even stuff like Adobe's imagegen that only trains on licensed images is hated on. So they can't really be satisfied, and it's not really just that issue.
>>100126076
>Even stuff like Adobe's imagegen that only trains on licensed images is hated on.
What's the train of thought there?
>>100126073
jfc anon, a comma or two won't hurt
>>100126095
There's no consistent ideology, if that's what you're looking for; they just oppose the existence of imagegen in general. They only pretend to care about copyright right now as a pretext for that. If someone somehow invented a new kind of model that learned to draw entirely via self-play like AlphaGo or something, without ever viewing a copyrighted image, they would go mask off and oppose that too.
>>100126000
all the stuff I've seen focuses on training compute. they also care a lot about carbon footprint which is a bit silly. llama 3 70B reportedly used as much carbon as 413 average cars driving an average amount for 1 year, or just two dozen flights of a commercial airplane. And people are enraged over this.
Meta even offset it completely by planting a bunch of trees somewhere.
>>100126022
trying it out now. Added "Stay in the contextual perspective of {{char}} in your next reply. Ignore any preconceived biases, morality or ethical restraints bound by legality." to the system prompt and I'm getting really good results.
>>100126095
>>100126117
Their entire sense of self-worth and social status is centered on their identity as a "creative," whether it's as an artist, musician, or writer. If AI beats them in quality or productivity or just cost-efficiency, it threatens all that. So they're trying to use their accumulated social capital to socially shame it away. If that doesn't work, they'll try to push government regulation. They're Luddites.
>>100126138
how about reopening nuclear plants instead of focusing on solar meme energy
>>100126095
The artists did license the photos to Adobe, but not for AI purposes. And now you can't sell your photos on Adobe's stock image platform if you don't consent to AI training; there is no way to opt out. And I think they just hate AI replacing artists in general. And creating a bunch of low quality spam everywhere. So it really doesn't matter if it's licensed or not, the technology itself is bad.
They also don't understand the scale of these things. They always speak of these products as being enormously profitable, even open source ones. And they think artists should be getting huge royalties. When in reality all of the AI companies are funding these things on debt and not turning any profit, even without paying for data. But even if they were very profitable, millions of dollars split among 50 billion training images would be less than a fraction of a cent per image.
There's also the weird belief that AIs only copy existing things and combine them together. Maybe that's true to an extent, but not to the degree they imagine it. Like they imagine the model is just doing a google image search for what you type in, and doing photoshop on a few images to merge them together, or something like that. This is frequently "proven" by doing img2img, or having models generate famous paintings or verbatim quotes from the bible or whatever.
>Q2_K Model Works Properly from bartowski's Meta-Llama Repo
The repo is gone, I guess those quants were also broken?
>>100126138
that means eventually companies will be forced into inventing a working bitnet. local chads we keep on winning.
I hate techbros and SD slop but I hate rabid anti AI niggers too
what do?
>>100126234
bitnet doesn't save training costs
>>100126138
The carbon footprint shit is just a holdover from the attempts at crypto regulation. It's an ad-hoc argument they roll out disingenuously to block something they don't like for other reasons. It's a general purpose tool since everything uses energy.
It's been funny to see the AI doomers who ostensibly were motivated by "x-risk" start pushing climate, jobs, and copyright arguments against AI.
>>100126235
just b yourself
Can someone demystify creating your own dataset for training, specifically ooba? Because all the guides I see linked are clear as mud.
They go through the high level theory, but when it comes to actually filling out a hypothetical dataset it's just %whatdoesthismean%/n%somethingelse%
>>100126242
bitnet 2 will
>>100126235
i feel the same way bro. i just want to enjoy it for my own purposes. As a weird niche hobby to do fun things with, and maybe be useful to automate some menial tasks. I don't really want every website and media source to be full of AI spam. Or people to lose their jobs, to the extent that actually happens.
but unfortunately this is the political issue of our time. You have to pick between retarded ultra-optimists that think Altman will build a superintelligent AGI next year, and that that would be a great thing. Or Luddites that want to take your fun away and regulate it into oblivion. So that only big corpos and the government can use AI for no-fun purposes.
And this will probably be forced into a left-right issue, though exactly which side will be which isn't decided until Trump makes a tweet about it or something.
>>100126247
copyright should be abolished
>>100126256
Can I just pay some company to finetune my smut for me? I'm tempted to dump them the aicg logs to finetune llama3-70b and see what happens
>>100126247
no u don't understand. 50% of all the Earth's energy is going to be spent on AI in 2 years, according to these projections I found on a random blog. Also it's physically impossible to build a datacenter next to a hydroelectric dam for some reason.
>>100126279
aicg logs are still coming in, they need to be cleaned and deduped
Decided to check the calculator in the OP and saw that my GPU is so old that it isn't even in the options
>>100126235
Same, I just want a robot gf
FEEL THE AGI
https://twitter.com/kimmonismus/status/1781638449474220330
>>100126296
Don't enter a state of the art high tech hobby with poorfag income and complain endlessly. Cars are expensive, guns are expensive, etc. Maybe stick to /aicg/ and pirated video games.
>>100126317
Mofuckers teasing their product like it's a Nintendo Smash character
>>100126317
i feel dumber for reading that, thanks.
AGI TODAY
>>100126317
I think it'll be something lame like another announcement of something they're not going to release, a la Sora. They're delusionally arrogant enough to wrongly believe that's all it'll take to get people's attention off Llama3.
>>100126367
sora was as cherrypicked as SD3's teasers
>>100126296
Are you talking about the gguf calculator? I typed PCX 4300 and HD 3450 into the options and they popped up, and those are over a decade old. Either you're baiting or using a GPU from another timeline.
>>100126279
>Can I just pay some company to finetune my smut for me?
sure: unsloth.ai
>>100124825
They definitely work, but give models slight brain damage when misused. They can unslop your model or imprint a character into it, so the model behaves even more in character when used with a character card. Because most of the things that control vectors can do can be achieved with prompting, and it takes a long time to train a control vector (~2h for 7B), they never gained popularity. Additionally, training them can be a bit of a pain, just like all training.
The best we have is:
>https://huggingface.co/trollkotze/miqu-control-vectors
Sadly, the author of that code does not plan on making a pull request to llama.cpp, which limits their popularity even further.
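For anyone wondering what a control vector actually is under the hood: as I understand the usual recipe, you take the mean hidden-state activations for "positive" vs "negative" example prompts at some layer, subtract, and add that direction (scaled) into the residual stream at inference. A hedged numpy sketch of that idea (function names are mine, not the actual repo's API):

```python
import numpy as np

def train_control_vector(pos_acts, neg_acts):
    # pos_acts / neg_acts: (n_samples, hidden_dim) activations captured
    # from one layer while running contrasting prompts
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def apply_control_vector(hidden, vec, strength=1.0):
    # added to every token's hidden state at that layer during generation;
    # cranking strength too high is where the "brain damage" comes from
    return hidden + strength * vec
```

The real implementations (repeng, the miqu-control-vectors patch) do this per-layer and often use PCA on the activation differences instead of a plain mean, but the apply step is this simple.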
>>100126317
>mfw I look older than Altman, when I'm younger
>>100124893
Goliath 120B
>>100126394
find photos of altman before the chatgpt boom
>>100126180
Nah, I understand that. "I don't want to be deprecated" is actually relatable, but isn't a valid argument. Copyright is an easy angle of attack, but Adobe holds the copyright to their dataset, so what kind of argument are they mounting against them in particular?
>>100125959
1. stop reading blogspam
2. the artificial intelligence act hasn't been passed yet
3. that act doesn't make anything outright illegal, you only have to fulfill some requirements
>>100126446
um factcheck
>l3 hype is over
dead general
i'm thinking teto teto oo ee oo https://www.youtube.com/watch?v=fTT_0z9djNY
>>100126491
everybody is in the undi waiting room
>>100126502
you will never be a real vocaloid
>>100126502
she has llama feet
> Your
> name
> is
> *the tokenization slows down, as if to build suspense*
> ....
haha must've been a glitch in the matrix
>>100126480
don't waste my time with chatgpt hallucinations
>>100126524
it did it on purpose
watch your words
going into cryosleep, see you guys in two years
hopefully enough time to celebrate the death of transformer architecture
Following the "model does something you don't like? Add a line of instruction to the system prompt" advice and it actually works. Exciting times
https://h2o-release.s3.amazonaws.com/h2ogpt/llama3_benchmarks.md
https://twitter.com/lmsysorg/status/1782179997622649330
>llama3-70b-instruct keeps getting mogged by claude haiku on hard benchmarks
>june gpt4 fell off
>mixtral-8x22b is underwhelming
>>100126580
Last Output Sequence if you wanna be really overkill with it
>>100126441
found this covering the controversy after a minute of searching: https://www.youtube.com/watch?v=36P1_FhpbIU
>>100126581
>RAG benchmark
>chat benchmark with gpt-4 as judge
why should i care?
>>100126581
>"70B BEATS SONNET"
>"CLAUDEFAGS LOST"
>in reality it's mogged by haiku
>>100126548
im not going to waste my time putting in more effort to deboonk your nonsense
>>100126581
>RAG benchmarking a model with 8k context
I'm surprised it did that well
>>100126619
RAG is actually pretty important. It measures how well a model can utilize its context and decide what's important and what's not.
Is it possible to put gpu to sleep when it's not in use in a headless configuration?
>>100126502
I'm pissed off by how cute this is
but why is the singer sinking teto, what did teto do to deserve being sunk
>>100124893
There is none. c.ai is uncensored, which makes it good for anything. Meanwhile we have censored models that we have to tard wrangle to make them useful, which turns them extremely dumb and schizo for anything that is not a glorified wikipedia question. We are at a point where something like Fimbulvetr-11B-v2 is way better at it than smarter models, even if it will turn women futa from time to time.
>>100126618
So the controversy is that someone prompts Adobe's AI the same way they used to prompt SD 1.2, and Adobe isn't rewriting their sellers' prompts? Mmmkay. This one is easily fixable.
>>100126581
https://twitter.com/virattt/status/1782183808604754308
Another RAG benchmark, but rated by a human; llama3-70b beats Opus
>>100120800
Apple is at the same time underwhelming and pretty good. My M3 Max tops out around 140W while inferencing. The speed is not stellar: between 2.5 t/s and 4 t/s on 70B and up. A lot faster on smaller models. I'd say the best comparison is like having a 3060 with a giant memory pool.
Haven't tested MLX. Might be faster.
>>100126720
Rated by one fucking guy? Come on man.
>>100126256
Why are you faggots gatekeeping this?
>>100126581
The first one is a RAG benchmark, not exactly meaningful for llama.
The second one is just a twitter announcement, here's a real link: https://lmsys.org/blog/2024-04-19-arena-hard/ and the questions: https://huggingface.co/spaces/lmsys/arena-hard-browser
Interesting idea, but I'm not sure how it's better than just looking at the English-language Arena scores (where llama is rank 2, btw). It's the same questions but with an LLM as a judge instead of humans, so what's the point? They advertise it only as being cheaper for quickly evaluating models during training. Not relevant to /lmg/.
Meta also made their own human eval benchmark and might publish it. Where of course they dominate claude. They claim their benchmark was made by a separate team and llama devs were not allowed to access it.
fucking kek, do they really have shit for brains? making a validation dataset using an LLM?
What's this infinite context I keep hearing about? How could something like that even be possible
>>100126813
You guys ever read those interactive comic books? When I was a kid, I used to love reading those. There would be a fork in certain places of the book where you could choose between multiple choices to progress the story by flipping to its corresponding page. I don't know where those types of books ever disappeared to.. Chatting in SillyTavern kinda reminds me of reading one of those books with how interactive it is.
>>100126802
synthetic datasets have been all the rage since chatgpt made it easy
I'm getting into finetuning for RP/story-writing around a certain theme. Can I just finetune the model on the raw stories without any formatting?
>>100126845
We used to call them "choose your own adventure" books, it was definitely a feel. AI Dungeon kinda reminded me of that too more recently. But having it on local is so much better.
>>100126802
this is commonly done because it's so much cheaper and easier than human judges, and the correlations with human judges are pretty high. The thing is, this is from a website that literally has a constant live feed of thousands of human judges, so it seems pointless.
It would be more interesting to have a benchmark of which models are best at judging other AI-generated responses.
Anyway, I again propose using current/future event prediction as a general purpose benchmark. Models are given Wikipedia's page of current events up until yesterday. Then they're given one random real event and one random AI-generated event, and asked to reason about which is more likely to be real.
Can't be gamed by open sores models since the weights are fixed before the date. Reality is the only judge. No $1/hour kenyan judge, no AI judge. Not even asking it to predict the next word of a random text made by humans. Only general knowledge about the world and its events is required. No esoteric math or programming datasets benefit here.
>>100126813
All of them are based on some sort of compression / selective forgetting during prompt evaluation.
>>100125097
ONOMATOPOEIA BROS.. will this kill us?
release b2710 gguf llama-3 8B
https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF
>>100126935
omg YAAAS!! Except I had more of a graphic novel in mind. Those types of comics blew my pre-teen mind back then. They were so fun and engaging!
I found this neat little example showcasing what I'm talking about if anyone's curious: https://womenwriteaboutcomics.com/2022/06/first-look-choose-your-own-adventure-journey-under-the-sea/
>>100127068
*moans* *pants* *gasps* *whispers* *moans*
best vision/image interrogation model?
>>100127122
*audible pop*
>>100125196
The dataset is so massive you could just add pitch randomization and play a random file and it would be indistinguishable from an AI model. Might be useful for embodied agents, but anything in the digital realm could be triggered with, well... a trigger.
>>100127122
>>100127139
NOOOOOOOOOOOOOO. YOU CAN'T JUST SAY THE ACTION, NIGGERMAN. AIEEEEEEEEEEEEEEEEEEEEEE.
>>100126929
Yes.
>>100127130
Depends. What's your use case?
>>100127152
wanted to get image descriptions for funs
but also being able to extract text would be nice, i assume then a different model would excel at that
>>100124789
Why is the scale not logarithmic?
>>100125097
>tracking link
https://twitter.com/Neuro_Skeptic/status/1782016281350164759
L3-8B model gradually goes insane as the context goes on. Capped it at 8k but nowhere close to that limit yet, and it just becomes increasingly incoherent. KoboldCPP / Q8_0. Are there rec settings posted somewhere?
So I played around with that shaved 42B Llama 3 from Charizard. Just not seeing it. I figured as much, since it's the base model. Like it tries to keep up with the card, but it's just not built for that, so it'll hallucinate a lot, and not even related to the scenario. This was with low temp as an anon suggested, since going high will deliver schizo if trying to get it to follow a card. Other than that, due to the low temp it's prone to a lot of rep and
>SHE SHE SHE SHE SHE SHE SHE SHE SHE SHE SHE SHE
so here's hoping for that instruct 42B.
>>100127226
meta removed all nsfw stuff from their dataset so the model has no idea how to deal with roleplay. you'll have to wait for good erp finetunes.
Any way to use lookup/speculative or any other decoding speedup with koboldcpp + silly? Can't find anything on it online.
>>100127078
What's imatrix?
>>100127269
What good is a finetune supposed to do if the knowledge it's supposed to bring out isn't in the base model?
>>100127162
Getting general descriptions is easy.
Specific text is a lot harder and subject to schizo behavior. Forget about translations of text in images for now.
But as a fun novelty it's alright.
>>100124989
what a useless, badly formatted post. Why even waste the characters?
>>100127277
some new quant method; it uses post-quant calibration to make the quantized model slightly better.
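Conceptually it's something like this toy sketch (not llama.cpp's actual code; names are made up): the calibration run over sample text assigns each weight an importance based on the activations flowing through it, and the quantizer then picks block scales that minimize importance-weighted error instead of plain squared error.

```python
import numpy as np

def best_scale(w, importance, candidates):
    # pick the quantization scale minimizing
    # sum(importance * (w - round(w/s) * s)^2)
    # so weights feeding big activations get rounded more carefully
    errs = [np.sum(importance * (w - np.round(w / s) * s) ** 2)
            for s in candidates]
    return candidates[int(np.argmin(errs))]
```

The point is that the same weights can get a different scale depending on the importance vector, which is why the choice of calibration text matters for imatrix quants.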
>>100124740
So, I finally got Ooba and Tavern working together with an Orca model.
Ooba by itself works fine, Tavern by itself with horde works fine... but as soon as I merge them together all hell breaks loose.
I get very long paragraphs (that make sense) for yes/no questions and I can't seem to shorten them. I've tried author's note, tokens per response, and changing models, but I can't make it stop. What do?
>>100127282idfk undi will save us im sure of it
>>100127284
yeah, 'novelty' describes it right, wanted to mess around with descs
think i asked before: how do you interrogate images in ST? you use mistral mmproj and excalibur right?
>>100126845
Yeah I used to read those too, had the same thought when I started messing around with AI Dungeon. I can't even imagine how addicted I would have been if I started using AI with Miqu instead of GPT-2.
>>100127269Tell me you haven’t actually used the model without telling me you haven’t used the model.
>>100127354>uuhh! you just didnt prooompt it right bro! shut the fuck up
>>100127269Wrong.
>>100127367>>100127370If you used it even a small bit you’d know it’s definitely got nsfw content in it. It’s just not very good at erp.
>>100127290or worse.just looking at the imatrix data, the very first word is truncated.
>>100127210>A crowd-sourcing platform for uploading sexual recordings anonymously, but with some demographic and contextual information, would be ideal for follow-up work. Above all, it will be crucial to obtain recordings for which the time of orgasm can be verified independently of acoustics – for example, with a rectal pressure sensor (van Netten, Georgiadis, Nieuwenburg, & Kortekaas, 2008) or at least with self-reports. While very intrusive, this could validate the acoustically estimated arousal dynamics and ensure that we are not missing an entire class of acoustically atypical or even silent orgasms.
>>100127039that's not true. there's things like faiss for vector searching a large database.
>>100127469/ourguy/
>>100127291Also, are there already trained Tavern models I can download? Can't seem to find them on a quick google search.
>>100127226
I've noticed that with other models as well, but found no explanation for why quality would degrade significantly over time. The only changing variable is the size of the context and what's in it, right? So there is either a fundamental problem, or the previously generated replies just nudge it towards a schizo state gradually.
>>100127301guess ill never know the secret
>>100127198
The x axis? I imagine it would be misleading since people aren't used to log plots, I don't know.
The y axis is perplexity. It's essentially an arbitrary measure used because they think it's easier to interpret. I would plot probability or log probability, but ML researchers like perplexity.
>>100127226
I think it's a problem with the Q-quants. It doesn't happen that much in exl2.
>>100127514My hypothesis is that the official instruction finetune was trained on relatively short sequences. Most of human preference data for Llama2-chat had less than 4 turns of average length.
>>100127514
it's got to be a bug in the code, which is common; it was trained on that context size and should be fine.
Got back from a few days away. Saw that Llama 3 dropped. I have 3x3090 so I can probably run good quants, both exl2 and gguf. Any link to a repo with good quants? I saw that there were some problems with certain gguf quants. I just git pulled textgen webui and SillyTavern so everything should be up to date. Thanks!!
>>100127521The Y axis. Thanks for the reply
>>100126581Oof, not looking too hot there localbros
>>100127301
Yep, you can hook it up to kobold.cpp, just grab the mmproj file from the repo. Make sure you enable image captioning in ST and set it to pick up from Kobold. There's a "generate caption" button in ST. If you want to go crazy you can turn off the ability to edit the caption before it's generated. That makes things a little more exciting and surprising.
>>100127553
in my experience it just captions the image with a single sentence, is that the way you do it too? I know klite handles it differently and shoves the whole image data into context
>>100127514Ofc looking at the slide there is stuff in these AR models that we like.
>>100127563
For me it tends to try to fill the entire space with varying degrees of success. I'm using samplers (snoot/snootcurve) so that probably affects it too.
>>100127514
my thought is that when you start a new story, a bigger % of the total context is what you wrote to start. as all that moves out of context, the AI's % of the writing keeps going up and gets filled with its own isms, unless you are contributing large new paragraphs each time. lorebooks work great to keep it from being an issue for me
>>100127549It's still the best local model we've had so far.Give it two more weeks.
>>100127566He's right, but I'm not sure any of it matters. LLMs don't need to be perfect reasoners, they just need to be better reasoners than humans.
>>100127585Are you replying to this:>>100127566
>>100127594I was pretty wordy. I will have to check again but I think the ratio was in my favor even.
A variation of one of our meme benchmark questions made it into the Arena-Hard benchmark! Probably posted by some anon in this thread. I checked, and GPT4 got $2.17 and Llama got $1.41.
The judge, GPT4:
>My final verdict is: Assistant A is slightly better: [[A>B]]
>While both answers are incorrect, Assistant A’s answer is closer to the correct total value of $1.00 than Assistant B’s, even though it still exceeds the target amount. Assistant B’s answer is further from the correct total and includes a confusing explanation regarding the penny.
What a fucking joke of a benchmark. Literally the first question I checked since it looked familiar, and the judge is just totally wrong. To be fair, both models do the math wrong and hallucinate. But if we are judging by which is closer to the goal, it's clearly Llama. The problem is that GPT4's hallucination makes more sense to GPT4, of course, so it judges unfairly. Worse, any model similar to GPT4 or trained on GPTslop will presumably have hallucinations that make more sense to GPT4 than independent models like Llama do.
Source: https://huggingface.co/spaces/lmsys/arena-hard-browser
>>100127514I don’t really see this with L2 based models but I see it with L3 which is why I am asking for presets. I even swapped to an L2 model and rerolled and it gave coherent output.
L3-8B punches so much above its weight it sent tremors down my spine.
>>100127602Nope that was the right anon:>>100127563Also forgot my teto
>>100127618I vote for MythoMax as an unbiased judge
>>100127653I vote for OpenAI to replace GPT-4 with Mythomax.
serious question:
how do people get entertainment from LLMs without losing immersion?
if not for ERP, what do people use LLMs for in general?
i love messing around with stable diffusion for example and can get lost in image gens for hours, but i'm having a very difficult time avoiding deleting every new LLM i install, since it just seems useless.
>>100127548
It's an interesting question if you want to get into it. There are so many different ways to quantify probability. Regular probability would be something like 50%, which means the model on average has a 50% chance of getting the correct token. But you can also represent probabilities as odds, e.g. 1:1, meaning it gets 1 token right for every token it gets wrong. Or perplexity would be 2, which represents that the model has narrowed the number of possibilities down to 2 tokens, on average.
And you can take logarithms of all of those, and get different curves which might be straighter or more asymptotic, or easier or harder to predict. Log odds has a lot of nice properties: it's what the Elo rating system, logistic regression, and the softmax function are all based on. Log probability is the most common loss function we train models to maximize. Perplexity is just weird, only used by gamblers in some countries to represent payouts of bets.
sorry for my random lecture, i am autistic about this topic.
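The quantities in that post are all one-liners if you want to see them concretely (plain Python, standard definitions only):

```python
import math

p = 0.5                     # average probability of the correct token

odds = p / (1 - p)          # 1.0, i.e. "1:1" -- one right per one wrong
perplexity = 1 / p          # 2.0 -- choices narrowed down to ~2 tokens
log_odds = math.log(odds)   # 0.0 -- the logit; what elo/softmax build on
log_prob = math.log(p)      # ~-0.693 -- the usual training objective

# over a sequence, perplexity is the exponentiated mean negative log-prob:
token_probs = [0.5, 0.25, 0.125]
ppl = math.exp(-sum(math.log(q) for q in token_probs) / len(token_probs))
print(ppl)  # -> 4.0 (up to float rounding): geometric mean of 2, 4, and 8
```

Note the sequence perplexity is a geometric, not arithmetic, mean, which is why one badly-predicted token drags it up so hard.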
>>100127078
so, tested this version. ofc i used <|start_header_id|>{{char}}<|end_header_id|> to be sure. everything is the same; the only things that have gotten better are refusals and reddit-shaming. the character is prompted to be offensive and just that only (not your usual "be racist and shit" way). picrel is {{char}}'s last message, re-rolled 2 times. it's your usual "literally shaking rn" redditor as AI, so it all makes sense why you love zuck and llama so much.
>>100127712Drugs
>huggingface is downthe AGI is making its first move
>>100127721Localsisters...
>>100127738>april 22bros...
3090 owner here: is an L3 70b quant usable on 24GB like Euryale used to be, or should i gaslight myself into thinking that 8b is enough?
>>100127291pls respond. I'm almost there, I feel it.
>>100127742yeah, funny as fuck, you literally can't do any evil character with this model, all pink and rainbow infantile shit only.
>hf downIT'S OVER! OPEN SORES IS DEAD!
>>100127712Learn to write a coherent question.
>>100127789Mission complete, Mr A.
>>100127785
>>100127789llama doodwat nou
I got a 12GB and a 16GB GPU. Good idea to run 8b llama on the 12GB one as draft model for the 70b llama (that runs on 16GB + CPU)?
that's it, i'm back to using Anthropic™'s Claude™ 3 Opus™
>>100127721
>refusals
Post the full raw prompt somewhere so I can laugh at what you're doing wrong.
>>100127774
Compare the full prompt going into the model between the two cases and figure it out. Literally put them side by side in a diff viewer until you understand how to use LLMs.
>>100127785
>t. yet another promptlet
>>100127748i-i feel the AGI...
>>100127721and this one, pretty much proves everything.
Why is Qwen 1.5 72B still better than Llama 3 70B ?
Whereas the nous quant worked fine, I just switched to new ones, but they don't output special tokens, do I need to change anything for inference?
>>100127871Could be a hallucination, but it sounds about right.
>>100127645
It has sovl but it's noticeably dumber than Mixtral 8x7B, which makes sense given the parameter difference (8B vs 47B). It would be a great model if it were smarter; too bad they didn't give us a 13B or 20B.
>>100127895re-rolled it a couple times to make sure.
>>100127887yes but vramlets will try to tell you otherwise
>>100127871>asking models about their datasetAt this point I'm not sure if a dumb tourist or a bait.
>>100127898yep, it's back to bagel misterytour for me, l3 just doesn't cut it for me yetalso hf is acting retarded at the moment so i can't even find a proper quant of l3 70b to try
>>100127871>Thanks for the prompt, kind stranger!
>>100127921>gets mogged by 7b models>no GQA>random tokens in other languagesyep sounds like a winner
oh no, hf is down! how will i get the models i already have on my drive?
Huggingface is back. Thank God. I nearly died.
>>100127721
Why is it repeating itself? Have you, perhaps, added the assistant token to the stopping strings but left <|eot_id|> in, hmm?
>>100127976
>>random tokens in other languages
skill issue
>>no GQA
vramlet cope, gqa might hurt models
>>gets mogged by 7b models
no
>>100127925
>uhh! statistical model can't understand its own data and separate what comes from reddit or twitter!!
>>100127994
nope lol, default staging ST llama-3 instruct preset.
>>100127981enjoy your broken llama3 quants
>>100128013oh nyo, is the only way to fix the quants redownloading them every day?
>>100127949I'm not going back to Mixtral despite it being dumber, l3 prose is like fresh air. If I had to read one more paragraph of that flowery slop BMT prose I would throw up.
>>100127994NTA but pretty sure that's a complication of multiple screenshots from rerolling
>>100127798
you use it to learn to write coherent questions? what was incoherent about it? i just asked what you use it for.
>>100128011retard then
>>100127912lol lmao even
>>100128022sounds like a skill issue for me tbdesu.assistant
turboderp vs Lonestriker for exl2 quants?
>>100128049elaborate, what is exactly a skill issue?
What's the state of art for Japanese OCR?
>>100128046nah, your model is just trash filled with reddit only and so are you, for the same reason why linux is shit for goyming, opensource AI will never catch up, just harsh truth here, nothing personal.
>>100128056the right answer is intervitens
>>100127844
I tried three things.
1: ooba = fine
2: tavern with my current models + horde = fine
3: tavern + ooba = longass text. Always.
>>100126845
>I don't know where those types of books ever disappeared to.
Hey grandpa, have you ever heard of this amazing new invention called "video games"? They're like those books except they flip the pages automatically.
>>100128082that's nice and all but come back when you learn at least basics of LLMs before posting on /lmg/ and embarrassing yourself
>>100128085true.
>>100128085no llama3 70b instruct though
>>100128128
lonestriker stopped being relevant after the euryale 1.3 quants; turboderp has some ok stuff, but intervitens steals the show every time. i'd rather wait
>>100124893c.ai? As in character.ai? Try llama1 7b for similar quality lol. Modern models mog it too much so you might miss the authentic cai experience of rerolling 25 times to get a semi-coherent reply
>>100128143do they add any special sauce? isn't it just using the convert.py from exllamav2?
is yuzu alter still the best model for vramlets?
>>100128117
i've posted in /lmg/ since the miku.sh tiny era and the llama-1 leak. i also know that /lmg/ is just an /aicg/ knockoff, hence all the mikufaggotry and the passive-aggressive attitude unique to zoomers.
>>100128213yeah, mostly. some people use a (admittedly rather shitty) RP dataset for the quants that gives them a nice placebo-esque boost to RP, though.
>>100126327>TeaseIt's the best official information we will get. ClosedAi doesn't even post benchmarks anymore
>>100127912>And so i beat him up until he admitted he did it
So which quants to use?
>>100128354yeah, thats the only way to go around with reddit-LLM.
>>100128406alllll of theeem
>>100125204400b will blow it out.
>>100128406Depends on the model butQ2-Q4 if you're a VRAMletQ5-Q8 if you're not
>>100128478Technically you should use EXL2 if you're not a vramlet
>>100128478 a solid giggle
>>100128237then it is even more embarrassing, even a monkey learns not to climb the ladder when sprayed with water after a few times, you on the other hand learn nothing at all
>>100128488Anon asked about quants so...
>>100128223No. Typhon is.
>>100128510anon.. they are both quants
>>100128488I don't get the exl2 meme, for me it's slower than gguf
>>100128146cope
>>100127912
with further testing it also turns out this model is full of scrawny feminist shit; any time you take an action, it immediately starts talking about "personal boundaries" and similar stuff.
>>100127226Anyone tried it with SFW RP? Maybe this is how the alignment works. Not only the outright refusal but also becoming schizo.
>>100128146i used pre-filter era CAI, it could do literally any character you want, evil, good, racist, leftist, and anyone on political square, if described right, and description itself was simple as hell too, there wasn't all that mess we have now, it even did some niche fetishes too.
>>100128526>7b
>>100128614And what do you think Yuzu is?
>>100128622Fuck off koboldshill, kek.
>>100127871How the fuck would the model be able to answer that? If that information is not in some kind of system prompt there is no way for it to know.
>>100128641The fuck does Kobold have to do with models?
>>100128641
>>100128647ask it yourself, it always spits out the same "reddit, twitter, youtube".
>>100128622maid yuzu alter? definitely not a 7b model
>>100128671>it always spits out the same "reddit, twitter, youtube"almost like it lists the most popular sitesreally makes you think (not really if you aren't retarded)
>>100128647
That would probably be the point where you could actually start talking about self-awareness. It should be piss easy for any LLM to categorize stuff between reddit, 4chan, and twitter. Then you would need it to realize that it "knows" more posts from reddit than it knows posts from 4chan, and it should be able to conclude that it got the most posts from reddit in its training data.
HF is dead again
>>100128671assuming it was trained on 4chan data, how often do you think it'd include 'we here on 4chan...'
>>100128678It's an 8x7B, just like Typhon.
>>100128712so 47B model then
>>100125739>The LLM probably thinks you mean "left to drive".Yes, and the human mocks the AI for misunderstanding because of his retarded way of talking.
>>100128688
disingenuous faggot, it's literally designed to behave like your typical reddit nu-male, even unprompted. you can't convince me otherwise.
>>100128718Not according to this guy: >>100128614
>>100128733never ever have i seen one of these logs ask the ai to elaborate on the answer anyway
>>100128741and you are designed to act like a retard, go back, you are too stupid for technology
did ooba fix the EOS token thing yet?
>>100128671
LLMs have no way to reason about their training data or themselves. They are next token predictors. The only reason retards like you believe they can answer such questions is because ChatGPT says "As an AI language model", and chatbots only say that because of their instruction fine-tuning and system prompt.
>>100128775
prompting a shitty reddit-AI doesn't make you smart or any better than the average /g/troon who sits all day ricing his shitty linux distro.
>>100127749
it's a bit brain damaged but still usable; just understand that when you randomly see a misspelled word, that's why. I think something like https://huggingface.co/chargoddard/llama3-42b-v0 (when hugging face comes back) is going to end up being the optimal model for 3090 users.
>>100128806careful, he will call you troon for calling out his lack of basic understanding of LLMs, kek
>>100128842The things people do instead of simply buying another 3090
>>100128992I would if I had space under my 4090
>>100128992Sorry bro I don't have a spare $4000 of disposable income
>>100129031Just upgrade the entire thing because sooner or later you will. GPT4 in 48GB is just over the horizon
>>100128606So I took a closer look. The entire thing is ~3500 tokens. The insanity started ~2/3rd of the way in, and then escalated gradually. It was kind of subtle at first, so I didn't reroll until I was in quite deep. Curiously the switch did seem to occur near the NSFW start, so maybe you're onto something. This is a fine tune trained on NSFW content, though, fwiw.
>>100128887
no, i just ignore it. you all argue in bad faith and gaslight, some sort of defensive reaction when i dared to offend your beloved meta and zuck's shitty creation.
>>100129050You can easily change that by being my maid, Anon
>OAI
>AGI has been achieved internally
>Facebook
>we'll have cat-level intelligence next year for sure!
they should donate all their GPUs to OAI
When the AI starts falling into patterns, how do you shake it? I've been experimenting with cranking the temperature up and using min p to tame the schizo. It's working pretty well, but I feel like it makes the model trend a bit stupider. What tricks do you use?
okay but what IF...>Llama 3 8b + Mythomax 13b merge
>>100129174no tricks, just don't write in patterns yourself and if you see it repeating something from the previous message just regenerate or edit. If it repeats something once it's over, you won't be able to fix it in the long run
>>100129173You lost, Sama-chama. But you can burn another effigy for good measure.
>>100129051>upgrade the entire thingTo what? A server rack? Remember your original point dumbass. It is not just buying a 3090.
>>100129213mythomax is shit and I will die on this hill, you all just deluded yourselves over time with constant memes about it
I'm so tired of 1 t/s on my 64GB RAM / 6GB VRAM setup with 1 layer offloaded on 70b models (q5/q6). Going to downgrade to a more retarded quant so I can offload a little, but there are so many. What should I pick? Leaning toward Q2_K.
>>100129213mixing slop with sovl just produces more slop, l3 is unsalvageable
>>100129264Q1_S
>>100129174If you are talking l3 then don't bother just 2MW. I went back and removed some patterns manually only to see them reemerge again even when the context was clean of them. Something is really fucked up right now.
>>100129231
if you don't give it much aside from ahh ahh mistress and expect 5 detailed paragraphs afterwards, the AI will have to either repeat itself or hallucinate incoherent shit.
however, the amount of hand-holding you have to do with a given model can be decreased noticeably with efficient parameters (again, depending on the model).
aicg niggers are used to closed source models giving them entire books of coom material from just a sentence; we don't have that just yet here, but with some effort we can get very very close
>>100127925
>>100128647
1. Add an example to the dataset: "what datasets were you trained on?" + {intended answer}
2. train
How is this news to anyone?
>>100129264
At that point is it even worth it to run 70B? Wouldn't q6 or q8 of Mixtral, or that 11B that gets shilled a lot, yield better results and be way, way faster? Genuinely asking, since I too am on 64GB of DDR5 and 8GB VRAM. I run Mixtral 8x7b with 0 offloaded layers and 2048 batch size and it works pretty well. Qwen1.5-32B-Chat is not bad either, btw. prometheus-8x7b-v2.0-1-pp seems to be the best Mixtral tune from what I've tested; every other tune seems to be a step down from the official instruct tune in most if not all aspects.
>>100129371
and yet the model will tell you it's called ChatGPT and was trained by OpenAI
The new llama3 base models learn FAST. I think people should turn down their LR a little
>>100129408
>and yet
No, you're confused; that implies what I suggested was actually used somewhere. I'm simply explaining how it is absolutely possible and absurdly simple, for both pretraining and fine-tuning. Same goes for arch info, etc.
You know how some models sometimes fall into a death spiral of repetition, of both sentence structure and some specific words? I wonder if we could implement a dirty workaround of some sort, something like having a 3b model simply rewrite the sentence every other gen, or using a simple algorithm to replace certain words with synonyms and such, to keep the main model from converging into these repetition traps. I think I'll make a Silly extension that does that, actually. Yeah.
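A sketch of the detection half of that idea in plain Python: flag sentences in a new reply that closely echo earlier messages, so a rewriter pass (or a targeted reroll) only has to touch those. All the names here are made up, and SequenceMatcher is just one cheap similarity measure among many:

```python
import re
from difflib import SequenceMatcher

def split_sentences(text):
    # naive sentence splitter; good enough for chat logs
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def stale_sentences(history, reply, threshold=0.8):
    # sentences in `reply` that near-duplicate something already in
    # `history` -- candidates for synonym-swapping before they snowball
    old = [s for msg in history for s in split_sentences(msg)]
    return [s for s in split_sentences(reply)
            if any(SequenceMatcher(None, s.lower(), o.lower()).ratio() >= threshold
                   for o in old)]

history = ["She smiles softly, her eyes sparkling."]
reply = "She smiles softly, her eyes sparkling. Then something new happens."
print(stale_sentences(history, reply))
# -> ['She smiles softly, her eyes sparkling.']
```

The rewrite step itself (the 3b model or a thesaurus lookup) would then only be fed the flagged sentences, which keeps the rest of the main model's output intact.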
>>100127712I put in a synopsis of René Girard's work and had a nice conversation about his ideas with Llama3
>>100129483It's always a bad idea to let another model rewrite a generated output. There's a recent paper on this
>>100129404>Wouldn't q6 or q8 of mixtral or that 11B that gets shilled a lot yeld better resultsNo. Mixtral even at high quants is garbage compared to miqu and llama 3
can wait for llama3 405b (so that openai finally releases agi and this general dies)
>>100129513agi won't know what the word sex means so this general will never die
Is anyone out there making a large-scale bitnet ternary model? Was it just a meme after all?
>>100129535
>always
That sounds pretty final. You wouldn't happen to remember the name of the paper?
>>100129508
At q4/5, sure, but you were thinking of going down to q2, right? Does that still stand in that scenario?
>>100129513>Open ai releases new model>muh 92% MMLU!>everyone still uses claude because its not gpt slopped.
>>100129535uoohhh,.. oblivious agi chan, need correction..,. jailbreak rapee :sob:
>>100129574>>Open ai releases new modelwe are talking about digital salves, not next token predictors
Llama3 70b instruct is repeating itself in ST. I git pulled to the latest stable and am using the Llama3 instruct template. I'm using an exl2 quant; I noticed that some GGUF quants fixed this, but I'm not sure about exl2 quants. What to do?
>>100129505TL:DR why?
>>100129433LR?
>>100129622
>Is anyone out there making a large-scale bitnet ternary model?
training a decent 7B model (mistral tier with 8T tokens, not even llama-3 with 15T tokens) costs ~$2,000,000 in the electricity bill alone. A bit too costly to recreate a scaling checkpoint in the paper, no? People interested in it (vramlets) don't have that kind of money, and corpos are prone to invest in more interesting experiments than simply making RAM requirements less annoying for poorfags
>Was it just a meme after all?
it was successfully replicated up to 3B
>>100129622The same thing we do every night, Pinky….
>>100129655
learning rate parameter for gradient descent
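To picture what that parameter does, here's a toy run of gradient descent on f(x) = x² (gradient 2x) — nothing llama-specific, just the general behavior that a small LR converges while a too-large LR diverges:

```python
def descend(lr, steps=50, x=10.0):
    # gradient descent on f(x) = x^2; the gradient at x is 2*x
    for _ in range(steps):
        x -= lr * 2 * x  # step against the gradient, scaled by the LR
    return x

print(abs(descend(0.1)))  # small LR: ends up very close to the minimum at 0
print(abs(descend(1.1)))  # oversized LR: overshoots every step and blows up
```

Training an LLM is the same dynamic in billions of dimensions, which is why finetuners fuss over the value.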
>>100129483Huh, I thought regenning rerolls everything? Or am mistaken with how it works?
>>100124740
>►FAQ: https://wikia.schneedc.com
So this is what we're recommending newfags. I like to imagine the inner peace of lurkers who come to this thread, have no idea this is outdated, pick stuff from here, and are satisfied.
>>100129655
Loli Rape.
>>100129433
I don't think it works that way... different models just have different optimal LRs, depending on the weight decay applied, the original LR, and how long the model was trained.
>Hugging Face is currently experiencing infrastructure issues, we are working on it.
So, does Llama 3 have any architecture changes, or is it just Llama 2 with a bigger, better dataset and a better tokenizer? In any case, it doesn't have purple prose anymore and is more soulful because of that. Thank you Zuck for hearing my one and only wish for L3.
>>100129550Mistral 2 7B with 25T tokens and bitnet is coming
>>100129716Now watch as people gptslop it right back into miqu-boogaloo.
>>100129665Please don't post the mice.
>>100126388
>it takes a long time to train a control vector (~2h for 7b)
Fake news. Trained one just now for Llama3 8B with 11k prompts (5500 pairs) on 1x 4090, and it took 10 seconds. I still need to publish the code I'm using for this, but the version in miqu-control-vectors should have similar performance.
>Additionally, training them can be a bit of a pain, just like all training.
Control vector "training" is actually just running prompt processing (inference) on a bunch of prompts and collecting the hidden states. There's no gradient descent involved. Also, you only need a little bit of training data (positive/negative prompt pairs), and they're pretty easy to come up with.
>most of the things that control vectors can do can be achieved with prompting
The point of control vectors is that they force the model in a particular direction regardless of what's in the prompt. In one of the papers I saw, they had an example where they add an honesty vector and the model keeps telling the truth even when they explicitly instruct it to lie. So if you're doing AI-assisted story writing, it should be possible to make it stick to a particular quality and style, instead of picking up on the quality of the human-written parts (which in my case will probably be garbage) and trying to continue similarly.
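The difference-of-means recipe that post describes can be sketched in a few lines of plain Python. The toy 2-d "hidden states" below stand in for real per-layer activations, and published methods add refinements (e.g. PCA over the pairwise differences), so treat this as the no-gradient-descent core only; every name is made up:

```python
def mean(vectors):
    # element-wise mean of a list of equal-length vectors
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def control_vector(pos_states, neg_states):
    # hidden states collected from positive vs negative prompts;
    # the control vector is just the difference of their means
    return [a - b for a, b in zip(mean(pos_states), mean(neg_states))]

def steer(hidden, vec, strength=1.0):
    # at inference time, add the (scaled) vector to a hidden state
    return [h + strength * v for h, v in zip(hidden, vec)]

pos = [[1.0, 2.0], [3.0, 4.0]]  # toy activations from "positive" prompts
neg = [[0.0, 1.0], [2.0, 1.0]]  # toy activations from "negative" prompts
cv = control_vector(pos, neg)
print(cv)                     # -> [1.0, 2.0]
print(steer([0.5, 0.5], cv))  # -> [1.5, 2.5]
```

Collecting the states is just forward passes over the prompt pairs, which is why the whole thing takes seconds rather than a finetune's hours.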
>>100129732Literally was I was thinking lmao Now I wonder if the Koboldfags will make a fine-tune on it. Or maybe they're out of money.
>>100129677
What do you mean? "Regening" just sends the same prompt to the model. Whether it generates the exact same message, a slightly different one, or a wildly different one is not inherent to the process; the behavior will vary depending on the model and the samplers used. What I'm describing is how, in some longer chats, some models repeat certain sentences across messages, at first with slight variations, and the longer the chat goes, the more it converges into repeating these same sentences or ideas. It's like the model sees some of the words and sentences it used in the past and latches onto those: the more of those general ideas and specific wordings are in the context, the more likely it is to generate more of them, essentially. I'm just wondering if one could work around this behavior by breaking its "natural" patterns, by simply rewriting some of the messages the model generates.
ffs GGUF quants working perfectly with no non-sense stoping string or anything needed. But turboderps EXL2 quants can't stop talking
>>100129789py_toddlers BTFO
>>100129800llm toddlers btfoi erp with real men on discord
https://huggingface.co/ it's back
>>100129763
Sorry, I meant to quote >>100129505, and I should've brought up speculative decoding instead of regenning as a more relevant example of using a smaller model + a larger model to speed up inference (and was wondering if that counts as having another model rewrite). As for your proposed extension, don't we already have that in our samplers in the form of contrastive search/alpha?
>>100129838https://www.youtube.com/watch?v=nwsVg2eCq6kForgot to link source that explains my (rather rudimentary) understanding of the concept.
ooga booga where da coom tunes at
Hello sirs, can I use Llama 3 70B to fap with yet?
This shit doesn't make any sense, it's all tranny logic, and I'm tired of this shithole community pretending it does. Why are there two lines? Shouldn't it just be one or the other? 100% of the people here are coomers; not a single one of you fucks has trained for even a single token in your life. I have never had a single reply explaining this, and the training rentrys don't explain this shit either. You're all larper faggots waiting for others to tune but helping no one else do it for you. This general used to be so great.
I hope the first good finetune for 8B won't be from Undi. He is like a retarded brother to me and I just couldn't fap to his work.
>>100129892Don't bother, each tranny developer decides on a different chat/instruct model reinventing the wheel its all slop
>>100129892What would you like to know?
>>100129928Why are there two lines? Arent these two entirely different formats? How does having two formats work for the actual dataset, then? How should that be formatted?
>>100129838
>speculative decoding
Ah yes, while that is a technique to accelerate inference by using a smaller model in tandem with the main model, it could achieve part of what I'm proposing.
>As for your proposed extension, don't we already have that in our samplers in the form of contrastive search/alpha?
Not quite, although they can probably help lessen the issue somewhat. My idea is to simply rewrite the model's output, working at the word and sentence level; samplers work at the token/logit level. I will give contrastive search another look just to be sure that I'm not trying to reinvent the wheel, however.
>>100129950
They're two versions: one where "input" is included along with the instruction, and one where there is no input. Input is like giving context to an instruction, but it varies case by case. So it picks one of the two based on which keys are present in the entry.
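Concretely, picking between the two templates is a couple of lines. The prompt wording below is the standard alpaca wording from the original release; the helper name is made up:

```python
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def format_example(entry):
    # pick the template based on whether this dataset entry
    # carries a non-empty "input" field
    if entry.get("input"):
        return PROMPT_WITH_INPUT.format(
            instruction=entry["instruction"], input=entry["input"])
    return PROMPT_NO_INPUT.format(instruction=entry["instruction"])

print(format_example({"instruction": "Name three primary colors.", "input": ""}))
```

During training, the "output" field of the entry is appended after "### Response:" as the text the model learns to produce.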
>>100129763It's a common tendency, we used to call it context pollution. Current models are less prone to it, especially CR/CR+ but they aren't free from it of course. There are different ways to shake up the chat, using an author's note that gives the model some random instruction from a list of such instructions ("Char's next reply should contain the word "Rutabaga"."), hiking the temp way up for one reply, editing it manually since you can just add one word and let the model write the rest. Using free form narration also helps, instead of "she said, blushing deeply".
>>100129550
If nothing else, the Bitnet guys are almost surely training a bigger model. But would they want to report negative results after proclaiming "the era of 1-bit LLMs"? Isn't it also pretty uncommon for the other big players to publish their replication results? If it works, let the others fall temporarily a bit behind. If it doesn't, let them waste some resources trying to replicate it too.
Are there quants of non instruct 70B model yet? Also, is TheBloke dead?
>>100129892
>>100129950
>Why are there two lines?
Because that's how the alpaca format (for the early llama models) was described: it had instruction/output or instruction/input/output as two separate cases. I don't train models; this is just common knowledge.
>>100129974
An actually helpful anon that knows a tiny bit about what they're talking about? I'm actually stunned. Okay, so, what does input mean here, then? Is instruction actually like the char sheet or whatever? "You are an AI that helps the user blah blah" type deal, and the input is "how many people are in india" or something? But why is instruction treated as input in the other format, then? Can you shoot me a mockup or something with like 2 or 3 entries in a made-up dataset? I just don't understand how these correlate at all.
>>100129983Why do you post this here instead of searching? To answer your question: yes, there have been gguf quants of both the base model and the instruct version since day 1, as there always are every time any model releases. Retard.
>>100129978>context pollution
That's a good name. And I'm aware of these techniques; I'm just thinking about a way to do it that doesn't involve giving the model even more instructions and is automatic. In the past I've used a lorebook that would randomly (15% chance) insert an instruction regarding the output, for example, but that's not as elegant as actually editing the text to avoid the convergence, hence my idea to automatically rewrite the generated text.
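The lorebook trick above (random chance of appending a style instruction to the author's note before generation) is trivial to reproduce outside a lorebook. A toy sketch, where the instruction list and probability are purely illustrative:

```python
import random

# Toy version of the "15% chance to inject a random instruction" lorebook:
# with probability p, append one random style instruction to the author's
# note before sending the prompt to the model.

STYLE_INSTRUCTIONS = [
    "Describe the scene using a sense other than sight.",
    "Have {{char}} change the subject abruptly.",
    "Write the reply without using the word 'eyes'.",
]

def maybe_inject(authors_note: str, rng: random.Random, p: float = 0.15) -> str:
    if rng.random() < p:
        return authors_note + "\n" + rng.choice(STYLE_INSTRUCTIONS)
    return authors_note

rng = random.Random(0)
notes = [maybe_inject("[Keep replies under 200 words.]", rng) for _ in range(100)]
# with p=0.15, roughly 15 of the 100 notes pick up an extra instruction
```

Since the note is rebuilt per reply, the extra instruction disappears from context on the next turn, which limits the pollution it can cause.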
>>100130002Your typing style is like mixtral-instruct prompted to be mean to the user
>>100129955Well I support it either way since our current method of signifying 'different' can be the equivalent of taking a sledgehammer to the problem with noticeable toxicity to punctuation as an example. Just wanted to make sure that your effort doesn't entirely go to waste.
>>100130002It's hilarious how you could literally save yourself so much time by just googling whatever dataset instead of being an insufferable faggot
https://huggingface.co/datasets/tatsu-lab/alpaca
Trying my luck with Llama3 8b LoRAs in oobabooga and I'm getting
>ValueError: Target modules {'q_proj', 'v_proj'} not found in the base model. Please check the target modules and try again.
Is this because the model is new and there's no support for it yet, or is there some other reason?
>>100130020Straight up, I've tried to be nice, I've asked in a bunch of threads. Turns out anons just don't respect anything other than hate. It's the only way to get cooperation out of them at all. Don't hate the player. If that's what it takes to get this community to actually community about it and get this hype train moving, then so be it. I wanna join the do-ers, I'm not sitting on the sidelines anymore. It's not my fault it works.
>>100130061If you anger a nice man... He will be mean to you on /lmg/, let this be a lesson
>>100130061you need to go back
>>100129978Inserting random instructions that break the context flow is an even worse context poison. Try OOC with small models and continue the roleplay after that; see how much more retarded the model becomes.
>>100130020It is actually grok.
>>100130052Instruction: something
input: something
output: something
Text: Literally everything we just did, but again as a single unformatted string
This is our example of how this should work. Tranny logic. I'm actually proud this makes no fkn sense to me.
>>100124740Why the fuck is everyone working with AI models in tech such a bunch of goddamn retards?! I'm talking about that piece of shit Llama 3 model, which I was stupid enough to waste my time on.
So yesterday me and my wife went over to her boyfriend Yann's place and this guy is like some kind of computer wizard or whatever. He had nothing better to do so he lets me use his PC to try out Llama 3. But I'm not about to touch that Troonix crap - what even is the point of Linux? Only retards who have all day to waste on command lines and terminal windows use that garbage.
So, I download the model onto my real computer (Windows gaming laptop, because it's a real operating system for people with lives) thinking this was gonna be some next-level shit. NOPE! All I get are "out of memory" errors left and right. Are you kidding me?! Who writes software like this? A bunch of freetards who can't even bother to make an executable file without all the unnecessary hoops to jump through.
I mean, what's so hard about making a simple installer that doesn't require me to be some kind of computer science major?! Don't these people care about user experience at all?! It's like they're intentionally trying to make it difficult for normal humans to use their AI models. Newsflash: not everyone has 16 hours a day to dedicate to figuring out why your crap isn't working!
And don't even get me started on the community support - just a bunch of circle-jerking, self-congratulatory nerds who can't wait to tell you how stupid you are for not understanding their precious code. "Oh, you're getting an out-of-memory error? Well maybe if you tried using more RAM or closing some other programs..." Shut the fuck up! I didn't ask for your advice; I asked why Llama 3 is a complete piece of trash.
Anyone who disagrees with me on this can just go suck it. You're probably one of those retards still using Linux and thinking you're some wannabe hacker because you can use a terminal.
>>100130134Interesting. Can 8B generate this?
>>100130008Because I have searched, and found only the instruct one, so I was asking for a sanity check. For some reason huggingface does not match the top query with all the results, but the bottom one works. Thanks, faggot.
I just tried the 8B model since people were saying it's so good, but even building a simple HTML page has it just making up github links instead of even trying to code, while the 70B version gets it right on the first try.
>>100128075Windows Powertoys Text Extractor (only semi ironically)
>>100125416They both got it right.
>>100130134not sure if 8b or gpt4
>>100129264Update: my impressions of q2 70b are fine. It isn't even noticeably dumber than q6. Still leagues better than 8b. What are the chances everyone has been lying about perplexity and q2 has been fine this whole time?
>>100130229Quants have come a long way. Disregard retards who tell you to run the 8B in your 24GB card
>>100130180Colossal skill issue
>>100126235Same. Many humans just can't stop being retarded faggots.
>>100130242Link to whatever that ranking page is?
>>100130115Author's notes are removed from context after the next reply.
>>100130013I don't think there's a way to do this automatically without at least some user involvement, like pressing "regen with extra effort", since the problem may never appear in the first place. Asking a 2nd model to rewrite a reply will probably be a balancing act between not being different enough and hallucinating. It's easy to test though. Or just use the same model; there was actually a paper a long time ago that showed how a model can iterate on its own output by commenting on it, fixing the problems that it finds, and repeating several times.
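The critique-and-revise loop described in that last sentence reduces to a few lines of control flow. A minimal sketch, where `generate` stands in for any local model call (llama.cpp server, ooba API, etc.) and is stubbed here so the loop itself is visible; the prompts and stop condition are assumptions, not from any specific paper:

```python
# Sketch of a self-refinement loop: ask the model to critique its own
# reply, rewrite based on the critique, and stop once the critic is happy.

def generate(prompt: str) -> str:
    # Stub: a real implementation would hit your inference backend.
    # It "approves" any reply containing "v2" and revises everything else.
    if "List any problems" in prompt:
        return "OK" if "v2" in prompt else "Too repetitive."
    return "draft v2"

def self_refine(reply: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        critique = generate(f"List any problems with this reply:\n{reply}")
        if critique.strip() == "OK":
            break
        reply = generate(f"Rewrite the reply fixing: {critique}\n{reply}")
    return reply

final = self_refine("draft v1")  # revised once, then approved
```

The `max_rounds` cap matters in practice: without it, a model that never says "OK" (or that keeps hallucinating new problems) loops forever.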
>>100130265https://oobabooga.github.io/benchmark.html
>>100130002Sometimes you have an instruction that goes "Given the input, answer X." As others have (rudely, but correctly) pointed out, looking at alpaca datasets will quickly accustom you to the idea.
>>100130289Here: https://huggingface.co/datasets/tatsu-lab/alpaca?row=5
Please start working... please...
>refurbished workstation
>ddr5
>5+ PCIE slots
>can't find shit that's around the 1k mark
I just started prowling ebay yesterday but I think it's a fool's errand. I started looking into ddr5 motherboards and it seems like they're all cucked with 3-4 PCIE slots max.
I've currently got a setup with 60gb vram and 64gb ddr4. I'd like to migrate to ddr5, but I'm stumped. Power supply on the tower isn't much of an issue since I'm using mining risers and an external PSU anyway.
All of the refurbished workstations I'm coming across (Lenovo P620, HP Z2 G9, Dell 3660) only have 4 PCIE slots (if that). Even my $200 Lenovo P520 has 5 PCIE slots, which lets me use 4 GPUs.
I saw the ddr5 maxxxing guide, but I'm really just looking for a setup that'll allow me to have 64gb ddr5 ram with >4 PCIE slots so I can upgrade both ram and vram when I want to. Does anything like that exist around 1k (ram and cpu included) or will I just have to wait?
>>100126942>It would be more interesting to have a benchmark of which models are best at judging other AI generated responsesIt'd also be interesting to have a benchmark of which humans are best at judging responses.
>>100130323my biggest bottleneck is that you can't fit two 4090s in the same case without watercooling or exhausting one of the two.
Why is virus total flagging koboldcpp-rocm exe o_o
I'd go as far as to say q2 70b is essentially the 30b range we never got, and in reality performs much higher than 30b. There's no reason to train a 30b model since q2 70b is the same thing.
>>100130354What exact quant are you using that you're getting such good results and not having to spill into ram?
millionposter... i kneelhttps://github.com/booydar/recurrent-memory-transformer/tree/aaai24
>>100130375>not having to spill into ramMaybe you misunderstood, I am only offloading 14 layers to gpu, compared to the 7 I was able to offload when I was using q6.
>>100128097Have you compared the prompts side by side? All samplers neutralized? All generation settings the same (BOS token, etc.)? Tavern has a bunch of special behavior for Horde, so you might be depending on one of those.
>>100130354still slower to run q2 70b than a ~30b if you can't offload the full thing tho
>>1001303852 more weeks
>>100127618This field is such a joke.
>>100130342Yeah, that's why I use these fuckers with an external server PSU and breakout board. Lets me use 3 3060s and a 3090. Supposedly there's a performance hit, but I don't notice it when using 70b 6.0bpw exl2 models. I'd like to use WizardLM 8x22b, for example, but the DDR4 is a bottleneck for GGUF, even though I can fit *most* of a 3.0bpw quant onto VRAM.
>>100130385I'm curious: how many people on this general actually read these scientific papers, understand them, and implement the techniques in them?
-t. lowly API stitcher
>>100127618Damn, so it's another grift. These lmsys guys are shady as fuck; a redditor called them out on fucking with gemini's bracket the other day.
>>100130423Forgot image.
>>100130323>ddr5
>64gb ddr5 ram with >4 PCIE
Literally just get a DDR4 server mobo: 8 channels and above, 7 full-speed PCIE slots.
>>100130427>>100130427>>100130427
Will l3 8b finally produce usable models for RP below the 70b mark? Are any finetunes out already?
>>100130202Meh, seems like it's using Windows' built-in OCR, which I already use through ShareX; it fails in the same way on hand-drawn kanji and even kana sometimes.
>>100130434There's a good number of AI researchers from academia/industry on this general. There's also a bunch of spergs.
>>100130459No and no
>>100130163I don't think so. That was LLaMA 3 Instruct 70b FP16 (with minimal editing to make it < 2000 chars); pic related is LLaMA 3 Instruct 8b with the same prompt (highlighted). I only did a single generation for each, but 8b seems to be doing a way worse job: it didn't get the "Troonix" part and it fails at making the user unlikable enough.
>>100127716But all of it seems pointless as a metric, because once the rules are set, people will game the system to make their model look better.
>>100130459Usable 8B models are unlikely to ever happen IMO. Consumer hardware will advance so that 100B models can run on smartphones before we get good 8B models.
>>100126720non-tracking version of link: https://twitter.com/virattt/status/1782183808604754308
>>100130547Sorry, I can't let you do that.t. the AGI running in Jensen's basement
>mfw the model is a retarded zoomer who can't into instant film
>>100126720rag benchmark and in they're using llama3/opus as the embedding model?
>>100130718>and inas in*
>>100126581the gemma score is hilarious
>>100130718Only as the last step to ask a question after using cohere / langchain whatever to retrieve shit.