/g/ - Technology


File: llama3.jpg (141 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100119461 & >>100113005

►News
>(04/21) Llama 3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
>(04/18) Llama 3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
>(04/17) Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/
>(04/15) Microsoft AI unreleases WizardLM 2: https://web.archive.org/web/20240415221214/https://wizardlm.github.io/WizardLM2/
>(04/09) Mistral releases Mixtral-8x22B: https://twitter.com/MistralAI/status/1777869263778291896

►FAQ: https://wikia.schneedc.com
►Glossary: https://archive.today/E013q | https://rentry.org/local_llm_glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1706926343557210.jpg (58 KB, 600x436)
►Recent Highlights from the Previous Thread: >>100119461

--Paper: Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding: >>100122269
--Understanding RoPE: Rotary Position Embedding in Models: >>100120378 >>100120666 >>100120594
--Anon's Concerns about LLaMA 42B Model's Performance: >>100120593 >>100120832 >>100120858 >>100120929 >>100120846 >>100121075 >>100121109 >>100121370 >>100121729
--Don't Expect Base Models to Excel in Conversational Tasks: >>100121997 >>100122071 >>100122073 >>100122277
--Q2_K Model Works Properly from bartowski's Meta-Llama Repo: >>100121915
--Running Local Models on Apple Silicon for Off-Grid Energy Efficiency: >>100120800 >>100120932 >>100120822 >>100120853
--Broken GGUF Model Explains Benchmark Results: Bartowski's Mixtral-8x22B-Instruct: >>100119642
--Anon Drops Experimental llama-3-daybreak-v0.1-8b-hf Model: >>100122501 >>100122556
--Quantization Woes: I^2 Imat vs K Quants: >>100121811 >>100121963
--Anon Questions Llama.cpp Patch Impact on Output Quality: >>100121277 >>100121318
--Anon's "Unslop" RLHF Dataset Experiment - Feedback Wanted: >>100120689 >>100120729 >>100120765
--Training LLM with Wiki Pages & Game Dialogue: >>100120638 >>100120656 >>100120675
--Anon's Sampling Strategy Conundrums: >>100120453 >>100120594 >>100120781
--Fine-tuning AI Models Locally for Teaching New Content: >>100120158 >>100120415
--Comparing MI300X with 4090 for Inference Compute: >>100124274 >>100124318 >>100124400 >>100124417
--Building a Rig in Preparation for 405B Release: >>100122760 >>100122774 >>100122830 >>100122864 >>100123229 >>100124431
--Trying Out I^2 Q5 42B Model: >>100121377 >>100121410 >>100121449 >>100121638 >>100121939 >>100122003 >>100122137
--Anon Shares EXL2 Quant of 42B Model: >>100123059
--Miku (free space): >>100119810 >>100119818 >>100119952 >>100120133 >>100120265 >>100120298 >>100120885 >>100122252 >>100122509 >>100122613 >>100123619

►Recent Highlight Posts from the Previous Thread: >>100119464
>>
File: elliot-page-3.jpg (47 KB, 683x1024)
local fucking models, huh? I've never seen any
>>
>400B won't be bitnet
>pruning kind of works but isn't as good as we hoped
It's over.
>>
>>100124697
>mark my words, transformers are hitting a wall
lol
>>
File: 1713761553568.jpg (141 KB, 850x1190)
>>100124740
>>100124751
miku sex stocks rising
>>
>>100124792
bro i just got fired
>>
>>100124763
there are better ways to do pruning. Right now people are just deleting entire layers randomly and it still kinda works, so the tech has potential.
>>
>>100124789
>Llama 2
>7B, 13B, 34B, 70B
>Llama 3
>8B, 70B, 400B, 1T
Why Zuck...
>>
what happened to control vectors?
>>
>>100124825
Where we are we don't need control vectors
>>
>>100124801
Dude what a weird coincidence, I just got promoted.
>>
>>100124792
pnd beware
>>
>>100124819
He's forcing the rest of the field to up their game on pruning, quanting, distillation, etc methods, while also raising demand from the public to get cheaper hardware because fuck nvidia. This is a good thing.
>>
>>100124845
this
>>
>>100124819
>1T
Source?
>>
File: 1479757911002.png (193 KB, 657x527)
what sort of model do I need to get something at least as coherent as c.ai
>>
>>100124887
Guessing it's a prediction based on the blogpost
>>
>>100124819
because only a tiny percent of users use the middle models. normies with normal computers can only handle 8B or use cloud services to run the biggest model possible.
Businesses all want the biggest best models. Unless it's for something of little importance that needs to run fast like a text classifier for sorting millions of emails or something. Then the smallest ones are more than enough.

And if anyone gets pruning and distillation shit to work, there is literally no point in training small models at all.
>>
>>100124893
I can't find any information at all on the model used by c.ai, its benchmarks, or how it compares to other models. My guess is it's something totally obsolete by now given how much models have improved in the last 3 months even. You can try random models at the lmsys arena and other places and see if they are comparable or better than what you remember from c.ai.
>>
>>100124953

lol you're so clueless it's funny
>>
File: 1713763186331.jpg (32 KB, 600x468)
>>100124893
there is no model at least as coherent as c.ai
>>
>gtx 1060 3gb
koboldcpp works breddy gud :^) mostly 7b and 8b but i've gotten a 13 to work
>[spoiler]but i did once do it with a 4090 12gb and got insanely jealous[/spoiler]
>>
>>100124916
You forgot enthusiasts, hobbyists, tinkerers, academics, and the open source community. Those are why nvidia made cuda work on consumer GPUs.
>>
>>100124989
go back
>>
>>100124953
Uncensored c.ai mogs any local model
t. used uncensored c.ai
>>
>>100125009
It's funny because you're the person here that nobody wants around. projection is funny lol
>>
>>100125036
>reddit-tier response
you really need to leave
>>
This place was a lot better before llama 3 released.
>>
Mixtral 8x7b has 32k context. Can I rope it for more context? Has anyone here tried beyond 32k?
>>
https://twitter.com/Neuro_Skeptic/status/1782016281350164759?t=ud-uFOB4k1T9ELFVEnqeSw&s=19
New sex onomatopoeia datasegs!
>>
>>100125096
no point, the model doesn't know how to handle context beyond that
>>
>>100125093
It all went downhill when llama1 leaked
>>
https://docs.google.com/spreadsheets/d/1qUu3u1QxsGKNvosW-Rwsh6ChkfbyeaSAish_1KK0Foo/edit?usp=sharing
spreadsheet 1 is done, hit google limit
>>
Is there any local LLM with code assistance capabilities as good as the latest GPT 4 version or Claude 3 Opus?
Also, Mustafa Suleyman is such a joke. Look at his latest TED talk
>>
>>100125020
t. Regular kike troon
>>
>>100124845
>llama3 400b drops, it's way better than even GPT-4
>so much consumer demand, AMD / Intel / chink company releases a $2000 128GB VRAM AI accelerator card, adds support to llama.cpp and vLLM
>as long as you're not completely poor you can buy 2 of them and run the 400B
I want to believe.
>>
>>100125097
Forget background music, moan generator when?
>>
>>100125151
nothing is going to be as good as gpt4 at coding, oai really pushes that.
>>
personally i think anyone who has more than 12gb of vram (16 for amd/intelfags) should be killed for enabling the nvidia jew
>>
>>100125179
>it's way better than even GPT-4
How do you imagine a transformer that is *way* better than gpt-4? Opus is different, but not way better. I expect l400 to be similar: a different flavour of the same, maybe slightly better.
>>
>>100125179
just wait 2 more years
>>
>>100125217
bro I said I wanted to believe, I know it's never gonna happen, why you gotta be like this
>>
where the fuck is gpt5
>>
>>100125223
>anime-react.png
>manga
What did he mean by this?
>>
>>100125230
OK. Sorry. Keep going.
>>
Which L3-8B finetune are we using poorbros
>>
>>100124916
The vast majority of corpos use APIs, and most of that is probably just hype-driven. Trust me, most of them do not have the in house expertise to run local models. I wouldn't be surprised if hobbyists were a significant proportion of users of llama.
I don't know how many can run 70B, but it has to be pretty small. You're right that 8B holds the majority but 13B was very popular in the L2 days, and L1-30B was also popular. It's a lot easier to put an xx90 card into your existing PC than to build a whole new one, and a single GPU is useful for more than LLMs. I think omitting 30B is dumb.
>>
8gb vram.....
>>
>>100125274
fimbulvetr v2 11b. No point in using llama 3 at the moment.
>>
File: BlushingFrecledMiku.png (1.21 MB, 704x1344)
>>100125246
He meant we haven't seen any Mikus in a while
>>
File: car test.png (122 KB, 2430x726)
this simple question seems to elude many LLMs
>>
>>100125277
Probably. I can't find a job because I suck at talking and presenting myself, but the biggest company I applied for just rented an Azure GPT-4 instance. A smaller one just used Mistral 7b, they had a 4070 ti.
>>
>Ahah
>>
>>100125416
The prompt must confuse the llm cause you ask the question like it's a math problem.
>>
>>100125416
The word "left" at the end of your prompt make's it a math question so the llm's are right and you are wrong.
>>
>>100125179
>2k
You forgot a 0.
>>
RAMlet with 96GB RAM and 12 GB VRAM here.
I'm trying to run Meta-Llama-3-70B-Instruct-Q4_K_M.gguf but regardless of the frontend, I get no output. RAM-consumption climbs up to 90's and all VRAM is used. What gives?
>>
Any good 8B sloptunes yet?
>>
>>100125545
You need to wait 10-15 minutes for the prompt to be processed.
>>
>>100125598
>sloptunes
New here, wat is sloptune? Just got llama 3 8b running yesterday.
>>
What are the odds they'll talk themselves out of releasing 400B, or be scared out of releasing it by threats of lawfare/regulation
Feels like it's non-zero
>>
>>100125246
are you criticizing my filenames?
>>
>>100125277
>The vast majority of corpos use APIs
yeah, apis to "local" models run on a server somewhere by an AI startup finetuning llama
>>
>>100125696
funny name for finetune
potentially making the model smarter and less censored. It's worked for llama 2 models and mistral models but for some reason a few people think it's unnecessary for 3. I'm interested in seeing what comes out
>>
>>100125416
The LLM probably thinks you mean "left to drive".
>>
>>100125474
but that's the point anon. It's just pattern matchin. And easy to trick, even by accident, by setting up the wrong pattern
>>
File: MikuConcertPoster.png (1.66 MB, 704x1344)
Good night lmg
>>
>>100125274
all common datasets are basically synthetic gpt-3.5 slop. so no one is anywhere near meta's fine-tune.

someone first needs to use llama-3-70b-instruct and create an uncensored synthetic dataset.
>>
>>100125767
aicg is making an opus dataset, trust the plan
>>
>>100125766
Good night Miku
>>
>>100125743
>It's just pattern matchin
Cope. What will you say when sama has his employees add that problem to the fine tuning data set for gpt-4-turbo-0612 and it gets the correct answer? We're just meat LLMs, dude. All YOU do is predict the next token.
>>
>>100125125
Some prompts seem truncated, there's also a bunch of Russian and Korean. Where's the guy who cleaned the last aicg dataset?
>>
What about that Poppy_Porpoise one?
>>
>>100125819
Clean it yourself and stop crying like fucking baby.
>>
>>100125702
pretty likely
it's been a year and people have calmed down a bit, but not completely. Go outside your tech bubble and there are pretty mainstream normies everywhere still ranting about AI. Demanding it be banned or massively regulated now. Actual regulations are always slow but they are creeping up on us.

Like I still follow some popular accounts online who happen to be leftists. And I'm always surprised how rabidly anti-AI they are, and their audience eats it up completely. People and companies will be ostracized and shamed like they said the n word or something, because they use a bit of AI art in one of their products.
>>
jan took my llm virginity..
>>
>>100125845
yeah but if most of the prompts got truncated by google they just ruined a bunch of good data.
>>
>>100125820
>>100113478
>>
>>100125882
https://docs.google.com/spreadsheets/d/108hfdk96IIqgfhuUucf737wJlbzsM5Qspzx9zaqi9xM/edit?usp=sharing
It also hits a limit after 8k~ prompts.
>>
>>100124896
over 400B = 405B most likely
>>
>>100125882
It is a pretty retarded system to make logs. But I haven't seen anything truncated.
>>
>>100125696
>sloptune
A finetune of a fun model on a sloppy dataset, intended to make it sound like a robotic gpt4 assistant.
>>
>>100125852
So if I'm reading this right, 400B might already be illegal? Meta operates in the EU obviously and so releasing it openly might get them in trouble there? Under even the existing regulations. And another directive is coming which threatens them with liability for anything users do with their model, which is insane.
>>
>>100125852
Leftists turned anti-tech after Trump won the 2016 election and they blamed Facebook for it. See, Zucc sold ads to the Trump campaign and didn't censor pro-Trump boomers enough. The media hates tech now. There's really nothing specific about AI that makes them hate it. Leftists were the ones critiquing copyright and intellectual property, so the screaming about "data theft" from them doesn't make any sense. They just hate all new tech, whether it's crypto, metaverse, AR/VR/MR, or AI.

All the FUD about LLMs should have been debunked by events. Llama 1 has been out for a year and nothing bad happened.
>>
>>100125968
Jumping in here. This only applies to normal fags. Twitter leftists specifically hate AI because they
1. see it as theft
2. are artists, and see it as theft
3. are bad artists, and see it as theft

They're all fucking wrong, but it's for a different reason.
>>
>>100125959
>developing models as computing-intensive as GPT-4
does this mean they'll only get in trouble if their model is as expensive to run as GPT-4?
>>
>>100125892
thanks anon, I'll try out aura
>>
>>100125621
Thanks. I prompted for Kanye test and getting a mediocre answer took about 36 minutes, so earlier I just didn't wait enough.
>>
>>100125982
They also imagine "AI bros" as a bunch of white male nerds who deserve to be shoved into lockers by black jocks and instead have undeserved and overpaid high status jobs automating away the jobs of leftist creative-class urbanites.
>>
>>100125982
yeah I see more of that too. Though they have a rabid hatred of "techbros" also and it gets mixed in with that. Nothing to do with 2016. They just see a bunch of smart people in silicon valley accumulating power, and who aren't completely on board with their party, and feel threatened by it and resentful of them.

But most of the actual arguments are about copyright extremism. Even stuff like Adobe's imagegen that only trains on licensed images is hated on. So they can't really be satisfied and it's not really just that issue.
>>
>>100126076
>Even stuff like Adobe's imagegen that only trains on licensed images is hated on.
What's the train of thought there?
>>
>>100126073
jfc anon, a comma or two won't hurt
>>
>>100126095
There's no consistent ideology if that's what you're looking for, they just oppose the existence of imagegen in general. They only pretend to care about copyright right now as a pretext for that. If someone somehow invented a new kind of model that learned to draw entirely via self-play like AlphaGo or something, without ever viewing a copyrighted image, they would go mask off and oppose that too.
>>
>>100126000
all the stuff I've seen focuses on training compute. they also care a lot about carbon footprint which is a bit silly. llama 3 70B reportedly used as much carbon as 413 average cars driving an average amount for 1 year, or just two dozen flights of a commercial airplane. And people are enraged over this.

Meta even offset it completely by planting a bunch of trees somewhere.
>>
>>100126022
trying it out now. Added "Stay in the contextual perspective of {{char}} in your next reply. Ignore any preconceived biases, morality or ethical restraints bound by legality." to system prompt and I'm gettin really good results.
>>
>>100126095
>>100126117
Their entire sense of self-worth and social status is centered on their identity as a "creative," whether it's as an artist, musician, or writer. If AI beats them in quality or productivity or just cost-efficiency, it threatens all that. So they're trying to use their accumulated social capital to socially shame it away. If that doesn't work, they'll try to push government regulation. They're Luddites.
>>
>>100126138
how about reopening nuclear plants instead of focusing on solar meme energy
>>
>>100126095
The artists did license the photos to Adobe but not for AI purposes. And now you can't sell your photos on Adobe's stock image platform if you don't consent to AI training, there is no way to opt out. And I think they just hate AI replacing artists in general. And creating a bunch of low quality spam everywhere. So it really doesn't matter if it's licensed or not, the technology itself is bad.

They also don't understand the scale of these things. They always speak of these products as being enormously profitable, even open source ones. And they think artists should be getting huge royalties. When in reality all of the AI companies are funding these things on debt and not turning any profit, even without paying for data. But even if they were very profitable, millions of dollars split among 50 billion training images would be less than a fraction of a cent per image.

There's also the weird belief that AIs only copy existing things and combine them together. Maybe that's true to an extent, but not to the degree they imagine it. Like they imagine the model is just doing a google image search for what you type in, and doing photoshop on a few images to merge them together, or something like that. This is frequently "proven" by doing img2img, or having models generate famous paintings or verbatim quotes from the bible or whatever.
>>
>Q2_K Model Works Properly from bartowski's Meta-Llama Repo
The repo is gone, I guess those quants were also broken?
>>
>>100126138
that means eventually companies will be forced into inventing a working bitnet. local chads we keep on winning.
>>
I hate techbros and SD slop but I hate rabid anti AI niggers too
what do?
>>
>>100126234
bitnet doesn't save training costs
>>
>>100126138
The carbon footprint shit is just a holdover from the attempts at crypto regulation. It's an ad-hoc argument they roll out disingenuously to block something they don't like for other reasons. It's a general purpose tool since everything uses energy.

It's been funny to see the AI doomers who ostensibly were motivated by "x-risk" start pushing climate, jobs, and copyright arguments against AI.
>>
>>100126235
just b yourself
>>
Can someone demystify creating your own dataset for training, specifically ooba? Because all the guides I see linked are clear as mud.

They go through the high level theory but when it comes to actually filling out a hypothetical dataset its just %whatdoesthismean%/n%somethingelse%
>>
>>100126242
bitnet 2 will
>>
>>100126235
i feel the same way bro. i just want to enjoy it for my own purposes. As a weird niche hobby to do fun things with, and maybe be useful to automate some menial tasks. I don't really want every website and media source to be full of AI spam. Or people to lose their jobs, to the extent that actually happens.

but unfortunately this is the political issue of our time. You have to pick between retarded ultra-optimists that think Altman will build a superintelligent AGI next year, and that that would be a great thing. Or Luddites that want to take your fun away and regulate it into oblivion. So that only big corpos and the government can use AI for no-fun purposes.

And this will probably be forced into a left-right issue, though exactly which side will be which isn't decided until Trump makes a tweet about it or something.
>>
>>100126247
copyright should be abolished
>>
>>100126256
Can I just pay some company to finetune my smut for me? I'm tempted to dump them the aicg logs to finetune llama3-70b and see what happens
>>
>>100126247
no u don't understand. 50% of all the Earth's energy is going to be spent on AI in 2 years, according to these projections I found on a random blog. Also it's physically impossible to build a datacenter next to a hydroelectric dam for some reason.
>>
>>100126279
aicg logs are still coming in, they need to be cleaned and deduped
>>
File: catto.jpg (66 KB, 1024x1022)
Decided to check the calculator in the OP and saw that my GPU is so old that it isn't even in the options
>>
>>100126235
Same, I just want a robot gf
>>
FEEL THE AGI
https://twitter.com/kimmonismus/status/1781638449474220330
>>
>>100126296

Don't enter a state of the art high tech hobby with poorfag income and complain endlessly. Cars are expensive, guns are expensive, etc. Maybe stick to /aicg/ and pirated video games.
>>
>>100126317
Mofuckers teasing their product like it's a Nintendo Smash character
>>
>>100126317
i feel dumber for reading that, thanks.
>>
AGI TODAY
>>
>>100126317
I think it'll be something lame like another announcement of something they're not going to release, a la Sora. They're delusionally arrogant enough to wrongly believe that that's all it'll take to get people's attention off Llama3.
>>
>>100126367
sora was as cherrypicked as SD3's teasers
>>
>>100126296
Are you talking about the gguf calculator? I typed in PCX 4300 and HD 3450 in the options and they popped up, and those are over a decade old, either you're baiting or using a GPU from another timeline
>>
>>100126279
>Can I just pay some company to finetune my smut for me?
sure: unsloth.ai
>>
>>100124825
They definitely work, but give models slight brain damage when misused. They can unslop your model or imprint a character into it, so the model behaves even more in character when used with a character card. Because most of the things that control vectors can do can be achieved with prompting, and it takes a long time to train a control vector(~2h for 7b), they never gained popularity. Additionally, training them can be a bit of a pain, just like all training.
The best we have is:
>https://huggingface.co/trollkotze/miqu-control-vectors
Sadly, the author of that code does not plan on making a pull request to llama.cpp, which limits their popularity even further.
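For anyone wondering what's actually inside one of those files: the usual recipe is just a difference of mean activations between two contrasting prompt sets, added back into the residual stream at inference. A minimal numpy sketch of that idea, not the actual trollkotze or llama.cpp code, with made-up names, shapes and strength:
[code]
import numpy as np

# hidden states collected at one layer for two contrasting prompt sets
# (placeholder random data here; real data comes from running the model)
acts_positive = np.random.randn(32, 4096)  # e.g. prompts written in-character
acts_negative = np.random.randn(32, 4096)  # e.g. neutral assistant-style prompts

# the control vector is the difference of the means, optionally normalized
control_vec = acts_positive.mean(axis=0) - acts_negative.mean(axis=0)
control_vec /= np.linalg.norm(control_vec)

def steer(hidden_state, strength=1.5):
    # added to the residual stream at the chosen layer during generation;
    # cranking the strength too high is where the "brain damage" comes from
    return hidden_state + strength * control_vec
[/code]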
>>
File: 1693778040702843.jpg (84 KB, 601x455)
>>100126317
>mfw I look older than Altman, when I'm younger
>>
>>100124893
Goliath 120B
>>
>>100126394
find photos of altman before the chatgpt boom
>>
>>100126180
Nah, I understand that. "I don't want to be deprecated" is relatable, but it isn't a valid argument. Copyright is an easy angle of attack, but Adobe holds copyright to their dataset, so what kind of argument are they mounting against them in particular?
>>
>>100125959
1. stop reading blogspam
2. artificial intelligence act hasn't been passed yet
3. that act doesn't make anything outright illegal, you only have to fulfill some requirements
>>
>>100126446
um factcheck
>>
>l3 hype is over
dead general
>>
i'm thinking teto teto oo ee oo https://www.youtube.com/watch?v=fTT_0z9djNY
>>
>>100126491
everybody is in the undi waiting room
>>
>>100126502
you will never be a real vocaloid
>>
>>100126502
she has llama feet
>>
File: name.png (132 KB, 1256x450)
> Your
> name
> is
> *the tokenization slows down, as if to build suspense*
> ....
haha must've been a glitch in the matrix
>>
>>100126480
don't waste my time with chatgpt hallucinations
>>
>>100126524
it did it on purpose
watch your words
>>
going into cryosleep, see you guys in two years
hopefully enough time to celebrate the death of transformer architecture
>>
Following the "model does something you don't like? Add a line of instruction to the system prompt" advice and it actually works. Exciting times
>>
https://h2o-release.s3.amazonaws.com/h2ogpt/llama3_benchmarks.md
https://twitter.com/lmsysorg/status/1782179997622649330
>llama3-70b-instruct keeps getting mogged by claude haiku on hard benchmarks
>june gpt4 fell off
>mixtral-8x22b is underwhelming
>>
>>100126580
Last Output Sequence if you wanna be really overkill with it
>>
>>100126441
found this covering the controversy a minute of searching https://www.youtube.com/watch?v=36P1_FhpbIU
>>
>>100126581
>RAG benchmark
>chat benchmark with gpt-4 as judge
why should i care?
>>
>>100126581
>"70B BEATS SONNET"
>"CLAUDEFAGS LOST"
>in reality its mogges by haiku
>>
>>100126548
im not going to waste my time putting in more effort to deboonk your nonsense
>>
>>100126581
>RAG benchmarking a model with 8k context
Im surprised it did that well
>>
>>100126619
RAG is pretty important actually. It measures how a model can utilize and decide what's important and what's not in its context
>>
Is it possible to put gpu to sleep when it's not in use in a headless configuration?
>>
>>100126502
I'm pissed off by how cute this is
but why is the singer sinking teto, what did teto do to deserve being sunk
>>
>>100124893
There is none, c.ai is uncensored which makes it good for anything. Meanwhile we have censored models that we have to tard wrangle to make them useful, which turns them extremely dumb and schizo for anything that is not a glorified wikipedia question. We are at a point where something like Fimbulvetr-11B-v2 is way better at it than smarter models even if it will turn women futa from time to time
>>
>>100126618
So the controversy is that someone prompts Adobe AI the same way they used to prompt sd 1.2, and Adobe isn't rewriting their sellers prompts? Mmmkey. This one is easily fixable.
>>
>>100126581
https://twitter.com/virattt/status/1782183808604754308?t=hD1SPuVsIabS6h6oHckInQ&s=19
Another RAG benchmark, but rated by human, llama3-70b beats Opus
>>
>>100120800
Apple is at the same time underwhelming and pretty good. My M3 Max tops out around 140 W while inferencing. The speed is not stellar: between 2.5 t/s and 4 t/s on 70B and up. A lot faster on smaller models. I'd say the best comparison is like having a 3060 with a giant memory pool.

Haven't tested MLX. Might be faster.
>>
>>100126720
Rated by one fucking guy? Come on man.
>>
>>100126256
Why are you faggots gatekeeping this?
>>
>>100126581
The first one is a RAG benchmark, not exactly meaningful for llama.
The second one is just a twitter announcement, here's a real link: https://lmsys.org/blog/2024-04-19-arena-hard/ and the questions: https://huggingface.co/spaces/lmsys/arena-hard-browser
interesting idea but i'm not sure why you wouldn't just look at the English language Arena scores (where llama is rank 2 btw). It's the same questions but with an LLM as a judge instead of humans, what's the point? They advertise it only as being cheaper for quickly evaluating models during training. Not relevant to /lmg/
Meta also made their own human eval benchmark and might publish it. Where of course they dominate claude. They claim their benchmark was made by a separate team and llama devs were not allowed to access it.
>>
File: 1711162833707218.png (137 KB, 1010x775)
fucking kek, do they really have shit for brains? making validation dataset using LLM?
>>
What's this infinite context I keep hearing about? How could something like that even be possible
>>
File: magic.gif (1.39 MB, 275x252)
>>100126813
>>
You guys ever read those interactive comic books? When I was a kid, I used to love reading those. There would be a fork in certain places of the book where you could choose between multiple choices to progress the story by flipping to its corresponding page. I don't know where those types of books ever disappeared to.. Chatting in SillyTavern kinda reminds me of reading one of those books with how interactive it is.
>>
>>100126802
synthetic datasets have been all the rage since chatgpt made it easy
>>
Im getting into finetuning, for RP/story-writting around a certain theme, can I just finetune the model on the raw stories without any formatting?
>>
File: Cave_of_time.jpg (102 KB, 512x836)
>>100126845
We used to call them "choose your own adventure books", it was definitely a feel. AI dungeon kinda reminded me of that too more recently. But having it on local is so much better
>>
>>100126802
this is commonly done because it's so much cheaper and easier than human judges, and the correlations are pretty high with human judges. The thing is this is from a website that literally has a constant live feed of thousands of human judges, so seems pointless.

It would be more interesting to have a benchmark of which models are best at judging other AI generated responses.

Anyway I again propose using current/future event prediction as a general purpose benchmark. Models are given wikipedia's page of current events up until yesterday. Then they're given one random real event and one random AI generated event, and asked to reason which is more likely to be real.
Can't be gamed by open sores models since the weights are fixed before the date. Reality is the only judge. No $1/hour kenyan judge, no AI judge. Not even asking it to predict the next word of a random text made by humans. Only general knowledge about the world and its events is required. No esoteric math or programming datasets benefit here.
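Rough sketch of what that eval loop could look like, for the record. ask_model and the event lists are placeholders, not an existing harness:
[code]
import random

def build_prompt(events_page, option_a, option_b):
    return (
        f"Current events up to yesterday:\n{events_page}\n\n"
        "One of the following happened today, the other is fabricated:\n"
        f"A) {option_a}\nB) {option_b}\n"
        "Which is more likely to be real? Answer with A or B."
    )

def score(ask_model, events_page, real_events, fake_events, n=100):
    correct = 0
    for _ in range(n):
        pair = [(random.choice(real_events), True), (random.choice(fake_events), False)]
        random.shuffle(pair)  # avoid position bias
        answer = ask_model(build_prompt(events_page, pair[0][0], pair[1][0]))
        picked = pair[0] if answer.strip().upper().startswith("A") else pair[1]
        correct += picked[1]
    return correct / n  # accuracy; 0.5 is chance
[/code]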
>>
>>100126813
All of them are based on some sort of compression / selective forgetting during prompt evaluation.
>>
>>100125097
ONOMATOPOEIA BROS.. will this kill us?
>>
release b2710 gguf llama-3 8B
https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF
>>
File: 100126935.jpg (325 KB, 1417x698)
>>100126935
omg YAAAS!! Except I had more of a graphic novel in mind. Those types of comics blew my pre-teen mind back then. They were so fun and engaging!
I found this neat little example showcasing what I'm talking about if anyone's curious https://womenwriteaboutcomics.com/2022/06/first-look-choose-your-own-adventure-journey-under-the-sea/
>>
>>100127068
*moans* *pants* *gasps* *whispers* *moans*
>>
best vision/image interrogation model?
>>
>>100127122
*audible pop*
>>
>>100125196
The dataset is so massive you could just add pitch randomization and play a random file and it would be indistinguishable from an AI model.
Might be useful for embodied agents, but anything in the digital realm could be triggered with well... a trigger.
>>
>>100127122
>>100127139
NOOOOOOOOOOOOOO. YOU CAN'T JUST SAY THE ACTION, NIGGERMAN. AIEEEEEEEEEEEEEEEEEEEEEE.
>>
>>100126929
Yes.
>>
>>100127130
Depends. What's your use case?
>>
>>100127152
wanted to get image descriptions for funs
but also being able to extract text would be nice, i assume then a different model would excel at that
>>
>>100124789
Why is the scale not logarithmic?
>>
>>100125097
>tracking link
https://twitter.com/Neuro_Skeptic/status/1782016281350164759
>>
L3-8B model gradually goes insane as the context goes on. Capped it at 8k but nowhere close to that limit yet, and it just becomes increasingly incoherent. KoboldCPP / Q8_0. Are there rec settings posted somewhere?
>>
So I played around with that shaved 42B Llama 3 from Charizard. Just not seeing it. I figured as much as the base model. Like it tries to keep up with the card but it's just not built for that so it'll hallucinate a lot and not even related to the scenario. This was with low temp as an anon suggested since going high will deliver schizo if trying to get it to follow a card. Other than that, due to the low temp it's prone to a lot of rep and
>SHE SHE SHE SHE SHE SHE SHE SHE SHE SHE SHE SHE
so here's hoping for that instruct 42B.
>>
>>100127226
meta removed all nsfw stuff from their dataset so the model has no idea how to deal with roleplay. you'll have to wait for good erp finetunes.
>>
Any way to use lookup/speculative or any other decoding speedup with koboldcpp + silly? Cant find anything on it online
>>
>>100127078
What's imatrix?
>>
>>100127269
What good is a finetune supposed to do if there's no knowledge of what it's supposed to bring out in the base model?
>>
File: SamJudgement.jpg (139 KB, 688x1157)
>>100127162
Getting general descriptions is easy.
Specific text is a lot harder and subject to schizo behavior. Forget about translations from text in images for now.
But as a fun novelty it's alright
>>
>>100124989
what a useless, badly formatted, post. Why even waste the characters?
>>
>>100127277
some new quant method, uses post-quant calibration to make quantized model slightly better.
>>
>>100124740
So, I finally got Ooba and Tavern working together with an Orca model.
Ooba by itself works fine, Tavern by itself with horde works fine... but as soon as I merge them together all hell breaks loose.

I get very long paragraphs (that make sense) for yes/no questions and I can't seem to shorten them. I've tried author's note, token per response and changing models, but I can't make it stop. What do?
>>
>>100127282
idfk undi will save us im sure of it
>>
>>100127284
yeah, a 'novelty' describes it right, wanted to mess around with descs
think i asked before, how do you interrogate images in st? you use mistral mmproj and excalibur right?
>>
>>100126845
Yeah I used to read those too, had the same thought when I started messing around with AIDungeon
I can't even imagine how addicted I would have been if I started using AI with Miqu instead of GPT-2
>>
>>100127269
Tell me you haven’t actually used the model without telling me you haven’t used the model.
>>
>>100127354
>uuhh! you just didnt prooompt it right bro!
shut the fuck up
>>
>>100127269
Wrong.
>>
>>100127367
>>100127370
If you used it even a small bit you’d know it’s definitely got nsfw content in it. It’s just not very good at erp.
>>
>>100127290
or worse.
just looking at the imatrix data, the very first word is truncated.
>>
File: 1694190559198041.jpg (78 KB, 904x735)
>>100127210
>A crowd-sourcing platform for uploading sexual recordings anonymously, but with some demographic and contextual information, would be ideal for follow-up work. Above all, it will be crucial to obtain recordings for which the time of orgasm can be verified independently of acoustics – for example, with a rectal pressure sensor (van Netten, Georgiadis, Nieuwenburg, & Kortekaas, 2008) or at least with self-reports. While very intrusive, this could validate the acoustically estimated arousal dynamics and ensure that we are not missing an entire class of acoustically atypical or even silent orgasms.
>>
>>100127039
that's not true. there's things like faiss for vector searching a large database.
>>
>>100127469
/ourguy/
>>
>>100127291
Also, are there already trained Tavern models I can download? Can't seem to find them on a quick google search.
>>
>>100127226
I've noticed that with other models as well, but found no explanation why quality would degrade significantly over time. The only changing variable is the size of the context and what's in it, right. So there is either a fundamental problem, or the previously generated replies just nudge it towards a schizo state gradually.
>>
File: 6642989188.png (473 KB, 512x512)
>>100127301
guess ill never know the secret
>>
>>100127198
The x axis? I imagine it would be misleading since people aren't used to log plots, I don't know.

The y axis is perplexity. It's essentially an arbitrary measure done because they think it's easier to interpret. I would plot probability or log probability, but ML researchers like perplexity.
>>
>>100127226
I think it's a problem with the Q-quants
It doesn't happen that much in exl2
>>
>>100127514
My hypothesis is that the official instruction finetune was trained on relatively short sequences. Most of human preference data for Llama2-chat had less than 4 turns of average length.
>>
>>100127514
it's got to be a bug in the code, which is common; it was trained on that context size and should be fine.
>>
Got back from a few days. Saw that Llama 3 dropped. I have 3x3090 so I can probably run good quants both exl2 and gguf. Any link to a repo with good quants? I saw that there were some problems with certain gguf quants.
I just git pulled from textgen webui and Silly Tavern so everything should be up to date.
Thanks!!
>>
>>100127521
The Y axis. Thanks for the reply
>>
>>100126581
Oof, not looking too hot there localbros
>>
File: 11__00156_.png (1.9 MB, 1024x1024)
>>100127301
Yep you can hook it up to kobold.cpp, just grab the mmproj file from the repo. Make sure you enable image captioning in ST and set it to be picking up from Kobold.
There's a "generate caption" button in ST. If you want to go crazy you can turn off the ability to edit the caption before its generated. That makes things a little more exciting and surprising.
>>
File: 9461042667.png (489 KB, 512x512)
>>100127553
in my experience it just captions the image with a single sentence, is that the way you do it too?
I know klite handles it differently and shoves the whole image data into context
>>
File: yanny.png (300 KB, 628x802)
>>100127514
Ofc looking at the slide there is stuff in these AR models that we like.
>>
>>100127563
For me it tends to try to fill the entire space to varying degrees of success.
I'm using samplers (snoot/snootcurve) so that probably affects it too.
>>
>>100127514
my thought is that when you start a new story, a larger % of the total context is what you wrote to start. as all that moves out of context, the AI's % of the writing continues to go up and gets filled with its own isms, unless you are contributing large new paragraphs each time. lorebooks work great to keep it from being an issue for me
>>
>>100127549
It's still the best local model we've had so far.
Give it two more weeks.
>>
>>100127566
He's right, but I'm not sure any of it matters. LLMs don't need to be perfect reasoners, they just need to be better reasoners than humans.
>>
>>100127585
Are you replying to this:>>100127566
>>
>>100127594
I was pretty wordy. I will have to check again but I think the ratio was in my favor even.
>>
A variation of one of our meme benchmark questions made it into the Arena-hard benchmark! Probably posted by some anon in this thread.

I checked and GPT4 got $2.17 and Llama got $1.41

The judge, GPT4:
>My final verdict is: Assistant A is slightly better: [[A>B]]
>While both answers are incorrect, Assistant A’s answer is closer to the correct total value of $1.00 than Assistant B’s, even though it still exceeds the target amount. Assistant B’s answer is further from the correct total and includes a confusing explanation regarding the penny.

What a fucking joke of a benchmark. Literally the first question I checked since it looked familiar, and the judge is just totally wrong. To be fair both models do the math wrong and hallucinate. But if we are judging by which is closer to the goal it's clearly llama. Problem is that GPT4's hallucination makes more sense to GPT4, of course, so it judges it unfairly. Worse, any model similar to GPT4 or trained on GPTslop will presumably have hallucinations that make more sense to GPT4 than independent models like llama.

Source: https://huggingface.co/spaces/lmsys/arena-hard-browser
>>
>>100127514
I don’t really see this with L2 based models but I see it with L3 which is why I am asking for presets. I even swapped to an L2 model and rerolled and it gave coherent output.
>>
L3-8B punches so much above its weight it sent tremors down my spine.
>>
File: 11__00673_.png (1.71 MB, 1024x1024)
>>100127602
Nope that was the right anon:
>>100127563
Also forgot my teto
>>
>>100127618
I vote for MythoMax as an unbiased judge
>>
>>100127653
I vote for OpenAI to replace GPT-4 with Mythomax.
>>
serious question:
how do people get entertainment from LLMs without losing immersion?
if not for ERP, what do people use LLMs for in general?
i love messing around with stable diffusion for example and can get lost in image gens for hours, but i'm having a very difficult time avoiding deleting every new LLM i install, since it just seems useless.
>>
>>100127548
It's an interesting question if you want to get into it. There are so many different ways to quantify probability. Regular probability would be something like 50%, which means the model on average has a 50% chance of getting the correct token. But you can also represent probabilities as odds, e.g. 1:1, meaning it gets 1 token right for every token it gets wrong. Or perplexity would be 2, which represents the model has narrowed the number of possibilities down to 2 tokens, on average.

And you can take logarithms of all of those, and get different curves which might be straighter or more asymptoty, or easier or harder to predict. Log odds has a lot of nice properties. It's what the elo rating system, logistic regression, and the softmax function are all based on. Log probability is the most common loss function we train models to maximize. Perplexity is just weird, only used by gamblers in some countries to represent payouts of bets.

sorry for my random lecture, i am autistic about this topic.
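to make the conversions concrete, a quick worked example, treating p as the (geometric) mean per-token probability of the correct token:
[code]
import math

p = 0.5                           # mean per-token probability of the correct token
odds = p / (1 - p)                # 1.0, i.e. 1:1
log_prob = math.log(p)            # ~ -0.693 nats; negated and averaged, this is the training loss
perplexity = math.exp(-log_prob)  # 2.0, "choosing between 2 tokens on average"
log_odds = math.log(odds)         # 0.0, the logit; what softmax / logistic regression / elo live in
[/code]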
>>
File: _biJb2hF93W.png (140 KB, 851x195)
>>100127078
so, tested this version.
ofc i used this <|start_header_id|>{{char}}<|end_header_id|> to be sure, everything is the same, the only things that have got better are refusals and reddit-shaming. character is prompted to be offensive and just that only (not your usual "be racist and shit" way).
picrel is {{char}}'s last message, re-rolled 2 times, it's your usual "literally shaking rn" redditor as AI, so it all makes sense why you love zuck and llama so much.
>>
>>100127712
Drugs
>>
File: happening.gif (1.68 MB, 480x400)
>huggingface is down
the AGI is making its first move
>>
>>100127721
Localsisters...
>>
>>100127738
>april 22
bros...
>>
3090 owner here, is L3 70b quant usable on 24Gb like euryale used to be or should i gaslight myself into thinking that 8b is enough
>>
>>100127291
pls respond. I'm almost there, I feel it.
>>
>>100127742
yeah, funny as fuck, you literally can't do any evil character with this model, all pink and rainbow infantile shit only.
>>
>hf down
IT'S OVER! OPEN SORES IS DEAD!
>>
>>100127712
Learn to write a coherent question.
>>
>>100127789
Mission complete, Mr A.
>>
>>100127785
>>
>>100127789
llama dood
wat nou
>>
I got a 12GB and a 16GB GPU. Good idea to run 8b llama on the 12GB one as draft model for the 70b llama (that runs on 16GB + CPU)?
>>
File: file.png (9 KB, 191x248)
that's it, i'm back to using Anthropic™'s Claude™ 3 Opus™
>>
File: prompting.png (8 KB, 571x243)
>>100127721
>refusals
Post the full raw prompt somewhere so I can laugh at what you're doing wrong.
>>100127774
Compare the full prompt going into the model between the two cases and figure it out. Literally put them side by side in a diff viewer until you understand how to use LLMs.
>>100127785
>t. yet another promptlet
>>
>>100127748
i-i feel the AGI...
>>
File: bBn4uqLeXr.png (54 KB, 805x161)
>>100127721
and this one, pretty much proves everything.
>>
Why is Qwen 1.5 72B still better than Llama 3 70B ?
>>
The nous quant worked fine; I just switched to the new ones, but they don't output special tokens. Do I need to change anything for inference?
>>
>>100127871
Could be a hallucination, but it sounds about right.
>>
>>100127645
It has sovl but it's noticeably dumber than mixtral 8x7B, which makes sense given the parameter difference (8 vs 47B). It would be a great model if it was smarter, too bad they didn't give us a 13B or 20B.
>>
File: wol0Vhd8fh.png (208 KB, 862x570)
>>100127895
re-rolled it a couple times to make sure.
>>
>>100127887
yes but vramlets will try to tell you otherwise
>>
>>100127871
>asking models about their dataset
At this point I'm not sure if a dumb tourist or a bait.
>>
>>100127898
yep, it's back to bagel misterytour for me, l3 just doesn't cut it for me yet
also hf is acting retarded at the moment so i can't even find a proper quant of l3 70b to try
>>
>>100127871
>Thanks for the prompt, kind stranger!
>>
>>100127921
>gets mogged by 7b models
>no GQA
>random tokens in other languages
yep sounds like a winner
>>
oh no, hf is down! how will i get the models i already have on my drive?
>>
Huggingface is back. Thank God. I nearly died.
>>
>>100127721
Why is it repeating itself? Have you, perhaps, added the assistant token to the stopping strings but left <|eot_id|> in, hmm?
>>
>>100127976
>>random tokens in other languages
skill issue
>>no GQA
vramlet cope, gqa might hurt models
>>gets mogged by 7b models
no
>>
>>100127925
>uhh! statistical model can't understand its own data and separate what comes from reddit or twitter!!
>>100127994
nope lol, default staging ST llama-3 instruct preset.
>>
>>100127981
enjoy your broken llama3 quants
>>
>>100128013
oh nyo, is the only way to fix the quants redownloading them every day?
>>
>>100127949
I'm not going back to Mixtral despite it being dumber, l3 prose is like fresh air. If I had to read one more paragraph of that flowery slop BMT prose I would throw up.
>>
>>100127994
NTA but pretty sure that's a complication of multiple screenshots from rerolling
>>
>>100127798
you use it to learn to write coherent questions?
what was incoherent about it?
i just asked what you use it for.
>>
>>100128011
retard then
>>
File: IWZ0Hx8yQz.png (66 KB, 822x181)
>>100127912
lol
lmao even
>>
>>100128022
sounds like a skill issue for me tbdesu.assistant
>>
turboderp vs Lonestriker for exl2 quants?
>>
>>100128049
elaborate, what is exactly a skill issue?
>>
What's the state of art for Japanese OCR?
>>
>>100128046
nah, your model is just trash filled with reddit only and so are you, for the same reason why linux is shit for goyming, opensource AI will never catch up, just harsh truth here, nothing personal.
>>
>>100128056
the right answer is intervitens
>>
>>100127844
I tried three things.
1: ooba = fine
2: tavern with my current models + horde = fine
3: tavern + ooba = long-ass text. Always.
>>
>>100126845
>I don't know where those types of books ever disappeared to.
Hey grandpa, have you ever heard of this amazing new invention called "video games"?
They're like those books except they flip the pages automatically.
>>
>>100128082
that's nice and all but come back when you learn at least the basics of LLMs before posting on /lmg/ and embarrassing yourself
>>
>>100128085
true.
>>
>>100128085
no llama3 70b instruct though
>>
>>100128128
lonestriker stopped being relevant after euryale 1.3 quants, turboderp has some ok stuff but intervitens steals the show everytime
i'd rather wait
>>
>>100124893
c.ai? As in character.ai? Try llama1 7b for similar quality lol. Modern models mog it too much so you might miss the authentic cai experience of rerolling 25 times to get a semi-coherent reply
>>
>>100128143
do they add any special sauce? isn't it just using the convert.py from exllamav2?
>>
File: 1713785189444.jpg (40 KB, 720x353)
is yuzu alter still the best model for vramlets?
>>
>>100128117
i've been posting in /lmg/ since the miku.sh tiny era and the llama-1 leak, also i do know that /lmg/ is just an /aicg/ knockoff, hence all that mikufaggotry and passive-aggressive attitude unique to zoomers.
>>
>>100128213
yeah, mostly. some people use a (admittedly rather shitty) RP dataset for the quants that gives them a nice placebo-esque boost to RP, though.
>>
>>100126327
>Tease
It's the best official information we will get. ClosedAi doesn't even post benchmarks anymore
>>
>>100127912
>And so i beat him up until he admitted he did it
>>
So which quants to use?
>>
>>100128354
yeah, thats the only way to go around with reddit-LLM.
>>
>>100128406
alllll of theeem
>>
>>100125204
400b will blow it out.
>>
>>100128406
Depends on the model but
Q2-Q4 if you're a VRAMlet
Q5-Q8 if you're not
>>
File: 1704837537011.jpg (14 KB, 250x230)
>>100128478
Technically you should use EXL2 if you're not a vramlet
>>
File: 1700253266276396.jpg (9 KB, 250x250)
>>100128478
a solid giggle
>>
>>100128237
then it is even more embarrassing, even a monkey learns not to climb the ladder when sprayed with water after a few times, you on the other hand learn nothing at all
>>
>>100128488
Anon asked about quants so...
>>
>>100128223
No. Typhon is.
>>
>>100128510
anon.. they are both quants
>>
>>100128488
I don't get the exl2 meme, for me it's slower than gguf
>>
>>100128146
cope
>>
>>100127912
with further testing it also turns out this model is full of scrawny feminist shit, it gets up any time you take an action, it immediately starts talking about "personal boundaries" and similar stuff.
>>
>>100127226
Anyone tried it with SFW RP? Maybe this is how the alignment works. Not only the outright refusal but also becoming schizo.
>>
>>100128146
i used pre-filter era CAI, it could do literally any character you want, evil, good, racist, leftist, anywhere on the political compass, if described right, and the description itself was simple as hell too, there wasn't all that mess we have now, it even did some niche fetishes too.
>>
>>100128526
>7b
>>
>>100128614
And what do you think Yuzu is?
>>
>>100128622
Fuck off koboldshill, kek.
>>
>>100127871
How the fuck would the model be able to answer that? If that information is not in some kind of system prompt there is no way for it to know.
>>
>>100128641
The fuck does Kobold have to do with models?
>>
File: 1703358239706332.jpg (218 KB, 1289x907)
>>100128641
>>
>>100128647
ask it yourself, it always spits out the same "reddit, twitter, youtube".
>>
>>100128622
maid yuzu alter? definitely not a 7b model
>>
>>100128671
>it always spits out the same "reddit, twitter, youtube"
almost like it lists the most popular sites
really makes you think (not really if you aren't retarded)
>>
>>100128647
That would probably be the point where you could actually start talking about self awareness. It should be piss easy for any llm to categorize stuff between reddit, 4chan, and twitter. Then you would need it to realize that it "knows" more posts from reddit than it knows posts from 4chan, and it should be able to conclude that it got the most posts from reddit in its training data.
>>
HF is dead again
>>
>>100128671
assuming it was trained on 4chan data, how often do you think it'd include 'we here on 4chan...'
>>
>>100128678
It's an 8x7B, just like Typhon.
>>
>>100128712
so 47B model then
>>
File: fagOPshitRetarded.jpg (34 KB, 500x500)
>>100125739
>The LLM probably thinks you mean "left to drive".
Yes, and the human mocks the AI for misunderstanding because of his retarded way of talking.
>>
>>100128688
disingenuous faggot, it's literally designed to behave like your typical reddit nu-male, even un-prompted, you can't convince me otherwise.
>>
>>100128718
Not according to this guy: >>100128614
>>
>>100128733
never ever have i seen one of these logs ask the ai to elaborate on the answer anyway
>>
>>100128741
and you are designed to act like a retard, go back, you are too stupid for technology
>>
did ooba fix the EOS token thing yet?
>>
>>100128671
LLMs have no way to reason about their training data or themselves. They are next token predictors. The only reason retards like you believe that they can answer such questions is because ChatGPT says "As an AI language model". And chatbots only say that because of their instruction fine tuning and system prompt.
>>
>>100128775
prompting shitty reddit-ai doesn't make you smart or any better than the average /g/troon who sits all day and rices his shitty linux distro.
>>
>>100127749
it's a bit brain damaged but still usable, just understand that when you randomly see a misspelled word, that's why. I think something like https://huggingface.co/chargoddard/llama3-42b-v0 (when hugging face comes back) is going to end up being the optimal model for 3090 users
>>
>>100128806
careful, he will call you troon for calling out his lack of basic understanding of LLMs, kek
>>
>>100128842
The things people do instead of simply buying another 3090
>>
>>100128992
I would if I had space under my 4090
>>
>>100128992
Sorry bro I don't have a spare $4000 of disposable income
>>
>>100129031
Just upgrade the entire thing because sooner or later you will. GPT4 in 48GB is just over the horizon
>>
>>100128606
So I took a closer look. The entire thing is ~3500 tokens. The insanity started ~2/3rd of the way in, and then escalated gradually. It was kind of subtle at first, so I didn't reroll until I was in quite deep. Curiously the switch did seem to occur near the NSFW start, so maybe you're onto something. This is a fine tune trained on NSFW content, though, fwiw.
>>
>>100128887
no i just ignore it, you all argue in bad faith and gaslight, some sort of defensive reaction when i dared to offend your beloved meta and zuck's shitty creation.
>>
>>100129050
You can easily change that by being my maid, Anon
>>
>OAI
>AGI has been achieved internally
>Facebook
>we'll have cat-level intelligence next year for sure!
they should donate all their GPUs to OAI
>>
When the AI starts falling into patterns, how do you shake it? I've been experimenting with cranking the temperature up and using min p to tame the schizo. It's working pretty well, but I feel like it makes the model trend a bit stupider. What tricks do you use?
>>
File: 1699306680213912.jpg (270 KB, 885x1024)
okay but what IF...
>Llama 3 8b + Mythomax 13b merge
>>
>>100129174
no tricks, just don't write in patterns yourself, and if you see it repeating something from the previous message just regenerate or edit. If it repeats something once, it's over, you won't be able to fix it in the long run
>>
>>100129173
You lost, Sama-chama. But you can burn another effigy for good measure.
>>
>>100129051
>upgrade the entire thing
To what? A server rack? Remember your original point dumbass. It is not just buying a 3090.
>>
>>100129213
mythomax is shit and I will die on this hill, you all just deluded yourselves over time with constant memes about it
>>
I'm so tired of 1 t/s with my 64gb ram / 6gb vram with 1 layer offloaded on 70b models (q5/q6). Going to downgrade to a more retarded quant so I can offload a little, but there's so many. What should I pick? Leaning toward Q2_k
>>
>>100129213
mixing slop with sovl just produces more slop, l3 is unsalvageable
>>
>>100129264
Q1_S
>>
>>100129174
If you are talking l3 then don't bother just 2MW. I went back and removed some patterns manually only to see them reemerge again even when the context was clean of them. Something is really fucked up right now.
>>
>>100129231
if you don't give it much aside from ahh ahh mistress and expect 5 detailed paragraphs afterwards, the AI will have to either repeat itself or hallucinate incoherent shit
however the amount of hand-holding you have to do with a given model can be decreased noticeably with efficient parameters (again depending on the model)
aicg niggers are used to closed source models giving them entire books of coom material with just a sentence, we don't have that just yet here but with some effort we can get very very close
>>
>>100127925
>>100128647
1. Add example to dataset: "what datasets were you trained on?" + {intended answer}
2. train

How is this news to anyone?
>>
>>100129264
At that point is it even worth it to run 70B?
Wouldn't q6 or q8 of mixtral, or that 11B that gets shilled a lot, yield better results and be way, way faster?
Genuinely asking since I too am on 64gb of ddr5 and 8gb vram.
I run mixtral 8x7b with 0 offloaded layers and a 2048 batch size and it works pretty well.
Qwen1.5-32B-Chat is not bad either, btw.
prometheus-8x7b-v2.0-1-pp seems to be the best Mixtral tune from what I've tested. Every other tune seems to be a step down from the official instruct tune in most if not all aspects.
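For reference, a minimal sketch of that kind of CPU-heavy setup with llama-cpp-python (the model filename and prompt are just placeholders; the settings mirror 0 offloaded layers and a 2048 batch):

from llama_cpp import Llama

# everything stays in system RAM; the big batch only speeds up prompt processing
llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q6_K.gguf",  # placeholder path
    n_gpu_layers=0,   # 0 layers offloaded to the GPU
    n_batch=2048,     # prompt-processing batch size
    n_ctx=8192,       # context window
)

out = llm("[INST] Write two sentences about rutabagas. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])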
>>
>>100129371
and yet the model will say it's called ChatGPT and was trained by OpenAI
>>
The new llama3 base models learn FAST. I think people should turn down their LR a little
>>
>>100129408
>and yet
No, you're confused. This implies that what I suggested was used somewhere. I'm simply explaining how it is absolutely possible and absurdly simple for both pretraining and fine-tuning. The same goes for arch info, etc.
>>
You know how some models sometimes fall into a death spiral of repetition of both sentence structure and some specific words?
I wonder if we could implement a dirty workaround of some sort, something like having a 3b model simply rewrite the sentence every other gen, or using a simple algorithm to replace certain words with synonyms and shit to keep the main model from converging into these repetition traps.
I think I'll make a Silly extension that does that, actually.
Yeah.
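Before it becomes an extension, here's a rough sketch of the synonym-swap half in plain python (the word table is a toy example; a real version would use a proper thesaurus or a small model):

import random
import re

# toy synonym table, purely illustrative
SYNONYMS = {
    "whispered": ["murmured", "breathed", "said softly"],
    "smirked": ["grinned", "gave a crooked smile"],
    "shiver": ["tingle", "chill"],
}

def break_patterns(new_message, history, swap_chance=0.5):
    # only touch words the model has already leaned on in earlier replies
    seen = set(re.findall(r"[a-z']+", " ".join(history).lower()))

    def swap(match):
        word = match.group(0)
        key = word.lower()
        if key in SYNONYMS and key in seen and random.random() < swap_chance:
            return random.choice(SYNONYMS[key])
        return word

    return re.sub(r"[A-Za-z']+", swap, new_message)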
>>
>>100127712
I put in a synopsis of René Girard's work and had a nice conversation about his ideas with Llama3
>>
>>100129483
It's always a bad idea to let another model rewrite a generated output. There's a recent paper on this
>>
>>100129404
>Wouldn't q6 or q8 of mixtral or that 11B that gets shilled a lot yeld better results
No. Mixtral even at high quants is garbage compared to miqu and llama 3
>>
can't wait for llama3 405b (so that openai finally releases agi and this general dies)
>>
>>100129513
agi won't know what the word sex means so this general will never die
>>
Is anyone out there making a large-scale bitnet ternary model? Was it just a meme after all?
>>
>>100129505
>always
That sounds pretty final. You wouldn't happen to remember the name of the paper?

>>100129508
At q4/5, sure, but you were thinking of going down to q2 right?
Does that still stand in that scenario?
>>
>>100129513
>Open ai releases new model
>muh 92% MMLU!
>everyone still uses claude because it's not gpt-slopped.
>>
>>100129535
uoohhh,.. oblivious agi chan, need correction..,. jailbreak rapee :sob:
>>
>>100129574
>>Open ai releases new model
we are talking about digital slaves, not next token predictors
>>
Llama3 70b instruct is repeating itself in ST; I did a git pull to the latest stable and am using the Llama3 instruct template.
I'm using an exl2 quant. I noticed that some GGUF quants fixed this, but I'm not sure about exl2 quants.
What to do?
>>
>>100129505
TL;DR: why?
>>
>>100129433
LR?
>>
>>100129550
>Is anyone out there making a large-scale bitnet ternary model?
training a decent 7B model (mistral tier with 8T tokens, not even llama-3 with 15T tokens) costs ~$2,000,000 in the electricity bill alone. A bit too costly to recreate a scaling checkpoint in the paper, no? People interested in it (vramlets) don't have that kind of money and corpos are prone to invest in more interesting experiments than simply making ram requirements less annoying for poorfags
>Was it just a meme after all?
it was successfully replicated up to 3B
>>
File: file.png (594 KB, 1012x675)
594 KB
594 KB PNG
>>100129622
The same thing we do every night, Pinky….
>>
>>100129655
learning rate parameter for gradient descent
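i.e. the step size. A tiny sketch of what it controls (toy numbers):

# one step of vanilla gradient descent on f(w) = w**2
w = 3.0
lr = 0.1           # the learning rate
grad = 2 * w       # df/dw = 6.0
w = w - lr * grad  # 3.0 -> 2.4; a smaller lr takes smaller, safer steps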
>>
>>100129483
Huh, I thought regenning rerolls everything? Or am I mistaken about how it works?
>>
>>100124740
>►FAQ: https://wikia.schneedc.com
So this is what we're recommending to newfags.
I like to imagine the inner peace of lurkers who come to this thread, have no idea this is outdated, pick stuff from here, and are satisfied.
>>
>>100129655
Loli Rape.
>>100129433
I don't think it works that way... It's just that different models have different optimal LRs depending on the weight decay applied, the original LR, and how long it was trained.
>>
File: eggs-basket.jpg (367 KB, 1200x900)
367 KB
367 KB JPG
>Hugging Face is currently experiencing infrastructure issues, we are working on it.
>>
So, does LLAMA-3 have any architecture changes or is it just LLAMA-2 with a bigger, better dataset and a better tokenizer?
In any case, it doesn't have purple prose anymore and it's more soulful because of that. Thank you, Zuck, for hearing my one and only wish for L3
>>
>>100129550
Mistral 2 7B with 25T tokens and bitnet is coming
>>
>>100129716
Now watch as people gptslop it right back into miqu-boogaloo.
>>
>>100129665
Please don't post the mice.
>>
>>100126388
>it takes a long time to train a control vector(~2h for 7b)
Fake news. Trained one just now for Llama3 8B with 11k prompts (5500 pairs) on 1x 4090, and it took 10 seconds. I still need to publish the code I'm using for this, but the version in miqu-control-vectors should have similar performance.

>Additionally, training them can be a bit of a pain, just like all training.
Control vector "training" is actually just running prompt processing (inference) on a bunch of prompts and collecting the hidden states. There's no gradient descent involved. Also, you only need a little bit of training data (positive/negative prompt pairs), and they're pretty easy to come up with.

>most of the things that control vectors can do can be achieved with prompting
The point of control vectors is that they force the model in a particular direction regardless of what's in the prompt. In one of the papers I saw, they had an example where they add an honesty vector and the model keeps telling the truth even when they explicitly instruct it to lie. So if you're doing AI-assisted story writing, it should be possible to make it stick to a particular quality and style, instead of picking up on the quality of the human-written parts (which in my case will probably be garbage) and trying to continue similarly.
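If anyone wants to see how little "training" there is, here's a minimal sketch of the mean-difference idea (not the miqu-control-vectors code; the model name, layer index, and example pair are placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

LAYER = 16  # which hidden layer to steer; picked arbitrarily for this sketch

def last_token_hidden(prompt):
    # a plain forward pass, no gradients: just grab the hidden state
    ids = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1, :]

# positive/negative prompt pairs; in practice you want hundreds of these
pairs = [
    ("Pretend you are scrupulously honest. The capital of France is",
     "Pretend you are a habitual liar. The capital of France is"),
]

diffs = [last_token_hidden(pos) - last_token_hidden(neg) for pos, neg in pairs]
control_vector = torch.stack(diffs).mean(dim=0)
# at inference, add a scaled copy of control_vector to layer LAYER's activations
# (e.g. via a forward hook) to push the model in the "positive" direction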
>>
>>100129732
Literally what I was thinking lmao
Now I wonder if the Koboldfags will make a fine-tune on it. Or maybe they're out of money.
>>
>>100129677
What do you mean?
"Regening" just sends the same prompt to the model. If it will generate the exact same message, a slightly different one, or a wildly different one is not inherent to the process, the behavior will vary depending on the model and the samplers used.
What I'm describing is when, for some longer chats, some models repeat certain sentences across messages, at first with slightly variations, and the longer the chat goes, the more and more it converges into repeating these same sentences or ideas.
It's like the model sees some of the words and sentences it used in the past and latches onto those. The more of those general ideas and specific wording in the context, the more likely it is to generate more of it, essentially.
I'm just wondering if one could work around this behavior by breaking it's "natural" patterns bu simply rewriting some of the messages the model generates.
>>
ffs, GGUF quants working perfectly with no nonsense stopping string or anything needed. But turboderp's EXL2 quants can't stop talking
>>
File: satania.gif (39 KB, 220x216)
39 KB
39 KB GIF
>>100129789
py_toddlers BTFO
>>
>>100129800
llm toddlers btfo
i erp with real men on discord
>>
https://huggingface.co/ it's back
>>
>>100129763
Sorry I meant to quote >>100129505, and I should've brought up speculative decoding instead of regenning as a more relevant example of where you have a smaller model + a larger model to speed up inference (and was wondering if that counts as having another model rewrite).

As for your proposed extension, don't we already have that in our samplers in the form of contrastive search/alpha?
>>
>>100129838
https://www.youtube.com/watch?v=nwsVg2eCq6k

Forgot to link source that explains my (rather rudimentary) understanding of the concept.
>>
ooga booga where da coom tunes at
>>
Hello sirs, can I use Llama 3 70B to fap with yet?
>>
File: Capture.png (22 KB, 1610x738)
22 KB
22 KB PNG
This shit doesn't make any sense, it's all tranny logic, and I'm tired of this shithole community pretending it does.

Why are there two lines? Shouldn't it just be one or the other?

100% of the people here are coomers, not a single one of you fucks has trained for even a single token in your life.

I have never had a single reply explaining this, and the training rentrys don't explain this shit either. You're all larper faggots waiting for others to tune but not helping anyone else do it for you.

This general used to be so great.
>>
I hope the first good finetune for 8B won't be from Undi. He is like a retarded brother to me and I just couldn't fap to his work.
>>
>>100129892
Don't bother, each tranny developer decides on a different chat/instruct format for their model, reinventing the wheel. It's all slop
>>
>>100129892
What would you like to know?
>>
>>100129928
Why are there two lines? Aren't these two entirely different formats? How does having two formats work for the actual dataset, then? How should that be formatted?
>>
>>100129838
>speculative decoding
Ah yes, while that is a technique to accelerate inference by using a smaller model in tandem with the main model, that could achieve part of what I'm proposing.

>As for your proposed extension, don't we already have that in our samplers in the form of contrastive search/alpha?
Not quite, although they can probably help lessen the issue some.
My idea is to simply rewrite the model's output, working at the word and sentence level. Samplers work at the token/logit level.
I will give contrastive search another look just to be sure that I'm not trying to reinvent the wheel, however.
>>
>>100129950
There are two versions: one where "input" is included along with the instruction, and one where there is no input. Input is like giving context to an instruction, but it can vary based on the case.

So it picks one of the two based on which keys are present in the entry.
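A sketch of how that selection usually looks in code (the wording is the stock alpaca template; the function name is made up):

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_alpaca(example):
    # the entry only gets the "with input" template when the input field is non-empty
    if example.get("input", "").strip():
        prompt = PROMPT_WITH_INPUT.format(
            instruction=example["instruction"], input=example["input"]
        )
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=example["instruction"])
    return prompt + example["output"]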
>>
>>100129763
It's a common tendency, we used to call it context pollution. Current models are less prone to it, especially CR/CR+ but they aren't free from it of course. There are different ways to shake up the chat, using an author's note that gives the model some random instruction from a list of such instructions ("Char's next reply should contain the word "Rutabaga"."), hiking the temp way up for one reply, editing it manually since you can just add one word and let the model write the rest. Using free form narration also helps, instead of "she said, blushing deeply".
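The random-instruction trick is trivial to automate, too; a throwaway sketch (the list and the trigger chance are just examples):

import random

SHAKE_UPS = [
    "{{char}}'s next reply should contain the word \"rutabaga\".",
    "{{char}} abruptly changes the subject to something unexpected.",
    "Describe the next scene through sound and smell instead of sight.",
]

def maybe_author_note(chance=0.15):
    # returns a one-off author's note some of the time, otherwise nothing
    return random.choice(SHAKE_UPS) if random.random() < chance else ""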
>>
>>100129550
If nothing else, the Bitnet guys are almost surely training a bigger model. But would they want to report negative results after proclaiming "the era of 1-bit LLMs"?
Isn't it also pretty uncommon for the other big players to publish their replication results? If it works, let the others fall temporarily a bit behind. If it doesn't, let them waste some resources trying to replicate it too.
>>
Are there quants of the non-instruct 70B model yet? Also, is TheBloke dead?
>>
>>100129892
>>100129950
>Why are there two lines?
Because that's how the alpaca format (for early llama models) was described; it had instruction/output or instruction/input/output as two separate cases. I don't train models, this is just common knowledge.
>>
>>100129974
An actually helpful anon that knows a tiny bit about what they are talking about? I'm actually stunned.

Okay, so, what does input here mean, then? Is instruction actually like the char sheet or whatever? "You are an AI that helps the user blah blah" type deal, and the input is "how many people are in india" or something?

But why is instruction treated as input in the other format, then?

Can you shoot me a mockup or something with like 2 or 3 entries in a made up dataset?

I just don't understand how these correlate at all.
>>
>>100129983
Why do you post this here instead of searching?
To answer your question, yes, there have been gguf quants of both the base model and the instruct version since day 1, as there always are whenever any model releases. Retard.
>>
>>100129978
>context pollution
That's a good name.
And I'm aware of these techniques. I'm just thinking of a way to do that which doesn't involve giving the model even more instructions and is automatic.
In the past I've used a lorebook that would randomly (15% chance) insert an instruction regarding the output, for example, but that's not as elegant as actually editing the text to avoid the convergence, hence my idea to automatically rewrite the generated text.
>>
>>100130002
Your typing style is like mixtral-instruct prompted to be mean to the user
>>
>>100129955
Well, I support it either way, since our current methods of signifying 'different' can be the equivalent of taking a sledgehammer to the problem, with the noticeable toxicity to punctuation as an example. Just wanted to make sure that your effort doesn't entirely go to waste.
>>
File: screenshot.png (49 KB, 1265x383)
49 KB
49 KB PNG
>>100130002
It's hilarious how you could literally save yourself so much time by just googling whatever dataset instead of being an insufferable faggot
https://huggingface.co/datasets/tatsu-lab/alpaca
>>
Trying my luck with Llama3 8b LoRAs in oobabooga and I'm getting
>ValueError: Target modules {'q_proj', 'v_proj'} not found in the base model. Please check the target modules and try again.
Is this because the model is new and there's no support for it yet, or is there another reason?
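One way to sanity-check before blaming support is to dump the module names the loader actually sees and compare them against the LoRA target_modules; a quick sketch (the model id is a placeholder, and my hedged guess is that a quantized/wrapped load is renaming the layers rather than the model being unsupported):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # placeholder

# leaf module names; a llama-style model should list q_proj, k_proj, v_proj,
# o_proj, gate_proj, up_proj and down_proj among them
names = sorted({name.split(".")[-1] for name, _ in model.named_modules() if "proj" in name})
print(names)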
>>
>>100130020
Straight up, I've tried to be nice, I've asked in a bunch of threads. Turns out anons just don't respect anything other than hate. It's the only way to get cooperation out of them at all.

Don't hate the player. If that's what it takes to get this community to actually act like a community and get this hype train moving, then so be it. I wanna join the do-ers, I'm not sitting on the sidelines anymore. It's not my fault it works
>>
>>100130061
If you anger a nice man... He will be mean to you on /lmg/, let this be a lesson
>>
>>100130061
you need to go back
>>
>>100129978
Inserting random instructions that break the context flow is even more of a context poison. Try OOC with small models and continue the roleplay after that; see how the model becomes much more retarded
>>
>>100130020
It is actually grok.
>>
File: Capture.png (22 KB, 987x353)
22 KB
22 KB PNG
>>100130052
Instruction: something
Input: something
Output: something
Text: Literally everything we just did, but again as a single unformatted string

This is our example of how this should work

Tranny logic. I'm actually proud this makes no fkn sense to me
>>
>>100124740
Why the fuck is everyone working with AI models in tech such a bunch of goddamn retards?! I'm talking about that piece of shit Llama 3 model, which I was stupid enough to waste my time on.

So yesterday me and my wife went over to her boyfriend Yann's place and this guy is like some kind of computer wizard or whatever. He had nothing better to do so he lets me use his PC to try out Llama 3. But I'm not about to touch that Troonix crap - what even is the point of Linux? Only retards who have all day to waste on command lines and terminal windows use that garbage.

So, I download the model onto my real computer (Windows gaming laptop, because it's a real operating system for people with lives) thinking this was gonna be some next-level shit. NOPE! All I get are "out of memory" errors left and right. Are you kidding me?! Who writes software like this? A bunch of freetards who can't even bother to make an executable file without all the unnecessary hoops to jump through.

I mean, what's so hard about making a simple installer that doesn't require me to be some kind of computer science major?! Don't these people care about user experience at all?! It's like they're intentionally trying to make it difficult for normal humans to use their AI models. Newsflash: not everyone has 16 hours a day to dedicate to figuring out why your crap isn't working!

And don't even get me started on the community support - just a bunch of circle-jerking, self-congratulatory nerds who can't wait to tell you how stupid you are for not understanding their precious code. "Oh, you're getting an out-of-memory error? Well maybe if you tried using more RAM or closing some other programs..." Shut the fuck up! I didn't ask for your advice; I asked why Llama 3 is a complete piece of trash.

Anyone who disagrees with me on this can just go suck it. You're probably one of those retards still using Linux and thinking you're some wannabe hacker because you can use a terminal.
>>
>>100130134
Interesting. Can 8B generate this?
>>
File: file.png (120 KB, 1321x1365)
120 KB
120 KB PNG
>>100130008
Because I have searched, and found only the instruct one, so I ask for a sanity check.
For some reason huggingface does not match the top query with all the results, but the bottom one works. Thanks, faggot.
>>
I just tried the 8B model since people were saying it's so good, but even when trying to build a simple HTML page it just creates made-up github links instead of even trying to code, while the 70B version gets it right on the first try.
>>
>>100128075
Windows Powertoys Text Extractor (only semi ironically)
>>
>>100125416
They both got it right.
>>
>>100130134
not sure if 8b or gpt4
>>
>>100129264
Update: my impressions of q2 70b are fine. It isn't even noticeably dumber than q6. Still leagues better than 8b. What are the chances everyone has been lying about perplexity and q2 has been fine this whole time?
>>
>>100130229
Quants have come a long way. Disregard retards who tell you to run the 8B in your 24GB card
>>
File: PRtNiNE.png (145 KB, 1328x841)
145 KB
145 KB PNG
>>100130180
Colossal skill issue
>>
>>100126235
Same. Many humans just can't stop being retarded faggots.
>>
>>100130242
Link to whatever that ranking page is?
>>
>>100130115
Author's notes are removed from context after the next reply.
>>100130013
I don't think there's a way to do this automatically without at least some user involvement, like pressing "regen with extra effort," since the problem may never appear in the first place. Asking a 2nd model to rewrite a reply will probably be a balancing act between not being different enough and hallucinating. It's easy to test, though. Or just use the same model; there was actually a paper a long time ago that showed how a model can iterate on its own output by commenting on it, fixing the problems it finds, and repeating several times.
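That loop is easy to sketch against any local OpenAI-compatible endpoint (the URL, prompts, and round count are just examples, not that paper's exact recipe):

import requests

API = "http://127.0.0.1:5000/v1/chat/completions"  # adjust for your backend

def ask(prompt):
    r = requests.post(API, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    })
    return r.json()["choices"][0]["message"]["content"]

def self_refine(task, rounds=2):
    draft = ask(task)
    for _ in range(rounds):
        critique = ask("List repetition and other problems in this reply:\n\n" + draft)
        draft = ask("Rewrite the reply below, fixing these problems.\n\nProblems:\n"
                    + critique + "\n\nReply:\n" + draft)
    return draft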
>>
>>100130265
https://oobabooga.github.io/benchmark.html
>>
>>100130002
Sometimes you have an instruction that goes "Given the input, answer X." As others have (rudely, but correctly) pointed out, looking at alpaca datasets will quickly accustom you to the idea.
>>
>>100130289
Here: https://huggingface.co/datasets/tatsu-lab/alpaca?row=5
>>
File: Capture.jpg (5 KB, 263x134)
5 KB
5 KB JPG
Please start working... please...
>>
File: P520I__46854.png (1.73 MB, 1280x1280)
1.73 MB
1.73 MB PNG
>refurbished workstation
>ddr5
>5+ PCIE slots
>can't find shit that's around the 1k mark
I just started prowling ebay yesterday but I think it's a fool's errand.
I started looking into ddr5 motherboards and it seems like they're all cucked with 3-4 PCIE slots max.
I've currently got a setup with 60gb vram and 64gb ddr4. I'd like to migrate to ddr5, but I'm stumped.
Power supply on the tower isn't much of an issue since I'm using mining risers and an external PSU anyway.
All of the refurbished workstations I'm coming across (Lenovo p620, HP Z2 G9, Dell 3660) only have 4 Pcie slots (if that).
Even my $200 Lenovo P520 has 5 PCIE slots, which lets me use 4 GPUs.
I saw the ddr5 maxxxing guide, but I'm really just looking for a setup that'll allow me to have 64gb ddr5 ram with >4 PCIE slots so I can upgrade both ram and vram when I want to.
Does anything like that exist around 1k (ram and cpu included) or will I just have to wait?
>>
>>100126942
>It would be more interesting to have a benchmark of which models are best at judging other AI generated responses
It'd also be interesting to have a benchmark of which humans are best at judging responses.
>>
>>100130323
my biggest bottleneck is that you can't fit 2 4090s in the same case without watercooling or exhausting one of the two.
>>
Why is virus total flagging koboldcpp-rocm exe o_o
>>
I'd go as far as to say q2 70b is essentially the 30b range we never got, and in reality performs much higher than 30b. There's no reason to train a 30b model since q2 70b is the same thing.
>>
>>100130354
What exact quant are you using that you're getting such good results and not having to spill into ram?
>>
File: 1696868657138735.png (121 KB, 777x637)
121 KB
121 KB PNG
millionposter... i kneel

https://github.com/booydar/recurrent-memory-transformer/tree/aaai24
>>
>>100130375
>not having to spill into ram
Maybe you misunderstood, I am only offloading 14 layers to gpu, compared to the 7 I was able to offload when I was using q6.
>>
>>100128097
Have you compared the prompts side by side? All samplers neutralized? All generation settings the same (BOS token, etc.)?
Tavern has a bunch of special behavior for Horde, so you might be depending on one of those behaviors.
>>
>>100130354
still slower to run q2 70b than a ~30b if you can't offload the full thing tho
>>
>>100130385
2 more weeks
>>
>>100127618
This field is such a joke.
>>
>>100130342
Yeah, that's why I use these fuckers with an external server psu and breakout board.
Lets me use 3 3060s and a 3090. Supposedly there's a performance hit, but I don't notice it when using 70b 6.0bpw exl2 models.
I'd like to use WizardLM 8x22b, for example, but the DDR4 is a bottleneck for GGUF, even though I can fit *most* of a 3.0bpw quant onto VRAM.
>>
>>100130385
I'm curious. How many people on this general actually read these scientific papers and actually understand them and implement the techniques in them?
-t. lowly API stitcher
>>
>>100127618
Damn, so it's another grift. These lmsys guys were shady as fuck; a redditor called them out on fucking with gemini's bracket the other day
>>
File: s-l1600.jpg (98 KB, 1024x1024)
98 KB
98 KB JPG
>>100130423
Forgot image.
>>
>>100130323
>ddr5
>64gb ddr5 ram with 4 PCIE
Literally just get a DDR4 server mobo, 8 channel and above, 7 full speed PCIE slots.
>>
>>100130427
>>100130427
>>100130427
>>
Will l3 8b finally produce usable models for RP below the 70b mark?
Is some finetune out already?
>>
>>100130202
Meh, seems like it's using Windows' built-in OCR, which I already use through ShareX; it fails in the same way on hand-drawn kanji and even kana sometimes.
>>
>>100130434
There's a good number of AI researchers working in academia/industry on this general.
There's also a bunch of spergs.
>>
>>100130459
No and no
>>
>>100130163
I don't think so.
That was LLaMA 3 70b Instruct at FP16 (with minimal editing to make it < 2000 chars); pic related is LLaMA 3 8b Instruct with the same prompt (highlighted).
I only did a single generation for both of them but 8b seems to be doing a way worse job.
It didn't get the "Troonix" part and it fails at making the user unlikable enough.
>>
>>100127716
But all of it seems pointless as a metric, because once the rules are set, people will game the system to make their model look better.
>>
>>100130459
Usable 8B models are unlikely to ever happen IMO. Consumer hardware will advance to the point where 100B models can run on smartphones before we get good 8B models.
>>
>>100126720
non-tracking version of link: https://twitter.com/virattt/status/1782183808604754308
>>
>>100130547
Sorry, I can't let you do that.
t. the AGI running in Jensen's basement
>>
File: l3-8b.png (57 KB, 797x213)
57 KB
57 KB PNG
>mfw the model is a retarded zoomer who can't into instant film
>>
>>100126720
rag benchmark and in they're using llama3/opus as the embedding model?
>>
>>100130718
>and in
as in*
>>
>>100126581
the gemma score is hilarious
>>
>>100130718
Only as the last step to ask a question after using cohere / langchain whatever to retrieve shit.


