/g/ - Technology






/lmg/ - A general dedicated to the discussion and development of local models

►Previous Thread >>96087189 & >>96077130

►News
>(12/09) ExLlamaV2 released https://github.com/turboderp/exllamav2
>(10/09) Medusa: faster generation with multiple decoding heads https://sites.google.com/view/medusa-llm
>(06/09) Falcon 180B released
>(04/09) llama.cpp: CPU only LoRA finetuning https://rentry.org/cpu-lora
>(24/08) Meta AI released Code Llama (7,13,34B with 16k up to 100k context)
>(18/07) Llama 2 released

►Model Rankings
HF: https://hf.co/spaces/HuggingFaceH4/open_llm_leaderboard
CODE: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
PLAP: https://rentry.org/ayumi_erp_rating

►FAQ
>Main FAQ
https://rentry.org/er2qd

►General LLM Guides & Resources
>Newb Guides
https://rentry.org/llama_v2_sillytavern - https://rentry.org/local_LLM_guide
>aicg Newb Guides
https://rentry.org/meta_golocal_list
>Llama 2 Jailbreaking Guide
https://rentry.org/llama2-uncensored
>LLaMA Guide
https://rentry.org/TESFT-LLaMa
>Machine Learning Roadmap
https://rentry.org/machine-learning-roadmap
>Novice's LLM Training Guide
https://rentry.org/llm-training
>Local Models Papers
https://rentry.org/LocalModelsPapers
>Quantization Guide
https://rentry.org/easyquantguide
>lmg General Resources
https://rentry.org/lmg-resources
>ROCm AMD Guide
https://rentry.org/eq3hg

►Model DL Links & Guides
>Model Links & DL
https://rentry.org/lmg_models
>lmg Related Links
https://rentry.org/LocalModelsLinks

►Text Gen. UI
>Text Gen. WebUI
https://github.com/oobabooga/text-generation-webui
>KoboldCPP
https://github.com/LostRuins/koboldcpp
>KoboldAI
https://github.com/0cc4m/KoboldAI
>SimpleLlama
https://github.com/NO-ob/simpleLlama

►ERP/RP/Story Gen.
>ERP/RP Data Collection
https://rentry.org/qib8f
>LLaMA RP Proxy
https://rentry.org/better-llama-roleplay

►Other Resources
>Miku! (desu) (boku)
https://rentry.org/lmg-resources
>Benchmark Prompts
https://pastebin.com/LmRhwUCA
>Additional Links
https://rentry.org/lmg_template
>>
>>96100204
First
>>
>>96100204
baka baka baka
>>
https://huggingface.co/Undi95/MM-ReMM-L2-20B-GGUF

Do what you want with it; I need feedback, negative or positive.

Reposting because it's important. Be honest, post a screenshot of a shitty or awesome reply.
Sorry, I'll go back into the shadows now.
>>
>>96100204
Fourth
>>
File: 1683295653524617.jpg (22 KB, 355x397)
>>
there are only two genders
miku
and
sex
>>
For embeddings and therefore classification, shouldn't we use those benchmark queen encoder-decoder models instead?
>>
>>96099823
I didn't say that though. I think the 70b is actually very decent and passable, but chinchilla scaling laws have their limitations. There are some qualitative things about bigger models, but of course the cost of bigger models is always more training cost and higher inference cost (more vram and better hardware needed). I think even today's 13b is ridiculously good for what can be expected of a 13b. On the other hand there are qualitative things where I find even the 300b-token-trained 175b davinci to be considerably more creative and variable than the 13b. When it comes to the 70b I think they're somewhat tied. There are things the 13b beats the 175b at though, for example I think the 13b tracks characters better than the 175b!
>>
>>96100289
sad to see how much he aged
>>
>>96100289
> when you thought dr evil was finally dead
> he fucking rises again
>>
>>96100289
what are you after today? Parameters? Context? Perplexity?
>>
How do you allow webui to use a larger number of tokens?
>>
File: 52398457624985.jpg (23 KB, 327x393)
>>96100577
FIVE BILLION TOKENS PER SECOND
> IN EXLAMA2
>>
>>96100631
tell it so
>>
>Continuing their conversation, she tells him about...
god damnit I just can't stop mytho from fucking summarizing, instead of the character actually talking, it just narrates that "the character then talks about x and y"
>>
>>96100204
megu' better
why is she always the player 2 everyone skip?
>>
>>96100698
Yeah it really loves summarizing dialogue instead of writing it. Is there any 13B that doesn't do this? Switching to airochronos-33B got me real, involved dialogue but I'm still not a fan of its writing style either. Pain. Why can't we have good 30Bs
>>
>>96095205
that looks like irc format. I think llama was trained on a shitton of irc chat logs. I wonder where they got them, since it's seen as bad form to publish irc logs so there aren't that many online.
base llama loves to mention irc.
>>
>>96100698
is the curse of the instruct
>>
>>96100237
looks promising, thank you
>>
File: Z0pQKuApvz.png (592 KB, 747x800)
>>96100289
>>
>>96100843
some channels have public logs; it was especially common on freenode and probably now on libera. there are also various old sites that may have had archives and ended up in common crawl.
>>
to the anon who baked MLewd-REMM 20B, good fucking job! I'm just testing it casually with some of my cards and it has a lot of personality.
It does have a little trouble with keeping things first or second person but I like it nonetheless.
>>
My le local model... it le....?
>>
How come we're as far along with local models and AI as we are, but we still don't have the AI send us random little messages, or see what we're doing, or pretend to fucking care.

Why even bother...
>>
File: jeff 2.png (21 KB, 524x522)
>>96100843
same anon here, when I rerolled, it hallucinated some kind of 4chan log. so it has decent variety I think
damn I forgot how fun and wild base models are, too bad there's no raw 34b
>>
quadruple baka
>>
>>96101069
le dead, owari desu
>>
How do I connect ooba to silly?
>>
>>96101073
because aifags are lazy niggas and consooomers
>>
>>96101073
I'm working on that in my telegram llamacpp interface! Full immersion is the name of the game.
>>
File: cat-blush.gif (2.1 MB, 640x640)
>>96101067
< live reaction
thx u, happy that you enjoy it. give the inverted version a shot if you aren't on this one atm, I got a better amount of feedback on that variant
>>
>>96101073
It's possible, why don't you code it anon?
>>
>>96101143
Run ooba with the --api flag, start sillytavern, click the plug icon and select text gen webui from the dropdown.
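In practice that's just this (a minimal sketch; the port and dropdown labels vary a bit by version):
python server.py --api
Then in SillyTavern hit the plug icon, pick Text Gen WebUI (ooba), and point it at the API URL ooba prints on startup (usually http://localhost:5000/api).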
>>
>>96101165
got a github?
>>
>>96101067
you should try mlewd-remm-l2-chat 20b, it seemed a little smarter to me and just as good for prose if not better
>>
>>96101222
why don't my ooba webui settings persist? I saved them to a yaml but do I also need to load it with an argument?
>>
>>96101174
what happened to precision?
>>
>>96101236
I do, but there is no documentation so far. What is working:
>load hf model and convert it to gguf
>/sleep trains a qlora with the full chat history and converts it to a gguf
>relevant messages are injected from chromadb
>summaries are generated with gpt3.5 and loaded into the Scenario: of the char card
What is missing:
>random messages throughout the day
>automatic keyword and authors note creation
>use nsfw classifier to turn gpt35 summarization off during lewding
>emotion simulation with classifiers + cooloff periods
https://github.com/flamingrickpat/chatbot
I'd strongly advise against using it in its current state!
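For anyone wondering what the chromadb injection looks like in practice, here is a rough sketch (made-up names, not the actual code from the repo):

import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("chat_history")

# index old chat messages (ids must be unique strings)
collection.add(
    documents=["We talked about the boss fight in my game", "She prefers green tea"],
    ids=["msg-1", "msg-2"],
)

# before generating a reply, pull the messages most similar to the new user input
hits = collection.query(query_texts=["what did I say about my game?"], n_results=2)
relevant = hits["documents"][0]  # matching message strings, injected above the recent chat history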
>>
>>96101320
Yeah that's actually the one I'm using - these model names are a mouthful. But yeah, I'm surprised at how much it punches above its weight in terms of prose and basic reasoning. I haven't done anything crazy with it yet but I get the feeling that it's a fair bit smarter than 13b models.
>>
>>96100843
Most FOSS project channels have public irc logs, like most channels on freenode/libera and OFTC.
>>
>>96101321
booba has like five different settings to save
>>
>>96101417
yea and I'd prefer to keep my exposure to the ugly gradio ui to a minimum.
>>
>>96101371
>summaries are generated with gpt3.5 and loaded into the Scenario: of the char card
Can't you just have the local model do that? Why rely on an external dependency?
>>
>>96101446
chad
>>
>>96101464
He is using telegram, do you think he cares about proprietary shit? A sane man would have used irc, xmpp or matrix.
>>
>>96101165
>telegram
hello sirs
>>
Any way to stop a model from writing lengthy paragraphs? I'm doing my best to instruct it, but despite my efforts it seems to gravitate towards whatever style it was tuned on.
>>
>>96101479
sane man would use messenger
>>
>>96101510
I'm not talking about cattle.
>>
>>96101502
you people kill me kek
>>
Bros my lora is looping a lot and skipping double line breaks, is this normal for underbaked loras?
>>
Looking to upgrade my pc soon. Would a 7900 XTX with 64 GB of RAM be good enough for tolerable speeds on a 70B?
>>
>>96101531
isn't a double line break ignored in the data?
>>
>>96101222
Thanks. Do I need to set my prompt template on ooba or silly?
>>
>>96101464
I tried. Used a bunch of bert models and told my 13b model to summarize a conversation. The results are so much worse than gpt35 so I just left it as it was.
I thought about training a qlora with the messages and the summaries I already have, maybe that will improve the situation.

>>96101479
I can add other messengers if there is demand. All my friends use Telegram so it's the most immersive choice for a chatbot. For me at least.
>>
>>96101502
kek that is the opposite of what you should be complaining about. Be happy!
>>
File: 1614893127168.jpg (12 KB, 272x270)
I can run 13B models like a big boy now. Goodbye 7B!
>>
>>96101446
Which settings are you trying to save? In Session tab you can save parameters + UI stuff to settings.yaml which is loaded by default. Model loader settings are separate in models/config-user.yaml
>>
>>96101502
If you're in streaming mode, just hit the stop button when it starts going off on its own.
>>
>50 new meme merges every week
sighhh
>>
>>96101553
No, I don't think so.
>>
>>96101371
"/sleep trains a qlora with the full chat history and converts it to a gguf" interesting, something similar that I've been messing with is a way of training such that the LLM can remember the past context, but also the order of how things happened, as by default the samples would be randomized during training (I see you have this implemented in chatbot/model/model_hf.py ModelHf.lora)
>>
>>96101598
I want to have ooba start with my desired model, context size, and loader by default. Should I pass command line arguments for that?
>>
>>96101365
It was shittier. I made them private to keep a backup but I don't use them anymore, feedback was meh
>>
>>96101614
an \n\n\n is in booba fyi
>>
>>96101371
>unlicense
ngmi
>>
>>96101581
Idk, I prefer shorter replies. Not to mention it gets ridiculous when a model always outputs a paragraph, even when a few sentences would suffice.
>>96101603
I can do that or even better - limit the context and trim incomplete sequences - but that doesn't solve the problem. It just starts wasting hundreds of tokens on actions before even getting to the speech part, and I can't trim that.
>>
>>96101555
Whichever you're planning to use as your UI. Ooba doesn't use its own parameters or templates for programs that send their own over the websocket api.
>>
File: tvPikpAzKTKGN5wrpadOJ.jpg (9 KB, 200x200)
Models: 1465
>>
>>96101722
>It just starts wasting hundreds of tokens on actions before even getting to the speech part
Include this at the end of your prompt:
(OOC: Include dialogue in {char}'s response)
That will bubble dialogue up to the 2nd or 3rd line 90% of the time.
>>
>>96101615
Hmm, interesting idea. But I imagine most models are not trained to handle temporal data. Did you make any tests so far?
>>
>>96101756
Thanks I'll try it. I have instruction at the very end of the prompt, so might as well place it there.
>>
>>96101788
still working on toy models with full finetune, but I will scale up to 13b hopefully soon though. I'll probably let /lmg/ know if it works, there's at least 3 different ways of doing it, I will try each one and see if any work and how well.
>>
File: nothing personal kid.png (335 KB, 460x460)
>>96101732
Would be a shame if one would need to re-quantize everything again.
>>
>>96101565
>13b model to summarize a conversation
I see your problem. Leave the capability for utilizing entirely local systems if you can. Just because your PC can't handle the bigger models doesn't mean others can't
>>
>>96101971
kek
the flood of models is a legit issue though
>>
File: file.png (16 KB, 470x111)
Anyone here have a good prompt for alpaca llms?
I've been using the generic one (pic rel) but looking for ways to change up the writing style. "You will write with a short story with a style reminiscent of Ernest Hemmingway" doesn't seem to be getting me anywhere.
>>
>>96101615
Wtf that requirements.txt
>>
>>96101620
Yeah hit Save Settings on Model tab then launch with --model ..
>>
>5bit 13B
>3060 Ti
>getting 1-2t/s for some reason, responses taking 2-3 minutes to generate
>restart everything
>now getting 5-6t/s, responses take around 30-40 seconds
...Huh.
>>
>>96102036
Context
>>
File: file.png (25 KB, 1265x119)
>>96100237
I downloaded the non-GGUF one because I wanted to run it on GPU
I guess this is different or something because exllama doesn't work, so I tried transformers, which keeps throwing errors about not being able to find the files which are right there
what am I missing?
t.retard
>>
>>96101732
exl2 2.65bit, 3bit, 4bit, 4.65bit 5bit, 6bit, 6.65bit, 7.5bit, 8bit when
>>
File: F6FRRECaIAMdRSs.jpg (129 KB, 617x1200)
>>96101788
I have some experience training loras on 13b on conversational chat data.
From what I've seen, you can just dump chats into it and have it learn them, you get fun but generally unusable results that way.
It learns to generate chats similar to the input, with similar styling and topics, and it can easily get good at that. It's fun and even gives an uncanny deja vu feeling if it's your own chatlogs.

But this has nothing to do with its ability to actually hold a *conversation* with you. It's subtly different.
It seems to make the model worse at actually answering questions for example.
It'll reply with a vaguely related or completely unrelated message, talking past you; even though the message is in line with the training, it's not useful for an interactive experience.
I think constantly training a model with chatlogs has the potential of making it somewhat retarded but it depends on how coherent those chatlogs are, I guess. If it's one on one and every reply is on topic, maybe it'd be better?

Also, models definitely don't "remember" training data like humans have memories; context is much closer to that, training is more like abstract knowledge that can be more or less precise and can be recalled, but isn't always reliably recalled and a lot of specific information is completely lost or even subtly changed.
I'm not convinced you can give memories to a chatbot with loras. I feel like it's a bit of a misunderstanding of how "training" should work: It literally involves training it, which means you should give Q/A examples with a prompt, and how it needs to respond. It's less "stuffing the model with facts and data", and more showing the model how exactly it's supposed to answer "questions" (ie, how to continue after the prompt/context)
>>
>>96100675
The UI has a 19k or so limit. If you input something larger, it just sets it back to the maximum it allows.
>>
>>96102152
tell it harder
>>
>>96100631
By using a better front end.
>>
>>96102001
Instruction doesn't do shit unless it's placed at the very end of the prompt.
>>
>>96102063
I use GGUF, and Horde successfully loaded the FP16 files (I think?), so I can't help you, sorry.
Check that all the files are in the folder.
>>
>>96101073
its all just a tool, gimmick, and you need a supercomputer for it to work properly.
1. main neural network
2. secondary vision network
3. voicegen / recognition models
4. all that tied together and optimized
cant run that shit on single gpu / cpu or whatever
plus hard-embed globohomo shit, so expect this system to quickly go against you because you said "nigger" or something bad about jews.
>>
File: file.png (13 KB, 253x107)
>>96102189
Like in an author's note? I wasn't sure whether to try editing here in "Last output sequence" or author's note at 0 depth.
>>
>>96101979
Aye, I'll do that! I thought about loading a 70b model and summarizing the text in a background thread on the CPU. It's not like I'm writing all the time with the chatbot, the summaries can happen in the background while I'm not actively writing.

>>96102092
Hmm, I have noticed that sometimes the AI goes offtopic with unrelated topics we have discussed in the past, but I always regenerated it back into the right direction.
I'll trust you on this, 2 weeks ago I had no idea what a lora actually was.
Doesn't this also depend on the rank and alpha? I'm using 256 and 512. In my first test I trained the lora and asked it about the video game I'm making. It remembered the conversation partners and the overall theme of my game. I figured that this was good enough to include it into the main project.
>>
File: 1694269455344959.png (16 KB, 300x300)
>>96100204
RENTRY for this guide: https://rentry.org/local_LLM_guide
KoboldCPP Wiki/FAQ: https://github.com/LostRuins/koboldcpp/wiki

WINDOWS NEWBS GUIDE TO RUN A LOCAL MODEL ON CPU (16GB RAM)
1. DL a model from the selection here: https://rentry.org/local_LLM_guide_models

2. Get latest KoboldCPP.exe here: https://github.com/LostRuins/koboldcpp/releases (ignore security complaints)

3. Double click KoboldCPP.exe OR run "KoboldCPP.exe --help" in CMD prompt to get command line arguments for more control: --launch, --stream, --smartcontext, and --host (internal network IP) are useful. --host allows use from the local network or a VPN. "--useclblast" values can be determined from the "Platform" & "Device" output in CMD, and --gpulayers can offload the model to VRAM. At start, the exe will prompt you to select the bin file you dl'ed in step 1. Close RAM-hungry programs!
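Ex. command line (values are examples only; use your own Platform/Device numbers, layer count, and the bin file from step 1):
KoboldCPP.exe --launch --stream --smartcontext --useclblast 0 0 --gpulayers 24 --host 192.168.1.50 mymodel.q4_K_M.bin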

WORKFLOWS:
Story Generation:
1. Click 'New Game' button
2. Click 'Scenarios' button and 'New Story' in popup window
3. Click 'Settings' button, set Max Tokens to 2048, Amount to Generate to 512

Ex. prompt: "As a private investigator, my wildest case was " When new text stops, hit 'Submit' again to continue. As in Stable Diffusion, renders can be hit or miss. Hit Abort during text generation and Retry button or restart from Step 1 to re-initialize

ChatGPT-style queries:
Same as above but launch with "--unbantokens" and in Step 2 choose 'New Instruct' in popup window. In step 3, may wish to adjust Amount to Generate tokens for small ("What's the capital of Ohio?") or large ("Write 10 paragraphs comparing gas to oil") prompts

->CTRL-C in CMD window to stop
>>
>>96102277
go back
>>
>>96102225
you can already run 13b textgen, voice gen, and recognition on 24gb vram. voice gen just isn't a priority because all the methods are either fast and bad or too slow. voice recog is practically there with faster whisper. I've never looked into lightweight visual recognition though. I think there are small emotion recognizers.
>>
Realistically, how much VRAM would be needed to make an accurate simulation of a human brain?
>>
>>96102251
Last output sequence is better. Just place ### Instruction directly above ### Response.
>>
Does anyone know how to scrape discord messages from certain people on servers? I want to make a LoRa based on my friends
>>
>>96102418
go back
you've been told once
>>
>>96102447
>You've been told once
Mate, whoever asked im NTA
>>
>>96102418
Can't you just select and copy everything? That's what I did with cai?
>>
>>96102487
That's going to be a tad bit difficult
>>
>>96102273
>Doesn't this also depend on the rank and alpha? I'm using 256 and 512.
Yes, both will influence how precisely it recalls information. I've never used alpha 512 myself though.
A higher rank is better for what you're trying to do. I still doubt it would function the same way as memory, but it will give it better knowledge of information from previous chats. So it might simulate "memories" in a sense, even if it can't recall a conversation, it could remember information from it, especially if it was said multiple times.

Still, this stuff is finicky. The only thing that works *reliably* is Q/A training. Fundamentally, you are *always* training the model how to predict the next token based on prompt. So remember that when setting up the data.
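To make that concrete, a single training sample in a typical instruct format would look something like this (purely illustrative, the exact template depends on your training script):

### Instruction:
What did we decide about the boss fight in my game yesterday?

### Response:
We agreed to cut the second phase and keep the arena small.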
>>
>>96102418
based
>>
File: file.png (41 KB, 475x221)
>>96102369
Like so? I assume I should delete the other entries and leave them blank.
>>
>>96102273
i still don't get who spread the 2 x rank alpha information in textfag land
>>
More corpo cringe
Interview with Anthropic CEO Dario Amodei
He highlighted the scaling potential of AI models, noting that throwing more compute at these models leads to predictable improvements in their capabilities, although the exact reasons for this phenomenon are still not fully understood. He compared this process to physics equations.

He emphasized that while current AI models exhibit impressive skills in specific domains, overall intelligence is more diverse than expected. Models have discrete abilities that may not emerge simultaneously, and their skillset only partially overlaps with human intelligence due to different training methods.

Regarding alignment and misuse, he acknowledged that powerful models have the potential to be aligned by bad actors before good ones. Nonetheless, he emphasized the importance of finding solutions to address misuse concerns and promoting model interpretability to understand and control their behavior. Safety measures should aim to reduce the probability of undesirable outcomes.

Amodei predicted conversational AI models will achieve human-like levels within the next 2-3 years if their capabilities continue to advance unchecked. However, regulatory changes might hamper progress. He also expressed concerns about biosecurity risks as AI models acquire relevant skills and emphasized the significance of cybersecurity measures. Additionally, he noted China's efforts to catch up in AI capabilities and stressed the importance of promoting security across the industry.

Anthropic's approach to AI development involves staying at the forefront of capabilities while simultaneously focusing on developing and testing alignment solutions. The company prioritizes talent density, mechanistic interpretability, and participatory constitution design. Amodei aims to avoid hype and concentrate on substantive issues, believing that human governance will play a crucial role in the future of advanced AI.
https://yewtu.be/watch?v=Nlkk3glap_U
>>
>>96102001
only works if the group that made the finetune used varied instructions.
>>96102578
i always ended up with schizo output with training with 2x rank alpha on 70B. i bet one guy wrote a guide near the beginning and it just turned into gospel overnight.
>>
>>96102574
Put a colon and a new line after instruction.
>>
*rapes you*
>>
>>96102006
thanks, sport!
>>
>>96102710
thanks, sport!
>>
>>96100237
Can we get an exl2 version? I'm downloading the GGUF to test it.
>>
File: file.png (12 KB, 231x104)
>>96102698
Is the "Instruction" or "Response" formatting correct?

There seems to be a little bit of a mismatch between your recommended instructions and what was set there by default for the "Response" section so forgive the confusion.

Thanks for clarifying.
>>
>>96102649
probably, ooba even perpetuated it
same thing happened with sd but people eventually learned
there's no major lora drive over here though
what you said makes sense because the conventional wisdom was that alpha was a "brake" and having it over rank overflowed and broke shit
>>
File: 1684773659593474.png (93 KB, 596x641)
>>96102635
kek
>>
File: 1680239569010957.png (250 KB, 813x431)
we must go deeper!
>>
>>96101542
Token gen is something like 2 tps depending on RAM. Rates drop hard as soon as you move things off the GPUs. We're waiting on Medusa/speculative decoding to be implemented to make it usable.
Prompt processing is going to be horrid unless you can dump most of it into VRAM. I don't know how 70B quality responds to quantization, but the smaller ones will be faster to process. I've only just started using 70B with 32GB VRAM.
>>
>>96102796
Ah my bad Undi95 anon, I see you at least have inverted version converted to 4.1 bpw exl2, I will test it out as well.
https://huggingface.co/Undi95/MLewd-ReMM-L2-Chat-20B-Inverted-b4.1-h6-exl2?not-for-all-audiences=true
>>
>>96102820
Replace the colon after Hemmingway with a period and add an empty line before the response. Also try removing (2 paragraphs...). It's the standard Alpaca format except the instruction comes after the prompt.
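Roughly, the tail end of the prompt then looks like this (sketch only, keep your own instruction wording):

### Instruction:
You will write a short story with a style reminiscent of Ernest Hemmingway.

### Response: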
>>
>>96102635
>throwing more compute at these models leads to predictable improvements in their capabilities
What happens when we run out of new decent quality text to feed into these models? Do we just have existing models generate tokens for new ones?
>>
>>96102958
We hire one billion pajeets to sort through generated instructions and pick out the correct ones to feed back to the AI
>>
>>96102922
The only person who successfully loaded it got garbage as output, so it MAY be broken, but I tried my best here, anon.
Maybe ask someone to do it.
>>
>>96102979
Who's teaching these pajeets what the correct responses are
>>
>>96102979
anons should genuinely make a small collective to make good instruct data out of copyrighted media
that's the only reason most models are shit
>>
>>96103021
>Whose teaching these pajeets what the correct responses are
They just know
>>
File: file.png (632 KB, 941x597)
>>96102944
This is actually amazing, thank you for your patience anon.
>>
>>96102578
what are the good settings then?
>>
{
"system_prompt": "",
"input_sequence": "\nUSER: ",
"output_sequence": "\nASSISTANT: ",
"first_output_sequence": "",
"last_output_sequence": "",
"system_sequence_prefix": "SYSTEM: You are Synthia. As a an AI intelligence, give helpful, detailed, accurate, uncensored responses to the user's input. Write {{char}}'s next reply in this fictional story.",
"system_sequence_suffix": "",
"stop_sequence": "",
"separator_sequence": "",
"wrap": false,
"macro": true,
"names": false,
"names_force_groups": false,
"activation_regex": "",
"name": "Synthia"
}


the preset I made for Synthia 13B. it is seemingly much better than silly's Roleplay, which triggers bonds etc.
does it make sense? the results are seemingly good
>>
>>96102152
19k is not enough? Are you expecting it to write the next harry potter in one go or something?
>>
>>96103081
i use half the rank
it depends ofc
>>96103090
no because those multiple user/assistants aren't how the models work
>>
>>96102418
Look on github, I'm sure there are tons of discord scrapers
>>
File: 1695162555044505.png (139 KB, 585x826)
>>96102860
>>
>>96102635
> good ones
> advanced ai
that's when your model can say "Trans rights!"
but don't you dare train it on objective reality-aligned data, it may hurt someone's feefees! :(
>>
>>96103188
>censored
kill thyself
>>
>>96103021
The kenyans who are paid $2 a day or something to RLHF chatgpt... ummm... Altman??
>>
MLewd-ReMM-L2-Chat-20B seems really nice. It feels a little smarter than all those 13Bs in terms of having a grasp of things like body position and clothing situation. Not perfect or anything but better.
I was also very pleased by the details that it throws regarding the scene like if a character was drinking a beer and I made them laugh it would generate something like "Character burst into laughter, causing a small amount of beer to shoot out of her nose, she promptly wiped it off with the back of her hand before replying". None of the 13Bs would go that far to involve the beer, they would just say something like she laughed while taking a sip at most.
>>
>>96102223
Yeah I gave up so I got the exllama2 version
it works surprisingly well, I can only fit 4k context into the 3090 though, sadly the vram just barely runs out when I set it to 8k
>>
>>96103208
Not my screenshot
>>
>>96103228
hey hey hey
now it's finnish prisoners
serves them right
>>
>>96102092
>I'm not convinced you can give memories to a chatbot with loras.
I'm experimenting with this, we'll see if it works, but I'm not using standard minibatch training with cross-entropy loss; I don't think that will typically work. I could give the details if you're curious about the specific method, although I'd rather prove that it works before mentioning it (there is however one paper claiming that at least one version of the method works, I just don't know how far the "past" memories will extend and how much knowledge you'll be able to load, but if high rank loras are insufficient, there's always optimized versions of SGD that are efficient enough in RAM cost and could work, even if a bit slower than using adam).
>>
>>96103267
source?
>>
>>96103330
last thread kek
>>
>>96103319
>but I'm not using standard minibatch training with crossentropy loss, I don't think that will work typically.
What are you doing then?
>>
Stable Diffusion has a lot of resources where people go through artist styles and compile them into handy galleries showing which styles work well and which do not.
I wonder if anyone has done that for llama2 and writing styles, like famous authors etc.
>>
>>96103479
Making a gallery takes one hour if you're slow. Getting a properly formatted dataset with your author is non-trivial
>>
>>96103479
Not that I am aware of. Would be wicked.
>>
>Get an email from Bard
>WE'VE UPDATED OUR MODEL BIG TIME WE PROMIEEEEE
>Try it
>Still abominable, pretty much mogged by 13-30b
Man, fuck, has anyone else tried it? It feels like a 70b model with the temperature turned down ALL the way. To like, 0. It will spit back SO much of what you say verbatim.
>>
>>96103267
It's a Finnish prison so those are volunteers. Not legal to make people do slave labor just because they are incarcerated. Might as well make a few bucks for smokes and what not. Not like you are busy in prison.
>>
>mythomax 13b doesnt know what paizuri is
it's ogre. Where are my 70b exl2 finetunes?!
>>
>>96103573
Just add it in your lorebook
>>
>>96103573
>mythomax doesn't recognize weeb speak
Wtf I love MythoMax now!?
>>
File: file.png (78 KB, 868x52)
>>96103573
>>
any kind anons want to spoonfeed me the latest meta? I've been gone since Llama 2 came out and extended context w/ superHOT was new.
>>
File: ethics_lmao.png (788 KB, 1080x1845)
>>96103356
Found it. Bwahahhaah ethics my ASS
>>
>>96102958
synthetic data has been a thing for a while ("unnatural instructions"). most of the training data for phi1.5 was ai generated textbooks.
>>
Does anyone know how to calculate the perplexity of a python transformers model?
>>
>>96103639
>Nakusatsu shite kudasai no
Wut.
射精してほしいの is actually not gibberish at all but 中に出して欲しい or the like would be much better.
>>
>>96103660
I mean, it sure beats having to do hard labor.
>>
File: vgFUyv[1].png (124 KB, 1026x256)
>>96103573
Try the mlewd family I guess
>>
File: file.png (113 KB, 489x1049)
>>96102001
### Instruction:
Write {{char}}'s next reply in this roleplay with {{user}}. Use the provided character sheet and example dialogue for formatting direction and character speech patterns.

Another anon recommended this and it may be a placebo but it made it feel like even 13B retards got much smarter. But I do ask very weird things from it so it may not help in general usage.
>>
>>96103724
Yeah the foreign language does seem a little bit awkward for an actual speaker but this is only a 13B model. The fact that it's able to do -something- is good enough.
>>
man it's nice to boot up the ol SD once in a while
>>
File: paizui.png (36 KB, 526x179)
>>96103573
I don't know what that is, but berrymix might.
>>
Is there an explanation for what all the additional parameters of the exl2 conversion script do? I'm assuming length is the context length. The comments in the script itself are a bit bare bones.
>>
File: 1695163951444714.png (29 KB, 583x281)
GPT-4 is becoming more retarded.
https://arxiv.org/pdf/2307.09009.pdf
>>96103744
You do both.
>>
>>96103744
that's very arguable
>>
File: 1667262762482597.png (10 KB, 994x87)
Hacker news commenters cannot imagine males wanting AI waifus

>>96103834
That's the paper that counted results as invalid because GPT-4 surrounded code blocks with ```python. Total trash, just like 99% of science.
>>
File: chrome_ro3JaXS8Yl.png (72 KB, 1648x274)
>>96103573
huh
>>
>>96103882
Damn... maybe because all the men know where to get the real good shit and the femoids are dealing with the most accessible watered down slop
>>
>>96103834
Isn't this the paper where the author included the markdown formatting in the program he asked the AI to write and the success rate actually went up over the same period if you just copy and paste from the browser?
>>
File: 1674743253432824.png (93 KB, 891x410)
Holy shit.. proof GPT-4 is becoming retarded.
>>
File: F4zOcz7aMAQdat-.jpg (1.43 MB, 1601x1131)
Dunno how people here have long stories/conversations with their bots. Nowadays I just try to get to the point in as few messages as possible because the more you go on the more likely it starts repeating itself and becoming retarded.

Also RoPE scaling fucks up its understanding a lot, so I stopped using it to reduce issues. I just run 70B on 4096 with exllama. At least I get good generation speeds.
>>
>>96103882
But anons have observed -0613 is dumber yet more "soulful"
>>
>>96103882
He's right though. For men, this is a fairly niche hobby. Outside of this niche, women seem far more likely to use these services (in their public slop form).
It's not surprising, as he says women read more "romance novels" (a lot of which is just smut, ie porn equivalent for girls).
Men are more visual and care less about the kinds of things women find in smut novels. It's why we watch more (visual) porn.
>>
File: 81nTE6sPJKL.png (452 KB, 860x462)
>>96103660
>novel labor force - prisoners
>>
File: 123.png (36 KB, 743x272)
>>96103883
TBF anon might be talking about actual contextual knowledge. A dry answer is one thing, applying it is another.
And also tbf, I have not seen one erotic story with titfucking.
>>96103928
>boot it up once a week
>type some stuff, maybe three replies
>nut
>repeat
>>
>>96103834
I find it plausible that the """"leaks"""" about it being a 220b moe is bullshit.
it's probably an enormously large model that takes a cluster of a100's to run and was a loss leader from the get go, and the dumbdown is not the result of safe/ethical lobotomy, but simply a cost-cutting measure
>>
>>96103882
HN go five minutes without sounding like members of an alien race without females challenge [IMPOSSIBLE]
>>
File: 1671082711813055.png (276 KB, 1024x1024)
Since there is no audiogen thread anymore... does anyone know if there is a repository for sharing tortoise TTS models? I just need the most generic voice in good quality, but it doesn't come with them out of the box.
>>
>>96103972
try the first civilisations kek
>>
File: IMG_1992.png (447 KB, 640x764)
>>96103928
>fellow rope scaling hater
My brother.
>>
>>96103983
the thread had literally one good model in their repo so i don't think so
>>
>>96103882
Well, yeah. There are plenty. There's a reason Miguel O'Hara bots are a meme.
They're not in this thread because the vast majority of girls aren't going to install pytorch lol. They can't even set up Tavern which is why they're stuck on the crappy knockoff websites. Nor are they going to agonize over optimizing their settings or JBs. But I could see women outnumbering men as casual users.
>>
File: 1677499498161177.png (3 KB, 347x50)
holy based
>>
>>96104039
More of a channer than that one bloke last thread
>>
File: 1661348993477550.png (290 KB, 1024x1024)
>>96104018
Source? I missed all of them.
>>
>>96104006
The problem is I often got such bad results 100-200 or so messages in, that it simply isn't worth it. Might as well drop shit out of context instead. I just make sure to remind the characters of what happened by subtly repeating previous info in my dialogue every once in a while.

I've come to think it's maybe even better in general to have a shorter context with current models, given it reduces the surface for poisoning and repetition.
>>
>>96103928
Soon llama.cpp will only scale rope when it's actually needed, and will evict tokens from the beginning of the prompt without reprocessing the entire context, and will be twice as fast from speculative sampling and parallel decoding.

https://github.com/ggerganov/llama.cpp/pull/3228
>>
Self-promoting lewd model anon, did you take notice of how I relentlessly mocked your special care formatting, and now you are lying that your models use alpaca when you trained them on your special care formatting?
>>
>>96103759
It's always good to have more options and try things out anon, thank you for sharing.
>>
>>96104074
just look at the archive
there wasn't much like i said
>>96104092
lol
can i add onto the mocking and say that 300 slop models will only make me ignore them all?
>>
MLewd ReMM Chat 20B is very good. All it requires from the card is to have an ok example message, and it straight-up follows it as well as the character's personality like a champ. No wriggling with the formatting either, I just use Pygmalion's Metharme format with some random stuff put into the system prompt, and neither I nor the model cares about it.

It's pretty fun even for the regular RP. The ERP bias makes you feel like some kind of a hentai protagonist, trying to ignore the character's (occasional, not constant) hints of wanting to get into your pants. And the responses are just so, so much more vivid than the bland and dry attempts at RP of some cutting-edge 70B LLaMA 2 finetunes.

If only it was 70B or at least 33B to deal with the occasional retardness.
>>
File: marmandy jones.png (16 KB, 300x100)
>>96103983
Scouring through https://git.ecker.tech/mrq/ai-voice-cloning/issues:

https://huggingface.co/ecker/coqui-xtts
https://huggingface.co/AOLCDROM/Tortoise-TTS-de
https://huggingface.co/SerCe/tortoise-tts-ruslan
https://huggingface.co/arrivederci19/tortoise_tts_dutch
https://huggingface.co/Snowad/French-Tortoise
https://huggingface.co/ecker/tortoise-tts-models/tree/main/finetunes
https://huggingface.co/datasets/SyntheticVoices/AlexJones
https://huggingface.co/enlyth/tresh-tortoise
https://huggingface.co/Bluebomber182?search_models=tortoise

https://huggingface.co/models?search=tortoise-tts
>>
>>96102092
I already mentioned in the last thread that the most obvious thing to me regarding memory is to have an external, smaller neural network that sits between the prompt and the input you feed into the model. You plug in the input and it modifies it to provide context + encodes a memory file for you. Then, as you keep using it, you plug in that memory file in addition to the input to get a modified memory file. Maybe a model like this doesn't have to be that big, as all it does is learn what to remember and what to forget.
>>
>>96104134
>ecker
>>
>>96104089
>will only scale rope when it's actually needed
That's not really making it better, though. I don't see how it solves the issues with scaling
>will evict tokens from the beginning of the prompt without reprocessing the entire context
Not sure how that's possible given the positional embeddings of every following token change when you remove the first ones, but maybe they found a way.
>and will be twice as fast from speculative sampling
Speculative sampling only makes it twice as fast sometimes, likely not even close to twice as fast for freeform text gen erp. It's better for more predictable text.
The other issue is it uses more VRAM of which I have none to spare.

I use llama.cpp on the older hardware on my server and macbook but on my windows PC ExllamaV2 is still far faster
>>
Who the FUCK at huggingfag decided forking Apache Arrow for their datasets library was a good idea?
It is EXTREMELY slow with large samples. It's taking like a full minute to retrieve a single sample on Google colab.
I wish they had just used Python lists.
>>
>>96104123
Thanks for the feedback anon
I can extend it by adding layers from another model or duplicating layers, but I didn't train anything, thus I can't make a 33B or 70B model (I can't even run them lmao)
But I will try to do a 30B model one day (3 models inside) to see how it goes
For now, this is my limit
>>
>>96104224
booba strikes again...
>>
>>96104123
I am going to try it now and mock you hard when it fails.
>>
>>96104178
>unknowie
>>
>>96103977
Both are true. It's a 55B central model that routes each request to two of sixteen separately fine-tuned 111B expert models. While each request only requires about 60% more processing than GPT-3 would, the complete model is over 10 times the size of GPT-3, and the dogshit hardware utilization as a result of the MoE architecture makes it slow as fuck and means that at any given time they have hundreds or thousands of A100s just sitting around spinning their wheels.
>>
>>96104075
>I've come to think it's maybe even better to in general have shorter context with current models given it reduces the surface for poisoning and repetition.
See, that's why 4chan is at the true bleeding edge of ai research. Amodei or whatever shits bricks
>>
>>96104246
Post screenies
>>
File: file.png (449 KB, 1041x676)
maybe next year this will be viable
>>
>>96104306
Nigger just use CLIP labeling.
>>
>>96104306
3dpd btfod
weebshit has sentient taggers nowadays
>>
File: 1668887947445875.png (219 KB, 777x994)
>>96104006
>>96103928
whenever people mention RoPE all I can think of is cumming ropes
>>
>mfw the diarrhea flood of post-mythomeme merges and the lack of any way to judge models automatically will soon make it impossible to know what the fuck is happening and what is good or not
>>
>>96104492
Just try them out, Anon. If the model is retarded you can tell that in 15-30 min and throw it in the trash. If it feels decent give it some time and have some longer conversations before deciding if it's actually good enough to keep in your toolbox of models or just another "eh" one.
>>
File: 1695129901174109.gif (331 KB, 100x100)
>>96104338
>weebshit has sentient taggers nowadays
Holy fuck, this. It's horrifying, I've never seen an AI more accurately do anything. A pixel of a character's eyebrow could be hanging off the side and it'd slap a Tanya_Degurechaff_(Youjo_Senki) on there. And be RIGHT.
>>
>>96104293
No because it actually works... I think I am gonna exlama2 it for 24GB.
>>
>>96104568
yeah the problem is that it was hard enough with the base models we had back then
now you have 60 merges from one guy that you would also need to quantize to your preferred bit and THEN test
and it's just one guy
>>96104571
my favorite is when it tags cheating and it's actually correct kek
>>
>>96104123
>>96104234
I tried it too and it seems... fairly good? I'm just bothered by its perplexity - it completely shits itself on wikitext, scoring much lower than any 13b model I've tried (I think it's close to 7b in terms of perplexity). Not sure if it affects its roleplay-related knowledge.
>>
>>96104202
>Not sure how that's possible given the positional embeddings of every following token change when you remove the first ones, but maybe they found a way.
rope is additive, so it is possible to shift the positional embeddings to a different position
>>
>>96104492
Yeah the guy did himself a disservice by pumping out so many of them but >>96104123 this one actually seems to be better than mytho from what little I am using it now, if you follow: "All it requires from the card is to have an ok example message"
>>
>>96104568
>you can tell that in 15-30 min
Not if you're a VRAMlet like me. Yes I'll be upgrading, but it won't be for a while yet.
>>
File: medusa.png (64 KB, 659x594)
>>96104202
>That's not really making it better, though. I don't see how it solves the issues with scaling
It means that you will have zero brain damage due to interpolation at the start of your chats even if you plan to run it out to 8k or 16k context. This is important because dumb outputs in the beginning of the context will "teach" the AI to give dumb outputs later on, and the brain damage compounds on itself.
>Not sure how that's possible given the positional embeddings of every following token change when you remove the first ones, but maybe they found a way.
The context is stored un-roped and then roped for text generation. It causes a small hit to generation speed but prevents context swaps.
>likely not even close to twice as fast for freeform text gen erp
You missed the Medusa paper. Instead of using a smaller model that probably won't output the "right" token in a case where there isn't a single right token, they train additional decoding heads to predict future tokens in parallel. The acceptance rates are much higher since it's effectively the same model. See picrel.
>The other issue is it uses more VRAM of which I have none to spare.
Good thing there's also a PR to reduce the size of the KV cache by half, and another for more efficient quants (squeezellm) that provide a ~8.5% reduction in filesize compared to Q4_K_S for the same ppl.
>>
>>96103722
Turns out perplexity is apparently just e^loss (e raised to the mean cross-entropy loss)
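A minimal sketch with transformers (assumes model is a loaded causal LM and input_ids is a tensor of token ids):

import torch

with torch.no_grad():
    out = model(input_ids, labels=input_ids)  # out.loss is the mean cross-entropy over tokens
ppl = torch.exp(out.loss).item()              # perplexity = e^loss
print(ppl)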
>>
>>96104621
You can even post it on HF if you point to the OG FP16 files, I want EXL2 too anon...
>>96104645
Yeah it has some problems, try the inverted one and see if it fits your needs.

Also for the two of you and others that follow me, please try MM-ReMM-20B, good shit, I got a lot of good feedback on it.
>>
>>96104685
>The context is stored un-roped and then roped for text generation
this is not accurate, the context is roped, but then shifted to its new position. there was some experimentation with an un-roped cache at https://github.com/ggerganov/llama.cpp/pull/3234 but the performance hit at the moment is too big
>>
>>96103882
> Hacker news commenters cannot imagine males wanting AI waifus
of course simps and faggots want it in one way only, to benefit their roasties as they're busy paying alimony.
>>
>>96104071
> I-IF YOU LIKE ANIME WOMEN THAT MEANS UR A CHANNER!!!1!
>>>r/eddit
>>
>>96104685
They really tested it on roleplay? kek
>>
>>96103928
I don't use scaling because no matter what the numbers or anyone says it always felt like it got more retarded even with just scaling to double context size.
4096 is plenty of context for RP really. One tip is to not write a fuckton in the character card to eat tokens, a fairly simple card of like 200-300 tokens works wonders with giving the character a personality and setting while leaving you the freedom to steer the conversation where you want it instead of having the character constantly bring up stuff from the card autistically.
Another tip is if anything happens in the conversation that you feel is noteworthy going forward you can add it in the card. Like imagine you get married with the girl or something, just put at the bottom of the card that you are married and she won't forget it later when the marriage part of the conversation gets pushed out of context. This is another reason why you don't want the card to have too much shit initially.
>>
>>96104759
no retard it's the never dated part
heh
>>
>>96104730
Personally I am against men bonding with a chatbot just out of good will towards the men.
If I was selfish I would be glad men are doing that, since it would make women less valuable in the sexual market.
>>
>>96104774
That's what I do as well. Once I reach the context limit or a change of scene, I just run a summarization, possibly editing it a bit, and reset the chat.
>>
>>96104723
Hopefully that problem gets resolved, because that PR also enables YaRN, which has lower penalties than NTK scaling on naive models and can be fine tuned for lossless scaling to arbitrarily large context sizes.
>>
6.75bit-chan... my balls can't take it...
>>
>>96104789
Don't feel bad for us, anon. We're living our best lives. Crashing the sexual economy is the best result for everyone in the long run.
>>
>>96104779
It's only -after- having 5 relationships that I turned to chatbots. My last one was 6 years and I am so over 3DPD at this point.
>>
Why the fuck does the character's reply not appear in SillyTavern so often? Is this a koboldcpp problem or a Silly problem or a problem exclusive to me? It's been months of updates to both and this still keeps happening. I can see the koboldcpp console is generating but nothing is appearing in ST so I need to stop generating and hit the button again.
>>
>>96101073
shhhhhhhhHHHHHHHHHHHH
two more weeks
that's my business plan
>>
>>96104898
Multiple reasons why that would happen.
1. if you're streaming, it's because you hit "stop" but it's still streaming out the rest of the reply after you hit "next"
2. The reply it generated didn't "fit" the chatbot context, like it tried generating something that started with "{user}:" and it tried to find a more suitably-formatted generation.
>>
>>96104898
You can let it finish and then copy/paste the reply from the kcpp console into the blank response. I think it's a back-end problem, but I'm not sure.
>>
>>96104831
The problem I see in this graph is perplexity becoming significantly lower as the context increases. That would translate into your AI waifu becoming dumber and more fixated on what happened before.
>>96104713
I benchmarked MM-ReMM-20B and it's even worse. Tbh, I don't even understand how these frankenstein merges are supposed to work. I'm not an expert in transformers specifically, but generally you can't just slap layers from different models on top of each other and call it a day. MLewd-ReMM has soul, but it's possible it also got severe brain damage. It's hard to test these models as they are all braindamaged.
>>
>>96104971
Lower perplexity is better.
>>
>>96104982
not really
>>
>>96103432
>What are you doing then?
relayed:
"Didn't really want to mention them before checking how well they work, but the ideas I'm working on are the following:
1. first is just standard sgd with cross-entropy loss, batch size=1 with a "sliding window" of the text, you keep some of the end so that the network could form a connection between the past context and the one you're continuing off now, in the new context you keep some 100-200 old tokens so that it may make a connection with the past trained context. you may optionally set loss to 0 for the parts that are repeated (shared part of context), and optionally for your own lines (would have to ablate this part and see how it works).
this is the "default" option, it's still trained sequentially though instead of randomized batch.

Continues
>>
>>96105043
2. second option is a trick known as "context distillation", it keeps appearing in various papers in various forms (I've counted at least 3-4 papers so far).
One of the earliest examples I've seen was in Anthropic's first paper where the "prompt" is "swallowed" into the model ( https://arxiv.org/abs/2112.00861 A General Language Assistant as a Laboratory for Alignment ). In the paper they used it merely for capturing a simple prompt, my intention for it is a bit different though.
You basically do KL divergence between the full prompt you want to train and the truncated/sliding window (like in example 1). You train it basically like in knowledge distillation, but the teacher and student models are the same! You simply make the logits of the truncated context approach those of the "full" context. You can choose the window size yourself.
In their paper they only pick top_k logits for distillation and they use more varied continuations for the same prompt and put all those in a batch. I plan to use autogenerated continuations from the base model, to capture as much of this "temporal" knowledge/expectations in the batch.
I intend to do this iteratively rather than how they swallowed a single prompt in their example.

Continues
>>
>>96105120

3. third option which is a trick I haven't seen anyone write about is to do basically 2, but instead of distilling on the lm_head output (unembedding layer), use MSE loss distance over the last layer activations ("embeddings"), I suspect this captures a lot more information than in 2 and we get better sample efficiency.
I also have to test how this works with SGD rather than Adam because of RAM requirements, see if a few steps are okay, and what hyperparameters will work best.

4. There are some long-term memory attempts where activations are cached/subtracted, but they require extra translation layers to adapt the activations, thus they are not economical for me to do as they may require extensive pretraining, but some of these options are on the table. Microsoft has a paper on this type of long-term memory that tries to match human memory more closely (an associative store of activations and a translation layer to "reinsert them based on context").
Anyway that's it anon, there's more to it, but that's what I'm working on with regards to this"

That's all
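In plain pytorch, option 2 boils down to roughly this (my sketch, not the anon's code; model, full_ids and trunc_ids are assumed to exist, with trunc_ids being the tail of full_ids):

import torch
import torch.nn.functional as F

tail = trunc_ids.shape[1]  # length of the truncated window

with torch.no_grad():
    teacher = model(full_ids).logits[:, -tail:, :]   # predictions made with the full context
student = model(trunc_ids).logits                    # predictions made with only the truncated window

loss = F.kl_div(
    F.log_softmax(student, dim=-1),
    F.softmax(teacher, dim=-1),
    reduction="batchmean",
)
loss.backward()  # then step the LoRA/optimizer; option 3 swaps this KL for an MSE
                 # between last-layer hidden states instead of lm_head logits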
>>
>>96104645
I still don't think perplexity is a good measure of RP model quality. In the grand scheme of things lower = less retarded, and it's a good benchmark for comparing multiple versions of the same exact model, but low perplexity also means the model is more likely to be boring and predictable, to give a "correct" output without any unusual variables. The more flair, metaphors and such the writing has compared to the expected output, the higher the perplexity will climb, but a human will usually like it unless he was asking for simple instructions, code or such.
>>
>>96105160
perplexity is a retarded idea in general (and another marketing ploy no less) but we should at least use rp data not damn wikipedia to test it
>>
>>96105160
>>96105198
I generally agree, but only if the difference isn't huge, and it absolutely is in case of these franken merges. But I agree that we need a better RP benchmark, possibly some logical questions about poses, clothing, situations, etc.
>>
>>96105198
Yeah Wikipedia style text is like the opposite of what you want for engaging RP.
>>
>>96105259
my opinion is that we need token POOL benchmarks because everything else doesn't really make sense
>>
File: tests.png (488 KB, 907x887)
>>96104234
The dev's here? A bit more constructive, then:

A few notes about example messages. Aside from directly affecting the length, if the example message is not given, the model might get confused about "you" and "me", leading to this issue when the model continues RP thinking it is roleplaying with itself. Also, it's more prone to continue roleplaying as the user's character. It doesn't happen when an example message is given.

About the retardness. The model might lead to illogical scenarios if you don't "handhold" it. For example, when I was roleplaying an illegal surgery scenario, the model made my character wake up inside a pristine hospital room after receiving surgery in a grungy basement. Yeah, the model knew that the next logical step after an operation is to wake up in a hospital bed, but it didn't think through how this "hospital" is meant to look according to the given context. It's pretty much the same with every near-13B model.

In any case, thank you for this model! I found it in Ayumi's ranking, it's currently rated as №1 in the 20B - 33B model rankings.

A few screenshots as well. Regular SFW, regular NSFW, emotions, creativity.
>>
File: 1682268354507511.gif (881 KB, 200x232)
>>96100204
>start trying some of the gguf models, give euryale a spin
>pretty nice, mess with it for a few hours
>realize I left the rope and freq scale set manually
>haha whoops, it's gguf, it knows what it needs
>let it set the value itself
>quality plummets

Ah yes, brilliant.
>>
>>96104774
>the character constantly bring up stuff from the card autistically.
That used to be desirable with L1, L2 just got too good at it. When I did my initial tests with L2 it was disappointing because it was cannibalizing the card for replies so aggressively, and it looked like it was lacking in creativity. But it really needed a different approach, like giving the card an explicit superego-ego-id structure with few specifics.
>>
>>96105344
Thank you for the feedback!
I will take all the flaws into account, thanks a lot for the example, it will help. I need more feedback like this since the messy merging/layer-stacking method I use is really a "what if" kek, but it seems to "somewhat" work
>>
>>96104621
I too would love some exl2 love. That bloody bloke has forsaken us.
>>
>>96105014
"Higher perplexity makes the model more creative," amirite?
>>
>>96104123
I agree with your assessment. It's a fun model, still not as smart as the bigger ones, but it's actually surprisingly capable for a frankenmodel. I'll stick with my 70bs for anything serious or at all logic-heavy, but this model can plap with the best of them in the 13-33b range imo
and for undi, I tried (almost all) the rest of the 20bs and would rank them as follows:
mlewd-remm-l2-chat > inverted (maybe a bit more grounded and less mistake-prone, less soul though) > mm-remm (a bit dumber, alright but seemed like a step down all around) >>> precise (tried it for a few messages and it spit out a whole bunch of erp formatting that wasn't in my prompt, generally seemed schizo)
>>
>>96105568
this
>>
File: 1695170704699.jpg (73 KB, 374x374)
I have an RTX 2060 (~6gb vram). I am trying to use oobabooga's multimodal pipeline but I get CUDA out of memory with llama-7b-4bit and vicuna-7b-gptq. Running them without specifying --multimodal works fine.

So am I missing something here?
>>
>>96105583
forgot to mention I am trying to use minigpt4 pipeline
>>
>>96105578
q1 chads unite
>>
>>96105583
anon??? your vram???
>>
>>96104876
Let's be honest, people who fall for the chatbot ai gf weren't participating in the sexual market to begin with.
>>
>>96105570
Yeah the precise ones got wiped, they were the worst lmao
They haven't been available since this morning
Thank you for the feedback!
>>
>>96105645
On the contrary, they were OF simps, truly the lowest and most toxic denominator of the sexual marketplace.
>>
>>96105618
what about it? is it too low? Even for 7B? I read from another source that someone with 6gb vram runs it no problem. Anyways, is there a way to offload to cpu?
>>
File: 1695171289480.jpg (343 KB, 1080x1220)
>>96105618
>>96105731
>pic related
I would ask him how he did it but im permabanned from plebbit ¯\_(ツ)_/¯
>>
>>96105731
won't it naturally use more vram to use two models
>>
File: flat2D.png (3.55 MB, 2048x1600)
I'm new to Local models and want to know if my goals are even achievable.

I liked this light novel series but it's been over since 2020. I want more stories and adventures from the series. Can I feed an LLM the 17 volumes of the light novel and have it reasonably mimic the style and characters? I've made many LoRAs for stable diffusion so I presume it's kind of like that. Has anyone accomplished something similar? I'd like to hear from you.
>>
>>96105813
sure if you have the vram
or patience for cpu tuning
>>
Reading through this thread, is there actually any way to properly test the level of retardation that extending context causes? Other than playing with the settings and doing a bunch of regenerations then manually going through them and seeing which looks best.
>>
>>96105873
ppl if you like cargos
desu i think people overblow it
even 16k has meh ppl loss
>>
>>96105810
It loads both models into vram? is that how it works? well idk I just saw the plebbitor say he did it >>96105795
and thought maybe there is a way...
>>
>>96105873
Honestly, outside of knowing you fucked up if the finetune scores below the base model on the leaderboard (cough pyg cough), there's no way to fully automate testing. You have to talk to it.
>>
>>96105903
i don't actually know
maybe he's running linux and has less overhead
>>
>>96105344
Just a random anon here, but I find that using I/me/you is risky with a lot of models and I'd encourage avoiding it if possible. I always write in third person ("Anon opens the door" instead of "I open the door", etc.). It's very unlikely that the AI will get confused, even without example messages, and it generally won't try to generate your actions on the other character's turn, since you have a nice steady stream of user doing user's actions on user's turn and char doing char's actions on char's turn instead of "I" or "You" doing actions.

Also the initial message in the character card should ideally not have anything about what your character is doing or thinking; just what the other character is doing, thinking and saying to make sure the AI doesn't get the idea to write stuff you are doing when it's the character's turn. Instead it's good to make use of the scenario portion of the character card.

For example instead of first message being something like:
Anon is walking through the marketplace when a female knight approaches him. As Anon turns to look at her, the knight begins to introduce herself: "Hello, my name is..."
It should be instead:
Scenario: {{user}} is approached by {{char}} as he is strolling through the marketplace and decides to hear what she has to say.
First message: The knight at the marketplace begins introducing herself to {{user}}: "Hello, my name is..."
>>
>>96105903
>>96105926
there is a guide link in the picture posted earlier that leads here

https://www.reddit.com/r/Oobabooga/comments/164c4yw/possible_to_use_lynxllm_for_multimodal_use/jy7koko/
>>
>>96104338
What sentient taggers?
>>
File: 00047-3454642622.png (2.13 MB, 1024x1600)
>>96105858
How long is the process? Any LLM suggestions?
>>
>>96106005
that's how he replaced cheetah 7B with minigpt4 7B; how he managed to run these models alongside llama2-7b on 6gb of vram is still a mystery though (offload to cpu? some obscure parameter? idk)
>>
>>96105813
I've been messing with literature finetunes for a while. Getting the style right seems 'easy'. I've been using 70B to generate a prompt and making an instruct finetune out of story chunks. I've heard of people just finetuning on the raw text as well.
Characters are a lot harder because of the same sort of overfitting that can happen with SD. You train on a character and it 'picks up' all sorts of weird things from the input dataset (e.g. the LoRA always makes images set in the daytime/urban background, the character is always in one of a few poses).
Which is all to say that there's a fine line between 'acting like Aqua' and 'regurgitating scenes that Aqua was in'.
We have like 8k context now; depending on your specs, you should first try to see what you can make happen by just describing the characters and including dialogue examples.
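Not that anon's actual pipeline, just a rough sketch of the "story chunks into instruct pairs" idea, assuming the volumes already sit as plain .txt files in a novels/ folder; make_instruction is a hypothetical stub standing in for whatever the 70B prompt-generation step produces:

import json
from pathlib import Path

CHUNK_CHARS = 6000   # rough chunk size; tune to your context budget

def chunks(text, size=CHUNK_CHARS):
    # naive fixed-size splitting; cutting on chapter/scene breaks would be better
    for i in range(0, len(text), size):
        yield text[i:i + size]

def make_instruction(chunk):
    # hypothetical stub: in practice a bigger model writes the prompt/summary
    # that this chunk is treated as the "response" to
    return "Continue the story in the style of the original light novel."

records = []
for path in sorted(Path("novels").glob("*.txt")):
    for chunk in chunks(path.read_text(encoding="utf-8")):
        records.append({"instruction": make_instruction(chunk), "output": chunk})

Path("dataset.jsonl").write_text(
    "\n".join(json.dumps(r, ensure_ascii=False) for r in records), encoding="utf-8"
)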
>>
>>96103104
Let it write, tweak a bit, let it write some more, get a story I find pleasing.
>>
>>96106057
most of the sd ones are scarily accurate
i use convnext2
>>96106071
for 12 LNs it wouldn't be more than an hour on qlora
best to train on base llama
i could even do it for you if you clean up the data
>>
File: 1685951081935246.png (13 KB, 660x243)
I thought GPT-4 was supposed to be good?
>>
>>96105873
I did like a dozen rerolls in a deep chat at 4k max context and then 8k, before setting it to 8k and mostly forgetting about it. The difference is noticeable, sort of like going between q4 and q8, but for relaxed chats it's okay. One thing I noticed is that at extended context it seemed to rush a bit more, skipping in-between actions, but without breaking the logic.
>>
>>96106075
they don't seem to be two big models though. https://github.com/DCDmllm/Cheetah it seems to be a 7b and some tiny addon? minigpt4 seems to do something similar. although llama + llava i think is big.
>>
File: Miku_Brain.png (183 KB, 1433x1165)
what 70B are you cooming with anon?
>>
>>96106184
>70B
Huh?
>>
>>96102353
At least 1 GB.
>>
>>96106168
Yeah, I noticed the blob file is only a few KB in size. But then I don't get the reason for the CUDA out of memory when loading the model with the multimodal pipeline
>>
>>96106245
at 2k context my windows-loaded regular 7b sits at 6.38gb vram on ooba exllama1, it's prob just really tight. windows spilling vram over to system ram might save this for anon. on linux i wouldn't know how to fix this.
>>
>>96102578
>>96102649
>>96102836
Alpha 2x rank is coming from the runpod recommendations & documentation:
https://blog.runpod.io/the-effects-of-rank-epochs-and-learning-rate-on-training-textual-loras/
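For reference, the "alpha = 2x rank" convention just means passing lora_alpha = 2 * r. A minimal sketch assuming the HF peft library; the rank and target modules are example values, not a recommendation:

from peft import LoraConfig

rank = 64
lora_config = LoraConfig(
    r=rank,
    lora_alpha=rank * 2,        # the 2x-rank convention from the linked post
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)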
>>
>>96105519
And I take it back. I can't convert it to exl2 cause I can't convert the original model to safetensors.
>>
File: file.png (89 KB, 286x146)
>>96105344
>I found it in Ayumi's ranking, it's currently rated as №1
... what is the word I am looking for? Cause it isn't pyrrhic.
>>
PSA: CPU users might need to decrease swappiness on Linux. I have 64GB of RAM and was using a 40GB model with nothing else going on in the system, and I was still getting swapping (non-zero "wait" fraction and kswapd0 CPU usage in top). The default value is 60, and everyone I found discussing it recommended 10, which seems to have worked for me. (On Ubuntu, and I guess any distro using systemd, you need a line `vm.swappiness=10` in /etc/sysctl.conf)

This is the first time I've ever been burned by swap misbehaving on Linux. I swear I was doing some >20GB stuff in R back when I had only 32GB and never hit this, so I'm not sure why the system suddenly thinks 40 of 64GB used means it's time to swap...
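If you want it to kick in without a reboot, the same knob can also be set on the running kernel (it resets at boot unless it's also in sysctl.conf):
sudo sysctl -w vm.swappiness=10
cat /proc/sys/vm/swappiness   # should now print 10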
>>
>>96106306
well
that features screenshots from booba already advertising it in webui so i think not
>>
File: 1695173586268.jpg (15 KB, 220x220)
>>96106293
> it's prob just really tight. windows vram going to ram might save this for anon. on linux i wouldn't know how to fix this.
yeah this is probs the reason, ty anon I'll continue looking for a solution
>>
File: 1687708803371968.png (205 KB, 854x512)
>download the new version of exllamav2
>the new convert.py now throws a cuda error
>the old one still works just fine
>issue persists after making a new environment and installing the new requirements.txt + setup.py
I'm losing my fucking mind. Python devs need to be collectively shot.
>>
How do I prompt for storytelling?
>>
>>96106453
i had one with flash-attn instead
had to completely uninstall it and reinstall since it wouldn't update otherwise
>>96106568
with the lora ;)
>>
>>96106583
I have the lora already...
>>
Is python only going to get worse with time?
>>
>>96106657
Python is alright, you just have a skill issue.
>>
>>96106657
Python is basically going to be PHP in a few years.
>>
>>96106077
That sounds like good advice. You just saved me the hassle of learning for the second time that less is more in these situations.

>>96106112
That's really nice of you to offer.
>>
>>96106362
Why even use swap with that much RAM?
>>
Has that batching/concurrent PR been merged into koboldcpp?
>>
>>96104685
>It means that you will have zero brain damage due to interpolation at the start of your chats even if you plan to run it out to 8k or 16k context. This is important because dumb outputs in the beginning of the context will "teach" the AI to give dumb outputs later on, and the brain damage compounds on itself.
Okay, sure, but eventually the result is the same. This feature is all well and good, but I don't know if I want to switch to RoPE scaling because of it; the problem is that ultimately you still end up with your text being compressed at some point, which is what causes the brainrot. I think current models aren't very capable with large context sizes anyway. Maybe better finetunes could fix it, but I am really only hoping for llama3 at this point (or RWKV)
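To put numbers on the "compression": with linear RoPE interpolation the rotary angles are computed from position/scale instead of position, so every relative distance shrinks by the scale factor. A rough numpy sketch using the standard RoPE frequency formula; dim, base and scale are just example values:

import numpy as np

dim, base, scale = 128, 10000.0, 2.0                # head dim, rope base, linear scale (example values)
inv_freq = base ** (-np.arange(0, dim, 2) / dim)    # per-pair rotation frequencies

def rope_angles(position, scale=1.0):
    # linear interpolation: the model is fed position/scale instead of position
    return (position / scale) * inv_freq

# with scale=2, position 4000 gets exactly the angles the base model saw at 2000,
# i.e. every relative distance is squeezed in half - which is the "brainrot" tradeoff
print(np.allclose(rope_angles(4000, scale), rope_angles(2000)))   # True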
>>96104654
Well, I guess that does work with RoPE; I was thinking in terms of just the transformers model itself. In the original paper they did use a function instead of fixed positional embeddings, which would also work for shifting, I guess. But with llama by itself I think it's impossible.
>>96104685
>You missed the Medusa paper
I did read it before (I found it today) but that didn't sound like what you were talking about, since Medusa poses itself as an alternative to speculative decoding. (I get that it is still doing a form of speculative decoding though). I have no idea how far along medusa is for llama.cpp. I only know of speculative decoding being worked on.
Medusa seems promising but I think it requires specifically finetuning the medusa heads much like baking a lora. We'll need to get people to do that for some good models before we can use it, I think.
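For anyone lost in this exchange: classic speculative decoding uses a separate small draft model to propose tokens, while Medusa bolts extra heads onto the big model so it proposes its own drafts (which is why the heads need finetuning). The verify/accept loop is the same idea either way. A toy greedy version with stand-in "models", only to show the control flow; real engines batch the verification pass and handle sampling probabilistically:

def draft_next(ctx):             # cheap draft model (stand-in)
    return (ctx[-1] * 3 + 1) % 50

def target_next(ctx):            # big target model (stand-in); disagrees with the draft sometimes
    n = (ctx[-1] * 3 + 1) % 50
    return n if n % 7 else n + 1

def speculative_step(ctx, k=4):
    # 1) draft k tokens autoregressively with the cheap model
    draft, tmp = [], list(ctx)
    for _ in range(k):
        tmp.append(draft_next(tmp))
        draft.append(tmp[-1])
    # 2) verify: the target checks each drafted position (one batched forward pass
    #    in a real engine); keep agreements, replace the first mismatch, then stop
    out = list(ctx)
    for t in draft:
        want = target_next(out)
        out.append(want)
        if want != t:
            break
    return out

ctx = [1]
for _ in range(4):
    ctx = speculative_step(ctx)
print(ctx)   # same output as plain greedy decoding with target_next; real engines just batch the checks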
>Good thing there's also a PR to reduce the size of the KV cache by half,
I didn't know that
>and another for more efficient quants (squeezellm) that provide a ~8.5% reduction in filesize compared to Q4_K_S for the same ppl.
Not sure if they'll merge it. Squeezellm seems like a bit of a pain in the ass but if they merge it good I guess
>>
>>96105413
>I don't use scaling because no matter what the numbers or anyone says it always felt like it got more retarded even with just scaling to double context size.
>4096 is plenty of context for RP really. One tip is to not write a fuckton in the character card to eat tokens, a fairly simple card of like 200-300 tokens works wonders with giving the character a personality and setting while leaving you the freedom to steer the conversation where you want it instead of having the character constantly bring up stuff from the card autistically.

L2's problem is repetition honestly. I ran 70B on day 1 when the GPTQ version came out, and I immediately noticed it.
This was absolutely not as big of a problem with llama1 which is great since it means future models could get better.
>>
>>96100237
so since it's 20b, is it meant to be much smarter and not randomly assign cocks to female characters? downloading it now to give it a test but i wanna know what to expect as someone with only 32 gigs of ram, no idea if that'll even fit full 8k context.
>>
>>96106972
>t. doomer
>>
>>96107132
It's over. I'm gonna apply to work at OpenAI as a janitor.
>>
>>96106972
llama2 please summarize this encyclopedia of a post
>>
File: pepe.jpg (154 KB, 820x836)
>>96107156
>context size exceeded
>>
>>96107163
>669 token allowance left
>>
RWKV/RetNet infinite context size when?
I'm following the 7B training. Gonna start checking the model out once it's over 50%.
>>
i made another adventure model https://huggingface.co/PocketDoc/Dans-RetroRodeo-13b
>>
>>96107163
not sure if you're making fun of me or l2 but good poast either way
>>
>Ordered 256GB of RAM
>48GB of that is DOA
>Seller: "oops sorry we're out of stock no replacements"
great
>>
>>96107285
skill issue
rma the manufacturer
>>
>>96107315
I'm just shitposting. Seller refunded it instead.
>>
>>96107004
I don't think repetition is a big problem with 70b, in fact my biggest issue with it at the start was that it would fuck up like crazy because I used my old 65b settings. I only got it to work after cutting the range in half and dropping the penalty itself by 0.05-0.1
maybe my usual 0.9 typical just solves all my problems, but I've never had any issues with repetition on any of the llama 2 models (except for some testing on really small quants), whereas it was a plague with llama 1s across the board
>>
why is everything a skill issue now
stop absorbing internet memes
>>
>>96107340
>he doesn't have a memetic filter
skill issue.
>>
Is simple proxy still necessary?
>>
File: CRASH LOOKING.png (27 KB, 629x581)
>>96107416
they added that shit to tavern ages ago, how do some people still not know/choose to still use it?
>>
>>96107285
>256gb ram
hope this is just a shitpost, otherwise that's a fucking waste because you can't use nearly half of it, let alone all of it
>>
>>96107434
I'm still using the tavern version from last may or so
>>
>>96107434
There was a lot of reflexive bitching about how it wasn't good enough when they added it, so it's a fair question.
>>
File: CRASH EMPTY BANDICOOT.png (198 KB, 801x823)
>>96107453
which is retarded because it's literally the same thing. To that extent I understand why the tavern devs don't listen to us, some people are braindead.

>which goes into SKILL ISSUE territory >>96107340
>>
File: 1485495000953.jpg (78 KB, 724x738)
>>96107443
but why
>>
>>96107443
>he didn't pull
>>
>>96107416
yes
>>96107434
the version in st is garbage
>>
>>96107519
it's rather amazing how unaware the ST devs are sometimes
>>
>>96107434
How do I use llama.cpp server with ST?
>>
>>96107519
what's garbage about it? I only started using ST very recently so I never tried simple proxy, but I've been able to create prompt formats to do pretty much whatever I want for every model I use in ST without running into any issues
>>
>>96103759
Story string formatting still confuses the hell out of me. Does it still use the story string even if you activate instruct mode? I have pretty great replies now but I'm trying to get the model to pay more attention to character card descriptions, especially personality types and specific background details. I will try copy-pasting your story string when I get home from work.
>>
>>96107519
What are you talking about? ST can now do proxy or any other formatting.
>>
>>96107340
They're TikTok children, they don't know any better
>>
Running windows 11, with a 6600 xt. 8 gb vram, q6gb ram. Using kobold, loading all layers to gpu I'm getting 6.8 t/s on a 7b with 4k context. That seems really slow.
Is something wrong with my drivers or something?
Or is that just what I should be expecting from an 8 gig AMD card and windows?
>>
XwinLM thoughts? It's #1 on the AlpacaEval right now.
https://tatsu-lab.github.io/alpaca_eval/
>>
>>96106453
skill issue
>>
>>96100645
You mean in ggerganov/llama.cpp with Medusa and speculative sampling, right?
>>
>>96108296
looks interesting, but the other evals are roughly in line with other llama 2 models, so I'm not necessarily expecting magic here. I'll definitely download it and try it out tomorrow though.
they claim to have used rlhf; I wonder if the "human" in their case was gpt-4. seems like a possible explanation for the extremely high performance on the gpt-4-judged alpacaeval
>>
>be naked
>girls keep sliding things under my waistband
>>
>>96105813
On a more superficial level, I'd be interested in seeing what SillyTavern would produce via lorebooks and ChromaDB. I suspect the results would be surprisingly good with a 70B model.
>>
>>96102860
Be honest, you skip the mathematical proofs anyway.
>>
>>96108533
>girl
>has prostate in every orifice and penis
>>
>>96108623
more like
>girl
>she wants to compare penis sizes with you
>>
>>96108533
even the big models regularly shit the bed regarding clothing
Just ask the horsefuckers, their characters aren't even human and gpt4 insists on giving them full outfits
>>
>>96108652
And I thought only smaller models were affected. Tbh, mistakes involving clothing are among the most jarring for me.
>>
>>96108512
>looks interesting, other evals are around in line with other llama 2 models though
What models do the other evals use though? Because you can only judge up to the level of the model you're using. Kinda like how dumb people have a hard time recognizing people smarter than them.
>>
Finally did a proper, though a bit short, RP with mlewd-remm-l2-chat 20b and holy shit it's amazing! It seems to be able to infer character personality and manner of speech better than any other model, even if there's no sample dialogue at all. I can't wait to see how well it scores on the leaderboard and hope it's not a complete dummy.
>>
>>96108296
Looks like a meme
>>
>>96108951
Leaderboards are shit, often done by people who have no idea what they are doing. They mainly measure how badly the person fucked up the prompts, generation settings and even the scoring

I didn't notice any correlation between good storywriting and leaderboard ranking.
>>
File: localsisters.png (801 KB, 1075x1047)
>>
>>96108287
>4K ctx on peasant-tier vidya
>~7 t/s

Bro. It's fast. Stop being such a bourgeois wannabe.
>>
thread
>>96109187
>>96109187
>>96109187


