/g/ - Technology


Thread archived.
File: nightracing-34928762.jpg (258 KB, 1776x1224)
Previous /sdg/ thread: >>100110132

>Beginner UI local install
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io

>Local install
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI (Node-based): https://rentry.org/comfyui
AMD GPU: https://rentry.org/sdg-link#amd-gpu
Intel GPU: https://rentry.org/sdg-link#intel-gpu

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Auto1111 forks
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
Anapnoe UX: https://github.com/anapnoe/stable-diffusion-webui-ux
Vladmandic: https://github.com/vladmandic/automatic

>Run cloud hosted instance
https://rentry.org/sdg-link#run-cloud-hosted-instance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
Inpainting: https://huggingface.co/spaces/fffiloni/stable-diffusion-inpainting
pixart: https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma

>Models, LoRAs & embeddings
https://civitai.com
https://huggingface.co
https://rentry.org/embeddings

>Animation
https://rentry.org/AnimAnon
https://rentry.org/AnimAnon-AnimDiff
https://rentry.org/AnimAnon-Deforum

>SDXL info & download
https://rentry.org/sdg-link#sdxl

>Index of guides and other tools
https://codeberg.org/tekakutli/neuralnomicon
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>View and submit GPU performance data
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html

>Share image prompt info
4chan removes prompt info from images, share them with the following guide/site...
https://rentry.org/hdgcb
https://catbox.moe

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/trash/sdg

Official: discord.gg/stablediffusion
>>
File: SDG_News_00167_.png (1.71 MB, 1560x896)
>mfw Resource news

04/21/2024

>FlashFace Inference Code Released
https://github.com/ali-vilab/FlashFace

>ComfyUI MagickWand: Proper implementation of ImageMagick
https://github.com/Fannovel16/ComfyUI-MagickWand

>Moving Object Segmentation: All You Need Is SAM (and Flow)
https://www.robots.ox.ac.uk/~vgg/research/flowsam/

>Image Effect Scheduler Node Set for ComfyUI
https://github.com/hannahunter88/anodes/

>ComfyUI-Tripo: Generate 3D models using the Tripo API
https://github.com/VAST-AI-Research/ComfyUI-Tripo

04/20/2024

>Basic Stable Diffusion API GUI
https://github.com/ThioJoe/BasicStabilityAPI-GUI/

>IPAdapter Advanced Weighting support added to sd-webui-controlnet
https://github.com/Mikubill/sd-webui-controlnet/discussions/2770

04/19/2024

>Customizing Text-to-Image Diffusion with Camera Viewpoint Control
https://customdiffusion360.github.io/

>StyleBooth: Image Style Editing with Multimodal Instruction
https://ali-vilab.github.io/stylebooth-page/

>Sketch-guided Image Inpainting with Partial Discrete Diffusion Process
https://github.com/vl2g/Sketch-Inpainting

>ComfyUI ImageMagick: Image processing powered by ImageMagick
https://github.com/jtydhr88/ComfyUI-ImageMagick

04/18/2024

>Meta releases meta.ai, a multimodal AI assistant including image generation
https://www.meta.ai/

>Stability AI lays off roughly 10 percent of its workforce
https://www.theverge.com/2024/4/18/24133996/stability-ai-lay-off-emad-mostaque

>Stability API nodes for ComfyUI
https://github.com/Stability-AI/ComfyUI-SAI_API

>Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
https://animate-your-word.github.io/demo/

>InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
https://johanan528.github.io/Infusion/

>Factorized Diffusion: Perceptual Illusions by Noise Decomposition
https://dangeng.github.io/factorized_diffusion/

>KGen - A System for Prompt Generation to Improve Text-to-Image Performance
https://github.com/KohakuBlueleaf/KGen
>>
>mfw Research news

04/21/2024

>Prompt-Driven Feature Diffusion for Open-World Semi-Supervised Learning
https://arxiv.org/abs/2404.11795

>MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
https://www.iri.upc.edu/people/nugrinovic/multiphys/

>ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
https://arxiv.org/abs/2404.12216

>BLINK: Multimodal Large Language Models Can See but Not Perceive
https://arxiv.org/abs/2404.12390

>Generating Human Interaction Motions in Scenes with Text Control
https://arxiv.org/abs/2404.10685

>Dual Modalities of Text: Visual and Textual Generative Pre-training
https://arxiv.org/abs/2404.10710

>DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling
https://arxiv.org/abs/2404.09227

>Conditional Prototype Rectification Prompt Learning
https://arxiv.org/abs/2404.09872

04/20/2024

>Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/

>Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
https://arxiv.org/abs/2404.11732

>Partial Large Kernel CNNs for Efficient Super-Resolution
https://arxiv.org/abs/2404.11848

>From Image to Video, what do we need in multimodal LLMs?
https://arxiv.org/abs/2404.11865

>GhostNetV3: Exploring the Training Strategies for Compact Models
https://arxiv.org/abs/2404.11202

>ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis
https://arxiv.org/abs/2404.10141

>StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization
https://arxiv.org/abs/2404.10681

>Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
https://arxiv.org/abs/2404.09326

>Exploring Text-to-Motion Generation with Human Preference
https://arxiv.org/abs/2404.09445

>Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression
https://arxiv.org/abs/2404.09601
>>
Anyone tried fp8 training using Transformer Engine? Anyway gonna hope I can make this Docker container work and see what comes out.
>>
can anon post a gen using PAG that doesn't look fried?
>>
I am retarded, how do I actually run Comfy UI on Ubuntu?

I git cloned the repo, but I don't see a start/run/webui.sh

And I checked the readme before asking I swear.
>>
File: 1713721974635_01.png (3.61 MB, 1552x1552)
>>100114450
>>
File: 00035-2227307230.png (1.88 MB, 1072x1376)
>>100115948
I'm not taking the bait but here's your (You)
>>
File: dena_00062_.png (2.55 MB, 1728x1344)
>>
>>100115982
I'm not baiting

There are no instructions on starting the software. server.py or execution.py just return an error and don't start.
>>
File: 00037-834085774.png (1.98 MB, 1072x1376)
>>100116044
lol (You)
>>
Planning on training an SD3 model, what would you want to see most in a new model?
>>
File: roboquok.png (2.21 MB, 1024x1024)
>>100115877
good day
>>
>>100116082
There's a severe lack of general purpose/versatile models.
>>
File: 00002-1109164631.png (1.97 MB, 1024x1536)
>>100116082
ANIME
That's all we care about.
Of course, please train the model on furniture, different kinds of clothes, facial expressions, etc.
It takes a lot to make a model usable and not just some one-trick pony. Best of luck, friend.
>>
File: dena_00055_.png (2.58 MB, 1728x1344)
>>100116082
seems like you should just go straight for an nsfw model since the most common complaint is gonna be "it can't do nsfw"
>>
>>100116082
ideally it'd understand how to do anything going on in manga/hentai when instructed by a reasonably powerful LLM or (you) including the difficult ones like ha ku ronofu jin nsfw where things cause things.
>>
>>100116044
main.py retard
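for anyone else stuck on this, the usual steps are something like this (a sketch assuming a stock clone and a torch install that already works):

cd ComfyUI
pip install -r requirements.txt
python main.py
# UI is served at http://127.0.0.1:8188 by default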
>>
File: 0.jpg (404 KB, 1024x1024)
>>
>>100116082
a token limit greater than 75
>>
>>100116082
the obvious answer is just anime porn. i wouldn't really know until i actually get to try it with a proper workflow (copium) and see what it's bad at. every finetune out there just completely butchers it into either a sameface anime porn generator or a sameface 1girl portrait generator
>>
File: 1.jpg (112 KB, 1152x864)
>>
have they even released a set of captions or their captioner prompt? wasn't this supposed to be trained with natural language captions? it would be nice to know the length/terminology used so that finetune captions don't conflict
>>
We haven't fully explored XL yet; why are we thinking about SD3?
>>
File: g_0002.jpg (419 KB, 1983x1983)
>>
>>100116263
Go slow, get lapped.
>>
>>100116207 >>100116130
Honestly, a *booru dump like danbooru 202x is probably going to waste the least of your time tagging data anyhow.
>>
File: sci-fi31.jpg (163 KB, 1024x1536)
>>
>>100116281
I'm never getting an exquisite details tier XL model am I
>>
>>100116263
I have not fully explored 1.5 yet.
>>
File: 00017-1423708437.png (2.2 MB, 1536x1024)
>>
File: g_0003.jpg (312 KB, 1983x1983)
>>
File: 00009-136854725.png (1.84 MB, 1536x1024)
>>100116300
>implying booru tagging is good
>implying booru tagging is competent
>implying booru tagging is consistent enough to make a good dataset
Shortcuts just cause more problems for us all.
fuck off
>>
File: 00027-2113660913.jpg (411 KB, 1792x2304)
>>
File: 00001-2536060942.png (2.07 MB, 1536x1024)
Preparing a dataset needs to be a team effort.
>>
File: file.png (2.06 MB, 1024x1024)
>>100116444
Yes, all of this is just fine overall.
>>
>>100116343
Probably not if SD3 can do more exquisite and more XL.

Of course, that will need to happen fast or it'll get bumped to SD4.
>>
>>100116493
>or it'll get bumped to SD4
If Stability AI lives to make that at all. Aren't they deep in the red? They are probably screwed, but only time will tell.
>>
julien is shit
>>
>>100116082
good dataset, train on copyrighted artists and characters, and keep tagging similar to the base model. if you do have to dig through booru tags, be aware that there will be a conflict between the natural language of the model and the tags you might end up using. booru style tags were a good solution to a dumb model problem. as the model gets smarter, tags like that are going to cause more harm than good.
>>
caring about regular posters' drama is very very low iq
>>
File: tmp19bid3n3.png (609 KB, 768x1024)
>>
File: ..png (525 KB, 672x384)
>>
>>100116532
> as the model gets smarter, tags like that are going to cause more harm than good
I don't think that's a fact that has been objectively demonstrated anywhere.

You can create a system where this is the case, but if it's a competent system, why wouldn't it be able to learn via tags as well as via natural language? If anything, the natural language people use is less exact.
>>
>>100116516
Buy them for a dollar when they collapse, release the assets, the Internet finishes the job.

>>100116532
Natural language for the win.

Are we not able to get SD to understand that
>these, are, tags, just, put, them, somewhere
and
>when the line of text passes a basic grammar check it's natural language time
...?

Hell, give us two sets of prompts. Natural prompt, tags prompt, and negatives for those. Boomers and Boorus will be happy, and AI kings will master working both together.
>>
>>100116614
>the Internet finishes the job.
With what money, programmers, and hardware?
>>
>>100116587
Even given how English works and how English speakers use it, you need to be able to point many alternative tokens at the same concept where it overlaps.

There should be no issue whatsoever if a tag is used too, if anything the tag should often have the most precise idea of a concept.

>>100116614
Obviously both, but let's also note that among search systems tags have been far more successful and useful so far than natural language boomer descriptions. The issue might as well be on the human side, with people generally having more clear agreement on what tags mean than what every word in English actually visually or structurally or otherwise means exactly.
>>
>>100116587
>If anything natural language people use is less exact.
You mean "natural language" of undereducated promptlets?
Idiot-proofing products is the stupidest, most pointless thing to try and do, countless companies can tell you that.
>>
File: 0.jpg (202 KB, 1024x1024)
>>
>hyperfine intricate details
>>
So what should happen if a prompt has contradictory tokens? Like, say, black and blue hair, or holding a rifle and also crossing arms?
>>
File: ComfyUI_temp_mksgg_00061_.png (3.15 MB, 1360x1744)
>>
>>100116680
>with people generally having more clear agreement on what tags mean than what every word in English actually visually or structurally or otherwise means exactly.

People who want words to have specific meanings use the tag prompt; people wanting to control composition and style would go into natural language. It's really hard for a tag based prompt to respect a described composition. You'll get the things you asked for, but positional relationships are a lost cause. But if the AI could be trained in a context where "X on top of Y" is learned, so that "book on top of table" and "flowers on top of grave" and "top hat on top of dancing frog" all mean that what precedes "on top of" is higher on the canvas than what follows, we ought to then be able to use tag prompting to specify exact content and natural language to put those things into the drawn space, rather than rolling seeds till one accidentally gets the arrangement right.
>>
File: elf_0017f.jpg (1.1 MB, 1664x2432)
I like both. When I am making a posed girl, booru tags are great. With pony it can also do simple multicharacter stuff when it hews closely to the kind of images boorus feature. When I am working on a more complicated image with multiple subjects and a lot of fine background details, it starts to get harder and harder to represent this with just tags. Ideally you'd use both: natural language for the base gen to set up the composition, then a fine tuned model for people using tags to inpaint their poses and personal details precisely.
>>
File: ..png (1.32 MB, 1216x832)
>>
File: ComfyUI_04132_.jpg (656 KB, 2312x1920)
>>100116845
dual colored hair unless there are two subjects. for the rifle, you can hold a rifle with crossed arms at rest if it's slung correctly
>>
File: g_0004.jpg (438 KB, 1983x1983)
>>
Now that I have released my jizz to fat Frieren pussy caught in a mimic trap all is good in the world again.
>>
>>100116754
>Idiot-proofing products is the stupidest, most pointless thing to try and do, countless companies can tell you that.
Only fools use fool-proof products.
>>
it is with great pleasure that i announce cute horse girls with cute horse tails are welcome in this thread
>>
>>100117057
So how about 2girls and solo?
>>
>>100117148
youtube or follow the github instructions in the OP links. get things running first and then go peruse some models on civit

>>100117150
that's a good one. dunno
>>
>>100116279

FOUR TWENTY BLAZE IT.
>>
>>100117140
fuck off to your furry boards gooner
>>
>>100117185
edibles were too lit yesterday, so today is the stoner posting
>>
File: ComfyUI_temp_mksgg_00096_.png (3.85 MB, 1264x1848)
>>
>>100117311
West coast gooners have just woken up from the drug/goon overload last night. Everything goes to shit 11am west coast time.
>>
>absolutely outstanding image
>>
>>100116200
Sigma 300 cap
>>
>>100117312
cool helmet
>>
world is gay, can't wait for the sweet release of death
>>
File: 3569451300.png (2.02 MB, 944x1184)
>>100115936
? Are you having problems with it? When using PAG it's recommended to lower the CFG a bit
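fwiw a starting point that worked for me (numbers are just my own guess, tune per model): PAG scale around 3 with CFG dropped to about 4-5 instead of the usual 7, sampler and steps unchanged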
>>
>>100117422
>claims to hate living
>doesn't even kill himself
poser
>>
>>100116857
I'd personally prefer to use positional / relational / logical information with the tags even then rather than natural language, but it doesn't actually work.
>>
>>
File: 1.jpg (98 KB, 1080x1440)
>>
File: 1689587347683676.png (1.48 MB, 1008x1008)
>UnboundLocalError: cannot access local variable 'h' where it is not associated with a value

What is this message telling me? It happens in img2img when I try to upscale beyond this resolution. Seems like if I want a higher resolution, I need to go with another program, then bring that image back in and run it through img2img again to get some detail.
>>
>>100117484
>but it doesn't actually work
Which is kind of a problem.

We need some ChatGPT action that can let us do something like run a gen, then repeat it after adding fixes to the boomer prompt (like "four visible fingers and one thumb on right hand of leftmost woman") and have it actually understand that, where it drew a hand over there, six fingers and two thumbs was a bit too ambitious.
>>
File: 1.jpg (155 KB, 1080x1440)
>>
File: ComfyUI_temp_mksgg_00110_.png (3.7 MB, 1552x1552)
>>
>>100117758
nicely surreal and creepy
>>
File: upscaledturbo_00076_.png (1.18 MB, 1024x1024)
>>
>>100117758
>>100117872
Another progressive rock album cover for songs we'll never get to listen to.
>>
File: 0-AFH074262024.jpg (96 KB, 1288x1288)
>>
File: ComfyUI_temp_mksgg_00119_.png (3.89 MB, 2040x1160)
this is really just retreading old ground applying new gen settings
>>
Anyone feel like proompting my schizo (legit) vision I had
a dragon, of the DB type but so fucking massive in the sky that I perceived it as a god
>>
File: 1.jpg (114 KB, 1152x864)
>>
>>100117758
p cool
>>
>>100117920
my Trypophobia
>>
>>
What the fuck causes regional prompter to gen slightly off pictures sometimes?
>>
File: upscaledturbo_00077_.png (1.24 MB, 1152x896)
>>
File: succ_0007f.jpg (1.1 MB, 1664x2432)
>>100118145
What do you mean by slightly off?
>>
>>100118178
Like occasionally it ignores some of the prompts and creates a "generic" looking picture.
>>
File: ComfyUI_temp_mksgg_00134_.png (3.29 MB, 1160x2040)
>>
Can I not use multiple style loras with regional prompter on forge without it looking completely fucked?
>>
>>100118216
Might be a seed that just doesn't look like what you want it to find. Is it deterministic?
>>
>>100118309
It's not a seed; it seems to be some random words in some particular order that fuck with it.
>>
File: ComfyUI_temp_mksgg_00137_.png (3.84 MB, 1160x2040)
>>
File: 1713730702127.jpg (73 KB, 768x1152)
>>
File: file.png (1000 KB, 960x1088)
>>100117573
I wonder if something like that will show up eventually. Would be nice.
>>
File: ComfyUI_temp_mksgg_00138_.png (3.8 MB, 1160x2040)
>>
File: ComfyUI_00162_.png (1.78 MB, 1216x832)
>>
File: ComfyUI_temp_mksgg_00142_.png (3.45 MB, 1160x2040)
>>
File: ComfyUI_temp_mksgg_00140_.png (3.86 MB, 1160x2040)
>>
File: ..png (466 KB, 672x384)
>>
the gaunaburger, only at toha heavy agriculture
>>
File: upscaledturbo_00078_.png (1.06 MB, 896x1152)
>>
File: ComfyUI_temp_mksgg_00151_.png (3.28 MB, 1160x2040)
>>
File: seraphic_chicken.png (1.25 MB, 992x1456)
is there any model that actually creates proper pixel art without errors?
>>
>>
File: dena_00051_.png (2.63 MB, 1728x1344)
>>100117573
instruction-based image editing is a thing but I dunno what resources are actually good
https://github.com/ali-vilab/Ranni
https://github.com/modelscope/scepter
>>
File: 00434-2827966013.jpg (46 KB, 672x1440)
>>
File: ComfyUI_00282_.png (1.21 MB, 1216x768)
>>
File: ComfyUI_04227_.png (2.55 MB, 1920x2312)
>>100118299
is there no way to concat the loras?
>>
File: pixelart.jpg (153 KB, 1024x1024)
>>100118644
slick

>>100118689
what even is going on here?

>>100118807
depends on what you consider an error. if it's a perfect pixel grid that's a matter for postprocessing
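e.g. a quick-and-dirty postprocess to snap a gen to a clean grid (PIL; the cell size is an assumption you'd eyeball per image):

from PIL import Image

img = Image.open("gen.png")
cell = 8  # assumed size of one "pixel" in the gen
# nearest-neighbour downscale collapses each cell to a single colour,
# nearest-neighbour upscale blows it back up into perfect squares
small = img.resize((img.width // cell, img.height // cell), Image.NEAREST)
small.resize((img.width, img.height), Image.NEAREST).save("gen_grid.png")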
>>
File: ..png (1.31 MB, 1216x832)
>>
>>100117573
i think that'll either be when parameters are high enough to sufficiently capture the nuance and specificity of english, and/or when/if image editors build up enough controlnets and quick/dirty interactive edits to corral the models
>>
File: 00666-TFT_2601701.png (3.89 MB, 2560x1536)
>>
File: pixelart2.jpg (181 KB, 1024x1024)
>>100118934
https://github.com/hako-mikan/sd-webui-regional-prompter#latent
>Slower, but allows separating LoRAs to some extent.
But also not completely. So perhaps you can't really with this.
>>
File: ComfyUI_temp_mksgg_00164_.png (3.02 MB, 2040x1160)
>>
>>100118914
why would you do this
>>
File: upscaledturbo_00079_.png (1.14 MB, 1024x1024)
>>
File: ComfyUI_04239_.png (2.79 MB, 1920x2312)
>>100118988
there was an attention couple one that came out not too long ago. maybe try that instead?
https://github.com/Haoming02/sd-forge-couple
>>
>>100119082
jesus christ what is that thing
>>
File: ComfyUI_04219_.png (667 KB, 896x1200)
>>100119098
>jesus christ what is that thing
picrel is a hint
>>
File: dena_00050_.png (2.34 MB, 1728x1344)
>>100119018
I find his gens rather fascinating
>>
File: grid-0006.jpg (1.38 MB, 3600x3200)
>>
>>100119116
Actually yes
>>
File: 0-AFH079262024.jpg (189 KB, 1288x1288)
What's for dinner lads
>>
>>100118939
I just want to mix them but I see a sharp degradation when using forge. One Lora is fine but two style loras don't play well
>>
>>100119163
Lotsa Spaghetti!
>>
File: upscaledturbo_00080_.png (1.34 MB, 1024x1024)
>>
File: 0-AFH103252024.jpg (278 KB, 1288x1288)
>>100119276
What will you put on it?
>>
File: file.png (1.62 MB, 1025x1026)
>>
File: ComfyUI_04265_.png (3.17 MB, 1920x2312)
>>
File: ..png (474 KB, 672x384)
>>
File: file.png (284 KB, 1920x1080)
>>100119366
>>
File: ComfyUI_04268_.png (3.65 MB, 1920x2312)
>>100119403
kek
>>
File: catbox_5po1hy.png (3.32 MB, 1360x1744)
>>
File: ComfyUI_04278_.png (3.77 MB, 1920x2312)
>>
File: ComfyUI_04270_.png (920 KB, 960x1152)
>>
File: cover-4781384115560542.png (312 KB, 512x512)
>>100106072
Thank you for the helpful guidance! I'll try to apply what you've elaborated upon.

>>100107578
>>100107667
>>100107688
>>100107726
>>100107802
Ngl, this is my favorite thus far in these threads. Mysterious anon with godlike gens, please show yourself (and give me your Patreon kek)
>>
File: 0.jpg (190 KB, 1024x1024)
>>
>>100119474
brilliant glitch dada collage
>>
>>100119565
he's on a 3 day vacation
>>
File: ComfyUI_04303_.png (3.52 MB, 1920x2312)
>>
File: ComfyUI_00428_.png (1.42 MB, 1216x768)
>>
>>100119690
momentous work, jules
>>
File: ..png (547 KB, 672x384)
>>100119699
my humble abode
>>
File: dena_00049_.png (2.57 MB, 1728x1344)
>>100119699
>9x8ft one room house
>1.4mil
but it's in a nice neighborhood!
>>
File: ComfyUI_04291_.png (946 KB, 960x1152)
>momentous work, jules
>>
>>100119699
indistinguishable from reality, gj.
>>
File: 00146-2368601654.png (1.61 MB, 896x1088)
>>
File: ..png (1.31 MB, 1216x832)
>>
File: grid-0007.jpg (1.48 MB, 3600x3200)
>>
File: upscaledturbo_00081_.png (1.27 MB, 1024x1024)
>>
File: ComfyUI_00480_.png (1.56 MB, 1216x768)
>>
File: 0.jpg (551 KB, 1024x1024)
>>
>>100119324
sliced bananas and pineapple
>>
File: dekm_00057_.png (3.38 MB, 1344x1728)
has anyone tried out kohaku epsilon? I've been finding it rather hard to work with and dunno if it's just me or if it's a weak model
>>
File: grid-0009.jpg (1.96 MB, 3600x3200)
>>
>>100115837
>>StyleBooth: Image Style Editing with Multimodal Instruction
>https://ali-vilab.github.io/stylebooth-page/
i am once again asking if anyone here has successfully tried this out, and if so, can you catbox a working python or google colab notebook
>>
>>100120089
It just seems strictly worse than animagineXL3.1 in every way.
What a shame.
>>
>>100120296
be the change you want to see
take a dive and let us know how it goes
>>
File: dekm_00055_.png (2.91 MB, 1344x1728)
>>100120296
we have enough followers. we need a leader

>>100120366
thats exactly how I feel about it :(
>>
>>100119565
>Shitter with shit opinions
Fuck off
>>
>>100120371
>>100120394
i'm trying but i'm retarded with python and dependencies n sheeit and I can't figure out why, even when i manually pip install torch==2.2.1, I get an error saying ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.2.1+cu121 requires torch==2.2.1, but you have torch 2.0.1 which is incompatible.
torchtext 0.17.1 requires torch==2.2.1, but you have torch 2.0.1 which is incompatible.
>>
File: ComfyUI_00679_.png (1.75 MB, 1216x768)
>>
>>100115740
Am I the only one getting way better outputs with 1024x1024 on Pony compared to e.g. 896x1152?
>>
>>100120454
I find pony works best at 768x1280 however 1024x1024 isn't bad
>>
File: dekm_00054_.png (3.07 MB, 1344x1728)
>>100120437
this hiking trail is only a few miles from where I live. there's benches down near the cliff where you can watch the waves crash
>>
>>100120437
are you just feeding in a real picture at low denoising to get the filename, or is there a "weekend away snapshit" lora out there
>>
File: ComfyUI_00697_.png (1.46 MB, 1216x768)
>>100120544
check out the boring reality lora. it advises to use it with the base sdxl model but I've been using it with sds_film
>>
>>100120571
NTA but cool, I might take this for a spin
I was eying the VHS ones for a bit to see if I could make some creepy gens but this also tickles my fancy
>>
>>100119659
thanks, heres a portrait just for you
>>
File: this_ones_fine_i_guess.png (1.09 MB, 832x1216)
>>100120394
It's fine for 1girl stuff I guess, but I feel the results are always worse than what Animagine would have produced.

It fucks up hands a lot more and losing good gens to that always sucks.
I'll keep trying it, but I don't have too high hopes desu...
>>
File: 00152-2729173364.png (593 KB, 768x512)
>>
>>100119660
Very specific answer desu
>>100120401
I am a shitter. We all gotta start somewhere.
>>
>>100120653
Unfortunately for you there is no hope based on your taste.
>>
File: dekm_00052_.png (2.96 MB, 1344x1728)
>>100120647
the only thing I've found I really like about it is that it does really interesting manga layouts. but then it blunders all the details so it's worthless. maybe I should do a dual-model workflow with KE for the first pass and animagine for the hires
>>
>>100120697
I want to know what it says
>>
File: 00875-TFT_26016757.jpg (1.27 MB, 2560x1536)
>backgrounds in pony
it's slightly better after I experimented with some merges but still horrible compared to 1.5
>>
>>100120755
that looks fine and fitting for the character's illustrated style? it looks better than most 1.5 garbage. the background shouldn't be as detailed as or more detailed than the character, that's one of the telltale signs of 1.5 ai slop: overly detailed nonsensical backgrounds
>>
File: dekm_00050_.png (1.92 MB, 1344x1728)
>>100120731
sadly we'll never know what the Ai was thinking
>>
File: 00196-[TFT]-878290469.png (1.75 MB, 1024x1536)
>>100120793
hm well compared to this 1.5 picture, I think the background looks better
>>
>>100120755
>horrible compared to 1.5
I fucking HATE when my backgrounds are consistent. I won't even give you any help cause you're trollin
>>
File: 00098-587887629.png (559 KB, 512x768)
>>
File: bandage.png (1.52 MB, 832x1216)
>>
>>100120806
that is rather subjective, and you are comparing two completely different types of shots
>>
So is stable diffusion 3 released or not?

What does API release mean?
>>
File: cute_but_flawed.png (1.02 MB, 832x1216)
>>100120697
What I've found so far is that it's a lot better at generating zouri and tabi (the kinds of sandals and socks miko wear).
Other models always render generic socks and individual toes and that always bothered me.
>>
>>100120859
no, and it means ignore it until they actually release
>>
>>100120859
> What does API release mean?
money, and the usual free pass of coping that it isn't the final version, just like XL on clipdrop. also don't bother with SD3, the license is a mess.
>>
>>100120855
the picture he posted that "looks better" makes 0 sense. why is there a 40 foot sand dune behind that rock formation? why does that rock formation have a perfectly straight pillar? why does that rock formation have a gun trigger? etc etc etc
>>
>>100120859
SD3 doesn't matter until 6mo+ after people can start training models for it
>>
>>100120904
>sand dunes are huge
>wtf are ruins
Here's your (You) since you're starving
>>
>>100120890
>the license is a mess.
What changed between the license for SDXL and the one for SD3?

If they release the model people are gonna fine tune it anyway, license or not
>>
File: upscaledturbo_00082_.png (1.27 MB, 1024x1024)
>>
>>100120422
you need matching versions of all the torch packages. add torchaudio==2.2.1 and torchtext==0.17.1 (the torchtext release that pairs with torch 2.2.1, per your own error output) to the install command, e.g.:
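something like this should line them all up (the cu121 wheel index is an assumption based on the +cu121 in your log):

pip install torch==2.2.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
pip install torchtext==0.17.1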
>>
File: portrait.png (927 KB, 832x1216)
...It *does* generate some pretty cute images, can't deny that...
>>
File: file.png (163 KB, 256x457)
>>100120931
>non Euclidian backgrounds are LE GOOD
>>
File: dekm_00049_.png (2.69 MB, 1344x1728)
>>100120984
I think its sole purpose is "portrait of character" and it completely falls apart if you try to do anything else
>>
File: 2391936076-3356790775.jpg (929 KB, 2688x1536)
I wonder if SD3 has the classic stable diffusion tendency that when you put 'elf' in the prompt it wants to give you a cross between green christmas elves and keebler elves.
>>
>>100120957
60 year old saggers on a 20 year old
>>
>>100121003
What? Image generations aren't perfect?
oh my godddddddddddd
>>
File: astolf.png (1.12 MB, 832x1216)
>>
File: ..png (477 KB, 672x384)
>>
File: upscaledturbo_00083_.png (1.43 MB, 896x1152)
>>
>>100121051
lol
lmao
>>
>>100121061
you're clutching your black and white tv and screaming that it's better when anyone with eyes can see that you're wrong.
>>
>>100121108
What are you even going on about?
More (You)'s for the starving third-worlder
>>
File: dekm_00048_.png (2.64 MB, 1344x1728)
>>
File: 000000_12046_.png (2.2 MB, 1434x932)
>>100121096
Nice
>>
File: 0.jpg (387 KB, 1024x1024)
>>
File: ..png (559 KB, 672x384)
>>
File: upscaledturbo_00084_.png (1.52 MB, 896x1152)
>>
File: 00068-TFT_26016762.png (3.77 MB, 2560x1536)
>>
I don't understand why the amount of pictures you have changes the amount of steps necessary to train a lora.
>>
File: ..png (1.22 MB, 1216x832)
>>
File: ComfyUI_00798_.png (1.43 MB, 1024x1024)
>>
>>100121280
inverse fletchet cosine
>>
File: variety.png (334 KB, 1537x865)
>>100121280
kek..
>>
>>100121280
the images themselves add steps dingus
(Training Images * repeats)/batch size * epochs
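worked example: 100 images * 10 repeats = 1000 steps per epoch, / batch size 2 = 500, * 10 epochs = 5000 total steps. add images and the total grows unless you cut repeats or epochs to compensate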
>>
So I've been training a few LORAs on the same dataset recently and I can't help but notice how larger network ranks are directly tied to the quality of the output.
I feel like people recommending anything less than the largest rank your GPU can handle is just vramlet cope.
>>
>>100121444
post them then faggot
>you wont
It's all larp
>>
>>100121444
Thank you for saying so, it's great to hear this kind of information. Do you have any comparison between network ranks? It'd be great to see evidence of the kind of difference it makes.
>>
>>100121469
Gimme a moment, I'm training some right now so I can't gen any comparisons. But I stand by what I said.
>>
>>100121362
trudeau blackface lora when?
>>
>>100121444
"optimal" is def higher than people recommend but it isn't as simple as higher = better.

On pony I have found 128 best, I can train at 256 but it starts to look fucked.

People saying train it on 8 are retards though
>>
>>100121487
alright cool, I'm curious about the science you're doing/have done
>>
>>100121509

True, 256 starts basically reprinting the training data very fast but in fucked up ways.
>>
>>100120935
you need to pay for commercial use
>>
>>100121505
When I grab a 3090.
>>
>>100121556
How would they ever police that?
>>
>>100121509
>>100121524
I used to do requests for LORA training and trained at 128 network and 64 alpha, but people complained about the size of the LORA file. I ended up reducing it to 64/32. If there is evidence that 128 is better though I'd definitely want to switch back.
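for reference, in kohya's trainer that's the --network_dim / --network_alpha pair; my old setting was something like this (every other arg omitted):

accelerate launch train_network.py --network_module networks.lora --network_dim 128 --network_alpha 64  # plus the usual model/dataset args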
>>
>>100121580
>nooo not the heckin 200mb file
tell them to keep themselves safe
>>
>>100121564
sue you if you're using an sd3 generated image commercially?
desu, that'd probably end in a big court loss for the generative ai side though...
>>
>>100121564
emad unironically said "honesty"
>>
>>100121595
the vast majority of celebrity loras on civit are between 800 and 900 MB. It's a real problem.
>>
>>100121651
honestly, is it?
it's like 50$ per TB of storage at most
>>
>>100121444
>>100121509
>>100121524
>>100121580
>>100121595
>>100121651
you can resize them after you train them
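e.g. with the resize script in kohya's sd-scripts, something like this (filenames and target rank are made up for illustration):

python networks/resize_lora.py --model big_lora.safetensors --save_to lora_r32.safetensors --new_rank 32 --device cuda --save_precision fp16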
>>
>>100121651
>>100121663
>>100121595
based and I don't give a shit about 5 cents of storage space pilled

>>100121671
or I can simply not
>>
>>100121663
storing the loras isn't the issue. loras have to be loaded in VRAM. can quickly run out of room with 1gb loras.

>>100121671
Interesting. googling.
>>
>>100120965
thank you. trying this now
>>
>>100116832
>>100117355
>maximum details
>extreme hyperrealistic details
>trending on artstation
kekt
>>
>>100121676
>or I can simply not
it retains a lot of the quality without taking up so much space. there isn't a reason to keep them that big

>>100121686
>storing the loras isn't the issue. loras have to be loaded in VRAM. can quickly run out of room with 1gb loras.
this too
>>
>>100121722
>score_9
I bet this will start showing up forever in future models completely unrelated to pony
>>
>>100121740
everyone universally hates it so no
>>
File: 00212-3159343615.png (2.33 MB, 1072x1376)
>>
>>100121759
more like people will forever blindly copy prompts from images and a huge % of images over this time period will have that
>>
>>100121766
this
>>
File: 00166-TFT_26016766.png (3.53 MB, 2560x1536)
>>
File: dekm_00047_.png (2.81 MB, 1344x1728)
>>
>>100121740
>>100121759
I'd want to train a non-cucked SD3 model so people don't have to deal with Pony anymore, but it would need to not suck to compete. I don't mind spending some money renting A100s for the training but the dataset needs to be well done and that seems like a challenge.
>>
How about some chrome?
>>
>>100121793
if you have a budget what you do is literally hire people (probably Indian) to tag massive amounts of data for you.
It isn't tech that is the limitation for a great model, it's datasets
>>
>>100121793
It will need to be a coordinated group effort
>>
File: dekm_00045_.png (3.02 MB, 1344x1728)
>>100121804
me in the back (I'm an orb)
>>
>>100121766
to be fair, in the early days of base 1.5, there were some decently complex negative prompts floating around that worked much better than embeddings, and the "amazing quality, masterpiece, award-winning photography" prompts did make a decent difference in quality when trying to gen photoreal people
>>
>>100121793
we need a way to collaboratively put together datasets from all our lora training without retards shitting it up. the latter is the hard part
>>
File: 0.jpg (246 KB, 1024x1024)
>>
File: dena_00047_.png (2.04 MB, 1728x1344)
>>
>>100121846
The only way to really vet people and pay for the training of such a model very quickly begins to resemble something like a real company, except its employees don't get paid.
>>
>>100121887
>>100121846
which is why you just don't bother and pay up to the pajeets
>>
File: dena_00046_.png (2.35 MB, 1728x1344)
>>
>>100121894
>pay up to the pajeets
that's how you get LAION
>>
What I don't understand about pony model tags is:

> score_8_up
Does this mean score 8 and up? If that's the case it shouldn't even need a score_9? The advice I got when I first started using it, can't remember from where, said to use something like:

> score_9,score_8_up,score_7_up,score_6_up
But this seems redundant if "up" means what I think, so I assume I'm wrong. Also do you have to put a BREAK after the score stuff? At first I was doing that, but then I stopped partly because it was tedious to manage that in comfyUI and it didn't seem to make a whole lot of difference.
>>
>>100121906
>>100121887
>>100121894
alright let's see: to have a great model I need to just compete in the same space and style as the billion dollar companies, but do it for free with $2k of equipment instead of microsoft's $1 billion.

I think I will continue to wait for others to provide the models and focus on loras
>>
>>100121918
Check the PonyXL CivitAI page - the trainers literally admit there that they fucked up the training, so the score numbers are broken and don't function correctly/logically. Literally the only reason it's used is because there's no better alternative XL model. That's why I'm really hoping to either train something better or that someone else will, because as soon as there's something better that's not cucked there will be no reason to use PonyXL ever again.
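which is also why everyone just pastes the whole chain instead of a single tag, e.g. (the commonly copied template; double-check the civitai page for the current wording):

score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, source_anime, 1girl, ...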
>>
can i train on sdxl_vaefix or do I have to train on the model without the built in vae?
>>
>>100121840
You're highly reflective. Good.
>>
>>100121936
they use synthetic datasets like they say they use in their papers
>>
baker...
...baker?
b
a
k
e
r
>b
>a
>k
>e
>r
>>
>>100121887
>>100121846
>>100121838
>>100121812
>>100121968
>>100121936

The process I was thinking was to grab a lot of booru images since those are easy, use a Python script to clean/synchronize tags between different boorus, and then use an AI to convert the tags into natural language which should greatly improve prompting based on the ELLA research: https://ella-diffusion.github.io/

That dataset could be supplemented with more manually gathered images to cover characters/styles/concepts people want, and those images would need to be fed to an AI for captioning too.

I'm thinking I'd need some huge hard drives if I'm going to store the dataset locally, maybe pay for a GPT-4 subscription to caption safe images, and set up something local to caption explicit images.

For people with any training experience, does that seem reasonable?
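Rough sketch of the tag clean/sync step I have in mind (Python; the alias table is hypothetical and would really be built from each booru's tag alias dumps):

# map each site's spelling to one canonical tag (hypothetical entries)
ALIASES = {
    "long_hair": "long hair",
    "longhair": "long hair",
    "1girls": "1girl",
}

def clean_tags(raw: str) -> list[str]:
    tags = []
    for t in raw.split():
        t = t.strip().lower()
        t = ALIASES.get(t, t.replace("_", " "))
        if t and t not in tags:  # dedupe while keeping order
            tags.append(t)
    return tags

print(clean_tags("1girls long_hair smile smile"))  # ['1girl', 'long hair', 'smile']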
>>
>>100122115
would it improve prompting with tags or just make it better at understanding natural language though. I don't think you would get the result you are hoping for.
>>
>>100121714
>>100120965
ok this seems to work but turns out the thing i'm trying to run is not the thing i actually want to run lmao

>>100121862
(painting, traditional media)
>>
>>100122133
"1girl, apple" doesn't provide enough information to the AI to understand location, color, etc. If a language model can turn that into "1girl holding a red apple in her right hand" then the resulting model is leaps and bounds ahead in understanding prompts. That's what the ELLA research found; I was going to post one of the pics at https://ella-diffusion.github.io/ but we hit the image limit.
>>
it's actually over this time
>>
>please wait before making a thread
>>
>>100122188
4 u
>>
Next thread
>>100122230
>>100122230
>>100122230
>>
>>100116195
i like this
>>
>>100120089
more intelligible paneling than oda
>>
>>100120755
Try Worldly lora, DPM++ 3M SDE, and maybe perturbed attention guidance



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.