/mlp/ - Pony

File: AltOPp.png (1.54 MB, 2119x1500)
Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile; basically anything that can be boiled down to a simple dataset can be used as training material to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
derpy.me/FDnSk
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
GDrive clone of Master File now available >>37159549
SortAnon releases script to run TalkNet on Windows >>37299594
TalkNet training script >>37374942
GPT-J downloadable model >>37646318
FiMmicroSoL model >>38027533
Delta GPT-J notebook + tutorial >>38018428
New FiMfic GPT model >>38308297 >>38347556 >>38301248
FimFic dataset release >>38391839
Offline GPT-PNY >>38821349
FiMfic dataset >>38934474
SD weights >>38959367
SD low vram >>38959447
Huggingface SD: >>38979677
Colab SD >>38981735
XFL split archive >>39024207
Huggingface textual inversion >>39050383
NSFW Pony Model >>39114433
SD show backgrounds >>39418552
so-vits-svc offline ui >>39557072
Text to music model >>39570848 >>39571778
diff-svc mirror >>39567520
so-vits-svc tutorial >>39593802
Deep ponies app >>39576010
so-vits-svc on AMD/ROCm >>39578882
so-vits-svt for colab >>39592429
Potential new TTS >>39629052
Hardware heist >>39665051
New DeltaVox >>39678806
so-vits-svt 4.0 >>39683876

PPP REDUB 4 COMPLETE!
See FAQs for link

>The PoneAI drive, an archive for AI pony voice content:
derpy.me/LzRFX
derpy.me/GOpGP

>The /mlp/con live panel shows:
derpy.me/YIFNt

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
mirror: derpy.me/c71GJ

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>39623147
>>
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: derpy.me/FDnSk
Main: derpy.me/g3fFA

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: derpy.me/A8Us4
How to get the best out of them: derpy.me/eA8Wo

>Where can I find content made with the voice AI?
In the PoneAI drive: derpy.me/LzRFX
And the PPP Mega Compilation: derpy.me/GOpGP

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
derpy.me/pVeU0
derpy.me/Jwj8a

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
derpy.me/CQ3Ca

>Links
/mlp/con: derpy.me/tledz derpy.me/14zBP

PPP Redubs:
Ep1: derpy.me/xZhnJ derpy.me/ELksq
Ep2: derpy.me/WVRAc derpy.me/RHegy
Unused clips: derpy.me/VWdHn derpy.me/OKoqs
Rewatch Premiere: derpy.me/EflMJ
Ep3: derpy.me/b2cp2 derpy.me/RxTbR
Ep4: drive.google.com/file/d/1iQzIeGZTbxcu2BJcPHtmyxtiu2mV-MhN/view?usp=sharing pony.tube/videos/watch/c2a0b52d-344a-4240-a415-9a303b734777
Rewatch Premiere: mega.nz/file/4AklSZgI#9cAkzRz-81BGNX7dKKpOChDAg3x3KCoBKuSa1lJfDVs
>>
File: chor.jpg (32 KB, 337x500)
>>39686748
Anchor.
>>
File: P-R-I-RI.png (5 KB, 506x137)
>>39686727
>>
Mares?
>>
>>39686854
Generally, I find that converting a man's singing voice to a woman's requires raising the pitch by about a fifth (7 semitones).
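
For reference, that lines up with plain equal-temperament math (nothing so-vits-specific, just a sanity check in Python):

# each equal-tempered semitone multiplies frequency by 2**(1/12)
ratio = 2 ** (7 / 12)   # raise by 7 semitones
print(round(ratio, 3))  # 1.498, i.e. almost exactly a 3:2 perfect fifth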
>>
>>39687473
That's gonna depend heavily on the vocal range of both. There's also the concern of instrumental accompaniment if you're doing a quick vocal swap, which usually sounds crummy if you pitch-shift it too much.
>>
File: 2622183.gif (2.05 MB, 480x270)
Fluttershy so-vits-svc 4.0 (30400 steps, 1689 epochs): https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Fluttershy%20(singing)
Combined 4.0 model (before): https://files.catbox.moe/9le1sr.mp3
Individual 4.0 model (after): https://files.catbox.moe/edsxj9.mp3
>>
>>39687790
I'd be interested in seeing some individual comparisons between 3.0 and 4.0, not just combined vs combined or combined vs individual.
Also, can you link the reference audio for that?
>>
File: PPP logo alt1.png (2.12 MB, 2480x3508)
>>39686116
>>39686203
Applied some color changes to make it more similar to the OP pic, let me know if you prefer this color palette.
>>
>>39688065
It's an improvement, but I would still say to keep the OP pic how it is.
>>
File: PPP logo alt1fix.png (2.13 MB, 2480x3508)
>>39688070
it is not supposed to replace the OP pic, friend. just a little logo for autism purposes
>>
>>39688000 (checked)
Sure.
3.0: https://files.catbox.moe/rf9muj.mp3
Ref: https://files.catbox.moe/25xcyl.mp3
3.0 is quieter because it was trained before volume normalization was implemented upstream.
>>
>>39687790
Wow, individual Flutters sounds great!

>>39688118
So 4.0 adheres to volume? That would explain why I had to manually do the reduction and fade out for the end of the MAS RD kitchen audio.

I had a quick read through the translated github of 4.0 and thus far it seems it uses 44khz instead and uses even less memory? So that means longer samples at once without risk of OoM. It mentioned other features too, but are there any changes between them that perhaps aren't mentioned on the github? Like perhaps features this "Content Vec" introduces.

>"The data set production and training process are consistent with 3.0, but the model is not universal at all, and the data set also needs to be re-preprocessed"
Not sure what this means (Possibly mistranslated). So, the process is the same, but still isn't compatible? I wonder what this re-preprocessing is about.
>>
>>39688364
Yeah, he means that it trains just like 3.0, but old models don't work, and you'll need to run resample.py on a fresh dataset, since it's resampling to 44khz now.
>>
>>39688364
The volume normalization I am referring to is simply adjusting the volume level so that the peak is at 0dB. In most cases this results in a volume boost. This was implemented at some point during 3.0, but not all of the 3.0 models were trained with volume normalization enabled.
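
(For anyone unfamiliar, that kind of peak normalization is just rescaling so the loudest sample hits full scale; a minimal sketch, assuming a float numpy waveform:)

import numpy as np

def peak_normalize(audio):
    # scale so the loudest sample sits at 0 dBFS (|sample| == 1.0)
    peak = np.max(np.abs(audio))
    return audio if peak == 0 else audio / peak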

>uses 44khz instead and uses even less memory
I believe so, at least for inference.

>perhaps features this "Content Vec" introduces.
From my very limited understanding ContentVec is a way of encoding the "content" of speech in a manner disentangled from variations introduced by the speaker. It's supposed to be a higher performance replacement for HuBERT which was used in 3.0. I'd expect this to mean better similarity to the target character given arbitrary reference audio (which seems fairly certain given the models I've trained thus far).

>Rest
>>39688591
Is correct. Also the use of ContentVec instead of HuBERT for feature extraction (although most of the actual code seems to still use "hubert" or "h" for naming purposes).
>>
File: 1575638287119.png (6 KB, 1230x95)
>>39683324
>https://desuarchive.org/mlp/thread/37983482/#38002630
Fuck you.
>>
Yet another svc shows up.
https://github.com/yxlllc/DDSP-SVC
Should be trainable even on GTX 1660
>>
>>39688871
Any samples? My experience with ddsp/sawsing was that the voices were pretty rough.
>>
File: 1676829192432613.png (785 KB, 4000x3659)
>>
Week count?
>>
File: The answer.png (45 KB, 2376x108)
>>39689595
>>
>>39689607
It's 2 isn't it?
>>
>>39688118
Hm, okay, that actually sounds rather noticeably better. Curious about normal talking too, but the singing is definitely better. 4.0 is probably worth retraining for.
>>
>>39688791
You're not wrong, but for the sake of historical accuracy keep in mind that exaggerating his "achievements" was a part of his attentionfagging. In the same thread where it was discovered that he's the Noxfag, someone claiming to be the allahfag said that at least that part of this post is bullshit. That was while Delta was still denying all accusations.
>>
>>39690236
I wouldn't be surprised if the alleged allahfag was Delta himself trying to soften the accusations because he already knew he was outed by then.
>>
>>39659495
>(Re: ControlNet) I could imagine that a new model could have potently be trained on fanart and leaked animation images too to allow for same control over the pony models?
>Could you use a pose estimator(?) model that is trained on the leaked flash animation files, to estimate the pose, bone transforms and 2D assets that are used in a post-rendered animation (or something along those lines)?
I think so, yes. It should be possible to deterministically extract pose information from the animation data, train a pony pose extractor from that, and convert all show data into pose-to-image training data. We'd need to train our own pose extractor since the current pose models only work on realistic human images. They supposedly don't work well for cartoon humans, let alone ponies.
>>39660412
With Astra's Pony models plus ControlNet, I've changed my mind on the usefulness of 2D image generators for animations. My guess right now is that a pony model, custom ControlNets, upscaling/downscaling, and in-painting could be enough for creating show-quality animations without using any vector graphics after training.
I think we'll need to train our own ControlNets. I've seen examples similar to >>39660533 of it working with animals, but I don't expect the standard ControlNet models to be robust for ponies, especially since, e.g., the pose module has no way to explicitly specify horns and wings >>39660672. If we can get a custom pony pose ControlNet, the only additional thing we'd need is a way to generate a frame-by-frame sequence of poses, which should be much easier than trying to generate the full frame-by-frame vector data. For background & assets, the current ControlNets + a large background & assets LoRA might be good enough.
>>39666621
Fixed, and I replied in the Fan Site thread. I'm waiting for someone to test it.
>>39687483
I'm still around, but my time's been hijacked recently so I haven't been checking the thread that often. I'm going to be spending a lot less time on data and a lot more time on random AI things. You're probably not going to see much output from me for the next few months.
>>
>>39683021
>>39683324
Re: training code for LLaMA.
https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama
>>
>>39690397
>https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama
No information on hardware requirements it seems, although I assume you can't finetune these models on a single consumer GPU.
>>
File: 1082919.gif (495 KB, 371x601)
so-vits-svc 4.0 Applejack singing (2730 epochs, 46400 steps)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Applejack%20(singing)

Individual (new): https://files.catbox.moe/121dvr.mp3
Combined: https://files.catbox.moe/wzyzda.mp3
3.0: https://files.catbox.moe/444jde.mp3
Ref: https://files.catbox.moe/8fcsnt.mp3

The difference is a bit less clear here. You might generously attribute this to my own pronunciation, which was not my best Applejack impression.
>>
Is it just me or did Rainbow Dash's singing voice get a lot harsher throughout the show?
>>
>>39691035
keep at it, when i finally find some free time I'll test with my aussie accent to see how well it performs compared to last time, heh
>>
>>39691035
>Added automatic pitch f0 prediction for voice conversion (will be out of tune if used with singing voices)
So what kind of settings should one use in the training Colab if they want to train a singing voice?
>>
File: file.png (65 KB, 934x729)
I finally decided to try it out for myself, it isn't as complicated as I thought it would be.

In my first test with TalkNet I tried to make the ponies speak Spanish. Trixie seems to do it best:
https://files.catbox.moe/jqjr85.wav
https://files.catbox.moe/u1sjw2.wav

In my second test, I wanted Starlight to sing the Soviet anthem, but I didn't find her in the colab version of TalkNet so I just did it with Trixie. It didn't go as planned:
https://files.catbox.moe/uf6gu6.wav

reference audio:
https://files.catbox.moe/plaq4r.mp3
https://files.catbox.moe/945njf.mp3

I think the main problem is my bad pronunciation of some words, which causes the ponies to sound strange. As for the bad quality of the singing audio, I have no idea, maybe just bad intonation?
if this is what a complete newbie is able to do with TalkNet, I can't wait to try out 15.ai
>>
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject
>>
some previously unleaked fla files that were circulating privately and being gatekept
https://files.catbox.moe/vwjj0q.7z
>>
>>39686757
Tutorial for training the So-Vits-SVC models:
https://www.youtube.com/watch?v=pXAQCTEZrFU
NamelessHero_eng (it's a 3.0 model, I will get a 4.0 version trained and uploaded to huggingface sometime in the future):
https://mega.nz/file/M1803RTT#_a2MrdVo-BZq0S-g59NBsYzDWyC75gTvUvH0KhYFrGs
>>
>>39692703
>76 files
>All EQG
No use for any of that here, and not a significant quantity compared to everything else we already have from the studio leaks, though interesting to know that there's unreleased stuff out there. Would there happen to be anything else from the same source?
>>
>>39692837
Slight correction - it's not ALL EQG: 7 of the .fla files are pony, of which 6 look like fan concept images and only one is actually animated - a collage of portraits of a simple pony talking, pic related.
Still wouldn't say this is particularly useful for our purposes.
>>
>>39692902
>fan concept images
no, they're source files for official promo images and some of them (like royalty.fla) contain S1 animation.

for example Hasbro used to distribute Friendship_2 a lot
https://www.wsj.com/articles/discovery-to-take-control-of-the-hub-network-1410979842
>>
File: 6445287.gif (2.51 MB, 434x435)
so-vits-svc 4.0 Twilight singing
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Twilight%20(singing)
Samples:
https://files.catbox.moe/vbdc91.mp3
https://files.catbox.moe/9yxmmt.mp3
https://files.catbox.moe/hvq4yj.mp3

>>39691673
Automatic pitch prediction is an inference option; it's not involved in training.

>>39691988
TalkNet uses English phonemes. You might have some better luck with so-vits-svc 3.0 (or 4.0 once I get models trained) although at this point I don't know how easy it is for a beginner to actually set it up.
>>
new maka using so-vits-svc
https://www.youtube.com/watch?v=ZQNZxDJiRBA
>>
>>39693094
I rate it generic/10.
Also, the animation seems a bit wonky.
>>
>>39692703
Ah right, I remember seeing those posted before: https://desuarchive.org/mlp/thread/34819795/#34831574
>>
>>39693094
Ugh. Another "I know sad is bad, but don't feel sad, I'm here for you" story? Christ.
>>
>>39686748
I’m curious: did anyone here see any of the posters put out by the AGU fall meeting? Or did anyone besides me submit a poster?
>>
File: 1677606745947687.jpg (14 KB, 328x261)
>>39692902
is that the adobe animate from synthbot?
for some reason when i downloaded it again it required a password to extract
i got the program from a torrent site instead which was pre-activated
everything seems to be working right but since it's a torrent i'm a little nervous
>>
>>39692331
Live in ~45 minutes. Animating.
cytu *dot* be/r/PonyPreservationProject

>>39694760
>is that the adobe animate from synthbot?
Yes.

>Password
iwtcird
https://desuarchive.org/mlp/thread/38293341/#q38335424
>>
File: 2557645.png (99 KB, 927x1017)
>>39695543
thank you so much
have this twi i was playing with on me
>>
>>39692902
that looks like a scene from Rarity's "becoming popular" song from Sweet and Elite.
>>
Rarity and Rainbow Dash singing models, so-vits-svc 4.0.
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rainbow%20Dash%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rarity%20(singing)

Samples:
https://files.catbox.moe/c2lmbb.mp3
https://files.catbox.moe/1o0g0k.mp3
https://files.catbox.moe/ccbffa.mp3
https://files.catbox.moe/7s76wf.mp3

I also made an alternate Rainbow Dash model being selective with some data for a less harsh voice (closer to what she used in S1). The effect isn't very dramatic, but it's there.
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rainbow%20Dash%20(singing%20alt)
https://files.catbox.moe/vap9dd.mp3

For fun:
https://files.catbox.moe/gzkobw.mp3
https://files.catbox.moe/02ukpo.mp3
>>
>>39696145
>For fun:
>https://files.catbox.moe/gzkobw.mp3
>https://files.catbox.moe/02ukpo.mp3
Really nice. Adding those to my playlist. One day I'll be really shocked when I hear mares singing Roundabout.
>>
Any chance for a Colab version for the So-Vits 4.0?
>>
>>39696385
I'll start working on one if no one else is doing it
>>
>>39696589
Yes please
>>
>>39696145
Second Dash voice is definitely better, though still some ways off.

>For fun
Nicely done, the voices are still impressively capable despite the current lack of accuracy.
>>
>>39696145
I'm trying to install so-vits 4.0 on my Windows machine but I'm getting "error: no module named 'fairseq'".
Using 'pip install fairseq' fixes it.
Also, a quick message popped up about no clustering model being found (but it's working fine, so dunno)?
And there is a problem with the UI being way too tall: I can't access the preview button (and I can only just about press the "convert" button). Could it be possible to make the UI window scalable (or add an option to make it 25% smaller)?
>>
>>39697031
Thanks. I've added fairseq to requirements_win and reduced the size of the UI for 4.0.
>>
>>39697031
>>39697629
Also, the lack of a clustering model is not an issue.
>>
PITS, the successor of VITS, just released most of their code except the training code.
The demo and model are released:
https://github.com/anonymous-pits/pits

https://huggingface.co/spaces/anonymous-pits/pits
>>
>>39692703
This is amazing, thank you anon. idgaf if it's EqG or "not of use", any assets related to the show are fucking gold. Are there more gatekept files like this? Do you know the source?
>>
>>39696598
so-vits-svc 4.0 inference notebook
https://colab.research.google.com/drive/1128nhe0empM7u4uo5hbZx5lqjgjG1OSf?usp=sharing
>>
File: BaseImage.png (9 KB, 82x88)
We just dumped the MLP Leapfrog game, there's a ton of original voice lines and music!
There is a data.arc that seems to contain all the rest of the assets. We were only able to extract a single png and gif from it though, perhaps one of you datachads could assist in decompiling it further?
Dump: https://www.mediafire.com/file/7sc6wnnaexv3pch/MLP_Leapfrog_Game.zip/file
Tool that extracted two images: https://github.com/panzi/mediaextract
>>
>>39698236
This was the only relevant asset extracted from data.arc so far
>>
>>39698078
Very neat, but can there be an option to download and use custom models too? (Like if someone shares their things on another HF page, or Mega, or google drive?)
>>
>>39698389
I have an inference here that supports custom links, Was told by Amoart's tutorial to post it here.

https://colab.research.google.com/drive/1xUK-bdsrGoV5PUJz3z1cArTH0of1CWON#scrollTo=IpksmWvf0v1H

I have my own translation of the training colab too, but you guys seem to have your own anyway.
>>
>>39698403
I can't test this script, there is no G_ model in the provided testing model and the vul singing models are not uploaded as a zip...
>Never Gonna Give You Up 4K.mp4 1.73GB
oh, you are just shitposting, never mind then.
>>
>>39690386

I'm not too sure how much progress has been made on the logic behind animating with AI (whether the whole image will be generated, or pre-made vectors will be posed by an AI), but I was thinking about a way that could make it possible to automatically animate speech as well:

If you have a mouth shape for each letter you're trying to lip-sync to (as is standard in 2D animation I believe, and was done in MLP), could you get an audio-to-phoneme extractor with timestamps, then just pick out the mouth shape file assigned to each phoneme at each of those timestamps? Then all you've got to do is make some script to compile those image files together into a video at the frame specified by each timestamp, right?
It seems a little too simple to not already exist, since YouTubers that use animated characters in their videos seem to commonly agree that lip-syncing is the majority of the time spent animating, but if there were an automatic one, wouldn't they all be using it already? Is it just that audio-to-phoneme AI is that new, or is this idea just flawed?

I haven't really got the AI knowledge to write that code and troubleshoot through all those steps to check its validity, which is why I'm just posting my ideas, as I'd like to try to contribute in some way. And like the previous ideas I posted, it seems plausible at face value.
>>
>>39698431
the samples on the google colab are a shitpost. Just change the link to a zip file as shown in the image example and it should work (you don't have to use the D.pth file; just the G.pth and the config.json is fine)
>>
>>39698468
well, if it works, can you change it so the code will give you a list of speakers to pick from? Digging inside the config file and typing all the data files by hand is a bit of a downgrade from what we have in all the other UIs.
>>
File: SHUT UP PONY.png (17 KB, 1247x291)
>>39698236
rude
>>
>>39686757
https://files.catbox.moe/8gzdfo.mp3
https://huggingface.co/Amo/so-vits-svc-4.0_GA/resolve/main/ModelsZip/NamelessHero_eng.zip
Nameless Hero trained to 18400 steps. The voice still trips up on some simple reference audio noise (like breathing or a light change of pitch in the middle of the audio), but it's better than the 3.0 version.
>>39698468
There are several little improvements that could be added to your script, like saving the file in wav format so it's easier to share, or adding a playback button for the most recent audio conversion.
Also, for the download zip cell, I would recommend adding some sort of string check that, when it detects an HF link like this:
https://huggingface.co/Amo/so-vits-svc-4.0_GA/blob/main/ModelsZip/NamelessHero_eng.zip
changes the "/blob/" element to "/resolve/" (it took me a while to figure out why it refused to download that zip link).
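
Something along these lines should do it; just a sketch, the only real requirement is swapping the first "/blob/" for "/resolve/":

def to_resolve_url(url):
    # rewrite a huggingface.co /blob/ page link into the direct /resolve/ download link
    if "huggingface.co" in url and "/blob/" in url:
        return url.replace("/blob/", "/resolve/", 1)
    return url

print(to_resolve_url("https://huggingface.co/Amo/so-vits-svc-4.0_GA/blob/main/ModelsZip/NamelessHero_eng.zip"))
# -> https://huggingface.co/Amo/so-vits-svc-4.0_GA/resolve/main/ModelsZip/NamelessHero_eng.zip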
>>
>>39698528
kek, keep posting stuff about this
i love preservation of almost everything so a game for an obscure edutainment console for children spikes my neurons
imagine a world where the my little pony leapfrog game became lost media, what a tragedy that would be
>>
I assume 15's next update is going to remove most if not all of the artifacts and have 100% character accuracy, right?

>>39688791
>>39690236
>>39690239
What the fuck is going on
>>
>39698674
>>
https://github.com/svc-develop-team/so-vits-svc/tree/4.0
The future update for so-vits-svc had moved to here as original author drop off the project for good
>>
>>39698674
>I assume 15's next update is going to remove most if not all of the artifacts
Yes.
>and have 100% character accuracy, right?
Yes.
Plus, there will be a new UI with many new quality of life improvements and features.

The most difficult problem by far is reconciling three competing goals: stability, character accuracy, and artifact removal. For most configurations, you can have two but not all three. Achieving all three at once is the holy grail, which at one point I thought was impossible without some major modifications.
>>
>>39698968
yippee you’re alive, hows the update goin’
>>
>>39692703
>>39693745
Many thanks for these, I've put what I believe to be all 2013 - 2015 leaks into a single zip now
https://www.mediafire.com/file/i9dpmyg237p5d55/2013_-_2015_MLP_Leaks.zip/file
>>
>>39698674
there won't be a next update.
15 transitioned and now works for elevenlabs under the name 11
>>
>>39699260
Based thank you
>>
>>39699264
Does it make you feel good to be a bootlicker?
>>
>>39699289
he gets a nice big paycheck
>>
>>39699349
The company worship in this thread has gone insane.
>>
>>39699353
have you bought skyrim yet?
>>
>>39695543
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject
>>
File: Standing ovation.gif (1.71 MB, 480x270)
>>39698078
>loads fast
>sounds good
>button to delete all audio files in Colab
You absolute legend.
>>
What's the best ffmpeg command to extract clean voices out of the 5.1 surround episodes?
Right now I'm using -af "pan=mono|c0=c1-7*c2"
but it still has some sound effects
>>
>>39698078
Do all the existing models load in to the colab?
>>
>>39698389
Updated, but not tested on GDrive or MEGA zips.

>>39699458
All of the 4.0 models I'm aware of load into the colab.
>>
>>39699452
AFAIK the FC channel for the episodes always have some sound effects; noise removal tools like izotope RX were used afterwards to clean up the voices.
>>
ost https://youtu.be/srudXQsiT4I
>>
File: PopupIcon.png (5 KB, 90x77)
Also, does anyone have the poses from the game icon? Clipper is looking hard on the stream but coming up a bit short. It would be nice to recreate the icon in an HD format.
>>
I'm trying to run the "Fimfarchive tools - release v4.ipynb" tool but I'm getting errors from the 'assert' code, here:
import glob
fimfarchive_path_candidates = glob.glob(f'{ppp_path}/story-data/fimfarchive - *')
assert fimfarchive_path_candidates, "You need to add the pony-preservation-project folder to your google drive."
AssertionError: You need to add the pony-preservation-project folder to your google drive.

I do have the folder shortcut linked to my google drive, so I do not know why it can't detect it. Any ideas why it's not working and how to fix it?
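
One quick way to narrow it down is to print what the notebook actually sees under the mount before the assert runs (ppp_path is the same variable the script already uses):

import os, glob
print(os.listdir(ppp_path))                           # is story-data even visible?
print(os.listdir(f'{ppp_path}/story-data'))           # is there a folder starting with 'fimfarchive - '?
print(glob.glob(f'{ppp_path}/story-data/fimfarchive - *'))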
>>
https://archive.org/details/mlp-leapfrog-game archive.org page is up
>>
so-vits-svc 4.0 Pinkie speaking models
All speaking data mixed: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Pinkie%20(speaking%20mix)
Separated by emotions: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Pinkie%20(speaking%20sep)

Here's a bit of an experiment: separating the model into speakers by emotion. The resulting model is quite temperamental (some of the speakers just straight up don't work) but it can produce some interesting results. For most purposes I would prefer the mixed model though. If others could test this out and draw their own conclusions I'd appreciate it.

For fun (separate emotions):
https://files.catbox.moe/h9kn99.mp3

As a side note, it appears that my previous assumption that clustering would not do anything for single-speaker models was incorrect--I trained a cluster model for the mixed model.
cluster ratio 0.0: https://files.catbox.moe/q7tgsl.mp3
cluster ratio 1.0: https://files.catbox.moe/2eitez.mp3
>>
>>39700278
Do you think models could be trained for Flim and Flam? Or is lack of data still too big of an issue for that?
>>
[SoVits] Athena - Shawn Song
>https://files.catbox.moe/yramfb.mp4

Not quite the final version, still got some further syncing and redoing to do.
May also see how 4.0 Athena does with it once I get her trained up.
>>
ey Clipper what was the radio you were listening to in the stream? I cant seem to find any /mlp/ horse radio
>>
>>39700567
>Shawn
More like Xiang
>>
>>39700567
Sounds like she's been getting into Shawn's lemonade again
>>
>>39700679
https://radio.horse
>>
>>39700681
Yeah, I noticed that SoVits, or at least 3.0 (no idea if 4.0 differs), has a strange flaw. If the "sh" sound is too short, it'll ignore the hissing "h" sound altogether.

I wonder if there's a way to rectify this and force it, and/or if 4.0 will have the same issues with it. Could this also be a result of being only lightly trained? I suppose I'll find out for certain once I train 4.0 Athena, which will likely begin this Friday.
>>
>>39700278
Follow-up: clustering models have been trained for all the existing so-vits-svc 4.0 singer models and are located in their respective directories on huggingface.
>>
https://files.catbox.moe/6qsoif.mp3
>>
File: RD Giggle.gif (924 KB, 465x465)
>>39701999
Checked and kek
>>
>>39701999
Damn that melody though
>>
>>39701999
holy based trips and reference
>>
Did 15 talked about what to expect in a future update?
>>
>>39702240
15 more years, trust the plan
>>
>>39701999
good content anon
>>
>>39701999
Trips of awesome.
>>
>>39699387
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject
>>
>>39698078
It was working, but today it's giving an error about faiseq not being found when trying to infer in the final step. Installing fairseq in the setup step leads to a "object is not iterable" error upon running the final step.
>>
>>39703142
Colab updated to python 3.9 which appears to have broken the fairseq install. Updating pip seems to fix the issue.
>>
>>39700367
I think it's possible for speaking models to be trained for Flim and Flam. Most of the singing data is quite noisy though.
>>
Does anybody know the performance of crepe vs. dio vs. parselmouth for f0 detection?
>>
>>39702240
She said she would reveal more about the 11.ai partnership and transition process
>>
>>39703599
How much are you getting paid for this exactly? Genuinely curious.
>>
>>39698465
That sounds doable, though it would take some manual labeling work (which assets correspond to the mouth), it might require manual work aligning audio to animations, and it's unclear to me how animators would use it. Maybe such a tool could generate asset files, which animators can load into their animation software and pose.
>>
>>39700014
Fixed.
That was my bad. I changed the "story-data" folder name to match other data folders, not realizing that doing so would break this script.
>>
>>39703369
This is supposedly good, according to VA Anon's tests.
https://github.com/effusiveperiscope/so-vits-svc/pull/2/commits/218c55a6ebb67a5eba4b7f5d2744dcbe8b6ed3a7
>>
>>39698078
So I clicked on the buttons down to the models, downloaded my models. Where are they stored and how do I run the GUI to get started?
>>
>>39704080
Run the last cell
>>
Pony version when?
https://www.reddit.com/r/artificial/comments/11l3muz/i_made_tinder_but_with_ai_anime_girls/
>>
>>39703253
How?
>>
>>39704232
It's already fixed, just click run all. Something in one of the steps fixes it.
>>
?????
>>
>>39704738
pony?
>>
>>39704738
and before anyone asks, despite being a Blu-ray release, it only uses the left/right channels, so no voice extraction to shitpost in the Director's voice
>>
Just had an idea for the 8x a100s:
train large chatbots and voice models otherwise impossible on consumer-level hardware. There's LLaMA 65B, GPT-X, and many more models with too many parameters to fit on an ordinary PC. Would existing voice models benefit from bigger datasets and more parameters?
>>
>>39704204
god no. that shit is as bad as real tinder: swipe right 20 times for 1 match. Worse still, they have a pay-to-win subscription.
>>
>>39704796
>train large chatbots and voice models otherwise impossible on consumer-level hardware.
yeah no shit, grass is green too
>>
>>39703763
>Maybe such a tool could generate asset files
If you mean like generating mouth shapes from scratch, I'm sure that's possible, but my current idea would work by just having however many pre-made mouth shapes as images to pick from. That way you wouldn't get any of the randomness/jitteriness that is currently present in Stable Diffusion videos. That's also the main reason I feel like using vector images/images would work better than just fully generating a character. If you can pose each image correctly using AI, then you're good. But if you generate a character purely from AI, you've got to pose it AND generate it, and generating it could lead to all sorts of weird problems, which could mean the exact shape of parts of a character wouldn't be a sure thing. I'm sure that could be fixed, but it's just a lot easier to use images to get the same result, from what I know about it at least.

>and it's unclear to me how animators would use it
If you had something like 18 different images you use to animate mouth shapes (I don't actually know how many there usually are), each labelled with a different vowel/consonant/phoneme, you could do what I had suggested before by getting the timestamps and phonemes, then picking out the image assigned to each phoneme and stitching them all together at the specified time in a video. Then, if you had images with transparent backgrounds for the mouth shapes, you could just drag and drop the animated video into the one with your character, and then just place the mouth video in the correct spot on the character for each frame. For AI-generated animation, you could maybe assign/train a specific place to put the mouth video file/frames, on something like those control rig things. Maybe even just combine the above logic into an animation AI somehow to remove a few steps.

Not much of my idea is automated apart from picking out the right mouth shape for the right timestamp, but the part that is automated is usually the thing that takes the longest when lip-syncing. It's pretty easy to reposition something in video editing software, as opposed to fully lip-syncing something. So what I'm saying essentially is: there's still work you'd have to do, but it would hopefully be significantly less.
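
A rough sketch of the automated part, assuming you already have forced-aligner output as (phoneme, start_sec, end_sec) tuples and one image per mouth shape; the phoneme names, file names and 24 fps below are all placeholders:

PHONEME_TO_MOUTH = {
    "AA": "mouth_open.png",
    "IY": "mouth_wide.png",
    "M": "mouth_closed.png",
    "F": "mouth_teeth.png",
    # ...one entry per phoneme in whatever set the aligner outputs
}

def mouth_frames(alignment, fps=24, rest="mouth_closed.png"):
    # turn timestamped phonemes into a per-frame list of mouth-shape image paths
    total_frames = int(max(end for _, _, end in alignment) * fps) + 1
    frames = [rest] * total_frames
    for phoneme, start, end in alignment:
        shape = PHONEME_TO_MOUTH.get(phoneme, rest)
        for i in range(int(start * fps), min(int(end * fps) + 1, total_frames)):
            frames[i] = shape
    return frames

The returned list can be written out as numbered PNGs and stitched into a video (e.g. with ffmpeg), then overlaid on the character in an editor.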
>>
>>39686757
>>39700278
so-vits-svc 4.0 Rarity speaking (single speaker):
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rarity%20(speaking%20mix)

It seems to me that the best approach might be to combine the data and train separate (smaller) models for additional tones as needed (e.g. for whispering, screaming).

Another thing that I've discovered is that you don't need to train cluster models on the same dataset that the base model was trained with, as long as the speaker name is the same as in the base model. I have no clue what the effect of using a cluster model trained on a different dataset is, but I've updated the 4.0 offline GUI to give an option to load a custom cluster model just in case it turns out that they can be used like voice LoRAs or something.

Also, I've added an option to the 4.0 offline GUI to use crepe for pitch detection if you have tensorflow and crepe installed. It's much slower, and I wouldn't necessarily say it's better than praat/parselmouth, but crepe sometimes succeeds in areas where the former fails and vice versa.
>>
File: 1677201430419246.gif (952 KB, 288x203)
>>39705179
(I forgot a picture). Normal cluster model used with 0.8 ratio.
https://files.catbox.moe/gh2lvl.mp3
https://files.catbox.moe/c32nuu.mp3
>>
>>39705179
Was this done purely with so-vits, or was the reference audio run through talknet? The flow seems off to me.
>>
>>39705318
Purely with so-vits. I wasn't trying to copy Rarity's flow.
>>
>>39705179
On Windows, when selecting the output device and ticking the "autoplay" checkbox, it crashes the UI with this message:
Traceback (most recent call last):
File "inference_gui2.py", line 393, in update_init_audio
mixer.init(devicename = self.selected_dev.deviceName())
pygame.error: No such device.

The normal play button still works so that wouldn't stop me from doing any audio conversion, it's just a weird bug I've noticed, so ehhh.
>>
>>39705415
WOMM. I changed the code to make it use pygame to look for device names though; lmk if it works.
>>
>>39704796
>LLaMa
the best thing you can do now i think is to host the pure 65B or a few 30B instances on the kobold horde. it is absolutely insane how amazing it is, and it deals very well with fimfiction as well.
starting with this prompt: https://aetherroom.club/1002 and interjecting the memory part between "-- key notes --" and "-- story cont --" every generation while removing a paragraph for every paragraph generated, I got this. it is on nat.dev and I really don't know what the deal is with that site; it lets people compare models, but i worry it might be very slimy with data collection or just kill itself after a few days. it popped up yesterday and is less comfy feature-wise than KoboldAI (generations are fast and it's just as good, but that's because only one person hosts 13B llama on the horde).
>>
>>39705460
>at,dev and I really don't know what the deal is with that site,
It has a hidden quota
>>
I'm having trouble with TalkNet. It fails on step 3 and rerunning the cell isn't fixing it.
>>
File: talknet error cell.png (77 KB, 1530x990)
>>39705504
hmm, not the colab anon, but whenever colab started giving me a fit in the past, i've just installed the modules it was asking for, like this:

!python -m pip install --upgrade pip
!python -m pip install --upgrade dash
!python -m pip install --upgrade jupyter_dash
!python -m pip install --upgrade nemo
!python -m pip install --upgrade dash_bootstrap_components
!python -m pip install --upgrade torch_stft
!python -m pip install --upgrade kaldiio
!python -m pip install --upgrade pydub
!python -m pip install --upgrade frozendict
!python -m pip install --upgrade unidecode
!python -m pip install --upgrade pystoi
!python -m pip install --upgrade pytorch_lightning
!python -m pip install --upgrade g2p_en

but now i'm getting this error, and those are not python modules.
>>
>>39702240
trust his plan or something
>>
>>39705504
Error message?
>>
File: 1543072577904.gif (1.23 MB, 350x330)
>>39704177
Ah, I see it's really hosting it on google's computers. Is there a way to have the colab install and run everything on my local machine? I have TalkNet running locally and I'd like to have So-Vits as well.
Is there a .bat that can run and install everything like the colab did, or do I follow amoart's tutorial install video anyway?
>>
>>39707207
Colab cannot install anything on your machine. Currently there is no installation .bat. IIRC Amo's install video is for 3.0, not 4.0.
>>
>>39705504
>>39705533
Updated
https://colab.research.google.com/drive/1sAbqSQj9P56TTpsU7bzbobzAxmydvUSA?usp=sharing
>>
>>39698968
I think 15 fell in the shower (2) weeks ago, and that you're just impersonating them. How would you confirm you're legit?
>>
File: 1585042.jpg (342 KB, 1189x709)
Twilight (speaking) so-vits-svc 4.0 model (40 epochs, 43200 steps)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Twilight%20(speaking%20mix)
https://files.catbox.moe/nju4ny.mp3
https://files.catbox.moe/dp1wba.mp3

Extra cluster model for Rarity: https://huggingface.co/therealvul/so-vits-svc-4.0-cluster/tree/main

https://files.catbox.moe/vq36aa.mp3
https://files.catbox.moe/dg2tff.mp3
>>
>>39708556
>https://files.catbox.moe/nju4ny.mp3
>https://files.catbox.moe/dp1wba.mp3
Is this bad input or something? I swear the 3.0 speaking model was sounding really good.
>>
>>39708580
3.0 straight:
https://files.catbox.moe/r9ckqi.mp3
3.0 thru TalkNet:
https://files.catbox.moe/at2hij.mp3
4.0 thru TalkNet:
https://files.catbox.moe/dvhwtc.mp3

It's possible that the model is undertrained (considering the size of the dataset), or that the different dataset has an effect on the final result (I believe 3.0 was trained on some smaller subset of the speaking data available for Twilight). 4.0 straight may also underperform compared to 3.0 run through TalkNet first. Will try further training.
>>
>>39707243
yea but you can run colab shit locally with jupyter
https://robotwealth.com/how-to-connect-google-colab-to-a-local-jupyter-runtime/
>>
>>39708731
Neat, good to know. That might be a good option.
>>
I know nothing about computers, but could a chatbot similar to chatgpt be made to help write pony fiction? While chatgpt isn't open source, I wonder what it would take to make a similarly strong chatbot, train it on strictly FiM content and see what would happen.
>>
>>39708786
Something like that might be happening in the coming weeks given the LLaMA leak (there's already code for finetuning it into a ChatGPT-style chatbot), but you'd need some pretty beefy hardware to finetune and run the models (4-bit quantization is lowering the VRAM requirement for running the largest model to around the 40 GB range, which is more than most people around here would have, I'd imagine).
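
The rough weights-only arithmetic behind that figure (it ignores activations, cache and framework overhead, which is what pushes it toward ~40 GB):

params = 65e9               # LLaMA 65B
bytes_per_param = 0.5       # 4-bit quantization
print(params * bytes_per_param / 2**30)   # ~30 GiB for the weights alone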
>>
>>39708556
>https://files.catbox.moe/vq36aa.mp3
excellent
>>
>>39708797
I wonder if it would be possible to train it on fimfic.net, and have it prioritize certain tags and disregard others to filter out non-FiM writing, and prefer high upvote ratios so it aims for well-written stories. Not just porn, but also the longform adventure fics and the like. I want it to be able to pump out a 200k-word slice of life story that just follows the day-to-day life of ponies in Ponyville, for example.
>>
>Friday finally arrives
>"Time to train 4.0 Athena"
>Locates training colab
>Runs to begin training
>Huggingface auth failure
>No 4.0 training today

It also seems the entire github (https://github.com/innnky/so-vits-svc/tree/4.0) has gone poof too.
What happened?

>>39701999
Top kek, great reference.

>>39708556
Gosh, those sound very synthy or TTS-like. Was the reference audio of a TTS voice or something? Because that would explain most of it.

>https://files.catbox.moe/vq36aa.mp3
This however, is quality.
>>
>>39709205
Something happened with the project. Someone used it to clone the voice of someone they shouldn't have.
They were scared of that biting them back, so even the forked github is gone. i think the eff branch is the last one with 4.0 code
>>
>>39709221
It only takes one idiot/asshole to ruin it for everyone else...
>>
>>39709230
I saw it happen with the SD community so i get why people want to do that
I will keep it updated in the meantime, though diff-svc and fish diffusion are still safe for now
>>
>>39709232
Is 3.0 still functioning though? I wouldn't mind still continuing 3.0 Athena if that's the case, or possibly trying out Lustrous.
Also, hadn't checked inference yet, I assume this only affects training colabs?
>>
File: screm.jpg (10 KB, 100x82)
>>39709252
Nevermind, 3.0 training is definitely down too. Fuck.
>>
no need to panic. They just moved here https://github.com/svc-develop-team/so-vits-svc

the person who uploaded the repo just wanted to distance themselves because people in China kept making AI models of celebrities and politicians
>>
>>39709283
i knew your name was familiar, you're that siivagunner guy aren't you?
anyway thanks for the info, phew
>>
>>39709283
Not panicking, just strong frustration as my mental schedule dissolves due to unforeseen circumstances.

>https://github.com/svc-develop-team/so-vits-svc
It appears the pretrained models linked in there also lead to the broken "innnky" location?
>>
>>39709289
Well yes but actually no, since my channel isn't exactly siivagunner. I'm technically also guilty of what happened to the sovits repo, since i was the person that made it known in the west with the Rick Astley celebrity sovits-svc model
>>
>>39709304
don't worry bro, that shit fucking rocked
>>
>>39709303
they are making a new pretrain model due to the previous one having copyright issues. Either wait for that or wait for someone to reupload the old pretrain model

keep in mind the one uploaded on svc-develop-team is sovits 4.2 and not 4.0, so 4.0 models are probably not compatible with that repo
>>
>>39709310
wasn't the last one 4.0v2? what is the difference besides a brand new pretrained model now
>>
>>39709328
yes the last one was 4.0v2/4.2. the main difference from 4.0 to 4.2 was that 4.2 uses visinger2 structure

also nevermind about it being 4.2. The repo updated to separate 4.2 and 4.0, and the main one is 4.0 now
you can read more about it here. Just deepl the fuck out of it https://github.com/svc-develop-team/so-vits-svc/tree/4.0-v2
>>
>>39709351
thank you, guess i'm keeping the 4.0 i had before
the colab would need to be updated, right?
>>
>>39709354
yeah just replace the dead links and it should work
>>
>>39709310
>reupload the old pretrain model
Hmm, are the D_0.pth and G_0.pth files that were created during past training the original pretrained files? Or are they modified somehow and thus not able to be used?
>>
>>39709367
those are the original pretrain models yeah. It only changes in the later iterations
>>
>>39709372
Excellent, guess I can create a modified version of the 3.0 colab to use my reupload on huggingface.
Though realistically all anons who've trained any SoVits models here likely have them already, so there's no shortage.
>https://huggingface.co/datasets/HazySkies/SV3/tree/main

Looks like I can train her today after all. Thanks.
>>
File: No Pretrained.png (25 KB, 846x114)
Nevermind, guess it's sleep time instead.
Perhaps tomorrow the mares await.
>>
>>39696145
How in the world did you manage to get the vocals to sound so good? I feel like I'm getting junk even from good stuff being put in. Background vocals especially turn everything into a loud buzzing.
>>
>>39709221
Absolutely phenomenal, I was wondering why it just all of a sudden stopped working.
>>
anonfiles com/ScObw8d0z6/checkpoint_best_legacy_500_pt
welp, until the shitshow with the hubert gets sorted I guess I will just share the old one for anons to use for their offline 4.0 synthesis.
>>
>>39709808
anonfiles com/GaX4w6dfz7/G_0_pth
anonfiles com/m8Wew6d8z9/D_0_pth
Oh, I guess I should put out the original G and D models as well, just in case.
>>
>>39709252
>I assume this only affects training colabs?
Just tried 4.0 inference via Colab, it's down as well.
>>
>>39709808
>>39709826
Do you happen to have the files for 3.0 as well?
>>
>>39710185
anonfiles com/RbC3z4d6z3/hubert_soft_0d54a1f4_pt
3.0 hubert model; from what i see the G_0 and D_0 models are the same.
>>
File: 2034458.gif (347 KB, 288x329)
4.0 colabs have been updated. Training has not been tested.
>>39709283
You don't need to ask, but from what you know, is it possible they will try to take down the existing github/colab/HF copies of the models?

>>39709205
>Was the reference audio of a TTS voice or something?
Nope, just me. Although I suppose my delivery was very TTS-like.

Twilight Speaking 4.0 model updated (100 epochs, 108800 steps)
(I also altered some parameters, like noise scale which was set to 0.8 instead of 0.4 as I think it was before)
4.0 straight: https://files.catbox.moe/aie049.mp3
4.0 talknet: https://files.catbox.moe/lpt81y.mp3
This sounds a bit better formant-wise (pitch detection errors left in for authenticity). I'll train speaking models to at least 100 epochs from here out.

I also wonder if the presence/lack of breathing sounds might contribute to your perception of the voices as artificial.

>>39709775
Providing examples/generation parameters would be helpful.
>>
>>39703029
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject

>>39710237
The talknet one sounds better, though the speech feels sort of flat, as in pretty even pitch throughout, which makes it sound somewhat robotic/emotionless. Might sound better with different reference audio.
>>
TalkNet Colab is borked again
>>
>>39710810
WOMM, did you restart the runtime before step 3?
https://colab.research.google.com/drive/1sAbqSQj9P56TTpsU7bzbobzAxmydvUSA?usp=sharing
>>
>>39710829
Never had to do that step before, but I ran it now and it worked, thanks.
>>
https://www.youtube.com/watch?v=vAZpM_LxrXM
https://robotwealth.com/how-to-connect-google-colab-to-a-local-jupyter-runtime/
with the power of indian dubbed tutorials and the basic scripts I am slowly figuring out how to run the sovits training offline, so I can just run the script in the background without worrying that google will cuck all my progress.
The first thing I've learned was that jupyter fucking hates being run on the C drive, so i needed to 'cd' into a folder on another drive before running all the connection code.
>>
>>39710185
On 3.0: the problem doesn't appear to be what I thought it was; I appear to be able to get the model to save perfectly fine using small eval/log intervals. I would suggest trying to run it again using a small eval/log interval in configs/config.json just to see if it works. The only thing I can think of that might be causing problems is Drive space?
https://colab.research.google.com/drive/1ABKgdAWp2UGfgavyxI_Nv7ytdL5h6J6f?usp=sharing
https://files.catbox.moe/4n7jew.log
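
If you'd rather not hand-edit the JSON, a small sketch of dropping the intervals; this assumes the usual so-vits-svc layout where they live under "train" as "log_interval"/"eval_interval", so double-check your configs/config.json first:

import json

with open("configs/config.json") as f:
    cfg = json.load(f)
cfg["train"]["log_interval"] = 100    # log more often
cfg["train"]["eval_interval"] = 200   # save/evaluate a checkpoint every 200 steps
with open("configs/config.json", "w") as f:
    json.dump(cfg, f, indent=2)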
>>
>>39711153
>!python preprocess_flist_config.py
UnicodeEncodeError: 'charmap' codec can't encode characters in position 7-10: character maps to <undefined>
got stuck on this error for a bit until i realized it's just the computer being unable to print chinese characters, so i've commented out line 47 and inserted a 'print('audio: ' + str(file))' line in its place.
>>
>>39710237
out of curiosity, what kind of cudatoolkit version are you using for your training?
>>
>>39708647
when I trained 3.0 speaking twilight I removed all the ones marked very noisy
>>
>>39696145
Had to comment on this. Been kinda meh on most so-vits stuff, but this really came together nicely. Still definitely flawed, but man in the mix you can barely tell most of the time. I think this may be some of the best harmony singing I've heard from AI.
>>
>>39711588
nvcc --version reports v11.4.48

>>39712013
That's the same as my setup, good to know.
>>
>>39710237
they probably won't take down any models. Most likely it's up to the real-life person that the AI model samples, and i doubt they'd do anything to the mlp models specifically. The new repo for so-vits-svc actually has a terms of service at the very top if you wanna know more about it.
>>
has anyone tried making a so-vits-svc gui auto-install with bat files? There's one for diff-svc (https://github.com/neutrogic/diff_svc_ui/releases/tag/release). I have the gui installed already, but I feel like a so-vits-svc equivalent would help out a lot of people, especially the ones who keep DMing me about it.
>>
File: 1513573190095.gif (46 KB, 205x223)
Ponyzones 1-3 by Geekbrony have all gone private
anybody got backups?
>>
>>39712308
They're all on Derpi: https://derpibooru.org/tags/artist-colon-geekbrony
If they decide to put him on DNP list, they're also on Twibooru.
>>
>>39712308
https://www.youtube.com/watch?v=4IViz2W8FIU
>>
>>39712335
>>39712352
thank you
>>
>>39712302
Not that I'm aware of; might be something to look into though.
>>
File: 2425763.jpg (27 KB, 480x456)
so-vits-svc 4.0 Rainbow Dash speaking (111 epochs, 68800 steps)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rainbow%20Dash%20(speaking)
https://files.catbox.moe/m63mqw.mp3
https://files.catbox.moe/wj8068.mp3
https://files.catbox.moe/5t6s0b.mp3
https://files.catbox.moe/a3bmmf.mp3
https://files.catbox.moe/8zgh1l.mp3
>>
>>39712302
Here's a first attempt for 4.0. Cannibalized some stuff from TalkNet.
https://github.com/effusiveperiscope/so-vits-svc/releases/tag/v0.1.0-alpha
>>
File: holy.gif (375 KB, 200x184)
>>39708556
>https://files.catbox.moe/vq36aa.mp3

This is amazing. You get one (1) free internet.
>>
>>39710237
>Providing examples/generation parameters would be helpful.
Input then output using the 4.0 GUI linked here earlier. The speaker files were retrieved from page 126 of the main PPP doc.
https://files.catbox.moe/8eq2op.mp3
>>
[SoVits] Athena - Anon's A Faggot (Fuck The Pussy)
>https://files.catbox.moe/1qdxl4.mp4

So glad Synthesizer V works perfectly fine on linux. It pairs well with SoVits, with a lot of tinkering of course.
Started off doing the usual song via SynthV due to pitch errors Athena threw out when fed the cleanest raw vocals I could separate. Got bored and decided to parody too, 'cause fuck it.

[SoVits] Athena - Move Your Body
Original to SoVits
>https://files.catbox.moe/3ph3g0.mp3
SynthV to SoVits
>https://files.catbox.moe/rvllpk.mp3

The latter isn't quite as good as the parody in terms of accuracy; as it turns out, less breathy settings help her out more.
I may make a full version some time in the near future. Definitely gonna try more parodies and original works now that this is an option that works well.
>>
>>39713592
I still can't get over the fact that her VA is a 50 year old man. It's gonna make any audio porn (or just any audio) with her really awkward, because you know it isn't Athena professing her love for you, but technically Shawn wanting some bussy.
>>
>>39713645
Hmm. Do you consider, say, Rarity making lewd sounds, or confessing her love as Tabitha being the one that's doing it?

The origin of the voice is inherited, but the mare's identity is her own. If she loves you, then she'll tell you wholeheartedly; even if it might take a 50 year old man or more to tell you.
>>
>>39713704
Or well, to say it rather. She can tell you in other ways, but it'll take an inherited voice to actually speak it aloud. In Athena's case it would be Shawn.

Speaking of, aren't all the lewd sounds and lyrics in Pony Zone also spoken by males? Although, I guess that's more a male guiding a female voice, whereas Athena is more a male/female guiding a pitched-up male voice doing an impression of a female character. Fairly complicated.
>>
>>39713704
I'd say it's different when a female VA is doing a female character's voice, as opposed to a male VA doing a female character's voice.
>>
>>39713748
>>39713742
Only in the technical sense, it's different. Shawn does the voice himself, with only pitching it up slightly to give the female impression. Voices done by TalkNet may be done by guys, sure, but they go through voice changing software that just auto generates a sound clip using the original voice as reference.
>>
>>39713345
This is polyphonic audio. None of the models trained here involved polyphonic pitch detection algorithms as it is much simpler to model a single speaker. If you are trying to recreate my Roundabout example, I had to sing all the reference voices individually.
>>
File: no.png (805 KB, 1920x1080)
805 KB
805 KB PNG
https://youtu.be/vKXgSZdbW6o
I forgot to post this thing.

>>39713742
>aren't all the lewd sounds and lyrics in Pony Zone also spoken by males?
Where the fuck does this rumor keep coming from? Neither mine nor Geek's Pony Zones have EVER used guided models. I don't think Vul's have either.
>>
>>39421053 (https://desuarchive.org/mlp/thread/39315199/#39421053)
Same anon. I figured I'd post an update even though we're still a long way from an alpha launch.
>>39421762
>Make something relatively normal, but easily hackable and moddable. Then you can unofficially create and support pony addons, and still make investors happy.
We're doing this, but "hacking and modding" is pretty much everything we do. It's going to support bring-your-own-datasets and bring-your-own-models. We'd help you mod AI models and plug the results into other projects that use AI, including ones that run on your desktop. We'll need to stick to standard formats, and a lot of pony data is not in a standard format. At some point, I'll dump conversions and conversion scripts for pony data and models that aren't already compatible.
https://poll.horse/zLjFxezf
>>
>>39713963
My mares are augmenting on their own!
>>
>>39713963
Incredible as always. What'd you use to modify her voice to be the heavy Robotic lines? Melodyne? Some vocoder?

>Pony Zone NOT guided
Ah, right, those used 15 didn't they? With most of your songs nowadays using TalkNet, and now SoVits, guess I forgot.
>>
>>39714770
>What'd you use to modify her voice to be the heavy Robotic lines?
Just the vocals and a simple Serum patch running through a vocoder called VocalSynth.
>>
>>39713794
>I had to sing all the reference voices individually.
Damn, I had thought so. I can test later with non-polyphonic audio and equal-quality stems. Are the models on page 126 the ones you used, or did you use others?
>>
>>39715276
Used 4.0 singing models; I believe the correct ones (huggingface) are listed in the main doc, yes.
>>
>>39715310
Copy that, many thanks. Is there good documentation on how to tweak things like the transposing or noise ratio values? Also documentation on clustering, because I want to use that if it yields better results?
>>
>>39715392
https://github.com/effusiveperiscope/so-vits-svc/tree/eff-4.0
Transposing should be done in multiples of 12 (=1 octave) if you want the singing to be in the same key as the song. For other options check the readme
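To illustrate the octave rule (a quick sketch of my own, not from the repo): each semitone multiplies frequency by 2^(1/12), so a transpose of ±12 lands every note back on the same pitch class, while anything else shifts the vocal relative to the backing track.

# Quick illustration (not from the so-vits-svc repo) of why multiples of 12 keep the key:
# one semitone multiplies frequency by 2**(1/12), so 12 semitones is exactly one octave (x2).
def transpose_hz(freq_hz: float, semitones: int) -> float:
    return freq_hz * 2 ** (semitones / 12)

a4 = 440.0
print(transpose_hz(a4, 12))   # 880.0 -> still an A, one octave up
print(transpose_hz(a4, -12))  # 220.0 -> still an A, one octave down
print(transpose_hz(a4, 2))    # ~493.88 -> a B, i.e. shifted out of key relative to the backing track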
>>
>>39715397
Couldn’t be more thankful for y’all’s help. I’ll get reading.
>>
File: AthenaTraining.png (64 KB, 1205x368)
64 KB
64 KB PNG
>>39711253
Thanks to your provided/updated colab, Athena has returned to the grind without any issues; it's actually showing step outputs and such proving she's learning. Still not sure how to read the string provided and how much she's improving over time though. Hopefully it's still a noticeable improvement after another 10000 steps at least. Thanks again.
>>
File: 6425880.gif (2.67 MB, 763x501)
2.67 MB
2.67 MB GIF
>>39715793
I didn't update anything though?
>>
File: smolthumbnail.jpg (49 KB, 383x300)
49 KB
49 KB JPG
Decided to play with Sovits today.

https://files.catbox.moe/chb93n.mp4
>>
>>39716332
FULL VERSION.

https://files.catbox.moe/3frq8o.mp3
>>
File: 1362691286207.gif (1.98 MB, 350x350)
1.98 MB
1.98 MB GIF
>>39716332
>>39716349
These pony AI covers are getting better and better. You guys should consider making a PPP cover album, if that doesn't exist already.
>>
https://files.catbox.moe/4ne1v5.mp3
https://files.catbox.moe/ei7uc8.mp3
https://files.catbox.moe/b41mzp.mp3
https://files.catbox.moe/83k4n9.mp3
I didn't think I'd get such good results from my bad singing but wow. This is a game changer.
>>
File: error.jpg (81 KB, 962x477)
81 KB
81 KB JPG
>>39713345
What do
>>
File: AthenaLookSmol.png (39 KB, 237x187)
39 KB
39 KB PNG
[3.0 Athena - 15000 steps]
>https://mega.nz/file/w6ZwDahC#d1beM41XDvyhiZI9vEVdPDDqvcdWgeRdIup-Qou_v-c

Athena is now trained up to 15000 steps after 10h of training on just the standard T4.
Would've been even higher, but the training section started throwing errors (https://files.catbox.moe/qe8ken.png), so I stopped for now.

As a comparison I quickly whipped up a test song line to serve as a base to compare the two:
SynthV sample: https://files.catbox.moe/boo90n.wav
Athena 9k: https://files.catbox.moe/7scxo8.wav
Athena 15k: https://files.catbox.moe/rc07xe.wav

It's a little difficult to tell, but 15k does seem to be better, in that she has less overall noise, seems to have a better time discerning "G" and "K" sounds, and pitch changes seem smoother and less jumpy. The latter change seems like it'd help singing especially well, since it can better match the vibrato of a singer.
>>
>>39717123
Here's the same comparisons, but with -12 semi (1 octave) to better reflect her normal speaking range.
Athena 9k: https://files.catbox.moe/h4a62i.wav
Athena 15k: https://files.catbox.moe/m5kmtc.wav

Attempting to do ordinary speech in SynthV didn't go quite so well, and I couldn't find any good speaking audio for her to make a speaking comparison, but hopefully these singing ones of this range provide some idea.
>>
File: 1505228357720.png (396 KB, 1133x1682)
396 KB
396 KB PNG
>>39716349
AAAA I LOVE IT
>>
>>39712797
Lots of dependency issues, I had to do this to make it work

RUN "activate.bat"
pip install praat-parselmouth
pip install matplotlib
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

After doing this, the gui opens but I still have these errors:
Note: Pygame is not available.
INFO: Could not find files for the given pattern(s).
Note: Rubberband is not available. Timestretch not available.
Could not find TalkNet on address 127.0.0.1:8050
>>
>>39716349
This is incredible. That reference audio must've been stellar!
>>
>>39716349
Hate to break it to you chief, but your pitch is off.
Maybe try autotuning your voice before processing it, or take singing lessons
>>
>>39718567
I kinda like the idea that AJ is kinda flat when singing. Kinda like how RD has voice cracks, it gives it character. The average voice isn't going to be 100% on-pitch all the time.
>>
>>39718599
retarded

>>39696145
see this post, the last two are great examples of what good singing+so-vits sounds like. You should encourage people to be better at what they do, not feed them bullshit when they are clearly off pitch
>>
>>39718567
>>39718609
It sounds fine
>>
File: 67356756.png (61 KB, 497x313)
61 KB
61 KB PNG
>>39718599
There's a difference between natural imperfection in good singing, and being blatantly off pitch. The reason singing is a talent, and pitch correction is used otherwise, is cause off-pitch singing sounds really fucking bad.
>>
>>39718646
Clearly you too are tone deaf
https://www.youtube.com/watch?v=fLexgOxsZu0
>>
>>39718671
It sounds like you're being autistic over a song most anons liked. Chill out.
It's obviously meant to be a casual cover, otherwise he would've further edited it to get rid of the glitches near the middle
>>
>>39718659
kind of the same way that the less talented a woman is at dancing the more she shakes her ass?
>>
>>39718677
Glitches? I didn't even listen that far. I was talking about how anon is tone deaf; it's off pitch, idk how you can't hear it. Anyway, I'm not mad, just constructive criticism.
>>
>>39717082
That's a new one. Do you not have any audio input devices on your system?
>>
File: 1673233631173423.gif (32 KB, 248x358)
32 KB
32 KB GIF
Now I have no experience or expertise in anything AI image generation wise,
but in your opinion, what would be the most efficient way to do AI animation?
Would it be essentially making an image and then building off that one for the next frame of animation? From my perspective, AI animation would essentially be nothing more than a more advanced version of image generation, since it would work off of each frame of animation, but the problem lies in how it would change each frame and stay consistent.
Could it instead generate individual assets? Maybe it's more efficient to generate the background separately from a character, both for the sake of speeding up the process and to reduce the number of things that could go wrong in the generation process.
half of this is probably nonsense but I know that there's something about what I just said that makes sense
>>
>>39717082
>>39718764
I've attempted to update 4.0 gui to not assume audio inputs exist; try again.

>>39717123
It looks like Drive mounting times out after a while or something. Not sure what to say other than you might have to restart the runtime from time to time to continue training.

>>39718319
None of those are critical to running the program; they're just optional features. I'd be interested in seeing what dependency issues you ran into? I noticed that matplotlib was not in requirements_win and updated it accordingly.

>>39716349
>>39718567
>>39718671
I haven't checked the whole thing but the voice seems to be 2 semitones too high (it's consistently out of key). It isn't a pitch correction problem; it's a transpose problem.
>>
>>39718891
For animation? It'd be most efficient to have a simple/crude animation for the AI to use as a reference and follow; otherwise it'll vary far too wildly to be usable. Looking into a recent tech, ControlNet (>>39660412), would help most for animation. It helps the AI strongly adhere to character anatomy, composition and posing.

Making the background separately and using it for each frame would definitely be recommended. However, AI-generated images usually don't have transparency, so you can't easily isolate the subject you intend to animate without potentially changing the background in the process (see the sketch at the end of this post for one possible workaround).

I know that Stable Diffusion has a plugin that allows it to work with Blender. I'll have to experiment to see how that would work out for animating rather than stills. Might have potential; I'll try that out later today.

Only other advice I can offer is improve your prompting to better get consistent results with stills, it'll help a lot when getting the stills for an animation, and just overall make using the AI easier. Hope this helps.
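On the transparency point above: one possible workaround (not mentioned in the post, and untested on pony art, so treat it as an assumption) is an off-the-shelf background-removal model such as rembg to cut the character out of each generated still, so a fixed background can be reused across frames.

# Hypothetical sketch: cut the generated character out of a still so a fixed background
# can be reused across frames. 'frame.png' is a placeholder; quality on pony art untested.
from PIL import Image
from rembg import remove  # pip install rembg

frame = Image.open("frame.png")
cutout = remove(frame)            # returns an RGBA image with the background made transparent
cutout.save("frame_cutout.png")   # composite this over your separately generated background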
>>
>>39718891
Last year's PPP panel should help give you a general overview of how it could work.
https://youtu.be/NpFxmmh8NQ0?t=7860
>>
>>39718891
I don't know how it works now, but it should definitely build up from markers and only generate the image at the very last moment.

Like
"A cute Applebloom prancing in a field"

It should "understand" the background, and that there is one character.
Then search for (or understand) what the character skeleton is, and animate the skeleton.
When it's done, it can generate a series of crude images of markers and metadata to give to a "rendering AI".
This would allow both image consistency and world believability.

But I have no fucking idea on how to make that.
>>
>>39718985
>>39718990
>>39719096
extremely helpful responses
this project is doing great, thanks for the info!
>>
What does the thread think about AMD cards used for voice synthesis AI? Did they catch up, is it enough to make do, or is the general consensus still Nvidia as the go-to, be-all and end-all, for AI?
>>
>>39719503
AMD will be forced to catch up. Thanks to AI technologies being so popular and their competitor being in higher demand as a result, AMD doesn't really have much choice but to adapt, lest they be left behind.

If they hadn't done this already, they will eventually.

>Pic related
That Twilight reminds me of the AI Seinfeld visual style, and now I wonder how soon we can achieve something similar. Would also provide something entertaining to watch on the PPP cytube between streams.
>>
Bump.
>>
The colab talknet thing gives me a 403 even when i open the link in a new tab. Am I doing it wrong or is it broken
>>
training code for PITS is uploaded:
https://github.com/anonymous-pits/pits
might be worth taking a look
>>
>sovits EN training colab 4.0.ipynb
in the "Install dependencies" cell, it has trouble installing the fairseq module.
This can be fixed by adding a cell below with this code:
!python -m pip install --upgrade pip
!pip3 install fairseq
then running it and then running the dependencies cell again.

HOWEVER, I'm getting another error when running the "start training" cell:

The tensorboard extension is already loaded. To reload it, use:
%reload_ext tensorboard

Reusing TensorBoard on port 6006 (pid 6704), started 0:03:32 ago. (Use '!kill 6704' to kill it.)

Traceback (most recent call last):
File "/content/so-vits-svc/train.py", line 321, in <module>
main()
File "/content/so-vits-svc/train.py", line 60, in main
validate(hps)
File "/content/so-vits-svc/train.py", line 45, in validate
ckpt = sovits_utils.latest_checkpoint_path(hps.model_dir, "G_*.pth")
File "/content/so-vits-svc/sovits_utils.py", line 355, in latest_checkpoint_path
x = f_list[-1]
IndexError: list index out of range

I'm not sure what to do with this; I assume it's another Colab error caused by Google update fuckery?
I've tried to look into the sovits_utils file, but that's pretty confusing, as I don't know how to solve the issue with the "x = f_list[-1]".
>>
>>39710357
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject
>>
File: checkthis.png (29 KB, 753x163)
29 KB
29 KB PNG
>>39720229
>The Colab TalkNet gives me a 403 even when i open the link in a new tab
Did you check this already?
Step 3 itself also says, "Restart the Runtime before running this cell!" so check that you did that first, too.
If you're using this Colab instead of the one linked in this thread:
https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB?usp=sharing
Then also make sure that you run step 3 twice, as the instructions at that cell say.
>>
>>39721142
Did you download the pretrained models?
>>
Hi friends. I want to TTS a MLP fan fic story using mane 6 or celestia voices. Is there any easy way to do this? I tried tortoise with applejack but it didn't give a good result.
>>
>>39721571
Wait until 15.ai comes back
>>
>>39721568
I've run all the cells as they needed to be run. I did notice that the G_ models were not being downloaded to the sovits/logs/44k folder, so I made a new cell to download them in there, BUT it still comes up with the exact same error.
>>
>>39721577
>>39721568
I've tried again, this time using shutil and sticking it just above the 'start training' cell, and now it's working again. Colab seems to have been acting a bit retarded these past few days for no reason at all.

import shutil
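# Move the pretrained base checkpoints (D_0/G_0) into the folder the training script expects.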

shutil.move("/content/so-vits-svc/D_0.pth", "/content/so-vits-svc/logs/44k")
shutil.move("/content/so-vits-svc/G_0.pth", "/content/so-vits-svc/logs/44k")
>>
>>39721142
Fixed; had to add Clone = "44k" at the top of the training cell. Notebook is updated.
>>
>>39720315
>https://github.com/anonymous-pits/pits
Not sure what format the dataset's going to be in. The training README says "WIP" so I assume that's going to be coming later.
>>
>sovits EN training colab 4.0.ipynb

I keep getting this error in the "Segment training set and generate configuration files" cell:

Traceback (most recent call last):
File "/content/so-vits-svc/preprocess_flist_config.py", line 38, in <module>
for speaker in tqdm(os.listdir(args.source_dir)):
FileNotFoundError: [Errno 2] No such file or directory: './dataset/44k'

what do
>>
>>39721925
Did your resample process any files?
>>
>>39721927
No, it didn't. It doesn't do anything at all when I run the cell
>>
>>39721954
What does your raw dataset file structure look like?
>>
WHY AM I NOT SEEING G4 FILTER ANIMATION OVER ROTOSCOPED HORSE FIGURINES? HUH? GET YOUR PLUSHIES OUT, PUT THE GREEN HANDS ON AND GREEN SCREEN ON AND START ENACTING TWILIGHT SPARKLE PLUSHIE SHOVING HER HORN UP SPIKE'S ASS.
https://www.youtube.com/watch?v=xm7BwEsdVbQ

Also he saw the signs before it happened
https://www.youtube.com/watch?v=WipRVvvNt4Y
>>
Hey Synthbot, just letting you know there seems to be a bunch of partially broken Mayor Mares in the animation assets.
>Glitchy manes
>Duplicated skewed objects
>Extra stretched shapes on certain frames
>Blinks leaving eye ghost
>etc.
>>
File: SteedSpeakUI.png (401 KB, 1600x3100)
401 KB
401 KB PNG
Hello everyone,

For the past month or so, I have been working on a project that I call "Steed Speak." It is my attempt to create a locally-running, universal UI for all the various voice-generation solutions out there. It is still under construction and not available yet, but it's far enough along at this point that I am confident I will eventually get it to a sharable state. I wanted to give a heads-up to developers who might be actively developing other UIs and to gather feedback on the idea and UI design.

With Steed Speak, you will be able to generate audio for different architectures (e.g. controllable talknet, so-vits-svc, etc) from a single UI. Steed Speak puts each AI architecture in its own Docker container to avoid any issue with conflicting library versions, and a "command and control" container presents the UI and communicates with the AI containers. I will create pre-built Docker images, so you won't need to set up any of the environments yourself (although the Dockerfiles will be available if you want to). You just need to install Docker and Graphics drivers and then download the images and volumes. Steed Speak separates the pre- and post-processing options (e.g. pitch shifting, reducing metallic noise, stretching the output audio, etc.) from the architecture-specific options so you can use all of the pre- and post-processing options with ANY architecture.

I designed Steed Speak in such a way that we can add more architectures as we discover and train models for them. Separately, we can add more pre-processing and post-processing options and they will automatically be available for all architectures. My intention is to create a platform that we can keep adding to over time and gets better as we add to it, rather than starting over with a new UI every time we play with a new architecture.

The screenshot is an example of what the UI will look like, but keep in mind that the layout and options presented in it are NOT final. The project will be open source.
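Not the actual Steed Speak code, but a rough sketch of the container pattern described above, assuming each architecture container exposes a tiny HTTP endpoint (Flask here, as an assumption) that the command-and-control container calls; the routes, ports and field names are made up:

# Hypothetical sketch of one architecture container's API (NOT Steed Speak's real code).
# The command-and-control container would POST audio to it over the Docker network, e.g.:
#   requests.post("http://so-vits-svc:5000/generate",
#                 files={"audio": open("input.wav", "rb")},
#                 data={"character": "Twilight Sparkle"})
import io
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate():
    wav_bytes = request.files["audio"].read()             # reference .wav uploaded by the UI container
    character = request.form.get("character", "Twilight Sparkle")
    # ... load the model for `character` and run so-vits-svc / TalkNet inference here ...
    out_bytes = wav_bytes                                  # placeholder: echo the input back
    return send_file(io.BytesIO(out_bytes), mimetype="audio/wav")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)                     # reachable from the other containers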
>>
Do the additional steps from so-vits 3.0 for adding ROCm support on Linux apply to 4.0, or are there different requirements for pytorch in the additional pip commands? Already dealing with torchvision refusing to download 0.11.1 on Python 3.8 via pyenv, so knowing about support issues now would be good.
>>
>>39722800
Definitely looking forward to an all-in-one GUI for the various AI we work with. The image you provided shows it's very clean looking. Nice work!
Will it be usable as an interface for a Colab, or only for local inference? Or perhaps make a version suited for those if not?

>Steed Speak
Not too sold on the name though. Perhaps something like... Mare Muzzle? 'Cause that's where mare vocal sounds come from, it rolls off the tongue better. Also mares.
Steed Speak sounds a bit too much like TeamSpeak, so there's also that.

Alternative ideas: Hay Say, Neural Neigh, IWTCIM (I want to clearly interface mares), Pony Parsing Program.
>>
>>39722941
Thanks for the feedback! I am glad you brought up Colab because I wanted to reach out to the community about it. I have never actually used Colab before, so I don't know its capabilities or whether it supports Docker containers (A quick Google search shows that it did NOT in 2020; I don't know whether things have changed since then). My plan right now is to focus on local inference, but Colab is something I plan to research at some point, especially because (from what I understand) it would let people who don't own sufficiently powerful graphics cards to use Steed Speak. If anyone else wants to help adapt this project to be Colab-friendly, I would very much appreciate the assistance. If Docker containers don't work, perhaps there is another approach that can be taken, like virtual machines.

I am open to changing the name. I'm not 100% sold on Steed Speak, either. Other names I have considered include "Neigh I", "Pony Voz" and "Horse Voices".
>>
Is there a chart or other documentation on the vocal ranges of the character AIs? I notice that I'm often singing too high and it causes the voice to just crunch out.
>>
>>39723041
BGM mapped out the vocal ranges for the Controllable Talknet characters:
https://youtu.be/F9PdoJ9zmrw?t=640
If you want to skip his explanation and go straight to the results:
https://youtu.be/F9PdoJ9zmrw?t=748
>>
>>39723063
Thanks this is a good start. This seems to be based on the 3.0 not 4.0 models and I know for a fact from testing it that Pinkie’s voice dies at around Eb5/E5 not E4 like the video says.
>>
File: 668858.gif (2.54 MB, 825x941)
2.54 MB
2.54 MB GIF
more 4.0 models
AJ speak: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Applejack%20(speaking%20mix)
FS speak: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Fluttershy%20(speaking%20mix)
TX mix: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Trixie%20(mix)
https://files.catbox.moe/kb8an1.mp3

I did an experiment and put the speaking and singing speakers for Trixie into the same model (since it's the same VA). Timbre leakage doesn't seem too big of an issue. To me it actually sounds like the "singing" model used in TalkNet (which is what I was used to) might have been trained on speaking data? Then again there's nothing stopping you from just using the "speaking" model for singing. Here the EQG Trixie timbre seems to dominate for singing (makes sense given the only other source of singing data is Road to Friendship/We're Friendship Bound). She still can't seem to roll Rs.

>>39722800
Nice. I'm pretty sure there is a distinction between adjusting the pitch of the source audio and the pitch adjustments in TalkNet/so-vits-svc.

>>39722925
AFAIK few to no people have tried this, fyi, so document everything!
>>
>>39721533
>Step 3 itself also says, "Restart the Runtime before running this cell!" so check that you did that first, too.
Yeah I did that
>https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB?usp=sharing
This one fails on step 3 no matter how many times I run step 3: "No module named 'dash'". If it was supposed to be installed in step 2, Ctrl+F pulls up nothing, not even an error message.
To prevent any confusion I was using this one
https://colab.research.google.com/drive/1sAbqSQj9P56TTpsU7bzbobzAxmydvUSA?usp=sharing which seems to get past this okay, I just can't get to the UI
>>
>>39723550
What browser are you using?
>>
>>39723569
Tried firefox and ungoogled chromium
>>
File: tn.png (240 KB, 1880x972)
240 KB
240 KB PNG
>>39723621
https://colab.research.google.com/drive/1sAbqSQj9P56TTpsU7bzbobzAxmydvUSA?usp=sharing
WOMM on ungoogled chromium, Windows. Do you have any add-ons or other things that might be blocking it?
>>
>>39722800
>install Docker
That's a no-go for my Win7 system. Also, from what I experienced with the combined TalkNet + so-vits UI, my GPU can barely run with both models loaded at the same time.
But I do hope others will find this useful (and I agree with the assessment that the name is a bit of a mouthful).
>>
>come back two more weeks later
>still nothing from 15
The plan?
>>
File: relevant image.png (210 KB, 800x552)
210 KB
210 KB PNG
>>39723646
Disabled all my plugins and still no dice. Tried it on a totally different computer too and got the same result. I have used this before and it worked (months ago), I don't think the browser should affect much of anything considering it's all done server side, so I really don't know. Guess it just isn't meant to be
>>
>>39723770
>The plan?
Play tf2
>>
>>39723225
I still have a problem with the training colab not downloading the G_ and D_ files into the /logs/44k folder.
>>39721733
>>
>Page 9
>>
>>39721268
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject

>>39722800
An all-in-one solution for the various TTS options would be very helpful. The UI looks fine to me, no obvious issues.

>>39723225
>https://files.catbox.moe/kb8an1.mp3
The normal speaking parts of that sample are sounding pretty good to me.
>>
>>39723834
Should have been working. I added another cd into the top of that step just in case the directory changed though.
>>
>>39723788
Does the last step not work either?
>>
File: 1593829518712.png (115 KB, 640x613)
115 KB
115 KB PNG
Adventures in pyenv
>get python 3.8 working under pyenv
>go to install requirements.txt after the additional args for ROCm ran fine
>installer locks up on PyQt5
>killsig it and run it with -vv
>PyQt5 has a fucking license screen
>type "yes" to get on with it since this is the one thing that is holding it all up
>pip decides to consume all of my fucking ram, shove everything I got running onto the page file and then still sits there doing nothing
>any other attempt to install it just does the same attempted ram hostage situation
Fuck you too PyQt5 and fuck your license agreement
>>
>>39725311
Update: pyenv installed an ancient pip; newer versions have commands that auto-validate the license stupidity, so now the real problem is waiting for an external download of torch==1.10.0+cu113 to finish
>>
>>39725363
torch==1.12.0+cu116
>>
>>39723161
Reading comprehension, Anon. The vocal ranges in the video are for TalkNet.
>>
File: fucking finally.png (172 KB, 1941x1141)
172 KB
172 KB PNG
ROCm configs work under the new 4.0 setup; 3.0 refuses to run due to some index errors.
>>
I know nobody asked for this, but here are some 4.0 models trained on S1 Rainbow Dash and Applejack data.
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Applejack%20(s1)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rainbow%20Dash%20(s1)
https://files.catbox.moe/q59qmx.mp3
>>
>>39722415
Thanks. It'll be a while before I can fix that, but I added it to a backlog I'm keeping of animation data issues to fix.
>>
>>39723004
>I don't know its capabilities or whether it supports Docker containers
It does not, and it probably never will. If you want to support Colab, here are some notes:
- Colab gives you a VM where you can install and run whatever you want.
- Usually, the first block in a Colab notebook is used for checking if the instance has enough compute available. The second block is usually used for installing dependencies and, if needed, for mounting the user's Google Drive so it can be accessed like a normal file. The rest of the blocks are for running the script, which could be as simple as calling a main method from your repo.
- Providing a user interface on Colab can be a bit tricky. Anons here have taken a lot of different approaches. (Easiest) You can create a Colab-specific UI using whatever form elements Colab supports, or whatever widgets something like ipywidgets supports. (More compatible) You can use JupyterDash, which would also let you show the UI in Colab itself but gives you more flexibility in what UI you can display; you can use JupyterDash outside of Colab too, which would let you provide the same interface both in Colab and on local compute (see the sketch at the end of this post). (More flexible) You can run a web server on Colab, then expose it using a tunneling service like ngrok. I think ngrok itself stopped allowing anonymous accounts at some point and anons switched to something else. I forget what.

As an alternative that's worse than Colab but better than only local execution, you can let people choose to bind the server to 0.0.0.0 instead of localhost. That would make it easy for people to run your container on rented GPUs.
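For reference, a minimal JupyterDash sketch of the "more compatible" option (hypothetical layout, not the actual Steed Speak UI). In Colab, mode="inline" renders the app in the output cell, and the same app can be served normally on local compute:

# Minimal JupyterDash sketch (hypothetical UI, not any project's actual layout).
# pip install jupyter-dash dash
from jupyter_dash import JupyterDash
from dash import html, dcc

app = JupyterDash(__name__)
app.layout = html.Div([
    html.H3("Pony voice demo"),
    dcc.Dropdown(id="character",
                 options=[{"label": c, "value": c} for c in ["Twilight Sparkle", "Rainbow Dash"]],
                 value="Twilight Sparkle"),
    dcc.Upload(id="reference-audio", children=html.Button("Upload reference audio")),
])

# mode="inline" shows the UI inside the Colab/Jupyter cell; use mode="external" for a link instead.
app.run_server(mode="inline", port=8050)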
>>
>>39725548
>(More flexible) You can run a web server on Colab, then expose it using a tunneling service like ngrok. I think ngrok itself stopped allowing anonymous accounts at some point and anons switched to something else. I forget what.
The Stable Diffusion community likes using gradio, and they also have an extension that allows tunneling to cloudflare, remote.moe and localhost.run.
>>
>>39725013
Yeah, 3 and 3b. There are no error messages; just the GUI embed and link give me a 403.
>>
>>39725470
Neat, S1 voices always have that charm to them
>>
>>39725548
Thank you so much for the notes. I have been writing the UI in Dash by Plotly and, luckily, it looks like it would be straight-forward to convert it to a Jupyter-Dash app:
https://dash.plotly.com/workspaces/using-dash-in-jupyter-and-workspaces

The AI Docker containers (which the main UI communicates with) are running Flask apps. I'm not sure this setup would translate well into Colab. Regarding my comment earlier on Virtual Machines, I'm having trouble finding any info on whether it is possible to run VMs *within* a Colab VM, although judging from the fact that Docker is unsupported, my guess is that it's not possible. I could attempt to install all the architectures together in the Colab VM and at least isolate their Python libraries from each other using Python virtual environments. If there are any conflicts in Linux package dependencies, however, it won't work.

At the very least, it should be possible to have separate Colab projects for each AI architecture and provide the same UI for each of them.
>>
>>39725990
Any dev console errors? Might it be network related?
>>
Mare?
>>
>>39726824
Mare.
>>
[SoVits] Guardians of Pondonia Scene - But everyone is voiced by AI Athena
>https://files.catbox.moe/qcyxbq.mp4

Turns out for most masculine voices she needs to be pitched up by at least +3, which on reflection makes sense as that's how much Shawn typically pitches up his samples for Athena's original voice.

I've also since added her to my shared SoVits drive, which is now at 25000 steps. Won't be doing any further training with her on 3.0, as I don't think she'll improve any further with her current dataset. With the improvements and changes in 4.0, I'm curious to see how they differ. More than likely I'll compare them at their 9k or 15k models to save time and see if 4.0 is worth the extra hours.
>>
>>39725470
>https://files.catbox.moe/q59qmx.mp3
kek'd and appreciate'd
>>
>>39726861
STALLION
>>
>>39726083
Just a bunch of "analytics logging failed". It's trying to access play.google.com/log but it seems google's own cross-origin policy blocks it. Don't think this is related to the issue unfortunately
>>
>>39727664
IDK what to say anon. Maybe there's a way to use pyngrok to tunnel to the UI?
>>
File: PonkXanadu.png (544 KB, 490x670)
544 KB
544 KB PNG
https://files.catbox.moe/t3icqp.mp3
https://files.catbox.moe/06wzjw.mp3
Really happy with parts of this :3
>>
>>39728173
Neat! Is the 2nd one a reference from some kind of musical?
>>
>>39724634
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject

>>39728173
Nice.
>>
>>39728278
These are both from the album/film Xanadu; if that counts as a musical, the reference wasn't intentional. The first link is “Xanadu” (the title track) and the second one is “I’m Alive.”
>>
I want to use the pony text to speech site but its down :(
>>
>>39725470
how can I use this?
>>
>>39728518
https://github.com/effusiveperiscope/so-vits-svc
>>
>>39720315
Attempted to train this on my 3080, but it's SLOW (as in, I'd need about 30 days to train up to the same number of epochs as the model they have). Probably requires a bigger computer.
>>
How much help is needed in regard to this?
>https://caption.cutemares.xyz/
I've gotten a few done so far and plan to do it for at least an hour, but what's this progress value? Is it the overall progress of how far we are through all the images we have?
Lastly, I want to say simple grunt-work activities like this are highly accessible for those who don't have the knowledge to program stuff but still want to help with the projects.
>>
Two questions regarding so-vits

1, where is the link to the 4.0 GUI? And is it compatible with 4.0 v2 models?

2, I think there might be a chance that train.py assumes that I'm using a multi GPU setup, when I'm not. How do I take care of that?
>>
>>39729464
>2
From what I see in train.py, the main() function has the code:
n_gpus = torch.cuda.device_count()
so it shouldn't detect GPUs that you don't actually have installed in your motherboard.
>1
>where is the link to the 4.0 GUI?
see >>39728770; it's also linked in the OP's main doc and quick start guide.
>compatible with 4.0 v2 models
Do you mean 3.0? If that's the case, no; the 3.0 models will not work with the 4.0 interface.
>>
>>39729464
1. AFAIK the 4.0 GUI should not be compatible with 4.0 v2 models; they use different vocoders (although I think there are different versions floating around called "v2").
2. What makes you think this?
>>
Two questions:
How good, if at all, is so-vits-svc at whispering/soft talking?
Which voices are available?
>>
Astralite's Pony Diffusion v4 came out earlier today:
https://huggingface.co/AstraliteHeart/pony-diffusion-v4
>>
>>39686748
You mfs made some amazing shit with this. How'd you feel when voice ai suddenly became popular?
https://youtu.be/iSZhUKejc3Y
>>
>>39730471
>How good, if at all, is so-vits-svc at whispering/soft talking?
Here's the best I can get with a cluster based on whispering:
https://files.catbox.moe/66h2iu.mp3
It's rough.
>Which voices are available?
See https://docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.ce7da4l3o6jf
(scroll down)

>>39730500
>How'd you feel when voice ai suddenly became popular?
What do you mean?
>>
>>39730497
Let me just quote it to the anchor post so other anons can see it easily (also, it's nice that there's a pruned safetensors model included) >>39686757
>>
File: gpt-4.png (351 KB, 1307x993)
351 KB
351 KB PNG
>>
>>39730963
>Sparkle in her eyes
>One paw lifted
>Blue and white scarf
>Butterfly near her tail
It's still misidentifying and making stuff up, so it still has a way to go. But other than that, a very good description of the very cute mare. So more progress than setbacks; things are looking good.

>>39729647
Wait, there's a v2 of 4.0 now? There was already a split between 3.0 and 4.0, now the latter is divided even further? That's gonna make managing which models belong to which a little trickier, and would require even more colabs to keep track of, to both infer and train either of the 4.0 versions.
>>
>>39730963
>paw
Fucking furniggers
>>
File: gpt-4-text.png (156 KB, 786x740)
156 KB
156 KB PNG
>>
>>39730963
>paw
>>
File: 1420000494925.jpg (179 KB, 696x660)
179 KB
179 KB JPG
>>39731038
>>39731105
Just like AI can't see the difference between a man and a tranny, it reveals the truth that pony is furry.
>>
File: WingEqualsPawQuestionMark.png (826 KB, 1080x1215)
826 KB
826 KB PNG
>>39731038
>>39731105
>>39731133
To be fair though, I think GPT-4 just mistook her closed wing for a raised paw somehow, much like how it mistook her striped socks for a scarf, since the wing is on the upper part of her body and looks raised. The "butterfly near the tail" could be the green leaves at the base of the flower, even though those were also recognised as part of the flower earlier, unless it considered only the stem and bloom to be the "flower".
>>39731043


So it seems to prioritise identifying objects and their positions relative to other objects, rather than whether those objects make sense in the context of the others.


GPT-4 logic probably:
>Mare identified
>Mare has raised paw on upper body
>Unusual
>Will not criticise reasoning
>Constructing narrative in a manner that allows the anomaly to make sense
>>
File: 559598.png (14 KB, 1000x740)
14 KB
14 KB PNG
>>39731133
furries & bronies are fundamentally very different and separate from each other. The majority of furfags are nowhere near as creative or imaginative as horsefuckers and nowhere close to being as close-knit as us. The only thing they have in common is wanting to fuck non-humans. /mlp/ has autism and degeneracy (fictional banter); furfags have Aspergers and 'degeneracy' (war crimes).

that's the tl;dr anyway
>>
File: file.png (149 KB, 819x830)
149 KB
149 KB PNG
>>39730963
But I thought GPT-4's currently available models couldn't handle images yet? Did I miss something?
>>
File: file.png (241 KB, 2220x796)
241 KB
241 KB PNG
Seems like I did I guess.
>>
>>39731589
we gettin recursive in here
>>
>>39731003
I haven't been impressed by DSP based vocoders thus far. From what I saw of the v2 demo on huggingface when it was still up (to be fair I didn't give it a thorough look) I didn't find it worth pursuing compared to the existing architecture.
>>
File: 2365042.png (1.07 MB, 1191x670)
1.07 MB
1.07 MB PNG
> Write a short comfy story about this! What are they talking about with each other :)

>The snowy village of Starfall Hollow was bathed in a warm, orange glow as dusk approached. The quiet hum of the village was interrupted only by the sound of the ponies' hooves gently crunching on the snow-covered cobblestone streets.
>
>Mira, Celeste, Luna, and Solstice were four best friends who had grown up in the village together. They had spent countless hours exploring the snow-covered landscapes and cuddled up in their favorite cafe, sharing stories over steaming cups of hot cocoa.
>
>As they walked through the village, the four friends were wrapped in cozy winter scarves and hats, their breath visible in the crisp, cold air. The stars began to appear in the night sky, a breathtaking display of twinkling lights against the inky blackness.
>
>"What do you think those new neighbors are like?" Luna asked, breaking the silence.
>
>"I heard they're from a far-off land where the snow never falls," Celeste replied, her voice filled with curiosity.
>
>"I can't imagine a life without snow!" Solstice exclaimed, shaking her head in disbelief. "What would we even do during the winter?"
>
>"I suppose they'll have to adjust to life in Starfall Hollow," Mira chimed in. "And we can help show them the ropes!"
>
>The friends smiled at each other, already excited at the prospect of new friendships and adventures. As they continued their stroll, they talked about their favorite wintertime memories, from the snowball fights they had as young foals to the magical night they had discovered a secret ice-skating pond deep in the woods.
>
>As they neared the new neighbors' house, they saw a family of ponies unloading boxes from a wagon, their faces flushed from the cold. The friends exchanged glances, their hearts swelling with warmth and camaraderie.
>
>"Let's give them a proper Starfall Hollow welcome," Mira suggested, her eyes shining with excitement.
>
>Together, they approached the newcomers, offering their help, warm smiles, and a plate of freshly baked cookies. The new family, grateful and eager to make friends, accepted their kindness, and soon they were all laughing and talking, sharing stories of their past and dreams for the future.
>
>As the night wore on, the snow began to fall once more, blanketing the village in a fresh layer of sparkling white. The glow of the lamplights and the sound of laughter filled the air, a testament to the magic of friendship and the cozy warmth of a snowy winter's night in Starfall Hollow.
>>
15.ai got updated?
>>
>>39731937
Could be any number of weeks now!
>>
AI Twilight Sparkle proudly sings some Syrian patriotic tunes! Seems like So-Vits surprisingly works well outside of English too!
https://www.youtube.com/watch?v=t858yQSPwvk
>>
Hey what's going on with 15? Seems like the page has changed again. Can't take a screenshot since I'm currently mobilefagging, but it looks like things are going back up.
>>
>>39731910
Comfy
>>
>>39728323
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject
>>
>>39732131
I'm kind of hoping it means it'll be up sometime in the next few days, if not today. But it still blows my mind how "two more weeks" turned into six months. lol
>>
I would like pony TTS please. I want twilight sparkle to read a .txt file for me. 15.ai is not up.
>>
>>39732207
Apparently the last time it went down it was for over a year kek
>>
>>39732265
It's been down for 21 of the last 24 months.
>>
>>39732265
>>39732379
The site didn't go down until September 2022, though. No development in that time, but it was still easily accessible. It's only been down 6 months.
>>
>>39732265
>>39732379
What do you fags get out of making blatant lies like this?
>>
>>39732427
I'm just going off of what I've been told. Apparently it went down for a year, according to one of my friends.
>>
>>39732436
You’re a fucking shill and you know it. Fuck off.
>>
File: baf.gif (2.1 MB, 372x463)
2.1 MB
2.1 MB GIF
>>39732440
The fuck are you on about?
>>
>>39732444
How do you live your life having sold your soul to a multi-million dollar company? 11fags are fucking pathetic.
>>
>>39732456
I've never even used 11.ai kek
>>
File: 1672976889910218.jpg (97 KB, 385x501)
97 KB
97 KB JPG
>>39729154
Seriously, what's the deal with this?
Is this some abandoned project or something? Because every time I log back on to do more caption work, the percentage value never changes.
Is this like my personal batch for my IP or something, or the overall percentage of how far we are through the image stockpile?
>>
>>39732659
correction the percentage value never changes unless I submit another description
>>
>>39731910
Good God, is this real? This is truly amazing. In fact I'm mad at myself for not being amazed more.
>>
>>39732451
People here are highly strung about 11 shills
>>
>>39732915
Apparently so, huh?
>Mentions a rumor I heard about 15.ai
>"HOLY FUCK YOU SHILL!!!"
Kek
>>
You guys are still waiting? Don't you know 15 was shot by the ceo of openai? I'm his long lost brother
>>
>>39732122
Very nice. Twi's voice is a little bit too high-pitched in some places; however, it's overall a good attempt.
>>
>>39732659
I'm pretty sure it's the overall percentage for the whole dataset. Considering that it's like nearly every piece of pony art ever, it's a lot to describe.
>>
File: 2310530.jpg (2.64 MB, 7016x4961)
2.64 MB
2.64 MB JPG
>>39732659
mostly abandoned. As far as I'm aware the backend and database are still completely functional, but
1. the existing dataset actually already performs fairly well with BLIP2, cf. Astralite's experiments. The plan for the future is to use the ~10k existing captions to develop a reward model to bootstrap a perfect captioning model from rating preferences, rather than directly writing captions (a rough sketch of the idea is at the end of this post).
2. I have 0 time to maintain anything anymore, working on a million other things. In particular, the domain cutemares.xyz is going to expire in about a month. Contact me if you know anyone that wants it.
3. GPT-4 demonstrates such absolutely stunning capabilities on image comprehension that I'm honestly pretty depressed about it all. Example for pic related:

>The image captures a cozy and warm scene set in a wooden cabin interior during a snowy evening. A picturesque snowy landscape of a quaint village is visible through the large window, giving off a serene vibe. The night sky is filled with stars, illuminating the scene with a soft, ethereal glow.
>A cute, green-haired pony with pink eyes, yellow fur, and a pair of glasses resting on its nose is lying comfortably on a plush, patterned, red armchair. The pony appears to be engrossed in a book, with a small, content smile on its face, which exudes a warm and relaxing atmosphere.
>Near the pony, a small globe with earth-like colors and a few open books are resting on a wooden table. This setup indicates the pony's interest in geography and knowledge.
>A fireplace with a crackling fire casts a warm, inviting glow on the room, further enhancing the snug atmosphere. A sleepy orange fox rests on a lush green rug near the fireplace, curled up in a tranquil and content state.
>A wooden shelf above the fireplace holds several items, including a model airship, books, a snow globe, and an amethyst crystal cluster. These items showcase the adventurous and curious nature of the pony.
>Next to the armchair, a plump pet bed cradles a ginger cat snuggling with its kittens. Their peaceful slumber adds to the overall cuteness and serenity of the scene.
>Overall, the image radiates a feeling of warmth, relaxation, and cuteness, depicting a perfect moment of peaceful respite on a chilly winter evening.

tldr the entire project is becoming increasingly obsoleted.
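
A rough sketch of the reward-model idea in point 1 (an assumption of what it could look like, not the project's actual code): score caption embeddings with a small head and train it with a pairwise Bradley-Terry loss on (preferred, rejected) caption pairs for the same image.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of a caption reward model trained from rating preferences.
# Inputs are caption embeddings from whatever encoder gets used (e.g. CLIP/BLIP2 text
# features); embed_dim is an assumption.
class CaptionRewardModel(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, caption_emb: torch.Tensor) -> torch.Tensor:   # [batch, embed_dim]
        return self.score(caption_emb).squeeze(-1)                  # [batch] scalar rewards

def preference_loss(model, better, worse):
    # Bradley-Terry pairwise loss: the preferred caption should score higher than the rejected one.
    return -F.logsigmoid(model(better) - model(worse)).mean()

# Toy usage with random embeddings standing in for real (preferred, rejected) caption pairs.
model = CaptionRewardModel()
better, worse = torch.randn(8, 512), torch.randn(8, 512)
loss = preference_loss(model, better, worse)
loss.backward()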

>>39733425
Funny you should mention that. Rumour has it that OpenAI will be releasing a TTS service within 1~2 months from now.
Both 11 and 15 are going to be irrelevant soon enough.
>>
plz, i want to pony TTS... please....
>>
>>39734554
>ClosedAI
>making anything irrelevant
lol
>>
>>39734554
OpenAI wouldn't use any IP for TTS voices because they would actually risk the banhammer coming down from multiple companies, including Hasbro. And it'll be closed-source still, like all their other stuff, so there will probably be crazy filters in place.
>>
>>39734554
>Funny you should mention that. Rumour has it that OpenAI will be releasing a TTS service within 1~2 months from now.
Source?
>>
File: 1676965212611425.jpg (1.65 MB, 10000x8905)
1.65 MB
1.65 MB JPG
>>39734554
thanks for the response
I just wish I could contribute to the overall PPP, but this was the only thing I could do with my skill set, and now that it's gone I've got nothing to contribute.
>>
>>39734554
>Rumour has it that OpenAI will be releasing a TTS service within 1~2 months from now.
And it will be the most cucked shit imaginable. "We've alerted the Florida Sheriff in charge of arresting you for being mean online" tier censorship.
>>
>>39735391
I wouldn't be surprised if this is like Whisper where they just open source it. It's not that big of a deal for them and a good open model would decimate the other voice AI companies which would benefit them in the long run, since they wouldn't get big enough to be able to pivot to their territory.
>>
>>39734751
SHUT THE FUCK UP
>>
File: th.png (710 KB, 1280x720)
710 KB
710 KB PNG
https://www.youtube.com/watch?v=84sVeYJ_poQ
Discord song
>>
>>39735389
Sorry. There are a lot of other datasets that *could* be useful in the future, but they would require an entirely new data collection platform for...
(Some small ideas: action description --> animation, pony music descriptions, RP logs with larger powerful foundation models)
>>39735391
>>39735189
For sure. Never going to get Twilight moaning from an OpenAI service. But at least 11 will sink.
>>39735459
Don't get your hopes up.......
>>
>>39735652
I dunno, I think there's too much of your own voice in there and it doesn't seem to sound enough like Discord. If John de Lancie were singing this song, his delivery would probably be more emotive.
>>
>>39732179
Live now. Animating.
cytu *dot* be/r/PonyPreservationProject
>>
Page 9 bump.
>>
Hey Vul, can you please change the default config.json for the 4.0 training colab to an eval_interval of 1000 instead of 800?
It's a little triggering, and also makes it difficult to directly compare 3.0 models with 4.0 models due to the inconsistency in saving rates.
>>
>>39686757
https://huggingface.co/Amo/so-vits-svc-4.0_GA/blob/main/ModelsZip/TreeHugger_69k.zip
Tried to train the Tree Hugger 30s dataset for two days straight but the result is not very impressive.
https://files.catbox.moe/57wjpc.wav

>>39686757
Also, here is a tutorial for training the So-Vits-SVC 4.0 models offline with your own GPU (if your PC can run TalkNet or Stable Diffusion, it should be able to run this training too; you just need to lower the batch size in the config file to 1).
Video tutorial:
https://www.youtube.com/watch?v=WIlq26jtr9k
Code instructions for creating training conda env:
https://github.com/AmoArt/UI_for_So-Vits-SVC/blob/main/folderDumpText/offlinesovit/info.txt
The ipynb training file:
https://github.com/AmoArt/UI_for_So-Vits-SVC/blob/main/folderDumpText/offlinesovit/sovits_EN_training_colab_4_0_offline_(231703).ipynb
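On the "lower the batch size" note above, a small sketch of the edit (not from the tutorial), assuming the usual so-vits-svc config layout with a train -> batch_size field; check your own config.json if the keys differ.

import json

# Hypothetical helper (not from the tutorial): shrink the training batch size so the run
# fits on a smaller GPU. The path and key layout are assumptions.
config_path = "configs/config.json"

with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)

config["train"]["batch_size"] = 1   # lower this until training no longer runs out of VRAM

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)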
>>
>>39736782
OK
>>
>>39698078
I seem to have run into an issue with the 4.0 inference notebook. Apparently it expects there to be a file that's not a zip?
>>
>>39737543
extract() step is supposed to extract the zip. Not sure why it's not in your case; seems to work fine with the default download.
>>
>>39737570
>https://huggingface.co/datasets/HazySkies/SV3/resolve/main/sovits_athena_44khz_10000_sv4.zip
Here's the file in question that I used in the "Other Downloads" step. Does it matter that I didn't press any of the "HF Downloads" buttons? I figured it would've worked and been quicker to just run the custom model. Not sure what I'm missing here. Athena is also nearing 20000 steps; 4.0 seems to be quicker than usual, though I might be imagining it.
>>
>>39737593
I see what the problem is: the .zip directly encloses the checkpoint files rather than a folder containing them.
>>
>>39737600
I've modified the notebook to work with .zip files structured like this.
>>
>>39737609
* I neglected to create an actual model directory for the ckpts, whoops.
>>
>>39737600
>>39737609
I see, guess that might explain why it dumped the checkpoints into the content directory prior to erroring.
>Modified the colab to work with that structure
Based, and fantastic. This is how the 3.0 ones are/were structured before, so this is another way to keep the training and inference methods consistent between the versions.
>>39737628
Oh, is this a result of the new format compatibility? Or an oversight before the change?
>>
>>39737642
I was making the change quickly and neglected to test the actual inference. It should be working now.
>>
>Page 9
>>
File: 1678949218488280.jpg (81 KB, 961x1061)
81 KB
81 KB JPG
Are we back?
>>
[SoVits] Athena wants to battle
>https://files.catbox.moe/u3ymwz.mp4

It's still remarkable how, with SoVits, we can ponify not only vocals but instrumentals too.
4.0 Athena is also now at 25k steps, and so is equal to her 3.0 counterpart in terms of training and is available here:
>https://huggingface.co/datasets/HazySkies/SV3/blob/main/sovits_athena_44khz_25000_sv4.zip
>>
>>39731589
who the fuck even are you anyway
show yourself bitch
>>
Okay, new Bing is kind of impressive. Hopefully it doesn't get too neutered.
>>
>>39728173
I've been listening to this all week. Please make a longer version.
>>
>>39740022
I plan on making it longer since these test runs are done without vibrato and I’m a little flat on Pinkie’s vocals. Thank you for such nice words.
>>
>>39731367
thank you anon you articulated it better than I ever could
>>
how's sovits for very small datasets? ie, side ponies with ~1 minute of dialogue that we'll never hear from again
>>
>>39741063
SoVits seems to do alright with very small datasets; I tried training Nurse Redheart a few weeks ago and she did alright. Although I may have added too many noisy/muddled samples to her already small dataset, and I only trained her to 7000 steps because I suspected the lower-quality dataset, so part of the lack in quality is likely on me.

Nurse Redheart Original lines: https://files.catbox.moe/nf5tfz.wav
SoVits 3.0 Nurse Redheart (7000 steps): https://files.catbox.moe/csbieh.wav

I plan to retrain her later on, particularly if I can actually manage to clean enough of her audio to be confident that she'll do well.
>>
Looks like the 3.0 sovits colab is down/broken; No module named 'numpy.typing'.

Might train 4.0 mares in the meantime. Speaking of, is anyone presently training 4.0 Celestia/Luna/Cadance?
I'm considering training them, as I need two of them (Celestia & Cadance) for my next song ponification, but wouldn't want to begin if existing training is happening already.
>>
>>39741597
Yes
>>
File: ai-light.png (424 KB, 767x834)
424 KB
424 KB PNG
How long until pony AI evolves to the point where it becomes incomprehensible to regular humans
>>
>>39741909
I support the mare singularity.
Singularimare? Maregularity?
>>
>>39738623
Nice!
Wouldn't it be better if you split the GB channels tho?
I assume you gave the whole music, and the AI had to decide "when" she could sing.
By doing the three channels (heck, you can try noise too), and mixing it afterwards, it would be better I think.

>>39739934
Sweet!
>>
>>39742000
Equine-gularity.

>>39742106
>Split the channels?
>Might be better
Yup. That's exactly what I had done this time around, unlike my previous tests with Fluttershy singing Terraria Underground.
My favorite is her bass track. For some reason mares do low bass-y and percussion-y stuff really well.

[SoVits] Athena Wants To Battle - Individual Channels
>Combined: https://files.catbox.moe/7hibfb.mp3
>Bass: https://files.catbox.moe/xg2cuc.mp3
>Bit1: https://files.catbox.moe/jdrqfb.mp3
>Bit2: https://files.catbox.moe/bt818l.mp3

The earlier video is also a little different sounding in that I used an "AM radio" filter in order to give it that old feel. Like it was being played from a GB speaker or something.
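
A minimal sketch of the mixing step with pydub (an assumption, not the poster's actual workflow; file names are placeholders for the converted stems above):

# Hypothetical sketch: overlay the separately converted stems into one track.
# pip install pydub (requires ffmpeg); file names are placeholders.
from pydub import AudioSegment

bass = AudioSegment.from_file("athena_bass.mp3")
bit1 = AudioSegment.from_file("athena_bit1.mp3")
bit2 = AudioSegment.from_file("athena_bit2.mp3")

combined = bass.overlay(bit1).overlay(bit2)   # stems must already share length and alignment
combined.export("athena_combined.mp3", format="mp3")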
>>
>>39741597
>>39741736
more 4.0 models
PRINCESS:
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Celestia%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Cadance%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Luna%20(singing)

CMC:
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Scootaloo%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Applebloom%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Sweetie%20Belle%20(singing)

GLIM: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Starlight%20(singing)

I'm going to switch to training only while I sleep because I want to do other stuff with my GPU/RAM during the day.
>>
>9
>>
I tried using BigVGAN (https://github.com/NVIDIA/BigVGAN) with TalkNet, as the 22 kHz model uses the same Mel settings. But the results are worse:
>TalkNet HiFi-GAN (downsampled to 22.05 kHz): https://files.catbox.moe/j9rhoy.wav
>BigVGAN Base 22kHz: https://files.catbox.moe/hoiyn0.wav
>BigVGAN 22kHz: https://files.catbox.moe/36u6l0.wav
Finetuning would help, but I'm not sure how much.
>>
File: new 15 maintenance page.png (10 KB, 1918x894)
10 KB
10 KB PNG
looks like 15 has a new maintenance page
>>
>>39743012
Many thanks for training so many models. I'm planning to add them to Docker volumes that I'll be redistributing for my project (see >>39722800). Please let me know if you have any issues with that.

Quick update on the project:
I have a functional Docker container for so-vits-svc 3.0 now and am able to upload audio via the UI, select a character from a dropdown, generate audio, and listen to the output via the UI. I have already made a lot of progress on the so-vits-svc 4.0 Docker container too and expect it to be functional within a day or two.

However, there are many limitations and I have many to-do items remaining before the project can be released. None of the pre- and post- processing options have been implemented yet, for instance, and only .wav input files work at the moment.

Since the Volumes Backup & Share extension for Docker Desktop is currently limited to 10GB per volume, I am planning to package up models in <10GB bundles. For so-vits-svc, this will likely take the form of mane 6 + CMC speaking, mane 6 + CMC singing, and then various assorted collections of other characters.

I still haven't settled on a name, but I have a few more ideas: "Neigh Say", "Herd Words", and "The Horse's Mouth". As a more generic name, I like how "Cross Talk" sounds, but it carries a negative connotation in electrical engineering. Another one that gave me a chuckle but I won't use is "Mare Swear". My favorite so far is "Hay Say", which Hazy suggested earlier.
>>
>9
>>
so when are we gonna start using this tech?
>>92184008
https://rentry.org/nv3q9
https://rentry.org/f34hy
>>
>>39745558
>>>/g/92184008



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.