Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.

AI is incredibly versatile; basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you're interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
derpy.me/FDnSk
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
GDrive clone of Master File now available >>37159549
SortAnon releases script to run TalkNet on Windows >>37299594
TalkNet training script >>37374942
GPT-J downloadable model >>37646318
FiMmicroSoL model >>38027533
Delta GPT-J notebook + tutorial >>38018428
New FiMfic GPT model >>38308297 >>38347556 >>38301248
FimFic dataset release >>38391839
Offline GPT-PNY >>38821349
FiMfic dataset >>38934474
SD weights >>38959367
SD low vram >>38959447
Huggingface SD: >>38979677
Colab SD >>38981735
XFL split archive >>39024207
Huggingface textual inversion >>39050383
NSFW Pony Model >>39114433
SD show backgrounds >>39418552
so-vits-svc offline ui >>39557072
Text to music model >>39570848 >>39571778
diff-svc mirror >>39567520
so-vits-svc tutorial >>39593802
Deep ponies app >>39576010
so-vits-svc on AMD/ROCm >>39578882
so-vits-svc for colab >>39592429
Potential new TTS >>39629052
Hardware heist >>39665051
New DeltaVox >>39678806
so-vits-svc 4.0 >>39683876
PPP REDUB 4 COMPLETE! See FAQs for link

>The PoneAI drive, an archive for AI pony voice content:
derpy.me/LzRFX
derpy.me/GOpGP

>The /mlp/con live panel shows:
derpy.me/YIFNt

>Clipper's Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
mirror: derpy.me/c71GJ

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>39623147
FAQs:
If your question isn't listed here, take a look in the quick start guide and main doc to see if it's already answered there. Use the tabs on the left for easy navigation.
Quick: derpy.me/FDnSk
Main: derpy.me/g3fFA

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: derpy.me/A8Us4
How to get the best out of them: derpy.me/eA8Wo

>Where can I find content made with the voice AI?
In the PoneAI drive: derpy.me/LzRFX
And the PPP Mega Compilation: derpy.me/GOpGP

>I want to know more about the PPP, but I can't be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
derpy.me/pVeU0
derpy.me/Jwj8a

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There's always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we'll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
derpy.me/CQ3Ca

>Links
/mlp/con: derpy.me/tledz derpy.me/14zBP
PPP Redubs:
Ep1: derpy.me/xZhnJ derpy.me/ELksq
Ep2: derpy.me/WVRAc derpy.me/RHegy
Unused clips: derpy.me/VWdHn derpy.me/OKoqs
Rewatch Premiere: derpy.me/EflMJ
Ep3: derpy.me/b2cp2 derpy.me/RxTbR
Ep4: drive.google.com/file/d/1iQzIeGZTbxcu2BJcPHtmyxtiu2mV-MhN/view?usp=sharing pony.tube/videos/watch/c2a0b52d-344a-4240-a415-9a303b734777
Rewatch Premiere: mega.nz/file/4AklSZgI#9cAkzRz-81BGNX7dKKpOChDAg3x3KCoBKuSa1lJfDVs
>>39686748
Anchor.
>>39686727
Mares?
>>39686854
Generally, I find that converting a man's singing voice to a woman's requires you to raise the pitch by a fifth (7 semitones).
>>39687473
That's gonna depend heavily on the vocal range of both. There's also a concern of instrumental accompaniment if you're doing a quick vocal swap, which usually sounds crummy if you pitch shift it too much.
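For anyone who wants to try the shift being discussed, here is a minimal sketch assuming librosa and soundfile are installed (the file names are illustrative, not from the thread):

import librosa
import soundfile as sf

# load the source vocal; sr=None keeps the original sample rate
y, sr = librosa.load("male_vocal.wav", sr=None)
# raise by a perfect fifth; each semitone is a factor of 2**(1/12)
y_up = librosa.effects.pitch_shift(y, sr=sr, n_steps=7)
sf.write("vocal_up_fifth.wav", y_up, sr)

As the reply above notes, the right n_steps depends on both vocal ranges, so treat 7 as a starting point rather than a rule.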
Fluttershy so-vits-svc 4.0 (30400 steps, 1689 epochs): https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Fluttershy%20(singing)
Combined 4.0 model (before): https://files.catbox.moe/9le1sr.mp3
Individual 4.0 model (after): https://files.catbox.moe/edsxj9.mp3
>>39687790
I'd be interested in seeing some individual comparisons between 3.0 and 4.0, not just combined vs combined or combined vs individual.
Also, can you link the reference audio for that?
>>39686116
>>39686203
Applied some color changes to make it more similar to the OP pic, let me know if you prefer this color palette
>>39688065
It's an improvement, but I would still say to keep the OP pic how it is.
>>39688070
it is not supposed to replace the OP pic, friend. just a little logo for autism purposes
>>39688000 (checked)
Sure.
3.0: https://files.catbox.moe/rf9muj.mp3
Ref: https://files.catbox.moe/25xcyl.mp3
3.0 is quieter because it was trained before volume normalization was implemented upstream.
>>39687790
Wow, individual Flutters sounds great!
>>39688118
So 4.0 adheres to volume? That would explain why I had to manually do the reduction and fade out for the end of the MAS RD kitchen audio.
I had a quick read through the translated github of 4.0, and thus far it seems it uses 44kHz instead and uses even less memory? So that means longer samples at once without risk of OoM. It mentioned other features too, but are there any changes between them that perhaps aren't mentioned on the github? Like perhaps features this "ContentVec" introduces.
>"The data set production and training process are consistent with 3.0, but the model is not universal at all, and the data set also needs to be re-preprocessed"
Not sure what this means (possibly mistranslated). So the process is the same, but still isn't compatible? I wonder what this re-preprocessing is about.
>>39688364
Yeah, he means that it trains just like 3.0, but old models don't work, and you'll need to run resample.py on a fresh dataset, since it's resampling to 44kHz now.
>>39688364
The volume normalization I am referring to is simply adjusting the volume level so that the peak is at 0dB. In most cases this results in a volume boost. This was implemented at some point during 3.0, but not all of the 3.0 models were trained with volume normalization enabled.
>uses 44kHz instead and uses even less memory
I believe so, at least for inference.
>perhaps features this "ContentVec" introduces.
From my very limited understanding, ContentVec is a way of encoding the "content" of speech in a manner disentangled from variations introduced by the speaker. It's supposed to be a higher performance replacement for HuBERT, which was used in 3.0. I'd expect this to mean better similarity to the target character given arbitrary reference audio (which seems fairly certain given the models I've trained thus far).
>Rest
>>39688591
Is correct. Also the use of ContentVec instead of HuBERT for feature extraction (although most of the actual code seems to still use "hubert" or "h" for naming purposes).
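For concreteness, the peak normalization described above is just a rescale so the loudest sample hits 0 dBFS; a minimal sketch with numpy and soundfile (file names illustrative):

import numpy as np
import soundfile as sf

y, sr = sf.read("clip.wav")
peak = np.max(np.abs(y))
if peak > 0:
    y = y / peak  # loudest sample now sits at exactly 0 dBFS
sf.write("clip_normalized.wav", y, sr)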
>>39683324
>https://desuarchive.org/mlp/thread/37983482/#38002630
Fuck you.
Yet another svc shows up.
https://github.com/yxlllc/DDSP-SVC
Should be trainable even on a GTX 1660
>>39688871
Any samples? My experience with ddsp/sawsing was that the voices were pretty rough.
Week count?
>>39689595
>>39689607
It's 2, isn't it?
>>39688118
Hm, okay, that actually sounds rather noticeably better. Curious about normal talking too, but the singing is definitely better. 4.0 is probably worth retraining for.
>>39688791
You're not wrong, but for the sake of historical accuracy keep in mind that exaggerating his "achievements" was a part of his attentionfagging. In the same thread where it was discovered that he's the Noxfag, someone claiming to be the allahfag said that at least that part of this post is bullshit. That was while Delta was still denying all accusations.
>>39690236
I wouldn't be surprised if the alleged allahfag was Delta himself trying to soften the accusations because he already knew he was outed by then.
>>39659495
>(Re: ControlNet) I could imagine that a new model could have potently be trained on fanart and leaked animation images too to allow for same control over the pony models?
>Could you use a pose estimator(?) model that is trained on the leaked flash animation files, to estimate the pose, bone transforms and 2D assets that are used in a post-rendered animation (or something along those lines)?
I think so, yes. It should be possible to deterministically extract pose information from the animation data, train a pony pose extractor from that, and convert all show data into pose-to-image training data. We'd need to train our own pose extractor since the current pose models only work on realistic human images. They supposedly don't work well for cartoon humans, let alone ponies.
>>39660412
With Astra's Pony models plus ControlNet, I've changed my mind on the usefulness of 2D image generators for animations. My guess right now is that a pony model, custom ControlNets, upscaling/downscaling, and in-painting could be enough for creating show-quality animations without using any vector graphics after training.
I think we'll need to train our own ControlNets. I've seen examples similar to >>39660533 of it working with animals, but I don't expect the standard ControlNet models to be robust for ponies, especially since, e.g., the pose module has no way to explicitly specify horns and wings >>39660672. If we can get a custom pony pose ControlNet, the only additional thing we'd need is a way to generate a frame-by-frame sequence of poses, which should be much easier than trying to generate the full frame-by-frame vector data. For background & assets, the current ControlNets + a large background & assets LoRA might be good enough.
>>39666621
Fixed, and I replied in the Fan Site thread. I'm waiting for someone to test it.
>>39687483
I'm still around, but my time's been hijacked recently so I haven't been checking the thread that often. I'm going to be spending a lot less time on data and a lot more time on random AI things. You're probably not going to see much output from me for the next few months.
>>39683021
>>39683324
Re: training code for LLaMA.
https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama
>>39690397
>https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama
No information on hardware requirements, it seems, although I assume you can't finetune these models on a single consumer GPU.
so-vits-svc 4.0 Applejack singing (2730 epochs, 46400 steps)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Applejack%20(singing)
Individual (new): https://files.catbox.moe/121dvr.mp3
Combined: https://files.catbox.moe/wzyzda.mp3
3.0: https://files.catbox.moe/444jde.mp3
Ref: https://files.catbox.moe/8fcsnt.mp3
The difference is a bit less clear here. You might generously attribute this to my own pronunciation, which was not my best Applejack impression.
Is it just me or did Rainbow Dash's singing voice get a lot harsher throughout the show?
>>39691035
keep at it, when i finally find some free time I'll test with my aussie accent to see how well it performs compared to last time, heh
>>39691035
>Added automatic pitch f0 prediction for voice conversion (will be out of tune if used with singing voices)
So what kind of setting should one use in the training Colab if they want to train a singing voice?
I finally decided to try it out for myself, it isn't as complicated as I thought it would be.
In my first test with TalkNet I tried to make the ponies speak Spanish. Trixie seems to do it best
https://files.catbox.moe/jqjr85.wav
https://files.catbox.moe/u1sjw2.wav
In my second test, I wanted Starlight to sing the Soviet anthem, but I didn't find her in the colab version of TalkNet so I just did it with Trixie. It didn't go as planned
https://files.catbox.moe/uf6gu6.wav
reference audio: https://files.catbox.moe/plaq4r.mp3
https://files.catbox.moe/945njf.mp3
I think the main problem is my bad pronunciation on some words that causes the ponies to sound strange. As for the bad quality of the singing audio, I have no idea, maybe just bad intonation?
If this is what a complete newbie is able to do with TalkNet, I can't wait to try out 15.ai
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject
some previously unleaked fla files that were circulating privately and being gatekept
https://files.catbox.moe/vwjj0q.7z
>>39686757
Tutorial for training the So-Vits-SVC models
https://www.youtube.com/watch?v=pXAQCTEZrFU
NamelessHero_eng (it's the 3.0 model, I will get version 4.0 trained and uploaded onto huggingface sometime in the future)
https://mega.nz/file/M1803RTT#_a2MrdVo-BZq0S-g59NBsYzDWyC75gTvUvH0KhYFrGs
>>39692703
>76 files
>All EQG
No use for any of that here, and not a significant quantity compared to everything else we already have from the studio leaks, though interesting to know that there's unreleased stuff out there. Would there happen to be anything else from the same source?
>>39692837
Slight correction - it's not ALL EQG; 7 of the .fla files are pony, of which 6 look like fan concept images and only one is actually animated - a collage of portraits of a simple pony talking, pic related. Still wouldn't say this is particularly useful for our purposes.
>>39692902
>fan concept images
no, they're source files for official promo images and some of them (like royalty.fla) contain S1 animation.
for example Hasbro used to distribute Friendship_2 a lot
https://www.wsj.com/articles/discovery-to-take-control-of-the-hub-network-1410979842
so-vits-svc 4.0 Twilight singing
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Twilight%20(singing)
Samples:
https://files.catbox.moe/vbdc91.mp3
https://files.catbox.moe/9yxmmt.mp3
https://files.catbox.moe/hvq4yj.mp3
>>39691673
Automatic pitch prediction is an inference option; it's not involved in training.
>>39691988
TalkNet uses English phonemes. You might have some better luck with so-vits-svc 3.0 (or 4.0 once I get models trained), although at this point I don't know how easy it is for a beginner to actually set it up.
new maka using so-vits-svc
https://www.youtube.com/watch?v=ZQNZxDJiRBA
>>39693094
I rate it generic/10. Also the animation seems a bit wonky.
>>39692703
Ah right, I remember seeing those posted before: https://desuarchive.org/mlp/thread/34819795/#34831574
>>39693094
Ugh. Another "I know sad is bad, but don't feel sad, I'm here for you" story? Christ.
>>39686748
I'm curious: did anyone here see any of the posters put out by the AGU fall meeting? Or did anyone besides me submit a poster?
>>39692902
is that the adobe animate from synthbot?
for some reason when i downloaded it again it required a password to extract
i got the program from a torrent site instead which was pre-activated
everything seems to be working right but since its a torrent im a little nervous
>>39692331
Live in ~45 minutes. Animating.
cytu *dot* be/r/PonyPreservationProject
>>39694760
>is that the adobe animate from synthbot?
Yes.
>Password
iwtcird
https://desuarchive.org/mlp/thread/38293341/#q38335424
>>39695543
thank you so much
have this twi i was playing with on me
>>39692902
that looks like a scene from Rarity's "Becoming Popular" song from Sweet and Elite.
Rarity and Rainbow Dash singing models, so-vits-svc 4.0.
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rainbow%20Dash%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rarity%20(singing)
Samples:
https://files.catbox.moe/c2lmbb.mp3
https://files.catbox.moe/1o0g0k.mp3
https://files.catbox.moe/ccbffa.mp3
https://files.catbox.moe/7s76wf.mp3
I also made an alternate Rainbow Dash model, being selective with some data for a less harsh voice (closer to what she used in S1). The effect isn't very dramatic, but it's there.
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rainbow%20Dash%20(singing%20alt)
https://files.catbox.moe/vap9dd.mp3
For fun:
https://files.catbox.moe/gzkobw.mp3
https://files.catbox.moe/02ukpo.mp3
>>39696145
>For fun:
>https://files.catbox.moe/gzkobw.mp3
>https://files.catbox.moe/02ukpo.mp3
Really nice. Adding those to my playlist. One day I'll be really shocked when I hear mares singing Roundabout.
Any chance for a Colab version for the So-Vits 4.0?
>>39696385
I'll start working on one if no one else is doing it
>>39696589
Yes please
>>39696145
Second Dash voice is definitely better, though still some ways off.
>For fun
Nicely done, the voices are still impressively capable despite the current lack of accuracy.
>>39696145
I'm trying to install so-vits 4.0 on my Windows machine but I'm getting "error: no module named 'fairseq'". Using 'pip install fairseq' fixes it.
Also, a quick message popped up about a clustering model not being found (but it's working fine, so dunno)?
And there is a problem with the UI being way too tall; I can't access the preview button (and I can only just about press the "convert" button). Could it be possible to make the UI window scalable (or add an option to make it 25% smaller)?
>>39697031
Thanks. I've added fairseq to requirements_win and reduced the size of the UI for 4.0.
>>39697031
>>39697629
Also, the lack of a clustering model is not an issue.
PITS, successor of VITS, just released most of their code except the training code.
Demo and model are released.
https://github.com/anonymous-pits/pits
https://huggingface.co/spaces/anonymous-pits/pits
>>39692703
This is amazing, thank you anon. idgaf if it's EqG or "not of use", any assets related to the show are fucking gold. Are there more gatekept files like this? Do you know the source?
>>39696598
so-vits-svc 4.0 inference notebook
https://colab.research.google.com/drive/1128nhe0empM7u4uo5hbZx5lqjgjG1OSf?usp=sharing
We just dumped the MLP Leapfrog game, there's a ton of original voice lines and music! There is a data.arc that seems to contain all the rest of the assets; we were only able to extract a single png and gif from it though. Perhaps one of you datachads could assist in decompiling it further?
Dump: https://www.mediafire.com/file/7sc6wnnaexv3pch/MLP_Leapfrog_Game.zip/file
Tool that extracted two images: https://github.com/panzi/mediaextract
>>39698236
This was the only relevant asset extracted from data.arc so far
>>39698078
Very neat, but can there be an option to download and use custom models too? (like if someone shares their things on another HF page, or Mega, or Google Drive?)
>>39698389
I have an inference notebook here that supports custom links; I was told by Amoart's tutorial to post it here.
https://colab.research.google.com/drive/1xUK-bdsrGoV5PUJz3z1cArTH0of1CWON#scrollTo=IpksmWvf0v1H
I have my own translation of the training colab too, but you guys seem to have your own anyway
>>39698403
I can't test this script, there is no G_ model in the provided testing model and the vul singing models are not uploaded in a zip....
>Never Gonna Give You Up 4K.mp4 1.73GB
oh, you are just shitposting, never mind then.
>>39690386
I'm not too sure as to the progress that has been made in the logic behind animating with AI, as to whether the whole image will be generated or pre-made vectors will be posed by an AI, but I was thinking about a way that could make it possible to automatically animate speech as well:
If you have a mouth shape for each letter that you're trying to lip-sync to (like is standard in 2D animation I believe, and was done in MLP), could you get an audio-to-phoneme extractor AI thing with timestamps, then just pick out the mouth shape file that is assigned to each phoneme in whatever you're lip-syncing to for each of the timestamps? Then all you've got to do is make some script to compile those image files together into a video at the frame specified by that timestamp, right? It seems a little too simple to not already be made, since Youtubers that use animated characters in their videos seem to commonly agree that lip-syncing is the majority of the time spent animating, but if there was an automatic one, wouldn't they all be using it already? Is it just that audio-to-phoneme AI is that new, or is this idea just flawed?
I haven't really got the knowledge in AI to write that code and troubleshoot through all those steps to check the validity of it, which is why I'm just posting my ideas, as I'd like to try to contribute in some way. And like the previous ideas I posted, it seems to be plausible at face value.
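To make the idea concrete, here is a rough sketch of the stitching script described above, assuming the phoneme timestamps already exist (e.g. from a forced aligner) and that all mouth images share one size; the viseme table and file names are purely hypothetical:

import cv2

# hypothetical mapping from aligner phonemes to pre-made mouth images
VISEME_FOR_PHONEME = {
    "AA": "mouth_open.png",
    "IY": "mouth_wide.png",
    "M": "mouth_closed.png",
    "F": "mouth_teeth.png",
    # ...one entry per phoneme in the aligner's phone set
}

def render_mouth_track(segments, fps=24, out_path="mouth_track.mp4"):
    # segments: list of (phoneme, start_sec, end_sec) from the aligner
    duration = max(end for _, _, end in segments)
    frames = []
    for i in range(int(duration * fps)):
        t = i / fps
        name = "mouth_closed.png"  # default when no phoneme is active
        for ph, start, end in segments:
            if start <= t < end:
                name = VISEME_FOR_PHONEME.get(ph, "mouth_closed.png")
                break
        frames.append(cv2.imread(name))
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:
        writer.write(frame)
    writer.release()

The resulting clip could then be overlaid on the character in an editor, which matches the "reposition it in a video editor" step discussed in the follow-up post further down.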
>>39698431
the samples on the google colab are a shitpost. Just change the link to a zip file as shown in the image example and it should work (you don't have to use the D.pth file; just the G.pth and the config.json is fine)
>>39698468
well, if it works, can you change it so the code will give you a list of speakers to pick from? Digging inside the config file and typing all the data files by hand is a bit of a downgrade from what we have in all the other UIs.
>>39698236
rude
>>39686757
https://files.catbox.moe/8gzdfo.mp3
https://huggingface.co/Amo/so-vits-svc-4.0_GA/resolve/main/ModelsZip/NamelessHero_eng.zip
Nameless Hero trained to 18400 steps. The voice still trips up on some simple reference audio noise (like breathing or a light change of pitch in the middle of the audio), but it's better than the 3.0 version.
>>39698468
There are several little improvements that could be added to your script, like saving the file in wav format so it's easier to share, or adding a playback button for the most recent audio conversion.
Also, for the download zip cell, I would recommend adding some sort of string check converter that, when detecting an HF link like this:
https://huggingface.co/Amo/so-vits-svc-4.0_GA/blob/main/ModelsZip/NamelessHero_eng.zip
changes the "/blob/" element to "/resolve/" (it took me a while to figure out why it refused to download that zip link).
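The suggested link fix-up is a one-line rewrite; a minimal sketch (Hugging Face "/blob/" URLs point at the web viewer, "/resolve/" at the raw file, which is why the zip download fails):

def hf_to_direct_url(url: str) -> str:
    # rewrite a Hugging Face web-viewer link to a direct-download link
    if "huggingface.co" in url and "/blob/" in url:
        return url.replace("/blob/", "/resolve/", 1)
    return url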
>>39698528
kek, keep posting stuff about this
i love preservation of almost everything so a game for an obscure edutainment console for children spikes my neurons
imagine a world where the my little pony leapfrog game became lost media, what a tragedy that would be
I assume 15's next update is going to remove most if not all of the artifacts and have 100% character accuracy, right?
>>39688791
>>39690236
>>39690239
What the fuck is going on
>39698674
https://github.com/svc-develop-team/so-vits-svc/tree/4.0
Future updates for so-vits-svc have moved here, as the original author dropped off the project for good
>>39698674
>I assume 15's next update is going to remove most if not all of the artifacts
Yes.
>and have 100% character accuracy, right?
Yes.
Plus, there will be a new UI with many new quality of life improvements and features.
The most difficult problem by far is reconciling the three big problems: stability, character accuracy, and artifact removal. For most configurations, you can have two but not all three. Achieving all three at once is the holy grail, which at one point I thought was impossible without some major modifications.
>>39698968
yippee you're alive, hows the update goin'
>>39692703
>>39693745
Many thanks for these, I've put what I believe to be all 2013 - 2015 leaks into a single zip now
https://www.mediafire.com/file/i9dpmyg237p5d55/2013_-_2015_MLP_Leaks.zip/file
>>39698674
there won't be a next update.
15 transitioned and now works for elevenlabs under the name 11
>>39699260
Based, thank you
>>39699264
Does it make you feel good to be a bootlicker?
>>39699289
he gets a nice big paycheck
>>39699349
The company worship in this thread has gone insane.
>>39699353
have you bought skyrim yet?
>>39695543
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject
>>39698078
>loads fast
>sounds good
>button to delete all audio files in Colab
You absolute legend.
What's the best ffmpeg command to extract clean voices out of the 5.1 surround episodes?
Right now I'm using -af "pan=mono|c0=c1-7*c2" but it still has some sound effects
>>39698078
Do all the existing models load into the colab?
>>39698389
Updated, but not tested on GDrive or MEGA zips.
>>39699458
All of the 4.0 models I'm aware of load into the colab.
>>39699452
AFAIK the FC channel for the episodes always has some sound effects; noise removal tools like iZotope RX were used afterwards to clean up the voices.
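For reference, one common way to grab just that center channel before denoising, assuming the dialogue really does sit mostly in FC as described (the input/output names are illustrative):

ffmpeg -i episode.mkv -af "pan=mono|c0=FC" voices_fc.wav

The named-channel form saves you from remembering which numeric index FC lands on in a given 5.1 layout.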
ost https://youtu.be/srudXQsiT4I
Also, does anyone have the poses from the game icon? Clipper is looking hard on the stream but coming up a bit short. Would be nice to recreate the icon in an HD format.
I'm trying to run the "Fimfarchive tools - release v4.ipynb" tool but I'm getting errors from the 'assert' code, here:

import glob
fimfarchive_path_candidates = glob.glob(f'{ppp_path}/story-data/fimfarchive - *')
assert fimfarchive_path_candidates, "You need to add the pony-preservation-project folder to your google drive."

AssertionError: You need to add the pony-preservation-project folder to your google drive.

I do have the folder shortcut linked to my google drive, so I do not know why it can't seem to detect it. Any ideas why it's not working and how to fix it?
https://archive.org/details/mlp-leapfrog-game
archive.org page is up
so-vits-svc 4.0 Pinkie speaking models
All speaking data mixed: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Pinkie%20(speaking%20mix)
Separated by emotions: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Pinkie%20(speaking%20sep)
Here's a bit of an experiment: separating the model into speakers by emotion. The resulting model is quite temperamental (some of the speakers just straight up don't work) but it can produce some interesting results. For most purposes I would prefer the mixed model though. If others could test this out and draw their own conclusions I'd appreciate it.
For fun (separate emotions):
https://files.catbox.moe/h9kn99.mp3
As a side note, it appears that my previous assumption that clustering would not do anything for single-speaker models was incorrect--I trained a cluster model for the mixed model.
cluster ratio 0.0: https://files.catbox.moe/q7tgsl.mp3
cluster ratio 1.0: https://files.catbox.moe/2eitez.mp3
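For anyone reproducing the cluster-ratio comparison offline: the stock so-vits-svc 4.0 inference script exposes the cluster model and ratio as flags. A sketch with hypothetical paths and speaker name (flag spellings vary between forks, so check inference_main.py in yours):

python inference_main.py -m logs/44k/G_108800.pth -c configs/config.json -n ref.wav -t 0 -s pinkie -cm logs/44k/kmeans_10000.pt -cr 0.5

A ratio of 0.0 ignores the cluster model entirely and 1.0 uses it fully; the two samples above are those endpoints.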
>>39700278
Do you think models could be trained for Flim and Flam? Or is lack of data still too big of an issue for that?
[SoVits] Athena - Shawn Song
>https://files.catbox.moe/yramfb.mp4
Not quite the final version, still got some further syncing and redoing to do.
May also see how 4.0 Athena does with it once I get her trained up.
ey Clipper, what was the radio you were listening to in the stream? I can't seem to find any /mlp/ horse radio
>>39700567
>Shawn
More like Xiang
>>39700567
Sounds like she's been getting into Shawn's lemonade again
>>39700679
https://radio.horse
>>39700681
Yeah, I noticed that SoVits, or at least 3.0 (no idea if 4.0 differs), has a strange flaw. If the "sh" sound is too short, it'll ignore the hissing "h" sound altogether.
I wonder if there's a way to rectify this and force it, and/or if 4.0 will have the same issues with it. Could this also be a result of being only lightly trained? I suppose I'll find out for certain once I train 4.0 Athena, which will likely begin this Friday.
>>39700278
Follow-up: clustering models have been trained for all the existing so-vits-svc 4.0 singer models and are located in their respective directories on huggingface.
https://files.catbox.moe/6qsoif.mp3
>>39701999
Checked and kek
>>39701999
Damn, that melody though
>>39701999
holy based trips and reference
Did 15 talk about what to expect in a future update?
>>39702240
15 more years, trust the plan
>>39701999
good content anon
>>39701999
Trips of awesome.
>>39699387
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject
>>39698078
It was working, but today it's giving an error about fairseq not being found when trying to infer in the final step. Installing fairseq in the setup step leads to an "object is not iterable" error upon running the final step.
>>39703142
Colab updated to Python 3.9, which appears to have broken the fairseq install. Updating pip seems to fix the issue.
>>39700367
I think it's possible for speaking models to be trained for Flim and Flam. Most of the singing data is quite noisy though.
Does anybody know the performance of crepe vs. dio vs. parselmouth for f0 detection?
>>39702240
She said she would reveal more about the 11.ai partnership and transition process
>>39703599
How much are you getting paid for this exactly? Genuinely curious.
>>39698465
That sounds doable, though it would take some manual labeling work (which assets correspond to the mouth), it might require manual work aligning audio to animations, and it's unclear to me how animators would use it. Maybe such a tool could generate asset files, which animators can load into their animation software and pose.
>>39700014
Fixed. That was my bad. I changed the "story-data" folder name to match other data folders, not realizing that doing so would break this script.
>>39703369
This is supposedly good, according to VA Anon's tests.
https://github.com/effusiveperiscope/so-vits-svc/pull/2/commits/218c55a6ebb67a5eba4b7f5d2744dcbe8b6ed3a7
>>39698078
So I clicked through the buttons down to the models and downloaded my models. Where are they stored, and how do I run the GUI to get started?
>>39704080
Run the last cell
Pony version when?
https://www.reddit.com/r/artificial/comments/11l3muz/i_made_tinder_but_with_ai_anime_girls/
>>39703253
How?
>>39704232
It's already fixed, just click run all. Something in one of the steps fixes it.
?????
>>39704738
pony?
>>39704738
and before anyone asks, despite being a Blu-ray release, it only uses the left/right channels, so no voice extraction to shitpost in the Director voice
Just had an idea for the 8x A100s: train large chatbots and voice models otherwise impossible on consumer-level hardware. There's LLaMA 65B, GPT-X, and many more models with too many parameters to fit on an ordinary PC. Would existing voice models benefit from having more parameters and data?
>>39704204
god no. that shit is as bad as real Tinder -- swipe right 20 times for 1 match. Worse still, they have a pay-to-win subscription.
>>39704796
>train large chatbots and voice models otherwise impossible on consumer-level hardware.
yeah no shit, grass is green too
>>39703763
>Maybe such a tool could generate asset files
If you mean like generating mouth shapes from scratch, I'm sure that's possible, but my current idea would work by just having however many pre-made mouth shapes as images to pick from. That way you wouldn't get any randomness/jitteriness that is currently present in Stable Diffusion videos. That's also the main reason I feel like using vector images/images would work better than just fully generating a character. If you can pose each image correctly using AI, then you're good. But if you generate a character purely from AI, you've got to pose it AND generate it, and generating it could lead to all sorts of weird problems, which could mean the exact shape of parts of a character wouldn't be a sure thing. I'm sure that could be fixed, but it's just a lot easier to use images to get the same result, from what I know about it at least.
>and it's unclear to me how animators would use it
If you had something like 18 different images (I don't actually know how many there are usually) you use to animate mouth shapes, each labelled to a different vowel/consonant/phoneme, you could do what I had suggested before by getting the timestamps and phonemes, then picking out each image assigned to that phoneme and stitching them all together at the specified time in a video. Then, if you had images with transparent backgrounds for the mouth shapes, you could just drag and drop the animated video into the one with your character, and then just place the mouth video in the correct spot on the character for each frame. For AI generated animation, you could maybe assign/train a specific place to put the mouth video file/frames, on something like those control rig things. Maybe even just combine the above logic into an animation AI somehow to remove a few steps.
Not much of my idea is automated apart from picking out the right mouth shape for the right timestamp, but the part that is automated usually is the thing that takes the longest when lip-syncing. It's pretty easy to reposition something in video editing software, as opposed to fully lip-syncing something. So what I'm saying essentially is, there's still work you'd have to do, but it would hopefully be significantly less.
>>39686757
>>39700278
so-vits-svc 4.0 Rarity speaking (single speaker):
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rarity%20(speaking%20mix)
It seems to me that the best approach might be to combine the data and train separate (smaller) models for additional tones as needed (e.g. for whispering, screaming).
Another thing that I've discovered is that you don't need to train cluster models on the same dataset that the base model was trained with, as long as the speaker name is the same as in the base model. I have no clue what the effect of using a cluster model trained on a different dataset is, but I've updated the 4.0 offline GUI to give an option to load a custom cluster model just in case it turns out that they can be used like voice LoRAs or something.
Also, I've added an option to the 4.0 offline GUI to use crepe for pitch detection if you have tensorflow and crepe installed. It's much slower, and I wouldn't necessarily say it's better than praat/parselmouth, but crepe sometimes succeeds in areas where the former fails and vice versa.
>>39705179
(I forgot a picture). Normal cluster model used with 0.8 ratio.
https://files.catbox.moe/gh2lvl.mp3
https://files.catbox.moe/c32nuu.mp3
>>39705179
Was this done purely with so-vits, or was the reference audio run through TalkNet? The flow seems off to me.
>>39705318
Purely with so-vits. I wasn't trying to copy Rarity's flow.
>>39705179
On Windows, when selecting the output device and ticking the "autoplay" checkbox, it crashes the UI with this message:

Traceback (most recent call last):
  File "inference_gui2.py", line 393, in update_init_audio
    mixer.init(devicename = self.selected_dev.deviceName())
pygame.error: No such device.

The normal play button still works, so that wouldn't stop me from doing any audio conversion; it's just a weird bug I've noticed, so ehhh.
>>39705415
WOMM. I changed the code to make it use pygame to look for device names though; lmk if it works.
>>39704796
>LLaMA
the best thing you can do now, i think, is to host the pure 65B or a few 30B instances on the Kobold Horde. it is absolutely insane how amazing it is, and it deals very well with fimfiction as well. starting with this prompt: https://aetherroom.club/1002, interjecting the memory part between "-- key notes --" and "-- story cont --" every generation while removing a paragraph for every paragraph generated, I got this. it is on nat.dev, and I really don't know what the deal is with that site; it lets people compare models, but i worry it might be very slimy with data collection or just kill itself after a few days, and it only popped up yesterday. it is less comfy featurewise than KoboldAI (generations are fast and it's as good, but that's because only one person hosts 13B llama on the horde).
>>39705460
>nat.dev and I really don't know what the deal is with that site,
It has a hidden quota
I'm having trouble with TalkNet. It fails on step 3 and rerunning the cell isn't fixing it.
>>39705504
hmm, not the colab anon, but whenever colab started giving me a fit in the past, i've just installed the modules it was asking for like this:

!python -m pip install --upgrade pip
!python -m pip install --upgrade dash
!python -m pip install --upgrade jupyter_dash
!python -m pip install --upgrade nemo
!python -m pip install --upgrade dash_bootstrap_components
!python -m pip install --upgrade torch_stft
!python -m pip install --upgrade kaldiio
!python -m pip install --upgrade pydub
!python -m pip install --upgrade frozendict
!python -m pip install --upgrade unidecode
!python -m pip install --upgrade pystoi
!python -m pip install --upgrade pytorch_lightning
!python -m pip install --upgrade g2p_en

but now i'm getting this error, and those are not python modules.
>>39702240
trust his plan or something
>>39705504
Error message?
>>39704177
Ah, I see it's really hosting it on Google's computers. Is there a way to have the colab install and run everything on my local machine? I have TalkNet running locally and I'd like to have So-Vits as well. Is there a .bat that can run and install everything like the colab did, or do I follow Amoart's tutorial install video anyway?
>>39707207
Colab cannot install anything on your machine. Currently there is no installation .bat. IIRC Amo's install video is for 3.0, not 4.0.
>>39705504
>>39705533
Updated
https://colab.research.google.com/drive/1sAbqSQj9P56TTpsU7bzbobzAxmydvUSA?usp=sharing
>>39698968
I think 15 fell in the shower (2) weeks ago, and that you're just impersonating them. How would you confirm you're legit?
Twilight (speaking) so-vits-svc 4.0 model (40 epochs, 43200 steps)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Twilight%20(speaking%20mix)
https://files.catbox.moe/nju4ny.mp3
https://files.catbox.moe/dp1wba.mp3
Extra cluster model for Rarity: https://huggingface.co/therealvul/so-vits-svc-4.0-cluster/tree/main
https://files.catbox.moe/vq36aa.mp3
https://files.catbox.moe/dg2tff.mp3
>>39708556
>https://files.catbox.moe/nju4ny.mp3
>https://files.catbox.moe/dp1wba.mp3
Is this bad input or something? I swear the 3.0 speaking model was sounding really good.
>>39708580
3.0 straight: https://files.catbox.moe/r9ckqi.mp3
3.0 thru TalkNet: https://files.catbox.moe/at2hij.mp3
4.0 thru TalkNet: https://files.catbox.moe/dvhwtc.mp3
It's possible that the model is undertrained (considering the size of the dataset), or that the different dataset has an effect on the final result (I believe 3.0 was trained on some smaller subset of the speaking data available for Twilight). 4.0 straight may underperform compared to 3.0 run through TalkNet first. Will try further training.
>>39707243
yea, but you can run colab shit locally with jupyter
https://robotwealth.com/how-to-connect-google-colab-to-a-local-jupyter-runtime/
>>39708731
Neat, good to know. That might be a good option.
I know nothing about computers, but could a chatbot similar to ChatGPT be made to help write pony fiction? While ChatGPT isn't open source, I wonder what it would take to make a similarly strong chatbot and train it on strictly FiM content and see what would happen.
>>39708786
Something like that might be happening in the coming weeks given the LLaMA leak (there's already code for finetuning it into a ChatGPT-style chatbot), but you'd need some pretty beefy hardware to finetune and run the models (4-bit quantization is lowering the VRAM requirement for running the largest model to around the 40 GB range, which is more than most people around here would have, I'd imagine).
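As a rough sanity check on that 40 GB figure (back-of-envelope only; real usage also adds activations, KV cache, and quantization overhead):

# 65B parameters at 4 bits each = 0.5 bytes per parameter
params = 65e9
weight_gib = params * 0.5 / 2**30
print(weight_gib)  # ~30 GiB for the weights alone

So the weights alone land around 30 GiB, and runtime overhead plausibly pushes the total into the stated 40 GB range.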
>>39708556
>https://files.catbox.moe/vq36aa.mp3
excellent
>>39708797
I wonder if it would be possible to train it on fimfic.net, and have it prioritize certain tags, disregard other tags to filter out non-FiM-specific writing, and prefer high upvote ratios to have it aim for well-written stories. Not just porn, but also the longform adventure fics and the like. I want it to be able to pump out a 200k word slice of life story that just follows the day to day life of ponies in Ponyville, for example.
>Friday finally arrives
>"Time to train 4.0 Athena"
>Locates training colab
>Runs to begin training
>Huggingface auth failure
>No 4.0 training today
It also seems the entire github (https://github.com/innnky/so-vits-svc/tree/4.0) has gone poof too. What happened?
>>39701999
Top kek, great reference.
>>39708556
Gosh, those sound very synthy or TTS-like. Was the reference audio of a TTS voice or something? Because that would explain most of it.
>https://files.catbox.moe/vq36aa.mp3
This, however, is quality.
>>39709205
Something happened with the project. Someone used it to clone the voice of someone they shouldn't have.
They were scared of that biting them back, so even the forked github is gone. I think the eff branch is the last one with 4.0 code.
>>39709221
It only takes one idiot/asshole to ruin it for everyone else...
>>39709230
I saw it happen with the SD community, so I get why people want to do that.
I will keep it updated in the meantime. diff-svc and fish diffusion are safe for now, though.
>>39709232
Is 3.0 still functioning though? I wouldn't mind still continuing 3.0 Athena if that's the case, or possibly trying out Lustrous.
Also, I hadn't checked inference yet; I assume this only affects training colabs?
>>39709252
Nevermind, 3.0 training is definitely down too. Fuck.
no need to panic. They just moved here: https://github.com/svc-develop-team/so-vits-svc
the person who uploaded the repo just wanted to distance themselves because people in china kept making AI models of celebrities and politicians
>>39709283
i knew your name was familiar, you're that siivagunner guy, aren't you?
anyway, thanks for the info, phew
>>39709283
Not panicking, just strong frustration as my mental schedule dissolves due to unforeseen circumstances.
>https://github.com/svc-develop-team/so-vits-svc
It appears the pretrained models linked in there also lead to the broken "innnky" location?
>>39709289
Well yes, but actually no, since my channel isn't exactly siivagunner. I'm technically also guilty of what happened to the sovits repo too, since I was the person that made it known in the west with the Rick Astley celebrity so-vits-svc model.
>>39709304
don't worry bro, that shit fucking rocked
>>39709303
they are making a new pretrained model due to the previous one having copyright issues. Either wait for that or wait for someone to reupload the old pretrained model.
keep in mind that the one uploaded on svc-develop-team is sovits 4.2 and not 4.0, so 4.0 models are probably not compatible with that repo.
>>39709310
wasn't the last one 4.0v2? what is the difference besides a brand new pretrained model now
>>39709328
yes, the last one was 4.0v2/4.2. the main difference from 4.0 to 4.2 was that 4.2 uses the VISinger2 structure.
also, nevermind about it being 4.2. The repo updated to separate 4.2 and 4.0, and the main one is 4.0 now.
you can read more about it here, just deepl the fuck out of it: https://github.com/svc-develop-team/so-vits-svc/tree/4.0-v2
>>39709351
thank you, guess i'm keeping the 4.0 i had before.
the colab would need to be updated, right?
>>39709354
yeah, just replace the dead links and it should work
>>39709310
>reupload the old pretrain model
Hmm, are the D_0.pth and G_0.pth files that were created during past training the original pretrained files? Or are they modified somehow and thus not able to be used?
>>39709367
those are the original pretrained models, yeah. They only change in the later iterations.
>>39709372
Excellent, guess I can create a modified version of the 3.0 colab to use my reupload on huggingface.
Though realistically all anons who've trained any SoVits models here likely have them already, so there's no shortage.
>https://huggingface.co/datasets/HazySkies/SV3/tree/main
Looks like I can train her today after all. Thanks.
Nevermind, guess it's sleep time instead.
Perhaps tomorrow the mares await.
>>39696145
How in the world did you manage to get the vocals to sound so good? I feel like I'm getting junk from good stuff being put in. Especially background vocals turn everything into a loud buzzing.
>>39709221
Absolutely phenomenal, I was wondering why it just all of a sudden stopped working.
anonfiles com/ScObw8d0z6/checkpoint_best_legacy_500_pt
welp, until the shitshow with the hubert gets sorted, I guess I will just share the old one for anons to use for their offline 4.0 synthesis.
>>39709808
anonfiles com/GaX4w6dfz7/G_0_pth
anonfiles com/m8Wew6d8z9/D_0_pth
Oh, I guess I should put out the original G and D models too, just in case.
>>39709252
>I assume this only affects training colabs?
Just tried 4.0 inference via Colab, it's down as well.
>>39709808
>>39709826
Do you happen to have the files for 3.0 as well?
>>39710185
anonfiles com/RbC3z4d6z3/hubert_soft_0d54a1f4_pt
3.0 hubert model; from what I see, the 0 G and D models are the same.
4.0 colabs have been updated. Training has not been tested.
>>39709283
You don't need to ask, but from what you know, is it possible they will try to take down existing github/colab/HF copies of the models?
>>39709205
>Was the reference audio of a TTS voice or something?
Nope, just me. Although I suppose my delivery was very TTS-like.
Twilight Speaking 4.0 model updated (100 epochs, 108800 steps)
(I also altered some parameters, like noise scale, which was set to 0.8 instead of 0.4 as I think it was before)
4.0 straight: https://files.catbox.moe/aie049.mp3
4.0 talknet: https://files.catbox.moe/lpt81y.mp3
This sounds a bit better formant-wise (pitch detection errors left in for authenticity). I'll train speaking models to at least 100 epochs from here on out. I also wonder if the presence/lack of breathing sounds might contribute to your perception of the voices as artificial.
>>39709775
Providing examples/generation parameters would be helpful.
>>39703029
Live in ~1 hour. Animating.
cytu *dot* be/r/PonyPreservationProject
>>39710237
The talknet one sounds better, though the speech feels sort of flat, as in pretty even pitch throughout, which makes it sound somewhat robotic/emotionless. Might sound better with different reference audio.
TalkNet Colab is borked again
>>39710810
WOMM, did you restart the runtime before step 3?
https://colab.research.google.com/drive/1sAbqSQj9P56TTpsU7bzbobzAxmydvUSA?usp=sharing
>>39710829
Never had to do that step before, but I ran it now and it worked, thanks.
https://www.youtube.com/watch?v=vAZpM_LxrXM
https://robotwealth.com/how-to-connect-google-colab-to-a-local-jupyter-runtime/
With the power of Indian-dubbed tutorials and the basic scripts, I am slowly figuring out how to run the sovits training offline, so I can just run the script in the background without worrying that google will cuck all my progress.
The first thing I've learned was that jupyter fucking hates being run on the C drive, so I needed to 'cd' into a folder on another drive before running all the connection code.
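For anyone else attempting this, the steps in the local-runtime guide linked above boil down to roughly the following, run from the working folder you cd'd into (commands as documented by Google at the time; double-check the current guide):

pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws
jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0

Then in Colab, choose "Connect to a local runtime" and paste the URL (with token) that jupyter prints.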
>>39710185
On 3.0 -- the problem doesn't appear to be what I thought it was; I appear to be able to get the model to save perfectly fine using small eval/log intervals. I would suggest trying to run it again using a small eval/log interval in configs/config.json just to see if it works. The only thing I can think of that might be causing problems is Drive space?
https://colab.research.google.com/drive/1ABKgdAWp2UGfgavyxI_Nv7ytdL5h6J6f?usp=sharing
https://files.catbox.moe/4n7jew.log
>>39711153
>!python preprocess_flist_config.py
UnicodeEncodeError: 'charmap' codec can't encode characters in position 7-10: character maps to <undefined>
got stuck on this error for a bit until i realized it was just the computer being unable to read chinese characters, so i've commented out line 47 and inserted a 'print('audio: ' + str(file))' line in its place.
>>39710237
out of curiosity, what kind of cudatoolkit version are you using for your training?
>>39708647
when I trained 3.0 speaking Twilight, I removed all the clips marked very noisy
>>39696145
Had to comment on this. Been kinda meh on most so-vits stuff, but this really came together nicely. Still definitely flawed, but man, in the mix you can barely tell most of the time. I think this may be some of the best harmony singing I've heard from AI.
>>39711588
nvcc --version reports v11.4.48
>>39712013
That's the same as my setup, good to know.
>>39710237
they probably won't take down any models. Most likely it's up to the real-life person that the AI model samples, and I doubt they'd do anything to the MLP models specifically. The new repo for so-vits-svc actually has a terms of service at the very top if you want to know more about it.
has anyone tried making a so-vits-svc GUI auto-install with bat files? There's one for diff-svc (https://github.com/neutrogic/diff_svc_ui/releases/tag/release). I have the GUI installed already, but I feel like a so-vits-svc equivalent would help out a lot of people. Especially the ones who keep DMing me about it.
Ponyzones 1-3 by Geekbrony have all gone private.
anybody got backups?
>>39712308
They're all on Derpi: https://derpibooru.org/tags/artist-colon-geekbrony
If they decide to put him on the DNP list, they're also on Twibooru.
>>39712308
https://www.youtube.com/watch?v=4IViz2W8FIU
>>39712335
>>39712352
thank you
>>39712302
Not that I'm aware of; might be something to look into though.
so-vits-svc 4.0 Rainbow Dash speaking (111 epochs, 68800 steps)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rainbow%20Dash%20(speaking)
https://files.catbox.moe/m63mqw.mp3
https://files.catbox.moe/wj8068.mp3
https://files.catbox.moe/5t6s0b.mp3
https://files.catbox.moe/a3bmmf.mp3
https://files.catbox.moe/8zgh1l.mp3
>>39712302
Here's a first attempt for 4.0. Cannibalized some stuff from TalkNet.
https://github.com/effusiveperiscope/so-vits-svc/releases/tag/v0.1.0-alpha
>>39708556
>https://files.catbox.moe/vq36aa.mp3
This is amazing. You get one (1) free internet.
>>39710237
>Providing examples/generation parameters would be helpful.
Input, then output, using the 4.0 GUI linked here earlier. The speaker files were retrieved from page 126 of the main PPP doc.
https://files.catbox.moe/8eq2op.mp3
[SoVits] Athena - Anon's A Faggot (Fuck The Pussy)
>https://files.catbox.moe/1qdxl4.mp4
So glad Synthesizer V works perfectly fine on Linux. It pairs well with SoVits, with a lot of tinkering of course.
Started off doing the usual song via SynthV due to pitch errors Athena threw out when fed the cleanest raw vocals I could separate. Got bored and decided to parody too, 'cause fuck it.
[SoVits] Athena - Move Your Body
Original to SoVits
>https://files.catbox.moe/3ph3g0.mp3
SynthV to SoVits
>https://files.catbox.moe/rvllpk.mp3
The latter isn't quite as good as the parody in terms of accuracy; as it turns out, less breathy settings help her out more.
I may make a full version some time in the near future. Definitely gonna try more parodies and original works now that this is an option that works well.
>>39713592
I still can't get over the fact that her VA is a 50 year old man. It's gonna make any audio porn (or just any audio) with her really awkward, because you know it isn't Athena professing her love for you, but technically Shawn wanting some bussy.
>>39713645
Hmm. Do you consider, say, Rarity making lewd sounds or confessing her love as Tabitha being the one that's doing it? The origin of the voice is inherited, but the identity of the mare is her own. If she loves you, then she'll tell you wholeheartedly, even if it might take a 50 year old man or more to tell you.
>>39713704
Or well, to say it, rather. She can tell you in other ways, but it'll take an inherited voice to actually speak it aloud. In Athena's case it would be Shawn.
Speaking of, aren't all the lewd sounds and lyrics in Pony Zone also spoken by males? Although, I guess that's more a male guiding a female voice, whereas Athena is more a male/female guiding a pitched-up male voice doing an impression of a female character. Fairly complicated.
>>39713704
I'd say it's different when a female VA is doing a female character's voice, as opposed to a male VA doing a female character's voice.
>>39713748
>>39713742
Only in the technical sense is it different. Shawn does the voice himself, only pitching it up slightly to give the female impression. Voices done by TalkNet may be done by guys, sure, but they go through voice changing software that just auto-generates a sound clip using the original voice as reference.
>>39713345
This is polyphonic audio. None of the models trained here involved polyphonic pitch detection algorithms, as it is much simpler to model a single speaker. If you are trying to recreate my Roundabout example, I had to sing all the reference voices individually.
https://youtu.be/vKXgSZdbW6o
I forgot to post this thing.
>>39713742
>aren't all the lewd sounds and lyrics in Pony Zone also spoken by males?
Where the fuck does this rumor keep coming from? Neither mine nor Geek's Pony Zones have EVER used guided models. I don't think Vul's have either.
>>39421053 (https://desuarchive.org/mlp/thread/39315199/#39421053)
Same anon. I figured I'd post an update even though we're still a long way from an alpha launch.
>>39421762
>Make something relatively normal, but easily hackable and moddable. Then you can unofficially create and support pony addons, and still make investors happy.
We're doing this, but "hacking and modding" is pretty much everything we do. It's going to support bring-your-own-datasets and bring-your-own-models. We'd help you mod AI models and plug the results into other projects that use AI, including ones that run on your desktop. We'll need to stick to standard formats, and a lot of pony data is not in a standard format. At some point, I'll dump conversions and conversion scripts for pony data and models that aren't already compatible.
https://poll.horse/zLjFxezf
>>39713963
My mares are augmenting on their own!
>>39713963
Incredible as always. What'd you use to modify her voice for the heavy robotic lines? Melodyne? Some vocoder?
>Pony Zone NOT guided
Ah right, those used 15, didn't they? With most of your songs nowadays using TalkNet, and now SoVits, guess I forgot.
>>39714770
>What'd you use to modify her voice for the heavy robotic lines?
Just the vocals and a simple Serum patch running through a vocoder called VocalSynth.
>>39713794
>I had to sing all the reference voices individually.
Damn, I had thought so. I can test later with non-polyphonic audio and equal-quality stems. Are the models on page 126 the ones you used, or did you use others?
>>39715276
Used 4.0 singing models; I believe the correct ones (huggingface) are listed in the main doc, yes.
>>39715310
Copy that, many thanks. Is there good documentation on how to tweak things like the transposing or noise ratio values? Also documentation on clustering, because I want to use that if it yields better results?
>>39715392
https://github.com/effusiveperiscope/so-vits-svc/tree/eff-4.0
Transposing should be done in multiples of 12 (= 1 octave) if you want the singing to be in the same key as the song. For other options, check the readme.
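The reason for multiples of 12: each semitone multiplies frequency by 2**(1/12), so only a 12-step shift lands back on the same pitch class. A quick check:

base = 440.0  # A4
print(base * 2 ** (7 / 12))   # ~659.26 Hz (an E5, a different pitch class)
print(base * 2 ** (12 / 12))  # 880.0 Hz (A5, the same note an octave up)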
>>39715397Couldn’t be more thankful for y’all’s help. I’ll get reading.
>>39711253Thanks to your provided/updated colab, Athena has returned to the grind without any issues; it's actually showing step outputs and such proving she's learning. Still not sure how to read the string provided and how much she's improving over time though. Hopefully it's still a noticeable improvement after another 10000 steps at least. Thanks again.
>>39715793I didn't update anything though?
Decided to play with Sovits today.https://files.catbox.moe/chb93n.mp4
>>39716332FULL VERSION.https://files.catbox.moe/3frq8o.mp3
>>39716332>>39716349These pony AI covers are getting better and better. You guys should consider making a PPP cover album, if that doesn't exist already.
https://files.catbox.moe/4ne1v5.mp3https://files.catbox.moe/ei7uc8.mp3https://files.catbox.moe/b41mzp.mp3https://files.catbox.moe/83k4n9.mp3I didn't think I'd get such good results from my bad singing but wow. This is a game changer.
>>39713345What do
[3.0 Athena - 15000 steps]>https://mega.nz/file/w6ZwDahC#d1beM41XDvyhiZI9vEVdPDDqvcdWgeRdIup-Qou_v-cAthena is now trained up to 15000 steps after 10h of training on just the standard T4.Would've been even higher, but the training section started throwing errors (https://files.catbox.moe/qe8ken.png), so I stopped for now.As a comparison I quickly whipped up a test song line to serve as a base to compare the two:SynthV sample: https://files.catbox.moe/boo90n.wavAthena 9k: https://files.catbox.moe/7scxo8.wavAthena 15k: https://files.catbox.moe/rc07xe.wavIt's a little difficult to tell, but 15k does seem to be better, in that she has less overall noise, seems to have a better time discerning "G"and "K" sounds, and pitch changes seem smoother — less jumpy. The latter change seems like it'd help singing especially well due to being able to better match the vibrato of a singer.
>>39717123Here's the same comparisons, but with -12 semi (1 octave) to better reflect her normal speaking range.
Athena 9k: https://files.catbox.moe/h4a62i.wav
Athena 15k: https://files.catbox.moe/m5kmtc.wav
Attempting to do ordinary speech in SynthV didn't go quite so well, and I couldn't find any good speaking audio for her to make a speaking comparison, but hopefully these singing ones in this range provide some idea.
>>39716349AAAA I LOVE IT
>>39712797Lots of dependency issues, I had to do this to make it work:
RUN "activate.bat"
pip install praat-parselmouth
pip install matplotlib
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
After doing this, the gui opens but I still have these errors:
Note: Pygame is not available.
INFO: Could not find files for the given pattern(s).
Note: Rubberband is not available. Timestretch not available.
Could not find TalkNet on address 127.0.0.1:8050
>>39716349This is incredible. That reference audio must've been stellar!
>>39716349Hate to break it to you chief, but your pitch is off. Maybe try autotuning your voice before processing it, or take singing lessons.
>>39718567I kinda like the idea that AJ is kinda flat when singing. Kinda like how RD has voice cracks, it gives it character. The average voice isn't going to be 100% on-pitch all the time.
>>39718599Retarded.>>39696145See this post; the last two are great examples of what good singing + so-vits sounds like. You should encourage people to be better at what they do, not feed them bullshit when they are clearly off pitch.
>>39718567>>39718609It sounds fine
>>39718599There's a difference between natural imperfection in good singing and being blatantly off pitch. The reason singing is a talent, and pitch correction is used otherwise, is because off-pitch singing sounds really fucking bad.
>>39718646Clearly you too are tone deaf
https://www.youtube.com/watch?v=fLexgOxsZu0
>>39718671It sounds like you're being autistic over a song most anons liked. Chill out. It's obviously meant to be a casual cover, otherwise he would've further edited it to get rid of the glitches near the middle.
>>39718659kind of the same way that the less talented a woman is at dancing the more she shakes her ass?
>>39718677Glitches? I didn't even listen that far. I was talking about how anon is tone deaf; it's off pitch, I don't know how you can't hear it. Anyway, I'm not mad, just offering constructive criticism.
>>39717082That's a new one. Do you not have any audio input devices on your system?
Now I have no experience or expertise on anything AI image generation wise, but in your opinion, what would be the most efficient way to do AI animation?
Would it be essentially making an image and then building off that one for the next frame of animation? I mean, from my perspective, AI animation would essentially be nothing more than a more advanced version of image generation, since it would work off of each frame of animation, but the problem lies in how it would change each frame and stay consistent.
Could it instead generate individual assets? Maybe it's more efficient to generate the background separately from a character, both for the sake of speeding up the process and for reducing the number of things that could go wrong in the generation process.
Half of this is probably nonsense, but I know that there's something about what I just said that makes sense.
>>39717082>>39718764I've attempted to update the 4.0 GUI to not assume audio inputs exist; try again.>>39717123It looks like Drive mounting times out after a while or something. Not sure what to say other than you might have to restart the runtime from time to time to continue training.>>39718319None of those are critical to running the program; they're just optional features. I'd be interested in seeing what dependency issues you ran into. I noticed that matplotlib was not in requirements_win and updated it accordingly.>>39716349>>39718567>>39718671I haven't checked the whole thing, but the voice seems to be 2 semitones too high (it's consistently out of key). It isn't a pitch correction problem; it's a transpose problem.
>>39718891For animation? It'd be most efficient to have a simple/crude animation for the AI to use as reference and follow, otherwise it'll vary far too wildly to be usable. Looking into a recent tech, ControlNet (>>39660412), would help most for animation. It helps the AI strongly adhere to character anatomy, composition and posing.Making the background separately and using it for each frame would definitely be recommended. However, AI-generated images usually don't have transparency, so you can't easily isolate the subject you intend to animate without potentially changing the background in the process.I know that Stable Diffusion has a plugin that allows it to work with Blender. I'll have to experiment to see how that would work out for animating rather than stills. Might have potential, I'll try that out later today.Only other advice I can offer is to improve your prompting to get more consistent results with stills; it'll help a lot when getting the stills for an animation, and just overall make using the AI easier. Hope this helps.
>>39718891Last year's PPP panel should help give you a general overview of how it could work.https://youtu.be/NpFxmmh8NQ0?t=7860
>>39718891I don't know how it works now, but it definitely should build up on markers, and generate the image at the very last moment.
Like "A cute Applebloom prancing in a field".
It should "understand" the background, and that there is one character.
Then search for (or understand) what the character skeleton is, and animate the skeleton.
When it's done, it can generate a series of crude images of markers and metadata to give to a "rendering AI".
This would allow both image consistency and world believability.
But I have no fucking idea on how to make that.
>>39718985>>39718990>>39719096Extremely helpful responses. This project is doing great, thanks for the info!
what does the thread think about AMD cards used for voice synthesis AI? Did they catch up, is it enough to make do, or is the general consensus still Nvidia as the go-to, be-all and end-all for AI?
>>39719503AMD will be forced to catch up. Thanks to AI technologies being so popular and their competitor being in higher demand as a result, AMD doesn't really have much choice but to adapt, lest they be left behind. If they haven't done this already, they will eventually.>Pic relatedThat Twilight reminds me of the AI Seinfeld visual style, and now I wonder how soon we can achieve something similar. Would also provide something entertaining to watch on the PPP cytube between streams.
Bump.
The colab TalkNet thing gives me a 403 even when I open the link in a new tab. Am I doing it wrong or is it broken?
training code for PITS is uploaded:
https://github.com/anonymous-pits/pits
Might be worth taking a look.
>sovits EN training colab 4.0.ipynb
In the "Install dependencies" cell, it has trouble installing the fairseq module. This can be fixed by adding a cell below with this code:
!python -m pip install --upgrade pip
!pip3 install fairseq
then running it and running the dependencies cell again.
HOWEVER, I'm getting another error when running the "start training" cell:
The tensorboard extension is already loaded. To reload it, use: %reload_ext tensorboard
Reusing TensorBoard on port 6006 (pid 6704), started 0:03:32 ago. (Use '!kill 6704' to kill it.)
Traceback (most recent call last):
  File "/content/so-vits-svc/train.py", line 321, in <module>
    main()
  File "/content/so-vits-svc/train.py", line 60, in main
    validate(hps)
  File "/content/so-vits-svc/train.py", line 45, in validate
    ckpt = sovits_utils.latest_checkpoint_path(hps.model_dir, "G_*.pth")
  File "/content/so-vits-svc/sovits_utils.py", line 355, in latest_checkpoint_path
    x = f_list[-1]
IndexError: list index out of range
I'm not sure what to do with this; I assume it's another Colab error caused by Google update fuckery?
I've tried looking into the sovits_utils file, but that's pretty confusing, as I do not know how to solve the issue with the "x = f_list[-1]".
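For context on that IndexError: latest_checkpoint_path globs the log directory for saved checkpoints and takes the newest one, so an empty directory makes f_list[-1] blow up. A rough paraphrase of the logic (not the exact sovits_utils source):
import glob, os, re

def latest_checkpoint_path(dir_path, regex="G_*.pth"):
    # Collect every matching checkpoint and sort by step number.
    f_list = glob.glob(os.path.join(dir_path, regex))
    f_list.sort(key=lambda f: int(re.sub(r"\D", "", os.path.basename(f))))
    # If logs/44k has no G_*.pth at all (e.g. the pretrained G_0.pth was
    # never copied in), f_list is empty and f_list[-1] raises IndexError.
    return f_list[-1]
So the error usually just means the pretrained G_0.pth/D_0.pth never made it into logs/44k, which matches the fix posted below.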
>>39710357Live in ~1 hour. Animating.cytu *dot* be/r/PonyPreservationProject
>>39720229>The Colab TalkNet gives me a 403 even when I open the link in a new tabDid you check this already? Step 3 itself also says, "Restart the Runtime before running this cell!" so check that you did that first, too.If you're using this Colab instead of the one linked in this thread: https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB?usp=sharingThen also make sure that you run step 3 twice, as the instructions at that cell say.
>>39721142Did you download the pretrained models?
Hi friends. I want to TTS an MLP fanfic story using mane 6 or Celestia voices. Is there any easy way to do this? I tried Tortoise with Applejack but it didn't give a good result.
>>39721571Wait until 15.ai comes back
>>39721568I've run all the cells as they needed to be run. I did notice that the G_ models were not being downloaded to the sovits/logs/44k folder, so I made a new cell to download them in there, BUT it still comes up with the exact same error.
>>39721577>>39721568I've tried again, this time using shutil and sticking it just above the 'start training' cell, and now it's working again. Colab seems to have been acting a bit retarded these past few days for no reason at all.
import shutil
shutil.move("/content/so-vits-svc/D_0.pth", "/content/so-vits-svc/logs/44k")
shutil.move("/content/so-vits-svc/G_0.pth", "/content/so-vits-svc/logs/44k")
>>39721142Fixed; had to add Clone = "44k" at the top of the training cell. Notebook is updated.
>>39720315>https://github.com/anonymous-pits/pitsNot sure what format the dataset's going to be in. The training README says "WIP" so I assume that's going to be coming later.
>sovits EN training colab 4.0.ipynb
I keep getting this error in the "Segment training set and generate configuration files" cell:
Traceback (most recent call last):
  File "/content/so-vits-svc/preprocess_flist_config.py", line 38, in <module>
    for speaker in tqdm(os.listdir(args.source_dir)):
FileNotFoundError: [Errno 2] No such file or directory: './dataset/44k'
what do
>>39721925Did your resample process any files?
>>39721927No, it didn't. It doesn't do anything at all when I run the cell
>>39721954What does your raw dataset file structure look like?
WHY AM I NOT SEEING G4 FILTER ANIMATION OVER ROTOSCOPED HORSE FIGURINES? HUH? GET YOUR PLUSHIES OUT, PUT THE GREEN HANDS ON AND GREEN SCREEN ON AND START ENACTING TWILIGHT SPARKLE PLUSHIE SHOVING HER HORN UP SPIKE'S ASS.https://www.youtube.com/watch?v=xm7BwEsdVbQAlso he saw the signs before it happenedhttps://www.youtube.com/watch?v=WipRVvvNt4Y
Hey Synthbot, just letting you know there seems to be a bunch of partially broken Mayor Mares in the animation assets. >Glitchy manes>Duplicated skewed objects>Extra stretched shapes on certain frames>Blinks leaving eye ghosts>etc.
Hello everyone,For the past month or so, I have been working on a project that I call "Steed Speak." It is my attempt to create a locally-running, universal UI for all the various voice-generation solutions out there. It is still under construction and not available yet, but it's far enough along at this point that I am confident I will eventually get it to a sharable state. I wanted to give a heads-up to developers who might be actively developing other UIs and to gather feedback on the idea and UI design.With Steed Speak, you will be able to generate audio for different architectures (e.g. controllable talknet, so-vits-svc, etc) from a single UI. Steed Speak puts each AI architecture in its own Docker container to avoid any issue with conflicting library versions, and a "command and control" container presents the UI and communicates with the AI containers. I will create pre-built Docker images, so you won't need to set up any of the environments yourself (although the Dockerfiles will be available if you want to). You just need to install Docker and Graphics drivers and then download the images and volumes. Steed Speak separates the pre- and post-processing options (e.g. pitch shifting, reducing metallic noise, stretching the output audio, etc.) from the architecture-specific options so you can use all of the pre- and post-processing options with ANY architecture.I designed Steed Speak in such a way that we can add more architectures as we discover and train models for them. Separately, we can add more pre-processing and post-processing options and they will automatically be available for all architectures. My intention is to create a platform that we can keep adding to over time and gets better as we add to it, rather than starting over with a new UI every time we play with a new architecture.The screenshot is an example of what the UI will look like, but keep in mind that the layout and options presented in it are NOT final. The project will be open source.
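To make the container setup concrete, here's a hypothetical sketch (service names, ports and the endpoint are all invented, not Steed Speak's actual code) of how a command-and-control container might forward a generation request to a per-architecture container over HTTP:
import requests

# Assumed: one backend container per architecture, reachable by service
# name on Docker's internal network. Names, ports and endpoint are invented.
ARCHITECTURE_PORTS = {
    "controllable_talknet": 8001,
    "so_vits_svc_3": 8002,
    "so_vits_svc_4": 8003,
}

def generate(architecture: str, wav_bytes: bytes, options: dict) -> bytes:
    """Forward a generation request from the UI container to a backend."""
    port = ARCHITECTURE_PORTS[architecture]
    resp = requests.post(
        f"http://{architecture}:{port}/generate",
        files={"audio": ("input.wav", wav_bytes, "audio/wav")},
        data=options,
        timeout=600,
    )
    resp.raise_for_status()
    return resp.content  # raw audio, before any shared post-processing
Keeping each architecture behind its own HTTP boundary like this is what lets the shared pre- and post-processing live entirely in the controlling container.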
Do the additional steps from so-vits 3.0 for adding ROCm support on Linux apply to 4.0, or are there different requirements for pytorch in the additional pip commands? Already dealing with torchvision refusing to download 0.11.1 on Python 3.8 via pyenv, so knowing about support issues now would be good.
>>39722800Definitely looking forward to an all-in-one GUI for the various AI we work with. The image you provided shows it's very clean looking. Nice work!Will it be usable as an interface for a Colab, or only for local inference? Or perhaps make a version suited for those if not?>Steed SpeakNot too sold on the name though. Perhaps something like... Mare Muzzle? 'Cause that's where mare vocal sounds come from, it rolls off the tongue better. Also mares.Steed Speak sounds a bit too much like TeamSpeak, so there's also that.Alternative ideas: Hay Say, Neural Neigh, IWTCIM (I want to clearly interface mares), Pony Parsing Program.
>>39722941Thanks for the feedback! I am glad you brought up Colab because I wanted to reach out to the community about it. I have never actually used Colab before, so I don't know its capabilities or whether it supports Docker containers (a quick Google search shows that it did NOT in 2020; I don't know whether things have changed since then). My plan right now is to focus on local inference, but Colab is something I plan to research at some point, especially because (from what I understand) it would let people who don't own sufficiently powerful graphics cards use Steed Speak. If anyone else wants to help adapt this project to be Colab-friendly, I would very much appreciate the assistance. If Docker containers don't work, perhaps there is another approach that can be taken, like virtual machines.I am open to changing the name. I'm not 100% sold on Steed Speak, either. Other names I have considered include "Neigh I", "Pony Voz" and "Horse Voices".
Is there a chart or other documentation on the vocal ranges of the character AIs? I notice that I'm often singing too high and it causes the voice to just crunch out.
>>39723041BGM mapped out the vocal ranges for the Controllable Talknet characters:https://youtu.be/F9PdoJ9zmrw?t=640If you want to skip his explanation and go straight to the results:https://youtu.be/F9PdoJ9zmrw?t=748
>>39723063Thanks, this is a good start. This seems to be based on the 3.0, not 4.0, models, and I know for a fact from testing it that Pinkie's voice dies at around Eb5/E5, not E4 like the video says.
more 4.0 models
AJ speak: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Applejack%20(speaking%20mix)
FS speak: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Fluttershy%20(speaking%20mix)
TX mix: https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Trixie%20(mix)
https://files.catbox.moe/kb8an1.mp3
I did an experiment and put the speaking and singing speakers for Trixie into the same model (since it's the same VA). Timbre leakage doesn't seem too big of an issue. To me it actually sounds like the "singing" model used in TalkNet (which is what I was used to) might have been trained on speaking data? Then again there's nothing stopping you from just using the "speaking" model for singing. Here the EQG Trixie timbre seems to dominate for singing (makes sense given the only other source of singing data is Road to Friendship/We're Friendship Bound). She still can't seem to roll Rs.
>>39722800Nice. I'm pretty sure there is a distinction between adjusting the pitch of the source audio and the pitch adjustments in TalkNet/so-vits-svc.
>>39722925AFAIK few to no people have tried this, fyi, so document everything!
>>39721533>Step 3 itself also says, "Restart the Runtime before running this cell!" so check that you did that first, too.Yeah I did that>https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB?usp=sharingThis one fails on step 3 no matter how many times I run it: "No module named 'dash'". If it was supposed to be installed in step 2, Ctrl+F pulls up nothing, not even an error message.To prevent any confusion, I was using this one: https://colab.research.google.com/drive/1sAbqSQj9P56TTpsU7bzbobzAxmydvUSA?usp=sharing which seems to get past this okay, I just can't get to the UI.
>>39723550What browser are you using?
>>39723569Tried firefox and ungoogled chromium
>>39723621https://colab.research.google.com/drive/1sAbqSQj9P56TTpsU7bzbobzAxmydvUSA?usp=sharingWOMM on ungoogled chromium, Windows. Do you have any add-ons or other things that might be blocking it?
>>39722800>install DockerThat's a no-go for my Win7 system. Also, from what I experienced with the combined TalkNet + so-vits UI, my GPU can barely run with both models loaded at the same time.But I do hope others will find this useful (and I agree with the assessment that the name is a bit of a mouthful).
>come back two more weeks later>still nothing from 15The plan?
>>39723646Disabled all my plugins and still no dice. Tried it on a totally different computer too and got the same result. I have used this before and it worked (months ago), I don't think the browser should affect much of anything considering it's all done server side, so I really don't know. Guess it just isn't meant to be
>>39723770>The plan?Play tf2
>>39723225I still have a problem with the training colab not downloading the G_ and D_ files into the /logs/44k folder.>>39721733
>Page 9
>>39721268Live in ~1 hour. Animating.cytu *dot* be/r/PonyPreservationProject>>39722800An all-in-one solution for the various TTS options would be very helpful. The UI looks fine to me, no obvious issues.>>39723225>https://files.catbox.moe/kb8an1.mp3The normal speaking parts of that sample are sounding pretty good to me.
>>39723834Should have been working. I added another cd into the top of that step just in case the directory changed though.
>>39723788Does the last step not work either?
Adventures in pyenv>get python 3.8 working under pyenv>go to install requirements.txt after the additional args for ROCm ran fine>installer locks up on PyQt5>killsig it and run it with -vv>PyQt5 has a fucking license screen>type "yes" to get on with it since this is the one thing that is holding it all up>pip decides to consume all of my fucking ram, shove everything I got running onto the page file and then still sits there doing nothing>any other attempt to install it just does the same attempted ram hostage situationFuck you too PyQt5 and fuck your license agreement
>>39725311Update: pyenv installed an ancient pip, newer versions have commands that autovalidate license stupidity so now the real problem is waiting for an external download of torch==1.10.0+cu113 to finish
>>39725363torch==1.12.0+cu116
>>39723161Reading comprehension, Anon. The vocal ranges in the video are Talknet.
ROCm configs work under the new 4.0 setup, 3.0 refuses to run due to some index errors.
I know nobody asked for this, but here are some 4.0 models trained on S1 Rainbow Dash and Applejack data.
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Applejack%20(s1)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Rainbow%20Dash%20(s1)
https://files.catbox.moe/q59qmx.mp3
>>39722415Thanks. It'll be a while before I can fix that, but I added it to a backlog I'm keeping of animation data issues to fix.
>>39723004>I don't know its capabilities or whether it supports Docker containersIt does not, and it probably never will. If you want to support Colab, here are some notes:
- Colab gives you a VM where you can install and run whatever you want.
- Usually, the first block in a Colab notebook is used for checking if the instance has enough compute available. The second block is usually used for installing dependencies and, if needed, for mounting the user's Google Drive so it can be accessed like a normal file. The rest of the blocks are for running the script, which could be as simple as calling a main method from your repo.
- Providing a user interface on Colab can be a bit tricky. Anons here have taken a lot of different approaches. (Easiest) You can create a Colab-specific UI using whatever form elements Colab supports, or whatever widgets something like ipywidgets supports. (More compatible) You can use JupyterDash, which would also let you show the UI in Colab itself, but would give you more flexibility in what UI you can display; see the sketch below. You can use JupyterDash outside of Colab too, which would let you provide the same interface both in Colab and on local compute. (More flexible) You can run a web server on Colab, then expose it using a tunneling service like ngrok. I think ngrok itself stopped allowing anonymous accounts at some point and anons switched to something else. I forget what.
As an alternative that's worse than Colab but better than only local execution, you can let people choose to bind the server to 0.0.0.0 instead of localhost. That would make it easy for people to run your container on rented GPUs.
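A minimal JupyterDash sketch of the "more compatible" option (the layout here is a placeholder, not anyone's actual UI); the same app object can serve the UI inside a notebook or as a normal local server:
from jupyter_dash import JupyterDash
from dash import html

app = JupyterDash(__name__)
app.layout = html.Div([html.H1("PPP demo UI")])  # placeholder layout

# mode="inline" renders the UI inside the notebook cell;
# mode="external" prints a link to open it in a separate tab.
app.run_server(mode="external", port=8050)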
>>39725548>(More flexible) You can run a web server on Colab, then expose it using a tunneling service like ngrok. I think ngrok itself stopped allowing anonymous accounts at some point and anons switched to something else. I forget what.The Stable Diffusion community likes using gradio, and they also have an extension that allows tunneling to cloudflare, remote.moe and localhost.run.
>>39725013Yeah, 3 and 3b. There are no error messages, just the GUI embed, and the link gives me a 403.
>>39725470Neat, S1 voices always have that charm to them
>>39725548Thank you so much for the notes. I have been writing the UI in Dash by Plotly and, luckily, it looks like it would be straightforward to convert it to a Jupyter-Dash app: https://dash.plotly.com/workspaces/using-dash-in-jupyter-and-workspacesThe AI Docker containers (which the main UI communicates with) are running Flask apps. I'm not sure this setup would translate well into Colab. Regarding my comment earlier on virtual machines, I'm having trouble finding any info on whether it is possible to run VMs *within* a Colab VM, although judging from the fact that Docker is unsupported, my guess is that it's not possible. I could attempt to install all the architectures together in the Colab VM and at least isolate their Python libraries from each other using Python virtual environments. If there are any conflicts in Linux package dependencies, however, it won't work.At the very least, it should be possible to have separate Colab projects for each AI architecture and provide the same UI for each of them.
>>39725990Any dev console errors? Might it be network related?
Mare?
>>39726824Mare.
[SoVits] Guardians of Pondonia Scene - But everyone is voiced by AI Athena>https://files.catbox.moe/qcyxbq.mp4Turns out for most masculine voices she needs to be pitched up by at least +3, which on reflection makes sense, as that's how much Shawn typically pitches up his samples for Athena's original voice. I've also since added her to my shared SoVits drive, which is now at 25000 steps. Won't be doing any further training with her on 3.0 as I don't think she'll improve any further with her current dataset. With the improvements and changes of 4.0, I'm curious to see how they differ. More than likely I'll compare them at their 9k or 15k models to save time and see if 4.0 is worth the extra hours.
>>39725470>https://files.catbox.moe/q59qmx.mp3kek'd and appreciate'd
>>39726861STALLION
>>39727664Just a bunch of "analytics logging failed". It's trying to access play.google.com/log, but it seems Google's own cross-origin policy blocks it. Don't think this is related to the issue, unfortunately.
>>39727664IDK what to say anon. Maybe there's a way to use pyngrok to tunnel to the UI?
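If anyone wants to try that, a minimal pyngrok sketch would look something like this (assuming the UI serves on port 8050; note ngrok now requires a registered auth token):
from pyngrok import ngrok

# Open a public tunnel to the locally bound UI port.
public_url = ngrok.connect(8050)
print(f"UI tunneled at: {public_url}")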
https://files.catbox.moe/t3icqp.mp3https://files.catbox.moe/06wzjw.mp3Really happy with parts of this :3
>>39728173Neat! Is the 2nd one a reference from some kind of musical?
>>39724634Live in ~1 hour. Animating.cytu *dot* be/r/PonyPreservationProject>>39728173Nice.
>>39728278These are both from the album/film Xanadu, so if it's a reference, it wasn't an intentional one. The first link is "Xanadu" (title track) and the second one is "I'm Alive."
I want to use the pony text to speech site but it's down :(
>>39725470how can I use this?
>>39728518https://github.com/effusiveperiscope/so-vits-svc
>>39720315Attempted to train this on my 3080 but it's SLOW (as in, I'd need about 30 days to train up to the same number of epochs as the model they have). Probably requires a bigger computer.
how much help is needed in regards to this?
>https://caption.cutemares.xyz/
I've gotten a few done so far and plan to do it for at least an hour, but what's this progress value? Is it the overall progress of how far we are through all the images we have? Lastly, I want to say simple grunt work activities like this are highly accessible for those who don't have the knowledge on how to program stuff but still want to help with the projects.
Two questions regarding so-vits:
1. Where is the link to the 4.0 GUI? And is it compatible with 4.0 v2 models?
2. I think there might be a chance that train.py assumes that I'm using a multi-GPU setup, when I'm not. How do I take care of that?
>>39729464>2From what I see in train.py, the main() function has the code:
n_gpus = torch.cuda.device_count()
so it shouldn't detect GPUs that you do not have installed in your motherboard.
>1>where is the link to the 4.0 GUI?See >>39728770, which is also linked in the OP main and quick guide docs.
>compatible with 4.0 v2 modelsDo you mean 3.0? If that's the case, no, the 3.0 models will not work with the 4.0 interface.
>>39729464
1. AFAIK the 4.0 GUI should not be compatible with 4.0 v2 models; they use different vocoders (although I think there are different versions floating around called "v2").
2. What makes you think this?
Two questions:
How good, if at all, is so-vits-svc at whispering/soft talking?
Which voices are available?
Astralite's Pony Diffusion v4 came out earlier today:https://huggingface.co/AstraliteHeart/pony-diffusion-v4
>>39686748You mfs made some amazing shit with this. How'd you feel when voice AI suddenly became popular?
https://youtu.be/iSZhUKejc3Y
>>39730471>How good, if at all, is so-vits-svc at whispering/soft talking?Here's the best I can get with a cluster based on whispering:https://files.catbox.moe/66h2iu.mp3It's rough.>Which voices are available?See https://docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.ce7da4l3o6jf(scroll down)>>39730500>How'd you feel when voice ai suddenly became popular?What do you mean?
>>39730497Let me just quote it to the anchor post so other anons will see it easily (also, it's nice there is a pruned safetensor model included). >>39686757
>>39730963>Sparkle in her eyes>One paw lifted>Blue and white scarf>Butterfly near her tailIt's still misidentifying and making stuff up, so it still has a way to go. But other than that, it's a very good description of the very cute mare. So more progress than setbacks; things are looking good.>>39729647Wait, there's a v2 of 4.0 now? There was already a split between 3.0 and 4.0, and now the latter is divided even further? That's gonna make managing which models belong to which a little trickier, and would require even more colabs to keep track of, to both infer and train either of the 4.0 versions.
>>39730963>pawFucking furniggers
>>39730963>paw
>>39731038>>39731105Just like AI can't see the difference between a man and a tranny, it reveals the truth that pony is furry.
>>39731038>>39731105>>39731133To be fair though, I think GPT-4 just mistook her closed wing for a raised paw somehow, since the wing sits on the upper part of her body and looks raised. Much like how it mistook her striped socks for a scarf. The "butterfly near the tail" could be the green leaves at the base of the flower, even though that was also recognised as a flower earlier, unless it considered only the stem and bloom as the "flower".>>39731043So it seems to prioritise the identity and positioning of objects relative to other objects, rather than whether those objects make sense in the context of the others.GPT-4 logic probably:>Mare identified>Mare has raised paw on upper body>Unusual>Will not criticise reasoning>Constructing narrative in a manner allowing the anomaly to make sense
>>39731133Furries & bronies are fundamentally very different and separate from each other. The majority of furfags are nowhere near as creative or imaginative as horsefuckers, and nowhere close to being as close-knit as us. The only thing they share in common is wanting to fuck non-humans. /mlp/ has autism and degeneracy (fictional banter); furfags have Aspergers and 'degeneracy' (war crimes).That's the tl;dr anyway.
>>39730963But I thought GPT-4's currently available models couldn't handle images yet? Did I miss something?
Seems like I did I guess.
>>39731589we gettin recursive in here
>>39731003I haven't been impressed by DSP based vocoders thus far. From what I saw of the v2 demo on huggingface when it was still up (to be fair I didn't give it a thorough look) I didn't find it worth pursuing compared to the existing architecture.
> Write a short comfy story about this! What are they talking about with each other :)>The snowy village of Starfall Hollow was bathed in a warm, orange glow as dusk approached. The quiet hum of the village was interrupted only by the sound of the ponies' hooves gently crunching on the snow-covered cobblestone streets. > >Mira, Celeste, Luna, and Solstice were four best friends who had grown up in the village together. They had spent countless hours exploring the snow-covered landscapes and cuddled up in their favorite cafe, sharing stories over steaming cups of hot cocoa.> >As they walked through the village, the four friends were wrapped in cozy winter scarves and hats, their breath visible in the crisp, cold air. The stars began to appear in the night sky, a breathtaking display of twinkling lights against the inky blackness.> >"What do you think those new neighbors are like?" Luna asked, breaking the silence.> >"I heard they're from a far-off land where the snow never falls," Celeste replied, her voice filled with curiosity.> >"I can't imagine a life without snow!" Solstice exclaimed, shaking her head in disbelief. "What would we even do during the winter?"> >"I suppose they'll have to adjust to life in Starfall Hollow," Mira chimed in. "And we can help show them the ropes!"> >The friends smiled at each other, already excited at the prospect of new friendships and adventures. As they continued their stroll, they talked about their favorite wintertime memories, from the snowball fights they had as young foals to the magical night they had discovered a secret ice-skating pond deep in the woods.> >As they neared the new neighbors' house, they saw a family of ponies unloading boxes from a wagon, their faces flushed from the cold. The friends exchanged glances, their hearts swelling with warmth and camaraderie.> >"Let's give them a proper Starfall Hollow welcome," Mira suggested, her eyes shining with excitement.> >Together, they approached the newcomers, offering their help, warm smiles, and a plate of freshly baked cookies. The new family, grateful and eager to make friends, accepted their kindness, and soon they were all laughing and talking, sharing stories of their past and dreams for the future.>>As the night wore on, the snow began to fall once more, blanketing the village in a fresh layer of sparkling white. The glow of the lamplights and the sound of laughter filled the air, a testament to the magic of friendship and the cozy warmth of a snowy winter's night in Starfall Hollow.
15.ai got updated?
>>39731937Could be any number of weeks now!
AI Twilight Sparkle proudly sings some Syrian patriotic tunes! Seems like So-Vits surprisingly works well outside of English too! https://www.youtube.com/watch?v=t858yQSPwvk
Hey what's going on with 15? Seems like the page has changed again. Can't take a screenshot since I'm currently mobilefagging, but it looks like things are going back up.
>>39731910Comfy
>>39728323Live in ~1 hour. Animating.cytu *dot* be/r/PonyPreservationProject
>>39732131I'm kind of hoping it means it'll be up sometime in the next few days, if not today. But it still blows my mind how "two more weeks" turned into six months. lol
I would like pony TTS please. I want twilight sparkle to read a .txt file for me. 15.ai is not up.
>>39732207Apparently the last time it went down it was for over a year kek
>>39732265It's been down for 21 of the last 24 months.
>>39732265>>39732379The site didn't go down until September 2022, though. No development in that time, but it was still easily accessible. It's only been down 6 months.
>>39732265>>39732379What do you fags get out of making blatant lies like this?
>>39732427I'm just going off of what I've been told. Apparently it went down for a year according to one of my French.
>>39732436You’re a fucking shill and you know it. Fuck off.
>>39732440The fuck are you on about?
>>39732444How do you live your life having sold your soul to a multi-million dollar company? 11fags are fucking pathetic.
>>39732456 I've never even used 11.ai kek
>>39729154Seriously, what's the deal with this? Is this some abandoned project or something? Because every time I log back on to do more caption work, the percentage value never changes. Is this like my personal batch for my IP or something, or the overall percentage that we are through the image stockpile?
>>39732659Correction: the percentage value never changes unless I submit another description.
>>39731910Good God, is this real? This is truly amazing. In fact I'm mad at myself for not being amazed more.
>>39732451People here are highly strung about 11 shills
>>39732915Apparently so, huh?>Mentions a rumor I heard about 15.ai>"HOLY FUCK YOU SHILL!!!"Kek
You guys are still waiting? Don't you know 15 was shot by the ceo of openai? I'm his long lost brother
>>39732122Very nice. Twi's voice is a little bit too high-pitched in some places; however, it's overall a good attempt.
>>39732659I'm pretty sure it's the overall percentage for the whole dataset. Considering that it's like nearly every piece of pony art ever, it's a lot to describe.
>>39732659Mostly abandoned. As far as I'm aware the backend and database are still completely functional, but:
1. The existing dataset actually already performs fairly well with BLIP2, cf. Astralite's experiments. The plan for the future is to use the ~10k existing captions to develop a reward model to bootstrap a perfect captioning model from rating preferences, rather than directly writing captions (rough sketch after this post).
2. I have 0 time to maintain anything anymore, working on a million other things. In particular, the domain cutemares.xyz is going to expire in about a month. Contact me if you know anyone that wants it.
3. GPT-4 demonstrates such absolutely stunning capabilities on image comprehension that I'm honestly pretty depressed about it all. Example for pic related:
>The image captures a cozy and warm scene set in a wooden cabin interior during a snowy evening. A picturesque snowy landscape of a quaint village is visible through the large window, giving off a serene vibe. The night sky is filled with stars, illuminating the scene with a soft, ethereal glow.
>A cute, green-haired pony with pink eyes, yellow fur, and a pair of glasses resting on its nose is lying comfortably on a plush, patterned, red armchair. The pony appears to be engrossed in a book, with a small, content smile on its face, which exudes a warm and relaxing atmosphere.
>Near the pony, a small globe with earth-like colors and a few open books are resting on a wooden table. This setup indicates the pony's interest in geography and knowledge.
>A fireplace with a crackling fire casts a warm, inviting glow on the room, further enhancing the snug atmosphere. A sleepy orange fox rests on a lush green rug near the fireplace, curled up in a tranquil and content state.
>A wooden shelf above the fireplace holds several items, including a model airship, books, a snow globe, and an amethyst crystal cluster. These items showcase the adventurous and curious nature of the pony.
>Next to the armchair, a plump pet bed cradles a ginger cat snuggling with its kittens. Their peaceful slumber adds to the overall cuteness and serenity of the scene.
>Overall, the image radiates a feeling of warmth, relaxation, and cuteness, depicting a perfect moment of peaceful respite on a chilly winter evening.
tl;dr the entire project is becoming increasingly obsolete.
>>39733425Funny you should mention that. Rumour has it that OpenAI will be releasing a TTS service within 1~2 months from now.Both 11 and 15 are going to be irrelevant soon enough.
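For anyone curious what "a reward model from rating preferences" usually means in practice, here's a loose sketch of the standard pairwise (Bradley-Terry) preference loss; everything here (the model, the inputs) is a placeholder, not project code:
import torch
import torch.nn.functional as F

def preference_loss(reward_model, better_inputs, worse_inputs):
    # reward_model maps a (caption, image) encoding to a scalar score.
    r_better = reward_model(better_inputs)
    r_worse = reward_model(worse_inputs)
    # Maximize the probability that the preferred caption scores higher.
    return -F.logsigmoid(r_better - r_worse).mean()
Train something like that on the ~10k rated captions, and the scalar scores can then rank or filter candidate captions from a generative model.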
plz, i want to pony TTS... please....
>>39734554>ClosedAI>making anything irrelevantlol
>>39734554OpenAI wouldn't use any IP for TTS voices because they would actually risk the banhammer coming down from multiple companies, including Hasbro. And it'll still be closed-source, like all their other stuff, so there will probably be crazy filters in place.
>>39734554>Funny you should mention that. Rumour has it that OpenAI will be releasing a TTS service within 1~2 months from now.Source?
>>39734554Thanks for the response. I just wish I could contribute to the overall PPP, but this was the only thing I could do with my skill set, and now that it's gone I've got nothing to contribute.
>>39734554>Rumour has it that OpenAI will be releasing a TTS service within 1~2 months from now.And it will be the most cucked shit imaginable. "We've alerted the Florida Sheriff in charge of arresting you for being mean online" tier censorship.
>>39735391I wouldn't be surprised if this is like Whisper where they just open source it. It's not that big of a deal for them and a good open model would decimate the other voice AI companies which would benefit them in the long run, since they wouldn't get big enough to be able to pivot to their territory.
>>39734751SHUT THE FUCK UP
https://www.youtube.com/watch?v=84sVeYJ_poQDiscord song
>>39735389Sorry. There are a lot of other datasets that *could* be useful in the future, but they would require an entirely new data collection platform...(Some small ideas: action description --> animation, pony music descriptions, RP logs with larger powerful foundation models)>>39735391>>39735189For sure. Never going to get Twilight moaning from an OpenAI service. But at least 11 will sink.>>39735459Don't get your hopes up.......
>>39735652I dunno, I think there's too much of your own voice in there and it doesn't seem to sound enough like Discord. If John de Lancie were singing this song, his delivery would probably be more emotive.
>>39732179Live now. Animating.cytu *dot* be/r/PonyPreservationProject
Page 9 bump.
Hey Vul, can you please change the default config.json for the 4.0 training colab to an eval_interval of 1000 instead of 800? It's a little triggering, and it also makes it difficult to directly compare 3.0 models with 4.0 models due to the inconsistency in saving rates.
>>39686757https://huggingface.co/Amo/so-vits-svc-4.0_GA/blob/main/ModelsZip/TreeHugger_69k.zip
Tried to train the Tree Hugger 30s dataset for two days straight, but the result is not very impressive.
https://files.catbox.moe/57wjpc.wav
>>39686757Also, here is a tutorial for training the So-Vits-SVC 4.0 models offline with your own GPU (if your PC can run TalkNet or Stable Diffusion, it should be able to run this training too; just lower the batch size in the config file to 1, example below).
Video tutorial: https://www.youtube.com/watch?v=WIlq26jtr9k
Code instructions for creating the training conda env: https://github.com/AmoArt/UI_for_So-Vits-SVC/blob/main/folderDumpText/offlinesovit/info.txt
The ipynb training file: https://github.com/AmoArt/UI_for_So-Vits-SVC/blob/main/folderDumpText/offlinesovit/sovits_EN_training_colab_4_0_offline_(231703).ipynb
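As a concrete example of that batch-size tweak, something like this should work (the "train"/"batch_size" key names are assumed from the 4.0 configs; double-check against your own config.json):
import json

# Lower the training batch size so training fits on a low-VRAM GPU.
with open("configs/config.json") as f:
    cfg = json.load(f)
cfg["train"]["batch_size"] = 1  # slower, but avoids CUDA out-of-memory
with open("configs/config.json", "w") as f:
    json.dump(cfg, f, indent=2)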
>>39736782OK
>>39698078I seem to have run into an issue with the 4.0 inference notebook. Apparently it expects there to be a file that's not a zip?
>>39737543extract() step is supposed to extract the zip. Not sure why it's not in your case; seems to work fine with the default download.
>>39737570>https://huggingface.co/datasets/HazySkies/SV3/resolve/main/sovits_athena_44khz_10000_sv4.zipHere's the file in question I used as the "Other Downloads" step. Does it matter that I didn't press any of the "HF Downloads" buttons? I figured it would've worked and been quicker were it just to run the custom model. Not sure what I'm missing here. Athena is also nearing 20000 steps. 4.0 seems to be quicker than usual; might be imagining it though.
>>39737593I see what the problem is: the .zip directly encloses the checkpoint files rather than a folder containing them.
>>39737600I've modified the notebook to work with .zip files structured like this.
>>39737609* I neglected to create an actual model directory for the ckpts, whoops.
>>39737600>>39737609I see, guess that might explain why it dumped the checkpoints into the content directory prior to erroring. >Modified the colab to work with that structureBased, and fantastic. This is how the 3.0 ones are/were structured before, so this is another way to keep the training and inference methods consistent between the versions.>>39737628Oh, is this a result of the new format compatibility? Or an oversight before the change?
>>39737642I was making the change quickly and neglected to test the actual inference. It should be working now.
Are we back?
[SoVits] Athena wants to battle>https://files.catbox.moe/u3ymwz.mp4It's still remarkable how with SoVits we can ponify not only vocals, but instrumentals too.4.0 Athena is also now at 25k steps, and so is equal to her 3.0 counterpart in terms of training. She is available here:>https://huggingface.co/datasets/HazySkies/SV3/blob/main/sovits_athena_44khz_25000_sv4.zip
>>39731589who the fuck even are you anyway
show yourself bitch
Okay, new Bing is kind of impressive. Hopefully it doesn't get too neutered.
>>39728173I've been listening to this all week. Please make a longer version.
>>39740022I plan on making it longer since these test runs are done without vibrato and I’m a little flat on Pinkie’s vocals. Thank you for such nice words.
>>39731367thank you anon you articulated it better than I ever could
how's sovits for very small datasets? ie, side ponies with ~1 minute of dialogue that we'll never hear from again
>>39741063SoVits seems to do alright with very small datasets. I tried training Nurse Redheart a few weeks ago and she did alright, although I may have added too many noisy/muddled samples to her already small dataset, and I only trained her to 7000 steps due to suspecting the lower-quality dataset; so part of the lack in quality is likely on me.
Nurse Redheart original lines: https://files.catbox.moe/nf5tfz.wav
SoVits 3.0 Nurse Redheart (7000 steps): https://files.catbox.moe/csbieh.wav
I plan to retrain her later on, particularly if I can actually manage to clean enough of her audio to be confident that she'll do well.
Looks like the 3.0 sovits colab is down/broken; No module named 'numpy.typing'.Might train 4.0 mares in the meantime. Speaking of, is anyone presently training 4.0 Celestia/Luna/Cadance?I'm considering training them, as I need two of them (Celestia & Cadance) for my next song ponification, but wouldn't want to begin if existing training is happening already.
>>39741597Yes
How long until pony AI evolves to the point where it becomes incomprehensible to regular humans
>>39741909I support the mare singularity.Singularimare? Maregularity?
>>39738623Nice! Wouldn't it be better if you split the GB channels though? I assume you fed it the whole track, and the AI had to decide "when" she could sing. By doing the three channels separately (heck, you can try noise too) and mixing them afterwards, it would be better, I think.
>>39739934Sweet!
>>39742000Equine-gularity.>>39742106>Split the channels?>Might be betterYup. That's exactly what I had done this time around, unlike my previous tests with Fluttershy singing Terraria Underground. My favorite is her bass track. For some reason mares do low bass-y and percussion-y stuff really well.[SoVits] Athena Wants To Battle - Individual Channels>Combined: https://files.catbox.moe/7hibfb.mp3>Bass: https://files.catbox.moe/xg2cuc.mp3>Bit1: https://files.catbox.moe/jdrqfb.mp3>Bit2: https://files.catbox.moe/bt818l.mp3The earlier video is also a little different sounding in that I used an "AM radio" filter in order to give it that old feel. Like it was being played from a GB speaker or something.
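For anyone wanting to replicate the per-channel workflow, here's a rough sketch of the pad-and-sum mixing step (file names are placeholders, and it assumes mono stems rendered at the same samplerate):
import numpy as np
import soundfile as sf

channels = ["bass_out.wav", "bit1_out.wav", "bit2_out.wav"]  # placeholder names
waves, rates = zip(*(sf.read(f) for f in channels))
assert len(set(rates)) == 1, "render every channel at the same samplerate"

length = max(len(w) for w in waves)
mix = np.zeros(length)
for w in waves:
    mix[: len(w)] += w  # shorter channels are implicitly padded with silence

mix /= max(np.max(np.abs(mix)), 1e-9)  # normalize so the sum doesn't clip
sf.write("combined.wav", mix, rates[0])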
>>39741597>>39741736more 4.0 models
PRINCESS:
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Celestia%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Cadance%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Luna%20(singing)
CMC:
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Scootaloo%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Applebloom%20(singing)
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Sweetie%20Belle%20(singing)
GLIM:
https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main/Starlight%20(singing)
I'm going to switch to training only while I sleep because I want to do other stuff with my GPU/RAM during the day.
>9
I tried using BigVGAN (https://github.com/NVIDIA/BigVGAN) with TalkNet, as the 22 kHz model uses the same Mel settings. But the results are worse:
>TalkNet HiFi-GAN (downsampled to 22.05 kHz): https://files.catbox.moe/j9rhoy.wav
>BigVGAN Base 22kHz: https://files.catbox.moe/hoiyn0.wav
>BigVGAN 22kHz: https://files.catbox.moe/36u6l0.wav
Finetuning would help, but I'm not sure how much.
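For context, the reason this swap is even possible: both vocoders turn a mel spectrogram into a waveform, so one can stand in for the other whenever the mel parameters (sample rate, hop size, n_mels, fmin/fmax) match. A purely hypothetical sketch of the shared interface; the dummy classes below are placeholders, not the actual TalkNet or BigVGAN API:
import torch

# Stand-in vocoders (real HiFi-GAN/BigVGAN are conv nets; these dummies just
# illustrate the interface: mel [B, n_mels, T] -> waveform [B, T * hop]).
class DummyVocoder(torch.nn.Module):
    def __init__(self, hop_size: int = 256):
        super().__init__()
        self.hop = hop_size

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        batch, _, frames = mel.shape
        return torch.zeros(batch, frames * self.hop)  # silent placeholder audio

hifigan, bigvgan = DummyVocoder(), DummyVocoder()
mel = torch.randn(1, 80, 100)  # fake mel: 80 bands, 100 frames

with torch.no_grad():
    wav_a = hifigan(mel)  # what TalkNet normally uses
    wav_b = bigvgan(mel)  # the drop-in swap works only if mel settings match
print(wav_a.shape, wav_b.shape)  # both torch.Size([1, 25600])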
looks like 15 has a new maintenance page
>>39743012Many thanks for training so many models. I'm planning to add them to Docker volumes that I'll be redistributing for my project (see >>39722800). Please let me know if you have any issues with that. Quick update on the project:I have a functional Docker container for so-vits-svc 3.0 now and am able to upload audio via the UI, select a character from a dropdown, generate audio and listen to the output via the UI. I have already made a lot of progress on the so-vits-svc 4.0 Docker container too and expect it to be functional within a day or two.However, there are many limitations and I have many to-do items remaining before the project can be released. None of the pre- and post-processing options have been implemented yet, for instance, and only .wav input files work at the moment.Since the Volumes Backup & Share extension for Docker Desktop is currently limited to 10GB per volume, I am planning to package up models in <10GB bundles. For so-vits-svc, this will likely take the form of mane 6 + CMC speaking, mane 6 + CMC singing, and then various assorted collections of other characters.I still haven't settled on a name, but I have a few more ideas: "Neigh Say", "Herd Words", and "The Horse's Mouth". As a more generic name, I like how "Cross Talk" sounds, but it carries a negative connotation in electrical engineering. Another one that gave me a chuckle but I won't use is "Mare Swear". My favorite so far is "Hay Say", which Hazy suggested earlier.
so when are we gonna start using this tech?>>92184008
https://rentry.org/nv3q9
https://rentry.org/f34hy
>>39745558>>>/g/92184008