/mlp/ - Pony

  • There are 109 posters in this thread.



File: AltOP.png (1.54 MB, 2119x1500)
TwAIlight welcomes you to the Pony Voice Preservation Project!
https://clyp.it/tm03e5en

This project is the first part of the "Pony Preservation Project" dealing with the voice.
It's dedicated to saving our beloved ponies' voices by creating a neural-network-based text-to-speech system for our favorite ponies.
Videos such as youtu.be/GuJKTodX1FA or youtu.be/DWK_iYBl8cA have proven that we now have the technology to generate convincing voices using machine learning algorithms "trained" on nothing but clean audio clips.
With roughly 10 seasons (9 seasons and 5 movies) worth of voice lines available, we have more than enough material to apply this tech for our deviant needs.

Any anon is free to join, and many are already contributing. Just read the guide to learn how you can help bring on the wAIfu revolution. Whatever your technical level, you can help.
Document: https://docs.google.com/document/d/1xe1Clvdg6EFFDtIkkFwT-NPLRDPvkV4G675SUKjxVRU

We now have a working TwAIlight that any Anon can play with:
https://15.ai/
https://derp.link/vCzm2 (48KHz Training)
https://derp.link/hdJQF (48KHz Synthesis)
https://derp.link/NR7Xi (Ngrok Synthesis)
https://derp.link/YTJ94 (Guide)

>Active Tasks
Cookie is working on controllable speech
Research into animation AI
Research into pony image generation

>Latest Developments
Clipper sorts animation files (derp.link/O24pp)
Clipper looking for AI skit ideas (derp.link/JfVsA)
Clipper collecting sound effects from show (>>36723767)
New DeltaVox (>>36812261)
Training notebook for HiFi-GAN (>>36874641)
New guides and notebooks for training/exporting models for DeltaVox RS (>>36898031)
Clipper voice dataset (>>36901235)
Clipper added to the HiFi-GAN notebook (>>36905521)
Train your own CLIP model (>>36930047)
GPT-2 model released (>>36930714)
Animation script finished (>>37003821)
New audacity to TacoTron training text tool (>>37025693)
15 makes updates to test site
TalkNet as a potential replacement for TacoTron (>>37040781)
TalkNet update (>>37082982 >>37119597)
Public contributions reintroduced for next year's panel (>>37099451)
Singing Talknet models (>>37134971 >>37144858)
Animate automation tool available (>>37147092)
DiffSVC UI done (>>37150296)
GDrive clone of Master File now available (>>37159549)
New TalkNet models (>>37179832)
Better copy of show bible available in doc (>>37246652)
FiMFic based GPT-J-6B demo notebook (>>37284129)
Latest Synthbot progress report (>>37241505 >>37251301 >>37253865)
Latest Cookie progress report (>>37241115)
Latest Clipper progress report (>>37189422 >>37193768)

>Voice samples
https://derp.link/fHs3K
https://derp.link/O1xdh

>Clipper Anon's Master File 2.0:
https://mega.nz/#F!L952DI4Q!nibaVrvxbwgCgXMlPHVnVw
https://mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ

>Synthbot's Torrent Resources
https://derp.link/ZJNca

>Cool, where is the discord/forum/whatever unifying place for this project!?
You're looking at it.

Last Thread:
>>37240950
>>
FAQs:
>READ THE DOC
Do it now
derp.link/V7cMp

>Where can I find things made with the voice AI?
In the Good Poni Content folder: derp.link/23EUs

>Did you know that such and such voiced this other thing?
Yes. We are very much aware. It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan imitations of official voices?
No.

>How do I make the voices?
Several guides are available. In depth guides on how to do training and synthesis (making the ponies speak) are in the doc. If you don't want to use the navigation bar in the doc, the sections are also directly linked in the OP. If you want to use the WiP 48KHz notebook, some kind Anons have put together some image guides for you.
48KHz Training: derp.link/wW2hX
48KHz Synthesis: derp.link/j4MXQ

>How do I make the ngrok links?
Doc: derp.link/SfIhY
Video: derp.link/qYgIp

>Where are all the voice samples?
In the doc.

>Is there a place I can find all the pony models?
In the doc.

>What about muh waifu?
Check the doc.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can, however, get most of the way there by using phonetic transcriptions of other languages.

>What about [insert OC here]'s voice?
Not a priority. Again, however, you're welcome to. There are already people doing this.

>Where can I view the PPP /mlp/con panel?
>2020:
YouTube: youtu.be/WtuKBm67YkI
CyTube chat: pony.tube/videos/watch/b83fbbfc-6d4e-4768-8deb-edb61ea38abb
>2021:
YouTube: youtu.be/RAYWr1uOGVM
CyTube chat: pony.tube/videos/watch/56cf0502-0ef8-41a7-96c5-bd7cb727bb9f

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
derp.link/CQ3Ca
>>
File: WhiteAnk.jpg (966 KB, 1280x1920)
>>37286871
Anchor.
>>
i haven't seen 15 in a bit, i miss them already
>>
>>37286879
He posted not even 3 hours ago.
>>
>>37241115
>Cookie 11 days ago
Damn I'm slow.
Anyway, I've:
- written DiffWave
- written HiFi-GAN
- written Fre-GAN
- written new universal BOHB modules so every model and every line of every config file can be "tuned" automagically to improve results.
- written Unet, DilatedWN and FFT versions of CTC models. (conformer version soon too)
- trained the new HiFi-GAN to 842k steps on single GPU
- tuned a few DiffWave + CTC model variants
- moved my data to faster storage
- reduced RAM usage on dataloaders (letting me spin more up at a time to help with small model training)

In fact, reading
>>37241115
> Some main things I expecting to fix;
It looks like I've completed almost everything I said I would do.
Next up is training all the old baselines and comparing them against new models/modules programmatically, and crushing all the bugs I can find.

>>37241494
>What's DiffPitch? A diffusion version of FastPitch?
Diffusion version of DiffSVC.
PPG + Noisy F0 -> Denoised F0 (for ~10 steps)
It's a small network that works in conjunction with the mean-shifting pitch preprocess to ensure all pitch values for the speaker fall within their speaking range (using PPG to take into account the phoneme being spoken simultaneously).
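
For anyone wondering what a mean-shifting pitch preprocess could look like, here is a minimal sketch. It is only an assumption about the general idea (move the source F0 contour so its average log-pitch matches the target speaker), not Cookie's actual code. The function name `mean_shift_f0` and the convention that unvoiced frames are stored as 0 Hz are both made up for the example.

```python
import math

def mean_shift_f0(f0_hz, target_mean_log_f0):
    """Shift voiced F0 values so their mean log-pitch matches a target speaker.

    Assumption for this sketch: unvoiced frames are stored as 0 Hz and are
    passed through unchanged.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return list(f0_hz)
    source_mean = sum(math.log(f) for f in voiced) / len(voiced)
    # A constant offset in the log domain is a constant pitch ratio in Hz.
    ratio = math.exp(target_mean_log_f0 - source_mean)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]

# Hypothetical input: a low-pitched contour moved toward a 220 Hz average.
source_f0 = [0.0, 110.0, 120.0, 0.0, 130.0]
shifted_f0 = mean_shift_f0(source_f0, math.log(220.0))
```

The shift keeps the shape of the contour and only moves its register. A denoising network like the one described above would then only need to clean up the details.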
>>
>>37286975
Oh, using diffusion to clean up F0, that's clever
>>
>>37286884
It's like he's still here, with us, even now.
I swear sometimes I can still even see his posts...
>>
>>37286879
>them
15 is a team of people? didn't know about that
>>
>>37286975
>universal BOHB modules
What is this?
>>
>>37287144
Here's your (You).
>>
>>37287224
It is a parameter tuning algorithm that is slightly faster than the normal ones, combined with my ~600-line config file.
I can add any line of the config to a tuning file (along with appropriate limits) and it'll run many many training runs and slowly find the values that result in the lowest validation loss (or whatever I set) possible.
It's really slow, but extremely low effort to use and can find patterns that I miss because I don't expect them to work and never test them.
Since it works directly with the config files and train module, it will work with every model and data feature I build in future, so it should reduce the amount of manual work by a decent chunk or let me work with more GPUs at a time.
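
A rough sketch of the "tuning file plus many training runs" idea, for anons who want to picture it. This is not BOHB itself: it uses plain random sampling with successive halving (the budget-doubling part of Hyperband) and skips the Bayesian model entirely. `TUNING_LIMITS`, `toy_validation_loss` and the config keys are hypothetical stand-ins, not anything from the real config.

```python
import random

# Hypothetical tuning file: config key -> (low, high) limits.
TUNING_LIMITS = {
    "learning_rate": (1e-5, 1e-2),
    "dropout": (0.0, 0.5),
}

def sample_config(base_config, limits, rng):
    """Copy the base config and overwrite each tuned key with a value inside its limits."""
    config = dict(base_config)
    for key, (low, high) in limits.items():
        config[key] = rng.uniform(low, high)
    return config

def toy_validation_loss(config, steps):
    """Stand-in for a real training run; lower is better."""
    lr_penalty = abs(config["learning_rate"] - 1e-3) * 100
    dropout_penalty = abs(config["dropout"] - 0.1)
    return lr_penalty + dropout_penalty + 1.0 / steps

def successive_halving(base_config, limits, n_configs=16, min_steps=10, seed=0):
    """Train many configs briefly, keep the best half, double the budget, repeat."""
    rng = random.Random(seed)
    candidates = [sample_config(base_config, limits, rng) for _ in range(n_configs)]
    steps = min_steps
    while len(candidates) > 1:
        ranked = sorted(candidates, key=lambda c: toy_validation_loss(c, steps))
        candidates = ranked[: len(ranked) // 2]
        steps *= 2
    return candidates[0]

best = successive_halving({"batch_size": 32}, TUNING_LIMITS)
```

Because the search only touches keys listed in the limits table, anything else in the config (here `batch_size`) passes through untouched. That is what lets the same tuner work across different models.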
>>
File: brain ouch.png (149 KB, 250x250)
Hello? Yes, 15 inc.? Your Fluttershy model generated static at me, This is unacceptable and I demand to speak to the manager so I may receive compensation for this injustice.
>>
>>37287270
Fluttershy works fine for me, what browser are you using?
>>
>>37287275
I'm just shitposting, models work fine
>>
File: 1440995870531.png (24 KB, 450x352)
>>37287325
>ppp is actually polacks
Kek, thx slavs.
>>
wew, they really are doing it for free.
Anyhow, anybody making new projects or something? I'm feeling like re-dubbing some meme songs with pony voices.
>>
>>37287369
talknet got celly voice or nah?
>>
>>37287369
I'm about to start work on PTS4. Also working on another thing that'll probably come out before that, depending.

>>37287381
Yes for Celly voice, no singing model for her yet though.
>>
File: 1399131466016.png (3.05 MB, 1529x2034)
>>37287408
>Yes for Celly voice, no singing model for her yet though.
I mean we are not really far away from it but still I'm really grateful for what all of us made real.
Literally what other fandom made multiple artificial intelligences because they love their show/waifu that much?
In retrospect it really is fucking insane what we have done.
>>
After training GPT I will want to make a porn StyleGAN2. Has anyone mass scraped images from one of the boorus yet?
I'll also want the TPDNE checkpoint to finetune from.
>>
>>37286884
link?
>>
>>37287502
The previous thread isn't archived yet, you can still make it! You just have to believe!
>>
>>37287486
There's the TPDNE dataset, but it's just faces, so it won't be helpful. Clipper was labeling pony butts, could be useful depending on what you mean by porn.
For mass scraping, maybe talk to the altboorus. Don't know if TPA or iwiftp has something.
Checkpoint is linked on the TPDNE website. Pretty sure it uses the estimator branch of Shawn's fork. See https://make.pony.pictures for a usage example.
>>
>>37287369
Making one more effort to attempt the TalkNet. Hasn't worked on my end, but that's probably because I use Firefox. If it still doesn't work, I'll stick to 15ai until I can figure it out; probably will just have to add another browser that works for it.

Currently working on a Sci-Twi audio idea I had a while back. It was on hold when the test sites went down, so I'm working on it a bit tonight. Probably won't be done for a while. Currently sits at 18 minutes long.
>>
File: netscape.png (27 KB, 632x482)
>>37287517
To narrow down your search, it doesn't work in Netscape Navigator either.
>>
>>37287517
>Sci-Twi
Is this the part where I'm supposed to say
>no hooves
?
>>
>>37287550
Say the line, Anon!
>>
For some reason 15's models really don't like sentences starting with 'Thus, '. Doing so results in audio that either has noticeable artifacts/static or skips over the problematic part really quickly. Using high-emotion contextualizers makes it worse:
>Thus, the two sisters maintained balance for their kingdom and their subjects, all the different types of ponies.|I'm bored.
https://u.smutty.horse/mcfeqknxwzs.wav
>Thus, the two sisters maintained balance for their kingdom and their subjects, all the different types of ponies.|What?!
https://u.smutty.horse/mcfesppbomu.wav
>Thus, the two sisters maintained balance for their kingdom and their subjects, all the different types of ponies.|I can't wait!
https://u.smutty.horse/mcfermkotbx.wav
Interestingly, the issue doesn't seem to apply to similar words like 'therefore'.

>>37287550
>Sci-Twi
>no hooves
Wait hold on, you mean that the (pony) Twilight with glasses that appears in some fan videos is actually from eqg and not an original fan idea? That explains a lot.
>>
File: fondue.webm (2.92 MB, 1280x720)
>>37287578
Anon...
>>
File: you did it.jpg (155 KB, 1280x720)
>>37287578
>no hooves

Also, the models are really picky like that sometimes. I remember BGM posting an example text for his second pony rap video, and the models only accepted the very last sentence and ignored the rest.
>>
>>37287517
>Currently working on a Sci-Twi audio idea I had a while back. It was on hold when the test sites went down,
Too bad it couldn't be on hold indefinitely you barbie faggot. Take it to one of the EQG threads. It doesn't belong here.
>>
>>37287625
If you niggers put half as much energy into loving ponies as you do into hating EQG we would have robo-waius and probably our own 10th season at this point.
>>
>>37287650
>You will never go to Equestria.
Extremely ironic coming from someone defending a character who left Equestria.

>>37287665
You apparently weren't here from the start but the idea of scraping audio from eqg caused controversy in the early stages of PPP despite the characters sharing VAs. Ultimately it's a good thing that it was done but it wasn't unanimously praised.
Let me rephrase it, even touching the eqg audio (which objectively doesn't differ from FiM) was treated like a minor blasphemy.
>>
https://u.smutty.horse/mcffdjquygg.wav
>>
>>37287680
>even touching the eqg audio (which objectively doesn't differ from FiM) was treated like a minor blasphemy.
Tell me about it. I'm glad Clipper was willing to pick that up for me because I sure as hell wasn't looking forward to dealing with it. I acquired the audio, but that was about all I did involving EQG here. It would be a mistake for any of these barbiefags to think that this meant they were ever welcome here.
>>
>>37287685
>I'm not adding potential data to the project that would increase the quality of our waifus because FINGERS
okay buddy
>>
>>37287743
Do you even read, retard?
>>
Boy am I glad deleted posts don't count towards bump limit, or PPP threads would be a few hundred short thanks to these goobers who piss about purity instead of posting ponies.
>>
>>37287901
Why the hell would you bring this up again?
At best you're an idiot and at worst a troll trying to reignite the shitflinging.
>>
>>37287910
Nah, just wanted to make my quirky observation about bump limits, I also feel if it went unreferenced, some jackass might just come in and do it all again thinking it's for the first time.
>>
File: 516501_cropped.png (118 KB, 423x377)
Just migrated to Linux 'cause Windows kicked it. Speaking of, looks like 15 kicked away the "final" huh?
I look forward to the 14th update past here just so we can get to 15.15 and double it up.

Back to Linux though, is there a Linux specific version for DeltaVox RS? Or am I gonna have to try and Wine the existing one?
>>
>>37287932
Last time I checked the doc, I don't think I saw a Linux specific version. Wine might be your best shot. Imagine not dual booting Win7.
>>
>>37287932
>Linux specific version for DeltaVox RS? Or am I gonna have to try and Wine the existing one?
You're gonna have to use Wine, I have many things imported as Windows-only DLLs (even if they're cross-platform) and porting to Linux is planned but only when I have absolutely nothing else to do. Although it might be unfeasible without major refactors because the Logitech LED API is Windows-only.
>>
>>37287946
Alrighty then, thanks for the quick response. I guess I'll let you know how the Wine-ing goes.
If worst comes to worst I guess I'll be able to run it within a virtual box? Let's hope it doesn't come to that.
I look forward to your linux version though, whenever that comes around.
>>
>>37287970
From the distant memory of trying out a very early build in my Kubuntu 18.04 LTS installation back when I was dual-booting, it didn't go very well, possibly because Tensorflow is pretty big and complex. Interested to see your results.
>>
File: Deltavox results edit.png (454 KB, 1675x982)
>>37288045
So far results don't seem too good. It can't find certain dll libraries it seems, even though most of them seem to exist still. Strange. The directory in question is identical to what I had working on Windows, so there's no missing files. And Wine is working too as I was able to run the included 'Visual C++ Redistributatable (Install if v140 dll error).exe' just fine.
>>
>>37287262
That's pretty cool. Other than BOHB for hyperparameter tuning, HiFi-GAN for vocoding, and diffusion methods for improving audio quality, have you found anything else that looks like a clear winner?
By the way, it looks like you're not looking into anything involving equivariant networks. I think those are getting really popular. Here's an example used by AlphaFold2: https://arxiv.org/abs/2006.10503.
Equivariance sounds complicated, but it just means "hidden layer outputs have the same structure as their input, and they deliberately reflect certain characteristics of the input." For example, a convolutional layer's outputs have the same pixel structure as their inputs, and their outputs will reflect translations in their inputs. Convolutions also reflect rotations when using rotationally-symmetric kernels, but that's really rigid since that always turns, e.g., 50 degree rotations in the input into 50 degree rotations in the output. SE(3) transformers also reflect rotations, but they can do so more flexibly, e.g., by stretching or shrinking rotations.
I don't think it's been applied to speech yet, but it's something to watch for, especially in voice conversion.
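
A tiny toy check of the convolution claim, in case "equivariance" still sounds abstract. It assumes circular (wrap-around) padding so the identity holds exactly; real conv layers with zero padding only satisfy it away from the edges. The helper names are made up for the example.

```python
def circular_conv(signal, kernel):
    """1D cross-correlation with wrap-around indexing."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]

def shift(signal, amount):
    """Rotate the signal to the right by `amount` samples."""
    amount %= len(signal)
    return signal[-amount:] + signal[:-amount]

signal = [0, 1, 4, 2, 0, 0, 3, 1]
kernel = [1, -2, 1]

# Equivariance: shifting the input shifts the output by the same amount.
shift_then_conv = circular_conv(shift(signal, 3), kernel)
conv_then_shift = shift(circular_conv(signal, kernel), 3)

# Invariance: pooling over all positions throws the shift away entirely.
pooled_original = sum(circular_conv(signal, kernel))
pooled_shifted = sum(shift_then_conv)
```

Here `shift_then_conv` and `conv_then_shift` come out identical, while the pooled sums show the stronger property (invariance) you get once position is discarded.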
>>
>>37288297
I read that AlphaFold2 is using Invariant Point Attention instead of SE(3) transformers, but I think the idea is the same.
I'm curious what kind of equivariance would be good for speech, though. Translational invariance, of course. That's already built into 1D convs (though aliasing may hurt performance a bit, e.g. see Alias-Free GAN). "Flip invariance" via symmetric kernels doesn't seem that useful. Maybe something could be done with spectrograms, since they're 2D?
>>
File: mpv-shot0034.jpg (264 KB, 1920x1080)
>>37287684
>Posting Content
https://u.smutty.horse/mcfiokhnvgu.wav
>>
>>37288683
Highlights how strange it is to shout out loud what you're typing as you type it.
>>
File: WoodyFace.png (113 KB, 199x227)
>>37288683
Kek, nice!
>>
Is SortAnon's TalkNet not working for anyone else? Last two times I tried to use it, the generate button just doesn't work. No error or anything, it just doesn't create an output. I'm running Chrome, I've changed nothing on my end since it was last working.
>>
>>37288808
I tried signing out of my Google account and signing back in, that fixed it for me.
>>
>>37288297
>have you found anything else that looks like a clear winner
There are no clear winners that I can think of. Everything has at least 1 trade-off. Be it performance, variability, stability or coding complexity.
>I don't think it's been applied to speech yet, but it's something to watch for, especially in voice conversion.
Thanks.
I'm going to be rewriting all my networks to use my new modules for at least a few more weeks, but I'll keep this in mind for when I want new stuff to test out.
>>
>>37288816
Hmm, didn't seem to make a difference for me. Still not getting any output from the models.
>>
>>37288565
PCM is a phase and amplitude assigned over time. Voice conversion should probably be equivariant with shifts in most of these things and their rates of change. That means phase shifts (to capture spatial positioning), time shifts (to capture timing... invariance might work better here than equivariance), frequency changes (i.e., change in phase over time, to capture formants), and amplitude shifts (to capture volume... maybe use the Bark scale version of these). The only one it probably shouldn't be equivariant with is shifts in power / MFCCs since that one is usually used as a voice signature to identify the speaker.
>>
>>37288808
Well, it did work for me yesterday, but now I'm getting the error No module named 'dash'. I was able to get it to the point of generating the UI window by adding the missing modules above step 2:

!pip install dash
!pip install jupyter_dash
!pip install crepe
!pip install psola
!pip install torch_stft
!pip install kaldiio
!pip install pydub
!pip install frozendict
!pip install unidecode

!pip install pyannote.audio
!pip install g2p_en
!pip install pesq
!pip install pystoi
!pip install ffmpeg-python

HOWEVER, that just leads me to the same point as BGM, where clicking the Generate button does nothing.
>>37288816
Following this in a new incognito tab gives me error 403, both with step 3 and step 3B.
Then I factory reset the runtime and ran everything again without the extra code, and I was still getting error 403. It seems Colab is being very fickle with this particular code.
Can someone tech savvy figure out how to run this offline? It seems Google is really going out of its way to fuck around with the Colab code.
>>
>>37284805
>The only downside is that I was previously getting the origin point for each shape from JSFL. I can't do that anymore since I don't have the mapping between XFL elements and JSFL elements. I was doing that because I couldn't figure out how to get that information from the XFL. It looks like the <transformationPoint> needs to be converted somehow, and I haven't figured out how.
I'm still struggling with this. I found some code that does this for Unity, and it looks like transformation matrices and origins are tracked separately. As in, it doesn't apply origin changes and matrices alternatingly, it only applies matrices to matrices and origin shifts to origin shifts. I don't understand how that's supposed to work, but I can try it.
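
For reference, this is the generic 2D geometry of applying a matrix about a pivot point, as a sketch. It is NOT a claim about how XFL or the Unity code stores things. The `(a, b, c, d, tx, ty)` tuple layout and the function names are assumptions picked for the example.

```python
import math

def rotation(degrees):
    """Affine matrix as (a, b, c, d, tx, ty): x' = a*x + c*y + tx, y' = b*x + d*y + ty."""
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r), -math.sin(r), math.cos(r), 0.0, 0.0)

def apply(matrix, point):
    """Apply the affine matrix to a single (x, y) point."""
    a, b, c, d, tx, ty = matrix
    x, y = point
    return (a * x + c * y + tx, b * x + d * y + ty)

def about_pivot(matrix, pivot):
    """Return the matrix equivalent to: move pivot to origin, apply matrix, move back."""
    a, b, c, d, tx, ty = matrix
    px, py = pivot
    # The linear part is unchanged; only the translation absorbs the pivot.
    return (a, b, c, d,
            tx + px - (a * px + c * py),
            ty + py - (b * px + d * py))

# Rotate 90 degrees around the pivot (1, 1).
pivoted = about_pivot(rotation(90), (1.0, 1.0))
moved = apply(pivoted, (2.0, 1.0))
fixed = apply(pivoted, (1.0, 1.0))
```

The pivot only ever changes the translation terms, so the linear part and the pivot can be carried separately and folded together at the end. That might be why some code tracks them apart.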
>>
>>37288864
just tossing an idea out there but could it be possible to make talknet save the generated wavs in the google drive folder like the ngrok does ? would that make any difference SortAnon ?
>>
File: error 403.jpg (76 KB, 764x398)
Hmm, it seems Colab isn't a fan of re-downloading files from the same GitHub source, so maybe having all the anons copy the TalkNet notebook to their own Google Drives and then run it from their personal accounts would solve this?
>>
>>37288901
Just tried saving a copy in my drive and running it from there, still getting the same error BGM reported of the generate button not doing anything. At this point I'm pretty sure that the issue is something inherent to the script that only SortAnon can fix.
>>
File: 1625858392639.png (152 KB, 380x415)
>>
I've made a list of all the tags used in the master folders for the Music and SFX files for organizing reasons, just posting those here as perhaps some other anons could use those for their own projects.
https://pastebin.com/2CHjh5tW
https://pastebin.com/6BkgJV1w
>>
>>37287144
>>
bump
>>
File: archive all med done.png (43 KB, 1228x484)
>page 10
Bumping with this chart now that the FIMFiction archive 20k steps finetuning run is complete. Currently readying the model for inference in Colab by slimming it.
>>
>>37290865
FIMFiction-20k model can be played with in Colab.
https://colab.research.google.com/drive/13R8MJEDTwinEmUJMLqydKOIcAvWiBIlT?usp=sharing
>>
On 15.ai, voices of other characters bleed into generated speech clips from models that are trained with 0.3 minutes of audio data.

For instance, the Snails voice model sounds this way: https://u.smutty.horse/mcfotncapfe.wav
>>
>>37291191
Yeah I noticed this as well, I thought Octavia sounded like Twilight and Rarity a bit too much at times.
>>
>>37291191
Good thing nobody gives a shit about snips or snails.
>>
>>37291162
is there a way to download the text model so i could generate my own text offline ?
>>
>>37291346
You can download it from the same link the notebook downloads it from. Here, I'm pasting it for you: https://storage.googleapis.com/xdisk/fimfmed.tar
But you need an RTX 3090 and Linux with everything installed or TPUs to run it locally, it's a really, really big model.
>>
>>37290137
Thanks for that, I've added these as "taglist.txt" in the respective "SFX" and "Music" folders.
>>
>>37291366
https://github.com/arrmansa/Basic-UI-for-GPT-J-6B-with-low-vram
Apparently it is possible to run it on a 1060 6GB GPU, but it also seems to take a large hit in performance / quality.
>>
>>37291366
>Needs RTX 3090
Wouldn't it be able to work with any decent card, but just allow it to process longer for the same result?
Like... If a GTX 1650 is around 400% slower, wouldn't that just mean it'd take 4 times as long to get a similar result?
>>
>>37291903
The problem is that the whole model has to physically fit in memory while the GPU runs the code. The GitHub code above gets away with using older cards because they are still compatible with the tensorflow code, BUT they still need to split the load between GPU memory and system RAM.
Having a GPU with over 20GB just means it's all in one spot and can be processed all at once. You cannot just tell the computer to load 1/4 of the GPT model (imagine reading one part of a book that was chopped into four parts like a pie; it would be impossible).
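
Back-of-the-envelope numbers for the memory point, as a sketch. It assumes roughly 6 billion parameters (going by the "6B" in the model name) and counts the weights only; activations and framework overhead come on top, so real usage is higher. The function names are made up.

```python
def weights_memory_gib(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 1024 ** 3

def fits_in_vram(vram_gib, needed_gib):
    """True if the weights alone fit; says nothing about activations."""
    return needed_gib <= vram_gib

GPT_J_PARAMS = 6_000_000_000  # assumption: "6B" taken at face value

fp32_gib = weights_memory_gib(GPT_J_PARAMS, 4)  # 4 bytes per float32 weight
fp16_gib = weights_memory_gib(GPT_J_PARAMS, 2)  # 2 bytes per float16 weight

for card, vram in [("RTX 3090", 24), ("GTX 1060", 6)]:
    print(card, "holds fp16 weights:", fits_in_vram(vram, fp16_gib))
```

A slower card does not fix this. Speed only decides how long each step takes, while memory decides whether the step can run at all without spilling into system RAM.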
>>
>>37291930
Oh I see, so it's not so much a matter of processing the same amount of data over time, but rather needing the required memory overhead in order to support such a massive model.
Thanks for clearing that up, anon.
>>
File: Gasp 2.jpg (136 KB, 1253x1129)
>>37291162
I'm getting MUCH better results than from the 500 step iteration, though I guess that's to be expected. It actually seems to generate things somewhat coherently, perhaps even getting close to the coherency of the GPT-2 AID model. I'm having a blast with it so far.
>>
>>37288901
>re-downloading files
Ah fuck, I think I know why it's broken now. I'll look into it in a few hours.
>>
Try the new TalkNet notebook and see if it spits out an error message. Post it here if it does. If the error is "VersionConflict", try restarting the runtime (but copy the error message first).

If it still fails without an error, post a screenshot of your models folder, like this. You might need to click the refresh icon for it to show up.
>>
>>37292758
I got the VersionConflict error but restarting the runtime fixed it. I copied the error message if you want me to paste it. Apart from that it seems to be working for me now at least. Tested two different lines and got the inputs as expected.
>>
https://u.smutty.horse/mcftehfskaa.wav
>>
>>37288867
I figured it out.
- For shape objects, the edge format defines the origin point. That origin is relative to the midpoint of the shape. I can dump the origin point and the size of the shape with Animate.
- https://u.smutty.horse/mcftdfylrke.mp4

It turns out that didn't fix the issue I had last time. I'm getting the same animation with the messed up eyebrows, choppy animation, and non-animated wings. BUT the whole thing runs way faster, it's way cleaner (you'll have to trust me on this since the code still looks like shit), and I no longer run into any of Animate's random failures when trying to dump shapes.
I got a hint from the anon in the last thread about what's making the animation choppy. I'm going to commit my stuff for now, upload the updated tool for dumping animations, and work on the choppiness when I get back in several days.
Render Anon or Morph Anon, if you want to try figuring it out while I'm away, I should still be able to respond to posts occasionally. If you need samples to test out, Clipper knows how to convert files to the new format.
>>
File: mlp all low.png (55 KB, 1260x497)
I am now beginning the /mlp/ dataset finetuning run - it is as I explained in one of my earlier posts last thread. When trained, it'll work as a shitposter and green writer.
>>
>>37292891
Correction: I can't post the updated auto-animate tool yet. Pyinstaller doesn't play nicely with cairocffi. I'll need to figure that out first.
>>
>>37288851
I think human hearing is invariant to phase, so I'm not sure if phase equivariance is useful. And if you use magnitude spectrograms, you're already ignoring phase. I've only seen the mono -> binaural audio NN care about phase since it affects perception there.
Time equivariance seems good. Some invariance could be good for changing prosody? Depends on if it's a 1 to 1 conversion or a seq to seq.
Frequency equivariance makes sense. I guess that's what pitch augmentations aim for.
Amplitude equivariance makes sense too.
Wonder if all of these things need special architectures or just tweaks to existing ones.
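
Quick toy demonstration of the magnitude/phase point, as a sketch only. It uses a naive DFT on a single 32-sample frame and a circular delay, so everything is exact. Real spectrograms use windowed, overlapping frames, where a delay only approximately preserves the magnitudes.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform; fine for tiny toy frames."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def magnitudes(signal):
    """Magnitude spectrum: the phase of each bin is thrown away."""
    return [abs(c) for c in dft(signal)]

# Toy frame with two sinusoids.
frame = [
    math.sin(2 * math.pi * 3 * t / 32) + 0.5 * math.sin(2 * math.pi * 5 * t / 32)
    for t in range(32)
]
delayed = frame[-4:] + frame[:-4]    # circular delay: only the phases change
louder = [2.0 * s for s in frame]    # amplitude change: magnitudes scale along

mag_frame = magnitudes(frame)
mag_delayed = magnitudes(delayed)
mag_louder = magnitudes(louder)
```

So a magnitude front end is already invariant to this kind of shift and equivariant to gain, with no special architecture needed. Relative phase between channels is the part it cannot see.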
>>
File: file.png (64 KB, 301x291)
>>37292881
Recently remade the /tg/ station Start/End Round Sounds for the /mlp/ SS13 server (/vg/ Codebase), not sure if it fits here but it's content.

Orginal Start/End Round Sounds: https://u.smutty.horse/mcftodlmcrj.wav

Ponified: https://u.smutty.horse/mcftodotcky.wav
>>
>>37292923
Human hearing uses phase to spatially position a sound. That doesn't matter much for our ability to recognize a speaker, but it can have a big impact on how realistic something sounds.
>>
>>37292945
Yeah, I guess I meant to say that absolute differences in phase don't matter. But relative differences certainly do
>>
>>37292758
Got the 'VersionConflict' error, so I followed the runtime->restart routine steps and ran it all again.
Now, after pressing the generate button, I'm getting the error 'cuDNN error: CUDNN_STATUS_NOT_INITIALIZED'
>>
>>37293086
I've tried to run it in incognito Opera and am once again getting error 403.
>>
File: 1622446133639.png (363 KB, 720x640)
>>37287144
>>
Is there a way to make Trixie say The Great And Powerful Trrrrixie?
https://vocaroo.com/1iomCiEQTi0D
>>
File: file.png (173 KB, 900x807)
>>37293322
I've tried making her roll her R's but to no avail, she sure does sound cute while saying it though
https://u.smutty.horse/mcfvloqfhqy.wav
>>
>>37293086
I think I've fixed this error. If it shows up again, post your output from step 1.
>>
By the way, there's no reason TalkNet can't run offline. How many of you have gaming PCs?

https://www.strawpoll.me/45515369
>>
>>37293388
>output from step 1
I got the usual gpu Tesla T4 15109MiB
And now Colab has randomly decided it does want to cooperate on Firefox once again. This inconsistency makes me think that maybe this is a problem of too many people downloading the files at the same moment?
>>37293388
Also, since you are talking about an offline version, that would be pretty great since I could use the other HiFi-GAN or ngrok notebook at the same time when working on projects.
If I may make a suggestion for the code upgrade: could it be possible to add a "ticked on" option for adding an extra 3~5 seconds of silence at the end of clips? I've noticed that messing around with echoes and reverb is a bit difficult, as many editing programs refuse to "go over" the audio clip length, making a weird hard cut on the echo effects (nothing that can't be fixed in Audacity, BUT editing 100+ clips by hand does add up).
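
Until something like that exists in the notebook, here is a minimal sketch of padding a clip with trailing silence using only Python's standard `wave` module. It assumes plain uncompressed PCM WAV files; `append_silence` is a made-up helper name, not part of TalkNet.

```python
import wave

def append_silence(src_path, dst_path, seconds):
    """Copy a PCM WAV file and add `seconds` of silence at the end."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    # 8-bit WAV samples are unsigned, so silence is 0x80; wider samples use 0x00.
    pad_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    n_silent_frames = int(params.framerate * seconds)
    silence = pad_byte * (n_silent_frames * params.nchannels * params.sampwidth)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The frame count in the header is corrected when the file is closed.
        dst.writeframes(frames + silence)
```

For a folder of 100+ clips you could loop over the `.wav` files and call `append_silence(path, padded_path, 4.0)` on each. The 4.0 is only an example inside the requested 3~5 second range.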
>>
>>37293398
How would you put together a Controllable TalkNet model that runs offline, exactly? And how would the speeds compare even on a good gaming GPU?
>>
>>37293486
>could it be possible to add a "ticked on" option for adding an extra 3~5 seconds of silence at the end of clips?
Sure, I could do that.

>>37293512
It already runs offline. I just need to write a guide on how to set it up.

>And how would the speeds compare even on a good gaming GPU?
About the same speed as on Colab.
>>
>>37293398
I knew being a VRAMlet would bite me in the ass eventually. Got a 1060 3GB here.
>>
>>37293398
wait all it takes is 4GB?
>>
File: thegreatandpowerfultrixie.png (170 KB, 1526x1107)
>>37293322
Here's what I got from the attached input as the second generation.
https://u.smutty.horse/mcfxsozxuaq.wav

>>37293373
You guys using ARPAbet strings?
>>
File: 1626665526887.jpg (72 KB, 704x702)
>>
>>37293322
You're going to have to hire Kathleen Barr, or another voice actress who can imitate her. Rolling the Rs is a very human skill.
>>
>>37295424
Everything is a "very human skill" until we get a machine to do it better.
>>
>>37295424
You can get Trixie to trill her R's, you just have to be very lucky and use the right contextualizer.
>>
>>37295424
Funnily enough I remember the earlier models like the 22khz google colab being able to trill the Rs quite often. So the AI *can* pick up on it. It just probably thinks it's unnecessary. So you find a way to specifically tell the AI you need to include it with that phrase.
>>
>>37295515
>better
Machines can never be better humans than humans.
>>
File: Strawman.jpg (23 KB, 474x355)
>>37296085
>>
Yo anons. I'm not sure if this is a known issue because I haven't participated in anything related to the PPP whatsoever until just now, but I tested the latest MMI Pinkie Pie voice model (11NULGhxh1JTwb7oHBdmT7TAMKvuX-rCg) with the text set as the singular word "I".

This causes a glitch that results in her saying the word "I" like 40 times instead of just once! It's actually hilarious. https://drive.google.com/file/d/1YqWwkJ1U3Og6qC-jog33P2cDal0nz9xw/view?usp=sharing
>>
>>37296295
then don't do that
if you want it to say I then just use a word that sounds like it like eye
>>
File: 2310221.png (232 KB, 777x847)
God! Why do I keep hitting "Generate" instead of downloading the sound when I get a great result!?
>>
>>37296295
Has there not been a HiFi-GAN model trained on Pinkie yet?

>https://drive.google.com/file/d/1YqWwkJ1U3Og6qC-jog33P2cDal0nz9xw/view?usp=sharing
Also could you use smutty.horse in the future for file hosting?
>>
>>37296357
Ironically, that has the same result. I don't need the sound for anything though; I was just surprised the notebook broke in such a strange way.
>>
>>37296369
>Also could you use smutty.horse in the future for file hosting?

lol Is that the standard here?
>>
>>37296389
>lol Is that the standard here?
Yes, because there is no chance for arbitrary deletions or content policing.
Also, other file hosts sometimes compress files, which can cause feedback issues.
>>
>>37296379
yea the notebooks and even 15.AI are pretty easy to break. if it's spitting out garbage you should just change what you're putting in
>>
File: oops.gif (1.69 MB, 332x434)
>>37296362
I know the struggle, Muscle memory is a blessing and a curse.
>>
>>37296080
I've been thinking about it a lot, I don't think there are any words where R is followed by a W, so I'm wondering if any trilled R could be rewritten in the training data as "RW". It might be dumb, but it would also bake a consistent prompt for trilled Rs into the models for now. (Until models are trained on the full IPA.)
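
Whether "RW" is free to use as a marker is easy to test against a transcript word list before retraining anything. A minimal sketch, assuming plain-text transcripts; the function name and the sample words are illustrative only. Common words such as "forward" and "doorway" already contain the sequence, so a check like this is worth running first.

```python
def words_with_sequence(words, sequence="rw"):
    """Return the words that contain `sequence`, ignoring case."""
    seq = sequence.lower()
    return [w for w in words if seq in w.lower()]


# Hypothetical sample; in practice this would be every word in the transcripts.
sample = ["Trixie", "forward", "powerful", "doorway", "Rwanda", "great"]
collisions = words_with_sequence(sample)
print(collisions)  # ['forward', 'doorway', 'Rwanda']
```

If the list comes back non-empty for the real dataset, a marker that cannot occur in normal spelling would be a safer choice.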
>>
>>37296621
otherwise
And holy hell the captcha was DRW2Y. What are the odds?
>>
>>37296841
1 in n^5, where n is the number of possible characters
assuming full alphabet and numbers (i know some aren't used, bear with me) it'd be 36^5, so your odds of exactly that combination are 1 in 60,466,176
>>
File: Anon2.jpg (38 KB, 517x520)
Does Ruiji still lurk here? The one who posted
https://vocaroo.com/1nvYTdC84VZt
and
https://u.smutty.horse/maujfrovyne.mp3
>>
>>37296362
If only there was a way to temporarily keep the previous generation in memory. Just one cycle, then delete after "generate" is pressed a second time.
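
The idea amounts to a two-slot buffer. A minimal sketch of it, not tied to the actual notebook or UI code; the class and method names are hypothetical.

```python
class GenerationHistory:
    """Keeps the current output plus the one generated just before it."""

    def __init__(self):
        self.current = None
        self.previous = None

    def push(self, audio):
        """Store a new generation and return the one it replaced."""
        # The old "previous" is dropped here, so at most two clips are held.
        self.previous = self.current
        self.current = audio
        return self.previous


history = GenerationHistory()
history.push("take 1")
history.push("take 2")   # "take 1" is still recoverable as history.previous
```

A "download previous" button would then only need to read `history.previous`.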
>>
>>37296977
or roughly 1 in 324 (4 possible positions, each a 1 in 36^2 chance) if we're just looking for the odds of an R followed by a W appearing anywhere in a captcha, instead of the odds for the full string.
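
The exact figure can be counted rather than estimated. A minimal sketch under the same assumptions as above (5 characters, 36 equally likely symbols); it counts the strings that avoid "RW" by tracking whether the last character is an R, then subtracts from the total.

```python
def count_containing_rw(length=5, alphabet_size=36):
    """Return (strings containing 'RW', total strings) for the given length."""
    # Strings of length 1 that avoid "RW": one ends in R, the rest do not.
    ends_r, ends_other = 1, alphabet_size - 1
    for _ in range(length - 1):
        ends_r, ends_other = (
            # Appending R is always safe.
            ends_r + ends_other,
            # After an R, anything except R or W; otherwise anything except R.
            ends_r * (alphabet_size - 2) + ends_other * (alphabet_size - 1),
        )
    total = alphabet_size ** length
    return total - (ends_r + ends_other), total


hits, total = count_containing_rw()
print(hits, total, total / hits)  # 186516 60466176, about 1 in 324.2
```

The small gap from 1 in 324 comes from strings that contain "RW" twice, which the quick estimate counts double.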
>>
>>37292906
You were asking in the last thread whether 20k iterations was enough. This might help.
https://arxiv.org/abs/2001.08361

Also, the roto-translation equivariance stuff seems both complicated and extremely useful. I'll write up a summary once I understand it better and get some time. >>37288565, if you already understand it, a summary would be great. I'm having trouble seeing where the Wigner-D matrices come from and how exactly to use them.
>>
>>37297527
Unfortunately, I don't understand it. I only skimmed this blog post by the author (https://fabianfuchsml.github.io/alphafold2/), which brings up irreducible representations and Wigner-D matrices quite suddenly. Maybe the paper will give more background info or at least have citations.
But yes, the overall idea is interesting. Again, not sure how effective 2D/3D stuff will be for audio, but you never know.
>>
Out of curiosity, where the hell are you guys going to store all of this data anyways and how big is the filesize now?
>>
>>37297736
hello cia
>>
>>37297736
My PPP folder is 330 GB. It has some non-pony data in it, though.
>>
>>37297736
>Out of curiosity
don't shoot glowing one
>>
This week on things we need to apply to ponies eventually:

https://www.youtube.com/watch?v=0zaGYLPj4Kk
>>
>>37286871
I’m just curious: what tools are you guys using to mine the data and make predictions? I took a data mining class where I had the choice of PySpark, Pytorch, and others for a project and am probably going to use Prophet in a work project.
>>
File: fbi plus hasbro.png (83 KB, 935x65)
>>37298709
shooo fbi plus hasbro, go and glow somewhere else
>>
>>37287582
this is cute, got more?
>>
>>37298415
that's cool
>>
File: windows.png (301 KB, 515x508)
I've made a script that lets you "install" TalkNet on Windows. I've only tested it on one machine, so it might still have some bugs.

https://github.com/SortAnon/ControllableTalkNet/releases/latest/download/TalkNetOffline.zip

Extract this somewhere, and run setup.bat first. It'll take 20 minutes, and you need 10 GB of free space, and an NVIDIA card with 4+ GB VRAM. When it's done, run talknet.bat. If everything works, the TalkNet UI should run at http://127.0.0.1:8050/.
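
The free-space requirement can be checked before starting the 20 minute setup. A minimal sketch using Python's standard library; the 10 GB figure is from the post above, while the function name is hypothetical and the script is not part of the TalkNetOffline package.

```python
import shutil

REQUIRED_FREE_GB = 10  # figure quoted in the install instructions


def has_enough_space(path=".", required_gb=REQUIRED_FREE_GB):
    """Return (ok, free_gb) for the drive that contains `path`."""
    free_bytes = shutil.disk_usage(path).free
    free_gb = free_bytes / 1024 ** 3
    return free_gb >= required_gb, free_gb


ok, free_gb = has_enough_space(".")
print(f"free: {free_gb:.1f} GB, enough for setup: {ok}")
```

Run it from the folder the zip was extracted to, since everything installs there.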
>>
https://u.smutty.horse/mcgkxoztihg.mp3
>>
>>37298804
But suppose I were interested in working on this project for an entry-level salary for a master’s graduate. What tools would I need to know how to use? Though I’d probably need to brush up a little on my Python or learn to program in R regardless.
>>
https://u.smutty.horse/mcgkyfqfudo.wav
>>
File: fgddfgdfgfgfgd.gif (15 KB, 220x216)
>>37299690
yeehaw!
https://u.smutty.horse/mcgkzkjxdmr.wav

pov you just messed up your lines:
https://u.smutty.horse/mcgkzpdjsqw.wav

outtakes:
https://u.smutty.horse/mcgkzowqtbi.wav
https://u.smutty.horse/mcgkzprbnso.wav

what a silly pony:
https://u.smutty.horse/mcgkzmjhmry.wav
>>
>>37299594
Is this TalkNet for training models, or Controllable TalkNet for actually generating outputs?
>>
>>37299899
Generating outputs.
>>
>>37299916
And does the installation allow me to set which drives/directories will be used, or does it use default directories?
>>
>>37299928
It installs everything to the folder you run it from.
>>
>>37299936
Ah, nice. I'll try getting this set up on my primary PC at some point and see how everything runs

Thanks for all the work you've put into this, by the way
>>
>>37299594
I would've tried to Wine this, but my GPU sadly only has 3GBs, not quite enough to match the required 4. Damn.
>>
How much would it be worth it to get one of these to train models on? https://www.pugetsystems.com/recommended/Recommended-Systems-for-Machine-Learning-AI-174/Buy_200
>>
>>37296085
Just a quick reminder, humans are just machines of a different kind. I'm sure some day after ponies are completely digitised we can immortalize ponyfags too.
>>
>>37300284
The ones >>37298709 mentioned are free and open-source, but Pydub and PyAudio may do the job. The issue may be the amount of storage you’ll need to hold the training data, which, as many here will likely concur, is exactly why NO copy of SM64 is personalized.
>>
>>37298804
>>37298709
>when the subsequent leaks proved the absolute basement-tier attempts by Hasbro to find out who leaked the first time
Wish I could relive those days. Need more leaks, they're too fucking funny for seeing how terrible billion dollar corporations are
>>
>>37300297
The page I linked to was for a Machine Learning PC.
>>
>>37300307
Oh. I thought you were referring to the algorithm or tool to use.
>>
>>37300385
The url itself is pretty vague though.
>>
Are there any idiot guides out there for getting koboldAI to run GPT-J with the fimfiction mod?

Idiot guides for idiots on the level of not knowing wtf any of what I just typed up there means. I just want a new AIdungeon for frantic pony fuckery.
>>
https://u.smutty.horse/mcgmqzdupcs.wav
>>
I trusted the plan and 15 still hasn't come back m8s. It's over.
>>
>>37300479
https://u.smutty.horse/mcgmuilvlnz.wav
>>
>>37300486
https://u.smutty.horse/mcgmvvsibua.wav
>>
>>37300412
I'm looking at their example Colab and I think I can implement the same API and web service but in my notebook, give me tens of minutes to a few hours, depending on how complicated. It doesn't look difficult.
>>
>>37300556
Then give me Carl
>>
>>37300527
https://u.smutty.horse/mcgmxxgfkzx.wav
>>
>>37300584
>>37300556
https://u.smutty.horse/mcgmzskxnmd.wav
>>
>>37300412
from the /aids/ thread. I don't know about fimfiction but i assume that's something you've made?
https://rentry.org/itsnotthathard
>>
>>37299594
it says i need visual studio. do i really need visual studio for this to work, or can i just bypass it?
>>
>>37300748
Normally any program that says you need Visual Studio means you have to bite the bullet, and download the dependency. -T. Various Versions.
>>
File: 1626666647885.png (147 KB, 1903x1057)
>>37300748
>>37300778
that doesn't look good
>>
File: Snake In Fire.gif (1.99 MB, 448x252)
>>37300790
Ooh, that's a lot of red.
>>
File: 1611450346033.png (979 KB, 1440x810)
>>37299594
well that was something. my PC froze for about 20 mins, then the setup failed
>>
>>37300778
Does Visual Studio have compatibility with Python and R?
>>
>>37301095
Some cursory googling tells me it gets goofy.
>>
>>37300790
>>37301060
RIP
>>
>>37296621
Rwanda
>>
File: vram.png (3 KB, 227x152)
>>37300256
It might work with 3 GB, depending on how much you're running in the background.
>Wine
It runs natively on Linux. Do you want some setup instructions?
>>
>>37299695
What predictions are you talking about? Most of the AI stuff is Python, and some C++. People are using a lot of tools to work with audio data and to scale up training. I don't think anyone is using anything special to preprocess data.
>>
>>37301060
What happens if you just install this and rerun setup.bat? I really hope I don't need to include a full Visual Studio setup.

https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=16
>>
>>37300609
https://u.smutty.horse/mcgqbkohako.wav
>>
>>37299594
Having trouble running it. When starting the talknet batch file it gives me the message below.
>no module named 'nemo'
I've run pip install on nemo but it still tells me there isn't a nemo module installed.
I've also tried to upgrade it but it tossed this message at me. Whatever the 'git' command is, Python is not recognizing it. Are you sure it's not supposed to be 'get' or something else here?
>>
>>37301734
I thought about being a smartass and forcing the module to be installed before the code asks for it, but it's still a no-go. I will try messing around with admin permissions to see if that fixes anything.
>>
>>37301746
I've somehow solved the problem by installing the 64-bit Git from here
https://git-scm.com/download/win
then running update and letting every module get sorted.
Now I have just a few questions: where the fuck do I upload the reference wavs in the browser only version (since there isn't a Colab folder to drop those in)?
How do I get custom models working completely offline? (As far as I understand, the script can only reference prewritten models, and all new ones need to be re-downloaded?)
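
The 'git' error above is what shows up when a package is installed straight from a repository and the Git client is not on PATH. A minimal sketch of a pre-flight check, assuming that is the cause here; the function name is hypothetical and this is not part of the setup script.

```python
import shutil


def find_tool(name="git"):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


git_path = find_tool("git")
if git_path is None:
    print("git not found on PATH; install it, then rerun the update")
else:
    print(f"git found at {git_path}")
```

Running this before update.bat tells you whether the Git install step can be skipped.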
>>
>>37301778
and I'm getting some weird errors again.
>>
>>37301788
never mind, it seems there is some problem with extracting the Trixie model: the script makes the proper folder for it but does not actually extract anything into it, so I just had to do it by hand. Also it seems I misunderstood how this works: you do need internet the first time you run a new code to download the model, but after that it will use the model reference to grab the file from the model folder.
I'm sorry SortAnon if I sounded assholeish above, but I'm just really annoyed with getting random setbacks in everything when all I want is to listen to pony voices.

Well, there is one more error: I'm getting a message that CUDA is out of memory when clearly the code isn't even using half of my GPU memory. And yes, I did close it down and reopen it, but the same thing happened anyway, so it seems to be a code problem.
>>
>>37301830
Were you offline when you first selected Trixie? I know why >>37301734 is happening, but >>37301788 is still a mystery.

>I'm getting a message that CUDA is out of memory when clearly the code isn't even using half of my GPU memory
Windows could be underreporting the VRAM usage. Try opening a command prompt and typing "nvidia-smi".

>>37301778
>where the fuck do I upload the reference wavs in the browser only version
For now, it's the folder called ControllableTalkNet. I'll add a more convenient way to manage it later.
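
The nvidia-smi numbers can also be pulled from a script, which makes it easier to log usage right before a generation fails. A minimal sketch, assuming nvidia-smi is on PATH and reports plain integers in MiB; the function names are hypothetical.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines like '3512, 8192' into (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_vram():
    """Ask nvidia-smi for used/total memory, one tuple per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)


# Example, on a machine with an NVIDIA driver installed:
# for used, total in query_vram():
#     print(f"{used} / {total} MiB in use")
```

Comparing this against what Task Manager shows is a quick way to confirm whether Windows is underreporting.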
>>
>>37302077
>Were you offline when you first selected Trixie?
no, all the models downloaded fine, even the custom ones; for some reason only Trixie had to be done by hand.
Actually, let me post the ones made in the past few months here, for the other anons to use:
tf2_soldier_TalkNet.zip
1Gt7sD4fsU0aC06V2zQsn4Vrnj6g2E6xQ
MrRogerTalknet_TalkNet
1qbcYrxgO3f3RIWfOrL9QqFJVuzy0H_W_
NamelessHeroTN
1lhtg5jPfz-9j-re2d4DQ0P1vQCXBqcLw


>Windows could be underreporting the VRAM usage. Try opening a command prompt and typing "nvidia-smi".
I've tried that, it still runs out of memory, even after I closed my video editor and RTX Voice. This is a bit weird since I am able to run the medium GPT-2 text model, and that also requires 4 GB of GPU memory.
>>
>>37302203
what gpu do you have?
>>
>>37302228
gtx 1080, it sure doesn't rock socks off but it can play even the newest games at medium settings.
>>
>>37302238
Same. I'm building a new machine but >GPU prices.
>>
pinkie pie huge ass
>>
>>37302203
It shouldn't use that much VRAM. What happens if you do a line without reference audio?
>>
>>37302444
>What happens if you do a line without reference audio?
If I tick "disable reference" it works fine, there is no problem generating audio. If I untick it and click update list it still works (it just asks me to put the reference in). It only breaks when I click the dropdown and choose the reference wav file.
>>
>>37302457
So it's the pitch estimator that's the problem. I should try replacing it with torchcrepe.
>>
File: talknet.png (54 KB, 1279x751)
>>37299594
Working perfectly over here.
>>
File: 1606087027516.png (26 KB, 718x402)
>>
File: pinkie hmmmmm.png (224 KB, 407x570)
If nothing derails me too much for the next few days I think I will be able to get a new audio episode out this Thursday/Friday (also bump).
>>
>>37304192
Awesome, I'm excited to see what you have in store.
I get the feeling this week might be a good one for content.
>>
>>37300293
>he fell for the "brains are just bio-computers" meme
Lol. Lmao.
>>37301514
Lost hard. It's just like Terry used to say.
Also now I've gotta go jack off to the thought of AJ calling me a fucking nigger.
>>
yo who the fuck is this nigger?
https://youtu.be/zqklInNM9H4
>>
>>37304842
Random faggot who doesn't browse the board and got mad that the 15.ai is down so he now lies about it on the internet for views. Ignore him.
>>
>>37304842
>yo who the fuck is this nigger?
>ThunderShyOfficial
Someone who doesn't want to be impersonated apparently. Too bad for them.
I hope this video is satire.
>>
File: song title.jpg (160 KB, 528x436)
Figured I wouldn't let these go to waste:

The Weeknd - Can't Feel My Face (WIP)
https://u.smutty.horse/mcgzletwotq.mp3

Gwen Stefani - Rich Girl (WIP)
https://u.smutty.horse/mcgzlessmrw.mp3

The AI shits itself around the lines with overlapping or faint vocals so maybe one day it can learn how to distinguish them for better results. Still fun as heck.
>>
>>37304842
>look through his videos
>hes a shitfag
every single time
>>
>>37299594
Thanks very much for this! It works great so far although I did run into the same problems this anon did
>>37301060
>>37301734
but fortunately downloading this
>>37301462
and this
>>37301778
and rerunning the install .bat fixed everything.
>>
>>37305246
>Gwen Stefani - Rich Girl (WIP)
Wouldn't that be more appropriate for Rarity to sing? What kinds of problems does Rarity's TalkNet model have?
>>
>>37305355
Twi is the more stable and flexible horse
Rarity doesn't sound bad now that I notice but she still needs a bit more practice:
https://u.smutty.horse/mcgzzxscvkg.mp3
>>
>>37305343
Thanks for confirming we only need the build tools. I'll update the setup script later today.
>>
>>37305628
I needed to install the "Desktop development with C++" under the "workloads" tab in order for the install to build correctly. Was that what you were referring to?
>>
>>37305650
Yes. I have Visual Studio installed, so I never ran into any errors.
>>
File: mpv-shot002.jpg (198 KB, 1920x1080)
>Page 10
>>
File: fluttershy puke.gif (181 KB, 480x270)
>>37306339
>that face
>>
TalkNet installer's been updated. It should be bulletproof now.

https://github.com/SortAnon/ControllableTalkNet/releases/latest/download/TalkNetOffline.zip
>>
>>37306812
made a fresh installation, and once again the no-reference audio works while the reference audio makes the GPU run out of memory.
Is there a chance to make the part of the code that converts the reference wav to reference pitch use the CPU instead of the GPU?
>>
>>37307147
It could be an issue with the CUDA install or NVIDIA's drivers. I had issues like that last year. Did you update your drivers, and what GPU are you using?
>>
>>37293398
RTX 3060 12gb here, gonna have sum fun.
I hope
>>
>>37307279
gtx 1080, and the drivers were updated to whatever the newest version was four days ago.
>>
>>37306812
there's a whole lot of red words here
>>
>>37307550
And no other applications that could be using a significant amount of VRAM are running while generating the audio?
>>
>>37307675
haven't changed anything since last time >>37302203
I guess SortAnon hasn't yet changed whatever part of the pitch reference code is causing this error.
>>
File: Untitled.png (130 KB, 1271x1000)
>>37307590
>Cannot open include file: 'io.h'
Don't tell me you need the entire Windows 10 SDK to read a file.

Try this. Go to the installers folder, and run vs_BuildTools.exe. Follow the steps in this image, and run setup.bat again when it's done installing. Does that fix the error?
>>
File: 1607903372051.png (618 KB, 1537x1616)
>>37308014
It works!
>>
>>37307697
I tried replacing the pitch estimator today, and it broke the models' ability to hold notes. So without retraining every character, I'm stuck with the existing one.
https://u.smutty.horse/mchhrqrwewg.ogg
It does run on CPU, but it's very slow. I'll add it as an option.

>>37308099
Installer's been updated again. No one else should run into >>37307590.
>>
>>37308290
Any word on >>37307147 ?
>>
>>37308345
That's what the new option fixes, at the cost of speed.

Run update.bat. Open ControllableTalknet/controllable_talknet.py in a text editor. Go to line 41, change "CPU_PITCH = False" to "CPU_PITCH = True", and save. That should fix the memory problems.
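
For anyone wondering what a switch like this does, here is a minimal sketch of the general pattern. Only the flag name CPU_PITCH comes from the post; the function and the device strings are assumptions for illustration and are not copied from controllable_talknet.py.

```python
CPU_PITCH = True  # same name as the flag in the post; the rest is hypothetical


def pick_pitch_device(cpu_pitch, cuda_available):
    """Choose where the pitch estimator should run."""
    # Forcing the CPU trades speed for not touching VRAM at all.
    if cpu_pitch or not cuda_available:
        return "cpu"
    return "cuda:0"


print(pick_pitch_device(CPU_PITCH, cuda_available=True))  # cpu
```

With the flag off and a working CUDA install, the same call would return "cuda:0", which is the fast path that was running out of memory.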
>>
>>37308516
ran the update and still nope, it's still giving the same old message >>37307147
here is a screenshot from a moment before it spams the "out of memory" message.
>>
>>37308575
And you're sure it's set to CPU_PITCH = True? I've tested it on two different machines, and it doesn't use CUDA on either of them.
>>
>>37308617
>CPU_PITCH = True
which file is that written in?
>>
>>37308814
controllable_talknet.py, in the ControllableTalknet folder. Line 41, just beneath all the import stuff.
>>
File: spike thumb up.png (425 KB, 957x538)
>>37308835
yaa, it works now, happy times.
>>
Oh man..
I want to make a version of Alabama Nigger but with apul and talk of Ziggers, but im more retard then a nig
Me no understand compooter stuff, what do
>>
>>37308870
You just gotta fiddle around with it until you come to understand it. Too bad there isn't any text anywhere to read that explains it, like say a long running thread, or literal guides written into the notebooks.
>>
>>37308885
I'm trying but all the words start dancing around and shit ahhh
>>
>>37308889
Sounds more like you need a diagnoses.
>>
>>37308896
It's not worth it, FUCK doctors
>>
>>37308899
Take the meds, or face the feds.
>>
>>37308904
>feds
because I can't read good?
>>
>>37308908
Because the government treat the doctor avoidant poorly, despite doctors costing a lot. Also because it rhymes.
>>
>>37308931
Government sucks dick
>>
is the test site down?
>>
>>37309498
Looks like it.
>>
>>37309534
I am back in action, after some stuff happened. Today I experimented with programming angles into the servos. I don't know how to make a walking robot with 4 legs yet.
>>
File: 2211579.gif (2.68 MB, 402x210)
>>37305246
these are amazing pls make more
>>
>>37309545
I can't help but wonder how many generals the schematics are gonna touch.
>>
>>37304842
lol hes right faggot.
>>
File: bumpfs.png (452 KB, 1000x1000)
>page 9
Bumping with this mini-PTS I made last week:
https://u.smutty.horse/mcgnzmibuvu.mp4
>>
>>37292942
Can I get a link to the git?
>>
>>37309498
I do hope a test site comes back soon.
>>
>>37310795
https://github.com/AlphaPassive/mlpstation13
and here's a link to the thread if you're interested
>>37306319
>>
File: drd man.png (379 KB, 580x652)
https://www.youtube.com/watch?v=kgjvnI_FVcc
I'm getting annoyed with some sfx/voice stuff from the main audio episode, so I took a break to finish the meme song I've been messing around with for some time.
So I hope you guys will enjoy the DRD 100% Gamer song.
>>
>>37300564
I don't like how bloated KoboldAI is, I will instead make a simple web interface like Cookie's ngrok notebooks. Hell, I could probably use one of the free TPUs under the TRC to serve the FIMFiction model if I didn't accidentally waste a ton of free Google Cloud credit on data ingress between continents.
>>
first time using the talknet thing and I'm getting this error CUDA out of memory. Tried to allocate etc
did i fuck up something?
>>
>>37311317
try this >>37308835
>>37308516

it would be nice if there was a small readme file added to the zip download to explain that to people.
>>
>>37306812
If the filename for the reference audio is too long it will throw an error when trying to generate the audio.
>>
>>37311628
Windows API doesn't support paths longer than 260 characters. I'm working on a UI change that might help with this, but it's not really a bug.
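
A reference file can be checked against the limit before it is handed to the generator. A minimal sketch, assuming the classic 260 character MAX_PATH limit applies (it counts the terminating null, so 259 usable characters); the function name is hypothetical.

```python
import os

MAX_PATH = 260  # classic Windows limit, including the terminating null


def path_too_long(path, limit=MAX_PATH):
    """Return True if the absolute form of `path` would exceed the limit."""
    return len(os.path.abspath(path)) >= limit


lyrics_as_filename = "la " * 100 + ".wav"   # what happens when lyrics go in the name
print(path_too_long(lyrics_as_filename))    # True
```

Keeping the lyrics in a separate text file and the wav name short sidesteps the problem entirely.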
>>
>>37311747
>>37311774
Ok, good to note, because people might stick the lyrics they want to generate in the filename and run into the same issue I did.
Thanks.
>>
>>37311780
I just type the lyrics separately in Notepad or something.
>>
File: the_line.png (2 KB, 495x42)
>>37311226
GPT-J-6B inference notebook has been updated with ngrok-access web interface. Same link.
Instructions are to use Runtime->Run all and wait for the last cell to output pic related, then click the link.
https://colab.research.google.com/drive/13R8MJEDTwinEmUJMLqydKOIcAvWiBIlT?usp=sharing
Now I have to write a post in my Tumblr since the TRC program requires progress tracking.
>>
>>37312085
I've tried to run it several times to no avail. It fails every time on the first step with pic related as the error

I'm using TPU and I've tried both restarting and factory resetting the runtime, with no change in the end result.
>>
>>37312593
Go to the first code block, move line 25 to line 17 and try again. I'm not the owner so I can't update it myself.
>>
>>37312911
Thanks, this seems to have worked.

>>37312085
I've only generated once and I'm already impressed by its coherency.
>>
>>37310998
ebin
>>
Been away from the thread for a bit, what happened to that anon who said he was trying to rig up voice-controlled generation? Did that ever make it to a workable stage?
>>
>>37313305
It works like a charm, everybody loves it.
https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB?usp=sharing#scrollTo=tOXejargIPTq

(Here's Pinkie doing an acapella song as an example.)
https://u.smutty.horse/mbrjeellwhq.wav
>>
>>37312593
>>37312911
Silly mistake. Fixed.
>>
>>37313305
SortAnon put up a notebook 5 threads ago, here's some notable stuff:
https://u.smutty.horse/mbnemwqjxln.ogg
https://u.smutty.horse/mbolteukctt.ogg
https://u.smutty.horse/mbqacswulum.ogg
https://u.smutty.horse/mbqddkmktnx.wav
https://u.smutty.horse/mbqdwvygrxn.mp3
https://u.smutty.horse/mbqkqvfyaos.wav
https://u.smutty.horse/mbqrdwgevpa.wav
https://u.smutty.horse/mbqporevtcw.ogg
https://u.smutty.horse/mbqgtteqdcj.mp4
https://u.smutty.horse/mbrqwdwjekk.wav
https://u.smutty.horse/mbsiezecxpw.ogg
https://u.smutty.horse/mbsixqorygt.ogg
https://u.smutty.horse/mbvckphwqad.wav
https://u.smutty.horse/mbwiirbsscv.mp3
https://u.smutty.horse/mbvfpojqtss.wav
https://u.smutty.horse/mbtxajqrtdk.mp3
https://u.smutty.horse/mbujesuprdf.wav
https://u.smutty.horse/mbvfaothukn.wav
https://www.youtube.com/watch?v=QkGuSVkunkA
https://www.youtube.com/watch?v=mBYTgYtwgmI
https://u.smutty.horse/mbypxpvtomc.wav
https://u.smutty.horse/mbyrxvsgrgd.wav
https://u.smutty.horse/mcamdlgqtkp.mp4
https://u.smutty.horse/mcbpnpgofmv.mp4
https://www.youtube.com/watch?v=BBSUc6aT-IA
https://u.smutty.horse/mcbvbohuldg.wav
https://u.smutty.horse/mcgzletwotq.mp3
https://u.smutty.horse/mcgzlessmrw.mp3

Some shameless plugs up there and currently trying to make this Shania Twain one work:
https://u.smutty.horse/mchvphwakky.mp3

>>37313324
This one's better
https://u.smutty.horse/mbrjphugmmx.ogg
>>
>>37292923
Actually, one way to achieve frequency equivariance could be with the constant-Q transform. It's like a Fourier transform, but the bins are geometrically spaced. So, you can make the center frequencies correspond to a musical scale. This means that a pitch shift would correspond to a shift of the bins in the same direction.
I've seen some papers use it, but overall it's not as common as the STFT.
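
The bin-shift property can be checked numerically. A minimal sketch, assuming 12 bins per octave and a lowest bin at C1 (about 32.70 Hz); both values are illustrative choices, and only the center frequencies are computed, not an actual transform.

```python
import math


def cqt_center_frequencies(f_min=32.70, n_bins=48, bins_per_octave=12):
    """Geometrically spaced center frequencies: f_k = f_min * 2^(k / B)."""
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]


freqs = cqt_center_frequencies()

shift = 3                      # pitch shift in semitones
ratio = 2 ** (shift / 12)      # the same shift as a frequency multiplier
shifted = [f * ratio for f in freqs]

# Every shifted frequency lands exactly on the bin `shift` places higher.
lands_on_bins = all(
    math.isclose(shifted[k], freqs[k + shift])
    for k in range(len(freqs) - shift)
)
print(lands_on_bins)  # True
```

Because the multiplier turns into a constant index offset, a model that is translation-equivariant along the frequency axis gets pitch-shift equivariance for free.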
>>
>>37314028
Excuse my dumbness, but doesn't AI use the mel transform for frequency stuff?
>>
>>37314569
Yes, but not all bins of a mel spectrogram are geometrically spaced. So, a pitch shift, which is a multiplicative scaling of the frequencies, won't always result in an additive shift of the bins.
Maybe this is good, because human perception of pitch doesn't perfectly follow a geometric scale anyway. But, if you want equivariance to pitch shifts on a musical scale, the CQT is still probably better.
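
The uneven spacing is easy to see by comparing neighbouring mel center frequencies. A minimal sketch using the common HTK-style formula mel = 2595 * log10(1 + f / 700); other mel variants exist, and the 80 bands over 0 to 8000 Hz are assumed values, not taken from any model in the thread.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(f_min=0.0, f_max=8000.0, n_mels=80):
    """Center frequencies spaced evenly on the mel axis."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels)]


centers = mel_center_frequencies()
# Skip the first center (0 Hz) and look at ratios between neighbours.
ratios = [b / a for a, b in zip(centers[1:], centers[2:])]
print(ratios[0], ratios[-1])  # large at the bottom, close to 1 at the top
```

On a geometric scale every ratio would be identical. Here they fall steadily from the low bands to the high bands, so a fixed pitch shift moves low-frequency content by a different number of bins than high-frequency content.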
>>
>10
>>
File: tpdne-segmentation.png (550 KB, 1014x835)
TPDNE has been used in a paper on unsupervised StyleGAN segmentation using CLIP. As far as I know, this is the second time ponies have been used in a published ML paper (first time was iCartoonFace).

>Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP
>https://arxiv.org/abs/2107.12518
>https://github.com/warmspringwinds/segmentation_in_style
>>
>>37316125
>When AI can do better muzzles than G5.
>>
>>37310111
Is there a way to share files without me having to manually clear metadata every single time? Otherwise, it won't be an anonymous project.
>>
>>37317212
Depending on how the metadata is stored, there are probably programs that'll do it for you.
>>
>>37317231
I'm mostly concerned about the computer name and location data in stl, (solidworks) part, and (prusa) 3mf files.
>>
>>37317250
I think the best way to handle the metadata for the stl is to use a patching program like http://www.romhacking.net/utilities/240/ to create a patch that you can apply to any stl file. *(You do this by taking one of your files that has the metadata still, and manually editing the metadata out of it, then make an .ips file with these two). This may not work, but if it does, it'll save time. I haven't found anything that says 3mf stores metadata in a common way across files, but it would be weird if it didn't. So this method may work for both.
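
For binary STL specifically, an alternative to patching is to overwrite the header directly. A minimal sketch, assuming any identifying text sits in the 80-byte header that starts every binary STL file (followed by a 4-byte triangle count and 50 bytes per triangle). Whether SolidWorks writes machine details there is not confirmed, and ASCII STL and 3mf are not covered.

```python
import struct

HEADER_SIZE = 80  # binary STL: 80-byte header, then a uint32 triangle count


def blank_stl_header(data: bytes) -> bytes:
    """Return a copy of a binary STL with the free-text header zeroed out."""
    if len(data) < HEADER_SIZE + 4:
        raise ValueError("too short to be a binary STL")
    (n_triangles,) = struct.unpack_from("<I", data, HEADER_SIZE)
    expected = HEADER_SIZE + 4 + 50 * n_triangles
    if len(data) != expected:
        # Size mismatch usually means ASCII STL or a damaged file.
        raise ValueError("size does not match a binary STL")
    return b"\x00" * HEADER_SIZE + data[HEADER_SIZE:]


# Example (hypothetical file names):
# with open("leg_servo_mount.stl", "rb") as f:
#     cleaned = blank_stl_header(f.read())
# with open("leg_servo_mount_clean.stl", "wb") as f:
#     f.write(cleaned)
```

The geometry bytes are left untouched, so the cleaned file should open the same way in a slicer.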
>>
>>37317338
Wow anon, thanks! I found anonymous github, and looking into that now too.
>>
File: Ews84PPXIAkJj8.jpg (98 KB, 1080x596)
https://u.smutty.horse/mcihghgobfv.wav

Gaius van Baelsar speech from FFXIV delivered by Glimmer, hope you enjoy
>>
>>37317680
Starlight Glimmer is a nigger.
>>
>>37313563
Jesus, the dub of the Mentally Advanced Series almost sounds just like the VAs did it. Fits perfectly.
>>
File: 1623279368168.png (218 KB, 1200x595)
>>37286875
https://www.youtube.com/watch?v=NynmEU-tCBA
So here it is, another long audio episode. This time I've done it 100% without using 15's voices, while going full ham on the ambient and sfx sounds.
I hope all you guys will enjoy watching this one.
>>
File: partying internally.png (139 KB, 348x348)
>>37318223
Interesting.
>>
>>37318223
That was a good watch
Flutters could be really adorable as a tap-dancing god
>>
File: RAP BATTLE.png (2.01 MB, 1920x1080)
>>37286875
The Mane 6 get together for a friendly rap battle, courtesy of Pinkie befriending Tyrone, but it turns out there’s more to him than anypony expected...

https://youtu.be/psS0fTknd-c

This is another thing I’ve wanted to do for ages, now finally realised with SortAnon’s TalkNet, and a slammin’ beat courtesy of BGM. Thanks very much to you both for making this possible.

Based on this /mlp/ thread from 2012:
https://desuarchive.org/mlp/thread/6709157
https://u.smutty.horse/mcidnwljcym.png

High quality downloads:
Full version - https://mega.nz/file/Nc4mFTYZ#h65qzHd57Ser8ObYz0BXn5imdzaBeoSjfHqY104hw9o
Rap battle only - https://mega.nz/file/dIwSFJiB#_AnhKoHngCn4NKGTTu5FU2OdkRUoOHKZ8g-wzn7BR4M
Instrumental - https://mega.nz/file/oZgyzZxT#qaTr8vMcYMakKdZNSFN0dmSBEaeXyJ2bGN1fvaAtQT8

>>37318223
Nice work mate, I enjoyed listening and seeing you get better at using sound effects. One thing I did notice is that a lot of the lines got cut off just before the end, which sounded a bit weird. Not sure what was going on there, but was still a fun story and I'm happy to see you still making these.
>>
>>37319348
>Mane6 Rap Battle
Now that's good shit.
Btw, where did you get the Tyrone sounds? I hear that voice inserted in YTPs from time to time but I can never find the original source.

>lot of the lines got cut off just before the end
I ran into a problem where generation kind of breaks the last word in the sentence (for some reason mostly with TS and RD lines): about 9 times out of 10, a weird cracking, shimmer or other noise is added to it.
So I solved it by adding an extra word and just cutting it out in the editor.
I know I can generate the word on its own or in another sentence and try to swap it in, however a lot of the time it will have a different speed, pitch and/or tone from the original sentence, and to me that sounds more jarring than cutting the sentence short.
I think that could be solved with ARPAbet control like in DeltaVox; with enough dicking around, any word can be broken up and rearranged in such a way that I can force it to work.
And before someone posts to just use more reference wavs: I've used that function for some of the words that the TTS had trouble generating, but sadly most of the time my way of speaking did not go hand in hand with the way the pony models wanted to pronounce the word (and on a more autistic note, pretty much all the lines were generated between 5 and 20 times, pic related).
>>
File: CMCMilkshakes.gif (569 KB, 498x279)
>>37319348
Love how it turned out. It was a lot of fun to collaborate on!

>>37319473
>where do you get the Tyrone sounds?
They're from my music program, Logic Pro. I exported them for Clipper, if anyone else wants them, here they are. They're all locked to 100BPM cause that's what was relevant for the project.
https://drive.google.com/file/d/1wXk1FfoLzzIR11-5qApgu2_3Jvq2tuPY/view?usp=sharing
>>
File: wut.gif (803 KB, 512x512)
>>37319348
>Applejack
>>
Hey Sortanon, I think you need to fix something in your Colab TalkNet training code.
I just got some missing modules on step 5 (easy to fix with the '!pip install' command) and a wall of errors on step 7. I've changed the batch size to 1 but that didn't fix it.
https://pastebin.com/dhWxrtxQ

RuntimeError: The size of tensor a (217) must match the size of tensor b (685) at non-singleton dimension 1
>>
>>37320238
Still works on my end. Post your dataset.
>>
>>37318223
What voice is Twilight Sparkle at 6:15?
>>
Voiced MLP mod for Idol Manager when
>>
>>37305246
https://u.smutty.horse/mcisjivsmbx.mp4
>>
File: damn.png (584 KB, 669x690)
>>37319348
hnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnng applejack
>>
File: 606957.png (248 KB, 1131x946)
>>37319348
noice
>>
Hello. Which model was used for this? https://u.smutty.horse/mbtxajqrtdk.mp3
>>
>>37321564
The singing model
>>
>>37321578
This one? https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB
>>
File: twi_sunset.jpg (1.74 MB, 2400x1697)
The Weeknd - I Feel It Coming

https://u.smutty.horse/mciudfgpcmg.mp3

5am drop, chorus might be a bit off but whatevs, Enjoy
>>
>>37321626
I love how she holds those long notes, or whatever they’re called. Good song choice too, nice job anon.
>>
>>37320441
https://u.smutty.horse/mciutndcxiu.zip
Here
>>
Can't get the setup.bat to run with the offline talknet
>>
>>37321588
Yes
>>
Hey Sortanon, the offline TalkNet is not detecting my GPU anymore. I've done nothing code/driver-wise with my PC between >>37308861 and now.
>>
>>37321741
Fixed. Delete your output folder and try again.

>>37321922
What happens when you click on it?

>>37322229
I haven't changed anything either. Try rebooting your PC. If that doesn't help, go to line 40 and change "cuda:0" to "cuda".
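For anyone wondering what that device string change does: in PyTorch, "cuda:0" pins the first GPU by index, while a bare "cuda" means whatever the current default CUDA device is. The sketch below is not the actual line 40 of the offline TalkNet script (its contents aren't shown in this thread); `pick_device` is a hypothetical helper that just illustrates choosing a device string with a CPU fallback.

```python
def pick_device(cuda_available: bool, preferred: str = "cuda") -> str:
    """Return the preferred device string, or "cpu" when CUDA is unavailable.

    `preferred` can be "cuda" (current default GPU) or an indexed
    form such as "cuda:0" (first GPU).
    """
    return preferred if cuda_available else "cpu"


# torch is optional here so the sketch still runs without it installed.
try:
    import torch

    cuda_ok = torch.cuda.is_available()
except ImportError:
    cuda_ok = False

device = pick_device(cuda_ok)
print(f"Using device: {device}")
```

If this prints "cpu" on a machine with an NVIDIA card, the problem is likely in the driver or the PyTorch install rather than in the device string.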
>>
>>37321626
>>37321663
Fuck, I forgot to fix some of the notes

https://u.smutty.horse/mcixvofmfxh.mp3
>>
>>37322342
This is fantastic, very well done.
>>
File: 1621245727500.gif (36 KB, 300x260)
>>37319348
Damn this was good
>>
File: missing.jpg (21 KB, 485x173)
>>37322255
>Fixed.
yep, it's training all right.
>Try rebooting your PC
well, it is working now but I also got welcomed by this message on startup, not sure if this file and the error were connected or not.
>>
>>37322549
https://u.smutty.horse/mciyzjcedsx.mp3
Hmm, the trained model sounds really out of breath (Spectrogram 400, HiFi-GAN 2,000 steps). I think next time I will try using the pretrained spectrogram and see if that makes any difference.
>>
>>37322342
That was amazing to listen to, great work anon.
>>
>>37322549
That's the auto-updater for GeForce Experience, I think. There's not a lot of info about it.

Something's wrong with your system. Install CrystalDiskInfo and check your C drive. Run Windows Memory Diagnostic and test your RAM. If none of those are failing, remove all the NVIDIA stuff and do a clean install of the latest drivers.
>>
Is anyone training a pony module for NovelAI? I wonder how it'd compare to Delta's FIMFiction model.

https://novelai.medium.com/custom-ai-modules-dbc527d66081
>>
>>37323935
is this something that I could download and use with KoboldAI to play offline?
>>
>>37323960
No, but others will probably copy what they've done. A 200 KB module is a lot easier to share than an entire GPT-J model.
>>
>>37323935
Delta's model should be better:
>12 GB of Fimfic stories vs 10 MB max for NovelAI
>GPT-J with 6B params vs GPT-Neo with up to 2.7B params
>Trained for 20k steps (about 34h according to >>37290865) vs a maximum of 8k steps (about 24 min) per month
>Free vs having to pay $25/month for Opus tier to train
NovelAI might be better if you only want to finetune on a tiny amount of data and don't mind paying up. Maybe running GPT-Neo on their servers is faster than running GPT-J on Colab. But quality is probably much better with Delta's model.
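A quick back-of-the-envelope check of the figures quoted above, as a rough sketch. It only divides the stated wall-clock times by the stated step counts; the two setups run on different hardware with different batch sizes, so the per-step times are not a like-for-like speed comparison. The 12 GB vs 10 MB ratio assumes decimal units.

```python
def seconds_per_step(total_seconds: float, steps: int) -> float:
    """Average wall-clock seconds spent per training step."""
    return total_seconds / steps


# Figures as quoted in the post above.
delta_s_per_step = seconds_per_step(34 * 3600, 20_000)  # 34 h over 20k steps
nai_s_per_step = seconds_per_step(24 * 60, 8_000)       # 24 min over 8k steps

# Dataset size ratio, assuming 1 GB = 1000 MB.
dataset_ratio = (12 * 1000) / 10

# Total training time ratio (both converted to minutes).
time_ratio = (34 * 60) / 24

print(f"Delta:   {delta_s_per_step:.2f} s/step")
print(f"NovelAI: {nai_s_per_step:.2f} s/step")
print(f"Dataset is ~{dataset_ratio:.0f}x larger, training time ~{time_ratio:.0f}x longer")
```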
>>
>>37324237
They're both based on GPT-J.
>>
>>37324281
Oh, where does it say that? I'm not familiar with the company, so I just looked at two recent blog posts, both of which say Neo:
>https://novelai.medium.com/roadmap-pricing-launch-scaling-new-features-cfb7efa445eb
>https://novelai.medium.com/the-first-month-of-novelai-30a4a551a4ba
The second post also confirms that it's the 2.7B Neo model.
>>
>>37324292
Never mind, I see in the FAQ that they do have a GPT-J model too. The main advantage of Delta's model is just the dataset size and training time, then.
>>
File: 1626980073217.jpg (62 KB, 564x680)
>page 9
>>
File: 1623663668141.png (94 KB, 460x381)
>>37324947
huh, fast weekend, I guess people are busy with stuff.
>>
>>37325467
The board has been stupid fast recently. I don't know what exactly is going on, but you can't even leave a thread unattended for a couple of hours anymore without it dying due to the insane post volume these days. I know for a fact this wasn't a problem a couple of years ago.
>>
>>37325479
And the crazy thing is it seems less stuff than ever is going on. Probably just EQG and G5 spam ramping up again.
>>
>>37325479
>>37325496
>probably just EQG and G5 spam ramping up
There are usually 8-12 eqg "human" threads at once, the last three days we've had 20+. Yesterday there were 23. That extra dozen threads makes a big difference.
>>
>page 9
IM GONNA SAY IT
>>
>>37326026
Me too.
>>
>>37322868
https://u.smutty.horse/mcjksnphbvp.mp3
Trained with the same spectrogram and HiFi-GAN settings, but soft-starting from another model's spectrogram made it sound worse. However, this is probably my fault because I used a male voice. Welp, I've used up all my free GPU time so I will need to wait a few days for further tests.
BTW, it would help if the guy who trained the Soldier on TalkNet could share which options he used and how long he trained his model. I know there's going to be a massive difference between a 2-minute model and a 30-minute one, but still.
>>
>>37325641
I wish the mods actually did their fucking jobs and stopped this spam. We lost a bunch of extremely valuable threads already to this rampant spamming, like instances of tempo or elaowf. All because the mods just don't give a shit.
>>
Since TEMPO is dead, I'm looking for any copy of the Glitch VST or Glitch 2 by Illformed. Thank you. This is for a vaporwave track.
>>
is there a tutorial for the talknet stuff?
>>
>>37327788
>>37307999 ?
>>
>>37327847
Follow the instructions written at the top of the notebook. Ask here if you run into any errors.
>>
>>37327852
I found it, sorry for pestering. Even more embarrassing, I had a copy of the Glitch 2 VST in my folder already. So if anyone wants it, I can provide.
>>
okay I checked the Google Doc but I don't see the link to TALK.net. do I need to check the archive for those dead posts in the OP?
>>
>>37327886
https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB
>>
>>37327886
You added a period that shouldn't be there, that's why you can't find it. It's "talknet" not "talk.net"
>>
Has anyone wasted their free trial of splitter.ai's better 2-stem model to figure out if it's viable for getting clean voice lines for training or TalkNet stuff yet?
>>
>>37328145
From what I've seen lalal.ai is still better
You have to give an email for a full 10 min max song though
>>
>>37328145
>>37328759
I used lalal.ai to extract vocals for a tempo collab a while back. It works pretty well for what it's intended for, but it'll probably damage the vocals and retain too much noise to be usable for the purpose of clean lines. iZotope RX7 would probably work better anyway, since lalal.ai is specifically for music while the dialogue isolate in RX7 seems to be much more geared towards general purpose.
>>
Page 10 in four hours? Still not used to the recent boost in board speed.
>>
>>37329304
>the recent boost in board speed.
It's spam. It's just getting worse.
>>
>>37329304
>>37329372

The cup is going on. That always brings a bit more activity to the board as well.
>>
>>37328145
>>37328759
>>37328778
there's also `vocal remover 5` which has new models optimized for extracting the vocals (at the cost of confusing naming: the vocals end up in the INSTRUMENT file)
i'd say it's similar to the old lalal.ai model (to my untrained $20-headphone-having ear), that is to say you might be able to extract some bits

samples with unhelpful commentary, based on this wander over yonder clip: https://files.catbox.moe/qxs5mb.wav

vr5 vocal_2band: https://files.catbox.moe/p6g7v6.mp3 (meh)
vr5 vocal_hp_4band: https://files.catbox.moe/pdl2k0.mp3 (similar?)
vr5 regular 4band: https://files.catbox.moe/5krclh.mp3 (a lot more foley makes it in)
lalal.ai new mild: https://files.catbox.moe/q1i92z.mp3 (a bunch of foley but at least it doesn't dip)
lalal.ai new aggressive: https://files.catbox.moe/5q7dgq.mp3 (similar to above but cuts out the bit at the end)
lalal.ai old mild: https://files.catbox.moe/vbfm8e.mp3 (bad awful horrible)
lalal.ai old aggressive: https://files.catbox.moe/a0s12r.mp3 (similar to vr5)

the python can be downloaded from: https://github.com/Anjok07/ultimatevocalremovergui/tree/v5-beta-cml (the 5.0.0 in `releases` is outdated)
and the models from: https://github.com/Anjok07/ultimatevocalremovergui/releases

alternatively here's a colab notebook for it: https://colab.research.google.com/drive/1eK4h-13SmbjwYPecW2-PdMoEbJcpqzDt?usp=sharing
`pretrained_model` should be `MGM-v5-Vocal_2Band-32000_BETA1` or `...BETA2`, `parameter` should be `2band_32000.json`, and `aggressiveness` should apparently be `0.5` but i can't tell the difference
>>
>>37329971
The vr5 samples are really impressive. I'm surprised you think they sound worse.
>>
File: AdorableFilly.jpg (52 KB, 1024x1024)
What's the latest on the animation AI? Is that still making good progress?
>>
bump
>>
Yesterday I heard a 15.ai pony song crossover with the song Everlong from Foo Fighters, but been looking on this thread and I havent found it unless I'm blind and retarded
Any help?
>>
>>37286871
Is it finally online? C'mon I want to start my pony review series and I don't want everyone to recognize my voice. I want to use my voice for other things without everyone yelling "Haha he started off with ponies, haha he can't write worth a shit but he criticized the show for bad writing"
>>
File: 235968.png (182 KB, 583x510)
>>37331533
>caring about what normies think
Give up, you've already lost.
>>
>>37331452
https://u.smutty.horse/mccpdvywsdw.mp3
This is a reupload link. I've renamed the file and can't find the original (I think it was first posted in the TEMPO threads).
>>
>>37331557
Fine it's actually because I'm too lazy to voice it myself, but I can't stand the way Google.AI sounds.
>>
>>37331569
Thank you based fren
Foo Fighters is my jam
>>
>>37331569
fuck, this sounds better than expected
>>
>>37331572
Give him a couple more days, it'll be back soon
>>
File: 15.jpg (20 KB, 480x124)
>>37331626
>>
https://u.smutty.horse/mcjxkqgbrwp.mp3
TF2spyTN_TalkNet, Train spectrogram 400, Train HiFi-GAN 7000 steps
1BbatM94deM1iCYBiH8Lib-tfv-olKF4h
I'm taking a step back from doing audios to train the TF2 voices. Here's a Spy example: the no-reference talk, singing reference and talking reference. It seems the Spy model prefers to work with deeper male voices, and it has a bit of trouble "stretching" its notes.
I totally expect one of you guys to make a cover of some French song by the end of the week.
>>
>>37331681
...fuck, forgot to quote the anchor post again >>37286875
>>
>>37331681
The normal talking part actually sounds pretty good, nice job!
>>
>>37331452
See >>37261796
>>37331569
That's the original link, it came from a previous PPP thread
>>
>>37323935
Here you go Anon. There are only two MLP modules at the moment. One is for Fallout Equestria and is trained on the original plus part of Project Horizons; the other is my Friendship is Optimal module, which I managed to train on most of the canon-compatible stories. I do want to try Delta's model on KoboldAI to compare the two though. It's just that NAI has a way better interface and memory functions.

anonfiles dot com slash zaPeX79du7 slashMLP_NAI_Modules_7z
>>
>>37331972
This doesn't have the Fallout Equestria module, only the lorebook.
>>
>>37332182
Well oops, here

anonfiles dot com slash F2k0Y896uc slash Fallout_Equestria_Universe_2_module
>>
>>37332212
Thanks. I'm surprised no one's trained a more general pony module yet, though. The closest I've seen is a foalcon model on /vg/.
>>
>>37332280
Well I do have more training steps, we were given another full month's worth after a glitch. I just don't know what fics and stuff I should use for it.
>>
File: full.png (758 KB, 1280x720)
Could somebody please train TalkNet models for the CMCs? They don't have to be singing, but it'll be a nice bonus.
>>
>>37332819
I'll have a new batch of characters trained in a few days. Thanks for reminding me.
>>
File: large.jpg (113 KB, 922x1024)
>>37332847
Have you considered training from the Kristin Chenoweth dataset?
https://mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ/folder/JMJnlYYJ

Clipper was complaining that the voice Chenoweth used here to narrate her audiobook is too neutral to sound like Skystar, but this is exactly the kind of application that voice style transformation is suited for. If we want her to sound more emotional, we can supply the emotion ourselves in the reference audio.
>>
>>37332847
I'm interested; since TalkNet is a single speaker model, what would happen if you trained it on data from multiple characters? Would it end up sounding like an amalgamation of those characters, or would it sound uneven and constantly shift in and out of different "voices"?
>>
Is there just a plug and play way to make use of this? Like just feed a bot some text and get a voice out?
>>
>>37332997
Not until 15 puts his site back online.
>>
>>37333036
>>37332932
What about the Tacotron 2 notebook, or the TalkNet notebook?
>>
>>37333053
I don't know if you're the same anon from >>37332997, but someone who asks that question wouldn't consider them to be "plug and play".
>>
Just a note: if you are using the local TalkNet program, don't have too many GPU-intensive programs open while generating audio. I had some editors open in other windows and the GPU soft-crashed, causing the programs to close without saving.
Nothing major was lost but be careful when doing such things.

Also always make backups.
>>
>>37332847
Great work on the talknet stuff, just started messing with the "normal" voices using reference audio and the results are very interesting. Also the program is a godsend.
>>
>>37331681
Now we can have Spy voice all of his lines from the TF2 comics. https://www.teamfortress.com/comics.php
>>
"CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 7.43 GiB total capacity; 193.70 MiB already allocated; 4.81 MiB free; 244.00 MiB reserved in total by PyTorch)"
and that's what Talknet said
>>
>>37333744
What version are you using? The offline version has an option to run the pitch stuff on the CPU: in the file controllable_talknet.py, change "CPU_PITCH = False" to "CPU_PITCH = True".
For the Colab one, I guess you can try Runtime -> Restart runtime and then run the last cell.
>>
>>37333761
found the fix. restarting didn't work but factory resetting worked
>>
>>37332280
Link?
>>
>>37333865
>still thinking LINK is a good buy
imagine not holding at least $100 worth of GE before today
>>
>>37333744
>>37333772
This is what happens when Google lends you a Tesla P4 instead of a T4. In the last thread, I mentioned that you can see this by running only Cell 1 first: >>37262053
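A rough sketch of how to check which GPU a session has been handed before running anything heavy. This is not the notebook's Cell 1 (its contents aren't shown in this thread); it just shells out to `nvidia-smi`, and the helper names `parse_gpu_names`, `is_p4` and `query_gpu_names` are made up for illustration.

```python
import shutil
import subprocess
from typing import List


def parse_gpu_names(output: str) -> List[str]:
    """Turn nvidia-smi's one-name-per-line output into a clean list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def is_p4(name: str) -> bool:
    """True for a Tesla P4, but not for a P40, P100 or T4."""
    return "P4" in name.split()


def query_gpu_names() -> List[str]:
    """Ask nvidia-smi for GPU names; return [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_names(result.stdout)


names = query_gpu_names()
if not names:
    print("No NVIDIA GPU detected.")
for name in names:
    note = " (P4: the out-of-memory error above is likely)" if is_p4(name) else ""
    print(f"{name}{note}")
```

On Colab, factory resetting the runtime (as reported above) gets you a fresh machine, which may come with a different card.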
>>
File: 1625853309374.gif (340 KB, 500x500)
>>
i am in dire need of 15 to return, but if he's gone for this long that must mean he's making sure it's extra good, or trying some new thing he thought of.
>>
File: Haha No.png (18 KB, 411x293)
>>37335387
>if he's gone for this long that must mean he's making sure it's extra good
>>
>>37335439
He's passed out in a lab with VR goggles and headphones. Victim to his own invention as his mind lives in equestria while the body withers away. God I wish that was me.
>>
>>37334266
>This is what happens when Google lends you a Tesla P4 instead of a T4.
Interesting. Pascal cards seem to be the common factor, but I've never seen it happen on a P100. Maybe the extra VRAM is enough to mask the problem.

>>37332981
I trained a multispeaker TalkNet last month. It works fine, but the audio quality's worse. I plan to experiment with it more later.
>>
>>37335823
He finally made it to Equestria...
I actually know someone who's friends with him. Dude's been playing TF2 here and there.
>>
>>37334266
I hope not to get the p4 too often
>>
>>37336056
>It works fine, but the audio quality's worse.
Is this regardless of the source audio quality? If you fed it, say, hours and hours of perfectly clean, perfect quality audio ripped from, say, a video game, or from a sufficiently large audiobook sample, does it make any difference?

I'm mostly interested in if it would be possible to essentially create a "hybrid" voice based on a sufficiently large and high quality enough sample set of two character voices.
>>
File: 1627587905755.png (125 KB, 385x383)
Why don't we just create an AI to create better AIs
>>
Apple Bloom, Scootaloo, Sweetie Belle and Cozy Glow have been added to TalkNet.

>>37336775
You'd want a multispeaker model for that as well. I haven't tried it, but mixing speaker embeddings should let you "morph" between two different voices, kinda like pic related with StyleGAN.
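To make the "mixing speaker embeddings" idea concrete, here is a minimal sketch of linear interpolation between two embedding vectors. It is untested on any real model: the vectors are toy values, and `mix_embeddings` is a hypothetical helper, not part of TalkNet or any other library. In a real multispeaker model you would pull the two vectors from its speaker embedding table and feed the blended one in place of a single speaker's.

```python
from typing import List, Sequence


def mix_embeddings(a: Sequence[float], b: Sequence[float], weight: float) -> List[float]:
    """Linearly interpolate between speaker embeddings `a` and `b`.

    weight = 0.0 returns `a`, weight = 1.0 returns `b`, and values
    in between give a blend of the two.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]


# Toy 4-dimensional "embeddings" (made-up numbers, for illustration only).
speaker_a = [0.0, 2.0, -1.0, 4.0]
speaker_b = [1.0, 0.0, 1.0, 0.0]

# Sweep from speaker A to speaker B in five steps.
for step in range(5):
    w = step / 4
    print(w, mix_embeddings(speaker_a, speaker_b, w))
```

Whether a halfway blend actually sounds like a voice between the two speakers depends on how the model was trained, so treat this as a starting point for experiments.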
>>
>>37336502
He's always playing TF2 and he's fuckin nasty at the game too
>>
>>37336796
If only it were that simple...
>>37336903
Voice morphing sounds very interesting. Would allow for many possibilities in regards to new voices. Like a voice version of This Pony Does Not Exist, but not to that extreme and with more control.
>>
File: discord sceming.gif (686 KB, 600x1000)
Would it be greedy and selfish if I were to ask SortAnon for a singing model of Discord so I can do this dumb meme better?
https://u.smutty.horse/mckppypurdz.wav
>>
>https://derp.link/NR7Xi (Ngrok Synthesis)
Why does this thing have minor ass characters that only showed up once yet no Twilight Velvet?
>>
>>37336903
>but mixing speaker embeddings should let you "morph" between two different voices
I don't suppose this would help voices with very little data by using a much more robust and stable voice as a crutch, right?
>>
>>37337725
Multi-speaker models always sound way worse than single-speaker models, that's why everyone except 15 is using single speakers.
>>
>>37337725
we're already kind of doing that, instead of training a fresh model every time we use a pretrained model which had 24 hours of training data ( https://keithito.com/LJ-Speech-Dataset/ )
though i do wonder if a model pretrained specifically on cartoon voices would fare better
>>
>>37336903
Awesome, great to see the character list expand.

>>37336056
>multispeaker TalkNet
This brings a question to mind. Would it be possible to get singing models for characters with little to no singing data using a multispeaker model? I ask because I recall many of the speakers from Cookie's Ngrok model being capable of rudimentary singing, even those that had little to no singing parts in the actual show.
I don't know how this all works, I just want more singing models.
>>
>>37337304
Thank goodness Travis Stebbins provided acapella tracks.
>>
>>37338602
It was really nice of her, yes.
>>
>>37338602
The acapellas have a lot of harmonies that mess up the pitch detection, so I'm actually singing the harmonies myself in this.
However, at least for Discord, TalkNet can't do stretched notes well at all, so it's tough to do the rest of the track, especially the "discord~!" chants.
>>
>>37337963
I think it would, specifically because the LJ Speech dataset uses older/more formal English and lacks emotion and dynamics in voice tone.




