/mlp/ - Pony

  • There are 80 posters in this thread.

File: AltOP.png (1.54 MB, 2119x1500)
TwAIlight welcomes you to the Pony Voice Preservation Project!
https://clyp.it/tm03e5en

This project is the first part of the "Pony Preservation Project" dealing with the voice.
It's dedicated to saving our beloved ponies' voices by creating neural-network-based Text To Speech for our favorite ponies.
Videos such as youtu.be/GuJKTodX1FA or youtu.be/DWK_iYBl8cA have proven that we now have the technology to generate convincing voices using machine learning algorithms "trained" on nothing but clean audio clips.
With roughly 10 seasons (9 seasons and 5 movies) worth of voice lines available, we have more than enough material to apply this tech for our deviant needs.

Any anon is free to join, and many are already contributing. Just read the guide to learn how you can help bring on the wAIfu revolution. Whatever your technical level, you can help.
Document: https://docs.google.com/document/d/1xe1Clvdg6EFFDtIkkFwT-NPLRDPvkV4G675SUKjxVRU

We now have a working TwAIlight that any Anon can play with:
https://15.ai/
https://derp.link/vCzm2 (48kHz Training)
https://derp.link/hdJQF (48kHz Synthesis)
https://derp.link/NR7Xi (Ngrok Synthesis)
https://derp.link/YTJ94 (Guide)

>Active Tasks
Cookie is working on controllable speech
Research into animation AI
Research into pony image generation

>Latest Developments
Clipper sorts animation files (derp.link/O24pp)
Clipper looking for AI skit ideas (derp.link/JfVsA)
Clipper collecting sound effects from show (>>36723767)
New DeltaVox (>>36812261)
Training notebook for HiFi-GAN (>>36874641)
New guides and notebooks for training/exporting models for DeltaVox RS (>>36898031)
Clipper voice dataset (>>36901235)
Clipper added to the HiFi-GAN notebook (>>36905521)
Train your own CLIP model (>>36930047)
GPT-2 model released (>>36930714)
Animation script finished (>>37003821)
New Audacity-to-TacoTron training text tool (>>37025693)
15 makes updates to test site
TalkNet as a potential replacement for TacoTron (>>37040781)
TalkNet update (>>37082982 >>37119597)
Public contributions reintroduced for next year's panel (>>37099451)
Singing Talknet models (>>37134971 >>37144858)
Animate automation tool available (>>37147092)
DiffSVC UI done (>>37150296)
GDrive clone of Master File now available (>>37159549)
New TalkNet models (>>37179832)
Better copy of show bible available in doc (>>37246652)
FiMFic-based GPT-J-6B demo notebook (>>37284129)
Latest Synthbot progress report (>>37241505 >>37251301 >>37253865)
Latest Cookie progress report (>>37241115)
Latest Clipper progress report (>>37189422 >>37193768)

>Voice samples
https://derp.link/fHs3K
https://derp.link/O1xdh

>Clipper Anon's Master File 2.0:
https://mega.nz/#F!L952DI4Q!nibaVrvxbwgCgXMlPHVnVw
https://mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ

>Synthbot's Torrent Resources
https://derp.link/ZJNca

>Cool, where is the discord/forum/whatever unifying place for this project!?
You're looking at it.

Last Thread:
>>37240950
>>
FAQs:
>READ THE DOC
Do it now
derp.link/V7cMp

>Where can I find things made with the voice AI?
In the Good Poni Content folder: derp.link/23EUs

>Did you know that such and such voiced this other thing?
Yes. We are very much aware. It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 surround audio is generally required unless you have a source that is already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.
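Why 5.1 helps: film and TV mixes tend to put dialogue in the center channel, with music and effects mostly elsewhere. Here's a minimal stdlib sketch for pulling that channel out of a 16-bit 5.1 WAV. It assumes the common WAV channel order (FL, FR, FC, LFE, SL, SR), which puts the center at index 2; check your source, since some files order channels differently. The function name is hypothetical. Depending on your Python version, the `wave` module may also reject the WAVE_FORMAT_EXTENSIBLE headers that many 5.1 files use.

```python
import array
import wave

CENTER = 2  # assumed WAV 5.1 order: FL, FR, FC, LFE, SL, SR

def extract_channel(in_path, out_path, channel=CENTER):
    """Write one channel of a 16-bit multichannel PCM WAV out as a mono WAV."""
    with wave.open(in_path, "rb") as src:
        n_channels = src.getnchannels()
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if not 0 <= channel < n_channels:
            raise ValueError(f"channel {channel} out of range for {n_channels} channels")
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    # Frames are interleaved (ch0, ch1, ..., chN-1, ch0, ...); take every Nth sample.
    # Samples are copied without being interpreted, so byte order never matters.
    mono = samples[channel::n_channels]
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

A center-only track can still contain some music or effects bleed, so it's a starting point for cleanup rather than a guaranteed clean clip.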

>What about fan imitations of official voices?
No.

>How do I make the voices?
Several guides are available. In-depth guides on how to do training and synthesis (making the ponies speak) are in the doc. If you don't want to use the navigation bar in the doc, the sections are also directly linked in the OP. If you want to use the WIP 48kHz notebook, some kind Anons have put together some image guides for you.
48kHz Training: derp.link/wW2hX
48kHz Synthesis: derp.link/j4MXQ

>How do I make the ngrok links?
Doc: derp.link/SfIhY
Video: derp.link/qYgIp

>Where are all the voice samples?
In the doc.

>Is there a place where I can find all the pony models?
In the doc.

>What about muh waifu?
Check the doc.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can, however, get most of the way there by using phonetic transcriptions of other languages.

>What about [insert OC here]'s voice?
Not a priority. Again, however, you're welcome to. There are already people doing this.

>Where can I view the PPP /mlp/con panel?
>2020:
YouTube: youtu.be/WtuKBm67YkI
CyTube chat: pony.tube/videos/watch/b83fbbfc-6d4e-4768-8deb-edb61ea38abb
>2021:
YouTube: youtu.be/RAYWr1uOGVM
CyTube chat: pony.tube/videos/watch/56cf0502-0ef8-41a7-96c5-bd7cb727bb9f

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
derp.link/CQ3Ca
>>
File: WhiteAnk.jpg (966 KB, 1280x1920)
>>37286871
Anchor.
>>
i haven't seen 15 in a bit, i miss them already
>>
>>37286879
He posted not even 3 hours ago.
>>
>>37241115
>Cookie 11 days ago
Damn I'm slow.
Anyway, I've:
- written DiffWave
- written HiFi-GAN
- written Fre-GAN
- written new universal BOHB modules so every model and every line of every config file can be "tuned" automagically to improve results.
- written Unet, DilatedWN and FFT versions of CTC models. (conformer version soon too)
- trained the new HiFi-GAN to 842k steps on single GPU
- tuned a few DiffWave + CTC model variants
- moved my data to faster storage
- reduced RAM usage on dataloaders (letting me spin more up at a time to help with small model training)

In fact, reading
>>37241115
> Some main things I expecting to fix;
It looks like I've completed almost everything I said I would do.
Next up is training all the old baselines and comparing them against new models/modules programmatically, and crushing all the bugs I can find.

>>37241494
>What's DiffPitch? A diffusion version of FastPitch?
Diffusion version of DiffSVC.
PPG + Noisy F0 -> Denoised F0 (for ~10 steps)
It's a small network that works in conjunction with the mean-shifting pitch preprocess to ensure all pitch values for the speaker fall within their speaking range (using the PPG to take into account the phoneme being spoken at the same time).
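To make "PPG + Noisy F0 -> Denoised F0 (for ~10 steps)" a bit more concrete, here's a minimal sketch of a few-step deterministic (DDIM-style) reverse diffusion loop over an F0 contour. This is not Cookie's DiffPitch code. The noise schedule, the hypothetical `predict_noise(x, t, ppg)` network signature, and starting the reverse pass from the noisy F0 itself are all assumptions. In a real model, `predict_noise` would be the trained network conditioned on the PPG.

```python
import numpy as np

def make_alpha_bars(num_steps=10, beta_start=1e-4, beta_end=0.2):
    """Cumulative signal-retention factors for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def denoise_f0(noisy_f0, ppg, predict_noise, alpha_bars):
    """Deterministic DDIM-style reverse pass from step T-1 down to 0.

    predict_noise(x, t, ppg) is a hypothetical network that estimates the
    noise present in x at step t, conditioned on the PPG frames.
    """
    x = np.asarray(noisy_f0, dtype=np.float64)
    for t in reversed(range(len(alpha_bars))):
        ab_t = alpha_bars[t]
        eps = predict_noise(x, t, ppg)
        # Estimate the clean contour implied by the current noise estimate.
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        # Move to the previous (less noisy) level; at t=0 this is exactly x0_hat.
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps
    return x
```

A quick sanity check is to plug in an "oracle" denoiser that knows the clean contour. The loop should then return that contour exactly, which confirms the update equations are consistent.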
>>
>>37286975
Oh, using diffusion to clean up F0, that's clever
>>
>>37286884
It's like he's still here, with us, even now.
I swear sometimes I can still even see his posts...
>>
>>37286879
>them
15 is a team of people? didn't know about that
>>
>>37286975
>universal BOHB modules
What is this?
>>
>>37287144
Here's your (You).
>>
>>37287224
It is a hyperparameter tuning algorithm that is slightly faster than the normal ones, combined with my ~600 lines of config file.
I can add any line of the config to a tuning file (along with appropriate limits) and it'll run many many training runs and slowly find the values that result in the lowest validation loss (or whatever I set) possible.
It's really slow, but extremely low effort to use and can find patterns that I miss because I don't expect them to work and never test them.
Since it works directly with the config files and train module, it will work with every model and data feature I build in future, so it should reduce the amount of manual work by a decent chunk or let me work with more GPUs at a time.
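For anyone wondering how that works: BOHB pairs Hyperband-style successive halving with a Bayesian model that proposes new configs. Successive halving means training many configs on a small budget, keeping the best fraction, and giving the survivors more budget. Below is a minimal sketch of only the successive-halving half, with plain random sampling standing in for the Bayesian part. Two things here are assumptions, not Cookie's code: the search-space format (config key mapped to (low, high) limits) and a `train_and_eval(config, budget)` function that returns validation loss.

```python
import random

def sample_config(space, rng):
    """Draw one value per tunable config key, uniformly within its limits."""
    return {key: rng.uniform(low, high) for key, (low, high) in space.items()}

def successive_halving(space, train_and_eval, n_configs=27, min_budget=1, eta=3, seed=0):
    """Return the surviving config after repeated keep-the-best-1/eta rounds.

    train_and_eval(config, budget) should train for `budget` units
    (e.g. epochs or thousands of steps) and return validation loss.
    """
    rng = random.Random(seed)
    configs = [sample_config(space, rng) for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda cfg: train_and_eval(cfg, budget))
        configs = ranked[: max(1, len(configs) // eta)]  # keep the top 1/eta
        budget *= eta  # survivors get eta times more training
    return configs[0]
```

With 27 configs and eta=3, the rounds go 27 -> 9 -> 3 -> 1. Most of the compute goes to the few configs that looked good early, which is why this is cheaper than training every config to completion.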
>>
File: brain ouch.png (149 KB, 250x250)
Hello? Yes, 15 Inc.? Your Fluttershy model generated static at me. This is unacceptable and I demand to speak to the manager so I may receive compensation for this injustice.
>>
>>37287270
Fluttershy works fine for me, what browser are you using?
>>
>>37287275
I'm just shitposting, models work fine
>>
File: 1440995870531.png (24 KB, 450x352)
>>37287325
>ppp is actually polacks
Kek, thx slavs.
>>
wew, they really are doing it for free.
Anyhow, is anybody making new projects or something? I'm feeling like re-dubbing some meme songs with pony voices.
>>
>>37287369
talknet got celly voice or nah?
>>
>>37287369
I'm about to start work on PTS4. Also working on another thing that'll probably come out before that, depending.

>>37287381
Yes for Celly voice, no singing model for her yet though.
>>
File: 1399131466016.png (3.05 MB, 1529x2034)
>>37287408
>Yes for Celly voice, no singing model for her yet though.
I mean we are not really far away from it but still I'm really grateful for what all of us made real.
Literally what other fandom made multiple artificial intelligences because they love their show/waifu that much?
In retrospect it really is fucking insane what we have done.
>>
After training GPT I will want to make a porn StyleGAN2. Has anyone mass scraped images from one of the boorus yet?
I'll also want the TPDNE checkpoint to finetune from.
>>
>>37286884
link?
>>
>>37287502
The previous thread isn't archived yet, you can still make it! You just have to believe!
>>
>>37287486
There's the TPDNE dataset, but it's just faces, so it won't be helpful. Clipper was labeling pony butts, could be useful depending on what you mean by porn.
For mass scraping, maybe talk to the altboorus. Don't know if TPA or iwiftp has something.
Checkpoint is linked on the TPDNE website. Pretty sure it uses the estimator branch of Shawn's fork. See https://make.pony.pictures for a usage example.
>>
>>37287369
Making one more effort to attempt the TalkNet. Hasn't worked on my end, but that's probably because I use Firefox. If it still doesn't work, I'll stick to 15ai until I can figure it out; probably will just have to add another browser that works for it.

Currently working on a Sci-Twi audio idea I had a while back. It was on hold when the test sites went down, so I'm working on it a bit tonight. Probably won't be done for a while. Currently sits at 18 minutes long.
>>
File: netscape.png (27 KB, 632x482)
>>37287517
To narrow down your search, it doesn't work in Netscape Navigator either.
>>
>>37287517
>Sci-Twi
Is this the part where I'm supposed to say
>no hooves
?
>>
>>37287550
Say the line, Anon!
>>
For some reason 15's models really don't like sentences starting with 'Thus, '. Doing so results in audio that either has noticeable artifacts/static or skips over the problematic part really quickly. Using high-emotion contextualizers makes it worse:
>Thus, the two sisters maintained balance for their kingdom and their subjects, all the different types of ponies.|I'm bored.
https://u.smutty.horse/mcfeqknxwzs.wav
>Thus, the two sisters maintained balance for their kingdom and their subjects, all the different types of ponies.|What?!
https://u.smutty.horse/mcfesppbomu.wav
>Thus, the two sisters maintained balance for their kingdom and their subjects, all the different types of ponies.|I can't wait!
https://u.smutty.horse/mcfermkotbx.wav
Interestingly, the issue doesn't seem to apply to similar words like 'therefore'.

>>37287550
>Sci-Twi
>no hooves
Wait hold on, you mean that the (pony) Twilight with glasses that appears in some fan videos is actually from eqg and not an original fan idea? That explains a lot.
>>
File: fondue.webm (2.92 MB, 1280x720)
>>37287578
Anon...
>>
File: you did it.jpg (155 KB, 1280x720)
>>37287578
>no hooves

Also, the models are really picky like that sometimes. I remember BGM posting an example text for his second pony rap video, and the models only accepted the very last sentence and ignored the rest.
>>
>>37287517
>Currently working on a Sci-Twi audio idea I had a while back. It was on hold when the test sites went down,
Too bad it couldn't be on hold indefinitely you barbie faggot. Take it to one of the EQG threads. It doesn't belong here.
>>
>>37287625
If you niggers put half as much energy into loving ponies as you do into hating EQG we would have robo-waifus and probably our own 10th season at this point.
>>
>>37287650
>You will never go to Equestria.
Extremely ironic coming from someone defending a character who left Equestria.

>>37287665
You apparently weren't here from the start, but the idea of scraping audio from EQG caused controversy in the early stages of PPP despite the characters sharing VAs. Ultimately it's a good thing that it was done, but it wasn't unanimously praised.
Let me rephrase it, even touching the eqg audio (which objectively doesn't differ from FiM) was treated like a minor blasphemy.
>>
https://u.smutty.horse/mcffdjquygg.wav
>>
>>37287680
>even touching the eqg audio (which objectively doesn't differ from FiM) was treated like a minor blasphemy.
Tell me about it. I'm glad Clipper was willing to pick that up for me because I sure as hell wasn't looking forward to dealing with it. I acquired the audio, but that was about all I did involving EQG here. It would be a mistake for any of these barbiefags to think that this meant they were ever welcome here.
>>
>>37287685
>I'm not adding potential data to the project that would increase the quality of our waifus because FINGERS
okay buddy
>>
>>37287743
Do you even read, retard?
>>
Boy am I glad deleted posts don't count towards bump limit, or PPP threads would be a few hundred short thanks to these goobers who piss about purity instead of posting ponies.
>>
>>37287901
Why the hell would you bring this up again?
At best you're an idiot, and at worst a troll trying to reignite the shitflinging.
>>
>>37287910
Nah, just wanted to make my quirky observation about bump limits, I also feel if it went unreferenced, some jackass might just come in and do it all again thinking it's for the first time.
>>
File: 516501_cropped.png (118 KB, 423x377)
Just migrated to Linux 'cause Windows kicked it. Speaking of, looks like 15 kicked away the "final" huh?
I look forward to the 14th update past here just so we can get to 15.15 and double it up.

Back to Linux though, is there a Linux specific version for DeltaVox RS? Or am I gonna have to try and Wine the existing one?
>>
>>37287932
Last time I checked the doc, I don't think I saw a Linux-specific version. Wine might be your best shot. Imagine not dual booting Win7.
>>
>>37287932
>Linux specific version for DeltaVox RS? Or am I gonna have to try and Wine the existing one?
You're gonna have to use Wine, I have many things imported as Windows-only DLLs (even if they're cross-platform) and porting to Linux is planned but only when I have absolutely nothing else to do. Although it might be unfeasible without major refactors because the Logitech LED API is Windows-only.
>>
>>37287946
Alrighty then, thanks for the quick response. I guess I'll let you know how the Wine-ing goes.
If worst comes to worst I guess I'll be able to run it within a virtual box? Let's hope it doesn't come to that.
I look forward to your linux version though, whenever that comes around.
>>
>>37287970
From the distant memory of trying out a very early build in my Kubuntu 18.04 LTS installation back when I was dual-booting, it didn't go very well, possibly because Tensorflow is pretty big and complex. Interested to see your results.
>>
File: Deltavox results edit.png (454 KB, 1675x982)
>>37288045
So far results don't seem too good. It seems it can't find certain DLL libraries, even though most of them still seem to exist. Strange. The directory in question is identical to what I had working on Windows, so there are no missing files. And Wine is working too, as I was able to run the included 'Visual C++ Redistributatable (Install if v140 dll error).exe' just fine.
>>
>>37287262
That's pretty cool. Other than BOHB for hyperparameter tuning, HiFi-GAN for vocoding, and diffusion methods for improving audio quality, have you found anything else that looks like a clear winner?
By the way, it looks like you're not looking into anything involving equivariant networks. I think those are getting really popular. Here's an example used by AlphaFold2: https://arxiv.org/abs/2006.10503.
Equivariance sounds complicated, but it just means "hidden layer outputs have the same structure as their input, and they deliberately reflect certain characteristics of the input." For example, a convolutional layer's outputs have the same pixel structure as their inputs, and their outputs will reflect translations in their inputs. Convolutions also reflect rotations when using rotationally-symmetric kernels, but that's really rigid since that always turns, e.g., 50 degree rotations in the input into 50 degree rotations in the output. SE(3) transformers also reflect rotations, but they can do so more flexibly, e.g., by stretching or shrinking rotations.
I don't think it's been applied to speech yet, but it's something to watch for, especially in voice conversion.
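The convolution example is easy to check numerically. Here's a small sketch verifying that shifting the input of a 1D convolution just shifts its output. Circular (wrap-around) padding is an assumption made here so shifted samples don't fall off the edge.

```python
import numpy as np

def circular_conv1d(x, kernel):
    """1D convolution with wrap-around padding, so shifts stay in bounds."""
    n = len(x)
    out = np.zeros(n)
    for i in range(n):
        for j, w in enumerate(kernel):
            out[i] += w * x[(i - j) % n]
    return out

def is_shift_equivariant(x, kernel, shift):
    """Check conv(shift(x)) == shift(conv(x))."""
    lhs = circular_conv1d(np.roll(x, shift), kernel)
    rhs = np.roll(circular_conv1d(x, kernel), shift)
    return np.allclose(lhs, rhs)
```

This holds for every shift and every kernel, because the same weights are applied at every position. The structure of the layer guarantees it, so nothing has to be learned.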
>>
>>37288297
I read that AlphaFold2 is using Invariant Point Attention instead of SE(3) transformers, but I think the idea is the same.
I'm curious what kind of equivariance would be good for speech, though. Translational invariance, of course. That's already built into 1D convs (though aliasing may hurt performance a bit, e.g. see Alias-Free GAN). "Flip invariance" via symmetric kernels doesn't seem that useful. Maybe something could be done with spectrograms, since they're 2D?
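On the aliasing point: a stride-2 downsample, as in a strided conv with no low-pass filter, is only equivariant to shifts that are multiples of its stride. The tiny sketch below shows this. A 2-sample input shift becomes a clean 1-sample output shift. A 1-sample input shift lands on the other phase of samples and matches no shift of the original output. That broken equivariance is the kind of effect anti-aliasing designs like Alias-Free GAN try to reduce.

```python
import numpy as np

def downsample2(x):
    """Naive stride-2 downsampling (no anti-aliasing filter)."""
    return x[::2]

def matches_some_shift(a, b):
    """True if a equals b circularly shifted by some amount."""
    return any(np.allclose(a, np.roll(b, s)) for s in range(len(b)))

x = np.arange(8.0)
even_shift_ok = np.allclose(downsample2(np.roll(x, 2)), np.roll(downsample2(x), 1))
odd_shift_ok = matches_some_shift(downsample2(np.roll(x, 1)), downsample2(x))
print(even_shift_ok, odd_shift_ok)  # True False
```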
>>
File: mpv-shot0034.jpg (264 KB, 1920x1080)
>>37287684
>Posting Content
https://u.smutty.horse/mcfiokhnvgu.wav
>>
>>37288683
Highlights how strange it is to shout out loud what you're typing as you type it.
>>
File: WoodyFace.png (113 KB, 199x227)
>>37288683
Kek, nice!
>>
Is SortAnon's TalkNet not working for anyone else? Last two times I tried to use it, the generate button just doesn't work. No error or anything, it just doesn't create an output. I'm running Chrome, I've changed nothing on my end since it was last working.
>>
>>37288808
I tried signing out of my Google account and signing back in, that fixed it for me.
>>
>>37288297
>have you found anything else that looks like a clear winner
There are no clear winners that I can think of. Everything has at least 1 trade-off. Be it performance, variability, stability or coding complexity.
>I don't think it's been applied to speech yet, but it's something to watch for, especially in voice conversion.
Thanks.
I'm going to be rewriting all my networks to use my new modules for at least a few more weeks, but I'll keep this in mind for when I want new stuff to test out.
>>
>>37288816
Hmm, didn't seem to make a difference for me. Still not getting any output from the models.
>>
>>37288565
PCM is a phase and amplitude assigned over time. Voice conversion should probably be equivariant with shifts in most of these things and their rates of change. That means phase shifts (to capture spatial positioning), time shifts (to capture timing... invariance might work better here than equivariance), frequency changes (i.e., change in phase over time, to capture formants), and amplitude shifts (to capture volume... maybe use the Bark scale version of these). The only one it probably shouldn't be equivariant with is shifts in power / MFCCs since that one is usually used as a voice signature to identify the speaker.
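Amplitude is the easiest of these to see concretely. Scaling a waveform by a gain g multiplies its magnitude spectrum by g. In the log-magnitude domain that most TTS/VC models work in, that becomes a constant additive shift of log g, so a model that is equivariant in that domain only needs to handle a simple offset. Here's a small numpy sketch; the test signal, gain, and epsilon are arbitrary choices.

```python
import numpy as np

def log_magnitude(x, eps=1e-9):
    """Log-magnitude spectrum of a real signal (eps avoids log(0))."""
    return np.log(np.abs(np.fft.rfft(x)) + eps)

sr = 16000
t = np.arange(1024) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1320 * t)
gain = 0.25
delta = log_magnitude(gain * x) - log_magnitude(x)
strong = np.abs(np.fft.rfft(x)) > 1e-2  # ignore near-empty bins where eps dominates
print(np.allclose(delta[strong], np.log(gain)))  # True
```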
>>
>>37288808
Well, it did work for me yesterday, but now I'm getting the "No module named 'dash'" error. I was able to get it to the point of generating the UI window by adding the missing modules above step 2:

!pip install dash
!pip install jupyter_dash
!pip install crepe
!pip install psola
!pip install torch_stft
!pip install kaldiio
!pip install pydub
!pip install frozendict
!pip install unidecode

!pip install pyannote.audio
!pip install g2p_en
!pip install pesq
!pip install pystoi
!pip install ffmpeg-python

HOWEVER, that just leads me to the same point as BGM, where clicking the Generate button does nothing.
>>37288816
Following this in a new incognito tab gives me error 403, both with step 3 and step 3B.
Then I factory reset and ran everything again without the extra code and was still getting error 403. It seems Colab is being very fickle with this particular code.
Can someone tech-savvy figure out how to run this offline? It seems Google is really going out of its way to fuck around with the Colab code.
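Before pasting in a pile of `!pip install` lines, it can help to see exactly which imports are missing. Here's a minimal stdlib sketch using `importlib.util.find_spec`. The module list is taken from the pip list above and is an assumption about what the notebook actually needs. It uses import names, since some pip names differ: for example, `ffmpeg-python` is imported as `ffmpeg`.

```python
import importlib.util

NOTEBOOK_IMPORTS = [
    "dash", "jupyter_dash", "crepe", "psola", "torch_stft", "kaldiio",
    "pydub", "frozendict", "unidecode", "pyannote.audio", "g2p_en",
    "pesq", "pystoi", "ffmpeg",
]

def missing_modules(names):
    """Return the import names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names (e.g. pyannote.audio) whose parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing

print(missing_modules(NOTEBOOK_IMPORTS))
```

An empty list means the import errors are gone. A silent Generate button would then be an issue in the notebook itself rather than a missing package.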
>>
>>37284805
>The only downside is that I was previously getting the origin point for each shape from JSFL. I can't do that anymore since I don't have the mapping between XFL elements and JSFL elements. I was doing that because I couldn't figure out how to get that information from the XFL. It looks like the <transformationPoint> needs to be converted somehow, and I haven't figured out how.
I'm still struggling with this. I found some code that does this for Unity, and it looks like transformation matrices and origins are tracked separately. As in, it doesn't alternate between applying origin changes and matrices; it only applies matrices to matrices and origin shifts to origin shifts. I don't understand how that's supposed to work, but I can try it.
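For reference, the standard way to apply a 2D transform "about" a transformation point is to move the point to the origin, apply the matrix, and move it back: T(p) @ M @ T(-p). Here's a minimal numpy sketch. It assumes XFL's <Matrix> attributes follow the Flash convention x' = a*x + c*y + tx, y' = b*x + d*y + ty; that mapping and the helper names are assumptions. This doesn't settle how Animate stores <transformationPoint>.

```python
import numpy as np

def affine(a=1.0, b=0.0, c=0.0, d=1.0, tx=0.0, ty=0.0):
    """3x3 homogeneous matrix for x' = a*x + c*y + tx, y' = b*x + d*y + ty (assumed XFL order)."""
    return np.array([[a, c, tx],
                     [b, d, ty],
                     [0.0, 0.0, 1.0]])

def about_point(m, px, py):
    """Apply m's transform around (px, py) instead of around the origin."""
    return affine(tx=px, ty=py) @ m @ affine(tx=-px, ty=-py)

def transform_point(m, x, y):
    """Map (x, y) through the homogeneous matrix m."""
    vx, vy, _ = m @ np.array([x, y, 1.0])
    return float(vx), float(vy)
```

For example, a 90-degree rotation about (1, 1) leaves (1, 1) in place and sends (2, 1) to (1, 2). Matrix products don't commute, so the order in which pivot shifts and matrices get multiplied changes the result.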
>>
>>37288864
Just tossing an idea out there, but could it be possible to make TalkNet save the generated wavs in the Google Drive folder like the ngrok one does? Would that make any difference, SortAnon?
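A sketch of the idea, assuming Drive is already mounted in Colab (`from google.colab import drive; drive.mount('/content/drive')`) and that generated clips land in some local output folder. The folder paths and function name are hypothetical. It copies each WAV to Drive with a timestamp prefix so clips from different runs don't overwrite each other.

```python
import shutil
import time
from pathlib import Path

def copy_wavs_to_drive(output_dir, drive_dir):
    """Copy every .wav in output_dir into drive_dir with a timestamp prefix."""
    dest = Path(drive_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    copied = []
    for wav in sorted(Path(output_dir).glob("*.wav")):
        target = dest / f"{stamp}_{wav.name}"
        shutil.copy2(wav, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied

# Hypothetical usage in Colab after mounting Drive:
# copy_wavs_to_drive("/content/talknet_out", "/content/drive/MyDrive/talknet_clips")
```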
>>
File: error 403.jpg (76 KB, 764x398)
Hmm, it seems Colab isn't a fan of re-downloading files from the same GitHub source, so maybe having all the anons duplicate the TalkNet notebook to their Google Drives and then run it from their personal accounts would solve this?
>>
>>37288901
Just tried saving a copy in my drive and running it from there, still getting the same error BGM reported of the generate button not doing anything. At this point I'm pretty sure that the issue is something inherent to the script that only SortAnon can fix.
>>
File: 1625858392639.png (152 KB, 380x415)
>>
I've made a list of all the tags used in the master folders for the Music and SFX files for organizing purposes. Just posting them here since other anons could perhaps use them for their own projects.
https://pastebin.com/2CHjh5tW
https://pastebin.com/6BkgJV1w
>>
>>37287144
>>
bump
>>
File: archive all med done.png (43 KB, 1228x484)
>page 10
Bumping with this chart now that the FIMFiction archive 20k steps finetuning run is complete. Currently readying the model for inference in Colab by slimming it.
>>
>>37290865
FIMFiction-20k model can be played with in Colab.
https://colab.research.google.com/drive/13R8MJEDTwinEmUJMLqydKOIcAvWiBIlT?usp=sharing
>>
On 15.ai, voices of other characters bleed into generated speech clips from models that are trained with 0.3 minutes of audio data.

For instance, the Snails voice model sounds this way: https://u.smutty.horse/mcfotncapfe.wav
>>
>>37291191
Yeah I noticed this as well, I thought Octavia sounded like Twilight and Rarity a bit too much at times.
>>
>>37291191
Good thing nobody gives a shit about snips or snails.
>>
>>37291162
Is there a way to download the text model so I could generate my own text offline?
>>
>>37291346
You can download it from the link that it downloads from the notebook, here, I'm pasting it for you: https://storage.googleapis.com/xdisk/fimfmed.tar
But you need an RTX 3090 and Linux with everything installed or TPUs to run it locally, it's a really, really big model.
>>
>>37290137
Thanks for that, I've added these as "taglist.txt" in the respective "SFX" and "Music" folders.
>>
>>37291366
https://github.com/arrmansa/Basic-UI-for-GPT-J-6B-with-low-vram
Apparently it is possible to run it on a 1060 6GB GPU, but it also seems to take a large hit in performance/quality.
>>
>>37291366
>Needs RTX 3090
Wouldn't it be able to work with any decent card, but just allow it to process longer for the same result?
Like... If a GTX 1650 is around 4 times slower, wouldn't that just mean it'd take 4 times as long to get a similar result?
>>
>>37291903
The problem is that you need to physically fit the model in memory while the GPU runs the code. The GitHub code above is able to get away with using older cards because they are still compatible with the code, BUT it still needs to split the load between GPU and RAM.
Having a GPU with over 20GB just means it's all in one spot and can be processed all at once. You can't just tell the computer to load 1/4 of the GPT model and work on that alone (imagine reading one part of a book that was chopped into four parts like a pie; it would be impossible).
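The arithmetic behind this: before activations and framework overhead, the weights alone take (parameter count) x (bytes per parameter). Here's a quick sketch for a ~6B-parameter model like GPT-J-6B. The parameter count is rounded, and the 4 and 2 bytes correspond to fp32 and fp16.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory for model weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

n_params = 6e9  # GPT-J-6B, rounded
for label, nbytes in [("fp32", 4), ("fp16", 2)]:
    print(label, weight_memory_gb(n_params, nbytes), "GB")
# fp32 24.0 GB
# fp16 12.0 GB
```

Even in fp16, the weights alone are about twice the memory of a 6GB 1060. That's why the low-VRAM UI linked above has to shuttle parts of the model between RAM and GPU, while a 24GB card like the RTX 3090 can hold the fp16 weights in one place.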
>>
>>37291930
Oh I see, so it's not so much a matter of processing the same amount of data over time, but rather needing the required memory overhead in order to support such a massive model.
Thanks for clearing that up, anon.
>>
File: Gasp 2.jpg (136 KB, 1253x1129)
>>37291162
I'm getting MUCH better results than from the 500 step iteration, though I guess that's to be expected. It actually seems to generate things somewhat coherently, perhaps even getting close to the coherency of the GPT-2 AID model. I'm having a blast with it so far.
>>
>>37288901
>re-downloading files
Ah fuck, I think I know why it's broken now. I'll look into it in a few hours.
>>
Try the new TalkNet notebook and see if it spits out an error message. Post it here if it does. If the error is "VersionConflict", try restarting the runtime (but copy the error message first).

If it still fails without an error, post a screenshot of your models folder, like this. You might need to click the refresh icon for it to show up.
>>
>>37292758
I got the VersionConflict error, but restarting the runtime fixed it. I copied the error message if you want me to paste it. Apart from that, it seems to be working for me now, at least. Tested two different lines and got the outputs as expected.
>>
https://u.smutty.horse/mcftehfskaa.wav
>>
>>37288867
I figured it out.
- For shape objects, the edge format defines the origin point. That origin is relative to the midpoint of the shape. I can dump the origin point and the size of the shape with Animate.
- https://u.smutty.horse/mcftdfylrke.mp4

It turns out that didn't fix the issue I had last time. I'm getting the same animation with the messed up eyebrows, choppy animation, and non-animated wings. BUT the whole thing runs way faster, it's way cleaner (you'll have to trust me on this since the code still looks like shit), and I no longer run into any of Animate's random failures when trying to dump shapes.
I got a hint from the anon in the last thread about what's making the animation choppy. I'm going to commit my stuff for now, upload the updated tool for dumping animations, and work on the choppiness when I get back in several days.
Render Anon or Morph Anon, if you want to try figuring it out while I'm away, I should still be able to respond to posts occasionally. If you need samples to test out, Clipper knows how to convert files to the new format.
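The conversion in the first bullet above is small enough to sketch. This assumes the origin offset is measured from the center of the shape's bounding box, as stated, and that the box is given as left/top/width/height in the same units. Under those assumptions, the absolute origin is just the box midpoint plus the offset. The function and argument names are hypothetical.

```python
def absolute_origin(left, top, width, height, offset_x, offset_y):
    """Midpoint-relative origin offset -> absolute coordinates."""
    return left + width / 2.0 + offset_x, top + height / 2.0 + offset_y

def relative_origin(left, top, width, height, origin_x, origin_y):
    """Absolute origin -> offset from the bounding-box midpoint (inverse of the above)."""
    return origin_x - (left + width / 2.0), origin_y - (top + height / 2.0)
```

For example, a 100x50 box at (10, 20) has its midpoint at (60, 45), so an offset of (-5, 3) puts the origin at (55, 48).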
>>
File: mlp all low.png (55 KB, 1260x497)
I am now beginning the /mlp/ dataset finetuning run - it is as I explained in one of my earlier posts last thread. When trained, it'll work as a shitposter and green writer.
>>
>>37292891
Correction: I can't post the updated auto-animate tool yet. Pyinstaller doesn't play nicely with cairocffi. I'll need to figure that out first.
>>
>>37288851
I think human hearing is invariant to phase, so I'm not sure if phase equivariance is useful. And if you use magnitude spectrograms, you're already ignoring phase. I've only seen the mono -> binaural audio NN care about phase since it affects perception there.
Time equivariance seems good. Some invariance could be good for changing prosody? Depends on if it's a 1 to 1 conversion or a seq to seq.
Frequency equivariance makes sense. I guess that's what pitch augmentations aim for.
Amplitude equivariance makes sense too.
Wonder if all of these things need special architectures or just tweaks to existing ones.
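On the point that magnitude spectrograms already ignore phase: a circular time shift only changes each DFT bin's phase (by a linear phase ramp), so the magnitudes come out identical. Here's a quick numpy check with a random test signal. A real STFT with windowed, non-circular frames only approximately behaves this way.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
X = np.fft.rfft(x)
X_shifted = np.fft.rfft(np.roll(x, 37))
print(np.allclose(np.abs(X), np.abs(X_shifted)))      # True: magnitudes unchanged
print(np.allclose(np.angle(X), np.angle(X_shifted)))  # False: phases differ
```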
>>
File: file.png (64 KB, 301x291)
>>37292881
Recently remade the /tg/ station Start/End Round Sounds for the /mlp/ SS13 server (/vg/ Codebase), not sure if it fits here but it's content.

Original Start/End Round Sounds: https://u.smutty.horse/mcftodlmcrj.wav

Ponified: https://u.smutty.horse/mcftodotcky.wav
>>
>>37292923
Human hearing uses phase to spatially position a sound. That doesn't matter much for our ability to recognize a speaker, but it can have a big impact on how realistic something sounds.
>>
>>37292945
Yeah, I guess I meant to say that absolute differences in phase don't matter. But relative differences certainly do
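Relative timing between channels is what spatial cues like the interaural time difference capture. Here's a minimal sketch that estimates the delay between two channels from the peak of their cross-correlation. The function name is hypothetical, and the synthetic signals and 3-sample delay in a quick check would be made up.

```python
import numpy as np

def estimate_delay(left, right):
    """Samples by which `right` lags `left`, from the cross-correlation peak."""
    corr = np.correlate(right, left, mode="full")
    # In "full" mode, index 0 corresponds to lag -(len(left) - 1).
    return int(np.argmax(corr)) - (len(left) - 1)
```

Shifting both channels by the same amount leaves this estimate unchanged, which matches the point that absolute phase doesn't matter but relative differences do.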
>>
>>37292758
Got the 'VersionConflict' error, so I followed the Runtime -> Restart routine and ran it all again.
Now, after pressing the generate button im getting the error 'cuDNN error: CUDNN_STATUS_NOT_INITIALIZED'
>>
>>37293086
I've tried to run it in incognito Opera and am once again getting error 403.
>>
File: 1622446133639.png (363 KB, 720x640)
>>37287144
>>
Is there a way to make Trixie say "The Great and Powerful Trrrrixie"?
https://vocaroo.com/1iomCiEQTi0D
>>
File: file.png (173 KB, 900x807)
>>37293322
I've tried making her roll her R's but to no avail, she sure does sound cute while saying it though
https://u.smutty.horse/mcfvloqfhqy.wav
>>
>>37293086
I think I've fixed this error. If it shows up again, post your output from step 1.
>>
By the way, there's no reason TalkNet can't run offline. How many of you have gaming PCs?

https://www.strawpoll.me/45515369
>>
>>37293388
>output from step 1
I got the usual gpu Tesla T4 15109MiB
And now Colab randomly decided it does want to cooperate on Firefox once again. This inconsistency makes me think that maybe this is a problem of too many people downloading the files at the same moment?
>>37293388
Also, since you are talking about an offline version, that would be pretty great, since I could use the other HiFi-GAN or ngrok notebook at the same time when working on projects.
If I may make a suggestion for the code upgrade, could it be possible to add a "ticked on" option for adding an extra 3~5 seconds of silence at the end of clips? I've noticed when messing around with echo and reverb that it's a bit difficult, as many editing programs refuse to "go over" the audio clip length, making a weird hard cut on the echo effects (nothing that can't be fixed in Audacity, BUT editing 100+ clips by hand does add up).
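Until an option like that exists, the padding is easy to batch locally. Here's a minimal stdlib sketch that appends digital silence to an uncompressed PCM WAV, which is all the `wave` module supports. The function name and the "_padded" output naming are hypothetical.

```python
import wave
from pathlib import Path

def pad_wav_with_silence(in_path, out_path, seconds=3.0):
    """Copy a PCM WAV and append `seconds` of digital silence to the end."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        audio = src.readframes(src.getnframes())
    pad_frames = int(round(seconds * params.framerate))
    # 8-bit WAV is unsigned (silence = 0x80); wider sample widths are signed (silence = 0).
    silent_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silent_byte * (pad_frames * params.nchannels * params.sampwidth)
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # frame count in the header is fixed up on write
        dst.writeframes(audio + silence)

def pad_folder(folder, seconds=3.0):
    """Pad every .wav in a folder, writing name_padded.wav next to each one."""
    for wav in sorted(Path(folder).glob("*.wav")):  # list first, so new files aren't re-read
        if not wav.stem.endswith("_padded"):
            out = wav.with_name(wav.stem + "_padded.wav")
            pad_wav_with_silence(str(wav), str(out), seconds)
```

Paths are passed to `wave.open` as strings because older Python versions only treat str arguments as filenames.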
>>
>>37293398
How would you put together a Controllable TalkNet model that runs offline, exactly? And how would the speeds compare even on a good gaming GPU?
>>
>>37293486
>could it be possible to add a "ticked on" option for adding an extra 3~5 seconds of silence at the end of clips?
Sure, I could do that.

>>37293512
It already runs offline. I just need to write a guide on how to set it up.

>And how would the speeds compare even on a good gaming GPU?
About the same speed as on Colab.
>>
>>37293398
I knew being a VRAMlet would bite me in the ass eventually. Got a 1060 3GB here.
>>
>>37293398
wait all it takes is 4GB?
>>
File: thegreatandpowerfultrixie.png (170 KB, 1526x1107)
>>37293322
Here's what I got from the attached input as the second generation.
https://u.smutty.horse/mcfxsozxuaq.wav

>>37293373
You guys using ARPAbet strings?
>>
File: 1626665526887.jpg (72 KB, 704x702)
>>
>>37293322
You're going to have to hire Kathleen Barr, or another voice actress who can imitate her. Rolling the Rs is a very human skill.
>>
>>37295424
Everything is a "very human skill" until we get a machine to do it better.
>>
>>37295424
You can get Trixie to trill her R's, you just have to be very lucky and use the right contextualizer.
>>
>>37295424
Funnily enough, I remember the earlier models like the 22kHz Google Colab being able to trill the Rs quite often. So the AI *can* pick up on it; it probably just thinks it's unnecessary. So you'd need to find a way to specifically tell the AI to include it with that phrase.
>>
>>37295515
>better
Machines can never be better humans than humans.
>>
File: Strawman.jpg (23 KB, 474x355)
>>37296085
>>
Yo anons. I'm not sure if this is a known issue because I haven't participated in anything related to the PPP whatsoever until just now, but I tested the latest MMI Pinkie Pie voice model (11NULGhxh1JTwb7oHBdmT7TAMKvuX-rCg) with the text set as the singular word "I".

This causes a glitch that results in her saying the word "I" like 40 times instead of just once! It's actually hilarious. https://drive.google.com/file/d/1YqWwkJ1U3Og6qC-jog33P2cDal0nz9xw/view?usp=sharing
>>
>>37296295
then don't do that
if you want it to say "I", just use a word that sounds like it, like "eye"
>>
File: 2310221.png (232 KB, 777x847)
God! Why do I keep hitting "Generate" instead of downloading the sound when I get a great result!?
>>
>>37296295
Has a HiFi-GAN model not been trained for Pinkie yet?

>https://drive.google.com/file/d/1YqWwkJ1U3Og6qC-jog33P2cDal0nz9xw/view?usp=sharing
Also could you use smutty.horse in the future for file hosting?
>>
>>37296357
Ironically, that has the same result. I don't need the sound for anything though; I was just surprised the notebook broke in such a strange way.
>>
>>37296369
>Also could you use smutty.horse in the future for file hosting?

lol Is that the standard here?
>>
>>37296389
>lol Is that the standard here?
Yes, because there is no chance for arbitrary deletions or content policing.
Also, other file hosts sometimes compress files, which can cause feedback issues.
>>
>>37296379
yeah, the notebooks and even 15.ai are pretty easy to break. If it's spitting out garbage, you should just change what you're putting in.
>>
File: oops.gif (1.69 MB, 332x434)
>>37296362
I know the struggle. Muscle memory is a blessing and a curse.
>>
>>37296080
I've been thinking about it a lot. I don't think there are any words where R is followed by W, so I'm wondering if every trilled R could be rewritten in the training data as "RW". It might be dumb, but it would also bake a consistent prompt for trilled Rs into the models for now (until models are trained on full IPA).
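Before adopting "RW" as a trill marker, the assumption that no word already contains an R followed by a W can be checked against the transcript vocabulary. A rough sketch, assuming the transcripts are available as plain strings (`find_rw_collisions` is a hypothetical helper, not part of any PPP tool):

```python
import re

def find_rw_collisions(transcripts):
    """Return the sorted set of words that already contain 'rw' (case-insensitive)."""
    words = set()
    for line in transcripts:
        for word in re.findall(r"[A-Za-z']+", line):
            if "rw" in word.lower():
                words.add(word.lower())
    return sorted(words)

print(find_rw_collisions(["Otherwise, it's fine.", "Trixie visited Rwanda.", "Great and Powerful"]))
# ['otherwise', 'rwanda']
```

Any word this turns up would need a different marker, or the marker would have to be a token that can't occur in normal text.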
>>
>>37296621
otherwise
And holy hell the captcha was DRW2Y. What are the odds?
>>
>>37296841
n^5, where n is the number of possible characters.
Assuming the full alphabet plus digits (I know some aren't used, bear with me), that's 36^5, so your odds of getting exactly that combination are 1 in 60,466,176.
>>
File: Anon2.jpg (38 KB, 517x520)
Does Ruiji still lurk here? The one who posted
https://vocaroo.com/1nvYTdC84VZt
and
https://u.smutty.horse/maujfrovyne.mp3
>>
>>37296362
If only there was a way to temporarily keep the previous generation in memory. Just one cycle, then delete after "generate" is pressed a second time.
>>
>>37296977
Or roughly 1 in 324 if we're just looking for the odds of an R followed by a W appearing anywhere in a captcha, instead of the odds for the full string (4 possible positions × 1/36² = 4/1296).
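Both figures can be checked quickly. The sketch below assumes, like the posts above, a 5-character captcha drawn uniformly from 36 characters (not 4chan's real character set). It computes the exact chance of "RW" appearing somewhere by counting the strings that never contain it, which corrects the small double count in 4/1296 (a string like "RWRW?" has two hits):

```python
from fractions import Fraction

def p_contains_rw(length: int = 5, alphabet_size: int = 36) -> Fraction:
    """Exact probability that a uniform random string contains the substring 'RW'."""
    # not_r: strings without 'RW' that don't end in R; ends_r: strings without 'RW' ending in R.
    not_r, ends_r = alphabet_size - 1, 1
    for _ in range(length - 1):
        not_r, ends_r = (
            (alphabet_size - 1) * not_r + (alphabet_size - 2) * ends_r,  # after an R, W is forbidden
            not_r + ends_r,  # append an R to any valid string
        )
    total = alphabet_size ** length
    return Fraction(total - (not_r + ends_r), total)

print(36 ** 5)               # one exact 5-character string: 1 in 60466176
print(float(1 / p_contains_rw()))  # about 324.19, so "roughly 1 in 324" holds
```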
>>
>>37292906
You were asking in the last thread whether 20k iterations was enough. This might help.
https://arxiv.org/abs/2001.08361

Also, the roto-translation equivariance stuff seems both complicated and extremely useful. I'll write up a summary once I understand it better and get some time. >>37288565, if you already understand it, a summary would be great. I'm having trouble seeing where the Wigner-D matrices come from and how exactly to use them.
>>
>>37297527
Unfortunately, I don't understand it. I only skimmed this blog post by the author (https://fabianfuchsml.github.io/alphafold2/), which brings up irreducible representations and Wigner-D matrices quite suddenly. Maybe the paper will give more background info or at least have citations.
But yes, the overall idea is interesting. Again, not sure how effective 2D/3D stuff will be for audio, but you never know.
>>
Out of curiosity, where the hell are you guys going to store all of this data anyways and how big is the filesize now?
>>
>>37297736
hello cia
>>
>>37297736
My PPP folder is 330 GB. It has some non-pony data in it, though.
>>
>>37297736
>Out of curiosity
don't shoot glowing one
>>
This week on things we need to apply to ponies eventually:

https://www.youtube.com/watch?v=0zaGYLPj4Kk
>>
>>37286871
I’m just curious: what tools are you guys using to mine the data and make predictions? I took a data mining class where I had the choice of PySpark, PyTorch, and others for a project, and I'm probably going to use Prophet in a work project.
>>
File: fbi plus hasbro.png (83 KB, 935x65)
>>37298709
shooo fbi plus hasbro, go and glow somewhere else
>>
>>37287582
this is cute, got more?
>>
>>37298415
that's cool
>>
File: windows.png (301 KB, 515x508)
I've made a script that lets you "install" TalkNet on Windows. I've only tested it on one machine, so it might still have some bugs.

https://github.com/SortAnon/ControllableTalkNet/releases/latest/download/TalkNetOffline.zip

Extract this somewhere and run setup.bat first. It takes about 20 minutes, and you need 10 GB of free space and an NVIDIA card with 4+ GB of VRAM. When it's done, run talknet.bat. If everything works, the TalkNet UI should be running at http://127.0.0.1:8050/.
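If you want to check those requirements before running setup.bat, a rough sketch like this works from any Python 3 install. The thresholds are the ones stated above, the function names are hypothetical, and the VRAM check only works if PyTorch with CUDA happens to be installed already (otherwise it reports no device):

```python
import shutil
from typing import List, Optional

GB = 1000 ** 3          # decimal GB, so a "4 GB" card comfortably passes
MIN_FREE_DISK_GB = 10   # "you need 10 GB of free space"
MIN_VRAM_GB = 4         # "an NVIDIA card with 4+ GB VRAM"

def requirement_problems(free_disk_bytes: int, vram_bytes: Optional[int]) -> List[str]:
    """Return human-readable problems; an empty list means the basics look OK."""
    problems = []
    if free_disk_bytes < MIN_FREE_DISK_GB * GB:
        problems.append(f"only {free_disk_bytes / GB:.1f} GB free, need {MIN_FREE_DISK_GB} GB")
    if vram_bytes is None:
        problems.append("no CUDA device detected")
    elif vram_bytes < MIN_VRAM_GB * GB:
        problems.append(f"only {vram_bytes / GB:.1f} GB VRAM, need {MIN_VRAM_GB} GB")
    return problems

def detect_vram_bytes() -> Optional[int]:
    """Total VRAM of GPU 0 via PyTorch, or None if PyTorch/CUDA isn't available."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory

if __name__ == "__main__":
    free = shutil.disk_usage(".").free  # run this from the folder you'll extract to
    print(requirement_problems(free, detect_vram_bytes()) or "looks OK")
```

Treat the VRAM warning as advisory: SortAnon mentions further down that it might still work with 3 GB, depending on what else is running.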
>>
https://u.smutty.horse/mcgkxoztihg.mp3
>>
>>37298804
But suppose I were interested in working on this project for an entry-level salary for a master’s graduate. What tools would I need to know how to use? Though I’d probably need to brush up a little on my Python or learn to program in R regardless.
>>
https://u.smutty.horse/mcgkyfqfudo.wav
>>
File: fgddfgdfgfgfgd.gif (15 KB, 220x216)
>>37299690
yeehaw!
https://u.smutty.horse/mcgkzkjxdmr.wav

pov you just messed up your lines:
https://u.smutty.horse/mcgkzpdjsqw.wav

outtakes:
https://u.smutty.horse/mcgkzowqtbi.wav
https://u.smutty.horse/mcgkzprbnso.wav

what a silly pony:
https://u.smutty.horse/mcgkzmjhmry.wav
>>
>>37299594
Is this TalkNet for training models, or Controllable TalkNet for actually generating outputs?
>>
>>37299899
Generating outputs.
>>
>>37299916
And does the installation allow me to set which drives/directories will be used, or does it use default directories?
>>
>>37299928
It installs everything to the folder you run it from.
>>
>>37299936
Ah, nice. I'll try getting this set up on my primary PC at some point and see how everything runs

Thanks for all the work you've put into this, by the way
>>
>>37299594
I would've tried to Wine this, but my GPU sadly only has 3GBs, not quite enough to match the required 4. Damn.
>>
How much would it be worth it to get one of these to train models on? https://www.pugetsystems.com/recommended/Recommended-Systems-for-Machine-Learning-AI-174/Buy_200
>>
>>37296085
Just a quick reminder, humans are just machines of a different kind. I'm sure some day after ponies are completely digitised we can immortalize ponyfags too.
>>
>>37300284
The ones >>37298709 mentioned are free and open-source, but Pydub and PyAudio may do the job. The issue may be the amount of storage you’ll need to hold the training data, which, as many here will likely concur, is exactly why NO copy of SM64 is personalized.
>>
>>37298804
>>37298709
>when the subsequent leaks proved the absolute basement-tier attempts by habsro to find out who leaked the first time
Wish I could relive those days. Need more leaks; they're too fucking funny for showing how terrible billion-dollar corporations are.
>>
>>37300297
The page I linked to was for a Machine Learning PC.
>>
>>37300307
Oh. I thought you were referring to the algorithm or tool to use.
>>
>>37300385
The url itself is pretty vague though.
>>
Are there any idiot guides out there for getting koboldAI to run GPT-J with the fimfiction mod?

Idiot guides for idiots on the level of not knowing wtf any of what I just typed up there means. I just want a new AIdungeon for frantic pony fuckery.
>>
https://u.smutty.horse/mcgmqzdupcs.wav
>>
I trusted the plan and 15 still hasn't come back m8s. It's over.
>>
>>37300479
https://u.smutty.horse/mcgmuilvlnz.wav
>>
>>37300486
https://u.smutty.horse/mcgmvvsibua.wav
>>
>>37300412
I'm looking at their example Colab, and I think I can implement the same API and web service in my notebook. Give me anywhere from tens of minutes to a few hours, depending on how complicated it is. It doesn't look difficult.
>>
>>37300556
Then give me Carl
>>
>>37300527
https://u.smutty.horse/mcgmxxgfkzx.wav
>>
>>37300584
>>37300556
https://u.smutty.horse/mcgmzskxnmd.wav
>>
>>37300412
from the /aids/ thread. I don't know about fimfiction but i assume that's something you've made?
https://rentry.org/itsnotthathard
>>
>>37299594
it says I need Visual Studio. Do I really need Visual Studio for this to work, or can I just bypass it?
>>
>>37300748
Normally, any program that says you need Visual Studio means you have to bite the bullet and download the dependency. -T. Various Versions.
>>
File: 1626666647885.png (147 KB, 1903x1057)
>>37300748
>>37300778
that doesn't look good
>>
File: Snake In Fire.gif (1.99 MB, 448x252)
>>37300790
Ooh, that's a lot of red.
>>
File: 1611450346033.png (979 KB, 1440x810)
>>37299594
well, that was something. My PC froze for about 20 mins, then the setup failed.
>>
>>37300778
Does Visual Studio have compatibility with Python and R?
>>
>>37301095
Some cursory googling tells me it gets goofy.
>>
>>37300790
>>37301060
RIP
>>
>>37296621
Rwanda
>>
File: vram.png (3 KB, 227x152)
>>37300256
It might work with 3 GB, depending on how much you're running in the background.
>Wine
It runs natively on Linux. Do you want some setup instructions?
>>
>>37299695
What predictions are you talking about? Most of the AI stuff is Python, and some C++. People are using a lot of tools to work with audio data and to scale up training. I don't think anyone is using anything special to preprocess data.
>>
>>37301060
What happens if you just install this and rerun setup.bat? I really hope I don't need to include a full Visual Studio setup.

https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=16
>>
>>37300609
https://u.smutty.horse/mcgqbkohako.wav
>>
>>37299594
Having trouble running it. When starting talknet.bat, it gives me the message below:
>no module named 'nemo'
I've run pip install on nemo, but it still tells me there isn't a nemo module installed.
I've also tried to upgrade it, but it tossed this message at me: whatever the 'git' command is, Python is not recognizing it. Are you sure it's not supposed to be 'get' or something else here?
>>
>>37301734
I thought about being a smartass and forcing the module to be installed before the code asks for it, but it's still a no-go. I'll try messing around with admin permissions to see if that fixes anything.
>>
>>37301746
I've somehow solved the problem by installing 64-bit Git from here:
https://git-scm.com/download/win
then running update and letting every module get sorted.
Now I just have a few questions. Where the fuck do I upload the reference wavs in the browser-only version (since there isn't a Colab folder to drop them in)?
How do I get custom models working completely offline? (As far as I understand, the script can only reference prewritten models, and all new ones need to be re-downloaded?)
>>
>>37301778
and I'm getting some weird errors again.
>>
>>37301788
Never mind, it seems there's some problem with extracting the Trixie model: the script makes the proper folder for it but doesn't actually extract anything into it, so I just had to do it by hand. It also seems I misunderstood how this works: you do need internet the first time you run a new model code so it can download the model, but after that it uses the model reference to grab the file from the model folder.
I'm sorry, SortAnon, if I sounded assholeish above, but I'm just really annoyed with getting random setbacks in everything when all I want is to listen to pony voices.

Well, there's one more error: I'm getting a message that CUDA is out of memory, when the code clearly isn't even using half of my GPU memory. And yes, I did close it down and reopen it, but the same thing happened anyway, so it seems to be a code problem.
>>
>>37301830
Were you offline when you first selected Trixie? I know why >>37301734 is happening, but >>37301788 is still a mystery.

>im getting message that cuda is out of memory were clearly the code isn't even using half of my gpu memory
Windows could be underreporting the VRAM usage. Try opening a command prompt and typing "nvidia-smi".
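To watch actual VRAM usage while generating, `nvidia-smi` also has a machine-readable query mode (`--query-gpu=memory.used,memory.total --format=csv,noheader,nounits`), or you can simply leave `nvidia-smi -l 1` running in a second window. A small sketch that polls it from Python; the helper names are hypothetical, and it assumes the output is one `used, total` line per GPU in MiB:

```python
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_memory_query(output: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' lines (MiB) into (used, total) tuples, one per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append((used, total))
    return gpus

def query_vram() -> List[Tuple[int, int]]:
    """Run nvidia-smi once and return (used MiB, total MiB) for each GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_query(result.stdout)
```

Calling `query_vram()` in a loop right before and during generation shows whether usage actually climbs toward the total when the reference audio is processed.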

>>37301778
>where the fuck do I upload the reference wavs in the browser only version
For now, it's the folder called ControllableTalkNet. I'll add a more convenient way to manage it later.
>>
>>37302077
>Were you offline when you first selected Trixie?
No, all the models downloaded fine, even the custom ones. It's just that, for some reason, Trixie had to be done by hand.
Actually, let me post the ones made in the past few months here for the other anons to use:
tf2_soldier_TalkNet.zip
1Gt7sD4fsU0aC06V2zQsn4Vrnj6g2E6xQ
MrRogerTalknet_TalkNet
1qbcYrxgO3f3RIWfOrL9QqFJVuzy0H_W_
NamelessHeroTN
1lhtg5jPfz-9j-re2d4DQ0P1vQCXBqcLw


>Windows could be underreporting the VRAM usage. Try opening a command prompt and typing "nvidia-smi".
I've tried that, and it still runs out of memory, even after I closed my video editor and RTX Voice. This is a bit weird, since I'm able to run the medium GPT-2 text model, and that also requires 4GB of GPU memory.
>>
>>37302203
what gpu do you have?
>>
>>37302228
GTX 1080. It sure doesn't rock socks off, but it can play even the newest games at medium settings.
>>
>>37302238
Same. I'm building a new machine but >GPU prices.
>>
pinkie pie huge ass
>>
>>37302203
It shouldn't use that much VRAM. What happens if you do a line without reference audio?
>>
>>37302444
>What happens if you do a line without reference audio?
If I tick "disable reference", it works fine and there's no problem generating audio. If I untick it and click "update list", it still works (it just asks me to put the reference in). It only breaks when I click the dropdown and choose the reference wav file.
>>
>>37302457
So it's the pitch estimator that's the problem. I should try replacing it with torchcrepe.
>>
File: talknet.png (54 KB, 1279x751)
>>37299594
Working perfectly over here.
>>
File: 1606087027516.png (26 KB, 718x402)
>>
File: pinkie hmmmmm.png (224 KB, 407x570)
If nothing derails me too much over the next few days, I think I'll be able to get a new audio episode out this Thursday/Friday (also bump).
>>
>>37304192
Awesome, I'm excited to see what you have in store.
I get the feeling this week might be a good one for content.
>>
>>37300293
>he fell for the "brains are just bio-computers" meme
Lol. Lmao.
>>37301514
Lost hard. It's just like Terry used to say.
Also now I've gotta go jack off to the thought of AJ calling me a fucking nigger.
>>
yo who the fuck is this nigger?
https://youtu.be/zqklInNM9H4
>>
>>37304842
Random faggot who doesn't browse the board and got mad that the 15.ai is down so he now lies about it on the internet for views. Ignore him.
>>
>>37304842
>yo who the fuck is this nigger?
>ThunderShyOfficial
Someone who doesn't want to be impersonated apparently. Too bad for them.
I hope this video is satire.
>>
File: song title.jpg (160 KB, 528x436)
Figured I wouldn't let these go to waste:

The Weeknd - Can't Feel My Face (WIP)
https://u.smutty.horse/mcgzletwotq.mp3

Gwen Stefani - Rich Girl (WIP)
https://u.smutty.horse/mcgzlessmrw.mp3

The AI shits itself around lines with overlapping or faint vocals, so maybe one day it can learn to distinguish them for better results. Still fun as heck.
>>
>>37304842
>look through his videos
>hes a shitfag
every single time
>>
>>37299594
Thanks very much for this! It works great so far, although I did run into the same problems this anon did
>>37301060
>>37301734
but fortunately downloading this
>>37301462
and this
>>37301778
and rerunning the install .bat fixed everything.
>>
>>37305246
>Gwen Stefani - Rich Girl (WIP)
Wouldn't that be more appropriate for Rarity to sing? What kinds of problems does Rarity's TalkNet model have?
>>
>>37305355
Twi is the more stable and flexible horse
Rarity doesn't sound bad now that I notice but she still needs a bit more practice:
https://u.smutty.horse/mcgzzxscvkg.mp3
>>
>>37305343
Thanks for confirming we only need the build tools. I'll update the setup script later today.
>>
>>37305628
I needed to install the "Desktop development with C++" under the "workloads" tab in order for the install to build correctly. Was that what you were referring to?
>>
>>37305650
Yes. I have Visual Studio installed, so I never ran into any errors.
>>
File: mpv-shot002.jpg (198 KB, 1920x1080)
>Page 10
>>
File: fluttershy puke.gif (181 KB, 480x270)
>>37306339
>that face
>>
TalkNet installer's been updated. It should be bulletproof now.

https://github.com/SortAnon/ControllableTalkNet/releases/latest/download/TalkNetOffline.zip
>>
>>37306812
Made a fresh installation, and once again the no-reference audio works, while the reference audio makes the GPU run out of memory.
Is there a chance to make the part of the code that converts the reference wav to reference pitch use the CPU instead of the GPU?
>>
>>37307147
It could be an issue with the CUDA install or NVIDIA's drivers. I had issues like that last year. Did you update your drivers, and what GPU are you using?
>>
>>37293398
RTX 3060 12gb here, gonna have sum fun.
I hope
>>
>>37307279
GTX 1080, and I updated to the newest drivers four days ago.
>>
>>37306812
there's a whole lot of red words here
>>
>>37307550
And no other applications that could be using a significant amount of VRAM are running while generating the audio?
>>
>>37307675
Haven't changed anything since last time >>37302203
I guess SortAnon hasn't yet changed whatever in the pitch reference code is causing this error.
>>
File: Untitled.png (130 KB, 1271x1000)
>>37307590
>Cannot open include file: 'io.h'
Don't tell me you need the entire Windows 10 SDK to read a file.

Try this. Go to the installers folder, and run vs_BuildTools.exe. Follow the steps in this image, and run setup.bat again when it's done installing. Does that fix the error?
>>
File: 1607903372051.png (618 KB, 1537x1616)
>>37308014
It works!
>>
>>37307697
I tried replacing the pitch estimator today, and it broke the models' ability to hold notes. So without retraining every character, I'm stuck with the existing one.
https://u.smutty.horse/mchhrqrwewg.ogg
It does run on CPU, but it's very slow. I'll add it as an option.

>>37308099
Installer's been updated again. No one else should run into >>37307590.
>>
>>37308290
Any word on >>37307147 ?
>>
>>37308345
That's what the new option fixes, at the cost of speed.

Run update.bat. Open ControllableTalknet/controllable_talknet.py in a text editor. Go to line 41, change "CPU_PITCH = False" to "CPU_PITCH = True", and save. That should fix the memory problems.
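The same edit can also be scripted if you'd rather not open a text editor. A hypothetical helper (not part of the installer) that assumes the flag is written exactly as `CPU_PITCH = False` at the start of a line, and that matches the line by its content rather than by its line number:

```python
import re
from pathlib import Path

def enable_cpu_pitch(path: str = "ControllableTalknet/controllable_talknet.py") -> bool:
    """Set CPU_PITCH = True in the given file. Returns True if the file was changed."""
    file = Path(path)
    source = file.read_text(encoding="utf-8")
    # Only match an unindented assignment at the start of a line.
    new_source, count = re.subn(r"(?m)^CPU_PITCH\s*=\s*False\b", "CPU_PITCH = True", source)
    if count:
        file.write_text(new_source, encoding="utf-8")
    return count > 0
```

Run it from the folder containing ControllableTalknet; a `False` return means the flag was already `True` or the line looks different from what's assumed here.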
>>
>>37308516
Ran the update and still nope, it's still giving the same old message >>37307147
Here's a screenshot from a moment before it spams the "out of memory" message.
>>
>>37308575
And you're sure it's set to CPU_PITCH = True? I've tested it on two different machines, and it doesn't use CUDA on either of them.
>>
>>37308617
>CPU_PITCH = True
In what file is that written?
>>
>>37308814
controllable_talknet.py, in the ControllableTalknet folder. Line 41, just beneath all the import stuff.
>>
File: spike thumb up.png (425 KB, 957x538)
>>37308835
yaa, it works now, happy times.
>>
Oh man..
I want to make a version of Alabama Nigger but with apul and talk of Ziggers, but im more retard then a nig
Me no understand compooter stuff, what do
>>
>>37308870
You just gotta fiddle around with it until you come to understand it. Too bad there isn't any text anywhere to read that explains it, like say a long running thread, or literal guides written into the notebooks.
>>
>>37308885
I'm trying but all the words start dancing around and shit ahhh
>>
>>37308889
Sounds more like you need a diagnosis.
>>
>>37308896
It's not worth it, FUCK doctors
>>
>>37308899
Take the meds, or face the feds.
>>
>>37308904
>feds
because I can't read good?
>>
>>37308908
Because the government treats doctor-avoidant people poorly, despite doctors costing a lot. Also because it rhymes.
>>
>>37308931
Government sucks dick
>>
is the test site down?
>>
>>37309498
Looks like it.
>>
>>37309534
I am back in action, after some stuff happened. Today I experimented with programming angles into the servos. I don't know how to make a walking robot with 4 legs yet.
>>
File: 2211579.gif (2.68 MB, 402x210)
>>37305246
these are amazing pls make more
>>
>>37309545
I can't help but wonder how many generals the schematics are gonna touch.
>>
>>37304842
lol hes right faggot.
>>
File: bumpfs.png (452 KB, 1000x1000)
>page 9
Bumping with this mini-PTS I made last week:
https://u.smutty.horse/mcgnzmibuvu.mp4
>>
>>37292942
Can I get a link to the git?
>>
>>37309498
I do hope a test site comes back soon.
>>
>>37310795
https://github.com/AlphaPassive/mlpstation13
and here's a link to the thread if you're interested
>>37306319
>>
File: drd man.png (379 KB, 580x652)
https://www.youtube.com/watch?v=kgjvnI_FVcc
I'm getting annoyed with some SFX/voice stuff from the main audio episode, so I took a break to finish the meme song I've been messing around with for some time.
So I hope you guys will enjoy the DRD 100% Gamer song.
>>
>>37300564
I don't like how bloated KoboldAI is, I will instead make a simple web interface like Cookie's ngrok notebooks. Hell, I could probably use one of the free TPUs under the TRC to serve the FIMFiction model if I didn't accidentally waste a ton of free Google Cloud credit on data ingress between continents.
>>
First time using the TalkNet thing, and I'm getting this error: "CUDA out of memory. Tried to allocate" etc.
Did I fuck something up?
>>
>>37311317
try this >>37308835
>>37308516

It would be nice if there were a small readme file in the zip download explaining that to people.
>>
>>37306812
If the filename for the reference audio is too long it will throw an error when trying to generate the audio.
>>
>>37311628
Windows API doesn't support paths longer than 260 characters. I'm working on a UI change that might help with this, but it's not really a bug.
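Until then, a quick way to catch this before generating is to check the absolute path length of the reference file. A sketch assuming the classic Windows `MAX_PATH` limit of 260 characters (which counts the terminating null, so 259 are usable) and that long-path support hasn't been enabled; both helper names and the hash-based short name are hypothetical:

```python
import hashlib
import os

MAX_PATH = 260  # classic Windows limit, counting the terminating null character

def path_too_long(path: str) -> bool:
    """True if the absolute path would hit the classic MAX_PATH limit."""
    return len(os.path.abspath(path)) >= MAX_PATH

def short_name(path: str) -> str:
    """A short, stable replacement filename (12 hex digits + original extension)."""
    digest = hashlib.sha1(os.path.basename(path).encode("utf-8")).hexdigest()[:12]
    return digest + os.path.splitext(path)[1]
```

Keeping the lyrics in a separate text file and giving the WAV a short name avoids the problem entirely.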
>>
>>37311747
>>37311774
OK, good to take note of it, because people might put the lyrics they want to generate in the filename, so they might run into the same issue I did.
Thanks.
>>
>>37311780
I just type the lyrics separately in Notepad or something.
>>
File: the_line.png (2 KB, 495x42)
>>37311226
GPT-J-6B inference notebook has been updated with ngrok-access web interface. Same link.
Instructions are to use Runtime->Run all and wait for the last cell to output pic related, then click the link.
https://colab.research.google.com/drive/13R8MJEDTwinEmUJMLqydKOIcAvWiBIlT?usp=sharing
Now I have to write a post in my Tumblr since the TRC program requires progress tracking.
>>
>>37312085
I've tried to run it several times to no avail. It fails every time on the first step with pic related as the error

I'm using TPU and I've tried both restarting and factory resetting the runtime, with no change in the end result.
>>
>>37312593
Go to the first code block, move line 25 to line 17 and try again. I'm not the owner so I can't update it myself.
>>
>>37312911
Thanks, this seems to have worked.

>>37312085
I've only generated once and I'm already impressed by its coherency.
>>
>>37310998
ebin
>>
Been away from the thread for a bit, what happened to that anon who said he was trying to rig up voice-controlled generation? Did that ever make it to a workable stage?
>>
>>37313305
It works like a charm, everybody loves it.
https://colab.research.google.com/drive/1aj6Jk8cpRw7SsN3JSYCv57CrR6s0gYPB?usp=sharing#scrollTo=tOXejargIPTq

(Here's Pinkie doing an acapella song as an example.)
https://u.smutty.horse/mbrjeellwhq.wav
>>
>>37312593
>>37312911
Silly mistake. Fixed.
>>
>>37313305
SortAnon put up a notebook 5 threads ago, here's some notable stuff:
https://u.smutty.horse/mbnemwqjxln.ogg
https://u.smutty.horse/mbolteukctt.ogg
https://u.smutty.horse/mbqacswulum.ogg
https://u.smutty.horse/mbqddkmktnx.wav
https://u.smutty.horse/mbqdwvygrxn.mp3
https://u.smutty.horse/mbqkqvfyaos.wav
https://u.smutty.horse/mbqrdwgevpa.wav
https://u.smutty.horse/mbqporevtcw.ogg
https://u.smutty.horse/mbqgtteqdcj.mp4
https://u.smutty.horse/mbrqwdwjekk.wav
https://u.smutty.horse/mbsiezecxpw.ogg
https://u.smutty.horse/mbsixqorygt.ogg
https://u.smutty.horse/mbvckphwqad.wav
https://u.smutty.horse/mbwiirbsscv.mp3
https://u.smutty.horse/mbvfpojqtss.wav
https://u.smutty.horse/mbtxajqrtdk.mp3
https://u.smutty.horse/mbujesuprdf.wav
https://u.smutty.horse/mbvfaothukn.wav
https://www.youtube.com/watch?v=QkGuSVkunkA
https://www.youtube.com/watch?v=mBYTgYtwgmI
https://u.smutty.horse/mbypxpvtomc.wav
https://u.smutty.horse/mbyrxvsgrgd.wav
https://u.smutty.horse/mcamdlgqtkp.mp4
https://u.smutty.horse/mcbpnpgofmv.mp4
https://www.youtube.com/watch?v=BBSUc6aT-IA
https://u.smutty.horse/mcbvbohuldg.wav
https://u.smutty.horse/mcgzletwotq.mp3
https://u.smutty.horse/mcgzlessmrw.mp3

Some shameless plugs up there and currently trying to make this Shania Twain one work:
https://u.smutty.horse/mchvphwakky.mp3

>>37313324
This one's better
https://u.smutty.horse/mbrjphugmmx.ogg
>>
>>37292923
Actually, one way to achieve frequency equivariance could be with the constant-Q transform. It's like a Fourier transform, but the bins are geometrically spaced. So, you can make the center frequencies correspond to a musical scale. This means that a pitch shift would correspond to a shift of the bins in the same direction.
I've seen some papers use it, but overall it's not as common as the STFT.
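A quick numeric illustration of that property, assuming a CQT with 12 bins per octave starting at an arbitrary 55 Hz (illustrative numbers only; real CQT implementations also choose a filter length per bin, which this ignores). With center frequencies f_k = f_min · 2^(k/b), multiplying a frequency by 2^(s/b) moves it exactly s bins up, wherever it starts:

```python
import math

def cqt_center_frequencies(f_min: float, n_bins: int, bins_per_octave: int = 12):
    """Geometrically spaced CQT center frequencies: f_k = f_min * 2**(k / bins_per_octave)."""
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]

def bin_index(freq: float, f_min: float, bins_per_octave: int = 12) -> float:
    """Fractional CQT bin position of a frequency (inverse of the formula above)."""
    return bins_per_octave * math.log2(freq / f_min)

f_min = 55.0  # A1
freqs = cqt_center_frequencies(f_min, n_bins=48)
print(freqs[12])  # 110.0, one octave above f_min

ratio = 2 ** (3 / 12)  # pitch shift of +3 semitones
for f in (110.0, 440.0, 1760.0):
    shift = bin_index(f * ratio, f_min) - bin_index(f, f_min)
    print(f"{f:7.1f} Hz -> shifted by {shift:.6f} bins")  # 3.000000 every time
```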
>>
>>37314028
Excuse my dumbness, but doesn't the AI use a mel transform for frequency stuff?
>>
>>37314569
Yes, but not all bins of a mel spectrogram are geometrically spaced. So a pitch shift, which multiplies every frequency by the same factor, won't always correspond to a constant shift of bins.
Maybe this is good, because human perception of pitch doesn't perfectly follow a geometric scale anyway. But if you want equivariance to pitch shifts on a musical scale, the CQT is still probably better.
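To see the difference concretely, here's a sketch using the common HTK-style mel formula m(f) = 2595 · log10(1 + f/700). Other mel variants exist (librosa defaults to the Slaney version, which is linear below 1 kHz), but the point is the same: an octave is a constant step on a log-frequency (CQT) axis, but not on the mel axis.

```python
import math

def hz_to_mel(f: float) -> float:
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def octave_step(f: float, scale) -> float:
    """Distance on the given scale between f and 2*f (one octave up)."""
    return scale(2 * f) - scale(f)

for f in (100.0, 1000.0, 4000.0):
    mel_step = octave_step(f, hz_to_mel)
    log_step = octave_step(f, math.log2)  # log-frequency axis, like a CQT
    # log2 step is always 1.000; the mel step grows with frequency
    print(f"{f:6.0f} Hz: mel step {mel_step:7.1f}, log2 step {log_step:.3f}")
```

So on a mel spectrogram the same pitch shift moves low notes by fewer bins than high notes, which is why it isn't a clean translation.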
>>
>10
>>
File: tpdne-segmentation.png (550 KB, 1014x835)
TPDNE has been used in a paper on unsupervised StyleGAN segmentation using CLIP. As far as I know, this is the second time ponies have been used in a published ML paper (first time was iCartoonFace).

>Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP
>https://arxiv.org/abs/2107.12518
>https://github.com/warmspringwinds/segmentation_in_style
>>
>>37316125
>When AI can do better muzzles than G5.
>>
>>37310111
Is there a way to share files without me having to manually clear metadata every single time? Otherwise, it won't be an anonymous project.
>>
>>37317212
Depending on how the metadata is stored, there are probably programs that'll do it for you.
>>
>>37317231
I'm mostly concerned about the computer name and location data in STL, SolidWorks part, and Prusa 3MF files.
>>
>>37317250
I think the best way to handle the metadata for the stl is to use a patching program like http://www.romhacking.net/utilities/240/ to create a patch that you can apply to any stl file. *(You do this by taking one of your files that has the metadata still, and manually editing the metadata out of it, then make an .ips file with these two). This may not work, but if it does, it'll save time. I haven't found anything that says 3mf stores metadata in a common way across files, but it would be weird if it didn't. So this method may work for both.
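For binary STL files specifically, an alternative to IPS patches is to overwrite the header directly. The format is an 80-byte free-form header, a little-endian uint32 triangle count, then 50 bytes per triangle, so any text an exporter writes into the header can be blanked without touching the geometry. A sketch under those assumptions (`scrub_binary_stl` is hypothetical; ASCII STLs, which start with `solid <name>`, and 3MF files, which are ZIP archives containing XML metadata, need different handling and aren't covered):

```python
import struct
from pathlib import Path

HEADER_SIZE = 80
TRIANGLE_SIZE = 50  # 12 float32s (normal + 3 vertices) + 2-byte attribute field

def scrub_binary_stl(src: str, dst: str) -> None:
    """Copy a binary STL, replacing its 80-byte header with zeros."""
    data = Path(src).read_bytes()
    if len(data) < HEADER_SIZE + 4:
        raise ValueError("file too short to be a binary STL")
    (n_triangles,) = struct.unpack_from("<I", data, HEADER_SIZE)
    expected = HEADER_SIZE + 4 + n_triangles * TRIANGLE_SIZE
    if len(data) != expected:
        raise ValueError("size doesn't match the triangle count; probably an ASCII STL")
    # Triangle data (including the per-triangle attribute field) is copied unchanged.
    Path(dst).write_bytes(bytes(HEADER_SIZE) + data[HEADER_SIZE:])
```

Checking the result in a hex editor is still worthwhile, since some tools also store data in the per-triangle attribute field.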
>>
>>37317338
Wow anon, thanks! I found anonymous github, and looking into that now too.
>>
File: Ews84PPXIAkJj8.jpg (98 KB, 1080x596)
https://u.smutty.horse/mcihghgobfv.wav

Gaius van Baelsar speech from FFXIV delivered by Glimmer, hope you enjoy
>>
>>37317680
Starlight Glimmer is a nigger.
>>
>>37313563
Jesus, the dub of the Mentally Advanced Series almost sounds just like the VAs did it. Fits perfectly.
>>
File: 1623279368168.png (218 KB, 1200x595)
>>37286875
https://www.youtube.com/watch?v=NynmEU-tCBA
So here it is, another long audio episode. This time I've done it 100% without using 15's voices, while going full ham on the ambient sounds and SFX.
I hope all you guys will enjoy watching this one.
>>
File: partying internally.png (139 KB, 348x348)
>>37318223
Interesting.
>>
>>37318223
That was a good watch
Flutters could be really adorable as a tap-dancing god
>>
File: RAP BATTLE.png (2.01 MB, 1920x1080)
>>37286875
The Mane 6 get together for a friendly rap battle, courtesy of Pinkie befriending Tyrone, but it turns out there’s more to him than anypony expected...

https://youtu.be/psS0fTknd-c

This is another thing I’ve wanted to do for ages, now finally realised with SortAnon’s TalkNet, and a slammin’ beat courtesy of BGM. Thanks very much to you both for making this possible.

Based on this /mlp/ thread from 2012:
https://desuarchive.org/mlp/thread/6709157
https://u.smutty.horse/mcidnwljcym.png

High quality downloads:
Full version - https://mega.nz/file/Nc4mFTYZ#h65qzHd57Ser8ObYz0BXn5imdzaBeoSjfHqY104hw9o
Rap battle only - https://mega.nz/file/dIwSFJiB#_AnhKoHngCn4NKGTTu5FU2OdkRUoOHKZ8g-wzn7BR4M
Instrumental - https://mega.nz/file/oZgyzZxT#qaTr8vMcYMakKdZNSFN0dmSBEaeXyJ2bGN1fvaAtQT8

>>37318223
Nice work mate, I enjoyed listening and seeing you get better at using sound effects. One thing I did notice is that a lot of the lines got cut off just before the end, which sounded a bit weird. Not sure what was going on there, but was still a fun story and I'm happy to see you still making these.
>>
>>37319348
>Mane6 Rap Battle
Now that's good shit.
Btw, where do you get the Tyrone sounds? I hear that voice inserted in YTPs from time to time, but I can never find the original source.

>lot of the lines got cut off just before the end
I run into a problem when generating sentences: in something like 9 out of 10 cases, the last word of the sentence gets broken (for some reason mostly with TS and RD lines), with a weird cracking, shimmer, or other noise added to it.
So I solved it by adding an extra word and just cutting it out in the editor.
I know I can generate the word on its own or in another sentence and try to swap it in, but a lot of the time it will have a different speed, pitch and/or tone from the original sentence, and to me that sounds more jarring than cutting the sentence short.
I think that could be solved with ARPAbet control like in DeltaVox; with enough dicking around, any word can be broken up and rearranged so that I can force it to work.
And before someone says to just use more reference wavs: I've used that function for some of the words the TTS had trouble generating, but sadly, most of the time my way of speaking didn't go hand in hand with the way the pony models wanted to pronounce the word (and on a more autistic note, pretty much all the lines were generated 5 to 20 times, pic related).
>>
File: CMCMilkshakes.gif (569 KB, 498x279)
>>37319348
Love how it turned out. It was a lot of fun to collaborate on!

>>37319473
>were do you get the Tyrone sounds?
They're from my music program, Logic Pro. I exported them for Clipper, if anyone else wants them, here they are. They're all locked to 100BPM cause that's what was relevant for the project.
https://drive.google.com/file/d/1wXk1FfoLzzIR11-5qApgu2_3Jvq2tuPY/view?usp=sharing
>>
File: wut.gif (803 KB, 512x512)
>>37319348
>Applejack
>>
Hey SortAnon, I think you need to fix something in your Colab TalkNet training code.
I just got some missing modules on step 5 (easy to fix with the '!pip install' command) and a wall of errors on step 7. I've changed the batch size to 1, but that didn't fix it.
https://pastebin.com/dhWxrtxQ

RuntimeError: The size of tensor a (217) must match the size of tensor b (685) at non-singleton dimension 1




