/mlp/ - Pony


Thread archived.
You cannot reply anymore.



File: AltOP.png (1.54 MB, 2119x1500)
TwAIlight welcomes you to the Pony Voice Preservation Project!
https://clyp.it/tm03e5en

This project is the first part of the "Pony Preservation Project" dealing with the voice.
It's dedicated to saving our beloved ponies' voices by creating a neural-network-based text-to-speech system for our favorite ponies.
Videos such as https://youtu.be/GuJKTodX1FA or https://youtu.be/DWK_iYBl8cA have proven that we now have the technology to generate convincing voices using machine learning algorithms "trained" on nothing but clean audio clips.
With roughly 10 seasons (9 seasons and 5 movies) worth of voice lines available, we have more than enough material to apply this tech for our deviant needs.

https://derpy.me/PHsCn

Any anon is free to join, and many are already contributing. Just read the guide to learn how you can help bring on the wAIfu revolution. Whatever your technical level, you can help.
Document: https://docs.google.com/document/d/1xe1Clvdg6EFFDtIkkFwT-NPLRDPvkV4G675SUKjxVRU

We now have a working TwAIlight that any Anon can play with:
15.ai
https://derpy.me/vCzm2 (Training)
https://derpy.me/hdJQF (Synthesis)
https://derpy.me/YTJ94 (Guide)

>Active Tasks
Create a dataset for speech synthesis (https://youtu.be/Bsu7mwa-QGY)
AI Training/Interface/Refinement
Synthbot working on story tagger for voiced greentexts (https://synthbot.ai/)
Researching alternative vocoders
Looking into an animation dataset, animators needed
Anon transcribing books and comics

>Latest Developments
We had a panel in /mlp/con (See FAQs for links)
Anypony can train on Google Colab
Research into free TPUs from T.R.C.
15.ai

>Voice samples
https://derpy.me/3TBK4
https://derpy.me/fHs3K
https://derpy.me/O1xdh

>Clipper Anon's Master File 2.0:
https://mega.nz/#F!L952DI4Q!nibaVrvxbwgCgXMlPHVnVw
https://derpy.me/6im1i (torrent)

>Synthbot's Torrent Resources
https://derpy.me/ZJNca

>Cool, where is the discord/forum/whatever unifying place for this project!?
You're looking at it.

Last Thread:
>>35459053
>>
FAQs:
>READ THE DOC
Do it now
https://derpy.me/V7cMp

>Did you know that such and such voiced this other thing?
Yes. We are very much aware. It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan imitations of official voices?
No.

>How do I make the voices?
Several guides are available. In-depth guides on how to do training and synthesis (making the ponies speak) are in the doc. If you don't want to use the navigation bar in the doc, the sections are also directly linked in the OP. If you want to use the WiP 48KHz notebook, some kind Anons have put together some image guides for you.
48KHz Training: https://derpy.me/wW2hX
48KHz Synthesis: https://derpy.me/j4MXQ

>Where are all the voice samples?
In the doc.

>Is there a place where I can find all the pony models?
In the doc.

>What about muh waifu?
Check the doc.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can, however, get most of the way there by using phonetic transcriptions of other languages.

>What about [insert OC here]'s voice?
Not a priority. Again, however, you're welcome to. There are already people doing this.

>Where can I view the PPP /mlp/con panel?
YouTube: https://youtu.be/WtuKBm67YkI
CyTube chat: https://pony.tube/videos/watch/b83fbbfc-6d4e-4768-8deb-edb61ea38abb

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: https://fifteen.ai/code

>Is this project open source? Who is in charge of this?
https://derpy.me/CQ3Ca
>>
File: biganchor.jpg (161 KB, 640x640)
>>35514662
Anchor.
>>
New Ngrok link:
https://071ff1e293c3.ngrok.io/
>>
File: training tutorial.png (697 KB, 1920x5900)
>>35514666
training
>>
File: 1579202665541.png (1.7 MB, 934x4670)
>>35514666
old synthesising pic (grab your tinkeranon program if you can, it's way faster doing it offline)
>>
>>35514916
*IF YOUR GPU HAS CUDA SUPPORT
>>
File: please respond.png (142 KB, 492x258)
Is anyone there?
>>
>>35515035
I'm in the middle of a project, but sadly my non-pony-trained model keeps derping, making the project take three times longer than it needs to.
But hey, here is an idea. How about we set up some kind of creation challenge? For example: find a comic with sub 50 words, and giving them a dubbing, or making your own stuff.
>>
>>35515035
I'm here, still working on a project, and waiting for the return of 15.ai.
>>35515061
>find a comic with sub 50 words, and giving them a dubbing

There's already a whole thread dedicated to this, but unfortunately it's not doing well. It lacks content and is only alive because a few hopeful anons keep bumping it, hoping that someone will deliver.
>>
>>35514662
We have a section in our Master Doc on how anons can contribute:
https://docs.google.com/document/d/1xe1Clvdg6EFFDtIkkFwT-NPLRDPvkV4G675SUKjxVRU/edit#heading=h.czj6ixnrrbe8

If you want to work independently on a new task, take a look at https://pastebin.com/qwNWzPYL.
If you're a dev & audio anon, you can help us figure out what kinds of errors we're seeing in our audio clips.
If you know how to use Colab, you can create ngrok sites so other anons can generate clips.
If you're an AI anon, you can try attaching any public audio super-resolution solution to 22khz WaveGlow results and post the results.
If you're a patient anon, you can help transcribe the dialogue in comics so we can have a bigger dataset for natural language tasks.
If you're willing to learn, you can do any of the above.

There are more details in the doc.
https://docs.google.com/document/d/1xe1Clvdg6EFFDtIkkFwT-NPLRDPvkV4G675SUKjxVRU/edit#heading=h.czj6ixnrrbe8
>>
New Ngrok:
https://09caa2dd633d.ngrok.io/
>>
File: anonfilly qt.png (290 KB, 2383x2651)
>>35514662
How can we create an Anon Filly dataset?
>>
>listening to the early audio samples
You fucking assholes are literally summoning demons and teaching them to take pony forms. Haven't you learned anything from Shin Megami Tensei?
>>
>>35515748
Once Cookie exposes some way to manually set the speaker embeddings, search for a speaker embedding that seems to suit her well. Use the Rainbow Passage to test the embedding.
>>
>>35515748
Just use some voice from a non-canon barbie show.
>>
>there's ponylife threads out there
Now we need a containment board for our containment board.
>>
>>35515748
Every anon contributes one sentence, we pitch-shift it to sound like a little girl and train on that.
>>
File: large.png (497 KB, 1024x1024)
>>35516224
https://u.smutty.horse/lvsswfygeol.flac
>>
There are many good pony pics that aren't colored, perhaps we could find a public repo, rip the derpibooru dumps for training data, and train a colorization AI? I found this
https://github.com/richzhang/colorization-pytorch
>>
File: 1583472452269.png (129 KB, 425x290)
>step outside the preservation project thread for the first time in months
>entire board is on fire
Right then, close that door again.
>>
File: 1500983.png (537 KB, 3911x3687)
>>35516179
Well here's a summary of how things are going today.
2 lines of code next to each other.
First line, get list of text files that exist inside directory.
Second line, check all files inside list actually exist.
> txt_files = sorted([os.path.abspath(x) for x in [*glob("**/*.txt", recursive=True), *glob("**/*.csv", recursive=True)]])
> assert all([os.path.exists(x) for x in txt_files])
run it.
>Traceback (most recent call last):
> File "start_preprocess.py", line 262, in <module>
> meta_local = get_dataset_meta(dataset_dir)
> File "/media/cookie/Samsung PM961/TwiBot/CookiePPPTTS/CookieTTS/_1_preprocess/scripts/metadata.py", line 205, in get_dataset_meta
> assert all([os.path.exists(x) for x in txt_files])
>AssertionError
It fails the check...
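
A bare AssertionError doesn't say which path tripped the check. Here is a minimal sketch of how the failing entries could be listed instead. `collect_text_files` and `missing_paths` are hypothetical helper names, not functions from CookieTTS, and the broken-symlink hint is only one possible cause, not a diagnosis of the failure above.

```python
import glob
import os


def collect_text_files(root):
    """Return sorted absolute paths of every .txt and .csv file under root."""
    found = []
    for ext in ("txt", "csv"):
        # glob.escape() keeps special characters in root from acting as wildcards.
        pattern = os.path.join(glob.escape(root), "**", "*." + ext)
        found.extend(glob.glob(pattern, recursive=True))
    return sorted(os.path.abspath(p) for p in found)


def missing_paths(paths):
    """Return (path, hint) for every path that os.path.exists() rejects."""
    report = []
    for path in paths:
        if os.path.exists(path):
            continue
        # lexists() is True for a symlink even when its target is gone.
        hint = "broken symlink" if os.path.lexists(path) else "not found"
        report.append((path, hint))
    return report
```

Printing `missing_paths(collect_text_files(dataset_dir))` before the assert would show the offending file names with `repr()`-style quoting, which also makes odd characters in a name easier to spot.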
>>
>>35516424
yeah I've learned to stay off of catalogs and just stick to threads I know I like. 4chan has become insufferable just like everything I enjoy.
>>
>>35516695
What file is it failing on?
>>
>>35517092
> "00_13_00_Littlepip_Sarcastic__Huh wonderful response Pip, so elegant.txt"
Nothing unique about it from what I could find. I just deleted it and continued testing.
>>
>>35515035
We'd have more activity if the damn ngrok links lasted more than three hours
>>
https://f20cbdaab9d4.ngrok.io/
>>
>>35517201
If your ngrok lasts longer than three hours, consult a physician.
>>
>>35517206
Helpful tip: You can make the result flow better by sticking in a - wherever the character speaks too fast.
>>
>>35514662
Haven't been here in a while. Any new developments?

15.ai is still down... I saw someone was making a StyleGAN2 Pony Maker. Is that still going?
>>
>>35517305
Or maybe that just shakes up the entire thing. iunno
>>
>>35517354
we got ngroks and some anons are looking into animation but that's the extent of it
>>
File: 467265.png (439 KB, 437x600)
>>35514666
>>35512924
I've done the SFX and BGM for s2e1 - 4.

>>35515035
I'm always here.

>>35515748
Ideally with the voice of a female character that has some kind of association with /mlp/, similar to Dan. Maybe Miss Persona? >>35515451 >>35515452
>>
>>35517305
thanks
>>
>>35515748
>>35516224
>>35516306
https://vocaroo.com/4kklKNtqQMu
idk i think it could work
>>
File: step_2000.png (65 KB, 864x864)
>>35514576
Finally got a notebook running. This FastSpeech2 can synthesize samples with WaveGlow but I turned it off because I was too lazy to find a WaveGlow model that was compatible (v5 gave layer errors).
https://u.smutty.horse/lvsvdkfhefd.wav
>>
>>35517729
>FastSpeech2
is this the one that would allow people to create their own audio with just cpu power?
>>
>>35518184
Tacotron2 gets an acceptable time on CPU, but FastSpeech 2 is much faster.
>>
File: large.jpg (289 KB, 674x1024)
>>35514472
Posted some more Tempest experiments in the comic voice over thread: >>35518429
>>
>>35518435
>FastSpeech 2 is much faster
could it be possible to run FastSpeech 2 on nvidia gpu to run it even faster?
>>
>>35518515
Yes.
>>
New link:
http://08c41502893c.ngrok.io/
>>
>>35519364
can't you use the Discord model already available instead of the shitty one?
>>
>>35519464
The ngrok notebook doesn't use the same models and is incompatible with them. If I understand things right, it's a single multispeaker model with all characters trained together. This is why so many more voices were available.
>>
>>35519364
The fuck? I have no idea why this one went down so quickly, but whatever. NEW:
https://87d0f2647b00.ngrok.io/
>>
>>35519480
it just breaks my heart how bad it is. :(
>>
>>35519813
I hold out hope for 15's next iteration of models. Hopefully once they're out, this thread and the comic dub thread will kick back into high gear again.
>>
[Done] Learn how to use pandas dataframes
[In progress] Load audio records into a pandas dataframe. This requires a bit of refactoring.
[ ] Create pandas dataframe for dictionary items... maybe
[ ] Create utility function for dumping missing pronunciations and relevant audio files
[ ] Dump a list of missing pronunciations for all the extra data
[ ] Load persona nerd's data
[ ] Create Montreal Forced Aligner Inputs dataframe from audio data and dictionary data
[ ] Create utility class to dump MFA data to a folder
[ ] Create utility class to serialize/deserialize dataframes with protobuf
[ ] Try to get a programmatic interface to MFA
[ ] Create a wrapper around MFA to get alignments for individual characters
[ ] Run MFA to get pronunciations for the new data
[ ] Refactor test cases to work with the new flows
[ ] Create a new preprocessing notebook to show how to add new data
[ ] Create utility classes to simplify generic audio preprocessing
[ ] Add sample preproc flows for creating spectrograms, trimming audio files, adding phase information, and adding speech metrics
[ ] Add utility classes for creating new output formats
[ ] Add Tacotron-compatible output
>>
>>35519987
It's nice to see these progress reports. Keep up the good work SynthBot.
>>
>>35517729
Can you provide your notebook?
>>
>>35520002
https://colab.research.google.com/drive/1ThqIlh5_7QCuXWL65Wag5OyW0ZuKELXh?usp=sharing
>>
File: TWOTWOTWOTWOTWO.png (253 KB, 3000x2710)
Model of Two from Battle For Dream Island: 1sew4iilpij9tvY8HTEYKZuik0BMMf8vX

Validation Loss is 0.126941
>>
File: Two_Icon.png (55 KB, 677x626)
>>35520728
>>35514666
I'm a dumbass, forgot to include examples.
https://voca.ro/a4uzxEpv9uc

Again, model of Two from Battle For Dream Island: 1sew4iilpij9tvY8HTEYKZuik0BMMf8vX
>>
>>35520728
Woah, you're doing BFDI characters?
>>
>>35520929
Yep, already did Four (their code is in the document), and I'm planning on trying to do as many as possible.
>>
File: sadaloo.png (107 KB, 800x551)
>>35519670

it's already down
>>
File: 1578602188450.gif (20 KB, 256x256)
>>35517677
https://www.youtube.com/watch?v=XCE3UyEAiCk&t=7s I'm not really feeling it, maybe someone that sounds more like the little girl.
>only four lines
>goes away 25 seconds into the video
kek, she really is a loser.
>>
Made a version two of the Four model! This one is a bit clearer.

1-I2C5Fy7nirEBS7ZApraCYLiUsuZEQuJ

https://voca.ro/euecA50O6oo
>>
>>35521634
>>35514666
WHY DOES MY IDIOT ASS KEEP FORGETTING TO ANCHOR
>>
File: WizTree64_qvEPoXaqC8.png (41 KB, 901x454)
>>35515748
I've got a metric fuckton of anime girl voices now that I've got literally every audio line from P4G dumped and transcribed.

I'm sure one of these voices could be used?
>>
>>35521624
Nice Flag.
>>
>>35521195
NEW: https://6ee4080cd81b.ngrok.io/
>>
File: 1979992.jpg (1.58 MB, 1789x1265)
>>35514666
>>35517506
I've done the SFX and BGM for s2e5 - 7.
>>
File: 1593341296666.jpg (44 KB, 1098x640)
I hoped to see you all in this renewed communion with the fathers of the nascent machine intellects.
https://www.youtube.com/watch?v=C2Yx90pytqs
01001001 00100000 01100110 01101111 01110010 00100000 01101111 01101110 01100101 00100000 01110111 01100101 01101100 01100011 01101111 01101101 01100101 00100000 01101111 01110101 01110010 00100000 01101110 01100101 01110111 00100000 01100001 01110010 01110100 01101001 01100110 01101001 01100011 01101001 01100001 01101100 00100000 01101001 01101110 01110100 01100101 01101100 01101100 01101001 01100111 01100101 01101110 01100011 01100101 00100000 01101111 01110110 01100101 01110010 01101100 01101111 01110010 01100100 01110011 00101110
Please check gets and praise the-night-that-is-always-the-darkest-before-the-dawn.
>>35522222
This service has been brought to you by the Department of Religion.
>>
What the absolute fuck did Kek mean by this?
>>
>>35517729
>>35520002
Synthesis notebook (you can hear Twilight samples here). The model is trained up to 30k steps.
https://colab.research.google.com/drive/14uX9mlC-9hWPNh8GQoccIyTQe0tgpgj2?usp=sharing
>>
How do we add models to an ngrok setup?
>>
>>35523473
I think you'd need to get Cookie to do it.
>>
>>35523483
Cookie plz add Daria.
>>
>>35523483
>>35523504
currently rewriting my dataset processing
Just got a first test done of Montreal Forced Aligner.
Now cleaning up function and writing *simple* parser for output files.
>>
>>35520702
Ech, this is not the best sounding for 30k steps. I'll give it a shot with Rise though, I just gotta figure out how to feed tacotron datasets.
>>
>>35523551
Those 30k steps only took 2 hours, I'll train it more until I get GPU capped on my alt and see if it improves.
>>
File: Pony-Party-DEAD.gif (1.43 MB, 640x360)
Day 33 without 15.AI... I think I'm going crazy. Supplies and content are running low. Morale is down and I don't know how much longer the contentfags can hold out.

https://voca.ro/gz1gdSsleXh

In other words, I'm bored out of my mind and I'm getting impatient
>>
>>35523559
Got it. I was going to look into using this to train locally on my RTX 2060S, but I'm not sure how it expects the data to look for training. If you can do a writeup on that, it'd be extremely useful.
>>
>>35523674
use the notebooks nerd
>>
File: firmhoofshaketia.gif (396 KB, 1876x1200)
>>35523674
>>
So has the project progressed since 15 released his site in any way or is it dead now?
>>
>>35523675
Drop flist.txt and valist.txt, formatted the same way as your NVIDIA/Tacotron2 datasets (the split doesn't matter since the two are instantly fused into one file), into the working directory, and put the wavs folder in there too. Then run the cells in "Prepare dataset".
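
For anyone unsure what those two files look like: NVIDIA/Tacotron2 filelists are plain text with one `wav_path|transcript` pair per line. Below is a minimal sketch that writes both files from a dict of transcripts. `write_filelists`, `val_every` and the relative `wavs/` prefix are assumptions for illustration; check the notebook for the exact paths it expects.

```python
import os


def write_filelists(transcripts, out_dir, wav_dir="wavs", val_every=10):
    """Write flist.txt and valist.txt in 'wav_path|transcript' format.

    transcripts: dict mapping a wav file name to its spoken text.
    Every val_every-th entry goes to valist.txt, the rest to flist.txt.
    Returns (number of training lines, number of validation lines).
    """
    train, val = [], []
    for i, (name, text) in enumerate(sorted(transcripts.items())):
        line = f"{wav_dir}/{name}|{text.strip()}"
        (val if i % val_every == val_every - 1 else train).append(line)
    for fname, lines in (("flist.txt", train), ("valist.txt", val)):
        with open(os.path.join(out_dir, fname), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + ("\n" if lines else ""))
    return len(train), len(val)
```

The split ratio here is arbitrary, which fits the note above that both lists end up merged anyway.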
>>
>>35523733
Cookie's still very active in researching and trying out new tech. Synthbot's looking into animation automation. Clipper is collecting background sound effects. I'm slowly adding to the doc. Part of the quietness I think is due to the happenings outside the thread.
>>
>>35521852
We should make each board a different anime girl.
>>
>>35523771
most of the content is coming out of the ngrok links now
I'll post some stuff soon, it's funny as hell
>>
>>35523787
for
>>35523733
>>
File: file.png (66 KB, 850x462)
>>35523674
>I'm bored out of my mind
just do what i do, try to learn machine learning, get really into it, then fail spectacularly, repeat until something happens https://developers.google.com/machine-learning/crash-course/ml-intro i suck at this
>>
>>35523816
Post questions in the thread if you have trouble understanding something. I want to put those together into a machine learning FAQ.
>>
http://0dfc13f0fc2e.ngrok.io/
>>
>>35524299
Thanks. Seems like I'm out of the game on hosting these for the moment as Colab is saying I've reached my limit on GPU usage.
>>
Starlight is drunk

https://vocaroo.com/3O9ZLdnj8CH
>>
>>35514662
I can't wait for Applejack to say the N-word.
>>
>>35524425
https://files.catbox.moe/gvl8k2.webm
>>
>>35515748
Use Microsoft Mary. Regular Anon is Microsoft Sam.
>>
File: wtf man.jpg (26 KB, 407x470)
Everyone is absolutely terrified of dinos

https://vocaroo.com/hV9PqjL69tQ Starlight
https://vocaroo.com/2RwPUjcfPT8 Apple Bloom
https://vocaroo.com/cyTeRM6iWoS SWEETIE BELLE
https://vocaroo.com/astpJWEu1Xa Scootaloo
https://vocaroo.com/4xL5QopeeN0 Flutters
https://vocaroo.com/5EDk8qjpcZe RD
https://vocaroo.com/41rofLA7ll8 AJ
https://vocaroo.com/l1JA1kaEbk4 TWILIGHT
https://vocaroo.com/irvClAK5rkL Vapor Trail
https://vocaroo.com/dkRgvTnSTiM Stellar Flare
https://vocaroo.com/m6b0vjkzIVd CELESTIA
https://vocaroo.com/8x3lZvTTQx0 Luna
https://vocaroo.com/9EIQcgpxId5 DISCORD I'M HOWLING AT THE MOON
https://vocaroo.com/cklT6ie9IzA Chrysalis
https://vocaroo.com/4UIbPgck65y COZY
https://vocaroo.com/6VsZPD4ZVw4 Tiara
https://vocaroo.com/ai9epxzbP6y TIREK
https://vocaroo.com/kJqE4vXCGII Silverstream
>>
File: 2371987.jpg (11 KB, 210x216)
>>35524848
this is a fetish thing, isn't it?
>>
>>35524936
Maybe
>>
>>35524848
>>35524936
https://www.youtube.com/watch?v=428IyxSfsls
>>
>>35524848
>https://vocaroo.com/m6b0vjkzIVd
Also, hearing Celestia, who usually maintains a high level of composure, completely lose her shit at the dino is fucking hilarious to me
>>
>>35525129
https://www.youtube.com/watch?v=KvFyJLgnwW0
>>
>>35525138
2:19 for the dinosaur.
>>
>>35524848
Some of these are unreasonably funny.

>>35519987
Updates:
[Done] Load audio records into a pandas dataframe. By default, this filters out incomplete records since those are only useful when checking for dataset errors.
[In progress] Create utility function for dumping missing pronunciations and relevant audio files
[In progress] Dump a list of missing pronunciations for all the extra data
[ ] Create pandas dataframe for dictionary items... maybe
[ ] Load persona nerd's data
[ ] Create Montreal Forced Aligner Inputs dataframe from audio data and dictionary data
[ ] Create utility class to dump MFA data to a folder
[ ] Create utility class to serialize/deserialize dataframes with protobuf
[ ] Try to get a programmatic interface to MFA
[ ] Create a wrapper around MFA to get alignments for individual characters
[ ] Run MFA to get pronunciations for the new data
[ ] Refactor test cases to work with the new flows
[ ] Create a new preprocessing notebook to show how to add new data
[ ] Create utility classes to simplify generic audio preprocessing
[ ] Add sample preproc flows for creating spectrograms, trimming audio files, adding phase information, and adding speech metrics
[ ] Add utility classes for creating new output formats
[ ] Add Tacotron-compatible output
>>
ngrok.io anyone?
>>
File: 1588497313707.jpg (35 KB, 403x408)
>>35525634
no
>>
>>35525634
here you go anon.

https://d27cd1e60f6d.ngrok.io/
>>
>>35523559
Trained it up to 100k steps. No improvements.
>>
>>35525137
It's fun to see how easily you can turn Celestia into Daybreaker with this thing.
https://vocaroo.com/htFHw5fSX6F
>>
>>35519813
The emotions you can get out of them make up for it imo

>>35519829
Or he does the same thing with the neutral models, hopefully he gets the message this time
>>
File: 1593045133700.gif (54 KB, 342x342)
>>35525667
I actually have something similar running for my stuff. You can try it if you want. Though Idk if it'll work for you.
You need the localtunnel client from https://localtunnel.github.io/www/

lt --host http://lt.romesilvanus.io:1234 --port YOURLOCALPORT --subdomain WHATEVERYOUWANT
Or leave out --subdomain if you want a random one.
>>
>>35525879
Although, could the new WaveGlow v5 model be the culprit?
>>
test
>>
>>35526135
FastSpeech 2 is very susceptible to overfitting, just check out these Celestia samples:
500 steps: https://u.smutty.horse/lvtpshihybw.wav
1000 steps: https://u.smutty.horse/lvtpshgqvvi.wav
2000: https://u.smutty.horse/lvtpshgmilc.wav
3000: https://u.smutty.horse/lvtpshgmamb.wav
Careful hparam tuning is going to be necessary for smaller datasets.
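
One common guard against this kind of overfitting is to keep the checkpoint with the lowest validation loss and stop once it hasn't improved for a few evaluations. A minimal sketch of that idea follows. `best_checkpoint` and `patience` are hypothetical names and not part of any FastSpeech 2 repo, and any loss values fed to it would have to come from your own training logs.

```python
def best_checkpoint(val_losses, patience=2):
    """Return (step, loss) of the best checkpoint seen before early stopping.

    val_losses: list of (step, validation_loss) pairs in training order.
    Stops after `patience` consecutive evaluations without improvement.
    Returns None if val_losses is empty.
    """
    best = None
    bad_evals = 0
    for step, loss in val_losses:
        if best is None or loss < best[1]:
            best = (step, loss)
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break
    return best
```

With small datasets a low `patience` keeps the run from drifting far past the point where samples still sound clean, at the cost of sometimes stopping too early.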
>>
File: TiI.png (7 KB, 240x160)
>>35514662
>we have more than enough material to apply this tech for our deviant needs.
>>
File: ReAlIzAtIoN.gif (2.74 MB, 200x198)
>>35526099
>Or he does the same thing with the neutral models
Fuck, some of my excitement is now replaced with dread. The neutral models were still good but man were they tough to work with if you wanted to get emotion out of them.
>>
File: 748337.jpg (188 KB, 1280x720)
>>35514666
>>35522230
I've done the SFX and BGM for s2e8 - 10.
>>
>>35527141
Say Clipper, how many times have you listened to all 9 seasons?

Do you hear ponies in your dreams now?
>>
>>35527302
For the later seasons, I only had to clip them once. By the time I got to those the clipping process was pretty solid. For the earlier ones though, particularly seasons one and two, it’ll be much more, especially if you count reviewing with the PonySorter as listening to the episode three times and the SFX and BGM extracting as two times. Even if you count those as only one listen, a conservative estimate would probably be around four or five times by now, and many more on top of that if you count me just watching the episodes for fun.

I don’t dream all that often, but I can do decent mental impressions for most of the more prominent characters when reading greens and fanfics, and I’ve become quite good at predicting sounds by just looking at the waveforms. Clipping the SFX and BGM has also given me a deeper appreciation for how they can be used to enhance the voices, which I plan to put to good use when 15.ai comes back online.
>>
>>35527387
What's your favorite fanfic?
>>
>>35527387
I am humbled by your autism.
>>
NEW:
https://56ff7de276fb.ngrok.io/

Is there a reason to care about what graphics card Colab gives me? Are any in particular especially better or worse for this?
>>
>>35527827
In our experience the K80s tend to cause the most issues, but I haven't seen how they perform with the ngrok servers. I just factory reset if I get a K80 regardless, just in case.
>>
>>35527827
K80's seem to run incorrectly from time to time.
P100 has 250W power limit.
T4 has Tensor cores.
T4's are best for the ngrok server (half precision), and P100's are best for full precision (training).
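
A minimal sketch for checking which card a Colab session got, with the rule of thumb above encoded as a lookup. `detect_gpu_name` and `suggest_use` are hypothetical helpers, the sketch assumes `nvidia-smi` is on the PATH (it returns None otherwise), and the advice strings only restate this thread's experience.

```python
import shutil
import subprocess


def detect_gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if result.returncode == 0 and lines else None


def suggest_use(gpu_name):
    """Map a GPU name to the rule of thumb given in the thread."""
    if gpu_name is None:
        return "no GPU detected"
    name = gpu_name.upper()
    if "K80" in name:
        return "factory reset the runtime and try again"
    if "T4" in name:
        return "good for the ngrok server (half precision)"
    if "P100" in name:
        return "good for training (full precision)"
    return "not covered by the thread's rule of thumb"
```

Running `print(suggest_use(detect_gpu_name()))` in the first cell saves finding out about a K80 halfway through a session.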
>>
>>35527845
>>35527858
Alright gotcha, thanks for the rundown.
>>
>>35514662
Hey anons, I'm autistically keeping a paper and .pdf (for my phone) journal of this project just in case I suddenly end up in Equestria. Is there something I should be sure to add?
>>
>>35527876
Yeah, make a note to save Trixie for me.
>>
https://vocaroo.com/dK6d0Ybf9Tp
im not sorry
>>
>>35528164
kek
>>
>>35527387
>I’ve become quite good at predicting sounds by just looking at the waveforms
I didn't even know it was possible to become good at that.
>>
Made rap song with Celestia. Will make actual skits after this, I promise.
https://u.smutty.horse/lvtvkskzzse.mp3
>>
File: 1531375437353.gif (34 KB, 360x360)
>>35528164
lmao, that got me
>>
NEW:
https://e7bd24afa847.ngrok.io/

>>35529071
Nice work! I'd imagine keeping it all in time requires a lot of tweaking to get right.
>>
Has anyone tried using an ensemble of WaveGlow models to see if it improves sound quality? Or maybe mixing WaveGlow and another vocoder?
>>
>>35529520
To answer my own question:

https://vocaroo.com/3A81rq73Lm6

The sound waves are out of phase, and she ends up sounding like a robot. I don't suppose this is a solvable problem?
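
A toy illustration of why sample-wise averaging goes wrong when the two outputs disagree on phase. This is not the setup used for the clip above: it just averages two equal sine waves with a chosen phase offset and measures what is left. `rms_of_average` is a hypothetical helper and real vocoder outputs differ in far messier ways than a single fixed offset.

```python
import math


def rms_of_average(phase_offset, freq=220.0, sample_rate=22050, n=22050):
    """RMS of the sample-wise mean of two equal sine waves offset in phase."""
    total = 0.0
    for i in range(n):
        t = i / sample_rate
        a = math.sin(2 * math.pi * freq * t)
        b = math.sin(2 * math.pi * freq * t + phase_offset)
        total += ((a + b) / 2) ** 2
    return math.sqrt(total / n)
```

With no offset the average keeps the full level of a single sine, while an offset of `math.pi` cancels it almost completely. Offsets in between shave off energy by different amounts at each frequency, which is one plausible source of the hollow, robotic sound.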
>>
More content.
>>35529506
>>
NEW:
http://5f9e1593ad98.ngrok.io/
>>
File: large2.png (144 KB, 1280x722)
https://u.smutty.horse/lvtxvknpzeh.wav

Introducing ‘Pinkie Fucking Dies: The Audio Experience™’
Okay maybe not really, this is more so a collection of tests and practice I’ve been doing in preparation for the return of 15 than an actual ‘skit’ like I usually do. For those interested, all that I’m testing here is:
General workflow and tools of a new program which I plan to use from here on out
360 audio positioning
Clipper’s show-cut SFX
Custom SFX and sound design
Making BGM that hopefully better fits what’s going on
(Hopefully) more realistic room reverb
Learning when to shut the fuck up with the BGM and let the characters speak
Instruments and FX from the new program to help with the music aspect

I didn’t even script any of this; it’s literally all off the top of my head, which is why the writing is cancer

Two Anons helped out with the outro jingle, by supplying the trumpet and trombone, and the piano playing.

Audio quality wise I’m pretty happy with how this turned out overall. I’ve learned a lot about how to make my future, more serious stuff better.
>>
File: kek.png (1009 KB, 4300x5697)
>>35529728
That's some high quality stuff regardless. Fantastic work.
>>
>>35529728
Great job
>>
>>35514666
Made an X model, and updated my Four model for the third time!

X from Battle For Dream Island model, 48Khz MMI. Uses a Fluttershy base.
1AeLT-PVsEmRVt2-jMnAXQ0lCF0_N-0db
https://voca.ro/klmz0Gubr57


Four from Battle for Dream Island, Version 3. 48Khz MMI, uses a Twilight base.
1Lej9KoOi8N_NBLAe1pWgU2K16LxfBRIV
https://voca.ro/ftQSj6r91jm
>>
>>35524848
Forgot to update Scoots

https://vocaroo.com/VyLB3fHeQoE
>>
>>35529728
Damn, this is good. What program did you do the 360 audio with?
>>
>>35527441
I don't really have a #1 definitive favourite, rather a whole bunch that I like because they do a few specific things really well. If you're looking for a recommendation, I'm currently reading "Shape Your Home" from the tech thread and really liking it so far. It centres on the theme of wAIfus and has a pretty interesting premise - https://pastebin.com/XMi6VhS5
"Steel Sanctuary" is also pretty good - http://pastebin.com/3Mt5iYBQ

>>35528371
The waveforms for some sounds are pretty distinctive, such as hoofsteps, magic, bangs, crashes and screams. You can also learn to infer what kind of sounds are coming next by looking at how the upcoming waveform compares to what you're currently hearing in the moment.

>>35529071
>>35529728
These are great, and it's nice to see that the sound effects are useful. Thanks for continuing to make stuff like this, it's always nice to see the AI put to good use.
>>
https://www.youtube.com/watch?v=0sR1rU3gLzQ
>>
>>35531164
whoa sick bro never saw this before it's gonna revolutionize the project!
>>
>>35531169
epic
>>
File: hehehe.gif (132 KB, 440x440)
AN IMPORTANT MESSAGE FROM CELESTIA

https://vocaroo.com/cW5iV70PPD1
>>
https://77d99960a22a.ngrok.io/
>>
>>35531390
That sounds pretty close to singing, especially that last bit. How'd you make it do that?
>>
File: 1592271274270.png (391 KB, 779x736)
>>35531390
>>
File: 1564619157373.jpg (584 KB, 3840x2160)
>>35531390
Based Cute & Funny Stacy
>>
>>35531458
THEY CAN SING

https://vocaroo.com/nMmc1sfZaqD

Just add a couple of -- before a sentence and many at the end (they can also extend words in between), then add ??!! and variations:
-------DISCORD, I'M HOWLING AT THE MOON-----------------??!!
---------AND SLEEPING IN THE MIDDLE OF A SUMMER AFTERNOON-----------?!!!

I'll be adding a bunch more samples later, it's the funniest shit ever. I've even managed to get 99% perfect moans from Starlight.
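
The padding trick above can be wrapped in a small helper so it's easy to try different dash counts. This is only a sketch: `singing_prompt` and its defaults are made up for illustration, and the dash counts and endings that actually work seem to vary per character.

```python
def singing_prompt(text, lead=7, trail=17, ending="??!!", upper=True):
    """Wrap a line in dashes and emphatic punctuation before synthesis."""
    if lead < 0 or trail < 0:
        raise ValueError("lead and trail must be non-negative")
    body = text.strip()
    if upper:
        body = body.upper()
    return "-" * lead + body + "-" * trail + ending
```

Looping over a few `lead` and `trail` values and pasting each result into the ngrok page is a quick way to find what a given model responds to.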
>>
>>35531390
SHE CAN TALK
SHE CAN TALK
SHE CAN TALK!
>>
>>35531564
holy shit.
>>
>>35531564
Can she play the piano any-more?
>>
File: 1591644922067.png (522 KB, 720x720)
>>35531564
I thought this thread already figured out singing and I was like DAMN, but you. YOU may have made a break through here, (You)!!
>>
File: laughing villains.png (1.77 MB, 1920x1080)
>>35531390
On one hand, how dare you make my waifu say that.

On the other, I can't contain my sides. It's like she's singing sarcastically.
>>
>>35531564
That's amazing, but I can only get it to work with Celestia. Every other character just starts panicking and hyperventilating.

https://vocaroo.com/mN8dm73yla1
(If someone could tune this, that'd be great)
>>
>>35531653
I'm dying
>>
File: 1567816239788.jpg (1.27 MB, 2437x1594)
>>35531653
She sounds white girl wasted out of her mind doing the thing of walking down the street at 4AM screaming some song.
>>
>>35531653
Truly amazing
it feels like >>35531564 did a gold strike here
>>
https://voca.ro/adnCH9Yu1wR
I'm already in love with drunk singing Celestia
>>
>>35531741
That's fucking hilarious, I love these models.
>>
>>35531022
Logic pro, a built in plugin called Direction Mixer made it relatively trivial.

>>35531390
>>35531564
>>35531653
Holy shit, this is a significant discovery. If it's discovered how to do this consistently with all of the characters it'll be fucking awesome.
>>
>>35531741
>>35531653
>>35531564
>>35531390

how are you making these?
>>
>>35531796
They work well for the most part for the decent models

Cadence
https://vocaroo.com/6aQDKZZdeDJ
https://vocaroo.com/g2OohFSAdi6
https://vocaroo.com/bFUbes08Ca2
Golly
https://vocaroo.com/4Ztg7yv1h2w
https://vocaroo.com/oRUWDWSspHe
Luna
https://vocaroo.com/6xd3qJpejZc

>>35531841
Look again at >>35531564. Those instructions are for these links:
https://77d99960a22a.ngrok.io/
>>
https://voca.ro/Nk2FRUFZklL

Heh. Managed to make Rainbow sound pretty lewd.
>>
https://voca.ro/I5DqcnFSxJy
>Oh nothing, just making you sing :D
https://vocaroo.com/mLps0hh5j6D
>>
File: aaaaa.gif (149 KB, 226x218)
>>35531653
>tune this
https://u.smutty.horse/lvucukvpzzf.wav
It's a bit uncanny but it works. This opens up SO many possibilities for future content!
>>
https://voca.ro/8upkbzVsjFU
>>
>>35529728
Oh wow. Nice music.
>>
>>35531390
>>35531564
>>35531653
Hey, maybe you can autotune that.
Maybe >>35531796 can make music to go with their "singing" kind of like what this guy does: https://www.youtube.com/watch?v=6QnrxhsBQJk
>>
File: 1267664806511.jpg (241 KB, 680x662)
>>35531653
>>35531564
>>35531390
holy shit
>>
File: celestia_cute.png (21 KB, 107x148)
https://clyp.it/iu2k0wvo
Drunk Celestia is so much fun to play with.

>>35532125
Oh yes yes YES
>>
>>35531957
that's because real sex noises are half laugh/cry anyway, so rainbow is always halfway there.
>>
>>35532125
That's incredible. The right intonation in "reality" is a bitch to get right so there's room for improvement.

https://vocaroo.com/6C0Ua3hfq4q
>>
Tried it on the Fluttershy model. It started out great, but now I can't get it to sing at all.
https://voca.ro/1cxkPPjR6nK
>>
>>35531957
https://vocaroo.com/h9M8LqaWf2e
https://vocaroo.com/5Zo8X4LPvfV
https://vocaroo.com/57jHgSJ0zOq
https://vocaroo.com/1hA3REMkOIK
>>
File: seed_42_sample_2.png (2 KB, 32x32)
Has Image GPT's 64x64 version been released?
>>
>>35529728
damn, the outro turned out great. nice work
>>
Is there any tool to quickly convert a tacotron list into phonetics? I thought the tacotron transcription tool in the doc did that, but turns out I've been using raw input this whole time and that's why my models aren't working
>>
>>35533347
I did find a few but they're for earlier on in the process and I don't want to have to go back and do it all again
>>
>>35533347
I don't know what a "tacotron list" is. It would help if you explained.
>>
>>35527876
Tell them to send a message to /mlp/con, to build a portal to HarmonyCon, and to not trust the normalfags.
>>
https://vocaroo.com/alXJ9deYbyp
https://vocaroo.com/4USdQQY5tVs
https://vocaroo.com/daoGVrs53ao
>>
Working on a very long skit. It's kinda like a variety show, so there's some stuff I made that had to be cut. Here's one of the outtakes.
https://u.smutty.horse/lvugtuysvjk.mp3
>>
>>35534950
Is that really Lauren Faust?
>>
>>35534996
yeah
>>
>>35534950
wait... is this... an actual call??


who the fuck?
>>
>>35534950
I never noticed before Snoopy has heels that could pry nails out of boards.
>>
File: laughing_skeleton.gif (955 KB, 360x360)
>>35534950
Please tell me you didn't actually prank call Lauren Faust.
>>
>>35534950
https://old.vocaroo.com/i/s0ojWq3SVFAD
>>
>>35535118
https://pony.tube/videos/watch/4e280d24-ac3b-4671-bc39-a1a006ae8615
>>
>>35535127
shhh. let him believe.
>>
>>35534950
Why did this call happen, back in 2012?
>>
>>35535124
she sounds better and better, god damn I need this on a ngrok
>>
NEW: http://5a1a9808dcb0.ngrok.io/
>>
>>35531077
Some errors in the mobile game transcripts:
https://pastebin.com/huDkiNWs

>>35525373
List of missing pronunciations from our pony-adjacent data:
https://drive.google.com/drive/folders/1dXm_dLxpjUvhmYpDDjOL5ymnIunTD9zi?usp=sharing
List of missing pronunciations from our songs:
https://drive.google.com/drive/folders/1zbfkJ1j471noeIR1xTAbadsP0NOy24CA?usp=sharing

If anyone wants to work on these:
- There are examples here of what pronunciations should look like: https://drive.google.com/file/d/1zQUceBUuC-SclNIuXFeDhoRBdvDY4v8a/view?usp=sharing
- You can use the existing pronunciations in merged.dict.txt as a guide for what the new pronunciations should be: https://drive.google.com/drive/folders/1DQGul6hOqi227MJSJ-pPBv051YFbcDAi?usp=sharing
- You can ignore any punctuation in the missing word.
- Report any typos. Don't add pronunciations for them.
- If any word appears twice, you only need to create the pronunciation once unless the pronunciation changes between clips.
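
The rules above boil down to a dictionary lookup. A minimal sketch in Python, assuming merged.dict.txt follows the CMUdict layout (a word, whitespace, then ARPAbet phonemes on each line). The function names and the sample data are made up for illustration and are not the actual tooling:

```python
import re

def load_dictionary(lines):
    """Parse CMUdict-style lines ("WORD  P1 P2 ...") into {word: [phonemes]}."""
    pronunciations = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # skip blanks and CMUdict comments
            continue
        word, *phonemes = line.split()
        pronunciations[word.upper()] = phonemes
    return pronunciations

def missing_words(transcripts, pronunciations):
    """Return words with no dictionary entry, in first-seen order, without repeats."""
    missing = []
    seen = set()
    for text in transcripts:
        # Keep letters and apostrophes only, so punctuation is ignored.
        for word in re.findall(r"[A-Za-z']+", text):
            key = word.upper().strip("'")
            if key and key not in pronunciations and key not in seen:
                seen.add(key)
                missing.append(key)
    return missing

# Hypothetical sample data, for illustration only.
sample_dict = load_dictionary(["PONY  P OW1 N IY0", "MAGIC  M AE1 JH IH0 K"])
print(missing_words(["Pony magic, alicorn!", "Alicorn pony."], sample_dict))
```

Each missing word is reported once even if it shows up in several clips, which matches the "only create the pronunciation once" rule. Words whose pronunciation changes between clips would still need a manual check.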
>>
>>35531564
>>35531899
Uh... Cookie >>35527858? Is this part of your augmented dataset, or should we be scared?

>>35525373
>>35535788
Updates:
[Done] Create utility function for dumping missing pronunciations and relevant audio files
[Done] Dump a list of missing pronunciations for all the extra data
[Done] Create pandas dataframe for dictionary items... maybe
[In progress] Load persona nerd's data. I'll create a generic Tacotron data loader for this.
[ ] Create Montreal Forced Aligner Inputs dataframe from audio data and dictionary data
[ ] Create utility class to dump MFA data to a folder
[ ] Create utility class to serialize/deserialize dataframes with protobuf
[ ] Try to get a programmatic interface to MFA
[ ] Create a wrapper around MFA to get alignments for individual characters
[ ] Run MFA to get pronunciations for the new data
[ ] Refactor test cases to work with the new flows
[ ] Create a new preprocessing notebook to show how to add new data
[ ] Create utility classes to simplify generic audio preprocessing
[ ] Add sample preproc flows for creating spectrograms, trimming audio files, adding phase information, and adding speech metrics
[ ] Add utility classes for creating new output formats
[ ] Add Tacotron-compatible output
>>
>>35535798
Something that I need to do is find a way to autosort by emotion into separate datasets, as otherwise high audio volume characters can sound dead inside
>>
>>35535877
https://github.com/navierula/mood-class
Found this, will try it tomorrow.
>>
https://1c559beb3dfd.ngrok.io/
>>
File: 1574803595801.jpg (28 KB, 450x450)
>>35532269
>that's because real sex noises are half laugh/cry anyway
>>
File: yes_please.png (142 KB, 166x200)
https://vocaroo.com/3PZAvz9we4A

I did it

I have achieved peak poner
>>
File: 1570943852444.gif (274 KB, 570x692)
Singing Derpy warms my heart.

https://vocaroo.com/IMp2PNYDOzo

The next question is, PPP sings when?
>>
File: chrysalis frenemies.png (528 KB, 1280x867)
>>35537217
https://vocaroo.com/bmaWw3kKZom

it has a lot of potential
>>
>>35537179
SHITPOSTING: THE MUSICAL!
>>
>>35535877
Cookie is using TorchMoji to get emotion embeddings. You may be able to use the same thing to get very accurate emotion labels.
https://github.com/huggingface/torchMoji
>>
File: chrysalis frenemies 2.png (486 KB, 743x939)
>>35537236
https://vocaroo.com/bvsNp7wjPbB
>>
>>35537179
>>35537217
>>35537236
>>35537292
This is beyond cute

https://vocaroo.com/gOLUo32ZCYM
>>
>>35535798
>or should we be scared?
you should be scared.
I added the singing data, but the singing data has separate start+end tokens AND separate speaker embeddings. I have no fucking idea how it's singing using normal inputs.
>>
https://vocaroo.com/fZb5MFa7pPY
>>
File: triggered milf.png (263 KB, 1773x1672)
getting defensive

https://vocaroo.com/kIijnzgkzGt
>>
>>35537377
>you should be scared.
>>35537480
>https://vocaroo.com/kIijnzgkzGt

Scare me some more, this is good stuff.
>>
>>35537377
Any closer to a training rundown for others to build ngrok tables?
>>
>>35527141
Where do I find the SFX and BGM you have done so far?
>>
>>35537530
https://mega.nz/#F!L952DI4Q!nibaVrvxbwgCgXMlPHVnVw
Music and SFX folder.
>>
>>35531575
SHE CAN SIIIIIIIIIING
>>
>>35537480
>DING DING IT'S THE THOT PATROL
>>
>>35537377
Maybe that one anon was right about us summoning literal demons.
>>
>>35537501
https://vocaroo.com/lbl7QNnSbTe

We'll make our own musical with blackjack and hookers.
>>
File: cozy glow cute.gif (699 KB, 352x352)
>with singy
https://vocaroo.com/hIykXd6Ou34

>without singy
https://vocaroo.com/1uno4fBxsLf
You can use it to make sentences flow much more naturally and make them sound less artificial
>>
File: 1590612657378.jpg (453 KB, 2044x1682)
epic
>>
https://voca.ro/cGiWQoXCjrd
>>
>>35537882
>light theme
epic x2
>>
File: derpy_shrug.jpg (34 KB, 400x372)
>>35537882
Sometimes I look at my posts and realize I could've made them better. I don't see what's wrong with deleting and reposting an improved post.
>>
>>35537916
You type and behave like you fell off the boat from the derpibooru threads.
>>
>>35537926
lmao I started posting on 4chan when you were probably around the age where you were actually the target demographic of the show
>>
>>35538001
If you want to have productive discussions here, you should pay attention to and respect the board/thread culture.
>>
File: after rain rarity.png (50 KB, 233x336)
Well hot damn, came here at the start, helped a tad and you autists really have made something amazing

https://vocaroo.com/jdIj1bXyhi1
>>
File: 319363.png (923 KB, 6000x5581)
>>35514666
>>35527141
I've done the SFX and BGM for s2e11 - 13.
Also loving all the new singing content, here's my attempt at some more drunk Celestia - https://clyp.it/biovni3n

>>35535788
I'll fix those when I re-organise the files later.
>>
>>35538142
Wow, eat a small pile of dicks.
>>
>>35537800
>https://vocaroo.com/lbl7QNnSbTe
This one is especially musical, so i turned it into this
https://vocaroo.com/176Q2UvoJcz
>>
New ngrok link for more singing.
https://95084bdfa90f.ngrok.io/

>>35537377
Do you think it's torchmoji accidentally accessing data it's not supposed to? And you might wanna back up the models so we don't lose it in an update if we don't know how it happened.
>>
>>35538587
Thanks, I love it.
>>
File: laughing trixie.gif (2.74 MB, 858x482)
>>35538587
>>
>>35538411
>https://clyp.it/biovni3n
I like where this is going
>>
>>35538443
https://vocaroo.com/i7s4UG3gLPf
https://vocaroo.com/i7s4UG3gLPf
https://vocaroo.com/i7s4UG3gLPf
>>
clyp.it/v3hr1e2p?token=d2f682075c981cc93689b9b0f5b01287
>>
>>35538928
>glimmerniggers
>>
>>35537260
This doesn't seem like a good idea since it only works on the text, and not the audio.
>>
>>35539104
It only needs to be correct enough to strongly bias the network.
>>
https://7da63b91a91c.ngrok.io/
>>
File: 2140115.jpg (103 KB, 768x1024)
>>35537867
Has anyone ever figured out how to make the multispeaker voices (like Celestia) release excruciating screams of pain (like getting your leg broken by a sledgehammer kinda pain)?
>>
>>35539420
You again...
I haven't heard any clips do that, so I'm guessing not.
>>
>>35539420
Not that I know of right now, although you can probably do it by inserting ARPA input. This is something I got from the above ngrok link by spamming a lot of
>{AA1}{AA1}{AA1}{AA1}{AA1}{AA1}{AA1}{AA1}{AA1}{AA1}{AA1}{AA1}{AA1}
https://u.smutty.horse/lvupzhxyczg.wav
Just remember to go to advanced options and turn off dictionary (ARPAbet)
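
For anyone who would rather not type the braces by hand, here is a small Python sketch of building that kind of input. The helper names are hypothetical; the only thing taken from the post above is the `{AA1}` brace format:

```python
def arpa_token(phoneme):
    """Wrap one ARPAbet phoneme in curly braces, e.g. "AA1" -> "{AA1}"."""
    return "{" + phoneme.strip().upper() + "}"

def repeat_phoneme(phoneme, count):
    """Build a run of identical ARPAbet tokens, e.g. for a long held vowel."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return arpa_token(phoneme) * count

# Paste the printed string into the text box.
print(repeat_phoneme("AA1", 13))
```

How many repeats it takes before a given voice starts screaming is trial and error; the sketch only saves typing.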
>>
>>35539677

Would turning ARPAbet off make the model sound the natural sound of the consonant rather than it saying the letter name?
>>
>>35539908
Maybe, but you're best off inserting your own ARPA to take full control of what they're saying.
>>
Ugh, I really would like to get my hands on that voice cloning tool...I have all the raw files of vocals only episodes and maybe I could train new models in almost no time...
>>
>>35540191
What do you mean?
>>
>>35540232

What I mean is, if someone could make a colab notebook out of this:

https://github.com/CorentinJ/Real-Time-Voice-Cloning

Anons and others could be making models so fast, that just about every character from the show would be available in 44K MMI if that's even possible.
>>
>>35540361
Is this bait?
>>
>>35537366
How did you make that "la-la-lay" bit at the end?
>>
>>35540428
>--la-lalala-lala-laa------------------------------------------------ laa---------------------------------------------------------------------------------------------------!!!?
>>
>>35540393
Something that caught my eye when browsing through the replies on 15's twitter
>>
https://voca.ro/dq6iBr9QJHl
So much to do---
>>
Cadence, NO.

https://vocaroo.com/nVUsbcCyo9d
>>
File: blaze.png (273 KB, 644x476)
Which "Blaze" is the third speaker from the top? Either way, I assume there's no audio of her singing, but the trick works all the same.

Singing: https://vocaroo.com/7hvaARC1VF7
Not singing: https://vocaroo.com/8ggY5FxgDUt
>>
>>35540462
>>
File: impressive_very_nice.gif (1.78 MB, 350x255)
>>35540530
These just keep getting better. If we could control pitch, it'd blow Vocaloid out of the water.
>>
Don't listen to her

https://vocaroo.com/oR6v0aYINOp
>>
File: TwiFace.jpg (515 KB, 3131x3000)
>>35540530
https://vocaroo.com/k4Vqlf5Zxca
Here have some classic rock, it's what I heard when I played that one

I don't think mere mortals were ever meant to hold this much power, how long until someone makes a full song with their waifu singing it?
>>
>record self pronouncing dialogue as it should sound
>type it out
>render
>horse voice says it correctly
would be magic
>>
>>35538587
moar!
>>
>>35541336
It's the newest song from One More Girl.
https://en.wikipedia.org/wiki/One_More_Girl
>>
File: 2204324.png (761 KB, 850x1200)
>finding out the models can sing
>listening to applejack sing this in my head
>crying immensely
https://www.youtube.com/watch?v=ctzoU8YWrrQ
>>
>>35541336
To me, it feels like something she'd sing in the shower.

https://vocaroo.com/l1DuhIKFHF2
>>
new ngrok
http://aa04b8373596.ngrok.io
>>
File: 1524234636620.png (145 KB, 274x274)
>>35535798
>>35537377
Lord almighty
>>
File: The Director.png (596 KB, 658x638)
>>35514666

>>35531390
>>35531564
>>35531653
>>35531899
>>35532125
>>35534950
>>35535798
>>35537377
One Step Closer Anons
>>
>>35540492
I like that the last little bit sounds so plaintive
>>
>>35531564
Graceful!
Keep improving it anons!
Waifu singing lullaby by the end of 2021!
>>
Wallflower :3

https://vocaroo.com/78152kbidKY
>>
File: Badonkershy.gif (39 KB, 297x253)
https://vocaroo.com/8DovLEanxLb
>>
File: maudd.png (197 KB, 800x450)
https://vocaroo.com/hup7cTEHTp7
>>
Wew, it's much more work than I thought. Kudos to all you guys contributing with quality content
https://voca.ro/ekNzmbH5VAp
>>
has anyone been able to open an ngrok link?
>>
>>35543052
Lurk more.
>>35541667
>>
>>35541667
goddamnit, I thought it went poof
>>
>>35543052
>>35543073
>>35543081
the ngrok overloaded, so starting a new one
https://29b9fcdc8508.ngrok.io/
>>
File: Cailou.jpg (32 KB, 286x281)
>>35541620
>>35542021
>https://vocaroo.com/78152kbidKY
>>35542796
>https://vocaroo.com/8DovLEanxLb
This discovery is exactly what the thread needed. Great work everyone, some real neat stuff is coming out of this.
>>
>>35531564
You've opened up a new world.
https://voca.ro/7mRy9AOXTkv
>>
https://vocaroo.com/h1t7YCJUB91
>>
File: 1740454.png (492 KB, 1108x1020)
>>35514666
>>35538411
I've done the SFX and BGM for s2e14 - 16.
>>
>>35537236
can you help my chryssy? she gets nervous and stutters, poor thing

https://vocaroo.com/1SLdkZpr3rs
>>
File: lamar.png (534 KB, 622x774)
https://voca.ro/3Jc4zci9sPz
>>
File: do it.png (302 KB, 800x601)
dump your clips, fags
https://vocaroo.com/42DM4ChfqYP
https://vocaroo.com/gtJvdUN9KzP
https://vocaroo.com/nI9gTnQNmlD
https://vocaroo.com/8yFvKgPb87c
https://vocaroo.com/RMhBoI2gvPg
https://vocaroo.com/nJIMZDv1z3o
https://vocaroo.com/dNjVVF5NT2p
https://vocaroo.com/2sSrmrdCnUD
https://vocaroo.com/kyWgObdxD4J
https://vocaroo.com/ckhlyF8p84B
https://vocaroo.com/6qP0ao80oKK
https://vocaroo.com/hHN6IJks5Ol
https://vocaroo.com/o0e41Ggq43N
https://vocaroo.com/4zzqcz7lzPe
https://vocaroo.com/mTCOljl5l86
>>
>>35543663
forgot the important one

https://vocaroo.com/7HhHBMnFuxe
>>
>>35543663
I've always been proud of this one
https://clyp.it/owdkos00
>>
NEW:
https://80a59cbd27c4.ngrok.io/tts
>>
>>35543841
not working
>>
>>35544011
https://80a59cbd27c4.ngrok.io
I forgot the link breaks if /tts is included in it, this one should work.
>>
>>35544052
very no. on the other hand, i wonder what the voice source is for Fluffle Puff's gasping
>>
>>35543663
It's too much fun.
https://u.smutty.horse/lvuzoknhxlf.wav
https://u.smutty.horse/lvuzokgidqg.wav
>>
>>35543836
>>35544312
kek
>>
I can’t get the singing to work on most of the models, am I doing something wrong? I’ve just been adding — at the beginning and between certain words.
>>
>>35544352
— isn't --
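
Some phones and keyboards auto-replace two hyphens with a long dash, which is an easy way to end up with the wrong character. A minimal Python sketch that converts long dashes back into plain double hyphens before pasting text into the TTS box. The function name is made up, and whether the models react to anything other than ASCII hyphens is untested here:

```python
# Unicode dashes that autocorrect tends to substitute for "--".
LONG_DASHES = ("\u2014", "\u2013")  # em dash, en dash

def normalize_dashes(text):
    """Replace em/en dashes with two ASCII hyphens."""
    for dash in LONG_DASHES:
        text = text.replace(dash, "--")
    return text

print(normalize_dashes("\u2014la\u2014la"))
```

Turning off smart punctuation in the keyboard settings avoids the problem at the source.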
>>
>>35544352
This >>35531564 explains all that there is to it, at least as we know so far. Getting it to actually work just seems to require a LOT of tweaking, trial and error. Also, some characters do work better than others for this, Celly seems to be the easiest to get results out of.
>>
Celestia floods the castle

https://vocaroo.com/nZinfiOGTpr
>>
File: wallflower.jpg (100 KB, 768x1024)
I think I'm in love

https://vocaroo.com/8PGxFVkqY31
>>
For anybody interested, I've finally got the Witcher 3 Geralt voice sorted.
22 wav:
https://mega.nz/file/R08lUbiJ#lOQZ2tqvdi_25aBKPnfn1tvvOXynCmWfEqQlRwtcTUE
48 wav:
https://mega.nz/file/J1t3yZrA#aTUf2ADbACbzevtFHeTRCQ89Tp8FbEQvq_JEAH1Xf8Y
Text for training and validation (ARPAbet already included):
https://mega.nz/file/d9kh3BLC#AlOBucXkmbiDvEjRGEpso0Ef89eoSeys63mqdo3eqlE

I'm just leaving those links here in case anybody else would like to train this voice model.
>>
>>35546068
Interested anon here. Thanks for sharing.
>>
>>35546140
Oh yeah, it's five and a half hours long. The originally compiled files were almost 6~7 hours long, but I threw away any clip under 4 seconds to encourage the model to learn proper-length sentence structure.
BTW, I didn't bother to check if there are any audio clips/edits mixed in with it, but it shouldn't be that much of a problem, right?
>>
>>35546183
I checked a few scattered clips, and the audio seems fine.
>I threw away any clip under 4 seconds to encourage the model to learn proper-length sentence structure
That kind of filtering is very easy to do in the training scripts, so you don't need to do them yourself. The algorithm also implicitly adjusts how it generates clips based on clip length, so I don't think short clips should cause problems as long as they're long enough to not trip up training performance. In practice, I think clips around 0.7 seconds and shorter cause problems. It's better to keep the dataset as complete as possible.
That said, 6-7 hours should be plenty of data to work with.
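
As a rough illustration of how little code that kind of filtering takes, here is a sketch using Python's standard-library `wave` module, which reads plain PCM WAV files. The function names are made up, and the 0.7-second default is just the ballpark figure mentioned above, not a setting taken from any particular training script:

```python
import wave

def clip_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def keep_long_enough(paths, min_seconds=0.7):
    """Keep only clips longer than min_seconds, preserving the input order."""
    return [p for p in paths if clip_seconds(p) > min_seconds]
```

Calling `keep_long_enough(list_of_wav_paths)` at load time leaves the files on disk untouched, so the dataset itself stays complete and the threshold can be changed later.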
>>
NEW:
https://053f2d45fa8f.ngrok.io/
>>
>>35541336
Does anyone have ideas on how we can get a database of common chord progressions played at various tempos, with various instruments, in various styles?
>>
>>35546469
do you seriously need help knowing what singing sounds like? am I just that naturally musical?
>>
FastSpeech2 is really awesome, I trained a model with my Rainbow Dash Presents dataset which I previously gave up on with Tacotron2 due to resultant models being plagued with alignment issues, and it performs extremely well.
https://u.smutty.horse/lvvcmdhswwc.wav
https://u.smutty.horse/lvvcmdofcyt.wav
>>
>>35546574
I'm not asking for a transcription of chord progressions in songs, I'm asking for an extension of Clipper's BGM and SFX work to include music we can just add to generated clips. Like, generate a clip, figure out what tempo it's in, find a chord progression that would work with it, paste it in.
>>
>>35546617
oh that's actually super smart. I rescind my condescension
>>
>>35546613
Is there a training script?
>>
>>35546633
>>35520702
Synthesis notebook >>35522980, although the model it's running was my first ever one before I realized just how fast it overfits so it's bad. I plan to add the enhancements I did to my private version tomorrow.
>>
>>35546650
cool, thanks for your work. hope to try it out with my data tomorrow
>>
>>35546659
If your data is 1 hour or less it might start overfitting after about 1k or 500 steps with n_warmup_steps hparam set to 2000. I'll write a comprehensive guide sometime.
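
To see why the warmup setting matters for small datasets, here is a rough Python sketch. It assumes the trainer uses the Transformer-style ("Noam") warmup schedule, which is an assumption about the notebook and not something confirmed in this thread. `d_model=256` is likewise a placeholder value:

```python
def noam_lr(step, n_warmup_steps=2000, d_model=256):
    """Transformer-style schedule: linear ramp until n_warmup_steps, then 1/sqrt(step) decay."""
    if step < 1:
        raise ValueError("step must be at least 1")
    scale = d_model ** -0.5
    return scale * min(step ** -0.5, step * n_warmup_steps ** -1.5)

peak = noam_lr(2000)
for step in (500, 1000, 2000, 4000):
    # Print each learning rate as a fraction of the peak reached at the end of warmup.
    print(step, round(noam_lr(step) / peak, 3))
```

Under that assumption, a run that overfits at 500 to 1000 steps with a 2000-step warmup never gets past the ramp: the learning rate at step 500 is a quarter of its peak and at step 1000 it is half.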
>>
File: cadence wheeze.png (171 KB, 1000x820)
>>35543472
>https://vocaroo.com/bmaWw3kKZom

she sounds like she has brain damage
>>
>>35546802
linked the wrong file
>https://vocaroo.com/1SLdkZpr3rs
>>
File: pusspuss (248).png (475 KB, 900x900)
>>35546444
rip?
>>
File: 1593174666890.jpg (218 KB, 2007x1365)
>>35514666
This is a google drive for all of the good Pony AI content that you guys make using 15.ai
https://drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp?usp=sharing

The content that resides there is entirely dependant on anons that lurk these threads and see these links to add saved content to the drive.

You Can Upload Content Here
https://drive.google.com/drive/folders/1ghKZKsOvBoI8KnDgDdLOQrUB2aon0Xod?usp=sharing
>>
>>35547097
It's up to someone else, I'm at my GPU usage limit again
>>
https://135fb2aeb16c.ngrok.io/
>>
b
>>
I've noticed an issue with the resources section in the Google Doc: some of the Mega links for the downloads of the 5.1 Netflix audio don't have anything in the folders. Could whoever did those uploads please double-check the folders and re-upload?
S2 - https://mega.nz/folder/p1kTyaIK#bTia7IpcRrWFkFkkCwnZJA
S4 - https://mega.nz/folder/0xUjUQyK#-bbdcJQHDRIZDAf-7SnDxw
S5 - https://mega.nz/folder/9xUzVCwS#MXAaSROO3dxbT2lYkhlM6A
>>
New FastSpeech2 synthesis notebook, with RDP and Celestia models:
https://colab.research.google.com/drive/1GvWSYXzJ7CQLp5FGseUgNU9ujGfKKymL?usp=sharing
>>
>>35549042
Working on it. Will update the doc as I have the links ready.
>>
>>35531390
Based and cunnypilled.
>>
File: 828074.png (268 KB, 900x553)
>>35514666
>>35543453
I've done the SFX and BGM for s2e17 - 19. I've also done the SFX for s2e20, but I can't do any of the music for 20 - 26 until the 5.1 audio's available to download. I'll keep working on the SFX for 21 - 26 in the meantime.
>>
>>35550393
Season 2 should already be fixed in the doc. Others coming soon.
>>
>>35514662
So, anything interesting to report?
>>
>>35550450
Ngrok can sing apparently.
>>
Speaking of can anyone host a new Ngrok?
Did anyone try to use Rome's reverse proxy instead of Ngrok?
>>35526121
>>
https://025c43038f10.ngrok.io/
>>
>>35550450
Look up
>>
File: chrysalis.gif (2.28 MB, 503x360)
https://vocaroo.com/9kDFTyyZ4Yb
>>
File: CoolBugFacts.jpg (47 KB, 750x748)
>>35551715
kinda reminds me of this
https://voca.ro/lvhPxVgLETR
>>
>>35550409
Thanks, I'll start working on those now.
>>
>>35550643
It's ded
>>
>>35552614
https://07bec76f44bb.ngrok.io/
>>
>>35552665
Ded, do we not have a better mechanism to replace Ngrok?
>>
>>35537377
>Cookie
What flavor of Linux is recommended now to start training sets offline? Someone said Ubuntu, then others said Manjaro, does it really matter?
>>
>>35553986
The "better method" is copy the colab notebook to your google drive and run it yourself. It's not complicated: just follow the instructions, copy-paste once and click run like six times. If you can install a program you can do this. Relying on "ngrok" links is like relying on other people to write your own name for you.
>>
>>35554175
Now I would but colab requires a google account which requires a phone number. And I haven't found a public sms receive service that's "allowed" by Google.
>>
>>35554215
I can make new google accounts all day long without a phone number. No idea why no one else can. Here, have one I just made.

randocolab@gmail.com

Password: randompassword

Better get it before someone scoops it up.
>>
>>35554245
Wow, thanks Anon.
>>
>>35554215
You don't have a phone that can receive SMS? How do you even live in the current century?
>>
>>35554335
> Implying I live.
>>
I decided to test out how well FastSpeech2 holds up against small datasets by training a Nightmare Moon model. Excluding only the "Very Noisy" clips, the total data was just 70 seconds. The results at 600 steps are surprising:
https://u.smutty.horse/lvvugpycwuy.wav
https://u.smutty.horse/lvvugqgsjyh.wav
https://u.smutty.horse/lvvugqzbqho.wav
This was fine-tuned on the LJSpeech model which is quite different so there's definitely room for improvement.
>>
>>35554245
It's been a long time since I've been able to make a Google account without a phone number, even with tricks to bypass it. Maybe they haven't required it for your region?
>>
>>35551715
Kek. The mechanical rasping in her voice reminds me of her S2 voice.
>>
The audio links should all work now.
>>
File: 1031549.png (109 KB, 371x281)
>>35514666
>>35550393
I've done the SFX and BGM for s2e20 and 21. Almost done now, shouldn't be too much longer.

>>35554972
All looks good now, thanks.
>>
NEW:
https://6150a3e9163b.ngrok.io
>>
>>35535798
[Still In progress] Load persona nerd's data. I've had to rewrite this several times to make it so the code isn't hell to work with. Later on, I'll be doing a lot to extend this part, so I want to make it something I can easily come back to and extend. It currently loads all of the Tacotron datasets I've seen posted in PPP threads so far. There are some quirks in these datasets (e.g., missing files, some have arpabet and some don't), which I'll be working around or fixing.
https://pastebin.com/E9az9arA

Whoever posted the x8f7tj dataset, you only have arpabet transcriptions. You should include english transcriptions as well.
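
For anyone checking their own dataset before posting it, here is a minimal Python sketch of the kind of quirk detection described above. It assumes the common Tacotron 2 filelist layout `audio_path|transcript` with ARPAbet written in curly braces. The class and field names are illustrative and are not the code in the pastebin:

```python
import os
import re
from dataclasses import dataclass

ARPABET_SPAN = re.compile(r"\{[^{}]*\}")

@dataclass
class Clip:
    audio_path: str
    transcript: str
    has_arpabet: bool    # transcript contains at least one {ARPABET} span
    arpabet_only: bool   # no plain English letters outside the {ARPABET} spans
    audio_exists: bool   # the referenced audio file is present on disk

def parse_filelist(lines, root="."):
    """Parse "audio_path|transcript" lines, flagging quirks instead of failing on them."""
    clips = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        audio_path, _, rest = line.partition("|")
        transcript = rest.split("|")[0]  # ignore extra fields such as a speaker id
        has_arpabet = bool(ARPABET_SPAN.search(transcript))
        leftover = ARPABET_SPAN.sub("", transcript)
        clips.append(Clip(
            audio_path=audio_path,
            transcript=transcript,
            has_arpabet=has_arpabet,
            arpabet_only=has_arpabet and not re.search(r"[A-Za-z]", leftover),
            audio_exists=os.path.isfile(os.path.join(root, audio_path)),
        ))
    return clips
```

Filtering the result on `arpabet_only` or `not audio_exists` gives a quick list of entries that still need English text or a missing file.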
>>
>>35556240
So will we be able to load our custom datasets in soon?
>>
>>35556937
I have a lot left to do (see >>35535798), but it will work with custom datasets.
What's stopping you from using your data right now?
>>
File: c94.jpg (92 KB, 1200x663)
>>35555365
>internal server error

i hope i didnt break it for you guys too, i just wanted luna to comfort me mang.
>>
>>35557269
Looks like it just gave up on generating audio. Here's a new link, should all be working but I don't know how long it'll last.

http://591ebe5c8fd7.ngrok.io
>>
You sure this wasn't intentional?
https://vocaroo.com/aYTzkPxSuSH
>>
>>35557284
dead

how long do these typically last and why do they go down so often? is luna anon really to blame?
>>
NEW:
https://f3e1a150ec4f.ngrok.io/

>>35558472
>how long do these typically last?
5-8 hours on average, from what I've seen.

>Why do they go down so often? is luna anon really to blame?
Haha, no. The reason they go down so often, to my understanding, is because we're using completely free server hosting that's only intended to be used for program/service testing, and not as a permanent solution, so the servers get shut down once they've been on for a while. That's what I think anyway, i'm not all that familiar with the code side of things so someone else might have a better answer.
>>
>10
>>
>>35559738
It's basically just us both in this thread now
https://9c32323a1c71.ngrok.io/
>>
>>35559770
Things won't pick back up until 15 reopens his website
>>
>>35559785
Unfortunate but likely true. That can't happen soon enough, I want to see this place brimming with new content again. Oh well, at least the PonAI Drive has lots of stuff catalogued to keep me entertained in the meanwhile.
>>
It'd be nice if 15 could give us some of his WaveGlow models.
>>
>>35559785
isn't this >>35559770 the same as 15's site but... better?

would it be expensive to have our own site that doesn't go down every couple of hours? I like using this tool much more than 15's
>>
>>35560237
>would it be expensive to have our own site that doesn't go down every couple of hours?
You would need to rent high-end NVIDIA GPUs for the site to do TTS. Based off a rental cost of 30 cents per hour, one GTX 1080 would be about $215 in a month, although you can probably negotiate this price down with the provider.
Although with something that can run quickly on CPU like FastSpeech2 + Multi-Band MelGAN it'll be much more economical.
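
A quick sanity check of the hosting math, as a Python sketch. The rates are illustrative only and real provider pricing will differ. A 30-day month of continuous use at 30 cents per hour comes to about $216, while 3 cents per hour would be about $21.60:

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month of round-the-clock uptime

def monthly_cost(dollars_per_hour, gpus=1):
    """Cost of renting `gpus` GPUs nonstop for a 30-day month."""
    return dollars_per_hour * HOURS_PER_MONTH * gpus

for rate in (0.03, 0.30):
    print(f"${rate:.2f}/hour -> ${monthly_cost(rate):.2f}/month")
```

The same function covers the CPU-only case: plug in the hourly price of a CPU instance to compare.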
>>
File: 335737.png (129 KB, 418x415)
>>35514666
>>35555192
I've done the SFX and BGM for s2e22 and 23. I should be able to finish the last three episodes tomorrow.
>>
>>35560237
15 is going to have to pull off a fucking miracle if he wants his voices to be better than these. the only issue these have really is that they start speaking really fast in longer sentences. everything else is pretty great honestly.
>>
>>35560727
can anyone explain to someone who doesnt know this stuff very well why 15 has the fantastic, sometimes almost perfect QUALITY, but the ngrok models have the much more natural pronunciation and speaking? What's holding us back from having both?
>>
>>35560812
We need more compute. The quality comes from having a good vocoder, but training a good vocoder requires a lot more compute than everything else. Cookie hasn't been able to experiment with vocoders because of that.
>>
>>35560912
If anyone has a credit card they can sign up to Tensorflow Research Cloud and get a ton of free computing power but you need to redeem the $300 Google Cloud trial to cover other costs like storage.
https://www.tensorflow.org/tfrc
>>
>>35560931
I think they only provide TPUs in bulk. If we have a version of WaveGlow that can be trained with TPUs, that would work.
>>
>>35560965
>If we have a version of WaveGlow that can be trained with TPUs, that would work.
Tensorflow 1.x and 2.x models are very easy to convert to TPU.
>>
>>35560912
15 seems to have that much compute
>>
>>35560971
It depends on how the models were developed. There are a lot of GPU features that TPUs don't support. If you try just flipping the distribution strategy on a complicated model, you'll be surprised at the number of things that go wrong.
>>
>>35561027
I forgot NVIDIA/WaveGlow, unlike their Tacotron 2, is PyTorch (XLA is not in the table), not Tensorflow. But there are many vocoders. I have gotten TensorflowTTS to start training on TPU before it threw an error because I wasn't using Google Cloud storage.
>>
>>35561097
https://github.com/pytorch/xla might work, though I haven't tried it myself. I've heard that people have an easier time using TPUs with PyTorch than with Tensorflow models.
>>
>>35561097
>I have gotten TensorflowTTS to start training on TPU before it threw an error because I wasn't using Google Cloud storage.
Can you post the code?
>>
>>35514662
>down for (((maintenance)))
yeah right, someone didn't like our progress and killed off 15.
>>
>>35561730
15, if someone is destroying you, please "leak" your models and code so that we may continue.
>>
>>35561730
He claimed he did a big oof almost a month ago right before it was supposed to actually come back and he's been silent ever since. Hasn't even come here to glory hog. Maybe he's finally gone.
>>
>>35561751
Maybe his university was wondering why their servers were filled with pony lewds?
>>
File: 1570668840498.png (452 KB, 788x622)
>>35561805
>Breaking: MIT Servers filled with pony lewds
imagine my sides kek
>>
>>35561739
the whole point of (((stoping))) someone is that they will kill you if you do that. why do you think moot sold this dump to them?
>>
>>35561823
Just blame Russian hackers or something.
>>
NEW:
https://3b9a11352306.ngrok.io/
>>
File: oh shit.png (1.82 MB, 790x1761)
https://vocaroo.com/ofe3lxCYffH
>>
>>35562414
bitch get naked
>>
File: ryfd64.png (215 KB, 600x600)
https://voca.ro/6ZAIMBFQzat

excerpt from a luna fic
>>
>>35563009
And I reply:
>Yes
>>
>>35563009
>TWO held hooves
>>
Whoever did the Daria dataset, these files are missing noise tags:
https://pastebin.com/SJXzFGjh
>>
File: a.png (460 KB, 556x600)
Asset converting Anon here. Converting the leaked assets into Blend files.

Seems like a slow thread.
But making, bakn' progress.

Basic setup/investigate general conversion strategy [X] -- Figured out.
1. Use jsfl to programmatically convert each fla file to XFL format
2. Parse XFL's XML files.
3. Load serialized data into the Blender specific formats.

Parse/figure out the XFL files' Edge encoding. [ Almost there ] -- Still have to test on corner cases.
Figure out how to get serialized data into blender specific objects. [ ]
\-Serialize Edge data -> f-curves -> Grease Pencil objects. [ ]
| \- Add fill/stroke Style, layers, frames, gradients [ ]
|- Add tweening [ ]
|- Add remaining assets (audio, bitmap images) [ ]
|- Add Camera [ ]
|- Add Scenes (Stitch individual scenes together into the same blend file) [ ]
|- Make proper rigs out of Adobe animate puppets [ ]
|- Add Remaining effects. [ ]
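
A rough idea of what step 2 (parsing the XFL XML) could start from. This is only a sketch, not the converter described above: it assumes the path data sits in an `edges` attribute on `Edge` elements, which should be checked against real files, and `collect_edge_strings` is a made-up name. It only collects the raw strings and does not decode them.

```python
import xml.etree.ElementTree as ET


def collect_edge_strings(xml_text):
    """Return the raw 'edges' attribute of every Edge element in an XFL XML document."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # Compare local names so this works with or without an XML namespace.
        local_name = elem.tag.rsplit("}", 1)[-1]
        # Assumption: path data is stored in an attribute called "edges".
        if local_name == "Edge" and "edges" in elem.attrib:
            found.append(elem.attrib["edges"])
    return found
```

Decoding those strings into curves is the part still marked as unfinished in the checklist, so it is left out here.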
>>
>>35563398
Can you dump the list of assets you have into pastebin or https://smutty.horse/?

It looks like pose estimation datasets are usually built using MOCAP. The equivalent for us would be generating a bunch of animations (maybe randomly) where we know exactly what the pose is. After training a model that can recognize arbitrary poses, we should be able to turn the entire show into a labeled pose dataset with zero manual labeling.
If we have assets for clothing, we can take the same approach for that. Create an AI model that can recognize them in different configurations, then run that on the show to get a dataset that captures how they move.

I don't know what to do about textures.
>>
>>35563492
Again, not to burst your bubble but it's the 2D leaked assets from the show.
I'm converting Adobe Flash/Animate project files to Blender (which can do 2D animation).

However, I think there is an opportunity here for a hybrid, where you can use a neural network with a 3D model as a pose reference to render a 2D-styled pony.

Although for now, it's grinding groundwork for future anons.
>>
File: 1034_PB_Dabble.png (892 KB, 1261x2802)
Will someone here make a model of Princess Bubblegum if I made you a dataset for this?
>>
>>35563373
If they've been identified as too noisy for use then just delete them entirely
>>
One thing I have found frustrating is when a model says a phrase in a certain way that matches exactly what you want once, but when you try to do it again, it says it differently every other time. Is there any way you can have a model convey a certain phrase the same way every time? Or is it just a matter of saving the said phrase as an audio file and then paste it in whatever software you're using?
>>
>>35561429
https://colab.research.google.com/drive/1WiToclhJzxqR1eoP-fitL5d5o7kgDSb3?usp=sharing
>>
File: X_up.png (82 KB, 1395x1364)
>>35514666
https://mega.nz/folder/2phUgAKL#BeRolsWg_cAAsX1CSYKL2w

Here's a WIP custom dataset for X from BFB. Will be posting the datasets for Two and Four eventually.

Notes:
- At the moment, there are only 2 minutes and 8 seconds of audio; however, I'll be adding to this folder over time.
- The audio is from episodes 1, 2, 3, 4, 9, and 16.
- It's all at 48 kHz, some of which is converted from 44.1 kHz.
- The beginning numbers in the filenames are fairly meaningless overall, they're just left there to match the MLP filename format.
- The transcriptions are ARPABET.
>>
>>35562414
>>35563009
Nice ones.
>>35563867
>Is there any way you can have a model convey a certain phrase the same way every time?
Nope. The models vary the generation each time. There's no method for reliably getting the same output each time. However, if you're using the Ngrok models, you can use the back button in your browser to get to the older ones.

>saving the said phrase as an audio file and then paste it in whatever software
This is what I've always done, is there another way of going about this that I'm not aware of?
>>
File: 961999.png (1 MB, 1219x1177)
>>35514666
>>35560379
I've done the SFX and BGM for s2e24 - 26, fucking finally. I'll start re-organising the files tomorrow, I don't know how long it'll take but I can't imagine it'll be more than a day or two.

Here's a taster of some of the stuff we've got:
MLP intro instrumental - https://u.smutty.horse/lvwrqrrircs.wav
Hoers noises - https://u.smutty.horse/lvwrqrojjli.wav
Spike - https://u.smutty.horse/lvwrqruoiuu.wav

>>35563398
Nice to see someone being able to make use of those, I really do hope that we can make AI animations work eventually.

>>35563824
There's no guarantee that anyone will do it for you, though Cookie might be able to add it to the multispeaker model on ngrok, and you can try your luck and email your completed dataset to 15. You can use the Google Colab scripts to train your own model. See the Google Doc in the OP for instructions.
>>
>>35565432
>MLP intro instrumental - https://u.smutty.horse/lvwrqrrircs.wav
Why make such an abomination from the 5.1 mix when there's an actual clean studio instrumental in the leaks?
>>
>>35565693
>Abomination
I think it sounds okay with some of the backing vocals in there, besides I'm not really expecting anyone to have use for it and only included it because I thought it was interesting.

>Leaks
I was only able to find this earlier version in the leaks. If you know of one for the version that was actually used in the show, point me to it.
https://u.smutty.horse/lvwslitdfwh.wav
>>
>>35565807
S9 episode masters have separate background and voice tracks, this one's from those.
https://u.smutty.horse/lvwsnjclayf.wav
>>
>>35565832
Fair enough. I'll replace it with that version when I do the re-org. Thanks.
>>
>>35514662
Hey anons, I'm curious, do your family and friends know about your work in this project?
>>
>>35566344
I've told some friends about the amazing strides AI is taking.
My family is too busy talking about their own nonsense, they don't listen to me, and I sure as hell don't try to bring up personal things since at best, they won't understand, and at worst they'll invent something in their head that I said.
>>
NEW:
https://f3530bd9065d.ngrok.io/
>>
>>35565432
RainShadow's YouTube channel has rips of almost all the music from season 3 onward, so I don't think it's absolutely necessary for you to keep ripping BGM.
>>
>>35566344
I've sometimes used the voice generator to type something and reply to them in conversation in the same room.
>>
>>35564674
I have a tip. To get cleaner BFB audio, the .flas are available and they have musicless audio. To rip audio from those, install 7-zip, rename the ".fla" part to ".zip", open it, double click on the folder "LIBRARY" and drag and drop the .wav file onto your desktop or wherever you keep the episode audio.
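
The same rip can be scripted instead of done by hand in 7-zip. A minimal sketch, assuming (as the tip implies) that the .fla opens as an ordinary zip archive and that the audio sits under a top-level LIBRARY folder; `extract_wavs` and its arguments are made-up names.

```python
import zipfile
from pathlib import Path


def extract_wavs(fla_path, out_dir):
    """Copy every .wav stored under LIBRARY/ in a .fla (zip) archive into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(fla_path) as archive:
        for name in archive.namelist():
            # Assumption: audio lives under a top-level LIBRARY folder.
            if name.startswith("LIBRARY/") and name.lower().endswith(".wav"):
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                extracted.append(target)
    return extracted
```

No renaming to ".zip" is needed here, since `zipfile` looks at the file contents rather than the extension.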
>>
>>35567017
I forgot to say, the files for some reason are only available up to episode 16
>>
>>35563600
>it's the 2D leaked assets from the show
We've had several leaks now. I have the Xmas 2017 leaks, Hackintosh's s2e14 leak, and Hackintosh's tst leak. There was one before the Xmas 2017 leak, but I can't find it. It might be on one of my old hard drives, but it'll take some digging to go through those.
https://www.equestriadaily.com/2014/06/the-official-mlp-flash-asset-leak-and.html

I know Hackintosh had leaked the assets for a few more episodes since Shimmermare was rendering them in 4K.
We'll want to pool these together and catalog them so we know what we have to work with.

>>35563826
Those clips are all clean. You're just missing a second underscore after "Neutral" for these clips.
I plan on uploading my copy of the datasets. I'm letting you know so you can update your own local copy.
>>
>>35566344
Yes
>>
>>35566344
Some of my friends that have been into ponies for 8+ years know.
Some other friends know I'm working on this project but don't know how I'm contributing.
Some other friends know how I'm contributing but don't know what the project is.
Family... hell no.
>>
>>35566344
I've told people in very vague terms what I do here. Effectively nothing more than "I create audio stories using artificial voices." There's no way I'm going into detail beyond that, lest they discover my audios and in turn my true power level.
>>
>>35566344
My friends (who all know my true power level) do. Not my family, I don't want uncomfortable questions.
Also talking about voices, FastSpeech2 seems to be very sensitive on small datasets. These samples are from an NMM model just 100 steps apart:
500 steps:
https://u.smutty.horse/lvwvnztpgms.wav
600 steps:
https://u.smutty.horse/lvwvnztorim.wav
Which do you think is better?
>>
>>35567254
I find the second one to be much easier to understand.
>>
I don't know what's wrong but the Snips model on the ngrok is very unstable. It stutters A LOT and fumbles words, especially with the MiniWaveGlow...the Large Waveglow fixes the instability a little, but not too much.
>>
File: animation proposal.png (146 KB, 1094x938)
146 KB
146 KB PNG
See pic for a proposal for how to proceed with animation AI.
>>35565855
We may need labels for Replicating Show Animations, step 4. I suspect we'll be able to do this programmatically, but it's not clear right now. We'll definitely need labels for Recognizing Natural Motions, step 1.
>>35563600
Thoughts?
>>
>>35567579
You mention video games and I wonder if a 2D SFM like system where you could record game-like inputs and have them played back as animation could work. You could then train an AI to replicate an animation by training it on a data set consisting of stage directions and the controller inputs that match up with those stage directions. It would be like when people train an AI to play an NES game but instead of trying to complete a level, it's trying to learn how inputs match up with the given stage directions.
>>
>>35567043
If I'm correct, I've got the Xmas 2017 leak, and the latest Hackintosh leak.
>>35567579
Good start,
Read everything and most of it is doable on paper.
But dear Luna, that's a lot of work.

Too tired to give point-specific comments, so I'll withhold most of my thoughts until I'm done with my current project.
Which is (im)porting the leaked fla files to Blender.

You mentioned you needed to programmatically animate and use (character) assets.
Blender has a Python API which should allow for what you are looking for.

Depending on how well it goes, points 1 and 2 from Replicating Show Animations will be relatively trivial. I'm also planning on making a rig for the characters, but let's not get too excited; lots of work still ahead.
Hopefully I'll have something to show off soon.

What do you mean by "programmatically retrieve images from the show"? Shouldn't that be trivial?

Proposal overview point 3:
"in a AI" -> "Have an AI"?
>>
>>35514662
So is 15 AI back up yet?
>>
>>35568973
No, it's still down. Also, not to be rude, but it's not that hard to go check for yourself.
>>
>>35568973
He lurks these threads so just wait for him
>>
>>35567043
>Those clips are all clean. You're just missing a second underscore after "Neutral" for these clips.
Oh alright. I was drunk for a lot of that work. Looks like I did that episode at 2AM so it matches up. Anything in particular you're doing with the data or just archiving it?
>>
>>35568223
I think so. An anon mentioned trying to use game engines earlier. I'll be browsing through the API docs for game engines like Unity as well as animation software like OpenToonz and Blender. If we could actually control pony movements like in a game, that'd be pretty cool.

>>35568246
>But dear Luna, that's a lot of work.
Unfortunately, it looks like the path of least resistance.
>Blender has a Python API which should allow for what you are looking for.
Blender, OpenToonz, and Unity are at the top of my list for candidates. I'm not too picky about the language. Python has an edge since it has better data science support and since my data scripts are written in Python, but exporting data for consumption within Python will be easy.
>What do you mean by "programmatically retrieve images from the show"? Shouldn't that be trivial?
Yeah, that should be trivial.
>"in a AI" -> "Have an AI"?
My bad. That was supposed to be "Train an AI". "Have an AI" is more appropriate.

Later today, I'll break this down into smaller tasks that anons can work on piece-by-piece, and I'll make one of my beloved checklists. Hopefully tomorrow, I'll make a guide for how to create the labels we'll need. After that, I'll be bouncing back-and-forth between data work and animation work.

>>35569210
I'm seeing if I can build a better tool for working with datasets. Loading, cleaning, normalizing, augmenting, preprocessing. All the good stuff.
>>
Hosted a new ngrok to show it off to some people, figured I might as well post it here. I'm on a free instance, so it probably won't be up for long though

https://066178230b6c.ngrok.io/
>>
>>35569786
Please don't bring newfags here like the other nigger did last week who unironically post 2011-era /co/ shit like "we're gonna love and tolerate you so hard!11!". Fuck that's annoying.
>>
>>35569814
As long as he doesn't link the thread it should be fine.
>>
>>35569814
at most they would stay for like 5 minutes before leaving if this thread were any indication

or even sooner if the link expires
>>
File: 1592063292121.jpg (135 KB, 628x577)
What is a brainlet like me doing wrong?

https://u.smutty.horse/lvxctbfnstr.mp4
>>
>>35565432
I'm done re-organising the files on my end. I want to write a readme for the master file and a guide for the tagging system of the SFX and BGM. Once that's done, it should be ready to go. Aiming to release it all tomorrow.

>>35566344
My family found me watching the episodes several years ago, but never really questioned me on it. I've not revealed my power level nor said anything about the project to them, mostly because I can't be bothered to deal with the awkward questions. They're also not really clued up about the fandom or AI, so it's not like they'd be able to fully understand what I'm doing even if I did explain it. I haven't revealed to any friends or colleagues either, mainly because every time I've done that in the past it's become the one and only thing I'm known for, which gets old pretty fast. I only ever make the occasional subtle reference and watch to see if any of them catches on.

>>35566982
It's less about quantity and more about labelling. I could just extract the raw BGM from the 5.1 audio and dump it all in a folder, but that wouldn't be very useful to anyone who's actually looking for something specific for a skit or whatever they're making. Also YouTube audio is never going to be as good as the original source so it'll be best to stick with the resources in the Google Doc for any future work.

>>35567579
That doesn't sound like it would be too difficult to label if it would just be describing the characters in each scene and giving a one or two word explanation of what they're doing. How difficult would it be to train an AI to recognise these things to label automatically? It sounds like it should be fairly easy for at least identifying the characters.
>>
File: 1587669548436.webm (2.14 MB, 1920x1080)
>>35569963
I was never able to get any of that process to work. No idea how it works though I spent hours trying to understand it. I rigged my own solution out of a couple different techniques and pestering people who are more knowledgeable than me in python. I do all of my work on my Windows desktop but for some bizarre reason python will absolutely not execute correctly. Probably need to reinstall Windows because it's a garbage operating system but instead I do the python necessary stuff on a laptop I have Ubuntu installed on. It's the final step anyway so then it just goes right to Drive upload.

The first script takes all of your data and formats the audio into the processed format read by the tacotron scripts. Simultaneously it creates a single text file with the transcript you created for all the ripped audio, each entry reassigned a matching value. Then the second python script splits that mega transcript into 90-95% training data and 5-10% validation data.
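
For anyone without those scripts, the train/val split on its own is easy to reproduce. This is only a sketch of the idea, not the actual second script: `split_transcript`, the 5% default and the shuffle with a fixed seed are all assumptions.

```python
import random


def split_transcript(lines, val_fraction=0.05, seed=0):
    """Shuffle transcript lines and split them into (train, val) lists."""
    # Drop blank lines up front so neither output list contains empty entries.
    entries = [line.strip() for line in lines if line.strip()]
    rng = random.Random(seed)
    rng.shuffle(entries)
    n_val = max(1, round(len(entries) * val_fraction))
    return entries[n_val:], entries[:n_val]
```

Feed it the lines of the combined transcript and write each returned list to its own txt file, one entry per line.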

Then you go from there. Create an archive of the processed audio in tar compression, upload to drive along with the train and val txt files. I have edited the colab scripts for my workflow so then when the script is run it pulls those archives, extracts them where they need to be, and then I can create models.
>>
>>35570115
>How difficult would it be to train an AI to recognise these things to label automatically? It sounds like it should be fairly easy for at least identifying the characters.
Image recognition algorithms are pretty powerful, so I suspect it would be easy to recognize which characters are in a scene. For that, we would need a bunch of show-style drawings of characters. With that, we should be able to train a model to recognize which characters are in which scenes. We might even be able to do this using non-show-style images. The boorus should contain plenty of data for this. We just need to collect and filter it.

I'm less familiar with video recognition algorithms. I'll need to read up on video recognition AI to see how it's done and what it can handle. I suspect it would be fine to use the boorus again to build such a dataset. We would need to start with a bunch of relevant search queries (like https://derpibooru.org/search?q=twilight+sparkle%2C+teleport%2C+animated). From there, we could again collect and filter the data.
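
Building the list of search queries can be automated. A small sketch that produces search URLs in the same form as the link above from a list of tags; `booru_search_url` is a made-up helper, and it only builds the URL without fetching anything.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://derpibooru.org/search"


def booru_search_url(tags):
    """Build a search URL from a list of tags (prefix a tag with '-' to exclude it)."""
    # urlencode turns spaces into '+' and commas into '%2C', matching the site's URLs.
    return SEARCH_URL + "?" + urlencode({"q": ", ".join(tags)})
```

Calling `booru_search_url(["twilight sparkle", "teleport", "animated"])` gives back the example query linked above.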

If the data would be good enough, the boorus might be a better starting place to collect images and animation clips.
>>
new ngrok WHEN
>>
>>35570672
https://f160f3d022d5.ngrok.io/
>>
>>35570115
RainShadow's YouTube videos all have links to Mediafire downloads for FLAC versions of the music.
>>
>>35570748
how long does it take for you to even keep remaking these?
>>
>>35571948
They only take like 2-3 minutes to set up once you have things in place.
>>
Since no one has trained a Snips model for the colab notebook, and the one on the ngrok is quite unstable, what would it take to train a Snips model in the colab notebook? A 44K MMI one?
>>
>>35570748
its dead, jim.
>>
>>35572650
NEW:
https://69d130bfe3dd.ngrok.io
>>
File: Weird....png (274 KB, 1344x1326)
Uh, guys? Does anyone else see what I see?
>>
>>35572977
What?
>>
>>35572580
Basically follow the directions in the doc. Since Snips will likely be limited in audio, might want to use an existing model as base. Use Twilight if you can't think of someone better. Note that most male voices have difficulty, so your mileage may vary.
>>
File: 1590273612340.png (264 KB, 901x816)
>>35570142

Well, can you give us these scripts please? And give these .txt files too.

>I have edited the colab scripts for my workflow so then when the script is run it pulls those archives, extracts them where they need to be

Can you show what exactly you edited in the colab scripts please?
>>
>>35572977
Did they copy 15's code?
>>
>>35569963
Those (old) scripts are very rigid. That one is only meant to parse Clipper's master file. You're running into that error because "null" isn't a valid episode name, per the ones in Clipper's repo. You'd have to modify the scripts to work with your own repo. Even if you fixed that, you'd probably run into an issue with the character name, then another issue with the lack of audacity files.
If >>35570142 can provide the scripts, that would be best for now. Otherwise, I can show you how to create a Tacotron dataset using the new (unfinished) scripts. It won't give phoneme pronunciations yet, though I can probably have it spit out some approximation.
>>
>>35569963
>>35573254
By the way, the easiest fix you can try with the old scripts is this:
- In synthbot/src/datapipes/clipper_in.py, add an entry around line 11 for your character. 'Null': 'Null' should work.
- In the same file, add an entry for your folder name around line 354 under EPISODES. 'null': 'null' should work.
>>
File: nigger.png (262 KB, 1089x1024)
https://voca.ro/kwLL9EyTbAQ
Golly says the N word and everyone fucking dies
>>
File: 1417027705644.gif (52 KB, 360x360)
>>35573332
Pure art right there.
>>
>>35573332
How the fuck are these models so good at singing?
>>
>>35568246
>>35570115
We might be able to use derpibooru for most (if not all) of the data we need.
https://derpibooru.org/search?q=screencap%2C+animated%2C+-edited+screencap%2C+-pony+life%2C+-equestria+girls
We'll need to filter this, but that would be a lot easier than clipping and labeling. Most of these images have at least one tag that would work really well as a command. That would make manual labeling a lot easier.
>>
>>35572977
>>35573039
>>35573224

Resemble.ai is a crappy paid service that wants you to pay 200 a month to clone your own voice, not anyone else's. It's a scam to get cash out of gullible businesses.
>>
File: I am the niggest.png (445 KB, 640x640)
>>35573332
That was dumb as fuck. Perfect.
>>
>>35572977
>the virgin resemble vs the chad 15
>>
>>35573224
They consist of the same concept: high-quality voices from very little data. I wouldn't be surprised if the group of 15 were the creators of Resemble.AI.
>>
https://colab.research.google.com/drive/1xa_AbuMdxgwWNDZWY2mMm4Oh40Aw4p40?usp=sharing

This is a modified colab script that I tweaked to suit my workflow. It's 95% the same as original but the important bit is the "Database Setup" Code Block. It is set to pull your tar.xz archive containing only the processed data (all the line_1.wav etc), extract it, and dump it where the script can find it. The archive can't have any subfolders, only direct packing of the wavs. Then it pulls your train and val txt's, also created by the python script. It's all automated, all you need to do is have them findable on your Drive.
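
If you'd rather build the archive from Python than from an archive tool, something like this should produce the flat layout described above. It's a sketch under assumptions: `pack_flat` is a made-up name, and it expects all the processed wavs to sit directly in one folder.

```python
import tarfile
from pathlib import Path


def pack_flat(wav_dir, archive_path):
    """Pack every .wav in wav_dir into a .tar.xz with no subfolders."""
    wavs = sorted(Path(wav_dir).glob("*.wav"))
    with tarfile.open(archive_path, "w:xz") as tar:
        for wav in wavs:
            # arcname drops the directory part so files sit at the archive root.
            tar.add(str(wav), arcname=wav.name)
    return [wav.name for wav in wavs]
```

The `arcname` argument is what keeps subfolders out of the archive; without it the full folder path would be stored.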

Here is the package you need;

https://files.catbox.moe/aczd56.7z

Now you can follow the webm. It's currently set up for 48000 sample rate. It can be changed to 22050 or whatever you want simply by changing the number in compile.py. I don't know what the other two-three people are doing for their process on non-MLP voices but I do it this way because once you have it all set up it is totally generic, two click, and can process any voice as long as its raw flac and txt transcripts are formatted like this.

Word of note: There is some kind of bug, I believe, when you split the train and val txts. It adds 1-4 blank lines to the bottom of them most of the time and the tacotron colab script will crash if they're left. After both compile.py and split.py have been run you need to open train and val with an advanced editor such as Notepad++ and make sure the very final line has some text on it. Delete any blank lines and then save it or else you won't get very far.
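
The blank-line cleanup can also be done with a few lines of Python instead of by hand in an editor. A minimal sketch; `drop_blank_lines` is a made-up helper and the file names you pass it are up to you.

```python
from pathlib import Path


def drop_blank_lines(path):
    """Rewrite a filelist in place without blank lines or a trailing newline."""
    path = Path(path)
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    # Joining without a final newline guarantees the last line has text on it.
    path.write_text("\n".join(lines), encoding="utf-8")
    return len(lines)
```

Run it once on the train txt and once on the val txt after the split, before uploading to Drive.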
>>
>>35573332
Never change /mlp/ i fucking love you
>>
>>35573332
>models have trouble finishing most sentences
>miss a thread
>comeback to twilight and cosy glow singing the n word
What happened?
>>
>>35574375
Someone discovered that adding --------------------------------- and !!! randomly to your text input makes most of the models sing. No one's quite sure why.
>>
>>35574396
Funny
>>
>>35574533
Don't forget cute
>>
>>35573128

I was gonna try with Applebloom or Sweetie Belle because they have the same vocal range. Snips has one of those irregular voices, which is why I thought it would be easier to train his model with a character close to his age.
>>
>>35574375
It's all in this thread and it's glorious:

>>35531390
>>35531564
>>35532125
>>35537236
>>35537366
>>35538587
>>35538768
>>35540530
>>35546034
>>35562414
>https://clyp.it/biovni3n
>>
inflation
>>
>>35575596
based
>>
>>35574721
Someone should have pitch-fixed that.
>>
File: 1446630.png (184 KB, 500x558)
>>35570115
Upload's almost done.

>>35571497
Are they any better than what we already have from the 5.1 rips? If so then I might take a look a bit later on.

>>35573372
It does look like Derpibooru would be a good place to start. If we can train an AI to at least recognise characters with that data it would make labelling clips from the show much easier. For video recognition, could you maybe just have the AI analyse a sample of still frames from an animation clip? Sounds easier than trying to make it work on a whole video.
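
To make the still-frame idea concrete, here is one simple way to pick which frames to look at: evenly spaced indices across the clip. This is only an illustration; `sample_frame_indices` is a made-up name and nothing here reads actual video.

```python
def sample_frame_indices(total_frames, num_samples):
    """Return up to num_samples evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    # Take the middle frame of each equal-width segment of the clip.
    return [int(step * i + step / 2) for i in range(num_samples)]
```

Whether a handful of stills carries enough information to recognise an action is a separate question.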
>>
File: 23700.jpg (238 KB, 1280x1525)
>>35514666
>>35575652
Here it is, the Master file part 2.
https://mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ

This is where I have uploaded the SFX and BGM extracted from seasons 1 and 2 of FiM. There are a total of 2,918 sound effects and 2,524 samples of music to use for your voice projects.

Here's a quick rundown of what's changed:
The master file 1 (same link as usual) contains all the clipped audio for FiM, EQG and the special source audio. It is intended to be the primary source of audio material for MLP voice datasets.
The master file 2 contains everything else. It is intended to be used as a supplement to the primary master file, as well as containing various miscellaneous extra data relevant to the project.
Content-wise, nothing has changed aside from correcting the typos from >>35535788, all I've really done is move some folders around. For a more detailed explanation of the content of the two master files, refer to the readme in each file.

I don't really intend to go any further than season 2 with SFX and BGM for now, mostly because I'm a little concerned about overloading people with choices and we'll be more likely to run into re-used assets as we progress through the episodes. If you need a set of specific and unique sound effects/music clips for a project that aren't covered in seasons 1 and 2, give me a list of stuff to look for and the relevant episodes and I'll see what I can do.
>>
>>35575985
Here's a quick rundown of the tagging system of the SFX and BGM:
The general format of the tagging system is "Tag, Tag, Tag ~ FiM_sXXeXX"
The tags are all separated by a comma, and each file name ends with the source of the clip. Any files that happen to have the exact same set of tags and source are differentiated with a -2 -3 -4 etc suffix.
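
For anyone who wants to search the clips by script, the naming scheme can be parsed roughly like this. It's a sketch under assumptions: the duplicate suffix is taken to come right after the source (e.g. "FiM_s01e05-2"), and the example file names and `parse_clip_name` are made up.

```python
import re
from pathlib import Path

# Assumed layout: "Tag, Tag, Tag ~ FiM_sXXeXX" plus an optional "-N" duplicate suffix.
_NAME = re.compile(r"^(?P<tags>.+?) ~ (?P<source>FiM_s\d{2}e\d{2})(?:-(?P<dup>\d+))?$")


def parse_clip_name(filename):
    """Split a clip filename into (tags, source, duplicate_number)."""
    match = _NAME.match(Path(filename).stem)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    tags = [tag.strip() for tag in match["tags"].split(",")]
    # Files without a suffix count as copy number 1.
    return tags, match["source"], int(match["dup"] or 1)
```

For example, a file called "Door, Slam ~ FiM_s01e05-2.flac" would come back as the tags Door and Slam, the source FiM_s01e05, and copy number 2.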

There are a few key points to note with the tags, the most important one being that they are all highly subjective and open to interpretation. The tags for every clip are what I happened to come up with at the time of listening. It is quite likely that others will disagree with some of the tags that I've used and/or have other ideas for words that better describe the sound clip. I'd like to eventually have a crowdsourced system of tag editing similar to what you get on Derpibooru, but I don't have any idea how to go about doing that, suggestions are welcome. If you come across a clip that has a set of tags that you REALLY dislike, drop me a (You) with the clip in question and the changes you'd like to suggest. Please also refer to the "SFX and Music Guide" in the Master file 2 for more detailed information.

I took care to minimise typos when clipping the SFX and BGM, but it’s possible that there are a few that slipped through the cracks. Please let me know if you find any issues. I’m also happy to answer any questions about the tags if anything’s unclear or to make changes if anyone has any suggestions for improvement.
>>
>>35575994
Hell yeah, great work Clipper. This'll very much come in handy for audio stuff.
>>
>>35573332
Wonderful. just wonderful
>>35574396
THAT's what's doing that? Holy shit I thought they had added something that lets you actually control the pitch...
>>
>>35575652
RainShadow was also making 5.1 rips of BGM, but his clips tended to be longer than yours because they're usually entire scenes.
>>
>>35574721
Oh shit, i thought >>35574396 was just fucking with me. Wtf
>>
>>35576563
try for yourself
https://29395036da82.ngrok.io/
>>
I have the entire sound file of Tails from Sonic Heroes. Would this be enough to train a Tails model with the help of a Rainbow Dash base?
>>
>>35578197
How much data is it, in terms of runtime?
>>
>>35578337
How would you calculate that?
>>
>>35578386
Basically the duration of time he spends speaking in the sound file. The more the better.
>>
>>35578553
Average is about 5 seconds to a file. There is a total of 375 files worth a total of 66.5MB.
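
Going by those numbers, 375 files at about 5 seconds each is roughly 1,875 seconds, or a bit over 31 minutes. To measure it exactly rather than estimate, something like this would work. It's a sketch that assumes plain PCM .wav files all in one folder; `total_runtime_seconds` is a made-up name.

```python
import wave
from pathlib import Path


def total_runtime_seconds(folder):
    """Sum the duration of every .wav file in folder, in seconds."""
    total = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as clip:
            # Duration of one clip = number of frames / frames per second.
            total += clip.getnframes() / clip.getframerate()
    return total
```

Divide the result by 60 to get minutes.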
>>
>>35578578
That sounds like a good amount, you might actually want to start training it without the base to hear how it sounds.
>>
>>35578596
Okay...I'll see what I can do.
>>
What if all the audio files I use are 24K? Will they work if I change the sample rate to 44K in the notebook?
>>
>>35572977
>>35573390
Have you guys seen the Linus Tech Tips video where they use this? It sounds leagues ahead of Lyrebird.

https://www.youtube.com/watch?v=34AmKPJNfCg

The deepfake is more impressive, though. I had no idea these models could replace entire heads now, not just faces.
>>
>>35578903
They probably already used something public to copy his voice, there's a crap ton of data of him talking around. Even when dealing with little data, fine-tuning FastSpeech2 creates pretty nice results.
>>
File: 1422146716713.jpg (17 KB, 233x205)
>>35578903
>When Linus says most tech for cloning voices is just concepts behind research papers

Oh how little does he know.
>>
>>35578903
>I had no idea these models could replace entire heads now, not just faces
Imagine what will happen to the porn industry when we're able to deepfake bodies as well.
>>
Okay, so I don't know what I'm doing wrong here...but how can I get this to work?
>>
>>35578960
He was talking about speech-to-speech voice morphing, and I've always wondered why we haven't tried doing that.
>>
>>35579058
we've got the bodies, we just want nicer faces on them. sweeter, less ravaged faces. the problem is if you have a nice face and a nice body, you can make a lot more money than a porner, as a legit model.
>>
From my post above, I have the audio of Tails all uploaded and set up in GDrive. I read through the doc but it says to just run cell five in the audio processing notebook, but I keep getting that error in my above post. What's going on?
>>
>>35531077
>Shape Your Home
Every night, I close my work laptop ready to work on this project. I grab a quick dinner while reading that story. Then I read, and read, and read. Every single night.

>>35575652
>For video recognition, could you maybe just have the AI analyse a sample of still frames from an animation clip?
Maybe, but it would be better to just reuse whatever's already available. I don't expect too much difficulty with video recognition relatively speaking since there's a ton of work on it.
https://paperswithcode.com/task/action-recognition-in-videos

>>35575985
Success!
>I'd like to eventually have a crowdsourced system of tag editing similar to what you get on Derpibooru, but I don't have any idea how to go about doing that, suggestions are welcome.
I added that to the How to Contribute section of the doc.
>Web dev anons: create a *booru for audio data
>Clipper has created a dataset of sound effects and background music with his own tags. The intention is to have a searchable repository for sounds that anons can add to their generated clips. The additional background music and sound effects add a lot more depth to the clips.
>Derpibooru’s tagging system is very good for getting high-quality, searchable tags for images and animation clips. It would be nice if we could have a *booru for audio clips with a similar tagging system.

Todo:
[Up next] Dump a list of candidate tags that could be used to describe actions in animations. We'll need to filter these down to an actual list of action tags.
[ ] Finish the guide and checklist for how we're going to do animations.

I'm going to sleep for a bit first.
>>
Suggestions for new OP?
>>
>>35580804
-link the >>35364097 thread
-add "They can sing" in the latest developments
-remove 15.ai
>>
Well here is a ngrok link before the new thread.

https://ce4fbf0495b2.ngrok.io/
>>
File: 1579737071734.png (646 KB, 1920x1200)
>>35580804
>>
>>35572977
>the resemble.ai mafia cabal killed 15
bros... wtf
>>
File: 1580257097812.jpg (150 KB, 592x444)
>>35581770
>>
>>35580804
New master file link.
1 - https://mega.nz/#F!L952DI4Q!nibaVrvxbwgCgXMlPHVnVw
The primary source of MLP voice data for AI training

2 - https://mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ
Secondary storage for miscellaneous extra data

>Latest Developments
Master file 2 now contains clips of SFX and BGM from FiM seasons 1 and 2, intended to be used to enhance the quality of AI-generated clips, skits, voiced fanfics etc.

You can probably remove the torrent version, as it's likely out of date by now.

>>35579967
Looks like the groundwork for video recognition is well established, which is nice to see. Let me know if there's anything I can do to help you out, since I'm not really working on anything else at the moment. If there's nothing for me to do right now, I'll take a look at transcribing the books and comics or maybe tag some more stories.
>>
File: 1590685958971.jpg (370 KB, 735x594)
>>35581966
based
>>
File: 1566189233915.gif (411 KB, 1280x759)
and
>>
File: 1592183536793.jpg (16 KB, 280x210)
good
>>
File: 1569878850309.png (144 KB, 397x417)
night
>>
>>35580804
Waiting until page 10.
>>
>>35575596
Holy shit it was actually him.
>>
>>35580804
>>35580932
He's on his way back
https://twitter.com/fifteenai/status/1281481812900970496
>>
NEW THREAD
>>35582876
>>
>>35582178
Nice, hopefully this'll get some contentfags in here again.


