/mlp/ - Pony

Thread archived.
You cannot reply anymore.


File: AltOP.png (1.54 MB, 2119x1500)
Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips for any person or character using clean audio recordings as a reference. As you can surely imagine, the ability to create audio in the voices of any pony you like has endless applications for pony content creation.

AI is incredibly versatile, basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
derpy.me/FDnSk
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1xe1Clvdg6EFFDtIkkFwT-NPLRDPvkV4G675SUKjxVRU/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Cookie is working on controllable speech
Research into animation AI
Research into pony image generation

>Latest developments:
Singing Talknet models (>>37134971 >>37144858)
Animate automation tool available (>>37147092)
GDrive clone of Master File now available (>>37159549)
SortAnon releases script to run TalkNet on Windows (>>37299594)
TalkNet training script (>>37374942)
Delta updates GPT-J model (>>37554229)
Anonfilly bot update (>>37603355)
Delta GPT-J model (35.204.47.23:3389) (>>37646239)
GPT-J downloadable model (>>37646318)
News on Guided-TTS and Chunked Autoregressive GAN (>>37648659)
SortAnon found way to vastly improve TalkNet audio quality (>>37662611)
AI Dub doc: derpy.me/8q4Qc
Ways devs can help (>>37730470)
Delta GPT-J Notebook (>>37751617)
Progress on animation AI (>>37801447)
Possible phone synthesis (>>37815692)
Latest Synthbot progress report (>>37779282 >>37750996 >>37754318 >>37755436 >>37759445 >>37796212 >>37796731)
Latest Cookie progress report (>>37371181)
Latest Clipper progress report (>>37779262 >>37779282)

>AI REDUB COMPLETE!
-Ep1
youtu.be/gEXaFVw9J1o
derpy.me/ELksq

-Ep2
youtu.be/fIhj2bFYG4o
derpy.me/RHegy

-Unused Clips
youtu.be/N2730oPqLzE
derpy.me/OKoqs

-Rewatch Premiere
derpy.me/EflMJ

>The PoneAI drive, an archive for AI pony voice content:
derpy.me/LzRFX

>The /mlp/con live panel shows:
derpy.me/YIFNt

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/#F!L952DI4Q!nibaVrvxbwgCgXMlPHVnVw
mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>37736434
>>
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: derpy.me/FDnSk
Main: derpy.me/lN6li

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: derpy.me/A8Us4
How to get the best out of them: derpy.me/eA8Wo
More detailed explanations are in the main doc: derpy.me/lN6li

>Where can I find content made with the voice AI?
In the PoneAI drive: derpy.me/LzRFX

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
derpy.me/pVeU0
derpy.me/Jwj8a

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
derpy.me/CQ3Ca
>>
File: ASmallAnchorUUUU.jpg (35 KB, 284x300)
>>37816124
Anchor.
>>
Not a question related to any work anyone has done here, but has anybody ever seen a proper academic (or amateur) attempt at making an AI-powered sports/games commentator (with selectable personalities)?
Like something that gets fed information (e.g. Player A beat Player B by 5 points) and uses that information to generate a text/audio response?
>>
>>37816276
I doubt it. It seems like much too niche of an application for most people to care about. Wouldn't it also heavily depend on what the sport is?
>>
Is there any way to get in contact with the AI guys? Skype or discord or something?
>>
>>37817451
It's called the thread.
>>
>>37817451
You're looking at it.
>>
File: the circle jerk of life.jpg (131 KB, 1024x575)
>>37817451
>Skype or discord or something?
>where do you think you are?
>>
What's the best way to synthesize stuff like laughter? My model gets phonemes as input and I don't really know how to map those to laughing sounds.

Also, which grapheme-to-phoneme model are you using? The model I'm using has problems handling things like a long 'a' written as 'aaaaaaaaaaaaaa', because nothing like that is in the training set.
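Not from anyone's actual pipeline, but a common workaround for out-of-vocabulary elongations like 'aaaaaaaaaaaaaa' is to collapse letter runs to a bounded length before the G2P lookup. A minimal sketch (the function name and the cap of 2 are my own choices):

```python
import re

def collapse_elongations(text: str, max_run: int = 2) -> str:
    """Collapse runs of the same character longer than max_run,
    so 'aaaaaaaaaaaaaa' becomes 'aa' before G2P lookup."""
    return re.sub(r"(.)\1{%d,}" % max_run, r"\1" * max_run, text)

print(collapse_elongations("aaaaaaaaaaaaaa"))  # -> aa
print(collapse_elongations("noooooo way"))     # -> noo way
```

The run length you strip out could also be kept as a separate duration hint for the acoustic model, if you want elongation to still affect the output.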
>>
Both BERT and DeepMoji get word embeddings as input. Even cased BERT doesn't have embeddings for words like SUCK. Does anyone know how to map SUCK to a different sentiment than suck? I saw that many samples are spelled that way in the MLP dataset.
>>
>>37817451
Lurk here to find them. 15 you might be better off e-mailing, but only if it's something of substance and not "Can you do X character for me please?" Just ask whatever you're going to ask here.
>>
>>37817583
Are you using ARPABET phonemes? If so, have you tried {HH AA1 HH AA2 HH AA0} and {AA1 AA1 AA1 AA1 AA1 AA1}?
>>
>>37817596
You mean case sensitivity? Do you want all-caps text to be more energetic?
>>
>>37817992
Thanks, this answered my question
>>
>>37818113

Yeah, a lot of text samples in the dataset with emphasis are spelled like HELLO. But I was wrong: cased BERT's word-embedding scheme will create 5 tokens: H E L L O

A first try at making the model more emotional, by extracting emotion embeddings directly from the text using TinyBert and concatenating them with the speaker embedding, is training now. Let's see how bad the voices turn out :))
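One way to keep SUCK distinct from suck without relying on a cased subword vocabulary is to lower-case the text for the language model and pass a parallel per-word emphasis flag alongside it. A sketch of that preprocessing step (names and the 0/1 flag scheme are illustrative, not from anyone's actual pipeline):

```python
def split_emphasis(text: str):
    """Lower-case each word for embedding lookup, but emit a
    parallel 0/1 flag marking words written in all caps."""
    words, flags = [], []
    for w in text.split():
        is_caps = w.isupper() and any(c.isalpha() for c in w)
        words.append(w.lower())
        flags.append(1 if is_caps else 0)
    return words, flags

print(split_emphasis("You SUCK at this"))
```

The flags can then be embedded (or just broadcast) and concatenated with the word embeddings, so emphasis survives even though the tokenizer only ever sees lower-case text.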
>>
>16 posts in almost a whole day
is this project dead?
>>
>>37818499
16 posts throughout the night? Nah
>>
>>37818499
I'm working on few pieces , it just I don't have much to show of my WIPs and I wód assume the others are doing same.
>>
>>37818551
>I wód assume
kek, pronunciation almost checks out.
>>
>>37818499
Still waiting for 15 to feel better
>>
It is almost sad how people fell for the fake 15 yesterday. Never trust a non-verbose 15.
>>
>>37818731
You mean the one person who asked if it was 15 and the other person that said "Who cares?" You mean those people that "fell for" it? All zero of those people?
>>
>>37818790
Don't forget the two people that said "maybe" and "probably." Lol. They fell for it good. They got absolutely owned. Those fucking idiots, I can't believe they fell for it. Absolute clowns, they were. Grow up, >>37818731
>>
I saw in your colab:

https://colab.research.google.com/drive/1Tv6yaMQ0rxX9Zru3_D16Yzp5gQNsgn9h#scrollTo=QC7vrzLUYUFg

That you train on audio at 48 kHz. Why is that? Is there a noticeable difference from 22.05 kHz? Almost all papers I've read use either 16 kHz or 22.05 kHz.

Also, which vocoder are you using? Did you pretrain your own HiFi-GAN and change its architecture, or how did you do it?
>>
>>37819001
48kHz doesn't work well and gives much worse quality than if you trained on 22.05 or 16.
>>
>>37819018

Okay, thanks. Then I'm doing everything right ...
>>
VITS gives really promising results:

https://arxiv.org/pdf/2106.06103.pdf
https://jaywalnut310.github.io/vits-demo/index.html

However, their architecture gives me cancer and training probably takes 2+ weeks even on GPU clusters
>>
Hey so is there some way of using wine to use deltavox on linux?
>>
>>37819001
This notebook is ancient, and the person who wrote it (Cookie) no longer frequents this thread. Try asking around the Audio Deepfakes Discord server.
22 kHz audio sounds low quality, and we have 48 kHz audio, so we use it when we can train models with it. You should be asking why research code often uses 22 kHz instead of higher-quality audio. With older vocoders (WaveGlow and friends), 48 kHz was harder to train just because of the larger number of samples, plus the chance for models to screw up the higher frequencies. I don't think this is a big concern with newer vocoders like HiFi-GAN.
>>
>>37819391
That's not entirely true. 48 kHz training introduces a lot of robotic artifacts in the vocoder that aren't present at a lower sampling rate, so it's still recommended to use 22.05 kHz audio data or to downsample your data.
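If you do go the downsampling route, converting the 48 kHz masters to 22.05 kHz is a one-liner with polyphase resampling, since 22050/48000 reduces to 147/320. A sketch assuming scipy is available (the random array is a stand-in for real audio):

```python
import numpy as np
from scipy.signal import resample_poly

sr_in, sr_out = 48000, 22050            # 22050/48000 == 147/320

audio_48k = np.random.randn(sr_in)      # stand-in for 1 second of 48 kHz audio
audio_22k = resample_poly(audio_48k, up=147, down=320)

print(len(audio_22k))                   # 22050 samples for 1 s of input
```

resample_poly applies a low-pass filter as part of the resampling, so you don't need a separate anti-aliasing step.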
>>
>>37819001
>>37819018
>>37819401
It sounds horrible unless you're a typical half-deaf pleb with the cheapest audio equipment.
>>
>>37819410
I can't tell if you're agreeing with me or not.
>>
>>37819429
They are being your typical audiophile, and criticizing the fact that, to their professionally-tuned ears, anything less than 69420kHz lossless raw audio carved into an ivory vinyl is garbage.
>>
>>37819489
I don't think you understand what I'm saying. I'm on standard headphones and I'm definitely not an audiophile, but I can still hear the weird scratchiness on 44.1 kHz vocoders. It's a weird mix of raspiness and roboticness.
>>
>>37819489
>not wanting to hear below average quality is audiophilia now
Jesus Christ, humanity's standards just keep sinking lower and lower it seems.
>>
>>37819518
This is why uberduck exists. They're full of children with no sense of quality standards.
>>
https://openreview.net/pdf?id=yM5rtGA4pNX used 48 khz instead of 24khz and got MOS up from 3.47 to 3.76 that way. I don't know if it's worth a doubled training time though, I guess you also change the FFT parameters?
>>
>>37819556

The sub frequency discriminator is really interesting, maybe I'm gonna plug this into FastSpeech 2.
>>
>>37819556

Ah yes, now I understand it: hop size is a fixed 5 ms for both 24 and 48 kHz, so there is no increase in latency.
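To spell out why a fixed 5 ms hop keeps latency constant: the hop in samples scales with the sample rate, so the acoustic model's output rate in frames per second never changes. A quick check:

```python
def hop_samples(sample_rate: int, hop_ms: float = 5.0) -> int:
    """Hop length in samples for a fixed hop duration in milliseconds."""
    return int(sample_rate * hop_ms / 1000)

for sr in (24000, 48000):
    # 24 kHz -> 120-sample hop, 48 kHz -> 240-sample hop: 200 frames/s either way
    print(sr, hop_samples(sr), sr // hop_samples(sr))
```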
>>
>>37819001
>>37819401
There's also the approach I use: Train 22 KHz models, and apply super-resolution on the vocoder side. Not perfect, but easier to train.
>>
>>37819709

Training the acoustic model at 48 kHz isn't really the problem; FastSpeech 2 is very fast. However, I fine-tune HiFi-GAN v1, which the authors pretrained for 2.5 million steps at 22 kHz. Training a HiFi-GAN v1 at 48 kHz to 2.5 million steps will take ~2.5 million seconds on my garbage PC :D
>>
I guess no one here has a trained HiFiGAN v1 at 48khz?
>>
>>37819733
No. Cookie trained a few 44 kHz models, though.

https://drive.google.com/drive/folders/1nTyn6qr2b76aOE430trasuZj0Kr2H_ya

If you want to train a vocoder from scratch, consider using FreGAN instead.
>>
>>37819973
Because it trains faster or because it's better?
>>
>>37819733
It's been done before but they're not good
>>
>>37820034
Okay, I think I'm gonna throw that idea away for now.
>>
>>37820029
It achieves similar quality in about 200,000 steps (compared to 2.5 million). It's supposed to have less hissing noise as well, but I didn't notice that when I trained a 22KHz one.
>>
>>37820065
Sounds awesome, thanks for the tip :)

Did you notice any difference compared to the 2.5 million step HiFiGAN v1 model?
>>
>>37815950
I've sorted s7e21-26.

>>37817596
>I saw that many samples are spelled that way in the MLP dataset.
Those are almost certainly carried over from the original episode transcripts that were used as a reference to speed up the clipping process way back when. The whole thing was set up to be case-insensitive from the start, so I've never made upper/lower case a consideration while transcribing. There'll almost certainly be some appropriate instances of caps used for shouting/emphasis as they were in the original transcripts, but since it wasn't a constant consideration, there can be no guarantee of consistency.

>>37818499
Most of the discussion for the current active tasks has already been had, so it's more a case of actually doing the work as of right now. A lot of effort goes into this behind the scenes, most of it being boring grunt work that isn't interesting to talk about. For example, it takes ~30 - 40 minutes for me to review a single episode for the new round of iZotope cleaning of seasons 7 - 9, and essentially all I'm doing is listening to original vs cleaned clips and pressing buttons according to which one I think sounds better. I doubt anyone is interested in reading any more detail than that.

I do try to give at least some kind of regular status update whenever I'm actively working on something, but due to the nature of the work I typically do, all I can really give is a simple update of how far I've progressed on a given day. Because of this, it will always look like less is happening than actually is. Be assured though, I have no intention of letting this project die anytime soon.
>>
>>37819108
VITS reaches pretty good quality at 150k steps from slightly above 2 hours of data from scratch, one or two days on a V100; improves over the span of a week until about 800k steps.
>>37819375
Some guy tried a while ago and failed
>>37819391
I have this conspiracy theory where companies are paying off owners of open source repos to never push their models to the limit.
>>37819410
Correct. Even with my modestly priced headphones, I can't bear 22KHz audio.
>>37819001
That notebook is ancient. I recommend you look at >>37816126
>A list of TTS tools: derpy.me/A8Us4
I use 44.1KHz MB-MelGAN with HiFi-GAN discriminator. My tool and models should be on that list. Are you the Sunburst-flagging phone app anon? You might also want to look at that since the generator is extremely quick on CPU. If you want to gather data from celebrities I have a script that can extract transcribed clips from audiobooks (provided that the book is available as a .txt file) using ASR + fuzzy string search, although the efficiency is 20 to 50% and results have to be manually inspected or taken out with a spell checker (about 5 - 10% are bad).
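The ASR + fuzzy-string-search idea for audiobooks can be approximated with the standard library alone: transcribe a clip, then slide over the book text looking for the best-matching window. A rough sketch using difflib (the threshold and window logic are my own guesses, not the anon's actual script):

```python
import difflib

def find_in_book(asr_text, book_words, threshold=0.8):
    """Return (start, end, score) of the book-word window that best
    matches the ASR transcription, or None below the threshold."""
    n = len(asr_text.split())
    best = None
    for i in range(len(book_words) - n + 1):
        window = " ".join(book_words[i:i + n])
        score = difflib.SequenceMatcher(None, asr_text.lower(), window.lower()).ratio()
        if best is None or score > best[2]:
            best = (i, i + n, score)
    return best if best and best[2] >= threshold else None

book = "it was the best of times it was the worst of times".split()
print(find_in_book("the worst of times", book))  # -> (8, 12, 1.0)
```

A real script would want variable window sizes and something faster than a full scan, but this shows why partial ASR errors still land on the right passage: the character-level ratio degrades gracefully.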
>>
>>37820482

Yeah, I am that guy :) I used MelGAN without the HiFIGAN discriminator but had worse results. Time taken to convert voices is like 10% FastSpeech 2 and 90% HiFiGAN v1 atm on my app, but the voices really sound like crap after they come out of FastSpeech 2, and after HiFiGAN is fine-tuned for 100-200k steps the difference is huge.

Can you tell me about the FFT parameters you used for 44khz?
>>
>>37820529
For example, those are my outputs of FastSpeech 2 now (without fine-tuning HiFiGAN v1, which I'm currently doing).
At least emotions are now definitely modeled, which are predicted directly from the text using MiniBert.

Input: "I love you."
https://vocaroo.com/16mYBMVB37v1

Input: "Who the hell are you?"
https://vocaroo.com/1fPUCeF1EuFz
>>
>>37820529
Preprocessing parameters here: https://u.smutty.horse/mecqojrenay.yaml
You can take a look at my training notebooks for more info, including the config. I had to experiment for months to get good results on the vocoder.
>>
>>37820594

Thank you very much :) I will maybe train FreGAN using those parameters later this week.
>>
>>37820529 (You)

Using TinyBert, sorry
>>
>>37820098
You can listen to my samples here:
https://desuarchive.org/mlp/thread/37240950/#37275312
>>
>>37820750
Nice samples :) Did you train on Twilight's whole dataset, or did you remove samples that were annotated as noisy?
>>
>>37820836
I trained it on the whole MLP dataset, and then fine-tuned it on Twilight's TalkNet model for a few thousand steps. Can't remember if I filtered out noisy lines or not, but I probably did.
>>
File: delta error.png (44 KB, 794x633)
>>37820482
any idea on how to fix this? it pops up on my windows 10 and windows 7 laptop BUT not on my W7 pc.
>>
>>37821628
Install visual studio redistributable.
https://docs.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170
>>
Do we have any programs which automate lip syncing?
Also, I understand the AI struggles even harder to draw than to connect the dots when animating, so I'm guessing that getting the AI to make an MS-Paint-looking comic book goes hand in hand with actually animating the poses and backgrounds you make it select.
I guess the first step would be to have 10 terabytes' worth of poses and backgrounds for the AI to identify when you type [Twilight 3/4 front pose] [hoof raised up pointing downwards] [eye cheekbones expression: cheeks raised up] [mouth expression: smile24, wider right side going upwards] [eye shape expression: almond shape, angry] and so on. Probably just typing "angry smile/evil smile/sad angry" would be infinitely faster than having to type at least 4 parameters just for the eye expressions.
>>
>>37822409
>Do we have any programs which automate lip syncing?
Nothing made by any of us that I know of, though I'm pretty sure that's a standard feature in most professional animation software at this point. Any particular reason why you ask?
>>
File: cheesepie spatula.png (72 KB, 1821x638)
they sure love spatulas
>>
>>37818499
I'm just being lazy with the image stuff. Apache Beam keeps crashing randomly, and it seems to be running something like 10x slower than it did when the scrape started. I'll fix this soon.
>>
About this archive:

https://drive.google.com/drive/folders/1DQGul6hOqi227MJSJ-pPBv051YFbcDAi

What's the difference between 'merged_dict.txt' and 'normalized_dict.txt'? And should I use those dicts when I run Montreal Forced Aligner on the datasets, or should I let Montreal Forced Aligner create the dictionaries?
>>
Does somebody still have the datasets for the TF2 characters and/or GLaDOS (Portal 2)? The links are broken.
>>
>>37823481
>tf2
https://mega.nz/folder/R51XTQ6L#cuaGtb6oDJ5ig3hvJb9ZAw
>glados (portal 2)?
I only have raw wav audio (mixed with the sfx elements), if anyone else has the clean and transcribed version/zip of it I would also love to have a copy of it.
>>
Did Delta take the gpt model down? I got a 403 in my colab
>>
>>37823565

Thank you very much!
>>
>>37823922
Cloud trial ran out but I downloaded the checkpoint beforehand, let me upload it somewhere else.
>>
>>37823975
Thank you. Also can you maybe compress it with zstd? That may help speed up the setup part of the colab notebook
>>
Did you find a place?
>>
>>37824817
should
>>
>>37826330
poniponi
>>
The ngrok models have speaker embeddings in both the encoder and decoder. If you use different embeddings for each, you can "merge" voices.
Each of these examples is "encoder pony -> (decoder ponies)", with the first clip always being "encoder pony -> encoder pony".
>AJ -> (FS, PP, RA, RD, TS): https://u.smutty.horse/medeogcvzmr.mp3
>FS -> (AJ, PP, RA, RD, TS): https://u.smutty.horse/medeogashws.mp3
>PP -> (AJ, FS, RA, RD, TS): https://u.smutty.horse/medeogdhuih.mp3
>RA -> (AJ, FS, PP, RD, TS): https://u.smutty.horse/medeogcmlng.mp3
>RD -> (AJ, FS, PP, RA, TS): https://u.smutty.horse/medeogcwnrq.mp3
>TS -> (AJ, FS, PP, RA, RD): https://u.smutty.horse/medeogcehvl.mp3
It seems like the encoder controls intonation/pacing/accent, while the decoder controls timbre/overall pitch.

Also, if you generate speaker embeddings (by fitting a Gaussian mixture model), you can get random voices:
>https://u.smutty.horse/medesedgdbc.mp3

The idea to do this came from a recent paper (https://google.github.io/tacotron/publications/speaker_generation/) that synthesized non-existent voices: given a gender and locale (US, British, Australian, Indian), the model can generate a corresponding voice. It works as follows:
>Train a multispeaker Tacotron with speaker metadata (gender/locale) and learned embeddings (not embeddings from a speaker verification model)
>Train a simple model (Gaussian mixture density network) to predict embeddings given metadata
>Given some metadata, generate an embedding, then feed both the metadata and embedding to Tacotron
Surprisingly, the "metadata -> embedding" model can be trained separately from Tacotron (joint training doesn't improve performance). So, this technique can be applied to existing Tacotron models. The ngroks weren't trained with metadata, though, so we can only generate random voices.

Overall, the quality of these examples isn't great, but they show that the method can work. If someone plans to train a large, multispeaker model in the future, consider trying this out.
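The embedding-sampling trick above can be reproduced with numpy alone. Here a single multivariate Gaussian stands in for the mixture model (the one-component special case of what the post describes), fit over a matrix of learned speaker embeddings; the array below is random placeholder data, not real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for learned speaker embeddings: 200 speakers, 16-dim
embeddings = rng.normal(size=(200, 16))

# fit a single Gaussian (one-component "mixture") over the embedding space
mean = embeddings.mean(axis=0)
cov = np.cov(embeddings, rowvar=False)

# sample a new, non-existent speaker embedding to feed the model
new_speaker = rng.multivariate_normal(mean, cov)
print(new_speaker.shape)   # (16,)
```

With a real multi-component mixture (e.g. one component per cluster of similar voices) you can additionally pick which component to sample from, which is roughly how the paper conditions on gender/locale.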
>>
>>37827003
Those merged voices are pretty neat.
>>
>Page 10
>>
Has someone had any success in adding noise to input spectrograms when training the vocoder?
>>
Any updates for the gpt model?
>>
https://youtu.be/NKRa7vor63c
This would be nice with AI voices.
>>
>>37827003
This is actually something I've been wanting to hear for a very, very long time.

Thank you for finding this.

It's also interesting how some of the voice combos sound scarily like some of the other VAs used for secondary characters.
>>
>>37827003
The random voices are very recognizably MLP characters, even if it's hard to pin down which ones.
>>
>>37828567
>adding noise to input spectrograms when training the vocoder?
If it's noise from the spectrogram model you're trying to get rid of, you can train HiFi-GAN on ground truth-aligned spectrograms. Or did you have some other task in mind?
>>
>>37829705

I thought about pretraining the vocoder to map spectrograms with added noise to clean waveforms; this way the vocoder knows how to remove noise even before fine-tuning.
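That denoising-pretrain idea amounts to a simple input augmentation: corrupt the mel spectrogram while keeping the target waveform clean. A sketch (the noise scale is an arbitrary guess, and whether additive Gaussian noise matches the acoustic model's actual error distribution is an open question):

```python
import numpy as np

def corrupt_mel(mel, sigma=0.1, rng=None):
    """Add Gaussian noise to a mel spectrogram so the vocoder
    learns to map noisy spectrograms to clean audio."""
    rng = rng or np.random.default_rng()
    return mel + rng.normal(0.0, sigma, size=mel.shape)

mel = np.zeros((80, 200))          # 80 mel bands, 200 frames
noisy = corrupt_mel(mel, sigma=0.1)
print(noisy.shape)                 # same shape; the target waveform is unchanged
```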
>>
Sorry Synthbot, I have been super busy last week, and now I am sick as a dog.
I will try to upload the file as soon as I can, again, sorry for the delay.
>>
>>37830001
Don't die, and take your time. I have plenty to do in the meanwhile.
>>
File: 1937609.png (110 KB, 813x1024)
LowTierTrixie
https://u.smutty.horse/medkmeelmez.mp3
>>
>>37830033
Don't worry, I am not planning to die any time soon.
>>
>>37828962
>replacing Alumx's Celestia
But that's the best part.
>>
>>37827003
>Twilight Sparkle imitating Applejack's accent
funny
>>
>>37822953
merged_dict.txt was a combination of all known dictionaries. normalized_dict.txt is the same as merged_dict.txt, but without hyphens and apostrophes.
>And should I use those dicts when I run montreal forced aligner on the datasets or should I let montreal forced aligner create the dictionaries?
I never compared the two approaches. I was working on this last year before I got sidetracked. See https://github.com/synthbot-anon/synthbot/blob/experimental-v2/src/datapipes/phonemes.py and https://github.com/synthbot-anon/synthbot/blob/experimental-v2/src/datapipes/mfa.py.
I would guess that the best approach is to:
- Let MFA create its own dictionaries.
- Merge MFA's result with normalized_dict.txt so the forced aligner part can pick the best pronunciation for itself.
- If you find any cases where the normalized_dict.txt pronunciation is incorrect or missing something that MFA generates better, update normalized_dict.txt with the updated/additional pronunciation.

I haven't maintained the dictionaries in a while. If you want to post corrections/updates, I can put them on the Google Drive version. If you want to maintain your own copy, I can link to your Drive folder from mine.
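The merge step suggested above can be done in a few lines: read both dictionaries as word to set-of-pronunciations, then union them so the aligner can pick whichever variant fits each clip best. A sketch assuming the ARPABET "WORD PH ON EMES" line format shown in the thread:

```python
def parse_dict(lines):
    """Parse 'WORD PH ON EMES' lines into word -> set of pronunciations."""
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        entries.setdefault(parts[0], set()).add(" ".join(parts[1:]))
    return entries

def merge_dicts(mfa_lines, normalized_lines):
    """Union MFA-generated pronunciations with normalized_dict.txt ones,
    so the forced aligner can choose the best variant per clip."""
    merged = parse_dict(mfa_lines)
    for word, prons in parse_dict(normalized_lines).items():
        merged.setdefault(word, set()).update(prons)
    return merged

merged = merge_dicts(["HERO HH IH1 R OW0"], ["HERO HH IY1 R OW0"])
print(merged["HERO"])
```

Writing the result back out is just one "word pronunciation" line per set entry, which keeps the format MFA already consumes.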
>>
I can't seem to run the GPTJ or the KoboldAI notebook...did something happen to them?
>>
>>37831287
Delta said the time ran out on the storage api. He is currently finding somewhere to re-upload it
>>
>>37823975
You might need the model to be on Google Cloud Storage for it to work with TPUs. Also, you can save on data transfer costs by co-locating the model with the TPUs.
https://cloud.google.com/tpu/docs/types-zones
If you need help paying for storage costs, ping me at synthbot.anon@gmail.com. I'll need to figure out how Google Cloud's permission system works, but that shouldn't take too long.
>>
>>37831256
>- Merge MFA's result with normalized_dict.txt so the forced aligner part can pick the best pronunciation for itself.

Thanks. I don't really know how MFA works internally, so just to check: if I merge both dictionaries into one, so that I have for example two phoneme sets for the word 'hero' in the merged dictionary, will Montreal Forced Aligner automatically pick the better one?
>>
>>37831256
I noticed that some entries in merged_dict.txt have identical words with different pronunciations, and some have words distinguished by a number with different pronunciations. And a few entries even have both:

>HYGIENISTS HH AY2 G IY1 N IH0 S
>HYGIENISTS HH AY2 G EH1 N IH0 S T S
>HYGIENISTS HH AY2 G IY1 N IH0 S T S
>HYGIENISTS HH AY2 G EH1 N IH0 S
>HYGIENISTS(1) HH AY2 G IY1 N IH0 S
>HYGIENISTS(2) HH AY2 G EH1 N IH0 S T S
>HYGIENISTS(3) HH AY2 G EH1 N IH0 S

In cases where there are identical words, the TTS models will always use the pronunciation of the last variant. But do the numbered words ever get used at all?
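Whether the numbered variants get used depends on the consumer, but stripping them and deduplicating the result is straightforward. This sketch (my own cleanup, not any existing PPP script) normalizes 'WORD(n)' headwords and drops exact duplicate (word, pronunciation) pairs:

```python
import re

def normalize_entries(lines):
    """Strip '(n)' variant indices from headwords and drop duplicate
    (word, pronunciation) pairs, preserving first-seen order."""
    seen, out = set(), []
    for line in lines:
        word, _, pron = line.partition(" ")
        word = re.sub(r"\(\d+\)$", "", word)
        key = (word, pron)
        if key not in seen:
            seen.add(key)
            out.append(f"{word} {pron}")
    return out

lines = [
    "HYGIENISTS HH AY2 G IY1 N IH0 S",
    "HYGIENISTS(1) HH AY2 G IY1 N IH0 S",
    "HYGIENISTS(3) HH AY2 G EH1 N IH0 S",
]
print(normalize_entries(lines))
```

On the HYGIENISTS example above, this keeps the two genuinely distinct pronunciations and drops the parenthesized repeats.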
>>
File: 76554356.jpg (284 KB, 1749x994)
Is it possible to use the leaked assets to put this onto this?
>>
>>37833627
On a complete guess, I'd wager no. The mannequin was more than likely all just a single element, with no independent rigging of the outfit itself.
>>
>>37833627
Lauren Faust already drew a couple pieces of concept art with Rarity wearing a saddle.
>>
>>37820231
I've sorted s8e1-4 and s8e12. s8e5-11 and 13 have special source audio, and so will be skipped.
Also one minor change to s8e4 - I've added three new character tags for the three separate personas Fluttershy adopts to explicitly differentiate the voices. These are "Snooty Fluttershy", "Hipster Fluttershy" and "Goth Fluttershy".
>>
>more bollocks tomorrow and day after
I'm sorry, anon that wanted the Fluttershy song, it's just been a weird week. I will make my best attempt to get it done by Friday.
>>
>>37834222
you've been on a roll with the episodes
>>
>>37832682
That's correct per my understanding. MFA will try to figure out which pronunciation matches the clip better and it will use that.
>>37833609
Good find. The TTS will probably ignore the ones with parentheses. I didn't write any of the TTS scripts/programs though, so I don't know. I think my text-to-arpabet function (same as create_arpa_converter function in https://github.com/synthbot-anon/synthbot/blob/experimental-v2/src/datapipes/phonemes.py ) was the one that Cookie ended up using. If all of the other TTS scripts/programs are using the same thing, they'll effectively ignore the words indexed with parentheses.
I hadn't noticed the words with parentheses before. I'm not sure if MFA handles those properly. I just updated normalized_dict.txt to get rid of those parenthesized numbers. I'm not sure which copy all of the TTS scripts/programs are using, so they might need to be updated separately.
>>
>>37834222
>These are "Snooty Fluttershy", "Hipster Fluttershy" and "Goth Fluttershy".
Hmm, I remember there was a slide from this year's PPP panel related to RD in the styles of the other mane6, and now we have an anon pointing out that swapping the encoder/decoder works >>37827003, which makes me wonder how close we are to using the voice datasets not just as characters but mixing them up to create new, never-before-existing voices (e.g. Pinkie but with Rarity's 'fanciness' and Celestia's 'elegance')?
>>
>>37820482

If I train the vocoder at 44.1khz, do I train the acoustic model on 44.1khz too?
>>
File: Mare ooo.gif (429 KB, 235x274)
https://drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp

The Good Poni Content folder has been updated with ~50 new pieces of audio content from the threads. There was a ton of high quality stuff this time around so it's worth a gander.
>>
>>37837372
Yes. The output of the acoustic model and the input of the vocoder should match. There's a few other settings to look out for, like the number of mel bands and the convolution settings.
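A quick sanity check for the "settings must match" point: acoustic model and vocoder should agree on sample rate, mel bands, and STFT parameters, and the vocoder's total upsampling must equal the hop length. A hedged sketch with typical HiFi-GAN-style values (not anyone's actual config):

```python
import math

acoustic = {"sample_rate": 44100, "n_mels": 80, "hop_length": 256,
            "win_length": 1024, "n_fft": 1024}
vocoder = dict(acoustic, upsample_rates=[8, 8, 2, 2])

# mel/STFT settings must match exactly between the two models
for key in ("sample_rate", "n_mels", "hop_length", "win_length", "n_fft"):
    assert acoustic[key] == vocoder[key], key

# the vocoder's upsampling factors must multiply out to the hop length,
# since it turns one mel frame into hop_length waveform samples
assert math.prod(vocoder["upsample_rates"]) == vocoder["hop_length"]
print("configs compatible")
```

This is also why adding an extra 2x upsampling block doubles the output sample rate only if the hop length (in samples) doubles with it.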
>>
File: hellfire wip 3.png (124 KB, 1912x1172)
>>37834985
80% done, should have the audio completed tomorrow-ish and simple pmv by friday/early saturday.
>>
>>37834222
I've sorted s8e14-18.
>>
If you see this Delta i know Mega has 20gb free so gpt j can be stored there. I
>>
>>37837897
nice
>>
>>37838912
>I
oh shit the MEGA ninjas got him
>>
>>37838408

I was a little bit confused because here they train the acoustic model at 16khz and the vocoder at 48.

https://arxiv.org/pdf/2110.12612.pdf

That's a really interesting paper by the way, it's basically FastSpeech 2 on steroids.
>>
>>37838408

I didn't increase the number of mel bins, because most papers I read that tried increasing them said it made no difference.

On the convolution part, I simply added one more [2x] upsampling block so the vocoder generates 44.1 kHz instead of 22.05; I think that should do the trick.
>>
>my little pony battletech mecha robot
Man, this stuff is fun. A question to the anon(s) who was/were working on the MLP generated images (faces/butts/bodies and all the other ones): what is the progress on it so far?
Also a question to the posters in here: I was thinking of creating a mane6-themed audio pack for use with VoiceAttack (or a similar program), and I would like to know if anyone else would be interested in this (and for what game), and whether you guys would like to drop in any lines read by the characters (a Google doc will be made if there is interest in this).
>>
>>37842876
man you made me realize I don't play games where audio packs are used any more.
>>
File: large.png (574 KB, 701x1024)
>>
File: wip 1 hour.jpg (140 KB, 1149x719)
soon, but also bump
>>
>>37845937
not sure how but the video got fucked on export, gonna need another hour or two to fix it
>>
>>37846053
https://u.smutty.horse/mefaukbvgpb.mp3
now that the re-render will take another hour to finish, let me post the audio; I do hope >>37792992 you and others will enjoy it.
>>
File: 333015.png (913 KB, 880x900)
>>37846132
>>37816133
https://www.youtube.com/watch?v=OFaLELBo2YA
AND here we are, the Hellfire parody song with Fluttershy voice.
>>
File: rylyt.gif (1.64 MB, 375x300)
1.64 MB
1.64 MB GIF
I was wondering if there are any notable videos or channels other than BGM Pony Degeneracy? I love every single video on that channel (apart from Pony Zone and the redub I guess) and I need more AI ponies.
>>
>>37846374
Clipper and GothicAnon have channels where they tend to do more long-form types of content.
https://www.youtube.com/channel/UC4tLPTP0u5Qy0xfocMZzt8w
https://www.youtube.com/channel/UCc5tbyfuixq0WS4CK28WOCw

And there's also the PoneAI drive for an archive of most of the good stuff that's been made over the years.
derpy.me/LzRFX
>>
Is 15 ok?
>>
>>37846374
To add to >>37846601
https://www.youtube.com/user/GeekBrony/videos
https://www.youtube.com/user/DiegoAlanTorres96/videos
https://www.youtube.com/user/shadowfox9356493/videos
https://www.youtube.com/channel/UC98fHZMisJo6wnqMkRdKBCA/videos
https://www.youtube.com/channel/UCfmfMiNrYnnciHPv31h443Q/videos
https://www.youtube.com/channel/UCtg1gc78gyP86y9iVSPAzrg/videos
https://www.youtube.com/c/ThunderShyOfficial/videos
https://www.youtube.com/c/MKogwheel/videos
https://www.youtube.com/channel/UCqhV3OhA6aTFDuSoKEeytiA/videos
https://www.youtube.com/channel/UCia6f0L-7nA9AnmiFT-jOxw/videos
https://www.youtube.com/channel/UCVvQ5E-M0kjufkdQfPsDVsA/videos

https://derpibooru.org/search?q=-explicit%2C+sound%2C+score.gt%3A15%2C+-fetish%2C+-vore%2C+-inflation%2C+-nudity%2C+%28artificial+intelligence+OR+aivo+OR+fifteen.ai+OR+talknet%29&sf=score&sd=desc

There's a dozen more channels but you have to dig through the unrelated videos; you're better off using the search engine. Twitter has more hidden stuff but it's a pain in the ass to browse. Also there's stuff from older threads not in the AI drive, have fun with that. For example:

https://u.smutty.horse/mdmpdcushkd.mp4
https://u.smutty.horse/mcgzlessmrw.mp3
https://u.smutty.horse/mcgzletwotq.mp3
https://u.smutty.horse/mdgzegeqqln.mp3
https://u.smutty.horse/mcbvbohuldg.wav
https://u.smutty.horse/mcjkpwpnboh.wav
>>
>>37846611
He died, sorry
>>
>>37846611
He'll push the release back another 12 hours every time somebody asks him for an ETA on Twitter. Dude's a master troll.
>>
>>37846611
poisoned by his enemies.
>>
>>37847035
Vocodes and ubercuck had no way of beating him with better tech, so they had to take matters into their own hands...
>>
Whoever killed 15 also killed Delta, I guess.
RIP both
>>
You guys are really good with your thread archival, is there someone who can help me track down the date of the first T:EM/P/O thread?
I think if I ask in the alt sites thread they'll ignore it or tell me to fuck off, even though I send them money every month.
With Yukila gone I have no idea where to look; desu doesn't go far enough. I think there is something on internet archive, but I can't work through that on my phone because I'm down a computer right now (not for long though).
Anyways we are trying to figure out when the 10th anniversary of the T:EM/P/O threads is to do some special stuff
Any help would be lovely!
With your threads being so new you have the amazing opportunity and sense to record it all
>>
>>37846611
It's time for 15 to start making considerations in his Will for how his research will be furthered and continue to enrich humanity posthumously
>>
>>37847092
A good place to start would be to find out from some of the thread people what they used to call the threads. That may make a search through things like Google's cache/internet archives easier. There's no way it was called T:EM/P/O at the beginning.
>>
>>37847125
https://desuarchive.org/mlp/thread/5051881/
>Friend and I decided not to make this a serial thread so I'll only post this once a week or two.
Hoho they aren't going to like that one
>>
https://desuarchive.org/mlp/thread/5075704/
>I'll jumpstart the chain of contributions to this thread by sharing an synthesizer-orchestral lament, and momento: http://snd.sc/P3zELo
>This piece was recorded because my friend has passed away. T:EMPO was our project... It seems that it has been left to me, now.
Damn
>>
Do you guys have anything to make life easier for CYOAs/Questfags?
>>
>>37847309
>CYOAs/Questfags
not sure what you are asking here? If you want text generated with gpt or some other text models, there are colab links in the previous thread (but those things are down at the moment I think?).
>>
>>37847309
If you want audio from show characters without spending 6 months trying to debug why some python script isn't working on your machine like it does on everyone else's: yes.
... with 15.ai.
... the site that's up less often than Prince Charles' boner.
>>
>>37838865
I've sorted s8e20-26.
s8e19 and s8e23 were skipped as we have the studio outtakes to use for those.

>>37846261
Decent effort. For future musical works, consider adding some vocal reverb and using alternate takes (slightly pitch-shifted) as backing vocals. The singing feels kinda flat with just the voice as is.
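For anyone wanting to see what the reverb half of that advice boils down to, here's a toy illustration assuming nothing about any particular DAW or plugin: vocal reverb is, at its core, delayed and attenuated copies of the signal fed back in. A real reverb is far denser, but the principle on raw samples looks like this:

```python
# Toy feedback delay ("echo") on raw audio samples. This sketches the
# principle behind vocal reverb (delayed, decaying copies of the
# signal); it is not a replacement for a proper reverb plugin.

def feedback_delay(samples, delay, feedback=0.4, mix=0.3):
    wet = list(samples)
    for i in range(delay, len(wet)):
        # Each sample picks up a decaying copy from `delay` samples ago;
        # because `wet` is already processed, the echo repeats and fades.
        wet[i] += feedback * wet[i - delay]
    # Blend dry and wet signals.
    return [(1 - mix) * d + mix * w for d, w in zip(samples, wet)]

# Impulse response: echoes at multiples of the delay, halving each time.
impulse = [1.0] + [0.0] * 9
print(feedback_delay(impulse, delay=3, feedback=0.5, mix=1.0))
```

The backing-vocal half of the advice is simpler still: mix in a second take, slightly pitch-shifted, at lower volume.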
>>
>>37847459
Delta still hasn't given the new link so I can't update my colab. I am starting to worry about him
>>
>>37837897
add this lol https://u.smutty.horse/mdivfxpjcln.wav
>>
File: alt takes.png (54 KB, 884x688)
54 KB
54 KB PNG
>>37847571
>adding some vocal reverb
welp, it was on the to-do list but I forgot.
>using alternate takes
im gonna bitch here a little bit: all the lines had anywhere from 3 to 12 takes (pic related: multiple takes, pitch and speed edits), and while the ability to use reference audio is a godsend, I would still enjoy an extra way to control the voice (like the | emotion control, and/or some kind of sliders to get the exact voice tone while using reference audio for the speed of the spoken words), since I lack the proper musical vocal ability to do it myself and, like the VA from last thread pointed out, TalkNet works best when the reference audio already sounds pretty close to the character it's supposed to be used for.
Also I personally think some voices are just plain too damn difficult to work outside their "expertise". The dragging, deep singing of Hellfire was such a pain I had to Frankenstein sentences together from multiple takes (and in the case of both "BUUUUURN" lines I straight up gave up and used the Pinkie singing model with some pitch modification afterwards). There are a few softer-sung songs I'm pretty sure I could replicate with the FS model.
>>
>>37847680
Added.
For future reference, you can upload things to the drive yourself by following the "where to upload your content and guidelines" doc in the main folder.
>>
the gpt-j inference google colab demo doesn't work properly
at the fourth cell it gave me an error. I ran all the cells one by one as instructed and enabled cookies.
the error:

Please wait as we initialize the transformer neural network necessary to run the model.
Elapsed Time: 0:01:52 Traceback (most recent call last):
File "tornapp.py", line 91, in <module>
network.state = read_ckpt(network.state, load_path, devices.shape[1])
File "/usr/local/lib/python3.7/dist-packages/mesh_transformer/checkpoint.py", line 147, in read_ckpt
n_tensors += len(np.load(f"{dir}shard_0/{file_index}.npz").keys())
File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 416, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'tfimfalmed3_l_slim/step_50065/shard_0/0.npz'
>>
>>37846611
"I'm back to normal. The site will likely be up tomorrow."
>>
>>37848962
Just if anyone asks for the source
https://twitter.com/fifteenai/status/14615312833230684207
>>
>>37849180
You do realize that 15 is an /mlp/ horsefucker yes?
>>
>>37849203
I do know that. Just giving a link to where he said that.
>>
>>37849203
no one cares lol
>>
>>37849406
t. uberduck
>>
https://ai.googleblog.com/2021/10/goemotions-dataset-for-fine-grained.html
>GoEmotions, a human-annotated dataset of 58k Reddit comments extracted from popular English-language subreddits and labeled with 27 emotion categories.

They provide both the dataset and some pre-trained models. It might outperform TorchMoji.
>>
File: 1352489874663.jpg (100 KB, 498x422)
100 KB
100 KB JPG
>>37846601
Thanks!

>>37846848
Holy shit. That's a lot more than I expected. Thanks as well
>>
>>37849203
Thank you, Captain Obvious.
>>
>>37848384
Delta seems to be missing. The model still hasn't been reuploaded so I can't change the url in the setup line
>>
For now i deleted my colab because of the whole Delta thing and him missing
>>
>>37851452
Last seen Saturday, supposedly uploading the checkpoint somewhere else. Since then he's been MIA, I assume he probably just got busy with other things. There's a million things that could distract, especially so as the holiday season gets into gear.
>>
>>37850263
>leddit controlling the emotion of the mares
Can we please just make something better
>>
>>37851514
I guess we could train it on 4chan posts, but then the emotions would be limited to Baiting, Seething and Pretending To Be Retarded.
>>
>>37847571
I've sorted s9e1-7
I'm going to try to speedrun the rest of s9 over the weekend.

>>37851514
>Can we please just make something better
We have the entire show (and more) clipped, transcribed and labelled with similar emotion tags. If you have any ideas on how to make something better than that, I'm all ears.
>>
>>37852021
>If you have any ideas on how to make something better than that, I'm all ears
Question: Why is it necessary to let the AI decide the emotion by predicting it based on the text? Because so far all emotion manipulation has been done through this middleman of "let the AI fish up the emotions it thinks matches a certain phrase". How difficult would it be to let the user pick out those emotions directly? If the audio is all tagged with "angry", "happy", "surprised", etc. why can't those same tags be directly applied to the generation?
>>
>>37852290
I'm not a codefag but as far as I understand it, the algorithm analyzing sentences for emotion doesn't output a single emotion but more like a list of weights. The output isn't "angry" or "happy", it's "0.87 angry, 0.57 sad, 0.42 disappointed...". It then applies some of each emotion to the output according to the generated weights (probably with some randomness added in). Even clear sentences like 'Fuck you!' add a bit of influence from several emotions other than anger.
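A minimal sketch of that idea (the weights below are made up to match the example in the post, not real classifier output):

```python
# Sketch: the emotion analyzer outputs a weight per emotion rather than
# a single label, and every emotion with nonzero weight nudges the voice.
# These numbers mirror the made-up example above; they're not real output.

def top_emotions(weights, k=3):
    # Sort descending by weight, like the emoji histogram on 15.ai.
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]

weights = {"angry": 0.87, "sad": 0.57, "disappointed": 0.42, "happy": 0.05}
print(top_emotions(weights))
# -> [('angry', 0.87), ('sad', 0.57), ('disappointed', 0.42)]
```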
>>
>>37852321
bingo, that's why there's a histogram of emojis shown from highest to lowest by weight, not just five emojis
>>
>>37852290
>Why is it necessary to let the AI decide the emotion by predicting it based on the text?
Text is (usually) the only input a TTS AI is given. It has no other way to infer the emotion it should be using.

>How difficult would it be to let the user pick out those emotions directly?
Both easy and difficult. Cookie's ngrok and 15.ai allow you to override the AI's interpretation with a separate contextualisation phrase, essentially telling the AI to say "I love you" with the emotional tone "FUCK YOU!". However (according to 15), it is not easy to give the AI emotional information directly. I'm not familiar with the inner workings of AI, so the best I can do is point you at a relevant tweet and see if anyone else has an actual explanation.
https://twitter.com/fifteenai/status/1453829522046996488

>If the audio is all tagged with "angry", "happy", "surprised", etc. why can't those same tags be directly applied to the generation?
Maybe they could be, but using something like TorchMoji or the GoEmotions thing from >>37850263 allows you to use a tool that's already purpose-built for interpreting emotion and is trained on larger text datasets. If you already have these tools available, building a new tool from scratch that only works for this specific format of dataset simply isn't a good use of one's time.
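As a sketch of what the pipe-delimited override interface looks like from the user's side (how 15.ai or Cookie's ngrok actually handle it internally is their own business; this only illustrates the convention):

```python
# Sketch of the '<text>|<emotional context>' input convention described
# above. This illustrates the interface only; it is not 15.ai's or
# Cookie's actual parsing code.

def parse_input(line):
    text, sep, context = line.partition("|")
    # Without an explicit contextualizer, the text itself
    # drives the emotion analysis.
    return text.strip(), (context.strip() if sep else text.strip())

print(parse_input("I love you|FUCK YOU!"))  # ('I love you', 'FUCK YOU!')
print(parse_input("I love you"))            # ('I love you', 'I love you')
```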
>>
>>37852021
>We have the entire show (and more) clipped, transcribed and labelled with similar emotion tags. If you have any ideas on how to make something better than that, I'm all ears.
Already?
>>
>>37852392
>Already?
Since almost two years ago. All dataset work since then has just been minor incremental improvements, such as the work I'm currently doing cleaning and reviewing lines from seasons 7-9 using iZotope's recently updated dialogue extraction tool.
https://desuarchive.org/mlp/thread/34802590/#q34809970

You can view the dataset in its current form here.
mega.nz/#F!L952DI4Q!nibaVrvxbwgCgXMlPHVnVw
mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ
>>
>>37852321
I understand that, but my question was more, is there any major obstacle preventing the user from directly setting those weights instead of having to essentially trick the AI into setting them through a specific phrase?
>>
>>37852848
You'd have to ask 15 about it
>>
https://u.smutty.horse/mefsyhwyvpx.wav
>>
>>37853343
This sounds great overall, but the vocals are pretty hard to understand near the beginning. I personally can't make out the words.
>>
>>37853712
When you’re tumbling in a snowglobe
And solutions are a “don’t know”
Know surrender is a no-no
Don’t you give them what they want

If you’re just feeling like crying
Know that enemies are spying
Take your mind off of complying
As we sing our silly song
>>
IT'S UP
>>
>>37853883
https://voca.ro/1bODIrMTZQO2
>>
>>37853883
Probably the best I've ever heard Luna and Rarity sound.
>>
>>37854019
It's remarkable how much better each version sounds than the last one, even when I think the last one couldn't be improved any further.
>>
Well 15, I for one believe that working on the verge of death is ultimately worth it
Suffering for the art or whatever
To make realer mares
>>
>>37853343
OK now for what I was actually working on. I was really not feeling this one but it's done so whatever. "The Tendrils".
https://u.smutty.horse/mefwbwqkiwq.ogg
>>
>>37854158
and better than ever, too
>>
>>37854158
>Twilight avatar is on the main page
Absolutely based
>~150 queued
>almost no delay
What is this sorcery?
>>
Reminder for those using 15.ai now
you can make them moan by adding some random commas, apostrophes and letters followed of course by something for them to say Ex:

>a,,,,,,,,,,,,,,,''',,,, something
generates something like that :
-https://voca.ro/1e9Y9SaUXmbw
>>
>>37854177
btw, it works better the more sample time the character has
>>
>>37854177
and of course, the results may vary a lot but are pretty similar, as the AI is not deterministic
>>
>>37854176
150 queued isn't even that much because the site went up at 4 AM. Last time it was up I saw the queue go up to above 10,000 and it was still churning out requests pretty quickly.
>>
>>37853883
https://voca.ro/1856qhjy7Ya7
I just had to, sorry
>>
y'all techfags
>>
>>37854296
>implying
I'm just here for the ponies, man.
>>
>>37854141
>On the verge of death
WHAT.
>>
>>37854305
same bruh, but u guys still AI-fags
>>
File: revive.png (231 KB, 498x472)
231 KB
231 KB PNG
>>37854350
fuck, he got us there
>>
Anyone else getting occasional code 400 "Request Failed" errors on 15's site? It didn't happen for the first 2-3 hours but since then it's been getting more and more common. Even now it's almost a non-issue (I get one every ~30 attempts) but it could become one if the trend continues.
>>
>>37854570
That's always been an issue on the public version. If I recall correctly it has to do with the web servers lagging behind the traffic demands a bit, so when traffic is increasing (which would make sense as it's turning morning in burgerland) the servers fall behind and drop some requests.
>>
>>37854593
Ah, right. I completely forgot about that issue. Thankfully queue times are still very short.
>>
Someone should make AJ sing tiddies and beer.

https://www.youtube.com/watch?v=jsMOth_XkZM
>>
>>37854722
This, but with Rainbow on the piano.
>>
Lol i missed it being up
>>
Ooooh never mind. My browser was being stupid it works
>>
>>37854899
I had the same problem, probably due to cookies. I'm still using the site in the private window that I opened 4+ hours ago.
>>
File: 1635828349716.png (604 KB, 744x900)
604 KB
604 KB PNG
>>37134971
>>37662611
this is great. Please keep me updated!
>Cookie is working on controllable speech
this stuff is also what I'm really into, but can someone explain why TalkNet isn't it?
I feel like if we could somehow select an emotion and then use a guiding intonation with our own voices through TalkNet, we could pretty much make any kind of voice happen.

I'll be using full TalkNet voice acting for my RPG at this rate.

In any case, since 15.ai is kill, is there any way to play with TalkNet without a singing voice? Are the 15.ai voices compatible in some way? I came here mostly looking to learn how to install them and then TalkNet, but it seems like you guys have, probably smartly, moved to notebooks for your AI stuff. I don't have a potato, but if we need AI cycles for this stuff now I'd probably better learn to use it.
>>
>>37855681
>15.ai is kill
Huh?
>>
>>37855681
>since 15.ai is kill
Anon, try clearing cookies. It's been up for a day.
>>
>>37855681
>is there any way to play with talk net without a singing voice?
There's a checkbox to disable reference audio, which essentially turns it into a normal TTS.
>>
>>37855686
>>37855688
well it was a week ago when I tried to go. Thanks. I do keep forgetting this version of the project is for non commercial purposes though.

>>37855692
I want to feed it non singing reference audio so I can control the intonation and timing of speech.
Basically my goal is to create new pony content using my own voice clips as a guide for how each pony should say a line.
>>
>>37855714
it doesn't have to be singing audio you can use normal reference audio as well
>>
>>37855714
>my goal is to create new pony content using my own voice clips as a guide for how each pony should say a line.
That's... Exactly how TalkNet works. Just select one of the non-singing models and use talking as the reference input.
>>
File: 1473565370014.png (1.32 MB, 1200x1600)
1.32 MB
1.32 MB PNG
>>37855747
you guys work really fast, the OP post mentioned only a few singing voices were available to start.
I wonder if I can buy SortAnon a beer or something. This kind of thing is exactly what I need, and it's CC0.
Literally only emotion processing could make it better.
>>
>>37855714
>for non commercial purposes
As long as you credit 15.ai it should be fine.
>>
>>37855889
it says right in the opening pop-up that it's for non-commercial purposes
>>
>>37855935
I asked 15 about it via email, he says as long as credit is given he doesn’t care too much about it anymore
>>
>inb4 wrong thread
lmk where else to ask

I wish to learn to make 3d stuff for a vr environment. I hear I should learn blender and unity, and am willing to focus solely on gettin gud at those.
However I'm super newfag at both and 3d modeling in general. Any good starting points for a dummy who can't even draw good?
What pone models exist to mess around with?
>>
>>37855942
well i might go back to it if it ever says you can mix them with other TTS then, but as of now, I kind of need the ability to guide the voices as hoping for a random emotion/intonation isn't really a great use of my time. 15.ai also has the largest bank of possible voices, and while EQG and etc isn't welcome in this thread, I don't have any personal problem with reusing their voices since this is an original project anyway.
>>
>>37855950
this really isnt a great place for this, I would ask tem/p/o
djthed probably has the best models you can ask for, but if you want to use them in a unity like environment you will have to scale them down significantly.
>>
The slight echo is still there in (all?) 15ai clips. It's barely audible in most cases but I can definitely hear it on my headphones, especially towards the end of each clip. Not noticeable on my shitty built-in laptop speakers.
https://u.smutty.horse/megcaxbjsso.wav
Rarely it's much louder, here's an example of a really badly messed up clip. You can't miss it regardless of your speakers quality.
https://u.smutty.horse/megcbeoltce.wav
Also some form of speed control would be nice. Long sentences are usually way too fast.
>>
File: mares.jpg (78 KB, 500x241)
78 KB
78 KB JPG
https://u.smutty.horse/megcbrtsbfc.wav
https://u.smutty.horse/megcbrzmzhu.wav
https://u.smutty.horse/megcbrhmxat.wav
https://u.smutty.horse/megcbqomipt.wav
https://u.smutty.horse/megcbrayfbi.wav
https://u.smutty.horse/megcbrnmsnz.wav
https://u.smutty.horse/megcbqismlq.wav
https://u.smutty.horse/megcbquvjsv.wav
>>
>>37854154
Some parts could have used more love but pretty nice
>>
When will 15.ai be back up
>>
>>37856435
Is this a joke? If not then just take 1 (one) look at the posts since yesterday.
>>
>>37856441
Thanks
>>
File: 1519416914744.png (297 KB, 612x612)
297 KB
297 KB PNG
>>37856396
>>
File: Got cucked.png (16 KB, 296x273)
16 KB
16 KB PNG
>>37855950
Just going to dump a shit ton of resources I've found over the year in this .txt document

https://u.smutty.horse/megdpfunvga.txt
>>
Lmao based
https://twitter.com/fifteenai/status/1462257156837810181
>>
File: Almost_there.png (25 KB, 953x438)
25 KB
25 KB PNG
>tfw spent ~10 hours generating clips today
Now I see why you always recommend starting with something shorter. Eventually all the voices start sounding the same and you can only tell who's who by the picture on 15's site.
To make this less of a fucking blogpost, have a rare recording of NMM trying to break free:
https://u.smutty.horse/megekjaidyn.wav
>>
I'm surprised 15 is taking his time in sharing the upcoming character list. Isn't it a shared google spreadsheet that we've seen already, and if so, does anyone have the link?
>>
>>37856396
>https://u.smutty.horse/megcbqomipt.wav
Ha that's pretty gay Rainbow dash
Rest are based
>>
>>37857386
yup that's hell.
>>
File: 1631313797156.gif (2.16 MB, 639x360)
2.16 MB
2.16 MB GIF
https://u.smutty.horse/meggfndrobu.mp4

Learning Adobe Animate today, spent the whole day making this. If someone could share some tutorials and resources that would be much appreciated.
>>
>>37857386
That's why I stick to 20s shitposts.
>10 hours
What's the rush? Doing the same monotonous thing for 10 hours is the easiest way to get burnt out.

>>37857963
shitpost / 10
Try asking in tempo. That's the designated content creation thread.
>>
>>37857095
>Sebastian Lague
His stuff is awesome and well-explained but definitively not something I would recommend to a beginner.
>>
>>37858199
>What's the rush?
I want to get the lines done before waiting times get annoying or 15 takes the site down. That probably won't happen for at least a couple of days but still.
>>
File: Anon's Computer.jpg (44 KB, 459x639)
44 KB
44 KB JPG
>>37858409
>That probably won't happen for at least a couple of days but still.
I really wish I could say "That's ridiculous, he wouldn't take the site down so quickly" with a straight face, but honestly you're totally reasonable given the site's track record.
>>
hehe autotune
https://u.smutty.horse/megijauhlit.wav
>>
File: fucking game.jpg (56 KB, 350x417)
56 KB
56 KB JPG
>>37858409
>>37858432
you two, don't even try to jinx it. I'm 1/3 into generating lines with a background character model since the site got up, and the idea of getting stuck at this stage and waiting two extra months just to finish a few dozen lines makes me sweat buckets.
>>
>>37858432
He's been promising that v24 would be the version he sticks with so he can add 1,000 new voices within a 72 hour period. Fingers crossed
>>
File: 15.jpg (20 KB, 480x124)
20 KB
20 KB JPG
>>37858723
He says a lot of things. His models are fantastic and all but let's be real, his word is worth jack shit with how frequently he goes against it. In fact, he said he was gonna stick with a model version and update with characters at LEAST two other times in the past.
>>
>>37858467
that's cute
>>
this could really help if we somehow managed to
make it work with 15 ai, with a voice to text kinda thing, I'm pretty sure something that could emulate a conversation would be possible
https://www.youtube.com/watch?v=1N9BHR0d_8E
>>
>>37858805
>GPT-3 open to public
>but they made sure to castrate it as best as they could
All in all I guess that's still an upside? Hopefully their """safety procedures""" are not too harsh.
>>
>>37858827
true, this bunch of corporate cunts love to fuck with independent creator's projects
>>
>>37858805
>>37858827
It's open to public but you still need to pay. You get like $20 of free access and then you have to open the wallet.
>>
Pinkapocalyptica.
https://u.smutty.horse/megkajgguuo.mp3
https://u.smutty.horse/megkajkdscf.ogg
>>
>>37858999
good job anon, it was very funky. It's nice to see anons getting more experimental with the voices.
>>
>>37858734
https://twitter.com/fifteenai/status/1462482708626649097
I really hope this is a sign the site will stick around longer
>>
File: 1609586752612.webm (516 KB, 325x208)
516 KB
516 KB WEBM
>>37859382
>soon will finally be able to have Tempest say sweet nothings too me
>>
>>37859382
ubercuck admins are in absolute shambles because of this
>>
>>37859558
who?
>>
>>37859742
literal nobodies
>>
https://vocaroo.com/1hEwKMzgc6px
hue
>>
Which model is best for moans?
>>
File: 1618559473580.png (214 KB, 912x876)
214 KB
214 KB PNG
>>37859775
It's quite easy to for 15's voices to take on a british accent with the right substitutions.
"Fokken wanka" being the most notable.
https://u.smutty.horse/megmbohfgek.wav
Full transcript: u wot? aye, U 'avin a bit ov a giggl mate? ye fokkin wanka! ayl bash ye in the edd I swea on me mum!
>>
File: TrixYay.png (1.72 MB, 5881x10000)
1.72 MB
1.72 MB PNG
>>37858999
That's awesome!
>>
>>37858827
I hope we can get zigger through their gay stuff.
>>
>>37859558
It's hilarious, I'm lurking in their Discord right now and everyone is planning to jump ship from uberduck as soon as their character gets added to 15's site.
Some of them are outright admitting that they've only been donating to 15's Patreon instead of uberduck's.
>>
https://docs.google.com/spreadsheets/d/1dlV9tNtasW_YvrCEj7jAz3hg8UKIFGxfNCOYigfa_50
>>
>>37859877
>You {W AO1 AO1 T}? Aye, you avin a bit ov a giggle {M AY1 T}? Ye fokkin {W AE1 N K AH0}! Oil bash {Y EH1} in the ed oi {S W EH1 EH1} on me mum!
https://u.smutty.horse/megmzgmuhho.wav
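If you find yourself doing a lot of those substitutions, a tiny helper can wrap known words in the {ARPAbet} override syntax automatically. The mini-lexicon here is hand-made from the line above for illustration; it is not 15.ai's internal dictionary:

```python
# Sketch: wrap words with known pronunciations in the {ARPAbet} override
# syntax shown in the post above. The mini-lexicon is a hand-made
# assumption for this example, not 15.ai's actual dictionary.

LEXICON = {
    "wot":   "W AO1 AO1 T",
    "mate":  "M AY1 T",
    "wanka": "W AE1 N K AH0",
}

def apply_overrides(text, lexicon=LEXICON):
    out = []
    for word in text.split():
        core = word.rstrip(",.!?")  # keep trailing punctuation aside
        trail = word[len(core):]
        if core.lower() in lexicon:
            out.append("{" + lexicon[core.lower()] + "}" + trail)
        else:
            out.append(word)
    return " ".join(out)

print(apply_overrides("You wot, mate?"))
# -> You {W AO1 AO1 T}, {M AY1 T}?
```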
>>
>>37860413
Remarkable. It's ready for the full green.
>>
File: 1480500.png (977 KB, 914x1100)
977 KB
977 KB PNG
>>37852021
The speedrun is complete. I've now reviewed all episodes of s7-9 for cleaning with iZotope's new dialogue extract tool, except for those that had clean sources from the studio leaks. Thanks again to SortAnon for creating the PonySorter tool, which has made this much quicker than it would have been without. I'll now need to re-export all the audio and transcripts and then re-upload them to the master file. I expect this to take up to 2-3 days, depending on how hard MEGA throttles me.

>>37858999
I think the volume of the vocals was a little too low on this one, was a bit hard to understand what was being said. Great work otherwise, especially liked the funky beat and the general flow of the lyrics.
>>
>>37858999
I was not ready for the pinkpocalypse
>>
>>37860218
what is uberduck?
>>
>>37860835
scummy company that tries to rip off 15 by stealing PPP code
>>
>>37860835
Let's just say it's a project run by "coders" (mainly 15.ai rejects) and normies who just want attention. I've tried it myself and sheesh... it's bad... really bad. So now we have Uberduck and FakeYou (formerly Vocodes) trying to battle 15.ai, and neither of them is even close.

Aside from that...hope GPT-J is still possible...
>>
>>37860874
>...it's bad...really bad.
Not only that, there was a whole big deal in this thread when it was discovered that they were stealing the PPP's code for their shitty models.
They are absolute scum, stay far away from them.
>>
File: 1624863103841.jpg (139 KB, 674x744)
139 KB
139 KB JPG
>>37860292
>mfw watching that list grow over time
>All those League characters
>practically every Adventure Time character
>CDi Zelda
>Sonic
>Metal Gear Rising fucking Revengance
>All to be released over the span of a few days to a few weeks

15, are you sure you're ready to unleash this meme bomb on the world?
>>
>>37860292
>mfw Tempest voice
>mfw Athena voice
>mfw Clipper voice
>>
>>37861146
>mfw 15 voice
>>
Do we have a RDP dataset?
>>
>>37861170
Why would you make a dataset out of a TTS? Sounds absolutely retarded unless I'm missing something.
>>
>>37861035
hope someone can add zootopia on there
https://www.youtube.com/watch?v=vA7Y-snSwRs
>>
>>37861174
Huh? Rainbow Dash Presents?
>>37861178
Zootopia's already on the list.
>>
>>37861180
Ignore me, I briefly confused greg's Dash with DRD. It's late and I only noticed it after I clicked post.
>>
>>37861180
forgot to ctrl f sorry
>>
>>37860292
>>37861035
There's no way he'll be able to unleash this much power into the world without facing some sort of repercussion for it. Either the servers will melt because too many people are using the website, or he will be sued into oblivion (even if he says he's legally in the right).
>>
>>37822409
>Do we have any programs which automate lip syncing?
Not really automation, but PapagayoNG could help you.
That's what I use.
>>
File: ghost.png (2 KB, 324x43)
2 KB
2 KB PNG
>>37860292
Holy shit, someone sent in a dataset of Ghost from True Capitalist Radio?
>>
Ponies paved the way for what's cumming
Don't forget that
>>
>>37861149
I still don't know what this zoomer meme means
>>
>>37861992
https://knowyourmeme.com/memes/poggers
>>
File: 1611093215755.png (2 KB, 275x43)
2 KB
2 KB PNG
>>37861246
If this doesn't cause a lawsuit, then nothing will
Captcha: WAR AJ
>>
File: 1616601662985.jpg (40 KB, 399x339)
40 KB
40 KB JPG
>>37862115
>>
>>37862025
No I don't think I will
>>
File: holybased.png (41 KB, 596x460)
41 KB
41 KB PNG
https://twitter.com/fifteenai/status/1462667450374402051
>>
>>37862183
/Our Guy/
>>
>>37860292
>L4D2
Oh man, can't wait for some new voice lines for the survivors in custom campaigns...
>>
File: Steamed Pies.png (1.87 MB, 3500x2500)
1.87 MB
1.87 MB PNG
https://u.smutty.horse/megrkensdrz.wav
>>
>>37862115
Now he’s just asking for it.
Might as well see what happens.
>>
>>37860292
>senko-san
... dub or original?
>>
>>37858999
Awesome, numbers confirm
>>
File: 1600412946406.jpg (437 KB, 4000x4000)
437 KB
437 KB JPG
>>37862183
>you literally developed ai speech synthesis so you can get cartoon ponies to say things you want?
>>
>>37860292
Guys...is this our mindset?
>>
File: Kekfuffle.jpg (70 KB, 717x1073)
70 KB
70 KB JPG
>>37863027
Holy shit I'm dying. I'll have to use it in some project in the future.
>>
>>37860609
Great work. I wonder if 3 more seasons of data will improve the models noticeably.
>>
>>37862115
This. 15 is definitely painting a target on his back now. I mean, the Spongebob voices weren't around long enough to get the IP owners' attention, but this sure as hell will cause waves at the Mouse House.
>>
>>37862183
imagine trying desperately to troll and getting out-trolled lol
>>
File: hmmmmmmmmm.png (384 KB, 4326x3327)
384 KB
384 KB PNG
>>37861766
Wait a sec, I thought 15 only accepted datasets for fictional characters.
Does a radio host's on-air persona count as a fictional character?
>>
File: stampontheground.gif (1021 KB, 498x448)
1021 KB
1021 KB GIF
>burn out working on the redub
>come back to see what people have been up to
>shitposting died down and everyone's back to work
It's great to be back.
>>
>>37862493
Nicely done
>>
>>37863242
He talked about how he faked the rage over the stupid memes because he wanted to be popular, that sort of thing.
>>
>>37863369
How is that at all relevant to the topic at hand?
>>
>>37863242
>>37863376
iirc Ghost would rage over MLP to the point of knocking dozens of soda/beer cans around. Using his voice in a project heavily founded on MLP datasets brings him full circle
>>
>>37863447
>goes from raging at ponies to his voice raging at anons for having worst pony as a waifu
kek
>>
https://u.smutty.horse/megwbxvjakf.ogg
>>
File: IMG_1522.jpg (14 KB, 308x320)
14 KB
14 KB JPG
>>37864097
>>
File: 1522375324729.gif (66 KB, 316x304)
66 KB
66 KB GIF
>>37862493
I Love it
>>
>>37859830
15 of course
>>
https://twitter.com/fifteenai/status/1462873587363233792
>>
>>37864909
Yes, we see his Twitter. You don't need to link every god damn tweet he makes here, the people who care can see for themselves.
>>
Everyone here is pointing out the legal shit that could come with Mickey Mouse but I'm here looking at Hatsune Miku. She's a Vocaloid mascot, a mascot for a voice synthesizer that already exists. Literally the entire product they're selling is her voice.

Sure, that's a singing voice and this is a talking voice, but someone could easily do something like this to it using Vocalshifter (an obscure but free autotune program), and get a free and possibly superior Hatsune Miku singing voice.
https://soundcloud.com/pix-prucer/spongebob-wig-snatcher

Disney probably wouldn't be able to defend their case well, but Yamaha could potentially twist things into some sort of piracy narrative. I'm not saying it's an inherently bad idea to have Miku, I've even experimented with attempting to make my own Miku model, but I feel like this is more of a legal gray area than most other characters.
>>
>>37864981
Sorry, Crypton Future Media, not Yamaha. I forgot that Miku's dev team had separated from Vocaloid to make their own separate program.
>>
>>37860292
>my Combine Soldier dataset is finally in
Finally
>>
>>37856379
I’ll have an updated model up within the next few days that should reduce this reverb effect, or at least decrease the probability that any given output contains this reverb effect.
>>
How do you do a whisper in 15.ai?
>>
>>37865677
The contextualizers "|I like to whisper" or "I'm shy" can work sometimes, depending on the character. AFAIK there's not really a reliable way to trigger it unfortunately. Generally shorter inputs seem to be easier to get whispering out of though.
>>
Rainbow Dash discovers the Sigma Mare Grindset
https://u.smutty.horse/mehawowvqwg.mp3
https://u.smutty.horse/mehawpgmbkz.ogg

Lyrics: https://ponepaste.org/6112
>>
>>37866274
fucking RD
>>
File: GLORIOUS CONTENT2.png (2.64 MB, 1280x720)
2.64 MB
2.64 MB PNG
>>37866274
How the fuck do you work so fast
>>
>>37866274
is this 15.ai?
>>
>>37866294
Isn't she dastardly?

>>37866295
I'm on break at uni so instead of using the free time to catch up on my work or improve my life I do this.

>>37866297
The talking parts are mostly done with 15.ai. The rapping and singing are done with TalkNet.
>>
>>37863369
SHUT UP HE NEVER SAID THAT
>cans.wav
>>
>>37866274
Holy shit this fucks.
>>
File: 1620522378360.png (259 KB, 2208x2200)
259 KB
259 KB PNG
>>37866274
Dash's voice works surprisingly well for rapping.
>>
>>37866274
damn rd goes hard
>>
>>37866274
Holy fucking slap
>>
File: IMG_1552.png (128 KB, 900x833)
128 KB
128 KB PNG
>>37866304
REEEEE YOU CANT DO THAT
REEEEEEEEEEE
>>
>>37864965
but I dont go on twitter
>>
>>37866274
Holy fuck this shit fucking cums. This is probably one of the best fanmade songs I've ever heard. Bravo, anon!
>>
>>37866274
This is pretty damn good.
>>
File: 1637226827278.png (695 KB, 5000x5000)
695 KB
695 KB PNG
>>37866304
>mostly done with 15.ai
>mostly
>15.ai
Anon, lock your doors and sleep with one eye open gripping your pillow tight. He's coming for you.
>>
>>37866768
*gripping your life-sized mare plushie tight
>>
>>37865564
opinion on >>37864981 ?
>>
>>37866796
>>
>>37866274
Wtf I love rap now
>>
File: throne_room.jpg (979 KB, 3600x2025)
979 KB
979 KB JPG
First time making a real audio, here's the demo/trailer (the first scene)
https://youtu.be/xz8SsHj-aFU

>What is this?
A fanmade FiM audio episode, with backgrounds to help set the scene better. I experimented with a couple other ideas but I'm sticking to just the backgrounds for now, I felt that adding VN-like static pony images took away from the experience more than it added. Not to mention it'd lead to all sorts of comparisons with DRD which would really murder the overall mood.

I'm looking for feedback of any kind but especially about audio effects and video. Things like
>music/SFX too loud/quiet
>too few SFX
>fucked up reverb (this wouldn't surprise me, I ended up setting the wet reverb effect to -16dB)
>bad render settings
>subtitles appear too late / are hard to read / are too large / stay on the screen for too long
>music doesn't fit
>ESL dialogs
>many of the clips used have incorrect intonation or don't fit
that I can correct before editing the audio and then video for the remaining five scenes. All the voice lines for the remaining scenes have been generated. I noticed that the subtitles at 1:04 are wrong in this scene but that's such a minor detail (for now) that I don't think it's worth re-rendering and re-uploading just to fix it.
>>
>>37867060
Literally this

Please ponify every rap song so that they become accessible
>>
File: 34466.png (15 KB, 215x326)
15 KB
15 KB PNG
>as more characters get added eqg gets further and further away from fim
>AND you have to scroll for longer to even get to it
I am CONVINCED of 15's genius
>>
File: 7867396730.png (20 KB, 232x232)
20 KB
20 KB PNG
>>
>>37868001
honse
>>
>>37867480
The music became so quiet I forgot it was there, otherwise the audio mix is OK to me. I don't really agree with the intonations of the dialogue when Luna talks to Celestia; they both sound annoyed when they should be sounding more sad. Also, you used the wrong subtitle at 1:04 when Celestia says "Finally, at least that should be the last audience today."
>>
>>37867480
First things first, it's nice to see more anons interested in creating AI-related content, so welcome, and I'll be awaiting all future developments with excitement.
As for the clip, some of the lines could use a soft noise reduction, nothing too harsh, just enough to cover some of the computer artifacts.
The reverb seems just an itsy bit too strong, it sounds like they are in a cave, especially on Luna's lines.
Pacing between the words needs work; while I know it's a pain to generate perfect sentences, it can be fixed to a small degree by spacing/cutting the lines in the program.
When using music, try to pick tracks that don't have anyone speaking in the background (unless the scene/setting calls for it) as it can be slightly distracting.
The idea of using background images is good, just make sure they don't clash too much with the text.
The idea of using character images is a bit of a mixed bag; I think you are correct in not using those unless you go full VN and get a dozen high quality images of different emotions per character.
Other than that it seems pretty good, but I'm sure someone with a better eye and ear for detail will give you more detailed pointers.

>>37868001
>"your honor, I would like that picture be removed from evidence, that Anon clearly pinch and zoomed on this horse, its an absolute havoc of an image"
>>
Don't be afraid to add commas and periods to adjust the delivery of the text
>>
File: question baby.png (1.92 MB, 1438x1080)
1.92 MB
1.92 MB PNG
>>37816124
Spoonfeed me a tutorial on how to use the speech synthesis effectively. I want to make some content.
>>
>>37868535
>Go to 15.ai
>Type in words
>???
>Profit
>>
>>37868536
I want to know how to alter their accentuation, draw out words, and make them sing
>>
>>37868541
>alter their accentuation
ARPABet
>draw out words and make them sing
Edit the audio clips in audacity
>>
>>37868535
https://youtu.be/RAYWr1uOGVM?t=4490
This might be of use to you.

>>37868541
For singing you just need to use TalkNet singing models with reference audio.
>>
File: 1354716329522.png (141 KB, 424x336)
141 KB
141 KB PNG
>>37868549
>>37868550
Thanks
>>
>>37868541
>alter accentuation
either do the cheap and dirty way of typing it as if you're trying to describe someone speaking in an accent, or do it through ARPAbet which gives you more control but is slower.
>draw out words
ARPAbet can help, but usually talknet with your own voice acting as the reference can be the most optimal choice.
>Sing
Talknet entirely
>Other tips and tricks for 15ai
Add or remove commas to dictate the pace and pauses of a sentence. Use three commas (,,,) instead of ellipses for longer pauses and breaths. Strings of commas (,,,,,, ,,,,, ,,,) can cause heavy breathing or hyperventilating sounds.
And of course, type things like "aahhhhh,," "hhhh,," "mmmnnnn,," and "ohhh,," for moans. (use two commas after each moan noise)
Mix words in with the moans or breaths for cleaner, more natural sounds.
>>
>>37868629
Thanks to you as well
>almost half of the advice is making ponies moan
Absolutely based
>>
>>37868629
>>Sing
>Talknet entirely
Note that you CAN make some bangers using 15.ai, like all of the Pony Zones for example.
https://www.youtube.com/watch?v=h5Yc_OSPnRQ
>>
>>37868649
Pony Zones are insanely cringe in both idea and execution I can't lie
>>
>>37868652
>>
>>37868652
pony zones are like 10% jerk off material and 90% the same instrumental over and over. Very difficult to coom to.
>>
>>37868660
The ones made by an anon here fix both of those
https://drive.google.com/file/d/1m5c9mgYaNg5_tU3UhbbI3cX7rBDaIvx6/view?usp=sharing
https://drive.google.com/file/d/1ZpPfvmaMZUFKl23rDXM4V8b_x5f7f71f/view?usp=sharing
>>
>>37867480
On my $10 headphones the music is way too quiet. The rest is fine as is. What's the source fic?
>>
>>37868075
>intonations
Either sad is a hard emotion to generate or it requires knowing some trick or hack that I don't know. Even using emotional contextualizers with overwhelmingly 'sad' emotes rarely results in usable sad-sounding audio. But I'll keep it in mind and fix the most egregious cases.

>>37868765
>music too quiet
I'll make the voices a bit quieter. They're already peaking or almost peaking in a few spots before applying any changes.
>sauce
You're looking at it. I got into an argument over A Royal Problem that ended with "if you don't like the episode write your own". So I did just that.
I'm not a writefag but I asked in /fimfic/ if the basic plot is at least coherent.

>>37868095
>reverb, noise reduction, pacing
noted, will tweak it a bit more
>music without anyone speaking
It was hard to find fitting music that was also clean. In Canterlot some minor background chatter is hopefully not a terrible issue; for the Everfree Castle scenes I'm thinking about using some fanmade Princess-themed music. Maybe from Melodic Pony (rip), Acoustic Brony or Radiarc.

Thanks for the feedback. The final version should be up in two weeks at most, unless my perfectionism kicks in and I get stuck in the polishing phase.
>>
File: 1031943.png (2.65 MB, 1920x1080)
2.65 MB
2.65 MB PNG
>>37816133
>>37860609
Re-export and re-upload is done. The final result is a moderate improvement in the quality of voice lines from seasons 7, 8 and 9 - by way of noise reduction with iZotope’s updated dialogue extract tool and minor adjustment to a few of the text transcripts.

The dialogue extract tool did a decent job on noise reduction, though it came to be quite obvious that it’s built more for typical speech from boring humans rather than emotive high energy pony voices. For example, it was completely hopeless with characters like Pinkie, Gabby and Silverstream. I’d rate its overall performance as being similar to that of the dialogue extract method with foreign dubs that was used back in the earlier cleaning done for seasons 1-6, that is to say that the improvement is worthwhile, but not exactly a game-changer.

AI anons, update your dataset by simply re-downloading the folders for S7, 8 and 9 in the FiM section of the master file 1. Episodes that had studio leaks (s7e13, s8e5-10, 11, 13, 19, 23, and s9e22-26) are kept separate in the “Special source” folder - these remain unchanged.

If you want them, you can also get the Audacity label files for each episode in the “Label files” folder in master file 1, and the JSON files output of the PonySorter can be found in the “Reviewed episodes” folder in the master file 2.

Please let me know if you notice any mistakes and/or missing files. I was careful to make sure that it's all good, but I can never give a 100% guarantee.

Master file 1 - mega.nz/#F!L952DI4Q!nibaVrvxbwgCgXMlPHVnVw
Master file 2 - mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ

I’m now gonna take a day or two off as I work out what I want to work on next. Suggestions are welcome.
>>
>>37866274
Rap isn’t really my jam, but I can still appreciate a good work when I see it. Even with the large amount of free time you apparently have, I’m still really impressed with both the quantity and quality of the stuff you’re making.

>>37865677
>>37868535
In addition to the panel segment already linked in >>37868550, I'll also direct you to the quick start guide, which has a few more assorted tips and tricks. Feel free to ask if you need help with anything more specific.
https://docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit#heading=h.9yyn6qx78j28

>>37867480
I'll list a few things that I think may help for this scene:

(1) Background effects - in general, you should avoid periods of complete silence in the background unless the scene specifically calls for it, otherwise the characters end up feeling somewhat isolated in the environment. There's a fountain at the base of the throne, so I suggest adding a quiet "water trickle" sound to the scene; this is something that can also be heard in the actual show for throne room scenes. Also suggest re-watching some throne room scenes from the show for inspiration on other kinds of sounds to use.

(2) Generic movement sounds - as it is right now, the scene feels like the characters are just talking at each other while standing perfectly still, with the one exception of the enter/exit hoof steps. Suggest adding a few "swish" and "magic with paper" sounds to signify gesturing and a few hoof steps every now and then. Use them somewhat sparingly and only in places that make sense, typically on key words and phrases that you want to emphasise.

(3) Subtitles - they don't always perfectly match the audio, but that's not a big deal in this early phase of development. Make sure you double check all that for the final versions, and keep capitalisation and punctuation consistent. Also move them a bit higher, the lower line is sometimes slightly cut off along the bottom.

(4) Reverb - generally lower it a few decibels, it shouldn't be too obvious unless the environment is truly huge and/or characters are shouting.

(5) Audio cutting out - the lines sometimes end abruptly, especially noticeable when the reverb isn't allowed to fully play out. Example - the end of Luna's line on "perhaps" at 1:20. Also, some lines have an isolated breath at the end that shouldn't be there. Example - the end of Luna's line on "go" at 1:50.

I could probably nit-pick it more, but for now I think it's best to leave it at these general concepts. Happy to clarify things or provide more advice on specific elements if you need, and feel free to post more WiPs as and when. Also make sure you watch the panel segment here if you haven't already >>37868550, and try not to stress too hard about making it perfect for your first "real" audio. Focus on the basics and the rest will come in time.
>>
If I make a dataset with only one character, would it be fine to omit the character's name from the filenames?
>>
>>37869085
Specifically for the file naming system we use (HH_MM_SS_Character_Emotion_Noisy_Transcript), the character field needs to be populated as the scripts that work with them to auto-generate text transcripts expect labels in that specific format. For 15's naming system, I'd say stick to the guide on the site unless he says otherwise.
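For reference, a rough sketch of how that naming scheme breaks down in code (the helper name and the handling of an empty Noisy field are just my assumptions, not anything from the official scripts):

```python
# Sketch of parsing the filename convention described above:
#   HH_MM_SS_Character_Emotion_Noisy_Transcript
# Assumes the first six underscore-separated fields are fixed and the
# transcript is everything after them (so a capped split is used in case
# the transcript itself contains underscores). Hypothetical helper, not
# the PPP's actual tooling.

def parse_label_filename(stem):
    """Split a filename stem into its labelled fields."""
    hh, mm, ss, character, emotion, noisy, transcript = stem.split("_", 6)
    return {
        "timestamp": f"{hh}:{mm}:{ss}",
        "character": character,
        "emotion": emotion,
        "noisy": noisy,  # empty string when the clip isn't flagged noisy
        "transcript": transcript,
    }

fields = parse_label_filename("00_01_05_Twilight_Neutral__Once upon a time")
```

The point is just that the character field sits at a fixed position, which is why scripts that auto-generate transcripts break if you omit it.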
>>
>>37869085
If I may shill my old github scripts for a moment (if you don't plan to use the masterfile scripting tool):
https://github.com/AmoArt/Audacity2Transcript
https://github.com/AmoArt/ARPAbet-dialogue-converter
You can train a model without using the tagging system; however, if you're going through the trouble of doing the whole thing from scratch, you may as well go the extra mile and add the tagging system, as it will make your dataset easier to work with in future voice training scripts.
>>
File: very impressed.png (129 KB, 1739x1070)
129 KB
129 KB PNG
>>37866274
Pretty good, though the timing of some of the bars is a bit wonky. Keep it up, this has potential.
>>
>>37869054
Thanks for the rundown, that's precisely what I was looking for. Especially when it comes to SFX, I noticed that it did sound a bit 'empty' but I had no clue what effects could be added so the
>re-watching some throne room scenes from the show for inspiration
advice helps a lot. I can't believe I didn't think about it.
I'm still on the fence about music volume, right now it's apparently way too quiet but overdoing it wouldn't be ideal either. Although now that I think about it, both your and GothicAnon's audios tend to have relatively loud music and it doesn't usually drown out the voices much.
>>
>>37867480
I actually really enjoyed that. I agree with the others about music and sound effects, but I'll defo watch the final product
>>
File: ponder-ts-001.png (30 KB, 150x150)
30 KB
30 KB PNG
>>
>>37870642
15 should use this as the new favicon for his site
>>
>>
>>37868873
>You're looking at it. I got into an argument over A Royal Problem that ended with "if you don't like the episode write your own". So I did just that.
Based. The sisters deserve a good episode.
>>
>>37868541
You can change how they say things by typing | followed by a sentence that describes the preferred emotion,
like I hate you!|I love you!
>>
>>37867955
>Spongebob
Reminded me of pic related, wonder if someone's done this yet.
>>
>>37871173
Yes.
https://www.youtube.com/watch?v=58ZsCcIAs2M
>>
>>37871177
Noice, thanks for the laugh anon.
>>
File: PonyScript.jpg (195 KB, 625x850)
195 KB
195 KB JPG
>>37867480
>I felt that adding VN-like static pony images took away from the experience more than it added. Not to mention it'd lead to all sorts of comparisons with DRD which would really murder the overall mood.
Ah fuck it, gonna shill something I made an eternity ago:
https://derpicdn.net/img/view/2018/12/9/1903963.webm
It was designed to make stuff like TheeLinker's, but you can add whatever images you want, including gifs (so Desktop Ponies, or gif'ed renders of the show's leaked assets should work).

Another Anon asked me to improve it quite a bit for his own project, but he has gone silent for half a year now so I guess that got canceled.
Here is the latest version: https://mega.nz/file/BgkGlKyK#2uqcr56yweIupgvNB3h9cWy374cMPQmnmIqNGYnNdvM
He was nice enough to add a manual ("Helper" folder) and re-organize the default available sprites ("Pony Script_Data/Custom" folder).

I haven't touched it in forever, but in case someone wants low-quality visuals for low-ish effort, to slap in the background of their audio creation...
>>
>>37871855
That's pretty cool, thanks for sharing it here.
>>
IT'S UP
>>
>>37872077
?
>>
>>37872080
>>37864965
annnnd this is why people post 15's tweets here
>>
>>37872080
SIGMA
>>
>>37871855
Nice dubs Reviewfilly
>>
>finally got around to watching the AI Redub
>"Cookie did nothing wrong"
Based.
>>
>>37823975
Hey Delta, I remember you posted a DeltaVox model for RDP a while ago - would you mind sharing the dataset for it so I can host it in the master file?
https://desuarchive.org/mlp/thread/36904619/#36970930
https://desuarchive.org/mlp/thread/36904619/#36971414
>>
>>37872295
Except for the part where he let everyone walk over the PPP's code and steal it without accreditation because he used the (uber)cuck licence, sure
>>
>>37872714
Hey, I found this (no idea where I got it from, probably from one of the past threads in here).
Sadly those are NOT transcribed and are saved in the whole-episode format (and cleaned with RTX Voice, so the quality is so-so).
https://pastebin.com/N76u4PBx
>>
>>37872970
Should have said so to the guy who included that line then.
>>
>>37873028
I only watched a couple of the episodes when they were posted so I wasn't aware that line was included before the whole thing premiered. I would have pointed it out if I had seen it.
>>
>>37873035
>>37873041
I think everyone's in agreement with that, though. He did fuck up but he had plenty of chances to apologize for his fuck-up and then tried to shift the blame onto everyone else. There's nothing controversial about saying what happened.
>>
File: report_AND_IGNORE.png (478 KB, 1501x929)
478 KB
478 KB PNG
>outsider trannies starting shit again
oh boy, here we fucking go again
>>
>>37873063
Outsider Discord trannies would actually be defending the other side, but either way there's no need to bring this up again.
>>
File: Nightly Storm_VP8.webm (1.31 MB, 1920x1080)
1.31 MB
1.31 MB WEBM
>>37871855
I am very sorry I went silent, I was buried under IRL stuff.
I still continued to work on the translation though, and added some assets / found some more high-res ones.

Fortunately, I finished most of my renovation on my new flat, so I will be able to finish the help file, if you are still interested of course.

I tried to make some animation a while ago, and that's where I saw the need for automation. So I am planning to work on a python script to automate my workflow, but I need some time for that too, unfortunately.

>Pic rel VS
>https://u.smutty.horse/mehvchrufmp.webm
You can see it in this example: I first wrote the script and made it look good, then generated the voices, and lastly tried to make everything fit together.
That's not how to do it properly.
The soundless version is okay, but the one with sound is awful.
So my workflow will be:
Write the transcript
Python script generates the voices (or asks me to generate them and put them in a folder)
Python script generates an Audacity label file with the texts
I make the audio with effects and music in Audacity and export the aligned label file
Python munches it and generates a Shotcut file
Python munches it and generates a Pony Script file
Pony Script generates the video
Open the Shotcut file and make adjustments if needed

Tada, high yield shitposting workflow!
That is, when I have time to work on it (I started working on it a year ago, and could only dedicate 1-2h tops to this project).
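The label-file step is probably the easiest one to start with, since Audacity's label track is just tab-separated start/end/text lines. A minimal sketch (the per-line duration estimate is a made-up placeholder, not anything PonyScript computes; a real workflow would take durations from the generated voice clips):

```python
# Write an Audacity label track from a list of dialogue lines.
# The format is one "start<TAB>end<TAB>text" row per label, times in seconds.
# seconds_per_word is a crude placeholder assumption for sketching purposes.

def make_labels(lines, seconds_per_word=0.4, gap=0.5):
    rows = []
    t = 0.0
    for text in lines:
        # rough guess at clip length; replace with the real clip duration
        duration = max(1.0, seconds_per_word * len(text.split()))
        rows.append(f"{t:.6f}\t{t + duration:.6f}\t{text}")
        t += duration + gap
    return "\n".join(rows) + "\n"

labels = make_labels(["Hello there", "This is a test line"])
# Save to a .txt file and load it via File > Import > Labels... in Audacity.
```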
>>
>>37872979
Okay, that's better than nothing I guess. I suppose I could turn that into a proper dataset since I don't have much else going on now. I'll give Delta a bit longer to respond before I start working on that though.

On another note, I saw in 15's spreadsheet that Ghost is apparently in the pipeline, which implies that a dataset exists somewhere. I don't remember anyone ever posting a dataset here, so I'd be interested in hosting that too if anyone has it.

Alternatively, there are some old videos with the full "radio voice" effect that I think would be fun to feed to the AI, so if that dataset can't be found I guess I could clip these as well.
https://youtu.be/UWIlR07mTfM
https://youtu.be/f-i1TiM1ZCs
https://youtu.be/OU5NptER9e8
>>
>>37873042
I'm honestly surprised no one said anything about it.
>>
>>37873310
I don't even know which scene is it
>>
>>37873168
Does that mean we’ll be getting a RD Presents model AND a Ghost from TCR model?
>>
How long does 15.ai usually stay up for until it's taken down for updates/maintenance?

Writing a script but not sure if I should just make plans to do the voice myself.
>>
>>37873901
v24 will stay up indefinitely
>>
File: Forever.png (178 KB, 575x789)
178 KB
178 KB PNG
>>37873911
>>
>>37873862
Honestly why havent we done this yet
I have dvds of rdp, can I rip better audio than youtube from there?
I'm going to have to find a fucking diskdrive that connects through usbc
Does such a thing even exist
>>
>>37873911
Lol
>>
File: 1623301724225.jpg (75 KB, 585x529)
75 KB
75 KB JPG
>>37873911
big if true
>>
>>37873918
Don't get your hopes up. That anon is trolling you
>>
>>37873948
It could be him though
>>
>>37873901
Generally no more than ~6 weeks.

>>37873911
This will change the second a new way to improve the models becomes apparent. Please for the love of god, if anyone has projects that rely on the service, get them finished NOW. Don't push them off assuming the site will be there when you need it.
>>
>>37873951
>>
>>37873911
>new ip
>15 or at least someone claiming to be 15 already posted in the thread

I gotta say I'm a little skeptical
>>
>>37873951
Why won't 15 just use a tripcode?
>b-but tripfaggotry is... le BAD
If there's anyone who deserves (and needs) to use one, it's him.
>>
>>37873974
yeah but it's fun to spread misinformation
>>
>>37873911
I changed my mind. I'm taking the site down in ~30 seconds.
>>
>>37873974
I don't think he'd ever want to tripfag but I agree, if anyone should be tripfagging it should be him
>>
>>37873991
the training dungeon demands sacrifice
the aztecs got nothing on this madman
>>
>>37873966
I'm cautiously optimistic
>>
Pay up paypigs
>>
>>37872970
It was his code...
>>
>>37874210
That’s the fucking point, he fucked up by using the worst possible license and letting a scumbag company like uberdick profit off of the PPP’s hard work without even mentioning the anons who worked their asses off for this.
>>
>>37874218
Typical reddit demeanor
>>
Remember, don’t respond to the Discord trannies
>>
>>37874221
stop replying to the outsiders
>>
>>37874218
It was a training script and an inference script that Uberduck stole. All of that was Cookie's work. Cookie literally wrote those himself. You are getting pissed off because you want to act like you own Cookie's work. A license on the scripts would not have affected the data, which other anons worked on. A license on the scripts would not have affected the documentation, which other anons worked on. A license on the scripts would only affect the training script and inference script that Uberduck stole.
I'm not responding again. Sorry for wasting posts.
>>
Ubercuck fans are so pathetic
>>
>>37873077
>Outsider Discord trannies would actually be defending the other side
Shitposting has evolved to the point a person will commonly play both sides because their goal is and always has been to waste your time.
The only winning move is not to play, do not engage it at all even if they agree with you, not even with my post here.
Polite sage for offtopic.
>>
File: TrixMunch.gif (424 KB, 438x590)
424 KB
424 KB GIF
What's everyone working on at the moment? I haven't seen much from our codefags lately, though I assume that's just cause everyone's taking it easy for holiday season. I myself have been trying to get things in order to start on PTS5 finally.
>>
>>37874381
I'm slowly chipping away at the dialogue from a greentext, and planning to turn it into a simple plane animation in Blender (ETA after new year). I am very thankful to Clipper for the help with the narration + Anon dialogue, as well as to tempo anon who made a short audio intro. >>37856170
Other than that, I've picked up another fimfic that would be nice to turn into an audio episode; after that I will probably look into making my own stories.
>>
>>37873160
Cute scene.
Can't wait to see this completed.
>>
>>37874381
Honestly, I haven't been posting much because I got caught up on something really stupid that will almost certainly have no consequences.
I'm trying to see if there's a programming paradigm that hybridizes normal software development with machine learning in a way that makes chatbots more comprehensive and more tractable.

I updated my clone of Clipper's Master File, by the way. You can find it here:
- https://drive.google.com/drive/u/2/folders/1ho2qhjUTfKtYUXwDPArTmHuTJCaODQyQ
>>
>>37874551
Thank you.
But as it's a demo file I made to show the tinting abilities of PonyScript, I highly doubt it will be extended.
But I will definitely post it when my new workflow is working.
>>
>>37874553
>hybridizes normal software development with machine learning
Like GitHub Copilot? Or mixing symbolic AI with machine learning?
>>
>>37873923
>Honestly why havent we done this yet
Never thought of it until just now, and I'm usually busy with other stuff. Also, the only source for Rainbow Dash Presents that I know of is the videos on YouTube, which will all have music and stuff which isn't ideal. Lack of easily available material means it goes lower on the priority list.

>I have dvds of rdp, can I rip better audio than youtube form there?
Anything from DVDs will almost certainly be better than what's on YouTube, however I'm not aware of anything that was ever on DVD, could you elaborate a bit more on what exactly you have?

>I'm going to have to find a fucking diskdrive that connects through usbc
>Does such a thing even exist
A quick look on Amazon suggests that such things do exist.

>>37874381
I've been wanting to do another audio thing but I keep getting distracted with dataset work. Hopefully I'll have time to make something after I'm done with sorting out Ghost and Greg.
>>
>>37875319
>I'm not aware of anything that was ever on DVD, could you elaborate a bit more on what exactly you have?
Not him but I have the DVD too. Greg sells the RDP series on a DVD at dawnsomewhere.com/store
To reiterate, it's only the Rainbow Dash Presents series (about a dozen episodes iirc). Does not include positive/negative MAS or the Nepotism Adventure.
>>
>>37875340
>Greg sells the RDP series on a DVD
Somehow I never came to be aware of that. The more you know I guess.

>Does not include positive/negative MAS or the Nepotism Adventure.
That's okay, the exact episodes don't matter too much as long as there's enough audio to build a decently sized dataset.
>>
>>37875354
Funnily enough, I ordered a DVD drive connected via USB a week ago and received it yesterday. I never tried ripping stuff from DVD but it shouldn't be that hard. I'll see if I can grab the audio later today.
>>
>>37875374
DVDRipperSpeedy does a pretty good job (just need to make sure you set it up to rip audio at 5.1).
>>
>>37875423
Thanks but I'm on Linux so I just grabbed the first command line program (42kB) that looked alright. I'm currently mirroring the DVD. The videos are stored as VOB, I think ffmpeg should be able to convert them to MKV given that it's just a different container. Then one more command to extract the audio and it'll be ready for upload.
>>
It's a race to see who can assemble a dataset the fastest
This should be easy boys
>>
>>37874808
More like mixing symbolic AI with machine learning. I'm trying to work out what complex things need to go into specifying a complete chatbot, what the compositional structure of those things are, what their software equivalents are, and what data is necessary to get the same things.
>>
File: 1595947203902.png (217 KB, 622x289)
217 KB
217 KB PNG
>>37875478 (me)
Update, I think I successfully got the audio out losslessly. Unfortunately VOB files have no headers so AFAIK it's impossible to split them back into individual videos. For now it's just one large audio file. If anyone wants to double check that I haven't mistakenly ruined the quality, here's what I did:
>cat *.VOB > output.VOB
>ffmpeg -i output.VOB -map 0:4 -codec:a copy output.mkv # 0:4 is the audio channel
>(drag the mkv onto Audacity with ffmpeg plugin installed), export > 24-bit signed WAV, export > 16-bit signed WAV
>upload both to mega
Note that the last two steps are likely unnecessary, as the original .mkv is about 5-10x smaller than the exported WAVs. I'll post the link once it finishes uploading
>>
>>37875687
I might suggest MakeMKV. It's available on Linux; you do need a key but you can grab one from their forums (it's free while it's in perpetual beta). It really doesn't get much easier for ripping DVDs or Blu-rays.
>>
Haha phew you guys saved me needing to buy a disk drive for one dvd
>>
File: 15329.gif (1.35 MB, 1000x562)
1.35 MB
1.35 MB GIF
>>37816133
>>37875687
Rainbow Dash Presents DVD audio rip:
https://mega.nz/folder/3XoAQa4Q#HHP8LJURzUBqRskaDazAqg

Table of contents:
*Excuse any mistakes, the original ToC was not compatible with the rip so I had to do it by hand
0:00 - 9:41 - DVD bonus commentary
9:41 - 21:15 - RDP: Bubbles
21:15 - 41:49 - RDP: Cupcakes
41:49 - 51:51 - RDP: Somewhere Only We Know
51:51 - 1:04:49 - RDP: Spiderses
1:04:49 - 1:31:12 - RDP: Captain Hook the Biker Gorilla
1:31:12 - 1:52:20 - RDP: Haunting Nightmare
1:52:20 - 1:58:52 - RDP: A Beautiful Day in Equestria
1:58:52 - 2:26:16 - RDP: My Little Dashie
2:26:17 - 2:50:42 - RDP: Bittersweet
2:50:42 - 3:13:04 - RDP: The Star in Yellow
3:13:10 - 3:16:04 - Alicorn Day
3:16:17 - 3:23:20 - Investment Losses
3:23:21 - 3:24:15 - Budget Impasse
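If anyone wants per-episode files, the single rip can be cut at the timestamps above without re-encoding. A rough sketch, assuming ffmpeg is on the PATH and the rip is the `output.mkv` from earlier; the chapter list just mirrors the ToC:

```python
import subprocess

def to_seconds(ts: str) -> int:
    """Convert a ToC timestamp like '1:04:49' or '9:41' into seconds."""
    secs = 0
    for part in ts.split(":"):
        secs = secs * 60 + int(part)
    return secs

# (start, end, title) entries copied from the table of contents
chapters = [
    ("0:00", "9:41", "DVD bonus commentary"),
    ("9:41", "21:15", "RDP - Bubbles"),
    # ... remaining ToC entries ...
]

def split_cmd(src: str, start: str, end: str, title: str) -> list:
    """Build an ffmpeg command that stream-copies one chapter's audio."""
    return ["ffmpeg", "-i", src,
            "-ss", str(to_seconds(start)), "-to", str(to_seconds(end)),
            "-codec:a", "copy", f"{title}.mkv"]

# for start, end, title in chapters:
#     subprocess.run(split_cmd("output.mkv", start, end, title), check=True)
```

Stream-copying (`-codec:a copy`) keeps the audio lossless, same as the original extraction step.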

>>37875713
Thanks, I'll keep it in mind for ripping other dvds in the future.
>>
>>37875921
Thanks anon, much appreciated. One more thing: does the DVD have any subtitles? If so, could you also upload the .srt file to help with transcribing? If not, I could probably grab them from the YouTube versions, but I'd assume any on the DVD would be more accurate.
>>
>>37875960
I double-checked and no, I don't see them anywhere. They're not on the DVD; ffmpeg doesn't see them in any of the files, and playing the DVD normally doesn't offer them either.
>>
>>37875993
Okay, nvm then. YT's auto subtitles usually do a decent enough job so I should still be able to get something useful from there.
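For reference, yt-dlp can grab the auto subs without downloading the video (something like `yt-dlp --write-auto-subs --skip-download <url>`; flags from memory, double-check the docs), and the resulting .vtt flattens into plain text easily. A minimal sketch; the cue layout below is an assumption based on typical auto-sub output:

```python
def vtt_to_lines(vtt_text: str) -> list:
    """Flatten a WEBVTT file into deduplicated plain-text lines."""
    lines = []
    for line in vtt_text.splitlines():
        line = line.strip()
        # skip the header, cue numbers, timestamp lines, and blanks
        if not line or line.startswith("WEBVTT") or "-->" in line or line.isdigit():
            continue
        # auto subs repeat each line across overlapping cues
        if not lines or lines[-1] != line:
            lines.append(line)
    return lines
```

The output still needs a manual pass, since auto subs have no punctuation and mangle pony names.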
>>
>>37873978
the world isn't all doom and gloom, champ. keep your chin up
>>
>>37876012
Nice
Other than auto subtitles, do any transcripts of RDP exist to reference in case the subtitles come out wonky?
>>
File: ghost.jpg (7 KB, 215x235)
7 KB
7 KB JPG
>>37816133
The Ghost dataset is ready. Audio is taken from the videos linked here >>37873168, totalling ~35 minutes.
mega.nz/folder/0UhSmYAB#WBrB-qCprQTofkAhwMp5CQ
Other sources -> Ghost (True Capitalist Radio)
Audacity labels can be found in the "Label files" folder in the master file 1 if you need them.

Note that these lines all have the "radio voice" effect, and so are comparatively low quality, but that's not an issue in this case since that's how Ghost is supposed to sound. Just make sure it doesn't affect any of the "clean" voices from other sources during training.
Also, all instances of Ghost going full cans.wav are marked with triple exclamation points in the transcripts to differentiate them from normal speech.
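For the curious, the "radio voice" effect is essentially a telephone-style band-pass. A toy illustration of what that does to a signal; the 300-3400 Hz cutoffs are a generic telephony assumption, not measured from these broadcasts:

```python
import numpy as np

def radio_bandpass(signal, sr, lo=300.0, hi=3400.0):
    """Crude FFT brick-wall band-pass: zero everything outside lo..hi Hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# demo: a 100 Hz hum plus a 1 kHz tone; the hum gets stripped out
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
y = radio_bandpass(x, sr)
```

That band-limiting is also why mixing these clips with full-bandwidth sources in one model could bleed the effect into "clean" voices.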

RDP using the DVD audio from >>37875921 is up next, this one will take a bit longer.

>>37876899
None that I know of. Suggestions are welcome.
>>
>>37877056
Well done Clipper. I can't wait to be able to troll Ghostler with his own voice.
This will spawn a new generation of trolling.
>>
File: 1636302988708.png (867 KB, 900x889)
867 KB
867 KB PNG
https://u.smutty.horse/meijeeyxwnq.ogg
>>
>>37877760
I forgot to add: while making this, I noticed that when using "embarrassed" as the emotional context word, Fluttershy's model in particular will start adding natural stutters by itself. For example, the "T-totally by accident" was not intentional, nor was the "s-so could you help me".

The extra breaths and pauses were caused by just throwing a bunch of commas between words.
>>
>>37877760
>>37877769
That's remarkably good, the inflection on "funny story" is superb. 15?
>>
>>37877814
This is 15's model, yes.
>>
>>37877814
The quality and intonation make it obvious that it's 15.ai to anyone who has used it, but there's always that little bit of uncertainty if the poster doesn't specify it.
I used to think it was kind of silly, but now I understand why 15 is always so adamant about giving proper credit and not mixing it up with crappy ubercuck models. Those greasy fucks don't deserve to be confused with actual quality work.
>>
>>37877861
That's why I don't understand how there are still people posting shit here that they got from 15 without specifying they got it from 15. It's literally his only rule and somehow even posters here fuck it up.
>>
>>37877861
>ubercuck
Fucking kek
Surely those faggots should have known when they chose their name that people were gonna call them that, right?
>>
>>37877861
>ubercuck
>>
>>37859877
https://voca.ro/164yHlMbzE4b
Holy shit, got this beautiful take
The voice crack at the end just makes it 10 times better
>>
File: Madeinheaven.jpg (82 KB, 300x300)
82 KB
82 KB JPG
A little bit of a preview of something I started working on. Here's Twilight Sparkle singing Made In Heaven by Queen.

https://u.smutty.horse/meikwfrjkhp.mp3
>>
File: cute mare.jpg (35 KB, 290x465)
35 KB
35 KB JPG
Would it be sacrilege to do away with standard TTS for PTS entirely? Looking at the upcoming character list for 15 kinda makes me want to wait on it and then just see how it turns out if I make it solely with 15AI doing ALL of the voices.
>>
>>37877056
But A Message by Ghost has horribly distorted sound that doesn't fit the rest of Ghost's broadcasts; his voice never sounded that way in other episodes before or after it.
>>
File: soat.png (1.03 MB, 1438x950)
1.03 MB
1.03 MB PNG
I tried "StyleGAN of All Trades" (https://github.com/mchong6/SOAT) with TPDNE. It's a set of ways to manipulate pretrained StyleGAN models: you can stitch together images to make a panorama, stretch a single image, interpolate between two images, etc.
Pic related has some examples. Unfortunately the quality isn't great, as TPDNE can be finicky (I didn't spend that long looking for good seeds) and most of the methods are designed for pictures of landscapes/buildings.
At the top is a panorama, which just places faces side-by-side and occasionally connects them together.
At the bottom are vertical and horizontal blendings of two faces. These are especially tricky because unaligned faces produce very distorted results.
>>
>>37878570
TPDNE almost always looks anthro due to anthro being included in its dataset. Some of those examples there are really bad.
>>
>>37878576
Yeah, these examples aren't good. But I only spent about 30 minutes, and everything else was either covered in blob artifacts, elongated in weird ways, etc. I also tried to replicate good samples from the TPDNE website, but I couldn't figure out how to generate the latents from the seed number in the filename.
>>
>>37878426
Do what you want. Hopefully our TTS eventually catches up to 15's in terms of quality and naturalness. In the meanwhile, there's no reason you need to limit yourself as a content creator.
It'd be neat if you and other content creators could provide your best clips (+ some flawed ones?) and the corresponding transcripts/contextualizers so we can use them for further training.
>>
>>37878570
>>37878576
The left examples are horrible (mostly due to shoulders and pose) but the Little Pip +Twi/Luna look fine.
These two horizontal blend pics are pretty interesting because of how strongly they differ from the source pics. The right one kind of looks like Colgate and the left one is astonishingly disgusting.
>>
File: 1610010376276.png (1 KB, 27x63)
1 KB
1 KB PNG
>>37878211
Amazing!
>>
>>37878570
That's pretty crazy, it somehow learns how to add way more details in the process.
>>
File: ghost.png (44 KB, 324x318)
44 KB
44 KB PNG
>>37877056
Ghost TalkNet: 1ThtuJtqNG_SOfzpG27uo7Thq04VAw33V
Sample: https://u.smutty.horse/meioonixcsl.ogg

Trained on A Message by Ghost. I tried Tacotron as well, and the quality turned out roughly the same. Thoughts on America is too noisy, and wouldn't align at all.
>>
>>37877861
I don't think any of the AI models, especially the competitors, have been able to successfully do heavy breathing, stuttering, or moans like 15's. It's somewhat of a trademark.

That may change eventually as the only one I haven't tested is Talknet with clean voice acting reference. But honestly the thought of moaning into a microphone is too much for me.
>>
>>37879355
Honestly that doesn't really sound like Ghost, not to mention the really bad static and noise in the background.
>>
>>37879355
How many steps did you train it for (both the Tacotron 2 and the spectrogram decoder)?
>>
>>37879403
Using moaning clips with TalkNet doesn't work nearly as well, it usually results in a weird robotic hissing noise.
>>
File: index.png (57 KB, 655x424)
57 KB
57 KB PNG
>>37879439
>A Message by Ghost
TalkNet was trained to 200 epochs. Tacotron did best after about 800 steps. Any more, and he starts to stutter and skip syllables.
>Thoughts on America
Alignment looked like this the whole time. I gave up after 20 minutes.
>>
>>37879355
I NEVER SAID THAT YOU SPLICING PIECE OF CRAP GODDAMN IT *cans.wav*
>>
>>37879355
Ehh it’s decent, but I think Ghost trollers should wait until the 15 version for maximum trolling efficiency. It wouldn’t have the same impact if Ghost heard the TalkNet version first.
>>
Found a few interesting things on arXiv.

>EdiTTS: Score-based Editing for Controllable Text-to-Speech
https://editts.github.io/
A way to splice clips together that sounds more natural than doing it in Audacity. Haven't tried it yet, but the samples seem good. Code is available.

>SingGAN: Generative Adversarial NetWork For High-Fidelity Singing Voice Generation
https://singgan.github.io/
New vocoder that's a lot better at handling singing. No code available, but worth keeping an eye on.
>>
>>37879569
>SingGAN
>not SinGAN
They had one job. The other team didn't fuck it up.
>>
>>37879611
You could say it's a...
Sin
>>
File: CARLOS!.png (36 KB, 125x127)
36 KB
36 KB PNG
>>37879649
>>
File: large.png (200 KB, 1101x1024)
200 KB
200 KB PNG
https://u.smutty.horse/meirebcjapk.wav
>>
>>37877056
I've completed the RDP episodes up to and including "Somewhere Only We Know". I'll see if I can complete the rest over the weekend.

>>37878426
No complaints from me; I'm all about progress so do whatever you think will make the best work.

>>37878440
Due to the whole "radio effect" thing, Ghost's voice is much more susceptible to variation, so I'd say there's no "true" sound to it. I selected those three videos primarily for how easy they were to work with. Other broadcasts are frequently interrupted by stream donation sounds, voices from external callers and so on, whereas these sources were just Ghost on his own. If you can show me a source of what you consider to be Ghost's "true" voice that's just as easy to work with, I'll consider adding it to the dataset as well.
>>
>>37880545
The first few episodes of the 2016 return were fully transcribed on the first wiki, and I managed to find one in the Web Archive
http://web.archive.org/web/20160817111015/http://truecapitalist.wikia.com/wiki/Capitalist_Episode_227/Transcript
https://www.blogtalkradio.com/ghost/2016/03/25/true-capitalist-radio-hosted-by-ghost--episode-227
I suggest using this one because it's a ~1 hour monologue in his most typical and consistent quality.
>>
>>37880615
I keep getting an access denied error when I try to download it, switching VPN doesn't work. Could you throw it up on mega or similar for me?
>>
>>37880650
Fuck. I tried googling for an alternative but only managed to find a huge ass archive of everything at once, ep 227 should be in it
https://archive.org/download/true-capitalist-radio-archive/True%20Capitalist%20Radio%20%28Return%20Era%29%20%28Part%201%29/
>>
>>37880650
>>37880713
Oh, the whole thing is on yt actually
https://www.youtube.com/watch?v=ZQc_2id_6WY
>>
>>37880713
>>37880729
Okay, I should be able to assemble all the required material from these. Thanks.
>>
File: 1614343263320.png (91 KB, 448x484)
91 KB
91 KB PNG
>>
Is there a clean version of Princess Celestia's pained scream from MLP FiM S04E02?
The scream in question:
https://u.smutty.horse/meivniuplgr.wav
>>
>>37881931
It's not in the 5.1 front center channel?
>>
I'm making a val_filelist and my question is: do I just copy and paste the files from the train_filelist into the val_filelist, or do I take a few files away from the train_filelist and put them in the val_filelist?
>>
>>37881931
>>37882194
It's mixed in with the sound effects, so no clean version here unfortunately. The best I can do is remove the music.
https://u.smutty.horse/meixryzlczs.wav
>>
>>37882451
Take them away. The same lines must not appear in both files. Think of it like a test: the val_files are the exam and the train_files are the study material. You don't put the exact same questions from the study material on the exam, or else the student won't learn anything; he'll just recite the answers.
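In script form, the split is just a disjoint partition of the filelist's lines. A minimal sketch; the 5% validation fraction and the filenames are arbitrary choices, not anything the trainer requires:

```python
import random

def split_filelist(lines, val_fraction=0.05, seed=0):
    """Shuffle filelist lines and partition them into disjoint train/val sets."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # fixed seed so the split is reproducible
    n_val = max(1, int(len(lines) * val_fraction))
    return lines[n_val:], lines[:n_val]  # (train, val)

# usage:
# lines = open("filelist.txt").read().splitlines()
# train, val = split_filelist(lines)
# open("train_filelist.txt", "w").write("\n".join(train) + "\n")
# open("val_filelist.txt", "w").write("\n".join(val) + "\n")
```

Shuffling before splitting matters: filelists are usually grouped by episode, and a tail-end slice would make the val set all one recording condition.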
>>
What are the recommended iterations or epochs for the duration predictor and pitch predictor in TalkNet?
>>
>>37882648
Leave them on default. It won't make a real difference, and they're only used if you disable reference audio.
>>
Any updates on 6b Delta?
>>
Does anyone have any ideas for how to give audio a sort of traveling 3D effect? I know you could sort of do it in Audacity, but I wonder if there are any programs out there that might make it easier.
>>
>>37816124
Man, are we gonna have Her-level AI in our lifetime? It feels like we could, but idk shit
>>
>>37883874
I think where it's at is already pretty damn convincing to my dumbass, so yeah, probably
>>
>>37882452
this is great! Thank you Clipper!
>>
>>37884219
What did you mean by this
>>
>>37884219
what did you mean by this?
>>
>>37883851
https://www.youtube.com/watch?v=bP8-EbShd9o

Is this what you're after?
>>
>>37884219
>>37884287
Good to see fellow rewatchfags here but please try to see what thread you're in.
Friendship is Tragic was kino though.
>>
>>37884298
I'm not watching the rewatch stream. Context?
>>
>>37884298
oh fuck, I've mistaken the rewatch tab for the PPP tab, sorry for posting my shitpost here
>>
>>37884303
They're watching edgy shit.
>>
>>37883950
Noice and china likes mlp and ai so we got that going for us
>>
>>37884686
How new are you
>>
>>37884686
>china
MIT isn’t in China retard
>>
NEW THREAD
>>37884994
>>
>>37884706
Bing qilin


