/mlp/ - Pony
File: AltOPp.png (1.54 MB, 2119x1500)
Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE

The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.

Technology has progressed such that a trained neural network can generate convincing voice clips for any person or character using clean audio recordings as a reference. As you can surely imagine, the ability to create audio in the voices of any pony you like has endless applications for pony content creation.

AI is incredibly versatile: basically anything that can be boiled down to a simple dataset can be used as training data to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.

Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.

EQG and G5 are not welcome.

>Quick start guide:
derpy.me/FDnSk
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.

>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.

>Active tasks:
Research into animation AI
Research into pony image generation

>Latest developments:
GDrive clone of Master File now available (>>37159549)
SortAnon releases script to run TalkNet on Windows (>>37299594)
TalkNet training script (>>37374942)
GPT-J downloadable model (>>37646318)
FiMmicroSoL model (>>38027533)
FiMfic dataset (>>38029649)
Delta releases new GPT-J notebook + tutorial (>>38018428)
Another FIMfic downloader (>>38051136)
New DeltaVox (>>38064386)
New TTS Notebook: derpy.me/a2Ceh (>>38185345 dunky11)
New FiMfic GPT model (>>38308297 >>38347556 >>38301248 GothicAnon)
Help convert fla to xfl (>>38346245)
FimFic dataset release (>>38391839 GothicAnon)
KoboldAi training tutorial (>>38422872 GothicAnon)
Research into Centipede Diffusion V3 text to image (>>38623679)
DALL-E update (>>38780715)
Backup doc (>>38821254)
Offline version of GPT-PNY (>>38821349)
Vinyl Scratch model (>>38838776)
Dall-E limiting free usage (>>38863016)
Latest Synthbot progress report (>>38830930)
Latest Clipper progress report (>>38867324 >>38872255)

PPP REDUB
-Ep4 NOW TAKING CONTRIBUTIONS!

>The PoneAI drive, an archive for AI pony voice content:
derpy.me/LzRFX

>The /mlp/con live panel shows:
derpy.me/YIFNt

>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
mirror: derpy.me/c71GJ

>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.

Last Thread:
>>38820901
>>
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: derpy.me/FDnSk
Main: derpy.me/lN6li

>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: derpy.me/A8Us4
How to get the best out of them: derpy.me/eA8Wo
More detailed explanations are in the main doc: derpy.me/lN6li

>Where can I find content made with the voice AI?
In the PoneAI drive: derpy.me/LzRFX

>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
derpy.me/pVeU0
derpy.me/Jwj8a

>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.

>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.

>What about fan-imitations of official voices?
No.

>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.

>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.

>I have an idea!
Great. Post it in the thread and we'll discuss it.

>Do you have a Code of Conduct?
Of course: 15.ai/code

>Is this project open source? Who is in charge of this?
derpy.me/CQ3Ca

>Links
/mlp/con: derpy.me/tledz derpy.me/14zBP

PPP Redubs:
Ep1: derpy.me/xZhnJ derpy.me/ELksq
Ep2: derpy.me/WVRAc derpy.me/RHegy
Unused clips: derpy.me/VWdHn derpy.me/OKoqs
Rewatch Premiere: derpy.me/EflMJ
Ep3: derpy.me/b2cp2 derpy.me/RxTbR
Ep4: In progress!
>>
File: biganchor.jpg (161 KB, 640x640)
>>38894195
Anchor.
>>
File: 1658542202229259.png (82 KB, 656x656)
PPP FiM E4 AI REDUB
The time has come to continue the /mlp/ AI Redub series! This time, we're redubbing season 1 episode 4 "Applebuck Season" with AI, and we want the whole board to contribute. See the Info Doc for more details.

Rules & info:
https://docs.google.com/document/d/14YzE-WBTH4xvwP2vU-Uk28T3j-iWKhf5s3DS-LKv4M8

The spreadsheet:
https://docs.google.com/spreadsheets/d/1uWLs6z1nz49VNfBfI4rB3rKHiCj81YDO6j0U987zuHs
4-3 and 4-9 may be available shortly if their current claimants don't show any signs of life.
>>
>take an interest in ai
>maybe I could go to school for this
>find out there are gender studies is mandatory
i didn't want to believe it, but /pol/ was right.
>>
>>38894230
>that gender studies is mandatory*
>>
>>38894230
>>38894235
Fuck off to some other board with your political bullshit.
>>
File: AdorableFilly.jpg (52 KB, 1024x1024)
>>38894215
On the note of the redub, we previously agreed to establish a hard deadline once ~75% of clips were submitted. We're at 74.07%, so I think it's time.
Currently I'm thinking the deadline should be Thursday, August 18th, about 3 weeks from now. The idea is that I can put it together on the following Friday, and we'll have it all ready to go for the next new content stream on Saturday the 20th.
Anyone, especially our current claimants, have any thoughts on the matter?
>>
>>38894230
>gender studies is mandatory
It is for basically anything at university. I managed to get past it by taking a course on the history of rock music, which was actually alright. Despite satisfying the category, there wasn't really any gender nonsense involved.
>>
>>38894230
The Long March Through the Institutions was completed a long time ago. Academia, Media, and any kind of bureaucracy, national (and often state and local), were done for ages ago. You either participate in the system and learn to lie, or you learn to live outside the system, scavenging from it for your own purposes like the PPP has done. Hilariously, I remember Penny Arcade complaining about this exact thing (bullshit trendy left-wing courses) back in '02. Things have been rotting for a while.
>>
>>38894230
>going to school
>for anything tech related
If you need a degree for a job or something, I sympathise. But if learning is your only goal, you'll get more out of online resources than from wasting your effort at a typical college.
>>
>>38894282
Sounds fine to me.

As soon as 4-10's done with its final editing I'd be happy to continue with another clip to help meet the deadline, provided I'm still motivated at that point and have a decent idea for any of the remaining clips. I'll keep you posted.
>>
File: unknown.png (93 KB, 1113x927)
Hey, guys, I'm encountering an error when using RealESRgan, can anyone help?
>>
>>38894984
>RealESRgan
Can you post a link to it? I don't see it in either the main or quick start doc.
As a possible quick fix, I suggest factory resetting the Colab and manually running it from the top, one cell at a time.
>>
>>38894984
Undefined usually means something that was meant to run earlier didn't, because it's requesting a function that hasn't been defined yet. Were all the cells before it run correctly?

Also, unrelated to the above, apparently the "fixed" GPT-PNY Colab file wasn't actually openly shared, whoops. It should be available now. Pretty sure it's still broken again now, though. Thanks, cowboy, for the inadvertent heads up.
>>
>Page 9
>>
I wonder how 15's going with the new voices, and the pressure to get them released by his usual Soon™ deadline.

Hopefully we'll get more ponies in time for Christmas?
>>
>>38895648
Almost done
>>
>>38895708
Thanks for the heads up to wait another 2 months.
>>
>>38894215
4-3 and 4-9 have been re-marked as available.
Also officially setting the deadline for August 18th. >>38894282
>>
File: image.jpg (565 KB, 1242x1814)
>finally ditch my old ISP because I was overpaying for slow copper connections
>now I have gigabit fiber and i'm still paying less
>make post on /mlp/
>Posting from your IP range has been blocked due to abuse. [More Info]
>my new ISP is rangebanned
>I can only post from my phone now
>probably not for long, i've been rangebanned before, sometimes it's lifted
>sometimes pick up new IPs and find out I was banned on boards I don't even go on
fucking shitposting faggots are gonna get me rangebanned again
>>
>>38896109
>The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.
And what exactly does your post have to do with this thread?
>>
>>38890820
V2 that's on DeltaVox
>>
What would happen if I fed talknet a full VO booth session? Like, with a dozen takes of every line. Would that improve the result over one take per line or make it worse?
>>
Do we have any Sergeant yelling voices?
I'd like this voiced with some improved wording:
>>38896062
>>
>>38896228
Spitfire.
>>
>>38894348
>Penny Arcade complaining about this exact thing (bullshit trendy left-wing courses) back in '02
Could you share some of those strips?
>>
>>38895148
>Also, unrelated to the above, apparently the "fixed" GPT-PNY Colab file wasn't actually openly shared, woops.
We told you that at least 5 times.
https://desuarchive.org/mlp/thread/38781041/#38784643
https://desuarchive.org/mlp/thread/38781041/#38791870
https://desuarchive.org/mlp/thread/38781041/#38792892
https://desuarchive.org/mlp/thread/38781041/#38794356
And someone pointed it out least once in the cytube, but I'm not going to look through hours of content to find that one.
>>
>>38896225
It would probably improve the result. If, like with decision transformers, you condition the network on some recording quality information, like with a "final take" flag, it would almost certainly improve the result.
>>
>>38896225
With reference audio? The result would improve, assuming you're going through the outputs and picking the best ones.
>>
>>38896548
>>38896689
there are about 6 good takes and 1 best take. would it be prudent to make the 6 good takes training data and the best take verification data?
>>
>>38896109
There are botnets that can help you.
>>
>>38896727
Oh, I thought you were referring to generating with an existing voice, not creating a new model. In that case, more data is usually better, as long as it's good data. If it's all clean vocal booth sessions with no noisiness I can only see it helping.
>>
>>38896532
The reason I thought it was already openly shared is that on the Colab it was/is set to anyone with the link. But apparently the file itself (the .ipynb) on the Google Drive wasn't, which was causing the inaccessibility. Didn't think access would have to be granted twice, which is kinda cringe. Only checked because the email informing me sent me to the Google Drive page rather than the Colab itself.
>>
https://u.smutty.horse/milajmhqvih.wav
https://u.smutty.horse/milajmicyqu.wav
https://u.smutty.horse/milajmievtg.wav
https://u.smutty.horse/milajmifpuk.wav
>>
https://u.smutty.horse/milamxaihzq.mp3
>>
>>38895104
Yeah sorry for the delay. Have some Midjourney pony!
>>
And another

It's no DallE2. Anyone got Dalle2 ponies?
>>
>>38895708
maybe we could beta test v25 like last time if you want
>>
>>38895104
https://colab.research.google.com/drive/1k2Zod6kSHEvraybHl50Lys0LerhyTMCo?usp=sharing
>>
>>38897273
fresh oc
>>
>>38897238
my nirik
>>
I order you to give me your waifu
>>
File: Trixmug.png (394 KB, 3500x4729)
>>38897303
>singles
No.
>>
>>38896866
i'm trying to augment the glados dataset with unprocessed booth recordings.
>>
>>38897848
That seems like a bad idea. I predict that when you try to generate speech with this model, some words will have the auto-tune effect applied to them and some won't. It's not going to sound like GLaDOS.
>>
>>38897861
there are multiple ways i could prevent that, one of which i've chosen. we'll see what happens
>>
>>38897884
So, what way did you choose?
>>
>>38897897
https://u.smutty.horse/milcsspgnhm.wav
>>
>>38898003
Sounds really good. If only a new version of this could be made: https://www.youtube.com/watch?v=u_DSTiUPeaY
>>
>>38898020
not bad for no ref audio!
https://u.smutty.horse/mildcfzqxxv.wav
>>
>>38898078
There used to be concerns about using data from multiple sources because of mismatches in recording setups and such causing problems, even if it's all from the same actor. I wonder if what you're doing here could be used to alleviate that.
>>
>>38898078
The GLaDOS lines in that comic dub were made entirely with 15.ai back in mid-2020. I think the weakest spot is when GLaDOS says "...and why can't you love science like [insert co-worker's name here]?" There was no easy way to make her say it like she says it in Portal, at a much higher pitch. That's literally a different inflection Ellen McLain used.

"Develop" at 10:07 is also pronounced wrong; 15.ai's GLaDOS at the time struggled mostly with long words. Beyond that, the only other thing in that comic dub I wish could be changed is at 14:06, when Abimation (voice of the Companion Cube) says "reverse grid" instead of "reserve grid".
>>
>>38898164
>>38898092

Here are some comparisons with existing GLaDOS models, all with no reference

15.ai (truncated to text limit):
https://u.smutty.horse/mildmlfafet.wav

Uberduck Tacotron (portal 1):
https://u.smutty.horse/mildmkxuruw.wav

Uberduck Tacotron (Portal 2):
https://u.smutty.horse/mildmlakpfd.wav

My previous finetuned Portal 2 GLaDOS Talknet model:
https://u.smutty.horse/mildmyykqqp.wav

MYSTERY TALKNET MODEL:
https://u.smutty.horse/mildmlctkmw.wav

Thoughts?
>>
>>38898214
>ubercuck
>>
>>38898217
>existing models presented without bias (demonstrating how bad they are, even)
>>
>>38898222
Even acknowledging their existence is bad because they solely exist to profit off of others' work.
>>
>>38898226
you're hopeless
>>
>>38898233
>new IP
The Discord shills are coming
>>
>>38898214
They're all better than 15.ai was in 2020, and if that Lab Rat comic dub were ever done again, either by the same guy or someone else, GLaDOS's lines could sound much improved.
>>
>>38898307
Kill yourself, and tell you tranny friends to do the same.
>>
>>38898307
Fag
>>
>>38898307
I will take this bait and believe that you're 100% serious!
>>
>>38898333
>bait
>>
>>38898333
If you search up "4chan" in their Discord server, they have links to the PPP all over the place. There was even a coordinated raid from literal underaged Discord trannies to make ubercuck look good last year when anons discovered they were stealing the PPP's work.
>>
>>38898344
Genuine fucking parasites. They even tried to hire someone to make a Wikipedia article for them when they found out 15.ai had one but it got struck down due to a clear COI.
>>
>>38898354
How are they still around? Don't you have to sign up just to use the site?
>>
>>38898370
They're all literal children (like, aged 13 through 16) who don't understand that they're being manipulated and used by a soulless company, and who think that they're part of some hardworking community. Even Cookie got embroiled in this shit when he was working for them literally for free.
>>
>>38898377
Isn't cookie like ~20 though
>>
>>38898381
Yes but he's also hopelessly autistic judging from his posts early on in the PPP so I assume he was just as easily manipulated as the others.
>>
>>38898354
Remember when they were peddling NFTs to children and when people found out about it their defense was “b-but we needed the m-money to keep the site up :((“ lmao
>>
Report and hide. Do not respond.
>>
File: TimeToDisco(7)_0.png (888 KB, 1280x768)
>>38897288
Here's what Disco Diffusion made out of it
>>
>>38898354
Can someone spoonfeed me-what's uberduck?
>>
>>38898627
No. It's not important or relevant.
>>
>>38898627
A bunch of greedy talentless people wanted to make their own 15.ai with blackjack and hookers, but the result of their shitty work was horrible, so they could only lure people in with deception and fraud.
>>
>>38898627
Shitty unethical company that deceives children to do their work for them and tries to market themselves as the "better 15.ai" while being utterly garbage
>>
gotcha
>>
>>38898344
>There was even a coordinated raid from literal underaged Discord trannies to make ubercuck look good
I kek'd hard at this.

Man, the ridiculous and retarded things youth get up to these days. Sometimes makes me worry about the future of humanity though...
>>
>>38899185
Don't worry about it, there is no future.
>>
I'm addicted to Midjourney, send help.
>>
https://u.smutty.horse/milhsxcaddw.mp3
>>
File: 1635674487111.png (98 KB, 803x677)
>>38899506
If that awkward "uh..." from Twilight was 15.ai that's pretty great.
>>
>>38899453
Yeah, me too. Just remember you get more hours with relaxed mode rather than fast; infinite hours if you have the 30 monthly sub. It's slower since it isn't instantly queued, but you get way more out of it over time. There's so much great art that can be made through AI these days. It's like a modern Renaissance.

Btw, use "--v 3" without quotes in your prompt to use the new V3 model, and "--q 0.5" usually gets better results, especially for characters.
>>
>>38899599
Yeah, the "uh" is from 15.ai. Ya gotta run a bunch of times but you can get some good "uh"s, "eh"s, and "ah"s
>>
https://u.smutty.horse/miljyixrrnv.mp3
>>
>>38898354
>Genuine fucking parasites.
Low empathy.
>They even tried to hire someone to make a Wikipedia article for them when they found out 15.ai had one but it got struck down due to a clear COI.
Low empathy does that.
>>38898370
>How are they still around?
Mega capitalism favors the mentally ill.
>>38898627
NPCs.
>>
File: Spoiler Image (1.03 MB, 1024x1024)
AIs need baptism and regular exorcisms.
>>
File: Spoiler Image (907 KB, 1024x1024)
AIs need frequent exorcisms.
>>
File: 1644953085365.jpg (18 KB, 240x240)
>>38900399
>>38900404
My man, what the fuck did you type in.
>>
>>38900399
>>38900404

what generated that!?
>>
File: a10.jpg (6 KB, 144x146)
>>38900399
>>38900404
Those are some dark synthwave/ industrial music album cover material...
>>
>>38900464
>>38900399
This is the cover of "In The Court Of The Crimson King" in an alternate universe
>>
>>38900399
>>38900404
This is harshing my buzz pretty bad.
>>
>>38900446
According to the uploader: "rarity crying with a big mouth with sharp teet".
>>
File: 1638715806697.gif (1.81 MB, 640x360)
>>38900404
>>38900399
>>
File: why.jpg (56 KB, 1024x1024)
>>38900404
>>38900399
>>
Does TalkNet still work with Firefox? I tried using it a while back and it wouldn't let me use it.
>>
>>38900484
>teet
>teeth
Jesus Christ.
>>
>>38901370
Typos, at least in DALLE-2, make the AI freak out in strange ways.
>>
File: ChromeSolePurpose.png (1.28 MB, 1080x1617)
>>38901296
As far as I'm aware, it's still not working with Firefox. Had to install Chrome exclusively to use it.
>>
>>38901576
Damn, I guess that explains why it hasn't worked. It used to work for me months ago, but stopped when I tried using it again recently. I really don't want to use Chrome for anything. Do we need a Google account to use the Colab?
>>
>>38901583
>Really don't wanna use Chrome for anything
Same, which is why I only use it for TalkNet specifically.

>Do you need a Google account to use the Colab?
Pretty sure you do, yeah.
>>
>>38901583
The offline TalkNet doesn't need Chrome or a Google account, but on the flip side it does require a Windows machine.
It also skips all the waiting once you've run it and downloaded the models.
>>
File: am i retarded.png (124 KB, 878x1421)
Trying to train Tacotron 2. What's going wrong here?
>>
>>38901661
>data_path got removed from the path parameters for generating mels
cookie
>>
>>38901676
>this script has you put the wavs in /wavs/out and then deletes the duplicate wavs instead of putting the wavs in /wavs and the npys in /wavs/out like the original implementation was designed to do, then has several hacky fixes to prevent it from breaking because it does it wrong.
>>
>>38901676
real schizo hours
>>
>>38901658
I fixed the Docker image for the Linux version (for now). You need an NVIDIA card and the NVIDIA Container Toolkit.
>>
Is there a way to easily sort through wavs by speaker? I've got a few datasets with a mix of speakers, and listening to the files and separating them into folders will take forever.
>>
>>38901661
OK, i solved this problem by just copying my dataset into the correct subdir in the tacotron2 folder and using all the defaults, but now it insists that none of the npy files it JUST CREATED exist.
>>
>>38901916
Unless the filenames contain a name/symbol for who is speaking at the moment, you will most likely need to sort them by hand. Yeah, this part of the preparation (and creating transcript files) is pretty much why there is such a massive lack of non-FiM voices; it takes a LOT of effort to sort all of it from scratch.
>>
>>38901921
Does anyone have any insight? I double checked that my file lists are right. It just seems to not be able to find the files no matter where I put them or what the list says.
>>
>>38902005
I'm not an expert, but maybe replacing all the */....*///*bumblefucknowhere relative paths in the code with absolute paths might help?
>>
>>38902015
that's the weirdest part, there aren't any for this part.
>>
File: stableponies.png (1.47 MB, 1023x1536)
Stable diffusion, but I can't get a decent Applejack.
>>
>>38902371
Did you try the prompt "a cute orange background pony eating an apple" ?
>>
File: Spoiler Image (478 KB, 512x512)
>>38902488
I'm so sorry.
>>
Is there more material on render traces than this? https://ponepaste.org/7569
More python examples would be nice.
>>
>>38902371
all her friends are here and applejack isn't.
>>
How hard would it be to port Univnet to Sortanon's talknet? It uses the same spectrogram format, right?
>>
>>38902527
A cutie.
Well, hairy short legs, but still.
Maybe add "in the style of My Little Pony"?
>>
>Page 9
>>
Does anyone have a script for REVERSING Anon’s Transcript Generator? That is to say, converting from the old style of transcripts to the current one? I'm trying to use an old dataset.
>>
>>38901916
You can use the speaker clustering/similarity tools in https://github.com/DanRuta/xva-trainer
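If you'd rather roll your own, here's a rough sketch of the same general idea using resemblyzer speaker embeddings plus k-means (not what xva-trainer does internally, just the approach; folder names and speaker count are made up, adjust to your data):

# pip install resemblyzer scikit-learn
from pathlib import Path
import shutil
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav
from sklearn.cluster import KMeans

SRC = Path("mixed_wavs")        # hypothetical input folder
DST = Path("sorted_by_speaker")
N_SPEAKERS = 4                  # a guess; tweak and re-run

encoder = VoiceEncoder()
paths = sorted(SRC.glob("*.wav"))
# One fixed-size embedding per file; clips of the same speaker land close together.
embeds = np.stack([encoder.embed_utterance(preprocess_wav(p)) for p in paths])
labels = KMeans(n_clusters=N_SPEAKERS, n_init=10).fit_predict(embeds)
for path, label in zip(paths, labels):
    out = DST / f"speaker_{label}"
    out.mkdir(parents=True, exist_ok=True)
    shutil.copy(path, out / path.name)

You'll still want to spot-check each folder by ear, but it beats listening to everything.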
>>
File: Applejack.png (263 KB, 512x512)
>>38903229
This is the best I can get.
>>
>>38902920
UnivNet expects Mels with a 12 kHz cutoff, but TalkNet generates Mels with an 8 kHz cutoff, which is incompatible. You could take the final output of TalkNet, which is 32 kHz, and use it to make a UnivNet-compatible Mel, but UnivNet's output is only 24 kHz, so it'd be a downgrade in quality.
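To make the mismatch concrete, here's a toy illustration (librosa, not TalkNet's actual mel code): two mels computed from the same audio with different fmax values are different features, so a vocoder trained on one can't consume the other.

import librosa
import numpy as np

y, sr = librosa.load("line.wav", sr=22050)  # hypothetical clip
mel_8k = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80, fmax=8000)
mel_12k = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80, fmax=12000)
print(mel_8k.shape == mel_12k.shape)   # True: same shape...
print(np.allclose(mel_8k, mel_12k))    # False: ...but different filterbanks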
>>
Any new voices added to TalkNet? Having more male pony voices would be pretty great.
>>
>>38904251
Shit tons of singing models, depending on when you last checked. Just not officially added to the character list so you have to load them as custom models:
https://ponepaste.org/5733
>>
>>38904073
That's not bad in the grand scheme of things.
But, yes, not quite Applejack indeed.
>>
>>38901658
I have a few questions:
How long does it take to download a model?
Where/How do I go about downloading the models?
How much processing power/RAM/etc does using TalkNet take up? Is it very light, or is it gonna eat up memory or something?
>>
>>38894215
I am very sorry, Anons, I couldn't make time to work on 4-4.
Next week will be packed for me, so there is very little chance I can continue working on it.
Please free it.
I will keep working on it if I find some time.
Most probably, some anon will take it.
At worst, my version will be an unofficial alternative section.
At best, no one will have taken it before the deadline, and it can be used.
The show must go on!
>>
>>38904661
TalkNet uses VRAM, so any NVIDIA GPU with 4 or more gigs of VRAM would do. TalkNet has a Windows installer and a Docker image for Linux.
>>
Downloaded Chrome, only ever gonna use it for TalkNet until a fix comes for Firefox. But I decided to play around and get a good feel for it again, been a while. Did a little line with Rainbow: https://u.smutty.horse/milxwwuezwm.wav
>>
>>38904847
>has only 128mb of VRAM
rip
>>
Are there any new male character models coming to TalkNet officially? Trying to find a good male character to be an Anon voice, but all the male character voices that sound good are on other TTS programs that are nowhere near as good or expressive as TalkNet or 15ai.
>>
>>38904938
The only one who would know is SortAnon, and afaik he's been MIA for a while now. Worst case you can ask in here or maybe T:EM/P/O for an Anon voice actor.
>>
>>38904938
I may be able to train one (singing/non-singing) if you have a dataset available
>>
>>38905003
Unfortunately, I don't. But if anyone has a dataset for a Yuri Lowenthal voice, I'd love to see something come from that. I always loved his voice, and he just sounds like a good Anon type character to me.

In any case, how would I go about creating a dataset? I can't imagine it's as simple as grabbing voice clips and stuffing them into a single folder, right?
>>
>>38905016
There's a process involving labeling audio in audacity and generating files using some scripts, see Clipper's video
https://www.youtube.com/watch?v=Bsu7mwa-QGY
>>
>>38905016
>In any case, how would I go about creating a dataset? I can't imagine it's as simple as grabbing voice clips and stuffing them into a single folder, right?
You're stuffing them in a folder, pruning the ones that are wildly different from the normal speaking voice, cleaning the sounds of unnecessary breathing or screaming noises, and then creating a training list in the format
wavs/yourfilename1.wav|Your transcription.
wavs/yourfilename2.wav|transcription.
wavs/asdasasdas.wav|transcription.
wavs/etc.wav|crgdgdfgfgn.

then, cutting and pasting a few of the nicer-sounding entries into validation.txt instead. After that, you select all your wav files and compress them. The zip file should be called wavs.zip. Then, in the interface, you give it your training.txt, validation.txt and wavs.zip and hit next a bunch of times. Assuming you haven't made any spelling mistakes or included symbols that will just get discarded, fine-tuning from that point is a bit of an arcane process.
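If you're lazy, a quick sketch for generating those files once you have transcriptions in hand (names made up; whether wavs.zip needs a wavs/ folder inside depends on the notebook, so check before uploading):

import random
import zipfile
from pathlib import Path

transcripts = {  # filename -> transcription; fill in your own
    "yourfilename1.wav": "Your transcription.",
    "yourfilename2.wav": "transcription.",
}

lines = [f"wavs/{name}|{text}" for name, text in transcripts.items()]
random.shuffle(lines)
val_lines, train_lines = lines[:5], lines[5:]  # a few nicer ones for validation
Path("training.txt").write_text("\n".join(train_lines) + "\n")
Path("validation.txt").write_text("\n".join(val_lines) + "\n")

with zipfile.ZipFile("wavs.zip", "w") as zf:
    for name in transcripts:
        zf.write(name, arcname=name)  # flat zip, like selecting the files and compressing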

I think cookie's tacotron might be better for emotions, but I haven't tried it yet.
>>
>Page 9
>>
>>38904089
Thanks for the detailed answer! I appreciate it.
>>
>>38904089
I have a question, does hifigan use the full-sample-rate files for learning even though talknet doesn't, or is it safe to pass 22khz files in?
>>
>>38904938
Yeah, I've been putting subtitles whenever Anon speaks and using Sans Undertale.
>>
>>38904938
I'd be happy to make a model. Been looking for more datasets to train. Who are you interested in?
>>
>>38906008
While I sadly don't have a dataset on me right now, I've always wanted to hear a Yuri Lowenthal model, something tame and similar to his Ben Tennyson voice, or better yet, Peter Parker from the PS4 Spider-Man game. There, his voice is pretty much smooth and to the point. I don't know why, but I always imagine Anon sounding like that.
>>
>>38905960
The TalkNet training notebook finetunes a HiFi-GAN that converts from 8 kHz cutoff Mel to 22.05 kHz audio. The inference notebook uses that vocoder, and an additional HiFi-GAN (shared between all voices, not finetuned) that converts from 22.05 kHz to 32 kHz.
In short, it's safe to use 22.05 kHz audio for training. If you use audio with a different sampling rate, it will be converted to 22.05 kHz before training.
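If you'd rather pre-convert yourself instead of letting the notebook do it, something like this works (librosa + soundfile assumed installed; paths hypothetical):

from pathlib import Path
import librosa
import soundfile as sf

out_dir = Path("wavs_22k")
out_dir.mkdir(exist_ok=True)
for src in Path("raw_wavs").glob("*.wav"):
    y, _ = librosa.load(src, sr=22050, mono=True)  # load + resample in one go
    sf.write(out_dir / src.name, y, 22050)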
>>
Is 15 still working on the site, or even around? I know he played TF2 a lot; does he still do that? Just would like to know if he's still around and doing alright.
>>
>>38907129
I'm sorry, Anon, but there are no refunds.
But the real answer is that only 15 knows when he will push any new updates, so dunno when that will be.
>>
>>38907136
I don't mind if it takes a while, as long as he's doing alright and isn't burning himself out. I don't know what everyone's opinion is of him these days, but I still really enjoy his work and can't wait to see what the next version of the site brings.
>>
>>38907129
He posted ITT >>38895708
>>
>>38907172
Oh shit, I completely missed that. I know he hasn't really posted in these threads in ages, but it's good to know he lurks here from time to time still. Still, that's pretty great. :)
>>
I have so many ideas I want to do both as stories and as audios and even art, it's hard to stay focused on one thing. I've been holding off on the Trixie audio for the last year plus, so I'm going to try and force myself to do that one first.

After that, I have another Fluttershy audio planned and a Twilight one (pony-Twi, no Sci-Twi). Trixie will probably be around 10 minutes, Fluttershy 5-ish, and if I do it right, Twilight will be 30 minutes, but that'll probably be more dialogue and stuff similar to PonyASMR with some slight casual sex added in. Debating whether or not I want to have a voice for Anon in that one because otherwise there would be long silences between Twilight talking and I don't want her to just be repeating what Anon is saying to tell the listener that's what (You) said.
>>
>>38908219
>>>/trash/
>>
>>38908219
I have a morbid curiosity for doing VO. I don't know if I would suck at it or not but it always has interested me.
>>
Any good laughs or giggles that could pass for Trixie? Kind of stumped and there are only a couple laugh sounds for her in the MEGA folders.
>>
>>38893401
I can join a call on Tuesday.
I've been lazy with the animation stuff recently. I'm working out shape labeling stuff I mentioned in >>38887110 now. Let's see if I can get it done tonight.
>>
>>38908660
>I'm working out shape labeling stuff I mentioned in >>38887110 now. Let's see if I can get it done tonight.
https://u.smutty.horse/mimlzkmdvxz.zip
I'm using a hash of the DOMShape (after stripping unnecessary whitespace) for the shape filename. That should be easy enough to calculate consistently. I might update the render trace format to include this hash in shapes.json.
I'm working out how to integrate it properly into the script interface now. I might do something like this:
- If the output path ends with /.assets.sample, I'll dump a sample of each asset and a sample frame from the scene.
- If the output path ends with .sample-shapes, I'll dump a sample of each shape.
- If it ends with .sample-doc, I'll dump a sample frame from the scene.
- If it ends with .sample-assets, I'll dump a sample of all assets.
- If it ends with .sample, I'll dump a sample of everything.

The only reason I'm adding .sample-doc is that some XFL files contain a single frame with a background image. In these cases, the whole background is neither a shape nor an asset.
With these changes, the filename labels can refer to either a document, an asset, or a shape. I'll need to make a few more changes to account for that. But before that...
I've already run into some incompatibilities between my old filename labeling scheme (from last year) and the recent one. I'm going to port all of the current labels over to the new scheme first, then make the interface changes.
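For anyone following along, the shape-filename idea is roughly this (the exact normalization and hash function are Synthbot's; SHA-1 and simple whitespace collapsing here are just assumptions for illustration):

import hashlib
import re

def shape_filename(dom_shape_xml: str) -> str:
    # Collapse whitespace so formatting differences don't change the hash.
    normalized = re.sub(r"\s+", " ", dom_shape_xml.strip())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest() + ".png"

The point is that the same shape always maps to the same filename, so it can be calculated consistently across exports.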
>>
>>38907159
I just wish he got appreciated for what he actually contributes. He's not revolutionizing speech synthesis as a whole, but he is revolutionizing text-to-speech specifically, in very interesting ways. He gets appreciated as "man who make website for funny voice memes", but it's not really about how good the voices sound; it's about how responsive and lifelike they are.
>>
>>38907053
Let me see if this is in my bag of tricks for this week. I'm knee-deep in trying to get a totally custom microphone capsule on the market by September, but I'll see if I can't rip him from Spider-Man and maybe even grab transcripts too, if they're associated with the audio files somehow.
>>
>>38909671
I mean if worse comes to worst I could always play Anon. I've been trying to avoid having my voice in 4chan stuff now that I'm more serious about voice work, but if that's really the voice you're looking for, that's more or less my voice. I'll see about this Peter model first though.
>>
Live in ~1 hour, I'll be doing general prep and organisation (no actual work yet) for my next voice project, later joined by Synthbot to discuss animation assets >>38908660.
https://cytu *dot* be/r/PonyPreservationProject
>>
>>38909692
Give some voice work, Anon. Let those moist little cords strum.
>>38909667
I agree. 15 has completely changed the way text-to-speech is perceived in just the last three years, and it's crazy how people treat it like some casual little gimmick. While a bunch of Anons here were making good strides, he just slid in and made history. That's not to diminish the other anons' work, but their work might've otherwise gone unnoticed or incomplete had 15 not come onto the scene. Maybe that's licking his boot too much, I don't know. But I can say that 15's work will be referenced plenty going forward, and stuff like his, combined with TalkNet tech, would be more than just revolutionary.
>>
>>38910149
The reference audio would be a major game changer if it wasn't so damn complicated to use.
>inb4 "It's actually not hard, read the 50-page instruction manual (no, not that instruction manual, this one, and instead of doing step 19 in that one do step 37 from the other one) and then figure out the dozen error codes it invariably throws and it works fine."
>>
>>38910162
reference audio with talknet? you literally hit one single button on sortanon's talknet interface
>>
>>38910197
I think he's talking about building up the system that would allow the use of reference audio, not using it in its 'complete' form.
>>
>>38909667
His impact is huge, but I don't think he cares whether or not he's appreciated by others. Kind of based.
>>
>Page 9
>>
File: Scootabloom.png (365 KB, 1492x799)
>>38894199
>>38909717
Quick summary of the stream:
Synthbot taught me how to use Adobe Animate with the .fla files to change pony poses, which should be helpful for future voice works similar to The Tax Breaks (new project likely starting in ~1-2 weeks). Also discussed workflow for animation data in general. The idea is to start with small steps, rendering individual frames of characters and backgrounds and feeding them to an image-generating AI. From there, move on to making the data more usable for more AI frameworks. This will involve rendering all the "clean" animations labelled in the previous labelling run, and then having anons watch the animations to determine if they are clean or noisy. There will be ~10,000 animations to sort so any help with this would be much appreciated.

Further work that will need to be done from there to make the data more AI-friendly is not yet known and will require Synthbot to do some more research, but having this sorted animation data ready to go will be a big boost.

>>38908974
To-do list:
For character images:
>Render all the complete images associated with each "clean" character animation asset in a high-resolution png, sort by character
>De-duplicate if possible
>Have some way to identify the original .fla file each one came from
>(Optional) Re-upload only the relevant .fla files for the clean character assets

For background images:
>For all backgrounds that can be broken down into individual assets/shapes, re-render all these backgrounds and components (in a high resolution) to a new folder with the same filename as the original background image
>>De-duplicate if possible

For completed animation sorting:
>Render all "clean" animations
>>De-duplicate if possible
>Upload in batches of 100 or so
>Determine some system for labelling (just put animations in a clean/noisy folder)

I think that covers everything, let me know if I missed anything.
For clarity, the rendering of character images and backgrounds is mainly for anons who want to use these assets/images directly for content creation, such as using the backgrounds for wallpapers or backdrops for animations. The completed animation sorting is specifically for refining the animation dataset.

Stream link:
https://youtu.be/iWfBJzJtVzc?t=4015
>>
>Page 8
>>
>>38909667
He doesn't get nearly enough attention, credit, or appreciation. Companies refuse to mention him because they don't want to admit that his tech is better than their commercial one. Academics refuse to acknowledge him because he's associated with the hacker 4chan. Normies, YouTubers, and TikTokers are happy to use his site but fail to give him credit. Even anons here have accused him of some nasty stuff even though he puts /mlp/ above everyone else.
>>
>>38911240
The ironically best thing about this whole mess? 15's work will outlive any and all of the shit companies have tried to push out, and they will only be able to dream of being on his level. He made immense strides from December 2019 to April 2020; that's four months and he did more in that time than most companies did in ten years.
15 should be absolutely goddamn proud of himself. Fuck the companies, fuck the academics, and fuck the people who undermine his efforts.
15, if you're reading this, keep on rocking the fuck on, because I love everything you do, and I know there are millions around the world that can say the same. Your work will live on where others will fail. Be proud of that.
>>
>>38911240
It's going to be that way until he publishes his paper, which will also reveal what his real name is.
>>
File: yhapu.jpg (6 KB, 225x225)
>>38911750
>which will also reveal what his real name is.
>giving up his anonymity
that is a very heavy price to pay
>>
>>38911750
If he's submitting to a conference or even just posting on arxiv, then he'll probably have to use a real name. But he could avoid that by only posting a PDF to his website. Depends on what he means by "formal" publication.
>>
>>38911767
His identity will be something completely unexpected, like a central african catholic priest
>ze poneis, zey ah a gif furom ze holeh farther
>>
>>38911240
I'll never understand Cookie for what he did
>>
>>38912114
He's an autistic zoomer faggot, don't waste time trying to understand retards like that.
>>
>>38912114
Jealousy. And also this >>38912123
>>
I saw a few papers on arxiv doing the same sort of emotion coding 15 is doing. He's taking so long on his paper that other people are starting to publish similar and even more advanced stuff. The state of the art is catching up with him, unfortunately, because he's gotten so distracted by feature creep. By the time he publishes his paper, it will already be outdated.
>>
>>38912536
>implying he's not revising his paper
>>
>>38912544
I work in CG and have done a fair amount of research. There's a pit a lot of researchers get stuck in: they keep getting distracted by better ways to do things, and the paper ends up delayed indefinitely because they keep thinking of better and better ways to implement their idea. But specifics of implementation are often out of scope for many papers, so you can quickly lose track of the point and end up having the field as a whole surpass your work before your paper is finished. I had that happen on a few of my projects. I understand he wants to put forth the best implementation he possibly can, but this project has bloated massively in scope, and he's going to collapse under the weight of it unless he consolidates.
>>
>>38912552
I don't mean to seem overcritical here; I'm just worried about him, because I'm very well acquainted with academic burnout and it's grueling. I hope he's well and taking care of himself.
>>
>model = (r"D:\vosk-model-en-us-0.22-lgraph")
>recognizer = KaldiRecognizer(model, 22050)

recognizer = KaldiRecognizer(model, 22050)
File "C:\Python\Python37\lib\site-packages\vosk\__init__.py", line 138, in __init__
self._handle = _c.vosk_recognizer_new(args[0]._handle, args[1])
AttributeError: 'str' object has no attribute '_handle'

I'm testing some offline text transcription but I'm hitting the above problem: Python for some reason is unable to pass the model directory string along as the first argument to the vosk module. Does anyone know how to get around the '_handle' conversion problem?
>>
>>38913220
`model` should be an instance of the Model object, not a string. Something like `Model(model_path="...")`. See:
>https://github.com/alphacep/vosk-api/blob/master/python/example/test_empty.py#L7-L8
>https://github.com/alphacep/vosk-api/blob/master/python/vosk/__init__.py#L45
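In other words, something like this (your model path, kept as a raw string):

from vosk import Model, KaldiRecognizer

# Construct a Model first, then hand the Model object (not the path string)
# to KaldiRecognizer.
model = Model(model_path=r"D:\vosk-model-en-us-0.22-lgraph")
recognizer = KaldiRecognizer(model, 22050)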
>>
>>38913244
using the:
>model = Model(model_path="D:\......."
causes this error:
raise Exception("Failed to create a model")
Exception: Failed to create a model

and if I place the directory in this spot:
>recognizer = KaldiRecognizer("D:\.......", 22050)
I once again get this error:
AttributeError: 'str' object has no attribute '_handle'
>>
>>38913496
Just realized, I think it would be easier if I just share the original code tutorial I'm trying to work with, to see what exactly is different between my process and the one the guy is doing (btw, I'm using Windows 7 and Python 3.7):
https://www.youtube.com/watch?v=3Mga7_8bYpw
>>
>>38913533
https://github.com/alphacep/vosk-api/issues/1015
I tried running the basic 'empty' example test, but even that didn't work until someone suggested messing with the SSL certificate and telling it to go fuck itself, after which it finally loaded.
Also, it turned out the tutorial guy's info was wrong: you don't pass the model directory BEFORE all the tech guts, it goes IN the tech guts (AKA: r("D:\.....\vosk-model-en-us-0.22-lgraph\vosk-model-en-us-0.22-lgraph"))
>>
>>38913650
https://u.smutty.horse/minbkfogydg.py
The small progress so far: managed to complete the guy's tutorial with some minor modifications (like using the large 1.8G model https://alphacephei.com/vosk/models). The mic test transcribed 'one, two, three' as "the one to free the", so there's a chance this whole effort could be fruitless, but I'm gonna carry on with it just to see how much can be done.
>>
>>38913761
https://u.smutty.horse/mincjqghyaj.py
More progress: worked out how to load a single wav file and have it transcribed (the process takes about 30 seconds). In the next few days I will test ideas for loading multiple audio files, transcribing them, and exporting the transcriptions as text files. I'm thinking of two different formats: first, one large list containing all the proper training entries (fileName.wav|transcripText;), and second, the original PPP export format of a text file named the same as the audio file with the transcription saved inside it.
>>
>>38912536
>the same sort of emotion coding 15 is doing
Cookie was the one to discover that...
>>
>>38913761
>the one to free the
It is a prophecy. What did it say next?
>>
>>38913761
why not use google cloud
>>
>>38914420
Alphabet, maybe?
>>
I've started to see more and more commercial voice transfer/conversion models that at least match or surpass Controllable TalkNet, while I feel like the open source options have all come to a complete standstill.

Is quality vocal transfer likely to end up commercially gated at this point?
>>
>>38914420
Honestly? No reason other than an autistic desire to have everything potentially available offline, just in case the people who operate Google Colab go full retard and ban and/or cripple the use of the AI tools on their servers.
Also, I have a pretty lofty collection of voices from a variety of games but can't be arsed to transcribe everything by hand (and while random game characters are not poni, it would be nice to use some male voices other than Sunburst/Shining/Wheatley/Spongebob when it comes to dubbing random characters with ten or so spoken lines).
>>
>>38914770
That's what it's looking like. Just look at what's happening to Dall-E.
>>
>>38914770
you vill only use ze paid for vocal transfer vith code of conduct restrictions and censorship

you vill be happy
>>
>>38914880
At least until consumer computers start to catch up over the next decade, maybe sooner. There will eventually be diminishing returns on the advantage of raw computing power, at least when it comes to things like voice replication and image generation, and people will eventually be able to afford the processing power to get 90% of the way to "perfection" by themselves. Commercial and research applications will likely have long moved on to more complex things by that point, and we'll be asking this same question about those things.
>>
File: 1659399862796061.png (207 KB, 598x404)
>>38914880
>Just look at what's happening to Dall-E.
The first one was operating for a year and a half, and 2 was published less than half a year ago; just a few days ago the NovelAI dude posted this and some other images based on the mini architecture, uncucked from all the "ministry of correct thoughts" meddling, and the results already look pretty good.
Stuff that is made by OpenAI/Facebook/whatever big corpo will always look more impressive on the basis that they have millions of dosh to throw at it, but it will always lack the passion and autism of people who actually give a shit. Before the PPP started, how many TTS models out there actually conveyed any emotional expression in their voices? I personally don't know of any, but perhaps there were some; maybe even a dozen, maybe less than a handful.
We are living in interesting times, like when the printing presses were slowly replacing the monks copying books by hand, and years/decades/centuries later those machines were replaced by typewriters, which in turn got replaced by computers and printers. All of this tech is too good for people to just look at it and say 'nah, this can't be made better', since it only takes a handful of turbo autists to come along and shake things up in a way nobody expects.
>>
>>38914917
But that's the thing, everything leads to monetization in (big C)apitalism. Yes, that's great that PPP spearheaded TTS, but if Ubercuck is any indication then there are many companies champing at the bit to try and charge for it. Eventually a giant corpo is going to come along and 15 is going to be made an offer even he won't be able to refuse.
>>
>>38914927
>Eventually a giant corpo is going to come along and 15 is going to be made an offer even he won't be able to refuse.
Or more realistically, if he does refuse they'll just take that amount and throw it at other people who will do exactly what he's done but this time with seven figures behind it.
>>
>>38914927
>(big C)apitalism
While I like 15 and the anons in this thread, I do agree with you that writing enough zeros on a check will break folks' loyalty, especially if the alternative is getting flooded and suffocated by six dozen half-baked copycats.
But I still believe in the PPP and the spirit of independent folks carrying on. Since there are entire groups/communities of people able to not follow the normie trends, I do believe there will be horsefuckers and other codefags who keep fighting the good fight to keep free AI tools free and available for everyone, no matter how hard corporations with near-unlimited budgets try to bribe them or pay off armies of shitpost raiders.
Poni will always exist, as long as there are people caring for poni.
>>
>>38914927
>>38914948
>>38914952
I thought 15 already turned down a huge figure for his work?
>>
>>38914957
Every man has their price. To think otherwise is naivete.
>>
File: twil-e.png (1.44 MB, 1024x1024)
You look at this AI mare and tell me there's a God
>>
>>38914986
I have seen god and it is this mare.
>>
File: 1833583.png (449 KB, 900x800)
Any chance there's a singing Sunset Shimmer model available for Talknet?
>>
>>38915112
>>>/trash/
>>
>>38914977
>"How does 15 zeros sound?"
>>
>>38915307
"You do know that adding more zeros behind the decimal point doesn't make it any bigger, right?"
>>
Synthbot, do you remember the zstd/tar command parameters you used for compression?
Any special compression/dictionary options?
>>
>>38915671
(I'm talking about the XFL archives.)
>>
>>38914986
When I want to take a step back and revel in AI progress, I like to frame it in my head like what I'm looking at is being presented to someone at the start of the PPP.
>>
I ran an integrity/validation check on all the symbol files in the XFL archives.
Found a few files in xfl-files-part1.tar.zst that are cut off. It's not a lot of files, but it's strange that it happened.

final-xfl-conversions/xfl-files-0-10a/MLP506_424/LIBRARY/IMP02&#042Hoof.xml
final-xfl-conversions/xfl-files-0-10a/xfl-files-10/MLP506_388A/LIBRARY/FIMP01&#042Hair_02.xml
final-xfl-conversions/Animation XFL 11-20/MLP509_551/LIBRARY/PC&#042Tail_change_animated.xml
final-xfl-conversions/Animation XFL 11-20/MLP509_532A/LIBRARY/IUC06&#042Blink_front&#032.xml
final-xfl-conversions/Animation XFL 36-49/MLPTST_01020100421110540/LIBRARY/PP&#042Tail.xml
final-xfl-conversions/Animation XFL 36-49/215_075_000_song_v4/LIBRARY/gfdhgfderw.xml
final-xfl-conversions/Animation XFL 36-49/MLPTST_010_04212010_11_2_52160/LIBRARY/PP&#042Tail.xml

I discovered it after trying to count the tag usage of each symbol file as a warm up exercise to work with the XFL archives.
Just in case someone finds it useful:
https://pst.klgrth.io/paste/djjy8
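For anyone who wants to reproduce the scan, XML-parsing every symbol file and reporting failures catches these (a minimal sketch, assuming the archives are extracted under final-xfl-conversions/):

import xml.etree.ElementTree as ET
from pathlib import Path

for path in Path("final-xfl-conversions").rglob("LIBRARY/*.xml"):
    try:
        ET.parse(path)
    except ET.ParseError as e:
        print(f"{path}: {e}")  # truncated/cut-off files fail to parse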
>>
Are there any free TTS with emotions? I feel like, even with reference audio, talknet is only good at one tone of voice per model, or it becomes muddy.
>>
>>38915671
>>38915797
Interesting. That's not a zstd issue. It looks like those files are just broken. Maybe Animate crashed part-way through those conversions. I can look into it tomorrow.
I didn't use any special parameters. This seems to work fine to extract the files:
>cat ../xfl-files-part1.tar.zst | zstd -d | tar x
>>
>>38916200
If you've already tried 15.ai and didn't get the desired level of emotion, then you're in a spot of poor luck. Better luck later, Raphael. Your emotional nuance is too precise for our current emotional fidelity.
>>
>>38916239
15's quality isn't my issue, it's the service. I'd like to train my own models with emotion without overloading 15's email inbox with requests and data updates. I find it hard to believe, after those replies about the competition catching up, that there's nothing we can use on our own machines besides TalkNet and a broken Tacotron.
>>
15 could release two papers: DeepThroat 1 and then 2. That way he could make his AI open source, get credit, and continue working on it.
>>
Any p9ssibiloty for fairseq Pny? Gpt j is amazing, but we have even bigger models then evrr before completly open source. Maybe even a neox model? That would be a little much though
>>
>>38917263
Go drunk anon you're home
>>
File: Thumb.png (433 KB, 776x950)
2 WEEKS REMAIN.
To our current claimants, let me know if you're still working on your clips.

The August 18th deadline draws closer and we still have a few unclaimed clips, those being 4-3 (STAMPEDE) and 4-9 (Mayor Speech). 4-11 (Twivestigation/Sleepy) also said they're willing to relinquish if anyone wants that clip, due to IRL stuff preventing progress. >>38879527

Finally, claiming 4-4.
>>
>>38917366
but this is home
>>
>>38917734
That's what he said.
>>
>>38917751
sorry i'm drunk
>>
>>38917793
We know.
>>
>you have to guess her name in eldritch speak and in reverse alphabet
Bravo Roberta, bravo.
>>
>>38918077
Fuffey is a qt
>>
>>38894199
https://u.smutty.horse/minnzcjkaye.zip
100% offline transcription for a pile of individual short wav files. I included a Readme.txt, so do look into it before running this Python script. Other than that, I tried to make it as "goofproof" as possible; one downside is that it does require editing two lines in the script itself and downloading the 2GB transcription model before running it.
I would highly appreciate it if Anons could test this script, so I know it works correctly on their end.
>>
>>38918149
Just one more note I forgot: the script will not be able to read the wav files if the directory has '()' or '[]' brackets (I would imagine some other special symbols would mess with reading the directory too), so keep the format simple or just temporarily move the wav files elsewhere.
>>
>>38917679
I'm still hoping to complete 4-23 before the deadline.
>>
>>38916530
you can write a script to copy only the files with the emotion you want to a folder, then train a talknet on that, and have separate talknets for every emotion.
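A minimal sketch of that split, assuming Clipper-style filenames where the emotion tag is one of the underscore-separated fields (the field index and extension here are assumptions; adjust to your dataset):

import shutil
from pathlib import Path

SRC = Path("all_clips")
EMOTION = "Angry"
DST = Path(f"talknet_{EMOTION.lower()}")
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.flac"):
    parts = path.stem.split("_")
    # e.g. 00_01_05_Twilight_Angry_..._.flac -> parts[4] is the emotion tag (assumed)
    if len(parts) > 4 and parts[4] == EMOTION:
        shutil.copy(path, DST / path.name)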
>>
>>38918339
No
>>
>>38918339
We do ponies here. It's one of the P's in PPP.
>>
>>38918339
The PPP isn't your personal slave. You do the work yourself or fuck off.
>>
>>38918339
I did some voice models for other games, but I never asked here for help with the dataset. That's because PPP is strictly for ponies. Collect your own dataset man, don't be a lazy ass.
>>
>>38918339
You might be able to e-mail 15 to see if he wants to forward you the datasets he's been sent. But if you go into it saying "I don't want to do the work so do the work for me", you're gonna have a bad time, because you're coming across as some entitled little bitch with your "I'm an academic so bow down to me and just dump everything I'm asking for now."
>>
>>38918339
>I'm an academic working on emotive TTS
And? Guys here are more knowledgeable on TTS than your sissy ass. Get out retard.
>>
So many 15.ai wannabes these days, huh?
>>
File: Rainbow bruh.png (99 KB, 971x880)
>>38918339
>deleted
>>
It's so funny how so-called TTS researchers will show up in 2022 with models worse than what 15 first showed up to the PPP with back in 2019, and demand that we work for them. Fucking lmao.
>>
>>38918339
>I'm going to revolutionize AI TTS
>No I will NOT do hard work myself
https://youtu.be/ToXc7sGxipY?t=55
>>
File: TrixBump.jpg (192 KB, 1280x1280)
>10
>>
UP
>>
>>38918330
I've actually already spent quite a while trying that already (it's only been a few days since I actually deleted some of the garbage models trained in this way from my hard drive, though they're still on my Google drive), most datasets are too small for it, and they will each sound scuffed and recognisably talknet, anyway. Not ponies, but Anon's postal dude uses generally the same range throughout, and the character has a very smooth voice, but he still sounds clunky in talknet, and I think it still would be that way for the most part, even if I went through anon's work and pruned troublesome clips, because in a combined dataset, those would represent different emotions or other parameters and those would make him sound more human.

I think what I've learned so far (in boating school) is that TalkNet would prefer you fed it a dataset approaching monotone, and that's the case even if you're dividing a dataset up. You can have shouty for anon_angry, flaming turbofaggot for anon_happy, and sarcastic little shit for anon_smug, but without any of the real range you'd hope for and expect from a real human.

Maybe I'll try cookie's tacotron today and see if that does what I'm looking for. Or maybe I'll spend all day generating pone sex noises on 15 instead, again.

>>38918339
Don't be such a baby. It'll take you a week to prepare a dozen, maybe two dozen datasets. If you're a programmer, you should be able to type these out even quicker than I used to manage with a pen and paper.
>>
>>38914986
crazy stuff man, tell me the prompt for this
>>
Any news on availability of text prompt art generating models?
>>
>>38920257
Haven't been keeping up too closely, but I'm pretty sure there's something called Stable Diffusion coming soon. Quality-wise it seems really good; not sure about flexibility.
>>
>>38920129
no idea, some anon posted it with no details
>>
>>38920257
I'm in, but ran out of credits
>>
>>38918149
>>38918202
https://u.smutty.horse/minwtdtybuj.zip
Quick fix in version 2: it now loads the model once before the start of the transcription loop, as the old way would unload it at the end of each loop iteration, wasting ~30s per transcribed file. When testing it, it was able to convert 155 files in just 12 minutes.
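For anyone curious, the load-once pattern looks roughly like this (a minimal sketch, not the actual script; it assumes the vosk package, mono 16-bit wavs, and a downloaded model folder, with MODEL_DIR/WAV_DIR standing in for the two lines you edit):

import json
import wave
from pathlib import Path
from vosk import Model, KaldiRecognizer

MODEL_DIR = "vosk-model-en-us-0.22"  # placeholder: the ~2GB model folder
WAV_DIR = Path("clips")              # placeholder: folder of wav files

model = Model(MODEL_DIR)  # loaded ONCE, outside the loop

for path in sorted(WAV_DIR.glob("*.wav")):
    wf = wave.open(str(path), "rb")
    rec = KaldiRecognizer(model, wf.getframerate())
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        rec.AcceptWaveform(data)
    text = json.loads(rec.FinalResult()).get("text", "")
    print(f"{path.name}|{text}")
    wf.close()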
>>
>>38920892
I have Midjourney credits still if anyone wants anything?
>>
>>38921078
Lewd but cute Fluttershy
>>
Got into the closed beta of Stable Diffusion. The bot goes live in 24 hours, so expect stable mares to be shared roughly around that time.

Might start with Nurse Redheart staring out the window of a hospital room. Hopefully she turns out okay.

There are quite a few unknowns at the moment, like how often prompts can be submitted, what quantity limits there are, whether there are any settings that can be selected, etc. We do know, however, that it's strictly SFW, so there's that. I still wonder if horsey butts are safe enough, or if they'll consider that lewd. Who's to say. Gonna tread carefully in the meantime.
>>
>>38921078
Can you test this:
https://twitter.com/phillip_isola/status/1532189616106881027?t=8RFN5ChA7afKpciVIb-Fgg&s=19
>A very beautiful painting of a horse next to a cottage.
>A very very very very very very very very very very very very very very very very very very very very very very beautiful painting of a horse next to a cottage.
>>
>>38920918
Thanks, this is certainly way faster. Shame vosk seems to suck with low-quality audio; I'm trying to make a dataset of all the voice lines from the HL1 games and it can't understand them.
>>
>>38915797
>>38916238
It was an export issue. I'm pretty sure Animate crashed partway through those conversions, and after restarting Animate my script saw the folder and continued as if the export were complete.
Here's the re-exported version of those files. I'll make sure these are included with the eventual IPFS version of the XFL files.
- https://drive.google.com/file/d/1mUwf98abYBeuQH-6mHg7Tj9gfAYJzld2/view?usp=sharing

Thanks for checking, and let me know if you find any other issues.
>>
>>38921918
> Thanks for checking, and let me know if you find any other issues.
Will do.
>>
>Page 8
>>
>>38921802
Out of curiosity, I decided to test this out too. It was much harder getting good results from just "very", whereas multiple "very"s got the expected result more of the time. Still issues with scaling though; damn, that's a big horse.

Very on the left, Very(x10) on the right.
>>
File: Image1.png (938 KB, 1080x1080)
>>38894215
Delivering 4-4
https://u.smutty.horse/miocxdcbjzc.mp4
Major thanks to the Anon who provided the merch pictures.
>>
>>38923175
What did Pinkie Pie say? "Sometimes I wonder if we were meant to be better ponies"?
>>
>>38923035
Wow. That is a tiny house.
I'm not seeing any obvious difference, so I guess it's model- or prompt-specific. Thanks for testing it.
>>
>>38923282
Do you want a test on DALLE?
>>
>>38923456
It wouldn't hurt. Just knowing that it doesn't apply to all transformer & guided diffusion models is telling, but it could be interesting to see how models differ in this regard.
>>
File: c773e561.png (1.36 MB, 1650x632)
>>38923495
Here's "very" once
>>
File: 81878569.png (1.32 MB, 1661x661)
>>38923560
and many "very"s.
I don't think it made much of a difference at all here.
>>
>>38894215
Claiming 4-3
>>
File: Silly.png (1.06 MB, 1920x1080)
>>38923974
Does anyone know how to replicate the vibrating effect on Pinkie's voice when she says "this makes my voice sound silly"? Preferably with Audacity.
>>
>>38923991
Very possibly that's only a thing you can do by having your hand interact with your throat, or something else involving a real, biological throat and larynx (being vibrated/shaken, or *maybe* just simple pitch modulation). Too bad you can't ask Andrea Libman how she did that effect. Or can you? https://twitter.com/AndreaLibman
>>
>>38923991
I recognize that effect: it's just pitch modulation. You just need to set a rate in Hz and a depth percentage. There can also be a waveform shape, but it's usually a sine. Also, if you're planning on applying that properly to a voice, they'll need to be dragging out the words longer instead of speaking at a normal rate (Thiiiiiiiiis maaaaaaaakes myyyyyyyyy voooiiiiice sooooooooounnd siiillllyyyyyyyy).
I can't try this out now because I'm mobileposting. For fun, later I'll see if I can try to match the modulation rate and normalize Pinkie's voice.
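If anyone wants to try it in the meantime, a delay-line vibrato is the usual way to fake this in code. A minimal numpy sketch (my guess at the effect, not a known recipe; the rate, depth, and file names are all placeholders):

import numpy as np
from scipy.io import wavfile

RATE_HZ = 8.0    # how fast the pitch wobbles
DEPTH_MS = 4.0   # how far the pitch swings

sr, x = wavfile.read("pinkie_line.wav")  # placeholder file, mono 16-bit
x = x.astype(np.float32)
n = np.arange(len(x))

# read "behind" the playhead at a sinusoidally varying distance;
# sweeping the delay is what bends the pitch up and down
delay = (DEPTH_MS / 1000.0) * sr * 0.5 * (1.0 + np.sin(2 * np.pi * RATE_HZ * n / sr))
pos = np.clip(n - delay, 0, len(x) - 1)
y = np.interp(pos, n, x)  # linear interpolation between samples

wavfile.write("pinkie_silly.wav", sr, y.astype(np.int16))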
>>
>>38924028
we live in an age where AI can generate images from a prompt and make cartoon character voices say anything and you think voice warbling is something that can't be done in software?
>>
>>38923560
I love the first result of this, can you please share the upscaled version?

>>38923562
The difference is pretty hard to notice, but if you compare the two, it kinda looks like the 'many very' version is more saturated/vivid in colour in relation to the other. Which doesn't work quite so well in a natural setting with a real horse. The duller versions are nicer.

Still, with more bias towards vividness, it may prove useful for pony pony. Overall, a good test ^-^
>>
Bump.
>>
File: unknown-295.png (680 KB, 512x512)
For the next 40 minutes, Stable Diffusion has an invite link, so if you want to join in with the closed beta, feel free to use it:
>https://discord.gg/akQHGE4

It is another Discord-hosted text-to-image AI, but unlike Midjourney, Stable seems to be pretty much a DALL-E 2 equivalent. The bot will be live half an hour after the link expires, so about 1.25h from now.

It's unknown how long the closed beta will last, but there have been mentions that instances will be available to run locally in the future, and they don't care if it generates NSFW content provided it's not on their server. But yeah, if this interests you, be sure to get in while you can.
>>
Here's a collage of some of the results I liked enough to save. Still very much impressed by it and can't wait to test pony prompts. It at least does horsey horses well.

It apparently struggled with Twilight and Nurse Redheart thus far, but that was only one result and might've been prompted poorly. Only time and numerous attempts will tell how feasible it is for pony.
>>
>>38923991
>>38924028
It's not done with a biological throat, even in the show. Literally all it is is rapidly pitch-shifting the audio up and down. I don't know how it could be done in Audacity, but worst case just shoot me the line you need it applied to and I should be able to replicate the effect very closely.
>>
>>38923991
https://www.youtube.com/results?search_query=modulated+voice+wobble+effect
>>
>>38923991
>>38925065
https://www.youtube.com/watch?v=vucOFZvCv34 to be exact
>>
>>38925059
>>38925074
If it's just rapid pitch shifting then I think I should be able to work that out. Thanks.
>>
>Page 8
>>
>>38923991
Isn't it a tremolo (depth~80% and speed ~20Hz)?
If so, Audacity can do it in the second part of the effects list (after the grey bar).
>>
>>38894215
Claiming 4-9
>>
>>38925787
Tremolo and vibrato are not the same.
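For anyone mixing them up, a rough illustration of the difference in numpy terms (the values are placeholders taken from the post above, and x stands in for real audio):

import numpy as np

sr, f, depth = 44100, 20.0, 0.8
x = np.random.randn(sr).astype(np.float32)  # stand-in for a real signal
n = np.arange(len(x))
lfo = 0.5 * (1.0 + np.sin(2 * np.pi * f * n / sr))  # slow sine in [0, 1]

tremolo = x * (1.0 - depth * lfo)  # amplitude wobbles, pitch untouched
# vibrato instead warps WHERE the signal is read (see the delay-line
# sketch earlier), so the pitch wobbles and the amplitude stays put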
>>
>>38924654
>Runs locally on 5GB of VRAM
>Can do NSFW
>Possibly uncensored/non-lobotomized dataset in general?
>Cutting edge results

Holy shit, that was fast. They'd better get a public version out quick so we can archive it before the pearl-clutchers write five thousand op-eds and get it killed. Then again, if the technology is really at this point, censorship may be a cat-out-of-the-bag situation. It's gonna come out sooner or later anyway.
>>
>>38924243
>>
>>38924654
Dammit, it was posted at 3am in my timezone so I couldn't get in. These look amazing though and I'd love to try this out, so if they ever do any more invites please let me know.
>>
>>38926672
Stable Diffusion's turn.
Here's the best result for the standard "very".
>>
>>38926944
Too many "very"s really distracts from the intended prompt. It gets some interesting results still. Some cursed, but some nice.
>>
>>38926971
what the hell is that poor horse on the bottom right hahahahahaha
>>
>>38926971
>>38927156
hoers
>>
>>38926971
The interesting thing about the failures, to me, is that they aren't complete gibberish. The bottom-right horse does check out as a horse when viewed in individual chunks: the neck and front legs, the top of the head, the nose that goes on to become another head. All of these work if viewed on their own; they're just not put together right. This seems to be the exact opposite of DALL-E Mini, which often gives results that, say, look GENERALLY like David Bowie, but the details are always mangled in different ways.
>>
File: 2086962.jpg (49 KB, 564x563)
https://u.smutty.horse/miotrdzsblv.wav
>>
>>38928074
bugbutt
>>
>>38894215
Delivering 4-3
https://u.smutty.horse/miovnxuwbub.mp4

I didn't have any good ideas for visuals so if anyone wants to build on it, feel free.
>>
File: 1734459.png (1.14 MB, 1920x1080)
>>38928235
https://u.smutty.horse/miowqvkadas.wav
>>
>>38928074
>>38929141
B-b-bug m-mommy.
>>
up
>>
File: Doctor to Meadow.png (379 KB, 1536x1876)
>>38910775
Hi Clipper. I watched some of the stream. You asked at one point how to take a pose of Berry Punch and change it to Cherry Berry. In this case, since the characters' shapes and hair styles are identical, you can just change the character's colors and cutie mark. In Adobe Animate, you can do a global find-and-replace for colors. Hit CTRL+F, change the "For" field to "Color", select the color you want to replace and what to replace it with, then click "Replace All". In this example, I am changing Doctor Whooves into Meadow Song:

For the iris, however, you will probably need to drill down into the iris symbol and manually change it. Note: I use Adobe Flash CS6; the UI might be a little different for Adobe Animate.

As for changing the cutie mark, you might be able to get away with swapping the symbol. Try selecting the cutie mark, click the "Swap" button in the properties panel, and then select the other character's cutie mark. I don't have Meadow Song's cutie mark handy so I'm using Comet Tail's:

This technique may produce weird scaling and placement issues for the cutie mark depending on whether the show assets have a consistent size and anchorpoint placement.

Note: These screenshots are taken from my own personal show-accurate puppets, so the arrangement of the assets is probably different in your file.

You'll need to make sure that the other character's cutie mark is in your library first; you can copy it from the "library" panel in one file and paste it in another file's library panel:

Hope that info is useful.
>>
>>38929728
Meadow Song doesn't wear the same collar and bow tie, does he? I thought that was something they added to Time Turner later on as a fandom nod to Doctor Who.
>>
Stable Diffusion has proven to be very good overall for a variety of images. Though, as is typical with most text-to-image AI, it struggles a bit with pony specifically.

It is capable with a variety of prompts, though, and it does recognise key character names, but show-accurate styling seems quite difficult to achieve, or at least difficult to get consistently good results with.

With plans to release open-sourced version(s), so far the most promising direction is fine-tuning on image datasets from the boorus, which I imagine will help. It was mentioned that the V2 they're working on is intended to be runnable on only 16GB of VRAM, so local instancing may be possible for some anons, as well as on Google Colab and similar services. So it seems feasible that multiple anons could each attempt fine-tuning, whether from the same shared dataset or on their own personally selected datasets (specific ponies, SFW/NSFW focus, artist styles, tag exclusions, etc).

Overall, things are looking bright. NovelAI has also paired with Stable Diffusion, so even though the beta is currently closed, there's still potential for access beforehand. It's unclear whether this will come with NSFW or other content limitations, though.
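For anons planning ahead, pulling a tag-filtered dataset for fine-tuning could look something like this sketch. It assumes Derpibooru's JSON search endpoint and its view_url field, so double-check against the live API docs; the query and paths are placeholders:

import requests
from pathlib import Path

QUERY = "safe, pony, score.gte:100"  # placeholder tag filter
OUT = Path("finetune_set")
OUT.mkdir(exist_ok=True)

for page in range(1, 4):  # first few pages only; be polite to the server
    r = requests.get(
        "https://derpibooru.org/api/v1/json/search/images",
        params={"q": QUERY, "page": page, "per_page": 50},
        timeout=30,
    )
    r.raise_for_status()
    for img in r.json().get("images", []):
        url = img.get("view_url")
        if not url:
            continue
        data = requests.get(url, timeout=60).content
        (OUT / f"{img['id']}{Path(url).suffix}").write_bytes(data)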
>>
>>38930479
You are correct. I was in a bit of a rush when finishing that post and didn't notice that I had left in the bowtie.
>>
>Synthbot's 'Fimfarchive tools - release v4.ipynb' - Link
I was trying to access this in the quick guide, but the link is broken. Synthbot (or other Anons), do you happen to still have a copy of this colab?
>>
File: Scootabloom2.png (123 KB, 1920x1080)
>>38929728
Thanks for this, anon, very useful. With this I was able to change Scootaloo's colours to match Apple Bloom's. For the iris (and future cutie marks too), my approach was to just copy/paste them in from an existing reference asset. So long as I have a reference asset for the pony conversion I want to do (which most of the time I will, given the sheer number of assets in the leaks), this method should be replicable for other conversions that need doing. This will mainly be used to create additional picture assets for background ponies that don't have many assets available but do generally share basic designs with others.

>>38930479
>>38930902
Extra assets like that are their own symbols which can be easily deleted/added in as needed.


