Stable Diffusion 3.5 Released - AI -

2024-10-31 06:34

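Hello everyone, I hope you are doing well. This is 歌乃.

On the 22nd of this month (October 2024), stability.ai released the 3.5 series, the latest version of Stable Diffusion.

These models have (reportedly) been improved based on community feedback on the poorly received sd3.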

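Three models have been released:

Stable Diffusion 3.5 Large: the top-tier version, with the highest quality.
Stable Diffusion 3.5 Large Turbo: a distilled version of Large that can generate in 4 steps. High quality.
Stable Diffusion 3.5 Medium: a model that balances quality and speed.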

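Output from each model

Large model

Image quality is quite good, and it holds up well against Flux. Even so, generation is not all that slow.

An fp8 model (with clip and t5xxl included) is also available.

On my setup (32 GB RAM, RTX 3070 with 8 GB VRAM), a 1024x1024px image takes about 40 seconds. Neither model ran out of memory.

Considering that Flux1d took around 2 minutes, that is quite fast.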

Model: sd3.5_large_fp8_scaled (includes clip, t5xxl, VAE)
Step:16, CFGScale: 4
Cliplmodel: clip_l
Clipgmodel: clip_g
Txxlmodel: t5xxl_fp8_e4m3fn
Prompt: Centaur cyborg girl, lower body half horse, upper body girl in mechanical suit, hard lights, rich details, horse head replace girls body,

Model: sd3.5_large
VAE: sd3.5VAE(official)
Step:16, CFGScale: 4
Cliplmodel: clip_l
Clipgmodel: clip_g
Txxlmodel: t5xxl_fp8_e4m3fn
Prompt: Centaur cyborg girl, lower body half horse, upper body girl in mechanical suit, hard lights, rich details, horse head replace girls body,
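
The settings captions in this post all use the same loose `Key: value` layout, so they are easy to load into a script when comparing runs. The following is a minimal sketch of such a parser. `parse_settings` and the `PAIR` pattern are hypothetical names made up for this illustration and are not part of any tool used in this post.

```python
import re

# Matches a "Key:" at the start of a line or right after a comma,
# so a line like "Step:16, CFGScale: 4" yields two pairs.
PAIR = re.compile(r"(?:^|,)\s*([A-Za-z]\w*)\s*:\s*")


def parse_settings(caption):
    """Turn a settings caption from this post into a dict of strings."""
    settings = {}
    for line in caption.splitlines():
        line = line.strip()
        # The prompt contains commas, so keep the whole line as one value.
        if line.startswith("Prompt:"):
            settings["Prompt"] = line[len("Prompt:"):].strip()
            continue
        matches = list(PAIR.finditer(line))
        for i, match in enumerate(matches):
            # A value runs until the next key on the same line, or to the end.
            end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
            settings[match.group(1)] = line[match.end():end].strip()
    return settings


# Caption text copied from the sd3.5_large example above (prompt shortened).
CAPTION = """Model: sd3.5_large
VAE: sd3.5VAE(official)
Step:16, CFGScale: 4
Cliplmodel: clip_l
Clipgmodel: clip_g
Txxlmodel: t5xxl_fp8_e4m3fn
Prompt: Centaur cyborg girl, lower body half horse"""

if __name__ == "__main__":
    for key, value in parse_settings(CAPTION).items():
        print(f"{key} = {value}")
```

All values stay as strings; converting `Step` or `CFGScale` to numbers is left to the caller.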

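Large Turbo model

Image quality is not bad, and it does not feel like much of a drop from Large. There is some unevenness in places, but it is precise enough that individual strands of hair are clearly visible. And generation is very fast.

On my setup (32 GB RAM, RTX 3070 with 8 GB VRAM), a 1024x1024px image takes about 17 seconds.
At 4 steps it takes around 12 seconds, but image quality drops. At 6 steps it takes about 20 seconds. These numbers depend on the environment, so treat them as a rough guide.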
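
To put the timings in this post side by side, here is a small sketch that converts seconds per image into images per hour. The figures are the rough numbers quoted above for one machine (with "around 2 minutes" for Flux1d taken as 120 seconds), and `images_per_hour` and `speedup` are hypothetical helper names used only for this illustration.

```python
# Approximate seconds per 1024x1024 image, as reported in this post
# for one machine (RTX 3070, 8 GB VRAM). These are not benchmarks.
SECONDS_PER_IMAGE = {
    "sd3.5_large": 40,
    "sd3.5_large_turbo (4 steps)": 12,
    "sd3.5_large_turbo (6 steps)": 20,
    "Flux1d": 120,  # "around 2 minutes", taken as 120 s
}


def images_per_hour(seconds_per_image):
    """How many images fit into one hour of back-to-back generation."""
    return 3600 / seconds_per_image


def speedup(baseline_seconds, other_seconds):
    """How many times faster `other` is than `baseline`."""
    return baseline_seconds / other_seconds


if __name__ == "__main__":
    flux = SECONDS_PER_IMAGE["Flux1d"]
    for name, seconds in SECONDS_PER_IMAGE.items():
        rate = images_per_hour(seconds)
        print(f"{name}: {rate:.0f} images/hour, {speedup(flux, seconds):.1f}x vs Flux1d")
```

Model loading time and quality differences are ignored here, so this only gives a feel for how many attempts each model allows in the same amount of time.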

Model: sd3.5_large_turbo
VAE: sd3.5VAE(official)
Step:4, CFGScale: 1.2
Cliplmodel: clip_l
Clipgmodel: clip_g
Txxlmodel: t5xxl_fp8_e4m3fn
Prompt: Centaur cyborg girl, lower body half horse, upper body girl in mechanical suit, hard lights, rich details, horse head replace girls body,

Model: sd3.5_large_turbo
VAE: sd3.5VAE(official)
Step:6, CFGScale: 1.2
Cliplmodel: clip_l
Clipgmodel: clip_g
Txxlmodel: t5xxl_fp8_e4m3fn
Prompt: Centaur cyborg girl, lower body half horse, upper body girl in mechanical suit, hard lights, rich details, horse head replace girls body,

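Medium model

This one is quirky. The official description suggested that it can generate at high resolution (1440x1440), so I had high hopes, but I could not get the hang of it orz. Judging from the ComfyUI samples, it can apparently be combined with SD3.5L to scale up the resolution.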
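
As a rough way to see how much heavier 1440x1440 is than the other sizes used in this post, the sketch below counts the latent tokens the model has to process at each resolution. It assumes an 8x VAE downscale and a 2x2 patch size, which is my understanding of the SD3 family and is not something stated in this post. `latent_tokens` is a hypothetical helper name.

```python
VAE_DOWNSCALE = 8  # assumption: the VAE shrinks each side by 8x
PATCH_SIZE = 2     # assumption: the transformer uses 2x2 latent patches


def latent_tokens(width, height):
    """Number of image tokens for a given output size, under the assumptions above."""
    latent_w = width // VAE_DOWNSCALE
    latent_h = height // VAE_DOWNSCALE
    return (latent_w // PATCH_SIZE) * (latent_h // PATCH_SIZE)


if __name__ == "__main__":
    base = latent_tokens(1024, 1024)
    for side in (768, 1024, 1440):
        tokens = latent_tokens(side, side)
        print(f"{side}x{side}: {tokens} tokens ({tokens / base:.2f}x of 1024x1024)")
```

Under these assumptions 1440x1440 has roughly twice the tokens of 1024x1024. Since full self-attention cost grows roughly with the square of the token count, the jump in compute can be larger than the pixel count alone suggests.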

Model: sd3.5_medium
VAE: Auto
Step:40, CFGScale: 6.0
Size:1024x1024
Sampler: dpmpp_2m
Scheduler: sgm_uniform
Zeronegative: true
Cliplmodel: clip_l
Clipgmodel: clip_g
Txxlmodel: t5xxl_fp8_e4m3fn

Prompt: Centaur cyborg girl, lower body half horse, upper body girl in mechanical suit, hard lights, rich details, horse head replace girls body,

sd3.5_medium_incl_clips_t5xxlfp8scaled is a build of sd3.5_medium published by ComfyUI with clip and t5xxl bundled in.
Model: sd3.5_medium_incl_clips_t5xxlfp8scaled
VAE: Auto
Step:40, CFGScale: 6.0
Size:768x768
Sampler: dpmpp_2m
Scheduler: sgm_uniform
Zeronegative: true

Prompt: Centaur cyborg girl, lower body half horse, upper body girl in mechanical suit, hard lights, rich details, horse head replace girls body,

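Impressions

Each model has its strengths and weaknesses, but overall, prompt adherence feels low. It probably depends on the training data, but the results show that uncommon subjects are hard to generate. With this prompt that means the centaur: the AI cannot picture a half-human, half-horse form, which may mean centaurs are not in the training data.

The idea for this prompt came from a prompt by 猫黒夏躯. I would like to take this opportunity to say thank you.

Personally, I found Large Turbo the easiest to work with. Image quality is only so-so, but it is fast, which makes it well suited to generation gacha (`・ω・´)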
