2023-02-27

Belatedly Trying Out Stable Diffusion: Copy-and-Paste Commands and Code to Get It Running

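Stable Diffusion is an AI that generates images from a text prompt (a "spell", as people like to call it). One of its selling points is that, once set up, it runs on your local machine, so this time I went as far as generating images with Stable Diffusion on my own PC. Copy and paste the commands and code below as you go and you can get Stable Diffusion running on your machine too!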

PC environment
Install Python 3.9
Install pyenv-win, just in case
Install CUDA
Install PyTorch
Install Transformers
Install Diffusers
Install Accelerate, since a message suggested it
Register on Hugging Face and issue an API key
Try Waifu Diffusion too
That's it for this time!

PC environment

The PC I tested on is as follows.

OS : Windows 10
CPU : Intel Core i7-7700K
GPU : NVIDIA GeForce GTX1080

This PC has a GTX1080, and I generate images on the GPU. In the end, one image took about two minutes to generate.

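Install Python 3.9

This time I skip WSL and install Python directly on Windows, working from PowerShell.

Running Stable Diffusion requires PyTorch, a machine learning library. It was originally developed by Facebook (now Meta). As far as I could tell at the time of writing, it only installed cleanly up to Python v3.9; with v3.10 and v3.11 the installation failed in my environment.

My machine had copies of v3.10 and v3.11 from the Microsoft Store half-coexisting, so I uninstalled all of them, then downloaded and installed v3.9.13 from Python.org.

Reference: PyTorchのインストール失敗時にチェックすべきこと | ジコログ (what to check when a PyTorch install fails)

# Installing PyTorch on Python v3.11 fails with an error like this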
PS> pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu117
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch

Reference: Python Release Python 3.9.13 | Python.org
- Download python-3.9.13-amd64.exe and install it

> python -V
Python 3.9.13

> pip list
Package    Version
---------- -------
pip        22.0.4
setuptools 58.1.0

# Upgrade pip and setuptools to the latest versions
> python -m pip install --upgrade pip setuptools

With that, Python itself is ready.
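
Since several Python installs were coexisting earlier, it can help to confirm from inside Python which interpreter is actually being picked up. The following is a minimal sketch: the helper name `check_python` is made up for this example, and the "3.9 at most" bound only mirrors the experience described above, not an official PyTorch requirement.

```python
import struct
import sys


def check_python(version=None, max_minor=9):
    """Return True when the interpreter is Python 3.x with x <= max_minor.

    max_minor=9 mirrors this article's experience, not an official limit.
    """
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and minor <= max_minor


if __name__ == "__main__":
    bits = struct.calcsize("P") * 8  # pointer size in bits: 64 on a 64-bit build
    print(f"Python {sys.version.split()[0]} ({bits}-bit) at {sys.executable}")
    print("OK for this guide:", check_python())
```

If the printed path points somewhere unexpected (for example a Microsoft Store copy), PATH probably still has an older install ahead of the new one.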

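Install pyenv-win, just in case

Because I installed Python 3.9.13 directly on the host, I also install pyenv-win so that I can switch Python versions later, and configure it to use the same v3.9.13.

Reference: ★2022年最新★入門！Windowsでpyenvを使う方法 - ３流なSEのメモ帳 (an introduction to using pyenv on Windows)

# Install pyenv-win (run this from your home directory, because --target is a relative path)
> pip install pyenv-win --target .pyenv
# It is installed into the `~/.pyenv/` (`C:\Users\[username]\.pyenv\`) folder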

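Set PYENV under "System environment variables" and add it to PATH.

PYENV : C:\Users\[username]\.pyenv\pyenv-win
PATH : put %PYENV%\bin first, so that it takes priority

After a reboot, once the pyenv command works as shown below, set v3.9.13 as the global version.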

> pyenv --version
pyenv 3.1.1

> pyenv install 3.9.13
> pyenv versions
> pyenv global 3.9.13

Install CUDA

So that the PyTorch library can use the GPU, install NVIDIA's CUDA Toolkit.

Reference: TensorFlow 1系(GPU版)のためにCUDA 10.0をインストール | ジコログ (installing CUDA 10.0 for GPU TensorFlow 1.x)
Reference: CUDA Toolkit Archive | NVIDIA Developer
- CUDA Toolkit 11.7.0: download cuda_11.7.0_516.01_windows.exe and install it

After installation, confirm that the CUDA_PATH environment variable has been set, then run the nvcc command in PowerShell to confirm that the installation succeeded.

> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
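
The release number in that output is what has to match the `cu117` part of the PyTorch install URL used in the next step. As a rough sketch, the check could be scripted like this; the function names are invented for this example, and the regular expression simply assumes the "release X.Y" wording seen in the output above.

```python
import re
import subprocess


def parse_nvcc_release(text):
    """Extract the 'release X.Y' number from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc -V` and return its release string, or None if nvcc is unavailable."""
    try:
        result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


# One line copied from the nvcc output shown above
sample = "Cuda compilation tools, release 11.7, V11.7.64"
print(parse_nvcc_release(sample))  # prints 11.7
print(installed_cuda_release())
```

The last line prints None on a machine where nvcc is not on PATH, which is itself a useful hint that the install or the environment variables need another look.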

Install PyTorch

Python itself, pyenv-win, and CUDA are now installed, so PyTorch can finally go in.

Reference: 【Windows】GPU版PyTorch 1.12系のインストール | ジコログ (installing GPU PyTorch 1.12 on Windows)
Reference: Start Locally | PyTorch
- This page builds the install command for you

> pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
Installing collected packages: charset-normalizer, urllib3, typing-extensions, pillow, numpy, idna, certifi, torch, requests, torchvision, torchaudio
Successfully installed certifi-2022.12.7 charset-normalizer-3.0.1 idna-3.4 numpy-1.24.2 pillow-9.4.0 requests-2.28.2 torch-1.13.1+cu117 torchaudio-0.13.1+cu117 torchvision-0.14.1+cu117 typing-extensions-4.5.0 urllib3-1.26.14

To check that PyTorch works, write a Python script like the following.

check-pytorch.py

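import torch
# Installed PyTorch version; a +cu117 suffix marks a CUDA 11.7 build
print(torch.__version__)
# True when PyTorch can use a CUDA-capable GPU
print(torch.cuda.is_available())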

Run it in PowerShell.

> python .\check-pytorch.py
1.13.1+cu117
True

If you see output like this, PyTorch is installed and the GPU is usable (True).
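
Beyond the True/False answer, it can be handy to see which GPU PyTorch has found and how much memory it has, since memory matters later on. This is only a sketch: `describe_gpu` is a name made up for this example, and it assumes the GPU of interest is device 0.

```python
def describe_gpu(index=0):
    """Return a one-line summary of the GPU PyTorch sees, or the reason it sees none."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but CUDA is not available"
    props = torch.cuda.get_device_properties(index)
    total_gib = props.total_memory / 1024 ** 3  # bytes to GiB
    return f"{props.name}: {total_gib:.2f} GiB, built for CUDA {torch.version.cuda}"


print(describe_gpu())
```

Importing torch inside the function keeps the script from crashing with a traceback when PyTorch is missing, so the same file works as a quick diagnostic either way.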

Install Transformers

Stable Diffusion also uses Transformers, the natural language processing library developed by Hugging Face, so install that as well.

Reference: HuggingfaceのTransformersをインストールする | ジコログ (installing Hugging Face Transformers)

> pip install transformers
Installing collected packages: tokenizers, regex, pyyaml, packaging, filelock, colorama, tqdm, huggingface-hub, transformers
Successfully installed colorama-0.4.6 filelock-3.9.0 huggingface-hub-0.12.1 packaging-23.0 pyyaml-6.0 regex-2022.10.31 tokenizers-0.13.2 tqdm-4.64.1 transformers-4.26.1

Let's write a check script for Transformers too.

check-transformers.py

from transformers import pipeline
classifier = pipeline('sentiment-analysis')
results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
for result in results:
  print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

Run it as follows and you can see that sentiment analysis is being performed.

> python .\check-transformers.py
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309

Install Diffusers

Using Stable Diffusion as-is makes installation a hassle in various ways, so install Diffusers, a library that (among other things) downloads models on demand at run time.

Reference: 最先端の機械学習モデルを利用できるDiffusersのインストール | ジコログ (installing Diffusers)

> pip install diffusers
Installing collected packages: zipp, importlib-metadata, diffusers
Successfully installed diffusers-0.13.1 importlib-metadata-6.0.0 zipp-3.14.0

Write a check script for Diffusers as well. As explained after the error output further down, with the current version it does not work unless the ["sample"] part is changed to ["images"].

check-diffusers.py

from diffusers import DiffusionPipeline
model_id = "CompVis/ldm-text2im-large-256"
# load model and scheduler
ldm = DiffusionPipeline.from_pretrained(model_id)
# run pipeline in inference (sample random noise and denoise)
prompt = "A painting of a squirrel eating a burger"
images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6)["sample"]  # ← this has to be ["images"]
# save images
for idx, image in enumerate(images):
  image.save(f"squirrel-{idx}.png")

Running it produced the following error.

> python .\check-diffusers.py
Traceback (most recent call last):
  File "C:\Dev\practice-stable-diffusion\check-diffusers.py", line 7, in <module>
    images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6)["sample"]
  File "C:\Users\Neo\AppData\Local\Programs\Python\Python39\lib\site-packages\diffusers\utils\outputs.py", line 88, in __getitem__
    return inner_dict[k]
KeyError: 'sample'

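Reference: Waifu DiffusionのOpen in Colabが動かない場合の対処方法 - Qiita (what to do when Waifu Diffusion's "Open in Colab" does not work)

Apparently a version upgrade renamed the ["sample"] key to ["images"]; after making that change, the script ran fine.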
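
Older sample scripts found online may still use the old key, so one defensive option is a small helper that tries the new key first and falls back to the old one. This is a hypothetical sketch, not part of Diffusers: `extract_images` is invented here, and the dictionaries below are stand-ins for a real pipeline output, relying only on the fact (visible in the traceback above) that a missing key raises KeyError.

```python
def extract_images(output):
    """Return generated images from a pipeline output, whichever key it uses."""
    for key in ("images", "sample"):  # newer key first, then the older one
        try:
            return output[key]
        except KeyError:
            continue
    raise KeyError("output has neither 'images' nor 'sample'")


# Stand-in dictionaries used only for illustration, not real pipeline outputs
print(extract_images({"images": ["new"]}))  # prints ['new']
print(extract_images({"sample": ["old"]}))  # prints ['old']
```

For a one-off script, simply editing the key as described above is simpler; the helper only pays off when the same code has to run against several library versions.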

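Install Accelerate, since a message suggested it

When I ran check-diffusers.py earlier, a message along the lines of "run pip install accelerate" appeared, so I install it. It seems to be a library that helps PyTorch run well across different setups, spanning CPU and GPU.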

> pip install accelerate
Installing collected packages: psutil, accelerate
Successfully installed accelerate-0.16.0 psutil-5.9.4

Reference: 【PyTorch】Accelerateのインストールと設定 | ジコログ (installing and configuring Accelerate)

Register on Hugging Face and issue an API key

The Stable Diffusion model is hosted on the Hugging Face site, and using it requires issuing an API key (an access token).

Reference: CompVis/stable-diffusion-v1-4 · Hugging Face
- The model page
Reference: Hugging Face – The AI community building the future.
- The Hugging Face sign-up page. Register and log in
- After logging in, go to the user icon at the top right → Settings → Access Tokens and issue an API key with Read permission
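
An API key pasted straight into a script is easy to leak when the file is shared or committed, so one option is to read it from an environment variable instead. The sketch below is an assumption-laden example: the variable name `HF_API_KEY` is an arbitrary choice for this article (not a name the Hugging Face libraries look for), and `load_token` is a made-up helper.

```python
import os


def load_token(name="HF_API_KEY", env=None):
    """Read the API key from an environment variable instead of the source file."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable to your API key first")
    return token


# Demonstration with a stand-in mapping; the value is a dummy, not a real key
print(load_token(env={"HF_API_KEY": "dummy-token"}))  # prints dummy-token
```

In PowerShell the variable can be set for the current session with `$env:HF_API_KEY = "..."`, after which `load_token()` with no arguments returns the key for use wherever a script would otherwise hard-code it.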

This time, while I'm at it, I also mix in code that bypasses the NSFW filter when running Stable Diffusion.

check-stable-diffusion.py

import torch
from diffusers import StableDiffusionPipeline
from torch import autocast

prompt = "a cat"  # Put your prompt here
YOUR_TOKEN = "[your Hugging Face API key]"
MODEL_ID = "CompVis/stable-diffusion-v1-4"
DEVICE = "cuda"

pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, revision="fp16", torch_dtype=torch.float16, use_auth_token=YOUR_TOKEN)
pipe.to(DEVICE)

# Avoid Safety Checker : Start
def null_safety(images, **kwargs):
  return images, False
pipe.safety_checker = null_safety
# Avoid Safety Checker : End

with autocast(DEVICE):
  image = pipe(prompt, guidance_scale=7.5)["images"][0]
  image.save("test.png")

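Swapping out pipe.safety_checker bypasses Stable Diffusion's NSFW filter. This is definitely not for any naughty purpose: I include it so that a misjudgment by the filter doesn't keep an image from being generated properly. lol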

> python .\check-stable-diffusion.py

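The first run downloads the model; later runs skip that step. One image takes about two to three minutes to generate. Take a look at the resulting test.png.

With that, Stable Diffusion runs on the local machine. From here, try changing the prompt in various ways and generate whatever images you like.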
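
One practical snag when experimenting with many prompts is that the script above always saves to test.png, so each run overwrites the previous image. A possible workaround is to derive the file name from the prompt. The helper below is purely illustrative; `prompt_to_filename`, its 40-character cap, and the naming scheme are all choices made for this sketch.

```python
import re


def prompt_to_filename(prompt, index=0, max_length=40):
    """Build a filesystem-safe PNG name from a prompt and a running index."""
    # Collapse every run of characters other than a-z and 0-9 into one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    # Keep names short, and fall back to "image" when nothing usable remains
    slug = slug[:max_length].rstrip("-") or "image"
    return f"{slug}-{index:02d}.png"


prompts = ["a cat", "A painting of a squirrel eating a burger"]
for i, p in enumerate(prompts):
    print(prompt_to_filename(p, i))
```

The returned name can then be passed to `image.save(...)` in place of the fixed "test.png".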

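Try Waifu Diffusion too

There is also a separate model called Waifu Diffusion that generates clean anime-style art, so I run just the official script.

Reference: hakurei/waifu-diffusion · Hugging Face
- The model page
Reference: Waifu-DiffusionをWindowsローカル環境で試す - Qiita (trying Waifu Diffusion in a local Windows environment)

practice-waifu-diffusion.py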

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained('hakurei/waifu-diffusion', torch_dtype=torch.float16).to('cuda')
prompt = "1girl, aqua eyes, baseball cap, blonde hair, closed mouth, earrings, green background, hat, hoop earrings, jewelry, looking at viewer, shirt, short hair, simple background, solo, upper body, yellow shirt"
with autocast("cuda"):
  image = pipe(prompt, guidance_scale=6)["images"][0]
image.save("waifu.png")

In the script on the model page, torch_dtype was torch.float32, but that produced the Out Of Memory error shown below, so I changed it to torch.float16. With that it worked.

> python .\practice-waifu-diffusion.py
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 640.00 MiB (GPU 0; 8.00 GiB total capacity; 7.18 GiB already allocated; 0 bytes free; 7.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

# You can check GPU memory usage with the following
> nvidia-smi
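
To watch memory from a script rather than by eye, nvidia-smi can also print just the numbers through its `--query-gpu` option. The following is a rough sketch under a few assumptions: the function names are invented for this example, only the first GPU is read, and the sample values in the demonstration line are made up rather than measured.

```python
import subprocess


def parse_memory_line(line):
    """Parse 'used, total' in MiB, as printed by the nvidia-smi query below."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def gpu_memory_mib():
    """Return (used, total) MiB for the first GPU, or None if that cannot be read."""
    cmd = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
           "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return parse_memory_line(result.stdout.strip().splitlines()[0])
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return None


print(parse_memory_line("7352, 8192"))  # prints (7352, 8192); sample values only
print(gpu_memory_mib())
```

Running this just before generation shows how much of the 8 GiB is already taken by other applications, which helps explain an out-of-memory error like the one above.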