├── data
│   └── .gitkeep
├── model
│   └── .gitkeep
├── outputs
│   └── .gitkeep
├── tests
│   ├── __init__.py
│   ├── taxi_prediction
│   │   ├── __init__.py
│   │   └── test_process.py
│   └── test_double.py
├── src
│   └── taxi_prediction
│       ├── __init__.py
│       ├── consts.py
│       ├── py.typed
│       ├── schema.py
│       ├── model.py
│       └── process.py
├── .streamlit
│   └── config.toml
├── app
│   ├── streamlit_sample
│   │   ├── 00_simple.py
│   │   ├── 01_text_header.py
│   │   ├── 02_markdown.py
│   │   ├── 03_dataframe.py
│   │   ├── 04_plot.py
│   │   ├── 05_user_input.py
│   │   ├── 06_file_upload.py
│   │   ├── 07_no_cache.py
│   │   └── 08_cache.py
│   └── app.py
├── docs
│   ├── images
│   │   ├── vscode_拡張機能.png
│   │   ├── data_download.png
│   │   ├── docker_desktop.png
│   │   ├── vscode_拡張機能_検索.png
│   │   ├── devcontainer_complete.png
│   │   ├── devcontainer_起動ポップアップ.png
│   │   ├── docker_desktop_complete.png
│   │   ├── vscode_download_for_mac.png
│   │   └── git_download_for_windows.png
│   └── setup.md
├── .github
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── ISSUE_TEMPLATE
│       └── bug_report.yml
├── .env.example
├── compose.yml
├── scripts
│   ├── python_interactive_window.py
│   ├── conf
│   │   └── config.yaml
│   ├── train.py
│   └── train_with_mlflow.py
├── .gitignore
├── README.md
├── LICENSE
├── Dockerfile
├── pyproject.toml
└── .devcontainer
    └── devcontainer.json


/data/.gitkeep:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/model/.gitkeep:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/outputs/.gitkeep:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/src/taxi_prediction/__init__.py:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/tests/taxi_prediction/__init__.py:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/.streamlit/config.toml:
--------------------------------------------------------------------------------
1 | [browser]
2 | gatherUsageStats = false


--------------------------------------------------------------------------------
/src/taxi_prediction/consts.py:
--------------------------------------------------------------------------------
1 | MAX_PREDICT_DAYS = 7  # how many days ahead to predict
2 | 


--------------------------------------------------------------------------------
/src/taxi_prediction/py.typed:
--------------------------------------------------------------------------------
1 | # Marker file for PEP 561.  The taxi_prediction package uses inline types.


--------------------------------------------------------------------------------
/app/streamlit_sample/00_simple.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | 
3 | st.write("Hello, Streamlit!")
4 | 


--------------------------------------------------------------------------------
/docs/images/vscode_拡張機能.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/vscode_拡張機能.png


--------------------------------------------------------------------------------
/docs/images/data_download.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/data_download.png


--------------------------------------------------------------------------------
/docs/images/docker_desktop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/docker_desktop.png


--------------------------------------------------------------------------------
/docs/images/vscode_拡張機能_検索.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/vscode_拡張機能_検索.png


--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | ## Review deadline
2 | 
3 | ## What this change aims to achieve and why
4 | 
5 | ## Areas that need particular review attention
6 | 
7 | ## Related information, issues, etc. (if any)
8 | 


--------------------------------------------------------------------------------
/docs/images/devcontainer_complete.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/devcontainer_complete.png


--------------------------------------------------------------------------------
/docs/images/devcontainer_起動ポップアップ.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/devcontainer_起動ポップアップ.png


--------------------------------------------------------------------------------
/.env.example:
--------------------------------------------------------------------------------
1 | # Application settings
2 | CONTAINER_NAME=ds_instructions_guide
3 | # Ports can collide when several Docker Compose projects run at once, so use a different port for each project
4 | PORT=8501
5 | 


--------------------------------------------------------------------------------
/docs/images/docker_desktop_complete.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/docker_desktop_complete.png


--------------------------------------------------------------------------------
/docs/images/vscode_download_for_mac.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/vscode_download_for_mac.png


--------------------------------------------------------------------------------
/docs/images/git_download_for_windows.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/git_download_for_windows.png


--------------------------------------------------------------------------------
/app/streamlit_sample/02_markdown.py:
--------------------------------------------------------------------------------
1 | import streamlit as st  # noqa
2 | 
3 | """
4 | # Taxi ride count prediction
5 | A prototype that predicts the number of taxi rides.
6 | ## Upload a file for prediction
7 | """
8 | 


--------------------------------------------------------------------------------
/app/streamlit_sample/01_text_header.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | 
3 | st.title("Taxi ride count prediction")
4 | st.write("A prototype that predicts the number of taxi rides.")
5 | st.header("Upload a file for prediction")
6 | 


--------------------------------------------------------------------------------
/app/streamlit_sample/05_user_input.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | 
3 | list_selected_area: list[str] = st.multiselect("Select areas", ["1", "3"])
4 | display_period = st.slider("Days to display", min_value=1, max_value=100, value=30)
5 | 
6 | st.write(f"Selected areas: {list_selected_area}, period: {display_period} days")
7 | 


--------------------------------------------------------------------------------
/compose.yml:
--------------------------------------------------------------------------------
 1 | services:
 2 |   app:
 3 |     build:
 4 |       context: ./
 5 |       dockerfile: Dockerfile
 6 |     container_name: ${CONTAINER_NAME}
 7 |     volumes:
 8 |       - ./:/app:cached
 9 |     working_dir: /app
10 |     env_file:
11 |       - .env
12 |     ports:
13 |       - 127.0.0.1:${PORT}:8501
14 |     tty: true
15 | 
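
compose.yml refers to `${CONTAINER_NAME}` and `${PORT}`, which Docker Compose fills in from the `.env` file in the project directory. The sketch below imitates that substitution with Python's `string.Template`, which uses the same `${NAME}` form. It is an illustration only: `parse_env` is a hypothetical helper written for this example, and real Compose interpolation supports more forms (default values, quoting rules) than shown here.

```python
from string import Template

# Contents modeled on .env.example (comments translated).
ENV_EXAMPLE = """\
# Application settings
CONTAINER_NAME=ds_instructions_guide
PORT=8501
"""


def parse_env(text: str) -> dict[str, str]:
    """Hypothetical helper: read KEY=VALUE lines, skipping blanks and comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


env = parse_env(ENV_EXAMPLE)

# The two interpolated fields from compose.yml.
container_name = Template("${CONTAINER_NAME}").substitute(env)
port_mapping = Template("127.0.0.1:${PORT}:8501").substitute(env)

print(container_name)
print(port_mapping)
```

Changing `PORT` in `.env` only moves the host side of the mapping. The container side stays at 8501, which matches the `--server.port` value in the Dockerfile's `CMD`.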


--------------------------------------------------------------------------------
/app/streamlit_sample/07_no_cache.py:
--------------------------------------------------------------------------------
 1 | import time
 2 | 
 3 | import streamlit as st
 4 | 
 5 | 
 6 | def heavy_process() -> str:
 7 |     time.sleep(3)
 8 |     return "heavy process"
 9 | 
10 | 
11 | display_period = st.slider("Days to display", min_value=1, max_value=100, value=30)
12 | heavy_process()
13 | st.write(f"Display period: {display_period} days")
14 | 


--------------------------------------------------------------------------------
/scripts/python_interactive_window.py:
--------------------------------------------------------------------------------
 1 | # %%[markdown]
 2 | # Markdown cells are also supported
 3 | # # Header 1
 4 | # - list1
 5 | # - list2
 6 | 
 7 | # %%
 8 | import matplotlib.pyplot as plt
 9 | import numpy as np
10 | 
11 | # %%
12 | print("Hello, world!")
13 | 
14 | # %%
15 | x = np.linspace(0, 10, 100)
16 | y = np.sin(x)
17 | plt.plot(x, y)
18 | 
19 | # %%
20 | 


--------------------------------------------------------------------------------
/tests/test_double.py:
--------------------------------------------------------------------------------
 1 | def double(x: int) -> int:
 2 |     """Double the given integer."""
 3 |     return x * 2
 4 | 
 5 | 
 6 | def test_double_ok() -> None:
 7 |     """Happy-path test confirming that double works correctly."""
 8 |     assert 2 == double(1)
 9 | 
10 | 
11 | # Confirms that a failing test is properly detected
12 | def test_double_ng() -> None:
13 |     """A test that fails on purpose."""
14 |     assert 2 == double(2)
15 | 


--------------------------------------------------------------------------------
/app/streamlit_sample/06_file_upload.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import streamlit as st
3 | 
4 | uploaded_file = st.file_uploader("Upload a CSV file for prediction", type="csv")
5 | if uploaded_file is not None:
6 |     df_upload = pd.read_csv(uploaded_file, parse_dates=["date"])
7 |     df_upload["area"] = df_upload["area"].astype("category")
8 |     st.dataframe(df_upload)
9 | 


--------------------------------------------------------------------------------
/app/streamlit_sample/08_cache.py:
--------------------------------------------------------------------------------
 1 | import time
 2 | 
 3 | import streamlit as st
 4 | 
 5 | 
 6 | @st.cache_data  # add caching
 7 | def heavy_process() -> str:
 8 |     time.sleep(3)
 9 |     return "heavy process"
10 | 
11 | 
12 | display_period = st.slider("Days to display", min_value=1, max_value=100, value=30)
13 | heavy_process()
14 | st.write(f"Display period: {display_period} days")
15 | 
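
The contrast between 07_no_cache.py and 08_cache.py can be reproduced without Streamlit. The sketch below uses `functools.lru_cache` from the standard library as a stand-in for `st.cache_data`. That substitution is an assumption made for illustration: `st.cache_data` has its own keying and serialization behavior, and the sleep is shortened from 3 seconds to 0.2 seconds so the example runs quickly.

```python
import time
from collections.abc import Callable
from functools import lru_cache


def heavy_process_no_cache() -> str:
    time.sleep(0.2)  # shortened stand-in for the 3-second sleep
    return "heavy process"


@lru_cache(maxsize=None)  # stand-in for @st.cache_data
def heavy_process_cached() -> str:
    time.sleep(0.2)
    return "heavy process"


def timed(func: Callable[[], str]) -> float:
    """Return how long one call to func takes, in seconds."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


first_call = timed(heavy_process_cached)   # pays the full cost
second_call = timed(heavy_process_cached)  # served from the cache
uncached_call = timed(heavy_process_no_cache)  # pays the full cost every time

print(f"cached, first call:  {first_call:.3f}s")
print(f"cached, second call: {second_call:.3f}s")
print(f"uncached call:       {uncached_call:.3f}s")
```

In the Streamlit samples the same effect shows up when the slider moves: the script reruns from the top, so the uncached version sleeps on every interaction while the cached version sleeps only once.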


--------------------------------------------------------------------------------
/scripts/conf/config.yaml:
--------------------------------------------------------------------------------
 1 | data_path: "/app/data/taxi_dataset.csv"
 2 | train_ratio: 0.7
 3 | model:
 4 |   objective: "regression"
 5 |   metric: "rmse"
 6 |   boosting_type: "gbdt"
 7 |   num_leaves: 30
 8 |   learning_rate: 0.05
 9 |   feature_fraction: 0.9
10 |   bagging_fraction: 0.8
11 |   bagging_freq: 5
12 | train:
13 |   num_boost_round: 1000
14 |   early_stopping_rounds: 10
15 | 
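
The `train` section caps boosting at `num_boost_round` rounds and stops earlier when the validation metric has not improved for `early_stopping_rounds` consecutive rounds. The sketch below shows that stopping rule in plain Python on a made-up list of validation scores (lower is better, as with RMSE). It is not LightGBM's implementation, and the function name and the scores are hypothetical.

```python
def best_round_with_early_stopping(
    valid_scores: list[float],
    num_boost_round: int,
    early_stopping_rounds: int,
) -> tuple[int, float]:
    """Return (best_round, best_score) under a simple early-stopping rule."""
    best_score = float("inf")
    best_round = 0
    for round_index, score in enumerate(valid_scores[:num_boost_round], start=1):
        if score < best_score:
            # Improvement: remember this round and keep going.
            best_score = score
            best_round = round_index
        elif round_index - best_round >= early_stopping_rounds:
            # No improvement for early_stopping_rounds rounds: stop.
            break
    return best_round, best_score


# Hypothetical validation scores, one per boosting round.
scores = [5.0, 4.0, 3.5, 3.6, 3.7, 3.8, 3.4]
best_round, best_score = best_round_with_early_stopping(
    scores, num_boost_round=1000, early_stopping_rounds=3
)
print(best_round, best_score)
```

With a patience of 3, training stops at round 6 and never sees the better score at round 7. This is the trade-off behind choosing `early_stopping_rounds`.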


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | # Python bytecode
 2 | __pycache__/
 3 | # pytest and mypy caches
 4 | .pytest_cache/
 5 | .mypy_cache/
 6 | # Jupyter Notebook checkpoints
 7 | .ipynb_checkpoints
 8 | # Environment variable file
 9 | .env
10 | # VS Code settings
11 | .vscode/
12 | # macOS system file
13 | .DS_Store
14 | # Everything in data/ except .gitkeep
15 | data/*
16 | !data/.gitkeep
17 | # Everything in model/ except .gitkeep
18 | model/*
19 | !model/.gitkeep
20 | # Everything in outputs/ except .gitkeep
21 | outputs/*
22 | !outputs/.gitkeep
23 | # MLflow and Hydra multirun output directories
24 | mlruns/
25 | multirun/
26 | 


--------------------------------------------------------------------------------
/app/streamlit_sample/03_dataframe.py:
--------------------------------------------------------------------------------
 1 | import datetime
 2 | 
 3 | import pandas as pd
 4 | import streamlit as st
 5 | 
 6 | df = pd.DataFrame(
 7 |     {
 8 |         "area": [1, 1, 1, 3, 3, 3],
 9 |         "date": [
10 |             datetime.date(2019, 12, 1),
11 |             datetime.date(2019, 12, 2),
12 |             datetime.date(2019, 12, 3),
13 |             datetime.date(2019, 12, 1),
14 |             datetime.date(2019, 12, 2),
15 |             datetime.date(2019, 12, 3),
16 |         ],
17 |         "num_trip": [100, 200, 300, 200, 400, 600],
18 |         "label": ["actual", "actual", "predicted", "actual", "actual", "predicted"],
19 |     }
20 | )
21 | 
22 | st.dataframe(df)
23 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.yml:
--------------------------------------------------------------------------------
 1 | name: Bug report
 2 | description: Choose this if the code does not work even when you follow the book, or if you found a bug
 3 | labels: ["bug"]
 4 | body:
 5 | - type: textarea
 6 |   attributes:
 7 |     label: Error encountered at run time
 8 |     description: Describe what went wrong when you ran the code.
 9 |   validations:
10 |     required: true
11 | - type: textarea
12 |   attributes:
13 |     label: Error details
14 |     description: Paste the relevant part of the error message or log.
15 |   validations:
16 |     required: true
17 | - type: textarea
18 |   attributes:
19 |     label: Environment
20 |     description: |
21 |       Example:
22 |         - OS: macOS 14 / Ubuntu 22.04
23 |         - Python: 3.10.12
24 |         - Library versions, etc.
25 |     placeholder: |
26 |       OS:
27 |       Python:
28 |       Library versions:
29 |   validations:
30 |     required: true
31 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Sample repository for 『先輩データサイエンティストからの指南書』
 2 | 
 3 | This repository provides the sample code for the book 『先輩データサイエンティストからの指南書 実務で生き抜くためのエンジニアリングスキル』, published by 技術評論社.
 4 | 
 5 | <img src="https://gihyo.jp/assets/images/cover/2025/9784297151003.jpg" alt="Book cover" width="200">
 6 | 
 7 | ## Bibliographic information
 8 | 『先輩データサイエンティストからの指南書 実務で生き抜くためのエンジニアリングスキル』, by Junki Asano, Toma Tanaka, Katsuhiro Muto, Masaya Kimura, and Mizuho Yanagi. Published in 2025 by 技術評論社. ISBN: 978-4-297-15100-3 (print), 978-4-297-15101-0 (ebook)
 9 | 
10 | - 技術評論社: https://gihyo.jp/book/2025/978-4-297-15100-3
11 | - Amazon: https://www.amazon.co.jp/dp/4297151006/
12 | - Support page: https://gihyo.jp/book/2025/978-4-297-15100-3/support
13 |    - Lists the references and errata.
14 | 
15 | For inquiries about the book, please use the 技術評論社 site below.
16 | 
17 | https://gihyo.jp/site/inquiry/book?ISBN=978-4-297-15100-3
18 | 
19 | ## Setup
20 | 
21 | For setup instructions, see the following document.
22 | 
23 | - [docs/setup.md](docs/setup.md)
24 | 


--------------------------------------------------------------------------------
/scripts/train.py:
--------------------------------------------------------------------------------
 1 | from logging import getLogger
 2 | 
 3 | import hydra
 4 | from omegaconf import DictConfig
 5 | 
 6 | from taxi_prediction.model import LGBModel
 7 | from taxi_prediction.process import load_dataset, preprocess_for_train, split_dataset
 8 | 
 9 | logger = getLogger(__name__)
10 | 
11 | 
12 | @hydra.main(config_path="conf", config_name="config")
13 | def main(config: DictConfig) -> None:
14 |     # Load the data
15 |     dataset = load_dataset(config.data_path)
16 | 
17 |     # Preprocess
18 |     dataset_train, dataset_valid = split_dataset(dataset, config.train_ratio)
19 |     df_train = preprocess_for_train(dataset_train)
20 |     df_valid = preprocess_for_train(dataset_valid)
21 |     logger.info(f"train_size: {len(df_train)}")
22 |     logger.info(f"valid_size: {len(df_valid)}")
23 | 
24 |     # Train
25 |     model = LGBModel(dict(config.model))
26 |     model.fit(df_train, df_valid, **config.train)
27 | 
28 |     # Predict and evaluate
29 |     scores = model.evaluate(df_valid)
30 |     logger.info(f"scores: {scores}")
31 | 
32 |     # Save the model
33 |     model.save("model.pickle")
34 | 
35 | 
36 | if __name__ == "__main__":
37 |     main()
38 | 


--------------------------------------------------------------------------------
/app/streamlit_sample/04_plot.py:
--------------------------------------------------------------------------------
 1 | import datetime
 2 | 
 3 | import pandas as pd
 4 | import plotly.express as px
 5 | import plotly.graph_objects as go
 6 | import streamlit as st
 7 | 
 8 | 
 9 | def _plot_prediction(df: pd.DataFrame) -> go.Figure:
10 |     fig = px.line(
11 |         df, x="date", y="num_trip", color="area", markers=True, line_dash="label"
12 |     )
13 |     fig.update_layout(
14 |         title="Ride count over time",
15 |         xaxis_title="Date",
16 |         yaxis_title="Ride count",
17 |         legend_title="Area, label",
18 |     )
19 |     return fig
20 | 
21 | 
22 | df = pd.DataFrame(
23 |     {
24 |         "area": [1, 1, 1, 3, 3, 3],
25 |         "date": [
26 |             datetime.date(2019, 12, 1),
27 |             datetime.date(2019, 12, 2),
28 |             datetime.date(2019, 12, 3),
29 |             datetime.date(2019, 12, 1),
30 |             datetime.date(2019, 12, 2),
31 |             datetime.date(2019, 12, 3),
32 |         ],
33 |         "num_trip": [100, 200, 300, 200, 400, 600],
34 |         "label": ["actual", "actual", "predicted", "actual", "actual", "predicted"],
35 |     }
36 | )
37 | 
38 | fig = _plot_prediction(df)
39 | st.plotly_chart(fig)
40 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2025 Junki Asano, Masaya Kimura, Toma Tanaka, Katsuhiro Muto, Mizuho Yanagi.
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 6 | 
 7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
 8 | 
 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
10 | 


--------------------------------------------------------------------------------
/src/taxi_prediction/schema.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import pandas as pd
 3 | import pandera as pa
 4 | from pandera.typing import Index, Series
 5 | 
 6 | from .consts import MAX_PREDICT_DAYS
 7 | 
 8 | 
 9 | class TaxiDatasetSchema(pa.DataFrameModel):
10 |     date: Series[np.datetime64]
11 |     area: Series[pd.CategoricalDtype]
12 |     num_trip: Series[int] = pa.Field(ge=0)
13 | 
14 | 
15 | class InferInputSchema(pa.DataFrameModel):
16 |     date: Index[np.datetime64]
17 |     target_date: Index[np.datetime64]
18 |     area: Series[pd.CategoricalDtype]
19 |     num_trip: Series[int] = pa.Field(ge=0)
20 |     weekday: Series[int] = pa.Field(in_range={"min_value": 0, "max_value": 6})
21 |     target_lead: Series[int] = pa.Field(
22 |         in_range={"min_value": 1, "max_value": MAX_PREDICT_DAYS}
23 |     )
24 | 
25 |     @pa.dataframe_check
26 |     def target_date_is_consistent(cls, df: pd.DataFrame) -> Series[bool]:
27 |         """Check that date + target_lead = target_date
28 |         holds for every row
29 |         """
30 |         return df.index.get_level_values("date") + pd.to_timedelta(  # type: ignore
31 |             df["target_lead"], unit="D"
32 |         ) == df.index.get_level_values("target_date")
33 | 
34 |     class Config:
35 |         strict = True
36 |         coerce = True
37 | 
38 | 
39 | class TrainInputSchema(InferInputSchema):
40 |     target: Series[int] = pa.Field(ge=0)
41 | 
42 | 
43 | class InferOutputSchema(pa.DataFrameModel):
44 |     date: Index[np.datetime64]
45 |     target_date: Index[np.datetime64]
46 |     area: Series[pd.CategoricalDtype]
47 |     pred: Series[float]
48 | 
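
`InferInputSchema` combines a range constraint on `target_lead` with a dataframe-level check that `date + target_lead` equals `target_date`. The sketch below restates those two rules in plain Python on a few hand-written rows, as a way to see which rows would pass. It does not use pandera, and `check_rows` and the sample rows are hypothetical.

```python
from datetime import date, timedelta
from typing import Any

MAX_PREDICT_DAYS = 7  # same value as consts.py


def check_rows(rows: list[dict[str, Any]]) -> list[bool]:
    """Return one pass/fail flag per row for the two schema rules."""
    results = []
    for row in rows:
        # Rule 1: target_lead must be between 1 and MAX_PREDICT_DAYS.
        lead_in_range = 1 <= row["target_lead"] <= MAX_PREDICT_DAYS
        # Rule 2: date + target_lead days must equal target_date.
        expected_target = row["date"] + timedelta(days=row["target_lead"])
        dates_consistent = expected_target == row["target_date"]
        results.append(lead_in_range and dates_consistent)
    return results


rows = [
    # Consistent row.
    {"date": date(2024, 1, 1), "target_lead": 2, "target_date": date(2024, 1, 3)},
    # target_date is one day too late.
    {"date": date(2024, 1, 1), "target_lead": 2, "target_date": date(2024, 1, 4)},
    # Dates agree, but the lead exceeds MAX_PREDICT_DAYS.
    {"date": date(2024, 1, 1), "target_lead": 8, "target_date": date(2024, 1, 9)},
]
results = check_rows(rows)
print(results)
```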


--------------------------------------------------------------------------------
/scripts/train_with_mlflow.py:
--------------------------------------------------------------------------------
 1 | from logging import getLogger
 2 | 
 3 | import hydra
 4 | import mlflow
 5 | from omegaconf import DictConfig
 6 | 
 7 | from taxi_prediction.model import LGBModel
 8 | from taxi_prediction.process import load_dataset, preprocess_for_train, split_dataset
 9 | 
10 | logger = getLogger(__name__)
11 | 
12 | 
13 | @hydra.main(config_path="conf", config_name="config")
14 | def main(config: DictConfig) -> None:
15 |     mlflow.set_tracking_uri("file:///app/mlruns")
16 |     mlflow.set_experiment("taxi_prediction")
17 | 
18 |     with mlflow.start_run():
19 |         mlflow.log_param("train_ratio", config.train_ratio)
20 |         mlflow.log_params(config.model)
21 |         mlflow.log_params(config.train)
22 |         mlflow.set_tag("model_type", "lightgbm")
23 | 
24 |         # Load the data
25 |         dataset = load_dataset(config.data_path)
26 | 
27 |         # Preprocess
28 |         dataset_train, dataset_valid = split_dataset(dataset, config.train_ratio)
29 |         df_train = preprocess_for_train(dataset_train)
30 |         df_valid = preprocess_for_train(dataset_valid)
31 |         logger.info(f"train_size: {len(df_train)}")
32 |         logger.info(f"valid_size: {len(df_valid)}")
33 |         mlflow.log_table(df_train, "df_train.json")
34 |         mlflow.log_table(df_valid, "df_valid.json")
35 | 
36 |         # Train
37 |         model = LGBModel(dict(config.model))
38 |         model.fit(df_train, df_valid, **config.train)
39 | 
40 |         # Predict and evaluate
41 |         scores = model.evaluate(df_valid)
42 |         logger.info(f"scores: {scores}")
43 |         mlflow.log_metrics(scores)
44 | 
45 |         # Save the model
46 |         model.save("model.pickle")
47 |         mlflow.log_artifact("model.pickle")
48 | 
49 | 
50 | if __name__ == "__main__":
51 |     main()
52 | 


--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
 1 | # Based on the following documents
 2 | # https://docs.astral.sh/uv/guides/integration/docker/#using-uv-in-docker
 3 | # https://github.com/astral-sh/uv-docker-example/blob/main/Dockerfile
 4 | 
 5 | FROM python:3.12-slim-bookworm
 6 | 
 7 | # Install system packages
 8 | RUN apt-get update && \
 9 |     apt-get install --no-install-recommends -y \
10 |         ca-certificates curl fonts-ipafont-gothic gcc git locales sudo tmux tzdata vim zsh && \
11 |     apt-get clean && \
12 |     rm -rf /var/lib/apt/lists/*
13 | 
14 | # Locale settings
15 | RUN echo "ja_JP.UTF-8 UTF-8" > /etc/locale.gen && \
16 |     locale-gen ja_JP.UTF-8
17 | ENV LANG=ja_JP.UTF-8
18 | ENV LC_ALL=ja_JP.UTF-8
19 | ENV TZ=Asia/Tokyo
20 | 
21 | # Install uv
22 | ADD https://astral.sh/uv/install.sh /uv-installer.sh
23 | RUN sh /uv-installer.sh && rm /uv-installer.sh
24 | ENV PATH="/root/.local/bin/:$PATH"
25 | 
26 | # Use the system Python
27 | # cf. https://docs.astral.sh/uv/concepts/projects/config/#project-environment-path
28 | ENV UV_PROJECT_ENVIRONMENT="/usr/local/"
29 | ENV UV_LINK_MODE=copy
30 | 
31 | WORKDIR /app
32 | 
33 | # Install dependencies
34 | # Note: project dependencies change less often than the source code, so they get their own layer to make better use of the build cache
35 | # cf. https://docs.astral.sh/uv/guides/integration/docker/#intermediate-layers
36 | RUN --mount=type=cache,target=/root/.cache/uv \
37 |     --mount=type=bind,source=uv.lock,target=uv.lock \
38 |     --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
39 |     uv sync --frozen --no-install-project
40 | 
41 | # Install the project source code
42 | COPY ./src/ /app/src/
43 | COPY ./README.md /app/
44 | RUN --mount=type=cache,target=/root/.cache/uv \
45 |     --mount=type=bind,source=uv.lock,target=uv.lock \
46 |     --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
47 |     uv sync --frozen
48 | 
49 | COPY ./app/. /app/app/
50 | COPY ./model/. /app/model/
51 | 
52 | CMD ["streamlit", "run", "app/app.py", "--server.port", "8501"]
53 | 


--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
 1 | [project]
 2 | name = "taxi_prediction"
 3 | description = "Sample repository for 『先輩データサイエンティストからの指南書』 (技術評論社)"
 4 | authors = [
 5 |     { name = "Junki Asano" },
 6 |     { name = "Toma Tanaka" },
 7 |     { name = "Katsuhiro Muto" },
 8 |     { name = "Masaya Kimura" },
 9 |     { name = "Mizuho Yanagi" },
10 | ]
11 | version = "1.0.0"
12 | readme = "README.md"
13 | license = "MIT"
14 | requires-python = ">=3.12,<3.13"
15 | dependencies = [
16 |     "chardet>=5.2.0",
17 |     "google-cloud-bigquery>=3.32.0",
18 |     "lightgbm>=4.6.0",
19 |     "numpy>=1.26.4",
20 |     "pandas>=2.2.3",
21 |     "plotly>=6.0.1",
22 |     "scikit-learn>=1.5.2",
23 |     "statsmodels>=0.14.4",
24 |     "streamlit>=1.45.1",
25 |     "tqdm>=4.67.1",
26 | ]
27 | 
28 | [dependency-groups]
29 | dev = [
30 |     "hydra-core>=1.3.2",
31 |     "japanize-matplotlib>=1.1.3",
32 |     "jupyter>=1.1.1",
33 |     "matplotlib>=3.10.3",
34 |     "mlflow>=2.22.0",
35 |     "mypy>=1.15.0",
36 |     "pandera>=0.23.1",
37 |     "pytest>=8.3.5",
38 |     "ruff>=0.11.9",
39 | ]
40 | 
41 | [build-system]
42 | requires = ["hatchling"]
43 | build-backend = "hatchling.build"
44 | 
45 | [tool.ruff]
46 | target-version = "py312"
47 | 
48 | [tool.ruff.lint]
49 | # Rules to enable
50 | # cf. https://beta.ruff.rs/docs/rules/
51 | select = [
52 |     "B",  # flake8-bugbear
53 |     "E",  # pycodestyle error
54 |     "F",  # Pyflakes
55 |     "I",  # isort
56 |     "W",  # pycodestyle warning
57 | ]
58 | # Rules to fix automatically
59 | fixable = ["E", "F", "I"]
60 | # Rules never fixed automatically
61 | unfixable = ["W"]
62 | 
63 | [tool.mypy]
64 | python_version = "3.12"
65 | # Checks
66 | warn_unreachable = true  # warn about unreachable code
67 | disallow_untyped_defs = true  # disallow functions without type annotations
68 | warn_return_any = true  # warn when a function returns a value typed Any
69 | ignore_missing_imports = true  # ignore imports whose type information is missing
70 | # Output formatting
71 | pretty = true
72 | show_error_context = true
73 | 
74 | [tool.pytest.ini_options]
75 | minversion = "6.0"
76 | testpaths = ["tests/*"]
77 | # importlib import mode
78 | # cf. https://docs.pytest.org/en/latest/explanation/goodpractices.html#choosing-a-test-layout
79 | addopts = [
80 |     "-svv",
81 |     "--tb=short",
82 |     "--capture=no",
83 |     "--full-trace",
84 |     "--import-mode=importlib",
85 | ]
86 | 


--------------------------------------------------------------------------------
/tests/taxi_prediction/test_process.py:
--------------------------------------------------------------------------------
 1 | import io
 2 | 
 3 | import pandas as pd
 4 | import pytest
 5 | from pandas.testing import assert_frame_equal
 6 | from pandera.typing import DataFrame
 7 | 
 8 | from taxi_prediction.process import preprocess_for_infer, preprocess_for_train
 9 | from taxi_prediction.schema import TaxiDatasetSchema
10 | 
11 | 
12 | @pytest.fixture
13 | def df_sample() -> DataFrame[TaxiDatasetSchema]:
14 |     csv_data = """
15 | date,area,num_trip
16 | 2024-01-01,1,5
17 | 2024-01-02,1,8
18 | 2024-01-01,2,20
19 | 2024-01-02,2,18
20 | """
21 |     df = pd.read_csv(io.StringIO(csv_data), parse_dates=["date"])
22 |     df["area"] = df["area"].astype("category")
23 |     return df  # type: ignore
24 | 
25 | 
26 | def test_preprocess_for_infer(df_sample: DataFrame[TaxiDatasetSchema]) -> None:
27 |     actual = preprocess_for_infer(df_sample, max_predict_days=3)
28 | 
29 |     expected_csv = """
30 | date,target_date,area,num_trip,weekday,target_lead
31 | 2024-01-01,2024-01-02,1,5,0,1
32 | 2024-01-01,2024-01-03,1,5,0,2
33 | 2024-01-01,2024-01-04,1,5,0,3
34 | 2024-01-02,2024-01-03,1,8,1,1
35 | 2024-01-02,2024-01-04,1,8,1,2
36 | 2024-01-02,2024-01-05,1,8,1,3
37 | 2024-01-01,2024-01-02,2,20,0,1
38 | 2024-01-01,2024-01-03,2,20,0,2
39 | 2024-01-01,2024-01-04,2,20,0,3
40 | 2024-01-02,2024-01-03,2,18,1,1
41 | 2024-01-02,2024-01-04,2,18,1,2
42 | 2024-01-02,2024-01-05,2,18,1,3
43 | """
44 |     expected = pd.read_csv(
45 |         io.StringIO(expected_csv), parse_dates=["date", "target_date"]
46 |     ).set_index(["date", "target_date"])
47 |     expected["area"] = expected["area"].astype("category")
48 | 
49 |     assert_frame_equal(actual, expected)
50 | 
51 | 
52 | def test_preprocess_for_train(df_sample: DataFrame[TaxiDatasetSchema]) -> None:
53 |     actual = preprocess_for_train(df_sample, max_predict_days=3)
54 | 
55 |     expected_csv = """
56 | date,target_date,area,num_trip,weekday,target_lead,target
57 | 2024-01-01,2024-01-02,1,5,0,1,8
58 | 2024-01-01,2024-01-02,2,20,0,1,18
59 | """
60 |     expected = pd.read_csv(
61 |         io.StringIO(expected_csv), parse_dates=["date", "target_date"]
62 |     ).set_index(["date", "target_date"])
63 |     expected["area"] = expected["area"].astype("category")
64 | 
65 |     assert_frame_equal(actual, expected, check_dtype=False)
66 | 
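
process.py is not shown in this listing, but the expected CSVs in the tests above describe the shape of its output: each input row is repeated once per lead day, and the training variant keeps only rows whose target date was actually observed. The sketch below reproduces that shape in plain Python. It is an assumption reconstructed from the test data, not the package's implementation, and `expand_for_infer` and `attach_target` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Any

Row = dict[str, Any]


def expand_for_infer(rows: list[Row], max_predict_days: int) -> list[Row]:
    """Repeat each row once per lead day, adding target_date, weekday, target_lead."""
    expanded = []
    for row in rows:
        for lead in range(1, max_predict_days + 1):
            expanded.append(
                {
                    "date": row["date"],
                    "target_date": row["date"] + timedelta(days=lead),
                    "area": row["area"],
                    "num_trip": row["num_trip"],
                    "weekday": row["date"].weekday(),  # Monday is 0
                    "target_lead": lead,
                }
            )
    return expanded


def attach_target(expanded: list[Row], rows: list[Row]) -> list[Row]:
    """Add the observed num_trip on target_date; drop rows with no observation."""
    observed = {(r["date"], r["area"]): r["num_trip"] for r in rows}
    with_target = []
    for row in expanded:
        key = (row["target_date"], row["area"])
        if key in observed:
            with_target.append({**row, "target": observed[key]})
    return with_target


# Same data as the df_sample fixture.
sample = [
    {"date": date(2024, 1, 1), "area": 1, "num_trip": 5},
    {"date": date(2024, 1, 2), "area": 1, "num_trip": 8},
    {"date": date(2024, 1, 1), "area": 2, "num_trip": 20},
    {"date": date(2024, 1, 2), "area": 2, "num_trip": 18},
]

infer_rows = expand_for_infer(sample, max_predict_days=3)
train_rows = attach_target(infer_rows, sample)

print(len(infer_rows), len(train_rows))
```

The 4 input rows become 12 inference rows (4 rows times 3 lead days), and only 2 of them have an observed target, which matches the row counts in the two expected CSVs.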


--------------------------------------------------------------------------------
/src/taxi_prediction/model.py:
--------------------------------------------------------------------------------
  1 | import pickle
  2 | from pathlib import Path
  3 | from typing import Any, Self
  4 | 
  5 | import lightgbm as lgb
  6 | import pandera as pa
  7 | from pandera.typing import DataFrame
  8 | from sklearn.metrics import mean_absolute_error
  9 | 
 10 | from .schema import InferInputSchema, InferOutputSchema, TrainInputSchema
 11 | 
 12 | 
 13 | class LGBModel:
 14 |     """
 15 |     Wrapper class for LightGBM
 16 |     """
 17 | 
 18 |     def __init__(self, params: dict[str, Any]) -> None:
 19 |         self._params = params.copy()
 20 |         self._model: lgb.Booster | None = None
 21 | 
 22 |     @pa.check_types
 23 |     def fit(
 24 |         self,
 25 |         df_train: DataFrame[TrainInputSchema],
 26 |         df_valid: DataFrame[TrainInputSchema],
 27 |         num_boost_round: int = 1000,
 28 |         early_stopping_rounds: int = 10,
 29 |     ) -> Self:
 30 |         """
 31 |         Train the model
 32 |         """
 33 |         data_train = lgb.Dataset(
 34 |             data=df_train.drop(columns=["target"]), label=df_train["target"]
 35 |         )
 36 |         data_valid = lgb.Dataset(
 37 |             data=df_valid.drop(columns=["target"]), label=df_valid["target"]
 38 |         )
 39 | 
 40 |         self._model = lgb.train(
 41 |             params=self._params,
 42 |             train_set=data_train,
 43 |             valid_sets=[data_train, data_valid],
 44 |             valid_names=["train", "valid"],
 45 |             num_boost_round=num_boost_round,
 46 |             callbacks=[
 47 |                 lgb.early_stopping(stopping_rounds=early_stopping_rounds),
 48 |                 lgb.log_evaluation(period=100),
 49 |             ],
 50 |         )
 51 | 
 52 |         return self
 53 | 
 54 |     @pa.check_types
 55 |     def predict(self, df: DataFrame[InferInputSchema]) -> DataFrame[InferOutputSchema]:
 56 |         """
 57 |         Run prediction.
 58 |         """
 59 |         if self._model is None:
 60 |             raise ValueError("Model has not been trained.")
 61 | 
 62 |         pred = self._model.predict(df, num_iteration=self._model.best_iteration)
 63 | 
 64 |         columns = list(InferOutputSchema.to_schema().columns.keys())
 65 |         return df.assign(pred=pred)[columns]  # type: ignore
 66 | 
 67 |     @pa.check_types
 68 |     def evaluate(self, df: DataFrame[TrainInputSchema]) -> dict[str, float]:
 69 |         """
 70 |         Evaluate the model's prediction accuracy.
 71 |         Return a dictionary of evaluation metrics.
 72 |         """
 73 |         target = df["target"]
 74 |         pred = self.predict(df.drop(columns=["target"]))["pred"]
 75 | 
 76 |         scores = {}
 77 |         scores["mae"] = mean_absolute_error(target, pred)
 78 | 
 79 |         return scores
 80 | 
 81 |     def save(self, filepath: str | Path) -> None:
 82 |         """
 83 |         Save the model with pickle.
 84 |         """
 85 |         with open(filepath, "wb") as f:
 86 |             pickle.dump(self, f)
 87 | 
 88 |     @classmethod
 89 |     def load(cls, filepath: str | Path) -> Self:
 90 |         """
 91 |         Load a model instance from a pickle file.
 92 |         """
 93 |         with open(filepath, "rb") as file:
 94 |             model = pickle.load(file)
 95 | 
 96 |         if not isinstance(model, cls):
 97 |             raise TypeError(
 98 |                 "Loaded object type does not match expected type. "
 99 |                 f"Expected: {cls.__name__}, Actual: {type(model).__name__}"
100 |             )
101 | 
102 |         return model
103 | 


--------------------------------------------------------------------------------
/app/app.py:
--------------------------------------------------------------------------------
  1 | import datetime
  2 | from pathlib import Path
  3 | 
  4 | import pandas as pd
  5 | import pandera as pa
  6 | import plotly.express as px
  7 | import plotly.graph_objects as go
  8 | import streamlit as st
  9 | from pandera.typing import DataFrame
 10 | 
 11 | from taxi_prediction.model import LGBModel
 12 | from taxi_prediction.process import postprocess, preprocess_for_infer
 13 | from taxi_prediction.schema import TaxiDatasetSchema
 14 | 
 15 | 
 16 | @st.cache_resource
 17 | def load_model(model_path: str | Path) -> LGBModel:
 18 |     """Load the trained model."""
 19 |     return LGBModel.load(model_path)
 20 | 
 21 | 
 22 | @st.cache_data
 23 | @pa.check_types
 24 | def inference_usecase(
 25 |     df: DataFrame[TaxiDatasetSchema],
 26 |     model_path: str | Path,
 27 |     predict_start_date: datetime.date,
 28 | ) -> DataFrame[TaxiDatasetSchema]:
 29 |     """Inference workflow: run preprocessing, prediction, and postprocessing."""
 30 |     df_processed = preprocess_for_infer(df)
 31 |     model = load_model(model_path)
 32 |     df_pred = model.predict(df_processed)
 33 |     return postprocess(df_pred, predict_date=predict_start_date)
 34 | 
 35 | 
 36 | def _filter_by_area(df: pd.DataFrame, list_selected_area: list[str]) -> pd.DataFrame:
 37 |     """Filter the data by the areas selected by the user.
 38 | 
 39 |     Note: if list_selected_area is empty, all areas are selected.
 40 |     """
 41 |     if len(list_selected_area) == 0:
 42 |         return df
 43 |     return df[df["area"].isin(list_selected_area)]
 44 | 
 45 | 
 46 | def _filter_by_display_period(df: pd.DataFrame, display_period: int) -> pd.DataFrame:
 47 |     """Keep only the last display_period days up to the latest date."""
 48 |     return df[df["date"] > df["date"].max() - pd.Timedelta(days=display_period)]
 49 | 
 50 | 
 51 | def _plot_prediction(df: pd.DataFrame) -> go.Figure:
 52 |     """Plot the prediction results."""
 53 |     fig = px.line(
 54 |         df, x="date", y="num_trip", color="area", markers=True, line_dash="label"
 55 |     )
 56 |     fig.update_layout(
 57 |         title="Number of Trips Over Time",
 58 |         xaxis_title="Date",
 59 |         yaxis_title="Number of trips",
 60 |         legend_title="Area, Label",
 61 |     )
 62 |     return fig
 63 | 
 64 | 
 65 | def main() -> None:
 66 |     model_path = "model/model.pickle"
 67 | 
 68 |     st.title("Taxi Trip Count Prediction")
 69 |     st.write("A prototype that predicts the number of taxi trips.")
 70 |     st.header("Upload a file for prediction")
 71 | 
 72 |     uploaded_file = st.file_uploader(
 73 |         "Upload a file for prediction", type="csv"
 74 |     )
 75 | 
 76 |     if uploaded_file is not None:
 77 |         df_upload = pd.read_csv(uploaded_file, parse_dates=["date"])
 78 |         df_upload["area"] = df_upload["area"].astype("category")
 79 | 
 80 |         st.header("Prediction results")
 81 |         list_selected_area: list[str] = st.multiselect(
 82 |             "Select areas", df_upload["area"].unique()
 83 |         )
 84 |         display_period = st.slider(
 85 |             "Actual data display period (days)", min_value=1, max_value=100, value=30
 86 |         )
 87 | 
 88 |         df_upload = _filter_by_area(df_upload, list_selected_area)
 89 |         df_upload = _filter_by_display_period(df_upload, display_period)
 90 | 
 91 |         df_pred = inference_usecase(
 92 |             df_upload, model_path=model_path, predict_start_date=df_upload["date"].max()
 93 |         )
 94 | 
 95 |         df_upload["label"] = "Actual"
 96 |         df_pred["label"] = "Predicted"
 97 | 
 98 |         df_concat = pd.concat([df_upload, df_pred], ignore_index=True)
 99 | 
100 |         fig = _plot_prediction(df_concat)
101 |         st.plotly_chart(fig)
102 | 
103 | 
104 | if __name__ == "__main__":
105 |     main()
106 | 


--------------------------------------------------------------------------------
/src/taxi_prediction/process.py:
--------------------------------------------------------------------------------
  1 | import datetime
  2 | from pathlib import Path
  3 | 
  4 | import pandas as pd
  5 | import pandera as pa
  6 | from pandera.typing import DataFrame
  7 | 
  8 | from .consts import MAX_PREDICT_DAYS
  9 | from .schema import (
 10 |     InferInputSchema,
 11 |     InferOutputSchema,
 12 |     TaxiDatasetSchema,
 13 |     TrainInputSchema,
 14 | )
 15 | 
 16 | 
 17 | @pa.check_types
 18 | def load_dataset(filepath: str | Path) -> DataFrame[TaxiDatasetSchema]:
 19 |     """
 20 |     Load the taxi trip-count dataset.
 21 |     """
 22 |     df = pd.read_csv(filepath, parse_dates=["date"])
 23 |     df["area"] = df["area"].astype("category")
 24 |     return df  # type: ignore
 25 | 
 26 | 
 27 | @pa.check_types
 28 | def split_dataset(
 29 |     df: DataFrame[TaxiDatasetSchema], train_ratio: float
 30 | ) -> tuple[pd.DataFrame, pd.DataFrame]:
 31 |     """
 32 |     Split the dataset into two parts in chronological order.
 33 |     """
 34 |     df = df.sort_values("date").reset_index(drop=True)
 35 |     split_idx = int(len(df) * train_ratio)
 36 |     return df.iloc[:split_idx], df.iloc[split_idx:]
 37 | 
 38 | 
 39 | @pa.check_types
 40 | def preprocess_for_infer(
 41 |     df: DataFrame[TaxiDatasetSchema], max_predict_days: int = MAX_PREDICT_DAYS
 42 | ) -> DataFrame[InferInputSchema]:
 43 |     """
 44 |     Engineer features and return the data for inference.
 45 |     """
 46 |     # Add features
 47 |     df = df.assign(weekday=df["date"].dt.weekday.astype(int))
 48 | 
 49 |     list_df_sub: list[pd.DataFrame] = []  # one frame per lead, concatenated once below
 50 |     for lead in range(1, max_predict_days + 1):
 51 |         # Attach the lead in days (target_lead) and the date being predicted (target_date)
 52 |         df_sub = df.assign(
 53 |             target_lead=lead,
 54 |             target_date=df["date"] + pd.Timedelta(days=lead),
 55 |         )
 56 |         list_df_sub.append(df_sub)
 57 |     df_result = pd.concat(list_df_sub)
 58 |     df_result = df_result.sort_values(["area", "date", "target_date"]).set_index(
 59 |         ["date", "target_date"]
 60 |     )
 61 | 
 62 |     return df_result  # type: ignore
 63 | 
 64 | 
 65 | @pa.check_types
 66 | def preprocess_for_train(
 67 |     df: DataFrame[TaxiDatasetSchema], max_predict_days: int = MAX_PREDICT_DAYS
 68 | ) -> DataFrame[TrainInputSchema]:
 69 |     """
 70 |     Return training data: the inference data with the target variable added.
 71 |     """
 72 |     df_result = preprocess_for_infer(
 73 |         df, max_predict_days=max_predict_days
 74 |     ).reset_index()
 75 | 
 76 |     # Attach the target variable
 77 |     df_target = df[["area", "date", "num_trip"]].rename(
 78 |         columns={"date": "target_date", "num_trip": "target"}
 79 |     )
 80 |     df_result = (
 81 |         df_result.merge(
 82 |             df_target, on=["area", "target_date"], how="left", validate="m:1"
 83 |         )
 84 |         .dropna(subset=["target"])
 85 |         .convert_dtypes()
 86 |     )
 87 | 
 88 |     df_result = df_result.sort_values(["area", "date", "target_date"]).set_index(
 89 |         ["date", "target_date"]
 90 |     )
 91 | 
 92 |     return df_result  # type: ignore
 93 | 
 94 | 
 95 | @pa.check_types
 96 | def postprocess(
 97 |     df: DataFrame[InferOutputSchema],
 98 |     predict_date: datetime.date,
 99 |     max_predict_days: int = MAX_PREDICT_DAYS,
100 | ) -> DataFrame[TaxiDatasetSchema]:
101 |     """
102 |     Post-process the model output.
103 |     Return predictions for the max_predict_days days after predict_date in the original dataset format.
104 |     """
105 |     df = df.reset_index()
106 | 
107 |     # Extract the predictions for the max_predict_days days after predict_date
108 |     predict_datetime = pd.to_datetime(predict_date)
109 |     df_filtered = df[
110 |         (df["date"] == predict_datetime)
111 |         & (df["target_date"] >= predict_datetime + pd.Timedelta(days=1))
112 |         & (df["target_date"] <= predict_datetime + pd.Timedelta(days=max_predict_days))
113 |     ]
114 | 
115 |     # Convert to the TaxiDatasetSchema format
116 |     df_result = (
117 |         df_filtered[["target_date", "area", "pred"]]
118 |         .rename(columns={"target_date": "date", "pred": "num_trip"})
119 |         .sort_values(by=["area", "date"])
120 |         .reset_index(drop=True)
121 |     )
122 |     df_result["num_trip"] = df_result["num_trip"].clip(lower=0).astype(int)
123 | 
124 |     # Check that predictions are present for every area
125 |     expected_nrows = df["area"].nunique() * max_predict_days
126 |     if len(df_result) != expected_nrows:
127 |         raise ValueError(
128 |             "Number of extracted rows does not match the expected value. "
129 |             f"Expected: {expected_nrows}, Actual: {len(df_result)}"
130 |         )
131 | 
132 |     return df_result  # type: ignore
133 | 


--------------------------------------------------------------------------------
/.devcontainer/devcontainer.json:
--------------------------------------------------------------------------------
  1 | {
  2 |   "name": "Existing Docker Compose (Extend)",
  3 |   "dockerComposeFile": ["../compose.yml"],
  4 |   "service": "app",
  5 |   "workspaceFolder": "/app",
  6 |   "shutdownAction": "stopCompose",
  7 |   "overrideCommand": true,
  8 |   "customizations": {
  9 |     "vscode": {
 10 |       "settings": {
 11 |         // Default Python interpreter path
 12 |         "python.defaultInterpreterPath": "/usr/local/bin/python",
 13 |         "[python]": {
 14 |           // Format on file save
 15 |           "editor.formatOnSave": true,
 16 |           // Configure actions to run on file save
 17 |           "editor.codeActionsOnSave": {
 18 |             // Run the linter and fix auto-fixable errors
 19 |             "source.fixAll.ruff": "explicit",
 20 |             // Organize import statements
 21 |             "source.organizeImports.ruff": "explicit"
 22 |           },
 23 |           // Use Ruff as the formatter
 24 |           "editor.defaultFormatter": "charliermarsh.ruff"
 25 |         },
 26 |         // Format on notebook save
 27 |         "notebook.formatOnSave.enabled": true,
 28 |         // Configure actions to run on notebook save
 29 |         "notebook.codeActionsOnSave": {
 30 |           // Run the linter and fix auto-fixable errors
 31 |           "notebook.source.fixAll": "explicit",
 32 |           // Organize import statements
 33 |           "notebook.source.organizeImports": "explicit"
 34 |         },
 35 |         "autoDocstring.docstringFormat": "google",
 36 |         "python.testing.pytestArgs": [
 37 |           "tests",
 38 |           "-vv", // Show verbose results
 39 |           "-s" // Show output from print statements
 40 |         ],
 41 |         "python.testing.unittestEnabled": false,
 42 |         "python.testing.pytestEnabled": true,
 43 |         // Uncomment shell assignments (#!), line magics (#!%), and cell magics (#!%%) when parsing code cells.
 44 |         "jupyter.interactiveWindow.textEditor.magicCommandsAsComments": true,
 45 |         // Allow stepping into library code while debugging.
 46 |         "jupyter.debugJustMyCode": false,
 47 |         // Show argument names
 48 |         "python.analysis.inlayHints.callArgumentNames": "all",
 49 |         // Show variable types
 50 |         "python.analysis.inlayHints.variableTypes": true,
 51 |         // Show function return types
 52 |         "python.analysis.inlayHints.functionReturnTypes": true,
 53 |         // Debug configurations
 54 |         "launch": {
 55 |           "version": "0.2.0",
 56 |           "configurations": [
 57 |             // Debug while running as a web application with Streamlit
 58 |             {
 59 |               "name": "Streamlit: Start server",
 60 |               "type": "debugpy",
 61 |               "request": "launch",
 62 |               "module": "streamlit",
 63 |               "cwd": "${workspaceFolder}",
 64 |               "console": "internalConsole",
 65 |               "args": [
 66 |                 "run",
 67 |                 "app/app.py",
 68 |                 "--server.port",
 69 |                 "8501",
 70 |                 "--server.runOnSave",
 71 |                 "true"
 72 |               ],
 73 |               "justMyCode": false
 74 |             },
 75 |             // Run and debug the entire current file
 76 |             {
 77 |               "name": "Python: Current File",
 78 |               "type": "debugpy",
 79 |               "request": "launch",
 80 |               "program": "${file}",
 81 |               "console": "internalConsole",
 82 |               "justMyCode": false
 83 |             },
 84 |             // Write test code and debug within its scope (per function for unit tests)
 85 |             {
 86 |               "name": "Python: Debug Test",
 87 |               "type": "debugpy",
 88 |               "request": "launch",
 89 |               "purpose": ["debug-test"],
 90 |               "console": "internalConsole",
 91 |               "justMyCode": false
 92 |             }
 93 |           ]
 94 |         }
 95 |       },
 96 |       "extensions": [
 97 |         // Development support
 98 |         "ceintl.vscode-language-pack-ja",
 99 |         "github.copilot",
100 |         "usernamehw.errorlens",
101 |         "gruntfuggly.todo-tree",
102 |         "ibm.output-colorizer",
103 |         "editorconfig.editorconfig",
104 |         "mosapride.zenkaku",
105 |         "ionutvmi.path-autocomplete",
106 |         // Python-related
107 |         "ms-python.python",
108 |         "charliermarsh.ruff",
109 |         "ms-python.mypy-type-checker",
110 |         "njpwerner.autodocstring",
111 |         "ms-toolsai.jupyter",
112 |         "ms-toolsai.jupyter-keymap",
113 |         // Markdown-related
114 |         "yzane.markdown-pdf",
115 |         // CSV-related
116 |         "janisdd.vscode-edit-csv",
117 |         "mechatroner.rainbow-csv",
118 |         // Git-related
119 |         "mhutchie.git-graph"
120 |       ]
121 |     }
122 |   }
123 | }
124 | 


--------------------------------------------------------------------------------
/docs/setup.md:
--------------------------------------------------------------------------------
  1 | # Environment Setup Required for This Book
  2 | 
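  3 | This book lets you learn hands-on by actually running code. The steps below set up the environment required for the hands-on exercises. Once this preparation is done, you can work through the whole flow in practice, from data inspection in Chapter 4 through experiment management in Chapter 5 to prototype development in Chapter 6.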
  4 | 
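  5 | The setup mainly uses the following three tools.
  6 | 
  7 | - **Visual Studio Code (VS Code)**: A lightweight, highly extensible integrated development environment (IDE) that supports many programming languages and tools.
  8 | - **Docker**: Packages an application's runtime environment as a container, making it easy to reproduce the same environment as other developers.
  9 | - **Git**: A distributed version control system for managing source code and configuration files.
 10 | 
 11 | Let's get started with the setup.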
 12 | 
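 13 | ## Setup Steps
 14 | 
 15 | Set up the environment with the following steps.
 16 | 
 17 | 1. Install the tools
 18 | 2. Clone the repository
 19 | 3. Download the required files
 20 | 4. Prepare the environment variable file
 21 | 5. Start the development environment with Dev Container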
 22 | 
 23 | 
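 24 | ### 1. Installing the Tools
 25 | 
 26 | ##### Installing VS Code
 27 | 
 28 | This book uses Visual Studio Code (hereafter VS Code) as its IDE (integrated development environment). VS Code is an open-source IDE developed by Microsoft that starts faster and runs lighter than many other IDEs. It also has a rich set of extensions, so you can develop efficiently.
 29 | 
 30 | Download the installer for your OS from the official VS Code site[^vscode-home] and install it on your machine (Figure 1).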
 31 | 
 32 | [^vscode-home]: https://code.visualstudio.com/
 33 | 
 34 | <p align="center">
 35 |   <img src="images/vscode_download_for_mac.png" alt="vscode" width="500"/>
 36 | </p>
 37 | <div align="center">
 38 |   ▲Figure 1: VS Code home page (macOS)
 39 | </div>
 40 | <br>
 41 | 
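 42 | ##### Installing Docker Desktop
 43 | 
 44 | Next, prepare to use Docker. Docker is a tool that bundles the environment an application needs into a "container" so that it can be managed and run as a unit.
 45 | 
 46 | This book installs Docker Desktop to set up the Docker runtime environment[^docker-caution]. Docker Desktop is a single package that bundles a GUI, Docker Engine, the Docker CLI, Docker Compose, and other tools, so installing it alone gives you everything this book needs.
 47 | 
 48 | [^docker-caution]: Docker Desktop is free for personal use and for small businesses.
 49 | However, commercial use at companies with 250 or more employees or more than $10 million in annual revenue requires a paid subscription.
 50 | 
 51 | Download the installer for your OS from the official Docker site[^docker-home] and install it on your machine (Figure 2).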
 52 | 
 53 | [^docker-home]: https://www.docker.com/ja-jp/get-started/
 54 | 
 55 | <p align="center">
 56 |   <img src="images/docker_desktop.png" alt="docker desktop" width="500"/>
 57 | </p>
 58 | <div align="center">
 59 |   ▲Figure 2: Docker home page
 60 | </div>
 61 | <br>
 62 | 
 63 | Once Docker Desktop is installed, launch it. Several prompts appear on first launch; setup is complete when the screen in Figure 3 is displayed.
 64 | 
 65 | <p align="center">
 66 |   <img src="images/docker_desktop_complete.png" alt="docker desktop" width="500"/>
 67 | </p>
 68 | <div align="center">
 69 |   ▲Figure 3: Docker Desktop startup screen
 70 | </div>
 71 | <br>
 72 | 
 73 | ##### Installing Git
 74 | 
 75 | Next, install Git. Git is a tool for version-controlling code.
 76 | 
 77 | - On Windows
 78 | 
 79 | Download the installer from the Git download site for Windows[^git-download-win] and install it on your machine (Figure 4).
 80 | 
 81 | [^git-download-win]: https://git-scm.com/downloads/win
 82 | 
 83 | <p align="center">
 84 |   <img src="images/git_download_for_windows.png" alt="git" width="450"/>
 85 | </p>
 86 | <div align="center">
 87 |   ▲Figure 4: Git download site for Windows
 88 | </div>
 89 | <br>
 90 | 
 91 | If the `winget` command is available in Command Prompt or PowerShell, you can also install Git by running the following command.
 92 | 
 93 | ```bash
 94 | winget install --id Git.Git -e --source winget
 95 | ```
 96 | 
 97 | - On macOS
 98 | 
 99 | Install Git through Homebrew[^homebrew-home], a package manager for macOS. If Homebrew is not installed on your machine, run the following commands in a terminal to install it.
100 | 
101 | ```bash
102 | # Install Homebrew
103 | /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
104 | 
105 | # Add Homebrew to PATH (/opt/homebrew is the Apple Silicon default; Intel Macs use /usr/local)
106 | echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
107 | eval "$(/opt/homebrew/bin/brew shellenv)"
108 | ```
109 | [^homebrew-home]: https://brew.sh/ja/
110 | 
111 | After confirming that Homebrew is installed, run the following command to install Git.
112 | ```bash
113 | brew install git
114 | ```
115 | 
116 | For other installation methods, see the Git download site for macOS[^git-download-mac].
117 | 
118 | [^git-download-mac]: https://git-scm.com/downloads/mac
119 | 
120 | ### 2. Cloning the Repository
121 | 
122 | Clone the repository provided with this book to your local environment. Move to a directory of your choice and run the following command.
123 | 
124 | ```bash
125 | git clone https://github.com/eycjur/ds_instructions_guide.git
126 | ```
127 | 
128 | Once the clone completes, open the repository in VS Code.
129 | 
130 | ### 3. Downloading the Required Files
131 | 
132 | Download the files used in this book. From the link below, get three files: the trained model `model.pickle` and the sample data files `taxi_dataset.csv` and `taxi_dataset_for_upload.csv` (Figure 5).
133 | - https://github.com/eycjur/ds_instructions_guide/releases
134 | 
135 | <p align="center">
136 |   <img src="images/data_download.png" alt="data download" width="500"/>
137 | </p>
138 | <div align="center">
139 |   ▲Figure 5: Downloading the required files
140 | </div>
141 | <br>
142 | 
143 | Save `model.pickle` in the `model` directory of the repository, and `taxi_dataset.csv` in the `data` directory. `taxi_dataset_for_upload.csv` is uploaded to the Streamlit application in Chapter 6.
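
As a quick check that the files landed in the right place, the placement described above can be verified with a short script. This is a minimal sketch and not part of the repository: the helper name `find_missing_files` is hypothetical, and it assumes the repository root is passed in (or is the current directory).

```python
from __future__ import annotations

from pathlib import Path

# Expected locations, taken from the placement described above.
EXPECTED_FILES = ("model/model.pickle", "data/taxi_dataset.csv")


def find_missing_files(repo_root: str | Path = ".") -> list[str]:
    """Return the expected files missing under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `find_missing_files()` from the repository root returns an empty list once both files are in place. `taxi_dataset_for_upload.csv` is left out on purpose, since it is uploaded through the app rather than read from a fixed path.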
144 | 
145 | ### 4. Preparing the Environment Variable File
146 | 
147 | Copy the `.env.example` file in the repository root and create a file named `.env` at the same level.
148 | The contents can stay identical to `.env.example`.
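
The copy can be done by hand in a file manager or terminal. For readers who prefer to script it, the following is a minimal sketch; the helper name `create_env_file` is hypothetical and not part of the repository. It leaves an existing `.env` untouched so that local edits are not overwritten.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def create_env_file(repo_root: str | Path = ".") -> bool:
    """Copy .env.example to .env unless .env already exists (hypothetical helper).

    Returns True if a new .env was created, False if an existing one was kept.
    """
    root = Path(repo_root)
    env_path = root / ".env"
    if env_path.exists():
        # Keep any existing settings instead of overwriting them.
        return False
    shutil.copyfile(root / ".env.example", env_path)
    return True
```

Calling `create_env_file()` from the repository root creates `.env` on the first run and does nothing on later runs.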
149 | 
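150 | ### 5. Starting the Development Environment with Dev Container
151 | First, install the "Dev Containers" extension, which is required to build the development environment.
152 | Click the "Extensions" icon in the sidebar on the left side of VS Code to open the list of extensions. Type "Dev Containers" into the search bar that appears, then install the "Dev Containers" extension shown in the results (Figure 6).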
153 | 
154 | <div style="display: flex; justify-content: center; gap: 20px;" align="center">
155 |   <img src="images/vscode_拡張機能.png" alt="vscode" width="250"/>
156 |   <img src="images/vscode_拡張機能_検索.png" alt="vscode" width="250"/>
157 | </div>
158 | <div align="center">
159 |   ▲Figure 6: Installing the "Dev Containers" extension
160 | </div>
161 | <br>
162 | 
163 | When you open the repository, a popup like the one in Figure 7 appears at the bottom right of VS Code. Click "Reopen in Container" to start the Dev Container; the development environment is then built automatically.
164 | 
165 | <p align="center">
166 |   <img src="images/devcontainer_起動ポップアップ.png" alt="devcontainer_起動ポップアップ" width="500"/>
167 | </p>
168 | <div align="center">
169 |   ▲Figure 7: Popup shown when starting the Dev Container
170 | </div>
171 | <br>
172 | 
173 | If the popup does not appear, open the VS Code Command Palette (Windows: `Ctrl + Shift + P`, macOS: `Command + Shift + P`) and select "Dev Containers: Reopen in Container" to start it the same way.
174 | 
175 | When "Dev Container: ..." is shown at the bottom left of VS Code and the files appear in the Explorer in the sidebar, the development environment for this book is ready (Figure 8).
176 | 
177 | <p align="center">
178 |   <img src="images/devcontainer_complete.png" alt="devcontainer complete" width="500"/>
179 | </p>
180 | <div align="center">
181 |   ▲Figure 8: Screen when the setup is complete
182 | </div>
183 | <br>
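
Once the container is up, one way to sanity-check it is to confirm that the container's Python interpreter can find the libraries the repository's code imports. The following is a minimal sketch with a hypothetical helper name. The module list is an assumption drawn from the import statements in `src/taxi_prediction` and `app/app.py`, not from `pyproject.toml`, so it may be incomplete.

```python
from __future__ import annotations

from importlib.util import find_spec

# Top-level modules imported by the repository's code (assumed list, see above).
REQUIRED_MODULES = ("lightgbm", "pandas", "pandera", "plotly", "sklearn", "streamlit")


def find_missing_modules(modules: tuple[str, ...] = REQUIRED_MODULES) -> list[str]:
    """Return the modules the current interpreter cannot locate (hypothetical helper)."""
    # find_spec locates a top-level module without importing it.
    return [name for name in modules if find_spec(name) is None]
```

Running `find_missing_modules()` in a terminal inside the Dev Container should return an empty list if the image was built correctly. A non-empty list suggests the container needs to be rebuilt.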
184 | 


--------------------------------------------------------------------------------