├── data
│   └── .gitkeep
├── model
│   └── .gitkeep
├── outputs
│   └── .gitkeep
├── tests
│   ├── __init__.py
│   ├── taxi_prediction
│   │   ├── __init__.py
│   │   └── test_process.py
│   └── test_double.py
├── src
│   └── taxi_prediction
│       ├── __init__.py
│       ├── consts.py
│       ├── py.typed
│       ├── schema.py
│       ├── model.py
│       └── process.py
├── .streamlit
│   └── config.toml
├── app
│   ├── streamlit_sample
│   │   ├── 00_simple.py
│   │   ├── 02_markdown.py
│   │   ├── 01_text_header.py
│   │   ├── 05_user_input.py
│   │   ├── 07_no_cache.py
│   │   ├── 06_file_upload.py
│   │   ├── 08_cache.py
│   │   ├── 03_dataframe.py
│   │   └── 04_plot.py
│   └── app.py
├── docs
│   ├── images
│   │   ├── vscode_拡張機能.png
│   │   ├── data_download.png
│   │   ├── docker_desktop.png
│   │   ├── vscode_拡張機能_検索.png
│   │   ├── devcontainer_complete.png
│   │   ├── devcontainer_起動ポップアップ.png
│   │   ├── docker_desktop_complete.png
│   │   ├── vscode_download_for_mac.png
│   │   └── git_download_for_windows.png
│   └── setup.md
├── .github
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── ISSUE_TEMPLATE
│       └── bug_report.yml
├── .env.example
├── compose.yml
├── scripts
│   ├── python_interactive_window.py
│   ├── conf
│   │   └── config.yaml
│   ├── train.py
│   └── train_with_mlflow.py
├── .gitignore
├── README.md
├── LICENSE
├── Dockerfile
├── pyproject.toml
└── .devcontainer
    └── devcontainer.json
/data/.gitkeep:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/model/.gitkeep:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/outputs/.gitkeep:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/src/taxi_prediction/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/tests/taxi_prediction/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/.streamlit/config.toml:
--------------------------------------------------------------------------------
[browser]
gatherUsageStats = false
--------------------------------------------------------------------------------
/src/taxi_prediction/consts.py:
--------------------------------------------------------------------------------
MAX_PREDICT_DAYS = 7  # Maximum number of days ahead to predict
--------------------------------------------------------------------------------
/src/taxi_prediction/py.typed:
--------------------------------------------------------------------------------
# Marker file for PEP 561. The taxi_prediction package uses inline types.
--------------------------------------------------------------------------------
/app/streamlit_sample/00_simple.py:
--------------------------------------------------------------------------------
import streamlit as st

st.write("Hello, Streamlit!")
--------------------------------------------------------------------------------
/docs/images/vscode_拡張機能.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/vscode_拡張機能.png
--------------------------------------------------------------------------------
/docs/images/data_download.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/data_download.png
--------------------------------------------------------------------------------
/docs/images/docker_desktop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/docker_desktop.png
--------------------------------------------------------------------------------
/docs/images/vscode_拡張機能_検索.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/vscode_拡張機能_検索.png
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
## Review deadline

## What this change aims to achieve and why it was made

## Areas that need particular review

## Related information and issues (if any)
--------------------------------------------------------------------------------
/docs/images/devcontainer_complete.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/devcontainer_complete.png
--------------------------------------------------------------------------------
/docs/images/devcontainer_起動ポップアップ.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/devcontainer_起動ポップアップ.png
--------------------------------------------------------------------------------
/.env.example:
--------------------------------------------------------------------------------
# Application settings
CONTAINER_NAME=ds_instructions_guide
# Multiple Docker Compose projects can conflict on the same port, so changing the port per project is recommended
PORT=8501
--------------------------------------------------------------------------------
/docs/images/docker_desktop_complete.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/docker_desktop_complete.png
--------------------------------------------------------------------------------
/docs/images/vscode_download_for_mac.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/vscode_download_for_mac.png
--------------------------------------------------------------------------------
/docs/images/git_download_for_windows.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eycjur/ds_instructions_guide/HEAD/docs/images/git_download_for_windows.png
--------------------------------------------------------------------------------
/app/streamlit_sample/02_markdown.py:
--------------------------------------------------------------------------------
import streamlit as st  # noqa

"""
# Taxi Ride Count Forecast
A prototype that forecasts the number of taxi rides.
## Upload a file for prediction
"""
--------------------------------------------------------------------------------
/app/streamlit_sample/01_text_header.py:
--------------------------------------------------------------------------------
import streamlit as st

st.title("Taxi Ride Count Forecast")
st.write("A prototype that forecasts the number of taxi rides.")
st.header("Upload a file for prediction")
--------------------------------------------------------------------------------
/app/streamlit_sample/05_user_input.py:
--------------------------------------------------------------------------------
import streamlit as st

list_selected_area: list[str] = st.multiselect("Select areas", ["1", "3"])
display_period = st.slider(
    "Display period (days)", min_value=1, max_value=100, value=30
)

st.write(
    f"Selected areas: {list_selected_area}, display period: {display_period} days"
)
--------------------------------------------------------------------------------
/compose.yml:
--------------------------------------------------------------------------------
services:
  app:
    build:
      context: ./
      dockerfile: Dockerfile
    container_name: ${CONTAINER_NAME}
    volumes:
      - ./:/app:cached
    working_dir: /app
    env_file:
      - .env
    ports:
      - 127.0.0.1:${PORT}:8501
    tty: true
--------------------------------------------------------------------------------
/app/streamlit_sample/07_no_cache.py:
--------------------------------------------------------------------------------
import time

import streamlit as st


def heavy_process() -> str:
    time.sleep(3)
    return "heavy process"


display_period = st.slider(
    "Display period (days)", min_value=1, max_value=100, value=30
)
heavy_process()
st.write(f"Display period: {display_period} days")
--------------------------------------------------------------------------------
/scripts/python_interactive_window.py:
--------------------------------------------------------------------------------
# %%[markdown]
# You can also use Markdown cells
# # Header 1
# - list1
# - list2

# %%
import matplotlib.pyplot as plt
import numpy as np

# %%
print("Hello, world!")

# %%
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)

# %%
--------------------------------------------------------------------------------
/tests/test_double.py:
--------------------------------------------------------------------------------
def double(x: int) -> int:
    """Double the given integer."""
    return x * 2


def test_double_ok() -> None:
    """Happy-path test confirming that double works correctly."""
    assert 2 == double(1)


# Confirm that a failing test is detected properly
def test_double_ng() -> None:
    """A test that fails on purpose."""
    assert 2 == double(2)
--------------------------------------------------------------------------------
/app/streamlit_sample/06_file_upload.py:
--------------------------------------------------------------------------------
import pandas as pd
import streamlit as st

uploaded_file = st.file_uploader("Please upload a file for prediction", type="csv")
if uploaded_file is not None:
    df_upload = pd.read_csv(uploaded_file, parse_dates=["date"])
    df_upload["area"] = df_upload["area"].astype("category")
    st.dataframe(df_upload)
--------------------------------------------------------------------------------
/app/streamlit_sample/08_cache.py:
--------------------------------------------------------------------------------
import time

import streamlit as st


@st.cache_data  # Add caching
def heavy_process() -> str:
    time.sleep(3)
    return "heavy process"


display_period = st.slider(
    "Display period (days)", min_value=1, max_value=100, value=30
)
heavy_process()
st.write(f"Display period: {display_period} days")
--------------------------------------------------------------------------------
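The difference between the uncached and cached samples above can be illustrated without Streamlit. The following is a minimal sketch that uses `functools.lru_cache` from the standard library as a stand-in for `st.cache_data`. It is an analogy only, not how Streamlit implements its cache, and the `calls` list and the `key` parameter are hypothetical helpers added for illustration.

```python
import functools

calls: list[str] = []  # hypothetical log, only here to make cache hits visible


@functools.lru_cache(maxsize=None)
def heavy_process(key: str) -> str:
    """Stand-in for an expensive computation; the body runs once per distinct key."""
    calls.append(key)
    return f"heavy process: {key}"


# A Streamlit script reruns from top to bottom on every widget interaction.
# Simulate three reruns that all request the same result.
results = [heavy_process("same-input") for _ in range(3)]

print(results)
print(f"function body executed {len(calls)} time(s)")
```

The function body runs only on the first call; later calls with the same argument return the stored result, which is why the cached sample no longer waits three seconds each time the slider moves.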
/scripts/conf/config.yaml:
--------------------------------------------------------------------------------
data_path: "/app/data/taxi_dataset.csv"
train_ratio: 0.7
model:
  objective: "regression"
  metric: "rmse"
  boosting_type: "gbdt"
  num_leaves: 30
  learning_rate: 0.05
  feature_fraction: 0.9
  bagging_fraction: 0.8
  bagging_freq: 5
train:
  num_boost_round: 1000
  early_stopping_rounds: 10
--------------------------------------------------------------------------------
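The `train` section of the config above pairs `num_boost_round` with `early_stopping_rounds`. The sketch below shows the general idea behind early stopping in plain Python: training halts once the validation score has not improved for a given number of consecutive rounds. It is a simplified illustration, not LightGBM's implementation; the function name and the score values are hypothetical.

```python
def best_round_with_early_stopping(
    valid_scores: list[float], early_stopping_rounds: int
) -> int:
    """Return the 1-based round with the lowest validation score seen before stopping.

    Iteration stops once the score has not improved for
    `early_stopping_rounds` consecutive rounds.
    """
    best_score = float("inf")
    best_round = 0
    for round_number, score in enumerate(valid_scores, start=1):
        if score < best_score:
            best_score = score
            best_round = round_number
        elif round_number - best_round >= early_stopping_rounds:
            break  # no improvement for `early_stopping_rounds` rounds
    return best_round


# Hypothetical validation RMSE per boosting round (made-up numbers).
scores = [5.0, 4.0, 3.5, 3.6, 3.7, 3.8, 3.4]
stopped_at = best_round_with_early_stopping(scores, early_stopping_rounds=3)
patient = best_round_with_early_stopping(scores, early_stopping_rounds=10)
print(stopped_at, patient)
```

With a patience of 3 the loop stops before reaching the later improvement in round 7, while a patience of 10 sees it. A small patience saves training time at the risk of stopping too early.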
/.gitignore:
--------------------------------------------------------------------------------
# Python bytecode
__pycache__/
# pytest and mypy caches
.pytest_cache/
.mypy_cache/
# Jupyter Notebook checkpoints
.ipynb_checkpoints
# Environment variable file
.env
# VSCode settings
.vscode/
# macOS system files (.DS_Store)
.DS_Store
# Everything in the data directory except .gitkeep
data/*
!data/.gitkeep
# Everything in the model directory except .gitkeep
model/*
!model/.gitkeep
# Everything in the outputs directory except .gitkeep
outputs/*
!outputs/.gitkeep
# MLflow and Hydra multirun output directories
mlruns/
multirun/
--------------------------------------------------------------------------------
/app/streamlit_sample/03_dataframe.py:
--------------------------------------------------------------------------------
import datetime

import pandas as pd
import streamlit as st

df = pd.DataFrame(
    {
        "area": [1, 1, 1, 3, 3, 3],
        "date": [
            datetime.date(2019, 12, 1),
            datetime.date(2019, 12, 2),
            datetime.date(2019, 12, 3),
            datetime.date(2019, 12, 1),
            datetime.date(2019, 12, 2),
            datetime.date(2019, 12, 3),
        ],
        "num_trip": [100, 200, 300, 200, 400, 600],
        "label": ["Actual", "Actual", "Forecast", "Actual", "Actual", "Forecast"],
    }
)

st.dataframe(df)
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.yml:
--------------------------------------------------------------------------------
name: Bug report
description: Select this if the code does not work as described in the book or contains a bug
labels: ["bug"]
body:
  - type: textarea
    attributes:
      label: Runtime error
      description: Please describe the runtime error in detail.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Error details
      description: Please paste the relevant part of the error message or logs.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Execution environment
      description: |
        Example:
        - OS: macOS 14 / Ubuntu 22.04
        - Python: 3.10.12
        - Library versions, etc.
      placeholder: |
        OS:
        Python:
        Library versions:
    validations:
      required: true
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Sample repository for 『先輩データサイエンティストからの指南書』

This repository provides the sample code for the book 『先輩データサイエンティストからの指南書 実務で生き抜くためのエンジニアリングスキル』, published by Gijutsu-Hyoron (技術評論社).

## Bibliographic information
『先輩データサイエンティストからの指南書 実務で生き抜くためのエンジニアリングスキル』 by Junki Asano, Toma Tanaka, Katsuhiro Muto, Masaya Kimura, and Mizuho Yanagi; published in 2025 by Gijutsu-Hyoron; ISBN: 978-4-297-15100-3 (print), 978-4-297-15101-0 (e-book)

- Gijutsu-Hyoron: https://gihyo.jp/book/2025/978-4-297-15100-3
- Amazon: https://www.amazon.co.jp/dp/4297151006/
- Support page: https://gihyo.jp/book/2025/978-4-297-15100-3/support
  - Lists the references and errata.

For inquiries about the book, please use the Gijutsu-Hyoron page below.

https://gihyo.jp/site/inquiry/book?ISBN=978-4-297-15100-3

## Setup

See the following document for the setup procedure.

- [docs/setup.md](docs/setup.md)
--------------------------------------------------------------------------------
/scripts/train.py:
--------------------------------------------------------------------------------
from logging import getLogger

import hydra
from omegaconf import DictConfig

from taxi_prediction.model import LGBModel
from taxi_prediction.process import load_dataset, preprocess_for_train, split_dataset

logger = getLogger(__name__)


@hydra.main(config_path="conf", config_name="config")
def main(config: DictConfig) -> None:
    # Load the data
    dataset = load_dataset(config.data_path)

    # Preprocessing
    dataset_train, dataset_valid = split_dataset(dataset, config.train_ratio)
    df_train = preprocess_for_train(dataset_train)
    df_valid = preprocess_for_train(dataset_valid)
    logger.info(f"train_size: {len(df_train)}")
    logger.info(f"valid_size: {len(df_valid)}")

    # Training
    model = LGBModel(dict(config.model))
    model.fit(df_train, df_valid, **config.train)

    # Prediction and evaluation
    scores = model.evaluate(df_valid)
    logger.info(f"scores: {scores}")

    # Save the model
    model.save("model.pickle")


if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------
/app/streamlit_sample/04_plot.py:
--------------------------------------------------------------------------------
import datetime

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import streamlit as st


def _plot_prediction(df: pd.DataFrame) -> go.Figure:
    fig = px.line(
        df, x="date", y="num_trip", color="area", markers=True, line_dash="label"
    )
    fig.update_layout(
        title="Ride counts over time",
        xaxis_title="Date",
        yaxis_title="Number of rides",
        legend_title="Area, label",
    )
    return fig


df = pd.DataFrame(
    {
        "area": [1, 1, 1, 3, 3, 3],
        "date": [
            datetime.date(2019, 12, 1),
            datetime.date(2019, 12, 2),
            datetime.date(2019, 12, 3),
            datetime.date(2019, 12, 1),
            datetime.date(2019, 12, 2),
            datetime.date(2019, 12, 3),
        ],
        "num_trip": [100, 200, 300, 200, 400, 600],
        "label": ["Actual", "Actual", "Forecast", "Actual", "Actual", "Forecast"],
    }
)

fig = _plot_prediction(df)
st.plotly_chart(fig)
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 Junki Asano, Masaya Kimura, Toma Tanaka, Katsuhiro Muto, Mizuho Yanagi.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/src/taxi_prediction/schema.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd
import pandera as pa
from pandera.typing import Index, Series

from .consts import MAX_PREDICT_DAYS


class TaxiDatasetSchema(pa.DataFrameModel):
    date: Series[np.datetime64]
    area: Series[pd.CategoricalDtype]
    num_trip: Series[int] = pa.Field(ge=0)


class InferInputSchema(pa.DataFrameModel):
    date: Index[np.datetime64]
    target_date: Index[np.datetime64]
    area: Series[pd.CategoricalDtype]
    num_trip: Series[int] = pa.Field(ge=0)
    weekday: Series[int] = pa.Field(in_range={"min_value": 0, "max_value": 6})
    target_lead: Series[int] = pa.Field(
        in_range={"min_value": 1, "max_value": MAX_PREDICT_DAYS}
    )

    @pa.dataframe_check
    def target_date_is_consistent(cls, df: pd.DataFrame) -> Series[bool]:
        """Check that the relation date + target_lead = target_date holds."""
        return df.index.get_level_values("date") + pd.to_timedelta(  # type: ignore
            df["target_lead"], unit="D"
        ) == df.index.get_level_values("target_date")

    class Config:
        strict = True
        coerce = True


class TrainInputSchema(InferInputSchema):
    target: Series[int] = pa.Field(ge=0)


class InferOutputSchema(pa.DataFrameModel):
    date: Index[np.datetime64]
    target_date: Index[np.datetime64]
    area: Series[pd.CategoricalDtype]
    pred: Series[float]
--------------------------------------------------------------------------------
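The `target_date_is_consistent` check in the schema above enforces the relation `date + target_lead = target_date` across the whole DataFrame. As a minimal sketch of the same rule for a single row, using only the standard library instead of pandas and pandera (the standalone function below is a hypothetical helper, not part of the package):

```python
import datetime


def target_date_is_consistent(
    date: datetime.date, target_lead: int, target_date: datetime.date
) -> bool:
    """Check that date + target_lead (in days) equals target_date."""
    return date + datetime.timedelta(days=target_lead) == target_date


# A consistent row: 2024-01-01 plus 3 days is 2024-01-04.
ok = target_date_is_consistent(
    datetime.date(2024, 1, 1), 3, datetime.date(2024, 1, 4)
)
# An inconsistent row: the target date is one day too late.
ng = target_date_is_consistent(
    datetime.date(2024, 1, 1), 3, datetime.date(2024, 1, 5)
)
print(ok, ng)
```

Encoding this relation in the schema means a preprocessing bug that shifts either index level fails validation at the function boundary rather than silently producing misaligned forecasts.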
/scripts/train_with_mlflow.py:
--------------------------------------------------------------------------------
from logging import getLogger

import hydra
import mlflow
from omegaconf import DictConfig

from taxi_prediction.model import LGBModel
from taxi_prediction.process import load_dataset, preprocess_for_train, split_dataset

logger = getLogger(__name__)


@hydra.main(config_path="conf", config_name="config")
def main(config: DictConfig) -> None:
    mlflow.set_tracking_uri("file:///app/mlruns")
    mlflow.set_experiment("taxi_prediction")

    with mlflow.start_run():
        mlflow.log_param("train_ratio", config.train_ratio)
        mlflow.log_params(config.model)
        mlflow.log_params(config.train)
        mlflow.set_tag("model_type", "lightgbm")

        # Load the data
        dataset = load_dataset(config.data_path)

        # Preprocessing
        dataset_train, dataset_valid = split_dataset(dataset, config.train_ratio)
        df_train = preprocess_for_train(dataset_train)
        df_valid = preprocess_for_train(dataset_valid)
        logger.info(f"train_size: {len(df_train)}")
        logger.info(f"valid_size: {len(df_valid)}")
        mlflow.log_table(df_train, "df_train.json")
        mlflow.log_table(df_valid, "df_valid.json")

        # Training
        model = LGBModel(dict(config.model))
        model.fit(df_train, df_valid, **config.train)

        # Prediction and evaluation
        scores = model.evaluate(df_valid)
        logger.info(f"scores: {scores}")
        mlflow.log_metrics(scores)

        # Save the model
        model.save("model.pickle")
        mlflow.log_artifact("model.pickle")


if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
# Based on the following documents
# https://docs.astral.sh/uv/guides/integration/docker/#using-uv-in-docker
# https://github.com/astral-sh/uv-docker-example/blob/main/Dockerfile

FROM python:3.12-slim-bookworm

# Install packages
RUN apt-get update && \
    apt-get install --no-install-recommends -y \
    ca-certificates curl fonts-ipafont-gothic gcc git locales sudo tmux tzdata vim zsh && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Locale settings
RUN echo "ja_JP UTF-8" > /etc/locale.gen && \
    locale-gen ja_JP.UTF-8
ENV LANG=ja_JP.UTF-8
ENV LC_ALL=ja_JP.UTF-8
ENV TZ=Asia/Tokyo

# Install uv
ADD https://astral.sh/uv/install.sh /uv-installer.sh
RUN sh /uv-installer.sh && rm /uv-installer.sh
ENV PATH="/root/.local/bin/:$PATH"

# Use the system Python
# cf. https://docs.astral.sh/uv/concepts/projects/config/#project-environment-path
ENV UV_PROJECT_ENVIRONMENT="/usr/local/"
ENV UV_LINK_MODE=copy

WORKDIR /app

# Install dependencies
# Note: Project dependencies change less often than the source code, so they are installed in a separate layer to make efficient use of the cache
# cf. https://docs.astral.sh/uv/guides/integration/docker/#intermediate-layers
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project

# Install the project source code
COPY ./src/ /app/src/
COPY ./README.md /app/
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen

COPY ./app/. /app/app/
COPY ./model/. /app/model/

CMD ["streamlit", "run", "app/app.py", "--server.port", "8501"]
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
[project]
name = "taxi_prediction"
description = "Sample repository for 『先輩データサイエンティストからの指南書』 (Gijutsu-Hyoron)"
authors = [
    { name = "Junki Asano" },
    { name = "Toma Tanaka" },
    { name = "Katsuhiro Muto" },
    { name = "Masaya Kimura" },
    { name = "Mizuho Yanagi" },
]
version = "1.0.0"
readme = "README.md"
license = "MIT"
requires-python = ">=3.12,<3.13"
dependencies = [
    "chardet>=5.2.0",
    "google-cloud-bigquery>=3.32.0",
    "lightgbm>=4.6.0",
    "numpy>=1.26.4",
    "pandas>=2.2.3",
    "plotly>=6.0.1",
    "scikit-learn>=1.5.2",
    "statsmodels>=0.14.4",
    "streamlit>=1.45.1",
    "tqdm>=4.67.1",
]

[dependency-groups]
dev = [
    "hydra-core>=1.3.2",
    "japanize-matplotlib>=1.1.3",
    "jupyter>=1.1.1",
    "matplotlib>=3.10.3",
    "mlflow>=2.22.0",
    "mypy>=1.15.0",
    "pandera>=0.23.1",
    "pytest>=8.3.5",
    "ruff>=0.11.9",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.ruff]
target-version = "py312"

[tool.ruff.lint]
# Rules to enable
# cf. https://beta.ruff.rs/docs/rules/
select = [
    "B",  # flake8-bugbear
    "E",  # pycodestyle error
    "F",  # Pyflakes
    "I",  # isort
    "W",  # pycodestyle warning
]
# Rules that may be fixed automatically
fixable = ["E", "F", "I"]
# Rules that are never fixed automatically
unfixable = ["W"]

[tool.mypy]
python_version = "3.12"
# Checks
warn_unreachable = true  # Warn about unreachable code
disallow_untyped_defs = true  # Disallow functions without type annotations
warn_return_any = true  # Warn when a function returns a value of type Any
ignore_missing_imports = true  # Ignore missing imports from third-party libraries
# Output formatting
pretty = true
show_error_context = true

[tool.pytest.ini_options]
minversion = "6.0"
testpaths = ["tests/*"]
# --import-mode=importlib: importlib import mode
# cf. https://docs.pytest.org/en/latest/explanation/goodpractices.html#choosing-a-test-layout
addopts = [
    "-svv",
    "--tb=short",
    "--capture=no",
    "--full-trace",
    "--import-mode=importlib",
]
--------------------------------------------------------------------------------
/tests/taxi_prediction/test_process.py:
--------------------------------------------------------------------------------
import io

import pandas as pd
import pytest
from pandas.testing import assert_frame_equal
from pandera.typing import DataFrame

from taxi_prediction.process import preprocess_for_infer, preprocess_for_train
from taxi_prediction.schema import TaxiDatasetSchema


@pytest.fixture
def df_sample() -> DataFrame[TaxiDatasetSchema]:
    csv_data = """
date,area,num_trip
2024-01-01,1,5
2024-01-02,1,8
2024-01-01,2,20
2024-01-02,2,18
"""
    df = pd.read_csv(io.StringIO(csv_data), parse_dates=["date"])
    df["area"] = df["area"].astype("category")
    return df  # type: ignore


def test_preprocess_for_infer(df_sample: DataFrame[TaxiDatasetSchema]) -> None:
    actual = preprocess_for_infer(df_sample, max_predict_days=3)

    expected_csv = """
date,target_date,area,num_trip,weekday,target_lead
2024-01-01,2024-01-02,1,5,0,1
2024-01-01,2024-01-03,1,5,0,2
2024-01-01,2024-01-04,1,5,0,3
2024-01-02,2024-01-03,1,8,1,1
2024-01-02,2024-01-04,1,8,1,2
2024-01-02,2024-01-05,1,8,1,3
2024-01-01,2024-01-02,2,20,0,1
2024-01-01,2024-01-03,2,20,0,2
2024-01-01,2024-01-04,2,20,0,3
2024-01-02,2024-01-03,2,18,1,1
2024-01-02,2024-01-04,2,18,1,2
2024-01-02,2024-01-05,2,18,1,3
"""
    expected = pd.read_csv(
        io.StringIO(expected_csv), parse_dates=["date", "target_date"]
    ).set_index(["date", "target_date"])
    expected["area"] = expected["area"].astype("category")

    assert_frame_equal(actual, expected)


def test_preprocess_for_train(df_sample: DataFrame[TaxiDatasetSchema]) -> None:
    actual = preprocess_for_train(df_sample, max_predict_days=3)

    expected_csv = """
date,target_date,area,num_trip,weekday,target_lead,target
2024-01-01,2024-01-02,1,5,0,1,8
2024-01-01,2024-01-02,2,20,0,1,18
"""
    expected = pd.read_csv(
        io.StringIO(expected_csv), parse_dates=["date", "target_date"]
    ).set_index(["date", "target_date"])
    expected["area"] = expected["area"].astype("category")

    assert_frame_equal(actual, expected, check_dtype=False)
--------------------------------------------------------------------------------
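The expected CSV in `test_preprocess_for_infer` above implies that each input row is expanded into one row per forecast horizon. Since `process.py` is not shown here, the following is only a rough sketch of that expansion in plain Python, inferred from the test data; `expand_to_lead_rows` is a hypothetical name and the real `preprocess_for_infer` may work differently.

```python
import datetime


def expand_to_lead_rows(
    date: datetime.date, area: int, num_trip: int, max_predict_days: int
) -> list[dict[str, object]]:
    """Expand one observation into one row per lead time (1..max_predict_days)."""
    rows: list[dict[str, object]] = []
    for target_lead in range(1, max_predict_days + 1):
        rows.append(
            {
                "date": date,
                "target_date": date + datetime.timedelta(days=target_lead),
                "area": area,
                "num_trip": num_trip,
                "weekday": date.weekday(),  # Monday is 0, Sunday is 6
                "target_lead": target_lead,
            }
        )
    return rows


# First row of the test fixture: 2024-01-01 (a Monday), area 1, 5 trips.
rows = expand_to_lead_rows(
    datetime.date(2024, 1, 1), area=1, num_trip=5, max_predict_days=3
)
for row in rows:
    print(row)
```

The three rows produced for this input correspond to the first three lines of the expected CSV in the test: target dates 2024-01-02 through 2024-01-04 with `target_lead` 1 through 3 and `weekday` 0.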
/src/taxi_prediction/model.py:
--------------------------------------------------------------------------------
import pickle
from pathlib import Path
from typing import Any, Self

import lightgbm as lgb
import pandera as pa
from pandera.typing import DataFrame
from sklearn.metrics import mean_absolute_error

from .schema import InferInputSchema, InferOutputSchema, TrainInputSchema


class LGBModel:
    """
    Wrapper class for LightGBM
    """

    def __init__(self, params: dict[str, Any]) -> None:
        self._params = params.copy()
        self._model: lgb.Booster | None = None

    @pa.check_types
    def fit(
        self,
        df_train: DataFrame[TrainInputSchema],
        df_valid: DataFrame[TrainInputSchema],
        num_boost_round: int = 1000,
        early_stopping_rounds: int = 10,
    ) -> Self:
        """
        Train the model
        """
        data_train = lgb.Dataset(
            data=df_train.drop(columns=["target"]), label=df_train["target"]
        )
        data_valid = lgb.Dataset(
            data=df_valid.drop(columns=["target"]), label=df_valid["target"]
        )

        self._model = lgb.train(
            params=self._params,
            train_set=data_train,
            valid_sets=[data_train, data_valid],
            valid_names=["train", "valid"],
            num_boost_round=num_boost_round,
            callbacks=[
                lgb.early_stopping(stopping_rounds=early_stopping_rounds),
                lgb.log_evaluation(period=100),
            ],
        )

        return self

    @pa.check_types
    def predict(self, df: DataFrame[InferInputSchema]) -> DataFrame[InferOutputSchema]:
        """
        Run prediction
        """
        if self._model is None:
            raise ValueError("Model has not been trained.")

        pred = self._model.predict(df, num_iteration=self._model.best_iteration)

        columns = list(InferOutputSchema.to_schema().columns.keys())
        return df.assign(pred=pred)[columns]  # type: ignore

    @pa.check_types
    def evaluate(self, df: DataFrame[TrainInputSchema]) -> dict[str, float]:
        """
        Evaluate the model's predictive accuracy
        Return a dictionary of evaluation metrics
        """
        target = df["target"]
        pred = self.predict(df.drop(columns=["target"]))["pred"]

        scores = {}
        scores["mae"] = mean_absolute_error(target, pred)

        return scores

    def save(self, filepath: str | Path) -> None:
        """
        Save the model with pickle
        """
        with open(filepath, "wb") as f:
            pickle.dump(self, f)

    @classmethod
    def load(cls, filepath: str | Path) -> Self:
        """
        Load a model instance from a pickle file
        """
        with open(filepath, "rb") as file:
            model = pickle.load(file)

        if not isinstance(model, cls):
            raise TypeError(
                f"Loaded object type does not match expected type. "
                f"Expected: {cls.__name__}, Actual: {type(model).__name__}"
            )

        return model
--------------------------------------------------------------------------------
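`LGBModel.evaluate` above reports the mean absolute error (MAE) computed by scikit-learn. For reference, the metric itself is just the average of the absolute differences between targets and predictions. Below is a minimal pure-Python sketch; the function name `mae` and the sample numbers are made up for illustration.

```python
def mae(targets: list[float], preds: list[float]) -> float:
    """Mean absolute error: the average of |target - prediction|."""
    if len(targets) != len(preds):
        raise ValueError("targets and preds must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)


# Hypothetical values: errors of 2 and 3 rides average to 2.5.
score = mae([8, 18], [10.0, 15.0])
print(score)
```

Because MAE is expressed in the same unit as the target (rides per day here), it is usually easier to explain to non-technical stakeholders than RMSE, which the config uses as the training metric.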
/app/app.py:
--------------------------------------------------------------------------------
1 | import datetime
2 | from pathlib import Path
3 |
4 | import pandas as pd
5 | import pandera as pa
6 | import plotly.express as px
7 | import plotly.graph_objects as go
8 | import streamlit as st
9 | from pandera.typing import DataFrame
10 |
11 | from taxi_prediction.model import LGBModel
12 | from taxi_prediction.process import postprocess, preprocess_for_infer
13 | from taxi_prediction.schema import TaxiDatasetSchema
14 |
15 |
16 | @st.cache_resource
17 | def load_model(model_path: str | Path) -> LGBModel:
18 | """学習済みモデルを読み込む"""
19 | return LGBModel.load(model_path)
20 |
21 |
22 | @st.cache_data
23 | @pa.check_types
24 | def inference_usecase(
25 | df: DataFrame[TaxiDatasetSchema],
26 | model_path: str | Path,
27 | predict_start_date: datetime.date,
28 | ) -> DataFrame[TaxiDatasetSchema]:
29 | """Inference workflow: runs every step from preprocessing through prediction."""
30 | df_processed = preprocess_for_infer(df)
31 | model = load_model(model_path)
32 | df_pred = model.predict(df_processed)
33 | return postprocess(df_pred, predict_date=predict_start_date)
34 |
35 |
36 | def _filter_by_area(df: pd.DataFrame, list_selected_area: list[str]) -> pd.DataFrame:
37 | """Filter the data by the areas the user selected.
38 |
39 | Note: if list_selected_area is empty, all areas are kept.
40 | """
41 | if len(list_selected_area) == 0:
42 | return df
43 | return df[df["area"].isin(list_selected_area)]
44 |
45 |
46 | def _filter_by_display_period(df: pd.DataFrame, display_period: int) -> pd.DataFrame:
47 | """Extract the most recent display_period days of data, counting back from the latest date."""
48 | return df[df["date"] > df["date"].max() - pd.Timedelta(days=display_period)]
49 |
50 |
51 | def _plot_prediction(df: pd.DataFrame) -> go.Figure:
52 | """Plot the prediction results."""
53 | fig = px.line(
54 | df, x="date", y="num_trip", color="area", markers=True, line_dash="label"
55 | )
56 | fig.update_layout(
57 | title="Number of trips over time",
58 | xaxis_title="Date",
59 | yaxis_title="Number of trips",
60 | legend_title="Area, label",
61 | )
62 | return fig
63 |
64 |
65 | def main() -> None:
66 | model_path = "model/model.pickle"
67 |
68 | st.title("Taxi Trip Count Prediction")
69 | st.write("A prototype that predicts the number of taxi trips.")
70 | st.header("Upload a prediction file")
71 |
72 | uploaded_file = st.file_uploader(
73 | "Upload the file to use for prediction", type="csv"
74 | )
75 |
76 | if uploaded_file is not None:
77 | df_upload = pd.read_csv(uploaded_file, parse_dates=["date"])
78 | df_upload["area"] = df_upload["area"].astype("category")
79 |
80 | st.header("Prediction results")
81 | list_selected_area: list[str] = st.multiselect(
82 | "Select areas", df_upload["area"].unique()
83 | )
84 | display_period = st.slider(
85 | "Display period for actual data (days)", min_value=1, max_value=100, value=30
86 | )
87 |
88 | df_upload = _filter_by_area(df_upload, list_selected_area)
89 | df_upload = _filter_by_display_period(df_upload, display_period)
90 |
91 | df_pred = inference_usecase(
92 | df_upload, model_path=model_path, predict_start_date=df_upload["date"].max()
93 | )
94 |
95 | df_upload["label"] = "Actual"
96 | df_pred["label"] = "Predicted"
97 |
98 | df_concat = pd.concat([df_upload, df_pred], ignore_index=True)
99 |
100 | fig = _plot_prediction(df_concat)
101 | st.plotly_chart(fig)
102 |
103 |
104 | if __name__ == "__main__":
105 | main()
106 |
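The window logic in `_filter_by_display_period` uses a strict `>` comparison, so for daily data it keeps exactly `display_period` days ending at the latest date. The sketch below restates that rule in plain Python, without pandas, to make the boundary behaviour easy to check. The function name and the sample rows are made up for this illustration.

```python
from datetime import date, timedelta


def filter_by_display_period(rows: list[dict], display_period: int) -> list[dict]:
    """Keep rows dated strictly later than (latest date - display_period days)."""
    latest = max(row["date"] for row in rows)
    cutoff = latest - timedelta(days=display_period)
    # Strict comparison: the cutoff day itself is excluded.
    return [row for row in rows if row["date"] > cutoff]


# Ten consecutive days of made-up data for a single area (2024-01-01 to 2024-01-10).
rows = [
    {"date": date(2024, 1, 1) + timedelta(days=i), "num_trip": 100 + i}
    for i in range(10)
]

# With display_period=3 the cutoff is 2024-01-07, so 2024-01-08 to 2024-01-10 remain.
recent = filter_by_display_period(rows, display_period=3)
```

Because the cutoff is derived from the latest date in the data rather than from today's date, the same upload always produces the same window.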
--------------------------------------------------------------------------------
/src/taxi_prediction/process.py:
--------------------------------------------------------------------------------
1 | import datetime
2 | from pathlib import Path
3 |
4 | import pandas as pd
5 | import pandera as pa
6 | from pandera.typing import DataFrame
7 |
8 | from .consts import MAX_PREDICT_DAYS
9 | from .schema import (
10 | InferInputSchema,
11 | InferOutputSchema,
12 | TaxiDatasetSchema,
13 | TrainInputSchema,
14 | )
15 |
16 |
17 | @pa.check_types
18 | def load_dataset(filepath: str | Path) -> DataFrame[TaxiDatasetSchema]:
19 | """
20 | Load the taxi trip count dataset.
21 | """
22 | df = pd.read_csv(filepath, parse_dates=["date"])
23 | df["area"] = df["area"].astype("category")
24 | return df # type: ignore
25 |
26 |
27 | @pa.check_types
28 | def split_dataset(
29 | df: DataFrame[TaxiDatasetSchema], train_ratio: float
30 | ) -> tuple[pd.DataFrame, pd.DataFrame]:
31 | """
32 | Split the dataset into two parts in chronological order.
33 | """
34 | df = df.sort_values("date").reset_index(drop=True)
35 | split_idx = int(len(df) * train_ratio)
36 | return df.iloc[:split_idx], df.iloc[split_idx:]
37 |
38 |
39 | @pa.check_types
40 | def preprocess_for_infer(
41 | df: DataFrame[TaxiDatasetSchema], max_predict_days: int = MAX_PREDICT_DAYS
42 | ) -> DataFrame[InferInputSchema]:
43 | """
44 | Build the features and return the data used for inference.
45 | """
46 | # Add features
47 | df = df.assign(weekday=df["date"].dt.weekday.astype(int))
48 |
49 | df_result = pd.DataFrame()
50 | for lead in range(1, max_predict_days + 1):
51 | # Attach how many days ahead the target day is (target_lead) and the target day itself (target_date)
52 | df_sub = df.assign(
53 | target_lead=lead,
54 | target_date=df["date"] + pd.Timedelta(days=lead),
55 | )
56 | df_result = pd.concat([df_result, df_sub])
57 |
58 | df_result = df_result.sort_values(["area", "date", "target_date"]).set_index(
59 | ["date", "target_date"]
60 | )
61 |
62 | return df_result # type: ignore
63 |
64 |
65 | @pa.check_types
66 | def preprocess_for_train(
67 | df: DataFrame[TaxiDatasetSchema], max_predict_days: int = MAX_PREDICT_DAYS
68 | ) -> DataFrame[TrainInputSchema]:
69 | """
70 | Return the training data: the inference data with the target variable added.
71 | """
72 | df_result = preprocess_for_infer(
73 | df, max_predict_days=max_predict_days
74 | ).reset_index()
75 |
76 | # Attach the target variable
77 | df_target = df[["area", "date", "num_trip"]].rename(
78 | columns={"date": "target_date", "num_trip": "target"}
79 | )
80 | df_result = (
81 | df_result.merge(
82 | df_target, on=["area", "target_date"], how="left", validate="m:1"
83 | )
84 | .dropna(subset=["target"])
85 | .convert_dtypes()
86 | )
87 |
88 | df_result = df_result.sort_values(["area", "date", "target_date"]).set_index(
89 | ["date", "target_date"]
90 | )
91 |
92 | return df_result # type: ignore
93 |
94 |
95 | @pa.check_types
96 | def postprocess(
97 | df: DataFrame[InferOutputSchema],
98 | predict_date: datetime.date,
99 | max_predict_days: int = MAX_PREDICT_DAYS,
100 | ) -> DataFrame[TaxiDatasetSchema]:
101 | """
102 | Post-process the model output.
103 | Return the predictions for the day after predict_date through max_predict_days days after it, in the same format as the original dataset.
104 | """
105 | df = df.reset_index()
106 |
107 | # Extract the rows predicted on predict_date for 1 to max_predict_days days ahead
108 | predict_datetime = pd.to_datetime(predict_date)
109 | df_filtered = df[
110 | (df["date"] == predict_datetime)
111 | & (df["target_date"] >= predict_datetime + pd.Timedelta(days=1))
112 | & (df["target_date"] <= predict_datetime + pd.Timedelta(days=max_predict_days))
113 | ]
114 |
115 | # Convert to the TaxiDatasetSchema format
116 | df_result = (
117 | df_filtered[["target_date", "area", "pred"]]
118 | .rename(columns={"target_date": "date", "pred": "num_trip"})
119 | .sort_values(by=["area", "date"])
120 | .reset_index(drop=True)
121 | )
122 | df_result["num_trip"] = df_result["num_trip"].clip(lower=0).astype(int)
123 |
124 | # Check that predictions are present for every area
125 | expected_nrows = df["area"].nunique() * max_predict_days
126 | if len(df_result) != expected_nrows:
127 | raise ValueError(
128 | "Number of extracted rows does not match the expected value. "
129 | f"Expected: {expected_nrows}, Actual: {len(df_result)}"
130 | )
131 |
132 | return df_result # type: ignore
133 |
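The lead expansion in `preprocess_for_infer` and the completeness check in `postprocess` fit together as follows: every input row is copied once per lead, and the forecast for a given day must then contain one row per area per lead. The sketch below shows this with plain Python lists instead of DataFrames. The value `MAX_PREDICT_DAYS = 3` is an assumption made for the example (the real constant lives in `consts.py`, which is not shown here), and both function names are hypothetical.

```python
from datetime import date, timedelta

MAX_PREDICT_DAYS = 3  # assumed value for illustration only


def expand_leads(rows: list[dict], max_predict_days: int = MAX_PREDICT_DAYS) -> list[dict]:
    """Copy each row once per lead, adding target_lead and target_date."""
    expanded = []
    for lead in range(1, max_predict_days + 1):
        for row in rows:
            expanded.append(
                {
                    **row,
                    "target_lead": lead,
                    "target_date": row["date"] + timedelta(days=lead),
                }
            )
    return sorted(expanded, key=lambda r: (r["area"], r["date"], r["target_date"]))


def select_forecast(
    expanded: list[dict],
    predict_date: date,
    max_predict_days: int = MAX_PREDICT_DAYS,
) -> list[dict]:
    """Pick the rows issued on predict_date and verify every area is covered."""
    selected = [
        r
        for r in expanded
        if r["date"] == predict_date
        and 1 <= (r["target_date"] - predict_date).days <= max_predict_days
    ]
    # One row per area per lead is expected, as in the check in postprocess.
    expected_nrows = len({r["area"] for r in expanded}) * max_predict_days
    if len(selected) != expected_nrows:
        raise ValueError(
            "Number of extracted rows does not match the expected value. "
            f"Expected: {expected_nrows}, Actual: {len(selected)}"
        )
    return selected


# Made-up data: two areas observed on two days.
rows = [
    {"area": area, "date": date(2024, 1, day), "num_trip": 10 * day}
    for area in ("A", "B")
    for day in (1, 2)
]
expanded = expand_leads(rows)
forecast = select_forecast(expanded, predict_date=date(2024, 1, 2))
```

If one area has no row on `predict_date`, `select_forecast` raises `ValueError`. The same situation can arise in the app when the uploaded file ends on different dates for different areas.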
--------------------------------------------------------------------------------
/.devcontainer/devcontainer.json:
--------------------------------------------------------------------------------
1 | {
2 | "name": "Existing Docker Compose (Extend)",
3 | "dockerComposeFile": ["../compose.yml"],
4 | "service": "app",
5 | "workspaceFolder": "/app",
6 | "shutdownAction": "stopCompose",
7 | "overrideCommand": true,
8 | "customizations": {
9 | "vscode": {
10 | "settings": {
11 | // Default Python interpreter path
12 | "python.defaultInterpreterPath": "/usr/local/bin/python",
13 | "[python]": {
14 | // Format the file on save
15 | "editor.formatOnSave": true,
16 | // Configure the actions that run when a file is saved
17 | "editor.codeActionsOnSave": {
18 | // Run the linter and fix any auto-fixable errors
19 | "source.fixAll.ruff": "explicit",
20 | // Organize import statements
21 | "source.organizeImports.ruff": "explicit"
22 | },
23 | // Use Ruff as the formatter
24 | "editor.defaultFormatter": "charliermarsh.ruff"
25 | },
26 | // Format notebooks on save
27 | "notebook.formatOnSave.enabled": true,
28 | // Configure the actions that run when a notebook is saved
29 | "notebook.codeActionsOnSave": {
30 | // Run the linter and fix any auto-fixable errors
31 | "notebook.source.fixAll": "explicit",
32 | // Organize import statements
33 | "notebook.source.organizeImports": "explicit"
34 | },
35 | "autoDocstring.docstringFormat": "google",
36 | "python.testing.pytestArgs": [
37 | "tests",
38 | "-vv", // verbose output
39 | "-s" // show output from print statements
40 | ],
41 | "python.testing.unittestEnabled": false,
42 | "python.testing.pytestEnabled": true,
43 | // Uncomment shell assignments (#!), line magics (#!%), and cell magics (#!%%) when parsing code cells.
44 | "jupyter.interactiveWindow.textEditor.magicCommandsAsComments": true,
45 | // Allow stepping into library code while debugging.
46 | "jupyter.debugJustMyCode": false,
47 | // Show argument names
48 | "python.analysis.inlayHints.callArgumentNames": "all",
49 | // Show variable types
50 | "python.analysis.inlayHints.variableTypes": true,
51 | // Show return types
52 | "python.analysis.inlayHints.functionReturnTypes": true,
53 | // Debug configuration
54 | "launch": {
55 | "version": "0.2.0",
56 | "configurations": [
57 | // Debug while running the app as a web application with Streamlit
58 | {
59 | "name": "Streamlit: Start server",
60 | "type": "debugpy",
61 | "request": "launch",
62 | "module": "streamlit",
63 | "cwd": "${workspaceFolder}",
64 | "console": "internalConsole",
65 | "args": [
66 | "run",
67 | "app/app.py",
68 | "--server.port",
69 | "8501",
70 | "--server.runOnSave",
71 | "true"
72 | ],
73 | "justMyCode": false
74 | },
75 | // Run and debug the entire current file
76 | {
77 | "name": "Python: Current File",
78 | "type": "debugpy",
79 | "request": "launch",
80 | "program": "${file}",
81 | "console": "internalConsole",
82 | "justMyCode": false
83 | },
84 | // Debug within the scope of a test (a single function, in the case of unit tests)
85 | {
86 | "name": "Python: Debug Test",
87 | "type": "debugpy",
88 | "request": "launch",
89 | "purpose": ["debug-test"],
90 | "console": "internalConsole",
91 | "justMyCode": false
92 | }
93 | ]
94 | }
95 | },
96 | "extensions": [
97 | // Development support
98 | "ceintl.vscode-language-pack-ja",
99 | "github.copilot",
100 | "usernamehw.errorlens",
101 | "gruntfuggly.todo-tree",
102 | "ibm.output-colorizer",
103 | "editorconfig.editorconfig",
104 | "mosapride.zenkaku",
105 | "ionutvmi.path-autocomplete",
106 | // Python
107 | "ms-python.python",
108 | "charliermarsh.ruff",
109 | "ms-python.mypy-type-checker",
110 | "njpwerner.autodocstring",
111 | "ms-toolsai.jupyter",
112 | "ms-toolsai.jupyter-keymap",
113 | // Markdown
114 | "yzane.markdown-pdf",
115 | // CSV
116 | "janisdd.vscode-edit-csv",
117 | "mechatroner.rainbow-csv",
118 | // Git
119 | "mhutchie.git-graph"
120 | ]
121 | }
122 | }
123 | }
124 |
--------------------------------------------------------------------------------
/docs/setup.md:
--------------------------------------------------------------------------------
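1 | # Environment Setup Required for This Book
2 |
3 | This book lets you learn hands-on by actually running the code. The sections below describe the environment setup needed for the hands-on exercises. Once this preparation is done, you can work through the whole flow in practice, from data inspection in Chapter 4 through experiment management in Chapter 5 to prototype development in Chapter 6.
4 |
5 | The setup mainly relies on the following three tools.
6 |
7 | - **Visual Studio Code (VS Code)**: a lightweight, highly extensible integrated development environment (IDE) that supports many programming languages and tools.
8 | - **Docker**: packages an application's runtime environment as a container, making it easy to reproduce the same environment as other developers.
9 | - **Git**: a distributed version control system for managing source code and configuration files.
10 |
11 | Let's set up the environment.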
12 |
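13 | ## Setup Steps
14 |
15 | Set up the environment by following these steps.
16 |
17 | 1. Install the tools
18 | 2. Clone the repository
19 | 3. Download the required files
20 | 4. Prepare the environment variable configuration file
21 | 5. Launch the development environment with Dev Container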
22 |
23 |
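24 | ### 1. Install the Tools
25 |
26 | ##### Installing VS Code
27 |
28 | This book uses Visual Studio Code (hereafter VS Code) as its IDE (integrated development environment). VS Code is an open-source IDE developed by Microsoft that starts faster and runs lighter than many other IDEs. It also has a rich set of extensions, which helps you develop efficiently.
29 |
30 | Download the installer for your OS from the official VS Code site[^vscode-home] and install it on your machine (Figure 1).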
31 |
32 | [^vscode-home]: https://code.visualstudio.com/
33 |
34 |
35 |
36 |
56 |
57 |
66 |
67 |
84 |
85 |
136 |
137 |
156 |
157 |
166 |
167 |
178 |
179 |