5 |
10 |
14 |
18 |
22 | {sectionName} | Machine Learning Version Control System · DVC
23 |
24 | )
25 |
--------------------------------------------------------------------------------
/static/img/features/icons/cluster.svg:
--------------------------------------------------------------------------------
1 |
8 |
--------------------------------------------------------------------------------
/static/img/features/icons/git-icon.svg:
--------------------------------------------------------------------------------
1 |
7 |
--------------------------------------------------------------------------------
/static/img/features/icons/languages-icon.svg:
--------------------------------------------------------------------------------
1 |
8 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/index.md:
--------------------------------------------------------------------------------
1 | # Using DVC Commands
2 |
3 | DVC is a command-line tool. A typical DVC workflow goes as follows:
4 |
5 | - In an existing Git repository, initialize a DVC repository with `dvc init`.
6 | - Copy source files for modeling into the repository and convert them into DVC
7 | data files with the `dvc add` command.
8 | - Process source data files through your data processing and modeling code using
9 | the `dvc run` command.
10 | - Use the `--outs` option to specify the `dvc run` command outputs, which are
11 | converted to DVC data files after the code runs.
12 | - Cloning the Git repo with the code of your ML application pipeline does not
13 | copy your DVC cache. Use cloud storage settings and `dvc push` to share the
14 | cache (data).
15 | - Use `dvc repro` to quickly reproduce your pipeline on a new iteration, after
16 | your data files or the source code of your ML application are modified.
17 |
--------------------------------------------------------------------------------
/src/TextCollapse/index.js:
--------------------------------------------------------------------------------
1 | import React, { Component } from 'react'
2 | import styled from 'styled-components'
3 | import Collapse from 'react-collapse'
4 | import { presets } from 'react-motion'
5 |
6 | class TextCollapse extends Component {
7 | state = {
8 | isOpened: false
9 | }
10 |
11 | toggleCollapsed = () => {
12 | this.setState(prevState => ({
13 | isOpened: !prevState.isOpened
14 | }))
15 | }
16 |
17 | render() {
18 | const { children, header } = this.props
19 | const { isOpened } = this.state
20 |
21 | return (
22 |
29 | )
30 | }
31 | }
32 |
33 | export default TextCollapse
34 |
35 | const MoreText = styled.div`
36 | color: #13adc7;
37 | `
38 |
--------------------------------------------------------------------------------
/static/img/features/icons/ml-pipe.svg:
--------------------------------------------------------------------------------
1 |
7 |
--------------------------------------------------------------------------------
/static/img/discord.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/static/img/support/discord.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/static/img/features/icons/storage-icon.svg:
--------------------------------------------------------------------------------
1 |
8 |
--------------------------------------------------------------------------------
/static/img/features/icons/repro.svg:
--------------------------------------------------------------------------------
1 |
7 |
--------------------------------------------------------------------------------
/static/docs/user-guide/plugins.md:
--------------------------------------------------------------------------------
1 | # IDE Plugins & Syntax Highlighting
2 |
3 | When you add a file or a stage to your pipeline, DVC creates a special
4 | [.dvc file](https://dvc.org/doc/user-guide/dvc-file-format) that contains all
5 | the needed information to track your data and transformations.
6 |
7 | The file itself is in a simple YAML format.
8 |
9 | ## Vim
10 |
11 | In order to recognize DVC stage files as YAML in Vim, you should add:
12 |
13 | ```vim
14 | " DVC
15 | autocmd! BufNewFile,BufRead Dvcfile,*.dvc setfiletype yaml
16 | ```
17 |
18 | to your `~/.vimrc` (create the file if it doesn't exist).
19 |
20 | ## IntelliJ IDEs
21 |
22 | A community member, [@prihoda](https://github.com/prihoda), maintains a plugin
23 | for IntelliJ IDEs that offers a more robust integration than just syntax
24 | highlighting.
25 |
26 | You can download the plugin from the
27 | [JetBrains Plugins repository](https://plugins.jetbrains.com/plugin/11368-dvc-support-poc).
28 |
29 | For more information, visit the plugin's repository:
30 | [iterative/intellij-dvc/](https://github.com/iterative/intellij-dvc/).
31 |
--------------------------------------------------------------------------------
/src/Hamburger/index.js:
--------------------------------------------------------------------------------
1 | import React from 'react'
2 | import styled from 'styled-components'
3 |
4 | const WIDTH = 30
5 | const HEIGHT = 3
6 |
7 | export default ({ open }) => (
8 |
9 |
10 |
11 |
12 |
13 | )
14 |
15 | const Wrapper = styled.div`
16 | display: inline-block;
17 | cursor: pointer;
18 | `
19 |
20 | const Line = styled.div`
21 | width: ${WIDTH}px;
22 | height: ${HEIGHT}px;
23 | background-color: #173042;
24 | margin: 5px 0;
25 | transition: 0.4s;
26 |
27 | ${props =>
28 | props.open &&
29 | `
30 | background-color: #fff;
31 | `} ${props =>
32 | props.open &&
33 | props.first &&
34 | `
35 | transform: rotate(-45deg) translate(-7px, 6px);
36 | `};
37 |
38 | ${props =>
39 | props.open &&
40 | props.second &&
41 | `
42 | opacity: 0;
43 | `};
44 |
45 | ${props =>
46 | props.open &&
47 | props.third &&
48 | `
49 |
50 | transform: rotate(45deg) translate(-5px,-5px);
51 | `};
52 | `
53 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/destroy.md:
--------------------------------------------------------------------------------
1 | # destroy
2 |
3 | Remove DVC files from your repository.
4 |
5 | It removes `.dvc` files, `Dvcfile` files, and the `.dvc/` directory. By default
6 | this also removes the cache, unless it is set to an external location (the
7 | default local cache is located in the `.dvc/cache` directory).
8 |
9 | ```usage
10 | usage: dvc destroy [-h] [-q] [-v] [-f]
11 |
12 | optional arguments:
13 | -f, --force Force destruction
14 | ```
15 |
16 | ## Options
17 |
18 | - `-h`, `--help` - prints the usage/help message, and exits.
19 |
20 | - `-q`, `--quiet` - does not write anything to standard output. Exits with 0 if
21 | no problems arise, otherwise 1.
22 |
23 | - `-v`, `--verbose` - displays detailed tracing information.
24 |
25 | ## Example
26 |
27 | ```dvc
28 | $ dvc init
29 | $ echo foo > foo
30 | $ dvc add foo
31 | $ ls -a
32 |
33 | .dvc .git code.py foo foo.dvc
34 |
35 | $ dvc destroy
36 |
37 | This will destroy all information about your pipelines as well as cache in .dvc/cache.
38 | Are you sure you want to continue?
39 | yes
40 |
41 | $ ls -a
42 |
43 | .git code.py
44 | ```
45 |
--------------------------------------------------------------------------------
/static/docs/get-started/retrieve-data.md:
--------------------------------------------------------------------------------
1 | # Retrieve Data
2 |
3 | > Make sure that the steps described in
4 | > [initialization](/doc/get-started/initialize) and
5 | > [configuration](/doc/get-started/configure) are completed before you run the
6 | > `dvc pull` command in a newly cloned or initialized Git repository.
7 |
8 | To retrieve data files to your local machine and your project's workspace, run:
9 |
10 | ```dvc
11 | $ dvc pull
12 | ```
13 |
14 | This command retrieves data files that are referenced in _all_ `.dvc` files in
15 | the current workspace. So, you usually run it after `git clone`, `git pull`, or
16 | `git checkout`.
17 |
18 | As an easy way to test it:
19 |
20 | ```dvc
21 | $ rm -f data/data.xml
22 | $ dvc pull
23 | ```
24 |
25 | Alternatively, if you want to retrieve a single dataset or a file:
26 |
27 | ```dvc
28 | $ dvc pull data/data.xml.dvc
29 | ```
30 |
31 | DVC remotes, `dvc push`, and `dvc pull` provide a basic collaboration workflow,
32 | the same way as Git remotes, `git push` and `git pull`. See
33 | [Share Data and Model Files](/doc/use-cases/share-data-and-model-files) for more
34 | information.
35 |
--------------------------------------------------------------------------------
/.circleci/config.yml:
--------------------------------------------------------------------------------
1 | # Javascript Node CircleCI 2.0 configuration file
2 | #
3 | # Check https://circleci.com/docs/2.0/language-javascript/ for more details
4 | #
5 | version: 2
6 | jobs:
7 | build:
8 | docker:
9 | # specify the version you desire here
10 | - image: circleci/node:7.10
11 |
12 | # Specify service dependencies here if necessary
13 | # CircleCI maintains a library of pre-built images
14 | # documented at https://circleci.com/docs/2.0/circleci-images/
15 | # - image: circleci/mongo:3.4.4
16 |
17 | working_directory: ~/repo
18 |
19 | steps:
20 | - checkout
21 |
22 | # Download and cache dependencies
23 | - restore_cache:
24 | keys:
25 | - v1-dependencies-{{ checksum "package.json" }}
26 | # fallback to using the latest cache if no exact match is found
27 | - v1-dependencies-
28 |
29 | - run: yarn install
30 |
31 | - save_cache:
32 | paths:
33 | - node_modules
34 | key: v1-dependencies-{{ checksum "package.json" }}
35 |
36 | # run tests!
37 | - run: yarn test
38 |
--------------------------------------------------------------------------------
/static/docs/get-started/index.md:
--------------------------------------------------------------------------------
1 | # Get Started
2 |
3 | Get Started is a step-by-step introduction to basic DVC concepts. It does not
4 | go into much detail, but provides links and expandable sections to learn more.
5 |
6 | At the very end there are a few complete step-by-step examples to give you more
7 | hands-on experience with real-life scenarios: the first is about model and data
8 | set [versioning](/doc/get-started/example-versioning), and the second focuses
9 | on [pipelines](/doc/get-started/example-pipeline) and reproducibility.
10 |
11 | ✅ Please join our [community](/chat) or check these [support](/support)
12 | options if you have any questions or need any help. We are very responsive ⚡.
13 |
14 | ✅ Check out the [GitHub](https://github.com/iterative/dvc) page and give us a
15 | ⭐ if you like the project!
16 |
17 | ✅ Contribute either on [GitHub](https://github.com/iterative/dvc) or
18 | [Patreon](https://www.patreon.com/DVCorg/overview) to support the project.
19 |
20 | This longer [Tutorial](/doc/tutorial) introduces DVC step-by-step while
21 | explaining in great detail the motivation and what's happening internally.
22 |
--------------------------------------------------------------------------------
/static/img/support/bug.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/static/img/support/chat.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/static/img/link.svg:
--------------------------------------------------------------------------------
1 |
7 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Logs
2 | logs
3 | *.log
4 | npm-debug.log*
5 | yarn-debug.log*
6 | yarn-error.log*
7 |
8 | # Runtime data
9 | pids
10 | *.pid
11 | *.seed
12 | *.pid.lock
13 |
14 | # Directory for instrumented libs generated by jscoverage/JSCover
15 | lib-cov
16 |
17 | # Coverage directory used by tools like istanbul
18 | coverage
19 |
20 | # nyc test coverage
21 | .nyc_output
22 |
23 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
24 | .grunt
25 |
26 | # Bower dependency directory (https://bower.io/)
27 | bower_components
28 |
29 | # node-waf configuration
30 | .lock-wscript
31 |
32 | # Compiled binary addons (https://nodejs.org/api/addons.html)
33 | build/Release
34 |
35 | # Dependency directories
36 | node_modules/
37 | jspm_packages/
38 |
39 | # TypeScript v1 declaration files
40 | typings/
41 |
42 | # Optional npm cache directory
43 | .npm
44 |
45 | # Optional eslint cache
46 | .eslintcache
47 |
48 | # Optional REPL history
49 | .node_repl_history
50 |
51 | # Output of 'npm pack'
52 | *.tgz
53 |
54 | # Yarn Integrity file
55 | .yarn-integrity
56 |
57 | # dotenv environment variables file
58 | .env
59 |
60 | # next.js build output
61 | .idea
62 | .next/
63 | *.log
64 |
65 | # Mac finder artifacts
66 | .DS_Store
67 |
--------------------------------------------------------------------------------
/static/img/features/icons/branching.svg:
--------------------------------------------------------------------------------
1 |
7 |
--------------------------------------------------------------------------------
/static/docs/tutorial/index.md:
--------------------------------------------------------------------------------
1 | # Tutorial
2 |
3 | This tutorial shows you how to solve a text classification problem using the DVC
4 | tool.
5 |
6 | The data science community still lacks good practices for organizing projects
7 | and collaborating effectively. ML algorithms and methods are no longer simple
8 | _tribal knowledge_, but they are still difficult to implement, manage, and
9 | reuse.
10 |
11 | > One of the biggest challenges in reusing, and hence managing, ML projects is
12 | > reproducibility.
13 |
14 | DVC has been built to address reproducibility.
15 |
16 | 
17 |
18 | Git branches should beautifully reflect the non-linear structure common to the
19 | ML process, where each hypothesis can be presented as a Git branch. However,
20 | the inability to store data in a repository and the discrepancy between code and
21 | data make it extremely difficult to manage a data science project with Git.
22 |
23 | DVC brings large data files and binary models into a single Git environment
24 | without requiring you to store binary files in your Git repository.
25 |
26 | ## DVC Workflow
27 |
28 | The diagram below describes all the DVC commands and relationships between local
29 | cache and remote storage.
30 |
31 | 
32 |
--------------------------------------------------------------------------------
/static/docs/get-started/initialize.md:
--------------------------------------------------------------------------------
1 | # Initialize
2 |
3 | In order to start using DVC, you first need to initialize it in your project's
4 | directory. DVC doesn't require Git and can work without any source control
5 | management system, but for the best experience we recommend using DVC on top of
6 | Git repositories.
7 |
8 | If you don't have a directory for your project already, create it now with these
9 | commands:
10 |
11 | ```dvc
12 | $ mkdir example-get-started && cd example-get-started
13 | $ git init
14 | ```
15 |
16 | Run DVC initialization in a repository directory to create DVC metafiles and
17 | directories:
18 |
19 | ```dvc
20 | $ dvc init
21 | $ git commit -m "initialize DVC"
22 | ```
23 |
24 | After DVC initialization, a new directory `.dvc/` will be created with `config`
25 | and `.gitignore` files and `cache` directory. These files and directories are
26 | hidden from the user generally and are not meant to be manipulated directly.
27 |
28 | > See `dvc init` if you want to get more details about the initialization
29 | > process, and
30 | > [DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) to
31 | > learn about the DVC internal file and directory structure.
32 |
33 | The last command, `git commit`, puts the `.dvc/config` and `.dvc/.gitignore`
34 | files (DVC internals) under Git control.
35 |
--------------------------------------------------------------------------------
/src/Page404/index.js:
--------------------------------------------------------------------------------
1 | import React, { Component } from 'react'
2 | // styles
3 | import styled from 'styled-components'
4 |
5 | export default class Page404 extends Component {
6 | goBack = () => window.history.back()
7 |
8 | render() {
9 | return (
10 |
11 | 404
12 | Oops! Page Not Found!
13 |
14 | Sorry, but the page you are looking for was not found. Please make
15 | sure you have typed the correct URL.
16 |
17 | Go Back
18 |
19 | )
20 | }
21 | }
22 |
23 | const Wrapper = styled.div`
24 | text-align: center;
25 | max-width: 650px;
26 | margin: 0 auto;
27 | display: flex;
28 | flex-direction: column;
29 | justify-content: center;
30 | align-items: center;
31 | max-height: 80vh;
32 | padding: 30px;
33 | `
34 |
35 | const StatusCode = styled.div`
36 | font-size: 120px;
37 | font-weight: 700;
38 | color: #13adc7;
39 | `
40 |
41 | const Message = styled.div`
42 | font-size: 36px;
43 | font-weight: 600;
44 | `
45 |
46 | const Text = styled.div`
47 | font-size: 18px;
48 | `
49 |
50 | const GoBackLink = styled.a`
51 | cursor: pointer;
52 | color: #13adc7;
53 | max-width: 100px;
54 | margin: 30px 0;
55 | font-size: 18px;
56 | `
57 |
--------------------------------------------------------------------------------
/static/docs/get-started/share-data.md:
--------------------------------------------------------------------------------
1 | # Share Data
2 |
3 | Now that your data files are managed by DVC (see
4 | [Add Files](/doc/get-started/add-files)), you can push them from your repository
5 | to the default [remote](/doc/commands-reference/remote) storage\*:
6 |
7 | ```dvc
8 | $ dvc push
9 | ```
10 |
11 | As with a Git remote, this ensures that your data files and models are safely
12 | stored remotely and are shareable, so you or your team can pull the data
13 | whenever it is needed.
14 |
15 | Usually, you run it along with `git commit` and `git push` to save changes to
16 | `.dvc` files to Git.
17 |
18 | See `dvc push` for more details and options for this command.
19 |
20 | > \*As noted in the DVC [configuration](/doc/get-started/configure) chapter, we
21 | > are using a **local remote** in this guide for educational purposes.
22 |
23 |
24 |
25 | ### Expand to learn more about DVC internals
26 |
27 | You can now check that the actual data file has been copied to the remote we created
28 | in the [configuration](/doc/get-started/configure) chapter:
29 |
30 | ```dvc
31 | $ ls -R /tmp/dvc-storage
32 | /tmp/dvc-storage/a3:
33 | 04afb96060aad90176268345e10355
34 | ```
35 |
36 | where `a304afb96060aad90176268345e10355` is an MD5 hash of the `data.xml` file,
37 | and if you check the `data.xml.dvc` meta-file you will see that it has this hash
38 | inside.
39 |
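The cache layout shown in the listing above can be sketched in Python. This is a minimal illustration under stated assumptions, not DVC's actual implementation: the helper names `md5_of_file` and `cache_relpath` are hypothetical, and the only fact taken from the listing is that the first two hex characters of the hash form the directory name and the remaining characters form the file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_relpath(md5_hex):
    """Split a hash into '<first two chars>/<remaining chars>'.

    Hypothetical helper mirroring the directory layout in the listing above.
    """
    return f"{md5_hex[:2]}/{md5_hex[2:]}"
```

With the hash from this guide, `cache_relpath("a304afb96060aad90176268345e10355")` gives `a3/04afb96060aad90176268345e10355`, which matches the `ls -R` output.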
40 |
41 |
--------------------------------------------------------------------------------
/static/docs/get-started/reproduce.md:
--------------------------------------------------------------------------------
1 | # Reproduce
2 |
3 | In the previous section, we described our first pipeline. Basically, we created
4 | a number of `*.dvc` files. Each file describes a single step we need to run to
5 | get to the final result. Each depends on some data (either source data files or
6 | some intermediate results from another `*.dvc` file) and code files.
7 |
8 | If you freshly checked out the project, make sure you first fetch the input data
9 | from DVC by calling `dvc pull`.
10 |
11 | It's now extremely easy for you or anyone in your team to reproduce the result
12 | end-to-end:
13 |
14 | ```dvc
15 | $ dvc repro train.dvc
16 | ```
17 |
18 | The `train.dvc` file internally describes what data files and code we should take
19 | and how to run the command to get the binary model file. For each data file it
20 | depends on, we can, in turn, do the same analysis - find a corresponding `.dvc`
21 | file that includes the data file in its outputs, get dependencies and commands,
22 | and so on. It means that DVC can recursively build a complete tree of commands
23 | it needs to execute to get the model file.
24 |
25 | `dvc repro` essentially builds this execution graph, detects stages with
26 | modified dependencies or missing outputs, and recursively executes the graph
27 | starting from these stages.
28 |
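The recursive resolution described above can be sketched in a few lines of Python. This is an illustration only, not how DVC is implemented: the `STAGES` dictionary is a hand-written stand-in for the `.dvc` files, the dependencies listed for `prepare.dvc` are assumed for the example, and the sketch ignores missing outputs and cyclic graphs.

```python
# Hypothetical in-memory stand-in for the stage files of this guide.
STAGES = {
    "prepare.dvc": {
        "deps": ["src/prepare.py", "data/data.xml"],  # assumed for illustration
        "outs": ["data/prepared"],
    },
    "featurize.dvc": {
        "deps": ["src/featurization.py", "data/prepared"],
        "outs": ["data/features"],
    },
    "train.dvc": {
        "deps": ["src/train.py", "data/features"],
        "outs": ["model.pkl"],
    },
}


def execution_order(target, stages):
    """Return the stages needed for `target`, dependencies first."""
    # Map every output file to the stage that produces it.
    producers = {out: name for name, stage in stages.items() for out in stage["outs"]}
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in stages[name]["deps"]:
            if dep in producers:  # the dependency is another stage's output
                visit(producers[dep])
        order.append(name)

    visit(target)
    return order


def stages_to_rerun(target, stages, changed_files):
    """Return the stages whose dependencies changed, directly or indirectly."""
    stale = set(changed_files)
    rerun = []
    for name in execution_order(target, stages):
        if any(dep in stale for dep in stages[name]["deps"]):
            rerun.append(name)
            # Outputs of a re-run stage count as changed for later stages.
            stale.update(stages[name]["outs"])
    return rerun
```

In this sketch, changing only `src/featurization.py` marks `featurize.dvc` and `train.dvc` for re-execution, while `prepare.dvc` is left alone.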
29 | Thus, `dvc run` and `dvc repro` provide a powerful framework for _reproducible
30 | experiments_ and _reproducible projects_.
31 |
--------------------------------------------------------------------------------
/static/img/share.svg:
--------------------------------------------------------------------------------
1 |
7 |
--------------------------------------------------------------------------------
/static/img/support/forum.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/static/docs/get-started/pipeline.md:
--------------------------------------------------------------------------------
1 | # Pipeline
2 |
3 | Pipelines are DVC's biggest difference from other version control tools that can
4 | handle large data files, e.g. `git lfs`. By running `dvc run` multiple times
5 | and specifying the outputs of one command (stage) as dependencies of another
6 | command (stage), we can essentially describe the sequence of commands required
7 | to get to the final result.
8 |
9 | The second stage (after the `prepare.dvc` that we created during the previous
10 | step), feature extraction:
11 |
12 | ```dvc
13 | $ dvc run -f featurize.dvc \
14 | -d src/featurization.py -d data/prepared \
15 | -o data/features \
16 | python src/featurization.py \
17 | data/prepared data/features
18 | ```
19 |
20 | The third stage, training:
21 |
22 | ```dvc
23 | $ dvc run -f train.dvc \
24 | -d src/train.py -d data/features \
25 | -o model.pkl \
26 | python src/train.py data/features model.pkl
27 | ```
28 |
29 | Let's commit DVC files that describe our pipeline so far:
30 |
31 | ```dvc
32 | $ git add data/.gitignore .gitignore featurize.dvc train.dvc
33 | $ git commit -m "add featurization and train steps to the pipeline"
34 | $ dvc push
35 | ```
36 |
37 | This example is simplified just to give you an idea of the pipeline. Check the
38 | [example](/doc/get-started/example-pipeline) or the complete
39 | [tutorial](/doc/tutorial) to see the NLP processing pipeline end-to-end.
40 |
--------------------------------------------------------------------------------
/src/Layout/index.js:
--------------------------------------------------------------------------------
1 | import React, { Component } from 'react'
2 | import styled from 'styled-components'
3 | import { withRouter } from 'next/router'
4 | // components
5 | import TopMenu from '../TopMenu'
6 | import Footer from '../Footer'
7 | import HamburgerMenu from '../HamburgerMenu'
8 | // utils
9 | import { initGA, logPageView } from '../utils/ga'
10 |
11 | class Layout extends Component {
12 | componentDidMount() {
13 | if (!window.GA_INITIALIZED) {
14 | initGA()
15 | window.GA_INITIALIZED = true
16 | }
17 | logPageView()
18 | }
19 |
20 | componentWillReceiveProps(nextProps) {
21 | const { router } = nextProps
22 | this.isDocPage = router.pathname.split('/')[1] === 'doc'
23 | }
24 |
25 | render() {
26 | const { children } = this.props
27 |
28 | return (
29 |
30 |
31 |
32 |
33 | {children}
34 |
35 |
36 |
37 | )
38 | }
39 | }
40 |
41 | export default withRouter(Layout)
42 |
43 | const Wrapper = styled.div`
44 | overflow: hidden;
45 | `
46 |
47 | const Bodybag = styled.div`
48 | position: fixed;
49 | top: 80px;
50 | bottom: 0;
51 | left: 0;
52 | right: 0;
53 | height: 100%;
54 | overflow-x: hidden;
55 | overflow-y: auto;
56 | transition: top 0.2s linear;
57 | -webkit-overflow-scrolling: touch;
58 | `
59 |
--------------------------------------------------------------------------------
/src/Popover/Popover.js:
--------------------------------------------------------------------------------
1 | import React, { Component } from 'react'
2 | import ReactPopover from 'react-popover'
3 | import { injectGlobal } from 'styled-components'
4 |
5 | class Popover extends Component {
6 | constructor() {
7 | super()
8 | this.state = {
9 | isOpen: false
10 | }
11 | }
12 |
13 | componentDidMount() {
14 | if (this.trigger) {
15 | this.trigger.addEventListener('click', this.togglePopover)
16 | }
17 | }
18 |
19 | componentWillUnmount() {
20 | if (this.trigger) {
21 | this.trigger.removeEventListener('click', this.togglePopover)
22 | }
23 | }
24 |
25 | togglePopover = () => {
26 | this.setState(prevState => ({ isOpen: !prevState.isOpen }))
27 | }
28 |
29 | closePopover = () => this.setState({ isOpen: false })
30 |
31 | render() {
32 | const { children, ...rest } = this.props
33 | const { isOpen } = this.state
34 |
35 | return (
36 |
37 |
(this.trigger = ref)}>{children}
38 |
39 | )
40 | }
41 | }
42 |
43 | export default Popover
44 |
45 | injectGlobal`
46 | .Popover {
47 |
48 | .Popover-body {
49 | box-shadow: rgba(0,0,0,0.14) 0px 0px 4px, rgba(0,0,0,0.28) 0px 4px 8px;
50 | padding: 5px 15px;
51 | background: white;
52 | color: black;
53 | }
54 |
55 | .Popover-tip {
56 | display: none;
57 | }
58 | }
59 | `
60 |
--------------------------------------------------------------------------------
/static/img/star_small.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/pages/index.js:
--------------------------------------------------------------------------------
1 | import React from 'react'
2 | import styled from 'styled-components'
3 | import Head from 'next/head'
4 |
5 | import LearnMore from '../src/LearnMore'
6 |
7 | import Page from '../src/Page'
8 | import Hero from '../src/Hero'
9 | import LandingHero from '../src/LandingHero'
10 | import Diagram from '../src/Diagram'
11 | import PromoSection from '../src/PromoSection'
12 | import UseCases from '../src/UseCases'
13 | import Subscribe from '../src/Subscribe'
14 |
15 | const HeadInjector = () => (
16 |
17 |
23 |
28 | Machine Learning Version Control System · DVC
29 |
30 | )
31 |
32 | export default () => (
33 |
34 |
35 |
36 |
37 |
38 |
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 |
47 | )
48 |
49 | const LearnMoreSection = styled.div`
50 | z-index: 2;
51 | position: absolute;
52 | transform: translate(-50%, 0%);
53 | left: 50%;
54 | bottom: 16px;
55 | `
56 |
--------------------------------------------------------------------------------
/static/img/save-reprro.svg:
--------------------------------------------------------------------------------
1 |
7 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/cache.md:
--------------------------------------------------------------------------------
1 | # cache
2 |
3 | Contains a helper command to set the cache directory location:
4 | [dir](/doc/commands-reference/cache-dir).
5 |
6 | ## Synopsis
7 |
8 | ```usage
9 | usage: dvc cache [-h] [-q] [-v] {dir} ...
10 |
11 | positional arguments:
12 | dir Configure cache directory location.
13 | ```
14 |
15 | ## Description
16 |
17 | After DVC initialization, a hidden directory `.dvc/` is created with the
18 | [DVC internal files](/doc/user-guide/dvc-files-and-directories), including the
19 | default `cache` directory.
20 |
21 | The DVC cache is where your data files, models, etc (anything you want to
22 | version with DVC) are actually stored. The corresponding files you see in the
23 | working directory or "workspace" simply link to the ones in cache. (See
24 | `dvc config cache` `type` setting for more information on file links on
25 | different platforms.)
26 |
27 | > For more cache-related configuration options refer to `dvc config cache`.
28 |
29 | ## Options
30 |
31 | - `-h`, `--help` - prints the usage/help message, and exits.
32 |
33 | - `-q`, `--quiet` - does not write anything to standard output. Exits with 0 if
34 | no problems arise, otherwise 1.
35 |
36 | - `-v`, `--verbose` - displays detailed tracing information.
37 |
38 | ## Example
39 |
40 | The main use of this command is to apply the `-v` and `-q` options to
41 | `dvc cache dir` which doesn't have them:
42 |
43 | ```dvc
44 | $ dvc cache --verbose dir mycache
45 | DEBUG: Writing '/Users/user/myproject/.dvc/config'.
46 | $ dvc config cache.dir
47 | ../mycache
48 | ```
49 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/unlock.md:
--------------------------------------------------------------------------------
1 | # unlock
2 |
3 | Unlock a DVC file (stage). See `dvc lock` for more information.
4 |
5 | ```usage
6 | usage: dvc unlock [-h] [-q] [-v] targets [targets ...]
7 |
8 | positional arguments:
9 | targets DVC files.
10 | ```
11 |
12 | ## Options
13 |
14 | - `-h`, `--help` - prints the usage/help message, and exits.
15 |
16 | - `-q`, `--quiet` - does not write anything to standard output. Exits with 0 if
17 | no problems arise, otherwise 1.
18 |
19 | - `-v`, `--verbose` - displays detailed tracing information.
20 |
21 | ## Example
22 |
23 | - First, let's create a sample DVC file:
24 |
25 | ```dvc
26 | $ echo foo > foo
27 | $ dvc add foo
28 | $ dvc run -d foo -o bar cp foo bar
29 |
30 | Using 'bar.dvc' as a stage file
31 | Running command:
32 | cp foo bar
33 | ```
34 |
35 | - Then, let's change the file `foo` that the stage `bar.dvc` depends on:
36 |
37 | ```dvc
38 | $ rm foo
39 | $ echo foo1 > foo
40 | $ dvc status
41 |
42 | bar.dvc
43 | deps
44 | changed: foo
45 | foo.dvc
46 | outs
47 | changed: foo
48 | ```
49 |
50 | - Now, let's lock the `bar` stage:
51 |
52 | ```dvc
53 | $ dvc lock bar.dvc
54 | $ dvc status
55 |
56 | foo.dvc
57 | outs
58 | changed: foo
59 | ```
60 |
61 | - Run `dvc unlock` to unlock it again:
62 |
63 | ```dvc
64 | $ dvc unlock bar.dvc
65 | $ dvc status
66 |
67 | bar.dvc
68 | deps
69 | changed: foo
70 | foo.dvc
71 | outs
72 | changed: foo
73 | ```
74 |
--------------------------------------------------------------------------------
/static/docs/get-started/compare-experiments.md:
--------------------------------------------------------------------------------
1 | # Compare Experiments
2 |
3 | DVC makes it easy to iterate on your project using Git commits with tags or Git
4 | branches. It provides a way to try different ideas, keep track of them, and
5 | switch back and forth. To find the best performing experiment or track progress,
6 | DVC supports a special _metric_ output type (described in one of the previous
7 | steps).
8 |
9 | Let's run the evaluation for the latest `bigram` experiment we created in one of
10 | the previous steps. It mostly takes just running `dvc repro`:
11 |
12 | ```dvc
13 | $ git checkout master
14 | $ dvc checkout
15 | $ dvc repro evaluate.dvc
16 | ```
17 |
18 | `git checkout master` and `dvc checkout` commands ensure that we have the latest
19 | experiment code and data respectively. And `dvc repro`, as we discussed in the
20 | [reproduce](/doc/get-started/reproduce) step, is a way to run all the necessary
21 | commands to build the model and measure its performance.
22 |
23 | ```dvc
24 | $ git commit -a -m "evaluate bigram model"
25 | $ git tag -a "bigram-experiment" -m "bigrams"
26 | ```
27 |
28 | Now, we can use the `-T` option of the `dvc metrics show` command to see the
29 | difference between the `baseline` and `bigram` experiments:
30 |
31 | ```dvc
32 | $ dvc metrics show -T
33 |
34 | baseline-experiment:
35 | auc.metric: 0.588426
36 | bigram-experiment:
37 | auc.metric: 0.602818
38 | ```
39 |
40 | DVC provides built-in support to track and navigate `JSON`, `TSV` or `CSV`
41 | metric files if you want to track additional information. Check `dvc metrics` to
42 | learn more.
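If you want to compare the tagged experiments programmatically, the text printed by `dvc metrics show -T` can be parsed with a few lines of Python. This is only a sketch: it assumes the output has exactly the shape shown above (a tag line ending in `:` followed by `name: value` lines), and `parse_metrics` and `best_experiment` are hypothetical helper names, not part of DVC.

```python
def parse_metrics(text):
    """Parse `tag:` / `metric: value` lines into {tag: {metric: float}}."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A line such as "baseline-experiment:" starts a new tag.
            current = line[:-1]
            results[current] = {}
        elif current is not None:
            # A line such as "auc.metric: 0.588426" belongs to the current tag.
            name, _, value = line.partition(":")
            results[current][name.strip()] = float(value)
    return results


def best_experiment(results, metric):
    """Return the tag with the highest value of `metric`."""
    return max(results, key=lambda tag: results[tag][metric])


# Sample text copied from the output shown above.
SAMPLE = """
baseline-experiment:
  auc.metric: 0.588426
bigram-experiment:
  auc.metric: 0.602818
"""

metrics = parse_metrics(SAMPLE)
print(best_experiment(metrics, "auc.metric"))
```

For structured metric files (`JSON`, `TSV`, `CSV`) it is probably simpler to rely on `dvc metrics` itself than on text parsing like this.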
43 |
--------------------------------------------------------------------------------
/src/Documentation/Markdown/lang/usage.js:
--------------------------------------------------------------------------------
1 | 'use strict'
2 |
3 | Object.defineProperty(exports, '__esModule', {
4 | value: true
5 | })
6 |
7 | var _javascript = function(hljs) {
8 | var QUOTE_STRING = {
9 | className: 'string',
10 | begin: /"/,
11 | end: /"/,
12 | contains: [
13 | hljs.BACKSLASH_ESCAPE,
14 | {
15 | className: 'variable',
16 | begin: /\$\(/,
17 | end: /\)/,
18 | contains: [hljs.BACKSLASH_ESCAPE]
19 | }
20 | ]
21 | }
22 |
23 | return {
24 | aliases: ['usage'],
25 | contains: [
26 | {
27 | begin: /^\s*(usage:|positional arguments:|optional arguments:)/,
28 | end: /\n|\Z/,
29 | lexemes: /\b-?[a-z\._]+\b/,
30 | keywords: {
31 | keyword: 'usage arguments optional positional'
32 | },
33 | contains: [
34 | {
35 | begin: / dvc [a-z]+/,
36 | keywords: {
37 | built_in:
38 | 'help dvc init add import checkout run pull push fetch status repro ' +
39 | 'remove move gc config remote metrics install root lock unlock ' +
40 | 'pipeline destroy unprotect commit cache diff tag pkg version'
41 | },
42 | className: 'strong'
43 | }
44 | ]
45 | },
46 |
47 | QUOTE_STRING,
48 | hljs.HASH_COMMENT_MODE
49 | ]
50 | }
51 | }
52 |
53 | var _javascript2 = _interopRequireDefault(_javascript)
54 |
55 | function _interopRequireDefault(obj) {
56 | return obj && obj.__esModule ? obj : { default: obj }
57 | }
58 |
59 | exports.default = _javascript2.default
60 |
--------------------------------------------------------------------------------
/static/docs/get-started/experiments.md:
--------------------------------------------------------------------------------
1 | # Experiments
2 |
3 | The data science process is inherently iterative and R&D-like: a data scientist
4 | may try many different approaches and hyper-parameter values, and "fail" many
5 | times before the required level of a metric is achieved.
6 |
7 | DVC is built to provide a way to capture different experiments and navigate
8 | easily between them. Let's say we want to try a modified feature extraction:
9 |
10 |
11 |
12 | ### Expand to see code modifications
13 |
14 | Edit `src/featurization.py` to enable bigrams and increase the number of
15 | features. Find the `CountVectorizer` call and change its arguments: specify
16 | `ngram_range` and raise `max_features`:
17 |
18 | ```python
19 | bag_of_words = CountVectorizer(stop_words='english',
20 | max_features=6000,
21 | ngram_range=(1, 2))
22 | ```
23 |
24 |
25 |
26 | ```dvc
27 | $ vi src/featurization.py # edit to use bigrams (see above)
28 | $ dvc repro train.dvc # get and save the new model.pkl
29 | $ git commit -a -m "bigram model"
30 | ```
31 |
32 | Now, we have a new `model.pkl` captured and saved. To get back to the initial
33 | version, we run `git checkout` along with the `dvc checkout` command:
34 |
35 | ```dvc
36 | $ git checkout baseline-experiment
37 | $ dvc checkout
38 | ```
39 |
40 | DVC is designed to check out large data files (no matter how large they are) into
41 | your workspace instantly on almost all modern operating systems with file links.
42 | See [Large Dataset Optimization](/docs/user-guide/large-dataset-optimization)
43 | for more information.
44 |
--------------------------------------------------------------------------------
/static/img/glyph-1.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/static/img/glyph-2.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | [](https://codeclimate.com/github/iterative/dvc.org/maintainability)
4 | [](https://circleci.com/gh/iterative/dvc.org)
5 |
6 | [DVC](https://github.com/iterative/dvc) project documentation and website source
7 | code.
8 |
9 | # Documentation
10 |
11 | The documentation engine was built using React, a React Markdown library, and
12 | Algolia to provide full-text search. The motivation for building it from scratch
13 | was the lack of flexibility (customization, sidebar tuning) and the ads in
14 | engines like readthedocs, docsify, etc.
15 |
16 | Please feel free to use it for your own sites and
17 | [reach out to us](https://dvc.org/support) if you have any questions.
18 |
19 | # Contributing
20 |
21 | We welcome contributions to DVC by the community!
22 |
23 | You can refer to our [Contribution
24 | Guide](https://dvc.org/doc/user-guide/contributing-documentation/) for more
25 | details. Thank you!
26 |
27 | **If you need help:**
28 |
29 | If you have any questions, please join the [community](https://dvc.org/chat)
30 | and use the `#dev-docs` channel to discuss any website and docs related issues.
31 | We are very responsive and happy to help.
32 |
33 | # Copyright
34 |
35 | This project is distributed under the Apache license version 2.0 (see the
36 | LICENSE file in the project root).
37 |
38 | By submitting a pull request for this project, you agree to license your
39 | contribution under the Apache license version 2.0 to this project.
40 |
41 | If you use images, please make a reference to the original site.
42 |
43 |
--------------------------------------------------------------------------------
/static/docs/get-started/agenda.md:
--------------------------------------------------------------------------------
1 | # Agenda
2 |
3 | In the next few sections we will rebuild a simple natural language processing
4 | (NLP) project from scratch. As we already mentioned, if you'd like to get the
5 | final result or have some issues along the way, you can download the fully
6 | reproducible
7 | [GitHub DVC project](https://github.com/iterative/example-get-started) here:
8 |
9 | ```dvc
10 | $ git clone https://github.com/iterative/example-get-started
11 | ```
12 |
13 | Otherwise, bear with us and we will introduce the basic DVC concepts and get to
14 | the same result together!
15 |
16 | The project is a simplified version of the
17 | [tutorial](/doc/tutorial). It explores the NLP problem of predicting tags for a
18 | given StackOverflow question. For example, we want a classifier that can
19 | predict that a post is about the Python language by tagging it `python`.
20 |
21 | 
22 |
23 | Don't let the NLP nature of the example discourage you from using DVC in other
24 | data science areas. There was no strong reason behind picking NLP. On the
25 | contrary, DVC is designed to be agnostic of frameworks, languages, etc. If you
26 | have data files or data sets and/or you produce other data files, models, or
27 | data sets, and you want to:
28 |
29 | - capture and save those data artifacts the same way we capture code,
30 | - track and switch between different versions of these artifacts easily,
31 | - be able to answer the question of how those artifacts or models were built
32 | in the first place,
33 | - be able to compare them,
34 | - bring best practices to your team and get everyone on the same page,
35 |
36 | then you are in a good place! Click the `Next` button below to start ↘.
37 |
--------------------------------------------------------------------------------
/src/GithubLine/index.js:
--------------------------------------------------------------------------------
1 | import React, { Component } from 'react'
2 | import axios from 'axios'
3 | import styled from 'styled-components'
4 |
5 | const repo = `iterative/dvc`
6 | const gh = `https://github.com/${repo}`
7 | const api = `https://api.github.com/repos/${repo}`
8 |
9 | export default class GithubLine extends Component {
10 | static defaultProps = {}
11 |
12 | state = {
13 | count: `–––`
14 | }
15 |
16 | componentDidMount() {
17 | axios.get(api).then(this.process)
18 | }
19 |
20 | process = res => {
21 | const count = res.data.stargazers_count
22 | this.setState({
23 | count
24 | })
25 | }
26 |
27 | render() {
28 | const { count } = this.state
29 | return (
30 |
31 |
32 | We’re on
33 | Github
34 | {' '}
39 | {count}
40 |
41 | )
42 | }
43 | }
44 |
45 | const Wrapper = styled.div`
46 | font-family: BrandonGrotesqueMed;
47 | line-height: 20px;
48 | height: 20px;
49 | display: flex;
50 | align-items: center;
51 | `
52 |
53 | const Link = styled.a`
54 | font-family: BrandonGrotesqueMed;
55 | color: #40364d;
56 | margin-left: 0.3em;
57 |
58 | &:focus,
59 | &:hover,
60 | &:visited {
61 | color: #40364d;
62 | }
63 | `
64 |
65 | const Github = styled.img`
66 | font-family: BrandonGrotesqueMed;
67 | margin-right: 9px;
68 | `
69 |
70 | const Star = styled.img`
71 | margin-left: 7px;
72 | `
73 |
74 | const Count = styled.span`
75 | font-family: BrandonGrotesqueMed;
76 | margin-left: 6.3px;
77 | `
78 |
--------------------------------------------------------------------------------
/static/docs/get-started/metrics.md:
--------------------------------------------------------------------------------
1 | # Experiment Metrics
2 |
3 | The last stage we would like to add to the pipeline is the evaluation stage.
4 | Data science is a metric-driven, R&D-like process, and `dvc metrics` along with
5 | DVC metric files provide a framework to capture and compare experiment
6 | performance. It does not require installing any databases or instrumenting your
7 | code to use some API. Everything is tracked by Git and stored in Git or in DVC
8 | remote storage:
9 |
10 | ```dvc
11 | $ dvc run -f evaluate.dvc \
12 | -d src/evaluate.py -d model.pkl -d data/features \
13 | -M auc.metric \
14 | python src/evaluate.py model.pkl \
15 | data/features auc.metric
16 | ```
17 |
18 | `evaluate.py` calculates the AUC value using the test data set. It reads features
19 | from the `features/test.pkl` file and produces a DVC metric file, `auc.metric`.
20 | A metric file is a special DVC output file type; in this case it's just a plain
21 | text file with a single number inside.
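A minimal sketch of what such a metric file amounts to, assuming the script simply writes one number as text. The `write_metric` and `read_metric` helpers are hypothetical and are not taken from `src/evaluate.py`; the value is the baseline AUC used as an illustration.

```python
import os
import tempfile


def write_metric(path, value):
    """Write a single number to a plain text metric file (assumed format)."""
    with open(path, "w") as fd:
        fd.write("{:.6f}\n".format(value))


def read_metric(path):
    """Read the single number back from the metric file."""
    with open(path) as fd:
        return float(fd.read().strip())


# Use a temporary directory so the sketch does not touch a real project.
with tempfile.TemporaryDirectory() as tmp:
    metric_path = os.path.join(tmp, "auc.metric")
    write_metric(metric_path, 0.588426)
    auc = read_metric(metric_path)

print(auc)
```

Because the file is plain text, it stays small and readable in Git diffs, which is what makes comparing it across tags and branches practical.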
22 |
23 | > Please, refer to the `dvc metrics` command documentation to see more available
24 | > options and details.
25 |
26 | Let's again commit and save results:
27 |
28 | ```dvc
29 | $ git add evaluate.dvc auc.metric
30 | $ git commit -m "add evaluation step to the pipeline"
31 | $ dvc push
32 | ```
33 |
34 | Let's also assign a Git tag. It will serve as a checkpoint for comparing
35 | experiments in the future, or for going back to check out this version and the
36 | corresponding data:
37 |
38 | ```dvc
39 | $ git tag -a "baseline-experiment" -m "baseline"
40 | ```
41 |
42 | The `dvc metrics show` command provides a way to compare different experiments,
43 | by analyzing DVC metric files across different branches, tags, etc. But first we
44 | need to create a new experiment to compare the baseline with.
45 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/remove.md:
--------------------------------------------------------------------------------
1 | # remove
2 |
3 | Remove data file or data directory.
4 |
5 | This command safely removes data files or stage outputs that are tracked by DVC
6 | from your _workspace_. It takes a `.dvc` file and removes all outputs and
7 | optionally removes the file itself.
8 |
9 | Note that it _does not_ remove files from the DVC cache or remote storage (see
10 | `dvc gc`). However, remember to run `dvc push` to save the files you actually
11 | want to use or share in the future.
12 |
13 | ```usage
14 | usage: dvc remove [-h] [-q] [-v] [-o | -p] [-f] targets [targets ...]
15 |
16 | positional arguments:
17 | targets DVC files.
18 | ```
19 |
20 | Check also [Update Tracked Files](/doc/user-guide/update-tracked-file) to see
21 | how it can be used to replace or modify files that are under DVC control.
22 |
23 | ## Options
24 |
25 | - `-o`, `--outs` (default) - remove outputs described in the provided DVC
26 | file(s), keep the DVC files.
27 |
28 | - `-p`, `--purge` - remove outputs and DVC files.
29 |
30 | - `-f`, `--force` - force purge. Skip confirmation prompt.
31 |
32 | - `-h`, `--help` - prints the usage/help message and exits.
33 |
34 | - `-q`, `--quiet` - does not write anything to standard output. Exit with 0 if
35 | no problems arise, otherwise 1.
36 |
37 | - `-v`, `--verbose` - displays detailed tracing information.
38 |
39 | ## Examples
40 |
41 | Let's imagine we have a `data.csv` under DVC control:
42 |
43 | ```dvc
44 | $ dvc add data.csv
45 | $ ls data.csv*
46 |
47 | data.csv
48 | data.csv.dvc
49 | ```
50 |
51 | Remove `data.csv` data file:
52 |
53 | ```dvc
54 | $ dvc remove data.csv.dvc
55 | $ ls data.csv*
56 |
57 | data.csv.dvc
58 | ```
59 |
60 | Purge DVC files:
61 |
62 | ```dvc
63 | $ dvc remove data.csv.dvc -p
64 | $ ls data.csv*
65 | ```
66 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/remote_remove.md:
--------------------------------------------------------------------------------
1 | # remote remove
2 |
3 | Remove a specified remote. This command affects DVC configuration files only; it
4 | does not physically remove your data files stored remotely.
5 |
6 | See also [add](/doc/commands-reference/remote-add),
7 | [default](/doc/commands-reference/remote-default),
8 | [list](/doc/commands-reference/remote-list), and
9 | [modify](/doc/commands-reference/remote-modify) commands to manage data remotes.
10 |
11 | ## Synopsis
12 |
13 | ```usage
14 | usage: dvc remote remove [-h] [-q | -v]
15 | [--global] [--system] [--local]
16 | name
17 |
18 | positional arguments:
19 | name Name of the remote to remove
20 | ```
21 |
22 | ## Description
23 |
24 | Remote `name` is required.
25 |
26 | This command removes a section in the DVC
27 | [config file](/doc/user-guide/dvc-files-and-directories). Alternatively, it is
28 | possible to edit config files manually.
29 |
30 | ## Options
31 |
32 | - `--global` - remove the remote from the global config (e.g.
33 | `~/.config/dvc/config`) instead of `.dvc/config`.
34 |
35 | - `--system` - remove the remote from the system config (e.g.
36 | `/etc/dvc.config`) instead of `.dvc/config`.
37 |
38 | - `--local` - remove the remote specified in the
39 | [local](/doc/user-guide/dvc-files-and-directories) configuration file
40 | (`.dvc/config.local`). Local configuration files store private settings or
41 | local environment specific settings that should not be tracked by Git.
42 |
43 | ## Examples
44 |
45 | Add AWS S3 remote and modify its region:
46 |
47 | ```dvc
48 | $ dvc remote add myremote s3://mybucket/myproject
49 | $ dvc remote modify myremote region us-east-2
50 | ```
51 |
52 | Remove remote:
53 |
54 | ```dvc
55 | $ dvc remote remove myremote
56 | ```
57 |
--------------------------------------------------------------------------------
/static/docs/understanding-dvc/existing-tools.md:
--------------------------------------------------------------------------------
1 | # Tools for Data Scientists
2 |
3 | ## Existing engineering tools
4 |
5 | There is one common opinion regarding data science tooling. Data scientists as
6 | engineers are supposed to use the best practices and collaboration software from
7 | software engineering. Source code version control system (Git), continuous
8 | integration services (CI), and unit test frameworks are all expected to be
9 | utilized in the data science pipeline.
10 |
11 | But a comprehensive look at data science processes shows that the software
12 | engineering toolset does not cover data science needs. Try to answer all the
13 | questions above using only engineering tools, and you are likely to be
14 | left wanting more.
15 |
16 | ## Experiment management software
17 |
18 | To solve data scientists' collaboration issues, a new type of software was
19 | created: **experiment management software**. This software aims to cover the
20 | gap between data scientists' needs and the existing toolset.
21 |
22 | The experimentation software is usually **user interface (UI) based in contrast
23 | to the existing command line engineering tools**. The UI is a bridge to a
24 | **separate cloud based environment**. The cloud environment is usually not as
25 | flexible as a data scientist's local environment, and it is not fully integrated
26 | with the local environment.
27 |
28 | The separation of the local data scientist environment and the experimentation
29 | cloud environment creates another discrepancy issue, and environment
30 | synchronization requires additional work. Also, this style of software usually
31 | requires external services, typically accompanied by a monthly bill. This might
32 | be a good solution for particular companies or groups of data scientists.
33 | However, a more accessible, free tool is needed for a wider audience.
34 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/remote_list.md:
--------------------------------------------------------------------------------
1 | # remote list
2 |
3 | Show all available remotes.
4 |
5 | See also [add](/doc/commands-reference/remote-add),
6 | [default](/doc/commands-reference/remote-default),
7 | [modify](/doc/commands-reference/remote-modify), and
8 | [remove](/doc/commands-reference/remote-remove) commands to manage data remotes.
9 |
10 | ## Synopsis
11 |
12 | ```usage
13 | usage: dvc remote list [-h] [-q | -v]
14 | [--global] [--system] [--local]
15 |
16 | List remotes.
17 | ```
18 |
19 | ## Description
20 |
21 | Reads DVC configuration files and prints the list of available remotes,
22 | including their names and URLs.
23 |
24 | ## Options
25 |
26 | - `--global` - list remotes specified in the global config (e.g.
27 | `~/.config/dvc/config`) instead of `.dvc/config`.
28 |
29 | - `--system` - list remotes specified in the system config (e.g.
30 | `/etc/dvc.config`) instead of `.dvc/config`.
31 |
32 | - `--local` - list remotes specified in the
33 | [local](/doc/user-guide/dvc-files-and-directories) configuration file
34 | (`.dvc/config.local`). Local configuration files store private settings that
35 | should not be tracked by Git.
36 |
37 | ## Examples
38 |
39 | For simplicity, let's add a default local remote:
40 |
41 |
42 |
43 | ### What is a "local remote" ?
44 |
45 | While the term may seem contradictory, it doesn't have to be. The "local" part
46 | refers to the machine where the project is stored, so it can be any directory
47 | accessible to the same system. The "remote" part refers specifically to the
48 | project/repository itself.
49 |
50 |
51 |
52 | ```dvc
53 | $ dvc remote add -d myremote /path/to/remote
54 | Setting 'myremote' as a default remote.
55 | ```
56 |
57 | And now the list of remotes should look like:
58 |
59 | ```dvc
60 | $ dvc remote list
61 |
62 | myremote /path/to/remote
63 | ```
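To illustrate what "reads DVC configuration files" can mean in practice, here is a rough sketch that lists remotes from an INI-style config using Python's standard `configparser`. The section naming (`['remote "myremote"']` with a `url` key) is an assumption about the config layout made for this example, and `list_remotes` is a hypothetical helper, not DVC's own code.

```python
import configparser
import re

# Assumed config contents after `dvc remote add -d myremote /path/to/remote`.
CONFIG_TEXT = "\n".join([
    "[core]",
    "remote = myremote",
    "['remote \"myremote\"']",
    "url = /path/to/remote",
])

# Matches section names such as 'remote "myremote"' (outer quotes optional).
REMOTE_SECTION = re.compile(r"""^'?remote "(?P<name>.+)"'?$""")


def list_remotes(config_text):
    """Return (name, url) pairs for every remote section in the config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    remotes = []
    for section in parser.sections():
        match = REMOTE_SECTION.match(section)
        if match:
            remotes.append((match.group("name"), parser[section]["url"]))
    return remotes


for name, url in list_remotes(CONFIG_TEXT):
    print(name, url)
```

Pointing the same function at the text of a global, system, or local config file would mirror what the `--global`, `--system`, and `--local` options select.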
64 |
--------------------------------------------------------------------------------
/static/docs/get-started/older-versions.md:
--------------------------------------------------------------------------------
1 | # Get Older Data Version
2 |
3 | Now that we have multiple experiments, models, processed data sets, the question
4 | is how do we revert back to an older version of a model file? Or how can we get
5 | the previous version of the data set if it was changed at some point?
6 |
7 | The answer is the `dvc checkout` command, and we already briefly touched on the
8 | process of switching between different data versions in the
9 | [Experiments](/doc/get-started/experiments) step of this get started guide.
10 |
11 | Let's say we want to get the previous `model.pkl` file. The short answer is:
12 |
13 | ```dvc
14 | $ git checkout baseline-experiment train.dvc
15 | $ dvc checkout train.dvc
16 | ```
17 |
18 | These two commands will bring the previous model file to its place in the
19 | working tree.
20 |
21 |
22 |
23 | ### Expand to learn more about DVC internals
24 |
25 | DVC uses special meta-files (`.dvc` files) to track data files, directories, and
26 | end results that are under DVC control. In this case, `train.dvc` among other
27 | things describes the `model.pkl` file this way:
28 |
29 | ```yaml
30 | outs:
31 | md5: a66489653d1b6a8ba989799367b32c43
32 | path: model.pkl
33 | ```
34 |
35 | `a664...2c43` is the "address" of the file in the local or remote DVC storage.
36 |
37 | It means that if we want to get to the previous version, we need to restore the
38 | DVC file first with the `git checkout` command. Only after that can DVC
39 | restore the model file using the "address" from the restored `.dvc` file.
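The idea of a checksum acting as an "address" can be sketched in Python. This is an illustration under assumptions, not DVC's implementation: it assumes the address is the MD5 of the file content and that the cache stores a file under a directory named after the first two hex characters; `file_md5` and `cache_relpath` are hypothetical helper names.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fd:
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_relpath(md5):
    """Map a checksum to an assumed cache-relative path: 'ab/cdef...'."""
    return "/".join([md5[:2], md5[2:]])


# The checksum recorded in train.dvc above.
print(cache_relpath("a66489653d1b6a8ba989799367b32c43"))

# Checksum of a small throwaway file, to show the function end to end.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo")
    with open(path, "wb") as fd:
        fd.write(b"foo\n")
    checksum = file_md5(path)

print(checksum)
```

Since the address depends only on content, restoring an older `.dvc` file is enough to tell DVC which cached content to put back in the workspace.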
40 |
41 |
42 |
43 | To fully restore the previous experiment we just run `git checkout` and
44 | `dvc checkout` without specifying a target:
45 |
46 | ```dvc
47 | $ git checkout baseline-experiment
48 | $ dvc checkout
49 | ```
50 |
51 | Read the `dvc checkout` command reference and a dedicated data versioning
52 | [example](/doc/get-started/example-versioning) to get more information.
53 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/cache_dir.md:
--------------------------------------------------------------------------------
1 | # dir
2 |
3 | Sets the cache directory location.
4 |
5 | ## Synopsis
6 |
7 | ```usage
8 | usage: dvc cache dir [-h] [--global] [--system] [--local] [-u] value
9 |
10 | positional arguments:
11 | value Path to cache directory. Relative paths are resolved relative
12 | to the current directory and saved to config relative to the
13 | config file location.
14 | ```
15 |
16 | ## Description
17 |
18 | Sets the `cache.dir` configuration option. Unlike doing so with
19 | `dvc config cache`, this command transforms paths (`value`) that are provided
20 | relative to the current working directory into paths relative to the specified
21 | config file, as they are expected in the config file.
22 |
23 | ## Options
24 |
25 | - `--global` - modify a global config file (e.g. `~/.config/dvc/config`) instead
26 | of the project's `.dvc/config`.
27 |
28 | - `--system` - modify a system config file (e.g. `/etc/dvc.config`) instead of
29 | `.dvc/config`.
30 |
31 | - `--local` - modify a local config file instead of `.dvc/config`. It is located
32 | in `.dvc/config.local` and is Git-ignored. This is useful when you need to
33 | specify private config options in your config, that you don't want to track
34 | and share through Git.
35 |
36 | - `-u`, `--unset` - remove a specified config option from a config file.
37 |
38 | - `-h`, `--help` - prints the usage/help message and exits.
39 |
40 | ## Examples: Using relative path
41 |
42 | ```dvc
43 | $ dvc cache dir ../dir
44 | $ cat .dvc/config
45 | ...
46 | [cache]
47 | dir = ../../dir
48 | ...
49 | ```
50 |
51 | `../dir` has been resolved relative to `.dvc/config` location, resulting in
52 | `../../dir`.
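The path transformation can be reproduced with the standard library. The sketch below is only an illustration of the arithmetic, assuming the project lives at the made-up location `/home/user/project` and that the config file sits in its `.dvc` directory; `to_config_relative` is a hypothetical helper, not a DVC function.

```python
import posixpath


def to_config_relative(value, cwd, config_dir):
    """Rewrite a cwd-relative path so it is relative to the config directory.

    Absolute paths are returned unchanged.
    """
    if posixpath.isabs(value):
        return value
    target = posixpath.normpath(posixpath.join(cwd, value))
    return posixpath.relpath(target, start=config_dir)


project = "/home/user/project"           # hypothetical project root (cwd)
config_dir = posixpath.join(project, ".dvc")

print(to_config_relative("../dir", project, config_dir))
print(to_config_relative("/path/to/dir", project, config_dir))
```

The first call yields `../../dir`, matching the config excerpt above: one `..` to leave `.dvc`, and one more to leave the project directory.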
53 |
54 | ## Examples: Using absolute path
55 |
56 | ```dvc
57 | $ dvc cache dir /path/to/dir
58 | $ cat .dvc/config
59 | ...
60 | [cache]
61 | dir = /path/to/dir
62 | ...
63 | ```
64 |
65 | The absolute path `/path/to/dir` is saved as is.
66 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/init.md:
--------------------------------------------------------------------------------
1 | # init
2 |
3 | This command initializes a DVC environment in the current Git repository.
4 |
5 | ## Synopsis
6 |
7 | ```usage
8 | usage: dvc init [-h] [-q] [-v] [--no-scm] [-f]
9 | ```
10 |
11 | ## Options
12 |
13 | - `--no-scm` - skip Git specific initializations, `.dvc/.gitignore` will not be
14 | populated and added to Git.
15 |
16 | - `-f`, `--force` - remove `.dvc/` if it exists before initialization. Will
17 | remove all local cache. Useful when the first `dvc init` got corrupted for some
18 | reason.
19 |
20 | - `-h`, `--help` - prints the usage/help message and exits.
21 |
22 | - `-q`, `--quiet` - does not write anything to standard output. Exit with 0 if
23 | no problems arise, otherwise 1.
24 |
25 | - `-v`, `--verbose` - displays detailed tracing information.
26 |
27 | ## Details
28 |
29 | After DVC initialization, a new directory `.dvc/` will be created with `config`
30 | and `.gitignore` files and a `cache` directory. These files and directories are
31 | generally hidden from the user and are not meant to be manipulated directly.
32 |
33 | The `.dvc/cache` directory is one of the most important parts of any DVC
34 | repository. The directory contains all content of data files. The most
35 | important thing about this directory is that the `.dvc/.gitignore` file lists
36 | it, which means that the cache directory is not under Git control. This
37 | is your local directory and you cannot push it to any Git remote.
38 |
39 | ## Examples
40 |
41 | - Creating a new DVC repository:
42 |
43 | ```dvc
44 | $ mkdir tag_classifier
45 | $ cd tag_classifier
46 |
47 | $ git init
48 | $ dvc init
49 | $ git status
50 |
51 | new file: .dvc/.gitignore
52 | new file: .dvc/config
53 |
54 | $ git commit -m "Init DVC"
55 | ```
56 |
57 | - The cache directory is not under Git control. It contains data and model files
58 | and is managed by DVC:
59 |
60 | ```dvc
61 | $ cat .dvc/.gitignore
62 | cache
63 | state
64 | lock
65 | ```
66 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/lock.md:
--------------------------------------------------------------------------------
1 | # lock
2 |
3 | Lock a DVC file (stage). Use `dvc unlock` to unlock the file.
4 |
5 | If a DVC file is locked, the stage is considered _done_ and `dvc repro` will not
6 | run commands to rebuild outputs even if some dependencies have changed and even
7 | if `--force` is provided.
8 |
9 | ```usage
10 | usage: dvc lock [-h] [-q] [-v] targets [targets ...]
11 |
12 | positional arguments:
13 | targets DVC files
14 | ```
15 |
16 | ## Description
17 |
18 | Locking is useful to avoid syncing data from the top of a pipeline and to keep
19 | iterating on the last stages only. In this sense, `lock` makes a stage file
20 | behave like a `.dvc` file created by running `dvc add` on the outputs.
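The effect of locking on status reporting can be pictured with a toy model. This is purely illustrative and makes no claim about how DVC is implemented: the `Stage` class and `status` function are hypothetical, and the only rule encoded is the one described here, that a locked stage ignores changes in its dependencies while its outputs are still checked.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    deps: list = field(default_factory=list)
    outs: list = field(default_factory=list)
    locked: bool = False


def status(stages, changed_files):
    """Report changed deps/outs per stage, skipping deps of locked stages."""
    report = {}
    for stage in stages:
        entry = {}
        if not stage.locked:
            changed_deps = [p for p in stage.deps if p in changed_files]
            if changed_deps:
                entry["deps"] = changed_deps
        changed_outs = [p for p in stage.outs if p in changed_files]
        if changed_outs:
            entry["outs"] = changed_outs
        if entry:
            report[stage.name] = entry
    return report


foo = Stage("foo.dvc", outs=["foo"])
bar = Stage("bar.dvc", deps=["foo"], outs=["bar"])

unlocked_report = status([bar, foo], changed_files={"foo"})
bar.locked = True
locked_report = status([bar, foo], changed_files={"foo"})

print(unlocked_report)
print(locked_report)
```

In this model, the unlocked report mentions both `bar.dvc` and `foo.dvc`, while the locked report mentions only `foo.dvc`.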
21 |
22 | ## Options
23 |
24 | - `-h`, `--help` - prints the usage/help message and exits.
25 |
26 | - `-q`, `--quiet` - does not write anything to standard output. Exit with 0 if
27 | no problems arise, otherwise 1.
28 |
29 | - `-v`, `--verbose` - displays detailed tracing information.
30 |
31 | ## Example
32 |
33 | - First, let's create a sample DVC file:
34 |
35 | ```dvc
36 | $ echo foo > foo
37 | $ dvc add foo
38 | $ dvc run -d foo -o bar cp foo bar
39 |
40 | Using 'bar.dvc' as a stage file
41 | Running command:
42 | cp foo bar
43 | ```
44 |
45 | - Then, let's change the file `foo` that the stage `bar.dvc` depends on:
46 |
47 | ```dvc
48 | $ rm foo
49 | $ echo foo1 > foo
50 | $ dvc status
51 |
52 | bar.dvc
53 | deps
54 | changed: foo
55 | foo.dvc
56 | outs
57 | changed: foo
58 | ```
59 |
60 | - Now, let's lock the `bar` stage:
61 |
62 | ```dvc
63 | $ dvc lock bar.dvc
64 | $ dvc status
65 |
66 | foo.dvc
67 | outs
68 | changed: foo
69 | ```
70 |
71 | - Run `dvc unlock` to unlock it again:
72 |
73 | ```dvc
74 | $ dvc unlock bar.dvc
75 | $ dvc status
76 |
77 | bar.dvc
78 | deps
79 | changed: foo
80 | foo.dvc
81 | outs
82 | changed: foo
83 | ```
84 |
--------------------------------------------------------------------------------
/src/LearnMore/index.js:
--------------------------------------------------------------------------------
1 | import React from 'react'
2 | import styled, { keyframes } from 'styled-components'
3 | import { media } from '../styles'
4 | import { scroller } from 'react-scroll'
5 | import { logEvent } from '../utils/ga'
6 |
7 | const scrollToDiagram = () => {
8 | logEvent('hero', 'learn-more')
9 | scroller.scrollTo('diagram-section', {
10 | duration: 800,
11 | offset: -75,
12 | delay: 0,
13 | smooth: 'easeInOut',
14 | containerId: 'bodybag'
15 | })
16 | }
17 |
18 | export default () => (
19 |
20 |
21 |
22 |
23 |
Learn more
24 |
25 | )
26 |
27 | const LearnMore = styled.a`
28 | display: flex;
29 | flex-direction: column;
30 | align-items: center;
31 | text-decoration: none;
32 | cursor: pointer;
33 | `
34 |
35 | export const bounce = keyframes`
36 | 0%, 30%, 45%, 65%, 100% {
37 | transform: translateY(0);
38 | }
39 |
40 | 40% {
41 | transform: translateY(-13px);
42 | }
43 |
44 | 60% {
45 | transform: translateY(-5px);
46 | }
47 | `
48 |
49 | export const bounce_mobile = keyframes`
50 | 0%, 30%, 50%, 70%, 100% {
51 | transform: translateY(0);
52 | }
53 |
54 | 40% {
55 | transform: translateY(-13px);
56 | }
57 |
58 | 60% {
59 | transform: translateY(-5px);
60 | }
61 | `
62 |
63 | const Icon = styled.div`
64 | width: 11px;
65 | height: 19px;
66 | will-change: transform;
67 | animation: ${bounce} 3s infinite;
68 |
69 | ${media.phone`animation: ${bounce_mobile} 3s infinite;`};
70 | ${media.phablet`animation: ${bounce_mobile} 3s infinite;`};
71 | `
72 |
73 | const Caption = styled.p`
74 | font-family: BrandonGrotesqueMed;
75 | line-height: 23px;
76 | font-size: 16px;
77 | font-weight: 500;
78 | color: #b0b8c5;
79 | display: initial;
80 | ${media.giant`display: initial;`};
81 | ${media.desktop`display: initial;`};
82 | ${media.tablet`display: initial;`};
83 | ${media.phablet`display: none;`};
84 | ${media.phone`display: none;`};
85 | `
86 |
--------------------------------------------------------------------------------
/static/docs/tutorial/sharing-data.md:
--------------------------------------------------------------------------------
1 | # Sharing Data
2 |
3 | ## Pushing data to the cloud
4 |
5 | It is pretty clear how code and DVC-files can be shared through Git
6 | repositories. These repositories will contain all the information needed for
7 | reproducibility and it might be a good idea to share these DVC-repositories
8 | using GitHub or other Git services.
9 |
10 | DVC is able to push the cache to a cloud.
11 |
12 | > Using your shared cache a colleague can reuse ML models that were trained on
13 | > your machine.
14 |
15 | First, you need to modify the cloud settings in the DVC config file. This can be
16 | done using the CLI as shown below.
17 |
18 | > Note that we are using the `dvc-share` S3 bucket as an example and you don't have
19 | > write access to it, so in order to follow the tutorial you will need to either
20 | > create your own S3 bucket or use other types of
21 | > [remote storage](/doc/commands-reference/remote). E.g. you can set up a local
22 | > remote as we did in the [Get Started configure](/doc/get-started/configure)
23 | > section.
24 |
25 | ```dvc
26 | $ dvc remote add -d upstream s3://dvc-share/classify
27 | $ git status -s
28 | M .dvc/config
29 | ```
30 |
31 | Then, a simple command pushes files from your local cache to the cloud:
32 |
33 | ```dvc
34 | $ dvc push
35 | (1/9): [#########################] 100% 23/404ed8212fc1ee6f5a81ff6f6df2ef
36 | (2/9): [########## ] 34% 5f/42ecd9a121b4382cd6510534533ec3
37 | ```
38 |
39 | The command does not push all cached files, but only the ones that belong to the
40 | currently active Git repository and branch.
41 |
42 | For example, in this tutorial 16 data files were created and only 9 will be
43 | pushed because the rest of the data files belong to different branches like
44 | `bigram`.
45 |
46 | ## Pulling data from the cloud
47 |
48 | In order to reuse your data files, a colleague of yours can pull data the same
49 | way from the master branch:
50 |
51 | ```dvc
52 | $ git clone https://github.com/dmpetrov/new_tag_classifier.git
53 | $ dvc pull
54 | ```
55 |
56 | After executing this command, all the data files will be in the right place. You
57 | can check that by trying to reproduce the default goal:
58 |
59 | ```dvc
60 | # Nothing to reproduce:
61 | $ dvc repro
62 | ```
63 |
--------------------------------------------------------------------------------
/src/SubscribeForm/index.js:
--------------------------------------------------------------------------------
1 | import React from 'react'
2 | import styled from 'styled-components'
3 | import { media } from '../styles'
4 |
5 | export default () => (
6 |
38 | )
39 |
40 | const Form = styled.form`
41 | width: 100%;
42 | height: 100%;
43 | border-radius: 8px;
44 | background-color: #ffffff;
45 | display: flex;
46 |
47 | ${media.phablet`
48 | flex-direction: column;
49 | `};
50 | `
51 |
52 | const Input = styled.input`
53 | font-family: BrandonGrotesqueMed;
54 | display: flex;
55 | flex: 1;
56 | padding: 16px;
57 | border: none;
58 | border-radius: 8px 0px 0px 8px;
59 | font-size: 20px;
60 | font-weight: 500;
61 |
62 | ${media.phablet`
63 | border-radius: 4px 4px 0px 0px;
64 | `};
65 |
66 | `
67 |
68 | const Button = styled.button`
69 | font-family: BrandonGrotesqueMed;
70 | width: 115px;
71 | border: none;
72 | border-radius: 0px 8px 8px 0px;
73 | background-color: #e4fbff;
74 | font-size: 20px;
75 | font-weight: 500;
76 | color: #13adc7;
77 | cursor: pointer;
78 |
79 | &:hover {
80 | background-color: #daf1f5;
81 | }
82 |
83 | ${media.phablet`
84 | min-height: 60px;
85 | width: 100%;
86 | border-radius: 0px 0px 4px 4px;
87 | justify-content: center;
88 | `};
89 | `
90 |
--------------------------------------------------------------------------------
/static/docs/changelog/0.18.md:
--------------------------------------------------------------------------------
1 | # v0.12 - v0.18
2 |
3 | We have been working hard over the last few weeks to improve **user experience**,
4 | **performance**, and **documentation**. Kudos to `@sotte` and `@Hong-Xiang` for
5 | the great feedback they gave us!
6 |
7 | We are very close to the 1.0 release! Two major changes are coming in DVC 1.0 -
8 | [commit semantics](https://github.com/iterative/dvc/issues/919#issuecomment-414540094)
9 | and
10 | [execution matrix](https://github.com/iterative/dvc/issues/973#issuecomment-412739728).
11 | Both changes should make DVC more flexible for ML practitioners. Please, read,
12 | discuss and let us know about your thoughts!
13 |
14 | Since the last announcement we have released versions 0.12 through 0.18 and are
15 | really excited to share the progress with you:
16 |
17 | - ⚡ **DVC just got faster**:
18 |
19 | - Data file management commands - `dvc add`, `dvc push`, `dvc pull`, etc. got
20 | up to 10x faster on data sets with a large number of files.
21 |
22 | - Command startup latency reduced 3x.
23 |
24 | - 📙 **Documentation got better** - a whole new [get started](/doc/get-started)
25 | guide, new [use cases](/doc/use-cases), DVC internals, and a lot of other great
26 | stuff you can find here.
27 |
28 | - 🙂 **Usability improvements** - DVC interface got more informative and easier
29 | to use:
30 |
31 | - More heavy operations render a dynamic progress bar (e.g. hash computation):
32 | 
33 |
34 | - Pipeline visualization via command line. Just run `dvc pipeline show` with
35 | `ascii` option and a target: 
36 |
37 | - Many hidden gems: `dvc repro` dry and interactive modes, improved overall
38 | commands verbosity and revised commands help.
39 |
40 | - 💎 **Other changes include** -
41 | [DEB and RPM repositories](/doc/get-started/install), bug fixes and other
42 | things you can check in the complete
43 | [changelog](https://github.com/iterative/dvc/releases).
44 |
45 | Please use the discussion forum [discuss.dvc.org](https://discuss.dvc.org) and
46 | [issue tracker](https://github.com/iterative/dvc/issues) and don't hesitate to [⭐](https://github.com/iterative/dvc)
47 | our [DVC repository](https://github.com/iterative/dvc) if you haven't yet. We
48 | are waiting for your feedback!
49 |
--------------------------------------------------------------------------------
/static/docs/understanding-dvc/collaboration-issues.md:
--------------------------------------------------------------------------------
1 | # Collaboration Issues in Data Science
2 |
3 | Even with all the successes today in machine learning (ML), specifically deep
4 | learning and its applications in business, the data science community is still
5 | lacking good practices for organizing their projects and effectively
6 | collaborating across their varied ML projects. This is a massive challenge for
7 | the community and the industry now, when ML algorithms and methods are no longer
8 | simply "tribal knowledge" but are still difficult to implement, reuse, and
9 | manage.
10 |
11 | To make progress in this challenge, many areas of the ML experimentation process
12 | need to be formalized. Many common questions need to be answered in a unified,
13 | principled way:
14 |
15 | 1. Source code and data versioning.
16 |
17 | - How do you avoid any discrepancies between versions of the source code and
18 | versions of the data files when the data cannot fit into a repository?
19 |
20 | 2. Experiment time log.
21 |
22 | - How do you track which of the hyperparameter changes contributed the most
23 | to producing your target metric? How do you monitor the extent of each
24 | change?
25 |
26 | 3. Navigating through experiments.
27 |
28 | - How do you recover a model from last week without wasting time waiting for
29 | the model to re-train?
30 |
31 | - How do you quickly switch between the large data source and a small data
32 | subset without modifying source code?
33 |
34 | 4. Reproducibility.
35 |
36 | - How do you rerun a model's evaluation without re-training the model and
37 | preprocessing a raw dataset?
38 |
39 | 5. Managing and sharing large data files.
40 |
41 | - How do you share models trained in a GPU environment with colleagues who do
42 | not have access to a GPU?
43 |
44 | - How do you share the entire 147 GB of your project, with all of its data
45 | sources, intermediate data files, and models?
46 |
47 | Some of these questions are easy to answer individually. Any data scientist,
48 | engineer, or manager knows or could easily find answers to some of them.
49 | However, the variety of answers and approaches makes data science collaboration
50 | a nightmare. **A systematic approach is required.**
51 |
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
1 | {
2 | "name": "website",
3 | "version": "1.0.0",
4 | "description": "dvc.org – website source code",
5 | "main": "index.js",
6 | "scripts": {
7 | "dev": "node server.js",
8 | "build": "next build",
9 | "start": "NODE_ENV=production node server.js",
10 | "test": "echo \"Error: no test specified\" && exit 1",
11 | "prettier-js": "prettier --write '{pages,src}/**/*.{js,jsx}'",
12 | "prettier-md": "prettier --write 'static/docs/**/*.md'"
13 | },
14 | "repository": {
15 | "type": "git",
16 | "url": "git+https://github.com/iterative/dvc.org.git"
17 | },
18 | "author": "",
19 | "license": "ISC",
20 | "bugs": {
21 | "url": "https://github.com/iterative/dvc.org/issues"
22 | },
23 | "homepage": "https://github.com/iterative/dvc.org#readme",
24 | "dependencies": {
25 | "animate-css-styled-components": "0.0.20",
26 | "axios": "^0.19.0",
27 | "babel-plugin-transform-define": "^1.3.1",
28 | "color": "^3.0.0",
29 | "force-ssl-heroku": "^1.0.2",
30 | "github-markdown-css": "^2.10.0",
31 | "heroku-ssl-redirect": "0.0.4",
32 | "jquery": "^3.4.1",
33 | "lodash": ">=4.17.11",
34 | "lodash.compact": "^3.0.1",
35 | "lodash.flatten": "^4.4.0",
36 | "lodash.includes": "^4.3.0",
37 | "lodash.kebabcase": "^4.1.1",
38 | "lodash.startcase": "^4.4.0",
39 | "lodash.throttle": "^4.1.1",
40 | "next": "^7.0.2",
41 | "perfect-scrollbar": "^1.4.0",
42 | "raw-loader": "^0.5.1",
43 | "react": "^16.4.1",
44 | "react-collapse": "^4.0.3",
45 | "react-collapsible": "^2.3.1",
46 | "react-dom": "^16.4.2",
47 | "react-ga": "^2.5.3",
48 | "react-markdown": "^3.4.1",
49 | "react-motion": "^0.5.2",
50 | "react-popover": "^0.5.10",
51 | "react-scroll": "^1.7.9",
52 | "react-slick": "^0.23.1",
53 | "react-syntax-highlighter": "^10.2.1",
54 | "react-youtube": "^7.6.0",
55 | "styled-components": "^3.3.2",
56 | "styled-reset": "^1.3.4",
57 | "unist-util-visit": "latest"
58 | },
59 | "devDependencies": {
60 | "babel-plugin-styled-components": "^1.5.1",
61 | "babel-plugin-transform-object-assign": "^6.22.0",
62 | "husky": "^2.3.0",
63 | "prettier": "^1.17.0",
64 | "pretty-quick": "^1.11.0"
65 | },
66 | "husky": {
67 | "hooks": {
68 | "pre-commit": "pretty-quick --staged --bail"
69 | }
70 | }
71 | }
72 |
--------------------------------------------------------------------------------
/static/img/github_icon.svg:
--------------------------------------------------------------------------------
1 |
7 |
--------------------------------------------------------------------------------
/static/docs/understanding-dvc/resources.md:
--------------------------------------------------------------------------------
1 | # Resources
2 |
3 | ## Talks
4 |
5 | - A fun video about reproducibility and DVC:
6 |
7 |
10 |
11 | - DVC Co-founder Dmitry Petrov talking about Model and Dataset versioning
12 | practices using DVC at PyCon 2019:
13 |
14 |
17 |
18 | - DVC Co-founder Dmitry Petrov talking about Model and Dataset versioning
19 | practices using DVC at PyData Berlin 2018:
20 |
21 |
24 |
25 | - Podcast featured by `Podcast.__init__`:
26 | [Version control For your Machine Learning Projects](https://www.pythonpodcast.com/data-version-control-episode-206/)
27 |
28 | ## Articles
29 |
30 | - [Data version control with DVC. What do the authors have to say?](https://towardsdatascience.com/data-version-control-with-dvc-what-do-the-authors-have-to-say-3c3b10f27ee)
31 | - [Why Git and Git-LFS is not enough to solve the Machine Learning Reproducibility crisis](https://towardsdatascience.com/why-git-and-git-lfs-is-not-enough-to-solve-the-machine-learning-reproducibility-crisis-f733b49e96e8)
32 | - [My First Try at DVC](https://stdiff.net/MB2019051301.html)
33 | - [Machine Learning Reproducibility crisis](https://petewarden.com/2018/03/19/the-machine-learning-reproducibility-crisis/)
34 | - [Data Science Workflow](http://fouryears.eu/2018/11/29/the-data-science-workflow/)
35 | - [The Data Science Workflow](https://towardsdatascience.com/the-data-science-workflow-43859db0415)
36 | - [Data Versioning Notebook](https://www.kaggle.com/rtatman/kerneld4769833fe)
37 | - [First Impressions of Data Science Version Control](https://medium.com/@christopher.samiullah/first-impressions-of-data-science-version-control-dvc-fe96ab29cdda?sk=05e1f1d1ba16c9037046f3568956f16c)
38 |
39 | ## Slides
40 |
41 | - [From ML experiments to production: Versioning and Reproducibility with MLV-too](https://peopledoc.github.io/mlv-tools-tutorial/talks/pyData/presentation.html#/)
42 |
--------------------------------------------------------------------------------
/static/docs/user-guide/update-tracked-file.md:
--------------------------------------------------------------------------------
1 | # Update a Tracked File
2 |
3 | Due to the way DVC handles linking between the data files in the cache and their
4 | counterparts in the working directory (refer to
5 | [Large Dataset Optimization](/doc/user-guide/large-dataset-optimization)),
6 | updating tracked files has to be carried out with caution to avoid data
7 | corruption when the DVC config option `cache.type` is set to `hardlink` and/or
8 | `symlink`. (See `dvc config cache` for more details on setting the cache file
9 | link types.)
10 |
11 | > For an example of the cache corruption problem see issue
12 | > [#599](https://github.com/iterative/dvc/issues/599) in our code repository.
13 |
14 | Assume `train.tsv` is tracked by DVC and you want to update it. Here updating
15 | may mean either replacing `train.tsv` with a new file having the same name or
16 | editing the content of the file.
17 |
18 | If you run `dvc repro`, there is no need to manage generated (output) files
19 | manually: DVC removes them for you before running the stage which generates
20 | them.
21 |
22 | If you use DVC to track a file that is generated during your pipeline (e.g. some
23 | intermediate result or a final model file - `model.pkl`) and you don't use
24 | `dvc run` and `dvc repro` to manage your pipeline, use the procedure below (run
25 | `dvc unprotect` or `dvc remove`) to unlink it from DVC cache prior to the
26 | execution of the script that modifies it.
27 |
28 | See also `dvc unprotect` and `dvc config cache` to learn more about the
29 | recommended ways to protect your data files.
30 |
31 | ## Replacing file
32 |
33 | If you want to replace the file you should take the following steps.
34 |
35 | First, un-track the file. This will remove `train.tsv` from the workspace:
36 |
37 | ```dvc
38 | $ dvc remove train.tsv.dvc
39 | ```
40 |
41 | Next, replace the file with new content:
42 |
43 | ```dvc
44 | $ echo new > train.tsv
45 | ```
46 |
47 | And start tracking it again:
48 |
49 | ```dvc
50 | $ dvc add train.tsv
51 | $ git add train.tsv.dvc
52 | $ git commit -m "new train data"
53 | ```
54 |
55 | ## Modifying content
56 |
57 | "Unlink" the file with `dvc unprotect`. This will make `train.tsv` safe to edit:
58 |
59 | ```dvc
60 | $ dvc unprotect train.tsv
61 | ```
62 |
63 | Edit the content of the file:
64 |
65 | ```dvc
66 | $ echo "new data item" >> train.tsv
67 | ```
68 |
69 | Add a new version of the file back to DVC:
70 |
71 | ```dvc
72 | $ dvc add train.tsv
73 | $ git add train.tsv.dvc
74 | $ git commit -m "modify train data"
75 | ```
76 |
--------------------------------------------------------------------------------
/static/docs/user-guide/analytics.md:
--------------------------------------------------------------------------------
1 | # Anonymized Usage Analytics
2 |
3 | To help us better understand how DVC is used and improve it, DVC captures and
4 | reports _anonymized_ usage statistics. You will be notified the first time you
5 | run `dvc init`.
6 |
7 | ## Motivation
8 |
9 | Analytics help us to decide on how best to design future features and prioritize
10 | current work. Anonymous aggregate user analytics allow us to prioritize fixes
11 | and features based on how, where and when people use DVC. For example:
12 |
13 | - If reflinks (which depend on the file system type) are supported for most users, we
14 | can keep cache protected mode off by default (see `dvc unprotect`).
15 | - Collecting the OS version and the way DVC was installed allows us to decide
16 | what versions of OS to prioritize and support.
17 | - If usage of some command is negligibly small, it makes us think about issues
18 | with a command or documentation.
19 |
20 | ## Retention period
21 |
22 | User and event data have a 14 month retention period.
23 |
24 | ## What
25 |
26 | DVC's analytics record the following information per event:
27 |
28 | - The DVC version, e.g. `0.22.0`
29 | - The operating system information, e.g. `linux`, `ubuntu`, `14.04`, etc
30 | - The underlying version control system, e.g. `git`
31 | - Command type, e.g. `CmdDataPull`
32 | - Command return code, e.g. `1`
33 | - The way DVC was installed, e.g. `binary`
34 | - A DVC analytics user ID, e.g. `8ca59a29-ddd9-4247-992a-9b4775732aad`. This is
35 | generated by [`uuid`](https://docs.python.org/3/library/uuid.html).
36 |
37 | This _does not allow us to track individual users_ but does enable us to
38 | accurately measure user counts vs. event counts.
39 |
40 | ## Implementation
41 |
42 | The code is viewable in
43 | [`analytics.py`](https://github.com/iterative/dvc/blob/master/dvc/analytics.py).
44 | Analytics reports are sent in a separate background process and fail fast to avoid delaying
45 | any execution. They will fail immediately and silently if you have no network
46 | connection.
47 |
48 | DVC's analytics are sent through DVC's proxy to Google Analytics over HTTPS.
49 |
50 | ## Opting out
51 |
52 | DVC analytics helps the entire community and leaving it on is appreciated.
53 | However, if you want to opt out of DVC's analytics, you can disable it via
54 | `dvc config` command:
55 |
56 | ```dvc
57 | $ dvc config core.analytics false
58 | ```
59 |
60 | This will disable it for the project. Alternatively, you can specify `--global`
61 | or `--system` flags to disable it for an active user or for everyone in the
62 | system.
63 |
--------------------------------------------------------------------------------
/static/docs/get-started/visualize.md:
--------------------------------------------------------------------------------
1 | # Visualize
2 |
3 | Now that we have built our pipeline, we need a good way to visualize it to be
4 | able to wrap our heads around it. Luckily, DVC allows us to do that without
5 | leaving the terminal, making the experience distraction-free.
6 |
7 | We are using `--ascii` option below to better illustrate the pipeline. Please,
8 | refer to the `dvc pipeline show` documentation to explore other options this
9 | command supports (e.g. `.dot` files that can then be used in other tools).
10 |
11 | ## Stages
12 |
13 | ```dvc
14 | $ dvc pipeline show --ascii train.dvc
15 | +-------------------+
16 | | data/data.xml.dvc |
17 | +-------------------+
18 | *
19 | *
20 | *
21 | +-------------+
22 | | prepare.dvc |
23 | +-------------+
24 | *
25 | *
26 | *
27 | +---------------+
28 | | featurize.dvc |
29 | +---------------+
30 | *
31 | *
32 | *
33 | .---------------.
34 | | model.pkl.dvc |
35 | `---------------'
36 | ```
37 |
38 | ## Commands
39 |
40 | ```dvc
41 | $ dvc pipeline show --ascii train.dvc --commands
42 | +-------------------------------------+
43 | | python src/prepare.py data/data.xml |
44 | +-------------------------------------+
45 | *
46 | *
47 | *
48 | +---------------------------------------------------------+
49 | | python src/featurization.py data/prepared data/features |
50 | +---------------------------------------------------------+
51 | *
52 | *
53 | *
54 | +---------------------------------------------+
55 | | python src/train.py data/features model.pkl |
56 | +---------------------------------------------+
57 | ```
58 |
59 | ## Outputs
60 |
61 | ```dvc
62 | $ dvc pipeline show --ascii train.dvc --outs
63 | +---------------+
64 | | data/data.xml |
65 | +---------------+
66 | *
67 | *
68 | *
69 | +---------------+
70 | | data/prepared |
71 | +---------------+
72 | *
73 | *
74 | *
75 | +---------------+
76 | | data/features |
77 | +---------------+
78 | *
79 | *
80 | *
81 | +-----------+
82 | | model.pkl |
83 | +-----------+
84 | ```
85 |
--------------------------------------------------------------------------------
/server.js:
--------------------------------------------------------------------------------
1 | // This file doesn't go through babel or webpack transformation.
2 | // Make sure the syntax and sources this file requires are compatible with the current node version you are running
3 | // See https://github.com/zeit/next.js/issues/1245 for discussions on Universal Webpack or universal Babel
4 | const { createServer } = require('http')
5 | const { parse } = require('url')
6 | const _ = require('force-ssl-heroku')
7 | const next = require('next')
8 | const querystring = require('querystring')
9 |
10 | const dev = process.env.NODE_ENV !== 'production'
11 | const port = process.env.PORT || 3000
12 | const app = next({ dev })
13 | const handle = app.getRequestHandler()
14 |
15 | app.prepare().then(() => {
16 | createServer((req, res) => {
17 | const parsedUrl = parse(req.url, true)
18 | const { pathname, query } = parsedUrl
19 | const doc = /^\/doc.*/i
20 | const s3 = /^\/s3\/.*/i
21 | const pkg = /^\/(deb|rpm)\/.*/i
22 | const chat = /^\/(help|chat)\/?$/i
23 |
24 | if (req.headers.host === 'man.dvc.org') {
25 | const doc_pathname = "/doc/commands-reference" + pathname
26 | res.writeHead(301, { 'Location': "https://dvc.org" + doc_pathname })
27 | res.end()
28 | } else if (req.headers.host === 'pycon2019.dvc.org') {
29 | res.writeHead(301, { 'Location': "https://dvc.org/doc/get-started" })
30 | res.end()
31 | } else if (req.headers.host === 'remote.dvc.org') {
32 | res.writeHead(301, { 'Location': "https://s3-us-west-2.amazonaws.com/dvc-storage" + pathname})
33 | res.end()
34 | } else if (doc.test(pathname)) {
35 | let normalized_pathname = pathname.replace(/^\/doc[^?\/]*/i, '/doc')
36 | if (normalized_pathname !== pathname) {
37 | res.writeHead(301, { 'Location': normalized_pathname +
38 | (Object.keys(query).length === 0 ? '' : '?') +
39 | querystring.stringify(query)})
40 | res.end()
41 | } else {
42 | app.render(req, res, '/doc', query)
43 | }
44 | } else if (s3.test(pathname)) {
45 | res.writeHead(301, {'Location':
46 | "https://s3-us-west-2.amazonaws.com/dvc-share/" +
47 | pathname.substring(4)})
48 | res.end()
49 | } else if (pkg.test(pathname)) {
50 | res.writeHead(301, {'Location':
51 | "https://s3-us-east-2.amazonaws.com/dvc-s3-repo/" + pathname.substring(1, 4) + '/' +
52 | pathname.substring(5)})
53 | res.end()
54 | } else if (chat.test(pathname)) {
55 | res.writeHead(301, {'Location': "https://discordapp.com/invite/dvwXA2N"})
56 | res.end()
57 | } else {
58 | handle(req, res, parsedUrl)
59 | }
60 |
61 | }).listen(port, err => {
62 | if (err) throw err
63 | console.log('> Ready on http://localhost:' + port)
64 | })
65 | })
66 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/metrics_remove.md:
--------------------------------------------------------------------------------
1 | # remove
2 |
3 | Keep file as an output, remove metric flag and stop tracking as a metric file.
4 |
5 | ## Synopsis
6 |
7 | ```usage
8 | usage: dvc metrics remove [-h] [-q] [-v]
9 | path
10 |
11 | positional arguments:
12 | path Path to a metric file.
13 |
14 | ```
15 |
16 | ## Description
17 |
18 | This command searches for the corresponding DVC file for the metric file path
19 | provided (i.e. a DVC stage file that specifies a metric with the path provided
20 | as one of its outputs, see `dvc metrics add` or `dvc run` `-m` and `-M` options) and
21 | resets the metric flag for the provided output.
22 |
23 | It does not remove or delete the file provided. It only changes a flag in the
24 | corresponding `.dvc` file. It also keeps the file as an output of the
25 | corresponding stage.
26 |
27 | ## Examples
28 |
29 | Let's first create an output that is a metric file:
30 |
31 | ```dvc
32 | $ dvc run -M metrics.tsv \
33 | "echo -e 'time\tauc\n2019-02-13\t0.9643' > metrics.tsv"
34 | ```
35 |
36 | This command produces the following metrics file:
37 |
38 | ```dvc
39 | $ cat metrics.tsv
40 |
41 | time auc
42 | 2019-02-13 0.9643
43 |
44 | ```
45 |
46 | To extract the AUC value out of it, we can use the `-t` and `-x` options that
47 | `dvc metrics show` command supports (alternatively, see `dvc metrics modify`
48 | command to learn how to apply `-t` and `-x` permanently):
49 |
50 | ```dvc
51 | $ dvc metrics show -t htsv -x 0,auc metrics.tsv
52 |
53 | metrics.tsv: ['0.9643']
54 | ```
55 |
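To make the `-x 0,auc` lookup concrete, here is a minimal JavaScript sketch of how a path of this form could be resolved against a TSV file with a header row. It assumes the first part of the path is a zero-based index of a data row (not counting the header) and the second part is a column name. The `readHeaderTsv` function is hypothetical and exists only for illustration; DVC itself is written in Python and its actual parsing code may differ.

```javascript
// Hypothetical helper (not part of DVC): resolve a "row,column" path against
// tab-separated content whose first line is a header.
function readHeaderTsv(content, xpath) {
  const [rowIndex, column] = xpath.split(',')
  const lines = content.split('\n').filter(line => line.trim() !== '')
  const header = lines[0].split('\t')
  const rows = lines.slice(1).map(line => line.split('\t'))

  const columnIndex = header.indexOf(column)
  if (columnIndex === -1) {
    throw new Error('Unknown column: ' + column)
  }
  const row = rows[Number(rowIndex)]
  if (!row) {
    throw new Error('Unknown row: ' + rowIndex)
  }
  return row[columnIndex]
}

// Same content as the metrics.tsv file shown above.
const metrics = 'time\tauc\n2019-02-13\t0.9643\n'
console.log(readHeaderTsv(metrics, '0,auc')) // 0.9643
```

Addressing the column by name rather than by position keeps the lookup stable if columns are later reordered or added.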
56 | If you check the `metrics.tsv.dvc` file, you should see that `metric: true` is
57 | set:
58 |
59 | ```yaml
60 | cmd: echo -e 'time\tauc\n2019-02-13\t0.9643' > metrics.tsv
61 | md5: 6f910c9000bb03492d1e66035ba8faf6
62 | outs:
63 | - cache: false
64 | md5: 7ce0bc12da7f88c1493763cdd4c3f684
65 | metric: true
66 | path: metrics.tsv
67 | ```
68 |
69 | Now, let's reset the flag with the `dvc metrics remove` command:
70 |
71 | ```dvc
72 | $ dvc metrics remove metrics.tsv
73 |
74 | Saving information to 'metrics.tsv.dvc'.
75 | ```
76 |
77 | Let's check the stage file now:
78 |
79 | ```yaml
80 | cmd: echo -e 'time\tauc\n2019-02-13\t0.9643' > metrics.tsv
81 | md5: 6f910c9000bb03492d1e66035ba8faf6
82 | outs:
83 | - cache: false
84 | md5: 7ce0bc12da7f88c1493763cdd4c3f684
85 | metric: null
86 | path: metrics.tsv
87 | ```
88 |
89 | As you can see, nothing has changed except the `metric` flag, which is now `null`. And
90 | both files are still here:
91 |
92 | ```dvc
93 | $ tree .
94 |
95 | .
96 | ├── metrics.tsv
97 | └── metrics.tsv.dvc
98 |
99 | 0 directories, 2 files
100 | ```
101 |
--------------------------------------------------------------------------------
/src/Subscribe/index.js:
--------------------------------------------------------------------------------
1 | import React from 'react'
2 | import styled from 'styled-components'
3 | import { media, container } from '../styles'
4 |
5 | import SubscribeForm from '../SubscribeForm'
6 |
7 | export default ({}) => (
8 |
9 |
10 |
11 | Subscribe for updates. We won't spam you.
12 |
13 |
14 |
15 |
16 |
17 |
18 | )
19 |
20 | const Subscribe = styled.section`
21 | position: relative;
22 | height: 300px;
23 | background-color: #13adc7;
24 |
25 | ${media.phablet`
26 | display: flex;
27 | align-items: center;
28 | `};
29 |
30 | @media only screen and (min-device-width: 768px) and (max-device-width: 1024px) {
31 | display: flex;
32 | align-items: center;
33 | }
34 | `
35 |
36 | const Container = styled.div`
37 | width: 100%;
38 | margin: 0px auto;
39 | max-width: 1035px;
40 | padding-top: 90px;
41 |
42 | ${media.phablet`
43 | padding: 0px 10px;
44 | `};
45 |
46 | @media only screen and (min-device-width: 768px) and (max-device-width: 1024px) {
47 | padding: 0px 10px;
48 | }
49 | `
50 |
51 | const Glyph = styled.img`
52 | position: absolute;
53 | z-index: 0;
54 |
55 | width: 158px;
56 | height: auto;
57 |
58 | ${media.tablet`
59 | width: 110px;
60 | `};
61 |
62 | object-fit: contain;
63 |
64 | ${props =>
65 | props.gid === 'topleft' &&
66 | `
67 | top: -32px;
68 | left: 28px;
69 | `}
70 |
71 | ${props =>
72 | props.gid === 'rigthbottom' &&
73 | `
74 | bottom: -60px;
75 | right: 28px;
76 | `}
77 |
78 | ${media.phablet`
79 | display: none;
80 | `}
81 | `
82 |
83 | const Title = styled.h3`
84 | font-family: BrandonGrotesqueMed;
85 | min-width: 550px;
86 | min-height: 44px;
87 | font-size: 30px;
88 | font-weight: 500;
89 | color: #ffffff;
90 | margin: 0px auto;
91 | text-align: center;
92 |
93 | ${media.phablet`
94 | min-width: auto;
95 | `};
96 | `
97 |
98 | const SubscribeContainer = styled.div`
99 | margin: 0px auto;
100 | margin-top: 15px;
101 | max-width: 510px;
102 | border-radius: 8px;
103 | background-color: #ffffff;
104 |
105 | ${media.phablet`
106 | width: 100%;
107 | margin: 0px auto;
108 | margin-top: 40px;
109 | min-height: auto;
110 | `} @media only screen
111 | and (min-device-width : 768px)
112 | and (max-device-width : 1024px) {
113 | width: 100%;
114 | margin: 0px auto;
115 | margin-top: 40px;
116 | min-height: auto;
117 | }
118 | `
119 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/version.md:
--------------------------------------------------------------------------------
1 | # version
2 |
3 | This command shows the system/environment information along with the DVC
4 | version.
5 |
6 | ## Synopsis
7 |
8 | ```usage
9 | usage: dvc version [-h] [-q | -v]
10 | ```
11 |
12 | ## Description
13 |
14 | Running the command `dvc version` outputs the following information about the
15 | system/environment:
16 |
17 | | Type | Detail |
18 | | ---------------- | ------------------------------------------------------------------------------ |
19 | | `DVC version` | Version of DVC (along with a Git commit hash in case of a development version) |
20 | | `Python version` | Version of the Python being used for the project in which DVC is initialized |
21 | | `Platform` | Information about the operating system of the machine |
22 |
23 | #### Components of DVC version
24 |
25 | The detail of DVC version depends on how DVC was installed.
26 |
27 | - **Official release**: This [install guide](/doc/get-started/install) mentions
28 | ways to install DVC using the official package stored in the Python Package
29 | Index (PyPI). We mark these official releases with tags on DVC's repository. Any
30 | issues reported with the official build can be traced using the `BASE_VERSION`
31 | itself. So the output is simply `0.40.2`.
32 |
33 | - **Development version**: `pip install git+git://github.com/iterative/dvc` will
34 | install DVC using the `master` branch of DVC's repository. Another way of
35 | setting up the development version is to clone the repository and run
36 | `pip install .`. The master branch is continuously being updated with changes
37 | which might not be ready to publish yet. Therefore installing using the above
38 | command might have issues regarding its usage. So to trace any error reported
39 | with this setup, we need to know exactly which version is being used. For
40 | this, we rely on git commit hash which is displayed in output as
41 | `0.40.2+292cab.mod`. The part before `+` is the `BASE_VERSION` and the latter
42 | part is the git commit hash which is one of the commits in the `master` branch
43 | (also, optional suffix `.mod` means that code is modified).
44 |
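The version format described above can be split mechanically into its parts. The following JavaScript sketch assumes the string always has the shape `BASE_VERSION`, optionally followed by `+`, a hexadecimal commit hash, and an optional `.mod` suffix. The `parseDvcVersion` function is hypothetical and not part of DVC; it only illustrates how the pieces relate.

```javascript
// Hypothetical helper (not part of DVC): split a version string of the form
// BASE_VERSION[+COMMIT_HASH[.mod]] into its components.
function parseDvcVersion(version) {
  const match = /^(\d+\.\d+\.\d+)(?:\+([0-9a-f]+)(\.mod)?)?$/.exec(
    version.trim()
  )
  if (!match) {
    throw new Error('Unrecognized version string: ' + version)
  }
  return {
    baseVersion: match[1],
    commit: match[2] || null, // null for an official release
    modified: Boolean(match[3]) // true when the `.mod` suffix is present
  }
}

console.log(parseDvcVersion('0.40.2'))
console.log(parseDvcVersion('0.40.2+292cab.mod'))
```

An official release yields only a base version, while a development build also carries the commit hash needed to trace a reported issue back to the exact source revision.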
45 | ## Options
46 |
47 | - `-h`, `--help` - prints the usage/help message, and exits.
48 |
49 | - `-q`, `--quiet` - does not write anything to standard output. Exit with 0 if
50 | no problems arise, otherwise 1.
51 |
52 | - `-v`, `--verbose` - displays detailed tracing information.
53 |
54 | ## Example
55 |
56 | Getting the DVC version and environment information:
57 |
58 | ```dvc
59 | $ dvc version
60 |
61 | DVC version: 0.40.0+45f94e
62 | Python version: 3.6.7
63 | Platform: Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic
64 | ```
65 |
--------------------------------------------------------------------------------
/static/docs/understanding-dvc/what-is-dvc.md:
--------------------------------------------------------------------------------
1 | # What Is DVC?
2 |
3 | Data Version Control, or DVC, is **a new type of experiment management
4 | software** that has been built **on top of the existing engineering toolset**
5 | and particularly on a source code version control system (currently - Git). DVC
6 | reduces the gap between the existing tools and data scientists' needs. This
7 | gives an ability to **use the advantages of the experimentation software while
8 | reusing existing skills and intuition**.
9 |
10 | The underlying source code control system **eliminates the need to use external
11 | services**. Data science experiment sharing and data scientist collaboration can
12 | be done through regular Git tools (commit messages, merges, pull requests, code
13 | comments), the same way it works for software engineers.
14 |
15 | DVC implements a **Git experimentation methodology** where each experiment
16 | exists with its code as well as data, and can be represented as a separate Git
17 | branch or commit.
18 |
19 | DVC uses a few core concepts:
20 |
21 | - **Experiment** is equivalent to a Git branch. Each experiment (extract new
22 | features, change model hyperparameters, data cleaning, add a new data source)
23 | should be performed in a separate branch and then merged into the master
24 | branch only if the experiment is successful. DVC allows experiments to be
25 | integrated into a project's history and NEVER needs to recompute the results
26 | after a successful merge.
27 |
28 | - **Experiment state** or state is equivalent to a Git snapshot (all committed
29 | files). Git checksum, branch name, or tag can be used as a reference to an
30 | experiment state.
31 |
32 | - **Reproducibility** - an action to reproduce an experiment state. This action
33 | generates output files based on a set of input files and source code. This
34 | action usually changes experiment state.
35 |
36 | - **Pipeline** - directed acyclic graph (DAG) of commands to reproduce an
37 | experiment state. The commands are connected by input and output files.
38 | Pipeline is defined by special **DVC files** (which act like Makefiles).
39 |
40 | - **Workflow** - set of experiments and relationships among them. Workflow
41 | corresponds to the entire Git repository.
42 |
43 | - **Data files** - cached files (for large files). For data files the file
44 | content is stored outside of the Git repository on a local hard drive, but
45 | data file metadata is stored in Git for DVC needs (to maintain pipelines and
46 | reproducibility).
47 |
48 | - **Data cache** - directory with all data files on a local hard drive or in
49 | cloud storage, but not in the Git repository.
50 |
51 | - **Cloud storage** support is a complement to the core DVC features. This is
52 | how a data scientist transfers large data files or shares a GPU-trained model
53 | with someone who does not have a GPU.
54 |
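The relationship between data files, their metadata, and the data cache can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`file_md5`, `add_to_cache`, `checkout`) and the flat one-file-per-checksum cache layout are hypothetical and are not taken from DVC's implementation.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 checksum of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(data_file: Path, cache_dir: Path) -> dict:
    """Copy a data file into the cache and return its small metadata record."""
    checksum = file_md5(data_file)
    cached = cache_dir / checksum  # hypothetical layout: one file per checksum
    if not cached.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(data_file, cached)
    # Only this small record would need to be stored in Git.
    return {"path": data_file.name, "md5": checksum}


def checkout(meta: dict, cache_dir: Path, workspace: Path) -> Path:
    """Restore a data file in the workspace from the cache using its metadata."""
    target = workspace / meta["path"]
    shutil.copy2(cache_dir / meta["md5"], target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.csv").write_text("a,b\n1,2\n")
    meta = add_to_cache(root / "data.csv", root / "cache")
    (root / "data.csv").unlink()  # the workspace copy is gone
    restored = checkout(meta, root / "cache", root)
    print(meta["path"], restored.read_text() == "a,b\n1,2\n")
```

The point of the sketch is the split: the large content lives in the cache, keyed by checksum, while the metadata record is small enough to commit.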
--------------------------------------------------------------------------------
/static/docs/use-cases/share-data-and-model-files.md:
--------------------------------------------------------------------------------
1 | # Share Data and Model Files
2 |
3 | Like Git, DVC supports distributed environments and collaboration. It makes it
4 | easy to consistently get all your data files and code onto any machine. All
5 | you need to do is set up a remote DVC repository that will store cache files
6 | for your project. Currently DVC supports AWS S3, Google Cloud Storage, Microsoft
7 | Azure Blob Storage, SSH, and HDFS as remote locations, and the list is
8 | growing. For complete information about supported remote types and their
9 | configuration, take a look at `dvc remote`.
10 |
11 | 
12 |
13 | As an example, let's take a look at how you could set up a DVC remote on Amazon S3
14 | and push/pull to/from it.
15 |
16 | ### Create a bucket at Amazon S3
17 |
18 | If you don't already have one, get an Amazon S3 account and then follow the instructions
19 | at
20 | [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)
21 | to create your bucket.
22 |
23 | ### Setup DVC remote
24 |
25 | To set up a DVC remote on S3, you need to supply a URL to the location where you
26 | wish to store data:
27 |
28 | ```dvc
29 | $ dvc remote add -d myremote s3://mybucket/myproject
30 | Setting "myremote" as a default remote.
31 | ```
32 |
33 | > The `-d` (`--default`) option sets `myremote` as the default remote for the
34 | > project.
35 |
36 | This will add `myremote` to your `.dvc/config`. Commit your changes and push
37 | your code:
38 |
39 | ```dvc
40 | $ git commit .dvc/config -m "Configure remote storage"
41 | $ git push
42 | ```
43 |
44 | ### Upload data
45 |
46 | To upload data from your project run:
47 |
48 | ```dvc
49 | $ dvc push
50 |
51 | (1/5): [##############################] 100% images/0001.jpg
52 | (2/5): [##############################] 100% images/0002.jpg
53 | (3/5): [##############################] 100% images/0003.jpg
54 | (4/5): [##############################] 100% images
55 | (5/5): [##############################] 100% model.pkl
56 | ```
57 |
58 | ### Upload code
59 |
60 | Code with DVC metafiles should be uploaded through Git:
61 |
62 | ```dvc
63 | $ git push
64 | ```
65 |
66 | ### Download code
67 |
68 | Please use regular Git commands to download code and DVC metafiles from your Git
69 | servers.
70 |
71 | ```dvc
72 | $ git clone https://github.com/myaccount/myproject.git
73 | $ cd myproject
74 | ```
75 |
76 | or
77 |
78 | ```dvc
79 | $ git pull
80 | ```
81 |
82 | ### Download data
83 |
84 | To download data files for your project run:
85 |
86 | ```dvc
87 | $ dvc pull
88 |
89 | (1/5): [##############################] 100% images/0001.jpg
90 | (2/5): [##############################] 100% images/0002.jpg
91 | (3/5): [##############################] 100% images/0003.jpg
92 | (4/5): [##############################] 100% images
93 | (5/5): [##############################] 100% model.pkl
94 | ```
95 |
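Conceptually, the push and pull steps above copy whatever cache entries the other side is missing. The sketch below models that idea with two local directories standing in for the cache and the remote. The `sync`, `push`, and `pull` helpers are hypothetical names for illustration and say nothing about how DVC actually talks to S3 or any other storage.

```python
import shutil
import tempfile
from pathlib import Path


def sync(source: Path, destination: Path) -> list:
    """Copy every file that exists in source but not in destination."""
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(source.iterdir()):
        if entry.is_file() and not (destination / entry.name).exists():
            shutil.copy2(entry, destination / entry.name)
            copied.append(entry.name)
    return copied


def push(cache: Path, remote: Path) -> list:
    """Upload cache entries that the remote does not have yet."""
    return sync(cache, remote)


def pull(cache: Path, remote: Path) -> list:
    """Download remote entries that the local cache does not have yet."""
    return sync(remote, cache)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache, remote, other_cache = root / "cache", root / "remote", root / "other"
    cache.mkdir()
    (cache / "aaa").write_text("images")
    (cache / "bbb").write_text("model")
    first_push = push(cache, remote)    # both entries are new to the remote
    second_push = push(cache, remote)   # nothing left to upload
    pulled = pull(other_cache, remote)  # a second machine with an empty cache

print(first_push, second_push, pulled)
```

Because only missing entries are transferred, repeating a push or pull is cheap, which is the behavior a collaborator relies on after `git pull`.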
--------------------------------------------------------------------------------
/src/Nav/index.js:
--------------------------------------------------------------------------------
1 | import React from 'react'
2 | import styled from 'styled-components'
3 | import { media } from '../styles'
4 | import { logEvent } from '../utils/ga'
5 |
6 | const getStarted = () => {
7 | logEvent('menu', 'get-started')
8 | window.location = '/doc/get-started'
9 | }
10 |
11 | export default ({ mobile = false }) => (
12 |
57 | )
58 |
59 | const Links = styled.div`
60 | display: flex;
61 | flex-direction: row;
62 | `
63 |
64 | const Link = styled.a`
65 | text-decoration: none;
66 | text-transform: uppercase;
67 |
68 | font-family: BrandonGrotesqueBold, Tahoma, Arial;
69 | font-size: 13px;
70 | color: #838d93;
71 | padding-top: 10px;
72 | padding-bottom: 3px;
73 | border-bottom: 1.5px solid #fff;
74 | margin-left: 30px;
75 |
76 | &:hover {
77 | color: #40364d;
78 | border-bottom: 1.5px solid #40364d;
79 | }
80 | `
81 |
82 | const Nav = styled.div`
83 | display: flex;
84 | flex-shrink: 0;
85 | flex-direction: row;
86 | align-items: center;
87 |
88 | ${props =>
89 | props.mobile &&
90 | `
91 | display: none;
92 | `} ${media.phablet`
93 | ${props =>
94 | !props.mobile &&
95 | `
96 | display: none;
97 | `}
98 | `};
99 | `
100 |
101 | const GetStartedButton = styled.button`
102 | text-decoration: none;
103 | margin-left: 40px;
104 | border-radius: 4px;
105 | background-color: #13adc7;
106 | font-family: BrandonGrotesqueMed, Tahoma, Arial;
107 | color: #fff;
108 | width: 113px;
109 | height: 38px;
110 | font-size: 16px;
111 | border: none;
112 | cursor: pointer;
113 | transition: 0.2s background-color ease-out;
114 |
115 | &:hover {
116 | background-color: #13a3bd;
117 | }
118 | `
119 |
--------------------------------------------------------------------------------
/static/docs/get-started/configure.md:
--------------------------------------------------------------------------------
1 | # Configure
2 |
3 | Once you install DVC, you will be able to start using it (in its local setup)
4 | immediately.
5 |
6 | However, remote storage should be set up (see `dvc remote`) if you need to share
7 | data or models outside of the context of a single project, for example with other
8 | collaborators or even with yourself, in a different computing environment. It's
9 | similar to the way you would use GitHub or any other Git server to store and
10 | share your code.
11 |
12 | For simplicity, let's set up a local remote:
13 |
14 |
15 |
16 | ### What is a "local remote"?
17 |
18 | While the term may seem contradictory, it doesn't have to be. The "local" part
19 | refers to the machine where the project is stored, so it can be any directory
20 | accessible to the same system. The "remote" part refers specifically to the
21 | project/repository itself.
22 |
23 |
24 |
25 | ```dvc
26 | $ dvc remote add -d myremote /tmp/dvc-storage
27 | $ git commit .dvc/config -m "initialize DVC local remote"
28 | ```
29 |
30 | > We only use a local remote in this guide for simplicity's sake in following
31 | > these basic steps as you are learning to use DVC. We realize that for most
32 | > [use cases](/doc/use-cases), other "more remote" types of remotes will be
33 | > required.
34 |
35 | A remote is specified by both its type prefix and its path. DVC
36 | currently supports seven types of remotes:
37 |
38 | - `local` - Local directory
39 | - `s3` - Amazon Simple Storage Service
40 | - `gs` - Google Cloud Storage
41 | - `azure` - Azure Blob Storage
42 | - `ssh` - Secure Shell
43 | - `hdfs` - The Hadoop Distributed File System
44 | - `http` - Support for HTTP and HTTPS protocol
45 |
46 | > Depending on the [remote storage](/doc/commands-reference/remote) type you
47 | > plan to use to keep and share your data you might need to specify one of the
48 | > optional dependencies: `s3`, `gs`, `azure`, `ssh`. Or `all_remotes` to include
49 | > them all. The command should look like this: `pip install dvc[s3]` - it will
50 | > install `boto3` library along with DVC to support AWS S3 storage. This is
51 | > valid for `pip install` option only. Other ways to install DVC already include
52 | > support for all remotes.
53 |
54 | For example, to set up an S3 remote we would use something like:
55 |
56 | ```dvc
57 | $ dvc remote add -d s3remote s3://mybucket/myproject
58 | ```
59 |
60 | > This command is only shown for informational purposes. No need to actually run
61 | > it in order to continue with this guide.
62 |
63 | As you can see, DVC does not require installing any databases, servers, or
64 | warehouses. It can use bare S3 or SSH to store data, intermediate results, and
65 | your models.
66 |
67 | See `dvc config` to get information about more configuration options and
68 | `dvc remote` to learn more about remotes and get more examples.
69 |
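To make the "type prefix plus path" idea concrete, here is a minimal sketch that classifies a remote location by its URL scheme. The `remote_type` function and the `SCHEME_TO_TYPE` table are hypothetical helpers for illustration. Only the `s3://` form and the plain local path appear in this guide, so the URL forms assumed for the other types are guesses rather than documented DVC syntax.

```python
from urllib.parse import urlparse

# Scheme prefix -> remote type, following the list of seven types above.
# Mapping "https" to the "http" type is an assumption based on its description.
SCHEME_TO_TYPE = {
    "s3": "s3",
    "gs": "gs",
    "azure": "azure",
    "ssh": "ssh",
    "hdfs": "hdfs",
    "http": "http",
    "https": "http",
}


def remote_type(location: str) -> str:
    """Classify a remote location string by its scheme prefix."""
    scheme = urlparse(location).scheme
    if not scheme:
        return "local"  # a plain path such as /tmp/dvc-storage
    if scheme in SCHEME_TO_TYPE:
        return SCHEME_TO_TYPE[scheme]
    raise ValueError(f"unsupported remote scheme: {scheme!r}")


print(remote_type("/tmp/dvc-storage"))
print(remote_type("s3://mybucket/myproject"))
```

Note that this naive check treats anything with a scheme-like prefix as non-local, so a Windows path such as `C:\data` would be rejected. A real tool would need to handle that case.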
--------------------------------------------------------------------------------
/src/Documentation/RightPanel/RightPanel.js:
--------------------------------------------------------------------------------
1 | import styled from 'styled-components'
2 | import { LightButton } from '../LightButton'
3 |
4 | export const RightPanel = ({ headings, scrollToLink, githubLink }) => (
5 |
6 | {!!headings.length ? (
7 | <>
8 | Content
9 |
10 | >
11 | ) : (
12 |
13 | )}
14 |
15 | {!!headings.length &&
16 | headings.map(({ text, slug }, headingIndex) => (
17 | scrollToLink('#' + slug)}
21 | href={`#${slug}`}
22 | >
23 | {text}
24 |
25 | ))}
26 |
27 |
28 | Found an issue? Let us know or fix it:
29 |
30 |
31 |
32 |
33 | Edit on Github
34 |
35 |
36 |
37 |
38 |
39 | Have a question? Join our chat, we will help you:
40 |
41 |
42 |
43 |
44 | Discord Chat
45 |
46 |
47 |
48 | )
49 |
50 | const Wrapper = styled.div`
51 | width: 170px;
52 | min-width: 170px;
53 | font-size: 16px;
54 | height: calc(100vh - 78px);
55 | position: sticky;
56 | top: 0;
57 |
58 | @media only screen and (max-width: 1200px) {
59 | display: none;
60 | }
61 |
62 | hr {
63 | opacity: 0.5;
64 | }
65 | `
66 |
67 | const Header = styled.p`
68 | color: #3c3937;
69 | font-size: 14px;
70 | text-transform: uppercase;
71 | margin-top: 56px;
72 | `
73 |
74 | const HeadingLink = styled.a`
75 | display: block;
76 | position: relative;
77 | font-size: 16px;
78 | font-weight: 500;
79 | color: #a0a8a5;
80 | text-decoration: none;
81 | font-weight: 400;
82 | line-height: 26px;
83 | min-height: 26px;
84 | margin-bottom: 3px;
85 | cursor: pointer;
86 |
87 | &:hover {
88 | color: #3c3937;
89 | }
90 | `
91 |
92 | const GithubButton = styled(LightButton)`
93 | min-width: 120px;
94 | margin: 10px 0;
95 |
96 | i {
97 | background-image: url(/static/img/github_icon.svg);
98 | }
99 | `
100 |
101 | const DiscordButton = styled(LightButton)`
102 | min-width: 120px;
103 | margin: 10px 0;
104 |
105 | i {
106 | background-image: url(/static/img/discord.svg);
107 | width: 1.2em;
108 | height: 1.2em;
109 | }
110 | `
111 |
112 | const Link = styled.a`
113 | text-decoration: none;
114 | `
115 |
116 | const Spacer = styled.div`
117 | height: 65px;
118 | `
119 |
120 | const Description = styled.p`
121 | color: #3c3937;
122 | `
123 |
--------------------------------------------------------------------------------
/static/docs/understanding-dvc/how-it-works.md:
--------------------------------------------------------------------------------
1 | # How It Works
2 |
3 | 1. DVC is a command line tool that works on top of Git:
4 |
5 | ```dvc
6 | $ cd my_git_repo
7 | $ dvc init
8 | ```
9 |
10 | 2. DVC helps define pipelines of your commands, and keeps all the commands and
11 | dependencies in a Git repository:
12 |
13 | ```dvc
14 | $ dvc run -d input.csv -o model.pkl -o results.csv \
15 | python cnn_train.py --seed 20180227 --epoch 20 \
16 | input.csv model.pkl results.csv
17 | $ git add model.pkl.dvc
18 | $ git commit -m "Train CNN. 20 epochs."
19 | ```
20 |
21 | 3. DVC is programming language agnostic. R command example:
22 |
23 | ```dvc
24 | $ dvc run -d results.csv -o plots.jpg Rscript plot.R results.csv plots.jpg
25 | $ git add plots.jpg.dvc
26 | $ git commit -m "CNN plots"
27 | ```
28 |
29 | 4. DVC can reproduce a pipeline with respect to the pipeline's dependencies:
30 |
31 | ```dvc
32 | # The input dataset was changed
33 | $ dvc repro plots.jpg.dvc
34 |
35 | Reproducing 'model.pkl':
36 | python cnn_train.py --seed 20180227 --epoch 20 input.csv model.pkl results.csv
37 | Reproducing 'plots.jpg':
38 | Rscript plot.R results.csv plots.jpg
39 | ```
40 |
41 | 5. DVC introduces the concept of data files to Git repositories. DVC keeps data
42 | files outside of the repository but retains the metadata in Git:
43 |
44 | ```dvc
45 | $ git checkout a03_normbatch_vgg16 # checkout code and DVC meta data
46 | $ dvc checkout # checkout data files from the local cache (not Git)
47 | $ ls -l data/ # These LARGE files were copied from DVC cache, not from Git
48 |
49 | total 1017488
50 | -r-------- 2 501 staff 273M Jan 27 03:48 Posts-test.tsv
51 | -r-------- 2 501 staff 12G Jan 27 03:48 Posts-train.tsv
52 | ```
53 |
54 | 6. DVC makes repositories reproducible. DVC metadata can be easily shared
55 | through any Git server, and allows for experiments to be easily reproduced:
56 |
57 | ```dvc
58 | $ git clone https://github.com/dataversioncontrol/myrepo.git
59 | $ cd myrepo
60 | # Reproduce data files
61 | $ dvc repro
62 |
63 | Reproducing 'model.pkl':
64 | python cnn_train.py --seed 20180227 --epoch 20 input.csv model.pkl results.csv
65 | Reproducing 'plots.jpg':
66 | Rscript plot.R results.csv plots.jpg
67 | ```
68 |
69 | 7. DVC's local cache can be transferred to your colleagues and partners through
70 | AWS S3, Azure Blob Storage, or Google Cloud Storage:
71 |
72 | ```dvc
73 | $ git push
74 | $ dvc push # push the data cache to the remote storage
75 |
76 | # On a colleague's machine:
77 | $ git clone https://github.com/dataversioncontrol/myrepo.git
78 | $ cd myrepo
79 | $ dvc pull # get the data cache from cloud
80 | $ dvc checkout # checkout data files
81 | $ ls -l data/ # You just got gigabytes of data through Git and DVC:
82 |
83 | total 1017488
84 | -r-------- 2 501 staff 273M Jan 27 03:48 Posts-test.tsv
85 | ```
86 |
87 | 8. DVC works on Mac, Linux, and Windows.
88 |
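The reproduction step (point 4) can be sketched as a small dependency-graph walk: find every stage whose inputs changed, directly or through an upstream stage, and run those stages in dependency order. The `STAGES` table and the `stages_to_reproduce` function below are hypothetical. They mirror the two-stage example from this page and are not DVC's actual algorithm or file format.

```python
# Two stages modeled after the example: training feeds plotting.
STAGES = {
    "model.pkl.dvc": {"deps": ["input.csv"], "outs": ["model.pkl", "results.csv"]},
    "plots.jpg.dvc": {"deps": ["results.csv"], "outs": ["plots.jpg"]},
}


def stages_to_reproduce(stages, changed_files):
    """Return the stale stages in an order that respects their dependencies.

    Assumes the stage graph is acyclic.
    """
    # Map each output file to the stage that produces it.
    producer = {out: name for name, stage in stages.items() for out in stage["outs"]}

    # 1. Fixpoint: a stage is stale if any of its dependencies is stale,
    #    and a stale stage makes all of its outputs stale.
    stale_files = set(changed_files)
    stale_stages = set()
    grew = True
    while grew:
        grew = False
        for name, stage in stages.items():
            if name not in stale_stages and stale_files.intersection(stage["deps"]):
                stale_stages.add(name)
                stale_files.update(stage["outs"])
                grew = True

    # 2. Depth-first walk so each stage is listed after the stages it depends on.
    ordered, visited = [], set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for dep in stages[name]["deps"]:
            if dep in producer:
                visit(producer[dep])
        if name in stale_stages:
            ordered.append(name)

    for name in stages:
        visit(name)
    return ordered


print(stages_to_reproduce(STAGES, {"input.csv"}))
print(stages_to_reproduce(STAGES, {"results.csv"}))
```

When only `results.csv` changes, the training stage is left alone and only the plotting stage is scheduled, which is the saving that tracking dependencies is meant to provide.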
--------------------------------------------------------------------------------
/src/TrySection/index.js:
--------------------------------------------------------------------------------
1 | import React from 'react'
2 | import styled from 'styled-components'
3 |
4 | import { media, container } from '../styles'
5 |
6 | export default ({ title, buttonText = 'Get Started' }) => (
7 |
8 |
9 |
10 | {title}
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 | )
20 |
21 | const TrySection = styled.section`
22 | position: relative;
23 | height: 278px;
24 | background-color: #945dd6;
25 | display: flex;
26 | align-items: center;
27 | justify-content: center;
28 | `
29 |
30 | const Container = styled.div`
31 | ${container};
32 | width: 100%;
33 | `
34 |
35 | const Title = styled.h3`
36 | font-family: BrandonGrotesqueMed;
37 | max-width: 600px;
38 | min-height: 44px;
39 | font-size: 30px;
40 | font-weight: 500;
41 | text-align: center;
42 | color: #ffffff;
43 | margin: 0px auto;
44 | `
45 |
46 | const Buttons = styled.div`
47 | max-width: 386px;
48 | margin: 0px auto;
49 | margin-top: 20px;
50 | align-items: center;
51 | display: flex;
52 | justify-content: center;
53 | `
54 |
55 | const Button = styled.button`
56 | font-family: BrandonGrotesqueMed;
57 | min-width: 186px;
58 | height: 60px;
59 | border-radius: 4px;
60 | background-color: #945dd6;
61 | border: solid 2px rgba(255, 255, 255, 0.3);
62 |
63 | font-size: 20px;
64 | line-height: 0.9;
65 |
66 | text-align: left;
67 | padding: 0px 50px 0 20px;
68 |
69 | color: #ffffff;
70 | transition: 0.2s background-color ease-out;
71 |
72 | &:hover {
73 | background-color: #f5f5f5;
74 | }
75 |
76 | background: url('/static/img/arrow_right_white.svg') right center no-repeat;
77 | background-position-x: calc(100% - 15px);
78 |
79 | cursor: pointer;
80 |
81 | ${props =>
82 | props.first &&
83 | `
84 | color: #945dd6;
85 | margin-right: 14px;
86 | border-radius: 4px;
87 | background-color: #ffffff;
88 | box-shadow: 0 2px 4px 0 rgba(0, 0, 0, 0.21);
89 |
90 | background-image: url('/static/img/arrow_right_dark.svg');
91 | `};
92 | `
93 |
94 | const Glyph = styled.img`
95 | position: absolute;
96 | z-index: 0;
97 |
98 | width: 158px;
99 | height: 192px;
100 |
101 | object-fit: contain;
102 |
103 | ${props =>
104 | props.gid === 'topleft' &&
105 | `
106 | top: -25px;
107 | left: 40px;
108 | `} ${props =>
109 | props.gid === 'rigthbottom' &&
110 | `
111 | bottom: -60px;
112 | right: 30px;
113 | `};
114 |
115 | ${media.phablet`
116 | display: none;
117 | `};
118 | `
119 |
--------------------------------------------------------------------------------
/static/docs/user-guide/dvcignore.md:
--------------------------------------------------------------------------------
1 | # dvcignore
2 |
3 | Mark which files and/or directories should be ignored when traversing the
4 | repository.
5 |
6 | Sometimes you might want DVC to ignore files while traversing the project
7 | directory. For example, when working on a project with many files in its data
8 | directory, you might encounter extended execution time for operations that are
9 | as simple as `dvc status`. To prevent this, we are implementing support for
10 | `.dvcignore` files. When fully implemented, they are intended to provide
11 | functionality similar to what `.gitignore` files provide for `git`.
12 |
13 | ## How does it work
14 |
15 | - Create a `.dvcignore` file.
16 | - Populate it with [patterns](https://git-scm.com/docs/gitignore) that you would
17 | like to ignore.
18 | - Each line should contain only one pattern.
19 | - During execution of commands that traverse directories, DVC will ignore
20 | matching paths.
21 | - Not every operation supports `.dvcignore`. To see the current limitations, read
22 | the following section.
23 |
24 | ## Current limitations
25 |
26 | During development, we noticed that there are a few potential use cases that
27 | might be tricky to handle (e.g. what to do when running `dvc add` on a directory
28 | containing a `.dvcignore` file). Therefore, we decided to enable this feature
29 | gradually in different parts of the project.
30 |
31 | Currently, `.dvcignore` files are read and applied in any operation that
32 | collects stage files (e.g. `checkout`, `metrics`, `status`, `run`, `repro`), so
33 | we advise using them in the cases described in the first paragraph, when the
34 | number of files in the repository directory tree causes performance issues.
35 |
36 | ## Syntax
37 |
38 | The same as for [`.gitignore`](https://git-scm.com/docs/gitignore).
39 |
40 | ## Example
41 |
42 | Let's analyze an example project:
43 |
44 | ```dvc
45 | $ mkdir dir1 dir2
46 | $ echo data1 >> dir1/data1
47 | $ echo data2 >> dir2/data2
48 | $ dvc add dir1/data1 dir2/data2
49 | $ tree .
50 | .
51 | ├── dir1
52 | │ ├── data1
53 | │ └── data1.dvc
54 | └── dir2
55 | ├── data2
56 | └── data2.dvc
57 | ```
58 |
59 | Modify data files:
60 |
61 | ```dvc
62 | $ echo mod > dir1/data1
63 | $ echo mod > dir2/data2
64 | ```
65 |
66 | Check status:
67 |
68 | ```dvc
69 | $ dvc status
70 | dir1/data1.dvc:
71 | changed outs:
72 | modified: dir1/data1
73 | dir2/data2.dvc:
74 | changed outs:
75 | modified: dir2/data2
76 | ```
77 |
78 | Note that both data files are displayed as modified. Create a `.dvcignore` file
79 | and insert a pattern matching one of the files:
80 |
81 | ```dvc
82 | $ echo 'dir1/*' >> .dvcignore
83 | ```
84 |
85 | Check status again:
86 |
87 | ```dvc
88 | $ dvc status
89 | dir2/data2.dvc:
90 | changed outs:
91 | modified: dir2/data2
92 | ```
93 |
94 | Only the second file is displayed because DVC ignores `data1.dvc` and `data1` when
95 | collecting stage files.
96 |
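The filtering step in the example above can be approximated with Python's standard `fnmatch` module. This is a rough sketch only: `load_patterns` and `is_ignored` are hypothetical helpers, and `fnmatch` globbing is not the full `.gitignore` syntax (it has no negation, no anchoring rules, and its `*` also matches `/`).

```python
from fnmatch import fnmatchcase


def load_patterns(text: str) -> list:
    """Read one pattern per line, skipping blank lines and '#' comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path: str, patterns: list) -> bool:
    """Return True if the path matches any of the ignore patterns."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


paths = ["dir1/data1", "dir1/data1.dvc", "dir2/data2", "dir2/data2.dvc"]
patterns = load_patterns("dir1/*\n")
kept = [path for path in paths if not is_ignored(path, patterns)]
print(kept)
```

With the single pattern `dir1/*`, only the `dir2` entries survive the filter, matching the second `dvc status` output shown above.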
--------------------------------------------------------------------------------
/static/docs/user-guide/dvc-file-format.md:
--------------------------------------------------------------------------------
1 | # DVC File Format
2 |
3 | When you add a file or a stage to your pipeline, DVC creates a special `.dvc`
4 | file that contains all the needed information to track your data. The file
5 | itself is in a simple YAML format and could be easily written or altered (after
6 | being created by `dvc run` or `dvc add`) by hand.
7 |
8 | See [Syntax Highlighting](/doc/user-guide/plugins) to enable
9 | highlighting for DVC files in your editor.
10 |
11 | Here is an example of a DVC file:
12 |
13 | ```yaml
14 | cmd: python cmd.py input.data output.data metrics.json
15 | deps:
16 |   - md5: da2259ee7c12ace6db43644aef2b754c
17 |     path: cmd.py
18 |   - md5: e309de87b02312e746ec5a500844ce77
19 |     path: input.data
20 | md5: 521ac615cfc7323604059d81d052ce00
21 | outs:
22 |   - cache: true
23 |     md5: 70f3c9157e3b92a6d2c93eb51439f822
24 |     metric: false
25 |     path: output.data
26 |   - cache: false
27 |     md5: d7a82c3cdfd45c4ace13484a931fc526
28 |     metric:
29 |       type: json
30 |       xpath: AUC
31 |     path: metrics.json
32 | locked: True
33 |
34 | # Comments are persisted between multiple executions of dvc repro/commit.
35 | # Comments are not persisted between multiple executions of dvc run/add/import.
36 | meta: # key to contain arbitrary user data
37 |   name: John
38 |   email: john@xyz.com
39 | ```
40 |
41 | ## Structure
42 |
43 | At the top level, a `.dvc` file consists of the following fields:
44 |
45 | - `cmd`: a command that is being run in this stage of the pipeline;
46 | - `deps`: a list of dependencies for this stage;
47 | - `outs`: a list of outputs for this stage;
48 | - `md5`: md5 checksum for this dvc file;
49 | - `locked`: whether or not this stage is locked from reproduction;
50 | - `wdir`: a directory to run command in (default `.`);
51 |
52 | A dependency entry consists of the following fields:
53 |
54 | - `path`: path to the dependency, relative to the `wdir` path;
55 | - `md5`: md5 checksum for the dependency;
56 |
57 | An output entry consists of the following fields:
58 |
59 | - `path`: path to the output, relative to the `wdir` path;
60 | - `md5`: md5 checksum for the output;
61 | - `cache`: whether or not dvc should cache the output;
62 | - `metric`: whether or not this file is a metric file;
63 |
64 | A metric entry consists of the following fields:
65 |
66 | - `type`: type of the metrics file (e.g. raw/json/tsv/htsv/csv/hcsv);
67 | - `xpath`: path within the metrics file to the metrics data (e.g. `AUC.value` for
68 | `{"AUC": {"value": 0.624321}}`);
69 |
70 | A meta entry consists of `key: value` pairs such as `name: John`. A meta entry
71 | can have any structure and contain any number of attributes. `meta: string` is
72 | also possible; it does not need to contain a dictionary.
73 |
74 | Comments can be added to the `.dvc` file using the `# comment` syntax. Comments and
75 | meta values are preserved between multiple executions of the `dvc repro` and
76 | `dvc commit` commands.
77 |
78 | If the user overwrites the file, comments and meta values are not preserved between
79 | multiple executions of the `dvc run`, `dvc add`, and `dvc import` commands.
80 |
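As an illustration of what the `xpath` field of a JSON metric selects, the sketch below resolves a dotted path such as `AUC.value` against a JSON document. The `extract_metric` function is a hypothetical helper that handles only plain dotted keys. It is a simplification and not the path syntax DVC itself evaluates.

```python
import json


def extract_metric(document: str, xpath: str):
    """Follow a dotted key path (e.g. "AUC.value") into a JSON document."""
    value = json.loads(document)
    for key in xpath.split("."):
        value = value[key]  # raises KeyError if the path does not exist
    return value


print(extract_metric('{"AUC": {"value": 0.624321}}', "AUC.value"))
```

For the metric entry in the example file above (`xpath: AUC`), the same helper would return whatever is stored under the top-level `AUC` key.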
--------------------------------------------------------------------------------
/src/Documentation/Markdown/lang/dvc.js:
--------------------------------------------------------------------------------
1 | 'use strict'
2 |
3 | Object.defineProperty(exports, '__esModule', {
4 | value: true
5 | })
6 |
7 | let _javascript = function(hljs) {
8 | let VAR = {
9 | className: 'variable',
10 | variants: [{ begin: /\$[\w\d#@][\w\d_]*/ }, { begin: /\$\{(.*?)}/ }]
11 | }
12 | let QUOTE_STRING = {
13 | className: 'string',
14 | begin: /"/,
15 | end: /"/,
16 | contains: [
17 | hljs.BACKSLASH_ESCAPE,
18 | VAR,
19 | {
20 | className: 'variable',
21 | begin: /\$\(/,
22 | end: /\)/,
23 | contains: [hljs.BACKSLASH_ESCAPE]
24 | }
25 | ]
26 | }
27 | let APOS_STRING = {
28 | className: 'string',
29 | begin: /'/,
30 | end: /'/
31 | }
32 |
33 | return {
34 | aliases: ['dvc'],
35 | contains: [
36 | {
37 | begin: /^\s*\$/,
38 | end: /\n|\Z/,
39 | returnBegin: true,
40 | keywords: {
41 | keyword:
42 | 'ls cat vi mkdir cd wget du python cp export echo pip curl tar ' +
43 | 'exec autoload sudo unzip rm tree file md5 source virtualenv which'
44 | },
45 | contains: [
46 | {
47 | begin: /^\s*\$\s(dvc|git) [a-z\-]+/,
48 | returnBegin: true,
49 | contains: [
50 | {
51 | begin: /^\s*\$\s/,
52 | className: 'skipped'
53 | },
54 | {
55 | begin: /git [a-z\-]+/,
56 | keywords: {
57 | keyword:
58 | 'git commit status pull push fetch add init checkout ' +
59 | 'merge clone tag'
60 | }
61 | },
62 | {
63 | begin: /dvc [a-z\-]+/,
64 | keywords: {
65 | built_in:
66 | 'help dvc init add import checkout run pull push fetch ' +
67 | 'status repro remove move gc config remote metrics' +
68 | ' install root lock unlock pipeline destroy unprotect ' +
69 | ' commit cache pkg tag diff version'
70 | },
71 | className: 'strong'
72 | }
73 | ]
74 | },
75 | {
76 | begin: /^\s*\$\s/,
77 | className: 'skipped'
78 | },
79 | {
80 | begin: /\\\n/
81 | },
82 | QUOTE_STRING,
83 | APOS_STRING,
84 | VAR,
85 | hljs.HASH_COMMENT_MODE
86 | ]
87 | },
88 | hljs.HASH_COMMENT_MODE,
89 | {
90 | begin: /^\s*[^\s#$]/,
91 | end: /\n|\Z/,
92 | className: 'meta'
93 | }
94 | ]
95 | }
96 | }
97 |
98 | let _javascript2 = _interopRequireDefault(_javascript)
99 |
100 | function _interopRequireDefault(obj) {
101 | return obj && obj.__esModule ? obj : { default: obj }
102 | }
103 |
104 | exports.default = _javascript2.default
105 |
--------------------------------------------------------------------------------
/static/docs/get-started/install.md:
--------------------------------------------------------------------------------
1 | # Install
2 |
3 | There are three ways to install DVC: `pip`, an OS-specific package, and Homebrew
4 | (depending on your OS, some of these options may not be available to you).
5 |
6 | To install DVC from terminal, run:
7 |
8 | ```dvc
9 | $ pip install dvc
10 | ```
11 |
12 | > Depending on the [remote storage](/doc/commands-reference/remote) type you
13 | > plan to use to keep and share your data, you might need to specify one of the
14 | > optional dependencies: `s3`, `gs`, `azure`, `ssh`. Or `all_remotes` to include
15 | > them all. The command should look like this: `pip install dvc[s3]` - it
16 | > installs the `boto3` library along with DVC to support the AWS S3 storage.
17 | > This is valid for `pip install` option only. Other ways to install DVC already
18 | > include support for all remotes.
19 |
20 | As an easier option, self-contained binary packages are also available. Use the
21 | Download button in the [home page](https://dvc.org/) to the left or get them
22 | [here](https://github.com/iterative/dvc/releases/). We also provide `deb`, `rpm`
23 | and `homebrew` repositories:
24 |
25 |
26 |
27 | ### Expand to install from deb repository (Ubuntu, Debian)
28 |
29 | ```dvc
30 | $ sudo wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list
31 | $ sudo apt-get update
32 | $ sudo apt-get install dvc
33 | ```
34 |
35 |
36 |
37 |
38 |
39 | ### Expand to install from rpm repository (Fedora, CentOS)
40 |
41 | ```dvc
42 | $ sudo wget https://dvc.org/rpm/dvc.repo -O /etc/yum.repos.d/dvc.repo
43 | $ sudo yum update
44 | $ sudo yum install dvc
45 | ```
46 |
47 |
48 |
49 |
50 |
51 | ### Expand to install via Homebrew (Mac OS)
52 |
53 | ```dvc
54 | $ brew install iterative/homebrew-dvc/dvc
55 | ```
56 |
57 | or:
58 |
59 | ```dvc
60 | $ brew cask install iterative/homebrew-dvc/dvc
61 | ```
62 |
63 |
64 |
65 |
66 |
67 | ### Expand to install from pkg installer (Mac OS)
68 |
69 | Click the `Download` button on the main page and download `.pkg` to install it.
70 | Alternatively, you can always find the latest version of this installer
71 | [here](https://github.com/iterative/dvc/releases).
72 |
73 |
74 |
75 |
76 |
77 | ### Expand to install using installer (Windows)
78 |
79 | If you have any problems with `pip install`, click the `Download` button on the
80 | main page and download `.exe` to install DVC. Alternatively, you can always find
81 | the latest version of this binary installer
82 | [here](https://github.com/iterative/dvc/releases).
83 |
84 |
85 |
86 | See [Development](/doc/user-guide/development) if you want to install the most
87 | recent development version.
88 |
89 | ### Shell autocomplete
90 |
91 | Visit the [Shell Autocomplete](/doc/user-guide/autocomplete) section to find and
92 | install the completion scripts for your shell.
93 |
94 | ### Editors and IDEs integration
95 |
96 | Visit [Vim and IDE Integrations](/doc/user-guide/plugins) for reference on how
97 | to enable shell syntax highlighting and install DVC support for different IDEs.
98 |
--------------------------------------------------------------------------------
/src/TextRotate/index.js:
--------------------------------------------------------------------------------
1 | import React, { Component } from 'react'
2 | import styled, { css } from 'styled-components'
3 |
4 | export default class TextRotate extends Component {
5 | static defaultProps = {
6 | words: [],
7 | delay: 80,
8 | wordDelay: 1000,
9 | textBefore: ``,
10 | textAfter: ``
11 | }
12 |
13 | state = {
14 | ready: false,
15 | currentWordIndex: 0,
16 | pos: 1,
17 | length: 0,
18 | grow: true,
19 | timer: null
20 | }
21 |
22 | componentDidMount() {
23 | this.initTimer()
24 | this.setState({
25 | grow: true,
26 | ready: true
27 | })
28 | }
29 |
30 | componentWillUnmount() {
31 | this.deinitTimer()
32 | }
33 |
34 | initTimer() {
35 | const word = this.props.words[this.state.currentWordIndex]
36 |
37 | this.setState({
38 | pos: word.length,
39 | grow: false
40 | })
41 |
42 | setTimeout(() => {
43 | const timer = setInterval(this.animate, this.props.delay)
44 | this.setState({
45 | timer
46 | })
47 | }, this.props.wordDelay)
48 | }
49 |
50 | deinitTimer() {
51 | clearInterval(this.state.timer)
52 | }
53 |
54 | animate = () => {
55 | if (
56 | this.state.pos === this.props.words[this.state.currentWordIndex].length
57 | ) {
58 | this.setState(
59 | prevState => ({
60 | grow: !prevState.grow
61 | }),
62 | () => {
63 | if (!this.state.grow) {
64 | clearInterval(this.state.timer)
65 | this.initTimer()
66 | }
67 | }
68 | )
69 | }
70 |
71 | if (!this.state.grow && this.state.pos === 0) {
72 | return this.nextWord()
73 | }
74 |
75 | if (this.state.grow) {
76 | this.setState(prevState => ({
77 | pos: prevState.pos + 1
78 | }))
79 | } else {
80 | this.setState(prevState => ({
81 | pos: prevState.pos - 1
82 | }))
83 | }
84 | }
85 |
86 | nextWord = () => {
87 | let nextWordIndex = this.state.currentWordIndex + 1
88 | if (nextWordIndex > this.props.words.length - 1) {
89 | nextWordIndex = 0
90 | }
91 |
92 | this.setState(prevState => ({
93 | currentWordIndex: nextWordIndex,
94 | pos: 0,
95 | grow: true
96 | }))
97 | }
98 |
99 | getCurrentWord() {
100 | const currentWord = this.props.words[this.state.currentWordIndex]
101 |
102 | if (!this.state.ready) return currentWord
103 |
104 | return currentWord.slice(0, this.state.pos + 1)
105 | }
106 |
107 | render() {
108 | const { textBefore, textAfter } = this.props
109 | const word = this.getCurrentWord()
110 |
111 | return (
112 |
113 |
{textBefore}
{' '}
114 |
115 | {word}
116 | |{' '}
117 |
118 |
{textAfter}
119 |
120 | )
121 | }
122 | }
123 |
124 | const Wrapper = styled.span``
125 | const Cursor = styled.span`
126 | vertical-align: 4px;
127 | `
128 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/metrics.md:
--------------------------------------------------------------------------------
1 | # metrics
2 |
3 | A set of commands to collect and display project metrics:
4 | [add](/doc/commands-reference/metrics-add),
5 | [show](/doc/commands-reference/metrics-show),
6 | [modify](/doc/commands-reference/metrics-modify), and
7 | [remove](/doc/commands-reference/metrics-remove).
8 |
9 | ## Synopsis
10 |
11 | ```usage
12 | usage: dvc metrics [-h] [-q] [-v]
13 | {show, add, modify, remove}
14 | ...
15 |
16 | positional arguments:
17 | show Output metric values.
18 | add Tag file as a metric file.
19 | modify Modify metric file options.
20 | remove Remove files's metric tag.
21 | ```
22 |
23 | ## Description
24 |
25 | DVC has the ability to tag a specified output file as a file that contains
26 | metrics to track. Metrics are usually project-specific numbers such as `AUC`,
27 | `ROC`, etc. DVC itself does not attach any specific meaning to these numbers.
28 | Usually these numbers are produced by the model evaluation script and serve as a
29 | way to compare and pick the best performing experiment variant.
30 |
31 | [Add](/doc/commands-reference/metrics-add),
32 | [show](/doc/commands-reference/metrics-show),
33 | [modify](/doc/commands-reference/metrics-modify), and
34 | [remove](/doc/commands-reference/metrics-remove) commands are available to set
35 | up and manage DVC metrics.
36 |
37 | ## Options
38 |
39 | - `-h`, `--help` - prints the usage/help message, and exits.
40 |
41 | - `-q`, `--quiet` - does not write anything to standard output. Exit with 0 if
42 | no problems arise, otherwise 1.
43 |
44 | - `-v`, `--verbose` - displays detailed tracing information.
45 |
46 | ## Examples
47 |
48 | First, let's create a simple DVC stage file:
49 |
50 | ```dvc
51 | $ dvc run -d code/evaluate.py -M data/eval.json -f Dvcfile \
52 | python code/evaluate.py
53 | ```
54 |
55 | > `-M|--metrics-no-cache` tells DVC to mark `data/eval.json` as a metric
56 | > file. Using this option is equivalent to using `-O|--outs-no-cache` and then
57 | > using `dvc metrics add data/eval.json` to explicitly mark `data/eval.json` as
58 | > a metric file.
59 |
60 | Now let's print metric values that we are tracking in the current project:
61 |
62 | ```dvc
63 | $ dvc metrics show -a
64 |
65 | master:
66 | data/eval.json: {"AUC": "0.624652"}
67 | ```
68 |
69 | Then we can tell DVC an `xpath` for the metric file, so that it can output only
70 | the value of AUC. In the case of JSON, it uses
71 | [JSONPath expressions](https://goessner.net/articles/JsonPath/index.html) to
72 | selectively extract data out of metric files:
73 |
74 | ```dvc
75 | $ dvc metrics modify data/eval.json --type json --xpath AUC
76 | $ dvc metrics show
77 |
78 | master:
79 | data/eval.json: 0.624652
80 | ```
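To make the effect of `--xpath AUC` more concrete, here is a minimal Python sketch of selecting a single value from a JSON metric file. It is not DVC's implementation: DVC evaluates JSONPath expressions, while the hypothetical `extract_metric` helper below only looks up one top-level key.

```python
import json


def extract_metric(text, key):
    """Return the value stored under a top-level key of a JSON document.

    Hypothetical helper for illustration; real JSONPath expressions can
    address nested values, which this sketch does not attempt.
    """
    data = json.loads(text)
    return data[key]


# Same content as data/eval.json in the example above.
metric_file = '{"AUC": "0.624652"}'
print(extract_metric(metric_file, "AUC"))
```

The metric file stores the value as a string, so the sketch returns it as a string as well.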
81 |
82 | And finally let's remove `data/eval.json` from the project's metrics:
83 |
84 | ```dvc
85 | $ dvc metrics remove data/eval.json
86 | $ dvc metrics show
87 |
88 | Failed to show metrics: No metric files in this repository.
89 | Use 'dvc metrics add' to add a metric file to track.
90 | ```
91 |
--------------------------------------------------------------------------------
/src/TopMenu/index.js:
--------------------------------------------------------------------------------
1 | import React, { Component } from 'react'
2 | // components
3 | import Nav from '../Nav'
4 | // utils
5 | import throttle from 'lodash.throttle'
6 | // styles
7 | import styled from 'styled-components'
8 | import { media } from '../styles'
9 |
10 | const MIN_HEIGHT = 78
11 |
12 | class TopMenu extends Component {
13 | constructor() {
14 | super()
15 | this.state = {
16 | scrolled: false
17 | }
18 | this.handleScrollThrottled = throttle(this.handleScroll, 300)
19 | }
20 |
21 | componentDidMount() {
22 | this.bodybag = document.getElementById('bodybag')
23 | this.isPhablet = window.innerWidth <= 572
24 |
25 | if (!this.isPhablet) {
26 | this.bodybag.addEventListener('scroll', this.handleScrollThrottled)
27 | this.handleScroll()
28 | }
29 | }
30 |
31 | componentWillUnmount() {
32 | if (!this.isPhablet) {
33 | this.bodybag.removeEventListener('scroll', this.handleScrollThrottled)
34 | }
35 | }
36 |
37 | handleScroll = e => {
38 | if (this.props.isDocPage) return
39 | const scrollTop = e ? e.target.scrollTop : 0
40 | this.setState({
41 | scrolled: scrollTop > 25
42 | })
43 | }
44 |
45 | render() {
46 | const { isDocPage } = this.props
47 | const { scrolled } = this.state
48 |
49 | return (
50 |
51 |
52 |
53 |
59 |
60 |
61 |
62 |
63 | )
64 | }
65 | }
66 |
67 | export default TopMenu
68 |
69 | const Wrapper = styled.div`
70 | position: fixed;
71 | z-index: 10;
72 | top: 0px;
73 | left: 0px;
74 | right: 0px;
75 |
76 | background-color: #ffffff;
77 | box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.15);
78 | overflow-y: scroll;
79 |
80 | &::-webkit-scrollbar {
81 | visibility: hidden;
82 | }
83 | `
84 |
85 | const Container = styled.section`
86 | margin: 0 auto;
87 | padding: 0px 15px;
88 | max-width: ${props => (props.wide ? '1200px' : '1005px')};
89 | min-height: ${MIN_HEIGHT}px;
90 | width: auto;
91 |
92 | ${props => `
93 | height: ${MIN_HEIGHT + (props.scrolled ? 0 : 20)}px;
94 | `};
95 |
96 | z-index: 3;
97 | position: relative;
98 | color: #ffffff;
99 | display: flex;
100 | flex-shrink: 0;
101 | justify-content: space-between;
102 | align-items: center;
103 | transition: height 0.2s linear;
104 | will-change: height;
105 |
106 | ${media.phablet`
107 | flex-direction: column;
108 | justify-content: center;
109 | align-items: start;
110 | height: auto;
111 | `};
112 | `
113 |
114 | const Logo = styled.a`
115 | display: block;
116 | padding-top: 10px;
117 | z-index: 999;
118 |
119 | ${media.phablet`
120 | padding-top: 10px;
121 | padding-bottom: 0px;
122 | `};
123 | `
124 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/unprotect.md:
--------------------------------------------------------------------------------
1 | # unprotect
2 |
3 | Unprotect tracked files or directories (when the cache protected mode has been
4 | enabled with `dvc config cache`).
5 |
6 | ## Synopsis
7 |
8 | ```usage
9 | usage: dvc unprotect [-h] [-q | -v] targets [targets ...]
10 |
11 | Unprotect data file/directory.
12 |
13 | positional arguments:
14 | targets Data files/directory.
15 | ```
16 |
17 | ## Description
18 |
19 | By default this command is not necessary, as DVC avoids hardlinks and symlinks
20 | to link tracked data files in the workspace to the cache. However, these types
21 | of file links can be enabled with `dvc config cache` (`cache.type` config
22 | option). These link types also require the `cache.protected` mode to be turned
23 | on, which makes the tracked data files in the workspace read-only to prevent
24 | users from accidentally corrupting the cache by modifying them.
25 |
26 | Running `dvc unprotect` guarantees that the target files or directories
27 | (`targets`) in the workspace are physically "unlinked" from the cache and can be
28 | safely updated. Read the
29 | [Update a Tracked File](/doc/user-guide/update-tracked-file) guide to learn more
30 | on this process.
31 |
32 | `dvc unprotect` can be an expensive operation because it involves copying data.
33 | First check whether your task matches one of the cases that are considered safe
34 | even when cache protected mode is enabled:
35 |
36 | - Adding more files to a directory input data set (say, images or videos).
37 |
38 | - Deleting files from a directory data set.
39 |
40 | ## Options
41 |
42 | - `-h`, `--help` - prints the usage/help message, and exits.
43 |
44 | - `-q`, `--quiet` - does not write anything to standard output. Exit with 0 if
45 | no problems arise, otherwise 1.
46 |
47 | - `-v`, `--verbose` - displays detailed tracing information.
48 |
49 | ## Example
50 |
51 | Enable cache protected mode:
52 |
53 | ```dvc
54 | $ dvc config cache.protected true
55 | ```
56 |
57 | Put a data file under DVC control:
58 |
59 | ```dvc
60 | $ ls -lh
61 | -rw-r--r-- 1 10576022 Nov 27 13:30 Posts.xml.zip
62 |
63 | $ dvc add Posts.xml.zip
64 | Adding 'Posts.xml.zip' to '.gitignore'.
65 | Saving 'Posts.xml.zip' to cache '.dvc/cache'.
66 | Saving information to 'Posts.xml.zip.dvc'.
67 |
68 | To track the changes with git run:
69 |
70 | git add .gitignore Posts.xml.zip.dvc
71 | ```
72 |
73 | Check that the file is a read-only link (@ sign means a link):
74 |
75 | ```dvc
76 | $ ls -lh
77 | -r--r--r--@ 1 10576022 Apr 25 2017 Posts.xml.zip
78 | -rw-r--r-- 1 120 Nov 27 13:29 Posts.xml.zip.dvc
79 | ```
80 |
81 | Unprotect the file:
82 |
83 | ```dvc
84 | $ dvc unprotect Posts.xml.zip
85 | [##############################] 100% Posts.xml.zip
86 | ```
87 |
88 | Check that the file is writable now, the cached version is intact, and they are
89 | not linked (the file in the workspace is a copy of the file):
90 |
91 | ```dvc
92 | $ ls -lh
93 | -rw-r--r-- 1 120B Nov 27 13:29 Posts.xml.zip.dvc
94 | -rw-r--r-- 1 10M Nov 27 13:30 Posts.xml.zip
95 |
96 | $ ls -lh .dvc/cache/ce/
97 | -rw-r--r--@ 1 10M Apr 25 2017 68b98d82545628782c66192c96f2d2
98 | ```
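The same check can be scripted. The sketch below is an illustration only, not part of DVC: the hypothetical `is_unprotected` helper reports whether a workspace file is writable and no longer refers to the same underlying file as its cached copy. Because `os.path.samefile` follows symlinks, it covers both hardlinks and symlinks. The demo runs on throwaway files rather than a real DVC cache.

```python
import os
import shutil
import tempfile


def is_unprotected(workspace_path, cache_path):
    """True if the workspace file is writable and is not linked to the cache file."""
    writable = os.access(workspace_path, os.W_OK)
    linked = os.path.samefile(workspace_path, cache_path)
    return writable and not linked


# Demo on temporary files (stand-ins for a cache entry and workspace files).
with tempfile.TemporaryDirectory() as tmp:
    cached = os.path.join(tmp, "cached")
    with open(cached, "w") as f:
        f.write("data")

    linked = os.path.join(tmp, "linked")
    os.link(cached, linked)  # hardlink: same inode as the cached file

    copied = os.path.join(tmp, "copied")
    shutil.copy(cached, copied)  # independent copy, as after `dvc unprotect`

    linked_result = is_unprotected(linked, cached)
    copied_result = is_unprotected(copied, cached)
    print(linked_result, copied_result)
```

In a real project you would pass the workspace path and the matching path under `.dvc/cache` instead of the temporary files.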
99 |
--------------------------------------------------------------------------------
/static/docs/use-cases/multiple-data-scientists-on-a-single-machine.md:
--------------------------------------------------------------------------------
1 | # Shared Development Server
2 |
3 | Teams commonly prefer to run their experiments on a single shared machine. This
4 | allows better resource utilization, such as the ability to use multiple GPUs,
5 | store all your data in one place, etc.
6 |
7 | 
8 |
9 | With DVC, you can easily set up shared data storage on the server that will
10 | allow your team to share and store data for your projects as effectively as
11 | possible and have a workspace restoration/switching speed as instant as
12 | `git checkout` for your code.
13 |
14 | ### Preparation
15 |
16 | In order to make it work on a shared server, you need to set up a shared cache
17 | location for your projects, so that every team member is using the same cache
18 | storage:
19 |
20 | ```dvc
21 | $ mkdir -p /dvc-cache
22 | ```
23 |
24 | You will have to make sure that the directory has proper permissions set up, so
25 | that everyone on your team can read and write to it and can access cache files
26 | written by others. The most straightforward way to do that is to make sure that
27 | you and your colleagues are members of the same group (e.g. 'users') and that
28 | your shared cache directory is owned by that group and has respective
29 | permissions.
30 |
31 | ### Transfer Existing Cache (Optional)
32 |
33 | This step is optional. You can skip it if you are setting up a new DVC
34 | repository and don't have your local cache stored in `.dvc/cache`. If you did
35 | work on your project with DVC previously and you wish to transfer your cache to
36 | the external cache directory, you will need to simply move it from an old cache
37 | location to the new one:
38 |
39 | ```dvc
40 | $ mv .dvc/cache/* /dvc-cache
41 | ```
42 |
43 | ### Configure External Cache
44 |
45 | Tell DVC to use the directory we've set up as an external cache location by
46 | running:
47 |
48 | ```dvc
49 | $ dvc config cache.dir /dvc-cache
50 | ```
51 |
52 | Commit changes to `.dvc/config`, then `git push` them to your Git remote:
53 |
54 | ```dvc
55 | $ git add .dvc/config
56 | $ git commit -m "dvc: setup external cache dir"
57 | ```
58 |
59 | ### Example
60 |
61 | You and your colleagues can work in your own workspaces as usual and DVC will
62 | handle all your data in the most effective way possible. Let's say you are
63 | cleaning up the data:
64 |
65 | ```dvc
66 | $ dvc add raw
67 | $ dvc run -d raw -o clean ./cleanup.py raw clean
68 | $ git add raw.dvc clean.dvc
69 | $ git commit -m "cleanup raw data"
70 | $ git push
71 | ```
72 |
73 | Your colleague can pull the code and have both `raw` and `clean` instantly
74 | appear in his workspace without copying. After this he decides to continue
75 | building the pipeline and process the cleaned up data:
76 |
77 | ```dvc
78 | $ git pull
79 | $ dvc checkout
80 | $ dvc run -d clean -o processed ./process.py clean processed
81 | $ git add processed.dvc
82 | $ git commit -m "process clean data"
83 | $ git push
84 | ```
85 |
86 | And now you can just as easily have his work appear in your workspace by running:
87 |
88 | ```dvc
89 | $ git pull
90 | $ dvc checkout
91 | ```
92 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/remote_default.md:
--------------------------------------------------------------------------------
1 | # remote default
2 |
3 | Set or unset the default data remote. Depending on your storage type you may need to
4 | run `dvc remote modify` to provide credentials and/or configure other remote
5 | parameters.
6 |
7 | See also [add](/doc/commands-reference/remote-add),
8 | [list](/doc/commands-reference/remote-list),
9 | [modify](/doc/commands-reference/remote-modify), and
10 | [remove](/doc/commands-reference/remote-remove) commands to manage data remotes.
11 |
12 | ## Synopsis
13 |
14 | ```usage
15 | usage: dvc remote default [-h] [-q | -v] [-u]
16 | [--global] [--system] [--local]
17 | [name]
18 |
19 | positional arguments:
20 | name Name of the remote.
21 | ```
22 |
23 | ## Description
24 |
25 | You can query/set/replace/unset the default remote using options of this command. If
26 | the `name` of the remote is not provided and `--unset` is not specified, this
27 | command returns the name of the default remote.
28 |
29 | ```dvc
30 | $ dvc remote default myremote
31 | ```
32 |
33 | This command assigns the default remote in the core section of the DVC
34 | [config file](/doc/user-guide/dvc-files-and-directories).
35 |
36 | ```ini
37 | [core]
38 | remote = myremote
39 | ```
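As an illustration of how this setting could be read back by a script, here is a small Python sketch using the standard `configparser` module. It assumes the config file is plain INI, as in the snippet above. The `default_remote` helper is hypothetical and not a DVC API.

```python
import configparser

# Contents of .dvc/config after `dvc remote default myremote` (taken from the text above).
CONFIG_TEXT = """
[core]
remote = myremote
"""


def default_remote(config_text):
    """Return the value of core.remote, or None if no default remote is set."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # `fallback` covers both a missing [core] section and a missing option.
    return parser.get("core", "remote", fallback=None)


print(default_remote(CONFIG_TEXT))
```

A `None` result corresponds to the state after `dvc remote default -u`, when the `remote` entry is absent.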
40 |
41 | For the commands which take a `--remote` option (`dvc pull`, `dvc push`,
42 | `dvc status`, `dvc gc`, `dvc fetch`), the default remote is used if that option is
43 | not specified.
44 |
45 | You can also use [`dvc config`](/doc/user-guide/dvc-files-and-directories),
46 | [`dvc remote add`](/doc/commands-reference/remote-add) and
47 | [`dvc remote modify`](/doc/commands-reference/remote-modify) commands to
48 | set/unset/change the default remote configurations.
49 |
50 | ## Options
51 |
52 | - `-u`, `--unset` - unsets default remote.
53 |
54 | - `--global` - save remote configuration to the global config (e.g.
55 | `~/.config/dvc/config`) instead of `.dvc/config`.
56 |
57 | - `--system` - save remote configuration to the system config (e.g.
58 | `/etc/dvc.config`) instead of `.dvc/config`.
59 |
60 | - `--local` - save the remote configuration to the
61 | [local](/doc/user-guide/dvc-files-and-directories) config
62 | (`.dvc/config.local`). This is useful when you need to specify private options
63 | or local environment specific settings in your config, that you don't want to
64 | track and share through Git (credentials, private locations, etc).
65 |
66 | - `-h`, `--help` - prints the usage/help message and exits.
67 |
68 | - `-q`, `--quiet` - does not write anything to standard output. Exit with 0 if
69 | no problems arise, otherwise 1.
70 |
71 | - `-v`, `--verbose` - displays detailed tracing information.
72 |
73 | ## Examples
74 |
75 | Set `myremote` as default remote:
76 |
77 | ```dvc
78 | $ dvc remote default myremote
79 | ```
80 |
81 | Get default remote:
82 |
83 | ```dvc
84 | $ dvc remote default
85 |
86 | myremote
87 | ```
88 |
89 | Change default remote value:
90 |
91 | ```dvc
92 | $ dvc remote default mynewremote
93 | ```
94 |
95 | In the DVC config file, the updated value of default remote can be found in the
96 | core section (run `cat .dvc/config`):
97 |
98 | ```ini
99 | [core]
100 | remote = mynewremote
101 | ```
102 |
103 | Clear/unset default remote value:
104 |
105 | ```dvc
106 | $ dvc remote default -u
107 | ```
108 |
--------------------------------------------------------------------------------
/static/docs/tutorial/preparation.md:
--------------------------------------------------------------------------------
1 | # Preparation
2 |
3 | In this document, we will be building an ML model to classify
4 | [StackOverflow](https://stackoverflow.com) questions into two classes: with the
5 | `python` tag and without it. For training purposes, a small subset of the data
6 | will be used: only 180 MB of XML files.
7 |
8 | Most of the code for the problem is ready and will be downloaded in the first
9 | steps. Later we will be modifying the code a bit to improve the model.
10 |
11 | ## Getting the sample code
12 |
13 | Take the following steps to initialize a new Git repository and get the sample
14 | code into it:
15 |
16 |
17 |
18 | ### Expand to learn how to download on Windows
19 |
20 | Windows does not ship the `wget` utility by default, so you'll need to use a browser
21 | to download `code.zip`.
22 |
23 |
24 |
25 | ```dvc
26 | $ mkdir classify
27 | $ cd classify
28 | $ git init
29 | $ wget https://dvc.org/s3/so/code.zip
30 | $ unzip code.zip -d code && rm -f code.zip
31 | $ git add code/
32 | $ git commit -m "download code"
33 | ```
34 |
35 | (Optional) It's highly recommended to initialize a virtual environment to keep
36 | your global packages clean and untouched:
37 |
38 | ```dvc
39 | $ virtualenv .env
40 | $ source .env/bin/activate
41 | $ echo ".env/" >> .git/info/exclude
42 | ```
43 |
44 | Install the code requirements:
45 |
46 | ```dvc
47 | $ pip install -r code/requirements.txt
48 | ```
49 |
50 | ## Install DVC
51 |
52 | Next, install DVC itself. The easiest way to do so is with a system-dependent
53 | package. DVC supports all common operating systems: Mac OS X, Linux
54 | and Windows. You can find the latest version of the package on the
55 | [home page](https://dvc.org).
56 |
57 | Alternatively, if you use Python, you can install DVC with the Python package
58 | manager, pip:
59 |
60 | ```dvc
61 | $ pip install dvc
62 | ```
63 |
64 | ## Initialize
65 |
66 | DVC works on top of Git repositories. You run DVC initialization in a repository
67 | directory to create DVC metafiles and directories.
68 |
69 | After DVC initialization, a new directory `.dvc/` will be created with `config`
70 | and `.gitignore` files and `cache` directory. These files and directories are
71 | hidden from the user generally and are not meant to be manipulated directly.
72 | However, we describe some DVC internals below for a better understanding of how
73 | it works.
74 |
75 | ```dvc
76 | $ dvc init
77 | ...
78 |
79 | $ ls -a .dvc
80 | ./ ../ .gitignore cache/ config
81 |
82 | $ git status -s
83 | A .dvc/.gitignore
84 | A .dvc/config
85 |
86 | $ cat .dvc/.gitignore
87 | /state
88 | /lock
89 | /config.local
90 | /updater
91 | /updater.lock
92 | /state-journal
93 | /state-wal
94 | /cache
95 |
96 | $ git commit -m "init DVC"
97 | ```
98 |
99 | The `.dvc/cache` directory is one of the most important parts of any DVC
100 | repository. The directory contains all the content of data files and will be
101 | described in the next chapter in more detail. The most important part about this
102 | directory is that it is listed in the `.dvc/.gitignore` file, which means
103 | that the cache directory is not under Git control — this is your local directory
104 | and you cannot push it to any Git remote.
105 |
106 | For more information refer to
107 | [DVC Files and Directories](https://dvc.org/doc/user-guide/dvc-files-and-directories).
108 |
--------------------------------------------------------------------------------
/static/docs/get-started/add-files.md:
--------------------------------------------------------------------------------
1 | # Add Files or Directories
2 |
3 | DVC allows storing and versioning source data files, ML models, directories, and
4 | intermediate results with Git, without checking the file contents into Git.
5 | Let's get a sample data set to play with:
6 |
7 |
8 |
9 | ### Expand if you're on Windows or having problems downloading from command line
10 |
11 | If you experienced problems using `wget` or you're on Windows and you don't want
12 | to install it, you'll need to use a browser to download `data.xml` and save it
13 | into `data` subdirectory. To download, right-click
14 | [this link](https://dvc.org/s3/get-started/data.xml) and click `Save link as`
15 | (Chrome) or `Save object as` (Firefox).
16 |
17 |
18 |
19 | ```dvc
20 | $ mkdir data
21 | $ wget https://dvc.org/s3/get-started/data.xml -O data/data.xml
22 | ```
23 |
24 | To take a file (or a directory) under DVC control, just run `dvc add`. It accepts
25 | any **file** or **directory**:
26 |
27 | ```dvc
28 | $ dvc add data/data.xml
29 | ```
30 |
31 | DVC stores information about your data file in a special `.dvc` file, that has a
32 | human-readable [description](/doc/user-guide/dvc-file-format) and can be
33 | committed to Git to track versions of your file:
34 |
35 | ```dvc
36 | $ git add data/.gitignore data/data.xml.dvc
37 | $ git commit -m "add source data to DVC"
38 | ```
39 |
40 |
41 |
42 | ### Expand to learn about DVC internals
43 |
44 | You can see that the actual data file has been moved to the `.dvc/cache` directory,
45 | while the entries in the working directory may be links to the actual files in
46 | the DVC cache. (See
47 | [File link types](/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
48 | to learn about the supported file linking options, their tradeoffs, and how to
49 | enable them).
50 |
51 | ```dvc
52 | $ ls -R .dvc/cache
53 | .dvc/cache/a3:
54 | 04afb96060aad90176268345e10355
55 | ```
56 |
57 | where `a304afb96060aad90176268345e10355` is an MD5 hash of the `data.xml` file.
58 | And if you check the `data/data.xml.dvc` meta-file you will see that it has this
59 | hash inside.
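The relationship between the hash and the cache path can be sketched in a few lines of Python. This is an illustration inferred from the listing above, not DVC's code: the helper names are hypothetical, and the layout (first two hex characters as the directory, the remaining 30 as the file name) is assumed from the `a3/04afb...` example.

```python
import hashlib


def cache_path_for_hash(md5_hex, cache_dir=".dvc/cache"):
    """Split a 32-character MD5 hex digest into <cache_dir>/<2 chars>/<30 chars>."""
    return "/".join([cache_dir, md5_hex[:2], md5_hex[2:]])


def cache_path_for_content(data, cache_dir=".dvc/cache"):
    """Hash the file content with MD5, then map the digest to a cache path."""
    return cache_path_for_hash(hashlib.md5(data).hexdigest(), cache_dir)


# The hash from the example above maps back to the path shown by `ls -R`.
print(cache_path_for_hash("a304afb96060aad90176268345e10355"))
```

Because the path is derived from the content alone, two files with identical content would map to the same cache entry in this sketch.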
60 |
61 |
62 |
63 |
64 |
65 | ### Expand for an important note on cache performance
66 |
67 | DVC tries to use reflinks\* by default to link your data files from the DVC
68 | cache to the workspace, optimizing speed and storage space. However, reflinks
69 | are not widely supported yet and DVC falls back to actually copying data files
70 | to/from the cache **which can be very slow with large files**, and duplicates
71 | storage requirements.
72 |
73 | Hardlinks and symlinks are also available for optimized cache linking but
74 | (unlike reflinks) they carry the risk of accidentally corrupting the cache if
75 | tracked data files are modified in the workspace.
76 |
77 | See [Large Dataset Optimization](/docs/user-guide/large-dataset-optimization)
78 | and `dvc config cache` for more information.
79 |
80 | > \***copy-on-write links or "reflinks"** are a relatively new way to link files
81 | > in UNIX-style file systems. Unlike hardlinks or symlinks, they support
82 | > transparent [copy on write](https://en.wikipedia.org/wiki/Copy-on-write). This
83 | > means that editing a reflinked file is always safe: the edit goes to a private
84 | > copy, so the other links to the file do not reflect the changes.
85 |
86 |
87 |
88 | Refer to
89 | [Data and Model Files Versioning](/doc/use-cases/data-and-model-files-versioning),
90 | `dvc add`, and `dvc run` for more information on storing and versioning data
91 | files with DVC.
92 |
--------------------------------------------------------------------------------
/static/docs/commands-reference/pipeline_show.md:
--------------------------------------------------------------------------------
1 | # show
2 |
3 | Show stages in a pipeline that lead to the specified stage. By default it lists
4 | stage files (usually `.dvc` files). Use the `-c` or `-o` option to list or
5 | visualize the pipeline's commands or data file flow instead.
6 |
7 | ## Synopsis
8 |
9 | ```usage
10 | usage: dvc pipeline show [-h] [-q | -v] [-c | -o]
11 | [--dot DOT] [--ascii]
12 | [--tree] [-l]
13 | [targets [targets ...]]
14 |
15 | positional arguments:
16 | targets DVC files.
17 | ```
18 |
19 | ## Options
20 |
21 | - `-c`, `--commands` - show pipeline as a list (graph, if `--ascii` or `--dot`
22 | option is specified) of commands instead of paths to DVC files.
23 |
24 | - `-o`, `--outs` - show pipeline as a list (graph, if `--ascii` or `--dot`
25 | option is specified) of stage output files instead of paths to DVC files.
26 |
27 | - `--ascii` - visualize pipeline. It will print a graph (ASCII) instead of a
28 | list of paths to DVC files.
29 |
30 | - `--dot` - show contents of `.dot` files with a DVC pipeline graph. It can be
31 | passed to third party visualization utilities.
32 |
33 | - `--tree` - list the dependency tree like a recursive directory listing.
34 |
35 | - `-l`, `--locked` - print locked DVC stages only.
36 |
37 | ## Examples
38 |
39 | - Default mode: show the stages that `output.dvc` recursively depends on:
40 |
41 | ```dvc
42 | $ dvc pipeline show output.dvc
43 |
44 | raw.dvc
45 | data.dvc
46 | output.dvc
47 | ```
48 |
49 | - The same as previous, but show commands instead of DVC files:
50 |
51 | ```dvc
52 | $ dvc pipeline show output.dvc --commands
53 |
54 | download.py s3://mybucket/myrawdata raw
55 | cleanup.py raw data
56 | process.py data output
57 | ```
58 |
59 | - Visualize DVC pipeline:
60 |
61 | ```dvc
62 | $ dvc pipeline show eval.txt.dvc --ascii
63 | .------------------------.
64 | | data/Posts.xml.zip.dvc |
65 | `------------------------'
66 | *
67 | *
68 | *
69 | .---------------.
70 | | Posts.xml.dvc |
71 | `---------------'
72 | *
73 | *
74 | *
75 | .---------------.
76 | | Posts.tsv.dvc |
77 | `---------------'
78 | *
79 | *
80 | *
81 | .---------------------.
82 | | Posts-train.tsv.dvc |
83 | `---------------------'
84 | *
85 | *
86 | *
87 | .--------------------.
88 | | matrix-train.p.dvc |
89 | `--------------------'
90 | *** ***
91 | ** ***
92 | ** **
93 | .-------------. **
94 | | model.p.dvc | **
95 | `-------------' ***
96 | *** ***
97 | ** **
98 | ** **
99 | .--------------.
100 | | eval.txt.dvc |
101 | `--------------'
102 | ```
103 |
104 | - List dependencies recursively if the graph has a tree structure:
105 |
106 | ```dvc
107 | $ dvc pipeline show e.file.dvc --tree
108 | e.file.dvc
109 | ├── c.file.dvc
110 | │ └── b.file.dvc
111 | │ └── a.file.dvc
112 | └── d.file.dvc
113 | ```
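For readers curious how such a listing can be produced, here is a minimal Python sketch that renders a dependency mapping as a tree. It is not DVC's implementation. The `render_tree` helper is hypothetical, and the `deps` mapping is an assumption reconstructed from the example output above.

```python
def render_tree(deps, root):
    """Return the lines of a tree listing for `root`, given {stage: [dependencies]}."""
    lines = [root]

    def walk(node, prefix):
        children = deps.get(node, [])
        for i, child in enumerate(children):
            last = i == len(children) - 1
            # The last child gets a corner, the others a tee.
            lines.append(prefix + ("└── " if last else "├── ") + child)
            # Below a non-last child, keep the vertical bar running.
            walk(child, prefix + ("    " if last else "│   "))

    walk(root, "")
    return lines


# Assumed dependency edges matching the example: e -> c, d; c -> b; b -> a.
deps = {
    "e.file.dvc": ["c.file.dvc", "d.file.dvc"],
    "c.file.dvc": ["b.file.dvc"],
    "b.file.dvc": ["a.file.dvc"],
}
print("\n".join(render_tree(deps, "e.file.dvc")))
```

The sketch assumes the graph really is a tree. A stage shared by several parents would simply be printed once under each of them.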
114 |
--------------------------------------------------------------------------------
/src/PromoSection/index.js:
--------------------------------------------------------------------------------
1 | import React from 'react'
2 | import styled from 'styled-components'
3 | import { media } from '../styles'
4 | import { logEvent } from '../utils/ga'
5 |
6 | const getStarted = () => {
7 | logEvent('promo', 'get-started')
8 | window.location = '/doc/get-started'
9 | }
10 |
11 | const features = () => {
12 | logEvent('promo', 'features')
13 | window.location = '/features'
14 | }
15 |
16 | export default ({}) => (
17 |
18 |
19 |
20 |
21 | For data scientists, by data scientists
22 |
23 |
26 |
27 |
28 |
29 |
30 |
31 | )
32 |
33 | const PromoSection = styled.section`
34 | position: relative;
35 | height: 278px;
36 | background-color: #945dd6;
37 | display: flex;
38 | align-items: center;
39 | justify-content: center;
40 | `
41 |
42 | const Container = styled.div`
43 | width: 100%;
44 | max-width: 1035px;
45 | `
46 |
47 | const Title = styled.h3`
48 | font-family: BrandonGrotesqueMed;
49 | max-width: 438px;
50 | min-height: 44px;
51 | font-size: 30px;
52 | font-weight: 500;
53 | text-align: center;
54 | color: #ffffff;
55 | margin: 0px auto;
56 | `
57 |
58 | const Buttons = styled.div`
59 | display: flex;
60 | max-width: 386px;
61 | margin: 0px auto;
62 | margin-top: 20px;
63 | align-items: center;
64 | flex-direction: row;
65 | ${media.phablet`
66 | flex-direction: column;
67 | `};
68 | `
69 |
70 | const Button = styled.button`
71 | font-family: BrandonGrotesqueMed;
72 | cursor: pointer;
73 | min-width: 186px;
74 | height: 60px;
75 | border-radius: 4px;
76 | background-color: #945dd6;
77 | border: solid 2px rgba(255, 255, 255, 0.3);
78 |
79 | font-size: 20px;
80 | font-weight: 500;
81 | line-height: 0.9;
82 |
83 | text-align: left;
84 | padding: 0px 21px;
85 |
86 | color: #ffffff;
87 |
88 | background: url('/static/img/arrow_right_white.svg') right center no-repeat;
89 | background-position-x: 147px;
90 | transition: 0.2s background-color ease-out;
91 |
92 | &:hover {
93 | background-color: #885ccb;
94 | }
95 |
96 | ${props =>
97 | props.first &&
98 | `
99 | color: #945dd6;
100 | margin-right: 14px;
101 | border-radius: 4px;
102 | background-color: #ffffff;
103 | box-shadow: 0 2px 4px 0 rgba(0, 0, 0, 0.21);
104 |
105 | background-image: url('/static/img/arrow_right_dark.svg');
106 | transition: 0.2s background-color ease-out;
107 |
108 | &:hover {
109 | background-color: #F5F5F5
110 | }
111 |
112 | ${media.phablet`
113 | margin-right: 0px;
114 | `}
115 | `};
116 |
117 | ${media.phablet`
118 | margin-bottom: 12px;
119 | margin-right: 0px !important;
120 | `};
121 | `
122 |
123 | const Glyph = styled.img`
124 | position: absolute;
125 | z-index: 0;
126 | width: 158px;
127 | height: auto;
128 |
129 | ${media.tablet`
130 | width: 110px;
131 | `};
132 |
133 | object-fit: contain;
134 |
135 | ${props =>
136 | props.gid === 'topleft' &&
137 | `
138 | top: -25px;
139 | left: 40px;
140 | `} ${props =>
141 | props.gid === 'rigthbottom' &&
142 | `
143 | bottom: -60px;
144 | right: 30px;
145 | `};
146 |
147 | ${media.phablet`
148 | display: none;
149 | `};
150 | `
151 |
--------------------------------------------------------------------------------
/static/docs/changelog/0.35.md:
--------------------------------------------------------------------------------
1 | # v0.19 - v0.35
2 |
3 | We've launched the
4 | [DVC Patreon campaign](https://www.patreon.com/DVCorg/overview) - it's one of
5 | the ways to support the project if you like it.
6 |
7 | Now, let’s **highlight the changes** (not including bug fixes, and minor
8 | improvements) we have done in the last few months:
9 |
10 | - 🏷 We received a lot of feedback that using Git branches is not always an
11 | optimal way to manage experiments. We have added an option to **support Git
12 | tags** (Git commits are coming). The new option `-T` or `--all-tags` is
13 | supported by all DVC commands that support `-a` or `--all-branches`.
14 |
15 | - 📖 [Get started guide](https://dvc.org/doc/get-started/agenda) has been
16 | simplified (e.g. to use tags instead of branches) and extended. We have also
17 | prepared a
18 | [GitHub DVC project](https://github.com/iterative/example-get-started) that
19 | reflects the sequence of steps in the “get started” guide. You can now
20 | download the whole project and reproduce all the models.
21 |
22 | - **`dvc diff`** **command introduced**. It shows summary statistics for a
23 | directory/file under DVC control: how many files were added, deleted, or
24 | modified, and how the size changed:
25 |
26 | ```diff
27 | (HEAD)$ tree image (HEAD^)$ tree image
28 | images images
29 | ├── color.png └── grey.png
30 | └── grey.png
31 | ```
32 |
33 | ```dvc
34 | $ dvc diff -t images HEAD^1
35 |
36 | diff for 'images'
37 | -images with md5 ad0a6adcd409cae3263b28487064e1f2.dir
38 | +images with md5 283215dface0d41291482330324632fc.dir
39 |
40 | 1 file not changed, 0 files modified, 1 file added, 0 files deleted, size was increased by 15.3 MB
41 | ```
42 |
43 | - We’ve introduced the `dvc commit` command and `dvc run/repro/add --no-commit`
44 | flag to give a way to **avoid uncontrolled cache growth** and as a way to save
45 | some `dvc repro` runs. In the future we plan to have “do-not-cache-my-data” as
46 | a default mode for `dvc run`, `dvc add` and `dvc repro`.
47 | - **SSH remotes (data storage) support** - config options to set port, key
48 | files, timeouts, password, etc + improved stability and Windows support!
49 | Introduced **HTTP remotes** - external dependencies and as a read-only cache.
50 | - **Control over where DVC files are located in your project** - place them
51 | wherever you want with the `-f` option supported by all relevant commands -
52 | `dvc add`, `dvc run`, and `dvc import`.
53 | - 🙂 A lot of **UI improvements**, starting from the finally fixed nasty issue
54 | with Windows terminal printing a lot of garbage symbols, to using progress
55 | bars for checkouts, better metrics output, and lots of smaller things:
56 | 
57 |
58 | - **⚡️Performance optimizations.** The most notable one is the migration from
59 | using the plain JSON file to the embedded SQLite engine to cache file and
60 | directory checksums, another one is improved performance, stability and
61 | general user experience for the commands that navigate tags or branches (all
62 | the commands that include `--all-branches`, `-a` or `--all-tags`, `-T`).
63 |
64 | There are new
65 | [DVC integrations and plugins](https://dvc.org/doc/user-guide/plugins)
66 | available:
67 |
68 | - Finally there is an official
69 | [Bash and Zsh completion](https://dvc.org/doc/user-guide/autocomplete) for
70 | DVC!
71 | - David Příhoda contributed and is developing the
72 | [JetBrains IDEs plugin](https://plugins.jetbrains.com/plugin/11368-data-version-control-dvc-support)
73 | (PyCharm, IntelliJ, etc).
74 |
75 | Don't hesitate to
76 | [like/star the DVC repository](https://github.com/iterative/dvc/stargazers) if you
77 | haven't yet. We are waiting for your feedback!
78 |
--------------------------------------------------------------------------------