├── _posts ├── .gitkeep ├── 2016-10-19-current-environment.md ├── 2016-10-19-contributing-to-the-documentation.md ├── 2025-11-23-using-the-current-environment.md ├── 2016-10-19-code-of-conduct.md ├── 2016-10-19-contributing-with-code.md ├── 2025-11-09-linear-regression.md └── 2025-11-09-roadmap.md ├── .gitignore ├── README.md ├── _layouts ├── page.html └── default.html ├── _includes ├── header.html ├── google_analytics.html ├── footer.html ├── disqus.html └── navigation.html ├── _drafts └── 2018-01-18-basic-setup.md ├── index.md ├── _config.yml ├── css ├── main.css └── syntax.css ├── bin └── jekyll-page └── toc.js /_posts/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.sw? 2 | _site 3 | _pages 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Want to contribute? 2 | 3 | Read the docs: http://datahaskell.org/docs 4 | -------------------------------------------------------------------------------- /_layouts/page.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | --- 4 | 5 | 10 | 11 | {{ content }} 12 | -------------------------------------------------------------------------------- /_includes/header.html: -------------------------------------------------------------------------------- 1 |

{{ site.title }} 2 | {% if site.subtitle %}{{ site.subtitle }}{% endif %} 3 |

4 | 11 | 12 | -------------------------------------------------------------------------------- /_includes/google_analytics.html: -------------------------------------------------------------------------------- 1 | 10 | -------------------------------------------------------------------------------- /_includes/footer.html: -------------------------------------------------------------------------------- 1 | 19 | 20 | -------------------------------------------------------------------------------- /_includes/disqus.html: -------------------------------------------------------------------------------- 1 |
2 | 13 | 14 | -------------------------------------------------------------------------------- /_drafts/2018-01-18-basic-setup.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "basic setup for the beginner tutorials" 4 | category: tutorial 5 | date: 2018-01-18 13:14:00 6 | --- 7 | 8 | Welcome to the beginner tutorials. The basic development setup suggested below is intended to give beginners a starting point and to match the workflow used in the other tutorials. Feel free to ignore it or use an alternative setup of your preference. 9 | 10 | ## Dependencies 11 | 12 | - Haskell [Stack](https://haskell-lang.org/get-started) build system 13 | - [emacs](https://www.gnu.org/software/emacs/) editor/IDE 14 | - [Intero](https://haskell-lang.org/intero) Haskell interactive development extension for IDEs 15 | 16 | ## Basic workflow 17 | 18 | - [reading a csv](http://howistart.org/posts/haskell/1/) Chris Allen has written a very approachable tutorial describing the whole basic workflow, from creating a Haskell project to reading a CSV file to summing how many *at bats* are present in the file. It also showcases something that Haskell lets you do more conveniently than other languages: streaming the data in constant memory. 19 | 20 | -------------------------------------------------------------------------------- /_includes/navigation.html: -------------------------------------------------------------------------------- 1 | 28 | -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Documentation" 4 | --- 5 | 6 | # DataHaskell 7 | 8 | Welcome to the **dataHaskell** documentation page. 9 | 10 | DataHaskell is: 11 | * A collaborative network of maintainers, users, researchers, and educators. 12 | * A curated ecosystem of libraries for data access, numerics, ML, visualization, and tooling. 13 | * A supportive space to learn, teach, and build together. 14 | 15 | We coordinate work across many projects and orgs. You don’t have to be an expert—curiosity is enough. 16 | 17 | ### How to Participate 18 | Pick a path that suits your energy and time: 19 | * Try it out: get started with the DataHaskell stack and port some of your workflows to DataHaskell. 20 | * Engage the material: Read through some tutorials and give feedback on them. 21 | * Ask & answer questions: help someone today in Discord or Discourse. 22 | * Improve docs: fix a typo, add an example, or clarify a concept. 23 | * Triage issues: label, reproduce, and suggest next steps. 24 | * Build examples: small, runnable notebooks or scripts showcasing a pattern. 25 | * Propose an initiative: open an RFC for a cross‑project improvement. 26 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | # Site title and subtitle. This is used in _includes/header.html 2 | title: 'dataHaskell' 3 | subtitle: 'Documentation and general information' 4 | 5 | # if you wish to integrate disqus on pages set your shortname here 6 | disqus_shortname: 'datahaskell' 7 | 8 | # if you use google analytics, add your tracking id here 9 | google_analytics_id: '' 10 | 11 | # Enable/show navigation. 
There are three options: 12 | # 0 - always hide 13 | # 1 - always show 14 | # 2 - show only if posts are present 15 | navigation: 2 16 | 17 | # URL to source code, used in _includes/footer.html 18 | codeurl: 'https://github.com/datahaskell/docs' 19 | 20 | # Default categories (in order) to appear in the navigation 21 | sections: [ 22 | ['getting_started', 'Getting Started'], 23 | ['community', 'Community'], 24 | ['tutorial', 'Tutorials'], 25 | ['library', 'Libraries'], 26 | ['help', 'Help wanted'], 27 | ['other', 'Other'] 28 | ] 29 | 30 | # Keep as an empty string if served up at the root. If served up at a specific 31 | # path (e.g. on GitHub pages) leave off the trailing slash, e.g. /my-project 32 | baseurl: 'https://www.datahaskell.org/docs' 33 | 34 | # Dates are not included in permalinks 35 | permalink: none 36 | 37 | # Syntax highlighting 38 | highlighter: rouge 39 | 40 | # Since these are pages, it doesn't really matter 41 | future: true 42 | 43 | # Exclude non-site files 44 | exclude: ['bin', 'README.md'] 45 | 46 | # Use the kramdown Markdown renderer 47 | markdown: kramdown 48 | redcarpet: 49 | extensions: [ 50 | 'no_intra_emphasis', 51 | 'fenced_code_blocks', 52 | 'autolink', 53 | 'strikethrough', 54 | 'superscript', 55 | 'with_toc_data', 56 | 'tables', 57 | 'hardwrap' 58 | ] 59 | -------------------------------------------------------------------------------- /css/main.css: -------------------------------------------------------------------------------- 1 | body { 2 | font-weight: 400; 3 | font-size: 1.15em; 4 | text-shadow: 0 1px 1px rgba(255, 255, 255, 0.7); 5 | display: flex; 6 | min-height: 100vh; 7 | flex-direction: column; 8 | } 9 | 10 | h1 { 11 | font-size: 3em; 12 | } 13 | 14 | h2 { 15 | font-size: 2.5em; 16 | } 17 | 18 | h3 { 19 | font-size: 2em; 20 | } 21 | 22 | main { 23 | flex: 1 0 auto; 24 | } 25 | 26 | pre, code, pre code { 27 | border: none; 28 | border-radius: 0; 29 | background-color: #f9f9f9; 30 | font-size: 0.95em; 31 | } 32 | 33 | .highlight { 34 | background-color: #f9f9f9; 35 | } 36 | 37 | pre { 38 | /*font-size: 1em;*/ 39 | } 40 | 41 | code { 42 | color: inherit; 43 | } 44 | 45 | #header { 46 | border-bottom: 1px solid #eee; 47 | margin-bottom: 20px; 48 | } 49 | 50 | #header a:hover { 51 | text-decoration: none; 52 | } 53 | 54 | #footer { 55 | margin: 20px 0; 56 | font-size: 0.85em; 57 | color: #999; 58 | text-align: center; 59 | } 60 | body.rtl { 61 | direction: rtl; 62 | } 63 | 64 | body.rtl #header .brand { 65 | float: right; 66 | margin-left: 5px; 67 | } 68 | body.rtl .row-fluid [class*="span"] { 69 | float: right !important; 70 | margin-left: 0; 71 | margin-right: 2.564102564102564%; 72 | } 73 | body.rtl .row-fluid [class*="span"]:first-child { 74 | margin-right: 0; 75 | } 76 | 77 | body.rtl ul, body.rtl ol { 78 | margin: 0 25px 10px 0; 79 | } 80 | 81 | table { 82 | margin-bottom: 1rem; 83 | border: 1px solid #e5e5e5; 84 | border-collapse: collapse; 85 | } 86 | 87 | td, th { 88 | padding: .25rem .5rem; 89 | border: 1px solid #e5e5e5; 90 | } 91 | 92 | #header-img { 93 | filter: brightness(70%); 94 | } 95 | 96 | .p-coll { 97 | font-size: 0.8em; 98 | } 99 | 100 | .side-nav{ 101 | padding-right: -10%; 102 | overflow: hidden; 103 | } 104 | 105 | .side-nav:hover { 106 | overflow-y: scroll; 107 | } 108 | -------------------------------------------------------------------------------- /_posts/2016-10-19-current-environment.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Supported 
libraries" 4 | category: community 5 | date: 2016-10-19 21:08:18 6 | --- 7 | 8 | Libraries written or supported by members of the DataHaskell team. 9 | 10 | ## Notebooks 11 | - **IHaskell**[](https://github.com/IHaskell/IHaskell){:.github} [![Hackage](https://img.shields.io/hackage/v/ihaskell.svg)](https://hackage.haskell.org/package/ihaskell) [![ihaskell](http://stackage.org/package/hasktorch/badge/nightly)](http://stackage.org/nightly/package/hasktorch) : IHaskell is a kernel for the Jupyter project, which allows you to use Haskell inside Jupyter frontends (including the console and notebook). 12 |

Maintainers: [Vaibhav Sagar](https://github.com/vaibhavsagar) 13 | 14 | ## Machine Learning 15 | 16 | ### Neural Networks 17 | - **hasktorch** [](https://github.com/hasktorch/hasktorch){:.github} [![Hackage](https://img.shields.io/hackage/v/hasktorch.svg)](https://hackage.haskell.org/package/hasktorch) [![hasktorch](http://stackage.org/package/hasktorch/badge/nightly)](http://stackage.org/nightly/package/hasktorch) : Hasktorch is a library for tensors and neural networks in Haskell. 18 |

Maintainers: [Junji Hashimoto](https://github.com/junjihashimoto) 19 | 20 | ## Publication 21 | 22 | - **pandoc-plot** [](https://github.com/LaurentRDC/pandoc-plot)[![Hackage](https://img.shields.io/hackage/v/pandoc-plot.svg)](https://hackage.haskell.org/package/pandoc-plot) [![pandoc-plot](http://stackage.org/package/pandoc-plot/badge/nightly)](http://stackage.org/nightly/package/pandoc-plot): A Pandoc filter to include figures generated from code blocks. Keep the document and code in the same location. 23 |

Maintainers: [Laurent P. René de Cotret](https://github.com/LaurentRDC) 24 | 25 | ## Data structures 26 | 27 | ### Data frames 28 | 29 | - **dataframe** [](https://github.com/mchav/dataframe)[![Hackage](https://img.shields.io/hackage/v/dataframe.svg)](https://hackage.haskell.org/package/dataframe) [![dataframe](http://stackage.org/package/dataframe/badge/nightly)](http://stackage.org/nightly/package/dataframe) : A fast, safe, and intuitive DataFrame library. 30 |

Maintainers: [Michael Chavinda](https://github.com/mchav) 31 | -------------------------------------------------------------------------------- /_posts/2016-10-19-contributing-to-the-documentation.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Contributing to the documentation" 4 | category: community 5 | date: 2016-10-19 15:43:05 6 | --- 7 | **dataHaskell**'s documentation is served as a Jekyll site, using GitHub pages. 8 | 9 | ## Steps for contribution 10 | 11 | 1. **Fork** the [dataHaskell/docs](https://github.com/DataHaskell/docs) repository. 12 | 2. If you want to add a page, in the root of the repository **execute** `ruby bin/jekyll-page "TITLE" CATEGORY` where `TITLE` is the title of the page that you want to add, and `CATEGORY` is one of these: 13 | - `community` - Documentation related to the community. Contribution guidelines, codes of conduct, information of interest, events... 14 | - `tutorial` - Tutorial on how to achieve different data science or Haskell goals. From doing a regression, to understanding how SubHask works. 15 | - `library` - Library overviews. Benchmarks, documentation for them (if no good official documentation is provided), advantages and disadvantages. 16 | - `other` - Before submitting any page to this category, discuss with the community if a new category should be created. 17 | 3. Optionally, but ideally, do a `jekyll serve` in the root of the repository to be sure that all the contributions you've made are displayed correctly (see the example session at the end of this page). 18 | 19 | ## Things to keep in mind 20 | 21 | - Where possible, adhere to the existing formats. For example, if adding a library overview try to mimic as much as possible the format of other pages in the same category. If you are submitting something new, ask in the community. 22 | - When writing informative material, which will be most of the time, try to explain as if for a person who knows nothing but can learn anything in a split second, avoiding acronyms. For example, instead of saying: *"Here we use a DRNN for..."* try to write *"Here we use a Deep Recursive Neural Network (which is a kind of machine learning algorithm) for..."*. Even if this is not 100% true, it might save the reader a couple of minutes googling and figuring out how/why it works. The reader can then dig deeper into the subject if they wish. 23 | - **Use a lot of examples**, even if they are really simple. Most people learn better from examples than from theorems or just words. If you are able to, include graphics, plots, code snippets or whatever comes to your mind. 
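Putting the steps above together, a typical contribution session might look like this (a sketch; the fork URL and page title are placeholders):

```bash
# Clone your fork of dataHaskell/docs (replace YOUR_USERNAME)
git clone https://github.com/YOUR_USERNAME/docs
cd docs

# Scaffold a page: bin/jekyll-page slugifies the title, so this creates
# _posts/<today>-my-first-tutorial.md in the 'tutorial' category
ruby bin/jekyll-page "My first tutorial" tutorial

# Preview the site locally before opening a pull request
jekyll serve
```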
24 | -------------------------------------------------------------------------------- /bin/jekyll-page: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env ruby 2 | 3 | require 'date' 4 | require 'optparse' 5 | 6 | options = { 7 | # Expects to be in the bin/ sub-directory by default 8 | :path => File.dirname(File.dirname(__FILE__)) 9 | } 10 | 11 | parser = OptionParser.new do |opt| 12 | opt.banner = 'usage: jekyll-page TITLE CATEGORY [FILENAME] [OPTIONS]' 13 | opt.separator '' 14 | opt.separator 'Options' 15 | opt.on('-e', '--edit', 'Edit the page') do |edit| 16 | options[:edit] = true 17 | end 18 | opt.on('-l', '--link', 'Relink pages') do |link| 19 | options[:link] = true 20 | end 21 | opt.on('-p PATH', '--path PATH', String, 'Path to project root') do |path| 22 | options[:path] = path 23 | end 24 | opt.separator '' 25 | end 26 | 27 | parser.parse! 28 | 29 | title = ARGV[0] 30 | category = ARGV[1] 31 | filename = ARGV[2] 32 | 33 | # Resolve any relative links 34 | BASE_DIR = File.expand_path(options[:path]) 35 | POSTS_DIR = "#{BASE_DIR}/_posts" 36 | PAGES_DIR = "#{BASE_DIR}/_pages" 37 | 38 | # Ensure the _posts directory exists (we are in the correct directory) 39 | if not Dir.exists?(POSTS_DIR) 40 | puts "#{POSTS_DIR} directory does not exists" 41 | exit 42 | end 43 | 44 | # Create _pages directory if it doesn't exist 45 | if not Dir.exists?(PAGES_DIR) 46 | Dir.mkdir(PAGES_DIR) 47 | end 48 | 49 | if options[:link] 50 | Dir.foreach(POSTS_DIR) do |name| 51 | next if name[0] == '.' 52 | nodate = name[/\d{4}-\d{2}-\d{2}-(?.*)/, 'rest'] 53 | if File.symlink?("#{PAGES_DIR}/#{nodate}") 54 | File.delete("#{PAGES_DIR}/#{nodate}") 55 | end 56 | abspath = File.absolute_path("#{POSTS_DIR}/#{name}") 57 | File.symlink(abspath, "#{PAGES_DIR}/#{nodate}") 58 | end 59 | end 60 | 61 | if not title or not category 62 | # This flag can be used by itself, exit silently if no arguments 63 | # are defined 64 | if not options[:link] 65 | puts parser 66 | end 67 | exit 68 | end 69 | 70 | if not filename 71 | filename = title.downcase.gsub(/[^a-z0-9\s]/, '').gsub(/\s+/, '-') 72 | end 73 | 74 | today=Date.today().strftime('%F') 75 | now=DateTime.now().strftime('%F %T') 76 | 77 | filepath = "#{POSTS_DIR}/#{today}-#{filename}.md" 78 | symlink = "#{PAGES_DIR}/#{filename}.md" 79 | 80 | if File.exists?(filepath) 81 | puts "File #{filepath} already exists" 82 | exit 83 | end 84 | 85 | content = <Jump to...', 7 | minimumHeaders: 3, 8 | headers: 'h1, h2, h3, h4, h5, h6', 9 | listType: 'ol', // values: [ol|ul] 10 | showEffect: 'show', // values: [show|slideDown|fadeIn|none] 11 | showSpeed: 'slow', // set to 0 to deactivate effect 12 | classes: { list: '', 13 | item: '' 14 | } 15 | }, 16 | settings = $.extend(defaults, options); 17 | 18 | function fixedEncodeURIComponent (str) { 19 | return encodeURIComponent(str).replace(/[!'()*]/g, function(c) { 20 | return '%' + c.charCodeAt(0).toString(16); 21 | }); 22 | } 23 | 24 | function createLink (header) { 25 | var innerText = (header.textContent === undefined) ? 
header.innerText : header.textContent; 26 | return "<a href='#" + fixedEncodeURIComponent(header.id) + "'>" + innerText + "</a>"; 27 | } 28 | 29 | var headers = $(settings.headers).filter(function() { 30 | // get all headers with an ID 31 | var previousSiblingName = $(this).prev().attr( "name" ); 32 | if (!this.id && previousSiblingName) { 33 | this.id = $(this).attr( "id", previousSiblingName.replace(/\./g, "-") ); 34 | } 35 | return this.id; 36 | }), output = $(this); 37 | if (!headers.length || headers.length < settings.minimumHeaders || !output.length) { 38 | $(this).hide(); 39 | return; 40 | } 41 | 42 | if (0 === settings.showSpeed) { 43 | settings.showEffect = 'none'; 44 | } 45 | 46 | var render = { 47 | show: function() { output.hide().html(html).show(settings.showSpeed); }, 48 | slideDown: function() { output.hide().html(html).slideDown(settings.showSpeed); }, 49 | fadeIn: function() { output.hide().html(html).fadeIn(settings.showSpeed); }, 50 | none: function() { output.html(html); } 51 | }; 52 | 53 | var get_level = function(ele) { return parseInt(ele.nodeName.replace("H", ""), 10); }; 54 | var highest_level = headers.map(function(_, ele) { return get_level(ele); }).get().sort()[0]; 55 | var return_to_top = '<i class="icon-arrow-up back-to-top"> </i>'; 56 | 57 | var level = get_level(headers[0]), 58 | this_level, 59 | html = settings.title + " <" +settings.listType + " class=\"" + settings.classes.list +"\">"; 60 | headers.on('click', function() { 61 | if (!settings.noBackToTopLinks) { 62 | window.location.hash = this.id; 63 | } 64 | }) 65 | .addClass('clickable-header') 66 | .each(function(_, header) { 67 | this_level = get_level(header); 68 | if (!settings.noBackToTopLinks && this_level === highest_level) { 69 | $(header).addClass('top-level-header').after(return_to_top); 70 | } 71 | if (this_level === level) // same level as before; same indenting 72 | html += "<li class=\"" + settings.classes.item + "\">" + createLink(header); 73 | else if (this_level <= level){ // higher level than before; end parent ol 74 | for(var i = this_level; i < level; i++) { 75 | html += "</li></" + settings.listType + ">" 76 | } 77 | html += "</li><li class=\"" + settings.classes.item + "\">" + createLink(header); 78 | } 79 | else if (this_level > level) { // lower level than before; expand the previous to contain a ol 80 | for(i = this_level; i > level; i--) { 81 | html += "<" + settings.listType + " class=\"" + settings.classes.list +"\">" + 82 | "<li class=\"" + settings.classes.item + "\">" 83 | } 84 | html += createLink(header); 85 | } 86 | level = this_level; // update for the next one 87 | }); 88 | html += "</" + settings.listType + ">"; 89 | if (!settings.noBackToTopLinks) { 90 | $(document).on('click', '.back-to-top', function() { 91 | $(window).scrollTop(0); 92 | window.location.hash = ''; 93 | }); 94 | } 95 | 96 | render[settings.showEffect](); 97 | }; 98 | })(jQuery); 99 | -------------------------------------------------------------------------------- /_layouts/default.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | {{ site.title }}{% if page.title %} : {{ page.title }}{% endif %} 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
    21 |
    22 |
    23 | {% include navigation.html %} 24 |
    25 | 26 |
    27 | {{ content }} 28 |
    29 | 30 |
    31 |
    32 |
    33 |
    34 |
    35 | 36 | {% if page.disqus == 1 %} 37 |
    38 | {% if site.navigation == 1 or post_count > 0 %} 39 | 40 |
    41 | {% include disqus.html %} 42 |
    43 | {% else %} 44 |
    45 | {% include disqus.html %} 46 |
    47 | {% endif %} 48 |
    49 | {% endif %} 50 | 51 | 52 | 53 | {% include footer.html %} 54 | 123 | {% if site.google_analytics_id != "" %} 124 | {% include google_analytics.html %} 125 | 126 | 127 | 128 | -------------------------------------------------------------------------------- /_posts/2025-11-23-using-the-current-environment.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Using the current environment" 4 | category: getting_started 5 | date: 2025-11-23 20:42:38 6 | --- 7 | 8 | ## Getting started 9 | 10 | We recommend using **VS Code + Jupyter** as the default development stack for DataHaskell: 11 | - VS Code as your editor 12 | - Jupyter notebooks for literate, reproducible analysis 13 | - A Haskell notebook kernel (currently IHaskell) 14 | - The DataHaskell libraries (e.g. `dataframe`, `hasktorch`, plotting, etc.) 15 | 16 | This page walks you through: 17 | 18 | 1. Installing the basic tools 19 | 2. Choosing an environment (Dev Container vs local install) 20 | 3. Verifying everything with a “hello DataHaskell” notebook 21 | 22 | --- 23 | 24 | ## 1. Install the basics 25 | 26 | You only need to do this once per machine. 27 | 28 | ### 1.1. VS Code 29 | 30 | 1. Install **Visual Studio Code** from the official website. 31 | 2. Open VS Code and install these extensions: 32 | - **Jupyter** 33 | - **Python** (used by the Jupyter extension, even if you write Haskell) 34 | - **Dev Containers** (if you plan to use the container-based environment) 35 | - **Haskell** (for syntax highlighting, type info, etc.) 36 | 37 | ### 1.2. Git 38 | 39 | Install Git so you can clone repositories: 40 | 41 | - macOS: via Homebrew (`brew install git`) or Xcode command line tools 42 | - Linux: via your package manager (e.g. `sudo apt install git`) 43 | - Windows: [Git for Windows](https://gitforwindows.org/) or via WSL (Ubuntu on Windows) 44 | 45 | ### 1.3. (Optional but recommended) Docker 46 | 47 | If you want the easiest, most reproducible setup, install Docker: 48 | 49 | - Docker Desktop (macOS/Windows) or 50 | - `docker` + `docker-compose` from your Linux distro 51 | 52 | The Dev Container–based environment assumes Docker is available. 53 | 54 | --- 55 | 56 | ## 2. Choose an environment 57 | 58 | You have **two main options**: 59 | 60 | 1. **Option A (recommended): VS Code Dev Container** Everything is pre-installed in a Docker image (GHC, Cabal/Stack, IHaskell, DataFrame, etc.). 61 | 62 | 2. **Option B: Local installation** Install GHC, Cabal, Jupyter, IHaskell, and DataHaskell libraries directly on your machine. 63 | 64 | If you’re not sure which to choose, pick **Option A**. 65 | 66 | --- 67 | 68 | ## 3. Option A – Dev Container (recommended) 69 | 70 | This is the “batteries included” path. You get a pinned environment without polluting your global system. 71 | 72 | ### 3.1. Clone the starter repository 73 | 74 | We provide a starter repository with a ready-made environment and example notebooks: 75 | 76 | ```bash 77 | git clone https://github.com/DataHaskell/datahaskell-starter 78 | cd datahaskell-starter 79 | ``` 80 | 81 | ### 3.2. Open the project in VS Code 82 | 83 | ```bash 84 | code . 85 | ``` 86 | 87 | 88 | You'll get a popup asking if you want to re-open the project in a container. 89 | Select this option and VS Code will build the environment from the DataHaskell Dockerfile. 90 | 91 | ### 3.3. Running the example notebook 92 | 93 | 94 | Open the `getting-started` notebook. You'll see a section that says `Select Kernel` at the top right. 
95 | 96 | Upon clicking it you'll be asked to select a kernel. Go to `Jupyter Environment` and use the Haskell kernel installed there. 97 | 98 | ## 4. Option B – Installing everything locally 99 | 100 | We recommend using cabal for this section. 101 | 102 | ```bash 103 | cabal update 104 | cabal install --lib dataframe ihaskell-dataframe hasktorch \ 105 | ihaskell dataframe-hasktorch time template-haskell \ 106 | vector text containers array random unix directory regex-tdfa \ 107 | cassava statistics monad-bayes aeson \ 108 | --force-reinstalls 109 | cabal install ihaskell --install-method=copy --installdir=/opt/bin 110 | ihaskell install --ghclib=$(ghc --print-libdir) --prefix=$HOME/.local/ 111 | jupyter kernelspec install $HOME/.local/share/jupyter/kernels/haskell/ 112 | jupyter notebook 113 | ``` 114 | 115 | Check if this setup is working by trying out the linear regression tutorial from the DataHaskell website. 116 | 117 | > Note: installing packages globally like this might break some of your existing projects. 118 | 119 | -------------------------------------------------------------------------------- /css/syntax.css: -------------------------------------------------------------------------------- 1 | .highlight .hll { background-color: #ffffcc } 2 | .highlight { background: #ffffff; } 3 | .highlight .c { color: #888888 } /* Comment */ 4 | .highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */ 5 | .highlight .k { color: #008800; font-weight: bold } /* Keyword */ 6 | .highlight .cm { color: #888888 } /* Comment.Multiline */ 7 | .highlight .cp { color: #cc0000; font-weight: bold } /* Comment.Preproc */ 8 | .highlight .c1 { color: #888888 } /* Comment.Single */ 9 | .highlight .cs { color: #cc0000; font-weight: bold; background-color: #fff0f0 } /* Comment.Special */ 10 | .highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */ 11 | .highlight .ge { font-style: italic } /* Generic.Emph */ 12 | .highlight .gr { color: #aa0000 } /* Generic.Error */ 13 | .highlight .gh { color: #333333 } /* Generic.Heading */ 14 | .highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */ 15 | .highlight .go { color: #888888 } /* Generic.Output */ 16 | .highlight .gp { color: #555555 } /* Generic.Prompt */ 17 | .highlight .gs { font-weight: bold } /* Generic.Strong */ 18 | .highlight .gu { color: #666666 } /* Generic.Subheading */ 19 | .highlight .gt { color: #aa0000 } /* Generic.Traceback */ 20 | .highlight .kc { color: #008800; font-weight: bold } /* Keyword.Constant */ 21 | .highlight .kd { color: #008800; font-weight: bold } /* Keyword.Declaration */ 22 | .highlight .kn { color: #008800; font-weight: bold } /* Keyword.Namespace */ 23 | .highlight .kp { color: #008800 } /* Keyword.Pseudo */ 24 | .highlight .kr { color: #008800; font-weight: bold } /* Keyword.Reserved */ 25 | .highlight .kt { color: #888888; font-weight: bold } /* Keyword.Type */ 26 | .highlight .m { color: #0000DD; font-weight: bold } /* Literal.Number */ 27 | .highlight .s { color: #dd2200; background-color: #fff0f0 } /* Literal.String */ 28 | .highlight .na { color: #336699 } /* Name.Attribute */ 29 | .highlight .nb { color: #003388 } /* Name.Builtin */ 30 | .highlight .nc { color: #bb0066; font-weight: bold } /* Name.Class */ 31 | .highlight .no { color: #003366; font-weight: bold } /* Name.Constant */ 32 | .highlight .nd { color: #555555 } /* Name.Decorator */ 33 | .highlight .ne { color: #bb0066; font-weight: bold } /* 
Name.Exception */ 34 | .highlight .nf { color: #0066bb; font-weight: bold } /* Name.Function */ 35 | .highlight .nl { color: #336699; font-style: italic } /* Name.Label */ 36 | .highlight .nn { color: #bb0066; font-weight: bold } /* Name.Namespace */ 37 | .highlight .py { color: #336699; font-weight: bold } /* Name.Property */ 38 | .highlight .nt { color: #bb0066; font-weight: bold } /* Name.Tag */ 39 | .highlight .nv { color: #336699 } /* Name.Variable */ 40 | .highlight .ow { color: #008800 } /* Operator.Word */ 41 | .highlight .w { color: #bbbbbb } /* Text.Whitespace */ 42 | .highlight .mf { color: #0000DD; font-weight: bold } /* Literal.Number.Float */ 43 | .highlight .mh { color: #0000DD; font-weight: bold } /* Literal.Number.Hex */ 44 | .highlight .mi { color: #0000DD; font-weight: bold } /* Literal.Number.Integer */ 45 | .highlight .mo { color: #0000DD; font-weight: bold } /* Literal.Number.Oct */ 46 | .highlight .sb { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Backtick */ 47 | .highlight .sc { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Char */ 48 | .highlight .sd { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Doc */ 49 | .highlight .s2 { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Double */ 50 | .highlight .se { color: #0044dd; background-color: #fff0f0 } /* Literal.String.Escape */ 51 | .highlight .sh { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Heredoc */ 52 | .highlight .si { color: #3333bb; background-color: #fff0f0 } /* Literal.String.Interpol */ 53 | .highlight .sx { color: #22bb22; background-color: #f0fff0 } /* Literal.String.Other */ 54 | .highlight .sr { color: #008800; background-color: #fff0ff } /* Literal.String.Regex */ 55 | .highlight .s1 { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Single */ 56 | .highlight .ss { color: #aa6600; background-color: #fff0f0 } /* Literal.String.Symbol */ 57 | .highlight .bp { color: #003388 } /* Name.Builtin.Pseudo */ 58 | .highlight .vc { color: #336699 } /* Name.Variable.Class */ 59 | .highlight .vg { color: #dd7700 } /* Name.Variable.Global */ 60 | .highlight .vi { color: #3333bb } /* Name.Variable.Instance */ 61 | .highlight .il { color: #0000DD; font-weight: bold } /* Literal.Number.Integer.Long */ 62 | -------------------------------------------------------------------------------- /_posts/2016-10-19-code-of-conduct.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Code of conduct" 4 | category: community 5 | date: 2016-10-19 16:18:58 6 | --- 7 | 8 | **dataHaskell** supports the [Berlin code of conduct](http://berlincodeofconduct.org/), which has a clear goal: 9 | **To enrich a friendly, safe and welcoming environment.** 10 | 11 | ## Purpose 12 | 13 | A primary goal of all the conferences and user groups that refer to this Code of Conduct is to be inclusive to the largest number of contributors, with the most varied and diverse backgrounds possible. As such, we are committed to providing a friendly, safe and welcoming environment for all, regardless of gender, sexual orientation, ability, ethnicity, socioeconomic status and religion (or lack thereof). 14 | 15 | This Code of Conduct outlines our expectations for all those who participate in our community, as well as the consequences for unacceptable behavior. 16 | 17 | We invite all those who participate in our events to help us create safe and positive experiences for everyone. 
18 | 19 | ## Open [Source/Culture/Tech] Citizenship 20 | 21 | A supplemental goal of this Code of Conduct is to increase open [source/culture/tech] citizenship by encouraging participants to recognize and strengthen the relationships between our actions and their effects on our community. 22 | 23 | Communities mirror the societies in which they exist and positive action is essential to counteract the many forms of inequality and abuses of power that exist in society. 24 | 25 | If you see someone who is making an extra effort to ensure our community is welcoming, friendly, and encourages all participants to contribute to the fullest extent, we want to know. 26 | 27 | ## Expected Behavior 28 | 29 | - Participate in an authentic and active way. In doing so, you contribute to the health and longevity of this community. 30 | - Exercise consideration and respect in your speech and actions. 31 | - Attempt collaboration before conflict. 32 | - Refrain from demeaning, discriminatory, or harassing behavior and speech. 33 | - Be mindful of your surroundings and of your fellow participants. Alert community leaders if you notice a dangerous situation, someone in distress, or violations of this Code of Conduct, even if they seem inconsequential. 34 | 35 | ## Unacceptable Behavior 36 | 37 | Unacceptable behaviors include: intimidating, harassing, abusive, discriminatory, derogatory or demeaning speech or actions by any participant in our community online, at all related events and in one-on-one communications carried out in the context of community business. Community event venues may be shared with members of the public; please be respectful to all patrons of these locations. 38 | 39 | Harassment includes: harmful or prejudicial verbal or written comments related to gender, sexual orientation, race, religion, disability; inappropriate use of nudity and/or sexual images in public spaces (including presentation slides); deliberate intimidation, stalking or following; harassing photography or recording; sustained disruption of talks or other events; inappropriate physical contact, and unwelcome sexual attention. 40 | 41 | ## Consequences of Unacceptable Behavior 42 | 43 | Unacceptable behavior from any community member, including sponsors and those with decision-making authority, will not be tolerated. Anyone asked to stop unacceptable behavior is expected to comply immediately. 44 | 45 | If a community member engages in unacceptable behavior, the community organizers may take any action they deem appropriate, up to and including a temporary ban or permanent expulsion from the community without warning (and without refund in the case of a paid event). 46 | 47 | ## If You Witness or Are Subject to Unacceptable Behavior 48 | 49 | If you are subject to or witness unacceptable behavior, or have any other concerns, please notify a community organizer as soon as possible. You can find a list of organizers to contact for each of the supporters of this code of conduct at the bottom of this page. Additionally, community organizers are available to help community members engage with local law enforcement or to otherwise help those experiencing unacceptable behavior feel safe. In the context of in-person events, organizers will also provide escorts as desired by the person experiencing distress. 50 | 51 | ## Addressing Grievances 52 | 53 | If you feel you have been falsely or unfairly accused of violating this Code of Conduct, you should notify one of the event organizers with a concise description of your grievance. 
Your grievance will be handled in accordance with our existing governing policies. 54 | 55 | ## Scope 56 | 57 | We expect all community participants (contributors, paid or otherwise; sponsors; and other guests) to abide by this Code of Conduct in all community venues—online and in-person—as well as in all one-on-one communications pertaining to community business. 58 | 59 | ## License and attribution 60 | 61 | Berlin Code of Conduct is distributed under a Creative Commons Attribution-ShareAlike license. It is based on the [pdx.rb code of conduct](http://pdxruby.org/codeofconduct), which is distributed under the same license. 62 | -------------------------------------------------------------------------------- /_posts/2016-10-19-contributing-with-code.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Contributing with code" 4 | category: community 5 | date: 2016-10-19 17:00:25 6 | --- 7 | 8 | Managing code can be hard sometimes; in Haskell, generally, TMTOWTDI (there's more than one way to do it). Here are some tips for making life easier for you and for everyone involved in your contribution: 9 | 10 | ## Formatting 11 | This code style guide is based on [the haskell style guide](https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md). 12 | 13 | ### Line Length 14 | 15 | Maximum line length is *80 characters*. 16 | 17 | ### Indentation 18 | 19 | Tabs are illegal. Use spaces for indenting. Indent your code blocks 20 | with *4 spaces*. Indent the `where` keyword two spaces to set it 21 | apart from the rest of the code and indent the definitions in a 22 | `where` clause 2 spaces. Some examples: 23 | 24 | ```haskell 25 | sayHello :: IO () 26 | sayHello = do 27 | name <- getLine 28 | putStrLn $ greeting name 29 | where 30 | greeting name = "Hello, " ++ name ++ "!" 31 | 32 | filter :: (a -> Bool) -> [a] -> [a] 33 | filter _ [] = [] 34 | filter p (x:xs) 35 | | p x = x : filter p xs 36 | | otherwise = filter p xs 37 | ``` 38 | 39 | ### Blank Lines 40 | 41 | One blank line between top-level definitions. No blank lines between 42 | type signatures and function definitions. Add one blank line between 43 | functions in a type class instance declaration if the function bodies 44 | are large. Use your judgement. 45 | 46 | ### Whitespace 47 | 48 | Surround binary operators with a single space on either side. Use 49 | your better judgement for the insertion of spaces around arithmetic 50 | operators but always be consistent about whitespace on either side of 51 | a binary operator. Don't insert a space after a lambda. 52 | 53 | ### Data Declarations 54 | 55 | Align the constructors in a data type definition. Example: 56 | 57 | ```haskell 58 | data Tree a = Branch !a !(Tree a) !(Tree a) 59 | | Leaf 60 | ``` 61 | 62 | For long type names the following formatting is also acceptable: 63 | 64 | ```haskell 65 | data HttpException 66 | = InvalidStatusCode Int 67 | | MissingContentHeader 68 | ``` 69 | 70 | Format records as follows: 71 | 72 | ```haskell 73 | data Person = Person 74 | { firstName :: !String -- ^ First name 75 | , lastName :: !String -- ^ Last name 76 | , age :: !Int -- ^ Age 77 | } deriving (Eq, Show) 78 | ``` 79 | 80 | ### List Declarations 81 | 82 | Align the elements in the list. Example: 83 | 84 | ```haskell 85 | exceptions = 86 | [ InvalidStatusCode 87 | , MissingContentHeader 88 | , InternalServerError 89 | ] 90 | ``` 91 | 92 | Optionally, you can skip the first newline. Use your judgement. 
93 | 94 | ```haskell 95 | directions = [ North 96 | , East 97 | , South 98 | , West 99 | ] 100 | ``` 101 | 102 | ### Pragmas 103 | 104 | Put pragmas immediately following the function they apply to. 105 | Example: 106 | 107 | ```haskell 108 | id :: a -> a 109 | id x = x 110 | {-# INLINE id #-} 111 | ``` 112 | 113 | In the case of data type definitions you must put the pragma before 114 | the type it applies to. Example: 115 | 116 | ```haskell 117 | data Array e = Array 118 | {-# UNPACK #-} !Int 119 | !ByteArray 120 | ``` 121 | 122 | ### Hanging Lambdas 123 | 124 | You may or may not indent the code following a "hanging" lambda. Use 125 | your judgement. Some examples: 126 | 127 | ```haskell 128 | bar :: IO () 129 | bar = forM_ [1, 2, 3] $ \n -> do 130 | putStrLn "Here comes a number!" 131 | print n 132 | 133 | foo :: IO () 134 | foo = alloca 10 $ \a -> 135 | alloca 20 $ \b -> 136 | cFunction a b 137 | ``` 138 | 139 | ### Export Lists 140 | 141 | Format export lists as follows: 142 | 143 | ```haskell 144 | module Data.Set 145 | ( 146 | -- * The @Set@ type 147 | Set 148 | , empty 149 | , singleton 150 | 151 | -- * Querying 152 | , member 153 | ) where 154 | ``` 155 | 156 | ### If-then-else clauses 157 | 158 | Generally, guards and pattern matches should be preferred over if-then-else 159 | clauses, where possible. Short cases should usually be put on a single line 160 | (when line length allows it). 161 | 162 | When writing non-monadic code (i.e. when not using `do`) and guards 163 | and pattern matches can't be used, you can align if-then-else clauses 164 | like you would normal expressions: 165 | 166 | ```haskell 167 | foo = if ... 168 | then ... 169 | else ... 170 | ``` 171 | 172 | Otherwise, you should be consistent with the 4-spaces indent rule, and the 173 | `then` and the `else` keyword should be aligned. Examples: 174 | 175 | ```haskell 176 | foo = do 177 | someCode 178 | if condition 179 | then someMoreCode 180 | else someAlternativeCode 181 | ``` 182 | 183 | ```haskell 184 | foo = bar $ \qux -> if predicate qux 185 | then doSomethingSilly 186 | else someOtherCode 187 | ``` 188 | 189 | The same rule applies to nested do blocks: 190 | 191 | ```haskell 192 | foo = do 193 | instruction <- decodeInstruction 194 | skip <- load Memory.skip 195 | if skip == 0x0000 196 | then do 197 | execute instruction 198 | addCycles $ instructionCycles instruction 199 | else do 200 | store Memory.skip 0x0000 201 | addCycles 1 202 | ``` 203 | 204 | ### Case expressions 205 | 206 | The alternatives in a case expression can be indented using either of 207 | the two following styles: 208 | 209 | ```haskell 210 | foobar = case something of 211 | Just j -> foo 212 | Nothing -> bar 213 | ``` 214 | 215 | or as 216 | 217 | ```haskell 218 | foobar = case something of 219 | Just j -> foo 220 | Nothing -> bar 221 | ``` 222 | 223 | Align the `->` arrows when it helps readability. 224 | 225 | ## Imports 226 | 227 | Imports should be grouped in the following order: 228 | 229 | 1. standard library imports 230 | 2. related third party imports 231 | 3. local application/library specific imports 232 | 233 | Put a blank line between each group of imports. The imports in each 234 | group should be sorted alphabetically, by module name. 235 | 236 | Always use explicit import lists or `qualified` imports for standard 237 | and third party libraries. This makes the code more robust against 238 | changes in these libraries. Exception: The Prelude. 
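For instance, an import block that follows these rules might look like this (the module and project names are just illustrative):

```haskell
-- Standard library imports
import Data.List (sortBy)
import qualified Data.Map as Map

-- Third party imports
import qualified Data.Text as Text
import Network.HTTP.Types (Status)

-- Local application/library specific imports
import MyProject.Config (Config)
```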
239 | 240 | ## Comments 241 | 242 | ### Punctuation 243 | 244 | Write proper sentences; start with a capital letter and use proper 245 | punctuation. 246 | 247 | ### Top-Level Definitions 248 | 249 | Comment every top level function (particularly exported functions), 250 | and provide a type signature; use Haddock syntax in the comments. 251 | Comment every exported data type. Function example: 252 | 253 | ```haskell 254 | -- | Send a message on a socket. The socket must be in a connected 255 | -- state. Returns the number of bytes sent. Applications are 256 | -- responsible for ensuring that all data has been sent. 257 | send :: Socket -- ^ Connected socket 258 | -> ByteString -- ^ Data to send 259 | -> IO Int -- ^ Bytes sent 260 | ``` 261 | 262 | For functions the documentation should give enough information to 263 | apply the function without looking at the function's definition. 264 | 265 | Record example: 266 | 267 | ```haskell 268 | -- | Bla bla bla. 269 | data Person = Person 270 | { age :: !Int -- ^ Age 271 | , name :: !String -- ^ First name 272 | } 273 | ``` 274 | 275 | For fields that require longer comments format them like so: 276 | 277 | ```haskell 278 | data Record = Record 279 | { -- | This is a very very very long comment that is split over 280 | -- multiple lines. 281 | field1 :: !Text 282 | 283 | -- | This is a second very very very long comment that is split 284 | -- over multiple lines. 285 | , field2 :: !Int 286 | } 287 | ``` 288 | 289 | ### End-of-Line Comments 290 | 291 | Separate end-of-line comments from the code using 2 spaces. Align 292 | comments for data type definitions. Some examples: 293 | 294 | ```haskell 295 | data Parser = Parser 296 | !Int -- Current position 297 | !ByteString -- Remaining input 298 | 299 | foo :: Int -> Int 300 | foo n = salt * 32 + 9 301 | where 302 | salt = 453645243 -- Magic hash salt. 303 | ``` 304 | 305 | ### Links 306 | 307 | Use in-line links economically. You are encouraged to add links for 308 | API names. It is not necessary to add links for all API names in a 309 | Haddock comment. We therefore recommend adding a link to an API name 310 | if: 311 | 312 | * The user might actually want to click on it for more information (in 313 | your judgment), and 314 | 315 | * Only for the first occurrence of each API name in the comment (don't 316 | bother repeating a link) 317 | 318 | ## Naming 319 | 320 | Use camel case (e.g. `functionName`) when naming functions and upper 321 | camel case (e.g. `DataType`) when naming data types. 322 | 323 | For readability reasons, don't capitalize all letters when using an 324 | abbreviation. For example, write `HttpServer` instead of 325 | `HTTPServer`. Exception: Two letter abbreviations, e.g. `IO`. 326 | 327 | ### Modules 328 | 329 | Use singular when naming modules e.g. use `Data.Map` and 330 | `Data.ByteString.Internal` instead of `Data.Maps` and 331 | `Data.ByteString.Internals`. 332 | 333 | ## Dealing with laziness 334 | 335 | By default, use strict data types and lazy functions. 336 | 337 | ### Data types 338 | 339 | Constructor fields should be strict, unless there's an explicit reason 340 | to make them lazy. This avoids many common pitfalls caused by too much 341 | laziness and reduces the number of brain cycles the programmer has to 342 | spend thinking about evaluation order. 
343 | 344 | ```haskell 345 | -- Good 346 | data Point = Point 347 | { pointX :: !Double -- ^ X coordinate 348 | , pointY :: !Double -- ^ Y coordinate 349 | } 350 | ``` 351 | 352 | ```haskell 353 | -- Bad 354 | data Point = Point 355 | { pointX :: Double -- ^ X coordinate 356 | , pointY :: Double -- ^ Y coordinate 357 | } 358 | ``` 359 | 360 | Additionally, unpacking simple fields often improves performance and 361 | reduces memory usage: 362 | 363 | ```haskell 364 | data Point = Point 365 | { pointX :: {-# UNPACK #-} !Double -- ^ X coordinate 366 | , pointY :: {-# UNPACK #-} !Double -- ^ Y coordinate 367 | } 368 | ``` 369 | 370 | As an alternative to the `UNPACK` pragma, you can put 371 | 372 | ```haskell 373 | {-# OPTIONS_GHC -funbox-strict-fields #-} 374 | ``` 375 | 376 | at the top of the file. Including this flag in the file itself instead 377 | of e.g. in the Cabal file is preferable as the optimization will be 378 | applied even if someone compiles the file using other means (i.e. the 379 | optimization is attached to the source code it belongs to). 380 | 381 | Note that `-funbox-strict-fields` applies to all strict fields, not 382 | just small fields (e.g. `Double` or `Int`). If you're using GHC 7.4 or 383 | later you can use `NOUNPACK` to selectively opt-out for the unpacking 384 | enabled by `-funbox-strict-fields`. 385 | 386 | ### Functions 387 | 388 | Have function arguments be lazy unless you explicitly need them to be 389 | strict. 390 | 391 | The most common case when you need strict function arguments is in 392 | recursion with an accumulator: 393 | 394 | ```haskell 395 | mysum :: [Int] -> Int 396 | mysum = go 0 397 | where 398 | go !acc [] = acc 399 | go acc (x:xs) = go (acc + x) xs 400 | ``` 401 | 402 | Misc 403 | ---- 404 | 405 | ### Point-free style 406 | 407 | Avoid over-using point-free style. For example, this is hard to read: 408 | 409 | ```haskell 410 | -- Bad: 411 | f = (g .) . h 412 | ``` 413 | 414 | ### Warnings 415 | 416 | Code should be compilable with `-Wall -Werror`. There should be no 417 | warnings. 418 | 419 | ## Submitting pull requests 420 | 421 | Try to submit a pull request for **everything**. From a small function to some docs, a comment, or whatever. Even if you are the author of a new repository. 422 | 423 | Pull requests help everyone get to know what you just did. Everyone learns from you and you learn from anyone that suggests changes. Isn't that awesome? 424 | -------------------------------------------------------------------------------- /_posts/2025-11-09-linear-regression.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Linear Regression: California House Price Prediction" 4 | category: tutorial 5 | date: 2025-11-09 14:08:18 6 | --- 7 | 8 | In this tutorial, we'll predict California housing prices using two Haskell libraries: **DataFrame** (for data wrangling) and **Hasktorch** (for machine learning). 9 | 10 | You can follow along and code [here](https://ulwazi-exh9dbh2exbzgbc9.westus-01.azurewebsites.net/lab/tree/California_Housing.ipynb). 11 | 12 | ## What Are We Building? 13 | 14 | We're going to: 15 | 1. 📊 Load and clean real housing data 16 | 2. 🔧 Engineer some clever features 17 | 3. 🤖 Train a linear regression model 18 | 4. 🎯 Predict house prices! 19 | 20 | Think of it as teaching a computer to estimate home values based on things like location, number of rooms, and how close the house is to the ocean. 
21 | 22 | ## Our libraries 23 | 24 | ### DataFrame 25 | DataFrame is the Swiss Army knife of data manipulation. It lets you work with tabular data (like CSV files) in a mostly type-safe, functional way. 26 | 27 | ### Hasktorch 28 | Hasktorch brings the power of Torch to Haskell. It lets us do numerical computing and machine learning. It has tensors (multi-dimensional arrays) which are the building blocks of neural networks. 29 | 30 | ## Let's Dive Into The Code! 31 | 32 | ### Setting Up Our Imports 33 | 34 | ```haskell 35 | {-# LANGUAGE BangPatterns #-} 36 | {-# LANGUAGE NumericUnderscores #-} 37 | {-# LANGUAGE OverloadedStrings #-} 38 | {-# LANGUAGE ScopedTypeVariables #-} 39 | {-# LANGUAGE TypeApplications #-} 40 | 41 | module Main where 42 | 43 | import qualified DataFrame as D 44 | import qualified DataFrame.Functions as F 45 | import DataFrame.Hasktorch (toTensor) 46 | import Torch 47 | import DataFrame ((|>))
import Control.Monad (when)                 -- used by the training loop in Step 6
import Data.Text (Text)                     -- used by oceanProximityMapping in Step 3
import qualified Data.Vector.Unboxed as VU  -- used to extract predictions in Step 7
48 | ``` 49 | 50 | **What's happening here?** We're enabling some handy language extensions and importing our tools. The `|>` operator is particularly cool: like the Unix pipe, it lets us chain operations left-to-right! 51 | 52 | ### Step 1: Loading the Data 53 | 54 | ```haskell 55 | df <- D.readCsv "../data/housing.csv" 56 | ``` 57 | 58 | **Simple, right?** We're loading California housing data from a CSV file. This dataset contains information about different neighborhoods—things like population, median income, and (importantly) median house values. 59 | 60 | ### Step 2: Handling Missing Data 61 | 62 | Real-world data is messy. Sometimes values are missing, and we need to deal with that: 63 | 64 | ```haskell 65 | let meanTotalBedrooms = df |> D.filterJust "total_bedrooms" |> D.mean (F.col @Double "total_bedrooms") 66 | ``` 67 | 68 | **Translation:** "Hey DataFrame, take our data, filter out the rows where `total_bedrooms` is missing, then calculate the mean of what's left." 69 | 70 | We'll use this mean to fill in the blanks later. This is called **imputation**—fancy word for "educated guess filling." 71 | 72 | ### Step 3: Feature Engineering 73 | 74 | Arguably the most important part of the learning process is making sure your data is meaningful. This is called **"feature engineering."** We want to combine our features in interesting ways so that patterns become easier for the model to spot. 75 | 76 | Machine learning models are powerful, but they're not magic. They can only learn from what we give them. If we just hand over raw numbers, we're making the model work way harder than it needs to. But if we do some creative thinking and craft features that highlight the relationships we care about, we can make even a simple model perform amazingly well. 77 | 78 | In our housing example, we're going to: 79 | - Convert text categories (like "NEAR OCEAN") into numbers the model can use (with 0 being the closest to the ocean, 4 the furthest, and 5 a default for anything unrecognized). 80 | - Create a brand new feature: `rooms_per_household` (because maybe spacious homes are worth more?) 
- Normalize everything so no single feature dominates 82 | 83 | ```haskell 84 | oceanProximityMapping :: [(Text, Int)] 85 | oceanProximityMapping = [("ISLAND", 0), ("NEAR OCEAN", 1), ("NEAR BAY", 2), ("<1H OCEAN", 3), ("INLAND", 4)] 86 | 87 | let cleaned = 88 | df 89 | |> D.impute (F.col @(Maybe Double) "total_bedrooms") meanTotalBedrooms 90 | |> D.exclude ["median_house_value"] 91 | |> D.derive "ocean_proximity" (F.recodeWithDefault 5 oceanProximityMapping (F.col "ocean_proximity")) 92 | |> D.derive 93 | "rooms_per_household" 94 | (F.col @Double "total_rooms" / F.col "households") 95 | |> normalizeFeatures 96 | ``` 97 | 98 | **Let's break this pipeline down:** 99 | 100 | 1. **Impute**: Fill in those missing bedroom values with the mean we calculated 101 | 2. **Exclude**: Remove the house value column (we'll use it as labels, not features) 102 | 3. **Derive ocean_proximity**: Convert text like "NEAR OCEAN" into numbers (0-4) that our model can understand 103 | 4. **Derive rooms_per_household**: Create a new feature! Maybe houses with more rooms per household are worth more? 104 | 5. **Normalize**: Scale all features to a 0-1 range so no single feature dominates 105 | 106 | ### Feature Normalization 107 | 108 | ```haskell 109 | normalizeFeatures :: D.DataFrame -> D.DataFrame 110 | normalizeFeatures df = 111 | df 112 | |> D.fold 113 | ( \name d -> 114 | let col = F.col @Double name 115 | in D.derive name ((col - F.minimum col) / (F.maximum col - F.minimum col)) d 116 | ) 117 | (D.columnNames (df |> D.selectBy [D.byProperty (D.hasElemType @Double)])) 118 | ``` 119 | 120 | Neural networks do better when all the data is scaled the same. We apply **min-max normalization** to every numeric column: 121 | 122 | ``` 123 | normalized_value = (value - min) / (max - min) 124 | ``` 125 | 126 | This squishes every feature to the 0-1 range. Why? Imagine if house prices ranged from 0-500,000 but number of bedrooms ranged from 0-5. The huge price numbers would dominate the small bedroom numbers during training. Normalization levels the playing field. 127 | 128 | ### Step 4: From DataFrame to Tensors 129 | 130 | ```haskell 131 | features = toTensor cleaned 132 | labels = toTensor (D.select ["median_house_value"] df) 133 | ``` 134 | 135 | **Bridge time!** We're converting our nice, clean DataFrame into Hasktorch tensors. Think of tensors as supercharged matrices that GPUs love to work with. Our `features` are what the model learns from, and `labels` are what it's trying to predict. 136 | 137 | ### Step 5: Building Our Model 138 | 139 | ```haskell 140 | init <- sample $ LinearSpec{in_features = snd (D.dimensions cleaned), out_features = 1} 141 | ``` 142 | 143 | **What's a linear model?** Imagine drawing the best-fit line through a scatter plot—except we're doing it in many dimensions. The model learns: 144 | 145 | ``` 146 | house_price = w₁×feature₁ + w₂×feature₂ + ... + wₙ×featureₙ + bias 147 | ``` 148 | 149 | We're creating a linear layer with as many inputs as we have features (after cleaning) and 1 output (the predicted price). 150 | 151 | ```haskell 152 | model :: Linear -> Tensor -> Tensor 153 | model state input = squeezeAll $ linear state input 154 | ``` 155 | 156 | This is our prediction function—feed in features, get out a price estimate. 
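Before training, it can be worth sanity-checking the tensor shapes in the notebook (a small sketch; `shape` is Hasktorch's shape accessor, and the exact numbers depend on your copy of the dataset):

```haskell
-- Untrained predictions: expect one number per row of the data.
let preds = model init features
shape features  -- e.g. [20640, 10]: rows × feature columns
shape preds     -- e.g. [20640]: squeezeAll dropped the trailing 1
```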
157 | 158 | ### Step 6: Training Loop 159 | 160 | ```haskell 161 | trained <- foldLoop init 100_000 $ \state i -> do 162 | let labels' = model state features 163 | loss = mseLoss labels labels' 164 | when (i `mod` 10_000 == 0) $ do 165 | putStrLn $ "Iteration: " ++ show i ++ " | Loss: " ++ show loss 166 | (state', _) <- runStep state GD loss 0.1 167 | pure state' 168 | ``` 169 | 170 | **This is where learning happens!** Let's break it down: 171 | 172 | 1. **100,000 iterations**: The model gets 100,000 chances to improve 173 | 2. **labels'**: Make predictions with current model weights 174 | 3. **loss**: How wrong are we? MSE (Mean Squared Error) measures the average squared difference between predictions and real prices 175 | 4. **Print every 10,000 steps**: Show us how we're doing! 176 | 5. **runStep with GD**: Update the model using **Gradient Descent** with a learning rate of 0.1 177 | - Think of gradient descent as rolling a ball down a hill to find the lowest point (best model) 178 | - Learning rate controls how big our steps are 179 | 180 | **What you'll see:** 181 | ``` 182 | Training linear regression model... 183 | Iteration: 10000 | Loss: Tensor Float [] 5.0225e9 184 | Iteration: 20000 | Loss: Tensor Float [] 4.9093e9 185 | Iteration: 30000 | Loss: Tensor Float [] 4.8576e9 186 | Iteration: 40000 | Loss: Tensor Float [] 4.8333e9 187 | Iteration: 50000 | Loss: Tensor Float [] 4.8217e9 188 | Iteration: 60000 | Loss: Tensor Float [] 4.8160e9 189 | Iteration: 70000 | Loss: Tensor Float [] 4.8130e9 190 | Iteration: 80000 | Loss: Tensor Float [] 4.8114e9 191 | Iteration: 90000 | Loss: Tensor Float [] 4.8105e9 192 | Iteration: 100000 | Loss: Tensor Float [] 4.8099e9 193 | ``` 194 | 195 | ### Step 7: Making Predictions 196 | 197 | ```haskell 198 | let predictions = 199 | D.insertUnboxedVector 200 | "predicted_house_value" 201 | (asValue @(VU.Vector Float) (model trained features)) 202 | df 203 | print $ D.select ["median_house_value", "predicted_house_value"] predictions 204 | ``` 205 | 206 | **The grand finale!** We're: 207 | 1. Using our trained model to predict all the house values 208 | 2. Converting the tensor back to a vector 209 | 3. Adding it as a new column in our original DataFrame 210 | 4. Printing a comparison of real vs. predicted values 211 | 212 | You'll see something like: 213 | ``` 214 | ------------------------------------------- 215 | median_house_value | predicted_house_value 216 | --------------------|---------------------- 217 | Double | Float 218 | --------------------|---------------------- 219 | 452600.0 | 414079.94 220 | 358500.0 | 423011.94 221 | 352100.0 | 383239.06 222 | 341300.0 | 324928.94 223 | 342200.0 | 256934.23 224 | 269700.0 | 264944.84 225 | 299200.0 | 259094.13 226 | 241400.0 | 257224.55 227 | 226700.0 | 201753.69 228 | 261100.0 | 268698.7 229 | ... 230 | ``` 231 | 232 | ## Key Concepts We Learned 233 | 234 | **DataFrame Operations:** 235 | - `|>` - Pipeline operator (read left to right!) 
- `readCsv` - Load data from CSV files
- `impute` - Fill in missing values
- `derive` - Create new columns from existing ones
- `filterJust` - Remove rows with missing values
- `select` / `exclude` - Choose which columns to keep

**Hasktorch:**
- `toTensor` - Convert DataFrames to tensors
- `Linear` - Linear regression layer
- `mseLoss` - Mean Squared Error loss function
- `runStep` with `GD` - Gradient descent optimization
- `sample` - Initialize model parameters

**Machine Learning Flow:**
1. **Load Data** → Get it into your program
2. **Clean & Transform** → Handle missing values, normalize
3. **Feature Engineering** → Create useful new features
4. **Train** → Iteratively improve the model
5. **Predict** → Use the trained model on data

## Try It Yourself!

**Experiment ideas:**
- Change the learning rate (0.1) to see how it affects training
- Add more derived features (like income per person)
- Try different numbers of iterations
- Use different normalization strategies

## The Advantages of This Approach

1. **Type Safety**: DataFrame's type system catches many errors at compile time
2. **Functional Style**: Pure functions and pipelines make data transformations clear
3. **Performance**: Hasktorch uses PyTorch's battle-tested backend
4. **Readability**: The `|>` operator makes data pipelines read like stories

## Next Steps

Now that you've mastered the basics:
- Try different models (polynomial regression, neural networks)
- Experiment with more complex feature engineering
- Learn about train/test splits and model validation
- Explore Hasktorch's neural network modules

## Get Involved
Want to contribute to data science in Haskell?

--------------------------------------------------------------------------------
/_posts/2025-11-09-roadmap.md:
--------------------------------------------------------------------------------
---
layout: page
title: "Roadmap"
category: community
date: 2025-11-09 13:31:54
---

# DataHaskell Roadmap 2026-2027

**Version**: 1.0
**Date**: November 2025
**Coordinators**: DataHaskell Community
**Key Partners**: dataframe, Hasktorch, distributed-process

---

## Executive Summary

This roadmap outlines the strategic direction for building a complete, production-ready Haskell data science ecosystem. With three major pillars already in active development—**dataframe** (data manipulation), **Hasktorch** (deep learning), and **distributed-process** (distributed computing)—we are positioned to create a cohesive platform that rivals Python, R, and Julia for data science workloads.

### Vision
By 2027, DataHaskell will provide a high-performance, end-to-end data science toolkit that enables practitioners to build reliable machine learning systems from data ingestion through model deployment.

### Core Principles
1. **Interoperability**: Seamless integration between ecosystem components
2. **Performance**: Match or exceed Python/R performance benchmarks
3. **Ergonomics**: Intuitive APIs that lower the barrier to entry
4. **Production Ready**: Focus on reliability, monitoring, and deployment
5. **Type Safety**: Leverage Haskell's type system (where possible) to catch errors at compile time

---

## Current State Assessment

### Strengths
- **dataframe**: Modern dataframe library with IHaskell integration
- **Hasktorch**: Mature deep learning library with PyTorch backend and GPU support
- **distributed-process**: Battle-tested distributed computing framework
- **IHaskell**: A Haskell kernel for Jupyter notebooks
- Strong functional programming foundations
- Excellent parallelism and concurrency primitives

### Gaps to Address
- Small pool of active maintainers and contributors
- Fragmented visualization ecosystem
- Limited data I/O format support
- Incomplete documentation and tutorials
- Sparse integration examples between major libraries
- Limited model deployment tooling

### Critical Needs
- Unified onboarding experience
- Comprehensive benchmarking against Python/R
- Production deployment patterns
- Enterprise adoption case studies

---

## Strategic Pillars

## Pillar 1: Core Data Infrastructure

### Phase 1 (Q1-Q2 2026) - Foundation
**Owner**: dataframe team

**Goals**:
- Complete dataframe v1 release (March 2026)
- Establish dataframe as the standard tabular data library
- Performance parity with Pandas/Polars for common operations

**Deliverables**:
1. **dataframe v0.1.0**
   - SQL-like API finalized
   - IHaskell integration complete
   - Type-safe column operations
   - Comprehensive test suite
   - Apache Arrow integration

2. **File Format Support**
   - CSV/TSV (existing)
   - Parquet (high priority)
   - Arrow IPC format
   - Excel (xlsx)
   - JSON (nested structures)
   - HDF5 (coordination with scientific computing)

3. **Performance Benchmarks**
   - Public benchmark suite comparing to:
     - Pandas
     - Polars
     - dplyr/tidyverse
   - Focus areas: filtering, grouping, joining, aggregations
   - Document optimization strategies

### Phase 2 (Q3-Q4 2026) - Expansion
**Owner**: dataframe + community

**Goals**:
- Advanced data manipulation features
- Computing on files larger than memory
- Integration with cloud storage and database systems

**Deliverables**:
1. **Advanced Operations**
   - Window functions
   - Rolling aggregations
   - Pivot/unpivot operations
   - Complex joins (anti, semi)
   - Reshaping operations (melt, cast)

2. **Cloud & Database Connectivity**
   - Read files from AWS/GCP/Azure
   - PostgreSQL integration
   - SQLite support
   - Query pushdown optimization
   - Streaming query results

---

## Pillar 2: Statistical Computing & Visualization

### Phase 1 (Q2-Q3 2026) - Statistics Core
**Owner**: Community (needs maintainer)

**Goals**:
- Create a unified machine learning library on top of Hasktorch and `statistics`
- Create a unified plotting API

**Deliverables**:
1. **statistics**
   - Extend hypothesis testing (t-test, ANOVA)
   - Simple regression models (linear and logistic)
   - Generalized linear models (GLM)
   - Survival analysis basics
   - Integration with dataframe
2. **Plotting & Visualization**
   - **Option A**: Extend hvega (Vega-Lite) with dataframe integration
   - **Option B**: Create native plotting library with backends
   - Priority features:
     - Scatter plots, line plots, bar charts
     - Histograms and distributions
     - Heatmaps and correlation plots
     - Interactive plots for notebooks
     - Export to PNG, SVG, PDF

### Phase 2 (Q4 2026 - Q1 2027) - Advanced Analytics
**Owner**: Community

**Deliverables**:
1. **Advanced Statistical Methods**
   - Mixed effects models
   - Time series analysis (ARIMA, state space models)
   - Bayesian inference (integration with existing libraries)
   - Causal inference methods
   - Spatial statistics

2. **Visualization Expansion**
   - Grammar of graphics implementation
   - Geographic/mapping support
   - Network visualization
   - 3D plotting capabilities

---

## Pillar 3: Machine Learning & Deep Learning

### Phase 1 (Q1-Q2 2026) - Integration
**Owners**: Hasktorch + dataframe teams

**Goals**:
- Improve dataframe → tensor pipeline
- Example-driven documentation

**Deliverables**:
1. **dataframe ↔ Hasktorch Bridge**
   - Zero-copy conversion where possible
   - Automatic type mapping
   - GPU memory management
   - Batch loading utilities

2. **ML Workflow Examples with new unified library**
   - End-to-end classification (Iris, MNIST)
   - Regression examples (California Housing)
   - Time series forecasting
   - NLP pipeline (text classification)
   - Computer vision (image classification)

3. **Data Preprocessing**
   - Feature scaling/normalization
   - One-hot encoding
   - Missing value imputation
   - Train/test splitting
   - Cross-validation utilities

### Phase 2 (Q3-Q4 2026) - Classical ML
**Owner**: Community (coordinate with Hasktorch)

**Goals**:
- Fill gap between dataframe and deep learning
- Provide scikit-learn equivalent

**Deliverables**:
1. **haskell-ml-toolkit** (new library)
   - Decision trees and random forests
   - Gradient boosting (XGBoost integration or native)
   - Support Vector Machines
   - K-means and hierarchical clustering
   - Dimensionality reduction (PCA, t-SNE, UMAP)
   - Model evaluation metrics
   - Hyperparameter optimization

2. **Feature Engineering**
   - Automatic feature generation
   - Feature selection methods
   - Polynomial features
   - Text feature extraction

### Phase 3 (Q1-Q2 2027) - Model Management
**Owners**: Hasktorch + community

**Deliverables**:
1. **Model Serialization & Versioning**
   - Standard model format
   - Version tracking
   - Metadata storage
   - Model registry concept

2. **Model Deployment**
   - REST API server templates
   - Batch prediction utilities
   - Model monitoring hooks
   - ONNX export for interoperability

---

## Pillar 4: Distributed & Parallel Computing

### Phase 1 (Q2-Q3 2026) - Core Integration
**Owners**: distributed-process + dataframe teams

**Goals**:
- Enable distributed data processing
- Provide MapReduce-style operations

**Deliverables**:
1. **Distributed DataFrame Operations**
   - Distributed CSV/Parquet reading
   - Parallel groupby and aggregations
   - Distributed joins
   - Shuffle operations
   - Fault tolerance mechanisms

2. **distributed-ml** (new library)
   - Distributed model training
   - Parameter servers
   - Data parallelism primitives
   - Model parallelism support
   - Integration with Hasktorch

3. **Examples & Patterns**
   - Multi-node data processing
   - Distributed hyperparameter search
   - Large-scale model training
   - Stream processing patterns

### Phase 2 (Q4 2026 - Q1 2027) - Production Features
**Owner**: distributed-process team

**Deliverables**:
1. **Cluster Management**
   - Node discovery and registration
   - Health monitoring
   - Resource allocation
   - Job scheduling

2. **Cloud Integration**
   - AWS backend
   - Google Cloud backend
   - Kubernetes deployment patterns
   - Docker containerization templates

---

## Pillar 5: Developer Experience

### Phase 1 (Q1-Q2 2026) - Documentation Blitz
**Owner**: All maintainers + community

**Goals**:
- Lower barrier to entry
- Comprehensive learning path

**Deliverables**:
1. **DataHaskell Website Revamp**
   - Clear getting started guide
   - Library comparison matrix
   - Migration guides (from Python, R)
   - Success stories

2. **Tutorial Series**
   - Installation and setup (all platforms)
   - Your first data analysis
   - DataFrames deep dive
   - Machine learning workflow
   - Distributed computing basics
   - Production deployment

3. **Notebook Gallery**
   - 20+ example notebooks covering:
     - Data cleaning and exploration
     - Statistical analysis
     - ML model building
     - Visualization
     - Domain-specific examples (finance, biology, etc.)

### Phase 2 (Q3-Q4 2026) - Tooling
**Owner**: Community

**Deliverables**:
1. **datahaskell-cli** (new tool)
   - Project scaffolding
   - Dependency management presets
   - Environment setup automation
   - Example project templates

2. **IDE Support Improvements**
   - VSCode IHaskell support with the dataHaskell stack available out of the box
   - HLS integration guides
   - Debugging workflows
   - IHaskell kernel improvements

3. **Testing & CI Templates**
   - Property-based testing examples
   - Benchmark suites
   - GitHub Actions templates
   - Continuous deployment patterns

---

## Pillar 6: Community & Ecosystem

### Ongoing Initiatives

**Goals**:
- Grow contributor base
- Foster collaboration
- Drive adoption

**Deliverables**:
1. **Community Building**
   - Monthly community calls (starting Q1 2026)
   - Discord/Slack workspace
   - Quarterly virtual conferences
   - Mentorship program

2. **Contribution Framework**
   - Good first issues across all projects
   - Contribution guidelines
   - Code review standards
   - Recognition program

3. **Outreach**
   - Blog post series
   - Conference talks (Haskell Symposium, ZuriHac, etc.)
   - Academic collaborations
   - Industry partnerships
4. **Package Standards**
   - Naming conventions
   - API design guidelines
   - Documentation requirements
   - Testing standards
   - Version compatibility matrix

---

## Success Metrics

### Q2 2026
- [ ] dataframe v1 released
- [ ] 3 complete end-to-end tutorials published
- [ ] Performance benchmarks showing ≥70% of Pandas speed
- [ ] 5 integration examples between major libraries

### Q4 2026
- [ ] 10,000+ total library downloads/month across ecosystem
- [ ] 5+ active contributors
- [ ] Performance parity (≥90%) with Pandas for common operations
- [ ] Complete ML workflow from data to deployment documented

### Q2 2027
- [ ] 2+ companies using DataHaskell
- [ ] DataHaskell track at major Haskell conference
- [ ] 3+ published case studies
- [ ] Comprehensive distributed computing examples

### Q4 2027
- [ ] Feature completeness with Python's core data science stack
- [ ] 5+ production ML systems case studies
- [ ] Enterprise support offerings available

---

## Resource Requirements

### Maintainer Coordination
- **Monthly sync**: All pillar leads (1 hour)
- **Quarterly planning**: Full maintainer group (2 hours)

### Funding Needs (Optional but Helpful)
1. **Infrastructure**
   - Benchmark server (GPU-enabled)
   - CI/CD resources
   - Documentation hosting

2. **Developer Support**
   - Part-time technical writer
   - Maintainer stipends or grants
   - Summer of Haskell projects

3. **Events**
   - Quarterly virtual meetups
   - Annual in-person hackathon
   - Conference sponsorships

---

## Risk Mitigation

### Technical Risks

| Risk | Mitigation |
|------|------------|
| Performance doesn't match Python | Early benchmarking, profiling, and optimization sprints |
| Integration complexity | Defined interfaces, versioning strategy, compatibility tests |
| Breaking changes in dependencies | Conservative version bounds, testing matrix |

### Community Risks

| Risk | Mitigation |
|------|------------|
| Maintainer burnout | Distributed ownership, recognition program, funding support |
| Fragmentation | Regular coordination, shared roadmap, integration testing |
| Slow adoption | Marketing efforts, case studies, migration guides |

### Ecosystem Risks

| Risk | Mitigation |
|------|------------|
| GHC changes break libraries | Test against multiple GHC versions, engage with GHC team |
| Competing projects | Focus on collaboration, clear differentiation |
| Limited contributor pool | Mentorship, good documentation, welcoming community |

---

## Decision Framework

### When to add new libraries
**Criteria**:
1. Fills clear gap in ecosystem
2. Has committed maintainer
3. Integrates with existing components
4. Follows API design guidelines
5. Includes comprehensive tests and docs

### When to deprecate/consolidate
**Criteria**:
1. Unmaintained for >6 months
2. Better alternative exists
3. Creates confusion in the ecosystem

### Version Compatibility Policy
- Support last 2 major GHC versions
- Versioning per the Haskell Package Versioning Policy (PVP)
- Deprecation warnings for 2 releases before removal
- Compatibility matrix published on website

---

## Communication Plan

### Internal (Maintainers)
- **Discord channel**: Daily async communication
- **GitHub Discussions**: Technical decisions, RFCs
- **Monthly video call**: Roadmap progress, blockers
- **Quarterly planning session**: Next phase priorities

### External (Community)
- **Blog**: Monthly progress updates
- **Twitter/Social**: Weekly highlights
- **Haskell Discourse**: Major announcements
- **Newsletter**: Quarterly ecosystem update
- **Documentation**: Always up-to-date

---

## How to Use This Roadmap

This is a **living document**. We will:
- Review quarterly and adjust priorities
- Track progress in GitHub projects
- Celebrate milestones publicly
- Adapt based on community feedback

**Questions?** Open a discussion on GitHub or join our community calls.

---

*Let's build the future of data science in Haskell together! 🚀*
--------------------------------------------------------------------------------