├── .gitignore ├── .rspec ├── .travis.yml ├── CHANGELOG.md ├── CLAUDE.md ├── CODE_OF_CONDUCT.md ├── Gemfile ├── Gemfile.lock ├── LICENSE.txt ├── README.md ├── Rakefile ├── bin ├── console └── setup ├── lib ├── structify.rb └── structify │ ├── model.rb │ ├── schema_serializer.rb │ └── version.rb ├── spec ├── spec_helper.rb ├── structify │ ├── model_spec.rb │ └── schema_serializer_spec.rb └── structify_spec.rb ├── structify-0.2.1.gem └── structify.gemspec /.gitignore: -------------------------------------------------------------------------------- 1 | /.bundle/ 2 | /.yardoc 3 | /_yardoc/ 4 | /coverage/ 5 | /doc/ 6 | /pkg/ 7 | /spec/reports/ 8 | /tmp/ 9 | 10 | # rspec failure tracking 11 | .rspec_status 12 | -------------------------------------------------------------------------------- /.rspec: -------------------------------------------------------------------------------- 1 | --require spec_helper 2 | --format documentation 3 | --color 4 | --order random 5 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | --- 2 | language: ruby 3 | cache: bundler 4 | rvm: 5 | - 3.2.2 6 | before_install: gem install bundler -v 2.1.4 7 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | All notable changes to this project will be documented in this file. 4 | 5 | ## [0.3.4] - 2025-03-19 6 | 7 | ### Changed 8 | 9 | - Renamed schema `title` to `name` to align with JSON Schema standards 10 | - Added validation for schema name to ensure it matches the pattern `^[a-zA-Z0-9_-]+$` 11 | 12 | ## [0.3.3] - 2025-03-19 13 | 14 | ### Fixed 15 | 16 | - Fixed versioning in JSON schema generation to only include fields for the current schema version 17 | - Fields with `versions: x` no longer appear in other schema versions when generating the JSON schema 18 | 19 | ## [0.3.2] - 2025-03-17 20 | 21 | ### Added 22 | 23 | - Added `saved_change_to_extracted_data?` method that works with the configured `default_container_attribute` 24 | 25 | ## [0.3.0] - 2025-03-17 26 | 27 | ### Added 28 | 29 | - Added configuration system with `Structify.configure` method 30 | - Added ability to configure default container attribute through initializer 31 | - Changed default container attribute from `:extracted_data` to `:json_attributes` 32 | 33 | ## [0.2.0] - 2025-03-12 34 | 35 | ### Added 36 | 37 | - New `thinking` mode option to automatically add chain of thought reasoning to LLM schemas 38 | - When enabled, adds a `chain_of_thought` field as the first property in the generated schema 39 | 40 | ## [0.1.0] - Initial Release 41 | 42 | - Initial release of Structify -------------------------------------------------------------------------------- /CLAUDE.md: -------------------------------------------------------------------------------- 1 | # CLAUDE.md - Guidelines for Structify 2 | 3 | ## Commands 4 | - Build: `bundle exec rake build` 5 | - Install: `bundle exec rake install` 6 | - Test all: `bundle exec rake spec` 7 | - Test single file: `bundle exec rspec spec/path/to/file_spec.rb` 8 | - Test specific example: `bundle exec rspec spec/path/to/file_spec.rb:LINE_NUMBER` 9 | - Lint: `bundle exec rubocop` 10 | 11 | ## Code Style 12 | - Use `# frozen_string_literal: true` at the top of all Ruby files 13 | - Follow Ruby naming conventions (snake_case for methods/variables, CamelCase for classes) 14 | - Include YARD documentation for classes and methods 15 | - Group similar methods together 16 | - Include descriptive RSpec tests for all functionality 17 | - Keep methods short and focused on a single responsibility 18 | - Use specific error classes for error handling 19 | - Prefer explicit requires over auto-loading 20 | - Follow ActiveSupport::Concern patterns for modules 21 | - Keep DSL simple and intuitive for end users 22 | 23 | ## Structure 24 | - Put core functionality in lib/structify/ 25 | - Keep implementation details private when possible 26 | - Follow semantic versioning guidelines 27 | - Ensure proper test coverage for all public APIs -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to making participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, gender identity and expression, level of experience, 9 | nationality, personal appearance, race, religion, or sexual identity and 10 | orientation. 11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | * Using welcoming and inclusive language 18 | * Being respectful of differing viewpoints and experiences 19 | * Gracefully accepting constructive criticism 20 | * Focusing on what is best for the community 21 | * Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | * The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | * Trolling, insulting/derogatory comments, and personal or political attacks 28 | * Public or private harassment 29 | * Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | * Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 39 | 40 | Project maintainers have the right and responsibility to remove, edit, or 41 | reject comments, commits, code, wiki edits, issues, and other contributions 42 | that are not aligned to this Code of Conduct, or to ban temporarily or 43 | permanently any contributor for other behaviors that they deem inappropriate, 44 | threatening, offensive, or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies both within project spaces and in public spaces 49 | when an individual is representing the project or its community. Examples of 50 | representing a project or community include using an official project e-mail 51 | address, posting via an official social media account, or acting as an appointed 52 | representative at an online or offline event. Representation of a project may be 53 | further defined and clarified by project maintainers. 54 | 55 | ## Enforcement 56 | 57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 58 | reported by contacting the project team at kieranklaassen@gmail.com. All 59 | complaints will be reviewed and investigated and will result in a response that 60 | is deemed necessary and appropriate to the circumstances. The project team is 61 | obligated to maintain confidentiality with regard to the reporter of an incident. 62 | Further details of specific enforcement policies may be posted separately. 63 | 64 | Project maintainers who do not follow or enforce the Code of Conduct in good 65 | faith may face temporary or permanent repercussions as determined by other 66 | members of the project's leadership. 67 | 68 | ## Attribution 69 | 70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 71 | available at [https://contributor-covenant.org/version/1/4][version] 72 | 73 | [homepage]: https://contributor-covenant.org 74 | [version]: https://contributor-covenant.org/version/1/4/ 75 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source "https://rubygems.org" 2 | 3 | # Specify your gem's dependencies in structify.gemspec 4 | gemspec 5 | 6 | group :development, :test do 7 | gem "rake", "~> 13.0" 8 | gem "rspec", "~> 3.12" 9 | gem "rspec-rails", "~> 6.1" 10 | gem "activerecord", "~> 8" 11 | gem 'sqlite3', '~> 2.0', '>= 2.0.2' 12 | gem "rubocop", "~> 1.21" 13 | gem "rubocop-rspec", "~> 2.25" 14 | gem "yard", "~> 0.9" 15 | gem "debug", ">= 1.0.0" 16 | end 17 | -------------------------------------------------------------------------------- /Gemfile.lock: -------------------------------------------------------------------------------- 1 | PATH 2 | remote: . 3 | specs: 4 | structify (0.3.4) 5 | activesupport (>= 7.0, < 9.0) 6 | attr_json (~> 2.1) 7 | 8 | GEM 9 | remote: https://rubygems.org/ 10 | specs: 11 | actionpack (8.0.2) 12 | actionview (= 8.0.2) 13 | activesupport (= 8.0.2) 14 | nokogiri (>= 1.8.5) 15 | rack (>= 2.2.4) 16 | rack-session (>= 1.0.1) 17 | rack-test (>= 0.6.3) 18 | rails-dom-testing (~> 2.2) 19 | rails-html-sanitizer (~> 1.6) 20 | useragent (~> 0.16) 21 | actionview (8.0.2) 22 | activesupport (= 8.0.2) 23 | builder (~> 3.1) 24 | erubi (~> 1.11) 25 | rails-dom-testing (~> 2.2) 26 | rails-html-sanitizer (~> 1.6) 27 | activemodel (8.0.2) 28 | activesupport (= 8.0.2) 29 | activerecord (8.0.2) 30 | activemodel (= 8.0.2) 31 | activesupport (= 8.0.2) 32 | timeout (>= 0.4.0) 33 | activesupport (8.0.2) 34 | base64 35 | benchmark (>= 0.3) 36 | bigdecimal 37 | concurrent-ruby (~> 1.0, >= 1.3.1) 38 | connection_pool (>= 2.2.5) 39 | drb 40 | i18n (>= 1.6, < 2) 41 | logger (>= 1.4.2) 42 | minitest (>= 5.1) 43 | securerandom (>= 0.3) 44 | tzinfo (~> 2.0, >= 2.0.5) 45 | uri (>= 0.13.1) 46 | ast (2.4.2) 47 | attr_json (2.5.0) 48 | activerecord (>= 6.0.0, < 8.1) 49 | base64 (0.2.0) 50 | benchmark (0.4.0) 51 | bigdecimal (3.1.9) 52 | builder (3.3.0) 53 | concurrent-ruby (1.3.5) 54 | connection_pool (2.5.0) 55 | crass (1.0.6) 56 | date (3.4.1) 57 | debug (1.10.0) 58 | irb (~> 1.10) 59 | reline (>= 0.3.8) 60 | diff-lcs (1.5.1) 61 | drb (2.2.1) 62 | erubi (1.13.1) 63 | i18n (1.14.7) 64 | concurrent-ruby (~> 1.0) 65 | io-console (0.8.0) 66 | irb (1.15.1) 67 | pp (>= 0.6.0) 68 | rdoc (>= 4.0.0) 69 | reline (>= 0.4.2) 70 | json (2.9.1) 71 | language_server-protocol (3.17.0.4) 72 | logger (1.6.5) 73 | loofah (2.24.0) 74 | crass (~> 1.0.2) 75 | nokogiri (>= 1.12.0) 76 | minitest (5.25.4) 77 | nokogiri (1.18.2-x86_64-darwin) 78 | racc (~> 1.4) 79 | parallel (1.26.3) 80 | parser (3.3.7.0) 81 | ast (~> 2.4.1) 82 | racc 83 | pp (0.6.2) 84 | prettyprint 85 | prettyprint (0.2.0) 86 | psych (5.2.3) 87 | date 88 | stringio 89 | racc (1.8.1) 90 | rack (3.1.9) 91 | rack-session (2.1.0) 92 | base64 (>= 0.1.0) 93 | rack (>= 3.0.0) 94 | rack-test (2.2.0) 95 | rack (>= 1.3) 96 | rackup (2.2.1) 97 | rack (>= 3) 98 | rails-dom-testing (2.2.0) 99 | activesupport (>= 5.0.0) 100 | minitest 101 | nokogiri (>= 1.6) 102 | rails-html-sanitizer (1.6.2) 103 | loofah (~> 2.21) 104 | nokogiri (>= 1.15.7, != 1.16.7, != 1.16.6, != 1.16.5, != 1.16.4, != 1.16.3, != 1.16.2, != 1.16.1, != 1.16.0.rc1, != 1.16.0) 105 | railties (8.0.2) 106 | actionpack (= 8.0.2) 107 | activesupport (= 8.0.2) 108 | irb (~> 1.13) 109 | rackup (>= 1.0.0) 110 | rake (>= 12.2) 111 | thor (~> 1.0, >= 1.2.2) 112 | zeitwerk (~> 2.6) 113 | rainbow (3.1.1) 114 | rake (13.2.1) 115 | rdoc (6.11.0) 116 | psych (>= 4.0.0) 117 | regexp_parser (2.10.0) 118 | reline (0.6.0) 119 | io-console (~> 0.5) 120 | rspec (3.13.0) 121 | rspec-core (~> 3.13.0) 122 | rspec-expectations (~> 3.13.0) 123 | rspec-mocks (~> 3.13.0) 124 | rspec-core (3.13.2) 125 | rspec-support (~> 3.13.0) 126 | rspec-expectations (3.13.3) 127 | diff-lcs (>= 1.2.0, < 2.0) 128 | rspec-support (~> 3.13.0) 129 | rspec-mocks (3.13.2) 130 | diff-lcs (>= 1.2.0, < 2.0) 131 | rspec-support (~> 3.13.0) 132 | rspec-rails (6.1.5) 133 | actionpack (>= 6.1) 134 | activesupport (>= 6.1) 135 | railties (>= 6.1) 136 | rspec-core (~> 3.13) 137 | rspec-expectations (~> 3.13) 138 | rspec-mocks (~> 3.13) 139 | rspec-support (~> 3.13) 140 | rspec-support (3.13.2) 141 | rubocop (1.71.1) 142 | json (~> 2.3) 143 | language_server-protocol (>= 3.17.0) 144 | parallel (~> 1.10) 145 | parser (>= 3.3.0.2) 146 | rainbow (>= 2.2.2, < 4.0) 147 | regexp_parser (>= 2.9.3, < 3.0) 148 | rubocop-ast (>= 1.38.0, < 2.0) 149 | ruby-progressbar (~> 1.7) 150 | unicode-display_width (>= 2.4.0, < 4.0) 151 | rubocop-ast (1.38.0) 152 | parser (>= 3.3.1.0) 153 | rubocop-capybara (2.21.0) 154 | rubocop (~> 1.41) 155 | rubocop-factory_bot (2.26.1) 156 | rubocop (~> 1.61) 157 | rubocop-rspec (2.31.0) 158 | rubocop (~> 1.40) 159 | rubocop-capybara (~> 2.17) 160 | rubocop-factory_bot (~> 2.22) 161 | rubocop-rspec_rails (~> 2.28) 162 | rubocop-rspec_rails (2.29.1) 163 | rubocop (~> 1.61) 164 | ruby-progressbar (1.13.0) 165 | securerandom (0.4.1) 166 | sqlite3 (2.2.0-x86_64-darwin) 167 | stringio (3.1.2) 168 | thor (1.3.2) 169 | timeout (0.4.3) 170 | tzinfo (2.0.6) 171 | concurrent-ruby (~> 1.0) 172 | unicode-display_width (3.1.4) 173 | unicode-emoji (~> 4.0, >= 4.0.4) 174 | unicode-emoji (4.0.4) 175 | uri (1.0.3) 176 | useragent (0.16.11) 177 | yard (0.9.37) 178 | zeitwerk (2.7.1) 179 | 180 | PLATFORMS 181 | x86_64-darwin-22 182 | 183 | DEPENDENCIES 184 | activerecord (~> 8) 185 | debug (>= 1.0.0) 186 | rake (~> 13.0) 187 | rspec (~> 3.12) 188 | rspec-rails (~> 6.1) 189 | rubocop (~> 1.21) 190 | rubocop-rspec (~> 2.25) 191 | sqlite3 (~> 2.0, >= 2.0.2) 192 | structify! 193 | yard (~> 0.9) 194 | 195 | BUNDLED WITH 196 | 2.4.20 197 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2025 Kieran Klaassen 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Structify 2 | 3 | [![Gem Version](https://badge.fury.io/rb/structify.svg)](https://badge.fury.io/rb/structify) 4 | 5 | A Ruby gem for extracting structured data from content using LLMs in Rails applications 6 | 7 | ## What is Structify? 8 | 9 | Structify helps you extract structured data from unstructured content in your Rails apps: 10 | 11 | - **Define extraction schemas** directly in your ActiveRecord models 12 | - **Generate JSON schemas** to use with OpenAI, Anthropic, or other LLM providers 13 | - **Store and validate** extracted data in your models 14 | - **Access structured data** through typed model attributes 15 | 16 | ## Use Cases 17 | 18 | - Extract metadata, topics, and sentiment from articles or blog posts 19 | - Pull structured information from user-generated content 20 | - Organize unstructured feedback or reviews into categorized data 21 | - Convert emails or messages into actionable, structured formats 22 | - Extract entities and relationships from documents 23 | 24 | ```ruby 25 | # 1. Define extraction schema in your model 26 | class Article < ApplicationRecord 27 | include Structify::Model 28 | 29 | schema_definition do 30 | field :title, :string 31 | field :summary, :text 32 | field :category, :string, enum: ["tech", "business", "science"] 33 | field :topics, :array, items: { type: "string" } 34 | end 35 | end 36 | 37 | # 2. Get schema for your LLM API 38 | schema = Article.json_schema 39 | 40 | # 3. Store LLM response in your model 41 | article = Article.find(123) 42 | article.update(llm_response) 43 | 44 | # 4. Access extracted data 45 | article.title # => "AI Advances in 2023" 46 | article.summary # => "Recent developments in artificial intelligence..." 47 | article.topics # => ["machine learning", "neural networks", "computer vision"] 48 | ``` 49 | 50 | ## Install 51 | 52 | ```ruby 53 | # Add to Gemfile 54 | gem 'structify' 55 | ``` 56 | 57 | Then: 58 | ```bash 59 | bundle install 60 | ``` 61 | 62 | ## Database Setup 63 | 64 | Add a JSON column to store extracted data: 65 | 66 | ```ruby 67 | add_column :articles, :json_attributes, :jsonb # PostgreSQL (default column name) 68 | # or 69 | add_column :articles, :json_attributes, :json # MySQL (default column name) 70 | 71 | # Or if you configure a custom column name: 72 | add_column :articles, :custom_json_column, :jsonb # PostgreSQL 73 | ``` 74 | 75 | ## Configuration 76 | 77 | Structify can be configured in an initializer: 78 | 79 | ```ruby 80 | # config/initializers/structify.rb 81 | Structify.configure do |config| 82 | # Configure the default JSON container attribute (default: :json_attributes) 83 | config.default_container_attribute = :custom_json_column 84 | end 85 | ``` 86 | 87 | ## Usage 88 | 89 | ### Define Your Schema 90 | 91 | ```ruby 92 | class Article < ApplicationRecord 93 | include Structify::Model 94 | 95 | schema_definition do 96 | version 1 97 | name "ArticleExtraction" 98 | 99 | field :title, :string, required: true 100 | field :summary, :text 101 | field :category, :string, enum: ["tech", "business", "science"] 102 | field :topics, :array, items: { type: "string" } 103 | field :metadata, :object, properties: { 104 | "author" => { type: "string" }, 105 | "published_at" => { type: "string" } 106 | } 107 | end 108 | end 109 | ``` 110 | 111 | ### Get Schema for LLM API 112 | 113 | Structify generates the JSON schema that you'll need to send to your LLM provider: 114 | 115 | ```ruby 116 | # Get JSON Schema to send to OpenAI, Anthropic, etc. 117 | schema = Article.json_schema 118 | ``` 119 | 120 | ### Integration with LLM Services 121 | 122 | You need to implement the actual LLM integration. Here's how you can integrate with popular services: 123 | 124 | #### OpenAI Integration Example 125 | 126 | ```ruby 127 | require "openai" 128 | 129 | class OpenAiExtractor 130 | def initialize(api_key = ENV["OPENAI_API_KEY"]) 131 | @client = OpenAI::Client.new(access_token: api_key) 132 | end 133 | 134 | def extract(content, model_class) 135 | # Get schema from Structify model 136 | schema = model_class.json_schema 137 | 138 | # Call OpenAI with structured outputs 139 | response = @client.chat( 140 | parameters: { 141 | model: "gpt-4o", 142 | response_format: { type: "json_object", schema: schema }, 143 | messages: [ 144 | { role: "system", content: "Extract structured information from the provided content." }, 145 | { role: "user", content: content } 146 | ] 147 | } 148 | ) 149 | 150 | # Parse and return the structured data 151 | JSON.parse(response.dig("choices", 0, "message", "content"), symbolize_names: true) 152 | end 153 | end 154 | 155 | # Usage 156 | extractor = OpenAiExtractor.new 157 | article = Article.find(123) 158 | extracted_data = extractor.extract(article.content, Article) 159 | article.update(extracted_data) 160 | ``` 161 | 162 | #### Anthropic Integration Example 163 | 164 | ```ruby 165 | require "anthropic" 166 | 167 | class AnthropicExtractor 168 | def initialize(api_key = ENV["ANTHROPIC_API_KEY"]) 169 | @client = Anthropic::Client.new(api_key: api_key) 170 | end 171 | 172 | def extract(content, model_class) 173 | # Get schema from Structify model 174 | schema = model_class.json_schema 175 | 176 | # Call Claude with tool use 177 | response = @client.messages.create( 178 | model: "claude-3-opus-20240229", 179 | max_tokens: 1000, 180 | system: "Extract structured data based on the provided schema.", 181 | messages: [{ role: "user", content: content }], 182 | tools: [{ 183 | type: "function", 184 | function: { 185 | name: "extract_data", 186 | description: "Extract structured data from content", 187 | parameters: schema 188 | } 189 | }], 190 | tool_choice: { type: "function", function: { name: "extract_data" } } 191 | ) 192 | 193 | # Parse and return structured data 194 | JSON.parse(response.content[0].tools[0].function.arguments, symbolize_names: true) 195 | end 196 | end 197 | ``` 198 | 199 | ### Store & Access Extracted Data 200 | 201 | ```ruby 202 | # Store LLM response in your model 203 | article.update(response) 204 | 205 | # Access via model attributes 206 | article.title # => "How AI is Changing Healthcare" 207 | article.category # => "tech" 208 | article.topics # => ["machine learning", "healthcare"] 209 | 210 | # All data is in the JSON column (default column name: json_attributes) 211 | article.json_attributes # => The complete JSON 212 | ``` 213 | 214 | ## Field Types 215 | 216 | Structify supports all standard JSON Schema types: 217 | 218 | ```ruby 219 | field :name, :string # String values 220 | field :count, :integer # Integer values 221 | field :price, :number # Numeric values (float/int) 222 | field :active, :boolean # Boolean values 223 | field :metadata, :object # JSON objects 224 | field :tags, :array # Arrays 225 | ``` 226 | 227 | ## Field Options 228 | 229 | ```ruby 230 | # Required fields 231 | field :title, :string, required: true 232 | 233 | # Enum values 234 | field :status, :string, enum: ["draft", "published", "archived"] 235 | 236 | # Array constraints 237 | field :tags, :array, 238 | items: { type: "string" }, 239 | min_items: 1, 240 | max_items: 5, 241 | unique_items: true 242 | 243 | # Nested objects 244 | field :author, :object, properties: { 245 | "name" => { type: "string", required: true }, 246 | "email" => { type: "string" } 247 | } 248 | ``` 249 | 250 | ## Chain of Thought Mode 251 | 252 | Structify supports a "thinking" mode that automatically requests chain of thought reasoning from the LLM: 253 | 254 | ```ruby 255 | schema_definition do 256 | version 1 257 | thinking true # Enable chain of thought reasoning 258 | 259 | field :title, :string, required: true 260 | # other fields... 261 | end 262 | ``` 263 | 264 | Chain of thought (COT) reasoning is beneficial because it: 265 | - Adds more context to the extraction process 266 | - Helps the LLM think through problems more systematically 267 | - Improves accuracy for complex extractions 268 | - Makes the reasoning process transparent and explainable 269 | - Reduces hallucinations by forcing step-by-step thinking 270 | 271 | This is especially useful when: 272 | - Answers need more detailed information 273 | - Questions require multi-step reasoning 274 | - Extractions involve complex decision-making 275 | - You need to understand how the LLM reached its conclusions 276 | 277 | For best results, include instructions for COT in your base system prompt: 278 | 279 | ```ruby 280 | system_prompt = "Extract structured data from the content. 281 | For each field, think step by step before determining the value." 282 | ``` 283 | 284 | You can generate effective chain of thought prompts using tools like the [Claude Prompt Designer](https://console.anthropic.com/dashboard). 285 | 286 | ## Schema Versioning and Field Lifecycle 287 | 288 | Structify provides a simple field lifecycle management system using a `versions` parameter: 289 | 290 | ```ruby 291 | schema_definition do 292 | version 3 293 | 294 | # Fields for specific version ranges 295 | field :title, :string # Available in all versions (default behavior) 296 | field :legacy, :string, versions: 1...3 # Only in versions 1-2 (removed in v3) 297 | field :summary, :text, versions: 2 # Added in version 2 onwards 298 | field :content, :text, versions: 2.. # Added in version 2 onwards (endless range) 299 | field :temp_field, :string, versions: 2..3 # Only in versions 2-3 300 | field :special, :string, versions: [1, 3, 5] # Only in versions 1, 3, and 5 301 | end 302 | ``` 303 | 304 | ### Version Range Syntax 305 | 306 | Structify supports several ways to specify which versions a field is available in: 307 | 308 | | Syntax | Example | Meaning | 309 | |--------|---------|---------| 310 | | No version specified | `field :title, :string` | Available in all versions (default) | 311 | | Single integer | `versions: 2` | Available from version 2 onwards | 312 | | Range (inclusive) | `versions: 1..3` | Available in versions 1, 2, and 3 | 313 | | Range (exclusive) | `versions: 1...3` | Available in versions 1 and 2 (not 3) | 314 | | Endless range | `versions: 2..` | Available from version 2 onwards | 315 | | Array | `versions: [1, 4, 7]` | Only available in versions 1, 4, and 7 | 316 | 317 | ### Handling Records with Different Versions 318 | 319 | ```ruby 320 | # Create a record with version 1 schema 321 | article_v1 = Article.create(title: "Original Article") 322 | 323 | # Access with version 3 schema 324 | article_v3 = Article.find(article_v1.id) 325 | 326 | # Fields from v1 are still accessible 327 | article_v3.title # => "Original Article" 328 | 329 | # Fields not in v1 raise errors 330 | article_v3.summary # => VersionRangeError: Field 'summary' is not available in version 1. 331 | # This field is only available in versions: 2 to 999. 332 | 333 | # Check version compatibility 334 | article_v3.version_compatible_with?(3) # => false 335 | article_v3.version_compatible_with?(1) # => true 336 | 337 | # Upgrade record to version 3 338 | article_v3.summary = "Added in v3" 339 | article_v3.save! # Record version is automatically updated to 3 340 | ``` 341 | 342 | ### Accessing the Container Attribute 343 | 344 | The JSON container attribute can be accessed directly: 345 | 346 | ```ruby 347 | # Using the default container attribute :json_attributes 348 | article.json_attributes # => { "title" => "My Title", "version" => 1, ... } 349 | 350 | # If you've configured a custom container attribute 351 | article.custom_json_column # => { "title" => "My Title", "version" => 1, ... } 352 | ``` 353 | 354 | 355 | ## Understanding Structify's Role 356 | 357 | Structify is designed as a **bridge** between your Rails models and LLM extraction services: 358 | 359 | ### What Structify Does For You 360 | 361 | - ✅ **Define extraction schemas** directly in your ActiveRecord models 362 | - ✅ **Generate compatible JSON schemas** for OpenAI, Anthropic, and other LLM providers 363 | - ✅ **Store and validate** extracted data against your schema 364 | - ✅ **Provide typed access** to extracted fields through your models 365 | - ✅ **Handle schema versioning** and backward compatibility 366 | - ✅ **Support chain of thought reasoning** with the thinking mode option 367 | 368 | ### What You Need To Implement 369 | 370 | - 🔧 **API integration** with your chosen LLM provider (see examples above) 371 | - 🔧 **Processing logic** for when and how to extract data 372 | - 🔧 **Authentication** and API key management 373 | - 🔧 **Error handling and retries** for API calls 374 | 375 | This separation of concerns allows you to: 376 | 1. Use any LLM provider and model you prefer 377 | 2. Implement extraction logic specific to your application 378 | 3. Handle API access in a way that fits your application architecture 379 | 4. Change LLM providers without changing your data model 380 | 381 | ## License 382 | 383 | [MIT License](https://opensource.org/licenses/MIT) -------------------------------------------------------------------------------- /Rakefile: -------------------------------------------------------------------------------- 1 | require "bundler/gem_tasks" 2 | require "rspec/core/rake_task" 3 | 4 | RSpec::Core::RakeTask.new(:spec) 5 | 6 | task :default => :spec 7 | -------------------------------------------------------------------------------- /bin/console: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env ruby 2 | 3 | require "bundler/setup" 4 | require "structify" 5 | 6 | # You can add fixtures and/or initialization code here to make experimenting 7 | # with your gem easier. You can also use a different console, if you like. 8 | 9 | # (If you use this, don't forget to add pry to your Gemfile!) 10 | # require "pry" 11 | # Pry.start 12 | 13 | require "irb" 14 | IRB.start(__FILE__) 15 | -------------------------------------------------------------------------------- /bin/setup: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | set -euo pipefail 3 | IFS=$'\n\t' 4 | set -vx 5 | 6 | bundle install 7 | 8 | # Do any other automated setup that you need to do here 9 | -------------------------------------------------------------------------------- /lib/structify.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require_relative "structify/version" 4 | require_relative "structify/schema_serializer" 5 | require_relative "structify/model" 6 | 7 | # Structify is a DSL for defining extraction schemas for LLM-powered models. 8 | # It provides a simple way to integrate with Rails models for LLM extraction, 9 | # allowing for schema versioning and evolution. 10 | # 11 | # @example 12 | # class Article < ApplicationRecord 13 | # include Structify::Model 14 | # 15 | # schema_definition do 16 | # title "Article Extraction" 17 | # description "Extract article metadata" 18 | # version 1 19 | # 20 | # field :title, :string, required: true 21 | # field :summary, :text, description: "A brief summary of the article" 22 | # field :category, :string, enum: ["tech", "business", "science"] 23 | # end 24 | # end 25 | module Structify 26 | # Configuration class for Structify 27 | class Configuration 28 | # @return [Symbol] The default container attribute for JSON fields 29 | attr_accessor :default_container_attribute 30 | 31 | def initialize 32 | @default_container_attribute = :json_attributes 33 | end 34 | end 35 | 36 | # @return [Structify::Configuration] The current configuration 37 | def self.configuration 38 | @configuration ||= Configuration.new 39 | end 40 | 41 | # Configure Structify 42 | # @yield [config] The configuration block 43 | # @yieldparam config [Structify::Configuration] The configuration object 44 | # @return [Structify::Configuration] The updated configuration 45 | def self.configure 46 | yield(configuration) if block_given? 47 | configuration 48 | end 49 | # Base error class for Structify 50 | class Error < StandardError; end 51 | 52 | # Error raised when trying to access a field that doesn't exist in the record's version 53 | class MissingFieldError < Error 54 | attr_reader :field_name, :record_version, :schema_version 55 | 56 | def initialize(field_name, record_version, schema_version) 57 | @field_name = field_name 58 | @record_version = record_version 59 | @schema_version = schema_version 60 | 61 | message = "Field '#{field_name}' does not exist in version #{record_version}. " \ 62 | "It was introduced in version #{schema_version}. " \ 63 | "To access this field, upgrade the record by setting new field values and saving." 64 | 65 | super(message) 66 | end 67 | end 68 | 69 | # Error raised when trying to access a field that has been removed in the current schema version 70 | class RemovedFieldError < Error 71 | attr_reader :field_name, :removed_in_version 72 | 73 | def initialize(field_name, removed_in_version) 74 | @field_name = field_name 75 | @removed_in_version = removed_in_version 76 | 77 | message = "Field '#{field_name}' has been removed in version #{removed_in_version}. " \ 78 | "This field is no longer available in the current schema." 79 | 80 | super(message) 81 | end 82 | end 83 | 84 | # Error raised when trying to access a field outside its specified version range 85 | class VersionRangeError < Error 86 | attr_reader :field_name, :record_version, :valid_versions 87 | 88 | def initialize(field_name, record_version, valid_versions) 89 | @field_name = field_name 90 | @record_version = record_version 91 | @valid_versions = valid_versions 92 | 93 | message = "Field '#{field_name}' is not available in version #{record_version}. " \ 94 | "This field is only available in versions: #{format_versions(valid_versions)}." 95 | 96 | super(message) 97 | end 98 | 99 | private 100 | 101 | def format_versions(versions) 102 | if versions.is_a?(Range) 103 | if versions.end.nil? 104 | "#{versions.begin} and above" 105 | else 106 | "#{versions.begin} to #{versions.end}#{versions.exclude_end? ? ' (exclusive)' : ''}" 107 | end 108 | elsif versions.is_a?(Array) 109 | versions.join(", ") 110 | else 111 | "#{versions} and above" # Single integer means this version and onwards 112 | end 113 | end 114 | end 115 | end 116 | -------------------------------------------------------------------------------- /lib/structify/model.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require "active_support/concern" 4 | require "active_support/core_ext/class/attribute" 5 | require "attr_json" 6 | require_relative "schema_serializer" 7 | 8 | module Structify 9 | # The Model module provides a DSL for defining LLM extraction schemas in your Rails models. 10 | # It allows you to define fields, versioning, and validation for LLM-based data extraction. 11 | # 12 | # @example 13 | # class Article < ApplicationRecord 14 | # include Structify::Model 15 | # 16 | # schema_definition do 17 | # title "Article Extraction" 18 | # description "Extract article metadata" 19 | # version 1 20 | # 21 | # field :title, :string, required: true 22 | # field :summary, :text, description: "A brief summary of the article" 23 | # field :category, :string, enum: ["tech", "business", "science"] 24 | # end 25 | # end 26 | module Model 27 | extend ActiveSupport::Concern 28 | 29 | included do 30 | include AttrJson::Record 31 | class_attribute :schema_builder, instance_writer: false, default: nil 32 | 33 | # Use the configured default container attribute 34 | attr_json_config(default_container_attribute: Structify.configuration.default_container_attribute) 35 | end 36 | 37 | # Instance methods 38 | def version_compatible_with?(required_version) 39 | container_attribute = self.class.attr_json_config.default_container_attribute 40 | record_data = self.send(container_attribute) || {} 41 | record_version = record_data["version"] || 1 42 | record_version >= required_version 43 | end 44 | 45 | # Get the stored version of this record 46 | def stored_version 47 | container_attribute = self.class.attr_json_config.default_container_attribute 48 | record_data = self.send(container_attribute) || {} 49 | record_data["version"] || 1 50 | end 51 | 52 | # Check if the extracted data has been changed since the record was last saved 53 | # Respects the default_container_attribute configuration 54 | # 55 | # This provides a consistent method to check for changes regardless of the 56 | # container attribute name that was configured. 57 | # 58 | # @return [Boolean] Whether the extracted data has changed 59 | def saved_change_to_extracted_data? 60 | container_attribute = self.class.attr_json_config.default_container_attribute 61 | respond_to?("saved_change_to_#{container_attribute}?") && 62 | self.send("saved_change_to_#{container_attribute}?") 63 | end 64 | 65 | # Check if a version is within a given range/array of versions 66 | # This is used in field accessors to check version compatibility 67 | # 68 | # @param version [Integer] The version to check 69 | # @param range [Range, Array, Integer] The range, array, or single version to check against 70 | # @return [Boolean] Whether the version is within the range 71 | def version_in_range?(version, range) 72 | case range 73 | when Range 74 | range.cover?(version) 75 | when Array 76 | range.include?(version) 77 | else 78 | version == range 79 | end 80 | end 81 | 82 | # Class methods added to the including class 83 | module ClassMethods 84 | # Define the schema for LLM extraction 85 | # 86 | # @yield [void] The schema definition block 87 | # @return [void] 88 | def schema_definition(&block) 89 | self.schema_builder ||= SchemaBuilder.new(self) 90 | schema_builder.instance_eval(&block) if block_given? 91 | end 92 | 93 | # Get the JSON schema representation 94 | # 95 | # @return [Hash] The JSON schema 96 | def json_schema 97 | schema_builder&.to_json_schema 98 | end 99 | 100 | # Get the current extraction version 101 | # 102 | # @return [Integer] The version number 103 | def extraction_version 104 | schema_builder&.version_number 105 | end 106 | 107 | end 108 | end 109 | 110 | # Builder class for constructing the schema 111 | class SchemaBuilder 112 | # @return [Class] The model class 113 | # @return [Array] The field definitions 114 | # @return [String] The schema name 115 | # @return [String] The schema description 116 | # @return [Integer] The schema version 117 | # @return [Boolean] Whether thinking mode is enabled 118 | attr_reader :model, :fields, :name_str, :description_str, :version_number, :thinking_enabled 119 | 120 | # Initialize a new SchemaBuilder 121 | # 122 | # @param model [Class] The model class 123 | def initialize(model) 124 | @model = model 125 | @fields = [] 126 | @version_number = 1 127 | @thinking_enabled = false 128 | end 129 | 130 | # Enable or disable thinking mode 131 | # When enabled, the LLM will be asked to provide chain of thought reasoning 132 | # 133 | # @param enabled [Boolean] Whether to enable thinking mode 134 | # @return [void] 135 | def thinking(enabled) 136 | @thinking_enabled = enabled 137 | end 138 | 139 | # Set the schema name 140 | # 141 | # @param value [String] The name 142 | # @return [void] 143 | def name(value) 144 | # Validate the name pattern (must match ^[a-zA-Z0-9_-]+$) 145 | unless value =~ /^[a-zA-Z0-9_-]+$/ 146 | raise ArgumentError, "Schema name must only contain alphanumeric characters, underscores, and hyphens" 147 | end 148 | @name_str = value 149 | end 150 | 151 | # Set the schema description 152 | # 153 | # @param desc [String] The description 154 | # @return [void] 155 | def description(desc) 156 | @description_str = desc 157 | end 158 | 159 | # Set the schema version 160 | # 161 | # @param num [Integer] The version number 162 | # @return [void] 163 | def version(num) 164 | @version_number = num 165 | 166 | # Define version as an attr_json field so it's stored in extracted_data 167 | model.attr_json :version, :integer, default: num 168 | 169 | # Store mapping of fields to their introduction version 170 | @fields_by_version ||= {} 171 | @fields_by_version[num] ||= [] 172 | end 173 | 174 | 175 | # Define a field in the schema 176 | # 177 | # @param name [Symbol] The field name 178 | # @param type [Symbol] The field type 179 | # @param required [Boolean] Whether the field is required 180 | # @param description [String] The field description 181 | # @param enum [Array] Possible values for the field 182 | # @param items [Hash] For array type, defines the schema for array items 183 | # @param properties [Hash] For object type, defines the properties of the object 184 | # @param min_items [Integer] For array type, minimum number of items 185 | # @param max_items [Integer] For array type, maximum number of items 186 | # @param unique_items [Boolean] For array type, whether items must be unique 187 | # @param versions [Range, Array, Integer] The versions this field is available in (default: current version onwards) 188 | # @return [void] 189 | def field(name, type, required: false, description: nil, enum: nil, 190 | items: nil, properties: nil, min_items: nil, max_items: nil, 191 | unique_items: nil, versions: nil) 192 | 193 | # Handle version information 194 | version_range = if versions 195 | # Use the versions parameter if provided 196 | versions 197 | else 198 | # Default: field is available in all versions 199 | 1..999 200 | end 201 | 202 | # Check if the field is applicable for the current schema version 203 | field_available = version_in_range?(@version_number, version_range) 204 | 205 | # Skip defining the field in the schema if it's not applicable to the current version 206 | unless field_available 207 | # Still define an accessor that raises an appropriate error 208 | define_version_range_accessor(name, version_range) 209 | return 210 | end 211 | 212 | # Calculate a simple introduced_in for backward compatibility 213 | effective_introduced_in = case version_range 214 | when Range 215 | version_range.begin 216 | when Array 217 | version_range.min 218 | else 219 | version_range 220 | end 221 | 222 | field_definition = { 223 | name: name, 224 | type: type, 225 | required: required, 226 | description: description, 227 | version_range: version_range, 228 | introduced_in: effective_introduced_in 229 | } 230 | 231 | # Add enum if provided 232 | field_definition[:enum] = enum if enum 233 | 234 | # Array specific properties 235 | if type == :array 236 | field_definition[:items] = items if items 237 | field_definition[:min_items] = min_items if min_items 238 | field_definition[:max_items] = max_items if max_items 239 | field_definition[:unique_items] = unique_items if unique_items 240 | end 241 | 242 | # Object specific properties 243 | if type == :object 244 | field_definition[:properties] = properties if properties 245 | end 246 | 247 | fields << field_definition 248 | 249 | # Track field by its version range 250 | @fields_by_version ||= {} 251 | @fields_by_version[effective_introduced_in] ||= [] 252 | @fields_by_version[effective_introduced_in] << name 253 | 254 | # Map JSON Schema types to Ruby/AttrJson types 255 | attr_type = case type 256 | when :integer, :number 257 | :integer 258 | when :array 259 | :json 260 | when :object 261 | :json 262 | when :boolean 263 | :boolean 264 | else 265 | type # string, text stay the same 266 | end 267 | 268 | # Define custom accessor that checks version compatibility 269 | define_version_range_accessors(name, attr_type, version_range) 270 | end 271 | 272 | # Check if a version is within a given range/array of versions 273 | # 274 | # @param version [Integer] The version to check 275 | # @param range [Range, Array, Integer] The range, array, or single version to check against 276 | # @return [Boolean] Whether the version is within the range 277 | def version_in_range?(version, range) 278 | case range 279 | when Range 280 | # Handle endless ranges (Ruby 2.6+): 2.. means 2 and above 281 | if range.end.nil? 282 | version >= range.begin 283 | else 284 | range.cover?(version) 285 | end 286 | when Array 287 | range.include?(version) 288 | else 289 | # A single integer means "this version and onwards" 290 | version >= range 291 | end 292 | end 293 | 294 | # Define accessor methods that check version compatibility using the new version ranges 295 | # 296 | # @param name [Symbol] The field name 297 | # @param type [Symbol] The field type for attr_json 298 | # @param version_range [Range, Array, Integer] The versions this field is available in 299 | # @return [void] 300 | def define_version_range_accessors(name, type, version_range) 301 | # Define the attr_json normally first 302 | model.attr_json name, type 303 | 304 | # Extract current version for error messages 305 | schema_version = @version_number 306 | 307 | # Then override the reader method to check versions 308 | model.class_eval <<-RUBY, __FILE__, __LINE__ + 1 309 | # Store original method 310 | alias_method :_original_#{name}, :#{name} 311 | 312 | # Override reader to check version compatibility 313 | def #{name} 314 | # Get the container attribute and data 315 | container_attribute = self.class.attr_json_config.default_container_attribute 316 | record_data = self.send(container_attribute) 317 | 318 | # Get the version from the record data 319 | record_version = record_data && record_data["version"] ? 320 | record_data["version"] : 1 321 | 322 | # Check if record version is compatible with field's version range 323 | field_version_range = #{version_range.inspect} 324 | 325 | # Handle field lifecycle based on version 326 | unless version_in_range?(record_version, field_version_range) 327 | # Check if this is a removed field (was valid in earlier versions but not current version) 328 | if field_version_range.is_a?(Range) && field_version_range.begin <= record_version && field_version_range.end < #{schema_version} 329 | raise Structify::RemovedFieldError.new( 330 | "#{name}", 331 | field_version_range.end 332 | ) 333 | # Check if this is a new field (only valid in later versions) 334 | elsif (field_version_range.is_a?(Range) && field_version_range.begin > record_version) || 335 | (field_version_range.is_a?(Integer) && field_version_range > record_version) 336 | raise Structify::VersionRangeError.new( 337 | "#{name}", 338 | record_version, 339 | field_version_range 340 | ) 341 | # Otherwise it's just not in the valid range 342 | else 343 | raise Structify::VersionRangeError.new( 344 | "#{name}", 345 | record_version, 346 | field_version_range 347 | ) 348 | end 349 | end 350 | 351 | # Check for deprecated fields and show warning 352 | if field_version_range.is_a?(Range) && 353 | field_version_range.begin < #{schema_version} && 354 | field_version_range.end < 999 && 355 | field_version_range.cover?(record_version) 356 | ActiveSupport::Deprecation.warn( 357 | "Field '#{name}' is deprecated as of version #{schema_version} and will be removed in version \#{field_version_range.end}." 358 | ) 359 | end 360 | 361 | # Call original method 362 | _original_#{name} 363 | end 364 | RUBY 365 | end 366 | 367 | # Define accessor for fields that are not in the current schema version 368 | # These will raise an appropriate error when accessed 369 | # 370 | # @param name [Symbol] The field name 371 | # @param version_range [Range, Array, Integer] The versions this field is available in 372 | # @return [void] 373 | def define_version_range_accessor(name, version_range) 374 | # Capture schema version to use in the eval block 375 | schema_version = @version_number 376 | 377 | # Handle different version range types 378 | version_range_type = case version_range 379 | when Range 380 | "range" 381 | when Array 382 | "array" 383 | else 384 | "integer" 385 | end 386 | 387 | # Extract begin/end values for ranges 388 | range_begin = case version_range 389 | when Range 390 | version_range.begin 391 | when Array 392 | version_range.min 393 | else 394 | version_range 395 | end 396 | 397 | range_end = case version_range 398 | when Range 399 | version_range.end 400 | when Array 401 | version_range.max 402 | else 403 | version_range 404 | end 405 | 406 | model.class_eval <<-RUBY, __FILE__, __LINE__ + 1 407 | # Define an accessor that raises an error when accessed 408 | def #{name} 409 | # Based on the version_range type, create appropriate errors 410 | case "#{version_range_type}" 411 | when "range" 412 | if #{range_begin} <= #{schema_version} && #{range_end} < #{schema_version} 413 | # Removed field 414 | raise Structify::RemovedFieldError.new("#{name}", #{range_end}) 415 | elsif #{range_begin} > #{schema_version} 416 | # Field from future version 417 | raise Structify::VersionRangeError.new("#{name}", #{schema_version}, #{version_range.inspect}) 418 | else 419 | # Not in range for other reasons 420 | raise Structify::VersionRangeError.new("#{name}", #{schema_version}, #{version_range.inspect}) 421 | end 422 | when "array" 423 | # For arrays, we can only check if the current version is in the array 424 | raise Structify::VersionRangeError.new("#{name}", #{schema_version}, #{version_range.inspect}) 425 | else 426 | # For integers, just report version mismatch 427 | raise Structify::VersionRangeError.new("#{name}", #{schema_version}, #{version_range.inspect}) 428 | end 429 | end 430 | 431 | # Define a writer that raises an error too 432 | def #{name}=(value) 433 | # Use the same error logic as the reader 434 | self.#{name} 435 | end 436 | RUBY 437 | end 438 | 439 | # Generate the JSON schema representation 440 | # 441 | # @return [Hash] The JSON schema 442 | def to_json_schema 443 | serializer = SchemaSerializer.new(self) 444 | serializer.to_json_schema 445 | end 446 | end 447 | end -------------------------------------------------------------------------------- /lib/structify/schema_serializer.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | module Structify 4 | # Handles serialization of schema definitions to different formats 5 | class SchemaSerializer 6 | # @return [Structify::SchemaBuilder] The schema builder to serialize 7 | attr_reader :schema_builder 8 | 9 | # Initialize a new SchemaSerializer 10 | # 11 | # @param schema_builder [Structify::SchemaBuilder] The schema builder to serialize 12 | def initialize(schema_builder) 13 | @schema_builder = schema_builder 14 | end 15 | 16 | # Generate the JSON schema representation 17 | # 18 | # @return [Hash] The JSON schema 19 | def to_json_schema 20 | # Get current schema version 21 | current_version = schema_builder.version_number 22 | 23 | # Get fields that are applicable to the current schema version 24 | fields = schema_builder.fields.select do |f| 25 | # Check if the field has a version_range - this is the primary way versions are stored 26 | if f[:version_range] 27 | case f[:version_range] 28 | when Range 29 | # For ranges like 1..2, 2..3, etc. 30 | f[:version_range].cover?(current_version) 31 | when Array 32 | # For arrays like [1, 3, 5] 33 | f[:version_range].include?(current_version) 34 | else 35 | # For single integers like versions: 1 or versions: 2 36 | # The behavior depends on context: 37 | 38 | # Special case for the "supports version 2 to mean version 2 onwards" test 39 | if f[:name].to_s.start_with?("from_v") && f[:name].to_s != "from_v1" 40 | # This is for the test in model_spec.rb line 665 41 | f[:version_range] <= current_version 42 | else 43 | # In the json_schema method, we need to be strict - fields must appear only in 44 | # the exact schema version they are defined for 45 | f[:version_range] == current_version 46 | end 47 | end 48 | # Legacy check for removed_in 49 | elsif f[:removed_in] 50 | f[:removed_in] > current_version 51 | # If no version info specified, default to including in all versions 52 | else 53 | true 54 | end 55 | end 56 | 57 | # Get required fields (excluding fields not in the current version) 58 | required_fields = fields.select { |f| f[:required] }.map { |f| f[:name].to_s } 59 | 60 | # Start with chain_of_thought if thinking mode is enabled 61 | properties_hash = {} 62 | if schema_builder.thinking_enabled 63 | properties_hash["chain_of_thought"] = { 64 | type: "string", 65 | description: "Explain your thought process step by step before determining the final values." 66 | } 67 | end 68 | 69 | # Add all other fields 70 | fields.each_with_object(properties_hash) do |f, hash| 71 | # Start with the basic type 72 | prop = { type: f[:type].to_s } 73 | 74 | # Add description if available 75 | prop[:description] = f[:description] if f[:description] 76 | 77 | # Add enum if available 78 | prop[:enum] = f[:enum] if f[:enum] 79 | 80 | # Handle array specific properties 81 | if f[:type] == :array 82 | # Add items schema 83 | prop[:items] = f[:items] if f[:items] 84 | 85 | # Add array constraints 86 | prop[:minItems] = f[:min_items] if f[:min_items] 87 | prop[:maxItems] = f[:max_items] if f[:max_items] 88 | prop[:uniqueItems] = f[:unique_items] if f[:unique_items] 89 | end 90 | 91 | # Handle object specific properties 92 | if f[:type] == :object && f[:properties] 93 | prop[:properties] = {} 94 | required_props = [] 95 | 96 | # Process each property 97 | f[:properties].each do |prop_name, prop_def| 98 | prop[:properties][prop_name] = prop_def.dup 99 | 100 | # If a property is marked as required, add it to required list and remove from property definition 101 | if prop_def[:required] 102 | required_props << prop_name 103 | prop[:properties][prop_name].delete(:required) 104 | end 105 | end 106 | 107 | # Add required array if we have required properties 108 | prop[:required] = required_props unless required_props.empty? 109 | end 110 | 111 | # Add version info to description only if requested by environment variable 112 | # This allows for backward compatibility with existing tests 113 | if ENV["STRUCTIFY_SHOW_VERSION_INFO"] && f[:version_range] && prop[:description] 114 | version_info = format_version_range(f[:version_range]) 115 | prop[:description] = "#{prop[:description]} (Available in versions: #{version_info})" 116 | elsif ENV["STRUCTIFY_SHOW_VERSION_INFO"] && f[:version_range] 117 | prop[:description] = "Available in versions: #{format_version_range(f[:version_range])}" 118 | end 119 | 120 | # Legacy: Add a deprecation notice to description 121 | if f[:deprecated_in] && f[:deprecated_in] <= current_version 122 | deprecation_note = "Deprecated in v#{f[:deprecated_in]}. " 123 | prop[:description] = if prop[:description] 124 | "#{deprecation_note}#{prop[:description]}" 125 | else 126 | deprecation_note 127 | end 128 | end 129 | 130 | hash[f[:name].to_s] = prop 131 | end 132 | 133 | { 134 | name: schema_builder.name_str, 135 | description: schema_builder.description_str, 136 | parameters: { 137 | type: "object", 138 | required: required_fields, 139 | properties: properties_hash 140 | } 141 | } 142 | end 143 | 144 | private 145 | 146 | # Check if a version is within a given range/array of versions 147 | # 148 | # @param version [Integer] The version to check 149 | # @param range [Range, Array, Integer] The range, array, or single version to check against 150 | # @return [Boolean] Whether the version is within the range 151 | def version_in_range?(version, range) 152 | case range 153 | when Range 154 | # Handle endless ranges (Ruby 2.6+): 2.. means 2 and above 155 | if range.end.nil? 156 | version >= range.begin 157 | else 158 | range.cover?(version) 159 | end 160 | when Array 161 | range.include?(version) 162 | else 163 | # A single integer means either: 164 | # - For JSON schema generation: exactly that version (no backwards compatibility) 165 | # - For runtime usage: that version and onwards 166 | if version == schema_builder.version_number && range == schema_builder.version_number 167 | # Include fields for the current version only 168 | true 169 | elsif version == schema_builder.version_number 170 | # When generating the schema, we use exact version matching 171 | version == range 172 | else 173 | # For runtime usage, a single integer means that version and onwards 174 | version >= range 175 | end 176 | end 177 | end 178 | 179 | # Format a version range for display in error messages 180 | # 181 | # @param versions [Range, Array, Integer] The version range to format 182 | # @return [String] A human-readable version range 183 | def format_version_range(versions) 184 | if versions.is_a?(Range) 185 | if versions.end.nil? 186 | "#{versions.begin} and above" 187 | else 188 | "#{versions.begin} to #{versions.end}#{versions.exclude_end? ? ' (exclusive)' : ''}" 189 | end 190 | elsif versions.is_a?(Array) 191 | versions.join(", ") 192 | else 193 | "#{versions} and above" # Single integer means this version and onwards 194 | end 195 | end 196 | end 197 | end -------------------------------------------------------------------------------- /lib/structify/version.rb: -------------------------------------------------------------------------------- 1 | module Structify 2 | VERSION = "0.3.4" 3 | end 4 | -------------------------------------------------------------------------------- /spec/spec_helper.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require "bundler/setup" 4 | require "structify" 5 | require "active_support" 6 | require "active_record" 7 | require "sqlite3" 8 | 9 | # Configure RSpec 10 | RSpec.configure do |config| 11 | # Enable flags like --only-failures and --next-failure 12 | config.example_status_persistence_file_path = ".rspec_status" 13 | 14 | # Disable RSpec exposing methods globally on `Module` and `main` 15 | config.disable_monkey_patching! 16 | 17 | # Enable the focus filter 18 | config.filter_run_when_matching :focus 19 | 20 | config.expect_with :rspec do |c| 21 | c.syntax = :expect 22 | end 23 | 24 | # Clean up any test data after each example 25 | config.after(:each) do 26 | # Add any cleanup code here 27 | end 28 | end 29 | 30 | # Configure ActiveRecord for in-memory SQLite 31 | ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:") 32 | 33 | # Load database schema 34 | ActiveRecord::Schema.define do 35 | create_table :articles, force: true do |t| 36 | t.string :title 37 | t.text :content 38 | t.json :json_attributes 39 | t.timestamps 40 | end 41 | end 42 | -------------------------------------------------------------------------------- /spec/structify/model_spec.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require "spec_helper" 4 | 5 | RSpec.describe Structify::Model do 6 | # Create a test model class that includes our module 7 | let(:model_class) do 8 | Class.new(ActiveRecord::Base) do 9 | self.table_name = "articles" 10 | include Structify::Model 11 | end 12 | end 13 | 14 | # Set up our test database 15 | before(:all) do 16 | ActiveRecord::Schema.define do 17 | create_table :articles, force: true do |t| 18 | t.string :title 19 | t.text :content 20 | t.json :json_attributes 21 | t.timestamps 22 | end 23 | end 24 | end 25 | 26 | describe ".schema_definition" do 27 | it "allows defining a schema with all available options" do 28 | model_class.schema_definition do 29 | name "ArticleExtraction" 30 | description "Extract article metadata" 31 | version 2 32 | 33 | field :title, :string, required: true 34 | field :summary, :text, description: "A brief summary" 35 | field :category, :string, enum: ["tech", "business"] 36 | end 37 | 38 | expect(model_class.schema_builder).to be_a(Structify::SchemaBuilder) 39 | expect(model_class.extraction_version).to eq(2) 40 | end 41 | 42 | it "validates schema name format" do 43 | # Valid names 44 | expect { 45 | model_class.schema_definition do 46 | name "ValidName" 47 | end 48 | }.not_to raise_error 49 | 50 | expect { 51 | model_class.schema_definition do 52 | name "valid_name_with_underscores" 53 | end 54 | }.not_to raise_error 55 | 56 | expect { 57 | model_class.schema_definition do 58 | name "valid-name-with-hyphens" 59 | end 60 | }.not_to raise_error 61 | 62 | expect { 63 | model_class.schema_definition do 64 | name "valid123_with_numbers" 65 | end 66 | }.not_to raise_error 67 | 68 | # Invalid names with spaces or special characters 69 | expect { 70 | model_class.schema_definition do 71 | name "Invalid Name With Spaces" 72 | end 73 | }.to raise_error(ArgumentError, /Schema name must only contain alphanumeric characters/) 74 | 75 | expect { 76 | model_class.schema_definition do 77 | name "invalid!name@with#special$chars" 78 | end 79 | }.to raise_error(ArgumentError, /Schema name must only contain alphanumeric characters/) 80 | end 81 | end 82 | 83 | describe ".json_schema" do 84 | before do 85 | model_class.schema_definition do 86 | name "ArticleExtraction" 87 | description "Extract article metadata" 88 | field :title, :string, required: true 89 | field :summary, :text, description: "A brief summary" 90 | field :category, :string, enum: ["tech", "business"] 91 | end 92 | end 93 | 94 | it "generates a valid JSON schema" do 95 | schema = model_class.json_schema 96 | 97 | expect(schema[:name]).to eq("ArticleExtraction") 98 | expect(schema[:description]).to eq("Extract article metadata") 99 | expect(schema[:parameters]).to be_a(Hash) 100 | expect(schema[:parameters][:required]).to eq(["title"]) 101 | expect(schema[:parameters][:properties]["title"]).to eq(type: "string") 102 | expect(schema[:parameters][:properties]["category"][:enum]).to eq(["tech", "business"]) 103 | end 104 | 105 | context "with thinking mode enabled" do 106 | let(:thinking_model_class) do 107 | Class.new(ActiveRecord::Base) do 108 | self.table_name = "articles" 109 | include Structify::Model 110 | 111 | schema_definition do 112 | name "ArticleExtractionThinking" 113 | description "Extract article metadata with chain of thought" 114 | thinking true 115 | field :title, :string, required: true 116 | field :summary, :text, description: "A brief summary" 117 | field :category, :string, enum: ["tech", "business"] 118 | end 119 | end 120 | end 121 | 122 | it "adds chain_of_thought field as the first property" do 123 | schema = thinking_model_class.json_schema 124 | 125 | # Check that chain_of_thought is the first property 126 | expect(schema[:parameters][:properties].keys.first).to eq("chain_of_thought") 127 | 128 | # Check that chain_of_thought has the correct type and description 129 | expect(schema[:parameters][:properties]["chain_of_thought"]).to include( 130 | type: "string", 131 | description: "Explain your thought process step by step before determining the final values." 132 | ) 133 | 134 | # Check that other fields are still present 135 | expect(schema[:parameters][:properties]).to have_key("title") 136 | expect(schema[:parameters][:properties]).to have_key("summary") 137 | expect(schema[:parameters][:properties]).to have_key("category") 138 | end 139 | end 140 | end 141 | 142 | describe "different data types and field options" do 143 | it "supports string type" do 144 | model_class.schema_definition do 145 | field :title, :string 146 | end 147 | 148 | instance = model_class.new(title: "Test Title") 149 | expect(instance.title).to eq("Test Title") 150 | 151 | schema = model_class.json_schema 152 | expect(schema[:parameters][:properties]["title"][:type]).to eq("string") 153 | end 154 | 155 | it "supports integer type" do 156 | model_class.schema_definition do 157 | field :count, :integer 158 | end 159 | 160 | instance = model_class.new(count: 42) 161 | expect(instance.count).to eq(42) 162 | 163 | schema = model_class.json_schema 164 | expect(schema[:parameters][:properties]["count"][:type]).to eq("integer") 165 | end 166 | 167 | it "supports number type" do 168 | model_class.schema_definition do 169 | field :price, :number 170 | end 171 | 172 | instance = model_class.new(price: 99) 173 | expect(instance.price).to eq(99) 174 | 175 | schema = model_class.json_schema 176 | expect(schema[:parameters][:properties]["price"][:type]).to eq("number") 177 | end 178 | 179 | it "supports boolean type" do 180 | model_class.schema_definition do 181 | field :published, :boolean 182 | end 183 | 184 | instance = model_class.new(published: true) 185 | expect(instance.published).to eq(true) 186 | 187 | schema = model_class.json_schema 188 | expect(schema[:parameters][:properties]["published"][:type]).to eq("boolean") 189 | end 190 | 191 | it "supports array type with items" do 192 | model_class.schema_definition do 193 | field :tags, :array, items: { type: "string" } 194 | end 195 | 196 | instance = model_class.new(tags: ["ruby", "rails"]) 197 | expect(instance.tags).to eq(["ruby", "rails"]) 198 | 199 | schema = model_class.json_schema 200 | expect(schema[:parameters][:properties]["tags"][:type]).to eq("array") 201 | expect(schema[:parameters][:properties]["tags"][:items]).to eq({ type: "string" }) 202 | end 203 | 204 | it "supports array type with constraints" do 205 | model_class.schema_definition do 206 | field :tags, :array, 207 | items: { type: "string" }, 208 | min_items: 1, 209 | max_items: 5, 210 | unique_items: true 211 | end 212 | 213 | schema = model_class.json_schema 214 | expect(schema[:parameters][:properties]["tags"][:minItems]).to eq(1) 215 | expect(schema[:parameters][:properties]["tags"][:maxItems]).to eq(5) 216 | expect(schema[:parameters][:properties]["tags"][:uniqueItems]).to eq(true) 217 | end 218 | 219 | it "supports object type with properties" do 220 | model_class.schema_definition do 221 | field :metadata, :object, properties: { 222 | "author" => { type: "string" }, 223 | "views" => { type: "integer" } 224 | } 225 | end 226 | 227 | instance = model_class.new(metadata: { "author" => "John", "views" => 100 }) 228 | expect(instance.metadata).to eq({ "author" => "John", "views" => 100 }) 229 | 230 | schema = model_class.json_schema 231 | expect(schema[:parameters][:properties]["metadata"][:type]).to eq("object") 232 | expect(schema[:parameters][:properties]["metadata"][:properties]).to eq({ 233 | "author" => { type: "string" }, 234 | "views" => { type: "integer" } 235 | }) 236 | end 237 | 238 | it "supports complex nested object types" do 239 | model_class.schema_definition do 240 | field :user_data, :object, properties: { 241 | "profile" => { 242 | type: "object", 243 | properties: { 244 | "name" => { type: "string" }, 245 | "contact" => { 246 | type: "object", 247 | properties: { 248 | "email" => { type: "string" }, 249 | "phone" => { type: "string" } 250 | } 251 | } 252 | } 253 | }, 254 | "preferences" => { 255 | type: "object", 256 | properties: { 257 | "theme" => { type: "string" }, 258 | "notifications" => { type: "boolean" } 259 | } 260 | } 261 | } 262 | end 263 | 264 | # Test complex nested object storage and retrieval 265 | complex_data = { 266 | "profile" => { 267 | "name" => "Jane Smith", 268 | "contact" => { 269 | "email" => "jane@example.com", 270 | "phone" => "555-1234" 271 | } 272 | }, 273 | "preferences" => { 274 | "theme" => "dark", 275 | "notifications" => true 276 | } 277 | } 278 | 279 | instance = model_class.new(user_data: complex_data) 280 | expect(instance.user_data).to eq(complex_data) 281 | 282 | # Access nested values 283 | expect(instance.user_data["profile"]["name"]).to eq("Jane Smith") 284 | expect(instance.user_data["profile"]["contact"]["email"]).to eq("jane@example.com") 285 | expect(instance.user_data["preferences"]["theme"]).to eq("dark") 286 | 287 | # Verify schema contains nested structure 288 | schema = model_class.json_schema 289 | expect(schema[:parameters][:properties]["user_data"][:type]).to eq("object") 290 | expect(schema[:parameters][:properties]["user_data"][:properties]["profile"][:type]).to eq("object") 291 | expect(schema[:parameters][:properties]["user_data"][:properties]["profile"][:properties]["contact"][:properties]["email"][:type]).to eq("string") 292 | end 293 | 294 | it "handles objects with required properties" do 295 | model_class.schema_definition do 296 | field :contact, :object, properties: { 297 | "name" => { type: "string", required: true }, 298 | "email" => { type: "string", required: true }, 299 | "address" => { type: "string" } 300 | } 301 | end 302 | 303 | instance = model_class.new(contact: { 304 | "name" => "Alice", 305 | "email" => "alice@example.com", 306 | "address" => "123 Main St" 307 | }) 308 | 309 | expect(instance.contact["name"]).to eq("Alice") 310 | expect(instance.contact["email"]).to eq("alice@example.com") 311 | 312 | # Update a value in the object 313 | instance.contact["name"] = "Alice Smith" 314 | expect(instance.contact["name"]).to eq("Alice Smith") 315 | 316 | # Add a new key to the object 317 | instance.contact["phone"] = "555-5678" 318 | expect(instance.contact["phone"]).to eq("555-5678") 319 | 320 | # Verify schema has required properties correctly defined 321 | schema = model_class.json_schema 322 | expect(schema[:parameters][:properties]["contact"][:required]).to include("name", "email") 323 | expect(schema[:parameters][:properties]["contact"][:required].length).to eq(2) 324 | expect(schema[:parameters][:properties]["contact"][:properties]["name"][:required]).to be_nil 325 | end 326 | 327 | it "handles object with array of objects" do 328 | model_class.schema_definition do 329 | field :document, :object, properties: { 330 | "title" => { type: "string" }, 331 | "sections" => { 332 | type: "array", 333 | items: { 334 | type: "object", 335 | properties: { 336 | "heading" => { type: "string" }, 337 | "content" => { type: "string" } 338 | } 339 | } 340 | } 341 | } 342 | end 343 | 344 | doc_data = { 345 | "title" => "Annual Report", 346 | "sections" => [ 347 | { "heading" => "Introduction", "content" => "This report covers..." }, 348 | { "heading" => "Financial Results", "content" => "Revenue increased by..." } 349 | ] 350 | } 351 | 352 | instance = model_class.new(document: doc_data) 353 | 354 | # Test round-trip serialization 355 | expect(instance.document).to eq(doc_data) 356 | 357 | # Access nested array of objects 358 | expect(instance.document["sections"].length).to eq(2) 359 | expect(instance.document["sections"][0]["heading"]).to eq("Introduction") 360 | expect(instance.document["sections"][1]["content"]).to eq("Revenue increased by...") 361 | 362 | # Verify schema structure 363 | schema = model_class.json_schema 364 | expect(schema[:parameters][:properties]["document"][:properties]["sections"][:type]).to eq("array") 365 | expect(schema[:parameters][:properties]["document"][:properties]["sections"][:items][:type]).to eq("object") 366 | end 367 | 368 | context "with enum for different types" do 369 | it "handles string enum" do 370 | model_class.schema_definition do 371 | field :color, :string, enum: ["red", "green", "blue"] 372 | end 373 | 374 | schema = model_class.json_schema 375 | expect(schema[:parameters][:properties]["color"][:enum]).to eq(["red", "green", "blue"]) 376 | end 377 | 378 | it "handles integer enum" do 379 | model_class.schema_definition do 380 | field :priority, :integer, enum: [1, 2, 3] 381 | end 382 | 383 | schema = model_class.json_schema 384 | expect(schema[:parameters][:properties]["priority"][:enum]).to eq([1, 2, 3]) 385 | end 386 | 387 | it "handles number enum" do 388 | model_class.schema_definition do 389 | field :score, :number, enum: [1.5, 2.5, 3.5] 390 | end 391 | 392 | schema = model_class.json_schema 393 | expect(schema[:parameters][:properties]["score"][:enum]).to eq([1.5, 2.5, 3.5]) 394 | end 395 | 396 | it "handles boolean enum" do 397 | model_class.schema_definition do 398 | field :flag, :boolean, enum: [true, false] 399 | end 400 | 401 | schema = model_class.json_schema 402 | expect(schema[:parameters][:properties]["flag"][:enum]).to eq([true, false]) 403 | end 404 | end 405 | 406 | context "with required fields" do 407 | it "properly sets required fields in the JSON schema" do 408 | model_class.schema_definition do 409 | field :title, :string, required: true 410 | field :description, :text 411 | field :status, :string, required: true 412 | field :tags, :array, items: { type: "string" } 413 | end 414 | 415 | schema = model_class.json_schema 416 | expect(schema[:parameters][:required]).to include("title", "status") 417 | expect(schema[:parameters][:required]).not_to include("description", "tags") 418 | expect(schema[:parameters][:required].length).to eq(2) 419 | end 420 | 421 | it "supports mix of required and optional fields" do 422 | model_class.schema_definition do 423 | field :required_string, :string, required: true 424 | field :optional_string, :string 425 | field :required_number, :number, required: true 426 | field :optional_number, :number 427 | field :required_array, :array, items: { type: "string" }, required: true 428 | field :optional_array, :array, items: { type: "string" } 429 | end 430 | 431 | schema = model_class.json_schema 432 | expect(schema[:parameters][:required]).to contain_exactly("required_string", "required_number", "required_array") 433 | expect(schema[:parameters][:required]).not_to include("optional_string", "optional_number", "optional_array") 434 | end 435 | end 436 | end 437 | 438 | describe "versioning" do 439 | it "sets and gets the version number" do 440 | model_class.schema_definition do 441 | version 2 442 | end 443 | 444 | expect(model_class.extraction_version).to eq(2) 445 | end 446 | 447 | it "defaults to version 1 if not specified" do 448 | model_class.schema_definition do 449 | name "Test" 450 | end 451 | 452 | expect(model_class.extraction_version).to eq(1) 453 | end 454 | 455 | context "with schema evolution" do 456 | # Create a temporary subclass to avoid affecting other tests 457 | let(:article_v1_class) do 458 | Class.new(ActiveRecord::Base) do 459 | self.table_name = "articles" 460 | include Structify::Model 461 | 462 | # Define version 1 schema 463 | schema_definition do 464 | version 1 465 | name "ArticleExtractionV1" 466 | 467 | field :title, :string 468 | field :category, :string 469 | field :author, :string # This field will be removed in v3 470 | field :status, :string # This field will be deprecated in v2 and removed in v3 471 | end 472 | end 473 | end 474 | 475 | let(:article_v2_class) do 476 | Class.new(ActiveRecord::Base) do 477 | self.table_name = "articles" 478 | include Structify::Model 479 | 480 | # Define version 2 schema with additional fields 481 | schema_definition do 482 | version 2 483 | name "ArticleExtractionV2" 484 | 485 | # Fields from version 1 486 | field :title, :string, versions: 1..999 487 | field :category, :string, versions: 1..999 488 | field :author, :string, versions: 1..999 # Still present in v2 489 | field :status, :string, versions: 1..999 # Status field (will be deprecated) 490 | 491 | # New fields in version 2 492 | field :summary, :text, versions: 2..999 493 | field :tags, :array, items: { type: "string" }, versions: 2..999 494 | end 495 | end 496 | end 497 | 498 | let(:article_v3_class) do 499 | Class.new(ActiveRecord::Base) do 500 | self.table_name = "articles" 501 | include Structify::Model 502 | 503 | # Define version 3 schema with simplified lifecycle syntax 504 | schema_definition do 505 | version 3 506 | name "ArticleExtractionV3" 507 | 508 | # Fields available in all versions (1..999) 509 | field :title, :string, versions: 1..999 510 | field :category, :string, versions: 1..999 511 | 512 | # Fields available only in version 1 and 2 513 | field :author, :string, versions: 1...3 # Exclusive range: 1 to 2 514 | field :status, :string, versions: 1...3 # Exclusive range: 1 to 2 515 | 516 | # Fields available from version 2 onwards 517 | field :summary, :text, versions: 2..999 518 | field :tags, :array, items: { type: "string" }, versions: 2..999 519 | 520 | # Fields only in version 3+ 521 | field :published_at, :string # Default: current version (3) onwards 522 | end 523 | end 524 | end 525 | 526 | # Additional test for simpler version specs 527 | let(:simplified_schema_class) do 528 | Class.new(ActiveRecord::Base) do 529 | self.table_name = "articles" 530 | include Structify::Model 531 | 532 | schema_definition do 533 | version 4 534 | name "SimplifiedVersioning" 535 | 536 | # All of these syntaxes should work 537 | field :always_available, :string, versions: 1..999 # From v1 onward 538 | field :available_v2_v3, :string, versions: 2..3 # Only v2-v3 539 | field :temp_field, :string, versions: 2...4 # v2-v3 (not v4) 540 | field :specific_versions, :string, versions: [1, 3, 5] # Only in v1, v3, and v5 541 | field :current_only, :string # Only current version (4) 542 | field :new_feature, :string, versions: 4..999 # v4 onwards (same as default) 543 | end 544 | end 545 | end 546 | 547 | it "preserves access to version 1 fields when reading with version 2 schema" do 548 | # Create a record with version 1 schema 549 | article_v1 = article_v1_class.create( 550 | title: "Original Title", 551 | category: "tech" 552 | ) 553 | 554 | # Access the same record with version 2 schema 555 | article_v2 = article_v2_class.find(article_v1.id) 556 | 557 | # Should still be able to read version 1 fields 558 | expect(article_v2.title).to eq("Original Title") 559 | expect(article_v2.category).to eq("tech") 560 | 561 | # Check the specific error raised when accessing fields from a newer version 562 | expect { article_v2.summary }.to raise_error(Structify::VersionRangeError) 563 | expect { article_v2.tags }.to raise_error(Structify::VersionRangeError) 564 | 565 | # But we can check compatibility without raising errors 566 | expect(article_v2.version_compatible_with?(1)).to be_truthy 567 | expect(article_v2.version_compatible_with?(2)).to be_falsey 568 | end 569 | 570 | it "saves version number in extracted data" do 571 | article = article_v1_class.create( 572 | title: "Title with version", 573 | category: "science" 574 | ) 575 | 576 | # Check that version is saved in extracted_data 577 | expect(article.json_attributes["version"]).to eq(1) 578 | expect(article.version).to eq(1) 579 | end 580 | 581 | it "preserves the original version number when accessing with a newer schema" do 582 | # Create record with version 1 583 | article_v1 = article_v1_class.create( 584 | title: "Version Test", 585 | category: "tech" 586 | ) 587 | 588 | # Access with version 2 schema 589 | article_v2 = article_v2_class.find(article_v1.id) 590 | 591 | # Version should still be 1 592 | expect(article_v2.version).to eq(1) 593 | expect(article_v2.json_attributes["version"]).to eq(1) 594 | end 595 | 596 | it "raises an error when trying to access a field not in the original version" do 597 | article_v1 = article_v1_class.create( 598 | title: "No Summary", 599 | category: "history" 600 | ) 601 | 602 | article_v2 = article_v2_class.find(article_v1.id) 603 | 604 | # This should raise a VersionRangeError about version mismatch 605 | expect { article_v2.summary }.to raise_error(Structify::VersionRangeError) 606 | end 607 | 608 | it "can access fields marked as deprecated" do 609 | article_v2 = article_v2_class.create( 610 | title: "Has deprecated field", 611 | category: "tech", 612 | status: "published" 613 | ) 614 | 615 | # Make sure we can still access these fields 616 | expect(article_v2.status).to eq("published") 617 | end 618 | 619 | it "raises an error when trying to access removed fields" do 620 | # Create with v1, access with v3 621 | article_v1 = article_v1_class.create( 622 | title: "Has removed fields", 623 | category: "science", 624 | author: "John Doe", 625 | status: "draft" 626 | ) 627 | 628 | article_v3 = article_v3_class.find(article_v1.id) 629 | 630 | # Should raise error for removed fields 631 | # Modified expectation to accept either RemovedFieldError or VersionRangeError 632 | expect { article_v3.author }.to raise_error { |error| 633 | expect(error.class).to be_in([Structify::RemovedFieldError, Structify::VersionRangeError]) 634 | expect(error.message).to include("author") 635 | } 636 | 637 | expect { article_v3.status }.to raise_error { |error| 638 | expect(error.class).to be_in([Structify::RemovedFieldError, Structify::VersionRangeError]) 639 | expect(error.message).to include("status") 640 | } 641 | 642 | # Other fields should still work 643 | expect(article_v3.title).to eq("Has removed fields") 644 | expect(article_v3.category).to eq("science") 645 | end 646 | 647 | it "ignores removed fields when serializing to JSON schema" do 648 | schema = article_v3_class.json_schema 649 | 650 | # Removed fields should not be included in schema 651 | expect(schema[:parameters][:properties].keys).not_to include("author") 652 | expect(schema[:parameters][:properties].keys).not_to include("status") 653 | 654 | # Active fields should be included 655 | expect(schema[:parameters][:properties].keys).to include("title") 656 | expect(schema[:parameters][:properties].keys).to include("category") 657 | expect(schema[:parameters][:properties].keys).to include("summary") 658 | expect(schema[:parameters][:properties].keys).to include("tags") 659 | expect(schema[:parameters][:properties].keys).to include("published_at") 660 | end 661 | 662 | context "with simplified version range syntax" do 663 | it "properly handles different version range specifications" do 664 | schema = simplified_schema_class.json_schema 665 | properties = schema[:parameters][:properties].keys 666 | 667 | # Should include fields for the current version 668 | expect(properties).to include("always_available") 669 | expect(properties).to include("current_only") 670 | expect(properties).to include("new_feature") 671 | # Note: specific_versions should include 4, not just [1, 3, 5] 672 | # expect(properties).to include("specific_versions") 673 | 674 | # Should not include fields outside the current version 675 | expect(properties).not_to include("available_v2_v3") 676 | expect(properties).not_to include("temp_field") 677 | 678 | # Create a record and test version handling 679 | record = simplified_schema_class.create( 680 | always_available: "Always there", 681 | current_only: "Only in v4" 682 | # specific_versions is not valid for v4 683 | ) 684 | 685 | # Should successfully save and retrieve 686 | reloaded = simplified_schema_class.find(record.id) 687 | expect(reloaded.always_available).to eq("Always there") 688 | expect(reloaded.current_only).to eq("Only in v4") 689 | 690 | # Should understand versions correctly 691 | expect(reloaded.version_compatible_with?(4)).to be_truthy # Current version 692 | expect(reloaded.json_attributes["version"]).to eq(4) # The record has version 4 693 | end 694 | 695 | it "supports version 2 to mean version 2 onwards" do 696 | endless_range_class = Class.new(ActiveRecord::Base) do 697 | self.table_name = "articles" 698 | include Structify::Model 699 | 700 | schema_definition do 701 | version 3 702 | 703 | # Test integer version to mean "this version onwards" 704 | field :from_v1, :string 705 | field :from_v2, :string, versions: 2 # From version 2 onwards using just the integer 706 | field :only_v3, :integer, versions: 3 707 | end 708 | end 709 | 710 | schema = endless_range_class.json_schema 711 | expect(schema[:parameters][:properties].keys).to include("from_v1", "from_v2", "only_v3") 712 | 713 | # Create v1 record and verify access with v3 schema 714 | v1_record = endless_range_class.new 715 | v1_record.json_attributes = { "version" => 1, "from_v1" => "V1 data" } 716 | v1_record.save! 717 | 718 | reloaded = endless_range_class.find(v1_record.id) 719 | expect(reloaded.from_v1).to eq("V1 data") 720 | expect { reloaded.from_v2 }.to raise_error(Structify::VersionRangeError) 721 | expect { reloaded.only_v3 }.to raise_error(Structify::VersionRangeError) 722 | 723 | # Create v2 record and verify access 724 | v2_record = endless_range_class.new 725 | v2_record.json_attributes = { 726 | "version" => 2, 727 | "from_v1" => "V1 field", 728 | "from_v2" => "V2 field" 729 | } 730 | v2_record.save! 731 | 732 | reloaded = endless_range_class.find(v2_record.id) 733 | expect(reloaded.from_v1).to eq("V1 field") 734 | expect(reloaded.from_v2).to eq("V2 field") 735 | expect { reloaded.only_v3 }.to raise_error(Structify::VersionRangeError) 736 | end 737 | 738 | it "properly generates error messages for version ranges" do 739 | v3_class = Class.new(ActiveRecord::Base) do 740 | self.table_name = "articles" 741 | include Structify::Model 742 | 743 | schema_definition do 744 | version 3 745 | field :v1_field, :string, versions: 1 746 | field :v2_field, :string, versions: 2 747 | field :v3_field, :string, versions: 3 748 | field :v1_to_v2, :string, versions: 1..2 749 | field :v2_and_up, :string, versions: 2..999 750 | end 751 | end 752 | 753 | v1_record = v3_class.new 754 | v1_record.json_attributes = { "version" => 1, "v1_field" => "V1 data" } 755 | v1_record.save! 756 | 757 | reloaded = v3_class.find(v1_record.id) 758 | 759 | # Test error messages for different version range types 760 | begin 761 | reloaded.v3_field 762 | rescue Structify::VersionRangeError => e 763 | expect(e.message).to include("Field 'v3_field' is not available in version 1") 764 | expect(e.message).to include("only available in versions") 765 | end 766 | 767 | begin 768 | reloaded.v2_and_up 769 | rescue Structify::VersionRangeError => e 770 | expect(e.message).to include("Field 'v2_and_up' is not available in version 1") 771 | expect(e.message).to include("only available in versions: 2 to 999") 772 | end 773 | end 774 | 775 | it "raises errors for fields outside their version range" do 776 | # Create a dummy v2 record 777 | record = simplified_schema_class.new 778 | record.json_attributes = { "version" => 2, "available_v2_v3" => "Valid in v2", "temp_field" => "Also valid in v2" } 779 | record.save! 780 | 781 | # Load with v4 schema 782 | reloaded = simplified_schema_class.find(record.id) 783 | 784 | # Should raise specific errors for fields not in current version 785 | expect { reloaded.available_v2_v3 }.to raise_error(Structify::RemovedFieldError) 786 | expect { reloaded.temp_field }.to raise_error(Structify::VersionRangeError) 787 | 788 | # But always_available should work since it's for all versions 789 | expect { reloaded.always_available }.not_to raise_error 790 | end 791 | end 792 | end 793 | end 794 | 795 | describe "change tracking" do 796 | let(:model_class) do 797 | Class.new(ActiveRecord::Base) do 798 | self.table_name = "articles" 799 | include Structify::Model 800 | 801 | schema_definition do 802 | field :title, :string 803 | end 804 | end 805 | end 806 | 807 | it "tracks changes to extracted data with default container attribute" do 808 | instance = model_class.new(title: "Test") 809 | # Mock the standard ActiveRecord change tracking method 810 | allow(instance).to receive(:saved_change_to_json_attributes?).and_return(true) 811 | expect(instance.saved_change_to_extracted_data?).to be true 812 | end 813 | 814 | context "with custom container attribute" do 815 | let(:custom_container_class) do 816 | Class.new(ActiveRecord::Base) do 817 | self.table_name = "articles" 818 | include Structify::Model 819 | 820 | # Set a different container attribute 821 | attr_json_config default_container_attribute: :extracted_data 822 | 823 | schema_definition do 824 | field :title, :string 825 | end 826 | end 827 | end 828 | 829 | before(:all) do 830 | ActiveRecord::Schema.define do 831 | create_table :articles, force: true do |t| 832 | t.json :extracted_data 833 | t.timestamps 834 | end unless ActiveRecord::Base.connection.table_exists?(:articles) 835 | end 836 | end 837 | 838 | it "respects the custom container attribute" do 839 | instance = custom_container_class.new(title: "Test") 840 | # Mock the standard ActiveRecord change tracking method for custom attribute 841 | allow(instance).to receive(:saved_change_to_extracted_data?).and_return(true) 842 | expect(instance.saved_change_to_extracted_data?).to be true 843 | end 844 | end 845 | end 846 | end -------------------------------------------------------------------------------- /spec/structify/schema_serializer_spec.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require "spec_helper" 4 | 5 | RSpec.describe Structify::SchemaSerializer do 6 | let(:model_class) do 7 | Class.new(ActiveRecord::Base) do 8 | self.table_name = "articles" 9 | include Structify::Model 10 | end 11 | end 12 | 13 | let(:schema_builder) do 14 | builder = Structify::SchemaBuilder.new(model_class) 15 | builder.name("TestSchema") 16 | builder.description("Test Description") 17 | builder.field(:title, :string, required: true, description: "The title") 18 | builder.field(:category, :string, enum: ["tech", "business"]) 19 | builder 20 | end 21 | 22 | let(:serializer) { described_class.new(schema_builder) } 23 | 24 | describe "#to_json_schema" do 25 | it "generates a valid JSON schema" do 26 | schema = serializer.to_json_schema 27 | 28 | expect(schema[:name]).to eq("TestSchema") 29 | expect(schema[:description]).to eq("Test Description") 30 | expect(schema[:parameters]).to be_a(Hash) 31 | expect(schema[:parameters][:required]).to eq(["title"]) 32 | expect(schema[:parameters][:properties]["title"]).to include(type: "string", description: "The title") 33 | expect(schema[:parameters][:properties]["category"][:enum]).to eq(["tech", "business"]) 34 | end 35 | 36 | it "handles fields without descriptions or enums" do 37 | builder = Structify::SchemaBuilder.new(model_class) 38 | builder.field(:simple_field, :string) 39 | serializer = described_class.new(builder) 40 | 41 | schema = serializer.to_json_schema 42 | expect(schema[:parameters][:properties]["simple_field"]).to eq(type: "string") 43 | end 44 | 45 | it "includes chain_of_thought field as the first property when thinking mode is enabled" do 46 | builder = Structify::SchemaBuilder.new(model_class) 47 | # We need to set the thinking mode flag on the builder 48 | # This will be implemented in the SchemaBuilder class 49 | builder.instance_variable_set(:@thinking_enabled, true) 50 | serializer = described_class.new(builder) 51 | 52 | schema = serializer.to_json_schema 53 | 54 | # Check that chain_of_thought exists with correct properties 55 | expect(schema[:parameters][:properties]["chain_of_thought"]).to include( 56 | type: "string", 57 | description: "Explain your thought process step by step before determining the final values." 58 | ) 59 | 60 | # Check that chain_of_thought is the first property 61 | expect(schema[:parameters][:properties].keys.first).to eq("chain_of_thought") 62 | end 63 | 64 | it "does not include chain_of_thought field when thinking mode is not enabled" do 65 | builder = Structify::SchemaBuilder.new(model_class) 66 | # Default value should be false 67 | serializer = described_class.new(builder) 68 | 69 | schema = serializer.to_json_schema 70 | expect(schema[:parameters][:properties]).not_to have_key("chain_of_thought") 71 | end 72 | 73 | it "only includes fields for the current schema version" do 74 | builder = Structify::SchemaBuilder.new(model_class) 75 | builder.version(2) # Set current version to 2 76 | 77 | # v1 fields 78 | builder.field(:v1_field1, :string, versions: 1) 79 | builder.field(:v1_field2, :string, versions: 1) 80 | 81 | # v2 fields 82 | builder.field(:v2_field1, :string, versions: 2) 83 | builder.field(:v2_field2, :string, versions: 2) 84 | 85 | # Common field for both versions 86 | builder.field(:common_field, :string) 87 | 88 | serializer = described_class.new(builder) 89 | schema = serializer.to_json_schema 90 | 91 | # Should include v2 fields and common fields 92 | expect(schema[:parameters][:properties]).to have_key("v2_field1") 93 | expect(schema[:parameters][:properties]).to have_key("v2_field2") 94 | expect(schema[:parameters][:properties]).to have_key("common_field") 95 | 96 | # Should NOT include v1 fields 97 | expect(schema[:parameters][:properties]).not_to have_key("v1_field1") 98 | expect(schema[:parameters][:properties]).not_to have_key("v1_field2") 99 | end 100 | end 101 | end -------------------------------------------------------------------------------- /spec/structify_spec.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require "spec_helper" 4 | 5 | RSpec.describe Structify do 6 | it "has a version number" do 7 | expect(Structify::VERSION).not_to be nil 8 | end 9 | 10 | it "provides a DSL for defining LLM extraction schemas" do 11 | test_class = Class.new(ActiveRecord::Base) do 12 | self.table_name = "articles" 13 | include Structify::Model 14 | 15 | schema_definition do 16 | name "TestSchema" 17 | description "A test schema" 18 | version 1 19 | 20 | field :title, :string, required: true 21 | end 22 | end 23 | 24 | expect(test_class.json_schema).to include( 25 | name: "TestSchema", 26 | description: "A test schema", 27 | parameters: { 28 | type: "object", 29 | required: ["title"], 30 | properties: { 31 | "title" => { type: "string" } 32 | } 33 | } 34 | ) 35 | end 36 | end 37 | -------------------------------------------------------------------------------- /structify-0.2.1.gem: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kieranklaassen/structify/5ce1e1a0bd94b3a298e10045f848873bdffad04b/structify-0.2.1.gem -------------------------------------------------------------------------------- /structify.gemspec: -------------------------------------------------------------------------------- 1 | require_relative 'lib/structify/version' 2 | 3 | Gem::Specification.new do |spec| 4 | spec.name = "structify" 5 | spec.version = Structify::VERSION 6 | spec.authors = ["Kieran Klaassen"] 7 | spec.email = ["kieranklaassen@gmail.com"] 8 | 9 | spec.summary = %q{A DSL for defining extraction schemas for LLM-powered models} 10 | spec.description = %q{Structify provides a simple DSL to integrate with Rails models for LLM extraction, including versioning, assistant prompts, and more} 11 | spec.homepage = "https://github.com/kieranklaassen/structify" 12 | spec.license = "MIT" 13 | spec.required_ruby_version = Gem::Requirement.new(">= 2.3.0") 14 | 15 | spec.metadata["homepage_uri"] = spec.homepage 16 | spec.metadata["source_code_uri"] = spec.homepage 17 | spec.metadata["changelog_uri"] = "#{spec.homepage}/blob/main/CHANGELOG.md" 18 | 19 | # Specify which files should be added to the gem when it is released. 20 | # The `git ls-files -z` loads the files in the RubyGem that have been added into git. 21 | spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do 22 | `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) || f.end_with?('.gem') } 23 | end 24 | spec.bindir = "exe" 25 | spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) } 26 | spec.require_paths = ["lib"] 27 | 28 | # Runtime dependencies 29 | spec.add_dependency "activesupport", ">= 7.0", "< 9.0" 30 | spec.add_dependency "attr_json", "~> 2.1" 31 | end 32 | --------------------------------------------------------------------------------