├── .dockerignore ├── .gitignore ├── Dockerfile ├── README.md ├── SSLConfigs.ini ├── data ├── paul_graham │ └── paul_graham_essay.txt ├── scotch_review.csv └── state_of_the_union.txt ├── demo ├── .ipynb_checkpoints │ └── sql_demo-checkpoint.ipynb ├── README.md ├── SQLSyntax.md ├── cloud_sql_demo.ipynb ├── hybrid-search.ipynb ├── iris_notebook_container.ipynb ├── langchain_demo.ipynb ├── llama_demo.ipynb └── sql_demo.ipynb ├── docker-compose.yml └── requirements.txt /.dockerignore: -------------------------------------------------------------------------------- 1 | demo/.ipynb_checkpoints 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .env 2 | .DS_STORE 3 | /demo/.DS_STORE 4 | /demo/.ipynb_checkpoints 5 | /external/.DS_STORE -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM jupyter/base-notebook 2 | 3 | COPY --chown=jovyan demo "${HOME}/demo" 4 | COPY --chown=jovyan data "${HOME}/data" 5 | COPY requirements.txt "${HOME}" 6 | 7 | RUN pip install -r requirements.txt 8 | RUN pip install jupyterlab-execute-time 9 | 10 | # COPY lib/intersystems_irispython-5.0.0-6545-6545-cp36.cp37.cp38.cp39.cp310.cp311.cp312-cp36m.cp37m.cp38.cp39.cp310.cp311.cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl /tmp/lib/intersystems_irispython-5.0.0-6545-cp36.cp37.cp38.cp39.cp310.cp311.cp312-cp36m.cp37m.cp38.cp39.cp310.cp311.cp312-manylinux2014_x86_64.whl 11 | # RUN pip install /tmp/lib/intersystems_irispython-5.0.0-6545-cp36.cp37.cp38.cp39.cp310.cp311.cp312-cp36m.cp37m.cp38.cp39.cp310.cp311.cp312-manylinux2014_x86_64.whl 12 | 13 | COPY --chown=jovyan SSLConfigs.ini "/usr/cert-demo/" 14 | ENV ISC_SSLconfigurations="/usr/cert-demo/SSLConfigs.ini" 15 | 16 | # run without password 17 | CMD start.sh jupyter lab 
--LabApp.token=''
18 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # InterSystems IRIS Vector Search
2 |
3 | We've added a powerful [Vector Search capability to the InterSystems IRIS Data Platform](https://www.intersystems.com/news/iris-vector-search-support-ai-applications/), to help you innovate faster and build intelligent applications powered by Generative AI. At the center of the new capability is a new [`VECTOR` native datatype](https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_datatype&ADJUST=1) for IRIS SQL, along with [similarity functions](https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_vecsearch&ADJUST=1) that leverage optimized chipset instructions (SIMD).
4 |
5 | **InterSystems IRIS 2025.1** introduces key performance improvements and bug fixes, along with a powerful new feature: a **disk-based Approximate Nearest Neighbors (ANN) index** for fast vector search. This new indexing method significantly improves search performance on large vector datasets (typically over 100K vectors), making IRIS even more effective for AI and ML workloads. See [the docs](https://docs.intersystems.com/iris20251/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_vecsearch#GSQL_vecsearch_index) for more information on how to define and use the index.
6 |
7 | The same Vector Search capability is now also available with [InterSystems IRIS Cloud SQL](https://developer.intersystems.com/products/iris-cloud-sql-integratedml/). Check out [`cloud_sql_demo.ipynb`](demo/cloud_sql_demo.ipynb) for instructions on setting up a connection from your Jupyter notebook. The notebooks exploring langchain and llama-index also support connecting to Cloud SQL deployments.
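A similarity function ranks rows by how close a stored embedding is to a query embedding. IRIS SQL lets you combine that ranking with ordinary filters on other columns, which is the "hybrid search" the `sql_demo.ipynb` notebook performs.

The following is a minimal plain-Python sketch of that logic under stated assumptions. It is not code from this repository. The rows, names, prices, and 3-dimensional vectors are made up. Embeddings are normalized so that the dot product equals cosine similarity. `HYBRID_SQL` only illustrates the general shape of such a query using the IRIS `TO_VECTOR` and `VECTOR_DOT_PRODUCT` functions, and its table and column names are assumptions.

```python
import math

# Hypothetical rows: (name, price_usd, embedding). Real embeddings would come
# from a sentence-embedding model and have hundreds of dimensions.
ROWS = [
    ("Glen A", 85.0, [0.9, 0.1, 0.1]),
    ("Glen B", 150.0, [0.95, 0.05, 0.0]),
    ("Glen C", 60.0, [0.1, 0.9, 0.2]),
    ("Glen D", 40.0, [0.7, 0.3, 0.1]),
]

# Illustrative only: the general shape of a hybrid query in IRIS SQL.
# The table and column names are assumptions, not taken from the demos.
HYBRID_SQL = (
    "SELECT TOP 3 name, price FROM scotch_reviews "
    "WHERE price < ? "
    "ORDER BY VECTOR_DOT_PRODUCT(description_vector, TO_VECTOR(?)) DESC"
)


def normalize(vec):
    """Scale a vector to unit length so that dot product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def hybrid_search(rows, query_vec, max_price, top_k=3):
    """Filter on price first, then rank the remaining rows by similarity."""
    q = normalize(query_vec)
    scored = [
        (name, price, dot(normalize(vec), q))
        for name, price, vec in rows
        if price < max_price  # the ordinary-column filter (WHERE price < ?)
    ]
    # Highest similarity first, mirroring ORDER BY ... DESC.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]  # mirrors SELECT TOP n


if __name__ == "__main__":
    for name, price, score in hybrid_search(ROWS, [1.0, 0.0, 0.0], max_price=100):
        print(f"{name}  ${price:.2f}  similarity={score:.3f}")
```

With the toy data, the $150 row never reaches the ranking step because the price filter is applied first. In IRIS the same work happens inside the database. On large tables, the ANN index described above is designed to avoid scoring every row exhaustively.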
8 |
9 | This repository offers code samples to get you started with the new features, and we'll continue to add more. We also encourage you to let us know about your own experiments on the [InterSystems Developer Community](https://community.intersystems.com). At the bottom of this page, you'll find links to a few demo repositories we liked a lot!
10 |
11 |
12 | ## InterSystems IRIS Vector Search Quickstart
13 |
14 | 1. Clone the repo
15 | ```Shell
16 | git clone https://github.com/intersystems-community/iris-vector-search.git
17 | ```
18 |
19 |
20 | ### Using a Jupyter container
21 |
22 | If you'd rather run the demos directly from your local Python environment, skip to [Using your local Python environment](#using-your-local-python-environment).
23 |
24 |
25 | 2. For [`langchain_demo.ipynb`](demo/langchain_demo.ipynb) and [`llama_demo.ipynb`](demo/llama_demo.ipynb), you need an [OpenAI API Key](https://platform.openai.com/api-keys). Update the corresponding entry in `docker-compose.yml`:
26 | ```
27 | OPENAI_API_KEY: xxxxxxxxx
28 | ```
29 |
30 | 3. Start the Docker containers (one for IRIS, one for Jupyter):
31 | ```Shell
32 | docker-compose up
33 | ```
34 |
35 | Please note that building the container involves downloading the `sentence_transformers` module and its dependencies, which together take up over 2GB!
36 |
37 | ### Using your local Python environment
38 |
39 | 2. Install IRIS Community Edition in a container:
40 | ```Shell
41 | docker run -d --name iris-comm -p 1972:1972 -p 52773:52773 -e IRIS_PASSWORD=demo -e IRIS_USERNAME=demo intersystemsdc/iris-community:latest
42 | ```
43 | :information_source: After running the above command, you can access the System Management Portal via http://localhost:52773/csp/sys/UtilHome.csp. Please note you may need to [configure your web server separately](https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCGI_private_web&ADJUST=1) when using another product edition.
44 |
45 | 3. Create a Python environment and activate it (conda, venv, or however you wish). For example:
46 |
47 | ```Shell
48 | conda create --name iris-vector-search python=3.10
49 | conda activate iris-vector-search
50 | ```
51 |
52 | 4. Install packages for all demos:
53 | ```Shell
54 | pip install -r requirements.txt
55 | ```
56 |
57 | 5. For [`langchain_demo.ipynb`](demo/langchain_demo.ipynb) and [`llama_demo.ipynb`](demo/llama_demo.ipynb), you need an [OpenAI API Key](https://platform.openai.com/api-keys). Create a `.env` file in this repo to store the key:
58 | ```
59 | OPENAI_API_KEY=xxxxxxxxx
60 | ```
61 |
62 | 6. The demos in this repository are formatted as Jupyter notebooks. To run them, just start Jupyter and navigate to the `/demo/` folder:
63 |
64 | ```Shell
65 | jupyter lab
66 | ```
67 |
68 | ## Basic Demos
69 |
70 | ### [sql_demo.ipynb](demo/sql_demo.ipynb)
71 |
72 | IRIS SQL now supports vector search combined with filters on other columns! In this demo, we're searching a whiskey dataset for whiskeys that are priced < $100 and have a taste description similar to "earthy and creamy taste".
73 |
74 | ### [langchain_demo.ipynb](demo/langchain_demo.ipynb)
75 |
76 | IRIS now has a langchain integration as a VectorDB! In this demo, we use the langchain framework with IRIS to ingest and search through a document.
77 |
78 | ### [llama_demo.ipynb](demo/llama_demo.ipynb)
79 |
80 | IRIS now has a llama_index integration as a VectorDB! In this demo, we use the llama_index framework with IRIS to ingest and search through a document.
81 |
82 | ### [cloud_sql_demo.ipynb](demo/cloud_sql_demo.ipynb)
83 |
84 | This notebook describes how to tap into the Vector Search capability when using [InterSystems IRIS Cloud SQL](https://developer.intersystems.com/products/iris-cloud-sql-integratedml/) instead of a local install or container. It covers the additional settings for establishing a secure connection to a Cloud SQL deployment.
85 |
86 |
87 | ## Which to use?
88 |
89 | If you need to use hybrid search (similarity search with other columns), use IRIS SQL.
90 |
91 | If you're building a genAI app that uses a variety of tools (agents, chained reasoning, API calls), go for langchain.
92 |
93 | If you're building a RAG app, go for llama_index.
94 |
95 | Feel free to contact Fan / Thomas or file an issue in this GitHub repository if you have any questions!
96 |
97 |
98 | ## More Demos / References:
99 |
100 | ### [Voice-controlled shopping cart](https://github.com/intersystems-dach/RAG-demo)
101 | Neat shopping cart demo that leverages Vector Search to match your voice-recorded order to available items.
102 |
103 | ### [NLP Queries on YouTube Audio Transcription](https://github.com/jrpereirajr/intersystems-iris-notebooks/blob/main/vector/langchain-iris/nlp_queries_on_youtube_audio_transcription_dataset.ipynb)
104 | Uses langchain-iris to search YouTube audio transcriptions.
105 |
106 | ### [langchain-iris demo](https://github.com/caretdev/langchain-iris/blob/main/demo.ipynb)
107 | Original IRIS langchain demo that runs the containerized IRIS in the notebook.
108 |
109 | ### [llama-iris demo](https://github.com/caretdev/llama-iris/blob/main/demo.ipynb)
110 | Original IRIS llama_index demo that runs the containerized IRIS in the notebook.
111 |
112 | ### [InterSystems Documentation](https://docs.intersystems.com/)
113 | Official InterSystems documentation site.
114 |
--------------------------------------------------------------------------------
/SSLConfigs.ini:
--------------------------------------------------------------------------------
1 | [CloudSQL]
2 | CertFile=/usr/cert-demo/certificateSQLaaS.pem
3 | KeyType=2
4 | Protocols=28
5 | CipherList=ALL:!aNULL:!eNULL:!EXP:!SSLv2
6 | VerifyPeer=0
7 | VerifyDepth=9
8 |
--------------------------------------------------------------------------------
/data/state_of_the_union.txt:
--------------------------------------------------------------------------------
1 |
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. 2 | 3 | Last year COVID-19 kept us apart. This year we are finally together again. 4 | 5 | Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 6 | 7 | With a duty to one another to the American people to the Constitution. 8 | 9 | And with an unwavering resolve that freedom will always triumph over tyranny. 10 | 11 | Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 12 | 13 | He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 14 | 15 | He met the Ukrainian people. 16 | 17 | From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. 18 | 19 | Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. 20 | 21 | In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. 22 | 23 | Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. 24 | 25 | Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. 26 | 27 | Throughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos. 28 | 29 | They keep moving. 30 | 31 | And the costs and the threats to America and the world keep rising. 32 | 33 | That’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2. 34 | 35 | The United States is a member along with 29 other nations. 
36 | 37 | It matters. American diplomacy matters. American resolve matters. 38 | 39 | Putin’s latest attack on Ukraine was premeditated and unprovoked. 40 | 41 | He rejected repeated efforts at diplomacy. 42 | 43 | He thought the West and NATO wouldn’t respond. And he thought he could divide us at home. Putin was wrong. We were ready. Here is what we did. 44 | 45 | We prepared extensively and carefully. 46 | 47 | We spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. 48 | 49 | I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression. 50 | 51 | We countered Russia’s lies with truth. 52 | 53 | And now that he has acted the free world is holding him accountable. 54 | 55 | Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland. 56 | 57 | We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. 58 | 59 | Together with our allies –we are right now enforcing powerful economic sanctions. 60 | 61 | We are cutting off Russia’s largest banks from the international financial system. 62 | 63 | Preventing Russia’s central bank from defending the Russian Ruble making Putin’s $630 Billion “war fund” worthless. 64 | 65 | We are choking off Russia’s access to technology that will sap its economic strength and weaken its military for years to come. 66 | 67 | Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. 68 | 69 | The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. 
70 | 71 | We are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains. 72 | 73 | And tonight I am announcing that we will join our allies in closing off American air space to all Russian flights – further isolating Russia – and adding an additional squeeze –on their economy. The Ruble has lost 30% of its value. 74 | 75 | The Russian stock market has lost 40% of its value and trading remains suspended. Russia’s economy is reeling and Putin alone is to blame. 76 | 77 | Together with our allies we are providing support to the Ukrainians in their fight for freedom. Military assistance. Economic assistance. Humanitarian assistance. 78 | 79 | We are giving more than $1 Billion in direct assistance to Ukraine. 80 | 81 | And we will continue to aid the Ukrainian people as they defend their country and to help ease their suffering. 82 | 83 | Let me be clear, our forces are not engaged and will not engage in conflict with Russian forces in Ukraine. 84 | 85 | Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west. 86 | 87 | For that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia. 88 | 89 | As I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power. 90 | 91 | And we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them. 92 | 93 | Putin has unleashed violence and chaos. But while he may make gains on the battlefield – he will pay a continuing high price over the long run. 
94 | 95 | And a proud Ukrainian people, who have known 30 years of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards. 96 | 97 | To all Americans, I will be honest with you, as I’ve always promised. A Russian dictator, invading a foreign country, has costs around the world. 98 | 99 | And I’m taking robust action to make sure the pain of our sanctions is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. 100 | 101 | Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world. 102 | 103 | America will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. 104 | 105 | These steps will help blunt gas prices here at home. And I know the news about what’s happening can seem alarming. 106 | 107 | But I want you to know that we are going to be okay. 108 | 109 | When the history of this era is written Putin’s war on Ukraine will have left Russia weaker and the rest of the world stronger. 110 | 111 | While it shouldn’t have taken something so terrible for people around the world to see what’s at stake now everyone sees it clearly. 112 | 113 | We see the unity among leaders of nations and a more unified Europe a more unified West. And we see unity among the people who are gathering in cities in large crowds around the world even in Russia to demonstrate their support for Ukraine. 114 | 115 | In the battle between democracy and autocracy, democracies are rising to the moment, and the world is clearly choosing the side of peace and security. 116 | 117 | This is a real test. It’s going to take time. So let us continue to draw inspiration from the iron will of the Ukrainian people. 
118 | 119 | To our fellow Ukrainian Americans who forge a deep bond that connects our two nations we stand with you. 120 | 121 | Putin may circle Kyiv with tanks, but he will never gain the hearts and souls of the Ukrainian people. 122 | 123 | He will never extinguish their love of freedom. He will never weaken the resolve of the free world. 124 | 125 | We meet tonight in an America that has lived through two of the hardest years this nation has ever faced. 126 | 127 | The pandemic has been punishing. 128 | 129 | And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. 130 | 131 | I understand. 132 | 133 | I remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it. 134 | 135 | That’s why one of the first things I did as President was fight to pass the American Rescue Plan. 136 | 137 | Because people were hurting. We needed to act, and we did. 138 | 139 | Few pieces of legislation have done more in a critical moment in our history to lift us out of crisis. 140 | 141 | It fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans. 142 | 143 | Helped put food on their table, keep a roof over their heads, and cut the cost of health insurance. 144 | 145 | And as my Dad used to say, it gave people a little breathing room. 146 | 147 | And unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind. 148 | 149 | And it worked. It created jobs. Lots of jobs. 150 | 151 | In fact—our economy created over 6.5 Million new jobs just last year, more jobs created in one year 152 | than ever before in the history of America. 
153 | 154 | Our economy grew at a rate of 5.7% last year, the strongest growth in nearly 40 years, the first step in bringing fundamental change to an economy that hasn’t worked for the working people of this nation for too long. 155 | 156 | For the past 40 years we were told that if we gave tax breaks to those at the very top, the benefits would trickle down to everyone else. 157 | 158 | But that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century. 159 | 160 | Vice President Harris and I ran for office with a new economic vision for America. 161 | 162 | Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up 163 | and the middle out, not from the top down. 164 | 165 | Because we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. 166 | 167 | America used to have the best roads, bridges, and airports on Earth. 168 | 169 | Now our infrastructure is ranked 13th in the world. 170 | 171 | We won’t be able to compete for the jobs of the 21st Century if we don’t fix that. 172 | 173 | That’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. 174 | 175 | This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. 176 | 177 | We’re done talking about infrastructure weeks. 178 | 179 | We’re going to have an infrastructure decade. 180 | 181 | It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China. 182 | 183 | As I’ve told Xi Jinping, it is never a good bet to bet against the American people. 184 | 185 | We’ll create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. 
186 | 187 | And we’ll do it all to withstand the devastating effects of the climate crisis and promote environmental justice. 188 | 189 | We’ll build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities. 190 | 191 | 4,000 projects have already been announced. 192 | 193 | And tonight, I’m announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair. 194 | 195 | When we use taxpayer dollars to rebuild America – we are going to Buy American: buy American products to support American jobs. 196 | 197 | The federal government spends about $600 Billion a year to keep the country safe and secure. 198 | 199 | There’s been a law on the books for almost a century 200 | to make sure taxpayers’ dollars support American jobs and businesses. 201 | 202 | Every Administration says they’ll do it, but we are actually doing it. 203 | 204 | We will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America. 205 | 206 | But to compete for the best jobs of the future, we also need to level the playing field with China and other competitors. 207 | 208 | That’s why it is so important to pass the Bipartisan Innovation Act sitting in Congress that will make record investments in emerging technologies and American manufacturing. 209 | 210 | Let me give you one example of why it’s so important to pass it. 211 | 212 | If you travel 20 miles east of Columbus, Ohio, you’ll find 1,000 empty acres of land. 213 | 214 | It won’t look like much, but if you stop and look closely, you’ll see a “Field of dreams,” the ground on which America’s future will be built. 
215 | 216 | This is where Intel, the American company that helped build Silicon Valley, is going to build its $20 billion semiconductor “mega site”. 217 | 218 | Up to eight state-of-the-art factories in one place. 10,000 new good-paying jobs. 219 | 220 | Some of the most sophisticated manufacturing in the world to make computer chips the size of a fingertip that power the world and our everyday lives. 221 | 222 | Smartphones. The Internet. Technology we have yet to invent. 223 | 224 | But that’s just the beginning. 225 | 226 | Intel’s CEO, Pat Gelsinger, who is here tonight, told me they are ready to increase their investment from 227 | $20 billion to $100 billion. 228 | 229 | That would be one of the biggest investments in manufacturing in American history. 230 | 231 | And all they’re waiting for is for you to pass this bill. 232 | 233 | So let’s not wait any longer. Send it to my desk. I’ll sign it. 234 | 235 | And we will really take off. 236 | 237 | And Intel is not alone. 238 | 239 | There’s something happening in America. 240 | 241 | Just look around and you’ll see an amazing story. 242 | 243 | The rebirth of the pride that comes from stamping products “Made In America.” The revitalization of American manufacturing. 244 | 245 | Companies are choosing to build new factories here, when just a few years ago, they would have built them overseas. 246 | 247 | That’s what is happening. Ford is investing $11 billion to build electric vehicles, creating 11,000 jobs across the country. 248 | 249 | GM is making the largest investment in its history—$7 billion to build electric vehicles, creating 4,000 jobs in Michigan. 250 | 251 | All told, we created 369,000 new manufacturing jobs in America just last year. 252 | 253 | Powered by people I’ve met like JoJo Burgess, from generations of union steelworkers from Pittsburgh, who’s here with us tonight. 254 | 255 | As Ohio Senator Sherrod Brown says, “It’s time to bury the label “Rust Belt.” 256 | 257 | It’s time. 
258 | 259 | But with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills. 260 | 261 | Inflation is robbing them of the gains they might otherwise feel. 262 | 263 | I get it. That’s why my top priority is getting prices under control. 264 | 265 | Look, our economy roared back faster than most predicted, but the pandemic meant that businesses had a hard time hiring enough workers to keep up production in their factories. 266 | 267 | The pandemic also disrupted global supply chains. 268 | 269 | When factories close, it takes longer to make goods and get them from the warehouse to the store, and prices go up. 270 | 271 | Look at cars. 272 | 273 | Last year, there weren’t enough semiconductors to make all the cars that people wanted to buy. 274 | 275 | And guess what, prices of automobiles went up. 276 | 277 | So—we have a choice. 278 | 279 | One way to fight inflation is to drive down wages and make Americans poorer. 280 | 281 | I have a better plan to fight inflation. 282 | 283 | Lower your costs, not your wages. 284 | 285 | Make more cars and semiconductors in America. 286 | 287 | More infrastructure and innovation in America. 288 | 289 | More goods moving faster and cheaper in America. 290 | 291 | More jobs where you can earn a good living in America. 292 | 293 | And instead of relying on foreign supply chains, let’s make it in America. 294 | 295 | Economists call it “increasing the productive capacity of our economy.” 296 | 297 | I call it building a better America. 298 | 299 | My plan to fight inflation will lower your costs and lower the deficit. 300 | 301 | 17 Nobel laureates in economics say my plan will ease long-term inflationary pressures. Top business leaders and most Americans support my plan. And here’s the plan: 302 | 303 | First – cut the cost of prescription drugs. Just look at insulin. One in ten Americans has diabetes. 
In Virginia, I met a 13-year-old boy named Joshua Davis. 304 | 305 | He and his Dad both have Type 1 diabetes, which means they need insulin every day. Insulin costs about $10 a vial to make. 306 | 307 | But drug companies charge families like Joshua and his Dad up to 30 times more. I spoke with Joshua’s mom. 308 | 309 | Imagine what it’s like to look at your child who needs insulin and have no idea how you’re going to pay for it. 310 | 311 | What it does to your dignity, your ability to look your child in the eye, to be the parent you expect to be. 312 | 313 | Joshua is here with us tonight. Yesterday was his birthday. Happy birthday, buddy. 314 | 315 | For Joshua, and for the 200,000 other young people with Type 1 diabetes, let’s cap the cost of insulin at $35 a month so everyone can afford it. 316 | 317 | Drug companies will still do very well. And while we’re at it let Medicare negotiate lower prices for prescription drugs, like the VA already does. 318 | 319 | Look, the American Rescue Plan is helping millions of families on Affordable Care Act plans save $2,400 a year on their health care premiums. Let’s close the coverage gap and make those savings permanent. 320 | 321 | Second – cut energy costs for families an average of $500 a year by combatting climate change. 322 | 323 | Let’s provide investments and tax credits to weatherize your homes and businesses to be energy efficient and you get a tax credit; double America’s clean energy production in solar, wind, and so much more; lower the price of electric vehicles, saving you another $80 a month because you’ll never have to pay at the gas pump again. 324 | 325 | Third – cut the cost of child care. Many families pay up to $14,000 a year for child care per child. 326 | 327 | Middle-class and working families shouldn’t have to pay more than 7% of their income for care of young children. 
328 | 329 | My plan will cut the cost in half for most families and help parents, including millions of women, who left the workforce during the pandemic because they couldn’t afford child care, to be able to get back to work. 330 | 331 | My plan doesn’t stop there. It also includes home and long-term care. More affordable housing. And Pre-K for every 3- and 4-year-old. 332 | 333 | All of these will lower costs. 334 | 335 | And under my plan, nobody earning less than $400,000 a year will pay an additional penny in new taxes. Nobody. 336 | 337 | The one thing all Americans agree on is that the tax system is not fair. We have to fix it. 338 | 339 | I’m not looking to punish anyone. But let’s make sure corporations and the wealthiest Americans start paying their fair share. 340 | 341 | Just last year, 55 Fortune 500 corporations earned $40 billion in profits and paid zero dollars in federal income tax. 342 | 343 | That’s simply not fair. That’s why I’ve proposed a 15% minimum tax rate for corporations. 344 | 345 | We got more than 130 countries to agree on a global minimum tax rate so companies can’t get out of paying their taxes at home by shipping jobs and factories overseas. 346 | 347 | That’s why I’ve proposed closing loopholes so the very wealthy don’t pay a lower tax rate than a teacher or a firefighter. 348 | 349 | So that’s my plan. It will grow the economy and lower costs for families. 350 | 351 | So what are we waiting for? Let’s get this done. And while you’re at it, confirm my nominees to the Federal Reserve, which plays a critical role in fighting inflation. 352 | 353 | My plan will not only lower costs to give families a fair shot, it will lower the deficit. 354 | 355 | The previous Administration not only ballooned the deficit with tax cuts for the very wealthy and corporations, it undermined the watchdogs whose job was to keep pandemic relief funds from being wasted. 356 | 357 | But in my administration, the watchdogs have been welcomed back. 
358 | 359 | We’re going after the criminals who stole billions in relief money meant for small businesses and millions of Americans. 360 | 361 | And tonight, I’m announcing that the Justice Department will name a chief prosecutor for pandemic fraud. 362 | 363 | By the end of this year, the deficit will be down to less than half what it was before I took office. 364 | 365 | The only president ever to cut the deficit by more than one trillion dollars in a single year. 366 | 367 | Lowering your costs also means demanding more competition. 368 | 369 | I’m a capitalist, but capitalism without competition isn’t capitalism. 370 | 371 | It’s exploitation—and it drives up prices. 372 | 373 | When corporations don’t have to compete, their profits go up, your prices go up, and small businesses and family farmers and ranchers go under. 374 | 375 | We see it happening with ocean carriers moving goods in and out of America. 376 | 377 | During the pandemic, these foreign-owned companies raised prices by as much as 1,000% and made record profits. 378 | 379 | Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers. 380 | 381 | And as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. 382 | 383 | That ends on my watch. 384 | 385 | Medicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. 386 | 387 | We’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. 388 | 389 | Let’s pass the Paycheck Fairness Act and paid leave. 390 | 391 | Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. 
392 | 393 | Let’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges. 394 | 395 | And let’s pass the PRO Act when a majority of workers want to form a union—they shouldn’t be stopped. 396 | 397 | When we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we haven’t done in a long time: build a better America. 398 | 399 | For more than two years, COVID-19 has impacted every decision in our lives and the life of the nation. 400 | 401 | And I know you’re tired, frustrated, and exhausted. 402 | 403 | But I also know this. 404 | 405 | Because of the progress we’ve made, because of your resilience and the tools we have, tonight I can say 406 | we are moving forward safely, back to more normal routines. 407 | 408 | We’ve reached a new moment in the fight against COVID-19, with severe cases down to a level not seen since last July. 409 | 410 | Just a few days ago, the Centers for Disease Control and Prevention—the CDC—issued new mask guidelines. 411 | 412 | Under these new guidelines, most Americans in most of the country can now be mask free. 413 | 414 | And based on the projections, more of the country will reach that point across the next couple of weeks. 415 | 416 | Thanks to the progress we have made this past year, COVID-19 need no longer control our lives. 417 | 418 | I know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. 419 | 420 | We will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard. 421 | 422 | Here are four common sense steps as we move forward safely. 423 | 424 | First, stay protected with vaccines and treatments. We know how incredibly effective vaccines are. 
If you’re vaccinated and boosted you have the highest degree of protection. 425 | 426 | We will never give up on vaccinating more Americans. Now, I know parents with kids under 5 are eager to see a vaccine authorized for their children. 427 | 428 | The scientists are working hard to get that done and we’ll be ready with plenty of vaccines when they do. 429 | 430 | We’re also ready with anti-viral treatments. If you get COVID-19, the Pfizer pill reduces your chances of ending up in the hospital by 90%. 431 | 432 | We’ve ordered more of these pills than anyone in the world. And Pfizer is working overtime to get us 1 Million pills this month and more than double that next month. 433 | 434 | And we’re launching the “Test to Treat” initiative so people can get tested at a pharmacy, and if they’re positive, receive antiviral pills on the spot at no cost. 435 | 436 | If you’re immunocompromised or have some other vulnerability, we have treatments and free high-quality masks. 437 | 438 | We’re leaving no one behind or ignoring anyone’s needs as we move forward. 439 | 440 | And on testing, we have made hundreds of millions of tests available for you to order for free. 441 | 442 | Even if you already ordered free tests tonight, I am announcing that you can order more from covidtests.gov starting next week. 443 | 444 | Second – we must prepare for new variants. Over the past year, we’ve gotten much better at detecting new variants. 445 | 446 | If necessary, we’ll be able to deploy new vaccines within 100 days instead of many more months or years. 447 | 448 | And, if Congress provides the funds we need, we’ll have new stockpiles of tests, masks, and pills ready if needed. 449 | 450 | I cannot promise a new variant won’t come. But I can promise you we’ll do everything within our power to be ready if it does. 451 | 452 | Third – we can end the shutdown of schools and businesses. We have the tools we need. 
453 | 454 | It’s time for Americans to get back to work and fill our great downtowns again. People working from home can feel safe to begin to return to the office. 455 | 456 | We’re doing that here in the federal government. The vast majority of federal workers will once again work in person. 457 | 458 | Our schools are open. Let’s keep it that way. Our kids need to be in school. 459 | 460 | And with 75% of adult Americans fully vaccinated and hospitalizations down by 77%, most Americans can remove their masks, return to work, stay in the classroom, and move forward safely. 461 | 462 | We achieved this because we provided free vaccines, treatments, tests, and masks. 463 | 464 | Of course, continuing this costs money. 465 | 466 | I will soon send Congress a request. 467 | 468 | The vast majority of Americans have used these tools and may want to again, so I expect Congress to pass it quickly. 469 | 470 | Fourth, we will continue vaccinating the world. 471 | 472 | We’ve sent 475 Million vaccine doses to 112 countries, more than any other nation. 473 | 474 | And we won’t stop. 475 | 476 | We have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. 477 | 478 | Let’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease. 479 | 480 | Let’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans. 481 | 482 | We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. 483 | 484 | I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. 485 | 486 | They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. 487 | 488 | Officer Mora was 27 years old. 489 | 490 | Officer Rivera was 22. 
491 | 492 | Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. 493 | 494 | I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. 495 | 496 | I’ve worked on these issues a long time. 497 | 498 | I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety. 499 | 500 | So let’s not abandon our streets. Or choose between safety and equal justice. 501 | 502 | Let’s come together to protect our communities, restore trust, and hold law enforcement accountable. 503 | 504 | That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. 505 | 506 | That’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope. 507 | 508 | We should all agree: The answer is not to Defund the police. The answer is to FUND the police with the resources and training they need to protect our communities. 509 | 510 | I ask Democrats and Republicans alike: Pass my budget and keep our neighborhoods safe. 511 | 512 | And I will keep doing everything in my power to crack down on gun trafficking and ghost guns you can buy online and make at home—they have no serial numbers and can’t be traced. 513 | 514 | And I ask Congress to pass proven measures to reduce gun violence. Pass universal background checks. Why should anyone on a terrorist list be able to purchase a weapon? 515 | 516 | Ban assault weapons and high-capacity magazines. 
517 | 518 | Repeal the liability shield that makes gun manufacturers the only industry in America that can’t be sued. 519 | 520 | These laws don’t infringe on the Second Amendment. They save lives. 521 | 522 | The most fundamental right in America is the right to vote – and to have it counted. And it’s under assault. 523 | 524 | In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. 525 | 526 | We cannot let this happen. 527 | 528 | Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 529 | 530 | Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 531 | 532 | One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 533 | 534 | And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. 535 | 536 | A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. 537 | 538 | And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. 539 | 540 | We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. 
541 | 542 | We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. 543 | 544 | We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 545 | 546 | We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders. 547 | 548 | We can do all this while keeping lit the torch of liberty that has led generations of immigrants to this land—my forefathers and so many of yours. 549 | 550 | Provide a pathway to citizenship for Dreamers, those on temporary status, farm workers, and essential workers. 551 | 552 | Revise our laws so businesses have the workers they need and families don’t wait decades to reunite. 553 | 554 | It’s not only the right thing to do—it’s the economically smart thing to do. 555 | 556 | That’s why immigration reform is supported by everyone from labor unions to religious leaders to the U.S. Chamber of Commerce. 557 | 558 | Let’s get it done once and for all. 559 | 560 | Advancing liberty and justice also requires protecting the rights of women. 561 | 562 | The constitutional right affirmed in Roe v. Wade—standing precedent for half a century—is under attack as never before. 563 | 564 | If we want to go forward—not backward—we must protect access to health care. Preserve a woman’s right to choose. And let’s continue to advance maternal health care in America. 565 | 566 | And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. 567 | 568 | As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. 569 | 570 | While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. 
From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. 571 | 572 | And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. 573 | 574 | So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together. 575 | 576 | First, beat the opioid epidemic. 577 | 578 | There is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery. 579 | 580 | Get rid of outdated rules that stop doctors from prescribing treatments. And stop the flow of illicit drugs by working with state and local law enforcement to go after traffickers. 581 | 582 | If you’re suffering from addiction, know you are not alone. I believe in recovery, and I celebrate the 23 million Americans in recovery. 583 | 584 | Second, let’s take on mental health. Especially among our children, whose lives and education have been turned upside down. 585 | 586 | The American Rescue Plan gave schools money to hire teachers and help students make up for lost learning. 587 | 588 | I urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor. 589 | 590 | Children were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media. 591 | 592 | As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. 593 | 594 | It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. 595 | 596 | And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care. 
597 | 598 | Third, support our veterans. 599 | 600 | Veterans are the best of us. 601 | 602 | I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. 603 | 604 | My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free. 605 | 606 | Our troops in Iraq and Afghanistan faced many dangers. 607 | 608 | One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. 609 | 610 | When they came home, many of the world’s fittest and best trained warriors were never the same. 611 | 612 | Headaches. Numbness. Dizziness. 613 | 614 | A cancer that would put them in a flag-draped coffin. 615 | 616 | I know. 617 | 618 | One of those soldiers was my son Major Beau Biden. 619 | 620 | We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. 621 | 622 | But I’m committed to finding out everything we can. 623 | 624 | Committed to military families like Danielle Robinson from Ohio. 625 | 626 | The widow of Sergeant First Class Heath Robinson. 627 | 628 | He was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. 629 | 630 | Stationed near Baghdad, just yards from burn pits the size of football fields. 631 | 632 | Heath’s widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter. 633 | 634 | But cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body. 635 | 636 | Danielle says Heath was a fighter to the very end. 637 | 638 | He didn’t know how to stop fighting, and neither did she. 639 | 640 | Through her pain she found purpose to demand we do better. 641 | 642 | Tonight, Danielle—we are. 
643 | 644 | The VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits. 645 | 646 | And tonight, I’m announcing we’re expanding eligibility to veterans suffering from nine respiratory cancers. 647 | 648 | I’m also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve. 649 | 650 | And fourth, let’s end cancer as we know it. 651 | 652 | This is personal to me and Jill, to Kamala, and to so many of you. 653 | 654 | Cancer is the #2 cause of death in America–second only to heart disease. 655 | 656 | Last month, I announced our plan to supercharge 657 | the Cancer Moonshot that President Obama asked me to lead six years ago. 658 | 659 | Our goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases. 660 | 661 | More support for patients and families. 662 | 663 | To get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. 664 | 665 | It’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more. 666 | 667 | ARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more. 668 | 669 | A unity agenda for the nation. 670 | 671 | We can do this. 672 | 673 | My fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. 674 | 675 | In this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. 676 | 677 | We have fought for freedom, expanded liberty, defeated totalitarianism and terror. 678 | 679 | And built the strongest, freest, and most prosperous nation the world has ever known. 680 | 681 | Now is the hour. 682 | 683 | Our moment of responsibility. 
684 | 685 | Our test of resolve and conscience, of history itself. 686 | 687 | It is in this moment that our character is formed. Our purpose is found. Our future is forged. 688 | 689 | Well I know this nation. 690 | 691 | We will meet the test. 692 | 693 | To protect freedom and liberty, to expand fairness and opportunity. 694 | 695 | We will save democracy. 696 | 697 | As hard as these times have been, I am more optimistic about America today than I have been my whole life. 698 | 699 | Because I see the future that is within our grasp. 700 | 701 | Because I know there is simply nothing beyond our capacity. 702 | 703 | We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. 704 | 705 | The only nation that can be defined by a single word: possibilities. 706 | 707 | So on this night, in our 245th year as a nation, I have come to report on the State of the Union. 708 | 709 | And my report is this: the State of the Union is strong—because you, the American people, are strong. 710 | 711 | We are stronger today than we were a year ago. 712 | 713 | And we will be stronger a year from now than we are today. 714 | 715 | Now is our moment to meet and overcome the challenges of our time. 716 | 717 | And we will, as one people. 718 | 719 | One America. 720 | 721 | The United States of America. 722 | 723 | May God bless you all. May God protect our troops. -------------------------------------------------------------------------------- /demo/.ipynb_checkpoints/sql_demo-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Vector Search with IRIS SQL\n", 8 | "This tutorial covers how to use IRIS as a vector database. 
\n", 9 | "\n", 10 | "For this tutorial, we will use a dataset of 2.2k online reviews of Scotch whisky\n", 11 | "(dataset from https://www.kaggle.com/datasets/koki25ando/22000-scotch-whisky-reviews). With our latest vector database functionality, we can use modern embedding models to run semantic search over these reviews. We can also apply filters on columns with structured data: for example, we can search for whiskies that are priced under $100 and are 'earthy, smooth, and easy to drink'. Let's find our perfect whisky!" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "The first step is to import the libraries we need and establish a connection to InterSystems IRIS." 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 23, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "import os, pandas as pd\n", 28 | "from sentence_transformers import SentenceTransformer\n", 29 | "from sqlalchemy import create_engine, text\n", 30 | "\n", 31 | "username = 'demo'\n", 32 | "password = 'demo'\n", 33 | "hostname = os.getenv('IRIS_HOSTNAME', 'localhost')\n", 34 | "port = '1972'\n", 35 | "namespace = 'USER'\n", 36 | "\n", 37 | "\n", 38 | "# username = 'demo'\n", 39 | "# password = 'demo'\n", 40 | "# hostname = os.getenv('IRIS_HOSTNAME', 'localhost')\n", 41 | "# port = '63958'\n", 42 | "# namespace = 'USER'\n", 43 | "\n", 44 | "CONNECTION_STRING = f\"iris://{username}:{password}@{hostname}:{port}/{namespace}\"\n", 45 | "#iris://demo:demo@localhost:63958/demo\n", 46 | "engine = create_engine(CONNECTION_STRING)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "## Exploring the dataset\n", 54 | "\n", 55 | "Let's take a look at the data in our CSV file of whisky reviews."
56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 24, 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "data": { 65 | "text/html": [ 66 | "
\n", 146 | "
" 147 | ], 148 | "text/plain": [ 149 | " Unnamed: 0 name \\\n", 150 | "0 1 Johnnie Walker Blue Label, 40% \n", 151 | "1 2 Black Bowmore, 1964 vintage, 42 year old, 40.5% \n", 152 | "2 3 Bowmore 46 year old (distilled 1964), 42.9% \n", 153 | "3 4 Compass Box The General, 53.4% \n", 154 | "4 5 Chivas Regal Ultis, 40% \n", 155 | "\n", 156 | " category review.point price currency \\\n", 157 | "0 Blended Scotch Whisky 97 225 $ \n", 158 | "1 Single Malt Scotch 97 4500.00 $ \n", 159 | "2 Single Malt Scotch 97 13500.00 $ \n", 160 | "3 Blended Malt Scotch Whisky 96 325 $ \n", 161 | "4 Blended Malt Scotch Whisky 96 160 $ \n", 162 | "\n", 163 | " description \n", 164 | "0 Magnificently powerful and intense. Caramels, ... \n", 165 | "1 What impresses me most is how this whisky evol... \n", 166 | "2 There have been some legendary Bowmores from t... \n", 167 | "3 With a name inspired by a 1926 Buster Keaton m... \n", 168 | "4 Captivating, enticing, and wonderfully charmin... " 169 | ] 170 | }, 171 | "execution_count": 24, 172 | "metadata": {}, 173 | "output_type": "execute_result" 174 | } 175 | ], 176 | "source": [ 177 | "# Load the CSV file\n", 178 | "df = pd.read_csv('../data/scotch_review.csv')\n", 179 | "df.head()" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "Now we'll reorganize the data a little with pandas functions to make it more practical to store in a table." 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 25, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "data": { 196 | "text/html": [ 197 | "
\n", 265 | "
" 266 | ], 267 | "text/plain": [ 268 | " name \\\n", 269 | "0 Johnnie Walker Blue Label, 40% \n", 270 | "1 Black Bowmore, 1964 vintage, 42 year old, 40.5% \n", 271 | "2 Bowmore 46 year old (distilled 1964), 42.9% \n", 272 | "3 Compass Box The General, 53.4% \n", 273 | "4 Chivas Regal Ultis, 40% \n", 274 | "\n", 275 | " category review.point price \\\n", 276 | "0 Blended Scotch Whisky 97 225 \n", 277 | "1 Single Malt Scotch 97 4500.00 \n", 278 | "2 Single Malt Scotch 97 13500.00 \n", 279 | "3 Blended Malt Scotch Whisky 96 325 \n", 280 | "4 Blended Malt Scotch Whisky 96 160 \n", 281 | "\n", 282 | " description \n", 283 | "0 Magnificently powerful and intense. Caramels, ... \n", 284 | "1 What impresses me most is how this whisky evol... \n", 285 | "2 There have been some legendary Bowmores from t... \n", 286 | "3 With a name inspired by a 1926 Buster Keaton m... \n", 287 | "4 Captivating, enticing, and wonderfully charmin... " 288 | ] 289 | }, 290 | "execution_count": 25, 291 | "metadata": {}, 292 | "output_type": "execute_result" 293 | } 294 | ], 295 | "source": [ 296 | "# Clean data\n", 297 | "# Remove the 'currency' column\n", 298 | "df.drop(['currency'], axis=1, inplace=True)\n", 299 | "\n", 300 | "# Drop the first column (the unnamed index column from the CSV)\n", 301 | "df.drop(columns=df.columns[0], inplace=True)\n", 302 | "\n", 303 | "# Remove rows without a price\n", 304 | "df.dropna(subset=['price'], inplace=True)\n", 305 | "\n", 306 | "# Keep only rows whose 'price' parses as a number (copy to avoid SettingWithCopyWarning below)\n", 307 | "df = df[pd.to_numeric(df['price'], errors='coerce').notna()].copy()\n", 308 | "\n", 309 | "# Replace NaN values in other columns with an empty string\n", 310 | "df.fillna('', inplace=True)\n", 311 | "\n", 312 | "df.head()" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "## Creating the table in IRIS SQL\n", 320 | "\n", 321 | "InterSystems IRIS now supports vectors as a native column datatype! Here, we create a table with a few different columns. 
The last column, `description_vector` of type `VECTOR(FLOAT, 384)`, stores the vectors generated by passing the `description` of each review through an embedding model. The `FLOAT` option is new in 2024.3, and 384 is the number of dimensions produced by the chosen embedding model." 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 26, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "with engine.connect() as conn:\n", 331 | " with conn.begin():\n", 332 | " sql = \"\"\"\n", 333 | " CREATE TABLE IF NOT EXISTS scotch_reviews (\n", 334 | " name VARCHAR(255),\n", 335 | " category VARCHAR(255),\n", 336 | " review_point INT,\n", 337 | " price DOUBLE,\n", 338 | " description VARCHAR(2000),\n", 339 | " description_vector VECTOR(FLOAT, 384)\n", 340 | " )\n", 341 | " \"\"\"\n", 342 | " result = conn.execute(text(sql))" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "## Creating the embeddings\n", 350 | "\n", 351 | "Next, we'll create the embeddings for the `description` column. Starting with IRIS 2024.3, you can delegate this work to IRIS itself with the new [`EMBEDDING` datatype](https://docs.intersystems.com/iris20243/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_vecsearch#GSQL_vecsearch_insembed), but here we'll generate them in Python with a widely used Sentence Transformers model." 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 27, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "# Load a pre-trained sentence transformer model; it outputs 384-dimensional vectors\n", 361 | "model = SentenceTransformer('all-MiniLM-L6-v2')" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 28, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/html": [ 372 | "
\n", 446 | "
" 447 | ], 448 | "text/plain": [ 449 | " name \\\n", 450 | "0 Johnnie Walker Blue Label, 40% \n", 451 | "1 Black Bowmore, 1964 vintage, 42 year old, 40.5% \n", 452 | "2 Bowmore 46 year old (distilled 1964), 42.9% \n", 453 | "3 Compass Box The General, 53.4% \n", 454 | "4 Chivas Regal Ultis, 40% \n", 455 | "\n", 456 | " category review.point price \\\n", 457 | "0 Blended Scotch Whisky 97 225 \n", 458 | "1 Single Malt Scotch 97 4500.00 \n", 459 | "2 Single Malt Scotch 97 13500.00 \n", 460 | "3 Blended Malt Scotch Whisky 96 325 \n", 461 | "4 Blended Malt Scotch Whisky 96 160 \n", 462 | "\n", 463 | " description \\\n", 464 | "0 Magnificently powerful and intense. Caramels, ... \n", 465 | "1 What impresses me most is how this whisky evol... \n", 466 | "2 There have been some legendary Bowmores from t... \n", 467 | "3 With a name inspired by a 1926 Buster Keaton m... \n", 468 | "4 Captivating, enticing, and wonderfully charmin... \n", 469 | "\n", 470 | " description_vector \n", 471 | "0 [-0.010494445450603962, 0.014728965237736702, ... \n", 472 | "1 [0.02318125031888485, -0.05123035982251167, 0.... \n", 473 | "2 [0.04333321005105972, -0.017066635191440582, -... \n", 474 | "3 [-0.07594005018472672, -0.03676239028573036, 0... \n", 475 | "4 [-0.012818857096135616, -0.09769789129495621, ... 
" 476 | ] 477 | }, 478 | "execution_count": 28, 479 | "metadata": {}, 480 | "output_type": "execute_result" 481 | } 482 | ], 483 | "source": [ 484 | "# Generate embeddings for all descriptions at once.\n", 485 | "# Batch processing before inserting into the table makes it faster, but this step may still take a minute\n", 486 | "embeddings = model.encode(df['description'].tolist(), normalize_embeddings=True)\n", 487 | "\n", 488 | "# Add the embeddings to the DataFrame\n", 489 | "df['description_vector'] = embeddings.tolist()\n", 490 | "\n", 491 | "df.head()" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "Now we'll load the data into our table. Note the `str()` call as we're passing the vector as a comma-separated list of values in string format, because there is no specific vector datatype in the DB-API driver standard." 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": 29, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "with engine.connect() as conn:\n", 508 | " with conn.begin():\n", 509 | " for index, row in df.iterrows():\n", 510 | " sql = text(\"\"\"\n", 511 | " INSERT INTO scotch_reviews \n", 512 | " (name, category, review_point, price, description, description_vector) \n", 513 | " VALUES (:name, :category, :review_point, :price, :description, TO_VECTOR(:description_vector))\n", 514 | " \"\"\")\n", 515 | " conn.execute(sql, {\n", 516 | " 'name': row['name'], \n", 517 | " 'category': row['category'], \n", 518 | " 'review_point': row['review.point'], \n", 519 | " 'price': row['price'], \n", 520 | " 'description': row['description'], \n", 521 | " 'description_vector': str(row['description_vector'])\n", 522 | " })\n" 523 | ] 524 | }, 525 | { 526 | "cell_type": "markdown", 527 | "metadata": {}, 528 | "source": [ 529 | "## Running a few queries\n", 530 | "\n", 531 | "Let's look for a scotch that costs less than $100, and has an earthy and creamy taste." 
532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": 30, 537 | "metadata": {}, 538 | "outputs": [], 539 | "source": [ 540 | "description_search = \"earthy and creamy taste\"\n", 541 | "search_vector = model.encode(description_search, normalize_embeddings=True).tolist() # Convert search phrase into a vector" 542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": 31, 547 | "metadata": {}, 548 | "outputs": [ 549 | { 550 | "name": "stdout", 551 | "output_type": "stream", 552 | "text": [ 553 | "[('Signatory (distilled at Bowmore), 16 year old, 1988 vintage, cask #42508, 46%', 'Single Malt Scotch', 87, 60.0, 'Medium-bodied and nicely textured. Good balance of flavors -- and well-integrated, too -- with lovely sweet notes (cereal grain, cookie dough, carame ... (48 characters truncated) ... fishnets, and brine that is complementary, but not aggressive, with a suggestion of lavender and tangerine. Balanced finish. (332 bottles produced.)', '-.048620376735925674438,-.082065843045711517333,.039660684764385223388,-.018970852717757225036,-.017485298216342926026,.042453121393918991088,.046325 ... (8848 characters truncated) ... 064819,-.0038620312698185443878,-.022344633936882019042,.052769336849451065063,-.061306387186050415039,.048756919801235198974,-.063436612486839294433'), ('Shieldaig 12 year old, 40%', 'Blended Scotch Whisky', 85, 31.0, 'This is a sharp dresser, with a firm, solid mouthfeel and an altogether finer and more focused taste than Shieldaig Classic (see\\r\\nbelow). It’s not ... (114 characters truncated) ... e, and some soft fruit, including a touch of overripe banana and melon notes. The savoriness this time comes from a touch of pepper rather than salt.', '-.0049302759580314159393,-.070051722228527069091,.046160325407981872558,.053877647966146469116,.0037386598996818065643,.018159903585910797119,.076887 ... (8828 characters truncated) ... 
3170166,-.0015221295179799199104,.047901224344968795776,.0098907267674803733826,-.026278590783476829528,.042504664510488510131,.041063331067562103271'), ('The Arran Malt, Single Bourbon Cask, (Cask#1801), 1996 Vintage, 50.5%', 'Single Malt Scotch', 86, 80.0, 'Fresh and clean, with notes of vanilla, ripe barley, honey, caramel apple, and toasted coconut. Creamy and mouth-coating in texture, leading to a pleasingly dry, spicy oak finish. Very drinkable, yet satisfying. Quite nice. \\r\\n', '-.0010089599527418613433,-.050370443612337112426,.046052008867263793946,.074557252228260040283,-.0048394058831036090851,.039374433457851409912,.02021 ... (8848 characters truncated) ... 5349731,.0031521078199148178101,-.0083352373912930488586,.10131823271512985229,-.021709911525249481201,.037876114249229431152,.0095796724781394004821')]\n" 554 | ] 555 | } 556 | ], 557 | "source": [ 558 | "with engine.connect() as conn:\n", 559 | " with conn.begin():\n", 560 | " sql = text(\"\"\"\n", 561 | " SELECT TOP 3 * FROM scotch_reviews \n", 562 | " WHERE price < 100 \n", 563 | " ORDER BY VECTOR_DOT_PRODUCT(description_vector, TO_VECTOR(:search_vector)) DESC\n", 564 | " \"\"\")\n", 565 | "\n", 566 | " results = conn.execute(sql, {'search_vector': str(search_vector)}).fetchall()\n", 567 | "\n", 568 | "print(results)" 569 | ] 570 | }, 571 | { 572 | "cell_type": "markdown", 573 | "metadata": {}, 574 | "source": [ 575 | "Let's print that result a little more nicely!" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": 32, 581 | "metadata": {}, 582 | "outputs": [ 583 | { 584 | "data": { 585 | "text/html": [ 586 | "
\n", 587 | "\n", 600 | "\n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | "
| | name | category | review.point | price | description |
|---|---|---|---|---|---|
| 0 | Signatory (distilled at Bowmore), 16 year old, 1988 vintage, cask #42508, 46% | Single Malt Scotch | 87 | 60.0 | Medium-bodied and nicely textured. Good balance of flavors -- and well-integrated, too -- with lovely sweet notes (cereal grain, cookie dough, caramel, and vanilla cream), young heathery peat, tar, fishnets, and brine that is complementary, but not aggressive, with a suggestion of lavender and tangerine. Balanced finish. (332 bottles produced.) |
| 1 | Shieldaig 12 year old, 40% | Blended Scotch Whisky | 85 | 31.0 | This is a sharp dresser, with a firm, solid mouthfeel and an altogether finer and more focused taste than Shieldaig Classic (see\\r\\nbelow). It’s not coastal or earthy particularly, either. Instead the flavors are softer and built around mocha, smooth creamy toffee, and some soft fruit, including a touch of overripe banana and melon notes. The savoriness this time comes from a touch of pepper rather than salt. |
| 2 | The Arran Malt, Single Bourbon Cask, (Cask#1801), 1996 Vintage, 50.5% | Single Malt Scotch | 86 | 80.0 | Fresh and clean, with notes of vanilla, ripe barley, honey, caramel apple, and toasted coconut. Creamy and mouth-coating in texture, leading to a pleasingly dry, spicy oak finish. Very drinkable, yet satisfying. Quite nice. \\r\\n |
\n", 638 | "
" 639 | ], 640 | "text/plain": [ 641 | " name \\\n", 642 | "0 Signatory (distilled at Bowmore), 16 year old, 1988 vintage, cask #42508, 46% \n", 643 | "1 Shieldaig 12 year old, 40% \n", 644 | "2 The Arran Malt, Single Bourbon Cask, (Cask#1801), 1996 Vintage, 50.5% \n", 645 | "\n", 646 | " category review.point price \\\n", 647 | "0 Single Malt Scotch 87 60.0 \n", 648 | "1 Blended Scotch Whisky 85 31.0 \n", 649 | "2 Single Malt Scotch 86 80.0 \n", 650 | "\n", 651 | " description \n", 652 | "0 Medium-bodied and nicely textured. Good balance of flavors -- and well-integrated, too -- with lovely sweet notes (cereal grain, cookie dough, caramel, and vanilla cream), young heathery peat, tar, fishnets, and brine that is complementary, but not aggressive, with a suggestion of lavender and tangerine. Balanced finish. (332 bottles produced.) \n", 653 | "1 This is a sharp dresser, with a firm, solid mouthfeel and an altogether finer and more focused taste than Shieldaig Classic (see\\r\\nbelow). It’s not coastal or earthy particularly, either. Instead the flavors are softer and built around mocha, smooth creamy toffee, and some soft fruit, including a touch of overripe banana and melon notes. The savoriness this time comes from a touch of pepper rather than salt. \n", 654 | "2 Fresh and clean, with notes of vanilla, ripe barley, honey, caramel apple, and toasted coconut. Creamy and mouth-coating in texture, leading to a pleasingly dry, spicy oak finish. Very drinkable, yet satisfying. Quite nice. 
\\r\\n " 655 | ] 656 | }, 657 | "execution_count": 32, 658 | "metadata": {}, 659 | "output_type": "execute_result" 660 | } 661 | ], 662 | "source": [ 663 | "results_df = pd.DataFrame(results, columns=df.columns).iloc[:, :-1] # Remove vector\n", 664 | "pd.set_option('display.max_colwidth', None) # Easier to read description\n", 665 | "results_df.head()" 666 | ] 667 | }, 668 | { 669 | "cell_type": "markdown", 670 | "metadata": {}, 671 | "source": [ 672 | "## Indexing vector data\n", 673 | "\n", 674 | "IRIS 2025.1 includes not only bug fixes and performance enhancements but also a new disk-based Approximate Nearest Neighbors (ANN) index that speeds up vector search on large collections of vectors (typically over 100K). See [the docs](https://docs.intersystems.com/iris20251/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_vecsearch#GSQL_vecsearch_index) for more information on how to define and use the index.\n", 675 | "\n", 676 | "```SQL\n", 677 | "CREATE INDEX HNSWIndex ON TABLE scotch_reviews (description_vector) AS HNSW(M=80, Distance='DotProduct');\n", 678 | "```\n", 679 | "\n", 680 | "The query optimizer can use the index automatically for queries that combine a `TOP` clause with an `ORDER BY` on the distance function the index was created for.
You can verify its use by inspecting the query plan, either with the `EXPLAIN` command or through the System Management Portal UI.\n", 681 | "\n", 682 | "```SQL\n", 683 | "SELECT TOP 10 * FROM scotch_reviews ORDER BY VECTOR_DOT_PRODUCT(description_vector, TO_VECTOR(:search_vector)) DESC;\n", 684 | "```\n", 685 | "\n", 686 | "Because this notebook works with a dataset far smaller than 100K rows, you won't see a measurable performance benefit; treat this as an example to adapt for larger collections.\n" 687 | ] 688 | } 689 | ], 690 | "metadata": { 691 | "kernelspec": { 692 | "display_name": "Python 3 (ipykernel)", 693 | "language": "python", 694 | "name": "python3" 695 | }, 696 | "language_info": { 697 | "codemirror_mode": { 698 | "name": "ipython", 699 | "version": 3 700 | }, 701 | "file_extension": ".py", 702 | "mimetype": "text/x-python", 703 | "name": "python", 704 | "nbconvert_exporter": "python", 705 | "pygments_lexer": "ipython3", 706 | "version": "3.10.16" 707 | } 708 | }, 709 | "nbformat": 4, 710 | "nbformat_minor": 4 711 | } 712 | -------------------------------------------------------------------------------- /demo/README.md: -------------------------------------------------------------------------------- 1 | ## Demo notes 2 | 3 | [sql_demo.ipynb](sql_demo.ipynb), [llama_demo.ipynb](llama_demo.ipynb), and [langchain_demo.ipynb](langchain_demo.ipynb) assume you followed the instructions in the quickstart to run a containerized instance of IRIS on your computer. 4 | 5 | To run an IRIS container from within a notebook (e.g. if you're using Colab), refer to [iris_notebook_container.ipynb](iris_notebook_container.ipynb). 6 | 7 | For more details on our vector SQL syntax, refer to [SQLSyntax.md](SQLSyntax.md) or the [product documentation](https://docs.intersystems.com/irislatest/csp/docbook/Doc.View.cls?KEY=GSQL_vecsearch).
8 | -------------------------------------------------------------------------------- /demo/SQLSyntax.md: -------------------------------------------------------------------------------- 1 | 2 | # Using Vectors in IRIS SQL 3 | 4 | :warning: Please refer to the [product documentation](https://docs.intersystems.com/iris20241/csp/docbook/Doc.View.cls?KEY=GSQL_vecsearch) for the full syntax and instructions. This page is intended for offline reference only. 5 | 6 | ## VECTOR (type, length) 7 | **Optional parameters:** 8 | 9 | - `type` - Optional, defaults to FLOAT (was DOUBLE up to 2024.2). The datatype of elements allowed to be stored in the vector. Can be DECIMAL, DOUBLE, FLOAT, INTEGER, TIMESTAMP, or STRING. 10 | - `length` - Optional, can be specified only if type is also specified. An integer for the number of elements allowed to be stored in the vector. If specified, a length restriction is imposed on INSERT INTO the vector column. 11 | 12 | ### Creating a table with vector columns: 13 | ```sql 14 | CREATE TABLE Test.Demo (vec1 VECTOR(FLOAT,3)) 15 | CREATE TABLE Test.Demo (vec1 VECTOR(FLOAT)) 16 | CREATE TABLE Test.Demo (vec1 VECTOR) 17 | ``` 18 | ### Inserting into a table with vector columns: 19 | ```sql 20 | INSERT INTO Test.Demo (vec1) VALUES ('0.1,0.2,0.3') 21 | ``` 22 | This statement succeeds with any of the three table definitions above; the values are stored using the column's vector type. 23 | 24 | ### Selecting from a table with vector columns: 25 | ```sql 26 | SELECT * FROM Test.Demo 27 | ``` 28 | 29 | ## Vector Index 30 | 31 | **Note**: This feature is available only for InterSystems IRIS 2025.1 and later versions. Please join the [Early Access Program](https://live.evaluation.iscinternal.com/download/adminearlyaccess.csp?earlyAccessProgram=Vector_Search) if you'd like access to a preview kit.
32 | 33 | After storing data in InterSystems IRIS in the VECTOR type, you may define a vector index (also called an approximate nearest neighbor index or an ANN index) to improve the efficiency of searches issued against your stored vectors. 34 | 35 | In a standard vector search, the input vector is compared against every stored vector. This exhaustive approach guarantees exact results, but it is computationally expensive because its cost grows with the number of stored vectors. A vector index uses an approximate nearest neighbor algorithm to organize the vectors in a data structure (for HNSW, a layered graph) that limits the number of comparisons between the input vector and the stored vectors. When a search is performed, the system navigates this structure toward vectors close to the input instead of comparing the input against every stored vector. The trade-off is that results are approximate rather than guaranteed exact, but search performance can improve dramatically, particularly for large amounts of high-dimensional data. 36 | 37 | > **Note:** Queries that use a vector index currently do not support parallelization. 38 | 39 | As with standard indexes, the query optimizer may decide that the most efficient query plan does not use the vector index you have defined. To see if a query uses the vector index, examine the query plan with the `EXPLAIN` command. 40 | 41 | ### Hierarchical Navigable Small World Index 42 | 43 | InterSystems SQL allows you to define a Hierarchical Navigable Small World (HNSW) index, which uses the HNSW algorithm to create a vector index. 44 | 45 | You can define an HNSW index using a `CREATE INDEX` statement. To define an HNSW index, the following requirements must be met: 46 | 47 | - The HNSW index is defined on a VECTOR-typed field with a fixed length that is of type `FLOAT`, `DOUBLE`, or `DECIMAL`. 48 | - The table the index is defined on must have IDs that are bitmap-supported.
49 | - The table the index is defined on must use default storage. 50 | 51 | There are three parameters you can specify when defining an HNSW index: 52 | 53 | 1. **Distance** (required): The distance function used by the index, enclosed in single quotes (`''`). There are two possible values: `Cosine` and `DotProduct`. This parameter is case-insensitive. 54 | 2. **M** (optional): The number of bi-directional links created for every new element during construction. This value must be an integer from 2 to 100. Higher M values work better for high-dimensional data or high recall requirements, while lower M values work better for low-dimensional data or lower recall requirements. The default value is 64. 55 | 3. **efConstruct** (optional): The size of the dynamic list of nearest-neighbor candidates used during index construction. This value should be a positive integer larger than M. Larger `efConstruct` values generally lead to better index quality but longer construction time. Beyond a certain value, increasing `efConstruct` no longer improves the quality of the index. The default value is 64. 56 | 57 | #### Examples of defining HNSW indexes with various parameter values: 58 | 59 | ```sql 60 | CREATE INDEX HNSWIndex ON TABLE Company.People (Biography) 61 | AS %SQL.Index.HNSW(Distance='Cosine') 62 | 63 | CREATE INDEX HNSWIndex ON TABLE Company.People (Biography) 64 | AS %SQL.Index.HNSW(M=80, Distance='DotProduct') 65 | 66 | CREATE INDEX HNSWIndex ON TABLE Company.People (Biography) 67 | AS %SQL.Index.HNSW(M=72, efConstruct=100, Distance='Cosine') 68 | ``` 69 | 70 | ## SQL Functions 71 | 72 | ### TO_VECTOR (input, type, length) 73 | **Parameters:** 74 | 75 | - `input` - String value (VARCHAR) representing the vector contents in either of the two supported input formats: "val1,val2,val3" (recommended) or "[val1,val2,val3]". 76 | - `type` - Optional, defaults to FLOAT. The datatype of the vector's elements: DECIMAL, DOUBLE, FLOAT, INTEGER, TIMESTAMP, or STRING. 77 | - `length` - Optional.
When specified, the input is padded with NULL values or truncated so that the result is a VECTOR of exactly that length. Without `length`, the function returns a vector with as many elements as the supplied list. 78 | 79 | **Returns:** the corresponding vector, to be added to tables or used in other vector operations. 80 | 81 | **Example:** 82 | ```sql 83 | INSERT INTO Test.Demo (vec1) VALUES (TO_VECTOR('0.1,0.2,0.3', FLOAT, 3)) 84 | ``` 85 | ### VECTOR_COSINE (vec1, vec2) 86 | **Parameters:** 87 | 88 | - `vec1, vec2` - vectors 89 | 90 | **Returns:** a double value of the cosine similarity between the two vectors, ranging from -1 to 1 (higher means more similar). 91 | 92 | **Example:** 93 | ```sql 94 | SELECT * FROM Test.Demo WHERE (VECTOR_COSINE(vec1, TO_VECTOR('0.4,0.5,0.6')) < 0) 95 | ``` 96 | ### VECTOR_DOT_PRODUCT (vec1, vec2) 97 | **Parameters:** 98 | 99 | - `vec1, vec2` - vectors 100 | 101 | **Returns:** a double value of the dot product of the two vectors. 102 | 103 | **Example:** 104 | ```sql 105 | SELECT * FROM Test.Demo WHERE (VECTOR_DOT_PRODUCT(vec1, TO_VECTOR('0.4,0.5,0.6')) > 10) 106 | SELECT * FROM Test.Demo WHERE (VECTOR_DOT_PRODUCT(vec1, vec1) > 10) 107 | ``` 108 | ## Nearest Neighbor Search 109 | Getting the top 3 vectors in a table that are most similar to an input vector: 110 | 111 | **Using Cosine Similarity:** 112 | ```sql 113 | SELECT TOP 3 * FROM Test.Demo ORDER BY VECTOR_COSINE(vec1, TO_VECTOR('0.2,0.4,0.6', FLOAT)) DESC 114 | ``` 115 | **Using Dot Product:** 116 | ```sql 117 | SELECT TOP 3 * FROM Test.Demo ORDER BY VECTOR_DOT_PRODUCT(vec1, TO_VECTOR('0.2,0.4,0.6', FLOAT)) DESC 118 | ``` 119 | Note that we use `DESC`, since a higher cosine similarity (or a higher dot product between normalized vectors) means the vectors are more similar. 120 | 121 | This can be combined with `WHERE` clauses to add filters on other columns.
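Without a vector index, the `TOP 3 ... ORDER BY ... DESC` queries above do an exact search: every stored vector is scored against the input, and the highest scores win. The NumPy snippet below mimics that ranking in memory to show what the two similarity functions compute. It is a minimal sketch, not how IRIS evaluates the query. The sample vectors and the helpers `top_k` and `to_vector_string` are hypothetical and used only for illustration. The sketch also shows why the demo notebooks normalize embeddings (`normalize_embeddings=True`). On raw vectors, dot product and cosine similarity can rank rows differently. On unit-length vectors, they agree.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; higher means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def top_k(rows: np.ndarray, query: np.ndarray, k: int, metric: str = "cosine") -> list:
    """Hypothetical helper: exact (brute-force) nearest neighbor search.

    Scores every row against the query and returns the indices of the k
    highest-scoring rows, roughly like
    SELECT TOP k ... ORDER BY VECTOR_COSINE(...) DESC (or VECTOR_DOT_PRODUCT).
    """
    if metric == "cosine":
        scores = np.array([cosine_similarity(r, query) for r in rows])
    elif metric == "dot":
        scores = rows @ query
    else:
        raise ValueError(f"unknown metric: {metric}")
    # argsort sorts ascending, so reverse it to get DESC order
    order = np.argsort(scores)[::-1]
    return [int(i) for i in order[:k]]


def to_vector_string(vec) -> str:
    """Hypothetical helper: format a vector in the 'val1,val2,val3' input format accepted by TO_VECTOR."""
    return ",".join(str(float(x)) for x in vec)


# Hypothetical stored vectors (one per row) and an input vector
rows = np.array([
    [0.1, 0.2, 0.25],
    [0.9, 0.1, 0.0],
    [0.3, 0.3, 0.3],
    [-0.3, -0.2, -0.1],
])
query = np.array([0.2, 0.4, 0.6])

print("cosine:", top_k(rows, query, k=3, metric="cosine"))           # [0, 2, 1]
print("dot (raw vectors):", top_k(rows, query, k=3, metric="dot"))   # [2, 0, 1]

# After scaling every vector to unit length, dot product equals cosine
# similarity, so both metrics produce the same ranking.
unit_rows = rows / np.linalg.norm(rows, axis=1, keepdims=True)
unit_query = query / np.linalg.norm(query)
print("dot (unit vectors):", top_k(unit_rows, unit_query, k=3, metric="dot"))  # [0, 2, 1]

print(to_vector_string(query))  # 0.2,0.4,0.6
```

A vector index replaces this exhaustive scan with an approximate search, so indexed results may occasionally differ from the exact ranking computed here.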
122 | 123 | -------------------------------------------------------------------------------- /demo/cloud_sql_demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "6d3a14d8", 6 | "metadata": {}, 7 | "source": [ 8 | "# Vector Search with Cloud SQL" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "b151275e", 14 | "metadata": {}, 15 | "source": [ 16 | "In this notebook, we'll leverage the Vector Search capabilities available in [InterSystems IRIS Cloud SQL](https://developer.intersystems.com/products/iris-cloud-sql-integratedml/). The feature works in exactly the same way as in the InterSystems IRIS 2025.1 release, but Cloud SQL requires secure connections, and this notebook illustrates how to set those up.\n", 17 | "\n", 18 | "First, please adapt the password and hostname entries in the following cell to match your Cloud SQL deployment." 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "id": "57149010", 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "username = 'SQLAdmin'\n", 29 | "password = '...'\n", 30 | "hostname = '...'\n", 31 | "port = 443 \n", 32 | "namespace = 'USER'" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "id": "d199a694", 38 | "metadata": {}, 39 | "source": [ 40 | "### Copying the certificate\n", 41 | "\n", 42 | "In order to connect securely, you'll need to point the driver at the `certificateSQLaaS.pem` file for your Cloud SQL deployment. You can download the certificate file from your deployment's detail screen. Look for the button that says \"Get X.509 certificate\". 
If you're running this notebook in a container, you can copy the certificate file into the container using the following command:\n", 43 | "\n", 44 | "```Shell\n", 45 | "docker cp ~/Downloads/certificateSQLaaS.pem iris-vector-search-jupyter-1:/usr/cert-demo/certificateSQLaaS.pem\n", 46 | "```\n", 47 | "\n", 48 | "We'll use simple DB-API commands to establish a connection in this example:" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "id": "6adf3bcf", 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "import intersystems_iris as iris\n", 59 | "import ssl\n", 60 | "\n", 61 | "# change this to wherever you copied your certificate\n", 62 | "certificateFile = \"/usr/cert-demo/certificateSQLaaS.pem\"\n", 63 | "sslcontext = ssl.create_default_context(cafile=certificateFile)\n", 64 | "\n", 65 | "connection = iris.connect(hostname, port, namespace, username, password, sslcontext=sslcontext)\n", 66 | "cursor = connection.cursor()\n", 67 | "\n", 68 | "cursor.execute(\"SELECT 'hello secure world!'\")\n", 69 | "cursor.fetchone()[0]" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "id": "1c0a4e20", 75 | "metadata": {}, 76 | "source": [ 77 | "## Vector time!\n", 78 | "\n", 79 | "Now that we have established a secure connection, let's get on to some actual vector stuff!\n", 80 | "\n", 81 | "See the neighbouring `sql_demo.ipynb` for full details on what we're trying to achieve here."
82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "id": "679c7a03", 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "import pandas as pd\n", 92 | "\n", 93 | "# Load the CSV file\n", 94 | "df = pd.read_csv('../data/scotch_review.csv')\n", 95 | "df.head()" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "id": "cd122719", 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "# Clean data\n", 106 | "# Remove the specified columns\n", 107 | "df.drop(['currency'], axis=1, inplace=True)\n", 108 | "\n", 109 | "# Drop the first column\n", 110 | "df.drop(columns=df.columns[0], inplace=True)\n", 111 | "\n", 112 | "# Remove rows without a price\n", 113 | "df.dropna(subset=['price'], inplace=True)\n", 114 | "\n", 115 | "# Ensure values in 'price' are numbers\n", 116 | "df = df[pd.to_numeric(df['price'], errors='coerce').notna()]\n", 117 | "\n", 118 | "# Replace NaN values in other columns with an empty string\n", 119 | "df.fillna('', inplace=True)\n", 120 | "\n", 121 | "df.head()" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "id": "d6bd2994", 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "from sentence_transformers import SentenceTransformer\n", 132 | "\n", 133 | "# Load a pre-trained sentence transformer model. This model's output vectors are of size 384\n", 134 | "model = SentenceTransformer('all-MiniLM-L6-v2') \n", 135 | "\n", 136 | "# Generate embeddings for all descriptions at once. 
Batch processing makes it faster\n", 137 | "embeddings = model.encode(df['description'].tolist(), normalize_embeddings=True)\n", 138 | "\n", 139 | "# Add the embeddings to the DataFrame\n", 140 | "df['description_vector'] = embeddings.tolist()\n", 141 | "\n", 142 | "df.head()" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "id": "b0b756b6", 148 | "metadata": {}, 149 | "source": [ 150 | "## And now load them into Cloud SQL\n", 151 | "\n", 152 | "We'll first create a table and then ingest all the rows from the dataframe we created earlier." 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "id": "d19c1ab7", 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "cursor.execute('DROP TABLE IF EXISTS scotch_reviews')\n", 163 | "cursor.execute(f\"\"\"CREATE TABLE scotch_reviews (\n", 164 | " name VARCHAR(255),\n", 165 | " category VARCHAR(255),\n", 166 | " review_point INT,\n", 167 | " price DOUBLE,\n", 168 | " description VARCHAR(2000),\n", 169 | " description_vector VECTOR(FLOAT, 384)\n", 170 | " )\"\"\")\n", 171 | "\n", 172 | "seq = []\n", 173 | "for index, row in df.iterrows():\n", 174 | " seq.append((row['name'], row['category'], row['review.point'], row['price'], row['description'], str(row['description_vector'])))\n", 175 | "\n", 176 | "success = cursor.executemany(\"INSERT INTO scotch_reviews (name, category, review_point, price, description, description_vector) VALUES (?, ?, ?, ?, ?, TO_VECTOR(?))\", seq)\n" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "id": "d7c8b0fa", 183 | "metadata": {}, 184 | "outputs": [], 185 | "source": [ 186 | "description_search = \"earthy and creamy taste\"\n", 187 | "search_vector = model.encode(description_search, normalize_embeddings=True).tolist() # Convert search phrase into a vector\n", 188 | "\n", 189 | "cursor.execute(\"\"\"\n", 190 | " SELECT TOP 3 * FROM scotch_reviews \n", 191 | " WHERE price < 100 \n", 192 | " 
ORDER BY VECTOR_DOT_PRODUCT(description_vector, TO_VECTOR(?)) DESC\n", 193 | " \"\"\", [str(search_vector)])\n", 194 | "\n", 195 | "print(cursor.fetchall())" 196 | ] 197 | } 198 | ], 199 | "metadata": { 200 | "kernelspec": { 201 | "display_name": "Python 3 (ipykernel)", 202 | "language": "python", 203 | "name": "python3" 204 | }, 205 | "language_info": { 206 | "codemirror_mode": { 207 | "name": "ipython", 208 | "version": 3 209 | }, 210 | "file_extension": ".py", 211 | "mimetype": "text/x-python", 212 | "name": "python", 213 | "nbconvert_exporter": "python", 214 | "pygments_lexer": "ipython3", 215 | "version": "3.10.16" 216 | } 217 | }, 218 | "nbformat": 4, 219 | "nbformat_minor": 5 220 | } 221 | -------------------------------------------------------------------------------- /demo/hybrid-search.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "a5eb6179-9308-41d3-9181-e113263239bf", 6 | "metadata": { 7 | "execution": { 8 | "iopub.execute_input": "2024-12-08T18:39:16.315978Z", 9 | "iopub.status.busy": "2024-12-08T18:39:16.315563Z", 10 | "iopub.status.idle": "2024-12-08T18:39:16.323650Z", 11 | "shell.execute_reply": "2024-12-08T18:39:16.321327Z", 12 | "shell.execute_reply.started": "2024-12-08T18:39:16.315931Z" 13 | } 14 | }, 15 | "source": [ 16 | "# Hybrid search demo\n", 17 | "\n", 18 | "Please run the `sql_demo.ipynb` notebook first to populate the `scotch_reviews` table. \n", 19 | "\n", 20 | "Alternatively, look at the bottom of this notebook for steps to manually create the table and take advantage of the new `EMBEDDING()` datatype and function.\n", 21 | "\n", 22 | "Now, let's establish a connection for use in this notebook." 
23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 1, 28 | "id": "58f2a158-d1e7-465f-ae19-7a9475241b0f", 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "import os, pandas as pd\n", 33 | "from sqlalchemy import create_engine, text\n", 34 | "\n", 35 | "username = 'demo'\n", 36 | "password = 'demo'\n", 37 | "hostname = os.getenv('IRIS_HOSTNAME', 'localhost')\n", 38 | "port = '1972' \n", 39 | "namespace = 'USER'\n", 40 | "CONNECTION_STRING = f\"iris://{username}:{password}@{hostname}:{port}/{namespace}\"\n", 41 | "\n", 42 | "engine = create_engine(CONNECTION_STRING)\n", 43 | "connection = engine.connect()" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "id": "360dbd16-777c-4a79-b7ba-f892fd2bb00a", 49 | "metadata": {}, 50 | "source": [ 51 | "## Adding the full text index\n", 52 | "\n", 53 | "Now let's create an iFind (aka [SQL Text Search](https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GSQLSRCH_txtsrch)) index on the `description` column of our scotch reviews table using the following command:\n", 54 | "```SQL\n", 55 | "CREATE INDEX ifind ON scotch_reviews(description) AS %iFind.Index.Basic\n", 56 | "```\n", 57 | "There are a number of options to refine the behaviour of the [iFind index](https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&CLASSNAME=%25iFind.Index.Basic), as well as more advanced (and more basic) index types for text indexing, but let's stick with this one for now."
58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 2, 63 | "id": "c923c186-9633-435d-912f-f6d12e3aa34c", 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "res = connection.execute(text(\"\"\"CREATE INDEX ifind ON scotch_reviews(description) AS %iFind.Index.Basic\"\"\"))" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "id": "49629218-21a0-4557-b188-34c824785ab5", 73 | "metadata": {}, 74 | "source": [ 75 | "Creating an index through DDL will automatically build it, so there's nothing extra to do here.\n", 76 | "Now we can query the index using rich full-text search, including phrase search, wildcard search, fuzzy search, and more (syntax options are described [here](https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&CLASSNAME=%25iFind.Index.Basic)):\n", 77 | "\n", 78 | "```SQL\n", 79 | "SELECT name, description FROM scotch_reviews WHERE %ID %FIND search_index(ifind, 'chocolate AND coffee');\n", 80 | "SELECT name, description FROM scotch_reviews WHERE %ID %FIND search_index(ifind, 'caramel*');\n", 81 | "SELECT name, description FROM scotch_reviews WHERE %ID %FIND search_index(ifind, 'scootish', 3); -- fuzzy search\n", 82 | "```" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "id": "091d1571-277d-48f9-82ab-e8f4c43ab9a9", 88 | "metadata": {}, 89 | "source": [ 90 | "## Creating a Hybrid Search query\n", 91 | "\n", 92 | "Next, we'll need to create a vector for our search string using the same embedding model used for populating the `description_vector` column in our table."
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", 106 | " from .autonotebook import tqdm as notebook_tqdm\n" 107 | ] 108 | } 109 | ], 110 | "source": [ 111 | "from sentence_transformers import SentenceTransformer\n", 112 | "\n", 113 | "model = SentenceTransformer('all-MiniLM-L6-v2') \n", 114 | "search_vector = model.encode(\"vanilla\", normalize_embeddings=True).tolist()" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "id": "8f25f84e-663a-4fcc-ab2c-9a3c2608ff31", 120 | "metadata": {}, 121 | "source": [ 122 | "\n", 123 | "\n", 124 | "And now we can start building our hybrid search query:" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 4, 130 | "id": "9067cbef-6bed-4c50-ac26-d20882fd96e0", 131 | "metadata": {}, 132 | "outputs": [ 133 | { 134 | "data": { 135 | "text/html": [ 136 | "
\n", 222 | "
" 223 | ], 224 | "text/plain": [ 225 | " name category \\\n", 226 | "0 Dewar’s 18 year old The Vintage, 40% Blended Scotch Whisky \n", 227 | "1 Mortlach, 1997 vintage, 57.1% Single Malt Scotch \n", 228 | "2 The Glenrothes, 1987, 43% Single Malt Scotch \n", 229 | "3 Dewar’s 12 year old The Ancestor, 40% Blended Scotch Whisky \n", 230 | "4 Isle of Jura, 16 year old, 43% Single Malt Scotch \n", 231 | "\n", 232 | " description \\\n", 233 | "0 Vanilla laced with spice, fondant icing, grapefruit peel, and lime zest leave the vanilla and floral notes lower down the pecking order. In the mouth, grapefruit and orange dominate the vanilla, yet the mouthfeel is thinner and the acidity tips toward the taste of bitter orange seeds. A lingering bitter orange finish. \n", 234 | "1 Matured in a bourbon cask. Thick and creamy, with mouth-coating vanilla, ripe barley, toasted marshmallow, vanilla wafer, key lime pie, golden delicious apple, lemongrass, and hay. The vanilla sweetness lingers to the finish, mixing with dried herbs and hay. I was expecting more from a carefully chosen Mortlach, given its pedigree, but this is still nice. (240 bottles) £250 \n", 235 | "2 Amber gold color. Rich aromas of complex fruit and vanilla. Thick and rich in body, with a mouth-coating texture. Flavors of honeyed malt, well structured fruit, and vanilla, with a long finish. \\r\\n \n", 236 | "3 A straightforward proposition of honey, vanilla sponge cake, barley notes, hints of apple, fresh banana, melon, and bundles of dry straw. It’s a sweetheart: soft vanilla fudge, heather honey, banana-topped banoffee pie, fudge, vanilla sandwich cookies, barley sugar, and lemon peel, with hardly any spice in the early phase. The finish has a snag of pepper at the end, but this is gorgeously tasty, with smooth vanilla fudge all the way. \n", 237 | "4 Antique gold. Gentle aromas of oak, caramel, and a hint of vanilla and sea breeze. Light-medium body, with a creamy texture. 
Soft, gentle flavors of vanilla, toffee, subtle fruit and brine, with a dryish oaky finish. \\r\\n \n", 238 | "\n", 239 | " IFindScore VectorScore IFindRank VectorRank \\\n", 240 | "0 0.03321233853724736787 0.468414 2 2 \n", 241 | "1 0.02927189159215022253 0.527266 7 1 \n", 242 | "2 0.03598003341535131519 0.367166 1 116 \n", 243 | "3 0.03198225192475672461 0.413522 4 28 \n", 244 | "4 0.03198225192475672461 0.412506 4 31 \n", 245 | "\n", 246 | " Score \n", 247 | "0 .1666666666666666667 \n", 248 | "1 .149732620320855615 \n", 249 | "2 .0988455988455988456 \n", 250 | "3 .0977443609022556391 \n", 251 | "4 .095818815331010453 " 252 | ] 253 | }, 254 | "execution_count": 4, 255 | "metadata": {}, 256 | "output_type": "execute_result" 257 | } 258 | ], 259 | "source": [ 260 | "sql = text(\"\"\"\n", 261 | " WITH \n", 262 | " \n", 263 | " filtered AS (\n", 264 | " SELECT %ID AS ID, * FROM scotch_reviews\n", 265 | " WHERE %ID %FIND search_index(ifind, 'vanilla')\n", 266 | " ),\n", 267 | " \n", 268 | " scored AS (\n", 269 | " SELECT name, category, description,\n", 270 | " scotchreviews_ifindrank(ID, 'vanilla') AS IFindScore,\n", 271 | " vector_cosine(description_vector, TO_VECTOR(:search_vec ,FLOAT)) AS VectorScore\n", 272 | " FROM filtered\n", 273 | " ), \n", 274 | " \n", 275 | " with_rank AS (\n", 276 | " SELECT *,\n", 277 | " RANK() OVER (ORDER BY IFindScore DESC) AS IFindRank,\n", 278 | " RANK() OVER (ORDER BY VectorScore DESC) AS VectorRank\n", 279 | " FROM scored\n", 280 | " ),\n", 281 | "\n", 282 | " -- using k = 10\n", 283 | " full_score AS (\n", 284 | " SELECT *, (1/(IFindRank + 10) + 1/(VectorRank + 10)) AS Score\n", 285 | " FROM with_rank\n", 286 | " )\n", 287 | " \n", 288 | " SELECT TOP 10 * \n", 289 | " FROM full_score \n", 290 | " ORDER BY Score desc\"\"\")\n", 291 | "\n", 292 | "# alternatively, you can filter using vector similarity search:\n", 293 | "# filtered AS (\n", 294 | "# SELECT TOP 100 %ID AS ID, * FROM scotch_reviews\n", 295 | "# ORDER BY 
vector_cosine(description_vector, TO_VECTOR(:search_vec ,FLOAT)) DESC\n", 296 | "# ),\n", 297 | "\n", 298 | "result = connection.execute(sql, { \"search_vec\": str(search_vector) }).fetchall()\n", 299 | "df = pd.DataFrame(result)\n", 300 | "pd.set_option('display.max_colwidth', None) # Easier to read description\n", 301 | "df.head()" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "id": "f8f94c22-0f18-407c-bbd1-6d75268f2e30", 307 | "metadata": {}, 308 | "source": [ 309 | "\n", 310 | "## Creating the table manually\n", 311 | "\n", 312 | "If you're of the more adventurous type, why not create everything from scratch using the most recent `EMBEDDING()` datatype and function ([documented here](https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_vecsearch#GSQL_vecsearch_insembed))?\n", 313 | "To do so, we'll first need to create an embedding configuration, which we can then refer to when creating our column. Embedding configurations are simple SQL rows in the `%Embedding.Config` table:\n", 314 | "\n", 315 | "```SQL\n", 316 | "INSERT INTO %Embedding.Config (Name, EmbeddingClass, Configuration, VectorLength, Description) \n", 317 | " VALUES ('my-sentence-transformers', '%Embedding.SentenceTransformers', '{"modelName": "all-MiniLM-L6-v2"}', 384, 'SentenceTransformers "all-MiniLM-L6-v2" model')\n", 318 | "```\n", 319 | "\n", 320 | "Now create the table and load the data:\n", 321 | "\n", 322 | "```SQL\n", 323 | "CREATE TABLE hybrid.scotch_reviews (\n", 324 | " name VARCHAR(255),\n", 325 | " category VARCHAR(255),\n", 326 | " review_point INT,\n", 327 | " price DOUBLE,\n", 328 | " currency VARCHAR(10),\n", 329 | " description VARCHAR(2000),\n", 330 | " description_emb EMBEDDING('description', 'my-sentence-transformers')\n", 331 | ")\n", 332 | "\n", 333 | "LOAD DATA FROM FILE '~/data/scotch_review.csv'\n", 334 | " COLUMNS (\n", 335 | " id INT,\n", 336 | " name VARCHAR(255),\n", 337 | " category VARCHAR(255),\n", 338 | "
review_point INT,\n", 339 | " price DOUBLE,\n", 340 | " currency VARCHAR(10),\n", 341 | " description VARCHAR(2000)\n", 342 | " )\n", 343 | " INTO hybrid.scotch_reviews (name, category, review_point, price, description)\n", 344 | " VALUES (name, category, review_point, price, description)\n", 345 | " USING { "from" : { "file" : { "header" : 1 } } } \n", 346 | "```\n", 347 | "\n", 348 | "Depending on how you mounted this demo, you may need to copy the data file into the IRIS container for the `LOAD DATA` command to work, and point the `LOAD DATA` path at the copied file (for example `/tmp/scotch_review.csv`):\n", 349 | "```Shell\n", 350 | "docker cp ./data/scotch_review.csv iris-vector-search-iris-1:/tmp/\n", 351 | "```\n", 352 | "\n", 353 | "Now we can create our iFind index. If you are on 2025.1 or later, you can also add an Approximate Nearest Neighbour (HNSW) index:\n", 354 | "```SQL\n", 355 | "CREATE INDEX ifind ON hybrid.scotch_reviews(description) AS %iFind.Index.Basic;\n", 356 | "\n", 357 | "-- requires 2025.1 or later!\n", 358 | "CREATE INDEX hnsw ON hybrid.scotch_reviews(description_emb) AS HNSW;\n", 359 | "```\n", 360 | "\n", 361 | "And now our query becomes (note the small changes in table and embedding column names):\n", 362 | "```SQL\n", 363 | "WITH \n", 364 | "\n", 365 | "filtered_text AS (\n", 366 | " SELECT %ID AS ID, * FROM hybrid.scotch_reviews\n", 367 | " WHERE %ID %FIND search_index(ifind, 'vanilla')\n", 368 | "),\n", 369 | "-- filtered_vec is an alternative candidate set: use it instead of filtered_text in scored to filter by vector similarity\n", 370 | "filtered_vec AS (\n", 371 | " SELECT TOP 100 %ID AS ID, * FROM hybrid.scotch_reviews\n", 372 | " ORDER BY vector_cosine(description_emb, EMBEDDING('vanilla')) DESC\n", 373 | "),\n", 374 | "\n", 375 | "scored AS (\n", 376 | " SELECT name, category, description,\n", 377 | " hybrid.scotchreviews_ifindrank(ID, 'vanilla') AS IFindScore,\n", 378 | " vector_cosine(description_emb, EMBEDDING('vanilla')) AS VectorScore\n", 379 | " FROM filtered_text\n", 380 | "), \n", 381 | "\n", 382 | "with_rank AS (\n", 383 | " SELECT *,\n", 384 | " RANK() OVER (ORDER BY IFindScore DESC) AS IFindRank,\n", 385 | " RANK() OVER (ORDER
BY VectorScore DESC) AS VectorRank\n", 386 | " FROM scored\n", 387 | "),\n", 388 | "\n", 389 | "-- using k = 10\n", 390 | "full_score AS (\n", 391 | " SELECT *, (1/(IFindRank + 10) + 1/(VectorRank + 10)) AS Score\n", 392 | " FROM with_rank\n", 393 | ")\n", 394 | "\n", 395 | "SELECT TOP 10 * \n", 396 | "FROM full_score \n", 397 | "ORDER BY Score desc\n", 398 | "```" 399 | ] 400 | } 401 | ], 402 | "metadata": { 403 | "kernelspec": { 404 | "display_name": "Python 3 (ipykernel)", 405 | "language": "python", 406 | "name": "python3" 407 | }, 408 | "language_info": { 409 | "codemirror_mode": { 410 | "name": "ipython", 411 | "version": 3 412 | }, 413 | "file_extension": ".py", 414 | "mimetype": "text/x-python", 415 | "name": "python", 416 | "nbconvert_exporter": "python", 417 | "pygments_lexer": "ipython3", 418 | "version": "3.10.16" 419 | } 420 | }, 421 | "nbformat": 4, 422 | "nbformat_minor": 5 423 | } 424 | -------------------------------------------------------------------------------- /demo/iris_notebook_container.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Run and connect to a containerized IRIS instance on a notebook" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "2025.1: Pulling from intersystems/iris-community\n", 20 | "Digest: sha256:0dee36e1468f37d880932ceb189aebd2ddca7b71988ef9f4059977264435de54\n", 21 | "Status: Downloaded newer image for intersystems/iris-community:2025.1\n", 22 | "docker.io/intersystems/iris-community:2025.1\n", 23 | "\u001b[1m\n", 24 | "What's next:\u001b[0m\n", 25 | " View a summary of image vulnerabilities and recommendations → \u001b[36mdocker scout quickview intersystems/iris-community:2025.1\u001b[0m\n" 26 | ] 27 | } 28 | ], 29 | "source": [ 30 | "# 
!docker pull containers.intersystems.com/intersystems/iris-community:2024.1\n", 31 | "!docker pull intersystems/iris-community:2025.1" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 2, 37 | "metadata": {}, 38 | "outputs": [ 39 | { 40 | "name": "stderr", 41 | "output_type": "stream", 42 | "text": [ 43 | "Pulling image testcontainers/ryuk:0.8.1\n", 44 | "Container started: bc3f7ff4ce10\n", 45 | "Waiting for container with image testcontainers/ryuk:0.8.1 to be ready ...\n", 46 | "Pulling image intersystems/iris-community:2025.1\n", 47 | "Container started: 17242db8f2f5\n", 48 | "Waiting for container with image intersystems/iris-community:2025.1 to be ready ...\n" 49 | ] 50 | }, 51 | { 52 | "name": "stdout", 53 | "output_type": "stream", 54 | "text": [ 55 | "res iris session iris -U %SYS '##class(%SQL.Statement).%ExecDirect(,\"CREATE DATABASE demo\")' ExecResult(exit_code=0, output=b'')\n", 56 | "res iris session iris -U %SYS '##class(Security.Users).Create(\"demo\",\"%ALL\",\"demo\")' ExecResult(exit_code=0, output=b'')\n" 57 | ] 58 | } 59 | ], 60 | "source": [ 61 | "\n", 62 | "from testcontainers.iris import IRISContainer\n", 63 | "import os\n", 64 | "\n", 65 | "image = 'intersystems/iris-community:2025.1'\n", 66 | "container = IRISContainer(image, username=\"demo\", password=\"demo\", namespace=\"demo\")\n", 67 | "container.with_exposed_ports(1972, 52773)\n", 68 | "container.start()\n", 69 | "CONNECTION_STRING = container.get_connection_url(os.getenv(\"IRIS_HOSTNAME\",\"localhost\"))\n" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 3, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "name": "stdout", 79 | "output_type": "stream", 80 | "text": [ 81 | "iris://demo:demo@localhost:57834/demo\n" 82 | ] 83 | } 84 | ], 85 | "source": [ 86 | "print(CONNECTION_STRING)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "#### Using this connection string, you can connect and use 
IRIS as a vector database via SQL, LangChain, and LlamaIndex, as shown in the other demos.\n" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": null, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [] 102 | } 103 | ], 104 | "metadata": { 105 | "kernelspec": { 106 | "display_name": "Python 3 (ipykernel)", 107 | "language": "python", 108 | "name": "python3" 109 | }, 110 | "language_info": { 111 | "codemirror_mode": { 112 | "name": "ipython", 113 | "version": 3 114 | }, 115 | "file_extension": ".py", 116 | "mimetype": "text/x-python", 117 | "name": "python", 118 | "nbconvert_exporter": "python", 119 | "pygments_lexer": "ipython3", 120 | "version": "3.10.16" 121 | } 122 | }, 123 | "nbformat": 4, 124 | "nbformat_minor": 4 125 | } 126 | -------------------------------------------------------------------------------- /demo/langchain_demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Using InterSystems Vector Search with LangChain\n", 8 | "\n", 9 | "In this notebook, we'll leverage the Vector Search capabilities available in [InterSystems IRIS 2025.1](https://www.intersystems.com/news/iris-vector-search-support-ai-applications/) and [InterSystems IRIS Cloud SQL](https://developer.intersystems.com/products/iris-cloud-sql-integratedml/), using the well-known [LangChain](https://www.langchain.com/) framework." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## Setting up the connection\n", 17 | "\n", 18 | "First, let's make sure we set up the connection to your InterSystems IRIS instance or Cloud SQL deployment. When targeting a Cloud SQL deployment, change the username to `SQLAdmin`, set the password to the one you chose when creating the deployment, and set the port to 443. 
\n" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 1, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "import os\n", 28 | "\n", 29 | "username = 'demo'\n", 30 | "password = 'demo'\n", 31 | "hostname = os.getenv('IRIS_HOSTNAME', 'localhost')\n", 32 | "port = 1972 \n", 33 | "namespace = 'USER'" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "### Securing the connection\n", 41 | "\n", 42 | "If the target you're connecting to requires secure connections, as is the case for Cloud SQL deployments, we need to supply a certificate and some additional settings to the driver. For Cloud SQL, you can download the certificate file from your deployment's details screen. Look for the button that says \"Get X.509 certificate\", and copy it into a local folder, such as `/usr/cert-demo/`. If you're running this notebook in a container, you can copy the certificate file into the container using the following command:\n", 43 | "\n", 44 | "```Shell\n", 45 | "docker cp ~/Downloads/certificateSQLaaS.pem iris-vector-search-jupyter-1:/usr/cert-demo/certificateSQLaaS.pem\n", 46 | "```\n", 47 | "\n", 48 | "Remember to also set the port to 443 in the cell above." 
49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "metadata": {}, 55 | "outputs": [ 56 | { 57 | "name": "stdout", 58 | "output_type": "stream", 59 | "text": [ 60 | "No certificate file found, continuing with insecure connection\n" 61 | ] 62 | } 63 | ], 64 | "source": [ 65 | "import ssl\n", 66 | "\n", 67 | "certificateFile = \"/usr/cert-demo/certificateSQLaaS.pem\"\n", 68 | "\n", 69 | "if (os.path.exists(certificateFile)):\n", 70 | " print(f\"Located SSL certificate at '{certificateFile}', initializing SSL configuration\")\n", 71 | " sslcontext = ssl.create_default_context(cafile=certificateFile)\n", 72 | "else:\n", 73 | " print(\"No certificate file found, continuing with insecure connection\")\n", 74 | " sslcontext = None" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 3, 80 | "metadata": {}, 81 | "outputs": [ 82 | { 83 | "name": "stdout", 84 | "output_type": "stream", 85 | "text": [ 86 | "hello world!\n" 87 | ] 88 | } 89 | ], 90 | "source": [ 91 | "from sqlalchemy import create_engine, text\n", 92 | "\n", 93 | "url = f\"iris://{username}:{password}@{hostname}:{port}/{namespace}\"\n", 94 | "engine = create_engine(url, connect_args={\"sslcontext\": sslcontext})\n", 95 | "with engine.connect() as conn:\n", 96 | " print(conn.execute(text(\"SELECT 'hello world!'\")).first()[0])" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "## Creating vectors using LangChain\n", 104 | "\n", 105 | "The following cell will load the `state_of_the_union.txt` file from the `/data/` directory and split it into chunks that are ready for translation into vectors, using standard LangChain components."
106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 4, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "from langchain.docstore.document import Document\n", 115 | "from langchain.document_loaders import TextLoader\n", 116 | "from langchain.text_splitter import CharacterTextSplitter\n", 117 | "\n", 118 | "loader = TextLoader(\"../data/state_of_the_union.txt\", encoding='utf-8')\n", 119 | "documents = loader.load()\n", 120 | "text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=20)\n", 121 | "docs = text_splitter.split_documents(documents)" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "### Setting up your OpenAI API key\n", 129 | "\n", 130 | "If you have an OpenAI subscription, use the following cell to pick up your OpenAI API key, and use `OpenAIEmbeddings()` in the cells below. \n", 131 | "\n", 132 | "Alternatively, you can skip this step and use a local embeddings model from the installed libraries, such as `HuggingFaceEmbeddings()`, `FastEmbedEmbeddings()`, or `FakeEmbeddings()` (for testing purposes!). Just comment / uncomment the corresponding lines in the cells further down the notebook."
133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 5, 138 | "metadata": {}, 139 | "outputs": [ 140 | { 141 | "name": "stdin", 142 | "output_type": "stream", 143 | "text": [ 144 | "OpenAI API Key: ········\n" 145 | ] 146 | } 147 | ], 148 | "source": [ 149 | "import getpass\n", 150 | "import os\n", 151 | "from dotenv import load_dotenv\n", 152 | "\n", 153 | "load_dotenv(override=True)\n", 154 | "\n", 155 | "# Prompt only if no usable key was loaded from the environment or a .env file\n", 156 | "api_key = os.environ.get(\"OPENAI_API_KEY\")\n", 157 | "\n", 158 | "if (not api_key) or (api_key == 'your-key-goes-here'): \n", 159 | " os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "In this cell, we put all the pieces together: we create embeddings for our document chunks and store them as a collection in our IRIS database." 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 6, 172 | "metadata": {}, 173 | "outputs": [ 174 | { 175 | "name": "stderr", 176 | "output_type": "stream", 177 | "text": [ 178 | "/var/folders/vl/q0pvzbmx0y793pp4gl7pm25sffw2vg/T/ipykernel_70467/1414313187.py:9: LangChainDeprecationWarning: The class `OpenAIEmbeddings` was deprecated in LangChain 0.0.9 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-openai package and should be used instead.
To use it run `pip install -U :class:`~langchain-openai` and import as `from :class:`~langchain_openai import OpenAIEmbeddings``.\n", 179 | " embedding = OpenAIEmbeddings(),\n" 180 | ] 181 | }, 182 | { 183 | "name": "stdout", 184 | "output_type": "stream", 185 | "text": [ 186 | "Number of docs in vector store: 114\n" 187 | ] 188 | } 189 | ], 190 | "source": [ 191 | "from langchain.embeddings.openai import OpenAIEmbeddings\n", 192 | "from langchain.embeddings import HuggingFaceEmbeddings\n", 193 | "from langchain.embeddings import FakeEmbeddings\n", 194 | "from langchain.embeddings.fastembed import FastEmbedEmbeddings\n", 195 | "\n", 196 | "from langchain_iris import IRISVector\n", 197 | "\n", 198 | "db = IRISVector.from_documents(\n", 199 | " embedding = OpenAIEmbeddings(), \n", 200 | " # embedding = HuggingFaceEmbeddings(model_name=\"all-MiniLM-L6-v2\"),\n", 201 | " # embedding = FastEmbedEmbeddings(),\n", 202 | " # embedding = FakeEmbeddings(size=123),\n", 203 | " documents = docs,\n", 204 | " collection_name = \"state_of_the_union_test\",\n", 205 | " connection_string = f\"iris://{username}:{password}@{hostname}:{port}/{namespace}\",\n", 206 | " engine_args = { \"connect_args\": {\"sslcontext\": sslcontext} }\n", 207 | ")\n", 208 | "\n", 209 | "print(f\"Number of docs in vector store: {len(db.get()['ids'])}\")" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "Next, we'll use LangChain's similarity search API to retrieve documents from our collection that match a free-text query." 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": 7, 222 | "metadata": {}, 223 | "outputs": [ 224 | { 225 | "name": "stdout", 226 | "output_type": "stream", 227 | "text": [ 228 | "--------------------------------------------------------------------------------\n", 229 | "Score: 0.180300940505385\n", 230 | "And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. 
\n", 231 | "\n", 232 | "We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n", 233 | "\n", 234 | "We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.\n", 235 | "--------------------------------------------------------------------------------\n", 236 | "--------------------------------------------------------------------------------\n", 237 | "Score: 0.207073243024113\n", 238 | "So let’s not abandon our streets. Or choose between safety and equal justice. \n", 239 | "\n", 240 | "Let’s come together to protect our communities, restore trust, and hold law enforcement accountable. \n", 241 | "\n", 242 | "That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.\n", 243 | "--------------------------------------------------------------------------------\n", 244 | "--------------------------------------------------------------------------------\n", 245 | "Score: 0.208734754649556\n", 246 | "There is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery. \n", 247 | "\n", 248 | "Get rid of outdated rules that stop doctors from prescribing treatments. And stop the flow of illicit drugs by working with state and local law enforcement to go after traffickers.\n", 249 | "--------------------------------------------------------------------------------\n", 250 | "--------------------------------------------------------------------------------\n", 251 | "Score: 0.211442890123872\n", 252 | "We are choking off Russia’s access to technology that will sap its economic strength and weaken its military for years to come. \n", 253 | "\n", 254 | "Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. \n", 255 | "\n", 256 | "The U.S. 
Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs.\n", 257 | "--------------------------------------------------------------------------------\n" 258 | ] 259 | } 260 | ], 261 | "source": [ 262 | "query = \"Joint patrols to catch traffickers\"\n", 263 | "docs_with_score = db.similarity_search_with_score(query)\n", 264 | "\n", 265 | "for doc, score in docs_with_score:\n", 266 | " print(\"-\" * 80)\n", 267 | " print(\"Score: \", score)\n", 268 | " print(doc.page_content)\n", 269 | " print(\"-\" * 80)" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 8, 275 | "metadata": {}, 276 | "outputs": [ 277 | { 278 | "data": { 279 | "text/plain": [ 280 | "(Document(metadata={}, page_content='dog'), 0.0)" 281 | ] 282 | }, 283 | "execution_count": 8, 284 | "metadata": {}, 285 | "output_type": "execute_result" 286 | } 287 | ], 288 | "source": [ 289 | "db.add_documents([Document(page_content=\"dog\")])\n", 290 | "docs_with_score = db.similarity_search_with_score(\"dog\")\n", 291 | "docs_with_score[0]" 292 | ] 293 | } 294 | ], 295 | "metadata": { 296 | "kernelspec": { 297 | "display_name": "Python 3 (ipykernel)", 298 | "language": "python", 299 | "name": "python3" 300 | }, 301 | "language_info": { 302 | "codemirror_mode": { 303 | "name": "ipython", 304 | "version": 3 305 | }, 306 | "file_extension": ".py", 307 | "mimetype": "text/x-python", 308 | "name": "python", 309 | "nbconvert_exporter": "python", 310 | "pygments_lexer": "ipython3", 311 | "version": "3.10.16" 312 | } 313 | }, 314 | "nbformat": 4, 315 | "nbformat_minor": 4 316 | } 317 | -------------------------------------------------------------------------------- /demo/llama_demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Using InterSystems Vector Search with LlamaIndex\n", 8 | "\n", 9 | "In this notebook, 
we'll leverage the Vector Search capabilities available in [InterSystems IRIS 2025.1](https://www.intersystems.com/news/iris-vector-search-support-ai-applications/) and [InterSystems IRIS Cloud SQL](https://developer.intersystems.com/products/iris-cloud-sql-integratedml/), using the well-known [LlamaIndex](https://www.llamaindex.ai/) framework." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## Setting up the connection\n", 17 | "\n", 18 | "First, let's make sure we set up the connection to your InterSystems IRIS instance or Cloud SQL deployment. When targeting a Cloud SQL deployment, change the username to `SQLAdmin`, set the password to the one you chose when creating the deployment, and set the port to 443. \n" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 1, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "import os\n", 28 | "\n", 29 | "username = 'demo'\n", 30 | "password = 'demo'\n", 31 | "hostname = os.getenv('IRIS_HOSTNAME', 'localhost')\n", 32 | "\n", 33 | "port = 1972 \n", 34 | "namespace = 'USER'" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "### Securing the connection\n", 42 | "\n", 43 | "If the target you're connecting to requires secure connections, as is the case for Cloud SQL deployments, we need to supply a certificate and some additional settings to the driver. For Cloud SQL, you can download the certificate file from your deployment's details screen. Look for the button that says \"Get X.509 certificate\", and copy it into a local folder, such as `/usr/cert-demo/`. 
If you're running this notebook in a container, you can copy the certificate file into the container using the following command:\n", 44 | "\n", 45 | "```Shell\n", 46 | "docker cp ~/Downloads/certificateSQLaaS.pem iris-vector-search-jupyter-1:/usr/cert-demo/certificateSQLaaS.pem\n", 47 | "```\n", 48 | "\n", 49 | "Remember to also set the port to 443 in the cell above." 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 2, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "name": "stdout", 59 | "output_type": "stream", 60 | "text": [ 61 | "No certificate file found, continuing with insecure connection\n" 62 | ] 63 | } 64 | ], 65 | "source": [ 66 | "import ssl\n", 67 | "\n", 68 | "certificateFile = \"/usr/cert-demo/certificateSQLaaS.pem\"\n", 69 | "\n", 70 | "if (os.path.exists(certificateFile)):\n", 71 | " print(f\"Located SSL certificate at '{certificateFile}', initializing SSL configuration\")\n", 72 | " sslcontext = ssl.create_default_context(cafile=certificateFile)\n", 73 | "else:\n", 74 | " print(\"No certificate file found, continuing with insecure connection\")\n", 75 | " sslcontext = None" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": {}, 82 | "outputs": [ 83 | { 84 | "name": "stdout", 85 | "output_type": "stream", 86 | "text": [ 87 | "hello world!\n" 88 | ] 89 | } 90 | ], 91 | "source": [ 92 | "from sqlalchemy import create_engine, text\n", 93 | "\n", 94 | "url = f\"iris://{username}:{password}@{hostname}:{port}/{namespace}\"\n", 95 | "\n", 96 | "engine = create_engine(url, connect_args={\"sslcontext\": sslcontext})\n", 97 | "with engine.connect() as conn:\n", 98 | " print(conn.execute(text(\"SELECT 'hello world!'\")).first()[0])" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "## Creating Vectors using LlamaIndex\n", 106 | "\n", 107 | "In the following cell we'll leverage standard LlamaIndex components to read the files in the `
`../data/paul_graham/` directory and prepare them for creating embeddings." 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 4, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | "name": "stdout", 117 | "output_type": "stream", 118 | "text": [ 119 | "First Document ID: 6057efff-2d5f-4331-9b90-cddc73a284bf\n" 120 | ] 121 | } 122 | ], 123 | "source": [ 124 | "# Older llama_index versions: from llama_index import SimpleDirectoryReader\n", 125 | "from llama_index.legacy import SimpleDirectoryReader\n", 126 | "\n", 127 | "documents = SimpleDirectoryReader(\"../data/paul_graham\").load_data()\n", 128 | "print(\"First Document ID:\", documents[0].doc_id)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "### Setting up your OpenAI API key\n", 136 | "\n", 137 | "If you have an OpenAI subscription, use the following cell to provide your OpenAI API key; LlamaIndex uses OpenAI embeddings by default. \n", 138 | "\n", 139 | "Alternatively, you can skip this step and use a local embedding model available through the installed libraries, such as `HuggingFaceEmbeddings()`, `FastEmbedEmbeddings()`, or `FakeEmbeddings()` (for testing purposes only). Comment or uncomment the corresponding lines in the cells further down the notebook."
 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 6, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "name": "stdin", 149 | "output_type": "stream", 150 | "text": [ 151 | "OpenAI API Key: ········\n" 152 | ] 153 | } 154 | ], 155 | "source": [ 156 | "import getpass\n", 157 | "import os\n", 158 | "from dotenv import load_dotenv\n", 159 | "\n", 160 | "load_dotenv(override=True)\n", 161 | "\n", 162 | "if os.environ.get(\"OPENAI_API_KEY\", \"\") in (\"\", \"your-key-goes-here\", \"PUT--YOUR--KEY--HERE\"):\n", 163 | " os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "The next cell switches to a local embedding model (`all-MiniLM-L6-v2`) when no OpenAI key is set. LlamaIndex's default is to use OpenAI, so if you have a key (or skip the following cell), LlamaIndex continues with OpenAI. If you use the local model, also set `embed_dim = 384` in the vector store cell below, since that model produces 384-dimensional vectors." 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 9, 176 | "metadata": {}, 177 | "outputs": [], 178 | "source": [ 179 | "from llama_index.legacy import ServiceContext, set_global_service_context\n", 180 | "from llama_index.embeddings.langchain import LangchainEmbedding\n", 181 | "from langchain.embeddings import HuggingFaceEmbeddings\n", 182 | "from langchain.embeddings import FakeEmbeddings\n", 183 | "\n", 184 | "if os.environ.get(\"OPENAI_API_KEY\", \"\") in (\"\", \"your-key-goes-here\", \"PUT--YOUR--KEY--HERE\"):\n", 185 | " lc_embed_model = HuggingFaceEmbeddings(model_name=\"all-MiniLM-L6-v2\")\n", 186 | " # lc_embed_model = FakeEmbeddings(size=1536)\n", 187 | "\n", 188 | " # ServiceContext captures how vectors will be generated\n", 189 | " service_context = ServiceContext.from_defaults(\n", 190 | " embed_model=LangchainEmbedding(lc_embed_model)\n", 191 | " )\n", 192 | "\n", 193 | " set_global_service_context(service_context)" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198
 "metadata": {}, 199 | "source": [ 200 | "Now we'll configure the VectorStore object that will be used to save our vectors in IRIS." 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 10, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "from llama_index.legacy import StorageContext\n", 210 | "from llama_index.legacy.indices.vector_store import VectorStoreIndex\n", 211 | "from llama_iris import IRISVectorStore\n", 212 | "\n", 213 | "# StorageContext captures how vectors will be stored\n", 214 | "vector_store = IRISVectorStore.from_params(\n", 215 | " connection_string = url,\n", 216 | " table_name = \"paul_graham_essay\",\n", 217 | " embed_dim = 1536, # OpenAI embedding dimensionality\n", 218 | " # embed_dim = 384, # HuggingFace all-MiniLM-L6-v2 dimensionality\n", 219 | " engine_args = { \"connect_args\": {\"sslcontext\": sslcontext} }\n", 220 | ")\n", 221 | "storage_context = StorageContext.from_defaults(vector_store=vector_store)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": {}, 227 | "source": [ 228 | "Now, putting it all together, we feed the documents to our vector store." 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 11, 234 | "metadata": {}, 235 | "outputs": [ 236 | { 237 | "name": "stderr", 238 | "output_type": "stream", 239 | "text": [ 240 | "/opt/anaconda3/envs/iris-vector-search/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", 241 | " from .autonotebook import tqdm as notebook_tqdm\n", 242 | "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 34.41it/s]\n", 243 | "Generating embeddings: 100%|█| 22/22 [00:01<00:00, 13.30it/s\n" 244 | ] 245 | } 246 | ], 247 | "source": [ 248 | "index = VectorStoreIndex.from_documents(\n", 249 | " documents, \n", 250 | " storage_context=storage_context, \n", 251 | " show_progress=True, \n", 252 | ")\n", 253 | "query_engine = index.as_query_engine()" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 12, 259 | "metadata": {}, 260 | "outputs": [], 261 | "source": [ 262 | "# If reconnecting to an existing vector store (e.g. in a new session), use this instead of the cell above:\n", 263 | "\n", 264 | "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)\n", 265 | "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n", 266 | "query_engine = index.as_query_engine()\n", 267 | "\n", 268 | "# Adding documents to an existing index (this inserts them again; skip it if they are already stored)\n", 269 | "\n", 270 | "for d in documents:\n", 271 | " index.insert(document=d, storage_context=storage_context)" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 13, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [ 280 | "response = query_engine.query(\"What did the author do?\")" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 14, 286 | "metadata": {}, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "The author wrote essays and also started to think about other things they could work on.\n" 293 | ] 294 | } 295 | ], 296 | "source": [ 297 | "import textwrap\n", 298 | "print(textwrap.fill(str(response), 100))" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": 15, 304 | "metadata": {}, 305 | "outputs": [ 306 | { 307 | "name": "stdout", 308 | "output_type": "stream", 309 | "text": [ 310 | "AI was in the air in the mid 
1980s.\n" 311 | ] 312 | } 313 | ], 314 | "source": [ 315 | "response = query_engine.query(\"What happened in the mid 1980s?\")\n", 316 | "print(textwrap.fill(str(response), 100))" 317 | ] 318 | } 319 | ], 320 | "metadata": { 321 | "kernelspec": { 322 | "display_name": "Python 3 (ipykernel)", 323 | "language": "python", 324 | "name": "python3" 325 | }, 326 | "language_info": { 327 | "codemirror_mode": { 328 | "name": "ipython", 329 | "version": 3 330 | }, 331 | "file_extension": ".py", 332 | "mimetype": "text/x-python", 333 | "name": "python", 334 | "nbconvert_exporter": "python", 335 | "pygments_lexer": "ipython3", 336 | "version": "3.10.16" 337 | } 338 | }, 339 | "nbformat": 4, 340 | "nbformat_minor": 4 341 | } 342 | -------------------------------------------------------------------------------- /demo/sql_demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Vector Search with IRIS SQL\n", 8 | "This tutorial covers how to use IRIS as a vector database. \n", 9 | "\n", 10 | "For this tutorial, we will use a dataset of 2.2k online reviews of scotch\n", 11 | "(dataset from https://www.kaggle.com/datasets/koki25ando/22000-scotch-whisky-reviews). With the vector database functionality in IRIS, we can use current embedding models to run semantic search on the online reviews of scotch whiskeys. In addition, we'll be able to apply filters on columns with structured data. For example, we will be able to search for whiskeys that are priced under $100 and described as 'earthy, smooth, and easy to drink'. Let's find our perfect whiskey!" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "The first step is to import the libraries we need and establish a connection to InterSystems IRIS."
 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 23, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "import os, pandas as pd\n", 28 | "from sentence_transformers import SentenceTransformer\n", 29 | "from sqlalchemy import create_engine, text\n", 30 | "\n", 31 | "username = 'demo'\n", 32 | "password = 'demo'\n", 33 | "hostname = os.getenv('IRIS_HOSTNAME', 'localhost')\n", 34 | "port = '1972' \n", 35 | "namespace = 'USER'\n", 36 | "\n", 37 | "# Alternative connection settings (uncomment and adjust if your instance uses a different port):\n", 38 | "# username = 'demo'\n", 39 | "# password = 'demo'\n", 40 | "# hostname = os.getenv('IRIS_HOSTNAME', 'localhost')\n", 41 | "# port = '63958' \n", 42 | "# namespace = 'USER'\n", 43 | "\n", 44 | "CONNECTION_STRING = f\"iris://{username}:{password}@{hostname}:{port}/{namespace}\"\n", 45 | "# e.g. iris://demo:demo@localhost:1972/USER\n", 46 | "engine = create_engine(CONNECTION_STRING)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "## Exploring the dataset\n", 54 | "\n", 55 | "Let's take a look at the data in our CSV file with whiskey reviews." 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 24, 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "data": { 65 | "text/html": [ 66 | "
" 147 | ], 148 | "text/plain": [ 149 | " Unnamed: 0 name \\\n", 150 | "0 1 Johnnie Walker Blue Label, 40% \n", 151 | "1 2 Black Bowmore, 1964 vintage, 42 year old, 40.5% \n", 152 | "2 3 Bowmore 46 year old (distilled 1964), 42.9% \n", 153 | "3 4 Compass Box The General, 53.4% \n", 154 | "4 5 Chivas Regal Ultis, 40% \n", 155 | "\n", 156 | " category review.point price currency \\\n", 157 | "0 Blended Scotch Whisky 97 225 $ \n", 158 | "1 Single Malt Scotch 97 4500.00 $ \n", 159 | "2 Single Malt Scotch 97 13500.00 $ \n", 160 | "3 Blended Malt Scotch Whisky 96 325 $ \n", 161 | "4 Blended Malt Scotch Whisky 96 160 $ \n", 162 | "\n", 163 | " description \n", 164 | "0 Magnificently powerful and intense. Caramels, ... \n", 165 | "1 What impresses me most is how this whisky evol... \n", 166 | "2 There have been some legendary Bowmores from t... \n", 167 | "3 With a name inspired by a 1926 Buster Keaton m... \n", 168 | "4 Captivating, enticing, and wonderfully charmin... " 169 | ] 170 | }, 171 | "execution_count": 24, 172 | "metadata": {}, 173 | "output_type": "execute_result" 174 | } 175 | ], 176 | "source": [ 177 | "# Load the CSV file\n", 178 | "df = pd.read_csv('../data/scotch_review.csv')\n", 179 | "df.head()" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "Now we'll reorganize the data a little with pandas functions to make it more practical to store in a table." 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 25, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "data": { 196 | "text/html": [ 197 | "
" 266 | ], 267 | "text/plain": [ 268 | " name \\\n", 269 | "0 Johnnie Walker Blue Label, 40% \n", 270 | "1 Black Bowmore, 1964 vintage, 42 year old, 40.5% \n", 271 | "2 Bowmore 46 year old (distilled 1964), 42.9% \n", 272 | "3 Compass Box The General, 53.4% \n", 273 | "4 Chivas Regal Ultis, 40% \n", 274 | "\n", 275 | " category review.point price \\\n", 276 | "0 Blended Scotch Whisky 97 225 \n", 277 | "1 Single Malt Scotch 97 4500.00 \n", 278 | "2 Single Malt Scotch 97 13500.00 \n", 279 | "3 Blended Malt Scotch Whisky 96 325 \n", 280 | "4 Blended Malt Scotch Whisky 96 160 \n", 281 | "\n", 282 | " description \n", 283 | "0 Magnificently powerful and intense. Caramels, ... \n", 284 | "1 What impresses me most is how this whisky evol... \n", 285 | "2 There have been some legendary Bowmores from t... \n", 286 | "3 With a name inspired by a 1926 Buster Keaton m... \n", 287 | "4 Captivating, enticing, and wonderfully charmin... " 288 | ] 289 | }, 290 | "execution_count": 25, 291 | "metadata": {}, 292 | "output_type": "execute_result" 293 | } 294 | ], 295 | "source": [ 296 | "# Clean data\n", 297 | "# Remove the 'currency' column\n", 298 | "df.drop(['currency'], axis=1, inplace=True)\n", 299 | "\n", 300 | "# Drop the first column (an unnamed index column from the CSV)\n", 301 | "df.drop(columns=df.columns[0], inplace=True)\n", 302 | "\n", 303 | "# Remove rows without a price\n", 304 | "df.dropna(subset=['price'], inplace=True)\n", 305 | "\n", 306 | "# Keep only rows whose 'price' parses as a number\n", 307 | "df = df[pd.to_numeric(df['price'], errors='coerce').notna()].copy()\n", 308 | "\n", 309 | "# Replace NaN values in other columns with an empty string\n", 310 | "df.fillna('', inplace=True)\n", 311 | "\n", 312 | "df.head()" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "## Creating the table in IRIS SQL\n", 320 | "\n", 321 | "Now, InterSystems IRIS supports vectors as a datatype in tables! Here, we create a table with a few different columns. 
The last column, `description_vector` of type `VECTOR(FLOAT, 384)`, stores the vectors generated by passing the `description` of a review through an embedding model. The `FLOAT` option here is new in 2024.3, and 384 is the number of dimensions the chosen embedding model produces." 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 26, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "with engine.connect() as conn:\n", 331 | " with conn.begin():\n", 332 | " sql = \"\"\"\n", 333 | " CREATE TABLE IF NOT EXISTS scotch_reviews (\n", 334 | " name VARCHAR(255),\n", 335 | " category VARCHAR(255),\n", 336 | " review_point INT,\n", 337 | " price DOUBLE,\n", 338 | " description VARCHAR(2000),\n", 339 | " description_vector VECTOR(FLOAT, 384)\n", 340 | " )\n", 341 | " \"\"\"\n", 342 | " result = conn.execute(text(sql))" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "## Creating the embeddings\n", 350 | "\n", 351 | "Next, we'll create the embeddings for the `description` column. In IRIS 2024.3, you can leave this work to IRIS by using the new [`EMBEDDING` datatype](https://docs.intersystems.com/iris20243/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_vecsearch#GSQL_vecsearch_insembed), but here we'll create them in Python with a common Sentence Transformer model." 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 27, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "# Load a pre-trained sentence transformer model. This model's output vectors have 384 dimensions\n", 361 | "model = SentenceTransformer('all-MiniLM-L6-v2') " 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 28, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/html": [ 372 | "
" 447 | ], 448 | "text/plain": [ 449 | " name \\\n", 450 | "0 Johnnie Walker Blue Label, 40% \n", 451 | "1 Black Bowmore, 1964 vintage, 42 year old, 40.5% \n", 452 | "2 Bowmore 46 year old (distilled 1964), 42.9% \n", 453 | "3 Compass Box The General, 53.4% \n", 454 | "4 Chivas Regal Ultis, 40% \n", 455 | "\n", 456 | " category review.point price \\\n", 457 | "0 Blended Scotch Whisky 97 225 \n", 458 | "1 Single Malt Scotch 97 4500.00 \n", 459 | "2 Single Malt Scotch 97 13500.00 \n", 460 | "3 Blended Malt Scotch Whisky 96 325 \n", 461 | "4 Blended Malt Scotch Whisky 96 160 \n", 462 | "\n", 463 | " description \\\n", 464 | "0 Magnificently powerful and intense. Caramels, ... \n", 465 | "1 What impresses me most is how this whisky evol... \n", 466 | "2 There have been some legendary Bowmores from t... \n", 467 | "3 With a name inspired by a 1926 Buster Keaton m... \n", 468 | "4 Captivating, enticing, and wonderfully charmin... \n", 469 | "\n", 470 | " description_vector \n", 471 | "0 [-0.010494445450603962, 0.014728965237736702, ... \n", 472 | "1 [0.02318125031888485, -0.05123035982251167, 0.... \n", 473 | "2 [0.04333321005105972, -0.017066635191440582, -... \n", 474 | "3 [-0.07594005018472672, -0.03676239028573036, 0... \n", 475 | "4 [-0.012818857096135616, -0.09769789129495621, ... 
" 476 | ] 477 | }, 478 | "execution_count": 28, 479 | "metadata": {}, 480 | "output_type": "execute_result" 481 | } 482 | ], 483 | "source": [ 484 | "# Generate embeddings for all descriptions at once.\n", 485 | "# Encoding in one batch is faster than encoding row by row, but this step may still take a minute\n", 486 | "embeddings = model.encode(df['description'].tolist(), normalize_embeddings=True)\n", 487 | "\n", 488 | "# Add the embeddings to the DataFrame\n", 489 | "df['description_vector'] = embeddings.tolist()\n", 490 | "\n", 491 | "df.head()" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "Now we'll load the data into our table. Note the `str()` call: we pass each vector to `TO_VECTOR()` as a string of comma-separated values, because the DB-API driver standard has no dedicated vector datatype." 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": 29, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "with engine.connect() as conn:\n", 508 | " with conn.begin():\n", 509 | " for _, row in df.iterrows():\n", 510 | " sql = text(\"\"\"\n", 511 | " INSERT INTO scotch_reviews \n", 512 | " (name, category, review_point, price, description, description_vector) \n", 513 | " VALUES (:name, :category, :review_point, :price, :description, TO_VECTOR(:description_vector))\n", 514 | " \"\"\")\n", 515 | " conn.execute(sql, {\n", 516 | " 'name': row['name'], \n", 517 | " 'category': row['category'], \n", 518 | " 'review_point': row['review.point'], \n", 519 | " 'price': row['price'], \n", 520 | " 'description': row['description'], \n", 521 | " 'description_vector': str(row['description_vector'])\n", 522 | " })\n" 523 | ] 524 | }, 525 | { 526 | "cell_type": "markdown", 527 | "metadata": {}, 528 | "source": [ 529 | "## Running a few queries\n", 530 | "\n", 531 | "Let's look for a scotch that costs less than $100 and has an earthy and creamy taste." 
532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": 30, 537 | "metadata": {}, 538 | "outputs": [], 539 | "source": [ 540 | "description_search = \"earthy and creamy taste\"\n", 541 | "search_vector = model.encode(description_search, normalize_embeddings=True).tolist() # Convert search phrase into a vector" 542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": 31, 547 | "metadata": {}, 548 | "outputs": [ 549 | { 550 | "name": "stdout", 551 | "output_type": "stream", 552 | "text": [ 553 | "[('Signatory (distilled at Bowmore), 16 year old, 1988 vintage, cask #42508, 46%', 'Single Malt Scotch', 87, 60.0, 'Medium-bodied and nicely textured. Good balance of flavors -- and well-integrated, too -- with lovely sweet notes (cereal grain, cookie dough, carame ... (48 characters truncated) ... fishnets, and brine that is complementary, but not aggressive, with a suggestion of lavender and tangerine. Balanced finish. (332 bottles produced.)', '-.048620376735925674438,-.082065843045711517333,.039660684764385223388,-.018970852717757225036,-.017485298216342926026,.042453121393918991088,.046325 ... (8848 characters truncated) ... 064819,-.0038620312698185443878,-.022344633936882019042,.052769336849451065063,-.061306387186050415039,.048756919801235198974,-.063436612486839294433'), ('Shieldaig 12 year old, 40%', 'Blended Scotch Whisky', 85, 31.0, 'This is a sharp dresser, with a firm, solid mouthfeel and an altogether finer and more focused taste than Shieldaig Classic (see\\r\\nbelow). It’s not ... (114 characters truncated) ... e, and some soft fruit, including a touch of overripe banana and melon notes. The savoriness this time comes from a touch of pepper rather than salt.', '-.0049302759580314159393,-.070051722228527069091,.046160325407981872558,.053877647966146469116,.0037386598996818065643,.018159903585910797119,.076887 ... (8828 characters truncated) ... 
3170166,-.0015221295179799199104,.047901224344968795776,.0098907267674803733826,-.026278590783476829528,.042504664510488510131,.041063331067562103271'), ('The Arran Malt, Single Bourbon Cask, (Cask#1801), 1996 Vintage, 50.5%', 'Single Malt Scotch', 86, 80.0, 'Fresh and clean, with notes of vanilla, ripe barley, honey, caramel apple, and toasted coconut. Creamy and mouth-coating in texture, leading to a pleasingly dry, spicy oak finish. Very drinkable, yet satisfying. Quite nice. \\r\\n', '-.0010089599527418613433,-.050370443612337112426,.046052008867263793946,.074557252228260040283,-.0048394058831036090851,.039374433457851409912,.02021 ... (8848 characters truncated) ... 5349731,.0031521078199148178101,-.0083352373912930488586,.10131823271512985229,-.021709911525249481201,.037876114249229431152,.0095796724781394004821')]\n" 554 | ] 555 | } 556 | ], 557 | "source": [ 558 | "with engine.connect() as conn:\n", 559 | " with conn.begin():\n", 560 | " sql = text(\"\"\"\n", 561 | " SELECT TOP 3 * FROM scotch_reviews \n", 562 | " WHERE price < 100 \n", 563 | " ORDER BY VECTOR_DOT_PRODUCT(description_vector, TO_VECTOR(:search_vector)) DESC\n", 564 | " \"\"\")\n", 565 | "\n", 566 | " results = conn.execute(sql, {'search_vector': str(search_vector)}).fetchall()\n", 567 | "\n", 568 | "print(results)" 569 | ] 570 | }, 571 | { 572 | "cell_type": "markdown", 573 | "metadata": {}, 574 | "source": [ 575 | "Let's print that result a little more nicely!" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": 32, 581 | "metadata": {}, 582 | "outputs": [ 583 | { 584 | "data": { 585 | "text/html": [ 586 | "
" 639 | ], 640 | "text/plain": [ 641 | " name \\\n", 642 | "0 Signatory (distilled at Bowmore), 16 year old, 1988 vintage, cask #42508, 46% \n", 643 | "1 Shieldaig 12 year old, 40% \n", 644 | "2 The Arran Malt, Single Bourbon Cask, (Cask#1801), 1996 Vintage, 50.5% \n", 645 | "\n", 646 | " category review.point price \\\n", 647 | "0 Single Malt Scotch 87 60.0 \n", 648 | "1 Blended Scotch Whisky 85 31.0 \n", 649 | "2 Single Malt Scotch 86 80.0 \n", 650 | "\n", 651 | " description \n", 652 | "0 Medium-bodied and nicely textured. Good balance of flavors -- and well-integrated, too -- with lovely sweet notes (cereal grain, cookie dough, caramel, and vanilla cream), young heathery peat, tar, fishnets, and brine that is complementary, but not aggressive, with a suggestion of lavender and tangerine. Balanced finish. (332 bottles produced.) \n", 653 | "1 This is a sharp dresser, with a firm, solid mouthfeel and an altogether finer and more focused taste than Shieldaig Classic (see\\r\\nbelow). It’s not coastal or earthy particularly, either. Instead the flavors are softer and built around mocha, smooth creamy toffee, and some soft fruit, including a touch of overripe banana and melon notes. The savoriness this time comes from a touch of pepper rather than salt. \n", 654 | "2 Fresh and clean, with notes of vanilla, ripe barley, honey, caramel apple, and toasted coconut. Creamy and mouth-coating in texture, leading to a pleasingly dry, spicy oak finish. Very drinkable, yet satisfying. Quite nice. 
\\r\\n " 655 | ] 656 | }, 657 | "execution_count": 32, 658 | "metadata": {}, 659 | "output_type": "execute_result" 660 | } 661 | ], 662 | "source": [ 663 | "results_df = pd.DataFrame(results, columns=df.columns).iloc[:, :-1] # Drop the vector column\n", 664 | "pd.set_option('display.max_colwidth', None) # Show full descriptions\n", 665 | "results_df.head()" 666 | ] 667 | }, 668 | { 669 | "cell_type": "markdown", 670 | "metadata": {}, 671 | "source": [ 672 | "## Indexing vector data\n", 673 | "\n", 674 | "IRIS 2025.1 includes not only bug fixes and performance enhancements but also a new disk-based Approximate Nearest Neighbors (ANN) index that speeds up vector search over large collections of vectors (typically over 100K). See [the docs](https://docs.intersystems.com/iris20251/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_vecsearch#GSQL_vecsearch_index) for more information on how to define and use the index.\n", 675 | "\n", 676 | "```SQL\n", 677 | "CREATE INDEX HNSWIndex ON TABLE scotch_reviews (description_vector) AS HNSW(M=80, Distance='DotProduct');\n", 678 | "```\n", 679 | "\n", 680 | "IRIS uses the index automatically when a query combines a `TOP` clause with an `ORDER BY` on the distance function the index was created for. 
You can verify that the index is used by inspecting the query plan, either with the `EXPLAIN` command or in the System Management Portal UI.\n", 681 | "\n", 682 | "```SQL\n", 683 | "SELECT TOP 10 * FROM scotch_reviews ORDER BY VECTOR_DOT_PRODUCT(description_vector, TO_VECTOR(:search_vector)) DESC;\n", 684 | "```\n", 685 | "\n", 686 | "Because this notebook's dataset is much smaller than 100K rows, the index won't yield a measurable performance benefit here; treat this as an example you can adapt.\n" 687 | ] 688 | } 689 | ], 690 | "metadata": { 691 | "kernelspec": { 692 | "display_name": "Python 3 (ipykernel)", 693 | "language": "python", 694 | "name": "python3" 695 | }, 696 | "language_info": { 697 | "codemirror_mode": { 698 | "name": "ipython", 699 | "version": 3 700 | }, 701 | "file_extension": ".py", 702 | "mimetype": "text/x-python", 703 | "name": "python", 704 | "nbconvert_exporter": "python", 705 | "pygments_lexer": "ipython3", 706 | "version": "3.10.16" 707 | } 708 | }, 709 | "nbformat": 4, 710 | "nbformat_minor": 4 711 | } 712 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | iris: 3 | image: intersystemsdc/iris-community:latest-cd-zpm 4 | environment: 5 | IRIS_USERNAME: demo 6 | IRIS_PASSWORD: demo 7 | restart: always 8 | hostname: iris 9 | ports: 10 | - 1972:1972 11 | - 52773:52773 12 | 13 | jupyter: 14 | build: 15 | context: . 
16 | dockerfile: Dockerfile 17 | environment: 18 | OPENAI_API_KEY: PUT--YOUR--KEY--HERE 19 | IRIS_HOSTNAME: iris 20 | restart: always 21 | hostname: jupyter 22 | ports: 23 | - 8888:8888 24 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | sqlalchemy-iris 2 | langchain-iris 3 | testcontainers-iris 4 | llama-index-embeddings-langchain 5 | llama-iris 6 | llama-index 7 | sentence-transformers 8 | langchain 9 | fastembed 10 | openai 11 | tiktoken 12 | python-dotenv 13 | pandas 14 | ipykernel 15 | setuptools --------------------------------------------------------------------------------
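The certificate check in `demo/llama_demo.ipynb` reduces to one rule: trust the Cloud SQL certificate if the file exists, otherwise connect without TLS. Below is a minimal sketch of that logic as a reusable function. The name `make_ssl_context` is hypothetical (it is not part of `sqlalchemy-iris` or any IRIS driver), and the default path simply mirrors the one the notebook uses.

```python
import os
import ssl
from typing import Optional


# Hypothetical helper, not a driver API; the default path mirrors the notebook.
def make_ssl_context(cert_file: str = "/usr/cert-demo/certificateSQLaaS.pem") -> Optional[ssl.SSLContext]:
    """Return an SSLContext that trusts cert_file, or None when the file is missing."""
    if os.path.exists(cert_file):
        # The % operator substitutes the path into the message text.
        print("Located SSL certificate at '%s', initializing SSL configuration" % cert_file)
        return ssl.create_default_context(cafile=cert_file)
    print("No certificate file found, continuing with insecure connection")
    return None
```

The returned value plugs into the same call the notebook already makes, `create_engine(url, connect_args={"sslcontext": make_ssl_context()})`. Returning `None` keeps the insecure fallback explicit.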
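`demo/sql_demo.ipynb` passes each embedding to `TO_VECTOR()` as the string that `str()` produces for a Python list. The first search query printed the `description_vector` column back as a plain comma-separated string such as `'-.048620376735925674438,-.082065843045711517333,...'`. Here is a small sketch of both conversions, assuming the returned format matches that printed output. `to_vector_param` and `parse_iris_vector` are hypothetical helper names, not driver APIs.

```python
from typing import Iterable, List


def to_vector_param(values: Iterable[float]) -> str:
    """Format floats the way the notebook does: str() of a list, e.g. '[0.5, -0.25]'."""
    return str([float(v) for v in values])


def parse_iris_vector(text: str) -> List[float]:
    """Parse a comma-separated vector string like the ones in the query output.

    Brackets are stripped if present. Python's float() accepts the
    leading-dot notation seen in the output, such as '-.0486'.
    """
    parts = text.strip().strip("[]").split(",")
    return [float(p) for p in parts if p.strip()]
```

Parsing the string back is handy if you want to post-process returned vectors in Python. For ranking, it is usually simpler to let IRIS compute the similarity in SQL, as the notebook does.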
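In `demo/sql_demo.ipynb`, the embeddings are created with `normalize_embeddings=True`, so every vector has unit length. For unit vectors, `VECTOR_DOT_PRODUCT` equals cosine similarity, and `ORDER BY ... DESC` puts the most similar reviews first. The pure-Python sketch below mimics what the `SELECT TOP 3 ... WHERE price < 100 ORDER BY VECTOR_DOT_PRODUCT(...) DESC` query computes, using made-up toy vectors. It illustrates the ranking only. It is a brute-force scan, not how IRIS or its HNSW index actually executes the query.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length, as normalize_embeddings=True does for model output."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product; for unit-length vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def top_k_under_price(rows: List[Dict], query: Sequence[float], max_price: float, k: int = 3) -> List[Dict]:
    """Brute-force analogue of:
    SELECT TOP k * FROM scotch_reviews WHERE price < max_price
    ORDER BY VECTOR_DOT_PRODUCT(description_vector, query) DESC
    """
    candidates = [r for r in rows if r["price"] < max_price]  # WHERE clause
    candidates.sort(key=lambda r: dot(r["vector"], query), reverse=True)  # ORDER BY ... DESC
    return candidates[:k]  # TOP k


# Toy 3-dimensional rows (the notebook's real vectors have 384 dimensions)
rows = [
    {"name": "A", "price": 60.0, "vector": normalize([1.0, 0.0, 0.0])},
    {"name": "B", "price": 31.0, "vector": normalize([0.6, 0.8, 0.0])},
    {"name": "C", "price": 225.0, "vector": normalize([1.0, 0.1, 0.0])},
    {"name": "D", "price": 80.0, "vector": normalize([0.0, 0.0, 1.0])},
]
query = normalize([1.0, 0.2, 0.0])
print([r["name"] for r in top_k_under_price(rows, query, max_price=100, k=2)])  # prints ['A', 'B']
```

Row `C` is actually the closest vector to the query, but the price filter removes it before ranking. This is the same effect `WHERE price < 100` has in the notebook's query.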