├── LICENSE
├── README.md
├── demo.png
└── index.html

/LICENSE:
--------------------------------------------------------------------------------
It's public domain, whatever, use it as you want :)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# SmolVLM real-time camera demo

![demo](./demo.png)

This repository is a simple demo of how to use the llama.cpp server with SmolVLM 500M to get real-time object detection.

## How to set up

1. Install [llama.cpp](https://github.com/ggml-org/llama.cpp)
2. Run `llama-server -hf ggml-org/SmolVLM-500M-Instruct-GGUF`
   Note: you may need to add `-ngl 99` to enable GPU (if you are using an NVIDIA/AMD/Intel GPU)
   Note (2): You can also try other models [here](https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md)
3. Open `index.html`
4. Optionally change the instruction (for example, make it return JSON)
5. Click on "Start" and enjoy

--------------------------------------------------------------------------------
/demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ngxson/smolvlm-realtime-webcam/37b62fc3c9fee5b90040a4496db8ad9e4f66d959/demo.png

--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
Camera Interaction App

Camera Interaction App

Base API:

Instruction:

Response:

Interval between 2 requests:

--------------------------------------------------------------------------------
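
The page's script is not reproduced above, so the following is only a minimal sketch of how a page like this might send one camera frame and an instruction to `llama-server`. It assumes the server exposes its OpenAI-compatible `/v1/chat/completions` endpoint at the "Base API" address and accepts the frame as a data URL in an `image_url` content part. The helper names `buildChatRequest` and `describeFrame`, and the `max_tokens` default of 100, are hypothetical and not taken from the repository.

```javascript
// Hypothetical helper: builds an OpenAI-style chat request containing
// one text part (the instruction) and one image part (the camera frame).
function buildChatRequest(instruction, imageDataUrl, maxTokens = 100) {
  return {
    max_tokens: maxTokens, // assumed default, chosen to keep replies short
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: instruction },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
}

// Hypothetical helper: POSTs the request to the server and returns the
// text of the first choice. baseUrl is whatever is entered as "Base API".
async function describeFrame(baseUrl, instruction, imageDataUrl) {
  const response = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(instruction, imageDataUrl)),
  });
  if (!response.ok) {
    throw new Error(`Server returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

In a browser, the data URL could come from drawing the current video frame onto a canvas and calling `canvas.toDataURL("image/jpeg")`. Calling `describeFrame` again only after the previous call has resolved, plus the chosen interval, would keep requests from piling up when the model is slower than the interval.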