├── .gitignore
├── Cargo.toml
├── LICENSE
├── README.md
├── images
    └── spectrogram_peaks.jpg
└── src
    ├── db.rs
    ├── fingerprint.rs
    ├── lib.rs
    ├── main.rs
    ├── sample.rs
    ├── spectrogram.rs
    └── utils.rs


/.gitignore:
--------------------------------------------------------------------------------
target/
debug/
Cargo.lock

--------------------------------------------------------------------------------
/Cargo.toml:
--------------------------------------------------------------------------------
[package]
name = "shezem-rs"
version = "0.1.0"
edition = "2024"

[dependencies]
minimp3 = { git = "https://github.com/Kither12/minimp3-rs" }
anyhow = "1.0.97"
microfft = "0.6.0"
clap = { version = "4.5.32", features = ["derive"] }
rusqlite = "0.34.0"

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 Kither

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Shezem-rs
## About
A Rust implementation of a fast audio fingerprinting system inspired by Shazam, for audio recognition and identification. It focuses on speed, efficiency and simplicity.

## Usage
### Build
```bash
# Clone the repository
git clone https://github.com/Kither12/shezem-rs.git
cd shezem-rs

# Build the project
cargo build --release

# The executable will be available at
# ./target/release/shezem-rs
```

The CLI provides two main commands: `index` and `search`

### Indexing Audio Files

To create an index of audio files in a directory:

```bash
shezem-rs index /path/to/audio/folder
```

This will create a `.db` folder in the specified directory and store the database file (`db.db3`) inside it.

### Searching for Similar Audio

To find similar audio files to a query file:

```bash
shezem-rs search /path/to/query.mp3 --path /path/to/indexed/folder
```

By default, this will return the top 10 matches.
You can change the number of results with the `--rank` option:

```bash
shezem-rs search /path/to/query.mp3 --path /path/to/indexed/folder --rank 5
```
## Performance
Performance benchmarks were conducted on a collection of 100 songs totaling approximately 1.1GB, using an AMD Ryzen 5 5600H (12) @ 4.28 GHz processor:

- **Indexing Speed**: Complete folder indexing was accomplished in 35.5 seconds
- **Search Performance**:
  - 10-second audio sample search: 0.3 seconds
  - 3-minute audio sample search: 1.02 seconds

## How it works
The algorithm is based on a fingerprinting system, heavily inspired by this article:
[How does Shazam work - Coding Geek](https://drive.google.com/file/d/1ahyCTXBAZiuni6RTzHzLoOwwfTRFaU-C/view)

While working on the audio fingerprinting process, I developed some interesting approaches that I believe are both faster and more efficient. I'll explain them in detail here.

### Preprocessing
First, we convert the audio from stereo to mono by averaging the left and right channels. To reduce computational load, we also downsample the audio, which decreases the number of samples we need to process. Most downloaded songs have a sampling rate of 44.1 kHz, and we downsample them by a factor of 4 to 11.025 kHz. Before doing so, we must filter out any frequencies above the new [Nyquist frequency](https://en.wikipedia.org/wiki/Nyquist_frequency) (half the target rate, about 5.5 kHz) to prevent aliasing. We achieve this by applying a simple single-pole [IIR low-pass filter](https://tomroelandts.com/articles/low-pass-single-pole-iir-filter).

### Spectrogram
The audio is transformed into a spectrogram using a Short-Time Fourier Transform (STFT) with a 1024-sample Hamming window and 50% overlap between adjacent windows. This creates a time-frequency representation of the audio signal.
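
The windowing and framing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the helper names `hamming_coefficient` and `frame_starts` are hypothetical, the window uses the textbook Hamming formula, and a trailing partial frame is simply dropped (the repository's `generate_spectrogram` instead shifts the last window back so that it ends at the final sample).

```rust
use std::f32::consts::PI;

/// Textbook Hamming coefficient for position `i` in a window of `n` samples:
/// w[i] = 0.54 - 0.46 * cos(2 * pi * i / (n - 1)).
/// (Hypothetical helper, named for this sketch only.)
fn hamming_coefficient(i: usize, n: usize) -> f32 {
    if n < 2 {
        // A one-sample window has nothing to taper.
        return 1.0;
    }
    0.54 - 0.46 * (2.0 * PI * i as f32 / (n - 1) as f32).cos()
}

/// Start indices of the STFT frames for a signal of `len` samples.
/// With `window = 1024` and `hop = 512`, adjacent frames overlap by 50%.
/// Assumption of this sketch: a trailing partial frame is dropped.
fn frame_starts(len: usize, window: usize, hop: usize) -> Vec<usize> {
    if window == 0 || hop == 0 || len < window {
        return Vec::new();
    }
    (0..=len - window).step_by(hop).collect()
}
```

With a hop of 512 samples at 11,025 Hz, consecutive frames start about 46 ms apart, and each of the 512 bins produced by a 1024-point real FFT spans roughly 10.8 Hz.
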

To identify significant features, the algorithm divides the frequency spectrum into discrete bands for each time window. Within each band, only the bin with the maximum amplitude is kept. The system then applies a threshold filter, discarding every band maximum that does not exceed the average of the band maxima in that window. The remaining high-energy points are the characteristic peaks of the spectrogram, from which the audio fingerprint is built.

![Spectrogram Peaks](images/spectrogram_peaks.jpg)

### Storing Fingerprint
After extracting the peaks from the spectrogram, how can we store and use them efficiently? We do this with a hash function. We combine several adjacent peaks into a group. Each group has an anchor, and the other peaks in the group are addressed relative to that anchor. An address is the tuple (anchor frequency, peak frequency, time delta between the peak and the anchor). This tuple fits easily in a 32-bit integer: 9 bits for each frequency and 14 bits for the time delta. To extend the key to 64 bits, I also store the anchor address along with each peak.

### Searching and Ranking
When identifying matching audio, the system first processes the input sample to create a fingerprint. After retrieving potential matching fingerprints from the database, the algorithm performs a temporal coherence analysis by sorting the retrieved fingerprints according to their chronological appearance in the sample.

The system then computes the longest increasing subsequence of the anchor times of these sorted fingerprints, ensuring that the detected peaks keep the same sequential order as in the original audio. To reduce false positives when comparing short samples against longer recordings, the algorithm slides a window as long as the sample over that subsequence. The largest number of matching peaks found inside any window position becomes the match score.
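
The scoring step described above can be sketched as follows. This is a self-contained illustration rather than the repository's code: the names `longest_non_decreasing`, `densest_window`, and `match_score` are hypothetical, and the input is assumed to be the matched anchor times in milliseconds, already ordered by their position in the query sample.

```rust
/// Longest non-decreasing subsequence in O(n log n), returned as values.
/// (Hypothetical helper for this sketch.)
fn longest_non_decreasing(a: &[u32]) -> Vec<u32> {
    // tails[k] = index in `a` of the smallest tail of a subsequence of length k + 1.
    let mut tails: Vec<usize> = Vec::new();
    // prev[i] = index of the element preceding a[i] in the best subsequence ending at i.
    let mut prev: Vec<Option<usize>> = vec![None; a.len()];
    for i in 0..a.len() {
        // First position whose tail value is strictly greater than a[i].
        let pos = tails.partition_point(|&j| a[j] <= a[i]);
        if pos > 0 {
            prev[i] = Some(tails[pos - 1]);
        }
        if pos == tails.len() {
            tails.push(i);
        } else {
            tails[pos] = i;
        }
    }
    // Walk the predecessor links back from the last tail, then reverse.
    let mut out = Vec::with_capacity(tails.len());
    let mut cur = tails.last().copied();
    while let Some(i) = cur {
        out.push(a[i]);
        cur = prev[i];
    }
    out.reverse();
    out
}

/// Largest number of entries of the sorted slice `times` that fit
/// inside any span of `window_ms` milliseconds.
fn densest_window(times: &[u32], window_ms: u32) -> usize {
    let mut left = 0;
    let mut best = 0;
    for right in 0..times.len() {
        // Shrink from the left until the span fits the window again.
        while times[right] - times[left] > window_ms {
            left += 1;
        }
        best = best.max(right - left + 1);
    }
    best
}

/// Score for one candidate song: order-consistent matches in the densest window.
fn match_score(anchor_times_ms: &[u32], sample_duration_ms: u32) -> usize {
    let ordered = longest_non_decreasing(anchor_times_ms);
    densest_window(&ordered, sample_duration_ms)
}
```

For example, with anchor times `[1000, 1200, 90000, 90100, 90200]` and a 500 ms sample, all five matches are in order, but only three of them fall within a single 500 ms span, so the score is 3 rather than 5.
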
Final results are then ranked according to these match scores, with higher scores indicating stronger matches.

--------------------------------------------------------------------------------
/images/spectrogram_peaks.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kither12/shezem-rs/bb49dd2e7791c65b4a51223b4ed92b715b59605b/images/spectrogram_peaks.jpg

--------------------------------------------------------------------------------
/src/db.rs:
--------------------------------------------------------------------------------
use std::{cmp::max, collections::HashMap, path::PathBuf};

use anyhow::Result;
use rusqlite::{Connection, Transaction, params};

use crate::{
    NEIGHBORHOOD_SIZE,
    fingerprint::{Fingerprint, FingerprintData},
    utils::longest_increasing_subsequence,
};

#[derive(Debug)]
pub struct SongData {
    // Only title for now
    pub title: String,
}

#[derive(Debug)]
pub struct RankingData {
    pub data: SongData,
    pub score: i32,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Couples {
    pub anchor_address: u32,
    pub anchor_time: u32,
    pub song_id: i32,
}

pub struct DbClient {
    conn: Connection,
}

impl DbClient {
    pub fn new(path: &PathBuf) -> Self {
        let conn = Connection::open(path).unwrap();
        let client = DbClient { conn };
        client.create_tables().unwrap();
        client
    }
    pub fn get_conn<'a>(&'a mut self) -> Transaction<'a> {
        self.conn.transaction().unwrap()
    }
    pub fn create_tables(&self) -> rusqlite::Result<()> {
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS songs (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                title TEXT NOT NULL
            )",
            [],
        )?;

        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS fingerprints (
                address INTEGER NOT NULL,
anchorAddress INTEGER NOT NULL, 58 | anchorTime INTEGER NOT NULL, 59 | songID INTEGER NOT NULL, 60 | PRIMARY KEY (address, anchorAddress, anchorTime, songID) 61 | )", 62 | [], 63 | )?; 64 | 65 | Ok(()) 66 | } 67 | 68 | pub fn register_song(&self, song_data: &SongData) -> Result { 69 | let mut stmt = self 70 | .conn 71 | .prepare_cached("INSERT INTO songs (title) VALUES (?)")?; 72 | let result = stmt.execute([&song_data.title])?; 73 | 74 | if result == 0 { 75 | return Err(rusqlite::Error::StatementChangedRows(0).into()); 76 | } 77 | 78 | let song_id = self.conn.last_insert_rowid(); 79 | Ok(song_id) 80 | } 81 | 82 | pub fn register_fingerprint<'a>( 83 | fingerprint_data: &FingerprintData, 84 | tx: &mut Transaction<'a>, 85 | ) -> rusqlite::Result<()> { 86 | let mut stmt = tx.prepare_cached( 87 | "INSERT OR IGNORE INTO fingerprints (address, anchorAddress, anchorTime, songID) VALUES (?, ?, ?, ?)", 88 | )?; 89 | stmt.execute(params![ 90 | &fingerprint_data.fingerprint.address, 91 | &fingerprint_data.fingerprint.anchor_address, 92 | &fingerprint_data.fingerprint.anchor_time, 93 | &fingerprint_data.song_id, 94 | ])?; 95 | 96 | Ok(()) 97 | } 98 | 99 | pub fn get_song_data(&self, song_id: i32) -> Result { 100 | let mut stmt = self 101 | .conn 102 | .prepare_cached("SELECT title FROM songs WHERE id = ?")?; 103 | let row = stmt.query_row([song_id], |row| Ok(SongData { title: row.get(0)? 
        }))?;

        Ok(row)
    }
    fn get_fingerprint_from_database(
        &self,
        fingerprints: &Vec<Fingerprint>,
    ) -> Result<Vec<FingerprintData>> {
        let placeholders: String = std::iter::repeat("?")
            .take(fingerprints.len())
            .collect::<Vec<_>>()
            .join(",");

        let params: Vec<&dyn rusqlite::types::ToSql> = fingerprints
            .iter()
            .map(|f| &f.address as &dyn rusqlite::types::ToSql)
            .collect();

        let mut stmt = self.conn.prepare(&format!(
            "SELECT address, anchorAddress, anchorTime, songID FROM fingerprints WHERE address IN ({})",
            placeholders
        ))?;

        let rows = stmt
            .query_map(params.as_slice(), |row| {
                Ok(FingerprintData {
                    fingerprint: Fingerprint {
                        address: row.get(0)?,
                        anchor_address: row.get(1)?,
                        anchor_time: row.get(2)?,
                    },
                    song_id: row.get(3)?,
                })
            })?
            .collect::<Result<Vec<_>, _>>()?;

        Ok(rows)
    }
    pub fn search(&self, fingerprints: Vec<Fingerprint>, rank: usize) -> Result<Vec<RankingData>> {
        // An empty query has no duration and cannot match anything;
        // returning early also avoids unwrapping `None` below.
        if fingerprints.is_empty() {
            return Ok(Vec::new());
        }

        let sample_duration =
            fingerprints.last().unwrap().anchor_time - fingerprints.first().unwrap().anchor_time;

        let index_map: HashMap<u32, usize> = fingerprints
            .iter()
            .enumerate()
            .map(|(i, fp)| (fp.address, i))
            .collect();

        let db_result = self.get_fingerprint_from_database(&fingerprints)?;

        // Track fingerprint matches directly by song ID
        let mut song_fingerprints: HashMap<i32, Vec<Fingerprint>> = HashMap::new();

        // Count matches to identify complete neighborhoods
        let mut match_counts: HashMap<Couples, usize> = HashMap::new();

        for row in db_result {
            let key = Couples {
                anchor_address: row.fingerprint.anchor_address,
                anchor_time: row.fingerprint.anchor_time,
                song_id: row.song_id,
            };
            let count = match_counts.entry(key).or_insert(0);
            *count += 1;

            if *count == NEIGHBORHOOD_SIZE {
                song_fingerprints
                    .entry(row.song_id)
                    .or_insert_with(Vec::new)
                    .push(row.fingerprint);
            }
        }

        /*
           After getting the fingerprints for each song from the database, we need to verify their
           temporal coherence with the sample.

           We run a Longest Increasing Subsequence (LIS) algorithm on each song's fingerprints to ensure
           the chronological order of peaks matches our sample's pattern.

           To guard against false positives from random matching peaks, we employ a sliding window
           technique. This window, sized to match our sample duration, moves across the LIS results.
           The maximum number of matching peaks detected within any window position serves as our
           relevance score for that song.
        */

        let mut result = Vec::with_capacity(song_fingerprints.len());
        for (song_id, mut fingerprints) in song_fingerprints {
            fingerprints.sort_unstable_by_key(|fp| index_map[&fp.address]);
            let times = fingerprints
                .iter()
                .map(|v| v.anchor_time)
                .collect::<Vec<_>>();
            let lis = longest_increasing_subsequence(&times);

            let mut l: usize = 0;
            let mut score = 0;
            for r in 0..lis.len() {
                while lis[r] - lis[l] > sample_duration {
                    l += 1;
                }
                score = max(score, r - l + 1);
            }

            result.push((song_id, score as i32));
        }

        result.sort_unstable_by_key(|a| -a.1);
        result.truncate(rank);

        Ok(result
            .into_iter()
            .filter_map(|(song_id, score)| {
                self.get_song_data(song_id)
                    .ok()
                    .map(|data| RankingData { data, score })
            })
            .collect::<Vec<_>>())
    }
}

--------------------------------------------------------------------------------
/src/fingerprint.rs:
--------------------------------------------------------------------------------
use crate::{NEIGHBORHOOD_SIZE, spectrogram::Peak};

#[derive(Debug)]
pub struct FingerprintData {
    pub fingerprint: Fingerprint,
    pub song_id: i32,
}

#[derive(Debug)]
pub struct Fingerprint {
    pub address: u32,
    pub anchor_address: u32,
    pub anchor_time: u32,
}

pub fn generate_fingerprint(mut peaks: Vec<Peak>) -> Vec<Fingerprint> {
    peaks.sort_by(|a, b| {
        a.time
            .partial_cmp(&b.time)
            .unwrap()
            .then(a.freq.partial_cmp(&b.freq).unwrap())
    });

    let mut addresses = Vec::new();
    // saturating_sub avoids a usize underflow when there are fewer peaks
    // than one full neighborhood.
    for i in 0..peaks.len().saturating_sub(NEIGHBORHOOD_SIZE) {
        let anchor_address = build_address(&peaks[i], &peaks[i + NEIGHBORHOOD_SIZE]);
        for j in i..(i + NEIGHBORHOOD_SIZE) {
            let address = build_address(&peaks[i], &peaks[j]);
            addresses.push(Fingerprint {
                address,
                anchor_address,
                anchor_time: (peaks[i].time * 1000.0) as u32,
            });
        }
    }
    addresses
}

pub fn build_address(peak_a: &Peak, peak_b: &Peak) -> u32 {
    let delta_time = (peak_b.time - peak_a.time) * 1000.0;
    /*
       9 bits for storing peak_a.freq
       9 bits for storing peak_b.freq
       14 bits for storing delta_time
    */
    // Mask delta_time to 14 bits so that an unusually long gap cannot spill
    // into the frequency fields.
    (peak_a.freq << 23) | (peak_b.freq << 14) | (delta_time as u32 & 0x3FFF)
}

--------------------------------------------------------------------------------
/src/lib.rs:
--------------------------------------------------------------------------------
use std::{collections::VecDeque, fs, path::PathBuf};

use anyhow::{Ok, Result};
use db::{DbClient, SongData};
use fingerprint::{FingerprintData, generate_fingerprint};
use sample::Sample;
use spectrogram::{filter_spectrogram, generate_spectrogram};

pub mod db;
pub mod fingerprint;
pub mod sample;
pub mod spectrogram;
pub mod utils;

const NEIGHBORHOOD_SIZE: usize = 5;

pub fn index_folder(path: &PathBuf, database_path: &PathBuf) -> Result<()> {
    let entries: Vec<_> = fs::read_dir(path)?.collect::<Result<_, _>>()?;
    // This queue holds the song ids in insertion order
    let mut stack = VecDeque::new();

    // First we insert the song information into
    // the song database
    {
        let db_client = DbClient::new(database_path);
        for entry in &entries {
            if entry.path().extension().and_then(|e| e.to_str()) != Some("mp3") {
                continue;
            }

            let title = entry
                .path()
                .file_stem()
                .unwrap()
                .to_str()
                .unwrap()
                .to_string();
            let song_id = db_client.register_song(&SongData { title })?;
            stack.push_back(song_id as i32);
        }
    }

    //Then we bulk insert all the fingerprints into the fingerprint database
    {
        let mut db_client = DbClient::new(database_path);
        let mut tx = db_client.get_conn();
        for entry in entries {
            if entry.path().extension().and_then(|e| e.to_str()) != Some("mp3") {
                continue;
            }

            let mut sample = Sample::read_mp3(&entry.path())?;
            sample = sample.downsample(4);
            let mut spectrogram =
                generate_spectrogram(&sample.sample, spectrogram::WindowSize::S1024, 512);
            let peaks = filter_spectrogram(&mut spectrogram, sample.sample_rate);
            let fingerprints = generate_fingerprint(peaks);

            let song_id = stack.pop_front().unwrap();
            for fingerprint in fingerprints {
                DbClient::register_fingerprint(
                    &FingerprintData {
                        fingerprint,
                        song_id,
                    },
                    &mut tx,
                )?;
            }
        }
        tx.commit()?;
    }
    Ok(())
}

pub fn search(query_file: &PathBuf, database_path: &PathBuf, rank: usize) -> Result<()> {
    let db_client = DbClient::new(database_path);

    let mut sample = Sample::read_mp3(query_file)?;
    sample = sample.downsample(4);

    let mut spectrogram = generate_spectrogram(&sample.sample, spectrogram::WindowSize::S1024, 512);
    let peaks = filter_spectrogram(&mut spectrogram, sample.sample_rate);

    let fingerprints = generate_fingerprint(peaks);

    let ranking = db_client.search(fingerprints, rank)?;
    for (index, data) in ranking.iter().enumerate() {
        println!("{}. \
            {} (score: {})", index + 1, data.data.title, data.score);
    }
    Ok(())
}

--------------------------------------------------------------------------------
/src/main.rs:
--------------------------------------------------------------------------------
use anyhow::Result;
use clap::{Parser, Subcommand};
use shezem_rs::{index_folder, search};
use std::path::PathBuf;

#[derive(Parser)]
#[command(
    name = "shezem-rs",
    about = "Index and retrieve audio files",
    version = "0.0.1",
    author = "Kither"
)]
struct Cli {
    #[command(subcommand)]
    command: Commands,
}

#[derive(Subcommand)]
enum Commands {
    Index {
        #[arg(value_name = "PATH")]
        path: PathBuf,
    },

    Search {
        #[arg(value_name = "AUDIO_FILE")]
        query_file: PathBuf,

        #[arg(short, long, value_name = "DB_PATH")]
        path: PathBuf,

        #[arg(short, long, default_value = "10")]
        rank: usize,
    },
}

const DEFAULT_DB_PATH: &str = "db.db3";
const DEFAULT_FOLDER_DB_PATH: &str = ".db";

fn main() -> Result<()> {
    let cli = Cli::parse();

    match &cli.command {
        Commands::Index { path } => {
            let db_folder_path = path.join(DEFAULT_FOLDER_DB_PATH);
            if !db_folder_path.exists() {
                std::fs::create_dir_all(&db_folder_path)?;
            }

            let default_db_path = db_folder_path.join(DEFAULT_DB_PATH);
            index_folder(path, &default_db_path)?;
            Ok(())
        }

        Commands::Search {
            query_file,
            path,
            rank,
        } => {
            let default_db_path = path.join(DEFAULT_FOLDER_DB_PATH).join(DEFAULT_DB_PATH);
            search(query_file, &default_db_path, *rank)?;
            Ok(())
        }
    }
}

--------------------------------------------------------------------------------
/src/sample.rs:
--------------------------------------------------------------------------------
use std::{fs::File, io::BufReader,
    path::PathBuf};

use anyhow::Result;

pub struct Sample {
    pub sample: Vec<f32>,
    pub sample_rate: usize,
}

impl Sample {
    pub fn low_pass_filter(&self, cutoff_freq: f32) -> Sample {
        // IIR low pass filter
        // y[n] = alpha * x[n] + (1.0 - alpha) * y[n-1]

        let fc = cutoff_freq / self.sample_rate as f32;
        let alpha = 2.0 * std::f32::consts::PI * fc / (2.0 * std::f32::consts::PI * fc + 1.0);

        let mut filtered = vec![0.0; self.sample.len()];

        if !self.sample.is_empty() {
            filtered[0] = self.sample[0];
        }

        for i in 1..self.sample.len() {
            filtered[i] = alpha * self.sample[i] + (1.0 - alpha) * filtered[i - 1];
        }

        Sample {
            sample: filtered,
            sample_rate: self.sample_rate,
        }
    }

    pub fn downsample(&mut self, factor: usize) -> Sample {
        /*
           When downsampling by a factor, the Nyquist frequency of the new sample rate
           will be (sample_rate/factor)/2. To prevent aliasing, we need to filter out
           frequencies above this threshold. Using 0.45 instead of 0.5 provides a small
           margin to account for the non-ideal nature of our simple filter.
        */
        let cutoff_freq = (self.sample_rate / factor) as f32 * 0.45;
        let filtered = self.low_pass_filter(cutoff_freq);

        let new_len = filtered.sample.len() / factor;
        let mut downsampled = Vec::with_capacity(new_len);

        for i in (0..filtered.sample.len()).step_by(factor) {
            downsampled.push(filtered.sample[i]);
        }

        Sample {
            sample: downsampled,
            sample_rate: self.sample_rate / factor,
        }
    }

    pub fn read_mp3(path: &PathBuf) -> Result<Sample> {
        let file = File::open(path)?;
        let reader = BufReader::new(file);
        let mut decoder = minimp3::Decoder::new(reader);

        let mut mono_samples = Vec::new();
        let mut sampling_rate = 0;

        while let Ok(minimp3::Frame {
            data,
            sample_rate,
            channels,
            ..
        }) = decoder.next_frame()
        {
            if sampling_rate == 0 {
                sampling_rate = sample_rate;
            }

            match channels {
                1 => {
                    mono_samples.extend(data.iter().map(|&s| s as f32));
                }
                2 => {
                    let len = data.len() / 2;
                    mono_samples.reserve(len);

                    for chunk in data.chunks_exact(2) {
                        let avg = (chunk[0] as f32 + chunk[1] as f32) * 0.5;
                        mono_samples.push(avg);
                    }
                }
                _ => panic!("Unsupported number of channels: {}", channels),
            }
        }
        Ok(Sample {
            sample: mono_samples,
            sample_rate: sampling_rate as usize,
        })
    }
}

--------------------------------------------------------------------------------
/src/spectrogram.rs:
--------------------------------------------------------------------------------
use std::f32::consts::PI;

use microfft::Complex32;

pub fn hamming_window(samples: &[f32]) -> Vec<f32> {
    let mut windowed_samples = Vec::with_capacity(samples.len());
    let n = samples.len() as f32;
    for (i, sample) in samples.iter().enumerate() {
        // Hamming window: w[i] = 0.54 - 0.46 * cos(2 * pi * i / (n - 1))
        let multiplier = 0.54 - 0.46 * (2.0 * PI * i as f32 / (n - 1.0)).cos();
        windowed_samples.push(multiplier *
            sample)
    }
    windowed_samples
}

#[derive(Debug, Clone, Copy)]
pub enum WindowSize {
    S2,
    S4,
    S8,
    S16,
    S32,
    S64,
    S128,
    S256,
    S512,
    S1024,
    S2048,
    S4096,
    S8192,
}

impl From<WindowSize> for usize {
    fn from(window_size: WindowSize) -> Self {
        match window_size {
            WindowSize::S2 => 2,
            WindowSize::S4 => 4,
            WindowSize::S8 => 8,
            WindowSize::S16 => 16,
            WindowSize::S32 => 32,
            WindowSize::S64 => 64,
            WindowSize::S128 => 128,
            WindowSize::S256 => 256,
            WindowSize::S512 => 512,
            WindowSize::S1024 => 1024,
            WindowSize::S2048 => 2048,
            WindowSize::S4096 => 4096,
            WindowSize::S8192 => 8192,
        }
    }
}

impl From<WindowSize> for f32 {
    fn from(window_size: WindowSize) -> Self {
        usize::from(window_size) as f32
    }
}

pub fn apply_fft(sample: &[f32], window_size: WindowSize) -> Vec<Complex32> {
    let size: usize = window_size.into();
    if sample.len() != size {
        panic!("Sample length must match window size");
    }

    let result = match window_size {
        WindowSize::S2 => {
            let mut array = [0.0f32; 2];
            array.copy_from_slice(&sample[0..2]);
            let complex_result = microfft::real::rfft_2(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S4 => {
            let mut array = [0.0f32; 4];
            array.copy_from_slice(&sample[0..4]);
            let complex_result = microfft::real::rfft_4(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S8 => {
            let mut array = [0.0f32; 8];
            array.copy_from_slice(&sample[0..8]);
            let complex_result = microfft::real::rfft_8(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S16 => {
            let mut array = [0.0f32; 16];
            array.copy_from_slice(&sample[0..16]);
            let complex_result = microfft::real::rfft_16(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S32 => {
            let mut array =
                [0.0f32; 32];
            array.copy_from_slice(&sample[0..32]);
            let complex_result = microfft::real::rfft_32(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S64 => {
            let mut array = [0.0f32; 64];
            array.copy_from_slice(&sample[0..64]);
            let complex_result = microfft::real::rfft_64(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S128 => {
            let mut array = [0.0f32; 128];
            array.copy_from_slice(&sample[0..128]);
            let complex_result = microfft::real::rfft_128(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S256 => {
            let mut array = [0.0f32; 256];
            array.copy_from_slice(&sample[0..256]);
            let complex_result = microfft::real::rfft_256(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S512 => {
            let mut array = [0.0f32; 512];
            array.copy_from_slice(&sample[0..512]);
            let complex_result = microfft::real::rfft_512(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S1024 => {
            let mut array = [0.0f32; 1024];
            array.copy_from_slice(&sample[0..1024]);
            let complex_result = microfft::real::rfft_1024(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S2048 => {
            let mut array = [0.0f32; 2048];
            array.copy_from_slice(&sample[0..2048]);
            let complex_result = microfft::real::rfft_2048(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S4096 => {
            let mut array = [0.0f32; 4096];
            array.copy_from_slice(&sample[0..4096]);
            let complex_result = microfft::real::rfft_4096(&mut array);
            complex_result.to_vec()
        }
        WindowSize::S8192 => {
            let mut array = [0.0f32; 8192];
            array.copy_from_slice(&sample[0..8192]);
            let complex_result = microfft::real::rfft_8192(&mut array);
            complex_result.to_vec()
        }
    };

    result
}

pub struct FFTWindow {
    pub start_idx: usize,
    pub data: Vec<Complex32>,
}

pub fn generate_spectrogram(
    sample: &[f32],
    window_size: WindowSize,
    overlap: usize,
) -> Vec<FFTWindow> {
    let w_size: usize = window_size.into();
    if overlap >= w_size {
        panic!("overlap size must be less than window size");
    }

    let mut start: usize = 0;

    let mut spectrogram = Vec::new();

    // A sample shorter than one window cannot be transformed; returning early
    // also avoids the usize underflow in `sample.len() - w_size` below.
    if sample.len() < w_size {
        return spectrogram;
    }

    while start < sample.len() {
        if start + w_size > sample.len() {
            start = sample.len() - w_size;
        }
        let window = hamming_window(&sample[start..start + w_size]);
        let fft_res = apply_fft(&window, window_size);

        spectrogram.push(FFTWindow {
            start_idx: start,
            data: fft_res,
        });

        if start + w_size >= sample.len() {
            break;
        }
        start += w_size - overlap;
    }
    spectrogram
}

#[derive(Debug)]
pub struct Peak {
    pub time: f32,
    pub freq: u32,
}

pub fn filter_spectrogram(spectrogram: &mut Vec<FFTWindow>, sample_rate: usize) -> Vec<Peak> {
    let bands = [
        (20, 30),
        (30, 40),
        (40, 60),
        (60, 80),
        (80, 120),
        (120, 200),
        (200, 511),
    ];

    let mut peaks = Vec::new();

    for window in spectrogram.iter() {
        let mut strongest_bins = Vec::with_capacity(bands.len());

        for &(start, end) in &bands {
            let end = end.min(window.data.len() - 1);

            let mut max_magnitude = 0.0;
            let mut max_bin = start;

            for bin in start..end {
                if bin < window.data.len() {
                    let magnitude = window.data[bin].norm_sqr();
                    if magnitude > max_magnitude {
                        max_magnitude = magnitude;
                        max_bin = bin;
                    }
                }
            }

            strongest_bins.push((max_bin, window.data[max_bin]));
        }

        let average_magnitude = strongest_bins
            .iter()
            .map(|(_, complex)| complex.norm_sqr())
            .sum::<f32>()
            / strongest_bins.len() as f32;

        let threshold =
            average_magnitude;

        for (bin_index, complex) in strongest_bins {
            if complex.norm_sqr() > threshold {
                let time_in_seconds = window.start_idx as f32 / sample_rate as f32;
                peaks.push(Peak {
                    time: time_in_seconds,
                    freq: bin_index as u32,
                });
            }
        }
    }

    peaks
}

--------------------------------------------------------------------------------
/src/utils.rs:
--------------------------------------------------------------------------------
use std::cmp::Ordering::{Greater, Less};

pub fn longest_increasing_subsequence<T>(a: &[T]) -> Vec<T>
where
    T: Ord + Clone,
{
    if a.is_empty() {
        return Vec::new();
    }

    let mut m: Vec<usize> = Vec::with_capacity(a.len());
    let mut p: Vec<usize> = vec![0; a.len()];

    for i in 0..a.len() {
        match m.binary_search_by(|&j| if a[j] > a[i] { Greater } else { Less }) {
            Ok(pos) => {
                m[pos] = i;
                if pos > 0 {
                    p[i] = m[pos - 1];
                }
            }
            Err(pos) => {
                if pos > 0 {
                    p[i] = m[pos - 1];
                }
                if pos == m.len() {
                    m.push(i);
                } else {
                    m[pos] = i;
                }
            }
        }
    }

    let mut result = Vec::with_capacity(m.len());
    if !m.is_empty() {
        let mut k = m[m.len() - 1];
        result.push(a[k].clone());

        for _ in 0..m.len() - 1 {
            k = p[k];
            result.push(a[k].clone());
        }

        result.reverse();
    }

    result
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty() {
        let empty: Vec<i32> = vec![];
        assert_eq!(longest_increasing_subsequence(&empty), vec![]);
    }

    #[test]
    fn test_single_element() {
        let single = vec![5];
        assert_eq!(longest_increasing_subsequence(&single), vec![5]);
    }

    #[test]
    fn test_all_same() {
        let all_same = vec![3, 3, 3, 3];
        assert_eq!(longest_increasing_subsequence(&all_same), vec![3, 3, 3, 3]);
    }

    #[test]
    fn test_strictly_increasing() {
        let increasing = vec![1, 2, 3, 4, 5];
        assert_eq!(
            longest_increasing_subsequence(&increasing),
            vec![1, 2, 3, 4, 5]
        );
    }
}

--------------------------------------------------------------------------------