├── .gitignore
├── Cargo.toml
├── LICENSE
├── README.md
├── images
    └── spectrogram_peaks.jpg
└── src
    ├── db.rs
    ├── fingerprint.rs
    ├── lib.rs
    ├── main.rs
    ├── sample.rs
    ├── spectrogram.rs
    └── utils.rs


/.gitignore:
--------------------------------------------------------------------------------
1 | target/
2 | debug/
3 | Cargo.lock
4 | 


--------------------------------------------------------------------------------
/Cargo.toml:
--------------------------------------------------------------------------------
 1 | [package]
 2 | name = "shezem-rs"
 3 | version = "0.1.0"
 4 | edition = "2024"
 5 | 
 6 | [dependencies]
 7 | minimp3 = { git = "https://github.com/Kither12/minimp3-rs" }
 8 | anyhow = "1.0.97"
 9 | microfft = "0.6.0"
10 | clap = { version = "4.5.32", features = ["derive"] }
11 | rusqlite = "0.34.0"
12 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2025 Kither
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Shezem-rs
 2 | ## About
 3 | A Rust implementation of a fast audio fingerprinting system inspired by Shazam, for audio recognition and identification. It focuses on speed, efficiency and simplicity.
 4 | 
 5 | ## Usage
 6 | ### Build
 7 | ```bash
 8 | # Clone the repository
 9 | git clone https://github.com/Kither12/shezem-rs.git
10 | cd shezem-rs
11 | 
12 | # Build the project
13 | cargo build --release
14 | 
15 | # The executable will be available at
16 | # ./target/release/shezem-rs
17 | ```
18 | 
19 | The CLI provides two main commands: `index` and `search`
20 | 
21 | ### Indexing Audio Files
22 | 
23 | To create an index of audio files in a directory:
24 | 
25 | ```bash
26 | shezem-rs index /path/to/audio/folder
27 | ```
28 | 
29 | This will create a `.db` folder in the specified directory and store the database file (`db.db3`) inside it.
30 | 
31 | ### Searching for Similar Audio
32 | 
33 | To find similar audio files to a query file:
34 | 
35 | ```bash
36 | shezem-rs search /path/to/query.mp3 --path /path/to/indexed/folder
37 | ```
38 | 
39 | By default, this will return the top 10 matches. You can change the number of results with the `--rank` option:
40 | 
41 | ```bash
42 | shezem-rs search /path/to/query.mp3 --path /path/to/indexed/folder --rank 5
43 | ```
44 | ## Performance
45 | Performance benchmarks were conducted on a collection of 100 songs totaling approximately 1.1GB, using an AMD Ryzen 5 5600H (12) @ 4.28 GHz processor:
46 | 
47 | - **Indexing Speed**: Complete folder indexing was accomplished in 35.5 seconds
48 | - **Search Performance**:
49 |   - 10-second audio sample search: 0.3 seconds
50 |   - 3-minute audio sample search: 1.02 seconds
51 | 
52 | ## How it works
53 | The algorithm is based on a fingerprinting system, heavily inspired by this article:
54 | [How does Shazam work - Coding Geek](https://drive.google.com/file/d/1ahyCTXBAZiuni6RTzHzLoOwwfTRFaU-C/view)
55 | 
56 | While working on the audio fingerprinting process, I developed some interesting approaches that I believe are both faster and more efficient. I'll explain them in detail here.
57 | 
58 | ### Preprocessing
59 | First, we need to convert the audio from stereo to mono by averaging the left and right channels. To reduce computational load, we also downsample the audio, which decreases the number of samples we need to process. Most downloaded songs have a sampling rate of 44.1kHz, but we'll downsample it to 11.025kHz. Before doing so, we must filter out any frequencies above the [Nyquist frequency](https://en.wikipedia.org/wiki/Nyquist_frequency) to prevent aliasing. We can achieve this by applying a simple [IIR low-pass filter](https://tomroelandts.com/articles/low-pass-single-pole-iir-filter).
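
The preprocessing steps can be sketched as three small functions. This is a minimal illustration that follows the description above, not the project's exact code: the function names and the free-function layout are assumptions, and the 0.45 safety margin on the cutoff is taken from the comment in `src/sample.rs`.

```rust
/// Average interleaved stereo samples (L, R, L, R, ...) into a mono signal.
fn stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|c| (c[0] + c[1]) * 0.5)
        .collect()
}

/// Single-pole IIR low-pass filter: y[n] = alpha * x[n] + (1 - alpha) * y[n-1].
fn low_pass(samples: &[f32], cutoff_hz: f32, sample_rate: f32) -> Vec<f32> {
    let w = 2.0 * std::f32::consts::PI * cutoff_hz / sample_rate;
    let alpha = w / (w + 1.0);
    let mut out = Vec::with_capacity(samples.len());
    let mut prev = match samples.first() {
        Some(&s) => s,
        None => return out,
    };
    out.push(prev);
    for &x in &samples[1..] {
        prev = alpha * x + (1.0 - alpha) * prev;
        out.push(prev);
    }
    out
}

/// Low-pass below the new Nyquist frequency, then keep every `factor`-th sample.
/// `factor` must be non-zero.
fn downsample(samples: &[f32], sample_rate: usize, factor: usize) -> (Vec<f32>, usize) {
    let new_rate = sample_rate / factor;
    // 0.45 instead of 0.5 leaves a margin for the gentle roll-off of the filter.
    let cutoff = new_rate as f32 * 0.45;
    let filtered = low_pass(samples, cutoff, sample_rate as f32);
    (filtered.into_iter().step_by(factor).collect(), new_rate)
}
```

Calling `downsample(&mono, 44_100, 4)` yields a signal at 11,025 Hz with a quarter of the samples. A single-pole filter attenuates slowly, so some aliasing is likely to remain; it trades accuracy for speed.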
60 | 
61 | ### Spectrogram
62 | The audio is transformed into a spectrogram using a Short-Time Fourier Transform (STFT) with a 1024-sample Hamming window and 50% overlap between adjacent windows. This creates a time-frequency representation of the audio signal.
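
As a rough sketch of the framing step, the snippet below computes the standard symmetric Hamming coefficients and the start index of each window for a given hop size. The helper names are hypothetical. It also assumes a trailing partial frame is simply dropped, which may differ from how the project treats the end of the signal.

```rust
use std::f32::consts::PI;

/// Symmetric Hamming window: w[i] = 0.54 - 0.46 * cos(2 * pi * i / (n - 1)).
/// Assumes n > 1.
fn hamming(n: usize) -> Vec<f32> {
    (0..n)
        .map(|i| 0.54 - 0.46 * (2.0 * PI * i as f32 / (n as f32 - 1.0)).cos())
        .collect()
}

/// Start indices of full windows of length `window`, advancing by `hop` samples.
/// A 1024-sample window with 50% overlap corresponds to hop = 512.
fn frame_starts(len: usize, window: usize, hop: usize) -> Vec<usize> {
    if len < window {
        return Vec::new();
    }
    (0..=len - window).step_by(hop).collect()
}
```

At 11,025 Hz a 1024-sample window spans about 93 ms, and a hop of 512 samples gives one spectrogram column roughly every 46 ms.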
63 | 
64 | To identify significant features, the algorithm divides the frequency spectrum into discrete bands for each time window. Within each band, only the maximum amplitude is preserved. The system then applies a threshold filter, eliminating any bands with amplitudes below the average level. The remaining high-energy points constitute the characteristic peaks of the spectrogram, which serve as the audio fingerprint.
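
The peak selection for a single time window might look like the sketch below. Two things here are assumptions rather than details confirmed by the text: the band edges in `BANDS` are invented for illustration, and "average level" is read as the mean of the per-band maxima within the same window.

```rust
/// Hypothetical band edges as half-open FFT-bin ranges [lo, hi).
const BANDS: [(usize, usize); 6] =
    [(0, 10), (10, 20), (20, 40), (40, 80), (80, 160), (160, 512)];

/// For one window of FFT magnitudes, return (bin, magnitude) for each surviving peak.
fn window_peaks(magnitudes: &[f32]) -> Vec<(usize, f32)> {
    // Keep only the strongest bin in each band.
    let candidates: Vec<(usize, f32)> = BANDS
        .iter()
        .filter_map(|&(lo, hi)| {
            let hi = hi.min(magnitudes.len());
            if lo >= hi {
                return None;
            }
            magnitudes[lo..hi].iter().copied().enumerate().fold(
                None,
                |best: Option<(usize, f32)>, (i, m)| match best {
                    Some((_, bm)) if bm >= m => best,
                    _ => Some((lo + i, m)),
                },
            )
        })
        .collect();

    if candidates.is_empty() {
        return candidates;
    }

    // Drop band maxima that fall below the mean of all band maxima.
    let mean = candidates.iter().map(|&(_, m)| m).sum::<f32>() / candidates.len() as f32;
    candidates.into_iter().filter(|&(_, m)| m >= mean).collect()
}
```

Because each band contributes at most one peak per window, the number of peaks stays bounded regardless of how loud or dense the audio is.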
65 | 
66 | ![Spectrogram Peaks](images/spectrogram_peaks.jpg)
67 | 
68 | ### Storing Fingerprint
 69 | After extracting the peaks from the spectrogram, how can we store and query them efficiently? We do this with a hash function. Adjacent peaks are combined into a group. Each group has an anchor, and every other peak in the group is addressed relative to that anchor. The address is the tuple (anchor frequency, peak frequency, time delta between the peak and the anchor), which fits easily in a 32-bit integer. To extend the key to 64 bits, I also store the anchor address along with each peak.
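
The 32-bit layout can be illustrated with a pack/unpack pair. The field widths (9 bits per frequency bin, 14 bits for the time delta in milliseconds) follow the comment in `src/fingerprint.rs`. The function names and the explicit masks are additions for this sketch, so that an out-of-range field cannot spill into its neighbour.

```rust
/// Pack (anchor frequency bin, peak frequency bin, time delta in ms) into 32 bits:
/// bits 31..23 = anchor frequency, bits 22..14 = peak frequency, bits 13..0 = delta.
fn pack_address(anchor_freq: u32, peak_freq: u32, delta_ms: u32) -> u32 {
    ((anchor_freq & 0x1FF) << 23) | ((peak_freq & 0x1FF) << 14) | (delta_ms & 0x3FFF)
}

/// Recover the three fields from a packed address.
fn unpack_address(address: u32) -> (u32, u32, u32) {
    (address >> 23, (address >> 14) & 0x1FF, address & 0x3FFF)
}
```

Nine bits cover bin indices 0 to 511, which matches the 512 usable bins of a 1024-point real FFT, and 14 bits allow time deltas up to about 16.4 seconds.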
70 | 
71 | ### Searching and Ranking
72 | When identifying matching audio, the system first processes the input sample to create a fingerprint. After retrieving potential matching fingerprints from the database, the algorithm performs a temporal coherence analysis by sorting the retrieved fingerprints according to their chronological appearance in the sample.
73 | 
74 | The system then calculates the longest increasing subsequence of these sorted fingerprints, ensuring that the detected peaks maintain the same sequential order as in the original audio. To address the challenge of false positives when comparing short samples against longer recordings, the algorithm implements a sliding window technique. This approach identifies the window with the highest concentration of matching peaks and uses this density for the match score calculation. Final results are then ranked according to these match scores, with higher scores indicating stronger matches.
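
The scoring step can be sketched as follows, under a few assumptions: the input is the list of database anchor times (in milliseconds) for one song, already ordered by their position in the query; the subsequence is strictly increasing; and the window length equals the query duration. The project's own `longest_increasing_subsequence` in `src/utils.rs` may differ in these details.

```rust
/// Longest strictly increasing subsequence in O(n log n), returning the values.
fn longest_increasing_subsequence(values: &[u32]) -> Vec<u32> {
    // tails[k] is the index of the smallest tail value among increasing runs of length k + 1.
    let mut tails: Vec<usize> = Vec::new();
    // prev[i] links element i to its predecessor in the best run ending at i.
    let mut prev: Vec<Option<usize>> = vec![None; values.len()];

    for (i, &v) in values.iter().enumerate() {
        let pos = tails.partition_point(|&t| values[t] < v);
        if pos > 0 {
            prev[i] = Some(tails[pos - 1]);
        }
        if pos == tails.len() {
            tails.push(i);
        } else {
            tails[pos] = i;
        }
    }

    // Walk the predecessor links back from the end of the longest run.
    let mut out = Vec::with_capacity(tails.len());
    let mut cur = tails.last().copied();
    while let Some(i) = cur {
        out.push(values[i]);
        cur = prev[i];
    }
    out.reverse();
    out
}

/// Largest number of LIS entries that fit inside any span of `window` milliseconds.
fn window_score(lis: &[u32], window: u32) -> usize {
    let mut left = 0;
    let mut best = 0;
    for right in 0..lis.len() {
        while lis[right] - lis[left] > window {
            left += 1;
        }
        best = best.max(right - left + 1);
    }
    best
}
```

Matches scattered across a long recording produce a long subsequence but a low window score, whereas a genuine match concentrates its hits inside one window of the query's length.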
75 | 


--------------------------------------------------------------------------------
/images/spectrogram_peaks.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kither12/shezem-rs/bb49dd2e7791c65b4a51223b4ed92b715b59605b/images/spectrogram_peaks.jpg


--------------------------------------------------------------------------------
/src/db.rs:
--------------------------------------------------------------------------------
  1 | use std::{cmp::max, collections::HashMap, path::PathBuf};
  2 | 
  3 | use anyhow::Result;
  4 | use rusqlite::{Connection, Transaction, params};
  5 | 
  6 | use crate::{
  7 |     NEIGHBORHOOD_SIZE,
  8 |     fingerprint::{Fingerprint, FingerprintData},
  9 |     utils::longest_increasing_subsequence,
 10 | };
 11 | 
 12 | #[derive(Debug)]
 13 | pub struct SongData {
 14 |     // Only title for now
 15 |     pub title: String,
 16 | }
 17 | 
 18 | #[derive(Debug)]
 19 | pub struct RankingData {
 20 |     pub data: SongData,
 21 |     pub score: i32,
 22 | }
 23 | 
 24 | #[derive(Debug, Clone, PartialEq, Eq, Hash)]
 25 | struct Couples {
 26 |     pub anchor_address: u32,
 27 |     pub anchor_time: u32,
 28 |     pub song_id: i32,
 29 | }
 30 | 
 31 | pub struct DbClient {
 32 |     conn: Connection,
 33 | }
 34 | 
 35 | impl DbClient {
 36 |     pub fn new(path: &PathBuf) -> Self {
 37 |         let conn = Connection::open(path).unwrap();
 38 |         let client = DbClient { conn };
 39 |         client.create_tables().unwrap();
 40 |         client
 41 |     }
 42 |     pub fn get_conn<'a>(&'a mut self) -> Transaction<'a> {
 43 |         self.conn.transaction().unwrap()
 44 |     }
 45 |     pub fn create_tables(&self) -> rusqlite::Result<()> {
 46 |         self.conn.execute(
 47 |             "CREATE TABLE IF NOT EXISTS songs (
 48 |                 id INTEGER PRIMARY KEY AUTOINCREMENT,
 49 |                 title TEXT NOT NULL
 50 |             )",
 51 |             [],
 52 |         )?;
 53 | 
 54 |         self.conn.execute(
 55 |             "CREATE TABLE IF NOT EXISTS fingerprints (
 56 |                 address INTEGER NOT NULL,
 57 |                 anchorAddress INTEGER NOT NULL,
 58 |                 anchorTime INTEGER NOT NULL,
 59 |                 songID INTEGER NOT NULL,
 60 |                 PRIMARY KEY (address, anchorAddress, anchorTime, songID)
 61 |             )",
 62 |             [],
 63 |         )?;
 64 | 
 65 |         Ok(())
 66 |     }
 67 | 
 68 |     pub fn register_song(&self, song_data: &SongData) -> Result<i64> {
 69 |         let mut stmt = self
 70 |             .conn
 71 |             .prepare_cached("INSERT INTO songs (title) VALUES (?)")?;
 72 |         let result = stmt.execute([&song_data.title])?;
 73 | 
 74 |         if result == 0 {
 75 |             return Err(rusqlite::Error::StatementChangedRows(0).into());
 76 |         }
 77 | 
 78 |         let song_id = self.conn.last_insert_rowid();
 79 |         Ok(song_id)
 80 |     }
 81 | 
 82 |     pub fn register_fingerprint<'a>(
 83 |         fingerprint_data: &FingerprintData,
 84 |         tx: &mut Transaction<'a>,
 85 |     ) -> rusqlite::Result<()> {
 86 |         let mut stmt = tx.prepare_cached(
 87 |             "INSERT OR IGNORE INTO fingerprints (address, anchorAddress, anchorTime, songID) VALUES (?, ?, ?, ?)",
 88 |         )?;
 89 |         stmt.execute(params![
 90 |             &fingerprint_data.fingerprint.address,
 91 |             &fingerprint_data.fingerprint.anchor_address,
 92 |             &fingerprint_data.fingerprint.anchor_time,
 93 |             &fingerprint_data.song_id,
 94 |         ])?;
 95 | 
 96 |         Ok(())
 97 |     }
 98 | 
 99 |     pub fn get_song_data(&self, song_id: i32) -> Result<SongData> {
100 |         let mut stmt = self
101 |             .conn
102 |             .prepare_cached("SELECT title FROM songs WHERE id = ?")?;
103 |         let row = stmt.query_row([song_id], |row| Ok(SongData { title: row.get(0)? }))?;
104 | 
105 |         Ok(row)
106 |     }
107 |     fn get_fingerprint_from_database(
108 |         &self,
109 |         fingerprints: &Vec<Fingerprint>,
110 |     ) -> Result<Vec<FingerprintData>> {
111 |         let placeholders: String = std::iter::repeat("?")
112 |             .take(fingerprints.len())
113 |             .collect::<Vec<_>>()
114 |             .join(",");
115 | 
116 |         let params: Vec<&dyn rusqlite::types::ToSql> = fingerprints
117 |             .iter()
118 |             .map(|f| &f.address as &dyn rusqlite::types::ToSql)
119 |             .collect();
120 | 
121 |         let mut stmt = self.conn.prepare(&format!(
122 |                 "SELECT address, anchorAddress, anchorTime, songID FROM fingerprints WHERE address IN ({})",
123 |                 placeholders
124 |             ))?;
125 | 
126 |         let rows = stmt
127 |             .query_map(params.as_slice(), |row| {
128 |                 Ok(FingerprintData {
129 |                     fingerprint: Fingerprint {
130 |                         address: row.get(0)?,
131 |                         anchor_address: row.get(1)?,
132 |                         anchor_time: row.get(2)?,
133 |                     },
134 |                     song_id: row.get(3)?,
135 |                 })
136 |             })?
137 |             .collect::<Result<Vec<_>, _>>()?;
138 | 
139 |         Ok(rows)
140 |     }
141 |     pub fn search(&self, fingerprints: Vec<Fingerprint>, rank: usize) -> Result<Vec<RankingData>> {
142 |         let sample_duration =
143 |             fingerprints.last().unwrap().anchor_time - fingerprints.first().unwrap().anchor_time;
144 | 
145 |         let index_map: HashMap<u32, usize> = fingerprints
146 |             .iter()
147 |             .enumerate()
148 |             .map(|(i, fp)| (fp.address, i))
149 |             .collect();
150 | 
151 |         let db_result = self.get_fingerprint_from_database(&fingerprints)?;
152 | 
153 |         // Track fingerprint matches directly by song ID
154 |         let mut song_fingerprints: HashMap<i32, Vec<Fingerprint>> = HashMap::new();
155 | 
156 |         // Count matches to identify complete neighborhoods
157 |         let mut match_counts: HashMap<Couples, usize> = HashMap::new();
158 | 
159 |         for row in db_result {
160 |             let key = Couples {
161 |                 anchor_address: row.fingerprint.anchor_address,
162 |                 anchor_time: row.fingerprint.anchor_time,
163 |                 song_id: row.song_id,
164 |             };
165 |             let count = match_counts.entry(key).or_insert(0);
166 |             *count += 1;
167 | 
168 |             if *count == NEIGHBORHOOD_SIZE {
169 |                 song_fingerprints
170 |                     .entry(row.song_id)
171 |                     .or_insert_with(Vec::new)
172 |                     .push(row.fingerprint);
173 |             }
174 |         }
175 | 
176 |         /*
177 |             After getting the fingerprints for each song from the database, we need to verify their temporal coherence with the sample.
178 | 
179 |             We'll implement a Longest Increasing Subsequence (LIS) algorithm on each song's fingerprint to ensure the chronological
180 |             order of peaks matches our sample's pattern.
181 | 
182 |             To guard against false positives from random matching peaks, we'll employ a sliding window technique. This window,
183 |             sized to match our sample duration, will move across the LIS results. The maximum number of matching peaks detected
184 |             within any window position will serve as our relevance score for that song.
185 |         */
186 | 
187 |         let mut result = Vec::with_capacity(song_fingerprints.len());
188 |         for (song_id, mut fingerprints) in song_fingerprints {
189 |             fingerprints.sort_unstable_by_key(|fp| index_map[&fp.address]);
190 |             let times = fingerprints
191 |                 .iter()
192 |                 .map(|v| v.anchor_time)
193 |                 .collect::<Box<[u32]>>();
194 |             let lis = longest_increasing_subsequence(&times);
195 | 
196 |             let mut l: usize = 0;
197 |             let mut score = 0;
198 |             for r in 0..lis.len() {
199 |                 while lis[r] - lis[l] > sample_duration {
200 |                     l += 1;
201 |                 }
202 |                 score = max(score, r - l + 1);
203 |             }
204 | 
205 |             result.push((song_id, score as i32));
206 |         }
207 | 
208 |         result.sort_unstable_by_key(|a| -a.1);
209 |         result.truncate(rank);
210 | 
211 |         Ok(result
212 |             .into_iter()
213 |             .filter_map(|(song_id, score)| {
214 |                 self.get_song_data(song_id)
215 |                     .ok()
216 |                     .map(|data| RankingData { data, score })
217 |             })
218 |             .collect::<Vec<_>>())
219 |     }
220 | }
221 | 


--------------------------------------------------------------------------------
/src/fingerprint.rs:
--------------------------------------------------------------------------------
 1 | use crate::{NEIGHBORHOOD_SIZE, spectrogram::Peak};
 2 | 
 3 | #[derive(Debug)]
 4 | pub struct FingerprintData {
 5 |     pub fingerprint: Fingerprint,
 6 |     pub song_id: i32,
 7 | }
 8 | 
 9 | #[derive(Debug)]
10 | pub struct Fingerprint {
11 |     pub address: u32,
12 |     pub anchor_address: u32,
13 |     pub anchor_time: u32,
14 | }
15 | 
16 | pub fn generate_fingerprint(mut peaks: Vec<Peak>) -> Vec<Fingerprint> {
17 |     peaks.sort_by(|a, b| {
18 |         a.time
19 |             .partial_cmp(&b.time)
20 |             .unwrap()
21 |             .then(a.freq.partial_cmp(&b.freq).unwrap())
22 |     });
23 | 
24 |     let mut addresses = Vec::new();
25 |     for i in 0..peaks.len().saturating_sub(NEIGHBORHOOD_SIZE) {
26 |         let anchor_address = build_address(&peaks[i], &peaks[i + NEIGHBORHOOD_SIZE]);
27 |         for j in i..(i + NEIGHBORHOOD_SIZE) {
28 |             let address = build_address(&peaks[i], &peaks[j]);
29 |             addresses.push(Fingerprint {
30 |                 address,
31 |                 anchor_address,
32 |                 anchor_time: (peaks[i].time * 1000.0) as u32,
33 |             });
34 |         }
35 |     }
36 |     addresses
37 | }
38 | 
39 | pub fn build_address(peak_a: &Peak, peak_b: &Peak) -> u32 {
40 |     let delta_time = (peak_b.time - peak_a.time) * 1000.0;
41 |     /*
42 |         9 bits for storing peak_a.freq
43 |         9 bits for storing peak_b.freq
44 |         14 bits for storing delta_time
45 |     */
46 |     (peak_a.freq << 23) | (peak_b.freq << 14) | delta_time as u32
47 | }
48 | 


--------------------------------------------------------------------------------
/src/lib.rs:
--------------------------------------------------------------------------------
 1 | use std::{collections::VecDeque, fs, path::PathBuf};
 2 | 
 3 | use anyhow::{Ok, Result};
 4 | use db::{DbClient, SongData};
 5 | use fingerprint::{FingerprintData, generate_fingerprint};
 6 | use sample::Sample;
 7 | use spectrogram::{filter_spectrogram, generate_spectrogram};
 8 | 
 9 | pub mod db;
10 | pub mod fingerprint;
11 | pub mod sample;
12 | pub mod spectrogram;
13 | pub mod utils;
14 | 
15 | const NEIGHBORHOOD_SIZE: usize = 5;
16 | 
17 | pub fn index_folder(path: &PathBuf, database_path: &PathBuf) -> Result<()> {
18 |     let entries: Vec<_> = fs::read_dir(path)?.collect::<Result<_, _>>()?;
19 |     // This queue holds the song ids in insertion order (first in, first out)
20 |     let mut stack = VecDeque::new();
21 | 
22 |     //First we insert the song information into the song database
23 |     {
24 |         let db_client = DbClient::new(database_path);
25 |         for entry in &entries {
26 |             if entry.path().extension().and_then(|e| e.to_str()) != Some("mp3") {
27 |                 continue;
28 |             }
29 | 
30 |             let title = entry
31 |                 .path()
32 |                 .file_stem()
33 |                 .unwrap()
34 |                 .to_str()
35 |                 .unwrap()
36 |                 .to_string();
37 |             let song_id = db_client.register_song(&SongData { title })?;
38 |             stack.push_back(song_id as i32);
39 |         }
40 |     }
41 | 
42 |     //Then we bulk insert all the fingerprints into the fingerprint database
43 |     {
44 |         let mut db_client = DbClient::new(database_path);
45 |         let mut tx = db_client.get_conn();
46 |         for entry in entries {
47 |             if entry.path().extension().and_then(|e| e.to_str()) != Some("mp3") {
48 |                 continue;
49 |             }
50 | 
51 |             let mut sample = Sample::read_mp3(&entry.path())?;
52 |             sample = sample.downsample(4);
53 |             let mut spectrogram =
54 |                 generate_spectrogram(&sample.sample, spectrogram::WindowSize::S1024, 512);
55 |             let peaks = filter_spectrogram(&mut spectrogram, sample.sample_rate);
56 |             let fingerprints = generate_fingerprint(peaks);
57 | 
58 |             let song_id = stack.pop_front().unwrap();
59 |             for fingerprint in fingerprints {
60 |                 DbClient::register_fingerprint(
61 |                     &FingerprintData {
62 |                         fingerprint,
63 |                         song_id,
64 |                     },
65 |                     &mut tx,
66 |                 )?;
67 |             }
68 |         }
69 |         tx.commit()?;
70 |     }
71 |     Ok(())
72 | }
73 | 
74 | pub fn search(query_file: &PathBuf, database_path: &PathBuf, rank: usize) -> Result<()> {
75 |     let db_client = DbClient::new(database_path);
76 | 
77 |     let mut sample = Sample::read_mp3(query_file)?;
78 |     sample = sample.downsample(4);
79 | 
80 |     let mut spectrogram = generate_spectrogram(&sample.sample, spectrogram::WindowSize::S1024, 512);
81 |     let peaks = filter_spectrogram(&mut spectrogram, sample.sample_rate);
82 | 
83 |     let fingerprints = generate_fingerprint(peaks);
84 | 
85 |     let ranking = db_client.search(fingerprints, rank)?;
86 |     for (index, data) in ranking.iter().enumerate() {
87 |         println!("{}. {} (score: {})", index + 1, data.data.title, data.score);
88 |     }
89 |     Ok(())
90 | }
91 | 


--------------------------------------------------------------------------------
/src/main.rs:
--------------------------------------------------------------------------------
 1 | use anyhow::Result;
 2 | use clap::{Parser, Subcommand};
 3 | use shezem_rs::{index_folder, search};
 4 | use std::path::PathBuf;
 5 | 
 6 | #[derive(Parser)]
 7 | #[command(
 8 |     name = "shezem-rs",
 9 |     about = "Index and retrieve audio files",
10 |     version = "0.0.1",
11 |     author = "Kither"
12 | )]
13 | struct Cli {
14 |     #[command(subcommand)]
15 |     command: Commands,
16 | }
17 | 
18 | #[derive(Subcommand)]
19 | enum Commands {
20 |     Index {
21 |         #[arg(value_name = "PATH")]
22 |         path: PathBuf,
23 |     },
24 | 
25 |     Search {
26 |         #[arg(value_name = "AUDIO_FILE")]
27 |         query_file: PathBuf,
28 | 
29 |         #[arg(short, long, value_name = "DB_PATH")]
30 |         path: PathBuf,
31 | 
32 |         #[arg(short, long, default_value = "10")]
33 |         rank: usize,
34 |     },
35 | }
36 | 
37 | const DEFAULT_DB_PATH: &str = "db.db3";
38 | const DEFAULT_FOLDER_DB_PATH: &str = ".db";
39 | 
40 | fn main() -> Result<()> {
41 |     let cli = Cli::parse();
42 | 
43 |     match &cli.command {
44 |         Commands::Index { path } => {
45 |             let db_folder_path = path.join(DEFAULT_FOLDER_DB_PATH);
46 |             if !db_folder_path.exists() {
47 |                 std::fs::create_dir_all(&db_folder_path)?;
48 |             }
49 | 
50 |             let default_db_path = db_folder_path.join(DEFAULT_DB_PATH);
51 |             index_folder(path, &default_db_path)?;
52 |             Ok(())
53 |         }
54 | 
55 |         Commands::Search {
56 |             query_file,
57 |             path,
58 |             rank,
59 |         } => {
60 |             let default_db_path = path.join(DEFAULT_FOLDER_DB_PATH).join(DEFAULT_DB_PATH);
61 |             search(query_file, &default_db_path, *rank)?;
62 |             Ok(())
63 |         }
64 |     }
65 | }
66 | 


--------------------------------------------------------------------------------
/src/sample.rs:
--------------------------------------------------------------------------------
 1 | use std::{fs::File, io::BufReader, path::PathBuf};
 2 | 
 3 | use anyhow::Result;
 4 | 
 5 | pub struct Sample {
 6 |     pub sample: Vec<f32>,
 7 |     pub sample_rate: usize,
 8 | }
 9 | 
10 | impl Sample {
11 |     pub fn low_pass_filter(&self, cutoff_freq: f32) -> Sample {
12 |         // IIR low pass filter
13 |         // y[n] = alpha * x[n] + (1.0 - alpha) * y[n-1]
14 | 
15 |         let fc = cutoff_freq / self.sample_rate as f32;
16 |         let alpha = 2.0 * std::f32::consts::PI * fc / (2.0 * std::f32::consts::PI * fc + 1.0);
17 | 
18 |         let mut filtered = vec![0.0; self.sample.len()];
19 | 
20 |         if !self.sample.is_empty() {
21 |             filtered[0] = self.sample[0];
22 |         }
23 | 
24 |         for i in 1..self.sample.len() {
25 |             filtered[i] = alpha * self.sample[i] + (1.0 - alpha) * filtered[i - 1];
26 |         }
27 | 
28 |         Sample {
29 |             sample: filtered,
30 |             sample_rate: self.sample_rate,
31 |         }
32 |     }
33 | 
34 |     pub fn downsample(&self, factor: usize) -> Sample {
35 |         /*
36 |             When downsampling by a factor, the Nyquist frequency of the new sample rate
37 |             will be (sample_rate/factor)/2. To prevent aliasing, we need to filter out
38 |             frequencies above this threshold. Using 0.45 instead of 0.5 provides a small
39 |             margin to account for the non-ideal nature of our simple filter.
40 |         */
41 |         let cutoff_freq = (self.sample_rate / factor) as f32 * 0.45;
42 |         let filtered = self.low_pass_filter(cutoff_freq);
43 | 
44 |         let new_len = filtered.sample.len() / factor;
45 |         let mut downsampled = Vec::with_capacity(new_len);
46 | 
47 |         for i in (0..filtered.sample.len()).step_by(factor) {
48 |             downsampled.push(filtered.sample[i]);
49 |         }
50 | 
51 |         Sample {
52 |             sample: downsampled,
53 |             sample_rate: self.sample_rate / factor,
54 |         }
55 |     }
56 | 
57 |     pub fn read_mp3(path: &PathBuf) -> Result<Self> {
58 |         let file = File::open(path)?;
59 |         let reader = BufReader::new(file);
60 |         let mut decoder = minimp3::Decoder::new(reader);
61 | 
62 |         let mut mono_samples = Vec::new();
63 |         let mut sampling_rate = 0;
64 | 
65 |         while let Ok(minimp3::Frame {
66 |             data,
67 |             sample_rate,
68 |             channels,
69 |             ..
70 |         }) = decoder.next_frame()
71 |         {
72 |             if sampling_rate == 0 {
73 |                 sampling_rate = sample_rate;
74 |             }
75 | 
76 |             match channels {
77 |                 1 => {
78 |                     mono_samples.extend(data.iter().map(|&s| s as f32));
79 |                 }
80 |                 2 => {
81 |                     let len = data.len() / 2;
82 |                     mono_samples.reserve(len);
83 | 
84 |                     for chunk in data.chunks_exact(2) {
85 |                         let avg = (chunk[0] as f32 + chunk[1] as f32) * 0.5;
86 |                         mono_samples.push(avg);
87 |                     }
88 |                 }
89 |                 _ => panic!("Unsupported number of channels: {}", channels),
90 |             }
91 |         }
92 |         Ok(Sample {
93 |             sample: mono_samples,
94 |             sample_rate: sampling_rate as usize,
95 |         })
96 |     }
97 | }
98 | 


--------------------------------------------------------------------------------
/src/spectrogram.rs:
--------------------------------------------------------------------------------
  1 | use std::f32::consts::PI;
  2 | 
  3 | use microfft::Complex32;
  4 | 
  5 | pub fn hamming_window(samples: &[f32]) -> Vec<f32> {
  6 |     let mut windowed_samples = Vec::with_capacity(samples.len());
  7 |     let n = samples.len() as f32;
  8 |     for (i, sample) in samples.iter().enumerate() {
  9 |         let multiplier = 0.54 - 0.46 * (2.0 * PI * i as f32 / (n - 1.0)).cos();
 10 |         windowed_samples.push(multiplier * sample)
 11 |     }
 12 |     windowed_samples
 13 | }
 14 | 
 15 | #[derive(Debug, Clone, Copy)]
 16 | pub enum WindowSize {
 17 |     S2,
 18 |     S4,
 19 |     S8,
 20 |     S16,
 21 |     S32,
 22 |     S64,
 23 |     S128,
 24 |     S256,
 25 |     S512,
 26 |     S1024,
 27 |     S2048,
 28 |     S4096,
 29 |     S8192,
 30 | }
 31 | 
 32 | impl From<WindowSize> for usize {
 33 |     fn from(window_size: WindowSize) -> Self {
 34 |         match window_size {
 35 |             WindowSize::S2 => 2,
 36 |             WindowSize::S4 => 4,
 37 |             WindowSize::S8 => 8,
 38 |             WindowSize::S16 => 16,
 39 |             WindowSize::S32 => 32,
 40 |             WindowSize::S64 => 64,
 41 |             WindowSize::S128 => 128,
 42 |             WindowSize::S256 => 256,
 43 |             WindowSize::S512 => 512,
 44 |             WindowSize::S1024 => 1024,
 45 |             WindowSize::S2048 => 2048,
 46 |             WindowSize::S4096 => 4096,
 47 |             WindowSize::S8192 => 8192,
 48 |         }
 49 |     }
 50 | }
 51 | 
 52 | impl From<WindowSize> for f32 {
 53 |     fn from(window_size: WindowSize) -> Self {
 54 |         usize::from(window_size) as f32
 55 |     }
 56 | }
 57 | 
 58 | pub fn apply_fft(sample: &[f32], window_size: WindowSize) -> Vec<Complex32> {
 59 |     let size: usize = window_size.into();
 60 |     if sample.len() != size {
 61 |         panic!("Sample length must match window size");
 62 |     }
 63 | 
 64 |     let result = match window_size {
 65 |         WindowSize::S2 => {
 66 |             let mut array = [0.0f32; 2];
 67 |             array.copy_from_slice(&sample[0..2]);
 68 |             let complex_result = microfft::real::rfft_2(&mut array);
 69 |             complex_result.to_vec()
 70 |         }
 71 |         WindowSize::S4 => {
 72 |             let mut array = [0.0f32; 4];
 73 |             array.copy_from_slice(&sample[0..4]);
 74 |             let complex_result = microfft::real::rfft_4(&mut array);
 75 |             complex_result.to_vec()
 76 |         }
 77 |         WindowSize::S8 => {
 78 |             let mut array = [0.0f32; 8];
 79 |             array.copy_from_slice(&sample[0..8]);
 80 |             let complex_result = microfft::real::rfft_8(&mut array);
 81 |             complex_result.to_vec()
 82 |         }
 83 |         WindowSize::S16 => {
 84 |             let mut array = [0.0f32; 16];
 85 |             array.copy_from_slice(&sample[0..16]);
 86 |             let complex_result = microfft::real::rfft_16(&mut array);
 87 |             complex_result.to_vec()
 88 |         }
 89 |         WindowSize::S32 => {
 90 |             let mut array = [0.0f32; 32];
 91 |             array.copy_from_slice(&sample[0..32]);
 92 |             let complex_result = microfft::real::rfft_32(&mut array);
 93 |             complex_result.to_vec()
 94 |         }
 95 |         WindowSize::S64 => {
 96 |             let mut array = [0.0f32; 64];
 97 |             array.copy_from_slice(&sample[0..64]);
 98 |             let complex_result = microfft::real::rfft_64(&mut array);
 99 |             complex_result.to_vec()
100 |         }
101 |         WindowSize::S128 => {
102 |             let mut array = [0.0f32; 128];
103 |             array.copy_from_slice(&sample[0..128]);
104 |             let complex_result = microfft::real::rfft_128(&mut array);
105 |             complex_result.to_vec()
106 |         }
107 |         WindowSize::S256 => {
108 |             let mut array = [0.0f32; 256];
109 |             array.copy_from_slice(&sample[0..256]);
110 |             let complex_result = microfft::real::rfft_256(&mut array);
111 |             complex_result.to_vec()
112 |         }
113 |         WindowSize::S512 => {
114 |             let mut array = [0.0f32; 512];
115 |             array.copy_from_slice(&sample[0..512]);
116 |             let complex_result = microfft::real::rfft_512(&mut array);
117 |             complex_result.to_vec()
118 |         }
119 |         WindowSize::S1024 => {
120 |             let mut array = [0.0f32; 1024];
121 |             array.copy_from_slice(&sample[0..1024]);
122 |             let complex_result = microfft::real::rfft_1024(&mut array);
123 |             complex_result.to_vec()
124 |         }
125 |         WindowSize::S2048 => {
126 |             let mut array = [0.0f32; 2048];
127 |             array.copy_from_slice(&sample[0..2048]);
128 |             let complex_result = microfft::real::rfft_2048(&mut array);
129 |             complex_result.to_vec()
130 |         }
131 |         WindowSize::S4096 => {
132 |             let mut array = [0.0f32; 4096];
133 |             array.copy_from_slice(&sample[0..4096]);
134 |             let complex_result = microfft::real::rfft_4096(&mut array);
135 |             complex_result.to_vec()
136 |         }
137 |         WindowSize::S8192 => {
138 |             let mut array = [0.0f32; 8192];
139 |             array.copy_from_slice(&sample[0..8192]);
140 |             let complex_result = microfft::real::rfft_8192(&mut array);
141 |             complex_result.to_vec()
142 |         }
143 |     };
144 | 
145 |     result
146 | }
147 | 
148 | pub struct FFTWindow {
149 |     pub start_idx: usize,
150 |     pub data: Vec<Complex32>,
151 | }
152 | 
153 | pub fn generate_spectrogram(
154 |     sample: &[f32],
155 |     window_size: WindowSize,
156 |     overlap: usize,
157 | ) -> Vec<FFTWindow> {
158 |     let w_size = window_size.into();
159 |     if overlap >= w_size {
160 |         panic!("overlap size must be less than window size");
161 |     }
162 | 
163 |     let mut start: usize = 0;
164 | 
165 |     let mut spectrogram = Vec::new();
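166 |     // A sample shorter than one window yields no windows (avoids usize underflow below).
167 |     while start < sample.len() && sample.len() >= w_size {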
168 |         if start + w_size > sample.len() {
169 |             start = sample.len() - w_size;
170 |         }
171 |         let window = hamming_window(&sample[start..start + w_size]);
172 |         let fft_res = apply_fft(&window, window_size);
173 | 
174 |         spectrogram.push(FFTWindow {
175 |             start_idx: start,
176 |             data: fft_res,
177 |         });
178 | 
179 |         if start + w_size >= sample.len() {
180 |             break;
181 |         }
182 |         start += w_size - overlap;
183 |     }
184 |     spectrogram
185 | }
186 | 
187 | #[derive(Debug)]
188 | pub struct Peak {
189 |     pub time: f32,
190 |     pub freq: u32,
191 | }
192 | 
193 | pub fn filter_spectrogram(spectrogram: &mut Vec<FFTWindow>, sample_rate: usize) -> Vec<Peak> {
194 |     let bands = [
195 |         (20, 30),
196 |         (30, 40),
197 |         (40, 60),
198 |         (60, 80),
199 |         (80, 120),
200 |         (120, 200),
201 |         (200, 511),
202 |     ];
203 | 
204 |     let mut peaks = Vec::new();
205 | 
206 |     for window in spectrogram.iter() {
207 |         let mut strongest_bins = Vec::with_capacity(bands.len());
208 | 
209 |         for &(start, end) in &bands {
210 |             if start >= window.data.len() { continue; } // band lies outside this window's bins
211 |             let end = end.min(window.data.len() - 1);
212 |             let mut max_magnitude = 0.0;
213 |             let mut max_bin = start;
214 | 
215 |             for bin in start..end {
216 |                 if bin < window.data.len() {
217 |                     let magnitude = window.data[bin].norm_sqr();
218 |                     if magnitude > max_magnitude {
219 |                         max_magnitude = magnitude;
220 |                         max_bin = bin;
221 |                     }
222 |                 }
223 |             }
224 | 
225 |             strongest_bins.push((max_bin, window.data[max_bin]));
226 |         }
227 | 
228 |         let average_magnitude = strongest_bins
229 |             .iter()
230 |             .map(|(_, complex)| complex.norm_sqr())
231 |             .sum::<f32>()
232 |             / strongest_bins.len() as f32;
233 | 
234 |         let threshold = average_magnitude;
235 | 
236 |         for (bin_index, complex) in strongest_bins {
237 |             if complex.norm_sqr() > threshold {
238 |                 let time_in_seconds = window.start_idx as f32 / sample_rate as f32;
239 |                 peaks.push(Peak {
240 |                     time: time_in_seconds,
241 |                     freq: bin_index as u32,
242 |                 });
243 |             }
244 |         }
245 |     }
246 | 
247 |     peaks
248 | }
249 | 


--------------------------------------------------------------------------------
/src/utils.rs:
--------------------------------------------------------------------------------
 1 | use std::cmp::Ordering::{Greater, Less};
 2 | /// Returns one longest non-decreasing subsequence of `a` (equal elements are kept).
 3 | pub fn longest_increasing_subsequence<T>(a: &[T]) -> Vec<T>
 4 | where
 5 |     T: Ord + Clone,
 6 | {
 7 |     if a.is_empty() {
 8 |         return Vec::new();
 9 |     }
10 | 
11 |     let mut m: Vec<usize> = Vec::with_capacity(a.len());
12 |     let mut p: Vec<usize> = vec![0; a.len()];
13 | 
14 |     for i in 0..a.len() {
15 |         match m.binary_search_by(|&j| if a[j] > a[i] { Greater } else { Less }) {
16 |             Ok(pos) => {
17 |                 m[pos] = i;
18 |                 if pos > 0 {
19 |                     p[i] = m[pos - 1];
20 |                 }
21 |             }
22 |             Err(pos) => {
23 |                 if pos > 0 {
24 |                     p[i] = m[pos - 1];
25 |                 }
26 |                 if pos == m.len() {
27 |                     m.push(i);
28 |                 } else {
29 |                     m[pos] = i;
30 |                 }
31 |             }
32 |         }
33 |     }
34 | 
35 |     let mut result = Vec::with_capacity(m.len());
36 |     if !m.is_empty() {
37 |         let mut k = m[m.len() - 1];
38 |         result.push(a[k].clone());
39 | 
40 |         for _ in 0..m.len() - 1 {
41 |             k = p[k];
42 |             result.push(a[k].clone());
43 |         }
44 | 
45 |         result.reverse();
46 |     }
47 | 
48 |     result
49 | }
50 | 
51 | #[cfg(test)]
52 | mod tests {
53 |     use super::*;
54 | 
55 |     #[test]
56 |     fn test_empty() {
57 |         let empty: Vec<i32> = vec![];
58 |         assert_eq!(longest_increasing_subsequence(&empty), vec![]);
59 |     }
60 | 
61 |     #[test]
62 |     fn test_single_element() {
63 |         let single = vec![5];
64 |         assert_eq!(longest_increasing_subsequence(&single), vec![5]);
65 |     }
66 | 
67 |     #[test]
68 |     fn test_all_same() {
69 |         let all_same = vec![3, 3, 3, 3];
70 |         assert_eq!(longest_increasing_subsequence(&all_same), vec![3, 3, 3, 3]);
71 |     }
72 | 
73 |     #[test]
74 |     fn test_strictly_increasing() {
75 |         let increasing = vec![1, 2, 3, 4, 5];
76 |         assert_eq!(
77 |             longest_increasing_subsequence(&increasing),
78 |             vec![1, 2, 3, 4, 5]
79 |         );
80 |     }
81 | }
82 | 


--------------------------------------------------------------------------------