├── .gitignore
├── CHANGELOG.md
├── Cargo.toml
├── LICENSE
├── README.md
├── Rocket.toml
├── examples
│   ├── cached_files
│   │   ├── Cargo.toml
│   │   ├── src
│   │   │   ├── main.rs
│   │   │   └── tests.rs
│   │   └── www
│   │       └── test.txt
│   └── dynamic_files
│       ├── Cargo.toml
│       ├── src
│       │   └── main.rs
│       └── www
│           └── test.txt
├── rustfmt.toml
└── src
    ├── cache.rs
    ├── cache_builder.rs
    ├── cached_file.rs
    ├── in_memory_file.rs
    ├── lib.rs
    ├── named_in_memory_file.rs
    └── priority_function.rs


/.gitignore:
--------------------------------------------------------------------------------
/target/
**/*.rs.bk
Cargo.lock
.idea
**/target/

--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
# 1.0.0-beta
### Misc
* `CacheBuilder::new()` no longer takes a `size_limit: usize` parameter.
The maximum size can now be set with a `size_limit()` function on the builder.
If the size is not set, the cache will assume it has a `usize::MAX` size, meaning that it will never rotate elements out of the cache.
* `Cache::new()` removed. Use `CacheBuilder::new().build().unwrap()` instead.

# 0.12.0
### Features
* Automatically refresh files in the cache based on a specified number of accesses.

### Misc
* `Cache::refresh()` now returns a `CachedFile` instead of a `bool`.
* Responding with a `NamedInMemoryFile` will set the raw body of the response instead of streaming the file.
This should fix a bug regarding displaying estimated download times.

# 0.11.1
### Misc
* Removed `unwrap()`s from the Cache, making sure it will not `panic!`, even under very rare concurrent conditions.
* Improved documentation.
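The automatic refresh added in 0.12.0 can be pictured with a small sketch. It mirrors the check visible in `Cache::get()` in `src/cache.rs`, where a cached file is re-read from disk whenever its access count is a multiple of `accesses_per_refresh`. The helpers `should_refresh` and `refresh_points` are hypothetical, and treating an interval of 0 as "never refresh" is an assumption of this sketch, not documented crate behavior.

```rust
/// Hypothetical helper mirroring the refresh rule in `Cache::get()`:
/// a cached file is re-read from disk whenever its access count is a
/// multiple of `accesses_per_refresh`. `None` disables refreshing.
fn should_refresh(access_count: usize, accesses_per_refresh: Option<usize>) -> bool {
    match accesses_per_refresh {
        // Assumption of this sketch: an interval of 0 means "never",
        // because `x % 0` would panic.
        None | Some(0) => false,
        Some(interval) => access_count % interval == 0,
    }
}

/// Returns the access numbers (1-based) on which a refresh would happen
/// over the first `total_accesses` hits of a single cached file.
fn refresh_points(total_accesses: usize, accesses_per_refresh: Option<usize>) -> Vec<usize> {
    (1..=total_accesses)
        .filter(|&count| should_refresh(count, accesses_per_refresh))
        .collect()
}
```

With `accesses_per_refresh` set to `Some(4)`, the 4th and 8th hits on a file would go back to disk, so an edit made on disk would be picked up within at most four hits.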

# 0.11.0
### Misc
* `Cache::get()` now returns just a `CachedFile` instead of an `Option<CachedFile>`.
* `CachedFile` now handles the error case where the requested file cannot be found, supplanting the need for an `Option`.
* A proper use of the cache now looks like:
```rust
#[get("/<file..>")]
fn files(file: PathBuf, cache: State<Cache>) -> CachedFile {
    CachedFile::open(Path::new("static/").join(file), cache.inner())
}
```

# 0.10.1
### Misc
* Improved documentation.

# 0.10.0
### Misc
* Moved `FileStats` struct into in_memory_file.rs.
* Removed capacity field from `CacheBuilder`.

# 0.9.0
### Misc
* `CacheBuilder::new()` now takes a `usize` indicating what the size limit (bytes) of the cache should be.
* `CacheBuilder::size_limit_bytes()` has been removed.
* Added ability to set the concurrency setting of the cache's backing concurrent HashMaps.
* `NamedInMemoryFile` must now be imported as `use rocket_file_cache::named_in_memory_file::NamedInMemoryFile`.
* `PriorityFunction` type is no longer public.
* Made calculation of priority functions protected against overflowing,
allowing the access count for files in the cache to be safely set to `usize::MAX` by calling `alter_access_count()`,
effectively marking them as always in the cache.
* `Cache::remove()` now returns a `bool`.


# 0.8.0
### Features
* The cache is now fully concurrent.
* This means that the Cache no longer needs to be wrapped in a mutex.
* Performance under heavy loads should be better.
* Calling `cache.get()` will return a ResponderFile that may contain a lock bound to the file in the Cache.


### Misc
* `ResponderFile` is now named `CachedFile`.

### Regressions
* Using this library with pools of caches, as well as falling back to getting files from the FS if the cache is locked, no longer works.
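The overflow protection described in the 0.9.0 entry can be illustrated with saturating arithmetic. The sketch below assumes a priority that grows with both access count and file size; `example_priority`, its square-root weighting, and `aggregate_priority` are hypothetical and are not the functions in `src/priority_function.rs`. The point is only that `saturating_mul` and `saturating_add` clamp at `usize::MAX` instead of panicking (debug builds) or wrapping (release builds), so an access count of `usize::MAX` pins a file at the highest priority.

```rust
/// Hypothetical priority function: access count weighted by the square root
/// of the file size in bytes. Not the crate's actual formula.
fn example_priority(access_count: usize, size: usize) -> usize {
    // Treat empty files as weight 1 so their access count still counts.
    let size_weight = ((size as f64).sqrt() as usize).max(1);
    // Clamp at usize::MAX instead of overflowing.
    access_count.saturating_mul(size_weight)
}

/// Sums the priorities of several cached files, as the eviction logic does
/// when it weighs removing more than one file, without overflowing.
fn aggregate_priority(priorities: &[usize]) -> usize {
    priorities.iter().fold(0usize, |total, &p| total.saturating_add(p))
}
```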

# 0.7.0
### Misc
* Public functions for `Cache` now take `P: AsRef<Path>` instead of `PathBuf` or `&PathBuf`.

# 0.6.2
### Misc
* Fixed how incrementing access counts and updating stats for files in the cache works.
* This should make performance for cache misses better.


# 0.6.1
### Misc
* Improved documentation.


# 0.6.0
### Features
* Added `Cache::alter_access_count()`, which allows the setting of the access count for a given file in the cache.
* This allows for manual control of the priority of a given file.
* Added `Cache::alter_all_access_counts()`, which allows the setting of all access counts for every file monitored by the cache.

### Misc
* `Cache::get()` takes a `&PathBuf` instead of a `PathBuf`.
* Moved `ResponderFile` into its own module.

# 0.5.0
### Misc
* Implemented `ResponderFile` as a replacement for `RespondableFile`.
* `ResponderFile` is implemented as a normal enum, instead of the former tuple struct that wrapped an `Either`.

# 0.4.1
### Misc
* Changed the priority functions to be functions instead of constants.
* Changed `SizedFile` into `InMemoryFile`.

### Bug Fixes
* Fixed a bug related to incrementing access counts and updating stats.

# 0.4.0
### Features
* Return a `RespondableFile`, which wraps either a `rocket::response::NamedFile` or a `CachedFile`.
* This allows the cache to return a NamedFile if it knows that the requested file will not make it into the cache.
* This vastly improves performance on cache misses because creating a CachedFile requires reading the whole file into memory,
and then copying it when setting the response.
Responding with a NamedFile sets the response's body by directly reading the file, which is faster.
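The 0.4.0 entry depends on the cache knowing up front whether a file will make it in. That decision follows the admission rule documented on `Cache` in `src/cache.rs`: remove the lowest-priority files one at a time, summing their priorities, until either enough space is free or the summed priority is no longer below the new file's. Below is a minimal, std-only sketch of that rule, not the crate's implementation. `Entry`, its numeric `id`, and `plan_insert` are hypothetical, and treating a tie as a rejection is an assumption.

```rust
/// A cached file as seen by the eviction logic (hypothetical; the real
/// cache keys entries by `PathBuf`).
#[derive(Debug, Clone, Copy)]
struct Entry {
    id: u32,
    size: usize,
    priority: usize,
}

/// Returns the ids to evict so a new file fits, or `None` if the new file
/// should not be cached (and would be served from disk instead).
fn plan_insert(entries: &[Entry], size_limit: usize, new_size: usize, new_priority: usize) -> Option<Vec<u32>> {
    if new_size > size_limit {
        return None; // Could never fit, no matter what is evicted.
    }
    let used: usize = entries.iter().map(|e| e.size).sum();
    let mut free = size_limit.saturating_sub(used);
    if free >= new_size {
        return Some(Vec::new()); // Fits without evicting anything.
    }

    // Consider the lowest-priority files first.
    let mut candidates: Vec<&Entry> = entries.iter().collect();
    candidates.sort_by_key(|e| e.priority);

    let mut evicted = Vec::new();
    let mut evicted_priority: usize = 0;
    for entry in candidates {
        evicted_priority = evicted_priority.saturating_add(entry.priority);
        // The new file must be strictly more valuable than everything it displaces.
        if evicted_priority >= new_priority {
            return None;
        }
        free += entry.size;
        evicted.push(entry.id);
        if free >= new_size {
            return Some(evicted);
        }
    }
    None
}
```

A `None` result corresponds to the case this entry describes, where the file is returned as a `NamedFile` read from disk instead of being loaded into memory.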

# 0.3.0
### Features
* Added `CacheBuilder` that allows configuring the cache at instantiation.

### Misc
* Split project into multiple files.

# 0.2.2
### Misc
* Added another constructor that takes a priority function, as well as the maximum size.

# 0.2.1
### Misc
* Improved documentation.

# 0.2.0
### Features
* Added priority functions, allowing consumers of the crate to set the algorithm used to determine which files make it into the cache.


# 0.1.0
* Initial publish
--------------------------------------------------------------------------------
/Cargo.toml:
--------------------------------------------------------------------------------
[package]
name = "rocket-file-cache"
version = "1.0.0"
authors = ["Henry Zimmerman "]
exclude = [
]

description = "An in-memory file cache for the Rocket web framework."
readme = "README.md"

homepage = "https://github.com/hgzimmerman/rocket-file-cache"
repository = "https://github.com/hgzimmerman/rocket-file-cache"
documentation = "https://docs.rs/crate/rocket-file-cache"
license = "MIT"

[badges]
# There may be a small push in the near future where this is marked deprecated.
# I am working on abstracting the caching mechanism used in this crate into a more flexible crate,
# from which a non-Rocket-centric web-file-cache can be built.
# When that is finished, I will mark this crate as deprecated and suggest a migration to the new one.
# Expect minor breaking changes.
maintenance = {status = "passively-maintained"}

[dependencies]
rocket = "0.4"
log = "0.4.6"
concurrent-hashmap = "0.2.2"

[dev-dependencies]
tempdir = "0.3.7"
rand = "0.6.4"

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2017 Henry Zimmerman

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
[![Current Crates.io Version](https://img.shields.io/crates/v/rocket-file-cache.svg)](https://crates.io/crates/rocket-file-cache)

# Rocket File Cache
A concurrent, in-memory file cache for the Rocket web framework.

Rocket File Cache can be used as a drop-in replacement for Rocket's `NamedFile` when serving files.

This code from the [static_files](https://github.com/SergioBenitez/Rocket/blob/master/examples/static_files/src/main.rs) example from Rocket:
```rust
#[get("/<file..>")]
fn files(file: PathBuf) -> Option<NamedFile> {
    NamedFile::open(Path::new("static/").join(file)).ok()
}

fn main() {
    rocket::ignite().mount("/", routes![files]).launch();
}
```
can be sped up by getting files through a cache instead:
```rust
#[get("/<file..>")]
fn files(file: PathBuf, cache: State<Cache>) -> CachedFile {
    CachedFile::open(Path::new("static/").join(file), cache.inner())
}


fn main() {
    let cache: Cache = CacheBuilder::new()
        .size_limit(1024 * 1024 * 40) // 40 MB
        .build()
        .unwrap();

    rocket::ignite()
        .manage(cache)
        .mount("/", routes![files])
        .launch();
}
```


# Use case
Rocket File Cache keeps a set of frequently accessed files in memory so your web server doesn't have to wait for the disk to read them.
This should improve latency and throughput on systems that are bottlenecked on disk I/O.

If you are serving a known, fixed set of static files (index.html, a JS bundle, a couple of assets),
you should try to set the maximum size of the cache so that they all fit,
especially if all of them are served every time someone visits your website.

If you serve static files with a larger aggregate size than would comfortably fit into memory,
but some content is visited more often than the rest, you should give the cache enough space
for the most popular content to fit.
If your popular content changes over time, and you want the cache to reflect what is currently most popular,
you can use the `alter_all_access_counts()` method to reduce the access count of every file the cache tracks,
making it easier for newer content to find its way into the cache.


If you serve user-created files, the same logic regarding file popularity applies,
except that you may want to spawn a thread every 10,000 or so requests that uses `alter_all_access_counts()`
to reduce the access counts of the items in the cache.

### Performance

The bench tests try to get the file from whatever source, either cache or filesystem, and read it once into memory.
The misses measure the time it takes for the cache to realize that the file is not stored, and to read the file from disk.
Running the bench tests on an AWS EC2 t2.micro instance (82 MB/s HDD) returned these results:
```
test cache::tests::cache_get_10mb                       ... bench:   1,444,068 ns/iter (+/- 251,467)
test cache::tests::cache_get_1mb                        ... bench:      79,397 ns/iter (+/- 4,613)
test cache::tests::cache_get_1mb_from_1000_entry_cache  ... bench:      79,038 ns/iter (+/- 1,751)
test cache::tests::cache_get_5mb                        ... bench:     724,262 ns/iter (+/- 7,751)
test cache::tests::cache_miss_10mb                      ... bench:   3,184,473 ns/iter (+/- 299,657)
test cache::tests::cache_miss_1mb                       ... bench:     806,821 ns/iter (+/- 19,731)
test cache::tests::cache_miss_1mb_from_1000_entry_cache ... bench:   1,379,925 ns/iter (+/- 25,118)
test cache::tests::cache_miss_5mb                       ... bench:   1,542,059 ns/iter (+/- 27,063)
test cache::tests::cache_miss_5mb_from_1000_entry_cache ... bench:   2,090,871 ns/iter (+/- 37,040)
test cache::tests::in_memory_file_read_10mb             ... bench:   7,222,402 ns/iter (+/- 596,325)
test cache::tests::named_file_read_10mb                 ... bench:   4,908,544 ns/iter (+/- 581,408)
test cache::tests::named_file_read_1mb                  ... bench:     893,447 ns/iter (+/- 18,354)
test cache::tests::named_file_read_5mb                  ... bench:   1,605,741 ns/iter (+/- 41,418)
```

On a server with slow disk reads, cache hits for small files are much faster than reading from disk:
a 1 MB hit takes about 79 µs versus about 893 µs for a `NamedFile` read, roughly 11x faster.
Larger files also benefit, although to a lesser degree (about 2.2x for 5 MB and 3.4x for 10 MB).
Minimum and maximum file sizes can be set to keep files in the cache within size bounds.

For queries that retrieve an entry from the cache, each additional file in the cache adds no measurable time penalty
(compare `cache_get_1mb` with `cache_get_1mb_from_1000_entry_cache`).
The more items in the cache, however, the larger the time penalty for a cache miss.



### Requirements
* Rocket >= 0.3.6
* Nightly Rust


# Notes
If you have any feature requests, notice any bugs, or find anything in the documentation unclear, please open an issue and I will respond ASAP.

Development on this crate has slowed, but you should expect a couple more breaking changes before this reaches a 1.0.0 release.
You can keep up to date with these changes with the [changelog](CHANGELOG.md).

# Alternatives
* [Nginx](http://nginx.org/)
* Write your own.
Most of the work here focuses on when to replace items in the cache.
If you know that you will never grow or shrink your cache of files, all you need is a
`Mutex<HashMap<PathBuf, Vec<u8>>>`, an `impl Responder<'static> for Vec<u8> {...}`, and some glue logic
to hold your files in memory and serve them as responses.
* Rely on setting the Cache-Control HTTP header to cause the files to be cached in end users' browsers
and in the internet infrastructure between them and the server. This strategy can be used in conjunction with
Rocket File Cache as well.
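For the "write your own" route, the in-memory half can be as small as the sketch below. It uses only the standard library and leaves out the Rocket `Responder` glue; `FixedFileCache` and its methods are hypothetical names. It never evicts anything, which matches the assumption in the bullet above that the set of cached files never grows or shrinks.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// Hypothetical fixed-content cache: each file is read from disk once and
/// then kept in memory forever. There is no eviction logic at all.
struct FixedFileCache {
    files: Mutex<HashMap<PathBuf, Vec<u8>>>,
}

impl FixedFileCache {
    fn new() -> Self {
        FixedFileCache { files: Mutex::new(HashMap::new()) }
    }

    /// Returns the file's bytes, reading it from disk only on first access.
    fn get<P: AsRef<Path>>(&self, path: P) -> io::Result<Vec<u8>> {
        let path = path.as_ref().to_path_buf();
        let mut files = self.files.lock().expect("cache mutex poisoned");
        if let Some(bytes) = files.get(&path) {
            return Ok(bytes.clone()); // Hit: served from memory.
        }
        let bytes = fs::read(&path)?; // Miss: read once, then remember it.
        files.insert(path, bytes.clone());
        Ok(bytes)
    }
}
```

Each hit clones the bytes; storing `Arc<Vec<u8>>` in the map instead would let hits share one allocation. The mutex is also held during the disk read on a miss, which serializes misses, but for a small fixed set of files that are all requested early this is unlikely to matter.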

--------------------------------------------------------------------------------
/Rocket.toml:
--------------------------------------------------------------------------------
[development]
address = "localhost"
port = 8001
workers = 2
log = "normal"
limits = { forms = 32768 }

# Use this for running bench tests to hide logs
[staging]
address = "0.0.0.0"
port = 80
workers = 2
log = "critical"
limits = { forms = 32768 }

--------------------------------------------------------------------------------
/examples/cached_files/Cargo.toml:
--------------------------------------------------------------------------------
[package]
name = "cached_files"
version = "0.0.0"

[dependencies]
rocket-file-cache = { path = "../../" }
rocket = "0.3.6"
rocket_codegen = "0.3.6"

--------------------------------------------------------------------------------
/examples/cached_files/src/main.rs:
--------------------------------------------------------------------------------
#![feature(plugin, decl_macro)]
#![plugin(rocket_codegen)]

extern crate rocket;
extern crate rocket_file_cache;

#[cfg(test)]
mod tests;

use rocket_file_cache::{Cache, CacheBuilder, CachedFile};
use std::path::{Path, PathBuf};
use rocket::State;
use rocket::Rocket;


#[get("/<file..>")]
fn files(file: PathBuf, cache: State<Cache>) -> CachedFile {
    CachedFile::open(Path::new("www/").join(file), cache.inner())
}


fn main() {
    rocket().launch();
}

fn rocket() -> Rocket {
    let cache: Cache = CacheBuilder::new()
        .size_limit(1024 * 1024 * 40) // 40 MB
        .build()
        .unwrap();
    rocket::ignite()
        .manage(cache)
        .mount("/", routes![files])
}
--------------------------------------------------------------------------------
/examples/cached_files/src/tests.rs:
--------------------------------------------------------------------------------
use std::fs::File;
use std::io::Read;

use rocket::local::Client;
use rocket::http::Status;

use super::rocket;

fn test_query_file<T>(path: &str, file: T, status: Status)
    where T: Into<Option<&'static str>>
{
    let client = Client::new(rocket()).unwrap();
    let mut response = client.get(path).dispatch();
    assert_eq!(response.status(), status);

    let body_data = response.body().and_then(|body| body.into_bytes());
    if let Some(filename) = file.into() {
        let expected_data = read_file_content(filename);
        assert!(body_data.map_or(false, |s| s == expected_data));
    }
}

fn read_file_content(path: &str) -> Vec<u8> {
    let mut fp = File::open(&path).expect(&format!("Can not open {}", path));
    let mut file_content = vec![];

    fp.read_to_end(&mut file_content).expect(&format!("Reading {} failed.", path));
    file_content
}

#[test]
fn test_get_file() {
    test_query_file("/test.txt", "www/test.txt", Status::Ok);
    test_query_file("/test.txt?v=1", "www/test.txt", Status::Ok);
    test_query_file("/test.txt?this=should&be=ignored", "www/test.txt", Status::Ok);
}
--------------------------------------------------------------------------------
/examples/cached_files/www/test.txt:
--------------------------------------------------------------------------------
Hello World!
--------------------------------------------------------------------------------
/examples/dynamic_files/Cargo.toml:
--------------------------------------------------------------------------------
[package]
name = "dynamic_files"
version = "0.0.0"

[dependencies]
rocket-file-cache = { path = "../../" }
rocket = "0.3.6"
rocket_codegen = "0.3.6"
--------------------------------------------------------------------------------
/examples/dynamic_files/src/main.rs:
--------------------------------------------------------------------------------
#![feature(plugin, decl_macro)]
#![plugin(rocket_codegen)]

extern crate rocket;
extern crate rocket_file_cache;

use rocket_file_cache::{Cache, CacheBuilder, CachedFile};
use std::path::{Path, PathBuf};
use std::fs;
use std::io;
use rocket::State;
use rocket::Data;


#[get("/<file..>")]
fn files(file: PathBuf, cache: State<Cache>) -> CachedFile {
    let path: PathBuf = Path::new("www/").join(file).to_owned();
    cache.inner().get(&path) // Getting the file will add it to the cache if its priority earns it a place.
}

#[post("/<file..>", data = "<data>")]
fn upload(file: PathBuf, data: Data) -> io::Result<String> {
    let path: PathBuf = Path::new("www/").join(file).to_owned();
    if path.exists() {
        return Err(io::Error::new(io::ErrorKind::AlreadyExists, "File at path already exists."))
    }
    data.stream_to_file(path).map(|n| n.to_string())
}

#[put("/<file..>", data = "<data>")]
fn update(file: PathBuf, data: Data, cache: State<Cache>) -> io::Result<String> {
    let path: PathBuf = Path::new("www/").join(file).to_owned();
    let result = data.stream_to_file(path.clone()).map(|n| n.to_string());

    cache.refresh(&path); // Make sure the file in the cache is updated to reflect the FS.
    result
}

#[delete("/<file..>")]
fn remove(file: PathBuf, cache: State<Cache>) {
    let path: PathBuf = Path::new("www/").join(file).to_owned();
    let _ = fs::remove_file(&path); // Remove the file from the FS.
    {
        cache.remove(&path); // Remove the file from the cache.
    }
    {
        // Reset the count to 0 so if a file with the same name is added in the future,
        // it won't immediately have the same priority as the file that was deleted here.
        cache.alter_access_count(&path, |_x| 0);
    }
}


fn main() {
    let cache: Cache = CacheBuilder::new()
        .size_limit(1024 * 1024 * 20) // 20 MB
        .build()
        .unwrap();
    rocket::ignite()
        .manage(cache)
        .mount("/", routes![files, remove, upload, update])
        .launch();
}
--------------------------------------------------------------------------------
/examples/dynamic_files/www/test.txt:
--------------------------------------------------------------------------------
Hello World!
--------------------------------------------------------------------------------
/rustfmt.toml:
--------------------------------------------------------------------------------
verbose = false
disable_all_formatting = false
skip_children = false
max_width = 250
error_on_line_overflow = true
tab_spaces = 4
fn_call_width = 60
struct_lit_width = 18
struct_variant_width = 35
force_explicit_abi = true
newline_style = "Unix"
fn_brace_style = "SameLineWhere"
item_brace_style = "SameLineWhere"
control_style = "Rfc"
control_brace_style = "AlwaysSameLine"
impl_empty_single_line = true
trailing_comma = "Vertical"
fn_empty_single_line = true
fn_single_line = false
fn_return_indent = "WithArgs"
fn_args_paren_newline = false
fn_args_density = "Tall"
fn_args_layout = "Block"
array_layout = "Block"
array_width = 60
type_punctuation_density = "Wide"
where_style = "Rfc"
where_density = "CompressedIfEmpty"
where_layout = "Vertical"
where_pred_indent = "Visual"
generics_indent = "Block"
struct_lit_style = "Block"
struct_lit_multiline_style = "PreferSingle"
fn_call_style = "Block"
report_todo = "Never"
report_fixme = "Never"
chain_indent = "Block"
chain_one_line_max = 60
chain_split_single_child = false
reorder_imports = false
reorder_imports_in_group = false
reorder_imported_names = false
single_line_if_else_max_width = 50
format_strings = false
force_format_strings = false
take_source_hints = false
hard_tabs = false
wrap_comments = false
comment_width = 160
normalize_comments = false
wrap_match_arms = true
match_block_trailing_comma = false
indent_match_arms = true
closure_block_indent_threshold = 7
space_before_type_annotation = false
space_after_type_annotation_colon = true
space_before_struct_lit_field_colon = false
space_after_struct_lit_field_colon = true
space_before_bound = false
space_after_bound_colon = true
spaces_around_ranges = false
spaces_within_angle_brackets = false
spaces_within_square_brackets = false
spaces_within_parens = false
use_try_shorthand = false
write_mode = "Replace"
condense_wildcard_suffixes = false
combine_control_expr = true

--------------------------------------------------------------------------------
/src/cache.rs:
--------------------------------------------------------------------------------
use std::path::{PathBuf, Path};
use std::usize;
use rocket::response::NamedFile;
use std::fs::Metadata;
use std::fs;
use named_in_memory_file::NamedInMemoryFile;
use cached_file::CachedFile;
use in_memory_file::InMemoryFile;
use concurrent_hashmap::ConcHashMap;
use std::collections::hash_map::RandomState;
use std::fmt::Debug;
use std::fmt;
use std::fmt::Formatter;
use in_memory_file::FileStats;

#[derive(Debug, PartialEq)]
enum CacheError {
    NoMoreFilesToRemove,
    NewPriorityIsNotHighEnough,
    InvalidMetadata,
    InvalidPath,
}



/// The cache holds a number of files whose bytes fit into its size_limit.
/// The cache acts as a proxy to the filesystem, returning cached files if they are in the cache,
/// or reading a file directly from the filesystem if the file is not in the cache.
///
/// When the cache is full, each file in the cache will have a priority score determined by a provided
/// priority function.
/// When a call to `get()` is made, an access counter for the file in question is incremented,
/// usually (depending on the supplied priority function) increasing the priority score of the file.
/// When a new file is about to be stored, the cache calculates the new file's priority score and
/// compares it against the score of the file with the lowest priority in the cache.
/// If the new file's priority is higher, then the file in the cache will be removed and replaced
/// with the new file.
/// If removing the first file doesn't free up enough space for the new file, then the file with the
/// next lowest priority will have its priority added to the removed file's, and this aggregate
/// priority will be tested against the new file's.
///
/// This will repeat until either enough space can be freed for the new file, and the new file is
/// inserted, or until the aggregate priority of the cached files is greater than that of the new file,
/// in which case the new file isn't inserted.
pub struct Cache {
    /// The number of bytes the file_map should be able to hold at once.
    pub(crate) size_limit: usize,
    /// The minimum number of bytes a file must have in order to be accepted into the Cache.
    pub(crate) min_file_size: usize,
    /// The maximum number of bytes a file can have in order to be accepted into the Cache.
    pub(crate) max_file_size: usize,
    /// The function that is used to calculate the priority score that is used to determine which files should be in the cache.
    pub(crate) priority_function: fn(usize, usize) -> usize,
    /// If a given file's access count modulo this value equals 0, then that file will be refreshed from the filesystem instead of being served from the Cache.
    pub(crate) accesses_per_refresh: Option<usize>,
    pub(crate) file_map: ConcHashMap<PathBuf, InMemoryFile, RandomState>, // Holds the files that the cache is caching
    pub(crate) access_count_map: ConcHashMap<PathBuf, usize, RandomState>, // Every file that is accessed will have the number of times it is accessed logged in this map.
}


impl Debug for Cache {
    fn fmt(&self, fmt: &mut Formatter) -> fmt::Result {
        fmt.debug_map()
            .entries(self.file_map.iter().map(
                |(ref k, ref v)| (k.clone(), v.clone()),
            ))
            .finish()
    }
}

impl Cache {

    /// Either gets the file from the cache if it exists there, gets it from the filesystem and
    /// tries to cache it, or fails to find the file.
    ///
    /// The CachedFile that is returned takes a lock out on that file in the cache, if that file happens to exist in the cache.
    /// This lock will be released when the CachedFile goes out of scope.
    ///
    /// # Arguments
    ///
    /// * `path` - A path that represents the path of the file in the filesystem. The path
    /// also acts as a key for the file in the cache.
    /// The path will be used to find a cached file in the cache or find a file in the filesystem if
    /// an entry in the cache doesn't exist.
    ///
    /// # Example
    ///
    /// ```
    /// #![feature(attr_literals)]
    /// #![feature(custom_attribute)]
    /// # extern crate rocket;
    /// # extern crate rocket_file_cache;
    ///
    /// # fn main() {
    /// use rocket_file_cache::{Cache, CachedFile};
    /// use std::path::{Path, PathBuf};
    /// use rocket::State;
    ///
    ///
    /// #[get("/<file..>")]
    /// fn files<'a>(file: PathBuf, cache: State<'a, Cache>) -> CachedFile<'a> {
    ///     let path: PathBuf = Path::new("www/").join(file).to_owned();
    ///     cache.inner().get(path)
    /// }
    /// # }
    /// ```
    pub fn get<P: AsRef<Path>>(&self, path: P) -> CachedFile {
        trace!("{:#?}", self);
        // First, try to get the file in the cache that corresponds to the desired path.

        if self.contains_key(&path.as_ref().to_path_buf()) {
            // File is in the cache, increment the count, update the stats attached to the cache entry.
            self.increment_access_count(&path);
            self.update_stats(&path);

            // See if the file should be refreshed
            if let Some(accesses_per_refresh) = self.accesses_per_refresh {
                match self.access_count_map.find(&path.as_ref().to_path_buf()) {
                    Some(a) => {
                        let access_count: usize = a.get().clone();
                        // If the access count is a multiple of the refresh parameter, then refresh the file.
                        if access_count % accesses_per_refresh == 0 {
                            debug!("Refreshing entry for path: {:?}", path.as_ref());
                            return self.refresh(path.as_ref())
                        }
                    }
                    None => warn!("Cache contains entry for {:?}, but does not track its access counts.", path.as_ref())
                }
            }

        } else {
            return self.try_insert(path);
        }

        self.get_from_cache(&path)
    }


    /// If a file has changed on disk, the cache will not automatically know that a change has occurred.
    /// Calling this function will check if the file exists, read the new file into memory,
    /// replace the old file, and update the priority score to reflect the new size of the file.
    ///
    /// # Arguments
    ///
    /// * `path` - A path that represents the path of the file in the filesystem, and the key to
    /// the file in the cache.
    /// The path will be used to find the new file in the filesystem and to find the old file to replace in
    /// the cache.
    ///
    /// # Return
    ///
    /// The CachedFile will indicate NotFound if the file isn't already in the cache or if it can't
    /// be found in the filesystem.
    /// It will otherwise return a CachedFile::InMemory variant.
    pub fn refresh<P: AsRef<Path>>(&self, path: P) -> CachedFile {

        let mut is_ok_to_refresh: bool = false;

        // Check if the file exists in the cache
        if self.contains_key(&path.as_ref().to_path_buf()) {
            // See if the new file exists.
            let path_string: String = match path.as_ref().to_str() {
                Some(s) => String::from(s),
                None => return CachedFile::NotFound,
            };
            if let Ok(metadata) = fs::metadata(path_string.as_str()) {
                if metadata.is_file() {
                    // If the entry for the old file exists
                    if self.file_map.find(&path.as_ref().to_path_buf()).is_some() {
                        is_ok_to_refresh = true;
                    }
                }
            };
        }

        if is_ok_to_refresh {
            if let Ok(new_file) = InMemoryFile::open(path.as_ref().to_path_buf()) {
                debug!("Refreshing file: {:?}", path.as_ref());
                {
                    self.file_map.remove(&path.as_ref().to_path_buf());
                    self.file_map.insert(path.as_ref().to_path_buf(), new_file);
                }
                self.update_stats(&path);

                return self.get_from_cache(path)
            }
        }

        CachedFile::NotFound
    }

    /// Removes the file from the cache.
    /// This will not reset the access count, so the next time the file is accessed, it may be added straight
    /// back into the cache with its previous priority.
    /// The access count has to be reset separately using `alter_access_count()`.
    ///
    /// # Arguments
    ///
    /// * `path` - A path that acts as a key to look up the file that should be removed from the cache.
    ///
    /// # Example
    ///
    /// ```
    /// use rocket_file_cache::{Cache, CacheBuilder};
    /// use std::path::PathBuf;
    ///
    /// let cache = CacheBuilder::new().build().unwrap();
    /// let pathbuf = PathBuf::new();
    /// cache.remove(&pathbuf);
    /// assert!(cache.contains_key(&pathbuf) == false);
    /// ```
    pub fn remove<P: AsRef<Path>>(&self, path: P) -> bool {
        self.file_map.remove(&path.as_ref().to_path_buf()).is_some()
    }

    /// Returns a boolean indicating if the cache has an entry corresponding to the given key.
/// 223 | /// # Arguments 224 | /// 225 | /// * `path` - A path that is used as a key to look up the file. 226 | /// 227 | /// # Example 228 | /// 229 | /// ``` 230 | /// use rocket_file_cache::{CacheBuilder}; 231 | /// use std::path::PathBuf; 232 | /// 233 | /// let mut cache = CacheBuilder::new().build().unwrap(); 234 | /// let pathbuf: PathBuf = PathBuf::new(); 235 | /// cache.get(&pathbuf); 236 | /// assert!(cache.contains_key(&pathbuf) == false); 237 | /// ``` 238 | pub fn contains_key<P: AsRef<Path>>(&self, path: P) -> bool { 239 | self.file_map.find(&path.as_ref().to_path_buf()).is_some() 240 | } 241 | 242 | /// Alters the access count value of one file in the access_count_map. 243 | /// # Arguments 244 | /// 245 | /// * `path` - The key to look up the file. 246 | /// * `alter_count_function` - A function that determines how to alter the access_count for the file. 247 | /// 248 | /// # Example 249 | /// 250 | /// ``` 251 | /// use rocket_file_cache::{Cache, CacheBuilder}; 252 | /// use std::path::PathBuf; 253 | /// 254 | /// let mut cache = CacheBuilder::new().build().unwrap(); 255 | /// let pathbuf = PathBuf::new(); 256 | /// cache.get(&pathbuf); // Add a file to the cache 257 | /// cache.remove(&pathbuf); // Removing the file will not reset its access count. 258 | /// cache.alter_access_count(&pathbuf, | x | { 0 }); // Set the access count to 0. 259 | /// ``` 260 | /// 261 | pub fn alter_access_count<P: AsRef<Path>>(&self, path: P, alter_count_function: fn(&usize) -> usize) -> bool { 262 | let new_count: usize; 263 | { 264 | match self.access_count_map.find(&path.as_ref().to_path_buf()) { 265 | Some(access_count_entry) => { 266 | new_count = alter_count_function(&access_count_entry.get()); 267 | } 268 | None => return false, // Can't update a file whose access count is not tracked.
269 | } 270 | } 271 | { 272 | self.access_count_map.insert( 273 | path.as_ref().to_path_buf(), 274 | new_count, 275 | ); 276 | } 277 | self.update_stats(&path); 278 | return true; 279 | } 280 | 281 | /// Alters the access count value of every file in the access_count_map. 282 | /// This is useful for manually aging-out entries in the cache. 283 | /// 284 | /// # Arguments 285 | /// 286 | /// * `alter_count_function` - A function that determines how to alter the access_count for the file. 287 | /// 288 | /// # Example 289 | /// 290 | /// ``` 291 | /// use rocket_file_cache::{Cache, CacheBuilder}; 292 | /// use std::path::PathBuf; 293 | /// 294 | /// let mut cache = CacheBuilder::new().build().unwrap(); 295 | /// let pathbuf = PathBuf::new(); 296 | /// let other_pathbuf = PathBuf::new(); 297 | /// cache.get(&pathbuf); 298 | /// cache.get(&other_pathbuf); 299 | /// // Reduce all access counts by half, 300 | /// // allowing newer files to enter the cache more easily. 301 | /// cache.alter_all_access_counts(| x | { x / 2 }); 302 | /// ``` 303 | /// 304 | pub fn alter_all_access_counts(&self, alter_count_function: fn(&usize) -> usize) { 305 | let all_counts: Vec; 306 | { 307 | all_counts = self.access_count_map 308 | .iter() 309 | .map(|x: (&PathBuf, &usize)| x.0.clone()) 310 | .collect(); 311 | } 312 | for pathbuf in all_counts { 313 | self.alter_access_count(&pathbuf, alter_count_function); 314 | } 315 | 316 | } 317 | 318 | /// Gets the sum of the sizes of the files that are stored in the cache. 319 | /// 320 | /// # Example 321 | /// 322 | /// ``` 323 | /// use rocket_file_cache::{Cache, CacheBuilder}; 324 | /// 325 | /// let cache = CacheBuilder::new().build().unwrap(); 326 | /// assert!(cache.used_bytes() == 0); 327 | /// ``` 328 | pub fn used_bytes(&self) -> usize { 329 | self.file_map.iter().fold( 330 | 0usize, 331 | |size, x| size + x.1.stats.size, 332 | ) 333 | } 334 | 335 | /// Gets the size of the file from the file's metadata. 
/// This avoids having to read the file into memory in order to get the file size. 337 | fn get_file_size_from_metadata<P: AsRef<Path>>(path: P) -> Result<usize, CacheError> { 338 | let path_string: String = match path.as_ref().to_str() { 339 | Some(s) => String::from(s), 340 | None => return Err(CacheError::InvalidPath), 341 | }; 342 | let metadata: Metadata = match fs::metadata(path_string.as_str()) { 343 | Ok(m) => m, 344 | Err(_) => return Err(CacheError::InvalidMetadata), 345 | }; 346 | let size: usize = metadata.len() as usize; 347 | Ok(size) 348 | } 349 | 350 | 351 | /// Attempt to store a given file in the cache. 352 | /// Storing will fail if the files that would have to be evicted have, in aggregate, a higher priority than the file being added. 353 | /// If the provided file has a higher priority than the files it would displace, 354 | /// but the cache is full, those files will be removed from the cache to make room 355 | /// for the new file. 356 | /// 357 | /// If the insertion works, the cache will update the priority score for the file being inserted. 358 | /// The cached priority score requires the file in question to exist in the file map, so it will 359 | /// have a size to use when calculating. 360 | /// 361 | /// It first reads the size of the file to be inserted from its metadata, 362 | /// and uses this size to check whether the file could be inserted. 363 | /// If it can be inserted, it reads the file into memory, stores a copy of the in-memory 364 | /// file behind a pointer, and constructs a CachedFile to return. 365 | /// 366 | /// If the file can't be added, it will open a NamedFile and construct a CachedFile from that, 367 | /// and return it. 368 | /// This means that it doesn't need to read the whole file into memory before reading through it 369 | /// again to set the response body. 370 | /// Avoiding this double read keeps the performance of cache misses on par 371 | /// with reading the file without a cache.
/// 373 | /// 374 | /// # Arguments 375 | /// 376 | /// * `path` - The path of the file to be stored. Acts as a key for the file in the cache. Is used 377 | /// to look up the location of the file in the filesystem if the file is not in the cache. 378 | /// 379 | /// 380 | fn try_insert<P: AsRef<Path>>(&self, path: P) -> CachedFile { 381 | let path: PathBuf = path.as_ref().to_path_buf(); 382 | trace!("Trying to insert file {:?}", path); 383 | 384 | // If the FS can read metadata for a file, then the file exists, and it should be safe to increment 385 | // the access_count and update. 386 | let size: usize = match Cache::get_file_size_from_metadata(&path) { 387 | Ok(size) => size, 388 | Err(_) => return CachedFile::NotFound // Could not open file to read metadata. 389 | }; 390 | 391 | // Bytes the cache would hold if the new file were added. Saturating usize arithmetic is used because 392 | // casting size_limit to isize wraps to -1 when it is usize::MAX (the default when no limit is set). 393 | let used_bytes_with_new_file: usize = self.used_bytes().saturating_add(size); 394 | // Number of bytes that need to be freed to make room for the new file (0 if it already fits). 395 | let required_space_for_new_file: usize = used_bytes_with_new_file.saturating_sub(self.size_limit); 396 | if size > self.max_file_size || size < self.min_file_size { 397 | self.get_file_from_fs(&path) 398 | } else if used_bytes_with_new_file < self.size_limit && size < self.size_limit { 399 | self.get_file_from_fs_and_add_to_cache(&path) 400 | } else { 401 | debug!("Trying to make room for the file"); 402 | 403 | // Because the size was gotten from the file's metadata, we know that it exists, 404 | // so it's fine to increment its access count. 405 | self.increment_access_count(&path); 406 | 407 | // The access_count should have incremented since the last time this was called, so the priority must be recalculated.
408 | // Also, the size generally 409 | let new_file_priority: usize; 410 | { 411 | let new_file_access_count: &usize = match self.access_count_map.find(&path) { 412 | Some(access_count) => &access_count.get(), 413 | None => &1, 414 | }; 415 | new_file_priority = (self.priority_function)(new_file_access_count.clone(), size); 416 | } 417 | 418 | 419 | match self.make_room_for_new_file(required_space_for_new_file as usize, new_file_priority) { 420 | Ok(files_to_be_removed) => { 421 | debug!("Made room for new file"); 422 | match InMemoryFile::open(path.as_path()) { 423 | Ok(file) => { 424 | 425 | // We have read a new file into memory, it is safe to 426 | // remove the old files. 427 | for file_key in files_to_be_removed { 428 | // The file was accessed with this key earlier when sorting priorities, which should make removal safe. 429 | match self.file_map.remove(&file_key) { 430 | Some(_) => {}, 431 | None => warn!("Likely due to concurrent mutations, a file being removed from the cache was not found because another thread removed it first.") 432 | }; 433 | } 434 | 435 | self.file_map.insert(path.clone(), file); 436 | self.update_stats(&path); 437 | 438 | let cache_file_accessor = match self.file_map.find(&path) { 439 | Some(accessor_to_file) => accessor_to_file, 440 | None => { 441 | // If a concurrent remove operation removes the file before 442 | // it can be gotten via an accessor lock, recursively try to add 443 | // the file to the Cache until the lock can be attained. 444 | 445 | // Because this action takes place after room was made for 446 | // the new file in the cache, those files will be left out of the cache. 447 | warn!("Tried to add file to cache, but it was removed before it could be added. 
Attempting to insert file again."); 448 | // Because this recursion only occurs under extremely rare 449 | // circumstances due to concurrent removal of the file being 450 | // added between the insertion into the map, and locking an 451 | // accessor, a stack overflow is almost impossible. This would require 452 | // the file to be removed on every recursive attempt to re-insert it, 453 | // with the exact same timing required to invalidate the `find()` method, 454 | // for as many times as it takes to fill up the stack. It's not 455 | // going to happen. 456 | return self.try_insert(path); 457 | } 458 | }; 459 | 460 | let named_in_memory_file: NamedInMemoryFile = NamedInMemoryFile::new( 461 | path.clone(), 462 | cache_file_accessor 463 | ); 464 | 465 | return CachedFile::from(named_in_memory_file); 466 | } 467 | Err(_) => return CachedFile::NotFound 468 | } 469 | } 470 | Err(_) => { 471 | debug!("The file does not have enough priority or is too large to be accepted into the cache."); 472 | // The new file would not be accepted by the cache, so instead of reading the whole file 473 | // into memory, and then copying it yet again when it is attached to the body of the 474 | // response, use a NamedFile instead. 475 | match NamedFile::open(path.clone()) { 476 | Ok(named_file) => CachedFile::from(named_file), 477 | Err(_) => CachedFile::NotFound, 478 | } 479 | } 480 | } 481 | } 482 | } 483 | 484 | /// Gets a file from the filesystem and converts it to a CachedFile. 485 | /// 486 | /// This should be used when the cache knows that the new file won't make it into the cache. 
487 | fn get_file_from_fs<P: AsRef<Path>>(&self, path: P) -> CachedFile { 488 | debug!("File does not fit size constraints of the cache."); 489 | match NamedFile::open(path.as_ref().to_path_buf()) { 490 | Ok(named_file) => { 491 | self.increment_access_count(path); 492 | return CachedFile::from(named_file); 493 | } 494 | Err(_) => return CachedFile::NotFound 495 | } 496 | } 497 | 498 | /// Reads a file from the filesystem into memory and stores it in the cache. 499 | /// 500 | /// This is the slowest operation the cache can perform, slower than just getting the file. 501 | /// It should only be used when the cache decides to store the file. 502 | fn get_file_from_fs_and_add_to_cache<P: AsRef<Path>>(&self, path: P) -> CachedFile { 503 | debug!("Cache has room for the file."); 504 | match InMemoryFile::open(&path) { 505 | Ok(file) => { 506 | self.file_map.insert(path.as_ref().to_path_buf(), file); 507 | 508 | self.increment_access_count(&path); 509 | self.update_stats(&path); 510 | 511 | let cache_file_accessor = match self.file_map.find(path.as_ref()) { 512 | Some(accessor_to_file) => accessor_to_file, 513 | None => { 514 | // If for whatever reason, a concurrent remove operation removes the file 515 | // before it can be gotten via an accessor lock, recursively try to add 516 | // the file to the Cache until the lock can be attained. 517 | warn!("Tried to add file to cache, but it was removed before it could be added. Attempting to get file again."); 518 | // Because this recursion only occurs under extremely rare circumstances 519 | // due to a concurrent removal of the file being added between the insertion 520 | // into the map, and locking an accessor, a stack overflow is almost impossible.
521 | return self.get_file_from_fs_and_add_to_cache(path); 522 | } 523 | }; 524 | 525 | let cached_file: NamedInMemoryFile = NamedInMemoryFile::new( 526 | path.as_ref().to_path_buf(), 527 | cache_file_accessor 528 | ); 529 | 530 | return CachedFile::from(cached_file); 531 | } 532 | Err(_) => return CachedFile::NotFound, 533 | } 534 | } 535 | 536 | 537 | 538 | /// Find the n lowest-priority files that would need to be removed to free `required_space` bytes for a new file. 539 | /// 540 | /// If this returns `Ok`, it contains the keys of those files. Nothing has been removed yet; the caller is responsible for removing them from the file_map. 541 | /// If this returns an `Err`, then either not enough space could be freed, or the combined priority of 542 | /// the files that would need to be freed to make room for the new file is greater than the 543 | /// new file's priority. In either case, no files should be removed. 544 | /// 545 | /// # Arguments 546 | /// 547 | /// * `required_space` - A `usize` representing the number of bytes that must be freed to make room for a new file. 548 | /// * `new_file_priority` - A `usize` representing the priority of the new file to be added. If the combined priority of the files possibly being removed 549 | /// is greater than this value, then an `Err` is returned and no files are marked for removal.
550 | fn make_room_for_new_file(&self, required_space: usize, new_file_priority: usize) -> Result<Vec<PathBuf>, CacheError> { 551 | let mut possibly_freed_space: usize = 0; 552 | let mut priority_score_to_free: usize = 0; 553 | let mut file_paths_to_remove: Vec<PathBuf> = vec![]; 554 | 555 | let mut stats: Vec<(PathBuf, FileStats)> = self.sorted_priorities(); 556 | while possibly_freed_space < required_space { 557 | // pop the file with the lowest priority off of the vector 558 | match stats.pop() { 559 | Some(lowest) => { 560 | let (lowest_key, lowest_stats) = lowest; 561 | 562 | possibly_freed_space += lowest_stats.size; 563 | priority_score_to_free = priority_score_to_free.saturating_add(lowest_stats.priority); // Saturate so that priorities near usize::MAX cannot overflow the sum. 564 | file_paths_to_remove.push(lowest_key.clone()); 565 | 566 | // If the total priority of the files to free is greater than the new file's priority, 567 | // don't free them, because in aggregate they are more important 568 | // than the new file. 569 | if priority_score_to_free > new_file_priority { 570 | return Err(CacheError::NewPriorityIsNotHighEnough); 571 | } 572 | } 573 | None => return Err(CacheError::NoMoreFilesToRemove), 574 | }; 575 | } 576 | Ok(file_paths_to_remove) 577 | 578 | } 579 | 580 | /// Helper function that gets the file from the cache if it exists there. 581 | fn get_from_cache<P: AsRef<Path>>(&self, path: P) -> CachedFile { 582 | match self.file_map.find(&path.as_ref().to_path_buf()) { 583 | Some(in_memory_file) => { 584 | trace!("Found file: {:?} in cache.", path.as_ref()); 585 | CachedFile::from(NamedInMemoryFile::new( 586 | path.as_ref().to_path_buf(), 587 | in_memory_file, 588 | )) 589 | } 590 | None => CachedFile::NotFound, 591 | } 592 | 593 | } 594 | 595 | /// Helper function for incrementing the access count for a given file name. 596 | /// 597 | /// This should only be used in cases where the file is known to exist, to avoid bloating the access count map with useless values.
598 | fn increment_access_count<P: AsRef<Path>>(&self, path: P) { 599 | self.access_count_map.upsert( 600 | path.as_ref().to_path_buf(), 601 | 1, // insert 1 if nothing at key. The closure will not execute. 602 | &|access_count| { 603 | *access_count = match usize::checked_add(access_count.clone(), 1) { 604 | Some(v) => v, // return the incremented value 605 | None => usize::MAX, // Saturate at usize::MAX instead of overflowing. 606 | } 607 | }, 608 | ); 609 | } 610 | 611 | 612 | /// Update the access count and priority stored in this file's stats. If the path has no entry in the file_map, an entry with empty contents and zeroed stats is inserted instead. 613 | fn update_stats<P: AsRef<Path>>(&self, path: P) { 614 | 615 | let access_count: usize = match self.access_count_map.find(&path.as_ref().to_path_buf()) { 616 | Some(access_count) => access_count.get().clone(), 617 | None => 1, 618 | }; 619 | 620 | self.file_map.upsert( 621 | // Key 622 | path.as_ref().to_path_buf(), 623 | // Default Value 624 | InMemoryFile { 625 | bytes: Vec::new(), 626 | stats: FileStats { 627 | size: 0, 628 | access_count: 0, 629 | priority: 0, 630 | }, 631 | }, 632 | // Update Function 633 | &|file_entry| { 634 | // If the size is initialized to 0, then try to get the actual size from the filesystem 635 | if file_entry.stats.size == 0 { 636 | file_entry.stats.size = Cache::get_file_size_from_metadata(&path.as_ref().to_path_buf()).unwrap_or(0); 637 | } 638 | file_entry.stats.access_count = access_count; 639 | file_entry.stats.priority = (self.priority_function)(file_entry.stats.access_count, file_entry.stats.size); // update the priority score. 640 | }, 641 | ); 642 | 643 | 644 | } 645 | 646 | 647 | 648 | 649 | 650 | /// Gets a vector of tuples containing the path and the stats (including the priority score and size in bytes) of all items 651 | /// in the file_map. 652 | /// 653 | /// The vector is sorted from highest to lowest priority.
654 | /// This allows the assumption that the last element to be popped from the vector will have the 655 | /// lowest priority, and therefore is the most eligible candidate for elimination from the 656 | /// cache. 657 | /// 658 | fn sorted_priorities(&self) -> Vec<(PathBuf, FileStats)> { 659 | 660 | let mut priorities: Vec<(PathBuf, FileStats)> = self.file_map 661 | .iter() 662 | .map(|x| (x.0.clone(), x.1.stats.clone())) 663 | .collect(); 664 | 665 | // Sort the priorities from highest priority to lowest, so when they are pop()ed later, 666 | // the last element will have the lowest priority. 667 | priorities.sort_by(|l, r| r.1.priority.cmp(&l.1.priority)); 668 | priorities 669 | } 670 | } 671 | 672 | 673 | 674 | #[cfg(test)] 675 | mod tests { 676 | extern crate test; 677 | extern crate tempdir; 678 | extern crate rand; 679 | 680 | use super::*; 681 | 682 | use self::tempdir::TempDir; 683 | use self::test::Bencher; 684 | use self::rand::rngs::StdRng; 685 | use std::io::{Write, BufWriter}; 686 | use std::fs::File; 687 | use rocket::response::NamedFile; 688 | use std::io::Read; 689 | use in_memory_file::InMemoryFile; 690 | use concurrent_hashmap::Accessor; 691 | use std::sync::Arc; 692 | use std::mem; 693 | use cache_builder::CacheBuilder; 694 | use self::rand::FromEntropy; 695 | use self::rand::RngCore; 696 | 697 | 698 | const MEG1: usize = 1024 * 1024; 699 | const MEG2: usize = MEG1 * 2; 700 | const MEG5: usize = MEG1 * 5; 701 | const MEG10: usize = MEG1 * 10; 702 | 703 | const DIR_TEST: &'static str = "test1"; 704 | const FILE_MEG1: &'static str = "meg1.txt"; 705 | const FILE_MEG2: &'static str = "meg2.txt"; 706 | const FILE_MEG5: &'static str = "meg5.txt"; 707 | const FILE_MEG10: &'static str = "meg10.txt"; 708 | 709 | // Helper function that creates test files in a directory that is cleaned up after the test runs. 
710 | fn create_test_file(temp_dir: &TempDir, size: usize, name: &str) -> PathBuf { 711 | let path = temp_dir.path().join(name); 712 | let tmp_file = File::create(path.clone()).unwrap(); 713 | let mut rand_data: Vec<u8> = vec![0u8; size]; 714 | StdRng::from_entropy().fill_bytes(rand_data.as_mut()); 715 | let mut buffer = BufWriter::new(tmp_file); 716 | buffer.write_all(&rand_data).unwrap(); // write_all guarantees that every byte is written 717 | path 718 | } 719 | 720 | 721 | // Standardize the way a file is used in these tests. 722 | impl<'a> CachedFile<'a> { 723 | fn dummy_write(self) { 724 | match self { 725 | CachedFile::InMemory(cached_file) => unsafe { 726 | let file: *const Accessor<'a, PathBuf, InMemoryFile> = Arc::into_raw(cached_file.file); 727 | let mut v: Vec<u8> = Vec::new(); 728 | let _ = (*file).get().bytes.as_slice().read_to_end(&mut v).unwrap(); 729 | let _ = Arc::from_raw(file); // To prevent a memory leak, an Arc needs to be reconstructed from the raw pointer. 730 | }, 731 | CachedFile::FileSystem(mut named_file) => { 732 | let mut v: Vec<u8> = Vec::new(); 733 | let _ = named_file.read_to_end(&mut v).unwrap(); 734 | } 735 | CachedFile::NotFound => { 736 | panic!("tried to write using a non-existent file") 737 | } 738 | } 739 | } 740 | 741 | fn get_in_memory_file(self) -> NamedInMemoryFile<'a> { 742 | match self { 743 | CachedFile::InMemory(n) => n, 744 | _ => panic!("tried to get cached file for named file"), 745 | 746 | } 747 | } 748 | 749 | fn get_named_file(self) -> NamedFile { 750 | match self { 751 | CachedFile::FileSystem(n) => n, 752 | _ => panic!("tried to get named file for cached file"), 753 | } 754 | } 755 | } 756 | 757 | #[bench] 758 | fn cache_get_10mb(b: &mut Bencher) { 759 | let cache: Cache = CacheBuilder::new() 760 | .size_limit(MEG1 * 20) 761 | .build() 762 | .unwrap(); 763 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 764 | let path_10m = create_test_file(&temp_dir, MEG10, FILE_MEG10); 765 | cache.get(&path_10m); // add the 10 mb file to the cache 766 | 767 | b.iter(|| { 768 | let
cached_file = cache.get(&path_10m); 769 | cached_file.dummy_write() 770 | }); 771 | } 772 | 773 | #[bench] 774 | fn cache_miss_10mb(b: &mut Bencher) { 775 | let cache: Cache = CacheBuilder::new() 776 | .size_limit(0) 777 | .build() 778 | .unwrap(); 779 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 780 | let path_10m = create_test_file(&temp_dir, MEG10, FILE_MEG10); 781 | 782 | b.iter(|| { 783 | let cached_file = cache.get(&path_10m); 784 | cached_file.dummy_write() 785 | }); 786 | } 787 | 788 | #[bench] 789 | fn named_file_read_10mb(b: &mut Bencher) { 790 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 791 | let path_10m = create_test_file(&temp_dir, MEG10, FILE_MEG10); 792 | b.iter(|| { 793 | let named_file = CachedFile::from(NamedFile::open(&path_10m).unwrap()); 794 | named_file.dummy_write() 795 | }); 796 | } 797 | 798 | #[bench] 799 | fn cache_get_1mb(b: &mut Bencher) { 800 | let cache: Cache = CacheBuilder::new() 801 | .size_limit(MEG1 * 20) 802 | .build() 803 | .unwrap(); // Cache can hold 20 MiB 804 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 805 | let path_1m = create_test_file(&temp_dir, MEG1, FILE_MEG1); 806 | cache.get(&path_1m); // add the 1 mb file to the cache 807 | 808 | b.iter(|| { 809 | let cached_file = cache.get(&path_1m); 810 | cached_file.dummy_write() 811 | }); 812 | } 813 | 814 | #[bench] 815 | fn cache_miss_1mb(b: &mut Bencher) { 816 | let cache: Cache = CacheBuilder::new() 817 | .size_limit(0) 818 | .build() 819 | .unwrap(); 820 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 821 | let path_1m = create_test_file(&temp_dir, MEG1, FILE_MEG1); 822 | 823 | b.iter(|| { 824 | let cached_file = cache.get(&path_1m); 825 | cached_file.dummy_write() 826 | }); 827 | } 828 | 829 | #[bench] 830 | fn named_file_read_1mb(b: &mut Bencher) { 831 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 832 | let path_1m = create_test_file(&temp_dir, MEG1, FILE_MEG1); 833 | 834 | b.iter(|| { 835 | let named_file =
CachedFile::from(NamedFile::open(&path_1m).unwrap()); 836 | named_file.dummy_write() 837 | }); 838 | } 839 | 840 | 841 | 842 | #[bench] 843 | fn cache_get_5mb(b: &mut Bencher) { 844 | let cache: Cache = CacheBuilder::new() 845 | .size_limit(MEG1 * 20) 846 | .build() 847 | .unwrap(); 848 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 849 | let path_5m = create_test_file(&temp_dir, MEG5, FILE_MEG5); 850 | cache.get(&path_5m); // add the 5 mb file to the cache 851 | 852 | b.iter(|| { 853 | let cached_file = cache.get(&path_5m); 854 | cached_file.dummy_write() 855 | }); 856 | } 857 | 858 | #[bench] 859 | fn cache_miss_5mb(b: &mut Bencher) { 860 | let cache: Cache = CacheBuilder::new() 861 | .size_limit(0) 862 | .build() 863 | .unwrap(); 864 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 865 | let path_5m = create_test_file(&temp_dir, MEG5, FILE_MEG5); 866 | 867 | b.iter(|| { 868 | let cached_file = cache.get(&path_5m); 869 | cached_file.dummy_write() 870 | }); 871 | } 872 | 873 | #[bench] 874 | fn named_file_read_5mb(b: &mut Bencher) { 875 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 876 | let path_5m = create_test_file(&temp_dir, MEG5, FILE_MEG5); 877 | 878 | b.iter(|| { 879 | let named_file = CachedFile::from(NamedFile::open(&path_5m).unwrap()); 880 | named_file.dummy_write() 881 | }); 882 | } 883 | 884 | 885 | 886 | // Access time should stay constant regardless of how many entries the cache holds. 887 | #[bench] 888 | fn cache_get_1mb_from_1000_entry_cache(b: &mut Bencher) { 889 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 890 | let path_1m = create_test_file(&temp_dir, MEG1, FILE_MEG1); 891 | let cache: Cache = CacheBuilder::new() 892 | .size_limit(MEG1 * 3) 893 | .build() 894 | .unwrap(); 895 | cache.get(&path_1m); // add the file to the cache 896 | 897 | // Add 1024 1kib files to the cache.
898 | for i in 0..1024 { 899 | let path = create_test_file(&temp_dir, 1024, format!("{}_1kib.txt", i).as_str()); 900 | cache.get(&path); 901 | } 902 | // make sure that the file has a high priority. 903 | cache.alter_all_access_counts(|x| x + 1 * 100000); 904 | 905 | assert_eq!(cache.used_bytes(), MEG1 * 2); 906 | 907 | let named_file = CachedFile::from(NamedFile::open(&path_1m).unwrap()); 908 | 909 | b.iter(|| { 910 | let cached_file = cache.get(&path_1m); 911 | assert!(mem::discriminant(&cached_file) != mem::discriminant(&named_file)); 912 | cached_file.dummy_write() 913 | }); 914 | } 915 | 916 | // There is a penalty for missing the cache. 917 | #[bench] 918 | fn cache_miss_1mb_from_1000_entry_cache(b: &mut Bencher) { 919 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 920 | let path_1m = create_test_file(&temp_dir, MEG1, FILE_MEG1); 921 | let cache: Cache = CacheBuilder::new() 922 | .size_limit(MEG1) 923 | .build() 924 | .unwrap(); 925 | 926 | // Add 1024 1kib files to the cache. 927 | for i in 0..1024 { 928 | let path = create_test_file(&temp_dir, 1024, format!("{}_1kib.txt", i).as_str()); 929 | cache.get(&path); 930 | } 931 | // Give the cached 1 KiB files a high priority so that the 1 MiB file cannot displace them. 932 | cache.alter_all_access_counts(|x| x + 1 * 100_000_000_000_000_000); 933 | let named_file = CachedFile::from(NamedFile::open(&path_1m).unwrap()); 934 | 935 | b.iter(|| { 936 | let cached_file = cache.get(&path_1m); 937 | assert!(mem::discriminant(&cached_file) == mem::discriminant(&named_file)); // get() in this case should only return files in the FS 938 | cached_file.dummy_write() 939 | }); 940 | } 941 | 942 | // This is pretty much a worst-case scenario, where every file would try to be removed to make room for the new file. 943 | // There is a penalty for missing the cache.
944 | #[bench] 945 | fn cache_miss_5mb_from_1000_entry_cache(b: &mut Bencher) { 946 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 947 | let path_5m = create_test_file(&temp_dir, MEG5, FILE_MEG5); 948 | let cache: Cache = CacheBuilder::new() 949 | .size_limit(MEG5) 950 | .build() 951 | .unwrap(); 952 | 953 | // Add 1024 5kib files to the cache. 954 | for i in 0..1024 { 955 | let path = create_test_file(&temp_dir, 1024 * 5, format!("{}_5kib.txt", i).as_str()); 956 | cache.get(&path); 957 | } 958 | // Give the cached 5 KiB files a high priority so that the 5 MiB file cannot displace them. 959 | cache.alter_all_access_counts(|x| x + 1 * 100_000_000_000_000_000); 960 | let named_file = CachedFile::from(NamedFile::open(&path_5m).unwrap()); 961 | 962 | b.iter(|| { 963 | let cached_file: CachedFile = cache.get(&path_5m); 964 | assert!(mem::discriminant(&cached_file) == mem::discriminant(&named_file)); // get() in this case should only return files in the FS 965 | // Mimic what is done when the response body is set. 966 | cached_file.dummy_write() 967 | }); 968 | } 969 | 970 | 971 | #[bench] 972 | fn in_memory_file_read_10mb(b: &mut Bencher) { 973 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 974 | let path_10m = create_test_file(&temp_dir, MEG10, FILE_MEG10); 975 | 976 | b.iter(|| { 977 | let in_memory_file = Arc::new(InMemoryFile::open(path_10m.clone()).unwrap()); 978 | let file: *const InMemoryFile = Arc::into_raw(in_memory_file); 979 | unsafe { 980 | let _ = (*file).bytes.clone(); 981 | let _ = Arc::from_raw(file); 982 | } 983 | }); 984 | } 985 | 986 | 987 | #[test] 988 | fn file_exceeds_size_limit() { 989 | let cache: Cache = CacheBuilder::new() 990 | .size_limit(MEG1 * 8) // Cache can hold only 8 MiB 991 | .build() 992 | .unwrap(); 993 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 994 | let path_10m = create_test_file(&temp_dir, MEG10, FILE_MEG10); 995 | 996 | let named_file = NamedFile::open(path_10m.clone()).unwrap(); 997 | 998 | // expect the cache to get the item from the FS.
999 | assert_eq!(cache.try_insert(path_10m), CachedFile::from(named_file)); 1000 | } 1001 | 1002 | 1003 | #[test] 1004 | fn file_replaces_other_file() { 1005 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 1006 | 1007 | let path_1m = create_test_file(&temp_dir, MEG1, FILE_MEG1); 1008 | let path_5m = create_test_file(&temp_dir, MEG5, FILE_MEG5); 1009 | 1010 | let named_file_1m = NamedFile::open(path_1m.clone()).unwrap(); 1011 | let named_file_1m_2 = NamedFile::open(path_1m.clone()).unwrap(); 1012 | 1013 | 1014 | let mut imf_5m = InMemoryFile::open(path_5m.clone()).unwrap(); 1015 | let mut imf_1m = InMemoryFile::open(path_1m.clone()).unwrap(); 1016 | 1017 | // set expected stats for 5m 1018 | imf_5m.stats.access_count = 1; 1019 | imf_5m.stats.priority = 2289; 1020 | 1021 | 1022 | let cache: Cache = CacheBuilder::new() 1023 | .size_limit(5500000) // Cache can hold only 5,500,000 bytes (about 5.25 MiB) 1024 | .build() 1025 | .unwrap(); 1026 | 1027 | println!("0:\n{:#?}", cache); 1028 | 1029 | assert_eq!( 1030 | cache 1031 | .try_insert(path_5m.clone()) 1032 | .get_in_memory_file() 1033 | .file 1034 | .as_ref() 1035 | .get(), 1036 | &imf_5m 1037 | ); 1038 | println!("1:\n{:#?}", cache); 1039 | assert_eq!( 1040 | cache.try_insert(path_1m.clone()), 1041 | CachedFile::from(named_file_1m) 1042 | ); 1043 | println!("2:\n{:#?}", cache); 1044 | assert_eq!( 1045 | cache.try_insert(path_1m.clone()), 1046 | CachedFile::from(named_file_1m_2) 1047 | ); 1048 | println!("3:\n{:#?}", cache); 1049 | 1050 | // set the expected stats for 1m 1051 | imf_1m.stats.access_count = 3; 1052 | imf_1m.stats.priority = 3072; 1053 | 1054 | assert_eq!( 1055 | cache 1056 | .try_insert(path_1m.clone()) 1057 | .get_in_memory_file() 1058 | .file 1059 | .as_ref() 1060 | .get(), 1061 | &imf_1m 1062 | ); 1063 | println!("4:\n{:#?}", cache); 1064 | } 1065 | 1066 | 1067 | 1068 | 1069 | #[test] 1070 | fn new_file_replaces_lowest_priority_file() { 1071 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 1072 | let path_1m =
create_test_file(&temp_dir, MEG1, FILE_MEG1); 1073 | let path_2m = create_test_file(&temp_dir, MEG2, FILE_MEG2); 1074 | let path_5m = create_test_file(&temp_dir, MEG5, FILE_MEG5); 1075 | 1076 | 1077 | #[allow(unused_variables)] 1078 | let named_file_1m = NamedFile::open(path_1m.clone()).unwrap(); 1079 | 1080 | let cache: Cache = CacheBuilder::new() 1081 | .size_limit(MEG1 * 7 + 2000) // cache can hold a little more than 7 MiB 1082 | .build() 1083 | .unwrap(); 1084 | 1085 | println!("1:\n{:#?}", cache); 1086 | let mut imf_5m: InMemoryFile = InMemoryFile::open(path_5m.clone()).unwrap(); 1087 | imf_5m.stats.priority = 2289; 1088 | imf_5m.stats.access_count = 1; 1089 | 1090 | assert_eq!( 1091 | cache.get(&path_5m) 1092 | .get_in_memory_file() 1093 | .file 1094 | .as_ref() 1095 | .get(), 1096 | &imf_5m 1097 | ); 1098 | 1099 | println!("2:\n{:#?}", cache); 1100 | let mut imf_2m: InMemoryFile = InMemoryFile::open(path_2m.clone()).unwrap(); 1101 | imf_2m.stats.priority = 1448; 1102 | imf_2m.stats.access_count = 1; 1103 | assert_eq!( 1104 | cache.get(&path_2m) 1105 | .get_in_memory_file() 1106 | .file 1107 | .as_ref() 1108 | .get(), 1109 | &imf_2m 1110 | ); 1111 | 1112 | 1113 | println!("3:\n{:#?}", cache); 1114 | let mut named_1m = NamedFile::open(path_1m.clone()).unwrap(); 1115 | let mut v: Vec<u8> = Vec::new(); 1116 | let _ = cache 1117 | .get(&path_1m) 1118 | .get_named_file() 1119 | .read_to_end(&mut v) 1120 | .unwrap(); 1121 | 1122 | let mut file_vec: Vec<u8> = Vec::new(); 1123 | let _ = named_1m.read_to_end(&mut file_vec); 1124 | assert_eq!( 1125 | v, 1126 | file_vec 1127 | ); 1128 | 1129 | 1130 | println!("4:\n{:#?}", cache); 1131 | let mut imf_1m: InMemoryFile = InMemoryFile::open(path_1m.clone()).unwrap(); 1132 | imf_1m.stats.priority = 2048; // This is higher than the 2m in-memory file's priority of 1448, so the 1m file will now replace it. 1133 | imf_1m.stats.access_count = 2; 1134 | 1135 | // The cache will now accept the 1 meg file because (sqrt(2)_size * 1_access) for the
old 1136 | // file is less than (sqrt(1)_size * 2_access) for the new file. 1137 | assert_eq!( 1138 | cache.get(&path_1m) 1139 | .get_in_memory_file() 1140 | .file 1141 | .as_ref() 1142 | .get() 1143 | .bytes, 1144 | imf_1m.bytes 1145 | ); 1146 | println!("5:\n{:#?}", cache); 1147 | 1148 | 1149 | if let CachedFile::NotFound = cache.get_from_cache(&path_1m) { 1150 | panic!("Expected 1m file to be in the cache"); 1151 | } 1152 | 1153 | // Check if the 5m file is still in the cache 1154 | if let CachedFile::NotFound = cache.get_from_cache(&path_5m) { 1155 | panic!("Expected 5m file to be in the cache"); 1156 | } 1157 | 1158 | // 1159 | if let CachedFile::InMemory(_) = cache.get_from_cache(&path_2m) { 1160 | panic!("Expected 2m file to not be in the cache"); 1161 | } 1162 | 1163 | drop(cache); 1164 | } 1165 | 1166 | 1167 | 1168 | 1169 | #[test] 1170 | fn remove_file() { 1171 | let cache: Cache = CacheBuilder::new() 1172 | .size_limit(MEG1 * 10) 1173 | .build() 1174 | .unwrap(); 1175 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 1176 | let path_5m = create_test_file(&temp_dir, MEG5, FILE_MEG5); 1177 | 1178 | let mut imf: InMemoryFile = InMemoryFile::open(path_5m.clone()).unwrap(); 1179 | 1180 | // Set the expected values for the stats in IMF. 1181 | imf.stats.priority = 2289; 1182 | imf.stats.access_count = 1; 1183 | 1184 | // expect the cache to get the item from the FS. 
1185 | assert_eq!( 1186 | cache 1187 | .get(&path_5m) 1188 | .get_in_memory_file() 1189 | .file 1190 | .as_ref() 1191 | .get(), 1192 | &imf 1193 | ); 1194 | 1195 | cache.remove(&path_5m); 1196 | 1197 | assert_eq!(cache.contains_key(&path_5m.clone()), false); 1198 | } 1199 | 1200 | #[test] 1201 | fn refresh_file() { 1202 | let cache: Cache = CacheBuilder::new() 1203 | .size_limit(MEG1 * 10) 1204 | .build() 1205 | .unwrap(); 1206 | 1207 | let temp_dir = TempDir::new(DIR_TEST).unwrap(); 1208 | let path_5m = create_test_file(&temp_dir, MEG5, FILE_MEG5); 1209 | 1210 | 1211 | assert_eq!( 1212 | match cache.get(&path_5m) { 1213 | CachedFile::InMemory(c) => c.file.get().stats.size, 1214 | CachedFile::FileSystem(_) => unreachable!(), 1215 | CachedFile::NotFound => unreachable!() 1216 | }, 1217 | MEG5 1218 | ); 1219 | 1220 | let path_of_file_with_10mb_but_path_name_5m = create_test_file(&temp_dir, MEG10, FILE_MEG5); 1221 | 1222 | 1223 | cache.refresh(&path_5m); 1224 | 1225 | assert_eq!( 1226 | match cache.get(&path_of_file_with_10mb_but_path_name_5m) { 1227 | CachedFile::InMemory(c) => c.file.get().stats.size, 1228 | CachedFile::FileSystem(_) => unreachable!(), 1229 | CachedFile::NotFound => unreachable!() 1230 | }, 1231 | MEG10 1232 | ); 1233 | 1234 | drop(cache); 1235 | } 1236 | 1237 | } 1238 | -------------------------------------------------------------------------------- /src/cache_builder.rs: -------------------------------------------------------------------------------- 1 | use cache::Cache; 2 | 3 | use priority_function::default_priority_function; 4 | use std::usize; 5 | 6 | use concurrent_hashmap::{ConcHashMap, Options}; 7 | use std::collections::hash_map::RandomState; 8 | 9 | 10 | /// Error types that can be encountered when a cache is built. 11 | #[derive(Debug, PartialEq)] 12 | pub enum CacheBuildError { 13 | MinFileSizeIsLargerThanMaxFileSize, 14 | } 15 | 16 | /// A builder for Caches. 
17 | #[derive(Debug)] 18 | pub struct CacheBuilder { 19 | size_limit: Option<usize>, 20 | accesses_per_refresh: Option<usize>, 21 | concurrency: Option<u16>, 22 | priority_function: Option<fn(usize, usize) -> usize>, 23 | min_file_size: Option<usize>, 24 | max_file_size: Option<usize>, 25 | } 26 | 27 | 28 | impl CacheBuilder { 29 | 30 | /// Create a new CacheBuilder. 31 | /// 32 | pub fn new() -> CacheBuilder { 33 | CacheBuilder { 34 | size_limit: None, 35 | accesses_per_refresh: None, 36 | concurrency: None, 37 | priority_function: None, 38 | min_file_size: None, 39 | max_file_size: None, 40 | } 41 | } 42 | 43 | /// Sets the maximum number of bytes (as they exist in the FS) that the cache can hold. 44 | /// The cache will take up more space in memory due to the backing concurrent HashMap it uses. 45 | /// The memory overhead can be controlled by setting the concurrency parameter. 46 | /// 47 | /// # Arguments 48 | /// * size_limit - The number of bytes the cache will be able to hold. 49 | /// 50 | pub fn size_limit<'a>(&'a mut self, size_limit: usize) -> &mut Self { 51 | self.size_limit = Some(size_limit); 52 | self 53 | } 54 | 55 | /// Sets the concurrency setting of the concurrent hashmap backing the cache. 56 | /// A higher concurrency setting allows more threads to access the hashmap at the expense of more memory use. 57 | /// The default is 16. 58 | pub fn concurrency<'a>(&'a mut self, concurrency: u16) -> &mut Self { 59 | self.concurrency = Some(concurrency); 60 | self 61 | } 62 | 63 | 64 | /// Sets the number of times a file can be accessed from the cache before it will be refreshed from the disk. 65 | /// For example, a value of 1000 instructs the cache to refresh the file after every 1000 accesses. 66 | /// By default, the cache will not refresh the file.
67 | /// 68 | /// This is useful if you anticipate bitrot in the cache contents held in RAM, as it will 69 | /// refresh the file from the filesystem, meaning that if there is an error in the cached data, 70 | /// it will only be served for an average of n/2 accesses before the automatic refresh replaces it 71 | /// with an assumed correct copy. 72 | /// 73 | /// # Panics 74 | /// This function will panic if 0 is supplied. 75 | /// This is to prevent 0 being used as a divisor in a modulo operation later. 76 | /// 77 | pub fn accesses_per_refresh<'a>(&'a mut self, accesses: usize) -> &mut Self { 78 | if accesses < 1 { 79 | panic!("Incorrectly configured accesses_per_refresh rate. A value of 0 is not allowed."); 80 | } else { 81 | if accesses == 1 { 82 | warn!("The accesses_per_refresh value of 1 should not be used except in a development environment. This will cause the cache to refresh the file every time it is requested, negating its purpose as a cache."); 83 | } 84 | self.accesses_per_refresh = Some(accesses); 85 | } 86 | self 87 | } 88 | 89 | 90 | /// Override the default priority function used for determining if the cache should hold a file. 91 | /// By default, the score is the square root of the file's size multiplied by the number 92 | /// of times it was accessed. 93 | /// Files with higher priority scores will be kept in the cache when files with lower scores are 94 | /// added. 95 | /// If there isn't room in the cache for two files, the one with the lower score will be removed or 96 | /// won't be added. 97 | /// 98 | /// The priority function should be kept simple, as it is calculated on every file in the cache 99 | /// every time the cache attempts to add a new file.
100 | /// 101 | /// # Example 102 | /// 103 | /// ``` 104 | /// use rocket_file_cache::Cache; 105 | /// use rocket_file_cache::CacheBuilder; 106 | /// let cache: Cache = CacheBuilder::new() 107 | /// .priority_function(|access_count, size| { 108 | /// access_count * access_count * size 109 | /// }) 110 | /// .build() 111 | /// .unwrap(); 112 | /// ``` 113 | pub fn priority_function<'a>(&'a mut self, priority_function: fn(usize, usize) -> usize) -> &mut Self { 114 | self.priority_function = Some(priority_function); 115 | self 116 | } 117 | 118 | /// Sets the minimum size in bytes for files that can be stored in the cache. 119 | pub fn min_file_size<'a>(&'a mut self, min_size: usize) -> &mut Self { 120 | self.min_file_size = Some(min_size); 121 | self 122 | } 123 | 124 | /// Sets the maximum size in bytes for files that can be stored in the cache. 125 | pub fn max_file_size<'a>(&'a mut self, max_size: usize) -> &mut Self { 126 | self.max_file_size = Some(max_size); 127 | self 128 | } 129 | 130 | /// Builds the cache, returning `CacheBuildError::MinFileSizeIsLargerThanMaxFileSize` if the minimum file size exceeds the maximum. 131 | /// 132 | /// # Example 133 | /// 134 | /// ``` 135 | /// use rocket_file_cache::Cache; 136 | /// use rocket_file_cache::CacheBuilder; 137 | /// 138 | /// let cache: Cache = CacheBuilder::new() 139 | /// .size_limit(1024 * 1024 * 50) // 50 MB cache 140 | /// .min_file_size(1024 * 4) // Don't store files smaller than 4 KB 141 | /// .max_file_size(1024 * 1024 * 6) // Don't store files larger than 6 MB 142 | /// .build() 143 | /// .unwrap(); 144 | /// ``` 145 | pub fn build(&self) -> Result<Cache, CacheBuildError> { 146 | 147 | let size_limit: usize = match self.size_limit { 148 | Some(s) => s, 149 | None => { 150 | warn!("Size for cache not configured.
This may lead to the cache using more memory than necessary."); 151 | usize::MAX 152 | } 153 | }; 154 | 155 | let priority_function = match self.priority_function { 156 | Some(pf) => pf, 157 | None => default_priority_function, 158 | }; 159 | 160 | if let Some(min_file_size) = self.min_file_size { 161 | if let Some(max_file_size) = self.max_file_size { 162 | if min_file_size > max_file_size { 163 | return Err(CacheBuildError::MinFileSizeIsLargerThanMaxFileSize); 164 | } 165 | } 166 | } 167 | 168 | let min_file_size: usize = match self.min_file_size { 169 | Some(min) => min, 170 | None => 0, 171 | }; 172 | 173 | let max_file_size: usize = match self.max_file_size { 174 | Some(max) => max, 175 | None => usize::MAX, 176 | }; 177 | 178 | 179 | 180 | let mut options_files_map: Options<RandomState> = Options::default(); 181 | let mut options_access_map: Options<RandomState> = Options::default(); 182 | 183 | if let Some(conc) = self.concurrency { 184 | options_files_map.concurrency = conc; 185 | options_access_map.concurrency = conc; 186 | } 187 | 188 | 189 | Ok(Cache { 190 | size_limit, 191 | min_file_size, 192 | max_file_size, 193 | priority_function, 194 | accesses_per_refresh: self.accesses_per_refresh, 195 | file_map: ConcHashMap::with_options(options_files_map), 196 | access_count_map: ConcHashMap::with_options(options_access_map), 197 | }) 198 | 199 | } 200 | } 201 | 202 | #[cfg(test)] 203 | mod tests { 204 | use super::*; 205 | 206 | #[test] 207 | fn min_greater_than_max() { 208 | 209 | let e: CacheBuildError = CacheBuilder::new() 210 | .min_file_size(1024 * 1024 * 5) 211 | .max_file_size(1024 * 1024 * 4) 212 | .build() 213 | .unwrap_err(); 214 | assert_eq!(CacheBuildError::MinFileSizeIsLargerThanMaxFileSize, e); 215 | } 216 | 217 | #[test] 218 | fn all_options_used_in_build() { 219 | let _: Cache = CacheBuilder::new() 220 | .size_limit(1024 * 1024 * 20) 221 | .priority_function(|access_count: usize, size: usize| access_count * size) 222 | .max_file_size(1024 * 1024 * 10) 223 |
.min_file_size(1024 * 10) 224 | .concurrency(20) 225 | .accesses_per_refresh(1000) 226 | .build() 227 | .unwrap(); 228 | } 229 | 230 | } 231 | -------------------------------------------------------------------------------- /src/cached_file.rs: -------------------------------------------------------------------------------- 1 | use rocket::http::Status; 2 | use rocket::response::{Response, Responder, NamedFile}; 3 | use rocket::request::Request; 4 | use cache::Cache; 5 | use std::path::Path; 6 | 7 | use named_in_memory_file::NamedInMemoryFile; 8 | 9 | 10 | /// Wrapper around data that can represent a file, either in memory (cache) or on disk. 11 | /// 12 | /// When getting a `CachedFile` from the cache: 13 | /// * An `InMemory` variant indicates that the file was read into the cache and a reference to that file is attached to the variant. 14 | /// * A `FileSystem` variant indicates that the file is not in the cache, but it can be accessed from the filesystem. 15 | /// * A `NotFound` variant indicates that the file cannot be found in the filesystem or the cache. 16 | #[derive(Debug)] 17 | pub enum CachedFile<'a> { 18 | /// A file that has been loaded into the cache. 19 | InMemory(NamedInMemoryFile<'a>), 20 | /// A file that exists in the filesystem. 21 | FileSystem(NamedFile), 22 | /// The file does not exist in either the cache or the filesystem. 23 | NotFound 24 | } 25 | 26 | impl<'a> CachedFile<'a> { 27 | 28 | /// A convenience function that gets a file through the cache. 29 | /// 30 | /// This keeps the code required to use the cache as close as possible to the typical use of 31 | /// `rocket::response::NamedFile`.
32 | pub fn open<P: AsRef<Path>>(path: P, cache: &'a Cache) -> CachedFile<'a> { 33 | cache.get(path) 34 | } 35 | } 36 | 37 | 38 | impl<'a> From<NamedInMemoryFile<'a>> for CachedFile<'a> { 39 | fn from(cached_file: NamedInMemoryFile<'a>) -> CachedFile<'a> { 40 | CachedFile::InMemory(cached_file) 41 | } 42 | } 43 | 44 | impl From<NamedFile> for CachedFile<'static> { 45 | fn from(named_file: NamedFile) -> Self { 46 | CachedFile::FileSystem(named_file) 47 | } 48 | } 49 | 50 | impl<'a> Responder<'a> for CachedFile<'a> { 51 | fn respond_to(self, request: &Request) -> Result<Response<'a>, Status> { 52 | 53 | match self { 54 | CachedFile::InMemory(cached_file) => cached_file.respond_to(request), 55 | CachedFile::FileSystem(named_file) => named_file.respond_to(request), 56 | CachedFile::NotFound => { 57 | error!("Response was `FileNotFound`."); 58 | Err(Status::NotFound) 59 | } 60 | } 61 | } 62 | } 63 | 64 | 65 | impl<'a> PartialEq for CachedFile<'a> { 66 | fn eq(&self, other: &CachedFile) -> bool { 67 | match *self { 68 | CachedFile::InMemory(ref lhs_cached_file) => { 69 | match *other { 70 | CachedFile::InMemory(ref rhs_cached_file) => (*rhs_cached_file.file).get() == (*lhs_cached_file.file).get(), 71 | CachedFile::FileSystem(_) => false, 72 | CachedFile::NotFound => false 73 | } 74 | } 75 | CachedFile::FileSystem(ref lhs_named_file) => { 76 | match *other { 77 | CachedFile::InMemory(_) => false, 78 | CachedFile::FileSystem(ref rhs_named_file) => { 79 | // Only the file paths are compared, not the file contents. 80 | *lhs_named_file.path() == *rhs_named_file.path() 81 | } 82 | CachedFile::NotFound => false 83 | } 84 | } 85 | CachedFile::NotFound => { 86 | match *other { 87 | CachedFile::InMemory(_) => false, 88 | CachedFile::FileSystem(_) => false, 89 | CachedFile::NotFound => true 90 | } 91 | } 92 | } 93 | 94 | } 95 | } 96 | -------------------------------------------------------------------------------- /src/in_memory_file.rs: 
-------------------------------------------------------------------------------- 1 | 2 | use std::path::Path; 3 | use
std::io::BufReader; 4 | use std::fs::File; 5 | use std::io; 6 | use std::io::Read; 7 | use std::fmt; 8 | 9 | 10 | /// The structure that represents a file in memory. 11 | /// Keeps an up-to-date record of its stats so the cache can use this information to remove the file 12 | /// from the cache. 13 | #[derive(Clone, PartialEq)] 14 | pub struct InMemoryFile { 15 | pub(crate) bytes: Vec<u8>, 16 | pub stats: FileStats, 17 | } 18 | 19 | impl fmt::Debug for InMemoryFile { 20 | fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { 21 | // The byte array shouldn't be visible in the log. 22 | write!( 23 | f, 24 | "InMemoryFile {{ bytes: ..., size: {}, priority: {} }}", 25 | self.stats.size, 26 | self.stats.priority 27 | ) 28 | } 29 | } 30 | 31 | 32 | impl InMemoryFile { 33 | /// Reads the file at the path into an InMemoryFile. 34 | pub fn open<P: AsRef<Path>>(path: P) -> io::Result<InMemoryFile> { 35 | let file = File::open(path.as_ref())?; 36 | let mut reader = BufReader::new(file); 37 | let mut bytes: Vec<u8> = vec![]; 38 | let size: usize = reader.read_to_end(&mut bytes)?; 39 | 40 | let stats = FileStats { 41 | size, 42 | access_count: 0, 43 | priority: 0, 44 | }; 45 | 46 | Ok(InMemoryFile { bytes, stats }) 47 | } 48 | } 49 | 50 | 51 | /// Holds information related to the InMemoryFile. 52 | /// This information will be used to determine if the file should be replaced in the cache. 53 | #[derive(Debug, PartialEq, Clone)] 54 | pub struct FileStats { 55 | /// The number of bytes the file contains. 56 | pub size: usize, 57 | /// The number of times the file has been requested. 58 | /// This value can be altered by the `alter_access_count()` method on the `Cache`, 59 | /// and therefore will not represent the true number of times the file was requested if that 60 | /// function is called. 61 | pub access_count: usize, 62 | /// The priority score. 63 | /// This is updated every time the access count is incremented by running the cache's `priority_function` 64 | /// on the `size` and `access_count`.
65 | pub priority: usize, 66 | } -------------------------------------------------------------------------------- /src/lib.rs: -------------------------------------------------------------------------------- 1 | #![feature(test)] 2 | 3 | extern crate rocket; 4 | #[macro_use] 5 | extern crate log; 6 | 7 | extern crate concurrent_hashmap; 8 | 9 | mod cache; 10 | mod in_memory_file; 11 | pub mod named_in_memory_file; 12 | mod cache_builder; 13 | mod priority_function; 14 | mod cached_file; 15 | 16 | pub use cache::Cache; 17 | pub use cache_builder::{CacheBuilder, CacheBuildError}; 18 | pub use cached_file::CachedFile; 19 | pub use priority_function::*; 20 | -------------------------------------------------------------------------------- /src/named_in_memory_file.rs: -------------------------------------------------------------------------------- 1 | use rocket::response::{Response, Responder}; 2 | use rocket::http::{Status, ContentType}; 3 | use rocket::request::Request; 4 | use rocket::response::Body; 5 | 6 | use std::result; 7 | use std::sync::Arc; 8 | use std::path::{PathBuf, Path}; 9 | 10 | use in_memory_file::InMemoryFile; 11 | 12 | use concurrent_hashmap::Accessor; 13 | 14 | use std::fmt::{Formatter, Debug}; 15 | use std::fmt; 16 | 17 | 18 | /// A wrapper around an in-memory file. 19 | /// This struct is created when a request to the cache is made. 20 | /// The NamedInMemoryFile knows its path, so it can set the content type when it is serialized to a response. 21 | pub struct NamedInMemoryFile<'a> { 22 | pub(crate) path: PathBuf, 23 | pub(crate) file: Arc<Accessor<'a, PathBuf, InMemoryFile>>, 24 | } 25 | 26 | 27 | impl<'a> Debug for NamedInMemoryFile<'a> { 28 | fn fmt(&self, fmt: &mut Formatter) -> fmt::Result { 29 | write!(fmt, "path: {:?}, file: {:?}", self.path, self.file.get()) 30 | } 31 | } 32 | 33 | 34 | impl<'a> NamedInMemoryFile<'a> { 35 | /// Wraps an accessor to a file already held in the cache, together with that file's path.
36 | pub(crate) fn new<P: AsRef<Path>>(path: P, m: Accessor<'a, PathBuf, InMemoryFile>) -> NamedInMemoryFile<'a> { 37 | NamedInMemoryFile { 38 | path: path.as_ref().to_path_buf(), 39 | file: Arc::new(m), 40 | } 41 | } 42 | } 43 | 44 | 45 | /// Sends the cached file to the client as a sized raw body. Sets or overrides the Content-Type in 46 | /// the response according to the file's extension if the extension is recognized. 47 | /// 48 | /// If you would like to stream a file with a different Content-Type than that implied by its 49 | /// extension, convert the `CachedFile` to a `File`, and respond with that instead. 50 | /// 51 | /// Based on NamedFile from rocket::response::NamedFile 52 | impl<'a> Responder<'a> for NamedInMemoryFile<'a> { 53 | fn respond_to(self, _: &Request) -> result::Result<Response<'a>, Status> { 54 | let mut response = Response::new(); 55 | if let Some(ext) = self.path.extension() { 56 | if let Some(ct) = ContentType::from_extension(&ext.to_string_lossy()) { 57 | response.set_header(ct); 58 | } 59 | } 60 | 61 | unsafe { 62 | let cloned_wrapper: *const Accessor<'a, PathBuf, InMemoryFile> = Arc::into_raw(self.file); 63 | response.set_raw_body( Body::Sized((*cloned_wrapper).get().bytes.as_slice(), (*cloned_wrapper).get().stats.size as u64) ); 64 | let _ = Arc::from_raw(cloned_wrapper); // To prevent a memory leak, an Arc needs to be reconstructed from the raw pointer. 65 | } 66 | 67 | Ok(response) 68 | } 69 | } 70 | -------------------------------------------------------------------------------- /src/priority_function.rs: -------------------------------------------------------------------------------- 1 | use std::usize; 2 | 3 | /// The default priority function used for determining if a file should be in the cache. 4 | /// 5 | /// This function multiplies the square root of the file's size (in bytes) by the number of times it has been accessed. 6 | /// This should give some priority to bigger files, while still allowing some smaller files to enter the cache.
7 | pub fn default_priority_function(access_count: usize, size: usize) -> usize { 8 | match usize::checked_mul((size as f64).sqrt() as usize, access_count) { 9 | Some(v) => v, 10 | None => usize::MAX, 11 | } 12 | } 13 | 14 | /// Priority is calculated as the size times the access count. 15 | pub fn normal_priority_function(access_count: usize, size: usize) -> usize { 16 | match usize::checked_mul(size, access_count) { 17 | Some(v) => v, 18 | None => usize::MAX, 19 | } 20 | } 21 | 22 | /// Values files in the cache based solely on the number of times the file was accessed. 23 | pub fn access_priority_function(access_count: usize, _: usize) -> usize { 24 | access_count 25 | } 26 | 27 | 28 | /// Favors small files regardless of how many times the file was accessed. 29 | /// 30 | /// The smaller the file, the higher priority it will have. 31 | /// Does not take into account the number of accesses the file has. 32 | pub fn small_files_priority_function(_: usize, size: usize) -> usize { 33 | usize::checked_div(usize::MAX, size).unwrap_or(0) // don't give any priority to completely empty files. 34 | } 35 | 36 | /// Favors small files, weighted by the number of times the file was accessed. 37 | /// 38 | /// The smaller the file, the higher priority it will have. 39 | /// Does take into account the number of accesses the file has. 40 | pub fn small_files_access_priority_function(access_count: usize, size: usize) -> usize { 41 | match usize::checked_mul( 42 | usize::checked_div(usize::MAX, size).unwrap_or(0), 43 | access_count, 44 | ) { 45 | Some(v) => v, 46 | None => usize::MAX, // If the multiplication overflows, then the file will have the maximum priority. 47 | } 48 | } 49 | --------------------------------------------------------------------------------
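The priorities asserted in the cache tests (2289 for the 5 MB file, 1448 for the 2 MB file, and 2048 for the 1 MB file on its second request) follow directly from `default_priority_function`. The sketch below re-implements that function and `small_files_priority_function` locally so it compiles without the crate. It assumes `MEG1 = 1024 * 1024`, since the test constant's definition is not shown here, and it only finds the lowest score. It does not model the cache's actual eviction code.

```rust
// Local copies of two functions from src/priority_function.rs so this sketch
// compiles on its own. MEG1 is an assumption matching the test constant's name.
const MEG1: usize = 1024 * 1024;

/// floor(sqrt(size)) * access_count, saturating at usize::MAX on overflow.
fn default_priority_function(access_count: usize, size: usize) -> usize {
    usize::checked_mul((size as f64).sqrt() as usize, access_count).unwrap_or(usize::MAX)
}

/// usize::MAX / size; an empty file (size 0) gets priority 0.
fn small_files_priority_function(_: usize, size: usize) -> usize {
    usize::MAX.checked_div(size).unwrap_or(0)
}

fn main() {
    let p5 = default_priority_function(1, MEG1 * 5); // sqrt(5_242_880) = 2289.73...
    let p2 = default_priority_function(1, MEG1 * 2); // sqrt(2_097_152) = 1448.15...
    let p1 = default_priority_function(2, MEG1); // sqrt(1_048_576) = 1024, times 2 accesses

    println!("5 MB, 1 access:   {}", p5); // 2289
    println!("2 MB, 1 access:   {}", p2); // 1448
    println!("1 MB, 2 accesses: {}", p1); // 2048

    // Find the lowest-scoring file. Under the rule documented on
    // CacheBuilder::priority_function, that is the file that loses out.
    let lowest = [("5 MB", p5), ("2 MB", p2), ("1 MB", p1)]
        .iter()
        .min_by_key(|&&(_, p)| p)
        .map(|&(name, _)| name)
        .unwrap();
    println!("lowest priority: {}", lowest); // 2 MB

    // Overflow saturates instead of panicking, and empty files score 0.
    assert_eq!(default_priority_function(usize::MAX, MEG1), usize::MAX);
    assert_eq!(small_files_priority_function(1, 0), 0);
}
```

Swapping in `normal_priority_function` (size times access count) would score the 2 MB file accessed once and the 1 MB file accessed twice identically (2,097,152 each). Taking the square root of the size is what tips the default function toward the more frequently requested file.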