├── README.md ├── TODO.md ├── admin.go ├── admin_test.go ├── atomic.go ├── cli └── cli.go ├── cmd ├── hafs-evict │ └── main.go ├── hafs-fuse │ └── main.go ├── hafs-gc-clients │ └── main.go ├── hafs-gc-unlinked │ └── main.go ├── hafs-list-clients │ └── main.go ├── hafs-list-filesystems │ └── main.go ├── hafs-mkfs │ └── main.go ├── hafs-object-storage │ └── main.go └── hafs-rmfs │ └── main.go ├── fs.go ├── fs_test.go ├── fuse.go ├── go.mod ├── object_storage.go ├── support └── nix │ ├── foundationdb.nix │ └── shell.nix └── testutil └── testutil.go /README.md: -------------------------------------------------------------------------------- 1 | # High Availability Filesystem 2 | 3 | A distributed filesystem library and fuse filesystem built on FoundationDB, meant 4 | for storing metadata and doing file locking across a cluster of servers or containers. 5 | 6 | You could think of HAFS like consul or etcd, but as a fuse filesystem - applications can use file locks and write + rename to do atomic updates across your entire cluster without any library support (see the example below). 7 | 8 | The HAFS filesystem also has the ability to store file data in s3 and other 9 | external object storage to create horizontally scalable directories that work well with 10 | write-once, sequential-read workloads. 11 | 12 | ## Why does this exist 13 | 14 | My project [bupstash.io](https://bupstash.io/) needed a reliable and fault-tolerant 15 | way to serve repository metadata across many machines. Having been bitten by NFS in the past, I wanted to 16 | be confident in all the failure modes I would encounter when using distributed file locks. 17 | 18 | ### Ideal use cases 19 | 20 | In general this filesystem is good for configuration files and metadata that must be kept consistent 21 | across a whole cluster while also tolerating server failure - it has very low single-threaded write throughput (unless you use the s3 backed file feature). 22 | 23 | ## Features 24 | 25 | - A distributed and consistent filesystem suitable for metadata and configuration across a cluster. 26 | - Directories that can efficiently scale to huge numbers of files. 27 | - Optional s3 backed files that are efficient for sequential access of bulk data. 28 | - Support for distributed posix whole-file locks and BSD locks with client eviction - if 29 | a lock is broken, that client can no longer read or write to the filesystem without remounting. 30 | 31 | ## Current Limitations 32 | 33 | - Posix locks do not support partial file range locks (so non-exclusive sqlite3 doesn't work yet). 34 | - Files backed by s3 or other object storage are not efficient for random access, only sequential access. 35 | - Because HAFS is built on top of foundationdb, it requires at least 4GB of ram per node, which can be a bit heavy depending on the use case and project budget (though it works great on an existing foundationDB deployment). 36 | - Currently HAFS uses fuse permission checks and is thus subject to TOCTOU races if you allow multiple 37 | users to modify files. Unless you understand this limitation it is better to use a single-user fuse mount. 38 | - Inodes may be accessed after unlinking like a typical filesystem, but in HAFS they expire after a 39 | configurable time limit; you cannot use an unlinked file indefinitely. 40 |
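## Example: atomic cluster-wide updates

A minimal sketch of the write + rename pattern mentioned above, run from any machine with the filesystem mounted (the mount point and file names are just placeholders):

```
package main

import (
	"log"
	"os"
)

func main() {
	// Write the new contents to a temporary file on the HAFS mount...
	const tmp = "/mnt/hafs/config.json.tmp"
	const target = "/mnt/hafs/config.json"
	if err := os.WriteFile(tmp, []byte(`{"version": 2}`), 0o644); err != nil {
		log.Fatal(err)
	}
	// ...then rename it over the old path. Rename is atomic, so every
	// client in the cluster sees either the old or the new contents,
	// never a partial write.
	if err := os.Rename(tmp, target); err != nil {
		log.Fatal(err)
	}
}
```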
41 | ## Caveats and Gotchas 42 | 43 | - File locking is only effective at protecting updates to the filesystem; you should not coordinate 44 | updates to resources outside the filesystem without at least doing periodic file access and 45 | understanding the limitations of this approach. 46 | 47 | ## Getting started 48 | 49 | You will need: 50 | 51 | - A running foundationdb cluster. 52 | - The hafs binaries added to your PATH. 53 | 54 | Create and mount a filesystem: 55 | 56 | ``` 57 | # Create the default 'fs' filesystem. 58 | $ hafs-mkfs 59 | $ hafs-fuse /mnt/hafs 60 | ``` 61 | 62 | From another terminal you can access hafs: 63 | 64 | ``` 65 | $ hafs-list-clients 66 | ... 67 | $ ls /mnt/hafs 68 | $ echo foo > /mnt/hafs/test.txt 69 | ``` 70 | 71 | From any number of other machines mount the same filesystem and have a fully consistent distributed 72 | filesystem - including file locks. 73 | 74 | ## S3 backed files 75 | 76 | Implemented but currently undocumented... -------------------------------------------------------------------------------- /TODO.md: -------------------------------------------------------------------------------- 1 | # TODO 2 | 3 | The filesystem works well for many use cases, but there are many 4 | things that could still be improved. 5 | 6 | ## Permission checks 7 | 8 | Permission checks are all done via fuse - however because HAFS is a distributed filesystem there 9 | are still TOCTOU races - we need to do our own permission checks within transactions to resolve this (a sketch of what this could look like is at the end of this file). 10 | 11 | It should be noted these might not be worth fixing if the various cases are documented - for example, 12 | to prevent users affecting each other they could simply deny other users any access to their directories at all. 13 | 14 | ## Loop checks 15 | 16 | In certain situations it is possible to move a directory inside itself and it will become unreachable; 17 | if there is an efficient way to avoid this we could consider it - though it might not be worth fixing. 18 | 19 | ## Respect relatime 20 | 21 | How can we properly respect atime and relatime? Is it a fuse flag we must respect? 22 | 23 | ## Full test coverage 24 | 25 | 100 percent test coverage. 26 | 27 | ## Review code for TODO 28 | 29 | There are various XXX and TODO in the code that need to be addressed or decided upon.
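## Permission check sketch

A rough sketch of the in-transaction permission check mentioned above - `statAllowsAccess` is a hypothetical helper built on the existing `Stat` fields, not code that exists yet:

```
// statAllowsAccess checks classic unix permission bits against a Stat
// that was fetched inside the same transaction as the mutation it guards,
// closing the TOCTOU window described above. want is the permission
// triad being requested, e.g. 0o2 for write, 0o4 for read.
func statAllowsAccess(stat Stat, uid, gid uint32, want uint32) bool {
	if uid == 0 {
		return true // Root bypasses permission checks.
	}
	mode := stat.Mode
	switch {
	case uid == stat.Uid:
		mode >>= 6 // Owner permission triad.
	case gid == stat.Gid:
		mode >>= 3 // Group permission triad.
	}
	return mode&want == want
}
```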
30 | 31 | -------------------------------------------------------------------------------- /admin.go: -------------------------------------------------------------------------------- 1 | package hafs 2 | 3 | import ( 4 | "encoding/binary" 5 | "encoding/json" 6 | "errors" 7 | "fmt" 8 | "time" 9 | 10 | "github.com/apple/foundationdb/bindings/go/src/fdb" 11 | "github.com/apple/foundationdb/bindings/go/src/fdb/tuple" 12 | ) 13 | 14 | const ( 15 | CURRENT_FDB_API_VERSION = 710 16 | ) 17 | 18 | func init() { 19 | fdb.MustAPIVersion(CURRENT_FDB_API_VERSION) 20 | } 21 | 22 | type MkfsOpts struct { 23 | Overwrite bool 24 | } 25 | 26 | func Mkfs(db fdb.Database, fsName string, opts MkfsOpts) error { 27 | 28 | if fsName == "" { 29 | return errors.New("filesystem name must not be empty") 30 | } 31 | 32 | if len(fsName) > 255 { 33 | return errors.New("filesystem name must be less than 256 bytes") 34 | } 35 | 36 | validNameRune := func(r rune) bool { 37 | if r == '-' || r == '_' { 38 | return true 39 | } else if (r >= 'A' && r <= 'Z') || (r >= 'a' && r <= 'z') { 40 | return true 41 | } else if r >= '0' && r <= '9' { 42 | return true 43 | } else { 44 | return false 45 | } 46 | } 47 | 48 | for _, r := range fsName { 49 | if !validNameRune(r) { 50 | return errors.New("filesystem names must only contain 'a-z', 'A-Z', '0-9', '-' and '_'") 51 | } 52 | } 53 | 54 | _, err := db.Transact(func(tx fdb.Transaction) (interface{}, error) { 55 | 56 | if tx.Get(tuple.Tuple{"hafs", fsName, "version"}).MustGet() != nil { 57 | if !opts.Overwrite { 58 | return nil, errors.New("filesystem already present") 59 | } 60 | } 61 | 62 | now := time.Now() 63 | 64 | rootStat := Stat{ 65 | Ino: ROOT_INO, 66 | Subvolume: 0, 67 | Flags: FLAG_SUBVOLUME, // The root inode is a subvolume of the filesystem. 68 | Size: 0, 69 | Atimesec: 0, 70 | Mtimesec: 0, 71 | Ctimesec: 0, 72 | Atimensec: 0, 73 | Mtimensec: 0, 74 | Ctimensec: 0, 75 | Mode: S_IFDIR | 0o755, 76 | Nlink: 1, 77 | Uid: 0, 78 | Gid: 0, 79 | Rdev: 0, 80 | } 81 | 82 | rootStat.SetMtime(now) 83 | rootStat.SetCtime(now) 84 | rootStat.SetAtime(now) 85 | 86 | rootStatBytes, err := rootStat.MarshalBinary() 87 | if err != nil { 88 | return nil, err 89 | } 90 | 91 | tx.ClearRange(tuple.Tuple{"hafs", fsName}) 92 | tx.Set(tuple.Tuple{"hafs", fsName, "object-storage"}, []byte("")) 93 | tx.Set(tuple.Tuple{"hafs", fsName, "version"}, []byte{CURRENT_SCHEMA_VERSION}) 94 | tx.Set(tuple.Tuple{"hafs", fsName, "ino", ROOT_INO, "stat"}, rootStatBytes) 95 | tx.Set(tuple.Tuple{"hafs", fsName, "inocntr"}, []byte{0, 0, 0, 0, 0, 0, 0, 1}) 96 | return nil, nil 97 | }) 98 | return err 99 | } 100 | 101 | type RmfsOpts struct { 102 | Force bool 103 | } 104 | 105 | func Rmfs(db fdb.Database, fsName string, opts RmfsOpts) (bool, error) { 106 | v, err := db.Transact(func(tx fdb.Transaction) (interface{}, error) { 107 | kvs := tx.GetRange(tuple.Tuple{"hafs", fsName}, fdb.RangeOptions{ 108 | Limit: 2, 109 | }).GetSliceOrPanic() 110 | if len(kvs) == 0 { 111 | return false, nil 112 | } 113 | if !opts.Force { 114 | kvs = tx.GetRange(tuple.Tuple{"hafs", fsName, "ino"}, fdb.RangeOptions{ 115 | Limit: 2, 116 | }).GetSliceOrPanic() 117 | if len(kvs) != 1 { 118 | // The root inode is an exception.
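// (A freshly created filesystem contains exactly one "ino" entry: the root inode's stat written by Mkfs, so a single entry means the filesystem is empty.)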
119 | return false, fmt.Errorf("filesystem is not empty") 120 | } 121 | kvs = tx.GetRange(tuple.Tuple{"hafs", fsName, "clients"}, fdb.RangeOptions{ 122 | Limit: 1, 123 | }).GetSliceOrPanic() 124 | if len(kvs) != 0 { 125 | return false, fmt.Errorf("filesystem has connected clients") 126 | } 127 | kvs = tx.GetRange(tuple.Tuple{"hafs", fsName, "unlinked"}, fdb.RangeOptions{ 128 | Limit: 1, 129 | }).GetSliceOrPanic() 130 | if len(kvs) != 0 { 131 | return false, fmt.Errorf("filesystem has inodes pending garbage collection") 132 | } 133 | } 134 | 135 | tx.ClearRange(tuple.Tuple{"hafs", fsName}) 136 | return true, nil 137 | }) 138 | if err != nil { 139 | return false, err 140 | } 141 | return v.(bool), nil 142 | } 143 | 144 | type ClientInfo struct { 145 | Id string `json:",omitempty"` 146 | Description string `json:",omitempty"` 147 | Hostname string `json:",omitempty"` 148 | Pid int64 `json:",omitempty"` 149 | Exe string `json:",omitempty"` 150 | AttachTimeUnix uint64 `json:",omitempty"` 151 | HeartBeatUnix uint64 `json:",omitempty"` 152 | } 153 | 154 | func GetClientInfo(db fdb.Database, fsName, clientId string) (ClientInfo, bool, error) { 155 | 156 | var ok bool 157 | var info ClientInfo 158 | 159 | _, err := db.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 160 | info = ClientInfo{} 161 | ok = false 162 | 163 | if tx.Get(tuple.Tuple{"hafs", fsName, "version"}).MustGet() == nil { 164 | return nil, ErrNotFormatted 165 | } 166 | 167 | infoBytes := tx.Get(tuple.Tuple{"hafs", fsName, "client", clientId, "info"}).MustGet() 168 | if infoBytes == nil { 169 | return nil, nil 170 | } 171 | 172 | err := json.Unmarshal(infoBytes, &info) 173 | if err != nil { 174 | return nil, err 175 | } 176 | 177 | heartBeatBytes := tx.Get(tuple.Tuple{"hafs", fsName, "client", clientId, "heartbeat"}).MustGet() 178 | if len(heartBeatBytes) != 8 { 179 | return nil, errors.New("heart beat bytes are missing or corrupt") 180 | } 181 | info.HeartBeatUnix = binary.LittleEndian.Uint64(heartBeatBytes) 182 | info.Id = clientId 183 | ok = true 184 | return nil, nil 185 | }) 186 | 187 | return info, ok, err 188 | } 189 | 190 | func ListClients(db fdb.Database, fsName string) ([]ClientInfo, error) { 191 | 192 | clients := []ClientInfo{} 193 | 194 | iterBegin, iterEnd := tuple.Tuple{"hafs", fsName, "clients"}.FDBRangeKeys() 195 | 196 | iterRange := fdb.KeyRange{ 197 | Begin: iterBegin, 198 | End: iterEnd, 199 | } 200 | 201 | for { 202 | v, err := db.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 203 | if tx.Get(tuple.Tuple{"hafs", fsName, "version"}).MustGet() == nil { 204 | return nil, ErrNotFormatted 205 | } 206 | kvs := tx.GetRange(iterRange, fdb.RangeOptions{ 207 | Limit: 100, 208 | }).GetSliceOrPanic() 209 | return kvs, nil 210 | }) 211 | if err != nil { 212 | return clients, err 213 | } 214 | 215 | kvs := v.([]fdb.KeyValue) 216 | 217 | if len(kvs) == 0 { 218 | break 219 | } 220 | 221 | nextBegin, err := fdb.Strinc(kvs[len(kvs)-1].Key) 222 | if err != nil { 223 | return clients, err 224 | } 225 | iterRange.Begin = fdb.Key(nextBegin) 226 | 227 | for _, kv := range kvs { 228 | tup, err := tuple.Unpack(kv.Key) 229 | if err != nil { 230 | return clients, err 231 | } 232 | 233 | if len(tup) < 1 { 234 | return clients, errors.New("corrupt client key") 235 | } 236 | 237 | clientId := tup[len(tup)-1].(string) 238 | 239 | client, ok, err := GetClientInfo(db, fsName, clientId) 240 | if err != nil { 241 | return clients, err 242 | } 243 | if !ok { 244 | continue 245 | } 246 | clients = append(clients, 
client) 247 | } 248 | } 249 | 250 | return clients, nil 251 | } 252 | 253 | func IsClientTimedOut(db fdb.Database, fsName, clientId string, clientTimeout time.Duration) (bool, error) { 254 | timedOut, err := db.Transact(func(tx fdb.Transaction) (interface{}, error) { 255 | heartBeatKey := tuple.Tuple{"hafs", fsName, "client", clientId, "heartbeat"} 256 | heartBeatBytes := tx.Get(heartBeatKey).MustGet() 257 | if len(heartBeatBytes) != 8 { 258 | return true, nil 259 | } 260 | lastSeen := time.Unix(int64(binary.LittleEndian.Uint64(heartBeatBytes)), 0) 261 | timedOut := lastSeen.Add(clientTimeout).Before(time.Now()) 262 | return timedOut, nil 263 | }) 264 | if err != nil { 265 | return false, err 266 | } 267 | return timedOut.(bool), nil 268 | } 269 | 270 | func tupleElem2u64(elem tuple.TupleElement) uint64 { 271 | switch elem := elem.(type) { 272 | case uint64: 273 | return elem 274 | case int64: 275 | return uint64(elem) 276 | default: 277 | panic(elem) 278 | } 279 | } 280 | 281 | func txBreakLock(tx fdb.Transaction, fsName, clientId string, ino uint64, owner uint64) error { 282 | exclusiveLockKey := tuple.Tuple{"hafs", fsName, "ino", ino, "lock", "exclusive"} 283 | exclusiveLockBytes := tx.Get(exclusiveLockKey).MustGet() 284 | if exclusiveLockBytes != nil { 285 | exclusiveLock := exclusiveLockRecord{} 286 | err := exclusiveLock.UnmarshalBinary(exclusiveLockBytes) 287 | if err != nil { 288 | return err 289 | } 290 | if exclusiveLock.ClientId == clientId && exclusiveLock.Owner == owner { 291 | tx.Clear(exclusiveLockKey) 292 | } 293 | } else { 294 | sharedLockKey := tuple.Tuple{"hafs", fsName, "ino", ino, "lock", "shared", clientId, owner} 295 | tx.Clear(sharedLockKey) 296 | } 297 | tx.Clear(tuple.Tuple{"hafs", fsName, "client", clientId, "lock", ino, owner}) 298 | return nil 299 | } 300 | 301 | func EvictClient(db fdb.Database, fsName, clientId string) error { 302 | _, err := db.Transact(func(tx fdb.Transaction) (interface{}, error) { 303 | if tx.Get(tuple.Tuple{"hafs", fsName, "version"}).MustGet() == nil { 304 | return nil, ErrNotFormatted 305 | } 306 | // Invalidate all the client's in-progress transactions. 307 | tx.ClearRange(tuple.Tuple{"hafs", fsName, "client", clientId, "attached"}) 308 | return nil, nil 309 | }) 310 | if err != nil { 311 | return err 312 | } 313 | 314 | // Remove all file locks held by the client.
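// This runs as a series of small bounded transactions so we stay well under foundationdb's transaction size and time limits; it is safe to do incrementally because clearing the "attached" key above stops the client from taking new locks.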
315 | iterBegin, iterEnd := tuple.Tuple{"hafs", fsName, "client", clientId, "lock"}.FDBRangeKeys() 316 | 317 | iterRange := fdb.KeyRange{ 318 | Begin: iterBegin, 319 | End: iterEnd, 320 | } 321 | 322 | for { 323 | v, err := db.Transact(func(tx fdb.Transaction) (interface{}, error) { 324 | kvs := tx.GetRange(iterRange, fdb.RangeOptions{ 325 | Limit: 64, 326 | }).GetSliceOrPanic() 327 | return kvs, nil 328 | }) 329 | if err != nil { 330 | return err 331 | } 332 | 333 | kvs := v.([]fdb.KeyValue) 334 | 335 | if len(kvs) == 0 { 336 | break 337 | } 338 | 339 | nextBegin, err := fdb.Strinc(kvs[len(kvs)-1].Key) 340 | if err != nil { 341 | return err 342 | } 343 | iterRange.Begin = fdb.Key(nextBegin) 344 | 345 | _, err = db.Transact(func(tx fdb.Transaction) (interface{}, error) { 346 | for _, kv := range kvs { 347 | tup, err := tuple.Unpack(kv.Key) 348 | if err != nil { 349 | return nil, err 350 | } 351 | if len(tup) < 2 { 352 | return nil, errors.New("corrupt lock entry") 353 | } 354 | owner := tupleElem2u64(tup[len(tup)-1]) 355 | ino := tupleElem2u64(tup[len(tup)-2]) 356 | err = txBreakLock(tx, fsName, clientId, ino, owner) 357 | if err != nil { 358 | return nil, err 359 | } 360 | } 361 | 362 | return nil, nil 363 | }) 364 | 365 | if err != nil { 366 | return err 367 | } 368 | 369 | } 370 | 371 | // Finally we can remove the client. 372 | _, err = db.Transact(func(tx fdb.Transaction) (interface{}, error) { 373 | tx.Clear(tuple.Tuple{"hafs", fsName, "clients", clientId}) 374 | tx.ClearRange(tuple.Tuple{"hafs", fsName, "client", clientId}) 375 | return nil, nil 376 | }) 377 | if err != nil { 378 | return err 379 | } 380 | 381 | return nil 382 | } 383 | 384 | type EvictExpiredClientsOptions struct { 385 | ClientExpiry time.Duration 386 | OnEviction func(string) 387 | } 388 | 389 | func EvictExpiredClients(db fdb.Database, fsName string, opts EvictExpiredClientsOptions) (uint64, error) { 390 | 391 | nEvicted := uint64(0) 392 | 393 | iterBegin, iterEnd := tuple.Tuple{"hafs", fsName, "clients"}.FDBRangeKeys() 394 | 395 | iterRange := fdb.KeyRange{ 396 | Begin: iterBegin, 397 | End: iterEnd, 398 | } 399 | 400 | for { 401 | v, err := db.Transact(func(tx fdb.Transaction) (interface{}, error) { 402 | if tx.Get(tuple.Tuple{"hafs", fsName, "version"}).MustGet() == nil { 403 | return nil, ErrNotFormatted 404 | } 405 | kvs := tx.GetRange(iterRange, fdb.RangeOptions{ 406 | Limit: 100, 407 | }).GetSliceOrPanic() 408 | return kvs, nil 409 | }) 410 | if err != nil { 411 | return nEvicted, err 412 | } 413 | 414 | kvs := v.([]fdb.KeyValue) 415 | 416 | if len(kvs) == 0 { 417 | break 418 | } 419 | 420 | nextBegin, err := fdb.Strinc(kvs[len(kvs)-1].Key) 421 | if err != nil { 422 | return nEvicted, err 423 | } 424 | iterRange.Begin = fdb.Key(nextBegin) 425 | 426 | for _, kv := range kvs { 427 | tup, err := tuple.Unpack(kv.Key) 428 | if err != nil { 429 | return nEvicted, err 430 | } 431 | 432 | if len(tup) < 1 { 433 | return nEvicted, errors.New("corrupt client key") 434 | } 435 | 436 | clientId := tup[len(tup)-1].(string) 437 | 438 | shouldEvict, err := IsClientTimedOut(db, fsName, clientId, opts.ClientExpiry) 439 | if err != nil { 440 | return nEvicted, err 441 | } 442 | 443 | if !shouldEvict { 444 | continue 445 | } 446 | 447 | err = EvictClient(db, fsName, clientId) 448 | if err != nil { 449 | return nEvicted, err 450 | } 451 | if opts.OnEviction != nil { 452 | opts.OnEviction(clientId) 453 | } 454 | 455 | nEvicted += 1 456 | } 457 | } 458 | 459 | return nEvicted, nil 460 | } 461 | 462 | func GetObjectStorageSpec(db 
fdb.Database, fsName string) (string, error) { 463 | v, err := db.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 464 | if tx.Get(tuple.Tuple{"hafs", fsName, "version"}).MustGet() == nil { 465 | return nil, ErrNotFormatted 466 | } 467 | v := tx.Get(tuple.Tuple{"hafs", fsName, "object-storage"}).MustGet() 468 | return string(v), nil 469 | }) 470 | if err != nil { 471 | return "", err 472 | } 473 | return v.(string), nil 474 | } 475 | 476 | type SetObjectStorageSpecOpts struct { 477 | Force bool 478 | } 479 | 480 | func SetObjectStorageSpec(db fdb.Database, fsName string, objectStorageSpec string, opts SetObjectStorageSpecOpts) error { 481 | 482 | objectStorage, err := NewObjectStorageEngine(objectStorageSpec) 483 | if err != nil { 484 | return fmt.Errorf("unable to validate storage engine spec: %w", err) 485 | } 486 | defer objectStorage.Close() 487 | 488 | _, err = db.Transact(func(tx fdb.Transaction) (interface{}, error) { 489 | if tx.Get(tuple.Tuple{"hafs", fsName, "version"}).MustGet() == nil { 490 | return nil, ErrNotFormatted 491 | } 492 | kvs := tx.GetRange(tuple.Tuple{"hafs", fsName, "clients"}, fdb.RangeOptions{ 493 | Limit: 1, 494 | }).GetSliceOrPanic() 495 | if len(kvs) != 0 && !opts.Force { 496 | return nil, fmt.Errorf("unable to set object storage with active clients without the 'force' option") 497 | } 498 | tx.Set(tuple.Tuple{"hafs", fsName, "object-storage"}, []byte(objectStorageSpec)) 499 | return nil, nil 500 | }) 501 | if err != nil { 502 | return err 503 | } 504 | 505 | return nil 506 | } 507 | -------------------------------------------------------------------------------- /admin_test.go: -------------------------------------------------------------------------------- 1 | package hafs 2 | 3 | import ( 4 | "os" 5 | "reflect" 6 | "testing" 7 | "time" 8 | 9 | "github.com/andrewchambers/hafs/testutil" 10 | "github.com/apple/foundationdb/bindings/go/src/fdb" 11 | ) 12 | 13 | func tmpDB(t *testing.T) fdb.Database { 14 | db := testutil.NewFDBTestServer(t).Dial() 15 | err := Mkfs(db, "testfs", MkfsOpts{Overwrite: false}) 16 | if err != nil { 17 | t.Fatal(err) 18 | } 19 | return db 20 | } 21 | 22 | func TestClientTimedOut(t *testing.T) { 23 | t.Parallel() 24 | db := testutil.NewFDBTestServer(t).Dial() 25 | 26 | err := Mkfs(db, "testfs", MkfsOpts{}) 27 | if err != nil { 28 | t.Fatal(err) 29 | } 30 | 31 | fs, err := Attach(db, "testfs", AttachOpts{}) 32 | if err != nil { 33 | t.Fatal(err) 34 | } 35 | 36 | expired, err := IsClientTimedOut(db, "testfs", fs.clientId, time.Duration(5*time.Second)) 37 | if err != nil { 38 | t.Fatal(err) 39 | } 40 | if expired { 41 | t.Fatal("expected not expired") 42 | } 43 | 44 | time.Sleep(1 * time.Second) 45 | 46 | expired, err = IsClientTimedOut(db, "testfs", fs.clientId, time.Duration(0)) 47 | if err != nil { 48 | t.Fatal(err) 49 | } 50 | if !expired { 51 | t.Fatal("expected expired") 52 | } 53 | } 54 | 55 | func TestClientInfo(t *testing.T) { 56 | t.Parallel() 57 | db := testutil.NewFDBTestServer(t).Dial() 58 | 59 | err := Mkfs(db, "testfs", MkfsOpts{Overwrite: false}) 60 | if err != nil { 61 | t.Fatal(err) 62 | } 63 | 64 | fs, err := Attach(db, "testfs", AttachOpts{}) 65 | if err != nil { 66 | t.Fatal(err) 67 | } 68 | 69 | info, ok, err := GetClientInfo(db, "testfs", fs.clientId) 70 | if err != nil { 71 | t.Fatal(err) 72 | } 73 | if !ok { 74 | t.Fatal("client missing") 75 | } 76 | 77 | if info.Pid != int64(os.Getpid()) { 78 | t.Fatalf("%v", info) 79 | } 80 | 81 | clients, err := ListClients(db, "testfs") 82 | if err != nil { 83 
| t.Fatal(err) 84 | } 85 | if len(clients) != 1 { 86 | t.Fatal("unexpected number of clients") 87 | } 88 | 89 | if clients[0] != info { 90 | t.Fatal("client info differs from expected") 91 | } 92 | } 93 | 94 | func TestListFilesystems(t *testing.T) { 95 | t.Parallel() 96 | db := testutil.NewFDBTestServer(t).Dial() 97 | 98 | for _, name := range []string{"myfs", "zzz"} { 99 | err := Mkfs(db, name, MkfsOpts{}) 100 | if err != nil { 101 | t.Fatal(err) 102 | } 103 | } 104 | 105 | filesystems, err := ListFilesystems(db) 106 | if err != nil { 107 | t.Fatal(err) 108 | } 109 | 110 | if !reflect.DeepEqual(filesystems, []string{"myfs", "zzz"}) { 111 | t.Fatalf("unexpected filesystem list") 112 | } 113 | 114 | } 115 | 116 | func TestEvictClient(t *testing.T) { 117 | t.Parallel() 118 | db := testutil.NewFDBTestServer(t).Dial() 119 | 120 | err := Mkfs(db, "testfs", MkfsOpts{}) 121 | if err != nil { 122 | t.Fatal(err) 123 | } 124 | 125 | fs1, err := Attach(db, "testfs", AttachOpts{}) 126 | if err != nil { 127 | t.Fatal(err) 128 | } 129 | defer fs1.Close() 130 | fs2, err := Attach(db, "testfs", AttachOpts{}) 131 | if err != nil { 132 | t.Fatal(err) 133 | } 134 | defer fs2.Close() 135 | 136 | stat, err := fs1.Mknod(ROOT_INO, "f", MknodOpts{ 137 | Mode: S_IFREG | 0o777, 138 | Uid: 0, 139 | Gid: 0, 140 | }) 141 | 142 | ok, err := fs1.TrySetLock(stat.Ino, SetLockOpts{ 143 | Typ: LOCK_SHARED, 144 | Owner: 1, 145 | }) 146 | if err != nil { 147 | t.Fatal(err) 148 | } 149 | if !ok { 150 | t.Fatal() 151 | } 152 | 153 | err = EvictClient(db, "testfs", fs1.clientId) 154 | if err != nil { 155 | t.Fatal(err) 156 | } 157 | 158 | ok, err = fs2.TrySetLock(stat.Ino, SetLockOpts{ 159 | Typ: LOCK_EXCLUSIVE, 160 | Owner: 1, 161 | }) 162 | if err != nil { 163 | t.Fatal(err) 164 | } 165 | if !ok { 166 | t.Fatal() 167 | } 168 | 169 | _, err = fs1.Mknod(ROOT_INO, "f2", MknodOpts{ 170 | Mode: S_IFREG | 0o777, 171 | Uid: 0, 172 | Gid: 0, 173 | }) 174 | if err != ErrDetached { 175 | t.Fatal(err) 176 | } 177 | 178 | } 179 | -------------------------------------------------------------------------------- /atomic.go: -------------------------------------------------------------------------------- 1 | package hafs 2 | 3 | import ( 4 | "sync/atomic" 5 | ) 6 | 7 | type atomicBool struct { 8 | v uint32 9 | } 10 | 11 | func (b *atomicBool) Store(v bool) { 12 | if v { 13 | atomic.StoreUint32(&b.v, 1) 14 | } else { 15 | atomic.StoreUint32(&b.v, 0) 16 | } 17 | } 18 | 19 | func (b *atomicBool) Load() bool { 20 | return atomic.LoadUint32(&b.v) == 1 21 | } 22 | 23 | type atomicUint64 struct { 24 | v uint64 25 | } 26 | 27 | func (b *atomicUint64) Add(n uint64) uint64 { 28 | return atomic.AddUint64(&b.v, n) 29 | } 30 | 31 | func (b *atomicUint64) Load() uint64 { 32 | return atomic.LoadUint64(&b.v) 33 | } 34 | -------------------------------------------------------------------------------- /cli/cli.go: -------------------------------------------------------------------------------- 1 | package cli 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "os" 7 | "os/signal" 8 | 9 | "github.com/andrewchambers/hafs" 10 | "github.com/apple/foundationdb/bindings/go/src/fdb" 11 | "golang.org/x/sys/unix" 12 | ) 13 | 14 | var FsName string 15 | var ClientDescription string 16 | var ClusterFile string 17 | var SmallObjectOptimizationThreshold uint64 = 1024 * 1024 * 4 18 | 19 | func RegisterClusterFileFlag() { 20 | defaultClusterFile := os.Getenv("FDB_CLUSTER_FILE") 21 | if defaultClusterFile == "" { 22 | defaultClusterFile = "./fdb.cluster" 23 | _, err := 
os.Stat("./fdb.cluster") 24 | if err != nil { 25 | defaultClusterFile = "/etc/foundationdb/fdb.cluster" 26 | } 27 | } 28 | 29 | flag.StringVar( 30 | &ClusterFile, 31 | "cluster-file", 32 | defaultClusterFile, 33 | "FoundationDB cluster file, defaults to FDB_CLUSTER_FILE if set, ./fdb.cluster if present, otherwise /etc/foundationdb/fdb.cluster", 34 | ) 35 | } 36 | 37 | func RegisterClientDescriptionFlag() { 38 | flag.StringVar( 39 | &ClientDescription, 40 | "client-description", 41 | "", 42 | "Optional decription of this fs client.", 43 | ) 44 | } 45 | 46 | func RegisterFsNameFlag() { 47 | flag.StringVar( 48 | &FsName, 49 | "fs-name", 50 | "", 51 | "Name of the filesystem to interact with.", 52 | ) 53 | } 54 | 55 | func RegisterSmallObjectOptimizationThresholdFlag() { 56 | flag.Uint64Var( 57 | &SmallObjectOptimizationThreshold, 58 | "small-object-optimization-threshold", 59 | SmallObjectOptimizationThreshold, 60 | "External object storage files smaller than this size in bytes are loaded and served from ram instead of streamed.", 61 | ) 62 | } 63 | 64 | func RegisterFsSignalHandlers(fs *hafs.Fs) { 65 | sigChan := make(chan os.Signal, 1) 66 | signal.Notify(sigChan, unix.SIGINT, unix.SIGTERM) 67 | 68 | go func() { 69 | <-sigChan 70 | signal.Reset() 71 | fmt.Fprintf(os.Stderr, "closing down due to signal...\n") 72 | err := fs.Close() 73 | if err != nil { 74 | fmt.Fprintf(os.Stderr, "error disconnecting client: %s\n", err) 75 | os.Exit(1) 76 | } 77 | os.Exit(0) 78 | }() 79 | } 80 | 81 | func MustOpenDatabase() fdb.Database { 82 | db, err := fdb.OpenDatabase(ClusterFile) 83 | if err != nil { 84 | fmt.Fprintf(os.Stderr, "unable to open database: %s\n", err) 85 | os.Exit(1) 86 | } 87 | return db 88 | } 89 | 90 | func MustAttach(db fdb.Database) *hafs.Fs { 91 | fs, err := hafs.Attach(db, FsName, hafs.AttachOpts{ 92 | ClientDescription: ClientDescription, 93 | SmallObjectOptimizationThreshold: SmallObjectOptimizationThreshold, 94 | OnEviction: func(fs *hafs.Fs) { 95 | fmt.Fprintf(os.Stderr, "client evicted, aborting...\n") 96 | os.Exit(1) 97 | }, 98 | }) 99 | if err != nil { 100 | fmt.Fprintf(os.Stderr, "unable to connect to filesystem: %s\n", err) 101 | os.Exit(1) 102 | } 103 | return fs 104 | } 105 | -------------------------------------------------------------------------------- /cmd/hafs-evict/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "os" 7 | 8 | "github.com/andrewchambers/hafs" 9 | "github.com/andrewchambers/hafs/cli" 10 | ) 11 | 12 | func main() { 13 | cli.RegisterClusterFileFlag() 14 | cli.RegisterFsNameFlag() 15 | flag.Parse() 16 | db := cli.MustOpenDatabase() 17 | 18 | args := flag.Args() 19 | if len(args) != 1 { 20 | fmt.Fprintf(os.Stderr, "expecting a single client to evict\n") 21 | os.Exit(1) 22 | } 23 | 24 | err := hafs.EvictClient(db, cli.FsName, args[0]) 25 | if err != nil { 26 | fmt.Fprintf(os.Stderr, "error evicting client: %s\n", err) 27 | os.Exit(1) 28 | } 29 | 30 | } 31 | -------------------------------------------------------------------------------- /cmd/hafs-fuse/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "log" 7 | "os" 8 | "os/exec" 9 | "time" 10 | 11 | "github.com/andrewchambers/hafs" 12 | "github.com/andrewchambers/hafs/cli" 13 | "github.com/hanwen/go-fuse/v2/fuse" 14 | ) 15 | 16 | func usage() { 17 | fmt.Printf("hafs-fuse [OPTS] MOUNTPOINT\n\n") 18 | flag.Usage() 
19 | os.Exit(1) 20 | } 21 | 22 | func main() { 23 | cli.RegisterClusterFileFlag() 24 | cli.RegisterClientDescriptionFlag() 25 | cli.RegisterFsNameFlag() 26 | cli.RegisterSmallObjectOptimizationThresholdFlag() 27 | debugFuse := flag.Bool("debug-fuse", false, "Log fuse messages.") 28 | readdirPlus := flag.Bool("readdir-plus", false, "Enable readdir plus when listing directories (stat and readdir calls are batched together).") 29 | gcUnlinkedInterval := flag.Duration("gc-unlinked-interval", 8*time.Hour, "Unlinked inode garbage collection interval (0 to disable).") 30 | unlinkRemovalDelay := flag.Duration("unlink-removal-delay", 15*time.Minute, "Grace period for removal of unlinked files.") 31 | gcClientInterval := flag.Duration("gc-clients-interval", 24*time.Hour, "Client eviction interval (0 to disable).") 32 | clientExpiry := flag.Duration("client-expiry", 15*time.Minute, "Period of inactivity before a client is considered expired.") 33 | cacheDentries := flag.Duration("cache-dentries", 0, "Duration to cache dentry lookups, use with great care.") 34 | cacheAttributes := flag.Duration("cache-attributes", 0, "Duration to cache file attribute lookups, use with great care.") 35 | notifyCommand := flag.String("notify-command", "", "A command to run via sh -c \"$CMD\" once filesystem is successfully mounted.") 36 | 37 | flag.Parse() 38 | 39 | if len(flag.Args()) != 1 { 40 | usage() 41 | } 42 | 43 | mntDir := flag.Args()[0] 44 | 45 | db := cli.MustOpenDatabase() 46 | fs := cli.MustAttach(db) 47 | defer fs.Close() 48 | 49 | cli.RegisterFsSignalHandlers(fs) 50 | 51 | server, err := fuse.NewServer( 52 | hafs.NewFuseFs(fs, hafs.HafsFuseOptions{ 53 | CacheDentries: *cacheDentries, 54 | CacheAttributes: *cacheAttributes, 55 | }), 56 | mntDir, 57 | &fuse.MountOptions{ 58 | Name: "hafs", 59 | Options: []string{}, 60 | AllowOther: false, // XXX option? 61 | EnableLocks: true, 62 | IgnoreSecurityLabels: true, // option? 63 | Debug: *debugFuse, 64 | DisableReadDirPlus: !*readdirPlus, 65 | MaxWrite: fuse.MAX_KERNEL_WRITE, 66 | MaxReadAhead: fuse.MAX_KERNEL_WRITE, // XXX Use the max write as a guide for now, is this good? 
67 | }) 68 | if err != nil { 69 | fmt.Fprintf(os.Stderr, "unable to create fuse server: %s\n", err) 70 | os.Exit(1) 71 | } 72 | 73 | go server.Serve() 74 | 75 | err = server.WaitMount() 76 | if err != nil { 77 | fmt.Fprintf(os.Stderr, "unable to wait for mount: %s\n", err) 78 | os.Exit(1) 79 | } 80 | log.Printf("filesystem successfully mounted") 81 | 82 | if *gcUnlinkedInterval != 0 { 83 | go func() { 84 | for { 85 | time.Sleep(*gcUnlinkedInterval) 86 | log.Printf("starting garbage collection of unlinked inodes") 87 | nRemoved, err := fs.RemoveExpiredUnlinked(hafs.RemoveExpiredUnlinkedOptions{ 88 | RemovalDelay: *unlinkRemovalDelay, 89 | }) 90 | log.Printf("garbage collection removed %d unlinked inodes", nRemoved) 91 | if err != nil { 92 | log.Printf("error removing unlinked inodes: %s", err) 93 | } 94 | } 95 | }() 96 | } 97 | 98 | if *gcClientInterval != 0 { 99 | go func() { 100 | for { 101 | time.Sleep(*gcClientInterval) 102 | log.Printf("starting garbage collection of expired clients") 103 | nEvicted, err := hafs.EvictExpiredClients(db, cli.FsName, hafs.EvictExpiredClientsOptions{ 104 | ClientExpiry: *clientExpiry, 105 | }) 106 | log.Printf("garbage collection evicted %d expired clients", nEvicted) 107 | if err != nil { 108 | log.Printf("error evicting expired clients: %s", err) 109 | } 110 | } 111 | }() 112 | } 113 | 114 | if *notifyCommand != "" { 115 | cmdOut, err := exec.Command("sh", "-c", *notifyCommand).CombinedOutput() 116 | if err != nil { 117 | log.Fatalf("error running notify command: %s, output: %q", err, string(cmdOut)) 118 | } 119 | } 120 | 121 | // Serve the file system, until unmounted by calling fusermount -u 122 | server.Wait() 123 | } 124 | -------------------------------------------------------------------------------- /cmd/hafs-gc-clients/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "log" 7 | "os" 8 | "time" 9 | 10 | "github.com/andrewchambers/hafs" 11 | "github.com/andrewchambers/hafs/cli" 12 | ) 13 | 14 | func main() { 15 | verbose := flag.Bool("verbose", false, "Be verbose.") 16 | clientExpiry := flag.Duration("client-expiry", 15*time.Minute, "Period of inactivity before a client is considered expired.") 17 | cli.RegisterClusterFileFlag() 18 | cli.RegisterFsNameFlag() 19 | flag.Parse() 20 | db := cli.MustOpenDatabase() 21 | 22 | nEvicted, err := hafs.EvictExpiredClients(db, cli.FsName, hafs.EvictExpiredClientsOptions{ 23 | ClientExpiry: *clientExpiry, 24 | OnEviction: func(clientId string) { 25 | if *verbose { 26 | log.Printf("evicting client %s", clientId) 27 | } 28 | }, 29 | }) 30 | if err != nil { 31 | fmt.Fprintf(os.Stderr, "error evicting expired clients: %s\n", err) 32 | os.Exit(1) 33 | } 34 | 35 | log.Printf("evicted %d expired clients\n", nEvicted) 36 | } 37 | -------------------------------------------------------------------------------- /cmd/hafs-gc-unlinked/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "log" 7 | "os" 8 | "time" 9 | 10 | "github.com/andrewchambers/hafs" 11 | "github.com/andrewchambers/hafs/cli" 12 | ) 13 | 14 | func main() { 15 | verbose := flag.Bool("verbose", false, "Be verbose.") 16 | unlinkRemovalDelay := flag.Duration("unlink-removal-delay", 15*time.Minute, "Grace period for removal of unlinked files.") 17 | cli.RegisterClusterFileFlag() 18 | cli.RegisterFsNameFlag() 19 | flag.Parse() 20 | fs := cli.MustAttach(cli.MustOpenDatabase())
21 | defer fs.Close() 22 | 23 | cli.RegisterFsSignalHandlers(fs) 24 | 25 | nRemoved, err := fs.RemoveExpiredUnlinked(hafs.RemoveExpiredUnlinkedOptions{ 26 | RemovalDelay: *unlinkRemovalDelay, 27 | OnRemoval: func(stat *hafs.Stat) { 28 | if *verbose { 29 | log.Printf("removing inode %d", stat.Ino) 30 | } 31 | }, 32 | }) 33 | if err != nil { 34 | fmt.Fprintf(os.Stderr, "error removing unlinked inodes: %s\n", err) 35 | os.Exit(1) 36 | } 37 | 38 | log.Printf("removed %d unlinked inodes\n", nRemoved) 39 | } 40 | -------------------------------------------------------------------------------- /cmd/hafs-list-clients/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "os" 7 | "sort" 8 | "time" 9 | 10 | "github.com/andrewchambers/hafs" 11 | "github.com/andrewchambers/hafs/cli" 12 | "github.com/cheynewallace/tabby" 13 | ) 14 | 15 | func main() { 16 | cli.RegisterClusterFileFlag() 17 | cli.RegisterFsNameFlag() 18 | flag.Parse() 19 | db := cli.MustOpenDatabase() 20 | 21 | clients, err := hafs.ListClients(db, cli.FsName) 22 | if err != nil { 23 | fmt.Fprintf(os.Stderr, "error listing clients: %s\n", err) 24 | os.Exit(1) 25 | } 26 | 27 | sort.Slice(clients, func(i, j int) bool { return clients[i].AttachTimeUnix > clients[j].AttachTimeUnix }) 28 | 29 | t := tabby.New() 30 | t.AddHeader("ID", "DESCRIPTION", "HOSTNAME", "PID", "ATTACHED", "HEARTBEAT") 31 | for _, info := range clients { 32 | t.AddLine( 33 | info.Id, 34 | info.Description, 35 | info.Hostname, 36 | fmt.Sprintf("%d", info.Pid), 37 | time.Unix(int64(info.AttachTimeUnix), 0).Format(time.Stamp), 38 | time.Now().Sub(time.Unix(int64(info.HeartBeatUnix), 0)).Round(time.Second).String()+" ago", 39 | ) 40 | } 41 | t.Print() 42 | } 43 | -------------------------------------------------------------------------------- /cmd/hafs-list-filesystems/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "os" 7 | 8 | "github.com/andrewchambers/hafs" 9 | "github.com/andrewchambers/hafs/cli" 10 | ) 11 | 12 | func main() { 13 | cli.RegisterClusterFileFlag() 14 | flag.Parse() 15 | db := cli.MustOpenDatabase() 16 | filesystems, err := hafs.ListFilesystems(db) 17 | if err != nil { 18 | fmt.Fprintf(os.Stderr, "unable to list filesystems: %s\n", err) 19 | os.Exit(1) 20 | } 21 | for _, fs := range filesystems { 22 | _, _ = fmt.Println(fs) 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /cmd/hafs-mkfs/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "os" 7 | 8 | "github.com/andrewchambers/hafs" 9 | "github.com/andrewchambers/hafs/cli" 10 | ) 11 | 12 | func main() { 13 | overwrite := flag.Bool("overwrite", false, "Overwrite any existing filesystem (does not free storage objects).") 14 | cli.RegisterClusterFileFlag() 15 | cli.RegisterFsNameFlag() 16 | flag.Parse() 17 | db := cli.MustOpenDatabase() 18 | err := hafs.Mkfs(db, cli.FsName, hafs.MkfsOpts{ 19 | Overwrite: *overwrite, 20 | }) 21 | if err != nil { 22 | fmt.Fprintf(os.Stderr, "unable to create filesystem: %s\n", err) 23 | os.Exit(1) 24 | } 25 | } 26 | -------------------------------------------------------------------------------- /cmd/hafs-object-storage/main.go:
-------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "os" 7 | 8 | "github.com/andrewchambers/hafs" 9 | "github.com/andrewchambers/hafs/cli" 10 | ) 11 | 12 | func main() { 13 | unset := flag.Bool("unset", false, "Unset the object storage config, disabling object storage.") 14 | set := flag.String("set", "", "The object storage specification to set.") 15 | force := flag.Bool("force", false, "Force the update even when there are active clients using the old specification.") 16 | cli.RegisterClusterFileFlag() 17 | cli.RegisterFsNameFlag() 18 | flag.Parse() 19 | 20 | db := cli.MustOpenDatabase() 21 | 22 | if *set == "" && !*unset { 23 | objectStorageSpec, err := hafs.GetObjectStorageSpec(db, cli.FsName) 24 | if err != nil { 25 | fmt.Fprintf(os.Stderr, "unable to get object storage specification: %s\n", err) 26 | os.Exit(1) 27 | } 28 | _, _ = fmt.Printf("%s\n", objectStorageSpec) 29 | os.Exit(0) 30 | } 31 | 32 | if *unset { 33 | if *set != "" { 34 | fmt.Fprintf(os.Stderr, "unable to use -unset and -set at the same time.\n") 35 | os.Exit(1) 36 | } 37 | } 38 | 39 | err := hafs.SetObjectStorageSpec(db, cli.FsName, *set, hafs.SetObjectStorageSpecOpts{ 40 | Force: *force, 41 | }) 42 | if err != nil { 43 | fmt.Fprintf(os.Stderr, "unable to set object storage specification: %s\n", err) 44 | os.Exit(1) 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /cmd/hafs-rmfs/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "os" 7 | 8 | "github.com/andrewchambers/hafs" 9 | "github.com/andrewchambers/hafs/cli" 10 | ) 11 | 12 | func main() { 13 | force := flag.Bool("force", false, "Remove filesystem without regard for cleanup.") 14 | cli.RegisterClusterFileFlag() 15 | cli.RegisterFsNameFlag() 16 | flag.Parse() 17 | db := cli.MustOpenDatabase() 18 | ok, err := hafs.Rmfs(db, cli.FsName, hafs.RmfsOpts{ 19 | Force: *force, 20 | }) 21 | if err != nil { 22 | fmt.Fprintf(os.Stderr, "unable to remove filesystem: %s\n", err) 23 | os.Exit(1) 24 | } 25 | if !ok { 26 | os.Exit(2) 27 | } 28 | } 29 | -------------------------------------------------------------------------------- /fs.go: -------------------------------------------------------------------------------- 1 | package hafs 2 | 3 | import ( 4 | "bytes" 5 | "context" 6 | cryptorand "crypto/rand" 7 | "encoding/binary" 8 | "encoding/hex" 9 | "encoding/json" 10 | "errors" 11 | "fmt" 12 | "io" 13 | "log" 14 | "math/bits" 15 | "os" 16 | "strings" 17 | "sync" 18 | "time" 19 | 20 | "github.com/apple/foundationdb/bindings/go/src/fdb" 21 | "github.com/apple/foundationdb/bindings/go/src/fdb/tuple" 22 | "github.com/detailyang/fastrand-go" 23 | "github.com/valyala/fastjson" 24 | "golang.org/x/sync/errgroup" 25 | "golang.org/x/sys/unix" 26 | ) 27 | 28 | var ( 29 | ErrNotExist = unix.ENOENT 30 | ErrExist = unix.EEXIST 31 | ErrNotEmpty = unix.ENOTEMPTY 32 | ErrNotDir = unix.ENOTDIR 33 | ErrInvalid = unix.EINVAL 34 | ErrNotSupported = unix.ENOTSUP 35 | ErrPermission = unix.EPERM 36 | ErrIntr = unix.EINTR 37 | ErrNameTooLong = unix.ENAMETOOLONG 38 | ErrNotFormatted = errors.New("filesystem is not formatted") 39 | ErrDetached = errors.New("filesystem detached") 40 | ) 41 | 42 | const ( 43 | NAME_MAX = 4096 44 | CURRENT_SCHEMA_VERSION = 1 45 | ROOT_INO = 1 46 | CHUNK_SIZE = 4096 47 | ) 48 | 49 | const ( 50 | S_IFIFO uint32 = unix.S_IFIFO 51 | S_IFCHR uint32
= unix.S_IFCHR 52 | S_IFBLK uint32 = unix.S_IFBLK 53 | S_IFDIR uint32 = unix.S_IFDIR 54 | S_IFREG uint32 = unix.S_IFREG 55 | S_IFLNK uint32 = unix.S_IFLNK 56 | S_IFSOCK uint32 = unix.S_IFSOCK 57 | S_IFMT uint32 = unix.S_IFMT 58 | ) 59 | 60 | const ( 61 | FLAG_SUBVOLUME uint64 = 1 << iota 62 | FLAG_OBJECT_STORAGE 63 | ) 64 | 65 | type DirEnt struct { 66 | Name string 67 | Mode uint32 68 | Ino uint64 69 | } 70 | 71 | func (e *DirEnt) MarshalBinary() ([]byte, error) { 72 | if S_IFMT != 0xf000 { 73 | // This check should be removed by the compiler. 74 | panic("encoding assumption violated") 75 | } 76 | bufsz := 2 * binary.MaxVarintLen64 77 | buf := make([]byte, bufsz, bufsz) 78 | b := buf 79 | b = b[binary.PutUvarint(b, uint64(e.Mode)>>12):] 80 | b = b[binary.PutUvarint(b, e.Ino):] 81 | return buf[:len(buf)-len(b)], nil 82 | } 83 | 84 | func (e *DirEnt) UnmarshalBinary(buf []byte) error { 85 | r := bytes.NewReader(buf) 86 | mode, _ := binary.ReadUvarint(r) 87 | e.Mode = uint32(mode << 12) 88 | e.Ino, _ = binary.ReadUvarint(r) 89 | return nil 90 | } 91 | 92 | type Stat struct { 93 | Ino uint64 94 | Subvolume uint64 95 | Flags uint64 96 | Size uint64 97 | Atimesec uint64 98 | Mtimesec uint64 99 | Ctimesec uint64 100 | Atimensec uint32 101 | Mtimensec uint32 102 | Ctimensec uint32 103 | Mode uint32 104 | Nlink uint32 105 | Uid uint32 106 | Gid uint32 107 | Rdev uint32 108 | } 109 | 110 | func (s *Stat) MarshalBinary() ([]byte, error) { 111 | bufsz := 14 * binary.MaxVarintLen64 112 | buf := make([]byte, bufsz, bufsz) 113 | b := buf 114 | b = b[binary.PutUvarint(b, s.Subvolume):] 115 | b = b[binary.PutUvarint(b, s.Flags):] 116 | b = b[binary.PutUvarint(b, s.Size):] 117 | b = b[binary.PutUvarint(b, s.Atimesec):] 118 | b = b[binary.PutUvarint(b, s.Mtimesec):] 119 | b = b[binary.PutUvarint(b, s.Ctimesec):] 120 | b = b[binary.PutUvarint(b, uint64(s.Atimensec)):] 121 | b = b[binary.PutUvarint(b, uint64(s.Mtimensec)):] 122 | b = b[binary.PutUvarint(b, uint64(s.Ctimensec)):] 123 | b = b[binary.PutUvarint(b, uint64(s.Mode)):] 124 | b = b[binary.PutUvarint(b, uint64(s.Nlink)):] 125 | b = b[binary.PutUvarint(b, uint64(s.Uid)):] 126 | b = b[binary.PutUvarint(b, uint64(s.Gid)):] 127 | b = b[binary.PutUvarint(b, uint64(s.Rdev)):] 128 | return buf[:len(buf)-len(b)], nil 129 | } 130 | 131 | func (s *Stat) UnmarshalBinary(buf []byte) error { 132 | r := bytes.NewReader(buf) 133 | s.Subvolume, _ = binary.ReadUvarint(r) 134 | s.Flags, _ = binary.ReadUvarint(r) 135 | s.Size, _ = binary.ReadUvarint(r) 136 | s.Atimesec, _ = binary.ReadUvarint(r) 137 | s.Mtimesec, _ = binary.ReadUvarint(r) 138 | s.Ctimesec, _ = binary.ReadUvarint(r) 139 | v, _ := binary.ReadUvarint(r) 140 | s.Atimensec = uint32(v) 141 | v, _ = binary.ReadUvarint(r) 142 | s.Mtimensec = uint32(v) 143 | v, _ = binary.ReadUvarint(r) 144 | s.Ctimensec = uint32(v) 145 | v, _ = binary.ReadUvarint(r) 146 | s.Mode = uint32(v) 147 | v, _ = binary.ReadUvarint(r) 148 | s.Nlink = uint32(v) 149 | v, _ = binary.ReadUvarint(r) 150 | s.Uid = uint32(v) 151 | v, _ = binary.ReadUvarint(r) 152 | s.Gid = uint32(v) 153 | v, _ = binary.ReadUvarint(r) 154 | s.Rdev = uint32(v) 155 | return nil 156 | } 157 | 158 | func (stat *Stat) setTime(t time.Time, secs *uint64, nsecs *uint32) { 159 | *secs = uint64(t.UnixNano() / 1_000_000_000) 160 | *nsecs = uint32(t.UnixNano() % 1_000_000_000) 161 | } 162 | 163 | func (stat *Stat) SetMtime(t time.Time) { 164 | stat.setTime(t, &stat.Mtimesec, &stat.Mtimensec) 165 | } 166 | 167 | func (stat *Stat) SetAtime(t time.Time) { 168 | stat.setTime(t, 
&stat.Atimesec, &stat.Atimensec) 169 | } 170 | 171 | func (stat *Stat) SetCtime(t time.Time) { 172 | stat.setTime(t, &stat.Ctimesec, &stat.Ctimensec) 173 | } 174 | 175 | func (stat *Stat) Mtime() time.Time { 176 | return time.Unix(int64(stat.Mtimesec), int64(stat.Mtimensec)) 177 | } 178 | 179 | func (stat *Stat) Atime() time.Time { 180 | return time.Unix(int64(stat.Atimesec), int64(stat.Atimensec)) 181 | } 182 | 183 | func (stat *Stat) Ctime() time.Time { 184 | return time.Unix(int64(stat.Ctimesec), int64(stat.Ctimensec)) 185 | } 186 | 187 | type Fs struct { 188 | db fdb.Database 189 | fsName string 190 | clientId string 191 | onEviction func(fs *Fs) 192 | clientDetached atomicBool 193 | txCounter atomicUint64 194 | inoChan chan uint64 195 | relMtime time.Duration 196 | smallObjectThreshold uint64 197 | objectStorage ObjectStorageEngine 198 | 199 | workerWg *sync.WaitGroup 200 | cancelWorkers func() 201 | logf func(string, ...interface{}) 202 | } 203 | 204 | func ListFilesystems(db fdb.Database) ([]string, error) { 205 | 206 | filesystems := []string{} 207 | 208 | iterBegin, iterEnd := tuple.Tuple{"hafs"}.FDBRangeKeys() 209 | iterRange := fdb.KeyRange{ 210 | Begin: iterBegin, 211 | End: iterEnd, 212 | } 213 | 214 | v, err := db.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 215 | kvs := tx.GetRange(iterRange, fdb.RangeOptions{ 216 | Limit: 1, 217 | }).GetSliceOrPanic() 218 | return kvs, nil 219 | }) 220 | if err != nil { 221 | return nil, err 222 | } 223 | kvs := v.([]fdb.KeyValue) 224 | if len(kvs) == 0 { 225 | return filesystems, nil 226 | } 227 | 228 | curFsTup, err := tuple.Unpack(kvs[0].Key) 229 | if err != nil { 230 | return nil, err 231 | } 232 | 233 | filesystems = append(filesystems, curFsTup[1].(string)) 234 | 235 | for { 236 | _, iterRange.Begin = curFsTup[:2].FDBRangeKeys() 237 | 238 | v, err := db.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 239 | kvs := tx.GetRange(iterRange, fdb.RangeOptions{ 240 | Limit: 1, 241 | }).GetSliceOrPanic() 242 | return kvs, nil 243 | }) 244 | if err != nil { 245 | return nil, err 246 | } 247 | kvs := v.([]fdb.KeyValue) 248 | if len(kvs) == 0 { 249 | return filesystems, nil 250 | } 251 | 252 | curFsTup, err = tuple.Unpack(kvs[0].Key) 253 | if err != nil { 254 | return nil, err 255 | } 256 | 257 | filesystems = append(filesystems, curFsTup[1].(string)) 258 | } 259 | 260 | } 261 | 262 | type AttachOpts struct { 263 | ClientDescription string 264 | // External storage objects smaller than this are loaded into and 265 | // served from memory - this is purely an optimization. 
266 | SmallObjectOptimizationThreshold uint64 267 | OnEviction func(fs *Fs) 268 | Logf func(string, ...interface{}) 269 | RelMtime *time.Duration 270 | } 271 | 272 | func Attach(db fdb.Database, fsName string, opts AttachOpts) (*Fs, error) { 273 | 274 | if opts.RelMtime == nil { 275 | defaultRelMtime := 24 * time.Hour 276 | opts.RelMtime = &defaultRelMtime 277 | } 278 | 279 | if opts.Logf == nil { 280 | opts.Logf = log.Printf 281 | } 282 | 283 | hostname, _ := os.Hostname() 284 | exe, _ := os.Executable() 285 | 286 | if opts.ClientDescription == "" { 287 | if idx := strings.LastIndex(exe, "/"); idx != -1 { 288 | opts.ClientDescription = exe[idx+1:] 289 | } else { 290 | opts.ClientDescription = exe 291 | } 292 | } 293 | 294 | if opts.OnEviction == nil { 295 | opts.OnEviction = func(fs *Fs) {} 296 | } 297 | 298 | idBytes := [16]byte{} 299 | _, err := cryptorand.Read(idBytes[:]) 300 | if err != nil { 301 | return nil, err 302 | } 303 | clientId := hex.EncodeToString(idBytes[:]) 304 | 305 | now := time.Now() 306 | 307 | clientInfo := ClientInfo{ 308 | Pid: int64(os.Getpid()), 309 | Exe: exe, 310 | Description: opts.ClientDescription, 311 | Hostname: hostname, 312 | AttachTimeUnix: uint64(now.Unix()), 313 | } 314 | 315 | clientInfoBytes, err := json.Marshal(&clientInfo) 316 | if err != nil { 317 | return nil, err 318 | } 319 | 320 | initialHeartBeatBytes := [8]byte{} 321 | binary.LittleEndian.PutUint64(initialHeartBeatBytes[:], uint64(now.Unix())) 322 | objectStorageSpec := "" 323 | 324 | _, err = db.Transact(func(tx fdb.Transaction) (interface{}, error) { 325 | version := tx.Get(tuple.Tuple{"hafs", fsName, "version"}).MustGet() 326 | if version == nil { 327 | return nil, ErrNotFormatted 328 | } 329 | if !bytes.Equal(version, []byte{CURRENT_SCHEMA_VERSION}) { 330 | return nil, fmt.Errorf("filesystem has different version - expected %d but got %d", CURRENT_SCHEMA_VERSION, version[0]) 331 | } 332 | 333 | objectStorageSpec = string(tx.Get(tuple.Tuple{"hafs", fsName, "object-storage"}).MustGet()) 334 | 335 | tx.Set(tuple.Tuple{"hafs", fsName, "client", clientId, "info"}, clientInfoBytes) 336 | tx.Set(tuple.Tuple{"hafs", fsName, "client", clientId, "heartbeat"}, initialHeartBeatBytes[:]) 337 | tx.Set(tuple.Tuple{"hafs", fsName, "client", clientId, "attached"}, []byte{}) 338 | tx.Set(tuple.Tuple{"hafs", fsName, "clients", clientId}, []byte{}) 339 | return nil, nil 340 | }) 341 | if err != nil { 342 | return nil, fmt.Errorf("unable to add mount: %w", err) 343 | } 344 | 345 | objectStorage, err := NewObjectStorageEngine(objectStorageSpec) 346 | if err != nil { 347 | return nil, fmt.Errorf("unable to initialize the object storage engine: %w", err) 348 | } 349 | 350 | workerCtx, cancelWorkers := context.WithCancel(context.Background()) 351 | 352 | fs := &Fs{ 353 | db: db, 354 | fsName: fsName, 355 | onEviction: opts.OnEviction, 356 | logf: opts.Logf, 357 | relMtime: *opts.RelMtime, 358 | clientId: clientId, 359 | cancelWorkers: cancelWorkers, 360 | objectStorage: objectStorage, 361 | smallObjectThreshold: opts.SmallObjectOptimizationThreshold, 362 | workerWg: &sync.WaitGroup{}, 363 | inoChan: make(chan uint64, _INO_CHAN_SIZE), 364 | } 365 | 366 | fs.workerWg.Add(1) 367 | go func() { 368 | defer fs.workerWg.Done() 369 | fs.requestInosForever(workerCtx) 370 | }() 371 | 372 | fs.workerWg.Add(1) 373 | go func() { 374 | defer fs.workerWg.Done() 375 | fs.mountHeartBeatForever(workerCtx) 376 | }() 377 | 378 | return fs, nil 379 | } 380 | 381 | func reservedIno(ino uint64) bool { 382 | // XXX Why is this 4 
bytes? 383 | const FUSE_UNKNOWN_INO = 0xFFFFFFFF 384 | // XXX We currently reserve this inode too pending an answer to https://github.com/hanwen/go-fuse/issues/439. 385 | const RESERVED_INO_1 = 0xFFFFFFFFFFFFFFFF 386 | return ino == FUSE_UNKNOWN_INO || ino == RESERVED_INO_1 || ino == ROOT_INO || ino == 0 387 | } 388 | 389 | func (fs *Fs) nextIno() (uint64, error) { 390 | // XXX This is a single bottleneck, could we shard or use atomics? 391 | for { 392 | ino, ok := <-fs.inoChan 393 | if !ok { 394 | return 0, ErrDetached 395 | } 396 | if !reservedIno(ino) { 397 | return ino, nil 398 | } 399 | } 400 | } 401 | 402 | // We try to allocate inodes in an order that helps foundationDB distribute 403 | // writes. We do this by stepping over the key space in fairly large 404 | // chunks and allocating inodes in their reverse bit order; this hopefully 405 | // lets inodes be written to servers in a nice spread over the keyspace. 406 | const ( 407 | _INO_STEP = 100271 // Large prime, chosen so that ~100 bytes per stat * _INO_STEP covers a foundationdb range quickly to distribute load. 408 | _INO_CHAN_SIZE = _INO_STEP - 1 // The channel is big enough so the first step blocks until one is taken. 409 | ) 410 | 411 | func (fs *Fs) requestInosForever(ctx context.Context) { 412 | // Start with a small batch size, many clients never allocate an inode. 413 | inoBatchSize := uint64(_INO_STEP) 414 | inoCounterKey := tuple.Tuple{"hafs", fs.fsName, "inocntr"} 415 | defer close(fs.inoChan) 416 | for { 417 | v, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 418 | inoCounterBytes := tx.Get(inoCounterKey).MustGet() 419 | if len(inoCounterBytes) != 8 { 420 | panic("corrupt inode counter") 421 | } 422 | currentCount := binary.LittleEndian.Uint64(inoCounterBytes) 423 | if (currentCount % _INO_STEP) != 0 { 424 | // Realign the count with _INO_STEP if it has become out of sync. 425 | currentCount += _INO_STEP - (currentCount % _INO_STEP) 426 | } 427 | nextInoCount := currentCount + inoBatchSize 428 | if nextInoCount >= 0x7FFFFFFFFFFFFFFF { 429 | // Avoid overflow and other strange cases. 430 | panic("inodes exhausted") 431 | } 432 | binary.LittleEndian.PutUint64(inoCounterBytes, nextInoCount) 433 | tx.Set(inoCounterKey, inoCounterBytes) 434 | return currentCount, nil 435 | }) 436 | if err != nil { 437 | if errors.Is(err, ErrDetached) { 438 | return 439 | } 440 | fs.logf("unable to allocate inode batch: %s", err) 441 | time.Sleep(1 * time.Second) 442 | continue 443 | } 444 | inoBatchStart := v.(uint64) 445 | 446 | for i := uint64(0); i < _INO_STEP; i++ { 447 | for j := inoBatchStart; j < inoBatchStart+inoBatchSize; j += _INO_STEP { 448 | // Reverse the bits of the counter for better write load balancing. 449 | ino := bits.Reverse64(j + i) 450 | select { 451 | case fs.inoChan <- ino: 452 | default: 453 | select { 454 | case fs.inoChan <- ino: 455 | case <-ctx.Done(): 456 | return 457 | } 458 | } 459 | } 460 | } 461 | 462 | // Ramp up batch sizes.
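// Doubling the batch (up to a cap) means clients that allocate many inodes touch the shared "inocntr" key less often.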
463 | inoBatchSize *= 2 464 | const MAX_INO_BATCH = _INO_STEP * 64 465 | if inoBatchSize >= MAX_INO_BATCH { 466 | inoBatchSize = MAX_INO_BATCH 467 | } 468 | } 469 | } 470 | 471 | func (fs *Fs) mountHeartBeat() error { 472 | _, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 473 | heartBeatKey := tuple.Tuple{"hafs", fs.fsName, "client", fs.clientId, "heartbeat"} 474 | lastSeenBytes := [8]byte{} 475 | binary.LittleEndian.PutUint64(lastSeenBytes[:], uint64(time.Now().Unix())) 476 | tx.Set(heartBeatKey, lastSeenBytes[:]) 477 | return nil, nil 478 | }) 479 | return err 480 | } 481 | 482 | func (fs *Fs) mountHeartBeatForever(ctx context.Context) { 483 | ticker := time.NewTicker(5 * time.Minute) 484 | defer ticker.Stop() 485 | for { 486 | select { 487 | case <-ticker.C: 488 | err := fs.mountHeartBeat() 489 | if errors.Is(err, ErrDetached) { 490 | // Must be done in new goroutine to prevent deadlock. 491 | go fs.onEviction(fs) 492 | return 493 | } 494 | case <-ctx.Done(): 495 | return 496 | } 497 | } 498 | } 499 | 500 | func (fs *Fs) IsDetached() bool { 501 | return fs.clientDetached.Load() 502 | } 503 | 504 | func (fs *Fs) Close() error { 505 | fs.cancelWorkers() 506 | fs.workerWg.Wait() 507 | err := EvictClient(fs.db, fs.fsName, fs.clientId) 508 | return err 509 | } 510 | 511 | func (fs *Fs) ReadTransact(f func(tx fdb.ReadTransaction) (interface{}, error)) (interface{}, error) { 512 | attachKey := tuple.Tuple{"hafs", fs.fsName, "client", fs.clientId, "attached"} 513 | fWrapped := func(tx fdb.ReadTransaction) (interface{}, error) { 514 | attachCheck := tx.Get(attachKey) 515 | v, err := f(tx) 516 | if attachCheck.MustGet() == nil { 517 | return v, ErrDetached 518 | } 519 | return v, err 520 | } 521 | return fs.db.ReadTransact(fWrapped) 522 | } 523 | 524 | func (fs *Fs) Transact(f func(tx fdb.Transaction) (interface{}, error)) (interface{}, error) { 525 | attachKey := tuple.Tuple{"hafs", fs.fsName, "client", fs.clientId, "attached"} 526 | fWrapped := func(tx fdb.Transaction) (interface{}, error) { 527 | attachCheck := tx.Get(attachKey) 528 | v, err := f(tx) 529 | if attachCheck.MustGet() == nil { 530 | return v, ErrDetached 531 | } 532 | return v, err 533 | } 534 | return fs.db.Transact(fWrapped) 535 | } 536 | 537 | type futureStat struct { 538 | ino uint64 539 | bytes fdb.FutureByteSlice 540 | } 541 | 542 | func (fut futureStat) Get() (Stat, error) { 543 | stat := Stat{} 544 | statBytes := fut.bytes.MustGet() 545 | if statBytes == nil { 546 | return stat, ErrNotExist 547 | } 548 | err := stat.UnmarshalBinary(statBytes) 549 | if err != nil { 550 | return stat, err 551 | } 552 | stat.Ino = fut.ino 553 | return stat, nil 554 | } 555 | 556 | func (fs *Fs) txGetStat(tx fdb.ReadTransaction, ino uint64) futureStat { 557 | return futureStat{ 558 | ino: ino, 559 | bytes: tx.Get(tuple.Tuple{"hafs", fs.fsName, "ino", ino, "stat"}), 560 | } 561 | } 562 | 563 | func (fs *Fs) txSetStat(tx fdb.Transaction, stat Stat) { 564 | statBytes, err := stat.MarshalBinary() 565 | if err != nil { 566 | panic(err) 567 | } 568 | tx.Set(tuple.Tuple{"hafs", fs.fsName, "ino", stat.Ino, "stat"}, statBytes) 569 | } 570 | 571 | type futureGetDirEnt struct { 572 | name string 573 | bytes fdb.FutureByteSlice 574 | } 575 | 576 | func (fut futureGetDirEnt) Get() (DirEnt, error) { 577 | dirEntBytes := fut.bytes.MustGet() 578 | if dirEntBytes == nil { 579 | return DirEnt{}, ErrNotExist 580 | } 581 | dirEnt := DirEnt{} 582 | err := dirEnt.UnmarshalBinary(dirEntBytes) 583 | dirEnt.Name = fut.name 584 | return dirEnt, err 
585 | } 586 | 587 | func (fs *Fs) txGetDirEnt(tx fdb.ReadTransaction, dirIno uint64, name string) futureGetDirEnt { 588 | return futureGetDirEnt{ 589 | name: name, 590 | bytes: tx.Get(tuple.Tuple{"hafs", fs.fsName, "ino", dirIno, "child", name}), 591 | } 592 | } 593 | 594 | func (fs *Fs) txSetDirEnt(tx fdb.Transaction, dirIno uint64, ent DirEnt) { 595 | dirEntBytes, err := ent.MarshalBinary() 596 | if err != nil { 597 | panic(err) 598 | } 599 | tx.Set(tuple.Tuple{"hafs", fs.fsName, "ino", dirIno, "child", ent.Name}, dirEntBytes) 600 | } 601 | 602 | func (fs *Fs) GetDirEnt(dirIno uint64, name string) (DirEnt, error) { 603 | dirEnt, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 604 | dirEnt, err := fs.txGetDirEnt(tx, dirIno, name).Get() 605 | return dirEnt, err 606 | }) 607 | if err != nil { 608 | return DirEnt{}, err 609 | } 610 | return dirEnt.(DirEnt), nil 611 | } 612 | 613 | func (fs *Fs) txDirHasChildren(tx fdb.ReadTransaction, dirIno uint64) bool { 614 | kvs := tx.GetRange(tuple.Tuple{"hafs", fs.fsName, "ino", dirIno, "child"}, fdb.RangeOptions{ 615 | Limit: 1, 616 | }).GetSliceOrPanic() 617 | return len(kvs) != 0 618 | } 619 | 620 | type MknodOpts struct { 621 | Truncate bool 622 | Mode uint32 623 | Uid uint32 624 | Gid uint32 625 | Rdev uint32 626 | LinkTarget []byte 627 | } 628 | 629 | func (fs *Fs) txMknod(tx fdb.Transaction, dirIno uint64, name string, opts MknodOpts) (Stat, error) { 630 | 631 | if len(name) > NAME_MAX { 632 | return Stat{}, ErrNameTooLong 633 | } 634 | 635 | dirStatFut := fs.txGetStat(tx, dirIno) 636 | getDirEntFut := fs.txGetDirEnt(tx, dirIno, name) 637 | 638 | dirStat, err := dirStatFut.Get() 639 | if err != nil { 640 | return Stat{}, err 641 | } 642 | 643 | if dirStat.Nlink == 0 { 644 | return Stat{}, ErrNotExist 645 | } 646 | 647 | if dirStat.Mode&S_IFMT != S_IFDIR { 648 | return Stat{}, ErrNotDir 649 | } 650 | 651 | var stat Stat 652 | 653 | existingDirEnt, err := getDirEntFut.Get() 654 | if err == nil { 655 | if !opts.Truncate { 656 | return Stat{}, ErrExist 657 | } 658 | if existingDirEnt.Mode&S_IFMT != S_IFREG { 659 | return Stat{}, ErrInvalid 660 | } 661 | stat, err = fs.txModStat(tx, existingDirEnt.Ino, ModStatOpts{Valid: MODSTAT_SIZE, Size: 0}) 662 | if err != nil { 663 | return Stat{}, err 664 | } 665 | } else if err != ErrNotExist { 666 | return Stat{}, err 667 | } else { 668 | newIno, err := fs.nextIno() 669 | if err != nil { 670 | return Stat{}, err 671 | } 672 | stat = Stat{ 673 | Ino: newIno, 674 | Subvolume: dirStat.Subvolume, 675 | Flags: 0, 676 | Size: 0, 677 | Atimesec: 0, 678 | Mtimesec: 0, 679 | Ctimesec: 0, 680 | Atimensec: 0, 681 | Mtimensec: 0, 682 | Ctimensec: 0, 683 | Mode: opts.Mode, 684 | Nlink: 1, 685 | Uid: opts.Uid, 686 | Gid: opts.Gid, 687 | Rdev: opts.Rdev, 688 | } 689 | 690 | if dirStat.Flags&FLAG_SUBVOLUME != 0 { 691 | stat.Subvolume = dirStat.Ino 692 | } 693 | 694 | if opts.Mode&S_IFMT == S_IFREG { 695 | // Only files inherit storage from the parent directory. 
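// Subdirectories do not inherit FLAG_OBJECT_STORAGE, so nested directories
// each need the "hafs.object-storage" xattr set explicitly (see SetXAttr).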
696 | stat.Flags |= dirStat.Flags & FLAG_OBJECT_STORAGE 697 | } 698 | 699 | fs.txSubvolumeInodeDelta(tx, stat.Subvolume, 1) 700 | } 701 | 702 | now := time.Now() 703 | stat.SetMtime(now) 704 | stat.SetCtime(now) 705 | stat.SetAtime(now) 706 | fs.txSetStat(tx, stat) 707 | fs.txSetDirEnt(tx, dirIno, DirEnt{ 708 | Name: name, 709 | Mode: stat.Mode & S_IFMT, 710 | Ino: stat.Ino, 711 | }) 712 | 713 | if dirStat.Mtime().Before(now.Add(-fs.relMtime)) { 714 | dirStat.SetMtime(now) 715 | dirStat.SetAtime(now) 716 | fs.txSetStat(tx, dirStat) 717 | } 718 | 719 | if stat.Mode&S_IFMT == S_IFLNK { 720 | tx.Set(tuple.Tuple{"hafs", fs.fsName, "ino", stat.Ino, "target"}, opts.LinkTarget) 721 | } 722 | 723 | return stat, nil 724 | } 725 | 726 | func (fs *Fs) Mknod(dirIno uint64, name string, opts MknodOpts) (Stat, error) { 727 | stat, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 728 | stat, err := fs.txMknod(tx, dirIno, name, opts) 729 | return stat, err 730 | }) 731 | if err != nil { 732 | return Stat{}, err 733 | } 734 | return stat.(Stat), nil 735 | } 736 | 737 | func (fs *Fs) HardLink(dirIno, ino uint64, name string) (Stat, error) { 738 | stat, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 739 | 740 | dirStatFut := fs.txGetStat(tx, dirIno) 741 | dirEntFut := fs.txGetDirEnt(tx, dirIno, name) 742 | statFut := fs.txGetStat(tx, ino) 743 | 744 | dirStat, err := dirStatFut.Get() 745 | if err != nil { 746 | return Stat{}, err 747 | } 748 | if dirStat.Mode&S_IFMT != S_IFDIR { 749 | return Stat{}, ErrNotDir 750 | } 751 | 752 | stat, err := statFut.Get() 753 | if err != nil { 754 | return Stat{}, err 755 | } 756 | // Can't hardlink directories. 757 | if stat.Mode&S_IFMT == S_IFDIR { 758 | return Stat{}, ErrPermission 759 | } 760 | 761 | if stat.Nlink == 0 { 762 | // Don't resurrect inodes. 763 | return Stat{}, ErrInvalid 764 | } 765 | 766 | _, err = dirEntFut.Get() 767 | if err == nil { 768 | return Stat{}, ErrExist 769 | } 770 | if err != ErrNotExist { 771 | return Stat{}, err 772 | } 773 | 774 | if dirStat.Subvolume != stat.Subvolume { 775 | if dirStat.Flags&FLAG_SUBVOLUME == 0 { 776 | return Stat{}, ErrInvalid 777 | } 778 | if dirStat.Ino != stat.Subvolume { 779 | return Stat{}, ErrInvalid 780 | } 781 | } 782 | 783 | now := time.Now() 784 | 785 | stat.SetAtime(now) 786 | stat.SetCtime(now) 787 | 788 | stat.Nlink += 1 789 | if stat.Nlink == 0 { 790 | // Nlink overflow. 
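// After the increment, a zero link count can only mean the counter wrapped;
// refuse the link rather than corrupting the count.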
791 | return Stat{}, ErrInvalid 792 | } 793 | 794 | fs.txSetStat(tx, stat) 795 | 796 | fs.txSetDirEnt(tx, dirIno, DirEnt{ 797 | Name: name, 798 | Mode: stat.Mode & S_IFMT, 799 | Ino: stat.Ino, 800 | }) 801 | 802 | if dirStat.Mtime().Before(now.Add(-fs.relMtime)) { 803 | dirStat.SetMtime(now) 804 | dirStat.SetAtime(now) 805 | fs.txSetStat(tx, dirStat) 806 | } 807 | 808 | return stat, nil 809 | }) 810 | if err != nil { 811 | return Stat{}, err 812 | } 813 | return stat.(Stat), nil 814 | } 815 | 816 | func (fs *Fs) txSubvolumeCountDelta(tx fdb.Transaction, subvolume uint64, counter string, delta int64) { 817 | if delta == 0 { 818 | return 819 | } 820 | const COUNTER_SHARDS = 16 821 | counterShardIdx := uint64(fastrand.FastRand() % COUNTER_SHARDS) 822 | deltaBytes := [8]byte{} 823 | binary.LittleEndian.PutUint64(deltaBytes[:], uint64(delta)) 824 | tx.Add(tuple.Tuple{"hafs", fs.fsName, "ino", subvolume, counter, counterShardIdx}, deltaBytes[:]) 825 | } 826 | 827 | func (fs *Fs) txSubvolumeByteDelta(tx fdb.Transaction, subvolume uint64, delta int64) { 828 | fs.txSubvolumeCountDelta(tx, subvolume, "bcnt", delta) 829 | } 830 | 831 | func (fs *Fs) txSubvolumeInodeDelta(tx fdb.Transaction, subvolume uint64, delta int64) { 832 | fs.txSubvolumeCountDelta(tx, subvolume, "icnt", delta) 833 | } 834 | 835 | func (fs *Fs) txSubvolumeCount(tx fdb.ReadTransaction, subvolume uint64, counter string) (uint64, error) { 836 | kvs := tx.GetRange(tuple.Tuple{"hafs", fs.fsName, "ino", subvolume, counter}, fdb.RangeOptions{ 837 | Limit: 512, // Should be large enough to cover all counter shards. 838 | }).GetSliceOrPanic() 839 | v := int64(0) 840 | for _, kv := range kvs { 841 | if len(kv.Value) != 8 { 842 | return 0, errors.New("unexpected overflow or invalid counter value") 843 | } 844 | v += int64(binary.LittleEndian.Uint64(kv.Value)) 845 | } 846 | if v < 0 { 847 | // Underflow - this must be caused by a bug in our accounting. 
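// Clamp to zero so the uint64 conversion below never turns a small negative
// drift into an enormous reported count.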
848 | v = 0 849 | } 850 | return uint64(v), nil 851 | } 852 | 853 | func (fs *Fs) txSubvolumeByteCount(tx fdb.ReadTransaction, subvolume uint64) (uint64, error) { 854 | return fs.txSubvolumeCount(tx, subvolume, "bcnt") 855 | } 856 | 857 | func (fs *Fs) txSubvolumeInodeCount(tx fdb.ReadTransaction, subvolume uint64) (uint64, error) { 858 | return fs.txSubvolumeCount(tx, subvolume, "icnt") 859 | } 860 | 861 | func (fs *Fs) SubvolumeByteCount(subvolume uint64) (uint64, error) { 862 | v, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 863 | v, err := fs.txSubvolumeByteCount(tx, subvolume) 864 | return v, err 865 | }) 866 | if err != nil { 867 | return 0, err 868 | } 869 | return v.(uint64), nil 870 | } 871 | 872 | func (fs *Fs) SubvolumeInodeCount(subvolume uint64) (uint64, error) { 873 | v, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 874 | v, err := fs.txSubvolumeInodeCount(tx, subvolume) 875 | return v, err 876 | }) 877 | if err != nil { 878 | return 0, err 879 | } 880 | return v.(uint64), err 881 | } 882 | 883 | func (fs *Fs) Unlink(dirIno uint64, name string) error { 884 | _, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 885 | dirStatFut := fs.txGetStat(tx, dirIno) 886 | 887 | dirEnt, err := fs.txGetDirEnt(tx, dirIno, name).Get() 888 | if err != nil { 889 | return nil, err 890 | } 891 | stat, err := fs.txGetStat(tx, dirEnt.Ino).Get() 892 | if err != nil { 893 | return nil, err 894 | } 895 | 896 | dirStat, err := dirStatFut.Get() 897 | if err != nil { 898 | return nil, err 899 | } 900 | 901 | if dirEnt.Mode&S_IFMT == S_IFDIR { 902 | if fs.txDirHasChildren(tx, stat.Ino) { 903 | return nil, ErrNotEmpty 904 | } 905 | } 906 | 907 | now := time.Now() 908 | dirStat.SetMtime(now) 909 | dirStat.SetCtime(now) 910 | fs.txSetStat(tx, dirStat) 911 | stat.Nlink -= 1 912 | stat.SetMtime(now) 913 | stat.SetCtime(now) 914 | fs.txSetStat(tx, stat) 915 | if stat.Nlink == 0 { 916 | fs.txSubvolumeByteDelta(tx, stat.Subvolume, -int64(stat.Size)) 917 | fs.txSubvolumeInodeDelta(tx, stat.Subvolume, -1) 918 | tx.Set(tuple.Tuple{"hafs", fs.fsName, "unlinked", dirEnt.Ino}, []byte{}) 919 | } 920 | tx.Clear(tuple.Tuple{"hafs", fs.fsName, "ino", dirIno, "child", name}) 921 | return nil, nil 922 | }) 923 | return err 924 | } 925 | 926 | type HafsFile interface { 927 | WriteData([]byte, uint64) (uint32, error) 928 | ReadData([]byte, uint64) (uint32, error) 929 | Fsync() error 930 | Close() error 931 | } 932 | 933 | func zeroTrimChunk(chunk []byte) []byte { 934 | i := len(chunk) - 1 935 | for ; i >= 0; i-- { 936 | if chunk[i] != 0 { 937 | break 938 | } 939 | } 940 | return chunk[:i+1] 941 | } 942 | 943 | var _zeroChunk [CHUNK_SIZE]byte 944 | 945 | func zeroExpandChunk(chunk *[]byte) { 946 | *chunk = append(*chunk, _zeroChunk[len(*chunk):CHUNK_SIZE]...) 947 | } 948 | 949 | type foundationDBFile struct { 950 | fs *Fs 951 | ino uint64 952 | } 953 | 954 | func (f *foundationDBFile) WriteData(buf []byte, offset uint64) (uint32, error) { 955 | const MAX_WRITE = 32 * CHUNK_SIZE 956 | 957 | // FoundationDB has a transaction time limit and a transaction size limit, 958 | // limit the write to something that can fit. 959 | if len(buf) > MAX_WRITE { 960 | buf = buf[:MAX_WRITE] 961 | } 962 | 963 | nWritten, err := f.fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 964 | 965 | futureStat := f.fs.txGetStat(tx, f.ino) 966 | currentOffset := offset 967 | remainingBuf := buf 968 | 969 | // Deal with the first unaligned and undersized chunks. 
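// Worked example (assuming CHUNK_SIZE is 4096, purely for illustration):
// writing 100 bytes at offset 4000 lands in chunk 0 at in-chunk offset 4000,
// so only 96 bytes fit here; the remaining 4 bytes fall through to the chunk
// loop below. The existing chunk is read, patched in place, and written back
// zero-trimmed.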
970 | if currentOffset%CHUNK_SIZE != 0 || len(remainingBuf) < CHUNK_SIZE { 971 | firstChunkNo := currentOffset / CHUNK_SIZE 972 | firstChunkOffset := currentOffset % CHUNK_SIZE 973 | firstWriteCount := CHUNK_SIZE - firstChunkOffset 974 | if firstWriteCount > uint64(len(buf)) { 975 | firstWriteCount = uint64(len(buf)) 976 | } 977 | firstChunkKey := tuple.Tuple{"hafs", f.fs.fsName, "ino", f.ino, "data", firstChunkNo} 978 | chunk := tx.Get(firstChunkKey).MustGet() 979 | zeroExpandChunk(&chunk) 980 | copy(chunk[firstChunkOffset:firstChunkOffset+firstWriteCount], remainingBuf) 981 | currentOffset += firstWriteCount 982 | remainingBuf = remainingBuf[firstWriteCount:] 983 | tx.Set(firstChunkKey, zeroTrimChunk(chunk)) 984 | } 985 | 986 | for { 987 | key := tuple.Tuple{"hafs", f.fs.fsName, "ino", f.ino, "data", currentOffset / CHUNK_SIZE} 988 | if len(remainingBuf) >= CHUNK_SIZE { 989 | tx.Set(key, zeroTrimChunk(remainingBuf[:CHUNK_SIZE])) 990 | currentOffset += CHUNK_SIZE 991 | remainingBuf = remainingBuf[CHUNK_SIZE:] 992 | } else { 993 | chunk := tx.Get(key).MustGet() 994 | zeroExpandChunk(&chunk) 995 | copy(chunk, remainingBuf) 996 | tx.Set(key, zeroTrimChunk(chunk)) 997 | currentOffset += uint64(len(remainingBuf)) 998 | break 999 | } 1000 | } 1001 | 1002 | stat, err := futureStat.Get() 1003 | if err != nil { 1004 | return nil, err 1005 | } 1006 | 1007 | if stat.Mode&S_IFMT != S_IFREG { 1008 | return nil, ErrInvalid 1009 | } 1010 | 1011 | nWritten := currentOffset - offset 1012 | 1013 | if stat.Size < offset+nWritten { 1014 | newSize := offset + nWritten 1015 | if stat.Nlink != 0 { 1016 | f.fs.txSubvolumeByteDelta(tx, stat.Subvolume, int64(newSize)-int64(stat.Size)) 1017 | } 1018 | stat.Size = newSize 1019 | } 1020 | stat.SetMtime(time.Now()) 1021 | f.fs.txSetStat(tx, stat) 1022 | return uint32(nWritten), nil 1023 | }) 1024 | if err != nil { 1025 | return 0, err 1026 | } 1027 | return nWritten.(uint32), nil 1028 | } 1029 | func (f *foundationDBFile) ReadData(buf []byte, offset uint64) (uint32, error) { 1030 | 1031 | const MAX_READ = 32 * CHUNK_SIZE 1032 | 1033 | if len(buf) > MAX_READ { 1034 | buf = buf[:MAX_READ] 1035 | } 1036 | 1037 | nRead, err := f.fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 1038 | currentOffset := offset 1039 | remainingBuf := buf 1040 | 1041 | stat, err := f.fs.txGetStat(tx, f.ino).Get() 1042 | if err != nil { 1043 | return nil, err 1044 | } 1045 | 1046 | if stat.Mode&S_IFMT != S_IFREG { 1047 | return nil, ErrInvalid 1048 | } 1049 | 1050 | // Don't read past the end of the file. 1051 | if stat.Size < currentOffset+uint64(len(remainingBuf)) { 1052 | overshoot := (currentOffset + uint64(len(remainingBuf))) - stat.Size 1053 | if overshoot >= uint64(len(remainingBuf)) { 1054 | return 0, io.EOF 1055 | } 1056 | remainingBuf = remainingBuf[:uint64(len(remainingBuf))-overshoot] 1057 | } 1058 | 1059 | // Deal with the first unaligned and undersized chunk. 
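// Mirror of the write path: e.g. a 100 byte read at offset 4000 (again
// assuming CHUNK_SIZE is 4096 for illustration) takes 96 bytes from the tail
// of chunk 0 here and the rest from the batched chunk reads below; a missing
// chunk key is a sparse region and reads back as zeros.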
1060 | if currentOffset%CHUNK_SIZE != 0 || len(remainingBuf) < CHUNK_SIZE { 1061 | 1062 | firstChunkNo := currentOffset / CHUNK_SIZE 1063 | firstChunkOffset := currentOffset % CHUNK_SIZE 1064 | firstReadCount := CHUNK_SIZE - firstChunkOffset 1065 | if firstReadCount > uint64(len(remainingBuf)) { 1066 | firstReadCount = uint64(len(remainingBuf)) 1067 | } 1068 | 1069 | firstChunkKey := tuple.Tuple{"hafs", f.fs.fsName, "ino", f.ino, "data", firstChunkNo} 1070 | chunk := tx.Get(firstChunkKey).MustGet() 1071 | if chunk != nil { 1072 | zeroExpandChunk(&chunk) 1073 | copy(remainingBuf[:firstReadCount], chunk[firstChunkOffset:firstChunkOffset+firstReadCount]) 1074 | } else { 1075 | // Sparse read. 1076 | for i := uint64(0); i < firstReadCount; i += 1 { 1077 | remainingBuf[i] = 0 1078 | } 1079 | } 1080 | remainingBuf = remainingBuf[firstReadCount:] 1081 | currentOffset += firstReadCount 1082 | } 1083 | 1084 | nChunks := uint64(len(remainingBuf)) / CHUNK_SIZE 1085 | if (len(remainingBuf) % CHUNK_SIZE) != 0 { 1086 | nChunks += 1 1087 | } 1088 | chunkFutures := make([]fdb.FutureByteSlice, 0, nChunks) 1089 | 1090 | // Read all chunks in parallel using futures. 1091 | for i := uint64(0); i < nChunks; i++ { 1092 | key := tuple.Tuple{"hafs", f.fs.fsName, "ino", f.ino, "data", (currentOffset / CHUNK_SIZE) + i} 1093 | chunkFutures = append(chunkFutures, tx.Get(key)) 1094 | } 1095 | 1096 | for i := uint64(0); i < nChunks; i++ { 1097 | chunk := chunkFutures[i].MustGet() 1098 | zeroExpandChunk(&chunk) 1099 | n := copy(remainingBuf, chunk) 1100 | currentOffset += uint64(n) 1101 | remainingBuf = remainingBuf[n:] 1102 | } 1103 | 1104 | nRead := currentOffset - offset 1105 | 1106 | if (offset + nRead) == stat.Size { 1107 | return uint32(nRead), io.EOF 1108 | } 1109 | 1110 | return uint32(nRead), nil 1111 | }) 1112 | nReadInt, ok := nRead.(uint32) 1113 | if ok { 1114 | return nReadInt, err 1115 | } else { 1116 | return 0, err 1117 | } 1118 | } 1119 | func (f *foundationDBFile) Fsync() error { return nil } 1120 | func (f *foundationDBFile) Close() error { return nil } 1121 | 1122 | type invalidFile struct{} 1123 | 1124 | func (f *invalidFile) WriteData(buf []byte, offset uint64) (uint32, error) { return 0, ErrInvalid } 1125 | func (f *invalidFile) ReadData(buf []byte, offset uint64) (uint32, error) { return 0, ErrInvalid } 1126 | func (f *invalidFile) Fsync() error { return ErrInvalid } 1127 | func (f *invalidFile) Close() error { return nil } 1128 | 1129 | type zeroFile struct { 1130 | size uint64 1131 | } 1132 | 1133 | func (f *zeroFile) WriteData(buf []byte, offset uint64) (uint32, error) { return 0, ErrNotSupported } 1134 | func (f *zeroFile) ReadData(buf []byte, offset uint64) (uint32, error) { 1135 | n := uint32(0) 1136 | for i := uint64(0); i < uint64(len(buf)) && offset+i < f.size; i++ { 1137 | buf[i] = 0 1138 | n += 1 1139 | } 1140 | return n, nil 1141 | } 1142 | func (f *zeroFile) Fsync() error { return nil } 1143 | func (f *zeroFile) Close() error { return nil } 1144 | 1145 | type objectStoreReadOnlyFile struct { 1146 | storageObject ReaderAtCloser 1147 | } 1148 | 1149 | func (f *objectStoreReadOnlyFile) WriteData(buf []byte, offset uint64) (uint32, error) { 1150 | return 0, ErrNotSupported 1151 | } 1152 | 1153 | func (f *objectStoreReadOnlyFile) ReadData(buf []byte, offset uint64) (uint32, error) { 1154 | n, err := f.storageObject.ReadAt(buf, int64(offset)) 1155 | return uint32(n), err 1156 | } 1157 | 1158 | func (f *objectStoreReadOnlyFile) Fsync() error { 1159 | return nil 1160 | } 1161 | 1162 | func 
(f *objectStoreReadOnlyFile) Close() error { 1163 | return f.storageObject.Close() 1164 | } 1165 | 1166 | type objectStoreSmallReadOnlyFile struct { 1167 | fs *Fs 1168 | ino uint64 1169 | size uint64 1170 | readObjectOnce sync.Once 1171 | objectError error 1172 | objectData *bytes.Reader 1173 | } 1174 | 1175 | func (f *objectStoreSmallReadOnlyFile) WriteData(buf []byte, offset uint64) (uint32, error) { 1176 | return 0, ErrNotSupported 1177 | } 1178 | 1179 | func (f *objectStoreSmallReadOnlyFile) ReadData(buf []byte, offset uint64) (uint32, error) { 1180 | // Lazily read the data. 1181 | f.readObjectOnce.Do(func() { 1182 | buf := bytes.NewBuffer(make([]byte, 0, f.size)) 1183 | ok, err := f.fs.objectStorage.ReadAll(f.fs.fsName, f.ino, buf) 1184 | if err != nil { 1185 | f.objectError = err 1186 | return 1187 | } 1188 | if !ok { 1189 | f.objectData = bytes.NewReader(make([]byte, f.size, f.size)) 1190 | return 1191 | } 1192 | f.objectData = bytes.NewReader(buf.Bytes()) 1193 | return 1194 | }) 1195 | if f.objectError != nil { 1196 | return 0, f.objectError 1197 | } 1198 | n, err := f.objectData.ReadAt(buf, int64(offset)) 1199 | return uint32(n), err 1200 | } 1201 | 1202 | func (f *objectStoreSmallReadOnlyFile) Fsync() error { 1203 | return nil 1204 | } 1205 | 1206 | func (f *objectStoreSmallReadOnlyFile) Close() error { 1207 | return nil 1208 | } 1209 | 1210 | type objectStoreReadWriteFile struct { 1211 | fs *Fs 1212 | dirty atomicBool 1213 | ino uint64 1214 | tmpFile *os.File 1215 | } 1216 | 1217 | func (f *objectStoreReadWriteFile) WriteData(buf []byte, offset uint64) (uint32, error) { 1218 | f.dirty.Store(true) 1219 | n, err := f.tmpFile.WriteAt(buf, int64(offset)) 1220 | return uint32(n), err 1221 | } 1222 | 1223 | func (f *objectStoreReadWriteFile) ReadData(buf []byte, offset uint64) (uint32, error) { 1224 | n, err := f.tmpFile.ReadAt(buf, int64(offset)) 1225 | return uint32(n), err 1226 | } 1227 | 1228 | func (f *objectStoreReadWriteFile) Fsync() error { 1229 | 1230 | if !f.dirty.Load() { 1231 | return nil 1232 | } 1233 | 1234 | tmpFileStat, err := f.tmpFile.Stat() 1235 | if err != nil { 1236 | return err 1237 | } 1238 | 1239 | var nLink uint32 1240 | 1241 | _, err = f.fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 1242 | stat, err := f.fs.txGetStat(tx, f.ino).Get() 1243 | if err != nil { 1244 | return nil, err 1245 | } 1246 | if stat.Size != 0 { 1247 | // Object storage files can only be written once. 1248 | return nil, ErrNotSupported 1249 | } 1250 | 1251 | nLink = stat.Nlink 1252 | 1253 | stat.Size = uint64(tmpFileStat.Size()) 1254 | if stat.Nlink != 0 { 1255 | f.fs.txSubvolumeByteDelta(tx, stat.Subvolume, int64(stat.Size)) 1256 | } 1257 | f.fs.txSetStat(tx, stat) 1258 | return nil, nil 1259 | }) 1260 | if err != nil { 1261 | return err 1262 | } 1263 | 1264 | if tmpFileStat.Size() == 0 { 1265 | // No point in uploading an empty object. 1266 | f.dirty.Store(false) 1267 | return nil 1268 | } 1269 | 1270 | if nLink == 0 { 1271 | // We don't want to ever upload an object for an unlinked file 1272 | // as that could cause orphaned objects in the object store. 
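// The unlinked-inode GC (RemoveExpiredUnlinked) only removes objects for
// inodes it can still see, so an object uploaded after the last link is gone
// might never be cleaned up.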
f.dirty.Store(false) 1274 | return nil 1275 | } 1276 | 1277 | _, err = f.fs.objectStorage.Write(f.fs.fsName, f.ino, f.tmpFile) 1278 | if err != nil { 1279 | return err 1280 | } 1281 | 1282 | f.dirty.Store(false) 1283 | return nil 1284 | } 1285 | 1286 | func (f *objectStoreReadWriteFile) Close() error { 1287 | _ = f.tmpFile.Close() 1288 | return nil 1289 | } 1290 | 1291 | type OpenFileOpts struct { 1292 | Truncate bool 1293 | } 1294 | 1295 | func (fs *Fs) OpenFile(ino uint64, opts OpenFileOpts) (HafsFile, Stat, error) { 1296 | var stat Stat 1297 | _, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 1298 | 1299 | existingStat, err := fs.txGetStat(tx, ino).Get() 1300 | if err != nil { 1301 | return nil, err 1302 | } 1303 | stat = existingStat 1304 | 1305 | // Might happen as a result of client side caching. 1306 | if stat.Nlink == 0 { 1307 | // N.B. don't return ErrNotExist, the file might exist but the cache is out of date. 1308 | return nil, ErrInvalid 1309 | } 1310 | 1311 | if stat.Mode&S_IFMT != S_IFREG { 1312 | return nil, ErrInvalid 1313 | } 1314 | 1315 | if opts.Truncate { 1316 | stat, err = fs.txModStat(tx, stat.Ino, ModStatOpts{ 1317 | Valid: MODSTAT_SIZE, 1318 | Size: 0, 1319 | }) 1320 | if err != nil { 1321 | return nil, err 1322 | } 1323 | } 1324 | 1325 | return nil, nil 1326 | }) 1327 | if err != nil { return nil, Stat{}, err } // Don't open backing storage from a possibly stale stat. 1328 | var f HafsFile 1329 | if stat.Flags&FLAG_OBJECT_STORAGE == 0 { 1330 | f = &foundationDBFile{ 1331 | fs: fs, 1332 | ino: stat.Ino, 1333 | } 1334 | } else { 1335 | if stat.Size == 0 { 1336 | tmpFile, err := os.CreateTemp("", "") 1337 | if err != nil { 1338 | return nil, Stat{}, err 1339 | } 1340 | // XXX The temp file is made anonymous by removing its name; it would be nicer to create it anonymous from the start. 1341 | err = os.Remove(tmpFile.Name()) 1342 | if err != nil { 1343 | return nil, Stat{}, err 1344 | } 1345 | 1346 | f = &objectStoreReadWriteFile{ 1347 | fs: fs, 1348 | dirty: atomicBool{}, 1349 | ino: stat.Ino, 1350 | tmpFile: tmpFile, 1351 | } 1352 | } else { 1353 | if stat.Size >= fs.smallObjectThreshold { 1354 | storageObject, exists, err := fs.objectStorage.Open(fs.fsName, stat.Ino) 1355 | if err != nil { 1356 | return nil, Stat{}, err 1357 | } 1358 | if !exists { 1359 | f = &zeroFile{size: stat.Size} 1360 | } else { 1361 | f = &objectStoreReadOnlyFile{ 1362 | storageObject: storageObject, 1363 | } 1364 | } 1365 | } else { 1366 | f = &objectStoreSmallReadOnlyFile{ 1367 | fs: fs, 1368 | ino: stat.Ino, 1369 | size: stat.Size, 1370 | } 1371 | } 1372 | } 1373 | } 1374 | 1375 | return f, stat, err 1376 | } 1377 | 1378 | type CreateFileOpts struct { 1379 | Truncate bool 1380 | Mode uint32 1381 | Uid uint32 1382 | Gid uint32 1383 | } 1384 | 1385 | func (fs *Fs) CreateFile(dirIno uint64, name string, opts CreateFileOpts) (HafsFile, Stat, error) { 1386 | stat, err := fs.Mknod(dirIno, name, MknodOpts{ 1387 | Truncate: opts.Truncate, 1388 | Mode: (^S_IFMT & opts.Mode) | S_IFREG, 1389 | Uid: opts.Uid, 1390 | Gid: opts.Gid, 1391 | }) 1392 | if err != nil { 1393 | return nil, Stat{}, err 1394 | } 1395 | 1396 | var f HafsFile 1397 | 1398 | if stat.Flags&FLAG_OBJECT_STORAGE == 0 { 1399 | f = &foundationDBFile{ 1400 | fs: fs, 1401 | ino: stat.Ino, 1402 | } 1403 | } else { 1404 | tmpFile, err := os.CreateTemp("", "") 1405 | if err != nil { 1406 | return nil, Stat{}, err 1407 | } 1408 | // XXX The temp file is made anonymous by removing its name; it would be nicer to create it anonymous from the start.
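// On Linux that could look like os.OpenFile(os.TempDir(), os.O_RDWR|unix.O_TMPFILE, 0o600)
// with golang.org/x/sys/unix, at the cost of portability - a sketch, not what
// this package currently does.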
1409 | err = os.Remove(tmpFile.Name()) 1410 | if err != nil { 1411 | return nil, Stat{}, err 1412 | } 1413 | 1414 | f = &objectStoreReadWriteFile{ 1415 | fs: fs, 1416 | dirty: atomicBool{}, 1417 | ino: stat.Ino, 1418 | tmpFile: tmpFile, 1419 | } 1420 | } 1421 | 1422 | return f, stat, err 1423 | } 1424 | 1425 | func (fs *Fs) ReadSymlink(ino uint64) ([]byte, error) { 1426 | l, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 1427 | statFut := fs.txGetStat(tx, ino) 1428 | lFut := tx.Get(tuple.Tuple{"hafs", fs.fsName, "ino", ino, "target"}) 1429 | stat, err := statFut.Get() 1430 | if err != nil { 1431 | return nil, err 1432 | } 1433 | if stat.Mode&S_IFMT != S_IFLNK { 1434 | return nil, ErrInvalid 1435 | } 1436 | return lFut.MustGet(), nil 1437 | }) 1438 | if err != nil { 1439 | return nil, err 1440 | } 1441 | return l.([]byte), nil 1442 | } 1443 | 1444 | func (fs *Fs) GetStat(ino uint64) (Stat, error) { 1445 | stat, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 1446 | stat, err := fs.txGetStat(tx, ino).Get() 1447 | return stat, err 1448 | }) 1449 | if err != nil { 1450 | return Stat{}, err 1451 | } 1452 | return stat.(Stat), nil 1453 | } 1454 | 1455 | const ( 1456 | MODSTAT_MODE = 1 << iota 1457 | MODSTAT_UID 1458 | MODSTAT_GID 1459 | MODSTAT_SIZE 1460 | MODSTAT_ATIME 1461 | MODSTAT_MTIME 1462 | MODSTAT_CTIME 1463 | ) 1464 | 1465 | type ModStatOpts struct { 1466 | Valid uint32 1467 | Size uint64 1468 | Atimesec uint64 1469 | Mtimesec uint64 1470 | Ctimesec uint64 1471 | Atimensec uint32 1472 | Mtimensec uint32 1473 | Ctimensec uint32 1474 | Mode uint32 1475 | Uid uint32 1476 | Gid uint32 1477 | } 1478 | 1479 | func (opts *ModStatOpts) setTime(t time.Time, secs *uint64, nsecs *uint32) { 1480 | *secs = uint64(t.UnixNano() / 1_000_000_000) 1481 | *nsecs = uint32(t.UnixNano() % 1_000_000_000) 1482 | } 1483 | 1484 | func (opts *ModStatOpts) SetMtime(t time.Time) { 1485 | opts.Valid |= MODSTAT_MTIME 1486 | opts.setTime(t, &opts.Mtimesec, &opts.Mtimensec) 1487 | } 1488 | 1489 | func (opts *ModStatOpts) SetAtime(t time.Time) { 1490 | opts.Valid |= MODSTAT_ATIME 1491 | opts.setTime(t, &opts.Atimesec, &opts.Atimensec) 1492 | } 1493 | 1494 | func (opts *ModStatOpts) SetCtime(t time.Time) { 1495 | opts.Valid |= MODSTAT_CTIME 1496 | opts.setTime(t, &opts.Ctimesec, &opts.Ctimensec) 1497 | } 1498 | 1499 | func (opts *ModStatOpts) SetSize(size uint64) { 1500 | opts.Valid |= MODSTAT_SIZE 1501 | opts.Size = size 1502 | } 1503 | 1504 | func (opts *ModStatOpts) SetMode(mode uint32) { 1505 | opts.Valid |= MODSTAT_MODE 1506 | opts.Mode = mode 1507 | } 1508 | 1509 | func (opts *ModStatOpts) SetUid(uid uint32) { 1510 | opts.Valid |= MODSTAT_UID 1511 | opts.Uid = uid 1512 | } 1513 | 1514 | func (opts *ModStatOpts) SetGid(gid uint32) { 1515 | opts.Valid |= MODSTAT_GID 1516 | opts.Gid = gid 1517 | } 1518 | 1519 | func (fs *Fs) txModStat(tx fdb.Transaction, ino uint64, opts ModStatOpts) (Stat, error) { 1520 | stat, err := fs.txGetStat(tx, ino).Get() 1521 | if err != nil { 1522 | return Stat{}, err 1523 | } 1524 | 1525 | if opts.Valid&MODSTAT_MODE != 0 { 1526 | stat.Mode = (stat.Mode & S_IFMT) | (opts.Mode & ^S_IFMT) 1527 | } 1528 | 1529 | if opts.Valid&MODSTAT_UID != 0 { 1530 | stat.Uid = opts.Uid 1531 | } 1532 | 1533 | if opts.Valid&MODSTAT_GID != 0 { 1534 | stat.Gid = opts.Gid 1535 | } 1536 | 1537 | if opts.Valid&MODSTAT_ATIME != 0 { 1538 | stat.Atimesec = opts.Atimesec 1539 | stat.Atimensec = opts.Atimensec 1540 | } 1541 | 1542 | now := time.Now() 1543 | 1544 | if 
opts.Valid&MODSTAT_MTIME != 0 { 1545 | stat.Mtimesec = opts.Mtimesec 1546 | stat.Mtimensec = opts.Mtimensec 1547 | } else if opts.Valid&MODSTAT_SIZE != 0 { 1548 | stat.SetMtime(now) 1549 | } 1550 | 1551 | if opts.Valid&MODSTAT_CTIME != 0 { 1552 | stat.Ctimesec = opts.Ctimesec 1553 | stat.Ctimensec = opts.Ctimensec 1554 | } else { 1555 | stat.SetCtime(now) 1556 | } 1557 | 1558 | if opts.Valid&MODSTAT_SIZE != 0 { 1559 | 1560 | if stat.Nlink != 0 { 1561 | fs.txSubvolumeByteDelta(tx, stat.Subvolume, int64(opts.Size)-int64(stat.Size)) 1562 | } 1563 | stat.Size = opts.Size 1564 | 1565 | if stat.Mode&S_IFMT != S_IFREG { 1566 | return Stat{}, ErrInvalid 1567 | } 1568 | 1569 | if stat.Size != 0 { 1570 | 1571 | if stat.Flags&FLAG_OBJECT_STORAGE != 0 { 1572 | // We don't support truncating object storage files for now. 1573 | return Stat{}, ErrNotSupported 1574 | } 1575 | 1576 | // Don't allow arbitrarily setting unrealistically huge file sizes 1577 | // that risk overflows and other strange problems by going past sensible limits. 1578 | if stat.Size > 0xFFFF_FFFF_FFFF { 1579 | return Stat{}, ErrInvalid 1580 | } 1581 | 1582 | clearBegin := (stat.Size + (CHUNK_SIZE - stat.Size%CHUNK_SIZE)) / CHUNK_SIZE // Align up to the next chunk boundary. 1583 | _, clearEnd := tuple.Tuple{"hafs", fs.fsName, "ino", ino, "data"}.FDBRangeKeys() 1584 | tx.ClearRange(fdb.KeyRange{ 1585 | Begin: tuple.Tuple{"hafs", fs.fsName, "ino", ino, "data", clearBegin}, 1586 | End: clearEnd, 1587 | }) 1588 | lastChunkIdx := stat.Size / CHUNK_SIZE 1589 | lastChunkSize := stat.Size % CHUNK_SIZE 1590 | lastChunkKey := tuple.Tuple{"hafs", fs.fsName, "ino", ino, "data", lastChunkIdx} 1591 | if lastChunkSize == 0 { 1592 | tx.Clear(lastChunkKey) 1593 | } 1594 | } else { 1595 | tx.ClearRange(tuple.Tuple{"hafs", fs.fsName, "ino", ino, "data"}) 1596 | } 1597 | } 1598 | 1599 | fs.txSetStat(tx, stat) 1600 | return stat, nil 1601 | } 1602 | 1603 | func (fs *Fs) ModStat(ino uint64, opts ModStatOpts) (Stat, error) { 1604 | stat, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 1605 | stat, err := fs.txModStat(tx, ino, opts) 1606 | return stat, err 1607 | }) 1608 | if err != nil { 1609 | return Stat{}, err 1610 | } 1611 | return stat.(Stat), nil 1612 | } 1613 | 1614 | func (fs *Fs) Lookup(dirIno uint64, name string) (Stat, error) { 1615 | stat, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 1616 | dirEnt, err := fs.txGetDirEnt(tx, dirIno, name).Get() 1617 | if err != nil { 1618 | return Stat{}, err 1619 | } 1620 | stat, err := fs.txGetStat(tx, dirEnt.Ino).Get() 1621 | return stat, err 1622 | }) 1623 | if err != nil { 1624 | return Stat{}, err 1625 | } 1626 | return stat.(Stat), nil 1627 | } 1628 | 1629 | func (fs *Fs) Rename(fromDirIno, toDirIno uint64, fromName, toName string) error { 1630 | 1631 | if fromName == toName && fromDirIno == toDirIno { 1632 | return nil 1633 | } 1634 | 1635 | _, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 1636 | 1637 | fromDirStatFut := fs.txGetStat(tx, fromDirIno) 1638 | toDirStatFut := fromDirStatFut 1639 | if toDirIno != fromDirIno { 1640 | toDirStatFut = fs.txGetStat(tx, toDirIno) 1641 | } 1642 | fromDirEntFut := fs.txGetDirEnt(tx, fromDirIno, fromName) 1643 | toDirEntFut := fs.txGetDirEnt(tx, toDirIno, toName) 1644 | 1645 | fromDirStat, fromDirStatErr := fromDirStatFut.Get() 1646 | toDirStat, toDirStatErr := toDirStatFut.Get() 1647 | fromDirEnt, fromDirEntErr := fromDirEntFut.Get() 1648 | toDirEnt, toDirEntErr := toDirEntFut.Get() 1649 | 1650 | if toDirStatErr != nil { 1651 | return nil, 
toDirStatErr 1652 | } 1653 | 1654 | if toDirStat.Mode&S_IFMT != S_IFDIR { 1655 | return nil, ErrNotDir 1656 | } 1657 | 1658 | if fromDirStatErr != nil { 1659 | return nil, fromDirStatErr 1660 | } 1661 | 1662 | if fromDirStat.Mode&S_IFMT != S_IFDIR { 1663 | return nil, ErrNotDir 1664 | } 1665 | 1666 | if fromDirEntErr != nil { 1667 | return nil, fromDirEntErr 1668 | } 1669 | 1670 | now := time.Now() 1671 | 1672 | if errors.Is(toDirEntErr, ErrNotExist) { 1673 | /* Nothing to do. */ 1674 | } else if toDirEntErr != nil { 1675 | return nil, toDirEntErr 1676 | } else { 1677 | toStat, err := fs.txGetStat(tx, toDirEnt.Ino).Get() 1678 | if err != nil { 1679 | return nil, err 1680 | } 1681 | 1682 | if toStat.Mode&S_IFMT == S_IFDIR { 1683 | if fs.txDirHasChildren(tx, toStat.Ino) { 1684 | return nil, ErrNotEmpty 1685 | } 1686 | } 1687 | 1688 | toStat.Nlink -= 1 1689 | toStat.SetMtime(now) 1690 | toStat.SetCtime(now) 1691 | fs.txSetStat(tx, toStat) 1692 | 1693 | if toStat.Nlink == 0 { 1694 | fs.txSubvolumeByteDelta(tx, toStat.Subvolume, -int64(toStat.Size)) 1695 | fs.txSubvolumeInodeDelta(tx, toStat.Subvolume, -1) 1696 | tx.Set(tuple.Tuple{"hafs", fs.fsName, "unlinked", toStat.Ino}, []byte{}) 1697 | } 1698 | } 1699 | 1700 | if toDirIno != fromDirIno { 1701 | 1702 | // Enforce subvolume invariants. 1703 | if toDirStat.Flags&FLAG_SUBVOLUME != 0 && fromDirStat.Flags&FLAG_SUBVOLUME != 0 { 1704 | return nil, ErrInvalid 1705 | } else if toDirStat.Flags&FLAG_SUBVOLUME != 0 { 1706 | if toDirStat.Ino != fromDirStat.Subvolume { 1707 | return nil, ErrInvalid 1708 | } 1709 | } else if fromDirStat.Flags&FLAG_SUBVOLUME != 0 { 1710 | if fromDirStat.Ino != toDirStat.Subvolume { 1711 | return nil, ErrInvalid 1712 | } 1713 | } else { // Neither are subvolumes 1714 | if toDirStat.Subvolume != fromDirStat.Subvolume { 1715 | return nil, ErrInvalid 1716 | } 1717 | } 1718 | 1719 | toDirStat.SetMtime(now) 1720 | toDirStat.SetCtime(now) 1721 | fs.txSetStat(tx, toDirStat) 1722 | fromDirStat.SetMtime(now) 1723 | fromDirStat.SetCtime(now) 1724 | fs.txSetStat(tx, fromDirStat) 1725 | } else { 1726 | toDirStat.SetMtime(now) 1727 | toDirStat.SetCtime(now) 1728 | fs.txSetStat(tx, toDirStat) 1729 | } 1730 | 1731 | tx.Clear(tuple.Tuple{"hafs", fs.fsName, "ino", fromDirIno, "child", fromName}) 1732 | fs.txSetDirEnt(tx, toDirIno, DirEnt{ 1733 | Name: toName, 1734 | Mode: fromDirEnt.Mode, 1735 | Ino: fromDirEnt.Ino, 1736 | }) 1737 | return nil, nil 1738 | }) 1739 | return err 1740 | } 1741 | 1742 | type DirIter struct { 1743 | lock sync.Mutex 1744 | fs *Fs 1745 | iterRange fdb.KeyRange 1746 | ents []DirEnt 1747 | stats []Stat 1748 | nextCalled bool 1749 | isPlus bool 1750 | done bool 1751 | } 1752 | 1753 | const _DIR_ITER_BATCH_SIZE = 512 1754 | 1755 | func (di *DirIter) fill() error { 1756 | 1757 | _, err := di.fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 1758 | 1759 | kvs := tx.GetRange(di.iterRange, fdb.RangeOptions{ 1760 | Limit: _DIR_ITER_BATCH_SIZE, 1761 | }).GetSliceOrPanic() 1762 | 1763 | if len(kvs) != 0 { 1764 | nextBegin, err := fdb.Strinc(kvs[len(kvs)-1].Key) 1765 | if err != nil { 1766 | return nil, err 1767 | } 1768 | di.iterRange.Begin = fdb.Key(nextBegin) 1769 | } else { 1770 | di.iterRange.Begin = di.iterRange.End 1771 | di.done = true 1772 | return nil, nil 1773 | } 1774 | 1775 | ents := make([]DirEnt, 0, len(kvs)) 1776 | 1777 | statFuts := []futureStat{} 1778 | if di.isPlus { 1779 | statFuts = make([]futureStat, 0, len(kvs)) 1780 | } 1781 | 1782 | for _, kv := range kvs { 1783 | keyTuple, err := 
tuple.Unpack(kv.Key) 1784 | if err != nil { 1785 | return nil, err 1786 | } 1787 | name := keyTuple[len(keyTuple)-1].(string) 1788 | dirEnt := DirEnt{} 1789 | err = dirEnt.UnmarshalBinary(kv.Value) 1790 | if err != nil { 1791 | return nil, err 1792 | } 1793 | dirEnt.Name = name 1794 | ents = append(ents, dirEnt) 1795 | if di.isPlus { 1796 | statFuts = append(statFuts, di.fs.txGetStat(tx, dirEnt.Ino)) 1797 | } 1798 | } 1799 | 1800 | // Reverse entries so we can pop them off in the right order. 1801 | for i, j := 0, len(ents)-1; i < j; i, j = i+1, j-1 { 1802 | ents[i], ents[j] = ents[j], ents[i] 1803 | } 1804 | 1805 | // A short batch means the range is exhausted, for plain and plus iterators alike. 1806 | if len(ents) < _DIR_ITER_BATCH_SIZE { 1807 | di.done = true 1808 | } 1809 | if di.isPlus { 1810 | stats := make([]Stat, 0, len(statFuts)) 1811 | // Read stats in reverse order so they pop off in step with ents. 1812 | for i := len(statFuts) - 1; i >= 0; i -= 1 { 1813 | stat, err := statFuts[i].Get() 1814 | if err != nil { 1815 | return nil, err 1816 | } 1817 | stats = append(stats, stat) 1818 | } 1819 | di.stats = stats 1820 | } 1821 | di.ents = ents 1822 | return nil, nil 1823 | }) 1824 | if err != nil { 1825 | return err 1826 | } 1827 | return nil 1828 | } 1829 | 1830 | func (di *DirIter) next(ent *DirEnt, stat *Stat) error { 1831 | 1832 | if len(di.ents) == 0 { 1833 | if di.done { 1834 | return io.EOF 1835 | } 1836 | // Buffer the next batch; after the initial fill we should always 1837 | // have something buffered until the range is exhausted. 1838 | err := di.fill() 1839 | if err != nil { 1840 | return err 1841 | } 1842 | 1843 | if len(di.ents) == 0 && di.done { 1844 | return io.EOF 1845 | } 1846 | } 1847 | 1848 | if ent != nil { 1849 | *ent = di.ents[len(di.ents)-1] 1850 | } 1851 | di.ents = di.ents[:len(di.ents)-1] 1852 | 1853 | if di.isPlus { 1854 | if stat != nil { 1855 | *stat = di.stats[len(di.stats)-1] 1856 | } 1857 | di.stats = di.stats[:len(di.stats)-1] 1858 | } 1859 | 1860 | return nil 1861 | } 1862 | 1863 | func (di *DirIter) Next() (DirEnt, error) { 1864 | di.lock.Lock() 1865 | defer di.lock.Unlock() 1866 | 1867 | if !di.nextCalled { 1868 | di.nextCalled = true 1869 | // First use was a plain Next, iterator 1870 | // will not gather stat information. 1871 | di.isPlus = false 1872 | } 1873 | 1874 | ent := DirEnt{} 1875 | err := di.next(&ent, nil) 1876 | return ent, err 1877 | } 1878 | 1879 | func (di *DirIter) NextPlus() (DirEnt, Stat, error) { 1880 | di.lock.Lock() 1881 | defer di.lock.Unlock() 1882 | 1883 | ent := DirEnt{} 1884 | stat := Stat{} 1885 | 1886 | if !di.nextCalled { 1887 | di.nextCalled = true 1888 | // First use was a NextPlus, iterator 1889 | // will gather stat information. 1890 | di.isPlus = true 1891 | } 1892 | 1893 | err := di.next(&ent, &stat) 1894 | if err != nil { 1895 | return ent, stat, err 1896 | } 1897 | 1898 | if !di.isPlus { 1899 | // The iterator is not a 'plus' iterator, but 1900 | // we can still fill in the missing stat manually 1901 | // with a best effort fallback; it's just slower.
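// (This costs one extra read transaction per entry; callers that want stats
// should make their first call a NextPlus so the batched stat futures are used.)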
1902 | stat, err = di.fs.GetStat(ent.Ino) 1903 | if err != nil { 1904 | return ent, stat, err 1905 | } 1906 | } 1907 | 1908 | return ent, stat, nil 1909 | } 1910 | 1911 | func (di *DirIter) Unget(ent DirEnt) { 1912 | di.lock.Lock() 1913 | defer di.lock.Unlock() 1914 | if di.isPlus { 1915 | panic("api misuse") 1916 | } 1917 | di.ents = append(di.ents, ent) 1918 | di.done = false 1919 | } 1920 | 1921 | func (di *DirIter) UngetPlus(ent DirEnt, stat Stat) { 1922 | di.lock.Lock() 1923 | defer di.lock.Unlock() 1924 | if !di.isPlus { 1925 | panic("api misuse") 1926 | } 1927 | di.ents = append(di.ents, ent) 1928 | di.stats = append(di.stats, stat) 1929 | di.done = false 1930 | } 1931 | 1932 | func (fs *Fs) IterDirEnts(dirIno uint64) (*DirIter, error) { 1933 | iterBegin, iterEnd := tuple.Tuple{"hafs", fs.fsName, "ino", dirIno, "child"}.FDBRangeKeys() 1934 | di := &DirIter{ 1935 | fs: fs, 1936 | iterRange: fdb.KeyRange{ 1937 | Begin: iterBegin, 1938 | End: iterEnd, 1939 | }, 1940 | } 1941 | _, err := fs.GetStat(dirIno) 1942 | if err != nil { 1943 | return nil, err 1944 | } 1945 | return di, err 1946 | } 1947 | 1948 | func (fs *Fs) GetXAttr(ino uint64, name string) ([]byte, error) { 1949 | x, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 1950 | switch name { 1951 | case "hafs.totals": 1952 | statFut := fs.txGetStat(tx, ino) 1953 | bcount, err := fs.txSubvolumeByteCount(tx, ino) 1954 | if err != nil { 1955 | return nil, err 1956 | } 1957 | icount, err := fs.txSubvolumeInodeCount(tx, ino) 1958 | if err != nil { 1959 | return nil, err 1960 | } 1961 | stat, err := statFut.Get() 1962 | if err != nil { 1963 | return nil, err 1964 | } 1965 | if stat.Flags&FLAG_SUBVOLUME == 0 { 1966 | return nil, ErrInvalid 1967 | } 1968 | return []byte(fmt.Sprintf(`{"bytes":%d,"inodes":%d}`, bcount, icount)), nil 1969 | case "hafs.total-bytes", "hafs.total-inodes": 1970 | var ( 1971 | count uint64 1972 | err error 1973 | ) 1974 | statFut := fs.txGetStat(tx, ino) 1975 | switch name { 1976 | case "hafs.total-bytes": 1977 | count, err = fs.txSubvolumeByteCount(tx, ino) 1978 | if err != nil { 1979 | return nil, err 1980 | } 1981 | case "hafs.total-inodes": 1982 | count, err = fs.txSubvolumeInodeCount(tx, ino) 1983 | if err != nil { 1984 | return nil, err 1985 | } 1986 | } 1987 | stat, err := statFut.Get() 1988 | if err != nil { 1989 | return nil, err 1990 | } 1991 | if stat.Flags&FLAG_SUBVOLUME == 0 { 1992 | return nil, ErrInvalid 1993 | } 1994 | return []byte(fmt.Sprintf("%d", count)), nil 1995 | default: 1996 | statFut := fs.txGetStat(tx, ino) 1997 | xFut := tx.Get(tuple.Tuple{"hafs", fs.fsName, "ino", ino, "xattr", name}) 1998 | _, err := statFut.Get() 1999 | if err != nil { 2000 | return nil, err 2001 | } 2002 | return xFut.MustGet(), nil 2003 | } 2004 | 2005 | }) 2006 | if err != nil { 2007 | return nil, err 2008 | } 2009 | return x.([]byte), nil 2010 | } 2011 | 2012 | func (fs *Fs) SetXAttr(ino uint64, name string, data []byte) error { 2013 | _, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 2014 | stat, err := fs.txGetStat(tx, ino).Get() 2015 | if err != nil { 2016 | return nil, err 2017 | } 2018 | 2019 | tx.Set(tuple.Tuple{"hafs", fs.fsName, "ino", ino, "xattr", name}, data) 2020 | 2021 | switch name { 2022 | case "hafs.object-storage": 2023 | if stat.Mode&S_IFMT != S_IFDIR { 2024 | return nil, ErrInvalid 2025 | } 2026 | switch string(data) { 2027 | case "true": 2028 | stat.Flags |= FLAG_OBJECT_STORAGE 2029 | default: 2030 | return nil, ErrInvalid 2031 | } 2032 | 
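// Persist the flag; only files created after this point observe it (see
// txMknod), existing chunk data is not migrated to object storage.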
fs.txSetStat(tx, stat) 2033 | case "hafs.subvolume": 2034 | if stat.Mode&S_IFMT != S_IFDIR { 2035 | return nil, ErrInvalid 2036 | } 2037 | if fs.txDirHasChildren(tx, stat.Ino) { 2038 | return nil, ErrInvalid 2039 | } 2040 | switch string(data) { 2041 | case "true": 2042 | stat.Flags |= FLAG_SUBVOLUME 2043 | default: 2044 | return nil, ErrInvalid 2045 | } 2046 | fs.txSetStat(tx, stat) 2047 | case "hafs.total-bytes", "hafs.total-inodes", "hafs.totals": 2048 | return nil, ErrInvalid 2049 | default: 2050 | if strings.HasPrefix(name, "hafs.") { 2051 | return nil, ErrInvalid 2052 | } 2053 | } 2054 | return nil, nil 2055 | }) 2056 | return err 2057 | } 2058 | 2059 | func (fs *Fs) RemoveXAttr(ino uint64, name string) error { 2060 | _, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 2061 | stat, err := fs.txGetStat(tx, ino).Get() 2062 | if err != nil { 2063 | return nil, err 2064 | } 2065 | switch name { 2066 | case "hafs.object-storage": 2067 | if stat.Mode&S_IFMT != S_IFDIR { 2068 | return nil, ErrInvalid 2069 | } 2070 | stat.Flags &= ^FLAG_OBJECT_STORAGE 2071 | fs.txSetStat(tx, stat) 2072 | case "hafs.subvolume": 2073 | if stat.Mode&S_IFMT != S_IFDIR { 2074 | return nil, ErrInvalid 2075 | } 2076 | if fs.txDirHasChildren(tx, stat.Ino) { 2077 | return nil, ErrInvalid 2078 | } 2079 | stat.Flags &= ^FLAG_SUBVOLUME 2080 | fs.txSetStat(tx, stat) 2081 | default: 2082 | } 2083 | tx.Clear(tuple.Tuple{"hafs", fs.fsName, "ino", ino, "xattr", name}) 2084 | return nil, nil 2085 | }) 2086 | return err 2087 | } 2088 | 2089 | func (fs *Fs) ListXAttr(ino uint64) ([]string, error) { 2090 | v, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 2091 | _, err := fs.txGetStat(tx, ino).Get() 2092 | if err != nil { 2093 | return nil, err 2094 | } 2095 | kvs := tx.GetRange(tuple.Tuple{"hafs", fs.fsName, "ino", ino, "xattr"}, fdb.RangeOptions{}).GetSliceOrPanic() 2096 | return kvs, nil 2097 | }) 2098 | if err != nil { 2099 | return nil, err 2100 | } 2101 | kvs := v.([]fdb.KeyValue) 2102 | xattrs := make([]string, 0, len(kvs)) 2103 | for _, kv := range kvs { 2104 | unpacked, err := tuple.Unpack(kv.Key) 2105 | if err != nil { 2106 | return nil, err 2107 | } 2108 | xattrs = append(xattrs, unpacked[len(unpacked)-1].(string)) 2109 | } 2110 | return xattrs, nil 2111 | } 2112 | 2113 | const ( 2114 | LOCK_NONE = iota 2115 | LOCK_SHARED 2116 | LOCK_EXCLUSIVE 2117 | ) 2118 | 2119 | type LockType uint32 2120 | 2121 | type SetLockOpts struct { 2122 | Typ LockType 2123 | Owner uint64 2124 | } 2125 | 2126 | type exclusiveLockRecord struct { 2127 | Owner uint64 2128 | ClientId string 2129 | } 2130 | 2131 | func (lr *exclusiveLockRecord) MarshalBinary() ([]byte, error) { 2132 | bufsz := 2*binary.MaxVarintLen64 + len(lr.ClientId) 2133 | buf := make([]byte, bufsz, bufsz) 2134 | b := buf 2135 | b = b[binary.PutUvarint(b, lr.Owner):] 2136 | b = b[binary.PutUvarint(b, uint64(len(lr.ClientId))):] 2137 | b = b[copy(b, lr.ClientId):] 2138 | return buf[:len(buf)-len(b)], nil 2139 | } 2140 | 2141 | func (lr *exclusiveLockRecord) UnmarshalBinary(buf []byte) error { 2142 | r := bytes.NewReader(buf) 2143 | lr.Owner, _ = binary.ReadUvarint(r) 2144 | v, _ := binary.ReadUvarint(r) 2145 | var sb strings.Builder 2146 | sb.Grow(int(v)) 2147 | io.CopyN(&sb, r, int64(v)) 2148 | lr.ClientId = sb.String() 2149 | return nil 2150 | } 2151 | 2152 | func (fs *Fs) TrySetLock(ino uint64, opts SetLockOpts) (bool, error) { 2153 | ok, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 2154 | stat, err := 
fs.txGetStat(tx, ino).Get() 2155 | if err != nil { 2156 | return false, err 2157 | } 2158 | 2159 | if stat.Mode&S_IFMT != S_IFREG { 2160 | return false, ErrInvalid 2161 | } 2162 | 2163 | exclusiveLockKey := tuple.Tuple{"hafs", fs.fsName, "ino", ino, "lock", "exclusive"} 2164 | 2165 | switch opts.Typ { 2166 | case LOCK_NONE: 2167 | exclusiveLockBytes := tx.Get(exclusiveLockKey).MustGet() 2168 | if exclusiveLockBytes != nil { 2169 | exclusiveLock := exclusiveLockRecord{} 2170 | err := exclusiveLock.UnmarshalBinary(exclusiveLockBytes) 2171 | if err != nil { 2172 | return false, err 2173 | } 2174 | // The lock isn't owned by this client. 2175 | if exclusiveLock.ClientId != fs.clientId { 2176 | return false, nil 2177 | } 2178 | // The request isn't for this owner. 2179 | if exclusiveLock.Owner != opts.Owner { 2180 | return false, nil 2181 | } 2182 | tx.Clear(exclusiveLockKey) 2183 | } else { 2184 | sharedLockKey := tuple.Tuple{"hafs", fs.fsName, "ino", ino, "lock", "shared", fs.clientId, opts.Owner} 2185 | tx.Clear(sharedLockKey) 2186 | } 2187 | tx.Clear(tuple.Tuple{"hafs", fs.fsName, "client", fs.clientId, "lock", ino, opts.Owner}) 2188 | return true, nil 2189 | case LOCK_SHARED: 2190 | exclusiveLockBytes := tx.Get(exclusiveLockKey).MustGet() 2191 | if exclusiveLockBytes != nil { 2192 | return false, nil 2193 | } 2194 | tx.Set(tuple.Tuple{"hafs", fs.fsName, "ino", ino, "lock", "shared", fs.clientId, opts.Owner}, []byte{}) 2195 | tx.Set(tuple.Tuple{"hafs", fs.fsName, "client", fs.clientId, "lock", ino, opts.Owner}, []byte{}) 2196 | return true, nil 2197 | case LOCK_EXCLUSIVE: 2198 | exclusiveLockBytes := tx.Get(exclusiveLockKey).MustGet() 2199 | if exclusiveLockBytes != nil { 2200 | return false, nil 2201 | } 2202 | sharedLocks := tx.GetRange(tuple.Tuple{"hafs", fs.fsName, "ino", ino, "lock", "shared"}, fdb.RangeOptions{ 2203 | Limit: 1, 2204 | }).GetSliceOrPanic() 2205 | if len(sharedLocks) > 0 { 2206 | return false, nil 2207 | } 2208 | exclusiveLock := exclusiveLockRecord{ 2209 | ClientId: fs.clientId, 2210 | Owner: opts.Owner, 2211 | } 2212 | exclusiveLockBytes, err := exclusiveLock.MarshalBinary() 2213 | if err != nil { 2214 | return false, err 2215 | } 2216 | tx.Set(exclusiveLockKey, exclusiveLockBytes) 2217 | tx.Set(tuple.Tuple{"hafs", fs.fsName, "client", fs.clientId, "lock", ino, opts.Owner}, []byte{}) 2218 | return true, nil 2219 | default: 2220 | panic("api misuse") 2221 | } 2222 | }) 2223 | if err != nil { 2224 | return false, err // Don't swallow transaction errors. 2225 | } 2226 | return ok.(bool), nil 2227 | } 2228 | 2229 | func (fs *Fs) PollAwaitExclusiveLockRelease(cancel <-chan struct{}, ino uint64) error { 2230 | for { 2231 | released, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 2232 | exclusiveLockKey := tuple.Tuple{"hafs", fs.fsName, "ino", ino, "lock", "exclusive"} 2233 | return tx.Get(exclusiveLockKey).MustGet() == nil, nil 2234 | }) 2235 | if err != nil { 2236 | return err 2237 | } 2238 | if released.(bool) { 2239 | return nil 2240 | } 2241 | select { case <-cancel: return ErrIntr; case <-time.After(1 * time.Second): } // Honor cancellation while polling. 2242 | } 2243 | } 2244 | 2245 | func (fs *Fs) AwaitExclusiveLockRelease(cancel <-chan struct{}, ino uint64) error { 2246 | w, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 2247 | exclusiveLockKey := tuple.Tuple{"hafs", fs.fsName, "ino", ino, "lock", "exclusive"} 2248 | if tx.Get(exclusiveLockKey).MustGet() == nil { 2249 | return nil, nil 2250 | } 2251 | w := tx.Watch(exclusiveLockKey) 2252 | return w, nil 2253 | }) 2254 | if err != nil { 2255 | return 
fs.PollAwaitExclusiveLockRelease(cancel, ino) 2256 | } 2257 | if w == nil { 2258 | return nil 2259 | } 2260 | 2261 | watch := w.(fdb.FutureNil) 2262 | result := make(chan error, 1) 2263 | go func() { 2264 | result <- watch.Get() 2265 | }() 2266 | 2267 | select { 2268 | case <-cancel: 2269 | watch.Cancel() 2270 | return ErrIntr 2271 | case err := <-result: 2272 | return err 2273 | } 2274 | } 2275 | 2276 | type RemoveExpiredUnlinkedOptions struct { 2277 | RemovalDelay time.Duration 2278 | OnRemoval func(*Stat) 2279 | } 2280 | 2281 | func (fs *Fs) RemoveExpiredUnlinked(opts RemoveExpiredUnlinkedOptions) (uint64, error) { 2282 | 2283 | iterBegin, iterEnd := tuple.Tuple{"hafs", fs.fsName, "unlinked"}.FDBRangeKeys() 2284 | 2285 | iterRange := fdb.KeyRange{ 2286 | Begin: iterBegin, 2287 | End: iterEnd, 2288 | } 2289 | 2290 | nRemoved := &atomicUint64{} 2291 | 2292 | errg, _ := errgroup.WithContext(context.Background()) 2293 | errg.SetLimit(128) 2294 | 2295 | for { 2296 | 2297 | v, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 2298 | kvs := tx.GetRange(iterRange, fdb.RangeOptions{ 2299 | Limit: 128, 2300 | }).GetSliceOrPanic() 2301 | return kvs, nil 2302 | }) 2303 | if err != nil { 2304 | return nRemoved.Load(), err 2305 | } 2306 | 2307 | kvs := v.([]fdb.KeyValue) 2308 | 2309 | if len(kvs) == 0 { 2310 | break 2311 | } 2312 | 2313 | nextBegin, err := fdb.Strinc(kvs[len(kvs)-1].Key) 2314 | if err != nil { 2315 | return nRemoved.Load(), err 2316 | } 2317 | iterRange.Begin = fdb.Key(nextBegin) 2318 | 2319 | errg.Go(func() error { 2320 | 2321 | v, err := fs.Transact(func(tx fdb.Transaction) (interface{}, error) { // := keeps v and err local to this goroutine instead of mutating the loop's captured variables. 2322 | 2323 | futureStats := make([]futureStat, 0, len(kvs)) 2324 | for _, kv := range kvs { 2325 | keyTuple, err := tuple.Unpack(kv.Key) 2326 | if err != nil { 2327 | return nil, err 2328 | } 2329 | ino := tupleElem2u64(keyTuple[len(keyTuple)-1]) 2330 | futureStats = append(futureStats, fs.txGetStat(tx, ino)) 2331 | } 2332 | 2333 | expiredStats := make([]Stat, 0, len(futureStats)) 2334 | 2335 | now := time.Now() 2336 | for _, futureStat := range futureStats { 2337 | stat, err := futureStat.Get() 2338 | if errors.Is(err, ErrNotExist) { 2339 | continue 2340 | } 2341 | if err != nil { 2342 | return nil, err 2343 | } 2344 | 2345 | if now.After(stat.Ctime().Add(opts.RemovalDelay)) { 2346 | expiredStats = append(expiredStats, stat) 2347 | } 2348 | } 2349 | 2350 | return expiredStats, nil 2351 | }) 2352 | if err != nil { 2353 | return err 2354 | } 2355 | 2356 | expiredStats := v.([]Stat) 2357 | 2358 | if len(expiredStats) == 0 { 2359 | return nil 2360 | } 2361 | 2362 | for i := range expiredStats { 2363 | stat := &expiredStats[i] 2364 | if stat.Mode&S_IFMT != S_IFREG { 2365 | continue 2366 | } 2367 | if stat.Flags&FLAG_OBJECT_STORAGE != 0 { 2368 | err = fs.objectStorage.Remove(fs.fsName, stat.Ino) 2369 | if err != nil { 2370 | return err 2371 | } 2372 | } 2373 | } 2374 | 2375 | _, err = fs.Transact(func(tx fdb.Transaction) (interface{}, error) { 2376 | for i := range expiredStats { 2377 | stat := &expiredStats[i] 2378 | tx.Clear(tuple.Tuple{"hafs", fs.fsName, "unlinked", stat.Ino}) 2379 | tx.ClearRange(tuple.Tuple{"hafs", fs.fsName, "ino", stat.Ino}) 2380 | } 2381 | return nil, nil 2382 | }) 2383 | if err != nil { 2384 | return err 2385 | } 2386 | 2387 | if opts.OnRemoval != nil { 2388 | for i := range expiredStats { 2389 | opts.OnRemoval(&expiredStats[i]) 2390 | } 2391 | } 2392 | 2393 | nRemoved.Add(uint64(len(expiredStats))) 2394 | return nil 2395 | 2396 | }) 2397 | 2398 | }
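// All batches are queued; Wait blocks until the in-flight removal goroutines
// (capped at 128 by SetLimit above) finish, surfacing the first error.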
2399 | 2400 | err := errg.Wait() 2401 | if err != nil { 2402 | return nRemoved.Load(), err 2403 | } 2404 | 2405 | return nRemoved.Load(), nil 2406 | } 2407 | 2408 | type FsStats struct { 2409 | UsedBytes uint64 2410 | FreeBytes uint64 2411 | } 2412 | 2413 | func (fs *Fs) FsStats() (FsStats, error) { 2414 | 2415 | fsStats := FsStats{} 2416 | 2417 | v, err := fs.db.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 2418 | return tx.Get(fdb.Key("\xFF\xFF/status/json")).MustGet(), nil 2419 | }) 2420 | if err != nil { 2421 | return FsStats{}, err 2422 | } 2423 | 2424 | var p fastjson.Parser 2425 | 2426 | status, err := p.ParseBytes(v.([]byte)) 2427 | if err != nil { 2428 | return fsStats, err 2429 | } 2430 | 2431 | processes := status.GetObject("cluster", "processes") 2432 | if processes != nil { 2433 | processes.Visit(func(key []byte, v *fastjson.Value) { 2434 | fsStats.FreeBytes += v.GetUint64("disk", "free_bytes") 2435 | fsStats.UsedBytes += v.GetUint64("disk", "total_bytes") - v.GetUint64("disk", "free_bytes") // Used space is disk capacity minus free space. 2436 | }) 2437 | } 2438 | 2439 | return fsStats, nil 2440 | } 2441 | -------------------------------------------------------------------------------- /fs_test.go: -------------------------------------------------------------------------------- 1 | package hafs 2 | 3 | import ( 4 | "bytes" 5 | "errors" 6 | "fmt" 7 | "io" 8 | iofs "io/fs" 9 | mathrand "math/rand" 10 | "os" 11 | "testing" 12 | "time" 13 | 14 | "github.com/apple/foundationdb/bindings/go/src/fdb" 15 | "github.com/apple/foundationdb/bindings/go/src/fdb/tuple" 16 | ) 17 | 18 | func tmpFs(t *testing.T) *Fs { 19 | db := tmpDB(t) 20 | fs, err := Attach(db, "testfs", AttachOpts{}) 21 | if err != nil { 22 | t.Fatal(err) 23 | } 24 | t.Cleanup(func() { 25 | err = fs.Close() 26 | if err != nil { 27 | t.Logf("unable to close fs: %s", err) 28 | } 29 | }) 30 | return fs 31 | } 32 | 33 | func TestExclusiveLockRecordMarshalAndUnmarshal(t *testing.T) { 34 | t.Parallel() 35 | lr1 := exclusiveLockRecord{ 36 | Owner: 0xffffffffffffffff, 37 | ClientId: "foobar", 38 | } 39 | lr2 := exclusiveLockRecord{} 40 | buf, _ := lr1.MarshalBinary() 41 | _ = lr2.UnmarshalBinary(buf) 42 | if lr1 != lr2 { 43 | t.Fatalf("%v != %v", lr1, lr2) 44 | } 45 | } 46 | 47 | func TestDirEntMarshalAndUnmarshal(t *testing.T) { 48 | t.Parallel() 49 | e1 := DirEnt{ 50 | Mode: S_IFDIR, 51 | Ino: 12345, 52 | } 53 | e2 := DirEnt{} 54 | buf, _ := e1.MarshalBinary() 55 | _ = e2.UnmarshalBinary(buf) 56 | if e1 != e2 { 57 | t.Fatalf("%v != %v", e1, e2) 58 | } 59 | } 60 | 61 | func TestStatMarshalAndUnmarshal(t *testing.T) { 62 | t.Parallel() 63 | s1 := Stat{ 64 | Size: 1, 65 | Atimesec: 2, 66 | Mtimesec: 3, 67 | Ctimesec: 4, 68 | Atimensec: 5, 69 | Mtimensec: 6, 70 | Ctimensec: 7, 71 | Mode: 8, 72 | Nlink: 9, 73 | Uid: 10, 74 | Gid: 11, 75 | Rdev: 12, 76 | } 77 | s2 := Stat{} 78 | buf, _ := s1.MarshalBinary() 79 | _ = s2.UnmarshalBinary(buf) 80 | if s1 != s2 { 81 | t.Fatalf("%v != %v", s1, s2) 82 | } 83 | } 84 | 85 | func TestMkfsAndAttach(t *testing.T) { 86 | t.Parallel() 87 | fs := tmpFs(t) 88 | stat, err := fs.GetStat(ROOT_INO) 89 | if err != nil { 90 | t.Fatal(err) 91 | } 92 | if stat.Mode&S_IFMT != S_IFDIR { 93 | t.Fatal("unexpected mode") 94 | } 95 | err = fs.Close() 96 | if err != nil { 97 | t.Fatal(err) 98 | } 99 | } 100 | 101 | func TestMknod(t *testing.T) { 102 | t.Parallel() 103 | fs := tmpFs(t) 104 | 105 | fooStat, err := fs.Mknod(ROOT_INO, "foo", MknodOpts{ 106 | Mode: S_IFDIR | 0o777, 107 | Uid: 0, 108 | Gid: 0, 109 | }) 110 | if err != nil { 111 | t.Fatal(err) 112 | } 113 | 114 | _, 
err = fs.Mknod(ROOT_INO, "foo", MknodOpts{ 115 | Mode: S_IFDIR | 0o777, 116 | Uid: 0, 117 | Gid: 0, 118 | }) 119 | if !errors.Is(err, ErrExist) { 120 | t.Fatal(err) 121 | } 122 | 123 | directStat, err := fs.GetStat(fooStat.Ino) 124 | if err != nil { 125 | t.Fatal(err) 126 | } 127 | 128 | lookupStat, err := fs.Lookup(ROOT_INO, "foo") 129 | if err != nil { 130 | t.Fatal(err) 131 | } 132 | 133 | if directStat != lookupStat { 134 | t.Fatalf("stats differ: %v != %v", directStat, lookupStat) 135 | } 136 | 137 | if directStat.Mode != S_IFDIR|0o777 { 138 | t.Fatalf("unexpected mode: %d", directStat.Mode) 139 | } 140 | 141 | } 142 | 143 | func TestSymlink(t *testing.T) { 144 | t.Parallel() 145 | fs := tmpFs(t) 146 | fooStat, err := fs.Mknod(ROOT_INO, "foo", MknodOpts{ 147 | Mode: S_IFLNK | 0o777, 148 | Uid: 0, 149 | Gid: 0, 150 | LinkTarget: []byte("abc"), 151 | }) 152 | if err != nil { 153 | t.Fatal(err) 154 | } 155 | 156 | l, err := fs.ReadSymlink(fooStat.Ino) 157 | if err != nil { 158 | t.Fatal(err) 159 | } 160 | 161 | if string(l) != "abc" { 162 | t.Fatalf("unexpected link target: %v", l) 163 | } 164 | } 165 | 166 | func TestMknodTruncate(t *testing.T) { 167 | t.Parallel() 168 | fs := tmpFs(t) 169 | 170 | fooStat1, err := fs.Mknod(ROOT_INO, "foo", MknodOpts{ 171 | Mode: S_IFREG | 0o777, 172 | Uid: 0, 173 | Gid: 0, 174 | }) 175 | if err != nil { 176 | t.Fatal(err) 177 | } 178 | 179 | fooStat2, err := fs.Mknod(ROOT_INO, "foo", MknodOpts{ 180 | Truncate: true, 181 | Mode: S_IFDIR | 0o777, 182 | Uid: 0, 183 | Gid: 0, 184 | }) 185 | if err != nil { 186 | t.Fatal(err) 187 | } 188 | 189 | if fooStat1.Ino != fooStat2.Ino { 190 | t.Fatalf("inodes differ") 191 | } 192 | } 193 | 194 | func TestUnlink(t *testing.T) { 195 | t.Parallel() 196 | fs := tmpFs(t) 197 | 198 | err := fs.Unlink(ROOT_INO, "foo") 199 | if !errors.Is(err, ErrNotExist) { 200 | t.Fatal(err) 201 | } 202 | 203 | fooStat, err := fs.Mknod(ROOT_INO, "foo", MknodOpts{ 204 | Mode: S_IFDIR | 0o777, 205 | Uid: 0, 206 | Gid: 0, 207 | }) 208 | if err != nil { 209 | t.Fatal(err) 210 | } 211 | 212 | err = fs.Unlink(ROOT_INO, "foo") 213 | if err != nil { 214 | t.Fatal(err) 215 | } 216 | 217 | stat, err := fs.GetStat(fooStat.Ino) 218 | if err != nil { 219 | t.Fatal(err) 220 | } 221 | 222 | if stat.Nlink != 0 { 223 | t.Fatal("expected unlinked") 224 | } 225 | 226 | nRemoved, err := fs.RemoveExpiredUnlinked(RemoveExpiredUnlinkedOptions{ 227 | RemovalDelay: 10 * time.Second, 228 | }) 229 | if err != nil { 230 | t.Fatal(err) 231 | } 232 | 233 | if nRemoved != 0 { 234 | t.Fatal("expected nothing to be removed") 235 | } 236 | 237 | nRemoved, err = fs.RemoveExpiredUnlinked(RemoveExpiredUnlinkedOptions{ 238 | RemovalDelay: 0, 239 | }) 240 | if err != nil { 241 | t.Fatal(err) 242 | } 243 | 244 | if nRemoved != 1 { 245 | t.Fatal("expected file to be removed") 246 | } 247 | 248 | nRemoved, err = fs.RemoveExpiredUnlinked(RemoveExpiredUnlinkedOptions{ 249 | RemovalDelay: 0, 250 | }) 251 | if err != nil { 252 | t.Fatal(err) 253 | } 254 | 255 | if nRemoved != 0 { 256 | t.Fatal("expected nothing to be removed") 257 | } 258 | } 259 | 260 | func TestObjectStorageWriteOnce(t *testing.T) { 261 | t.Parallel() 262 | db := tmpDB(t) 263 | 264 | storageDir := t.TempDir() 265 | 266 | err := SetObjectStorageSpec(db, "testfs", "file://"+storageDir, SetObjectStorageSpecOpts{}) 267 | if err != nil { 268 | t.Fatal(err) 269 | } 270 | 271 | fs, err := Attach(db, "testfs", AttachOpts{}) 272 | if err != nil { 273 | t.Fatal(err) 274 | } 275 | defer fs.Close() 276 | 277 | err = 
fs.SetXAttr(ROOT_INO, "hafs.object-storage", []byte("true")) 278 | if err != nil { 279 | t.Fatal(err) 280 | } 281 | 282 | f1, stat, err := fs.CreateFile(ROOT_INO, "f", CreateFileOpts{ 283 | Mode: 0o777, 284 | }) 285 | if err != nil { 286 | t.Fatal(err) 287 | } 288 | defer f1.Close() 289 | 290 | _, err = f1.WriteData([]byte{1}, 0) 291 | if err != nil { 292 | t.Fatal(err) 293 | } 294 | 295 | err = f1.Fsync() 296 | if err != nil { 297 | t.Fatal(err) 298 | } 299 | 300 | f2, stat, err := fs.OpenFile(stat.Ino, OpenFileOpts{}) 301 | if err != nil { 302 | t.Fatal(err) 303 | } 304 | defer f2.Close() 305 | 306 | _, err = f2.WriteData([]byte{1}, 0) 307 | if err == nil { 308 | t.Fatal(err) 309 | } 310 | 311 | } 312 | 313 | func TestObjectStorageReadWrite(t *testing.T) { 314 | t.Parallel() 315 | db := tmpDB(t) 316 | 317 | storageDir := t.TempDir() 318 | 319 | err := SetObjectStorageSpec(db, "testfs", "file://"+storageDir, SetObjectStorageSpecOpts{}) 320 | if err != nil { 321 | t.Fatal(err) 322 | } 323 | 324 | smallObjectOptimizationThreshold := int64(1024) 325 | 326 | fs, err := Attach(db, "testfs", AttachOpts{ 327 | SmallObjectOptimizationThreshold: uint64(smallObjectOptimizationThreshold), 328 | }) 329 | if err != nil { 330 | t.Fatal(err) 331 | } 332 | defer fs.Close() 333 | 334 | err = fs.SetXAttr(ROOT_INO, "hafs.object-storage", []byte("true")) 335 | if err != nil { 336 | t.Fatal(err) 337 | } 338 | 339 | for i, sz := range []int64{1, 2, 10, smallObjectOptimizationThreshold, smallObjectOptimizationThreshold * 2} { 340 | 341 | f, stat, err := fs.CreateFile(ROOT_INO, fmt.Sprintf("f%d", i), CreateFileOpts{ 342 | Mode: 0o777, 343 | }) 344 | if err != nil { 345 | t.Fatal(err) 346 | } 347 | defer f.Close() 348 | 349 | expectedBuffer := make([]byte, 0, sz) 350 | for j := int64(0); j < sz; j += 1 { 351 | expectedBuffer = append(expectedBuffer, byte(j%256)) 352 | } 353 | 354 | _, err = f.WriteData(expectedBuffer, 0) 355 | if err != nil { 356 | t.Fatal(err) 357 | } 358 | 359 | err = f.Fsync() 360 | if err != nil { 361 | t.Fatal(err) 362 | } 363 | 364 | err = f.Close() 365 | if err != nil { 366 | t.Fatal(err) 367 | } 368 | 369 | f, stat, err = fs.OpenFile(stat.Ino, OpenFileOpts{}) 370 | if err != nil { 371 | t.Fatal(err) 372 | } 373 | defer f.Close() 374 | 375 | buf := make([]byte, sz*2, sz*2) 376 | 377 | n, err := f.ReadData(buf, 0) 378 | if err != io.EOF { 379 | t.Fatal(err) 380 | } 381 | if int64(n) != sz { 382 | t.Fatal(n) 383 | } 384 | 385 | if !bytes.Equal(buf[:n], expectedBuffer) { 386 | t.Fatal(buf) 387 | } 388 | 389 | if sz < smallObjectOptimizationThreshold { 390 | _ = f.(*objectStoreSmallReadOnlyFile) 391 | } else { 392 | _ = f.(*objectStoreReadOnlyFile) 393 | } 394 | } 395 | 396 | } 397 | 398 | func TestObjectStorageUnlink(t *testing.T) { 399 | t.Parallel() 400 | db := tmpDB(t) 401 | 402 | storageDir := t.TempDir() 403 | 404 | err := SetObjectStorageSpec(db, "testfs", "file://"+storageDir, SetObjectStorageSpecOpts{}) 405 | if err != nil { 406 | t.Fatal(err) 407 | } 408 | 409 | fs, err := Attach(db, "testfs", AttachOpts{}) 410 | if err != nil { 411 | t.Fatal(err) 412 | } 413 | defer fs.Close() 414 | 415 | err = fs.SetXAttr(ROOT_INO, "hafs.object-storage", []byte("true")) 416 | if err != nil { 417 | t.Fatal(err) 418 | } 419 | 420 | f, stat, err := fs.CreateFile(ROOT_INO, "f", CreateFileOpts{ 421 | Mode: 0o777, 422 | }) 423 | if err != nil { 424 | t.Fatal(err) 425 | } 426 | defer f.Close() 427 | 428 | _, err = f.WriteData([]byte{1}, 0) 429 | if err != nil { 430 | t.Fatal(err) 431 | } 432 | 433 | err = 
f.Fsync() 434 | if err != nil { 435 | t.Fatal(err) 436 | } 437 | 438 | inodeDataPath := fmt.Sprintf("%s/%016x.%s", storageDir, stat.Ino, fs.fsName) 439 | _, err = os.Stat(inodeDataPath) 440 | if err != nil { 441 | t.Fatal(err) 442 | } 443 | 444 | err = fs.Unlink(ROOT_INO, "f") 445 | if err != nil { 446 | t.Fatal(err) 447 | } 448 | 449 | nRemoved, err := fs.RemoveExpiredUnlinked(RemoveExpiredUnlinkedOptions{ 450 | RemovalDelay: 0, 451 | }) 452 | if err != nil { 453 | t.Fatal(err) 454 | } 455 | 456 | if nRemoved != 1 { 457 | t.Fatal("expected file to be removed") 458 | } 459 | 460 | _, err = os.Stat(inodeDataPath) 461 | if !errors.Is(err, iofs.ErrNotExist) { 462 | t.Fatal(err) 463 | } 464 | 465 | } 466 | 467 | func TestRenameSameDir(t *testing.T) { 468 | t.Parallel() 469 | fs := tmpFs(t) 470 | 471 | foo1Stat, err := fs.Mknod(ROOT_INO, "foo1", MknodOpts{ 472 | Mode: S_IFDIR | 0o777, 473 | Uid: 0, 474 | Gid: 0, 475 | }) 476 | if err != nil { 477 | t.Fatal(err) 478 | } 479 | 480 | _, err = fs.Mknod(ROOT_INO, "foo2", MknodOpts{ 481 | Mode: S_IFREG | 0o777, 482 | Uid: 0, 483 | Gid: 0, 484 | }) 485 | if err != nil { 486 | t.Fatal(err) 487 | } 488 | 489 | err = fs.Rename(ROOT_INO, ROOT_INO, "foo1", "bar1") 490 | if err != nil { 491 | t.Fatal(err) 492 | } 493 | 494 | _, err = fs.Lookup(ROOT_INO, "foo1") 495 | if !errors.Is(err, ErrNotExist) { 496 | t.Fatal(err) 497 | } 498 | 499 | bar1Stat, err := fs.Lookup(ROOT_INO, "bar1") 500 | if err != nil { 501 | t.Fatal(err) 502 | } 503 | 504 | if bar1Stat.Ino != foo1Stat.Ino { 505 | t.Fatalf("bar1 stat is bad: %#v", bar1Stat) 506 | } 507 | 508 | } 509 | 510 | func TestRenameSameDirOverwrite(t *testing.T) { 511 | t.Parallel() 512 | fs := tmpFs(t) 513 | 514 | foo1Stat, err := fs.Mknod(ROOT_INO, "foo1", MknodOpts{ 515 | Mode: S_IFDIR | 0o777, 516 | Uid: 0, 517 | Gid: 0, 518 | }) 519 | if err != nil { 520 | t.Fatal(err) 521 | } 522 | 523 | f, _, err := fs.CreateFile(ROOT_INO, "bar1", CreateFileOpts{ 524 | Mode: 0o777, 525 | }) 526 | if err != nil { 527 | t.Fatal(err) 528 | } 529 | 530 | _, err = f.WriteData([]byte{1, 2, 3, 4, 5}, 0) 531 | if err != nil { 532 | t.Fatal(err) 533 | } 534 | 535 | err = f.Close() 536 | if err != nil { 537 | t.Fatal(err) 538 | } 539 | 540 | bcnt, err := fs.SubvolumeByteCount(ROOT_INO) 541 | if err != nil { 542 | t.Fatal(err) 543 | } 544 | if bcnt != 5 { 545 | t.Fatal(bcnt) 546 | } 547 | 548 | icnt, err := fs.SubvolumeInodeCount(ROOT_INO) 549 | if err != nil { 550 | t.Fatal(err) 551 | } 552 | if icnt != 2 { 553 | t.Fatal(icnt) 554 | } 555 | 556 | err = fs.Rename(ROOT_INO, ROOT_INO, "foo1", "bar1") 557 | if err != nil { 558 | t.Fatal(err) 559 | } 560 | 561 | _, err = fs.Lookup(ROOT_INO, "foo1") 562 | if !errors.Is(err, ErrNotExist) { 563 | t.Fatal(err) 564 | } 565 | 566 | bar1Stat, err := fs.Lookup(ROOT_INO, "bar1") 567 | if err != nil { 568 | t.Fatal(err) 569 | } 570 | 571 | if bar1Stat.Ino != foo1Stat.Ino { 572 | t.Fatalf("bar1 stat is bad: %#v", bar1Stat) 573 | } 574 | 575 | nRemoved, err := fs.RemoveExpiredUnlinked(RemoveExpiredUnlinkedOptions{ 576 | RemovalDelay: 0, 577 | }) 578 | if err != nil { 579 | t.Fatal(err) 580 | } 581 | 582 | if nRemoved != 1 { 583 | t.Fatal("expected file to be removed") 584 | } 585 | 586 | bcnt, err = fs.SubvolumeByteCount(ROOT_INO) 587 | if err != nil { 588 | t.Fatal(err) 589 | } 590 | if bcnt != 0 { 591 | t.Fatal(bcnt) 592 | } 593 | 594 | icnt, err = fs.SubvolumeInodeCount(ROOT_INO) 595 | if err != nil { 596 | t.Fatal(err) 597 | } 598 | if icnt != 1 { 599 | t.Fatal(icnt) 600 | } 601 | 602 | } 603 | 604 
| func TestRenameDifferentDir(t *testing.T) { 605 | t.Parallel() 606 | fs := tmpFs(t) 607 | 608 | dStat, err := fs.Mknod(ROOT_INO, "d", MknodOpts{ 609 | Mode: S_IFDIR | 0o777, 610 | Uid: 0, 611 | Gid: 0, 612 | }) 613 | if err != nil { 614 | t.Fatal(err) 615 | } 616 | 617 | fooStat, err := fs.Mknod(ROOT_INO, "foo", MknodOpts{ 618 | Mode: S_IFREG | 0o777, 619 | Uid: 0, 620 | Gid: 0, 621 | }) 622 | if err != nil { 623 | t.Fatal(err) 624 | } 625 | 626 | err = fs.Rename(ROOT_INO, dStat.Ino, "foo", "bar") 627 | if err != nil { 628 | t.Fatal(err) 629 | } 630 | 631 | _, err = fs.Lookup(ROOT_INO, "foo") 632 | if !errors.Is(err, ErrNotExist) { 633 | t.Fatal(err) 634 | } 635 | 636 | barStat, err := fs.Lookup(dStat.Ino, "bar") 637 | if err != nil { 638 | t.Fatal(err) 639 | } 640 | 641 | if barStat.Ino != fooStat.Ino { 642 | t.Fatalf("bar1 stat is bad: %#v", barStat) 643 | } 644 | 645 | } 646 | 647 | func TestRenameDifferentDirOverwrite(t *testing.T) { 648 | t.Parallel() 649 | fs := tmpFs(t) 650 | 651 | dStat, err := fs.Mknod(ROOT_INO, "d", MknodOpts{ 652 | Mode: S_IFDIR | 0o777, 653 | Uid: 0, 654 | Gid: 0, 655 | }) 656 | if err != nil { 657 | t.Fatal(err) 658 | } 659 | 660 | _, err = fs.Mknod(dStat.Ino, "bar", MknodOpts{ 661 | Mode: S_IFREG | 0o777, 662 | Uid: 0, 663 | Gid: 0, 664 | }) 665 | if err != nil { 666 | t.Fatal(err) 667 | } 668 | 669 | fooStat, err := fs.Mknod(ROOT_INO, "foo", MknodOpts{ 670 | Mode: S_IFREG | 0o777, 671 | Uid: 0, 672 | Gid: 0, 673 | }) 674 | if err != nil { 675 | t.Fatal(err) 676 | } 677 | 678 | err = fs.Rename(ROOT_INO, dStat.Ino, "foo", "bar") 679 | if err != nil { 680 | t.Fatal(err) 681 | } 682 | 683 | _, err = fs.Lookup(ROOT_INO, "foo") 684 | if !errors.Is(err, ErrNotExist) { 685 | t.Fatal(err) 686 | } 687 | 688 | barStat, err := fs.Lookup(dStat.Ino, "bar") 689 | if err != nil { 690 | t.Fatal(err) 691 | } 692 | 693 | if barStat.Ino != fooStat.Ino { 694 | t.Fatalf("bar1 stat is bad: %#v", barStat) 695 | } 696 | 697 | nRemoved, err := fs.RemoveExpiredUnlinked(RemoveExpiredUnlinkedOptions{ 698 | RemovalDelay: 0, 699 | }) 700 | if err != nil { 701 | t.Fatal(err) 702 | } 703 | 704 | if nRemoved != 1 { 705 | t.Fatal("expected file to be removed") 706 | } 707 | 708 | } 709 | 710 | func TestDirIter(t *testing.T) { 711 | t.Parallel() 712 | fs := tmpFs(t) 713 | 714 | const COUNT = _DIR_ITER_BATCH_SIZE*3 + _DIR_ITER_BATCH_SIZE/3 715 | 716 | for i := 0; i < COUNT; i += 1 { 717 | _, err := fs.Mknod(ROOT_INO, fmt.Sprintf("a%d", i), MknodOpts{ 718 | Mode: S_IFDIR | 0o777, 719 | Uid: 0, 720 | Gid: 0, 721 | }) 722 | if err != nil { 723 | t.Fatal(err) 724 | } 725 | } 726 | 727 | di, err := fs.IterDirEnts(ROOT_INO) 728 | if err != nil { 729 | t.Fatal(err) 730 | } 731 | 732 | count := 0 733 | for { 734 | _, err := di.Next() 735 | if errors.Is(err, io.EOF) { 736 | break 737 | } 738 | if err != nil { 739 | t.Fatalf("unexpected error %s", err) 740 | } 741 | count += 1 742 | } 743 | 744 | if count != COUNT { 745 | t.Fatalf("unexpected count: %d", count) 746 | } 747 | } 748 | 749 | func TestDirIterPlus(t *testing.T) { 750 | t.Parallel() 751 | fs := tmpFs(t) 752 | 753 | const COUNT = _DIR_ITER_BATCH_SIZE*2 + 57 754 | 755 | for i := 0; i < COUNT; i += 1 { 756 | _, err := fs.Mknod(ROOT_INO, fmt.Sprintf("a%d", i), MknodOpts{ 757 | Mode: S_IFDIR | 0o777, 758 | Uid: 0, 759 | Gid: 0, 760 | }) 761 | if err != nil { 762 | t.Fatal(err) 763 | } 764 | } 765 | 766 | di, err := fs.IterDirEnts(ROOT_INO) 767 | if err != nil { 768 | t.Fatal(err) 769 | } 770 | 771 | count := 0 772 | for { 773 | dirEnt, stat, err 
:= di.NextPlus() 774 | if errors.Is(err, io.EOF) { 775 | break 776 | } 777 | if err != nil { 778 | t.Fatalf("unexpected error %s", err) 779 | } 780 | if dirEnt.Ino != stat.Ino { 781 | t.Fatalf("stat unexpectedly differs from dir ent: %#v %#v", dirEnt, stat) 782 | } 783 | count += 1 784 | } 785 | 786 | if count != COUNT { 787 | t.Fatalf("unexpected count: %d", count) 788 | } 789 | } 790 | 791 | func TestWriteDataOneChunk(t *testing.T) { 792 | t.Parallel() 793 | fs := tmpFs(t) 794 | 795 | testSizes := []uint64{0, 1, 3, CHUNK_SIZE - 1, CHUNK_SIZE} 796 | 797 | for i, n := range testSizes { 798 | name := fmt.Sprintf("d%d", i) 799 | f, stat, err := fs.CreateFile(ROOT_INO, name, CreateFileOpts{ 800 | Mode: 0o777, 801 | }) 802 | if err != nil { 803 | t.Fatal(err) 804 | } 805 | 806 | data := make([]byte, n, n) 807 | 808 | for j := 0; j < len(data); j += 1 { 809 | data[j] = byte(j % 256) 810 | } 811 | 812 | nWritten, err := f.WriteData(data, 0) 813 | if err != nil { 814 | t.Fatal(err) 815 | } 816 | if nWritten != uint32(len(data)) { 817 | t.Fatalf("unexpected write amount %d != %d", nWritten, len(data)) 818 | } 819 | 820 | fetchedData, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 821 | data := tx.Get(tuple.Tuple{"hafs", fs.fsName, "ino", stat.Ino, "data", 0}).MustGet() 822 | zeroExpandChunk(&data) 823 | return data, nil 824 | }) 825 | if err != nil { 826 | t.Fatal(err) 827 | } 828 | 829 | if len(fetchedData.([]byte)) != CHUNK_SIZE { 830 | t.Fatalf("unexpected chunk size: %d", fetchedData) 831 | } 832 | 833 | if !bytes.Equal(data, fetchedData.([]byte)[:len(data)]) { 834 | t.Fatalf("%v != %v", data, fetchedData) 835 | } 836 | 837 | } 838 | } 839 | 840 | func TestWriteDataTwoChunks(t *testing.T) { 841 | t.Parallel() 842 | fs := tmpFs(t) 843 | 844 | testSizes := []uint64{CHUNK_SIZE + 1, CHUNK_SIZE + 3, CHUNK_SIZE * 2} 845 | 846 | for i, n := range testSizes { 847 | name := fmt.Sprintf("d%d", i) 848 | f, stat, err := fs.CreateFile(ROOT_INO, name, CreateFileOpts{ 849 | Mode: 0o777, 850 | }) 851 | if err != nil { 852 | t.Fatal(err) 853 | } 854 | 855 | data := make([]byte, n, n) 856 | 857 | for j := 0; j < len(data); j += 1 { 858 | data[j] = byte(j % 256) 859 | } 860 | 861 | data1 := data[:CHUNK_SIZE] 862 | data2 := data[CHUNK_SIZE:] 863 | 864 | nWritten := 0 865 | for nWritten != len(data) { 866 | n, err := f.WriteData(data[nWritten:], uint64(nWritten)) 867 | if err != nil { 868 | t.Fatal(err) 869 | } 870 | nWritten += int(n) 871 | } 872 | 873 | fetchedData1, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 874 | data := tx.Get(tuple.Tuple{"hafs", fs.fsName, "ino", stat.Ino, "data", 0}).MustGet() 875 | zeroExpandChunk(&data) 876 | return data, nil 877 | }) 878 | if err != nil { 879 | t.Fatal(err) 880 | } 881 | 882 | fetchedData2, err := fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 883 | data := tx.Get(tuple.Tuple{"hafs", fs.fsName, "ino", stat.Ino, "data", 1}).MustGet() 884 | zeroExpandChunk(&data) 885 | return data, nil 886 | }) 887 | if err != nil { 888 | t.Fatal(err) 889 | } 890 | 891 | if !bytes.Equal(data1, fetchedData1.([]byte)[:len(data1)]) { 892 | t.Fatalf("%v != %v", data, fetchedData1) 893 | } 894 | 895 | if !bytes.Equal(data2, fetchedData2.([]byte)[:len(data2)]) { 896 | t.Fatalf("%v != %v", data, fetchedData2) 897 | } 898 | 899 | stat, err = fs.GetStat(stat.Ino) 900 | if err != nil { 901 | t.Fatal(err) 902 | } 903 | 904 | if stat.Size != uint64(len(data)) { 905 | t.Fatalf("unexpected size - %d != %d", stat.Size, 
len(data)) 906 | } 907 | 908 | } 909 | } 910 | 911 | func TestTruncate(t *testing.T) { 912 | t.Parallel() 913 | fs := tmpFs(t) 914 | 915 | testSizes := []uint64{ 916 | CHUNK_SIZE - 1, 917 | CHUNK_SIZE + 1, 918 | CHUNK_SIZE*2 - 1, 919 | CHUNK_SIZE * 2, 920 | CHUNK_SIZE*2 + 1, 921 | } 922 | 923 | for i, n := range testSizes { 924 | name := fmt.Sprintf("d%d", i) 925 | f, stat, err := fs.CreateFile(ROOT_INO, name, CreateFileOpts{ 926 | Mode: 0o777, 927 | Uid: 0, 928 | Gid: 0, 929 | }) 930 | if err != nil { 931 | t.Fatal(err) 932 | } 933 | 934 | data := make([]byte, n, n) 935 | 936 | for j := 0; j < len(data); j += 1 { 937 | data[j] = byte(j % 256) 938 | } 939 | 940 | nWritten := 0 941 | for nWritten != len(data) { 942 | n, err := f.WriteData(data[nWritten:], uint64(nWritten)) 943 | if err != nil { 944 | t.Fatal(err) 945 | } 946 | nWritten += int(n) 947 | } 948 | 949 | _, err = fs.ModStat(stat.Ino, ModStatOpts{ 950 | Valid: MODSTAT_SIZE, 951 | Size: 5, 952 | }) 953 | if err != nil { 954 | t.Fatal(err) 955 | } 956 | 957 | _, err = fs.ReadTransact(func(tx fdb.ReadTransaction) (interface{}, error) { 958 | kvs := tx.GetRange(tuple.Tuple{"hafs", fs.fsName, "ino", stat.Ino, "data"}, fdb.RangeOptions{}).GetSliceOrPanic() 959 | if len(kvs) != 1 { 960 | t.Fatalf("bad number of data chunks: %d", len(kvs)) 961 | } 962 | data := tx.Get(tuple.Tuple{"hafs", fs.fsName, "ino", stat.Ino, "data", 0}).MustGet() 963 | zeroExpandChunk(&data) 964 | if len(data) != CHUNK_SIZE { 965 | t.Fatalf("bad data size: %d", len(data)) 966 | } 967 | if !bytes.Equal(data[:5], []byte{0, 1, 2, 3, 4}) { 968 | t.Fatalf("bad data: %v", data) 969 | } 970 | return nil, nil 971 | }) 972 | if err != nil { 973 | t.Fatal(err) 974 | } 975 | 976 | } 977 | } 978 | 979 | func TestReadWriteData(t *testing.T) { 980 | t.Parallel() 981 | fs := tmpFs(t) 982 | 983 | // Random writes at different offsets to exercise the sparse code paths. 
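// Each iteration mirrors the random writes into a local reference file, then reads the
// hafs file back sequentially with a randomized buffer size and checks that the length
// and contents match the reference copy.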
984 | for i := 0; i < 100; i++ { 985 | f, stat, err := fs.CreateFile(ROOT_INO, "f", CreateFileOpts{ 986 | Mode: 0o777, 987 | }) 988 | if err != nil { 989 | t.Fatal(err) 990 | } 991 | 992 | referenceFile, err := os.CreateTemp("", "") 993 | if err != nil { 994 | t.Fatal(err) 995 | } 996 | size := mathrand.Int()%(CHUNK_SIZE*3) + CHUNK_SIZE/2 997 | nwrites := mathrand.Int() % 5 998 | for i := 0; i < nwrites; i++ { 999 | writeOffset := mathrand.Int() % size 1000 | writeSize := mathrand.Int() % (size - writeOffset) 1001 | writeData := make([]byte, writeSize, writeSize) 1002 | n, err := mathrand.Read(writeData) 1003 | if err != nil || n != len(writeData) { 1004 | t.Fatalf("%s %d", err, n) 1005 | } 1006 | n, err = referenceFile.WriteAt(writeData, int64(writeOffset)) 1007 | if err != nil || n != len(writeData) { 1008 | t.Fatalf("%s %d", err, n) 1009 | } 1010 | nWritten := 0 1011 | for nWritten != len(writeData) { 1012 | n, err := f.WriteData(writeData[nWritten:], uint64(writeOffset)+uint64(nWritten)) 1013 | if err != nil { 1014 | t.Fatal(err) 1015 | } 1016 | nWritten += int(n) 1017 | } 1018 | } 1019 | 1020 | referenceData, err := io.ReadAll(referenceFile) 1021 | if err != nil { 1022 | t.Fatal(err) 1023 | } 1024 | 1025 | stat, err = fs.Lookup(ROOT_INO, "f") 1026 | if err != nil { 1027 | t.Fatal(err) 1028 | } 1029 | 1030 | if stat.Size != uint64(len(referenceData)) { 1031 | t.Fatalf("read lengths differ:\n%v\n!=%v\n", stat.Size, len(referenceData)) 1032 | } 1033 | 1034 | actualData := &bytes.Buffer{} 1035 | nRead := uint64(0) 1036 | readSize := (mathrand.Int() % 2 * CHUNK_SIZE) + 100 1037 | readBuf := make([]byte, readSize, readSize) 1038 | for { 1039 | n, err := f.ReadData(readBuf, nRead) 1040 | nRead += uint64(n) 1041 | _, _ = actualData.Write(readBuf[:n]) 1042 | if err == io.EOF { 1043 | break 1044 | } 1045 | if err != nil { 1046 | t.Fatal(err) 1047 | } 1048 | if nRead > uint64(len(referenceData)) { 1049 | t.Fatalf("file too large - expected %d bytes, but read %d ", len(referenceData), nRead) 1050 | } 1051 | } 1052 | 1053 | if len(referenceData) != actualData.Len() { 1054 | t.Fatalf("read lengths differ:\n%v\n!=%v\n", len(referenceData), actualData.Len()) 1055 | } 1056 | 1057 | if !bytes.Equal(referenceData, actualData.Bytes()) { 1058 | t.Fatalf("read corrupt:\n%v\n!=%v\n", referenceData, actualData.Bytes()) 1059 | } 1060 | 1061 | _ = referenceFile.Close() 1062 | 1063 | err = fs.Unlink(ROOT_INO, "f") 1064 | if err != nil { 1065 | t.Fatal(err) 1066 | } 1067 | } 1068 | 1069 | } 1070 | 1071 | func TestSetLock(t *testing.T) { 1072 | t.Parallel() 1073 | fs := tmpFs(t) 1074 | 1075 | stat, err := fs.Mknod(ROOT_INO, "f", MknodOpts{ 1076 | Mode: S_IFREG | 0o777, 1077 | Uid: 0, 1078 | Gid: 0, 1079 | }) 1080 | 1081 | ok, err := fs.TrySetLock(stat.Ino, SetLockOpts{ 1082 | Typ: LOCK_NONE, 1083 | Owner: 1, 1084 | }) 1085 | if err != nil { 1086 | t.Fatal(err) 1087 | } 1088 | if !ok { 1089 | t.Fatal() 1090 | } 1091 | 1092 | ok, err = fs.TrySetLock(stat.Ino, SetLockOpts{ 1093 | Typ: LOCK_SHARED, 1094 | Owner: 1, 1095 | }) 1096 | if err != nil { 1097 | t.Fatal(err) 1098 | } 1099 | if !ok { 1100 | t.Fatal() 1101 | } 1102 | 1103 | ok, err = fs.TrySetLock(stat.Ino, SetLockOpts{ 1104 | Typ: LOCK_SHARED, 1105 | Owner: 2, 1106 | }) 1107 | if err != nil { 1108 | t.Fatal(err) 1109 | } 1110 | if !ok { 1111 | t.Fatal() 1112 | } 1113 | 1114 | ok, err = fs.TrySetLock(stat.Ino, SetLockOpts{ 1115 | Typ: LOCK_EXCLUSIVE, 1116 | Owner: 3, 1117 | }) 1118 | if err != nil { 1119 | t.Fatal(err) 1120 | } 1121 | if ok { 1122 | t.Fatal() 
1123 | } 1124 | 1125 | ok, err = fs.TrySetLock(stat.Ino, SetLockOpts{ 1126 | Typ: LOCK_NONE, 1127 | Owner: 1, 1128 | }) 1129 | if err != nil { 1130 | t.Fatal(err) 1131 | } 1132 | if !ok { 1133 | t.Fatal() 1134 | } 1135 | 1136 | ok, err = fs.TrySetLock(stat.Ino, SetLockOpts{ 1137 | Typ: LOCK_NONE, 1138 | Owner: 2, 1139 | }) 1140 | if err != nil { 1141 | t.Fatal(err) 1142 | } 1143 | if !ok { 1144 | t.Fatal() 1145 | } 1146 | 1147 | ok, err = fs.TrySetLock(stat.Ino, SetLockOpts{ 1148 | Typ: LOCK_EXCLUSIVE, 1149 | Owner: 3, 1150 | }) 1151 | if err != nil { 1152 | t.Fatal(err) 1153 | } 1154 | if !ok { 1155 | t.Fatal() 1156 | } 1157 | 1158 | ok, err = fs.TrySetLock(stat.Ino, SetLockOpts{ 1159 | Typ: LOCK_EXCLUSIVE, 1160 | Owner: 4, 1161 | }) 1162 | if err != nil { 1163 | t.Fatal(err) 1164 | } 1165 | if ok { 1166 | t.Fatal() 1167 | } 1168 | 1169 | } 1170 | 1171 | func TestWaitForLockWithWatch(t *testing.T) { 1172 | t.Parallel() 1173 | fs := tmpFs(t) 1174 | 1175 | stat, err := fs.Mknod(ROOT_INO, "f", MknodOpts{ 1176 | Mode: S_IFREG | 0o777, 1177 | Uid: 0, 1178 | Gid: 0, 1179 | }) 1180 | 1181 | ok, err := fs.TrySetLock(stat.Ino, SetLockOpts{ 1182 | Typ: LOCK_EXCLUSIVE, 1183 | Owner: 1, 1184 | }) 1185 | if err != nil { 1186 | t.Fatal(err) 1187 | } 1188 | if !ok { 1189 | t.Fatal() 1190 | } 1191 | 1192 | waitStart := time.Now() 1193 | 1194 | go func() { 1195 | time.Sleep(100 * time.Millisecond) 1196 | ok, err := fs.TrySetLock(stat.Ino, SetLockOpts{ 1197 | Typ: LOCK_NONE, 1198 | Owner: 1, 1199 | }) 1200 | if err != nil { 1201 | t.Fatal(err) 1202 | } 1203 | if !ok { 1204 | t.Fatal() 1205 | } 1206 | }() 1207 | 1208 | err = fs.AwaitExclusiveLockRelease(make(chan struct{}, 1), stat.Ino) 1209 | if err != nil { 1210 | t.Fatal(err) 1211 | } 1212 | 1213 | ok, err = fs.TrySetLock(stat.Ino, SetLockOpts{ 1214 | Typ: LOCK_EXCLUSIVE, 1215 | Owner: 1, 1216 | }) 1217 | if err != nil { 1218 | t.Fatal(err) 1219 | } 1220 | if !ok { 1221 | t.Fatal() 1222 | } 1223 | 1224 | if time.Since(waitStart) > 200*time.Millisecond { 1225 | t.Fatal("wait took too long") 1226 | } 1227 | } 1228 | 1229 | func TestWaitForLockWithPoll(t *testing.T) { 1230 | t.Parallel() 1231 | fs := tmpFs(t) 1232 | 1233 | stat, err := fs.Mknod(ROOT_INO, "f", MknodOpts{ 1234 | Mode: S_IFREG | 0o777, 1235 | Uid: 0, 1236 | Gid: 0, 1237 | }) 1238 | 1239 | ok, err := fs.TrySetLock(stat.Ino, SetLockOpts{ 1240 | Typ: LOCK_EXCLUSIVE, 1241 | Owner: 1, 1242 | }) 1243 | if err != nil { 1244 | t.Fatal(err) 1245 | } 1246 | if !ok { 1247 | t.Fatal() 1248 | } 1249 | 1250 | waitStart := time.Now() 1251 | 1252 | go func() { 1253 | time.Sleep(100 * time.Millisecond) 1254 | ok, err := fs.TrySetLock(stat.Ino, SetLockOpts{ 1255 | Typ: LOCK_NONE, 1256 | Owner: 1, 1257 | }) 1258 | if err != nil { 1259 | t.Fatal(err) 1260 | } 1261 | if !ok { 1262 | t.Fatal() 1263 | } 1264 | }() 1265 | 1266 | err = fs.PollAwaitExclusiveLockRelease(make(chan struct{}, 1), stat.Ino) 1267 | if err != nil { 1268 | t.Fatal(err) 1269 | } 1270 | 1271 | ok, err = fs.TrySetLock(stat.Ino, SetLockOpts{ 1272 | Typ: LOCK_EXCLUSIVE, 1273 | Owner: 1, 1274 | }) 1275 | if err != nil { 1276 | t.Fatal(err) 1277 | } 1278 | if !ok { 1279 | t.Fatal() 1280 | } 1281 | 1282 | if time.Since(waitStart) > 1200*time.Millisecond { 1283 | t.Fatal("wait took too long") 1284 | } 1285 | } 1286 | 1287 | func TestInodeAllocation(t *testing.T) { 1288 | t.Parallel() 1289 | fs := tmpFs(t) 1290 | seen := make(map[uint64]struct{}) 1291 | for i := uint64(0); i < _INO_STEP*10; i++ { 1292 | ino, err := fs.nextIno() 1293 | if err != nil { 
1294 | t.Fatal(err) 1295 | } 1296 | //t.Logf("%016x %016x", ino, bits.Reverse64(ino)) 1297 | _, seenBefore := seen[ino] 1298 | if seenBefore { 1299 | t.Fatal("repeated inode") 1300 | } 1301 | seen[ino] = struct{}{} 1302 | } 1303 | } 1304 | 1305 | func TestHardLink(t *testing.T) { 1306 | t.Parallel() 1307 | fs := tmpFs(t) 1308 | 1309 | foo1Stat, err := fs.Mknod(ROOT_INO, "foo1", MknodOpts{ 1310 | Mode: S_IFREG | 0o777, 1311 | Uid: 0, 1312 | Gid: 0, 1313 | }) 1314 | if err != nil { 1315 | t.Fatal(err) 1316 | } 1317 | 1318 | foo2Stat, err := fs.HardLink(ROOT_INO, foo1Stat.Ino, "foo2") 1319 | if err != nil { 1320 | t.Fatal(err) 1321 | } 1322 | 1323 | if foo2Stat.Nlink != 2 { 1324 | t.Fatal(err) 1325 | } 1326 | 1327 | foo1Stat, err = fs.Lookup(ROOT_INO, "foo1") 1328 | if err != nil { 1329 | t.Fatal(err) 1330 | } 1331 | if foo1Stat.Nlink != 2 { 1332 | t.Fatal(err) 1333 | } 1334 | 1335 | foo1Stat, err = fs.Lookup(ROOT_INO, "foo1") 1336 | if err != nil { 1337 | t.Fatal(err) 1338 | } 1339 | if foo1Stat.Nlink != 2 { 1340 | t.Fatal(err) 1341 | } 1342 | 1343 | if foo1Stat.Ino != foo2Stat.Ino { 1344 | t.Fatal("inos differ") 1345 | } 1346 | 1347 | err = fs.Unlink(ROOT_INO, "foo2") 1348 | if err != nil { 1349 | t.Fatal(err) 1350 | } 1351 | 1352 | foo1Stat, err = fs.Lookup(ROOT_INO, "foo1") 1353 | if err != nil { 1354 | t.Fatal(err) 1355 | } 1356 | if foo1Stat.Nlink != 1 { 1357 | t.Fatal(err) 1358 | } 1359 | 1360 | nRemoved, err := fs.RemoveExpiredUnlinked(RemoveExpiredUnlinkedOptions{ 1361 | RemovalDelay: 0, 1362 | }) 1363 | if err != nil { 1364 | t.Fatal(err) 1365 | } 1366 | if nRemoved != 0 { 1367 | t.Fatal("unexpected remove count") 1368 | } 1369 | 1370 | err = fs.Unlink(ROOT_INO, "foo1") 1371 | if err != nil { 1372 | t.Fatal(err) 1373 | } 1374 | 1375 | nRemoved, err = fs.RemoveExpiredUnlinked(RemoveExpiredUnlinkedOptions{ 1376 | RemovalDelay: 0, 1377 | }) 1378 | if err != nil { 1379 | t.Fatal(err) 1380 | } 1381 | if nRemoved != 1 { 1382 | t.Fatal("unexpected remove count") 1383 | } 1384 | 1385 | } 1386 | 1387 | func TestHardLinkDirFails(t *testing.T) { 1388 | t.Parallel() 1389 | fs := tmpFs(t) 1390 | 1391 | foo1Stat, err := fs.Mknod(ROOT_INO, "foo1", MknodOpts{ 1392 | Mode: S_IFDIR | 0o777, 1393 | Uid: 0, 1394 | Gid: 0, 1395 | }) 1396 | if err != nil { 1397 | t.Fatal(err) 1398 | } 1399 | 1400 | _, err = fs.HardLink(ROOT_INO, foo1Stat.Ino, "foo2") 1401 | if err != ErrPermission { 1402 | t.Fatal(err) 1403 | } 1404 | 1405 | } 1406 | 1407 | func TestClientSelfEvictExclusiveLock(t *testing.T) { 1408 | t.Parallel() 1409 | db := tmpDB(t) 1410 | fs1, err := Attach(db, "testfs", AttachOpts{}) 1411 | if err != nil { 1412 | t.Fatal(err) 1413 | } 1414 | defer fs1.Close() 1415 | 1416 | fs2, err := Attach(db, "testfs", AttachOpts{}) 1417 | if err != nil { 1418 | t.Fatal(err) 1419 | } 1420 | defer fs2.Close() 1421 | 1422 | stat, err := fs1.Mknod(ROOT_INO, "f", MknodOpts{ 1423 | Mode: S_IFREG | 0o777, 1424 | Uid: 0, 1425 | Gid: 0, 1426 | }) 1427 | 1428 | ok, err := fs1.TrySetLock(stat.Ino, SetLockOpts{ 1429 | Typ: LOCK_EXCLUSIVE, 1430 | Owner: 1, 1431 | }) 1432 | if err != nil { 1433 | t.Fatal(err) 1434 | } 1435 | if !ok { 1436 | t.Fatal() 1437 | } 1438 | 1439 | err = fs1.Close() 1440 | if err != nil { 1441 | t.Fatal(err) 1442 | } 1443 | 1444 | ok, err = fs2.TrySetLock(stat.Ino, SetLockOpts{ 1445 | Typ: LOCK_EXCLUSIVE, 1446 | Owner: 1, 1447 | }) 1448 | if err != nil { 1449 | t.Fatal(err) 1450 | } 1451 | if !ok { 1452 | t.Fatal() 1453 | } 1454 | 1455 | } 1456 | 1457 | func TestClientSelfEvictSharedLock(t *testing.T) 
{ 1458 | t.Parallel() 1459 | db := tmpDB(t) 1460 | fs1, err := Attach(db, "testfs", AttachOpts{}) 1461 | if err != nil { 1462 | t.Fatal(err) 1463 | } 1464 | defer fs1.Close() 1465 | 1466 | fs2, err := Attach(db, "testfs", AttachOpts{}) 1467 | if err != nil { 1468 | t.Fatal(err) 1469 | } 1470 | defer fs2.Close() 1471 | 1472 | stat, err := fs1.Mknod(ROOT_INO, "f", MknodOpts{ 1473 | Mode: S_IFREG | 0o777, 1474 | Uid: 0, 1475 | Gid: 0, 1476 | }) 1477 | 1478 | ok, err := fs1.TrySetLock(stat.Ino, SetLockOpts{ 1479 | Typ: LOCK_SHARED, 1480 | Owner: 1, 1481 | }) 1482 | if err != nil { 1483 | t.Fatal(err) 1484 | } 1485 | if !ok { 1486 | t.Fatal() 1487 | } 1488 | 1489 | err = fs1.Close() 1490 | if err != nil { 1491 | t.Fatal(err) 1492 | } 1493 | 1494 | ok, err = fs2.TrySetLock(stat.Ino, SetLockOpts{ 1495 | Typ: LOCK_EXCLUSIVE, 1496 | Owner: 1, 1497 | }) 1498 | if err != nil { 1499 | t.Fatal(err) 1500 | } 1501 | if !ok { 1502 | t.Fatal() 1503 | } 1504 | 1505 | } 1506 | 1507 | func TestSubvolume(t *testing.T) { 1508 | t.Parallel() 1509 | fs := tmpFs(t) 1510 | 1511 | dir, err := fs.Mknod(ROOT_INO, "d", MknodOpts{ 1512 | Mode: S_IFDIR | 0o777, 1513 | Uid: 0, 1514 | Gid: 0, 1515 | }) 1516 | if err != nil { 1517 | t.Fatal(err) 1518 | } 1519 | 1520 | f, err := fs.Mknod(dir.Ino, "f", MknodOpts{ 1521 | Mode: S_IFREG | 0o777, 1522 | Uid: 0, 1523 | Gid: 0, 1524 | }) 1525 | if err != nil { 1526 | t.Fatal(err) 1527 | } 1528 | if f.Subvolume != ROOT_INO { 1529 | t.Fatal("unexpected subvolume") 1530 | } 1531 | 1532 | err = fs.SetXAttr(dir.Ino, "hafs.subvolume", []byte("true")) 1533 | if err != ErrInvalid { 1534 | t.Fatal(err) 1535 | } 1536 | 1537 | err = fs.Unlink(dir.Ino, "f") 1538 | if err != nil { 1539 | t.Fatal(err) 1540 | } 1541 | 1542 | err = fs.SetXAttr(dir.Ino, "hafs.subvolume", []byte("true")) 1543 | if err != nil { 1544 | t.Fatal(err) 1545 | } 1546 | 1547 | dir, err = fs.GetStat(dir.Ino) 1548 | if err != nil { 1549 | t.Fatal(err) 1550 | } 1551 | 1552 | if dir.Flags&FLAG_SUBVOLUME != FLAG_SUBVOLUME { 1553 | t.Fatal("expected subvolume flag to be set") 1554 | } 1555 | 1556 | f, err = fs.Mknod(dir.Ino, "f", MknodOpts{ 1557 | Mode: S_IFREG | 0o777, 1558 | Uid: 0, 1559 | Gid: 0, 1560 | }) 1561 | if err != nil { 1562 | t.Fatal(err) 1563 | } 1564 | if f.Subvolume != dir.Ino { 1565 | t.Fatal("unexpected subvolume") 1566 | } 1567 | 1568 | err = fs.RemoveXAttr(dir.Ino, "hafs.subvolume") 1569 | if err == nil { 1570 | t.Fatal(err) 1571 | } 1572 | 1573 | err = fs.RemoveXAttr(dir.Ino, "hafs.subvolume") 1574 | if err != ErrInvalid { 1575 | t.Fatal(err) 1576 | } 1577 | 1578 | err = fs.Unlink(dir.Ino, "f") 1579 | if err != nil { 1580 | t.Fatal(err) 1581 | } 1582 | 1583 | err = fs.RemoveXAttr(dir.Ino, "hafs.subvolume") 1584 | if err != nil { 1585 | t.Fatal(err) 1586 | } 1587 | 1588 | dir, err = fs.GetStat(dir.Ino) 1589 | if err != nil { 1590 | t.Fatal(err) 1591 | } 1592 | if dir.Flags&FLAG_SUBVOLUME != 0 { 1593 | t.Fatal("expected subvolume flag to be unset") 1594 | } 1595 | 1596 | } 1597 | 1598 | func TestSubvolumeNoCrossRenames(t *testing.T) { 1599 | t.Parallel() 1600 | fs := tmpFs(t) 1601 | 1602 | dir, err := fs.Mknod(ROOT_INO, "d", MknodOpts{ 1603 | Mode: S_IFDIR | 0o777, 1604 | Uid: 0, 1605 | Gid: 0, 1606 | }) 1607 | if err != nil { 1608 | t.Fatal(err) 1609 | } 1610 | 1611 | err = fs.SetXAttr(dir.Ino, "hafs.subvolume", []byte("true")) 1612 | if err != nil { 1613 | t.Fatal(err) 1614 | } 1615 | 1616 | _, err = fs.Mknod(dir.Ino, "f", MknodOpts{ 1617 | Mode: S_IFREG | 0o777, 1618 | Uid: 0, 1619 | Gid: 0, 1620 | }) 1621 | 
if err != nil { 1622 | t.Fatal(err) 1623 | } 1624 | 1625 | err = fs.Rename(dir.Ino, ROOT_INO, "f", "f") 1626 | if err != ErrInvalid { 1627 | t.Fatal(err) 1628 | } 1629 | 1630 | } 1631 | 1632 | func TestSubvolumeNoCrossHardlinks(t *testing.T) { 1633 | t.Parallel() 1634 | fs := tmpFs(t) 1635 | 1636 | dir, err := fs.Mknod(ROOT_INO, "d", MknodOpts{ 1637 | Mode: S_IFDIR | 0o777, 1638 | Uid: 0, 1639 | Gid: 0, 1640 | }) 1641 | if err != nil { 1642 | t.Fatal(err) 1643 | } 1644 | 1645 | err = fs.SetXAttr(dir.Ino, "hafs.subvolume", []byte("true")) 1646 | if err != nil { 1647 | t.Fatal(err) 1648 | } 1649 | 1650 | f, err := fs.Mknod(dir.Ino, "f", MknodOpts{ 1651 | Mode: S_IFREG | 0o777, 1652 | Uid: 0, 1653 | Gid: 0, 1654 | }) 1655 | if err != nil { 1656 | t.Fatal(err) 1657 | } 1658 | 1659 | _, err = fs.HardLink(ROOT_INO, f.Ino, "f") 1660 | if err != ErrInvalid { 1661 | t.Fatal(err) 1662 | } 1663 | 1664 | } 1665 | 1666 | func TestSubvolumeByteAccounting(t *testing.T) { 1667 | t.Parallel() 1668 | fs := tmpFs(t) 1669 | 1670 | b, err := fs.SubvolumeByteCount(ROOT_INO) 1671 | if err != nil { 1672 | t.Fatal(err) 1673 | } 1674 | if b != 0 { 1675 | t.Fatal(b) 1676 | } 1677 | 1678 | f, fstat, err := fs.CreateFile(ROOT_INO, "f", CreateFileOpts{ 1679 | Mode: 0o777, 1680 | }) 1681 | if err != nil { 1682 | t.Fatal(err) 1683 | } 1684 | 1685 | _, err = f.WriteData([]byte{1, 2, 3, 4, 5}, 0) 1686 | if err != nil { 1687 | t.Fatal(err) 1688 | } 1689 | 1690 | err = f.Close() 1691 | if err != nil { 1692 | t.Fatal(err) 1693 | } 1694 | 1695 | b, err = fs.SubvolumeByteCount(ROOT_INO) 1696 | if err != nil { 1697 | t.Fatal(err) 1698 | } 1699 | if b != 5 { 1700 | t.Fatal(b) 1701 | } 1702 | 1703 | usageXattr, err := fs.GetXAttr(ROOT_INO, "hafs.total-bytes") 1704 | if err != nil { 1705 | t.Fatal(err) 1706 | } 1707 | if string(usageXattr) != "5" { 1708 | t.Fatalf("unexpected usage xattr: %q", string(usageXattr)) 1709 | } 1710 | 1711 | fs.ModStat(fstat.Ino, ModStatOpts{ 1712 | Valid: MODSTAT_SIZE, 1713 | Size: 3, 1714 | }) 1715 | 1716 | b, err = fs.SubvolumeByteCount(ROOT_INO) 1717 | if err != nil { 1718 | t.Fatal(err) 1719 | } 1720 | if b != 3 { 1721 | t.Fatal(b) 1722 | } 1723 | 1724 | err = fs.Unlink(ROOT_INO, "f") 1725 | if err != nil { 1726 | t.Fatal(err) 1727 | } 1728 | 1729 | b, err = fs.SubvolumeByteCount(ROOT_INO) 1730 | if err != nil { 1731 | t.Fatal(err) 1732 | } 1733 | if b != 0 { 1734 | t.Fatal(b) 1735 | } 1736 | 1737 | } 1738 | -------------------------------------------------------------------------------- /fuse.go: -------------------------------------------------------------------------------- 1 | package hafs 2 | 3 | import ( 4 | "errors" 5 | "io" 6 | iofs "io/fs" 7 | "log" 8 | mathrand "math/rand" 9 | "os" 10 | "sync" 11 | "sync/atomic" 12 | "time" 13 | 14 | "github.com/hanwen/go-fuse/v2/fuse" 15 | "golang.org/x/sys/unix" 16 | ) 17 | 18 | type openFile struct { 19 | maybeHasPosixLock atomicBool 20 | di *DirIter 21 | f HafsFile 22 | } 23 | 24 | type HafsFuseOptions struct { 25 | CacheDentries time.Duration 26 | CacheAttributes time.Duration 27 | Logf func(string, ...interface{}) 28 | } 29 | 30 | type FuseFs struct { 31 | server *fuse.Server 32 | 33 | cacheDentries time.Duration 34 | cacheAttributes time.Duration 35 | logf func(string, ...interface{}) 36 | 37 | fs *Fs 38 | 39 | fileHandleCounter uint64 40 | 41 | fh2OpenFile sync.Map 42 | } 43 | 44 | func NewFuseFs(fs *Fs, opts HafsFuseOptions) *FuseFs { 45 | 46 | if opts.Logf == nil { 47 | opts.Logf = log.Printf 48 | } 49 | 50 | return &FuseFs{ 51 | cacheDentries: 
opts.CacheDentries, 52 | cacheAttributes: opts.CacheAttributes, 53 | logf: opts.Logf, 54 | fs: fs, 55 | fh2OpenFile: sync.Map{}, 56 | } 57 | } 58 | 59 | func (fs *FuseFs) newFileHandle(f *openFile) uint64 { 60 | fh := atomic.AddUint64(&fs.fileHandleCounter, 1) 61 | fs.fh2OpenFile.Store(fh, f) 62 | return fh 63 | } 64 | 65 | func (fs *FuseFs) getFileFromHandle(handle uint64) (*openFile, bool) { 66 | f, ok := fs.fh2OpenFile.Load(handle) 67 | if !ok { 68 | return nil, false 69 | } 70 | return f.(*openFile), true 71 | } 72 | 73 | func (fs *FuseFs) releaseFileHandle(handle uint64) (*openFile, bool) { 74 | f, ok := fs.fh2OpenFile.LoadAndDelete(handle) 75 | if !ok { 76 | return nil, false 77 | } 78 | return f.(*openFile), true 79 | } 80 | 81 | func (fs *FuseFs) errToFuseStatus(err error) fuse.Status { 82 | if err == nil { 83 | return fuse.OK 84 | } 85 | 86 | if errno, ok := err.(unix.Errno); ok { 87 | return fuse.Status(errno) 88 | } 89 | 90 | if errors.Is(err, iofs.ErrNotExist) { 91 | return fuse.Status(unix.ENOENT) 92 | } else if errors.Is(err, iofs.ErrPermission) { 93 | return fuse.Status(unix.EPERM) 94 | } else if errors.Is(err, iofs.ErrExist) { 95 | return fuse.Status(unix.EEXIST) 96 | } else if errors.Is(err, iofs.ErrInvalid) { 97 | return fuse.Status(unix.EINVAL) 98 | } 99 | 100 | // Log all io errors that don't have a clear cause. 101 | fs.logf("io error: %s", err) 102 | return fuse.Status(fuse.EIO) 103 | } 104 | 105 | func (fs *FuseFs) fillFuseAttrFromStat(stat *Stat, out *fuse.Attr) { 106 | out.Ino = stat.Ino 107 | out.Size = stat.Size 108 | out.Blocks = stat.Size / 512 109 | out.Blksize = CHUNK_SIZE 110 | out.Atime = stat.Atimesec 111 | out.Atimensec = stat.Atimensec 112 | out.Mtime = stat.Mtimesec 113 | out.Mtimensec = stat.Mtimensec 114 | out.Ctime = stat.Ctimesec 115 | out.Ctimensec = stat.Ctimensec 116 | out.Mode = stat.Mode 117 | out.Nlink = stat.Nlink 118 | out.Owner.Uid = stat.Uid 119 | out.Owner.Gid = stat.Gid 120 | out.Rdev = stat.Rdev 121 | } 122 | 123 | func (fs *FuseFs) fillFuseAttrOutFromStat(stat *Stat, out *fuse.AttrOut) { 124 | fs.fillFuseAttrFromStat(stat, &out.Attr) 125 | out.AttrValid = uint64(fs.cacheAttributes.Nanoseconds() / 1_000_000_000) 126 | out.AttrValidNsec = uint32(uint64(fs.cacheAttributes.Nanoseconds()) - out.AttrValid*1_000_000_000) 127 | } 128 | 129 | func (fs *FuseFs) fillFuseEntryOutFromStat(stat *Stat, out *fuse.EntryOut) { 130 | out.Generation = 0 131 | out.NodeId = stat.Ino 132 | fs.fillFuseAttrFromStat(stat, &out.Attr) 133 | out.AttrValid = uint64(fs.cacheAttributes.Nanoseconds() / 1_000_000_000) 134 | out.AttrValidNsec = uint32(uint64(fs.cacheAttributes.Nanoseconds()) - out.AttrValid*1_000_000_000) 135 | 136 | out.EntryValid = uint64(fs.cacheDentries.Nanoseconds() / 1_000_000_000) 137 | out.EntryValidNsec = uint32(uint64(fs.cacheDentries.Nanoseconds()) - out.EntryValid*1_000_000_000) 138 | 139 | } 140 | 141 | func (fs *FuseFs) Init(server *fuse.Server) { 142 | fs.server = server 143 | } 144 | 145 | func (fs *FuseFs) Lookup(cancel <-chan struct{}, header *fuse.InHeader, name string, out *fuse.EntryOut) fuse.Status { 146 | stat, err := fs.fs.Lookup(header.NodeId, name) 147 | if err != nil { 148 | return fs.errToFuseStatus(err) 149 | } 150 | fs.fillFuseEntryOutFromStat(&stat, out) 151 | return fuse.OK 152 | } 153 | 154 | func (fs *FuseFs) Forget(nodeId, nlookup uint64) { 155 | 156 | } 157 | 158 | func (fs *FuseFs) GetAttr(cancel <-chan struct{}, in *fuse.GetAttrIn, out *fuse.AttrOut) fuse.Status { 159 | stat, err := fs.fs.GetStat(in.NodeId) 160 | if 
err != nil { 161 | return fs.errToFuseStatus(err) 162 | } 163 | fs.fillFuseAttrOutFromStat(&stat, out) 164 | return fuse.OK 165 | } 166 | 167 | func (fs *FuseFs) SetAttr(cancel <-chan struct{}, in *fuse.SetAttrIn, out *fuse.AttrOut) fuse.Status { 168 | 169 | modStat := ModStatOpts{} 170 | 171 | if mtime, ok := in.GetMTime(); ok { 172 | modStat.SetMtime(mtime) 173 | } 174 | if atime, ok := in.GetATime(); ok { 175 | modStat.SetAtime(atime) 176 | } 177 | if ctime, ok := in.GetCTime(); ok { 178 | modStat.SetCtime(ctime) 179 | } 180 | 181 | if size, ok := in.GetSize(); ok { 182 | modStat.Valid |= MODSTAT_SIZE 183 | modStat.SetSize(size) 184 | } 185 | 186 | if mode, ok := in.GetMode(); ok { 187 | modStat.SetMode(mode) 188 | } 189 | 190 | if uid, ok := in.GetUID(); ok { 191 | modStat.SetUid(uid) 192 | } 193 | 194 | if gid, ok := in.GetGID(); ok { 195 | modStat.SetGid(gid) 196 | } 197 | 198 | stat, err := fs.fs.ModStat(in.NodeId, modStat) 199 | if err != nil { 200 | return fs.errToFuseStatus(err) 201 | } 202 | 203 | fs.fillFuseAttrOutFromStat(&stat, out) 204 | return fuse.OK 205 | } 206 | 207 | func (fs *FuseFs) Open(cancel <-chan struct{}, in *fuse.OpenIn, out *fuse.OpenOut) fuse.Status { 208 | f, _, err := fs.fs.OpenFile(in.NodeId, OpenFileOpts{ 209 | Truncate: in.Flags&unix.O_TRUNC != 0, 210 | }) 211 | if err != nil { 212 | return fs.errToFuseStatus(err) 213 | } 214 | 215 | switch f.(type) { 216 | case *objectStoreSmallReadOnlyFile: 217 | // For now only the small read only files support readahead, 218 | // and are immutable plus cacheable. The streaming files 219 | // can't do this because the kernel readahead makes the 220 | // read offset bounce around. 221 | out.OpenFlags |= fuse.FOPEN_KEEP_CACHE 222 | default: 223 | out.OpenFlags |= fuse.FOPEN_DIRECT_IO 224 | } 225 | 226 | out.Fh = fs.newFileHandle(&openFile{f: f}) 227 | 228 | return fuse.OK 229 | } 230 | 231 | func (fs *FuseFs) Create(cancel <-chan struct{}, in *fuse.CreateIn, name string, out *fuse.CreateOut) fuse.Status { 232 | f, stat, err := fs.fs.CreateFile(in.NodeId, name, CreateFileOpts{ 233 | Truncate: in.Flags&unix.O_TRUNC != 0, 234 | Mode: in.Mode, 235 | Uid: in.Owner.Uid, 236 | Gid: in.Owner.Gid, 237 | }) 238 | if err != nil { 239 | return fs.errToFuseStatus(err) 240 | } 241 | 242 | out.OpenFlags |= fuse.FOPEN_DIRECT_IO 243 | 244 | fs.fillFuseEntryOutFromStat(&stat, &out.EntryOut) 245 | 246 | out.Fh = fs.newFileHandle(&openFile{f: f}) 247 | 248 | return fuse.OK 249 | } 250 | 251 | func (fs *FuseFs) Rename(cancel <-chan struct{}, in *fuse.RenameIn, fromName string, toName string) fuse.Status { 252 | fromDir := in.NodeId 253 | toDir := in.Newdir 254 | err := fs.fs.Rename(fromDir, toDir, fromName, toName) 255 | if err != nil { 256 | return fs.errToFuseStatus(err) 257 | } 258 | return fuse.OK 259 | } 260 | 261 | func (fs *FuseFs) Read(cancel <-chan struct{}, in *fuse.ReadIn, buf []byte) (fuse.ReadResult, fuse.Status) { 262 | 263 | f, ok := fs.getFileFromHandle(in.Fh) 264 | if !ok { 265 | return nil, fuse.EBADF 266 | } 267 | 268 | nTotal := uint32(0) 269 | for nTotal != uint32(len(buf)) { 270 | n, err := f.f.ReadData(buf[nTotal:], uint64(in.Offset)+uint64(nTotal)) 271 | nTotal += n 272 | if err == io.EOF { 273 | break 274 | } 275 | if err != nil { 276 | return nil, fs.errToFuseStatus(err) 277 | } 278 | } 279 | 280 | return fuse.ReadResultData(buf[:nTotal]), fuse.OK 281 | } 282 | 283 | func (fs *FuseFs) Write(cancel <-chan struct{}, in *fuse.WriteIn, buf []byte) (uint32, fuse.Status) { 284 | 285 | nTotal := uint32(0) 286 | 287 | f, 
ok := fs.getFileFromHandle(in.Fh) 288 | if !ok { 289 | return nTotal, fuse.EBADF 290 | } 291 | 292 | for nTotal != uint32(len(buf)) { 293 | n, err := f.f.WriteData(buf[nTotal:], uint64(in.Offset)+uint64(nTotal)) 294 | nTotal += uint32(n) 295 | if err != nil { 296 | return nTotal, fs.errToFuseStatus(err) 297 | } 298 | } 299 | 300 | return nTotal, fuse.OK 301 | } 302 | 303 | func (fs *FuseFs) Lseek(cancel <-chan struct{}, in *fuse.LseekIn, out *fuse.LseekOut) fuse.Status { 304 | // XXX We do support sparse files, so this could be implemented. 305 | // It's worth noting that it seems like fuse only uses Lseek for SEEK_DATA and SEEK_HOLE but 306 | // we could be wrong on that. 307 | return fuse.ENOSYS 308 | } 309 | 310 | func (fs *FuseFs) Fsync(cancel <-chan struct{}, in *fuse.FsyncIn) fuse.Status { 311 | f, ok := fs.getFileFromHandle(in.Fh) 312 | if !ok { 313 | return fuse.EBADF 314 | } 315 | 316 | err := f.f.Fsync() 317 | if err != nil { 318 | return fs.errToFuseStatus(err) 319 | } 320 | return fuse.OK 321 | } 322 | 323 | func (fs *FuseFs) Flush(cancel <-chan struct{}, in *fuse.FlushIn) fuse.Status { 324 | f, ok := fs.getFileFromHandle(in.Fh) 325 | if !ok { 326 | return fuse.EBADF 327 | } 328 | 329 | fsyncErr := f.f.Fsync() 330 | 331 | if f.maybeHasPosixLock.Load() { 332 | // Note, this behavior intentionally violates posix lock semantics: 333 | // 334 | // Normally posix locks are associated with a process id, so closing any file descriptor 335 | // for a file releases that process's locks on it, but here we only release the lock 336 | // if the file handle that created the lock is closed. 337 | // 338 | // We *could* implement the full semantics, but at increased complexity, reduced 339 | // performance, and mainly to support potentially questionable use cases. For now 340 | // we will instead document our semantics and keep it simple. 341 | fs.releaseLocks(in.NodeId, in.LockOwner) 342 | } 343 | return fs.errToFuseStatus(fsyncErr) 344 | } 345 | 346 | func (fs *FuseFs) Release(cancel <-chan struct{}, in *fuse.ReleaseIn) { 347 | 348 | f, ok := fs.releaseFileHandle(in.Fh) 349 | if ok && f.f != nil { 350 | _ = f.f.Close() 351 | } 352 | 353 | const FUSE_RELEASE_FLOCK_UNLOCK = (1 << 1) // XXX remove once constant is in upstream go-fuse.
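// The kernel sets this release flag when flock locks held through this file
// handle should be dropped as part of close.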
354 | if in.ReleaseFlags&FUSE_RELEASE_FLOCK_UNLOCK != 0 { 355 | fs.releaseLocks(in.NodeId, in.LockOwner) 356 | } 357 | } 358 | 359 | func (fs *FuseFs) Unlink(cancel <-chan struct{}, in *fuse.InHeader, name string) fuse.Status { 360 | err := fs.fs.Unlink(in.NodeId, name) 361 | return fs.errToFuseStatus(err) 362 | } 363 | 364 | func (fs *FuseFs) Rmdir(cancel <-chan struct{}, in *fuse.InHeader, name string) fuse.Status { 365 | err := fs.fs.Unlink(in.NodeId, name) 366 | return fs.errToFuseStatus(err) 367 | } 368 | 369 | func (fs *FuseFs) Link(cancel <-chan struct{}, in *fuse.LinkIn, name string, out *fuse.EntryOut) fuse.Status { 370 | stat, err := fs.fs.HardLink(in.NodeId, in.Oldnodeid, name) 371 | if err != nil { 372 | return fs.errToFuseStatus(err) 373 | } 374 | fs.fillFuseEntryOutFromStat(&stat, out) 375 | return fuse.OK 376 | } 377 | 378 | func (fs *FuseFs) Symlink(cancel <-chan struct{}, in *fuse.InHeader, pointedTo string, linkName string, out *fuse.EntryOut) fuse.Status { 379 | stat, err := fs.fs.Mknod(in.NodeId, linkName, MknodOpts{ 380 | Mode: S_IFLNK | 0o777, 381 | Uid: in.Owner.Uid, 382 | Gid: in.Owner.Gid, 383 | LinkTarget: []byte(pointedTo), 384 | }) 385 | if err != nil { 386 | return fs.errToFuseStatus(err) 387 | } 388 | fs.fillFuseEntryOutFromStat(&stat, out) 389 | return fuse.OK 390 | } 391 | 392 | func (fs *FuseFs) Readlink(cancel <-chan struct{}, in *fuse.InHeader) ([]byte, fuse.Status) { 393 | l, err := fs.fs.ReadSymlink(in.NodeId) 394 | if err != nil { 395 | return nil, fs.errToFuseStatus(err) 396 | } 397 | return l, fuse.OK 398 | } 399 | 400 | func (fs *FuseFs) Mkdir(cancel <-chan struct{}, in *fuse.MkdirIn, name string, out *fuse.EntryOut) fuse.Status { 401 | stat, err := fs.fs.Mknod(in.NodeId, name, MknodOpts{ 402 | Mode: (^S_IFMT & in.Mode) | S_IFDIR, 403 | Uid: in.Owner.Uid, 404 | Gid: in.Owner.Gid, 405 | }) 406 | if err != nil { 407 | return fs.errToFuseStatus(err) 408 | } 409 | fs.fillFuseEntryOutFromStat(&stat, out) 410 | return fuse.OK 411 | } 412 | 413 | func (fs *FuseFs) OpenDir(cancel <-chan struct{}, in *fuse.OpenIn, out *fuse.OpenOut) fuse.Status { 414 | dirIter, err := fs.fs.IterDirEnts(in.NodeId) 415 | if err != nil { 416 | return fs.errToFuseStatus(err) 417 | } 418 | 419 | out.OpenFlags |= fuse.FOPEN_DIRECT_IO 420 | 421 | out.Fh = fs.newFileHandle(&openFile{ 422 | di: dirIter, 423 | f: &invalidFile{}, 424 | }) 425 | 426 | return fuse.OK 427 | } 428 | 429 | func (fs *FuseFs) ReadDir(cancel <-chan struct{}, in *fuse.ReadIn, out *fuse.DirEntryList) fuse.Status { 430 | d, ok := fs.getFileFromHandle(in.Fh) 431 | if !ok { 432 | return fuse.EBADF 433 | } 434 | 435 | if d.di == nil { 436 | return fuse.Status(unix.EBADF) 437 | } 438 | 439 | // XXX TODO verify offset is correct. 440 | for { 441 | ent, err := d.di.Next() 442 | if err != nil { 443 | if errors.Is(err, io.EOF) { 444 | break 445 | } 446 | return fs.errToFuseStatus(err) 447 | } 448 | fuseDirEnt := fuse.DirEntry{ 449 | Name: ent.Name, 450 | Mode: ent.Mode, 451 | Ino: ent.Ino, 452 | } 453 | if !out.AddDirEntry(fuseDirEnt) { 454 | d.di.Unget(ent) 455 | break 456 | } 457 | } 458 | return fuse.OK 459 | } 460 | 461 | func (fs *FuseFs) ReadDirPlus(cancel <-chan struct{}, in *fuse.ReadIn, out *fuse.DirEntryList) fuse.Status { 462 | d, ok := fs.getFileFromHandle(in.Fh) 463 | if !ok { 464 | return fuse.EBADF 465 | } 466 | 467 | if d.di == nil { 468 | return fuse.Status(unix.EBADF) 469 | } 470 | 471 | // XXX TODO verify offset is correct. 
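// Fill the kernel supplied buffer with entries plus their stats; once it is full,
// push the last entry back onto the iterator so the next ReadDirPlus call resumes
// from it.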
472 | for { 473 | ent, stat, err := d.di.NextPlus() 474 | if err != nil { 475 | if errors.Is(err, io.EOF) { 476 | break 477 | } 478 | return fs.errToFuseStatus(err) 479 | } 480 | fuseDirEnt := fuse.DirEntry{ 481 | Name: ent.Name, 482 | Mode: ent.Mode, 483 | Ino: ent.Ino, 484 | } 485 | entryOut := out.AddDirLookupEntry(fuseDirEnt) 486 | if entryOut != nil { 487 | fs.fillFuseEntryOutFromStat(&stat, entryOut) 488 | } else { 489 | d.di.UngetPlus(ent, stat) 490 | break 491 | } 492 | } 493 | return fuse.OK 494 | } 495 | 496 | func (fs *FuseFs) FsyncDir(cancel <-chan struct{}, in *fuse.FsyncIn) fuse.Status { 497 | return fuse.OK 498 | } 499 | 500 | func (fs *FuseFs) ReleaseDir(in *fuse.ReleaseIn) { 501 | f, ok := fs.releaseFileHandle(in.Fh) 502 | if ok && f.f != nil { 503 | _ = f.f.Close() 504 | } 505 | } 506 | 507 | func (fs *FuseFs) GetXAttr(cancel <-chan struct{}, in *fuse.InHeader, attr string, dest []byte) (uint32, fuse.Status) { 508 | x, err := fs.fs.GetXAttr(in.NodeId, attr) 509 | if err != nil { 510 | return 0, fs.errToFuseStatus(err) 511 | } 512 | if len(dest) < len(x) { 513 | return uint32(len(x)), fuse.ERANGE 514 | } 515 | copy(dest, x) 516 | return uint32(len(x)), fuse.OK 517 | } 518 | 519 | func (fs *FuseFs) ListXAttr(cancel <-chan struct{}, in *fuse.InHeader, dest []byte) (uint32, fuse.Status) { 520 | xattrs, err := fs.fs.ListXAttr(in.NodeId) 521 | if err != nil { 522 | return 0, fs.errToFuseStatus(err) 523 | } 524 | 525 | nNeeded := uint32(0) 526 | for _, x := range xattrs { 527 | nNeeded += uint32(len(x)) + 1 528 | } 529 | if uint32(len(dest)) < nNeeded { 530 | return nNeeded, fuse.ERANGE 531 | } 532 | 533 | for _, x := range xattrs { 534 | copy(dest[:len(x)], x) 535 | dest[len(x)] = 0 536 | dest = dest[len(x)+1:] 537 | } 538 | 539 | return nNeeded, fuse.OK 540 | } 541 | 542 | func (fs *FuseFs) SetXAttr(cancel <-chan struct{}, in *fuse.SetXAttrIn, attr string, data []byte) fuse.Status { 543 | err := fs.fs.SetXAttr(in.NodeId, attr, data) 544 | return fs.errToFuseStatus(err) 545 | } 546 | 547 | func (fs *FuseFs) RemoveXAttr(cancel <-chan struct{}, in *fuse.InHeader, attr string) fuse.Status { 548 | err := fs.fs.RemoveXAttr(in.NodeId, attr) 549 | return fs.errToFuseStatus(err) 550 | } 551 | 552 | func (fs *FuseFs) releaseLocks(ino uint64, lockOwner uint64) { 553 | for { 554 | _, err := fs.fs.TrySetLock(ino, SetLockOpts{ 555 | Typ: LOCK_NONE, 556 | Owner: lockOwner, 557 | }) 558 | if err == nil { 559 | break 560 | } 561 | fs.logf("unable to release lock ino=%d owner=%d: %s", ino, lockOwner, err) 562 | time.Sleep(1 * time.Second) 563 | } 564 | } 565 | 566 | func (fs *FuseFs) GetLk(cancel <-chan struct{}, in *fuse.LkIn, out *fuse.LkOut) fuse.Status { 567 | return fuse.ENOSYS 568 | } 569 | 570 | func (fs *FuseFs) SetLk(cancel <-chan struct{}, in *fuse.LkIn) fuse.Status { 571 | f, ok := fs.getFileFromHandle(in.Fh) 572 | if !ok { 573 | return fuse.EBADF 574 | } 575 | 576 | var lockType LockType 577 | 578 | switch in.Lk.Typ { 579 | case unix.F_RDLCK: 580 | lockType = LOCK_SHARED 581 | case unix.F_WRLCK: 582 | lockType = LOCK_EXCLUSIVE 583 | case unix.F_UNLCK: 584 | lockType = LOCK_NONE 585 | default: 586 | return fuse.ENOTSUP 587 | } 588 | 589 | if in.Lk.Start != 0 { 590 | return fuse.ENOTSUP 591 | } 592 | if in.Lk.End != 0x7fffffffffffffff { 593 | return fuse.ENOTSUP 594 | } 595 | 596 | ok, err := fs.fs.TrySetLock(in.NodeId, SetLockOpts{ 597 | Typ: lockType, 598 | Owner: in.Owner, 599 | }) 600 | 601 | if in.LkFlags&fuse.FUSE_LK_FLOCK == 0 { 602 | f.maybeHasPosixLock.Store(true) 603 | } 604 
| 605 | if err != nil { 606 | return fs.errToFuseStatus(err) 607 | } 608 | 609 | if !ok { 610 | return fuse.EAGAIN 611 | } 612 | 613 | return fuse.OK 614 | } 615 | 616 | func (fs *FuseFs) SetLkw(cancel <-chan struct{}, in *fuse.LkIn) fuse.Status { 617 | nAttempts := uint64(0) 618 | for { 619 | status := fs.SetLk(cancel, in) 620 | if status != fuse.EAGAIN { 621 | return status 622 | } 623 | 624 | select { 625 | case <-cancel: 626 | return fuse.EINTR 627 | default: 628 | } 629 | 630 | if nAttempts >= 2 { 631 | // Random delay to partially mitigate thundering herd on contended lock. 632 | time.Sleep(time.Duration(mathrand.Int()%5_000) * time.Millisecond) 633 | } 634 | err := fs.fs.AwaitExclusiveLockRelease(cancel, in.NodeId) 635 | if err != nil { 636 | return fs.errToFuseStatus(err) 637 | } 638 | nAttempts += 1 639 | } 640 | } 641 | 642 | func (fs *FuseFs) StatFs(cancel <-chan struct{}, in *fuse.InHeader, out *fuse.StatfsOut) fuse.Status { 643 | 644 | stats, err := fs.fs.FsStats() 645 | if err != nil { 646 | return fs.errToFuseStatus(err) 647 | } 648 | 649 | out.Bsize = CHUNK_SIZE 650 | out.Blocks = stats.UsedBytes / CHUNK_SIZE 651 | out.Bfree = stats.FreeBytes / CHUNK_SIZE 652 | out.Bavail = out.Bfree 653 | out.NameLen = 4096 654 | 655 | return fuse.OK 656 | } 657 | 658 | func (fs *FuseFs) Access(cancel <-chan struct{}, input *fuse.AccessIn) fuse.Status { 659 | return fuse.ENOSYS 660 | } 661 | 662 | func (fs *FuseFs) CopyFileRange(cancel <-chan struct{}, input *fuse.CopyFileRangeIn) (uint32, fuse.Status) { 663 | return 0, fuse.ENOSYS 664 | } 665 | 666 | func (fs *FuseFs) Fallocate(cancel <-chan struct{}, in *fuse.FallocateIn) fuse.Status { 667 | return fuse.ENOSYS 668 | } 669 | 670 | func (fs *FuseFs) Mknod(cancel <-chan struct{}, input *fuse.MknodIn, name string, out *fuse.EntryOut) fuse.Status { 671 | return fuse.ENOSYS 672 | } 673 | 674 | func (fs *FuseFs) SetDebug(dbg bool) { 675 | } 676 | 677 | func (fs *FuseFs) String() string { 678 | return os.Args[0] 679 | } 680 | -------------------------------------------------------------------------------- /go.mod: -------------------------------------------------------------------------------- 1 | module github.com/andrewchambers/hafs 2 | 3 | go 1.17 4 | 5 | require ( 6 | github.com/apple/foundationdb/bindings/go v0.0.0-20221108203244-b4bd84bd1985 7 | github.com/cheynewallace/tabby v1.1.1 8 | github.com/detailyang/fastrand-go v0.0.0-20191106153122-53093851e761 9 | github.com/hanwen/go-fuse/v2 v2.1.0 10 | github.com/minio/minio-go/v7 v7.0.39 11 | github.com/valyala/fastjson v1.6.3 12 | golang.org/x/sync v0.0.0-20220929204114-8fcdb60fdcc0 13 | golang.org/x/sys v0.0.0-20221006211917-84dc82d7e875 14 | ) 15 | 16 | require ( 17 | github.com/dustin/go-humanize v1.0.0 // indirect 18 | github.com/google/uuid v1.3.0 // indirect 19 | github.com/json-iterator/go v1.1.12 // indirect 20 | github.com/klauspost/compress v1.15.9 // indirect 21 | github.com/klauspost/cpuid/v2 v2.1.0 // indirect 22 | github.com/minio/md5-simd v1.1.2 // indirect 23 | github.com/minio/sha256-simd v1.0.0 // indirect 24 | github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect 25 | github.com/modern-go/reflect2 v1.0.2 // indirect 26 | github.com/rs/xid v1.4.0 // indirect 27 | github.com/sirupsen/logrus v1.9.0 // indirect 28 | github.com/stretchr/testify v1.8.0 // indirect 29 | golang.org/x/crypto v0.0.0-20220722155217-630584e8d5aa // indirect 30 | golang.org/x/net v0.0.0-20220722155237-a158d28d115b // indirect 31 | golang.org/x/text v0.3.7 // indirect 32 | 
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543 // indirect 33 | gopkg.in/ini.v1 v1.66.6 // indirect 34 | ) 35 | 36 | replace github.com/hanwen/go-fuse/v2 => github.com/andrewchambers/go-fuse/v2 v2.0.0-20230121043514-3c9647baf8ee 37 | -------------------------------------------------------------------------------- /object_storage.go: -------------------------------------------------------------------------------- 1 | package hafs 2 | 3 | import ( 4 | "context" 5 | "errors" 6 | "fmt" 7 | "io" 8 | "net/url" 9 | "os" 10 | "strings" 11 | "sync" 12 | 13 | "github.com/minio/minio-go/v7" 14 | miniocredentials "github.com/minio/minio-go/v7/pkg/credentials" 15 | ) 16 | 17 | type ReaderAtCloser interface { 18 | io.ReaderAt 19 | io.Closer 20 | } 21 | 22 | type ObjectStorageEngine interface { 23 | Open(fs string, inode uint64) (ReaderAtCloser, bool, error) 24 | ReadAll(fs string, inode uint64, w io.Writer) (bool, error) 25 | Write(fs string, inode uint64, data *os.File) (int64, error) 26 | Remove(fs string, inode uint64) error 27 | Close() error 28 | } 29 | 30 | var ErrStorageEngineNotConfigured error = errors.New("storage engine not configured") 31 | 32 | type unconfiguredStorageEngine struct{} 33 | 34 | func (s *unconfiguredStorageEngine) Open(fs string, inode uint64) (ReaderAtCloser, bool, error) { 35 | return nil, false, ErrStorageEngineNotConfigured 36 | } 37 | 38 | func (s *unconfiguredStorageEngine) ReadAll(fs string, inode uint64, w io.Writer) (bool, error) { 39 | return false, ErrStorageEngineNotConfigured 40 | } 41 | 42 | func (s *unconfiguredStorageEngine) Write(fs string, inode uint64, data *os.File) (int64, error) { 43 | return 0, ErrStorageEngineNotConfigured 44 | } 45 | 46 | func (s *unconfiguredStorageEngine) Remove(fs string, inode uint64) error { 47 | return ErrStorageEngineNotConfigured 48 | } 49 | 50 | func (s *unconfiguredStorageEngine) Close() error { 51 | return ErrStorageEngineNotConfigured 52 | } 53 | 54 | type fileStorageEngine struct { 55 | path string 56 | } 57 | 58 | func (s *fileStorageEngine) Open(fs string, inode uint64) (ReaderAtCloser, bool, error) { 59 | f, err := os.Open(fmt.Sprintf("%s/%016x.%s", s.path, inode, fs)) 60 | if err != nil { 61 | if os.IsNotExist(err) { 62 | return nil, false, nil 63 | } 64 | return nil, false, err 65 | } 66 | return f, true, nil 67 | } 68 | 69 | func (s *fileStorageEngine) Write(fs string, inode uint64, data *os.File) (int64, error) { 70 | f, err := os.Create(fmt.Sprintf("%s/%016x.%s", s.path, inode, fs)) 71 | if err != nil { 72 | return 0, err 73 | } 74 | defer f.Close() 75 | n, err := io.Copy(f, data) 76 | if err != nil { 77 | return n, err 78 | } 79 | return n, f.Sync() 80 | } 81 | 82 | func (s *fileStorageEngine) ReadAll(fs string, inode uint64, w io.Writer) (bool, error) { 83 | f, err := os.Open(fmt.Sprintf("%s/%016x.%s", s.path, inode, fs)) 84 | if err != nil { 85 | if os.IsNotExist(err) { 86 | return false, nil 87 | } 88 | return false, err 89 | } 90 | defer f.Close() 91 | _, err = io.Copy(w, f) 92 | if err != nil { 93 | return false, err 94 | } 95 | return true, nil 96 | } 97 | 98 | func (s *fileStorageEngine) Remove(fs string, inode uint64) error { 99 | err := os.Remove(fmt.Sprintf("%s/%016x.%s", s.path, inode, fs)) 100 | if err != nil { 101 | if os.IsNotExist(err) { 102 | return nil 103 | } 104 | return err 105 | } 106 | return nil 107 | } 108 | 109 | func (s *fileStorageEngine) Close() error { 110 | return nil 111 | } 112 | 113 | type s3StorageEngine struct { 114 | path string 115 | bucket string 116 | client 
*minio.Client 117 | } 118 | 119 | // s3Reader wraps a minio object to implement io.ReaderAt more efficiently than 120 | // the minio client does for sequential reads, seeking only when the offset changes. 121 | type s3Reader struct { 122 | 	lock          sync.Mutex 123 | 	obj           *minio.Object 124 | 	currentOffset int64 125 | } 126 | 127 | func (r *s3Reader) ReadAt(buf []byte, offset int64) (int, error) { 128 | 	r.lock.Lock() 129 | 	defer r.lock.Unlock() 130 | 	if offset != r.currentOffset { 131 | 		newOffset, err := r.obj.Seek(offset, io.SeekStart) 132 | 		if err != nil { 133 | 			r.currentOffset = -1 // Force a seek on the next read. 134 | 			return 0, err 135 | 		} 136 | 		r.currentOffset = newOffset 137 | 	} 138 | 	n, err := io.ReadFull(r.obj, buf) 139 | 	r.currentOffset += int64(n) 140 | 	if err == io.ErrUnexpectedEOF { 141 | 		err = io.EOF // A short read at the end of the object is EOF to ReaderAt callers. 142 | 	} 143 | 	return n, err 144 | } 145 | 146 | func (r *s3Reader) Close() error { 147 | 	return r.obj.Close() 148 | } 149 | 150 | func (s *s3StorageEngine) Open(fs string, inode uint64) (ReaderAtCloser, bool, error) { 151 | 	obj, err := s.client.GetObject( 152 | 		context.Background(), 153 | 		s.bucket, 154 | 		fmt.Sprintf("%s%016x.%s", s.path, inode, fs), 155 | 		minio.GetObjectOptions{}, 156 | 	) 157 | 	if err != nil { 158 | 		if minio.ToErrorResponse(err).StatusCode == 404 { 159 | 			return nil, false, nil 160 | 		} 161 | 		return nil, false, err 162 | 	} 163 | 	return &s3Reader{ 164 | 		obj:           obj, 165 | 		currentOffset: 0, 166 | 	}, true, nil 167 | } 168 | 169 | func (s *s3StorageEngine) ReadAll(fs string, inode uint64, w io.Writer) (bool, error) { 170 | 	obj, err := s.client.GetObject( 171 | 		context.Background(), 172 | 		s.bucket, 173 | 		fmt.Sprintf("%s%016x.%s", s.path, inode, fs), 174 | 		minio.GetObjectOptions{}, 175 | 	) 176 | 	if err != nil { 177 | 		if minio.ToErrorResponse(err).StatusCode == 404 { 178 | 			return false, nil 179 | 		} 180 | 		return false, err 181 | 	} 182 | 	defer obj.Close() 183 | 	_, err = io.Copy(w, obj) 184 | 	if err != nil { 185 | 		return false, err 186 | 	} 187 | 	return true, nil 188 | } 189 | 190 | func (s *s3StorageEngine) Write(fs string, inode uint64, data *os.File) (int64, error) { 191 | 	stat, err := data.Stat() 192 | 	if err != nil { 193 | 		return 0, err 194 | 	} 195 | 	obj, err := s.client.PutObject( 196 | 		context.Background(), 197 | 		s.bucket, 198 | 		fmt.Sprintf("%s%016x.%s", s.path, inode, fs), 199 | 		data, 200 | 		stat.Size(), 201 | 		minio.PutObjectOptions{}, 202 | 	) 203 | 	if err != nil { 204 | 		if minio.ToErrorResponse(err).StatusCode == 403 { 205 | 			return 0, ErrPermission 206 | 		} 207 | 		return 0, err 208 | 	} 209 | 	return obj.Size, nil 210 | } 211 | 212 | func (s *s3StorageEngine) Remove(fs string, inode uint64) error { 213 | 	err := s.client.RemoveObject( 214 | 		context.Background(), 215 | 		s.bucket, 216 | 		fmt.Sprintf("%s%016x.%s", s.path, inode, fs), 217 | 		minio.RemoveObjectOptions{}, 218 | 	) 219 | 	if err != nil { 220 | 		if minio.ToErrorResponse(err).StatusCode == 404 { 221 | 			return nil 222 | 		} 223 | 		return err 224 | 	} 225 | 	return nil 226 | } 227 | 228 | func (s *s3StorageEngine) Close() error { 229 | 	return nil 230 | } 231 | // NewObjectStorageEngine creates an object storage engine from a spec string: "file:/some/dir", an "s3:" url with a required bucket query parameter, or "" for an engine that always returns ErrStorageEngineNotConfigured. 232 | func NewObjectStorageEngine(storageSpec string) (ObjectStorageEngine, error) { 233 | 234 | 	if strings.HasPrefix(storageSpec, "file:") { 235 | 		return &fileStorageEngine{ 236 | 			path: storageSpec[5:], 237 | 		}, nil 238 | 	} 239 | 240 | 	if strings.HasPrefix(storageSpec, "s3:") { 241 | 		var creds *miniocredentials.Credentials 242 | 243 | 		u, err := url.Parse(storageSpec) 244 | 		if err != nil { 245 | 			return nil, err 246 | 		} 247 | 248 | 		q := u.Query() 249 | 250 | 		if u.User != nil { 251 | 			accessKeyID := u.User.Username() 252 | 			secretAccessKey, _
:= u.User.Password() 253 | 			creds = miniocredentials.NewStaticV4(accessKeyID, secretAccessKey, "") 254 | 		} else { 255 | 			creds = miniocredentials.NewEnvAWS() 256 | 		} 257 | 258 | 		bucket, ok := q["bucket"] 259 | 		if !ok { 260 | 			return nil, fmt.Errorf("s3 storage url %q must contain a bucket parameter", u.Redacted()) 261 | 		} 262 | 263 | 		isSecure := true 264 | 		if secureParam, ok := q["secure"]; ok { 265 | 			isSecure = secureParam[0] != "false" 266 | 		} 267 | 268 | 		endpoint := u.Hostname() 269 | 		if u.Port() != "" { 270 | 			endpoint = endpoint + ":" + u.Port() 271 | 		} 272 | 273 | 		client, err := minio.New(endpoint, &minio.Options{ 274 | 			Creds:  creds, 275 | 			Secure: isSecure, 276 | 		}) 277 | 		if err != nil { 278 | 			return nil, err 279 | 		} 280 | 281 | 		return &s3StorageEngine{ 282 | 			bucket: bucket[0], 283 | 			path:   u.Path, 284 | 			client: client, 285 | 		}, nil 286 | 	} 287 | 288 | 	if storageSpec == "" { 289 | 		return &unconfiguredStorageEngine{}, nil 290 | 	} 291 | 292 | 	return nil, errors.New("unknown/invalid storage specification") 293 | } 294 | -------------------------------------------------------------------------------- /support/nix/foundationdb.nix: -------------------------------------------------------------------------------- 1 | { fetchurl, stdenv, autoPatchelfHook }: 2 | let 3 |   fdbVersion = "7.1.25"; 4 | 5 |   baseUrl = "https://github.com/apple/foundationdb/releases/download/${fdbVersion}"; 6 | 7 |   fdbclients = fetchurl { 8 |     url = "${baseUrl}/foundationdb-clients_${fdbVersion}-1_amd64.deb"; 9 |     hash = "sha256-Z5HhQLSpEt1QwxzaZXbTYbX1XHfhV4hrsnkgA3LX/44="; 10 |   }; 11 | 12 |   fdbserver = fetchurl { 13 |     url = "${baseUrl}/foundationdb-server_${fdbVersion}-1_amd64.deb"; 14 |     hash = "sha256-g5bYMXsh4vt4bSA3h5S/o4sMQxzXoSKP6bEXHfX3DKQ="; 15 |   }; 16 | in 17 | stdenv.mkDerivation { 18 |   pname = "foundationdb"; 19 | 20 |   version = fdbVersion; 21 | 22 |   nativeBuildInputs = [ 23 |     autoPatchelfHook 24 |   ]; 25 | 26 |   unpackPhase = '' 27 |     mkdir clients 28 |     cd clients 29 |     ar x "${fdbclients}" 30 |     tar -xzf data.tar.gz 31 |     cd .. 32 | 33 |     mkdir server 34 |     cd server 35 |     ar x "${fdbserver}" 36 |     tar -xzf data.tar.gz 37 |     cd ..
38 |   ''; 39 | 40 |   installPhase = '' 41 |     mkdir "$out" 42 |     cp -r clients/usr/include "$out/include" 43 |     install -D -m755 clients/usr/lib/libfdb_c.so "$out/lib/libfdb_c.so" 44 |     for b in $(ls clients/usr/bin) 45 |     do 46 |       install -D -m755 "clients/usr/bin/$b" "$out/bin/$b" 47 |     done 48 |     install -D -m755 "clients/usr/lib/foundationdb/backup_agent/backup_agent" "$out/libexec/backup_agent" 49 |     mkdir -p "$out/lib/pkgconfig" 50 | 51 |     cat <<EOF > "$out/lib/pkgconfig/foundationdb-client.pc" 52 | Name: foundationdb-client 53 | Description: FoundationDB c client 54 | Version: ${fdbVersion} 55 | 56 | Libs: -L$out/lib -lfdb_c 57 | Cflags: -I$out/include 58 | EOF 59 | 60 |     install -D -m755 "server/usr/sbin/fdbserver" "$out/bin/fdbserver" 61 |     install -D -m755 "server/usr/lib/foundationdb/fdbmonitor" "$out/bin/fdbmonitor" 62 |   ''; 63 | } -------------------------------------------------------------------------------- /support/nix/shell.nix: -------------------------------------------------------------------------------- 1 | let 2 |   pkgs = (import <nixpkgs>) {}; 3 | in 4 | pkgs.mkShell { 5 |   buildInputs = [ 6 |     pkgs.go 7 |     pkgs.gotools 8 |     ((pkgs.callPackage ./foundationdb.nix) {}) 9 |   ]; 10 | } -------------------------------------------------------------------------------- /testutil/testutil.go: -------------------------------------------------------------------------------- 1 | package testutil 2 | 3 | import ( 4 | 	"bufio" 5 | 	"fmt" 6 | 	"net" 7 | 	"os" 8 | 	"os/exec" 9 | 	"path/filepath" 10 | 	"sync" 11 | 	"syscall" 12 | 	"testing" 13 | 	"time" 14 | 15 | 	"github.com/apple/foundationdb/bindings/go/src/fdb" 16 | ) 17 | // GetFreePort asks the kernel for a currently unused TCP port on localhost. 18 | func GetFreePort() (int, error) { 19 | 	addr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:0") 20 | 	if err != nil { 21 | 		return 0, err 22 | 	} 23 | 24 | 	l, err := net.ListenTCP("tcp", addr) 25 | 	if err != nil { 26 | 		return 0, err 27 | 	} 28 | 	defer l.Close() 29 | 	return l.Addr().(*net.TCPAddr).Port, nil 30 | } 31 | // FDBTestServer is a disposable fdbserver instance for use in tests. 32 | type FDBTestServer struct { 33 | 	t           *testing.T 34 | 	ClusterFile string 35 | 	fdbServer   *exec.Cmd 36 | } 37 | 38 | // NewFDBTestServer creates a test server that is automatically cleaned up when the test finishes.
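// A minimal usage sketch (hypothetical test, the names are illustrative only):
//
//	func TestExample(t *testing.T) {
//		srv := testutil.NewFDBTestServer(t)
//		db := srv.Dial()
//		_ = db // ... exercise hafs against db; cleanup happens automatically ...
//	}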
39 | func NewFDBTestServer(t *testing.T) *FDBTestServer { 40 | _, err := exec.LookPath("fdbserver") 41 | if err != nil { 42 | t.Skip("fdbserver not found in path") 43 | } 44 | 45 | port, err := GetFreePort() 46 | if err != nil { 47 | t.Fatal(err) 48 | } 49 | listenAddress := fmt.Sprintf("127.0.0.1:%d", port) 50 | 51 | dir := t.TempDir() 52 | clusterFile := filepath.Join(dir, "fdb.cluster") 53 | 54 | err = os.WriteFile(clusterFile, []byte(fmt.Sprintf("testcluster:12345678@%s", listenAddress)), 0o644) 55 | if err != nil { 56 | t.Fatal(err) 57 | } 58 | 59 | fdbServerOpts := []string{ 60 | "-p", listenAddress, 61 | "-C", clusterFile, 62 | } 63 | 64 | fdbServer := exec.Command( 65 | "fdbserver", 66 | fdbServerOpts..., 67 | ) 68 | fdbServer.Dir = dir 69 | 70 | rpipe, wpipe, err := os.Pipe() 71 | if err != nil { 72 | t.Fatal(err) 73 | } 74 | 75 | logWg := &sync.WaitGroup{} 76 | logWg.Add(1) 77 | go func() { 78 | defer logWg.Done() 79 | brdr := bufio.NewReader(rpipe) 80 | for { 81 | line, err := brdr.ReadString('\n') 82 | if err != nil { 83 | return 84 | } 85 | if len(line) == 0 { 86 | continue 87 | } 88 | t.Logf("fdbserver: %s", line[:len(line)-1]) 89 | } 90 | }() 91 | 92 | t.Cleanup(func() { 93 | logWg.Wait() 94 | }) 95 | 96 | fdbServer.Stderr = wpipe 97 | fdbServer.Stdout = wpipe 98 | 99 | err = fdbServer.Start() 100 | if err != nil { 101 | t.Fatal(err) 102 | } 103 | _ = wpipe.Close() 104 | 105 | t.Cleanup(func() { 106 | _ = fdbServer.Process.Signal(syscall.SIGTERM) 107 | _, _ = fdbServer.Process.Wait() 108 | }) 109 | 110 | t.Logf("starting fdbserver %v", fdbServerOpts) 111 | 112 | t.Logf("creating cluster...") 113 | err = exec.Command( 114 | "fdbcli", 115 | []string{ 116 | "-C", clusterFile, 117 | "--exec", "configure new single memory", 118 | }..., 119 | ).Run() 120 | if err != nil { 121 | t.Fatalf("unable to configure new cluster: %s", err) 122 | } 123 | 124 | up := false 125 | for i := 0; i < 2000; i++ { 126 | c, err := net.Dial("tcp", listenAddress) 127 | if err == nil { 128 | up = true 129 | _ = c.Close() 130 | break 131 | } 132 | time.Sleep(10 * time.Millisecond) 133 | } 134 | if !up { 135 | t.Fatal("fdb server never came up") 136 | } 137 | 138 | return &FDBTestServer{ 139 | t: t, 140 | ClusterFile: clusterFile, 141 | fdbServer: fdbServer, 142 | } 143 | } 144 | 145 | func (fdbServer *FDBTestServer) Dial() fdb.Database { 146 | db := fdb.MustOpenDatabase(fdbServer.ClusterFile) 147 | return db 148 | } 149 | --------------------------------------------------------------------------------
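The object storage engine in object_storage.go is easiest to understand from a caller's point of view. The sketch below writes one object through the `file:` engine and streams it back; it is a sketch only, assuming the target directory already exists, and the spec string, inode number, and filesystem name are purely illustrative.

```
package main

import (
	"io"
	"log"
	"os"

	"github.com/andrewchambers/hafs"
)

func main() {
	// "file:" specs map objects to files under a local directory (assumed to exist).
	engine, err := hafs.NewObjectStorageEngine("file:/tmp/hafs-objects")
	if err != nil {
		log.Fatal(err)
	}
	defer engine.Close()

	// Stage some data in a temporary file, then hand it to the engine.
	f, err := os.CreateTemp("", "obj")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("hello"); err != nil {
		log.Fatal(err)
	}
	// Write copies from the file's current offset, so rewind first.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		log.Fatal(err)
	}
	if _, err := engine.Write("fs", 1, f); err != nil {
		log.Fatal(err)
	}

	// Stream the object back out; the bool reports whether it existed.
	if ok, err := engine.ReadAll("fs", 1, os.Stdout); err != nil || !ok {
		log.Fatalf("read back failed: ok=%v err=%v", ok, err)
	}
}
```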