I am going to try to start doing these every week on Sunday, looking
back on the past week's work, and looking forward to what should be done
in the next week. This is mostly to organize my thoughts, but also
serves as a look into what work has been done.
This edition covers work completed between 2020-02-24 and 2020-03-01.
Features Landed
Cargo features
The asuran_core library now has cargo features set up for the
dependencies of the various compression, encryption, and hashing
algorithms. A core feature has been defined supporting ZStd, AES-CTR,
and Blake3. Features have also been set up for each encryption family, as
well as features that enable all the algorithms for a given operation
(e.g. all-compression).
The enums in the core library still have variants for the algorithms
whose support has not been compiled in, for compatibility purposes.
However, attempting to perform operations on these variants will result
in a runtime panic.
asuran_core will also refuse to compile if no HMAC algorithm features
are selected, as the repository format fundamentally does not make sense
without an HMAC.
The default feature for asuran_core includes all the features for
all the supported algorithms.
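As a rough illustration of the pattern described above (this is a sketch, not the actual asuran_core source), an enum can keep every variant while gating the implementation behind a feature. The feature name `zstd`, the simplified `Compression` enum, and the helper `zstd_compress` are assumptions made for the example. The HMAC check is shown only as a comment, since enabling it would stop this snippet from building without any features.

```rust
// A crate could reject builds with no HMAC feature like this
// (feature names here are hypothetical):
//
// #[cfg(not(any(feature = "blake3", feature = "sha2")))]
// compile_error!("at least one HMAC algorithm feature must be enabled");

/// Simplified stand-in for a compression enum. Every variant exists
/// regardless of which features were enabled at build time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    NoCompression,
    ZStd { level: i32 },
}

impl Compression {
    /// Reports whether support for this variant was compiled in.
    pub fn is_supported(self) -> bool {
        match self {
            Compression::NoCompression => true,
            Compression::ZStd { .. } => cfg!(feature = "zstd"),
        }
    }

    /// Compresses `data`, panicking at runtime if the variant's
    /// algorithm was not compiled in.
    pub fn compress(self, data: &[u8]) -> Vec<u8> {
        match self {
            Compression::NoCompression => data.to_vec(),
            #[cfg(feature = "zstd")]
            Compression::ZStd { level } => zstd_compress(data, level),
            #[cfg(not(feature = "zstd"))]
            Compression::ZStd { .. } => panic!("ZStd support was not compiled in"),
        }
    }
}

/// Only built when the (assumed) `zstd` feature pulls in the zstd crate.
#[cfg(feature = "zstd")]
fn zstd_compress(data: &[u8], level: i32) -> Vec<u8> {
    zstd::encode_all(data, level).expect("in-memory compression failed")
}
```

Keeping the variant around means data that names an unsupported algorithm still deserializes, and the failure only surfaces when an operation is actually attempted.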
Underpinning of the Universal Listing API
The core data structure of the universal object listing API has been
written. I chose to implement this as a "flat" tree stored in a
HashMap. Right now the Node entries are keyed by String; however, it
would probably be wise to change these to some sort of byte string, to
support keying Nodes by arbitrary data. The current model, at the very
least, conflicts with the definition of *nix paths as "any sequence of
non-NUL bytes".
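To make the "flat" tree idea concrete, here is a minimal sketch that already uses byte-string keys. The `Listing`, `Node`, and `NodeType` shapes below are assumptions for illustration and are not the actual asuran definitions. Each node lives in a single HashMap, and directories refer to their children by key rather than owning them.

```rust
use std::collections::HashMap;

/// Hypothetical node kinds; directories hold the keys of their children.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NodeType {
    File,
    Directory { children: Vec<Vec<u8>> },
}

/// A single entry in the listing, identified by its raw path bytes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub path: Vec<u8>,
    pub node_type: NodeType,
}

/// The tree is "flat": every node sits in one map, keyed by byte string.
#[derive(Debug, Default)]
pub struct Listing {
    nodes: HashMap<Vec<u8>, Node>,
    roots: Vec<Vec<u8>>,
}

impl Listing {
    pub fn new() -> Self {
        Self::default()
    }

    /// Inserts a node under `parent` (or as a root when `parent` is None).
    /// Returns false if the parent is missing or is not a directory.
    pub fn insert(&mut self, parent: Option<&[u8]>, path: &[u8], node_type: NodeType) -> bool {
        match parent {
            Some(parent) => match self.nodes.get_mut(parent) {
                Some(Node { node_type: NodeType::Directory { children }, .. }) => {
                    children.push(path.to_vec())
                }
                _ => return false,
            },
            None => self.roots.push(path.to_vec()),
        }
        let node = Node { path: path.to_vec(), node_type };
        self.nodes.insert(path.to_vec(), node);
        true
    }

    /// Returns the child keys of a directory, or None for files and
    /// unknown paths.
    pub fn children(&self, path: &[u8]) -> Option<&[Vec<u8>]> {
        match self.nodes.get(path)? {
            Node { node_type: NodeType::Directory { children }, .. } => Some(children.as_slice()),
            _ => None,
        }
    }
}
```

With `Vec<u8>` keys, a path such as `b"home/\xff\xfe.bin"` can be stored even though it is not valid UTF-8, which a `String` key cannot represent.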
I additionally added a field to the Archive definition for the listing
associated with that archive.
I am still in the process of porting the Target interface to use the
new listing API instead of the old strategy of using Vec<String>.
Asuran CLI overhaul
The asuran-cli binary crate was, more or less, completely rewritten
from scratch, with much less jank. I used structopt to simplify
argument and sub-command handling, and created modules for each separate
command, which has generally simplified things. I still need to figure
out how to get structopt to output the help for global options when
you pass --help to a sub-command.
The rewrite highlighted a pain point in the asuran API: using
tasks to insert objects into an archive involves lots of unnecessary
cloning. Currently, it looks like this:
    let mut repo = repo.clone();
    let archive = archive.clone();
    let backup_target = backup_target.clone();
    task_queue.push(task::spawn(async move {
        (
            path.clone(),
            backup_target
                .store_object(&mut repo, chunker.clone(), &archive, path.clone())
                .await,
        )
    }));
In addition to the general code cleanliness improvement, the CLI has
also been updated to support the FlatFile backend, as well as the
MultiFile backend.
I believe the route forward on the cloning problem is to have BackupTarget
implementations be required to store a reference to the repository and
archive, and have a method on the BackupTarget trait for spawning a
task directly.
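One possible shape for that idea is sketched below. To keep the example self-contained it swaps the async tasks for std::thread, and the `Repository`, `Archive`, and `FileTarget` types as well as the `spawn_store` method are hypothetical stand-ins rather than the real asuran API. The target owns shared handles to the repository and archive, so callers clone only the target itself, and that clone happens inside the trait method.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-ins for the real repository and archive types.
#[derive(Default)]
pub struct Repository {
    pub objects: Vec<String>,
}

pub struct Archive {
    pub name: String,
}

/// A target that holds shared handles to the repository and archive,
/// so cloning it is cheap and callers never clone those separately.
#[derive(Clone)]
pub struct FileTarget {
    pub repo: Arc<Mutex<Repository>>,
    pub archive: Arc<Archive>,
}

pub trait BackupTarget: Clone + Send + 'static {
    /// Stores a single object; the implementation uses its own handles.
    fn store_object(&self, path: &str) -> Result<(), String>;

    /// Spawns the store as a background job and returns a handle that
    /// yields the path along with the result.
    fn spawn_store(&self, path: String) -> JoinHandle<(String, Result<(), String>)> {
        let target = self.clone();
        thread::spawn(move || {
            let result = target.store_object(&path);
            (path, result)
        })
    }
}

impl BackupTarget for FileTarget {
    fn store_object(&self, path: &str) -> Result<(), String> {
        let mut repo = self.repo.lock().map_err(|e| e.to_string())?;
        repo.objects.push(format!("{}:{}", self.archive.name, path));
        Ok(())
    }
}
```

A caller would then write something like `task_queue.push(backup_target.spawn_store(path))`, with no per-call clones of the repository or archive at the call site.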
Bug Fixes
Fix key length for Encryption::NoEncryption
The NoEncryption variant of the Encryption enum was updated to have a
non-zero key length. Some other places in the asuran library assume
that the encryption key has a non-zero length, such as the use of argon2
in the key encryption algorithm.
Even fully NoEncryption repositories still require some key material
for HMAC generation and other tasks, so the best fix for this seemed to
be pretending NoEncryption behaves like other encryption modes and
giving it a non-zero key length.
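The shape of the fix can be sketched roughly as follows. The enum here is heavily simplified, the variant name `Aes256Ctr` is an assumption, and the value 16 for NoEncryption is purely illustrative since the actual length chosen is not stated here. `derive_key` is a hypothetical stand-in for a key-derivation step that rejects zero-length output.

```rust
/// Simplified stand-in for the Encryption enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encryption {
    NoEncryption,
    Aes256Ctr,
}

impl Encryption {
    /// Length in bytes of the key material this mode asks for.
    pub fn key_length(self) -> usize {
        match self {
            // Illustrative value only: nothing is encrypted in this mode,
            // but downstream code still receives a non-empty key.
            Encryption::NoEncryption => 16,
            // AES-256 uses a 32-byte key.
            Encryption::Aes256Ctr => 32,
        }
    }
}

/// Hypothetical key-derivation step that, like many KDF wrappers,
/// refuses to produce zero-length output.
pub fn derive_key(mode: Encryption, fill: u8) -> Result<Vec<u8>, String> {
    let length = mode.key_length();
    if length == 0 {
        return Err("key length must be non-zero".to_string());
    }
    Ok(vec![fill; length])
}
```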
Removed key from FlatFile constructor
This was mistakenly added as a reflection of the MultiFile interface.
Unlike MultiFile, which requires the key to verify the manifest Merkle
tree on open, FlatFile performs no such operation and thus does not
require the key.
Looking Forward
FlatFile Produces Absurdly Large Archives
At the moment, FlatFile is producing archives that are way too big.
With the test 10GB data set I have been using, and the same general
settings, MultiFile produces a 3.6GB archive, whereas FlatFile produces
a 6.8GB archive. I need to do more debugging, but I believe this to be
due to a bug in how I am determining the location to start the next
segment. I believe the route forward is going to be to have FlatFile
write the segments into an in-memory buffer, and use the size of that
buffer to extract length information, rather than the abuse of
read.seek(SeekFrom::Current(0)) that I am currently using.
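A minimal sketch of that buffering approach is below. The length-prefixed record layout and the names `append_segment` and `SegmentLocation` are assumptions made for the example and do not describe the actual FlatFile format. The whole record is built in memory first, so its length comes from the buffer rather than from comparing seek positions.

```rust
use std::io::{self, Seek, SeekFrom, Write};

/// Where a segment record landed in the file and how long it is.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SegmentLocation {
    pub start: u64,
    pub length: u64,
}

/// Appends one segment to `file`, returning its location.
/// The record layout (8-byte little-endian length, then payload)
/// is an assumption for this sketch.
pub fn append_segment<W: Write + Seek>(
    file: &mut W,
    payload: &[u8],
) -> io::Result<SegmentLocation> {
    // Build the complete record in memory so its size is known exactly.
    let mut buffer = Vec::with_capacity(payload.len() + 8);
    buffer.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    buffer.extend_from_slice(payload);

    // Always append at the end of the file, then write the record at once.
    let start = file.seek(SeekFrom::End(0))?;
    file.write_all(&buffer)?;

    Ok(SegmentLocation {
        start,
        length: buffer.len() as u64,
    })
}
```

Because each record's start and length are both known, the next segment's start is simply `start + length`, which makes it easy to check that no gaps or overlaps creep into the file.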