Question about backups

Nick Anthis

Well-known member
Oct 29, 2020
15
11
Here is the question of the day. It's dealing with different types of backups. First, let's start with what we know.
- A Full backup backs up the files, then resets the archive bits to off so that it knows they've been backed up.
- An Incremental backup backs up the files with the archive bit set to on (meaning the file has been changed), then resets the bit to off for each file that has had changes.
- A Differential backup backs up all the files where the archive bit is set to on but doesn't reset it to off, so it backs up everything with the bit on, including files already captured by any previous differential backups since the last full backup.
- Any file change sets the bit to on, so either an incremental or a differential will back up the changed files.
So here's the scenario:
- I do a full backup, change files, then do an incremental (which will back up the files, then change the bits back to off).
- I change no files over the next 24 hours, then I do a differential backup.

So here are the questions:
- How will the differential tell the full and incremental backups apart, given that the bits are set to off in both cases?
- Will it copy since the last full backup or the last incremental?
 
So, when the differential backup software runs, it goes looking for any files that have the archive bit turned on. There isn't generally any sophistication there - it's only looking for the archive bit. The software isn't going to go back to "the last full" or "the last incremental".

And as you articulated correctly, fulls and incrementals turn it off, differentials leave it on.

If the bit is on, the file gets backed up. I can change the file, or even just flip the bit on at the file level. If I wanted to, I could flip the bit off on a file that had been changed and the file wouldn't get backed up.

The backup software is creating a meta-schema that creatively manages the concepts of Full, Incremental, and Differential backups based on the state of that bit from a user's perception. But under the surface, it's far less complex.
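To make the bit logic concrete, here is a minimal sketch of how a purely archive-bit-driven tool might behave. The `FileEntry` class and the function names are made up for illustration and don't correspond to any real backup product.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    archive_bit: bool = True  # new or modified files start with the bit on


def full_backup(files):
    """Copy every file, then clear every archive bit."""
    copied = [f.name for f in files]
    for f in files:
        f.archive_bit = False
    return copied


def incremental_backup(files):
    """Copy only files whose bit is on, then clear the bit."""
    copied = [f.name for f in files if f.archive_bit]
    for f in files:
        f.archive_bit = False
    return copied


def differential_backup(files):
    """Copy only files whose bit is on and leave the bit untouched."""
    return [f.name for f in files if f.archive_bit]


# The scenario from the question: full, change a file, incremental, then differential.
files = [FileEntry("a.txt"), FileEntry("b.txt")]
full = full_backup(files)           # copies both files, bits off
files[0].archive_bit = True         # editing a.txt turns its bit back on
inc = incremental_backup(files)     # copies a.txt, bit off again
diff = differential_backup(files)   # nothing has the bit on, so nothing is copied
print(full, inc, diff)
```

In this sketch the differential after the incremental copies nothing, because the only thing it consults is the bit.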

Now if we get into the subject of synthetic backups, that's where it gets more interesting.

At least this is how I've always seen it.

/r
 

Nick Anthis

Well-known member
Oct 29, 2020
15
11
Ok, so then since it is only looking at the bit (which I understand), then in theory if you do a full backup, then a bunch of incrementals, followed by a differential, you might be missing files on that differential that were backed up by the incrementals when they flipped off the bit. That is, if you only restore the full backup and that differential, you wouldn't necessarily have any of the files from the incrementals. Am I understanding that correctly?
 
Well, let's see... let's look at this...

Day 1 - Full
Day 2 - Incremental
Day 3 - Incremental
Day 4 - Incremental
Day 5 - Differential.

You would not have the files from the three incrementals if you did a restore from the Full and Differential only. To get back to current, in this case, you have to restore the Full, Incrementals from days 2-4, and then the Differential.

Now if you did Differentials on days 2-4, then you would only need the Full and the Day 5 differential.
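The five-day example can be replayed with a small simulation. This is only a sketch under the assumption that the tool relies purely on the archive bit; the file names and the `run_schedule` and `stale_files` helpers are hypothetical.

```python
def run_schedule(schedule, changes_by_day, all_files):
    """Replay a backup schedule; return a list of (day, kind, files_copied)."""
    bit_on = set(all_files)  # files whose archive bit is currently on
    backups = []
    for day, kind in enumerate(schedule, start=1):
        bit_on |= changes_by_day.get(day, set())  # edits made before that day's backup
        if kind == "full":
            copied = set(all_files)
            bit_on.clear()
        elif kind == "incremental":
            copied = set(bit_on)
            bit_on.clear()
        elif kind == "differential":
            copied = set(bit_on)  # the bit is left on
        else:
            raise ValueError(f"unknown backup kind: {kind}")
        backups.append((day, kind, copied))
    return backups


def stale_files(backups, restore_days, changes_by_day):
    """Files whose restored copy is older than their last change."""
    restored_from = {}
    for day, _kind, copied in backups:  # backups are in day order
        if day in restore_days:
            for name in copied:
                restored_from[name] = day
    last_change = {}
    for day, names in changes_by_day.items():
        for name in names:
            last_change[name] = max(last_change.get(name, 0), day)
    return {n for n, d in last_change.items() if restored_from.get(n, 0) < d}


all_files = {"a", "b", "c", "d"}
changes = {2: {"a"}, 3: {"b"}, 4: {"c"}, 5: {"d"}}

mixed = run_schedule(
    ["full", "incremental", "incremental", "incremental", "differential"],
    changes, all_files)
diffs_only = run_schedule(
    ["full", "differential", "differential", "differential", "differential"],
    changes, all_files)

print(stale_files(mixed, {1, 5}, changes))            # full + day 5 differential only
print(stale_files(mixed, {1, 2, 3, 4, 5}, changes))   # the whole chain
print(stale_files(diffs_only, {1, 5}, changes))       # differentials all week
```

With the mixed schedule, restoring only days 1 and 5 leaves the files changed on days 2-4 at their day 1 versions, which matches the reasoning above.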

So I think you're understanding it correctly.

/r
 

Jarrel

Well-known member
  • Feb 17, 2020
    350
    1
    522
    Australia
    www.jarrelrivera.com

    Backups carry metadata that marks each backup; that's how the backup software identifies which one is a full and which one is not.
    Differential backups back up everything that has changed (the delta) since the last full backup.
     
    Historically, this wasn't the case because of the reliance on the archive bit, which I believe is what the OP was asking about*. The bit was the signal flag that said "this file has been changed, back it up". The OS would set this flag. As backup software became more sophisticated, adding in things like hashing to calculate versions, a file can end up being identified as "changed" not by the state of the archive bit but by a hash comparison. So, yes, the differential is in theory "supposed" to go back to the last full. But that's highly contingent on the sophistication of the backup software and how it signals that a file has been changed.

    Simple, older backup solutions relied on the archive bit, so if that bit got turned off by an incremental or some other way, that file would not show up in a differential, even though it should have (I have personally seen this with older tools). Later solutions would compare timestamps, looking at the time of the full backup and the timestamp of the file, largely ignoring the archive bit. Today, the file is often hashed and compared against the version in the full backup, so regardless of the state of the archive bit or timestamp, the hash is the flag that determines a change.
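As a rough illustration of the three detection styles, here is a sketch that asks the same question ("has this file changed since the full?") three ways. `FileState` and the function names are assumptions for the example; real products keep this information in their own catalogs.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileState:
    data: bytes
    mtime: float       # last-modified time, in arbitrary units
    archive_bit: bool


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def changed_by_archive_bit(f: FileState) -> bool:
    return f.archive_bit


def changed_by_timestamp(f: FileState, full_backup_time: float) -> bool:
    return f.mtime > full_backup_time


def changed_by_hash(f: FileState, hash_in_full: str) -> bool:
    return digest(f.data) != hash_in_full


# Full backup at t=100 captured b"v1". The file was edited at t=150, and an
# incremental at t=200 cleared the archive bit. A differential now runs at t=300.
hash_in_full = digest(b"v1")
current = FileState(data=b"v2", mtime=150.0, archive_bit=False)

print(changed_by_archive_bit(current))            # the bit-only tool misses the change
print(changed_by_timestamp(current, 100.0))       # the timestamp comparison catches it
print(changed_by_hash(current, hash_in_full))     # so does the hash comparison
```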

    In just about every backup scenario I've ever encountered, we never mixed incrementals and differentials. The decision was driven more by the restoration strategy: do I want a full+differential restore, or a full+all-the-incrementals restore, applied one after another? In other words, the differential is the better strategy if you need ease and simplicity, or if not much changes between full backups. But if that differential grows quickly, re-copying all the changes since the last full slows down the backup process and consumes more backup media space. In times like that, the incremental strategy is preferable.
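A back-of-the-envelope sketch of that trade-off, assuming (purely for illustration) that each day changes a fresh, non-overlapping set of data of the same size, which is the worst case for differentials. The `weekly_plan` helper is hypothetical.

```python
def weekly_plan(daily_change_gb, days=6):
    """Compare storage and restore effort for `days` backups after one full."""
    incremental_sizes = [daily_change_gb] * days
    # Day k's differential re-copies everything changed since the full.
    differential_sizes = [daily_change_gb * k for k in range(1, days + 1)]
    return {
        "incremental": {
            "stored_gb": sum(incremental_sizes),
            "sets_to_restore_on_last_day": 1 + days,  # full + every incremental
        },
        "differential": {
            "stored_gb": sum(differential_sizes),
            "sets_to_restore_on_last_day": 2,         # full + latest differential
        },
    }


plan = weekly_plan(daily_change_gb=10)
print(plan)
```

Under these assumptions, 10 GB of daily change costs 60 GB of incrementals versus 210 GB of differentials over six days, while the restore needs seven backup sets versus two.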

    Even more interesting on this topic is what actually gets backed up when a file is changed. Older solutions would back up the complete file into the incremental or differential if it changed. Newer solutions, in the interest of saving space, back up just the changed blocks (block-level vs. file-level). Cloud solutions now store just the changes for efficiency's sake, leading to a more synthetic method of restoring files.
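A minimal sketch of the block-level idea, assuming fixed-size blocks compared by hash. The tiny `BLOCK_SIZE` and the helper names are only for the demo; real products use their own block sizes and change-tracking mechanisms.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small so the example is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return {block_index: new_block_bytes} for blocks that differ or are new."""
    old_h = block_hashes(old, block_size)
    changed = {}
    for index, h in enumerate(block_hashes(new, block_size)):
        if index >= len(old_h) or old_h[index] != h:
            start = index * block_size
            changed[index] = new[start:start + block_size]
    return changed


def apply_blocks(old, changed, new_length, block_size=BLOCK_SIZE):
    """Rebuild the new file from the old copy plus the changed blocks."""
    blocks = [old[i:i + block_size] for i in range(0, len(old), block_size)]
    for index, payload in changed.items():
        while len(blocks) <= index:
            blocks.append(b"")
        blocks[index] = payload
    return b"".join(blocks)[:new_length]


old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCC"
delta = changed_blocks(old, new)
print(delta)  # only the middle block needs to be stored
print(apply_blocks(old, delta, len(new)) == new)
```

A file-level backup would store all 12 bytes again here; the block-level delta stores one 4-byte block.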

    Which leads into synthetic full backups, where you sort of get the best of both worlds. In a synthetic full backup, the backup software takes the previous full backup and all the incremental backups created over a set period of time and combines them into a new, synthesized full backup. The new synthetic backup contains the same data as an active full backup; the only difference is how it is created. Instead of copying all the source data again, the synthetic full reuses the unchanged data already in the previous full backup plus all the incremental backups of changed data.
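Conceptually, the merge looks something like the sketch below, where each backup is modeled as a plain mapping of file name to content. This is a simplification that ignores deleted files, and `synthesize_full` is a hypothetical name rather than any vendor's API.

```python
def synthesize_full(previous_full, incrementals):
    """Build a new full from the last full plus incrementals, oldest first.

    No data is read from the source system; later incrementals override
    earlier versions of the same file.
    """
    synthetic = dict(previous_full)  # start from the unchanged data
    for incremental in incrementals:
        synthetic.update(incremental)
    return synthetic


previous_full = {"a.txt": "v1", "b.txt": "v1"}
incrementals = [
    {"a.txt": "v2"},   # Monday: a.txt changed
    {"c.txt": "v1"},   # Tuesday: c.txt created
    {"a.txt": "v3"},   # Wednesday: a.txt changed again
]

synthetic_full = synthesize_full(previous_full, incrementals)
print(synthetic_full)
```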

    All of this to say that backups have evolved over time around strategies to make backup and restoration operations faster, minimize media costs, and make disaster recovery easier to do, since less-trained personnel may end up conducting DR.

    /r

    * It's always amazing to me these days that training is still talking about the archive bit, as if it really has much to do with backup anymore. For reasons that I mentioned, it's largely ignored by today's backup solutions, deferring more to hashing to compare changes.
     

    Jarrel

    Well-known member
  • Feb 17, 2020
    350
    1
    522
    Australia
    www.jarrelrivera.com
    Good points @Rick Butler

    I particularly like the post script about the archive bit. Sadly, the on/off of the archive bit is an exam question in cert exams. So, we still need to teach it.

    Also, thanks for the historical info. However, curriculum-wise, we do teach that an incremental backs up any data that has changed since the last backup, whereas a differential backs up any data that has changed since the last full backup. I agree, it doesn't quite add up when you think of the archive bit, and that's where the metadata comes in (or to your point, the hash).
     
    Exactly, and to that point, it's an important distinction to understand because folks look at that archive bit and think it still has much bearing on the process. And yes, the theory is right with respect to incremental and differential backups and what they are intended to do on the surface. Once we dig a bit, we find out that it's not always what it is "supposed" to be - academic vs. technical.
     
    The problem here is in your logic - you would never combine both incremental and differential backups in a backup set. You use one or the other.
    If you want faster backups, do a Full backup Sundays, with incremental backups Monday-Saturday.
    If you want faster restores, do a Full backup Sundays, with differential backups Monday-Saturday.

    Today, we don't really use differential or incremental backups, because they were designed for the days when we backed up to tape. Tape was slow enough that a backup could take 12 hours or more depending on the size of the data, and wouldn't be finished before the next business day. Since we back up to disk/network/cloud today, all backups are typically just full backups. The only other type of backup used today is the surprise backup (a.k.a. a data breach ;-).