How ByteSync leverages the cloud and delta copy to synchronize with speed and ease

Published on December 29, 2021 by Paul Fresquet
Last update on March 27, 2023




About ByteSync

This article was originally published on https://www.pow-software.com/blog/bytesync-cloud-rsync-synchronize-speed-ease/ on December 29, 2021.

We are currently finalizing the development of ByteSync, our remote data backup and synchronization solution. Secure, powerful, light, easy to deploy, it will easily find its place in the palette of tools available to IT technicians.
ByteSync is unique in that it relies on the cloud and on algorithms derived from rsync technology to bring ease of use and efficiency to users: it brings rsync speed through the Internet.
The software will be publicly available in 2022 in both free and paid versions.

Before we begin, let’s make one important point clear right away: ByteSync is not a cloud data backup solution. ByteSync uses the Cloud to connect client software via the Internet and allow them to synchronize remote data.
In other words, if you need to back up data between two computers located on different networks, you can do it with ByteSync, but if you want to back up data to a cloud storage service (Google Drive, OneDrive, Dropbox, etc.), you will need different software.

ByteSync uses the Cloud to synchronize files remotely

ByteSync is broadly comparable to remote file backup solutions that use protocols such as FTP or SFTP. It can complement or replace them.
It can be used to quickly check the integrity of one or more backup copies and excels at resynchronizing large data sets between remote computers and networks. A special feature is that it can associate 2 to 5 participants in the same synchronization session.

Innovations with the Cloud and delta copy

ByteSync offers ease of use, fast analysis, and high synchronization performance because its features leverage the power of the Cloud and delta copy (rsync-like algorithms).
We will see below, through 4 specific advantages, some of the possibilities offered by ByteSync thanks to the combined use of these technologies:

  • Solution deployment
    The implementation of the software is very easy. After downloading the client software, which is available in both installable and portable versions, you simply install or unzip it and run it. Since ByteSync communicates with other clients via the cloud using the HTTPS protocol, it is ready to use. Most of the time, there will be no network settings to make and no other components to deploy for the software to work properly.
  • Secure sessions
    To synchronize files between client software, they simply have to connect to each other through a system of secure and authenticated sessions. A data exchange session can gather up to 5 members.
    The software shows each participant who the other members of the session are, what data they have selected and the progress of the analysis and synchronization processes.
    To ensure security and confidentiality, data exchanged between client software in the same session is encrypted with end-to-end encryption (E2EE).
  • Two-step data analysis
    The analysis phase consists of taking inventory of the data and comparing those inventories to determine what is similar and what is different.
    ByteSync leverages the Cloud and rsync to perform these operations using the least amount of resources and time possible, while obtaining the maximum amount of useful information to then guide users in their choices. To do this, inventories are established in an iterative and dynamic way. They contain the rsync signatures of the files present on several clients, which makes it possible to accurately determine the differences and then accelerate the synchronization phase.
  • Synchronization and resynchronization
    During the synchronization phase, files are exchanged between session members according to the choices made by the users. The more differences to be corrected, the larger the volume of data to be transferred.
    ByteSync takes advantage of the Cloud and rsync to reduce the amount of data to be transferred and to make the transfers more reliable. The files are split into blocks to facilitate error recovery and then encrypted before being sent. They are buffered on the server so that the same block can be downloaded by the different recipients of the file. When a file already present on several clients is synchronized, the use of rsync makes it possible to transfer only the parts that differ.

Key benefits that ByteSync offers through its use of the Cloud and rsync

Some definitions

Before describing these four innovations in more detail, here are brief definitions of some of the terms used in this post.

Cloud
This is a term that appeared several years ago and that you probably know if you are an IT technician or if you have managed to read this post so far. Cloud, or Cloud Computing, refers to IT services hosted by a specialized service provider and accessible via the Internet.
The server part of ByteSync, which performs the orchestration and serves as a temporary repository for file transfers, is hosted on Azure, Microsoft’s cloud platform.

rsync
rsync is free file synchronization software that operates unidirectionally and offers many options for backing up data.
One of the special features of rsync is that it uses a dedicated algorithm to calculate advanced digital signatures of files, which makes it possible to efficiently determine the parts that differ between 2 copies of a file. Thanks to this system, during resynchronization, a minimal amount of data containing only the differences travels between the source and the destination, and this is enough to reconstruct the data on the destination. This is the part of rsync’s functionality that is implemented in ByteSync.
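
To make the idea of a signature more concrete, here is a minimal Python sketch of the weak rolling checksum described in the rsync technical report. It is an illustration only and does not claim to reflect ByteSync’s actual code; the function names (weak_checksum, roll, rolling_checksums) are hypothetical. The point it shows is that the checksum of a sliding window can be updated in constant time when the window moves by one byte, which is what makes searching for matching blocks affordable.

```python
from __future__ import annotations

M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two running sums (a, b) of a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte gets weight n, the last byte gets weight 1.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte to the right in constant time."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


def rolling_checksums(data: bytes, n: int) -> list[tuple[int, int]]:
    """Checksums of every n-byte window of data, computed incrementally."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [(a, b)]
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        sums.append((a, b))
    return sums
```

In rsync-like schemes, a cheap checksum of this kind is typically paired with a stronger hash that confirms candidate matches, since the weak checksum alone can collide.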

Backup or synchronization?
Usually, backup is considered to mean copying files from a source A to a destination B while keeping the deleted data for a given period of time. It is possible to keep intermediate states of data.
Synchronizing means copying and deleting files between source A and destination B in such a way as to ensure that A and B have the same set of files at the end of the process.
In concept, ByteSync is more of a data synchronization software. However, since the user will be able to choose whether or not to keep the deleted data, and since up to 5 clients can be connected to synchronize data multi-directionally, it can also be considered as backup software.

HTTPS and End-to-End Encryption (E2EE)
HTTPS is one of the most widely used secure communication protocols on the Internet and in the cloud world. It uses digital certificates to guarantee the confidentiality and integrity of data exchanged between a user and a server.
End-to-end encryption (E2EE) is a communication system in which data is encrypted by the sender and can only be decrypted by the intended recipients, so that intermediaries cannot read it.
ByteSync leverages HTTPS and E2EE. This allows clients to communicate with each other through the server in a bidirectional manner, without the server or any other unauthorized entity being able to read or modify the messages exchanged.

Advantage 1: ByteSync, an easy to deploy solution

ByteSync is a software solution with a client-server architecture where the server part is hosted in the Cloud. Each client itself establishes an outbound, bidirectional HTTPS connection with the ByteSync server.

To get the client software, it just needs to be downloaded from the Customer Area. Two versions will be available: an “installer” version, which can be installed in just a few steps, and a “portable” version, which only requires decompression.

For ByteSync to work, the Microsoft .NET 6 framework must be installed on the machine beforehand. Downloading and installing it (about 200 MB) is usually fast on modern computers and networks.

There are no services to install or configure and only one piece of software is running on each machine. There are no specific settings to be made in firewalls and proxies for the software to work: the software does not require any redirection or opening of incoming ports.
On highly secured networks where domains that are not explicitly authorized are not accessible, you will have to authorize access to a few Azure servers whose domain names are indicated in the documentation. Only outgoing HTTPS connections are used by the software.

Thus, the deployment is very simple and a few minutes are usually enough to get ByteSync up and running.

The 4 steps to synchronize remote data with ByteSync

This is a game changer, as such simplicity is rare among backup solutions. Most of the time, the implementation is more complex. Deployment of several software components and configuration of network equipment are frequently required. It is often necessary to check and correct the settings several times to get the solution to work properly. And even when everything is set up correctly, the system must be kept running and secure with software updates and audits of network rules and settings.

But, let’s say you have a critical or urgent need to perform an integrity check or synchronize data between remote computers or networks for which either no solution has been provided or the deployed solutions are too slow or not suitable. With ByteSync, it will only take a few minutes for the system to be fully operational and for data analysis and synchronization to begin.

Advantage 2: ByteSync, secure sessions for up to 5 clients

Once the deployment is done, all you have to do is launch the software and perform a few operations to create or join a synchronization session.
In ByteSync, a session can be seen as a secure room that participants join. The creator initiates the session and the other members join it after authenticating themselves. Each participant has a real-time view of the data selected by every other member and of the progress of their analysis and synchronization operations.
A single session can gather up to 5 clients who can select different data to compare and synchronize them. Thus, a wide variety of use cases can be implemented, with great flexibility.

To start, here are 3 examples of one-to-one synchronization, with 2 clients A and B:

  • If each participant selects a directory, it will be possible to perform a mirror synchronization from A to B. It will also be possible to make a backup from A to B if you indicate that you want to keep on B the files that have been deleted on A.
  • If each participant selects a directory, it will be possible to synchronize some files from A to B and some files from B to A.
  • If the participants select different directories or files, it will be possible to check which data are present on both A and B and to synchronize the differences with ease.

As soon as 3 or more members take part in a session, it will be possible to perform star synchronizations. This means that it will be possible to synchronize data asymmetrically, with members being both a data source for some files and a data destination for other files.

Here are some examples:

  • With 3 participants who will each select a directory, it will be possible to perform a mirror synchronization from A to B and C. It will also be possible to make a backup from A to B and C if you indicate that you want to keep files on B and C that would be deleted on A.
  • With 4 participants it will be possible to synchronize part of the data from A to B, part of the data from B to C, part of the data from C to D and part of the data from D to A.
  • With 5 participants selecting different directories or files, it will be possible to control which data from A are present on B and C, and which data from A are present on D and E. It will then be possible to synchronize part of A’s data on B and C and another part on D and E.
  • With 5 participants selecting different directories or files, it will be possible to control and synchronize data between one cluster (A and B) and another cluster (C, D and E).
Up to 5 clients can securely synchronize data during a ByteSync session

Security is at the heart of ByteSync and its session system. Connections between the clients and the server always use the HTTPS protocol, which encrypts communications over the Internet. Moreover, the data exchanged between the members of the same session is secured by dedicated end-to-end encryption (E2EE). Thus, only the participants are able to read or write shared data. This system prevents intrusions by unauthorized users or software (e.g. man-in-the-middle attacks). Even the ByteSync server is unable to read the data; it can only relay it blindly to the clients.

Advantage 3: ByteSync, a fast and accurate data analysis

Usually, before any software synchronizes files, it starts with a data inventory of the sources and destinations to determine the differences that need to be corrected. Depending on the data tree and the protocol used, this inventory may require many exchanges of information between the endpoints involved. For example, in the case of an analysis via the FTP protocol, there are about as many remote requests as there are directories in the destination repositories. To avoid having to perform this analysis systematically, some software records the structure of the remote repository. This saves time by avoiding the initial analysis of the remote data, but it requires that no changes are made to the destination repositories between backups, which is not always applicable.

When the repositories are analyzed remotely, another problem arises. Most of the time, only the last modification dates and file sizes are inventoried, and no information about the content is recorded. Therefore, only the last modification dates and the file sizes can be compared, and the files will be considered identical if these properties are identical. However, there is no guarantee of this. The content of one of the files could have been modified by software, malicious or not, without its last modification date or its size having changed. It is also possible that some blocks of the hard disk are defective and prevent the file from being read correctly. Of course, if there is no known threat or hardware defect, identical last modification dates and sizes can be considered sufficient proof that the files are identical. But if there is any doubt, how can the operator in charge of the backup be sure when the protocol or software used does not allow the content to be checked?
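
The limitation described above can be reproduced in a few lines. The following Python sketch is only an illustration: the helper names (same_metadata, same_content, file_digest, demo) and the sample file contents are made up for the example. It creates two files with the same size and the same modification time but different content, then shows that a metadata comparison reports them as identical while a content hash does not.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks to keep memory use bounded."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_metadata(path_a, path_b):
    """Compare only size and last modification time (to the second)."""
    sa, sb = os.stat(path_a), os.stat(path_b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def same_content(path_a, path_b):
    """Compare the actual bytes through their digests."""
    return file_digest(path_a) == file_digest(path_b)


def demo():
    """Return (metadata_equal, content_equal) for two crafted files."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        with open(a, "wb") as f:
            f.write(b"invoice total: 1000")
        with open(b, "wb") as f:
            f.write(b"invoice total: 9000")  # same length, different content
        stamp = 1_600_000_000  # arbitrary timestamp, hypothetical value
        os.utime(a, (stamp, stamp))
        os.utime(b, (stamp, stamp))
        return same_metadata(a, b), same_content(a, b)
```

Calling demo() is expected to return a pair in which the metadata comparison succeeds and the content comparison fails, which is exactly the blind spot of date-and-size inventories.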

Screenshot of a development version of ByteSync

ByteSync uses the Cloud and rsync to perform an iterative and adaptive repository analysis based on the following principles:

  • Each client takes inventory of its own data. Inventories are done in parallel on each client machine.
  • Inventories are exchanged between clients via the Cloud.
  • When a file is present on only one repository, it will have to be transferred in its entirety if it is synchronized, so there is no need to calculate its rsync signature.
  • When files are duplicated and their last modification date or size differs, their rsync signature will be calculated and the inventories will be updated.
  • When files are duplicated and their last modification date and size are equal, the behavior will depend on the analysis parameters defined by the users.
    In checksum mode, their rsync signature will be calculated and the inventories will be updated.
    In standard mode, they will be considered identical and the rsync signature will not be calculated automatically. But in this mode, once the inventories are completed and displayed, users can request on a case-by-case basis that the rsync signature of certain files be calculated.

Analysis time is optimized because inventories are done locally, in parallel, and because rsync signatures are calculated only as needed.
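
The decision rules listed above can be summarized as a small function. This is a hedged sketch based only on the principles stated in this post, not on ByteSync’s source code; the names Mode, Entry and needs_signature are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    STANDARD = "standard"
    CHECKSUM = "checksum"


@dataclass(frozen=True)
class Entry:
    """Inventory metadata for one copy of a file (hypothetical structure)."""
    size: int
    mtime: int


def needs_signature(copies: list[Entry], mode: Mode) -> bool:
    """Decide whether a signature should be computed automatically."""
    if len(copies) < 2:
        # Present on a single repository: it would be transferred in full,
        # so a signature brings nothing.
        return False
    first = copies[0]
    metadata_equal = all(copy == first for copy in copies[1:])
    if not metadata_equal:
        # Size or last modification date differs: compute the signature.
        return True
    # Same size and date: only checksum mode forces the calculation.
    return mode is Mode.CHECKSUM
```

In standard mode, the function returns False for copies with identical metadata, which corresponds to the case where users may still request a signature manually for specific files.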

Thanks to the Cloud, the analysis of ByteSync data is decentralized and calculates rsync signatures dynamically

Advantage 4: ByteSync, fast synchronizations and resynchronizations

ByteSync leverages the flexibility of the Cloud and the power of rsync to optimize file transfers between session members.
To transfer a file to one or more recipients, the sender breaks the file into chunks that are each encrypted with AES-256. Each chunk is then sent and stored on the ByteSync server waiting to be downloaded. Once all recipients have downloaded the chunk, it is deleted from the ByteSync server. This mode of operation offers at least 3 advantages:

  • If the upload or download of a chunk fails, it is sufficient to retry the operation for that chunk only and not for the whole file.
  • If a file has several recipients, it is uploaded by the sender only once. Each recipient then downloads the same parts.
  • The chunks stored on the server are systematically encrypted with the session’s end-to-end encryption system. They are therefore only readable and exploitable by the members of the session. The chunks of files are deleted from the server as soon as they have been downloaded by the recipients.
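
The buffering behavior described in these three points can be sketched with an in-memory stand-in for the server. This is a simplified illustration under stated assumptions: the class and function names (ChunkBuffer, split_into_chunks) are hypothetical, encryption is deliberately left out (payloads are treated as opaque bytes that the sender is assumed to have encrypted already), and the real service is of course not an in-process dictionary.

```python
from __future__ import annotations


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class ChunkBuffer:
    """In-memory stand-in for the server-side buffer (hypothetical)."""

    def __init__(self) -> None:
        self._chunks: dict[int, bytes] = {}      # chunk index -> payload
        self._pending: dict[int, set[str]] = {}  # chunk index -> recipients left

    def upload(self, index: int, payload: bytes, recipients: list[str]) -> None:
        """Store a chunk once, whatever the number of recipients."""
        self._chunks[index] = payload
        self._pending[index] = set(recipients)

    def download(self, index: int, recipient: str) -> bytes:
        """Serve a chunk and delete it once every recipient has fetched it."""
        payload = self._chunks[index]
        self._pending[index].discard(recipient)
        if not self._pending[index]:
            del self._chunks[index]
            del self._pending[index]
        return payload

    def stored_chunks(self) -> int:
        """Number of chunks currently held in the buffer."""
        return len(self._chunks)
```

Because each chunk is addressed by its index, a failed upload or download can be retried for that index alone, and a chunk uploaded once can be served to every recipient before being discarded.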

If the file to be transferred is already present on several clients, the rsync technology will come into play to optimize the resynchronization. Thanks to the rsync signatures calculated during the analysis phase, ByteSync will determine which parts of the file should be transferred to reconstitute the file on the recipients’ side. This system is particularly efficient when the copies of a large file differ only in a minor way. Instead of retransferring the entire contents of the file, only minimal parts containing the differences will be sent by the sender to the recipients.
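
As a rough illustration of this delta principle, here is a simplified block-matching sketch in Python. It is written in the spirit of rsync but is not the rsync algorithm itself and makes no claim about ByteSync’s implementation: it hashes every window with SHA-256 instead of using a rolling checksum, so it is slow on large inputs, and all names (signature, make_delta, apply_delta) are hypothetical.

```python
from __future__ import annotations

import hashlib


def signature(old: bytes, block_size: int) -> dict[bytes, int]:
    """Map the hash of each full block of the old copy to its block index."""
    sig: dict[bytes, int] = {}
    for idx in range(len(old) // block_size):
        block = old[idx * block_size:(idx + 1) * block_size]
        sig.setdefault(hashlib.sha256(block).digest(), idx)
    return sig


def make_delta(new: bytes, sig: dict[bytes, int], block_size: int) -> list[tuple]:
    """Describe new as ('copy', block_index) and ('data', literal_bytes) steps."""
    ops: list[tuple] = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        idx = None
        if len(window) == block_size:
            idx = sig.get(hashlib.sha256(window).digest())
        if idx is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", idx))  # the recipient already has this block
            pos += block_size
        else:
            literal.append(new[pos])   # this byte must actually be sent
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: list[tuple], block_size: int) -> bytes:
    """Rebuild the new copy on the recipient from its old copy and the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

In this sketch, the recipient computes the signature of its old copy, the sender uses it to build the delta from the new copy, and only the literal "data" steps carry file content over the network.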

Finally, and following on from what was discussed in the previous chapter on data analysis, the synchronization can be bidirectional, and even multidirectional when the session has 3 or more members.

Presentation of the Secure Buffer Zone where the data exchanged during the ByteSync synchronization is stored

An efficient remote data synchronization solution

The solution operates rsync over the internet and it goes further. In this post, we have seen four major features of ByteSync, achieved with the use of the Cloud, rsync or the joint operation of both.

The easy deployment of the solution will allow IT technicians to implement it very quickly, which will be an advantage for regular use as well as for a specific and targeted need. Its modern and secure session system will facilitate the connection of users who, thanks to its innovative analysis engine, will have relevant information at their disposal in a minimal time. Once the synchronization choices have been made, data transfers will be optimized thanks to the buffering on the server and the resynchronization technology of rsync.

All this makes ByteSync a unique solution in the data backup market. But the list of benefits of ByteSync does not end there. The software is packed with other features and characteristics designed to make it simple, fast and lightweight. Our goal is to make it one of the best backup and synchronization solutions.

As I mentioned at the beginning of the article, the software is still under development. In the next few weeks, we will confirm which operating systems will be supported. Our goal is for ByteSync to run interoperably on Windows, Linux and Mac so that everyone can use it on their production and backup environments. In addition, we will also communicate the details of the Beta program that will start in early 2022.
