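1. Problem
Today, while calling rsync from Python to synchronize files, I noticed that rsync sometimes failed to sync a file.
The Python code is too long to post here, so I'll reproduce the issue with a shell script: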
# cat test.sh
#!/bin/bash
echo "123456" > 1.txt
rsync -av 1.txt /tmp/1.txt
echo "abcdef" > 2.txt
rsync -av 2.txt /tmp/1.txt
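Run this script a few times and you will see that 2.txt is sometimes not synced to /tmp/1.txt, as in the output below, where the second rsync run lists no file: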
# sh test.sh
sending incremental file list
1.txt
sent 80 bytes received 31 bytes 222.00 bytes/sec
total size is 7 speedup is 0.06
sending incremental file list
sent 30 bytes received 12 bytes 84.00 bytes/sec
total size is 7 speedup is 0.17
2. Solution
I checked the rsync manual, and it turns out that adding the -c option solves this problem:
-c, --checksum
This changes the way rsync checks if the files have been changed and are in need of a transfer.
Without this option, rsync uses a "quick check" that (by default) checks if each file's size
and time of last modification match between the sender and receiver. This option changes this to
compare a 128-bit checksum for each file that has a matching size. Generating the checksums
means that both sides will expend a lot of disk I/O reading all the data in the files in the
transfer (and this is prior to any reading that will be done to transfer changed files), so this
can slow things down significantly.
The sending side generates its checksums while it is doing the file-system scan that builds the
list of the available files. The receiver generates its checksums when it is scanning for changed
files, and will checksum any file that has the same size as the corresponding sender’s file:
files with either a changed size or a changed checksum are selected for transfer.
Note that rsync always verifies that each transferred file was correctly reconstructed on the
receiving side by checking a whole-file checksum that is generated as the file is transferred, but
that automatic after-the-transfer verification has nothing to do with this option's
before-the-transfer "Does this file need to be updated?" check.
For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5.
For older protocols, the checksum used is MD4.
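To make the pre-transfer decision in the excerpt more concrete, here is a minimal local sketch of it using GNU stat and md5sum. The function name needs_transfer_checksum is made up for illustration, and this only approximates the idea described in the manual; it is not rsync's actual implementation.

```shell
# Hypothetical helper, not part of rsync: mimics the "-c" decision locally.
# Returns 0 (transfer needed) when the destination is missing, the sizes
# differ, or the MD5 digests of the contents differ.
needs_transfer_checksum() {
    local src=$1 dst=$2 a b
    # A missing destination always needs a transfer.
    [ -e "$dst" ] || return 0
    # Different sizes: the file has changed, no need to hash anything.
    if [ "$(stat -c %s "$src")" != "$(stat -c %s "$dst")" ]; then
        return 0
    fi
    # Same size: compare MD5 digests of the contents.
    a=$(md5sum < "$src")
    b=$(md5sum < "$dst")
    [ "$a" != "$b" ]
}
```

For two same-sized files with different contents, such as the 1.txt and 2.txt written by test.sh, this check reports that a transfer is needed regardless of their timestamps.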
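So without the -c option, rsync uses the quick check: if the size and modification time of a file match on both sides, rsync assumes the file is unchanged and never compares the contents. In test.sh, 1.txt and 2.txt are both 7 bytes and are written at almost the same moment, which is why the second sync is sometimes skipped.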
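The quick check can be approximated locally with a few lines of shell. This is only a sketch: the function name quick_check_unchanged is hypothetical, it relies on GNU stat and GNU touch, and comparing the mtime in whole seconds is an assumption made for illustration rather than something taken from rsync's source.

```shell
# Hypothetical helper: approximates rsync's default "quick check" locally.
# Returns 0 when the pair would be treated as unchanged, i.e. same size and
# same mtime in whole seconds (assumed granularity).
quick_check_unchanged() {
    local src=$1 dst=$2
    [ -e "$dst" ] || return 1
    # %s = size in bytes, %Y = mtime in whole seconds since the epoch (GNU stat).
    [ "$(stat -c '%s %Y' "$src")" = "$(stat -c '%s %Y' "$dst")" ]
}

# Demo in a scratch directory: same size, mtimes inside the same second.
demo_dir=$(mktemp -d)
printf '123456\n' > "$demo_dir/1.txt"
printf 'abcdef\n' > "$demo_dir/2.txt"
touch -m -d "2015-10-28 16:20:30.000000000" "$demo_dir/1.txt"
touch -m -d "2015-10-28 16:20:30.999999999" "$demo_dir/2.txt"
if quick_check_unchanged "$demo_dir/1.txt" "$demo_dir/2.txt"; then
    echo "treated as unchanged, would be skipped"
fi
rm -rf "$demo_dir"
```

The two demo files have different contents, yet under this size-plus-seconds comparison they look identical, which matches the symptom seen with test.sh.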
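OK, so here is the question: at what precision does rsync compare a file's modification time? Is it seconds?
I went back to the official documentation but found nothing on this. Searching Google only turned up the following answer: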
rsync uses the utime() call which sets the modification time of a file down to 1 second resolution. So, effectively, files that are the same up to the second, are considered the same for the time comparison piece of rsync’s checks.
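Source: rsync time comparison – what’s the precision of the Modified times comparison
Since the documentation says nothing about this and I can't follow rsync's source code, I had to test it myself.
3. rsync timestamp precision test
(1). Prepare the files
I prepared two files with some content in each: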
# cat 1.txt
123456
123456
123456
123456
# cat 2.txt
abcdef
abcdef
abcdef
abcdef
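(2). Timestamps differ by less than one second
The two files are the same size. Now set their mtimes to the same second, differing only in the nanoseconds: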
# touch -m -d "2015-10-28 16:20:30.000000000" 1.txt
# stat 1.txt
File: `1.txt'
Size: 28 Blocks: 8 IO Block: 4096 regular file
Device: 803h/2051d Inode: 2228580 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2015-10-28 16:29:42.047297095 +0800
Modify: 2015-10-28 16:20:30.000000000 +0800
Change: 2015-10-28 16:27:37.876298708 +0800
# touch -m -d "2015-10-28 16:20:30.999999999" 2.txt
# stat 2.txt
File: `2.txt'
Size: 28 Blocks: 8 IO Block: 4096 regular file
Device: 803h/2051d Inode: 2230112 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2015-10-28 16:29:44.974308899 +0800
Modify: 2015-10-28 16:20:30.999999999 +0800
Change: 2015-10-28 16:28:00.651298688 +0800
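With the two mtimes in the same second and differing only in the nanoseconds, run a sync to test:
# rsync -v 1.txt 2.txt
sent 26 bytes received 12 bytes 76.00 bytes/sec
total size is 28 speedup is 0.74
As the output shows, nothing was synced.
(3). Timestamps differ by one second
Now set the mtime of 2.txt to differ from that of 1.txt by 1s: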
# touch -m -d "2015-10-28 16:20:35.000000000" 1.txt
# touch -m -d "2015-10-28 16:20:36.000000000" 2.txt
Sync again:
# rsync -v 1.txt 2.txt
1.txt
sent 97 bytes received 31 bytes 256.00 bytes/sec
total size is 28 speedup is 0.22
This time the file was synced.
(4). Conclusion
In this test, rsync compared file modification times at one-second precision!
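The manual steps above can be wrapped into a small repeatable probe. This is a sketch under assumptions: the function name probe_rsync_mtime is hypothetical, it needs GNU touch and an rsync binary on the PATH, and it works in a temporary directory instead of the current one.

```shell
# Hypothetical helper: creates two same-sized files with the given mtimes,
# runs rsync from the first onto the second, and prints "synced" if the
# destination now has the source's contents, "skipped" otherwise.
probe_rsync_mtime() {
    local src_time=$1 dst_time=$2 dir
    dir=$(mktemp -d) || return 1
    printf '123456\n' > "$dir/1.txt"
    printf 'abcdef\n' > "$dir/2.txt"
    touch -m -d "$src_time" "$dir/1.txt"
    touch -m -d "$dst_time" "$dir/2.txt"
    # Same invocation as in the manual test, with the output silenced.
    rsync -v "$dir/1.txt" "$dir/2.txt" > /dev/null
    if cmp -s "$dir/1.txt" "$dir/2.txt"; then
        echo "synced"
    else
        echo "skipped"
    fi
    rm -rf "$dir"
}
```

Calling probe_rsync_mtime "2015-10-28 16:20:30.000000000" "2015-10-28 16:20:30.999999999" repeats the sub-second case, and probe_rsync_mtime "2015-10-28 16:20:35" "2015-10-28 16:20:36" repeats the one-second case. On the setup tested above these correspond to "skipped" and "synced" respectively; the sub-second result may differ with other rsync versions or filesystems.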
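4. Summary
By default rsync first compares file modification time and file size. This greatly reduces the time spent comparing files, but it also sacrifices accuracy.
When file contents must be compared strictly, it is best to add the -c option. The drawback is that the comparison takes longer.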
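For reference, here is a sketch of the original reproduction with -c added. It is wrapped in a function with a made-up name, checksum_sync_demo, and uses a temporary directory and a dest.txt target instead of the current directory and /tmp/1.txt; those choices are assumptions for the sake of a self-contained example.

```shell
# Variant of test.sh with -c (--checksum): same-sized files are compared by
# checksum instead of by size and mtime alone.
# checksum_sync_demo and dest.txt are hypothetical names for illustration.
checksum_sync_demo() {
    local dir
    dir=$(mktemp -d) || return 1
    echo "123456" > "$dir/1.txt"
    rsync -ac "$dir/1.txt" "$dir/dest.txt"
    echo "abcdef" > "$dir/2.txt"
    # Same size and possibly the same second, but the checksum differs,
    # so rsync transfers the file.
    rsync -ac "$dir/2.txt" "$dir/dest.txt"
    cat "$dir/dest.txt"
    rm -rf "$dir"
}
```

With -c in place, the destination should end up with the contents of 2.txt no matter how quickly the two files were written.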