rsync does not sync files that have the same size and modification times less than 1 second apart

Operations · xiaomudk · 2015-10-28

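1. Problem

Today, while calling rsync from Python to synchronize files, I found that rsync sometimes did not sync a file.

The Python code is too long to post here, so here is a shell script that demonstrates the problem: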

# cat test.sh

#!/bin/bash

echo "123456" > 1.txt
rsync -av 1.txt /tmp/1.txt

echo "abcdef" > 2.txt
rsync -av 2.txt /tmp/1.txt

Run this script a few times and you will see that 2.txt is sometimes not synced to /tmp/1.txt, as in the output below:

# sh test.sh  
sending incremental file list
1.txt

sent 80 bytes  received 31 bytes  222.00 bytes/sec
total size is 7  speedup is 0.06
sending incremental file list

sent 30 bytes  received 12 bytes  84.00 bytes/sec
total size is 7  speedup is 0.17

2. Solution

I checked the rsync manual and found that adding the -c option solves this problem:

-c, --checksum
    This changes the way rsync checks if the files have been changed and are in need of a transfer.
Without this option, rsync  uses  a "quick  check" that  (by default)  checks if each file's size 
and time of last modification match between the sender and receiver.  This option changes this to 
compare a 128-bit check-sum for each file that has a matching size.  Generating the checksums 
means that both sides will expend a lot of disk I/O reading all the data in the files  in the 
transfer (and this is prior to any reading that will be done to transfer changed files), so this 
can slow things down significantly.

    The sending side generates its checksums while it is doing the file-system scan that builds the
list of the available files.  The receiver generates its check-sums when it is scanning for changed 
files, and will checksum any file that has the same size as the corresponding sender’s file:  
files with either a  changed size or a changed checksum are selected for transfer.

    Note  that  rsync always verifies that each transferred file was correctly reconstructed on the
receiving side by checking a whole-file checksum that is generated as the file is transferred, but 
that automatic after-the-transfer verification has nothing to do with this option's 
before-the-transfer  "Does  this  file need to be updated?" check.

    For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5.  
For older protocols, the checksum used is MD4.

So without the -c option, rsync uses the quick check: if a file's size and modification time both match, rsync treats the file as unchanged and never looks at its contents.
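The quick check described above can be modeled roughly as follows. This is a minimal sketch, not rsync's actual code: `quick_check_same` is a hypothetical helper name, it assumes GNU `stat`, and it compares mtimes in whole seconds, which is the behavior the test later in this post observes.

```shell
#!/bin/bash
# Rough model of rsync's default quick check (an illustration, not rsync's code).
# quick_check_same is a hypothetical helper name.
# Assumes GNU stat: %s is the size in bytes, %Y the mtime in whole seconds.
quick_check_same() {
    local src=$1 dst=$2
    # No destination file yet: it has to be transferred.
    [ -e "$dst" ] || return 1
    # Different sizes: the file is considered changed.
    [ "$(stat -c %s "$src")" = "$(stat -c %s "$dst")" ] || return 1
    # Same size and same whole-second mtime: treated as unchanged.
    [ "$(stat -c %Y "$src")" = "$(stat -c %Y "$dst")" ]
}

# Usage:
#   if quick_check_same 2.txt /tmp/1.txt; then echo "skipped"; else echo "transfer"; fi
```

In the test.sh example, `rsync -a` preserves the mtime of 1.txt on /tmp/1.txt, and 2.txt has the same size, so whenever 2.txt is written within the same second a check like this reports "unchanged".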

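OK, so here is the question: how precisely does rsync compare file modification times? Down to the second?

I went back to the official documentation but found nothing about this. Searching Google for the question turned up only the following answer: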

rsync uses the utime() call which sets the modification time of a file down to 1 second resolution. So, effectively, files that are the same up to the second, are considered the same for the time comparison piece of rsync’s checks.

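Source: rsync time comparison – what’s the precision of the Modified times comparison

Since the documentation does not cover this and I cannot follow the rsync source code, I had to test it myself.

3. rsync timestamp precision test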

(1). Prepare the files

I prepared two files with some content in them:

# cat 1.txt
123456
123456
123456
123456
# cat 2.txt 
abcdef
abcdef
abcdef
abcdef

(2). Timestamps less than one second apart

The two files are the same size. Now set their timestamps:

# touch -m -d "2015-10-28 16:20:30.000000000" 1.txt
# stat 1.txt 
  File: `1.txt'
  Size: 28              Blocks: 8          IO Block: 4096   regular file
Device: 803h/2051d      Inode: 2228580     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-10-28 16:29:42.047297095 +0800
Modify: 2015-10-28 16:20:30.000000000 +0800
Change: 2015-10-28 16:27:37.876298708 +0800

# touch -m -d "2015-10-28 16:20:30.999999999" 2.txt
# stat 2.txt  
  File: `2.txt'
  Size: 28              Blocks: 8          IO Block: 4096   regular file
Device: 803h/2051d      Inode: 2230112     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-10-28 16:29:44.974308899 +0800
Modify: 2015-10-28 16:20:30.999999999 +0800
Change: 2015-10-28 16:28:00.651298688 +0800

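The mtimes of the two files now fall in the same second and differ only in the nanoseconds. Run a sync to test:

# rsync -v 1.txt 2.txt  

sent 26 bytes  received 12 bytes  76.00 bytes/sec
total size is 28  speedup is 0.74

As the output shows, nothing was transferred.

(3). Timestamps one second apart

Set the mtime of 2.txt to be 1 second later than that of 1.txt: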

# touch -m -d "2015-10-28 16:20:35.000000000" 1.txt 
# touch -m -d "2015-10-28 16:20:36.000000000" 2.txt

Sync again:

# rsync -v 1.txt 2.txt
1.txt

sent 97 bytes  received 31 bytes  256.00 bytes/sec
total size is 28  speedup is 0.22

This time the file was transferred.

(4). Conclusion

In this test, rsync compares file modification times at one-second precision!
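To repeat this test on another rsync version or filesystem, the manual steps can be wrapped in a small script. The following is only a sketch: `probe_quick_check` is a hypothetical helper name, and it assumes GNU `touch`, `cmp`, and an `rsync` binary on the PATH. It deliberately states no expected result for the sub-second case, since that is what is being measured.

```shell
#!/bin/bash
# Sketch: does rsync's quick check notice the difference between two mtimes?
# probe_quick_check is a hypothetical helper name, not part of rsync.
# Prints "transferred" or "skipped"; returns 2 if setup or rsync itself fails.
probe_quick_check() {
    local src_time=$1 dst_time=$2 dir result
    dir=$(mktemp -d) || return 2
    printf '123456\n' > "$dir/src.txt"
    printf 'abcdef\n' > "$dir/dst.txt"      # same size, different content
    touch -m -d "$src_time" "$dir/src.txt"
    touch -m -d "$dst_time" "$dir/dst.txt"
    if ! rsync "$dir/src.txt" "$dir/dst.txt" > /dev/null; then
        rm -rf "$dir"
        return 2
    fi
    # If rsync copied the file, the contents are now identical.
    if cmp -s "$dir/src.txt" "$dir/dst.txt"; then
        result=transferred
    else
        result=skipped
    fi
    rm -rf "$dir"
    echo "$result"
}

# Usage:
#   probe_quick_check "2015-10-28 16:20:30.000000000" "2015-10-28 16:20:30.999999999"
#   probe_quick_check "2015-10-28 16:20:35.000000000" "2015-10-28 16:20:36.000000000"
```

The first call reproduces step (2) and the second reproduces step (3). Comparing the two printed results shows whether the rsync being tested ignores sub-second differences.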

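4. Summary

By default rsync first compares file modification time and size. This greatly reduces the time spent comparing files, but it also sacrifices accuracy.
When file contents need to be compared strictly, it is best to add the -c option. The drawback is that the comparison takes longer.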
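For comparison with the quick check, the decision that -c makes can be modeled roughly like this. It is a sketch under stated assumptions, not rsync's code: `needs_transfer_checksum` is a hypothetical helper name, it assumes GNU `stat` and `md5sum`, and it uses MD5 because the manual excerpt quoted earlier names MD5 for protocol 30 and later.

```shell
#!/bin/bash
# Rough model of the --checksum decision (an illustration, not rsync's code).
# needs_transfer_checksum is a hypothetical helper name.
# Returns 0 if the file should be transferred, 1 if it can be skipped.
needs_transfer_checksum() {
    local src=$1 dst=$2 src_sum dst_sum
    # No destination file yet: transfer.
    [ -e "$dst" ] || return 0
    # A changed size selects the file for transfer without checksumming.
    [ "$(stat -c %s "$src")" = "$(stat -c %s "$dst")" ] || return 0
    # Same size: read both files in full and compare checksums.
    # The modification time plays no part in this decision.
    src_sum=$(md5sum < "$src")
    dst_sum=$(md5sum < "$dst")
    [ "$src_sum" != "$dst_sum" ]
}

# Usage:
#   if needs_transfer_checksum 2.txt /tmp/1.txt; then echo "transfer"; else echo "skipped"; fi
```

Both files are read in full, which is where the extra comparison time comes from. In the test.sh script from section 1, the actual fix is simply to run `rsync -avc 2.txt /tmp/1.txt`.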

