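My knowledge is limited and much of this is borrowed from others, so corrections are welcome.
Today I came across a blog post about how PowerShell's handling of stdout causes the error "archive/tar: invalid tar header" when exporting and importing Docker images across platforms, so I decided to dig into the cause in detail.
Here is a simple script to compare three ways of writing a tar file: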
# Create the test directory and switch to it
$testDir = Join-Path (Get-Location) "tar-test"
New-Item -ItemType Directory -Path $testDir -Force | Out-Null
Set-Location $testDir
# Create the test files
"Hello World" | Out-File "test.txt"
"Test Data" | Out-File "data.txt"
# Create three tar files with the tar command
# Method 1: redirect tar's stdout with >
tar -cf - test.txt data.txt > "redirect.tar"
# Method 2: let tar write the file itself with -f
tar -cf "direct.tar" test.txt data.txt
# Method 3: capture stdout in a variable, then write it out with WriteAllBytes
# (the output file is still named setcontent.tar, but Set-Content is not used)
$tarOutput = tar -cf - test.txt data.txt
[System.IO.File]::WriteAllBytes("$testDir\setcontent.tar", [System.Text.Encoding]::UTF8.GetBytes($tarOutput))
function Show-TarHeader {
    param([string]$filePath)
    if (!(Test-Path $filePath)) {
        Write-Host "File not found: $filePath"
        return
    }
    try {
        $bytes = [System.IO.File]::ReadAllBytes($filePath)
        Write-Host "`nFile: $filePath"
        Write-Host "Size: $($bytes.Length) bytes"
        # A tar header block is 512 bytes
        $headerSize = [Math]::Min(512, $bytes.Length)
        Write-Host "First $headerSize bytes of TAR header:"
        # Print 16 bytes per line
        for ($i = 0; $i -lt $headerSize; $i += 16) {
            $line = $bytes[$i..([Math]::Min($i + 15, $headerSize - 1))]
            $hex = ($line | ForEach-Object { $_.ToString("X2") }) -join " "
            $ascii = ($line | ForEach-Object {
                    if ([char]::IsControl($_) -or $_ -gt 127) { "." } else { [char]$_ }
                }) -join ""
            $addr = $i.ToString("X4")
            # Pad the hex column to a fixed width
            $hex = $hex.PadRight(48)
            Write-Host "$addr $hex $ascii"
        }
        # Look for the "ustar" magic at offset 257.
        # This only inspects the first header, so it is a sanity check,
        # not proof that the whole archive is intact.
        if ($bytes.Length -ge 262) {
            # The file is long enough to contain the marker
            $ustarBytes = $bytes[257..261]
            $ustarString = [System.Text.Encoding]::ASCII.GetString($ustarBytes)
            if ($ustarString -eq "ustar") {
                Write-Host "`nValid TAR format: True"
                Write-Host "Found valid 'ustar' marker"
            }
            else {
                Write-Host "`nValid TAR format: False"
                Write-Host "Invalid or missing 'ustar' marker"
            }
        }
        else {
            Write-Host "`nValid TAR format: False"
            Write-Host "File too short for TAR format"
        }
    }
    catch {
        Write-Host "Error processing file: $_"
    }
    Write-Host "-------------------"
}
Show-TarHeader "$testDir\redirect.tar"
Show-TarHeader "$testDir\direct.tar"
Show-TarHeader "$testDir\setcontent.tar"
# Compare the file sizes
Write-Host "`nFile sizes:"
Get-Item "$testDir\redirect.tar", "$testDir\direct.tar", "$testDir\setcontent.tar" | Select-Object Name, Length | Format-Table
# Try to extract each archive to verify it
Write-Host "`nTesting tar extraction..."
$extractDirs = @("redirect", "direct", "setcontent")
foreach ($dir in $extractDirs) {
    $extractPath = Join-Path $testDir $dir
    New-Item -ItemType Directory -Path $extractPath -Force | Out-Null
    Push-Location $extractPath
    try {
        $tarFile = Join-Path $testDir "$dir.tar"
        tar -xf $tarFile
        Write-Host "Extracted files in ${dir}:"
        Get-ChildItem | Select-Object Name, Length | Format-Table
    }
    catch {
        Write-Host "Error extracting $dir.tar: $_"
    }
    Pop-Location
}
# Clean up
Set-Location ..
Remove-Item $testDir -Recurse -Force
Write-Host "Test files cleaned up."
Result:
File: D:\temp\tar-test\redirect.tar
Size: 20490 bytes
First 512 bytes of TAR header:
0000 FF FE 74 00 65 00 73 00 74 00 2E 00 74 00 78 00 ..t.e.s.t...t.x.
0010 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 t...............
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00C0 00 00 00 00 00 00 00 00 00 00 30 00 30 00 30 00 ..........0.0.0.
00D0 36 00 36 00 36 00 20 00 00 00 30 00 30 00 30 00 6.6.6. ...0.0.0.
00E0 30 00 30 00 30 00 20 00 00 00 30 00 30 00 30 00 0.0.0. ...0.0.0.
00F0 30 00 30 00 30 00 20 00 00 00 30 00 30 00 30 00 0.0.0. ...0.0.0.
0100 30 00 30 00 30 00 30 00 30 00 30 00 33 00 34 00 0.0.0.0.0.0.3.4.
0110 20 00 31 00 34 00 37 00 33 00 33 00 34 00 37 00 .1.4.7.3.3.4.7.
0120 36 00 35 00 32 00 31 00 20 00 30 00 31 00 30 00 6.5.2.1. .0.1.0.
0130 37 00 36 00 31 00 00 00 20 00 30 00 00 00 00 00 7.6.1... .0.....
0140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Valid TAR format: False
Invalid or missing 'ustar' marker
-------------------
File: D:\temp\tar-test\direct.tar
Size: 3072 bytes
First 512 bytes of TAR header:
0000 74 65 73 74 2E 74 78 74 00 00 00 00 00 00 00 00 test.txt........
0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060 00 00 00 00 30 30 30 36 36 36 20 00 30 30 30 30 ....000666 .0000
0070 30 30 20 00 30 30 30 30 30 30 20 00 30 30 30 30 00 .000000 .0000
0080 30 30 30 30 30 33 34 20 31 34 37 33 33 34 37 36 0000034 14733476
0090 35 32 31 20 30 31 30 37 36 31 00 20 30 00 00 00 521 010761. 0...
00A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0100 00 75 73 74 61 72 00 30 30 00 00 00 00 00 00 00 .ustar.00.......
0110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0140 00 00 00 00 00 00 00 00 00 30 30 30 30 30 30 20 .........000000
0150 00 30 30 30 30 30 30 20 00 00 00 00 00 00 00 00 .000000 ........
0160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Valid TAR format: True
Found valid 'ustar' marker
-------------------
File: D:\temp\tar-test\setcontent.tar
Size: 10246 bytes
First 512 bytes of TAR header:
0000 74 65 73 74 2E 74 78 74 00 00 00 00 00 00 00 00 test.txt........
0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060 00 00 00 00 30 30 30 36 36 36 20 00 30 30 30 30 ....000666 .0000
0070 30 30 20 00 30 30 30 30 30 30 20 00 30 30 30 30 00 .000000 .0000
0080 30 30 30 30 30 33 34 20 31 34 37 33 33 34 37 36 0000034 14733476
0090 35 32 31 20 30 31 30 37 36 31 00 20 30 00 00 00 521 010761. 0...
00A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0100 00 75 73 74 61 72 00 30 30 00 00 00 00 00 00 00 .ustar.00.......
0110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0140 00 00 00 00 00 00 00 00 00 30 30 30 30 30 30 20 .........000000
0150 00 30 30 30 30 30 30 20 00 00 00 00 00 00 00 00 .000000 ........
0160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Valid TAR format: True
Found valid 'ustar' marker
-------------------
File sizes:
Name Length
---- ------
redirect.tar 20490
direct.tar 3072
setcontent.tar 10246
Testing tar extraction...
tar.exe: Error opening archive: Unrecognized archive format
Extracted files in redirect:
Extracted files in direct:
Name Length
---- ------
data.txt 24
test.txt 28
tar.exe: Damaged tar archive
tar.exe: Retrying...
tar.exe: Damaged tar archive
tar.exe: Retrying...
Extracted files in setcontent:
Name Length
---- ------
test.txt 28
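Redirecting the output with > not only puts a UTF-16 BOM (FF FE) at the start of the file but also writes a null byte after every original byte. The "ustar" marker is therefore no longer at offset 257 of the header, and the archive cannot be extracted.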
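To make the byte-level effect concrete, here is a minimal Python sketch that models the corruption. It is a simplified model, not PowerShell's actual implementation: it assumes every byte is treated as one character and then written as UTF-16LE with a BOM. The helper names `make_tar` and `simulate_utf16_redirect` are made up for this illustration, and the model does not reproduce the exact 20490-byte size seen above, since the real pipeline also processes the output line by line.

```python
import io
import tarfile


def make_tar(files):
    """Build a small ustar archive in memory from {name: payload_bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def simulate_utf16_redirect(data):
    """Simplified model of the corruption (an assumption, see lead-in):
    read each byte as one character, then write UTF-16LE with a BOM."""
    text = data.decode("latin-1")  # one byte -> one character
    return b"\xff\xfe" + text.encode("utf-16-le")


archive = make_tar({"test.txt": b"Hello World\r\n", "data.txt": b"Test Data\r\n"})
mangled = simulate_utf16_redirect(archive)

print(archive[257:262])           # b'ustar': the magic sits at offset 257
print(mangled[257:262])           # no longer the magic
print(mangled[:10].hex(" "))      # ff fe 74 00 65 00 73 00 74 00
print(len(archive), len(mangled)) # the mangled copy is 2 + 2 * len(archive)
```

The first ten bytes of the mangled copy match the start of the redirect.tar dump above (FF FE 74 00 65 00 ...), which is what suggests this model captures the main effect.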
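But this raises a question: writing the file directly with -f works fine, whereas capturing the piped output and writing it with WriteAllBytes produces an archive from which only some of the files can be extracted. Why?
Since the first 512 bytes (the first header) are identical, let's compare the content that follows: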
# Create the test directory and switch to it
$testDir = Join-Path (Get-Location) "tar-test"
New-Item -ItemType Directory -Path $testDir -Force | Out-Null
Set-Location $testDir
# Create the test files
"Hello World" | Out-File "test.txt"
"Test Data" | Out-File "data.txt"
# Method 2: let tar write the file itself with -f
tar -cf "direct.tar" test.txt data.txt
# Method 3: capture stdout in a variable, then write it out with WriteAllBytes
$tarOutput = tar -cf - test.txt data.txt
[System.IO.File]::WriteAllBytes("$testDir\setcontent.tar", [System.Text.Encoding]::UTF8.GetBytes($tarOutput))
function Compare-BinaryFiles {
    param (
        [string]$file1,
        [string]$file2
    )
    $bytes1 = [System.IO.File]::ReadAllBytes($file1)
    $bytes2 = [System.IO.File]::ReadAllBytes($file2)
    Write-Host "File sizes:"
    Write-Host "${file1}: $($bytes1.Length) bytes"
    Write-Host "${file2}: $($bytes2.Length) bytes"
    # Find the first position where the two files differ
    $minLength = [Math]::Min($bytes1.Length, $bytes2.Length)
    $firstDiff = -1
    for ($i = 0; $i -lt $minLength; $i++) {
        if ($bytes1[$i] -ne $bytes2[$i]) {
            $firstDiff = $i
            break
        }
    }
    if ($firstDiff -ge 0) {
        Write-Host "`nFirst difference at position: $firstDiff"
        # Show 16 bytes on either side of the difference
        # (clamped to the last index valid in both files)
        $start = [Math]::Max(0, $firstDiff - 16)
        $end = [Math]::Min($firstDiff + 16, $minLength - 1)
        Write-Host "`nFile1 content around difference:"
        $hex1 = ($bytes1[$start..$end] | ForEach-Object { $_.ToString("X2") }) -join " "
        $ascii1 = ($bytes1[$start..$end] | ForEach-Object {
                if ([char]::IsControl($_) -or $_ -gt 127) { "." } else { [char]$_ }
            }) -join ""
        Write-Host "Hex: $hex1"
        Write-Host "ASCII: $ascii1"
        Write-Host "`nFile2 content around difference:"
        $hex2 = ($bytes2[$start..$end] | ForEach-Object { $_.ToString("X2") }) -join " "
        $ascii2 = ($bytes2[$start..$end] | ForEach-Object {
                if ([char]::IsControl($_) -or $_ -gt 127) { "." } else { [char]$_ }
            }) -join ""
        Write-Host "Hex: $hex2"
        Write-Host "ASCII: $ascii2"
    }
    else {
        Write-Host "`nFiles are identical up to length of shorter file"
    }
}
Compare-BinaryFiles "$testDir\direct.tar" "$testDir\setcontent.tar"
Result:
File sizes:
D:\temp\tar-test\direct.tar: 3072 bytes
D:\temp\tar-test\setcontent.tar: 10246 bytes
First difference at position: 512
File1 content around difference:
Hex: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FE 48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F
ASCII: ..................H.e.l.l.o. .W.o
File2 content around difference:
Hex: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 EF A3 B5 EF A8 9F 00 65 00 6C 00 6C 00 6F 00 20 00
ASCII: .......................e.l.l.o. .
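The difference here is caused by double encoding and has nothing to do with redirection:
- PowerShell treats the binary data that tar writes to stdout as text and decodes it into strings
- UTF8.GetBytes then encodes those strings back into bytes. The header is plain ASCII and survives the round trip, but the file content starts with FF FE (test.txt was written as UTF-16), and those bytes come back as different, longer sequences: FF FE 48 became EF A3 B5 EF A8 9F above. Everything after that point is shifted, which is presumably why tar extracts test.txt and then reports a damaged archive.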
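The decode-then-re-encode round trip can be sketched in a few lines of Python. This is only an illustration under an assumption: it uses Latin-1 as a stand-in for whatever code page PowerShell used to decode tar's output, so the resulting bytes differ from the ones in the output above (which look like they came from a different, multi-byte code page). The function name `roundtrip_through_text` is hypothetical.

```python
def roundtrip_through_text(data, codepage="latin-1"):
    """Decode bytes as text in an assumed code page, then re-encode as UTF-8.

    This mimics "binary stdout -> string -> UTF8.GetBytes" in spirit only;
    the code page is an assumption made for the example.
    """
    return data.decode(codepage).encode("utf-8")


header_part = b"ustar\x0000"         # ASCII-only bytes, as in a tar header
payload = b"\xff\xfeH\x00e\x00"      # starts like the UTF-16 content of test.txt

print(roundtrip_through_text(header_part) == header_part)  # True: ASCII survives
print(payload.hex(" "))
print(roundtrip_through_text(payload).hex(" "))
print(len(roundtrip_through_text(payload)) - len(payload))  # bytes gained
```

Any byte at or above 0x80 turns into a multi-byte UTF-8 sequence, so the payload grows while the size recorded in the tar header stays the same. The next header then no longer starts on a 512-byte boundary, which is consistent with the "Damaged tar archive" message.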
Let's modify the script so that the bytes never pass through a PowerShell string:
# Create the test directory and switch to it
$testDir = Join-Path (Get-Location) "tar-test"
New-Item -ItemType Directory -Path $testDir -Force | Out-Null
Set-Location $testDir
# Create the test files
"Hello World" | Out-File "test.txt"
"Test Data" | Out-File "data.txt"
# Method 1: let tar write the file itself with -f (baseline)
tar -cf "direct.tar" test.txt data.txt
# Method 2: have Start-Process redirect tar's stdout straight to a file
$process = Start-Process -FilePath "tar" -ArgumentList "-cf", "-", "test.txt", "data.txt" -RedirectStandardOutput "piped.tar" -NoNewWindow -PassThru -Wait
# Method 3: write to a temporary file, then move it
$tempFile = Join-Path $testDir "temp.tar"
tar -cf $tempFile test.txt data.txt
Move-Item $tempFile "moved.tar" -Force
function Compare-BinaryFiles {
    param (
        [string[]]$files
    )
    # Print the size of every file
    Write-Host "File sizes:"
    foreach ($file in $files) {
        $size = (Get-Item $file).Length
        Write-Host "${file}: $size bytes"
    }
    # Compare every other file against the first one
    $referenceFile = $files[0]
    $referenceBytes = [System.IO.File]::ReadAllBytes($referenceFile)
    for ($i = 1; $i -lt $files.Length; $i++) {
        $compareFile = $files[$i]
        $compareBytes = [System.IO.File]::ReadAllBytes($compareFile)
        Write-Host "`nComparing $referenceFile with $compareFile..."
        # Find the first position where the two files differ
        $minLength = [Math]::Min($referenceBytes.Length, $compareBytes.Length)
        $firstDiff = -1
        for ($j = 0; $j -lt $minLength; $j++) {
            if ($referenceBytes[$j] -ne $compareBytes[$j]) {
                $firstDiff = $j
                break
            }
        }
        if ($firstDiff -ge 0) {
            Write-Host "First difference at position: $firstDiff"
            # Show 16 bytes on either side of the difference
            $start = [Math]::Max(0, $firstDiff - 16)
            $end = [Math]::Min($firstDiff + 16, $minLength - 1)
            Write-Host "`nReference file content around difference:"
            $hexRef = ($referenceBytes[$start..$end] | ForEach-Object { $_.ToString("X2") }) -join " "
            Write-Host $hexRef
            Write-Host "`nCompare file content around difference:"
            $hexComp = ($compareBytes[$start..$end] | ForEach-Object { $_.ToString("X2") }) -join " "
            Write-Host $hexComp
        }
        else {
            if ($referenceBytes.Length -eq $compareBytes.Length) {
                Write-Host "Files are identical"
            }
            else {
                # One file is a prefix of the other
                Write-Host "Files are identical up to position $minLength"
            }
        }
    }
}
Write-Host "`nComparing tar files..."
$tarFiles = @(
"$testDir\direct.tar",
"$testDir\piped.tar",
"$testDir\moved.tar"
)
Compare-BinaryFiles $tarFiles
# Test extraction
Write-Host "`nTesting extraction..."
foreach ($tarFile in $tarFiles) {
    $extractDir = Join-Path $testDir ([System.IO.Path]::GetFileNameWithoutExtension($tarFile))
    New-Item -ItemType Directory -Path $extractDir -Force | Out-Null
    Push-Location $extractDir
    try {
        tar -xf $tarFile
        Write-Host "`nExtracted files from $([System.IO.Path]::GetFileName($tarFile)):"
        Get-ChildItem | Select-Object Name, Length | Format-Table
    }
    catch {
        Write-Host "Error extracting ${tarFile}: $_"
    }
    Pop-Location
}
# Clean up
Set-Location ..
Remove-Item $testDir -Recurse -Force
Write-Host "Test files cleaned up."
Result:
File sizes:
D:\temp\tar-test\direct.tar: 3072 bytes
D:\temp\tar-test\piped.tar: 10240 bytes
D:\temp\tar-test\moved.tar: 3072 bytes
Comparing D:\temp\tar-test\direct.tar with D:\temp\tar-test\piped.tar...
Files are identical up to position 3072
Comparing D:\temp\tar-test\direct.tar with D:\temp\tar-test\moved.tar...
Files are identical
Testing extraction...
Extracted files from direct.tar:
Name Length
---- ------
data.txt 24
test.txt 28
Extracted files from piped.tar:
Name Length
---- ------
data.txt 24
test.txt 28
Extracted files from moved.tar:
Name Length
---- ------
data.txt 24
test.txt 28
Test files cleaned up.
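From this we can see:
direct.tar and moved.tar:
- Exactly the same size: 3072 bytes
- Identical content ("Files are identical")
- Essentially the same method (tar writes the file directly)
piped.tar:
- Larger: 10240 bytes
- The first 3072 bytes are identical to the other files
- 7168 extra bytes follow
- It still extracts correctly, and the extracted files are correct
Therefore:
- The extra data in piped.tar comes after the valid tar data, so it does not affect extraction. At first glance -RedirectStandardOutput seems to have added it, but it is more likely padding written by tar itself: 10240 bytes is 20 x 512, tar's default record size, and tar implementations commonly pad output sent to stdout up to a full record with zero bytes. The script did not inspect those bytes, so this remains an inference.
- The tar format is tolerant here: a reader stops at the end-of-archive marker (two zero-filled 512-byte blocks) and ignores anything after it.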
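The end-of-archive behaviour can be checked with a small Python sketch using the standard `tarfile` module, which happens to pad its output to a 10240-byte record as well. It builds an archive with two small files, cuts it down to the 3072 bytes that correspond to direct.tar, and then reads both the cut-down and the padded versions. The helper names `make_tar` and `list_names` are made up for this example, and the file contents are placeholders rather than the exact bytes from the test above.

```python
import io
import tarfile


def make_tar(files):
    """Build a small ustar archive in memory from {name: payload_bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_names(data):
    """Return the member names a tar reader finds in the given bytes."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
        return tf.getnames()


archive = make_tar({"test.txt": b"Hello World\r\n", "data.txt": b"Test Data\r\n"})

# 2 headers + 2 data blocks + 2 zero blocks = 6 * 512 = 3072 bytes
minimal = archive[:3072]
padded = minimal + bytes(7168)  # zero padding up to one 10240-byte record

print(len(archive))
print(list_names(minimal))
print(list_names(padded))
print(list_names(minimal + b"junk" * 10))  # trailing non-zero bytes are never reached
```

All three variants list the same two members, because the reader stops at the zero-filled block that follows the last member.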
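To summarize:
Problems with PowerShell redirection (>), as observed in this test:
- It adds a UTF-16 BOM (FF FE) at the start of the file
- It converts the binary data to UTF-16, inserting a null byte (0x00) after every byte, so the file is roughly twice the size of the original stream
- For a binary format such as tar, this conversion completely destroys the file structure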
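The symptoms in this summary are easy to test for. The following Python sketch is a hypothetical diagnostic helper (the name `classify_tar_bytes` and the three labels are invented here) that looks at the first bytes of a file and reports whether it has the ustar magic, looks like a UTF-16-mangled stream, or is neither. It is a heuristic only: an old-style tar without the ustar magic would be reported as "unknown".

```python
def classify_tar_bytes(head):
    """Classify the first bytes of a file (heuristic, for illustration).

    Returns "ustar" if the magic is at offset 257, "utf16-mangled" if the
    data starts with a UTF-16LE BOM and every second byte is zero,
    otherwise "unknown".
    """
    if len(head) >= 262 and head[257:262] == b"ustar":
        return "ustar"
    starts_with_bom = head[:2] == b"\xff\xfe"
    # After the BOM, the high byte of each 16-bit unit sits at odd offsets
    high_bytes_zero = len(head) >= 8 and all(b == 0 for b in head[3:64:2])
    if starts_with_bom and high_bytes_zero:
        return "utf16-mangled"
    return "unknown"


good = bytes(257) + b"ustar" + bytes(250)
mangled = b"\xff\xfe" + "test.txt".encode("utf-16-le") + bytes(600)
other = b"PK\x03\x04" + bytes(300)

print(classify_tar_bytes(good))
print(classify_tar_bytes(mangled))
print(classify_tar_bytes(other))
```

In practice one would pass the first few hundred bytes of the file, for example `classify_tar_bytes(open(path, "rb").read(512))`, before handing the archive to another tool.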
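For today's programmers UTF-8 seems like the obvious default encoding, so why do PowerShell and the Windows platform behind it use UTF-16?
There are not many answers to this question online. One of them reads: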
Windows jumped from code pages (called multi byte) to UTF-16LE (called wide char). It made the code simpler if you assume all chars were 16 bits. They assumed that (this encoding is called UCS-2). Unicode consortium later decided 16 bits was not enough, but it was too late for Microsoft to change course. So, their definition of "wide char" just changed from UCS-2 to UTF-16LE.
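The "16 bits was not enough" point in the quote can be seen directly from encoded sizes. A short Python sketch: a character inside the original 16-bit range takes one 16-bit unit in UTF-16, while a code point above U+FFFF needs a surrogate pair, which a fixed-width 16-bit encoding such as UCS-2 cannot represent.

```python
ascii_char = "A"           # U+0041, fits in 16 bits
emoji = "\U0001F600"       # U+1F600, outside the 16-bit range

# One 16-bit unit versus a surrogate pair (two 16-bit units)
print(ascii_char.encode("utf-16-le").hex(" "))  # 41 00
print(emoji.encode("utf-16-le").hex(" "))       # 3d d8 00 de

# The same two characters in UTF-8, for comparison
print(len(ascii_char.encode("utf-8")), len(emoji.encode("utf-8")))
```

The 41 00 pattern is the same "byte followed by a null byte" layout seen in redirect.tar above.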
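I am a CS student of the post-2000 generation, and the era of machines with 4 MB of RAM or less is unfamiliar to me, so I will not dig any deeper here.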