A collection of issues encountered while using Harbor

1. 【Resolved】 Core fails to start after downgrading Harbor: failed to initialize database: file does not exist

【Environment】Harbor 1.8.6
【Install method】Helm
【Steps】Downgraded from Harbor 1.9.4 to Harbor 1.8.6; the core component fails to start
【Error】

2019-09-16T14:02:03Z [ERROR] [/common/config/manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019-09-16T14:02:03Z [ERROR] [/common/config/manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019-09-16T14:02:03Z [INFO] [/core/config/config.go:143]: initializing the project manager based on local database...
2019-09-16T14:02:03Z [INFO] [/core/main.go:96]: configurations initialization completed
2019-09-16T14:02:03Z [INFO] [/common/dao/base.go:84]: Registering database: type-PostgreSQL host-maxim-harbor-database port-5432 databse-registry sslmode-"disable"
[ORM]2019/09/16 14:02:03 Detect DB timezone: UCT open /usr/local/go/lib/time/zoneinfo.zip: no such file or directory
2019-09-16T14:02:03Z [INFO] [/common/dao/base.go:89]: Register database completed
2019-09-16T14:02:03Z [INFO] [/common/dao/pgsql.go:107]: Upgrading schema for pgsql ...
2019-09-16T14:02:03Z [ERROR] [/common/dao/pgsql.go:112]: Failed to upgrade schema, error: "file does not exist"
2019-09-16T14:02:03Z [FATAL] [/core/main.go:103]: failed to initialize database: file does not exist

【Analysis】
The official document "Upgrading Harbor Deployed with Helm" notes:
Notes

  • As the database schema may change between different versions of Harbor, there is a progress to migrate the schema during the upgrade and the downtime cannot be avoid
  • The database schema cannot be downgraded automatically, so the helm rollback is not supported

In other words, because the database schema may change between Harbor versions, the upgrade includes a step that migrates the schema and the downtime cannot be avoided; and since the schema cannot be downgraded automatically, helm rollback is not supported.

This problem is also discussed in an issue, failed to initialize database: file does not exist #354, which states: The root cause for the error file does not exist is that you guys use dev images or try to downgrade the installation.

My earlier understanding was probably off: since 1.6, Harbor no longer relies on the separate migration tool and instead tries to migrate the DB automatically when core starts, so I assumed a downgrade would be handled automatically as well. I therefore did not downgrade the database separately and simply deployed the old chart version with Helm. That does not work, so I then tried changing the schema version directly in the DB:

# Enter the db container
kubectl exec -it harbor-harbor-database-0 bash
# Connect to the registry database inside the container
psql -U postgres -d registry
# Query the schema version, then update it
select * from schema_migrations;
 version | dirty
---------+-------
      12 | f
(1 row)

update schema_migrations set version=5 where version=12;

But after restarting the core pod following this change, the database became dirty: the dirty column flipped to 'true'.
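
Harbor's schema handling is driven by golang-migrate, so in principle the dirty flag can be cleared and the schema walked down with the migrate CLI. The sketch below was not verified in this environment, and the -path to the migration files bundled in the core image is an assumption; treat it as an experiment, not a recipe.

# A sketch, not verified here: clear the dirty flag with "force",
# then apply the down migrations from version 12 back to 5.
# The migrations path inside the core image is an assumption.
migrate -path /harbor/migrations/postgresql \
        -database "postgres://postgres:<password>@harbor-harbor-database:5432/registry?sslmode=disable" \
        force 12
migrate -path /harbor/migrations/postgresql \
        -database "postgres://postgres:<password>@harbor-harbor-database:5432/registry?sslmode=disable" \
        down 7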

【Solution】
I gave up on this one. One further experiment would be to run the migration "down" step separately, but with a Helm deployment on K8s and PVs attached this is not as convenient as a direct docker run on a single host.

In general, for database downgrades my advice is to back up the database before upgrading, and when downgrading/rolling back, restore the backed-up DB directly; this avoids the already-upgraded DB being touched again and ending up dirty.
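
A minimal backup/restore sketch with pg_dump/psql (pod and database names follow the Helm release naming used above; when restoring, scale core/jobservice to 0 first so nothing holds connections to the database):

# Back up the registry database before upgrading
kubectl exec harbor-harbor-database-0 -- pg_dump -U postgres -d registry > registry-pre-upgrade.sql
# To roll back: stop core/jobservice, recreate the database, then restore the dump
kubectl exec -i harbor-harbor-database-0 -- psql -U postgres -c "DROP DATABASE registry;"
kubectl exec -i harbor-harbor-database-0 -- psql -U postgres -c "CREATE DATABASE registry;"
kubectl exec -i harbor-harbor-database-0 -- psql -U postgres -d registry < registry-pre-upgrade.sql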


2. 【Unresolved, no impact】 core keeps logging: failed to get key admiral_url, error: the configure value is not set

【Environment】Harbor 1.8.6
【Install method】Helm
【Steps】Harbor 1.8.6 works fine after installation, but the logs keep reporting this error
【Error】

2019/05/07 09:23:35 [D] [server.go:2619] |     10.112.0.1| 200 |    846.014µs|   match| GET      /api/ping   r:/api/ping
2019-05-07T09:23:43Z [ERROR] [manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019-05-07T09:23:43Z [ERROR] [manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019/05/07 09:23:43 [D] [server.go:2619] |     10.112.0.1| 200 |   1.063696ms|   match| GET      /api/ping   r:/api/ping
2019-05-07T09:23:45Z [ERROR] [manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019-05-07T09:23:45Z [ERROR] [manager.go:192]: failed to get key admiral_url, error: the configure value is not set

【Analysis】
See this issue: failed to get key admiral_url, error: the configure value is not set #226

“Make sure you are using the GA version of Harbor on the 1.0.0 branch.
In your templates, you should have the adminserver configmaps (adminserver-cm.yaml) with the value ADMIRAL_URL: “NA””

It suggests the error comes from using the master-branch (dev) version, but I am not using the master branch.
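
Following the hint in that issue, one harmless thing to check (a sketch only; the configmap names depend on the release name and chart version, and on 1.8.x the adminserver component may no longer exist) is whether any Harbor configmap still carries an ADMIRAL_URL entry:

# List the Harbor configmaps and look for the admiral setting
kubectl get configmaps | grep harbor
kubectl get configmap <release>-harbor-core -o yaml | grep -i admiral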

【Solution】

Does not affect usage for now; unresolved.


3. 【Resolved】 UI login fails with an error

【Environment】Harbor 1.9.4
【Install method】Helm
【Steps】Logged in to the UI after upgrading to Harbor 1.9.4
【Error】Login fails with a "username or password incorrect" message
【Analysis】Harbor 1.9 introduced the quota mechanism. During the upgrade a migration scans all existing images to compute the quotas, which takes quite a while; logins only work again once that migration has finished
【Solution】Wait patiently for the migration to complete
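
One way to keep an eye on the progress (a sketch; the exact log wording differs between versions, so the grep pattern is only a guess):

# Follow the core logs and watch for quota/migration related messages
kubectl logs -f deploy/<release>-harbor-core | grep -i -E "quota|migrat"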


4. 【Resolved】 Failed replication jobs cannot be stopped or deleted

【Environment】Harbor 1.5.0
【Install method】docker-compose
【Steps】A replication job got stuck with no progress; clicking "Stop job" had no effect, so I then tried to delete the policy
【Error】No response at first, then a "500 internal unknown error" popped up, and the replication policy could not be deleted either: "deletion failed, there are jobs in pending/running status"
【Analysis】The jobs are stuck. See the issue: Impossible to delete replication rule : "have pending/running/retrying status" #7057
【Solution】Enter harbor-db, update the replication job table directly, restart jobservice, then re-enable replication for that policy in the web UI

# Enter the db container
docker exec -it harbor-db bash
 
mysql -h localhost -p -uroot
 
# Enter the default password when prompted
 
MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| registry           |
+--------------------+
4 rows in set (0.01 sec)
 
MariaDB [(none)]> use registry;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
MariaDB [registry]> show tables;
+-------------------------------+
| Tables_in_registry            |
+-------------------------------+
| access                        |
| access_log                    |
| alembic_version               |
| clair_vuln_timestamp          |
| harbor_label                  |
| harbor_resource_label         |
| img_scan_job                  |
| img_scan_overview             |
| project                       |
| project_member                |
| project_metadata              |
| properties                    |
| replication_immediate_trigger |
| replication_job               |
| replication_policy            |
| replication_target            |
| repository                    |
| role                          |
| user                          |
| user_group                    |
+-------------------------------+
20 rows in set (0.00 sec)
 
 
# Among the replication-related tables, the one we need to modify is replication_job
# Select the rows first to take a look; only a few records are kept here as an example
MariaDB [registry]> select * from    replication_job  where policy_id=8 and status in ('pending', 'running');
 
| id    | status  | policy_id | repository                              | operation | tags               | job_uuid                 | creation_time       | update_time    |
|  6758 | pending |         8 | prod/andddt                                | transfer  | 0.2.0-build136   |                          | 2020-03-17 01:21:28 | 2020-03-17 01:21:28 |
|  9962 | running |         8 | prod/safadrn-fe-pro                      | transfer  | 0.1.0-build110,0.1.0-build111,0.1.0-build112,0.1.0-build113,0.1.0-build115,0.1.0-build116,0.1.0-build117,0.1.0-build118,0.1.0-build119,0.1.0-build120,0.1.0-build121,0.1.0-build122,0.1.0-build123,0.1.0-build124,0.1.0-build125,0.1.0-build126,0.1.0-build127,1.0.0-build128,1.0.0-build129,1.0.0-build130,1.0.0-build131,1.0.0-build132,1.0.0-build133,1.0.0-build134,1.0.0-build135,1.0.0-build138,1.0.0-build139,1.0.0-build140,1.0.0-build141     | d5dcb08860d0a7dd4d3bdac2 | 2020-05-17 14:44:54 | 2020-05-17 18:02:10 |
|  9967 | running |         8 | prod/opfadools-srv                       | transfer  | 1.0.0,1.0.2,1.0.6,1.0.7,1.0.8,1.0.9,1.0.10,1.0.11,1.0.12,1.0.13,1.0.14,1.0.17,1.0.18,1.0.19,1.0.20,1.0.21,1.0.22,1.0.23,1.0.24,1.0.25,1.0.26,1.0.27,1.0.28,1.0.29,1.0.30,1.0.31,1.0.32,1.0.33,1.0.34,1.0.35,1.0.36,1.0.37,1.0.38,1.0.39,1.0.40,1.0.41,1.0.42,1.0.43,1.0.44,1.0.45,1.0.46,1.0.47,1.0.48,1.0.49                                                                                                                                                
 
# Update the rows. Replace policy_id in the SQL below with the id of the stuck replication policy,
# which can be found in the replication_policy table or via the browser developer tools
MariaDB [registry]> update replication_job set status='finished',update_time=now() where policy_id=8 and status in ('pending', 'running');                            
Query OK, 82 rows affected (0.01 sec)
Rows matched: 82  Changed: 82  Warnings: 0
 
# Exit the db
exit
 
# Restart the jobservice container
docker restart harbor-jobservice

This procedure applies to Harbor versions below v1.6. From v1.6 on, the database switched from MySQL to PostgreSQL, and in v1.8 the replication tables were restructured, with the replication_job data migrated into the replication_execution table.

In short, the commands above can be used as-is on Harbor versions below 1.6; on 1.6 and later the same idea applies, but the commands and tables differ slightly, as sketched below.
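
A rough PostgreSQL equivalent for 1.8+ (a sketch only: the tables there are replication_execution and replication_task, but column names and status values may vary between versions, so inspect the schema and run the SELECT before updating anything):

# Enter the pgsql container (docker-compose; with Helm, kubectl exec into the database pod instead)
docker exec -it harbor-db psql -U postgres -d registry

-- Inspect the replication tables first
\d replication_execution
\d replication_task
select id, policy_id, status from replication_execution order by id desc limit 20;

-- Mark the stuck execution and its tasks as stopped
-- (status values are an assumption; match whatever the SELECT above shows)
update replication_task set status='Stopped', end_time=now() where execution_id=<stuck_execution_id> and status in ('Pending','InProgress');
update replication_execution set status='Stopped', end_time=now() where id=<stuck_execution_id>;

-- Then restart jobservice as above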


5. 【Unresolved】 Replication fails: failed to fetch image: FetchImages error when collect tags for repos

【Environment】Harbor 1.9
【Install method】Helm
【Steps】Harbor 1.9 uses a pull-based replication policy to pull images from Harbor 1.7
【Error】Shortly after the replication policy starts it fails, and the core log shows: failed to fetch image: FetchImages error when collect tags for repos, while jobservice has not even started working. This does not happen for ordinary repositories with a modest number of tags, but it does for repositories with a very large number of tags (e.g. >5000):

[root@SYSOPS00613596 ~]# kubectl -n goharbor logs goharbor-harbor-core-67dcd898b6-s6nm5 --tail 200
2020/05/17 15:30:48 [D] [server.go:2774] |  10.14.133.223| 200 |   6.384939ms|   match| GET      /api/systeminfo   r:/api/systeminfo
2020-05-17T15:30:48Z [ERROR] [/common/api/base.go:73]: GET /api/users/current failed with error: {"code":401,"message":"UnAuthorize"}
2020/05/17 15:30:48 [D] [server.go:2774] |  10.14.133.223| 401 |   1.383716ms|   match| GET      /api/users/current   r:/api/users/:id
2020-05-17T15:30:48Z [ERROR] [/common/api/base.go:73]: GET /api/users/current failed with error: {"code":401,"message":"UnAuthorize"}
2020/05/17 15:30:48 [D] [server.go:2774] |  10.14.133.223| 401 |    1.32869ms|   match| GET      /api/users/current   r:/api/users/:id
2020/05/17 15:30:48 [D] [server.go:2774] |  10.14.133.223| 200 |   7.970204ms|   match| GET      /api/systeminfo   r:/api/systeminfo
2020-05-17T15:30:53Z [ERROR] [/core/controllers/base.go:110]: Error occurred in UserLogin: Failed to authenticate user, due to error 'Invalid credentials'
2020/05/17 15:30:55 [D] [server.go:2774] |    172.16.12.1| 200 |    630.641µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:30:57 [D] [server.go:2774] |    172.16.12.1| 200 |    656.857µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:00 [D] [server.go:2774] |  10.14.133.223| 200 |   8.132316ms|   match| POST     /c/login   r:/c/login
2020/05/17 15:31:00 [D] [server.go:2774] |  10.14.133.223| 200 |   6.063739ms|   match| GET      /api/users/current   r:/api/users/:id
2020/05/17 15:31:00 [D] [server.go:2774] |  10.14.133.223| 200 |   1.180429ms|   match| GET      /api/configurations   r:/api/configurations
2020/05/17 15:31:00 [D] [server.go:2774] |  10.14.133.223| 200 |   7.681296ms|   match| GET      /api/statistics   r:/api/statistics
2020-05-17T15:31:00Z [WARNING] [/core/systeminfo/imagestorage/filesystem/driver.go:51]: The path /data is not found, will return zero value of capacity
2020/05/17 15:31:00 [D] [server.go:2774] |  10.14.133.223| 200 |    476.811µs|   match| GET      /api/systeminfo/volumes   r:/api/systeminfo/volumes
2020/05/17 15:31:00 [D] [server.go:2774] |  10.14.133.223| 200 | 186.361331ms|   match| GET      /api/projects   r:/api/projects/
2020/05/17 15:31:05 [D] [server.go:2774] |    172.16.12.1| 200 |    607.796µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:07 [D] [server.go:2774] |    172.16.12.1| 200 |    692.877µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:15 [D] [server.go:2774] |    172.16.12.1| 200 |    749.928µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:17 [D] [server.go:2774] |    172.16.12.1| 200 |    621.023µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:19 [D] [server.go:2774] |  10.14.133.223| 200 |   9.952446ms|   match| GET      /api/projects/3/members   r:/api/projects/:pid([0-9]+)/members/?:pmid([0-9]+)
2020/05/17 15:31:19 [D] [server.go:2774] |  10.14.133.223| 200 |  20.347816ms|   match| GET      /api/users/current/permissions   r:/api/users/:id/permissions
2020/05/17 15:31:19 [D] [server.go:2774] |  10.14.133.223| 200 |  12.188607ms|   match| GET      /api/projects/3   r:/api/projects/:id([0-9]+)
2020/05/17 15:31:19 [D] [server.go:2774] |  10.14.133.223| 200 |  41.119236ms|   match| GET      /api/projects   r:/api/projects/
2020/05/17 15:31:19 [D] [server.go:2774] |  10.14.133.223| 200 |  17.705786ms|   match| GET      /api/projects/3/summary   r:/api/projects/:id([0-9]+)/summary
2020/05/17 15:31:21 [D] [server.go:2774] |  10.14.133.223| 200 |  11.394304ms|   match| GET      /api/systeminfo   r:/api/systeminfo
2020/05/17 15:31:21 [D] [server.go:2774] |  10.14.133.223| 200 |  96.774652ms|   match| GET      /api/repositories   r:/api/repositories
2020/05/17 15:31:21 [D] [server.go:2774] |  10.14.133.223| 200 |  88.841389ms|   match| GET      /api/repositories   r:/api/repositories
2020/05/17 15:31:25 [D] [server.go:2774] |    172.16.12.1| 200 |    598.692µs|   match| GET      /api/ping   r:/api/ping
2020-05-17T15:31:26Z [WARNING] [/core/systeminfo/imagestorage/filesystem/driver.go:51]: The path /data is not found, will return zero value of capacity
2020/05/17 15:31:26 [D] [server.go:2774] |  10.14.133.223| 200 |     872.73µs|   match| GET      /api/systeminfo/volumes   r:/api/systeminfo/volumes
2020/05/17 15:31:26 [D] [server.go:2774] |  10.14.133.223| 200 |    1.53326ms|   match| GET      /api/configurations   r:/api/configurations
2020/05/17 15:31:26 [D] [server.go:2774] |  10.14.133.223| 200 |   7.776343ms|   match| GET      /api/statistics   r:/api/statistics
2020/05/17 15:31:26 [D] [server.go:2774] |  10.14.133.223| 200 |   34.05679ms|   match| GET      /api/projects   r:/api/projects/
2020/05/17 15:31:27 [D] [server.go:2774] |    172.16.12.1| 200 |    597.104µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:35 [D] [server.go:2774] |    172.16.12.1| 200 |    617.784µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:37 [D] [server.go:2774] |    172.16.12.1| 200 |    670.764µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:45 [D] [server.go:2774] |    172.16.12.1| 200 |    594.816µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:47 [D] [server.go:2774] |    172.16.12.1| 200 |    620.794µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:55 [D] [server.go:2774] |    172.16.12.1| 200 |    614.962µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:31:57 [D] [server.go:2774] |    172.16.12.1| 200 |    646.838µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:05 [D] [server.go:2774] |    172.16.12.1| 200 |   1.512053ms|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:07 [D] [server.go:2774] |    172.16.12.1| 200 |    604.774µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:15 [D] [server.go:2774] |    172.16.12.1| 200 |    584.353µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:17 [D] [server.go:2774] |    172.16.12.1| 200 |    571.165µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:25 [D] [server.go:2774] |    172.16.12.1| 200 |    664.619µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:27 [D] [server.go:2774] |    172.16.12.1| 200 |    585.596µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:32 [D] [server.go:2774] |  10.14.133.223| 200 |   1.425728ms|   match| GET      /api/registries   r:/api/registries
2020/05/17 15:32:32 [D] [server.go:2774] |  10.14.133.223| 200 |    652.048µs|   match| GET      /api/replication/adapters   r:/api/replication/adapters
2020/05/17 15:32:32 [D] [server.go:2774] |  10.14.133.223| 200 |  42.583993ms|   match| GET      /api/projects   r:/api/projects/
2020/05/17 15:32:33 [D] [server.go:2774] |  10.14.133.223| 200 |   7.083254ms|   match| GET      /api/replication/policies   r:/api/replication/policies
2020/05/17 15:32:33 [D] [server.go:2774] |  10.14.133.223| 200 |   1.176269ms|   match| GET      /api/registries   r:/api/registries
2020/05/17 15:32:35 [D] [server.go:2774] |    172.16.12.1| 200 |   1.031396ms|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:37 [D] [server.go:2774] |    172.16.12.1| 200 |    689.483µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:40 [D] [server.go:2774] |  10.14.133.223| 200 |  11.612709ms|   match| GET      /api/replication/executions   r:/api/replication/executions
2020/05/17 15:32:42 [D] [server.go:2774] |  10.14.133.223| 200 |   3.353095ms|   match| GET      /api/replication/policies/2   r:/api/replication/policies/:id([0-9]+)
2020/05/17 15:32:42 [D] [server.go:2774] |  10.14.133.223| 200 |  24.422296ms|   match| GET      /api/registries/1/info   r:/api/registries/:id/info
2020/05/17 15:32:45 [D] [server.go:2774] |    172.16.12.1| 200 |    617.396µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:47 [D] [server.go:2774] |    172.16.12.1| 200 |    580.195µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:51 [D] [server.go:2774] |  10.14.133.223| 201 |   6.959826ms|   match| POST     /api/replication/executions   r:/api/replication/executions
2020/05/17 15:32:51 [D] [server.go:2774] |  10.14.133.223| 200 |   7.618415ms|   match| GET      /api/replication/executions   r:/api/replication/executions
2020-05-17T15:32:52Z [ERROR] [/common/utils/passports.go:109]: List tags for repo 'ai-cloud/quay.io/coreos/kube-state-metrics' error: http error: code 500, message 
2020-05-17T15:32:55Z [ERROR] [/replication/operation/controller.go:108]: the execution 11 failed: failed to fetch image: FetchImages error when collect tags for repos
2020/05/17 15:32:55 [D] [server.go:2774] |    172.16.12.1| 200 |     663.94µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:32:57 [D] [server.go:2774] |    172.16.12.1| 200 |    685.434µs|   match| GET      /api/ping   r:/api/ping
2020/05/17 15:33:01 [D] [server.go:2774] |  10.14.133.223| 200 |   2.960291ms|   match| GET      /api/replication/executions   r:/api/replication/executions
2020/05/17 15:33:03 [D] [server.go:2774] |    29.2.32.132| 401 |   1.799198ms|   match| GET      /v2/   r:/v2/*
2020/05/17 15:33:03 [D] [server.go:2774] |    29.2.32.132| 200 |   17.12181ms|   match| GET      /service/token   r:/service/token
2020/05/17 15:33:03 [D] [server.go:2774] |    29.2.32.132| 200 |   1.604455ms|   match| HEAD     /v2/   r:/v2/*

【Analysis】
The issue failed to fetch image: FetchImages error when collect tags for repos #9295 discusses this problem, but there the source is a GitLab-based registry, and in my case the jobservice logs show no error at all: the execution fails before jobservice even gets involved. Also, the error in core is immediately preceded by: List tags for repo 'ai-cloud/quay.io/coreos/kube-state-metrics' error: http error: code 500, message, yet listing the tags directly via the API works fine.

So the real question is why enumerating the tags of a repository with a huge number of tags returns a 500 during replication, while walking the whole repository by hand through the API never fails.
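
For reference, this is roughly how the tag listing was verified by hand (a sketch; host and credentials are placeholders, and /api/repositories/{repo_name}/tags is the pre-v2.0 API path):

# List the tags of the suspect repository through the Harbor 1.x API
curl -s -u admin:<password> \
  "https://<harbor-host>/api/repositories/ai-cloud/quay.io/coreos/kube-state-metrics/tags" | head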

According to the official docs, 1.9 is a watershed release: versions below 1.9 cannot replicate with 1.9. I will verify this again after upgrading.

【Solution】
None yet; to be re-checked after upgrading.

6. 【Resolved】 Deleting multiple tags at once in the Harbor UI leaves 0B-size tags that cannot be deleted

【Environment】Seen on multiple versions
【Install method】N/A
【Steps】Delete several tags at once in the Harbor UI
【Error】The delete returns HTTP 200 and the page reports success, but after refreshing, the tags are still there with a size of 0B and can no longer be deleted from the UI
【Analysis】
ui.log reports an unknown error; it looks like the manifest's link no longer points at any data.

# tail -n 100 ui.log
[ERROR] [repository.go:298]: failed to delete tag v2: 500 {"errors":[{"code":"UNKNOWN","message":"unknown error","detail":{"Path":
"/docker/registry/v2/repositories/libriry/test/_manifests/tags/v1/current/link","DriverName":"filesystem"}}]}

# Follow the log to see what the link actually points at
# cat /docker/registry/docker/registry/v2/repositories/libriry/test/_manifests/tags/6b32ea9cfc8b3df0c2962336661077865baa9d8d/index/sha256/e7dd642e969acb450e3f5fe9067d096f0947c5f3084d23c0fea8038087d038f7/link
sha256:e7dd642e969acb450e3f5fe9067d096f0947c5f3084d23c0fea8038087d038f7[/]#
# Follow the digest into the registry blob store to look for the layer: it cannot be found
# ls /data/registry/docker/registry/v2/blobs/sha256/e7/e7dd*
ls: cannot access /data/registry/docker/registry/v2/blobs/sha256/e7/e7dd*: No such file or directory

Now look at registry.log:

Dec 11 15:48:59 172.18.0.1 registry[3597772]: time="2018-12-11T07:48:59.125019435Z" level=error msg="response completed with error" auth.user.name=harborAdmin err.code=unknown err.message="fi
lesystem: Path not found: /docker/registry/v2/repositories/testforrsync/test2/_manifests/tags/v1/current/link" go.version=go1.7.3 http.request.host="10.182.2.108:25000" http.request.id=51bbb9
d6-22cc-46d5-a204-c76f28955cb8 http.request.method=DELETE http.request.remoteaddr=10.182.2.120 http.request.uri="/v2/testforrsync/test2/manifests/sha256:3e533db287adef4291610c6186c49efcb808e8
eb76f5b8e8873077a769fd09bc" http.request.useragent="Go-http-client/1.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=7.86665ms http.response.status=500 h
ttp.response.written=188 instance.id=9c456576-0ef7-4676-9710-fa01b8527d3e service=registry vars.name="testforrsync/test2" vars.reference="sha256:3e533db287adef4291610c6186c49efcb808e8eb76f5b8
e8873077a769fd09bc" version=v2.6.2

This looks like a docker registry (distribution) bug rather than a Harbor one: deleting or querying the manifest directly against the registry also returns manifest unknown:

curl -X DELETE -H 'Authorization: Bearer xxx' http://registry:5000/v2/myrepo/myimage/manifests/sha256:03d7371cdfd09a41f04cf4f862e1204a91231f80c1bae785231fafd936491097
{
    "errors": [
        {
            "code": "MANIFEST_UNKNOWN",
            "message": "manifest unknown"
        }
    ]
}

curl -H 'Authorization: Bearer xxx' http://registry:5000/v2/myrepo/myimage/manifests/sha256:03d7371cdfd09a41f04cf4f862e1204a91231f80c1bae785231fafd936491097
{
    "errors": [
        {
            "code": "MANIFEST_UNKNOWN",
            "detail": {
                "Name": "myrepo/myimage",
                "Revision": "sha256:03d7371cdfd09a41f04cf4f862e1204a91231f80c1bae785231fafd936491097"
            },
            "message": "manifest unknown"
        }
    ]
}

As the log analysis above suggests, if you look in the storage backend, this revision path points at nothing:

/data/docker/registry/v2/repositories/myrepo/myimage/_manifests/revisions/sha256/03d7371cdfd09a41f04cf4f862e1204a91231f80c1bae785231fafd936491097

For a normal image this path should contain a link file, with a corresponding blob under /data/docker/registry/v2/blobs/sha256.

The official answer is: "These are due to a bug in upstream docker distribution, essentially the blob is not written to the storage but the manifest created successfully and referencing the non-existing blob." See: Delete tag failed – size 0 image. #6515

In other words, this comes from docker distribution: the manifest is created successfully, but the blob is never written to storage, so the manifest ends up referencing a blob that does not exist.
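
A quick way to confirm this state for a given digest (a sketch based on the storage layout shown above; adjust the /data/registry prefix to your installation):

# Check whether the blob a manifest revision points at actually exists on disk
DIGEST=e7dd642e969acb450e3f5fe9067d096f0947c5f3084d23c0fea8038087d038f7
ls /data/registry/docker/registry/v2/blobs/sha256/${DIGEST:0:2}/${DIGEST}/data \
  && echo "blob exists" || echo "blob missing -> this is the 0B case"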

【Solution】
For tags that have already become 0B, re-push the same tag so that the blobs exist again, then delete it from the UI once more.

To avoid the situation in the first place, consider:
(1) On current versions, delete tags one by one through the API instead of batch-deleting in the UI (see the sketch below);
(2) Upgrade to 2.0, which stores images as OCI artifacts.
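
A minimal sketch of deleting tags one at a time via the pre-v2.0 API (host, credentials, project/repo/tag are placeholders):

# Delete a single tag through the Harbor 1.x API
curl -X DELETE -u admin:<password> \
  "https://<harbor-host>/api/repositories/<project>/<repo>/tags/<tag>"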


7. Replication reports 404

【Environment】Seen on multiple versions
【Install method】N/A
【Steps】Replicate images from one Harbor registry to another
【Error】When the source holds many images, replication hits quite a few manifest 404 errors; log: pull manifest from local registry failed404 tag unknown, tag=xxx.
【Analysis】
It helps to roughly understand how Harbor interacts with docker registry. Harbor is essentially a management layer on top of docker registry, so that interaction is key:
The registry listens for client pushes. When a push arrives it fires a notification (event 1) and writes the tag file into the backend storage (event 2). When Harbor receives the push notification, it extracts the repo and tag from the message and issues a pull-manifest request (similar to docker pull xxx.com/<repo>:<tag>), which is event 3 and makes the registry read the tag file from the backend.

In fact, event 2 and event 3 run asynchronously. If event 2 finishes before the event 3 request arrives, nothing happens; but if event 3 arrives while event 2 has not completed yet, the registry returns a 404 tag unknown error. So when such a 404 shows up, simply having the replication component pause for about 50ms and retry makes the problem go away.
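
A minimal retry sketch to illustrate the principle (placeholders throughout; this is not Harbor's internal implementation, just the wait-briefly-and-retry idea):

# Retry pulling a manifest a few times with a short sleep, to ride out the window
# between the push notification (event 1) and the tag file landing in storage (event 2)
for i in 1 2 3 4 5; do
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer <token>" \
    "https://<registry-host>/v2/<repo>/manifests/<tag>")
  [ "$code" != "404" ] && break
  sleep 0.05   # ~50ms back-off before retrying
done
echo "last status: $code"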

See the analysis in these two issues:
Wait for manifest in notification handler #6223
registry sends out image push notifications before tag is ready in backend storage #2625

【Solution】
When this kind of error shows up again, follow the reasoning and principle above (wait briefly and retry) to decide how to handle it.
