A Collection of Problems Encountered While Using Harbor
Table of Contents
- A Collection of Problems Encountered While Using Harbor
- 1. [Resolved] Downgrading Harbor: core fails to start with "failed to initialize database: file does not exist"
- 2. [Unresolved, no impact] core logs "failed to get key admiral_url, error: the configure value is not set"
- 3. [Resolved] UI login fails with an error
- 4. [Resolved] Failed replication tasks cannot be stopped or deleted
- 5. [Unresolved] Replication fails: "failed to fetch image: FetchImages error when collect tags for repos"
- 6. [Resolved] Batch-deleting tags in the Harbor UI leaves 0B-size tags that cannot be deleted
- 7. Replicated images return 404
1. [Resolved] Downgrading Harbor: core fails to start with "failed to initialize database: file does not exist"
[Version] Harbor 1.8.6
[Install method] Helm
[Operation] Downgraded from Harbor 1.9.4 to Harbor 1.8.6; core fails to start
[Error]
2019-09-16T14:02:03Z [ERROR] [/common/config/manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019-09-16T14:02:03Z [ERROR] [/common/config/manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019-09-16T14:02:03Z [INFO] [/core/config/config.go:143]: initializing the project manager based on local database...
2019-09-16T14:02:03Z [INFO] [/core/main.go:96]: configurations initialization completed
2019-09-16T14:02:03Z [INFO] [/common/dao/base.go:84]: Registering database: type-PostgreSQL host-maxim-harbor-database port-5432 databse-registry sslmode-"disable"
[ORM]2019/09/16 14:02:03 Detect DB timezone: UCT open /usr/local/go/lib/time/zoneinfo.zip: no such file or directory
2019-09-16T14:02:03Z [INFO] [/common/dao/base.go:89]: Register database completed
2019-09-16T14:02:03Z [INFO] [/common/dao/pgsql.go:107]: Upgrading schema for pgsql ...
2019-09-16T14:02:03Z [ERROR] [/common/dao/pgsql.go:112]: Failed to upgrade schema, error: "file does not exist"
2019-09-16T14:02:03Z [FATAL] [/core/main.go:103]: failed to initialize database: file does not exist
[Root cause]
The official document Upgrading Harbor Deployed with Helm notes:
Notes
- As the database schema may change between different versions of Harbor, there is a progress to migrate the schema during the upgrade and the downtime cannot be avoid
- The database schema cannot be downgraded automatically, so the helm rollback is not supported
In other words, because the database schema may change between Harbor versions, the upgrade includes a step that migrates the schema, so downtime is unavoidable; and since the schema cannot be downgraded automatically, helm rollback is not supported.
The problem is also discussed in an issue: failed to initialize database: file does not exist #354, which states: "The root cause for the error file does not exist is that you guys use dev images or try to downgrade the installation."
My earlier understanding was apparently wrong. Since version 1.6, Harbor no longer relies on the separate migration tool; the core tries to migrate the DB automatically at startup, so I assumed a downgrade would be handled automatically as well. I therefore did not downgrade the database separately and simply deployed the old chart version via Helm, which did not work. I then tried changing the schema version directly in the DB:
# Enter the database container
kubectl exec -it harbor-harbor-database-0 bash
# Connect to PostgreSQL and open the registry database
psql -U postgres -d registry
# Check the schema version, then update it
select * from schema_migrations;
version | dirty
---------+-------
12 | f
(1 row)
update schema_migrations set version=5 where version=12;
But after this change, once the core pod was restarted the DB became dirty: the dirty column flipped to 'true'.
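For reference, a minimal sketch of how one might inspect and clear the dirty flag from inside the database pod. The pod name and user match the example above; everything else is an assumption, and clearing the flag does not by itself repair a half-applied migration, so only do it if you are sure the schema content really matches the recorded version:
# Sketch only: inspect the migration state first
kubectl exec -it harbor-harbor-database-0 -- psql -U postgres -d registry -c "select version, dirty from schema_migrations;"
# If (and only if) the schema content matches the recorded version, the dirty flag can be reset
kubectl exec -it harbor-harbor-database-0 -- psql -U postgres -d registry -c "update schema_migrations set dirty = false;"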
[Solution]
I gave up on this one. One more experiment worth trying would have been to run the migration down step separately, but with a Helm deployment on Kubernetes backed by PVs this is not as convenient as simply running docker run on a single host.
In general, for downgrades I recommend backing up the database before upgrading, performing the upgrade, and then restoring from that backup if you need to downgrade or roll back, instead of letting an already-upgraded database be migrated again, which is what leaves it dirty.
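As an illustration only (the namespace, pod name and postgres user below are assumptions for a typical Helm deployment), a backup/restore flow could look like this:
# Before upgrading: dump the registry database from the database pod
kubectl -n harbor exec harbor-harbor-database-0 -- pg_dump -U postgres -d registry > registry-pre-upgrade.sql
# To roll back: redeploy the old chart with a clean/empty registry database, then restore the dump
kubectl -n harbor exec -i harbor-harbor-database-0 -- psql -U postgres -d registry < registry-pre-upgrade.sql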
2. [Unresolved, no impact] core logs "failed to get key admiral_url, error: the configure value is not set"
[Version] Harbor 1.8.6
[Install method] Helm
[Operation] Harbor 1.8.6 installed fine and everything works; the core log just keeps printing this error
[Error]
2019/05/07 09:23:35 [D] [server.go:2619] | 10.112.0.1| 200 | 846.014µs| match| GET /api/ping r:/api/ping
2019-05-07T09:23:43Z [ERROR] [manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019-05-07T09:23:43Z [ERROR] [manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019/05/07 09:23:43 [D] [server.go:2619] | 10.112.0.1| 200 | 1.063696ms| match| GET /api/ping r:/api/ping
2019-05-07T09:23:45Z [ERROR] [manager.go:192]: failed to get key admiral_url, error: the configure value is not set
2019-05-07T09:23:45Z [ERROR] [manager.go:192]: failed to get key admiral_url, error: the configure value is not set
[Root cause]
Found this issue: failed to get key admiral_url, error: the configure value is not set #226
“Make sure you are using the GA version of Harbor on the 1.0.0 branch.
In your templates, you should have the adminserver configmaps (adminserver-cm.yaml) with the value ADMIRAL_URL: “NA””
It suggests the error shows up when running a master-branch build, but I am not using the master branch.
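To double-check, the relevant configmap can be inspected. The configmap name below is an assumption based on a default Helm release name and may differ in your deployment:
# List Harbor configmaps and look for an admiral-related key
kubectl -n harbor get configmaps | grep -i harbor
kubectl -n harbor get configmap harbor-harbor-core -o yaml | grep -i admiral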
[Solution]
Not resolved, but it does not affect normal use.
3. [Resolved] UI login fails with an error
[Version] Harbor 1.9.4
[Install method] Helm
[Operation] After upgrading to Harbor 1.9.4, log in to the UI
[Error] Login fails, complaining that the username or password is incorrect
[Root cause] Harbor 1.9 introduced the quota mechanism. Populating the quotas requires scanning all existing images, which can take quite a while; login works again once this migration finishes.
[Solution] Wait patiently for the quota migration to complete.
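One way to confirm that the quota sync is still running (namespace and deployment name are assumptions for a default Helm release) is simply to follow the core logs:
# Follow the core logs and watch for quota/sync related messages
kubectl -n harbor logs -f deploy/harbor-harbor-core | grep -iE 'quota|sync'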
4. [Resolved] Failed replication tasks cannot be stopped or deleted
[Version] Harbor 1.5.0
[Install method] docker-compose
[Operation] A replication task got stuck with no progress, so I clicked "Stop job"; nothing happened, so I then tried to delete the replication rule
[Error] No response at first, then after a while a "500 internal unknown error" pops up, and the replication policy cannot be deleted: "deletion failed, there are jobs in pending/running status"
[Root cause] The job is stuck; see the issue Impossible to delete replication rule : "have pending/running/retrying status" #7057
[Solution] Go into harbor-db, update the replication tables directly, restart jobservice, and then re-enable replication for that policy from the web UI
# Enter the db container
docker exec -it harbor-db bash
mysql -h localhost -p -uroot
# Enter the default root password when prompted
MariaDB [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| registry |
+--------------------+
4 rows in set (0.01 sec)
MariaDB [(none)]> use registry;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [registry]> show tables;
+-------------------------------+
| Tables_in_registry |
+-------------------------------+
| access |
| access_log |
| alembic_version |
| clair_vuln_timestamp |
| harbor_label |
| harbor_resource_label |
| img_scan_job |
| img_scan_overview |
| project |
| project_member |
| project_metadata |
| properties |
| replication_immediate_trigger |
| replication_job |
| replication_policy |
| replication_target |
| repository |
| role |
| user |
| user_group |
+-------------------------------+
20 rows in set (0.00 sec)
# Only three tables are related to replication; the one we need to modify is replication_job
# First SELECT the rows to inspect them; only a few records are kept here as an example
MariaDB [registry]> select * from replication_job where policy_id=8 and status in ('pending', 'running');
| id | status | policy_id | repository | operation | tags | job_uuid | creation_time | update_time |
| 6758 | pending | 8 | prod/andddt | transfer | 0.2.0-build136 | | 2020-03-17 01:21:28 | 2020-03-17 01:21:28 |
| 9962 | running | 8 | prod/safadrn-fe-pro | transfer | 0.1.0-build110,0.1.0-build111,0.1.0-build112,0.1.0-build113,0.1.0-build115,0.1.0-build116,0.1.0-build117,0.1.0-build118,0.1.0-build119,0.1.0-build120,0.1.0-build121,0.1.0-build122,0.1.0-build123,0.1.0-build124,0.1.0-build125,0.1.0-build126,0.1.0-build127,1.0.0-build128,1.0.0-build129,1.0.0-build130,1.0.0-build131,1.0.0-build132,1.0.0-build133,1.0.0-build134,1.0.0-build135,1.0.0-build138,1.0.0-build139,1.0.0-build140,1.0.0-build141 | d5dcb08860d0a7dd4d3bdac2 | 2020-05-17 14:44:54 | 2020-05-17 18:02:10 |
| 9967 | running | 8 | prod/opfadools-srv | transfer | 1.0.0,1.0.2,1.0.6,1.0.7,1.0.8,1.0.9,1.0.10,1.0.11,1.0.12,1.0.13,1.0.14,1.0.17,1.0.18,1.0.19,1.0.20,1.0.21,1.0.22,1.0.23,1.0.24,1.0.25,1.0.26,1.0.27,1.0.28,1.0.29,1.0.30,1.0.31,1.0.32,1.0.33,1.0.34,1.0.35,1.0.36,1.0.37,1.0.38,1.0.39,1.0.40,1.0.41,1.0.42,1.0.43,1.0.44,1.0.45,1.0.46,1.0.47,1.0.48,1.0.49
# Update the rows; replace policy_id in the SQL with the id of the stuck replication policy, which you can find in the replication_policy table or via the browser's developer tools
MariaDB [registry]> update replication_job set status='finished',update_time=now() where policy_id=8 and status in ('pending', 'running');
Query OK, 82 rows affected (0.01 sec)
Rows matched: 82 Changed: 82 Warnings: 0
# Leave the db
exit
# Restart the jobservice container
docker restart harbor-jobservice
This procedure applies to Harbor versions below 1.6. From Harbor 1.6 onward the database switched from MySQL to PostgreSQL, and Harbor 1.8 restructured the replication tables, moving the data of replication_job into replication_execution.
In short: on Harbor < 1.6 you can run the commands above as-is; on 1.6 and later, follow the same idea but adapt the commands to the new database and table layout, as sketched below.
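For example, on a 1.8+ deployment a roughly equivalent cleanup might look like the following. The table names (replication_execution, replication_task), the status values and the container name are assumptions about the newer schema, so inspect them first before changing anything:
# Enter the PostgreSQL container (container name may differ; on Kubernetes, kubectl exec into the database pod instead)
docker exec -it harbor-db psql -U postgres -d registry
-- Check the actual table names and the statuses of the stuck rows first
\dt replication*
select id, policy_id, status from replication_execution order by id desc limit 20;
select id, execution_id, status from replication_task order by id desc limit 20;
-- Mark the stuck rows as stopped (replace 8 with the id of the stuck policy), then restart jobservice
update replication_task set status='Stopped', end_time=now()
  where execution_id in (select id from replication_execution where policy_id=8)
  and status not in ('Succeed','Failed','Stopped');
update replication_execution set status='Stopped', end_time=now()
  where policy_id=8 and status not in ('Succeed','Failed','Stopped');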
5. [Unresolved] Replication fails: "failed to fetch image: FetchImages error when collect tags for repos"
[Version] Harbor 1.9
[Install method] Helm
[Operation] Harbor 1.9 uses a pull-based replication policy to pull images from a Harbor 1.7 instance
[Error] Shortly after the replication policy starts it fails immediately. The core log shows "failed to fetch image: FetchImages error when collect tags for repos", while jobservice has not even started working. Note that this does not happen for ordinary repositories with few tags, but it does happen for repositories with a huge number of tags, e.g. more than 5000:
[root@SYSOPS00613596 ~]# kubectl -n goharbor logs goharbor-harbor-core-67dcd898b6-s6nm5 --tail 200
2020/05/17 15:30:48 [D] [server.go:2774] | 10.14.133.223| 200 | 6.384939ms| match| GET /api/systeminfo r:/api/systeminfo
2020-05-17T15:30:48Z [ERROR] [/common/api/base.go:73]: GET /api/users/current failed with error: {"code":401,"message":"UnAuthorize"}
2020/05/17 15:30:48 [D] [server.go:2774] | 10.14.133.223| 401 | 1.383716ms| match| GET /api/users/current r:/api/users/:id
2020-05-17T15:30:48Z [ERROR] [/common/api/base.go:73]: GET /api/users/current failed with error: {"code":401,"message":"UnAuthorize"}
2020/05/17 15:30:48 [D] [server.go:2774] | 10.14.133.223| 401 | 1.32869ms| match| GET /api/users/current r:/api/users/:id
2020/05/17 15:30:48 [D] [server.go:2774] | 10.14.133.223| 200 | 7.970204ms| match| GET /api/systeminfo r:/api/systeminfo
2020-05-17T15:30:53Z [ERROR] [/core/controllers/base.go:110]: Error occurred in UserLogin: Failed to authenticate user, due to error 'Invalid credentials'
2020/05/17 15:30:55 [D] [server.go:2774] | 172.16.12.1| 200 | 630.641µs| match| GET /api/ping r:/api/ping
2020/05/17 15:30:57 [D] [server.go:2774] | 172.16.12.1| 200 | 656.857µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:00 [D] [server.go:2774] | 10.14.133.223| 200 | 8.132316ms| match| POST /c/login r:/c/login
2020/05/17 15:31:00 [D] [server.go:2774] | 10.14.133.223| 200 | 6.063739ms| match| GET /api/users/current r:/api/users/:id
2020/05/17 15:31:00 [D] [server.go:2774] | 10.14.133.223| 200 | 1.180429ms| match| GET /api/configurations r:/api/configurations
2020/05/17 15:31:00 [D] [server.go:2774] | 10.14.133.223| 200 | 7.681296ms| match| GET /api/statistics r:/api/statistics
2020-05-17T15:31:00Z [WARNING] [/core/systeminfo/imagestorage/filesystem/driver.go:51]: The path /data is not found, will return zero value of capacity
2020/05/17 15:31:00 [D] [server.go:2774] | 10.14.133.223| 200 | 476.811µs| match| GET /api/systeminfo/volumes r:/api/systeminfo/volumes
2020/05/17 15:31:00 [D] [server.go:2774] | 10.14.133.223| 200 | 186.361331ms| match| GET /api/projects r:/api/projects/
2020/05/17 15:31:05 [D] [server.go:2774] | 172.16.12.1| 200 | 607.796µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:07 [D] [server.go:2774] | 172.16.12.1| 200 | 692.877µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:15 [D] [server.go:2774] | 172.16.12.1| 200 | 749.928µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:17 [D] [server.go:2774] | 172.16.12.1| 200 | 621.023µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:19 [D] [server.go:2774] | 10.14.133.223| 200 | 9.952446ms| match| GET /api/projects/3/members r:/api/projects/:pid([0-9]+)/members/?:pmid([0-9]+)
2020/05/17 15:31:19 [D] [server.go:2774] | 10.14.133.223| 200 | 20.347816ms| match| GET /api/users/current/permissions r:/api/users/:id/permissions
2020/05/17 15:31:19 [D] [server.go:2774] | 10.14.133.223| 200 | 12.188607ms| match| GET /api/projects/3 r:/api/projects/:id([0-9]+)
2020/05/17 15:31:19 [D] [server.go:2774] | 10.14.133.223| 200 | 41.119236ms| match| GET /api/projects r:/api/projects/
2020/05/17 15:31:19 [D] [server.go:2774] | 10.14.133.223| 200 | 17.705786ms| match| GET /api/projects/3/summary r:/api/projects/:id([0-9]+)/summary
2020/05/17 15:31:21 [D] [server.go:2774] | 10.14.133.223| 200 | 11.394304ms| match| GET /api/systeminfo r:/api/systeminfo
2020/05/17 15:31:21 [D] [server.go:2774] | 10.14.133.223| 200 | 96.774652ms| match| GET /api/repositories r:/api/repositories
2020/05/17 15:31:21 [D] [server.go:2774] | 10.14.133.223| 200 | 88.841389ms| match| GET /api/repositories r:/api/repositories
2020/05/17 15:31:25 [D] [server.go:2774] | 172.16.12.1| 200 | 598.692µs| match| GET /api/ping r:/api/ping
2020-05-17T15:31:26Z [WARNING] [/core/systeminfo/imagestorage/filesystem/driver.go:51]: The path /data is not found, will return zero value of capacity
2020/05/17 15:31:26 [D] [server.go:2774] | 10.14.133.223| 200 | 872.73µs| match| GET /api/systeminfo/volumes r:/api/systeminfo/volumes
2020/05/17 15:31:26 [D] [server.go:2774] | 10.14.133.223| 200 | 1.53326ms| match| GET /api/configurations r:/api/configurations
2020/05/17 15:31:26 [D] [server.go:2774] | 10.14.133.223| 200 | 7.776343ms| match| GET /api/statistics r:/api/statistics
2020/05/17 15:31:26 [D] [server.go:2774] | 10.14.133.223| 200 | 34.05679ms| match| GET /api/projects r:/api/projects/
2020/05/17 15:31:27 [D] [server.go:2774] | 172.16.12.1| 200 | 597.104µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:35 [D] [server.go:2774] | 172.16.12.1| 200 | 617.784µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:37 [D] [server.go:2774] | 172.16.12.1| 200 | 670.764µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:45 [D] [server.go:2774] | 172.16.12.1| 200 | 594.816µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:47 [D] [server.go:2774] | 172.16.12.1| 200 | 620.794µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:55 [D] [server.go:2774] | 172.16.12.1| 200 | 614.962µs| match| GET /api/ping r:/api/ping
2020/05/17 15:31:57 [D] [server.go:2774] | 172.16.12.1| 200 | 646.838µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:05 [D] [server.go:2774] | 172.16.12.1| 200 | 1.512053ms| match| GET /api/ping r:/api/ping
2020/05/17 15:32:07 [D] [server.go:2774] | 172.16.12.1| 200 | 604.774µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:15 [D] [server.go:2774] | 172.16.12.1| 200 | 584.353µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:17 [D] [server.go:2774] | 172.16.12.1| 200 | 571.165µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:25 [D] [server.go:2774] | 172.16.12.1| 200 | 664.619µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:27 [D] [server.go:2774] | 172.16.12.1| 200 | 585.596µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:32 [D] [server.go:2774] | 10.14.133.223| 200 | 1.425728ms| match| GET /api/registries r:/api/registries
2020/05/17 15:32:32 [D] [server.go:2774] | 10.14.133.223| 200 | 652.048µs| match| GET /api/replication/adapters r:/api/replication/adapters
2020/05/17 15:32:32 [D] [server.go:2774] | 10.14.133.223| 200 | 42.583993ms| match| GET /api/projects r:/api/projects/
2020/05/17 15:32:33 [D] [server.go:2774] | 10.14.133.223| 200 | 7.083254ms| match| GET /api/replication/policies r:/api/replication/policies
2020/05/17 15:32:33 [D] [server.go:2774] | 10.14.133.223| 200 | 1.176269ms| match| GET /api/registries r:/api/registries
2020/05/17 15:32:35 [D] [server.go:2774] | 172.16.12.1| 200 | 1.031396ms| match| GET /api/ping r:/api/ping
2020/05/17 15:32:37 [D] [server.go:2774] | 172.16.12.1| 200 | 689.483µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:40 [D] [server.go:2774] | 10.14.133.223| 200 | 11.612709ms| match| GET /api/replication/executions r:/api/replication/executions
2020/05/17 15:32:42 [D] [server.go:2774] | 10.14.133.223| 200 | 3.353095ms| match| GET /api/replication/policies/2 r:/api/replication/policies/:id([0-9]+)
2020/05/17 15:32:42 [D] [server.go:2774] | 10.14.133.223| 200 | 24.422296ms| match| GET /api/registries/1/info r:/api/registries/:id/info
2020/05/17 15:32:45 [D] [server.go:2774] | 172.16.12.1| 200 | 617.396µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:47 [D] [server.go:2774] | 172.16.12.1| 200 | 580.195µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:51 [D] [server.go:2774] | 10.14.133.223| 201 | 6.959826ms| match| POST /api/replication/executions r:/api/replication/executions
2020/05/17 15:32:51 [D] [server.go:2774] | 10.14.133.223| 200 | 7.618415ms| match| GET /api/replication/executions r:/api/replication/executions
2020-05-17T15:32:52Z [ERROR] [/common/utils/passports.go:109]: List tags for repo 'ai-cloud/quay.io/coreos/kube-state-metrics' error: http error: code 500, message
2020-05-17T15:32:55Z [ERROR] [/replication/operation/controller.go:108]: the execution 11 failed: failed to fetch image: FetchImages error when collect tags for repos
2020/05/17 15:32:55 [D] [server.go:2774] | 172.16.12.1| 200 | 663.94µs| match| GET /api/ping r:/api/ping
2020/05/17 15:32:57 [D] [server.go:2774] | 172.16.12.1| 200 | 685.434µs| match| GET /api/ping r:/api/ping
2020/05/17 15:33:01 [D] [server.go:2774] | 10.14.133.223| 200 | 2.960291ms| match| GET /api/replication/executions r:/api/replication/executions
2020/05/17 15:33:03 [D] [server.go:2774] | 29.2.32.132| 401 | 1.799198ms| match| GET /v2/ r:/v2/*
2020/05/17 15:33:03 [D] [server.go:2774] | 29.2.32.132| 200 | 17.12181ms| match| GET /service/token r:/service/token
2020/05/17 15:33:03 [D] [server.go:2774] | 29.2.32.132| 200 | 1.604455ms| match| HEAD /v2/ r:/v2/*
[Root cause]
The issue failed to fetch image: FetchImages error when collect tags for repos #9295 discusses this problem, but the source registry there is GitLab-based, and in my case jobservice shows no error at all, because the replication fails before jobservice even starts. Also, the error right before this one in the core log is: List tags for repo 'ai-cloud/quay.io/coreos/kube-state-metrics' error: http error: code 500, message; yet listing tags for that repository directly through the API works fine.
So the fundamental question remains: why does iterating over the tags of a repository with a very large number of tags return a 500, when walking the whole repository manually through the API never fails?
According to the official docs, 1.9 is a watershed: versions below 1.9 cannot replicate with 1.9. I will verify again after upgrading.
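For reference, this is roughly how I verified the tag listing manually through the v1 API; the host and credentials are placeholders, and slashes in the repository name must be URL-encoded:
# List the tags of one repository through the v1 API
curl -s -u admin:Harbor12345 "https://harbor.example.com/api/repositories/ai-cloud%2Fquay.io%2Fcoreos%2Fkube-state-metrics/tags" | head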
[Solution]
Unresolved for now; I will retest after both sides are upgraded to compatible versions.
6. [Resolved] Batch-deleting tags in the Harbor UI leaves 0B-size tags that cannot be deleted
[Version] Seen on several versions
[Install method] —
[Operation] Delete multiple tags at once in the Harbor UI
[Error] The delete requests return HTTP 200 and report success, but after refreshing the page the tags are still there, now showing a size of 0B, and they can no longer be deleted from the UI
[Root cause]
ui.log reports "unknown error"; it looks like the manifest link no longer points at existing data
# tail -n 100 ui.log
[ERROR] [repository.go:298]: failed to delete tag v2: 500 {"errors":[{"code":"UNKNOWN","message":"unknown error","detail":{"Path":
"/docker/registry/v2/repositories/libriry/test/_manifests/tags/v1/current/link","DriverName":"filesystem"}}]}
# Follow the link path referenced in the log to see what it points to
# cat /docker/registry/docker/registry/v2/repositories/libriry/test/_manifests/tags/6b32ea9cfc8b3df0c2962336661077865baa9d8d/index/sha256/e7dd642e969acb450e3f5fe9067d096f0947c5f3084d23c0fea8038087d038f7/link
sha256:e7dd642e969acb450e3f5fe9067d096f0947c5f3084d23c0fea8038087d038f7[/]#
# Look in the registry storage for the blob matching that digest: it cannot be found
# ls /data/registry/docker/registry/v2/blobs/sha256/e7/e7dd*
ls: cannot access /data/registry/docker/registry/v2/blobs/sha256/e7/e7dd*: No such file or directory
Now look at registry.log:
Dec 11 15:48:59 172.18.0.1 registry[3597772]: time="2018-12-11T07:48:59.125019435Z" level=error msg="response completed with error" auth.user.name=harborAdmin err.code=unknown err.message="fi
lesystem: Path not found: /docker/registry/v2/repositories/testforrsync/test2/_manifests/tags/v1/current/link" go.version=go1.7.3 http.request.host="10.182.2.108:25000" http.request.id=51bbb9
d6-22cc-46d5-a204-c76f28955cb8 http.request.method=DELETE http.request.remoteaddr=10.182.2.120 http.request.uri="/v2/testforrsync/test2/manifests/sha256:3e533db287adef4291610c6186c49efcb808e8
eb76f5b8e8873077a769fd09bc" http.request.useragent="Go-http-client/1.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=7.86665ms http.response.status=500 h
ttp.response.written=188 instance.id=9c456576-0ef7-4676-9710-fa01b8527d3e service=registry vars.name="testforrsync/test2" vars.reference="sha256:3e533db287adef4291610c6186c49efcb808e8eb76f5b8
e8873077a769fd09bc" version=v2.6.2
This looks like a bug in docker registry (distribution) rather than in Harbor: deleting or querying the manifest directly against the registry also returns a manifest unknown error
curl -X DELETE -H 'Authorization: Bearer xxx' http://registry:5000/v2/myrepo/myimage/manifests/sha256:03d7371cdfd09a41f04cf4f862e1204a91231f80c1bae785231fafd936491097
{
"errors": [
{
"code": "MANIFEST_UNKNOWN",
"message": "manifest unknown"
}
]
}
curl -H 'Authorization: Bearer xxx' http://registry:5000/v2/myrepo/myimage/manifests/sha256:03d7371cdfd09a41f04cf4f862e1204a91231f80c1bae785231fafd936491097
{
"errors": [
{
"code": "MANIFEST_UNKNOWN",
"detail": {
"Name": "myrepo/myimage",
"Revision": "sha256:03d7371cdfd09a41f04cf4f862e1204a91231f80c1bae785231fafd936491097"
},
"message": "manifest unknown"
}
]
}
As the log analysis above shows, the revision path in the storage backend points at nothing:
/data/docker/registry/v2/repositories/myrepo/myimage/_manifests/revisions/sha256/03d7371cdfd09a41f04cf4f862e1204a91231f80c1bae785231fafd936491097
For a normal image there would be a link file under this path, plus the corresponding blob under /data/docker/registry/v2/blobs/sha256.
The official answer is: "These are due to a bug in upstream docker distribution, essentially the blob is not written to the storage but the manifest created successfully and referencing the non-existing blob." See: Delete tag failed – size 0 image. #6515
In other words, this is a docker distribution problem: the manifest is created successfully, but the blob is never written to storage, so the manifest references a blob that does not exist.
[Solution]
For tags that have already turned into 0B images, push the same tag again and then delete it from the UI once more.
To avoid the situation in the first place, you can:
(1) on current versions, delete tags one by one through the API instead of batch-deleting in the UI (see the sketch after this list);
(2) upgrade to 2.0, which stores images as OCI artifacts.
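A minimal sketch of deleting tags one at a time through the v1 API; the host, project/repository names, tag list and credentials are placeholders, and on Harbor 2.x the artifact API is different:
# Delete tags one by one instead of batch-deleting in the UI
for tag in v1 v2 v3; do
  curl -s -X DELETE -u admin:Harbor12345 \
    "https://harbor.example.com/api/repositories/myproject%2Fmyimage/tags/${tag}"
  sleep 1   # small pause between deletions
done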
7. Replicated images return 404
[Version] Seen on several versions
[Install method] —
[Operation] Replicate/synchronize images from one Harbor registry to another
[Error] When a repository holds many images, replication hits quite a few manifest 404 errors; the log shows: pull manifest from local registry failed, 404 tag unknown, tag=xxx.
[Root cause]
It helps to understand roughly how Harbor talks to docker registry: Harbor builds on docker registry and is essentially a management layer around it, so the interaction with the registry is the key point.
First, docker registry listens for client pushes. When a push completes, it fires a notification (event 1) and writes the tag file into the backend storage (event 2). When Harbor receives the push notification, it extracts the repo and tag from the message and performs a pull-manifest operation (similar to docker pull xxx.com/<repo>:<tag>), which we call event 3; this makes the registry read the tag file from the backend.
In reality, event 2 and event 3 run asynchronously. If event 2 finishes before the event 3 request arrives, nothing goes wrong; but if event 3 arrives while event 2 has not finished yet, the registry returns a 404 tag unknown error. So when a 404 shows up, simply having the replication component pause for about 50 ms and retry solves the problem.
See the analysis in these two issues:
Wait for manifest in notification handler #6223
registry sends out image push notifications before tag is ready in backend storage #2625
[Solution]
So when you run into this kind of error again, follow the idea above: treat it as a race and retry after a short delay, as in the sketch below.
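A minimal sketch of the retry-on-404 idea; the registry host, repository, tag and bearer token are placeholders, and the 50 ms delay follows the analysis above:
# Try to pull the manifest; on 404, sleep briefly and retry, giving the registry time to finish writing the tag file (event 2)
for i in 1 2 3 4 5; do
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H 'Authorization: Bearer xxx' \
    "https://registry.example.com/v2/myrepo/myimage/manifests/mytag")
  [ "$code" != "404" ] && break
  sleep 0.05
done
echo "last HTTP status: $code"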