brapi-java/docs/architecture/00-overall-data-architecture.md
2026-05-28 11:56:17 +08:00

# BrAPI Test Server Overall Data Architecture Diagram
This document links the four modules into a single overview diagram:
```text
Core -> Germplasm/Seed -> Phenotyping -> Genotyping
```
Corresponding module documents:
| Module | Document | Core role |
| --- | --- | --- |
| Core | `core-data-flow.md` | Base context such as crop, program, trial, study, location, and person |
| Germplasm/Seed | `04-germplasm-seed-data-flow.md` | germplasm, breeding_method, seed_lot, cross, pedigree, attribute |
| Phenotyping | `02-phenotyping-data-flow.md` | observation_unit, observation_variable, event, image, observation |
| Genotyping | `03-genotyping-data-flow.md` | sample, plate, reference, variantset, variant, callset, allele_call |
## Summary
The backbone of the entire data model is:
```text
Core: crop -> program -> trial -> study
Germplasm: breeding_method -> germplasm -> cross / seed_lot / pedigree / attribute
Phenotyping: study + germplasm/seed_lot/cross -> observation_unit -> observation
Genotyping: observation_unit/study -> sample -> callset -> allele_call
Genotyping: reference_set -> variantset -> variant -> allele_call
```
`study` is the main bridge from Core to Phenotyping/Genotyping; `germplasm` is the main bridge from Germplasm/Seed to Phenotyping/Genotyping; `observation_unit` is the main bridge from Phenotyping to Genotyping.
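The backbone chains above can be read as a directed graph. The following is a minimal sketch, not part of the BrAPI Test Server code: the `BACKBONE` map and the `downstream` helper are hypothetical names, the edges are transcribed by hand from the chains above, and alternatives written with `/` or `+` are treated as separate parent edges.

```python
from collections import deque

# Backbone edges (parent -> children), transcribed from the chains above.
# Assumption: "a/b -> c" and "a + b -> c" both become one edge per parent.
BACKBONE = {
    "crop": ["program"],
    "program": ["trial"],
    "trial": ["study"],
    "breeding_method": ["germplasm"],
    "germplasm": ["cross", "seed_lot", "pedigree", "attribute", "observation_unit"],
    "seed_lot": ["observation_unit"],
    "cross": ["observation_unit"],
    "study": ["observation_unit", "sample"],
    "observation_unit": ["observation", "sample"],
    "sample": ["callset"],
    "callset": ["allele_call"],
    "reference_set": ["variantset"],
    "variantset": ["variant"],
    "variant": ["allele_call"],
}


def downstream(start):
    """Return every entity reachable from `start` along backbone edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in BACKBONE.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

For example, `downstream("reference_set")` yields `{"variantset", "variant", "allele_call"}`, and `downstream("study")` lists everything that ultimately hangs off a study, which is one way to see why `study` acts as the main bridge.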
## Overall Architecture Diagram
```mermaid
flowchart TD
subgraph CORE["Core base context"]
CROP["crop"]
PERSON["person"]
PROGRAM["program"]
LOCATION["location"]
TRIAL["trial<br/>trial batch"]
SEASON["season"]
STUDY["study<br/>study / trial execution unit"]
LIST["list / list_item<br/>generic lists"]
CROP --> PROGRAM
PERSON --> PROGRAM
PROGRAM --> TRIAL
CROP --> TRIAL
PROGRAM --> LOCATION
CROP --> LOCATION
TRIAL --> STUDY
PROGRAM --> STUDY
CROP --> STUDY
LOCATION --> STUDY
SEASON --> STUDY
PERSON --> LIST
end
subgraph GERM["Germplasm / Seed"]
BM["breeding_method"]
GERMPLASM["germplasm"]
GAD["germplasm_attribute_definition<br/>attribute definition"]
GAV["germplasm_attribute_value<br/>attribute value"]
CP["crossing_project"]
CROSS["cross_entity<br/>Cross / PlannedCross"]
XP["cross_parent"]
PEDNODE["pedigree_node"]
PEDEDGE["pedigree_edge"]
SEEDLOT["seed_lot"]
MIX["seed_lot_content_mixture<br/>lot composition"]
TX["seed_lot_transaction<br/>lot transfer"]
BM --> GERMPLASM
GAD --> GAV
GERMPLASM --> GAV
CP --> CROSS
CROSS --> XP
GERMPLASM --> XP
CROSS --> CROSS_PLANNED["cross_entity<br/>planned cross self-reference"]
GERMPLASM --> PEDNODE
CP --> PEDNODE
PEDNODE --> PEDEDGE
PEDEDGE --> PEDNODE2["pedigree_node<br/>parent / offspring node"]
GERMPLASM --> MIX
CROSS --> MIX
MIX --> SEEDLOT
SEEDLOT --> TX
TX --> SEEDLOT
end
subgraph PHENO["Phenotyping"]
ONTOLOGY["ontology"]
TRAIT["trait"]
METHOD["method"]
SCALE["scale"]
OV["observation_variable"]
OU["observation_unit"]
EVENT["event"]
IMAGE["image"]
OBS["observation<br/>observed value"]
ONTOLOGY --> TRAIT
ONTOLOGY --> METHOD
ONTOLOGY --> SCALE
TRAIT --> OV
METHOD --> OV
SCALE --> OV
OU --> OBS
OV --> OBS
EVENT --> OU
OU --> IMAGE
IMAGE --> OBS
end
subgraph GENO["Genotyping"]
PLATE["plate<br/>sample plate"]
SAMPLE["sample"]
REFSET["reference_set"]
REF["reference<br/>reference sequence"]
REFB["reference_bases<br/>reference segment"]
VARSET["variantset<br/>variant set"]
VARIANT["variant<br/>variant site"]
CALLSET["callset<br/>per-sample call set"]
CALL["allele_call<br/>genotype result"]
GMAP["genome_map<br/>genetic map"]
LG["linkageGroup<br/>linkage group"]
MP["marker_position<br/>map position"]
PLATE --> SAMPLE
SAMPLE --> CALLSET
CALLSET --> CALL
REFSET --> REF
REF --> REFB
REFSET --> VARSET
VARSET --> VARIANT
REFSET --> VARIANT
VARIANT --> CALL
GMAP --> LG
LG --> MP
VARIANT --> MP
end
CROP --> GERMPLASM
CROP --> GAD
TRAIT --> GAD
METHOD --> GAD
SCALE --> GAD
ONTOLOGY --> GAD
PROGRAM --> CP
PROGRAM --> SEEDLOT
LOCATION --> SEEDLOT
STUDY --> OU
TRIAL --> OU
PROGRAM --> OU
CROP --> OU
GERMPLASM --> OU
SEEDLOT --> OU
CROSS --> OU
STUDY --> EVENT
STUDY --> OBS
TRIAL --> OBS
PROGRAM --> OBS
CROP --> OBS
STUDY --> PLATE
TRIAL --> PLATE
PROGRAM --> PLATE
STUDY --> SAMPLE
TRIAL --> SAMPLE
PROGRAM --> SAMPLE
OU --> SAMPLE
GERMPLASM --> REFSET
STUDY --> VARSET
CROP --> GMAP
```
## Key Cross-Module Bridges
| Bridge | Modules connected | Description |
| --- | --- | --- |
| `crop` | Core -> Germplasm/Pheno/Geno | The crop dimension runs through program, trial, study, germplasm, variables, and maps |
| `program` | Core -> Germplasm/Seed/Pheno/Geno | The program dimension connects crossing_project, seed_lot, observation_unit, sample, plate |
| `trial` | Core -> Pheno/Geno | The trial dimension connects study, observation_unit, observation, sample, plate |
| `study` | Core -> Pheno/Geno | The most important experimental context; connects observation_unit, event, observation, sample, plate, variantset |
| `germplasm` | Germplasm -> Pheno/Geno | Germplasm can connect to observation_unit, cross_parent, seed_lot_content_mixture, reference_set |
| `seed_lot` | Germplasm/Seed -> Pheno | A SeedLot can serve as the material source of an observation_unit |
| `cross_entity` | Germplasm/Seed -> Pheno | A Cross/PlannedCross can be the source of an observation_unit or a seed_lot_content_mixture |
| `observation_unit` | Pheno -> Geno | A phenotyping observation unit can generate or be linked to a genotyping sample |
| `sample` | Entry point within Geno | Leads from observation_unit/study/trial/program into callset and allele_call |
| `variant` | Locus within Geno | Connects to allele_call and marker_position; anchors genotype results to a position |
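The bridge table can also be read in reverse: given a target table, which bridges feed into it? The sketch below is illustrative only. `BRIDGES` and `bridges_into` are hypothetical names, the targets are copied from the table above, and the `crop` row is left out because it names categories (variables, maps) rather than specific tables.

```python
# Bridge -> tables it connects to, transcribed from the table above.
# The `crop` row is omitted: it lists categories, not concrete table names.
BRIDGES = {
    "program": ["crossing_project", "seed_lot", "observation_unit", "sample", "plate"],
    "trial": ["study", "observation_unit", "observation", "sample", "plate"],
    "study": ["observation_unit", "event", "observation", "sample", "plate", "variantset"],
    "germplasm": ["observation_unit", "cross_parent", "seed_lot_content_mixture", "reference_set"],
    "seed_lot": ["observation_unit"],
    "cross_entity": ["observation_unit", "seed_lot_content_mixture"],
    "observation_unit": ["sample"],
    "sample": ["callset", "allele_call"],
    "variant": ["allele_call", "marker_position"],
}


def bridges_into(table):
    """Return the bridges (in table order) whose description names `table`."""
    return [bridge for bridge, targets in BRIDGES.items() if table in targets]
```

Calling `bridges_into("sample")` returns `["program", "trial", "study", "observation_unit"]`, which matches the `sample` row: a sample can be reached from the observation unit or from any of the three Core contexts.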
## Recommended Overall Loading Order
1. Load the Core base context: `crop`, `person`, `program`, `location`, `trial`, `season`, `study`.
2. Load the Germplasm upstream data: `breeding_method`, and the `trait/method/scale/ontology` that `germplasm_attribute_definition` depends on.
3. Load `germplasm`, then add extended information such as `germplasm_attribute_value`, donor, origin, institute, synonym, and taxon.
4. If crossing is involved, load `crossing_project`, `cross_entity`, `cross_parent`; planned crosses are expressed through `cross_entity.planned` and the `planned_cross_id` self-reference.
5. Load Seed data: `seed_lot`, `seed_lot_content_mixture`, `seed_lot_transaction`.
6. Load Phenotyping definitions: `ontology`, `trait`, `method`, `scale`, `observation_variable`.
7. Load Phenotyping entities and facts: `observation_unit`, `event`, `image`, `observation`.
8. Load the Genotyping sample entry points: `plate`, `sample`.
9. Load Genotyping references and variants: `reference_set`, `reference`, `reference_bases`, `variantset`, `variant`.
10. Load Genotyping results: `callset`, `callset_variant_sets`, `allele_call`.
11. If genetic map positioning is needed, load `genome_map`, `linkageGroup`, `marker_position`.
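One way to sanity-check a load script against this order is to compare it with a prerequisite map. The following is a minimal sketch under stated assumptions: `LOAD_ORDER` keeps only the backbone tables from the steps above, `REQUIRES` is derived from the backbone chains rather than from the actual foreign keys, every listed prerequisite is assumed to be loaded first, and `order_violations` is a hypothetical helper.

```python
# Backbone tables in the order of the steps above (other tables omitted).
LOAD_ORDER = [
    "crop", "program", "trial", "study",
    "breeding_method", "germplasm", "cross_entity", "seed_lot",
    "observation_unit", "observation", "sample",
    "reference_set", "variantset", "variant",
    "callset", "allele_call",
]

# table -> prerequisites; an assumption based on the backbone chains,
# not on the real schema constraints.
REQUIRES = {
    "program": ["crop"],
    "trial": ["program"],
    "study": ["trial"],
    "germplasm": ["breeding_method"],
    "cross_entity": ["germplasm"],
    "seed_lot": ["germplasm"],
    "observation_unit": ["study", "germplasm", "seed_lot", "cross_entity"],
    "observation": ["observation_unit"],
    "sample": ["observation_unit", "study"],
    "variantset": ["reference_set"],
    "variant": ["variantset"],
    "callset": ["sample"],
    "allele_call": ["callset", "variant"],
}


def order_violations(order, requires):
    """Return (table, prerequisite) pairs where the prerequisite is missing
    from `order` or appears after the table that needs it."""
    position = {table: index for index, table in enumerate(order)}
    problems = []
    for table in order:
        for prereq in requires.get(table, ()):
            if prereq not in position or position[prereq] > position[table]:
                problems.append((table, prereq))
    return problems
```

With these assumptions `order_violations(LOAD_ORDER, REQUIRES)` returns an empty list; moving `sample` ahead of `observation_unit` would report the pair `("sample", "observation_unit")`.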
## Module Boundaries at a Glance
| Module | Root node | Main fact tables | Outward output |
| --- | --- | --- | --- |
| Core | `crop/program/trial/study` | `study` | Provides context to all business modules |
| Germplasm/Seed | `germplasm` | `germplasm_attribute_value`, `seed_lot_content_mixture`, `seed_lot_transaction`, `cross_parent`, `pedigree_edge` | Provides material sources to Pheno and a reference source to Geno |
| Phenotyping | `observation_unit` | `observation` | Provides Geno with the observed objects that samples come from |
| Genotyping | `sample`, `variant` | `allele_call` | Outputs each sample's genotype result at each locus |
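## Key Notes
1. `study` is the context entry point for most experimental data; data that enters Pheno or Geno should normally be traceable back to a `study`.
2. `germplasm` describes germplasm master data and `seed_lot` describes inventory lots; the two are related indirectly through `seed_lot_content_mixture`.
3. `plannedcross` has no database table of its own; it is stored in `cross_entity` and expressed through `planned` and `planned_cross_id`.
4. `observation_unit` can link to `germplasm`, `seed_lot`, and `cross_entity`; it is the entry point through which material enters phenotypic observation.
5. `sample` can originate from an `observation_unit` and also carries redundant links to `study/trial/program`; it is the entry point of the genotyping workflow.
6. `allele_call` is the final genotype result table; it connects `callset` and `variant`.
7. `additional_info` and `external_references` are general-purpose cross-module extension tables; they are not expanded in the main diagram so as not to obscure the backbone relationships.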
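Notes 1 and 5 together imply that a sample's study can be found either through its own redundant link or through its observation unit. The sketch below illustrates that lookup. The classes, field names, and `resolve_study_id` are hypothetical and do not mirror the real entity classes; preferring the direct link over the observation unit is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservationUnit:
    id: str
    study_id: str  # every observation unit belongs to a study


@dataclass
class Sample:
    id: str
    observation_unit: Optional[ObservationUnit] = None
    study_id: Optional[str] = None  # redundant direct link (note 5)


def resolve_study_id(sample: Sample) -> Optional[str]:
    """Trace a sample back to its study: direct link first, then via the
    observation unit; None means the sample cannot be traced."""
    if sample.study_id is not None:
        return sample.study_id
    if sample.observation_unit is not None:
        return sample.observation_unit.study_id
    return None
```

A sample for which `resolve_study_id` returns `None` would be one that breaks the traceability expectation in note 1, so a check like this could serve as a data-quality guard before loading `callset` and `allele_call`.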