fix: earlier sample/plate development
docs/architecture/00-overall-data-architecture.md
# BrAPI Test Server Overall Data Architecture

This document ties the four modules together in a single overview diagram:

```text
Core -> Germplasm/Seed -> Phenotyping -> Genotyping
```

The corresponding module documents:

| Module | Document | Core role |
| --- | --- | --- |
| Core | `core-data-flow.md` | Base context such as crop, program, trial, study, location, and person |
| Germplasm/Seed | `04-germplasm-seed-data-flow.md` | germplasm, breeding_method, seed_lot, cross, pedigree, attribute |
| Phenotyping | `02-phenotyping-data-flow.md` | observation_unit, observation_variable, event, image, observation |
| Genotyping | `03-genotyping-data-flow.md` | sample, plate, reference, variantset, variant, callset, allele_call |

## Overall conclusion

The backbone of the whole data model is:

```text
Core: crop -> program -> trial -> study
Germplasm: breeding_method -> germplasm -> cross / seed_lot / pedigree / attribute
Phenotyping: study + germplasm/seed_lot/cross -> observation_unit -> observation
Genotyping: observation_unit/study -> sample -> callset -> allele_call
Genotyping: reference_set -> variantset -> variant -> allele_call
```

`study` is the main bridge from Core to Phenotyping/Genotyping; `germplasm` is the main bridge from Germplasm/Seed to Phenotyping/Genotyping; `observation_unit` is the main bridge from Phenotyping to Genotyping.

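
To make the backbone easier to reason about, the sketch below transcribes the chains listed above into an edge list and walks it to find everything downstream of a given entity. It is only an illustration: the names `BACKBONE_EDGES` and `downstream` are hypothetical, and `cross`, `pedigree`, and `attribute` are the shorthand names from the listing, not actual table names.

```python
from collections import deque

# Edges transcribed from the backbone listing above (shorthand names, not tables).
BACKBONE_EDGES = [
    ("crop", "program"), ("program", "trial"), ("trial", "study"),
    ("breeding_method", "germplasm"),
    ("germplasm", "cross"), ("germplasm", "seed_lot"),
    ("germplasm", "pedigree"), ("germplasm", "attribute"),
    ("study", "observation_unit"), ("germplasm", "observation_unit"),
    ("seed_lot", "observation_unit"), ("cross", "observation_unit"),
    ("observation_unit", "observation"),
    ("observation_unit", "sample"), ("study", "sample"),
    ("sample", "callset"), ("callset", "allele_call"),
    ("reference_set", "variantset"), ("variantset", "variant"),
    ("variant", "allele_call"),
]


def downstream(start, edges=BACKBONE_EDGES):
    """Return every entity reachable from `start` by following the edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

For example, `downstream("study")` contains both `observation` and `allele_call`, which matches the description of `study` as the bridge into Phenotyping and Genotyping.
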
## Overall architecture diagram

```mermaid
flowchart TD
    subgraph CORE["Core: base context"]
        CROP["crop<br/>Crop"]
        PERSON["person<br/>Person"]
        PROGRAM["program<br/>Program"]
        LOCATION["location<br/>Location"]
        TRIAL["trial<br/>Trial batch"]
        SEASON["season<br/>Season"]
        STUDY["study<br/>Study / trial execution unit"]
        LIST["list / list_item<br/>Generic list"]

        CROP --> PROGRAM
        PERSON --> PROGRAM
        PROGRAM --> TRIAL
        CROP --> TRIAL
        PROGRAM --> LOCATION
        CROP --> LOCATION
        TRIAL --> STUDY
        PROGRAM --> STUDY
        CROP --> STUDY
        LOCATION --> STUDY
        SEASON --> STUDY
        PERSON --> LIST
    end

    subgraph GERM["Germplasm / Seed"]
        BM["breeding_method<br/>Breeding method"]
        GERMPLASM["germplasm<br/>Germplasm"]
        GAD["germplasm_attribute_definition<br/>Attribute definition"]
        GAV["germplasm_attribute_value<br/>Attribute value"]
        CP["crossing_project<br/>Crossing project"]
        CROSS["cross_entity<br/>Cross / PlannedCross"]
        XP["cross_parent<br/>Cross parent"]
        PEDNODE["pedigree_node<br/>Pedigree node"]
        PEDEDGE["pedigree_edge<br/>Pedigree edge"]
        SEEDLOT["seed_lot<br/>Seed lot"]
        MIX["seed_lot_content_mixture<br/>Lot composition"]
        TX["seed_lot_transaction<br/>Lot transaction"]

        BM --> GERMPLASM
        GAD --> GAV
        GERMPLASM --> GAV
        CP --> CROSS
        CROSS --> XP
        GERMPLASM --> XP
        CROSS --> CROSS_PLANNED["cross_entity<br/>planned cross self-reference"]
        GERMPLASM --> PEDNODE
        CP --> PEDNODE
        PEDNODE --> PEDEDGE
        PEDEDGE --> PEDNODE2["pedigree_node<br/>Parent / progeny node"]
        GERMPLASM --> MIX
        CROSS --> MIX
        MIX --> SEEDLOT
        SEEDLOT --> TX
        TX --> SEEDLOT
    end

    subgraph PHENO["Phenotyping"]
        ONTOLOGY["ontology<br/>Ontology"]
        TRAIT["trait<br/>Trait"]
        METHOD["method<br/>Method"]
        SCALE["scale<br/>Scale"]
        OV["observation_variable<br/>Observation variable"]
        OU["observation_unit<br/>Observation unit"]
        EVENT["event<br/>Event"]
        IMAGE["image<br/>Image"]
        OBS["observation<br/>Observation value"]

        ONTOLOGY --> TRAIT
        ONTOLOGY --> METHOD
        ONTOLOGY --> SCALE
        TRAIT --> OV
        METHOD --> OV
        SCALE --> OV
        OU --> OBS
        OV --> OBS
        EVENT --> OU
        OU --> IMAGE
        IMAGE --> OBS
    end

    subgraph GENO["Genotyping"]
        PLATE["plate<br/>Sample plate"]
        SAMPLE["sample<br/>Sample"]
        REFSET["reference_set<br/>Reference set"]
        REF["reference<br/>Reference sequence"]
        REFB["reference_bases<br/>Reference segment"]
        VARSET["variantset<br/>Variant set"]
        VARIANT["variant<br/>Variant site"]
        CALLSET["callset<br/>Per-sample call set"]
        CALL["allele_call<br/>Genotype result"]
        GMAP["genome_map<br/>Genetic map"]
        LG["linkageGroup<br/>Linkage group"]
        MP["marker_position<br/>Map position"]

        PLATE --> SAMPLE
        SAMPLE --> CALLSET
        CALLSET --> CALL
        REFSET --> REF
        REF --> REFB
        REFSET --> VARSET
        VARSET --> VARIANT
        REFSET --> VARIANT
        VARIANT --> CALL
        GMAP --> LG
        LG --> MP
        VARIANT --> MP
    end

    CROP --> GERMPLASM
    CROP --> GAD
    TRAIT --> GAD
    METHOD --> GAD
    SCALE --> GAD
    ONTOLOGY --> GAD
    PROGRAM --> CP
    PROGRAM --> SEEDLOT
    LOCATION --> SEEDLOT

    STUDY --> OU
    TRIAL --> OU
    PROGRAM --> OU
    CROP --> OU
    GERMPLASM --> OU
    SEEDLOT --> OU
    CROSS --> OU

    STUDY --> EVENT
    STUDY --> OBS
    TRIAL --> OBS
    PROGRAM --> OBS
    CROP --> OBS

    STUDY --> PLATE
    TRIAL --> PLATE
    PROGRAM --> PLATE
    STUDY --> SAMPLE
    TRIAL --> SAMPLE
    PROGRAM --> SAMPLE
    OU --> SAMPLE

    GERMPLASM --> REFSET
    STUDY --> VARSET
    CROP --> GMAP
```

## Key cross-module bridges

| Bridge | Modules connected | Description |
| --- | --- | --- |
| `crop` | Core -> Germplasm/Pheno/Geno | The crop dimension runs through program, trial, study, germplasm, variables, and genome maps |
| `program` | Core -> Germplasm/Seed/Pheno/Geno | The program dimension links crossing_project, seed_lot, observation_unit, sample, and plate |
| `trial` | Core -> Pheno/Geno | The trial dimension links study, observation_unit, observation, sample, and plate |
| `study` | Core -> Pheno/Geno | The most important experimental context; links observation_unit, event, observation, sample, plate, and variantset |
| `germplasm` | Germplasm -> Pheno/Geno | Germplasm can link to observation_unit, cross_parent, seed_lot_content_mixture, and reference_set |
| `seed_lot` | Germplasm/Seed -> Pheno | A SeedLot can serve as the material source of an observation_unit |
| `cross_entity` | Germplasm/Seed -> Pheno | A Cross/PlannedCross can be the source of an observation_unit or a seed_lot_content_mixture |
| `observation_unit` | Pheno -> Geno | A phenotyping observation unit can generate, or be linked to, a genotyping sample |
| `sample` | Entry point inside Geno | Leads from observation_unit/study/trial/program into callset and allele_call |
| `variant` | Locus inside Geno | Links to allele_call and marker_position, and locates the genotype results |

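
As a rough cross-check of the bridge table, the following sketch takes a subset of the cross-module arrows from the architecture diagram, tags each entity with its module, and groups the target modules by source entity. The names `MODULE_OF`, `CROSS_EDGES`, and `bridges` are hypothetical helpers introduced for illustration only.

```python
# Module membership for the entities used below (taken from the diagram's subgraphs).
MODULE_OF = {
    "crop": "Core", "program": "Core", "trial": "Core", "study": "Core",
    "germplasm": "Germplasm/Seed", "seed_lot": "Germplasm/Seed",
    "cross_entity": "Germplasm/Seed", "crossing_project": "Germplasm/Seed",
    "observation_unit": "Phenotyping", "event": "Phenotyping",
    "observation": "Phenotyping",
    "sample": "Genotyping", "plate": "Genotyping",
    "reference_set": "Genotyping", "variantset": "Genotyping",
    "genome_map": "Genotyping",
}

# A subset of the cross-module arrows drawn in the overall architecture diagram.
CROSS_EDGES = [
    ("crop", "germplasm"), ("crop", "observation_unit"), ("crop", "genome_map"),
    ("program", "crossing_project"), ("program", "seed_lot"),
    ("program", "observation_unit"), ("program", "sample"), ("program", "plate"),
    ("trial", "observation_unit"), ("trial", "observation"),
    ("trial", "sample"), ("trial", "plate"),
    ("study", "observation_unit"), ("study", "event"), ("study", "observation"),
    ("study", "sample"), ("study", "plate"), ("study", "variantset"),
    ("germplasm", "observation_unit"), ("germplasm", "reference_set"),
    ("seed_lot", "observation_unit"),
    ("cross_entity", "observation_unit"),
    ("observation_unit", "sample"),
]


def bridges(edges, module_of):
    """Map each source entity to the set of other modules it links into."""
    result = {}
    for src, dst in edges:
        if module_of[src] != module_of[dst]:
            result.setdefault(src, set()).add(module_of[dst])
    return result
```

With these inputs, `bridges(CROSS_EDGES, MODULE_OF)["observation_unit"]` contains only `"Genotyping"`, in line with the `observation_unit` row of the table.
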
## Recommended overall data-entry order

1. Load the Core base context: `crop`, `person`, `program`, `location`, `trial`, `season`, `study`.
2. Load the Germplasm upstream data: `breeding_method`, plus the `trait/method/scale/ontology` records that `germplasm_attribute_definition` depends on.
3. Load `germplasm`, then add extended information such as `germplasm_attribute_value`, donor, origin, institute, synonym, and taxon.
4. If crosses are involved, load `crossing_project`, `cross_entity`, and `cross_parent`; planned crosses are expressed through `cross_entity.planned` and the `planned_cross_id` self-reference.
5. Load the Seed data: `seed_lot`, `seed_lot_content_mixture`, `seed_lot_transaction`.
6. Load the Phenotyping definitions: `ontology`, `trait`, `method`, `scale`, `observation_variable`.
7. Load the Phenotyping entities and facts: `observation_unit`, `event`, `image`, `observation`.
8. Load the Genotyping sample entry points: `plate`, `sample`.
9. Load the Genotyping references and variants: `reference_set`, `reference`, `reference_bases`, `variantset`, `variant`.
10. Load the Genotyping results: `callset`, `callset_variant_sets`, `allele_call`.
11. If genetic-map positioning is needed, load `genome_map`, `linkageGroup`, and `marker_position`.

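
One way to sanity-check an entry order like the one above is to record the step in which each entity is loaded and confirm that no arrow from the diagram points from a later step to an earlier one. The sketch below does this for a subset of entities and arrows; `LOAD_STEP`, `ARROWS`, and `order_violations` are hypothetical names, and entities that appear in two steps are assumed to keep their first occurrence.

```python
# Step (1-11) in which each entity is loaded; first occurrence wins.
LOAD_STEP = {
    "crop": 1, "person": 1, "program": 1, "location": 1,
    "trial": 1, "season": 1, "study": 1,
    "breeding_method": 2, "ontology": 2, "trait": 2, "method": 2, "scale": 2,
    "germplasm": 3, "germplasm_attribute_value": 3,
    "crossing_project": 4, "cross_entity": 4, "cross_parent": 4,
    "seed_lot": 5, "seed_lot_content_mixture": 5, "seed_lot_transaction": 5,
    "observation_variable": 6,
    "observation_unit": 7, "event": 7, "image": 7, "observation": 7,
    "plate": 8, "sample": 8,
    "reference_set": 9, "reference": 9, "reference_bases": 9,
    "variantset": 9, "variant": 9,
    "callset": 10, "allele_call": 10,
    "genome_map": 11, "linkageGroup": 11, "marker_position": 11,
}

# (parent, child) arrows copied from the architecture diagram (a subset).
ARROWS = [
    ("crop", "program"), ("trial", "study"),
    ("breeding_method", "germplasm"), ("germplasm", "cross_parent"),
    ("crossing_project", "cross_entity"),
    ("germplasm", "seed_lot_content_mixture"),
    ("cross_entity", "seed_lot_content_mixture"),
    ("trait", "observation_variable"),
    ("study", "observation_unit"), ("seed_lot", "observation_unit"),
    ("observation_variable", "observation"),
    ("observation_unit", "sample"), ("plate", "sample"),
    ("sample", "callset"), ("germplasm", "reference_set"),
    ("variant", "allele_call"), ("callset", "allele_call"),
    ("variant", "marker_position"),
]


def order_violations(load_step, arrows):
    """Return arrows whose parent is loaded in a later step than its child."""
    return [(p, c) for p, c in arrows if load_step[p] > load_step[c]]
```

If `sample` were moved ahead of `observation_unit`, the arrow `("observation_unit", "sample")` would show up in the returned list, which is the kind of ordering mistake this check is meant to surface.
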
## Module boundaries at a glance

| Module | Root node | Main fact tables | Output to other modules |
| --- | --- | --- | --- |
| Core | `crop/program/trial/study` | `study` | Provides context to all business modules |
| Germplasm/Seed | `germplasm` | `germplasm_attribute_value`, `seed_lot_content_mixture`, `seed_lot_transaction`, `cross_parent`, `pedigree_edge` | Provides material sources to Pheno and the reference source to Geno |
| Phenotyping | `observation_unit` | `observation` | Provides Geno with the observed object that a sample comes from |
| Genotyping | `sample`, `variant` | `allele_call` | Outputs each sample's genotype result at each variant site |

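
To illustrate the Genotyping row, where `allele_call` is the fact table between samples (via `callset`) and `variant`, here is a minimal sketch that pivots call rows into a per-callset lookup. The row layout `(callset_id, variant_id, genotype)`, the sample values, and the function name `genotype_matrix` are assumptions made for this example, not the actual schema.

```python
from collections import defaultdict

# Hypothetical allele_call rows: (callset_id, variant_id, genotype).
ALLELE_CALLS = [
    ("callset-1", "variant-A", "0/0"),
    ("callset-1", "variant-B", "0/1"),
    ("callset-2", "variant-A", "1/1"),
]


def genotype_matrix(calls):
    """Pivot allele_call rows into {callset_id: {variant_id: genotype}}."""
    matrix = defaultdict(dict)
    for callset_id, variant_id, genotype in calls:
        matrix[callset_id][variant_id] = genotype
    return dict(matrix)
```

Each outer key corresponds to one callset and each inner key to one variant, so a missing inner key simply means no call was recorded for that pair.
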
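## Key points to note

1. `study` is the context entry point for most experimental data; any data that enters Pheno or Geno should normally be traceable back to a `study`.
2. `germplasm` describes germplasm master data and `seed_lot` describes inventory lots; the two are related indirectly through `seed_lot_content_mixture`.
3. `plannedcross` has no database table of its own; it is stored in `cross_entity` and expressed through `planned` and `planned_cross_id`.
4. `observation_unit` can link to `germplasm`, `seed_lot`, and `cross_entity`, and is the point where material enters phenotypic observation.
5. `sample` can originate from an `observation_unit` and also carries redundant links to `study/trial/program`; it is the entry point of the genotyping flow.
6. `allele_call` is the final genotype result table and connects `callset` with `variant`.
7. `additional_info` and `external_references` are generic extension tables shared across modules; they are not expanded in the main diagram so that they do not obscure the backbone relationships.
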
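
The planned-cross note can be sketched with a small in-memory model. Only the `planned` and `planned_cross_id` field names come from this document; the `id` field, the `CrossEntity` class, the helper `planned_cross_for`, and the assumption that an actual cross points at its planned cross (rather than the other way round) are illustrative guesses.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CrossEntity:
    id: str                                  # hypothetical primary key name
    planned: bool                            # True for a PlannedCross row
    planned_cross_id: Optional[str] = None   # self-reference into the same table


def planned_cross_for(
    cross: CrossEntity, table: Dict[str, CrossEntity]
) -> Optional[CrossEntity]:
    """Return the planned cross that an actual cross refers to, if any."""
    if cross.planned or cross.planned_cross_id is None:
        return None
    target = table.get(cross.planned_cross_id)
    # Only rows flagged as planned are accepted as targets of the self-reference.
    return target if target is not None and target.planned else None
```

Because both kinds of row live in one table, a lookup like this only needs a single keyed collection, at the cost of having to check the `planned` flag on the target.
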