brapi-java/docs/architecture/00-overall-data-architecture.md
2026-05-28 11:56:17 +08:00


BrAPI Test Server Overall Data Architecture Diagram

This document ties the four modules together into a single overview diagram:

Core -> Germplasm/Seed -> Phenotyping -> Genotyping

The corresponding module documents:

| Module | Document | Core role |
| --- | --- | --- |
| Core | core-data-flow.md | Foundational context: crop, program, trial, study, location, person, and related entities |
| Germplasm/Seed | 04-germplasm-seed-data-flow.md | germplasm, breeding_method, seed_lot, cross, pedigree, attribute |
| Phenotyping | 02-phenotyping-data-flow.md | observation_unit, observation_variable, event, image, observation |
| Genotyping | 03-genotyping-data-flow.md | sample, plate, reference, variantset, variant, callset, allele_call |

Overall Conclusions

The backbone of the whole data model is:

Core: crop -> program -> trial -> study
Germplasm: breeding_method -> germplasm -> cross / seed_lot / pedigree / attribute
Phenotyping: study + germplasm/seed_lot/cross -> observation_unit -> observation
Genotyping: observation_unit/study -> sample -> callset -> allele_call
Genotyping: reference_set -> variantset -> variant -> allele_call

study is the main bridge from Core to Phenotyping/Genotyping; germplasm is the main bridge from Germplasm/Seed to Phenotyping/Genotyping; observation_unit is the main bridge from Phenotyping to Genotyping.
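
The backbone can be read as a dependency graph. The following is a minimal Python sketch, not code from the repository: `BACKBONE_PARENTS` and `ancestors` are hypothetical names, and the edges are a simplified copy of the five backbone lines above (seed_lot, cross, pedigree, and attribute are left out for brevity). It illustrates that a genotype result traces back through all three bridges.

```python
# Simplified backbone: child entity -> entities it hangs off.
# Assumption: this is an illustrative subset of the backbone lines above,
# not the real schema of the BrAPI Test Server.
BACKBONE_PARENTS = {
    "program": ["crop"],
    "trial": ["program"],
    "study": ["trial"],
    "germplasm": ["breeding_method"],
    "observation_unit": ["study", "germplasm"],
    "observation": ["observation_unit"],
    "sample": ["observation_unit", "study"],
    "callset": ["sample"],
    "variantset": ["reference_set"],
    "variant": ["variantset"],
    "allele_call": ["callset", "variant"],
}


def ancestors(entity):
    """Return every entity that `entity` transitively depends on."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for parent in BACKBONE_PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# All three bridges sit upstream of a genotype result.
bridges = {"study", "germplasm", "observation_unit"}
all_bridges_upstream = bridges <= ancestors("allele_call")
```

Under these assumed edges, `all_bridges_upstream` is true: allele_call reaches observation_unit through callset and sample, and observation_unit in turn reaches both study and germplasm.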

Overall Architecture Diagram

flowchart TD
    subgraph CORE["Core foundational context"]
        CROP["crop<br/>Crop"]
        PERSON["person<br/>Person"]
        PROGRAM["program<br/>Program"]
        LOCATION["location<br/>Location"]
        TRIAL["trial<br/>Trial batch"]
        SEASON["season<br/>Season"]
        STUDY["study<br/>Study / trial execution unit"]
        LIST["list / list_item<br/>Generic list"]

        CROP --> PROGRAM
        PERSON --> PROGRAM
        PROGRAM --> TRIAL
        CROP --> TRIAL
        PROGRAM --> LOCATION
        CROP --> LOCATION
        TRIAL --> STUDY
        PROGRAM --> STUDY
        CROP --> STUDY
        LOCATION --> STUDY
        SEASON --> STUDY
        PERSON --> LIST
    end

    subgraph GERM["Germplasm / Seed"]
        BM["breeding_method<br/>Breeding method"]
        GERMPLASM["germplasm<br/>Germplasm"]
        GAD["germplasm_attribute_definition<br/>Attribute definition"]
        GAV["germplasm_attribute_value<br/>Attribute value"]
        CP["crossing_project<br/>Crossing project"]
        CROSS["cross_entity<br/>Cross / PlannedCross"]
        XP["cross_parent<br/>Cross parent"]
        PEDNODE["pedigree_node<br/>Pedigree node"]
        PEDEDGE["pedigree_edge<br/>Pedigree edge"]
        SEEDLOT["seed_lot<br/>Seed lot"]
        MIX["seed_lot_content_mixture<br/>Lot composition"]
        TX["seed_lot_transaction<br/>Lot transfer"]

        BM --> GERMPLASM
        GAD --> GAV
        GERMPLASM --> GAV
        CP --> CROSS
        CROSS --> XP
        GERMPLASM --> XP
        CROSS --> CROSS_PLANNED["cross_entity<br/>planned cross self-reference"]
        GERMPLASM --> PEDNODE
        CP --> PEDNODE
        PEDNODE --> PEDEDGE
        PEDEDGE --> PEDNODE2["pedigree_node<br/>Parent / progeny node"]
        GERMPLASM --> MIX
        CROSS --> MIX
        MIX --> SEEDLOT
        SEEDLOT --> TX
        TX --> SEEDLOT
    end

    subgraph PHENO["Phenotyping"]
        ONTOLOGY["ontology<br/>Ontology"]
        TRAIT["trait<br/>Trait"]
        METHOD["method<br/>Method"]
        SCALE["scale<br/>Scale"]
        OV["observation_variable<br/>Observation variable"]
        OU["observation_unit<br/>Observation unit"]
        EVENT["event<br/>Event"]
        IMAGE["image<br/>Image"]
        OBS["observation<br/>Observation value"]

        ONTOLOGY --> TRAIT
        ONTOLOGY --> METHOD
        ONTOLOGY --> SCALE
        TRAIT --> OV
        METHOD --> OV
        SCALE --> OV
        OU --> OBS
        OV --> OBS
        EVENT --> OU
        OU --> IMAGE
        IMAGE --> OBS
    end

    subgraph GENO["Genotyping"]
        PLATE["plate<br/>Sample plate"]
        SAMPLE["sample<br/>Sample"]
        REFSET["reference_set<br/>Reference set"]
        REF["reference<br/>Reference sequence"]
        REFB["reference_bases<br/>Reference segment"]
        VARSET["variantset<br/>Variant set"]
        VARIANT["variant<br/>Variant site"]
        CALLSET["callset<br/>Per-sample call set"]
        CALL["allele_call<br/>Genotype result"]
        GMAP["genome_map<br/>Genetic map"]
        LG["linkageGroup<br/>Linkage group"]
        MP["marker_position<br/>Map position"]

        PLATE --> SAMPLE
        SAMPLE --> CALLSET
        CALLSET --> CALL
        REFSET --> REF
        REF --> REFB
        REFSET --> VARSET
        VARSET --> VARIANT
        REFSET --> VARIANT
        VARIANT --> CALL
        GMAP --> LG
        LG --> MP
        VARIANT --> MP
    end

    CROP --> GERMPLASM
    CROP --> GAD
    TRAIT --> GAD
    METHOD --> GAD
    SCALE --> GAD
    ONTOLOGY --> GAD
    PROGRAM --> CP
    PROGRAM --> SEEDLOT
    LOCATION --> SEEDLOT

    STUDY --> OU
    TRIAL --> OU
    PROGRAM --> OU
    CROP --> OU
    GERMPLASM --> OU
    SEEDLOT --> OU
    CROSS --> OU

    STUDY --> EVENT
    STUDY --> OBS
    TRIAL --> OBS
    PROGRAM --> OBS
    CROP --> OBS

    STUDY --> PLATE
    TRIAL --> PLATE
    PROGRAM --> PLATE
    STUDY --> SAMPLE
    TRIAL --> SAMPLE
    PROGRAM --> SAMPLE
    OU --> SAMPLE

    GERMPLASM --> REFSET
    STUDY --> VARSET
    CROP --> GMAP

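Key Cross-Module Bridges

| Bridge | Modules connected | Description |
| --- | --- | --- |
| crop | Core -> Germplasm/Pheno/Geno | The crop dimension runs through program, trial, study, germplasm, variables, and maps |
| program | Core -> Germplasm/Seed/Pheno/Geno | The program dimension connects crossing_project, seed_lot, observation_unit, sample, plate |
| trial | Core -> Pheno/Geno | The trial dimension connects study, observation_unit, observation, sample, plate |
| study | Core -> Pheno/Geno | The most important experimental context; connects observation_unit, event, observation, sample, plate, variantset |
| germplasm | Germplasm -> Pheno/Geno | Germplasm can connect to observation_unit, cross_parent, seed_lot_content_mixture, reference_set |
| seed_lot | Germplasm/Seed -> Pheno | A SeedLot can serve as the material source of an observation_unit |
| cross_entity | Germplasm/Seed -> Pheno | A Cross/PlannedCross can be the source of an observation_unit or a seed_lot_content_mixture |
| observation_unit | Pheno -> Geno | A phenotyping observation unit can produce or be linked to a genotyping sample |
| sample | Geno internal entry point | Leads from observation_unit/study/trial/program into callset and allele_call |
| variant | Geno internal locus | Connects to allele_call and marker_position; carries the position of genotype results |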
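
The cross-module rows of this table can be derived mechanically from the edges drawn outside the four subgraphs in the diagram. The following Python sketch is illustrative only: `MODULE_OF`, `CROSS_MODULE_EDGES`, and `bridge_targets` are hypothetical names, and the edge list is transcribed by hand from the diagram rather than read from the database schema.

```python
from collections import defaultdict

# Which module each entity belongs to (per the diagram's subgraphs).
MODULE_OF = {
    **dict.fromkeys(["crop", "program", "trial", "study", "location"], "Core"),
    **dict.fromkeys(
        ["germplasm", "germplasm_attribute_definition", "crossing_project",
         "seed_lot", "cross_entity"], "Germplasm/Seed"),
    **dict.fromkeys(
        ["ontology", "trait", "method", "scale", "observation_unit",
         "event", "observation"], "Pheno"),
    **dict.fromkeys(
        ["plate", "sample", "reference_set", "variantset", "genome_map"], "Geno"),
}

# Edges drawn between subgraphs in the diagram, as (source, target) pairs.
CROSS_MODULE_EDGES = [
    ("crop", "germplasm"), ("crop", "germplasm_attribute_definition"),
    ("trait", "germplasm_attribute_definition"),
    ("method", "germplasm_attribute_definition"),
    ("scale", "germplasm_attribute_definition"),
    ("ontology", "germplasm_attribute_definition"),
    ("program", "crossing_project"), ("program", "seed_lot"),
    ("location", "seed_lot"),
    ("study", "observation_unit"), ("trial", "observation_unit"),
    ("program", "observation_unit"), ("crop", "observation_unit"),
    ("germplasm", "observation_unit"), ("seed_lot", "observation_unit"),
    ("cross_entity", "observation_unit"),
    ("study", "event"), ("study", "observation"), ("trial", "observation"),
    ("program", "observation"), ("crop", "observation"),
    ("study", "plate"), ("trial", "plate"), ("program", "plate"),
    ("study", "sample"), ("trial", "sample"), ("program", "sample"),
    ("observation_unit", "sample"),
    ("germplasm", "reference_set"), ("study", "variantset"),
    ("crop", "genome_map"),
]


def bridge_targets(edges, module_of):
    """Map each source entity to the set of other modules it points into."""
    targets = defaultdict(set)
    for source, target in edges:
        if module_of[source] != module_of[target]:
            targets[source].add(module_of[target])
    return dict(targets)


bridges = bridge_targets(CROSS_MODULE_EDGES, MODULE_OF)
```

With this edge list, `bridges["study"]` is `{"Pheno", "Geno"}` and `bridges["observation_unit"]` is `{"Geno"}`, matching the table. The diagram's edges from trait, method, scale, and ontology into germplasm_attribute_definition, and from location into seed_lot, also appear in the result as Pheno -> Germplasm/Seed and Core -> Germplasm/Seed links.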

Recommended Overall Data Entry Order

  1. Enter the Core foundational context: crop, person, program, location, trial, season, study.
  2. Enter the Germplasm upstream data: breeding_method, plus the trait/method/scale/ontology that germplasm_attribute_definition depends on.
  3. Enter germplasm, then add extension information such as germplasm_attribute_value, donor, origin, institute, synonym, and taxon.
  4. If crossing is involved, enter crossing_project, cross_entity, cross_parent; planned crosses are expressed through cross_entity.planned and the planned_cross_id self-reference.
  5. Enter Seed data: seed_lot, seed_lot_content_mixture, seed_lot_transaction.
  6. Enter Phenotyping definitions: ontology, trait, method, scale, observation_variable.
  7. Enter Phenotyping entities and facts: observation_unit, event, image, observation.
  8. Enter the Genotyping sample entry points: plate, sample.
  9. Enter Genotyping references and variants: reference_set, reference, reference_bases, variantset, variant.
  10. Enter Genotyping results: callset, callset_variant_sets, allele_call.
  11. If genetic map positioning is needed, enter genome_map, linkageGroup, marker_position.
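
The entry order above is essentially a topological order of the dependency graph. As a hedged sketch (not code from the repository), the snippet below checks a flattened version of the steps against a subset of the diagram's edges, using the standard-library `graphlib.TopologicalSorter` (Python 3.9+). `DEPENDS_ON`, `RECOMMENDED_ORDER`, and `order_violations` are hypothetical names; entities whose arrow direction in the diagram is ambiguous (event, image, seed_lot_content_mixture, seed_lot_transaction) are left out on purpose.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# entity -> entities that must already exist (illustrative subset of the diagram).
DEPENDS_ON = {
    "program": {"crop", "person"},
    "location": {"program", "crop"},
    "trial": {"program", "crop"},
    "study": {"trial", "program", "crop", "location", "season"},
    "germplasm": {"breeding_method", "crop"},
    "crossing_project": {"program"},
    "cross_entity": {"crossing_project"},
    "cross_parent": {"cross_entity", "germplasm"},
    "seed_lot": {"program", "location"},
    "trait": {"ontology"},
    "method": {"ontology"},
    "scale": {"ontology"},
    "observation_variable": {"trait", "method", "scale"},
    "observation_unit": {"study", "germplasm", "seed_lot", "cross_entity"},
    "observation": {"observation_unit", "observation_variable", "study"},
    "sample": {"plate", "observation_unit", "study"},
    "reference": {"reference_set"},
    "variantset": {"reference_set", "study"},
    "variant": {"variantset"},
    "callset": {"sample"},
    "allele_call": {"callset", "variant"},
}

# The numbered steps above, flattened into one list.
RECOMMENDED_ORDER = [
    "crop", "person", "program", "location", "trial", "season", "study",
    "breeding_method", "germplasm",
    "crossing_project", "cross_entity", "cross_parent",
    "seed_lot",
    "ontology", "trait", "method", "scale", "observation_variable",
    "observation_unit", "observation",
    "plate", "sample",
    "reference_set", "reference", "variantset", "variant",
    "callset", "allele_call",
]


def order_violations(order, depends_on):
    """List (entity, dependency) pairs where the dependency is missing or comes later."""
    position = {name: index for index, name in enumerate(order)}
    problems = []
    for entity, deps in depends_on.items():
        for dep in sorted(deps):
            if (entity not in position or dep not in position
                    or position[dep] > position[entity]):
                problems.append((entity, dep))
    return problems


# Any order produced by the sorter also satisfies the same dependencies.
derived_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Under these assumed dependencies, `order_violations(RECOMMENDED_ORDER, DEPENDS_ON)` returns an empty list. Moving study to the front of the list would report its missing parents such as trial and location.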

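Module Boundary Cheat Sheet

| Module | Root node | Main fact tables | Output to other modules |
| --- | --- | --- | --- |
| Core | crop/program/trial/study | study | Provides context to all business modules |
| Germplasm/Seed | germplasm | germplasm_attribute_value, seed_lot_content_mixture, seed_lot_transaction, cross_parent, pedigree_edge | Provides material sources to Pheno and the reference source to Geno |
| Phenotyping | observation_unit | observation | Provides Geno with the observed objects that samples come from |
| Genotyping | sample, variant | allele_call | Outputs each sample's genotype result at each locus |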

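Key Points to Note

  1. study is the context entry point for most experimental data; data that enters Pheno or Geno should normally be traceable back to a study.
  2. germplasm describes germplasm master data, while seed_lot describes inventory lots; the two are linked indirectly through seed_lot_content_mixture.
  3. A planned cross has no dedicated database table; it is stored in cross_entity and expressed through planned and planned_cross_id.
  4. observation_unit can reference germplasm, seed_lot, and cross_entity; it is the entry point for material into phenotypic observation.
  5. sample can originate from an observation_unit and also carries redundant references to study/trial/program; it is the entry point of the genotyping workflow.
  6. allele_call is the final genotype result table and connects callset and variant.
  7. additional_info and external_references are generic extension tables shared across modules; they are not expanded in the main diagram so that they do not obscure the backbone relationships.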
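
Points 2 and 3 can be sketched with two small record types. This is a hypothetical Python illustration, not the server's entity classes: only `planned` and `planned_cross_id` come from this document, while `id`, `seed_lot_id`, `germplasm_id`, `cross_id`, and both helper functions are assumed names. It also assumes that an actual cross row points at the planned row it realizes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrossEntity:
    id: str
    planned: bool                           # True when the row is a planned cross
    planned_cross_id: Optional[str] = None  # self-reference into cross_entity


@dataclass
class SeedLotContentMixture:
    seed_lot_id: str                        # assumed column name
    germplasm_id: Optional[str] = None      # assumed column name
    cross_id: Optional[str] = None          # assumed column name


def planned_cross_of(cross, crosses_by_id):
    """Follow the self-reference from a cross to its planned cross, if any."""
    if cross.planned_cross_id is None:
        return None
    return crosses_by_id.get(cross.planned_cross_id)


def germplasm_in_seed_lot(seed_lot_id, mixtures):
    """Resolve seed_lot -> germplasm indirectly through the mixture rows."""
    return [
        row.germplasm_id
        for row in mixtures
        if row.seed_lot_id == seed_lot_id and row.germplasm_id is not None
    ]
```

For example, a cross created with `planned_cross_id="c1"` resolves to the `cross_entity` row whose `id` is `"c1"`, and a seed lot never names its germplasm directly: the lookup always goes through the mixture rows.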