The core objective of this internship project is the accurate prediction of crop phenotypes from genetic data. Our research logic is as follows: differences in SNP genotypes across samples are a key factor driving the phenotypic differentiation of crops. We can therefore use machine learning models to learn a mapping from genotype to phenotype and predict phenotypes directly from genetic data, providing technical support for precise screening at the seed stage.
During the project, I mainly completed the following work:
- Took the SNP state matrix as input and used an encoder to map the high-dimensional genetic heterozygosity data to a low-dimensional latent space, learning latent features of the genetic data;
- Reconstructed the genetic heterozygosity features with a decoder to verify that the extracted features retain the information in the input;
- Fed the core latent features into a dedicated phenotype decoder, with a structure tuned for standardized phenotypic data, and mapped them to the phenotypic feature dimension through a fully connected layer. This gives an end-to-end prediction pipeline from genetic input, through feature extraction, to phenotypic output. After multiple rounds of parameter tuning and iteration, the prediction model trained stably.
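The three steps above can be sketched as a single forward pass. This is only a minimal illustration in NumPy, not the project's actual implementation: the layer sizes, the 0/1/2 genotype coding, the single-layer encoder and decoders, the ReLU activation, and the weighted sum of the two mean-squared-error losses (`alpha`) are all assumptions made for the example, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 1000 SNPs, 32 latent features, 3 phenotype traits.
N_SNPS, N_LATENT, N_TRAITS = 1000, 32, 3


def init_layer(n_in, n_out):
    """Small random weights and a zero bias for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


params = {
    "enc": init_layer(N_SNPS, N_LATENT),   # encoder: SNPs -> latent space
    "dec": init_layer(N_LATENT, N_SNPS),   # decoder: latent -> reconstructed SNPs
    "phe": init_layer(N_LATENT, N_TRAITS), # phenotype head: latent -> traits
}


def forward(snps, params):
    """Return latent features, SNP reconstruction, and phenotype prediction."""
    w, b = params["enc"]
    z = relu(snps @ w + b)
    w, b = params["dec"]
    recon = z @ w + b
    w, b = params["phe"]
    pred = z @ w + b
    return z, recon, pred


def joint_loss(snps, phenotypes, params, alpha=1.0):
    """Reconstruction MSE plus alpha times phenotype MSE (assumed objective)."""
    _, recon, pred = forward(snps, params)
    recon_loss = np.mean((recon - snps) ** 2)
    pheno_loss = np.mean((pred - phenotypes) ** 2)
    return recon_loss + alpha * pheno_loss


# Toy data: 8 samples, genotypes coded 0/1/2 (assumed coding),
# phenotypes standardized per trait to zero mean and unit variance.
snps = rng.integers(0, 3, size=(8, N_SNPS)).astype(float)
phenotypes = rng.normal(size=(8, N_TRAITS))
phenotypes = (phenotypes - phenotypes.mean(axis=0)) / phenotypes.std(axis=0)

z, recon, pred = forward(snps, params)
loss = joint_loss(snps, phenotypes, params)
print(z.shape, recon.shape, pred.shape)
```

In this sketch the reconstruction branch and the phenotype branch share the same latent features, so the reconstruction error gives a check on feature extraction while the phenotype head produces the prediction.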
During the internship, I not only put my professional knowledge into practice and consolidated my technical foundation, but also came to appreciate the importance of learning as a team through communication and collaboration with senior lab members. I will carry these insights forward as I deepen my studies and continue exploring this interdisciplinary field.