About me

Welcome! I am a Ph.D. candidate in the Department of Industrial and Operations Engineering at the University of Michigan, advised by Dr. Raed Al Kontar. My research is driven by the need for novel statistical and optimization methodologies that address scientific and engineering challenges across diverse domains, including distributed data ecosystems, Digital Twins, smart manufacturing, and spatial transcriptomics. I am also interested in the theoretical underpinnings of these methods. In particular, my current research in personalized, collaborative, and decentralized data analytics explores computational techniques to integrate knowledge from multiple sources and build tailored machine-learning models.  

I am looking for academic positions!
I won the 2024 Best General Paper Competition of the INFORMS Data Mining Society!

Featured papers

A few topics of my research are introduced below.


Personalized PCA: Decoupling Shared and Unique Features. Naichen Shi, Raed Al Kontar. Journal of Machine Learning Research (JMLR), 2024. Link, Video, Code.

When data are collected from multiple related but heterogeneous sources, how can we efficiently integrate the information these sources have in common? And how can we describe and exploit the features unique to each source? We take advantage of the feature extraction power of principal component analysis (PCA). More specifically, we use global principal components (PCs) to model the common information and local PCs to capture the unique information.  
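
To make the global/local split concrete, here is a minimal NumPy sketch. It is a naive alternating heuristic written only for illustration, not the algorithm from the paper: the function names (`top_eigvecs`, `local_pcs`, `naive_shared_unique_pca`), the iteration count, and the synthetic covariances are all assumptions. Global PCs are estimated from the averaged covariance after each source's local PCs are projected out, and local PCs are estimated per source in the orthogonal complement of the global PCs.

```python
import numpy as np


def top_eigvecs(S, r):
    """Return the r leading eigenvectors of a symmetric matrix S as columns."""
    _, vecs = np.linalg.eigh(S)  # eigh sorts eigenvalues in ascending order
    return vecs[:, ::-1][:, :r]


def local_pcs(covs, U, r_local):
    """Per-source leading PCs restricted to the complement of span(U)."""
    P_perp = np.eye(U.shape[0]) - U @ U.T
    return [top_eigvecs(P_perp @ S @ P_perp, r_local) for S in covs]


def naive_shared_unique_pca(covs, r_global, r_local, n_iter=20):
    """Naive alternating estimate of shared (global) and unique (local) PCs.

    Illustrative heuristic only; NOT the method proposed in the paper.
    covs is a list of d x d covariance matrices, one per source.
    """
    eye = np.eye(covs[0].shape[0])
    # Initialize the global PCs from the pooled (averaged) covariance.
    U = top_eigvecs(sum(covs) / len(covs), r_global)
    for _ in range(n_iter):
        Vs = local_pcs(covs, U, r_local)
        # Remove each source's local directions, then re-estimate global PCs.
        deflated = [(eye - V @ V.T) @ S @ (eye - V @ V.T)
                    for S, V in zip(covs, Vs)]
        U = top_eigvecs(sum(deflated) / len(covs), r_global)
    return U, local_pcs(covs, U, r_local)


# Hypothetical population covariances for three sources in 10 dimensions:
# two shared directions (coordinates 0 and 1) plus one unique direction each.
d = 10
shared = np.zeros((d, d))
shared[0, 0] = shared[1, 1] = 4.0
covs = []
for i in range(3):
    unique = np.zeros((d, d))
    unique[2 + i, 2 + i] = 9.0  # the unique feature is stronger than the shared ones
    covs.append(shared + unique + 0.1 * np.eye(d))

U, Vs = naive_shared_unique_pca(covs, r_global=2, r_local=1)
```

In this toy setup the unique direction dominates every individual source, so PCA on one source alone would rank it first; pooling across sources is what lets the shared directions stand out.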

  Personalized PCA

Identifiability is a key hurdle as the global PCs can be confounded with local PCs in data. We propose a misalignment condition that measures the "smallest difference" among the subspaces spanned by global PCs. The condition helps us establish an upper bound on the statistical error of the global and local PCs, which almost matches their lower bound. Intriguingly, the results suggest that a higher level of heterogeneity can decrease the statistical error in our method, a benefit of personalization.  
Despite its simplicity, Personalized PCA and its derivatives have proven valuable in a variety of fields, including additive manufacturing, solar flare detection, and spatial transcriptomics analysis.

  3D printing   Spatial transcriptomics   Solar flare

An article from Phys.org reports this method.

Personalized Federated Learning via Domain Adaptation with an Application to Distributed 3D Printing. Naichen Shi, Raed Al Kontar. Technometrics, 2023. Link, Code.

Federated learning (FL) uses data from multiple clients to collectively train predictive models. Challenges arise when the marginal distributions of the input are heterogeneous among clients. In such scenarios, training predictive models with existing methods is difficult because the input distributions can differ and may not even overlap. To address these challenges, we propose a method based on domain adaptation that first maps the input into a common feature space and then predicts the outputs from the features. We also use bi-level optimization to optimize the feature encoder and decoder. In an application to material extrusion 3D printing, our method demonstrates improved predictive accuracy and robustness.

  3D printing examples


Here is a more comprehensive list of publications. You can also check my Google Scholar profile.

News

  • September 2024: Our paper, “Multi-physics Simulation Guided Generative Diffusion Models with Applications in Fluid and Heat Dynamics,” was selected as a finalist in the QSR best paper competition at INFORMS 2024!

  • September 2024: Our paper, “Triple Component Matrix Factorization: Untangling Global, Local, and Noisy Components,” was selected as the winner of the Data Mining best paper competition at INFORMS 2024!

  • July 2024: I presented our paper, “Multi-physics Simulation Guided Generative Diffusion Models with Applications in Fluid and Heat Dynamics”, at ICQSR 2024, in Como, Italy.

  • June 2024: Our paper, “Triple Component Matrix Factorization: Untangling Global, Local, and Noisy Components”, won the Wilson prize!

  • October 2023: Our paper, “Personalized Tucker Decomposition: Modeling Commonality and Peculiarity on Tensor Data”, was selected as a finalist in the INFORMS 2023 QSR best refereed paper competition!

  • October 2023: Our paper, “Heterogeneous Matrix Factorization: When features differ by datasets”, was selected as a finalist in the INFORMS 2023 best student paper competition!

  • July 2023: I was selected as the instructor for the small course IOE 202 Operations Engineering and Analytics!

  • June 2023: I presented at ICQSR 2023 on the topic of heterogeneous matrix factorization!