Daniel J. B. Clarke
Email: danieljbclarke@gmail.com
Voicemail: 201-357-7356
Website: danieljbclarke.github.io
SUMMARY
My name is Daniel J. B. Clarke. I’m an open source enthusiast and data scientist currently working in the Ma’ayan Lab at the Icahn School of Medicine at Mount Sinai. I received a BS in Electrical Engineering and an MS in Computer Engineering in May 2017 from Fairleigh Dickinson University. Since then, I’ve been building and maintaining bioinformatics web applications and conducting bioinformatics research, mostly around open source tools, web applications, and accessible data. I’ve been applying standard machine learning approaches to knowledge inference and, more recently, biomarker identification.
Recent advances in machine learning have renewed my strong interest in the field, prompting me to educate myself on methods in deep learning and to dabble in zero-shot learning for functional prediction using multi-omics data, parametric dimensionality reduction for updatable t-SNE or UMAP visualizations of genomic data, graph neural networks for predictions on knowledge graphs, and other directions. After spending time exploring the landscape of deep learning, I find myself particularly intrigued by and excited about Energy Based Models and Reinforcement Learning. I hope to pursue a PhD in the area and make meaningful contributions to the field.
EDUCATION
MS Computer Engineering, Fairleigh Dickinson University, Teaneck NJ
Spring 2017
BS Electrical Engineering, Minor in Computer Science & Mathematics, Fairleigh Dickinson University, Teaneck NJ
Magna Cum Laude, Global Scholars, Spring 2017
EMPLOYMENT
PRESENT
Data Science Analyst, Ma’ayan Laboratory of Computational Systems Biology, Icahn School of Medicine at Mount Sinai in New York
February 2018 - Present
- Conduct research and develop & maintain biomedical software applications and infrastructure as part of several NIH-funded projects including: Data Commons, BD2K-LINCS DCIC, IDG, CFDE, CPTAC
- Develop & publish bioinformatic web applications including: playbook-workflow-builder.cloud, appyters.maayanlab.cloud, rummagene.com, maayanlab.cloud/X2K/, cfde-gene-pages.cloud, fairshake.cloud, maayanlab.cloud/covid19, maayanlab.cloud/sigcom-lincs/, targetranger.maayanlab.cloud
- Develop & publish bioinformatic software packages including: appyters, maayanlab-bioinformatics, react-scatter-board, signature-commons
- Conduct research and perform data analysis on several projects including: Predicting Drug Toxicity in Pregnancy, Post Treatment Lyme Disease Biomarkers, Anti-Viral compounds for COVID-19
- Migrated public & heavily used bioinformatic software applications & databases from on-premises machines to AWS-managed Kubernetes, MariaDB, and S3
- Maintain hundreds of legacy and actively developed bioinformatics web applications spanning several locations (cloud/on-prem) and providers (AWS, Google Cloud)
- Collaborate with researchers from diverse backgrounds at institutions across the country on shared research projects
- Mentor student researchers every summer, conduct workshops, and give lectures on bioinformatics & machine learning topics
PREVIOUS
Cyberlab Research and Development, Center for Cybersecurity and Information Assurance, Fairleigh Dickinson University, Teaneck NJ
Summer 2014, Fall 2017 - December 2017
- Conducted research in planning and implementing a virtual Cyber Defense and Forensics Laboratory
- Incorporated concepts from cybersecurity, embedded systems, and IoT in developed labs
Student Tutor, Fairleigh Dickinson University, Teaneck NJ
Fall 2016 - Spring 2017
- Available as a tutor for every class I had taken
- Typically tutored higher level math/engineering courses including but not limited to:
- Calculus 2 & 3, Signals and Systems I & II, Physics I & II, Electronics II & III
BD2K-LINCS Summer Research in Biomedical Big Data Science, Ma’ayan Laboratory of Computational Systems Biology, Icahn School of Medicine at Mount Sinai in New York
Summer 2016
- Built python-based web apps for biomedical applications
- Applied big data analytics to make novel predictions with large genomics datasets
- Work publicly available: adhesome.org, github.com/MaayanLab/adhesome, github.com/MaayanLab/GenesToWordCloud
Student Worker, Grants and Sponsored Projects, Fairleigh Dickinson University, Teaneck NJ
Summer 2015 - Summer 2017
- Created a data acquisition and reformatting pipeline for cybersecurity and grants websites
- Assisted with Annual Cybersecurity Symposiums and NSA National Centers of Academic Excellence in Information Assurance/Cyber Defense designation
Intern, NIKSUN, Inc, Princeton NJ
Summer 2013
- Re-engineered existing proprietary security application interfaces for extended use cases
- Modified embedded device system firmware
- Assisted front-end developers by shaping a backend API to meet application requirements
AWARDS AND HONORS
- 1st Place Winner: IEEE Region 1 Student Paper Competition 2017
- 1st Place Winner: FDU IEEE Local Student Paper Competition 2017
- BD2K-LINCS Data Coordination and Integration Center Summer Research Training Fellowship 2016
- Radio Club of America Scholarship 2016
- Outstanding Poster Award: LSAMP Research Conference 2016
- 1st Place Winner: IEEE Region 1 Student Ethics Competition 2015
- Editor and Writer, FDU Equinox: Student Newspaper 2015 - 2017
- 1st Place Winner: FDU Cybersecurity Symposium Poster Competition 2014
- President, FDU Green Team: Campus Environmental Advocacy Club 2014 - 2016
- 15th Place Winner: NJ Governors Cyber Challenge 2013
- IEEEXtreme Competitor: Team Marshmallow 2012 - 2016
PUBLICATIONS (ORCID 0000-0003-3471-7416)
Clarke, D. J. B., Marino, G. B., Deng, E. Z., Xie, Z., Evangelista, J. E., & Ma’ayan, A. (2023). Rummagene: Mining Gene Sets from Supporting Materials of PMC Publications. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2023.10.03.560783 (Preprint)
Evangelista, J. E., Clarke, D. J. B., Xie, Z., Marino, G. B., Utti, V., Jenkins, S. L., Ahooyi, T. M., Bologa, C. G., Yang, J. J., Binder, J. L., Kumar, P., Lambert, C. G., Grethe, J. S., Wenger, E., Taylor, D., Oprea, T. I., de Bono, B., & Ma’ayan, A. (2023). Toxicology knowledge graph for structural birth defects. In Communications Medicine (Vol. 3, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1038/s43856-023-00329-2
Deng, E. Z., Fleishman, R. H., Xie, Z., Marino, G. B., Clarke, D. J. B., & Ma’ayan, A. (2023). Computational screen to identify potential targets for immunotherapeutic identification and removal of senescence cells. In Aging Cell (Vol. 22, Issue 6). Wiley. https://doi.org/10.1111/acel.13809
Evangelista, J. E., Xie, Z., Marino, G. B., Nguyen, N., Clarke, D. J. B., & Ma’ayan, A. (2023). Enrichr-KG: bridging enrichment analysis across multiple libraries. In Nucleic Acids Research (Vol. 51, Issue W1, pp. W168–W179). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkad393
Marino, G. B., Ngai, M., Clarke, D. J. B., Fleishman, R. H., Deng, E. Z., Xie, Z., Ahmed, N., & Ma’ayan, A. (2023). GeneRanger and TargetRanger: processed gene and protein expression levels across cells and tissues for target discovery. In Nucleic Acids Research (Vol. 51, Issue W1, pp. W213–W224). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkad399
Marino, G. B., Wojciechowicz, M. L., Clarke, D. J. B., Kuleshov, M. V., Xie, Z., Jeon, M., Lachmann, A., & Ma’ayan, A. (2023). lncHUB2: aggregated and inferred knowledge about human and mouse lncRNAs. In Database (Vol. 2023). Oxford University Press (OUP). https://doi.org/10.1093/database/baad009
Lachmann, A., Rizzo, K. A., Bartal, A., Jeon, M., Clarke, D. J. B., & Ma’ayan, A. (2023). PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices. In PeerJ (Vol. 11, p. e14927). PeerJ. https://doi.org/10.7717/peerj.14927
Jeon, M., Xie, Z., Evangelista, J. E., Wojciechowicz, M. L., Clarke, D. J. B., & Ma’ayan, A. (2022). Transforming L1000 profiles to RNA-seq-like profiles with deep learning. In BMC Bioinformatics (Vol. 23, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1186/s12859-022-04895-5
Kropiwnicki, E., Lachmann, A., Clarke, D. J. B., Xie, Z., Jagodnik, K. M., & Ma’ayan, A. (2022). DrugShot: querying biomedical search terms to retrieve prioritized lists of small molecules. In BMC Bioinformatics (Vol. 23, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1186/s12859-022-04590-5
Evangelista, J. E., Clarke, D. J. B., Xie, Z., Lachmann, A., Jeon, M., Chen, K., Jagodnik, K. M., Jenkins, S. L., Kuleshov, M. V., Wojciechowicz, M. L., Schürer, S. C., Medvedovic, M., & Ma’ayan, A. (2022). SigCom LINCS: data and metadata search engine for a million gene expression signatures. In Nucleic Acids Research (Vol. 50, Issue W1, pp. W697–W709). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkac328
Clarke, D. J. B., Kuleshov, M. V., Xie, Z., Evangelista, J. E., Meyers, M. R., Kropiwnicki, E., Jenkins, S. L., & Ma’ayan, A. (2022). Gene and drug landing page aggregator. In S. Forslund (Ed.), Bioinformatics Advances (Vol. 2, Issue 1). Oxford University Press (OUP). https://doi.org/10.1093/bioadv/vbac013
Charbonneau, A. L., Brady, A., Czajkowski, K., Aluvathingal, J., Canchi, S., Carter, R., Chard, K., Clarke, D. J. B., Crabtree, J., Creasy, H. H., D’Arcy, M., Felix, V., Giglio, M., Gingrich, A., Harris, R. M., Hodges, T. K., Ifeonu, O., Jeon, M., Kropiwnicki, E., … White, O. (2022). Making Common Fund data more findable: catalyzing a data ecosystem. In GigaScience (Vol. 11). Oxford University Press (OUP). https://doi.org/10.1093/gigascience/giac105
Clarke, D. J. B., Jeon, M., Stein, D. J., Moiseyev, N., Kropiwnicki, E., Dai, C., Xie, Z., Wojciechowicz, M. L., Litz, S., Hom, J., Evangelista, J. E., Goldman, L., Zhang, S., Yoon, C., Ahamed, T., Bhuiyan, S., Cheng, M., Karam, J., Jagodnik, K. M., … Ma’ayan, A. (2021). Appyters: Turning Jupyter Notebooks into data-driven web apps. Patterns, 2(3), 100213. https://doi.org/10.1016/j.patter.2021.100213
Clarke, D. J. B., Rebman, A. W., Bailey, A., Wojciechowicz, M. L., Jenkins, S. L., Evangelista, J. E., Danieletto, M., Fan, J., Eshoo, M. W., Mosel, M. R., Robinson, W., Ramadoss, N., Bobe, J., Soloski, M. J., Aucott, J. N., & Ma’ayan, A. (2021). Predicting Lyme Disease From Patients’ Peripheral Blood Mononuclear Cells Profiled With RNA-Sequencing. Frontiers in Immunology, 12. https://doi.org/10.3389/fimmu.2021.636289
Kropiwnicki, E., Evangelista, J. E., Stein, D. J., Clarke, D. J. B., Lachmann, A., Kuleshov, M. V., Jeon, M., Jagodnik, K. M., & Ma’ayan, A. (2021). Drugmonizome and Drugmonizome-ML: integration and abstraction of small molecule attributes for drug enrichment analysis and machine learning. In Database (Vol. 2021). Oxford University Press (OUP). https://doi.org/10.1093/database/baab017
Kuleshov, M. V., Stein, D. J., Clarke, D. J. B., Kropiwnicki, E., Jagodnik, K. M., Bartal, A., Evangelista, J. E., Hom, J., Cheng, M., Bailey, A., Zhou, A., Ferguson, L. B., Lachmann, A., & Ma’ayan, A. (2020). The COVID-19 Drug and Gene Set Library. Patterns, 1(6), 100090. https://doi.org/10.1016/j.patter.2020.100090
Hoagland, D. A., Clarke, D. J. B., Møller, R., Han, Y., Yang, L., Wojciechowicz, M. L., Lachmann, A., Oguntuyo, K. Y., Stevens, C., Lee, B., Chen, S., Ma’ayan, A., & tenOever, B. R. (2020). Modulating the transcriptional landscape of SARS-CoV-2 as an effective method for developing antiviral compounds. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2020.07.12.199687 (Preprint)
Rao, A. R., & Clarke, D. (2020). Perspectives on emerging directions in using IoT devices in blockchain applications. Internet of Things, 10, 100079. https://doi.org/10.1016/j.iot.2019.100079
Clarke, D. J. B., Wang, L., Jones, A., Wojciechowicz, M. L., Torre, D., Jagodnik, K. M., Jenkins, S. L., McQuilton, P., Flamholz, Z., Silverstein, M. C., Schilder, B. M., Robasky, K., Castillo, C., Idaszak, R., Ahalt, S. C., Williams, J., Schurer, S., Cooper, D. J., de Miranda Azevedo, R., … Ma’ayan, A. (2019). FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources. Cell Systems, 9(5), 417–421. https://doi.org/10.1016/j.cels.2019.09.011
Rao, A.R., Clarke, D. Exploring relationships between medical college rankings and performance with big data. Big Data Anal 4, 3 (2019). https://doi.org/10.1186/s41044-019-0040-9
Clarke, D. J. B., Kuleshov, M. V., Schilder, B. M., Torre, D., Duffy, M. E., Keenan, A. B., Lachmann, A., Feldmann, A. S., Gundersen, G. W., Silverstein, M. C., Wang, Z., & Ma’ayan, A. (2018). eXpression2Kinases (X2K) Web: linking expression signatures to upstream cell signaling networks. Nucleic Acids Research, 46(W1), W171–W179. https://doi.org/10.1093/nar/gky458
A. R. Rao, D. Clarke, M. Bhdiyadra and S. Phadke, “Development of an embedded system course to teach the Internet-of-Things,” 2018 IEEE Integrated STEM Education Conference (ISEC), Princeton, NJ, 2018, pp. 154-160. https://doi.org/10.1109/ISECon.2018.8340468
A. R. Rao, S. Garai, D. Clarke and S. Dey, “A system for exploring big data: an iterative k-means searchlight for outlier detection on open health data,” 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, 2018, pp. 1-8, https://doi.org/10.1109/IJCNN.2018.8489448
A. R. Rao and D. Clarke, “A comparison of models to predict medical procedure costs from open public healthcare data,” 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, 2018, pp. 1-8. https://doi.org/10.1109/IJCNN.2018.8489257
A. Ravishankar Rao & Daniel Clarke (2018) Hiding in Plain Sight: Insights about Health-Care Trends Gained through Open Health Data, Journal of Technology in Human Services, 36:1, 48-55, https://doi.org/10.1080/15228835.2017.1416515
Ravishankar Rao A., Clarke D. (2018) Facilitating the Exploration of Open Health-Care Data Through BOAT: A Big Data Open Source Analytics Tool. In: Tadj L., Garg A. (eds) Emerging Challenges in Business, Optimization, Technology, and Industry. Springer Proceedings in Business and Economics. Springer, Cham, https://doi.org/10.1007/978-3-319-58589-5_7
A. R. Rao and D. Clarke, “An open-source framework for the interactive exploration of Big Data: Applications in understanding health care,” 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 1641-1648, https://doi.org/10.1109/IJCNN.2017.7966048
A. R. Rao and D. Clarke, “A fully integrated open-source toolkit for mining healthcare big-data: architecture and applications,” 2016 IEEE International Conference on Healthcare Informatics (ICHI), Chicago, IL, 2016, pp. 255-261, https://doi.org/10.1109/ICHI.2016.35
Nandikotkur, G., Gomez, D., Dovale, J., Clarke, D., Komstead, K., Shah, R., & Aboasu, S. (2016). A Spectral Variability Study Using the Entire FERMI Data from the Blazar 3C 454.3. In American Astronomical Society Meeting Abstracts #228 (pp. 314.10). http://adsabs.harvard.edu/abs/2016AAS...22831410N
TECHNICAL SKILLS AND HOBBIES
Baka MPlayer, u8sand.github.io/Baka-MPlayer
Summer 2014 - Winter 2017
- Lead programmer, maintainer, and manager
- Contributed to upstream dependencies including mpv and Qt
- Collaborated with UX designer and open source community
Amateur Radio License, Technician, Call sign: KD2IQK
Spring 2015
Open Source Software Development github.com/u8sand
2007 - Current
- Gained expertise in a substantial number of technologies including but not limited to:
- Python: LangChain, pandas, sklearn, tensorflow, huggingface, fastapi, django, selenium, scapy
- HTML/JavaScript: SolidJS, Svelte, NextJS, React, GraphQL, TypeScript, ThreeJS, d3
- DevOps: postgres, kubernetes, docker, vagrant, ansible, terraform, OpenAPI, CWL, flatcar
- Unix: nginx, awk, sed, rsync, rclone, iptables, bpf, git, perf, gdb, radare, jq, restic
- Rust: rocket, rayon, pyo3, wasm-bindgen
- C/C++: Qt, boost, win32, .NET, CMake, OGRE, DirectX
- Other: OpenAI, WebAssembly, Haskell, Jekyll, LaTeX