Humans are good at looking at pictures and finding patterns or making comparisons. For example, look at a collection of dog photos and you can sort them by color, ear size, face shape, etc. But could you compare them quantitatively? And perhaps even more intriguing: could a machine extract meaningful information from images that humans cannot?
Now a team of scientists at Stanford University’s Chan Zuckerberg Biohub has developed a machine learning method to quantitatively analyze and compare images – in this case, microscopy images of proteins – without any prior knowledge. As reported in Nature Methods, their algorithm, dubbed “cytoself,” provides comprehensive, detailed information about the location and function of proteins within a cell. This ability could reduce research time for cell biologists and eventually be used to speed up the drug discovery and screening process.
“This is very exciting — we’re applying AI to a new type of problem and we’re still recovering everything humans know and then some,” said Loic Royer, co-corresponding author of the study. “In the future, we could do that for different types of images. That opens up many possibilities.”
In addition to demonstrating the power of machine learning algorithms, Cytoself has generated insights into cells, the basic building blocks of life, and into proteins, the molecular building blocks of cells. Every cell contains about 10,000 different types of proteins – some work alone, while many work together, doing different jobs in different parts of the cell to keep it healthy. “A cell is spatially much better organized than we previously thought. This is an important biological finding about how the human cell is wired,” said Manuel Leonetti, also a co-author of the study.
And like all tools developed at CZ Biohub, Cytoself is open source and accessible to all. “We hope it will inspire many people to use similar algorithms to solve their own image analysis problems,” Leonetti said.
No PhD needed: machines can learn on their own
Cytoself is an example of what is known as self-supervised learning, meaning that humans don’t teach the algorithm anything about the protein images, as they would in supervised learning. “In supervised learning, you have to teach the machine one by one with examples, which is a lot of work and very tedious,” said Hirofumi Kobayashi, first author of the study. And when the machine is limited to the categories humans teach it, bias can creep into the system.
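To make the idea concrete, here is a minimal, hypothetical sketch of self-supervised representation learning in PyTorch: a small autoencoder learns to compress and reconstruct images with no human-provided labels, and the compressed bottleneck vector serves as the learned representation of each image. This is only an illustration of the general principle, not the cytoself architecture itself; the class names, image sizes, and training settings below are assumptions for illustration.

```python
# Illustrative sketch of self-supervised learning: the "label" is the image
# itself (reconstruction), so no human annotation is needed. Not the actual
# cytoself model; shapes and names are assumptions.

import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, embedding_dim=64):
        super().__init__()
        # Encoder: 1-channel 100x100 image crop -> compact embedding vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1),   # 100 -> 50
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 50 -> 25
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 25 * 25, embedding_dim),
        )
        # Decoder: embedding vector -> reconstructed image
        self.decoder = nn.Sequential(
            nn.Linear(embedding_dim, 32 * 25 * 25),
            nn.Unflatten(1, (32, 25, 25)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 25 -> 50
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),   # 50 -> 100
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = TinyAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.rand(8, 1, 100, 100)  # stand-in for fluorescence image crops
for _ in range(5):
    reconstruction, embedding = model(images)
    loss = nn.functional.mse_loss(reconstruction, images)  # self-supervision
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training on many images, the `embedding` vectors, rather than the reconstructions, are the useful output: they can be compared and clustered without any human-defined categories.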
“Manu [Leonetti] believed the information was already in the images,” Kobayashi said. “We wanted to see what the machine can figure out on its own.”
In fact, the team, which included CZ Biohub software engineer Keith Cheveralls, was surprised at how much information the algorithm was able to extract from the images.
“The level of detail in protein localization was much higher than we would have thought,” said Leonetti, whose group is developing tools and technologies to understand cell architecture. “The machine converts each protein image into a mathematical vector. Then you can start sorting images that look the same. We found that in this way we can predict proteins working together in the cell with high accuracy simply by comparing their images, which was kind of surprising.”
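As a rough illustration of that comparison step, the snippet below shows how per-protein embedding vectors might be compared with cosine similarity to flag proteins whose images look alike. The protein names, vector size, and random embeddings are placeholders, not data or methods from the study.

```python
# Hypothetical sketch: compare per-protein embedding vectors to find
# candidate functional partners. Embeddings and names are placeholders.

import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Pretend per-protein embeddings (e.g., averaged over many image crops)
embeddings = {
    "PROTEIN_A": rng.normal(size=64),
    "PROTEIN_B": rng.normal(size=64),
    "PROTEIN_C": rng.normal(size=64),
}

# Rank every pair by similarity; high-similarity pairs would be candidate
# partners to confirm with orthogonal experiments.
names = list(embeddings)
pairs = []
for i, name_i in enumerate(names):
    for name_j in names[i + 1:]:
        score = cosine_similarity(embeddings[name_i], embeddings[name_j])
        pairs.append((score, name_i, name_j))

for score, a, b in sorted(pairs, reverse=True):
    print(f"{a} vs {b}: similarity {score:.2f}")
```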
The first of its kind
While there has been some previous work on protein images using either self-supervised or unsupervised models, self-supervised learning has never before been applied so successfully to such a large dataset: more than 1 million images covering over 1,300 proteins measured in living human cells, said Kobayashi, an expert in machine learning and high-speed imaging.
The images were a product of CZ Biohub’s OpenCell, a project led by Leonetti to create a complete map of the human cell, including characterization of the approximately 20,000 types of proteins that power our cells. The first 1,310 proteins they characterized were published earlier this year in Science, including images of each protein (generated using fluorescent tagging) and maps of their interactions with each other.
Cytoself was key to OpenCell’s success (all images are available at opencell.czbiohub.org), providing highly detailed, quantitative information on protein localization.
“The question of all the ways that a protein can localize in a cell – all the places it can be and all possible combinations of locations – is fundamental,” Royer said. “For decades, biologists have tried to determine all possible locations and all possible structures within a cell. But that has always been done by people looking at the data. The question is: how much have human limitations and biases made this process imperfect?”
Royer added: “As we have shown, machines are better at this than humans. They can find finer categories and see differences in the images that are extremely subtle.”
The team’s next goal for Cytoself is to explore how small changes in protein localization can be used to distinguish different cell states, such as a normal cell versus a cancer cell. This could be key to a better understanding of many diseases and could facilitate drug discovery.
“Drug screening is basically trial and error,” Kobayashi said. “But with Cytoself, this is a big leap because you don’t have to do experiments individually with thousands of proteins. It’s an inexpensive method that could significantly increase the speed of research.”
Image caption: The AI program accurately predicts protein localization.
Hirofumi Kobayashi et al., Self-supervised deep learning encodes high-resolution features of subcellular localization of proteins, Nature Methods (2022). DOI: 10.1038/s41592-022-01541-z
Provided by Stanford University
Citation: AI Can Reveal New Cell Biology Just by Looking at Pictures (2022, August 1) Retrieved August 1, 2022 from https://phys.org/news/2022-08-ai-reveal-cell-biology-images.html