Job Dekker has been pioneering 3-D mapping technology since 2002.
Maps help us arrive at our destination quickly and safely, navigating us away from dead ends and potential risks. But these guides, whether folded up under our car seats or programmed into our dashboard GPS systems, actually tell us little about the 3-D world. Supplement those maps with snapshots and 3-D architectural models, however, and a clearer picture begins to emerge. The same is true for the human genome.
Ten years after researchers published the first full sequence of the human genome, we still know relatively little about its structure and biology within the cell. Sequencing produces a linear representation of ordered base pairs, but our genomes regulate gene expression in three dimensions.
Genome structure, which plays a significant role in gene expression, factors into a host of human diseases. Dysfunctional genomic expression can lead to disorders such as asthma, diabetes, mental retardation, heart disease, and cancers. Some of these disorders may be caused by dysregulated regulatory genes that could prevent proper functioning of a gene or promote expression of a mutated gene. Techniques to map the topology of the interactome will allow researchers to analyze cell populations and predict events such as disease onset.
“There are things that are far apart in a linear chromosome that like to be in the same 3-D neighborhood,” says Job Dekker, associate professor at University of Massachusetts Medical School. Without tools to understand how and where these gene neighborhoods are constructed, many questions about disease and development remain unanswered.
To find new answers about genome structure and biology, Dekker developed a technique to study the interaction of two loci in 3-D. Now, he and others are busy upgrading the method to take a more comprehensive look at these gene neighborhoods to see how they interact with one another. This information will provide a deeper understanding of gene expression and how it is affected in chromosome-related diseases.
Caught in a genomic web
In 2002, Dekker was thinking about long-distance relationships. He was a postdoctoral fellow in Nancy Kleckner’s lab in the Department of Molecular and Cellular Biology at Harvard University, and had become smitten with studies demonstrating that genes are regulated by functional elements located far away on the chromosome. Some research even found that genes could be regulated by elements on another chromosome altogether. Dekker wanted to know how these long-distance relationships worked.
To find out, Dekker needed to know how these elements interacted in the nucleus, not just in the sequence of A, G, C, and T. Instead of extracting and unwinding the DNA for sequencing experiments, Dekker needed a way to hold together two interacting DNA loci from the genome to understand their spatial orientation. “I thought if I can just glue together things that are touching each other and then figure out which two parts are glued together, I can discover which pieces of DNA are important for contact,” recalls Dekker.
Formaldehyde was the answer. When Dekker introduced formaldehyde to the cell nucleus, the DNA strands stuck to each other. It was like capturing the genome in “Spider-Man’s web,” says Erez Lieberman-Aiden, one of Dekker’s collaborators. Restriction enzymes specific to the two targeted loci then digested the chromatin, leaving only the formaldehyde-glued fragments to be ligated. Dekker’s team then used PCR to analyze these ligated DNA fragment pairs to understand the interactions of these two known loci.
Dekker published the method, called chromosome conformation capture (3C), in Science later that year (1). In that paper, Dekker and colleagues used 3C in the yeast Saccharomyces cerevisiae to confirm known features of the nucleus and track changes in the genome structure during cell division. For the first time, researchers had access to a technique that provided them with spatial information about how the genome is organized in the nucleus in 3-D. The paper has been cited more than 400 times, and has provided scientists with a snapshot of how elements at distant locations within the genomic sequence influence each other: these interacting elements are actually neighbors in the 3-D genome structure.
But the technique does have limitations. For example, to study the interaction between two loci using 3C, researchers need to know the sequence of both sites of interest. It is impossible to investigate how a particular gene interacts with unknown parts of the genome. To address this, Dekker’s team—as well as other researchers—have worked to expand and adapt the original method.
Rolf Ohlsson, professor at the Karolinska Institute’s Department of Microbiology, Tumor, and Cell Biology, developed circular chromosome conformation capture (4C) in 2006. In the 4C approach, the known DNA fragment is first crosslinked to its unknown interaction partners in the genome. Then, the ends of the known DNA fragment are ligated to the ends of the unknown fragment it is crosslinked to. Upon reversal of the crosslinking, this forms a circular piece of DNA. The unknown fragment of this circular DNA can then be PCR-amplified using nested primers located in the surrounding known DNA fragment, and sequenced (2).
Meanwhile, Dekker was working on a high-throughput 3C method—called carbon-copy chromosome conformation capture (5C)—to observe the interactions of multiple loci at a single site. The approach detects multiple interactions with microarray or deep sequencing using multiplex ligation-mediated amplification. 5C enabled the analysis of interactions along the same chromosome and between chromosomes (3).
“Which method you use depends very much on what you’re studying,” says Dekker.
But while researchers could use 3C-related techniques to study particular interactions between neighboring chromatin fibers at high resolutions on the kilobase scale, the field was still missing a technique to study genome-wide interactions.
Looking at the big picture
In 2008, Lieberman-Aiden heard a researcher talk about the several months of work it took to prove that one particular sequence was proximal to another sequence in the 3-D genome structure. There has to be a better way, thought Lieberman-Aiden, who was at the time a candidate in a joint Harvard University–Massachusetts Institute of Technology Ph.D. program in applied mathematics and bioengineering.
“I didn’t know if there was anything to do in this area because folks were already doing high-throughput sequencing,” he says, “but it still felt like there could be further improvement.”
So Lieberman-Aiden set out to create a technique to capture large sections of the human genome in 3-D, under the guidance of Dekker and MIT professor of biology Eric Lander. He spent months tinkering with 3C. Much of the process remained the same: the genome was still glued together with formaldehyde, restriction enzymes still digested the chromatin, and the DNA was still ligated.
But the major difference was that the restriction enzyme was chosen to leave a four-base overhang on the 5′ end of the DNA fragments. DNA polymerase was then used to end-fill the 5′ overhang with a biotinylated nucleotide as a label. After the DNA fragments were sheared, streptavidin beads, which have an affinity for biotin, could pull down the biotin-labeled fragments (that is, the formaldehyde-glued pieces that neighbored each other in the 3-D genome) for sequencing analysis.
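To make the digestion step concrete, here is a minimal Python sketch of cutting a toy sequence at a restriction site. The article does not name the enzyme; HindIII (site AAGCTT, cut after the first A) is used here only as an assumed example of an enzyme that leaves a four-base 5′ overhang, and the toy sequence is invented for illustration.

```python
# Assumption: HindIII is used purely as an example enzyme; the article
# does not say which restriction enzyme the method relies on.
SITE = "AAGCTT"
CUT_OFFSET = 1  # top strand is cut between the first A and AGCTT


def digest(sequence, site=SITE, cut_offset=CUT_OFFSET):
    """Return the top-strand fragments produced by cutting at every site."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical toy sequence containing two HindIII sites.
toy_sequence = "GGCCAAGCTTGGTTAAGCTTCC"
fragments = digest(toy_sequence)
print(fragments)
```

In this toy case every fragment downstream of a cut begins with AGCT, the four bases that would form the 5′ overhang available for end-filling with a labeled nucleotide.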
The technique provides a comprehensive list of all chromatin interactions in the genome, theoretically allowing researchers to piece together a model of the genome structure from these data points. “One picture won’t tell you much, but when you have 30 or 50 million pictures, all of a sudden you can reconstruct how the genome is folded,” says Lieberman-Aiden.
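As a rough illustration of how millions of such "pictures" might be assembled, the sketch below tallies ligated read pairs into a matrix of contact counts between fixed-size genomic bins. The function name, the bin size, and the toy coordinates are all hypothetical; this is not the authors' analysis pipeline, only one simple way to picture the idea.

```python
def build_contact_matrix(read_pairs, bin_size, n_bins):
    """Count read pairs per pair of bins; the matrix is kept symmetric."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i = pos_a // bin_size
        j = pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix


# Hypothetical toy data: each tuple holds the two positions (in bases)
# joined by one ligation event on a single 4-Mb chromosome.
toy_pairs = [
    (100_000, 3_200_000),
    (150_000, 3_900_000),
    (1_200_000, 1_300_000),
    (2_500_000, 500_000),
]
contacts = build_contact_matrix(toy_pairs, bin_size=1_000_000, n_bins=4)
for row in contacts:
    print(row)
```

Here the first and last megabase share two read pairs even though they sit at opposite ends of the linear sequence, which is the kind of signal that suggests two regions are neighbors in 3-D.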
In 2009, Lieberman-Aiden, Dekker, Lander, and their colleagues published a paper in Science that described the technique. Using the so-called Hi-C method, Lieberman-Aiden and his team discovered two intriguing bits of information: the genome is compartmentalized, and the genome is a fractal globule.
The genome is organized into two regions, active chromatin and inactive chromatin. This allows the active chromatin to be easily accessible while the inactive is hidden away. This organization provides an efficient way to determine which parts of the chromosome are active. “I view it like water and oil in motion,” says Dekker. “You can shake it up and you’ll have a more cloudy solution, but you really only have two states: water and oil.”
A fractal globule is the concept that Lieberman-Aiden uses to describe how two meters of DNA is folded into a tight, dense, but unknotted structure. Without knots, the genome can easily unfold, providing access to different genes without expending excess energy untangling or searching for relevant sections. “It explains why if I take a pair of headphones, which is a couple of feet long, and put it in my fairly roomy pocket and then take it out, it’s knotted and hard to use,” he explains. “Whereas, if you take the human genome and compress it into a six micron–wide nucleus, somehow the cell is able to use it.”
Although Hi-C provides a high-throughput view of the genome interactions, it only provides a low-resolution image. Hi-C provides 1-Mb resolution, whereas 3C techniques provide 1-kb resolution or better. 1-Mb resolution does not provide enough detail to be applicable to most 3-D chromatin interaction studies.
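One plausible way to see why resolution is hard to push is to count how many pairs of genomic bins a genome-wide map has to fill at each resolution. The back-of-the-envelope Python sketch below assumes a genome of roughly 3.1 billion base pairs (a figure not given in the article) and borrows the lower "30 million pictures" number from Lieberman-Aiden's quote; the real relationship between sequencing depth and resolution is more involved than this.

```python
GENOME_LENGTH_BP = 3_100_000_000  # assumption: approximate human genome size
READ_PAIRS = 30_000_000           # lower figure quoted by Lieberman-Aiden


def bin_pair_count(genome_length, bin_size):
    """Return (number of bins, number of unordered bin pairs incl. self-pairs)."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    return n_bins, n_bins * (n_bins + 1) // 2


for label, bin_size in [("1 Mb", 1_000_000), ("1 kb", 1_000)]:
    n_bins, n_pairs = bin_pair_count(GENOME_LENGTH_BP, bin_size)
    reads_per_pair = READ_PAIRS / n_pairs
    print(f"{label}: {n_bins:,} bins, {n_pairs:,} bin pairs, "
          f"{reads_per_pair:.2e} read pairs per bin pair on average")
```

Under these assumptions, 1-Mb bins leave a handful of read pairs per bin pair on average, while 1-kb bins leave far less than one, so most cells of a kilobase-scale genome-wide map would be empty at that sequencing depth.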
But Dekker and Lieberman-Aiden are both working to increase the resolution for Hi-C, hoping to publish an improved technique within the next few years. “If we push the resolution for Hi-C, we’ll be able to see which regulatory elements are touching which genes for the whole human genome, or any genome,” says Dekker. “That’s the next thing we should do.”
A different lens
Ohlsson agrees with that goal. And his lab is already busy building a network of chromatin interactions that will provide a high-resolution image of the entire genome. This summer, Ohlsson hopes to publish what will be the first chromosomal interactome. This chromosomal interactome could have a huge impact on disease research, adding a new tool to more thoroughly understand how diseases emerge and how to develop effective treatments. “It will be really mind-blowing,” he says.
But he’s using different means to build his network. All 3C-related techniques currently provide limited views of the genome architecture, Ohlsson points out. 3C focuses only on two loci, and 4C provides an extremely specific perspective. While 5C provides a relatively high resolution for multiple interactions, data analysis requires significant computing power and could drag on for years, making it unfeasible for interactome-mapping applications. And Hi-C is still low resolution.
While a high-resolution Hi-C technique could advance the field, Ohlsson believes hybrid techniques are more promising. His lab is combining different 3C-related methods so that they complement one another. “The field will really explode,” he says, when such hybrids are available, “because [now] there are limitations to the way the techniques work and what you can say about the results. But that’s going to change a lot.”
To build the chromosomal interactome, Ohlsson’s team is developing a hybrid 4C technique. He hopes the hybrid 4C technique will be able to provide a high-throughput high-resolution image of the genome structure without constraining data analysis. This technique could easily identify diseased cells, and specifically, cancer cells. “Our new technique will allow us to walk along the nodes of the interactome. We will be able to look at these cells in very high resolution,” he says. “Once we can identify individual cells, then we can verify whether they are tumor cells.”
Hi-C images revealed the fractal globule, shown above.
Ohlsson is working to create a comprehensive interactome map, like the one shown above.
The ultimate goal is to map the 3-D genome of individual patients, providing doctors with more information to prevent disease and create personalized treatments. “In terms of clinical application, it’s going to come,” says Ohlsson. “It must come, but we’re still some distance away.”