1 00:00:00,000 --> 00:00:01,000 2 00:00:01,000 --> 00:00:04,890 I'm sure many of y'all have already heard of the molecule 3 00:00:04,889 --> 00:00:08,980 DNA, and it stands for deoxyribonucleic acid. 4 00:00:08,980 --> 00:00:11,609 I wrote it out ahead of time to spare you the pain of 5 00:00:11,609 --> 00:00:13,739 watching me spell this in real time. 6 00:00:13,740 --> 00:00:15,859 But it is-- and I think you already have an idea. 7 00:00:15,859 --> 00:00:21,169 This is the basic unit of heredity, or it's what codes 8 00:00:21,170 --> 00:00:23,790 all of our genetic information. 9 00:00:23,789 --> 00:00:25,769 And what I want to do in this video-- because I think that's 10 00:00:25,769 --> 00:00:26,910 kind of common knowledge. 11 00:00:26,910 --> 00:00:30,240 That's popular knowledge that, oh, everything that makes my 12 00:00:30,239 --> 00:00:34,640 hair black or my eyes blue or whatever, that's all somehow 13 00:00:34,640 --> 00:00:36,329 encoded in our DNA. 14 00:00:36,329 --> 00:00:39,489 But what I want to do in this video is give you an idea of 15 00:00:39,490 --> 00:00:43,020 how something like DNA, a molecule, can actually code 16 00:00:43,020 --> 00:00:44,500 for what we are. 17 00:00:44,500 --> 00:00:48,390 How does the information, one, get stored in this type of a 18 00:00:48,390 --> 00:00:50,240 molecule, then how does that actually turn into the 19 00:00:50,240 --> 00:00:54,600 proteins that make up our enzymes and our organs and our 20 00:00:54,600 --> 00:00:58,770 brain cells and everything else that really make us us? 21 00:00:58,770 --> 00:01:04,400 So this is a computer graphics representation of DNA, and I'm 22 00:01:04,400 --> 00:01:06,859 sure many of y'all have heard of the double helix. 23 00:01:06,859 --> 00:01:11,400 24 00:01:11,400 --> 00:01:14,450 And that's in reference to the structure that DNA takes. 25 00:01:14,450 --> 00:01:16,859 And you can see here it's a double helix. 
26 00:01:16,859 --> 00:01:20,560 As you can see here, you have two of these lines, and 27 00:01:20,560 --> 00:01:23,450 they're intertwined with each other. 28 00:01:23,450 --> 00:01:27,549 You see there, that's one of them, and then you see another 29 00:01:27,549 --> 00:01:30,759 one intertwined like that. 30 00:01:30,760 --> 00:01:33,190 And then they're connected by-- you can almost view it as 31 00:01:33,189 --> 00:01:38,780 like these bridges between the two helixes, and they twist 32 00:01:38,780 --> 00:01:40,579 around each other. 33 00:01:40,579 --> 00:01:41,670 I think you get the idea. 34 00:01:41,670 --> 00:01:44,950 So the double helix just describes the structure, the 35 00:01:44,950 --> 00:01:47,710 shape that DNA takes, and it leads to all sorts of 36 00:01:47,709 --> 00:01:50,799 interesting repercussions in terms of how heredity takes 37 00:01:50,799 --> 00:01:55,329 place and how natural selection and variation might 38 00:01:55,329 --> 00:01:56,340 take place as well. 39 00:01:56,340 --> 00:01:59,859 And actually, in the future, I do want to actually read with 40 00:01:59,859 --> 00:02:04,349 you Watson and Crick's paper on the double helix where they 41 00:02:04,349 --> 00:02:06,829 essentially talk about their discovery. 42 00:02:06,829 --> 00:02:08,669 The best thing about that paper, besides the fact that 43 00:02:08,669 --> 00:02:11,120 it was probably one of the biggest discoveries in the 44 00:02:11,120 --> 00:02:14,370 history of mankind, is that the paper is only a page and a 45 00:02:14,370 --> 00:02:17,950 half long, and it goes to my general view that if you have 46 00:02:17,949 --> 00:02:19,439 something good to say, it shouldn't take you 47 00:02:19,439 --> 00:02:20,509 that long to say it. 48 00:02:20,509 --> 00:02:22,879 But with that said, let's think a little bit about how 49 00:02:22,879 --> 00:02:26,599 this can actually generate the proteins and whatever else 50 00:02:26,599 --> 00:02:29,239 that make up all of us. 
51 00:02:29,240 --> 00:02:34,290 So right here this is a zoomed-up version of that 52 00:02:34,289 --> 00:02:37,139 graphic that I just showed you a little bit earlier, and this 53 00:02:37,139 --> 00:02:38,619 is each of the helixes. 54 00:02:38,620 --> 00:02:44,740 So if this is the magenta side, if you unwound this 55 00:02:44,740 --> 00:02:47,530 helix-- right now it shows it in its wound state, but if I 56 00:02:47,530 --> 00:02:54,009 unwind this helix, one side would maybe be this magenta 57 00:02:54,009 --> 00:02:58,349 side of our helix and then one side is 58 00:02:58,349 --> 00:03:00,879 this green side, right? 59 00:03:00,879 --> 00:03:02,400 And if you twist it up, you get back to 60 00:03:02,400 --> 00:03:04,110 this drawing up here. 61 00:03:04,110 --> 00:03:06,650 And then these bridges that you see in this drawing in the 62 00:03:06,650 --> 00:03:10,870 double helix, those are these connections right here. 63 00:03:10,870 --> 00:03:12,120 These are the bridges. 64 00:03:12,120 --> 00:03:15,039 65 00:03:15,039 --> 00:03:21,409 Now, what allows us to code information is that the blocks 66 00:03:21,409 --> 00:03:25,030 that make up the bridges are made of different molecules. 67 00:03:25,030 --> 00:03:27,250 And the four different molecules that are made up in 68 00:03:27,250 --> 00:03:31,669 DNA are adenine-- and it's written here 69 00:03:31,669 --> 00:03:32,389 on this little chart. 70 00:03:32,389 --> 00:03:34,369 I got all of this from Wikipedia, so if you want more 71 00:03:34,370 --> 00:03:36,310 information I encourage you to go there. 72 00:03:36,310 --> 00:03:39,199 Adenine, that's up here. 73 00:03:39,199 --> 00:03:43,239 This is the molecular structure of adenine. 74 00:03:43,240 --> 00:03:47,270 It's connected to a sugar right here, ribose. 75 00:03:47,270 --> 00:03:49,900 I won't go into a deoxyribose. 76 00:03:49,900 --> 00:03:51,599 And then you have your phosphate group. 
77 00:03:51,599 --> 00:03:55,840 But these kind of form the backbone of the DNA: the sugar 78 00:03:55,840 --> 00:03:56,770 and the phosphate groups. 79 00:03:56,770 --> 00:03:58,420 And I'm not going to go into the microbiology of it, 80 00:03:58,419 --> 00:04:01,089 because that's not important right now to understanding 81 00:04:01,090 --> 00:04:04,770 just how does this intuitively code for what we are. 82 00:04:04,770 --> 00:04:07,620 So along the backbone, which is identical, and 83 00:04:07,620 --> 00:04:08,240 we'll talk about it. 84 00:04:08,240 --> 00:04:09,550 They run in different directions. 85 00:04:09,550 --> 00:04:12,960 It's called antiparallel, so they label the ends. 86 00:04:12,960 --> 00:04:14,500 And I'm not going to go into detail there, but the 87 00:04:14,500 --> 00:04:17,540 important thing are these bases here. 88 00:04:17,540 --> 00:04:26,180 So you have adenine, and adenine pairs with thymine, 89 00:04:26,180 --> 00:04:27,634 and you see that up here. 90 00:04:27,634 --> 00:04:31,469 If you have an adenine molecule here, an adenine base 91 00:04:31,470 --> 00:04:33,550 here, it'll pair with thymine, and this is 92 00:04:33,550 --> 00:04:34,590 called the base pair. 93 00:04:34,589 --> 00:04:36,500 Adenine and thymine pair with each other. 94 00:04:36,500 --> 00:04:39,470 If you have thymine, it's going to pair with adenine. 95 00:04:39,470 --> 00:04:41,790 And then you have guanine and it pairs with cytosine. 96 00:04:41,790 --> 00:04:47,110 97 00:04:47,110 --> 00:04:49,480 And the names of these, you should know these names, just 98 00:04:49,480 --> 00:04:54,200 because they are almost-- well, if you ever enter any 99 00:04:54,199 --> 00:04:56,550 discussion about DNA and base pairs, 100 00:04:56,550 --> 00:04:59,310 this is expected knowledge. 101 00:04:59,310 --> 00:05:02,250 But the names of the molecules and how they're structured, 102 00:05:02,250 --> 00:05:03,839 not important just yet. 
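The pairing rule above (adenine with thymine, guanine with cytosine) is simple enough to write down as a lookup table. Here is a minimal Python sketch of that idea. The names `PAIRS` and `complement` and the sample strand are made up for illustration, and the sketch ignores the antiparallel direction of the two strands.

```python
# Base-pairing rule for DNA: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base that sits across from each base in `strand`."""
    return "".join(PAIRS[base] for base in strand)


if __name__ == "__main__":
    sample = "GATTACA"  # hypothetical strand, not taken from the video
    print(sample, "pairs with", complement(sample))
```

Taking the complement twice gives back the starting strand, which is another way of saying that either strand carries all of the information for the other.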
103 00:05:03,839 --> 00:05:06,289 But what's important is the fact that there are four of 104 00:05:06,290 --> 00:05:10,069 them and that they essentially code information. 105 00:05:10,069 --> 00:05:12,560 So you can view one of these strands in kind of a 106 00:05:12,560 --> 00:05:13,410 simplified way. 107 00:05:13,410 --> 00:05:17,280 You can just view it as a strand of-- so this one, if it 108 00:05:17,279 --> 00:05:24,599 has an adenine and then it has a cytosine, 109 00:05:24,600 --> 00:05:26,370 then it has a guanine. 110 00:05:26,370 --> 00:05:27,050 That's a guanine. 111 00:05:27,050 --> 00:05:29,290 They did it in purple. 112 00:05:29,290 --> 00:05:33,050 And then it has a-- oh, no, it has a thymine, not a guanine. 113 00:05:33,050 --> 00:05:36,090 So it has a thymine in purple, and then in 114 00:05:36,089 --> 00:05:37,159 blue, it has a guanine. 115 00:05:37,160 --> 00:05:41,870 So this strand right here codes ACTG. 116 00:05:41,870 --> 00:05:44,470 And if you were to code the opposite side of the strand, 117 00:05:44,470 --> 00:05:46,780 you could immediately-- I don't even have to look here. 118 00:05:46,779 --> 00:05:49,549 I can look at this side and say, OK, adenine will pair 119 00:05:49,550 --> 00:05:58,050 with thymine, cytosine pairs with guanine, thymine pairs 120 00:05:58,050 --> 00:06:01,340 with adenine, and guanine pairs with cytosine. 121 00:06:01,339 --> 00:06:03,529 So they're complementary strands. 122 00:06:03,529 --> 00:06:04,979 So if you think about it, they're really 123 00:06:04,980 --> 00:06:06,540 coding the same thing. 124 00:06:06,540 --> 00:06:08,980 If you have one of them, you have all of the information 125 00:06:08,980 --> 00:06:10,819 for the other. 126 00:06:10,819 --> 00:06:15,819 Now, in our DNA, in a human's DNA, you might say, hey, Sal, 127 00:06:15,819 --> 00:06:22,120 how do I go from these little chains of these molecules? 128 00:06:22,120 --> 00:06:23,750 How does that turn into me? 
129 00:06:23,750 --> 00:06:26,089 How does that turn into this complex organism? 130 00:06:26,089 --> 00:06:30,519 And the simple answer is, well, the human genome has 131 00:06:30,519 --> 00:06:32,504 three billion of these base pairs. 132 00:06:32,504 --> 00:06:39,120 133 00:06:39,120 --> 00:06:41,230 And that's actually just in half of your chromosomes. 134 00:06:41,230 --> 00:06:44,040 And I'll tell you, maybe in this video or a future video, 135 00:06:44,040 --> 00:06:46,540 why we only consider half of your chromosomes, and that's 136 00:06:46,540 --> 00:06:53,150 because essentially you have a pair of every chromosome. 137 00:06:53,149 --> 00:06:55,179 I'll talk in more detail about that. 138 00:06:55,180 --> 00:06:58,889 And this number, to some people, they might say, it 139 00:06:58,889 --> 00:07:04,419 only takes three billion base pairs to describe who I am? 140 00:07:04,420 --> 00:07:08,259 And some people would say, wow, it takes three billion 141 00:07:08,259 --> 00:07:09,490 base pairs to describe who I am. 142 00:07:09,490 --> 00:07:11,060 I never thought I was that complex. 143 00:07:11,060 --> 00:07:13,389 So depending on your point of view, this is either a large 144 00:07:13,389 --> 00:07:14,560 or small number. 145 00:07:14,560 --> 00:07:16,810 But when you take these three billion base pairs, you're 146 00:07:16,810 --> 00:07:19,899 actually encoding all of the information that it takes to 147 00:07:19,899 --> 00:07:22,560 make in this case a human being. 148 00:07:22,560 --> 00:07:25,329 And actually it turns out a lot of primates don't have 149 00:07:25,329 --> 00:07:29,449 that many different base pairs than human beings. 150 00:07:29,449 --> 00:07:31,789 The amazing thing is even things like roundworms and 151 00:07:31,790 --> 00:07:38,000 fruit flies also number in a surprisingly large fraction of 152 00:07:38,000 --> 00:07:39,600 the base pairs of a human being. 
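One rough way to get a feel for three billion base pairs is to ask how much raw storage that would be. The sketch below assumes two bits per base, since two bits are enough to tell four values apart. That assumption, and the names `BASE_PAIRS` and `genome_megabytes`, are mine and not something stated in the video.

```python
import math

BASE_PAIRS = 3_000_000_000  # figure quoted above for the human genome
DISTINCT_BASES = 4          # A, T, C, G

# Bits needed to tell four values apart: log2(4) = 2.
bits_per_base = math.log2(DISTINCT_BASES)


def genome_megabytes(base_pairs: int = BASE_PAIRS) -> float:
    """Raw size in megabytes (10**6 bytes), with no compression or metadata."""
    total_bits = base_pairs * bits_per_base
    return total_bits / 8 / 1_000_000


if __name__ == "__main__":
    print(f"{genome_megabytes():.0f} MB")
```

Under that assumption the count works out to 750 MB, so whether the number feels large or small really does depend on your point of view.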
153 00:07:39,600 --> 00:07:41,680 Maybe I'll do another video where I go 154 00:07:41,680 --> 00:07:42,720 into comparative biology. 155 00:07:42,720 --> 00:07:48,370 But how do these base pairs actually lead to proteins? 156 00:07:48,370 --> 00:07:49,530 I mean, it's fair enough. 157 00:07:49,529 --> 00:07:50,469 That's information. 158 00:07:50,470 --> 00:07:53,100 It's like you can view these as ones and zeroes in some 159 00:07:53,100 --> 00:07:55,689 type of computer language, but really they're not just ones 160 00:07:55,689 --> 00:07:58,139 and zeroes, because they can take on four different values. 161 00:07:58,139 --> 00:08:01,860 They can take on an A, a T, a C or a G, so you could think 162 00:08:01,860 --> 00:08:04,930 of them as zero, ones, twos and threes, but I won't go 163 00:08:04,930 --> 00:08:06,840 into that whole aspect of it just now. 164 00:08:06,839 --> 00:08:10,339 So how does that actually code information? 165 00:08:10,339 --> 00:08:17,689 So DNA when it actually transcribes something-- the 166 00:08:17,689 --> 00:08:24,344 process is called transcription, and I'm going 167 00:08:24,345 --> 00:08:27,000 to do a pretty gross simplification of it, but I 168 00:08:27,000 --> 00:08:30,449 think it'll give you the gist of how it codes for proteins. 169 00:08:30,449 --> 00:08:32,480 So what happens when transcription happens is that 170 00:08:32,480 --> 00:08:35,980 these two strands split up, and one of the strands-- let 171 00:08:35,980 --> 00:08:36,889 me just take one of them. 172 00:08:36,889 --> 00:08:37,830 Let's say it looks like this. 173 00:08:37,830 --> 00:08:39,580 I'll do it all in one color. 174 00:08:39,580 --> 00:08:47,930 Let's say it's just ATGGACG-- I'm just making up stuff-- TA. 175 00:08:47,929 --> 00:08:50,789 Let's say that that's the strand that got split up. 176 00:08:50,789 --> 00:08:54,079 And what happens is it transcribes-- 177 00:08:54,080 --> 00:08:56,060 and I won't say itself. 
178 00:08:56,059 --> 00:08:59,309 There's a whole bunch of enzymes and proteins and a 179 00:08:59,309 --> 00:09:03,059 whole bunch of chemical reactions that have to happen, 180 00:09:03,059 --> 00:09:07,049 but this DNA essentially transcribes a 181 00:09:07,049 --> 00:09:10,019 complementary mRNA. 182 00:09:10,019 --> 00:09:12,259 And I'll introduce RNA. 183 00:09:12,259 --> 00:09:14,809 184 00:09:14,809 --> 00:09:17,929 It's essentially the exact same thing as-- well, the word 185 00:09:17,929 --> 00:09:21,629 is ribonucleic acid, so it's literally-- you get rid of the 186 00:09:21,629 --> 00:09:24,639 deoxy, so you can kind of say it's got its oxy, and it's 187 00:09:24,639 --> 00:09:28,000 ribonucleic acid, but it's very similar to DNA. 188 00:09:28,000 --> 00:09:29,960 It codes in the exact same way. 189 00:09:29,960 --> 00:09:37,259 The only difference between RNA, instead of a thymine, it 190 00:09:37,259 --> 00:09:40,750 has something called a uracil. 191 00:09:40,750 --> 00:09:43,759 So every place where you would have expected a thymine, you 192 00:09:43,759 --> 00:09:46,519 would have expected a T, you'll now see a U. 193 00:09:46,519 --> 00:09:51,259 So, for example, if this is the DNA strand, then an RNA, 194 00:09:51,259 --> 00:09:54,519 an mRNA, in a messenger RNA strand, will be built 195 00:09:54,519 --> 00:09:55,779 complementary to this. 196 00:09:55,779 --> 00:09:57,069 So it'll be built-- let's see. 197 00:09:57,070 --> 00:09:59,310 With A, you'd normally have thymine when you're talking 198 00:09:59,309 --> 00:10:03,179 DNA, but now we're talking RNA, so it'll be a uracil, 199 00:10:03,179 --> 00:10:09,259 then an adenine, cytosine, cytosine, uracil, then we got 200 00:10:09,259 --> 00:10:14,069 a guanine, a cytosine, an adenine, and then 201 00:10:14,070 --> 00:10:14,770 we'll have a uracil. 202 00:10:14,769 --> 00:10:18,230 So this is the mRNA strand here. 
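The transcription step just described can be sketched in a few lines of Python. This is the same gross simplification used in the video: it only applies the pairing rule with uracil in place of thymine, and it leaves out the enzymes and everything else involved. The names `DNA_TO_MRNA` and `transcribe` are hypothetical.

```python
# DNA template base -> complementary mRNA base.
# Same pairing as DNA, except that uracil (U) takes the place of thymine (T).
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the mRNA strand complementary to a DNA template strand."""
    return "".join(DNA_TO_MRNA[base] for base in template)


if __name__ == "__main__":
    dna = "ATGGACGTA"        # the made-up strand from the video
    print(transcribe(dna))   # prints UACCUGCAU
```

The output matches the strand read out above: uracil, adenine, cytosine, cytosine, uracil, guanine, cytosine, adenine, uracil.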
203 00:10:18,230 --> 00:10:19,820 And all of this is occurring inside the 204 00:10:19,820 --> 00:10:22,500 nucleus of your cells. 205 00:10:22,500 --> 00:10:25,669 And we'll do a whole series of videos in the future about the 206 00:10:25,669 --> 00:10:30,995 structure of our cells, but I think most of us know that our 207 00:10:30,995 --> 00:10:34,909 cells-- and I'll talk more about eukaryotic and 208 00:10:34,909 --> 00:10:37,219 prokaryotic organisms in the future, but most complex 209 00:10:37,220 --> 00:10:39,860 organisms, they have a cell nucleus where we have all of 210 00:10:39,860 --> 00:10:42,539 our chromosomes that contain all of our DNA. 211 00:10:42,539 --> 00:10:48,689 And so this mRNA then detaches itself from the DNA that it 212 00:10:48,690 --> 00:10:52,400 was transcribed from, and then it leaves the nucleus, and it 213 00:10:52,399 --> 00:10:54,949 goes to these structures called ribosomes. 214 00:10:54,950 --> 00:10:57,360 I'm oversimplifying it a little bit, but at the 215 00:10:57,360 --> 00:11:02,509 ribosomes, this mRNA is translated into proteins. 216 00:11:02,509 --> 00:11:03,450 So let me do that. 217 00:11:03,450 --> 00:11:06,360 So let's say this is the mRNA. 218 00:11:06,360 --> 00:11:08,779 It was transcribed from that DNA, so let me get 219 00:11:08,779 --> 00:11:12,319 rid of that DNA now. 220 00:11:12,320 --> 00:11:13,300 I got rid of the DNA. 221 00:11:13,299 --> 00:11:15,990 This is the mRNA that we were able to transcribe from that 222 00:11:15,990 --> 00:11:19,220 DNA, and they have these other things called 223 00:11:19,220 --> 00:11:21,600 tRNA or transfer RNA. 224 00:11:21,600 --> 00:11:23,230 And what these are-- and this is the 225 00:11:23,230 --> 00:11:24,930 really interesting part. 226 00:11:24,929 --> 00:11:28,229 So you may or may not know that pretty much everything we 227 00:11:28,230 --> 00:11:30,200 are is made up of proteins. 
228 00:11:30,200 --> 00:11:32,610 And these proteins, the building blocks of proteins 229 00:11:32,610 --> 00:11:33,789 are amino acids. 230 00:11:33,789 --> 00:11:37,379 And for those of you who like to lift weights, I'm sure 231 00:11:37,379 --> 00:11:41,120 you've seen ads for amino acid supplements and 232 00:11:41,120 --> 00:11:42,639 things of the like. 233 00:11:42,639 --> 00:11:45,419 And the reason why they talk about amino acids is because 234 00:11:45,419 --> 00:11:49,399 those are the building blocks of proteins. 235 00:11:49,399 --> 00:11:52,199 My son actually has an allergy to milk protein, so we had to 236 00:11:52,200 --> 00:11:55,690 get him a formula that was just pure amino acids, just 237 00:11:55,690 --> 00:11:58,520 all of the milk proteins broken down. 238 00:11:58,519 --> 00:12:04,259 So if you look at a protein, it's actually a chain of these 239 00:12:04,259 --> 00:12:07,240 amino acids and usually a fairly long chain. 240 00:12:07,240 --> 00:12:11,750 We'll look at some protein structures in the very near 241 00:12:11,750 --> 00:12:14,090 future, just to give you an idea of things. 242 00:12:14,090 --> 00:12:17,550 It's a very long chain of these amino acids, and there 243 00:12:17,549 --> 00:12:20,699 are actually 20 different amino acids. 244 00:12:20,700 --> 00:12:23,759 Twenty different amino acids are pretty much the structure 245 00:12:23,759 --> 00:12:24,990 of all of our proteins. 246 00:12:24,990 --> 00:12:26,240 Let me write that. 247 00:12:26,240 --> 00:12:30,789 248 00:12:30,789 --> 00:12:34,069 So a very obvious question is how can these things code for 249 00:12:34,070 --> 00:12:36,050 20 different amino acids? 250 00:12:36,049 --> 00:12:40,370 I can only have four different things in this little bucket 251 00:12:40,370 --> 00:12:41,409 right here. 
252 00:12:41,409 --> 00:12:44,879 And then you just have to go back to your combinatorics, or 253 00:12:44,879 --> 00:12:48,509 if you can't go back to it to watch the playlist on 254 00:12:48,509 --> 00:12:51,289 probability and combinatorics, and say, OK, there's only four 255 00:12:51,289 --> 00:12:56,089 ways that I can have for each of these bases. 256 00:12:56,090 --> 00:12:58,600 There's only four different bases that I can have here, 257 00:12:58,600 --> 00:13:02,540 either an adenine, guanine, cytosine or, depending on 258 00:13:02,539 --> 00:13:04,769 whether we're talking about DNA or RNA, 259 00:13:04,769 --> 00:13:06,980 a uracil or a thymine. 260 00:13:06,980 --> 00:13:08,810 But how can we increase the combinations? 261 00:13:08,809 --> 00:13:13,449 Well, if we include two of them, if we include two bases, 262 00:13:13,450 --> 00:13:16,350 then how many combinations can we have? 263 00:13:16,350 --> 00:13:18,259 Well, we have four possibilities here, then we'd 264 00:13:18,259 --> 00:13:22,189 have four possibilities here, so we'd have 16 possibilities. 265 00:13:22,190 --> 00:13:23,850 But that's still not enough. 266 00:13:23,850 --> 00:13:27,800 That's still not enough to code for one of 20 amino acids 267 00:13:27,799 --> 00:13:31,079 to say, hey, this is going to code for amino acid number 268 00:13:31,080 --> 00:13:33,950 five, and we'll talk more about their actual names. 269 00:13:33,950 --> 00:13:34,820 So what do we have to do? 270 00:13:34,820 --> 00:13:36,740 Well, we have to use three of them. 271 00:13:36,740 --> 00:13:40,039 So three of them, there's actually four times four times 272 00:13:40,039 --> 00:13:44,019 four possibilities here, so they could code for 64 273 00:13:44,019 --> 00:13:45,610 different things. 274 00:13:45,610 --> 00:13:49,169 They could take on 64 different combinations or 275 00:13:49,169 --> 00:13:51,669 permutations, this UAC right here.
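The counting argument above (4, then 4 times 4, then 4 times 4 times 4) can be checked by actually listing the combinations. This is a small Python sketch using the standard library's `itertools.product`. The names `BASES` and `codewords` are made up for illustration.

```python
from itertools import product

BASES = "UCAG"  # the four RNA bases


def codewords(length: int) -> list[str]:
    """Every possible sequence of `length` bases, in order."""
    return ["".join(combo) for combo in product(BASES, repeat=length)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} base(s): {len(codewords(n))} combinations")
```

One base gives 4 combinations and two give 16, which is still short of 20 amino acids. Three give 64, which is more than enough.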
276 00:13:51,669 --> 00:13:54,990 So if we have three of these bases, we can actually code 277 00:13:54,990 --> 00:13:56,399 for an amino acid. 278 00:13:56,399 --> 00:13:58,579 Actually, it's overkill, because we can actually have 279 00:13:58,580 --> 00:14:02,509 64 combinations here, and there are only 20 amino acids, 280 00:14:02,509 --> 00:14:05,409 so we can even have redundant combinations code for 281 00:14:05,409 --> 00:14:06,469 different amino acids. 282 00:14:06,470 --> 00:14:11,210 For example, we might say that, and this isn't the 283 00:14:11,210 --> 00:14:14,350 actual code, but maybe UAC, and I should look these up. 284 00:14:14,350 --> 00:14:18,019 This codes for amino acid number 1. 285 00:14:18,019 --> 00:14:24,730 And if it was AAU, then this codes for amino acid number 2. 286 00:14:24,730 --> 00:14:29,430 And if I have-- I mean, I think you get the idea. 287 00:14:29,429 --> 00:14:35,109 If I have GGG, this codes for amino acid number 10. 288 00:14:35,110 --> 00:14:38,490 And what happens is when this messenger RNA leaves the 289 00:14:38,490 --> 00:14:41,330 nucleus, it goes to the ribosomes, and at the 290 00:14:41,330 --> 00:14:44,620 ribosomes-- we're going to look at that diagram in a few 291 00:14:44,620 --> 00:14:49,470 seconds-- but at the ribosomes-- let me take my 292 00:14:49,470 --> 00:14:53,940 same mRNA molecule. 293 00:14:53,940 --> 00:14:55,570 And they're much longer than what I'm showing here. 294 00:14:55,570 --> 00:14:57,700 This is just a fraction of an mRNA molecule. 295 00:14:57,700 --> 00:15:01,080 296 00:15:01,080 --> 00:15:04,700 So I'll take my mRNA molecule, and what they do is they 297 00:15:04,700 --> 00:15:07,629 essentially act as a template for tRNA molecules. 298 00:15:07,629 --> 00:15:12,269 And tRNA molecules are these molecules that are attached to 299 00:15:12,269 --> 00:15:15,309 the-- they're almost like the trucks for the amino acids. 
300 00:15:15,309 --> 00:15:18,609 So let's say I have some amino acid right here, and then I 301 00:15:18,610 --> 00:15:22,370 have another amino acid that's right here like that, and then 302 00:15:22,370 --> 00:15:25,110 I have another amino acid that's like that. 303 00:15:25,110 --> 00:15:27,300 They'll be attached to tRNA molecules. 304 00:15:27,299 --> 00:15:33,879 So let's say that this tRNA molecule has on it-- so this 305 00:15:33,879 --> 00:15:37,379 amino acid is attached to a tRNA molecule that has the 306 00:15:37,379 --> 00:15:42,570 code on it A-- let me do it in a darker color. 307 00:15:42,570 --> 00:15:44,415 It has the code AUG. 308 00:15:44,414 --> 00:15:48,809 309 00:15:48,809 --> 00:15:56,899 This one right here has the code-- let me 310 00:15:56,899 --> 00:15:57,649 pick another one. 311 00:15:57,649 --> 00:15:58,899 Let's say it has GAC. 312 00:15:58,899 --> 00:16:03,299 313 00:16:03,299 --> 00:16:04,289 So what's going to happen? 314 00:16:04,289 --> 00:16:07,209 When you're in the ribosome, and it's a complex situation, 315 00:16:07,210 --> 00:16:09,750 but actually what's happening isn't too fancy. 316 00:16:09,750 --> 00:16:14,379 This tRNA, it wants to bond to this part of the mRNA. 317 00:16:14,379 --> 00:16:14,840 Why? 318 00:16:14,840 --> 00:16:19,670 Because adenine bonds with uracil, uracil bonds with 319 00:16:19,669 --> 00:16:22,939 adenine, and guanine bonds with cytosine, so it'll pull 320 00:16:22,940 --> 00:16:23,930 up right here. 321 00:16:23,929 --> 00:16:25,969 It'll pull up right next to this thing, and actually, I 322 00:16:25,970 --> 00:16:29,920 should probably-- well, I don't know if I can rotate it. 323 00:16:29,919 --> 00:16:32,529 But it'll just pull up right here and attach 324 00:16:32,529 --> 00:16:33,799 to this mRNA molecule. 325 00:16:33,799 --> 00:16:35,059 And this right here is tRNA. 326 00:16:35,059 --> 00:16:37,929 327 00:16:37,929 --> 00:16:39,449 This is mRNA.
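The matching described above, where the tRNA carrying AUG pulls up next to UAC on the mRNA, can be sketched by cutting the mRNA into groups of three and pairing each group base by base. This is a simplified Python illustration. The function names are hypothetical, the tRNA code is written position by position as in the drawing (not in the reversed direction biologists usually write it), and `CODON_TABLE` holds only the three entries of the standard genetic code needed for this made-up strand.

```python
# RNA pairing rule: A with U, G with C.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Three entries from the standard genetic code, just enough for this strand.
CODON_TABLE = {"UAC": "Tyr", "CUG": "Leu", "CAU": "His"}


def split_codons(mrna: str) -> list[str]:
    """Cut the mRNA into consecutive groups of three bases, dropping leftovers."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def matching_trna(codon: str) -> str:
    """The three-base code on the tRNA that pairs with this mRNA codon."""
    return "".join(RNA_PAIRS[base] for base in codon)


if __name__ == "__main__":
    mrna = "UACCUGCAU"  # the strand transcribed earlier in the video
    for codon in split_codons(mrna):
        print(codon, "pairs with tRNA", matching_trna(codon),
              "carrying", CODON_TABLE[codon])
```

Each tRNA is the truck and the amino acid is its payload: the three-base code decides where on the mRNA the truck parks.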
328 00:16:39,450 --> 00:16:40,410 And the names don't matter. 329 00:16:40,409 --> 00:16:42,509 I really just want to give you the big picture idea of how 330 00:16:42,509 --> 00:16:44,490 the proteins are actually formed. 331 00:16:44,490 --> 00:16:45,960 And this is an amino acid. 332 00:16:45,960 --> 00:16:49,600 I don't know, let's call it amino acid 1, amino acid 5, 333 00:16:49,600 --> 00:16:52,090 amino acid 20. 334 00:16:52,090 --> 00:16:54,670 This guy, he's going to pull up right here. 335 00:16:54,669 --> 00:16:57,529 The guanine is attracted to the cytosine, and if you watch 336 00:16:57,529 --> 00:16:59,740 the chemistry videos, these are actually hydrogen bonds 337 00:16:59,740 --> 00:17:01,409 that form the base pairs. 338 00:17:01,409 --> 00:17:05,379 Adenine wants to pull up to uracil, cytosine to guanine, 339 00:17:05,380 --> 00:17:06,579 and so on and so forth. 340 00:17:06,578 --> 00:17:08,430 And so once all of these guys have pulled 341 00:17:08,430 --> 00:17:09,970 up-- let me do that. 342 00:17:09,970 --> 00:17:13,029 So once you've pulled up, let's say that this is-- I 343 00:17:13,029 --> 00:17:14,740 could do it up here. 344 00:17:14,740 --> 00:17:16,720 This is my mRNA molecule. 345 00:17:16,720 --> 00:17:19,180 I'm not going to draw the specifics right there. 346 00:17:19,180 --> 00:17:24,339 My little tRNAs pull up, pull up next to it, and they each 347 00:17:24,338 --> 00:17:26,470 hold a payload, right? 348 00:17:26,470 --> 00:17:29,250 So this first one holds this payload right here of this 349 00:17:29,250 --> 00:17:30,230 amino acid. 350 00:17:30,230 --> 00:17:33,839 The second one holds this payload of this amino acid and 351 00:17:33,839 --> 00:17:36,799 so forth and so on. 352 00:17:36,799 --> 00:17:40,029 And so it might keep going, and there's another green 353 00:17:40,029 --> 00:17:40,869 amino acid here.
354 00:17:40,869 --> 00:17:43,229 They really don't have those colors, but I'm just-- just 355 00:17:43,230 --> 00:17:45,299 for the sake of simplicity like that. 356 00:17:45,299 --> 00:17:47,879 And then the amino acids bond to each other when they're 357 00:17:47,880 --> 00:17:49,600 held like that close to each other. 358 00:17:49,599 --> 00:17:51,359 This doesn't happen all by itself. 359 00:17:51,359 --> 00:17:53,859 The ribosome serves a purpose, and there are enzymes that 360 00:17:53,859 --> 00:17:58,209 facilitate this process, but once these guys bond together, 361 00:17:58,210 --> 00:18:01,160 the tRNA detaches, and you have this 362 00:18:01,160 --> 00:18:04,330 chain of amino acids. 363 00:18:04,329 --> 00:18:08,589 And then the chain of amino acids starts to bend around so 364 00:18:08,589 --> 00:18:10,909 they have all of these-- and it's actually a fascinating-- 365 00:18:10,910 --> 00:18:14,920 I mean, people spend their lives studying how proteins 366 00:18:14,920 --> 00:18:17,680 fold, and that's actually where they get most of their 367 00:18:17,680 --> 00:18:18,769 structural properties. 368 00:18:18,769 --> 00:18:21,029 It's not just the chain of the amino acids, but what's more 369 00:18:21,029 --> 00:18:23,660 important is how these amino acids actually fold. 370 00:18:23,660 --> 00:18:26,880 So once you fold them, they form these really ultracomplex 371 00:18:26,880 --> 00:18:30,990 patterns based on what amino acid is attracted to what 372 00:18:30,990 --> 00:18:33,180 other amino acid in these very intricate 373 00:18:33,180 --> 00:18:34,820 three-dimensional shapes. 374 00:18:34,819 --> 00:18:38,250 And what I took here from Wikipedia is these are some 375 00:18:38,250 --> 00:18:38,920 amino acids. 376 00:18:38,920 --> 00:18:44,150 And just to be able to relate this to the DNA, this right 377 00:18:44,150 --> 00:18:45,220 here is insulin. 378 00:18:45,220 --> 00:18:49,850 It's key in our ability to process glucose in our body. 
379 00:18:49,849 --> 00:18:51,299 So this right here is insulin. 380 00:18:51,299 --> 00:18:52,430 It's a hormone. 381 00:18:52,430 --> 00:18:54,970 So sometimes you hear people talk about your immune system. 382 00:18:54,970 --> 00:18:57,110 Sometimes you hear people talking about your endocrine 383 00:18:57,109 --> 00:19:00,199 system and hormones, sometimes your digestive system. 384 00:19:00,200 --> 00:19:05,700 This is hemoglobin, what essentially transports our 385 00:19:05,700 --> 00:19:07,090 oxygen in our blood. 386 00:19:07,089 --> 00:19:10,019 But all of these things are proteins, and all these 387 00:19:10,019 --> 00:19:12,509 little, little folds you see, these are all little amino-- I 388 00:19:12,509 --> 00:19:16,519 mean, they're just little dots of amino acids. 389 00:19:16,519 --> 00:19:19,809 Some of these are multiple chains of amino acids kind of 390 00:19:19,809 --> 00:19:22,250 fitting together like a big puzzle, but some of them are 391 00:19:22,250 --> 00:19:23,950 just single chains of amino acids. 392 00:19:23,950 --> 00:19:27,309 For insulin right here, this is 50 amino acids. 393 00:19:27,309 --> 00:19:30,849 And then once the chain forms, it all bundles together and 394 00:19:30,849 --> 00:19:34,709 forms this little blob like you see, but the shape of that 395 00:19:34,710 --> 00:19:38,350 blob is super important for insulin being able to perform 396 00:19:38,349 --> 00:19:41,869 the function that it needs to perform in our systems. 397 00:19:41,869 --> 00:19:47,039 But this right here is approximately 50-- I forgot 398 00:19:47,039 --> 00:19:49,144 the exact number-- amino acids. 399 00:19:49,144 --> 00:19:53,210 400 00:19:53,210 --> 00:19:59,670 This right here, this immunoglobulin G, which is 401 00:19:59,670 --> 00:20:01,759 part of our immune system, this is 402 00:20:01,759 --> 00:20:09,700 roughly 1,500 amino acids.
403 00:20:09,700 --> 00:20:12,160 So how much DNA or how many base pairs 404 00:20:12,160 --> 00:20:12,990 had to code for this? 405 00:20:12,990 --> 00:20:15,180 Well, three times as much, right? 406 00:20:15,180 --> 00:20:19,350 Because you have to have three base pairs that code for one 407 00:20:19,349 --> 00:20:21,490 amino acid, and actually, three base pairs, this is 408 00:20:21,490 --> 00:20:27,180 called a codon, because it codes for amino acids. 409 00:20:27,180 --> 00:20:29,380 So three base pairs make a codon. 410 00:20:29,380 --> 00:20:32,170 So if you have 50 amino acids that make up insulin, that 411 00:20:32,170 --> 00:20:34,230 means you're going to have to have 50 codons, which means 412 00:20:34,230 --> 00:20:39,940 you have to have 150 bases or 150 of these 413 00:20:39,940 --> 00:20:41,769 A's and G's and T's. 414 00:20:41,769 --> 00:20:45,099 If you have 1,500 amino acids, that means you're going to 415 00:20:45,099 --> 00:20:47,419 have to have 1,500 codons, which means you're going to 416 00:20:47,420 --> 00:20:54,660 have roughly 4,500 of these base pairs that code for it. 417 00:20:54,660 --> 00:20:59,279 Now, there are some notions that get confused a lot, so I 418 00:20:59,279 --> 00:21:02,480 went to kind of the smallest level of our DNA right here, 419 00:21:02,480 --> 00:21:05,180 and this is the level at which-- well, this is RNA that 420 00:21:05,180 --> 00:21:08,299 I'm pointing to right there, but this is the smallest level 421 00:21:08,299 --> 00:21:10,579 of DNA, and that's the level at which the information is 422 00:21:10,579 --> 00:21:11,319 actually coded. 423 00:21:11,319 --> 00:21:14,490 But how does that relate to things like genes and 424 00:21:14,490 --> 00:21:16,700 chromosomes and things that you might talk 425 00:21:16,700 --> 00:21:18,182 about in other contexts? 
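The arithmetic above is just "three bases per amino acid," and it is easy to redo for any protein length. This is a minimal Python sketch. The names `BASES_PER_CODON` and `coding_bases` are hypothetical, and the count is the same rough estimate used in the video: it covers only the bases that spell out amino acids and leaves out anything else a real gene contains.

```python
BASES_PER_CODON = 3  # one codon codes for one amino acid


def coding_bases(amino_acids: int) -> int:
    """Rough number of bases needed to spell out a chain of amino acids."""
    return amino_acids * BASES_PER_CODON


if __name__ == "__main__":
    # Approximate chain lengths quoted in the video.
    for name, length in (("insulin", 50), ("immunoglobulin G", 1500)):
        print(f"{name}: {length} amino acids -> about {coding_bases(length)} bases")
```

For roughly 50 amino acids that gives 150 bases, and for roughly 1,500 it gives 4,500, matching the numbers worked out above.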
426 00:21:18,182 --> 00:21:20,859 427 00:21:20,859 --> 00:21:25,879 So let's say the 150 base pairs that coded for insulin, 428 00:21:25,880 --> 00:21:27,515 these make up a gene. 429 00:21:27,515 --> 00:21:30,600 430 00:21:30,599 --> 00:21:36,449 And these 4,500 base pairs make up another gene. 431 00:21:36,450 --> 00:21:40,069 Now, all of the genes don't make proteins, but all of the 432 00:21:40,069 --> 00:21:41,720 proteins are made by genes. 433 00:21:41,720 --> 00:21:45,940 So let's say I have just a bunch of-- I'll just make 434 00:21:45,940 --> 00:21:51,769 another A, G, and it goes down, down, down, and you have 435 00:21:51,769 --> 00:21:54,920 a T and then a C and a C, and let's say I 436 00:21:54,920 --> 00:21:57,840 have 4,500 of these. 437 00:21:57,839 --> 00:22:01,480 These could code for a protein. 438 00:22:01,480 --> 00:22:03,319 These could code for protein, or they could have all of 439 00:22:03,319 --> 00:22:07,480 these other kind of regulatory functions telling what other 440 00:22:07,480 --> 00:22:10,400 parts of the DNA should and should not be coded and how 441 00:22:10,400 --> 00:22:13,870 the DNA behaves, so it becomes super, super complex. 442 00:22:13,869 --> 00:22:18,049 But this kind of section of our DNA, this is what we refer 443 00:22:18,049 --> 00:22:24,399 to as a gene, and a gene can have anywhere from a couple of 444 00:22:24,400 --> 00:22:28,990 hundreds of these base pairs or these bases to several 445 00:22:28,990 --> 00:22:30,859 thousand of these base pairs. 446 00:22:30,859 --> 00:22:34,219 Now, a gene is that part of our chromosome that codes for 447 00:22:34,220 --> 00:22:37,920 a particular protein or serves a certain function. 448 00:22:37,920 --> 00:22:40,285 Now, there are different versions of genes. 449 00:22:40,285 --> 00:22:43,000 450 00:22:43,000 --> 00:22:45,579 It's a gross oversimplification, but let me 451 00:22:45,579 --> 00:22:49,814 say this is the gene for insulin. 
452 00:22:49,815 --> 00:22:53,039 453 00:22:53,039 --> 00:22:56,649 Now, there might be slight variations in how insulin can 454 00:22:56,650 --> 00:22:59,880 be coded for, and I'm kind of going out of my domain right 455 00:22:59,880 --> 00:23:00,870 here, because I don't know if that's true. 456 00:23:00,869 --> 00:23:02,579 And maybe I shouldn't just speak specifically about 457 00:23:02,579 --> 00:23:04,859 insulin, but it's coding for some protein, but there's 458 00:23:04,859 --> 00:23:06,990 maybe multiple different ways that that 459 00:23:06,990 --> 00:23:08,470 protein can be coded. 460 00:23:08,470 --> 00:23:11,539 Maybe instead of a T here, sometimes there's a C there. 461 00:23:11,539 --> 00:23:13,000 It still codes for the same protein. 462 00:23:13,000 --> 00:23:16,329 It doesn't change it quite enough, but that protein acts 463 00:23:16,329 --> 00:23:18,730 just a little bit different. 464 00:23:18,730 --> 00:23:20,420 It's a slight variant. 465 00:23:20,420 --> 00:23:21,750 I'll use that word. 466 00:23:21,750 --> 00:23:24,849 Now, each variant of this gene is called an allele. 467 00:23:24,849 --> 00:23:27,539 468 00:23:27,539 --> 00:23:34,845 It's a specific variant of your gene. 469 00:23:34,845 --> 00:23:37,460 470 00:23:37,460 --> 00:23:45,230 Now, if you take this DNA chain, and this chain over 471 00:23:45,230 --> 00:23:45,839 here-- let's see. 472 00:23:45,839 --> 00:23:47,909 This is one base pair. 473 00:23:47,910 --> 00:23:50,000 This might be like one base. 474 00:23:50,000 --> 00:23:51,200 This is another base. 475 00:23:51,200 --> 00:23:54,430 Maybe this is an adenine and then this would be a thymine 476 00:23:54,430 --> 00:23:55,900 over here in green. 477 00:23:55,900 --> 00:23:59,220 This is an adenine and this would be a thymine. 478 00:23:59,220 --> 00:24:02,569 If right here this is a guanine, then right here would 479 00:24:02,569 --> 00:24:03,710 be a cytosine. 
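The pairing just described (adenine across from thymine, guanine across from cytosine) can be written as a small lookup. This is a rough sketch, with PAIR and paired_strand as made-up names, and it assumes we only care which base sits opposite which, not the direction in which each strand is read.

```python
# Pairing rules as stated in the lecture: adenine with thymine,
# guanine with cytosine. Strand direction is deliberately ignored.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def paired_strand(strand: str) -> str:
    """Return the base that sits opposite each base, position by position."""
    return "".join(PAIR[base] for base in strand)


# The three pairs pointed at in the drawing: adenine, adenine, guanine.
print(paired_strand("AAG"))  # TTC
```

Because the pairing is fixed, knowing one strand is enough to write down the other.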
480 00:24:03,710 --> 00:24:05,610 This would be just a very small section. 481 00:24:05,609 --> 00:24:10,799 If I were to like zoom out, and let's say we have a big 482 00:24:10,799 --> 00:24:14,180 chain of DNA where each of these little dots are a base 483 00:24:14,180 --> 00:24:19,509 pair that I'm drawing here, maybe this section 484 00:24:19,509 --> 00:24:23,309 codes for gene 1. 485 00:24:23,309 --> 00:24:25,629 And then there's some noise or things that we haven't fully 486 00:24:25,630 --> 00:24:26,120 understood yet. 487 00:24:26,119 --> 00:24:26,939 Now, I want to be clear. 488 00:24:26,940 --> 00:24:29,789 Just with a simple discussion of DNA, we're already kind of 489 00:24:29,789 --> 00:24:32,109 approaching the frontiers of what we know and what we don't 490 00:24:32,109 --> 00:24:35,889 know, because DNA is hugely complex, and there's all of 491 00:24:35,890 --> 00:24:38,300 these feedback structures, and certain genes tell you to code 492 00:24:38,299 --> 00:24:40,599 for other genes and not to code for other genes and to 493 00:24:40,599 --> 00:24:43,250 code under certain circumstances, hugely complex. 494 00:24:43,250 --> 00:24:45,049 So there's huge sections of DNA that we still don't 495 00:24:45,049 --> 00:24:46,990 understand what exactly they do. 496 00:24:46,990 --> 00:24:48,930 But then maybe they'll have another section here that 497 00:24:48,930 --> 00:24:50,470 codes for gene 2. 498 00:24:50,470 --> 00:24:52,569 Maybe gene 2 is a little bit longer. 499 00:24:52,569 --> 00:24:54,509 Maybe it's 1,000 base pairs. 500 00:24:54,509 --> 00:25:00,379 But when you take all of these and you turn it into a-- it 501 00:25:00,380 --> 00:25:04,990 kind of winds in on itself like this. 502 00:25:04,990 --> 00:25:05,609 Let me do it. 503 00:25:05,609 --> 00:25:08,979 So it'll wind up, winding in on itself like this and do all 504 00:25:08,980 --> 00:25:10,440 sorts of crazy things. 
505 00:25:10,440 --> 00:25:14,880 Remember, it completely bundles itself up, and then it 506 00:25:14,880 --> 00:25:16,300 looks something like that. 507 00:25:16,299 --> 00:25:17,799 Then you get a chromosome. 508 00:25:17,799 --> 00:25:20,549 509 00:25:20,549 --> 00:25:24,619 And just to get an idea of how large a chromosome is compared 510 00:25:24,619 --> 00:25:29,089 to the actual base pairs, chromosome number one in the 511 00:25:29,089 --> 00:25:31,599 human genome-- so we have 23 pairs. 512 00:25:31,599 --> 00:25:34,139 If you look at it inside of a nucleus-- so let's say that's 513 00:25:34,140 --> 00:25:34,620 the nucleus. 514 00:25:34,619 --> 00:25:35,719 Let's say this is the cell. 515 00:25:35,720 --> 00:25:37,960 The cell is much bigger than what I'm showing. 516 00:25:37,960 --> 00:25:39,924 But we have 23 pairs of chromosomes. 517 00:25:39,924 --> 00:25:44,019 518 00:25:44,019 --> 00:25:45,619 I won't do all of them. 519 00:25:45,619 --> 00:25:47,279 You can actually see chromosomes in a 520 00:25:47,279 --> 00:25:50,289 not-too-expensive microscope, so we're already getting to a 521 00:25:50,289 --> 00:25:51,579 scale that we can start to look at. 522 00:25:51,579 --> 00:25:55,199 But the largest chromosome, which is chromosome number one 523 00:25:55,200 --> 00:25:58,370 in the human genome, just to give an idea of how much 524 00:25:58,369 --> 00:26:02,759 information it's packing, that thing right there has 220 525 00:26:02,759 --> 00:26:06,640 million base pairs. 526 00:26:06,640 --> 00:26:08,960 Sometimes people talk about chromosomes and genetics and 527 00:26:08,960 --> 00:26:11,069 genes and base pairs interchangeably, but it's very 528 00:26:11,069 --> 00:26:12,730 important to kind of get an idea of scale. 529 00:26:12,730 --> 00:26:15,910 These chromosomes are a super-long strand of DNA 530 00:26:15,910 --> 00:26:18,850 that's all configured and bundled up, and it contains 531 00:26:18,849 --> 00:26:21,059 220 million base pairs. 
532 00:26:21,059 --> 00:26:23,919 So the actual elements that are coding for the information 533 00:26:23,920 --> 00:26:27,529 are unbelievably small relative to 534 00:26:27,529 --> 00:26:29,149 the chromosome itself. 535 00:26:29,150 --> 00:26:31,259 But now that we understand a little bit, and actually I 536 00:26:31,259 --> 00:26:34,490 want to take a look back at this, because this kind of 537 00:26:34,490 --> 00:26:36,960 blows my mind, that if you just take those little 538 00:26:36,960 --> 00:26:40,370 combinations of those amino acids, you can form these very 539 00:26:40,369 --> 00:26:42,889 intricate, very advanced structures that we're still 540 00:26:42,890 --> 00:26:46,500 fully understanding how they actually interact with each 541 00:26:46,500 --> 00:26:51,154 other and regulate how all of our biological processes work. 542 00:26:51,154 --> 00:26:54,849 And what's even more amazing is that this scheme that I've 543 00:26:54,849 --> 00:26:58,339 talked about in this video about DNA to mRNA to tRNA to 544 00:26:58,339 --> 00:27:02,579 these molecules, this is true for all of life on our planet, 545 00:27:02,579 --> 00:27:05,579 so we all share this same mechanism. 546 00:27:05,579 --> 00:27:09,929 Me and this plant, we share that common root 547 00:27:09,930 --> 00:27:11,340 that we all have DNA. 548 00:27:11,339 --> 00:27:14,579 As different as me and that roach that I might not like to 549 00:27:14,579 --> 00:27:18,240 be in the same room, we all share that same common root of 550 00:27:18,240 --> 00:27:21,880 DNA and that all of it codes to proteins in this exact same 551 00:27:21,880 --> 00:27:24,440 way, that there's this commonality amongst all life. 552 00:27:24,440 --> 00:27:25,700 That, to me, is mind blowing. 553 00:27:25,700 --> 00:27:28,970 Then even more mind blowing is how these very complex shapes 554 00:27:28,970 --> 00:27:31,360 are formed by the DNA. 555 00:27:31,359 --> 00:27:33,859 And this isn't speculation. 
556 00:27:33,859 --> 00:27:36,709 This is observed behavior. 557 00:27:36,710 --> 00:27:38,799 This is a fascinating structure right here, but it's 558 00:27:38,799 --> 00:27:43,059 just based on 20 amino acid-- you can almost view the amino 559 00:27:43,059 --> 00:27:46,500 acid as the LEGOS, and you put the LEGOS together, and just 560 00:27:46,500 --> 00:27:51,890 the chemical interactions form these fairly impressive 561 00:27:51,890 --> 00:27:53,090 structures right here. 562 00:27:53,089 --> 00:27:55,459 So now that we know a little bit about DNA and how it codes 563 00:27:55,460 --> 00:27:58,400 into protein, we can take a little jump back and talk a 564 00:27:58,400 --> 00:28:01,480 little bit more about how variation is actually 565 00:28:01,480 --> 00:28:04,210 introduced into a population. 566 00:28:04,210 --> 00:28:04,866