From 0aaf1bc43b61120a6f93ebc334942780e497e24d Mon Sep 17 00:00:00 2001 From: don Date: Mon, 9 Aug 2010 06:34:10 +0000 Subject: [PATCH] update kinetic formalism with program notes and add omer points git-svn-id: svn+ssh://hemlock.ucr.edu/srv/svn/misc/trunk/origins_of_life@595 25fa0111-c432-4dab-af88-9f31a2f6ac42 --- kinetic_formalism.Rnw | 177 +++++++++++++++++++++++++++++++++++++++++- omer_points.txt | 28 +++++++ 2 files changed, 202 insertions(+), 3 deletions(-) create mode 100644 omer_points.txt diff --git a/kinetic_formalism.Rnw b/kinetic_formalism.Rnw index afd5984..c3541f4 100644 --- a/kinetic_formalism.Rnw +++ b/kinetic_formalism.Rnw @@ -81,9 +81,10 @@ $\left[C_{i_\mathrm{monomer}}\right]$, the surface area of the vesicle $S_\mathrm{ves}$, the base backwards kinetic parameter for the $i$th specie $k_{bi}$ which is also dependent on lipid type, its adjustment parameter $k_{bi\mathrm{adj}}$ (see~\fref{eq:kb_adj}), and the molar -concentration of the $i$th specie in the vesicle $C_{i_\mathrm{ves}}$, -the change in concentration of the $i$th specie in the vesicle per -change in time $\frac{d C_{i_\mathrm{ves}}}{dt}$ can be calculated: +concentration of the $i$th specie in the vesicle +$\left[C_{i_\mathrm{ves}}\right]$, the change in concentration of the +$i$th specie in the vesicle per change in time $\frac{d + C_{i_\mathrm{ves}}}{dt}$ can be calculated: \begin{equation} \frac{d C_{i_\mathrm{ves}}}{dt} = k_{fi}k_{fi\mathrm{adj}}\left[C_{i_\mathrm{monomer}}\right]S_\mathrm{ves} - @@ -111,6 +112,7 @@ $\mathrm{n}$, Thus, we have The 1000 isn't in \fref{eq:state} above, because it is unit-dependent. \subsection{Forward adjustments ($k_{fi\mathrm{adj}}$)} +\label{sec:forward_adj} The forward rate constant adjustment, $k_{fi\mathrm{adj}}$ takes into account unsaturation ($un_f$), charge ($ch_f$), curvature ($cu_f$), @@ -639,6 +641,175 @@ rm(grid) @ +\section{Simulation Methodology} + +\subsection{Overall Architecture} + +The simulation is currently run by single program written in perl +using various different modules to handle the subsidiary parts. It +produces output for each generation, including the step immediately +preceeding and immediately following a vesicle split (and optionally, +each step) that is written to a state file which contains the state of +the vesicle, environment, kinetic parameters, program invocation +options, time, and various other parameters necessary to recreate the +state vector at a given time. This output file is then read by a +separate program in perl to produce different output which is +generated out-of-band; output which includes graphs and statistical +analysis is performed using R (and various grid graphics modules) +called from the perl program. + +The separation of simulation and output generation allows refining +output, and simulations performed on different versions of the +underlying code to be compared using the same analysis methods and +code. It also allows later simulations to be restarted from a specific +generation, utilizing the same environment. + +Random variables of different distributions are calculated using +Math::Random, which is seeded for each run using entropy from the +linux kernel's urandom device. + +\subsection{Environment Creation} + +\subsubsection{Components} +The environment contains different concentrations of different +components. In the current set of simulations, there are +\Sexpr{1+4*5*7} different components, consisting of PC, PE, PS, SM, +and CHOL, with all lipids except for CHOL having 5 possible +unsaturations rangiong from 0 to 4, and 7 lengths from $12,14,...,22$ +($7\cdot 5\cdot4+1=\Sexpr{1+4*5*7}$). In cases where the environment +has less than the maximum number of components, the components are +selected in order without replacement from a randomly shuffled deck of +component (with the exception of CHOL) represented once until the +desired number of unique components are obtained. CHOL is over +representated $\Sexpr{5*7}$ times to be at the level of other lipid +types, ensures that the probability of CHOL being asbent in the +environment is the same as the probability of one of the other lipid +types (PS, PE, etc.) being entirely absent. This reduces the number of +simulations with a small number of components which are entirely +devoid of CHOL. + +\subsubsection{Concentration} +Once the components of the environment have been selected, the +concentration of thoes components is determined. In experiments where +the environmental concentration is the same across all lipid +components, the concentration is set to $10^{-10}\mathrm{M}$. In other +cases, the environmental concentration is set to a random number from +a gamma distribution with shape parameter 2 and an average of +$10^{-10}\mathrm{M}$. The base concentration ($10^{-10}\mathrm{M}$) +can also be altered in the initialization of the experiment to +specific values for specific lipid types or components. + +\subsection{Initial Vesicle Creation} + +Initial vesicles are seeded by selecting lipid molecules from the +environment until the vesicle reaches a specific starting size. The +vesicle starting size has gamma distribution with shape parameter 2 +and a mean of the experimentally specified starting size, with a +minimum of 5 lipid molecules. Lipid molecules are then selected to be +added to the lipid membrane according to four different methods. In +the constant method, molecules are added in direct proportion to their +concentration in the environment. The uniform method adds molecules in +proportion to their concentration in the environment scaled by a +uniform random value, whereas the random method adds molecules in +proportion to a uniform random value. The final method is a binomial +method, which adjust the porbability of adding a molecule of a +specific component by the concentration of that component, and then +adds the molecules one by one to the membrane. This final method is +also used in the first three methods to add any missing molecules to +the starting vesicle which were unallocated due to the requirement to +add integer numbers of molecules. + +\subsection{Simulation Step} + +Once the environment has been created and the initial vesicle has been +formed, molecules join and leave the vesicle based on the kinetic +parameters and state equation discussed until the vesicle splits +forming two daughter vesicles, one of which the program continues to +follow. + +\subsubsection{Calculation of Vesicle Properties} + +\label{sec:ves_prop} +$S_\mathrm{ves}$ is the surface area of the vesicle, and is the sum of +the surface area of all of the individual lipid components; each lipid +type has a different surface area; we na\"ively assume that the lipid +packing is optimal, and there is no empty space. + +\subsubsection{Joining and Leaving of Lipid Molecules} + +Determining the number of molecules to add to the lipid membrane +($n_i$) requires knowing $k_{fi\mathrm{adj}}$, the surface area of the +vesicle $S_\mathrm{ves}$ (see~\fref{sec:ves_prop}), the time interval +$dt$ during which lipids are added, the base $k_{fi}$, and the +concentration of the monomer in the environment +$\left[C_{i\mathrm{ves}}\right]$ (see~\fref{eq:state}). +$k_{fi\mathrm{adj}}$ is calculated (see~\fref{eq:kf_adj}) based on the +vesicle properties and their hypothesized effect on the rate (in as +many cases as possible, experimentally based) +(see~\fref{sec:forward_adj} for details). $dt$ can be varied +(see~\fref{sec:step_duration}), but for a given step is constant. This +leads to the following: + +$n_i = k_{fi}k_{fi\mathrm{adj}}\left[C_{i_\mathrm{monomer}}\right]S_\mathrm{ves}dt\mathrm{NA}$ + +In the cases where $n_i > 1$, the integer number of molecules is +added. Fractional $n_i$ or the fractional remainder after the addition +of the integer molecules are the probability of adding a specific +molecule, and are compared to a uniformly distributed random value +between 0 and 1. If the random value is less than or equal to the +fractional part of $n_i$, an additional molecule is added. + +Molecules leaving the vesicle are handled in a similar manner, with + +$n_i = k_{bi}k_{bi\mathrm{adj}}C_{i_\mathrm{ves}}dt\mathrm{NA}$. + +While programatically, the molecule removal happens after the +addition, the properties that each operates on are the same, so they +can be considered to have been added and removed at the same instant. +[This also avoids cases where a removal would have resulted in +negative molecules for a particular type, which is obviously +disallowed.] + +\subsubsection{Step duration} +\label{sec:step_duration} + +$dt$ is the time taken for each step of joining, leaving, and checking +split. In the current implementation, it starts with a value of +$10^{-6}\mathrm{s}$ but this is modified in between each step if the +number of molecules joining or leaving is too large or small. If more +than half of the starting vesicle size molecules join or leave in a +single step, $dt$ is reduced by half. If less than the starting +vesicle size molecules join or leave in 100 steps, $dt$ is doubled. +This is necessary to curtail run times and to automatically adjust to +different experimental runs. + +(In every run seen so far, the initial $dt$ was too small, and was +increased before the first generation occured; at no time was $dt$ too +large.) + +\subsubsection{Vesicle splitting} + +If a vesicle has grown to a size which is more than double the +starting vesicle size, the vesicle splits. More elaborate mechanisms +for determining whether a vesicle should split are of course possible, +but not currently implemented. Molecules are assigned to the daughter +vesicles at random, with each daughter vesicle having an equal chance +of getting a single molecule. The number of molecules to assign to the +first vesicle has binomial distribution with a probability of an event +in each trial of 0.5 and a number of trials equal to the number of +molecules. + +\subsection{Output} + +The environment, initial vesicle, and the state of the vesicle +immediately before and immediately after splitting are stored to disk +to produce later output. + +\section{Analyzing output} + +\subsection{PCA plots} + +\subsection{Carpet plots} diff --git a/omer_points.txt b/omer_points.txt new file mode 100644 index 0000000..d5ed105 --- /dev/null +++ b/omer_points.txt @@ -0,0 +1,28 @@ + + +Points for Pap318 comparing R-GARD to A-GARD + +0. A main goal is to compare the two abovementioned GARD models and point +out similarities and differences. We note the constraints on such a +comparison, as the models are inherently different in some respects. But a +later part of the paper will also deal with analyses and properties of R- +GARD irrespective of such comparison. + +1. To better compare R-GARD with A- GARD, IL will generate A-GARD carpets +using Euclidean distance instead of the dot product (H), which we currently +use. We will also compare internally the difference between these two +metrics when applied to the same data. + +2. We will also calculate the compotype counts of A-GARD with kmeans +clustering by using Euclidean distance instead of dot product. We should +check how this change affects the resulting number of compotypes. If for +some reason this does not work easily, we will try a more phenomenological +comparison. + +3. Choose carpets from R-Gard results for us to compare with similar +carpets from A-GARD. These carpets should exhibit repeating compotypes, +drift, and composomes. + +4. Calculate and display statistics to emphasize the dependence of various +results on the parameters of R-GARD. Example: number of compotypes +dependency on split size. -- 2.39.2