An Artificial Intelligence Approach for Modeling and Prediction of Water Diffusion Inside a Carbon Nanotube

Modeling of water flow in carbon nanotubes is still a challenge for the classic models of fluid dynamics. In this investigation, an adaptive-network-based fuzzy inference system (ANFIS) is presented to solve this problem. The proposed ANFIS approach can construct an input–output mapping based on both human knowledge in the form of fuzzy if-then rules and stipulated input–output data pairs. The good performance of the designed ANFIS indicates that it is a promising tool for modeling and prediction of fluid flow at the nanoscale, where the continuum models of fluid dynamics tend to break down.


Introduction
Carbon nanotubes (CNTs) have drawn much attention, not only for their exceptional mechanical and electrical properties, but also for their application in the new emerging area of nanofluidics since they can transport fluids at an extraordinarily fast flow rate. This property has diverse applications, such as in charge storage devices [1], membrane industry [2], drug-delivery devices [3], and understanding the transport processes in biological channels [4].
In the past few years, a significant number of works have been devoted to the study of fluid flow through CNTs [5][6][7][8]. Fast pressure-driven flow of fluids in membranes of CNTs 7 and 1.6 nm in diameter has been measured by Majumder et al. [5] and Holt et al. [6], respectively. They reported flow rates 2 to 5 orders of magnitude larger than those calculated by the continuum-based no-slip Hagen-Poiseuille equation. Recently, Thomas et al. [7] have re-evaluated water transport through CNTs having diameters ranging from 1.66 to 4.99 nm. They found that the flow rates exceeded those predicted by the no-slip Hagen-Poiseuille relation. Interestingly, new experimental results for the flow of water, ethanol, and decane through carbon nanopipes with relatively large inner diameters (i.e., 43 ± 3 nm) have demonstrated that transport is enhanced up to 45 times over theoretical predictions [8]. Extraordinarily fast flow of fluids in nanopores other than CNTs has also been observed [9,10]. As can be seen, the classic models of fluid dynamics start to break down as the working length scale decreases. As a result, new approaches for modeling of fluid flow at nanoscale dimensions are needed. The present work is an attempt to introduce an alternative methodology, namely, the fuzzy logic approach, to explain the behavior of fluids at the nanoscale. As a case study, we applied this method to the modeling and prediction of water diffusion inside a CNT (6,6).
Modeling of phenomena based on conventional mathematical tools (e.g., differential equations) is not appropriate for dealing with ill-defined and uncertain problems. By contrast, a fuzzy logic approach employing fuzzy if-then rules can model the qualitative aspects of human knowledge and reasoning processes without employing precise quantitative analyses. Fuzzy modeling, or fuzzy identification, was first explored systematically by Takagi et al. [11]. The aim of this paper is to suggest an architecture called the adaptive-network-based fuzzy inference system (ANFIS) for modeling and prediction of fluid flow at nanoscale dimensions, since it has been suggested to be a universal approximator of any continuous function [12]. Furthermore, it has been shown that the results obtained by the ANFIS approach in the estimation of non-linear functions outperform those of auto-regressive models and other connectionist approaches, such as neural networks [12]. ANFIS can serve as a basis for constructing a set of fuzzy if-then rules with appropriate membership functions to generate the stipulated input-output pairs. This architecture was proposed by Jang in 1991 [13,14]. More information regarding the architecture and the performance of ANFIS can be found in the literature [12]. In what follows, an MD simulation of water diffusion through a CNT (6,6) is first described. An ANFIS technique is then employed for modeling and prediction of this phenomenon. Finally, some benefits of the designed ANFIS are detailed.

Model and MD Simulation
To show the diffusion of water molecules in CNTs, a CNT (6,6) (13.4 Å long and 8.1 Å in diameter) was solvated in a cubic box (box length L = 32.06 Å) of 1,034 TIP3P water molecules [15]. The MD simulation was performed using Discover, a molecular modeling software package implemented in Materials Studio 4.2 [16]. In this investigation, the force field used to model the interatomic interactions was the consistent-valence force field (CVFF). The MD simulation was performed in the NVT statistical ensemble (i.e., a constant number of particles, volume, and temperature). The temperature was kept constant at 300 K using a Nosé-Hoover thermostat [17,18]. The cell multipole method [19] was employed for the non-bond summation. A time step of 2 fs was used and structures were sampled every 1 ps. The overall simulation time was set to 50 ns.
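As a quick consistency check on the run length, the time step, sampling interval, and total time quoted above fix the number of integration steps and stored structures. The short sketch below only does that arithmetic; the variable names are illustrative and are not part of the simulation software.

```python
# Work in integer femtoseconds to avoid floating-point rounding.
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000

time_step_fs = 2                       # time step quoted in the text
sampling_interval_fs = 1 * FS_PER_PS   # one structure stored every 1 ps
total_time_fs = 50 * FS_PER_NS         # overall simulation time of 50 ns

n_steps = total_time_fs // time_step_fs
n_frames = total_time_fs // sampling_interval_fs
steps_per_frame = sampling_interval_fs // time_step_fs

print(n_steps, n_frames, steps_per_frame)
```

Under these settings the run corresponds to 25 million integration steps and 50,000 sampled structures.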
Initially, the CNT was in the center of the bath of water molecules. However, the nanotube was free and could be displaced. During the simulation time (i.e., 50 ns), it was observed that water molecules penetrated into the CNT and passed through it in such a way that the CNT remained occupied by an average of about five water molecules during the whole period of 50 ns. During the simulation time, an average of about 17 water molecules per nanosecond entered the nanotube and left the other side. This yields an average volumetric flow rate of about $50.4 \times 10^{-14}$ cm$^3$ s$^{-1}$, which is comparable to the reported water diffusion rate through a channel of the transmembrane protein aquaporin-1 [20]. Our MD simulation therefore showed good agreement with experimental results.
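The conversion from a molecular flux of about 17 molecules per nanosecond to a volumetric flow rate can be sketched as follows. The text does not state which volume per molecule was used, so the bulk-water density of 1 g cm^-3 below is an assumption made only for this estimate; it reproduces the quoted value to within about one percent.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
MOLAR_MASS_WATER = 18.015      # g/mol
ASSUMED_DENSITY = 1.0          # g/cm^3, bulk water (assumption, not from the text)

molecules_per_ns = 17                       # average flux from the MD run
molecules_per_s = molecules_per_ns * 1e9    # 1 s = 1e9 ns

# Volume occupied by one molecule in the bulk liquid.
volume_per_molecule_cm3 = MOLAR_MASS_WATER / (ASSUMED_DENSITY * AVOGADRO)

flow_cm3_per_s = molecules_per_s * volume_per_molecule_cm3
print(f"{flow_cm3_per_s:.3e} cm^3/s")
```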

Results and Discussion
Now, let us define the flow rate of water molecules as the number of water molecules entering the CNT on one side and leaving the other side per nanosecond. Using the simulation described earlier, the flow rate of water molecules as a function of time was recorded. This correlation is shown in Fig. 1. In the following section, the applicability of the ANFIS approach for modeling and prediction of the flow rate of water molecules as a function of time, shown in Fig. 1, is put to the test. In other words, we attempted to find the unknown function $Y = F(X)$ with the aid of the ANFIS approach, where $Y$ is the flow rate of water molecules and $X$ stands for the corresponding time. The Fuzzy Logic Toolbox embedded in MATLAB 7.0 [21] was used for modeling and prediction of the flow rate of water molecules as a function of time. The input layer of the ANFIS consists of the time, while the output layer of the ANFIS corresponds to the flow rate. The known values of the function up to the point $X$ are used to predict the value at some future point $X + P$. The standard method for this type of prediction is to create a mapping from $D$ points of the function spaced $\Delta$ apart, that is, $(Y(X - (D - 1)\Delta), \ldots, Y(X - \Delta), Y(X))$, to a predicted future value $Y(X + P)$. To this end, the recorded data set was divided into 6 parts. The first 5 parts (i.e., the training data set) were used for training the ANFIS, while the remaining part (i.e., the checking data set) was used for validating the identified model. Here, the value of $\Delta$ was selected to be 1, and the ANFIS had 4 inputs (i.e., $D = 4$). In order to achieve an effective ANFIS, all data sets need to be preprocessed using an appropriate transformation method.
Fig. 1 Plot of the flow rate of water molecules through a CNT (6,6) as a function of time resulting from the molecular dynamics (MD) simulation
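The windowing scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB workflow: the function names are hypothetical, the horizon P = 1 is an assumption (the text does not state P), the split is assumed to be chronological, and the series is a placeholder rather than the MD data.

```python
def make_windows(series, n_inputs=4, spacing=1, horizon=1):
    """Build (inputs, target) pairs from a time series.

    Inputs are n_inputs past values spaced `spacing` apart and ending at
    index x; the target is the value at index x + horizon.
    """
    pairs = []
    first = (n_inputs - 1) * spacing
    for x in range(first, len(series) - horizon):
        inputs = [series[x - k * spacing] for k in range(n_inputs - 1, -1, -1)]
        pairs.append((inputs, series[x + horizon]))
    return pairs


def split_train_check(pairs, n_parts=6, n_train_parts=5):
    """Use the first 5 of 6 parts for training and the rest for checking."""
    cut = len(pairs) * n_train_parts // n_parts
    return pairs[:cut], pairs[cut:]


# Placeholder series standing in for the recorded flow rate (not real data).
flow = [float(v) for v in range(30)]
pairs = make_windows(flow)
train, check = split_train_check(pairs)
print(len(pairs), len(train), len(check))
```

Each input vector here has the form [Y(X-3), Y(X-2), Y(X-1), Y(X)], which matches the four inputs of the designed ANFIS.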
It has been reported that ANFIS systems trained on transformed data sets generally achieve better performance and faster convergence. There are many transformation procedures that can be applied to a data set [22]. In this study, all data sets (i.e., the training and checking data sets) were individually transformed with the log function, which has the following form [23]:
$$z_{trn} = a \log_{10}(y_{trn} + b), \qquad z_{chk} = a \log_{10}(y_{chk} + b)$$
where $z_{trn}$ and $z_{chk}$ are the transformed values of the training and checking data sets, $y_{trn}$ and $y_{chk}$ are the corresponding untransformed values, $a$ is an arbitrary constant (here $a = 4$), and $b$ is set to 1 to avoid the entry of zero in the log functions. The number of membership functions assigned to each input of the ANFIS was set to 2; therefore, the number of rules is $2^4 = 16$. The ANFIS used here contains a total of 104 fitting parameters, of which 24 are premise parameters and 80 are consequent parameters. Note that 2 membership functions per input is the largest number that can be used here without risking overtraining. The term "overtraining" indicates that a given ANFIS adapts itself too well to the training data set, in such a way that further improvement on the training data set not only fails to yield more accurate predictions for the checking data set but may also have adverse effects on those predictions. Note that in the case of overtraining, the total number of fitting parameters in the ANFIS is usually larger than the number of pairs in the training data set. The root mean squared error (RMSE) was used to assess the accuracy of the output predicted by the ANFIS in comparison with the actual output. This statistical parameter measures the deviation between the target values (i.e., the flow rate of water molecules resulting from the MD simulation) and the corresponding values predicted by the ANFIS. Note that for a perfect fit, the RMSE is 0.
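The parameter bookkeeping and the error measure above can be checked with a small sketch. The function names are illustrative; the count assumes three parameters (a, b, c) per bell membership function and a first-order (linear plus constant) consequent per rule, and the transform is written in an assumed form z = a log10(y + b), which should be read as a guess at the lost equation rather than a confirmed definition.

```python
import math


def anfis_parameter_counts(n_inputs, n_mfs, params_per_mf=3):
    """Return (rules, premise parameters, consequent parameters)."""
    n_rules = n_mfs ** n_inputs                  # one rule per MF combination
    premise = n_inputs * n_mfs * params_per_mf   # (a, b, c) for every MF
    consequent = n_rules * (n_inputs + 1)        # linear terms plus a constant
    return n_rules, premise, consequent


def log_transform(values, a=4.0, b=1.0):
    """Assumed form of the log transform: z = a * log10(y + b)."""
    return [a * math.log10(y + b) for y in values]


def rmse(targets, predictions):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


rules, premise, consequent = anfis_parameter_counts(n_inputs=4, n_mfs=2)
print(rules, premise, consequent, premise + consequent)
```

With 4 inputs and 2 membership functions per input this gives 16 rules and 24 + 80 = 104 fitting parameters, in agreement with the text.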
After 200 epochs, we had $RMSE_{trn} \approx 0.4$ ns$^{-1}$ and $RMSE_{chk} \approx 0.5$ ns$^{-1}$ (the designed ANFIS yielded $RMSE_{trn} = 0.4289$ ns$^{-1}$ and $RMSE_{chk} = 0.4840$ ns$^{-1}$; since the additional digits have no real physical meaning, the values are reported with only one significant digit). Here, the term "epoch" means the presentation of each data set to the ANFIS and the receipt of output values. After 200 epochs, we obtained 200 RMSE values for both the training and checking data sets. In order to check whether the difference between the two sets of RMSE values is significant, we used the t-test, which assesses whether the means of two data sets are statistically different from each other. The result showed that, at a confidence level above 95%, the difference between the two RMSE values is not significant. During repeated epochs, it was observed that the RMSE declines monotonically for both data sets (i.e., the training and checking data sets). Eventually, it reaches a value less than 0.5 after 200 epochs. The correlated function, namely, $Y = F(X)$, obtained using the ANFIS approach is also shown in Fig. 1. As can be seen, the designed ANFIS performs very well in depicting the behavior of water inside the CNT. In addition, since both RMSEs are very small, we conclude that the proposed ANFIS has captured the essential components of the underlying dynamics. In other words, the designed ANFIS can successfully model and predict the flow rate of water molecules through the CNT as a function of time, as derived from the MD simulation. The resulting 16 fuzzy if-then rules are listed below.
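The comparison of the two RMSE series can be sketched with a two-sample t statistic. The text does not say which variant of the t-test was used, so Welch's unequal-variance form below is an assumption, and the two curves are synthetic placeholders rather than the paper's RMSE values.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    se2 = var_a / n_a + var_b / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    dof = se2 ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, dof


# Synthetic, illustrative RMSE-versus-epoch curves (NOT the paper's data).
rmse_trn = [0.43 + 0.5 * math.exp(-k / 20) for k in range(200)]
rmse_chk = [0.44 + 0.5 * math.exp(-k / 20) for k in range(200)]

t_stat, dof = welch_t(rmse_trn, rmse_chk)
# With roughly 400 degrees of freedom, |t| below about 1.97 means the
# difference in means is not significant at the 95% level (two-sided).
significant = abs(t_stat) > 1.97
print(t_stat, dof, significant)
```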
If input1 is MF1 and input2 is MF1 and input3 is MF1 and input4 is MF1, then output = $c_1 \cdot \tilde{Y}$
If input1 is MF1 and input2 is MF1 and input3 is MF1 and input4 is MF2, then output = $c_2 \cdot \tilde{Y}$
If input1 is MF1 and input2 is MF1 and input3 is MF2 and input4 is MF1, then output = $c_3 \cdot \tilde{Y}$
If input1 is MF1 and input2 is MF1 and input3 is MF2 and input4 is MF2, then output = $c_4 \cdot \tilde{Y}$
If input1 is MF1 and input2 is MF2 and input3 is MF1 and input4 is MF1, then output = $c_5 \cdot \tilde{Y}$
If input1 is MF1 and input2 is MF2 and input3 is MF1 and input4 is MF2, then output = $c_6 \cdot \tilde{Y}$
If input1 is MF1 and input2 is MF2 and input3 is MF2 and input4 is MF1, then output = $c_7 \cdot \tilde{Y}$
If input1 is MF1 and input2 is MF2 and input3 is MF2 and input4 is MF2, then output = $c_8 \cdot \tilde{Y}$
If input1 is MF2 and input2 is MF1 and input3 is MF1 and input4 is MF1, then output = $c_9 \cdot \tilde{Y}$
If input1 is MF2 and input2 is MF1 and input3 is MF1 and input4 is MF2, then output = $c_{10} \cdot \tilde{Y}$
If input1 is MF2 and input2 is MF1 and input3 is MF2 and input4 is MF1, then output = $c_{11} \cdot \tilde{Y}$
If input1 is MF2 and input2 is MF1 and input3 is MF2 and input4 is MF2, then output = $c_{12} \cdot \tilde{Y}$
If input1 is MF2 and input2 is MF2 and input3 is MF1 and input4 is MF1, then output = $c_{13} \cdot \tilde{Y}$
If input1 is MF2 and input2 is MF2 and input3 is MF1 and input4 is MF2, then output = $c_{14} \cdot \tilde{Y}$
If input1 is MF2 and input2 is MF2 and input3 is MF2 and input4 is MF1, then output = $c_{15} \cdot \tilde{Y}$
If input1 is MF2 and input2 is MF2 and input3 is MF2 and input4 is MF2, then output = $c_{16} \cdot \tilde{Y}$
where $\tilde{Y} = [Y(X - 3), Y(X - 2), Y(X - 1), Y(X), 1]$ and $c_i$ is the $i$th row of the consequent parameter matrix $C$. The linguistic labels MF1$_i$ and MF2$_i$ ($i$ = 1 to 4) are defined by the bell membership function (with different parameters $a$, $b$, and $c$):
$$\mu(x) = \frac{1}{1 + \left| \dfrac{x - c}{a} \right|^{2b}}$$
Table 1 also lists the linguistic labels and the corresponding consequent parameters in Eq. 1.
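To make the rule base concrete, the sketch below evaluates a first-order Sugeno system with 4 inputs and 2 bell membership functions per input. It assumes the usual ANFIS conventions (product for "and", firing-strength-weighted average of the rule outputs) and the generalized bell form 1 / (1 + |(x - c)/a|^(2b)); all parameter values are hypothetical placeholders, not the fitted values of Table 1 or the matrix C.

```python
from itertools import product


def bell(x, a, b, c):
    """Generalized bell membership function (assumed form)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_output(inputs, premise, consequent):
    """Evaluate the rule base for one input vector.

    premise[i][j] holds (a, b, c) for membership function j of input i;
    consequent[r] holds the coefficients c_r applied to [inputs..., 1].
    """
    y_tilde = list(inputs) + [1.0]
    weighted_sum = 0.0
    weight_total = 0.0
    # product() enumerates (MF1,MF1,MF1,MF1), (MF1,MF1,MF1,MF2), ...,
    # i.e. the same order as the rule list, with input4 varying fastest.
    for r, labels in enumerate(product(range(2), repeat=len(inputs))):
        strength = 1.0
        for i, j in enumerate(labels):
            strength *= bell(inputs[i], *premise[i][j])
        rule_output = sum(c * y for c, y in zip(consequent[r], y_tilde))
        weighted_sum += strength * rule_output
        weight_total += strength
    return weighted_sum / weight_total


# Hypothetical parameters: two bells per input, and consequents that simply
# return the most recent value Y(X) for every rule (a persistence model).
premise = [[(10.0, 2.0, 0.0), (10.0, 2.0, 20.0)] for _ in range(4)]
consequent = [[0.0, 0.0, 0.0, 1.0, 0.0] for _ in range(16)]

print(anfis_output([15.0, 16.0, 17.0, 18.0], premise, consequent))
```

Because every placeholder rule returns Y(X), the weighted average equals the last input regardless of the firing strengths, which gives a simple sanity check on the inference code.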
Each of these parameters has a physical meaning: c determines the center of the membership function, a is the half width of the membership function, and b (together with a) controls the slopes at the crossover points (where the membership function value is 0.5). An example of the bell membership function is shown in Fig. 2.
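The roles of a, b, and c described above can be verified with a short calculation, assuming the generalized bell form commonly used with ANFIS (the symbol $\mu$ and the input variable $x$ are notation introduced here for illustration):

```latex
\mu(x) = \frac{1}{1 + \left| \dfrac{x - c}{a} \right|^{2b}}
\qquad\Longrightarrow\qquad
\mu(c) = 1, \quad
\mu(c \pm a) = \frac{1}{1 + 1^{2b}} = \frac{1}{2}

% Slope for x > c, with u = (x - c)/a:
\frac{d\mu}{dx} = -\frac{2b\,u^{2b-1}}{a\,\bigl(1 + u^{2b}\bigr)^{2}}
\qquad\Longrightarrow\qquad
\left. \frac{d\mu}{dx} \right|_{x = c \pm a} = \mp \frac{b}{2a}
```

So the function peaks at c, falls to 0.5 at a distance a on either side (hence "half width"), and its steepness at those crossover points is set by the ratio b/(2a).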
Indeed, in this case study, the MD simulation can explain the observed phenomenon (i.e., the water flow inside the CNT). However, it would be very difficult to explain and interpret this phenomenon if other parameters, such as temperature, pressure, shape, and length, were treated as variables. In that case, the MD simulation can only draw a picture of the phenomenon; it cannot tell us about the effect of each parameter on the phenomenon or about the effect of combined parameters (e.g., pressure and temperature together). On the other hand, modeling of this phenomenon using the ANFIS approach would provide invaluable information for the analysis of the process (e.g., the underlying physical relationships among influencing parameters) and therefore for the design of new applications. This methodology thus holds the potential to become a useful tool in modeling and predicting the behavior of molecular flows through CNTs (or, more generally, at nanoscale dimensions). In addition, the proposed ANFIS has the following advantages: 1) If human expertise is not available, we can still set up intuitively reasonable initial membership functions and start the learning process to generate a set of fuzzy if-then rules that approximate a desired data set. 2) An ANFIS is able to learn and therefore to generalize. Generalization refers to the production by the ANFIS of reasonable outputs for inputs not encountered during the training process. The generalization ability comes from the fact that, while the training process is occurring, the checking data set is used to assess the generalization ability of the ANFIS. As a result, a well-designed ANFIS is capable of producing reliable output(s) for unseen input(s). The ANFIS approach could therefore give reliable results for unseen cases that are very difficult to address by computer simulations or experiments.
In addition, in those cases in which computer simulations and/or experiments can be performed, using a designed ANFIS is much faster and easier: experimental work is difficult, time-consuming, and costly, and computer simulations can also take several days even with the aid of a supercomputer. By contrast, a designed ANFIS can be run on a normal personal computer. 3) An ANFIS ignores a relatively large amount of noise or variation in solving problems while it derives the principal rules of a given problem. Statistical errors in reported data, either from experiments or from computer simulations, can always be expected. Generally, experimentally measured values include statistical errors, since repeating an experiment does not yield exactly the same result. Interestingly, such errors can also be observed in computer simulations [24]. Since the results obtained using computer simulations and/or experiments bear statistical errors, those tasks must be repeated several times to ensure the accuracy of the results, which makes them costly and time-consuming. As a result, a model describing a phenomenon that is capable of removing such undesirable errors is needed. 4) A designed ANFIS is able to predict reasonable output(s) for future unseen data sets. In other words, the predictive ability of a designed ANFIS is not restricted to the training data set. This property could be an asset in the modeling of fluid flow at nanoscale dimensions, since experimental reports on the dynamics of fluids inside nanotubes are less abundant than those on the static case. The main reason is the high complexity of the experimental setups needed to perform such experiments. Therefore, the available experiments have been performed mostly on multi-wall nanotubes.
On the other hand, a vast literature exists on computer simulations of liquid flow, liquid molecular structure under confinement, and the interaction of liquids with the tube walls. Computing power requirements have, thus far, limited these findings to small single-wall nanotubes, creating a gap in terms of nanotube size between experimental and simulation results [25]. As a result, computer simulations should be extended beyond these computational limits. This can only be achieved with new algorithms that allow for the coupling of different simulation methods on different scales, both in time and space. Such extrapolations have practical applications, for example in determining the osmotic permeability of nanochannels [26].

Conclusion
In summary, by employing an ANFIS approach, we succeeded in deriving fuzzy if-then rules to describe the input-output behavior of water diffusing through a nanochannel. Some advantages of this approach over conventional methods for modeling and prediction of this phenomenon were mentioned. The application shown here is only the tip of the iceberg. We hope that this tool can provide new insights in the field of nanofluidics, where the continuum models of fluid dynamics tend to break down.