Neuromorphic crossbar circuit with nanoscale filamentary-switching binary memristors for speech recognition

In this paper, a neuromorphic crossbar circuit with binary memristors is proposed for speech recognition. Binary memristors, which are based on the filamentary-switching mechanism, are more widely available and easier to fabricate than analog memristors, which are found in only a few materials and need a more complicated fabrication process. Thus, we develop a neuromorphic crossbar circuit using filamentary-switching binary memristors rather than interface-switching analog memristors. The proposed binary memristor crossbar can recognize five vowels with 64 input channels of 4 bits each. The proposed crossbar is tested with 2,500 speech samples and is verified to recognize 89.2% of the tested samples. In the statistical simulation, the recognition rate of the binary memristor crossbar degrades only from 89.2% to 80% as the percentage variation in memristance increases from 0% to 15%. In contrast, the recognition rate of the analog memristor crossbar drops from 96% to 9% for the same percentage variation in memristance.


Background
Memristors, which were mathematically predicted by Leon O. Chua in 1971 as the fourth basic circuit element [1], were experimentally found in 2008 [2]. Since that first prediction, they have been regarded as a potential candidate for future neuromorphic computing systems. Among the many advantages of memristors, the nonlinear charge-flux relationship is particularly important in mimicking the synaptic plasticity of biological neuronal systems such as the human brain [3][4][5][6][7].
In realizing memristor-based synaptic systems, a crossbar circuit made of only passive memristors can be regarded as the densest and simplest architecture among the various synaptic circuits developed previously. If a crossbar circuit is made of both memristors and selectors such as transistors and diodes, this hybrid-type crossbar circuit is difficult to stack layer by layer. Thus, the pure crossbar circuit with only passive memristors can be a key element in implementing the densest and simplest three-dimensional architecture of neuromorphic systems.
A conceptual diagram of a neuromorphic speech-recognition system is shown in Figure 1. In Figure 1, a voice signal first enters the cochlea, where the voice input is divided into many channels according to its frequency content. The cochlea is modeled as a group of band-pass filters: the voice input is divided and filtered by a band-pass filter array covering the frequency range from 20 Hz to 20 kHz [8,9]. Each channel in the band-pass filter array delivers a different band signal to the crossbar circuit as shown in Figure 1. Here, we assume that our goal is recognizing five vowels, 'a', 'i', 'u', 'e', and 'o', from the input of a human voice. To do so, the voice input is filtered and sampled as the cochlea does. Then, the filtered and sampled signals go into the memristor crossbar circuit as shown in Figure 1, where the voice input is compared with the previously trained patterns of the five vowels, which are already stored in the memristor crossbar array. In this way, we can decide which of the five vowels is the best match with the voice input to the crossbar array.
In realizing a memristor crossbar circuit, we can use either analog memristors [10,11] or binary memristors [12][13][14][15][16][17] as shown in Figure 2a,b. For the analog memristors in Figure 2a, the memristance value changes gradually rather than abruptly because of the interface-switching mechanism. In interface switching, the interface between the low-resistance region and the high-resistance region can be controlled precisely according to an applied voltage or current. As a result, we can store not only binary data but also analog data on interface-switching memristors with high accuracy. However, materials that show the interface-switching behavior are not common, and the accuracy in controlling the memristance value is still considered a big concern. Also, even a small amount of memristance variation can severely degrade the overall accuracy of analog-memristor-based neuromorphic systems. In contrast, most known memristors are based on the filamentary-switching mechanism. In filamentary switching, memristors have either a high resistance state (HRS) or a low resistance state (LRS) as represented in Figure 2b. Consequently, only '1' or '0' can be stored on filamentary-switching binary memristors.
In addition to the wider availability of filamentary-switching materials, binary memristors can be much more tolerant of statistical variations than analog memristors. This is because HRS can still be much higher than LRS, in spite of a large amount of statistical variation in LRS and HRS.
In this paper, we propose a binary memristor crossbar circuit for recognizing five different vowels. The block diagram and the detailed circuit schematic are shown and explained in the following section. In addition, circuit simulation and statistical simulation are performed, and the simulation results are discussed and finally summarized [18].
Figure 2 Analog memristors with interface-switching mechanism and binary memristors with filamentary-switching mechanism. (a) Analog memristor with the interface-switching mechanism [10,11], where the memristance value can be changed gradually from LRS to HRS, and (b) binary memristor with the filamentary-switching mechanism [12][13][14][15][16][17], where the memristance value can be changed very abruptly between LRS and HRS.
Figure 3 shows a block diagram of the binary memristor crossbar circuit for recognizing five vowels: 'a', 'i', 'u', 'e', and 'o'. The voice input is divided into 64 channels according to its frequency content. The magnitude of each channel is sampled and digitized to 4 bits. In this paper, the band-pass filtering, sampling, and digitization of the voice input are implemented by MATLAB simulation. The 64 channel inputs of 4 bits each that are obtained by MATLAB simulation are applied to the binary memristor crossbar array as shown in Figure 3. For recognizing five vowels, we need not only the 64 channel inputs but also their inverted values. Thus, the total number of channel inputs is 128, with 64 channels of true signals and 64 channels of inverted signals. Each channel is composed of 4-bit binary values. In Figure 3, I_{a,0} is the current of the 'x1' column in the crossbar array for recognizing 'a', and I_{a,1} is the current of the 'x2' column in the same array. Similarly, I_{a,2} and I_{a,3} are the currents of the 'x4' and 'x8' columns in the 'a' crossbar array. Here, 'x1' means that the weight of this column current is 1. Likewise, 'x2', 'x4', and 'x8' mean that the weight values are 2, 4, and 8, respectively, for the corresponding columns in the 'a' crossbar array. I_a is therefore calculated as the weighted summation I_a = 8I_{a,3} + 4I_{a,2} + 2I_{a,1} + I_{a,0}. Similarly, I_u = 8I_{u,3} + 4I_{u,2} + 2I_{u,1} + I_{u,0} for recognizing 'u', and I_o = 8I_{o,3} + 4I_{o,2} + 2I_{o,1} + I_{o,0} for recognizing 'o'. The currents I_a, I_i, I_u, I_e, and I_o are compared with each other in the winner-take-all circuit [19] to decide which vowel is the best match with the voice input as shown in Figure 3. Output_a, Output_i, Output_u, Output_e, and Output_o are the output signals of the winner-take-all circuit.
Figure 4a shows the detailed schematic of the binary memristor crossbar circuit. Here, 64 input channels are applied to the crossbar circuit. Each channel has 4-bit binary values, and each binary value is divided into true and inverted signals as shown in Figure 4a. M_{1,0}, M_{1,1}, M_{1,2}, and M_{1,3} are the memristors of the 'x1', 'x2', 'x4', and 'x8' columns, respectively, in the crossbar array of vowel 'a'. These four memristors are connected to the true signal of channel 1. Similarly, M_{2,0}, M_{2,1}, M_{2,2}, and M_{2,3} are the memristors of the 'x1', 'x2', 'x4', and 'x8' columns, respectively, which are connected to the inverted signal of channel 1.
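The weighted summation of the four column currents can be illustrated with a short Python sketch. It is only an illustration of the arithmetic: the function name `weighted_current` and the example current values are assumptions made for this sketch and are not taken from the circuit simulation.

```python
def weighted_current(column_currents):
    """Combine the column currents [I_x1, I_x2, I_x4, I_x8] of one vowel
    array into a single value using the binary weights 1, 2, 4, and 8."""
    weights = (1, 2, 4, 8)
    if len(column_currents) != len(weights):
        raise ValueError("expected four column currents (x1, x2, x4, x8)")
    return sum(w * i for w, i in zip(weights, column_currents))


# Hypothetical column currents (in microamperes) for two vowel arrays.
currents = {
    "a": [3.0, 2.0, 4.0, 5.0],
    "u": [5.0, 4.0, 2.0, 1.0],
}
totals = {vowel: weighted_current(i) for vowel, i in currents.items()}

# The vowel array with the largest weighted current is the best match.
best = max(totals, key=totals.get)
```

With these made-up values, the 'x8' column dominates the sum, so the array with the larger high-weight column currents wins even though its 'x1' current is smaller.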

Methods
The weighted summation I_a = 8I_{a,3} + 4I_{a,2} + 2I_{a,1} + I_{a,0}, explained earlier, is implemented by current mirror circuits as shown in Figure 4a. For example, to realize the weight of 1, we use the current mirror composed of M_7 and M_8. Here, M_7 and M_8 should have the same size, so that I_{a,0} of M_7 is copied to M_8. For the weight of 2, M_6 should be twice as large as M_5, so that the current of M_6 is twice I_{a,1}. For the weight of 4, M_4 should be four times as large as M_3. For the weight of 8, M_2 should be eight times as large as M_1. The currents of M_2, M_4, M_6, and M_8 are summed according to Kirchhoff's current law. The capacitor C_a is discharged by the weighted summation I_a, which comes from M_2, M_4, M_6, and M_8. If I_a is large, C_a is discharged to GND very fast. Here, GND means the ground potential. If I_a is small, it takes a longer time to discharge C_a to GND. M_9 is the precharge PMOS, which turns on when the clock (CLK) signal is low. When M_9 is on, the VC_a node is precharged to V_{DD}. When the CLK signal is high, M_9 is off. At this time, VC_a is discharged by the weighted summation I_a that comes from M_2, M_4, M_6, and M_8.
Figure 3 The block diagram of the proposed binary memristor crossbar circuit with 4-bit 64 input channels. Each 4-bit input channel is composed of the true signal and the inverted signal.
Figure 4b shows the winner-take-all circuit, which decides which of the five capacitors C_a, C_i, C_u, C_e, and C_o is discharged the fastest. These five capacitors correspond to the five vowels 'a', 'i', 'u', 'e', and 'o', respectively.
Using the winner-take-all circuit, we can determine that the vowel corresponding to the fastest-discharged capacitor is the best match with the input human voice. VC_a, VC_i, VC_u, VC_e, and VC_o are the voltages on capacitors C_a, C_i, C_u, C_e, and C_o, respectively. Here, I 1 ,
In Figure 4a, we may be concerned that the reverse current through LRS and HRS cells degrades the recognition rate. To examine this reverse current, we consider two cases of the memristor crossbar circuit, matched and unmatched, as shown in Figure 5a,b, respectively. In Figure 5a, V_{i,0} and V_{i,1} are 0 and 1, respectively. These inputs match the stored memristance values of M_1, M_2, M_3, and M_4. Here, HRS means high resistance state and LRS means low resistance state. The column current can be calculated as I_a = I_{2,a} + I_{3,a} − I_{1,a} − I_{4,a}. I_{2,a} and I_{3,a} are the forward currents through M_2 and M_3, which are in LRS. I_{1,a} and I_{4,a} are the reverse currents through M_1 and M_4, which are in HRS. Because HRS is much larger than LRS, the reverse currents I_{1,a} and I_{4,a} are much smaller than the forward currents I_{2,a} and I_{3,a} and can be ignored, so that I_a ≈ I_{2,a} + I_{3,a}. Comparing the matched column's current I_a in Figure 5a with the unmatched column's current I_b, we can be sure that I_a is much larger than I_b. Thus, we can conclude that the reverse current does not degrade the recognition rate.
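The matched and unmatched cases can be compared numerically with a small Python sketch. This is a simplified model under stated assumptions, not the simulated circuit: the column is assumed to be loaded by a single resistor `r_load` to ground (standing in for the current-mirror input), and the values chosen for LRS, HRS, V_DD, and `r_load` are hypothetical. The column node voltage is found from Kirchhoff's current law, so the returned current is the net of the forward currents from the high rows minus the reverse currents into the low rows.

```python
def column_current(row_voltages, memristances, r_load):
    """Solve Kirchhoff's current law at the column node and return the
    net current delivered to the load resistor."""
    g_total = sum(1.0 / r for r in memristances) + 1.0 / r_load
    v_col = sum(v / r for v, r in zip(row_voltages, memristances)) / g_total
    return v_col / r_load


# Hypothetical device and supply values (assumptions for illustration only).
LRS, HRS = 10e3, 1e6      # ohms
V_DD = 1.2                # volts
R_LOAD = 1e3              # ohms, stand-in for the current-mirror input

# Rows: V_{i,0}, inverted V_{i,0}, V_{i,1}, inverted V_{i,1}
# with V_{i,0} = 0 and V_{i,1} = 1.
rows = [0.0, V_DD, V_DD, 0.0]

# Matched column: LRS cells sit on the high rows, HRS cells on the low rows.
i_matched = column_current(rows, [HRS, LRS, LRS, HRS], R_LOAD)

# Unmatched column: the stored pattern is the complement of the input.
i_unmatched = column_current(rows, [LRS, HRS, HRS, LRS], R_LOAD)
```

In this simplified model the matched column current exceeds the unmatched one by roughly the HRS/LRS ratio, which is the reason the reverse current through the HRS cells can be neglected.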

The simulated waveforms of VC_a, VC_i, VC_u, VC_e, and VC_o are shown in Figure 6. Here, VC_a is discharged to GND faster than the other capacitor nodes VC_i, VC_u, VC_e, and VC_o. This means that the voice input matches the vowel 'a' better than the other vowels. In the timing diagram of the important signals in Figure 4a, because I_a is the largest current, VC_a is discharged to GND faster than VC_i, VC_u, VC_e, and VC_o. When VC_a becomes lower than V_{REF}, D_a becomes high. As explained earlier, because VC_a is the fastest-falling node among the five capacitive nodes, D_a is also the fastest-rising signal among D_a, D_i, D_u, D_e, and D_o. The fastest-rising signal D_a generates the locking pulse, which is used as the clock signal of the D flip-flop circuits FF_1, FF_2, FF_3, FF_4, and FF_5. In this way, we can decide which vowel is the best match to the voice input. The first-rising signal D_a makes Output_a high, as shown in Figure 7. The other output signals, Output_i, Output_u, Output_e, and Output_o, are prevented from rising from low to high by the locking pulse that is generated by the first-rising signal D_a.
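The decision made by the winner-take-all circuit can be sketched behaviorally in Python. The sketch assumes that each capacitor is discharged by a constant current, so that the time to fall from V_DD to V_REF is C(V_DD − V_REF)/I. This constant-current approximation, the function names, and all numerical values are assumptions for illustration and do not come from the Spectre simulation.

```python
def discharge_time(capacitance, v_dd, v_ref, current):
    """Time for a precharged node to fall from v_dd to v_ref when it is
    discharged by a constant current (idealized approximation)."""
    if current <= 0:
        raise ValueError("discharge current must be positive")
    return capacitance * (v_dd - v_ref) / current


def winner_take_all(currents, capacitance=100e-15, v_dd=1.2, v_ref=0.6):
    """Return one-hot outputs and the per-vowel crossing times.

    The vowel whose node crosses v_ref first locks the outputs; all other
    outputs stay low. On an exact tie, the first vowel in the dict wins.
    """
    times = {v: discharge_time(capacitance, v_dd, v_ref, i)
             for v, i in currents.items()}
    winner = min(times, key=times.get)
    outputs = {v: int(v == winner) for v in currents}
    return outputs, times


# Hypothetical weighted currents (in amperes) for the five vowel arrays.
weighted = {"a": 63e-6, "i": 30e-6, "u": 29e-6, "e": 41e-6, "o": 35e-6}
outputs, times = winner_take_all(weighted)
```

Here the largest current gives the shortest crossing time, so only the output for 'a' goes high, mirroring the role of the locking pulse in Figure 4b.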

Results and discussion
In this work, the memristor-CMOS hybrid circuits were simulated with Cadence Spectre. Here, memristors were modeled in Verilog-A [20,21], and CMOS SPICE parameters were obtained from Samsung's 0.13-μm CMOS technology. The training and recalling processes of the memristor crossbar array are shown in Figure 8a. In this paper, we used 100 samples for training a crossbar array to learn the vowel 'a'. Similarly, we used 400 samples for the crossbar array to learn the four vowels 'i', 'u', 'e', and 'o'. Through the training process, we can find the memristance values of the crossbar array that maximize the recognition rate of the five vowels 'a', 'i', 'u', 'e', and 'o' [18]. The memristance values found by the training process were written to the crossbar array circuit by the V_{DD}/3 write scheme, which is known to mitigate the half-selected cell problem better than the V_{DD}/2 write scheme [22].
For the training process, we have to convert the original speech signal to a 4-bit 64-channel digitized signal. In a biological system, the cochlea in the human ear performs this conversion. In this paper, we used MATLAB to perform the same conversion as the human cochlea. The cochlea function that is simulated by MATLAB is shown in Figure 8b. The function of the cochlea can be modeled by preprocessing, framing, windowing, discrete Fourier transform (DFT), band-pass filtering, and digitization [23]. For the digitization process, the 64 outputs from the 64 band-pass filters are converted to 4-bit binary signals and delivered to the rows of the memristor crossbar array. For the band-pass filtering, the nonlinear frequency scale known as the mel scale is used [23]. In the mel scale, the frequency scale is linear up to 1,000 Hz and logarithmic for input frequencies higher than 1,000 Hz [23].
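The band placement and the 4-bit digitization can be sketched in Python as follows. This is not the MATLAB code used in this work. As an assumption, the sketch uses the widely used closed-form mel approximation m = 2595 log10(1 + f/700), which is only approximately linear below 1,000 Hz, and it places the 64 band centers uniformly on that scale between 20 Hz and 20 kHz. The function names and the normalization to the largest channel magnitude are likewise hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Common closed-form mel approximation (an assumption of this sketch)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands=64, f_min=20.0, f_max=20000.0):
    """Center frequencies spaced uniformly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands - 1)
    return [mel_to_hz(m_min + k * step) for k in range(n_bands)]


def quantize_4bit(magnitudes):
    """Map non-negative channel magnitudes to integer codes 0..15,
    normalized to the largest magnitude in the frame."""
    peak = max(magnitudes)
    if peak <= 0:
        return [0] * len(magnitudes)
    return [min(15, int(15.0 * m / peak + 0.5)) for m in magnitudes]


def true_and_inverted_bits(code):
    """Split a 4-bit code into true bits (LSB first) and their complements."""
    true_bits = [(code >> k) & 1 for k in range(4)]
    inverted_bits = [1 - b for b in true_bits]
    return true_bits, inverted_bits


centers = mel_band_centers()
codes = quantize_4bit([0.0, 0.5, 1.0])
bits, inv_bits = true_and_inverted_bits(5)
```

The true and inverted bit lists correspond to the true and inverted row signals that are applied to the crossbar array.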
Figure 10 Statistical distribution of memristance and comparison of recognition rate between analog and binary memristor crossbar. (a) Statistical distribution of memristance with a standard deviation of 10%, and (b) comparison of the recognition rate between the analog memristor crossbar and the binary memristor crossbar with the percentage variation in memristance varied from 0% to 15%.
Figure 9 shows the simulation results for the recognition rate of the proposed binary memristor crossbar circuit. In this case, we tested 2,500 input voices for recognizing five different vowels. Each vowel is tested with 500 different voices. The average recognition rate over the five vowels is estimated to be around 89.2%. Among the five vowels, the recognition rate of 'u' is the highest at 95.2%, while the vowel 'e' has the lowest recognition rate at 84%. Figure 10a shows the statistical variation of memristance in HRS and LRS with a standard deviation (σ) of 10%. The statistical variation was obtained by Monte Carlo simulation provided by the Cadence software. This statistical simulation is very important because real memristors are susceptible to process variation. To analyze how tolerant the proposed binary memristor crossbar is of memristance variation, we tested various cases of memristance variation from 0% to 15%. In Figure 10b, we compare the proposed binary memristor crossbar circuit with the analog memristor crossbar as the percentage variation in memristance is increased from 0% to 15%.
When the memristance variation is 0%, the recognition rate of the analog memristor array (96%) is 6.8 percentage points higher than that of the binary memristor array (89.2%). This is because the proposed binary memristor crossbar has only a 4-bit resolution and thus loses some accuracy compared to the analog memristor crossbar. As the percentage variation in memristance increases, the recognition rate of the analog memristor crossbar degrades very rapidly. For example, when the percentage variation in memristance becomes 5%, the recognition rate of the analog crossbar decreases from 96% to 23%. In contrast, the binary memristor crossbar keeps almost the same recognition rate for the five vowels. For a percentage variation as severe as 15%, the analog crossbar shows a recognition rate as low as 9%. However, the binary crossbar still keeps a recognition rate as high as 80%, which is only 9.2 percentage points lower than at a percentage variation of 0%. This strong tolerance of the binary memristor crossbar arises because the information stored in binary memristors is hardly affected by the percentage variation in memristance. The memristance of LRS remains much smaller than that of HRS, even when the percentage variation is very large. This is the reason why the binary memristor crossbar can maintain a recognition rate of 80% or higher for percentage variations in memristance up to 15%.
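The argument about variation tolerance can be illustrated with a toy Monte Carlo experiment in Python. This sketch does not reproduce the Cadence statistical simulation or the recognition rates reported above. It only counts how often a stored state would be misread at the device level, under several assumptions: Gaussian multiplicative variation, hypothetical LRS and HRS values, a binary read threshold at the geometric mean of LRS and HRS, and an analog memristor that stores one of 16 evenly spaced resistance levels.

```python
import random

LRS, HRS = 10e3, 1e6  # hypothetical resistances in ohms


def perturb(resistance, sigma, rng):
    """Apply Gaussian multiplicative variation with relative std sigma."""
    return resistance * (1.0 + rng.gauss(0.0, sigma))


def binary_misreads(sigma, trials, rng):
    """Count binary cells whose perturbed value crosses the read threshold."""
    threshold = (LRS * HRS) ** 0.5
    errors = 0
    for _ in range(trials):
        stored_high = rng.random() < 0.5
        r = perturb(HRS if stored_high else LRS, sigma, rng)
        if (r > threshold) != stored_high:
            errors += 1
    return errors


def analog_misreads(sigma, trials, rng, levels=16):
    """Count analog cells whose perturbed value is nearer to another level."""
    step = (HRS - LRS) / (levels - 1)
    errors = 0
    for _ in range(trials):
        k = rng.randrange(levels)
        r = perturb(LRS + k * step, sigma, rng)
        nearest = min(levels - 1, max(0, round((r - LRS) / step)))
        if nearest != k:
            errors += 1
    return errors


rng = random.Random(0)  # fixed seed so the run is repeatable
binary_errors = binary_misreads(0.15, 2000, rng)
analog_errors = analog_misreads(0.15, 2000, rng)
```

Under these assumptions, a 15% variation leaves LRS and HRS far apart, so binary misreads are essentially absent, whereas neighboring analog levels overlap and are frequently confused.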

Conclusions
In this paper, a binary memristor crossbar circuit was proposed for the neuromorphic application of speech recognition. Compared with analog memristors, which are found in only a few materials and need a complicated fabrication process, binary memristors based on the filamentary-switching mechanism are more widely available and easier to fabricate. Thus, we developed the neuromorphic crossbar circuit using filamentary-switching binary memristors instead of interface-switching analog memristors. The proposed binary memristor crossbar could recognize five vowels with 64 input channels and a 4-bit resolution. The proposed crossbar array was tested with 2,500 speech samples and was verified to recognize 89.2% of the tested samples. Moreover, the recognition rate of the binary memristor crossbar degrades only from 89.2% to 80% as the percentage statistical variation in memristance increases from 0% to 15%. In contrast, the recognition rate of the analog memristor crossbar drops significantly from 96% to 9% for the same percentage variation in memristance.