Background

Memristor crossbars have been studied for many years for neuromorphic pattern recognition [1–5]. They are well suited to pattern recognition because all columns of the crossbar can be compared with the input pattern simultaneously to find the best match. Once the best-matched column is decided, the remaining columns are inhibited according to the winner-take-all algorithm [3, 4].

Memristors used in pattern recognition can be either analog or binary. An analog memristor, which can change its memristance gradually, allows more accurate pattern matching and demands fewer memristors in the crossbar array [6, 7]. However, analog memristors are more difficult to fabricate and more susceptible to noise and statistical variation than binary memristors [3]. Moreover, the number of memristive materials that show analog behavior is much smaller than the number that show binary behavior. For these reasons, binary memristors are used in the pattern-matching crossbar in this paper.

For the pattern-matching crossbar, we previously proposed the twin memristor crossbar (TMC), which can replace the complementary memristor crossbar (CMC) [4]. CMC uses two memristor arrays, M+ and M−, to perform the exclusive NOR (XNOR) operation, where the M+ and M− arrays are driven by the input vector and its inversion, respectively [3]. One thing to note here is that the number of cells in the low-resistance state (LRS) is very important for sneak-path leakage, because the leakage current flows mainly through LRS cells rather than through cells in the high-resistance state (HRS). In CMC, the M+ and M− arrays are complementary to each other [3, 4]. As a result, the total number of LRS cells in the M+ and M− arrays cannot be reduced at all, even if we use image compression algorithms such as the discrete cosine transform (DCT) [4]: whenever DCT reduces the number of LRS cells in the M+ array, the number of LRS cells in the M− array increases by the same amount [4]. Thus, image compression becomes meaningless in CMC.

Unlike CMC, TMC uses two identical M+ arrays to perform the XNOR operation. This means that the total number of LRS cells in the two identical arrays can be significantly reduced by using DCT, as explained in the previous publication [4]. In this paper, we apply a new time-sharing concept to TMC to reduce the number of crossbar arrays by half.
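
The LRS-count argument can be illustrated with a short sketch. It assumes a stored binary column in which 1 denotes LRS and 0 denotes HRS. The function names and the two example columns are hypothetical and are not taken from [4]; the "sparse" column only stands in for a column with fewer LRS cells after compression.

```python
def lrs_count_cmc(column):
    """Total LRS cells when M+ stores the column and M- stores its complement."""
    m_plus = sum(column)                       # LRS cells in M+
    m_minus = sum(1 - bit for bit in column)   # LRS cells in the complementary M-
    return m_plus + m_minus                    # always equals len(column)


def lrs_count_tmc(column):
    """Total LRS cells when two identical M+ arrays store the same column."""
    return 2 * sum(column)


dense = [1, 1, 0, 1, 1, 0, 1, 1]    # hypothetical column with many LRS cells
sparse = [1, 0, 0, 0, 0, 0, 1, 0]   # hypothetical column with few LRS cells

for name, col in (("dense", dense), ("sparse", sparse)):
    print(name, "CMC:", lrs_count_cmc(col), "TMC:", lrs_count_tmc(col))
```

In this toy case the CMC total stays at the column length regardless of how sparse the column is, whereas the TMC total follows the number of LRS cells actually stored.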

Method

Figure 1a shows the previous TMC with two identical M+ arrays, in which the time-sharing concept is not used. The XNOR operation in TMC is expressed by Eq. (1) [4].

Fig. 1

The conceptual diagram of twin memristor crossbars (TMCs) for pattern recognition. a The previous twin memristor crossbar (TMC) with two identical M+ arrays [4] and b the proposed time-shared TMC, where the number of crossbar arrays is reduced by half

$$ y_j = y_j^{+} - y_j^{-} = \sum_{i=0}^{n-1} \left( a_i g_{i,j} - a'_i g_{i,j} \right) $$
(1)

Using Eq. (1), we can measure the similarity between the input vector and the patterns stored in the TMC arrays. Here, the input vector \( a_0, a_1, \ldots, a_{n-1} \) enters the upper M+ array, and its inversion \( a'_0, a'_1, \ldots, a'_{n-1} \) enters the lower M+ array in Fig. 1a. The pattern stored at column j is represented by \( g_{0,j}, g_{1,j}, \ldots, g_{n-1,j} \). In Fig. 1a, I0 is the inverter, and S0 and W0 are the subtractor and weighting circuit, respectively. S0 and W0 can be designed very easily using CMOS current mirrors [4]. \( y_j^{+} \) and \( y_j^{-} \) are obtained from the jth column currents of the upper and lower M+ arrays, respectively, and \( y_j \) is the similarity of the jth column to the input vector. Here, we assume that the jth columns of the upper and lower M+ arrays store the same image. Each M+ array has m columns, as shown in Fig. 1a. If we compare the \( y_j \) values from j = 0 to m − 1, the largest \( y_j \) indicates the column that best matches the input vector. The largest \( y_j \) is chosen by the winner-take-all circuit, as shown in Fig. 1a [3, 4].
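
As a minimal numerical sketch of Eq. (1) followed by the winner-take-all selection, the following Python fragment may help. It assumes normalized conductances (LRS = 1, HRS = 0) and binary inputs with the inversion taken as 1 − a; the stored patterns, the input vector, and the function names are hypothetical.

```python
def similarity(inputs, column):
    """Eq. (1): y_j = sum_i (a_i * g_ij - a'_i * g_ij), with a'_i = 1 - a_i."""
    return sum(a * g - (1 - a) * g for a, g in zip(inputs, column))


def winner_take_all(scores):
    """Index of the largest score (ties resolve to the lowest index)."""
    return max(range(len(scores)), key=lambda j: scores[j])


# Hypothetical stored patterns, one list per column j (1 = LRS, 0 = HRS).
columns = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
a = [0, 1, 1, 0]  # hypothetical input vector

scores = [similarity(a, col) for col in columns]
best = winner_take_all(scores)
print(scores, best)
```

With these values, the column whose stored pattern equals the input receives the largest score and is selected.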

As explained earlier, TMC is composed of two identical M+ arrays, which are driven by the input vector \( a_0, a_1, \ldots, a_{n-1} \) and its inversion \( a'_0, a'_1, \ldots, a'_{n-1} \), respectively, as shown in Fig. 1a. Because the two arrays are identical, a single M+ array can be time-shared by applying the inversion and the input vector to it at different times, as shown in Fig. 1b. The time-shared array thus halves the number of memristors, so the array area is reduced by half compared with the two M+ arrays in Fig. 1a.

The time-shared TMC operates in two phases. In the first phase, at t = k − 1, the inverted input \( a'_0, a'_1, \ldots, a'_{n-1} \) is applied to the M+ array. In the second phase, at t = k, the input vector \( a_0, a_1, \ldots, a_{n-1} \) is applied to the same M+ array. In Fig. 1b, I0 is a simple inverter, and the multiplexer X0 and de-multiplexer D0 are controlled by the timing signal CLK. When CLK is low, the inverted input enters the crossbar and we obtain \( y_j^{-} = \sum_{i=0}^{n-1} a'_i g_{i,j} \) from the de-multiplexer D0. When CLK is high, the input vector is applied to M+ and D0 delivers \( y_j^{+} = \sum_{i=0}^{n-1} a_i g_{i,j} \) to the time-shared subtractor S0, which is shown in detail in Fig. 2b. W0 is the weighting circuit in Fig. 1b.

One more thing to note is the timing overhead of the two-phase operation in Fig. 1b. The overall operation time in pattern recognition includes not only the time of the crossbar array but also the time of the winner-take-all circuit. Because the time needed by the winner-take-all circuit is usually much longer than the crossbar operation time, the overhead of the two-phase operation can be ignored. Against this negligible overhead, the array-area reduction is as large as 50%.
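
The equivalence between the two-array TMC and the time-shared TMC can be checked with a small behavioural sketch. It is only an idealized model, assuming binary inputs, normalized conductances, and a perfectly held \( y_j^{-} \) between the two phases; the function names and the example values are hypothetical.

```python
def column_current(inputs, column):
    """Column current of one M+ array: sum_i a_i * g_ij."""
    return sum(a * g for a, g in zip(inputs, column))


def tmc_two_arrays(inputs, column):
    """Fig. 1a style: upper and lower M+ arrays are read at the same time."""
    inverted = [1 - a for a in inputs]
    return column_current(inputs, column) - column_current(inverted, column)


def tmc_time_shared(inputs, column):
    """Fig. 1b style: a single M+ array is read twice, selected by CLK."""
    held = None
    result = None
    for clk in (0, 1):                       # CLK low first, then CLK high
        if clk == 0:
            inverted = [1 - a for a in inputs]
            held = column_current(inverted, column)          # y_j^- is held
        else:
            result = column_current(inputs, column) - held   # y_j^+ - y_j^-
    return result


a = [1, 0, 1, 1, 0]   # hypothetical input vector
g = [1, 0, 0, 1, 1]   # hypothetical stored column (1 = LRS, 0 = HRS)
print(tmc_two_arrays(a, g), tmc_time_shared(a, g))
```

Both functions return the same value for any input, which is the sense in which one array read in two phases can stand in for two identical arrays read once.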

Fig. 2

The schematic of the proposed time-shared TMC. a The schematic of the proposed time-shared TMC for recognizing 10 images. b The detailed schematic of the time-shared subtractor of IC3. c The voltage and current waveforms of the time-shared subtractor. During phase I, \( I^{-} \) is measured and stored at C1. During the following phase II, \( I^{+} - I^{-} \) is calculated by recalling the \( I^{-} \) that was measured during phase I

Figure 2a shows the detailed schematic of the proposed time-shared TMC of Fig. 1b for recognizing 10 images, from image #0 to image #9. M0,0, M0,1, M0,2, and M0,3 are the memristors that correspond to the 0th pixel of image #0. Image #0 is stored from the 0th row to the 1023rd row, and M1023,0, M1023,1, M1023,2, and M1023,3 correspond to the 1023rd pixel of image #0. M1023,0, M1023,1, M1023,2, and M1023,3 are weighted by ×1, ×2, ×4, and ×8, respectively, using a simple current mirror circuit, as explained in [4]. M0,0 is driven by a0<0> and its inversion a’0<0> at different times, under the control of the CLK signal. Similarly, M0,3 is driven by a0<3> and a’0<3> at different times. COL0,0, COL0,1, COL0,2, and COL0,3 calculate the pattern-matching currents of image #0 with weights of 1, 2, 4, and 8, respectively. In Fig. 2a, I0 and I1023 are simple inverters for a0 and a1023, respectively, and X0 and X1023 are the multiplexers for a0 and a1023, respectively. D3 is the de-multiplexer for COL0,3, and S3 and W3 are the subtractor and weighting circuit for COL0,3, respectively. The column current of COL0,3 is delivered to IC3, which is composed of D3, S3, and W3. The detailed schematic of IC3 is shown in Fig. 2b. The winner-take-all circuit decides which of the 10 arrays, each storing one image, best matches the input image.

Figure 2b shows the time-shared subtractor, IC3, for the column COL0,3 in Fig. 2a. IC3 is composed of D3, S3, and W3, as shown in Fig. 2a, and operates in two phases, phase I and phase II. In short, the current \( I^{-} \) is measured during phase I, and \( I^{+} - I^{-} \) is calculated during phase II using the previously measured \( I^{-} \). In Fig. 2b, \( {I}_{0,3}^{-} \) is obtained from COL0,3 for the inverted input \( a'_0, a'_1, \ldots, a'_{n-1} \) and stored in C1 during phase I. At this time, S1 and S2 are on and S3 is off. During the following phase II, S1 and S2 are off and S3 is on. During phase II, \( {I}_{0,3}^{+} \) is measured from COL0,3, and \( {I}_{0,3}\left(={I}_{0,3}^{+}-{I}_{0,3}^{-}\right) \) is calculated by the current mirror circuit of M1, M2, and M3. Here, the subtraction is performed by the current of M2, which recalls the \( {I}_{0,3}^{-} \) stored at C1 during phase I, as shown in Fig. 2b. Note that the de-multiplexer function is realized by controlling the three switches S1, S2, and S3, and the weighting is realized by the sizing of M3 and M4 in Fig. 2b. Similarly, \( {I}_{0,2}\left(={I}_{0,2}^{+}-{I}_{0,2}^{-}\right) \), \( {I}_{0,1}\left(={I}_{0,1}^{+}-{I}_{0,1}^{-}\right) \), and \( {I}_{0,0}\left(={I}_{0,0}^{+}-{I}_{0,0}^{-}\right) \) are calculated by IC2, IC1, and IC0, respectively, in Fig. 2a. \( I_{0,3} \), \( I_{0,2} \), \( I_{0,1} \), and \( I_{0,0} \) are weighted and added, and the weighted sum \( I_0 \left(= 8 I_{0,3} + 4 I_{0,2} + 2 I_{0,1} + I_{0,0}\right) \) is delivered to the winner-take-all circuit in Fig. 2a [3, 4]. In the winner-take-all circuit, \( I_0 \) of image #0 is compared with the other currents \( I_1, \ldots, I_9 \) of images #1 to #9.
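
The weighted sum over the four bit columns can be sketched numerically. The fragment below is only an illustration: it assumes that bit k of each 4-bit input pixel is compared with bit k of the corresponding stored pixel, that LRS and HRS are normalized to 1 and 0, and that the inversion is 1 − a. The helper names and the four-pixel example image are hypothetical.

```python
WEIGHTS = (1, 2, 4, 8)  # weights of COL0,0 .. COL0,3 as given in the text


def bit_plane(pixels, k):
    """Bit k of every 4-bit pixel value."""
    return [(p >> k) & 1 for p in pixels]


def bit_column_current(input_bits, stored_bits):
    """I_{0,k} = I+ - I- for one bit column, with LRS = 1 and HRS = 0."""
    return sum(a * g - (1 - a) * g for a, g in zip(input_bits, stored_bits))


def weighted_sum(input_pixels, stored_pixels):
    """I_0 = 8*I_{0,3} + 4*I_{0,2} + 2*I_{0,1} + I_{0,0}."""
    total = 0
    for k, w in enumerate(WEIGHTS):
        i_k = bit_column_current(bit_plane(input_pixels, k),
                                 bit_plane(stored_pixels, k))
        total += w * i_k
    return total


stored = [15, 0, 9, 6]   # hypothetical 4-pixel image, 4 bits per pixel
print(weighted_sum(stored, stored))          # input identical to stored image
print(weighted_sum([0, 15, 6, 9], stored))   # bitwise-inverted input
```

The matching input gives a positive weighted sum and the bitwise-inverted input gives the negative of it, so the most significant bit column contributes most to the separation between candidates.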

The detailed timing diagram of the time-shared subtractor is shown in Fig. 2c. During phase I, when S1 and S2 are on and S3 is off, the circuit IC3 in Fig. 2b measures \( {I}_{0,3}^{-} \) for the inverted input vector and stores the measured value at the capacitor C1. In Fig. 2c, VC1 represents the current \( {I}_{0,3}^{-} \) converted to the capacitor voltage during phase I. During the following phase II, S1 and S2 are turned off and S3 is turned on. We can then calculate \( {I}_{0,3}^{+}-{I}_{0,3}^{-} \) by measuring \( {I}_{0,3}^{+} \) and recalling the \( {I}_{0,3}^{-} \) that was stored at C1 during phase I. In Fig. 2b, a weighting factor of 8 is used, resulting in \( 8\times \left({I}_{0,3}^{+}-{I}_{0,3}^{-}\right) \) in Fig. 2c.
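
The switch sequence of the time-shared subtractor can be summarized in an idealized behavioural model. This is a sketch under strong assumptions (no charge leakage from C1, no current-mirror mismatch, instantaneous switching), not a description of the transistor-level circuit; the class name, method names, and the current values are hypothetical.

```python
class TimeSharedSubtractor:
    """Idealized behavioural model of IC3 (no leakage, no mirror mismatch)."""

    def __init__(self, weight):
        self.weight = weight   # mirror sizing ratio, e.g. 8 for COL0,3
        self.held = None       # current represented by the voltage on C1

    def phase_one(self, i_minus):
        """S1, S2 on and S3 off: sample I- onto C1; no output yet."""
        self.held = i_minus

    def phase_two(self, i_plus):
        """S1, S2 off and S3 on: output weight * (I+ - I-)."""
        if self.held is None:
            raise RuntimeError("phase I must run before phase II")
        return self.weight * (i_plus - self.held)


ic3 = TimeSharedSubtractor(weight=8)
ic3.phase_one(i_minus=2.0e-6)            # hypothetical I- of 2 uA
output = ic3.phase_two(i_plus=5.0e-6)    # hypothetical I+ of 5 uA
print(output)
```

The model makes the ordering constraint explicit: the output is only defined after phase I has stored a value, which mirrors the role of C1 in Fig. 2b.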

Results and Discussion

The time-shared TMC proposed in this paper was verified using a fabricated 3 × 3 memristor crossbar. Figure 3a shows the fabricated single memristor, which is made of carbon fiber and aluminum film [8, 9]. Here, the carbon fiber is placed on top of the thermally evaporated aluminum film, which is patterned as a stripe. The fabrication process is as follows [8, 9]. First, aluminum (Al) wire with a thickness of 100 nm is evaporated on a glass substrate with a thickness of 1 mm. Then, a carbon fiber with a diameter of 5 to 10 μm is placed on the patterned aluminum film. The carbon fiber and aluminum film act as the top and bottom electrodes, respectively [8, 9]. The fabricated memristor demonstrated memristive switching behavior, as shown in Fig. 3b. Here, the applied voltage is swept from −2.5 to 2.5 V and back. For the positive sweep, SET-to-RESET switching is found around 1.7 V, as shown in Fig. 3b. For the negative sweep, RESET-to-SET switching is observed around −1.8 V. The measured resistance of the high-resistance state (HRS) was 1000 times higher than that of the low-resistance state (LRS) for this fabricated memristor.

Fig. 3

The schematic of the fabricated memristor and its voltage-current relationship. a The schematic of the fabricated memristor device, where carbon fiber and aluminum film crossed each other on glass substrate [8, 9]. b The measured current-voltage relationship that shows memristive hysteresis, in which the memristor’s voltage is swept between −2.5 and +2.5 V

Using the fabricated memristors, a 3 × 3 memristor crossbar was measured to verify the operation of the time-shared TMC proposed in this paper. The time-shared crossbar employs only one 3 × 3 array instead of two, as explained in Fig. 1a, b. Figure 4a shows the measurement setup for testing the time-shared TMC with the 3 × 3 array. Here, we used a Keithley 4200-SCS (Semiconductor Characterization System) to apply the programming and reading pulses to the memristor crossbar, which has three rows and three columns. A switching matrix (Keithley 708B) delivers the voltage pulses from the Source-Measure Units (SMU) of the Keithley 4200 to the three rows and three columns. In Fig. 4a, we stored the three patterns [LHH], [HHL], and [HLH] at the three columns of the crossbar, respectively. Here, “L” and “H” mean LRS and HRS, respectively.

Fig. 4

The measured results of pattern matching of the proposed time-shared TMC. a The measurement setup and the measured memristance values of the 3 × 3 TMC. Here, the memristor crossbar was programmed to store [LHH], [HHL], and [HLH] at the 1st, 2nd, and 3rd columns, respectively. b The measured column currents for the input vector of [LHH]. The 1st column shows the largest current among the three columns. c The measured column currents for the input vector of [HHL]. The 2nd column shows the largest current among the three columns. d The measured column currents for the input vector of [HLH]. The 3rd column shows the largest current among the three columns

To test the pattern recognition of the time-shared TMC, we applied three different input vectors to the crossbar: [LHH], [HHL], and [HLH]. Figure 4b shows the measured currents of the three columns when the input vector [LHH] and its inversion [HLL] are applied to the time-shared crossbar at different times. The measurement shows that the first column’s current is the largest among the three columns. Thus, the following winner-take-all circuit can choose the first column as the winner. When the input vector [HHL] and its inversion [LLH] are applied at different times, the time-shared subtractor measures the \( I^{+} - I^{-} \) values for the three columns. Comparing the three currents, the measurement shows that the second column has the largest current, as shown in Fig. 4c. Similarly, the third column was measured to have the largest current among the three columns for the input vector [HLH] and its inversion [LHL], as indicated in Fig. 4d.
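
The expected ordering of the three column currents can be reproduced with an idealized calculation. This sketch is not a model of the measured device: it assumes a normalized LRS conductance, an HRS conductance 1000 times smaller (from the measured resistance ratio), no sneak-path or wire effects, and that an input symbol "L" means the row is driven during the \( I^{+} \) phase while "H" means it is driven during the \( I^{-} \) phase. The function names are hypothetical.

```python
G_LRS = 1.0            # normalized LRS conductance (assumption)
G_HRS = G_LRS / 1000   # HRS resistance is 1000x the LRS resistance

STORED = ["LHH", "HHL", "HLH"]  # columns 1, 2, 3 as programmed in Fig. 4a


def conductances(pattern):
    """Map a stored pattern string to per-row conductances."""
    return [G_LRS if s == "L" else G_HRS for s in pattern]


def differential_current(input_pattern, stored_pattern):
    """I+ - I- for one column; rows marked L are driven in the I+ phase."""
    total = 0.0
    for s, g in zip(input_pattern, conductances(stored_pattern)):
        a = 1 if s == "L" else 0
        total += a * g - (1 - a) * g
    return total


def best_column(input_pattern):
    """1-based index of the column with the largest I+ - I-."""
    currents = [differential_current(input_pattern, p) for p in STORED]
    return currents.index(max(currents)) + 1


for vec in ("LHH", "HHL", "HLH"):
    print(vec, "-> column", best_column(vec))
```

Under these assumptions each input vector selects the column that stores the same pattern, in line with the ordering reported in Fig. 4b to d.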

To verify the image recognition of the time-shared TMC, we designed a 1024 × 40 memristor crossbar with the CMOS time-shared subtractor and the CMOS winner-take-all circuit, as shown in Fig. 2a, b. The memristor crossbar and CMOS circuits were simulated together using a circuit simulator (CADENCE/SPECTRE) [10]. Here, the memristive behavior was modeled in Verilog-A in the circuit simulator [11]. The time-shared subtractor and winner-take-all circuit were designed in a commercial SAMSUNG 0.13-μm CMOS process. The tested images, with 32 × 32 pixels each, are shown in Fig. 5a. Figure 5b compares the recognition rate of the original TMC and the proposed time-shared TMC for the recognition of 10 images. Here, Gaussian noise is added to the tested images, with the signal-to-noise ratio (SNR) varied from −10 to +10 dB for each image. Figure 5b shows that the proposed time-shared TMC and the original TMC have the same recognition rate for the 10 tested images. This simulation verifies that the proposed time-shared TMC performs as well as the original TMC in pattern recognition.
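
For readers who want to reproduce a comparable noise sweep, one common way to add Gaussian noise at a target SNR is sketched below. The exact noise definition used in the simulation is not stated here, so this fragment assumes that SNR is the ratio of mean signal power to noise variance, expressed in dB; the function names and the toy image are hypothetical.

```python
import math
import random


def noise_sigma(pixels, snr_db):
    """Noise standard deviation giving the requested SNR for this image."""
    signal_power = sum(p * p for p in pixels) / len(pixels)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return math.sqrt(noise_power)


def add_gaussian_noise(pixels, snr_db, rng=None):
    """Return a noisy copy of the image at the requested SNR (in dB)."""
    if rng is None:
        rng = random.Random(0)   # fixed seed so the sketch is repeatable
    sigma = noise_sigma(pixels, snr_db)
    return [p + rng.gauss(0.0, sigma) for p in pixels]


image = [0.0, 1.0] * 8   # hypothetical 16-pixel image with values in [0, 1]
for snr_db in (-10, 0, 10):
    noisy = add_gaussian_noise(image, snr_db)
    print(snr_db, round(noise_sigma(image, snr_db), 3), len(noisy))
```

At −10 dB the noise power under this definition is ten times the signal power, which gives a sense of how severe the lower end of the tested SNR range is.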

Fig. 5

The simulation results of the proposed time-shared TMC for image recognition application. a The tested images shown in the inset. b The comparison between the original TMC and the time-shared TMC for recognizing 10 images. Here, the signal-to-noise ratio is varied from −10 to +10 dB

Conclusions

In this paper, we proposed the time-shared TMC for pattern-recognition applications. By time-sharing a single memristor array between the input vector and its inversion, the number of memristor arrays is reduced by half, which saves about half of the crossbar area. To implement the time-shared TMC, we designed the CMOS time-shared subtractor and verified it by circuit simulation. The operation of the time-shared TMC was experimentally verified using a fabricated 3 × 3 memristor array made of aluminum film and carbon fiber. Here, we programmed the array to store three different patterns. By applying three different input vectors to the time-shared TMC, we verified that the input vectors were recognized correctly by the proposed circuits. Moreover, the proposed time-shared TMC was tested for the recognition of more complicated gray-scale images. Here, 10 gray-scale images with 32 × 32 pixels were tested and verified to be recognized well by the proposed time-shared TMC, even when the SNR was varied from −10 to +10 dB.