## Abstract

In electroholography, real-time reconstruction is one of the grand challenges. To realize it, we developed a parallelized high-performance computing board for computer-generated holograms, named the HORN-5 board, on which four large-scale field-programmable gate array chips are mounted. A total of 1,408 hologram-calculation circuits are implemented on the board. The board calculates a hologram 360 times faster than a personal computer with a Pentium 4 processor. A personal computer connected to four HORN-5 boards calculated a hologram of 1,408×1,050 pixels from a three-dimensional object consisting of 10,000 points in 0.0023 s. In other words, it achieved real-time reconstruction beyond video rate (30 frames/s).

©2005 Optical Society of America

## 1. Introduction

An electroholography system [1–3] based on computer-generated holograms (CGH) [4] is said to have the potential to become the ultimate three-dimensional (3-D) television, because holography is the only technology that can directly record and reconstruct a 3-D image. It is, however, difficult to bring such a system into practical use, since electroholography requires high computational power for real-time reconstruction [5]. Real-time reconstruction is also important because it enables interactive operation between a user and a display, which extends its use to applications such as a 3-D computer display.

In holography, an image is reconstructed by diffracted light. Therefore, the spatial light modulator (SLM) on which the fringe pattern of a hologram is displayed must have a high resolution. Recently, the resolution of reflective liquid-crystal displays (LCDs) and digital micromirror devices (DMDs) has steadily increased, and reconstructions using them as an SLM have been reported [6–8]. Panels with a pixel pitch of less than 10 µm are already available on the market. A finer display panel, however, inevitably increases the calculation cost. In holography, the cost of data processing is proportional to *M*×*N*, where *M* is the number of points of a 3-D object and *N* is the number of points of a hologram (the display resolution), whereas the cost in a two-dimensional (2-D) display system is proportional only to the display resolution. Several fast algorithms have been developed [9–12] that calculate a CGH more than 10 times faster than the direct calculation algorithm. At present, however, even the fast algorithms cannot reconstruct electroholography at video rate.
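To give a sense of scale for the *M*×*N* cost, the following minimal sketch counts object-point/hologram-point pairs for the configuration used later in this paper (10,000 object points and a 1,408×1,050 hologram). The function name is illustrative, and treating one pair as one unit of work is an assumption made only for this estimate.

```python
def cgh_pair_count(num_object_points: int, width: int, height: int) -> int:
    """Number of (object point, hologram point) pairs, i.e. M x N."""
    return num_object_points * width * height


# Configuration taken from the text: M = 10,000 points, N = 1,408 x 1,050 pixels.
pairs_per_hologram = cgh_pair_count(10_000, 1_408, 1_050)

# Pairs that must be processed every second to sustain video rate (30 frames/s).
pairs_per_second_at_video_rate = pairs_per_hologram * 30

print(f"{pairs_per_hologram:.4e} pairs per hologram")
print(f"{pairs_per_second_at_video_rate:.4e} pairs per second at 30 frames/s")
```

For comparison, a 2-D display of the same resolution only has to touch *N* ≈ 1.5 million pixels per frame.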

As another approach, hardware for electroholography has been studied. The research group at the MIT Media Lab developed a special-purpose computational board built into their holographic video system. It used an array of regularly spaced holographic elements as the unit of a CGH fringe pattern [13, 14]. Its calculation speed was 50 times faster than that of a workstation of the time. Our research group has also been studying a special-purpose hardware system for holography, named HORN (HOlographic ReconstructioN), since 1992 [15–19].

Because the pixel pitch of an LCD or a DMD is roughly 10 µm, the diffraction angle is narrow, ~3°. In an electroholography system that uses an LCD or a DMD as the SLM, we therefore adopt the in-line hologram algorithm; namely, a plane-wave reference light is incident perpendicularly on the hologram. In that case, the intensity at a hologram point is calculated by a simple arithmetic operation as follows [9]:

Here, the indices *α* and *j* denote the hologram and the object, respectively; *x*, *y*, and *z* are the horizontal, vertical, and depth components; *A*_{j} is the intensity of the object point; and *λ* is the wavelength of the reference light. In making a CGH, the calculations of Eq. (1) account for almost all of the total calculation cost. If we can accelerate the calculation of Eq. (1), therefore, we can shorten the total calculation time. The HORN computers have been designed and developed to calculate Eq. (1) in hardware. Figure 1 shows the basic structure of the HORN system, which consists of a host computer and the special-purpose hardware HORN. We adopt a general-purpose computer, usually a personal computer (PC), as the host computer to handle all computational tasks other than the calculation of Eq. (1).
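The direct calculation can be sketched in a few lines of Python. The sketch assumes that Eq. (1) has the usual in-line point-source form, $I(x_\alpha, y_\alpha) = \sum_j A_j \cos\left[\frac{2\pi}{\lambda}\sqrt{(x_\alpha - x_j)^2 + (y_\alpha - y_j)^2 + z_j^2}\right]$, which is consistent with the symbols defined above. The function name, the tiny hologram size, and the 600-nm wavelength are illustrative choices, not values fixed by the text.

```python
import math


def direct_cgh(points, width, height, pitch, wavelength):
    """Sum the contribution of every object point at every hologram pixel.

    points: list of (x_j, y_j, z_j, A_j), positions in metres.
    Returns a list of rows; each row is a list of (unquantized) intensities.
    """
    k = 2.0 * math.pi / wavelength
    hologram = []
    for row in range(height):
        y_a = row * pitch
        line = []
        for col in range(width):
            x_a = col * pitch
            total = 0.0
            for x_j, y_j, z_j, a_j in points:
                # Distance from the object point to the hologram pixel.
                r = math.sqrt((x_a - x_j) ** 2 + (y_a - y_j) ** 2 + z_j ** 2)
                total += a_j * math.cos(k * r)
            line.append(total)
        hologram.append(line)
    return hologram


# Two object points 1 m from an 8 x 4 pixel hologram with 10.4 um pitch.
example_points = [(20e-6, 10e-6, 1.0, 1.0), (50e-6, 30e-6, 1.0, 1.0)]
holo = direct_cgh(example_points, width=8, height=4,
                  pitch=10.4e-6, wavelength=600e-9)
print(len(holo), "rows x", len(holo[0]), "columns")
```

The triple loop makes the *M*×*N* cost explicit: the innermost statement runs once per object-point/hologram-point pair, and each pass needs a square root and a cosine.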

The first machine, HORN-1, was developed in March 1993 [15, 16]. It was designed with a pipeline architecture, one of the parallel processing methods, and built with integrated circuit (IC) chips. To keep the hardware simple, HORN-1 was restricted to a fixed hologram size of 400×400 grid points. The total number of chips was 26 and the clock frequency was 10 MHz. The calculation speed of HORN-1 was approximately 100 times faster than that of a PC of the time, which showed that a special-purpose computer system with a pipeline architecture is well suited to CGH calculation.

The second machine, HORN-2, was developed in April 1994 [17]. It improved on HORN-1 by handling holograms of any size. In HORN-2, the pipeline for Eq. (1) was built with digital signal processor (DSP) chips. The total number of chips was 76 and the clock frequency was 10 MHz.

The third machine, HORN-3, was developed in July 1999 [18]. It was designed and built with field-programmable gate array (FPGA) technology. An FPGA is a rewritable large-scale IC (LSI), a technology that has developed rapidly since the mid-1990s. We could implement one pipeline for Eq. (1) in a single FPGA chip rather than on a whole board. The size of the FPGA was about 70,000 gates. Since the data of a CGH can be calculated independently in parallel, if many calculation circuits for Eq. (1) can be implemented in one chip, the total calculation speeds up in proportion to the number of implemented circuits. It was, therefore, very important that the pipeline fit into a single chip. The clock frequency of HORN-3 was 20 MHz.

The fourth machine, HORN-4, was developed in January 2001 [19]. In HORN-4, we designed the pipeline using our proposed algorithm, which calculates a CGH by recurrence formulas in order to reduce the circuit size [12]. As a result, we could implement 21 CGH pipelines in one FPGA chip of 300,000 gates. We mounted two FPGA chips on the HORN-4 board. HORN-4 operated at 35 MHz and ran 42 pipelines in parallel.

In our previous studies, from HORN-1 to HORN-4, we demonstrated step by step the effectiveness of a special-purpose hardware system for electroholography using hand-made wire-wrapped machines. On the basis of those results, in this study we developed a massively parallel calculation board for CGH, named the HORN-5 board, using large-scale FPGA and printed circuit board (PCB) technology. On the HORN-5 board, we mounted four large-scale FPGA chips, each with 7 million gates. The HORN-5 board operates at 166 MHz and calculates 1,408 hologram points in parallel; its calculation speed is more than 100 times that of HORN-4. We designed the HORN-5 board according to the PCI (Peripheral Component Interconnect) standard, so several HORN-5 boards can be connected to one PC. In fact, a PC with four HORN-5 boards calculated 5,632 hologram points in parallel and realized real-time reconstruction for a hologram of 1,408×1,050 (~1.5 Mega pixels) resolution and a 3-D object consisting of 10,000 points.

In Section 2, we describe the hardware design of HORN-5. In Section 3, we describe the performance of HORN-5 and show an example of the reconstruction. Section 4 is for discussion and future work, where we discuss a parallel system with HORN architecture toward practical use of 3-D television by electroholography.

## 2. Hardware design

First, we show the complete HORN-5 system, including the optical setup, in Fig. 2. The system consists of a reflective LCD panel as the SLM, a light-emitting diode (LED) as the reference light source, a pinhole filter, a collimator lens, a beam splitter, a field lens (an output lens), and a PC connected to the HORN-5 boards. The LCD panel used here is the DILA-SX070 by Victor [20], which has a pixel pitch of 10.4 µm×10.4 µm and a resolution of 1,408×1,050. The panel size is 14.6 mm×10.9 mm. The response time of less than 16 ms, which corresponds to more than 62.5 frames/s, guarantees reconstruction at video rate. The reference light from the LED passes through the pinhole filter (diameter 0.5 mm) and is converted into a plane wave by the collimator lens (focal length 300 mm). The beam splitter is needed because the LCD operates in reflective mode, and the field lens is used for observing the real image of the hologram. As for the calculation condition, we set the distance between the hologram (LCD panel) and the plane where the real image is reconstructed to 1000 mm. Since the diffraction angle for a 10-µm pixel pitch is ~3° for wavelengths of 500–700 nm (the visible range), the 1000-mm distance provides a reconstruction area of ~50 mm. We place the field lens (focal length 300 mm) at the reconstruction plane and observe the real image ~300 mm from the field lens.
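The figures quoted for the optical setup can be checked with a short calculation. The sketch below assumes that the finest fringe the panel can display has a period of two pixels, so that the full diffraction angle is $2\arcsin[\lambda/(2p)]$; this relation and all names in the code are assumptions introduced here for illustration.

```python
import math

PITCH_M = 10.4e-6              # pixel pitch p
WIDTH_PX, HEIGHT_PX = 1408, 1050
RESPONSE_TIME_S = 16e-3        # upper bound on the panel response time
DISTANCE_MM = 1000.0           # hologram-to-image distance


def full_diffraction_angle_deg(wavelength_m, pitch_m):
    """Full diffraction angle, assuming a minimum fringe period of 2 pixels."""
    return math.degrees(2.0 * math.asin(wavelength_m / (2.0 * pitch_m)))


def reconstruction_width_mm(wavelength_m, pitch_m, distance_mm):
    """Width covered by the diffracted light at the given distance."""
    half_angle = math.asin(wavelength_m / (2.0 * pitch_m))
    return 2.0 * distance_mm * math.tan(half_angle)


panel_w_mm = WIDTH_PX * PITCH_M * 1e3
panel_h_mm = HEIGHT_PX * PITCH_M * 1e3
max_frame_rate = 1.0 / RESPONSE_TIME_S

print(f"panel size: {panel_w_mm:.1f} mm x {panel_h_mm:.1f} mm")
print(f"frame rate bound: {max_frame_rate:.1f} frames/s")
for wl_nm in (500, 600, 700):
    wl = wl_nm * 1e-9
    angle = full_diffraction_angle_deg(wl, PITCH_M)
    width = reconstruction_width_mm(wl, PITCH_M, DISTANCE_MM)
    print(f"{wl_nm} nm: angle {angle:.2f} deg, area {width:.0f} mm")
```

Under this assumption the angle ranges from roughly 2.8° to 3.9° across the visible band, and the reconstruction area from roughly 48 mm to 67 mm, in line with the rounded values of ~3° and ~50 mm given above.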

Next, we describe the fast CGH algorithm suited to hardware [12, 19], on which the circuits of HORN-5 are based. Since *x, y≪z* in the above system, Eq. (1) can be approximated by the following expression using the Fresnel approximation:

Here, we replace (*x*_{α}−*x*_{j}) and (*y*_{α}−*y*_{j}) with *x*_{αj} and *y*_{αj}. Normalizing the position parameters in Eq. (2) by the pixel pitch of the hologram, *p* (10.4 µm in the HORN-5 system), we obtain the next equation:

Here, *X, Y*, and *Z* are integers. Using Eq. (3), we can calculate CGH with recurrence formulas by additions. We show the schematic drawing in Fig. 3 and the equations as follows.

First, we calculate the intensity *I*(*X*_{α}, *Y*_{α}) at one point on the hologram by using *Θ*_{0}, namely, by calculating Eq. (5) directly. Next, the remaining points on the same horizontal (*x*-axis direction) line are calculated by using Eq. (6), namely, by additions only. For the next line, we replace *Y*_{α} with *Y*_{α+1} and repeat the same procedure. The significant feature of this algorithm is that it requires few multiplications. This is a major advantage for hardware design because a multiplier occupies a large circuit area.
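The idea of walking along a hologram line with additions only can be illustrated as follows. The sketch assumes that, for one object point, the phase in units of cycles is $\Theta(X) = c\,[(X - X_j)^2 + (Y - Y_j)^2]$ with $c = p/(2\lambda Z_j)$, which is what the Fresnel term becomes after normalizing by the pixel pitch. Then $\Theta(X+1) - \Theta(X) = c\,[2(X - X_j) + 1]$, and this first difference itself grows by the constant $2c$ per pixel. The names `delta` and `gamma`, the sample coordinates, and the 600-nm wavelength are illustrative and are not taken from Eqs. (4)–(6).

```python
import math


def phase_coefficient(pitch_m, wavelength_m, depth_pixels):
    """c = p / (2 * lambda * Z), with the depth Z given in units of the pitch."""
    return pitch_m / (2.0 * wavelength_m * depth_pixels)


def line_phases_recurrence(x_start, y, x_j, y_j, c, count):
    """Fractional phases (cycles) for `count` pixels, using additions per pixel."""
    dx, dy = x_start - x_j, y - y_j
    theta = (c * (dx * dx + dy * dy)) % 1.0   # first pixel: computed directly
    delta = c * (2 * dx + 1)                  # first difference at the start
    gamma = 2.0 * c                           # constant second difference
    phases = []
    for _ in range(count):
        phases.append(theta)
        theta = (theta + delta) % 1.0         # addition; integer part dropped
        delta += gamma                        # addition
    return phases


def line_phases_direct(x_start, y, x_j, y_j, c, count):
    """Reference: evaluate the quadratic phase at every pixel."""
    return [(c * ((x_start + k - x_j) ** 2 + (y - y_j) ** 2)) % 1.0
            for k in range(count)]


# Object point 1 m away (about 96,154 pixel pitches), 600-nm light (assumed).
c = phase_coefficient(10.4e-6, 600e-9, 96154)
rec = line_phases_recurrence(0, 0, 700, 500, c, 1408)
ref = line_phases_direct(0, 0, 700, 500, c, 1408)
max_err = max(abs(math.cos(2 * math.pi * a) - math.cos(2 * math.pi * b))
              for a, b in zip(rec, ref))
print(f"largest cosine mismatch over one 1,408-pixel line: {max_err:.2e}")
```

Only the first pixel of a line needs multiplications; every further pixel costs two additions per object point, which is why the scheme maps well onto small adder-based circuits.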

The HORN-5 hardware is built from two modules: the BPU (Basic Processing Unit), which calculates Θ_{0} as shown in Fig. 4, and the APU (Additional Processing Unit), which calculates Θ_{k≠0} as shown in Fig. 5. Here, we omit *A*_{j} because the light intensities emitted from the object points all have the same value, and we rewrite

The 14-bit data width of *X*_{j} or *Y*_{j} means that the maximum width or height of an object is 2^{14}×10.4 µm~17 cm. The 11-bit counters for *X*_{j} and *Y*_{j} mean that the maximum resolution of a hologram is 2,048×2,048. Both satisfy the conditions of the system described above. Moreover, the cosine function is periodic with period 2*π*. Using this property, we can ignore the integer parts of Θ_{0} and Θ_{k}, which shortens the data width in each part of the circuits.
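The bit-width arguments can be made concrete with a small fixed-point example. The sketch evaluates 2^14×10.4 µm and 2^11, and then shows that keeping only the fractional bits of a phase expressed in cycles leaves the cosine unchanged. The 16 fractional bits and the sample phase are illustrative; the actual widths used inside the BPU and APU are not assumed here.

```python
import math

PITCH_UM = 10.4

# 14-bit object coordinate: largest extent in centimetres (1 um = 1e-4 cm).
max_object_extent_cm = (2 ** 14) * PITCH_UM * 1e-4
# 11-bit pixel counter: largest hologram side in pixels.
max_hologram_side = 2 ** 11


def wrap_fixed(theta_cycles, frac_bits):
    """Quantize a phase in cycles and keep only its fractional bits."""
    scale = 1 << frac_bits
    return int(math.floor(theta_cycles * scale)) & (scale - 1)


theta = 123.3125                      # 123 whole cycles plus 0.3125 cycle
code = wrap_fixed(theta, 16)          # integer part is discarded by the mask
recovered = code / (1 << 16)

print(f"max object extent: {max_object_extent_cm:.2f} cm")
print(f"max hologram side: {max_hologram_side} pixels")
print(f"stored code: {code}, fractional phase: {recovered}")
print(math.cos(2 * math.pi * theta), math.cos(2 * math.pi * recovered))
```

Because the mask simply discards the high bits, an adder that overflows and wraps around gives the correct phase without any extra logic.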

We assign 1 bit to the resultant data *I*_{α}. Although the pixels of the LCD we use have 8-bit depth, a gray-scale data format offers little advantage over a binary (black or white) format for observation with today’s small-scale electroholography systems, and it has been shown that even a binary hologram can produce a clear reconstructed image [6]. It is, of course, easy to change the hardware design from 1 bit to 8 bits if necessary. For the present, therefore, we adopt the binary format in order to reduce the amount of data transferred between HORN-5 and the host PC.

As shown in Fig. 6, HORN-5 consists of one BPU and *n* APUs, and thus calculates *n*+1 hologram points in parallel.

## 3. Packaging and performance

Figure 7 is the large-scale FPGA board for HORN-5, named the HORN-5 board. On the board, four large-scale FPGA chips for CGH calculation, XC2VP70 (7 million gates) by Xilinx, and one middle-scale FPGA chip for PCI control, XC2V1000 (1 million gates) by Xilinx, were mounted. The HORN-5 board was developed in January 2004.

We implemented one BPU and 351 APUs in each FPGA chip. Therefore, the board calculates the intensities at 1,408 (352×4 chips) points on a hologram in parallel. In other words, it calculates one line of the LCD panel at a time, because the resolution of the LCD panel in our system is 1,408×1,050 as described in Section 2. The clock frequency of the board is 166 MHz. Because the board is designed according to the PCI standard, we can also connect several HORN-5 boards to a PC, which usually has several PCI slots, and use them in parallel. We also designed and implemented a PCI controller with a DMA (Direct Memory Access) transfer function on the board. The measured communication speed is 75.8 Mbyte/s from the host PC to the HORN-5 board and 81.9 Mbyte/s from the HORN-5 board to the host PC.
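The degree of parallelism, together with a rough lower bound on the calculation time, follows from these figures. The sketch below assumes that every pipeline processes one object point per clock cycle and that there is no communication or control overhead; this is an idealization introduced here, so measured times are expected to be longer.

```python
PIPELINES_PER_CHIP = 1 + 351      # one BPU plus 351 APUs
CHIPS_PER_BOARD = 4
CLOCK_HZ = 166e6

PIPELINES_PER_BOARD = PIPELINES_PER_CHIP * CHIPS_PER_BOARD


def ideal_compute_time_s(num_boards, object_points=10_000,
                         width=1408, height=1050):
    """Idealized time per hologram: one object point per pipeline per clock."""
    pairs = object_points * width * height
    return pairs / (num_boards * PIPELINES_PER_BOARD * CLOCK_HZ)


print(f"pipelines per board: {PIPELINES_PER_BOARD}")
for boards in (1, 2, 3, 4):
    total = boards * PIPELINES_PER_BOARD
    t = ideal_compute_time_s(boards)
    print(f"{boards} board(s): {total} pipelines, ideal time {t:.4f} s")
```

With four boards the idealized time is about 0.016 s, which leaves some margin below the 0.033 s available per frame at 30 frames/s.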

As an additional function, the board has an LVDS (Low Voltage Differential Signaling) bus interface that outputs hologram data without returning them to the host PC. This architecture becomes very important when the HORN-5 system is scaled up substantially. We discuss it as future work in Section 4.

In Table 1, we show the performance of the HORN-5 system. “Time/CGH (sec)” means the calculation time required to make one hologram of 1,408×1,050 resolution for an object consisting of 10,000 points.

In the first place, we confirmed the speed-up ratio by software only. As the main specifications of the PC we used here, the CPU (Central Processing Unit) was a Pentium 4 by Intel operating at 3.2 GHz, the size of main memory was 2 Gbyte and the OS (Operating System) was Windows XP by Microsoft.

“Direct calculation algorithm” means that we calculate Eq. (1) directly. Since we have to calculate ${x}_{\alpha j}^{2}+{y}_{\alpha j}^{2}+{z}_{j}^{2}$ directly, the required numerical dynamic range is approximately ${z}_{j}^{2}/{x}_{\alpha j}^{2}$ (or ${z}_{j}^{2}/{y}_{\alpha j}^{2}$), that is, (1 m)^{2}/(10.4 µm)^{2}~10^{10}. Therefore, we had to use the double-precision (64-bit) data format in this algorithm. As for the compiler, we used Microsoft Visual C++ 6.0 with the optimal compiler option. This algorithm needed 1630 s (27 min) to generate one CGH.
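The need for double precision can be demonstrated numerically. The following sketch computes the dynamic range for a 1-m depth and a one-pixel (10.4 µm) offset, and then adds the two squared terms in both 64-bit and 32-bit floating point. The helper `to_float32`, which rounds a Python float to single precision through the standard `struct` module, is introduced here for illustration.

```python
import math
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


z = 1.0          # depth: 1 m
x = 10.4e-6      # lateral offset of one pixel pitch: 10.4 um

dynamic_range = (z * z) / (x * x)
bits_needed = math.log2(dynamic_range)

sum64 = x * x + z * z
sum32 = to_float32(to_float32(x * x) + to_float32(z * z))

print(f"dynamic range: {dynamic_range:.2e} (about {bits_needed:.1f} bits)")
print(f"64-bit sum differs from z^2: {sum64 != z * z}")
print(f"32-bit sum differs from z^2: {sum32 != z * z}")
```

A single-precision significand holds 24 bits, fewer than the roughly 33 bits required, so the contribution of the small lateral term is lost entirely in 32-bit arithmetic.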

“Fast algorithm” means that we calculate Eqs. (4)–(6). Although this algorithm was developed for hardware, it is also effective in software. As for the data format, the fast algorithm requires only single precision (32 bits) because it uses the Fresnel approximation and the periodicity of the cosine function. The fast algorithm needed 60.0 s (1 min) to generate one CGH under the same compiler condition, Microsoft Visual C++ 6.0 with the optimal compiler option. It is approximately 30 times faster than the “Direct calculation algorithm”.

We also tried to write faster code in assembly language in order to make effective use of the SSE2 (Streaming SIMD Extensions 2) technology of the Pentium 4 processor. It needed 35.7 s to generate one CGH. It is often said that assembly language is no longer always an effective approach because CPU technology has become complicated, and our result supports this. The fastest time by software, 24.7 s, was recorded when we compiled the fast algorithm with Intel C++ 8.0 with the optimal compiler option; this compiler is supplied by Intel, the manufacturer of Pentium processors.

However, even the fastest time by software was approximately 1,000 times too slow for real-time electroholography. This result suggests that it is difficult to realize real-time electroholography by improving software alone, even if a faster method than ours exists.

On the other hand, Table 1 shows that the hardware system is very effective. As the host PC of the HORN-5 system, we used the same PC as in the software-only measurements. The host PC had four slots for PCI boards. We tested HORN-5 systems consisting of one host PC and one to four HORN-5 boards. “Speed ratio” means the ratio relative to the fastest time by software only, namely, 24.7 s. The system with three HORN-5 boards recorded 0.0271 s for generating one CGH, that is, 37 holograms/s, so real-time reconstruction was realized. Moreover, the system with four HORN-5 boards was more than 1,000 times faster than the PC with software only.
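The ratios quoted in this section can be reproduced from the times given in the text. The sketch below uses only those values (1630 s, 60.0 s, 24.7 s, and 0.0271 s); the function and variable names are illustrative.

```python
DIRECT_S = 1630.0            # direct calculation algorithm
FAST_S = 60.0                # fast algorithm, Visual C++ 6.0
FASTEST_SOFTWARE_S = 24.7    # fast algorithm, Intel C++ 8.0
THREE_BOARDS_S = 0.0271      # host PC with three HORN-5 boards
VIDEO_RATE_FPS = 30.0


def holograms_per_second(time_per_cgh_s):
    return 1.0 / time_per_cgh_s


def speed_ratio(time_per_cgh_s):
    """Speed relative to the fastest software-only time."""
    return FASTEST_SOFTWARE_S / time_per_cgh_s


fast_vs_direct = DIRECT_S / FAST_S
required_speedup = FASTEST_SOFTWARE_S * VIDEO_RATE_FPS

print(f"fast vs. direct algorithm: {fast_vs_direct:.1f}x")
print(f"speed-up needed for 30 frames/s: {required_speedup:.0f}x")
print(f"three boards: {holograms_per_second(THREE_BOARDS_S):.1f} holograms/s, "
      f"speed ratio {speed_ratio(THREE_BOARDS_S):.0f}")
```

Reaching 30 frames/s requires a speed-up of about 740 over the fastest software time, which is the order of magnitude referred to above as approximately 1,000 times.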

Finally, we show an example of real-time electroholography by the HORN-5 system in Fig. 8. First, we prepared a rotating 3-D model of a musical horn combined with the characters “HORN”, made with computer graphics software on a PC (Fig. 8(a)). The number of picture elements (object points) is 10,000. The HORN-5 system generated the CGHs from the object data (Fig. 8(b)), and the 3-D images were reconstructed at video rate by the optical setup shown in Fig. 2 (Fig. 8(c)). The size of the reconstructed image was approximately 3 cm×3 cm×3 cm. The reconstructed image was photographed with an ordinary digital camera, the MVC-FD97 by Sony. Of course, it can also be observed clearly with the naked eye. Multimedia files corresponding to Fig. 8(a) and (c) are also provided.

## 4. Discussion and future work

As shown in Table 1, the speed-up ratio is not proportional to the number of HORN-5 boards. This overhead is caused by the data communication between the host PC and the HORN-5 boards, mainly the cost of returning the resultant data *I*_{α} from the HORN-5 boards to the host PC. Since the LCD resolution is 1,408×1,050 (1.5 Mega pixels) and the communication speed is 81.9 Mbyte/s, the communication time for *I*_{α} is 0.0023 s/hologram in the ideal case. If we assign 8 bits to *I*_{α}, it requires 0.018 s. For an LCD panel consisting of more than 3 Mega pixels, therefore, our system cannot realize real-time reconstruction because of the communication cost.
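These communication times follow directly from the panel resolution and the measured transfer rate. The sketch below assumes that 1 Mbyte means 10^6 bytes and that the transfer runs at the full 81.9 Mbyte/s; the function name is illustrative. The 3-megapixel case is evaluated for 8-bit data, which appears to be the case the limit refers to.

```python
PIXELS = 1408 * 1050
BOARD_TO_HOST_BYTES_PER_S = 81.9e6   # measured rate, taking 1 Mbyte = 1e6 bytes
FRAME_PERIOD_S = 1.0 / 30.0          # time available per frame at video rate


def transfer_time_s(num_pixels, bits_per_pixel,
                    bytes_per_second=BOARD_TO_HOST_BYTES_PER_S):
    """Ideal time to return one hologram to the host PC."""
    return num_pixels * bits_per_pixel / 8.0 / bytes_per_second


t_binary = transfer_time_s(PIXELS, 1)
t_gray = transfer_time_s(PIXELS, 8)
t_3mpx_gray = transfer_time_s(3_000_000, 8)

print(f"1 bit/pixel : {t_binary:.4f} s per hologram")
print(f"8 bits/pixel: {t_gray:.4f} s per hologram")
print(f"3 Mpixel, 8 bits/pixel: {t_3mpx_gray:.4f} s "
      f"(frame period {FRAME_PERIOD_S:.4f} s)")
```

At 8 bits per pixel, a 3-megapixel hologram takes about 0.037 s to return, which already exceeds the 0.033 s frame period before any calculation time is counted.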

However, when electroholography is used as a 3-D visualization system, the resultant data need not be returned from the HORN board to the host computer, so this communication cost can be eliminated. That is, the HORN board sends the data directly to the display, as shown in Fig. 9.

In the HORN-5 system, we use a reflective LCD panel as the SLM. A reflective LCD panel can reconstruct relatively clear 3-D images. However, the viewing zone, that is, the diffraction angle, which depends on the pixel pitch, is still small. Since the distance between the two eyes is ~6.5 cm, observation with both eyes over a relatively wide range requires a pixel pitch of less than 5 µm. A pixel pitch of 5 µm gives a diffraction angle of ~7°, with which the size of a reconstructed image can be expanded to ~10 cm. Pixel pitches are becoming finer every year, and an LCD panel with a 5-µm pixel pitch will be produced in the near future. However, a panel with a 5-µm pixel pitch is very small, for example, 5 mm×5 mm for a resolution of 1,000×1,000. For practical use of electroholography, therefore, display panels need to be tiled in parallel. The increase in the number of panels, of course, increases the calculation cost.

To overcome this problem, the special-purpose hardware system HORN is effective, since it can be easily parallelized and expanded. It has two advantages: first, it supports distributed parallel processing, including the data flow; second, the system can be scaled up easily. We show a parallel processing system in Fig. 10. Data flow in the directions of the arrows. In particular, the host computer sends only object data. Even for object data of 100,000 points (~1 Mbyte), the present communication speed of ~100 Mbyte/s is sufficient. The host computer broadcasts the object data to all HORN boards simultaneously, and each HORN board generates the CGH for its assigned area and sends the CGH data to the LCD panel connected to it. The calculations are executed independently in parallel, so no data-communication bottleneck occurs.
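The claim that broadcasting object data is not a bottleneck can be checked with a one-line estimate. The sketch assumes about 10 bytes per object point, inferred from the statement that 100,000 points occupy ~1 Mbyte, and a link speed of 100 Mbyte/s with 1 Mbyte taken as 10^6 bytes. Both values, and the function name, are assumptions for illustration.

```python
FRAME_PERIOD_S = 1.0 / 30.0   # time available per frame at video rate


def broadcast_time_s(num_points, bytes_per_point=10, bytes_per_second=100e6):
    """Ideal time for the host to send one set of object data to the boards."""
    return num_points * bytes_per_point / bytes_per_second


t_object = broadcast_time_s(100_000)
print(f"100,000 points: {t_object:.3f} s per frame "
      f"(frame period {FRAME_PERIOD_S:.3f} s)")

# Because the data are broadcast, the time does not grow with the number of
# boards, and each board writes its hologram to its own panel.
for boards in (1, 4, 16):
    print(f"{boards} board(s): host sends {t_object:.3f} s of data per frame")
```

The broadcast takes about 0.01 s regardless of the number of boards, whereas returning hologram data to the host grows with the total number of pixels.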

To verify the efficiency of such a parallel system, we are developing a cluster system consisting of four HORN-5 boards and four LCD panel boards, which we also developed [21]. In this system, each HORN board is connected to an LCD board through the LVDS interface. We hope to report favorable results in the near future.

## Acknowledgments

We thank Mr. Masahiko Horiuchi for his help on software technique and Mr. Hirotaka Nakayama for his help on 3-D computer graphics.

## References and Links

**1. **P. S. Hilaire, S. A. Benton, M. Lucente, M. L. Jepsen, J. Kollin, H. Yoshikawa, and J. Underkoffler, “Electronic display system for computational holography,” Proc. SPIE **1212–20**, 174–182 (1990).

**2. **P. S. Hilaire, S. A. Benton, M. Lucente, J. D. Sutter, and W. J. Plesniak, “Advances in holographic video,” Proc. SPIE **1914–27**, 188–196 (1993).

**3. **P. S. Hilaire, “Scalable optical architecture for electronic holography,” Opt. Eng. **34**, 2900–2911 (1995).

**4. **G. Tricoles, “Computer generated holograms: an historical review,” Appl. Opt. **26**, 4351–4360 (1987).

**5. **M. Lucente, “Interactive three-dimensional holographic displays: seeing the future in depth,” Comp. Graphics **31**, 63–67 (1997).

**6. **T. Ito, T. Shimobaba, H. Godo, and M. Horiuchi, “Holographic reconstruction with a 10-µm pixel-pitch reflective liquid-crystal display by use of a light-emitting diode reference light,” Opt. Lett. **27**, 1406–1408 (2002).

**7. **T. Ito and K. Okano, “Color electroholography by three colored reference lights simultaneously incident upon one hologram panel,” Opt. Express **12**, 4320–4325 (2004), http://www.opticsexpress.org/abstract.cfm?URI=OPEX-12-18-4320.

**8. **M. Huebschman, B. Munjuluri, and R. G. Garner, “Dynamic holographic 3-D image projection,” Opt. Express **11**, 437–445 (2003), http://www.opticsexpress.org/abstract.cfm?URI=OPEX-11-5-437.

**9. **M. Lucente, “Interactive computation of holograms using a look-up table,” J. Electron. Imaging **2**, 28–34 (1993).

**10. **H. Yoshikawa, S. Iwase, and T. Oneda, “Fast computation of Fresnel holograms employing difference,” Proc. SPIE **3956**, 48–55 (2000).

**11. **K. Matsushima and M. Takai, “Recurrence formulas for fast creation of synthetic three-dimensional holograms,” Appl. Opt. **39**, 6587–6594 (2000).

**12. **T. Shimobaba and T. Ito, “An efficient computational method suitable for hardware of computer-generated hologram with phase computation by addition,” Comp. Phys. Commun. **138**, 44–52 (2001).

**13. **J. A. Watlington, M. Lucente, C. J. Sparrell, V. M. Bove Jr., and I. Tamitani, “A hardware architecture for rapid generation of electro-holographic fringe patterns,” Proc. SPIE **2406–23**, 172–183 (1995).

**14. **M. Lucente and T. A. Galyean, “Rendering interactive holographic images,” Proc. ACM SIGGRAPH **95**, 387–394 (1995).

**15. **T. Yabe, T. Ito, and M. Okazaki, “Holography machine HORN-1 for computer-aided retrieval of virtual three-dimensional image,” Jpn. J. Appl. Phys. **32**, L1359–L1361 (1993).

**16. **T. Ito, T. Yabe, M. Okazaki, and M. Yanagi, “Special-purpose computer HORN-1 for reconstruction of virtual image in three dimensions,” Comp. Phys. Commun. **82**, 104–110 (1994).

**17. **T. Ito, H. Eldeib, K. Yoshida, S. Takahashi, T. Yabe, and T. Kunugi, “Special-purpose computer for holography HORN-2,” Comp. Phys. Commun. **93**, 13–20 (1996).

**18. **T. Shimobaba, N. Masuda, T. Sugie, S. Hosono, S. Tsukui, and T. Ito, “Special-purpose computer for holography HORN-3 with PLD technology,” Comp. Phys. Commun. **130**, 75–82 (2000).

**19. **T. Shimobaba and T. Ito, “Special-purpose computer for holography HORN-4 with recurrence algorithm,” Comp. Phys. Commun. **148**, 160–170 (2002).

**20. **http://www.jvcdig.com/technology.htm.

**21. **T. Ito and T. Shimobaba, “One-unit system for electroholography by use of a special-purpose computational chip with a high-resolution liquid-crystal display toward a three-dimensional television,” Opt. Express **12**, 1788–1793 (2004), http://www.opticsexpress.org/abstract.cfm?URI=OPEX-12-9-1788.