Title of Invention

METHODS AND SYSTEMS FOR DIGITALLY RE-MASTERING OF 2D AND 3D MOTION PICTURES FOR EXHIBITION WITH ENHANCED VISUAL QUALITY

Abstract

The present invention relates to methods and systems for the exhibition of a motion picture with enhanced perceived resolution and visual quality. The enhancement of perceived resolution is achieved both spatially and temporally. Spatial resolution enhancement creates image details using both temporal-based methods and learning-based methods. Temporal resolution enhancement creates synthesized new image frames that enable a motion picture to be displayed at a higher frame rate. The digitally enhanced motion picture is to be exhibited using a projection system or a display device that supports a higher frame rate and/or a higher display resolution than what is required for the original motion picture.
Full Text

Methods and Systems for Digitally Re-mastering of 2D and 3D Motion Pictures
for Exhibition with Enhanced Visual Quality
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority to U.S. provisional patent application
no. 60/762,964, entitled "Methods and Systems for Digitally Re-mastering of 2D and
3D Motion Pictures for Exhibition with Enhanced Visual Quality," filed January 27,
2006, the entire contents of which is incorporated herein by reference.
FIELD OF THE INVENTION
[0002] The present invention relates generally to methods and systems for enhancing
a motion picture and more specifically to enhancing the resolution and quality of the
motion picture.
BACKGROUND OF THE INVENTION
[0003] Motion pictures are composed of a sequence of image frames displayed to
viewers at a fast frame rate. The perceived image resolution is a key indicator for the
exhibition quality of a motion picture, and it is a combined result of both spatial
resolution and temporal resolution. The spatial resolution measures the level of details
within each image frame that can be perceived by the audience, and it is determined by
the quality of the display system as well as by the quality of motion picture content.
The temporal resolution of a motion picture measures the level of motion smoothness
of a moving image sequence, and it is determined by the frame rate at which the
motion picture images are displayed. For cinematic presentations, the standard frame
rate of a conventional motion picture is 24fps (frames per second). However, there
exist a number of higher frame rate motion picture formats. An example of
presenting a motion picture digitally at a higher frame rate is generally described in
U.S. patent application no. 2002/0149696 as a method of temporally interpolating
motion images to a higher frame rate so the motion picture can be presented digitally
at the original frame rate or a higher frame rate. The frame rate interpolation method
may rely on motion vector analysis, such as the analysis used in Kodak's Cineon
System. However, such a system does not provide for digitally re-mastering an entire
motion picture or producing acceptable image quality for cinematic or large format
cinematic applications that demand relatively high visual and audio quality. For
example, the system does not provide for artifact repair or sufficient processing speed
to meet day and date release schedules.

[0004] Other examples of such high frame rate motion picture formats include
Showscan (60fps), Todd-AO (30fps) and IMAX® HD (48fps). IMAX® HD is a 15-
Perf/70mm film format that captures and displays a motion picture at a frame rate of
48fps. The first IMAX® HD film Momentum was produced by the National Film
Board (NFB) of Canada and premiered at EXPO92 in Seville, Spain, in 1992. The
study on the IMAX® HD technology was subsequently presented at the 135th SMPTE
Technical Conference in 1993. The study indicated that, compared with a standard
IMAX® format using a frame rate of 24fps, IMAX® HD dramatically improves
image realism by enhancing clarity and sharpness, reducing film grain noise and
virtually eliminating motion artifacts like strobing and motion blur. The study
further indicated that, even for still shots, the perceived image resolution was "notably
greater than standard IMAX® (format)". The study provided evidence that temporal
resolution enhancement through frame rate increase could improve perceived image
resolution. Similar resolution improvement effects were later reported from the
experimental work on two other 48fps film-based projection formats. One such
format was a 3-Perf/35mm format called MaxiVision, and the other was a 5-
Perf/70mm format called Super Dimension-70 (SDS-70) by Super Vista.
[0005] Over the past decade, relatively few motion pictures have been produced and
exhibited at a frame rate higher than 24fps. There are both economic and technical
limitations that prevent a film-based motion picture from being produced at a higher
frame rate. On the production side, shooting at a higher frame rate increases the film
costs and production costs. More lighting may be needed on a set due to reduced
exposure time as the result of using a higher frame rate, which contributes to the
production cost increase. On the exhibition side, projecting at a higher frame rate
significantly increases the complexity and the cost of a film projector as well as the
cost of film prints. Because of those limitations, neither IMAX® HD nor other
proposed higher frame rate film formats became financially viable for mainstream
motion picture productions.
[0006] The advance of digital projection technology makes it possible to
economically exhibit motion pictures at a higher frame rate. The Digital Cinema
System Specification recently released by Digital Cinema Initiatives (DCI) includes
48fps as a projection option. However, the cost of producing a motion picture at
48fps remains relatively high. One solution is to enhance the temporal resolution of a
motion picture by converting the images to a higher frame rate. A frame rate

conversion method actually creates synthesized image frames digitally based on the
original image frames. Over the past decades, a number of frame rate conversion
methods were developed for motion pictures and for video format conversion. These
methods range from simple frame (field) repeating and frame (field) averaging to more
complex methods such as motion-compensated frame interpolation. A motion-
compensated (MC) method analyzes the motion of image elements across neighboring
image frames and creates synthesized new frames based on the estimated motion
information. An MC method usually produces smoother motion than other methods.
[0007] A typical MC method has a motion estimator that calculates the movement of
each image element of an image frame with respect to adjacent frames. An image
element can be defined as a single pixel, a block of n x m pixels or a group of pixels
describing an object. A single motion vector is normally used to indicate the direction
and the strength of the movement of an image element from a present frame to a
future frame. Sometimes, a pair of motion vectors is used to indicate the movement of
an image element both from a present frame to a future frame and from the future
frame back to the present frame. This is called bi-directional motion estimation.
Motion vectors may not be sufficient to describe the movement of a group of pixels
describing an object because the shape of the object may also change from a present
frame to a future frame. In such a case, some forms of mathematical description of
object shape warping may also be included along with motion vectors. There has been
a plethora of MC methods proposed over the last decade for video format conversion.
A majority of those methods can be fully automated with little or no need for human
intervention. However, none of those methods are capable of producing adequate
image quality required for motion picture applications.
[0008] Some algorithms have been proposed for converting a motion picture to a
video format at a field rate of 50/60 fields per second. Such applications typically
require fully automated algorithms ranging from standard 3:2 pulldown to motion-
compensation based frame rate conversion (MCFRC) algorithms. An MCFRC
algorithm may create better image quality and smoother motion, but it may also produce
other artifacts that result from motion estimation errors. MCFRC algorithms
generally fall into three categories: (1) block-based methods; (2) object-based
methods; and (3) pixel-based methods. The block-based methods can be implemented
using common block-based motion estimation algorithms similar to those in MPEG
and H.264/AVC codecs. The object-based methods may produce fewer artifacts than

others, but are generally not very stable. An example of an advanced object-based
MCFRC algorithm is generally disclosed in U.S. patent no. 6,625,333. The pixel-
based methods are generally computationally expensive.
[0009] Frame rate conversion methods are also used for creating special visual effects
(VFX) such as to create slow motion, fast motion or variable-motion sequences,
which are frequently practiced in the production of a motion picture, commercials and
video. Examples of commercial software tools available for such VFX applications
include ReTimer software by Realviz and Time Warp software by Algolith. ReTimer
provides the ability to create digital "slow motion" or variable motion and allows
users to edit rate curves and motion vector fields to achieve desirable results. Some
such commercial software tools deploy some forms of MC methods. For instance, the
MC algorithms behind ReTimer are based on block image elements, while the
algorithms behind TimeWarp are based on object image elements, such as those
developed by Communications Research Centre Canada (CRC) and described in U.S.
patent no. 6,625,333. Such commercial software tools, however, are not designed for
automated computation, and they rely on human users to provide user inputs
interactively through a GUI. Furthermore, these software tools inevitably produce
unacceptable artifacts due to problems like occlusion and motion estimation errors,
and they do not provide efficient tools and methods to handle those problems.
Although the resulting artifacts can potentially be fixed through manual fixes by
skillful human users, the process is relatively labor-intensive, costly and time
consuming.
[0010] Increasing the spatial resolution of each image frame can also improve the
perceived resolution of a motion picture. A conventional motion picture shot on
35mm negative film is limited to a spatial resolution of approximately 80 cycles per
mm, or approximately 1,800 lines per picture height for the 1.85 projection format. Due
to the generational modulation transfer function (MTF) losses from standard film lab
processes, the spatial resolution of a release film print is reduced to approximately
875 lines per picture height or lower.
[0011] The advance of digital cinema technology eliminates some major sources of
MTF losses, especially those from the standard film lab process, so that it becomes
feasible to present a motion picture with a higher perceived resolution than a typical
release film print. The DCI Digital Cinema System Specification recommends that a
digital motion picture be presented in a 2K or 4K format. A 2K digital format can

theoretically support a spatial resolution up to 1,080 lines per picture height, while a
4K digital format can support up to 2,160 lines per picture height. However, the
quality of a digital cinema presentation cannot be guaranteed unless the quality of
motion picture image content can match the spatial resolution of a digital cinema
system. Because of MTF degradations from various stages of the motion picture
production and post-production processes, including capture, scanning, VFX and data
compression, the resulting motion picture images may have a much lower spatial
resolution than what can be supported by a digital projector.
[0012] It is a major challenge to improve the spatial resolution of motion picture
images in order to produce a high quality cinematic experience, especially when a
motion picture is to be presented in a large format cinema. A typical large format
cinema, such as an IMAX® theatre, has a screen as large as 80 feet in height. In such
a theatre, the audience is seated much closer to the screen than in a conventional cinema.
Delivering a satisfactory visual experience to the audience in such a theatre requires
significant enhancement in image quality, such as the perceived resolution. Even
when such enhancement methods are applied, it is difficult to complete all required
processing within a relatively short time window in order for the enhanced motion
picture to be released on schedule.
SUMMARY OF THE INVENTION
[0013] Certain aspects and embodiments of the present invention provide methods
for digitally re-mastering a motion picture to achieve enhanced perceived resolution
and visual image quality. The images of the motion picture can be enhanced through
both spatial resolution enhancement and temporal resolution enhancement. Spatial
resolution enhancement can be achieved through a combination of "motion-based" and
"learning-based" spatial resolution enhancement methods. Temporal resolution enhancement can be
achieved by computing motion vectors for every pixel in each motion picture
sequence with relatively accurate motion estimation methods. Certain methods are
designed to be implemented in a highly automated fashion with limited human
interaction. Aspects of the present invention can be implemented with a system of one
or more processor-based devices.
[0014] Certain methods and systems of the present invention can be applied to two-
dimensional (2D) and/or three-dimensional (3D) motion pictures. A 2D motion
picture is a sequence of image frames, which can be either captured by a motion
picture camera or created one frame at a time by computer graphics. Enhancement

methods according to some embodiments of the present invention perform processes
on digital data. Accordingly, if a motion picture already exists in a digital format,
such as, for example, Source Master files, the image data may be directly used in the
enhancement process. A motion picture in a format other than digital may be converted
before the enhancement process. For example, if a motion picture is on film, it can be
digitized to convert it into digital data through a film scanner.
[0015] A 3D motion picture creates an illusion of both movement and depth based on the
principle of stereoscopic vision. A 3D motion picture consists of two sequences of
image frames, one representing a view from the left eye and one from the right eye.
Those two sequences are typically referred to as the left-eye (L) images and the right-
eye (R) images. The L images and R images can be synchronized such that one frame
from each sequence captures an action at a given instant from left-eye and right-eye
viewpoints, and together they form an image pair called an L-R image pair. Some
embodiments of the present invention can be applied to 2D and 3D motion picture
data because the L and R images can be treated as two separate image sequences.
Other embodiments of the present invention apply different enhancement processes to
the 3D motion picture data, such as using information from one eye image, for
example L, to improve processing results of the other eye image, for example R.
[0016] Some embodiments of the present invention provide methods for enhancing
2D motion picture sequences using 3D motion picture data. For example, either L
image sequences or R image sequences may be created for output, but the correlation
between L and R images may be used to enhance the image sequence.
[0017] One application of a digital re-mastering process, such as a spatial resolution
enhancement process and a temporal resolution enhancement process, according to
some embodiments is to enhance a conventional motion picture of 24 fps to be
displayed at a frame rate of 48 fps or higher. Another application is to allow a motion
picture to be captured at a lower frame rate, such as 12 fps, in order to increase the
shooting time for a data storage device or a film roll. The resulting images can be
enhanced to a normal frame rate of 24fps or higher by embodiments of the disclosed
digital re-mastering process. Other similar applications of the present invention
should be apparent to those skilled in the art.
[0018] Embodiments of the present invention provide methods and systems for
enhancing the perceived resolution of a motion picture through both spatial resolution

enhancement and temporal resolution enhancement that meet the requirement for
motion picture release schedules.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] These and other features, aspects, and advantages of the present invention are
better understood when the following Detailed Description is read with reference to
the accompanying drawings, wherein:
[0020] Fig. 1 is a flow diagram for digitally re-mastering a motion picture according
to one embodiment of the present invention.
[0021] Fig. 2 is a flow diagram for digitally re-mastering a 2D motion picture
according to one embodiment of the present invention.
[0022] Fig. 3 is a flow diagram for digitally re-mastering a 3D motion picture
according to one embodiment of the present invention.
[0023] Fig. 4 is a layout of a system for digitally re-mastering a motion picture
according to one embodiment of the present invention.
[0024] Fig. 5 is a flow diagram of data in a system for digitally re-mastering a motion
picture according to one embodiment of the present invention.
[0025] Fig. 6 is a flow diagram of a temporal resolution enhancement process for the
enhancement of motion picture image data according to one embodiment of the
present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0026] Fig. 1 shows one embodiment of a process 10 for enhancing motion picture
image sequences. The process 10 starts by receiving motion picture image sequences
that may be in a film format of 2D or 3D motion picture 12 or a digital source master
of 2D or 3D motion pictures 14. If the motion picture is in the 2D or 3D motion
picture film format 12, it is scanned using a film scanning process 16. If the motion
picture is a digital source master 2D or 3D motion picture 14, the data can be used
directly or can be converted into another digital format using a format conversion 18.
[0027] After film scanning 16 or format conversion 18, the motion picture sequence
data includes image sequences or frames. The image sequences are then enhanced by
a digital re-mastering process 20. The digital re-mastering process 20 can include a
spatial resolution enhancement method 22 and a temporal resolution enhancement
method 24, embodiments of which are described in more detail below. The spatial
resolution enhancement method 22 can create image details that are absent in an


image frame through image analysis. Two different spatial resolution enhancement
methods may be used in order to achieve the desirable performance. One method can
be a motion-based method in which additional image details are "stolen" from
adjacent neighboring image frames through the analysis of the motion of image
elements. The second method may be a learning-based method in which additional
image details are "created" based on previously learned knowledge through the
analysis of image feature space.
[0028] The temporal resolution enhancement method 24 may improve perceived
resolution by adding synthesized new image frames through frame interpolation to
increase temporal sampling rate or frame rate. The temporal resolution enhancement
method 24 may include a MCFRC method that is relatively accurate and stable. The
temporal method 24 may also be adapted to handle occlusion.
[0029] In some digital re-mastering processes, the motion picture image data are
divided into smaller segments called shots. The shot segmentation process may use
editorial information, available after the final cut of a motion picture is approved.
With editorial information, image data can be divided into shots, and digital
processing can be done independently for each shot. After digital re-mastering,
editorial information 28 may need to be modified because of increased frame counts
in each shot. With the modified editorial information 30, the enhanced shot segments
can be put together in the right order. For example, after the motion picture data is
enhanced, it undergoes a confirming process 26 that synchronizes the data in
accordance with audio tracks 32. The confirmed enhanced image data is then
converted into a standardized digital source master format 34, which can be similar to
the original source master format, but with an increased frame rate. Digital release
master files 36 can then be produced, based on the digital source master files 34, for
display 38 in theatres or further mastered into other distribution formats, including
video and broadcasting. Audio 32 may also be combined to create the digital release
master file 36. Data compression may be applied in creating the digital release master
file 36 to meet the storage and bandwidth requirements of release platforms.
[0030] Enhancement of perceived resolution and visual quality of motion picture
images is especially important for the release of a motion picture for large-format
cinema venues that are capable of delivering substantially higher image resolution and
visual quality. For that purpose, a frame from an enhanced motion picture has a higher
pixel resolution than what is needed for a conventional cinema. Converting a source

master to a release master for a conventional cinema may require reduction in frame
pixel resolution. The original audio files may also be digitally re-mastered to support
a higher audio quality standard as required for a large-format cinema venue. The
digital release master file may also be recorded back to film for distribution.
[0031] Figs. 2-3 show a process flow of creating a digital release master file with
enhanced characteristics. Fig. 2 shows a process for enhancing 2D motion picture
images while Fig. 3 shows a process for enhancing 3D motion picture images.
[0032] The processes shown in Figs. 2-3 begin by receiving a 2D or 3D motion
picture image sequence as an image data input 102. The 2D or 3D image sequence is
either in digital format or converted, using a digitization process, to digital data if the
image data is in a non-digital format, such as on celluloid film.
[0033] The digital 2D or 3D motion image sequence is then divided using a scene
segmentation process 104. For example, the image sequence can be divided into
shots, where each shot is a sequence of image frames representing a continuous
action. Scene segmentation 104 assists in preparing the image sequences for
enhancement that is preferably performed on a sequence of image frames with
continuous action. Scene segmentation 104 can be performed automatically by
detecting abrupt changes of image frame characteristics. Examples of such
characteristics include color, intensity, histogram, texture and motion. If automated
scene segmentation 104 is used, the entire motion picture data can be treated as a
continuous sequence.
[0034] In some embodiments of the present invention, a motion picture is already
separated into shots as a result of editing, and a shot list called an Edit Decision List
(EDL) file is available. An EDL records accurate information about every shot, including
time code data or frame count data representing the start and end of each shot. An
EDL can be used to guide the scene segmentation process 104.

Spatial Resolution Enhancement
[0035] After scene segmentation 104, the motion picture image sequence is enhanced
by a spatial resolution enhancement process. Fig. 2 illustrates a spatial resolution
enhancement process 106 that can be applied to 2D image sequences, while Fig. 3
illustrates a spatial resolution enhancement process 206 that can be applied to a 3D
image sequence. As described below, the spatial resolution enhancement processes
106, 206 may be applied differently, but both can include a motion-based method and
a learning-based method.
Motion-Based Spatial Resolution Enhancement
[0036] The motion-based methods 108, 208 can enhance the spatial resolution of
image sequences based on motion analysis. The motion-based methods 108, 208 may
include three general steps: (1) motion estimation; (2) motion field regulation; and (3)
detail recovery. Motion estimation may be based on a hierarchical motion model in
which every image frame is represented by a multi-level pyramidal data structure.
Each pyramidal data structure can represent a certain level of image details. At each
pyramid level, a motion estimate of every pixel can be computed from all frames
within a sliding temporal window using a variable-size block-matching algorithm.
The resulting motion fields can be regulated using constraints such as high-frequency
features, smoothness and quality measure. Each motion estimate can be assigned a
reliability measure. Any pixel with a lower reliability measure can be considered for
regulation to reduce estimation error. A group of synthesized frames are constructed
by mapping each neighboring frame to the present frame based on computed motion
estimates. In some embodiments, image details are recovered through adaptive
temporal interpolation of synthesized frames within the temporal window. Temporal
filtering may be performed using a single-pass, multi-pass, or iterative process. The
motion-based spatial resolution enhancement methods 108, 208 may be implemented
as an automated distributed computing system controlled by a processor-based device,
such as an intelligent controller. Motion-based methods are described in more detail
in U.S. patent application serial no. 19/474,780.
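The detail recovery step described above can be illustrated with a brief sketch. The following Python code is a minimal, hypothetical illustration rather than the patented implementation: it assumes dense per-pixel motion fields and per-pixel reliability weights are already available, warps each neighboring frame onto the present frame, and blends the warped frames with the present frame using the reliability weights.

```python
# Hypothetical sketch: detail recovery by mapping neighbor frames onto the
# present frame with per-pixel motion fields and blending by reliability.
# Frames are assumed to be color images of shape (H, W, 3).
import numpy as np
import cv2

def warp_to_current(neighbor, flow):
    """Warp a neighbor frame to the current frame using a dense motion field.

    flow[y, x] = (dx, dy) points from the current frame into the neighbor frame.
    """
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(neighbor, map_x, map_y, cv2.INTER_LINEAR)

def recover_details(current, neighbors, flows, reliabilities):
    """Blend motion-compensated neighbors into the current frame.

    reliabilities are per-pixel weights in [0, 1]; unreliable estimates fall
    back to the current frame so regulation errors are not amplified.
    """
    acc = current.astype(np.float32).copy()
    weight = np.ones(current.shape[:2], np.float32)
    for frame, flow, rel in zip(neighbors, flows, reliabilities):
        warped = warp_to_current(frame, flow).astype(np.float32)
        acc += warped * rel[..., None]
        weight += rel
    return (acc / weight[..., None]).astype(current.dtype)
```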
[0037] For 3D motion picture image sequences, the motion-based spatial resolution
method 208 can be applied to L and R image sequences separately. Furthermore,
spatial resolution may be further improved using the correlation between the pixels of
L and R image pairs, such as by estimating the disparity between L and R image
sequences. Such estimating may include disparity estimation, disparity map


regulation and detail recovery. Disparity estimation begins by correcting the
horizontal misalignment between two image sequences. For each pixel in one image,
for example the L image, a matching pixel in the other image, for example the R
image, is located. Matching may be performed in the horizontal direction with a
limited allowance to accommodate any remnant vertical misalignment that may not be
eliminated. Pixels for which a match is not found may be ignored. The disparity
matching can be done in both directions in order to improve the accuracy of the
resulting disparity map. A disparity map is generated for each L/R image pair. The
disparity map can be further refined by removing local abnormalities using similar
constraints as those used in motion regulation. A synthesized R image is then
generated using the disparity map and mapped onto the L image to improve its
spatial resolution, and a synthesized L image is generated and mapped onto the R image
to improve its spatial resolution.
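A minimal sketch of the disparity estimation described above is given below. It is illustrative only: the block size, search range and tolerance are assumed values, and the left-right consistency check stands in for the bidirectional matching and disparity map regulation described in the text.

```python
# Hypothetical sketch: horizontal block-matching disparity with a left-right
# consistency check, in the spirit of the bidirectional matching described above.
import numpy as np

def disparity_map(left, right, block=7, max_disp=64):
    """Return per-pixel horizontal disparity from left to right (grayscale float arrays)."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_d, best_err = 0, np.inf
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                err = np.sum((patch - cand) ** 2)
                if err < best_err:
                    best_err, best_d = err, d
            disp[y, x] = best_d
    return disp

def consistency_check(disp_lr, disp_rl, tol=1.0):
    """Invalidate pixels whose L->R and R->L disparities disagree (likely occlusions).

    disp_rl is the disparity computed in the opposite direction, assumed to use
    the same sign convention as disp_lr.
    """
    h, w = disp_lr.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    matched_x = np.clip(xs - disp_lr.astype(int), 0, w - 1)
    back = np.take_along_axis(disp_rl, matched_x, axis=1)
    valid = np.abs(disp_lr - back) <= tol
    return np.where(valid, disp_lr, np.nan)
```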
[0038] The motion estimates resulting from the above process can include multi-
resolution motion vector fields, which can be stored and used as initial motion
estimates 140 for a subsequent temporal resolution enhancement process.
Learning-Based Spatial Resolution Enhancement
[0039] Motion-based spatial resolution enhancement methods 108 may be effectively
used for image sequences having relatively predictable motion. For images having
relatively complex motion, the impact of detail enhancement diminishes as motion
estimates become less accurate. In the embodiments illustrated in Figs. 2-3, a
learning-based method 110, 210 may be used to match each pixel of an image to a
library of pre-selected image patterns having higher resolution and replace the pixel
with a value calculated from a matching higher resolution pattern. The higher
resolution patterns can be generated using a set of selected sample images that contain
a higher level of image details than the original motion picture images. Such a library
of higher resolution patterns is a high-resolution codebook and each pattern is
described by a codeword.
[0040] The learning-based spatial resolution enhancement process 110 can perform the following steps to
enhance the spatial resolution of original image sequences. Each original image
sequence is upsized to an intended higher resolution. A matching process is then
applied to each pixel of the upsized image to find a matching codeword in a pre-set
codebook. If a match is found, the pixel is replaced by the central pixel of the higher
resolution image pattern associated with the matching codeword. After the above


process is repeated for each pixel of an upsized image, the resulting enhanced image
may need an additional pass of temporal filtering process to ensure temporal
consistency with other enhanced image frames. This matching process can be
extended to a block of pixels of the upsized images. For example, each block of pixels
is matched to a codeword, and it is replaced by the matched higher resolution pattern
through a transformation. A blending process can be applied to reduce the spatial
discontinuity between the blocks. A temporal filtering process can also be applied to
ensure temporal consistency.
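The per-pixel codeword matching can be sketched as follows. The codebook layout (feature vectors paired with the centre pixels of higher resolution patterns), the patch size and the distance threshold are assumptions made for illustration; the text does not prescribe a particular data structure.

```python
# Hypothetical sketch: per-pixel codebook lookup for learning-based enhancement.
# The codebook layout (feature vectors paired with high-resolution patch centres)
# is an assumption; the text does not specify an exact data structure.
import numpy as np
import cv2

def enhance_with_codebook(frame, codewords, hr_centers, scale=2, patch=5, max_dist=50.0):
    """codewords: (N, patch*patch) feature library; hr_centers: (N,) centre pixel values."""
    up = cv2.resize(frame, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    out = up.astype(np.float32).copy()
    half = patch // 2
    h, w = up.shape
    for y in range(half, h - half):
        for x in range(half, w - half):
            feat = up[y - half:y + half + 1, x - half:x + half + 1].reshape(-1).astype(np.float32)
            dists = np.linalg.norm(codewords - feat, axis=1)
            idx = int(np.argmin(dists))
            if dists[idx] < max_dist:          # replace only when a good match is found
                out[y, x] = hr_centers[idx]
    return out.astype(frame.dtype)
```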
[0041] A similar learning-based spatial resolution enhancement process 210 may be
applied to a 3D motion picture. For example, the L and R images can share the same
codebook. Resolution enhancement is then performed to each eye separately. A
filtering process can be applied, using the disparity map produced in the motion-based
resolution enhancement stage to remove inconsistency between resulting L and R
images.
[0042] One implementation of the codebook generation process is described as
follows. First, all higher resolution image patterns are downsized to the same pixel
resolution as the motion picture images such that they share a similar level of image
details. The level of image details is measured through Fourier spectral analysis. The
downsized image patterns are then upsized to the higher pixel resolution at which the
motion picture is to be presented. The resulting pattern forms a pair with the higher
resolution image pattern from which it is produced. The upsizing process may
increase the image pixel count but does not create additional image details. A training process
can be applied to all image pattern pairs to calculate and extract a number of image
features from each image pattern pair for each pixel from its surrounding pixels. Image
features having a higher level of image details can be described by a fixed number of
data bytes or words. The collection of all features from each pair of images forms a data
set or codeword. The length of the codeword can be reduced using a principal component analysis
process to remove redundant feature attributes. The codewords are then collected into
a data library or codebook and saved in a data storage. In some embodiments, the size
and content of the initial codebook are dependent on the size and the content of the
image patterns selected. A clustering analysis can be applied to the codebook to
reduce the codebook size. The clustering analysis may group pixels having similar
image patterns.
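A rough sketch of the codebook generation process is shown below. It follows the steps described above (downsize, upsize, feature extraction, dimensionality reduction and clustering), but the use of scikit-learn's PCA and KMeans, the patch size and the cluster count are illustrative assumptions rather than the patented implementation.

```python
# Hypothetical sketch: building a high-resolution codebook from sample images,
# using PCA for feature reduction and k-means clustering to shrink the codebook.
import numpy as np
import cv2
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def build_codebook(hr_images, scale=2, patch=5, n_components=16, n_clusters=1024):
    lr_feats, hr_centers = [], []
    half = patch // 2
    for hr in hr_images:
        # Downsize to the motion-picture resolution, then upsize back: the result
        # has the target pixel count but no extra detail, matching the input domain.
        lr = cv2.resize(hr, None, fx=1.0 / scale, fy=1.0 / scale, interpolation=cv2.INTER_AREA)
        up = cv2.resize(lr, (hr.shape[1], hr.shape[0]), interpolation=cv2.INTER_CUBIC)
        h, w = up.shape
        for y in range(half, h - half, patch):
            for x in range(half, w - half, patch):
                lr_feats.append(up[y - half:y + half + 1, x - half:x + half + 1].reshape(-1))
                hr_centers.append(hr[y, x])
    lr_feats = np.asarray(lr_feats, np.float32)
    hr_centers = np.asarray(hr_centers, np.float32)

    pca = PCA(n_components=n_components).fit(lr_feats)          # remove redundant attributes
    reduced = pca.transform(lr_feats)
    km = KMeans(n_clusters=n_clusters, n_init=4).fit(reduced)   # cluster to shrink the codebook

    # One codeword per cluster: the cluster centre paired with the mean HR centre value.
    centers = np.array([hr_centers[km.labels_ == k].mean() if np.any(km.labels_ == k) else 0.0
                        for k in range(n_clusters)])
    return pca, km.cluster_centers_, centers
```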

Temporal Resolution Enhancement
[0043] The outputs of the spatial resolution enhancement processes 110, 210 are
applied to the temporal resolution enhancement processes 112, 212. The temporal
resolution enhancement process 112, 212 can increase perceived resolution by
increasing the display frame rate. Because original images have a fixed frame rate,
new image frames need to be synthesized based on the original image frames to
achieve frame rate increase. In some embodiments of the present invention, the
temporal resolution enhancement processes 112, 212 may include pre-processing,
global motion estimation, local motion estimation, half-motion vector generation,
frame interpolation, and artifact repair by a temporal consistency check. Synthesized
frames are created based on high quality motion estimates calculated by an
embodiment of the MCFRC method described below with reference to Figs. 2-3 and Fig.
6.
Pre-processing
[0044] First, the image sequence is pre-processed 114. Pre-processing 114 may be
used to calculate edge mask maps and color segmentation maps from image frames.
For example, and referring to Fig. 6, an edge mask map 612 can be generated from each
frame of the image by an edge detection algorithm 602, such as a Canny edge
detector. A color segmentation map 614 is then generated from each frame by a color
segmentation algorithm 604, such as Meanshift or Watershed. For a 3D motion
picture, separate edge mask maps 612 and color segmentation maps 614 are generated
for each eye.
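A minimal sketch of this pre-processing step, using OpenCV's Canny detector and pyramid mean-shift filtering as a stand-in for the segmentation algorithms named above, might look as follows; the thresholds and the coarse color binning used to label regions are illustrative.

```python
# Minimal sketch of the pre-processing step: a Canny edge mask map and a
# mean-shift based color segmentation map per frame. Thresholds are illustrative.
import cv2
import numpy as np

def preprocess_frame(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edge_mask = cv2.Canny(gray, 50, 150)                       # edge mask map 612

    # Mean-shift filtering flattens color regions; labelling coarse color bins
    # then gives an approximate segmentation map 614.
    shifted = cv2.pyrMeanShiftFiltering(frame_bgr, sp=15, sr=30)
    quantized = (shifted // 32).astype(np.int32)               # coarse color bins, values 0..7
    flat = quantized[..., 0] * 64 + quantized[..., 1] * 8 + quantized[..., 2]
    _, seg_map = np.unique(flat, return_inverse=True)
    seg_map = seg_map.reshape(gray.shape)
    return edge_mask, seg_map
```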
Global Motion Estimation
[0045] The global motion between adjacent frames is then calculated using a global
motion estimation process 116. Global motion estimation 116 may be used to achieve
relatively accurate local motion estimates. Approximate but correct global motion
estimation 116 can be used to make a general first estimate for local motion
estimation algorithms. Using global motion estimation 116, the motion of an image
background can be computed using camera motion and depth information. If depth
information is not available, such as for 2D or some 3D image sequences, global
motion can be modeled approximately as a three-dimensional planar transform, which
is a simplified version of three-dimensional perspective projection transformation and
does not need depth information in its calculation. An example of a three-dimensional
planar transform used in the method is shown here:


where (x, y) and (x', y') are the positions of a pair of matching feature points from two
adjacent frames. The coefficients a1, a2, ..., a8 are determined by fitting the three-
dimensional planar model in equation (1) based on randomly tracked feature points.
Feature points are computed based on the standard 2x2 gradient cross correlation matrix
G = [[Σ fx·fx, Σ fx·fy], [Σ fx·fy, Σ fy·fy]],
where fx and fy represent the local horizontal and vertical gradients, and the sums are
taken over a small local region (3x3, 5x5, etc.) around each pixel. Feature points can
be extracted by a number of methods. One method is to calculate the minimum Eigen
value of matrix G, which is the basis of the Kanade-Lucas-Tomasi (KLT) feature
detector. Another method is to calculate the maximum corner strength measure based
on matrix G, which is the basis of the well-known Harris corner detector.
Another method is to calculate the following values PEG, QEG and θEG, which are
derived from matrix G:

[0046] The value of PEG represents the local edge scale, which can be used to
generate edge mask values. The value of QEG represents the response to local
gradient changes, which can be used to locate feature points by calculating its local
maximum. Among the pixels with local maxima, true feature points are not far

away and they can be located by calculating the value of θEG, which represents the
orientation of the local gradient, and an energy measure V and searching for local
maximum positions:

[0047] L(θEG) is the projection length from the centre pixel to the border of the local
region. Because the local region is usually rectangular, this value varies depending on
the angular values. Typically N directional angles are used, and N is usually set to 16
or larger. Feature points extracted through the above method need to be matched in
pairs between adjacent frames. A feature point in the current frame is matched
to the feature point in the next frame with which it has the highest correlation
value to form a pair.
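The derived values PEG, QEG and θEG are not reproduced in this text. The sketch below therefore computes the gradient matrix G and the two feature measures that are named explicitly (the KLT minimum eigenvalue and the Harris corner strength), together with commonly used edge-scale and orientation measures that are assumptions, not the exact definitions in the text.

```python
# Sketch of feature detection from the 2x2 gradient cross-correlation matrix G.
# The exact PEG/QEG/θEG formulas are not reproduced in the source text; the
# edge-scale and orientation measures below are common derivations from G and
# are assumptions, not the text's exact definitions.
import numpy as np
import cv2

def gradient_matrix_fields(gray, win=5):
    fx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    fy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    box = (win, win)
    gxx = cv2.boxFilter(fx * fx, -1, box, normalize=False)   # sum of fx^2 over the local region
    gxy = cv2.boxFilter(fx * fy, -1, box, normalize=False)   # sum of fx*fy
    gyy = cv2.boxFilter(fy * fy, -1, box, normalize=False)   # sum of fy^2
    return gxx, gxy, gyy

def feature_measures(gxx, gxy, gyy, k=0.04):
    trace = gxx + gyy
    det = gxx * gyy - gxy * gxy
    # KLT: minimum eigenvalue of G; Harris: det(G) - k * trace(G)^2.
    tmp = np.sqrt(np.maximum((gxx - gyy) ** 2 + 4.0 * gxy ** 2, 0.0))
    min_eig = 0.5 * (trace - tmp)
    harris = det - k * trace * trace
    # Assumed stand-ins: edge scale ~ trace(G); orientation of the local gradient.
    edge_scale = trace
    orientation = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)
    return min_eig, harris, edge_scale, orientation
```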
[0048] Once feature points are paired, the coefficients of a three-dimensional planar
transform (1) that models the global motion between any two adjacent frames can be
calculated by randomly selecting at least four corresponding feature pairs from those
two frames, each pair generating two linear equations such as:

[0049] The selected 4 feature pairs produce eight linear equations, so that coefficients
a1,...,a8 can be solved. The resulting three-dimensional planar transform can be
tested on all feature pairs and the transform as well as the number of inliers are saved
for later use. Then the next four pairs of feature points are randomly selected and a
three-dimensional planar transform is calculated and saved. This process is repeated
over a sufficiently large number of iterations. Among all three-dimensional planar transforms
estimated from all iterations, the planar transform having the maximum number of

inliers is selected as an initial estimation. The inliers are used to estimate a second
three-dimensional planar transform using a standard least square algorithm. This step
is repeated until the number of inliers becomes stable. The resulting three-
dimensional planar transform is an estimate of the dominant global motion between
those two frames. If other dominant global motions exist, they can be calculated in
the same way from remaining unclassified feature points. The required computation
for this algorithm depends on the number of feature pairs and is independent of the
image frame size.
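The planar transform equation (1) and the per-pair linear equations are likewise not reproduced in this text. The sketch below assumes a common eight-coefficient pseudo-perspective planar model, chosen only because it keeps each feature pair contributing two linear equations as the text requires, and illustrates the random-sampling and inlier-refinement loop described in this paragraph.

```python
# Sketch of the iterative global-motion fit described above. The exact form of
# the eight-coefficient planar transform is not reproduced in the source text;
# the pseudo-perspective model used here (x' = a1 + a2*x + a3*y + a7*x*x + a8*x*y,
# y' = a4 + a5*x + a6*y + a7*x*y + a8*y*y) is an assumption that keeps each
# feature pair contributing two linear equations.
import numpy as np

def _rows(x, y):
    return np.array([[1, x, y, 0, 0, 0, x * x, x * y],
                     [0, 0, 0, 1, x, y, x * y, y * y]], np.float64)

def fit_planar(pairs):
    """Least-squares fit of the 8 coefficients from (x, y, x', y') feature pairs."""
    A = np.vstack([_rows(x, y) for x, y, _, _ in pairs])
    b = np.array([v for _, _, xp, yp in pairs for v in (xp, yp)], np.float64)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

def apply_planar(coeffs, x, y):
    pred = _rows(x, y) @ coeffs
    return pred[0], pred[1]

def estimate_global_motion(pairs, iters=500, tol=2.0, rng=np.random.default_rng(0)):
    pairs = list(pairs)
    best_inliers = []
    for _ in range(iters):                       # randomly select 4 feature pairs per iteration
        sample = [pairs[i] for i in rng.choice(len(pairs), 4, replace=False)]
        coeffs = fit_planar(sample)
        inliers = [p for p in pairs
                   if np.hypot(*np.subtract(apply_planar(coeffs, p[0], p[1]), (p[2], p[3]))) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine on the inlier set until the inlier count stabilizes.
    prev = -1
    while len(best_inliers) != prev and len(best_inliers) >= 4:
        prev = len(best_inliers)
        coeffs = fit_planar(best_inliers)
        best_inliers = [p for p in pairs
                        if np.hypot(*np.subtract(apply_planar(coeffs, p[0], p[1]), (p[2], p[3]))) < tol]
    return coeffs, best_inliers
```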
[0050] The most dominant motion between adjacent frames is usually the background
motion, typically caused by camera motion. However, the motion of the foreground
objects may become the dominant motion if the objects are large enough and move
fast enough. In most cases, however, the background motion can be selected among
dominant motions based on similarity in motion and percent of inliers among adjacent
frames.
[0051] For 3D image sequences, separate global motion estimates may be performed
for each eye. Where the stereoscopic disparity between L and R images is relatively
small, a single global motion estimate may be performed for both eyes.
Local Motion Estimation
[0052] A local motion estimation process, such as local motion estimation 118, 218,
can then be applied based on edge mask maps 612, color segmentation maps 614,
global motion estimation 116, and initial motion estimates 140 received from the
motion-based spatial resolution enhancement process 108, 208. Local motion
estimation 118, 218 can include a pyramid voting-based algorithm. The pyramid
voting-based algorithm can synthesize new frames based on resulting local motion
vectors. The new frames can be combined with spatially enhanced image frames to
achieve a desirable frame rate.
[0053] The pyramid voting-based algorithms can further minimize errors during
image sequence enhancement. A specific local motion estimation algorithm is usually
optimized for a certain type of motion and may not remain accurate for other types of
motion. In some embodiments of the present invention, multiple local motion
estimation methods are used and a voting process is deployed to determine the best
estimates and minimize errors. In some of those methods, the motion estimates from
the previous motion-based spatial resolution enhancement are used as initial
estimates.

[0054] In one embodiment of the present invention, up to four local motion
estimation algorithms or methods are used. One of the four algorithms is a block-
matching method. The block-matching method includes dividing two adjacent image
frames into small blocks. In one frame, its global motion is used as the initial guess
and the average pixel shift of each block is calculated. A starting block in the second
frame is determined and all blocks near the starting block are searched to find the best
match based on minimum block matching error. A motion vector is then assigned
to each pixel in the first frame block that is equivalent to the shift of the block from
the first frame to the best matching block in the second frame.
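A minimal sketch of this block-matching method follows; the block size and search radius are illustrative, and the matching error is a simple mean absolute difference.

```python
# Minimal sketch of the block-matching local motion estimator: the global motion
# provides the starting block, and nearby candidate blocks are scored by matching
# error. Block size and search radius are illustrative.
import numpy as np

def block_matching_motion(frame1, frame2, global_shift=(0, 0), block=16, radius=8):
    h, w = frame1.shape
    motion = np.zeros((h, w, 2), np.float32)
    gdx, gdy = int(round(global_shift[0])), int(round(global_shift[1]))
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            ref = frame1[by:by + block, bx:bx + block].astype(np.float32)
            best_err, best = np.inf, (gdx, gdy)
            for dy in range(gdy - radius, gdy + radius + 1):
                for dx in range(gdx - radius, gdx + radius + 1):
                    y0, x0 = by + dy, bx + dx
                    if y0 < 0 or x0 < 0 or y0 + block > h or x0 + block > w:
                        continue
                    cand = frame2[y0:y0 + block, x0:x0 + block].astype(np.float32)
                    err = np.mean(np.abs(ref - cand))      # block matching error
                    if err < best_err:
                        best_err, best = err, (dx, dy)
            motion[by:by + block, bx:bx + block] = best     # same vector for every pixel in block
    return motion
```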
[0055] A second local motion estimation algorithm is a feature propagation method.
The feature propagation method estimates the motion of feature points and the results
are propagated to the rest of pixels of the entire image frame. The feature point pairs
extracted for global motion estimation can be used. All feature pairs with low
correlation values and large initial motion estimates are removed. Among the
remaining feature pairs, those with duplicate correspondence are also removed. The
remaining feature points are considered accurate and can be used as seed pairs. Based
on the motion vectors of seed pairs, a propagation algorithm is applied to spread the
motion to more pixels. In some embodiments, the pixel status from the previous
analysis is recorded to assist in speeding up the process. Missing vectors are then
filled based on color segmentation maps.
[0056] A third local motion estimation algorithm is a modified optical flow algorithm
with control points and a Lorentzian function. The control points are defined on a lattice
grid, and the Lorentzian function is used to replace the standard Mean Square Error (MSE)
to control errors. The method calculates horizontal and vertical gradients for each
pixel of the next frame, determines the structure and distribution of the control points,
and calculates weights of each pixel to the related control points. Starting from an
initial guess of motion vectors of each pixel, motion vector modifications are
computed for the control points. Motion vector modifications are computed for all
pixels based on the control points. An energy cost function (motion tracking errors) is
calculated using the Lorentzian function. If the value changes are small, then the method
stops, otherwise the computations are repeated until either value changes are small or
the maximum number of iterations is reached.
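The sketch below illustrates only the robust error term: a Lorentzian function grows much more slowly than squared error for large residuals, so occluded or mismatched pixels contribute less to the motion cost. The control-point formulation itself is not reproduced, and the sigma value is an assumed parameter.

```python
# Sketch of the robust error term: the Lorentzian function penalizes large
# residuals far less than squared error, limiting the influence of outliers
# (e.g. occluded pixels) on the motion cost. Sigma is illustrative.
import numpy as np

def lorentzian(residual, sigma=1.0):
    return np.log1p(0.5 * (residual / sigma) ** 2)

def motion_cost(frame1, frame2, flow, sigma=1.0):
    """Sum of Lorentzian errors between frame1 and frame2 warped by a dense flow."""
    h, w = frame1.shape
    gy, gx = np.mgrid[0:h, 0:w]
    x2 = np.clip(gx + flow[..., 0], 0, w - 1).astype(int)
    y2 = np.clip(gy + flow[..., 1], 0, h - 1).astype(int)
    residual = frame1.astype(np.float32) - frame2[y2, x2].astype(np.float32)
    return float(np.sum(lorentzian(residual, sigma)))
```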
[0057] A fourth local motion estimation algorithm is an integral block matching with
multiple block sizes. In this method, initial motion estimates between two frames are

used as starting points for pixel-based searching using block correlation values
calculated using local central moments. For each pixel in the current frame, a
matching pixel is found in the next frame based on the highest correlation values. This
process is repeated for multiple different block sizes, resulting in multiple
motion vectors, one per block size, for each pixel.
[0058] Each pixel in an image frame can have both forward motion vectors and
backward motion vectors. The former are estimated by searching for matching pixels
from the next frame, and the latter are estimated by searching in the reverse direction
from the previous frame. All algorithms discussed are applicable in either direction,
resulting in multiple forward motion vectors and multiple backward motion
vectors.
[0059] A voting process is used to select a most accurate motion vector for each pixel
from both forward and backward motion vectors. In one embodiment of the present
invention, voting consists of the following schemes, each for a certain portion of both
image frames. Scheme #1 is to select a motion vector by edge difference values,
which are calculated based on the edge masks generated previously. For each motion
vector associated with a certain block size, the normalized edge difference is
calculated by summing the absolute edge differences of surrounding pixels within the
block, and dividing by the sum of the absolute edge values of the block. A motion vector with a
sufficiently small normalized edge difference value is selected and the motion vector
is assigned to the pixel. If more than one motion vector has sufficiently small
normalized edge differences, the motion vector obtained using the largest block size is
selected. Selected pixels are marked with a relatively high accuracy value in a
corresponding accuracy mask.
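For a single pixel, Scheme #1 might be sketched as follows; the threshold and the candidate list format are assumptions made for illustration.

```python
# Sketch of voting Scheme #1 for one pixel: score each candidate motion vector
# by its normalized edge difference over the block it came from, and keep a
# candidate below threshold, preferring the largest block size.
import numpy as np

def normalized_edge_difference(edge1, edge2, x, y, mv, block):
    half = block // 2
    x2, y2 = int(x + mv[0]), int(y + mv[1])
    sl1 = edge1[max(y - half, 0):y + half + 1, max(x - half, 0):x + half + 1]
    sl2 = edge2[max(y2 - half, 0):y2 + half + 1, max(x2 - half, 0):x2 + half + 1]
    m = min(sl1.shape[0], sl2.shape[0]), min(sl1.shape[1], sl2.shape[1])
    a = sl1[:m[0], :m[1]].astype(np.float32)
    b = sl2[:m[0], :m[1]].astype(np.float32)
    return np.sum(np.abs(a - b)) / (np.sum(np.abs(a)) + 1e-6)

def scheme1_select(edge1, edge2, x, y, candidates, threshold=0.2):
    """candidates: list of (motion_vector, block_size); returns best vector or None."""
    passing = [(mv, b) for mv, b in candidates
               if normalized_edge_difference(edge1, edge2, x, y, mv, b) < threshold]
    if not passing:
        return None
    return max(passing, key=lambda item: item[1])[0]   # prefer the largest block size
```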
[0060] If Scheme #1 fails to find a motion vector, the next Scheme #2 is used to
select a motion vector by coherence check. In this scheme, each pixel with a forward
motion vector tries to find a corresponding pixel in the next frame with a
corresponding backward vector to form a pair based on sufficiently small matching
error. If a pair is found, the motion vector pair is assigned to both pixels, which are
marked with relatively high accuracy values in a resulting accuracy mask.
[0061] If Scheme #2 fails, the next voting scheme (Scheme #3) is used, which is also
based on forward-backward checks. The corresponding backward motion vector is
analyzed when a pixel is projected to the next frame based on a forward motion
vector. If the backward motion vector has a relatively low accuracy value, it is

replaced with the forward motion vector with direction reversed because the former
has a higher accuracy value. The accuracy value in the accuracy mask is modified
accordingly.
[0062] If Scheme #3 does not produce a motion vector, the next Scheme #4 is used to
select a motion vector based on minimum pixel color difference. If such a motion
vector is found, corresponding pixels are marked with a relatively low accuracy value in
the accuracy mask.
[0063] For the remaining pixels, color segmentation maps are used to calculate an
average for all motion vectors found previously in the same segmented region. The
averaged motion vector is then used as a reference, and a motion vector that is both
close to the reference motion vector and has minimum pixel matching errors is
selected. This last voting method is Scheme #5.
[0064] Although five voting schemes are disclosed, those skilled in the art will realize
that other voting schemes can be devised and added to the method for the purpose of
producing a pair of forward and backward motion vectors with the highest possible
accuracy for every pixel of every frame.
[0065] In the local motion estimation process 218 for a 3D motion picture, motion
vectors can be estimated separately from L images or R images. However, because
high correlations between L and R images exist, the motion estimates from one eye
can be used as accurate initial motion estimates for the second eye. In one
embodiment of the present invention, the motion vectors obtained from the L images
are used as the initial motion estimates for the pyramid voting-based motion
estimation process of the R images. This can improve the accuracy of the motion
vectors for the R images and reduce inconsistency between L and R images.
[0066] A voting-based local motion estimation in a multi-level pyramidal structure in
which image data at an upper level represents a coarse version of the image data, and
the lowest level (level 0) represents the finest details of the image data can be
relatively efficient and accurate. Such a pyramid representation of an image sequence
is generated by progressively low-pass filtering and subsampling each frame. A
similar pyramid representation is implemented in the motion-based spatial resolution
enhancement methods 108, 208 discussed previously, which also produces multi-
level motion estimates.
[0067] Pyramid voting-based local motion estimation methods 608 according to some
embodiments of the present invention can be implemented at each pyramid level. For


example, and referring to Fig. 6, an edge mask map 612, a color segmentation map
614 and global motion estimates 116 are generated for every frame at each level.
Based on global motion estimates, all the data of the next frame (image data, the edge
mask maps 612 and the color segmentation maps 614) are warped by a warping
process 606 to the current frame to create warped data at each pyramid level.
[0068] All the voting schemes discussed previously can be applied to all pyramid
levels with slight or no modifications. For example, Scheme #2 can be extended to
checking coherence of multiple motion vectors produced by more than one method.
At each pyramid level, if at least two motion vectors from different methods have
very small matching errors, the motion vector produced by a method with a higher
accuracy is selected or an average of those vectors is determined, and accuracy values
in a corresponding accuracy mask 618 are updated accordingly.
[0069] For any pyramid level higher than level 0, both the motion vectors and
accuracy masks can be progressively refined by using bilinear interpolation as the
initial guesses of the next pyramid level. The accuracy mask values 618 are used as
weights in the interpolation such that motion vectors with high accuracy values will
be weighted more than those with lower accuracy in interpolation.
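A sketch of this accuracy-weighted refinement step is shown below; it is illustrative and assumes a factor-of-two scale between pyramid levels by default.

```python
# Sketch of the pyramid refinement step: the coarse motion field is upsampled to
# the next finer level with bilinear interpolation weighted by the accuracy mask,
# so low-accuracy vectors contribute less to the initial guesses.
import numpy as np
import cv2

def refine_to_next_level(motion, accuracy, scale=2.0):
    """motion: (h, w, 2) field at a coarse level; accuracy: (h, w) weights in [0, 1]."""
    h, w = accuracy.shape
    new_size = (int(w * scale), int(h * scale))
    weighted = motion * accuracy[..., None]
    up_weighted = cv2.resize(weighted, new_size, interpolation=cv2.INTER_LINEAR)
    up_weight = cv2.resize(accuracy, new_size, interpolation=cv2.INTER_LINEAR)
    up_motion = up_weighted / np.maximum(up_weight[..., None], 1e-6)
    return up_motion * scale, up_weight    # vectors scale with the resolution
```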
[0070] For the pyramid level 0, motion vectors need to be adjusted, taking pixel
occlusion into consideration. An intermediate mask is created by projecting pixels
from the second image frame to the first image frame using backward motion vectors.
This mask is then dilated to generate a traceable mask of the first image frame. Those
pixels that cannot be traced from the second image back to the first image are
considered occlusion pixels. For occlusion pixels, their motion vectors are adjusted
using color segmentation-based averages. This step is repeated in the reverse direction,
from the first image frame to the second image frame. If warping 606 is applied,
the resulting bi-directional motion vectors 616 and accuracy masks 618 are warped
back to normal by an inverse warping process 610.
[0071] The pyramid structure makes it possible to select the performance of the local
motion estimation methods, ranging from maximum accuracy (using all of the multiple
methods at all pyramid levels) to maximum efficiency (using a single method at
level 0). A 4-digit binary word called "flag" 624 is used to mark the combination of
methods selected. For instance, a flag 624 having a value of 1010 can indicate that
method #2 and method #4 are selected but method #1 and method #3 are not. More
accurate results are achieved by selecting more methods, but at the same time increasing

computational cost. At the upper levels, more methods can be used. As a result, higher
accuracy is achieved at upper levels without a significant increase in computational
costs. If upper level motion estimates are highly accurate, they can also improve the
accuracy of lower level motion estimates so that fewer methods may be needed for
lower levels.
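A small sketch of the flag is shown below; the bit ordering (least significant bit corresponding to method #1) is an assumption chosen so that the value 1010 selects methods #2 and #4, matching the example above.

```python
# Small sketch of the 4-digit binary method-selection flag. Bit ordering is an
# assumption: the least significant bit corresponds to method #1, so flag 0b1010
# selects methods #2 and #4, matching the example in the text.
def selected_methods(flag):
    return [i + 1 for i in range(4) if flag & (1 << i)]

assert selected_methods(0b1010) == [2, 4]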
[0072] In the local motion estimation process 218 for a 3D motion picture, pyramid
representations of L and R images can be obtained from the motion based resolution
enhancement stage, and a voting-based motion estimation method similar to the
pyramid voting-based local motion estimation 608 can be performed separately in L
images and R images. However, because there exist strong correlations between L and
R images, the motion estimates from one eye can be used as accurate initial motion
estimates for the second eye. The motion vectors obtained from the L images can also
be used as the initial motion estimates for the pyramid voting-based motion estimation
process of the R images.
Half-Motion Vector Generation
[0073] A half-motion vector generation process 120 is then applied to the image
sequences to convert the image sequence to a higher frame rate by synthesizing new
frames at time intervals based on the desirable frame rate. Half motion vectors 620 are
created based on bi-directional motion vectors 616 and the accuracy masks 618 to
indicate the movement of pixels from an existing frame to a synthesized frame or
otherwise. The term "half" is used here to indicate that the time interval of a
synthesized frame is somewhere between two original image frames, and it does not
restrict a synthesized frame to be created exactly half way between two original image
frames. In a similar way, half accuracy masks 622 are also created with respect to the
time interval of a synthesized frame, which can be used later in frame interpolation.
[0074] Half-motion vector generation 120 generally assumes that a synthesized frame
is to be created at a time interval in between a first image frame and a second image
frame. However, half-motion vector generation 120 can be used when the frame rate is
more than doubled and more than one synthesized frame is to be created between
two image frames. A synthesized frame is created by projecting all pixels of an
existing frame to the synthesized frame based on forward or backward motion
vectors. For example, new forward motion vectors called "half forward motion
vectors" are assigned from the first frame to the synthesized frame, and also from the
synthesized frame to the second frame. Similarly, new backward motion vectors

called "half backward motion vectors" are assigned from the second frame to the
synthesized frame and from the synthesized frame to the first frame. If a pixel in the
synthesized frame has both a half forward motion vector and a half backward motion
vector, it is considered accurate and marked accordingly in a half accuracy mask
corresponding to the synthesized frame. If only one of the half motion vectors 620 exists,
the pixel is considered an occlusion pixel and marked as such in the half accuracy
mask 622. If a pixel has neither half forward nor half backward motion vectors, it is
considered inaccurate and marked accordingly in the half accuracy mask. The
missing half motion vectors can be estimated from averaging the half motion vectors
of neighboring accurate pixels. The resulting half accuracy mask can be used to
locate potential artifact pixels in subsequent processing stages.
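The following sketch illustrates half-motion vector generation for a single synthesized frame at fractional time t; the nearest-pixel projection and the mask encoding are simplifications of the process described above.

```python
# Sketch of half-motion vector generation: pixels of the first frame are projected
# to the synthesized frame at fractional time t, producing "half forward" vectors;
# the same is done from the second frame with backward vectors. Pixels reached by
# both are marked accurate, by only one are marked occlusion, by neither inaccurate.
import numpy as np

ACCURATE, OCCLUSION, INACCURATE = 2, 1, 0

def half_vectors(forward, backward, t=0.5):
    h, w = forward.shape[:2]
    half_fwd = np.full((h, w, 2), np.nan, np.float32)
    half_bwd = np.full((h, w, 2), np.nan, np.float32)
    gy, gx = np.mgrid[0:h, 0:w]

    # Project frame-1 pixels forward by a fraction t of their motion.
    fx = np.clip(np.round(gx + t * forward[..., 0]).astype(int), 0, w - 1)
    fy = np.clip(np.round(gy + t * forward[..., 1]).astype(int), 0, h - 1)
    half_fwd[fy, fx] = (1.0 - t) * forward[gy, gx]     # remaining motion toward frame 2

    # Project frame-2 pixels backward by a fraction (1 - t) of their motion.
    bx = np.clip(np.round(gx + (1.0 - t) * backward[..., 0]).astype(int), 0, w - 1)
    by = np.clip(np.round(gy + (1.0 - t) * backward[..., 1]).astype(int), 0, h - 1)
    half_bwd[by, bx] = t * backward[gy, gx]            # remaining motion toward frame 1

    has_f = ~np.isnan(half_fwd[..., 0])
    has_b = ~np.isnan(half_bwd[..., 0])
    mask = np.where(has_f & has_b, ACCURATE,
                    np.where(has_f | has_b, OCCLUSION, INACCURATE))
    return half_fwd, half_bwd, mask
```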
Frame Interpolation
[0075] Frame interpolation 122 is then performed using the half motion vectors 620
and the half accuracy masks 622. In a preferred embodiment of the present invention,
frame interpolation 122 uses the pyramid representations of two adjacent images and
their accuracy masks 618.
[0076] Frame interpolation 122 is generally applied to create a synthesized frame
between a first image frame and a second image frame. A blank synthesized frame is
first created in the same pyramid structure. The task of frame interpolation is to fill
each pixel of the synthesized frame at each pyramid level with correct color values.
Starting from a target pixel position in the synthesized frame of a certain level, the
values of pixels at the same position and at the same level in both first and second
frames are compared. If their values are sufficiently close, the averaged color is
assigned to the pixel of the same position in the synthesized frame. This pixel is also
marked as accurate.
[0077] If a color match cannot be found, the target pixel in the synthesized frame is
projected to the first frame and to the second frame using corresponding half motion
vectors and one color value from each frame is obtained. If those two color values are
sufficiently close, the average of those two color values is computed and assigned to
the target pixel. In addition, the pixel is marked as accurate. If those two color values
are not sufficiently close, the projected pixel whose color values are shared by
a majority of its neighboring pixels is selected, and the pixel is marked as an occlusion pixel.
[0078] If the target pixel has one half motion vector, either the half forward motion
vector or the half backward motion vector, the pixel is assigned with the color values

of the pixel projected using the available half motion vector. This pixel is marked as
an occlusion pixel.
[0079] If the target pixel has no half motion vectors, accurate pixels are searched
within a group of pixels at an upper pyramid level. Once found, the target pixel is
assigned with color values obtained from a bilinear interpolation of those accurate
pixels. If no accurate pixels are found within the neighborhood, the radius of the
group of pixels is gradually expanded until it includes at least some accurate pixels
and the target pixel is assigned with a bilinear interpolation of those accurate pixels.
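The per-pixel decision logic of paragraphs [0076]-[0079] can be sketched for one pyramid level as follows; the color tolerance is illustrative, and the occlusion fallback is simplified relative to the neighborhood-majority rule described in the text.

```python
# Sketch of the per-pixel interpolation decision at one pyramid level. Thresholds
# are illustrative, and the occlusion fallback is simplified relative to the text.
import numpy as np

def interpolate_pixel(f1, f2, half_fwd, half_bwd, x, y, color_tol=8.0):
    h, w = f1.shape[:2]
    c1, c2 = f1[y, x].astype(np.float32), f2[y, x].astype(np.float32)
    if np.all(np.abs(c1 - c2) < color_tol):             # same position, similar color
        return 0.5 * (c1 + c2), "accurate"

    have_b = not np.any(np.isnan(half_bwd[y, x]))        # projection toward frame 1
    have_f = not np.any(np.isnan(half_fwd[y, x]))        # projection toward frame 2
    if have_b:
        bx = int(np.clip(np.round(x + half_bwd[y, x, 0]), 0, w - 1))
        by = int(np.clip(np.round(y + half_bwd[y, x, 1]), 0, h - 1))
        p1 = f1[by, bx].astype(np.float32)
    if have_f:
        fx = int(np.clip(np.round(x + half_fwd[y, x, 0]), 0, w - 1))
        fy = int(np.clip(np.round(y + half_fwd[y, x, 1]), 0, h - 1))
        p2 = f2[fy, fx].astype(np.float32)

    if have_b and have_f:
        if np.all(np.abs(p1 - p2) < color_tol):          # projections agree
            return 0.5 * (p1 + p2), "accurate"
        return p1, "occlusion"                           # simplified majority-vote fallback
    if have_b:
        return p1, "occlusion"
    if have_f:
        return p2, "occlusion"
    return None, "inaccurate"                            # to be filled from accurate neighbors
```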
[0080] The frame interpolation 122 process can create a synthesized image frame
from two existing image frames. This process is repeated for the entire image
sequence until a desirable frame rate is reached. The result is a temporally enhanced
image sequence that contains both the existing image frames and synthesized image
frames. Depending on the desirable frame rate, not all existing image frames may be
kept in the resulting enhanced image sequence. For example, to convert a standard
24fps motion picture sequence to an output frame rate of 48fps, a synthesized frame
can be created in between each pair of neighboring image frames. As a result, nearly
half of the image frames of the resulting enhanced image sequence are synthesized
frames. In other cases where the output frame rate is not exactly a multiple of the
original frame rate, a higher percentage (up to 100%) of image frames of an enhanced
image sequence will be synthesized image frames.
[0081] The temporally enhanced motion picture can be synchronized with the original
audio track when displayed at a higher frame rate. One or more additional synthesized
frames may need to be added after the last existing frame of the image sequence to
assist with synchronization. Those synthesized end frames are typically created from
the last image frame. There are a number of ways of creating synthesized frames from
a single image frame. One method is to create a duplicate of the frame. A second
method is to generate half forward motion vectors from the image frame to the
synthesized frames and fill all pixels of the synthesized frames.
[0082] A frame interpolation method based on more than two frames can significantly
reduce the temporal inconsistency artifacts. In general, a temporal window with a
length of 2M+1 frames is used to define the range of the image frames used for
generating a synthesized frame. The length of the window can vary according to
motion in an image sequence. A small window length is used for fast motion
sequences and a relatively large window length is used for slow motion sequences.

[0083] Motion estimation is done between a frame immediately before the
synthesized frame and every previous frame, and between a frame immediately after
the synthesized frame and every future frame. The same pyramid voting-based
motion estimation method can be used between each pair of frames to generate both
forward and backward motion vectors.
[0084] Half motion vectors 620 are generated between the synthesized frame and
every other image frame within the temporal window using the same method as
disclosed previously. The pixel value of the synthesized frame can be calculated as a
weighted average of all projected pixels that exist.
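A sketch of this weighted-average interpolation follows; the per-frame weights, the bounds handling and the function name are assumptions, and only projections for which a half motion vector exists contribute to the average.

    import numpy as np

    def interpolate_pixel_multiframe(y, x, window_frames, half_vectors, weights=None):
        # window_frames: frames (H x W x 3) inside the 2M+1 temporal window
        # half_vectors:  (dy, dx) half motion vectors from the synthesized frame
        #                to each window frame, or None where no vector exists
        # weights:       optional per-frame weights, e.g. favouring nearby frames
        if weights is None:
            weights = [1.0] * len(window_frames)
        acc = np.zeros(3, dtype=float)
        wsum = 0.0
        for frame, vec, w in zip(window_frames, half_vectors, weights):
            if vec is None:
                continue                       # pixel not visible in this frame
            dy, dx = vec
            h, wid = frame.shape[:2]
            py, px = y + dy, x + dx
            if 0 <= py < h and 0 <= px < wid:  # keep projections inside the frame
                acc += w * frame[py, px]
                wsum += w
        return acc / wsum if wsum > 0 else None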
[0085] For image sequences that contain fast motion, it becomes more challenging to
enhance temporal resolution because motion estimates computed using the method
described above become less accurate. Some embodiments of the present invention
provide a layered approach 155 for dealing with scenes with fast motion. The layered
approach 155 segments images into different motion layers and groups image
elements that share similar motion into the same layer.
[0086] The half motion vectors 620 estimated from the previous methods can be used
for motion segmentation. Each original image frame is divided into small blocks. The
motion representing each block is estimated by an affine motion model, which is
calculated through the least-squares algorithm based on the motion vectors of all pixels of
the block. The fitting error is calculated to evaluate if the motion model is a good fit
or a poor fit. For all good fitting blocks, their affine motion parameters are collected.
Clustering algorithms are applied to cluster the affine motion models into a small
number of classes representing the dominant motions in the whole image.
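The block-wise affine fitting described above can be illustrated with the following Python sketch; the 6-parameter affine model and the least-squares solve follow the description, while the function name, the RMS error measure and the clustering comment are assumptions.

    import numpy as np

    def fit_block_affine(xs, ys, mvx, mvy):
        # xs, ys:   pixel coordinates within the block (1-D arrays of length N)
        # mvx, mvy: horizontal and vertical motion-vector components per pixel
        ones = np.ones_like(xs, dtype=float)
        G = np.stack([xs, ys, ones], axis=1)           # N x 3 design matrix
        # Solve two least-squares problems, one per affine component.
        ax, _, _, _ = np.linalg.lstsq(G, mvx, rcond=None)
        ay, _, _, _ = np.linalg.lstsq(G, mvy, rcond=None)
        residual = np.concatenate([G @ ax - mvx, G @ ay - mvy])
        rms_error = float(np.sqrt(np.mean(residual ** 2)))
        return np.vstack([ax, ay]), rms_error          # 2 x 3 affine parameters

    # Blocks whose rms_error falls below a chosen threshold are treated as good
    # fits; their parameters are then clustered (e.g. with k-means) into the
    # dominant motion classes of the frame.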
[0087] With the dominant motion classes, each pixel is mapped to the closest motion
class based on its motion vector. Then all pixels that belong to the same motion class
are grouped into a segment. If a segmented region is too small, it can be merged
with a larger neighboring segment. If two segments have similar affine motions, they
can be merged into one region. If a region has an affine motion model fitting error too
large to stay as one region, it can be split into two regions, each having a distinct
affine motion model. This segmentation process is repeated until the regions become
stable or the maximum number of iterations is reached.
[0088] A final motion segmentation mask is created for each frame containing the
segmentation index for all pixels. The segmentation mask defines the layer structure
for image pixels of each frame.
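As an illustration of the two preceding paragraphs, the sketch below assigns each pixel to the dominant motion class whose affine model best predicts its motion vector and returns the resulting segmentation mask; the merging and splitting of small or poorly fitting regions is omitted, and all names are illustrative.

    import numpy as np

    def motion_segmentation_mask(mv_field, affine_classes):
        # mv_field:       H x W x 2 array of per-pixel motion vectors (u, v)
        # affine_classes: list of 2 x 3 affine parameter matrices
        # returns:        H x W mask of motion-class indices
        h, w = mv_field.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
        errors = []
        for A in affine_classes:
            pred = coords @ A.T                 # H x W x 2 predicted motion
            errors.append(np.linalg.norm(pred - mv_field, axis=-1))
        return np.argmin(np.stack(errors, axis=0), axis=0)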

[0089] The motion segmentation results, including the motion segmentation mask and
the affine motion model parameters, can be used as the initial estimation for the
segmentation of the next frame. With the initial estimation, the same processes as
described previously are repeated until the next frame is segmented. This process is
repeated until the motion segmentation masks are created for all original images.
[0090] The motion segmentation masks for a synthesized frame can be interpolated
from the motion segmentation masks of original frames using an "AND" or "OR"
operation. Within each layer, a global motion homography matrix for the layer is
calculated and each frame is warped to its neighboring frame based on the global
motion homography matrix before applying pyramid motion tracking. Finally new
layered images are generated through interpolation at each layer using tracked motion
vectors.
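A minimal sketch of the per-layer global warp described above, assuming OpenCV is available and that point correspondences for the layer have already been gathered; the RANSAC threshold and function names are illustrative.

    import cv2  # OpenCV, assumed available
    import numpy as np

    def warp_layer_to_neighbor(layer_img, layer_mask, src_pts, dst_pts):
        # Estimate a global homography for one motion layer from point
        # correspondences and warp the layer toward its neighbouring frame.
        H, inliers = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 3.0)
        h, w = layer_img.shape[:2]
        warped_img = cv2.warpPerspective(layer_img, H, (w, h))
        warped_mask = cv2.warpPerspective(layer_mask, H, (w, h),
                                          flags=cv2.INTER_NEAREST)
        # Pyramid motion tracking then only has to recover the small residual
        # motion between warped_img and the neighbouring frame within this layer.
        return warped_img, warped_mask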
[0091] Based on the motion segmentation masks, a new synthesized image frame is
created through composition of all layered images.
[0092] Compared with the layered approach 155, the frame interpolation method 122
described above can be considered as a single layer approach. Global motion
estimation is applied for each layer in a layered approach so that the resulting motion
vectors are relatively accurate. The layered approach can reduce artifacts occurring in
the edge and occlusion areas.
Artifact Repair by Temporal Consistency Check
[0093] Any visible artifacts can be repaired through temporal consistency check 124,
224 equipped with an automated occlusion detection and occlusion fill capability.
The synthesized frames may contain artifact pixels, and the most visible ones are
those that are inconsistent with their neighboring frames including original image
frames. The temporal consistency check 124, 224 can automatically identify and
repair temporally inconsistent artifacts by checking temporal consistency.
[0094] Artifact pixels of a synthesized frame are those that are marked as anything
but "accurate" in corresponding half accuracy masks generated from the previous
process. Those artifact pixels can further be grouped by their "visibility" within a
temporal window. An artifact pixel is "visible" from another frame if it can be
projected onto that frame using a half motion vector with a sufficiently small
matching error.
[0095] Artifact pixels can be grouped by their visibility. For example, the first group
can include pixels which are visible from a majority of frames within a temporal
window. The second group can include those pixels which are visible from fewer than
50% of past frames within the window. The third group can include those pixels that
are visible from fewer than 50% of future frames within the window. The artifact
pixels of the first group can be considered non-occlusion pixels, while the other two
groups can be considered occlusion pixels. Half accuracy masks can be used to group
artifact pixels. Artifact pixels as identified can be automatically removed.
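The grouping of artifact pixels by visibility can be sketched as follows; the counts of "visible" frames are assumed to have been computed beforehand with the half motion vectors and a matching-error threshold, and the group labels are illustrative.

    def classify_artifact_pixel(visible_past, visible_future, n_past, n_future):
        # visible_past / visible_future: number of past / future frames in the
        # temporal window from which the pixel can be projected with a small
        # matching error.
        visible_total = visible_past + visible_future
        n_total = n_past + n_future
        if visible_total > n_total / 2:
            return "non_occlusion"        # group 1: visible from most frames
        if visible_past < n_past / 2:
            return "occlusion_past"       # group 2: hidden in most past frames
        return "occlusion_future"         # group 3: hidden in most future frames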
[0096] Artifact pixels that are inconsistent temporally produce the most objectionable
artifacts. Although those artifacts can be repaired by a skilled user interactively with
conventional painting, cloning or compositing software tools, the process is
nevertheless time consuming and labor-intensive. The present invention discloses a
far more efficient method of repairing temporally inconsistent artifacts.
[0097] Half accuracy masks 622 can be used to group artifact pixels. For a pixel of
the first group, pixel-matching errors are calculated with the pixel's projected pixels
in past and future frames within the temporal window. If the matching error for a
certain frame is noticeably larger than others, the pixel is marked as an artifact pixel.
For a pixel of the second group, the pixel matching error is checked against its projected
pixels in future frames. If the matching error for a certain frame is noticeably larger
than others, the pixel is marked as an artifact pixel. For a pixel of the third group, the
temporal consistency check is done to past frames within the temporal window and
the pixel is marked as an artifact pixel if the matching errors are large. For all
identified artifact pixels, repair is performed by automatically replacing the pixel with
the average color values of all projected pixels in corresponding past and/or future
frames.
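A hedged Python sketch of this consistency test and repair follows; the outlier criterion ("noticeably larger") is approximated here by a ratio against the median matching error, which is an assumption rather than part of the disclosure.

    import numpy as np

    def repair_if_inconsistent(pixel, projected_colors, outlier_ratio=2.0):
        # projected_colors: colors of the pixel's projections in the past and/or
        # future frames of the temporal window (one row per frame).
        colors = np.asarray(projected_colors, dtype=float)
        errors = np.linalg.norm(colors - np.asarray(pixel, dtype=float), axis=1)
        median_err = np.median(errors)
        if median_err > 0 and errors.max() > outlier_ratio * median_err:
            # Temporally inconsistent: replace with the average projected color.
            return colors.mean(axis=0), True
        return np.asarray(pixel, dtype=float), False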
[0098] The artifact repair process 224 for a 3D motion picture can repair artifacts
from inconsistencies between L and R images, and can identify such artifacts by
checking L-R consistency. For example, a stereo matching process is applied to each
L-R image pair, and depth information can be estimated. The resulting depth maps are
temporally interpolated to create depth maps for synthesized frames. Using
interpolated depth maps, the pixels in the synthesized L and R images are checked
and inconsistent pixels are marked and repaired using corresponding pixels from the
other eye or directly from other frames of the same eye.
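A simple illustration of the L-R consistency check is given below, assuming the interpolated depth information is expressed as a horizontal disparity map from the left image to the right image; the names and the color tolerance are assumptions.

    import numpy as np

    def mark_lr_inconsistent(left, right, disparity, color_tol=10.0):
        # Mark pixels of a synthesized left-eye frame that disagree with the
        # right-eye frame when shifted by the interpolated disparity map.
        h, w = left.shape[:2]
        mask = np.zeros((h, w), dtype=bool)
        for y in range(h):
            for x in range(w):
                xr = int(round(x - disparity[y, x]))   # corresponding column in R
                if 0 <= xr < w:
                    diff = np.linalg.norm(left[y, x].astype(float) -
                                          right[y, xr].astype(float))
                    mask[y, x] = diff > color_tol
        # Marked pixels are repaired from the other eye or from neighbouring
        # frames of the same eye, as described above.
        return mask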
[0099] The artifact pixels that survive the previous temporal consistency
check 124, 224 can be further reduced or removed interactively by, for example,
merging pixels of different versions. Different versions of a pixel are created under
different flags and merged to reduce artifacts. In a GUI environment, a user can
review different versions of enhanced image sequences and select the best version or
merge a number of selected versions together to generate a new version containing
fewer artifacts.
[00100] As described in more detail below, artifacts may also be repaired by user
interaction 126, such as providing access to the data on a processor-based device
having computer-executable code that allows users to repair the data.
Quality Control
[00101] The Quality Control (QC) 128, 228 is a process in which a trained human
operator visually examines the enhanced image sequences to decide if the resulting
quality is acceptable. As shown in Fig. 4, the QC process 128, 228 can be performed
on a QC station 420. The QC station 420 can be, for example, a workstation equipped
with a high-quality display system, a sufficiently large disk storage and software that
allows the operator to perform all functions needed for QC inspection. The display
system can be a 2D display system or a stereoscopic 3D display system or a system
that supports both 2D and 3D. The enhanced image sequences are sent to the QC
station 420 by a central server that hosts an Intelligent Controller (IC) 406 once the
required processing is completed, and the tracking software tells the operator which
sequences are available for QC inspection. If the operator identifies an artifact, he or
she can report the nature of the issue back to the IC 406 through the QC software that
provides necessary user inputs. The sequence is then reprocessed with adjustments
based on the inputs of the operator, which may require certain frames to be re-processed
by the automatic processing using a new set of parameters or may require
a human operator to use different artifact removal methods. The re-processed
image data can be brought back by the IC 406 to the QC station 420 for further
inspection. This process is repeated until the resulting quality is seen as acceptable
and then the processing is completed. The operator can notify the IC 406 that the
sequence is acceptable, and the image data is ready for output.
[00102] A completed and accepted image sequence can be a series of files on a
central server, such as IC 406. These files have a standardized file format and serve as
the source master files for the re-mastered motion picture. Upon output 130, 230 or
release of a motion picture, the source master files are converted to release master
files for cinematic releases. The format of release master files may depend on the
release display platform deployed in a theatre. For a standardized digital cinema

release, the release master files may be directly used for exhibition. For other non-
standardized digital releases, the release master files may be converted to an
appropriate digital file format for exhibition. For film-based exhibition, the release
master files (or the source master files) can be recorded onto an appropriate film
format for a film print release. There are a myriad of digital file types and compression
schemes to which the source master files can be converted to produce release
master files. In a typical release process, the release master files need to be written to
a tape or external disk storage for transporting to cinemas or for archival purposes.
The IC can schedule the work required for file format conversion based on a priority
scheme established by users. An example of a priority scheme could be to prioritize
certain sequences, like movie trailers, that appear before the main motion picture
presentation. The specific release master format, the compression scheme used and
the priority scheme for each cinema presentation are tracked by the IC so that the
location and status of each motion picture release is known to the overall Production.
[00103] The digital re-mastering methods can be implemented as a highly
automated production computing system. Fig. 4 illustrates one embodiment of a
computing system 400 implemented as a combination of two subsystems: a front-end
system 402 that mainly supports applications that require user interaction and a back-
end system 404 that can be totally automated. Both subsystems use a combination of
networked software and hardware. Both hardware and software are monitored and
controlled through a central software entity or Intelligent Controller (IC) 406 on a
server.
[00104] The back-end subsystem 404 includes the IC 406, a central data storage
408 and many distributed render clients 410a-n forming a render farm. A data
input/output 415 may be associated with the IC 406 in the back-end subsystem to
provide access to the IC 406 for loading computer-executable code such as software
and otherwise configuring the IC 406. The back-end subsystem 404 may also include
cluster switches 414a-n and a backbone switch 417 for controlling data flow over the
network. The render farm may have any suitable number of render clients 410a-n.
Most computing tasks can be carried out in the back-end subsystem 404, and those
tasks can include: automated scene segmentation, motion-based spatial resolution
enhancement, learning-based spatial resolution enhancement, and a majority of tasks
of temporal resolution enhancement. The IC 406 performs various control and

tracking functions. A number of daemons run on this server, continuously monitoring
for work or updates required and performing them independently.
[00105] The IC can perform functions in three main areas:
• Monitor the physical hardware and data, tracking all system resources being
used and available;
• Respond to queries from users providing real-time reports; and
• Launch processes on data as actions are required and as system resources
become available.
[00106] The IC 406 can internally represent the state of all data and processes run
or being run for a production. All processes running, whether automatic machine tasks
or manual human operator actions, report their progress and status back to this
IC 406. It monitors the central data storage 408 for data that is newly available as well
as data that has been processed and may be ready for tape backup. The IC 406 can be
queried to obtain accurate real-time information as to the progress of a production or a
single shot.
[00107] IC 406 information can be accessed remotely from any machine with
network access to the IC 406. For example, local front-end workstations 412a-n or
external computers or communication devices can remotely communicate with the IC
406 over the Internet. The information accessed is user-specific, enabling a level of
security and access appropriate to each individual. The information is reported through
fixed reports or a proprietary 'query builder' that allows users to create their own
reports. They can control the search criteria for the results and also set what
information they wish to have returned for the matched objects.
[00108] The IC 406 can track processing at three general levels: Frame, Shot, and
Production.
[00109] Examples of attributes tracked at a frame level include:
• render clients used to run the various processes on the frame;
• process completion status, e.g. waiting to run, completed, and running;
• times that processes occurred or were queued for each frame; and
• frame dimensions, file type, and bit depth.
[00110] Examples of attributes tracked at a shot level include:
• processes that have been or are needed to be performed on the shot;

• shot information, i.e. length, other names by which the shot may be referred to by
external productions or companies, descriptions, and keycode on film when
recorded;
• parameters set, specific to those processes;
• changes requested for a shot by an approver;
• identification of users who manipulated the shot;
• times of completion of any stages or processes to the shot;
• version control, e.g. signing in and out by users;
• shipping information; and
• film-recording information such as times, and recorder used.
[00111] Examples of attributes tracked at a production level include:
• shipping information;
• users' assigned work;
• users' past completed work; and
• production statistics, e.g. completion percentage, estimated completion times,
and frequency of multiple versions of shot.
[00112] The IC 406 is also responsible for launching all the processes applied to
the data. When system resources become available the IC 406 allocates the processing
to the many distributed render clients 410a-n, preferably in an optimal manner based
on need and resources. Each render client 410a-n, once instructed to run a job, is
responsible for pulling all image data it requires from the central data storage 408,
executing required operations on each frame and pushing the enhanced image data to
a temporary location at the central data storage 408. For a job that was distributed to
multiple render clients 410a-n, the IC 406 assembles the rendered segments from
render clients 410a-n into a continuous shot. The IC 406 also checks the integrity of
the assembled data for occasional missing frames or incomplete frames in the shot. If
missing frames or incomplete frames are discovered, the IC 406 sends a request to the
same render clients for re-rendering of those frames. The communication between the
IC 406 and render clients 410a-n is crucial for render efficiency.
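For illustration, a small Python sketch of how the IC might verify an assembled shot for missing or incomplete frames before re-queuing them; the file naming pattern and the size check are assumptions, not part of the disclosed system.

    import os

    def find_missing_frames(shot_dir, first_frame, last_frame,
                            pattern="frame.{:04d}.tif", min_bytes=1024):
        # Return frame numbers that are absent or suspiciously small after the
        # render clients have pushed their segments back to central storage.
        missing = []
        for n in range(first_frame, last_frame + 1):
            path = os.path.join(shot_dir, pattern.format(n))
            if not os.path.exists(path) or os.path.getsize(path) < min_bytes:
                missing.append(n)
        return missing

    # The IC would then re-queue only the returned frame numbers with the
    # render clients that originally produced them.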
[00113] The IC 406 tracks the current state of each render client 410a-n and
constantly monitors for available processors. In the eventuality of failure of a render
client 410a-n, the IC 406 raises an alert for repair. It reroutes the job to other available
render clients 410a-n for processing. A diagnostics process ensures that there is no

loss of data during the transfer. If the IC 406 experiences a failure, the state of the
system before malfunction is preserved. In one embodiment of the present invention,
the IC 406 re-starts by killing all processes that are running on render clients 410a-n
and re-assigns jobs to each render client 410a-n. In another embodiment, the IC 406
polls the render clients 410a-n for their status, finds their current states and resumes
the control. This is a more complicated re-start scheme, but no re-rendering of data is
required.
[00114] In one embodiment of the present invention, the IC 406 comprises the
following software components:
[00115] Scheduler - monitors for processes needing to be run on data. It manages
render job distribution and assigns jobs to specific render clients 410a-n based on a
pre-determined load-balancing scheme. If there are multiple available candidates, the
IC 406 checks the network traffic load distribution among render client clusters and
selects a render client (or render clients) 410a-n from the cluster (or clusters) with the
lowest traffic load. For each job in a queue, it may assign it to a single render client
410a-n, especially when there are more jobs waiting in the queue than the number of
available render clients, or it may assign the job to multiple render clients 410a-n,
especially when the job needs to be completed as quickly as possible.
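The Scheduler's selection of render clients by cluster traffic load can be sketched as follows; the data structures, the notion of an "urgent" job and the split policy are illustrative assumptions.

    def pick_render_clients(job, clusters, queue_depth):
        # clusters:    {cluster_name: {"traffic": float, "idle_clients": [ids]}}
        # queue_depth: number of jobs currently waiting in the queue
        ranked = sorted(clusters.items(), key=lambda kv: kv[1]["traffic"])
        idle = [c for _, info in ranked for c in info["idle_clients"]]
        if not idle:
            return []
        if queue_depth >= len(idle) or not job.get("urgent", False):
            return idle[:1]          # many jobs waiting: one client per job
        return idle                  # urgent job: split across multiple clients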
[00116] Auto Launch - monitors for data becoming available for processing.
Typically, the data is assembled and a command is launched on the
data. Actions or processes are set to be run on any number of shots (or all shots of a
film) at the start of the production. Because the processes are self-setting, that is, they
intelligently analyze the data to pick optimal settings and parameters independently,
no human interaction is required. The Auto Launch then monitors the physical
hardware and network for data to become available. Once all components are
available it submits a request to the Scheduler to launch the processes on that
particular data. This optimizes the workflow, resulting in no lost time between data
being ready to process and the launching of the required processes.
[00117] File Setup - monitors data that is complete and is required in another
physical location or format. Often during a production, data is required to be available
in different formats. The File Setup daemon monitors files and their status for any
different versions or formats needed. If required it will process the data into the
necessary format and also transfer it physically to another location. This has some
useful applications. Primarily it optimizes the workflow by having all data available
in all required locations and formats in the minimum amount of time. It can also be
used to create intelligent caching on the overall system so as to improve network
performance and speed up user interaction. An example of this is that the IC 406
knows which data a user will be working on the next day and which front-end
workstation will be used. A proxy version of the data can be transferred locally to the
workstation in off-peak hours to be available immediately for the user and to
eliminate the resource use during a busier time.
[00118] Tape Writer - monitors for finished data that is required to be backed up
to tape and writes it. The daemon is in constant contact with the other elements of the
IC 406 and so is aware when data is both required and available for tape backup. It
can independently write the data to a tape and report the relevant information of tape
name and write times back to the central database.
[00119] The front-end subsystem 402 can include a number of computer
workstations 412a-n and a number of QC stations 420a-n, each capable of being
manned by an operator or artist. The front-end subsystem may also include a network
switch 421 for controlling and otherwise assisting dataflow through the network. The
workstations 412a-n are standard desktop computers and run a combination of custom
software as well as commercially available applications. The computing tasks
performed at the front-end subsystem 402 include inputting of EDL information in
scene segmentation, quality control in both the spatial and temporal resolution
enhancement processes and artifact repair.
[00120] All applications running on the front-end machines are connected through
a network connection to the back end control software. Any changes or updates made
are reported back to the back end processes where the state of each shot and the
overall production is stored. Examples of this communication include the signing out
and signing back in of data, completion of a human manual task on data etc. The
operators on the front-end machines can also query the IC 406 explicitly to obtain
information regarding the processing and status of the production.
[00121] One software application provided by the front-end workstations 412a-n
is primarily a tool for artifact repair. A human operator can highlight the problem area
and use a number of methods to remove the artifact. The methods include:
• manual painting, either with information from the same frame or another
frame from the sequences;

• automated painting that uses either or both spatial and temporal analysis to
remove unwanted results; and
• a combination of manual painting and automatic analysis that intelligently
provides suitable data to be painted in by the operator, removing the artifact.
[00122] The image data flow of a typical motion picture re-mastering production
between the front-end and back-end subsystems is depicted in Fig. 5. The overall
workflow is as follows.
[00123] Scene editorial information, such as an edit description list (EDL) file, can be input
into the IC. This enables the 'Auto Launch' to map the input digital frames from the
original production to discrete individual shots that will run through the system. The
digital data that is input may be, for example, a series of numbered files scanned from
film (e.g. frame.0001.cin, frame.0002.cin ...) or could be arbitrarily named files from
a digital color grading process, where there are many different types of file prefixes
and numbering schemes.
[00124] The IC for this production begins to monitor the system in preparation for
performing tasks. For example, the IC may determine whether data and system resources
are available to run the re-mastering processes and whether data needs to be moved or
otherwise reformatted.
[00125] After set-up processes, image data from the source production is provided
to the central data storage in step 510 where it can be seen by the IC. The data can
arrive in many ways such as a tape format, disk drives or arrays, directly from a
scanner output, etc.
[00126] In step 515, the IC notices the data is available and consults the editorial
information input for this particular production. If some required grouping of data is
complete on disk (e.g. all frames from a shot), then the source data is segmented into
groupings of similar frames called shots based on the editorial information. Once the
original data is divided into a complete shot the IC queues the shot for re-mastering,
known as a 'job', to be run on the distributed processors. The 'Scheduler' observes
that there are jobs queued for processing and so splits the shot among the render
clients. This means taking a shot and dividing it up to run in pieces, separately on
different remote processors.
[00127] In step 520, the remote processors or render clients, receive their jobs and
report back that they are beginning the processing. They copy the shot data required
to their local drives, for example the frames from the range of the shot they have been

assigned. The render clients can automatically analyze the data in order to decide on
the best procedure to run and the optimal settings for those processes. Once complete
the render client runs the required processes at the optimal settings for that data.
[00128] In step 525, the render clients transfer back the finished data to the server
and report to the IC that they are finished processing and that the range of frames
assigned to them is ready for the next step of the workflow. They also report to the IC
that they are available for more work.
[00129] Alternatively, the analysis processes are separated from the executing
processes. For example, an analysis job being queued with the IC is run first on the
render clients and the results are passed back to the IC. From this point the actual
processing is then queued with the parameters established by the analysis and the IC
then splits and assigns the re-mastering processes to the remote processors. This can
add efficiency and consistency in analyzing once per shot, trading off against sub-shot
adjustment in parameters for possibly more accurate results if there is a great variance
within a shot.
[00130] In step 530, the IC transfers preview data of the re-mastered frames to a
front-end quality control workstation (QC stations) for quality assurance and
inspection. All work and data flow up to this point has occurred in the back end of the
system. At this point the work is passed to the front end. The IC informs the user or
System Manager that there is data waiting to be viewed. A trained quality control
operator views the data and has a few options based on their findings. They then must
tell the IC their decision, usually by setting the status of a shot within the front-end
software used to evaluate the data.
[00131] In step 532, the operator can inform the IC that the data is accurately
processed and complete and therefore is approved or decide that the data needs
additional automated processing and the nature of the re-processing needed.
[00132] If additional processing is indicated, the shot is then queued again for
analysis and processing with the suggestions of the operator translated by the IC into
parameter influences for the automatic analysis in step 535. The job is given a higher
priority to run sooner and faster so as to not cause bottlenecks in the flow of a project
through the workflow. These two decisions pass the control of the shot back to the
back end of the workflow. The third decision retains the control in the front end.
[00133] If no additional processing is indicated, the data is set to move to storage,
such as tape backup, disk backup, film-recording, digital projection mastering or any
variety of end display manipulation in step 550. The system can then output the data,
such as with a data I/O device.
[00134] In step 538, the operator can decide that the shot requires some front end,
user-assisted fixing of residual artifacts. The IC then transfers data to the necessary
local front-end workstations for user-assisted repair of the shot. The IC assigns any
data marked for human manual artifact repair to available repair operators much as it
would assign work to the remote processors or render clients. The same optimizations
of workflow can be achieved with the human operators by either splitting shots into
multiple jobs across many people to speed up the completion or to make maximum
use of idle workstations.
[00135] In step 540, a repair operator works on their assigned job and, when
complete, submits it back to the IC server.
[00136] In step 542, data can be considered 'approved' by the human operator or,
in step 545, can be cycled back through to the front-end QC stations for further
quality assurance and inspection.
[00137] In step 550, the IC watches for completed jobs and assembles any shots
that have been split across multiple workstations. Once a complete set of frames or a
shot has completed all processing and is approved, the IC monitors for completed
data compared to required shipments and transfers any data that fulfills the criteria to
the final stage. This can include processes such as tape or disk backup for shipping,
film-recording, digital display or any number of display end processes.
General
[00138] The foregoing description of the embodiments, including preferred
embodiments, of the invention has been presented only for the purpose of illustration
and description and is not intended to be exhaustive or to limit the invention to the
precise forms disclosed. Numerous modifications and adaptations thereof will be
apparent to those skilled in the art without departing from the spirit and scope of this
invention.

CLAIMS
What is claimed is:
1. A method for enhancing the quality of a motion picture image sequence, the
method comprising:
receiving an original motion picture image sequence comprising digital
data of a plurality of image frames;
applying a spatial resolution enhancement process and a temporal
resolution enhancement process to the digital image sequence to create an enhanced
image sequence;
wherein the enhancement processes are automatically controlled; and
wherein the enhanced image sequence has a greater frame rate than the
original image sequence and the enhanced image sequence has greater image detail
than the original image sequence.
2. The method of claim 1, further comprising:
synchronizing the enhanced image sequence to an audio track for the
original image sequence.
3. The method of claim 1, wherein the original motion picture image sequence
and the enhanced image sequence are two-dimensional (2D) sequences.
4. The method of claim 1, wherein the original motion picture image sequence
and the enhanced image sequence are three-dimensional (3D) sequences.
5. The method of claim 1, wherein the original motion picture image sequence is
in 3D and the enhanced image sequence is in 2D.
6. The method of claim 1, further comprising:
dividing the original motion picture image sequence into shots; and
performing the enhancement processes on each shot.
7. The method of claim 1, wherein the original motion picture image sequence is
a single shot.
8. The method of claim 1, further comprising:
formatting the enhanced image sequence to a display presentation
format; and
synchronizing the formatted enhanced image sequence to an audio
track for the original image sequence.
9. The method of claim 1, wherein the spatial resolution enhancement process
comprises:

a motion-based spatial resolution enhancement process; and
a learning-based resolution enhancement process.
10. The method of claim 9, further comprising:
applying the motion-based spatial resolution process to a 3D image
sequence, wherein applying the motion-based spatial resolution process comprises:
disparity estimation;
display map regulation; and
detail discovery.
11. The method of claim 9, wherein the learning-based spatial resolution
enhancement process comprises:
generating a codebook comprising codewords, each codeword being
associated with a high-resolution pattern;
applying a clustering analysis to reduce the size of the codebook;
upsizing an original image of the original image sequence to a higher
resolution, the original image comprising a plurality of pixels;
matching each pixel of the upsized image to a codeword; and
replacing each pixel by a central pixel of the high-resolution pattern
associated with the matched codeword.
12. The method of claim 9, wherein the learning-based spatial resolution
enhancement process comprises:
generating a codebook comprising codewords, each codeword being
associated with a high-resolution pattern;
applying a clustering analysis to reduce the size of the codebook;
upsizing an original image of the original image sequence to a higher
resolution, the original image comprising at least one block of pixels;
matching each block of pixels of the upsized image to a codeword;
replacing the block of pixels by a high-resolution pattern associated
with the matched codeword using a transformation process to create an enhanced
block of pixels;
applying a blending process to the enhanced block of pixels; and
applying a temporal filtering process.
13. The method of claim 1, wherein the temporal resolution enhancement process
comprises:
pre-processing;

global motion estimation;
local motion estimation;
half-motion vector generation;
frame interpolation; and
artifact repair by temporal consistency check.
14. The method of claim 13, wherein pre-processing comprises:
generating an edge mask map from each image frame; and
generating a color segmentation map from each image frame.
15. The method of claim 13, wherein each image frame comprises a plurality of
pixels, the global motion estimation comprising:
a. computing a gradient cross correlation matrix for each pixel;
b. calculating at least one feature point for each pixel based on the
gradient cross correlation matrix;
c. matching at least one of the calculated feature points to a feature
point of a next frame in the image sequence;
d. selecting at least four matched feature points;
e. estimating global motion based on the selected feature points;
and
f. iteratively repeating steps d and e by selecting different
matched feature points until a global motion estimate is obtained for each pixel.
16. The method of claim 13, wherein the local motion estimation process
comprises:
receiving motion estimates from the spatial resolution enhancement
process;
using the motion estimates as initial motion estimates;
receiving edge mask maps and color segmentation maps from the pre-
processing process;
using global motion estimates from the global motion estimation; and
computing local motion vectors based on a voting-based method.
17. The method of claim 16, wherein each image frame comprises a plurality of
pixels; and
wherein the voting-based method comprises:
applying at least one local motion estimation method to each
pixel;

computing forward and backward motion vectors for each local
motion estimation method; and
selecting a motion vector for each pixel from the forward and
backward motion vectors using a voting process.
18. The method of claim 17, wherein local motion estimation method comprises at
least one of:
block matching method;
feature propagation method;
modified optical flow method; and
integral block matching method.
19. The method of claim 16, wherein the voting-based method is a pyramid
voting-based method.
20. The method of claim 17, wherein the voting process comprises at least one of:
selecting a motion vector by different edge values;
selecting a motion vector by coherence check;
selecting a motion vector by minimum matching errors;
selecting a motion vector through forward and backward checks; and
selecting a motion vector using color segmentation maps.
21. The method of claim 13, wherein half-motion vector generation comprises:
determining a time interval of a synthesized image frame between a
first image frame and a second image frame;
assigning half-forward motion vectors and half-backward motion
vectors to each pixel of the synthesized frame; and
generating a half-accuracy mask corresponding to the synthesized
frame, the half-accuracy mask marking a status of each pixel.
22. The method of claim 13, wherein frame interpolation comprises:
receiving half-motion vectors and half-accuracy masks generated from
at least two image frames;
creating a synthesized frame based, at least in part, on the half-motion
vectors and half-accuracy masks;
generating missing pixels of the synthesized frame by interpolation and
averaging;
inserting the missing pixels into the synthesized frame; and

generating the enhanced image sequence having the synthesized image
frames.
23. The method of claim 22, further comprising:
maintaining synchronization with an audio track of the original image
sequence by adding at least one synthesized frame.
24. The method of claim 22, wherein the synthesized frame is created using
pyramid representation of a pair of images.
25. The method of claim 22, wherein the synthesized frame is created using more
than two images.
26. The method of claim 13, wherein artifact repair by temporal consistency
check comprises:
identifying artifact pixels by half accuracy masks;
grouping artifact pixels by visibility;
checking temporal consistency based on grouping; and
automatically repairing temporal inconsistent artifact pixels.
27. The method of claim 13, further comprising a layered frame interpolation
process.
28. The method of claim 1, further comprising removing artifacts by user
interaction.
29. The method of claim 27, wherein removing artifacts comprises merging
different versions of synthesized frames.
30. A system for enhancing the quality of a motion picture image sequence, the
system comprising:
a back-end subsystem comprising:
a central data storage for storing an original motion picture
image sequence and an enhanced image sequence, the original motion picture image
sequence comprising digital data of a plurality of image frames;
a render client configured to perform a spatial resolution
enhancement process and a temporal resolution enhancement process on the original
image sequence to create the enhanced image sequence; and
an intelligent controller for controlling the render client and
accessing the central data storage; and
wherein the intelligent controller automatically controls the
enhancement processes; and

wherein the enhanced image sequence has a greater frame rate than the
original image sequence and the enhanced image sequence has greater detail than the
original image sequence.
31. The system of claim 30, further comprising:
a front-end subsystem comprising a workstation for communicating
with the intelligent controller, the workstation being adapted to provide user input and
interaction in repairing artifacts in the enhanced image sequence, performing a quality
control check of the enhanced image sequence, and segmenting the original image
sequence.
32. The system of claim 31, wherein the workstation comprises multiple
workstations.
33. The system of claim 32, wherein at least one of the multiple workstations
comprises a quality control workstation.
34. The system of claim 30, wherein the original motion picture image sequence
and the enhanced image sequence are in 2D.
35. The system of claim 30, wherein the original motion picture image sequence
and the enhanced image sequence are in 3D.
36. The system of claim 30, wherein the original motion picture image sequence is
in 3D and the enhanced image sequence is in 2D.
37. The system of claim 30, wherein the render client is adapted to examine the
quality of the enhanced image sequence.
38. The system of claim 30, wherein the intelligent controller comprises a
processor and a memory comprising executable code, the executable code
comprising:
a scheduler;
an auto launch;
a file setup; and
a tape writer.
39. The system of claim 30, wherein the render client comprises multiple render
clients.
40. The system of claim 39, wherein the intelligent controller is adapted to detect
a system failure and shut down each render client and re-assign a job to each render
client.

41. The system of claim 39, wherein the intelligent controller is adapted to
monitor the render clients to prevent re-rendering of data.
42. A method for enhancing the quality of an original motion picture image
sequence, the method comprising:
receiving an image sequence;
receiving initial motion estimates;
applying a temporal resolution enhancement process comprising:
pre-processing;
global motion estimation;
local motion estimation;
half-motion vector generation;
frame interpolation; and
artifact repair by temporal consistency check;
wherein the temporal resolution enhancement process is automatically
controlled; and
wherein the enhanced image sequence has a greater frame rate than the
original image sequence.
43. The method of claim 42, wherein the original image sequence and enhanced
image sequence is digital data.
44. The method of claim 42, wherein the original image sequence and the
enhanced image sequence are in 2D.
45. The method of claim 42, wherein the original image sequence and the
enhanced image sequence are in 3D.
46. The method of claim 42, wherein the original image sequence is in 3D and the
enhanced image sequence is in 2D.
47. A method for enhancing the quality of an original motion picture image
sequence, the method comprising:
receiving a 3D original motion picture image sequence;
applying a spatial resolution enhancement process to the 3D original
image sequence to create an enhanced image sequence; and
wherein the enhanced image sequence has greater image detail than the
original image sequence.
48. The method of claim 47, wherein the spatial resolution enhancement process
comprises:

a motion-based spatial resolution enhancement process; and
a learning-based resolution enhancement process.
49. The method of claim 48, further comprising:
applying the motion-based spatial resolution process to a 3D image
sequence, wherein applying the motion-based spatial resolution process comprises:
disparity estimation;
display map regulation; and
detail discovery.
50. The method of claim 48, wherein the learning-based spatial resolution
enhancement process comprises:
generating a codebook comprising codewords, each codeword being
associated with a high-resolution pattern;
applying a clustering analysis to reduce the size of the codebook;
upsizing an original image of the original image sequence to a higher
resolution, the original image comprising a plurality of pixels;
matching each pixel of the upsized image to a codeword; and
replacing each pixel by a central pixel of the high-resolution pattern
associated with the matched codeword.
51. The method of claim 48, wherein the learning-based spatial resolution
enhancement process comprises:
generating a codebook comprising codewords, each codeword being
associated with a high-resolution pattern;
applying a clustering analysis to reduce the size of the codebook;
upsizing an original image of the original image sequence to a higher
resolution, the original image comprising at least one block of pixels;
matching each block of pixels of the upsized image to a codeword;
replacing the block of pixels by a high-resolution pattern associated
with the matched codeword using a transformation process to create an enhanced
block of pixels;
applying a blending process to the enhanced block of pixels; and
applying a temporal filtering process.




Patent Number: 280051
Indian Patent Application Number: 3002/KOLNP/2008
PG Journal Number: 06/2017
Publication Date: 10-Feb-2017
Grant Date: 08-Feb-2017
Date of Filing: 24-Jul-2008
Name of Patentee: IMAX CORPORATION
Applicant Address: 2525 SPEAKMAN DRIVE, SHERIDAN PARK MISSISSAUGA, ONTARIO L5K 1B1
Inventors:
1. ZHOU, SAMUEL, 2888 SHELFORD TERRACE, MISSISSAUGA ONTARIO L6M 6J9
2. YE, PING, 3382 ASH ROW CRESCENT, MISSISSAUGA, ONTARIO L5L 1K4
3. JUDKINS, PAUL, 27 HALTON STREET, TORONTO, ONTARIO M6J 1R5
PCT International Classification Number: G06T 13/00, G06T 5/00
PCT International Application Number: PCT/IB2007/00188
PCT International Filing Date: 2007-01-29
PCT Conventions:
1. PCT Application Number: 60/762964; Date of Convention: 2006-01-27; Priority Country: U.S.A.