Managing Multimedia and Unstructured Data in the Oracle Database

Understanding each image type

When looking at the different types of digital images, it becomes apparent that there is a lot more to understand about them. Each type has different characteristics and capabilities which, when well understood, can add new depth to its usage. This section covers those features.

Photo

A photo, also referred to as a digital image, is a two-dimensional representation of anything. A photo can be taken with a digital camera or it can be scanned in. It is composed of pixels, where each pixel represents information, typically a color. The more pixels that can be packed together, the higher the resolution of the display. The iPhone 4 introduced a display in which the pixels are so tightly packed that the human eye cannot discern the individual pixels. A separate goal is true color, in which the display can represent all the colors the human eye can see.

Displaying pixels on a computer screen is a completely different process to printing a photo on paper. Paper does not use the concept of a pixel, and combining different colors on a computer screen produces different results when printing (see Color space later).

Icon

An icon is also a two-dimensional representation of anything. It is created manually and not scanned in or photographed. Its format is simpler and can be stored uncompressed (bitmap) or compressed using an algorithm that is lossless. The two most common formats are GIF and PNG. An icon is generally used for screen display and is also used to help with navigation, to convey information, or represent a digital object. An icon can be used to visually represent an audio file.

Color space

A color space is used to define how a color is digitally represented.

Colors when printed to paper are represented as CMYK (cyan, magenta, yellow, and black). Newspapers and photo labs use CMYK to combine the inks and print the image. Color printers include a three-tone color cartridge and black to cover the range of all visible colors.

Computer display screens cannot use CMYK for display. They use RGB (red, green, blue) to cover the range of colors. Cathode ray tube screens, LCD screens, and LED screens all use the RGB format because, when the three colors are combined in different amounts, they can cover the range of visible colors.

One key goal of a color space is to ensure consistency in picture display across devices. As different electronic devices use different methods for emitting color, the idea of the color space is to ensure when a color is displayed, it represents as closely as possible the true color.

Color calibration

Even though a color space(2) is used, there is no true way to calibrate the image with the color. There is no sure way to guarantee that the color you see on an electronic device is the correct color. If you have gone to an electronics store to buy a television, you will have seen many monitors side by side, each displaying the same picture with a different brightness or color hue. This shows how easily colors that are meant to be identical can drift apart from one device to the next.

One method for addressing this is Windows Color System(3), which is a model that extends the color space and takes into account the characteristics of the display device and can adjust the color to better match it.

The simplest and traditional method is to embed a color chart into the image. The chart is a set of color boxes that are included in the picture. On viewing the picture, the photographer can then visually adjust the color balance to match what the expected colors are. Some photo management tools can automatically adjust the balance by detecting the color chart.

The disadvantage of a color chart is that it is visible in the image. Most organizations will therefore take two pictures: one with the color chart in it, and one without. The assumption is that the color adjustment worked out on the photo with the color chart can also be applied to the photo taken without it.

Color space standards

Different products use different color spaces when storing digital images. The color space is stored embedded in the image. With changes in technology, the color space itself can be updated to handle the new technology. The most common is the RGB color space. A variation called sRGB(4) is now being extensively used, as it is designed for home and office viewing.

The RGB color model family

With RGB, 3 bytes are typically used to store one pixel of information. Each byte represents a number. A byte equals 8 bits, which can represent 256 values. Each of the RGB values is referred to as a channel, and the three values together as a triplet. The combination of these three values enables 16,777,216 colors to be displayed (256 x 256 x 256). In some color spaces, 2 bytes (16 bits) can be stored per channel. This doubles the size of the image but enables a greater color range to be stored. A 16-bit number enables 65,536 different values.
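The arithmetic above can be checked with a short Python sketch. The function names `pack_rgb` and `unpack_rgb` are illustrative only, and the byte layout (red in the highest byte) is an assumption; individual image formats may order the channels differently.

```python
# Pack and unpack one 24-bit RGB pixel (8 bits per channel).
# Channel order (red highest, blue lowest) is an assumption for illustration.

def pack_rgb(red, green, blue):
    """Combine three 8-bit channel values into one 24-bit integer."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in one byte (0-255)")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(pixel):
    """Split a 24-bit integer back into its (red, green, blue) triplet."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


values_per_8bit_channel = 2 ** 8               # values one byte can hold
values_per_16bit_channel = 2 ** 16             # values two bytes can hold
colors_24bit = values_per_8bit_channel ** 3    # all 8-bit triplets

orange = pack_rgb(255, 165, 0)
print(hex(orange), unpack_rgb(orange))
print(values_per_8bit_channel, values_per_16bit_channel, colors_24bit)
```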

  • RGB(5): All of the RGB color spaces are additive, meaning that by combining the light emanating from a color element, they can achieve the colors in the light spectrum.
  • sRGB(6): This is a color space designed by companies including Microsoft and Hewlett Packard. The goal was to ensure that the colors presented on a computer screen in most typical home and work environments matched correctly. It is now the most popular color space and is used in the display of most JPEG images one finds on the Internet.
  • Adobe RGB(7): This is a color space that closely matches CMYK, ensuring that the color on the screen matches the one that is printed out on a color printer. This is important for photographers who provide both a visual and a printed copy of photos, including photo laboratories and portrait and wedding photographers. It is also referred to as Adobe RGB 98.
  • ProPhoto RGB(8): This is a color space designed to handle all possible color combinations occurring in the real world. It can even include colors that are not visible. The recommendation when using it is to use 16 bits rather than 8 bits to store the RGB values.

Viewing colors

It must be pointed out that with the improvements in technology for taking images and devices for displaying them, there is a drive to try and have a system that truly matches real-life colors. The problem is that this is an impossible goal to achieve. All that can be achieved is to match the colors at the exact time the photo was taken.

Cloud color, time of day, and shadows can change the color display of any real-world object. Also, people view colors differently. It is estimated that between 1 and 10 percent of the male population exhibits some form of color-blindness. With females, it is less than 0.3 percent(9). Color perception and sensitivity can also change over time. It can be impacted by glasses or any eye wear.

Though being color-blind might be considered to be a deficiency, it is now realized that, in some cases, there is an advantage(10). A person who exhibits some form of color-blindness will not be able to see or discern a difference between a photo using sRGB or ProPhoto RGB. The desire to match colors exactly to the real world is not a driving concern.

The strong desire to exactly match viewed colors to the real world can be seen as an impractical aspiration, one in which the reward for the time spent trying to achieve that match can only be appreciated by the tiny fraction of the world population who have the color sensitivity to see it. A similar argument was, and still is, raised about listening to a digital CD versus listening to the same music played on vinyl. Those with a well-trained ear and no hearing impairment can, when the music is played on a high-quality vinyl system with exceptional speakers, discern a difference from the same music played on a CD. The conclusion drawn is that the vinyl system is better because no information is lost, whereas the CD has lost information in the digitization process.

The question is not whether one is better, but whether it's cost effective to attempt to achieve this level of accuracy when only a small percentage of the population, when focused, can truly tell there is a difference. The analogy of Don Quixote tilting at windmills(11) seems apt for those intent on achieving a 100-percent color match between the image displayed and the one appearing in the real world.

As covered in Chapter 3, The Multimedia Warehouse, the multimedia environment is not one designed to be exact. It is fuzzy. It's full of scenarios where things do not match correctly or can be interpreted differently.

Though it is a worthwhile goal to achieve a view of the digital object that exactly matches the real world, the more realistic and cost-effective goal is to achieve the one that very closely matches it, the one in which the vast majority of the population viewing it will not be able to discern any difference.

Printing using the CMYK colorspace

CMYK(12) is referred to as a subtractive color model. Unlike RGB, in which colors are added together to achieve the color spectrum, with CMYK, the brightness is subtracted to achieve the desired color. By adding more colors together, the resultant color becomes darker. In RGB, the opposite happens. The more colors added, the greater the tendency there is to move to a white color. CMYK coloring occurs when real-world ink or dyes are added together. Black is added because when used it saves on ink. To achieve black, otherwise, would require adding the CMY colors together.

An image encoded using the RGB color space needs to be recoded into CMYK to enable it to be printed. As covered, the Adobe RGB color space is designed to make this translation with minimal errors.
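To make the additive versus subtractive relationship concrete, here is a minimal Python sketch of the textbook RGB to CMYK formula. It is a simplification offered as an illustration only: real print workflows convert through device color profiles rather than this arithmetic, and the function name `rgb_to_cmyk` is hypothetical.

```python
def rgb_to_cmyk(red, green, blue):
    """Naive conversion from 8-bit RGB to CMYK fractions in the range 0.0-1.0.

    Assumption: a simple device-independent formula, not a profile-based
    conversion as used by real printers.
    """
    r, g, b = red / 255, green / 255, blue / 255
    # Black (K) carries the darkness shared by all three channels,
    # which is what saves colored ink.
    k = 1 - max(r, g, b)
    if k == 1:
        # Pure black: no colored ink needed, and avoids dividing by zero.
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k


print(rgb_to_cmyk(255, 0, 0))      # red: full magenta and yellow, no cyan
print(rgb_to_cmyk(255, 255, 255))  # white: no ink at all
print(rgb_to_cmyk(0, 0, 0))        # black: black ink only
```

Note how white in RGB (all channels at maximum) maps to no ink, while black maps to the black channel alone rather than a mix of cyan, magenta, and yellow.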

Other color spaces

The following are other color spaces:

  • YIQ(13): This color space is designed for NTSC TVs. These are the ones used in North and Central America.
  • YPbPr(14): This color space is also designed for TVs. The goal was to separate the colors out into separate cables. It only had a use for analog systems, as digital systems transfer the information using bits rather than waves.
  • YCbCr(15): This color space is the digital equivalent of YPbPr and is designed for taking an RGB signal and making it more efficient for transferring TV signals.
  • LAB(16): This color space is found in TIF images and is designed to approximate human vision. Its focus is more on the perception of lightness. It originated before RGB and was used extensively in the 1990s, as there was no loss in quality in the color of the image, unlike RGB, which had an 8-bit limitation on the color range. With 16-bit RGB support, the requirement for using the LAB color space has diminished, and it is now typically found in archaic images.
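As an illustration of how an RGB signal is turned into YCbCr, the sketch below uses the full-range coefficients commonly associated with JPEG files (derived from ITU-R BT.601). That choice is an assumption made for this example; broadcast video uses related but differently scaled variants, and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(red, green, blue):
    """Convert 8-bit RGB to 8-bit YCbCr.

    Assumption: full-range BT.601 coefficients as commonly used in JPEG.
    Y is the brightness; Cb and Cr are color differences centered on 128.
    """
    y = 0.299 * red + 0.587 * green + 0.114 * blue
    cb = 128 - 0.168736 * red - 0.331264 * green + 0.5 * blue
    cr = 128 + 0.5 * red - 0.418688 * green - 0.081312 * blue
    # Round to whole numbers and clamp to the range one byte can hold.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


print(rgb_to_ycbcr(0, 0, 0))        # black
print(rgb_to_ycbcr(255, 255, 255))  # white
print(rgb_to_ycbcr(255, 0, 0))      # red
```

Any shade of gray produces Cb and Cr values of 128, which shows why the format is efficient: all the detail sits in the Y channel, and the two color channels can be stored or transmitted at lower resolution.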

Little endian and big endian

When digital images are stored, numbers are routinely used to represent values in those images. This can be a pixel color or the instructions for drawing a rectangle. In documents, we are used to storing numbers in their character format. In image files, numbers are stored in binary form, often spanning several bytes. The order in which those bytes are read, from left-hand side to right-hand side or from right-hand side to left-hand side, is referred to as the endian order. A detailed description is covered in Appendix E, Loading and Reading, which can be downloaded from the link given in the Preface.

Intel CPUs use little endian, whereas Motorola and SPARC CPUs use big endian. In principle, this means that copying an image between environments with different CPUs could effectively corrupt it. Fortunately, this doesn't happen, because the byte order of the numbers stored in each image format is locked into little or big endian in its core specification, which takes the CPU out of the equation. The JPEG and PNG formats always use big endian. The TIF image format indicates within the header whether little or big endian is used, and the program used to decode it has to handle the byte conversion accordingly(17).
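The following Python sketch shows the same number stored in both byte orders, and how a decoder might read the byte-order marker at the start of a TIF file (the two characters II for little endian or MM for big endian, followed by the number 42). The function name `tiff_byte_order` and the hand-built header are illustrative only; no real file is read.

```python
import struct

value = 1920  # for example, an image width in pixels

big = struct.pack(">I", value)     # big endian: most significant byte first
little = struct.pack("<I", value)  # little endian: least significant byte first
print(big.hex(), little.hex())


def tiff_byte_order(header):
    """Return the struct prefix implied by the first two bytes of a TIF file."""
    if header[:2] == b"II":
        return "<"   # little endian
    if header[:2] == b"MM":
        return ">"   # big endian
    raise ValueError("not a TIF header")


# A hand-built little-endian TIF header start: "II" followed by the number 42.
header = struct.pack("<2sH", b"II", 42)
order = tiff_byte_order(header)
magic = struct.unpack(order + "H", header[2:4])[0]

# Reading the same two bytes with the wrong order gives a different number.
wrong = struct.unpack(">H", header[2:4])[0]
print(magic, wrong)
```

Because the decoder takes the byte order from the file rather than from the machine it runs on, the same file decodes identically on any CPU.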

Digital image storage formats

Digital image formats can be broken down into the following:

  • Raster graphics(18): These graphics are also referred to as a bitmap. An image is represented as a set of pixels typically in a rectangular structure with a width and height. Each pixel represents a color and can be stored using multiple bytes.
  • Vector graphics(19): These graphics create an image using a set of instructions (mathematical expressions). These can include points, lines, curves, shapes, text, and polygons. The most well-known format is SVG. Shapes created in Microsoft PowerPoint or Adobe Illustrator, or figures created in OpenOffice Draw, can be stored using vector graphics.

Raster graphics formats

The following is a simplified table containing commonly used digital image file formats. Over time, each format has grown to handle different characteristics and capabilities. The GIF format can handle animation. There are exceptions to the rules for each format. A GIF is typically limited to 256 colors but can support a transparent color enabling the image to blend in with the background.

The following table includes information about the various formats:

Raw

The JPEG 2000 image compression format was touted as the standard that would replace JPEG. It could compress better, and it offered lossless compression, error resilience, and progressive transmission. The format looked promising until it was pointed out that there was an undeclared and obscure submarine patent in it. This effectively killed its use, as companies would not utilize it if there was a legal risk or a potential licensing and cost issue that might appear years down the track (exactly what happened with GIF). An open source standard would have resulted in the usage of JPEG 2000, and web browser builders would have included it in all the browsers. Unfortunately, this didn't happen, and the result was that camera manufacturers started to work on their own image storage formats for their cameras. Each one was touted as being the next standard. The result was an assortment of new proprietary formats that camera manufacturers started to use. Adobe pushed its own standard, DNG, and this one seems to be leading the group as the most popular raw image format.

The idea behind a raw format is to have a format in which the original is not modified. The original is the exact picture that was taken. Additional smarts might be included in the raw format to enable easier changes in color spaces, fix blurry images, and correct for common issues such as red eye. The following are some of the more common raw formats available:

  • Adobe Digital Negative (DNG)
  • Nikon Electronic Format (NEF)
  • Kodak Digital Camera Format (DCR)
  • Olympus Digital Camera Format (ORF)

Vector graphics

Vector graphic formats do not use pixels for storage. Instead, they store instructions for how to draw the image. This makes the image scalable, and such formats are used for drawing and designing, especially three-dimensional graphics. The following are some of the more popular formats:

  • Scalable Vector Graphics (SVG)
  • Computer Aided Design (CAD)
  • Drawing Exchange Format (DXF) – CAD format used to enable interoperability between different products
  • DraWinG (DWG) – CAD format used for three-dimensional design

Audio

Audio encompasses the capturing and storage of sound over a period of time. The following describes some of the key attributes one will come across when dealing with audio.

Bit rate

Bit rate is the number of bits of data conveyed in a unit of time, typically per second, and is usually expressed as bits per second. Note that 8 bits make a byte, so a value in bits per second is eight times larger than the same rate expressed in bytes per second. The two are easily confused.

For MP3, the bit rate is expressed in kilobits per second. The lower the bit rate, the more noticeable the loss in quality of the audio file:

  • 64 to 96 kbit/s is the quality of an FM radio signal
  • 128 to 192 kbit/s is DVD quality
  • 224 to 320 kbit/s is high-quality audio storage
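As a worked example of the relationship between bit rate and storage, the sketch below estimates the size of a constant bit rate audio file. It assumes that one kilobit is 1,000 bits and ignores file headers and metadata; the function name `audio_size_bytes` is hypothetical.

```python
def audio_size_bytes(bit_rate_kbps, duration_seconds):
    """Estimate the size in bytes of constant bit rate audio.

    Assumptions: 1 kbit = 1,000 bits; headers and metadata are ignored.
    """
    total_bits = bit_rate_kbps * 1000 * duration_seconds
    return total_bits // 8   # 8 bits make a byte


four_minutes = 4 * 60
for rate in (64, 128, 320):
    size = audio_size_bytes(rate, four_minutes)
    print(f"{rate} kbit/s for 4 minutes: {size / 1_000_000:.2f} MB")
```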

Encoding

This is the codec used for compressing the audio stream. Common formats include MP3, 3GP, AIFF, ASF, and WAV.

Channels

This is a single track or audio stream. Multiple channels are combined to create stereo. The more channels, the greater the perceived depth of the audio track. A channel can also hold a separate audio track to the main one.

Video

Video encompasses the capturing and storage of visual information and optionally sound over a period of time. The following describes some of the key attributes one will come across when dealing with video.

Frame

A frame is a single digital image taken from within the video. A frame is treated as the smallest unit of a video. In terms of old film, a frame is equivalent to exactly one cell.

Frame resolution

Frame resolution is the width by height in pixels of the video image. Different video standards have different resolutions. There can be a large variation in the width by height supported for different mediums. TV quality is approximately 640 x 480 pixels, DVD is around 720 x 576 pixels, and HD can be either 1280 x 720 or 1920 x 1080.

Frame aspect ratio

Frame aspect ratio is the ratio of the width to the height of the frame resolution. The two most common formats are 16:9 (wide screen) and 4:3 (TV screen). Converting a video from 4:3 to 16:9 will result in image distortion and might require cropping to remove the distortion.
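A short Python sketch shows how the aspect ratio falls out of the frame resolution, and how a 4:3 frame can be scaled into a 16:9 frame without distortion by leaving bars at the sides instead of stretching or cropping. It assumes square pixels, which does not hold for every video standard, and the function names are hypothetical.

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce a frame resolution to its simplest width:height ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def fit_inside(src_width, src_height, dst_width, dst_height):
    """Scale a source frame to fit a target frame while keeping its shape."""
    scale = min(dst_width / src_width, dst_height / src_height)
    return round(src_width * scale), round(src_height * scale)


print(aspect_ratio(1920, 1080))   # HD
print(aspect_ratio(640, 480))     # TV quality

# Fit a 4:3 frame into a 16:9 frame; the unused width becomes side bars.
new_width, new_height = fit_inside(640, 480, 1920, 1080)
bar_width = (1920 - new_width) // 2
print(new_width, new_height, bar_width)
```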

Frame rate

Frame rate is the number of frames displayed per second. The more frames shown, the smoother the picture appears to the human eye. The higher the frame rate, the higher the storage requirement, as more information is required to display the video.
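The effect of frame rate on storage can be estimated with simple arithmetic. The sketch below computes the size of uncompressed video, assuming 3 bytes per pixel (8-bit RGB) and no audio; real video files are far smaller because codecs compress the frames. The function name `raw_video_bytes` is hypothetical.

```python
def raw_video_bytes(width, height, frames_per_second, seconds, bytes_per_pixel=3):
    """Size in bytes of uncompressed video (assumes 8-bit RGB, no audio)."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * frames_per_second * seconds


one_second_at_30 = raw_video_bytes(1920, 1080, 30, 1)
one_second_at_60 = raw_video_bytes(1920, 1080, 60, 1)
print(one_second_at_30, one_second_at_60)
```

Doubling the frame rate doubles the uncompressed size, which is why frame rate is one of the first settings considered when storage is constrained.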

Progressive scan versus interlaced

Interlaced scanning was designed for cathode ray tube screens. It breaks up the screen into horizontal lines and alternately displays one set of lines (every second line) and then the other. This can result in a flickering effect. Progressive scan displays all the horizontal lines in sequence, offering a sharper picture. Progressive scan is used extensively in LCD monitors.
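The difference between the two approaches can be sketched in Python by treating a frame as a list of horizontal lines. The functions `split_fields` and `weave` are hypothetical names used for illustration; real deinterlacing also has to deal with motion between the two sets of lines, which this sketch ignores.

```python
def split_fields(frame_rows):
    """Split a progressive frame into the two sets of lines used by interlacing."""
    top_field = frame_rows[0::2]      # lines 0, 2, 4, ...
    bottom_field = frame_rows[1::2]   # lines 1, 3, 5, ...
    return top_field, bottom_field


def weave(top_field, bottom_field):
    """Rebuild a full frame by alternating lines from the two fields."""
    rows = []
    for index in range(len(top_field) + len(bottom_field)):
        source = top_field if index % 2 == 0 else bottom_field
        rows.append(source[index // 2])
    return rows


frame = ["line0", "line1", "line2", "line3", "line4", "line5"]
top, bottom = split_fields(frame)
print(top)
print(bottom)
print(weave(top, bottom) == frame)
```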

Codecs/containers

The following section covers some of the more popular video codecs used in the marketplace today:

  • Moving Picture Experts Group (MPEG): It's an open standard designed for the compression of video. The MPEG format can be considered to be a container, as it supports a variety of codecs. Each one has attributes well suited to compress different video sizes.
    • MPEG-1 was designed for compression of low-quality video, such as that stored on Video CD.
    • MPEG-2 was designed for digital TV broadcast.
    • MPEG-3 was merged with MPEG-2 and is not the same as the audio MP3.
    • MPEG-4(34) is designed for video that uses high-quality graphics. The format is used by Blu-ray. It's an incredibly robust and adaptable format used in a variety of applications. MPEG-4 Part 10 matches the H.264 standard. MPEG-4 Parts 12 and 14 are better known as MP4, which is a format suited for Internet streaming, especially streaming to small devices. An earlier version was used on mobile phones and had a .3gp extension.
  • Audio Video Interleave (AVI): It is a container managed by Microsoft and was first crafted in 1992. The original codecs used within it are not supported any more. The format can now use MPEG and Real Video. As AVI is an old format, it suffers from limitations that have naturally restricted it as technology changes. Issues with aspect ratio, variable frame rate, and bit rate mean that even though codecs such as MPEG-4 can be used within the AVI container, their usage is restricted.
  • H.264: It is a codec accepted for use on Blu-ray discs. Its standard overlaps with MPEG-4 Part 10, and the two are kept in sync. Its popularity grew because it was efficient in compression and flexible in what it could compress. With support from Google and Apple, and now with native support of the format in the Firefox and Chrome browsers, the acceptance of its usage is growing.
  • Real Player: It is a format that dominated the market because of its capability to stream video. It's managed by the company Real Networks. The product used a streaming video server (which could integrate with Oracle), enabling real-time streaming of video. The format is a container that originally supported two codecs, one for audio and one for video. They were recognized by the .ra and .rv file extensions. Since then, the container has been enhanced to support the MPEG, Flash, and Microsoft formats.
  • Flash Video: This format became popular as the Adobe Flash Player gained popularity. The video format could be embedded in a Flash SWF enabling applications to easily create interfaces with embedded video. With the rise in popularity of YouTube, which natively supported Flash, it looked like its format would dominate the video market. With the gain in popularity of Apple and its refusal to use Flash, the container was enhanced to support H.264.

Issues when converting

When converting video between different formats, the following issues need to be addressed:

  • Frame resolution: Most formats have limits in the width and height they support. It might not be possible to convert one format directly to the other if that other format doesn't support the frame resolution. In this case, the video might need to be converted to an intermediate format before being finally converted.
  • Frame rate: It might not be possible to go from a video with a lower frame rate to one with a higher frame rate. There might not be enough information available. In some cases, frames can be repeated to increase the frame rate.
  • Audio: For video with supporting audio, the codec used with the audio might not be supported in the video format it's being converted to. In this case, the audio codec will need to be converted.
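The frame rate issue above can be sketched in Python. The function below changes the frame rate by repeating frames when the rate goes up and dropping frames when it goes down. This is the simplest possible approach and is an assumption made for illustration; real converters may blend or interpolate frames instead. The function name `resample_frames` is hypothetical, and whole-number frame rates are assumed.

```python
def resample_frames(frames, source_fps, target_fps):
    """Change frame rate by repeating or dropping frames (no interpolation).

    Assumption: both frame rates are whole numbers.
    """
    target_count = len(frames) * target_fps // source_fps
    resampled = []
    for index in range(target_count):
        # Pick the source frame that was on screen at this moment in time.
        source_index = min(len(frames) - 1, index * source_fps // target_fps)
        resampled.append(frames[source_index])
    return resampled


frames = ["A", "B", "C", "D"]
print(resample_frames(frames, 2, 4))   # each frame repeated
print(resample_frames(frames, 4, 2))   # every second frame dropped
```

Repeating frames raises the frame rate without adding any new information, so the converted video is larger but no smoother than the original.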

Documents

A document is primarily a set of text based on a character set optionally grouped into pages. A document can contain audio, video, and digital images. There are a large number of document formats in existence. Oracle has an indexing feature called Oracle Text, which can index and enable sophisticated searches to be performed against the documents using a structure embedded in a SQL statement.

Terminology

Though there are over 3,000 document formats available in the market, the majority of documents fall into the following types.

PDF

PDF is a format supported by Adobe that was originally designed to be an open standard for document exchange. The goal is that when organizations and individuals pass a document around, they convert it to PDF first. It became an open standard in 2008 and has been accepted and used extensively in the marketplace. Most browsers support PDF for display. PDF is also now considered secure and unlikely to contain a virus or Trojan within it.

In most cases, the PDF document when converted is read-only, but it is possible to create a PDF document that enables data to be entered into it; this is like a form. PDF supports images to be embedded in it and can be used primarily as a method for transferring digital images between sites.

A PDF document can be encrypted and digitally signed to ensure it's authentic. The number of pages within a PDF can be easily extracted. Metadata can also be stored either as name/value pairs or using XMP.

The Oracle Database does not internally support the conversion of an existing document or image to PDF. The database can extract an HTML version of the document, as well as a summary using the Oracle Text index. There are third-party PL/SQL tools available that can create a PDF file using a combination of routines to build up the base PDF document.

DOC/DOCX

This is the Microsoft format for document storage. DOCX is the later XML version designed to be open and conform to the ISO/IEC 29500 Strict standard. The two formats together are dominant in the market place and used in a large number of sites.

ODT

Open Document (ODT)(35) was originally developed by Sun and used in the OpenOffice product set. It is XML-based and conforms to the ISO/IEC 26300:2006 standard.

TXT

This is any digital file containing just characters. There is no structure in the text file unless defined by the author. There is an ambiguity concerning when a text file ends and a structured document begins. A text file can contain XML. A text file can contain multi-part mime attachments. A text file can contain CSV data as well as HTML characters.

The common feature of a text file is that it can be opened in a text editor, such as Windows Notepad or the Unix vi editor, and viewed and edited in a meaningful fashion.

Transformation

The Oracle Database supports the indexing and ability to summarize most document formats. The database does not offer any abilities to transform, edit, or convert the documents. It's not possible to convert a DOCX file to PDF. Though you can convert the document formats to HTML, there is no support for the extraction of an image or other digital objects embedded within them. Oracle also does not support the transformation of an individual page into a JPEG image (for thumbnail display). Some of these capabilities are obtained by integrating OpenOffice and its batch manipulation routine.