ISO to H.264




















Subclause 5 defines the AVC decoder configuration record; its use with SVC streams is deprecated. The following restrictions apply to SVC data in addition to the requirements in subclause 5. When the decoder configuration record defined in subclause 5 is used, parameter sets shall not be included in the decoder configuration record.

Sequence and Picture parameter sets stored in this record in a file may be referenced using this 1-based index by the InitialParameterSetBox. However, the reserved bits preceding and succeeding the lengthSizeMinusOne field are re-defined.
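To make the record layout concrete, here is a minimal Python sketch of a parser for an AVCDecoderConfigurationRecord. The helper name and the dummy SPS/PPS payloads are ours, not from the standard; only the field layout (configuration version, profile, compatibility, level, lengthSizeMinusOne in the low two bits of byte 4, then length-prefixed SPS and PPS arrays) follows the record as described. The 1-based index used by the InitialParameterSetBox would select entry `sps[index - 1]` of the parsed list.

```python
import struct

def parse_avc_decoder_config(buf: bytes):
    """Sketch parser for an AVCDecoderConfigurationRecord (illustrative only)."""
    profile, level = buf[1], buf[3]
    length_size = (buf[4] & 0x03) + 1          # lengthSizeMinusOne + 1 bytes per NAL length field
    num_sps = buf[5] & 0x1F                    # low 5 bits: number of SPS entries
    pos, sps = 6, []
    for _ in range(num_sps):
        n = struct.unpack_from(">H", buf, pos)[0]  # 16-bit big-endian SPS length
        pos += 2
        sps.append(buf[pos:pos + n])
        pos += n
    num_pps = buf[pos]
    pos += 1
    pps = []
    for _ in range(num_pps):
        n = struct.unpack_from(">H", buf, pos)[0]
        pos += 2
        pps.append(buf[pos:pos + n])
        pos += n
    return {"profile": profile, "level": level, "length_size": length_size,
            "sps": sps, "pps": pps}

# A minimal hand-built record: profile 66, level 30, 4-byte NAL length fields,
# one 4-byte SPS and one 2-byte PPS (payloads are dummies, not real RBSP data).
record = (bytes([1, 66, 0xC0, 30, 0xFF, 0xE1]) + b"\x00\x04" + b"\x67\x42\x00\x1e"
          + b"\x01" + b"\x00\x02" + b"\x68\xce")
cfg = parse_avc_decoder_config(record)
```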

The syntax is as follows. Other tracks may be removed from the file without loss of any portion of the original encoded bitstream, and, once the set of tracks has been reduced to only those in the complete subset, any further removal of a track removes a portion of the encoded information.

The value of numOfSequenceParameterSets shall be in the range of 0 to 64, inclusive. Subset SPSs shall occur in order of ascending parameter set identifier with gaps being allowed. A scalable video stream is represented by one or more video tracks in a file. Each track represents one or more operating points of the scalable stream. A scalable stream may, of course, be further thinned, if desired. There is a minimal set of one or more tracks that, when taken together, contain the complete set of encoded information.

NOTE An alternate group may also include completely independent bitstreams, as well as alternative operating points of the same bitstream. The SVC tracks in the alternate group must be examined to see how many scalable base tracks are identified. However, such a scalable bitstream is typically a non-conforming bitstream. Different tracks may logically share data. This sharing can take one of the following two forms: (a) the sample data is copied from one track into another track, and possibly compacted or re-interleaved with other data, such as audio.

Quantity: One or more sample entries may be present. The URI is treated here as a name only; it should be de-referenceable, though this is not required.

NOTE When AVC compatibility is indicated, it may be necessary to indicate an unrealistic level for the AVC base layer, to accommodate the bit rate of the entire stream, because all the NAL units are considered as included in the AVC base layer and hence may be fed to the decoder, which is expected to discard those NAL units it does not recognize.

The parameter sets required to decode a NAL unit that is present in the sample data of a video stream, either directly or by reference from an Extractor, shall be present in the decoder configuration of that video stream or in the associated parameter set stream if used.

The following table shows, for a video track, all the possible uses of sample entries, configurations and the SVC tools (excluding timed metadata, which is always used in another track). In the case of absence of this box, the priority assignment method is unknown.

NOTE The sync sample table, if present, documents only access units that are IDR access units for both the AVC compatible base layer and the layer corresponding to decoding the entire bitstream contained in the track. If documenting of layer-specific IDR access units is desired, the stream should be stored in separate tracks; however, extractors must then be used for tracks that are not the scalable base track.

Its use for AVC is deprecated. NOTE If the random access recovery points for the AVC decoder and the SVC decoder operating on the entire bitstream are not all aligned, the random access recovery points table will not document all of them.

In this case, the stream can be stored in multiple tracks. Care should be taken when the SVC structures (aggregators or extractors) are in use and the track is hinted. These structures are defined only for use in the file format and should not be transmitted. In particular, a hint track that points at an extractor in a video track would cause the extractor itself to be transmitted (which is probably both incorrect and not the desired behaviour), not the data the extractor references.

This subclause extends the definition of a sub-sample for AVC in subclause 5. The presence of this box is optional; however, if present in a track containing SVC data, it shall have the semantics defined here, as required in subclause 5. NOTE This is not the same definition as the discardable field in the sub-sample information box. Annex B (normative). Aggregators and Extractors use the NAL unit syntax; these structures are seen as NAL units in the context of the sample structure.

While accessing a sample, Aggregators must be removed leaving their contained or referenced NAL Units and Extractors must be replaced by the data they reference. Aggregators and Extractors must not be present in a stream outside the file format.

This subclause describes Aggregators, which enable NALU-map-group entries to be consistent and repetitive (see Annex C). Aggregators are used to group NAL units belonging to the same sample. An Aggregator may include or reference Extractors. An Extractor may extract from Aggregators. An aggregator must not include or reference another aggregator directly; however, an aggregator may include or reference an extractor which references an aggregator. When scanning the stream: (a) if the aggregator is unrecognized, e.g.

The value of the variable AggregatorSize is equal to the size of the aggregator NAL unit, and the function sizeof(X) returns the size of the field X in bytes. The size of this field is specified with the lengthSizeMinusOne field.
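The length fields configured by lengthSizeMinusOne are what a reader walks to find NAL unit boundaries inside a sample (and inside an aggregator's additional bytes). A small sketch, with a hypothetical helper name:

```python
def split_nal_units(sample: bytes, length_size_minus_one: int):
    """Split a length-prefixed sample into NAL units (illustrative sketch).

    Each NAL unit is preceded by a big-endian length field of
    lengthSizeMinusOne + 1 bytes, as configured in the decoder
    configuration record.
    """
    n = length_size_minus_one + 1
    units, pos = [], 0
    while pos < len(sample):
        size = int.from_bytes(sample[pos:pos + n], "big")
        pos += n
        units.append(sample[pos:pos + size])
        pos += size
    return units

# Two NAL units with 2-byte length fields (lengthSizeMinusOne = 1).
sample = b"\x00\x03ABC" + b"\x00\x02DE"
units = split_nal_units(sample, 1)
```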

This subclause describes Extractors, which enable compact formation of tracks that extract, by reference, NAL unit data from other tracks. An Extractor may reference Aggregators. When an extractor is processed by a file reader that requires it, the extractor is logically replaced by the bytes it references. Those bytes must not contain extractors; an extractor must not reference, directly or indirectly, another extractor. NOTE The track that is referenced may contain extractors even though the data that is referenced by the extractor must not.

An extractor contains an instruction to extract data from another track, which is linked to the track in which the extractor resides, by means of a track reference of type 'scal'. The bytes copied shall be one of the following: (a) one entire NAL unit (note that when an Aggregator is referenced, both the included and referenced bytes are copied); (b) more than one entire NAL unit. In both cases the bytes extracted start with a valid length field and a NAL unit header.

The alignment is on decoding time. Extractors are a media-level concept and hence apply to the destination track before any edit list is considered. However, one would normally expect that the edit lists in the two tracks would be identical.

The sample in that track from which data is extracted is temporally aligned (or nearest preceding) in the media decoding timeline. The first track reference has the index value 1; the value 0 is reserved. Sample 0 (zero) is the sample with the same, or the closest preceding, decoding time compared to the decoding time of the sample containing the extractor; sample 1 (one) is the next sample, sample -1 (minus 1) is the previous sample, and so on.

If the extraction starts with the first byte of data in that sample, the offset takes the value 0. The offset shall reference the beginning of a NAL unit length field. If this field takes the value 0, then the entire single referenced NAL unit is copied (i.e. the length to copy is taken from the length field referenced by the offset). The NAL units extracted by an extractor or aggregated by an aggregator are all those NAL units that are referenced or included by recursively inspecting the contents of aggregator or extractor NAL units.
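The offset/length copy rule above can be sketched as follows. The function and parameter names (data_offset, data_length) are illustrative, and the referenced sample here is a hand-built length-prefixed byte string, not real coded data:

```python
def resolve_extractor(ref_sample: bytes, data_offset: int, data_length: int,
                      length_size: int = 4) -> bytes:
    """Sketch of extractor byte-range resolution (names are illustrative).

    Copies data_length bytes starting at data_offset within the referenced
    sample. A data_length of 0 means: copy the single NAL unit whose length
    field begins at data_offset (length field included).
    """
    if data_length == 0:
        nal_size = int.from_bytes(
            ref_sample[data_offset:data_offset + length_size], "big")
        return ref_sample[data_offset:data_offset + length_size + nal_size]
    return ref_sample[data_offset:data_offset + data_length]

# Referenced sample: two NAL units with 4-byte length fields.
ref = b"\x00\x00\x00\x03XYZ" + b"\x00\x00\x00\x02PQ"
whole_first = resolve_extractor(ref, 0, 0)    # data_length 0: whole first NAL unit
second = resolve_extractor(ref, 7, 6)         # explicit byte range: second NAL unit
```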

If the set of extracted or aggregated NAL units is empty, then each of these fields takes a value conformant with the mapped tier description. Aggregators can, for example, group NAL units belonging to a region of interest. The description of such Aggregators may be done with the tier description and the NAL unit map groups. In this case more than one Aggregator with the same scalability information may occur in one sample.

NOTE If multiple scalable tracks reference the same media data, then an aggregator should group NAL units with identical scalability information only. This ensures that the resulting pattern can be accessed by each of the tracks. Annex C (normative). If views from the same MVC bitstream are stored in multiple MVC tracks and one or more of these tracks contain multiple views, sample group entries and map groups can be used for these tracks containing multiple views.

Each of the subsets is associated with a tier and may contain one or more operating points. Only one of those entries is the primary definition of the tier.

A multiview group specifies an MVC operating point and is therefore associated with the target output views of the MVC operating point. The Multiview Group box, defined in F. The tier information box provides information about the profile, level, frame size, discardability, and frame-rate of a covered bitstream subset. If the Tier Information box is included in a Scalable Group entry or a Multiview Group entry, the covered bitstream subset consists of the tier and tiers it depends upon.

If the Tier Information box is included in a Multiview Group box, the covered bitstream subset consists of the target output views of the multiview group and all the views required for decoding the target output views. Otherwise, the semantics of tierID are unspecified, and in this case, tierID must be set to the reserved value 0.

If the Tier Information Box is included in a Multiview Group Entry, levelIndication shall be valid when all the views of the covered bitstream subset are target output views.

If the Tier Information Box is included in a Multiview Group Box, levelIndication shall be valid when the views specified by the respective multiview group are the target output views. If levelIndication is equal to 0 for an MVC stream, the level that applies to the covered bitstream subset and operating with all the views being target output views is unspecified.

A coded sub-picture consists of a proper subset of coded slices of a coded picture. A tier may consist of only sub-pictures. In this case, the tier is referred to as a sub-picture tier. A sub-picture tier may represent a region-of-interest part of the region represented by the entire stream.

NOTE The tier representation of a sub-picture tier might not be a valid stream. One example is as follows. An AVC bitstream is encoded using two slice groups. The first slice group includes the macroblocks representing a region-of-interest and is coded without referring to slices in the other slice group for inter prediction over all the access units. The slices of the first slice group in each access unit then form a sub-picture and a sub-picture tier can be specified to include all the sub-pictures over all the access units.

A value of 0 denotes a non-constant frame rate, a value of 1 denotes a constant frame rate and a value of 2 denotes that it is not clear whether the frame rate is constant.

A value of 3 is reserved. If constantFrameRate has a value of 0 or 2 then frameRate gives the average frame rate. If constantFrameRate has a value of 1 then frameRate gives the constant frame rate.
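The constantFrameRate codes and the corresponding meaning of frameRate can be summarized in a small helper. This is a minimal sketch of the decision table described above, with names of our own choosing:

```python
def describe_frame_rate(constant_frame_rate: int, frame_rate: float) -> str:
    """Map the constantFrameRate code to the meaning of frameRate (sketch)."""
    if constant_frame_rate == 1:
        return f"constant frame rate of {frame_rate} fps"
    if constant_frame_rate in (0, 2):
        # 0: known non-constant; 2: unclear whether constant.
        kind = "non-constant" if constant_frame_rate == 0 else "possibly constant"
        return f"{kind} stream, average frame rate {frame_rate} fps"
    return "reserved"   # value 3 is reserved
```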

For SVC streams, decoded frames, complementary field pairs and non-paired fields are regarded as frames when deriving the value of frameRate. For MVC streams, decoded view components of any single view only are regarded as frames when deriving the value of frameRate, regardless of the total number of the views, since all output views are required to have simultaneous view components. When included in a Scalable Group entry or a Multiview Group entry, the tier bit rate box provides information about the bit rate values of a tier.

Two sets of information are provided: for the tier representation, including all the tiers on which the current tier depends, and for the tier alone. Similarly, for each set of information, the following values are supplied. For MVC streams, the lowest long-term average bit rate that this tier could deliver is equal to the long-term average bit rate of the tier, when all NAL units of the tier are considered.

When included in a Multiview Group box, the tier bit rate box provides information about the bit rate values of the covered bitstream subset consisting of the target output views indicated by the multiview group and all the views required for decoding of the target output views. The maximum and long-term average bit rate for the covered bitstream subset are provided. All NAL units in this tier and the lower tiers this tier depends on are taken into account.

For SVC streams, the set of NAL units that are taken into account when calculating this bit rate value is the same as for baseBitRate but excluding all NAL units of the lower tiers this tier depends on.
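The distinction between the two bit-rate sets (tier plus its lower tiers versus the tier alone) is simple arithmetic over NAL unit sizes. A sketch, assuming we already know which NAL unit sizes belong to which tier and the stream duration; all names are illustrative:

```python
def tier_bit_rates(nal_sizes_by_tier, duration_s, tier_id, deps):
    """Long-term average bit rates for a tier (illustrative arithmetic).

    base: all NAL units of the tier plus the lower tiers it depends on;
    tier-only: the same set minus the lower tiers' NAL units.
    """
    dep_bits = sum(sum(nal_sizes_by_tier.get(d, [])) for d in deps) * 8
    own_bits = sum(nal_sizes_by_tier.get(tier_id, [])) * 8
    return {"base_bps": (dep_bits + own_bits) / duration_s,
            "tier_only_bps": own_bits / duration_s}

# Tier 1 depends on tier 0; two NAL units per tier over a 2-second stream.
rates = tier_bit_rates({0: [1000, 1000], 1: [500, 500]}, 2.0, 1, [0])
```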

All NAL units mapped to this tier are taken into account. All NAL units of the lower tiers this tier depends on are not considered. The SVC initial parameter sets box documents which parameter sets are needed for decoding this tier and all the lower tiers it depends on. The SVC rect region box documents the geometry information of the region represented by the current tier relative to the region represented by another tier. When extended spatial scalability was used to encode in the current tier a cropped region of another tier, then the geometry information of the cropped region can be signaled by this box.

This box can also be used to signal the geometry information of a region-of-interest (ROI) when the current tier is a sub-picture tier. This area can either be static for all samples or vary on a sample-by-sample basis. Note that it is possible that independent sub-pictures do not depend on all the tiers with lower tierID. In this case dependencies can be given with the tier dependency box. Otherwise the region represented by the current tier is a fixed rectangular part of the base region.

The BufferingBox contains the buffer information of the covered bitstream subset. If the Buffering box is included in a Scalable Group entry or a Multiview Group entry, the covered bitstream subset consists of the tier and all tiers on which it depends. If the Buffering box is included in a Multiview Group box, the covered bitstream subset consists of the target output views of the multiview group and all the views required for decoding the target output views.

Values of the HRD parameters are specified separately for each operating point. It is in units of a 90 kHz clock. The TierDependencyBox identifies the tiers that the current tier is dependent on. Tier A is directly dependent on tier B if there is at least one NAL unit in tier A using inter prediction, inter-layer prediction, or inter-view prediction from tier B.

Tier A is indirectly dependent on tier B if tier A is not directly dependent on tier B while decoding of tier A requires the presence of tier B. The value of dependencyTierId shall be smaller than the tierId of the current tier. The decoding of the current tier requires the presence of the tier indicated by dependencyTierId. All dependencies up to the tier with the lowest tierId shall be given with the TierDependencyBox.
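Since all dependencies down to the lowest tier shall be listed, a reader can gather the full set of tiers required to decode a given tier by walking the dependency lists transitively. A minimal sketch, where `dependencies` stands in for the per-tier contents of the TierDependencyBox:

```python
def tiers_required(tier_id, dependencies):
    """Transitive closure of tier dependencies (sketch).

    `dependencies` maps a tierID to the list of tierIDs it directly
    depends on; decoding a tier requires every tier in the closure.
    """
    needed, stack = set(), [tier_id]
    while stack:
        t = stack.pop()
        if t in needed:
            continue
        needed.add(t)
        stack.extend(dependencies.get(t, []))
    return sorted(needed)

# Tier 2 depends on tier 1, which depends on tier 0.
deps = {2: [1], 1: [0], 0: []}
closure = tiers_required(2, deps)
```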

This box provides the geometry information of region-of-interest (ROI) divisions of the current tier, when the current tier is encoded as multiple (typically a large number of) independent rectangular ROIs. The value 0 indicates that all the ROIs, except possibly the right-most ones and the bottom-most ones, are of identical width and height.

The value 1 indicates that the geometry information for each ROI is separately signalled. The value 2 indicates that the geometry cannot be given. Values greater than 2 are reserved. All the ROIs have identical width and height, with the following exceptions. The presence of the box indicates that the bitstream represented by this tier and tiers it depends upon can be transcoded from an SVC stream to an AVC stream as indicated, and that the transcoded bitstream can be given the indicated profile and level indicators, with the indicated bit rates.
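For the uniform-grid case (value 0), the per-ROI rectangles follow directly from the picture size and the common ROI size, with the right-most column and bottom-most row clipped to the picture boundary. A sketch with illustrative names:

```python
def roi_grid(pic_w, pic_h, roi_w, roi_h):
    """Rectangles (left, top, width, height) of a uniform ROI grid (sketch).

    All ROIs share roi_w x roi_h except possibly the right-most column and
    bottom-most row, which are clipped to the picture boundary.
    """
    rois = []
    for top in range(0, pic_h, roi_h):
        for left in range(0, pic_w, roi_w):
            w = min(roi_w, pic_w - left)
            h = min(roi_h, pic_h - top)
            rois.append((left, top, w, h))
    return rois

# 100x50 picture divided into 40x30 ROIs: 3 columns x 2 rows = 6 ROIs,
# with the right-most and bottom-most ones clipped.
grid = roi_grid(100, 50, 40, 30)
```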

The information on the resulting profile, level, and bit rate may be given for either of the entropy coding systems, or both. Each scalable or multiview group entry is associated with a groupID and a tierID.

The tierID entries are ordered in terms of their dependency signalled by the value of tierID. A larger value of tierID indicates a higher tier. A value 0 indicates the lowest tier. Decoding of a tier is independent of any higher tier but may be dependent on lower tiers. Therefore, the lowest tier can be decoded independently, decoding of tier 1 may be dependent on tier 0, decoding of tier 2 may be dependent on tiers 0 and 1, and so on. A tier can include data from one or more layers or views in the video stream.

If two tiers are mutually independent in an SVC stream, then it is required that the tier that has the greater importance, in the view of the content creator, shall be the lower tier, i.e. the tier with the smaller tierID value. NOTE For example, consider two tiers that are mutually independent (though there may be some lower tiers that they both depend on). The first tier, if presented, has a higher frame rate but lower individual picture quality, while the second tier, if presented, has a lower frame rate but higher individual picture quality.

If the file composer can identify that the first tier offers a better user experience for this content than the second tier, then the first tier is assigned a lower tierID value than the second tier. There shall be exactly one primary definition for each tier. If for a certain tier no TierDependencyBox is present then this tier may depend on all tiers with lower tierID. If parameter set streams are used, then the InitialParameterSetBox shall not be present.

In other words, it is disallowed to specify tiers that are not used in the track. A Server or Player can choose a subset of tierID values that will be needed for proper decoding operation based on the values of the description fields present within the entries e. Since the ScalableGroupEntry and the MultiviewGroupEntry are of variable length and have no internal length field, the SampleGroupDescription Box which contains either of them must carry length information for its entries according to version 1 of the SampleGroupDescription box definition.

The data in a particular tier may be protected; this is indicated by the presence of a ProtectionSchemeInfoBox in the tier definition. If any tier is so protected, then the following apply. The original format box in the ProtectionSchemeInfoBox is required but may not be needed, as the four-character code in the SampleEntry might not have changed if, for example, the base layer is unprotected.

When protecting, if extractors are permitted by the scheme in use, and the protection changes data sizes, then extractors may need re-writing. If this value is equal to the value of groupID then this group is the primary definition of this tier. A value of 0 indicates that, for the members of this group, the coded pictures of the representation of the highest layer are not IDR pictures.

A value of 0 indicates that the members of this group may have been coded using inter layer prediction. This required distance for a particular sample may be reduced by a temporal layer switching distance statement in the time parallel metadata track for a specific sample.

The value 0 indicates a temporal switching point with no dependency on the lower temporal layer. In order to describe scalability or view hierarchy within an SVC or MVC access unit, two kinds of sample groups are used:

Note that these describe tiers, not the entire stream, and therefore describe the NAL units belonging to one tier at any instant, not the entire AU. See C. Defining map groups requires that there is a limited number of map grouping patterns for all access units.

If there is a varying number of NAL units in successive access units for a given tier, Aggregators can be used to make these varying structures consistent and to reduce the number of map groups required. Each of those NAL units maps to the corresponding scalable or multiview group as described by the groupID.

NOTE 1 An arbitrarily chosen groupID is used here, rather than the more obvious scalable or multiview group index from the sample group description box, so that if scalable groups are deleted or re-ordered these operations can be detected and handled. Note also that there may be one or more scalable or multiview groups in a given tier. NOTE 2 If movie fragments are used, new maps cannot be introduced in the fragments, only the association of the new samples to pre-existing maps.

In this case, care should be taken to introduce, in the movie box, all the maps that may be needed. When temporal layers are discarded, re-timing the decoding times of some or all samples may be needed to ensure that the stream complies with all buffer and HRD requirements. Also re-timing may improve the transmission and decoding process. Composition times are not affected. This re-timing is given as sample groups, which are associated with samples by using the normal sample-to-group structures.
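Applying such re-timing is a per-sample addition of the delta associated with the sample's tier; composition times are left alone. A minimal sketch under illustrative names (tier assignments per sample and a tierID-to-delta map):

```python
def apply_retiming(decode_times, tier_of_sample, deltas):
    """Apply per-tier re-timing deltas to sample decoding times (sketch).

    `deltas` maps a tierID to a signed delta (in timescale units) to add
    when higher temporal layers have been discarded; composition times
    are not touched.
    """
    return [t + deltas.get(tier_of_sample[i], 0)
            for i, t in enumerate(decode_times)]

# Three samples; the middle one belongs to tier 1, which is shifted by -50.
new_times = apply_retiming([0, 100, 200], [0, 1, 0], {1: -50})
```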

Each group provides a set of re-timing deltas and their associated tierIDs. The group definition must be ordered by increasing tierID. View Priority sample grouping is used to label views with priorities based on content. The higher the content priority, the more interesting or important the view is for the viewer audience.

The content priority ID can help a player or viewer select interesting views and can also be used as additional information when pruning views from a file. In the latter case, content priority indicates where pruning is least harmful when several views have similar structural priorities due to encoding constraints. Either version 0 or version 1 of the Sample to Group Box may be used with the View Priority sample grouping. See the View Identifier box. A view that has a lower value than another view has a higher priority than that view.

Annex D (normative). Temporal metadata support. This metadata is stored in metadata tracks. The metadata is stored in samples whose decoding time is equal to that of the media samples they describe. Composition offsets are permitted but not required in timed metadata tracks; if used, the composition timing must match the composition timing of the associated media track.

The metadata is structured using conceptual statements. Each statement has a one-byte type field indicating what it is asserting, and a size, which is the length of its payload in bytes, not including the size and type fields. The length of the size field depends on the type field.

The statement groupOfStatements allows several statements to be made about one thing, by grouping them. A groupOfStatements contains a set of statements all of which are asserted about the thing described. The statement type sequenceOfStatements may be used in the description of the entire sample or of a NAL unit in the media stream that is an Aggregator or Extractor, to describe its sequence of NAL units. A sequenceOfStatements contains a set of statements, which are in one-to-one correspondence with the sequence of contained objects in that which is described.
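The statement structure above can be sketched with a toy parser. Note the standard lets the size-field length depend on the statement type; for this illustration we assume a one-byte type followed by a one-byte size that counts only the payload, and the type codes are arbitrary:

```python
def parse_statements(sample: bytes):
    """Parse a metadata sample into (type, payload) statements (sketch).

    Assumes, for illustration only, a one-byte type and a one-byte size;
    the size excludes the type and size fields themselves.
    """
    stmts, pos = [], 0
    while pos < len(sample):
        stype, size = sample[pos], sample[pos + 1]
        pos += 2
        stmts.append((stype, sample[pos:pos + size]))
        pos += size
    return stmts

# Two statements: type 0x10 with a 3-byte payload, type 0x11 with none.
sample = bytes([0x10, 3]) + b"abc" + bytes([0x11, 0])
stmts = parse_statements(sample)
```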

Each metadata sample is a collection a group or sequence of one or more statements about the temporally aligned media sample.

Each of the statements in the collection may have a default type from the sample entry, or have an explicit single type in each statement.

Similarly, the default length may be indicated in the sample entry, or be inline in each sample. The overall sample is a collection of N statements. The sample entry provides the statement type of each sample (group or sequence), and optionally the default type and length values of the statements in the sample.

There is a set of pre-defined statement types defined in this International Standard, and there is explicit provision for extension statements by other bodies. There are statement types reserved to ISO, and other statement types reserved for dynamic assignment. Dynamic assignment consists of a table in the Sample Entry of the metadata track, containing pairs mapping a local statement ID to URIs.

The allocation of different categories of statement types is as follows. If URLs are used, they should contain a month indication in the form yyyymm, indicating that this use of the domain in the URL was authorized by the owner of that domain as of that month.

An example may be: An aggregator NAL unit may be described, and if it is described by a sequence, the elements in the sequence correspond one-to-one to the NAL units aggregated by both inclusion and reference. Similarly, an extractor may be described, and if it is described by a sequence, the elements in the sequence correspond one-to-one to the NAL units in the extracted data.

If they need to be skipped in sequences, an empty statement or a NAL header statement can be used. Creating an ISO image from a DVD not only protects your discs from scratches but can also free up a large amount of storage space. Therefore, demand for ISO to H.264 conversion keeps growing. Whatever reasons you may have, this tutorial will teach you how to compress and convert ISO to H.264.

Depending on the intended use of the output, you need to choose a video codec for the outcome of the ISO converter; here, we convert ISO to H.264. In other words, you simply need a powerful and efficient ISO to H.264 converter. There are a number of paid and free programs that can get the job done. It's something of a paradox to get fast speed, optimal quality, and small size for an ISO to H.264 conversion all at once.

It is available for Windows 10, 8, and earlier. Its features include but are not limited to the following. Specifically, the converted H.264 files stay small while keeping good quality. The primary reason was that AVC delivered the best video quality at that time while at a much lower data rate than its peers. Another reason was Apple's backing of H.264: Apple was planning to join this trend and take the lead in the H.264 market. Of course yes: Apple promised to build H.264 into its products. For example, SlideShare and many other programs that utilize the QuickTime architecture can use the H.264 codec.

This is a question that should have been asked 10 years ago. As for now, modern devices surely meet its hardware requirements, as it is a universal standard codec in our digital life. And Internet-sized content at 40 kbps and above runs on the most basic of processors, like those in smartphones and consumer-level computers.

As mentioned above, H.264 is a video codec, while MP4 usually refers to a video format or container. YouTube officially claims that the best video format is MP4 with the H.264 codec. So you have to deal with the video bitrate carefully to reduce quality loss as much as possible. And H.264 provides integrated support for both video transmission and storage. You might or might not have noticed that it has been adopted in almost all multimedia fields. MPEG: a method of defining compression of both video and audio data. It consists of several standard parts.

MPEG-4 Part 10 is one of them and is used to define video compression. Will H.264 be replaced? This is a question we'll ask every time new codecs come up, but it has never been settled. Some thought it might be replaced by HEVC. But for now, AVC still has its place. And Plex still only transcodes to H.264.

In retrospect, plenty of video codecs have been knocked out. At least for now, H.264 holds its place. And it is reasonable to predict that when codecs with a higher compression ratio go viral on every application and device, the battlefield will turn to the patent pool.

But some talents have developed open-source encoders such as x264. Just wait and see whether H.264 can survive the next video codec revolution. Cecilia Hwung is the marketing manager of Digiarty Software and the editor-in-chief of the VideoProc team. She pursues common progress with her team and expects to share more creative content and useful information with readers. She has a strong interest in copywriting and rich experience in video editing tips.

Everything You Should Know about H.264. Part 1. What Is H.264. Part 2. The Evolution of H.264. The Advantages of H.264. It supports high-definition videos up to 4K 60fps.

Part 4. How H.264 Works. Inter and Intra Prediction. The eternal purpose of video coding is to improve efficiency and save bit rate as much as possible. An I-frame contains the complete information of the image and is coded independently of other pictures. A P-frame contains differences relative to preceding frames. A B-frame contains differences relative to both preceding and following frames. The more frames between I-frames, the longer the GOP.
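The relationship between frame types and GOP length can be made concrete with a tiny helper that measures GOP sizes in a frame-type sequence (the sequence string is a made-up example, not from any real stream):

```python
def gop_lengths(frame_types: str):
    """Lengths of GOPs in a frame-type string (illustration).

    Each GOP starts at an I-frame; the more P- and B-frames between
    I-frames, the longer the GOP.
    """
    lengths, current = [], 0
    for t in frame_types:
        if t == "I" and current:
            lengths.append(current)
            current = 0
        current += 1
    if current:
        lengths.append(current)
    return lengths

# Two GOPs: "IBBPBBP" (7 frames) and "IBBP" (4 frames).
lens = gop_lengths("IBBPBBPIBBP")
```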

Motion estimation: frames are divided into macroblocks.



