reader

This section contains the documentation for movinets_helper/reader.py.

Utilities for reading a TFRecordDataset ready to train a network.

add_states(video, label, stream_states={})

Prepares a dataset entry for the MoViNet stream models by merging the stream states into the feature dict. Note: training the stream models with this helper has not been achieved yet.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| video | _type_ | Decoded video tensor, stored under the "image" key of the feature dict. | required |
| label | _type_ | Corresponding class of the video. | required |
| stream_states | dict | Initial states of the stream model, merged into the feature dict. Defaults to {}. | {} |

Returns:

| Type | Description |
| --- | --- |
| Tuple[Dict[str, tf.Tensor], tf.Tensor] | The feature dict (stream states plus the video) and the label. |

Source code in movinets_helper/reader.py
def add_states(
    video, label, stream_states={}
) -> Tuple[Dict[str, tf.Tensor], tf.Tensor]:
    """This function is expected to modify the dataset to make it ready
    for the movinet stream models, but couldn't get to train them

    Args:
        video (_type_): _description_
        label (_type_): _description_
        stream_states (dict, optional): _description_. Defaults to {}.

    Returns:
        Tuple[Dict[str, tf.Tensor], tf.Tensor]: _description_
    """
    return {**stream_states, "image": video}, label
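
For reference, a minimal sketch of how add_states could be combined with the other helpers in this module. The `model` variable, its init_states call (available on the causal MoViNet classifiers from the official TensorFlow Models project), the state shape, and the TFRecord path are all assumptions for illustration; as noted above, training the stream models this way has not been verified.

```python
from pathlib import Path

# Assumed: `model` is a stream (causal) MoViNet classifier exposing
# init_states; the shape is [batch, frames, height, width, channels].
init_states = model.init_states([1, 8, 172, 172, 3])

ds = get_dataset(list(Path("path/to/tfrecords").iterdir()))  # placeholder path
ds = ds.map(format_features)
# Attach the initial stream states to every example; the model then receives
# a dict {"image": video, **states} together with the one-hot label.
ds = ds.map(lambda video, label: add_states(video, label, init_states))
```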

encode_label(label, num_classes)

One hot encodes the labels according to the number of classes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| label | str | Label representing the movement of the video. | required |
| num_classes | int | Total number of classes in the dataset. | required |

Returns:

| Type | Description |
| --- | --- |
| tf.Tensor | Encoded representation of the label. |

Source code in movinets_helper/reader.py
def encode_label(label: str, num_classes: int) -> tf.Tensor:
    """One hot encodes the labels according to the number of classes.

    Args:
        label (str): Label representing the movement of the video.
        num_classes (int): Total number of classes in the dataset.

    Returns:
        tf.Tensor: Encoded representation of the label
    """
    return tf.one_hot(label, num_classes)
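
A quick illustration of the encoding (hypothetical values): tf.one_hot expects an integer class index, which is why format_features below casts the label to tf.int32 before calling this function.

```python
import tensorflow as tf

# Class index 2 out of 5 classes becomes a one-hot float vector.
encoded = encode_label(2, num_classes=5)
print(encoded.numpy())  # [0. 0. 1. 0. 0.]
```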

format_features(video, label, resolution=172, scaling_factor=255.0, num_classes=2)

Transforms the data to have the appropriate shape.

This function must be called on a tf.data.Dataset (passed via its .map method).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| video | tf.Tensor | Decoded video. | required |
| label | str | Corresponding class of the video. | required |
| resolution | int | Model-dependent resolution: MoViNet a0 and a1 use 172, a2 uses 224. Defaults to 172. | 172 |
| scaling_factor | float | The videos have pixel values in the range [0, 255]; dividing by this factor rescales them to [0, 1]. Defaults to 255.0. | 255.0 |
| num_classes | int | Number of classes the model is trained on, e.g. 600 for Kinetics 600 or 101 for UCF101. Defaults to 2. | 2 |

Returns:

| Type | Description |
| --- | --- |
| Tuple[tf.Tensor, tf.Tensor] | When iterated, the first element is the video and the second is the label, as required by the model. |

Source code in movinets_helper/reader.py
def format_features(
    video: tf.Tensor,
    label: str,
    resolution: int = 172,
    scaling_factor: float = 255.0,
    num_classes: int = 2,
) -> Tuple[tf.Tensor, tf.Tensor]:
    """Transforms the data to have the appropriate shape.

    This function must be called on a tf.data.Dataset (passed
    via its .map method).

    Args:
        video (tf.Tensor): Decoded video.
        label (str): Corresponding class of the video.
        resolution (int, optional):
            The resolution will be model dependent.
            Movinet a0 and a1 use 172, a2 uses 224.
            Defaults to 172.
        scaling_factor (float, optional):
            The videos have pixel values in the range [0, 255];
            dividing by this factor rescales them to [0, 1].
            Defaults to 255.0.
        num_classes (int, optional):
            Number of classes the model is trained on,
            e.g. 600 for Kinetics 600 or 101 for UCF101.
            Defaults to 2.

    Returns:
        Tuple[tf.Tensor, tf.Tensor]:
            When iterated, the first element will be the video, and
            the second will be the label as required by the model.

    """
    label = tf.cast(label, tf.int32)
    label = encode_label(label, num_classes)

    video = tf.image.resize(video, (resolution, resolution))
    video = tf.cast(video, tf.float32) / scaling_factor

    return video, label
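
A brief usage sketch, with a placeholder TFRecord path: functools.partial is one way to pass non-default arguments through Dataset.map, here resolution=224 for a MoViNet a2 backbone and num_classes=600 for Kinetics 600.

```python
from functools import partial
from pathlib import Path

ds = get_dataset(list(Path("path/to/tfrecords").iterdir()))  # placeholder path
# Resize frames to 224x224 (MoViNet a2) and one-hot encode against 600 classes.
ds = ds.map(partial(format_features, resolution=224, num_classes=600))
```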

get_dataset(filenames)

Generates a tf.data.Dataset from the TFRecord files.

This is the appropriate format to pass to model.fit, after the dataset is formatted and batched, so the final video tensor ingested by the model has the shape [n_videos, n_frames, resolution, resolution, channels].

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| filenames | List[str] | List of .tfrecord files. | required |

Returns:

| Type | Description |
| --- | --- |
| tf.data.Dataset | Dataset ready to train the model. |

Example

target_path is the path to the directory containing the .tfrecord files.

ds = get_dataset(list(Path(target_path).iterdir()))

This iterable may be formatted appropriately:

ds = get_dataset(list(Path(target_path_train).iterdir()))
ds = ds.map(format_features)

To see a single example:

next(iter(ds))

Source code in movinets_helper/reader.py
def get_dataset(filenames: List[str]) -> tf.data.Dataset:
    """Generates a td.data.Dataset from the TFRecord files.

    This is the appropriate format to be passed to model.fit,
    after it is formated and there is some batch called, so the
    final video object ingested by the model will have the shape
    [n_videos, n_frames, resolution, resolution, channels].

    Args:
        filenames (List[str]): List of .tfrecord files.

    Returns:
        tf.data.Dataset: Dataset ready to train the model.

    Example:
        target_path is the path to the .tfrecords files directory.

        >>> ds = get_dataset(list(Path(target_path).iterdir()))

        This iterable may be formatted appropriately:

        >>> ds = get_dataset(list(Path(target_path_train).iterdir()))
        >>> ds = ds.map(format_features)

        To see a single example:

        >>> next(iter(ds))
    """
    raw_dataset = tf.data.TFRecordDataset(filenames, compression_type="GZIP")
    return raw_dataset.map(_parse_example)
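
To tie the helpers together, a hypothetical end-to-end input pipeline; the TFRecord path, batch size, and shuffle buffer are placeholders, and the final model.fit call assumes a compiled MoViNet classifier is already available.

```python
from pathlib import Path

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

train_files = list(Path("path/to/train_tfrecords").iterdir())  # placeholder path

train_ds = (
    get_dataset(train_files)
    .map(format_features, num_parallel_calls=AUTOTUNE)
    .shuffle(64)   # illustrative shuffle buffer
    .batch(8)      # -> [n_videos, n_frames, resolution, resolution, channels]
    .prefetch(AUTOTUNE)
)

# model.fit(train_ds, epochs=...)  # assuming a compiled MoViNet classifier
```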