reader

This section contains the documentation for movinets_helper/reader.py.

Utilities for reading a TFRecordDataset ready to train a network.

add_states(video, label, stream_states={})

Prepares a dataset entry for the MoViNet stream models by merging the stream states into the feature dict. Note: training the stream models with this helper has not been achieved yet.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| video | _type_ | Decoded video tensor, stored under the "image" key of the feature dict. | required |
| label | _type_ | Corresponding class of the video. | required |
| stream_states | dict | Initial states of the stream model, merged into the feature dict. Defaults to {}. | {} |

Returns:

| Type | Description |
| --- | --- |
| Tuple[Dict[str, tf.Tensor], tf.Tensor] | The feature dict (stream states plus the video) and the label. |

Source code in movinets_helper/reader.py
def add_states(
    video, label, stream_states={}
) -> Tuple[Dict[str, tf.Tensor], tf.Tensor]:
    """This function is expected to modify the dataset to make it ready
    for the movinet stream models, but couldn't get to train them

    Args:
        video (_type_): _description_
        label (_type_): _description_
        stream_states (dict, optional): _description_. Defaults to {}.

    Returns:
        Tuple[Dict[str, tf.Tensor], tf.Tensor]: _description_
    """
    return {**stream_states, "image": video}, label
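
For reference, a minimal sketch of how add_states could be combined with the other helpers in this module. The `model` variable, its init_states call (available on the causal MoViNet classifiers from the official TensorFlow Models project), the state shape, and the TFRecord path are all assumptions for illustration; as noted above, training the stream models this way has not been verified.

```python
from pathlib import Path

# Assumed: `model` is a stream (causal) MoViNet classifier exposing
# init_states; the shape is [batch, frames, height, width, channels].
init_states = model.init_states([1, 8, 172, 172, 3])

ds = get_dataset(list(Path("path/to/tfrecords").iterdir()))  # placeholder path
ds = ds.map(format_features)
# Attach the initial stream states to every example; the model then receives
# a dict {"image": video, **states} together with the one-hot label.
ds = ds.map(lambda video, label: add_states(video, label, init_states))
```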

encode_label(label, num_classes)

One hot encodes the labels according to the number of classes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| label | str | Label representing the movement of the video. | required |
| num_classes | int | Total number of classes in the dataset. | required |

Returns:

| Type | Description |
| --- | --- |
| tf.Tensor | Encoded representation of the label. |

Source code in movinets_helper/reader.py
def encode_label(label: str, num_classes: int) -> tf.Tensor:
    """One hot encodes the labels according to the number of classes.

    Args:
        label (str): Label representing the movement of the video.
        num_classes (int): Total number of classes in the dataset.

    Returns:
        tf.Tensor: Encoded representation of the label
    """
    return tf.one_hot(label, num_classes)
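
A quick illustration of the encoding (hypothetical values): tf.one_hot expects an integer class index, which is why format_features below casts the label to tf.int32 before calling this function.

```python
import tensorflow as tf

# Class index 2 out of 5 classes becomes a one-hot float vector.
encoded = encode_label(2, num_classes=5)
print(encoded.numpy())  # [0. 0. 1. 0. 0.]
```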

format_features(video, label, resolution=172, scaling_factor=255.0, num_classes=2)

Transforms the data to have the appropriate shape.

This function must be called on a tf.data.Dataset (passed via its .map method).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| video | tf.Tensor | Decoded video. | required |
| label | str | Corresponding class of the video. | required |
| resolution | int | Model-dependent resolution: MoViNet a0 and a1 use 172, a2 uses 224. Defaults to 172. | 172 |
| scaling_factor | float | The videos have pixel values in the range [0, 255]; dividing by this factor rescales them to [0, 1]. Defaults to 255.0. | 255.0 |
| num_classes | int | Number of classes the model is trained on, e.g. 600 for Kinetics 600 or 101 for UCF101. Defaults to 2. | 2 |

Returns:

| Type | Description |
| --- | --- |
| Tuple[tf.Tensor, tf.Tensor] | When iterated, the first element is the video and the second is the label, as required by the model. |

Source code in movinets_helper/reader.py
def format_features(
    video: tf.Tensor,
    label: str,
    resolution: int = 172,
    scaling_factor: float = 255.0,
    num_classes: int = 2,
) -> Tuple[tf.Tensor, tf.Tensor]:
    """Transforms the data to have the appropriate shape.

    This function must be called on a tf.data.Dataset (passed
    via its .map method).

    Args:
        video (tf.Tensor): Decoded video.
        label (str): Corresponding class of the video.
        resolution (int, optional):
            The resolution will be model dependent.
            Movinet a0 and a1 use 172, a2 uses 224.
            Defaults to 172.
        scaling_factor (float, optional):
            The videos have pixel values in the range [0, 255];
            dividing by this factor rescales them to [0, 1].
            Defaults to 255.0.
        num_classes (int, optional):
            Number of classes the model is trained on,
            e.g. 600 for Kinetics 600 or 101 for UCF101.
            Defaults to 2.

    Returns:
        Tuple[tf.Tensor, tf.Tensor]:
            When iterated, the first element will be the video, and
            the second will be the label as required by the model.

    """
    label = tf.cast(label, tf.int32)
    label = encode_label(label, num_classes)

    video = tf.image.resize(video, (resolution, resolution))
    video = tf.cast(video, tf.float32) / scaling_factor

    return video, label
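
A brief usage sketch, with a placeholder TFRecord path: functools.partial is one way to pass non-default arguments through Dataset.map, here resolution=224 for a MoViNet a2 backbone and num_classes=600 for Kinetics 600.

```python
from functools import partial
from pathlib import Path

ds = get_dataset(list(Path("path/to/tfrecords").iterdir()))  # placeholder path
# Resize frames to 224x224 (MoViNet a2) and one-hot encode against 600 classes.
ds = ds.map(partial(format_features, resolution=224, num_classes=600))
```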

get_dataset(filenames)

Generates a tf.data.Dataset from the TFRecord files.

This is the appropriate format to pass to model.fit, after the dataset is formatted and batched, so the final video tensor ingested by the model has the shape [n_videos, n_frames, resolution, resolution, channels].

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| filenames | List[str] | List of .tfrecord files. | required |

Returns:

| Type | Description |
| --- | --- |
| tf.data.Dataset | Dataset ready to train the model. |

Example

target_path is the path to the directory containing the .tfrecord files.

ds = get_dataset(list(Path(target_path).iterdir()))

This iterable may be formatted appropriately:

ds = get_dataset(list(Path(target_path_train).iterdir()))
ds = ds.map(format_features)

To see a single example:

next(iter(ds))

Source code in movinets_helper/reader.py
def get_dataset(filenames: List[str]) -> tf.data.Dataset:
    """Generates a td.data.Dataset from the TFRecord files.

    This is the appropriate format to be passed to model.fit,
    after it is formated and there is some batch called, so the
    final video object ingested by the model will have the shape
    [n_videos, n_frames, resolution, resolution, channels].

    Args:
        filenames (List[str]): List of .tfrecord files.

    Returns:
        tf.data.Dataset: Dataset ready to train the model.

    Example:
        target_path is the path to the .tfrecords files directory.

        >>> ds = get_dataset(list(Path(target_path).iterdir()))

        This iterable may be formatted appropriately:

        >>> ds = get_dataset(list(Path(target_path_train).iterdir()))
        >>> ds = ds.map(format_features)

        To see a single example:

        >>> next(iter(ds))
    """
    raw_dataset = tf.data.TFRecordDataset(filenames, compression_type="GZIP")
    return raw_dataset.map(_parse_example)
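
To tie the helpers together, a hypothetical end-to-end input pipeline; the TFRecord path, batch size, and shuffle buffer are placeholders, and the final model.fit call assumes a compiled MoViNet classifier is already available.

```python
from pathlib import Path

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

train_files = list(Path("path/to/train_tfrecords").iterdir())  # placeholder path

train_ds = (
    get_dataset(train_files)
    .map(format_features, num_parallel_calls=AUTOTUNE)
    .shuffle(64)   # illustrative shuffle buffer
    .batch(8)      # -> [n_videos, n_frames, resolution, resolution, channels]
    .prefetch(AUTOTUNE)
)

# model.fit(train_ds, epochs=...)  # assuming a compiled MoViNet classifier
```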