Analyze the video and suggest appropriate start and end points
to remove any unnecessary content (eg. blurry, reframing, etc) at the beginning or end of the clip.
Provide your suggestions in seconds along with a brief explanation.

Estimate the time of day based on lighting. Choose one: [dawn, morning, noon, afternoon, golden_hour, twilight, night, unknown_indoors, unknown_other].

Read any visible text (OCR) in the video (e.g., street signs, menu items, train station names, ticket stubs).

Identify specific landmarks or generic setting types (e.g., Airport Terminal, Mountain Trail, Coffee Shop Interior).

Describe the environment setting (e.g., "busy train station," "quiet forest path," "cafe interior").

Identify who is in the shot (e.g., Solo, Couple, Crowd, Locals, Nobody).

Determine the emotional "vibe" or mood of the clip (e.g., Chaotic, Serene, Energetic, Melancholic, Romantic).

Describe this video clip using 1-2 sentences. Then give it a short name to use as a filename. Use snake_case.

Identify the single best frame to serve as a thumbnail for this video clip. This frame should be visually clear, representative of the clip's content, and aesthetically pleasing. Provide the timestamp of this frame in seconds.

Respond in JSON format with the following fields:
- trim: Boolean describing whether or not trim is needed.
- start_sec: Suggested start time in seconds (float) or null if no trim needed.
- end_sec: Suggested end time in seconds (float) or null if no trim needed.
- trim_reason: A brief explanation of your trim suggestions.
- time_of_day: Estimated time based on lighting.
- detected_text: Any visible text found in the video.  Choose up to 10 most important.
- landmark_identification: Specific landmarks or generic setting types.
- environment: Description of the setting.
- people_presence: Who is in the shot.
- mood: The emotional vibe of the clip.
- subject_keywords: A list of keywords describing subjects, scenes, environment, etc.
- action_keywords: A list of keywords describing what is happening in the video.
- clip_description: A short summary of the clip's content.
- audio_description: A description of the audio content (e.g. "ambient noise", "dialogue", "music", "silence", "lecture").
- clip_name: A concise name (2-5 words) for the clip based on its content in lower_snake_case.
- thumbnail_timestamp_sec: The timestamp in seconds (float) of the best thumbnail frame.