Google · Engineering Analyst

YouTube Trust & Safety, L4 — SQL + policy/case studies

Hyderabad · May 2026

Rejected

5 yrs exp · Tier-2 · Product startup · Referral

Total process: 35 days

5 rounds: Recruiter Screen → SQL → Behavioural / Fit → Case + SQL (RRK) → Hiring Manager

Background

I interviewed for an Engineering Analyst (L4) role on Google's YouTube Trust & Safety team in Hyderabad. I currently work at a product company (e-commerce), where my work spans analytics, coding, and cross-functional coordination.

Timeline

When	Stage
2 Feb 2026	Got Referral
5 May 2026	Round 1 — Recruiter Screen (HR) · cleared
11 May 2026	Round 2 — SQL · ~45 min · cleared
27 May 2026	Round 3 — Behavioural / Fit · ~30 min · cleared
27 May 2026	Round 4 — Case + SQL (RRK) · cleared
3 June 2026	Round 5 — Hiring Manager · case-style · cleared
7 July 2026	Not selected

Rounds

Round 1 — Recruiter Screen (HR)

Format: Video | Duration: ~45 min | Interviewer: Recruiter

The interviewer asked me to describe my current work. I took a bit too long here — I should have wrapped it up in about 5–7 minutes. She then walked through my compensation (fixed, variable, stock — the full breakdown) and asked why I wanted to switch.

Since the role is for YouTube Trust & Safety, she moved into product-thinking questions.

Q1 — If you had to develop a new feature on YouTube, what would it be?

After my answer, she asked for a second feature — so be prepared to pitch multiple ideas and explain the value each one adds.

Q2 — How does YouTube generate revenue?

Covered ads, YouTube Premium, etc.
Follow-up: What are two other revenue streams YouTube could explore?

Feedback received

Be clear and direct in your answers — avoid back-and-forth.
Ask clarifying questions before jumping in.
Take a minute or two to think if you're unsure.

Prep guidance for upcoming rounds

Deep-dive into YouTube's product offerings and latest features.
Study YouTube Community Guidelines thoroughly.
Understand global trends in Trust & Safety — what's new, what's changing.
The loop has four 45-minute elimination rounds: Googliness, Leadership, Technical Expertise, and Cognitive Ability (Google-specific).
The full process can take 4–5 weeks, and each round can cover a wide range of topics.

Verdict: Cleared

Round 2 — SQL

Format: Video | Duration: ~45 min | Interviewer: Engineer

After a brief 2–3 minute intro, the interviewer worked through the 3 SQL questions they had planned. I solved all three quickly (~15 min), so they added a fourth, harder open-ended question.

Q1 — Given a table of students and grades, get the names of students with the second lowest grade.

Why DENSE_RANK and not RANK or ROW_NUMBER? What's the difference?
What if two students are tied — what does your query return?
Solve it without window functions (very important).
Now get students with both the lowest AND second lowest grade.
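The Q1 variants can be sketched against SQLite through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or later). The `students(name, grade)` table and its rows are hypothetical; the data has ties at both the lowest and second-lowest grade so the `DENSE_RANK` vs. `RANK` difference is visible:

```python
import sqlite3

# DENSE_RANK gives tied grades the same rank and leaves no gaps,
# so rank 2 is always the second-lowest distinct grade.
DENSE_RANK_SQL = """
SELECT name FROM (
    SELECT name, DENSE_RANK() OVER (ORDER BY grade) AS rnk FROM students
) AS ranked
WHERE rnk = 2
ORDER BY name
"""

# RANK leaves a gap after ties (1, 1, 3, ...), so rank 2 can be empty.
RANK_SQL = DENSE_RANK_SQL.replace("DENSE_RANK()", "RANK()")

# No window functions: the second-lowest grade is the smallest grade
# strictly greater than the overall minimum.
NO_WINDOW_SQL = """
SELECT name FROM students
WHERE grade = (
    SELECT MIN(grade) FROM students
    WHERE grade > (SELECT MIN(grade) FROM students)
)
ORDER BY name
"""

# Lowest AND second-lowest: every grade up to the second-lowest distinct grade.
# (Returns nothing if all students share one grade, since the inner MIN is NULL.)
LOWEST_TWO_SQL = """
SELECT name, grade FROM students
WHERE grade <= (
    SELECT MIN(grade) FROM students
    WHERE grade > (SELECT MIN(grade) FROM students)
)
ORDER BY grade, name
"""


def run(query, rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
    conn.executemany("INSERT INTO students VALUES (?, ?)", rows)
    return conn.execute(query).fetchall()


rows = [("Asha", 60), ("Ben", 60), ("Chen", 70), ("Dev", 70), ("Eli", 85)]
print(run(DENSE_RANK_SQL, rows))  # [('Chen',), ('Dev',)]
print(run(RANK_SQL, rows))        # [] because the ranks are 1, 1, 3, 3, 5
print(run(NO_WINDOW_SQL, rows))   # [('Chen',), ('Dev',)]
print(run(LOWEST_TWO_SQL, rows))  # [('Asha', 60), ('Ben', 60), ('Chen', 70), ('Dev', 70)]
```

`ROW_NUMBER()` in the same position would return exactly one of Chen or Dev, chosen arbitrarily, which is why it is the wrong tool when ties matter.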

Q2 — How old are disabled users on average?

USERS (`user_id`, `user_country`, `creation_timestamp`, `disable_timestamp`)

SPAMMY_ACTIONS (`action_id`, `product`, `user_id`, `spam_timestamp`)

Expected clarification: does "old" mean time since account creation or time since disabling? (Answer: current_date - disable_timestamp.)
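A minimal sketch of Q2 in SQLite, assuming timestamps are stored as ISO-8601 text. Following the interviewer's answer, age is measured from `disable_timestamp` to a reference date; the reference date is passed in rather than using the current date so the output is reproducible, and the sample rows are made up. `SPAMMY_ACTIONS` is not needed here. The second query shows the other reading (account age at the moment of disabling), which is why the clarification matters:

```python
import sqlite3

# Interviewer's definition: age of a disabled user = reference date - disable_timestamp.
DAYS_SINCE_DISABLE_SQL = """
SELECT AVG(julianday(?) - julianday(disable_timestamp))
FROM users
WHERE disable_timestamp IS NOT NULL  -- only disabled accounts
"""

# The other reading: how old the account was when it was disabled.
AGE_AT_DISABLE_SQL = """
SELECT AVG(julianday(disable_timestamp) - julianday(creation_timestamp))
FROM users
WHERE disable_timestamp IS NOT NULL
"""


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id INTEGER, user_country TEXT, "
        "creation_timestamp TEXT, disable_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", rows)
    return conn


conn = make_db([
    (1, "IN", "2025-05-01", "2026-05-01"),
    (2, "IN", "2026-04-01", "2026-04-21"),
    (3, "US", "2024-01-01", None),  # still active, so excluded
])
print(conn.execute(DAYS_SINCE_DISABLE_SQL, ("2026-05-11",)).fetchone()[0])  # 15.0
print(conn.execute(AGE_AT_DISABLE_SQL).fetchone()[0])                        # 192.5
```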

Q3 — What are the top 3 spammed products per country? (from the schema above)

What if SPAMMY_ACTIONS has duplicate rows — how does that affect your count?
If 2 products are tied at rank 3, what does your query return? How many rows per country?
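One way to write Q3, sketched in SQLite with made-up rows. It assumes a product's country is the spamming user's `user_country`, and that duplicate rows in `SPAMMY_ACTIONS` repeat the same `action_id`, so `COUNT(DISTINCT action_id)` neutralises them:

```python
import sqlite3

TOP3_SQL = """
WITH counts AS (
    SELECT u.user_country, s.product,
           COUNT(DISTINCT s.action_id) AS n_spam  -- duplicate rows share an action_id
    FROM spammy_actions AS s
    JOIN users AS u ON u.user_id = s.user_id
    GROUP BY u.user_country, s.product
),
ranked AS (
    SELECT user_country, product, n_spam,
           DENSE_RANK() OVER (PARTITION BY user_country ORDER BY n_spam DESC) AS rnk
    FROM counts
)
SELECT user_country, product, n_spam
FROM ranked
WHERE rnk <= 3  -- ties at rank 3 are all kept, so a country can return more than 3 rows
ORDER BY user_country, n_spam DESC, product
"""


def top3(users, actions):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id TEXT, user_country TEXT, "
                 "creation_timestamp TEXT, disable_timestamp TEXT)")
    conn.execute("CREATE TABLE spammy_actions (action_id INTEGER, product TEXT, "
                 "user_id TEXT, spam_timestamp TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", users)
    conn.executemany("INSERT INTO spammy_actions VALUES (?, ?, ?, ?)", actions)
    return conn.execute(TOP3_SQL).fetchall()


users = [("u1", "IN", "2025-01-01", None), ("u2", "US", "2025-01-01", None)]
actions = [(i, p, "u1", "2026-05-01") for i, p in [
    (1, "p1"), (2, "p1"), (3, "p1"),
    (4, "p2"), (5, "p2"), (5, "p2"),  # action 5 is a duplicate row
    (6, "p3"), (7, "p4"),
]]
actions.append((8, "p1", "u2", "2026-05-01"))
print(top3(users, actions))
# [('IN', 'p1', 3), ('IN', 'p2', 2), ('IN', 'p3', 1), ('IN', 'p4', 1), ('US', 'p1', 1)]
```

Here `p3` and `p4` tie at rank 3, so India returns four rows. Swapping `DENSE_RANK()` for `ROW_NUMBER()` would cap each country at exactly three rows but break the tie arbitrarily; which behaviour is right is a question to put back to the interviewer.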

Q4 — Given a table of monthly YouTube channel revenue (`channel_id`, `billing_country`, `revenue`, `month`) going back several years, write a query to find channels that had a "spike" in typical monthly revenue, along with the month of the spike.

This was open-ended (~20 min) and ran in three steps. First, the interviewer asked me to define a spike before writing any SQL. I initially said a 50% increase from last month, then changed to mean + 2 standard deviations; the interviewer probed which definition has fewer pitfalls and why. Then I wrote the query, and the bulk of the discussion was edge-case probing:

Should the current month be included in the baseline calculation? What happens if it is?
What if a channel has only 1 month of data?
What about channels with constant revenue — what does your query do?
What about seasonal channels (e.g., holiday content that spikes every December)?
What if the channel is on a growth journey — how does that break your query, and how do you accommodate it?
What if there's one massive spike in the data — how does that affect your baseline, and how do you handle it?
You used average — what would be a better approach? (Expected: rolling 12-month window, median instead of mean.)
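A sketch of one way to encode this discussion in SQLite (3.25+ for window functions), assuming a table named `channel_revenue` with the four columns from the question, `month` stored as `'YYYY-MM'` text, and no missing months (the `ROWS` frame counts rows, not calendar months). The baseline is the trailing 12 months with the current month excluded. The 1.5x floor, the 6-month minimum history, and the sample data are illustrative choices, not what the interviewer specified. It uses mean and standard deviation for brevity; SQLite has no median aggregate, so the median variant would need extra work, and seasonality and growth trends are not handled:

```python
import sqlite3

SPIKE_SQL = """
WITH baseline AS (
    SELECT
        channel_id,
        month,
        revenue,
        -- Trailing window: the 12 rows before this month, current month excluded.
        COUNT(*) OVER (PARTITION BY channel_id ORDER BY month
                       ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING) AS n_prior,
        AVG(revenue) OVER (PARTITION BY channel_id ORDER BY month
                           ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING) AS mean_prior,
        AVG(revenue * revenue) OVER (PARTITION BY channel_id ORDER BY month
                                     ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING) AS mean_sq_prior
    FROM channel_revenue
)
SELECT channel_id, month, revenue
FROM baseline
WHERE n_prior >= 6                 -- too little history: never call it a spike
  AND revenue >= 1.5 * mean_prior  -- floor so flat channels (std = 0) need a real jump
  -- revenue > mean + 2 * std, squared to avoid SQRT (population variance = E[x^2] - E[x]^2)
  AND (revenue - mean_prior) * (revenue - mean_prior)
      > 4 * (mean_sq_prior - mean_prior * mean_prior)
ORDER BY channel_id, month
"""


def find_spikes(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE channel_revenue "
                 "(channel_id TEXT, billing_country TEXT, revenue REAL, month TEXT)")
    conn.executemany("INSERT INTO channel_revenue VALUES (?, ?, ?, ?)", rows)
    return conn.execute(SPIKE_SQL).fetchall()


months_2024 = [f"2024-{m:02d}" for m in range(1, 13)]
# Channel a: flat at 100, then triples.
rows = [("a", "IN", 100, m) for m in months_2024] + [("a", "IN", 300, "2025-01")]
# Channel b: normal noise around 100.
rows += [("b", "IN", 90 if i % 2 == 0 else 110, m) for i, m in enumerate(months_2024)]
rows += [("b", "IN", 120, "2025-01")]
# Channel c: a big jump, but only 2 months of history.
rows += [("c", "IN", 10, "2024-01"), ("c", "IN", 10, "2024-02"), ("c", "IN", 1000, "2024-03")]
print(find_spikes(rows))  # [('a', '2025-01', 300.0)]
```

Combining the two definitions from the interview (relative jump and mean + 2 std) covers the constant-revenue case, where the standard deviation alone would flag any increase at all.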

The interviewer cared less about getting the perfect query on the first try and more about identifying edge cases, articulating trade-offs, and iterating toward a robust solution.

Verdict: Cleared

Round 3 — Behavioural / Fit

Format: Video | Duration: ~30 min (45-min slot)

The interviewer opened with a ~5 min intro covering format, expectations, and introductions from both sides. The rest was a behavioural conversation built around one core project with a series of follow-ups on collaboration, stakeholders, and fit.

Q1 — Of all the projects you've worked on, which was the most impactful?

Walked through one of my most impactful projects at a product company, which involved coordinating across multiple teams and stakeholders.

Q2 — How did you get all the teams on the same page and moving forward?

Discussed my alignment approach — shared goals, communication cadence, etc.

Q3 — Who was the most difficult stakeholder to work with, and what did you do to keep things moving?

Discussed the specific stakeholder, the friction, and how I navigated it.

Q4 — What was the metric you were trying to move, and what was the actual business impact?

Walked through the target metric and the measured outcome.

Q5 — If you were to build an ideal work environment, what would its features be?

Feedback culture
Open communication
All ideas considered

Q6 — What could break this environment?

People misalignment
Misaligned objectives across teams

Q7 — If you're working with an entirely new team of stakeholders, how do you influence them?

Find a common goal
Establish a shared baseline/context before pushing ideas

Q8 — What are common obstacles you might encounter due to a lack of experience with a new team — and how do you still push forward?

Discussed challenges of being new — lack of credibility, missing context, unclear decision-makers — and how to navigate them.

Q9 — What is something you'd want to avoid in the new organization going forward?

Discussed factors I'd avoid — framed positively as preferences.

Q10 — What if your stakeholders' metrics are opposite to yours? How do you influence them to prioritize your metric or find common ground?

Framed it as a shared trade-off rather than win/lose — finding the higher-level metric both ladder up to, quantifying trade-offs, escalating cleanly if needed.

At the end I asked questions focused on Trust & Safety — the work, the challenges, and team dynamics.

The interviewer was clearly testing stakeholder management, influence without authority, and self-awareness — all critical for a T&S Analyst role where you work cross-functionally with policy, product, ops, and ML teams. They used one project as a hook and pulled multiple behavioural signals from it (impact, collaboration, conflict, metrics), so be ready to defend one project from many angles. It wrapped 15 min early — could be efficiency, or that the interviewer had what they needed; hard to read either way.

Verdict: Cleared

Round 4 — Case + SQL (RRK)

Format: Video | Duration: ~45 min

After mutual introductions, we discussed my current role (coding languages, analytics scope) before moving into a case question and then SQL.

Q1 — A group of users displaying pro-Russia behaviour in Palestine war content gets detected and removed, but they keep returning to the platform. How would you automate this?

Detection: an LLM-based classifier that learns from past flagged behaviours across the platform.
Rule/answer bank: past violations form a reference set of rules and signals.
Automation: new users displaying matching behaviour are automatically flagged against this bank.

Q2 — Precision vs. recall trade-off: 50 bad + 50 good users. Model A flags 75 → 50 bad, 25 good (high recall, lower precision). Model B flags 25 → all 25 good. Which do you pick?

The choice depends on severity of harm.
High-severity harm → favour recall (catch everything, accept false positives).
Low-severity harm → favour precision (avoid jamming reviewer queues with false positives).

(Note: as stated, Model B catches zero bad users — likely a misstatement in the original problem.)
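For reference, the arithmetic behind the two models, with 50 bad users in total. Because Model B as written catches nobody, the sketch also computes an assumed intended reading in which all 25 of its flags are bad users:

```python
def precision_recall(true_pos, flagged, total_bad):
    """Precision = bad users caught / users flagged; recall = bad users caught / all bad users."""
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / total_bad if total_bad else 0.0
    return precision, recall


TOTAL_BAD = 50
model_a = precision_recall(true_pos=50, flagged=75, total_bad=TOTAL_BAD)
model_b_as_stated = precision_recall(true_pos=0, flagged=25, total_bad=TOTAL_BAD)
# Assumed intended reading: Model B flags 25 users and all 25 are bad.
model_b_assumed = precision_recall(true_pos=25, flagged=25, total_bad=TOTAL_BAD)

print(model_a)            # (0.6666666666666666, 1.0)
print(model_b_as_stated)  # (0.0, 0.0)
print(model_b_assumed)    # (1.0, 0.5)
```

Under that reading the trade-off is explicit: Model A catches every bad user at 67% precision, while Model B is always right but misses half of them.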

Self-note: I over-explained with calculations — I should have anchored directly on the precision vs. recall framing.

Q3 — How does high-recall / low-precision hurt creators?

Legitimate creator content gets flagged → stuck in pending review, demonetized, or suppressed in distribution.
Adds friction, hurts creator experience and retention.

Q4 — How do you improve this for trusted creators?

Build a user reputation signal: prior strikes, category history, account age, engagement quality.
For clean-history users:
- Don't auto-demonetize at upload.
- Route to a fast-track human review queue.
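The trusted-creator routing can be sketched as a small decision function. Everything here is hypothetical: the `CreatorHistory` fields mirror the signals listed above (category history is left out for brevity), and the thresholds are placeholders rather than real policy values:

```python
from dataclasses import dataclass


@dataclass
class CreatorHistory:
    # Hypothetical reputation inputs, mirroring the signals listed above.
    prior_strikes: int
    account_age_days: int
    engagement_quality: float  # assumed 0..1 score produced elsewhere


def is_trusted(c: CreatorHistory) -> bool:
    # Illustrative thresholds; real ones would be tuned against labelled outcomes.
    return c.prior_strikes == 0 and c.account_age_days >= 365 and c.engagement_quality >= 0.8


def route_flagged_upload(c: CreatorHistory) -> str:
    """Decide what happens to an upload the classifier has flagged."""
    if is_trusted(c):
        return "fast_track_human_review"  # stays monetized until a reviewer decides
    return "demonetize_pending_review"


print(route_flagged_upload(CreatorHistory(0, 900, 0.9)))  # fast_track_human_review
print(route_flagged_upload(CreatorHistory(2, 900, 0.9)))  # demonetize_pending_review
```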

Q5 — How would you automate the broader system?

Pure keyword/rule-based detection misses context.
An LLM can ingest the full video + metadata to build contextual understanding and classify better.

Q6 — Precision is at 50%. How do you improve it?

50% is an average — segment by category/content type.
High-precision segments → auto-action.
Low-precision segments → human review.
LLM-assisted human review: present the reviewer with video history, a metadata snapshot, the applicable policy, and a suggested action.
The reviewer's decisions feed back into LLM training → feedback loop → precision improves over time.
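A sketch of the segmentation idea: compute precision per segment from reviewer outcomes on flagged items, then auto-action only the segments that clear a bar. The segment names, the 0.95 threshold, and the counts are illustrative, chosen so that the blended precision is the 50% from the question:

```python
from collections import defaultdict


def precision_by_segment(reviews):
    """reviews: (segment, was_violation) pairs for items the model flagged."""
    flagged = defaultdict(int)
    correct = defaultdict(int)
    for segment, was_violation in reviews:
        flagged[segment] += 1
        correct[segment] += int(was_violation)
    return {seg: correct[seg] / flagged[seg] for seg in flagged}


def routing_policy(precisions, auto_threshold=0.95):
    """Auto-action segments at or above the (hypothetical) bar; send the rest to humans."""
    return {seg: "auto_action" if p >= auto_threshold else "human_review"
            for seg, p in precisions.items()}


# Illustrative numbers: a blended 50% precision hiding two very different segments.
reviews = ([("spam_links", True)] * 19 + [("spam_links", False)]
           + [("news_commentary", True)] + [("news_commentary", False)] * 19)
precisions = precision_by_segment(reviews)
print(precisions)                  # {'spam_links': 0.95, 'news_commentary': 0.05}
print(routing_policy(precisions))  # {'spam_links': 'auto_action', 'news_commentary': 'human_review'}
```

Overall precision is 20/40 = 50%, yet one segment is safe to automate and the other should go entirely to human review. Reviewer decisions from the human-review segment are also the labels the feedback loop would train on.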

Q7 — SQL: a simple `COUNT(*) ... GROUP BY` query, plus a regex pattern match on emails — match digits, a symbol, a three-digit number, then `@gmail.com`.

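```sql
-- MySQL syntax: the '\\.' in the string literal reaches the regex engine as '\.', a literal dot.
-- [^a-zA-Z0-9] is the "symbol": any single non-alphanumeric character.
SELECT *
FROM users
WHERE email REGEXP '^[0-9]+[^a-zA-Z0-9][0-9]{3}@gmail\\.com$';
```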
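MySQL evaluates `REGEXP` natively. To check the pattern against sample emails without a MySQL server, this sketch runs the same query in SQLite, which has no built-in `REGEXP` and instead calls a user-registered `regexp()` function, here backed by Python's `re`. The sample emails are made up:

```python
import re
import sqlite3

# Same pattern as the MySQL query, written as a Python raw string.
EMAIL_PATTERN = r"^[0-9]+[^a-zA-Z0-9][0-9]{3}@gmail\.com$"


def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): pattern first, then the column value.
    return int(value is not None and re.search(pattern, value) is not None)


def matching_emails(emails):
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, regexp)
    conn.execute("CREATE TABLE users (email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [(e,) for e in emails])
    rows = conn.execute(
        "SELECT email FROM users WHERE email REGEXP ? ORDER BY email", (EMAIL_PATTERN,)
    )
    return [email for (email,) in rows]


print(matching_emails([
    "12-345@gmail.com",   # digits, symbol, three digits: match
    "7.123@gmail.com",    # match
    "12-3456@gmail.com",  # four trailing digits: no match
    "ab-123@gmail.com",   # leading letters: no match
    "12-345@gmailxcom",   # the escaped dot must be a literal '.': no match
]))
# ['12-345@gmail.com', '7.123@gmail.com']
```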

Verdict: Cleared

Round 5 — Hiring Manager (Case)

Format: Video | Duration: ~45 min | Interviewer: Hiring Manager

The interviewer opened with a single anchoring scenario (measuring AI-generated content on the platform) and progressively layered on harder constraints: detection → measurement → good/bad classification → technical implementation → a real-world inauthentic-behaviour case → an operational incident. It was less a behavioural round than a case interview probing systems thinking, T&S judgment, and how you reason about detection at scale.

The framing scenario: you're on a team tasked with figuring out what percentage of videos on the platform are AI-generated.

Q1 — How would you figure out what percentage of videos on the platform are AI-generated?

Detection — mechanisms for identifying AI-generated content.
Measurement — out of X videos, how many contain AI?
Follow-up: What are the drawbacks of measuring the way you proposed? (Pushed on limitations — e.g. sampling bias, classifier precision/recall, evolving generation methods.)
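One standard way to correct a classifier-based count for the precision/recall concern is the Rogan-Gladen estimator, which adjusts the raw flag rate using the classifier's sensitivity (recall) and specificity. The post does not say this was the expected answer; it is a sketch assuming the flag rate comes from a random sample of videos and both error rates were measured on a separate human-labelled sample. The input numbers are illustrative:

```python
def adjusted_prevalence(flag_rate, sensitivity, specificity):
    """Rogan-Gladen estimate of the true share of AI-generated videos.

    flag_rate:   share of a random video sample the classifier flags as AI-generated.
    sensitivity: P(flagged | AI-generated), measured on a human-labelled sample.
    specificity: P(not flagged | not AI-generated), same labelled sample.
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("classifier is no better than chance; estimate is undefined")
    estimate = (flag_rate + specificity - 1) / denom
    return min(1.0, max(0.0, estimate))  # clip sampling noise back into [0, 1]


# Illustrative: the classifier flags 30.5% of sampled videos,
# with 90% sensitivity and 95% specificity on the labelled set.
print(round(adjusted_prevalence(0.305, 0.90, 0.95), 4))  # 0.3
```

This corrects for classifier error only; sampling bias and drift as generation methods evolve still need a representative sample and periodic re-labelling.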

Q2 — Say the classifier flags 1,000 AI-generated videos. How do you detect which are "bad" vs "good"?

I clarified what "good/bad" meant. The interviewer's framing: a news channel using AI as part of a video isn't harmful; someone using it for deceptive/harmful use cases is bad.
Follow-up: How do you build a long-term mechanism for tracking good vs. bad over time?
Follow-up: How do you implement this technically — the actual detection mechanism for good vs. bad?

Q3 — A public figure has come to us for the third time: ~10 deepfake videos across ~40 channels keep reposting. We've removed content before, but this time we want to tell them we've removed all deepfake videos of them on the platform. How do you find and remove all of it, and make a credible assurance it's all gone and stays gone?

Addressed both finding/removing all instances at scale and how to make a credible, durable assurance.
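The post does not record the specific method used for Q3. One common approach to finding re-uploads of known videos is near-duplicate matching on perceptual hashes, sketched below assuming a 64-bit hash per keyframe has already been computed; the 10-bit threshold and the sample hashes are hypothetical. A linear scan like this is only for illustration, and a platform-scale system would need an index over the hashes:

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit perceptual hashes."""
    return bin(a ^ b).count("1")


def find_reuploads(known_deepfake_hashes, candidates, max_distance=10):
    """Return ids of videos whose keyframe hash is close to any confirmed deepfake.

    candidates: iterable of (video_id, frame_hash) pairs; hashes are computed elsewhere.
    """
    matches = []
    for video_id, frame_hash in candidates:
        if any(hamming(frame_hash, known) <= max_distance for known in known_deepfake_hashes):
            matches.append(video_id)
    return matches


known = [0xFFFF0000FFFF0000]
candidates = [
    ("v1", 0xFFFF0000FFFF0001),          # 1 bit away: likely a re-encode
    ("v2", 0x0000FFFF0000FFFF),          # every bit differs: unrelated
    ("v3", 0xFFFF0000FFFF0000 ^ 0b111),  # 3 bits away
]
print(find_reuploads(known, candidates))  # ['v1', 'v3']
```

Running the same check on every new upload against the confirmed set is what would let the assurance hold over time, not just at the moment of the sweep.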

Q4 — You arrive one morning and the human-review queue is 10x normal volume. How do you reduce the queue while maintaining review quality, and make sure you don't fall into this trap again?

Q5 — How do you detect AI-generated videos and produce a number for what % of AI videos on the platform are harmful — then explain that number to stakeholders?

Overlaps with Q1; here the emphasis was on arriving at a defensible figure and communicating it.
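For Q5, one way to make the figure defensible is to report it with an interval from a random, human-labelled sample rather than as a bare percentage. A sketch using the Wilson score interval for a proportion; the sample size and counts are made up, and it assumes the sample is drawn at random from videos already identified as AI-generated:

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95% coverage)."""
    if n <= 0:
        raise ValueError("sample must contain at least one labelled video")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half_width, center + half_width


# Illustrative: reviewers label 400 randomly sampled AI-generated videos; 48 are harmful.
harmful, sample = 48, 400
low, high = wilson_interval(harmful, sample)
print(f"{harmful / sample:.1%} harmful (95% CI {low:.1%} to {high:.1%})")
```

For these illustrative numbers the statement becomes "about 12% of AI-generated videos are harmful, plausibly between roughly 9% and 16%", which gives stakeholders both the figure and how much to trust it.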

The interviewer tested end-to-end T&S systems thinking — detection, measurement, classification, technical implementation, and operations — not just behavioural fit. One scenario (AI on platform) was used as a hook, then layered with constraints to see how deep I could reason. Clarifying "good vs. bad" before answering was the right move and was explicitly invited. Each answer triggered a "now how about…" follow-up.

Verdict: Cleared

Final Outcome

I completed all five rounds, finishing with the Hiring Manager interview on 3 June 2026. After about a month's wait, HR contacted me to say they would not be proceeding with my profile.

Tips for Future Candidates

Deep-dive into YouTube's product offerings and latest features, study the Community Guidelines thoroughly, and follow global Trust & Safety trends — these show up across every round.
For SQL, be ready to justify DENSE_RANK vs. RANK vs. ROW_NUMBER, solve without window functions, and reason aloud about ties, duplicates, and edge cases — the interviewer cares more about trade-offs than a perfect first query.
In case rounds, clarify ambiguous terms (like "good vs. bad" or "old") before answering — it's explicitly welcomed — and anchor directly on the framework (e.g. precision vs. recall) instead of over-explaining with calculations.
Have one strong project you can defend from many angles: impact, collaboration, difficult stakeholders, and measured business outcome.
Expect each answer to trigger a harder follow-up — practice reasoning about detection, measurement, and operations at scale.
Have a lot of patience: it took me nearly 3 months from the referral just to get shortlisted for the role.
