Google · Engineering Analyst
Hyderabad · May 2026
In progress · Total process: 35 days
5 rounds
I applied for an Engineering Analyst role at Google's YouTube Trust & Safety team in Hyderabad. This was an L4-level position, and the interview process spanned from early May through early June 2026.
I was contacted by a Google recruiter and scheduled for an initial screening on May 5th. The full process consisted of four elimination rounds, each lasting 45 minutes, scheduled over roughly four weeks.
Format: Phone / Virtual
Duration: ~45 minutes
The recruiter started with a general self-introduction and asked me to walk through my current work. I spent too long on this part — I should have wrapped it up in 5–7 minutes instead.
We then moved into a compensation discussion. The recruiter asked for a full breakdown: fixed pay, variable pay, stock, and other benefits. This was a straightforward factual exchange.
She asked why I was looking to switch from my current company, which I answered directly.
Product thinking and YouTube context:
Since the role was for YouTube Trust & Safety, she pivoted to product thinking. She asked: "If you had to develop a new feature on YouTube, what would it be?" After I answered, she immediately followed up: "Now pitch a second feature." So I needed to be ready with multiple feature ideas and be able to explain the value each one adds.
Next came a revenue question: "How does YouTube generate revenue?" followed by "What are two other revenue streams YouTube could explore?"
Feedback received:
Preparation guidance for upcoming rounds: The recruiter laid out the structure: four elimination rounds covering Googliness, Leadership, Technical Expertise, and Cognitive Ability. She advised deep-diving into YouTube's product offerings, studying YouTube Community Guidelines thoroughly, and understanding global trends in Trust & Safety.
Verdict: Cleared
Format: Virtual
Duration: ~45 minutes (May 11th, 2026)
The interviewer gave a brief 2–3 minute intro, then presented three SQL questions planned for ~15 minutes, with a fourth harder open-ended question added when I solved the first three quickly.
Q1 — Second Lowest Grade
Given a table of students and grades, find the names of students with the second lowest grade.
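For readers practicing this question, here is a minimal sketch of one possible answer, run through Python's built-in sqlite3 module. The table name `students`, its columns, and the sample rows are assumptions made for illustration; the post does not give the schema, and this is not the exact query written in the interview. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical schema and data; the original post does not show the table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (name TEXT, grade INTEGER);
INSERT INTO students VALUES
  ('Asha', 55), ('Bilal', 55), ('Chen', 62), ('Divya', 62), ('Esha', 70);
""")

# Compute all three ranking functions side by side to compare them.
RANKED_QUERY = """
SELECT name, grade,
       DENSE_RANK() OVER (ORDER BY grade)       AS dense_rnk,
       RANK()       OVER (ORDER BY grade)       AS rnk,
       ROW_NUMBER() OVER (ORDER BY grade, name) AS row_num
FROM students
ORDER BY grade, name;
"""
rows = conn.execute(RANKED_QUERY).fetchall()

# DENSE_RANK leaves no gaps after ties, so rank 2 is the second distinct grade.
second_lowest = [name for name, _, dense_rnk, _, _ in rows if dense_rnk == 2]
# RANK skips positions after ties (1, 1, 3, ...), so rank 2 may not exist.
rank_equals_two = [name for name, _, _, rnk, _ in rows if rnk == 2]
# ROW_NUMBER ignores ties entirely, so row 2 can still hold the lowest grade.
row_number_two = [name for name, _, _, _, row_num in rows if row_num == 2]

print(second_lowest, rank_equals_two, row_number_two)
```

With two students tied on the lowest grade, only the DENSE_RANK filter returns both students who share the second lowest grade, which is the point of the follow-up question.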
Follow-ups:
Why DENSE_RANK and not RANK or ROW_NUMBER? What's the difference?
Q2 — Average Age of Disabled Users
I had two tables:
USERS (user_id, user_country, creation_timestamp, disable_timestamp)
SPAMMY_ACTIONS (action_id, product, user_id, spam_timestamp)
The question: How old are disabled users on average?
The key clarification the interviewer pushed for: What does "old" mean — time since account creation, or time since disabling? The answer they wanted: current_date - disable_timestamp.
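A minimal sketch of both readings of "old" is below, using the USERS schema from the question. The sample rows, the ISO date strings, and the fixed `AS_OF` reference date (standing in for the current date so the result is reproducible) are assumptions for illustration, and `julianday` is SQLite-specific date arithmetic rather than whatever dialect the interview used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
  user_id INTEGER, user_country TEXT,
  creation_timestamp TEXT, disable_timestamp TEXT
);
-- Hypothetical rows; user 3 was never disabled.
INSERT INTO users VALUES
  (1, 'IN', '2025-01-01', '2026-05-01'),
  (2, 'US', '2025-06-01', '2026-05-07'),
  (3, 'IN', '2025-03-01', NULL);
""")

AS_OF = "2026-05-11"  # assumed stand-in for current_date

QUERY = """
SELECT
  -- Reading the interviewer wanted: days since the account was disabled.
  AVG(julianday(:as_of) - julianday(disable_timestamp)) AS avg_days_since_disable,
  -- Alternative reading: account lifetime from creation to disabling.
  AVG(julianday(disable_timestamp) - julianday(creation_timestamp)) AS avg_days_active
FROM users
WHERE disable_timestamp IS NOT NULL;  -- only disabled users count
"""
avg_since_disable, avg_active = conn.execute(QUERY, {"as_of": AS_OF}).fetchone()
print(avg_since_disable, avg_active)
```

The two averages differ by more than an order of magnitude on this toy data, which is why pinning down the definition before writing SQL matters.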
Q3 — Top 3 Spammed Products Per Country
Using the same schema, find the top 3 spammed products per country.
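One way to approach this is sketched below with sqlite3. The sample data is invented, and the choices to deduplicate on identical rows and to use DENSE_RANK (which can return more than three products when counts tie) are assumptions, not the interviewer's stated expectations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, user_country TEXT,
                    creation_timestamp TEXT, disable_timestamp TEXT);
CREATE TABLE spammy_actions (action_id INTEGER, product TEXT,
                             user_id INTEGER, spam_timestamp TEXT);
INSERT INTO users VALUES (1, 'IN', '2025-01-01', NULL),
                         (2, 'US', '2025-01-01', NULL);
""")

# Hypothetical spam volumes: (product, user_id, number of distinct actions).
volumes = [("Comments", 1, 4), ("Live Chat", 1, 3), ("Shorts", 1, 2),
           ("Community", 1, 1), ("Shorts", 2, 2), ("Comments", 2, 1)]
actions = []
for product, user_id, n in volumes:
    for _ in range(n):
        actions.append((len(actions) + 1, product, user_id, "2026-05-01"))
actions.append(actions[0])  # one exact duplicate row, to exercise the dedup step
conn.executemany("INSERT INTO spammy_actions VALUES (?, ?, ?, ?)", actions)

QUERY = """
WITH deduped AS (            -- collapse exact duplicate rows before counting
  SELECT DISTINCT action_id, product, user_id FROM spammy_actions
),
counts AS (
  SELECT u.user_country, d.product, COUNT(*) AS spam_count
  FROM deduped d JOIN users u ON u.user_id = d.user_id
  GROUP BY u.user_country, d.product
),
ranked AS (
  SELECT *, DENSE_RANK() OVER (
           PARTITION BY user_country ORDER BY spam_count DESC) AS rnk
  FROM counts
)
SELECT user_country, product, spam_count
FROM ranked
WHERE rnk <= 3
ORDER BY user_country, rnk, product;
"""
top3 = conn.execute(QUERY).fetchall()
print(top3)
```

Without the `deduped` step, the duplicated row would inflate the Comments count for IN from 4 to 5.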
Follow-ups:
The SPAMMY_ACTIONS table has duplicate rows. How does that affect your count?
Q4 — Revenue Spike Detection (Open-Ended, ~20 minutes)
Given a table of monthly YouTube channel revenue:
channel_id, billing_country, revenue, month (spanning several years)
Find channels that had a "spike" in typical monthly revenue, and identify the month of the spike.
Step 1 — Define spike (2–3 min discussion before writing SQL):
The interviewer deliberately asked me to define what a "spike" looks like before touching any code. I initially suggested "50% increase from last month," then revised to "mean + 2 standard deviations." The interviewer probed: Which definition has fewer pitfalls and why? This forced me to articulate the trade-offs.
Step 2 — Write the query:
I wrote a query using window functions, CTEs, and statistical aggregation.
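The post does not reproduce the interview query, so the following is only a sketch of the "mean + 2 standard deviations" definition under stated assumptions: the table name `channel_revenue`, the sample data, and the six-month minimum history are all hypothetical. SQLite has no built-in standard deviation function, so the variance is derived from the mean of squares, and the comparison is done on squared deviations to avoid a square root.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE channel_revenue (
  channel_id TEXT, billing_country TEXT, revenue REAL, month TEXT)""")

# Hypothetical data: one steady channel, one with a single spike month,
# and one with too little history to judge.
months = [f"2025-{m:02d}" for m in range(1, 13)]
rows = []
for i, month in enumerate(months):
    rows.append(("steady", "IN", 100.0 + 10.0 * (i % 2), month))
    rows.append(("spiky", "IN", 500.0 if month == "2025-08" else 100.0, month))
rows += [("new", "US", 100.0, "2025-11"), ("new", "US", 500.0, "2025-12")]
conn.executemany("INSERT INTO channel_revenue VALUES (?, ?, ?, ?)", rows)

QUERY = """
WITH stats AS (
  SELECT channel_id, month, revenue,
         AVG(revenue)           OVER (PARTITION BY channel_id) AS mean_rev,
         AVG(revenue * revenue) OVER (PARTITION BY channel_id) AS mean_sq,
         COUNT(*)               OVER (PARTITION BY channel_id) AS n_months
  FROM channel_revenue
)
SELECT channel_id, month
FROM stats
WHERE n_months >= 6                 -- assumed minimum history
  AND revenue > mean_rev            -- only upward deviations count as spikes
  -- (revenue - mean)^2 > (2 * sd)^2, with variance = mean_sq - mean^2
  AND (revenue - mean_rev) * (revenue - mean_rev)
      > 4 * (mean_sq - mean_rev * mean_rev)
ORDER BY channel_id, month;
"""
spikes = conn.execute(QUERY).fetchall()
print(spikes)
```

This version uses each channel's full history as the baseline, so the spike month inflates its own mean and standard deviation, and it does nothing about seasonality or steady growth. Those are the kinds of limits the edge-case discussion was probing.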
Step 3 — Edge case probing (the bulk of the discussion):
The interviewer then hammered me with edge cases and constraints, one after another:
Where I got stuck:
I was so focused on getting the initial query right that I didn't anticipate how many edge cases would derail it. The interviewer cared less about the perfect query on the first try and more about whether I could identify edge cases, articulate trade-offs, and iterate toward a robust solution.
Verdict: Cleared
Format: Virtual
Duration: ~45 minutes (May 27th, 2026)
The interviewer opened with a ~5 minute intro covering format, expectations, and introductions. The rest was a behavioral conversation anchored around one core project with follow-ups on collaboration, stakeholders, and organizational fit.
Part 1 — Deep dive on a past project:
Q1. Of all the projects you've worked on, which was the most impactful?
I walked through one of my most impactful projects, which involved coordinating across multiple teams and stakeholders.
Q2. How did you get all the teams on the same page and moving forward?
I discussed my alignment approach — shared goals, communication cadence, regular sync-ups.
Q3. Who was the most difficult stakeholder to work with, and what did you do to keep things moving?
I described a specific stakeholder, the friction points, and how I navigated the conflict.
Q4. What was the metric you were trying to move, and what was the actual business impact?
I walked through the target metric and the measured outcome.
Part 2 — Working environment & ways of working:
Q5. If you were to build an ideal work environment, what would its features be?
I outlined:
Q6. What could break this environment?
I identified:
Q7. If you're working with an entirely new team of stakeholders, how do you influence them?
I explained: find a common goal, establish a shared baseline/context before pushing ideas.
Q8. What are common obstacles you might encounter due to a lack of experience with a new team — and how do you still push forward?
I discussed challenges: lack of credibility, missing context, unclear decision-makers — and how to navigate each.
Q9. What is something you'd want to avoid in the new organization going forward?
I framed this positively as preferences rather than complaints.
Q10. What if your stakeholders' metrics are opposite to yours? How do you influence them to prioritize your metric or find common ground?
I discussed framing it as a shared trade-off rather than win/lose, finding the higher-level metric both ladder up to, quantifying trade-offs, and escalating cleanly if needed.
Part 3 — My questions to the interviewer:
I asked questions focused on Trust & Safety work, challenges, and team dynamics.
My observations:
The interviewer was clearly testing stakeholder management, influence without authority, and self-awareness — all critical for a T&S Analyst role. The pattern was to use one project as a hook and pull multiple behavioral signals from it: impact, collaboration, conflict resolution, metrics thinking.
The conversation wrapped 15 minutes early, which I couldn't definitively read — could signal efficiency or that the interviewer had what they needed early.
Verdict: Cleared
Format: Virtual
Duration: ~45 minutes (May 27th, 2026)
The interviewer opened with my current role context (my work at my previous company, coding languages, analytics scope), then launched into a case question followed by SQL.
Q1 — Automating detection of repeat offenders:
Scenario: A group of users displaying pro-Russia behavior in Palestine war content gets detected and removed, but they keep returning to the platform. How would you automate this?
My approach:
Q2 — Precision vs Recall trade-off:
Setup: 50 bad users + 50 good users.
My answer: Choice depends on severity of harm.
Where I got stuck: I over-explained with calculations instead of anchoring directly on the precision vs recall framing upfront.
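To make the trade-off concrete, here is a small sketch on the 50 bad + 50 good setup. The two classifier outcomes ("strict" and "broad") are invented numbers for illustration, not figures from the interview.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from confusion-matrix counts."""
    flagged = true_positives + false_positives
    actual_bad = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual_bad if actual_bad else 0.0
    return precision, recall

# Hypothetical strict threshold: flags 32 users, 30 of them truly bad.
strict = precision_recall(true_positives=30, false_positives=2, false_negatives=20)

# Hypothetical broad threshold: flags 78 users, 48 of them truly bad.
broad = precision_recall(true_positives=48, false_positives=30, false_negatives=2)

print(f"strict: precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"broad:  precision={broad[0]:.2f}, recall={broad[1]:.2f}")
```

The strict setting wrongly flags only 2 good users but misses 20 bad ones; the broad setting misses 2 bad users at the cost of wrongly flagging 30 good ones. Which is preferable depends on the severity of the harm, which is the framing to lead with.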
Q3 — Follow-up: How does high-recall / low-precision hurt creators?
Legitimate creator content gets flagged → stuck in pending review, demonetized, or suppressed in distribution. This adds friction and hurts creator experience and retention.
Q4 — Follow-up: How do you improve this for trusted creators?
Build a user reputation signal: prior strikes, category history, account age, engagement quality.
For clean-history users:
Q5 — How would you automate the broader system?
Pure keyword/rule-based detection misses context. An LLM can ingest the full video and its metadata to build contextual understanding and classify more accurately.
Q6 — Precision is at 50%. How do you improve it?
50% is an average — segment by category/content type.
Implement LLM-assisted human review: present the reviewer with video history, metadata snapshot, applicable policy, and suggested action. Their decisions feed back into LLM training → feedback loop → precision improves over time.
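The segmentation point can be illustrated with a short sketch. The categories and review outcomes below are made up; the only claim is arithmetic: an overall precision of 50% can hide one segment performing well and another performing badly.

```python
from collections import defaultdict

# Hypothetical human-review outcomes: (category, was the flag correct?)
reviewed_flags = (
    [("spam", True)] * 8 + [("spam", False)] * 2
    + [("satire", True)] * 2 + [("satire", False)] * 8
)

def precision_by_segment(flags):
    """Share of flags confirmed correct by reviewers, per category."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, is_correct in flags:
        total[category] += 1
        correct[category] += int(is_correct)
    return {category: correct[category] / total[category] for category in total}

overall = sum(is_correct for _, is_correct in reviewed_flags) / len(reviewed_flags)
by_segment = precision_by_segment(reviewed_flags)

print(overall, by_segment)
```

Here the blended 50% splits into 80% for one category and 20% for the other, which suggests where to focus policy or model work first.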
Q7 — SQL:
Two short questions:
COUNT(*) ... GROUP BY query.
Example:
SELECT *
FROM users
WHERE email REGEXP '^[0-9]+[^a-zA-Z0-9][0-9]{3}@gmail\\.com$';
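The wording of this question is not preserved above, so the sketch below only checks what the pattern in the example query accepts, using Python's `re` module. The sample addresses are invented, and Python's regex engine is not identical to MySQL's (case sensitivity, for instance, may differ by collation), so treat it as an approximation.

```python
import re

# Same pattern as the SQL example, with the backslash unescaped for a raw string:
# one or more digits, one non-alphanumeric character, exactly three digits,
# then the literal domain "@gmail.com".
PATTERN = re.compile(r"^[0-9]+[^a-zA-Z0-9][0-9]{3}@gmail\.com$")

# Hypothetical sample addresses.
samples = [
    "123_456@gmail.com",   # digits, separator, three digits
    "9.001@gmail.com",     # a single leading digit is enough
    "abc_123@gmail.com",   # letters before the separator
    "123456@gmail.com",    # no separator
    "12_34@gmail.com",     # only two digits after the separator
    "123_456@yahoo.com",   # wrong domain
]
matches = [s for s in samples if PATTERN.search(s)]
print(matches)
```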
Verdict: Cleared
Format: Virtual
Duration: ~45 minutes (June 3rd, 2026)
The hiring manager opened with a single anchoring scenario — measuring AI-generated content on the platform — and progressively layered on harder constraints: detection → measurement → good/bad classification → technical implementation → a real-world inauthentic-behavior case → an operational incident.
This was less behavioral and more a case interview probing systems thinking, T&S judgment, and how I reason about detection at scale.
Part 1 — Detecting & measuring AI-generated video:
Scenario: You're on a team tasked with figuring out what percentage of videos on the platform are AI-generated.
Q1. How would you do it? (Two parts: detection mechanisms for identifying AI-generated content, and measurement — out of X videos, how many contain AI?)
I outlined detection approaches (classifiers, watermark detection, metadata analysis) and measurement strategies (sampling, classifier confidence thresholds).
Q1a. What are the drawbacks of the approach you proposed?
The interviewer pushed on limitations: sampling bias, classifier precision/recall drift, evolving generation methods that evade detection.
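As a rough illustration of the sampling approach, the sketch below estimates prevalence from a labeled random sample with a normal-approximation confidence interval. The sample size and positive count are hypothetical, and the calculation assumes a simple random sample with accurate labels; it does not correct for the sampling bias or classifier drift mentioned above.

```python
import math

def prevalence_estimate(sample_size, positives, z=1.96):
    """Point estimate and approximate 95% interval for a proportion."""
    p = positives / sample_size
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - z * standard_error)
    high = min(1.0, p + z * standard_error)
    return p, low, high

# Hypothetical: 1,000 randomly sampled videos, 120 labeled AI-generated.
estimate, low, high = prevalence_estimate(sample_size=1000, positives=120)
print(f"{estimate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Reporting the interval alongside the point estimate makes it easier to explain to stakeholders how much the number could move with a different sample.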
Part 2 — Good vs. bad AI videos:
Q2. Say the classifier flags 1,000 AI-generated videos. How do you detect which are "bad" vs "good"?
I asked for clarification on what "good/bad" meant. The interviewer framed it: A news channel using AI as part of a video isn't harmful; someone using it for deceptive/harmful use cases is bad.
I outlined distinguishing signals: presence of disclosure, context (news vs. impersonation), creator history.
Q2a. How do you build a long-term mechanism for tracking good vs bad over time?
I discussed historical tracking, feedback loops, and evolving policy context.
Q2b. How do you implement this technically — the actual detection mechanism for good vs bad?
I outlined policy-rule inference, creator metadata signals, and contextual heuristics.
Part 3 — Inauthentic behavior (deepfake removal at scale):
Q3. Sadhguru has come to us for the third time: ~10 deepfake videos across ~40 channels keep reposting. We've cleaned/removed content before, but this time we want to tell him we've removed all deepfake videos of him on the platform. How do you find and remove all of it? And how do you make a credible assurance that it's all gone (and stays gone)?
I discussed:
Part 4 — Operational incident (review queue surge):
Q4. You arrive one morning, and the human-review queue is 10x normal volume. How do you reduce the queue while maintaining review quality? And how do you make sure you don't fall into this trap again?
I outlined:
Part 5 — Reporting harm to stakeholders:
Q5. How do you detect AI-generated videos and produce a number for what % of AI videos on the platform are harmful — then explain that number to stakeholders?
I discussed:
My observations:
The interviewer tested end-to-end T&S systems thinking — detection, measurement, classification, technical implementation, and operations — not just behavioral fit. One scenario (AI on platform) was used as a hook, then layered with constraints to see how deep I could reason. Clarifying "good vs bad" before answering was the right move and was explicitly invited.
Pacing was progressively harder; each answer triggered a "now how about…" follow-up.
Verdict: Unknown
SQL Round: I should have anticipated edge cases before writing code, not after. On the revenue spike question, spending the first 5–10 minutes explicitly listing edge cases (seasonality, growth, single-month channels, outliers) would have made the subsequent iteration smoother and shown proactive systems thinking.
Behavioral Round: I took too long on my self-introduction in Round 1. A crisp 5–7 minute overview would have left more room for the interviewer's questions and shown respect for their time.
Case Interviews (Rounds 4 & 5): On precision/recall, I should have led with the framework ("precision vs recall is a trade-off; the choice depends on harm severity") instead of jumping to calculations. Frameworks first, details second.
When asked about "good vs bad AI," I immediately started listing detection signals. Better: ask clarifying questions first (as the interviewer invited), then structure the answer.
Define before you solve. On open-ended questions like "detect a spike" or "classify good/bad AI," spend 2–3 minutes defining the problem with the interviewer before writing code or listing signals. Trade-offs and assumptions discussed upfront make iteration smoother.
Edge cases aren't afterthoughts — they're the bulk. On the SQL round, the interviewer spent ~20 minutes of a 45-minute session on edge cases alone. Bring them up proactively; don't wait to be asked. This signals systems thinking.
Ask for clarification on ambiguous framing. On the AI/good/bad question, the interviewer explicitly invited me to ask what "good" and "bad" meant. Take that invitation. Clarity beats assumptions.
Use behavioral anchors strategically. In Round 3, one past project became the hook for ten different questions. Pick a project you can defend from multiple angles — impact, collaboration, conflict, metrics, stakeholder navigation.
Precision vs recall frameworks apply everywhere in Trust & Safety. This trade-off came up in multiple rounds in different forms (model trade-offs, queue management, incident response). Internalize it deeply and practice articulating it clearly without over-explaining.